Metadata-Version: 2.4
Name: insurance-counterfactual-sets
Version: 0.1.0
Summary: Weighted conformal prediction sets for individual insurance counterfactuals (Lei & Candès 2021) with sensitivity analysis (Jin et al. 2023) and FCA harm reporting
Project-URL: Homepage, https://github.com/burning-cost/insurance-counterfactual-sets
Project-URL: Repository, https://github.com/burning-cost/insurance-counterfactual-sets
Project-URL: Issues, https://github.com/burning-cost/insurance-counterfactual-sets/issues
Author-email: Burning Cost <pricing.frontier@gmail.com>
License: MIT
License-File: LICENSE
Keywords: causal inference,conformal prediction,consumer duty,counterfactual,individual treatment effects,insurance,sensitivity analysis,weighted conformal inference
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.9
Requires-Dist: jinja2>=3.1
Requires-Dist: numpy>=1.24
Requires-Dist: polars>=0.20
Requires-Dist: pyarrow>=12.0
Requires-Dist: scikit-learn>=1.3
Requires-Dist: scipy>=1.10
Provides-Extra: catboost
Requires-Dist: catboost>=1.2; extra == 'catboost'
Provides-Extra: dev
Requires-Dist: pyarrow>=12.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Description-Content-Type: text/markdown

# insurance-counterfactual-sets

Finite-sample valid prediction sets for individual insurance counterfactuals.

The problem this solves: you want to know what a specific policyholder *would have paid* under a different pricing treatment — say, new-business pricing instead of renewal pricing. A point estimate of the counterfactual is not enough; you need a prediction set with a rigorous coverage guarantee.

This library implements Lei & Candès (2021) weighted conformal inference for counterfactuals, with Jin et al. (2023) sensitivity analysis and FCA Consumer Duty harm reporting.

## Why conformal inference for counterfactuals?

Standard causal inference methods (double machine learning, TMLE) give you average treatment effects with asymptotic confidence intervals. If you want a prediction set for each policyholder's counterfactual outcome, with a finite-sample guarantee rather than an asymptotic approximation, you need conformal methods.

The key properties:
- **Finite-sample marginal coverage**: no asymptotic approximations, no distributional assumptions on outcomes
- **Handles covariate shift**: importance weighting from propensity scores reweights the calibration distribution to match each test point
- **Heteroskedasticity**: conformalized quantile regression (CQR) adapts interval width to local variability in claims
- **Sensitivity analysis**: Gamma-values tell you how large unmeasured confounding would need to be to invalidate your conclusion

## What it is not

- The coverage guarantee is *marginal* (averaged across test policyholders), not conditional (for each individual). Conditional coverage is much harder and not provided here.
- This is a screening tool for Consumer Duty / ICOBS 6B review. It is not a legal determination of harm.
- The Gamma-value quantifies sensitivity under the *marginal* sensitivity model (Tan 2006), not all possible confounding structures.

## Installation

```bash
pip install insurance-counterfactual-sets
```

With optional CatBoost support:

```bash
pip install "insurance-counterfactual-sets[catboost]"
```

## Quick start

```python
import numpy as np
from sklearn.linear_model import Ridge, LogisticRegression
from insurance_counterfactual_sets import (
    WeightedConformalITE,
    PropensityWeighter,
    SensitivityAnalyzer,
    FCAHarmReport,
)

# Assume X, Y, T are your training data arrays
# T=0: new business, T=1: renewal (or whichever treatment is relevant)

# 1. Split into train / calibration / test
#    (sequential split assumes rows are already in random order; shuffle first if not)
n = len(X)
train_idx = np.arange(n // 2)
cal_idx = np.arange(n // 2, 3 * n // 4)
test_idx = np.arange(3 * n // 4, n)

# 2. Fit
pw = PropensityWeighter(LogisticRegression(max_iter=1000))
model = WeightedConformalITE(
    outcome_model=Ridge(),
    propensity_model=pw,
    alpha=0.05,        # 95% coverage
    nonconformity="cqr",  # CQR default: adaptive width
)
model.fit(X[train_idx], Y[train_idx], T[train_idx])
model.calibrate(X[cal_idx], Y[cal_idx], T[cal_idx])

# 3. Predict: what would renewal policyholders have paid as new customers?
sets = model.predict_counterfactual(X[test_idx], treatment_arm=0)
# Returns a Polars DataFrame: lower, upper, point_estimate, half_width

# 4. ITE prediction sets
ite = model.predict_ite(X[test_idx])
# ite_lower, ite_upper, ite_point — Minkowski sum of the Y(1) set and the negated Y(0) set

# 5. Sensitivity analysis
sa = SensitivityAnalyzer(model)
gamma_df = sa.gamma_report(X[test_idx], treatment_arm=0)
# gamma_value: smallest Gamma that would invalidate the conclusion
# robust: True if gamma_value > 1.5

# 6. FCA harm report
renewal_premiums = ...  # actual premiums charged
report = FCAHarmReport(conformal_ite=model, sensitivity_analyzer=sa)
individual = report.individual_harm_assessment(
    X[test_idx], Y[test_idx], renewal_premiums
)
summary = report.portfolio_summary(individual)
report.fca_attestation_pack(individual, output_dir="./evidence")
```

## Core API

### `WeightedConformalITE`

The main class.

```python
WeightedConformalITE(
    outcome_model=None,      # sklearn regressor, default Ridge
    propensity_model=None,   # PropensityWeighter, default LogisticRegression
    alpha=0.05,              # miscoverage level
    method='split',          # split conformal (only option currently)
    nonconformity='cqr',     # 'cqr' or 'abs'
)
```

- `fit(X, Y, T)`: fit outcome models on training fold
- `calibrate(X_cal, Y_cal, T_cal)`: compute nonconformity scores on calibration fold
- `predict_counterfactual(X, treatment_arm=0)`: returns Polars DataFrame with `lower`, `upper`, `point_estimate`, `half_width`
- `predict_ite(X)`: set for ITE = Y(1) - Y(0), the Minkowski sum of the Y(1) set and the negated Y(0) set

**Nonconformity scores:**

`abs`: residual score `R_i = |Y_i - mu_t(X_i)|`. Simple, interpretable.

`cqr`: conformalized quantile regression score `R_i = max(q_lo(X_i) - Y_i, Y_i - q_hi(X_i))`. Gives tighter intervals when the outcome variance depends on covariates (typical for insurance claims). This is the default.
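The two scores can be written down in a few lines. This is an illustrative NumPy sketch of the formulas above, not the library's internal implementation; the toy `y`, `mu`, and quantile arrays are made up for the example:

```python
import numpy as np

def abs_score(y, mu):
    """Absolute-residual score: R_i = |Y_i - mu_t(X_i)|."""
    return np.abs(y - mu)

def cqr_score(y, q_lo, q_hi):
    """CQR score: R_i = max(q_lo(X_i) - Y_i, Y_i - q_hi(X_i)).
    Negative when Y_i falls strictly inside [q_lo, q_hi]."""
    return np.maximum(q_lo - y, y - q_hi)

y = np.array([100.0, 250.0, 400.0])
mu = np.array([150.0, 240.0, 300.0])
print(abs_score(y, mu))            # [ 50.  10. 100.]

q_lo = np.array([80.0, 200.0, 250.0])
q_hi = np.array([180.0, 260.0, 350.0])
print(cqr_score(y, q_lo, q_hi))    # [-20. -10.  50.]
```

Note the CQR score is negative for well-covered points, so the conformal correction `Q_w` can shrink as well as widen the raw quantile band.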

### `PropensityWeighter`

```python
PropensityWeighter(
    estimator=None,           # sklearn classifier, default LogisticRegression
    calibrate=True,           # Platt scaling via CalibratedClassifierCV
    calibration_method='sigmoid',
    clip_quantile=0.99,       # clip weights at 99th percentile
    min_propensity=0.01,      # hard floor on propensity scores
)
```

- `fit(X, T)`: fit propensity model
- `predict_propensity(X)`: returns P(T=1|X)
- `predict_weights(X, T)`: returns clipped importance weights
- `check_overlap(X, T)`: returns dict with ESS, weight diagnostics, overlap warning

**Weight clipping**: without clipping, a single calibration unit with a tiny propensity score can dominate the weighted quantile. The 99th percentile clip is a practical robustness measure. You can disable it with `clip_quantile=1.0`.
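The clipping step itself is simple. A minimal sketch of quantile clipping, assuming the same convention as `clip_quantile` above (the helper name and the simulated weights are illustrative only):

```python
import numpy as np

def clip_weights(weights, clip_quantile=0.99):
    """Cap importance weights at an upper quantile so that no single
    calibration unit can dominate the weighted quantile."""
    w = np.asarray(weights, dtype=float)
    cap = np.quantile(w, clip_quantile)
    return np.minimum(w, cap)

rng = np.random.default_rng(0)
propensity = rng.uniform(0.01, 0.99, size=1000)
w = 1.0 / propensity                 # inverse-propensity-style weights
w_clipped = clip_weights(w)
print(w.max() > w_clipped.max())     # True: the heaviest tail is pulled down
```

Clipping trades a small amount of bias for a large reduction in variance of the weighted quantile; with `clip_quantile=1.0` the weights pass through unchanged.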

### `SensitivityAnalyzer`

```python
SensitivityAnalyzer(
    conformal_ite,           # fitted WeightedConformalITE
    gamma_grid=None,         # coarse grid for search, default arange(1, 5.1, 0.25)
)
```

- `robust_prediction_set(X_test, gamma, treatment_arm=0)`: prediction set valid under Gamma
- `gamma_value(X_test, treatment_arm=0, null_value=0.0, gamma_max=10.0)`: min Gamma that invalidates conclusion
- `gamma_report(X_test, treatment_arm=0)`: Gamma-value per test unit as Polars DataFrame
- `ite_gamma_value(X_test, null_ite=0.0)`: Gamma-value for the ITE interval

**Interpreting Gamma-values:**
- Gamma = 1.0: no unmeasured confounding needed to shift the conclusion (fragile)
- Gamma = 1.5: the conclusion flips if unmeasured confounding distorts the treatment odds by a factor of 1.5
- Gamma = 3.0: very robust — confounders would need to triple the treatment odds
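To make the mechanics concrete: under the marginal sensitivity model, confounding bounded by Gamma confines each true importance weight to an interval around the estimated one, and a Gamma-robust set takes the worst case over weights in those intervals. The sketch below uses the simplified bound `[w/Gamma, Gamma*w]`; the actual procedure in Jin et al. (2023) is more refined, so treat this as intuition, not the library's algorithm:

```python
import numpy as np

def weight_bounds(weights, gamma):
    """Simplified marginal-sensitivity-model bounds: each true weight
    is assumed to lie in [w / gamma, w * gamma] around the estimate."""
    w = np.asarray(weights, dtype=float)
    return w / gamma, w * gamma

w = np.array([1.0, 2.5, 0.8])
lo, hi = weight_bounds(w, gamma=1.5)
# A robust prediction set maximizes the weighted quantile over all
# weight vectors inside [lo, hi], so it widens as Gamma grows and
# collapses to the ordinary set at Gamma = 1.
print(lo, hi)
```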

### `FCAHarmReport`

```python
FCAHarmReport(
    conformal_ite,            # fitted WeightedConformalITE
    sensitivity_analyzer=None # optional SensitivityAnalyzer
)
```

- `individual_harm_assessment(X_renewals, Y_actual, renewal_premium)`: per-policyholder harm flags
- `portfolio_summary(individual_results)`: aggregate statistics
- `fca_attestation_pack(individual_results, output_dir)`: write CSV + HTML evidence pack

**Harm definition**: a policyholder is flagged if their renewal premium exceeds the upper bound of the `(1-alpha)` prediction set for what they would have paid as a new customer.

## The algorithm

For test point `x` and target arm `t`:

1. Compute nonconformity scores `R_i` for calibration units in arm `t`
2. Compute importance weights relative to the test point: `w_i = e(x) / e(X_i)` for `t = 1`, or `w_i = (1 - e(x)) / (1 - e(X_i))` for `t = 0`
3. Take the weighted `(1 - alpha)` quantile `Q_w` of the augmented score distribution (calibration scores plus an infinite score for the test point)
4. Prediction set: `[mu_t(x) - Q_w, mu_t(x) + Q_w]` (abs) or `[q_lo(x) - Q_w, q_hi(x) + Q_w]` (CQR)

The augmented quantile (Tibshirani et al. 2019) appends an infinite score with weight 1 before the quantile computation. This is the finite-sample correction that guarantees coverage.
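The augmented weighted quantile fits in a few lines. A minimal NumPy sketch of steps 2-3, assuming weights are already normalized relative to the test point (so the appended infinite score gets weight 1, as described above); this is illustrative, not the library's internals:

```python
import numpy as np

def augmented_weighted_quantile(scores, weights, alpha):
    """Weighted (1 - alpha) quantile of calibration scores, augmented
    with an infinite score carrying the test point's weight of 1."""
    s = np.append(scores, np.inf)
    w = np.append(weights, 1.0)
    order = np.argsort(s)
    s, w = s[order], w[order]
    cdf = np.cumsum(w) / w.sum()
    # smallest score whose cumulative weight reaches 1 - alpha
    return s[np.searchsorted(cdf, 1 - alpha)]

scores = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
weights = np.ones(5)        # equal weights: reduces to ordinary split conformal
print(augmented_weighted_quantile(scores, weights, alpha=0.2))  # 5.0
```

With equal weights this matches the unweighted split-conformal rule (the ceil((n+1)(1-alpha))-th smallest score); unequal weights shift the quantile toward high-weight calibration units, and an infinite `Q_w` correctly signals that the weights cannot certify coverage for that test point.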

## Overlap and diagnostics

Before trusting the prediction sets, check that your propensity model has adequate overlap:

```python
diag = pw.check_overlap(X_cal, T_cal)
print(diag)
# {'ess': 0.72, 'weight_max': 4.2, 'overlap_warning': False, ...}
```

A low effective sample size (ESS < 0.3) is a warning sign: the weighted quantile is being driven by a small number of calibration units. Consider using a more flexible propensity model or restricting the analysis to a population with better overlap.
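The ESS figure in the diagnostics is on a normalized 0-1 scale. Assuming it is the standard Kish effective sample size divided by `n` (an assumption about the diagnostic's definition, consistent with the 0.72 shown above), it can be computed directly from the weights:

```python
import numpy as np

def normalized_ess(weights):
    """Kish effective sample size, normalized to (0, 1]:
    (sum w)^2 / (n * sum w^2). Equal weights give 1.0; a single
    dominant weight drives it toward 1/n."""
    w = np.asarray(weights, dtype=float)
    return (w.sum() ** 2) / (len(w) * (w ** 2).sum())

print(normalized_ess(np.ones(100)))           # 1.0: perfectly balanced weights
print(normalized_ess([100.0] + [1.0] * 99))   # ~0.04: one unit dominates
```

When this number is small, the weighted quantile behaves as if computed on only `ESS * n` calibration units, which is why a low value undermines trust in the prediction sets.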

## References

- Lei, L. & Candès, E.J. (2021). Conformal Inference of Counterfactuals and Individual Treatment Effects. *JRSS-B* 83(5):911-938. arXiv:2006.06138.
- Jin, Y., Ren, Z. & Candès, E.J. (2023). Sensitivity Analysis of Individual Treatment Effects: A Robust Conformal Inference Approach. *PNAS* 120(6). arXiv:2111.12161.
- Romano, Y., Patterson, E. & Candès, E.J. (2019). Conformalized Quantile Regression. *NeurIPS 2019*. arXiv:1905.03222.
- Tibshirani, R.J., Barber, R.F., Candès, E.J. & Ramdas, A. (2019). Conformal Prediction Under Covariate Shift. *NeurIPS 2019*. arXiv:1904.06019.
- Tan, Z. (2006). A distributional approach for causal inference using propensity scores. *JASA* 101(476):1619-1637.

## License

MIT. See LICENSE.
