Metadata-Version: 2.4
Name: insurance-competing-risks
Version: 0.1.0
Summary: Fine-Gray subdistribution hazard regression for competing risks — built for insurance pricing
Project-URL: Homepage, https://github.com/burning-cost/insurance-competing-risks
Project-URL: Repository, https://github.com/burning-cost/insurance-competing-risks
Author-email: Burning Cost <pricing.frontier@gmail.com>
License: MIT
Keywords: actuarial,competing-risks,cumulative-incidence,fine-gray,insurance,subdistribution-hazard,survival-analysis
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.10
Requires-Dist: lifelines>=0.27
Requires-Dist: matplotlib>=3.7
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: scipy>=1.10
Provides-Extra: dev
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Description-Content-Type: text/markdown

# insurance-competing-risks

Fine-Gray subdistribution hazard regression for competing risks — built for insurance pricing.

## The problem

When a policy can exit in more than one way, standard survival models are wrong.

A motor policy that lapses cannot also generate a mid-term cancellation. A house that burns cannot also flood. Once one event happens, the others are permanently prevented. These are **competing risks**, and they require a different statistical framework.

The standard fix — fitting a separate Cox model per cause and treating the other causes as censored — answers the wrong question. It tells you how the hazard rate *among currently-at-risk subjects* changes with covariates. It does not tell you how the *probability* of a specific exit route changes. For pricing, underwriting, and retention analysis, you almost always want the probability.

**Fine and Gray (1999)** solved this. Their subdistribution hazard model has a one-to-one correspondence with the Cumulative Incidence Function (CIF): the probability that cause k occurs before time t, given covariates. Fit a Fine-Gray model, and you can directly predict "what is the probability this customer lapses within 12 months?" while properly accounting for mid-term cancellation and claim-driven churn as competing events.
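
To make the difference concrete, here is a minimal pure-Python sketch (not this package's implementation) that compares the Aalen-Johansen CIF with the naive "1 minus Kaplan-Meier" estimate obtained by treating competing exits as censored. The function name `step_estimates` and the toy data are hypothetical and exist only for illustration.

```python
def step_estimates(times, events, cause):
    """Return [(t, aalen_johansen_cif, naive_one_minus_km)] per distinct time.

    Event code 0 means censored; any other code is an exit cause.
    """
    data = sorted(zip(times, events))
    n = len(data)
    surv_all = 1.0    # P(no event of any cause yet), used by Aalen-Johansen
    surv_naive = 1.0  # Kaplan-Meier that treats competing exits as censoring
    cif = 0.0
    rows = []
    i = 0
    while i < n:
        t = data[i][0]
        at_risk = n - i
        d_cause = d_any = 0
        while i < n and data[i][0] == t:  # handle ties at the same time
            d_cause += int(data[i][1] == cause)
            d_any += int(data[i][1] != 0)
            i += 1
        cif += surv_all * d_cause / at_risk  # uses survival just before t
        surv_all *= 1.0 - d_any / at_risk
        surv_naive *= 1.0 - d_cause / at_risk
        rows.append((t, cif, 1.0 - surv_naive))
    return rows


# Hypothetical toy portfolio: 1 = lapse, 2 = mid-term cancellation, 0 = censored
T = [1, 2, 3, 4, 5, 6, 7, 8]
E = [2, 1, 2, 1, 0, 1, 2, 1]

lapse_rows = step_estimates(T, E, cause=1)
mtc_rows = step_estimates(T, E, cause=2)

for t, aj, naive in lapse_rows:
    print(f"t={t}: Aalen-Johansen={aj:.3f}  naive 1-KM={naive:.3f}")
```

On this toy data the naive estimate is never below the Aalen-Johansen CIF, and the naive estimates for the two causes sum to more than 1 at the last time point, which a set of exit probabilities cannot do.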

## The gap this fills

No pure-Python, pip-installable library provides Fine-Gray regression:

- **lifelines**: has Aalen-Johansen CIF, no Fine-Gray regression
- **scikit-survival**: non-parametric CIF from v0.24, no regression
- **hazardous**: gradient-boosted CIF, no interpretable SHRs
- **cmprsk** (Python): wraps R via rpy2, requires R runtime
- **pydts**: discrete time only

`insurance-competing-risks` fills the gap with a pure NumPy/SciPy implementation.

## Insurance use cases

**Home insurance — competing perils**: model time-to-first-claim where causes are fire, escape of water, flood, and subsidence. The Fine-Gray CIF gives the probability of each peril being the first reported, accounting for the fact that claiming flood prevents a separate subsidence claim on the same policy.

**Retention analysis**: a policy exits via lapse, mid-term cancellation (MTC), non-taken-up (NTU), or claim-driven churn. Fine-Gray on premium uplift and tenure directly estimates the lapse probability at renewal, properly accounting for competing exits.

**Motor claims**: first claim type (own damage, TPPD, TPBI, windscreen, theft) as competing events. Useful for understanding which perils drive early claims by risk segment.

## Installation

```bash
pip install insurance-competing-risks
```

## Quick start

```python
from insurance_competing_risks import FineGrayFitter, AalenJohansenFitter
from insurance_competing_risks.datasets import simulate_insurance_retention

df = simulate_insurance_retention(n=1000, seed=0)

# 1. Non-parametric CIF: what is the marginal lapse probability over time?
aj = AalenJohansenFitter()
aj.fit(df["T"], df["E"], event_of_interest=1)
aj.plot()  # step plot with 95% confidence band

# 2. Regression: how does premium uplift affect lapse probability?
fg = FineGrayFitter()
fg.fit(
    df[["T", "E", "premium_uplift", "tenure_years", "ncd_years"]],
    duration_col="T",
    event_col="E",
    event_of_interest=1,  # lapse
)
print(fg.summary)  # SHR, 95% CI, p-value per covariate

# 3. Predict CIF for new customers
import numpy as np
times = np.array([0.25, 0.5, 1.0])  # policy years
cif = fg.predict_cumulative_incidence(df.head(5), times=times)
print(cif)  # shape (5, 3): probability of lapsing before each time

# 4. Partial effects: how does lapse risk change across premium uplifts of -5%, +10% and +30%?
fg.plot_partial_effects_on_outcome("premium_uplift", values=[-0.05, 0.10, 0.30])
```

## Modules

| Module | What it does |
|--------|-------------|
| `cif` | Aalen-Johansen non-parametric CIF estimator with confidence bands |
| `fine_gray` | Fine-Gray regression: `FineGrayFitter` with lifelines-compatible API |
| `gray_test` | Gray's K-sample test for CIF equality across groups |
| `metrics` | IPCW Brier score, integrated Brier score, cause-specific C-index, calibration curves |
| `datasets` | Bone marrow transplant benchmark; synthetic insurance retention data |
| `plots` | Forest plot, stacked CIF, Brier score over time |

## Fine-Gray: the key ideas

The **subdistribution hazard** for cause k is:

```
lambda_k(t) = -d/dt log(1 - F_k(t))
```

where F_k(t) is the CIF. This is modelled proportionally:

```
lambda_k(t | x) = lambda_k0(t) * exp(beta_k' x)
```

Here exp(beta_k) is the **subdistribution hazard ratio (SHR)**. An SHR of 1.5 for premium uplift means the subdistribution hazard for lapse is 50% higher for each unit increase in premium uplift. Because the subdistribution hazard maps one-to-one to the CIF, a higher SHR always means a higher CIF (higher lapse probability), but the CIF itself does not scale by the SHR.
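
Combining the two displayed equations gives F_k(t | x) = 1 - (1 - F_k0(t)) ^ exp(beta_k' x), where F_k0 is the baseline CIF. The sketch below evaluates that relation for an SHR of 1.5 at two hypothetical baseline lapse probabilities, to show how far the CIF ratio falls short of the SHR. The helper name `cif_from_baseline` and the baseline values are assumptions for illustration, not part of the package API.

```python
import math


def cif_from_baseline(baseline_cif, linear_predictor):
    """CIF under proportional subdistribution hazards.

    F(t | x) = 1 - (1 - F0(t)) ** exp(beta' x)
    """
    return 1.0 - (1.0 - baseline_cif) ** math.exp(linear_predictor)


shr = 1.5
log_shr = math.log(shr)  # one-unit covariate increase with SHR 1.5

# Hypothetical baseline lapse probabilities at some fixed horizon
low_base, high_base = 0.10, 0.50
low_cif = cif_from_baseline(low_base, log_shr)
high_cif = cif_from_baseline(high_base, log_shr)

print(f"baseline {low_base:.2f} -> {low_cif:.4f} (ratio {low_cif / low_base:.3f})")
print(f"baseline {high_base:.2f} -> {high_cif:.4f} (ratio {high_cif / high_base:.3f})")
```

The CIF ratio is close to the SHR when the baseline probability is small and shrinks towards 1 as the baseline grows, so an SHR should not be read as a multiplier on lapse probability.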

The key estimation challenge is the **extended risk set**: subjects who have already experienced a competing event remain in the risk set, but with inverse-probability-of-censoring (IPCW) weights that shrink over time. This reflects that they are still "at risk" of the cause-k event in the subdistribution sense, and it is what makes Fine-Gray different from cause-specific Cox.
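
The weighting can be sketched as follows. This is a simplified illustration and not the package's code: it uses a right-continuous Kaplan-Meier estimate G of the censoring distribution and the weight G(t) / G(T_i) for a subject whose competing event occurred at T_i < t, ignoring the left-limit details of the original paper. The names `censoring_survival` and `fine_gray_weight` are hypothetical.

```python
def censoring_survival(times, events):
    """Kaplan-Meier estimate of G(t) = P(censoring time > t).

    Event code 0 is censoring; every other code counts as "not censored".
    """
    data = sorted(zip(times, events))
    n = len(data)
    steps = []  # (time, value of G from that time onwards)
    g = 1.0
    i = 0
    while i < n:
        t = data[i][0]
        at_risk = n - i
        d_cens = 0
        while i < n and data[i][0] == t:
            d_cens += int(data[i][1] == 0)
            i += 1
        g *= 1.0 - d_cens / at_risk
        steps.append((t, g))

    def G(t):
        value = 1.0
        for s, g_s in steps:
            if s > t:
                break
            value = g_s
        return value

    return G


def fine_gray_weight(t, t_i, e_i, cause, G):
    """Weight of subject i in the extended risk set at time t."""
    if t_i >= t:
        return 1.0            # still under observation: ordinary member
    if e_i not in (0, cause):
        return G(t) / G(t_i)  # competing event: stays in, weight decays
    return 0.0                # censored or had the event of interest


# Hypothetical toy data: 1 = event of interest, 2 = competing, 0 = censored
T = [1, 2, 3, 4, 5, 6]
E = [2, 0, 1, 0, 2, 1]
G = censoring_survival(T, E)
weights_at_6 = [fine_gray_weight(6, t_i, e_i, 1, G) for t_i, e_i in zip(T, E)]
print(weights_at_6)
```

In this toy example the ordinary risk set at t = 6 contains one subject, while the extended risk set also carries the two subjects with earlier competing events, one of them at a reduced weight.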

## Model summary output

```
Fine-Gray Subdistribution Hazard Model
Event of interest: 1
Duration column: T
Event column: E
Log partial-likelihood: -487.3201

                coef  exp(coef)  se(coef)      z         p  lower_95%  upper_95%
covariate
premium_uplift  1.52       4.57      0.21   7.24  4.5e-13       1.11       1.93
tenure_years   -0.14       0.87      0.03  -4.81  1.5e-06      -0.20      -0.08
ncd_years      -0.05       0.95      0.02  -2.50  1.2e-02      -0.09      -0.01
```
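
In the example output, `lower_95%` and `upper_95%` bracket `coef`, so they appear to be on the log (coefficient) scale. A hedged sketch of converting them to the SHR scale, assuming a Wald interval of coef ± z × se (which reproduces the printed bounds to two decimals), using only the standard library:

```python
import math
from statistics import NormalDist

# (coef, se) pairs copied from the example summary above
coefs = {
    "premium_uplift": (1.52, 0.21),
    "tenure_years": (-0.14, 0.03),
    "ncd_years": (-0.05, 0.02),
}

z = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile, about 1.96

shr_table = {}
for name, (coef, se) in coefs.items():
    shr_table[name] = (
        math.exp(coef),           # SHR point estimate
        math.exp(coef - z * se),  # lower 95% bound on the SHR scale
        math.exp(coef + z * se),  # upper 95% bound on the SHR scale
    )

for name, (shr, lo, hi) in shr_table.items():
    print(f"{name:15s} SHR={shr:.2f}  95% CI=({lo:.2f}, {hi:.2f})")
```

An SHR interval that excludes 1 corresponds to a coefficient interval that excludes 0, so the significance reading is the same on either scale.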

## Gray's test

Before fitting a regression model, test whether the CIFs differ between groups:

```python
from insurance_competing_risks import gray_test

result = gray_test(df["T"], df["E"], df["rating_band"], event_of_interest=1)
print(result)
# Gray's 3-Sample CIF Test (cause 1)
#   chi^2 = 12.34  df = 2  p = 0.0021
```

## Evaluation

```python
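import numpy as np

from insurance_competing_risks.metrics import (
    competing_risks_brier_score,
    integrated_brier_score,
    competing_risks_c_index,
)

# Assumes `fg` was fitted on `train_df` and `test_df` is a held-out frame
# with the same columns (for example, two splits of `df` from the quick start).
times = np.linspace(0.1, 2.0, 20)
cif_test = fg.predict_cumulative_incidence(test_df, times=times)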

# Brier score at each time
bs = competing_risks_brier_score(
    cif_test, test_df["T"], test_df["E"],
    train_df["T"], train_df["E"],
    times, event_of_interest=1
)

# Integrated Brier Score
ibs = integrated_brier_score(
    cif_test, test_df["T"], test_df["E"],
    train_df["T"], train_df["E"],
    times, event_of_interest=1
)
print(f"IBS: {ibs:.4f}")  # lower is better; 0.25 is the score of a constant 0.5 prediction
```
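
The 0.25 reference in the comment above is the Brier score of a model that predicts 0.5 for everyone. The sketch below shows this on hypothetical toy data with no censoring, so no IPCW weighting is needed. It is not the package's `competing_risks_brier_score`, and the helper name `brier_score_uncensored` is an assumption for illustration.

```python
def brier_score_uncensored(pred_cif, times, events, horizon, cause):
    """Mean squared error between predicted CIF and the observed indicator
    1{T <= horizon and E == cause}. Valid only when nobody is censored."""
    outcomes = [
        1.0 if (t <= horizon and e == cause) else 0.0
        for t, e in zip(times, events)
    ]
    return sum((y - p) ** 2 for y, p in zip(outcomes, pred_cif)) / len(outcomes)


# Hypothetical fully observed data: 1 = lapse, 2 = competing exit
T = [0.2, 0.5, 0.8, 1.5, 2.0]
E = [1, 2, 1, 1, 2]
horizon = 1.0  # two of the five policies lapse within the first year

bs_half = brier_score_uncensored([0.5] * 5, T, E, horizon, cause=1)
bs_marginal = brier_score_uncensored([0.4] * 5, T, E, horizon, cause=1)
bs_perfect = brier_score_uncensored([1, 0, 1, 0, 0], T, E, horizon, cause=1)

print(bs_half, bs_marginal, bs_perfect)
```

A constant prediction at the marginal incidence p scores p × (1 - p), which is below 0.25 whenever p differs from 0.5. For rare exit causes, that covariate-free score is usually a more demanding benchmark than 0.25.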

## References

Fine, J.P. & Gray, R.J. (1999). A proportional hazards model for the subdistribution of a competing risk. *Journal of the American Statistical Association*, 94(446), 496–509.

Gray, R.J. (1988). A class of K-sample tests for comparing the cumulative incidence of a competing risk. *Annals of Statistics*, 16(3), 1141–1154.

Milhaud, X. & Dutang, C. (2018). Lapse tables for lapse risk management in insurance: a competing risk approach. *European Actuarial Journal*, 8(1), 97–126.

Putter, H., Fiocco, M. & Geskus, R.B. (2007). Tutorial in biostatistics: Competing risks and multi-state models. *Statistics in Medicine*, 26(11), 2389–2430.

---

Part of the [Burning Cost](https://github.com/burning-cost) insurance pricing library ecosystem.
