Metadata-Version: 2.4
Name: insurance-gam
Version: 0.1.9
Summary: Interpretable GAM toolkit for insurance pricing — EBM, Neural Additive Models, and Pairwise Interaction Networks
Project-URL: Homepage, https://burning-cost.github.io/
Project-URL: Repository, https://github.com/burning-cost/insurance-gam
Project-URL: Changelog, https://github.com/burning-cost/insurance-gam/releases
Project-URL: Documentation, https://burning-cost.github.io/insurance-gam
Project-URL: Bug Tracker, https://github.com/burning-cost/insurance-gam/issues
Author-email: Burning Cost <pricing.frontier@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: EBM,GAM,actuarial,ga2m,insurance,interpretable-ml,neural-additive-model,pairwise-interactions,poisson,pricing,tweedie
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.10
Requires-Dist: matplotlib>=3.7.0
Requires-Dist: numpy>=2.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: polars>=1.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: scikit-learn>=1.3.0
Provides-Extra: all
Requires-Dist: flask>=3.1.3; extra == 'all'
Requires-Dist: interpret>=0.7.0; extra == 'all'
Requires-Dist: openpyxl>=3.0; extra == 'all'
Requires-Dist: pyasn1>=0.6.3; extra == 'all'
Requires-Dist: statsmodels>=0.14.5; extra == 'all'
Requires-Dist: torch>=2.0.0; extra == 'all'
Requires-Dist: werkzeug>=3.1.6; extra == 'all'
Provides-Extra: dev
Requires-Dist: databricks-sdk>=0.97.0; extra == 'dev'
Requires-Dist: pdoc>=14.0.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Provides-Extra: ebm
Requires-Dist: flask>=3.1.3; extra == 'ebm'
Requires-Dist: interpret>=0.7.0; extra == 'ebm'
Requires-Dist: pyasn1>=0.6.3; extra == 'ebm'
Requires-Dist: werkzeug>=3.1.6; extra == 'ebm'
Provides-Extra: excel
Requires-Dist: openpyxl>=3.0; extra == 'excel'
Provides-Extra: glm
Requires-Dist: statsmodels>=0.14.5; extra == 'glm'
Provides-Extra: neural
Requires-Dist: torch>=2.0.0; extra == 'neural'
Description-Content-Type: text/markdown

# insurance-gam

[![PyPI](https://img.shields.io/pypi/v/insurance-gam)](https://pypi.org/project/insurance-gam/) [![Downloads](https://img.shields.io/pypi/dm/insurance-gam)](https://pypi.org/project/insurance-gam/) [![Python](https://img.shields.io/pypi/pyversions/insurance-gam)](https://pypi.org/project/insurance-gam/) [![Tests](https://github.com/burning-cost/insurance-gam/actions/workflows/tests.yml/badge.svg)](https://github.com/burning-cost/insurance-gam/actions/workflows/tests.yml) [![License](https://img.shields.io/badge/license-MIT-blue)](https://github.com/burning-cost/insurance-gam/blob/main/LICENSE) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/burning-cost/insurance-gam/blob/main/notebooks/quickstart.ipynb) [![nbviewer](https://img.shields.io/badge/render-nbviewer-orange)](https://nbviewer.org/github/burning-cost/insurance-gam/blob/main/notebooks/quickstart.ipynb)

---

## The problem

GLMs need manual feature engineering to capture non-linear effects. A U-shaped driver age curve requires polynomial terms someone has to specify; a convex NCD discount requires a transformation someone has to choose. Get it wrong and the premium is wrong. Get it right and you have a model that looks well-specified but cannot discover interactions you did not anticipate.

GBMs discover those interactions automatically, but the output — thousands of trees — is not auditable by a pricing committee. A pricing actuary cannot look at a gradient booster and tell you whether the NCD discount curve is actuarially reasonable.

GAMs bridge the gap: each feature gets a smooth non-linear shape function, the output is additive and inspectable factor by factor, and interactions can be represented as pairwise 2D shape functions rather than opaque tree splits.
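The additive structure is easy to state concretely: the predicted rate is the exponential of an intercept plus one shape value per feature, multiplied by exposure. A minimal NumPy sketch with two hypothetical shape curves (illustrative only — the library's models fit these shapes from data):

```python
import numpy as np

# Hypothetical shape functions for two rating factors (invented curves,
# chosen only to illustrate the additive structure).
def f_driver_age(age):
    # U-shape: young and old drivers load positively.
    return 0.001 * (age - 45.0) ** 2 - 0.3

def f_ncd_years(ncd):
    # Convex discount: each NCD year reduces the log-rate, flattening out.
    return -0.12 * ncd + 0.004 * ncd ** 2

intercept = -2.5
age, ncd, exposure = 22.0, 3.0, 0.8

# Additive on the log scale, multiplicative on the premium scale.
log_rate = intercept + f_driver_age(age) + f_ncd_years(ncd)
expected_claims = np.exp(log_rate) * exposure
```

Because each factor contributes one inspectable term on the log scale, a pricing committee can challenge the model curve by curve rather than as a monolith.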

**Blog post:** [Your Model Is Either Interpretable or Accurate. insurance-gam Refuses That Trade-Off.](https://burning-cost.github.io/2026/03/14/insurance-gam-interpretable-nonlinearity/)

---

## Why this library?

The PRA expects Pillar 2 capital models to be interpretable. The FCA expects pricing models to be explainable. A black-box GBM satisfies neither requirement for a UK insurer. This library gives you three production-grade GAM variants — EBM, Neural Additive Model, and Pairwise Interaction Networks — that produce per-feature shape functions a pricing actuary can read, challenge, and sign off.

All three use the same GLM-family loss structure (Poisson, Tweedie, Gamma) with exposure offsets, so their outputs are directly comparable to your existing GLM. The subpackages are independent by design: importing `insurance_gam.ebm` does not load PyTorch, and vice versa.
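Sharing the GLM loss family means all variants can be scored with the same deviance metric as the incumbent GLM. A self-contained sketch of exposure-weighted Poisson deviance (the standard textbook formula, not a library API):

```python
import numpy as np

def poisson_deviance(y, mu, exposure):
    """Total Poisson deviance with an exposure offset: mu is the predicted
    rate per unit exposure, so expected counts are mu * exposure."""
    expected = mu * exposure
    # Convention: the y * log(y / expected) term is 0 when y == 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(y > 0, y * np.log(y / expected), 0.0)
    return 2.0 * np.sum(term - (y - expected))

y = np.array([0, 1, 2, 0])
mu = np.array([0.1, 0.8, 1.5, 0.2])
exposure = np.array([1.0, 1.0, 0.5, 0.7])
dev = poisson_deviance(y, mu, exposure)
```

A perfect fit (predicted rate equal to observed rate) gives zero deviance, so lower is better when comparing candidates on the same holdout.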

---

## Compared to alternatives

| | Standard GLM | GBM (XGBoost/LightGBM) | R `mgcv` | interpretML EBM standalone | **insurance-gam** |
|---|---|---|---|---|---|
| Non-linear shape functions | Manual polynomials | Yes (opaque) | Yes | Yes | Yes |
| Per-feature relativity table | Yes (linear) | No | Yes | Partial | Yes (`RelativitiesTable`) |
| Pairwise interactions | Manual dummies | Yes (opaque) | Yes | No | Yes (PIN) |
| Poisson/Gamma/Tweedie loss | Yes | Yes | Yes | No | Yes |
| Exposure offset | Yes | Partial | Yes | No | Yes |
| Python-native | Yes | Yes | No | Yes | Yes |
| PRA/FCA-auditable output | Yes | No | Yes | Partial | Yes |

---

## Installation

```bash
pip install "insurance-gam[ebm]"     # EBM only (most common)
pip install "insurance-gam[neural]"  # ANAM and PIN (requires PyTorch)
pip install "insurance-gam[all]"     # everything
# or with uv:
uv add "insurance-gam[ebm]"
```

The three subpackages are independent: `insurance_gam.ebm` loads interpretML, while `insurance_gam.anam` and `insurance_gam.pin` load PyTorch. Importing one does not load the other.

---

## Quickstart

```bash
uv add "insurance-gam[ebm]"
```

```python
import numpy as np
import polars as pl
from insurance_gam.ebm import InsuranceEBM, RelativitiesTable

rng = np.random.default_rng(42)
n = 2000

df = pl.DataFrame({
    "driver_age":   rng.integers(17, 75, n).astype(float),
    "vehicle_age":  rng.integers(0, 15, n).astype(float),
    "ncd_years":    rng.integers(0, 9, n).astype(float),
    "annual_miles": rng.integers(3000, 20000, n).astype(float),
    "area":         rng.integers(0, 5, n).astype(float),
})
exposure = rng.uniform(0.3, 1.0, n)
log_rate = (
    -2.5
    + 0.5 * (df["driver_age"].to_numpy() < 25).astype(float)
    - 0.12 * df["ncd_years"].to_numpy()
    + 0.3 * (df["vehicle_age"].to_numpy() > 10).astype(float)
)
y = rng.poisson(np.exp(log_rate) * exposure)

model = InsuranceEBM(loss="poisson", interactions="3x")
model.fit(df[:1600], y[:1600], exposure=exposure[:1600])

rt = RelativitiesTable(model)
print(rt.table("ncd_years"))   # shape_value, relativity — a pricing actuary can read this
print(rt.summary())
```

---

## What's inside

Three subpackages. Import only the one you need.

### `insurance_gam.ebm` — Explainable Boosting Machine

Wraps [interpretML's](https://github.com/interpretml/interpret) `ExplainableBoostingRegressor` with insurance tooling: exposure-aware fit/predict via Poisson/Gamma/Tweedie losses, relativity table extraction, post-fit monotonicity enforcement, and GLM comparison tools.

The `RelativitiesTable` output is directly readable as a rating factor table: NCD years, driver age, vehicle age, each with an auditable curve you can inspect and challenge factor by factor. No post-hoc SHAP required — the shape functions are the model.
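The conversion behind a relativity table is straightforward: exponentiate the log-scale shape values and rebase to a chosen reference level. A hand-rolled sketch of that idea (hypothetical shape values; `RelativitiesTable` wraps this with binning and exposure weighting, and its internals may differ):

```python
import numpy as np

# Hypothetical log-scale shape contributions for ncd_years 0..8.
ncd_levels = np.arange(9)
shape_values = -0.12 * ncd_levels + 0.004 * ncd_levels ** 2

# Rebase so the reference level (ncd_years == 0) has relativity 1.0.
reference = shape_values[0]
relativities = np.exp(shape_values - reference)

for level, rel in zip(ncd_levels, relativities):
    print(f"ncd_years={level}: relativity={rel:.3f}")
```

The rebased relativities read exactly like the multiplicative factor column in a traditional tariff table.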

```bash
uv add "insurance-gam[ebm]"
```

```python
from insurance_gam.ebm import InsuranceEBM, RelativitiesTable

model = InsuranceEBM(loss="poisson", interactions="3x")
model.fit(X_train, y_train, exposure=exp_train)

rt = RelativitiesTable(model)
print(rt.table("driver_age"))
print(rt.summary())
```

### `insurance_gam.anam` — Actuarial Neural Additive Model

Neural Additive Model (Laub, Pho, Wong 2025) adapted for insurance. One MLP subnetwork per feature, additive aggregation, Poisson/Tweedie/Gamma losses, Dykstra-projected monotonicity constraints. Beats GLMs on deviance metrics while producing per-feature shape functions a pricing team can inspect.

```bash
uv add "insurance-gam[neural]"
```

```python
from insurance_gam.anam import ANAM

model = ANAM(
    loss="poisson",
    monotone_increasing=["vehicle_age"],
    n_epochs=100,
)
model.fit(df, y, sample_weight=exposure)
shapes = model.shape_functions()
shapes["vehicle_age"].plot()
```
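The monotonicity constraint can be pictured as projecting a fitted shape onto the cone of monotone sequences. A minimal pool-adjacent-violators sketch of that projection (illustrative of the idea only — the library uses a Dykstra-based scheme whose details may differ):

```python
import numpy as np

def project_monotone_increasing(values):
    """Pool Adjacent Violators: L2 projection onto non-decreasing sequences."""
    blocks = []  # each block holds [sum, count]; block mean is sum / count
    for x in map(float, values):
        blocks.append([x, 1])
        # Merge backwards while a block mean exceeds its successor's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return np.array(out)

shape = np.array([0.0, 0.3, 0.2, 0.5])   # one local violation at index 2
projected = project_monotone_increasing(shape)
```

The projection leaves already-monotone regions untouched and averages out local violations, which is why declaring a genuinely non-monotone factor monotone (see Limitations) flattens real structure.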

### `insurance_gam.pin` — Pairwise Interaction Networks

Neural GA2M (Richman, Scognamiglio, Wüthrich 2025). The prediction decomposes as a sum of pairwise interaction terms — one shared network, distinguished across feature pairs by learned interaction tokens. Diagonal terms recover main effects. Captures interactions a GLM would miss while keeping the output interpretable as a sum of 2D shape functions.
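The GA2M-style decomposition can be sketched in a few lines: the log-rate is a sum over feature pairs, with diagonal terms depending on a single feature. A toy sketch with an invented pairwise function (illustrative of the decomposition only, not the PIN architecture):

```python
import numpy as np

# Hypothetical pairwise shape functions g(j, k, x_j, x_k); diagonal terms
# (j == k) depend on one feature only and recover the main effects.
def g(j, k, xj, xk):
    if j == k:
        return 0.1 * xj            # main effect (diagonal term)
    return 0.05 * xj * xk          # pairwise interaction (off-diagonal)

x = np.array([1.0, 2.0, 3.0])      # one policy, three features
n = len(x)

# Sum over unordered pairs, diagonals included.
log_rate = sum(g(j, k, x[j], x[k]) for j in range(n) for k in range(j, n))
rate = np.exp(log_rate)
```

Because each pairwise term is a function of at most two features, every contribution can be rendered as a 2D heat map and inspected directly.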

```bash
uv add "insurance-gam[neural]"
```

```python
from insurance_gam.pin import PINModel

model = PINModel(
    features={"driver_age": "continuous", "vehicle_age": "continuous",
              "area": 5, "ncd_years": "continuous"},
    loss="poisson",
    max_epochs=200,
)
model.fit(df, y, exposure=exposure)
weights = model.interaction_weights()
effects = model.main_effects(df)
```

---

## Validated performance

On a 50,000-policy synthetic UK motor book with a known non-linear DGP (U-shaped driver age, convex NCD, hard vehicle age threshold, log-miles loading):

| Method | Gini vs linear GLM | Poisson deviance |
|---|---|---|
| GLM — linear terms only | baseline | baseline |
| GLM — polynomial + manual interaction | +3–5pp | −2 to −5% |
| `InsuranceEBM` (interactions=3x) | **+5–15pp** | −5 to −12% |

EBM finds the U-shaped driver age curve and the convex NCD discount without any feature engineering. On a 10,000-policy benchmark, EBM ranks risks ~28% better than a competent GLM by Gini coefficient.

Known caveat: EBM exposure handling via `init_score` can produce inflated absolute deviance figures on some DGPs without affecting risk ordering. Use Gini as the primary comparison metric and validate calibration separately. See the benchmark notebook for details.
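The Gini figures above refer to the ordering statistic actuaries use for risk ranking: sort policies by predicted risk and measure how the Lorenz curve of actual losses departs from the diagonal. A self-contained sketch of that metric (standard formulation, not a library API):

```python
import numpy as np

def gini(y_true, y_pred, exposure=None):
    """Ordering Gini: sort by predicted risk (lowest first), accumulate
    actual losses, and take twice the area between the diagonal and the
    Lorenz curve. Positive values mean the model ranks risk correctly."""
    y = np.asarray(y_true, dtype=float)
    w = np.ones_like(y) if exposure is None else np.asarray(exposure, dtype=float)
    order = np.argsort(y_pred)
    y, w = y[order], w[order]
    cum_w = np.concatenate([[0.0], np.cumsum(w) / w.sum()])
    cum_y = np.concatenate([[0.0], np.cumsum(y) / y.sum()])
    # Trapezoidal area under the Lorenz curve.
    area = np.sum((cum_y[1:] + cum_y[:-1]) * np.diff(cum_w)) / 2.0
    return 1.0 - 2.0 * area

y_obs = np.array([0.0, 0.0, 1.0, 3.0])
y_hat = np.array([0.1, 0.2, 0.5, 0.9])   # good ordering: riskiest policy last
```

Because the statistic depends only on the ordering of predictions, it is immune to the absolute-scale distortion described in the caveat above.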

Full benchmark: `benchmarks/run_benchmark_databricks.py`. Full validation: `notebooks/databricks_validation.py`.

---

## PRA/FCA context

The PRA's Supervisory Statement SS3/18 on model risk management expects firms to demonstrate that models are interpretable and that their outputs can be challenged by subject matter experts. The FCA's Consumer Duty requires pricing models to produce outcomes that can be explained to customers and the regulator.

A GBM satisfies neither criterion for a primary pricing model. The GAM shape functions produced by this library are the actuarial equivalent of the factor curves a pricing committee signs off in a traditional GLM tariff review — except they are fitted automatically rather than hand-crafted.

---

## Design choices

**Three subpackages, independent imports.** Importing `insurance_gam.ebm` does not load PyTorch. Importing `insurance_gam.anam` does not load interpretML. This matters in production where you may have one platform with interpretML but not PyTorch.

**Exposure-aware throughout.** All subpackages accept an `exposure` parameter and use it correctly in the loss function. This is the same GLM family structure pricing teams already use — model outputs are directly comparable to your existing GLM.

**No post-hoc explainability.** The shape functions are the model. You do not need SHAP values to explain why the model charges what it charges.

---

## Limitations

- Below 5,000 policies the EBM boosting procedure can overfit individual bins. Use a GLM below this threshold.
- EBM's `RelativitiesTable` is extracted from additive log-scale contributions, not multiplicative rating factors. The conversion is an approximation when EBM has learnt interaction terms. Cross-validate segment A/E ratios before implementing derived factors in a production tariff.
- `ANAM` and `PINModel` require PyTorch. CPU fit time: 10–30 minutes on complex datasets. EBM fits in 60–120 seconds on a single CPU.
- Monotonicity constraints in `ANAM` use Dykstra projection. Enforcing monotonicity on a factor that genuinely has non-monotone structure (e.g. declaring `driver_age` monotone when the U-shape is real) will misfit the model.
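The segment A/E cross-check recommended in the second bullet can be sketched as follows (hypothetical holdout data, plain NumPy, not a library API):

```python
import numpy as np

# Hypothetical holdout: segment labels, actual claim counts, and the
# model's expected counts for each policy.
segments = np.array(["A", "A", "B", "B", "B"])
actual   = np.array([2.0, 1.0, 0.0, 3.0, 1.0])
expected = np.array([1.5, 1.2, 0.4, 2.8, 1.1])

# A/E close to 1.0 in every segment suggests the derived relativities
# are calibrated; systematic departures flag segments to re-examine.
a_over_e = {
    seg: actual[segments == seg].sum() / expected[segments == seg].sum()
    for seg in np.unique(segments)
}
```

Run the same check on exposure-weighted segments before any derived factor enters a production tariff.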

---

## Part of the Burning Cost stack

Takes smoothed exposure curves from [insurance-whittaker](https://github.com/burning-cost/insurance-whittaker) or raw rating factors directly. Feeds fitted tariff models into [insurance-conformal](https://github.com/burning-cost/insurance-conformal), [insurance-fairness](https://github.com/burning-cost/insurance-fairness), and [insurance-monitoring](https://github.com/burning-cost/insurance-monitoring). [See the full stack](https://burning-cost.github.io/stack/).

| Library | Description |
|---|---|
| [insurance-whittaker](https://github.com/burning-cost/insurance-whittaker) | Rating table smoothing — smoothed Whittaker curves feed into GAM as calibrated inputs |
| [insurance-fairness](https://github.com/burning-cost/insurance-fairness) | FCA proxy discrimination auditing — shape functions make it easier to isolate proxy effects |
| [insurance-monitoring](https://github.com/burning-cost/insurance-monitoring) | Model drift detection — tracks whether GAM shape functions remain calibrated over time |
| [insurance-causal](https://github.com/burning-cost/insurance-causal) | DML causal inference — establishes whether non-linear effects are genuinely causal |
| [insurance-conformal](https://github.com/burning-cost/insurance-conformal) | Distribution-free prediction intervals — uncertainty quantification around GAM predictions |
| [insurance-governance](https://github.com/burning-cost/insurance-governance) | Model validation and MRM governance — sign-off pack for GAM models entering production |

---

## References

- Laub, Pho, Wong (2025). "An Interpretable Deep Learning Model for General Insurance Pricing." arXiv:2509.08467.
- Richman, Scognamiglio, Wüthrich (2025). "Tree-like Pairwise Interaction Networks." arXiv:2508.15678.
- Lou, Caruana, Gehrke, Hooker (2013). "Accurate intelligible models with pairwise interactions." KDD.

---

## Community

- **Questions?** Start a [Discussion](https://github.com/burning-cost/insurance-gam/discussions)
- **Found a bug?** Open an [Issue](https://github.com/burning-cost/insurance-gam/issues)
- **Blog and tutorials:** [burning-cost.github.io](https://burning-cost.github.io)
- **Training course:** [Insurance Pricing in Python](https://burning-cost.github.io/course) — Module 5 covers GAMs and interpretable non-linear models. £97 one-time.

## Licence

MIT
