Metadata-Version: 2.4
Name: insurance-sensitivity
Version: 0.1.0
Summary: Global sensitivity analysis for insurance pricing models — variance decomposition via Shapley effects
Project-URL: Homepage, https://github.com/burning-cost/insurance-sensitivity
Project-URL: Repository, https://github.com/burning-cost/insurance-sensitivity
Author-email: Burning Cost <pricing.frontier@gmail.com>
License: MIT
Keywords: global sensitivity analysis,insurance,pricing,sensitivity analysis,shapley effects,sobol indices,variance decomposition
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: scikit-learn>=1.3
Requires-Dist: scipy>=1.10
Provides-Extra: all
Requires-Dist: catboost>=1.2; extra == 'all'
Requires-Dist: lightgbm>=4.0; extra == 'all'
Requires-Dist: matplotlib>=3.7; extra == 'all'
Requires-Dist: polars>=0.20; extra == 'all'
Requires-Dist: xgboost>=2.0; extra == 'all'
Provides-Extra: catboost
Requires-Dist: catboost>=1.2; extra == 'catboost'
Provides-Extra: dev
Requires-Dist: matplotlib>=3.7; extra == 'dev'
Requires-Dist: polars>=0.20; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Provides-Extra: lightgbm
Requires-Dist: lightgbm>=4.0; extra == 'lightgbm'
Provides-Extra: plots
Requires-Dist: matplotlib>=3.7; extra == 'plots'
Provides-Extra: polars
Requires-Dist: polars>=0.20; extra == 'polars'
Provides-Extra: xgboost
Requires-Dist: xgboost>=2.0; extra == 'xgboost'
Description-Content-Type: text/markdown

# insurance-sensitivity

Global sensitivity analysis for insurance pricing models.

## The problem

You have a fitted pricing model — a GLM, gradient boosted tree, or anything
else with a predict method. You want to know: **which rating factors drive
the most variance in your premiums?**

The naive answer is SHAP. But SHAP decomposes individual predictions, not
portfolio-level variance. For a regulatory submission or a fair value
assessment, you need a statement like "vehicle group explains 34% of the
variance in fitted premiums across our portfolio". That is a different
question, and it needs a different tool.

The standard tool for this is **Sobol indices** — but Sobol first-order
indices are only valid under independent inputs. UK motor rating factors
are not independent. Driver age correlates with NCD level. Postcode
correlates with vehicle type. Sobol S1 indices will over-count the
contribution of factors that are correlated with high-importance factors.
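
A two-factor toy model makes the over-counting concrete. The sketch below is
plain NumPy for illustration, not this library's API:

```python
# Y = X1 + X2 with corr(X1, X2) = rho. Analytically
# E[Y | X1] = (1 + rho) * X1, so S1 = (1 + rho) / 2 per input and
# S1_1 + S1_2 = 1 + rho > 1 whenever rho > 0: first-order Sobol indices
# over-count. Shapley effects split V[Y] = 2 + 2*rho evenly instead.
import numpy as np

rng = np.random.default_rng(0)
rho = 0.6
x = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=200_000)
y = x[:, 0] + x[:, 1]

s1 = np.var((1 + rho) * x[:, 0]) / np.var(y)  # uses the closed form above
print(f"S1 per input ~ {s1:.3f}, sum ~ {2 * s1:.3f} (exceeds 1)")
```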

**Shapley effects** (Owen 2014, Song et al. 2016) solve this. They use the
same Shapley formula from cooperative game theory, but applied to variance
decomposition rather than individual predictions. The effects always sum
to V[Y] and are never negative, regardless of correlations.
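
In game-theoretic terms (Song et al. 2016): with value function
$v(J) = \operatorname{Var}\,\mathbb{E}[Y \mid X_J]$ over subsets $J$ of the
$d$ rating factors, factor $j$ receives

$$
\phi_j = \frac{1}{d} \sum_{J \subseteq \{1,\dots,d\} \setminus \{j\}} \binom{d-1}{|J|}^{-1} \bigl( v(J \cup \{j\}) - v(J) \bigr),
$$

and the Shapley efficiency property gives
$\sum_j \phi_j = v(\{1,\dots,d\}) = V[Y]$.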

This library implements Shapley effects with insurance-specific extensions:
- Exposure-weighted variance (mid-term policies, partial-year risks; see the sketch after this list)
- Categorical rating factors via empirical sampling (no encoding)
- Conditional Latin Hypercube (CLH) subsampling for large portfolios (Rabitti & Tzougas 2025, EAJ)
- Fitted-model interface — pass your model, not parameter distributions
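
A rough sketch of the exposure-weighting idea (plain NumPy, illustrating the
concept rather than this library's internals):

```python
import numpy as np

def exposure_weighted_var(y, w):
    """Variance of y with each observation weighted by its earned exposure."""
    mu = np.average(y, weights=w)
    return np.average((y - mu) ** 2, weights=w)

fitted = np.log([410.0, 655.0, 523.0])   # log fitted premiums
exposure = np.array([1.0, 0.5, 0.75])    # year fractions per policy
print(exposure_weighted_var(fitted, exposure))
```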

## Installation

```bash
pip install insurance-sensitivity
pip install "insurance-sensitivity[plots]"   # matplotlib for charts
pip install "insurance-sensitivity[polars]"  # polars DataFrame input
```

## Quick start

```python
import pandas as pd
from insurance_sensitivity import SensitivityAnalysis

# fitted_glm: any model with a .predict(X) method
# training_df: the data the model was fitted on, with an 'exposure' column

sa = SensitivityAnalysis(
    model=fitted_glm,
    X=training_df,
    exposure_col='exposure',  # year fractions for each policy
    log_scale=True,           # decompose Var[log(fitted)] — right choice
                              # for a multiplicative GLM
    random_state=42,
)

# Shapley effects: correct under correlated inputs
result = sa.shapley(
    n_perms=256,       # more permutations → lower Monte Carlo error
)
print(result)
# ShapleyResult(total_variance=0.1847)
#   vehicle_group: 34.2%
#   ncd_band: 22.1%
#   driver_age: 18.4%
#   area: 11.3%
#   ...

result.plot_bar()  # horizontal bar chart with 95% CIs
result.plot_pie()  # pie chart of % contributions

# Sobol indices: faster, but warns if inputs are correlated
sobol = sa.sobol(n_samples=1024)
sobol.plot_bar()  # S1 and ST side by side
```

## Large portfolios: CLH subsampling

For portfolios with >10k rows, the k-NN step in the Song estimator gets
slow. Rabitti & Tzougas (2025) showed that selecting ~2000 representative
observations via Conditional Latin Hypercube sampling gives results very
close to the full-sample estimate, at a fraction of the cost.

```python
result = sa.shapley(
    n_perms=256,
    n_subsample=2500,  # subsample size (default: use full dataset)
)
```

## Group attributions

If you want attribution at the level of rating factor groups (e.g. all
vehicle-related factors as one group, all driver-related factors as another):

```python
groups = {
    'vehicle': ['vehicle_group', 'vehicle_age', 'cc_band'],
    'driver':  ['driver_age', 'ncd_band', 'licence_years'],
    'area':    ['postcode_area', 'garage_type'],
}
result = sa.shapley(n_perms=256, groups=groups)
# effects DataFrame now has rows: vehicle, driver, area
```

## Interaction effects

```python
interactions = sa.interaction_effects()
# Returns a DataFrame comparing phi_j with S1_j * V[Y] (the 'S1_abs' column).
# A large gap phi_j - S1_j * V[Y] means factor j contributes mostly through
# interactions (and, under correlated inputs, shared variance) rather than
# in isolation.
print(interactions[['factor', 'phi', 'S1_abs', 'interaction_pct']])
```
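
To see why the gap is informative, take a pure-interaction toy model. The
check below is plain NumPy with independent inputs, not this library's API:

```python
# Y = X1 * X2 with X1, X2 ~ N(0, 1) independent. E[Y | X1] = 0, so
# S1_1 = S1_2 = 0 even though Var(Y) = 1; Shapley effects assign each
# factor half the variance. The whole contribution is interaction.
import numpy as np

rng = np.random.default_rng(1)
x1, x2 = rng.standard_normal((2, 500_000))
y = x1 * x2

# crude binned estimate of Var(E[Y | X1]): near zero, as the algebra says
edges = np.quantile(x1, np.linspace(0, 1, 51))
idx = np.clip(np.digitize(x1, edges) - 1, 0, 49)
cond_means = np.array([y[idx == k].mean() for k in range(50)])
print(f"Var(E[Y|X1]) ~ {np.var(cond_means):.4f}, Var(Y) ~ {np.var(y):.4f}")
```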

## When to use Shapley effects vs Sobol

**Use Shapley effects** (`.shapley()`) when:
- Your rating factors are correlated (almost always true)
- You need the effects to sum to total variance (required for regulatory use)
- You want a defensible decomposition for fair value / FCA reporting

**Use Sobol indices** (`.sobol()`) when:
- You know your inputs are approximately independent
- You want a faster, rougher estimate for exploration
- You need second-order interaction indices S2(i,j)

The library warns you if you run Sobol on correlated inputs.
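
A quick way to check dependence yourself is ordinary pandas (not a feature of
this library), here assuming the quick-start columns are numeric:

```python
# Pairwise rank correlations of rating factors; values well away from
# zero mean Sobol S1 indices will mis-attribute variance.
print(training_df[['driver_age', 'vehicle_age', 'ncd_band']]
      .corr(method='spearman'))
```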

## Supported model types

The wrapper handles these automatically:
- **sklearn**: any estimator with `.predict()` or `.predict_proba()`
- **statsmodels**: GLM results with `.predict(exog=X)` signature
- **glum**: `GeneralizedLinearRegressor` with `.predict(X)`
- **LightGBM**: `Booster` and sklearn API
- **XGBoost**: `Booster` and sklearn API
- **CatBoost**: `CatBoostRegressor`, `CatBoostClassifier`

For anything else, pass `predict_fn='my_method_name'`.
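
For example, with a hypothetical in-house rater whose scoring method is named
`score` (the class is invented here; passing `predict_fn` to the constructor
is assumed from the note above):

```python
from insurance_sensitivity import SensitivityAnalysis

class TariffEngine:
    """Hypothetical rater: its scoring method is `score`, not `predict`."""
    def score(self, X):
        return X['base_rate'] * X['relativity']

sa = SensitivityAnalysis(
    model=TariffEngine(),
    X=training_df,        # as in the quick start
    predict_fn='score',   # method name the wrapper should call
)
```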

## References

Owen, A.B. (2014). Sobol' indices and Shapley value. *SIAM/ASA Journal on
Uncertainty Quantification*, 2(1), 245–251.

Song, E., Nelson, B.L. & Staum, J.C. (2016). Shapley effects for global
sensitivity analysis: Theory and computation. *SIAM/ASA Journal on
Uncertainty Quantification*, 4(1), 1060–1083.

Biessy, G. (2024). Construction of Rating Systems Using Global Sensitivity
Analysis: A Numerical Investigation. *ASTIN Bulletin*, 54(1), 25–45.
DOI: 10.1017/asb.2023.34

Saltelli, A. et al. (2010). Variance based sensitivity analysis of model
output. *Computer Physics Communications*, 181(2), 259–270.

Rabitti, G. & Tzougas, G. (2025). Accelerating the computation of Shapley
effects for datasets with many observations. *European Actuarial Journal*,
15, 885–898. DOI: 10.1007/s13385-025-00412-z
