Metadata-Version: 2.4
Name: insurance-changepoint
Version: 0.1.0
Summary: Bayesian change-point detection for UK insurance pricing time series
Author-email: Burning Cost <pricing.frontier@gmail.com>
License: MIT
Keywords: BOCPD,actuarial,bayesian,changepoint,claims,frequency,insurance,loss ratio,regime detection,severity
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Office/Business :: Financial
Requires-Python: >=3.10
Requires-Dist: jinja2>=3.1
Requires-Dist: matplotlib>=3.6
Requires-Dist: numpy>=1.23
Requires-Dist: polars>=0.20
Requires-Dist: ruptures>=1.1
Requires-Dist: scipy>=1.10
Provides-Extra: dev
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Description-Content-Type: text/markdown

# insurance-changepoint

Bayesian change-point detection for UK insurance pricing time series.

## The problem

Your pricing model was built on data from 2019–2022. It's now 2024. You're running quarterly monitoring — loss ratios, claim frequencies, mean severities. The question is not "has anything changed?" (things always change). The question is "has the underlying regime changed enough that my pricing model is no longer valid?"

Existing tools either tell you nothing about uncertainty (PELT gives you a point estimate, no confidence interval on when the break happened) or ignore the exposure weighting problem (different-sized periods mean different amounts of information — a month with 500 policies is not the same as a month with 5,000).

This library adds:
1. **Exposure-weighted Poisson-Gamma BOCPD** — online Bayesian changepoint detection where claim counts are weighted by earned exposure. This is the correct model for insurance frequency; no other Python package implements it.
2. **Bootstrap CIs on break locations** — when PELT finds a break at period 47, we tell you it's period 47 ± 4 (95% CI).
3. **UK event prior calendar** — Ogden rate changes, COVID lockdown, Whiplash Reform, GIPP, and major storms encoded as hazard multipliers. Known events are easier to detect; the algorithm doesn't ignore what you already know.
4. **Consumer Duty evidence pack** — FCA PRIN 2A.9 compliant HTML report showing monitoring history, detected breaks, and a 'retrain/monitor' recommendation.

## Installation

```bash
pip install insurance-changepoint
```

## Quick start

### Monitor claim frequency

```python
from insurance_changepoint import FrequencyChangeDetector

detector = FrequencyChangeDetector(
    prior_alpha=1.0,    # Gamma prior shape
    prior_beta=12.0,    # Gamma prior rate — E[λ] = alpha/beta
    hazard=0.01,        # Expect a structural break every ~100 periods
    threshold=0.3,      # Flag break if P(changepoint) ≥ 0.3
    uk_events=True,     # Apply UK regulatory event calendar
    event_lines=['motor'],
)

result = detector.fit(
    claim_counts=monthly_claims,
    earned_exposure=vehicle_years,
    periods=month_labels,
)

print(f"{result.n_breaks} regime break(s) detected")
for break_ in result.detected_breaks:
    print(f"  {break_.period_label}: P = {break_.probability:.3f}")
```

### Online update (streaming monitoring)

```python
# After fitting on historical data, update each month
prob = detector.update(n=142, exposure=1850.0)
if prob > 0.3:
    print(f"Regime change signal: P(changepoint) = {prob:.3f}")
```

### Monitor loss ratio (frequency + severity jointly)

```python
from insurance_changepoint import LossRatioMonitor

monitor = LossRatioMonitor(
    lines=['motor'],
    uk_events=True,
    threshold=0.3,
)

result = monitor.monitor(
    claim_counts=monthly_claims,
    exposures=vehicle_years,
    mean_severities=mean_cost_per_claim,
    periods=month_labels,
)

print(result.recommendation)  # 'retrain' or 'monitor'
```

### Retrospective analysis with confidence intervals

```python
from insurance_changepoint import RetrospectiveBreakFinder

finder = RetrospectiveBreakFinder(model='l2', penalty='bic')
breaks = finder.fit(loss_ratio_series, periods=quarter_labels)

for ci in breaks.break_cis:
    print(f"Break at {ci.period_label}, 95% CI: [{ci.lower}, {ci.upper}]")
```

### Consumer Duty report

```python
from insurance_changepoint import ConsumerDutyReport, UKEventPrior

prior = UKEventPrior(lines=['motor'])
report = ConsumerDutyReport(
    result=monitor_result,
    product="Motor Private",
    segment="All risks",
    uk_events=prior.summary(),
)
report.to_html("monitoring_q4_2024.html")
data = report.to_dict()  # JSON-serialisable for audit trail
```

## How BOCPD works (briefly)

BOCPD (Adams & MacKay 2007) maintains a probability distribution over "run lengths" — how many periods have elapsed since the last regime change. At each new observation, it updates this distribution using Bayes' rule. The probability of a changepoint at time t is the probability that the run length just reset to zero.

For insurance claim frequency, the conjugate model is Poisson-Gamma:
- Claims: n_t | λ, e_t ~ Poisson(λ · e_t)
- Prior: λ ~ Gamma(α₀, β₀)
- Predictive: NegativeBinomial(α, β/(β+e))

The exposure e_t appears in the predictive. Periods with more vehicle-years contribute more evidence. This matters: a month with 50,000 policies tells you much more about the underlying frequency than a month with 5,000.

The hazard function H(t) — the prior probability of a changepoint at each step — is normally constant (e.g. 1/100). With UK event priors, it spikes around known events (e.g. 50× at COVID lockdown, March 2020) making the algorithm more sensitive in periods when a structural break was actually plausible.

## UK event calendar

10 events are encoded:

| Event | Date | Lines | Multiplier |
|-------|------|-------|-----------|
| Ogden −0.75% | Mar 2017 | Motor, Liability | 40× |
| Ogden −0.25% | Aug 2019 | Motor, Liability | 20× |
| COVID lockdown | Mar 2020 | Motor, Liability, Property | 50× |
| COVID recovery | Jun 2020 | Motor | 15× |
| Storm Ciara/Dennis | Feb 2020 | Property | 20× |
| Whiplash Reform | May 2021 | Motor | 40× |
| GIPP | Jan 2022 | Motor, Property | 30× |
| Storm Eunice | Feb 2022 | Property | 15× |
| Storm Babet | Oct 2023 | Property | 15× |
| Ogden +0.5% | Jul 2024 | Motor, Liability | 25× |

Multipliers cap at 0.5 (never force a break regardless of event).

## Design notes

**Why not use the `changepoint` package (Rust)?** It does not support exposure weighting or UK insurance priors. Building on top of it would mean we own none of the important parts.

**Why ruptures for PELT?** ruptures is actively maintained, has a clean API, and implements multiple cost models. We add the bootstrap CI layer on top — that is our contribution, not reimplementing PELT.

**Why Jinja2 for reports?** Same reason as insurance-bunching: the HTML template is readable by non-engineers, version-controllable, and produces clean output that works in email clients and SharePoint. No JavaScript, no external CSS.

**Why Normal-Gamma on log-severity?** Severity is log-normal to a good approximation. Working on log scale converts the problem to Gaussian, where the Normal-Gamma conjugate is exact. The alternative (NegBin-Gamma on rounded costs) is messier without being more correct.

## References

- Adams, R.P. & MacKay, D.J.C. (2007). Bayesian online changepoint detection. arXiv:0710.3742.
- Killick, R., Fearnhead, P. & Eckley, I.A. (2012). Optimal detection of changepoints with a linear computational cost. JASA 107(500):1590–1598.
- FCA PS21/5 (2021). General Insurance Pricing Practices.
- FCA PRIN 2A.9 (2023). Consumer Duty — Fair Value.
