Metadata-Version: 2.4
Name: pyiblm
Version: 2.0.1
Summary: Interpretable Boosted Linear Models: GLM + XGBoost ensemble with SHAP-based interpretability
Project-URL: Homepage, https://ifoa-adswp.github.io/pyIBLM/
Project-URL: Repository, https://github.com/IFoA-ADSWP/pyIBLM
Project-URL: Bug Tracker, https://github.com/IFoA-ADSWP/pyIBLM/issues
Project-URL: Documentation, https://ifoa-adswp.github.io/pyIBLM/
Author-email: Paul Beard <paul.beard.actuarial@gmail.com>, Karol Gawlowski <kg.actuarial@gmail.com>
Maintainer-email: Paul Beard <paul.beard.actuarial@gmail.com>
License: MIT
License-File: LICENSE
Keywords: actuarial,boosting,glm,insurance,interpretable,shap,xgboost
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.10
Requires-Dist: matplotlib>=3.7
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: scipy>=1.11
Requires-Dist: seaborn>=0.13
Requires-Dist: statsmodels>=0.14
Requires-Dist: xgboost>=2.0
Provides-Extra: data
Requires-Dist: rdata>=0.10; extra == 'data'
Provides-Extra: dev
Requires-Dist: ipykernel; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Description-Content-Type: text/markdown

# pyIBLM

## Interpretable Boosted Linear Models

[![PyPI version](https://img.shields.io/pypi/v/pyiblm.svg)](https://pypi.org/project/pyiblm/)
[![Python versions](https://img.shields.io/pypi/pyversions/pyiblm.svg)](https://pypi.org/project/pyiblm/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://img.shields.io/pypi/dm/pyiblm.svg)](https://pypi.org/project/pyiblm/)

---

### Overview

**pyIBLM** implements *Interpretable Boosted Linear Models* — a hybrid modelling approach that combines the transparency of Generalized Linear Models (GLMs) with the predictive power of gradient boosting.

The model is a two-stage ensemble:

1. A **GLM** is fitted on the training data, producing interpretable coefficient estimates.
2. An **XGBoost booster** is trained on the GLM residuals, using the GLM's linear predictor as its base margin — learning only what the GLM could not capture.
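
The two stages above can be sketched with NumPy alone. This is a minimal illustration, not pyIBLM's implementation: the data are simulated, the GLM is fitted with a hand-written Newton loop, and a small depth-1 tree booster (the hypothetical `best_stump` helper) stands in for XGBoost. The point it shows is the warm start: the booster begins from the GLM linear predictor and only adds a correction on top of it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1.0, 1.0, size=n)
# Toy data: a linear effect plus a step that a one-slope GLM cannot represent.
y = rng.poisson(np.exp(-0.5 + 0.8 * x + 0.6 * (x > 0.3)))

def poisson_nll(eta):
    """Mean Poisson negative log-likelihood (up to a constant) on the log link."""
    return float(np.mean(np.exp(eta) - y * eta))

# Stage 1: Poisson GLM with log link, fitted by Newton's method.
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)
    beta += np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
eta_glm = X @ beta  # linear predictor, reused below as the base margin

# Stage 2: boost depth-1 trees, starting from the GLM margin (XGBoost stand-in).
def best_stump(g, h, thresholds, lam=1.0):
    """Return (threshold, left_value, right_value) with the largest gain."""
    best = None
    for t in thresholds:
        left = x <= t
        gl, hl = g[left].sum(), h[left].sum()
        gr, hr = g[~left].sum(), h[~left].sum()
        gain = gl**2 / (hl + lam) + gr**2 / (hr + lam)
        if best is None or gain > best[0]:
            best = (gain, t, -gl / (hl + lam), -gr / (hr + lam))
    return best[1:]

thresholds = np.quantile(x, np.linspace(0.05, 0.95, 19))
correction = np.zeros(n)  # booster output on the link scale
for _ in range(50):
    mu = np.exp(eta_glm + correction)
    # The gradient (mu - y) is the negative residual at the current prediction.
    t, w_left, w_right = best_stump(mu - y, mu, thresholds)
    correction += 0.1 * np.where(x <= t, w_left, w_right)

print("GLM only      :", poisson_nll(eta_glm))
print("GLM + booster :", poisson_nll(eta_glm + correction))
```

Because every boosting round starts from `eta_glm + correction`, the GLM coefficients in `beta` are never refitted; the booster can only explain what is left over.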

Depending on the link function, the two components are combined as:

- **Multiplicative** (log-link families: Poisson, Gamma, Tweedie):
  `prediction = GLM prediction × Booster correction`
- **Additive** (identity-link families: Gaussian):
  `prediction = GLM prediction + Booster correction`
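
Both rules are the same operation on the link scale: the booster margin is added to the GLM linear predictor, and the inverse link decides whether that sum shows up as a product or a sum on the response scale. The sketch below uses made-up numbers and assumes the booster correction is the inverse link applied to the booster margin.

```python
import numpy as np

eta_glm = np.array([-1.2, 0.3, 0.9])            # toy GLM linear predictor
booster_margin = np.array([0.10, -0.25, 0.05])  # toy booster output on the link scale

# Log link: adding margins is the same as multiplying predictions.
glm_prediction = np.exp(eta_glm)
booster_correction = np.exp(booster_margin)
multiplicative = glm_prediction * booster_correction
assert np.allclose(multiplicative, np.exp(eta_glm + booster_margin))

# Identity link: the response scale is the link scale, so the correction is added.
additive = eta_glm + booster_margin

print(multiplicative)
print(additive)
```

A booster margin of zero therefore leaves the GLM prediction unchanged in both cases: a factor of 1 under the log link, an addend of 0 under the identity link.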

SHAP values decompose the booster correction back onto the original GLM feature scale, making the full model auditable and interpretable at the individual prediction level.
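
One way to read this, shown here as an assumption rather than as pyIBLM's exact procedure, is that each feature's SHAP value is divided by the feature value to give a per-observation adjustment to the matching GLM coefficient. All numbers below are invented, the features are continuous and nonzero, and the names (`beta_corrected`, `shap_bias`) are illustrative only.

```python
import numpy as np

beta0, beta = -2.0, np.array([0.03, -0.10])        # toy GLM intercept and slopes
X = np.array([[30.0, 5.0], [55.0, 9.0]])           # toy nonzero feature values
shap = np.array([[0.15, -0.05], [-0.22, 0.09]])    # toy per-feature SHAP values (link scale)
shap_bias = 0.01                                   # toy SHAP base value

# Assumed mapping: spread each SHAP value over its feature value.
beta_corrected = beta + shap / X                   # one coefficient per row and feature

# SHAP is additive, so the corrected coefficients reproduce the full margin.
eta_full = beta0 + X @ beta + shap.sum(axis=1) + shap_bias
eta_from_betas = beta0 + shap_bias + (X * beta_corrected).sum(axis=1)
assert np.allclose(eta_full, eta_from_betas)

print(beta_corrected)
```

The division is undefined where a feature equals zero, so a real implementation needs a rule for those rows; this sketch avoids the case by construction.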

The package provides:

- Fitting of IBLM models across Poisson, Quasi-Poisson, Gamma, Tweedie, and Gaussian families
- SHAP-based explainability tools with beta coefficient visualisations
- Model comparison via pinball scores and correction corridor plots
- Bundled insurance pricing datasets (`freMTPLmini`, `freMTPL2freq`)

An equivalent **R package** is available on CRAN:
🔗 [https://CRAN.R-project.org/package=IBLM](https://CRAN.R-project.org/package=IBLM)

---

### Installation

Install the released version from PyPI:

```bash
pip install pyiblm
```

To use `load_freMTPL2freq()`, which downloads the full French MTPL dataset, install the optional `data` extra (it adds the `rdata` dependency):

```bash
pip install "pyiblm[data]"
```
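
Note that the distribution is published as `pyiblm` while the import package is `iblm`. A quick way to check what is installed, using only the standard library, might look like this; `installed_version` is a hypothetical helper, not part of pyIBLM.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print("pyiblm version :", installed_version("pyiblm"))
# The `data` extra pulls in `rdata`; find_spec returns None when it is missing.
print("rdata available:", find_spec("rdata") is not None)
```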

---

### Quick start

```python
import numpy as np
from iblm import (
    load_freMTPLmini,
    split_into_train_validate_test,
    IBLM,
    ExplainIBLM,
    get_pinball_scores,
)

# Load and prepare data
df = load_freMTPLmini()
df["LogExposure"] = np.log(df["Exposure"])
df = df.drop(columns=["Exposure"])

df_dict = split_into_train_validate_test(df, seed=9000)

# Fit model
model = IBLM()
model.fit(
    df_dict,
    response_var="ClaimNb",
    offset_var="LogExposure",
    family="poisson",
)

# Evaluate
scores = get_pinball_scores(df_dict["test"], model)
print(scores)

# Explain
ex = ExplainIBLM(model, df_dict["test"])
fig = ex.beta_corrected_scatter("DrivAge", color="VehPower")
fig.show()
```
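
The quick start passes the log of `Exposure` as the offset. Under the usual GLM convention, assumed here to be what `offset_var` follows, an offset enters the linear predictor with a fixed coefficient of 1, so with a log link the expected claim count scales in proportion to exposure. A small numeric check with made-up values:

```python
import numpy as np

exposure = np.array([0.25, 0.5, 1.0])  # toy policy-years
annual_rate = 0.08                     # toy claim frequency per policy-year
eta = np.log(annual_rate)              # linear predictor without the offset

# Adding log(exposure) on the link scale multiplies the mean by exposure.
expected_claims = np.exp(eta + np.log(exposure))
assert np.allclose(expected_claims, annual_rate * exposure)

print(expected_claims)
```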

---

### Documentation

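Documentation for this Python package:

🔗 [https://ifoa-adswp.github.io/pyIBLM/](https://ifoa-adswp.github.io/pyIBLM/)

For full documentation on the R implementation (functions, methods and theoretical background):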

🔗 [https://ifoa-adswp.github.io/IBLM/](https://ifoa-adswp.github.io/IBLM/)

---

### Contributing

Contributions are welcome. To report a bug or suggest a feature, please open an issue on GitHub:

🔗 [https://github.com/IFoA-ADSWP/pyIBLM/issues](https://github.com/IFoA-ADSWP/pyIBLM/issues)

---

### Citation

If you use **pyIBLM** in research or teaching, please cite it as:

> Gawlowski, K. and Beard, P. (2026). *pyIBLM: Interpretable Boosted Linear Models.* Python package version 2.0.1.

---

### Authors

- **Karol Gawlowski** — [kg.actuarial@gmail.com](mailto:kg.actuarial@gmail.com)
- **Paul Beard** — [paul.beard.actuarial@gmail.com](mailto:paul.beard.actuarial@gmail.com)

Additional contributions by **Zhouwen Zhou**.

---

### License

This package is licensed under the **MIT License**.
See the [`LICENSE`](LICENSE) file for full details.
