Metadata-Version: 2.1
Name: pyiblm
Version: 0.1.0
Summary: Interpretable Boosted Linear Model (IBLM): A transparent machine learning approach combining generalized linear models with gradient boosting
License: MIT
Author: Your Name
Author-email: you@example.com
Requires-Python: >=3.12,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Provides-Extra: all
Provides-Extra: explainability
Provides-Extra: visualization
Requires-Dist: altair (>=5.4.0,<6.0.0) ; extra == "visualization" or extra == "all"
Requires-Dist: altair-saver (>=0.5.0,<0.6.0) ; extra == "visualization" or extra == "all"
Requires-Dist: joblib (>=1.3.0,<2.0.0)
Requires-Dist: numpy (>=2.0.0,<3.0.0)
Requires-Dist: pandas (>=2.0.0,<3.0.0)
Requires-Dist: plotnine (>=0.15.0,<0.16.0) ; extra == "visualization" or extra == "all"
Requires-Dist: pydantic (>=2.10.0,<3.0.0)
Requires-Dist: scikit-learn (>=1.5.0,<2.0.0)
Requires-Dist: shap (>=0.45.0,<0.46.0) ; extra == "explainability" or extra == "all"
Requires-Dist: statsmodels (>=0.14.0,<0.15.0)
Requires-Dist: vl-convert-python (>=1.6.1,<2.0.0) ; extra == "visualization" or extra == "all"
Requires-Dist: xgboost (>=2.1.0,<3.0.0)
Description-Content-Type: text/markdown

# PyIBLM: Interpretable Boosted Linear Model

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)

**PyIBLM** is a Python package implementing the Interpretable Boosted Linear Model (IBLM), a transparent machine learning approach that combines the interpretability of Generalized Linear Models (GLMs) with the predictive power of gradient boosting.
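
The block below is a rough sketch of that combination. It describes the model's assumed structure, not this package's exact training procedure. It assumes a log link, as the Poisson and Tweedie families typically use. Under that assumption, the GLM coefficients are fitted first and held fixed, and a boosted ensemble `f` is then fitted on top of the GLM's linear predictor:

```math
\eta(\mathbf{x}) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j + f(\mathbf{x}),
\qquad
\hat{\mu}(\mathbf{x}) = e^{\eta(\mathbf{x})}
= \underbrace{\exp\Bigl(\beta_0 + \sum_{j=1}^{p} \beta_j x_j\Bigr)}_{\text{GLM prediction}}
\cdot \underbrace{e^{f(\mathbf{x})}}_{\text{boosting correction}}
```

Read this way, the GLM part stays directly interpretable and `exp(f(x))` is a multiplicative correction that can be inspected on its own. With an identity link (e.g. Gaussian), the same correction would be additive instead.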

## Features

- 🎯 **Interpretable by design**: Combines GLM transparency with boosting performance
- 📊 **Multiple model families**: Poisson, Tweedie, Gaussian, and more (via statsmodels)
- 🚀 **Gradient boosting integration**: Uses scikit-learn's HistGradientBoostingRegressor and XGBoost
- 📈 **SHAP explanations**: Built-in feature importance and contribution analysis (requires the `explainability` extra)
- 🔍 **Comprehensive diagnostics**: Pinball scores, deviance metrics, and model comparisons
- 📉 **Visualization tools**: Beta corrections, density plots, and correction corridors (requires the `visualization` extra)

## Installation

### Basic Installation
```bash
pip install pyiblm
```

### With Visualization Support
```bash
pip install "pyiblm[visualization]"
```

### With Explainability Features
```bash
pip install "pyiblm[explainability]"
```

### Full Installation
```bash
pip install "pyiblm[all]"
```
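
To check which optional extras are usable in the current environment, the hypothetical helper below tests whether each optional dependency listed in the package metadata can be found. It relies only on the standard library's `importlib.util.find_spec`. `altair_saver` and `vl_convert` are the module names installed by the `altair-saver` and `vl-convert-python` distributions. The helper is illustrative and not part of PyIBLM.

```python
import importlib.util

# Import names of the optional packages behind each extra (from the metadata above).
# Note: the vl-convert-python distribution is imported as `vl_convert`.
OPTIONAL_EXTRAS = {
    "visualization": ["altair", "altair_saver", "plotnine", "vl_convert"],
    "explainability": ["shap"],
}


def missing_optional_modules(extras=OPTIONAL_EXTRAS):
    """Return {extra: [modules that cannot be found]} for each extra."""
    return {
        extra: [name for name in modules if importlib.util.find_spec(name) is None]
        for extra, modules in extras.items()
    }


if __name__ == "__main__":
    for extra, missing in missing_optional_modules().items():
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{extra}: {status}")
```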

## Quick Start

```python
from pyBLM import (
    IBLMModel,
    BoosterConfig,
    GLMConfig,
    TrainingConfig,
    load_freMTPL2freq,
)

# Load example data
data = load_freMTPL2freq("data/freMTPL2freq.csv")
train, validate, test = data.split_into_train_validate_test(seed=123)

# Configure the model
config = TrainingConfig(
    response="ClaimRate",
    glm=GLMConfig(family="poisson"),
    booster=BoosterConfig(
        nrounds=500,
        early_stopping_rounds=20,
        params={"max_depth": 3, "eta": 0.025},
    ),
)

# Train the model
model = IBLMModel(config).fit(train, validate)

# Make predictions
predictions = model.predict(test)

# Get GLM parameters
glm_params = model.get_glm_params()
print(glm_params)
```
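
In the configuration above, `nrounds=500` caps the number of boosting rounds, and `eta` is XGBoost's learning rate (shrinkage). This assumes `BoosterConfig` passes `early_stopping_rounds` through with XGBoost's usual meaning. Under that assumption, training stops once the validation metric on `validate` has gone 20 consecutive rounds without improving. The hypothetical helper below reproduces that stopping rule on a list of per-round validation losses. It only illustrates the logic and does not call the package.

```python
def early_stopping_round(val_losses, patience):
    """Return (best_round, stop_round) for a sequence of validation losses.

    Training halts once `patience` consecutive rounds fail to improve on the
    best loss seen so far. Rounds are 0-indexed; (-1, -1) means no rounds.
    """
    best_loss = float("inf")
    best_round = -1
    for rnd, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: reset the "rounds without improvement" window.
            best_loss, best_round = loss, rnd
        elif rnd - best_round >= patience:
            # `patience` rounds have passed since the last improvement.
            return best_round, rnd
    # Ran out of rounds (the `nrounds` cap) before the patience was exhausted.
    return best_round, len(val_losses) - 1


if __name__ == "__main__":
    losses = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 3.05]
    print(early_stopping_round(losses, patience=3))
```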

## Core Components

### Model Classes
- **`IBLMModel`**: Main model class combining GLM and gradient boosting
- **`BoosterConfig`**: Configuration for the gradient boosting component
- **`GLMConfig`**: Configuration for the GLM component
- **`TrainingConfig`**: Overall training configuration

### Data Handling
- **`load_freMTPL2freq()`**: Load the example French motor third-party liability claims-frequency dataset (freMTPL2freq)
- **`FeaturePreprocessor`**: Automatic feature encoding and preprocessing
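
This README does not document exactly what `FeaturePreprocessor` does. A common choice for GLM inputs is treatment (dummy) coding. In that scheme, each categorical column drops one reference level so the dummy columns are not collinear with the intercept. The hypothetical `encode_for_glm` helper below sketches that approach with `pandas.get_dummies`. The column names `Area` and `DrivAge` are illustrative, modeled on freMTPL2freq.

```python
import pandas as pd


def encode_for_glm(df, categorical):
    """Treatment-code categorical columns, dropping one reference level each.

    Dropping a level keeps the dummies from being collinear with the GLM
    intercept; the first level (in sorted order) becomes the baseline.
    """
    return pd.get_dummies(df, columns=categorical, drop_first=True, dtype=float)


if __name__ == "__main__":
    df = pd.DataFrame({"Area": ["A", "B", "C", "A"], "DrivAge": [30, 45, 52, 23]})
    # Area "A" is the baseline; Area_B and Area_C are 0/1 indicator columns.
    print(encode_for_glm(df, ["Area"]))
```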

### Evaluation
- **`poisson_deviance()`**: Compute Poisson deviance
- **`get_pinball_scores()`**: Multi-model pinball loss comparison
- **`calculate_deviance()`**: Family-based deviance calculation
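
For reference, the hypothetical `mean_deviance` function below sketches the standard unit deviances for the Gaussian, Poisson and Tweedie (1 < power < 2) families. It averages them over observations, as scikit-learn's `mean_*_deviance` functions do. The package's own `poisson_deviance()` and `calculate_deviance()` may sum instead of average, weight by exposure, or otherwise differ.

```python
import numpy as np


def mean_deviance(y, mu, family="poisson", power=1.5):
    """Mean unit deviance of predictions `mu` for observations `y` (sketch)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    if family == "gaussian":
        dev = (y - mu) ** 2
    elif family == "poisson":
        # y * log(y / mu) is taken as 0 when y == 0 (its limit as y -> 0).
        safe_y = np.where(y > 0, y, 1.0)
        ylogy = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
        dev = 2.0 * (ylogy - (y - mu))
    elif family == "tweedie":
        p = power
        if not 1.0 < p < 2.0:
            raise ValueError("this sketch only handles 1 < power < 2")
        dev = 2.0 * (
            y ** (2 - p) / ((1 - p) * (2 - p))
            - y * mu ** (1 - p) / (1 - p)
            + mu ** (2 - p) / (2 - p)
        )
    else:
        raise ValueError(f"unsupported family: {family!r}")
    return float(np.mean(dev))


if __name__ == "__main__":
    print(mean_deviance([0, 1, 2], [1.0, 1.0, 1.0], family="poisson"))
```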

### Explanation & Visualization
- **`explain()`**: Generate an explanation object containing SHAP values
- **`IBLMPlotter`**: Visualization utilities for model interpretation
- **`correction_corridor()`**: Visualize model correction patterns
- **`extract_booster_shap()`**: Extract SHAP values from booster

## Documentation

For detailed documentation and tutorials, see:
- `examples/` - Example scripts and use cases
- `dev.ipynb` - Development notebook with comprehensive example

## Development

This package is actively developed. Contributions are welcome!

### Development Setup
```bash
git clone https://github.com/ZZhouGit/pyBLM.git
cd pyBLM
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e ".[all]"
poetry install --with dev --all-extras  # without --all-extras, Poetry removes the extras installed above
```

### Running Tests
```bash
pytest tests/
```

### Development Notebook
Open `dev.ipynb` in Jupyter to see comprehensive examples:
```bash
jupyter notebook dev.ipynb
```

## Requirements

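- Python 3.12+
- pandas >= 2.0.0
- numpy >= 2.0.0
- scikit-learn >= 1.5.0
- xgboost >= 2.1.0
- pydantic >= 2.10.0
- statsmodels >= 0.14.0
- joblib >= 1.3.0

Optional dependencies:
- plotnine >= 0.15.0 (for static plots; `visualization` extra)
- altair >= 5.4.0 (for interactive plots; `visualization` extra)
- altair-saver >= 0.5.0 and vl-convert-python >= 1.6.1 (for exporting Altair charts; `visualization` extra)
- shap >= 0.45.0 (for SHAP explanations; `explainability` extra)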

## Citation

If you use PyIBLM in your research, please cite:

```bibtex
@software{pyiblm2025,
  title={PyIBLM: Interpretable Boosted Linear Models},
  author={Your Name},
  year={2025},
  url={https://github.com/ZZhouGit/pyBLM},
}
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Authors

- Your Name

## Acknowledgments

Built with [scikit-learn](https://scikit-learn.org/), [XGBoost](https://xgboost.readthedocs.io/), [SHAP](https://github.com/shap/shap), and [statsmodels](https://www.statsmodels.org/).

