Metadata-Version: 2.1
Name: ordinal-xai
Version: 0.1.0
Summary: A Python package for ordinal regression and model-agnostic interpretation methods
Home-page: https://github.com/JWzero/ordinal_xai
Author: Jakob Wankmüller
Author-email: crjakobcr@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: scikit-learn
Requires-Dist: scipy
Requires-Dist: matplotlib
Requires-Dist: gower
Requires-Dist: statsmodels
Requires-Dist: xgboost
Requires-Dist: torch
Requires-Dist: skorch
Requires-Dist: dlordinal

# Ordinal XAI

A Python package for explainable ordinal regression models and interpretation methods.

## Overview

This package provides a comprehensive suite of ordinal regression models and interpretation methods, designed to handle ordinal data while providing transparent and interpretable results. The implementation follows scikit-learn's API design, making it easy to integrate with existing machine learning workflows.

## Features

### Models

1. **Cumulative Link Model (CLM)**
   - Supports both logit and probit link functions
   - Handles categorical and numerical features automatically
   - Provides probability estimates for each class

2. **Ordinal Neural Network (ONN)**
   - Neural network architecture for ordinal regression
   - Configurable hidden layers
   - Supports various output layers and loss functions
   - Tunable training parameters such as learning rate, batch size, and dropout
   - Automatic GPU acceleration when available
   - Early stopping to prevent overfitting

3. **Ordinal Binary Decomposition (OBD)**
   - Decomposes ordinal problem into binary classification tasks
   - Supports two decomposition strategies:
     - One-vs-following: Each class vs all higher classes
     - One-vs-next: Each class vs next class only
   - Multiple base classifier options:
     - Logistic Regression
     - SVM
     - Random Forest
     - XGBoost
   - Base classifier parameters can be tuned
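
The one-vs-following strategy above can be sketched with plain scikit-learn pieces. This is an illustration of the decomposition idea only, not the package's `OBD` class: for K ordered classes, train K-1 binary classifiers, the k-th predicting P(y > k), then recover class probabilities by differencing.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = np.clip(np.round(X[:, 0] + rng.normal(scale=0.5, size=300)).astype(int), 0, 2)

# One binary classifier per threshold k, each modelling P(y > k)
clfs = [LogisticRegression().fit(X, (y > k).astype(int)) for k in range(2)]

# Cumulative probabilities P(y > 0) and P(y > 1) for every sample
gt = np.column_stack([c.predict_proba(X)[:, 1] for c in clfs])

# Recover class probabilities by differencing the cumulative terms
proba = np.column_stack([1 - gt[:, 0], gt[:, 0] - gt[:, 1], gt[:, 1]])
pred = proba.argmax(axis=1)
```

The one-vs-next variant differs only in the binary targets: the k-th classifier is trained on samples from classes k and k+1 alone.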

The models provided in this package are only examples; the interpretation methods are model-agnostic and designed to work with any ordinal regression model. The only requirement for custom models is that they implement the `BaseOrdinalModel` and scikit-learn `BaseEstimator` interfaces.

### Interpretation Methods

1. **Feature Effects Analysis**
   - **Partial Dependence Plots (PDP)**
     - Shows average effect of features on predictions
     - Handles both categorical and numerical features
     - Automatic subplot arrangement
   - **PDP with Probabilities (PDPProb)**
     - Visualizes average probability distributions across feature values
     - Shows class probability changes
     - Detailed probability annotations
   - **Individual Conditional Expectation (ICE)**
     - Analyzes individual instance behavior
     - Shows heterogeneous effects across samples
     - Can display the entire population or selected individual instances (the latter is recommended)
   - **ICE with Probabilities (ICEProb)**
     - Visualizes individual probability distributions across feature values
     - Shows class probability changes per instance
     - Detailed probability annotations at original values
     - Can display the entire population or selected individual instances (the latter is recommended)
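
The quantity a PDP curve reports can be computed by hand: pin one feature to each grid value for every row, predict, and average. The sketch below uses generic scikit-learn names, not the package's `PDP` class:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f0", "f1", "f2"])
y = (X["f0"] + 0.5 * X["f1"] > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Average the model's prediction over the data with f0 pinned to each grid value
grid = np.linspace(X["f0"].min(), X["f0"].max(), 20)
pdp = np.array([
    model.predict_proba(X.assign(f0=v))[:, 1].mean() for v in grid
])
```

ICE keeps the per-row prediction curves instead of averaging them, which is what exposes heterogeneous effects across samples.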

2. **Feature Importance Analysis**
   - **Permutation Feature Importance (PFI)**
     - Global feature importance through permutation
     - Supports multiple evaluation metrics; a subset of metrics can be specified
     - Handles both categorical and numerical features
     - Analysis can be restricted to a subset of features
     - Visualizes feature importance scores in a bar plot
   - **Leave-One-Covariate-Out (LOCO)**
     - Global feature importance through feature removal and refitting
     - Uses a train/test split to fit the models and evaluate performance
     - Supports multiple evaluation metrics; a subset of metrics can be specified
     - Visualizes feature importance scores in a bar plot
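
The permutation idea behind PFI fits in a few lines: shuffle one column at a time and measure the drop in performance. A minimal sketch with generic scikit-learn pieces, not the package's `PFI` class:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0).astype(int)          # only feature 0 carries signal
model = LogisticRegression().fit(X, y)

base = accuracy_score(y, model.predict(X))
importance = np.zeros(3)
for j in range(3):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])   # break the feature-target association
    importance[j] = base - accuracy_score(y, model.predict(Xp))
```

LOCO replaces the shuffle with removal: drop the column, refit on the training split, and compare test performance against the full model.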

3. **Local Explanations**
   - **Local Interpretable Model-agnostic Explanations (LIME)**
     - Provides local explanations for individual predictions
     - Supports both logistic regression and decision tree surrogate models
     - Multiple sampling strategies (grid, uniform, permutation)
     - Customizable kernel functions for sample weighting
     - Visualizes feature importance through coefficients or tree plot
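
The LIME recipe above (perturb, weight by a kernel, fit a weighted surrogate) can be sketched as follows. All names here are generic scikit-learn, and the Gaussian perturbation and RBF kernel width are illustrative assumptions, not the package's `LIME` defaults:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))
y = (X[:, 0] - X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)

x0 = X[0]                                          # instance to explain
Z = x0 + rng.normal(scale=0.5, size=(500, 4))      # perturbations around x0
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.75)  # RBF kernel weights
surrogate = Ridge().fit(Z, model.predict_proba(Z)[:, 1], sample_weight=w)
coefs = surrogate.coef_                            # local feature attributions
```

A decision-tree surrogate works the same way, with the tree plot replacing the coefficients as the explanation.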

### Datasets

The package includes several benchmark datasets for ordinal regression:

1. **Wine Quality**
   - Combined dataset of red and white wines
   - 6499 samples, 12 features
   - Ordinal target: 3-9 (wine quality rating)
   - Features include physicochemical properties (acidity, sugar, alcohol, etc.)
   - Separate datasets available for red and white wines

2. **Student Performance**
   - Two datasets: Mathematics and Portuguese
   - Mathematics: 397 samples, 33 features
   - Portuguese: 651 samples, 33 features
   - Ordinal targets: G1, G2, G3 (grades in three periods)
   - Features include demographic, social, and educational factors
   - Mixed numerical and categorical features

3. **Feature Importance Test Datasets**
   - FI_simple.csv: 1000 samples
   - FI_test.csv: 1000 samples
   - Designed for testing feature importance methods
   - Contains both numerical and categorical features

4. **Dummy Dataset**
   - 1000 samples
   - Used for testing and demonstration purposes
   - Contains synthetic data with known patterns
   - Labels represent a health risk level influenced by age, height, and gender
   - Latent variable: 0.006*age + 0.0005*height + 0.3*gender_w + 0.05*normal(0,1)
   - Ranks are generated by splitting the latent variable into 3 equal-sized buckets
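
The latent-variable construction above can be reproduced like this; the column ranges and the seed are illustrative assumptions, not the packaged file itself:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000
age = rng.integers(18, 90, size=n)
height = rng.normal(170, 10, size=n)
gender_w = rng.integers(0, 2, size=n)   # 1 = woman

# Latent score as stated above, plus a small noise term
latent = 0.006 * age + 0.0005 * height + 0.3 * gender_w + 0.05 * rng.normal(size=n)

# Three equal-sized buckets on the latent variable become the ordinal ranks 0, 1, 2
rank = pd.qcut(latent, q=3, labels=False)
```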

## Installation

```bash
pip install ordinal-xai
```

## Quick Start

```python
from ordinal_xai.models import CLM, ONN, OBD
from ordinal_xai.interpretation import LIME, LOCO, ICE, ICEProb, PDP, PDPProb, PFI
import pandas as pd
import numpy as np

# Create sample data
X = pd.DataFrame(np.random.randn(100, 5), columns=[f"feature{i}" for i in range(1, 6)])
y = pd.Series(np.random.randint(0, 3, 100))

# Initialize and train model
model = CLM(link='logit')
model.fit(X, y)

# Make predictions
predictions = model.predict(X)
probabilities = model.predict_proba(X)

# Generate explanations
# Feature Effects
pdp = PDP(model, X, y)
pdp_effects = pdp.explain(features=['feature1', 'feature2'], plot=True)

pdp_prob = PDPProb(model, X, y)
pdp_prob_effects = pdp_prob.explain(features=['feature1', 'feature2'], plot=True)

ice = ICE(model, X, y)
ice_effects = ice.explain(features=['feature1', 'feature2'], plot=True)

ice_prob = ICEProb(model, X, y)
ice_prob_effects = ice_prob.explain(features=['feature1', 'feature2'], plot=True)

# Feature Importance
pfi = PFI(model, X, y)
pfi_importance = pfi.explain(plot=True)

loco = LOCO(model, X, y)
loco_importance = loco.explain(plot=True)

# Local Explanations
lime = LIME(model, X, y)
lime_explanation = lime.explain(observation_idx=0, plot=True)
```

## Documentation

For detailed documentation, including API reference and examples, visit our [documentation page](https://ordinal-xai.readthedocs.io/).

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details.


