Metadata-Version: 2.4
Name: autoite
Version: 1.0.0
Summary: Automated Individual Treatment Effect Estimation via residual-based latent environment discovery
Author-email: Jake Peace <mail@jakepeace.me>
License: MIT
Project-URL: Homepage, https://github.com/hotprotato/autoite
Project-URL: Documentation, https://github.com/hotprotato/autoite#readme
Project-URL: Repository, https://github.com/hotprotato/autoite
Project-URL: Issues, https://github.com/hotprotato/autoite/issues
Keywords: causal-inference,treatment-effects,heterogeneous-effects,machine-learning,econometrics,individualized-medicine
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: scipy>=1.7.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Provides-Extra: viz
Requires-Dist: matplotlib>=3.7.0; extra == "viz"
Requires-Dist: seaborn>=0.12.0; extra == "viz"
Provides-Extra: comparison
Requires-Dist: econml>=0.14.0; extra == "comparison"
Requires-Dist: lightgbm>=4.0.0; extra == "comparison"
Provides-Extra: all
Requires-Dist: autoite[comparison,dev,viz]; extra == "all"
Dynamic: license-file

# AutoITE: Automated Individual Treatment Effect Estimation

A residual-based approach to causal inference that detects latent heterogeneity through baseline coupling, enabling Just-in-Time discovery of treatment effects.

## Key Insight

Traditional causal inference methods condition on observed features, but latent confounders create hidden subgroups with dramatically different treatment responses. AutoITE exploits **baseline coupling** (latent confounders affect not just treatment response but also baseline outcomes) to discover these hidden subgroups through residual analysis.
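
As a minimal illustration of baseline coupling, the sketch below simulates a hidden binary confounder `U` that shifts the pre-treatment outcome, fits a ridge baseline on the observed features only, and checks how strongly the residuals track `U`. The data-generating process, coefficients, and variable names are assumptions chosen for illustration; the code uses scikit-learn directly and does not call the `autoite` API.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n = 2000

# Observed features and a hidden binary confounder U (never shown to the model).
X = rng.normal(size=(n, 3))
U = rng.integers(0, 2, size=n)

# Baseline coupling (assumed here): U shifts the pre-treatment outcome.
Y_pre = X @ np.array([1.0, -0.5, 0.25]) + 2.0 * U + rng.normal(scale=0.3, size=n)

# Fit the baseline on observed features only; U's contribution is left in the residuals.
baseline = Ridge(alpha=1.0).fit(X, Y_pre)
residuals = Y_pre - baseline.predict(X)

corr_with_u = np.corrcoef(residuals, U)[0, 1]
print(f"corr(residual, U) = {corr_with_u:.2f}")
```

Because `U` is independent of `X` in this simulation, the baseline model cannot absorb it, so the residuals carry most of the information about the hidden subgroup.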

## Installation

```bash
pip install -r requirements.txt
```

## Quick Start

```python
from autoite import AutoITEEstimator, BimodalityDiagnostic

# Fit the model
model = AutoITEEstimator(k=1000, alpha=1.0)
model.fit(X_train, T_train, Y_train, Y_pre_train)

# Predict individual treatment effects
tau_pred = model.predict(X_test, Y_pre_test)

# Check for hidden subgroups
diag = BimodalityDiagnostic()
diag.fit(X_train, Y_pre_train)
result = diag.quantify_unknown(X_test, Y_pre_test)
print(f"Bimodality Score: {result['bimodality_score']:.4f}")
print(f"Interpretation: {result['interpretation']}")
```

## Architecture

1. **Global Ridge**: Baseline model predicting pre-treatment outcomes from features
2. **Residual Computation**: Leave-one-out residuals encode latent causal state
3. **Residual Matching**: k-NN in residual space finds individuals with similar latent states
4. **Local Ridge**: Treatment effects estimated from residual neighbors
5. **Triage**: High-uncertainty cases flagged for expert review
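
The five steps above can be sketched end to end with scikit-learn. This is a simplified reading of the pipeline, not the package's implementation. The synthetic data, the local design matrix `[T, X, Y_pre]`, the use of the coefficient on `T` as the effect estimate, and the mean neighbor distance as an uncertainty proxy are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n, k = 400, 60

# Synthetic data (assumed): hidden U shifts the baseline and sets the true effect.
X = rng.normal(size=(n, 2))
U = rng.integers(0, 2, size=n)
T = rng.integers(0, 2, size=n)
Y_pre = X[:, 0] + 2.0 * U + rng.normal(scale=0.2, size=n)
tau_true = np.where(U == 1, -1.0, 1.0)
Y = Y_pre + tau_true * T + rng.normal(scale=0.2, size=n)

# Steps 1 and 2: global ridge on the pre-treatment outcome, leave-one-out residuals.
loo_pred = cross_val_predict(Ridge(alpha=1.0), X, Y_pre, cv=LeaveOneOut())
resid = (Y_pre - loo_pred).reshape(-1, 1)

# Step 3: k-NN in residual space (each point is returned as its own nearest neighbor).
nn = NearestNeighbors(n_neighbors=k).fit(resid)
dist, idx = nn.kneighbors(resid)

# Step 4: local ridge per individual; the coefficient on T is the effect estimate.
tau_hat = np.empty(n)
for i in range(n):
    nb = idx[i]
    design = np.column_stack([T[nb], X[nb], Y_pre[nb]])
    tau_hat[i] = Ridge(alpha=1.0).fit(design, Y[nb]).coef_[0]

# Step 5: triage; flag the 15% of cases with the most dispersed neighborhoods.
uncertainty = dist.mean(axis=1)
flagged = uncertainty >= np.quantile(uncertainty, 0.85)

print(f"corr(tau_hat, tau_true) = {np.corrcoef(tau_hat, tau_true)[0, 1]:.2f}")
print(f"flagged for review: {flagged.sum()} of {n}")
```

Leave-one-out prediction via `cross_val_predict` refits the ridge once per sample, which is fine at this size but slow for large datasets; a closed-form leave-one-out residual would be the usual substitute.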

## Key Results

From the accompanying paper:

| Method | Corr(τ̂, U) | Detection Rate | MAE | Median |
|--------|-------------|----------------|-----|--------|
| Causal Forest | 0.00 | 27.3% | 0.230 | 0.042 |
| X-Learner | 0.00 | 27.1% | 0.245 | 0.045 |
| **AutoITE** | **-0.94** | **97.5%** | **0.095** | **0.034** |

AutoITE achieves **59% lower MAE** than Causal Forest (0.095 vs 0.230). With 15% triage, MAE falls to **0.042**, only 18% of Causal Forest's error, and deaths drop from 8 to **5**.
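
The percentages quoted above follow directly from the MAE values in the table and text. The short check below only redoes that arithmetic; the variable names are illustrative.

```python
# MAE values as reported in the table and the triage sentence above.
mae_causal_forest = 0.230
mae_autoite = 0.095
mae_autoite_triaged = 0.042

# Relative reduction without triage, and triaged MAE as a share of the baseline.
reduction = 1 - mae_autoite / mae_causal_forest
triaged_ratio = mae_autoite_triaged / mae_causal_forest

print(f"MAE reduction vs Causal Forest: {reduction:.0%}")                   # 59%
print(f"Triaged MAE as share of Causal Forest MAE: {triaged_ratio:.0%}")    # 18%
```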

## Components

### AutoITEEstimator
Core estimator for individual treatment effect prediction.

- `k`: Number of residual neighbors (default: 1000); a fraction such as 0.10 can be given instead of an absolute count
- `alpha`: Ridge regularization strength
- `triage_percentile`: Fraction of high-uncertainty cases to flag
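
One plausible way to read the `k` and `triage_percentile` parameters is sketched below. The helpers `resolve_k` and `triage_mask` are hypothetical names introduced here, and the rules they apply (a float in (0, 1) is a fraction of the sample count; the flagged set is the top fraction by uncertainty) are assumptions, not the package's documented behavior.

```python
import numpy as np

def resolve_k(k, n_samples):
    """Turn an absolute count or a fraction in (0, 1) into a neighbor count."""
    if isinstance(k, float) and 0 < k < 1:
        k = int(round(k * n_samples))
    # Clamp so the neighbor count never exceeds the available samples.
    return max(1, min(int(k), n_samples))

def triage_mask(uncertainty, triage_percentile):
    """Flag the top `triage_percentile` fraction of cases by uncertainty."""
    threshold = np.quantile(uncertainty, 1.0 - triage_percentile)
    return uncertainty >= threshold

print(resolve_k(1000, 5000))  # 1000
print(resolve_k(0.10, 5000))  # 500
print(resolve_k(1000, 300))   # 300

scores = np.arange(100.0)
print(triage_mask(scores, 0.15).sum())  # 15
```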

### BimodalityDiagnostic
Detects hidden subgroups via GMM-based residual analysis.

- Bimodality score < 0.01: No hidden structure
- Bimodality score 0.01-0.05: Weak structure
- Bimodality score 0.05-0.10: Moderate structure
- Bimodality score > 0.10: Strong hidden structure (likely latent confounder)
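
The thresholds above can be written as a small lookup. The function name `interpret_bimodality` is hypothetical, and because the listed ranges share their endpoints, the handling of a score exactly at 0.05 is an assumption (treated as moderate here).

```python
def interpret_bimodality(score):
    """Map a bimodality score to the interpretation bands listed above."""
    if score < 0.01:
        return "No hidden structure"
    if score < 0.05:
        return "Weak structure"
    if score <= 0.10:
        # Assumption: 0.05 itself is treated as moderate; 0.10 is not yet "strong".
        return "Moderate structure"
    return "Strong hidden structure (likely latent confounder)"

for s in (0.005, 0.03, 0.07, 0.12):
    print(f"{s:.3f}: {interpret_bimodality(s)}")
```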

### UnexplainedHeterogeneityIndex
Measures whether local models improve over global, indicating heterogeneity not captured by observed features.
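
The README does not give the formula for this index, so the sketch below uses an assumed stand-in: the relative reduction in held-out squared error when a global ridge is replaced by local ridges fitted on residual-space neighbors. The synthetic data, the train/test split, and the quantity `1 - mse_local / mse_global` are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n, k = 600, 50

# Synthetic data (assumed): hidden U shifts the baseline and flips the effect sign.
X = rng.normal(size=(n, 2))
U = rng.integers(0, 2, size=n)
T = rng.integers(0, 2, size=n)
Y_pre = X[:, 0] + 2.0 * U + rng.normal(scale=0.2, size=n)
Y = Y_pre + np.where(U == 1, -1.0, 1.0) * T + rng.normal(scale=0.2, size=n)

design = np.column_stack([T, X, Y_pre])
train, test = np.arange(0, 400), np.arange(400, n)

# Global model: one ridge for everyone.
global_model = Ridge(alpha=1.0).fit(design[train], Y[train])
mse_global = np.mean((Y[test] - global_model.predict(design[test])) ** 2)

# Local models: neighbors are chosen in baseline-residual space.
baseline = Ridge(alpha=1.0).fit(X[train], Y_pre[train])
resid = (Y_pre - baseline.predict(X)).reshape(-1, 1)
nn = NearestNeighbors(n_neighbors=k).fit(resid[train])
_, idx = nn.kneighbors(resid[test])

local_pred = np.empty(len(test))
for j, i in enumerate(test):
    nb = train[idx[j]]  # map neighbor positions back to training row indices
    local_pred[j] = Ridge(alpha=1.0).fit(design[nb], Y[nb]).predict(design[[i]])[0]
mse_local = np.mean((Y[test] - local_pred) ** 2)

# Values near 0 mean local models add little; values near 1 mean large gains.
index = 1.0 - mse_local / mse_global
print(f"global MSE = {mse_global:.3f}, local MSE = {mse_local:.3f}, index = {index:.2f}")
```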

## Reproducing Paper Results

```bash
cd experiments/paper_experiments
python run_all_experiments.py
```

## Real-World Validation

The UCI Student Performance experiment demonstrates AutoITE on real educational data:

```bash
cd experiments/paper_experiments
python uci_student_intervention.py
```

## Fundamental Limits

AutoITE can detect latent confounders that affect baseline outcomes (baseline coupling). However, **interaction-only confounders**, those that affect only treatment response without leaving baseline fingerprints, are fundamentally undetectable by any observational method.
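
The contrast can be seen in a small simulation. The sketch below builds two pre-treatment outcomes, one where the hidden confounder `U` shifts the baseline and one where it does not, and compares what the baseline residuals reveal in each case. The data-generating process and the use of a BIC comparison between one- and two-component Gaussian mixtures are assumptions for illustration; this is not the package's bimodality score.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
U = rng.integers(0, 2, size=n)
beta = np.array([1.0, -0.5, 0.25])
noise = rng.normal(scale=0.3, size=n)

def residual_summary(y_pre):
    """Return corr(residual, U) and the BIC gain of a 2- over a 1-component GMM."""
    resid = y_pre - Ridge(alpha=1.0).fit(X, y_pre).predict(X)
    r = resid.reshape(-1, 1)
    bic1 = GaussianMixture(n_components=1, random_state=0).fit(r).bic(r)
    bic2 = GaussianMixture(n_components=2, random_state=0).fit(r).bic(r)
    return np.corrcoef(resid, U)[0, 1], bic1 - bic2

# Baseline-coupled confounder: U shifts the pre-treatment outcome.
coupled = residual_summary(X @ beta + 2.0 * U + noise)
# Interaction-only confounder: U would act only on treatment response, so it is absent here.
interaction_only = residual_summary(X @ beta + noise)

print(f"coupled:          corr = {coupled[0]:.2f}, BIC gain = {coupled[1]:.1f}")
print(f"interaction-only: corr = {interaction_only[0]:.2f}, BIC gain = {interaction_only[1]:.1f}")
```

In the coupled case the residuals are clearly bimodal and correlate with `U`. In the interaction-only case the residuals are just noise, so there is nothing for residual matching to recover.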

## Paper

See `paper/auto_ite_final.pdf` for the full manuscript:

> **AutoITE: Residual-Based Individual Treatment Effect Estimation via Baseline Coupling**
>
> Jake Peace, November 2025

## Data Attribution

### UCI Student Performance Dataset

The real-world validation uses the Student Performance dataset from the UCI Machine Learning Repository, provided under the **CC BY 4.0** license.

- **Creator**: Paulo Cortez
- **Source**: https://archive.ics.uci.edu/dataset/320/student+performance
- **DOI**: 10.24432/C5TG7T
- **Citation**: P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7.

## License

MIT License - see LICENSE file for details.

## Author

Jake Peace (2025)
