Metadata-Version: 2.4
Name: causal-lens
Version: 0.3.1
Summary: Causal inference toolkit for observational data and treatment-effect diagnostics.
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: matplotlib<4.0,>=3.9
Requires-Dist: numpy<3.0,>=1.26
Requires-Dist: pandas<4.0,>=2.2
Requires-Dist: scikit-learn<2.0,>=1.5
Requires-Dist: scipy<2.0,>=1.13
Requires-Dist: statsmodels<0.15,>=0.14
Provides-Extra: dev
Requires-Dist: pytest<9.0,>=8.2; extra == "dev"
Dynamic: license-file

# CausalLens

**Diagnostics-first causal inference for Python.** Estimate treatment effects, inspect overlap and balance, and compare estimators—all with publication-ready diagnostics and plots.

## Why CausalLens?

Most causal inference software focuses on **fitting models quickly**. CausalLens instead focuses on **understanding whether your results are trustworthy**.

- ✅ **Diagnostics bundled with every result**: Overlap, balance improvement, effective sample size, and sensitivity checks are returned alongside estimates—not as optional afterthoughts
- ✅ **Specification transparency**: Compare regression, matching, IPW, and doubly robust estimates side-by-side to reveal model sensitivity and build confidence through agreement (or diagnose problems when estimates diverge)
- ✅ **Publication-ready in one command**: Exports formatted benchmark tables, Love plots, propensity histograms, sensitivity curves, and subgroup summaries—ready for manuscript submission
- ✅ **Falsification tests integrated**: Placebo tests and Rosenbaum bounds are CLI-integrated, not separate scripts, so stress-testing your assumptions becomes the default workflow
- ✅ **Python + pandas native**: Designed for scikit-learn-compatible workflows and seamless pandas integration

**Example:**
```python
# Load data, run estimation, inspect diagnostics—all in one object
from causal_lens import DoublyRobustEstimator, generate_synthetic_observational_data

data = generate_synthetic_observational_data(rows=600, seed=42)
confounders = ["age", "severity", "baseline_score"]
dr = DoublyRobustEstimator("treatment", "outcome", confounders)
result = dr.fit(data)
print(result.summary())  # Shows effect, CI, p-value, AND overlap/balance diagnostics
```

Compare this to typical workflows where you fit a model, then manually write separate scripts to check overlap, balance, and sensitivity. CausalLens makes diagnostics inseparable from estimation.
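For contrast, here is a sketch of that manual route using plain sklearn and numpy; the data and the Hajek-style IPW computation are illustrative assumptions, not CausalLens code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative data: X (confounders), t (binary treatment), y (outcome).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
t = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
y = X @ np.array([1.0, 0.5, -0.5]) + 2.0 * t + rng.normal(size=500)

# Manual IPW: fit a propensity model, build weights, take weighted means.
ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
w = np.where(t == 1, 1 / ps, 1 / (1 - ps))
ate = np.average(y[t == 1], weights=w[t == 1]) - np.average(y[t == 0], weights=w[t == 0])
print(ate)  # roughly 2.0 here, but overlap, balance, ESS, and sensitivity
            # checks would each still need their own script
```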

## Snapshot

- Lane: Observational data causal inference and treatment-effect diagnostics
- Domain: Tabular observational data with binary treatments
- Stack: Python (pandas, scikit-learn, statsmodels, matplotlib)
- Estimators: regression adjustment, propensity matching, IPW, doubly robust, 2SLS, difference-in-differences, synthetic control, plus heterogeneous effects and sensitivity analysis
- Publication-oriented: Exports benchmark tables, charts, and sensitivity summaries optimized for peer review

## Overview

CausalLens packages core causal-inference workflows for observational tabular data into a small, testable Python library. The initial release is designed around practical treatment-effect estimation rather than theory-heavy experimentation: estimate treatment effects, inspect overlap and balance, and compare estimators with consistent result objects.

The repository uses four complementary evidence tracks:

- a fixed public-safe observational intervention sample under `data/` for reproducible article figures and tests
- public benchmark datasets drawn from the causal inference literature for externally recognizable evaluation
- synthetic known-effect data for correctness-oriented validation of estimator behavior (a toy sketch follows this list)
- a formal Monte Carlo simulation study evaluating estimator bias, RMSE, coverage, and SE calibration across five data-generating processes
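
To make "known-effect" concrete, here is a minimal toy data-generating process in the spirit of the simplest (linear) of those tracks; the coefficients and variable names are illustrative assumptions, not the library's actual generators:

```python
import numpy as np

# Toy known-effect DGP (illustrative, not CausalLens's generator): a single
# confounder drives both treatment assignment and the outcome, so the naive
# difference in means is biased away from the true ATE of 2.0.
rng = np.random.default_rng(0)
n, true_ate = 2_000, 2.0
x = rng.normal(size=n)                           # confounder
t = rng.binomial(1, 1 / (1 + np.exp(-0.8 * x)))  # treatment depends on x
y = 1.5 * x + true_ate * t + rng.normal(size=n)  # outcome depends on x and t

print(y[t == 1].mean() - y[t == 0].mean())       # noticeably above 2.0
```

A correct estimator should recover roughly 2.0 after adjusting for `x`, which is exactly what the correctness-oriented validation track checks.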

## What It Demonstrates

- Propensity score estimation with a scikit-learn logistic model with standardized covariates
- Regression-adjustment treatment effects with statsmodels OLS
- Nearest-neighbor propensity matching with optional calipers and Abadie-Imbens analytic standard errors
- Inverse probability weighting with stabilized weights and weight capping for ATE and ATT targets
- Doubly robust estimation that combines outcome and propensity models with weight trimming
- Cross-fitted doubly robust estimation (DML/AIPW style) with 5-fold out-of-fold nuisance estimates to avoid overfitting bias
- Flexible doubly robust estimation using gradient boosting outcome models for nonlinear confounding
- T-learner and S-learner meta-learners for conditional average treatment effect (CATE) estimation with optional GBM
- Analytic standard errors from OLS (regression), Hajek sandwich variance (IPW), Abadie-Imbens matched-pair variance (matching), and semiparametric influence functions (doubly robust)
- Covariate-balance summaries using standardized mean differences and variance ratios
- Kish effective sample size for weighted estimators to detect unstable weights (see the formula sketch after this list)
- Common-support and overlap diagnostics for positivity review
- Additive-bias sensitivity summaries for explain-away analysis on the outcome scale
- E-values for unmeasured confounding (VanderWeele & Ding 2017) quantifying the minimum confounder strength to explain away the effect
- Rosenbaum sensitivity bounds for matched-pair designs quantifying hidden-bias tolerance
- Placebo/falsification tests on pre-treatment outcomes for specification validation
- Subgroup treatment-effect summaries for quick heterogeneous-effect review
- A small command-line demo that exports a reproducible causal report
- A real-style observational intervention fixture for stable estimator-comparison tests
- Publication-oriented methodology notes justifying the initial estimator set
- Reference parity tests against manual formulas and direct statistical-model fits
- Paper-ready chart and table exports for estimator comparison, balance, sensitivity, and subgroup effects
- Love plots showing covariate-level balance before and after adjustment with standard |SMD| thresholds
- Propensity-score overlap histograms for visual positivity assessment
- Manuscript drafting docs, figure captions, and cross-dataset benchmark tables for the software-paper path
- Packaged public benchmarks based on Lalonde and NHEFS so installed users can reproduce the evidence stack without a source checkout
- Literature comparison table showing CausalLens results match published reference values from Dehejia & Wahba (1999) and Hernán & Robins (2020)
- Repeated-run stability analysis across seeds, bootstrap counts, and caliper settings
- External comparison script verifying CausalLens matches manual sklearn/statsmodels implementations to machine precision
- Difference-in-differences estimator with regression-based ATT, cluster-robust standard errors, and a parallel-trends pre-test
- Synthetic control method with constrained least-squares donor weights and placebo inference via leave-one-out permutation
- Two-stage least squares (2SLS) instrumental variables estimator with proper IV variance, first-stage F-statistic, and weak-instrument detection
- Monte Carlo simulation framework with five DGPs (linear, nonlinear outcome, nonlinear propensity, double nonlinear, strong confounding) evaluating bias, RMSE, coverage, and SE calibration ratio
- IPW standard errors corrected for propensity-score estimation uncertainty via the Lunceford & Davidian (2004) stacked estimating equations adjustment
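
Three of the diagnostics above reduce to compact, standard formulas. The sketch below is plain numpy with illustrative function names, not CausalLens's API:

```python
import numpy as np

def smd(x_treated, x_control):
    """Standardized mean difference with the usual pooled-SD denominator."""
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / pooled_sd

def kish_ess(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    return weights.sum() ** 2 / (weights ** 2).sum()

def e_value(rr):
    """VanderWeele & Ding (2017) E-value for a risk ratio rr >= 1."""
    return rr + np.sqrt(rr * (rr - 1))

print(kish_ess(np.ones(100)))              # 100.0: equal weights lose nothing
print(kish_ess(np.r_[np.ones(99), 50.0]))  # ~8.5: one dominant weight wrecks ESS
print(e_value(1.8))                        # 3.0: confounder strength needed to explain away RR=1.8
```

The |SMD| thresholds in the Love plots (commonly 0.1) apply the first formula covariate by covariate, before and after adjustment.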

## Current Output

The default command writes `outputs/causal_report.json` with:

- a fixed real-style observational dataset section with estimator comparisons
- a Lalonde benchmark section with public observational training-program data, using light propensity-overlap trimming for the weighting estimators
- an NHEFS benchmark section with public smoking-cessation observational data
- a synthetic validation dataset section with known-effect comparisons
- overlap summary and propensity score range checks
- covariate balance before/after weighting, variance ratios, and effective sample sizes
- lightweight bootstrap intervals for the selected estimate
- analytic standard errors and p-values from influence functions (DR, IPW) and OLS (regression)
- additive-bias sensitivity summaries with E-values for the primary doubly robust estimate
- subgroup treatment-effect estimates
- placebo/falsification test results on pre-treatment outcomes
- Rosenbaum sensitivity bounds for matched-pair designs
- external comparison and stability-analysis summaries for the exported benchmark artifacts

It also writes paper-oriented artifacts under `outputs/charts/` and `outputs/tables/` including:

- estimator comparison charts with confidence intervals
- balance before/after summary charts
- sensitivity curves
- subgroup effect charts
- estimator summary tables in CSV and Markdown
- `external_comparison.csv` showing parity against manual sklearn/statsmodels implementations
- `stability_raw.csv` and `stability_summary.csv` capturing repeated-run variability across benchmark settings
- `placebo_test.csv` showing falsification test results on pre-treatment outcomes
- `rosenbaum_bounds.csv` showing matched-pair sensitivity to hidden bias at each Gamma level
- Love plots and propensity-score overlap histograms for each benchmark dataset
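
A hypothetical inspection snippet; the file paths match the artifacts listed above, but the JSON's internal key names are assumptions and may differ by version:

```python
import json
import pandas as pd

# Load the exported report and one tracked table for inspection.
with open("outputs/causal_report.json") as f:
    report = json.load(f)
print(sorted(report))  # top-level section names (exact keys are version-dependent)

parity = pd.read_csv("outputs/tables/external_comparison.csv")
print(parity.head())
```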

## Next Upgrade Path

- add article figures, benchmark tables, and formal estimator-comparison writeups for DiD, synthetic control, and IV
- add regression discontinuity design (RDD) and bunching estimators
- expand simulation study to additional sample sizes and publish summary tables

All cross-sectional estimators, panel-data methods, IV, and simulation infrastructure are now in place.

## Installation

```bash
pip install .
```

Or in development mode:

```bash
pip install -e .
```

For review or replication work:

```bash
pip install -e .[dev]
```

## Quick Start

```python
import pandas as pd

from causal_lens import (
    generate_synthetic_observational_data,
    RegressionAdjustmentEstimator,
    CrossFittedDREstimator,
    DifferenceInDifferences,
    run_quick_simulation,
    summarize_simulation,
)

# --- Cross-sectional estimators ---
data = generate_synthetic_observational_data(rows=600, seed=42)
confounders = ["age", "severity", "baseline_score"]

reg = RegressionAdjustmentEstimator("treatment", "outcome", confounders)
result_reg = reg.fit(data)

dr = CrossFittedDREstimator("treatment", "outcome", confounders)
result_dr = dr.fit(data)

for r in [result_reg, result_dr]:
    print(f"{r.method:35s}  effect={r.effect:.2f}  SE={r.se:.3f}  p={r.p_value:.4f}")

# --- Panel data: Difference-in-Differences ---
panel = pd.DataFrame({
    "unit":   [1, 1, 2, 2, 3, 3, 4, 4],
    "period": [0, 1, 0, 1, 0, 1, 0, 1],
    "treat":  [1, 1, 1, 1, 0, 0, 0, 0],   # units 1-2 treated, units 3-4 control
    "y":      [3.0, 7.0, 2.5, 6.0, 2.0, 4.0, 1.5, 3.0],
})
did = DifferenceInDifferences("unit", "period", "treat", "y")
result_did = did.fit(panel)
print(f"DiD ATT={result_did.att:.2f}  SE={result_did.se:.3f}")

# --- Monte Carlo simulation study ---
raw = run_quick_simulation()
summary = summarize_simulation(raw)
print(summary[["dgp", "estimator", "bias", "rmse", "coverage"]].to_string(index=False))
```

## Real-World Use Case: Complete Workflow

Here is how CausalLens handles a complete real-world analysis, from data loading through estimation, diagnostics, and result inspection:

```python
import pandas as pd
import matplotlib.pyplot as plt
from causal_lens import (
    RegressionAdjustmentEstimator,
    PropensityMatcher,
    IPWEstimator,
    DoublyRobustEstimator,
    export_propensity_overlap,
    export_balance_summary,
)

# Load real observational data (e.g., NHEFS smoking cessation study)
data = pd.read_csv("data/nhefs_complete.csv")
print(f"Dataset: {len(data)} observations, {len(data.columns)} variables")

# Define analysis parameters
treatment_col = "treatment"      # Binary treatment assignment (1=quit smoking, 0=continue)
outcome_col = "weight_change"    # Outcome: change in weight (kg)
confounders = [
    "age", "sex", "race", "education",
    "baseline_weight", "baseline_smoking_intensity"
]

# Inspect the data
print(f"\nTreatment: {data[treatment_col].sum()} treated, {(1-data[treatment_col]).sum()} control")
print(f"Outcome mean (treated): {data[data[treatment_col]==1][outcome_col].mean():.2f} kg")
print(f"Outcome mean (control): {data[data[treatment_col]==0][outcome_col].mean():.2f} kg")
print(f"Raw difference: {data[data[treatment_col]==1][outcome_col].mean() - data[data[treatment_col]==0][outcome_col].mean():.2f} kg")

# --- Method 1: Regression Adjustment (fast, transparent) ---
print("\n" + "="*60)
print("1. REGRESSION ADJUSTMENT")
print("="*60)
reg = RegressionAdjustmentEstimator(treatment_col, outcome_col, confounders, bootstrap_repeats=100)
result_reg = reg.fit(data)
print(result_reg.summary())  # Human-readable output with diagnostics

# --- Method 2: Propensity Matching (robust to model misspecification) ---
print("\n" + "="*60)
print("2. PROPENSITY MATCHING (caliper=0.01)")
print("="*60)
matcher = PropensityMatcher(
    treatment_col, 
    outcome_col, 
    confounders, 
    caliper=0.01,
    bootstrap_repeats=100
)
result_match = matcher.fit(data)
print(result_match.summary())
print(f"\nMatched pairs: {result_match.treated_count} pairs")

# --- Method 3: IPW with propensity trimming (efficient but variance-sensitive) ---
print("\n" + "="*60)
print("3. INVERSE PROBABILITY WEIGHTING (propensity trim: 0.05-0.95)")
print("="*60)
ipw = IPWEstimator(
    treatment_col, 
    outcome_col, 
    confounders,
    propensity_trim_bounds=(0.05, 0.95),  # Exclude extreme propensity scores
    bootstrap_repeats=100
)
result_ipw = ipw.fit(data)
print(result_ipw.summary())
print(f"Effective sample size (treated): {result_ipw.diagnostics.ess_treated:.0f}")
print(f"Effective sample size (control): {result_ipw.diagnostics.ess_control:.0f}")

# --- Method 4: Doubly Robust (combines outcome + propensity strengths) ---
print("\n" + "="*60)
print("4. DOUBLY ROBUST (AIPW style)")
print("="*60)
dr = DoublyRobustEstimator(
    treatment_col, 
    outcome_col, 
    confounders,
    propensity_trim_bounds=(0.05, 0.95),
    bootstrap_repeats=100
)
result_dr = dr.fit(data)
print(result_dr.summary())

# --- DIAGNOSTICS & QUALITY CHECKS ---
print("\n" + "="*60)
print("DIAGNOSTIC SUMMARY ACROSS METHODS")
print("="*60)

results = [result_reg, result_match, result_ipw, result_dr]
summary_table = pd.DataFrame([
    {
        "Method": r.method,
        "Effect": f"{r.effect:.2f}",
        "95% CI": f"[{r.ci_low:.2f}, {r.ci_high:.2f}]" if r.ci_low else "N/A",
        "p-value": f"{r.p_value:.4f}" if r.p_value else "N/A",
        "Overlap": "✓" if r.diagnostics.overlap_ok else "✗",
        "Balance (before)": f"{sum(r.diagnostics.balance_before.values())/len(r.diagnostics.balance_before):.4f}",
        "Balance (after)": f"{sum(r.diagnostics.balance_after.values())/len(r.diagnostics.balance_after):.4f}",
    }
    for r in results
])
print(summary_table.to_string(index=False))

# --- SENSITIVITY ANALYSIS ---
print("\n" + "="*60)
print("SENSITIVITY ANALYSIS (primary doubly robust)")
print("="*60)
sensitivity = dr.sensitivity_analysis(data, steps=6)
print("\nBias scenarios:")
for scenario in sensitivity.scenarios:
    print(
        f"  Bias={scenario.bias:.2f}: adjusted effect={scenario.adjusted_effect:.2f}, "
        f"CI=[{scenario.adjusted_ci_low:.2f}, {scenario.adjusted_ci_high:.2f}]"
    )
print(f"\nE-value: {sensitivity.e_value:.2f} (minimum confounder strength to explain away effect)")

# --- SUBGROUP ANALYSIS ---
print("\n" + "="*60)
print("HETEROGENEOUS TREATMENT EFFECTS (by sex)")
print("="*60)
subgroups = dr.subgroup_analysis(data, subgroup_col="sex")
for sg in subgroups:
    print(
        f"  {sg.subgroup}: effect={sg.effect:.2f}, CI=[{sg.ci_low:.2f}, {sg.ci_high:.2f}], "
        f"n={sg.rows} ({sg.treated_count} treated)"
    )

# --- VISUALIZATION ---
print("\n" + "="*60)
print("GENERATING PUBLICATION-READY FIGURES")
print("="*60)

# 1. Propensity score overlap check
export_propensity_overlap(dr, data, output_path="nhefs_propensity_overlap.png")
print("✓ Propensity overlap histogram: nhefs_propensity_overlap.png")

# 2. Balance summary (before/after adjustment)
export_balance_summary(dr, data, output_path="nhefs_balance_summary.png")
print("✓ Balance summary plot: nhefs_balance_summary.png")

# 3. Estimator comparison with CIs
from causal_lens.reporting import export_estimator_comparison
estimates_dict = {
    "Regression": (result_reg.effect, result_reg.ci_low, result_reg.ci_high),
    "Matching": (result_match.effect, result_match.ci_low, result_match.ci_high),
    "IPW": (result_ipw.effect, result_ipw.ci_low, result_ipw.ci_high),
    "Doubly Robust": (result_dr.effect, result_dr.ci_low, result_dr.ci_high),
}
export_estimator_comparison(estimates_dict, output_path="nhefs_estimator_comparison.png")
print("✓ Estimator comparison plot: nhefs_estimator_comparison.png")

# --- CONCLUSION ---
print("\n" + "="*60)
print("ANALYSIS COMPLETE")
print("="*60)
print(f"Primary estimate (doubly robust): {result_dr.effect:.2f} kg")
print(f"95% CI: [{result_dr.ci_low:.2f}, {result_dr.ci_high:.2f}]")
print(f"p-value: {result_dr.p_value:.4e}")
print(f"\nEstimator agreement: effects range from {min([r.effect for r in results]):.2f} to {max([r.effect for r in results]):.2f} kg")
print(f"This suggests moderate specification robustness.")
print("\nAll diagnostics, sensitivity analyses, and figures are generated above.")
print("Exported plots are publication-ready (high-DPI, no label overlap).")
```

This example demonstrates:
- **Loading & inspecting real data**: dataset size, sample composition, raw associations
- **Trying multiple methods**: regression, matching, IPW, doubly robust—each with different parameter choices
- **Diagnostic outputs**: overlap checks, balance improvement, effective sample sizes, p-values
- **Result inspection**: human-readable summary() method showing all key metrics at once
- **Sensitivity analysis**: bias scenarios and E-values quantifying confounding robustness
- **Subgroup analysis**: heterogeneous treatment effects by covariate
- **Visualization**: exportable publication-ready figures (propensity overlap, balance before/after, estimator comparison)
- **Interpretation**: estimator agreement as a specification-robustness signal

Users can adapt this template to their own datasets by changing column names, confounders, and parameter choices (e.g., caliper, propensity trimming, bootstrap repeats).
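
As a starting point, here is a hypothetical adaptation with placeholder column names; the constructor and keyword arguments mirror the worked example above:

```python
import pandas as pd
from causal_lens import DoublyRobustEstimator

# Every name below is a placeholder for your own dataset.
df = pd.read_csv("your_data.csv")
dr = DoublyRobustEstimator(
    "exposed",                                    # your binary treatment column
    "revenue_change",                             # your outcome column
    ["age", "region_code", "baseline_revenue"],   # your confounders
    propensity_trim_bounds=(0.05, 0.95),
    bootstrap_repeats=200,
)
print(dr.fit(df).summary())
```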



## Reproduction

Run the test suite:

```bash
pytest
```

Regenerate the default report and paper-oriented artifacts:

```bash
causal-lens
```

This writes the JSON report plus tracked charts and tables under `outputs/charts/` and `outputs/tables/`.

## Submission-Facing Assets

- `README.md` provides installation, scope, and reviewer-facing reproduction commands.
- `CITATION.cff` provides machine-readable citation metadata.
- `LICENSE` provides the repository license.
- `docs/methodology.md`, `docs/reference-validation.md`, and `docs/limitations-and-assumptions.md` provide manuscript-supporting narrative.
- `outputs/charts/` and `outputs/tables/` contain the tracked benchmark artifacts used in the current evidence stack.

## Documentation

- See [docs/architecture.md](docs/architecture.md) for the design notes.
- See [docs/methodology.md](docs/methodology.md) for assumptions, reasoning, and estimator justification.
- See [docs/public-benchmarks.md](docs/public-benchmarks.md) for the public dataset choices and benchmark rationale.
- See [docs/benchmark-interpretation.md](docs/benchmark-interpretation.md) for a results-oriented reading of the current benchmark artifacts.
- See [docs/reference-validation.md](docs/reference-validation.md) for executable validation logic tied to the future journal article.
- See [docs/limitations-and-assumptions.md](docs/limitations-and-assumptions.md) for a paper-ready limitations section.

## Citation

Citation metadata is available in `CITATION.cff`.
