Metadata-Version: 2.4
Name: econguard
Version: 0.1.0
Summary: The identification contract engine for the LLM era.
Author-email: DAGger Contributors <dagr@example.com>
License: Apache-2.0
License-File: LICENSE
Keywords: DAG,MCP,causal-DAG,causal-inference,difference-in-differences,econometrics,instrumental-variables
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.12
Requires-Dist: fastapi>=0.111
Requires-Dist: mcp>=1.0
Requires-Dist: numpy>=2.0
Requires-Dist: polars>=1.0
Requires-Dist: pydantic>=2.7
Requires-Dist: rich>=13.0
Requires-Dist: scipy>=1.13
Requires-Dist: statsmodels>=0.14
Requires-Dist: structlog>=24.0
Requires-Dist: uvicorn>=0.30
Provides-Extra: dev
Requires-Dist: httpx>=0.27; extra == 'dev'
Requires-Dist: hypothesis>=6.100; extra == 'dev'
Requires-Dist: ipykernel>=6.0; extra == 'dev'
Requires-Dist: jupyter>=1.0; extra == 'dev'
Requires-Dist: matplotlib>=3.8; extra == 'dev'
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: nbconvert>=7.0; extra == 'dev'
Requires-Dist: pre-commit>=3.7; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Provides-Extra: estimators
Requires-Dist: doubleml>=0.9; extra == 'estimators'
Requires-Dist: linearmodels>=6.0; extra == 'estimators'
Requires-Dist: pyfixest>=0.25; extra == 'estimators'
Provides-Extra: r-bridge
Requires-Dist: rpy2>=3.5; extra == 'r-bridge'
Description-Content-Type: text/markdown

<div align="center">

```
██████╗  █████╗  ██████╗  ██████╗ ███████╗██████╗
██╔══██╗██╔══██╗██╔════╝ ██╔════╝ ██╔════╝██╔══██╗
██║  ██║███████║██║  ███╗██║  ███╗█████╗  ██████╔╝
██║  ██║██╔══██║██║   ██║██║   ██║██╔══╝  ██╔══██╗
██████╔╝██║  ██║╚██████╔╝╚██████╔╝███████╗██║  ██║
╚═════╝ ╚═╝  ╚═╝ ╚═════╝  ╚═════╝ ╚══════╝╚═╝  ╚═╝
```

**The identification contract engine for the LLM era.**

*Your AI agent doesn't test for parallel trends. DAGger does.*


[![PyPI version](https://badge.fury.io/py/dagr.svg)](https://badge.fury.io/py/dagr)
[![Python 3.12+](https://img.shields.io/badge/python-3.12+-3776ab.svg)](https://python.org)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)

[**Quickstart**](#quickstart) · [**Why DAGger?**](#why-dagger) · [**Architecture**](#architecture) · [**MCP Server**](#mcp-server) · [**References**](#references)

</div>

---

## The Problem

Modern AI tooling has made econometric *execution* trivially easy and causal *validity* invisibly catastrophic.

Ask an AI agent to run a DiD analysis. It will produce a beautifully formatted coefficient table with stars, clustered standard errors, and a significant p-value. What it will rarely do unprompted: test whether parallel trends hold, check for anticipation effects, or, in an IV design, verify that your instrument has a strong first stage.

**The output looks like science. It is causal fraud.**

This isn't a model capability failure — it's an architectural one. There is no software primitive that makes *"I must validate my identification strategy before I can estimate"* a programmable constraint rather than a vague checklist item in a methods section.

DAGger is that primitive.
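
To make the idea of a "programmable constraint" concrete, here is a minimal standard-library sketch of a hard gate. It is not DAGger's implementation: `Preflight`, `requires_valid_preflight`, `fit_did`, and this stand-in `IdentificationError` are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field
from functools import wraps


class IdentificationError(RuntimeError):
    """Stand-in exception: estimation attempted without a passing preflight."""


@dataclass
class Preflight:
    # Hypothetical container: assumption name -> did its test pass?
    checks: dict[str, bool] = field(default_factory=dict)

    @property
    def valid(self) -> bool:
        # An empty battery is not evidence of validity.
        return bool(self.checks) and all(self.checks.values())


def requires_valid_preflight(fit):
    """Refuse to run `fit` unless its first argument is a passing preflight."""
    @wraps(fit)
    def guarded(preflight: Preflight, *args, **kwargs):
        if not preflight.valid:
            failed = sorted(k for k, ok in preflight.checks.items() if not ok)
            raise IdentificationError(f"failed or missing checks: {failed}")
        return fit(preflight, *args, **kwargs)
    return guarded


@requires_valid_preflight
def fit_did(preflight: Preflight, treated_change: float, control_change: float) -> float:
    # Textbook 2x2 difference-in-differences on pre-computed group changes.
    return treated_change - control_change
```

With this shape, skipping validation is an exception at call time rather than an omission someone has to notice in review.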

---

## The Solution

```python
import dagr as dg

# 1. Declare your identification strategy — before touching data
contract = dg.DiffInDiffContract(
    estimand=dg.Estimand.ATT,
    outcome_var="log_employment",
    treatment_var="min_wage_increase",
    time_var="year",
    unit_var="county_fips",
    assumptions=frozenset([
        dg.Assumption.PARALLEL_TRENDS,
        dg.Assumption.NO_ANTICIPATION,
    ]),
    pre_periods=(-4, -3, -2, -1),
    post_periods=(0, 1, 2, 3),
)

# 2. Run the preflight battery — or don't estimate
with dg.AuditLedger(contract=contract, experiment_id="min_wage_2024",
                    ledger_path="artifacts/ledger.jsonld") as ledger:
    preflight = contract.validate(data, verbose=True)
    ledger.attach_preflight(preflight)
    preflight.assert_valid()          # raises IdentificationError if INVALID

    # 3. Estimate — @requires_contract is satisfied by the AuditLedger context
    results = contract.build_estimator(data).fit()
    ledger.attach_results(results)

# 4. Quantify robustness
report = ledger.generate_report()
print(report.sensitivity.rr_breakdown_m_bar)   # breakdown M* for Rambachan-Roth
print(report.sensitivity.oster_delta)          # Oster delta for selection on unobservables
```

The preflight renders this in your terminal:

```
+----------------------------------+-------------------+-----------+-----------+
| Assumption                       | Status            | Statistic |  p-value  |
+----------------------------------+-------------------+-----------+-----------+
| parallel_trends                  |  VALID            |  F=0.421  |  p=0.657  |
| no_anticipation                  |  VALID            |  F=0.183  |  p=0.831  |
| no_differential_attrition        |  VALID            |    -      |    -      |
+----------------------------------+-------------------+-----------+-----------+

VERDICT: VALID | Pass rate: 100% | Contract: eg:3f4a8c2b1d...
```
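
For readers who want to see what sits behind an `F=` entry like the ones above, the following is a rough sketch of a joint pre-trend test, assuming you already have the event-study pre-period coefficients and their covariance matrix. The function names are hypothetical and the linear solver is a plain-Python stand-in for a numerical library. It computes the Wald statistic $W = \hat\beta^\top \hat V^{-1} \hat\beta$ and the F-form $W/q$, where $q$ is the number of pre-period coefficients.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def pretrend_wald(coefs, cov):
    """Return (W, F) for H0: all pre-period coefficients are zero."""
    q = len(coefs)
    v_inv_beta = solve_linear(cov, coefs)
    wald = sum(b * v for b, v in zip(coefs, v_inv_beta))
    return wald, wald / q


def max_abs_pretrend(coefs):
    """Largest pre-period coefficient in absolute value."""
    return max(abs(b) for b in coefs)
```

Turning the F-form into a p-value additionally needs the residual degrees of freedom (for example via `scipy.stats.f.sf`), which this sketch leaves out.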

---

## Why DAGger?

> **DAG** (Directed Acyclic Graph) is the mathematical foundation of causal inference — Pearl's do-calculus, structural causal models, identification theory.
> **-ger** is the agent suffix: logger, debugger, linter.
> DAGger is the tool that brings DAG-rigour to production pipelines.

The mypy of causal inference.

Three principles:

**1. Contracts before estimation.** An `IdentificationContract` is a Pydantic v2 model that declares your entire causal strategy — estimand, assumptions, sensitivity analyses — in a typed, serializable, content-addressed document. You cannot call `.fit()` without one.

**2. Tests, not checklists.** Every declared assumption is backed by a test from the peer-reviewed literature. Parallel trends uses a joint F-test on the pre-period coefficients of a TWFE event study, and Rambachan-Roth (2023) bounds then quantify how robust the estimate is to violations. First stage uses the Olea-Pflueger (2013) effective F, not the Staiger-Stock rule of thumb. The tests are the contract.

**3. Machine-readable by default.** Every result is a Pydantic model with semantic field names and paired interpretation strings. The audit ledger is SHA-256 content-addressed JSON-LD. The MCP server exposes everything to LLM agents natively. Provenance is not an afterthought — it's the architecture.
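
"Content-addressed" means the identifier of a contract is derived from its content, so any change to the declared strategy yields a different identifier. The sketch below illustrates the idea with a frozen dataclass and the standard library. It is an assumption about the general technique, not DAGger's actual hashing scheme, and `ContractSketch` is a hypothetical name.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ContractSketch:
    estimand: str
    outcome_var: str
    treatment_var: str
    assumptions: tuple[str, ...]

    def content_hash(self) -> str:
        payload = asdict(self)
        # Sort assumptions so declaration order does not change the hash.
        payload["assumptions"] = sorted(payload["assumptions"])
        # Canonical JSON: sorted keys, no insignificant whitespace.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two contracts that declare the same strategy hash identically, while editing the outcome variable after seeing results produces a visibly different hash.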

---

## Quickstart

### Install

```bash
pip install dagr
# With estimators (pyfixest, linearmodels, doubleml):
pip install "dagr[estimators]"
# With R bridge (HonestDiD, rdrobust, synthdid):
pip install "dagr[r-bridge]"
```

### DiD in 6 steps

```python
import dagr as dg
import polars as pl

# Step 1: Load your panel data
data = pl.read_parquet("county_employment_panel.parquet")

# Step 2: Declare the identification contract (pre-registration)
contract = dg.DiffInDiffContract(
    estimand=dg.Estimand.ATT,
    outcome_var="log_employment",
    treatment_var="min_wage_increase",
    time_var="year",
    unit_var="county_fips",
    assumptions=frozenset([
        dg.Assumption.PARALLEL_TRENDS,
        dg.Assumption.NO_ANTICIPATION,
    ]),
    pre_periods=(-4, -3, -2, -1),
    post_periods=(0, 1, 2, 3),
)
contract.to_file("artifacts/contract.json")   # pre-registration artifact

# Step 3: Validate assumptions — the preflight battery
with dg.AuditLedger(contract=contract, experiment_id="min_wage_2024",
                    ledger_path="artifacts/ledger.jsonld") as ledger:
    preflight = contract.validate(data)        # runs all declared tests
    ledger.attach_preflight(preflight)
    preflight.assert_valid()                   # hard gate: stops here if INVALID

    # Step 4: Estimate
    results = contract.build_estimator(data).fit()
    ledger.attach_results(results)

    # Step 5: Sensitivity analysis
    rr = dg.RambachanRothSensitivity(results=results)
    rr_report = rr.compute(pre_period_max_abs=0.03)
    ledger.attach_sensitivity(rr_report)

# Step 6: The machine-readable report
report = ledger.generate_report()
print(report.model_dump_json(indent=2))        # LLM-consumable, SHA-256 content-addressed
```
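
Step 5 passes `pre_period_max_abs` to the sensitivity analysis. As a back-of-envelope intuition for what a breakdown value means, the sketch below assumes the post-period violation of parallel trends is at most $\bar{M}$ times the largest pre-period violation, widens a normal confidence interval by that amount, and solves for the $\bar{M}$ at which the interval first touches zero. This is a deliberate simplification, not the Rambachan-Roth procedure and not DAGger's computation: it ignores sampling noise in the pre-period coefficients. `naive_breakdown_m` is a hypothetical name.

```python
from statistics import NormalDist


def naive_breakdown_m(att: float, se: float, pre_period_max_abs: float,
                      alpha: float = 0.05) -> float:
    """Smallest M-bar at which att +/- (z*se + M-bar*delta) includes zero."""
    if se < 0 or pre_period_max_abs <= 0:
        raise ValueError("se must be >= 0 and pre_period_max_abs must be > 0")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Distance between the conventional CI and zero; negative if it already covers zero.
    slack = abs(att) - z * se
    return max(0.0, slack / pre_period_max_abs)
```

A value below 1 says the conclusion would not survive a post-period violation as large as the worst one seen before treatment. A value of 0 means the conventional interval already includes zero.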

### Instrumental Variables

```python
contract = dg.IVContract(
    estimand=dg.Estimand.LATE,
    outcome_var="earnings",
    treatment_var="years_education",
    time_var="birth_cohort",
    unit_var="individual_id",
    assumptions=frozenset([
        dg.Assumption.INSTRUMENT_RELEVANCE,    # tested: Olea-Pflueger effective F
        dg.Assumption.INSTRUMENT_EXCLUSION,    # checked: reduced-form plausibility (not directly testable)
    ]),
    instruments=("compulsory_schooling_law",),
    endogenous_vars=("years_education",),
    estimator_preference="2SLS",
    pre_periods=(-3, -2, -1),
    post_periods=(0, 1, 2),
)
```
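
For a sense of what an instrument-relevance check computes, the sketch below works through the simplest case: one instrument, one endogenous variable, no controls beyond an intercept. In that just-identified case the effective F is generally understood to coincide with the heteroskedasticity-robust first-stage F, which is what this code calculates using an HC0 variance with no small-sample correction. The function name is hypothetical, and DAGger's own validator may differ in detail.

```python
def robust_first_stage_f(z: list[float], x: list[float]) -> float:
    """HC0-robust F for the slope in a regression of x on z plus an intercept."""
    n = len(z)
    if n != len(x) or n < 3:
        raise ValueError("need equal-length inputs with at least 3 observations")
    z_bar = sum(z) / n
    x_bar = sum(x) / n
    zd = [zi - z_bar for zi in z]  # demeaning absorbs the intercept
    xd = [xi - x_bar for xi in x]
    szz = sum(v * v for v in zd)
    if szz == 0:
        raise ValueError("instrument has no variation")
    pi_hat = sum(a * b for a, b in zip(zd, xd)) / szz  # first-stage slope
    resid = [b - pi_hat * a for a, b in zip(zd, xd)]
    var_hc0 = sum((a * e) ** 2 for a, e in zip(zd, resid)) / szz ** 2
    if var_hc0 == 0:
        return float("inf")  # perfect first-stage fit
    return pi_hat ** 2 / var_hc0
```

The statistic is then compared against a critical value chosen for a tolerated level of weak-instrument bias, rather than against a fixed threshold of 10.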

---

## Architecture

```
+-----------------------------------------------------------------------+
|                            DAGR STACK                                 |
+-----------------------------------------------------------------------+
|  Human Researcher (Python API)  |  AI Agent (MCP Tool Call)          |
|               |                 |           |                         |
|               v                             v                         |
|  +--------------------------------------------------+                |
|  |              dagr.contracts                      |                |
|  |  DiffInDiffContract  |  IVContract               |                |
|  |  Pydantic v2, frozen, content-addressed          |                |
|  +---------------------+----------------------------+                |
|                        | .validate(data)                             |
|                        v                                             |
|  +--------------------------------------------------+                |
|  |              dagr.validators                     |                |
|  |  TWFE event-study  |  Olea-Pflueger F            |                |
|  |  Rambachan-Roth    |  Sargan-Hansen J            |                |
|  +---------------------+----------------------------+                |
|                        | ValidationSuiteResult                       |
|                        | VALID / VALID_CONDITIONAL                   |
|                        | FRAGILE / INVALID                           |
|                        v                                             |
|  +--------------------------------------------------+                |
|  |    contract.build_estimator(data).fit()          |                |
|  |  TWFE (pyfixest)  |  2SLS / LIML / GMM-IV        |                |
|  |  Callaway-Sant'Anna  |  AIPW (doubly-robust)     |                |
|  +---------------------+----------------------------+                |
|                        | EconGuardResults                            |
|                        v                                             |
|  +--------------------------------------------------+                |
|  |             dagr.sensitivity                     |                |
|  |  Rambachan-Roth (2023)  |  Oster delta (2019)    |                |
|  |  Spec Curve             |  Rosenbaum bounds      |                |
|  +---------------------+----------------------------+                |
|                        v                                             |
|  +--------------------------------------------------+                |
|  |              dagr.ledger                         |                |
|  |  AuditLedger — SHA-256 content-addressed         |                |
|  |  IdentificationReport — LLM-optimised JSON       |                |
|  |  MCP Server — 4 tools, Claude/GPT-4 native       |                |
|  +--------------------------------------------------+                |
+-----------------------------------------------------------------------+
```
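
The diagram lists four verdicts but not how per-assumption results combine into one. One plausible rule, shown here purely as an assumption, is "worst status wins": order the verdicts by severity and take the maximum. The `Verdict` enum and `overall_verdict` function below are hypothetical and only reuse the verdict names from the diagram.

```python
from enum import IntEnum


class Verdict(IntEnum):
    # Ordered by severity so that max() picks the worst outcome.
    VALID = 0
    VALID_CONDITIONAL = 1
    FRAGILE = 2
    INVALID = 3


def overall_verdict(per_assumption: dict[str, Verdict]) -> Verdict:
    """Aggregate per-assumption verdicts; no evidence counts as INVALID."""
    if not per_assumption:
        return Verdict.INVALID
    return max(per_assumption.values())
```

Under this rule a single failed assumption is enough to block estimation, which matches the hard-gate behaviour described earlier.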

---

## MCP Server

DAGger exposes its full validation and sensitivity stack as an MCP server that any LLM agent can call natively.

```bash
dagr serve --port 8080
```

Four tools:

| Tool | Description |
|---|---|
| `run_identification_preflight` | Validate contract assumptions against data. Returns `ValidationSuiteResult`. |
| `compute_sensitivity` | Rambachan-Roth bounds or Oster delta. Returns `SensitivityReport`. |
| `generate_identification_report` | Full audit report from a ledger file. |
| `validate_did_contract` | Flat-parameter convenience tool for LLM agents. |

The `IdentificationReport` is designed for LLM consumption: semantic field names, paired interpretation strings, controlled-vocabulary verdicts.

```json
{
  "schema_version": "dagr/v1",
  "overall_verdict": "valid",
  "identification": {
    "strategy": "difference_in_differences",
    "estimand": "average_treatment_effect_on_treated",
    "status": "valid",
    "recommendation": "Identification is valid. Proceed with estimation.",
    "failed_assumptions": []
  },
  "sensitivity": {
    "rr_breakdown_m_bar": 1.43,
    "rr_verdict": "valid",
    "oster_delta": 2.14,
    "oster_verdict": "valid"
  },
  "audit_hash": "sha256:3f4a8c2b1d..."
}
```
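
Because the verdicts are a controlled vocabulary, an agent can gate on them with a few lines of parsing instead of interpreting free text. The sketch below reads a report shaped like the JSON above using only the standard library. `may_proceed` is a hypothetical helper, and the field names are taken from that example rather than from a published schema.

```python
import json


def may_proceed(report_json: str) -> tuple[bool, str]:
    """Return (ok, reason) for a report shaped like the example above."""
    report = json.loads(report_json)
    ident = report.get("identification", {})
    failed = ident.get("failed_assumptions", [])
    # Proceed only on an explicit "valid" verdict with nothing failed.
    ok = report.get("overall_verdict") == "valid" and not failed
    reason = ident.get("recommendation", "no recommendation provided")
    return ok, reason
```

Anything other than an explicit `"valid"` verdict, including a missing field, is treated as a reason to stop.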

---

## Feature Matrix

| Feature | DAGger | Naive LLM | Manual Checklist |
|---|:---:|:---:|:---:|
| Assumption validation | Automated | None | Manual |
| Fails on violation | Hard gate | Silent | Sometimes |
| Parallel trends | Event-study F + max\|beta\| | - | Visual |
| Weak instruments | Olea-Pflueger (2013) | - | Rule-of-thumb |
| Rambachan-Roth bounds | Python + R bridge | - | - |
| Oster delta | Analytic (verified to 3 d.p.) | - | - |
| Audit trail | SHA-256 JSON-LD | - | Notes |
| LLM-readable output | Pydantic + MCP | Unstructured | - |
| Pre-registration | OSF JSON-LD | - | Manual |

---

## What DAGger Catches

The demo notebook `notebooks/01_the_llm_got_it_wrong.ipynb` walks through a real case:

1. An AI agent produces a "significant" employment effect of a minimum wage increase
2. DAGger runs the preflight — the pre-trend F-test **fails**
3. Rambachan-Roth bounds show the CI crosses zero at M-bar = 0.38
4. The corrected analysis on properly identified data: **VALID verdict**

```
[AI result]    ATT = -0.047***   SE = 0.018   p = 0.009   <- looks correct
[DAGger]       Pre-trend F(3,...) = 8.41   p = 0.002      <- assumption violated
               Rambachan-Roth: breakdown M* = 0.38        <- not robust
               Verdict: INVALID. Do not use this estimate.
```

---

## Quality

| Metric | Value |
|---|---|
| Test suite | 330+ tests |
| Type checking | `mypy --strict` (zero errors) |
| Linting | `ruff` (zero violations) |
| Coverage | >= 80% |
| Build | `uv build` + `twine check` (passed) |
| License | Apache 2.0 |
| Python | 3.12+ |

---

## Installation Options

```bash
# Core (contracts, validators, sensitivity, ledger, MCP, CLI)
pip install dagr

# With real estimators (pyfixest, linearmodels, doubleml)
pip install "dagr[estimators]"

# With R bridge (HonestDiD LP bounds, rdrobust, synthdid)
pip install "dagr[r-bridge]"

# Full installation
pip install "dagr[estimators,r-bridge]"
```

---

## References

DAGger implements or wraps published statistical methods. All implementations cite their source paper and include analytic test cases with known expected values.

Callaway, Brantly and Pedro H.C. Sant'Anna. 2021. "Difference-in-Differences with Multiple Time Periods." *Journal of Econometrics*, 225(2), 200-230.

Montiel Olea, José Luis and Carolin Pflueger. 2013. "A Robust Test for Weak Instruments." *Journal of Business & Economic Statistics*, 31(3), 358-369.

Oster, Emily. 2019. "Unobservable Selection and Coefficient Stability: Theory and Evidence." *Journal of Business & Economic Statistics*, 37(2), 187-204.

Rambachan, Ashesh and Jonathan Roth. 2023. "A More Credible Approach to Parallel Trends." *The Review of Economic Studies*, 90(5), 2555-2591.

Rosenbaum, Paul R. 2002. *Observational Studies* (2nd ed.). Springer. Chapter 4.

Simonsohn, Uri, Joseph P. Simmons, and Leif D. Nelson. 2020. "Specification Curve Analysis." *Nature Human Behaviour*, 4, 1208-1214.

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md). We especially welcome:

- New validators with cited source papers and analytic test cases
- `RDDContract` implementation ([good first issue](https://github.com/your-org/dagr/issues))
- Callaway-Sant'Anna doubly-robust with cross-fitting
- Spanish/Portuguese translations of interpretation strings

**Critical rule**: Any modification to a statistical validator must cite the source paper
and include an analytic test with a known expected value. Statistical correctness is not negotiable.

---

<div align="center">

*DAGger — Because causal validity should be a compiler error, not a footnote.*

</div>
