Metadata-Version: 2.4
Name: pybacondecomp
Version: 0.1.0
Summary: Python replication of Stata's bacondecomp command — Goodman-Bacon (2021) decomposition of TWFE DiD estimators
Project-URL: Homepage, https://github.com/luzhiyu-econ/pybacondecomp
Project-URL: Repository, https://github.com/luzhiyu-econ/pybacondecomp
Project-URL: Issues, https://github.com/luzhiyu-econ/pybacondecomp/issues
Author-email: luzhiyu-econ <zhiyu.lu.econ@icloud.com>
License: MIT License
        
        Copyright (c) 2025 luzhiyu-econ
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: joblib
Requires-Dist: matplotlib
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: pyfixest
Requires-Dist: tqdm
Description-Content-Type: text/markdown

# bacondecomp

**Bacon Decomposition of Two-Way Fixed Effects Difference-in-Differences**

A Python implementation of the Goodman-Bacon (2021) decomposition, which expresses any two-way fixed effects (TWFE) DiD estimator as a weighted average of all possible 2×2 DiD comparisons. It supports the uncontrolled decomposition, the `ddetail` split, and the controlled decomposition (based on the Frisch-Waugh-Lovell theorem, FWL), with optional multi-core parallelism via [joblib](https://joblib.readthedocs.io/).

---

## Installation

```bash
pip install pybacondecomp
```

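**Dependencies:** `numpy`, `pandas`, `pyfixest` ≥ 0.25  
**Used for optional features:** `joblib` (parallel execution), `tqdm` (progress bars), `matplotlib` (plots). The package metadata lists all six as install requirements, so `pip` installs them automatically.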

---

## Background

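In staggered adoption designs, the TWFE estimator is a weighted average of all 2×2 DiD comparisons between pairs of groups. Some of these comparisons use already-treated units as the "control" group. The weights on the 2×2 comparisons are positive and sum to one, but when treatment effects change over time, comparisons against already-treated units subtract the change in the controls' treatment effects. This biases the TWFE estimate and can amount to negative weights on some underlying treatment effects.

The decomposition distinguishes up to four types of comparisons: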

| Type | Description |
|---|---|
| **Timing groups** | Earlier-adopting group vs. later-adopting group (and vice versa) |
| **Never vs. timing** | Timing group vs. never-treated units |
| **Always vs. timing** | Timing group vs. always-treated units |
| **Within** | Within-group variation (controlled decomposition only) |

**ddetail mode** further splits timing-group comparisons into:
- **Early vs. Late** — earlier-adopting group treated, later-adopting group as not-yet-treated control
- **Late vs. Early** — later-adopting group treated, earlier-adopting group as already-treated control
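
As a minimal sketch of what a single 2×2 comparison computes, the snippet below takes the simplest case: one timing group against never-treated units over the full sample window. The helper `did_2x2` and the toy data are hypothetical illustrations and not part of the package API; timing-vs-timing comparisons additionally restrict the time window, which is not shown here.

```python
import pandas as pd

def did_2x2(df, y, unit, time, treated_units, control_units, treat_start):
    """Difference in group-mean changes: (treated post - pre) - (control post - pre)."""
    keep = list(treated_units) + list(control_units)
    sub = df[df[unit].isin(keep)].copy()
    # Label each row by group and by period relative to the adoption date.
    sub["group"] = sub[unit].isin(treated_units).map({True: "treated", False: "control"})
    sub["period"] = (sub[time] >= treat_start).map({True: "post", False: "pre"})
    means = sub.groupby(["group", "period"])[y].mean().unstack("period")
    treated_change = means.loc["treated", "post"] - means.loc["treated", "pre"]
    control_change = means.loc["control", "post"] - means.loc["control", "pre"]
    return treated_change - control_change

# Toy panel (made up): unit "A" adopts in 2002, unit "B" is never treated.
toy = pd.DataFrame({
    "state":   ["A", "A", "A", "A", "B", "B", "B", "B"],
    "year":    [2000, 2001, 2002, 2003] * 2,
    "outcome": [1.0, 1.0, 3.0, 3.0, 0.5, 0.5, 1.0, 1.0],
})
estimate = did_2x2(toy, "outcome", "state", "year", ["A"], ["B"], treat_start=2002)
print(estimate)  # 1.5 = (3.0 - 1.0) - (1.0 - 0.5)
```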

---

## Citation

This package is a Python port of the Stata command `bacondecomp` (v1.0.5, Goodman-Bacon, Goldring & Nichols, 2022). Please cite the original paper when using this package:

> Goodman-Bacon, Andrew. "Difference-in-differences with variation in treatment timing."
> *Journal of Econometrics* 225, no. 2 (2021): 254–277.
> https://doi.org/10.1016/j.jeconom.2021.03.014

The original working paper version:

> Goodman-Bacon, Andrew. "Difference-in-differences with variation in treatment timing."
> NBER Working Paper No. 25018, 2018.
> https://www.nber.org/papers/w25018

BibTeX:

```bibtex
@article{goodman-bacon2021,
  author  = {Goodman-Bacon, Andrew},
  title   = {Difference-in-differences with variation in treatment timing},
  journal = {Journal of Econometrics},
  volume  = {225},
  number  = {2},
  pages   = {254--277},
  year    = {2021},
  doi     = {10.1016/j.jeconom.2021.03.014}
}
```

The Stata implementation this port is based on:

> Goodman-Bacon, Andrew, Thomas Goldring, and Austin Nichols.
> `bacondecomp`: Stata module to perform Bacon decomposition of difference-in-differences estimation.
> Statistical Software Components S458676, Boston College Department of Economics, 2022.
> https://ideas.repec.org/c/boc/bocode/s458676.html

---

## Usage

### Basic (no controls)

```python
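import pandas as pd
from pybacondecomp import bacondecomp

# df: a strongly balanced pandas DataFrame with one row per unit-period
# and the columns named below (see "Data Requirements").
result = bacondecomp(
    df,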
    y     = "outcome",      # outcome variable
    tr    = "treat",        # binary treatment (0/1, weakly increasing)
    unit  = "state",        # panel unit identifier
    time  = "year",         # time variable
)

print(result.dd_estimate)   # overall TWFE estimate
print(result.summary)       # weighted average by comparison type
print(result.two_by_two)    # every 2×2 DiD comparison
```

### ddetail mode — split Early vs. Late

```python
from pybacondecomp import bacondecomp

result = bacondecomp(df, y="outcome", tr="treat",
                     unit="state", time="year",
                     ddetail=True)
```

### Controlled decomposition (FWL)

```python
from pybacondecomp import bacondecomp

result = bacondecomp(df, y="outcome", tr="treat",
                     unit="state", time="year",
                     x=["log_income", "unemp_rate"])
```
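
For readers unfamiliar with FWL, the following sketch illustrates the Frisch-Waugh-Lovell theorem itself on simulated data: the coefficient on a treatment in a regression with controls equals the coefficient obtained after residualizing the treatment on those controls. It is not the package's internal code, and the variable names and data-generating process are made up for illustration. In a TWFE setting the controls would also include unit and time fixed effects, which are omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))                              # two hypothetical controls
d = (x[:, 0] + rng.normal(size=n) > 0).astype(float)     # treatment correlated with x
y = 0.5 * d + x @ np.array([1.0, -2.0]) + rng.normal(size=n)

def ols(design, target):
    """Least-squares coefficients of target on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

controls = np.hstack([np.ones((n, 1)), x])               # intercept + controls

# Full regression: y on [d, intercept, x]; the first coefficient belongs to d.
beta_full = ols(np.hstack([d[:, None], controls]), y)[0]

# FWL: partial the controls out of d, then regress y on the residual.
d_resid = d - controls @ ols(controls, d)
beta_fwl = (d_resid @ y) / (d_resid @ d_resid)

assert np.isclose(beta_full, beta_fwl)
```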

### Parallel execution

```python
from pybacondecomp import bacondecomp

result = bacondecomp(df, y="outcome", tr="treat",
                     unit="state", time="year",
                     n_jobs=-1)   # use all cores
```

### Plot

```python
from pybacondecomp import bacon_plot

fig = bacon_plot(result)
fig.savefig("bacon.png", dpi=150)
```

### Stata-style interface

```python
from pybacondecomp import bacondecomp_stata

result = bacondecomp_stata(df, "outcome treat log_income unemp_rate",
                           unit="state", time="year")
```
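
The varlist string follows the Stata order shown in the Stata Correspondence table: outcome first, treatment second, any remaining names as controls. The helper below is a hypothetical sketch of that mapping, written for illustration; it is not the package's parser and may differ from how `bacondecomp_stata` handles edge cases.

```python
def split_varlist(varlist: str) -> dict:
    """Map a Stata-style 'y tr x1 x2 ...' string to keyword arguments."""
    names = varlist.split()
    if len(names) < 2:
        raise ValueError("varlist needs at least an outcome and a treatment variable")
    y, tr, *controls = names
    # No controls maps to x=None, matching the default in the API table.
    return {"y": y, "tr": tr, "x": controls or None}

kwargs = split_varlist("outcome treat log_income unemp_rate")
print(kwargs)
```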

---

## API Reference

### `bacondecomp(df, y, tr, unit, time, x=None, weights=None, ddetail=False, n_jobs=1, verbose=True)`

| Parameter | Type | Default | Description |
|---|---|---|---|
| `df` | `pd.DataFrame` | — | Strongly balanced panel |
| `y` | `str` | — | Outcome variable |
| `tr` | `str` | — | Binary treatment (0/1, weakly increasing) |
| `unit` | `str` | — | Panel unit identifier |
| `time` | `str` | — | Time variable |
| `x` | `list[str]` | `None` | Control variables (triggers FWL decomposition) |
| `weights` | `str` | `None` | Analytic weight variable |
| `ddetail` | `bool` | `False` | Split timing-group comparisons into Early/Late |
| `n_jobs` | `int` | `1` | Parallel workers (`-1` = all cores); requires `joblib` |
| `verbose` | `bool` | `True` | Print progress and summary |

**Returns:** `BaconResult` dataclass with fields:

| Field | Type | Description |
|---|---|---|
| `dd_estimate` | `float` | Overall TWFE DiD estimate |
| `se` | `float` | Standard error of TWFE estimate |
| `two_by_two` | `pd.DataFrame` | All 2×2 comparisons: `treated`, `control`, `estimate`, `weight`, `type` |
| `summary` | `pd.DataFrame` | Weighted averages by comparison type: `type`, `avg_estimate`, `total_weight` |
| `n_obs` | `int` | Number of observations |
| `n_groups` | `int` | Number of timing groups |
| `has_always` / `has_never` | `bool` | Whether always/never treated units are present |
| `within_estimate` | `float` | Within-group estimate (controlled only) |
| `elapsed_seconds` | `float` | Wall time |
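
The relationship between `two_by_two`, `summary`, and `dd_estimate` can be sketched as follows for the uncontrolled case. The frame below is hand-built with made-up values and only borrows the column names from the table above; it does not call the package, and the labels used in `treated` and `control` are assumptions.

```python
import pandas as pd

# Made-up 2×2 results using the documented column names.
two_by_two = pd.DataFrame({
    "treated":  ["2001", "2003", "2001", "2003"],
    "control":  ["2003", "2001", "never", "never"],
    "estimate": [0.20, 0.10, 0.30, 0.40],
    "weight":   [0.10, 0.30, 0.35, 0.25],
    "type":     ["Timing groups", "Timing groups",
                 "Never vs timing", "Never vs timing"],
})

# Weight-averaged estimate and total weight within each comparison type.
weighted = two_by_two.assign(contribution=two_by_two["estimate"] * two_by_two["weight"])
summary = weighted.groupby("type", sort=False)[["contribution", "weight"]].sum().reset_index()
summary["avg_estimate"] = summary["contribution"] / summary["weight"]
summary = summary.rename(columns={"weight": "total_weight"})[
    ["type", "avg_estimate", "total_weight"]
]

# The overall estimate is the weighted sum of all 2×2 estimates.
dd_estimate = (two_by_two["estimate"] * two_by_two["weight"]).sum()
print(summary)
print(dd_estimate)
```

In the controlled decomposition the within term carries its own weight, so the 2×2 rows alone would not be expected to add up to the overall estimate.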

### `bacon_plot(result, figsize=(8,5), show_dd_line=True, title=..., ax=None)`

Scatter plot of 2×2 estimates vs. weights, by comparison type.

---

## Data Requirements

- **Strongly balanced panel**: every unit observed at every time period.
- **Binary treatment**: `tr` ∈ {0, 1} in all periods.
- **Weakly increasing**: once treated, units remain treated (no reversals).
- No missing values on `y`, `tr`, `unit`, `time`, or any `x` variables.
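
These requirements can be checked before running the decomposition. The function below is a hypothetical pre-flight check written with plain pandas; it is not part of the package, and the package's own validation may behave differently.

```python
import pandas as pd

def check_panel(df, y, tr, unit, time, x=None):
    """Return a list of violated requirements (empty if all checks pass)."""
    cols = [y, tr, unit, time] + list(x or [])
    problems = []
    if df[cols].isna().any().any():
        problems.append("missing values")
    if df.duplicated([unit, time]).any():
        problems.append("duplicate unit-time rows")
    # Strongly balanced: every unit appears in every time period.
    n_periods = df[time].nunique()
    if (df.groupby(unit)[time].nunique() != n_periods).any():
        problems.append("panel is not strongly balanced")
    if not df[tr].isin([0, 1]).all():
        problems.append("treatment is not binary")
    # Weakly increasing: within a unit, treatment never drops from 1 to 0.
    ordered = df.sort_values([unit, time])
    if (ordered.groupby(unit)[tr].diff() < 0).any():
        problems.append("treatment reverses")
    return problems

good = pd.DataFrame({
    "state":   ["A", "A", "A", "B", "B", "B"],
    "year":    [2000, 2001, 2002] * 2,
    "treat":   [0, 1, 1, 0, 0, 0],
    "outcome": [1.0, 2.0, 2.5, 0.5, 0.6, 0.7],
})
reversed_panel = good.copy()
reversed_panel.loc[2, "treat"] = 0      # unit A switches back to untreated
unbalanced = good.drop(index=5)         # unit B loses its 2002 row

print(check_panel(good, "outcome", "treat", "state", "year"))
print(check_panel(reversed_panel, "outcome", "treat", "state", "year"))
print(check_panel(unbalanced, "outcome", "treat", "state", "year"))
```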

---

## Stata Correspondence

| Stata | Python |
|---|---|
| `bacondecomp y tr` | `bacondecomp(df, "y", "tr", unit, time)` |
| `bacondecomp y tr, ddetail` | `bacondecomp(..., ddetail=True)` |
| `bacondecomp y tr x1 x2` | `bacondecomp(..., x=["x1","x2"])` |
| `e(sumdd)` | `result.summary` |
| `stub*B`, `stub*S` | `result.two_by_two[["estimate","weight"]]` |

---

## Validation Against Stata

The following results were produced on a synthetic staggered DiD panel (50 states × 9 years, 4 treatment cohorts: 2001/2003/2005/2007, 14 never-treated states; seed = 42) and cross-validated against Stata's `bacondecomp` v1.0.5.

The data and Stata do-file are available in [`tests/stata_verify/`](tests/stata_verify/).

### Branch 1 — no controls, no ddetail

Overall DD: **Python = 0.165726 | Stata = 0.16572565**

| Comparison type | Python Beta | Python Weight | Stata Beta | Stata Weight |
|---|---|---|---|---|
| Timing groups | 0.172517 | 0.506592 | 0.1725168 | 0.5065923 |
| Never vs timing | 0.158753 | 0.493408 | 0.1587530 | 0.4934077 |

### Branch 2 — ddetail (no controls)

Overall DD: **Python = 0.165726 | Stata = 0.16572565**

All 12 timing-group 2×2 comparisons match to 6 decimal places. Summary:

| Comparison type | Python Beta | Python Weight | Stata Beta | Stata Weight |
|---|---|---|---|---|
| Early vs Late | 0.174499 | 0.204361 | 0.174499* | 0.204361* |
| Late vs Early | 0.171176 | 0.302231 | 0.171176* | 0.302231* |
| Never vs timing | 0.158753 | 0.493408 | 0.1587530 | 0.4934077 |

\* With `ddetail`, Stata reports individual dyad rows rather than these subtotals; the starred values aggregate those rows in the same way as the Python summary.

### Branch 3 — controlled (FWL, x = log income + unemployment rate)

Overall DD: **Python = 0.163864 | Stata = 0.163864**

| Comparison type | Python Beta | Python Weight | Stata Beta | Stata Weight |
|---|---|---|---|---|
| Timing groups | 0.172833 | 0.503206 | 0.172832956 | 0.5032063 |
| Never vs timing | 0.159164 | 0.489632 | 0.1591643654 | 0.4896315 |
| Within | −0.144980 | 0.007162 | −0.1449803561 | 0.0071621 |

All three branches replicate Stata output to at least 5 significant figures.
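
As a quick internal consistency check on the Python columns above, the weights in each branch should sum to one and the weighted sum of the betas should reproduce the overall DD, up to rounding in the six-decimal figures. The sketch below only re-uses the numbers printed in the tables; it does not run the package or Stata.

```python
# (beta, weight) pairs copied from the Python columns of the tables above.
branches = {
    "no controls": {
        "reported": 0.165726,
        "rows": [(0.172517, 0.506592), (0.158753, 0.493408)],
    },
    "ddetail": {
        "reported": 0.165726,
        "rows": [(0.174499, 0.204361), (0.171176, 0.302231), (0.158753, 0.493408)],
    },
    "controlled": {
        "reported": 0.163864,
        "rows": [(0.172833, 0.503206), (0.159164, 0.489632), (-0.144980, 0.007162)],
    },
}

results = {}
for name, branch in branches.items():
    total_weight = sum(w for _, w in branch["rows"])
    implied_dd = sum(beta * w for beta, w in branch["rows"])
    results[name] = (total_weight, implied_dd)
    # Tolerances allow for rounding of the tabulated values.
    assert abs(total_weight - 1.0) < 1e-5
    assert abs(implied_dd - branch["reported"]) < 1e-5
    print(f"{name}: weights sum to {total_weight:.6f}, implied DD = {implied_dd:.6f}")
```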

---

## License

MIT — see [LICENSE](LICENSE).
