Metadata-Version: 2.4
Name: pygsadf
Version: 2.0.2
Summary: Fast parallel GSADF bubble detection (PSY 2015) with wild bootstrap critical values
Author-email: Ali Madkhali <info@alimadkhali.com>
License: MIT
Project-URL: Homepage, https://alimadkhali.com
Project-URL: Repository, https://github.com/alixecon/pygsadf
Project-URL: Issues, https://github.com/alixecon/pygsadf/issues
Keywords: gsadf,bsadf,bubble-detection,psy-2015,econometrics,time-series,wild-bootstrap,right-tailed-adf,explosive-behaviour
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Office/Business :: Financial
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.22
Requires-Dist: pandas>=1.4
Requires-Dist: joblib>=1.1
Provides-Extra: fast
Requires-Dist: numba>=0.56; extra == "fast"
Provides-Extra: plot
Requires-Dist: matplotlib>=3.5; extra == "plot"
Provides-Extra: stats
Requires-Dist: statsmodels>=0.13; extra == "stats"
Provides-Extra: full
Requires-Dist: numba>=0.56; extra == "full"
Requires-Dist: matplotlib>=3.5; extra == "full"
Requires-Dist: statsmodels>=0.13; extra == "full"
Requires-Dist: tqdm>=4.60; extra == "full"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: Cython>=3.0; extra == "dev"
Requires-Dist: numba>=0.56; extra == "dev"
Requires-Dist: matplotlib>=3.5; extra == "dev"
Requires-Dist: statsmodels>=0.13; extra == "dev"
Requires-Dist: tqdm>=4.60; extra == "dev"
Dynamic: license-file

# pygsadf

**Fast parallel GSADF bubble detection (PSY 2015) with wild-bootstrap critical values.**

The first Python package to deliver production-grade, parallelised Generalised Sup ADF testing. Detects explosive bubbles in financial time series at 90%, 95%, and 99% confidence levels.

## Why pygsadf?

| Feature | pygsadf | R `exuber` | Stata `gsadf` | EViews |
|---|---|---|---|---|
| Parallel bootstrap | **Yes (all cores)** | Limited | No | No |
| CV 99% | **Yes** | Yes | No | No |
| Numba JIT kernel | **Yes** | C++ (Rcpp) | Mata | Proprietary |
| Pointwise BSADF CVs | **Yes** | Yes | No | No |
| CLI tool | **Yes** | No | No | No |
| One-line API | **Yes** | No | No | No |
| Free & open source | **MIT** | GPL | $$ | $$$ |

**Performance:** 1499 bootstrap replications on a T=3500 series run in ~12 minutes on 192 cores, versus ~8 hours sequentially (roughly a 40x speedup).

## Installation

```bash
pip install pygsadf                # core (NumPy + pandas + joblib)
pip install "pygsadf[fast]"        # + Numba JIT (10-50x faster)
pip install "pygsadf[full]"        # + Numba + matplotlib + statsmodels + tqdm
```

## Quick Start

```python
import numpy as np
import pandas as pd
import pygsadf

# Load prices and convert them to a log-price series
prices = pd.read_csv("eth_prices.csv", index_col=0, parse_dates=True)
log_prices = np.log(prices["close"])

# Run GSADF test (one line)
result = pygsadf.gsadf(log_prices)

# Results
print(result)                    # Full summary
result.reject_h0(0.95)           # True = bubble detected at 95%
result.reject_h0(0.99)           # True = bubble detected at 99%
result.bubbles                   # List of detected episodes (BubbleEpisode objects)
result.plot()                    # Publication-ready figure

# Save / load
result.to_pickle("gsadf_result.pkl")
loaded = pygsadf.GSADFResult.from_pickle("gsadf_result.pkl")
```

## Command Line

```bash
# Full run with 1499 bootstrap replications
pygsadf --csv data.csv --col log_price --B 1499 --out result.pkl --plot bsadf.png

# Quick test (B=199)
pygsadf --csv data.csv --col close --log --B 199

# Use fewer cores
pygsadf --csv data.csv --col log_price --n-jobs 8
```

## API Reference

### `pygsadf.gsadf(y, B=1499, ...)`

Main entry point. Accepts a NumPy array or a pandas Series.

**Parameters:**
- `y` — Log-price series (array or pd.Series with DatetimeIndex)
- `B` — Bootstrap replications (default 1499; use 199 for quick tests)
- `max_lag` — Maximum ADF augmentation lags (BIC selects optimal)
- `quantiles` — Confidence levels, default `(0.90, 0.95, 0.99)`
- `seed` — RNG seed for reproducibility
- `n_jobs` — Parallel workers (-1 = all cores)

**Returns:** `GSADFResult` with:
- `.gsadf_stat` — Scalar GSADF statistic
- `.cv` — Dict of critical values `{"90%": ..., "95%": ..., "99%": ...}`
- `.bsadf` — Full BSADF sequence (ndarray)
- `.bsadf_cv` — Pointwise CV sequences
- `.bubbles` — List of `BubbleEpisode` objects
- `.reject_h0(confidence)` — Boolean hypothesis test
- `.plot()` — Matplotlib figure
- `.summary()` — Formatted text output
- `.to_pickle() / .from_pickle()` — Serialisation
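
The `max_lag` description above says that BIC selects the lag order. The following is a minimal sketch of that idea in plain NumPy, not the package's Numba kernel. The helper names `adf_design` and `select_lag_bic` are hypothetical, and the sketch assumes every candidate lag is fitted on the same sample so that the BIC values are comparable.

```python
import numpy as np


def adf_design(y: np.ndarray, k: int, max_lag: int):
    """ADF regression of dy_t on [1, y_{t-1}, dy_{t-1}, ..., dy_{t-k}].

    The first max_lag differences are always dropped, so every k <= max_lag
    is estimated on the same observations.
    """
    dy = np.diff(y)
    target = dy[max_lag:]
    cols = [np.ones(target.size), y[max_lag:-1]]
    cols += [dy[max_lag - j : dy.size - j] for j in range(1, k + 1)]
    return np.column_stack(cols), target


def select_lag_bic(y: np.ndarray, max_lag: int) -> int:
    """Return the lag order in 0..max_lag with the smallest BIC (hypothetical helper)."""
    best_k, best_bic = 0, np.inf
    for k in range(max_lag + 1):
        X, target = adf_design(y, k, max_lag)
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        rss = float(np.sum((target - X @ beta) ** 2))
        n = target.size
        bic = n * np.log(rss / n) + X.shape[1] * np.log(n)
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k


# Toy series whose increments are serially correlated, so at least one lag helps
rng = np.random.default_rng(1)
e = rng.standard_normal(500)
dy = np.zeros(500)
for t in range(1, 500):
    dy[t] = 0.6 * dy[t - 1] + e[t]
y = np.cumsum(dy)
k = select_lag_bic(y, max_lag=4)
print("selected lag:", k)
```

On a pure random walk the same procedure often settles on lag 0, since the extra regressors then add penalty without reducing the residual sum of squares.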

### `pygsadf.wild_bootstrap_cv(y, r0, B=1499, ...)`

Low-level bootstrap function for custom workflows.

### `pygsadf.date_stamp_bubbles(bsadf, cv, dates, min_duration=5)`

Date-stamp explosive episodes from BSADF vs critical value sequences.
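
As an illustration of the date-stamping idea, here is a minimal NumPy sketch that finds runs where one sequence exceeds another for at least `min_duration` consecutive points. It is not the package's `date_stamp_bubbles` implementation: the function name `stamp_episodes` is hypothetical, and it returns plain index pairs rather than `BubbleEpisode` objects or dates.

```python
import numpy as np


def stamp_episodes(bsadf, cv, min_duration=5):
    """Return inclusive (start, end) index pairs where bsadf > cv
    for at least min_duration consecutive observations."""
    above = np.asarray(bsadf) > np.asarray(cv)      # comparisons with NaN are False
    above = np.atleast_1d(above)
    padded = np.concatenate([[False], above, [False]])
    # Positions where the exceedance flag switches on or off
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = changes[::2], changes[1::2] - 1
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s + 1 >= min_duration]


bsadf = np.array([0, 2, 2, 2, 0, 2, 0, 2, 2, 2, 2], dtype=float)
cv = np.ones_like(bsadf)
episodes = stamp_episodes(bsadf, cv, min_duration=3)
print(episodes)  # [(1, 3), (7, 10)]
```

The single-point exceedance at index 5 is dropped because it is shorter than `min_duration`. Mapping the index pairs back to calendar dates would be a lookup into the series index.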

## How It Works

1. **BSADF Computation** — For each endpoint, compute the supremum of right-tailed ADF statistics over all valid start points (PSY 2015, Section 3)
2. **GSADF** — The overall supremum of the BSADF sequence
3. **Wild Bootstrap** — Generate synthetic unit-root series using Rademacher weights, compute GSADF on each, take empirical quantiles as critical values
4. **Date-Stamping** — An episode is flagged where the BSADF sequence exceeds the pointwise 95% CV for at least `log(T)` consecutive observations
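
Steps 1 and 2 can be sketched in a few lines of NumPy. This is a deliberately slow, brute-force illustration under simplifying assumptions (a constant, no lagged differences, and a hypothetical `min_window` argument given in observations). It is not the package's optimised kernel, and the names `adf_stat` and `bsadf_sequence` are made up for the example.

```python
import numpy as np


def adf_stat(y: np.ndarray) -> float:
    """ADF t-ratio on the lagged level (constant, zero lagged differences)."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(dy.size), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (dy.size - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return float(beta[1] / se)


def bsadf_sequence(y: np.ndarray, min_window: int) -> np.ndarray:
    """For each endpoint, the sup of ADF statistics over all valid start points."""
    T = y.size
    out = np.full(T, np.nan)                    # undefined before the first full window
    for end in range(min_window, T + 1):        # the window is y[start:end]
        stats = [adf_stat(y[start:end]) for start in range(end - min_window + 1)]
        out[end - 1] = max(stats)
    return out


rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(120))         # toy random walk, i.e. the null
bsadf = bsadf_sequence(y, min_window=20)
gsadf = np.nanmax(bsadf)                        # GSADF is the sup of the BSADF sequence
print(f"GSADF = {gsadf:.3f}")
```

The double loop evaluates on the order of T² windows, which is why a compiled kernel and parallel bootstrap matter for long series.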

The bootstrap is embarrassingly parallel — each replication is independent with its own deterministic RNG seed, giving identical results whether run on 1 core or 192.
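
One way to get that order-independence is to derive a separate child seed per replication from a single root seed. The sketch below shows the pattern with NumPy's `SeedSequence.spawn`. It is an assumption about how such a scheme can be built, not a description of pygsadf's internals. To keep it short, a plain Dickey-Fuller t-ratio (`df_stat`) stands in for the GSADF statistic, and all function names are hypothetical.

```python
import numpy as np


def df_stat(y: np.ndarray) -> float:
    """Stand-in statistic: Dickey-Fuller t-ratio with a constant (not the full GSADF)."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(dy.size), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    se = np.sqrt(resid @ resid / (dy.size - 2) * np.linalg.inv(X.T @ X)[1, 1])
    return float(beta[1] / se)


def one_replication(dy: np.ndarray, y0: float, child_seed: np.random.SeedSequence) -> float:
    """Build one synthetic unit-root path from sign-flipped increments and score it."""
    rng = np.random.default_rng(child_seed)
    w = rng.choice([-1.0, 1.0], size=dy.size)             # Rademacher weights
    y_star = y0 + np.concatenate([[0.0], np.cumsum(w * dy)])
    return df_stat(y_star)


def bootstrap_cv(y, B=199, seed=0, quantiles=(0.90, 0.95, 0.99)):
    """Empirical quantiles of the bootstrap statistics, keyed like '95%'."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    children = np.random.SeedSequence(seed).spawn(B)      # one stream per replication
    stats = np.array([one_replication(dy, y[0], c) for c in children])
    return {f"{q:.0%}": float(np.quantile(stats, q)) for q in quantiles}


rng = np.random.default_rng(42)
y = np.cumsum(rng.standard_normal(150))
cv = bootstrap_cv(y, B=199, seed=7)
print(cv)
```

Because each replication depends only on its own child seed, the list comprehension could be handed to any parallel map (for example joblib) without changing the resulting critical values.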

## Architecture

```
pygsadf/
├── __init__.py          # Public API
├── core.py              # gsadf() + GSADFResult
├── adf.py               # Numba-JIT ADF kernel with BIC lag selection
├── bsadf.py             # GSADF + BSADF computation
├── bootstrap.py         # Parallel wild bootstrap
├── datestamp.py         # Bubble episode detection
└── cli.py               # Command-line interface
```

## Validation Against R `exuber`

pygsadf has been validated against the R [`exuber`](https://CRAN.R-project.org/package=exuber) package (v0.4.2+) on a synthetic series with a known embedded explosive regime (T=500, AR coefficient 1.05 at t=200–299).

**Apples-to-apples comparison (both fixed lag=1, B=999):**

| Metric | pygsadf | R exuber | Difference |
|---|---|---|---|
| **GSADF statistic** | **16.370363** | **16.370400** | **0.0002%** |
| BSADF correlation | — | — | **0.9987** |
| BSADF MAE | — | — | 0.033 |
| CV 90% (wild bootstrap) | 9.016 | 9.425 | 4.3% |
| CV 95% (wild bootstrap) | 10.270 | 10.760 | 4.6% |
| CV 99% (wild bootstrap) | 12.628 | 14.096 | 10.4% |
| Reject H₀ at 95% | Yes | Yes | **Match** |
| Reject H₀ at 99% | Yes | Yes | **Match** |

**Key findings:**

- **GSADF statistic agrees to 6 significant figures** (16.370363 vs 16.370400, a 0.0002% difference)
- **BSADF sequence correlation: 0.999** (MAE 0.033): the two time-varying sequences track each other closely over the whole sample
- **All rejection decisions agree** at every confidence level
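- **CV differences (4–10%) are expected:** Python and R use different RNG implementations for the bootstrap Rademacher draws, and the underlying distributions converge as B → ∞. The gap is largest at 99% because, with B=999, that quantile rests on only about the 10 largest draws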
- **pygsadf's default BIC lag selection** produces higher GSADF values than exuber's fixed lag=1 default, because BIC can select lag=0 for some windows, yielding sharper test statistics. This is a methodological choice, not a discrepancy — both are valid implementations of PSY (2015)
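
The percentages in the Difference column can be rechecked with a few lines of arithmetic. This sketch assumes the differences are relative to the exuber value, which is the convention that reproduces the figures in the table. The helper name `rel_diff_pct` is hypothetical.

```python
def rel_diff_pct(a: float, b: float) -> float:
    """Absolute difference between a and b as a percentage of the reference value b."""
    return abs(a - b) / abs(b) * 100.0


# (pygsadf, exuber) pairs copied from the comparison table
pairs = {
    "GSADF": (16.370363, 16.370400),
    "CV 90%": (9.016, 9.425),
    "CV 95%": (10.270, 10.760),
    "CV 99%": (12.628, 14.096),
}
diffs = {name: rel_diff_pct(py, r) for name, (py, r) in pairs.items()}
for name, d in diffs.items():
    print(f"{name}: {d:.4f}%")
```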

The full validation suite is in [`validation/`](validation/), including the synthetic dataset, both Python and R scripts, and an automated comparison tool. To reproduce:

```bash
cd validation/
python generate_test_data.py        # create validation_series.csv
python run_pygsadf_lag1.py           # pygsadf with fixed lag=1
Rscript run_exuber.R                 # R exuber (requires R + exuber package)
python compare_results.py            # side-by-side comparison
```

## Citation

If you use pygsadf in academic work, please cite:

```bibtex
@software{pygsadf,
  title  = {pygsadf: Fast Parallel GSADF Bubble Detection},
  author = {Madkhali, Ali},
  year   = {2025},
  url    = {https://github.com/alixecon/pygsadf},
}
```

And the original methodology:

```bibtex
@article{psy2015,
  title   = {Testing for Multiple Bubbles: Historical Episodes of
             Exuberance and Collapse in the {S\&P} 500},
  author  = {Phillips, Peter C.B. and Shi, Shuping and Yu, Jun},
  journal = {International Economic Review},
  volume  = {56},
  number  = {4},
  pages   = {1043--1078},
  year    = {2015},
}
```

## License

MIT
