Metadata-Version: 2.4
Name: eapctf
Version: 0.3.0
Summary: Empirical asset pricing toolkit: ML prediction, factor models, cross-sectional tests, SDF/GMM
Project-URL: Repository, https://github.com/NanyeonK/eapctf
Project-URL: Documentation, https://NanyeonK.github.io/eapctf/
License: MIT
License-File: LICENSE
Keywords: CTF,Fama-MacBeth,IPCA,asset pricing,cross-section of returns,empirical finance,factor models,machine learning,portfolio optimization
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.11
Requires-Dist: cvxpy>=1.4
Requires-Dist: ipca>=0.6.7
Requires-Dist: numpy>=1.26
Requires-Dist: pandas>=2.1
Requires-Dist: polars>=1.39.3
Requires-Dist: pyarrow>=23.0.1
Requires-Dist: pyportfolioopt>=1.5
Requires-Dist: scikit-learn>=1.4
Requires-Dist: scipy>=1.11
Requires-Dist: statsmodels>=0.14
Provides-Extra: conformal
Requires-Dist: mapie>=1.0.0; extra == 'conformal'
Provides-Extra: ctf
Requires-Dist: python-dotenv>=1.0; extra == 'ctf'
Requires-Dist: requests>=2.31; extra == 'ctf'
Requires-Dist: wrds>=3.2; extra == 'ctf'
Provides-Extra: dev
Requires-Dist: mapie>=1.0.0; extra == 'dev'
Requires-Dist: mypy>=1.8; extra == 'dev'
Requires-Dist: pandas-stubs; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: types-markdown; extra == 'dev'
Requires-Dist: types-requests; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-gen-files>=0.5; extra == 'docs'
Requires-Dist: mkdocs-literate-nav>=0.6; extra == 'docs'
Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
Requires-Dist: mkdocs-section-index>=0.3; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.24; extra == 'docs'
Provides-Extra: ml
Requires-Dist: lightgbm>=4.0; extra == 'ml'
Requires-Dist: torch>=2.1; extra == 'ml'
Description-Content-Type: text/markdown

# eapctf: Empirical Asset Pricing Toolkit

A Python package for empirical asset pricing research, covering factor construction,
cross-sectional tests, SDF/GMM estimation, ML-based return prediction, and portfolio
optimization.

## Installation

```bash
# with uv (recommended)
uv add eapctf

# with pip
pip install eapctf
```

Optional ML extras (PyTorch, LightGBM):

```bash
uv add "eapctf[ml]"
```

## Quick Start

All examples assume a long-format panel DataFrame `data` with columns
`date`, `permno`, `ret`, `mktcap`, `exchcd`, and any characteristic columns.
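
If you do not have a CRSP-style panel at hand, a synthetic one with the same column layout is enough to try the calls below. The following is a minimal sketch, not part of the package: the values are random, `bm` stands in for "any characteristic column", and the `exchcd` codes follow the CRSP convention (1 = NYSE, 2 = AMEX, 3 = NASDAQ). Whether `eapctf` expects particular dtypes or a specific date convention is an assumption left unchecked here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# 24 month-end dates and 50 hypothetical permnos
dates = pd.date_range("2020-01-01", periods=24, freq="MS") + pd.offsets.MonthEnd(0)
permnos = np.arange(10001, 10051)

# One row per (date, permno): the long format described above
index = pd.MultiIndex.from_product([dates, permnos], names=["date", "permno"])
data = index.to_frame(index=False)

n = len(data)
data["ret"] = rng.normal(0.01, 0.08, n)           # monthly return
data["mktcap"] = np.exp(rng.normal(6.0, 1.5, n))  # market capitalization
data["exchcd"] = rng.choice([1, 2, 3], size=n)    # CRSP exchange code
data["bm"] = np.exp(rng.normal(-0.5, 0.7, n))     # book-to-market (illustrative)
```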

### 1. Portfolio Sorting

```python
from eapctf.sorting import univariate_sort, bivariate_sort, ff3_factors, ff5_factors

# Decile sort on book-to-market with NYSE breakpoints (JKP micro-cap filter on by default)
result = univariate_sort(data, char_col="bm", n_portfolios=10, weighting="vw")
print(result.portfolio_returns)   # DataFrame: date x [port_1, ..., port_10, long_short]
print(result.portfolio_stats)     # mean, std, Sharpe, t-stat per decile

# Fama-French 3-factor model
ff3 = ff3_factors(data, bm_col="bm", rf_col="rf")
print(ff3.factors)   # DataFrame: date x [mkt_rf, smb, hml]

# Fama-French 5-factor model
ff5 = ff5_factors(data, bm_col="bm", op_col="op", inv_col="inv", rf_col="rf")
print(ff5.factors)   # DataFrame: date x [mkt_rf, smb, hml, rmw, cma]
```
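
For readers unfamiliar with NYSE breakpoints, the idea can be sketched for a single cross-section as follows. This is an illustration under simplifying assumptions, not the code behind `univariate_sort`: the function name `sort_one_period` is hypothetical, there is no micro-cap filter, and ties at a breakpoint go to the higher portfolio.

```python
import numpy as np

def sort_one_period(signal, ret, mktcap, is_nyse, n_portfolios=10):
    """Value-weighted portfolio returns for one date, using NYSE breakpoints."""
    # Interior quantiles computed from NYSE stocks only
    probs = np.linspace(0.0, 1.0, n_portfolios + 1)[1:-1]
    breaks = np.quantile(signal[is_nyse], probs)
    # Assign every stock (all exchanges) to a portfolio 0..n_portfolios-1
    port = np.searchsorted(breaks, signal, side="right")
    out = np.full(n_portfolios, np.nan)
    for p in range(n_portfolios):
        mask = port == p
        if mask.any():
            out[p] = np.average(ret[mask], weights=mktcap[mask])
    return out

rng = np.random.default_rng(0)
signal = rng.normal(size=500)
ret = 0.01 * signal                      # returns increase with the signal
mktcap = np.exp(rng.normal(6.0, 1.5, 500))
is_nyse = rng.random(500) < 0.35

port_returns = sort_one_period(signal, ret, mktcap, is_nyse)
long_short = port_returns[-1] - port_returns[0]
```

Because the breakpoints come from NYSE stocks alone, the extreme portfolios usually hold more (and smaller) non-NYSE names than a plain decile split would give them, which is why value weighting is the common companion choice.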

### 2. Fama-MacBeth Cross-Sectional Regression

```python
from eapctf.crosssection import fama_macbeth

# Characteristic-based FM: cross-sectional regression of ret on chars each period
# Stored as `fm` so the sort output `result` from section 1 stays available below
fm = fama_macbeth(data, char_cols=["bm", "size", "mom"])
print(fm.lambdas[["coef", "t_shanken"]])  # risk premia with Shanken-corrected t-stats
print(fm.r_squared)                        # time-series average cross-sectional R²

# Factor-based FM (two-pass): estimate betas first, then price them
fm_factor = fama_macbeth(data, factor_cols=["mkt_rf", "smb", "hml"])
```
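
The characteristic-based procedure can be sketched in a few lines of NumPy. This is a textbook illustration on a balanced panel, not the package's implementation: the function name is hypothetical, and the standard errors are the plain Fama-MacBeth ones with no Newey-West or Shanken adjustment.

```python
import numpy as np

def fama_macbeth_sketch(y, X):
    """y: (T, N) returns; X: (T, N, K) characteristics.

    Returns the time-series mean of the period-by-period slopes
    (intercept first) and their plain Fama-MacBeth t-statistics.
    """
    T, N, K = X.shape
    lambdas = np.empty((T, K + 1))
    for t in range(T):
        Z = np.column_stack([np.ones(N), X[t]])       # add intercept
        lambdas[t] = np.linalg.lstsq(Z, y[t], rcond=None)[0]
    mean = lambdas.mean(axis=0)
    se = lambdas.std(axis=0, ddof=1) / np.sqrt(T)
    return mean, mean / se

# Simulated panel with known premia (0.5 intercept, slopes 1.0 and -2.0)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200, 2))
y = 0.5 + X @ np.array([1.0, -2.0]) + rng.normal(0.0, 0.1, size=(60, 200))

premia, t_stats = fama_macbeth_sketch(y, X)
```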

### 3. Time-Series Alpha and GRS Test

```python
from eapctf.timeseries import time_series_alpha, grs_test

# `result` is the univariate_sort output from section 1.
# A single portfolio returns one AlphaResult; a DataFrame of several
# portfolios returns list[AlphaResult], so iterate over it.
alphas = time_series_alpha(result.portfolio_returns, ff3.factors)
for res in alphas:
    print(res.alpha, res.alpha_t)  # intercept and Newey-West t-statistic

# GRS test: are all portfolio alphas jointly zero?
grs = grs_test(result.portfolio_returns, ff3.factors)
print(grs.statistic, grs.p_value)
```
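
As a reference for what the GRS statistic measures, here is a minimal NumPy sketch of the finite-sample formula from Gibbons, Ross & Shanken (1989), using maximum-likelihood (divide by T) covariance estimates. It is not the package's `grs_test`; the function name and the simulated inputs are assumptions for illustration only.

```python
import numpy as np

def grs_statistic(R, F):
    """R: (T, N) excess returns; F: (T, K) factors.

    Returns (statistic, df1, df2); under the null of zero alphas and
    normal errors the statistic follows F(df1, df2).
    """
    T, N = R.shape
    K = F.shape[1]
    X = np.column_stack([np.ones(T), F])
    B = np.linalg.lstsq(X, R, rcond=None)[0]      # (K+1, N); row 0 = alphas
    alpha = B[0]
    resid = R - X @ B
    sigma = resid.T @ resid / T                    # ML residual covariance
    mu = F.mean(axis=0)
    omega = np.atleast_2d(np.cov(F, rowvar=False, bias=True))  # ML factor cov
    quad_alpha = alpha @ np.linalg.solve(sigma, alpha)
    quad_mu = mu @ np.linalg.solve(omega, mu)
    stat = (T - N - K) / N * quad_alpha / (1.0 + quad_mu)
    return stat, N, T - N - K

rng = np.random.default_rng(0)
F = rng.normal(0.005, 0.04, size=(240, 3))
betas = rng.normal(1.0, 0.3, size=(3, 5))
R_null = F @ betas + rng.normal(0.0, 0.02, size=(240, 5))  # zero alphas
R_alt = R_null + 0.01                                       # 1% alpha each

stat_null, df1, df2 = grs_statistic(R_null, F)
stat_alt, _, _ = grs_statistic(R_alt, F)
```

A p-value follows from the upper tail of the F distribution, for example `scipy.stats.f.sf(stat, df1, df2)`.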

### 4. SDF / GMM Estimation

```python
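import pandas as pd
from eapctf.sdf import gmm_estimate, hj_distance, hj_bounds

# port_ret: test-asset returns (date x portfolio) on the same dates as ff3.factors.
# It is not defined in this snippet and must be supplied by the caller.

# Two-step efficient GMM (default)
gmm = gmm_estimate(port_ret, ff3.factors, two_step=True)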
print(gmm.b)           # SDF loadings (K,)
print(gmm.t_stats)     # t-statistics
print(gmm.j_statistic, gmm.j_p_value)  # overidentification J-test

# HJ distance: pass a pre-computed SDF proxy (e.g., from GMM)
f_demeaned = ff3.factors.sub(ff3.factors.mean())
sdf_proxy = 1 - f_demeaned.values @ gmm.b
hj = hj_distance(port_ret, pd.Series(sdf_proxy, index=port_ret.index))
print(hj.distance)

# HJ volatility bounds
bounds = hj_bounds(port_ret)
```
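
The linear SDF proxy and the HJ distance used above can be written out directly. The sketch below assumes excess returns (so the target price is zero) and uses a hypothetical function name; it is meant to show the formula, not to reproduce `hj_distance` from the package.

```python
import numpy as np

def hj_distance_sketch(R, m):
    """R: (T, N) excess returns; m: (T,) SDF realizations.

    HJ distance = sqrt(g' E[RR']^{-1} g), with g = E[m R] the pricing errors.
    """
    T = R.shape[0]
    g = (R * m[:, None]).mean(axis=0)
    second_moment = R.T @ R / T
    quad = g @ np.linalg.solve(second_moment, g)
    return float(np.sqrt(max(quad, 0.0)))   # guard against rounding below zero

rng = np.random.default_rng(0)
F = rng.normal(0.005, 0.04, size=(240, 3))

# Linear SDF m = 1 - b'(f - mean(f)); this b prices the factors exactly in sample
mu = F.mean(axis=0)
b = np.linalg.solve(np.cov(F, rowvar=False, bias=True), mu)
m = 1.0 - (F - mu) @ b

dist_exact = hj_distance_sketch(F, m)             # factors priced by their own SDF
dist_const = hj_distance_sketch(F, np.ones(240))  # constant SDF misprices them
```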

### 5. ML Out-of-Sample Return Prediction

```python
from eapctf.predict import expanding_window_oos, make_predictor

model = make_predictor("lasso", alpha=0.01)
oos = expanding_window_oos(
    data,
    char_cols=["bm", "size", "mom", "op", "inv"],
    models=[model],
    train_min_periods=240,   # at least 240 periods of training data (20 years if monthly)
)
print(oos.oos_r2)              # OOS R² averaged across models (GKX 2020)
print(oos.oos_r2_by_model)     # OOS R² per model
```
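
The out-of-sample R² of Gu, Kelly & Xiu (2020) benchmarks forecasts against zero rather than against the historical mean, so the denominator is not demeaned. A minimal sketch with a hypothetical function name follows; how `expanding_window_oos` pools observations across dates and models is not shown here.

```python
import numpy as np

def oos_r2(actual, predicted):
    """1 - sum((r - r_hat)^2) / sum(r^2), pooled over all observations."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 1.0 - np.sum((actual - predicted) ** 2) / np.sum(actual ** 2)

actual = [0.02, -0.01, 0.03]
r2_model = oos_r2(actual, [0.01, 0.00, 0.02])   # 1 - 3/14
r2_zero = oos_r2(actual, [0.0, 0.0, 0.0])       # zero forecast scores exactly 0
```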

### 6. Portfolio Optimization

```python
from eapctf.sorting import long_short_portfolio
from eapctf.portfolio import mean_variance_weights, hrp_weights

# Long-short portfolio from a signal
ls = long_short_portfolio(data, signal_col="bm", n_portfolios=10, weighting="vw")
print(ls.returns["long_short"])   # long-short return series
print(ls.metrics)                 # mean, std, Sharpe, etc.

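# Mean-variance optimization
# (mu: expected returns, sigma: covariance matrix; both supplied by the caller)
weights = mean_variance_weights(
    expected_returns=mu,
    cov_matrix_input=sigma,
    method="max_sharpe",
)

# Hierarchical Risk Parity
# (port_ret: asset or portfolio returns, date x asset; supplied by the caller)
weights_hrp = hrp_weights(returns_data=port_ret)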
```
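
For intuition about the max-Sharpe objective, the unconstrained tangency portfolio has a closed form proportional to Σ⁻¹μ. The sketch below is the textbook solution with shorting allowed and `mu` read as expected excess returns; it is not what `mean_variance_weights` does internally, and any constraints the package applies are not reflected.

```python
import numpy as np

def tangency_weights(mu, sigma):
    """Unconstrained max-Sharpe weights, normalized to sum to one."""
    raw = np.linalg.solve(sigma, mu)   # proportional to inv(sigma) @ mu
    return raw / raw.sum()

mu = np.array([0.06, 0.04])              # expected excess returns
sigma = np.diag([0.04, 0.01])            # uncorrelated assets, vols 20% and 10%
w = tangency_weights(mu, sigma)          # raw = [1.5, 4.0] before normalizing
```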

## CTF (Common Task Framework)

`eapctf.ctf` provides a local replication pipeline for the [Common Task Framework](https://jkpfactors.com)
introduced in Hoberg, Jensen, Kelly & Pedersen (2025). The CTF evaluates portfolio strategies on a
shared holdout test set across 402 firm characteristics (153 JKP + 249 additional GFD factors).

### Pipeline

```python
import pandas as pd
from eapctf.ctf import run_local, compute_metrics, validate

# 1. Run a CTF model script locally
weights = run_local("models/my-model.py", data_dir="data/ctf/")

# 2. Evaluate performance (10% vol-targeting matches CTF server methodology)
daily_ret = pd.read_parquet("data/ctf/ctff_daily_ret.parquet")
metrics = compute_metrics(weights, daily_ret, vol_target=0.10)
print(metrics)

# 3. Check compliance before submission
report = validate("models/my-model.py", data_dir="data/ctf/")
print(report)
```
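
Volatility targeting in step 2 can be pictured as a single ex-post rescaling of the return series. The sketch below uses only the standard library and rests on two assumptions that are not confirmed by the CTF documentation: a constant full-sample scale factor and 252 trading days per year. The function names are hypothetical.

```python
import math
import random
import statistics

def scale_to_vol(daily_returns, target=0.10, periods_per_year=252):
    """Rescale returns so their annualized volatility equals `target`."""
    realized = statistics.stdev(daily_returns) * math.sqrt(periods_per_year)
    scale = target / realized
    return [r * scale for r in daily_returns]

def annualized_vol(returns, periods_per_year=252):
    return statistics.stdev(returns) * math.sqrt(periods_per_year)

def annualized_sharpe(returns, periods_per_year=252):
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)

gen = random.Random(0)
raw = [gen.gauss(0.0005, 0.02) for _ in range(2520)]   # ten years of daily returns
scaled = scale_to_vol(raw)
```

Under a constant scale factor the Sharpe ratio is unchanged; only return, volatility, and drawdown figures move. A rolling or ex-ante scaling rule would behave differently.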

### Starting a New Model

```bash
cp reference/template-ctf-model.py models/my-model.py
# Edit models/my-model.py — replace TODO sections with your implementation
```

The template provides a complete rolling-window train/predict loop with rank-normalized features,
OLS prediction, and z-score portfolio weights. Replace `train_model()` / `predict_returns()` /
`construct_weights()` with your approach; the rest of the pipeline stays the same.
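
The three building blocks named above can be sketched on a single train/test split. This is not the template's code: the helper names and signatures are hypothetical, the data are simulated, and the rolling-window loop is omitted.

```python
import numpy as np

def rank_to_centered_unit(x):
    """Rank-normalize each column of x (N stocks x K features) to [-0.5, 0.5]."""
    ranks = x.argsort(axis=0).argsort(axis=0)   # 0..N-1 within each column
    return ranks / (x.shape[0] - 1) - 0.5

def fit_ols(X, y):
    """Least-squares coefficients, intercept first."""
    Z = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Z, y, rcond=None)[0]

def predict_ols(beta, X):
    return beta[0] + X @ beta[1:]

def zscore_to_weights(pred):
    """Cross-sectional z-scores, scaled so absolute weights sum to one."""
    z = (pred - pred.mean()) / pred.std()
    return z / np.abs(z).sum()

rng = np.random.default_rng(0)
X_train = rank_to_centered_unit(rng.normal(size=(500, 3)))
y_train = X_train @ np.array([0.02, -0.01, 0.0]) + rng.normal(0.0, 0.05, 500)
X_test = rank_to_centered_unit(rng.normal(size=(100, 3)))

beta = fit_ols(X_train, y_train)
weights = zscore_to_weights(predict_ols(beta, X_test))
```

Because z-scores have zero mean, the resulting weights are dollar-neutral: long and short legs each carry half of the gross exposure.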

### Replication Results

The table below compares `eapctf.ctf.compute_metrics()` output with known CTF leaderboard entries.
Local evaluation with `vol_target=0.10` reproduces the leaderboard Sharpe ratio to within 0.5% for
IPCA and to within about 12% for 1/N. All returns are scaled to 10% annualized volatility before
computing statistics (CTF standard).

| Model | Sharpe (local) | Sharpe (CTF) | Diff % | Annual Return | Vol | Max Drawdown |
|---|---|---|---|---|---|---|
| 1/N (equal weight) | 0.551 | 0.491 | +12.2% | 5.13% | 10.00% | -30.43% |
| IPCA (KPS 2019) | **1.939** | **1.948** | **-0.5%** | 20.64% | 10.00% | -10.30% |

The IPCA replication uses the parallelized benchmark script at `reference/benchmark-ipca-pf.py`
(n_factors=5, window=120 months, 402 features, 408 test dates). Its Sharpe ratio is within
0.5% of the CTF leaderboard value. The larger 1/N gap (+12.2%) reflects differences in stock
universe filtering conventions between local evaluation and the CTF server.
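
The `Diff %` column is the local Sharpe ratio relative to the leaderboard value. The arithmetic can be checked with a short snippet; the helper name is hypothetical and the inputs are the Sharpe ratios from the table.

```python
def pct_diff(local, reference):
    """Relative difference of `local` versus `reference`, in percent."""
    return (local / reference - 1.0) * 100.0

rows = {
    "1/N (equal weight)": (0.551, 0.491),
    "IPCA (KPS 2019)": (1.939, 1.948),
}
for name, (local, ctf) in rows.items():
    print(f"{name}: {pct_diff(local, ctf):+.1f}%")
# 1/N (equal weight): +12.2%
# IPCA (KPS 2019): -0.5%
```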

## Module Overview

| Module | Key Functions | Reference |
|--------|---------------|-----------|
| `eapctf.ctf` | `run_local`, `compute_metrics`, `validate`, `fetch_leaderboard`, `pipeline`, `download_ctf_data` | Hoberg, Jensen, Kelly & Pedersen (2025) |
| `eapctf.sorting` | `univariate_sort`, `bivariate_sort`, `char_factor`, `ff3_factors`, `ff5_factors`, `hxz4_factors`, `sy4_factors`, `mom_factor`, `long_short_portfolio` | Fama & French (1993, 2015); Hou, Xue & Zhang (2015); Stambaugh & Yuan (2017) |
| `eapctf.crosssection` | `fama_macbeth`, `cs_regression`, `multiple_testing_correction` | Fama & MacBeth (1973); Shanken (1992) |
| `eapctf.timeseries` | `time_series_alpha`, `grs_test`, `spanning_test`, `rolling_beta` | Gibbons, Ross & Shanken (1989) |
| `eapctf.sdf` | `gmm_estimate`, `hj_distance`, `hj_bounds`, `pricing_errors` | Hansen (1982); Hansen & Jagannathan (1991) |
| `eapctf.predict` | `expanding_window_oos`, `make_predictor`, `char_prep` | Gu, Kelly & Xiu (2020) |
| `eapctf.portfolio` | `mean_variance_weights`, `hrp_weights`, `black_litterman_weights`, `ParametricPolicy`, `evaluate_portfolio` | Markowitz (1952); Lopez de Prado (2016) |
| `eapctf.utils` | `rank_normalize`, `classify`, `EAPPanel`, `JKP_153`, `load_gfd_chars` | — |

## Development

```bash
# install with dev dependencies
uv sync --dev

# run tests
uv run python -m pytest

# lint and type check
uv run ruff check eapctf/
uv run mypy eapctf/ --ignore-missing-imports
```

## License

MIT
