Metadata-Version: 2.4
Name: ere
Version: 0.1.0
Summary: Point-in-time data operations for Polars — prevent look-ahead bias in quantitative research
Keywords: polars,point-in-time,backtesting,look-ahead-bias,quantitative-finance
Author: Carl Sandström
Author-email: Carl Sandström <csandst@kth.se>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Typing :: Typed
Requires-Dist: polars>=1.0
Requires-Python: >=3.12
Project-URL: Homepage, https://github.com/csandstrom/ere
Project-URL: Repository, https://github.com/csandstrom/ere
Project-URL: Issues, https://github.com/csandstrom/ere/issues
Description-Content-Type: text/markdown

# ere

[![CI](https://github.com/csandstrom/ere/actions/workflows/test.yml/badge.svg)](https://github.com/csandstrom/ere/actions/workflows/test.yml)
[![PyPI](https://img.shields.io/pypi/v/ere.svg)](https://pypi.org/project/ere/)
[![Python](https://img.shields.io/pypi/pyversions/ere.svg)](https://pypi.org/project/ere/)

**Point-in-time data operations for Polars.** Prevent look-ahead bias in quantitative research and backtesting.

## The problem

When you join quarterly earnings onto daily prices with a naive merge, every row gets the *final* value, including restated figures that had not yet been published on that row's date. Your backtest sees the future, and its results are meaningless.

Standard tools have no concept of *when* a data point became known. **ere** fixes this by tracking two dates per row:

- **ref_date**: when the event occurred (e.g. fiscal quarter end)
- **knowledge_date**: when the data point actually became available to you (e.g. SEC filing date)

After any `as_of()` call, every row satisfies `knowledge_date <= query_date`. No future information leaks.

## Install

```bash
pip install ere
```

Requires Python >= 3.12 and [Polars](https://pola.rs) >= 1.0.

## Quick start

```python
from datetime import date
import polars as pl
import ere

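# prices_df and earnings_df are assumed to be already-loaded Polars DataFrames
# containing the columns named in the specs below.
PRICES = ere.TemporalSpec(ref_date="date", knowledge_date="date", entity="ticker")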
EARNINGS = ere.TemporalSpec(
    ref_date="fiscal_quarter_end",
    knowledge_date="filing_date",
    entity="ticker",
)

ere.validate(prices_df, spec=PRICES)
ere.validate(earnings_df, spec=EARNINGS)

# What did we know on Feb 1?
snap = ere.as_of(earnings_df, query_date="2025-02-01", spec=EARNINGS)
# AAPL shows original filing (1.50), not the restatement filed Feb 20 (1.42)

snap_later = ere.as_of(earnings_df, query_date="2025-03-01", spec=EARNINGS)
# now AAPL shows 1.42, the restated value

# Multi-source snapshot at one date
combined = ere.as_of(
    [(prices_df, PRICES), (earnings_df, EARNINGS)],
    query_date="2025-02-15",
)

# Multi-date snapshots for backtesting
rebalance_dates = [date(2025, 1, 31), date(2025, 2, 15), date(2025, 2, 28)]
snapshots = ere.panel(prices_df, spec=PRICES, dates=rebalance_dates, lookback=252)

# Post-hoc audit: assert no row in `snap` was known after its query date
ere.audit(snap, as_of_date="2025-02-01", knowledge_date_col="filing_date")
```

## API

| Function | Purpose |
|---|---|
| `as_of(df, query_date, spec)` \* | Point-in-time snapshot, filtered to what was known at `query_date` |
| `panel(df, spec, dates)` \* | Multi-date snapshots; returns `{date: DataFrame}` |
| `panel_iter(df, spec, dates)` \* | Iterator over `panel()`, yielding `(date, DataFrame)` |
| `panel_lazy(df, spec, dates)` \* | Fully lazy version: a single inequality join under the hood, with rows tagged by query date |
| `panel_map(df, spec, dates, fn)` \* | Apply a Python function to each snapshot (for logic that doesn't fit as Polars expressions) |
| `validate(df, spec)` | Check temporal structure (columns, dtypes, no nulls, no time travel) |
| `audit(df, as_of_date)` | Assert no look-ahead leakage in a result frame |
| `tag_knowledge_date(df, ...)` | Add a knowledge_date column from a fixed lag |
| `deduplicate(df, spec)` | Remove duplicate versions |
| `align(sources, date)` | Per-source PIT snapshots at one date |

\* Also accepts a list/dict of `(frame, spec)` sources; see *Multi-source snapshots* below.

## Key concepts

**TemporalSpec** binds your column names to temporal roles. Define it once per dataset:

```python
SPEC = ere.TemporalSpec(
    ref_date="fiscal_quarter_end",  # when the event happened
    knowledge_date="filing_date",   # when it became known
    entity="ticker",                # optional grouping key
)
```

**Restatements** are handled automatically. If a data point is published multiple times (same entity + ref_date, different knowledge_dates), `as_of()` returns the latest version that was known at the query date.

**Multi-source snapshots.** Pass a list (or dict) of `(frame, spec)` pairs anywhere a single `(frame, spec)` goes. Each source is PIT-aligned independently, then asof-joined into one frame:

```python
sources = [(prices_df, PRICES), (earnings_df, EARNINGS)]

snapshot = ere.as_of(sources, "2025-02-15")            # joined point-in-time view
panels   = ere.panel(sources, dates=rebalance_dates)   # multi-date snapshots

# Entity-less specs (no `entity=`) broadcast across entities automatically:
MACRO = ere.TemporalSpec(ref_date="release_period", knowledge_date="release_date")
with_macro = ere.as_of([*sources, (macro_df, MACRO)], "2025-02-15")

# Want them separate instead of merged? Use align:
per_source = ere.align(sources, "2025-02-15")          # -> [prices_snap, earn_snap]
```

## Examples

Runnable end-to-end scripts live in [`examples/`](examples/):

- [`quickstart.py`](examples/quickstart.py) — end-to-end backtest walkthrough: restatements, naive-join leakage, multi-date rebalancing.
- [`multi_source.py`](examples/multi_source.py) — list/dict sources, entity-less broadcasting, `panel`/`panel_iter`/`align`.
- [`panel_lazy.py`](examples/panel_lazy.py) — benchmarks `as_of()` loop vs `panel_lazy`/`panel_map` on second-granularity orderbook data.

Run any of them with `uv run python examples/<name>.py`.

## License

MIT
