Metadata-Version: 2.4
Name: nfl-bigquery
Version: 0.1.0
Summary: NFL play-by-play → BigQuery: idempotent nflverse ingestion + LLM-friendly docs + verification
Project-URL: Homepage, https://github.com/blahovec-labs/nfl-bigquery
Project-URL: Issues, https://github.com/blahovec-labs/nfl-bigquery/issues
Project-URL: Changelog, https://github.com/blahovec-labs/nfl-bigquery/blob/main/CHANGELOG.md
Author: Jason Blahovec
License: MIT
License-File: LICENSE
Keywords: bigquery,data-engineering,nfl,nflfastr,nflverse,play-by-play
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Requires-Python: >=3.11
Requires-Dist: db-dtypes<2.0,>=1.0
Requires-Dist: google-cloud-bigquery<4.0,>=3.20
Requires-Dist: nflreadpy
Requires-Dist: pandas<3.0,>=2.0
Requires-Dist: pyarrow<19.0,>=15.0
Provides-Extra: dev
Requires-Dist: build>=1.2.0; extra == 'dev'
Requires-Dist: pyright>=1.1.380; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.6.0; extra == 'dev'
Description-Content-Type: text/markdown

# nfl-bigquery

NFL play-by-play → BigQuery. Idempotent nflverse ingestion + LLM-friendly docs + verification.
Fourth in the `*-bigquery` family (`statcast-bigquery`, `nhl-bigquery`, `yfinance-bigquery`).

## Install
```bash
pip install nfl-bigquery
gcloud auth application-default login
```

## Sync
```bash
# Backfill 1999–present into a shared dataset (idempotent, resumable)
nfl-bigquery sync --seasons 1999-2025 --resume \
    --pbp-table myproject.mydataset.nfl_plays
# Player dimension
nfl-bigquery players --players-table myproject.mydataset.dim_players
```

Writes `nfl_plays`, `games`, `weekly_player_stats` (season-chunked DELETE→INSERT) and
`dim_players` (MERGE on `gsis_id`), plus a `_nfl_ingest_runs` log for `--resume`.

## Docs & verify
```bash
nfl-bigquery docs --format llm
nfl-bigquery docs --format bq-apply --table myproject.mydataset.nfl_plays
nfl-bigquery verify --source internal --season 2024 \
    --pbp-table myproject.mydataset.nfl_plays
```

The `verify` command runs 8 checks: duplicate-play detection, orphan-play detection, and
plays→weekly reconciliation for **passing/rushing/receiving yards and TDs** (6 metrics).
The `--tolerance` flag (default 0.0) sets the maximum allowed absolute aggregate diff;
lateral plays cause small legitimate yard diffs, so consider `--tolerance 5` for yards.

## Data source
All data from [nflverse](https://nflreadpy.nflverse.com/) via `nflreadpy`. Play-by-play is
nflfastR's 372-column feed (EPA/WPA/air-yards/CPOE modeled), back to 1999.

MIT licensed.
