Metadata-Version: 2.4
Name: nhl-bigquery
Version: 0.1.1
Summary: NHL play-by-play → BigQuery: idempotent ingestion + LLM-friendly docs + verification
Project-URL: Homepage, https://github.com/blahovec-labs/nhl-bigquery
Project-URL: Issues, https://github.com/blahovec-labs/nhl-bigquery/issues
Project-URL: Changelog, https://github.com/blahovec-labs/nhl-bigquery/blob/main/CHANGELOG.md
Author: Jason Blahovec
License: MIT
License-File: LICENSE
Keywords: bigquery,data-engineering,hockey,nhl,play-by-play
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.11
Requires-Dist: db-dtypes<2.0,>=1.0
Requires-Dist: google-cloud-bigquery<4.0,>=3.20
Requires-Dist: pandas<3.0,>=2.0
Requires-Dist: pyarrow<19.0,>=15.0
Requires-Dist: requests<3.0,>=2.31
Provides-Extra: dev
Requires-Dist: build>=1.2.0; extra == 'dev'
Requires-Dist: pyright>=1.1.380; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: responses>=0.25; extra == 'dev'
Requires-Dist: ruff>=0.6.0; extra == 'dev'
Description-Content-Type: text/markdown

# nhl-bigquery

Idempotent ingestion of NHL play-by-play data into BigQuery, with on-ice player arrays merged from shift charts, first-class documentation for SQL/LLM agents, and verification against the NHL public API.

## Install

    pip install nhl-bigquery

## Quickstart

    gcloud auth application-default login
    nhl-bigquery sync \
        --start 2024-10-01 --end 2025-06-30 \
        --plays-table myproject.mydataset.nhl_plays

This writes six tables to `myproject.mydataset.*`:

- `nhl_plays` — one row per event, with `home_on_ice_ids` / `away_on_ice_ids` arrays
- `games` — schedule dimension
- `game_officials` — referees + linesmen per game
- `boxscore_stats` — per-player per-game stats
- `shifts` — per-shift per-player intervals
- `standings` — daily team-standings snapshots
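
As a rough illustration of how the on-ice arrays might be queried, the sketch below builds a BigQuery Standard SQL statement that counts how many events each player was on the ice for. It assumes the array elements are player identifiers and uses only the `home_on_ice_ids` / `away_on_ice_ids` columns named above. The helper names `build_on_ice_count_sql` and `run_on_ice_count` are hypothetical and are not part of `nhl-bigquery`.

```python
import re

# Columns documented in the table list above.
ON_ICE_COLUMNS = {"home_on_ice_ids", "away_on_ice_ids"}

# Loose check for a fully qualified `project.dataset.table` identifier.
TABLE_ID = re.compile(r"[A-Za-z0-9_\-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+")


def build_on_ice_count_sql(table: str, column: str = "home_on_ice_ids") -> str:
    """Return SQL counting events per on-ice player (hypothetical helper)."""
    if not TABLE_ID.fullmatch(table):
        raise ValueError(f"expected project.dataset.table, got {table!r}")
    if column not in ON_ICE_COLUMNS:
        raise ValueError(f"unknown on-ice column: {column!r}")
    # UNNEST flattens the array so each on-ice player yields one row per event.
    return (
        "SELECT player_id, COUNT(*) AS events_on_ice\n"
        f"FROM `{table}`, UNNEST({column}) AS player_id\n"
        "GROUP BY player_id\n"
        "ORDER BY events_on_ice DESC\n"
        "LIMIT 10"
    )


def run_on_ice_count(table: str) -> list[tuple]:
    """Run the query; needs google-cloud-bigquery and default credentials."""
    from google.cloud import bigquery  # imported lazily so the builder works offline

    client = bigquery.Client()
    rows = client.query(build_on_ice_count_sql(table)).result()
    return [(row.player_id, row.events_on_ice) for row in rows]
```

Calling `run_on_ice_count("myproject.mydataset.nhl_plays")` after the Quickstart sync should return up to ten `(player_id, events_on_ice)` pairs, provided the assumption about the array contents holds.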

## Backfill

Backfill more than 15 seasons in resumable monthly chunks:

    nhl-bigquery sync \
        --start 2010-10-01 --end 2026-05-11 \
        --chunk-by month --resume \
        --plays-table myproject.mydataset.nhl_plays

`--resume` skips chunks already recorded as `success` or `empty` in
`<dataset>._nhl_ingest_runs`. Re-running with the same `--chunk-by` value
is safe. Changing `--chunk-by` between runs re-processes the range,
because a chunk is skipped only when it matches a recorded chunk exactly.
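
The resume behaviour described above can be pictured with a minimal sketch. It assumes chunks are inclusive date ranges split on calendar-month boundaries and that a chunk counts as finished only when both endpoints match a recorded run. The functions `month_chunks` and `pending_chunks` are hypothetical illustrations, not the package's actual implementation.

```python
from datetime import date, timedelta

Chunk = tuple[date, date]  # inclusive (start, end)


def month_chunks(start: date, end: date) -> list[Chunk]:
    """Split the inclusive range [start, end] into calendar-month chunks."""
    chunks: list[Chunk] = []
    cur = start
    while cur <= end:
        # First day of the month after `cur`.
        if cur.month == 12:
            next_month = date(cur.year + 1, 1, 1)
        else:
            next_month = date(cur.year, cur.month + 1, 1)
        # A chunk ends on the last day of its month or at `end`, whichever is earlier.
        chunks.append((cur, min(next_month - timedelta(days=1), end)))
        cur = next_month
    return chunks


def pending_chunks(chunks: list[Chunk], finished: set[Chunk]) -> list[Chunk]:
    """Keep only the chunks whose exact boundaries are not already recorded."""
    return [chunk for chunk in chunks if chunk not in finished]


if __name__ == "__main__":
    planned = month_chunks(date(2024, 10, 1), date(2024, 12, 15))
    finished = {planned[0]}  # pretend October already ran successfully
    for chunk_start, chunk_end in pending_chunks(planned, finished):
        print(chunk_start, chunk_end)
```

Under these assumptions, a run that later uses a different chunk size produces ranges with different endpoints, so none of them match the recorded monthly chunks and everything is processed again.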

## Documentation

    nhl-bigquery docs --format llm > NHL_FOR_LLMS.md
    nhl-bigquery docs --format bq-apply --table myproject.mydataset.nhl_plays

Five formats: `bq-apply` (push descriptions to BigQuery), `llm` (one
Markdown file packing every column for LLM context), `dictionary`
(JSON rows for a data dictionary table), `markdown` (human reference),
and `dbt` (dbt YAML schema stub).

## Verification

    nhl-bigquery verify --source internal \
        --aggregation internal-consistency \
        --table myproject.mydataset.nhl_plays

    nhl-bigquery verify --source nhl-api \
        --aggregation team-season --metric all --season 2024 \
        --table myproject.mydataset.nhl_plays

MIT licensed. This software does not include or distribute NHL data.
