Metadata-Version: 2.4
Name: frontierlag
Version: 0.1.0
Summary: Audit the capability gap between frontier AI models and the models tested in academic papers.
Author-email: David Gringras <davidgringras@hsph.harvard.edu>
License: MIT License
        
        Copyright (c) 2026 David Gringras
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/davidgringras/frontierlag
Project-URL: Documentation, https://davidgringras.github.io/frontierlag/
Project-URL: Paper, https://arxiv.org/abs/TBD
Project-URL: Dataset, https://osf.io/TBD
Project-URL: Issues, https://github.com/davidgringras/frontierlag/issues
Keywords: AI evaluation,bibliometric audit,LLM,frontier models,research methodology
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28
Requires-Dist: pyyaml>=6.0
Provides-Extra: test
Requires-Dist: pytest>=7.0; extra == "test"
Requires-Dist: pytest-cov>=4.0; extra == "test"
Provides-Extra: dev
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Dynamic: license-file

# frontierlag

**Audit the capability gap between frontier AI and the models tested in academic papers.**

Paste a DOI. Get a report: what model the paper tested, where it sat relative to the frontier at the evaluation date, what configuration the paper disclosed, and whether the paper fails all three audit dimensions at the pre-registered thresholds from the companion study.

```
$ pip install frontierlag
$ frontierlag check 10.1038/s41591-024-03425-5
```

This package is a companion to the paper *Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluation* (Gringras, 2026; arXiv:TBD). The audit dataset embedded here is the frozen snapshot used in that paper; quarterly refreshes are shipped as point releases.

---

## What it does

`frontierlag` measures three dimensions of the gap between what published AI evaluations test and what the frontier can do at the same moment:

| Dimension | What it captures |
|---|---|
| **Capability gap** | ECI points and calendar months between the tested model and the frontier at evaluation date. |
| **Tier gap** | Number of same-family siblings with higher ECI that were already available at evaluation date. |
| **Configuration** | Fraction of applicable disclosure items (reasoning mode, tools, scaffolding, sampling) the paper reports; items match the VERSIO-AI v1 checklist. |

A paper that fails all three at the pre-registered thresholds is flagged as a **compound failure**. See `frontierlag/config.yaml` for the thresholds (they mirror the paper's pre-registration).
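
For orientation, here is a minimal sketch of how the compound rule composes. It assumes the thresholds sit under a top-level `thresholds:` key with the field names shown; the authoritative key names and values are whatever `frontierlag/config.yaml` actually contains.

```python
from importlib import resources

import yaml

# Load the packaged config (resource path assumed; see frontierlag/config.yaml).
cfg = yaml.safe_load(
    resources.files("frontierlag").joinpath("config.yaml").read_text()
)
t = cfg["thresholds"]  # hypothetical key name

def compound_failure(capability_gap_eci: float, tier_gap: int,
                     config_fraction: float) -> bool:
    """True only when a paper fails all three pre-registered dimensions."""
    return (
        capability_gap_eci >= t["capability_gap_eci"]       # ECI points behind frontier
        and tier_gap >= t["tier_gap"]                       # stronger siblings available
        and config_fraction < t["configuration_fraction"]   # too few items disclosed
    )
```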

The package does **not** estimate counterfactual capability — it does not claim "the paper's conclusion would have been X if they had used Y." That move is absent from the companion paper by design, and it is absent here too.

---

## Quick start

```python
import frontierlag as fl

# By DOI (hits the frozen corpus if the paper is in the audit; otherwise
# resolves publication date via CrossRef and leaves you to supply the model).
report = fl.check("10.1038/s41591-024-03425-5")
print(report.to_text())

# Override / supply fields for a paper not in the frozen corpus.
report = fl.check(
    "10.1000/your-doi",
    primary_model="GPT-4",
    evaluation_date="2024-06-01",
    configuration_disclosures={
        "model_version_exact": True,
        "access_date": True,
        "reasoning_mode": None,  # not applicable to GPT-4
        "tool_use": False,
        # ... other items default to "not reported"
    },
)

# Or audit already-extracted metadata directly.
from frontierlag import audit, PaperMetadata
m = PaperMetadata(primary_model="GPT-3.5", publication_date="2024-07-01")
print(audit(m).to_text())

# Individual lookups.
fl.lookup_model("claude-3.5-sonnet")          # → ModelRecord
fl.get_frontier_at_date("2025-06-01")          # → FrontierSnapshot
fl.list_known_models()                         # → list[str]
```

## CLI

```
frontierlag check <DOI>               audit a paper
frontierlag lookup <MODEL>            single-model metadata
frontierlag frontier <YYYY-MM-DD>     frontier at a date
frontierlag models                    list known canonical names
frontierlag info                      version + data-freeze date
```

Every command accepts `--json` for machine-readable output. `frontierlag check` accepts `--model`, `--eval-date`, and `--config-file` to override or supply fields a paper does not otherwise provide.
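
If you want to drive the CLI from a script, something like the following works. Only the documented command and flags are used; the JSON field names are assumptions, not a documented schema.

```python
import json
import subprocess

# Run the documented CLI with --json and parse the machine-readable output.
out = subprocess.run(
    ["frontierlag", "check", "10.1038/s41746-023-00961-1",
     "--model", "GPT-4", "--eval-date", "2023-03-20", "--json"],
    capture_output=True, text=True, check=True,
)
report = json.loads(out.stdout)
print(report.get("compound_failure"))  # field name is a guess, not a schema
```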

---

## Example output

```
$ frontierlag check 10.1038/s41746-023-00961-1 --model GPT-4 --eval-date 2023-03-20
frontierlag audit (data freeze: 2026-04-01)
========================================================================
Paper:  ChatGPT performance on USMLE-style medical examinations
DOI:    10.1038/s41746-023-00961-1
Evaluation date: 2023-03-20

Primary model tested
  input  : 'GPT-4' → canonical: GPT-4 (Mar 2023)
  release: 2023-03-15     ECI: +126.2

Frontier at evaluation date
  GPT-4 (Mar 2023) (released 2023-03-15, ECI +126.2)

Audit dimensions
  Capability gap : +0.0 ECI pts   (+0 months)
  Tier gap       : 0 stronger same-family sibling(s) available
  Configuration  : —  of applicable items disclosed

Compound failure: undetermined (insufficient structured metadata).
```

(A fully extracted audit with configuration disclosures returns a clean PASS/FAIL verdict.)

---

## Data freeze

The embedded dataset is frozen at `FREEZE_DATE = 2026-04-01`. Every report prints this at the top so readers know how stale the comparison is. Quarterly refreshes ship as point releases; a banner on the static site tracks the current freeze.

| File | Source |
|---|---|
| `data/eci_scores.csv` | Epoch AI Capabilities Index snapshot (Epoch AI, 2026) |
| `data/monthly_frontier_trajectory.csv` | Derived from ECI + model release dates |
| `data/model_version_lookup.json` | Maintainer-curated, cross-checked against Epoch AI model tracker |
| `data/frozen_audit.json` | The companion paper's extracted audit (empty until production extraction completes) |

All dataset files are plain text and diffable; the freeze history is visible in `git log`.
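
Because the files ship inside the wheel as plain text, they can also be read directly, for example with `importlib.resources`. The `data/` resource path below is an assumption about the installed layout; adjust it to wherever the files land.

```python
import csv
from importlib import resources

# Read the embedded ECI snapshot straight out of the installed package.
raw = resources.files("frontierlag").joinpath("data/eci_scores.csv").read_text()
rows = list(csv.DictReader(raw.splitlines()))
print(f"{len(rows)} ECI rows; columns: {list(rows[0].keys())}")
```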

---

## Install

```
pip install frontierlag
```

From source:
```
git clone https://github.com/davidgringras/frontierlag.git
cd frontierlag
pip install -e '.[test]'
pytest
```

Requires Python ≥ 3.9. Runtime dependencies are `requests` and `pyyaml`; no heavy scientific stack.

---

## Citation

```bibtex
@misc{gringras2026frontierlag,
  author       = {Gringras, David},
  title        = {Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic {AI} Evaluation},
  year         = {2026},
  eprint       = {TBD},
  archivePrefix= {arXiv},
  primaryClass = {cs.AI},
  note         = {Companion package: \url{https://github.com/davidgringras/frontierlag}}
}
```

## Contributing

The package needs two things from the community, and pull requests for both are welcome:

1. **Model aliases.** Every paper spells model names differently. `config.yaml::aliases` is the single file to extend. PRs that add an alias mapping without touching code are the fastest path to review; a verification sketch follows this list.
2. **Frontier trajectory updates.** When a new model ships, add a row to `data/monthly_frontier_trajectory.csv` and bump `_version.py::FREEZE_DATE`. The package has a quarterly release cadence; out-of-cycle PRs are welcome for newly released frontier models.
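
To sanity-check an alias PR locally, a sketch like this can help. Both spellings below are invented for illustration, and it assumes `fl.lookup_model` resolves through the alias table.

```python
import frontierlag as fl

# After adding e.g. "gpt4-march" -> "GPT-4 (Mar 2023)" to config.yaml::aliases
# (both spellings invented here), confirm the alias resolves to the intended
# canonical record before opening the PR:
print(fl.lookup_model("gpt4-march"))
print(fl.lookup_model("GPT-4 (Mar 2023)"))  # the two records should match
```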

Code changes should include tests and pass `pytest`. See `tests/` for conventions.

## License

MIT. See `LICENSE`.
