Metadata-Version: 2.4
Name: vernier
Version: 0.0.2
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Requires-Dist: numpy>=2.0
Requires-Dist: rfdetr==1.6.5.post0 ; extra == 'real-models'
Requires-Dist: platformdirs>=4 ; extra == 'real-models'
Requires-Dist: polars>=1.0 ; extra == 'tables'
Requires-Dist: torch>=2.4 ; extra == 'torch'
Requires-Dist: plotly>=6.0 ; extra == 'viz'
Provides-Extra: real-models
Provides-Extra: tables
Provides-Extra: torch
Provides-Extra: viz
License-File: LICENSE-APACHE
License-File: LICENSE-MIT
Summary: High-performance, parity-preserving COCO-style evaluation
Keywords: evaluation,metrics,computer-vision,detection,coco,object-detection
Author: The vernier authors
License: MIT OR Apache-2.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://github.com/NoeFontana/vernier#readme
Project-URL: Homepage, https://github.com/NoeFontana/vernier
Project-URL: Issues, https://github.com/NoeFontana/vernier/issues
Project-URL: Repository, https://github.com/NoeFontana/vernier

# vernier

[![CI](https://github.com/NoeFontana/vernier/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/NoeFontana/vernier/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/vernier.svg?label=pypi%20%7C%20vernier)](https://pypi.org/project/vernier/)
[![crates.io vernier-core](https://img.shields.io/crates/v/vernier-core.svg?label=crates.io%20%7C%20vernier-core)](https://crates.io/crates/vernier-core)
[![crates.io vernier-mask](https://img.shields.io/crates/v/vernier-mask.svg?label=crates.io%20%7C%20vernier-mask)](https://crates.io/crates/vernier-mask)
[![crates.io vernier-cli](https://img.shields.io/crates/v/vernier-cli.svg?label=crates.io%20%7C%20vernier-cli)](https://crates.io/crates/vernier-cli)
[![License: MIT OR Apache-2.0](https://img.shields.io/badge/license-MIT%20OR%20Apache--2.0-blue.svg)](#license)

> Fast, parity-preserving evaluation for object detection, instance /
> panoptic / semantic segmentation, boundary IoU, OKS keypoints, and
> LVIS federated evaluation. Rust core, Python frontend, optional CLI.

`pycocotools==2.0.11` is the de facto reference for COCO evaluation:
slow, unmaintained, and full of edge-case quirks. Faster
reimplementations exist, but each silently fixes some quirks and not
others, so you discover the divergences empirically. vernier takes a
third path:

- **Auditable parity.** Every divergence from pycocotools is filed in
  the quirks survey under
  [ADR-0002](docs/adr/0002-three-tier-parity-model.md) as either
  `strict` (bit-equal output, even when vernier's implementation is
  structurally different) or `corrected` (opt-in opinionated fix).
  Strict is the default; corrected fixes are itemized so you always
  know when your numbers diverge from a reference run. A drop-in shim
  (`vernier.patch_pycocotools()`) keeps existing pycocotools-based
  scripts working with one line (see the sketch after this list).
- **One toolkit instead of five.** bbox / segm / boundary / keypoints
  AP, panoptic PQ, semantic mIoU, and LVIS federated AP all live behind
  one Python API and one CLI. Per-paradigm migration guides under
  [`docs/migrate/`](docs/migrate/) show how to replace `pycocotools`,
  `faster-coco-eval`, `panopticapi`, `lvis-api`, and
  `mmsegmentation` one at a time.
- **Rust core, Python frontend.** The matching kernel is pure Rust
  with runtime SIMD dispatch; the FFI layer is data conversion only.
  The CLI ships as a static binary, so CI pipelines call vernier
  without provisioning a Python interpreter.
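
A minimal sketch of the shim inside an existing script. The
pycocotools calls below are standard upstream API; the assumption that
one call to the shim up front is sufficient is mine, not a documented
guarantee:

```python
import vernier

# Assumption: applying the patch once, before pycocotools is used,
# is enough; the shim's exact call-site requirements are documented
# by vernier itself, not here.
vernier.patch_pycocotools()

# Everything below is unmodified, standard pycocotools usage.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("instances_val2017.json")
coco_dt = coco_gt.loadRes("detections.json")

ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()  # prints the familiar 12-line summary
```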

## Status & validation

Pre-1.0; public API is unstable. See [`docs/adr/`](docs/adr/) for the
design decisions shaping it. Per-paradigm parity status:

| Paradigm / metric | Oracle | Parity tier | Open caveat |
| --- | --- | --- | --- |
| Instance bbox / segm / keypoints AP | `pycocotools==2.0.11` | strict bit-equal | none |
| Instance boundary IoU | `boundary-iou-api` | strict bit-equal | none |
| Segm + boundary TIDE thresholds (`t_b`) | none yet | corrected-only | [ADR-0022](docs/adr/0022-tide-thresholds.md) still `proposed`; defaults extrapolated, not measured |
| Panoptic PQ | `panopticapi` (single-core path) | strict bit-equal | `boundary=True` raises `NotImplementedError` ([ADR-0025](docs/adr/0025-panoptic-api.md) §Q3) |
| Semantic mIoU / FWIoU / pAcc / mAcc | `mmseg.IoUMetric` vendored at v1.2.2 ([ADR-0036](docs/adr/0036-vendor-mmsegmentation-ioumetric.md), still `proposed`); cityscapesScripts + ADE20K cross-impl bench externally blocked | strict bit-equal on the four per-class u64 marginals at val2017 scale | [ADR-0028](docs/adr/0028-sem-seg.md); ADE20K-scale bench gated on license-cleared cache |
| LVIS federated AP | `lvis-api` (vendored at `031ac21f`, pinned as `ORACLE_LVIS_COMMIT_SHA`) | strict bit-equal on the `(T, R, K, A)` precision tensor at full LVIS v1 val | bench paradigm wired; segm cell waits on `evaluate_segm_grid_with_dataset` |

Three-tier parity model: [ADR-0002](docs/adr/0002-three-tier-parity-model.md);
per-library comparison: [`docs/comparison.md`](docs/comparison.md).

## Benchmarks

<!-- The headline table and pinned-baselines block below are hand-mirrored from
     docs/benchmarks.md (which is auto-generated by tools/render_benchmarks.py).
     After a fresh bench round, refresh both: re-run the renderer, then update
     the numbers + version pins here. -->

| Workload | vernier median | Speedup vs alternatives |
| --- | ---: | --- |
| Instance — bbox AP (val2017) | 360 ms | **5.9×** faster-coco-eval · **16.2×** pycocotools |
| Instance — segm AP (val2017) | 968 ms | **3.7×** faster-coco-eval · **7.1×** pycocotools |
| Instance — boundary AP (val2017) | 3.1 s | **5.7×** faster-coco-eval · **19.9×** boundary-iou-api |
| Instance — keypoints AP (val2017, OKS) | 136 ms | **12.5×** faster-coco-eval · **17.1×** pycocotools |
| Panoptic — PQ (val2017) | 11.6 s | **3.04×** panopticapi |
| Semantic — mIoU (val2017) | 5.1 s | **4.2×** mmsegmentation |
| Instance — LVIS bbox AP (v1 val, perfect-DT) | 3.7 s | **56.9×** lvis-api · 10× lower peak RSS (1.49 GiB vs 15.01 GiB) |

Median total-stage wall time on a KVM VPS (AMD EPYC-Milan, 4 cores ×
2 threads = 8 logical CPUs, `x86_64`; not a bare-metal Milan box),
harness mode `release` (N=10 measurement reps + 2 warmup, randomized
impl order, 5% relative-IQR gate per impl), build profile = the
crate's release profile (`opt-level=3`, `lto=thin`, `codegen-units=1`,
no `target-cpu`), same as the PyPI wheel. Full per-cell breakdown
(including IQRs), RSS, and methodology in
[`docs/benchmarks.md`](docs/benchmarks.md); per-library comparison of
when to pick which in [`docs/comparison.md`](docs/comparison.md).

**Baselines pinned for these numbers** —
[`pycocotools==2.0.11`](https://pypi.org/project/pycocotools/2.0.11/),
[`faster-coco-eval==1.7.2`](https://pypi.org/project/faster-coco-eval/1.7.2/),
[`panopticapi` @ `7bb4655`](https://github.com/cocodataset/panopticapi/commit/7bb4655548f9),
[`boundary-iou-api` @ `37d2558`](https://github.com/bowenc0221/boundary-iou-api/commit/37d25586a677),
[`mmsegmentation` @ `c685fe6`](https://github.com/open-mmlab/mmsegmentation/commit/c685fe6767c4cadf6b051983ca6208f1b9d1ccb8) (vendored),
[`lvis-api` @ `031ac21`](https://github.com/lvis-dataset/lvis-api/commit/031ac21f939b)
(PyPI `lvis==0.5.3`).
COCO and panoptic / semantic numbers were measured at HEAD `1fd5720bf56c`;
the LVIS row was added at HEAD `e9d9c4d71303` after the bench
paradigm landed. Each baseline is locked in its own uv-managed venv per
[ADR-0017](docs/adr/0017-local-bench-harness.md).

## Install

```bash
pip install vernier                  # Python wheel
cargo add vernier-core               # Rust library
cargo install vernier-cli            # `vernier` CLI binary
```

Wheels ship for Linux x86_64 / aarch64 (glibc + musl), macOS
x86_64 / arm64, and Windows x64. The umbrella `vernier` crate name on
crates.io is held as a `0.0.0` placeholder; `vernier-core` is the real
Rust entry point — see
[`docs/engineering/registry-reservations.md`](docs/engineering/registry-reservations.md).
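
For Python-free CI, the static binary can be invoked directly. The
subcommand and flag names below are illustrative assumptions, not
documented CLI syntax; see [`docs/reference/`](docs/reference/) for
the real interface:

```bash
# Hypothetical invocation: subcommand and flag names are assumptions,
# not documented syntax; consult docs/reference/ for the actual flags.
vernier instance --gt instances_val2017.json --dt detections.json
```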

## 60-second example

One-shot — predictions already serialized to JSON (end-of-epoch
checkpoint, CI gate, post-training inspection):

```python
from pathlib import Path
from vernier.instance import Bbox, CocoDataset, Evaluator

gt_bytes = Path("instances_val2017.json").read_bytes()
dt_bytes = Path("detections.json").read_bytes()

dataset = CocoDataset.from_json(gt_bytes)
summary = Evaluator(iou=Bbox()).evaluate(dataset, dt_bytes)
for line in summary.pretty_lines():
    print(line)
```

In a training loop — overlap eval with the next training step. The
matching kernel runs on a worker thread, so `submit(...)` returns
immediately and the training thread keeps moving. Passing a
`CocoDataset` reuses the parsed-once GT and its per-kernel
derivation cache across every epoch (ADR-0020):

```python
import json
from pathlib import Path
from vernier.instance import Bbox, CocoDataset, Evaluator

gt = CocoDataset.from_json(Path("instances_val2017.json").read_bytes())
evaluator = Evaluator(iou=Bbox())
with evaluator.background(gt) as bg:
    for images, _ in val_loader:
        detections = model(images)  # list[{image_id, category_id, bbox, score}]
        bg.submit(json.dumps(detections).encode())
    summary = bg.finalize()
print("AP =", summary.stats[0])
```
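
Under strict parity, `summary.stats` should mirror pycocotools'
standard 12-slot layout; the index map below is the upstream
convention, not vernier-specific documentation:

```python
# pycocotools' 12-slot stats layout (upstream convention, which the
# Summary mirrors under strict parity per the text above):
#   stats[0]    AP @ IoU=.50:.95 (the headline COCO AP)
#   stats[1]    AP @ IoU=.50
#   stats[2]    AP @ IoU=.75
#   stats[3:6]  AP for small / medium / large objects
#   stats[6:9]  AR @ maxDets = 1 / 10 / 100
#   stats[9:]   AR for small / medium / large objects
print("AP50 =", summary.stats[1])
```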

Both paths end in the same 12-line `pycocotools`-shaped Summary;
[`docs/tutorials/first-evaluation.md`](docs/tutorials/first-evaluation.md)
walks through each end-to-end.

## Three evaluation paradigms

Pick the submodule whose **input shape** matches your model's output —
they have different data models, different matching rules, and
different parity oracles:

- `vernier.instance` — detections with scores → bbox / segm /
  boundary / keypoints AP.
- `vernier.panoptic` — RGB-encoded panoptic PNGs + `segments_info`
  JSON → PQ.
- `vernier.semantic` — single-channel class-id label maps → mIoU /
  FWIoU / pAcc / mAcc.

See [Three paradigms](docs/explanation/three-paradigms.md) for when to
use which.
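
As a concrete reference for the first bullet, this is the
scored-detection shape from the training-loop example above; the
panoptic and semantic inputs are files and label maps as listed, and
are not sketched here:

```python
# One scored detection for vernier.instance, matching the
# list[{image_id, category_id, bbox, score}] shape shown earlier.
detection = {
    "image_id": 42,                    # COCO image id
    "category_id": 1,                  # COCO category id
    "bbox": [10.0, 20.0, 30.0, 40.0],  # COCO xywh, absolute pixels
    "score": 0.87,                     # detection confidence
}
```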

## Documentation

- **Tutorials** — [`docs/tutorials/`](docs/tutorials/)
- **Migration guides** (from pycocotools, faster-coco-eval,
  panopticapi, lvis-api, mmsegmentation) —
  [`docs/migrate/`](docs/migrate/)
- **How-to** — [`docs/how-to/`](docs/how-to/)
- **Reference** — [`docs/reference/`](docs/reference/)
- **Design / ADRs** — [`docs/adr/`](docs/adr/)
- **Comparison vs pycocotools / faster-coco-eval / panopticapi /
  boundary-iou-api / lvis-api / mmsegmentation** —
  [`docs/comparison.md`](docs/comparison.md)

## Contributing

Local checks: `just lint && just test && just audit`. The full
contributor workflow (ADR lifecycle, vendoring policy, code style) is
in [`CONTRIBUTING.md`](CONTRIBUTING.md). Repository layout and
common just recipes are in [`CLAUDE.md`](CLAUDE.md).

## License

Dual-licensed under [Apache-2.0](LICENSE-APACHE) or [MIT](LICENSE-MIT)
at your option.

## Third-party code

vernier vendors a small number of test-only reference implementations
to support parity testing. None of this code is included in published
wheels or linked into the Rust binary. See
[`THIRD_PARTY_NOTICES.md`](THIRD_PARTY_NOTICES.md) for the full
inventory and license attributions.

