Metadata-Version: 2.4
Name: robovet
Version: 0.1.0
Summary: Vet your robot datasets: diagnose, repair, and quality-score LeRobot-format episode data before you waste a training run.
Author: robovet contributors
License: Apache-2.0
Project-URL: Homepage, https://github.com/RonaldSit/robovet
Project-URL: Issues, https://github.com/RonaldSit/robovet/issues
Keywords: robotics,lerobot,dataset,imitation-learning,data-quality,vla
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: pyarrow>=14
Requires-Dist: typer>=0.12
Requires-Dist: rich>=13
Requires-Dist: jinja2>=3.1
Provides-Extra: video
Requires-Dist: av>=12; extra == "video"
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Requires-Dist: av>=12; extra == "dev"
Dynamic: license-file

# robovet

**Vet your robot data.** Diagnose, repair, and quality-score LeRobot-format
datasets — *before* you waste a training run on broken episodes.

```text
$ robovet doctor ./my_dataset

  FAIL DATA-104   1 episode where metadata 'length' disagrees with the parquet
                  row count — the classic signature of a corrupted episode map.
  FAIL STATS-302  1 stat block disagrees with the actual data — every training
                  run normalizes with these numbers.
  WARN TIME-202   Loading this dataset requires tolerance_s ≥ 7.7e-03
                  (77× the default). Worst: episode 2, 7.29 ms off the grid.

  3 fail · 4 warn · 22 pass
  UNSAFE TO TRAIN — fix the FAILs first.        (exit code 1 — CI-gate it)
```
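
The `TIME-202` line reports how far timestamps sit from the ideal frame grid. The sketch below illustrates that idea under one assumption: every timestamp should land on a multiple of `1 / fps`. `required_tolerance_s` is a hypothetical helper, not robovet's implementation, and the real check may measure deviation differently. The `1e-4` default is inferred from the "77×" figure in the output above.

```python
import numpy as np

def required_tolerance_s(timestamps, fps):
    """Largest distance (seconds) between any timestamp and its nearest grid point.

    Assumption: the grid is the set of multiples of 1 / fps.
    """
    ts = np.asarray(timestamps, dtype=np.float64)
    nearest_grid = np.round(ts * fps) / fps
    return float(np.max(np.abs(ts - nearest_grid)))

# Third frame arrives 5 ms late at 30 fps.
ts = [0.0, 1 / 30, 2 / 30 + 0.005]
worst = required_tolerance_s(ts, fps=30)
print(f"needs tolerance_s >= {worst:.1e} ({worst / 1e-4:.0f}x a 1e-4 default)")
```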

## Why this exists

Robot learning's bottleneck moved from models to data, and the data is quietly
broken. An April 2026 audit of 10 popular open-source robot datasets found
floating-point drift that breaks video decoding after ~45 episodes, a
v2.1→v3.0 conversion bug that **silently** corrupts episode↔frame mapping
(your run "works" — the policy just learns from jumbled sequences), datasets
that only load with `tolerance_s` set to 100× the default, and **no quality
metrics anywhere**. Hugging Face's own community-dataset cleaning run tells the
same story: **111 of 240 datasets failed validation**, and that pipeline is
internal, not something you can run on yours. Meanwhile, the 2026 consensus is
that a well-curated 500-demo fine-tune beats a poorly curated one at 10× the
scale: curation tooling is the gap, not model size.

Every check in robovet maps to a documented, real-world failure. The receipts
— issue numbers, audit findings, papers — live in [PAIN.md](PAIN.md).

## Try it in 30 seconds (no robot required)

```bash
pip install "robovet[video]"   # quoted so zsh doesn't glob the brackets

robovet demo ./demo          # synthetic SO-100-style dataset, 10 real-world
                             # defect classes injected (each tagged with the
                             # GitHub issue it reproduces)
robovet demo ./demo3 --v3    # same idea in the v3.0 shared-file layout
robovet doctor ./demo        # catches all of them; exit 1
robovet fix    ./demo --apply  # repairs the metadata class; .bak backups
robovet doctor ./demo        # metadata FAILs gone
robovet report ./demo -o report.html   # one self-contained, shareable page
```

`robovet demo ./demo --clean` builds the same dataset with zero defects, so
you can see what all-green looks like.

## What it checks

| Group | Catches | Maps to |
|---|---|---|
| `STRUCT-0xx` | missing/invalid metadata, dangling episodes, orphan files | lerobot#761 (no validator for hand-rolled conversions) |
| `DATA-1xx` | **episode↔frame mapping corruption**, schema drift, NaN/Inf, dead dims | lerobot#2401 (silent v2.1→v3.0 corruption) |
| `TIME-2xx` | off-grid timestamps **with the exact `tolerance_s` you'd need**, non-monotonic time, cumulative FP drift | lerobot#933, lerobot#3177 |
| `STATS-3xx` | stored normalization stats that disagree with the data ("normalization poison"), **broken quantile stats** (q01/q99) | HF docs warning; phospho repair post; lerobot#2189 |
| `VIDEO-4xx` | video/parquet frame-count desync, **including per-episode windows inside shared v3 files**; codec-aware compatibility tiers (h264: pass; AV1: info, since it is lerobot's own default; mpeg4 and hevc: warn); fps mismatch | Correll-lab postmortem; phospho notes |

`robovet doctor` exits **1** on any FAIL and takes `--json`, so it drops
straight into CI: gate dataset merges the way Codecov gates coverage.
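
A CI gate can rely on the exit code alone. The sketch below shells out to `robovet doctor` and treats any nonzero exit as a block. `dataset_is_safe` and its injectable `run` parameter are hypothetical names for illustration. It deliberately does not parse the `--json` output, because that schema is not described here.

```python
import subprocess

def dataset_is_safe(path, run=subprocess.run):
    """Return True when `robovet doctor` exits 0; any nonzero exit counts as unsafe.

    `run` is injectable so the gate can be exercised without robovet installed.
    """
    completed = run(["robovet", "doctor", str(path)])
    return completed.returncode == 0
```

A CI step could then call `sys.exit(0 if dataset_is_safe("./my_dataset") else 1)` so a broken dataset blocks the merge.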

## Quality scoring (triage, not truth)

```bash
robovet score ./my_dataset --csv scores.csv
```

Per-episode signals, all computed in one pass: jerk smoothness, idle ratio,
gripper chatter, duration outliers, action saturation, exact duplicates.
This is deliberately the *cheap first pass* — the smoothness-first approach
the 2026 curation literature (rinse, Demo-SCORE, QoQ) argues should precede
expensive policy-rollout or influence-function filtering. Scores put the worst
episodes in front of a human in seconds; **review before you delete**.
Statistical flags carry practical-significance guards, so homogeneous datasets
don't self-flag.
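
To make two of these signals concrete, here is a minimal sketch of a jerk-based smoothness measure and an idle ratio. The formulas, the `eps` threshold, and the function names are assumptions chosen for illustration; robovet's actual definitions may differ. Both functions expect an action array of shape `(T, D)`, and the jerk measure needs at least four steps.

```python
import numpy as np

def mean_squared_jerk(actions, fps):
    """Mean squared third derivative of the action trajectory (lower is smoother)."""
    a = np.asarray(actions, dtype=np.float64)      # shape (T, D), T >= 4
    jerk = np.diff(a, n=3, axis=0) * fps**3        # finite-difference third derivative
    return float(np.mean(jerk**2))

def idle_ratio(actions, eps=1e-3):
    """Fraction of steps in which no action dimension moves by more than eps."""
    a = np.asarray(actions, dtype=np.float64)      # shape (T, D), T >= 2
    largest_step = np.abs(np.diff(a, axis=0)).max(axis=1)
    return float(np.mean(largest_step < eps))

t = np.linspace(0.0, 1.0, 31)[:, None]             # one second at 30 fps, one dimension
print(mean_squared_jerk(t**2, fps=30))             # constant acceleration: jerk near zero
print(idle_ratio(np.zeros((31, 1))))               # arm never moves: fully idle
```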

## Repair contract

`robovet fix` is **dry-run by default**. With `--apply` it rewrites only
metadata (episode lengths, normalization stats, info.json counters), backs up
every touched file as `.bak`, never modifies parquet or video payloads, and
**preserves everything it doesn't understand**: quantile keys (q01/q99 — the
v3 QUANTILES-normalization era), image-stat blocks, and unknown episode fields
such as tags. A repair tool must never be the thing that deletes your data;
the test suite enforces these guarantees. Frame surgery
(tail-trimming desynced episodes, timestamp re-gridding) is on the
[roadmap](ROADMAP.md) under the same contract.
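
The contract can be sketched in a few lines. The example assumes a JSON-lines episode file with `episode_index` and `length` keys; that layout and the name `patch_episode_lengths` are illustrative assumptions, not robovet's code. It plans changes by default, writes only with `apply=True`, backs up first, and leaves every other key alone.

```python
import json
import shutil
from pathlib import Path

def patch_episode_lengths(meta_path, true_lengths, apply=False):
    """Plan (and optionally apply) length fixes to a JSON-lines episode file.

    true_lengths maps episode_index -> row count measured from the data.
    Returns a list of (episode_index, old_length, new_length) changes.
    """
    meta_path = Path(meta_path)
    lines = meta_path.read_text().splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    changes = []
    for rec in records:
        new = true_lengths.get(rec["episode_index"])
        if new is not None and rec.get("length") != new:
            changes.append((rec["episode_index"], rec.get("length"), new))
            rec["length"] = new  # touch only this key; unknown fields survive
    if apply and changes:
        backup = meta_path.with_name(meta_path.name + ".bak")
        shutil.copy2(meta_path, backup)  # back up before writing anything
        meta_path.write_text("".join(json.dumps(r) + "\n" for r in records))
    return changes
```

Calling it without `apply=True` returns the planned changes and leaves the file untouched, which mirrors the dry-run default described above.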

## Scope, honestly

- **LeRobot v2.0 / v2.1 and v3.x are both first-class for diagnosis** — each
  has its own synthetic fixture and test suite, and v3 gets per-episode video
  alignment inside shared files (`VIDEO-405`) plus per-episode stats checks
  parsed from the v3 metadata. `fix` currently rewrites v2.x episode metadata
  and global stats; v3 per-episode stats regeneration is on the roadmap.
- robovet does **not** merge/split/delete episodes — `lerobot` ships that
  natively now. We do what the official stack doesn't: deep validation,
  metadata repair, and quality triage.
- Local-first by design. Your data never leaves your disk — deployment-specific
  data is a competitive asset; treat it like one.

## Library use

```python
from robovet import load_dataset, run_doctor, score_dataset

ds = load_dataset("./my_dataset")
rep = run_doctor(ds)          # rep.exit_code, rep.results, rep.counts
sc  = score_dataset(ds, scan=rep.scan)   # reuses the same single IO pass
```

Apache-2.0. Issues and broken-dataset war stories very welcome — if your
dataset breaks in a way robovet doesn't catch, that's a bug report we want.
