LoopBench

Tasks: 19
Seeds: 5

Leaderboard

Generalist rank: the mean of the four suite scores.
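As a rough sketch of that ranking rule, assuming each loop has one score per suite, equal weighting, and that higher is better: the snippet below averages four suite scores and sorts loops by the result. Only `suite-repair` is named on this page; the other suite names, the loop names, and every number here are hypothetical placeholders, not leaderboard data.

```python
from statistics import mean

# Hypothetical per-suite scores; only "suite-repair" is named on this page.
scores = {
    "loop-a": {"suite-repair": 0.80, "suite-b": 0.60, "suite-c": 0.70, "suite-d": 0.50},
    "loop-b": {"suite-repair": 0.90, "suite-b": 0.50, "suite-c": 0.60, "suite-d": 0.80},
}


def generalist_score(suite_scores):
    """Mean of the four suite scores (assumes equal weighting)."""
    if len(suite_scores) != 4:
        raise ValueError("expected exactly four suite scores")
    return mean(suite_scores.values())


def generalist_ranking(all_scores):
    """Loop names sorted by generalist score, best first (assumes higher is better)."""
    return sorted(
        all_scores,
        key=lambda name: generalist_score(all_scores[name]),
        reverse=True,
    )


for name in generalist_ranking(scores):
    print(name, round(generalist_score(scores[name]), 3))
```

Requiring exactly four scores means a loop that skipped a suite raises an error instead of being ranked on a partial average.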

# | Loop | Submitter | LES | Success@k | Cost | Harness | Spec | Repro

Tasks

Metrics

What LoopBench measures

Closed loops, not prompts

A loop spec in LSS YAML runs in the LoopGym SimEnv and is scored on Success@k and observed LES.

Reproducible

Five seeds, SimEnv v0.2, auditable specs.

Community scoreboard

External rows credited; human review on merge.

Run your first score

pip install "le-loopforge>=0.2.0" "le-loopctl>=0.1.0" loopbench loopgym
loopbench run --suite suite-repair --spec your-loop.yaml --seeds 0,1,2,3,4 -o results.json
loopbench validate results.json
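If you script runs from Python, one option is to build the same command as an argument list. The sketch below uses only the flags shown in the `loopbench run` line above. The helper name `build_run_command` is hypothetical and not part of LoopBench, and the snippet only prints the command; it does not execute it.

```python
import shlex


def build_run_command(suite, spec, seeds, out="results.json"):
    """Return the argv list for `loopbench run`, using only flags shown above."""
    seed_arg = ",".join(str(s) for s in seeds)
    return [
        "loopbench", "run",
        "--suite", suite,
        "--spec", spec,
        "--seeds", seed_arg,
        "-o", out,
    ]


# Five seeds, 0 through 4, matching the quick-start command.
cmd = build_run_command("suite-repair", "your-loop.yaml", range(5))
print(shlex.join(cmd))
```

With the packages installed, the list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems in spec paths.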

Beat LB-CR-1 guide →