Eval Report: ci-nightly

Profile: gdm-swebench-lite-v1 | Tasks: 8 | Pass rate: 37.5% | Cost: $0.0800

Task ID                     Band  Score  Passed  Cost
canary-6                    hard  0.309  no      $0.0100
canary-0                    hard  0.309  no      $0.0100
canary-2                    hard  0.309  no      $0.0100
python-bugfix-easy-001      easy  0.740  yes     $0.0100
python-config-easy-001      easy  0.740  yes     $0.0100
canary-7                    hard  0.309  no      $0.0100
canary-12                   hard  0.309  no      $0.0100
python-dependency-easy-001  easy  0.740  yes     $0.0100
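The summary line can be checked against the per-task rows. The sketch below assumes the pass rate is the share of tasks flagged as passed and the run cost is the plain sum of per-task costs; the report does not state its aggregation rules, and `TaskResult` and `summarize` are hypothetical names. It also assumes the three easy tasks (score 0.740) are the passing ones, which is the split consistent with a score-based pass decision and a 3/8 pass rate.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class TaskResult:
    task_id: str
    band: str
    score: float
    passed: bool
    cost_usd: Decimal  # Decimal avoids float drift when summing dollar amounts


def summarize(results: list[TaskResult]) -> tuple[int, float, Decimal]:
    """Return (task count, pass rate in percent, total cost in USD)."""
    n = len(results)
    n_passed = sum(1 for r in results if r.passed)
    pass_rate = 100.0 * n_passed / n if n else 0.0
    total_cost = sum((r.cost_usd for r in results), Decimal("0"))
    return n, pass_rate, total_cost


# Rows from the per-task table. ASSUMPTION: the three easy tasks are the passing ones.
rows = [
    ("canary-6", "hard", 0.309, False),
    ("canary-0", "hard", 0.309, False),
    ("canary-2", "hard", 0.309, False),
    ("python-bugfix-easy-001", "easy", 0.740, True),
    ("python-config-easy-001", "easy", 0.740, True),
    ("canary-7", "hard", 0.309, False),
    ("canary-12", "hard", 0.309, False),
    ("python-dependency-easy-001", "easy", 0.740, True),
]
results = [TaskResult(tid, band, score, ok, Decimal("0.0100")) for tid, band, score, ok in rows]

n, rate, cost = summarize(results)
print(f"Tasks: {n} | Pass rate: {rate:.1f}% | Cost: ${cost:.4f}")
# prints "Tasks: 8 | Pass rate: 37.5% | Cost: $0.0800"
```

Under these assumptions the figures agree with the header: 3 of 8 tasks pass (37.5%) and eight tasks at $0.0100 each total $0.0800.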

Leaderboard Snapshot

Latest run: ad28542d-917d-47fd-af0c-80be5332faa9 | Latest model: coder | Latest score: 0.740 | Recorded at: 2026-04-26T08:07:15.096104+00:00

Recent Trend

Run ID                                Model  Git SHA                                   Score  Created
ad28542d-917d-47fd-af0c-80be5332faa9  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-26T08:07:15.096104+00:00
dd7ef049-0fd5-49b2-bfb9-9303a1d7ae0f  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.309  2026-04-26T08:07:15.000709+00:00
24b884fa-0e48-475c-990b-533ed409652a  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.309  2026-04-26T08:07:14.888902+00:00
dc8f5abf-b429-4f0d-a524-06ba5a0f1eca  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-26T08:07:14.790079+00:00
cef0ccbc-41c5-4e21-96fc-2a7501578e9a  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-26T08:07:14.721875+00:00
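The Leaderboard Snapshot fields match the trend row with the newest Created timestamp. A minimal sketch of that lookup, assuming the snapshot is simply the most recent run (the report does not say so explicitly); `latest_run` is a hypothetical helper and the Git SHA column is omitted for brevity.

```python
from datetime import datetime

# (run_id, model, score, created) copied from the Recent Trend table
trend = [
    ("ad28542d-917d-47fd-af0c-80be5332faa9", "coder", 0.740, "2026-04-26T08:07:15.096104+00:00"),
    ("dd7ef049-0fd5-49b2-bfb9-9303a1d7ae0f", "coder", 0.309, "2026-04-26T08:07:15.000709+00:00"),
    ("24b884fa-0e48-475c-990b-533ed409652a", "coder", 0.309, "2026-04-26T08:07:14.888902+00:00"),
    ("dc8f5abf-b429-4f0d-a524-06ba5a0f1eca", "coder", 0.740, "2026-04-26T08:07:14.790079+00:00"),
    ("cef0ccbc-41c5-4e21-96fc-2a7501578e9a", "coder", 0.740, "2026-04-26T08:07:14.721875+00:00"),
]


def latest_run(rows):
    """Return the row whose ISO-8601 'created' timestamp is newest."""
    # Parse timestamps instead of comparing strings, so mixed offsets would still order correctly.
    return max(rows, key=lambda row: datetime.fromisoformat(row[3]))


run_id, model, score, created = latest_run(trend)
print(f"Latest run: {run_id} | Latest model: {model} | Latest score: {score:.3f} | Recorded at: {created}")
```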

Cost Frontier

pass_rate vs cost_usd (Pareto frontier marked with *)
* [#######-------------] 37.5% @ $0.0100  (coder)
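The frontier line can be reproduced with a short sketch. A run is on the Pareto frontier when no other run is at least as cheap and at least as accurate while being strictly better on one of the two axes. The bar fills `int(pass_rate / 100 * 20)` of 20 cells. `Point`, `pareto_frontier` and `render` are hypothetical names. Treating `cost_usd` as the mean cost per task ($0.0800 / 8 = $0.0100) is an assumption, because the report does not say which cost it plots.

```python
from dataclasses import dataclass


@dataclass
class Point:
    name: str
    pass_rate: float  # percent, 0-100
    cost_usd: float   # ASSUMPTION: mean cost per task


def pareto_frontier(points: list[Point]) -> set[str]:
    """Names of points that no other point dominates (cheaper-or-equal and better-or-equal, one strictly)."""
    frontier = set()
    for p in points:
        dominated = any(
            q.cost_usd <= p.cost_usd
            and q.pass_rate >= p.pass_rate
            and (q.cost_usd < p.cost_usd or q.pass_rate > p.pass_rate)
            for q in points
        )
        if not dominated:
            frontier.add(p.name)
    return frontier


def render(points: list[Point], width: int = 20) -> list[str]:
    """One text line per point: frontier mark, bar, pass rate, cost, name."""
    frontier = pareto_frontier(points)
    lines = []
    for p in points:
        # Truncate, do not round: 37.5% of 20 cells is 7.5, drawn as 7 filled cells.
        filled = max(0, min(width, int(p.pass_rate / 100 * width)))
        bar = "#" * filled + "-" * (width - filled)
        mark = "*" if p.name in frontier else " "
        lines.append(f"{mark} [{bar}] {p.pass_rate:.1f}% @ ${p.cost_usd:.4f}  ({p.name})")
    return lines


print("\n".join(render([Point("coder", 37.5, 0.0100)])))
# prints "* [#######-------------] 37.5% @ $0.0100  (coder)"
```

Truncation is chosen because it matches the seven filled cells shown above; Python's `round(7.5)` would give 8. With only one model in this report, that model is trivially on the frontier.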