Eval Report: ci-nightly

Profile: gdm-swebench-lite-v1 | Tasks: 8 | Pass rate: 37.5% | Cost: $0.0800

Task ID                     Band  Score  Passed  Cost
canary-6                    hard  0.309          $0.0100
canary-0                    hard  0.309          $0.0100
canary-2                    hard  0.309          $0.0100
python-bugfix-easy-001      easy  0.740          $0.0100
python-config-easy-001      easy  0.740          $0.0100
canary-7                    hard  0.309          $0.0100
canary-12                   hard  0.309          $0.0100
python-dependency-easy-001  easy  0.740          $0.0100
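
The header totals can be cross-checked against the per-task rows above with a small sketch. It is not the report generator's code: `TaskResult`, `summarize`, and `make` are hypothetical names. The rows above carry no Passed values, so the `passed` flags below are an assumption: they mark the three tasks scoring 0.740 as passing, which is consistent with 3 of 8 = 37.5% but is not confirmed by the report.

```python
from decimal import Decimal
from typing import NamedTuple


class TaskResult(NamedTuple):
    task_id: str
    band: str
    score: Decimal
    passed: bool  # ASSUMPTION: inferred from the 37.5% rate, not in the report
    cost_usd: Decimal


def summarize(results):
    """Return (task_count, pass_rate_percent, total_cost_usd)."""
    n = len(results)
    if n == 0:
        return 0, Decimal("0"), Decimal("0")
    n_passed = sum(1 for r in results if r.passed)
    pass_rate = Decimal(n_passed) * 100 / Decimal(n)
    total_cost = sum((r.cost_usd for r in results), Decimal("0"))
    return n, pass_rate, total_cost


def make(task_id, band, score, passed):
    # Every row in the report lists the same per-task cost of $0.0100.
    return TaskResult(task_id, band, Decimal(score), passed, Decimal("0.0100"))


RESULTS = [
    make("canary-6", "hard", "0.309", False),
    make("canary-0", "hard", "0.309", False),
    make("canary-2", "hard", "0.309", False),
    make("python-bugfix-easy-001", "easy", "0.740", True),
    make("python-config-easy-001", "easy", "0.740", True),
    make("canary-7", "hard", "0.309", False),
    make("canary-12", "hard", "0.309", False),
    make("python-dependency-easy-001", "easy", "0.740", True),
]

n, rate, cost = summarize(RESULTS)
print(f"Tasks: {n} | Pass rate: {rate:.1f}% | Cost: ${cost:.4f}")
```

`Decimal` is used instead of `float` so that eight costs of $0.0100 sum to exactly $0.0800 without rounding noise.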

Leaderboard Snapshot

Latest run: d18c468d-b475-4c55-8f88-b3cc8b170d54 | Latest model: coder | Latest score: 0.740 | Recorded at: 2026-04-26T08:44:48.776459+00:00

Recent Trend

Run ID                                Model  Git SHA                                   Score  Created
d18c468d-b475-4c55-8f88-b3cc8b170d54  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-26T08:44:48.776459+00:00
198c771f-655f-4796-9fb7-5a1803bfd5d1  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.309  2026-04-26T08:44:48.681332+00:00
5b9aa15c-78a7-4e33-88af-137027f13a48  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.309  2026-04-26T08:44:48.613029+00:00
0464fae2-ebcd-4ee7-bbe4-b14c6233540c  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-26T08:44:48.532319+00:00
18db992c-b776-4be9-a5e4-6ab8af01745f  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-26T08:44:48.447448+00:00

Cost Frontier

pass_rate vs cost_usd (Pareto frontier marked with *)
* [#######-------------] 37.5% @ $0.0100  (coder)
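
The frontier marking above can be illustrated with a minimal sketch. It assumes the usual definition: a point is on the Pareto frontier if no other point is at least as cheap and at least as accurate while being strictly better on one axis. The names `Point`, `dominates`, `pareto_frontier`, `render_bar`, and `render_line` are hypothetical, and truncating partial bar cells is an assumption chosen because it reproduces the 7-of-20 bar shown for 37.5%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    model: str
    cost_usd: float
    pass_rate: float  # fraction in [0, 1]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.cost_usd <= b.cost_usd and a.pass_rate >= b.pass_rate
    strictly_better = a.cost_usd < b.cost_usd or a.pass_rate > b.pass_rate
    return no_worse and strictly_better


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def render_bar(pass_rate: float, width: int = 20) -> str:
    """Fixed-width bar; partial cells are truncated (an assumption)."""
    filled = int(pass_rate * width)
    return "[" + "#" * filled + "-" * (width - filled) + "]"


def render_line(p: Point, on_frontier: bool) -> str:
    """Format one frontier row, prefixing '*' for frontier members."""
    marker = "*" if on_frontier else " "
    return (
        f"{marker} {render_bar(p.pass_rate)} "
        f"{p.pass_rate:.1%} @ ${p.cost_usd:.4f}  ({p.model})"
    )


if __name__ == "__main__":
    points = [Point("coder", 0.0100, 0.375)]
    frontier = pareto_frontier(points)
    for p in points:
        print(render_line(p, p in frontier))
```

With a single model in the run, that model is trivially on the frontier, which matches the one starred row in this report.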