Eval Report: ci-nightly

Profile: gdm-swebench-lite-v1 | Tasks: 8 | Pass rate: 37.5% | Cost: $0.0080

Task ID                     Band  Score  Passed  Cost
canary-6                    hard  0.310          $0.0010
canary-0                    hard  0.310          $0.0010
canary-2                    hard  0.310          $0.0010
python-bugfix-easy-001      easy  0.740          $0.0010
python-config-easy-001      easy  0.740          $0.0010
canary-7                    hard  0.310          $0.0010
canary-12                   hard  0.310          $0.0010
python-dependency-easy-001  easy  0.740          $0.0010
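
The summary figures in the profile line can be cross-checked against the per-task rows. The sketch below is only illustrative: the report does not state how a task is judged as passed, so `PASS_THRESHOLD` is a hypothetical cutoff chosen so that the 0.740 rows count as passes, and `summarize` is a made-up helper name rather than part of any harness.

```python
# Rows transcribed from the task table: (task_id, band, score, cost_usd).
ROWS = [
    ("canary-6", "hard", 0.310, 0.0010),
    ("canary-0", "hard", 0.310, 0.0010),
    ("canary-2", "hard", 0.310, 0.0010),
    ("python-bugfix-easy-001", "easy", 0.740, 0.0010),
    ("python-config-easy-001", "easy", 0.740, 0.0010),
    ("canary-7", "hard", 0.310, 0.0010),
    ("canary-12", "hard", 0.310, 0.0010),
    ("python-dependency-easy-001", "easy", 0.740, 0.0010),
]

# Hypothetical cutoff: the report does not say how "Passed" is decided.
PASS_THRESHOLD = 0.5


def summarize(rows, threshold=PASS_THRESHOLD):
    """Recompute task count, pass rate, and total cost from the rows."""
    passed = sum(1 for _, _, score, _ in rows if score >= threshold)
    total_cost = sum(cost for _, _, _, cost in rows)
    return {
        "tasks": len(rows),
        "pass_rate": passed / len(rows),
        "cost_usd": total_cost,
    }


summary = summarize(ROWS)
print(
    f"Tasks: {summary['tasks']} | "
    f"Pass rate: {summary['pass_rate']:.1%} | "
    f"Cost: ${summary['cost_usd']:.4f}"
)
```

Under that assumed threshold, three of the eight tasks pass (3/8 = 37.5%) and eight tasks at $0.0010 each total $0.0080, which agrees with the profile line.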

Leaderboard Snapshot

Latest run: 0d49ee40-cdeb-4e80-a404-66c635c25885 | Latest model: coder | Latest score: 0.740 | Recorded at: 2026-04-26T16:05:33.777002+00:00

Recent Trend

Run ID                                Model  Git SHA                                   Score  Created
0d49ee40-cdeb-4e80-a404-66c635c25885  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-26T16:05:33.777002+00:00
7654aac2-45aa-4c96-8928-948d3d48c1a6  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.310  2026-04-26T16:05:33.725775+00:00
f74e4532-2fb6-43ab-b6f3-6987d7f63b90  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.310  2026-04-26T16:05:33.655274+00:00
d717703e-5245-40c2-a252-29bf5eafcf22  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-26T16:05:33.493307+00:00
9f755db7-dc12-485f-8c47-bbb59834aaf9  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-26T16:05:33.397055+00:00

Cost Frontier

pass_rate vs cost_usd (Pareto frontier marked with *)
* [#######-------------] 37.5% @ $0.0010  (coder)
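
For readers unfamiliar with the frontier marker, here is a rough sketch of how a line like the one above could be produced. It is not the report generator's actual code: `render_bar` and `pareto_frontier` are hypothetical names, the 20-cell bar width and the truncating fill rule are inferred from the single line shown, and the dominance test is the usual textbook definition (no other point is at least as cheap and at least as accurate while being strictly better on one axis).

```python
def render_bar(pass_rate, width=20):
    """Draw a fixed-width text bar; the fill count is truncated, not rounded."""
    filled = int(pass_rate * width)  # assumed rule: 0.375 * 20 = 7.5 -> 7 cells
    return "[" + "#" * filled + "-" * (width - filled) + "]"


def pareto_frontier(points):
    """Return names of non-dominated points.

    points: list of (name, cost_usd, pass_rate). Lower cost and higher
    pass rate are both preferred.
    """
    frontier = []
    for name, cost, rate in points:
        dominated = any(
            (c <= cost and r >= rate) and (c < cost or r > rate)
            for _, c, r in points
        )
        if not dominated:
            frontier.append(name)
    return frontier


# The only data point in this report.
points = [("coder", 0.0010, 0.375)]
on_frontier = pareto_frontier(points)

lines = []
for name, cost, rate in points:
    marker = "*" if name in on_frontier else " "
    lines.append(f"{marker} {render_bar(rate)} {rate:.1%} @ ${cost:.4f}  ({name})")

print("\n".join(lines))
```

With a single model in the snapshot, that model is trivially on the frontier, so it carries the `*` marker.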