Eval Report: ci-nightly

Profile: gdm-swebench-lite-v1 | Tasks: 8 | Pass rate: 37.5% | Cost: $0.0080

Task ID                    | Band | Score | Passed | Cost
canary-6                   | hard | 0.310 | no     | $0.0010
canary-0                   | hard | 0.310 | no     | $0.0010
canary-2                   | hard | 0.310 | no     | $0.0010
python-bugfix-easy-001     | easy | 0.740 | yes    | $0.0010
python-config-easy-001     | easy | 0.740 | yes    | $0.0010
canary-7                   | hard | 0.310 | no     | $0.0010
canary-12                  | hard | 0.310 | no     | $0.0010
python-dependency-easy-001 | easy | 0.740 | yes    | $0.0010
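The headline figures (8 tasks, 37.5% pass rate, $0.0080 cost) can be recomputed from the per-task rows. The sketch below does that. It assumes that the three easy tasks scoring 0.740 are the passing ones, because with 3 of 8 passing that is the only split consistent with the 37.5% headline. It also assumes that "Cost" in the header is the sum of the per-task costs. The names TaskResult and summarize are hypothetical and are not part of the reporting tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One row of the per-task table (hypothetical type for illustration)."""
    task_id: str
    band: str
    score: float
    passed: bool  # assumption: inferred from the 37.5% headline, see lead-in
    cost_usd: float


RESULTS = [
    TaskResult("canary-6", "hard", 0.310, False, 0.0010),
    TaskResult("canary-0", "hard", 0.310, False, 0.0010),
    TaskResult("canary-2", "hard", 0.310, False, 0.0010),
    TaskResult("python-bugfix-easy-001", "easy", 0.740, True, 0.0010),
    TaskResult("python-config-easy-001", "easy", 0.740, True, 0.0010),
    TaskResult("canary-7", "hard", 0.310, False, 0.0010),
    TaskResult("canary-12", "hard", 0.310, False, 0.0010),
    TaskResult("python-dependency-easy-001", "easy", 0.740, True, 0.0010),
]


def summarize(results):
    """Return (task count, pass rate, total cost) over the task rows."""
    n = len(results)
    passed = sum(r.passed for r in results)  # True counts as 1
    total_cost = sum(r.cost_usd for r in results)  # assumed: header cost is a sum
    return n, passed / n, total_cost


tasks, pass_rate, total_cost = summarize(RESULTS)
print(f"Tasks: {tasks} | Pass rate: {pass_rate:.1%} | Cost: ${total_cost:.4f}")
# prints: Tasks: 8 | Pass rate: 37.5% | Cost: $0.0080
```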

Leaderboard Snapshot

Latest run: c5bd05dc-b37d-4ae8-9d2c-03608572db24 | Latest model: coder | Latest score: 0.740 | Recorded at: 2026-04-27T01:21:05.624826+00:00

Recent Trend

Run ID                               | Model | Git SHA                                  | Score | Created
c5bd05dc-b37d-4ae8-9d2c-03608572db24 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-27T01:21:05.624826+00:00
a7123a5c-3a24-4710-bb2b-d4f610df5712 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.310 | 2026-04-27T01:21:05.537524+00:00
020a68de-2084-4144-8233-61bb47f7ac6b | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.310 | 2026-04-27T01:21:05.440221+00:00
67043b70-6c33-49a3-8314-2f5301356c9b | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-27T01:21:05.347993+00:00
6c77a649-6f59-4b0a-9375-2d98d01f0a2d | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-27T01:21:05.287182+00:00
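The leaderboard snapshot is consistent with the trend table if "latest" means the run with the greatest Created timestamp. The sketch below picks that run and lists the scores from oldest to newest. That reading of "latest" is an assumption about how the snapshot is chosen. The Run type and both helper functions are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple


class Run(NamedTuple):
    """One row of the trend table (hypothetical type for illustration)."""
    run_id: str
    model: str
    score: float
    created: str  # ISO 8601 timestamp with a UTC offset, as printed in the report


RUNS = [
    Run("c5bd05dc-b37d-4ae8-9d2c-03608572db24", "coder", 0.740, "2026-04-27T01:21:05.624826+00:00"),
    Run("a7123a5c-3a24-4710-bb2b-d4f610df5712", "coder", 0.310, "2026-04-27T01:21:05.537524+00:00"),
    Run("020a68de-2084-4144-8233-61bb47f7ac6b", "coder", 0.310, "2026-04-27T01:21:05.440221+00:00"),
    Run("67043b70-6c33-49a3-8314-2f5301356c9b", "coder", 0.740, "2026-04-27T01:21:05.347993+00:00"),
    Run("6c77a649-6f59-4b0a-9375-2d98d01f0a2d", "coder", 0.740, "2026-04-27T01:21:05.287182+00:00"),
]


def created_at(run: Run) -> datetime:
    # fromisoformat parses microseconds and the "+00:00" offset (Python 3.7+)
    return datetime.fromisoformat(run.created)


def latest_run(runs):
    """Assumed rule: the latest run is the one with the newest timestamp."""
    return max(runs, key=created_at)


def score_trend(runs):
    """Scores ordered from oldest to newest run."""
    return [r.score for r in sorted(runs, key=created_at)]


latest = latest_run(RUNS)
print(latest.run_id, f"{latest.score:.3f}")
# prints: c5bd05dc-b37d-4ae8-9d2c-03608572db24 0.740
print(score_trend(RUNS))
# prints: [0.74, 0.74, 0.31, 0.31, 0.74]
```

Timestamps are parsed rather than compared as strings. That keeps the ordering correct even if runs were ever recorded with different UTC offsets.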

Cost Frontier

pass_rate vs cost_usd (Pareto frontier marked with *)
* [#######-------------] 37.5% @ $0.0010  (coder)
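The frontier line above can be reproduced with a standard Pareto dominance check. A point stays on the frontier unless some other point has a pass rate at least as high and a cost at least as low, and is strictly better on one of the two. The sketch below also makes several assumptions based on the printed line:
- The bar is 20 characters wide.
- The fill is floored (0.375 × 20 = 7.5 gives 7 '#').
- The plotted $0.0010 is the per-task cost rather than the $0.0080 total.

Point, dominates, pareto_frontier and render are hypothetical names. They are not the reporting tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """One model on the cost frontier (hypothetical type; frozen, so hashable)."""
    model: str
    pass_rate: float  # fraction in [0, 1]
    cost_usd: float


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse on both axes and strictly better on at least one."""
    return (a.pass_rate >= b.pass_rate and a.cost_usd <= b.cost_usd
            and (a.pass_rate > b.pass_rate or a.cost_usd < b.cost_usd))


def pareto_frontier(points):
    """Keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]


def render(points, width=20):
    """Draw one bar per point, cheapest first; frontier points get a '*'."""
    frontier = set(pareto_frontier(points))
    lines = []
    for p in sorted(points, key=lambda p: p.cost_usd):
        filled = int(p.pass_rate * width)  # assumed floor: 0.375 * 20 = 7.5 -> 7
        bar = "#" * filled + "-" * (width - filled)
        mark = "*" if p in frontier else " "
        lines.append(f"{mark} [{bar}] {p.pass_rate:.1%} @ ${p.cost_usd:.4f}  ({p.model})")
    return "\n".join(lines)


print(render([Point("coder", 0.375, 0.0010)]))
# prints: * [#######-------------] 37.5% @ $0.0010  (coder)
```

With only one model in this run, the frontier is trivially that single point. The dominance check matters once several models are compared at different costs.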