Eval Report: ci-nightly

Profile: gdm-swebench-lite-v1 | Tasks: 8 | Pass rate: 37.5% | Cost: $0.0080

Task ID                    | Band | Score | Passed | Cost
canary-6                   | hard | 0.310 |        | $0.0010
canary-0                   | hard | 0.310 |        | $0.0010
canary-2                   | hard | 0.310 |        | $0.0010
python-bugfix-easy-001     | easy | 0.740 |        | $0.0010
python-config-easy-001     | easy | 0.740 |        | $0.0010
canary-7                   | hard | 0.310 |        | $0.0010
canary-12                  | hard | 0.310 |        | $0.0010
python-dependency-easy-001 | easy | 0.740 |        | $0.0010

Leaderboard Snapshot

Latest run: 4761611a-0952-41e4-8cac-6bfb2627aacc | Latest model: coder | Latest score: 0.740 | Recorded at: 2026-04-26T16:08:15.271736+00:00

Recent Trend

Run ID                               | Model | Git SHA                                  | Score | Created
4761611a-0952-41e4-8cac-6bfb2627aacc | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T16:08:15.271736+00:00
9cad7b9d-1f64-4fe7-a9de-af6ade2a98aa | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.310 | 2026-04-26T16:08:15.193888+00:00
83ec6118-bb27-4e11-ab0c-e7d918bcb726 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.310 | 2026-04-26T16:08:15.098087+00:00
cee5c41c-6470-42f5-a7b6-b7b366e37abd | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T16:08:14.905573+00:00
8dae0598-5693-4fa8-92f8-efa49f6e80a0 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T16:08:14.820858+00:00

Cost Frontier

pass_rate vs cost_usd (Pareto frontier marked with *)
* [#######-------------] 37.5% @ $0.0010  (coder)
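
For readers who want to reproduce a frontier line like the one above, here is a minimal sketch of how it could be computed and rendered. This is not the report generator's actual code. The names `Point`, `pareto_frontier`, `render_bar`, and `render_line` are hypothetical. The 20-character bar width and the truncating fill rule are assumptions inferred from the single line shown (37.5% drawn as 7 filled cells).

```python
from typing import NamedTuple


class Point(NamedTuple):
    """One (model, pass_rate, cost) observation. Hypothetical type for illustration."""
    model: str
    pass_rate: float  # fraction in [0, 1]
    cost_usd: float


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Return the points that no other point dominates.

    q dominates p when q is at least as good on both axes
    (pass rate no lower, cost no higher) and strictly better on one.
    """
    frontier = []
    for p in points:
        dominated = any(
            q.pass_rate >= p.pass_rate
            and q.cost_usd <= p.cost_usd
            and (q.pass_rate > p.pass_rate or q.cost_usd < p.cost_usd)
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier


def render_bar(pass_rate: float, width: int = 20) -> str:
    """Draw a fixed-width bar. Assumed rule: truncate, so 0.375 * 20 = 7.5 fills 7 cells."""
    filled = int(pass_rate * width)
    return "[" + "#" * filled + "-" * (width - filled) + "]"


def render_line(p: Point, on_frontier: bool) -> str:
    """Format one row, marking frontier members with '*'."""
    marker = "*" if on_frontier else " "
    return (
        f"{marker} {render_bar(p.pass_rate)} "
        f"{p.pass_rate * 100:.1f}% @ ${p.cost_usd:.4f}  ({p.model})"
    )


if __name__ == "__main__":
    # The single point from this report; with only one point it is trivially on the frontier.
    points = [Point("coder", 0.375, 0.0010)]
    frontier = pareto_frontier(points)
    for p in points:
        print(render_line(p, p in frontier))
```

With a single model in the run, the frontier is trivial. The dominance check only starts to matter once several models or configurations are plotted together.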