Eval Report: ci-pr-smoke

Profile: gdm-swebench-lite-v1 | Tasks: 2 | Pass rate: 100.0% | Cost: $0.0020

Task ID | Band | Score | Passed | Cost
python-bugfix-easy-001 | easy | 0.740 | yes | $0.0010
python-config-easy-001 | easy | 0.740 | yes | $0.0010
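The summary line above is consistent with a simple aggregation of the per-task rows: pass rate is the share of tasks marked as passed, and cost is the sum of per-task costs. The sketch below reproduces that arithmetic. The report's actual generator is not shown, so `TaskResult` and `summarize` are hypothetical names. Both rows are treated as passed, which is what the 100.0% pass rate implies. Costs are summed as `Decimal` values so that the four-decimal dollar amounts do not pick up floating-point error.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class TaskResult:
    # Hypothetical record mirroring one row of the per-task table.
    task_id: str
    band: str
    score: float
    passed: bool
    cost_usd: Decimal


def summarize(results: list[TaskResult]) -> tuple[int, float, Decimal]:
    """Return (task count, pass rate in percent, total cost in USD)."""
    n = len(results)
    passed = sum(1 for r in results if r.passed)
    pass_rate = 100.0 * passed / n if n else 0.0
    # Start the sum at Decimal("0") so the result stays a Decimal.
    total_cost = sum((r.cost_usd for r in results), Decimal("0"))
    return n, pass_rate, total_cost


# Rows copied from the table; passed=True is inferred from the 100.0% pass rate.
rows = [
    TaskResult("python-bugfix-easy-001", "easy", 0.740, True, Decimal("0.0010")),
    TaskResult("python-config-easy-001", "easy", 0.740, True, Decimal("0.0010")),
]
n, rate, cost = summarize(rows)
print(f"Tasks: {n} | Pass rate: {rate:.1f}% | Cost: ${cost:.4f}")
# Tasks: 2 | Pass rate: 100.0% | Cost: $0.0020
```

Note that the per-task score (0.740) is reported separately and does not enter the pass-rate calculation in this sketch.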

Leaderboard Snapshot

Latest run: dab26c4a-16b2-4ef5-aa30-964b72cf4c29 | Latest model: coder | Latest score: 0.740 | Recorded at: 2026-04-26T09:16:32.720036+00:00

Recent Trend

Run ID | Model | Git SHA | Score | Created
dab26c4a-16b2-4ef5-aa30-964b72cf4c29 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T09:16:32.720036+00:00
da14a72a-af7d-4e6a-bf00-32a3ecaaf649 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T09:16:32.218111+00:00
9404c3ab-d1ec-490a-ba02-67150b92f695 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T09:16:31.660289+00:00
2067fc3c-b2e9-46cb-8527-3a7126323486 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T09:16:31.567122+00:00
cbabe835-2607-4b14-9fd8-65d3dcb2edd7 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T09:16:31.472535+00:00

Cost Frontier

pass_rate vs cost_usd (Pareto frontier marked with *)
* [####################] 100.0% @ $0.0097  (coder)
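The report does not say how the frontier is computed or how the bar is drawn. The sketch below shows one common reading of "Pareto frontier" for pass_rate vs cost_usd: a point is marked with * if no other point has an equal or higher pass rate at an equal or lower cost while being strictly better on at least one of the two. It also assumes a 20-character bar filled in proportion to the pass rate. `Point`, `pareto_frontier` and `render` are hypothetical names. The two `hypothetical-*` models are invented purely to show one frontier point and one dominated point. Only the `coder` point comes from this report.

```python
from typing import NamedTuple


class Point(NamedTuple):
    model: str
    pass_rate: float  # percent, 0-100
    cost_usd: float


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    return (a.pass_rate >= b.pass_rate and a.cost_usd <= b.cost_usd
            and (a.pass_rate > b.pass_rate or a.cost_usd < b.cost_usd))


def pareto_frontier(points: list[Point]) -> set[Point]:
    # O(n^2) pairwise check; adequate for the handful of models in a smoke report.
    return {p for p in points if not any(dominates(q, p) for q in points)}


def render(points: list[Point], width: int = 20) -> list[str]:
    """Format one line per point, cheapest first, marking frontier points with '*'."""
    frontier = pareto_frontier(points)
    lines = []
    for p in sorted(points, key=lambda p: p.cost_usd):
        filled = round(width * p.pass_rate / 100)
        bar = "#" * filled + " " * (width - filled)
        mark = "*" if p in frontier else " "
        lines.append(f"{mark} [{bar}] {p.pass_rate:.1f}% @ ${p.cost_usd:.4f}  ({p.model})")
    return lines


points = [
    Point("coder", 100.0, 0.0097),          # the point shown in this report
    Point("hypothetical-a", 50.0, 0.0050),  # illustrative: cheaper, so still on the frontier
    Point("hypothetical-b", 50.0, 0.0120),  # illustrative: dominated by coder
]
for line in render(points):
    print(line)
```

With these inputs, `coder` and `hypothetical-a` are both marked with *, because neither is both cheaper and at least as accurate as the other. `hypothetical-b` is unmarked. The `coder` line is formatted the same way as the line in the report.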