Eval Report: ci-pr-smoke

Profile: gdm-swebench-lite-v1 | Tasks: 2 | Pass rate: 100.0% | Cost: $0.0020

Task ID                 Band  Score  Passed  Cost
python-bugfix-easy-001  easy  0.740  yes     $0.0010
python-config-easy-001  easy  0.740  yes     $0.0010
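The summary line above the table (Tasks, Pass rate, Cost) is an aggregate of the per-task rows. The sketch below shows one way such a line could be derived. The report does not show its generator, so the `TaskResult` record and `summarize` helper are hypothetical. The sketch also assumes that pass rate is the share of tasks marked passed, and that cost is the plain sum of per-task costs.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class TaskResult:
    # Hypothetical record mirroring one row of the task table.
    task_id: str
    band: str
    score: float
    passed: bool
    cost_usd: Decimal


def summarize(results: list[TaskResult]) -> str:
    """Build a summary line in the report's 'Tasks | Pass rate | Cost' format."""
    total = len(results)
    passed = sum(1 for r in results if r.passed)
    # Guard against an empty run to avoid division by zero.
    pass_rate = 100.0 * passed / total if total else 0.0
    # Decimal keeps the summed cost exact at four decimal places.
    cost = sum((r.cost_usd for r in results), Decimal("0"))
    return f"Tasks: {total} | Pass rate: {pass_rate:.1f}% | Cost: ${cost:.4f}"


rows = [
    TaskResult("python-bugfix-easy-001", "easy", 0.740, True, Decimal("0.0010")),
    TaskResult("python-config-easy-001", "easy", 0.740, True, Decimal("0.0010")),
]
print(summarize(rows))
# Tasks: 2 | Pass rate: 100.0% | Cost: $0.0020
```

`Decimal` is used here instead of `float` so that small per-task costs sum without binary rounding drift.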

Leaderboard Snapshot

Latest run: f8c4bbf5-09c5-447b-8dc7-438db4965339 | Latest model: coder | Latest score: 0.740 | Recorded at: 2026-04-26T09:24:56.089936+00:00

Recent Trend

Run ID                                Model  Git SHA                                   Score  Created
f8c4bbf5-09c5-447b-8dc7-438db4965339  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-26T09:24:56.089936+00:00
83041cba-3f18-4973-aa60-adb953063fb8  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-26T09:24:56.016205+00:00
d269b23a-9396-4f67-a8d6-24a86e684381  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-26T09:24:55.767576+00:00
c0386627-b3eb-4cb7-ba21-0559904f6df0  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-26T09:24:55.701385+00:00
d7451d63-ee19-401e-84fb-a4ecdcdb272f  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-26T09:24:55.627080+00:00

Cost Frontier

pass_rate vs cost_usd (Pareto frontier marked with *)
* [####################] 100.0% @ $0.0095  (coder)
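The chart marks Pareto-optimal points with `*`. A point is on the frontier when no other point has both an equal or higher pass rate and an equal or lower cost while being strictly better on at least one axis. The sketch below is a minimal illustration of that rule together with a 20-character bar in the chart's layout. The `Point`, `pareto_frontier`, and `render` names are hypothetical. The sketch assumes pass rate is stored as a fraction in [0, 1] and cost in USD; it is not the report tool's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    # Hypothetical aggregate for one model: pass rate in [0, 1], cost in USD.
    model: str
    pass_rate: float
    cost_usd: float


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.pass_rate >= b.pass_rate and a.cost_usd <= b.cost_usd
    better = a.pass_rate > b.pass_rate or a.cost_usd < b.cost_usd
    return no_worse and better


def pareto_frontier(points: list[Point]) -> set[Point]:
    """Return the points no other point dominates (O(n^2), fine for small n)."""
    # dominates(p, p) is always False, so comparing p with itself is harmless.
    return {p for p in points if not any(dominates(q, p) for q in points)}


def render(points: list[Point], width: int = 20) -> list[str]:
    """Render one ASCII bar per point, cheapest first, marking frontier points with '*'."""
    frontier = pareto_frontier(points)
    lines = []
    for p in sorted(points, key=lambda p: p.cost_usd):
        filled = round(p.pass_rate * width)
        bar = "#" * filled + " " * (width - filled)
        mark = "*" if p in frontier else " "
        lines.append(f"{mark} [{bar}] {p.pass_rate * 100:.1f}% @ ${p.cost_usd:.4f}  ({p.model})")
    return lines


print("\n".join(render([Point("coder", 1.0, 0.0095)])))
# * [####################] 100.0% @ $0.0095  (coder)
```

The pairwise check is quadratic. That is adequate for a handful of models. A sort-by-cost sweep would scale better if many configurations were compared.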