Eval Report: ci-pr-smoke

Profile: gdm-swebench-lite-v1 | Tasks: 2 | Pass rate: 100.0% | Cost: $0.0020

Task ID                 Band  Score  Passed  Cost
python-bugfix-easy-001  easy  0.740  yes     $0.0010
python-config-easy-001  easy  0.740  yes     $0.0010
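
The summary line can be reproduced from the per-task rows. The following is a minimal sketch, not the harness's actual code: the `TaskResult` type and `summarize` function are hypothetical names, and the `passed=True` values are inferred from the reported 100.0% pass rate rather than read from the rows themselves.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical record type mirroring the columns of the task table.
    task_id: str
    band: str
    score: float
    passed: bool
    cost_usd: float


def summarize(results):
    """Return (task count, pass rate in percent, total cost in USD)."""
    n = len(results)
    pass_rate = 100.0 * sum(r.passed for r in results) / n if n else 0.0
    total_cost = sum(r.cost_usd for r in results)
    return n, pass_rate, total_cost


# passed=True is an assumption inferred from the 100.0% pass rate.
results = [
    TaskResult("python-bugfix-easy-001", "easy", 0.740, True, 0.0010),
    TaskResult("python-config-easy-001", "easy", 0.740, True, 0.0010),
]

n, pass_rate, total_cost = summarize(results)
print(f"Tasks: {n} | Pass rate: {pass_rate:.1f}% | Cost: ${total_cost:.4f}")
```

Under these assumptions the total cost is the sum of the two per-task costs, which agrees with the $0.0020 in the summary line.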

Leaderboard Snapshot

Latest run: e2fe354e-baa2-45d7-991e-4d288282dafe | Latest model: coder | Latest score: 0.740 | Recorded at: 2026-04-27T11:45:13.289416+00:00

Recent Trend

Run ID                                Model  Git SHA                                   Score  Created
e2fe354e-baa2-45d7-991e-4d288282dafe  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-27T11:45:13.289416+00:00
6a2ae84d-a56e-45de-9c13-76655b1bb9fe  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-27T11:45:13.181618+00:00
ecdbf39d-6e67-4aa3-b438-ced38c7712b1  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-27T11:45:12.279898+00:00
ffb00e32-3c93-4a37-af25-423ba60cebe4  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-27T11:45:12.168971+00:00
2d30d0c9-b463-46a2-ac49-d39ba58278ae  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-27T11:45:12.015531+00:00

Cost Frontier

pass_rate vs cost_usd (Pareto frontier marked with *)
* [####################] 100.0% @ $0.0067  (coder)
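
For readers unfamiliar with the marking above, here is a minimal sketch of how a Pareto frontier over pass rate and cost might be computed and rendered. It is an illustration under stated assumptions, not the report generator's code: `pareto_frontier` and `render` are hypothetical names, the 20-character bar width is read off the line above, and padding a partially filled bar with spaces is a guess.

```python
def pareto_frontier(points):
    """Return indices of non-dominated points.

    Each point is (label, pass_rate, cost_usd). A point is dominated when
    another point has pass_rate at least as high and cost at least as low,
    and is strictly better on one of the two.
    """
    frontier = []
    for i, (_, rate, cost) in enumerate(points):
        dominated = any(
            r2 >= rate and c2 <= cost and (r2 > rate or c2 < cost)
            for j, (_, r2, c2) in enumerate(points)
            if j != i
        )
        if not dominated:
            frontier.append(i)
    return frontier


def render(points, width=20):
    """Render one text line per point, marking frontier points with '*'."""
    on_frontier = set(pareto_frontier(points))
    lines = []
    for i, (label, rate, cost) in enumerate(points):
        filled = round(width * rate / 100.0)
        # Space padding for the unfilled part is an assumption.
        bar = "#" * filled + " " * (width - filled)
        marker = "*" if i in on_frontier else " "
        lines.append(f"{marker} [{bar}] {rate:.1f}% @ ${cost:.4f}  ({label})")
    return lines


for line in render([("coder", 100.0, 0.0067)]):
    print(line)
```

With a single model in the snapshot, that model is trivially on the frontier, since there is no other point that could dominate it.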