Eval Report: ci-pr-smoke

Profile: gdm-swebench-lite-v1 | Tasks: 2 | Pass rate: 100.0% | Cost: $0.0020

Task ID                 Band  Score  Passed  Cost
python-bugfix-easy-001  easy  0.740  yes     $0.0010
python-config-easy-001  easy  0.740  yes     $0.0010

Leaderboard Snapshot

Latest run: ef710517-417c-4559-a619-dfb9b69959db | Latest model: coder | Latest score: 0.740 | Recorded at: 2026-04-27T01:10:36.100995+00:00

Recent Trend

Run ID                                Model  Git SHA                                   Score  Created
ef710517-417c-4559-a619-dfb9b69959db  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-27T01:10:36.100995+00:00
1ce13ed9-584a-4351-91c7-9727ff14495a  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-27T01:10:36.061007+00:00
3229d894-4a92-4f48-848b-5980bd9fa8a0  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-27T01:10:35.931867+00:00
5c0c3071-d3ea-4a6a-b58e-56e30c31332b  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-27T01:10:35.892618+00:00
15e16330-d928-4af7-bcc4-a2d981c98520  coder  4669773b4fbe9d507f1396f38777a1b36998faf3  0.740  2026-04-27T01:10:35.862763+00:00

Cost Frontier

pass_rate vs cost_usd (Pareto frontier marked with *)
* [####################] 100.0% @ $0.0075  (coder)
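
The frontier marking above can be reproduced with a short sketch. This is not the report generator's actual code: the names `RunPoint`, `pareto_frontier`, and `render` are hypothetical, and the 20-character bar width is inferred from the line above. A point is treated as on the frontier when no cheaper-or-equal-cost point reaches the same or a higher pass rate.

```python
from typing import List, NamedTuple


class RunPoint(NamedTuple):
    """One (model, pass_rate, cost) point; a hypothetical record type."""
    model: str
    pass_rate: float  # fraction in [0, 1]
    cost_usd: float


def pareto_frontier(points: List[RunPoint]) -> List[RunPoint]:
    """Return the points not dominated on (lower cost, higher pass rate)."""
    frontier = []
    best_pass_rate = -1.0
    # Walk from cheapest to most expensive; on equal cost, best pass rate first.
    for p in sorted(points, key=lambda p: (p.cost_usd, -p.pass_rate)):
        # Keep a point only if it beats every cheaper-or-equal-cost point.
        if p.pass_rate > best_pass_rate:
            frontier.append(p)
            best_pass_rate = p.pass_rate
    return frontier


def render(points: List[RunPoint], width: int = 20) -> List[str]:
    """Format each point as a bar line, marking frontier points with '*'."""
    on_frontier = set(pareto_frontier(points))
    lines = []
    for p in points:
        mark = "*" if p in on_frontier else " "
        bar = "#" * round(p.pass_rate * width)
        lines.append(
            f"{mark} [{bar:<{width}}] {p.pass_rate * 100:.1f}% "
            f"@ ${p.cost_usd:.4f}  ({p.model})"
        )
    return lines


if __name__ == "__main__":
    # The single point shown in this report.
    for line in render([RunPoint("coder", 1.0, 0.0075)]):
        print(line)
```

With only one model in the snapshot, that model is trivially on the frontier. The marking becomes informative once several models with different costs and pass rates are recorded.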