Eval Report: ci-pr-smoke

Profile: gdm-swebench-lite-v1 | Tasks: 2 | Pass rate: 100.0% | Cost: $0.0020

Task ID | Band | Score | Passed | Cost
python-bugfix-easy-001 | easy | 0.740 | yes | $0.0010
python-config-easy-001 | easy | 0.740 | yes | $0.0010
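
The summary line can be reproduced from the per-task rows. The following is a minimal sketch, not the harness's actual code: the `TaskResult` class and `summarize` function are hypothetical names, and `passed=True` is an assumption inferred from the reported 100.0% pass rate over 2 tasks.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """One row of the per-task table (hypothetical structure)."""
    task_id: str
    band: str
    score: float
    passed: bool
    cost_usd: float


def summarize(results):
    """Return (task count, pass rate in percent, total cost in USD)."""
    n = len(results)
    pass_rate = 100.0 * sum(r.passed for r in results) / n if n else 0.0
    total_cost = sum(r.cost_usd for r in results)
    return n, pass_rate, total_cost


# Rows taken from the report; passed=True is assumed from the 100.0% pass rate.
results = [
    TaskResult("python-bugfix-easy-001", "easy", 0.740, True, 0.0010),
    TaskResult("python-config-easy-001", "easy", 0.740, True, 0.0010),
]

n, pass_rate, total_cost = summarize(results)
print(f"Tasks: {n} | Pass rate: {pass_rate:.1f}% | Cost: ${total_cost:.4f}")
```

With the two rows above, this prints the same task count, pass rate, and cost as the summary line.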

Leaderboard Snapshot

Latest run: 90faf7ec-dff2-412a-b808-e3f3fe2f7dd3 | Latest model: coder | Latest score: 0.740 | Recorded at: 2026-04-26T16:07:15.406970+00:00

Recent Trend

Run ID | Model | Git SHA | Score | Created
90faf7ec-dff2-412a-b808-e3f3fe2f7dd3 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T16:07:15.406970+00:00
c298b0fd-a152-4479-a61a-798022f23bcd | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T16:07:15.259523+00:00
6a69e095-47e3-4da7-80a2-27e2bb466e6d | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T16:07:15.072786+00:00
85bd88da-0ea3-4032-9c75-017c6b17dfa7 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T16:07:15.008355+00:00
298d1714-2daa-47af-973a-f501deec0ee5 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T16:07:14.950676+00:00

Cost Frontier

pass_rate vs cost_usd (Pareto frontier marked with *)
* [####################] 100.0% @ $0.0090  (coder)
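
For readers unfamiliar with the marking, a model is on the Pareto frontier when no other model has both a pass rate at least as high and a cost at least as low, with one of the two strictly better. The sketch below illustrates that rule and the bar rendering; `pareto_frontier` and `render_row` are hypothetical names, and the 20-character bar width is inferred from the row above rather than taken from the harness.

```python
def pareto_frontier(points):
    """Return names of non-dominated entries.

    points: list of (name, pass_rate_percent, cost_usd) tuples.
    An entry is dominated if another entry is at least as good on both
    axes (higher pass rate, lower cost) and strictly better on one.
    """
    frontier = []
    for name, rate, cost in points:
        dominated = any(
            (r >= rate and c <= cost) and (r > rate or c < cost)
            for _, r, c in points
        )
        if not dominated:
            frontier.append(name)
    return frontier


def render_row(name, rate, cost, on_frontier, width=20):
    """Format one chart row; the bar length is proportional to the pass rate."""
    filled = round(width * rate / 100.0)
    bar = "#" * filled + " " * (width - filled)
    marker = "*" if on_frontier else " "
    return f"{marker} [{bar}] {rate:.1f}% @ ${cost:.4f}  ({name})"


# The single data point shown in this report.
points = [("coder", 100.0, 0.0090)]
frontier = pareto_frontier(points)
for name, rate, cost in points:
    print(render_row(name, rate, cost, name in frontier))
```

With only one model in the snapshot, that model is trivially on the frontier, which is why the single row carries the `*` marker.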