Eval Report: ci-pr-smoke

Profile: gdm-swebench-lite-v1 | Tasks: 2 | Pass rate: 100.0% | Cost: $0.0020

Task ID                | Band | Score | Passed | Cost
python-bugfix-easy-001 | easy | 0.740 | yes    | $0.0010
python-config-easy-001 | easy | 0.740 | yes    | $0.0010

Leaderboard Snapshot

Latest run: dbb0e180-0d4b-4e8e-ba00-a54d97a5908b | Latest model: coder | Latest score: 0.740 | Recorded at: 2026-04-27T12:45:15.088630+00:00

Recent Trend

Run ID                               | Model | Git SHA                                  | Score | Created
dbb0e180-0d4b-4e8e-ba00-a54d97a5908b | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-27T12:45:15.088630+00:00
b9bd1f9a-4e6a-45df-8b4a-b70ca99311e7 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-27T12:45:14.980683+00:00
a073994c-7144-4664-a1fc-28eee44cb650 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-27T12:45:13.983516+00:00
a12bc14b-a170-48f9-a46f-7102cbe03b31 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-27T12:45:13.753615+00:00
0da2c3d4-e388-444f-9083-2e4889b4ae42 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-27T12:45:13.645741+00:00

Cost Frontier

pass_rate vs cost_usd (Pareto frontier marked with *)
* [####################] 100.0% @ $0.0060  (coder)
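
How a frontier listing like the one above might be produced: the sketch below is illustrative only and is not the report tool's actual code. The names Point, pareto_frontier, and render_line are hypothetical, as are the models "model-b" and "model-c" and their numbers; only the "coder" point comes from this report. The 20-character bar width is inferred from the line above, and padding partial bars with spaces is an assumption. A point is treated as being on the frontier when no cheaper-or-equal point reaches a pass rate at least as high.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    model: str
    cost_usd: float
    pass_rate: float  # fraction in [0, 1]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points not dominated on (lower cost, higher pass rate)."""
    # Cheapest first; among equal costs, the highest pass rate first.
    ordered = sorted(points, key=lambda p: (p.cost_usd, -p.pass_rate))
    frontier = []
    best = float("-inf")
    for p in ordered:
        # Keep a point only if it beats every cheaper-or-equal point seen so far.
        if p.pass_rate > best:
            frontier.append(p)
            best = p.pass_rate
    return frontier


def render_line(p: Point, on_frontier: bool, width: int = 20) -> str:
    """Format one row in the style of the Cost Frontier listing."""
    filled = round(p.pass_rate * width)
    bar = ("#" * filled).ljust(width)  # space padding is an assumption
    mark = "*" if on_frontier else " "
    return f"{mark} [{bar}] {p.pass_rate * 100:.1f}% @ ${p.cost_usd:.4f}  ({p.model})"


if __name__ == "__main__":
    points = [
        Point("coder", 0.0060, 1.0),    # taken from this report
        Point("model-b", 0.0100, 1.0),  # hypothetical: same pass rate, higher cost
        Point("model-c", 0.0030, 0.5),  # hypothetical: cheaper, lower pass rate
    ]
    frontier = set(pareto_frontier(points))
    for p in points:
        print(render_line(p, p in frontier))
```

With a single model in the run, as here, that model is trivially on the frontier, which is why the only row carries the * marker.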