Eval Report: ci-pr-smoke

Profile: gdm-swebench-lite-v1 | Tasks: 15 | Pass rate: 100.0% | Cost: $0.0150

Task ID | Band | Score | Passed | Cost
python-bugfix-easy-001 | easy | 0.740 | yes | $0.0010
python-config-easy-001 | easy | 0.740 | yes | $0.0010
python-dependency-easy-001 | easy | 0.740 | yes | $0.0010
python-explain-easy-001 | easy | 0.740 | yes | $0.0010
python-multi-file-easy-001 | easy | 0.740 | yes | $0.0010
python-performance-easy-001 | easy | 0.740 | yes | $0.0010
python-recovery-easy-001 | easy | 0.740 | yes | $0.0010
python-refactor-easy-001 | easy | 0.740 | yes | $0.0010
python-security-fix-easy-001 | easy | 0.740 | yes | $0.0010
python-test-writing-easy-001 | easy | 0.740 | yes | $0.0010
typescript-bugfix-easy-001 | easy | 0.740 | yes | $0.0010
typescript-config-easy-001 | easy | 0.740 | yes | $0.0010
typescript-dependency-easy-001 | easy | 0.740 | yes | $0.0010
typescript-explain-easy-001 | easy | 0.740 | yes | $0.0010
typescript-multi-file-easy-001 | easy | 0.740 | yes | $0.0010
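
The header totals can be cross-checked against the per-task rows. The sketch below is one plausible way to derive them, not the harness's actual code: `TaskResult` and `summarize` are hypothetical names, and every task is assumed to have passed, as the 100.0% pass rate implies.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class TaskResult:
    # Hypothetical record type; fields mirror the table columns.
    task_id: str
    band: str
    score: Decimal
    passed: bool
    cost_usd: Decimal


def summarize(results):
    """Return (task_count, pass_rate_percent, total_cost_usd)."""
    count = len(results)
    passed = sum(1 for r in results if r.passed)
    pass_rate = 100.0 * passed / count if count else 0.0
    # Decimal avoids float rounding noise when summing small dollar amounts.
    total_cost = sum((r.cost_usd for r in results), Decimal("0"))
    return count, pass_rate, total_cost


# Task IDs as listed in the table: ten Python categories, five TypeScript.
python_kinds = [
    "bugfix", "config", "dependency", "explain", "multi-file",
    "performance", "recovery", "refactor", "security-fix", "test-writing",
]
typescript_kinds = python_kinds[:5]
task_ids = [f"python-{k}-easy-001" for k in python_kinds]
task_ids += [f"typescript-{k}-easy-001" for k in typescript_kinds]

# passed=True for every row is an assumption drawn from the 100.0% pass rate.
results = [
    TaskResult(tid, "easy", Decimal("0.740"), True, Decimal("0.0010"))
    for tid in task_ids
]

count, pass_rate, total_cost = summarize(results)
print(f"Tasks: {count} | Pass rate: {pass_rate:.1f}% | Cost: ${total_cost:.4f}")
```

With the rows above, the summed per-task cost agrees with the $0.0150 reported in the header.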

Leaderboard Snapshot

Latest run: d269b23a-9396-4f67-a8d6-24a86e684381 | Latest model: coder | Latest score: 0.740 | Recorded at: 2026-04-26T09:24:55.767576+00:00

Recent Trend

Run ID | Model | Git SHA | Score | Created
d269b23a-9396-4f67-a8d6-24a86e684381 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T09:24:55.767576+00:00
c0386627-b3eb-4cb7-ba21-0559904f6df0 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T09:24:55.701385+00:00
d7451d63-ee19-401e-84fb-a4ecdcdb272f | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T09:24:55.627080+00:00
f57a270f-7ffc-4db8-94b6-86e3a39fe575 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T09:24:55.559663+00:00
843a998b-8602-4ad8-9e9b-5701519a3224 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-26T09:24:55.485028+00:00

Cost Frontier

pass_rate vs cost_usd (Pareto frontier marked with *)
* [####################] 100.0% @ $0.0095  (coder)
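
For readers unfamiliar with the marking, a point is on the Pareto frontier when no other point achieves an equal or higher pass rate at an equal or lower cost while being strictly better on one of the two. The sketch below illustrates that rule and the bar rendering. It is an illustration under assumptions, not the report generator's code: `RunPoint`, `pareto_frontier`, and `render` are hypothetical names, the 20-character bar width is inferred from the line above, the space padding for partial bars is a guess, and `model-a` and `model-b` are made-up points added only to show domination.

```python
from typing import NamedTuple


class RunPoint(NamedTuple):
    # Hypothetical record type for one (model, pass rate, cost) point.
    model: str
    pass_rate: float  # percent, 0 to 100
    cost_usd: float


def pareto_frontier(points):
    """Keep the points that no other point dominates."""
    frontier = []
    for p in points:
        dominated = any(
            q.pass_rate >= p.pass_rate
            and q.cost_usd <= p.cost_usd
            and (q.pass_rate > p.pass_rate or q.cost_usd < p.cost_usd)
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier


def render(point, on_frontier, width=20):
    """Format one chart line; width=20 is inferred from the report."""
    filled = round(point.pass_rate / 100 * width)
    bar = "#" * filled + " " * (width - filled)  # padding char is an assumption
    marker = "*" if on_frontier else " "
    return (
        f"{marker} [{bar}] {point.pass_rate:.1f}% "
        f"@ ${point.cost_usd:.4f}  ({point.model})"
    )


points = [
    RunPoint("coder", 100.0, 0.0095),   # the only point in this report
    RunPoint("model-a", 80.0, 0.0040),  # hypothetical: cheaper, lower pass rate
    RunPoint("model-b", 80.0, 0.0120),  # hypothetical: dominated by coder
]

frontier = pareto_frontier(points)
for p in points:
    print(render(p, p in frontier))
```

With a single model in the run, as here, that model is trivially on the frontier; the marker only becomes informative once several models or configurations are compared.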