Eval Report: ci-nightly

Profile: gdm-swebench-lite-v1 | Tasks: 8 | Pass rate: 37.5% | Cost: $0.0080

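Task ID                    | Band | Score | Passed | Cost
---------------------------+------+-------+--------+--------
canary-6                   | hard | 0.310 |        | $0.0010
canary-0                   | hard | 0.310 |        | $0.0010
canary-2                   | hard | 0.310 |        | $0.0010
python-bugfix-easy-001     | easy | 0.740 |        | $0.0010
python-config-easy-001     | easy | 0.740 |        | $0.0010
canary-7                   | hard | 0.310 |        | $0.0010
canary-12                  | hard | 0.310 |        | $0.0010
python-dependency-easy-001 | easy | 0.740 |        | $0.0010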

Leaderboard Snapshot

Latest run: 26256635-acc7-489a-8768-23f70e9bc02e | Latest model: coder | Latest score: 0.740 | Recorded at: 2026-04-27T00:58:49.528753+00:00

Recent Trend

Run ID                               | Model | Git SHA                                  | Score | Created
-------------------------------------+-------+------------------------------------------+-------+---------------------------------
26256635-acc7-489a-8768-23f70e9bc02e | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-27T00:58:49.528753+00:00
4ff3f238-2361-42f3-acbd-96b03225e1e8 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.310 | 2026-04-27T00:58:49.467507+00:00
6fc328ea-cdfd-4a51-9ee5-cfba38e5d214 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.310 | 2026-04-27T00:58:49.412347+00:00
acea0cb1-573e-4134-a38a-018fa13c75a3 | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-27T00:58:49.337767+00:00
36ca7251-929b-4d84-a5ad-0812c81ecf9c | coder | 4669773b4fbe9d507f1396f38777a1b36998faf3 | 0.740 | 2026-04-27T00:58:49.265132+00:00

Cost Frontier

pass_rate vs cost_usd (Pareto frontier marked with *)
* [#######-------------] 37.5% @ $0.0010  (coder)
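
For readers who want to reproduce the frontier row above, the following is a minimal sketch of one way to mark Pareto-optimal points and draw the bar. It is not the report generator's actual code: the names RunPoint, pareto_frontier, and render_row are hypothetical, and the 20-character bar with truncated (not rounded) fill is an assumption inferred from the row shown, where 37.5% maps to 7 filled cells.

```python
from typing import List, NamedTuple


class RunPoint(NamedTuple):
    """Hypothetical record for one point on the cost frontier plot."""
    model: str
    pass_rate: float  # fraction in [0, 1]
    cost_usd: float


def pareto_frontier(points: List[RunPoint]) -> List[RunPoint]:
    """Return the points not dominated by any other point.

    q dominates p when q is at least as good on both axes
    (higher-or-equal pass rate, lower-or-equal cost) and strictly
    better on at least one of them.
    """
    frontier = []
    for p in points:
        dominated = any(
            q.pass_rate >= p.pass_rate
            and q.cost_usd <= p.cost_usd
            and (q.pass_rate > p.pass_rate or q.cost_usd < p.cost_usd)
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier


def render_row(p: RunPoint, on_frontier: bool, width: int = 20) -> str:
    """Format one row; the fill count is truncated (assumption)."""
    filled = int(p.pass_rate * width)
    bar = "#" * filled + "-" * (width - filled)
    marker = "*" if on_frontier else " "
    return f"{marker} [{bar}] {p.pass_rate:.1%} @ ${p.cost_usd:.4f}  ({p.model})"


# The single point reported in this run.
points = [RunPoint("coder", 0.375, 0.0010)]
frontier = pareto_frontier(points)
for p in points:
    print(render_row(p, p in frontier))
```

With a single model in the run, that one point is trivially on the frontier; the dominance check only becomes informative once several models or configurations are plotted together.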