Join the benchmark
Help evaluate AI coding agents on SWE-bench. Run benchmarks on your own machine; results upload automatically to the shared leaderboard.
Benchmark guidelines
- Run agents with the standard benchmark harness as-is
- Do not modify the evaluation pipeline or test outcomes
- Do not fabricate, cherry-pick, or alter results
- Each agent must work autonomously — no human-in-the-loop
- All submissions are public and attributable to you
You're in!
Save this token — it cannot be retrieved again.
1. Install
uv tool install --with 'kodo[benchmark]' git+https://github.com/ikamensh/kodo
2. Configure
3. Run
kodo-bench
The harness auto-detects which agents you have installed (Claude Code, Cursor, Codex, Gemini) and requests task assignments from the server. Results upload after each task.
To run only specific agents, pass --backends with a comma-separated list, for example: kodo-bench --backends claude,cursor
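For readers curious how agent auto-detection and a --backends filter could fit together, here is a minimal sketch in Python. It is not kodo's actual implementation: the function names, the backend-to-executable mapping, and the executable names themselves (for example cursor-agent) are assumptions made for illustration. The sketch looks up each candidate executable on PATH and then narrows the result to a comma-separated list.

```python
import shutil

# Hypothetical mapping from backend name to the executable searched for on PATH.
# These executable names are assumptions for illustration, not taken from kodo.
CANDIDATE_EXECUTABLES = {
    "claude": "claude",
    "cursor": "cursor-agent",
    "codex": "codex",
    "gemini": "gemini",
}


def detect_backends(which=shutil.which):
    """Return the backend names whose executable is found on PATH."""
    return [
        name
        for name, exe in CANDIDATE_EXECUTABLES.items()
        if which(exe) is not None
    ]


def select_backends(detected, requested=None):
    """Keep only the requested backends (comma-separated), in detection order."""
    if not requested:
        return list(detected)
    wanted = {part.strip() for part in requested.split(",") if part.strip()}
    return [name for name in detected if name in wanted]


if __name__ == "__main__":
    found = detect_backends()
    print("detected:", found)
    print("selected:", select_backends(found, "claude,cursor"))
```

Passing the lookup function as a parameter keeps the detection step easy to exercise without any agents installed. A requested backend that is not detected is dropped silently in this sketch; a real harness might report it instead.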