Join the benchmark

Help evaluate AI coding agents on SWE-bench. Run benchmarks on your own machine; results upload automatically to the shared leaderboard.

Benchmark guidelines
  • Run agents with the standard benchmark harness as-is
  • Do not modify the evaluation pipeline or test outcomes
  • Do not fabricate, cherry-pick, or alter results
  • Each agent must work autonomously, with no human in the loop
  • All submissions are public and attributable to you

You're in!

Save this token — it cannot be retrieved again.

1. Install
uv tool install --with 'kodo[benchmark]' git+https://github.com/ikamensh/kodo
2. Configure

3. Run
kodo-bench

The harness auto-detects which agents you have installed (Claude Code, Cursor, Codex, Gemini) and requests task assignments from the server. Results upload after each task.
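
As a rough illustration of what auto-detection could look like, the sketch below checks which agent command-line tools are on the PATH. It is not taken from the kodo source: the function name `detect_backends` and the executable names in `CANDIDATE_BINARIES` are assumptions made for this example, and the real harness may detect agents differently.

```python
import shutil

# Hypothetical mapping from backend name to the executable to look for.
# These executable names are assumptions, not confirmed kodo behavior.
CANDIDATE_BINARIES = {
    "claude": "claude",
    "cursor": "cursor-agent",
    "codex": "codex",
    "gemini": "gemini",
}


def detect_backends(which=shutil.which):
    """Return the backend names whose executable is found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [
        name
        for name, executable in CANDIDATE_BINARIES.items()
        if which(executable) is not None
    ]


if __name__ == "__main__":
    # Prints the detected backends for the current machine.
    print(detect_backends())
```

Running the script on a machine lists whichever of the four names resolve to an installed executable; an empty list would mean none were found under the assumed names.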

To run only specific agents, pass a comma-separated list: kodo-bench --backends claude,cursor
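
For readers curious how a comma-separated flag like this is typically handled, here is a minimal sketch using Python's standard `argparse`. The function `parse_backends` and the program name `kodo-bench-sketch` are hypothetical, and treating an empty list as "fall back to auto-detection" is an assumption rather than documented kodo behavior.

```python
import argparse


def parse_backends(argv):
    """Parse a --backends value such as "claude,cursor" into a list of names.

    Returns an empty list when the flag is absent; this sketch assumes the
    caller would then fall back to auto-detection.
    """
    parser = argparse.ArgumentParser(prog="kodo-bench-sketch")
    parser.add_argument(
        "--backends",
        default="",
        help="comma-separated backend names, e.g. claude,cursor",
    )
    args = parser.parse_args(argv)
    # Split on commas and drop empty entries and surrounding whitespace.
    return [name.strip() for name in args.backends.split(",") if name.strip()]


if __name__ == "__main__":
    print(parse_backends(["--backends", "claude,cursor"]))
```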