Benchmark AI Coding Agents on YOUR Repo

The Agent Quality Toolkit β€” measure, generate, guard, and improve AI agent performance across your entire codebase in minutes.

2690 Tests Passing
6 PyPI Packages
57 Versions Shipped
pip install, No Config Required

One Pipeline, Five Stages

From raw codebase to an actionable agent quality score β€” automated end to end.

πŸ“
MEASURE
Benchmark agents on real tasks
β†’
βš™οΈ
GENERATE
Auto-create context files
β†’
πŸ›‘οΈ
GUARD
Static analysis on diffs
β†’
🧠
LEARN
Turn failures into rules
β†’
πŸ†
BENCHMARK
Compare Claude vs Codex

Six Tools, One Install

Everything you need to evaluate and improve AI agent quality.

MEASURE

coderace

Benchmark AI agents on real coding tasks from your repository. Compare Claude vs Codex vs any agent side-by-side.

GENERATE

agentmd

Automatically generate AGENTS.md and CLAUDE.md context files that make any AI coding agent perform better on your codebase.

GUARD

agentlint

Static analysis for AI diffs and context files. Catch hallucinated imports, broken references, and anti-patterns before they land.
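As a rough illustration of the hallucinated-import idea, here is a minimal shell sketch. The file path, field parsing, and check logic are hypothetical β€” this is not agentlint's implementation, just the shape of the problem it solves:

```shell
# Hypothetical sketch: verify that every import in an AI-generated
# Python diff actually resolves. Not agentlint's real code.
printf 'import os\nimport not_a_real_module_xyz\n' > /tmp/ai_diff.py

while read -r _kw mod; do
  # Try importing each module; report the ones that don't exist.
  python3 -c "import $mod" 2>/dev/null || echo "hallucinated import: $mod"
done < /tmp/ai_diff.py
```

With the sample diff above, the loop reports `not_a_real_module_xyz` and stays silent on `os`.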

LEARN

agentreflect

Turn agent failures into reusable rules. Distill lessons from bad diffs into your AGENTS.md automatically.

CONNECT

agentkit-mcp

Model Context Protocol server exposing all toolkit tools. Drop agentkit into any MCP-compatible agent workflow.

ORCHESTRATE

agentkit-cli

The umbrella CLI that ties it all together. One command to run the full pipeline, score your repo, and generate reports.

RANK

Daily Leaderboard

Daily ranking of the most agent-ready trending GitHub repos. Run agentkit daily --pages to publish a permanent leaderboard to GitHub Pages.


Quickstart

Quality score in under 60 seconds β€” no configuration required.

# Install the toolkit
pip install agentkit-cli

# Run quickstart β€” checks toolchain, scores your repo, shares a scorecard
agentkit quickstart

πŸš€ agentkit quickstart
Your fastest path to an agent quality score.

Checking toolchain readiness…
βœ“ 4 tools ready
Running fast analysis (agentlint + agentmd)…

β”Œβ”€ my-project ─────────────────────────────┐
β”‚ Score: 78/100  Grade: B  (12.4s)         β”‚
β”‚ Top Findings:                            β”‚
β”‚ β€’ [agentlint] 3 context issues found     β”‚
β”‚ β€’ [agentmd] Documentation score: 82      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Scorecard: https://here.now/abc123

# Full analysis
agentkit run .
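The composite score in the scorecard above blends the sub-tool scores. A minimal sketch, assuming an even two-way average β€” the agentlint sub-score and agentkit's real weighting are assumptions here, while 82 is the agentmd "Documentation score" shown above:

```shell
# Hypothetical composite: even average of the agentlint and agentmd
# sub-scores (agentkit's actual weighting may differ).
lint_score=74   # assumed agentlint sub-score
docs_score=82   # "Documentation score" from the agentmd finding above
composite=$(( (lint_score + docs_score) / 2 ))
echo "Score: $composite/100"
```

With these assumed inputs the sketch prints `Score: 78/100`, matching the scorecard.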

Command Reference

All major commands at a glance.

Command                               Description
agentkit quickstart                   Fastest path to a composite quality score β€” start here
agentkit run .                        Full pipeline analysis on the current directory
agentkit analyze github:owner/repo    Analyze any public GitHub repository
agentkit benchmark                    Compare Claude vs Codex on your codebase tasks
agentkit score                        Compute and display composite score
agentkit gate --min-score 70          Fail CI if score falls below threshold
agentkit demo --record                Print VHS tape commands for terminal recording
agentkit org github:vercel            Score every public repo in a GitHub org
agentkit doctor                       Check toolchain health and configuration
agentkit init                         Initialize agentkit in the current project
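In CI, agentkit gate --min-score 70 is the one-liner. Underneath, a gate of this kind reduces to a threshold check and an exit code, sketched here with a hard-coded score (computing the real score is agentkit's job, and the output format is invented for illustration):

```shell
#!/bin/sh
# Sketch of a CI score gate: exit nonzero when below the threshold,
# so the CI job fails. The score here is hard-coded for illustration.
score=78
min=70
if [ "$score" -lt "$min" ]; then
  echo "gate: FAIL ($score < $min)"
  exit 1
fi
echo "gate: PASS ($score >= $min)"
```

A nonzero exit is all most CI systems need to mark the step red, which is why the gate composes with any pipeline.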