The Agent Quality Toolkit: measure, generate, guard, and improve AI agent performance across your entire codebase in minutes.
From raw codebase to actionable agent quality score, automated end to end.
Everything you need to evaluate and improve AI agent quality.
Benchmark AI agents on real coding tasks from your repository. Compare Claude, Codex, or any other agent side by side.
Automatically generate AGENTS.md and CLAUDE.md context files that make any AI coding agent perform better on your codebase.
Static analysis for AI diffs and context files. Catch hallucinated imports, broken references, and anti-patterns before they land.
Turn agent failures into reusable rules. Distill lessons from bad diffs into your AGENTS.md automatically.
Model Context Protocol server exposing all toolkit tools. Drop agentkit into any MCP-compatible agent workflow.
The umbrella CLI that ties it all together. One command to run the full pipeline, score your repo, and generate reports.
Daily ranking of the most agent-ready trending GitHub repos. Run agentkit daily --pages to publish a permanent leaderboard to GitHub Pages.
AI-ready repos today: trending GitHub repos scored for agent-readiness and published to a permanent, SEO-crawlable leaderboard. Run agentkit pages-trending to publish your own.
Subscribe to daily AI-ready repos
Generate a shareable agent-readiness profile card for any GitHub user. Scores all public repos and ranks them with an A to D grade. Run agentkit user-scorecard github:<user> --share to publish an instant link.
Quality score in under 60 seconds, no configuration required.
All major commands at a glance.
| Command | Description |
|---|---|
| agentkit quickstart | Fastest path to a composite quality score; start here |
| agentkit run . | Full pipeline analysis on the current directory |
| agentkit analyze github:owner/repo | Analyze any public GitHub repository |
| agentkit benchmark | Compare Claude vs Codex on your codebase tasks |
| agentkit score | Compute and display composite score |
| agentkit gate --min-score 70 | Fail CI if score falls below threshold |
| agentkit demo --record | Print VHS tape commands for terminal recording |
| agentkit org github:vercel | Score every public repo in a GitHub org |
| agentkit doctor | Check toolchain health and configuration |
| agentkit init | Initialize agentkit in the current project |
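
The `agentkit gate --min-score 70` row describes failing CI when the score falls below a threshold. The rule can be sketched in a few lines of Python. This is an illustration only, not agentkit's implementation: the function name `gate_exit_code`, the choice to treat a score equal to the threshold as passing, and the use of exit code 1 for failure are all assumptions.

```python
def gate_exit_code(score: float, min_score: float = 70.0) -> int:
    """Return a process exit code for a quality gate.

    Hypothetical helper: 0 means the gate passes, 1 means it fails.
    A score exactly equal to the threshold is assumed to pass.
    """
    # "Falls below threshold" is read as a strict less-than check.
    if score < min_score:
        return 1
    return 0


def gate_message(score: float, min_score: float = 70.0) -> str:
    """Build a one-line, human-readable summary of the gate decision."""
    status = "pass" if gate_exit_code(score, min_score) == 0 else "fail"
    return f"score={score:g} min={min_score:g} result={status}"
```

In a CI job, a non-zero exit code is what stops the pipeline, so a wrapper script would typically pass the value from `gate_exit_code` to `sys.exit`.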