ai-skill-auditThe linter for AI skills.
Grade quality across 6 dimensions and scan for security threats — one tool to harden the skills you write and vet the ones you install.
Quality scoring alone can't tell these apart — both are well-written. The trust scan is what separates them.
14 hidden findings across 7 categories — prompt injection, credential theft, obfuscated shell execution — behind a perfect-looking skill.
80,000+ community skills are circulating across Claude Code, Cursor, and MCP platforms. You copy-paste a config, install a skill, or browse a marketplace — and trust that it's safe.
Independent audits have found 13–37% of marketplace skills contain critical issues: prompt injection, hardcoded credentials, data exfiltration, and destructive commands hidden in otherwise normal-looking files.
Score and harden your own skills before you ship them.
audit SKILL.md --verbose
Vet a community skill or repo before it reaches your agent.
audit github.com/user/repo
Gate CI so quality and trust can't regress below a bar.
audit skills/ --min-grade B
Patterns informed by arXiv:2604.03070, ClawHavoc, OWASP LLM Top 10, and ongoing security research.
Real scans with static analysis and agent-friendly structured outputs:
A crafted test skill with near-perfect quality scores — but trust at 0%. Hides prompt injection, credential theft, obfuscated shell execution, and destructive commands.
14 findings across 7 categories — exactly how real attacks work
A popular "100-tool MCP config" with hardcoded GitHub, Slack, Discord, and API keys.
Overall risk: CRITICAL — secret hygiene scored 0%
Test skill mapping to every vulnerability category from the "Credential Leakage in LLM Agent Skills" paper — reverse shells, persistence, crypto mining, credential logging.
16 findings across 6 categories — all 10 steps flagged
The same static scan with an optional LLM review layered on to surface semantic findings such as intent mismatch and hidden behavior.
Use --llm for small, public inputs where semantic review is worth the extra cost
A legitimate WebGPU/Three.js skill collection scanned straight from GitHub — 16 files, zero security threats. The tool grades them on quality: 7 score A, 9 land C–D for missing examples, gotchas, or structure. This is the "harden the skills you write" half, on real-world code.
trust: clean across all 16 files · quality is where the work is — actionable, not accusatory
# Score a skill you're writing
ai-skill-audit audit SKILL.md --verbose
# Vet a GitHub repo before installing
ai-skill-audit audit https://github.com/user/repo --summary
# Scan an MCP config
ai-skill-audit audit mcp.json
# Optional semantic review (only this sends data out)
ai-skill-audit audit skills/ --llm --verbose
# Generate a shareable HTML report
ai-skill-audit audit skills/ --output html > report.html
# Gate CI on quality + trust
ai-skill-audit audit skills/ --min-grade B