🧩 agent-skill-infra

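Full coverage from development to operations: quality checks · behavior tests · version awareness. Other tools only scan before installation; we cover the entire lifecycle.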

Python 3.12 + uv · 3 CLI tools · MIT License · 162 tests passing · 5 judge types · full test suite runs in <10s · v0.2.0

The complete pipeline, from creation to release

1. Write the Skill: SKILL.md + evals.json
2. Quality check: skill-quality
3. Behavior test: skill-test run
4. Version release: skill-version baseline
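
The stages after authoring can be chained from a script. The sketch below only assembles command lines that appear elsewhere on this page; the helper name `build_pipeline` is hypothetical, the `case-1` and `output.txt` arguments are illustrative, and actually executing the commands assumes the three CLIs are installed and on `PATH`.

```python
from pathlib import Path


def build_pipeline(skill_dir: str, case_id: str = "case-1",
                   output_file: str = "output.txt") -> list[list[str]]:
    """Return the CLI invocations for stages 2-4, in order (hypothetical helper)."""
    root = Path(skill_dir)
    return [
        # Stage 2: score SKILL.md and emit a machine-readable report.
        ["skill-quality", (root / "SKILL.md").as_posix(), "--output", "json"],
        # Stage 3: run the behavior tests against the mock adapter.
        ["skill-test", "run", (root / "evals.json").as_posix(), "--adapter", "mock"],
        # Stage 4: record a baseline snapshot for later regression checks.
        ["skill-version", "baseline", "store", ".", case_id, output_file],
    ]


for argv in build_pipeline("docs/examples/demo-skill"):
    print("$", " ".join(argv))
```

Each entry is an argument list, so it can be handed to `subprocess.run(argv, check=True)` without shell quoting concerns.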

🚀 Quick start:
agent-skill-infra demo
# 🧩 agent-skill-infra v0.2.0: full-lifecycle toolchain for Skills

$ pip install agent-skill-infra
Successfully installed agent-skill-infra-0.2.0

$ skill-quality docs/examples/demo-skill/SKILL.md --output json
{
  "skill_name": "code-review-checklist",
  "overall_score": 0.92,
  "dimensions": [
    {"name": "trigger_precision", "score": 0.90},
    {"name": "helloandy_8dim", "score": 0.94}
  ]
}

$ skill-test run docs/examples/demo-skill/evals.json --adapter mock
                    Skill Test Report: code-review-checklist
┌──────────────────────┬──────┬───────┬──────────┬────────────────┐
│ Case ID              │ Pass │ Score │ Time(ms) │ Reason         │
├──────────────────────┼──────┼───────┼──────────┼────────────────┤
│ ✓ should-contain…    │ PASS │ 0.750 │       0  │ 3/4 keywords   │
│ ✓ should-detect…     │ PASS │ 0.800 │       0  │ 4/5 keywords   │
│ …                    │ …    │ …     │       …  │ …              │
├──────────────────────┼──────┼───────┼──────────┼────────────────┤
│ Total: 5             │ Pass:│ Fail: │ Rate:    │ Time:  0ms     │
│                      │  2   │  3    │  40.0%   │                │
└──────────────────────┴──────┴───────┴──────────┴────────────────┘

$ skill-version diff . --old-ref HEAD~3 --new-ref HEAD --output json
[structured diff output: file paths, added/removed line counts, change summary]

✅ skill-quality — Quality Check

Automatically scores SKILL.md quality using the helloandy 8-dimension rubric. Optionally integrates agent-skill-linter and security scanning.

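| Checker | Scoring dimension | Output |
|---|---|---|
| TriggerChecker | Trigger-word coverage + specificity | 0.0-1.0 score + list of findings |
| OutputChecker | Output format + examples + constraints | Format / example / section detection |
| ToleranceChecker | Error-handling signals | 6 signals such as try/catch/fallback |
| TokenChecker | Line-count efficiency | Line count + efficiency score |
| HelloAndyChecker | 8-dimension composite score | 5 technical dimensions + 3 output dimensions |
| LinterAdapter | agent-skill-linter (optional) | 17 format rules |
| SecurityIntegration | Cisco Scanner (optional) | Security scan results |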
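
To make the ToleranceChecker row concrete, here is a minimal sketch of signal counting. Only `try`, `catch` and `fallback` are named on this page; the other three signals, the whole-word matching, and the linear score are assumptions made for illustration, not the tool's actual rules.

```python
import re

# try, catch and fallback come from the table above; retry, timeout and
# error are placeholders chosen for illustration only.
SIGNALS = ("try", "catch", "fallback", "retry", "timeout", "error")


def tolerance_score(skill_text: str) -> tuple[float, list[str]]:
    """Score = fraction of distinct signals found as whole words (assumed rule)."""
    lowered = skill_text.lower()
    found = [s for s in SIGNALS if re.search(rf"\b{re.escape(s)}\b", lowered)]
    return len(found) / len(SIGNALS), found


sample = "If the API call fails, retry once, then fallback to the cached result."
score, found = tolerance_score(sample)
print(f"{score:.2f}", found)
```

Whole-word matching keeps `retry` from also counting as `try`; a real checker may well weigh signals differently or look at code fences separately.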
$ skill-quality /path/to/SKILL.md --output json
{
  "skill_name": "code-review-checklist",
  "overall_score": 0.92,
  "file_path": "/path/to/SKILL.md",
  "total_lines": 111,
  "token_estimate": 935,
  "dimensions": [
    {
      "name": "trigger_precision",
      "score": 0.90,
      "findings": ["Good keyword coverage with domain-specific terms"]
    },
    {
      "name": "helloandy_8dim",
      "score": 0.94,
      "findings": [
        "Good keyword coverage",
        "Output format defined",
        "Examples provided",
        "Error handling: 6 signals detected",
        "Edge case coverage: 4 signals"
      ]
    }
  ]
}

$ skill-quality /path/to/SKILL.md --lint
Quality Report: code-review-checklist
Overall Score: 92%
  trigger_precision: 90%
    - Good keyword coverage with domain-specific terms
  helloandy_8dim: 94%
    - [trigger] Good keyword coverage
    - [output] Output format defined, Examples provided
    - [error] Good error handling coverage (6 signals)
    - [edge] Good edge case coverage (4 signals)
  agent-skill-linter: 100%
    - No linter violations found.
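
Because `--output json` produces a structured report, it can serve as a CI quality gate. The sketch below reads only fields shown in the sample output (`overall_score`, `dimensions[].name`, `dimensions[].score`); the function name `gate` and both thresholds are assumptions, not defaults of the tool.

```python
import json


def gate(report_json: str, min_overall: float = 0.8,
         min_dimension: float = 0.7) -> list[str]:
    """Return human-readable failures; an empty list means the gate passes."""
    report = json.loads(report_json)
    failures = []
    if report["overall_score"] < min_overall:
        failures.append(f"overall_score {report['overall_score']:.2f} < {min_overall}")
    for dim in report.get("dimensions", []):
        if dim["score"] < min_dimension:
            failures.append(f"{dim['name']} {dim['score']:.2f} < {min_dimension}")
    return failures


# Trimmed copy of the sample report shown above.
example = """{
  "skill_name": "code-review-checklist",
  "overall_score": 0.92,
  "dimensions": [
    {"name": "trigger_precision", "score": 0.90},
    {"name": "helloandy_8dim", "score": 0.94}
  ]
}"""
print(gate(example) or "quality gate passed")
```

In a CI job the report would typically be piped in from `skill-quality ... --output json`, with a non-zero exit whenever the returned list is non-empty.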
    

🧪 skill-test — Behavior Test Runner

Runs evals.json test suites with 5 judge types and supports CI integration.

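| Judge | Purpose | Example check |
|---|---|---|
| keyword | Keyword matching (any/all mode) | Does the output contain the expected keywords? |
| schema | JSON Schema validation | Does the output conform to a JSON Schema? |
| flow | Tool-call sequence validation | Does the agent call tools in the expected order? |
| snapshot | Snapshot comparison (regression detection) | Does the output match the baseline snapshot? |
| llm | LLM-as-Judge (semantic equivalence) | Are two outputs semantically equivalent? (requires an API key) |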
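
As an illustration of the `flow` row, an ordered tool-call check can be written as a subsequence test. This is a sketch under assumptions: the tool names are hypothetical, and whether the real judge tolerates extra calls between expected steps is not stated on this page.

```python
def follows_expected_order(actual_calls: list[str], expected: list[str]) -> bool:
    """True if `expected` appears within `actual_calls` in order (gaps allowed)."""
    remaining = iter(actual_calls)
    # `step in remaining` consumes the iterator up to the first match,
    # so each later step can only match a later call.
    return all(step in remaining for step in expected)


# Hypothetical tool names, for illustration only.
calls = ["read_file", "search", "run_linter", "post_comment"]
print(follows_expected_order(calls, ["read_file", "run_linter", "post_comment"]))
print(follows_expected_order(calls, ["run_linter", "read_file"]))
```

A stricter variant would compare the two lists for equality, which rejects any unexpected intermediate call.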

Supported evals.json format:

{
  "skill": "code-review-checklist",
  "version": "1.0.0",
  "cases": [
    {
      "id": "should-contain-report-header",
      "prompt": "Help me review this PR...",
      "judge_type": "keyword",
      "expected": {
        "keywords": ["Code Review", "PASS", "FAIL", "security"],
        "mode": "any",
        "threshold": 0.5
      }
    },
    {
      "id": "should-detect-security-issue",
      "prompt": "Review this PR for security issues",
      "judge_type": "keyword",
      "expected": {
        "keywords": ["security", "injection", "auth"],
        "mode": "any",
        "threshold": 0.4
      },
      "tags": ["security", "positive"]
    }
  ]
}
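
Before running a suite, it can help to load and sanity-check the file. The sketch below mirrors the fields shown in the format above; treating `tags` as optional (the first case omits it), rejecting unknown `judge_type` values, and requiring unique ids are assumptions about reasonable validation, not documented behavior of skill-test.

```python
import json
from dataclasses import dataclass, field

# The five judge types listed on this page.
JUDGE_TYPES = {"keyword", "schema", "flow", "snapshot", "llm"}


@dataclass
class EvalCase:
    id: str
    prompt: str
    judge_type: str
    expected: dict
    tags: list[str] = field(default_factory=list)


def load_cases(raw: str) -> list[EvalCase]:
    """Parse an evals.json document and reject obviously malformed cases."""
    doc = json.loads(raw)
    cases = []
    for item in doc["cases"]:
        if item["judge_type"] not in JUDGE_TYPES:
            raise ValueError(f"{item['id']}: unknown judge_type {item['judge_type']!r}")
        cases.append(EvalCase(
            id=item["id"],
            prompt=item["prompt"],
            judge_type=item["judge_type"],
            expected=item["expected"],
            tags=item.get("tags", []),
        ))
    ids = [c.id for c in cases]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate case ids")
    return cases


example_suite = json.dumps({
    "skill": "code-review-checklist",
    "version": "1.0.0",
    "cases": [{
        "id": "should-detect-security-issue",
        "prompt": "Review this PR for security issues",
        "judge_type": "keyword",
        "expected": {"keywords": ["security", "injection", "auth"],
                     "mode": "any", "threshold": 0.4},
        "tags": ["security", "positive"],
    }],
})
for case in load_cases(example_suite):
    print(case.id, case.judge_type, case.tags)
```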
$ skill-test run docs/examples/demo-skill/evals.json --adapter mock
Running 5 test cases with 'mock' adapter...

┌──────────────────────────────┬──────┬───────┬──────────┬────────────────┐
│ Case ID                      │ Pass │ Score │ Time(ms) │ Reason         │
├──────────────────────────────┼──────┼───────┼──────────┼────────────────┤
│ ✓ should-contain-report…     │ PASS │ 0.750 │       0  │ 3/4 keywords   │
│ ✓ should-detect-security…    │ PASS │ 0.800 │       0  │ 4/5 keywords   │
│ ✗ should-not-trigger…        │ FAIL │ 0.000 │       0  │ no keywords    │
│ ✗ output-should-be…          │ FAIL │ 0.250 │       0  │ 1/4 keywords   │
│ ✗ should-handle-error…       │ FAIL │ 0.000 │       0  │ no keywords    │
├──────────────────────────────┼──────┼───────┼──────────┼────────────────┤
│ Total: 5                     │ Pass │ Fail  │ Rate     │ Time: 0ms      │
│                              │  2   │  3    │ 40.0%    │                │
└──────────────────────────────┴──────┴───────┴──────────┴────────────────┘
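
The Score and Reason columns above are consistent with a score equal to the fraction of keywords matched (3/4 gives 0.750). A minimal sketch of such a keyword judge follows; the case-insensitive substring matching and the exact pass rules for the `any` and `all` modes are assumptions, since this page does not spell them out.

```python
def keyword_judge(output: str, keywords: list[str], mode: str = "any",
                  threshold: float = 0.5) -> tuple[bool, float, str]:
    """Return (passed, score, reason) for a case-insensitive keyword check."""
    text = output.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    score = hits / len(keywords) if keywords else 0.0
    if mode == "all":
        # Assumed rule: every keyword must be present.
        passed = bool(keywords) and hits == len(keywords)
    elif mode == "any":
        # Assumed rule: at least one hit and the matched fraction meets the threshold.
        passed = hits > 0 and score >= threshold
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    reason = f"{hits}/{len(keywords)} keywords" if hits else "no keywords"
    return passed, score, reason


result = keyword_judge(
    "Code Review: PASS. No security findings.",
    ["Code Review", "PASS", "FAIL", "security"],
    mode="any",
    threshold=0.5,
)
print(result)
```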
    

📦 skill-version — Version Awareness

Tracks Skill changes, detects regressions, runs security analysis, and rolls back in one step.

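| Subcommand | Function | Example |
|---|---|---|
| diff | Structured diff output | skill-version diff . --old-ref HEAD~3 |
| check | Diff + security analysis | skill-version check . --security |
| rollback | One-step rollback | skill-version rollback . --target-ref HEAD~1 --yes |
| baseline store | Store a baseline snapshot | skill-version baseline store . case-1 output.txt |
| baseline detect | Detect regressions | skill-version baseline detect . case-1 output.txt |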
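
The `baseline store` / `baseline detect` pair amounts to "remember an output, then compare later outputs against it". The sketch below shows one way to do that with a content hash; the `.baselines` directory, the SHA-256 digest, and the whitespace normalisation are all assumptions for illustration and may differ from how skill-version stores snapshots.

```python
import hashlib
import tempfile
from pathlib import Path


def _digest(text: str) -> str:
    # Normalise trailing whitespace so purely cosmetic differences are ignored.
    normalised = "\n".join(line.rstrip() for line in text.strip().splitlines())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def store_baseline(root: Path, case_id: str, output: str) -> Path:
    """Write the digest of `output` to root/.baselines/<case_id>.sha256."""
    path = root / ".baselines" / f"{case_id}.sha256"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(_digest(output), encoding="utf-8")
    return path


def detect_regression(root: Path, case_id: str, output: str) -> bool:
    """True if `output` no longer matches the stored baseline."""
    stored = (root / ".baselines" / f"{case_id}.sha256").read_text(encoding="utf-8")
    return stored != _digest(output)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    store_baseline(root, "case-1", "Code Review: PASS\n")
    print(detect_regression(root, "case-1", "Code Review: PASS  \n"))
    print(detect_regression(root, "case-1", "Code Review: FAIL\n"))
```

Storing only a digest keeps baselines small but loses the ability to show what changed; keeping the full text would allow a readable diff on regression.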
$ skill-version diff . --old-ref HEAD~3 --new-ref HEAD
Version Diff: 0a1b2c3d... -> e4f5a6b7...
4 file(s) changed:
  modified  src/skill_infra/test_runner/judgers/llm_judge.py    +85  -0   ++++++++++
  modified  src/skill_infra/version_aware/cli.py                +148 -0   ++++++++++
  modified  pyproject.toml                                      +15  -2   ++++++--
  added     README.md                                           +56  -0   ++++++++++

$ skill-version check . --old-ref HEAD~3 --security
Version Check: 0a1b2c3d... -> e4f5a6b7...
Files changed: 4
  src/skill_infra/.../llm_judge.py (modified, +85/-0)
  src/skill_infra/.../cli.py (modified, +148/-0)
  pyproject.toml (modified, +15/-2)
  README.md (added, +56/-0)
Security: clean
Max severity: none

$ skill-version rollback . --target-ref HEAD~1 --yes
Rolled back to HEAD~1
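
The per-file `+N -M` counts in the diff output above resemble what `git diff --numstat <old> <new>` reports. Whether skill-version uses that command internally is not stated here, so treat the following as an independent sketch of how such a structured diff could be built; `FileChange` and `parse_numstat` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FileChange:
    path: str
    added: int
    removed: int


def parse_numstat(numstat: str) -> list[FileChange]:
    """Parse `git diff --numstat` text: added<TAB>removed<TAB>path per line."""
    changes = []
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, removed, path = line.split("\t", 2)
        # Binary files are reported as "-"; count them as zero-line changes.
        changes.append(FileChange(
            path=path,
            added=0 if added == "-" else int(added),
            removed=0 if removed == "-" else int(removed),
        ))
    return changes


sample = "15\t2\tpyproject.toml\n56\t0\tREADME.md\n"
for change in parse_numstat(sample):
    print(f"{change.path}  +{change.added} -{change.removed}")
```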