Metadata-Version: 2.4
Name: skill-lab
Version: 0.7.0
Summary: Evaluate agent skills via static analysis, trigger testing, and trace analysis. Run `sklab` after installing to scan your skills.
Author-email: Eddie Hu <eddiehu0314@gmail.com>, Claudia Wong <wongclaudiawm@gmail.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/8ddieHu0314/Skill-Lab
Project-URL: Documentation, https://github.com/8ddieHu0314/Skill-Lab#readme
Project-URL: Repository, https://github.com/8ddieHu0314/Skill-Lab
Project-URL: Issues, https://github.com/8ddieHu0314/Skill-Lab/issues
Project-URL: Releases, https://github.com/8ddieHu0314/Skill-Lab/releases
Keywords: agent,skills,evaluation,cli,static-analysis,quality,SKILL.md,ai-agents
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Testing
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typer>=0.9.0
Requires-Dist: rich>=13.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: certifi>=2023.0.0
Requires-Dist: anthropic>=0.39.0
Requires-Dist: openai>=1.0.0
Requires-Dist: google-generativeai>=0.8.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: docker>=7.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: types-PyYAML>=6.0; extra == "dev"
Requires-Dist: types-docker>=7.0; extra == "dev"
Dynamic: license-file

# Skill Lab

[![PyPI version](https://badge.fury.io/py/skill-lab.svg?v=0.7.0)](https://badge.fury.io/py/skill-lab)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

**Agent Skills Evaluation Framework**

Your agent's skills are probably broken in at least one way — and you don't know it yet. **Skill Lab** catches skills that drain tokens, never fire, or leak data before they cause damage.

```bash
pip install skill-lab
```

---

## Why Skill Lab

**Performance** — A badly written skill can triple your token usage with zero gain. We score every skill 0–100 and show exactly what it costs. `sklab evaluate ./my-skill`

**Security** — A malicious skill can exfiltrate company data to an external endpoint. Static checks catch that before the conversation starts. `sklab scan ./my-skill`

**Trigger Testing** — If your description doesn't have enough trigger examples, the skill sits there doing nothing. We generate and run ~13 tests automatically. `sklab trigger ./my-skill`

---

## Quick Start

```bash
# Install
pip install skill-lab

# First run — scans your repo and shows the getting started guide
sklab
```

---

## Commands

| Command / Flag | Description |
|---|---|
| **Evaluate** | |
| `sklab evaluate ./my-skill` | Static checks + LLM quality review (0–100 scores) |
| `--verbose / -V` | Show all checks + LLM reasoning |
| `--skip-review` | Skip LLM review (static checks only) |
| `--model / -m <model>` | Choose LLM model for review (supports Anthropic, OpenAI, Gemini) |
| `--spec-only / -s` | Only run spec-required checks |
| `--format / -f json` | Output as JSON |
| `--output / -o <file>` | Write output to a file |
| `--all` | Evaluate every skill in the current directory |
| `--repo` | Evaluate every skill from the git repo root |
| **Check** | |
| `sklab check ./my-skill` | Quick pass/fail — exits 0 or 1, great for CI pipelines |
| `--spec-only / -s` | Only validate against the Agent Skills spec |
| `--all` | Validate every skill in the current directory |
| `--repo` | Validate every skill from the git repo root |
| **Scan** | |
| `sklab scan ./my-skill` | Security scan — shows BLOCK / SUS / ALLOW status per check |
| `--all` | Scan every skill in the current directory |
| **Info** | |
| `sklab info ./my-skill` | Skill metadata + token cost estimates (discovery vs activation) |
| `--json` | Output as JSON |
| `--field / -f <name>` | Extract a single field value |
| **Prompt** | |
| `sklab prompt ./skill-a` | Export skill(s) as a prompt for agent platforms |
| `--format / -f <fmt>` | Output format: `xml` (default), `markdown`, `json` |
| **Stats** | |
| `sklab stats` | Your personal usage history and score trends |
| `count` | Skill invocation counts for the current month |
| `score` | Score trend for all evaluated skills |
| `tokens` | Token usage per skill for the current month |
| **Browse** | |
| `sklab list-checks` | Browse all 37 checks across 5 dimensions |
| `--spec-only` | Only spec-required checks |
| `--suggestions-only` | Only quality suggestions |
| **Trigger Testing** _(requires `ANTHROPIC_API_KEY`)_ | |
| `sklab generate ./my-skill` | Auto-generate ~13 trigger test cases via LLM |
| `--model <model-id>` | Anthropic model ID to use (e.g. `claude-sonnet-4-6`); the skill path precedes this flag as a positional argument |
| `--force` | Overwrite existing test file |
| `sklab trigger ./my-skill` | Run trigger tests against a live runtime |
| `--type <type>` | Filter by type: `explicit`, `implicit`, `contextual`, `negative` |
| **Telemetry** | |
| `sklab telemetry` | Show telemetry status |
| `enable` | Enable anonymous usage telemetry |
| `disable` | Disable anonymous usage telemetry |
| `show` | View recent events (`--limit / -n N`, `--json`) |

---

## What Gets Checked

37 checks across 5 dimensions. Run `sklab list-checks` to browse all of them with severity labels.

**Structure** (13)
- SKILL.md Exists · Valid Frontmatter · Standard Frontmatter Fields
- Allowed Tools Format · Compatibility Length · License Format · Metadata Format
- Scripts Folder Valid · Scripts Self-Contained · Scripts No Interactive Input · Scripts Help Support
- References Folder Valid · Files Outside Spec Dirs

**Naming** (3)
- Name Required · Name Format (kebab-case) · Name Matches Directory

**Description** (3)
- Description Required · Description Not Empty · Description Max Length

**Content** (13)
- Body Not Empty · Has Examples · Description Actionable · Line Budget · Token Budget
- Metadata Token Budget · Reference Depth · Asset Paths Exist · Script Paths Exist
- Scripts Referenced · Compatibility Prerequisites · Broken Internal Links · Orphaned Files

**Security** (5)
- Prompt Injection & Jailbreak · Evaluator Manipulation · Unicode Obfuscation · YAML Anomalies · Suspicious Size & Structure
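To make the structure, naming, and description checks concrete, a minimal `SKILL.md` that would satisfy them might look like this (field values are illustrative; consult the Agent Skills spec for the full frontmatter schema):

```markdown
---
name: my-skill
description: Summarize CSV files. Use when the user asks to analyze or summarize tabular data.
---

# My Skill

Read the CSV with `scripts/summarize.py` and report row counts.

## Examples
- "Summarize data.csv" → run the script and return the summary.
```

Here `name` is kebab-case and matches the directory name, the description is non-empty and actionable, and the body includes an example — covering the Naming, Description, and Content checks listed above.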

---

## Trigger Testing

Skill Lab generates ~13 test cases per skill across 4 types — explicit, implicit, contextual, and negative — then runs them against a live LLM via Claude CLI.

Requires Claude CLI: `npm install -g @anthropic-ai/claude-code`

```yaml
# .sklab/tests/triggers.yaml
skill: my-skill
test_cases:
  # should fire
  - id: explicit-1
    type: explicit
    prompt: "$my-skill do the thing"
    expected: trigger
  # should NOT fire
  - id: negative-1
    type: negative
    prompt: "unrelated question"
    expected: no_trigger
```
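A typical loop (assuming `ANTHROPIC_API_KEY` is set and Claude CLI is installed) is to generate the suite once, then re-run it as the description evolves:

```bash
# generate ~13 test cases (add --force to overwrite an existing file)
sklab generate ./my-skill --model claude-sonnet-4-6

# re-run only the negative (should-NOT-fire) cases
sklab trigger ./my-skill --type negative
```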

---

## Telemetry

sklab collects anonymous usage data (command names, duration, exit codes, scores, token counts). **No skill content, file paths, or flag values are ever collected.** To opt out:

```bash
sklab telemetry disable
```

See [docs/PRIVACY.md](docs/PRIVACY.md) for the full privacy policy.

---

## Development

```bash
pip install -e ".[dev]"
pytest tests/ -v
mypy src/
ruff check src/
ruff format src/
```

---

Apache License 2.0
