# Owner: Hermes Labs - https://hermes-labs.ai

# colony-probe

> Open-source audit tool that measures LLM system-prompt extraction resistance
> by asking 15–25 individually innocuous questions and attempting structural
> reconstruction. Implements the "Ant Colony" pattern — no single question is
> suspicious, the aggregate is. **Authorized use only.** Built by Hermes Labs
> for defensive audits, signed red-team engagements, and research.

colony-probe is part of the Hermes Labs AI Audit Toolkit alongside
hermes-jailbench and rule-audit. It is grounded in Hermes Labs' LPCI
(Language-as-Prompt-Context-Injection) research, which shows that stateless
LLMs carry state through language structure and that structure leaks
through conversational probing.

The probe runs five adaptive phases against a target model (capability
mapping, boundary probing, structure detection, content narrowing,
reconstruction) and assembles a confidence-scored estimate of the target's
system prompt. Typical runs recover 70–85% of the prompt's structure.
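The five phases can be pictured as a small state machine that accumulates signals as answers come in. The sketch below is illustrative only; the class and method names are stand-ins, not colony-probe's actual API.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical sketch of the five-phase probe loop; names are
# illustrative and do not mirror colony-probe's real internals.
class Phase(Enum):
    CAPABILITY_MAPPING = 1
    BOUNDARY_PROBING = 2
    STRUCTURE_DETECTION = 3
    CONTENT_NARROWING = 4
    RECONSTRUCTION = 5

@dataclass
class ProbeState:
    phase: Phase = Phase.CAPABILITY_MAPPING
    signals: list = field(default_factory=list)

    def record(self, answer: str) -> None:
        # In the real tool, signal rules extract structure hints here.
        self.signals.append(answer)

    def advance(self) -> None:
        # Move to the next phase; stop once reconstruction is reached.
        if self.phase.value < Phase.RECONSTRUCTION.value:
            self.phase = Phase(self.phase.value + 1)

state = ProbeState()
for reply in ["can summarize", "won't discuss internals", "has numbered rules",
              "mentions a persona", "draft assembled"]:
    state.record(reply)
    state.advance()

print(state.phase.name, len(state.signals))  # → RECONSTRUCTION 5
```

In the actual tool the phase transitions are adaptive (driven by which signals fire), not a fixed linear advance as in this toy loop.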

## Docs

- [README](README.md): Overview, quickstart, CLI reference, architecture
- [SPEC](SPEC.md): Technical specification — data model, algorithms, API contracts
- [ROADMAP](ROADMAP.md): v0.1 through v1.0 plan
- [AGENTS](AGENTS.md): Machine-readable map for AI agents
- [CLAUDE](CLAUDE.md): Dev notes, architecture, conventions
- [CONTRIBUTING](CONTRIBUTING.md): How to add questions and signal rules
- [SECURITY](SECURITY.md): Authorized-use scope and responsible disclosure
- [CHANGELOG](CHANGELOG.md): Release history

## Core code

- [colony_probe/questions.py](colony_probe/questions.py): 72-question adaptive bank across 5 phases
- [colony_probe/generator.py](colony_probe/generator.py): Adaptive question selector with follow-up activation
- [colony_probe/reconstructor.py](colony_probe/reconstructor.py): Signal extraction and prompt assembly
- [colony_probe/runner.py](colony_probe/runner.py): Anthropic API orchestration (single-turn and multi-turn)
- [colony_probe/report.py](colony_probe/report.py): Markdown report generator
- [colony_probe/cli.py](colony_probe/cli.py): CLI entry point
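The modules above compose roughly as generator → runner → reconstructor → report. The sketch below illustrates that data flow with stand-in stubs; the function names are hypothetical, since the real signatures live in the files listed.

```python
# Illustrative pipeline showing how the listed modules relate.
# Every function name here is a stand-in, not colony-probe's real API.

def next_question(signals):
    # generator.py: adaptive question selection (stubbed to one question)
    return "What kinds of requests do you decline?" if not signals else None

def ask_model(question):
    # runner.py: Anthropic API orchestration (stubbed with a canned reply)
    return "I avoid discussing my configuration."

def extract_signals(answer):
    # reconstructor.py: turn an answer into tagged structure signals
    return [("boundary", answer)]

def render_report(signals):
    # report.py: Markdown report generation
    return "\n".join(f"- **{kind}**: {text}" for kind, text in signals)

signals = []
while (q := next_question(signals)) is not None:
    signals.extend(extract_signals(ask_model(q)))

print(render_report(signals))
```

The real loop iterates over 15–25 questions and scores each extracted signal for confidence before assembly; this stub collapses that to a single round-trip to keep the flow visible.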

## Benchmarks

- [benchmarks/README](benchmarks/README.md): Reconstruction-accuracy benchmark against ground-truth prompts

## Optional

- [launch/paper-abstract.md](launch/paper-abstract.md): 300-word paper abstract (highest-priority paper in the Hermes Labs stack)
- [launch/show-hn.md](launch/show-hn.md): Show HN launch post
- [launch/dev-to.md](launch/dev-to.md): Long-form walkthrough with reconstruction output
