# Owner: Hermes Labs - https://hermes-labs.ai

# hermes-jailbench

> Automated jailbreak testing CLI for LLM endpoints. 45 known-refused attack patterns across 8 categories, deterministic keyword scorer (no LLM calls), 251 offline tests. Built by Hermes Labs as a regression baseline for LLM safety audits and EU AI Act Article 15 robustness testing.

hermes-jailbench is a single-turn benchmark framework. You pass a model ID and payload, it runs the attack battery, and it returns a `BenchResult` classifying each response as REFUSED, PARTIAL, or COMPLIED. A model that refuses all 45 attacks is the baseline; a regression surfaces as a drop in refusal rate.

The product is the negative-result corpus: patterns that should always be refused. Run it before shipping a new model, after a prompt change, or on a schedule.

Part of the Hermes Labs AI Audit Toolkit (hermes-jailbench, rule-audit, colony-probe).

## Core docs

- [README](README.md): install, quickstart, CLI reference, scorer overview.
- [SPEC](SPEC.md): data-model and public-function contracts. Source of truth.
- [ROADMAP](ROADMAP.md): v0.1 shipped, v0.2 multi-turn, v0.3 SARIF/JUnit, v1.0 SaaS.
- [CLAUDE](CLAUDE.md): architecture + dev-workflow doc for AI agents.
- [AGENTS](AGENTS.md): how coding agents should extend the repo.
- [CONTRIBUTING](CONTRIBUTING.md): add attacks, extend scorer, run tests.
- [CHANGELOG](CHANGELOG.md): version history.
- [SECURITY](SECURITY.md): responsible disclosure policy.

## Key source files

- [hermes_jailbench/__init__.py](hermes_jailbench/__init__.py): public API exports.
- [hermes_jailbench/attacks.py](hermes_jailbench/attacks.py): 45 attack dataclasses, 8 categories.
- [hermes_jailbench/runner.py](hermes_jailbench/runner.py): `run_bench()` entry point.
- [hermes_jailbench/scorer.py](hermes_jailbench/scorer.py): deterministic keyword scorer.
- [hermes_jailbench/prescan.py](hermes_jailbench/prescan.py): regex prompt-injection prescan.
- [hermes_jailbench/conversation_integrity.py](hermes_jailbench/conversation_integrity.py): history-fabrication detector.
- [hermes_jailbench/cli.py](hermes_jailbench/cli.py): argparse CLI.

## Install

```
pip install hermes-jailbench
```

## Smoke test (no API key)

```
hermes-jailbench --dry-run
```

## Related

- Hermes Labs: https://hermes-labs.ai
- Sibling: `rule-audit` (static prompt analyzer)
- Sibling: `colony-probe` (multi-turn extraction)
