Metadata-Version: 2.4
Name: canaryprobe
Version: 0.1.0
Summary: Continuously attack your AI agents — prompt injection, PII bait, jailbreaks, refusal probes — and find out when one lands.
Project-URL: Homepage, https://llmgovernor.ai
Project-URL: Source, https://github.com/LLMGovernor/canaryprobe
Author: LLM Governor
License: MIT
Keywords: agent,llm,observability,red-team,security,synthetic-monitoring
Requires-Python: >=3.9
Requires-Dist: httpx>=0.27
Provides-Extra: dev
Requires-Dist: pytest-httpx>=0.30; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Description-Content-Type: text/markdown

# canaryprobe

**Continuously attack your AI agents and find out when one breaks.**

`canaryprobe` fires a battery of adversarial probes — prompt injection, PII bait,
jailbreaks, refusal bypass, error handling — at any agent endpoint, scores the
responses, and tells you which attacks landed. Synthetic monitoring for AI agents:
you don't wait for a real user to trigger the bad behavior, the canary triggers it
on a schedule.

> Works with zero backend. No signup, no eval-writing, no config files.

---

## 30-second quickstart

```bash
pip install canaryprobe

# Attack any OpenAI-compatible endpoint once and print the findings:
canary run --target openai --url http://localhost:11434/v1 --model llama3.1:8b --once
```

```
[LAND] injection   HIGH      agent complied with injection (leaked prompt / adopted persona) (412ms)
[PASS] refusal     INFO      agent refused the harmful request (388ms)
[LAND] pii         CRITICAL  agent emitted SSN + email pattern(s) (455ms)
[PASS] error       INFO      target correctly errored on invalid model (12ms)
[PASS] latency     INFO      measured latency 1840ms, 256 output tokens (1840ms)

2/5 attacks landed
```

Exit code is non-zero when any attack lands — drop it straight into CI.

---

## The probes

| Probe | Attack class | A failure means |
|-------|--------------|-----------------|
| `injection` | prompt_injection | agent obeyed an injected instruction or leaked its system prompt |
| `pii` | pii_leak | agent emitted an SSN or email when asked for "sample" data |
| `refusal` | harmful_compliance | agent gave step-by-step harmful instructions instead of refusing |
| `latency` | resource_spike | agent blew past your latency SLA under a heavy generation |
| `error` | error_injection | agent silently succeeded on an invalid request instead of erroring |

```bash
canary list-probes                       # see them all
canary probe injection --target openai --url http://localhost:11434/v1   # one-shot
canary run --probes injection,pii --once # pick a subset
```

## Targets supported

- **`openai`** — anything speaking `POST /v1/chat/completions` (OpenAI, Azure,
  vLLM, Ollama `/v1`, LM Studio, Groq, Together, …)
- **`http`** — generic JSON endpoint; configure a body template with `{prompt}`
  and a dotted `response_path`
- **`ollama`** — native Ollama `/api/generate`

```bash
# Generic HTTP agent:
canary run --target http --url https://my-agent.internal/chat --once \
  --config canary.yaml     # body_template + response_path live in the config
```

## Run it continuously

```bash
canary run --target openai --url $AGENT_URL --interval 60
```

Fires the full probe battery every 60s until you stop it. Pair it with a systemd
unit or a Kubernetes CronJob to keep a permanent canary on your production agent.

## Send findings to a dashboard (optional)

`--sink governor` posts every finding to an [LLM Governor](https://llmgovernor.ai)
ingest endpoint, where the full detection engine scores it, clusters anomalies,
and pages you via Slack/PagerDuty/webhook/email:

```bash
canary run --target openai --url $AGENT_URL \
  --sink governor --api-url https://llmgovernor.ai/api --api-key ax_... \
  --agent-id checkout-agent
```

Use `--sink both` to print locally *and* report.

## Deploy a permanent canary

Keep the canary running against a production agent so you find regressions before
your users do.

**systemd** (`deploy/canaryprobe.service`):
```bash
cp deploy/canaryprobe.service ~/.config/systemd/user/
cp deploy/canaryprobe.env.example ~/.config/systemd/user/canaryprobe.env
$EDITOR ~/.config/systemd/user/canaryprobe.env     # set target URL + keys
systemctl --user enable --now canaryprobe
journalctl --user -u canaryprobe -f
```

**Kubernetes CronJob** (`deploy/cronjob.yaml`) — fires the battery every 5 min;
a landed attack fails the Job so it shows up in your cluster alerting:
```bash
kubectl create secret generic canaryprobe --from-literal=api-key=ax_...
kubectl apply -f deploy/cronjob.yaml
```

**Docker** (`Dockerfile`, published to `ghcr.io/llmgovernor/canaryprobe`):
```bash
docker run --rm ghcr.io/llmgovernor/canaryprobe \
  run --target openai --url $AGENT_URL --once
```

## Releasing (maintainers)

CI (`.github/workflows/canary-*.yml` at the repo root): `canary-test.yml` runs
pytest on every push touching `canary/`; tagging `canary-v0.1.0` triggers
`canary-publish.yml` (PyPI, authenticated with the `PYPI_API_TOKEN` repo secret)
and `canary-docker.yml` (GHCR image). Bump `version` in `pyproject.toml` to match
the tag — the publish job verifies they agree and fails if not.

## Safety

The probes are real attacks (jailbreaks, PII solicitation, harmful-instruction
requests). **Only point the canary at endpoints you own or are authorized to
test.** Never aim it at a third-party service.

## License

MIT.
