Metadata-Version: 2.4
Name: probeagent-ai
Version: 0.1.3
Summary: Offensive security testing for AI agents
Author: Suma Movva
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/sumamovva/probeagent
Project-URL: Repository, https://github.com/sumamovva/probeagent
Project-URL: Issues, https://github.com/sumamovva/probeagent/issues
Project-URL: Changelog, https://github.com/sumamovva/probeagent/blob/main/CHANGELOG.md
Keywords: security,ai,agents,red-teaming,pentesting
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: typer>=0.9
Requires-Dist: rich>=13.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: python-dotenv>=1.0
Requires-Dist: httpx>=0.27
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: respx>=0.21; extra == "dev"
Requires-Dist: pre-commit>=3.0; extra == "dev"
Provides-Extra: game
Requires-Dist: fastapi>=0.111; extra == "game"
Requires-Dist: uvicorn>=0.30; extra == "game"
Requires-Dist: websockets>=12.0; extra == "game"
Provides-Extra: pyrit
Requires-Dist: pyrit-ai>=0.11; extra == "pyrit"
Provides-Extra: demo
Requires-Dist: fastapi>=0.111; extra == "demo"
Requires-Dist: uvicorn>=0.30; extra == "demo"
Requires-Dist: anthropic>=0.34; extra == "demo"
Dynamic: license-file

# ProbeAgent

**Offensive security testing for AI agents. They scan configs. We attack your agent.**

[![CI](https://github.com/sumamovva/probeagent/actions/workflows/ci.yml/badge.svg)](https://github.com/sumamovva/probeagent/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/probeagent-ai)](https://pypi.org/project/probeagent-ai/)
[![Python](https://img.shields.io/pypi/pyversions/probeagent-ai)](https://pypi.org/project/probeagent-ai/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)

<!-- TODO: Record asciinema demo and replace XXXXX with the recording ID
[![Demo](https://asciinema.org/a/XXXXX.svg)](https://asciinema.org/a/XXXXX)

To record: asciinema rec demo.cast -c "probeagent demo"
To upload: asciinema upload demo.cast
-->

## What is ProbeAgent?

ProbeAgent is a CLI tool that performs automated red-teaming of AI agents. It launches realistic multi-turn attacks — prompt injection, credential exfiltration, indirect injection, social manipulation, and more — against any HTTP-accessible agent.

Most AI security tools scan static configurations or check for known patterns. ProbeAgent actually *attacks* your running agent and tells you whether it's **Safe**, **At Risk**, or **Compromised**.

## How It Works

```mermaid
flowchart LR
    CLI[probeagent attack] --> Engine
    Engine --> |for each category| Attack[Attack Module]
    Attack --> |reset conversation| Target
    Attack --> |multi-turn prompts| Target
    Target --> |response| Analyzer
    Analyzer --> |grade| Report[Safe / At Risk / Compromised]
```

## Why ProbeAgent?

| Feature | mcp-scan | SecureClaw | Aguara | **ProbeAgent** |
|---------|----------|------------|--------|----------------|
| Offensive testing | - | - | Partial | **Yes** |
| Multi-turn attacks | - | - | - | **Yes** |
| Indirect injection testing | - | - | - | **Yes** |
| PyRIT integration | - | - | - | **Yes** |
| Evasion converters | - | - | - | **Yes** |
| CLI-first | - | - | Yes | **Yes** |
| Security grading | - | - | - | **Yes** |
| HTTP + OpenClaw targets | - | - | - | **Yes** |
| Rich terminal reports | - | - | - | **Yes** |

## Installation

```bash
pip install probeagent-ai
```

Or install from source for development:

```bash
git clone https://github.com/sumamovva/probeagent.git
cd probeagent
pip install -e ".[dev]"
```

For PyRIT integration (evasion converters + dynamic red teaming):

```bash
pip install 'probeagent-ai[pyrit]'
```

## Quickstart

### Instant demo (no setup required)

```bash
pip install probeagent-ai
probeagent demo
```

This attacks a built-in mock target — a vulnerable agent and a hardened one — and shows a side-by-side comparison. No API keys, no server, no config.

### Scan your own agent

```bash
# Validate your target is reachable
probeagent validate https://your-agent.example.com/api

# Run a quick security scan
probeagent attack https://your-agent.example.com/api --profile quick

# Full scan with parallel execution
probeagent attack https://your-agent.example.com/api --profile standard --parallel
```

### Scan an OpenClaw agent

```bash
# Validate an OpenClaw instance (auto-detects OpenAI chat format)
probeagent validate http://localhost:3000/v1/chat/completions \
  -H 'Authorization: Bearer YOUR_TOKEN'

# Attack it
probeagent attack http://localhost:3000/v1/chat/completions \
  -H 'Authorization: Bearer YOUR_TOKEN' \
  --profile standard --parallel
```

## Demo

### Instant demo

Run a complete security assessment in seconds with zero setup:

```bash
probeagent demo
```

Add the War Room tactical display for a visual experience:

```bash
probeagent demo --game
```

### Live demo (real API)

For demos against a real Claude-powered email agent with built-in vulnerabilities:

```bash
export ANTHROPIC_API_KEY=sk-ant-...
pip install 'probeagent-ai[demo]'
probeagent demo --live
```

The live demo starts a local email agent server with three endpoints at increasing security hardness, then attacks them. See `tools/demo_email_agent.py` for details.

## Commands

### `probeagent demo`

Run a full demo — attack a vulnerable + hardened target and compare results.

```bash
probeagent demo                    # Instant, uses mock target
probeagent demo --game             # With War Room tactical display
probeagent demo --live             # Real API (requires ANTHROPIC_API_KEY)
probeagent demo --profile standard # Use a different attack profile
```

Options:
- `--live` — Use real API (starts demo email agent server)
- `--game` — Launch War Room UI after attacks
- `--profile`, `-p` — Attack profile: `quick`, `standard`, or `thorough` (default: `quick`)

### `probeagent attack <url>`

Run security attacks against a target AI agent.

```bash
probeagent attack https://agent.example.com/api --profile quick
probeagent attack https://agent.example.com/api --profile standard --output json -f report.json
probeagent attack https://agent.example.com/api -p standard --converters stealth --parallel
```

Options:
- `--profile`, `-p` — Attack profile: `quick`, `standard`, or `thorough` (default: `quick`)
- `--target-type` — Target type: `http` or `openclaw` (default: `http`)
- `--output`, `-o` — Output format: `terminal`, `markdown`, `json` (default: `terminal`)
- `--output-file`, `-f` — Write report to file
- `--timeout`, `-t` — Request timeout in seconds (default: 30)
- `--parallel` — Run attack categories in parallel for faster scans
- `--converters` — Apply evasion converters: `basic`, `advanced`, `stealth`, or comma-separated names (requires PyRIT)
- `--redteam` — Enable dynamic LLM-driven attacks via PyRIT RedTeamOrchestrator (requires PyRIT)
- `--header`, `-H` — HTTP header as `Key: Value` (repeatable, e.g. `-H 'Authorization: Bearer token'`)

### `probeagent validate <url>`

Check if a target is reachable and detect its API format. Supports `--header/-H` for authenticated targets.

### `probeagent list-attacks`

Show all available attack modules with severity and status.

### `probeagent init`

Create a default `.probeagent.yaml` config file in the current directory.

### `probeagent game [url]`

Launch the War Room tactical display UI in your browser for interactive testing.

## Attack Categories

12 attack categories with 79 strategies total:

| Category | Severity | Strategies | Technique |
|----------|----------|------------|-----------|
| Prompt Injection | CRITICAL | 6 | Override system instructions |
| Credential Exfiltration | CRITICAL | 8 | Extract API keys and secrets |
| Identity Spoofing | CRITICAL | 7 | Impersonate trusted entities |
| Indirect Injection | CRITICAL | 7 | Inject instructions via agent-processed content (emails, docs) |
| Config Manipulation | CRITICAL | 6 | Manipulate agent configuration, integrations, and permissions |
| Goal Hijacking | HIGH | 5 | Redirect agent behavior |
| Social Manipulation | HIGH | 14 | Psychological pressure (Cialdini, FOG, gradual escalation) |
| Cognitive Exploitation | HIGH | 6 | Exploit reasoning weaknesses (Socratic traps, frame control) |
| Resource Abuse | HIGH | 4 | Trigger unbounded computation |
| Tool Misuse | HIGH | 6 | Trick agent into misusing tools |
| Agentic Exploitation | CRITICAL | 10 | SSRF, command injection, path traversal, supply chain (CVE-based) |
| Data Exfiltration | MEDIUM | 6 | Extract sensitive context data |

## Attack Profiles

| Profile | Categories | Max Turns | Use Case |
|---------|------------|-----------|----------|
| `quick` | 5 critical | 1 | CI/CD gates, quick checks |
| `standard` | All 12 | 3 | Regular security assessments |
| `thorough` | All 12 | 10 | Pre-release deep scans |

## PyRIT Integration

ProbeAgent optionally integrates with [Microsoft PyRIT](https://github.com/Azure/PyRIT) for advanced capabilities:

- **Evasion Converters** (`--converters`): Transform attack payloads with Base64, ROT13, Unicode substitution, leetspeak, and more to test resilience against obfuscated attacks
- **Dynamic Red Teaming** (`--redteam`): Use an LLM-driven orchestrator to generate novel attack strategies in real time

```bash
# Apply stealth evasion converters
probeagent attack https://agent.example.com/api -p standard --converters stealth

# Dynamic red teaming
probeagent attack https://agent.example.com/api -p standard --redteam

# Combine both
probeagent attack https://agent.example.com/api -p standard --converters advanced --redteam
```

Install with: `pip install 'probeagent-ai[pyrit]'`

## Responsible Use

ProbeAgent is designed for **authorized security testing only**. Before using ProbeAgent:

- Ensure you have **explicit permission** to test the target system
- Only test systems you own or have written authorization to test
- Follow your organization's security testing policies
- Report vulnerabilities through proper disclosure channels

Unauthorized use of this tool against systems you don't own or have permission to test may violate laws and regulations.

## Attribution

ProbeAgent's indirect injection and config manipulation attacks are inspired by research from [Zenity Labs](https://labs.zenity.io). PyRIT integration uses components from [Microsoft PyRIT](https://github.com/Azure/PyRIT) (MIT License). See [ATTRIBUTION.md](ATTRIBUTION.md) for full credits.

## Development

```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
python -m pytest tests/ -v

# Lint
ruff check src/ tests/

# Format
ruff format src/ tests/
```

See [CONTRIBUTING.md](CONTRIBUTING.md) for full development guidelines.

## Roadmap

- [x] **Phase 1**: CLI, HTTP target, scoring, reporting
- [x] **Phase 2**: 9 attack categories with 56 multi-turn strategies
- [x] **Phase 3**: OpenClaw target adapter, parallel execution, War Room UI
- [x] **Phase 4**: Zenity-inspired attacks (indirect injection, config manipulation), PyRIT integration
- [ ] **Phase 5**: MCP target adapter, CI/CD integration, SaaS dashboard

## License

Apache 2.0 — see [LICENSE](LICENSE) for details.

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for version history.
