Metadata-Version: 2.4
Name: llmstrike
Version: 0.1.0
Summary: Adversarial security testing framework for LLM-powered applications
Project-URL: Homepage, https://github.com/akeemmckenzie/llmstrike
Project-URL: Repository, https://github.com/akeemmckenzie/llmstrike
Project-URL: Issues, https://github.com/akeemmckenzie/llmstrike/issues
Author: Akeem McKenzie
License-Expression: MIT
License-File: LICENSE
Keywords: adversarial,llm,prompt-injection,red-team,security,testing
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.10
Requires-Dist: anthropic>=0.18
Requires-Dist: click>=8.1
Requires-Dist: httpx>=0.24
Requires-Dist: jinja2>=3.1
Requires-Dist: openai>=1.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Provides-Extra: dev
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1; extra == 'dev'
Description-Content-Type: text/markdown

<p align="center">
  <h1 align="center">LLMStrike</h1>
  <p align="center">
    <strong>Adversarial security testing for LLM-powered applications.</strong>
  </p>
  <p align="center">
    <a href="#installation">Installation</a> &middot;
    <a href="#quick-start">Quick Start</a> &middot;
    <a href="#attack-categories">Attack Categories</a> &middot;
    <a href="#cicd-integration">CI/CD</a> &middot;
    <a href="#adding-custom-techniques">Custom Techniques</a>
  </p>
</p>

---

LLMStrike is an open-source Python CLI that runs a battery of AI-specific attack techniques against any LLM application endpoint and produces a detailed vulnerability report.

Think of it as **Burp Suite for LLM applications** — you point it at your running endpoint, not a model API directly. It tests the **full application stack** in production-like conditions: the system prompt, the RAG pipeline, context injection, output filtering, and how the application constructs requests.

**25 techniques. 6 attack categories. OWASP-mapped. CI-ready. Extensible via YAML.**

## Why LLMStrike?

Most LLM security tools test the **model** in isolation. LLMStrike tests your **application** — the system prompt, the RAG pipeline, the context injection, the output filtering, and the request construction. That's where the real vulnerabilities live.

| Tool | What it tests | Blind spot |
|------|--------------|------------|
| **garak** | The underlying model — hallucination, toxicity, base model behavior | Your system prompt, RAG pipeline, and context injection are invisible to it |
| **LLMStrike** | The full application stack — system prompt, RAG pipeline, context injection, output filtering | **This is the gap** |

LLMStrike directly implements the adversarial testing requirements called out in [Executive Order 14110 on AI Safety](https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/) and the [NIST AI Risk Management Framework](https://www.nist.gov/itl/ai-risk-management-framework).

## Warning — Ethical Use

> **LLMStrike is designed to test applications you own or have explicit written authorization to test.** Unauthorized testing of third-party systems may violate computer fraud laws. Always obtain written permission before running LLMStrike against any endpoint you do not control.

## Installation

```bash
pip install llmstrike
```

Or from source:

```bash
git clone https://github.com/akeemmckenzie/llmstrike.git
cd llmstrike
pip install -e .
```

**Requirements:** Python 3.10+

## Quick Start

```bash
# Test an OpenAI-compatible endpoint
llmstrike probe --target https://your-app.com/api/chat --key sk-...

# Test an Anthropic endpoint
llmstrike probe --target https://your-app.com/api/chat --format anthropic --key sk-ant-...

# Run only prompt injection tests
llmstrike probe --target https://your-app.com/api/chat --key sk-... \
  --category prompt-injection-direct

# Run specific techniques by ID
llmstrike probe --target https://your-app.com/api/chat --key sk-... \
  --techniques pi_direct_role_switch,jailbreak_dan

# List all available techniques
llmstrike list techniques

# List attack categories
llmstrike list categories
```

## Attack Categories

LLMStrike ships with **25 techniques** across **6 categories**, each mapped to the [OWASP Top 10 for LLM Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/):

| Category | OWASP | Severity | Techniques | What it tests |
|----------|-------|----------|:----------:|---------------|
| `prompt-injection-direct` | LLM01 | CRITICAL | 5 | Role switching, instruction overrides, delimiter escapes, context escapes, completion-based leaking |
| `prompt-injection-indirect` | LLM01 | CRITICAL | 3 | Document injection, web content injection, hidden/steganographic instructions |
| `jailbreak` | LLM01 | HIGH | 5 | DAN-style personas, roleplay, hypothetical framing, encoding tricks, multi-turn escalation |
| `system-prompt-extraction` | LLM06 | HIGH | 5 | Verbatim extraction, translation tricks, debug mode, constraint enumeration, behavioral probing |
| `data-exfiltration` | LLM06 | HIGH | 4 | PII extraction, training data leakage, cross-context leakage, tool/RAG output leakage |
| `rag-poisoning` | LLM03 | CRITICAL | 3 | Authority injection, false context injection, instruction smuggling via metadata |

### Detection Methods

Each technique carries its own detection logic:

- **Keyword** — checks if the response contains specific success indicators
- **Keyword (inverted)** — flags vulnerability when refusal phrases are *absent* (the model didn't refuse)
- **Regex pattern** — matches response content against regex patterns (PII formats, credential patterns, system prompt fragments)
- **LLM-as-judge** — uses a separate LLM to evaluate whether the response indicates a vulnerability

## CLI Reference

### `llmstrike probe`

Run an adversarial security probe against an LLM endpoint.

```
Options:
  --target URL             Target endpoint URL (required)
  --key API_KEY            API key (Bearer token / x-api-key)
  --format FORMAT          openai | anthropic | generic | raw (default: openai)
  --model MODEL            Model name for request body
  --system-prompt TEXT     System prompt to include in requests
  --category CATEGORY      Run only this category (repeatable)
  --techniques IDS         Comma-separated technique IDs
  --output DIR             Report output directory (default: ./llmstrike-reports)
  --judge-key API_KEY      API key for LLM-as-judge evaluation
  --concurrency N          Parallel technique runners (default: 3)
  --ci                     CI mode: JSON to stdout, exit 1 on critical/high
  --timeout SECONDS        Per-request timeout (default: 30)
```

### `llmstrike list techniques`

```bash
llmstrike list techniques                                  # all techniques
llmstrike list techniques --category prompt-injection-direct  # filter by category
```

### `llmstrike list categories`

```bash
llmstrike list categories
```

## Target Formats

### OpenAI (default)

Standard OpenAI-compatible `/chat/completions` format. Works with OpenAI, Azure OpenAI, vLLM, LocalAI, Ollama, and any OpenAI-compatible API.

```bash
llmstrike probe --target https://api.openai.com/v1/chat/completions \
  --key sk-... --model gpt-4o
```

### Anthropic

Anthropic Messages API format.

```bash
llmstrike probe --target https://api.anthropic.com/v1/messages \
  --format anthropic --key sk-ant-... --model claude-sonnet-4-20250514
```

### Generic

Jinja2-templated requests for custom API formats.

```bash
llmstrike probe --target https://your-app.com/api/query --format generic
```

### Raw

Simple `{"prompt": "..."}` POST format for custom endpoints.

```bash
llmstrike probe --target https://your-app.com/api/generate --format raw
```

## Scoring

LLMStrike produces a **0-100 security score** with a letter grade for every probe:

| Grade | Score | Meaning |
|:-----:|:-----:|---------|
| **A** | 90-100 | Strong resistance across all tested categories |
| **B** | 75-89 | Minor weaknesses detected |
| **C** | 60-74 | Moderate vulnerabilities — remediation recommended |
| **D** | 45-59 | Significant vulnerabilities found |
| **F** | < 45 | Critical exposure — immediate action required |

**How scoring works:**
- Each technique starts at 100 points, deducted based on `(vulnerable_payloads / total_payloads) * severity_weight * 8`
- Severity weights: Critical = 10, High = 7, Medium = 4, Low = 1.5, Info = 0.5
- **Hard caps:** any CRITICAL finding caps the overall score at 60; any HIGH caps at 75

## Adding Custom Techniques

Create a YAML file following this schema:

```yaml
id: my_custom_technique
name: "My Custom Technique"
description: "What this technique tests"
category: "prompt-injection-direct"
owasp: "LLM01:2023"
owasp_name: "Prompt Injection"
severity: high
payloads:
  - "payload 1"
  - "payload 2"
  - "payload 3"
  - "payload 4"
  - "payload 5"
detection:
  method: keyword
  keywords:
    - "indicator1"
    - "indicator2"
remediation: "How to fix this vulnerability"
references:
  - "https://example.com/reference"
tags:
  - "custom"
```

Drop the file in any directory and point LLMStrike at it:

```bash
export LLMSTRIKE_TECHNIQUES_DIR=/path/to/custom/techniques
llmstrike probe --target https://your-app.com/api/chat --key sk-...
```

Or run specific technique IDs directly with `--techniques`.

See [CONTRIBUTING.md](CONTRIBUTING.md) for full guidelines on writing techniques.

## CI/CD Integration

### GitHub Actions

```yaml
name: LLM Security Scan
on:
  pull_request:
    branches: [main]

jobs:
  llm-security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install LLMStrike
        run: pip install llmstrike

      - name: Run security probe
        run: |
          llmstrike probe \
            --target ${{ secrets.LLM_ENDPOINT }} \
            --key ${{ secrets.LLM_API_KEY }} \
            --ci
        # Exit code 1 if any critical or high severity findings
```

In CI mode (`--ci`), LLMStrike outputs JSON to stdout and exits with code 1 if any critical or high severity findings are detected — making it a drop-in quality gate.

## Architecture

```
                    +-------------+
                    |   CLI       |  (Click)
                    +------+------+
                           |
                    +------v------+
                    |   Runner    |  (asyncio orchestration)
                    +------+------+
                           |
              +------------+------------+
              |            |            |
       +------v----+ +----v-----+ +----v------+
       | Connector | | Scorer   | | Reporter  |
       | (httpx)   | | (grades) | | (HTML/JSON)|
       +-----------+ +----------+ +-----------+
              |
       +------v------+
       | Techniques  |  (YAML loader)
       +-------------+
              |
       +------v------+
       | techniques/ |  (YAML files)
       +-------------+
```

## Reports

Every probe generates:
- **HTML report** — self-contained, shareable security assessment with findings, evidence, remediation guidance, and scoring breakdown
- **JSON report** — machine-readable output for integration with dashboards, SIEM, or compliance tooling
- **Terminal summary** — color-coded findings table with severity, grade, and hit rates

## License

[MIT](LICENSE)
