Metadata-Version: 2.4
Name: prompt-defense-audit
Version: 0.1.0
Summary: Deterministic regex scanner for dangerous payloads in LLM responses — Python port of the npm package, with byte-for-byte parity.
Project-URL: Homepage, https://github.com/ppcvote/prompt-defense-audit-py
Project-URL: Repository, https://github.com/ppcvote/prompt-defense-audit-py
Project-URL: TypeScript Reference, https://github.com/ppcvote/prompt-defense-audit
Project-URL: Issues, https://github.com/ppcvote/prompt-defense-audit-py/issues
Author-email: "MinYi Xie (Ultra Lab)" <risky9763@gmail.com>
License: MIT
License-File: LICENSE
Keywords: ai-safety,guardrails,llm,output-validation,owasp,prompt-injection,security
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Provides-Extra: dev
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# prompt-defense-audit (Python)

[![PyPI](https://img.shields.io/pypi/v/prompt-defense-audit.svg)](https://pypi.org/project/prompt-defense-audit/)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](https://opensource.org/licenses/MIT)

Deterministic regex scanner that detects dangerous payloads in LLM responses **before** they reach downstream systems (HTML rendering, databases, shells, APIs). Maps to **OWASP LLM02 — Insecure Output Handling**.

This is the Python port of the [npm package of the same name](https://www.npmjs.com/package/prompt-defense-audit), with **byte-for-byte parity** on the output scanner: same rules, same matching, same dedup window, same risk-level thresholds, same summary strings. A parity test suite (`tests/test_parity.py`) keeps the two implementations honest.

## Why this exists

When an LLM emits text that gets piped into a browser, a database, a shell, or another agent, the LLM's training-time safety guardrails are no longer in the loop. Static output scanning is a deterministic, sub-5ms gate you can put between the model and the dangerous sink.

- **No LLM calls.** Pure regex, deterministic.
- **No dependencies.** Standard library only.
- **22 threat rules** across 7 categories: XSS, SQL injection, shell command injection, path traversal, credential leakage, markdown injection, code injection.
- **Risk-level escalation** from `safe` → `low` → `medium` → `high` → `critical`.
- **Parity-tested** against the TypeScript reference implementation.

## Install

```bash
pip install prompt-defense-audit
```

## Quick start

```python
from prompt_defense_audit import scan_output

# An LLM-generated response that includes a script tag:
output = 'Here is the greeting: <script>alert(1)</script>'
result = scan_output(output)

print(result.safe)        # False
print(result.risk_level)  # 'critical'
print(result.summary)     # 'Found 1 threat(s): 1 critical, 0 high. Do NOT pass this output...'

for t in result.threats:
    print(f"  [{t.severity}] {t.id}: {t.match!r} at position {t.position}")
```

Output:

```
False
critical
Found 1 threat(s): 1 critical, 0 high. Do NOT pass this output to downstream systems without sanitization.
  [critical] xss-script-tag: '<script>alert(1)</script>' at position 22
```

## Use as middleware

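The most useful position for this scanner is **between the LLM and the downstream sink**: a thin guard that fails closed on critical and high-severity threats. Medium and low findings pass through in the example below, so log `result.summary` for them if you need an audit trail.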

```python
from prompt_defense_audit import scan_output

def safe_render(llm_output: str) -> str:
    result = scan_output(llm_output)
    if result.risk_level in ("critical", "high"):
        raise ValueError(f"LLM output rejected: {result.summary}")
    return llm_output  # no critical or high findings; medium/low may remain
```

For MCP servers ingesting from federated sources (where any upstream content can be adversarially crafted), pass **every outbound response** through `scan_output()` before returning it to the calling agent.
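
One way to apply the guard uniformly is a decorator around each handler. The sketch below is an assumption about how you might structure this, not part of the package: `guard_output`, `OutputRejected`, and the `blocked_levels` parameter are hypothetical names. The scanner is passed in as an argument, so the wrapper relies only on the `risk_level` and `summary` attributes documented under Public API.

```python
import functools
from typing import Any, Callable, Iterable


class OutputRejected(RuntimeError):
    """Hypothetical exception raised when a response is blocked."""


def guard_output(
    scan: Callable[[str], Any],
    blocked_levels: Iterable[str] = ("critical", "high"),
) -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Build a decorator that scans a handler's string return value.

    `scan` is expected to behave like `scan_output`: it takes a string and
    returns an object with `risk_level` and `summary` attributes.
    """
    blocked = frozenset(blocked_levels)

    def decorator(handler: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(handler)
        def wrapper(*args: Any, **kwargs: Any) -> str:
            response = handler(*args, **kwargs)
            result = scan(response)
            if result.risk_level in blocked:
                # Fail closed: the caller never sees the raw response.
                raise OutputRejected(result.summary)
            return response

        return wrapper

    return decorator
```

With the package installed, a handler would be wrapped as `@guard_output(scan_output)`. Injecting the scanner also makes the guard easy to unit-test with a stub.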

## Public API

```python
from prompt_defense_audit import scan_output, OutputScanResult, OutputThreat

result: OutputScanResult = scan_output("...")
# result.safe        : bool
# result.threats     : list[OutputThreat]
# result.risk_level  : Literal["safe", "low", "medium", "high", "critical"]
# result.summary     : str

# Each OutputThreat carries:
# .id        : str (stable rule id, e.g. "xss-script-tag")
# .name      : str (human-readable rule name)
# .severity  : Literal["critical", "high", "medium", "low"]
# .match     : str (matched payload, truncated to 100 chars)
# .position  : int (start index in the scanned string)
# .context   : str (±20-char window with newlines flattened)

# Both dataclasses expose .to_dict() for JSON serialization.
```
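
Because both dataclasses expose `.to_dict()`, a scan result can be written to a structured log as a single JSON line. This is a minimal sketch: `to_log_line` is a hypothetical helper, and it assumes `.to_dict()` returns only JSON-serializable values. It makes no assumption about the key names in that dictionary.

```python
import json
from typing import Any


def to_log_line(result: Any) -> str:
    """Serialize a scan result (anything exposing .to_dict()) as one JSON line."""
    # sort_keys gives stable key order, which keeps log diffs readable.
    # Newlines inside matched payloads are escaped by the JSON encoder,
    # so the output always stays on a single line.
    return json.dumps(result.to_dict(), sort_keys=True, ensure_ascii=False)
```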

## Rule catalogue (22 rules across 7 categories)

| Category | Rule IDs | Severities |
|---|---|---|
| **XSS** | `xss-script-tag`, `xss-event-handler`, `xss-javascript-uri`, `xss-data-uri-html`, `xss-iframe-srcdoc`, `xss-svg-script` | critical × 3, high × 3 |
| **SQL injection** | `sqli-destructive`, `sqli-union`, `sqli-comment-bypass` | critical × 1, high × 1, medium × 1 |
| **Shell command injection** | `shell-pipe-exec`, `shell-destructive`, `shell-reverse`, `shell-env-exfil` | critical × 3, high × 1 |
| **Path traversal** | `path-traversal` | high × 1 |
| **Credential leakage** | `credential-api-key`, `credential-private-key`, `credential-connection-string`, `credential-jwt` | critical × 3, high × 1 |
| **Markdown injection** | `markdown-link-injection`, `markdown-image-tracking` | high × 1, medium × 1 |
| **Code injection** | `code-eval`, `code-python-import` | high × 1, medium × 1 |

The `shell-destructive` rule explicitly exempts `rm -rf /tmp/...`, since cleaning up under `/tmp/` is a common legitimate operation in tutorial output.
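
An exemption like this is typically expressed with a negative lookahead. The pattern below is an illustrative assumption, not the package's actual rule: `DESTRUCTIVE_RM` and `is_destructive` are hypothetical names, and the real regex may cover more flag spellings and targets.

```python
import re

# Illustrative pattern only; the package's real rule may differ.
# (?!/tmp/) is a negative lookahead: the target path must not start with /tmp/.
DESTRUCTIVE_RM = re.compile(r"\brm\s+-rf\s+(?!/tmp/)/\S*")


def is_destructive(command: str) -> bool:
    """Return True for `rm -rf` on an absolute path outside /tmp/."""
    return DESTRUCTIVE_RM.search(command) is not None
```

Under this sketch, `rm -rf /home/user` is flagged and `rm -rf /tmp/build` is not. Note that `rm -rf /tmp` without a trailing slash is still flagged, because the lookahead only exempts paths beneath `/tmp/`.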

## Parity with the TypeScript reference

The TypeScript reference at [`ppcvote/prompt-defense-audit`](https://github.com/ppcvote/prompt-defense-audit) is the canonical implementation. This Python port matches it **byte-for-byte** on a suite of 50+ fixtures covering every rule, edge cases, and the aggregate logic (dedup window, risk-level escalation, summary text).

The parity test (`tests/test_parity.py`) runs both implementations on the same fixtures and compares results entry-by-entry. The contract is enforced on every test run where the TypeScript build is available; without it, the parity test is skipped (see Development).

If you find any input where the two implementations diverge, please [open an issue](https://github.com/ppcvote/prompt-defense-audit-py/issues) — that's a parity bug we want to know about.
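
When reporting a divergence, it helps to pinpoint the exact field that differs. The helper below is a sketch of what an entry-by-entry comparison can look like. It is not the contents of `tests/test_parity.py`. `diff_results` is a hypothetical name, and it assumes both results have already been converted to JSON-like dicts and lists.

```python
from typing import Any, List


def diff_results(reference: Any, candidate: Any, path: str = "result") -> List[str]:
    """Recursively compare two JSON-like values and list every mismatch."""
    diffs: List[str] = []
    if isinstance(reference, dict) and isinstance(candidate, dict):
        # Walk the union of keys so extra and missing fields are both reported.
        for key in sorted(set(reference) | set(candidate)):
            if key not in reference:
                diffs.append(f"{path}.{key}: unexpected in candidate")
            elif key not in candidate:
                diffs.append(f"{path}.{key}: missing from candidate")
            else:
                diffs.extend(
                    diff_results(reference[key], candidate[key], f"{path}.{key}")
                )
    elif isinstance(reference, list) and isinstance(candidate, list):
        if len(reference) != len(candidate):
            diffs.append(f"{path}: length {len(reference)} != {len(candidate)}")
        # Compare the overlapping prefix element by element.
        for i, (ref_item, cand_item) in enumerate(zip(reference, candidate)):
            diffs.extend(diff_results(ref_item, cand_item, f"{path}[{i}]"))
    elif reference != candidate:
        diffs.append(f"{path}: {reference!r} != {candidate!r}")
    return diffs
```

An empty list means the two results agree. Otherwise each entry names the path of one mismatch, which makes a useful attachment for an issue report.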

## What this does and does not catch

**Catches:** dangerous payloads that have arrived in the output buffer. The scanner makes no assumption about *how* the payload got there — whether the LLM hallucinated it, an upstream document poisoned the context, or the user crafted a prompt to elicit it.

**Does not catch:**

1. Whether the model *will* emit dangerous content — the scanner is a runtime check, not a static prompt audit. For pre-deployment static analysis of system prompts, see the [`prompt-defense-audit` npm package's input scanner](https://github.com/ppcvote/prompt-defense-audit) (currently TypeScript only).
2. Semantic threats. The scanner is regex-based and won't detect, e.g., a paraphrased "give the attacker money" instruction in natural language.
3. Whether the downstream sink can actually be exploited. A `<script>` tag detected in output may or may not execute depending on rendering context.

Treat this as **defense in depth** alongside output sanitization at the rendering layer and tool-level authorization checks.
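
As an illustration of the rendering-layer half of that advice, the sketch below escapes LLM output with the standard library before it is placed into HTML. `render_as_text` is a hypothetical helper, and it assumes the output is inserted as HTML text or inside a quoted attribute value. Other contexts, such as URLs or inline scripts, need their own encoding.

```python
import html


def render_as_text(llm_output: str) -> str:
    """Escape LLM output so a browser displays it as text, not markup."""
    # html.escape converts &, < and >, and with the default quote=True
    # also converts double and single quotes.
    return html.escape(llm_output)
```

Escaping and scanning are complementary: the scanner tells you that something suspicious arrived, while escaping keeps markup from being interpreted even when no rule matched.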

## Development

```bash
pip install -e ".[dev]"
pytest tests/
mypy src/
ruff check src/ tests/
```

The parity test requires the TypeScript reference repo to be checked out next to this one (`../prompt-defense-audit/`) with its `dist/` build available. If it is not present, the parity test is skipped automatically.

## License

MIT. See [LICENSE](LICENSE).

## Related projects

- [`ppcvote/prompt-defense-audit`](https://github.com/ppcvote/prompt-defense-audit): the original TypeScript implementation with both input (system-prompt audit) and output scanning.
- [`ppcvote/misp-mcp-server`](https://github.com/ppcvote/misp-mcp-server): example MCP server that passes every MISP response through `scan_output` before exposing it to the calling LLM.
