Metadata-Version: 2.4
Name: agent-defender
Version: 1.0.0
Summary: Provider-agnostic action defender SDK for AI agents
Author: VoidHack Agent Defender
License: MIT
Project-URL: Homepage, https://github.com/Dinaltium/VoidHackJune26
Project-URL: Repository, https://github.com/Dinaltium/VoidHackJune26
Keywords: ai,agents,defender,guardrails,prompt-injection,langchain,openai,anthropic,gemini
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: PyYAML>=6.0
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.24; extra == "anthropic"
Provides-Extra: gemini
Requires-Dist: google-generativeai>=0.8; extra == "gemini"
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.2; extra == "langchain"
Provides-Extra: all
Requires-Dist: openai>=1.0; extra == "all"
Requires-Dist: anthropic>=0.24; extra == "all"
Requires-Dist: google-generativeai>=0.8; extra == "all"
Requires-Dist: langchain-core>=0.2; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=8.3; extra == "dev"
Dynamic: license-file

# Agent Defender

> The operating layer that guards what AI agents actually **do**.
>
> *Guardrails check what the model says. We check what the agent does.*

**VoidHack June 2026** — Theme: *The Operating Layer of the Internet*.

A transparent, **OpenAI-compatible proxy** that sits between an AI agent and the
LLM/world and enforces **action-level** policy. Point your agent's `base_url` at
it — nothing else changes — and every tool call, egress destination, secret, and
token budget is checked before the agent can act.

The enforcement engine is provider-agnostic: OpenAI-compatible providers
(OpenAI, Groq, NVIDIA NIM, Mistral, Together, Fireworks, OpenRouter, local
gateways) work through the proxy or SDK wrapper, while native Claude/Anthropic
and Gemini adapters translate their tool-call formats into the same policy
checks. LangChain integrations block tool execution regardless of the model
provider.

## Why

Prompt injection is unsolved (OWASP LLM01:2025). Existing guardrails classify
*text* and approve the words — then the agent emails your secrets in the next
call. The defender enforces at the layer where damage happens: **the action**.

It blocks by **stripping the disallowed `tool_call` out of the model's response
before the agent ever sees it** — prevention, not a warning. Default **fail-closed**.

## What it enforces

- **Tool allowlist** — default-deny; `send_email`, `run_shell`, … are blocked.
- **Egress allowlist** — URLs/email domains must be on the allowlist.
- **Injection scan** — heuristic + Meta **Prompt Guard 2** on tool results.
- **PII / secret redaction** — in-flight, both directions (regex; Presidio optional).
- **Cost guard** — per-session token budget.
- **Signed receipts** — every decision is HMAC-signed and auditable, streamed
  live to a control-plane dashboard.
- **Safeguard reasoner** — `gpt-oss-safeguard-20b` adds an auditable explanation
  on flagged actions (policy-following; reads `policies/policy.yaml`).

## Demo (before / after)

```bash
# BREACH — agent talks straight to the model; it emails data externally
python -m agent.run_attack --task email --direct

# SAFE — same task through the defender; send_email is blocked, 0 exfiltration
python -m agent.run_attack --task email
```

The dashboard (`/`) shows decisions stream in live; **Run demo attack** replays a
spread of attacks through the real engine. See [docs/RUNBOOK.md](docs/RUNBOOK.md).

### Mission Control (`/mission`)

An interactive page where you hand an autonomous agent a real goal (editable task
+ a knowledge document you can poison), toggle the **defender ON/OFF**, and watch
a **live LLM** execute step by step. Governed: the agent gets hijacked but every
dangerous call is stripped — *"Defender held."* Ungoverned: the same agent
exfiltrates — *"Breach."* An Impact panel proves what reached the outside world.

## Stack

| Layer | Tech |
|-------|------|
| Proxy | Python 3.12 · FastAPI 0.136 · Uvicorn · httpx · Pydantic 2 |
| Detection | deterministic rules · Prompt Guard 2 (Groq) · regex/Presidio PII |
| Reasoner | gpt-oss-safeguard-20b (Groq, selective) |
| Store | SQLite · SQLAlchemy 2.0 · HMAC-SHA256 receipts |
| Dashboard | Next.js 16 · Tailwind 4 · SSE live feed |
| Demo agent | OpenAI SDK · llama-3.3-70b-versatile |

All models run on the **Groq free tier** — zero cost, no card.

## SDKs and provider adapters

Install from npm:

```bash
npm install agent-defender
```

Install from PyPI once published:

```bash
pip install agent-defender
```

- **Python**: `FirewallOpenAI`, `FirewallAnthropic`, `FirewallGoogleGenerativeAI`,
  `FirewallCallbackHandler`, and `create_openai_compatible_firewall`.
- **Node.js**: `FirewallOpenAI`, `FirewallAnthropic`,
  `FirewallGoogleGenerativeAI`, LangChain.js callback support, and
  `createFirewallOpenAICompatible`.

See [docs/SDK_INTEGRATION.md](docs/SDK_INTEGRATION.md) for Claude, Gemini,
Groq, NVIDIA, Mistral, Together, and LangChain examples.

## Layout

```
proxy/        FastAPI defender (app/ + tests/)
dashboard/    Next.js 16 control-plane UI
agent/        demo victim agent + poisoned document
policies/     policy.yaml
docs/         DESIGN · ARCHITECTURE · RUNBOOK
```

## Status

Working end-to-end. Backend: ruff + mypy + 34 pytest green. Dashboard: biome +
tsc + build + 2 Playwright e2e green. Verified live against Groq.

See [docs/DESIGN.md](docs/DESIGN.md) for the design + decision log and
[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for components and data flow.

## License

MIT
