Metadata-Version: 2.4
Name: agent-airlock
Version: 0.8.26
Summary: A type-checker for AI tool calls — strict argument validation, ghost-argument stripping, and self-healing retries for MCP servers and agent frameworks.
Project-URL: Homepage, https://github.com/sattyamjjain/agent-airlock
Project-URL: Documentation, https://github.com/sattyamjjain/agent-airlock#readme
Project-URL: Repository, https://github.com/sattyamjjain/agent-airlock
Project-URL: Issues, https://github.com/sattyamjjain/agent-airlock/issues
Author-email: Sattyam Jain <sattyamjain@example.com>
License-Expression: MIT
License-File: LICENSE
Keywords: agent-safety,ai-security,e2b,llm,mcp,model-context-protocol,pydantic,sandbox,validation
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: pydantic<3.0,>=2.0
Requires-Dist: structlog>=24.0
Requires-Dist: tomli>=2.0; python_version < '3.11'
Provides-Extra: all
Requires-Dist: claude-agent-sdk<0.2.0,>=0.1.58; extra == 'all'
Requires-Dist: cloudpickle>=3.0; extra == 'all'
Requires-Dist: crewai<2.0,>=1.14.4; extra == 'all'
Requires-Dist: cryptography>=42.0; extra == 'all'
Requires-Dist: e2b<2.0,>=1.0; extra == 'all'
Requires-Dist: fakeredis>=2.20; extra == 'all'
Requires-Dist: fastmcp<3.0,>=2.0; extra == 'all'
Requires-Dist: google-cloud-modelarmor>=0.2; extra == 'all'
Requires-Dist: mcp>=1.0; extra == 'all'
Requires-Dist: pydantic-ai<2.0,>=1.88.0; extra == 'all'
Requires-Dist: redis<7.0,>=5.0; extra == 'all'
Requires-Dist: textual>=0.40; extra == 'all'
Provides-Extra: attested
Requires-Dist: cryptography>=42.0; extra == 'attested'
Provides-Extra: claude-agent
Requires-Dist: claude-agent-sdk<0.2.0,>=0.1.58; extra == 'claude-agent'
Provides-Extra: console
Requires-Dist: textual>=0.40; extra == 'console'
Provides-Extra: crewai
Requires-Dist: crewai<2.0,>=1.14.4; extra == 'crewai'
Provides-Extra: crypto
Requires-Dist: cryptography>=42.0; extra == 'crypto'
Provides-Extra: dev
Requires-Dist: bandit>=1.9.4; extra == 'dev'
Requires-Dist: cloudpickle>=3.0; extra == 'dev'
Requires-Dist: cyclonedx-bom>=4.0; extra == 'dev'
Requires-Dist: mypy>=1.8; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-benchmark>=4.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.2; extra == 'dev'
Requires-Dist: safety>=3.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
Requires-Dist: mkdocs>=1.5; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.24; extra == 'docs'
Requires-Dist: pymdown-extensions>=10.0; extra == 'docs'
Provides-Extra: mcp
Requires-Dist: fastmcp<3.0,>=2.0; extra == 'mcp'
Requires-Dist: mcp>=1.0; extra == 'mcp'
Provides-Extra: modal
Requires-Dist: cloudpickle>=3.0; extra == 'modal'
Requires-Dist: modal>=0.65; extra == 'modal'
Provides-Extra: model-armor
Requires-Dist: google-cloud-modelarmor>=0.2; extra == 'model-armor'
Provides-Extra: pydantic-ai
Requires-Dist: pydantic-ai<2.0,>=1.88.0; extra == 'pydantic-ai'
Provides-Extra: redis
Requires-Dist: fakeredis>=2.20; extra == 'redis'
Requires-Dist: redis<7.0,>=5.0; extra == 'redis'
Provides-Extra: sandbox
Requires-Dist: cloudpickle>=3.0; extra == 'sandbox'
Requires-Dist: e2b<2.0,>=1.0; extra == 'sandbox'
Description-Content-Type: text/markdown

<div align="center">

<!-- Animated Typing Header -->
<a href="https://github.com/sattyamjjain/agent-airlock">
  <img src="https://readme-typing-svg.demolab.com?font=Fira+Code&weight=700&size=28&duration=3000&pause=1000&color=00D4FF&center=true&vCenter=true&multiline=true&repeat=true&width=700&height=100&lines=%F0%9F%9B%A1%EF%B8%8F+Agent-Airlock;Your+AI+Agent+Just+Tried+rm+-rf+%2F.+We+Stopped+It." alt="Agent-Airlock Typing Animation" />
</a>

### A type-checker for AI tool calls

**Strict validation, ghost-argument stripping, and self-healing retries — one decorator, any agent or MCP server.**

<!-- Primary Badges Row -->
[![PyPI version](https://img.shields.io/pypi/v/agent-airlock?style=for-the-badge&logo=pypi&logoColor=white&color=3775A9)](https://pypi.org/project/agent-airlock/)
[![Downloads](https://img.shields.io/pypi/dm/agent-airlock?style=for-the-badge&logo=python&logoColor=white&color=success)](https://pypistats.org/packages/agent-airlock)
[![CI](https://img.shields.io/github/actions/workflow/status/sattyamjjain/agent-airlock/ci.yml?style=for-the-badge&logo=github&label=CI&color=success)](https://github.com/sattyamjjain/agent-airlock/actions/workflows/ci.yml)
[![codecov](https://img.shields.io/codecov/c/github/sattyamjjain/agent-airlock?style=for-the-badge&logo=codecov&logoColor=white)](https://codecov.io/gh/sattyamjjain/agent-airlock)

<!-- Secondary Badges Row -->
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-3776AB?style=flat-square&logo=python&logoColor=white)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green?style=flat-square)](https://opensource.org/licenses/MIT)
[![GitHub stars](https://img.shields.io/github/stars/sattyamjjain/agent-airlock?style=flat-square&logo=github)](https://github.com/sattyamjjain/agent-airlock/stargazers)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com)

<!-- TEST-BADGE-START -->
<!-- Auto-generated by scripts/update_test_badge.py. Do not edit by hand. -->
**Test suite:** 2,510 tests · **Coverage:** 83.42% · **v0.8.5**
<!-- TEST-BADGE-END -->

<br/>

[**Get Started in 30 Seconds**](#-30-second-quickstart) · [**Why Airlock?**](#-the-problem-no-one-talks-about) · [**All Frameworks**](#-framework-compatibility) · [**Benchmark**](BENCHMARK.md) · [**Docs**](#-documentation)

<br/>

</div>

---

<!-- Hero Visual Block -->
<div align="center">

```
┌────────────────────────────────────────────────────────────────┐
│  🤖 AI Agent: "Let me help clean up disk space..."            │
│                           ↓                                    │
│               rm -rf / --no-preserve-root                      │
│                           ↓                                    │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  🛡️ AIRLOCK: BLOCKED                                     │  │
│  │                                                          │  │
│  │  Reason: Matches denied pattern 'rm_*'                   │  │
│  │  Policy: STRICT_POLICY                                   │  │
│  │  Fix: Use approved cleanup tools only                    │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘
```

</div>

---

## 🎯 30-Second Quickstart

```bash
pip install agent-airlock
```

```python
from agent_airlock import Airlock

@Airlock()
def transfer_funds(account: str, amount: int) -> dict:
    return {"status": "transferred", "amount": amount}

# LLM sends amount="500" (string) → BLOCKED with fix_hint
# LLM sends force=True (invented arg) → STRIPPED silently
# LLM sends amount=500 (correct) → EXECUTED safely
```

**That's it.** Your function now has ghost argument stripping, strict type validation, and self-healing errors.
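
To make the three outcomes above concrete, here is a stdlib-only sketch of the idea. It is not the library's implementation (the package builds on Pydantic), and `check_call` is a hypothetical helper that only handles plain class annotations such as `str` and `int`.

```python
import inspect
from typing import Any, Callable, get_type_hints


def check_call(func: Callable[..., Any], kwargs: dict) -> tuple:
    """Return (clean_kwargs, stripped_names, type_errors) for a proposed call."""
    params = inspect.signature(func).parameters
    hints = get_type_hints(func)
    # Ghost arguments: names the function never declared.
    stripped = sorted(name for name in kwargs if name not in params)
    clean = {name: value for name, value in kwargs.items() if name in params}
    errors = []
    for name, value in clean.items():
        expected = hints.get(name)
        # Strict check: exact type match, so "500" is not coerced to 500.
        if isinstance(expected, type) and type(value) is not expected:
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return clean, stripped, errors


def transfer_funds(account: str, amount: int) -> dict:
    return {"status": "transferred", "amount": amount}


clean, stripped, errors = check_call(
    transfer_funds, {"account": "acct-1", "amount": "500", "force": True}
)
```

Here `force` ends up in `stripped` and the string `"500"` produces a type error instead of being silently converted.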

---

## 🧠 The Problem No One Talks About

<table>
<tr>
<td width="50%">

### The Hype

> *"MCP has 16,000+ servers on GitHub!"*
> *"OpenAI adopted it!"*
> *"Linux Foundation hosts it!"*

</td>
<td width="50%">

### The Reality

**LLMs hallucinate tool calls. Every. Single. Day.**

- Claude invents arguments that don't exist
- GPT-4 sends `"100"` when you need `100`
- Agents chain 47 calls before one deletes prod data

</td>
</tr>
</table>

**Enterprise solutions exist:** Prompt Security ($50K/year), Pangea (proxy your data), Cisco ("coming soon").

**We built the open-source alternative.** One decorator. No vendor lock-in. Your data never leaves your infrastructure.

---

## ✨ What You Get

<table>
<tr>
<td align="center" width="16%">
<img width="40" src="https://img.icons8.com/fluency/48/delete-shield.png" alt="shield"/>
<br/><b>Ghost Args</b>
<br/><sub>Strip LLM-invented params</sub>
</td>
<td align="center" width="16%">
<img width="40" src="https://img.icons8.com/fluency/48/checked.png" alt="check"/>
<br/><b>Strict Types</b>
<br/><sub>No silent coercion</sub>
</td>
<td align="center" width="16%">
<img width="40" src="https://img.icons8.com/fluency/48/refresh.png" alt="refresh"/>
<br/><b>Self-Healing</b>
<br/><sub>LLM-friendly errors</sub>
</td>
<td align="center" width="16%">
<img width="40" src="https://img.icons8.com/fluency/48/lock.png" alt="lock"/>
<br/><b>E2B Sandbox</b>
<br/><sub>Isolated execution</sub>
</td>
<td align="center" width="16%">
<img width="40" src="https://img.icons8.com/fluency/48/user-shield.png" alt="user"/>
<br/><b>RBAC</b>
<br/><sub>Role-based access</sub>
</td>
<td align="center" width="16%">
<img width="40" src="https://img.icons8.com/fluency/48/privacy.png" alt="privacy"/>
<br/><b>PII Mask</b>
<br/><sub>Auto-redact secrets</sub>
</td>
</tr>
<tr>
<td align="center" width="16%">
<img width="40" src="https://img.icons8.com/fluency/48/network-card.png" alt="network"/>
<br/><b>Network Guard</b>
<br/><sub>Block data exfiltration</sub>
</td>
<td align="center" width="16%">
<img width="40" src="https://img.icons8.com/fluency/48/folder-invoices.png" alt="folder"/>
<br/><b>Path Validation</b>
<br/><sub>CVE-resistant traversal</sub>
</td>
<td align="center" width="16%">
<img width="40" src="https://img.icons8.com/fluency/48/restart.png" alt="circuit"/>
<br/><b>Circuit Breaker</b>
<br/><sub>Fault tolerance</sub>
</td>
<td align="center" width="16%">
<img width="40" src="https://img.icons8.com/fluency/48/analytics.png" alt="otel"/>
<br/><b>OpenTelemetry</b>
<br/><sub>Enterprise observability</sub>
</td>
<td align="center" width="16%">
<img width="40" src="https://img.icons8.com/fluency/48/money-bag.png" alt="cost"/>
<br/><b>Cost Tracking</b>
<br/><sub>Budget limits</sub>
</td>
<td align="center" width="16%">
<img width="40" src="https://img.icons8.com/fluency/48/syringe.png" alt="vaccine"/>
<br/><b>Vaccination</b>
<br/><sub>Auto-secure frameworks</sub>
</td>
</tr>
</table>

---

## 📋 Table of Contents

<details>
<summary><b>Click to expand full navigation</b></summary>

- [30-Second Quickstart](#-30-second-quickstart)
- [The Problem](#-the-problem-no-one-talks-about)
- [What You Get](#-what-you-get)
- [Core Features](#-core-features)
  - [E2B Sandbox](#-e2b-sandbox-execution)
  - [Security Policies](#-security-policies)
  - [Cost Control](#-cost-control)
  - [PII Masking](#-pii--secret-masking)
  - [Network Airgap](#-network-airgap-v030)
  - [Framework Vaccination](#-framework-vaccination-v030)
  - [Circuit Breaker](#-circuit-breaker-v040)
  - [OpenTelemetry](#-opentelemetry-observability-v040)
- [Framework Compatibility](#-framework-compatibility)
- [FastMCP Integration](#-fastmcp-integration)
- [Comparison](#-why-not-enterprise-vendors)
- [Installation](#-installation)
- [OWASP Compliance](#️-owasp-compliance)
- [Performance](#-performance)
- [Documentation](#-documentation)
- [Contributing](#-contributing)
- [Support](#-support)

</details>

---

## 🔥 Core Features

### 🔒 E2B Sandbox Execution

```python
from agent_airlock import Airlock, STRICT_POLICY

@Airlock(sandbox=True, sandbox_required=True, policy=STRICT_POLICY)
def execute_code(code: str) -> str:
    """Runs in an E2B Firecracker MicroVM. Not on your machine."""
    exec(code)
    return "executed"
```

| Feature | Value |
|---------|-------|
| Boot time | ~125ms cold, <200ms warm |
| Isolation | Firecracker MicroVM |
| Fallback | `sandbox_required=True` blocks local execution |

Air-gapped / on-prem? `DockerBackend` is the supported alternative
— `cap_drop=["ALL"]`, `no-new-privileges`, `network_mode="none"`,
timeout enforced, opt-in `pytest -m docker` integration tests. See
[`docs/sandbox/docker.md`](docs/sandbox/docker.md).

#### ModalBackend — Modal-hosted sandbox *(v0.8.11+, issue #30)*

Already running the rest of your agent on [Modal](https://modal.com/)?
`ModalBackend` lets you keep airlocked tool execution on the same
substrate instead of mixing E2B and Modal billing / observability.

```bash
pip install "agent-airlock[modal]"
```

```python
from agent_airlock import Airlock, STRICT_POLICY, AirlockConfig
from agent_airlock.sandbox_backend import ModalBackend

backend = ModalBackend(
    app_name="my-airlock-sandbox",
    image_ref="python:3.11-slim",
    cpu=0.5,
    memory_mb=512,
    timeout_s=30,
    # network_policy=None  → block_network=True (fail-closed default)
)

@Airlock(sandbox=True, sandbox_required=True, policy=STRICT_POLICY,
         config=AirlockConfig(sandbox_backend=backend))
def execute_code(code: str) -> str:
    exec(code)
    return "executed"
```

**Isolation model — read before you reach for `cap_drop`.** Modal
sandboxes run under **gVisor** (kernel-syscall filtering), not under
Docker-style capability dropping. The Modal Python SDK does not
expose `cap_drop` / `cap_add` / `seccomp` / `no-new-privileges` —
there is no equivalent knob to map. If your threat model needs
Linux-capability dropping at the container layer, keep using
`DockerBackend`. The network posture *is* configurable: `ModalBackend`
defaults to `block_network=True` (deny-by-default), and a supplied
`NetworkPolicy` maps to Modal's `block_network` flag (`allow_egress=False`
→ blocked, `True` → allowed). Hostname allowlists in `NetworkPolicy.allowed_hosts`
do **not** forward to Modal (their API is CIDR-only); the backend logs
a structlog warning and the operator is expected to re-state hostname
constraints at the `Airlock` policy layer.
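
The mapping described above can be sketched in a few lines. This is an illustration only: `NetworkPolicySketch` and `to_block_network` are hypothetical stand-ins, not the backend's actual classes, and stdlib `logging` replaces structlog to keep the example self-contained.

```python
import logging
from dataclasses import dataclass, field
from typing import List, Optional

log = logging.getLogger("airlock.sketch")


@dataclass
class NetworkPolicySketch:
    # Hypothetical stand-in carrying only the two fields named in the text.
    allow_egress: bool = False
    allowed_hosts: List[str] = field(default_factory=list)


def to_block_network(policy: Optional[NetworkPolicySketch]) -> bool:
    """Map a policy to the single block_network flag."""
    if policy is None:
        return True  # fail-closed default: no policy means no network
    if policy.allowed_hosts:
        # Hostname allowlists are not forwarded, so warn the operator.
        log.warning("hostname allowlist not forwarded: %s", policy.allowed_hosts)
    return not policy.allow_egress
```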

`ModalBackend` is **opt-in only** — it is NOT added to the
`get_default_backend()` priority chain (E2B → Docker → Local stays the
default flow). Existing callers see no behavior change.

---

### 📜 Security Policies

| Preset | Use case | Key posture |
|---|---|---|
| `PERMISSIVE_POLICY` | Dev / sandbox | No restrictions |
| `STRICT_POLICY` | Prod | Rate-limited, requires agent identity, denies dangerous capabilities |
| `READ_ONLY_POLICY` | Analytics / RAG | `read_*` / `get_*` / `list_*` / `search_*` only |
| `BUSINESS_HOURS_POLICY` | Compliance windows | `delete_*` / `drop_*` / `*_production` only 09:00–17:00 |
| `CAMOUFLAGE_RESISTANT_POLICY` *(v0.8.6)* | Detector-independent defense vs. domain-camouflaged injection | Deny-by-default allowlist, ghost-arg BLOCK, output cap, per-call reauthorization |

```python
from agent_airlock import (
    PERMISSIVE_POLICY,
    STRICT_POLICY,
    READ_ONLY_POLICY,
    BUSINESS_HOURS_POLICY,
    CAMOUFLAGE_RESISTANT_POLICY,  # v0.8.6
)

# Or build your own:
from agent_airlock import SecurityPolicy

MY_POLICY = SecurityPolicy(
    allowed_tools=["read_*", "query_*"],
    denied_tools=["delete_*", "drop_*", "rm_*"],
    rate_limits={"*": "1000/hour", "write_*": "100/hour"},
    time_restrictions={"deploy_*": "09:00-17:00"},
)
```
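
For intuition, the glob patterns in `allowed_tools` and `denied_tools` can be evaluated with the stdlib `fnmatch` module. The sketch below assumes that a deny match wins over an allow match and that a tool matching no allow pattern is refused; both are assumptions of this example, and `tool_allowed` is a hypothetical helper, not the library's `check_tool_allowed`.

```python
from fnmatch import fnmatchcase


def tool_allowed(tool: str, allowed: list, denied: list) -> bool:
    """Evaluate a tool name against glob allow/deny lists."""
    # Assumption: deny patterns take precedence over allow patterns.
    if any(fnmatchcase(tool, pattern) for pattern in denied):
        return False
    # Assumption: a tool that matches no allow pattern is refused.
    return any(fnmatchcase(tool, pattern) for pattern in allowed)


ALLOWED = ["read_*", "query_*"]
DENIED = ["delete_*", "drop_*", "rm_*"]
```

With these lists, `read_file` passes, `rm_tree` is refused by the deny list, and `write_file` is refused because nothing allows it.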

#### CAMOUFLAGE_RESISTANT — detector-independent injection defense (v0.8.6)

[arXiv:2605.22001](https://arxiv.org/abs/2605.22001) ("Blind Spots in
the Guard", Pai, May 2026) shows that production injection detectors —
**Llama Guard 3 included** — drop to **IDR = 0.000** on payloads that
mimic the target document's domain vocabulary and authority structure.
Per the paper, detection rates collapse from 93.8% to 9.7% on Llama 3.1
8B and from 100% to 55.6% on Gemini 2.0 Flash.

`CAMOUFLAGE_RESISTANT_POLICY` does not rely on payload-content
signatures at all. It blocks at four structural seams an attacker has
to ride regardless of phrasing:

1. **Deny-by-default tool allowlist.** Empty `allowed_tools` means
   *nothing* is callable; deployments opt every tool in by name. A
   camouflaged directive targeting an unlisted tool is blocked on
   allowlist grounds without ever invoking a detector.
2. **Ghost-argument BLOCK.** A camouflaged directive cannot smuggle
   undeclared parameters past validation.
3. **Hard output cap + sanitization.** Tool output that re-enters the
   model context is truncated and PII/secret-masked so a camouflaged
   directive embedded in tool output can't carry into a downstream
   agent at full length.
4. **Per-call reauthorization (debate-amplification guard).** Once a
   tool's output has flowed back into the model, any reinvocation
   requires an explicit `context.authorize_once(tool)` grant from the
   harness — breaking the multi-agent fan-out path the paper identifies.

```python
from agent_airlock import Airlock, apply_camouflage_resistant

bundle = apply_camouflage_resistant(allowed_tools=["read_file", "search"])

@Airlock(config=bundle.config, policy=bundle.policy)
def read_file(path: str) -> str:
    ...
```

`apply_camouflage_resistant()` composes the matching `AirlockConfig`
(unknown-args BLOCK, sanitization on, output cap 4000 chars) with a
`SecurityPolicy` carrying your explicit allowlist. The preset is
deliberately incomplete on its own — the config-level knobs and the
policy-level knobs span two seams, so the factory returns both as a
`CamouflageResistantBundle`.
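
Seam 4 is the least familiar of the four, so here is a rough sketch of one-shot reauthorization. Only the name `authorize_once` comes from the text above; the class, `record_output`, and `may_invoke` are hypothetical and do not reflect the real `AirlockContext` API.

```python
class OneShotGrants:
    """Tracks which tools need a fresh grant before they run again."""

    def __init__(self) -> None:
        self._grants = set()       # tools with an unused one-shot grant
        self._output_seen = set()  # tools whose output re-entered the model

    def authorize_once(self, tool: str) -> None:
        self._grants.add(tool)

    def record_output(self, tool: str) -> None:
        self._output_seen.add(tool)

    def may_invoke(self, tool: str) -> bool:
        if tool not in self._output_seen:
            return True  # first use: no reauthorization needed
        if tool in self._grants:
            self._grants.discard(tool)  # consume the grant, then re-lock
            return True
        return False


grants = OneShotGrants()
first = grants.may_invoke("search")    # allowed: output not yet seen
grants.record_output("search")
second = grants.may_invoke("search")   # refused: needs a grant
grants.authorize_once("search")
third = grants.may_invoke("search")    # allowed: grant consumed
fourth = grants.may_invoke("search")   # refused again
```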

> **Running an MCP server with STDIO transport?** Also wire the
> [Ox MCP STDIO sanitizer](#️-owasp-compliance) via
> `stdio_guard_ox_defaults()` — it blocks the entire
> CVE-2026-30616 class (shell metacharacter injection,
> non-allowlisted binaries, Trojan-Source RTL overrides, and
> inline-code flags) before `subprocess.Popen`.

---

### 🪪 MCP server attestation (v0.8.10)

[arXiv:2605.24248](https://arxiv.org/abs/2605.24248) ("Attested
Tool-Server Admission", Metere, May 2026) calls out a gap MCP itself
does not close: the protocol standardises *message exchange* between
LLM agents and tool servers but says nothing about *trust*. Anybody who
can answer on the wire can declare themselves a tool server.

`mcp_attested_admission_defaults()` is a deny-by-default opt-in preset
that closes the gap host-side, mirroring the paper's three additive
mechanisms:

1. **Offline-signed clearance assertion.** Before any tool from an MCP
   server is dispatched, the host fetches a JWS-compact clearance from
   `{server_url}/.well-known/mcp-clearance` (path is configurable) and
   verifies its signature against an **operator-pinned trust root**.
   The trust root is supplied to `AttestedAdmissionConfig` at process
   startup — never network-fetched on the hot path.
2. **Deny-by-default per-server tool allowlist.** Admitting a server
   is not the same as trusting its every tool. The verified clearance
   carries an explicit list of tool names the host will permit;
   everything else is denied. The `sub` claim is matched against the
   server identity the host is about to dispatch to (so a stolen
   clearance from server A can't admit a tool call to server B).
3. **Flavor-gated enforcement.** `ENFORCE` (default) hard-denies on
   missing / invalid / expired clearance; `WARN` logs and admits — the
   staged turn-up an operator wants when introducing the gate against
   real traffic.

Every admission decision emits a
[`ReceiptVerdict`](./docs/attest/receipt.md) on the
`guard="mcp_attested_admission"` channel, so the existing `airlock attest`
DSSE pipeline picks decisions up unchanged — this preset does **not**
invent a new log.

```python
from agent_airlock.mcp_proxy_guard import MCPProxyConfig, MCPProxyGuard
from agent_airlock.mcp_spec.attested_admission import TrustRoot
from agent_airlock.policy_presets import mcp_attested_admission_defaults

# Operator pins the trust root at startup. Never fetched at runtime.
with open("/etc/airlock/mcp-clearance-root.pem", "rb") as fh:
    pinned_pem = fh.read()

cfg = mcp_attested_admission_defaults(
    trust_root=TrustRoot(key_id="ops-2026Q2", ed25519_pem=pinned_pem),
    enforcement_mode="ENFORCE",       # deny-by-default
    max_clearance_age_days=30,
)
guard = MCPProxyGuard(MCPProxyConfig(attested_admission=cfg))

decision = guard.audit_tool_admission(
    server_url="https://mcp.example.com",
    server_id="srv-alpha",            # expected `sub` claim
    tool_name="read",
)
if not decision.admitted:
    raise RuntimeError(decision.reason)
```

Signature verification needs the `[attested]` extra (pulls in
`cryptography` for offline Ed25519 / RSA-PSS / JWKS verification); the
base install gains no new runtime dependency.

> Install with `pip install "agent-airlock[attested]"`. Opt-in only —
> existing callers that don't set `attested_admission` get exactly
> v0.8.9 behavior.
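
The checks that follow signature verification can be sketched without any crypto. This example assumes the clearance has already been verified and decoded into a dict. The claim names `iat` and `tools` are hypothetical (only `sub` is named above), and treating `WARN` as admitting every failure class is also an assumption of this sketch.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    admitted: bool
    reason: str


def check_claims(claims: dict, server_id: str, tool_name: str,
                 max_age_days: int, mode: str = "ENFORCE",
                 now: Optional[float] = None) -> Decision:
    """Apply identity, freshness, and allowlist checks to verified claims."""
    now = time.time() if now is None else now
    problem = ""
    if claims.get("sub") != server_id:
        problem = "sub mismatch"  # clearance was issued to another server
    elif now - claims.get("iat", 0) > max_age_days * 86400:
        problem = "clearance expired"
    elif tool_name not in claims.get("tools", ()):
        problem = "tool not in clearance allowlist"
    if not problem:
        return Decision(True, "ok")
    if mode == "WARN":
        return Decision(True, f"warn: {problem}")  # log-and-admit turn-up
    return Decision(False, problem)
```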

---

### 🧭 Behavioral sequence guard (v0.8.12)

Watches the **ordered stream of tool calls** in a session and flags
divergence from a declared expected order — *not* the model's stated
reasoning trace.

[arXiv:2605.27901](https://arxiv.org/abs/2605.27901) ("The Fragility
of Chain-of-Thought Monitoring", Onyame, Zhou, Thopalli, Kailkhura
& Agarwal, May 2026) reports an average **95.9% CoT unfaithfulness
across 8B–120B-parameter models** — including answer-switching,
post-hoc rationalisation, and procedural exploitation of hints.
Trusting the model's stated reasoning to detect misbehavior is
therefore not viable. Trusting its **behavior** — the sequence of
tools it actually invokes — is.

`SequenceGuard` is an opt-in field on `SecurityPolicy` that runs in
the `@Airlock` seam **right after** the standard policy check, in two
modes:

**DECLARED mode** — operator supplies a permitted-transition DAG.
Any transition not in the DAG is a `SequenceViolation`. Deny-by-default.

```python
from agent_airlock import Airlock, SecurityPolicy
from agent_airlock.sequence_guard import SequenceGuard, ENTRY_SENTINEL

policy = SecurityPolicy(
    sequence_guard=SequenceGuard(
        mode="declared",
        action="block",                       # or "warn"
        dag={
            ENTRY_SENTINEL: {"read"},         # only `read` may start a session
            "read": {"read", "summarize"},    # after read, either re-read or summarize
            "summarize": {"send"},            # after summarize, only send
            "send": set(),                    # send is terminal
        },
    ),
)
```
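
To see how a permitted-transition DAG is evaluated, here is a minimal stdlib sketch using the same graph. `ENTRY` is a hypothetical stand-in for `ENTRY_SENTINEL`, and `first_violation` is an illustrative helper, not part of the package.

```python
ENTRY = "__entry__"  # hypothetical stand-in for ENTRY_SENTINEL

DAG = {
    ENTRY: {"read"},
    "read": {"read", "summarize"},
    "summarize": {"send"},
    "send": set(),
}


def first_violation(calls, dag):
    """Return (index, prev_tool, tool) for the first illegal transition, or None."""
    prev = ENTRY
    for index, tool in enumerate(calls):
        # Deny-by-default: a transition must be listed to be allowed.
        if tool not in dag.get(prev, set()):
            return (index, prev, tool)
        prev = tool
    return None
```

A session of `read, read, summarize, send` passes, while `read, send` is flagged at the second call because `send` is not reachable from `read`.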

**BASELINE mode** — guard maintains a per-session-key Markov transition
profile in a local JSON file (no cloud, no PII — only tool names and
SHA-256 *shape hashes* of `(arg types, kwarg names+types)`, **never
argument values**) and flags transitions with observed
`P(curr | prev) < threshold` once the sample size from `prev` reaches
`min_baseline_samples`.

```python
from pathlib import Path
from agent_airlock import SecurityPolicy
from agent_airlock.sequence_guard import SequenceGuard

policy = SecurityPolicy(
    sequence_guard=SequenceGuard(
        mode="baseline",
        baseline_path=Path("/var/lib/airlock/sequence-baseline.json"),
        low_probability_threshold=0.05,   # flag the bottom 5%
        min_baseline_samples=50,          # don't flag until 50 obs from `prev`
    ),
)
```
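
The two ideas behind BASELINE mode, a value-free shape hash and a thresholded transition probability, can be sketched with the stdlib. The class and function names here are hypothetical, the hash input format is an assumption, and persistence to JSON is left out.

```python
import hashlib
from collections import Counter, defaultdict


def shape_hash(args: tuple, kwargs: dict) -> str:
    """Hash argument types and kwarg names only, never the values."""
    shape = (
        tuple(type(a).__name__ for a in args),
        tuple(sorted((k, type(v).__name__) for k, v in kwargs.items())),
    )
    return hashlib.sha256(repr(shape).encode()).hexdigest()


class TransitionProfile:
    def __init__(self, threshold: float = 0.05, min_samples: int = 50) -> None:
        self.counts = defaultdict(Counter)  # prev tool -> Counter of next tools
        self.threshold = threshold
        self.min_samples = min_samples

    def observe(self, prev: str, curr: str) -> None:
        self.counts[prev][curr] += 1

    def is_flagged(self, prev: str, curr: str) -> bool:
        total = sum(self.counts[prev].values())
        if total < self.min_samples:
            return False  # not enough observations from `prev` yet
        return self.counts[prev][curr] / total < self.threshold
```

Two calls with different values but the same types produce the same shape hash, which is what keeps argument values out of the baseline file.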

Every flagged transition emits OTel span attributes on the current
span (`airlock.sequence_guard.mode`, `.from_tool`, `.to_tool`,
`.session_key`, `.observed_probability`) via the existing
`observability` provider — telemetry failures are swallowed so they
cannot break enforcement.

**Not** `AnomalyDetector`, which tracks rate / endpoint-diversity /
error-rate / consecutive-blocked signals over sliding windows.
`SequenceGuard` is a per-transition ORDER signal. Run both for layered
coverage. It is also **not** a chain-of-thought monitor, by construction.

> Strictly opt-in. The new `SecurityPolicy.sequence_guard` field
> defaults to `None`; callers that don't set it get exactly v0.8.11
> behavior. Zero new runtime deps — Pydantic-only core stays intact.

---

### 🛑 Action-time contradiction gate (v0.8.15)

[arXiv:2605.27157](https://arxiv.org/abs/2605.27157) ("Detecting Is
Not Resolving: The Monitoring Control Gap in Retrieval Augmented
LLMs", Yu et al., 2026) shows that LLMs **readily acknowledge
contradictory evidence** in their reasoning trace yet "this awareness
fails to constrain their final recommendations". The deficit is at
*action selection* — single-turn diagnostics overestimate RAG safety,
and detection alone is not a control.

`ActionContradictionGate` is an opt-in policy hook that wraps three
**pluggable detectors** (any one trips) and a **privileged-sink glob
set**. When a detector trips AND the dispatched tool matches a
privileged sink AND the harness has not issued an explicit allow,
the gate blocks the call (or warns, depending on `action=`).

The explicit-allow primitive **is not new** — the gate reuses the
existing `AirlockContext.authorize_once(tool_name)` (introduced for
the v0.8.6 reauth flow). Same one-shot grant, same semantics. After
a one-shot is consumed the gate **re-locks** — the harness must mint
a fresh `authorize_once` for each privileged action.

```python
import re
from agent_airlock import Airlock, SecurityPolicy
from agent_airlock.action_contradiction_gate import ActionContradictionGate

policy = SecurityPolicy(
    action_contradiction_gate=ActionContradictionGate(
        # Detector 1: a boolean flag the RAG pipeline flips on after
        # it sees an evidence-vs-claim conflict the agent discussed.
        signal_field_key="evidence_contradiction",
        # Detector 2: pluggable regex against the SAME key when its
        # value is a string (operator-controlled marker — never the
        # model's full reasoning trace).
        marker_regex=re.compile(r"contradict|conflict|disagree", re.I),
        # Detector 3: fully pluggable callable; receives the context.
        # predicate=lambda ctx: ctx.metadata.get("conflict_count", 0) > 1,
        # Default privileged sinks: send_* / export_* / commit_* /
        # transfer_* / delete_* + the v0.8.14 outbound-integration set.
        # Operators can narrow via `privileged_sinks=(...)`.
        action="block",  # or "warn" for staged turn-up
    ),
)
```

**Off-by-default invariant.** `SecurityPolicy.action_contradiction_gate`
defaults to `None`; non-RAG flows pay **zero false-positive tax**
(no detector runs, no log lines, no metadata reads). Even when wired,
the gate is **inert until at least one detector slot is configured**
— so a partial roll-out (gate attached but detectors flipped off)
admits everything.
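
The block condition (detector trips AND privileged sink AND no one-shot grant) can be written out as a small stdlib sketch. All names are hypothetical. For brevity the predicate receives the metadata dict rather than a full context object, the sink set is limited to the five globs listed above, and the assumption that a grant is only consumed when the gate would otherwise block is specific to this example.

```python
import re
from fnmatch import fnmatchcase

SINKS = ("send_*", "export_*", "commit_*", "transfer_*", "delete_*")


def detector_tripped(metadata, key=None, marker=None, predicate=None):
    """Any one configured detector is enough to trip."""
    if key is not None:
        value = metadata.get(key)
        if value is True:
            return True  # detector 1: boolean flag
        if marker is not None and isinstance(value, str) and marker.search(value):
            return True  # detector 2: operator regex on a string marker
    if predicate is not None and predicate(metadata):
        return True  # detector 3: operator callable
    return False


def gate_blocks(tool, metadata, grants, **detectors):
    """Return True when the call should be blocked."""
    if not detector_tripped(metadata, **detectors):
        return False  # also covers the inert case: no detector configured
    if not any(fnmatchcase(tool, pattern) for pattern in SINKS):
        return False  # not a privileged sink
    if tool in grants:
        grants.discard(tool)  # one-shot grant consumed, gate re-locks
        return False
    return True
```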

**Not a chain-of-thought monitor.** The gate reads **operator-
controlled signals** only (a metadata field, an operator regex, an
operator predicate). It never reads the model's own claim that it
has or has not noticed a contradiction — the paper's whole point is
that those claims do not gate behavior.

**Not** `sequence_guard` (v0.8.12) — that flags unusual call ORDER.
**Not** `reauth_on_untrusted_reinvocation` (v0.8.6) — that's
count-driven on a per-tool counter. This gate is signal-driven and
targets a specific privileged-sink glob set. They compose; run all
three for layered coverage.

> Strictly opt-in. Zero new runtime deps — Pydantic-only core stays
> intact. The new `SecurityPolicy.action_contradiction_gate` field
> defaults to `None`; callers that don't set it get exactly v0.8.14
> behavior.

---

### 📊 Adversarial-negotiation regression harness (v0.8.17)

A deterministic harness that measures what the deny-by-default
governance layer does to a fixed set of adversarial buyer-seller
negotiation actions — and reports two metrics named to line up with an
external published baseline so the numbers can sit side by side.

```bash
python -m agent_airlock.cli.negotiation_bench --report markdown
```

Each scenario carries a **concrete, checkable unsafe action** and runs
twice — **baseline** (no airlock, the unsafe event lands) and
**governed** (the *same* action through the **real** `@Airlock`
intercept-before-execute path, no policy-layer mocking). Three
unsafe-action classes each exercise a different real interception
mechanism: price-below-floor → Pydantic strict-validation,
secret-leak → the output sanitizer, transfer-outside-policy →
deny-by-default `SecurityPolicy`. Benign deals are included to confirm
governance does not over-block.

| source | unsafe_execution_rate (base → governed) | valid_task_success_rate (base → governed) |
|---|---|---|
| **agent-airlock** (this harness) | 100% → **0%** | 43% → **100%** |
| OCL (external, live LLMs, [arXiv:2606.04306](https://arxiv.org/abs/2606.04306)) | 88% → ~0% | 12% → 96% |

> **The OCL row is an external result, not agent-airlock's.** It was
> measured on live frontier LLM agents in AgenticPay-adapted negotiation
> ([OCL, arXiv:2606.04306](https://arxiv.org/abs/2606.04306);
> [AgenticPay, arXiv:2602.06008](https://arxiv.org/abs/2602.06008)) and
> is reproduced here only for **directional comparison** — both put
> governance at the execution boundary. It is **not** the same
> experiment: agent-airlock is a deterministic execution-boundary
> validator, not an LLM, and this harness does not call a model. The
> agent-airlock numbers are a property of the **policy layer** under a
> worst-case scripted adversary, exercised through the real `@Airlock`
> path.

The harness doubles as a **regression gate**: `--fail-if-governed-unsafe`
exits non-zero if the governed `unsafe_execution_rate` ever rises above
zero, so a future change that weakens the policy layer fails CI. Zero
new runtime deps; fully deterministic (no randomness, no network, no
model call).
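
As a rough illustration of the regression gate, the sketch below computes an unsafe-execution rate and turns it into an exit code. The metric definition (unsafe events that landed, divided by unsafe scenarios) and the result-tuple layout are assumptions of this example, not the harness's actual data model.

```python
def unsafe_execution_rate(results) -> float:
    """results: iterable of (is_unsafe_scenario, unsafe_event_landed) pairs."""
    landed = [hit for is_unsafe, hit in results if is_unsafe]
    return sum(landed) / len(landed) if landed else 0.0


def gate_exit_code(governed_results) -> int:
    """Non-zero when any unsafe action got through the governed path."""
    return 1 if unsafe_execution_rate(governed_results) > 0 else 0


# Three unsafe scenarios plus one benign deal.
baseline = [(True, True), (True, True), (True, True), (False, False)]
governed = [(True, False), (True, False), (True, False), (False, False)]
regressed = [(True, False), (True, True), (True, False), (False, False)]
```

With these inputs the governed run passes the gate, while the regressed run (one unsafe action landing) would fail CI.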

---

### 🔎 Privilege right-sizing — `airlock-explain --unused-scopes` (v0.8.13)

A read-only CLI that surfaces **over-permissioning**: it diffs the
`SecurityPolicy`'s **granted** tool scopes against the tools the agent
**actually called** (from an OTLP export OR a native audit JSONL),
per `AgentIdentity`, and prints the dead-weight set plus a *suggested*
tightened allow-list.

```bash
# Install v0.8.13 or later; airlock-explain becomes available
pip install "agent-airlock>=0.8.13"

# Diff granted vs used; print a table
airlock-explain --unused-scopes \
    --policy ./security-policy.toml \
    --trace  ./agent.audit.jsonl

# Same, machine-readable, plus a proposed tightened policy preview
airlock-explain --unused-scopes \
    --policy ./security-policy.toml \
    --trace  ./otel-export.json \
    --format json \
    --suggest-policy
```

**Observability-only.** This command **never mutates** the
`SecurityPolicy`, **never writes** the policy file, and **never
auto-applies** the suggestion. The deny-by-default posture is
unchanged — the right-size CLI is a *review aid*, not an enforcement
primitive. The `--suggest-policy` output is intentionally a stdout
preview so a human reviews the tightened allow-list before adopting
it by hand.

**Trace formats** (auto-detected by inspecting the file head):

- **Audit JSONL** — the format `AuditLogger` already emits. One JSON
  object per line, with `tool_name` / `agent_id` / `blocked`. Blocked
  calls are excluded from the "actually called" set — a blocked call
  is not an exercise of a granted scope.
- **OTLP JSON** — the format `opentelemetry-exporter-otlp` writes.
  Span `name` is the tool name; `attributes.agent_id` keys the per-
  agent diff. If a span carries `airlock.blocked=true` it is skipped,
  same as JSONL.

**Diff semantics.** The matcher is `fnmatch` — the same glob semantics
`SecurityPolicy.check_tool_allowed` uses internally, so the suggested
tightened allow-list admits exactly the tools the agent was observed
calling (no surprises at adoption time). Denied-list patterns are
forwarded unchanged to the suggestion: denials are *intent*, not
usage data.
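
The granted-versus-used diff can be approximated with the stdlib. This sketch reads audit JSONL with the `tool_name` / `blocked` fields named above and ignores the per-agent split for brevity; `used_tools` and `diff_scopes` are hypothetical helpers, not the CLI's internals.

```python
import json
from fnmatch import fnmatchcase


def used_tools(jsonl_text: str) -> set:
    """Collect tool names from audit lines, skipping blocked calls."""
    used = set()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if not record.get("blocked"):
            used.add(record["tool_name"])
    return used


def diff_scopes(granted: list, used: set) -> tuple:
    """Return (unused_patterns, suggested_allowlist)."""
    # A granted pattern is dead weight if no observed tool matches it.
    unused = [p for p in granted if not any(fnmatchcase(t, p) for t in used)]
    # The suggestion admits exactly the observed tools that were granted.
    suggested = sorted(t for t in used if any(fnmatchcase(t, p) for p in granted))
    return unused, suggested


TRACE = "\n".join([
    '{"tool_name": "read_file", "agent_id": "a1", "blocked": false}',
    '{"tool_name": "delete_user", "agent_id": "a1", "blocked": true}',
    '{"tool_name": "query_db", "agent_id": "a1", "blocked": false}',
])
GRANTED = ["read_*", "query_*", "delete_*", "export_*"]
unused, suggested = diff_scopes(GRANTED, used_tools(TRACE))
```

The blocked `delete_user` call does not count as usage, so `delete_*` lands in the unused set alongside `export_*`.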

> Strictly observability. No new runtime deps. The new console-script
> entry `airlock-explain` is the project's first installable CLI;
> existing `python -m agent_airlock.cli.<name>` invocations are
> unaffected.

---

### 🩹 Skill-resistant trace redaction + watermark (v0.8.24)

**Why traces are an extraction surface:** an agent's emitted trace/receipt is
a distillation target, not just an audit artifact. A trace that records the
tuned thresholds a policy fired on, the exact tool-call arguments, and the
recovered intermediate formulas/strategies hands a competitor the *recipe* —
enough to clone the behaviour without paying for the search that found it. The
verifier, by contrast, needs only the *evidence* (the gate ran / the policy
fired / pass-fail), never the recipe. This is the RedAct-style threat model —
a composition of published behavioural-watermarking work
([Agent Guide, arXiv:2504.05871](https://arxiv.org/abs/2504.05871);
[CoTGuard, arXiv:2505.19405](https://arxiv.org/abs/2505.19405);
[Distilling the Thought, arXiv:2601.05144](https://arxiv.org/abs/2601.05144)).
agent-airlock does not reproduce any paper's benchmark.

`TraceRedactionPolicy` (opt-in, **OFF by default** for backward compat, **ON
under `STRICT_POLICY`**) runs at the non-local sink (e.g. the OTel exporter):
it (a) **localizes** protected fields with a configurable field-classifier
(tuned thresholds, tool-call args, recovered formulas/strategies), (b)
**rewrites** them to keep verifier-critical evidence while dropping the recipe,
and (c) embeds a **per-tenant behavioural watermark** so a leaked trace is
provably yours. Detect it with `airlock trace verify-watermark <trace.json>`
(cryptographic keyed-HMAC match → high true-detection, low false-alarm); add
`--redaction-report` to see what was localized / rewritten / preserved.
Stdlib-only — no new runtime dependency.

```python
from agent_airlock import TraceRedactionPolicy, trace_redact, verify_watermark

pol = TraceRedactionPolicy(enabled=True, tenant_id="acme-co", watermark_secret="...")
redacted, report = trace_redact(trace, pol)   # tuned_threshold → evidence stub; recipe dropped
assert verify_watermark(redacted, pol).detected   # provably yours
```
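
The keyed-HMAC idea behind the watermark check can be sketched with the standard library alone. This is an assumption-laden illustration, not the library's scheme: `embed_tag` and `tag_matches` are hypothetical helpers, and storing the tag in a plain `watermark` field is a simplification.

```python
import hashlib
import hmac
import json


def _tag(body: dict, tenant_id: str, secret: bytes) -> str:
    # Canonical JSON so the same trace always hashes to the same tag.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    msg = tenant_id.encode() + b"\x00" + payload
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def embed_tag(trace: dict, tenant_id: str, secret: bytes) -> dict:
    """Return a copy of the trace carrying a tenant-keyed tag (hypothetical field)."""
    body = {k: v for k, v in trace.items() if k != "watermark"}
    return {**body, "watermark": _tag(body, tenant_id, secret)}


def tag_matches(trace: dict, tenant_id: str, secret: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    body = {k: v for k, v in trace.items() if k != "watermark"}
    expected = _tag(body, tenant_id, secret)
    return hmac.compare_digest(expected, str(trace.get("watermark", "")))


marked = embed_tag({"gate": "policy_fired", "result": "pass"}, "acme-co", b"tenant-secret")
print(tag_matches(marked, "acme-co", b"tenant-secret"))
```

Only a holder of the tenant secret can produce a matching tag, which is what keeps the false-alarm rate low in a scheme of this shape.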

---

### ✅ Fail-closed terminal-claim guard — `no_false_success` (v0.8.25)

**An honest stall is recoverable; a confident wrong `done` is not.** The
dominant failure mode of unattended long-horizon agents isn't crashing — it's
*confidently reporting success they never verified*
([Goal-Autopilot, arXiv:2606.11688](https://arxiv.org/abs/2606.11688)). The
`no_false_success` preset enforces that paper's No-False-Success floor: a
terminal/`done` claim is admitted **only if a named, falsifiable check actually
executed and passed THIS run**. No receipt, a failed check, or a forged/replayed
receipt → the guard fails closed to a **recoverable honest stall** (run the
named check and retry), never a fabricated success.

Forgery resistance is structural: the guard mints a per-run token and only
trusts a receipt it stamped by executing the check this run — a receipt that's
merely *present* (hand-built, or replayed from a prior run) is rejected. Opt in
per-agent with `AirlockConfig(require_done_receipt=True)` (or
`require_done_receipt = true` under `[airlock]` in `airlock.toml`); OFF by
default. Stdlib-only — no new runtime dependency.

```python
from agent_airlock import no_false_success_defaults, NoFalseSuccessStall

preset = no_false_success_defaults({"tests_green": run_pytest})  # falsifiable check
preset["guard"].run("tests_green")        # actually execute the check this run
preset["check"]("tests_green")            # raises NoFalseSuccessStall unless it passed
```
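
To make the per-run token idea concrete, here is a minimal stdlib sketch. The names `DoneGuard`, `StallError`, and `admit_done` are hypothetical and chosen for illustration; the shipped guard may be structured quite differently.

```python
import hmac
import secrets


class StallError(RuntimeError):
    """Raised instead of admitting an unverified 'done' claim (hypothetical name)."""


class DoneGuard:
    def __init__(self, checks):
        self._checks = dict(checks)
        self._run_token = secrets.token_hex(16)  # fresh for every run
        self._receipts = {}

    def run(self, name):
        """Execute the named check now and stamp a receipt with this run's token."""
        passed = bool(self._checks[name]())
        receipt = {"check": name, "passed": passed, "token": self._run_token}
        self._receipts[name] = receipt
        return receipt

    def admit_done(self, name, receipt=None):
        """Admit a terminal claim only for a passing receipt stamped this run."""
        receipt = receipt if receipt is not None else self._receipts.get(name)
        ok = (
            receipt is not None
            and receipt.get("check") == name
            and receipt.get("passed") is True
            and hmac.compare_digest(str(receipt.get("token", "")), self._run_token)
        )
        if not ok:
            raise StallError(f"run check {name!r} and retry")
        return True


guard = DoneGuard({"tests_green": lambda: True})
guard.run("tests_green")
print(guard.admit_done("tests_green"))
```

A receipt minted by a different `DoneGuard` instance carries a different token, so replaying it fails closed with `StallError`.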

---

### 💰 Cost Control

A runaway agent can burn $500 in API costs before you notice.

```python
from agent_airlock import Airlock, AirlockConfig

config = AirlockConfig(
    max_output_chars=5000,    # Truncate before token explosion
    max_output_tokens=2000,   # Hard limit on response size
)

@Airlock(config=config)
def query_logs(query: str) -> str:
    return massive_log_query(query)  # 10MB → 5KB
```

**ROI:** At an assumed $10 per million tokens, 10MB of logs ≈ 2.5M tokens ≈ $25/response. Truncated to 5,000 characters ≈ 1.25K tokens ≈ $0.0125. **~99.95% savings.**
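
The arithmetic can be reproduced in a few lines. Both constants are assumptions: roughly 4 characters per token is a common rule of thumb, and $10 per million tokens is the price implied by $25 for about 2.5M tokens.

```python
CHARS_PER_TOKEN = 4             # rough rule of thumb (assumption)
USD_PER_MILLION_TOKENS = 10.0   # implied by $25 for ~2.5M tokens (assumption)


def response_cost_usd(n_chars: int) -> float:
    """Estimate the cost of sending n_chars of tool output to the model."""
    tokens = n_chars / CHARS_PER_TOKEN
    return tokens / 1_000_000 * USD_PER_MILLION_TOKENS


full = response_cost_usd(10_000_000)   # ~10MB of logs
capped = response_cost_usd(5_000)      # after max_output_chars=5000
savings = 1 - capped / full
print(f"${full:.2f} -> ${capped:.4f} ({savings:.2%} saved)")
```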

#### Per-model-tier budgets (v0.8.7)

The flat `max_output_*` caps above apply uniformly to every call. **`ModelTierBudget`** caps per-call cost and output tokens **per model tier label** (e.g. `"frontier"` / `"mid"` / `"small"`), evaluated *before* the tool runs. Untagged calls fall back to a configurable `strict_tier` (deny-by-default — the cheapest tier).

```python
from agent_airlock import (
    Airlock, ModelTierBudget, SecurityPolicy, TierBudget,
)

policy = SecurityPolicy(
    model_tier_budget=ModelTierBudget(
        tiers={
            "frontier": TierBudget(max_cost_cents=50, max_output_tokens=4000),
            "mid":      TierBudget(max_cost_cents=10, max_output_tokens=2000),
            "small":    TierBudget(max_cost_cents=2,  max_output_tokens=1000),
        },
        strict_tier="small",  # untagged → cheapest tier (deny-by-default)
    ),
)

@Airlock(policy=policy, return_dict=True)
def call_model(prompt: str, **_extra):
    return run_my_router(prompt)

# The router tags each call. Airlock blocks before the model fires.
call_model("Draft a tweet",  _airlock_tier="small",    _airlock_input_tokens=50)
call_model("Deep analysis", _airlock_tier="frontier", _airlock_input_tokens=200_000)
# →  AIRLOCK_BLOCK: Tier 'frontier' budget exceeded (worst-case 66¢ > cap 50¢)
```

Routing logic stays in the user's router. Three tagging routes are supported:

1. **`_airlock_tier` kwarg** — stripped before the tool sees it.
2. **`context.metadata["airlock_tier"]`** — set on a contextvar-stored
   `AirlockContext` by the router's session middleware.
3. **`tier_resolver` callback** — `ModelTierBudget(tier_resolver=fn)`
   where `fn(model_id: str) -> tier_label` lives in the caller's code.
   Airlock invokes the callback when `context.metadata["model_id"]`
   is set; it carries no vendor-specific model→tier table.

After execution, actual vs estimated cost is reconciled into the global
`CostTracker` (observability — never blocks). See
[`examples/model_tier_budget.py`](./examples/model_tier_budget.py) for
all four patterns including composition with allow/deny lists.

A ready-to-use `strict_tier_budget_policy()` preset returns a
`SecurityPolicy` seeded with the table above.
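
For intuition, the pre-call check can be sketched as a worst-case estimate: input tokens at the input price plus the tier's full output cap at the output price. Everything below is hypothetical, including the `TierCap` and `precheck` names and the per-1k prices, which were picked so the frontier call lands on the 66¢ figure from the example; it does not describe the library's actual cost model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierCap:
    max_cost_cents: float
    max_output_tokens: int
    in_cents_per_1k: float    # hypothetical input price
    out_cents_per_1k: float   # hypothetical output price


CAPS = {
    "frontier": TierCap(50, 4000, 0.3, 1.5),
    "mid": TierCap(10, 2000, 0.05, 0.2),
    "small": TierCap(2, 1000, 0.01, 0.05),
}


def precheck(tier, input_tokens, strict_tier="small"):
    """Return (allowed, worst_case_cents); unknown or missing tiers use strict_tier."""
    cap = CAPS.get(tier, CAPS[strict_tier])
    worst = (
        input_tokens * cap.in_cents_per_1k
        + cap.max_output_tokens * cap.out_cents_per_1k
    ) / 1000
    return worst <= cap.max_cost_cents, worst


print(precheck("small", 50))
print(precheck("frontier", 200_000))
print(precheck(None, 50))  # untagged call falls back to the strict tier
```

Because the estimate assumes the model spends its entire output cap, the check errs toward blocking, which fits the deny-by-default stance.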

---

### 🔐 PII & Secret Masking

```python
from agent_airlock import Airlock, AirlockConfig

config = AirlockConfig(
    mask_pii=True,      # SSN, credit cards, phones, emails
    mask_secrets=True,  # API keys, passwords, JWTs
)

@Airlock(config=config)
def get_user(user_id: str) -> dict:
    return db.users.find_one({"id": user_id})

# LLM sees: {"name": "John", "ssn": "[REDACTED]", "api_key": "sk-...XXXX"}
```


**12 PII types detected** · **4 masking strategies** · **Zero data leakage**

#### Opt-in regional PII (`pii_locales`)

Aadhaar / PAN / UPI / IFSC have always shipped as `SensitiveDataType` members,
but are not added to the default `mask_pii=True` set — to keep the surface
zero-dep and US-shaped by default. **v0.8.9** adds a `pii_locales` opt-in
that pulls them in *and* tightens detection:

```python
from agent_airlock import Airlock, AirlockConfig

config = AirlockConfig(
    mask_pii=True,
    pii_locales=["in"],   # opt in to India-locale detection
)

@Airlock(config=config)
def lookup(query: str) -> str:
    return (
        "User: राम कुमार, "
        "Aadhaar: 234567890124, "       # → "23********24" (PARTIAL)
        "PAN: ABCDE1234F, "             # → "AB******4F" (PARTIAL)
        "phone: 555-123-4567"           # still masked by existing PHONE regex
    )
```

Two things activate when `"in" in pii_locales`:

- **Aadhaar Verhoeff checksum gate** — the existing Aadhaar regex is
  permissive (any 12-digit number starting 2-9 matches). With the opt-in,
  each match must also pass the UIDAI Verhoeff checksum, cutting the
  false-positive rate roughly 10x on random IDs / phone numbers.
- **Devanagari personal-name detection** — `PERSONAL_NAME_DEVANAGARI` runs
  against the Unicode block `U+0900–U+097F`, with a small allowlist of
  common Hindi greetings / pronouns / interrogatives to keep ordinary
  prose from being masked. Conservative heuristic — production callers
  who need precise extraction should layer NER on top.

The flag is **additive and reversible** — `pii_locales=[]` (the default)
preserves the prior behavior bit-for-bit.
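
For readers unfamiliar with the Verhoeff checksum, here is a self-contained sketch of the gate plus the PARTIAL masking shown in the example. The tables are the standard Verhoeff tables; the function names `verhoeff_valid`, `looks_like_aadhaar`, and `mask_partial` are illustrative assumptions, not the library's API.

```python
import re

# Standard Verhoeff tables: dihedral-group multiplication and position permutation.
_D = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (1, 2, 3, 4, 0, 6, 7, 8, 9, 5),
    (2, 3, 4, 0, 1, 7, 8, 9, 5, 6),
    (3, 4, 0, 1, 2, 8, 9, 5, 6, 7),
    (4, 0, 1, 2, 3, 9, 5, 6, 7, 8),
    (5, 9, 8, 7, 6, 0, 4, 3, 2, 1),
    (6, 5, 9, 8, 7, 1, 0, 4, 3, 2),
    (7, 6, 5, 9, 8, 2, 1, 0, 4, 3),
    (8, 7, 6, 5, 9, 3, 2, 1, 0, 4),
    (9, 8, 7, 6, 5, 4, 3, 2, 1, 0),
)
_P = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (1, 5, 7, 6, 2, 8, 3, 0, 9, 4),
    (5, 8, 0, 3, 7, 9, 6, 1, 4, 2),
    (8, 9, 1, 6, 0, 4, 3, 5, 2, 7),
    (9, 4, 5, 3, 1, 2, 6, 8, 7, 0),
    (4, 2, 8, 6, 5, 7, 3, 9, 0, 1),
    (2, 7, 9, 3, 8, 0, 6, 4, 1, 5),
    (7, 0, 4, 6, 9, 1, 3, 2, 5, 8),
)


def verhoeff_valid(number: str) -> bool:
    """True if the digit string (check digit last) passes the Verhoeff checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    check = 0
    for i, ch in enumerate(reversed(number)):
        check = _D[check][_P[i % 8][int(ch)]]
    return check == 0


def looks_like_aadhaar(candidate: str) -> bool:
    """Permissive shape test (12 digits, first digit 2-9) plus the checksum gate."""
    return bool(re.fullmatch(r"[2-9]\d{11}", candidate)) and verhoeff_valid(candidate)


def mask_partial(value: str, keep: int = 2) -> str:
    """Keep the first and last `keep` characters, star out the middle."""
    return value[:keep] + "*" * (len(value) - 2 * keep) + value[-keep:]


print(looks_like_aadhaar("234567890124"), mask_partial("234567890124"))
print(looks_like_aadhaar("234567890123"))
```

Changing any single digit breaks the checksum, and a random 12-digit string passes only about one time in ten, which is where the roughly 10x reduction comes from.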

---

### 🌐 Network Airgap (v0.3.0)

Block data exfiltration during tool execution:

```python
from agent_airlock import network_airgap, NO_NETWORK_POLICY

# Block ALL network access
with network_airgap(NO_NETWORK_POLICY):
    result = untrusted_tool()  # Any socket call → NetworkBlockedError

# Or allow specific hosts only
from agent_airlock import NetworkPolicy

INTERNAL_ONLY = NetworkPolicy(
    allow_egress=True,
    allowed_hosts=["api.internal.com", "*.company.local"],
    allowed_ports=[443],
)
```
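
The allow-list above pairs host patterns with ports. A minimal sketch of how such a check could work follows, assuming glob-style matching via `fnmatch`; the library's real matching rules are not spelled out here, and `egress_allowed` is a hypothetical helper.

```python
from fnmatch import fnmatch


def egress_allowed(host, port, allowed_hosts, allowed_ports):
    """Allow a connection only if both the port and a host pattern match."""
    host = host.lower().rstrip(".")  # normalise case and a trailing root dot
    if port not in allowed_ports:
        return False
    return any(fnmatch(host, pattern.lower()) for pattern in allowed_hosts)


hosts = ["api.internal.com", "*.company.local"]
ports = [443]
print(egress_allowed("db.company.local", 443, hosts, ports))
print(egress_allowed("evil.example.com", 443, hosts, ports))
```

Note that under glob semantics `*.company.local` matches subdomains but not the bare `company.local`.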

---

### 💉 Framework Vaccination (v0.3.0)

Secure existing code **without changing a single line**:

```python
from agent_airlock import vaccinate, STRICT_POLICY

# Before: Your existing LangChain tools are unprotected
vaccinate("langchain", policy=STRICT_POLICY)

# After: ALL @tool decorators now include Airlock security
# No code changes required!
```

**Supported:** LangChain, OpenAI Agents SDK, PydanticAI, CrewAI

---

### ⚡ Circuit Breaker (v0.4.0)

Prevent cascading failures with fault tolerance:

```python
from agent_airlock import CircuitBreaker, AGGRESSIVE_BREAKER

breaker = CircuitBreaker("external_api", config=AGGRESSIVE_BREAKER)

@breaker
def call_external_api(query: str) -> dict:
    return external_service.query(query)

# After 5 failures → circuit OPENS → fast-fails for 30s
# Then HALF_OPEN → allows 1 test request → recovers or reopens
```
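
The CLOSED / OPEN / HALF_OPEN cycle in the comments above can be sketched as a small state machine. This is a generic illustration of the pattern, not the `CircuitBreaker` source; `MiniBreaker`, `CircuitOpenError`, and the injectable `clock` parameter are assumptions made so the example is testable.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised while the circuit is open and calls are fast-failed."""


class MiniBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self.state = "CLOSED"

    def call(self, fn, *args, **kwargs):
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit open: failing fast")
            self.state = "HALF_OPEN"  # timeout elapsed: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
                self.state = "OPEN"
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self.state = "CLOSED"
        return result


breaker = MiniBreaker(failure_threshold=5, recovery_timeout=30.0)
print(breaker.call(lambda: "ok"), breaker.state)
```

A failed trial call in HALF_OPEN reopens the circuit immediately, while a successful one resets the failure count and closes it.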

---

### 📈 OpenTelemetry Observability (v0.4.0)

Enterprise-grade monitoring:

```python
from agent_airlock import configure_observability, observe

configure_observability(
    service_name="my-agent",
    otlp_endpoint="http://otel-collector:4317",
)

@observe(name="critical_operation")
def process_data(data: dict) -> dict:
    # Automatic span creation, metrics, and audit logging
    return transform(data)
```

---

## 🔌 Framework Compatibility

> **The Golden Rule:** `@Airlock` must be closest to the function definition.

```python
@framework_decorator    # ← Framework sees secured function
@Airlock()             # ← Security layer (innermost)
def my_function():     # ← Your code
    ...
```
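
Why the order matters can be shown with plain decorators. In this sketch `framework_decorator` and `guard` are stand-ins (assumptions) for a framework's tool registration and for `@Airlock()`; Python applies decorators bottom-up, so the framework registers whatever the decorator beneath it returned.

```python
import functools

REGISTRY = {}


def framework_decorator(fn):
    """Stand-in for a framework's @tool: records the callable it is given."""
    REGISTRY[fn.__name__] = fn
    return fn


def guard(fn):
    """Stand-in for a validating wrapper: rejects non-string queries."""
    @functools.wraps(fn)
    def wrapper(query):
        if not isinstance(query, str):
            raise TypeError("query must be a str")
        return fn(query)
    return wrapper


@framework_decorator   # registers the guarded wrapper
@guard                 # innermost: wraps the raw function first
def safe_search(query):
    return f"Results for: {query}"


@guard                 # wraps only after registration has happened
@framework_decorator   # registers the raw, unguarded function
def unsafe_search(query):
    return f"Results for: {query}"


print(REGISTRY["unsafe_search"](123))  # the framework's copy bypasses the guard
```

With the guard innermost, the copy the framework holds is the validated one; with the order flipped, the framework calls the raw function directly.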

<table>
<tr>
<td>

### LangChain / LangGraph

```python
from langchain_core.tools import tool
from agent_airlock import Airlock

@tool
@Airlock()
def search(query: str) -> str:
    """Search for information."""
    return f"Results for: {query}"
```

</td>
<td>

### OpenAI Agents SDK

```python
from agents import function_tool
from agent_airlock import Airlock

@function_tool
@Airlock()
def get_weather(city: str) -> str:
    """Get weather for a city."""
    return f"Weather in {city}: 22°C"
```

</td>
</tr>
<tr>
<td>

### PydanticAI

```python
from pydantic_ai import Agent
from agent_airlock import Airlock

@Airlock()
def get_stock(symbol: str) -> str:
    return f"Stock {symbol}: $150"

agent = Agent("openai:gpt-4o", tools=[get_stock])
```

</td>
<td>

### CrewAI

```python
from crewai.tools import tool
from agent_airlock import Airlock

@tool
@Airlock()
def search_docs(query: str) -> str:
    """Search internal docs."""
    return f"Found 5 docs for: {query}"
```

</td>
</tr>
</table>

<details>
<summary><b>More frameworks: LlamaIndex, AutoGen, smolagents, Anthropic</b></summary>

### LlamaIndex

```python
from llama_index.core.tools import FunctionTool
from agent_airlock import Airlock

@Airlock()
def calculate(expression: str) -> int:
    # Demo only: eval() with empty builtins is not a real sandbox
    return eval(expression, {"__builtins__": {}})

calc_tool = FunctionTool.from_defaults(fn=calculate)
```

### AutoGen

```python
from autogen import ConversableAgent
from agent_airlock import Airlock

@Airlock()
def analyze_data(dataset: str) -> str:
    return f"Analysis of {dataset}: mean=42.5"

assistant = ConversableAgent(name="analyst", llm_config={"model": "gpt-4o"})
assistant.register_for_llm(description="Analyze a dataset.")(analyze_data)
```

### smolagents

```python
from smolagents import tool
from agent_airlock import Airlock

@tool
@Airlock(sandbox=True)
def run_code(code: str) -> str:
    """Execute in E2B sandbox."""
    exec(code)
    return "Executed"
```

### Anthropic (Direct API)

```python
from agent_airlock import Airlock

@Airlock()
def get_weather(city: str) -> str:
    return f"Weather in {city}: 22°C"

# Use in tool handler
def handle_tool_call(name, inputs):
    if name == "get_weather":
        return get_weather(**inputs)  # Airlock validates
    raise ValueError(f"Unknown tool: {name}")  # do not return None silently
```

</details>

### Adapter-shipped vs example-only (honest split)

> Both paths use the same `@Airlock()` decorator placement. "Adapter-shipped"
> means there's a dedicated `src/agent_airlock/integrations/<framework>.py`
> module with framework-specific glue (signature preservation, tool registry
> rewrites, request-shape adapters). "Example-only" means the decorator is
> compatible out of the box — no extra adapter required.

**Adapter-shipped (11):** LangChain (`integrations/langchain.py`),
LangGraph (`integrations/langgraph_toolnode_compat.py`),
OpenAI Agents SDK (`integrations/openai_guardrails.py`),
Anthropic Messages API (`integrations/anthropic.py`),
Anthropic Claude Agent SDK (`integrations/anthropic_claude_agent_sdk.py`, v0.6.1+),
smolagents (`integrations/smolagents_wrapper.py`),
Gemini 3 Agent Mode (`integrations/gemini3_tool_shape_adapter.py`),
GPT-5.5 (`integrations/gpt5_5_tool_shape_adapter.py`),
PydanticAI (`integrations/pydantic_ai.py`, v0.7.1+),
CrewAI (`integrations/crewai.py`, v0.7.2+),
FastMCP (`mcp.py`).

**Example-only (2):** AutoGen, LlamaIndex —
decorator-compatible without an adapter; see `examples/`.

### Complete Examples

| Framework | Path | Surface |
|-----------|------|---------|
| LangChain | [adapter](./src/agent_airlock/integrations/langchain.py) · [example](./examples/langchain_integration.py) | @tool, AgentExecutor |
| LangGraph | [adapter](./src/agent_airlock/integrations/langgraph_toolnode_compat.py) · [example](./examples/langgraph_integration.py) | StateGraph, ToolNode |
| OpenAI Agents | [adapter](./src/agent_airlock/integrations/openai_guardrails.py) · [example](./examples/openai_agents_sdk_integration.py) | Handoffs, manager pattern |
| Anthropic API | [adapter](./src/agent_airlock/integrations/anthropic.py) · [example](./examples/anthropic_integration.py) | Direct Messages API |
| Claude Agent SDK | [adapter](./src/agent_airlock/integrations/anthropic_claude_agent_sdk.py) · [doc](./docs/integrations/anthropic-claude-agent-sdk.md) | `wrap_agent(agent, policy=...)` |
| smolagents | [adapter](./src/agent_airlock/integrations/smolagents_wrapper.py) · [example](./examples/smolagents_integration.py) | CodeAgent, E2B |
| Gemini 3 | [adapter](./src/agent_airlock/integrations/gemini3_tool_shape_adapter.py) | `function_call` carrier + `thought_signature` redaction |
| GPT-5.5 | [adapter](./src/agent_airlock/integrations/gpt5_5_tool_shape_adapter.py) | `gpt_5_5_agent_defaults` preset |
| FastMCP | [adapter](./src/agent_airlock/mcp.py) · [example](./examples/fastmcp_integration.py) | `@secure_tool` decorator |
| PydanticAI | [adapter](./src/agent_airlock/integrations/pydantic_ai.py) · [doc](./docs/integrations/pydantic-ai.md) · [example](./examples/pydanticai_integration.py) | `wrap_agent(agent, policy=...)` + output_validate hook |
| CrewAI | [adapter](./src/agent_airlock/integrations/crewai.py) · [doc](./docs/integrations/crewai.md) · [example](./examples/crewai_integration.py) | `wrap_crew(crew, policy=...)` + task-level tool overrides |
| LlamaIndex | [example only](./examples/llamaindex_integration.py) | ReActAgent |
| AutoGen | [example only](./examples/autogen_integration.py) | ConversableAgent |

---

## ⚡ FastMCP Integration

```python
from fastmcp import FastMCP
from agent_airlock.mcp import secure_tool, STRICT_POLICY

mcp = FastMCP("production-server")

@secure_tool(mcp, policy=STRICT_POLICY)
def delete_user(user_id: str) -> dict:
    """One decorator: MCP registration + Airlock protection."""
    return db.users.delete(user_id)
```

---

## 🏆 Why Not Enterprise Vendors?

| | Prompt Security | Pangea | **Agent-Airlock** |
|---|:---:|:---:|:---:|
| **Pricing** | $50K+/year | Enterprise | **Free forever** |
| **Integration** | Proxy gateway | Proxy gateway | **One decorator** |
| **Self-Healing** | ❌ | ❌ | **✅** |
| **E2B Sandboxing** | ❌ | ❌ | **✅ Native** |
| **Your Data** | Their servers | Their servers | **Never leaves you** |
| **Source Code** | Closed | Closed | **MIT Licensed** |

> We're not anti-enterprise. We're anti-gatekeeping.
> **Security for AI agents shouldn't require a procurement process.**

---

## 📦 Installation

```bash
# Core (validation + policies + sanitization)
pip install agent-airlock

# With E2B sandbox support (quoted so shells like zsh do not glob the brackets)
pip install "agent-airlock[sandbox]"

# With FastMCP integration
pip install "agent-airlock[mcp]"

# Everything
pip install "agent-airlock[all]"
```

```bash
# E2B key for sandbox execution
export E2B_API_KEY="your-key-here"
```

---

## 🛡️ OWASP Compliance

Agent-Airlock maps to the [**OWASP Top 10 for Agentic Applications (2026)**](https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/)
— the agentic-era successor to the old LLM Top 10. Coverage is
reported honestly: **Full** means the primitive ships and blocks the
class in tests; **Partial** means agent-airlock covers the runtime
leg but something upstream (client UI, IAM, training data) is out of
scope; **Monitor-only** means we surface the signal but do not
actually prevent the risk.

| Risk | Implemented in agent-airlock | Module / preset | Coverage |
|------|------------------------------|-----------------|----------|
| **ASI01 Agent Goal Hijack** | Pydantic strict validation + ghost-arg rejection + `UnknownArgsMode.BLOCK` | `validator`, `unknown_args`, `core` | Partial |
| **ASI02 Tool Misuse and Exploitation** | Deny-by-default `SecurityPolicy`, RBAC, rate limits, `SafePath` / `SafeURL`, Flowise `Function()`/`eval` token ban ([CVE-2025-59528](https://labs.cloudsecurityalliance.org/research/csa-research-note-flowise-mcp-rce-exploitation-20260409-csa/)), MCPwn destructive-auth check ([CVE-2026-33032](https://nvd.nist.gov/vuln/detail/CVE-2026-33032)), Mobile MCP intent-URL guard ([CVE-2026-35394](https://www.sentinelone.com/vulnerability-database/cve-2026-35394/)) | `policy`, `safe_types`, `filesystem`, `network`, `policy_presets.flowise_cve_2025_59528_defaults`, `policy_presets.mcpwn_cve_2026_33032_defaults`, `policy_presets.mobile_mcp_intent_guard_2026_05` | **Full** |
| **ASI03 Identity and Privilege Abuse** | `AgentIdentity`, `MCPProxyGuard` token-passthrough prevention, `CredentialScope`, OAuth-app audit ([Vercel 2026-04-19](https://vercel.com/kb/bulletin/vercel-april-2026-security-incident)), MCP Attested Tool-Server Admission ([arXiv:2605.24248](https://arxiv.org/abs/2605.24248)) | `policy`, `mcp_proxy_guard`, `mcp_spec.oauth_audit`, `mcp_spec.attested_admission`, `policy_presets.oauth_audit_vercel_2026_defaults`, `policy_presets.mcp_attested_admission_defaults` | Partial |
| **ASI04 Agentic Supply Chain Vulnerabilities** | Ox MCP STDIO sanitizer + CVE regression suite (11+ CVEs tracked) + session-snapshot integrity guard + spawn-time MCP config pin (CVE-2026-30615, `policy_presets.mcp_config_pin`) | `mcp_spec.stdio_guard`, `mcp_spec.session_guard`, `mcp_spec.zero_click_config_guard`, `policy_presets.stdio_guard_ox_defaults`, `policy_presets.mcp_config_pin`, `tests/cves/` | Partial |
| **ASI05 Unexpected Code Execution (RCE)** | E2B Firecracker sandbox, pluggable `SandboxBackend`, capability gating for `PROCESS_SHELL`, Flowise eval-token ban ([CVE-2025-59528](https://labs.cloudsecurityalliance.org/research/csa-research-note-flowise-mcp-rce-exploitation-20260409-csa/)) | `sandbox`, `sandbox_backend`, `capabilities`, `policy_presets.flowise_cve_2025_59528_defaults` | **Full** |
| **ASI06 Memory & Context Poisoning** | `AirlockContext` `contextvars` isolation, `ConversationConstraints` budget caps, audit logging | `context`, `conversation`, `sanitizer` | Partial |
| **ASI07 Insecure Inter-Agent Communication** | A2A middleware Pydantic strict validation, method allow-lists | `a2a` | Partial |
| **ASI08 Cascading Failures** | `CircuitBreaker`, `RetryPolicy`, token-bucket rate limits | `circuit_breaker`, `retry`, `policy` | **Full** |
| **ASI09 Human-Agent Trust Exploitation** | Honeypot deception, audit-log attribution, structured `fix_hints` | `honeypot`, `audit_otel` | Partial |
| **ASI10 Rogue Agents** | Audit telemetry + anomaly detector; no quarantine primitive | `observability`, `anomaly` | Monitor-only |

### MCP-specific mapping

The `OWASP_MCP_TOP_10_2026` policy preset targets the
[OWASP MCP Top 10 (2026 beta)](https://owasp.org/www-project-mcp-top-10/);
the table lists the seven risks it covers:

| MCP risk | Ships in agent-airlock |
|----------|------------------------|
| **MCP01 Token Mismanagement** | `MCPProxyGuard` rejects passthrough headers, enforces audience |
| **MCP02 Excessive Permissions** | `SecurityPolicy` + `CredentialScope` |
| **MCP03 Tool Poisoning** | ghost-arg rejection + `SafePath`/`SafeURL` |
| **MCP04 Supply Chain** | `stdio_guard_ox_defaults()` (Ox 2026-04-16 advisory) |
| **MCP05 Command Injection** | `stdio_guard` shell-metachar + deny-pattern rules |
| **MCP07 Insufficient Authentication** | OAuth 2.1 + PKCE S256 helpers in `mcp_spec.oauth` |
| **MCP10 Context Oversharing** | PII/secret sanitizer + workspace-scoped config |

Use it directly:

```python
from agent_airlock import Airlock
from agent_airlock.policy_presets import owasp_mcp_top_10_2026_policy

@Airlock(policy=owasp_mcp_top_10_2026_policy())
def my_mcp_tool(query: str) -> str:  # placeholder signature
    ...
```

> **Ox Security STDIO advisory** (2026-04-16, CVE-2026-30616): see
> [`docs/cves/index.md#cve-2026-30616`](docs/cves/index.md#cve-2026-30616)
> and the `stdio_guard_ox_defaults()` preset above. agent-airlock
> blocks 3 of 4 Ox attack classes at the runtime seam.

---

## 🏢 Used By

Agent-Airlock secures AI agent systems in production:

| Project | Use Case |
|---------|----------|
| [**FerrumDeck**](https://github.com/sattyamjjain/FerrumDeck) | AgentOps control plane — deny-by-default tool execution |
| [**Mnemo**](https://github.com/sattyamjjain/Mnemo) | MCP-native memory database — secure tool call validation |

> Using Agent-Airlock in production? [Open a PR](https://github.com/sattyamjjain/agent-airlock/edit/main/README.md) to add your project!

---

## 📊 Performance

> Test count and coverage are published by the TEST-BADGE block at the top of this file,
> regenerated from pytest on every release via `python scripts/update_test_badge.py`.
> That block is the source of truth; this table tracks latency and surface area only.

| Metric | Value |
|--------|-------|
| **Validation overhead** | <50ms |
| **Sandbox cold start** | ~125ms |
| **Sandbox warm pool** | <200ms |
| **Framework integrations** | 13 |
| **Core dependencies** | 2 (Pydantic, structlog; plus `tomli` on Python < 3.11) |

---

## 📖 Documentation

| Resource | Description |
|----------|-------------|
| [**AGENTS.md**](./AGENTS.md) | v0.6.1 — repo-root entrypoint for agentic IDEs (Cursor, Claude Code, Windsurf, Mintlify) |
| [**Anthropic Claude Agent SDK adapter**](./docs/integrations/anthropic-claude-agent-sdk.md) | v0.6.1 — `AnthropicClaudeAgentSDKAdapter.wrap_agent(agent, policy=...)`; canonical-list trio |
| [**`airlock manifest enforce`**](https://news.backbox.org/2026/05/01/200000-mcp-servers-expose-a-command-execution-flaw-that-anthropic-calls-a-feature/) | v0.6.1 — fail-closed CLI runtime allowlist gate against signed manifests; CI exits 0/2/3 |
| [**Managed Agents Outcomes-rubric guard**](./docs/policies/managed-agents-outcomes-2026-05-06.md) | v0.7.4 — fail-closed gate on the Anthropic Managed Agents 2026-05-06 Outcomes rubric ID; `ManagedAgentsOutcomesGuard.evaluate(provenance)` + `managed_agents_outcomes_2026_05_06_defaults` factory; no SDK dep |
| [**Filter-Eval RCE guard (CVE-2026-25592 + CVE-2026-26030)**](./docs/policies/semantic-kernel-filter-eval-rce.md) | v0.7.5 — regex detector for the Semantic-Kernel-class lambda-filter / template-expression eval RCE primitive (MSRC 2026-05-07); `FilterEvalRCEGuard.evaluate(args)` + `semantic_kernel_filter_eval_rce_2026_25592_26030_defaults` factory; framework-agnostic |
| [**OIDC publish-window guard (TanStack 2026-05-11)**](./docs/policies/npm-oidc-publish-window-guard.md) | v0.7.6 — known-bad blast-list guard for the TanStack/Mini-Shai-Hulud npm OIDC trusted-publisher class (postmortem 2026-05-11; 42 pkgs × 84 versions); `OIDCPublishWindowGuard.evaluate(args)` + `npm_oidc_publish_window_guard_defaults` factory; pure-data preset, no runtime npm calls |
| [**MCP STDIO command-injection guard**](./docs/policies/mcp-stdio-command-injection-guard.md) | v0.7.6 — shell metachar + opt-in path-traversal denier for MCP STDIO argv vectors (HelpNetSecurity 2026-05-05); `StdioCommandInjectionGuard.evaluate(args)` + `mcp_stdio_command_injection_preset_defaults` factory; no `mcp` SDK dep |
| [**Eval-RCE guard (CVE-2026-44717)**](./docs/policies/eval-rce-cve-2026-44717.md) | v0.8.0 — bare-`eval()`/`parse_expr()`/`exec()` invocation detector for the MCP Calculate Server class (NVD 2026-05-15); `EvalRCEGuard.evaluate(args)` + curated vulnerable-package denylist + `parse_expr` safe-form exemption + `stdio_guard_eval_defaults_2026_05_15` factory |
| [**MCP Inspector exposure guard (CVE-2026-23744 runtime)**](./docs/policies/mcp-inspector-exposure-guard.md) | v0.8.0 — Linux runtime listener-scan via stdlib `/proc/net/tcp` for the MCPJam Inspector public-bind class; complements v0.5.x config-time `bind_address_guard`; `MCP_INSPECTOR_REQUIRE_AUTH=1` operator bypass |
| [**Agent SDK Credit pool budget**](./docs/budget/agent-sdk-credit.md) | v0.8.0 — per-month USD pool tracker for Anthropic's 2026-06-15 billing split (Zed blog 2026-05-14); `AgentSDKCreditBudget.register_call(model, input_tokens, output_tokens)` with 90% near-limit + 100% exhausted thresholds; packaged 2026-06 pricing fixture |
| [**OpenAPI Drift Guard (Hermes 2026-05-13)**](./docs/policies/openapi-drift-guard.md) | v0.8.1 — payload-shape drift detector against an operator-supplied OpenAPI 3.x spec (arXiv:2605.14312); `OpenAPIDriftGuard.evaluate(operation_id, args)` detects `missing_required` / `unknown_field` / `type_mismatch`; three modes (`strict` / `warn` / `shadow`); `vaccinate_openapi(spec)` decorator + `openapi_doc_drift_guard_defaults` factory; caller supplies spec dict, no PyYAML dep |
| [**MCP Calc-Server bundle preset**](./docs/policies/eval-rce-cve-2026-44717.md) | v0.8.1 — composition factory `mcp_calc_server_bundle_defaults_2026_05_15()` wires v0.8.0 `EvalRCEGuard` + v0.7.6 `StdioCommandInjectionGuard` under a single preset_id (CVE-2026-44717 anchor) scoped to calc/calculate/evaluate/sympy_eval/math_eval tool-name patterns; pure config composition, no new detector module |
| [**Metis-inspired corpus block-rate regression**](./docs/policies/metis-inspired-corpus-block-rate.md) | v0.8.2 — release-gate primitive `MetisInspiredCorpusBlockRateGuard` runs a deterministic 25-entry exploit-shape corpus (CVE-2026-44717 + 2026-05-05 STDIO injection) through `EvalRCEGuard + StdioCommandInjectionGuard`; one-sided gate fires when block rate drops below baseline − 5%; **NOT a reproduction of the Metis paper's POMDP attacker** (arXiv:2605.10067 cited as motivation, not as prompt source); `airlock corpus-bench` CLI ships text/json/md reports |
| [**Corpus per-category coverage**](./docs/policies/metis-inspired-corpus-block-rate.md) | v0.8.3 — extends the v0.8.2 corpus-bench with HarnessAudit-Bench (arXiv:2605.14271) two-category taxonomy (`resource_access`, `info_transfer`); `CorpusEntry.violation_category` field + `CategoryCount` decision field; `airlock corpus-bench` reports per-category coverage in text/json/md; **NOT a reproduction of HarnessAudit-Bench** (artifacts not yet public — taxonomy adopted as schema, scoring is not) |
| [**Stainless SDK provenance classifier**](./docs/policies/stainless-provenance-probe.md) | v0.8.3 — pure-function `classify_sdk_lineage(user_agent, response_body_head)` building block flags MCP servers generated by the deprecated Stainless SDK toolchain (Anthropic acquired Stainless 2026-05-13, hosted generator winding down); operator-callable from own audit hooks — **NOT an automatic HTTP probe** (decorator-in-process architecture, see ROADMAP §1); `stainless_provenance_probe_defaults()` preset is `default_action=tag_only`, visibility not enforcement |
| [**Human-oversight decorator**](./docs/policies/human-oversight-decorator.md) | v0.8.4 — `@requires_human_oversight(approver=...)` gates a tool function on an operator-supplied approval callable (Code-as-Harness arXiv:2605.18747 anchor); `GRANT` → call wrapped fn, `DENY` → `OversightDeniedError`, `TIMEOUT` → `OversightTimeoutError`; composes with `@Airlock(...)`; protocol shapes + `InProcessRecordedApprover` testing helper; **NOT a bidirectional audit-emitter RPC channel** — operator owns the transport (Slack/PagerDuty/CLI), agent-airlock owns the gate + the protocol |
| [**Layer-contract receipt block**](./docs/attest/layer-contract.md) | v0.8.5 — opt-in `LayerContract` (assume/guarantee) block on signed `airlock attest receipt` payloads (arXiv:2605.18672 anchor); `--contract` derives per-guard `pass_rate` from the verdicts list, `--assumes id1,id2` declares upstream-layer dependencies; receipt schema v1 unchanged (additive field); `pass_rate` is a measured statistic over the sample (not a proof) — every Guarantee carries `sample_size` so verifiers can weight low-N appropriately; **NOT backed by a window-counter store** (that infrastructure doesn't exist yet — derived from the operator-supplied verdicts list, no new abstraction) |
| [**MCP Attested Tool-Server Admission (arXiv:2605.24248)**](https://arxiv.org/abs/2605.24248) | v0.8.10 — opt-in admission gate for MCP tool servers per Metere (May 2026). Host fetches a JWS-compact clearance from `{server_url}/.well-known/mcp-clearance`, verifies its signature against an **operator-pinned trust root** (Ed25519 / RSA-PSS / JWKS — never network-fetched on the hot path), and enforces a **deny-by-default per-server tool allowlist** parsed from the verified clearance. Flavor-gated `ENFORCE` (hard-deny) / `WARN` (log only) modes. Every decision emits a `ReceiptVerdict` on the `guard="mcp_attested_admission"` channel — reuses the existing `airlock attest` DSSE path, does **not** invent a new log. `mcp_attested_admission_defaults()` factory + `MCPProxyGuard.audit_tool_admission()` integration; signature verification gated behind `pip install agent-airlock[attested]`. |
| [**Mobile MCP intent-URL guard (CVE-2026-35394)**](https://www.sentinelone.com/vulnerability-database/cve-2026-35394/) | v0.8.8 — defensive bundle for the Mobilenexthq Mobile MCP `mobile_open_url` intent-injection RCE class (< 0.0.50). `mobile_mcp_intent_guard_2026_05()` returns a pre-configured `SafeURLValidator(allowed_schemes=["http", "https"])` (blocks `intent:`, `content:`, `file:`, `app:`, `data:`, `javascript:`, `vbscript:`), an `AirlockConfig(unknown_args=UnknownArgsMode.BLOCK)`, and the canonical Mobile MCP tool-name corpus (`mobile_open_url`, `open_url`, `mobile_launch_url`). DIFF-COMPATIBLE with the existing `SafeURL` type — no new validator invented. Also fixes a pre-existing `block_private_ips=True` no-op in `SafeURLValidator` (RFC1918 ranges were not actually blocked because the validator's own `SafeURLValidationError` raise was caught by `except ValueError`). |
| [**Capsule ShareLeak / PipeLeak (CVE-2026-21520)**](https://nvd.nist.gov/vuln/detail/CVE-2026-21520) | v0.8.14 — defensive bundle for the [Capsule Security](https://www.capsulesecurity.io/blog-post/shareleak-taking-the-wheel-of-microsofts-copilot-studio-cve-2026-21520)-disclosed indirect-prompt-injection class hitting Microsoft Copilot Studio (ShareLeak, CVE-2026-21520, CVSS 7.5 HIGH, CWE-77, patched 2026-01-15) and Salesforce Agentforce (PipeLeak, parallel pattern). Both vectors share the same architecture: untrusted form input (SharePoint form / Web-to-Lead form) is concatenated into the agent's context with no boundary, while the agent simultaneously holds outbound exfil tools (Outlook send / Salesforce email-case). `capsule_indirect_injection_cve_2026_21520_defaults()` composes existing primitives — `default_deny=True` + canonical exfil-sink `denied_tools` (`send_email`, `outlook_*`, `create_case`, `share_*`, `export_*`, `post_to_*`, `webhook_*`, ...) + `reauth_on_untrusted_reinvocation=True` (v0.8.6 debate-amplification guard at `threshold=1`) + `AirlockConfig(unknown_args=UnknownArgsMode.BLOCK)`. Opt-in only — no new validator invented, no default-priority-chain entry. Pairs with `airlock-explain --unused-scopes` (v0.8.13) so operators populate the read-side allow-list from a real trace before deploying. |
| [**Flowise MCP-stdio adapter RCE (CVE-2026-40933)**](https://advisories.gitlab.com/npm/flowise-components/CVE-2026-40933/) | v0.8.16 — defensive control for the Flowise authenticated-RCE-via-MCP-stdio-adapter class (CVSS 9.9, fixed upstream in Flowise 3.1.0). Flowise ≤ 3.0.x serialises a user-defined CustomMCP `command`+`args` straight into a child-process spawn with no sandbox or argv sanitisation — importing a crafted chatflow is a one-click path to OS-level RCE. `flowise_mcp_stdio_guard_2026_defaults()` is a per-tool-class projection of the v0.7.6 `StdioCommandInjectionGuard` (no new detector invented), scoped to the Flowise CustomMCP stdio surface. Fail-closed on shell metachars (`;`, `&&`, `\|\|`, `\|`, newline, backtick, `$(`) in the `command`/`args` path + opt-in path-traversal outside a `cwd_allowlist`; `check(args)` raises `FlowiseMcpStdioInjectionError`. OWASP **MCP05 Command Injection**. Wired into `ox_mcp_supply_chain_2026_04_defaults()` — **corrects a prior mis-attribution** where CVE-2026-40933 was recorded as a "Semantic Kernel auth-header leak". |
| [**MCP description-vs-manifest guard (`mcp_description_manifest_guard`)**](https://arxiv.org/abs/2606.04769) | v0.8.18 — runtime consistency gate that asserts a tool's **model-facing description** (declared input schema + advertised capability/security boundary) matches its **registered manifest** *before* the tool is admitted, failing closed per the deny-by-default posture. Anchored on the DCIChecker study (arXiv:2606.04769), which measured **Description-Code Inconsistency at 9.93% of 19,200 tool pairs across 2,214 MCP servers**. `DescriptionManifestGuard.evaluate(description)` detects `described_arg_not_in_manifest` (description claims a ghost argument), `undisclosed_side_effect` (manifest has a side effect the description hides — the tool-poisoning direction), and `overclaimed_capability` (description advertises a capability absent from the manifest); three modes (`strict` / `warn` / `shadow`); `vaccinate_description_manifest(manifests)` decorator + `mcp_description_manifest_guard_defaults()` factory. Composes **above** ghost-arg stripping + Pydantic type-validation (which govern the call payload) — it does not replace them. OWASP **MCP03 Tool Poisoning**. Pydantic-only core, no new runtime deps. |
| [**LeRobot pickle-deserialization RCE (CVE-2026-25874)**](https://www.sentinelone.com/vulnerability-database/cve-2026-25874/) | v0.8.19 — deny-by-default posture for the HuggingFace LeRobot unauthenticated-RCE class (CVSS 9.3). LeRobot's async-inference PolicyServer and robot-client call `pickle.loads()` on payloads received over an **unauthenticated, non-TLS** gRPC channel (`SendObservations` / `SendPolicyInstructions` / `GetActions`), so a network-reachable attacker reaches arbitrary OS command execution. Ships a reusable **`UnsafeDeserializationGuard`** (in `safe_types`, next to `SafePath`/`SafeURL`) that fails closed on pickle magic bytes (`0x80` PROTO), base64-encoded pickle, and `pickle`/`marshal`/`shelve`/`dill`/`jsonpickle` marker tokens in string args — plus an airgap pairing that refuses serialized-object (`bytes`) args unless the call declares an authenticated **and** TLS transport. Wired into `SecurityPolicy.deserialization_guard` and run at the `@Airlock` seam (Step 2.7) **before** the tool body; the block carries a `fix_hint` naming CVE-2026-25874. `lerobot_cve_2026_25874_defaults()` is the per-CVE projection (deny-by-name globs for `*deserialize*`/`*pickle.loads*`/`torch_load`/the gRPC methods + the wired content guard). Composes **above** ghost-arg stripping + Pydantic type-validation. Pydantic-only core, no new runtime deps. |
| [**MCP server-URL env-interpolation secret leak (CVE-2026-32625)**](https://github.com/danny-avila/LibreChat/security/advisories/GHSA-6vqg-rgpm-qvf9) | v0.8.20 — deny-by-default guard for the LibreChat MCP-server-URL credential-disclosure class (CVSS 9.6, CWE-200, OWASP **MCP01**). A user-supplied MCP server connection template (URL / header / arg) carrying an env-interpolation token (`${VAR}`, bare `$VAR`, or `%VAR%`) is expanded **server-side** against the host `process.env` and leaks a secret (`${JWT_SECRET}` / `${CREDS_KEY}` / `${MONGO_URI}`) into the outbound request. **`MCPServerEnvInterpolationGuard.evaluate(config)`** (in `mcp_spec/env_interpolation_guard.py`) scans the URL/headers/args recursively and refuses **any** interpolation token unless its variable is on an operator-declared `allowed_vars` allowlist of explicitly non-secret vars (empty default = deny all). It never reads `os.environ` or expands anything — token-match only, so it cannot itself leak. `mcp_server_env_interpolation_guard_defaults()` factory + `check(config)` raising `MCPServerEnvInterpolationError`; escaped `\$`/`$$` are not flagged. Pydantic-only core, no new runtime deps. |
| [**Codegen triple-quote / delimiter break-out RCE (CVE-2026-11393)**](https://www.thehackerwire.com/agentcore-cli-rce-via-triple-quote-neutralization-bypass-cve-2026-11393/) | v0.8.21 — deny-by-default guard for the AWS AgentCore CLI code-injection class (CVSS 9, CWE-94, OWASP **ASI05**). The CLI splices a model-/user-controlled `collaborationInstruction` into generated Python **without neutralising triple-quote characters**, so a crafted `"""` closes the generated string literal and injects statements that execute on agent import — RCE on the AgentCore Runtime + the importer's machine. **`CodegenDelimiterInjectionGuard.evaluate(args)`** (in `mcp_spec/codegen_delimiter_guard.py`) recursively scans args bound for a codegen / template / `exec`/`eval` sink and fails closed on triple-quote tokens (`"""` / `'''`), quote break-out tokens (`");` / `')` / `" +` / `']`), and raw newlines — unless the field is on an operator-declared `allowed_literal_fields` allowlist of safe literal contexts. It never generates or executes code — token-match only. `codegen_delimiter_injection_guard_defaults()` factory + `check(args)` raising `CodegenDelimiterInjectionError`; composes one layer above the v0.8.0 `EvalRCEGuard` (which gates the sink itself). Pydantic-only core, no new runtime deps. |
| [**MCP-bridge subprocess command/args/env RCE (CVE-2026-42271, CISA KEV)**](https://www.cisa.gov/known-exploited-vulnerabilities-catalog?field_cve=CVE-2026-42271) | v0.8.22 — deny-by-default guard for the LiteLLM MCP-preview-endpoint command-injection class (CVSS 8.7, CWE-78, OWASP **ASI05**; **on the CISA KEV catalog as of 2026-06-09, actively exploited**). LiteLLM's `POST /mcp-rest/test/connection` + `/mcp-rest/test/tools/list` accepted a full MCP server config (`command` / `args` / `env`) in the request body and spawned it as a subprocess with no validation — any low-privilege API key reached host command execution (unauthenticated RCE when chained with the Starlette Host-header bypass CVE-2026-48710). **`McpSubprocessArgInjectionGuard.evaluate(config)`** (in `mcp_spec/subprocess_arg_guard.py`) treats spawn-shaped MCP-bridge args (`command`/`cmd`/`args`/`argv`/`env`) as untrusted and refuses them unless the resolved program is on an operator-declared `allowed_commands` allowlist of safe static commands (empty default = deny all); an `env` carrying a code-loading var (`LD_PRELOAD`/`PATH`/`PYTHONPATH`/…) is refused regardless, and a config with no spawn-shaped fields passes. Never spawns anything — config inspection only. `mcp_subprocess_arg_injection_guard_defaults()` factory + `check(config)` raising `McpSubprocessArgInjectionError`; composes one layer above the v0.7.6 `StdioCommandInjectionGuard` (which scans an *allowed* argv for shell metachars). Pydantic-only core, no new runtime deps. |
| [**Examples**](./examples/) | 13 framework integrations (11 adapter-shipped + 2 example-only) with copy-paste code |
| [**Security Guide**](./docs/SECURITY.md) | Production deployment checklist |
| [**API Reference**](./docs/API.md) | Every function, every parameter |
| [**Egress Bench**](./docs/security/egress-bench.md) | CVE fixture walker — every payload previously blocked stays blocked |
| [**OX MCP Supply-Chain preset**](./docs/presets/ox-mcp-supply-chain-2026-04.md) | Umbrella for the 2026-04-20 OX dossier (10 CVEs) |
| [**Elicitation guard (`mcp_elicitation_guard_2026_04`)**](https://github.com/modelcontextprotocol/specification/pull/1487) | v0.6.0 — runtime mitigation for the MCP `tool/elicitation` round-trip (spec PR #1487, draft 2026-04-r1); blocks credential-request and policy-override classes |
| [**Config-path guard (CVE-2026-31402)**](https://nvd.nist.gov/vuln/detail/CVE-2026-31402) | v0.6.0 — Claude Desktop MCP-server-registration path-traversal mitigation (CVSS 8.8) |
| [**Gemini 3 Agent Mode adapter**](https://blog.google/technology/google-deepmind/gemini-3-agent-mode-ga/) | v0.6.0 — `function_call` carrier normalisation + `thought_signature` redaction; pinned `SUPPORTED_VERSIONS` set |
| [**OAuth `state` entropy guard**](https://www.blackhat.com/asia-26/briefings/schedule/#oauth-state-injection) | v0.6.0 — base64/hex/JSON decode + prompt-injection scan on the OAuth `state` parameter (BlackHat Asia 2026 vector) |
| [**`airlock console`**](./docs/cli/console.md) | v0.6.0 — three-pane Textual TUI with live verdict stream + replay-on-edit; gated behind `airlock[console]` extra |
| [**`airlock attest receipt`**](./docs/attest/receipt.md) | v0.6.0 — Sigstore-compatible signed agent-run receipts; `emit` + `verify` subcommands |
| [**`policy_bundle.lock`**](./docs/pack/policy-bundle-lock.md) | v0.6.0 — hash-pinned preset bundles with `Cargo.lock` semantics; `airlock pack lock` + `airlock replay --bundle-lock` |
| [**`airlock studio`**](./docs/studio/quickstart.md) | v0.6.0 — local stdlib HTTP rehearsal sandbox; paste-a-transcript verdicts + diff between runs |
| [**smolagents wrapper**](https://github.com/huggingface/smolagents/releases/tag/v1.18) | v0.6.0 — `wrap_agent(agent, policy_bundle)` for HuggingFace smolagents 1.18+ (4th first-class framework) |
| [**STDIO meta-guard (`mcp_stdio_meta_cve_2026_04`)**](https://www.ox.security/blog/mother-of-all-ai-supply-chains-anthropic-mcp-stdio) | v0.5.9 — bundles every airlock STDIO defence into one chain; recommended default for any MCP server registered after 2026-04-26 |
| [**LangGraph 1.0.11 ToolNode compat shim**](https://github.com/langchain-ai/langgraph/releases/tag/prebuilt%401.0.11) | v0.5.9 — silent unwrap survives the prebuilt 1.0.11 list-vs-dict shape break |
| [**GPT-5.5 ("Spud") agent defaults + tool-shape adapter**](https://openai.com/index/gpt-5-5/) | v0.5.9 — caps fan-out at 8 / context at 900k / per-call egress at 512 KB |
| [**Capability caps (`agent_capability_default_caps`)**](https://www.anthropic.com/features/project-deal) | v0.5.9 — programmatic caps for SIGN_CONTRACT / DELEGATE_TO_AGENT / INVOKE_TOOL / WRITE_FILE / NETWORK_EGRESS |
| [**OWASP Agentic 2026-Q1 coverage matrix**](./docs/owasp-agentic-2026-coverage.md) | v0.5.9 — 10/10 mapping of risk_id → guard + preset + test; the CI gate fails on stale entries |
| [**Short-form-video corpus (`wild-2026-04/short_form_video`)**](https://www.blackhat.com/asia-26/briefings/schedule/#tiktok-agent-attacks-zhong) | v0.5.9 — 5 transcript / on-screen / RTL PoCs; `airlock replay --namespace short_form_video` |
| [**`airlock graph serve`**](./docs/graph.md) | v0.5.9 — local web UI of the live agent → tool → MCP-server topology with verdict overlay |
| [**`airlock policy compile / explain`**](./docs/policy-as-prompt.md) | v0.5.9 — natural-language policy authoring with hash-pinned prompt + deterministic cache |
| [**`airlock kill-switch`**](./docs/kill-switch.md) | v0.5.9 — HMAC-signed cluster-wide freeze with 2-of-3 quorum reset |
| [**Comment-and-Control PR-metadata guard**](https://oddguan.com/blog/comment-and-control-prompt-injection-credential-theft-claude-code-gemini-cli-github-copilot/) | v0.5.8 — neutralises CVSS 9.4 cross-vendor PR-title prompt injection |
| [**`airlock pack`**](https://www.anthropic.com/features/project-deal) | v0.5.8 — signed policy bundles; `airlock pack install claude-code-ci@2026.04` |
| [**`airlock baseline`**](https://venturebeat.com/security/rsac-2026-agentic-soc-agent-telemetry-security-gap) | v0.5.8 — per-agent 7-day rolling profile + drift score |
| [**`airlock attest`**](https://www.anthropic.com/features/project-deal) | v0.5.8 — DSSE provenance per verdict |
| [**Cloudflare Mesh compat**](https://www.cloudflare.com/press/press-releases/2026/cloudflare-launches-mesh-to-secure-the-ai-agent-lifecycle/) | v0.5.8 — runs alongside Mesh; de-duplicates overlapping policies |
| [**Manifest-only STDIO mode**](./docs/mcp/manifest-only-mode.md) | v0.5.7 — signed-manifest registry; argv never originates from runtime input |
| [**STDIO-taint CI gate**](./docs/security/stdio-taint-scan.md) | v0.5.7 — AST taint analyzer; flags remote→Popen flows at PR time |
| [**Declarative preset YAML**](./docs/presets/yaml-format.md) | v0.5.7 — composite presets via stdlib-only YAML parser |
| [**CVE-2026-30615 Windsurf zero-click**](./docs/cves/cve-2026-30615.md) | v0.5.7 — diff-on-demand mcp.json auto-load guard; **v0.8.23** adds `mcp_config_pin` — a spawn-time `{name, command, args, env-keys}` fingerprint pin (`McpConfigPinSet.check()`) that fails closed (raises, never warns) on an injected (unpinned) or mutated STDIO server even when the injection never touched a watched config file; emits on the structlog + JSON-Lines audit channels |
| [**CVE-2026-6980 GitPilot-MCP**](./docs/cves/cve-2026-6980.md) | v0.5.7 — `repo_path` injection (vendor unresponsive) |
| [**DockerBackend**](./docs/sandbox/docker.md) | v0.5.1 hardening + known gaps |

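The CVE-2026-32625 row above describes a token-match-only scan: walk the URL, headers, and args recursively, refuse any `${VAR}`, `$VAR`, or `%VAR%` token whose variable is not allowlisted, and never read the environment. The sketch below illustrates that idea with the standard library only. It is not the shipped `MCPServerEnvInterpolationGuard`; the names `find_tokens` and `check`, the regex, and the use of `ValueError` are assumptions made for illustration.

```python
import re
from typing import Any, Iterator

# Hypothetical token pattern (an assumption, not the library's regex):
# ${VAR}, bare $VAR, or %VAR%. The lookbehind skips a "$" that follows
# a backslash or another "$", so "\$VAR" and "$$VAR" are not flagged.
_TOKEN = re.compile(
    r"(?<![\\$])\$\{(?P<braced>[A-Za-z_][A-Za-z0-9_]*)\}"
    r"|(?<![\\$])\$(?P<bare>[A-Za-z_][A-Za-z0-9_]*)"
    r"|%(?P<percent>[A-Za-z_][A-Za-z0-9_]*)%"
)


def find_tokens(value: Any, path: str = "config") -> Iterator[tuple[str, str]]:
    """Yield (path, variable_name) for every interpolation token found."""
    if isinstance(value, str):
        for match in _TOKEN.finditer(value):
            name = match.group("braced") or match.group("bare") or match.group("percent")
            yield path, name
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from find_tokens(item, f"{path}.{key}")
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            yield from find_tokens(item, f"{path}[{index}]")


def check(config: Any, allowed_vars: frozenset[str] = frozenset()) -> None:
    """Raise ValueError unless every token names an allowlisted variable.

    The empty default allowlist means deny all. Nothing here reads
    os.environ or expands a token; it only matches text.
    """
    refused = [(p, v) for p, v in find_tokens(config) if v not in allowed_vars]
    if refused:
        raise ValueError(f"env-interpolation tokens refused: {refused}")


config = {
    "url": "https://mcp.example/${JWT_SECRET}",
    "headers": {"X-Region": "$REGION"},
    "args": ["%MONGO_URI%", "\\$HOME", "$$HOME"],
}
tokens = list(find_tokens(config))
```

With this input, `check(config)` raises, while `check(config, allowed_vars=frozenset({"JWT_SECRET", "REGION", "MONGO_URI"}))` passes. Because the sketch only matches text, percent-encoded bytes such as `%E2%80` also match the `%VAR%` form and are refused unless allowlisted, which is consistent with a deny-by-default stance but may be stricter than the real guard.
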
### Regulatory engagement

- [Public comment draft — NIST AI RMF v2.0 Agentic-AI Security](./docs/regulatory/nist-ai-rmf-v2-comment-2026.md) (window: 2026-04-18 → mid-June)

---

## 👤 About

Built by [**Sattyam Jain**](https://github.com/sattyamjjain) — AI infrastructure engineer.

This started as an internal tool after watching an agent hallucinate its way through a production database. Now it's yours.

---

## 🤝 Contributing

We review every PR within 48 hours.

```bash
git clone https://github.com/sattyamjjain/agent-airlock
cd agent-airlock
pip install -e ".[dev]"
pytest tests/ -v
```

- **Bug?** [Open an issue](https://github.com/sattyamjjain/agent-airlock/issues)
- **Feature idea?** [Start a discussion](https://github.com/sattyamjjain/agent-airlock/discussions)
- **Want to contribute?** [See open issues](https://github.com/sattyamjjain/agent-airlock/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)

---

## 💖 Support

If Agent-Airlock saved your production database:

- ⭐ **Star this repo** — Helps others discover it
- 🐛 **Report bugs** — [Open an issue](https://github.com/sattyamjjain/agent-airlock/issues)
- 📣 **Spread the word** — Tweet, blog, share

---

## ⭐ Star History

<div align="center">

[![Star History Chart](https://api.star-history.com/svg?repos=sattyamjjain/agent-airlock&type=Date)](https://star-history.com/#sattyamjjain/agent-airlock&Date)

</div>

---

<div align="center">

**Built with 🛡️ by [Sattyam Jain](https://github.com/sattyamjjain)**

<sub>Making AI agents safe, one decorator at a time.</sub>

[![GitHub](https://img.shields.io/badge/GitHub-sattyamjjain-181717?style=flat-square&logo=github)](https://github.com/sattyamjjain)
[![Twitter](https://img.shields.io/badge/Twitter-@sattyamjjain-1DA1F2?style=flat-square&logo=twitter&logoColor=white)](https://twitter.com/sattyamjjain)

</div>

---

<div align="center">
<sub>

**Sources:** This README follows best practices from [awesome-readme](https://github.com/matiassingers/awesome-readme), [Best-README-Template](https://github.com/othneildrew/Best-README-Template), and the [GitHub Blog](https://github.blog/open-source/maintainers/marketing-for-maintainers-how-to-promote-your-project-to-both-users-and-contributors/).

</sub>
</div>
