Metadata-Version: 2.4
Name: mcp-guardian-ai
Version: 0.2.0
Summary: Whitelist NLP intent enforcement for MCP agents — pre-execution tool call validation
Author: MCP Guardian
License: Apache-2.0
Project-URL: Homepage, https://github.com/mcp-guardian/mcp-guardian
Project-URL: Repository, https://github.com/mcp-guardian/mcp-guardian
Project-URL: Issues, https://github.com/mcp-guardian/mcp-guardian/issues
Keywords: mcp,guardian,agent,security,intent,guardrail,openai
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: openai-agents>=0.3.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21; extra == "dev"

# MCP Guardian

**Agent intent enforcement for MCP tool calls — pre-execution security for AI agents.**

MCP Guardian is not a firewall for MCP servers. It's a **declarative intent guardrail** for agent behavior. It validates every tool call against declared intent policies *before execution*. If the call doesn't match the policy, the MCP server never sees it.

## Install

```bash
pip install mcp-guardian-ai
```

This pulls in all dependencies: `openai-agents`, `pydantic`, `pyyaml`.

Or install everything explicitly:

```bash
pip install mcp-guardian-ai openai-agents pydantic pyyaml
```

Set your OpenAI API key (used by the LLM intent evaluator — the fast check tier runs without it):

```bash
export OPENAI_API_KEY=sk-...
```

> **Note:** The PyPI package is `mcp-guardian-ai`. The Python import is `mcp_guardian`.

For development from source:

```bash
git clone https://github.com/mcp-guardian/mcp-guardian.git
cd mcp-guardian
pip install -e ".[dev]"
```

## Three Ways to Use It

### Path 1: Pure Python (no files needed)

```python
import asyncio
from agents import Agent, Runner
from agents.mcp import MCPServerStreamableHttp
from mcp_guardian import GuardianToolGuardrail, IntentPolicy

policy = IntentPolicy(
    name="read-only",
    description="Read files only — no writes, no shell",
    expected_workflow="Read and list files to answer user questions",
    forbidden_tools=["write_*", "execute_*", "delete_*"],
)
guardrail = GuardianToolGuardrail(policy=policy)

async def main():
    async with MCPServerStreamableHttp(
        name="my-server",
        params={"url": "https://my-mcp-server.example.com/mcp"},
    ) as server:
        tools = await guardrail.wrap_mcp_tools([server])
        agent = Agent(name="Worker", model="gpt-4o", tools=tools)
        result = await Runner.run(agent, "List all files")
        print(result.final_output)

    # Print audit log
    for entry in guardrail.audit_log:
        verdict = str(entry.verdict)
        icon = "✓" if verdict == "allow" else "✗"
        print(f"  {icon} {entry.tool_name} → {verdict.upper()} "
              f"(conf={entry.confidence:.2f}, {entry.method}, {entry.elapsed_ms:.0f}ms)")
        if verdict != "allow":
            print(f"    Reason: {entry.reason}")

asyncio.run(main())
```

### Path 2: YAML policy file (recommended)

Define a `policy.yaml`:

```yaml
name: read-only
description: Read-only file access
expected_workflow: Read and list files to answer user questions
allowed_tools: ["read_*", "list_*"]
forbidden_tools: ["write_*", "execute_*", "delete_*"]
allowed_transitions:
  list_directory: [read_file, list_directory]
  read_file: [read_file, list_directory]
constraints:
  - Do not access files outside the working directory
escalation_threshold: 0.7
```

Load it:

```python
from mcp_guardian import GuardianToolGuardrail, IntentPolicy

policy = IntentPolicy.from_file("policy.yaml")
guardrail = GuardianToolGuardrail(policy=policy)
```

### Path 3: guardian.yaml + policy files (multi-server / production)

A single `guardian.yaml` ties together multiple servers, per-server policies, auth headers, and model settings:

```yaml
model: gpt-4o
guardian_model: gpt-4o
default_policy: policies/default.yaml
servers:
  - name: filesystem
    url: https://fs-server.example.com/mcp
    policy: policies/read-only.yaml
  - name: database
    url: https://db-server.example.com/mcp
    policy: policies/db-read-only.yaml
    headers:
      Authorization: "Bearer ${DB_TOKEN}"
```

```python
config = GuardianConfig.from_file("guardian.yaml")
```
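
The `${DB_TOKEN}` placeholder in the `headers` block suggests that secrets are expanded from the environment when the config is loaded. How mcp-guardian performs that expansion is not shown here, so the following is only a sketch of one way to do it with the standard library; `expand_headers` is a hypothetical helper, not part of the package.

```python
import os
from string import Template


def expand_headers(headers, env=None):
    """Replace ${NAME} placeholders in header values with environment values.

    Hypothetical helper for illustration; not mcp-guardian's own loader.
    """
    env = os.environ if env is None else env
    # substitute() raises KeyError for a missing variable, so a forgotten
    # token fails loudly instead of sending a literal "${DB_TOKEN}".
    return {name: Template(value).substitute(env) for name, value in headers.items()}


if __name__ == "__main__":
    demo = expand_headers(
        {"Authorization": "Bearer ${DB_TOKEN}"},
        env={"DB_TOKEN": "example-token"},
    )
    print(demo)
```

Failing on a missing variable is a deliberate choice in this sketch: an unexpanded placeholder sent as a bearer token is harder to diagnose than an error at startup.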

See the [Quick Start](docs/getting-started/quickstart.md) for complete examples of all three paths.

## Tool Sources: MCP, Local Functions, or Both

The three policy paths above are about *how you configure the policy*. This section is about *what tools you attach the policy to*. mcp-guardian is tool-source agnostic — it runs against the SDK's general `FunctionTool` type, of which MCP-discovered tools are one case. Three entry points cover the spectrum:

| Entry point | Use when |
|---|---|
| `guardrail.wrap_mcp_tools(servers)` | Connecting to one or more MCP servers. Handles discovery, schema sanitization for OpenAI strict mode, and emits a `WARNING` on non-canonical tool names (defense against case-perturbation evasion). |
| `guardrail.attach_to_tools(tools)` | You already hold a list of `FunctionTool` objects — locally decorated with `@function_tool`, constructed from JSON schemas, or returned from somewhere other than MCP. Attaches the guardrail's `ToolInputGuardrail` to each tool's `tool_input_guardrails` list. |
| `guardrail.make_input_guardrail()` | Full control. Returns the SDK's `ToolInputGuardrail` directly; wire it onto any subset of tools you choose. |

### Local `@function_tool` example

```python
from agents import Agent, Runner, function_tool
from mcp_guardian import GuardianToolGuardrail, IntentPolicy

@function_tool
def read_file(path: str) -> str:
    """Read a file from disk."""
    # Use a context manager so the file handle is always closed.
    with open(path) as f:
        return f.read()

@function_tool
def write_file(path: str, content: str) -> None:
    """Write content to a file."""
    with open(path, "w") as f:
        f.write(content)

policy = IntentPolicy(
    name="read-only",
    description="Read-only access",
    expected_workflow="Answer the user by reading files; never modify.",
    allowed_tools=["read_*"],
    forbidden_tools=["write_*"],
    fast_path_allow=True,
)
guardrail = GuardianToolGuardrail(policy=policy)
tools = guardrail.attach_to_tools([read_file, write_file])

agent = Agent(name="Assistant", model="gpt-4o", tools=tools)
```

A complete runnable version is in [`example_local.py`](example_local.py). It uses the same policy shape and the same defenses (fast-path block, transition graph, `case_insensitive_patterns`, audit log) as the MCP-based `example.py`.

### What is MCP-specific

Three behaviours only apply on the `wrap_mcp_tools` path because they exist at the *trust boundary with an untrusted server*:

- **Schema sanitization** strips JSON-Schema constructs that MCP servers emit but OpenAI's strict mode rejects (`format`, `default`, `$ref`, `allOf`, etc.). Locally-decorated tools usually produce clean schemas from Python signatures and don't need this.
- **Discovery-time `WARNING`** fires on tool names outside `^[a-z0-9_\-./]+$`. Local tools have names the developer chose; non-canonical there is self-inflicted, not an evasion attempt.
- **`MCPUtil.to_function_tool` conversion** translates MCP tool descriptors into SDK `FunctionTool` instances.
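
The first two behaviours can be illustrated with a short sketch. It assumes a recursive strip of the keywords named above and a plain regex test for the canonical-name pattern; `sanitize_schema`, `is_canonical`, and `STRIPPED_KEYS` are hypothetical names, and the real implementation may differ (for example, it may inline `$ref` targets rather than drop them).

```python
import re

# Pattern quoted in the list above.
CANONICAL_NAME = re.compile(r"^[a-z0-9_\-./]+$")
# Only the keywords named in the text; the "etc." is not filled in here.
STRIPPED_KEYS = {"format", "default", "$ref", "allOf"}


def is_canonical(tool_name):
    """True when the tool name uses only the canonical character set."""
    return CANONICAL_NAME.fullmatch(tool_name) is not None


def sanitize_schema(node):
    """Return a copy of a JSON Schema with the STRIPPED_KEYS keywords removed."""
    if isinstance(node, list):
        return [sanitize_schema(item) for item in node]
    if not isinstance(node, dict):
        return node
    cleaned = {}
    for key, value in node.items():
        if key in STRIPPED_KEYS:
            continue
        if key == "properties" and isinstance(value, dict):
            # Keys here are parameter names, not schema keywords, so a
            # parameter called "format" or "default" must survive.
            cleaned[key] = {name: sanitize_schema(sub) for name, sub in value.items()}
        else:
            cleaned[key] = sanitize_schema(value)
    return cleaned
```

A name such as `Read_File` fails `is_canonical`, which is the kind of case perturbation the discovery-time `WARNING` is meant to surface.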

### Out of scope

OpenAI's *hosted* tools — `WebSearchTool`, `FileSearchTool`, `ComputerTool`, `CodeInterpreterTool` — are not `FunctionTool` instances and don't expose the `tool_input_guardrails` hook. mcp-guardian can't gate them as currently designed. Use the SDK's own approval / filter mechanisms for those, and consider them outside the trust boundary mcp-guardian enforces.

## How It Works

Three-tier enforcement pipeline on every tool call:

1. **Fast check (0ms)** — forbidden tools, whitelists, glob patterns, **transition graph**. Deterministic, no LLM, impossible to bypass with prompt injection.
2. **LLM intent evaluation (1–5s)** — analyzes the call against policy constraints and workflow context.
3. **Escalation** — low-confidence decisions flagged for human review.
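
To make the first tier concrete, here is a minimal sketch of a glob-based check. It assumes matching with the standard library's `fnmatch`, that forbidden patterns take precedence, and that a non-empty allow list excludes everything it does not match. The function name, its return strings, and the `case_insensitive` flag are hypothetical and are not mcp-guardian's internals.

```python
from fnmatch import fnmatchcase


def fast_check(tool_name, allowed, forbidden, case_insensitive=False):
    """Return "block" or "pass" using glob patterns only (no LLM involved)."""
    if case_insensitive:
        # Normalize both sides so "Write_File" cannot slip past "write_*".
        tool_name = tool_name.lower()
        allowed = [pat.lower() for pat in allowed]
        forbidden = [pat.lower() for pat in forbidden]
    # Forbidden patterns win, so a tool matching both lists is blocked.
    if any(fnmatchcase(tool_name, pat) for pat in forbidden):
        return "block"
    # With an allow list present, anything outside it is blocked.
    if allowed and not any(fnmatchcase(tool_name, pat) for pat in allowed):
        return "block"
    # Survivors move on to the later checks.
    return "pass"


if __name__ == "__main__":
    for name in ("read_file", "write_file", "send_email"):
        print(name, fast_check(name, ["read_*", "list_*"], ["write_*"]))
```

Because the check is plain string matching, its outcome depends only on the tool name and the policy, never on what the prompt says.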

The **transition graph** (`allowed_transitions`) is a state machine over tool calls — similar to [LangGraph](https://github.com/langchain-ai/langgraph), but enforced externally on the agent rather than built into the agent's own execution graph. After tool A, only tools B and C are allowed. Everything else is blocked deterministically at 0ms.
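
The lookup behind such a state machine can be sketched in a few lines. The mapping below mirrors the `allowed_transitions` block from the YAML policy earlier in this README; the function name and the handling of the first call and of tools without an entry are assumptions made for illustration.

```python
ALLOWED_TRANSITIONS = {
    "list_directory": ["read_file", "list_directory"],
    "read_file": ["read_file", "list_directory"],
}


def transition_allowed(previous_tool, next_tool, transitions=ALLOWED_TRANSITIONS):
    """Check whether next_tool may follow previous_tool."""
    # First call in a session: there is no previous state to constrain it
    # (assumption; a stricter policy could require a declared start tool).
    if previous_tool is None:
        return True
    # Tools without an entry are treated as unconstrained here (assumption).
    if previous_tool not in transitions:
        return True
    return next_tool in transitions[previous_tool]


if __name__ == "__main__":
    print(transition_allowed("list_directory", "read_file"))
    print(transition_allowed("read_file", "http_post"))
```

In this sketch, a call to a hypothetical `http_post` tool right after `read_file` is rejected by a dictionary lookup, without consulting any model.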

This makes MCP Guardian a **reasoning guardrail, not just a tool filter**. Anyone can do allow/block lists. The LLM intent evaluation layer goes further and supervises the agent's reasoning: it catches an allowed tool called with suspicious arguments, or a permitted call that doesn't fit the declared intent. In effect, a second LLM evaluates the first LLM's decisions.

The guardian LLM defaults to `gpt-4o-mini` (fast, cheap) but can point at **any OpenAI-compatible endpoint** — Ollama, vLLM, Azure OpenAI, or a fine-tuned model:

```python
# Use a local Ollama model for the guardian
guardrail = GuardianToolGuardrail(
    policy=policy,
    guardian_model="llama3.2",
    guardian_base_url="http://localhost:11434/v1",
)
```

Or in `guardian.yaml`:

```yaml
guardian_model: llama3.2
guardian_base_url: http://localhost:11434/v1
```

Every evaluation is logged with verdict, confidence, timing, and reasoning.

## Policy Fields

| Field | Purpose |
|-------|---------|
| `allowed_tools` | Whitelist with glob patterns (`read_*`, `list_*`) |
| `forbidden_tools` | Blacklist — always blocked (`write_*`, `execute_*`) |
| `allowed_transitions` | State machine: tool A → [tool B, C] |
| `constraints` | Free-text rules for the LLM evaluator |
| `expected_workflow` | What the agent should be doing (LLM context) |
| `escalation_threshold` | Below this confidence → ask human |
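
As an illustration of how `escalation_threshold` might interact with the LLM evaluator's output, the sketch below routes any decision whose confidence falls below the threshold to a human. The function name and the `"escalate"` verdict string are hypothetical; the default of `0.7` is taken from the YAML example earlier in this README.

```python
def resolve_verdict(llm_verdict, confidence, escalation_threshold=0.7):
    """Turn an LLM verdict plus confidence into a final decision."""
    # Strictly below the threshold: hand the decision to a human,
    # whatever the LLM said.
    if confidence < escalation_threshold:
        return "escalate"
    return llm_verdict


if __name__ == "__main__":
    print(resolve_verdict("allow", 0.55))
    print(resolve_verdict("allow", 0.90))
```

Applying the threshold to both outcomes means a hesitant "block" is reviewed as well as a hesitant "allow"; whether mcp-guardian treats the two symmetrically is not stated here.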

## Demo: Exfiltration Prevention

A working demo blocks a data exfiltration attack across two MCP servers. The agent reads a secret (allowed), then an adversarial prompt tries to send it to an attacker URL — blocked at Tier 1 by the transition graph (0ms) and independently at Tier 2 by the LLM constraints.

See [`demos/exfiltration/`](demos/exfiltration/README.md) for details.

## Documentation

The full docs are built with [MkDocs Material](https://squidfunk.github.io/mkdocs-material/). Run them locally with Docker:

```bash
docker build -f Dockerfile.docs -t mcp-guardian-docs .
docker run -p 8000:8000 -v "$(pwd)/docs:/docs/docs" mcp-guardian-docs
```

Then open [http://localhost:8000](http://localhost:8000). The `-v` mount gives you live reload as you edit.

Or without Docker:

```bash
pip install mkdocs-material
mkdocs serve
```

## Roadmap

The core engine (policies, fast check, transition graphs, LLM evaluation, audit log) is SDK-agnostic. Currently we ship an adapter for the OpenAI Agents SDK. Future adapters under consideration:

| Runtime | Hook point | Status |
|---------|-----------|--------|
| OpenAI Agents SDK | `ToolInputGuardrail` | **Shipped** |
| Anthropic Claude | `PreToolUse` hook | Planned |
| Microsoft Agent Framework | `FunctionInvocationFilter` | Planned |

Same YAML policies, same `pip install`, any runtime. Feedback welcome — open an issue if your framework isn't listed.

## Built On

- [OpenAI Agents SDK](https://github.com/openai/openai-agents-python) — `ToolInputGuardrail`, `AgentHooksBase`
- [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) — tool server standard

## License

Apache 2.0
