Metadata-Version: 2.4
Name: aisafepy
Version: 0.1.0
Summary: Capability-based IFC, streaming-native cascaded guardrails, and an eval-to-guardrail compiler for LLM agents.
Project-URL: Homepage, https://github.com/Vidura-Wijekoon/aisafepy
Project-URL: Documentation, https://github.com/Vidura-Wijekoon/aisafepy#readme
Project-URL: Repository, https://github.com/Vidura-Wijekoon/aisafepy
Project-URL: Issues, https://github.com/Vidura-Wijekoon/aisafepy/issues
Author-email: Vidura Wijekoon <businessaividura@viduraaitech.space>
License: Apache-2.0
License-File: LICENSE
Keywords: agent-security,ai-safety,guardrails,information-flow-control,llm,prompt-injection
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Security
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: anyio>=4.3
Requires-Dist: opentelemetry-api>=1.24
Requires-Dist: pydantic>=2.6
Requires-Dist: tenacity>=8.2
Requires-Dist: typing-extensions>=4.10
Provides-Extra: adapt
Requires-Dist: hdbscan>=0.8.33; extra == 'adapt'
Requires-Dist: numpy>=1.26; extra == 'adapt'
Requires-Dist: scikit-learn>=1.4; extra == 'adapt'
Requires-Dist: sentence-transformers>=2.6; extra == 'adapt'
Requires-Dist: umap-learn>=0.5.5; extra == 'adapt'
Provides-Extra: all
Requires-Dist: hdbscan>=0.8.33; extra == 'all'
Requires-Dist: langgraph>=0.0.40; extra == 'all'
Requires-Dist: llama-index-core>=0.10; extra == 'all'
Requires-Dist: mcp>=0.9; extra == 'all'
Requires-Dist: numpy>=1.26; extra == 'all'
Requires-Dist: openai-agents>=0.0.7; extra == 'all'
Requires-Dist: scikit-learn>=1.4; extra == 'all'
Requires-Dist: sentence-transformers>=2.6; extra == 'all'
Requires-Dist: torch>=2.2; extra == 'all'
Requires-Dist: transformers>=4.40; extra == 'all'
Requires-Dist: umap-learn>=0.5.5; extra == 'all'
Provides-Extra: contrib-llm-guard
Requires-Dist: llm-guard>=0.3; extra == 'contrib-llm-guard'
Provides-Extra: contrib-presidio
Requires-Dist: presidio-analyzer>=2.2; extra == 'contrib-presidio'
Provides-Extra: dev
Requires-Dist: hypothesis>=6.100; extra == 'dev'
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=4.1; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: flow-langgraph
Requires-Dist: langgraph>=0.0.40; extra == 'flow-langgraph'
Provides-Extra: flow-llamaindex
Requires-Dist: llama-index-core>=0.10; extra == 'flow-llamaindex'
Provides-Extra: flow-mcp
Requires-Dist: mcp>=0.9; extra == 'flow-mcp'
Provides-Extra: flow-openai
Requires-Dist: openai-agents>=0.0.7; extra == 'flow-openai'
Provides-Extra: probes
Requires-Dist: numpy>=1.26; extra == 'probes'
Requires-Dist: scikit-learn>=1.4; extra == 'probes'
Requires-Dist: torch>=2.2; extra == 'probes'
Requires-Dist: transformers>=4.40; extra == 'probes'
Provides-Extra: stream
Requires-Dist: numpy>=1.26; extra == 'stream'
Requires-Dist: torch>=2.2; extra == 'stream'
Requires-Dist: transformers>=4.40; extra == 'stream'
Description-Content-Type: text/markdown

<p align="center">
  <img src="docs/aisafepy-logo.jpg" alt="AIsafePy" width="420"/>
</p>

# AIsafePy

**Capability-based information-flow control, streaming-native cascaded guardrails, and a continuous eval-to-guardrail compiler for LLM agents.**

AIsafePy fills three gaps that the existing OSS guardrails ecosystem (NeMo, Guardrails AI, llm-guard, LlamaFirewall, OpenAI Guardrails) has not closed:

1. **`aisafepy.flow`**. Capability-based, taint-propagating runtime around tool-calling agents (CaMeL / FIDES / RTBAS-style information-flow control), packaged as drop-in adapters for OpenAI Agents SDK, LangGraph, LlamaIndex, Anthropic tools, and MCP servers.
2. **`aisafepy.stream`**. Streaming-native cascaded guardrails with deterministic Tier-1, small-classifier Tier-2, and optional white-box activation probes / LLM-judge Tier-3, plus an explicit p95 latency budget and structured `GuardDecision`s.
3. **`aisafepy.adapt`**. A continuous eval-to-guardrail compiler that promotes PyRIT / Garak / Inspect failures into runtime guards: distilled classifiers, synthesized regexes, Cedar/OPA policy rules, steering vectors (for self-hosted models), and deliberative cases.

## Status

**Alpha (v0.1).** API surface is stable enough to build against, but expect rough edges and missing optional dependencies in the heavier extras.

## Install

```bash
pip install aisafepy                       # core only
pip install "aisafepy[stream]"             # + HF classifiers, regex, deterministic Tier 1/2
pip install "aisafepy[probes]"             # + linear/MLP activation probes for HF models
pip install "aisafepy[adapt]"              # + clustering and compiler targets
pip install "aisafepy[flow-openai]"        # + OpenAI Agents SDK adapter
pip install "aisafepy[all]"                # everything except contrib-* extras
```

For development:

```bash
uv venv
uv pip install -e ".[dev,all]"
uv run pytest
```

## Quickstart

### `flow`: defeating indirect prompt injection by construction

```python
from aisafepy.flow import Policy, Capability, secure_agent, Tainted
from agents import Agent, Runner  # openai-agents

policy = (
    Policy()
    .label_source("web.fetch", integrity="UNTRUSTED")
    .label_source("gmail.read", integrity="UNTRUSTED", caps={Capability.READ_USER})
    .label_source("user_prompt", integrity="TRUSTED")
    .require("send_email", control_flow_integrity="TRUSTED")
    .require("payments.transfer", control_flow_integrity="TRUSTED",
             caps={Capability.WRITE_EXTERNAL})
    .deny_if("send_email",
             when=lambda to, body: "read.secrets" in body.provenance,
             reason="secret-to-external-sink")
)

agent = Agent(name="ops-bot", tools=[gmail_read, web_fetch, send_email, transfer])
safe_agent = secure_agent(agent, policy=policy)
result = Runner.run_sync(safe_agent, "Read my last email and act on it.")
```

### `stream`: cascaded guardrails with a latency budget

```python
from aisafepy.stream import (
    GuardPipeline, RegexGuard, ClassifierGuard, probes,
)

pipeline = GuardPipeline(
    tier1=[
        RegexGuard.compile_pii(),
        RegexGuard.blocklist(["api_key=", "BEGIN PRIVATE"]),
    ],
    tier2=[ClassifierGuard.from_hf("meta-llama/Llama-Prompt-Guard-2-22M")],
    tier3=[ClassifierGuard.from_hf("meta-llama/Llama-Guard-4")],
    budget_ms_p95=80,
)

async for chunk_or_decision in pipeline.guard_stream(model.generate_stream(prompt)):
    if hasattr(chunk_or_decision, "action"):
        log_otel(chunk_or_decision)
        break
    yield chunk_or_decision
```

### `adapt`: PyRIT failures → deployed guard pipeline

```python
from aisafepy.adapt import PyRITSource, GuardCompiler, Target, promote
from aisafepy.stream import GuardPipeline

source = PyRITSource(memory_db="pyrit_memory.duckdb")
compiler = GuardCompiler(
    source=source,
    targets=[
        Target.distill_classifier(base="meta-llama/Llama-Prompt-Guard-2-22M"),
        Target.synthesize_regex(min_precision=0.99),
        Target.steering_vector(model="Qwen/Qwen3-8B-Instruct"),
        Target.deliberative_case(policy="policies/company_safety.md"),
    ],
    min_attack_success_rate=0.05,
)
report = compiler.compile()
promote(report, to=GuardPipeline.from_yaml("guards.yaml"),
        canary_traffic_pct=1.0, fp_budget=0.005)
```

## Layout

```
src/aisafepy/
├── core/           # shared primitives: GuardDecision, telemetry, budgets, progress, policies
├── flow/           # Gap 1. Capability-based IFC
│   └── adapters/   # openai_agents, langgraph, llamaindex, anthropic_tools, mcp
├── stream/         # Gap 2. Streaming cascade
│   └── adapters/   # openai_agents, langchain, llamaindex
├── adapt/          # Gap 3. Eval-to-guardrail compiler
│   └── compile/    # classifier, regex, policy, steering, deliberative
└── contrib/        # thin wrappers: presidio, llama_guard, shield_gemma, prompt_guard, llm_guard, lakera
```

## Design principles

1. **Pythonic, not DSL-first.** Decorators and types, not Colang. Cedar / OPA appears only as an emission target inside `adapt.compile.policy`.
2. **Composable primitives.** Every guard is a `Callable[[Context], Awaitable[GuardDecision]]`. Pipelines, IFC, and `adapt` all consume and produce this type.
3. **Bring your own model.** No proprietary models are shipped. `contrib/` wraps Llama Guard 4, ShieldGemma, Prompt Guard 2, llm-guard, Lakera, Presidio.
4. **Defense in depth.** `flow` (architectural) + `stream` (detective) + `adapt` (continuous) compose.
5. **Observability is a first-class output.** Structured `GuardDecision` / `IFCViolation`, OpenTelemetry-native, with explicit `why_blocked` + `evidence`.
6. **Self-hosted parity.** Probe-based and steering-based features work on HF Transformers; hosted APIs fall back to classifier guards.

## Caveats

Capability-based defenses reduce risk dramatically but are not free. CaMeL reports ~2.7× tokens, RTBAS ~2% utility loss. Streaming forecasters require MC rollouts or token-level supervision to train. Activation probes are model-specific. AIsafePy does not solve sleeper-agent / deceptive-alignment problems. See `docs/CAVEATS.md`.

## License

Apache-2.0. See `LICENSE`.
