Metadata-Version: 2.4
Name: agent-safety-layer
Version: 0.1.0
Summary: Production-grade safety boundaries for AI agents - policies, tracing, replay, and human-in-the-loop approval
Author-email: Korah Stone <korahcomm@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/KorahStone/agent-safety-layer
Project-URL: Repository, https://github.com/KorahStone/agent-safety-layer
Project-URL: Issues, https://github.com/KorahStone/agent-safety-layer/issues
Keywords: ai,agents,safety,llm,boundaries,security,tracing,replay,guardrails
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Security
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: langchain
Requires-Dist: langchain>=0.1.0; extra == "langchain"
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.18.0; extra == "anthropic"
Provides-Extra: all
Requires-Dist: agent-safety-layer[anthropic,langchain,openai]; extra == "all"
Dynamic: license-file

# agent-safety-layer

Production-grade safety boundaries for AI agents — policies, runtime limits, execution tracing, replay, and human-in-the-loop approval.

[![PyPI version](https://badge.fury.io/py/agent-safety-layer.svg)](https://badge.fury.io/py/agent-safety-layer)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Why?

AI agents can do dangerous things — delete files, drop databases, send emails, make API calls. This library provides guardrails:

- **Block dangerous operations** before they execute
- **Sandbox file and network access** to allowed paths/hosts
- **Trace everything** for debugging and auditing
- **Replay sessions** to test policy changes safely
- **Human approval gates** for sensitive operations

## Installation

```bash
pip install agent-safety-layer
```
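
To pull in optional integration dependencies, use the extras declared in the package metadata (`langchain`, `openai`, `anthropic`, or `all`):

```bash
pip install "agent-safety-layer[openai]"   # one integration
pip install "agent-safety-layer[all]"      # everything
```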

## Quick Start

```python
from agent_safety_layer import SafetyLayer, Policy, PathBoundary, NetworkBoundary

# Create a safety layer with policies and boundaries
safety = SafetyLayer(
    policies=[
        # Block dangerous shell commands
        Policy.block_pattern(r"rm\s+-rf\s+/", "No recursive delete from root"),
        Policy.block_pattern(r"DROP\s+TABLE", "No DROP TABLE in production"),
        
        # Require approval for emails
        Policy.require_approval("send_email", timeout=300),
    ],
    boundaries=[
        # Only allow file access to these paths
        PathBoundary(allowed_paths=["/tmp", "/home/user/workspace"]),
        
        # Only allow these API hosts
        NetworkBoundary(allowed_hosts=["api.openai.com", "api.anthropic.com"]),
    ],
)

# Use the decorator to guard functions
@safety.guard
def execute_command(cmd: str) -> str:
    import subprocess
    return subprocess.run(cmd, shell=True, capture_output=True).stdout.decode()

# This works fine
execute_command("ls -la /tmp")

# This raises SafetyViolation
execute_command("rm -rf /")  # Blocked by policy!
```
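
When a guarded call is blocked, it raises `SafetyViolation` (with the default `raise_on_violation=True`). A minimal sketch of handling it gracefully, assuming the exception is importable from the top-level package:

```python
from agent_safety_layer import SafetyViolation

try:
    execute_command("rm -rf /")
except SafetyViolation as exc:
    # log and continue instead of crashing the agent loop
    print(f"Operation blocked: {exc}")
```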

## Features

### Policy-Based Blocking

Define rules that block, warn on, or simply audit matching operations:

```python
from agent_safety_layer import Policy, PolicyAction, PolicyResult

# Block by pattern
Policy.block_pattern(r"DROP\s+TABLE", "No DROP TABLE")

# Warn on pattern (logs but doesn't block)
Policy.warn_pattern(r"sudo", "Warning: using sudo")

# Audit pattern (just records)
Policy.audit_pattern(r"SELECT.*FROM", "Auditing DB queries")

# Custom policy logic
def check_cost(operation: str, context: dict):
    if context.get("estimated_cost", 0) > 100:
        return PolicyResult(
            action=PolicyAction.BLOCK,
            policy_name="cost_limit",
            reason="Operation exceeds cost limit",
        )
    return None

Policy.custom("cost_check", check_cost)
```
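
A sketch of wiring the custom policy into a layer and exercising it via `check` (see the API reference below); the operation name and context values are illustrative, and `SafetyViolation` is assumed importable from the top-level package:

```python
from agent_safety_layer import SafetyLayer, SafetyViolation

safety = SafetyLayer(policies=[Policy.custom("cost_check", check_cost)])

try:
    # with the default raise_on_violation=True, a blocked check raises
    safety.check("call_pricing_api", context={"estimated_cost": 250})
except SafetyViolation:
    print("cost_limit policy fired")
```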

### Runtime Boundaries

Restrict what resources your agent can access:

```python
from agent_safety_layer import (
    PathBoundary,
    NetworkBoundary,
    TimeBoundary,
    ResourceBoundary,
)

# File system sandboxing
PathBoundary(
    allowed_paths=["/tmp", "/home/user/workspace"],
    blocked_paths=["/etc", "/var"],
    block_patterns=["*.exe", "*.dll"],
)

# Network access control
NetworkBoundary(
    allowed_hosts=["api.openai.com", "*.anthropic.com"],
    blocked_hosts=["localhost", "127.0.0.1"],
    blocked_ports=[22, 23, 3389],  # SSH, Telnet, RDP
    allow_private_ips=False,
)

# Execution time limits
TimeBoundary(
    max_execution_time=60.0,  # Per operation
    max_total_time=3600.0,    # Total session time
)

# Resource limits
ResourceBoundary(
    max_memory_mb=1024,
    max_cpu_percent=80,
    max_operations=1000,
)
```
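
Boundaries plug into the same `SafetyLayer` as policies and are enforced through the same `guard` decorator; a minimal sketch (the guarded function is illustrative):

```python
from agent_safety_layer import SafetyLayer, PathBoundary

safety = SafetyLayer(
    boundaries=[PathBoundary(allowed_paths=["/tmp"])],
)

@safety.guard
def read_text(path: str) -> str:
    with open(path) as f:
        return f.read()

read_text("/tmp/notes.txt")   # inside the sandbox: allowed
read_text("/etc/passwd")      # outside the sandbox: raises SafetyViolation
```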

### Execution Tracing

Record everything for debugging and auditing:

```python
from agent_safety_layer import SafetyLayer, TraceExporter

safety = SafetyLayer(trace=True)

with safety.session(name="my_session") as session:
    session.execute("read_file", lambda: read_file("/tmp/data.txt"))
    session.execute("process_data", lambda: process(data))
    session.log("Processing complete")

# Export trace
trace = session.finish()
print(TraceExporter.to_summary(trace))
TraceExporter.to_file(trace, "trace.json")
```

Output:
```
Trace: my_session (abc-123)
Started: 2024-01-15T10:30:00
Ended: 2024-01-15T10:30:05
Duration: 5000.00ms
Entries: 3
Errors: 0
Blocked: 0

Operations:
  ✓ read_file (50.0ms)
  ✓ process_data (4900.0ms)
  ✓ Processing complete (0.0ms)
```
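
Exported traces are plain JSON, so they can be post-processed with the standard library. A sketch, assuming the export is a JSON object whose `entries` carry `operation` and `duration_ms` fields (the field names are assumptions; check your exported file):

```python
import json

with open("trace.json") as f:
    trace_data = json.load(f)

# surface slow operations; field names assumed, see note above
for entry in trace_data.get("entries", []):
    if entry.get("duration_ms", 0) > 1000:
        print(entry.get("operation"), entry.get("duration_ms"))
```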

### Session Replay

Record sessions and replay with different policies:

```python
from agent_safety_layer import SafetyLayer, SessionReplayer, Policy

# Record a session
safety = SafetyLayer()
with safety.session(record=True) as session:
    session.execute("op1", lambda: do_thing_1())
    session.execute("op2", lambda: do_thing_2())
    session.execute("rm -rf /tmp/test", lambda: cleanup())

recording = session.get_recording()
recording.save("session.json")

# Replay with stricter policies
replayer = SessionReplayer(policies=[
    Policy.block_pattern(r"rm\s+-rf", "No rm -rf allowed"),
])

result = replayer.replay(recording)
print(f"Blocked: {result.blocked_operations}/{result.total_operations}")
print(f"Would block: {result.blocked_details}")

# Compare policy sets
results = replayer.compare_policies(recording, {
    "permissive": [],
    "moderate": [Policy.warn_pattern(r"rm", "Warning on rm")],
    "strict": [Policy.block_pattern(r"rm", "Block all rm")],
})

for name, result in results.items():
    print(f"{name}: {result.block_rate}% blocked")
```
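
One practical use is a policy regression gate in CI: replay recorded production sessions against a candidate policy set and fail the build if anything would newly be blocked. A sketch reusing the `replayer` and `recording` from above, with only the `ReplayResult` fields already shown:

```python
import sys

result = replayer.replay(recording)
if result.blocked_operations > 0:
    print(f"{result.blocked_operations}/{result.total_operations} operations would be blocked:")
    print(result.blocked_details)
    sys.exit(1)  # fail the pipeline so the policy change gets reviewed
```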

### Human-in-the-Loop Approval

Gate sensitive operations on human approval:

```python
from agent_safety_layer import SafetyLayer, ApprovalGate, Policy
import threading

# Set up approval gate
gate = ApprovalGate(
    default_timeout=300,  # 5 minutes
    on_request=lambda r: print(f"Approval needed: {r.operation}"),
)

safety = SafetyLayer(
    policies=[
        Policy.require_approval("send_email", timeout=60),
        Policy.require_approval("delete_user", timeout=300),
    ],
    approval_gate=gate,
)

# In another thread, handle approvals
def approval_handler():
    while True:
        for request in gate.get_pending():
            print(f"Approve {request.operation}? (y/n)")
            if input() == "y":
                gate.approve(request.id, responder="admin")
            else:
                gate.deny(request.id, responder="admin", message="Not allowed")

threading.Thread(target=approval_handler, daemon=True).start()

# Agent code
with safety.session() as session:
    # This will block until approved or timeout
    session.execute(
        "send_email",
        lambda: send_email(to="user@example.com", body="Hello!"),
        context={"operation_type": "send_email"},
    )
```
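
For automated tests, the `on_request` hook can answer immediately instead of waiting on a human; a hedged sketch (assumes the hook fires only after the gate is constructed):

```python
from agent_safety_layer import ApprovalGate

# auto-approve every request in tests; never do this in production
gate = ApprovalGate(on_request=lambda r: gate.approve(r.id, responder="test-bot"))
```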

### Convenience Function

For common setups:

```python
from agent_safety_layer import create_safety_layer

safety = create_safety_layer(
    block_dangerous_commands=True,   # rm -rf, DROP TABLE, etc.
    block_production_access=True,    # Block *prod* database access
    allowed_paths=["/tmp", "/home/user"],
    allowed_hosts=["api.openai.com"],
    enable_tracing=True,
    enable_approval=False,
)
```
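
The returned object is used like any other layer, so the `guard` decorator from the Quick Start works unchanged; the blocked command below is illustrative:

```python
@safety.guard
def shell(cmd: str) -> str:
    import subprocess
    return subprocess.run(cmd, shell=True, capture_output=True).stdout.decode()

shell("psql -c 'DROP TABLE users'")  # caught by block_dangerous_commands
```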

## API Reference

### SafetyLayer

Main class that ties everything together.

```python
SafetyLayer(
    policies: List[Policy] = None,
    boundaries: List[Boundary] = None,
    approval_gate: ApprovalGate = None,
    trace: bool = True,
    raise_on_violation: bool = True,
    on_violation: Callable = None,
)
```

Methods:
- `check(operation, context)` — Check if operation is allowed
- `guard` — Decorator to guard functions
- `session(name, record)` — Context manager for traced sessions
- `add_policy(policy)` / `remove_policy(name)`
- `add_boundary(boundary)` / `remove_boundary(name)`
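
A sketch of tightening a live layer during an incident and lifting it afterwards; the policy logic is illustrative, and it assumes `remove_policy` takes the name given to `Policy.custom`:

```python
from agent_safety_layer import Policy, PolicyAction, PolicyResult

def lockdown_check(operation: str, context: dict):
    # freeze anything that looks like a write while the incident is open
    if "write" in operation or "delete" in operation:
        return PolicyResult(
            action=PolicyAction.BLOCK,
            policy_name="incident_lockdown",
            reason="Writes frozen during incident",
        )
    return None

safety.add_policy(Policy.custom("incident_lockdown", lockdown_check))
# later, once the incident is resolved, lift the restriction by name
safety.remove_policy("incident_lockdown")
```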

### Policy

Factory methods for creating policies:
- `Policy.block_pattern(pattern, reason)` — Block matching operations
- `Policy.warn_pattern(pattern, reason)` — Warn on matches
- `Policy.audit_pattern(pattern, reason)` — Just record matches
- `Policy.require_approval(op_type, timeout)` — Require human approval
- `Policy.custom(name, check_fn)` — Custom logic

### Boundaries

- `PathBoundary(allowed_paths, blocked_paths, allow_patterns, block_patterns)`
- `NetworkBoundary(allowed_hosts, blocked_hosts, allowed_ports, blocked_ports, allow_private_ips)`
- `TimeBoundary(max_execution_time, max_total_time)`
- `ResourceBoundary(max_memory_mb, max_cpu_percent, max_open_files, max_operations)`

### Tracing

- `Tracer` — Records operations
- `Trace` — Container for trace entries
- `TraceExporter` — Export to JSON, files, summaries

### Replay

- `SessionRecorder` — Records operations for replay
- `SessionReplayer` — Replays with different policies
- `ReplayResult` — Analysis of what would be blocked

### Approval

- `ApprovalGate` — Manages approval requests
- `ApprovalRequest` — A pending approval
- `InMemoryApprovalQueue` — Simple queue implementation

## Framework Integrations

Coming soon: LangChain, OpenAI, Anthropic integrations.

## License

MIT
