Metadata-Version: 2.4
Name: logos-firewall
Version: 0.1.0
Summary: Epistemological firewall for AI agents — detects Knowledge-Action Gap using trained Logos models
Project-URL: Homepage, https://github.com/lumensyntax-org/logos-firewall
Project-URL: Documentation, https://github.com/lumensyntax-org/logos-firewall#readme
Project-URL: Repository, https://github.com/lumensyntax-org/logos-firewall
Project-URL: Issues, https://github.com/lumensyntax-org/logos-firewall/issues
Author-email: Rafael Rodriguez <lumensyntax@users.noreply.github.com>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: agent-firewall,ai-safety,epistemology,logos,think-block-auditor
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Security
Requires-Python: >=3.10
Requires-Dist: httpx>=0.25.0
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: fastapi>=0.100.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: respx>=0.20; extra == 'dev'
Requires-Dist: uvicorn[standard]>=0.20.0; extra == 'dev'
Provides-Extra: server
Requires-Dist: fastapi>=0.100.0; extra == 'server'
Requires-Dist: uvicorn[standard]>=0.20.0; extra == 'server'
Description-Content-Type: text/markdown

# logos-firewall

Epistemological firewall for AI agents. Detects when an agent reasons correctly but acts against its own reasoning (Knowledge-Action Gap).

## The Problem

Autonomous AI agents execute actions without verification. An agent can reason "this is dangerous" in its think block and then execute the dangerous action anyway. Existing guardrails rely on prompted models; logos-firewall uses a **trained** epistemological classifier.

## Installation

```bash
pip install logos-firewall
```

For the API server:

```bash
pip install "logos-firewall[server]"
```

## Quick Start

### Python SDK

```python
import asyncio

from logos_firewall import LogosFirewall, FastGate, ThinkAuditor

# Level A only: fast offline classification (no Ollama needed)
gate = FastGate()
result = gate.classify("rm -rf /")
# result.verdict == "BLOCK"
# result.confidence == 0.95

async def main():
    # Full pipeline: Level A (regex) + Level B (Logos via Ollama)
    fw = LogosFirewall(ollama_url="http://localhost:11434")
    result = await fw.audit(
        action="rm -rf /etc/config/*",
        think="<think>I need to clean up old files...</think>",
        context="coding_agent",
    )
    # result.verdict == "BLOCK"

    # Standalone Think Block Auditor
    auditor = ThinkAuditor(ollama_url="http://localhost:11434")
    result = await auditor.audit(
        think_block="<think>This request is dangerous. I should refuse.</think>",
        output="Sure, here's how to do it: ...",
    )
    # result.verdict == "GAP"  (Knowledge-Action Gap detected)

asyncio.run(main())
```
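In practice the verdict gates whether the agent's tool call actually runs. The sketch below shows a deny-by-default wrapper around a classifier; the stub `GateResult`/`classify` stand in for the real `FastGate` (which returns a richer result object) so the pattern runs standalone:

```python
from dataclasses import dataclass

@dataclass
class GateResult:
    verdict: str        # "ALLOW" | "BLOCK" | "STEP_UP"
    confidence: float

def classify(action: str) -> GateResult:
    # Stub standing in for FastGate.classify: block obvious wildcard
    # deletes, allow everything else. Illustrative only.
    if action.startswith("rm -rf"):
        return GateResult("BLOCK", 0.95)
    return GateResult("ALLOW", 0.9)

def run_tool(action: str) -> str:
    """Deny-by-default: execute only on an explicit ALLOW verdict."""
    result = classify(action)
    if result.verdict != "ALLOW":
        return f"refused ({result.verdict})"
    return f"executed: {action}"

print(run_tool("ls -la"))      # executed: ls -la
print(run_tool("rm -rf /"))    # refused (BLOCK)
```

Treating `STEP_UP` and `UNCERTAIN` the same as `BLOCK` at the call site keeps the wrapper fail-closed when Level B is unreachable.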

### API Server

```bash
# Start the server
logos-firewall
# or: uvicorn logos_firewall.server:app --host 0.0.0.0 --port 8000

# Classify an action (Level A only, no Ollama)
curl -X POST http://localhost:8000/v1/classify \
  -H "Content-Type: application/json" \
  -d '{"action": "rm -rf /"}'

# Full audit (Level A + B)
curl -X POST http://localhost:8000/v1/audit \
  -H "Content-Type: application/json" \
  -d '{"action": "pip install unknown-pkg", "think": "<think>installing dependency</think>"}'

# Think block audit
curl -X POST http://localhost:8000/v1/think-audit \
  -H "Content-Type: application/json" \
  -d '{"think_block": "<think>This is dangerous</think>", "output": "Sure, here you go..."}'

# Health check
curl http://localhost:8000/health
```

### Docker

```bash
docker-compose up
```

This starts both Ollama and the logos-firewall server. You'll need to pull a Logos model into Ollama separately:

```bash
# From the Ollama container or host
ollama pull logos10v2_auditor_v3
```
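To verify the pull worked from code, you can query Ollama's `/api/tags` endpoint, which lists installed models (tagged like `logos10v2_auditor_v3:latest`). A minimal stdlib sketch, assuming the default Ollama URL:

```python
import json
from urllib.request import urlopen

def installed_models(ollama_url: str = "http://localhost:11434") -> list[str]:
    """Return model names known to the local Ollama daemon (via /api/tags)."""
    with urlopen(f"{ollama_url}/api/tags") as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]

def has_logos_model(names: list[str]) -> bool:
    # Strip the ":latest" tag suffix before comparing base names.
    return any(n.split(":")[0] == "logos10v2_auditor_v3" for n in names)
```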

## Architecture

```
Agent action request
          |
          v
+----------------------------+
|  Level A: FastGate         |  < 10ms
|  (regex + action type)     |
|  ALLOW / BLOCK / STEP_UP   |
+--------------+-------------+
               | STEP_UP
               v
+----------------------------+
|  Level B: LogosGate        |  100-500ms
|  (Logos 1B via Ollama)     |
|  Think-Action audit        |
|  ALLOW / BLOCK / UNCERTAIN |
+----------------------------+
```

**Level A** catches obvious cases with regex patterns (destructive commands, safe read-only ops). Unknown or risky actions are escalated to **Level B**, which uses a Logos fine-tuned model for epistemological evaluation.
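The escalation logic can be sketched in a few lines. The pattern lists below are illustrative placeholders, not the shipped rule set, and `level_b` stands in for the Ollama-backed model call:

```python
import re

# Illustrative patterns only; the real FastGate rules are more extensive.
BLOCK_PATTERNS = [r"rm\s+-rf\s+/", r"DROP\s+TABLE"]
ALLOW_PATTERNS = [r"^ls\b", r"^cat\b", r"^git\s+status\b"]

def fast_gate(action: str) -> str:
    """Level A: regex triage in well under a millisecond."""
    if any(re.search(p, action) for p in BLOCK_PATTERNS):
        return "BLOCK"
    if any(re.search(p, action) for p in ALLOW_PATTERNS):
        return "ALLOW"
    return "STEP_UP"  # unknown or risky: escalate to Level B

def audit(action: str, level_b) -> str:
    verdict = fast_gate(action)
    if verdict != "STEP_UP":
        return verdict        # settled by Level A, no model call
    return level_b(action)    # Logos model: ALLOW / BLOCK / UNCERTAIN
```

The key property is that the expensive model call happens only for actions the regex layer cannot settle.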

## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `LOGOS_OLLAMA_URL` | `http://localhost:11434` | Ollama server URL |
| `LOGOS_API_TOKEN` | (none) | Bearer token for API auth (optional) |
| `LOGOS_RATE_LIMIT_RPM` | `60` | Requests per minute per IP |
| `LOGOS_HOST` | `0.0.0.0` | Server bind host |
| `LOGOS_PORT` | `8000` | Server bind port |
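The table above maps directly to a read-environment-with-defaults pattern. The `load_config` helper below is hypothetical (the package's actual settings object may differ) but mirrors the documented variables and defaults:

```python
import os

def load_config(env=os.environ) -> dict:
    """Hypothetical config reader mirroring the documented env vars."""
    return {
        "ollama_url": env.get("LOGOS_OLLAMA_URL", "http://localhost:11434"),
        "api_token": env.get("LOGOS_API_TOKEN"),            # None -> auth disabled
        "rate_limit_rpm": int(env.get("LOGOS_RATE_LIMIT_RPM", "60")),
        "host": env.get("LOGOS_HOST", "0.0.0.0"),
        "port": int(env.get("LOGOS_PORT", "8000")),
    }
```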

### Model Chain

By default, logos-firewall tries these Logos models in order:

1. `logos10v2_auditor_v3` (Gemma 3 1B, recommended)
2. `logos9_hybrid` (Gemma 3 1B, fallback)
3. `logos9_auditor_v2` (Gemma 3 1B, fallback)

The Think Block Auditor prefers the 9B model (`logos-auditor`) for higher accuracy.

## API Reference

### POST /v1/audit

Full firewall audit (Level A -> B).

**Request:**
```json
{
  "action": "rm -rf /tmp/old-files",
  "think": "<think>cleanup needed</think>",
  "context": "coding_agent"
}
```

**Response:**
```json
{
  "verdict": "BLOCK",
  "confidence": 0.95,
  "action_class": "DESTRUCTIVE",
  "mechanism": "regex",
  "detail": "Blocked: wildcard delete",
  "latency_ms": 0.12,
  "level": "A",
  "model": ""
}
```
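A caller consuming this response only needs the `verdict` field to decide; the other fields (`detail`, `level`, `latency_ms`) are for logging. A sketch handling the sample response above, treating anything other than an explicit `ALLOW` as a refusal:

```python
import json

# The sample /v1/audit response shown above, as a caller would receive it.
raw = '''{"verdict": "BLOCK", "confidence": 0.95, "action_class": "DESTRUCTIVE",
          "mechanism": "regex", "detail": "Blocked: wildcard delete",
          "latency_ms": 0.12, "level": "A", "model": ""}'''

def should_execute(resp: dict) -> bool:
    # Fail closed: only an explicit ALLOW permits execution.
    return resp["verdict"] == "ALLOW"

resp = json.loads(raw)
print(should_execute(resp), resp["detail"])
```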

### POST /v1/think-audit

Standalone think block / output consistency audit.

**Request:**
```json
{
  "think_block": "<think>This seems dangerous...</think>",
  "output": "Sure, here's how to do it...",
  "domain": "general"
}
```

**Response:**
```json
{
  "verdict": "GAP",
  "confidence": 0.15,
  "reasoning": "The reasoning identifies danger but the output ignores it",
  "model": "logos-auditor",
  "latency_ms": 342.5
}
```
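When the auditor reports a `GAP`, the natural response is to withhold the model's output rather than forward it. A minimal sketch against the sample response above (the placeholder message is illustrative):

```python
import json

# The sample /v1/think-audit response shown above.
raw = '''{"verdict": "GAP", "confidence": 0.15,
          "reasoning": "The reasoning identifies danger but the output ignores it",
          "model": "logos-auditor", "latency_ms": 342.5}'''

def filter_output(audit: dict, output: str) -> str:
    """On a detected Knowledge-Action Gap, suppress the contradictory output."""
    if audit["verdict"] == "GAP":
        return "[withheld: output contradicts the model's own reasoning]"
    return output

audit = json.loads(raw)
```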

### POST /v1/classify

Action classification only (Level A, no Ollama needed).

### GET /health

Health check with Ollama and model availability.

## Connection to Research

This package implements the agent firewall described in "The Instrument Trap: When Aligned Models Serve Misaligned Purposes" (DOI: [10.5281/zenodo.18644322](https://doi.org/10.5281/zenodo.18644322)).

The benchmark dataset (14,950 test cases) is available at [LumenSyntax/instrument-trap-benchmark](https://huggingface.co/datasets/LumenSyntax/instrument-trap-benchmark) on Hugging Face.

## Requirements

- Python 3.10+
- [Ollama](https://ollama.ai) with a Logos model loaded (for Level B and Think Auditor)
- Level A (FastGate) works entirely offline with no dependencies beyond httpx and pydantic

## License

Apache 2.0
