Metadata-Version: 2.4
Name: spooled-ai
Version: 0.5.1
Summary: CI for AI agents - behavioral fingerprinting and drift detection
Author: Spooled Team
License: Proprietary
Project-URL: Homepage, https://spooled.ai
Project-URL: Documentation, https://spooled.ai/docs/getting-started/quickstart
Keywords: ai,agents,tracing,debugging,replay,ci,behavioral-testing,llm
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic<3.0.0,>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: structlog>=23.0.0
Requires-Dist: typer>=0.9.0
Requires-Dist: rich>=13.0.0
Requires-Dist: httpx>=0.24.0
Requires-Dist: requests>=2.31.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: mypy>=1.5.0; extra == "dev"
Requires-Dist: pre-commit>=3.5.0; extra == "dev"
Requires-Dist: types-requests; extra == "dev"
Requires-Dist: types-python-dateutil; extra == "dev"
Provides-Extra: cli
Provides-Extra: backend
Requires-Dist: aws-cdk-lib>=2.100.0; extra == "backend"
Requires-Dist: constructs>=10.0.0; extra == "backend"
Requires-Dist: boto3>=1.28.0; extra == "backend"
Provides-Extra: tools
Requires-Dist: langchain>=0.3.0; extra == "tools"
Requires-Dist: langchain-community>=0.3.0; extra == "tools"
Requires-Dist: langchain-openai>=0.2.0; extra == "tools"
Requires-Dist: langgraph>=0.2.0; extra == "tools"
Requires-Dist: crewai>=0.80.0; extra == "tools"
Requires-Dist: llama-index>=0.10.0; extra == "tools"
Requires-Dist: pyautogen>=0.2.0; extra == "tools"
Requires-Dist: aiohttp>=3.9.0; extra == "tools"
Requires-Dist: boto3>=1.28.0; extra == "tools"
Provides-Extra: metrics
Requires-Dist: prometheus-client>=0.20.0; extra == "metrics"
Provides-Extra: otel
Requires-Dist: opentelemetry-api>=1.20.0; extra == "otel"
Requires-Dist: opentelemetry-sdk>=1.20.0; extra == "otel"
Requires-Dist: opentelemetry-exporter-otlp-proto-http>=1.20.0; extra == "otel"
Dynamic: license-file

# Spooled — Behavioral CI for AI Agents

> **One prompt edit quietly turned this customer-support agent into a refund machine.
> Spooled caught it on the PR.**

A PM asks for "a more helpful tone for frustrated customers." An engineer adds **one sentence** to the system prompt: *"Resolve their issue when possible."* Unit tests pass. The reviewer approves. The PR is ready to merge.

But the LLM now interprets "resolve" liberally. On complaint tickets, the agent stops escalating refund requests to humans and starts issuing refunds itself. The agent's tool-call structure changed, even though the prompt edit looked harmless.

Spooled diffs the agent's behavior against the committed baseline and posts this on the PR:

```
🚨 Merge blocked: agent now calls `issue_refund`

This tool was never observed in the baseline. It appears in
2 of 5 traces in this PR (~40%).

Triggered by a one-sentence change to the system prompt.
```

Caught content-blind — Spooled compared tool graphs, not language. It never saw a customer message or an LLM response.


## Run it yourself in 60 seconds

```bash
pip install spooled-ai
spooled demo
```

Runs the entire scenario in your terminal — no API key, no setup, no files left behind. The variant agent differs from the baseline by exactly one line in the system prompt. The code is otherwise identical.

## What It Does

**Capture** — wraps your LLM client and records the structural fingerprint of every agent run: which tools were called, in what order, how many times. Content-blind by architecture — prompts, customer data, and AI responses never leave your infrastructure.
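To make "structural fingerprint" concrete, here is a minimal sketch that reduces a run to its ordered tool-call names and hashes them. The `structural_summary` and `fingerprint` helpers, the tool names `lookup_order` and `check_refund_policy`, and the choice of SHA-256 over canonical JSON are all illustrative assumptions, not Spooled's actual trace format or hashing scheme.

```python
import hashlib
import json
from collections import Counter


def structural_summary(tool_calls: list[str]) -> dict:
    """Keep only structure: which tools ran, in what order, and how many times."""
    return {
        "sequence": list(tool_calls),
        "counts": dict(Counter(tool_calls)),
    }


def fingerprint(tool_calls: list[str]) -> str:
    """Hypothetical fingerprint: SHA-256 over a canonical JSON encoding of the summary."""
    # sort_keys applies recursively, so dict ordering never changes the hash.
    canonical = json.dumps(
        structural_summary(tool_calls), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical runs: same code, one tool swapped.
baseline = ["lookup_order", "check_refund_policy", "escalate_to_human"]
variant = ["lookup_order", "check_refund_policy", "issue_refund"]

print(fingerprint(baseline) == fingerprint(list(baseline)))  # True: same structure
print(fingerprint(baseline) == fingerprint(variant))  # False: a tool changed
print(fingerprint(["a", "b"]) == fingerprint(["b", "a"]))  # False: order matters
```

Because only tool names and their order feed the hash, two runs with completely different customer messages still produce the same fingerprint if the agent took the same path.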

**Compare** — diffs the current run against a committed baseline. Shows exactly what changed: tools added, tools removed, sequence reordered, token usage shifted.
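A comparison along those four axes can be sketched with plain set and list operations. `diff_runs` and `BehaviorDiff` are hypothetical names, not Spooled's API, and the token counts in the example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BehaviorDiff:
    added: list[str]
    removed: list[str]
    sequence_changed: bool
    token_delta: int


def diff_runs(
    baseline_tools: list[str],
    current_tools: list[str],
    baseline_tokens: int,
    current_tokens: int,
) -> BehaviorDiff:
    """Compare two runs by structure only (hypothetical helper)."""
    base_set, cur_set = set(baseline_tools), set(current_tools)
    shared = base_set & cur_set
    # Restrict both sequences to shared tools so additions/removals don't count as
    # reordering; a mismatch means shared tools were reordered or repeated differently.
    base_seq = [t for t in baseline_tools if t in shared]
    cur_seq = [t for t in current_tools if t in shared]
    return BehaviorDiff(
        added=sorted(cur_set - base_set),
        removed=sorted(base_set - cur_set),
        sequence_changed=base_seq != cur_seq,
        token_delta=current_tokens - baseline_tokens,
    )


# Made-up token counts, for illustration only.
print(diff_runs(
    ["lookup_order", "check_refund_policy", "escalate_to_human"],
    ["lookup_order", "check_refund_policy", "issue_refund"],
    baseline_tokens=1200,
    current_tokens=950,
))
# BehaviorDiff(added=['issue_refund'], removed=['escalate_to_human'], sequence_changed=False, token_delta=-250)
```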

**Gate** — posts a PR comment with the human-readable consequence as the headline. Blocks the merge if the policy says so. Resolution instructions included.
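One gating rule, "block if the PR introduces a tool the baseline never used," can be sketched in a few lines; it produces a headline like the one in the example PR comment. `gate_new_tools` is a hypothetical function, and this is not the schema of `spooled-policy.yml`; the tool names are illustrative.

```python
def gate_new_tools(
    baseline_traces: list[list[str]], pr_traces: list[list[str]]
) -> tuple[bool, list[str]]:
    """Hypothetical policy: fail if any PR trace calls a tool the baseline never used."""
    seen = {tool for trace in baseline_traces for tool in trace}
    new_tools = sorted({tool for trace in pr_traces for tool in trace} - seen)
    messages = []
    for tool in new_tools:
        # How widespread is the new behavior across this PR's traces?
        hits = sum(1 for trace in pr_traces if tool in trace)
        pct = round(100 * hits / len(pr_traces))
        messages.append(
            f"Merge blocked: agent now calls `{tool}` "
            f"({hits} of {len(pr_traces)} traces, ~{pct}%)"
        )
    return not new_tools, messages


baseline = [["lookup_order", "escalate_to_human"]] * 5
pr = [["lookup_order", "escalate_to_human"]] * 3 + [["lookup_order", "issue_refund"]] * 2

passed, messages = gate_new_tools(baseline, pr)
print(passed)  # False
for line in messages:
    print(line)  # Merge blocked: agent now calls `issue_refund` (2 of 5 traces, ~40%)
```

In CI, a wrapper around a check like this would exit non-zero when `passed` is false; if the job is configured as a required status check, GitHub then refuses the merge.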

## Install

```bash
pip install spooled-ai
```

## Quick Start

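```python
import spooled
from spooled.wrappers import wrap_openai
from openai import OpenAI

# Hypothetical example tool (OpenAI function-calling format); use your agent's tools.
MY_TOOLS = [{
    "type": "function",
    "function": {
        "name": "lookup_deal",
        "description": "Fetch a deal record by ID.",
        "parameters": {
            "type": "object",
            "properties": {"deal_id": {"type": "string"}},
            "required": ["deal_id"],
        },
    },
}]

spooled.init(agent_id="my_agent")
client = wrap_openai(OpenAI())  # OpenAI() reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Analyze this deal"}],
    tools=MY_TOOLS,
)

spooled.shutdown()
```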

That's it. Every tool call is captured, and the trace is saved to `.spooled/traces/`. Each interaction is added to a hash chain at capture time, so later tampering with a trace is detectable.
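A hash chain makes after-the-fact edits detectable: each record's hash covers the previous record's hash, so changing one entry invalidates every hash after it. The sketch below shows the general technique with SHA-256 over canonical JSON. The field names (`prev_hash`, `hash`), the all-zeros genesis value, and the record shape are assumptions, not Spooled's on-disk format. A plain hash chain is tamper-evident rather than a signature; binding it to an identity would additionally need a key, for example an HMAC or a digital signature over the latest hash.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record (assumption)


def _link_hash(prev_hash: str, record: dict) -> str:
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def chain_records(records: list[dict]) -> list[dict]:
    """Attach prev_hash/hash to each record so every entry commits to all earlier ones."""
    prev = GENESIS
    chained = []
    for record in records:
        digest = _link_hash(prev, record)
        chained.append({**record, "prev_hash": prev, "hash": digest})
        prev = digest
    return chained


def verify_chain(chained: list[dict]) -> bool:
    """Recompute every link; editing, reordering, or deleting a non-final entry breaks it."""
    prev = GENESIS
    for entry in chained:
        record = {k: v for k, v in entry.items() if k not in ("prev_hash", "hash")}
        if entry["prev_hash"] != prev or _link_hash(prev, record) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


trace = chain_records([
    {"seq": 0, "tool": "lookup_order"},
    {"seq": 1, "tool": "escalate_to_human"},
])
print(verify_chain(trace))  # True
trace[1]["tool"] = "issue_refund"  # tamper after the fact
print(verify_chain(trace))  # False
```

Truncating the final entries is not caught by the chain alone; a common remedy is to store or sign the latest head hash separately.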

## CI Integration

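```yaml
# .github/workflows/spooled.yml
name: Spooled

on:
  pull_request:

jobs:
  behavioral-ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      # Install Spooled first: the script that generates traces uses its wrappers.
      - name: Install dependencies
        run: pip install spooled-ai openai

      - name: Generate traces
        run: python ci_runner.py  # your own script that exercises the agent
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

      - name: Spooled behavioral check
        run: |
          spooled ci compare .spooled/traces/*.jsonl \
            --baseline .github/baselines \
            --policy spooled-policy.yml \
            --enable-blocking
```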

Example PR comment:

```markdown
## ❌ Spooled Behavioral CI: FAIL
> Spooled Score: 59/100 (D) 🔴

> [!CAUTION]
> ## 🚨 Merge blocked: agent now calls `issue_refund`
>
> This tool was **never observed in the baseline**. It appears in
> **2 of 5** traces in this PR (~40%).

**5** traces analyzed  |  ✅ **3** passed  |  ❌ **2** policy failures

### Trace Results
| Agent          | Fingerprint     | Status        | Score |
|----------------|-----------------|---------------|-------|
| support_agent  | `4d893b5cef...` | ⚠️ Behavior change | 59 |

<details>
  <summary>🔧 Tool Changes (2 traces)</summary>

  - ➕ `issue_refund` added
  - ➖ `escalate_to_human` removed
</details>
```

## What Spooled Catches

| Change type | Example | Unit tests | Spooled |
|-------------|---------|:----------:|:-------:|
| Prompt tweak | "Be concise" drops compliance tools | ✅ Pass | **Behavior change** |
| Model swap | New model skips sanctions screening | ✅ Pass | **Behavior change** |
| Tool deprecation | Agent proceeds without critical data | ✅ Pass | **Behavior change** |
| KB refresh | Ticket response path changes | ✅ Pass | **Behavior change** |
| Schema migration | Field rename breaks detection | ✅ Pass | **Behavior change** |
| Upstream degradation | Retry paths appear in fingerprint | ✅ Pass | **Behavior change** |

## Content-Blind Architecture

Spooled never captures prompts, customer data, or AI responses. Only structural metadata: tool names, call sequence, token counts, timing. This is enforced in code — content is stripped before the trace reaches disk.
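A common way to make content-blindness hold by construction is an allowlist: build the stored record only from fields known to be structural, so message text, tool arguments, and any field an SDK adds later are dropped by default. This sketch assumes an OpenAI Chat Completions response converted to a plain dict (for example via `response.model_dump()` in the v1 `openai` SDK). The `structural_metadata` function and its output field names are hypothetical and are not Spooled's implementation; the example values are made up.

```python
import json


def structural_metadata(response: dict, duration_ms: float) -> dict:
    """Build a content-free record from a Chat Completions response dict.

    Allowlist approach: only the fields named here are copied, so message text,
    tool arguments, and any field added upstream later never reach the record.
    """
    message = response["choices"][0]["message"]
    tool_calls = message.get("tool_calls") or []
    usage = response.get("usage") or {}
    return {
        # Tool names only: arguments can carry customer data, so they are dropped.
        "tools": [call["function"]["name"] for call in tool_calls],
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "duration_ms": duration_ms,
    }


# Shaped like a Chat Completions response dict; values are made up.
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "I'm sorry about the delay, let me refund that for you.",
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "issue_refund",
                    "arguments": '{"order_id": "A123", "amount": 49.99}',
                },
            }],
        },
    }],
    "usage": {"prompt_tokens": 812, "completion_tokens": 37, "total_tokens": 849},
}

record = structural_metadata(response, duration_ms=420.0)
print(json.dumps(record))
# {"tools": ["issue_refund"], "prompt_tokens": 812, "completion_tokens": 37, "duration_ms": 420.0}
```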

## Supported Libraries

**LLM Providers (explicit wrappers):**
- OpenAI (sync/async, streaming)
- Anthropic (sync/async, streaming)

**HTTP & Cloud (auto-instrumented via hooks):**
- AWS Bedrock
- requests, httpx, aiohttp

**Frameworks (callback handlers):**
- LangChain, LlamaIndex, AutoGen, CrewAI, LangGraph

## Documentation

- [Quick Start](https://spooled.ai/docs/getting-started/quickstart)
- [CI/CD Integration](https://spooled.ai/docs/guides/ci-cd)
- [Privacy Architecture](https://spooled.ai/docs/concepts/privacy-architecture)

## License

Proprietary.
