Metadata-Version: 2.4
Name: llm-mock
Version: 0.1.1
Summary: Record and replay LLM API calls for deterministic testing
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx>=0.27
Requires-Dist: respx>=0.21
Requires-Dist: pydantic>=2.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: anthropic>=0.25; extra == "dev"
Requires-Dist: openai>=1.30; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Dynamic: license-file

# llm-mock

Record real LLM responses once, replay them in tests forever — no API key required, no cost, no non-determinism.

```python
import pytest
from llm_mock import llm_mock

# my_pipeline is your own function that calls the LLM SDK.

# Record once against the real API (run locally with your API key)
with llm_mock(mode="record", fixture="tests/fixtures/summarize"):
    result = my_pipeline("Summarize this document...")

# Replay in tests: no API key, no cost, deterministic
@pytest.mark.llm_replay(fixture="summarize")
def test_summarize():
    result = my_pipeline("Summarize this document...")
    assert "key points" in result
```

---

## Why

- **API calls during tests are expensive.** A CI run hitting real LLM APIs can cost dollars per run at scale.
- **LLM outputs are non-deterministic.** Even at `temperature=0`, responses can vary between runs and across model versions.
- **Your production code stays untouched.** llm-mock intercepts at the HTTP transport layer — no changes to application code required.

llm-mock records and replays at the structured request level (model + messages + temperature), stores human-readable JSON fixtures, and integrates natively with pytest.

---

## Installation

```bash
pip install llm-mock
```

Or install from source:

```bash
git clone https://github.com/autopost/llm-mock.git
cd llm-mock
pip install -e .
```

**Runtime dependencies:** `httpx`, `respx`, `pydantic`

---

## How to use

### Your production code — untouched

```python
# my_app/pipeline.py
import anthropic

client = anthropic.Anthropic()

def summarize(text: str) -> str:
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=100,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return message.content[0].text
```

`pipeline.py` has zero knowledge of llm-mock. No imports, no changes needed.

### Step 1 — Record (run once, locally)

Create a small script or a dedicated test that runs with `mode="record"`. You need a real API key for this step.

```python
# record_fixtures.py
from llm_mock import llm_mock
from my_app.pipeline import summarize

with llm_mock(mode="record", fixture="tests/fixtures/summarize"):
    result = summarize("Long article about climate change...")
    print(result)  # real response from the API
```

```bash
ANTHROPIC_API_KEY=sk-... python record_fixtures.py
```

This creates `tests/fixtures/summarize.json`. **Commit this file to git.**

### Step 2 — Replay (in tests, forever)

Use the pytest decorator — no `with` block needed inside the test:

```python
# tests/test_pipeline.py
import pytest
from my_app.pipeline import summarize

@pytest.mark.llm_replay(fixture="summarize")
def test_summarize():
    result = summarize("Long article about climate change...")
    assert "climate" in result
```

```bash
pytest  # no API key needed, runs offline, instant
```

The decorator auto-discovers the fixture path relative to the test file — `fixture="summarize"` looks for `tests/fixtures/summarize.json` when the test lives in `tests/`.
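
The lookup rule above can be sketched in a few lines. This is a minimal illustration, not the library's code: `resolve_fixture` is a hypothetical helper, and it assumes fixtures always live in a `fixtures/` directory next to the test file.

```python
from pathlib import Path


def resolve_fixture(test_file: str, name: str) -> Path:
    """Hypothetical helper: map a short fixture name to a JSON path.

    Assumes fixtures live in a `fixtures/` directory beside the test file.
    """
    path = Path(test_file).parent / "fixtures" / name
    # Append `.json` only when the caller left the extension off.
    if path.suffix != ".json":
        path = path.with_name(path.name + ".json")
    return path


print(resolve_fixture("tests/test_pipeline.py", "summarize").as_posix())
# tests/fixtures/summarize.json
```

Under this assumption, a test in a nested directory such as `tests/unit/` would read from `tests/unit/fixtures/`.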

llm-mock intercepts the httpx call the Anthropic SDK makes internally and returns the saved response — your test code calls `summarize()` exactly as it would in production.

**Alternative:** use the context manager directly if you need more control:

```python
from llm_mock import llm_mock
from my_app.pipeline import summarize

def test_summarize():
    with llm_mock(mode="replay", fixture="tests/fixtures/summarize"):
        result = summarize("Long article about climate change...")
        assert "climate" in result
```

### Step 3 — Re-record when things change

If you change the prompt, update the model, or want to refresh fixtures:

```bash
ANTHROPIC_API_KEY=sk-... python record_fixtures.py  # overwrites old fixture
git add tests/fixtures/summarize.json
git commit -m "refresh summarize fixture"
```

---

## Quick start (direct API usage)

A complete working example from scratch.

### 1. Install

```bash
python -m venv .venv && source .venv/bin/activate
pip install llm-mock
```

### 2. Save your API key

```bash
echo 'export ANTHROPIC_API_KEY=sk-ant-api03-...' > .env
echo '.env' >> .gitignore
```

### 3. Create a record script

Create `try_record.py`:

```python
import anthropic
from llm_mock import llm_mock

client = anthropic.Anthropic()

with llm_mock(mode="record", fixture="fixtures/hello"):
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print("Response:", message.content[0].text)
    print("Fixture saved to fixtures/hello.json")
```

### 4. Run it

```bash
source .env && .venv/bin/python try_record.py
```

You should see the real response printed and `fixtures/hello.json` created.

### 5. Verify the fixture

```bash
llm-mock list fixtures/hello
```

### 6. Replay without an API key

Create `try_replay.py`:

```python
import anthropic
from llm_mock import llm_mock

client = anthropic.Anthropic(api_key="fake-key")  # key is irrelevant in replay

with llm_mock(mode="replay", fixture="fixtures/hello"):
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print("Replayed:", message.content[0].text)
```

```bash
.venv/bin/python try_replay.py
```

The exact same response is returned instantly — no network call made.

### 7. Write a test with the pytest decorator

The decorator resolves `fixture="hello"` relative to the test file, so a test in `tests/` reads `tests/fixtures/hello.json`. Move or re-record the fixture from step 3 to that path first.

```python
# tests/test_hello.py
import anthropic
import pytest

client = anthropic.Anthropic(api_key="fake-key")

@pytest.mark.llm_replay(fixture="hello")
def test_hello():
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    assert message.content[0].text  # replayed from tests/fixtures/hello.json
```

```bash
.venv/bin/pytest tests/test_hello.py -v
```

---

## CLI

Inspect and manage fixture files from the terminal.

> **Note:** activate your virtual environment first so `llm-mock` is on your PATH:
> ```bash
> source .venv/bin/activate
> ```
> Or run it directly with `.venv/bin/llm-mock <command>`.

### `llm-mock list <fixture>`

Show all recorded interactions in a fixture file:

```bash
$ llm-mock list tests/fixtures/summarize

Fixture : tests/fixtures/summarize.json
Provider: anthropic
Interactions: 2

  1. a3f2c1d4e5b6…  claude-sonnet-4-6        2026-04-23T10:00:00
       "Summarize this document about climate change..."
  2. b4f3d2e5f6c7…  claude-haiku-4-5-20251001  2026-04-24T11:00:00
       "What is the capital of France?"
```
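
Because fixtures are plain JSON, a similar listing can be produced without the CLI. The sketch below assumes the layout shown under "Fixture file format"; `summarize_fixture` is a hypothetical name, and the 12-character hash prefix only mirrors the sample output above.

```python
def summarize_fixture(fixture: dict) -> list[str]:
    """Return one summary line per recorded interaction."""
    lines = []
    for index, item in enumerate(fixture["interactions"], start=1):
        request = item["request"]
        # Preview the first message; assumes its content is a plain string.
        preview = request["messages"][0]["content"]
        lines.append(
            f'{index}. {item["hash"][:12]}  {request["model"]}  '
            f'{item["recorded_at"]}  "{preview}"'
        )
    return lines


# Example data shaped like a fixture file (values are illustrative).
example = {
    "version": "1.0",
    "provider": "anthropic",
    "interactions": [
        {
            "hash": "a3f2c1d4e5b6" + "0" * 52,
            "request": {
                "model": "claude-sonnet-4-6",
                "messages": [{"role": "user", "content": "Say hello."}],
            },
            "recorded_at": "2026-04-23T10:00:00+00:00",
        }
    ],
}
for line in summarize_fixture(example):
    print(line)
```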

### `llm-mock clear <fixture>`

Delete an entire fixture file:

```bash
llm-mock clear tests/fixtures/summarize
```

Delete a single interaction by hash:

```bash
llm-mock clear tests/fixtures/summarize --hash a3f2c1d4e5b6
```
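
The `--hash` example passes the shortened hash from the `list` output, which suggests prefix matching. Treating that as an assumption, the removal step might look like the sketch below; `remove_by_hash` is a hypothetical helper, not part of the library.

```python
def remove_by_hash(fixture: dict, prefix: str) -> int:
    """Drop interactions whose hash starts with `prefix`; return the count removed."""
    before = len(fixture["interactions"])
    fixture["interactions"] = [
        item
        for item in fixture["interactions"]
        if not item["hash"].startswith(prefix)
    ]
    return before - len(fixture["interactions"])


# Illustrative data: only the "hash" key matters for this sketch.
fixture = {"interactions": [{"hash": "a3f2c1d4e5b6aaaa"}, {"hash": "b4f3d2e5f6c7bbbb"}]}
removed = remove_by_hash(fixture, "a3f2c1d4e5b6")
print(removed, [item["hash"] for item in fixture["interactions"]])
# 1 ['b4f3d2e5f6c7bbbb']
```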

---

## How it works

```
Record mode:
  Your code → Anthropic/OpenAI SDK → httpx
    → llm-mock intercepts → forwards to real API
    → saves response to fixture JSON
    → returns response to your code

Replay mode:
  Your code → Anthropic/OpenAI SDK → httpx
    → llm-mock intercepts → looks up fixture by SHA256(model + messages + temperature)
    → returns saved response — no network call made
```

**Request matching** uses SHA256 of `(model, messages, temperature)`. Same request always hits the same fixture entry. Different temperature or different message content → different fixture entry.
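
A request key of this kind could be derived as in the sketch below. The exact serialization llm-mock uses is not documented here, so the canonical JSON form (sorted keys, compact separators) and the name `request_hash` are assumptions for illustration.

```python
import hashlib
import json
from typing import Optional


def request_hash(model: str, messages: list, temperature: Optional[float] = None) -> str:
    """Hash the fields that identify a request (assumed canonical form)."""
    payload = {"model": model, "messages": messages, "temperature": temperature}
    # Sorted keys and fixed separators make the JSON text, and so the hash, stable.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


messages = [{"role": "user", "content": "Say hello."}]
first = request_hash("claude-sonnet-4-6", messages, temperature=0.0)
again = request_hash("claude-sonnet-4-6", messages, temperature=0.0)
warmer = request_hash("claude-sonnet-4-6", messages, temperature=0.7)
print(first == again, first == warmer)  # True False
```

Serializing to a canonical form first matters because two dicts with the same content but different key order would otherwise produce different JSON text and different hashes.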

---

## API reference

### `llm_mock(mode, fixture, provider="all")`

Context manager that activates record or replay mode.

| Parameter | Type | Description |
|---|---|---|
| `mode` | `"record"` \| `"replay"` | Whether to hit the real API and save, or return from fixture |
| `fixture` | `str` | Path to the fixture file. `.json` extension added automatically if omitted |
| `provider` | `"anthropic"` \| `"openai"` \| `"all"` | Which provider(s) to intercept. Default: `"all"` |

```python
from llm_mock import llm_mock

with llm_mock(mode="replay", fixture="tests/fixtures/my_test", provider="anthropic"):
    ...
```

### Exceptions

| Exception | When raised |
|---|---|
| `FixtureNotFoundError` | Replay mode: fixture file missing, or no matching hash in file |
| `FixtureParseError` | Fixture file exists but contains invalid JSON |

```python
import anthropic
from llm_mock import llm_mock, FixtureNotFoundError

client = anthropic.Anthropic(api_key="fake-key")  # key is irrelevant in replay

try:
    with llm_mock(mode="replay", fixture="tests/fixtures/missing"):
        client.messages.create(...)
except FixtureNotFoundError as e:
    print(e)  # includes hint to run in record mode first
```
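
To make the split between the two exceptions concrete, here is a stand-alone sketch of a replay lookup. The exception classes are local stand-ins that only reuse the library's names, and `lookup` is a hypothetical function; the real implementation may differ.

```python
import json
import tempfile
from pathlib import Path


class FixtureNotFoundError(Exception):
    """Stand-in for the library's exception of the same name."""


class FixtureParseError(Exception):
    """Stand-in for the library's exception of the same name."""


def lookup(path: Path, request_hash: str) -> dict:
    """Return the saved response for `request_hash`, or raise a fixture error."""
    if not path.exists():
        raise FixtureNotFoundError(f"{path} not found; run in record mode first")
    try:
        fixture = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise FixtureParseError(f"{path} is not valid JSON: {exc}") from exc
    for item in fixture["interactions"]:
        if item["hash"] == request_hash:
            return item["response"]
    raise FixtureNotFoundError(f"no interaction {request_hash} in {path}; re-record it")


seen = []
with tempfile.TemporaryDirectory() as tmp:
    broken = Path(tmp) / "broken.json"
    broken.write_text("{not json", encoding="utf-8")
    # First path does not exist; second exists but cannot be parsed.
    for path in (Path(tmp) / "missing.json", broken):
        try:
            lookup(path, "a3f2c1")
        except (FixtureNotFoundError, FixtureParseError) as exc:
            seen.append(type(exc).__name__)
print(seen)  # ['FixtureNotFoundError', 'FixtureParseError']
```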

---

## Fixture file format

Fixture files are plain JSON — readable, diffable, committable.

```json
{
  "version": "1.0",
  "provider": "anthropic",
  "interactions": [
    {
      "hash": "a3f2c1...",
      "request": {
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 64
      },
      "response": {
        "id": "msg_01XYZ",
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
        "model": "claude-sonnet-4-6",
        "stop_reason": "end_turn",
        "usage": {"input_tokens": 10, "output_tokens": 9}
      },
      "recorded_at": "2026-04-23T10:00:00+00:00"
    }
  ]
}
```

Multiple interactions (from different requests) are stored in the same file. Re-recording an existing hash overwrites only that entry.
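
The overwrite-by-hash behaviour can be sketched as an upsert on the `interactions` list. `upsert_interaction` is a hypothetical helper written against the format above, not the library's own code.

```python
def upsert_interaction(fixture: dict, interaction: dict) -> None:
    """Replace the entry with the same hash, or append a new one."""
    for index, existing in enumerate(fixture["interactions"]):
        if existing["hash"] == interaction["hash"]:
            # Same request recorded again: overwrite only this entry.
            fixture["interactions"][index] = interaction
            return
    fixture["interactions"].append(interaction)


# Illustrative data: hashes and ids are shortened placeholders.
fixture = {"version": "1.0", "provider": "anthropic", "interactions": []}
upsert_interaction(fixture, {"hash": "a3f2c1", "response": {"id": "msg_old"}})
upsert_interaction(fixture, {"hash": "b4f3d2", "response": {"id": "msg_other"}})
upsert_interaction(fixture, {"hash": "a3f2c1", "response": {"id": "msg_new"}})
print([item["response"]["id"] for item in fixture["interactions"]])
# ['msg_new', 'msg_other']
```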

---

## Supported providers

| Provider | Intercepted endpoint | Status |
|---|---|---|
| Anthropic | `api.anthropic.com/v1/messages` | Supported |
| OpenAI | `api.openai.com/v1/chat/completions` | Supported |
| Streaming (`stream=True`) | n/a | Planned for v1.1 |
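
As a rough illustration of how the `provider` argument could narrow interception to the endpoints in this table, consider the sketch below. `ENDPOINTS` and `provider_for` are hypothetical names, and exact host and path matching is an assumption.

```python
from typing import Optional
from urllib.parse import urlsplit

# Host and path pairs taken from the table above.
ENDPOINTS = {
    "anthropic": ("api.anthropic.com", "/v1/messages"),
    "openai": ("api.openai.com", "/v1/chat/completions"),
}


def provider_for(url: str, provider: str = "all") -> Optional[str]:
    """Return the provider whose endpoint matches `url`, honouring the filter."""
    parts = urlsplit(url)
    for name, (host, path) in ENDPOINTS.items():
        if provider in ("all", name) and parts.hostname == host and parts.path == path:
            return name
    return None  # not an intercepted endpoint; the request would pass through


print(provider_for("https://api.anthropic.com/v1/messages"))  # anthropic
print(provider_for("https://api.openai.com/v1/chat/completions", provider="anthropic"))  # None
```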

---

## Comparison

| Tool | Record mode | Native SDK support | In-process |
|---|---|---|---|
| **llm-mock** | yes | yes (Anthropic + OpenAI) | yes |
| [llm_recorder](https://github.com/zby/llm_recorder) | yes | no (LiteLLM only) | yes |
| [AIMock](https://github.com/CopilotKit/aimock) | no | yes | no (HTTP server) |
| [vcr-langchain](https://github.com/amosjyng/vcr-langchain) | yes | no (LangChain only) | yes |

---

## Development

```bash
git clone https://github.com/autopost/llm-mock.git
cd llm-mock
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest
```

---

## Roadmap

- **v0.2** — `auto` mode, disable via env var (`LLM_MOCK_DISABLED`)
- **v1.1** — streaming support
- **v2** — shared fixtures for teams, semantic matching, web dashboard
