Metadata-Version: 2.4
Name: llmreplay
Version: 0.1.2
Summary: Deterministic replay debugger for LLM agents
License-Expression: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: click>=8.1
Requires-Dist: rich>=13.0
Requires-Dist: aiofiles>=23.0
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.20; extra == "anthropic"
Provides-Extra: langchain
Requires-Dist: langchain>=0.2; extra == "langchain"
Requires-Dist: langchain-core>=0.2; extra == "langchain"
Provides-Extra: grok
Requires-Dist: openai>=1.0; extra == "grok"
Provides-Extra: gemini
Requires-Dist: google-genai>=0.8; extra == "gemini"
Provides-Extra: s3
Requires-Dist: boto3>=1.28; extra == "s3"
Provides-Extra: web
Requires-Dist: streamlit>=1.35; extra == "web"
Requires-Dist: plotly>=5.0; extra == "web"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: pytest-cov>=5.0; extra == "dev"
Provides-Extra: all
Requires-Dist: llmreplay[anthropic,dev,gemini,grok,langchain,openai,s3,web]; extra == "all"

# llmreplay

Deterministic replay debugger for LLM agents. Records LLM calls and tool executions to a SQLite event log, then replays runs from that log with no network calls.

```python
from llmreplay import record, replay

# Record
with record("my_run", seed=42):
    response = client.chat.completions.create(...)

# Replay — zero network calls
session = replay("my_run")
for event in session.events():
    print(event.step, event.kind, event.payload)
```

## Install

```bash
pip install llmreplay

# Optional extras (declared in the package metadata):
# openai, anthropic, langchain, grok, gemini, s3, web, dev, all
pip install "llmreplay[openai,web]"
```

Requirements: Python >= 3.10

## What gets recorded

- LLM requests/responses (OpenAI, Anthropic, Grok/xAI, Gemini)
- Tool calls/results (via `@record_tool` decorator)
- Random seeds (Python `random`, numpy)
- Exceptions

Events are stored in `~/.llmreplay/<run_id>.db`.
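To picture what an event log like this looks like, here is a minimal standalone sketch — not llmreplay's actual implementation or schema, just an illustration of the idea — of a `record_tool`-style decorator appending each tool call and its result to a SQLite table:

```python
import json
import sqlite3
from functools import wraps

# Illustrative only: llmreplay's real table layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events "
    "(step INTEGER PRIMARY KEY AUTOINCREMENT, kind TEXT, payload TEXT)"
)

def record_tool(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        # Append one event row per tool invocation.
        conn.execute(
            "INSERT INTO events (kind, payload) VALUES (?, ?)",
            ("tool", json.dumps(
                {"name": fn.__name__, "args": list(args), "result": result}
            )),
        )
        conn.commit()
        return result
    return wrapper

@record_tool
def add(a, b):
    return a + b

add(2, 3)  # recorded as a "tool" event with its arguments and result
```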

## CLI

```bash
llmreplay list                    # List recorded runs
llmreplay view my_run             # Show all events
llmreplay view my_run --step 42   # Jump to step
llmreplay cost my_run             # Cost breakdown
llmreplay export my_run --json    # Export bug report
llmreplay web my_run              # Launch timeline UI
```

## Features

**Auto-instrumentation** — OpenAI, Anthropic, Grok, Gemini, and LangChain hooks are installed automatically inside the `record()` context.
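Conceptually, such hooks amount to a context manager that monkeypatches a client method and restores it on exit. The sketch below is not llmreplay's code — `FakeClient` and `record_calls` are hypothetical names — but shows the pattern:

```python
import contextlib

class FakeClient:
    """Stand-in for an SDK client; not a real provider SDK."""
    def create(self, prompt):
        return f"reply to {prompt}"

@contextlib.contextmanager
def record_calls(client, log):
    original = client.create

    def patched(prompt):
        result = original(prompt)
        log.append({"prompt": prompt, "response": result})
        return result

    client.create = patched  # hook installed on context entry
    try:
        yield log
    finally:
        client.create = original  # hook removed on exit

log = []
client = FakeClient()
with record_calls(client, log):
    client.create("hello")
print(len(log))  # 1
```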

**Tool mocking** — Record tool I/O with `@record_tool`, replay with `ToolMocker`:

```python
from llmreplay import ToolMocker, EventStore

mocker = ToolMocker()
mocker.load(EventStore("my_run"))

@mocker.mock(name="fetch_price")
def fetch_price(ticker: str) -> dict: ...  # returns recorded result
```
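Replay-by-mocking can be pictured as a lookup table keyed by tool name and serialized arguments. This is a self-contained sketch of that idea, not `ToolMocker`'s actual internals; `MiniMocker` and its `load` format are invented for illustration:

```python
import json
from functools import wraps

class MiniMocker:
    def __init__(self):
        self._recorded = {}

    def load(self, records):
        # records: iterable of (name, args_json, result) tuples.
        for name, args_json, result in records:
            self._recorded[(name, args_json)] = result

    def mock(self, name):
        def deco(fn):
            @wraps(fn)
            def wrapper(*args, **kwargs):
                key = (name, json.dumps([list(args), kwargs], sort_keys=True))
                # Return the recorded result; the real function never runs.
                return self._recorded[key]
            return wrapper
        return deco

mocker = MiniMocker()
mocker.load([
    ("fetch_price", json.dumps([["AAPL"], {}], sort_keys=True), {"price": 190.0}),
])

@mocker.mock(name="fetch_price")
def fetch_price(ticker):
    raise RuntimeError("would hit the network")

fetch_price("AAPL")  # {'price': 190.0}
```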

**Regression testing** — Run recorded traces against updated code:

```python
from llmreplay import RegressionSuite

suite = RegressionSuite()

@suite.case("run_001")
def check(original, session):
    return session.total_cost() <= original["total_cost_usd"] * 1.1

suite.run()
```

**Fork/branch** — Copy a trace up to a step for counterfactual debugging:

```python
from llmreplay import fork
new_store = fork("broken_run", "fixed_run", at_step=50)
```
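The core of a fork operation is copying events up to the cut-off step into a fresh store. A minimal sketch assuming a hypothetical `(step, kind, payload)` table — not llmreplay's `fork` implementation:

```python
import sqlite3

def fork_events(src: sqlite3.Connection, at_step: int) -> sqlite3.Connection:
    """Copy events with step <= at_step into a new in-memory store."""
    dst = sqlite3.connect(":memory:")
    dst.execute("CREATE TABLE events (step INTEGER, kind TEXT, payload TEXT)")
    rows = src.execute(
        "SELECT step, kind, payload FROM events WHERE step <= ?", (at_step,)
    ).fetchall()
    dst.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    dst.commit()
    return dst

# A fake 100-step trace to fork from.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE events (step INTEGER, kind TEXT, payload TEXT)")
src.executemany("INSERT INTO events VALUES (?, ?, ?)",
                [(i, "llm", "{}") for i in range(1, 101)])
src.commit()

forked = fork_events(src, at_step=50)
print(forked.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 50
```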

**Fine-tuning export** — Export prompt/response pairs:

```python
from llmreplay import export_finetune_dataset
export_finetune_dataset(["run_001", "run_002"], "data.jsonl")
```
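The README does not specify the output schema; assuming an OpenAI-style chat `messages` JSONL format (an assumption, not a documented guarantee), writing such pairs looks roughly like:

```python
import io
import json

def write_finetune_jsonl(pairs, fh):
    # One JSON object per line; "messages" shape is assumed, not documented.
    for prompt, response in pairs:
        fh.write(json.dumps({"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": response},
        ]}) + "\n")

buf = io.StringIO()
write_finetune_jsonl([("What is 2+2?", "4")], buf)
print(buf.getvalue().count("\n"))  # 1
```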

## License

MIT
