Metadata-Version: 2.4
Name: llmreplay
Version: 0.1.0
Summary: Deterministic replay debugger for LLM agents
License-Expression: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: click>=8.1
Requires-Dist: rich>=13.0
Requires-Dist: aiofiles>=23.0
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.20; extra == "anthropic"
Provides-Extra: langchain
Requires-Dist: langchain>=0.2; extra == "langchain"
Requires-Dist: langchain-core>=0.2; extra == "langchain"
Provides-Extra: grok
Requires-Dist: openai>=1.0; extra == "grok"
Provides-Extra: gemini
Requires-Dist: google-genai>=0.8; extra == "gemini"
Provides-Extra: s3
Requires-Dist: boto3>=1.28; extra == "s3"
Provides-Extra: web
Requires-Dist: streamlit>=1.35; extra == "web"
Requires-Dist: plotly>=5.0; extra == "web"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: pytest-cov>=5.0; extra == "dev"
Provides-Extra: all
Requires-Dist: llmreplay[anthropic,dev,gemini,grok,langchain,openai,s3,web]; extra == "all"

# llmreplay

Deterministic replay layer for LLM-driven systems.

---

## Overview

llmreplay is a lightweight framework for capturing, replaying, and testing LLM interactions.

It turns non-deterministic LLM calls into reproducible system behavior, making debugging and regression testing reliable.

---

## Problem

LLM applications are difficult to test because they are:

- Non-deterministic by design  
- Dependent on external APIs  
- Hard to reproduce across runs  
- Fragile in CI environments  
- Difficult to debug after the fact, since past runs leave no reproducible record  

This leads to unreliable regression testing and unstable evaluation pipelines.

---

## Solution

llmreplay introduces a replay abstraction layer for LLM systems.

It enables you to:

- Capture real LLM executions
- Store structured interaction traces
- Replay executions deterministically
- Remove dependency on live model calls during tests

---

## Features

- Request/response capture layer  
- Deterministic replay engine  
- Tool-call mocking support (sketched below)  
- Snapshot-based testing workflow  
- CI-safe execution mode  
- Minimal integration overhead  
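
Tool-call mocking follows the same record/replay idea as model responses. A minimal illustration of the concept (not llmreplay's actual API): in record mode the real tool runs and its result is stored; in replay mode the stored result is returned without side effects.

```python
def get_weather(city: str) -> dict:
    # Stand-in for a real, side-effectful tool.
    return {"city": city, "temp_c": 21}

recorded: dict = {}

def call_tool(fn, *, mode: str, **kwargs):
    key = (fn.__name__, tuple(sorted(kwargs.items())))
    if mode == "record":
        recorded[key] = fn(**kwargs)
    return recorded[key]  # in replay mode, never execute the real tool

live = call_tool(get_weather, mode="record", city="Oslo")
assert call_tool(get_weather, mode="replay", city="Oslo") == live
```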

---

## Architecture

llmreplay operates in two primary modes:

### Record Mode

Captures live execution traces from your LLM application, including:

- Inputs
- Outputs
- Tool calls (if applicable)
- Execution metadata

These traces are persisted for later reuse.
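
A recorded trace entry might look something like the following (a hypothetical schema for illustration only; the field names are not llmreplay's documented format):

```python
trace_entry = {
    "request": {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Summarize this ticket."}],
    },
    "response": {
        "content": "The ticket reports a login timeout after deploy.",
        "finish_reason": "stop",
    },
    "tool_calls": [],  # populated when the model invokes tools
    "metadata": {
        "recorded_at": "2025-01-15T12:00:00Z",
        "latency_ms": 842,
    },
}
```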

---

### Replay Mode

Replays stored traces without invoking external LLM APIs.

This ensures:

- Deterministic outputs
- Fast execution
- No network dependency
- Stable CI behavior
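
Conceptually, replay turns each LLM call into a deterministic lookup: fingerprint the request, return the stored response, and fail loudly on anything unseen. A minimal sketch of the idea (illustrative only; `TraceStore` and the hashing scheme are not llmreplay's actual internals):

```python
import hashlib
import json

def request_key(request: dict) -> str:
    """Stable fingerprint: identical inputs always map to the same key."""
    canonical = json.dumps(request, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

class TraceStore:
    """Record/replay store keyed by request fingerprint."""

    def __init__(self) -> None:
        self._responses: dict[str, dict] = {}

    def record(self, request: dict, response: dict) -> None:
        self._responses[request_key(request)] = response

    def replay(self, request: dict) -> dict:
        # In replay mode an unseen request is an error, not a live API call.
        try:
            return self._responses[request_key(request)]
        except KeyError:
            raise LookupError("no recorded response for this request") from None
```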

---

## Core Workflow

1. Run your application in **record mode**
2. Generate and store interaction traces
3. Run the same application in **replay mode**
4. Validate outputs against recorded snapshots (see the test sketch below)
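
In practice, steps 3 and 4 usually live in a test suite. A sketch with pytest, using the `ReplayClient` API from the Quick Start below (`run_your_llm_app` and `EXPECTED_SNAPSHOT` are placeholders for your own entry point and stored snapshot):

```python
from llmreplay import ReplayClient

def test_agent_output_is_stable():
    client = ReplayClient()
    client.replay()  # serve recorded traces; no network access required
    result = run_your_llm_app()          # placeholder: your app's entry point
    assert result == EXPECTED_SNAPSHOT   # placeholder: the recorded snapshot
```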

---

## Use Cases

- LLM application testing  
- Agent workflow debugging  
- Prompt regression testing  
- Evaluation pipelines  
- CI/CD validation for LLM systems  
- Tool-using agent simulation  

---

## Installation

```bash
pip install llmreplay
```
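
Provider integrations and optional tooling are available as extras (mirroring the dependency metadata above):

```bash
pip install "llmreplay[openai]"     # OpenAI provider support
pip install "llmreplay[anthropic]"  # Anthropic provider support
pip install "llmreplay[all]"        # every extra, including dev tools
```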

---

## Quick Start

```python
from llmreplay import ReplayClient

client = ReplayClient()

# Record mode: capture the live LLM calls your application makes
client.record()
run_your_llm_app()  # your application's entry point

# Replay mode: serve the recorded responses; no live API calls are made
client.replay()
run_your_llm_app()
```

---

## Design Principle

> If it cannot be replayed, it cannot be tested.

---

## Roadmap

* Structured trace DAG visualization
* Multi-model replay support
* Latency and stochasticity simulation layer
* Distributed trace collection
* Web-based replay inspector
* Plugin system for tool mocking

---

## License

MIT
