Metadata-Version: 2.3
Name: agentic-lab
Version: 0.1.0
Summary: Universal record-and-replay for LLM agents.
Keywords: llm,agents,replay,tracing,evals
Author: Ambuj Agrawal, Garima Luthra
Author-email: Ambuj Agrawal <ambujagrawal741@gmail.com>, Garima Luthra <garimaluthra2198@gmail.com>
License: Apache-2.0
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Dist: httpx>=0.27,<1.0
Requires-Dist: pydantic>=2.7,<3.0
Requires-Dist: pydantic-settings>=2.4,<3.0
Requires-Dist: structlog>=24.1,<26.0
Requires-Dist: tenacity>=8.5,<10.0
Requires-Dist: python-dotenv>=1.0,<2.0
Requires-Dist: protobuf>=5.27,<7.0
Requires-Dist: pytest>=8.2,<9.0 ; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23,<1.0 ; extra == 'dev'
Requires-Dist: pytest-cov>=5.0,<7.0 ; extra == 'dev'
Requires-Dist: pytest-xdist[psutil]>=3.6,<4.0 ; extra == 'dev'
Requires-Dist: respx>=0.21,<1.0 ; extra == 'dev'
Requires-Dist: ruff>=0.6,<1.0 ; extra == 'dev'
Requires-Dist: mypy>=1.11,<2.0 ; extra == 'dev'
Requires-Dist: pre-commit>=3.7,<5.0 ; extra == 'dev'
Requires-Dist: openai>=1.40,<2.0 ; extra == 'dev'
Requires-Dist: starlette>=0.40,<1.0 ; extra == 'dev'
Requires-Dist: uvicorn>=0.30,<1.0 ; extra == 'dev'
Requires-Dist: anyio>=4.4,<5.0 ; extra == 'dev'
Requires-Dist: langchain-core>=0.3,<1.0 ; extra == 'dev'
Requires-Dist: playwright>=1.59,<2.0 ; extra == 'dev'
Requires-Dist: langchain-core>=0.3,<1.0 ; extra == 'langchain'
Requires-Dist: langchain-openai>=0.3.35 ; extra == 'langchain'
Requires-Dist: langgraph>=1.0.1 ; extra == 'langchain'
Requires-Dist: starlette>=0.40,<1.0 ; extra == 'ui'
Requires-Dist: uvicorn>=0.30,<1.0 ; extra == 'ui'
Requires-Dist: anyio>=4.4,<5.0 ; extra == 'ui'
Maintainer: Ambuj Agrawal, Garima Luthra
Maintainer-email: Ambuj Agrawal <ambujagrawal741@gmail.com>, Garima Luthra <garimaluthra2198@gmail.com>
Requires-Python: >=3.12
Project-URL: Homepage, https://github.com/ambuj-krishna-agrawal/agent-lab
Project-URL: Repository, https://github.com/ambuj-krishna-agrawal/agent-lab
Project-URL: Issues, https://github.com/ambuj-krishna-agrawal/agent-lab/issues
Project-URL: Documentation, https://github.com/ambuj-krishna-agrawal/agent-lab#readme
Project-URL: Error reference, https://github.com/ambuj-krishna-agrawal/agent-lab/blob/main/docs/errors.md
Project-URL: Examples, https://github.com/ambuj-krishna-agrawal/agent-lab/tree/main/example
Project-URL: Changelog, https://github.com/ambuj-krishna-agrawal/agent-lab/blob/main/CHANGELOG.md
Provides-Extra: dev
Provides-Extra: langchain
Provides-Extra: ui
Description-Content-Type: text/markdown

# AgentLab

> Universal record-and-replay for LLM agents.

**Status:** pre-alpha; APIs will change.

AgentLab captures model calls, tools, state transitions, and timing into a
trace you can replay without hitting the network. It is built around a
framework-agnostic core and an HTTP capture layer that works with any SDK
that routes requests through `httpx`.

## Overhead

Per-LLM-call cost of running inside `agentlab.record()`:

| metric         | baseline | recorded  | overhead  |
|----------------|----------|-----------|-----------|
| latency p50    | 13.5 ms  | 14.7 ms   | +1.16 ms  |
| latency p99    | 14.4 ms  | 15.9 ms   | +1.52 ms  |

Measured against an in-process loopback HTTP server with a 10 ms upstream
delay; this removes network jitter, so the delta isolates the SDK's own
work: HTTP capture, span emit, JSONL write+fsync, matcher, and LLMSpan
build. Real LLM calls land in the 100 ms – 2000 ms range, so recording
tops out around 1.5% of wall-clock time (the 1.52 ms p99 delta on a
100 ms call) and is usually well under 1%.

Reproduce with:

```bash
uv run python scripts/bench_record_overhead.py --calls 200 --runs 5
```

## Installation

```bash
pip install agentic-lab           # minimal SDK
pip install 'agentic-lab[ui]'     # + Starlette UI server
```

The PyPI distribution is **`agentic-lab`**; the importable Python
module is **`agentlab`**:

```python
import agentlab as al
```

For local development, this repo is `uv`-managed:

```bash
git clone https://github.com/ambuj-krishna-agrawal/agent-lab.git
cd agent-lab
uv sync --all-extras --frozen
```

Use `--frozen` by default so your environment matches `uv.lock` and CI.

## Documentation

* [Quickstart](#quickstart) — five minutes from install to a replayable trace.
* [Provider coverage](#provider-coverage) — every supported LLM provider + how to add custom ones.
* [Error reference](docs/errors.md) — every `AGL-…` code with a remediation sentence (auto-generated from `src/agentlab/errors.py`).
* [Changelog](CHANGELOG.md) — version history.
* [`AGENTS.md`](AGENTS.md) — invariants and quality gates contributors must respect.
* [`CONTRIBUTING.md`](CONTRIBUTING.md) — human-contributor process.

## Configuration

- Secrets live in `.env` (git-ignored). Copy `.env.example` and set the
  provider keys you use.
- Non-secret defaults live in `src/agentlab/_defaults.toml` and can be
  overridden by `AGENTLAB_*` environment variables; see the sketch below
  this list.
- Full typed config lives in `src/agentlab/config.py`.
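
A minimal override sketch, as referenced above. The variable names here
are hypothetical; the authoritative field list (and therefore the exact
`AGENTLAB_*` spellings) lives in `src/agentlab/config.py`:

```python
import os

# Hypothetical names for illustration only; check src/agentlab/config.py
# for the real typed fields and their exact AGENTLAB_* spellings.
os.environ["AGENTLAB_TRACE_ROOT"] = "/tmp/agentlab-traces"
os.environ["AGENTLAB_LOG_LEVEL"] = "debug"
```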

## Quickstart

Five minutes from `pip install` to a trace you can replay without an
API key.  The full runnable script lives at
[`example/quickstart.py`](example/quickstart.py); the inline version:

```python
import os
import openai
import agentlab as al

client = openai.OpenAI(
    api_key=os.environ["OPENROUTER_API_KEY"],
    base_url="https://openrouter.ai/api/v1",
)

# 1. Record.
with (
    al.record(agent_name="quickstart") as recording,
    al.agent(name="quickstart", version="0"),
    al.step(role=al.StepRole.EXECUTE),
):
    response = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "Reply with the single word 'ok'."}],
        max_tokens=16,
    )
print("model said:", response.choices[0].message.content)
print("trace at:  ", recording.directory)

# 2. Replay — no network, no key.
with al.replay(str(recording.directory)) as session:
    replay = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "Reply with the single word 'ok'."}],
        max_tokens=16,
    )
print("replay said:", replay.choices[0].message.content)
print("cache hits: ", session.cache_hits)
```

```bash
pip install 'agentic-lab[ui]' openai
export OPENROUTER_API_KEY=sk-or-...
python example/quickstart.py
agentlab serve --root ~/.agentlab/traces
# → http://127.0.0.1:7861/
```

The `with al.agent(...)` and `al.step(...)` envelopes give the
auto-emitted `LLMSpan` a typed parent (the V4 schema forbids LLM
under bare RUN).  Production agents normally establish these once
near their entrypoints and don't repeat them per-call — see
[`example/workflows/`](example/workflows/) for that shape.
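
A sketch of that entrypoint shape, using only the APIs shown in the
quickstart (`run_pipeline` is a hypothetical stand-in for your agent
code):

```python
import agentlab as al

def run_pipeline() -> None:
    # One step envelope per pipeline stage; LLM calls made inside are
    # auto-captured as LLMSpans with this step as their typed parent.
    with al.step(role=al.StepRole.EXECUTE):
        ...  # model calls, tools, state transitions

def main() -> None:
    # Establish the recording and agent envelopes once, not per call.
    with (
        al.record(agent_name="my-agent") as recording,
        al.agent(name="my-agent", version="1"),
    ):
        run_pipeline()
    print("trace at:", recording.directory)
```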

## Larger example agents

Three reference agents under [`example/`](example/) cover the
Anthropic [building-effective-agents](https://www.anthropic.com/research/building-effective-agents)
shapes:

| Folder | Shape | What it does |
|---|---|---|
| `workflows/` | Workflow (fixed code path) | Decompose → Wikipedia search → cite → LLM-as-judge → revise. |
| `autonomous/` | Autonomous (model picks each step) | LangGraph observe-plan-act loop that triages support tickets. |
| `hybrid/` | Workflow + autonomous sub-agent | Incident-response pipeline with autonomous investigation step. |

All three use OpenRouter via `langchain-openai`, real (or
realistic) tools, and produce traces directly into `example_traces/`
that `agentlab serve` can browse.

## Provider coverage

Inside an `agentlab.record()` block AgentLab patches `httpx` transport
methods, so **every** SDK that routes through `httpx` (which is most
modern Python LLM SDKs) lands its raw exchange in `http.jsonl`.  That
file is the source of truth for replay; the typed `LLMSpan` is a
best-effort view layered on top.

The built-in matchers turn recognised exchanges into typed `LLMSpan`s
out of the box:

| Provider                          | Endpoint(s)                                          | Stream? |
|-----------------------------------|------------------------------------------------------|---------|
| OpenAI chat completions           | `api.openai.com/v1/chat/completions`                 | yes     |
| OpenAI Responses                  | `api.openai.com/v1/responses`                        | yes     |
| OpenAI Embeddings                 | `api.openai.com/v1/embeddings`                       | n/a     |
| Azure OpenAI chat completions     | `*.openai.azure.com/openai/deployments/<dep>/chat/completions` | yes     |
| Anthropic Messages                | `api.anthropic.com/v1/messages`                      | yes     |
| AWS Bedrock — Invoke              | `bedrock-runtime.<region>.amazonaws.com/model/<id>/invoke[-with-response-stream]` | partial[^1] |
| AWS Bedrock — Converse            | `bedrock-runtime.<region>.amazonaws.com/model/<id>/converse[-stream]` | partial[^1] |
| Google Gemini                     | `generativelanguage.googleapis.com/.../models/<m>:[stream]generateContent` | yes     |
| Vertex AI — Gemini                | `<region>-aiplatform.googleapis.com/.../models/<m>:[stream]generateContent` | yes     |
| Vertex AI — Anthropic (Claude)    | `<region>-aiplatform.googleapis.com/.../models/<m>:[stream]rawPredict` | yes     |
| OpenRouter                        | `openrouter.ai/api/v1/chat/completions`              | yes     |
| Together AI                       | `api.together.{xyz,ai}/v1/chat/completions`          | yes     |
| Groq                              | `api.groq.com/openai/v1/chat/completions`            | yes     |
| Mistral                           | `api.mistral.ai/v1/chat/completions`                 | yes     |
| Fireworks                         | `api.fireworks.ai/inference/v1/chat/completions`     | yes     |
| DeepInfra                         | `api.deepinfra.com/v1/openai/chat/completions`       | yes     |
| Perplexity                        | `api.perplexity.ai/chat/completions`                 | yes     |

[^1]: Bedrock streaming uses AWS event-stream binary framing.
   Buffered responses populate every LLMSpan field; streamed responses
   record the request side and a `validation_errors` entry explaining
   why the response side is empty.  The raw bytes are still preserved
   in `http.jsonl`.

### Adding a custom or self-hosted provider

OpenAI-compatible hosts (vLLM, Ollama, your private gateway) need a
single registration call:

```python
import agentlab as al
from agentlab.llm.matchers.openai import HostPathMatcher

al.register_llm_provider(HostPathMatcher(
    name="my-vllm",
    host_suffix="llm.internal.example.com",
    path_prefix="/v1/chat/completions",
))
```

For wholly different body shapes, subclass `agentlab.llm.LLMProviderMatcher`.
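
A shape-only sketch of that route; every method name below is
hypothetical, so treat the `LLMProviderMatcher` base class as the
authority for the real interface:

```python
import agentlab as al
from agentlab.llm import LLMProviderMatcher

class MyGatewayMatcher(LLMProviderMatcher):
    """Sketch only: the method names here are invented for illustration;
    only the registration call below is documented."""

    name = "my-gateway"

    def matches(self, request) -> bool:  # hypothetical signature
        return request.url.host.endswith("gateway.example.com")

    def build_llm_span(self, exchange):  # hypothetical signature
        ...  # map the gateway's request/response bodies onto an LLMSpan

al.register_llm_provider(MyGatewayMatcher())
```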

### Pricing

The SDK is **token-only by default** — `LLMSpan.cost.usd` stays at
`0.0` and the span is annotated with `agentlab.llm.pricing.unknown=True`.
Provider list-prices change too often to bake into the SDK.  Operators
who want USD computed on every span install their own table:

```python
from agentlab.llm.pricing import PriceRow, StaticPriceTable, set_price_table

# Rows: provider, model (glob allowed), input/output USD per 1M tokens.
set_price_table(StaticPriceTable(rows=(
    PriceRow("openai", "gpt-4o", 2.50, 10.00),
    PriceRow("anthropic", "claude-3-5-sonnet*", 3.00, 15.00),
)))
```

### Strict mode for unrecognised exchanges

By default, exchanges that don't match any provider matcher log a
warning (one per `(trace, host)`) and the raw exchange remains in
`http.jsonl`.  Power users can opt into stricter behaviour:

```python
with al.record(strict_unknown_provider="raise"):  # or "emit_op"
    ...
```

`"raise"` surfaces the gap as `UnknownLLMProviderError`; `"emit_op"`
records the call as a typed `OpSpan` so the trace tree is complete
even without a matcher.

## UI and examples

Run the backend UI server against bundled traces:

```bash
uv run agentlab --root example_traces serve --port 7861
```

Optional frontend dev server with HMR:

```bash
cd frontend
npm install
npm run dev
```

The bundled runnable agents are seeded from `example/` and appear on the
Agents page once the server is running.

## Production deployment

The OSS UI server can be hosted on a single EC2 box behind Caddy, with a
separate Next.js + Clerk marketing/auth site on Vercel that redirects
authenticated users to it.  See [`deploy/README.md`](deploy/README.md)
for the end-to-end runbook.

## UI walkthrough

### Dashboard
![Dashboard](docs/assets/dashboard.png)

### Traces list
![Traces list](docs/assets/traces-list.png)

### Trace detail
![Trace detail](docs/assets/trace-detail.png)

### Agents
![Agents](docs/assets/agents.png)

### Settings
![Settings](docs/assets/settings.png)

## Development

Run the local quality gate:

```bash
bash scripts/check.sh
```

Equivalent commands:

```bash
uv run ruff check .
uv run ruff format --check .
uv run mypy
uv run pytest tests/unit tests/integration -n auto --dist=worksteal
```

## Testing

Current test tiers:

- `tests/unit/`: hermetic unit tests (no real network).
- `tests/integration/`: in-process integration tests with mocked HTTP where needed.

For live-provider smoke runs, use the runnable examples in `example/` through
their CLIs or the UI Agents page.
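
Recorded traces also make good pinned regressions. A hermetic test
sketch using only the quickstart APIs (the fixture path is
illustrative):

```python
import openai
import agentlab as al

def test_quickstart_replays_offline() -> None:
    # No real key needed: inside al.replay() the recorded trace answers
    # instead of the network.
    client = openai.OpenAI(
        api_key="unused",
        base_url="https://openrouter.ai/api/v1",
    )
    with al.replay("tests/fixtures/quickstart_trace") as session:
        reply = client.chat.completions.create(
            model="openai/gpt-4o-mini",
            messages=[{"role": "user", "content": "Reply with the single word 'ok'."}],
            max_tokens=16,
        )
    assert reply.choices[0].message.content
    assert session.cache_hits >= 1
```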

## Project layout

```text
agentlab/
├── src/agentlab/
│   ├── __init__.py          # public API surface
│   ├── cli.py               # `agentlab` console entry point
│   ├── config.py            # typed settings
│   ├── recorder.py          # public `record()` context manager
│   ├── _defaults.toml       # bundled non-secret defaults
│   ├── _proto/              # generated protobuf bindings (private)
│   ├── bridges/             # export bridges (e.g. OTel GenAI)
│   ├── core/                # recording primitives
│   ├── io/                  # trace IO + HTTP capture
│   ├── integrations/        # framework adapters
│   ├── llm/                 # provider-agnostic LLM client
│   ├── replay/              # deterministic replay engine
│   ├── storage/             # JSONL + protobuf stores
│   ├── ui/                  # Starlette UI server + DTO mapping
│   ├── pytest.py            # pytest plugin
│   └── promote.py           # replay-test scaffold generator
├── frontend/                # React SPA for the UI server
├── example/                 # bundled runnable agent seeds
├── proto/agentlab/v1/trace.proto
├── scripts/                 # check, proto regen, UI screenshot helpers
├── tests/{unit,integration}/
└── uv.lock
```

## License

Apache 2.0 — see [`LICENSE`](LICENSE).
