Metadata-Version: 2.4
Name: mimir-observe
Version: 1.1.1
Summary: Auto-instrumentation and visibility for AI agents — OpenAI Agents SDK, Claude SDK, and more.
Author-email: TA Studios <hello@tastudios.ai>
License-Expression: MIT
Project-URL: Homepage, https://mimir.sh
Project-URL: Repository, https://github.com/TA-Studios-AI-Avengers/mimir
Keywords: agents,observability,instrumentation,openai,claude,llm
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: System :: Monitoring
Requires-Python: >=3.10
Description-Content-Type: text/markdown

# mimir-observe

Auto-instrumentation and visibility for AI agents. Two lines of code, zero config.

```bash
pip install mimir-observe
```

Then run:

```bash
mimir quickstart     # getting-started guide with copy-paste snippets
mimir dashboard      # start the local dashboard at http://localhost:9847
```

## Quick start

### 1. Add instrumentation (2 lines)

Pick the one that matches your stack:

```python
import mimir

# Raw OpenAI client (chat.completions.create)
mimir.instrument_openai()

# Raw Anthropic client (messages.create)
mimir.instrument_anthropic()

# OpenAI Agents SDK (Runner.run / Runner.run_streamed)
mimir.instrument_openai_agents()

# Claude Agent SDK (query)
mimir.instrument_claude()
```

Add the import and the matching `instrument_*()` call **at the top of your entry point**, before any API calls. That's it. Your existing code stays exactly the same.

### 2. Start the dashboard

In a separate terminal:

```bash
python -m mimir.cli dashboard
```

Open **http://localhost:9847** to see your runs.

### 3. There is no step 3

Every API call and agent run is now captured automatically. The dashboard shows:

- **Agent list** with run counts, models, and tools
- **Run timeline** with every tool call (args + results), reasoning block, and token usage
- **Run diffing** -- side-by-side comparison of any two runs
- **Deep Dive** -- multi-run comparison grid with step alignment
- **Divergence detection** -- flags agents whose reruns follow different tool patterns
- **AI Analysis** -- click "Analyze" on any run for an AI-powered trace breakdown

## What gets captured

| Data | Details |
|---|---|
| Tool calls | Name, arguments, result, duration |
| Reasoning | Model output text between tool calls |
| Token usage | Input/output tokens per call |
| Cost | If set via `run.set_cost()` |
| Run duration | Wall clock time |
| Run status | Success or error |
| Input/output | Prompt and final result |
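
Cost is only recorded when you set it yourself. One way to get a figure is to derive it from the token counts. The sketch below is illustrative only: `estimate_cost` is a hypothetical helper, the prices are placeholders rather than real rates, and it assumes `run.set_cost()` accepts a single number, which this README does not specify.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Return an estimated cost in the same currency as the prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Placeholder prices for illustration only; look up your provider's real rates.
cost = estimate_cost(1500, 800, input_price_per_m=2.0, output_price_per_m=8.0)
print(f"{cost:.4f}")

# Inside a manual run you could then pass the value on, for example:
# run.set_cost(cost)   # assumes set_cost() takes a single number
```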

## Which instrument function do I use?

| Your code uses | Function |
|---|---|
| `from openai import OpenAI` | `mimir.instrument_openai()` |
| `from anthropic import Anthropic` | `mimir.instrument_anthropic()` |
| `from agents import Runner` | `mimir.instrument_openai_agents()` |
| `from claude_code_sdk import query` | `mimir.instrument_claude()` |

You can call multiple if your project uses more than one SDK.
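
If you are unsure which SDKs are present in an environment, a small stdlib check can list the matching function names. This is a sketch, not part of Mimir: `installed_instrument_functions` is a hypothetical helper, and an SDK being installed does not prove your code uses it.

```python
import importlib.util

# Import name -> mimir function name, copied from the table above.
SDK_TO_FUNCTION = {
    "openai": "instrument_openai",
    "anthropic": "instrument_anthropic",
    "agents": "instrument_openai_agents",
    "claude_code_sdk": "instrument_claude",
}


def installed_instrument_functions() -> list[str]:
    """Return the instrument function names whose SDK is importable here."""
    names = []
    for module_name, function_name in SDK_TO_FUNCTION.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            found = False
        if found:
            names.append(function_name)
    return names


print(installed_instrument_functions())
```

With `mimir` imported, each returned name could then be called as `getattr(mimir, name)()`.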

## Multi-turn agentic loops

If your agent calls the API multiple times in a loop, wrap it with `mimir.trace()` so all calls are grouped as one named run:

```python
import mimir
mimir.instrument_openai()  # or instrument_anthropic()

from openai import OpenAI
client = OpenAI()

with mimir.trace("Migration Planner"):
    # Every API call inside here becomes a step in one run
    response = client.chat.completions.create(model="gpt-4o", messages=[...])
    response = client.chat.completions.create(model="gpt-4o", messages=[...])
    response = client.chat.completions.create(model="gpt-4o", messages=[...])
```

This matters when several agents use the same model: without `trace()`, they all get lumped together. Each `trace("name")` creates a distinct agent on the dashboard.

Without the wrapper, each API call creates its own run. That is fine for single calls but wrong for loops.

## How it works

Mimir monkey-patches the SDK at the class level when you call `instrument_*()`. Every subsequent API call is intercepted, telemetry is extracted from the request and response, and that telemetry is sent to the local dashboard via fire-and-forget HTTP, so your agent code does not wait on the dashboard.

- Zero external dependencies (stdlib only)
- All data stays local (`~/.mimir/`)
- Dashboard down? Agent runs normally, no errors
- Uninstrument anytime: `mimir.uninstrument_openai()`, etc.
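
The mechanism described above can be illustrated with a toy version. This sketch is not Mimir's code: `FakeClient`, `instrument_fake`, `uninstrument_fake`, and the in-memory `captured` list are hypothetical stand-ins, and a daemon thread appending to a list replaces the real HTTP request.

```python
import functools
import threading
import time


class FakeClient:
    """Hypothetical stand-in for an SDK client class."""

    def create(self, prompt):
        return f"echo: {prompt}"


captured = []  # stands in for the local dashboard
_original = None  # the unpatched method, kept so it can be restored


def _send(event):
    """Stand-in for the HTTP request; failures are deliberately ignored."""
    try:
        captured.append(event)
    except Exception:
        pass


def instrument_fake():
    """Patch FakeClient.create on the class, covering every instance."""
    global _original
    if _original is not None:
        return  # already instrumented
    original = _original = FakeClient.create

    @functools.wraps(original)
    def wrapper(self, prompt):
        start = time.perf_counter()
        result = original(self, prompt)
        event = {
            "input": prompt,
            "output": result,
            "duration_ms": (time.perf_counter() - start) * 1000,
        }
        # Fire and forget: hand off to a daemon thread and return at once.
        threading.Thread(target=_send, args=(event,), daemon=True).start()
        return result

    FakeClient.create = wrapper


def uninstrument_fake():
    """Restore the original method."""
    global _original
    if _original is not None:
        FakeClient.create = _original
        _original = None


client = FakeClient()  # created before patching, still covered
instrument_fake()
print(client.create("hi"))
uninstrument_fake()
```

Because the patch is applied to the class rather than to one object, instances created before the call are covered too, and restoring the saved method undoes it.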

## Manual instrumentation

For custom setups where auto-instrumentation doesn't fit:

```python
import mimir

t = mimir.task(
    name="My Agent",
    config="what it does",
    tools=["search", "write"],
    model="gpt-4o",
)

with t.run(input={"prompt": "user input"}) as run:
    run.tool("search", {"q": "test"}, "3 results", duration_ms=150)
    run.reasoning("Found relevant results, writing report...")
    run.tool("write", {"file": "report.md"}, "ok", duration_ms=50)
    run.set_usage(1500, 800)
    run.set_output("Report written")
```
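
The `duration_ms` values above are hard-coded. In practice you would probably measure them. A minimal sketch, assuming a hypothetical `timed_call` helper and a made-up `search` tool (neither is part of Mimir):

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, duration in whole milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    duration_ms = round((time.perf_counter() - start) * 1000)
    return result, duration_ms


def search(q):
    """Hypothetical tool used only for this example."""
    return f"3 results for {q!r}"


result, duration_ms = timed_call(search, "test")
print(result, duration_ms)

# Inside `with t.run(...) as run:` the values would feed the same call as above:
# run.tool("search", {"q": "test"}, result, duration_ms=duration_ms)
```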

## Onboarding with Claude Code

If you use Claude Code, paste this prompt to have it instrument your project automatically:

```
Install and set up Mimir agent observability in this project.

Step 1: pip install mimir-observe (if not already installed). Import as `import mimir`.

Step 2: Find the entry point(s) and determine which SDK is used:
  - `from openai import OpenAI` → add `mimir.instrument_openai()`
  - `from anthropic import Anthropic` → add `mimir.instrument_anthropic()`
  - `from agents import Runner` → add `mimir.instrument_openai_agents()`
  - `from claude_code_sdk import query` → add `mimir.instrument_claude()`

Add the 2 lines (import + instrument call) at the top of each entry point,
BEFORE any API calls. No other code changes needed.

Step 3: If the code has multi-turn agentic loops (calling the API multiple times
in a while/for loop), wrap each agent's loop with mimir.trace("Agent Name") so
all turns become steps in one run instead of separate runs:

    with mimir.trace("Migration Planner"):
        # ... the existing loop goes here, unchanged ...

Each distinct agent should get its own mimir.trace() wrapper with a unique name.
Single API calls outside a loop do NOT need this wrapper — they auto-create runs.

Step 4: Start the dashboard: python -m mimir.cli dashboard
```

## AI Analysis

Click the **Analyze** button on any run in the dashboard for an AI-powered breakdown covering:

- Plain English summary of what the agent did
- Efficiency analysis (redundant steps, wasted tokens)
- Cost breakdown by step
- Red flags (loops, repeated failures, excessive reasoning)
- Concrete improvement suggestions

**Zero config**: it uses your existing `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` from the environment. Your key is only used locally and goes directly from your machine to the LLM provider. No data passes through any intermediary server.

The sidebar shows whether AI Analysis is available. If you don't see it, start the dashboard from the same terminal where your API key is set.
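
To confirm that a key is visible in the shell you plan to start the dashboard from, you can run a quick check like the one below. This is a sketch of a manual check, not how the dashboard decides availability. `visible_analysis_keys` is a hypothetical helper.

```python
import os

ANALYSIS_KEY_NAMES = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def visible_analysis_keys(env=None):
    """Return the names of the supported API keys that are set and non-empty."""
    env = os.environ if env is None else env
    return [name for name in ANALYSIS_KEY_NAMES if env.get(name)]


print(visible_analysis_keys() or "no API key visible in this shell")
```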

## CLI

```bash
mimir quickstart                   # getting-started guide + Claude Code onboarding prompt
mimir dashboard                    # start dashboard on :9847
mimir dashboard --port 8080        # custom port
mimir version                      # print version
```

All commands also work as `python -m mimir.cli <command>`.
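
If the page does not load, it can help to check whether anything is listening on the dashboard port. The sketch below uses only the standard library. `is_listening` is a hypothetical helper, and a `True` result only means some process accepts TCP connections on that port, not that it is Mimir.

```python
import socket


def is_listening(host="localhost", port=9847, timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(is_listening())  # pass port=8080 if you started the dashboard with --port 8080
```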

## Requirements

- Python 3.10+
- No external dependencies

The SDKs you want to instrument (`openai`, `anthropic`, `openai-agents`, etc.) must be installed separately.
