Metadata-Version: 2.4
Name: agentweave-sdk
Version: 0.3.1
Summary: Observability and mesh layer for multi-agent AI systems — track what your agents decided, why they decided it, and how they're connected.
Project-URL: Homepage, https://github.com/arniesaha/agentweave
Project-URL: Issues, https://github.com/arniesaha/agentweave/issues
Author: Arnab Saha
License-Expression: MIT
Keywords: agents,observability,opentelemetry,provenance,tracing
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: opentelemetry-api>=1.20
Requires-Dist: opentelemetry-exporter-otlp-proto-http>=1.20
Requires-Dist: opentelemetry-sdk>=1.20
Requires-Dist: pydantic>=2
Requires-Dist: rich>=13
Requires-Dist: typer>=0.9
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: fastapi>=0.110; extra == 'dev'
Requires-Dist: httpx>=0.27; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest>=7; extra == 'dev'
Requires-Dist: uvicorn[standard]>=0.27; extra == 'dev'
Provides-Extra: grpc
Requires-Dist: opentelemetry-exporter-otlp-proto-grpc>=1.20; extra == 'grpc'
Provides-Extra: proxy
Requires-Dist: fastapi>=0.110; extra == 'proxy'
Requires-Dist: httpx>=0.27; extra == 'proxy'
Requires-Dist: uvicorn[standard]>=0.27; extra == 'proxy'
Description-Content-Type: text/markdown

# AgentWeave

**Agent runtime observability and provenance layer for multi-agent AI systems.**

When agents delegate, loop, and fan out across tools and models, the final output tells you nothing about how it was produced. AgentWeave makes the decision chain the first-class artifact. Every span carries [W3C PROV-O](https://www.w3.org/TR/prov-o/) provenance on [OpenTelemetry](https://opentelemetry.io/): which agent acted, which model ran, what was consumed, what was generated, and how much it cost.

Three paths to instrumentation: decorators, auto-instrumentation, or zero-code proxy. Any OTLP backend.

```
agent.nix                          94ms
├── llm.claude-sonnet-4-6          81ms  ← prompt_tokens=847, completion_tokens=312
├── tool.delegate_to_max           312ms
│   └── agent.max                  298ms
│       ├── llm.gemini-2.0-flash   187ms ← prompt_tokens=1203, completion_tokens=89
│       └── tool.web_search         94ms
├── llm.claude-sonnet-4-6          80ms  ← found it
└── tool.deploy_portfolio           48ms
```

## How it works

```mermaid
graph LR
    subgraph Agents["Agents — proxy mode"]
        A1["Anthropic Agent"]
        A2["Gemini Agent"]
        A3["OpenAI Agent"]
    end

    SDK["Any Agent — SDK decorators / auto_instrument()"]

    subgraph Proxy["AgentWeave Proxy :4000"]
        P["Multi-Provider Proxy"]
    end

    subgraph LLMs["Upstream LLMs"]
        AN[api.anthropic.com]
        GO[generativelanguage.googleapis.com]
        OA[api.openai.com]
    end

    subgraph Observability
        OT["OTLP Collector — Tempo / Jaeger / Langfuse"]
        GR["AgentWeave Dashboard"]
    end

    A1 -- "ANTHROPIC_BASE_URL" --> P
    A2 -- "GOOGLE_GENAI_BASE_URL" --> P
    A3 -- "OPENAI_BASE_URL" --> P
    SDK -- "OTel spans" --> OT

    P -- "/v1/messages" --> AN
    P -- "/v1beta/models/*" --> GO
    P -- "/v1/chat/completions" --> OA
    P -- "OTel spans" --> OT
    OT --> GR
```

**Three paths to instrumentation:**

1. **Auto-instrumentation** (`auto_instrument()`) — one call patches Anthropic and OpenAI SDKs. No decorators needed.
2. **Decorators** (`@trace_agent`, `@trace_llm`, `@trace_tool`) — wrap your functions directly in Python, TypeScript, or Go. Zero infrastructure needed.
3. **Proxy** — point any agent's base URL at AgentWeave. It auto-detects the provider, forwards upstream, extracts token counts, and emits OTel spans. No code changes.
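
The provider auto-detection in the proxy path can be pictured as a lookup on the request path. The sketch below uses only the three routes shown in the diagram; `ROUTES` and `detect_provider` are hypothetical names for illustration, and the real proxy may match on more than the path.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Route patterns copied from the diagram above. The matching logic is an
# illustrative assumption, not AgentWeave's actual implementation.
ROUTES = [
    ("/v1/messages", "anthropic"),
    ("/v1beta/models/*", "google"),
    ("/v1/chat/completions", "openai"),
]


def detect_provider(path: str) -> Optional[str]:
    """Return the provider whose route pattern matches the request path."""
    for pattern, provider in ROUTES:
        if fnmatchcase(path, pattern):
            return provider
    return None  # unknown route: a proxy would likely reject or pass through


print(detect_provider("/v1/messages"))
print(detect_provider("/v1beta/models/gemini-2.0-flash:generateContent"))
```

Because detection here depends only on the path, a single port can serve all three providers, which matches the "one port" setup described for the proxy.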

## Screenshots

<p align="center">
  <img src="screenshots/agentweave-overview.png" alt="AgentWeave dashboard overview with KPIs, latency, cost, and model/agent breakdowns" width="100%">
  <br>
  <em>Main dashboard overview (KPIs, latency, token/cost trends, and agent/model breakdowns)</em>
</p>

<p align="center">
  <img src="screenshots/agentweave-session.png" alt="AgentWeave session explorer view" width="100%">
  <br>
  <em>Session explorer view</em>
</p>

<p align="center">
  <img src="screenshots/agentweave-routing.png" alt="AgentWeave routing view" width="100%">
  <br>
  <em>Routing view</em>
</p>

<p align="center">
  <img src="screenshots/agentweave-replay.png" alt="AgentWeave replay and debug view" width="100%">
  <br>
  <em>Replay / debug view</em>
</p>

<p align="center">
  <img src="screenshots/agentweave-grafana.png" alt="Grafana Tempo trace view for AgentWeave spans" width="100%">
  <br>
  <em>Grafana / Tempo trace view</em>
</p>

## How AgentWeave fits in the ecosystem

Tools like [OpenLIT](https://openlit.io), [Langfuse](https://langfuse.com), and [LangSmith](https://smith.langchain.com) are good at answering: *what did my LLM do?* Token counts, latency, cost per request, prompt logging. If you have a single agent or a single app making LLM calls, those tools cover the problem well.

AgentWeave answers a different question: *what did my agent system do?*

When one agent delegates to another across different machines, frameworks, or providers, you lose the thread. A trace that stops at the process boundary tells you nothing about why the overall task failed, which agent introduced the bad output, or where the cost actually went.

| | OpenLIT / Langfuse / LangSmith | AgentWeave |
|---|---|---|
| Single-agent LLM tracing | Great | Basic |
| Cost and token tracking per request | Great | Supported |
| Prompt management, evals, playground | Yes (varies) | Out of scope |
| Cross-agent delegation traces | No | Core feature |
| Traces spanning multiple machines | No | Core feature |
| Proxy-based, zero code changes | No | Yes |
| Open source, self-hosted, no SaaS tier | Varies | Yes (MIT) |

**The intended use:** run OpenLIT or Langfuse inside each agent for deep per-agent observability, and point them all at AgentWeave for the system view above that. The delegation graph, cross-agent cost rollups, and traces that span process boundaries are what AgentWeave adds.

No cloud, no SaaS, no enterprise tier. Just the tool.

## Install

AgentWeave is in developer preview. Start with a local proxy and a local OTLP
collector; private dogfood deployments live in runbooks, not in the public
quickstart.

| SDK | Language | Install |
|-----|----------|---------|
| [sdk/python](./sdk/python) | Python | `pip install agentweave-sdk` |
| [sdk/js](./sdk/js) | TypeScript / JavaScript | `npm install agentweave-sdk` |
| [sdk/go](./sdk/go) | Go | `go get github.com/arniesaha/agentweave-go` |

### Local proxy path

```bash
pip install "agentweave-sdk[proxy]"
agentweave proxy start --port 4000 --endpoint http://localhost:4318
export ANTHROPIC_BASE_URL=http://localhost:4000
```

Use your normal provider API key in the client environment. Proxy-side key
injection and private NodePort URLs are dogfood-only conveniences, not required
for the public developer-preview path.

## Quickstart (Python)

### Option A — Auto-instrumentation (zero decorators)

```python
from agentweave import auto_instrument

auto_instrument()  # patches Anthropic + OpenAI SDKs automatically

# Every client.messages.create() and client.chat.completions.create()
# now emits OTel spans with token counts — no wrappers needed.
```

### Option B — Decorators (explicit control)

```python
import anthropic

from agentweave import AgentWeaveConfig, trace_agent, trace_llm, trace_tool

AgentWeaveConfig.setup(
    agent_id="my-agent-v1",
    agent_model="claude-sonnet-4-6",
    otel_endpoint="http://localhost:4318",
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

@trace_llm(provider="anthropic", model="claude-sonnet-4-6",
           captures_input=True, captures_output=True)
def call_claude(messages: list) -> anthropic.types.Message:
    # max_tokens is required by the Anthropic Messages API
    return client.messages.create(
        model="claude-sonnet-4-6", max_tokens=1024, messages=messages
    )

@trace_tool(name="web_search", captures_input=True, captures_output=True)
def web_search(query: str) -> str:
    ...  # replace with your search implementation

@trace_agent(name="my-agent")
async def handle(message: str) -> str:
    response = call_claude(messages=[{"role": "user", "content": message}])
    return web_search(response.content[0].text)
```

All three spans link to the same trace ID. Open any OTLP backend and you see the waterfall.
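
To see why nested decorated calls end up under one trace ID, here is a minimal standard-library sketch of the general technique: a context variable carries the active span, and each wrapper either inherits the caller's trace ID or starts a new trace. The `traced` decorator and the `SPANS` list are hypothetical stand-ins. AgentWeave builds on OpenTelemetry, whose Python context propagation also relies on `contextvars`, but this is not its actual code.

```python
import contextvars
import functools
import uuid

# Holds (trace_id, span_id) of the span currently executing, if any.
_current = contextvars.ContextVar("current_span", default=None)
SPANS = []  # finished spans, collected here instead of being exported


def traced(name):
    """Hypothetical stand-in for @trace_agent / @trace_llm / @trace_tool."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            parent = _current.get()
            # Reuse the caller's trace ID, or start a new trace at the root.
            trace_id = parent[0] if parent else uuid.uuid4().hex
            span_id = uuid.uuid4().hex[:16]
            token = _current.set((trace_id, span_id))
            try:
                return fn(*args, **kwargs)
            finally:
                _current.reset(token)
                SPANS.append({
                    "name": name,
                    "trace_id": trace_id,
                    "span_id": span_id,
                    "parent_id": parent[1] if parent else None,
                })
        return wrapper
    return decorator


@traced("llm.call")
def call_llm(prompt):
    return f"answer to {prompt}"


@traced("tool.web_search")
def web_search(query):
    return f"results for {query}"


@traced("agent.my-agent")
def handle(message):
    return web_search(call_llm(message))


handle("hello")
for span in SPANS:
    print(span["name"], span["trace_id"], span["parent_id"])
```

The two child spans finish first and record the agent span as their parent; the agent span finishes last with no parent, which is the shape a backend renders as a waterfall.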

## Framework Examples

| Framework | Example |
|-----------|---------|
| LangGraph | [examples/langgraph](./examples/langgraph) |
| CrewAI | [examples/crewai](./examples/crewai) |
| AutoGen | [examples/autogen](./examples/autogen) |
| OpenAI Agents SDK | [examples/openai-agents-sdk](./examples/openai-agents-sdk) |

## Auto-Instrumentation

Patch LLM SDK client methods with a single call — no decorators needed.

```python
from agentweave import auto_instrument, uninstrument

auto_instrument()                              # patch all detected SDKs
auto_instrument(providers=["anthropic"])        # selective
auto_instrument(captures_output=True)          # include response preview

uninstrument()                                 # restore originals
```

- Supports **Anthropic** (`Messages.create`) and **OpenAI** (`Completions.create`), sync + async
- Composes with explicit `@trace_llm` — auto-instrumentation detects existing spans and skips to avoid double-tracing
- Idempotent — calling `auto_instrument()` twice is safe
- Streaming responses are not yet instrumented; support is deferred to a follow-up
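
The idempotent patch-and-restore behavior listed above can be sketched in general terms as follows. `Messages`, `instrument`, and `CALLS` are made-up names for illustration; the sketch shows one common way to patch a method safely and is not a copy of how `auto_instrument()` is written.

```python
import functools


class Messages:
    """Stand-in for an SDK client class; not a real Anthropic or OpenAI type."""

    def create(self, **kwargs):
        return {"usage": {"input_tokens": 3, "output_tokens": 5}}


CALLS = []       # recorded usage, standing in for emitted spans
_originals = {}  # (class, method name) -> original function


def instrument(cls=Messages, method="create"):
    key = (cls, method)
    if key in _originals:  # already patched: a second call is a no-op
        return
    original = getattr(cls, method)
    _originals[key] = original

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        CALLS.append(result["usage"])
        return result

    setattr(cls, method, wrapper)


def uninstrument():
    """Put every patched method back and forget the saved originals."""
    for (cls, method), original in _originals.items():
        setattr(cls, method, original)
    _originals.clear()


instrument()
instrument()  # safe: does not wrap the wrapper
Messages().create(model="example-model")
print(len(CALLS))
```

Keeping the original function in a registry is what makes both guarantees cheap: the registry check prevents double wrapping, and the saved reference makes restoration exact.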

## Decorators

### `@trace_agent`

Root span for an agent turn. Nests all downstream tool and LLM calls.

```python
@trace_agent(name="nix")
def handle(message: str) -> str: ...
```

#### Session grouping

Pass `session_id` to group all spans from a single user conversation together.
The value is attached as `session.id` on every span, making it a filterable
dimension in Grafana / Tempo.

```python
@trace_agent(name="nix", session_id="conv-abc123")
def handle(message: str) -> str: ...
```

The proxy also accepts the `X-AgentWeave-Session-Id` header for zero-code
session tagging — see [docs/session-grouping.md](docs/session-grouping.md).

### `@trace_tool`

Span for any tool call — file ops, API calls, shell commands, A2A delegation.

```python
@trace_tool(name="delegate_to_max", captures_input=True, captures_output=True)
def delegate_to_max(task: str) -> dict: ...
```

### `@trace_llm`

Span for LLM invocations. Auto-extracts token counts and stop reason from Anthropic, OpenAI, and Google Gemini response shapes.

```python
@trace_llm(provider="anthropic", model="claude-sonnet-4-6", captures_output=True)
def call_claude(messages: list) -> anthropic.Message: ...
```

**Captured automatically:**
- `prov.llm.prompt_tokens` / `prov.llm.completion_tokens` / `prov.llm.total_tokens`
- `prov.llm.stop_reason`
- `prov.llm.response_preview` (first 512 chars, when `captures_output=True`)
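
As a rough picture of what this extraction involves, the sketch below normalizes the publicly documented JSON response shapes of the three providers into the attributes listed above. `extract_llm_attributes` is a hypothetical helper, and computing the Anthropic total as input plus output is an assumption, since that response shape carries no total field.

```python
def extract_llm_attributes(provider: str, body: dict,
                           captures_output: bool = False) -> dict:
    """Map a provider response (as a dict) onto prov.llm.* attributes."""
    if provider == "anthropic":
        usage = body.get("usage") or {}
        prompt = usage.get("input_tokens", 0)
        completion = usage.get("output_tokens", 0)
        total = prompt + completion  # assumption: no total in this shape
        stop = body.get("stop_reason")
        text = "".join(b.get("text", "") for b in body.get("content") or [])
    elif provider == "openai":
        usage = body.get("usage") or {}
        prompt = usage.get("prompt_tokens", 0)
        completion = usage.get("completion_tokens", 0)
        total = usage.get("total_tokens", prompt + completion)
        choice = (body.get("choices") or [{}])[0]
        stop = choice.get("finish_reason")
        text = (choice.get("message") or {}).get("content") or ""
    elif provider == "google":
        usage = body.get("usageMetadata") or {}
        prompt = usage.get("promptTokenCount", 0)
        completion = usage.get("candidatesTokenCount", 0)
        total = usage.get("totalTokenCount", prompt + completion)
        candidate = (body.get("candidates") or [{}])[0]
        stop = candidate.get("finishReason")
        parts = (candidate.get("content") or {}).get("parts") or []
        text = "".join(p.get("text", "") for p in parts)
    else:
        raise ValueError(f"unsupported provider: {provider}")

    attrs = {
        "prov.llm.prompt_tokens": prompt,
        "prov.llm.completion_tokens": completion,
        "prov.llm.total_tokens": total,
        "prov.llm.stop_reason": stop,
    }
    if captures_output:
        attrs["prov.llm.response_preview"] = text[:512]  # first 512 chars
    return attrs


sample = {
    "usage": {"input_tokens": 847, "output_tokens": 312},
    "stop_reason": "end_turn",
    "content": [{"type": "text", "text": "found it"}],
}
print(extract_llm_attributes("anthropic", sample, captures_output=True))
```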

## Sub-agent Attribution

When agents delegate to sub-agents, use the sub-agent attribution parameters to link child sessions to their parent and distinguish agent roles in traces.

### Python SDK

```python
# Main agent — tags itself as the root session
@trace_agent(name="nix", session_id="sess-main-123", agent_type="main", turn_depth=1)
def main_agent(msg: str) -> str:
    return delegate_to_sub(msg)

# Sub-agent — linked to parent session
@trace_agent(name="max", parent_session_id="sess-main-123",
             agent_type="subagent", turn_depth=2)
def sub_agent(task: str) -> str:
    return call_llm(task)
```

### Environment variable auto-detection

Set `AGENTWEAVE_PARENT_SESSION_ID` and the SDK auto-populates `prov.parent.session.id`, then defaults `agent_type` to `"subagent"` and `turn_depth` to `2`:

```bash
export AGENTWEAVE_PARENT_SESSION_ID=sess-main-123
```
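
The defaulting rule can be written out as a small function. This is a sketch of the behavior described above, not the SDK's code: `resolve_agent_context` is a hypothetical name, and two details are assumptions rather than documented behavior, namely that explicit arguments win over the environment and that an agent with no parent defaults to `"main"` at depth 1.

```python
import os


def resolve_agent_context(agent_type=None, turn_depth=None, environ=os.environ):
    """Return sub-agent attribution attributes for the current process."""
    parent = environ.get("AGENTWEAVE_PARENT_SESSION_ID")
    if parent:
        default_type, default_depth = "subagent", 2
    else:
        # Assumption: without a parent session this is the main agent.
        default_type, default_depth = "main", 1

    attrs = {
        # Assumption: explicitly passed values override the defaults.
        "prov.agent.type": agent_type if agent_type is not None else default_type,
        "prov.session.turn": turn_depth if turn_depth is not None else default_depth,
    }
    if parent:
        attrs["prov.parent.session.id"] = parent
    return attrs


print(resolve_agent_context(environ={"AGENTWEAVE_PARENT_SESSION_ID": "sess-main-123"}))
print(resolve_agent_context(environ={}))
```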

### Proxy headers

When using the proxy, pass sub-agent context via HTTP headers:

| Header | Span attribute | Example |
|--------|---------------|---------|
| `X-AgentWeave-Parent-Session-Id` | `prov.parent.session.id` | `sess-main-123` |
| `X-AgentWeave-Agent-Type` | `prov.agent.type` | `subagent` |
| `X-AgentWeave-Turn-Depth` | `prov.session.turn` | `2` |
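
On the receiving side, the table amounts to a header-to-attribute mapping. The following sketch illustrates it with a hypothetical `attributes_from_headers` helper; it also includes the `X-AgentWeave-Session-Id` header from the session grouping section. Matching header names case-insensitively and storing the turn depth as an integer are assumptions.

```python
HEADER_TO_ATTRIBUTE = {
    "x-agentweave-session-id": "session.id",
    "x-agentweave-parent-session-id": "prov.parent.session.id",
    "x-agentweave-agent-type": "prov.agent.type",
    "x-agentweave-turn-depth": "prov.session.turn",
}


def attributes_from_headers(headers: dict) -> dict:
    """Translate AgentWeave request headers into span attributes."""
    attrs = {}
    for name, value in headers.items():
        attribute = HEADER_TO_ATTRIBUTE.get(name.lower())
        if attribute is None:
            continue  # not an AgentWeave header
        if attribute == "prov.session.turn":
            # Assumption: depth is stored as an integer; skip malformed values.
            if not value.strip().isdigit():
                continue
            value = int(value)
        attrs[attribute] = value
    return attrs


print(attributes_from_headers({
    "X-AgentWeave-Parent-Session-Id": "sess-main-123",
    "X-AgentWeave-Agent-Type": "subagent",
    "X-AgentWeave-Turn-Depth": "2",
    "Content-Type": "application/json",
}))
```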

### TypeScript SDK

```typescript
import { traceAgent } from 'agentweave-sdk';

const subAgent = traceAgent({
  name: 'max',
  parentSessionId: 'sess-main-123',
  agentType: 'subagent',
  turnDepth: 2,
})(async (task: string) => {
  return callLlm(task);
});
```

### New span attributes

| Attribute | Description |
|---|---|
| `prov.parent.session.id` | ID of the parent session that spawned this sub-agent |
| `prov.agent.type` | `"main"`, `"subagent"`, or `"delegated"` |
| `prov.session.turn` | Turn depth: 1 = main session, 2 = first-level sub-agent |
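
These attributes are enough to rebuild a delegation graph and roll token usage up from sub-agents to their parents. The sketch below works on plain attribute dictionaries and assumes that every span carries its own `session.id` next to `prov.parent.session.id`; `rollup_tokens` and the `sess-sub-456` ID are invented for the example, and the token numbers are taken from the sample trace at the top of this README.

```python
from collections import defaultdict


def rollup_tokens(spans):
    """Return total tokens per session, including all descendant sessions."""
    own = defaultdict(int)        # tokens spent directly in each session
    children = defaultdict(set)   # parent session -> child sessions
    for attrs in spans:
        session = attrs["session.id"]
        own[session] += attrs.get("prov.llm.total_tokens", 0)
        parent = attrs.get("prov.parent.session.id")
        if parent:
            children[parent].add(session)
            own[parent] += 0      # make sure the parent is listed

    def total(session, seen=frozenset()):
        if session in seen:       # guard against accidental cycles
            return 0
        return own[session] + sum(
            total(child, seen | {session}) for child in children[session]
        )

    return {session: total(session) for session in list(own)}


spans = [
    {"session.id": "sess-main-123", "prov.llm.total_tokens": 1159},
    {"session.id": "sess-sub-456", "prov.parent.session.id": "sess-main-123",
     "prov.llm.total_tokens": 1292},
]
print(rollup_tokens(spans))
```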

## PROV-O Attributes

| Attribute | Description |
|---|---|
| `prov.activity.type` | `tool_call`, `agent_turn`, or `llm_call` |
| `prov.agent.id` | Agent identifier |
| `prov.agent.model` | Model name |
| `prov.used` | Serialized inputs consumed by the activity |
| `prov.wasGeneratedBy` | Output produced by the activity |
| `prov.wasAssociatedWith` | Agent responsible for the activity |
| `prov.llm.provider` | `anthropic`, `openai`, or `google` |
| `prov.llm.prompt_tokens` | Input token count |
| `prov.llm.completion_tokens` | Output token count |
| `prov.llm.total_tokens` | Total tokens |
| `prov.llm.stop_reason` | Why the model stopped |
| `prov.task.label` | Human-readable label for the task this agent is executing |

Full schema: [`sdk/python/agentweave/schema.py`](sdk/python/agentweave/schema.py)

## Proxy — zero-code observability

For agents you can't instrument with decorators (Claude Code, Node.js, any runtime), run the **AgentWeave proxy** — a transparent HTTP server that sits between your agents and their LLM providers. Works with Claude Code out of the box — just set `ANTHROPIC_BASE_URL` in `~/.claude/settings.json` ([setup guide](docs/claude-code-proxy.md)).

```bash
pip install "agentweave-sdk[proxy]"
agentweave proxy start --port 4000 --endpoint http://localhost:4318 --agent-id my-agent

# Point agents at the proxy — no code changes needed
export ANTHROPIC_BASE_URL=http://localhost:4000
export GOOGLE_GENAI_BASE_URL=http://localhost:4000
export OPENAI_BASE_URL=http://localhost:4000
```

**OpenAI/Codex streaming note:**
- `/v1/chat/completions` needs `stream_options.include_usage=true` for token usage
- `/v1/responses` and `/codex/responses` do **not** support `stream_options`; usage should arrive in the final `response.completed` event
- AgentWeave handles this difference in the proxy so the traced spans still get tokens/cost when upstream provides them
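
The difference between the two streaming formats can be illustrated with a small parser over server-sent-event lines. This is a sketch based on the publicly documented OpenAI event shapes, not the proxy's code; `usage_from_sse` is a hypothetical name. Chat Completions reports usage in a late chunk with a top-level `usage` object (only when `stream_options.include_usage` is set), while the Responses API reports it inside the `response.completed` event.

```python
import json
from typing import Optional


def usage_from_sse(lines) -> Optional[dict]:
    """Return prompt/completion token counts from SSE lines, if present."""
    usage = None
    for line in lines:
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        event = json.loads(payload)
        if event.get("type") == "response.completed":
            # Responses API: usage lives on the final response object.
            u = (event.get("response") or {}).get("usage") or {}
            if u:
                usage = {"prompt_tokens": u.get("input_tokens", 0),
                         "completion_tokens": u.get("output_tokens", 0)}
        elif event.get("usage"):
            # Chat Completions: usage is null on all but the final chunk.
            u = event["usage"]
            usage = {"prompt_tokens": u.get("prompt_tokens", 0),
                     "completion_tokens": u.get("completion_tokens", 0)}
    return usage


chat_stream = [
    'data: {"choices": [{"delta": {"content": "hi"}}], "usage": null}',
    'data: {"choices": [], "usage": {"prompt_tokens": 9, "completion_tokens": 2}}',
    "data: [DONE]",
]
print(usage_from_sse(chat_stream))
```

If a Chat Completions client omits `stream_options.include_usage`, no chunk carries usage and the function returns `None`, which corresponds to a span without token counts.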

One port, all providers. Every LLM call gets a span automatically.

> Docker / k8s setup: see [`deploy/docker/Dockerfile`](deploy/docker/Dockerfile)

## Backends

AgentWeave emits standard OTLP HTTP — works with any compatible backend:

| Backend | Endpoint |
|---|---|
| **Grafana Tempo** | `http://tempo:4318` — recommended for self-hosted |
| **Jaeger** | `http://jaeger:4318` |
| **Langfuse v3** | `https://cloud.langfuse.com/api/public/otel` |
| **Console (dev)** | `from agentweave import add_console_exporter; add_console_exporter()` |

## Docs

| Topic | Doc |
|-------|-----|
| Claude Code proxy setup | [docs/claude-code-proxy.md](./docs/claude-code-proxy.md) |
| Session grouping | [docs/session-grouping.md](./docs/session-grouping.md) |
| Proxy setup | [docs/proxy-setup.md](./docs/proxy-setup.md) |
| Production hardening | [docs/production-hardening.md](./docs/production-hardening.md) |
| Provider compatibility | [docs/compatibility.md](./docs/compatibility.md) |
| Deterministic trace IDs | [docs/deterministic-trace-ids.md](./docs/deterministic-trace-ids.md) |
| Span linking design | [docs/span-linking-design.md](./docs/span-linking-design.md) |
| Proxy benchmarks | [docs/benchmarks.md](./docs/benchmarks.md) |
| Versioning policy | [docs/versioning.md](./docs/versioning.md) |

## Development

```bash
git clone https://github.com/arniesaha/agentweave && cd agentweave
pip install -e "./sdk/python[dev]"

pytest sdk/python                                    # 237 Python tests
(cd sdk/js && npm ci && npx jest --verbose)           # 10 TypeScript tests
(cd sdk/go && go test ./... -v)                       # 4 Go tests
```

## License

MIT
