Metadata-Version: 2.4
Name: amplitude-ai
Version: 1.0.0.post3
Summary: Agent analytics for Amplitude
Project-URL: Homepage, https://amplitude.com
Project-URL: Documentation, https://docs.amplitude.com
Project-URL: Repository, https://github.com/amplitude/amplitude-ai-python
Author-email: Amplitude <sdk@amplitude.com>
License-Expression: MIT
License-File: LICENSE
Keywords: agents,ai,amplitude,analytics,anthropic,llm,observability,openai
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: amplitude-analytics>=1.0.0
Requires-Dist: genai-prices>=0.0.53
Requires-Dist: tiktoken>=0.12.0
Provides-Extra: all
Requires-Dist: anthropic<1.0.0,>=0.75.0; extra == 'all'
Requires-Dist: boto3<2.0.0,>=1.40.0; extra == 'all'
Requires-Dist: crewai>=0.80.0; extra == 'all'
Requires-Dist: google-genai>=1.0.0; extra == 'all'
Requires-Dist: langchain-core>=0.1.0; extra == 'all'
Requires-Dist: llama-index-core>=0.10.0; extra == 'all'
Requires-Dist: mistralai>=1.0.0; extra == 'all'
Requires-Dist: openai-agents>=0.0.3; extra == 'all'
Requires-Dist: openai<3.0.0,>=2.0.0; extra == 'all'
Requires-Dist: opentelemetry-api>=1.20.0; extra == 'all'
Requires-Dist: opentelemetry-sdk>=1.20.0; extra == 'all'
Requires-Dist: starlette>=0.41.0; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic<1.0.0,>=0.75.0; extra == 'anthropic'
Provides-Extra: azure
Requires-Dist: openai<3.0.0,>=2.0.0; extra == 'azure'
Provides-Extra: bedrock
Requires-Dist: boto3<2.0.0,>=1.40.0; extra == 'bedrock'
Provides-Extra: crewai
Requires-Dist: crewai>=0.80.0; extra == 'crewai'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5.0; extra == 'docs'
Requires-Dist: mkdocs>=1.6.0; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.25.0; extra == 'docs'
Provides-Extra: fastapi
Requires-Dist: starlette>=0.41.0; extra == 'fastapi'
Provides-Extra: gemini
Requires-Dist: google-genai>=1.0.0; extra == 'gemini'
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.1.0; extra == 'langchain'
Provides-Extra: llamaindex
Requires-Dist: llama-index-core>=0.10.0; extra == 'llamaindex'
Provides-Extra: mistral
Requires-Dist: mistralai>=1.0.0; extra == 'mistral'
Provides-Extra: openai
Requires-Dist: openai<3.0.0,>=2.0.0; extra == 'openai'
Provides-Extra: openai-agents
Requires-Dist: openai-agents>=0.0.3; extra == 'openai-agents'
Provides-Extra: otel
Requires-Dist: opentelemetry-api>=1.20.0; extra == 'otel'
Requires-Dist: opentelemetry-sdk>=1.20.0; extra == 'otel'
Description-Content-Type: text/markdown

# amplitude-ai

[![PyPI](https://img.shields.io/pypi/v/amplitude-ai)](https://pypi.org/project/amplitude-ai/)
[![Python](https://img.shields.io/pypi/pyversions/amplitude-ai)](https://pypi.org/project/amplitude-ai/)
[![License](https://img.shields.io/pypi/l/amplitude-ai)](https://github.com/amplitude/amplitude-ai/blob/main/LICENSE)

Agent analytics for [Amplitude](https://amplitude.com). Track every LLM call as events in your Amplitude project, then build funnels, cohorts, and retention charts across AI and product behavior.

```bash
pip install amplitude-ai
```

```python
from amplitude import Amplitude
import amplitude_ai

amplitude_ai.patch(amplitude=Amplitude("YOUR_API_KEY"))

# Your existing code -- unchanged
import openai
response = openai.OpenAI(api_key="sk-...").chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is retention?"}],
)
# [Agent] User Message + [Agent] AI Response now in your Amplitude project
```

One call auto-detects and patches every installed provider (OpenAI, Anthropic, Azure OpenAI, Gemini, Mistral). It also instruments libraries that use these SDKs under the hood (LiteLLM, Pydantic AI, LangChain).

Want more control? See [Choose Your Integration Tier](#choose-your-integration-tier) below.

### Quickstart (5 minutes)

1. **Install:** `pip install amplitude-ai`
2. **Get your API key:** In Amplitude, go to **Settings > Projects** and copy the API key.
3. **Add two lines** to the top of your app (see the example above).
4. **Run your app.** Any `openai`, `anthropic`, `mistralai`, or `google.genai` call is now instrumented automatically.
5. **Open Amplitude > Events.** You should see `[Agent] AI Response` within 30 seconds.

User identity is set per-request: pass `amplitude_user_id="..."` on individual LLM calls, use `Session` for multi-turn conversations, or use `AmplitudeAIMiddleware` in web apps. See [User Identity](#user-identity) below.

To verify locally before checking Amplitude, add `debug=True`:
```python
amplitude_ai.patch(amplitude=Amplitude("YOUR_API_KEY"), debug=True)
# Prints: [amplitude-ai] [Agent] AI Response | model=gpt-4o | tokens=847 | cost=$0.0042 | latency=1,203ms
```

### Current Limitations

| Area | Status |
|---|---|
| Language support | Python only. JS/TS SDK is the next major investment (no public ETA yet). |
| Zero-code patching | OpenAI, Anthropic, Azure OpenAI, Gemini, Mistral. Bedrock: use `wrap()` or swap import. CLI wrapper available for env-var-only setup. |
| Proxy/gateway instrumentation | Use the [OTEL bridge](#otel-genai-bridge-reference) for proxy setups (LiteLLM, Portkey, custom gateways). See [Path B](#path-b-you-already-use-an-otel-llm-tool). |
| Streaming cost tracking | Automatic for OpenAI and Anthropic. Manual token counts for other providers' streamed responses. |

### Is this for me?

**Yes, if** you're building an AI-powered feature (chatbot, copilot, agent, RAG pipeline) and you want to measure how it impacts real user behavior. AI events land in the same Amplitude project as your product events, so you can build funnels from "user asks a question" to "user converts," create cohorts of users with low AI quality scores, and measure retention without stitching data across tools.

**Already using an LLM observability tool?** Keep it. The [OTEL bridge](#otel-genai-bridge-reference) adds Amplitude as a second destination in one line. Your existing traces stay, and you get product analytics on top.

This SDK is for teams who want AI session review, automated enrichment, and business impact measurement in the same place they measure product behavior. The [zero-code path](#quickstart-5-minutes) takes under 5 minutes.

### Why this SDK?

Most AI observability tools give you traces. This SDK gives you **per-turn events that live in your product analytics** so you can build funnels from "user opens chat" through "AI responds" to "user converts," create cohorts of users with low AI quality scores and measure their 7-day retention, and answer "is this AI feature helping or hurting?" without moving data between tools.

The structural difference is the event model. Trace-centric tools typically produce spans per LLM call. This SDK produces **one event per conversation turn** with 40+ properties: model, tokens, cost, latency, reasoning, implicit feedback signals (regeneration, copy, abandonment), cache breakdowns, agent hierarchy, and experiment context. Each event is independently queryable in Amplitude's charts, cohorts, funnels, and retention analysis.

**Every AI event carries your product `user_id`.** No separate identity system, no data joining required. Build a funnel from "user opens chat" to "AI responds" to "user upgrades" directly in Amplitude, using the same user properties and cohort definitions you already have.

**Server-side enrichment does the evals for you.** When content is available (`content_mode="full"`), Amplitude's enrichment pipeline runs automatically on every session after it closes. You get topic classifications, quality rubrics, behavioral flags, and session outcomes without writing or maintaining any eval code. Define your own topics and scoring rubrics; the pipeline applies them to every session automatically. Results appear as `[Agent] Score` events with rubric scores, `[Agent] Topic Classification` events with category labels, and `[Agent] Session Evaluation` summaries, all queryable in charts, cohorts, and funnels alongside your product events.

**Quality signals from every source in one event type.** User thumbs up/down (`source="user"`), automated rubric scores from the enrichment pipeline (`source="ai"`), and reviewer assessments (`source="reviewer"`) all produce `[Agent] Score` events differentiated by `[Agent] Evaluation Source`. One chart shows all three side by side. Filter by source or view them together. Filter by `[Agent] Agent ID` for per-agent quality attribution.

**Three content-control tiers.** `full` sends content and Amplitude runs enrichments for you. `metadata_only` sends zero content (you still get cost, latency, tokens, session grouping, and everything that doesn't require text). `customer_enriched` sends zero content but lets you provide your own structured labels via `track_session_enrichment()` for the same analytics value with full data control. See [Privacy & Content Control](#privacy--content-control) for what each tier enables.

**Cache-aware cost tracking.** Pass `cache_read_tokens` and `cache_creation_tokens` for accurate blended costs. With Anthropic's prompt caching, naive `tokens x price` overestimates by 2-5x on multi-turn sessions. The SDK uses cache-aware pricing automatically via [genai-prices](https://pypi.org/project/genai-prices/) when you provide the token breakdown. Supported for OpenAI, Anthropic, Gemini, Azure OpenAI, and AWS Bedrock.
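
To make the overestimate concrete, here is a minimal sketch of the arithmetic. The prices are illustrative placeholders rather than real rates, `naive_cost` and `blended_cost` are hypothetical helpers (not the SDK's or genai-prices' implementation), and the sketch assumes cached tokens are counted as part of the input total.

```python
# Illustrative prices in USD per million tokens (placeholders, not real rates).
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00
CACHE_READ_PRICE = 0.30  # assumption: cached input is billed at a discount


def naive_cost(input_tokens: int, output_tokens: int) -> float:
    """Price every input token at the full input rate."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


def blended_cost(input_tokens: int, output_tokens: int, cache_read_tokens: int) -> float:
    """Price cache reads at the discounted rate and the rest at the full rate.

    Assumes cache_read_tokens is a subset of input_tokens.
    """
    uncached = input_tokens - cache_read_tokens
    return (
        uncached * INPUT_PRICE
        + cache_read_tokens * CACHE_READ_PRICE
        + output_tokens * OUTPUT_PRICE
    ) / 1_000_000


naive = naive_cost(input_tokens=10_000, output_tokens=500)
blended = blended_cost(input_tokens=10_000, output_tokens=500, cache_read_tokens=8_000)
print(f"naive=${naive:.4f} blended=${blended:.4f} ratio={naive / blended:.2f}x")
```

With 80% of the input served from cache, the naive figure in this sketch is about 2.4 times the blended one. The gap widens as the cached share grows.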

**Works alongside your existing LLM tools.** Add the OTEL GenAI exporter to your pipeline to send spans to Amplitude alongside Langfuse, OpenLIT, or other destinations with no changes to your existing instrumentation code. Use LangChain, LlamaIndex, OpenAI Agents SDK, Anthropic tool_use loop, or CrewAI integrations for framework-level tracking. Or swap in provider wrappers (OpenAI, Anthropic, Gemini, Azure, Bedrock, Mistral) for the richest field coverage.

**Multi-agent and multi-tenant from day one.** `ai.agent()` creates a bound handle that carries `agent_id`, `agent_version`, `env`, and optional multi-tenant fields so you never repeat them. `ai.tenant()` pre-fills `customer_org_id` and `groups` for platforms serving multiple customers. `agent.child()` auto-sets `parent_agent_id` and inherits `agent_version`. `agent.session()` manages lifecycle automatically and propagates context to provider wrappers and the OTEL bridge via Python's `contextvars`.

### What you can build

Once AI events are in Amplitude alongside your product events:

**Cohorts.** "Users who had 3+ task failures in the last 30 days." "Users with low task completion scores." Target them with Guides, measure churn impact.

**Funnels.** "AI session about charts -> Chart Created." "Sign Up -> First AI Session -> Conversion." Measure whether AI drives feature adoption and onboarding.

**Retention.** Do users with successful AI sessions retain better than those with failures? Segment retention curves by `[Agent] Overall Outcome` or task completion score.

**Agent analytics.** Compare quality, cost, and failure rate across agents in one chart. Identify which agent in a multi-agent chain introduced a failure.

### Choose Your Integration Tier

If you're adding to an existing codebase, use **Zero-code** or **Wrap**. If you're starting fresh, use **Swap import**. All three auto-track `[Agent] User Message` and `[Agent] AI Response` with full token, cost, and latency properties. Combine any of them with `BoundAgent` + `Session` to unlock the full event model: tool calls, scoring, implicit feedback, session lifecycle, and enrichment.

| Tier | When to use | How |
|---|---|---|
| **Zero-code** | You want tracking without touching existing code | `amplitude_ai.patch(amplitude=amp)` |
| **Wrap** | You've already created a client | `wrap(client, amplitude=amp)` |
| **Swap import** | Starting fresh or want richest field coverage | `from amplitude_ai import OpenAI` |
| **Full control** | Multi-agent, custom scoring, session lifecycle | `BoundAgent` + `Session` |
| **FastAPI middleware** | Web app, auto-session per request | `AmplitudeAIMiddleware` |

**Zero-code** patches provider modules so existing calls are instrumented automatically:

```python
import amplitude_ai

amplitude_ai.patch(amplitude=amp)
# All subsequent openai, anthropic, gemini, mistral calls are instrumented

amplitude_ai.unpatch()  # Restore all originals -- critical for test isolation
```

`patch()` auto-detects installed providers and returns a list of what it patched (e.g. `["openai", "anthropic", "gemini"]`). If you only want to patch a specific provider, use the per-provider functions:

```python
amplitude_ai.patch_openai(amplitude=amp)
amplitude_ai.patch_anthropic(amplitude=amp)
amplitude_ai.patch_gemini(amplitude=amp)
amplitude_ai.patch_mistral(amplitude=amp)
amplitude_ai.patch_azure_openai(amplitude=amp)
```

Zero-code patching is available for OpenAI, Anthropic, Azure OpenAI, Gemini, and Mistral. For Bedrock, use the **Swap import** provider class directly (`from amplitude_ai.providers.bedrock import Bedrock`) because the `boto3.client()` factory pattern doesn't support clean monkey-patching.

**No-code setup (CLI wrapper)** — instrument your application using only environment variables, without modifying any Python source code:

```bash
pip install amplitude-ai
AMPLITUDE_AI_API_KEY=xxx AMPLITUDE_AI_AUTO_PATCH=true amplitude-ai-instrument python app.py
```

The CLI detects installed LLM providers, patches them, then runs your command. All standard `openai`, `anthropic`, `mistralai`, and `google.genai` calls in your app are instrumented automatically. Optional environment variables:

| Variable | Description |
|---|---|
| `AMPLITUDE_AI_API_KEY` | **(required)** Amplitude API key |
| `AMPLITUDE_AI_AUTO_PATCH` | Must be `"true"` to enable patching |
| `AMPLITUDE_AI_CONTENT_MODE` | `"full"` (default), `"metadata_only"`, or `"customer_enriched"` |
| `AMPLITUDE_AI_DEBUG` | `"true"` for colored event summaries on stderr |

**Wrap** instruments a client you've already created (OpenAI, Anthropic, Azure OpenAI):

```python
import amplitude_ai
from openai import OpenAI

client = OpenAI(api_key="sk-...")
wrapped = amplitude_ai.wrap(client, amplitude=amp, user_id="u1")
# wrapped is a real amplitude_ai.OpenAI instance
```

Move to **Full control** when you need multi-agent hierarchy, custom scoring, or session lifecycle management.

### Multi-service / Distributed Tracing

If your LLM pipeline spans multiple services (e.g., an orchestrator calling a retrieval service that calls an LLM), enable context propagation so sessions link across service boundaries:

```python
from amplitude import Amplitude
from amplitude_ai import AmplitudeAI
from amplitude_ai.config import AIConfig

amplitude = Amplitude("YOUR_API_KEY")
ai = AmplitudeAI(
    amplitude=amplitude,
    config=AIConfig(propagate_context=True),
)
```

When enabled, provider wrappers inject W3C `traceparent` and `x-amplitude-session-id` headers on outgoing LLM calls. Downstream services running the SDK (or the `AmplitudeAIMiddleware`) automatically pick up this context, linking the sessions into a single distributed trace.

You can also inject/extract context manually for non-LLM HTTP calls:

```python
import requests

from amplitude_ai.propagation import inject_context, extract_context

# Sender: inject context into outgoing headers
# (existing_headers is the dict of headers you already send)
headers = inject_context(existing_headers)
requests.post("https://downstream-service/api", headers=headers)

# Receiver: extract context from incoming headers
# (request is your web framework's incoming request object)
ctx = extract_context(request.headers)
# ctx = {"trace_id": "...", "session_id": "...", "agent_id": "..."}
```

Context propagation is opt-in (default `False`). Most providers ignore unknown headers, so the extra headers are usually harmless, but some proxies or custom endpoints may reject them.
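
For reference, a W3C `traceparent` value has the form `version-traceid-parentid-flags`, with 2, 32, 16, and 2 lowercase hex characters respectively. The sketch below builds and parses such a header using only the standard library. `make_traceparent` and `parse_traceparent` are hypothetical helpers for illustration, not part of `amplitude_ai.propagation`.

```python
import re
import secrets

_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def make_traceparent(trace_id=None, sampled=True):
    """Build a version-00 traceparent value with a fresh parent (span) ID."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 bytes -> 32 hex chars
    parent_id = secrets.token_hex(8)              # 8 bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


def parse_traceparent(value):
    """Return the header's fields as a dict, or None if it is malformed."""
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version "ff" and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 1),
    }


headers = {
    "traceparent": make_traceparent(),
    "x-amplitude-session-id": "sess-abc-123",
}
print(parse_traceparent(headers["traceparent"]))
```

A proxy that validates or strips unknown headers would need to allow both of these through for sessions to link across services.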

### Developer Experience

Enable **debug mode** to see every tracked event in your terminal. For the zero-code tier, pass `debug=True` to `patch()` (see [Quickstart](#quickstart-5-minutes)). For full-control usage, set it on `AIConfig`:

```python
ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(debug=True))
# [amplitude-ai] [Agent] AI Response | user=u1 | session=a3f8... | model=gpt-4o | tokens=1,247 | cost=$0.0089 | latency=1,203ms
```

Use **dry-run mode** in CI to validate events without sending them:

```python
ai = AmplitudeAI(api_key="unused", config=AIConfig(dry_run=True))
```

Enable **strict validation** to catch bad inputs early (empty `user_id`, negative `latency_ms`, non-numeric scores):

```python
ai = AmplitudeAI(api_key="...", config=AIConfig(validate=True))
# Raises ValidationError on bad inputs instead of silently continuing
```
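
As a rough illustration of the three checks named above, the sketch below validates a few inputs with the standard library only. The `ValidationError` class and `validate_event` function here are hypothetical stand-ins. The SDK's actual rules and error messages may differ.

```python
from numbers import Real


class ValidationError(ValueError):
    """Hypothetical stand-in for the SDK's validation error."""


def _is_number(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def validate_event(user_id, latency_ms, score=None) -> None:
    """Raise ValidationError for the bad inputs described above."""
    if not isinstance(user_id, str) or not user_id.strip():
        raise ValidationError("user_id must be a non-empty string")
    if not _is_number(latency_ms) or latency_ms < 0:
        raise ValidationError("latency_ms must be a non-negative number")
    if score is not None and not _is_number(score):
        raise ValidationError("score must be numeric")


validate_event("user-1", latency_ms=1203.0, score=1.0)  # passes silently

try:
    validate_event("", latency_ms=1203.0)
except ValidationError as exc:
    print(f"rejected: {exc}")
```

Failing fast like this in CI surfaces malformed events before they reach a production project.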

Combine all three for the strictest CI configuration:

```python
ai = AmplitudeAI(api_key="unused", config=AIConfig(debug=True, dry_run=True, validate=True))
```

Inspect current configuration at any time:

```python
ai.status()
# {"content_mode": "full", "debug": False, "dry_run": False,
#  "redact_pii": False, "providers_available": ["openai", "anthropic"],
#  "patched_providers": ["openai"]}
```

### Model Tier Auto-Inference

Every `[Agent] AI Response` event automatically includes a `[Agent] Model Tier` property (`"fast"`, `"standard"`, or `"reasoning"`) inferred from the model name. This enables cost optimization insights like "70% of simple sessions use your most expensive model."

Override when the auto-inference is wrong:

```python
ai.track_ai_message(..., model_tier="reasoning")
```

Coverage at launch: GPT-4o-mini/Haiku/Flash = `fast`, GPT-4o/Sonnet/Pro = `standard`, o1/o3/DeepSeek-R1 = `reasoning`.
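
The mapping above can be approximated with simple name matching. This is a minimal sketch under stated assumptions: `infer_model_tier` is a hypothetical function, the matching rules are guesses based only on the coverage list, and falling back to `"standard"` for unknown models is an assumption rather than documented behavior.

```python
def infer_model_tier(model: str) -> str:
    """Guess a tier from the model name using the coverage list above."""
    name = model.lower()
    # Reasoning models: OpenAI o1/o3 families and DeepSeek-R1.
    if name.startswith(("o1", "o3")) or "deepseek-r1" in name:
        return "reasoning"
    # Check fast markers before falling back, since "gpt-4o-mini"
    # also contains "gpt-4o".
    if any(marker in name for marker in ("gpt-4o-mini", "haiku", "flash")):
        return "fast"
    # Assumption: everything else, including unknown models, is "standard".
    return "standard"


for model in ("gpt-4o-mini", "gpt-4o", "claude-sonnet-4-20250514", "o3-mini"):
    print(model, "->", infer_model_tier(model))
```

Name-based inference will misclassify models it has never seen, which is why the explicit `model_tier` override exists.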

### Semantic Cache Tracking

Track full-response semantic cache hits (distinct from token-level prompt caching):

```python
ai.track_ai_message(..., was_cached=True)  # Served from Redis/semantic cache
```

Maps to `[Agent] Was Cached`. Enables "cache hit rate" charts and cost optimization analysis.

### FastAPI / Starlette Middleware

Auto-create sessions per HTTP request with context propagation to all SDK calls within the handler:

```python
from amplitude_ai.middleware import AmplitudeAIMiddleware

app.add_middleware(
    AmplitudeAIMiddleware,
    amplitude_ai=ai,
    user_id_resolver=lambda request: request.state.user.id,
)
```

Provider wrappers and `@tool` calls within the request handler automatically inherit the session context. No manual `session_id` passing needed.

## Get Started

Pick the path that matches where you are today. Both converge on the same analytics: sessions, scoring, enrichments, funnels across product and AI events.

### Path A: You use Amplitude for product analytics

You already have `amplitude-analytics` sending product events. Now you're adding AI features and want those events in the same project.

**Step 1: Swap your LLM import**

```bash
pip install "amplitude-ai[openai]"   # or [anthropic], [gemini], [bedrock], [mistral]
```

```python
from amplitude import Amplitude
from amplitude_ai import AmplitudeAI, OpenAI  # drop-in replacement

# Share your existing Amplitude instance -- same pipeline, no duplicate queues
amplitude = Amplitude("YOUR_API_KEY")
ai = AmplitudeAI(amplitude=amplitude)

# Use the wrapped client exactly like the original
client = OpenAI(amplitude=amplitude, api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is retention?"}],
    amplitude_user_id="user-1",
)
# [Agent] User Message + [Agent] AI Response now appear in your Amplitude project
# alongside your existing product events
```

That's it. You're already getting `[Agent] User Message` and `[Agent] AI Response` events with model, tokens, cost, and latency. Everything below is progressive enhancement for when you need session grouping, multi-agent hierarchy, or enrichments.

**Step 2: Add session context** *(optional)*

Wrap your code in `agent.session()`. The provider wrapper call itself **doesn't change**. It automatically inherits `session_id`, `trace_id`, `agent_id`, and `turn_id` from the active session via Python's `contextvars`.

```python
agent = ai.agent("support-bot", env="production")

with agent.session(user_id="user-1") as s:
    s.new_trace()
    s.track_user_message(content="What is retention?")

    # Same wrapper call as Step 1 -- no amplitude_user_id needed
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is retention?"}],
    )
    # NOW emits with session_id, trace_id, agent_id, turn_id

    s.score(name="helpful", value=1.0, target_id="...")
# Session auto-ends, server-side enrichment kicks in
```

For sessions where gaps between messages may exceed 30 minutes (e.g., coding assistants, support agents waiting on customer replies), pass `idle_timeout_minutes` so Amplitude knows the session is still active:

```python
with agent.session(idle_timeout_minutes=240) as s:  # expect up to 4-hour gaps
    ...
```

Without this, sessions with long idle periods may be closed and evaluated prematurely. The default is 30 minutes.

**Link to Session Replay** *(optional)*

If your frontend uses Amplitude's [Session Replay](https://www.docs.developers.amplitude.com/session-replay/), you can link browser recordings to AI sessions. Pass the browser's `device_id` and `session_id` to `agent.session()` (as `device_id` and `browser_session_id`) and every `[Agent]` event will automatically include the `[Amplitude] Session Replay ID` property (`device_id/session_id`), enabling one-click navigation from an AI session to the corresponding replay.

```python
# The frontend sends device_id and session_id to your backend
# (e.g., via request headers, query params, or the request body).
with agent.session(
    user_id="user-1",
    device_id=request.headers["X-Amp-Device-Id"],
    browser_session_id=request.headers["X-Amp-Session-Id"],
) as s:
    s.new_trace()
    s.track_user_message(content="What is retention?")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is retention?"}],
    )
    # All events now carry [Amplitude] Session Replay ID
```

Provider wrappers, `@tool` calls, and manual `track_*` calls all inherit the replay ID automatically when inside the session block.

**What unlocks at each step:**

| Step | Events | Key fields | What's new |
|---|---|---|---|
| **Swap import** | `[Agent] User Message`, `[Agent] AI Response` | model, provider, tokens (input/output/reasoning/cache), cost (cache-aware), latency, TTFB, system prompt, temperature, top_p, max_output_tokens, is_streaming, reasoning content, finish_reason, errors, implicit feedback (is_regeneration, is_edit, was_copied), file attachments | AI events appear alongside product events. Build funnels across both. |
| **+ session context** | + `[Agent] Session End`, `[Agent] Score` | + session_id, trace_id, turn_id, agent_id, agent_version, env, abandonment_turn | Session grouping, scoring, abandonment analysis, server-side enrichments (when content_mode=full). Cohort by session quality, funnel from product events through AI sessions. Compare agent versions. |
| **+ manual track_\* calls** | + `[Agent] Tool Call`, `[Agent] Embedding`, `[Agent] Span`, `[Agent] Session Enrichment` | + event linking (parent_message_id, parent_span_id), custom properties, multi-agent fields (parent_agent_id) | Full event graph, customer-provided enrichments, multi-agent hierarchies. |

**Next:** [Scoring](#scoring) | [Enrichments](#enrichments) | [All providers](#provider-wrappers) | [Privacy](#privacy--content-control)

---

### Path B: You already use an OTEL LLM tool

Already using Langfuse, OpenLIT, or Datadog for tracing? Keep them. Add Amplitude as a second destination in one line. You get product analytics for AI (funnels, cohorts, retention across AI and product events) without ripping out your existing setup. The OTEL GenAI exporter consumes any [OTEL GenAI semantic convention](https://opentelemetry.io/docs/specs/semconv/gen-ai/) spans and maps them to Amplitude `[Agent]` events.

**Step 1: Add the bridge**

```bash
pip install "amplitude-ai[otel]"
```

```python
from amplitude import Amplitude
from amplitude_ai import AmplitudeAI
from opentelemetry import trace
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from amplitude_ai.integrations.opentelemetry import AmplitudeAgentExporter

amplitude = Amplitude("YOUR_API_KEY")
ai = AmplitudeAI(amplitude=amplitude)

# Add alongside your existing TracerProvider (e.g. Langfuse, OpenLIT, etc.)
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(AmplitudeAgentExporter(amplitude=amplitude, user_id="user-123"))
)
# All GenAI spans now flow to Amplitude as [Agent] events -- zero changes to your
# existing instrumentation. Your OTEL tool keeps working exactly as before.
```

**Step 2: Add session context**

Wrap your code in `agent.session()` and the OTEL bridge automatically inherits session/agent context via the same `ContextVar` mechanism:

```python
agent = ai.agent("support-bot", env="production")

with agent.session(user_id="user-123") as s:
    s.new_trace()
    s.track_user_message(content="What is retention?")

    # Any OTEL-instrumented GenAI calls inside this block automatically get
    # session_id, trace_id, turn_id, and agent_id in Amplitude
    result = my_instrumented_function(...)  # Langfuse @observe, OpenLIT, etc.

    s.score(name="helpful", value=1.0, target_id="...")
# Session auto-ends, server-side enrichment kicks in
```

**What unlocks at each step:**

| Step | Events | Key fields | Not available from OTEL |
|---|---|---|---|
| **Add bridge** | `[Agent] User Message`, `[Agent] AI Response`, `[Agent] Embedding`, `[Agent] Tool Call` | model, provider, tokens (input/output), basic cost, latency, system prompt, temperature, top_p, max_output_tokens, content (if opted-in), errors | Cache tokens, reasoning content/tokens, TTFB, streaming detection, implicit feedback, file attachments, event graph linking (parent_message_id) |
| **+ session context** | + `[Agent] Session End`, `[Agent] Score` | + session_id, trace_id, turn_id, agent_id, agent_version, env, abandonment_turn | Same field gaps, but now: session grouping, scoring, abandonment analysis, server-side enrichments. Compare agent versions. Build funnels from product events through AI sessions. |
| **+ selective native wrappers** | Same events, richer fields on wrapped providers | + cache-aware cost, reasoning content, TTFB, streaming, implicit feedback (is_regeneration, is_edit, was_copied), file attachments for those providers | Gaps closed progressively per provider you wrap. See [Provider Wrappers](#provider-wrappers). |

The third row is the natural upgrade path: start with the OTEL bridge for everything, then selectively wrap your most important provider calls for full field coverage. The bridge and native wrappers coexist; you don't have to choose one or the other.

**Next:** [OTEL Bridge details](#otel-genai-bridge-reference) | [Scoring](#scoring) | [Privacy](#privacy--content-control) | [Provider Wrappers](#provider-wrappers)

### User Identity

User identity is set on the **session** or **per call**, not at agent creation or patch time. This keeps the agent reusable across users.

**Via sessions** (recommended): pass `user_id` when opening a session:
```python
agent = ai.agent("support-bot", env="production")
with agent.session(user_id="user-42") as s:
    s.new_trace()
    s.track_user_message(content="Hello")
    response = client.chat.completions.create(model="gpt-4o", messages=[...])
```

**Per-call**: pass `amplitude_user_id` on each LLM call (useful with the zero-code tier):
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    amplitude_user_id="user-42",
)
```

**Via middleware**: `AmplitudeAIMiddleware` extracts user identity from the request (see [FastAPI / Starlette Middleware](#fastapi--starlette-middleware)).

### Initialization Options

```python
from amplitude import Amplitude
from amplitude_ai import AmplitudeAI

# Recommended -- share your existing Amplitude pipeline
amplitude = Amplitude("YOUR_API_KEY")
ai = AmplitudeAI(amplitude=amplitude)

# Standalone (creates an Amplitude instance internally)
ai = AmplitudeAI(api_key="YOUR_API_KEY")

# EU Data Residency
amplitude = Amplitude("YOUR_API_KEY")
amplitude.configuration.server_zone = "EU"
ai = AmplitudeAI(amplitude=amplitude)
```

---

## Core Concepts

The SDK organizes AI interactions into **sessions**, **traces**, **turns**, and **spans**:

```
What you instrument:
SESSION
  Trace                            ← one per user message (new_trace())
    [Agent] User Message             automatic from provider wrapper
    [Agent] Tool Call                automatic or track_tool_call()
    [Agent] AI Response              automatic from provider wrapper
    [Agent] Score                    s.score() — rate this response
  ...repeat per conversation turn...
  [Agent] Session Enrichment         track_session_enrichment()
  [Agent] Score                      s.score() — rate the session
  [Agent] Session End                track_session_end()

What Amplitude adds automatically (content_mode="full" only):
  [Agent] Session Evaluation         outcome, flags, behavioral patterns
  [Agent] Topic Classification       one per topic model you define
  [Agent] Score (ai)                 one per rubric you define
```

Scores can attach at **message level** (rate a specific response) or **session level** (rate the whole conversation). Enrichments attach at **session level** only.

| Concept | Property | Description |
|---------|----------|-------------|
| **Session** | `session_id` | A conversation between a user and the AI. All events in one conversation share the same `session_id`. |
| **Trace** | `trace_id` | One user-message-to-AI-response cycle. Generate a new `trace_id` (UUID) each time the user sends a message. All events in that cycle (the user message, any tool calls, and the AI response) share the same `trace_id`. Use `new_trace()` or pass a UUID directly. |
| **Turn** | `turn_id` | Monotonically increasing counter for event ordering. The SDK auto-increments per session when omitted. For custom ordering (e.g., per-trace numbering), pass explicit values. |
| **Span** | `span_id` | A tracked operation: tool call, embedding, vector search, or custom step. |
| **Agent** | `agent_id` | Which agent handled the interaction (for multi-agent systems). |
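
How these identifiers relate is easiest to see in a small standalone sketch. `SessionIds` is a hypothetical class for illustration only, and starting the turn counter at 1 is an assumption. In real code the SDK's session object manages these values for you.

```python
import itertools
import uuid


class SessionIds:
    """Hypothetical model of one session's ID bookkeeping."""

    def __init__(self) -> None:
        self.session_id = str(uuid.uuid4())       # shared by the whole conversation
        self.trace_id = None                      # set by new_trace()
        self._turn_counter = itertools.count(1)   # assumption: turns start at 1

    def new_trace(self) -> str:
        """Start a new user-message-to-AI-response cycle."""
        self.trace_id = str(uuid.uuid4())
        return self.trace_id

    def stamp(self, event_type: str) -> dict:
        """Attach the current IDs and the next turn number to an event."""
        return {
            "event_type": event_type,
            "session_id": self.session_id,
            "trace_id": self.trace_id,
            "turn_id": next(self._turn_counter),
        }


ids = SessionIds()
ids.new_trace()
first_cycle = [ids.stamp("[Agent] User Message"), ids.stamp("[Agent] AI Response")]
ids.new_trace()
second_cycle = [ids.stamp("[Agent] User Message"), ids.stamp("[Agent] AI Response")]
```

Events in the same cycle share a `trace_id`, every event shares the `session_id`, and `turn_id` keeps increasing across cycles.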

### Events at a Glance

The SDK produces 8 event types. When `content_mode="full"`, Amplitude's server adds 3 more per session automatically.

| Event | What it captures |
|-------|-----------------|
| `[Agent] User Message` | User's input, attachments (file uploads), regeneration/edit signals |
| `[Agent] AI Response` | Model output, tokens, cost, latency, reasoning, system prompt, model config, copy signal |
| `[Agent] Tool Call` | Function/tool invocation by the AI |
| `[Agent] Embedding` | Vector embedding operation |
| `[Agent] Span` | Any pipeline step (search, rerank, guardrails) |
| `[Agent] Session End` | Explicit session close, abandonment tracking |
| `[Agent] Session Enrichment` | Your structured labels (topics, rubrics, outcomes) |
| `[Agent] Score` | Quality signal on a message or session (user, automated, or reviewer) |
| *Server-side (automatic when content_mode="full"):* | |
| `[Agent] Session Evaluation` | Session summary with outcome and behavioral flags |
| `[Agent] Topic Classification` | Category label per configured topic model |
| `[Agent] Score` (automated) | Rubric score per configured rubric |

See [Event Schema](#event-schema) for the full property reference. Every tracking method returns a unique ID (`message_id`, `invocation_id`, or `span_id`) that you can use to link related events into a graph. See [Event Linking](#event-linking) for the full table and code examples.

### What You Actually Get

Every SDK tracking call produces a standard Amplitude event. Here's what they look like in practice. These are the events you query in charts, cohorts, and funnels. Notice that `user_id` is the same one your product events use.

**A `[Agent] AI Response` event** (SDK-emitted, immediate):

```json
{
  "event_type": "[Agent] AI Response",
  "user_id": "user-42",
  "event_properties": {
    "[Agent] Session ID": "sess-abc-123",
    "[Agent] Trace ID": "trace-7f3a",
    "[Agent] Turn ID": 2,
    "[Agent] Message ID": "msg-9e2f-4a1b",
    "[Agent] Model Name": "claude-sonnet-4-20250514",
    "[Agent] Provider": "anthropic",
    "[Agent] Latency Ms": 1240.5,
    "[Agent] TTFB Ms": 89.3,
    "[Agent] Input Tokens": 4850,
    "[Agent] Output Tokens": 312,
    "[Agent] Cache Read Tokens": 4200,
    "[Agent] Cost USD": 0.0019,
    "[Agent] Finish Reason": "end_turn",
    "[Agent] Is Streaming": true,
    "[Agent] Temperature": 0.7,
    "[Agent] Agent ID": "support-bot",
    "[Agent] Agent Version": "v4.2",
    "[Agent] Env": "production",
    "[Agent] Context": "{\"experiment_variant\": \"prompt-v2\", \"surface\": \"chat\"}",
    "[Agent] Was Copied": true,
    "[Agent] Is Error": false,
    "[Agent] Component Type": "llm",
    "[Agent] SDK Version": "0.3.0",
    "[Agent] Runtime": "python"
  }
}
```

**A `[Agent] Session Evaluation` event** (server-generated after session closes, when `content_mode="full"`):

```json
{
  "event_type": "[Agent] Session Evaluation",
  "user_id": "user-42",
  "event_properties": {
    "[Agent] Session ID": "sess-abc-123",
    "[Agent] Overall Outcome": "response_provided",
    "[Agent] Turn Count": 4,
    "[Agent] Has Task Failure": false,
    "[Agent] Has Negative Feedback": false,
    "[Agent] Has Technical Failure": false,
    "[Agent] Behavioral Patterns": ["multi_turn_refinement"],
    "[Agent] Agent Chain Depth": 1,
    "[Agent] Models Used": ["claude-sonnet-4-20250514"],
    "[Agent] Session Cost USD": 0.0087,
    "[Agent] Evaluation Source": "ai",
    "[Agent] Taxonomy Version": "2.0"
  }
}
```

The first event is queryable immediately. The second appears within minutes of session close. Both carry the same `user_id` and `session_id`. Build a cohort from Session Evaluation properties (e.g., `Has Task Failure = true`) and measure that cohort's 7-day retention using your existing product events.
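
As a rough illustration of the cohort logic described above, the sketch below filters exported events for sessions flagged as task failures and collects their `user_id`s. In practice you would build this cohort in the Amplitude UI. The event list is made-up sample data shaped like the JSON above, and `users_with_task_failure` is a hypothetical helper, not part of the SDK.

```python
from typing import Any

# Made-up sample events, shaped like the JSON examples above.
events: list[dict[str, Any]] = [
    {
        "event_type": "[Agent] Session Evaluation",
        "user_id": "user-42",
        "event_properties": {"[Agent] Session ID": "sess-abc-123",
                             "[Agent] Has Task Failure": False},
    },
    {
        "event_type": "[Agent] Session Evaluation",
        "user_id": "user-77",
        "event_properties": {"[Agent] Session ID": "sess-def-456",
                             "[Agent] Has Task Failure": True},
    },
    {
        "event_type": "[Agent] AI Response",
        "user_id": "user-77",
        "event_properties": {"[Agent] Session ID": "sess-def-456"},
    },
]


def users_with_task_failure(events: list[dict[str, Any]]) -> set[str]:
    """Return user_ids with at least one session flagged as a task failure."""
    return {
        e["user_id"]
        for e in events
        if e["event_type"] == "[Agent] Session Evaluation"
        and e["event_properties"].get("[Agent] Has Task Failure") is True
    }


cohort = users_with_task_failure(events)
print(cohort)
```

Because the cohort is keyed by the same `user_id` as your product events, it can be joined against any existing retention or funnel analysis.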

### Data Flow

The SDK uses **composition**: it wraps an `Amplitude` instance rather than subclassing it. When `amplitude` is passed in, the SDK shares your existing event pipeline with no duplicate queues. It never opens its own network connections:

```
Your Code                        amplitude-ai SDK                  Amplitude
─────────────────────────────────────────────────────────────────────────────
                                 ┌──────────────────┐
ai.track_ai_message(...)  ────→  │   AmplitudeAI    │
                                 │  - Apply privacy │
                                 │  - Build event   │
                                 └────────┬─────────┘
                                          ▼
                                 ┌──────────────────┐
                                 │    Amplitude     │  ────→  Amplitude API
                                 │  (your instance) │         (US or EU)
                                 └──────────────────┘
```

Privacy controls (content mode, PII redaction) are applied _before_ events leave your process. Content is never sent unfiltered and then redacted server-side.
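
To make the ordering concrete, here is a minimal sketch of client-side filtering: content is scrubbed (or dropped) while the event is being built, so nothing unfiltered is ever handed to the transport. The regex, the `build_event` helper, and the `content` property name are illustrative assumptions, not the SDK's actual implementation.

```python
import re

# Illustrative pattern only; the SDK's built-in redaction rules are not shown here.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def build_event(content: str, content_mode: str, redact_pii: bool) -> dict:
    """Build an event payload, applying privacy rules before anything is sent."""
    properties = {"[Agent] Session ID": "sess-abc-123"}
    if content_mode == "full":
        text = EMAIL_RE.sub("[REDACTED]", content) if redact_pii else content
        properties["content"] = text  # hypothetical property name
    # In metadata_only / customer_enriched mode, content is never added.
    return {"event_type": "[Agent] User Message", "event_properties": properties}


full = build_event("Mail me at jo@example.com", "full", redact_pii=True)
meta = build_event("Mail me at jo@example.com", "metadata_only", redact_pii=True)
```

In this sketch the `metadata_only` payload simply has no content field at all, which is the property the paragraph above describes: there is nothing left for a server to redact.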

---

## Integration Patterns

The SDK supports three integration patterns. Pick the one that matches your architecture.

### Pattern A: Single-Request Handler

**Use when:** Your session starts and ends in a single code path (Lambda functions, synchronous API endpoints, CLI tools).

```python
agent = ai.agent("support-bot", env="production")

with agent.session(user_id="user-1") as s:
    s.new_trace()
    s.track_user_message(content="What is retention?")
    ai_msg = s.track_ai_message(
        content="Retention measures...",
        model="gpt-4o",
        provider="openai",
        latency_ms=350.0,
        input_tokens=50,
        output_tokens=200,
    )
    s.score(name="helpful", value=1.0, target_id=ai_msg)
# Session auto-ends here -- track_session_end() called on __exit__
# Server-side enrichment kicks in (when content_mode="full")
```

The `Session` context manager auto-generates a `session_id`, handles `track_session_end()` on exit (even on exception), and publishes session context into Python's `contextvars` so provider wrappers and the OTEL bridge inherit `session_id`, `trace_id`, `agent_id`, and `turn_id` automatically.

### Pattern B: Long-Lived Conversation

**Use when:** Your session spans multiple HTTP requests, WebSocket messages, or Slack interactions. This is the most common real-world pattern for chatbots and conversational agents.

Use `BoundAgent` directly, NOT the `Session` context manager, because the session outlives any single code path.

```python
agent = ai.agent(
    "support-bot",
    user_id="user-1",
    env="production",
    session_id="thread-abc",
)

# --- Request 1: user sends a message ---
agent.track_user_message(content="What is retention?", trace_id="req-1", turn_id=1)
ai_msg = agent.track_ai_message(
    content="Retention measures...",
    model="gpt-4o",
    provider="openai",
    latency_ms=350.0,
    trace_id="req-1",
    turn_id=2,
)

# --- Request 2: user follows up ---
agent.track_user_message(content="Show me an example", trace_id="req-2", turn_id=1)
agent.track_ai_message(
    content="Here's a retention chart...",
    model="gpt-4o",
    provider="openai",
    latency_ms=200.0,
    trace_id="req-2",
    turn_id=2,
)

# --- When conversation truly ends (or rely on server-side 30min idle timeout) ---
agent.track_session_end()
```

Key differences from Pattern A:
- `BoundAgent` carries all context (`user_id`, `agent_id`, `session_id`, `env`, etc.) without a `with` block
- Generate a new `trace_id` for each user-message-to-AI-response cycle
- Pass explicit `turn_id` values (the example above numbers turns per trace), or omit them to auto-increment per session
- Call `track_session_end()` explicitly when the conversation ends, or let the server auto-close after 30 minutes of inactivity
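
One way to manage these IDs across requests is to keep a small piece of per-conversation state next to your thread record. The sketch below is a hypothetical helper (`ConversationState` is not part of the SDK) that mints a fresh UUID `trace_id` for each user message and numbers turns within that trace, matching the per-trace numbering used in the example above.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ConversationState:
    """Hypothetical per-thread state you would persist between requests."""
    session_id: str
    trace_id: str = ""
    _turn: int = field(default=0, repr=False)

    def start_cycle(self) -> str:
        """Call once per incoming user message: new trace, turn counter reset."""
        self.trace_id = str(uuid.uuid4())
        self._turn = 0
        return self.trace_id

    def next_turn(self) -> int:
        """Return the next turn_id within the current trace (1, 2, ...)."""
        self._turn += 1
        return self._turn


state = ConversationState(session_id="thread-abc")

first_trace = state.start_cycle()
user_turn, ai_turn = state.next_turn(), state.next_turn()

second_trace = state.start_cycle()
follow_up_turn = state.next_turn()  # numbering restarts for the new trace
```

The resulting values would be passed as `trace_id=` and `turn_id=` to `track_user_message()` and `track_ai_message()`, as in the example above.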

### Pattern C: Multi-Agent Orchestrator

**Use when:** Multiple agents collaborate on a task. `agent.child()` creates a sub-agent that inherits session context and auto-sets `parent_agent_id`.

```python
orchestrator = ai.agent("orchestrator", env="production")

with orchestrator.session(user_id="u1") as s:
    s.track_user_message(content="Compare our pricing to competitors")

    # Fan out to sub-agents
    researcher = orchestrator.child("researcher")
    writer = orchestrator.child("writer")
    # researcher.parent_agent_id == "orchestrator" (automatic)
    # researcher inherits user_id, env, session_id, trace_id, groups

    researcher.track_tool_call(
        tool_name="web_search", latency_ms=500, success=True,
        session_id=s.session_id, trace_id=s.trace_id,
    )

    # Fan in: orchestrator synthesizes results
    s.track_ai_message(
        content="Based on research...",
        model="gpt-4o",
        provider="openai",
        latency_ms=500,
    )
```

See [Multi-Agent Patterns](#multi-agent-patterns) for more examples (linear chains, fan-out/fan-in, dynamic routing).

### Which API Should I Use?

```
Want automatic LLM call tracking?
  YES --> Provider Wrappers (OpenAI, Anthropic, etc.)
          + BoundAgent for session/agent context
  NO  --> Manual track_*() calls

Want to track a Python function as a tool call automatically?
  YES --> @tool decorator (zero boilerplate)
  NO  --> track_tool_call() manually

Want to track a function as a span (pipeline step, retriever, etc.)?
  YES --> @observe decorator (auto session lifecycle)
  NO  --> track_span() manually

Session contained in one code path (Lambda, sync handler)?
  YES --> agent.session() context manager  (Pattern A)
  NO  --> BoundAgent directly + explicit track_session_end()  (Pattern B)

Multiple agents collaborating?
  YES --> agent.child() for sub-agents  (Pattern C)
  NO  --> Single BoundAgent is sufficient
```

---

## Going Deeper

### Privacy & Content Control

Three tiers control who does the enrichment and what data leaves your environment:

| Mode | What you send | Who enriches | Best for |
|------|--------------|-------------|----------|
| `full` | Content + metrics | **Amplitude** automatically classifies every session: topic models, quality rubrics, behavioral flags, outcomes | Maximum insight, zero eval code, works out of the box |
| `metadata_only` | Metrics only (no content) | Nobody | Strict environments where no conversation text can leave your infrastructure |
| `customer_enriched` | Your labels + metrics | **You** run your own classifiers, send structured labels via `track_session_enrichment()` | Teams in regulated industries who want full analytics value and full data control |

> **This is a control gradient, not a quality gradient.** For the dimensions you label yourself (topics, quality scores, outcomes), `customer_enriched` gives the same analytics output as `full`. The difference is who runs the enrichment. In `full` mode, Amplitude does it for you. In `customer_enriched` mode, you do it yourself and send structured labels. For those dimensions, the result in your charts, cohorts, and funnels is the same; server-detected task failure and behavioral patterns remain `full`-only.

For teams in regulated industries or with strict data residency requirements, `customer_enriched` is the recommended path: you get full analytics value without sending any conversation content to Amplitude.

The table below shows what analytics patterns each tier enables:

| Analytics pattern | `full` | `metadata_only` | `customer_enriched` |
|---|---|---|---|
| Cohort by topic | Yes | No | Yes (your labels) |
| Cohort by task failure | Yes | No | No |
| Cohort by quality score | Yes | No | Yes (your scores) |
| Retention by AI engagement | Yes | Yes | Yes |
| Behavioral pattern detection (retry_storm, etc.) | Yes | No | No |
| Cost analytics | Yes | Yes | Yes |

In `full` mode, message content is automatically chunked to fit within Amplitude's per-property size limits (up to 8KB per message, with balanced head/tail truncation for longer content). See [Content Storage](#content-storage-llm_message-chunking) for details.
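
For intuition, balanced head/tail truncation can be sketched as follows. This is an illustration under stated assumptions: the 8 KB limit comes from the text above, but the marker string, the even head/tail split, and the byte-based measurement are guesses rather than the SDK's actual algorithm.

```python
MAX_BYTES = 8 * 1024              # limit quoted above
MARKER = "\n...[truncated]...\n"  # hypothetical marker


def truncate_balanced(text: str, max_bytes: int = MAX_BYTES) -> str:
    """Keep the start and end of oversized text, dropping the middle."""
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text
    budget = max_bytes - len(MARKER.encode("utf-8"))
    head_len = budget // 2
    tail_len = budget - head_len
    # errors="ignore" drops any multi-byte character split at a cut point.
    head = raw[:head_len].decode("utf-8", errors="ignore")
    tail = raw[-tail_len:].decode("utf-8", errors="ignore")
    return head + MARKER + tail


long_text = "A" * 6000 + "B" * 6000
short = truncate_balanced(long_text)
```

Keeping both ends preserves the opening of a message (usually the question or instruction) and its conclusion, which tend to matter most for later classification.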

```python
from amplitude_ai import AmplitudeAI, AIConfig, ContentMode

# Full (default) -- raw content, server enrichments enabled
ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(content_mode=ContentMode.FULL))

# Metadata only -- no content at all
ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(content_mode=ContentMode.METADATA_ONLY))

# Customer enriched -- you provide your own classifications
ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(content_mode=ContentMode.CUSTOMER_ENRICHED))

# PII redaction (works with any mode -- strips emails, phone numbers, credit cards, SSNs)
ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(redact_pii=True))
```

**`AIConfig` options (complete surface):**

| Name | Type | Default | Description |
|------|------|---------|-------------|
| `content_mode` | `ContentMode` | `FULL` | Privacy tier. See above. |
| `redact_pii` | `bool` | `False` | Scrub emails, phone numbers, credit cards, SSNs before sending. |
| `custom_redaction_patterns` | `list[str]` | `[]` | Additional regex patterns to redact when `redact_pii=True`. |
| `debug` | `bool` | `False` | Print colored one-line event summaries to stderr. See [Developer Experience](#developer-experience). |
| `dry_run` | `bool` | `False` | Validate and print events without sending to Amplitude. |
| `validate` | `bool` | `False` | Raise `ValidationError` on bad inputs instead of silently continuing. |
| `on_event_callback` | `Callable` | `None` | Per-event delivery callback: `(event, status_code, message) -> None`. |

**OTEL bridge privacy (two-gate model):** If you're using the [OTEL GenAI Bridge](#otel-genai-bridge-reference), the OTEL GenAI spec marks message content as Opt-In. Our `PrivacyConfig` acts as a second gate, so you control exactly what reaches Amplitude regardless of what your OTEL source captures:

| Customer intent | OTEL tool setting | Amplitude PrivacyConfig | Result |
|---|---|---|---|
| Maximum insight | Content capture ON | `full` | Content + server enrichments in Amplitude |
| No conversation text in Amplitude | Content capture ON | `metadata_only` | Amplitude receives model, tokens, cost, latency (no message text) |
| No content anywhere | Content capture OFF | any | No content in the span to begin with |
| Own classifications | Content capture OFF | `customer_enriched` + `track_session_enrichment()` | Your structured labels in Amplitude, no raw content |
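
The two gates combine as a logical AND. The helper below is purely illustrative (it is not an SDK function) and encodes the rows of the table above:

```python
def raw_content_reaches_amplitude(otel_content_capture: bool, content_mode: str) -> bool:
    """Both gates must be open for message text to arrive in Amplitude."""
    gate_1 = otel_content_capture    # does the OTEL source record content at all?
    gate_2 = content_mode == "full"  # does the Amplitude-side config allow content?
    return gate_1 and gate_2


# Rows of the table, as (OTEL capture, content mode) -> content in Amplitude?
maximum_insight = raw_content_reaches_amplitude(True, "full")
no_text_in_amplitude = raw_content_reaches_amplitude(True, "metadata_only")
no_content_anywhere = raw_content_reaches_amplitude(False, "full")
own_classifications = raw_content_reaches_amplitude(False, "customer_enriched")
```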

---

### Bound Agents & Sessions

**Bound Agents.** `ai.agent()` creates a pre-configured handle that carries context fields so you never repeat them:

```python
agent = ai.agent(
    "support-bot",
    user_id="user-1",
    agent_version="v4.2",
    env="production",
    context={
        "experiment_variant": "prompt-v2-treatment",
        "prompt_revision": "abc123",
    },
)

# Every call inherits user_id, agent_id, agent_version, env, context
msg = agent.track_user_message(content="How do I set up a funnel?", session_id="s1")
ai_msg = agent.track_ai_message(
    content="To create a funnel...",
    session_id="s1",
    model="gpt-4o",
    provider="openai",
    latency_ms=450.0,
    input_tokens=120,
    output_tokens=340,
)
agent.score(name="user-feedback", value=1.0, target_id=ai_msg)
```

Explicit kwargs always override bound defaults:

```python
# Uses "override-agent" for this call only, not "support-bot"
agent.track_ai_message(agent_id="override-agent", ...)
```

**Child Agents.** For multi-agent orchestration, `child()` creates a new handle that inherits `env`, `session_id`, `trace_id`, and `groups` from the parent. It automatically sets `parent_agent_id`:

```python
orchestrator = ai.agent("orchestrator", env="production")

researcher = orchestrator.child("researcher")
# researcher.agent_id = "researcher"
# researcher.parent_agent_id = "orchestrator"  (automatic)
# researcher.env = "production"                 (inherited)

executor = researcher.child("executor")
# executor.parent_agent_id = "researcher"       (chains correctly)
```

**Session Context Manager.** `agent.session()` returns a context manager that auto-calls `track_session_end` when the block exits (even on exception):

```python
agent = ai.agent("support-bot", env="prod")

with agent.session(user_id="u1") as s:
    s.new_trace()                          # auto-generate trace_id (UUID)
    msg = s.track_user_message(content="How do I set up a funnel?")
    ai_msg = s.track_ai_message(
        content="To create a funnel...",
        model="gpt-4o",
        provider="openai",
        latency_ms=450.0,
    )
    s.score(name="user-feedback", value=1.0, target_id=ai_msg)
# session auto-ended here -- track_session_end() called automatically for the auto-generated session_id
```

Switch traces mid-session, set enrichments, auto-generate session IDs:

```python
from amplitude_ai import SessionEnrichments

with agent.session() as s:                     # auto-generated UUID session_id
    t1 = s.new_trace()
    s.track_user_message(content="First question")
    s.track_ai_message(content="Answer 1", model="gpt-4o", provider="openai", latency_ms=200)

    t2 = s.new_trace()                         # new trace for a follow-up
    s.track_user_message(content="Follow-up question")
    s.track_ai_message(content="Answer 2", model="gpt-4o", provider="openai", latency_ms=150)

    s.set_enrichments(SessionEnrichments(overall_outcome="response_provided"))
# session auto-ended with enrichments
```

Works with async too:

```python
async with agent.session("sess-1") as s:
    s.new_trace()
    ...
```

How it works: `Session.__enter__()` publishes a `ContextVar` with the active session/agent context. Provider wrappers and the OTEL bridge read this `ContextVar` and auto-fill any missing fields. `Session.__exit__()` restores the previous context. This is the same pattern used by OpenTelemetry and works correctly with threads and `asyncio`.
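
The set-and-restore mechanism can be sketched with the standard library alone. Everything below (`_active_session`, `DemoSession`, `wrapped_llm_call`) is a hypothetical stand-in used to show the pattern; it is not the SDK's internal code.

```python
import contextvars
from typing import Optional

# Hypothetical stand-in for the SDK's internal context variable.
_active_session: contextvars.ContextVar[Optional[dict]] = contextvars.ContextVar(
    "active_session", default=None
)


class DemoSession:
    """Toy context manager showing the set/reset token pattern."""

    def __init__(self, session_id: str, agent_id: str) -> None:
        self._ctx = {"session_id": session_id, "agent_id": agent_id}
        self._token: Optional[contextvars.Token] = None

    def __enter__(self) -> "DemoSession":
        self._token = _active_session.set(self._ctx)  # publish
        return self

    def __exit__(self, *exc) -> None:
        _active_session.reset(self._token)            # restore previous value


def wrapped_llm_call(session_id: Optional[str] = None) -> Optional[str]:
    """Like a provider wrapper: fill in session_id from context when omitted."""
    ctx = _active_session.get()
    if session_id is None and ctx is not None:
        session_id = ctx["session_id"]
    return session_id


with DemoSession("sess-1", "support-bot"):
    inside = wrapped_llm_call()   # picks up "sess-1" from the context
outside = wrapped_llm_call()      # no active session, stays None
```

Resetting with the token (rather than setting the variable back to `None`) is what makes nested sessions restore the outer context correctly.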

---

### Context Dict Conventions

The `context` parameter on `ai.agent()` accepts an arbitrary `dict[str, Any]` that is JSON-serialized and attached to every event as `[Agent] Context`. This is the recommended way to add segmentation dimensions without requiring new global properties.

**Recommended keys:**

| Key | Example Values | Use Case |
|-----|---------------|----------|
| `agent_type` | `"planner"`, `"executor"`, `"retriever"`, `"router"`, `"evaluator"` | Filter/group analytics by agent role in multi-agent systems. Build charts like "latency by agent type" or "error rate by agent role." |
| `experiment_variant` | `"control"`, `"treatment-v2"`, `"prompt-rewrite-a"` | Segment AI sessions by A/B test variant. Compare quality scores, abandonment rates, or cost across experiment arms. See note below. |
| `feature_flag` | `"new-rag-pipeline"`, `"reasoning-model-enabled"` | Track which feature flags were active during the session. Correlate flag states with quality regressions. |
| `surface` | `"chat"`, `"search"`, `"copilot"`, `"email-draft"` | Identify which UI surface or product area triggered the AI interaction. Build per-surface quality dashboards. |
| `prompt_revision` | `"v7"`, `"abc123"`, `"2026-02-15"` | Track which prompt version was used. Detect prompt regression when combined with `agent_version`. |
| `deployment_region` | `"us-east-1"`, `"eu-west-1"` | Segment by deployment region for latency analysis or compliance tracking. |
| `canary_group` | `"canary"`, `"stable"` | Identify canary vs. stable deployments for progressive rollout monitoring. |

**Example:**

```python
agent = ai.agent(
    "support-bot",
    user_id="u1",
    agent_version="4.2.0",
    context={
        "agent_type": "executor",
        "experiment_variant": "reasoning-enabled",
        "surface": "chat",
        "feature_flag": "new-rag-pipeline",
    },
)

# All events from this agent (and its sessions, child agents, and provider
# wrappers) will include [Agent] Context with these keys.
```

**Context merging in child agents:**

```python
parent = ai.agent("orchestrator", context={"experiment_variant": "treatment", "surface": "chat"})
child = parent.child("researcher", context={"agent_type": "retriever"})
# child.context == {"experiment_variant": "treatment", "surface": "chat", "agent_type": "retriever"}
# Child keys override parent keys; parent keys absent from the child are preserved.
```
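
The merge semantics described in the comments match a plain dictionary merge in which the child's keys win. A minimal sketch (the `merge_context` helper is hypothetical, not the SDK's code):

```python
def merge_context(parent: dict, child: dict) -> dict:
    """Child keys override parent keys; untouched parent keys are kept."""
    return {**parent, **child}


parent_ctx = {"experiment_variant": "treatment", "surface": "chat"}
merged = merge_context(parent_ctx, {"agent_type": "retriever", "surface": "search"})
```

Here `merged["surface"]` takes the child's value while `experiment_variant` is carried over from the parent, and `parent_ctx` itself is left unmodified.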

**Querying in Amplitude:** The `[Agent] Context` property is a JSON string. Use Amplitude's JSON property parsing to extract individual keys for charts, cohorts, and funnels. For example, group by `[Agent] Context.agent_type` to see metrics by agent role.
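
If you export events and process them outside Amplitude, remember that the property arrives as a string and must be decoded first. A small sketch using the sample value from the `[Agent] AI Response` event shown earlier:

```python
import json

event_properties = {
    "[Agent] Agent ID": "support-bot",
    "[Agent] Context": "{\"experiment_variant\": \"prompt-v2\", \"surface\": \"chat\"}",
}

# Decode the JSON string into a dict before reading individual keys.
context = json.loads(event_properties["[Agent] Context"])
variant = context.get("experiment_variant")
agent_type = context.get("agent_type")  # key was never set on this agent
```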

> **Note on `experiment_variant` and server-generated events:** Context keys appear on all SDK-emitted events (`[Agent] User Message`, `[Agent] AI Response`, etc.). Server-generated events (`[Agent] Session Evaluation`, `[Agent] Score` with `source="ai"`) do not yet inherit context keys. To segment server-generated quality scores by experiment arm, use Amplitude Derived Properties to extract from `[Agent] Context` on SDK events. First-class support is planned.

> **Why a dict instead of first-class fields?** Context is a dict for flexibility without schema migrations. Adding a new segmentation dimension takes one line of code, not a data catalog update. First-class properties exist for universal, stable dimensions (`agent_id`, `agent_version`, `env`). The context dict exists for customer-specific, evolving dimensions (`experiment_variant`, `feature_flag`, `prompt_revision`). Adding dedicated event properties for each dimension would consume global property slots, which are limited per organization. If usage patterns converge and the Amplitude product builds dedicated chart support for specific keys, they can be promoted to first-class fields later.

---

### Multi-Agent Patterns

The SDK supports multi-agent orchestration via `BoundAgent.child()` and `parent_agent_id`. Here are common patterns:

**Pattern 1: Linear delegation chain**

A simple pipeline where each agent hands off to the next:

```python
orchestrator = ai.agent("orchestrator", env="production")

with orchestrator.session(user_id="u1") as s:
    # Orchestrator decides to delegate to researcher
    researcher = orchestrator.child("researcher")
    with researcher.session(session_id=s.session_id) as rs:
        rs.track_user_message(content="Find pricing info")
        rs.track_ai_message(content="Found 3 articles...", model="gpt-4o",
                            provider="openai", latency_ms=200)

    # Researcher done, orchestrator delegates to writer
    writer = orchestrator.child("writer")
    with writer.session(session_id=s.session_id) as ws:
        ws.track_ai_message(content="Here is a summary...", model="gpt-4o",
                            provider="openai", latency_ms=300)

# Event order: orchestrator -> researcher -> writer
# Each sub-agent's events carry its own agent_id and parent_agent_id="orchestrator"
```

**Pattern 2: Fan-out / fan-in**

An orchestrator dispatches multiple sub-agents in parallel:

```python
orchestrator = ai.agent("orchestrator", context={"agent_type": "router"})

with orchestrator.session(user_id="u1") as s:
    s.track_user_message(content="Compare our pricing to competitors")

    # Fan out to parallel agents
    researcher_a = orchestrator.child("researcher-web", context={"agent_type": "retriever"})
    researcher_b = orchestrator.child("researcher-db", context={"agent_type": "retriever"})

    # Both share the same session and parent_agent_id="orchestrator"
    # Run in parallel (via asyncio, threads, etc.)
    # ...

    # Fan in: orchestrator synthesizes results
    s.track_ai_message(content="Based on research...", model="gpt-4o",
                       provider="openai", latency_ms=500)
```

**Pattern 3: Dynamic routing**

A router agent selects from a pool of specialist agents at runtime:

```python
router = ai.agent("router", context={"agent_type": "router"})

with router.session(user_id="u1") as s:
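    user_text = "I need a refund"
    s.track_user_message(content=user_text)

    # Router decides based on intent.
    # classify_intent() is your own classifier; it takes the message text,
    # not the message_id returned by track_user_message().
    intent = classify_intent(user_text)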
    specialist = router.child(f"specialist-{intent}", context={"agent_type": "executor"})

    with specialist.session(session_id=s.session_id) as ss:
        ss.track_ai_message(content="I can help with your refund...",
                            model="gpt-4o", provider="openai", latency_ms=400)
```

**Analytics this enables:**

- **Per-agent quality scores:** Filter `[Agent] Score` by `[Agent] Agent ID` to see which agents produce high-quality responses and which don't, across user feedback, automated evals, and server-generated rubric scores.
- **Cost attribution:** Group cost by `[Agent] Agent ID` to see which sub-agent is expensive relative to its quality contribution. Find the agent that accounts for 60% of token spend but only 20% of task completions.
- **Failure attribution:** When a multi-agent chain produces a bad outcome, per-agent quality scores help identify which agent introduced the failure. Filter `[Agent] Session Evaluation` events where `[Agent] Has Task Failure` is `true`, then drill into individual agent scores.
- **Handoff analysis:** Build funnels across agent boundaries using `[Agent] Parent Agent ID`: "orchestrator dispatches → researcher completes → writer delivers." Measure conversion and drop-off at each handoff.
- **Role-based dashboards:** Use `[Agent] Context.agent_type` (see [Context Dict Conventions](#context-dict-conventions)) to compare latency, error rate, and cost across agent roles (router, retriever, executor).
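
As an example of the cost-attribution idea above, the sketch below groups cost by `[Agent] Agent ID` over a handful of rows. The rows are made-up stand-ins for exported `[Agent] AI Response` events; inside Amplitude you would get the same breakdown with a group-by in a chart.

```python
from collections import defaultdict

# Made-up rows standing in for exported [Agent] AI Response events.
rows = [
    {"[Agent] Agent ID": "orchestrator", "[Agent] Cost USD": 0.0020},
    {"[Agent] Agent ID": "researcher",   "[Agent] Cost USD": 0.0060},
    {"[Agent] Agent ID": "researcher",   "[Agent] Cost USD": 0.0030},
    {"[Agent] Agent ID": "writer",       "[Agent] Cost USD": 0.0010},
]

# Sum cost per agent.
cost_by_agent: dict[str, float] = defaultdict(float)
for row in rows:
    cost_by_agent[row["[Agent] Agent ID"]] += row["[Agent] Cost USD"]

# Convert to each agent's share of total spend.
total = sum(cost_by_agent.values())
share_by_agent = {agent: cost / total for agent, cost in cost_by_agent.items()}
top_spender = max(share_by_agent, key=share_by_agent.get)
```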

---

### Event Linking

Every tracking call returns a unique ID. Use these IDs to wire events into a graph:

| Method | Returns | ID Name |
|--------|---------|---------|
| `track_user_message()` | `str` | `message_id` |
| `track_ai_message()` | `str` | `message_id` |
| `track_tool_call()` | `str` | `invocation_id` |
| `track_embedding()` | `str` | `span_id` |
| `track_span()` | `str` | `span_id` |

Link events together:

```python
agent = ai.agent("support-bot", env="prod")

with agent.session(user_id="u1") as s:
    s.new_trace()

    # 1. User asks a question
    msg = s.track_user_message(content="Explain funnels")

    # 2. AI decides to call a tool -- link to the user message
    tool_inv = s.track_tool_call(
        tool_name="search_docs",
        latency_ms=85.0,
        success=True,
        parent_message_id=msg,              # ← links tool call to the user message
    )

    # 3. AI responds
    ai_msg = s.track_ai_message(
        content="A funnel measures conversion...",
        model="gpt-4o",
        provider="openai",
        latency_ms=450.0,
    )

    # 4. Score the AI response
    s.score(name="user-feedback", value=1.0, target_id=ai_msg)  # ← links score to AI response

    # 5. Nested spans for pipeline operations
    parent_span = s.track_span(span_name="rag_pipeline", latency_ms=200.0)
    child_span = s.track_span(
        span_name="vector_search",
        latency_ms=50.0,
        parent_span_id=parent_span,         # ← links span to parent
    )
```
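
Once events carry these links, the graph can be reconstructed downstream. The sketch below assumes you have already extracted `span_id` / `parent_span_id` pairs from exported events; the dictionary keys and sample IDs are placeholders, not the actual event property names.

```python
from collections import defaultdict

# Placeholder IDs; real ones are the strings returned by track_span().
spans = [
    {"span_id": "span-rag", "name": "rag_pipeline", "parent_span_id": None},
    {"span_id": "span-vec", "name": "vector_search", "parent_span_id": "span-rag"},
    {"span_id": "span-rerank", "name": "rerank", "parent_span_id": "span-rag"},
]

# Index spans by their parent so children can be looked up directly.
children = defaultdict(list)
for span in spans:
    children[span["parent_span_id"]].append(span)


def render(parent_id=None, depth=0):
    """Return indented span names, depth-first from the roots."""
    lines = []
    for span in children.get(parent_id, []):
        lines.append("  " * depth + span["name"])
        lines.extend(render(span["span_id"], depth + 1))
    return lines


tree = render()
print("\n".join(tree))
```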

---

### Scoring

Attach quality signals to any message or session. Scores cover user feedback, AI evals, and human reviews; use `source` to distinguish the origin (`"user"`, `"ai"`, `"reviewer"`).

```python
# User feedback (thumbs up/down on a specific response)
ai.score(user_id="user-1", name="user-feedback", value=1.0,
         target_id=ai_msg_id, target_type="message", source="user")

# Automated evaluation (LLM-as-judge)
ai.score(user_id="user-1", name="accuracy", value=0.92,
         target_id=ai_msg_id, source="ai", comment="Matches ground truth")

# Human review (internal review queue, RLHF labeling)
ai.score(user_id="reviewer-1", name="groundedness", value=0.8,
         target_id=ai_msg_id, source="reviewer", comment="Minor hallucination in step 3")

# Session-level rating
ai.score(user_id="user-1", name="csat", value=4.0,
         target_id="sess-1", target_type="session", source="user")
```

Common scoring patterns:

| Use Case | Example |
|----------|---------|
| User thumbs up/down | `score(name="user-feedback", value=1, target_type="message", source="user")` |
| Star rating (1-5) | `score(name="user-rating", value=4, target_type="message", source="user")` |
| LLM-as-judge eval | `score(name="accuracy", value=0.92, target_type="message", source="ai")` |
| Human reviewer | `score(name="quality", value=0.8, target_type="message", source="reviewer")` |
| Session-level CSAT | `score(name="csat", value=4, target_type="session", source="user")` |
| Server rubric score | Emitted automatically by enrichment pipeline with `source="ai"` for each configured rubric |

Each `score()` call produces one `[Agent] Score` event, the same event type the server enrichment pipeline emits (with `source="ai"`) for each configured rubric.

> **All quality signals in one event type.** User feedback (`source="user"`), human reviewer annotations (`source="reviewer"`), and automated rubric scores from the enrichment pipeline (`source="ai"`) all produce `[Agent] Score` events. A single chart shows all three side by side. No joins, no separate tables. Filter by `[Agent] Evaluation Source` to compare signal types. Filter by `[Agent] Agent ID` for per-agent quality attribution.

---

### Labeling and Tagging Messages

Attach custom key-value labels to any message event for filtering and segmentation in Amplitude. Labels are flexible; use whatever keys make sense for your product.

**Common use cases:**

- **Routing tags**: `flow`, `surface`, `experiment_variant`. Segment by where the message originated.
- **Classifier output**: `intent`, `sentiment`, `toxicity`. Attach ML classifier results with confidence scores.
- **Business context**: `tier`, `plan`, `feature_area`. Slice by customer attributes.

#### Inline Labels (at tracking time)

Pass `labels` when you already know the tags at tracking time:

```python
from amplitude_ai import MessageLabel

# Custom tags -- no confidence needed
msg_id = ai.track_user_message(
    user_id="user-1",
    content="How do I create a funnel?",
    session_id="sess-1",
    labels=[
        MessageLabel(key="flow", value="onboarding"),
        MessageLabel(key="surface", value="chat_widget"),
        MessageLabel(key="experiment", value="new_prompt_v2"),
    ],
)

# Classifier output -- include confidence scores
ai_msg_id = ai.track_ai_message(
    user_id="user-1",
    content="To create a funnel, go to...",
    session_id="sess-1",
    model="gpt-4o",
    provider="openai",
    latency_ms=300.0,
    labels=[
        MessageLabel(key="intent", value="how_to", confidence=0.94),
        MessageLabel(key="sentiment", value="neutral", confidence=0.88),
    ],
)
```

Labels are emitted as `[Agent] Message Labels` on the event. In Amplitude, filter or group by label key/value to build charts like "messages by intent" or "sessions where flow=onboarding".

#### Retrospective Labels (after the session)

When classifier results arrive after the session ends (e.g., from a background pipeline), attach them via `SessionEnrichments.message_labels`, keyed by the `message_id` returned from tracking calls:

```python
from amplitude_ai import SessionEnrichments, MessageLabel

enrichments = SessionEnrichments(
    message_labels={
        msg_id: [
            MessageLabel(key="intent", value="how_to", confidence=0.94),
        ],
        ai_msg_id: [
            MessageLabel(key="quality", value="good", confidence=0.91),
        ],
    },
)
ai.track_session_enrichment(user_id="user-1", session_id="sess-1", enrichments=enrichments)
```

---

### Enrichments

Session enrichments attach structured classifications to a completed session: topic categories, rubric scores, outcome labels, and behavioral flags. They work differently depending on your privacy configuration:

**When `content_mode` is `"full"`**, Amplitude's enrichment pipeline runs automatically on every session after it closes. You get topic classifications, rubric scores, behavioral flags, and session outcomes without writing or maintaining any eval code. The pipeline classifies sessions across configurable dimensions:

| Category | Description | Configurable |
|----------|-------------|--------------|
| **Quality Scores** | Task completion, response quality, user satisfaction, agent confusion (0-1 scores with rationales) | Rubrics customizable per org |
| **Safety** | Toxicity detection, prompt injection detection, content policy violations | Custom policies per org |
| **Emotions** | User emotion classification with trajectory tracking | Custom emotion taxonomy per org |
| **Dialog Acts** | Conversation patterns: complaints, requests, apologies, completions | Default taxonomy provided |
| **Behavioral Patterns** | Anti-patterns: retry storms, clarification loops, early abandonment | Fixed taxonomy |

Three event types are produced per session:

| Event | What It Contains | Cardinality |
|-------|-----------------|-------------|
| `[Agent] Session Evaluation` | Session summary: outcome, turn count, boolean flags (`has_task_failure`, `has_negative_feedback`), metadata | 1 per session |
| `[Agent] Topic Classification` | Category label per topic model (e.g., `query_intent`, `product_area`, `error_domain`) | 1 per topic model per session |
| `[Agent] Score` (ai) | Rubric score with rationale (e.g., `task_completion: 0.85`), `source="ai"` | 1 per rubric per session |

**Configurability:** Topic models, rubric definitions, safety policies, and emotion taxonomies are configurable per organization. The categories in the table above are defaults, not fixed. Contact your Amplitude team to customize which dimensions are evaluated and what category values are used.

**When do enrichments run?** Enrichment runs **asynchronously** after the session closes, not inline with your SDK calls. A session closes when you call `track_session_end()`, or after 30 minutes of inactivity if you don't. Enrichment events typically appear within minutes of session close. Calling `track_session_end()` explicitly is recommended because it ensures timely enrichment and lets you attach `SessionEnrichments` in the same call.

**When `content_mode` is `"metadata_only"` or `"customer_enriched"`**, server-side enrichment is not available because the pipeline needs raw text to classify content. Use `customer_enriched` with `track_session_enrichment()` to bridge this gap: run your own classifier in your environment, then send structured labels (topics, rubric scores, outcomes) to Amplitude. No raw content leaves your environment, but you get the same analytics power (cohorts, funnels, retention segmented by session quality) as customers using `full` mode.
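
As a rough illustration of the "run your own classifier" step, the sketch below uses a keyword heuristic that runs entirely locally and returns only structured labels. The function name `classify_session`, the phrase list, and the label values are hypothetical placeholders, not part of the SDK; a real deployment would more likely use an LLM or a trained model.

```python
# Hypothetical local classifier: runs in your environment, returns labels only.
NEGATIVE_PHRASES = ("not helpful", "this is wrong", "doesn't work")  # assumption


def classify_session(user_messages: list[str]) -> dict:
    """Return structured labels; the raw text is not included in the result."""
    text = " ".join(m.lower() for m in user_messages)
    intent = "how_to" if ("how do i" in text or "how to" in text) else "other"
    negative_hits = [p for p in NEGATIVE_PHRASES if p in text]
    return {
        "query_intent": intent,                      # could feed TopicClassification(l1=...)
        "has_negative_feedback": bool(negative_hits),
        "negative_feedback_count": len(negative_hits),
    }


labels = classify_session(["How do I create a funnel?", "That's not helpful."])
```

The returned values are plain Python types, so they can be passed into `TopicClassification` and `SessionEnrichments` before calling `track_session_enrichment()`.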

#### Defining Your Taxonomy

The topic model names, rubric names, and category values are **yours to define**. The examples below use values from Amplitude's internal taxonomy as a reference, but you should use whatever categories make sense for your product and agents.

```python
from amplitude_ai import (
    SessionEnrichments, TopicClassification, RubricScore,
    EvidenceQuote, MessageLabel,
)

enrichments = SessionEnrichments(
    # Topic models -- categorical labels for your sessions
    topic_classifications={
        # Single-select: what was the user trying to do?
        "query_intent": TopicClassification(l1="quantitative_diagnostic"),

        # Multi-select: which product areas were involved?
        "product_area": TopicClassification(
            values=["charts", "cohorts"], primary="charts",
            topics_covered=["charts", "cohorts", "funnels"],
            outcomes_by_topic={"charts": "response_provided", "funnels": "abandoned"},
        ),

        # L2 subcategory for finer classification
        "error_domain": TopicClassification(l1="TAX", l2="WRONG_EVENT"),
    },

    # Rubrics -- scored evaluation dimensions (0.0 to 1.0)
    rubrics=[
        RubricScore(name="task_completion", score=0.85),
        RubricScore(
            name="response_quality", score=0.92,
            rationale="Clear and accurate",
            evidence=[
                EvidenceQuote(quote="Here is how to build a funnel...", turn_index=2, role="assistant"),
            ],
            improvement_opportunities="Could include a screenshot link",
        ),
    ],

    # Session outcome
    overall_outcome="response_provided",  # or "abandoned", "escalated", etc.

    # Session-level scores
    quality_score=0.88,
    sentiment_score=0.75,

    # Boolean flags for quick filtering
    has_task_failure=False,
    has_negative_feedback=False,

    # Failure detail (when has_task_failure=True)
    # task_failure_type="unable_to_complete",
    # task_failure_reason="Data source not connected",

    # Agent chain metadata (multi-agent flows)
    agent_chain=["router", "analytics-agent"],
    root_agent_name="router",

    # Request classification
    request_complexity="moderate",

    # Supplementary data
    error_categories=["timeout"],
    behavioral_patterns=["multi_turn_refinement"],
    custom_metadata={"deployment": "canary-v2"},

    # Retrospective message labels (keyed by message_id)
    message_labels={
        "msg-uuid-1": [MessageLabel(key="intent", value="how_to", confidence=0.94)],
        "msg-uuid-2": [MessageLabel(key="quality", value="good", confidence=0.91)],
    },
)

# Attach when ending a session
ai.track_session_end(user_id="user-1", session_id="sess-1", enrichments=enrichments)

# Or send enrichments at any time from a background pipeline
ai.track_session_enrichment(user_id="user-1", session_id="sess-1", enrichments=enrichments)
# Note: each call creates a separate [Agent] Session Enrichment event (not an overwrite).
# Call multiple times for streaming enrichment -- e.g., topics first, then rubric scores later.
```

#### SessionEnrichments Dataclass

The `SessionEnrichments` dataclass uses the same vocabulary as Amplitude's enrichment taxonomy framework: **topic models** for categorical classification and **rubrics** for scored evaluation. This ensures `[Agent] Session Enrichment` events from the SDK have property naming consistent with the server-side `[Agent] Session Evaluation`, `[Agent] Topic Classification`, and `[Agent] Score` events.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class MessageLabel:
    """A key-value label attached to a message event."""
    key: str                          # e.g., "intent", "flow", "sentiment"
    value: str                        # e.g., "how_to", "onboarding", "neutral"
    confidence: float | None = None   # Optional 0.0-1.0

@dataclass
class EvidenceQuote:
    """A quoted excerpt from the conversation supporting a rubric score."""
    quote: str                        # The quoted text
    turn_index: int                   # 0-based position in conversation
    role: str | None = None           # "user", "assistant", "tool"

@dataclass
class TopicClassification:
    """Result of classifying a session for a single topic model."""
    l1: str | None = None             # Single-select mode (MECE) — e.g., "quantitative_diagnostic"
    values: list[str] | None = None   # Multi-select mode — e.g., ["charts", "cohorts"]
    primary: str | None = None        # Primary value in multi-select — e.g., "charts"
    l2: str | None = None             # L2 subcategory — e.g., "WRONG_EVENT"
    topics_covered: list[str] | None = None   # All topics discussed
    outcomes_by_topic: dict[str, str] | None = None  # Outcome per topic

@dataclass
class RubricScore:
    """Result of scoring a session on a single rubric."""
    name: str                         # e.g., "task_completion", "response_quality"
    score: float                      # 0.0-1.0
    rationale: str | None = None      # Optional explanation
    evidence: list[EvidenceQuote] | None = None  # Supporting quotes
    improvement_opportunities: str | None = None  # Suggested improvements

@dataclass
class SessionEnrichments:
    # Topic models — categorical classification per topic model
    topic_classifications: dict[str, TopicClassification] | None = None

    # Rubrics — scored evaluation dimensions
    rubrics: list[RubricScore] | None = None

    # Outcome
    overall_outcome: str | None = None  # "response_provided", "abandoned", etc.

    # Session-level scores
    quality_score: float | None = None       # 0.0-1.0
    sentiment_score: float | None = None     # 0.0-1.0

    # Boolean flags
    has_task_failure: bool = False
    has_negative_feedback: bool = False
    has_data_quality_issues: bool = False
    has_technical_failure: bool = False

    # Failure detail
    task_failure_type: str | None = None     # e.g., "unable_to_complete"
    task_failure_reason: str | None = None   # Free-text explanation

    # Feedback and error detail
    negative_feedback_phrases: list[str] | None = None
    data_quality_issues: list[str] | None = None
    technical_error_count: int | None = None

    # Agent chain metadata
    agent_chain: list[str] | None = None     # Ordered agent delegation chain
    root_agent_name: str | None = None       # Entry-point agent

    # Request classification
    request_complexity: str | None = None    # "simple", "moderate", "complex", "ambiguous"

    # Supplementary data
    error_categories: list[str] | None = None
    behavioral_patterns: list[str] | None = None
    custom_metadata: dict[str, Any] | None = None  # Arbitrary customer-defined metadata
    schema_version: str = "2.0"

    # Retrospective message labels (keyed by message_id)
    message_labels: dict[str, list[MessageLabel]] | None = None
```

#### TopicClassification Fields

Topics classify sessions along a dimension you define. Use `l1` for single-select (one category per session) or `values` + `primary` for multi-select (session touches multiple areas):

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `l1` | `str` | No | Single-select category. E.g., `"quantitative_diagnostic"`, `"help_guidance"`, `"artifact_creation"`. |
| `values` | `list[str]` | No | Multi-select categories. E.g., `["charts", "cohorts", "experiments"]`. |
| `primary` | `str` | No | Primary value in multi-select. Must be one of `values`. |
| `l2` | `str` | No | Subcategory for finer classification. E.g., `"WRONG_EVENT"`, `"HALLUCINATION"`, `"UNDERSPECIFIED"`. |
| `topics_covered` | `list[str]` | No | All topics discussed in multi-topic sessions. |
| `outcomes_by_topic` | `dict[str, str]` | No | Outcome per topic. E.g., `{"charts": "response_provided", "funnels": "abandoned"}`. |
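
The constraint that `primary` must be one of `values` can be checked before sending. The helper below is a minimal sketch: `topic_classification_problems` is a hypothetical name, and the "either `l1` or `values`" rule is an assumption drawn from the single-select / multi-select description rather than documented SDK validation. It reads attributes by name, so a `SimpleNamespace` stands in for `TopicClassification` here.

```python
from types import SimpleNamespace


def topic_classification_problems(tc) -> list[str]:
    """Return a list of consistency problems; an empty list means none were found."""
    problems = []
    l1 = getattr(tc, "l1", None)
    values = getattr(tc, "values", None)
    primary = getattr(tc, "primary", None)
    # Assumption: a classification should use single-select or multi-select mode
    if l1 is None and not values:
        problems.append("set either l1 (single-select) or values (multi-select)")
    # Stated in the field table: primary must be one of values
    if primary is not None and primary not in (values or []):
        problems.append("primary must be one of values")
    return problems


# Stand-in objects with the same attribute names as TopicClassification
ok = SimpleNamespace(l1=None, values=["charts", "cohorts"], primary="charts")
bad = SimpleNamespace(l1=None, values=["charts"], primary="funnels")
```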

#### RubricScore Fields

Rubrics are scored evaluation dimensions. Define whatever rubrics matter for your use case:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | `str` | Yes | Rubric name, e.g., `"task_completion"`, `"helpfulness"`, `"safety"`, `"groundedness"`. |
| `score` | `float` | Yes | 0.0 to 1.0. |
| `rationale` | `str` | No | Explanation for the score. Useful for debugging and auditing. |
| `evidence` | `list[EvidenceQuote]` | No | Quoted excerpts from the conversation supporting this score. |
| `improvement_opportunities` | `str` | No | Suggested improvements based on this evaluation. |
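
If your own evaluator scores on a different scale (for example a 1-5 rating), the value needs rescaling to the 0.0-1.0 range before it goes into `RubricScore.score`. A minimal sketch, where `to_unit_score` is a hypothetical helper and clamping out-of-range input is a design choice rather than SDK behavior:

```python
def to_unit_score(raw: float, low: float, high: float) -> float:
    """Linearly rescale raw from [low, high] to [0.0, 1.0], clamping out-of-range input."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = (raw - low) / (high - low)
    return min(1.0, max(0.0, scaled))


score = to_unit_score(4, low=1, high=5)  # (4 - 1) / (5 - 1) = 0.75
```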

#### MessageLabel Fields

Labels are flexible key-value pairs for filtering and segmentation:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `key` | `str` | Yes | Label key, e.g., `"intent"`, `"flow"`, `"sentiment"`, `"experiment"`. |
| `value` | `str` | Yes | Label value, e.g., `"how_to"`, `"onboarding"`, `"neutral"`. |
| `confidence` | `float` | No | Confidence score (0.0 to 1.0) when the label comes from a classifier. |

#### EvidenceQuote Fields

Quoted excerpts that support rubric scores:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `quote` | `str` | Yes | The quoted text from the conversation. |
| `turn_index` | `int` | Yes | 0-based position in the conversation. |
| `role` | `str` | No | Role of the speaker (`"user"`, `"assistant"`, `"tool"`). |

#### Complete `customer_enriched` Example

End-to-end example for teams running their own classifiers. No conversation content leaves your environment; only structured labels are sent:

```python
from amplitude import Amplitude
from amplitude_ai import (
    AmplitudeAI, SessionEnrichments, TopicClassification, RubricScore,
    EvidenceQuote, MessageLabel,
)

amplitude = Amplitude("YOUR_API_KEY")
ai = AmplitudeAI(amplitude=amplitude, content_mode="customer_enriched")
agent = ai.agent(agent_id="support-agent")

with agent.session(user_id="user-1") as s:
    msg_id = s.track_user_message(content="How do I create a funnel?")
    ai_msg_id = s.track_ai_message(
        content="To create a funnel...",
        model="gpt-4o", provider="openai", latency_ms=450.0,
    )

# After session: run your classifiers, then send structured labels
enrichments = SessionEnrichments(
    topic_classifications={
        "query_intent": TopicClassification(l1="how_to"),
    },
    rubrics=[
        RubricScore(
            name="task_completion", score=0.85,
            evidence=[EvidenceQuote(quote="To create a funnel...", turn_index=1, role="assistant")],
        ),
    ],
    overall_outcome="response_provided",
    quality_score=0.85,
    request_complexity="simple",
    message_labels={
        msg_id: [MessageLabel(key="intent", value="how_to", confidence=0.94)],
        ai_msg_id: [MessageLabel(key="quality", value="good", confidence=0.91)],
    },
)
ai.track_session_enrichment(user_id="user-1", session_id=s.session_id, enrichments=enrichments)
```

#### How Scores and Enrichments Relate

Scores (`score()`) and enrichments (`track_session_enrichment()`) coexist and serve different purposes:

| Concern | `score()` | `track_session_enrichment()` |
|---------|-----------|------------------------------|
| **Purpose** | Rate a specific message or session | Classify a session holistically |
| **Granularity** | Message-level or session-level | Session-level only |
| **Data shape** | Single name/value pair per call | Structured batch: topics + rubrics + outcomes + flags |
| **Source tracking** | Yes (`user`, `ai`, `reviewer`) | No (assumed system/customer) |
| **Primary use** | User feedback, automated evals, human annotations | `content_mode="customer_enriched"` flow; background pipelines |
| **Categorical data** | No (numeric only) | Yes (topic classifications, outcomes, behavioral patterns) |

Use enrichments for comprehensive session classification (topic models + rubrics + outcomes in one batch). Use `score()` for individual quality signals, especially from end-users or at the message level.

---

### Provider Wrappers

Drop-in replacements that automatically track every LLM call, including reasoning content from thinking models, system prompts, and model configuration (temperature, top_p, max_tokens, streaming mode).

#### OpenAI

```python
from amplitude_ai import OpenAI

client = OpenAI(amplitude=amplitude, api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is retention?"}],
    amplitude_user_id="user-1",
)
# [Agent] User Message + [Agent] AI Response tracked automatically
```

#### Anthropic

```python
from amplitude_ai import Anthropic

client = Anthropic(amplitude=amplitude, api_key="sk-ant-...")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain funnels"}],
    amplitude_user_id="user-1",
)
```

#### Google Gemini

```python
from amplitude_ai import Gemini

client = Gemini(amplitude=amplitude, api_key="...", model_name="gemini-2.0-flash")
response = client.generate_content("What are cohorts?", amplitude_user_id="user-1")
```

#### Azure OpenAI

```python
from amplitude_ai import AzureOpenAI

client = AzureOpenAI(
    amplitude=amplitude,
    azure_endpoint="https://your-resource.openai.azure.com",
    api_key="...",
    api_version="2024-02-01",
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is retention?"}],
    amplitude_user_id="user-1",
)
```

#### AWS Bedrock

```python
from amplitude_ai import Bedrock

client = Bedrock(amplitude=amplitude, region_name="us-east-1")
response = client.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": [{"text": "Explain funnels"}]}],
    amplitude_user_id="user-1",
)
```

#### Mistral

```python
from amplitude_ai import Mistral

client = Mistral(amplitude=amplitude, api_key="...")
response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "What are cohorts?"}],
    amplitude_user_id="user-1",
)
```

#### LangChain

```python
from amplitude_ai import create_amplitude_callback

callback = create_amplitude_callback(amplitude=amplitude, user_id="user-1")
# Pass as callback to any LangChain chain or agent
```

#### LlamaIndex

```python
from amplitude_ai import AmplitudeLlamaIndexHandler

handler = AmplitudeLlamaIndexHandler(amplitude=amplitude, user_id="user-1")
# Set as the global callback handler or pass to individual components
```

#### OpenAI Agents SDK

Tracing processor that plugs into the OpenAI Agents SDK's tracing system. Maps `GenerationSpanData` to `[Agent] AI Response`, `FunctionSpanData` to `[Agent] Tool Call`, and agent/handoff/guardrail spans to `[Agent] Span`.

```bash
pip install "amplitude-ai[openai-agents]"
```

```python
from agents import Agent, Runner, add_trace_processor
from amplitude import Amplitude
from amplitude_ai.integrations.openai_agents import AmplitudeTracingProcessor

amplitude = Amplitude("YOUR_API_KEY")
processor = AmplitudeTracingProcessor(
    amplitude=amplitude,
    user_id="user-1",
    agent_id="my-agent",
    env="production",
)
# Register the processor with the Agents SDK's global tracing pipeline
# (RunConfig has no tracing_processors field)
add_trace_processor(processor)

agent = Agent(name="support-bot", instructions="You are a helpful assistant.")
result = Runner.run_sync(agent, "What is retention?")
# All generations, tool calls, handoffs, and guardrail checks tracked automatically
```

#### Anthropic Tool Use Loop

Managed multi-turn tool_use loop that handles the Anthropic agentic pattern of repeated `tool_use` -> `tool_result` cycles. Tracks every turn automatically.

```python
from anthropic import Anthropic
from amplitude import Amplitude
from amplitude_ai.integrations.anthropic_tools import AmplitudeToolLoop

amplitude = Amplitude("YOUR_API_KEY")
client = Anthropic()

loop = AmplitudeToolLoop(
    amplitude=amplitude,
    client=client,
    user_id="user-1",
    tool_handlers={
        "get_weather": lambda city: f"72°F in {city}",
        "search": lambda query: f"Results for {query}",
    },
)

result = loop.run(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=[{"name": "get_weather", "description": "Get weather", "input_schema": {...}}],
)
# Each turn emits [Agent] AI Response + [Agent] Tool Call events
# Loop stops when model returns stop_reason != "tool_use"
```

Supports async via `await loop.arun(...)` with `anthropic.AsyncAnthropic`.

#### CrewAI

Event listener hooks that capture CrewAI's LLM calls and tool usage across all agents in a crew.

```bash
pip install "amplitude-ai[crewai]"
```

```python
from crewai import Crew, Agent, Task
from amplitude import Amplitude
from amplitude_ai.integrations.crewai import AmplitudeCrewAIHooks

amplitude = Amplitude("YOUR_API_KEY")

with AmplitudeCrewAIHooks(amplitude=amplitude, user_id="user-1") as hooks:
    researcher = Agent(role="Researcher", goal="Find information", ...)
    writer = Agent(role="Writer", goal="Write content", ...)
    crew = Crew(agents=[researcher, writer], tasks=[...])
    result = crew.kickoff()
# All LLM calls and tool invocations across all agents tracked automatically
# Agent roles are captured as agent_id when no explicit agent_id is set
```

#### Reasoning Extraction

Provider wrappers auto-extract reasoning content from each provider's native response format:

| Provider | Extraction Method |
|----------|-------------------|
| OpenAI (o1, o3, etc.) | `response.choices[0].message.reasoning_content` |
| Anthropic (extended thinking) | Filter `response.content` for blocks with `type == "thinking"`, concatenate text |
| Google Gemini | Extract thinking parts from response |
| Mistral | `choice.message.reasoning_content` or typed content blocks |
| AWS Bedrock | Reasoning blocks from Bedrock Converse API response |
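
To make the Anthropic row concrete, the sketch below filters content blocks by `type == "thinking"` and joins their text. It is an illustration of the described approach, not the wrapper's actual code: `extract_thinking` is a hypothetical name, the newline separator is an assumption, and `SimpleNamespace` objects stand in for the SDK's content blocks (which carry the text in a `thinking` attribute).

```python
from types import SimpleNamespace


def extract_thinking(content_blocks) -> str:
    """Join the text of all blocks whose type is "thinking"."""
    parts = []
    for block in content_blocks:
        if getattr(block, "type", None) == "thinking":
            # Thinking blocks carry their text in the `thinking` attribute
            parts.append(getattr(block, "thinking", "") or "")
    return "\n".join(parts)


# Stand-in for response.content from an extended-thinking response
content = [
    SimpleNamespace(type="thinking", thinking="User wants a definition."),
    SimpleNamespace(type="text", text="Retention measures..."),
    SimpleNamespace(type="thinking", thinking="Keep it short."),
]
reasoning = extract_thinking(content)
```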

---

### OTEL GenAI Bridge Reference

The `AmplitudeAgentExporter` consumes spans that follow the [OTEL GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) and maps them to Amplitude `[Agent]` events. It works with any tool that emits standard OTEL GenAI spans:

- **OpenLIT**
- **Traceloop / OpenLLMetry**
- **OpenAI Python SDK** (with OTEL instrumentation enabled)
- Any manual OpenTelemetry instrumentation following the GenAI semantic conventions

> **Note on Langfuse:** Langfuse v3+ uses OTEL internally for transport and can *receive* OTEL traces as a backend. However, Langfuse's own SDK integrations (100+) use proprietary APIs, not standard OTEL GenAI spans. If you use an OTEL-native instrumentation library (OpenLIT, Traceloop) alongside Langfuse, the same GenAI spans can flow to both Langfuse and Amplitude simultaneously.

**Attribute mapping: OTEL GenAI semantic conventions to Amplitude properties:**

| OTEL GenAI Attribute | Amplitude Property | Notes |
|---|---|---|
| `gen_ai.operation.name` | (event routing) | `chat`/`text_completion`/`generate_content` -> User Message + AI Response; `embeddings` -> Embedding; `execute_tool` -> Tool Call; `invoke_agent`/`create_agent` -> AI Response with agent metadata |
| `gen_ai.response.model` | `[Agent] Model Name` | Preferred; often contains the versioned name (e.g., `gpt-4o-2024-11-20`) |
| `gen_ai.request.model` | `[Agent] Model Name` | Fallback when response model is absent |
| `gen_ai.provider.name` | `[Agent] Provider` | |
| `gen_ai.usage.input_tokens` | `[Agent] Input Tokens` | |
| `gen_ai.usage.output_tokens` | `[Agent] Output Tokens` | |
| (computed) input + output | `[Agent] Total Tokens` | Auto-summed when both present |
| `gen_ai.response.finish_reasons` | `[Agent] Finish Reason` | First element of array |
| `gen_ai.input.messages` | User message content | Last `role=user` message; respects `privacy_config` |
| `gen_ai.output.messages` | AI response content | First output message; respects `privacy_config` |
| `gen_ai.system_instructions` | `[Agent] System Prompt` | Respects `privacy_config` |
| `gen_ai.request.temperature` | `[Agent] Temperature` | |
| `gen_ai.request.max_tokens` | `[Agent] Max Output Tokens` | |
| `gen_ai.request.top_p` | `[Agent] Top P` | |
| `gen_ai.conversation.id` | `[Agent] Session ID` | Fallback when no ContextVar session active |
| `gen_ai.agent.id` | `[Agent] Agent ID` | `invoke_agent`/`create_agent` operations |
| `gen_ai.agent.name` | `[Agent] Agent Name` | `invoke_agent`/`create_agent` operations |
| `gen_ai.tool.name` | Tool name | `execute_tool` operations; falls back to span name parsing |
| `gen_ai.embeddings.dimension.count` | `[Agent] Embedding Dimensions` | `embeddings` operations only |
| `error.type` | `[Agent] Is Error` + `[Agent] Error Message` | |
| Span `trace_id` (hex) | `[Agent] Trace ID` | Fallback when no ContextVar trace_id active |
| Span duration (ns) | `[Agent] Latency Ms` | Computed from span timestamps |
| (computed) model + tokens | `[Agent] Cost USD` | Auto-calculated via built-in `genai-prices` database |
| `enduser.id` | `user_id` | Fallback when no SessionContext or default user_id |

**Not mapped by the bridge** (use native SDK provider wrappers for these): cache tokens, reasoning content/tokens, TTFB, streaming detection, event graph linking (`parent_message_id`).
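
A few of the mapping rules in the table (response model preferred over request model, total tokens summed only when both counts are present, first finish reason, nanosecond duration converted to milliseconds) can be sketched in plain Python. This is an illustration of the table, not the exporter's implementation; `map_genai_span` and its signature are hypothetical.

```python
def map_genai_span(attrs: dict, start_ns: int, end_ns: int) -> dict:
    """Map a subset of OTEL GenAI span attributes to Amplitude-style properties."""
    props = {}
    # Response model is preferred; request model is the fallback
    model = attrs.get("gen_ai.response.model") or attrs.get("gen_ai.request.model")
    if model:
        props["[Agent] Model Name"] = model
    inp = attrs.get("gen_ai.usage.input_tokens")
    out = attrs.get("gen_ai.usage.output_tokens")
    if inp is not None:
        props["[Agent] Input Tokens"] = inp
    if out is not None:
        props["[Agent] Output Tokens"] = out
    # Total is only computed when both counts are present
    if inp is not None and out is not None:
        props["[Agent] Total Tokens"] = inp + out
    reasons = attrs.get("gen_ai.response.finish_reasons")
    if reasons:
        props["[Agent] Finish Reason"] = reasons[0]
    # Span timestamps are in nanoseconds; latency is reported in milliseconds
    props["[Agent] Latency Ms"] = (end_ns - start_ns) / 1_000_000
    return props


props = map_genai_span(
    {
        "gen_ai.request.model": "gpt-4o",
        "gen_ai.response.model": "gpt-4o-2024-11-20",
        "gen_ai.usage.input_tokens": 120,
        "gen_ai.usage.output_tokens": 340,
        "gen_ai.response.finish_reasons": ["stop"],
    },
    start_ns=1_000_000_000,
    end_ns=1_450_000_000,
)
```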

**Scope filtering:** control which spans reach Amplitude:

```python
# Only process spans from specific instrumentation scopes
exporter = AmplitudeAgentExporter(
    amplitude=amplitude,
    user_id="user-123",
    allowed_scopes={"langfuse-sdk", "openai"},
)

# Or block infrastructure scopes
exporter = AmplitudeAgentExporter(
    amplitude=amplitude,
    user_id="user-123",
    blocked_scopes={"fastapi", "sqlalchemy", "psycopg"},
)
```

**OTEL bridge vs. native wrappers, field-level comparison:**

| Capability | OTEL Bridge | Native Wrapper |
|---|---|---|
| Model, provider, tokens, latency, errors | Yes | Yes |
| Cost calculation | Yes (basic) | Yes (cache-aware, 2-5x more accurate for prompt-cached workloads) |
| System prompt, temperature, top_p | Yes | Yes |
| Message content (when opted-in) | Yes | Yes |
| Cache read / creation tokens | No (not in OTEL spec) | Yes |
| Reasoning content and tokens | No (not in OTEL spec) | Yes |
| Time to first byte (TTFB) | No | Yes |
| Streaming detection | No | Yes |
| Event graph linking (parent_message_id) | No | Yes |
| Session / agent context | With `agent.session()` | With `agent.session()` |

---

## API Reference

Message and operation tracking methods return a UUID string (`message_id`, `invocation_id`, or `span_id`) for event linking.

| Category | Methods |
|---|---|
| Messages | `track_user_message(user_id, content, session_id, labels=..., ...) -> str` | 
| | `track_ai_message(user_id, content, session_id, model, provider, latency_ms, labels=..., ...) -> str` |
| Operations | `track_tool_call(user_id, tool_name, latency_ms, success, ...) -> str` |
| | `track_embedding(user_id, model, provider, latency_ms, ...) -> str` |
| | `track_span(user_id, span_name, trace_id, latency_ms, ...) -> str` |
| Sessions | `track_session_end(user_id, session_id, ...)` |
| | `track_session_enrichment(user_id, session_id, enrichments)` |
| Scoring | `score(user_id, name, value, target_id, ...)` |
| Types | `MessageLabel(key, value, confidence=...)` (inline message labels) |
| | `EvidenceQuote(quote, turn_index, role=...)` (rubric evidence) |
| | `SessionEnrichments(...)` (structured session classifications) |
| Agents | `agent(agent_id, ...) -> BoundAgent` — pre-configured handle with inherited context |
| | `agent.child(agent_id, ...) -> BoundAgent` — inherits parent context, sets `parent_agent_id` |
| | `agent.session(session_id=None) -> Session` — context manager, auto-closes on exit |
| Utilities | `status() -> dict` — config, available providers, patched providers |
| | `tenant(customer_org_id, ...) -> TenantHandle` — multi-tenant factory for `BoundAgent` |
| Lifecycle | `flush()`, `shutdown()` |

---

## Usage Examples

### Messages

```python
msg_id = ai.track_user_message(
    user_id="user-1",
    content="How do I set up a funnel?",
    session_id="sess-1",
    trace_id="trace-1",
    turn_id=1,
    agent_id="support-agent",
    env="production",
)

ai_msg_id = ai.track_ai_message(
    user_id="user-1",
    content="To create a funnel chart...",
    session_id="sess-1",
    trace_id="trace-1",
    model="gpt-4o",
    provider="openai",
    latency_ms=450.0,
    input_tokens=120,
    output_tokens=340,
    total_tokens=460,
    total_cost_usd=0.0023,
    turn_id=2,
    agent_id="support-agent",
    env="production",
    ttfb_ms=85.0,
)
```

### Cache-Aware Cost Calculation

LLM providers cache repeated token prefixes (system prompts, tool definitions) and bill cached tokens at different rates than regular input tokens. Pass cache breakdowns for accurate cost tracking. When `total_cost_usd` is omitted, the SDK auto-calculates with cache-aware pricing:

```python
ai_msg_id = ai.track_ai_message(
    user_id="user-1",
    content="Here's how to configure...",
    session_id="sess-1",
    model="claude-sonnet-4-20250514",
    provider="anthropic",
    latency_ms=800.0,
    input_tokens=5000,
    output_tokens=200,
    cache_read_tokens=4500,        # ~10% cost (Anthropic), ~50% (OpenAI)
    cache_creation_tokens=500,     # ~125% cost (Anthropic)
)
```
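
To see why the cache breakdown matters, the arithmetic can be worked through by hand. The sketch below is not the SDK's pricing logic: the per-token prices are hypothetical placeholders, the multipliers are the approximate Anthropic figures from the comments above, and it assumes `input_tokens` is the total with the cache counts as subsets of it.

```python
# Hypothetical per-token prices in USD; real prices vary by model and change over time.
INPUT_PRICE = 3.00 / 1_000_000
OUTPUT_PRICE = 15.00 / 1_000_000
CACHE_READ_MULT = 0.10    # ~10% of the input price
CACHE_WRITE_MULT = 1.25   # ~125% of the input price


def cache_aware_cost(input_tokens, output_tokens, cache_read_tokens=0, cache_creation_tokens=0):
    # Assumption: input_tokens is the total, and the cache counts are subsets of it
    uncached = input_tokens - cache_read_tokens - cache_creation_tokens
    return (
        uncached * INPUT_PRICE
        + cache_read_tokens * INPUT_PRICE * CACHE_READ_MULT
        + cache_creation_tokens * INPUT_PRICE * CACHE_WRITE_MULT
        + output_tokens * OUTPUT_PRICE
    )


naive = cache_aware_cost(5000, 200)
cached = cache_aware_cost(5000, 200, cache_read_tokens=4500, cache_creation_tokens=500)
```

Under these assumed prices, `naive` comes to about $0.018 and `cached` to about $0.0062, so ignoring the cache breakdown would overstate the cost by a factor of roughly 2.9.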

### Implicit Feedback

Track behavioral signals that indicate whether a response met the user's need, without requiring explicit ratings:

```python
# User asks a question
msg1 = ai.track_user_message(
    user_id="u1", content="How do I create a funnel?", session_id="s1",
)

# AI responds -- user copies the answer (positive signal)
ai_msg = ai.track_ai_message(
    user_id="u1", content="To create a funnel, go to...",
    session_id="s1", model="gpt-4o", provider="openai", latency_ms=300.0,
    was_copied=True,
)

# User regenerates (negative signal -- first response wasn't good enough)
msg2 = ai.track_user_message(
    user_id="u1", content="How do I create a funnel?",
    session_id="s1", is_regeneration=True,
)

# User edits their question (refining intent)
msg3 = ai.track_user_message(
    user_id="u1", content="How do I create a conversion funnel for signups?",
    session_id="s1", is_edit=True, edited_message_id=msg1,
)

# Session where user abandoned after the first exchange
ai.track_session_end(user_id="u1", session_id="s1", abandonment_turn=1)
```

### File Attachments

Track rich media (images, PDFs, audio, video) without sending file content through the SDK. Include a `url` pointing to the resource on your own infrastructure (CDN, S3, internal docs system) and the LLM session viewer renders it on-the-fly in the reviewer's browser. Amplitude never stores or proxies the file; the browser fetches directly from your URL using your existing network access and auth.

If the URL is live when someone reviews the session, they see the full resource inline. If it has expired or is unreachable, the viewer falls back to the filename and type.

```python
# Image -- session viewer renders it inline from your CDN
s.track_user_message(
    content="What's wrong with this error?",
    attachments=[{
        "type": "image",
        "name": "error_screenshot.png",
        "url": "https://cdn.example.com/uploads/error_screenshot.png",
        "mime_type": "image/png",
    }],
)

# PDF -- session viewer opens it in an embedded viewer from your docs system
s.track_user_message(
    content="Summarize the key risks in this contract",
    attachments=[{
        "type": "pdf",
        "name": "vendor_agreement_v3.pdf",
        "url": "https://docs.internal.example.com/contracts/vendor_agreement_v3.pdf",
        "mime_type": "application/pdf",
        "page_count": 23,
        "department": "legal",
    }],
)

# Multiple attachments, mixed types
s.track_user_message(
    content="Compare these datasets and explain the chart",
    attachments=[
        {"type": "csv", "name": "sales_2025.csv", "url": "https://s3.example.com/data/sales_2025.csv"},
        {"type": "csv", "name": "sales_2024.csv", "url": "https://s3.example.com/data/sales_2024.csv"},
        {"type": "image", "name": "revenue_chart.png", "url": "https://cdn.example.com/charts/revenue.png"},
    ],
)

# AI-generated attachment (works on track_ai_message too)
s.track_ai_message(
    content="Here's the visualization you requested",
    model="gpt-4o", provider="openai", latency_ms=3200,
    attachments=[{
        "type": "image",
        "name": "forecast_chart.png",
        "url": "https://cdn.example.com/generated/forecast_chart.png",
        "mime_type": "image/png",
    }],
)
```

The attachment dict is free-form. Add any extra keys you need (`size_bytes`, `page_count`, `duration_seconds`, `department`, `internal_doc_id`). The SDK extracts `type` for aggregate analytics; everything else is serialized as-is into the event and available to viewers and downstream consumers.
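
How the SDK aggregates `type` is not specified here; as a sketch of what type-level aggregation over free-form attachment dicts could look like, the hypothetical helper below counts attachments by `type` and ignores every other key. Grouping entries without a `type` under `"unknown"` is an assumption.

```python
from collections import Counter


def attachment_type_counts(attachments: list[dict]) -> dict[str, int]:
    """Count attachments by their `type` key; entries without one count as "unknown"."""
    return dict(Counter(a.get("type", "unknown") for a in attachments))


counts = attachment_type_counts([
    {"type": "csv", "name": "sales_2025.csv"},
    {"type": "csv", "name": "sales_2024.csv"},
    {"type": "image", "name": "revenue_chart.png", "size_bytes": 48213},
])
```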

### Multi-Agent & Multi-Tenant

**With BoundAgent** (recommended):

```python
orchestrator = ai.agent("orchestrator", agent_version="v4.2", env="prod", customer_org_id="acme-123")
billing = orchestrator.child("billing-agent")  # inherits agent_version="v4.2"

with billing.session(user_id="u1") as s:
    s.new_trace()
    s.track_user_message(content="Check my billing status")
    s.track_ai_message(content="Your balance is...", model="gpt-4o", provider="openai", latency_ms=200)
```

**With `ai.tenant()`** (multi-tenant shorthand):

For platforms serving multiple customers, `ai.tenant()` pre-fills `customer_org_id` and `groups` on every agent:

```python
tenant = ai.tenant("acme-corp", groups={"company": "acme-corp"}, env="production")

support_bot = tenant.agent("support-bot", user_id="u1")
billing_bot = tenant.agent("billing-bot", user_id="u1")
# Both agents inherit customer_org_id="acme-corp" and groups automatically
```

Explicit kwargs on `tenant.agent()` override the defaults.

**Without BoundAgent** (manual, same result):

```python
msg_id = ai.track_user_message(
    user_id="u1",
    content="Check my billing status",
    session_id="sess-1",
    trace_id="trace-1",
    agent_id="billing-agent",
    parent_agent_id="orchestrator",
    customer_org_id="acme-123",
    env="prod",
)
```

### A/B Testing with Context

The `context` dict lets you attach experiment variants, feature flags, prompt revisions, and any other segmentation dimension to every event:

```python
# Variant assigned at session start. get_experiment_variant() is a placeholder
# for your own experiment framework's lookup, not an SDK function.
variant = get_experiment_variant("u1", "prompt-rewrite-v2")

agent = ai.agent(
    "support-bot",
    user_id="u1",
    env="production",
    context={
        "experiment_variant": variant,          # "control" or "treatment"
        "feature_flags": {"rag_v2": True},
        "prompt_revision": "abc123",
    },
)

# All events in this session carry the context -- segment quality metrics
# by experiment_variant in Amplitude charts and cohorts
with agent.session() as s:
    s.track_user_message(content="How do I set up billing?")
    s.track_ai_message(content="...", model="gpt-4o", provider="openai", latency_ms=300)
```
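
The example above calls `get_experiment_variant()`, which the SDK does not provide; it stands in for your own experiment framework. If you don't have one yet, a minimal sketch of a deterministic 50/50 assignment might look like the following. The function body, the hashing scheme, and the split ratio are all assumptions for illustration:

```python
import hashlib


def get_experiment_variant(user_id: str, experiment: str) -> str:
    """Hypothetical stand-in: hash user + experiment into a stable 50/50 bucket."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # integer in 0..99
    return "treatment" if bucket < 50 else "control"


# The same user always lands in the same variant for a given experiment
variant = get_experiment_variant("u1", "prompt-rewrite-v2")
```

Hashing the experiment name together with the user ID keeps assignments stable across sessions and independent across experiments.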

Child agents merge context (child keys override parent keys):

```python
orchestrator = ai.agent("orchestrator",
                         context={"experiment_variant": "treatment"})
researcher = orchestrator.child("researcher",
                                 context={"sub_experiment": "rag-rerank"})
# researcher.context == {"experiment_variant": "treatment", "sub_experiment": "rag-rerank"}
```
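
The override rule can be pictured as a plain dict merge. The sketch below illustrates the semantics only, not the SDK's internals: `merge_context` is a hypothetical helper, and it assumes a shallow merge (a nested dict such as `feature_flags` would be replaced as a whole, not merged key by key):

```python
def merge_context(parent: dict, child: dict) -> dict:
    """Return a new dict; child keys win on conflict, inputs are left untouched."""
    return {**parent, **child}


parent_ctx = {"experiment_variant": "treatment", "prompt_revision": "abc123"}
child_ctx = {"sub_experiment": "rag-rerank", "prompt_revision": "def456"}

merged = merge_context(parent_ctx, child_ctx)
# merged keeps experiment_variant from the parent, takes prompt_revision
# from the child, and adds sub_experiment
```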

### Tool Calls

```python
inv_id = ai.track_tool_call(
    user_id="user-1",
    tool_name="search_docs",
    latency_ms=85.0,
    success=True,
    session_id="sess-1",
    trace_id="trace-1",
    turn_id=3,
    input={"query": "funnel setup"},
    output="Found 3 matching docs...",
    parent_message_id=ai_msg_id,     # links this tool call to the AI response
    agent_id="support-agent",
    env="production",
)
```

### Embeddings

```python
span_id = ai.track_embedding(
    user_id="user-1",
    model="text-embedding-3-small",
    provider="openai",
    latency_ms=25.0,
    input_tokens=45,
    dimensions=1536,
    total_cost_usd=0.00001,
    session_id="sess-1",
)
```

### Generic Spans

Track any pipeline operation (vector search, rerank, guardrails, retrieval, etc.):

```python
span_id = ai.track_span(
    user_id="user-1",
    span_name="vector_search",
    trace_id="trace-1",
    latency_ms=120.0,
    input_state={"query": "funnel setup", "top_k": 10},
    output_state={"results_count": 3},
    session_id="sess-1",
)
```

### `@tool` Decorator

Automatically track function calls as `[Agent] Tool Call` events:

```python
from amplitude_ai import tool

@tool(amplitude=amplitude)
def search_knowledge_base(query: str) -> str:
    """Search the knowledge base for relevant articles."""
    return "Found 3 results..."

# Every call tracked with latency, input, output, and success status
result = search_knowledge_base(query="retention", amplitude_user_id="user-1")
```

Works with `async` functions too. The decorator detects coroutines automatically:

```python
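import httpx

@tool(amplitude=amplitude)
async def fetch_user_profile(user_id: str) -> dict:
    """Fetch user profile from the API."""
    # base_url is a placeholder; httpx rejects relative URLs without one
    async with httpx.AsyncClient(base_url="https://api.example.com") as client:
        resp = await client.get(f"/users/{user_id}")
        return resp.json()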

# Tracked identically to sync — latency, input, output, success
profile = await fetch_user_profile(user_id="u-123", amplitude_user_id="user-1")
```
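
For intuition on how one decorator can serve both sync and async functions, here is a minimal, stdlib-only sketch. `timed_tool` and the `calls` list are hypothetical names, and this is not the SDK's implementation; it only shows the general technique of branching on `inspect.iscoroutinefunction`:

```python
import asyncio
import functools
import inspect
import time

calls = []  # stand-in for the events a real tracker would emit


def timed_tool(fn):
    """Hypothetical decorator: records name, latency, and success."""

    def record(start: float, success: bool) -> None:
        calls.append({
            "tool_name": fn.__name__,
            "latency_ms": (time.perf_counter() - start) * 1000.0,
            "success": success,
        })

    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = await fn(*args, **kwargs)
            except Exception:
                record(start, False)
                raise  # tracking must not swallow the tool's own error
            record(start, True)
            return result
        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
        except Exception:
            record(start, False)
            raise
        record(start, True)
        return result
    return sync_wrapper


@timed_tool
def add(a: int, b: int) -> int:
    return a + b


@timed_tool
async def fetch(key: str) -> str:
    await asyncio.sleep(0)  # simulate an awaitable I/O call
    return key.upper()


sync_result = add(2, 3)
async_result = asyncio.run(fetch("profile"))
```

The async branch has to `await` inside the timing window; wrapping a coroutine function with the sync wrapper would only measure how long it takes to create the coroutine object.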

### `@observe` Decorator

Track any function as a `[Agent] Span` event with automatic latency measurement, error capture, and session lifecycle management:

```python
from amplitude_ai import observe

@observe
def summarize_document(text: str) -> str:
    """Summarize a document using an LLM pipeline."""
    chunks = chunk_text(text)
    summaries = [call_llm(chunk) for chunk in chunks]
    return combine_summaries(summaries)

result = summarize_document(long_text)
# Tracked: span_name="summarize_document", latency, input/output state
```

Sessions are handled automatically: `@observe` joins an active session if one exists, or creates and closes its own. Nested calls share the outer session. Use `@observe(name="custom-span-name")` to override the function name. Async functions are detected automatically. In `metadata_only` mode, only function name, latency, and error status are captured.

```python
@observe
def pipeline(query):
    step1(query)    # @observe — attaches to pipeline's session, not a new one
    step2(query)    # @observe — same session, same trace
```

### Common Recipes

**Tracking agent actions and side effects.** When your agent takes real-world actions (issuing refunds, sending emails, creating tickets), use `track_span()`. The `span_name` is the action type, `output_state` carries the result, and `is_error` captures failures:

```python
# Agent issues a refund via Stripe
span_id = s.track_span(
    span_name="issue_refund",
    latency_ms=340.0,
    input_state={"order_id": "ord-789", "amount": 49.99},
    output_state={"transaction_id": "txn_abc", "success": True},
)

# Agent sends an email via SendGrid
span_id = s.track_span(
    span_name="send_email",
    latency_ms=120.0,
    input_state={"template": "refund_confirmation", "recipient": "user@example.com"},
    output_state={"message_id": "sg-456", "success": True},
)

# Failed action
span_id = s.track_span(
    span_name="create_ticket",
    latency_ms=2100.0,
    is_error=True,
    error_message="Zendesk API rate limited",
    input_state={"subject": "Refund follow-up"},
)
```

Filter by `[Agent] Span Name` in Amplitude to build dashboards for action success rates, latency by target system, and error attribution.
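
The examples above hard-code `latency_ms` and the error fields. In practice you may want to measure them. One possible approach is a small context manager that collects the keyword arguments for `track_span()` while the action runs. `timed_action` is a hypothetical helper, not part of the SDK; it only uses the `track_span()` parameter names shown above:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_action(span_name: str, **input_state):
    """Hypothetical helper: fills in track_span() kwargs around an action."""
    fields = {"span_name": span_name, "input_state": input_state}
    start = time.perf_counter()
    try:
        yield fields
    except Exception as exc:
        fields["is_error"] = True
        fields["error_message"] = str(exc)
        raise  # let the caller decide how to handle the failure
    finally:
        fields["latency_ms"] = (time.perf_counter() - start) * 1000.0


# Successful action: the body attaches its own output_state
with timed_action("issue_refund", order_id="ord-789") as ok_fields:
    ok_fields["output_state"] = {"transaction_id": "txn_abc"}

# Failed action: the error fields are filled in before the exception propagates
try:
    with timed_action("create_ticket", subject="Refund follow-up") as failed_fields:
        raise TimeoutError("Zendesk API rate limited")
except TimeoutError:
    pass
```

Inside a session you would then call `s.track_span(**ok_fields)` or `s.track_span(**failed_fields)`.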

**Tracking guardrails and safety checks.** Content filters, injection detection, and policy checks are spans too:

```python
span_id = s.track_span(
    span_name="content_filter",
    latency_ms=15.0,
    input_state={"check": "prompt_injection"},
    output_state={"blocked": True, "reason": "injection_detected"},
    is_error=True,
    error_message="Prompt injection detected -- blocked",
)
```

**Tracking RAG pipelines.** Use nested spans to capture the full retrieval pipeline (embed, search, rerank) as a single traceable unit:

```python
rag_span = s.track_span(span_name="rag_pipeline", latency_ms=280.0)

embed_id = s.track_embedding(
    model="text-embedding-3-small",
    provider="openai",
    input_tokens=8,
    latency_ms=45.0,
)

search_span = s.track_span(
    span_name="vector_search",
    latency_ms=90.0,
    parent_span_id=rag_span,
    input_state={"query": "billing setup", "top_k": 10},
    output_state={"results_count": 5, "best_score": 0.94},
)

rerank_span = s.track_span(
    span_name="rerank",
    latency_ms=60.0,
    parent_span_id=rag_span,
    input_state={"candidates": 5},
    output_state={"kept": 3},
)
```

**Connecting AI events to business outcomes.** Your existing product events already track business outcomes. Because AI events share the same `user_id`, you can build cross-product funnels directly in Amplitude. No dedicated "goal" event is needed:

```
[Agent] User Message  ->  [Agent] AI Response  ->  Purchase Completed
```

Build a cohort of users whose AI sessions scored above 0.8 on `task_completion` and compare their conversion rate to everyone else. The funnel builder connects AI sessions to any downstream product event.

---

## Event Schema

The SDK produces 8 event types, all prefixed with `[Agent]`.

### SDK Events

| Event | Method | Description |
|-------|--------|-------------|
| `[Agent] User Message` | `track_user_message()` | Session/trace/turn IDs, message content, agent IDs, implicit feedback (regeneration, edit), file attachments |
| `[Agent] AI Response` | `track_ai_message()` | Model, provider, latency, tokens (including reasoning), cost, finish reason, reasoning content, system prompt, model config, copy signal |
| `[Agent] Tool Call` | `track_tool_call()` | Tool name, input/output, latency, success status |
| `[Agent] Embedding` | `track_embedding()` | Model, provider, latency, tokens, vector dimensions, cost |
| `[Agent] Span` | `track_span()` | Generic operation tracking (name, input/output state, latency, parent span hierarchy) |
| `[Agent] Session End` | `track_session_end()` | Explicit session close with optional enrichments, abandonment tracking |
| `[Agent] Session Enrichment` | `track_session_enrichment()` | Customer-provided session classifications. Distinct from server-side `[Agent] Session Evaluation`. |
| `[Agent] Score` | `score()` | Quality signal attached to a message or session (user feedback, automated evals, human annotations) |

### Server-Side Events (automatic)

| Event | Description |
|-------|-------------|
| `[Agent] Session Evaluation` | Session-level summary: outcome, turn count, flags (`has_task_failure`, `has_negative_feedback`), metadata |
| `[Agent] Topic Classification` | One event per configured topic model per session: `model_name`, `l1`, `l2`, `values` |
| `[Agent] Score` (reused) | One event per configured rubric per session, with `[Agent] Evaluation Source = "ai"` |

### How Behavioral Signals Become Analytics

The SDK captures **behavioral facts** at the application layer, and when `content_mode="full"`, server-side enrichment detects patterns across the full session. Both converge into the same charts, cohorts, and funnels:

| SDK call | What appears in Amplitude | What you can build |
|---|---|---|
| `track_user_message(is_regeneration=True)` | `[Agent] User Message` with `is_regeneration=True` + `[Agent] Session Evaluation` with `behavioral_patterns=["retry_storm"]` | Cohort of frustrated users -> target with Guide -> measure churn delta |
| `track_ai_message(was_copied=True)` | `[Agent] AI Response` with `was_copied=True` | Copy rate as positive quality signal, no explicit rating required |
| `score(source="user", value=1.0)` | `[Agent] Score` with `source="user"` | Single chart: user feedback, LLM-as-judge, and human annotations side by side |

All content properties respect the configured `content_mode`. See [Privacy & Content Control](#privacy--content-control) for tier details.

---

## Testing

`MockAmplitudeAI` is a drop-in replacement that captures events in-memory instead of sending them over the network. It supports all SDK features including `agent()` and `session()`:

```python
from amplitude_ai import MockAmplitudeAI

mock = MockAmplitudeAI()
agent = mock.agent("test-bot", user_id="u1")

with agent.session("s1") as s:
    s.new_trace()
    s.track_user_message(content="Hello")
    s.track_ai_message(content="Hi!", model="gpt-4o", provider="openai", latency_ms=100.0)

assert len(mock.events) == 3  # user msg + ai msg + session end
mock.assert_event_tracked("[Agent] User Message", user_id="u1")
mock.assert_event_tracked("[Agent] AI Response", **{"[Agent] Model Name": "gpt-4o"})
mock.reset()
```

Filter and assert by session or agent:

```python
mock.events_for_session("s1")          # list of events for that session
mock.events_for_agent("test-bot")      # list of events for that agent
mock.assert_session_closed("s1")       # assert [Agent] Session End exists
```

### Disabling Tracking in Tests

If you don't need to assert on events and just want tracking to be a no-op, use `MockAmplitudeAI` without inspecting `events`. It never makes network calls. Alternatively, skip SDK initialization entirely in your test config.

---

## Serverless Environments

> **Failing to call `flush()` before the handler returns is the #1 integration error in serverless environments.** The process terminates before the event buffer is sent, and events are silently lost. This applies to AWS Lambda, Cloud Functions, Azure Functions, and any runtime where the process lifecycle is managed by the platform.

Always call `flush()` before returning:

```python
def handler(event, context):
    agent = ai.agent("lambda-bot")
    with agent.session(user_id=event["user_id"]) as s:
        s.new_trace()
        s.track_user_message(content=event["message"])
        ai_msg = s.track_ai_message(
            content=generate_response(event["message"]),
            model="gpt-4o", provider="openai", latency_ms=500.0,
        )
    for future in ai.flush():
        future.result()  # block until all events are delivered
    return {"statusCode": 200}
```

`flush()` returns a list of `Future` objects. Call `.result()` on each to block until delivery completes. For long-running servers, the Amplitude SDK flushes automatically on a timer; explicit `flush()` is only needed for short-lived processes.
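
In a serverless handler you usually want an upper bound on how long you wait. Since the returned objects are described as futures, one option is `concurrent.futures.wait` with a timeout. The sketch below uses fake futures from a thread pool in place of `ai.flush()`; `wait_for_delivery` is a hypothetical helper, and it assumes the futures are standard `concurrent.futures.Future` instances:

```python
from concurrent.futures import ThreadPoolExecutor, wait


def wait_for_delivery(futures, timeout_s: float) -> bool:
    """Block until every future finishes or the timeout elapses.

    Returns True if all futures finished in time, False otherwise.
    """
    done, not_done = wait(list(futures), timeout=timeout_s)
    return not not_done


# Stand-in for ai.flush(): two fake deliveries running on a worker pool
with ThreadPoolExecutor(max_workers=2) as pool:
    fake_futures = [pool.submit(lambda: "sent"), pool.submit(lambda: "sent")]
    all_delivered = wait_for_delivery(fake_futures, timeout_s=2.0)
```

In a handler this would be called as `wait_for_delivery(ai.flush(), timeout_s=2.0)`, with the timeout chosen to fit inside the function's remaining execution time.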

---

## Error Handling and Reliability

- **Tracking calls never raise.** All `track_*` methods catch exceptions internally, so your application code won't break if Amplitude is unreachable or if you pass unexpected values. Events that fail to send are logged at `ERROR` level.

- **Events are buffered and retried automatically.** The SDK delegates to the [Amplitude Python SDK](https://github.com/amplitude/Amplitude-Python)'s event pipeline, which buffers events in memory and retries failed deliveries with exponential backoff. You don't need to implement retry logic.

- **Delivery status callback.** Use `on_event_callback` in `AIConfig` to monitor delivery status per event:

```python
import logging

def on_event(event, code, message):
    if code != 200:
        logging.warning(f"Event delivery failed: {code} {message}")

ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(on_event_callback=on_event))
```

- **`flush()` sends buffered events immediately.** It returns `Future` objects; call `.result()` on each to block until all buffered events are sent. Required in serverless environments; optional in long-running processes where the SDK flushes automatically on a timer.

- **`shutdown()` for clean exit.** Call `ai.shutdown()` when your application exits to flush remaining events and release resources. Only necessary if the SDK created the Amplitude instance internally (via `api_key=`); if you passed in your own `amplitude=` instance, manage its lifecycle yourself.
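
To make the retry behavior concrete, here is what an exponential backoff schedule looks like in general. The base delay, growth factor, cap, and attempt count are illustrative assumptions, not the Amplitude SDK's actual settings, and `backoff_delays` is a hypothetical helper:

```python
def backoff_delays(base_s: float, factor: float, cap_s: float, attempts: int) -> list[float]:
    """Delay before each retry: grows geometrically, then stays at the cap."""
    return [min(cap_s, base_s * factor ** i) for i in range(attempts)]


delays = backoff_delays(base_s=0.5, factor=2.0, cap_s=8.0, attempts=6)
# delays == [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
```

Because the waits grow with each attempt, a short outage is retried quickly while a longer one does not flood the endpoint.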

---

## Under the Hood

Built on the official [Amplitude-Python](https://github.com/amplitude/Amplitude-Python) SDK (`Amplitude`, `BaseEvent`, `track()`, `flush()`, `shutdown()`).

### Content Storage (`$llm_message` Chunking)

Message content (user messages and AI responses) is stored in a `$llm_message` nested property inside the event. The Amplitude SDK truncates individual string properties at 1024 characters. To preserve full content, the SDK automatically chunks text into multiple sub-properties:

- **Short content** (<=1024 chars): stored as `{"text": "..."}` (no chunking, fully backward-compatible).
- **Long content** (>1024 chars): split into `{"c0": "...", "c1": "...", ..., "n": chunk_count}`. Up to 8 chunks of 1024 chars each (8KB max).
- **Truncated content** (>8KB): the first 4 chunks hold the beginning and the last 4 hold the end, with a marker like `[...12,345 chars truncated...]` at the boundary. The original length is preserved in a `"len"` field.

The LLM session viewer and enrichment pipeline reassemble chunks automatically. If you query raw events directly (e.g., via SQL export), you'll see `c0`..`c7` sub-properties under `$llm_message` for long responses.

This chunking applies only in `full` content mode. In `metadata_only` mode, no content is sent. System prompts and reasoning content follow the same chunking rules.
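
If you read raw events yourself, you need to reassemble the chunks. The sketch below models the short and long layouts described above and rebuilds the text from either. Both functions are hypothetical helpers, not SDK APIs; the chunker deliberately skips the >8KB truncation case, and the reader assumes `n` equals the number of `c*` keys present:

```python
CHUNK_SIZE = 1024  # per-property string limit described above


def chunk_message(text: str) -> dict:
    """Sketch of the short and long cases only (no >8KB truncation)."""
    if len(text) <= CHUNK_SIZE:
        return {"text": text}
    chunks = [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]
    payload = {f"c{i}": chunk for i, chunk in enumerate(chunks)}
    payload["n"] = len(chunks)
    return payload


def reassemble_message(payload: dict) -> str:
    """Rebuild the text from either the {"text": ...} or the c0..cN layout."""
    if "text" in payload:
        return payload["text"]
    return "".join(payload[f"c{i}"] for i in range(payload["n"]))


long_text = "x" * 2500
stored = chunk_message(long_text)         # keys: c0, c1, c2, n
restored = reassemble_message(stored)     # identical to long_text
```

For truncated content the joined chunks include the truncation marker, so the result is shorter than the original; compare against the `"len"` field to detect that case.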

### Context Propagation

The SDK uses Python's `contextvars` module to propagate session context (session ID, trace ID, agent ID, turn counter) across function calls. This is how provider wrappers, `@tool`, `@observe`, and the FastAPI middleware all share the same session without explicit parameter threading.

**How it works:**

- **Sync code**: context flows naturally through the call stack.
- **`asyncio.create_task()`**: Python 3.10+ automatically copies the parent context into the new task. No action needed; the child task inherits the active session.
- **`ThreadPoolExecutor`**: threads do **not** inherit `ContextVar` state. If you offload work to a thread pool within a session, you must explicitly copy the context:

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

with agent.session() as s:
    s.track_user_message(content="hello")
    ctx = contextvars.copy_context()
    with ThreadPoolExecutor() as pool:
        # ctx.run() ensures the thread sees the active session
        future = pool.submit(ctx.run, my_blocking_function, arg1, arg2)
```

Without `ctx.run()`, the thread sees no active session and provider wrappers fall back to the global `ToolCallTracker` config (if set).
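
You can verify these propagation rules with the standard library alone. The sketch below uses a stand-in `ContextVar` named `session_id` (a hypothetical name, not the SDK's internal variable) and compares a worker thread with and without `ctx.run()`, plus an `asyncio` task:

```python
import asyncio
import contextvars
from concurrent.futures import ThreadPoolExecutor

session_id = contextvars.ContextVar("session_id", default=None)


def read_session():
    return session_id.get()


async def main():
    session_id.set("sess-task")
    # create_task() copies the current context, so the child sees "sess-task"
    return await asyncio.create_task(child())


async def child():
    return session_id.get()


session_id.set("sess-1")
ctx = contextvars.copy_context()  # snapshot that includes session_id="sess-1"

with ThreadPoolExecutor(max_workers=1) as pool:
    # On standard CPython builds the worker thread starts with an empty
    # context, so this read falls back to the default (None)
    without_ctx = pool.submit(read_session).result()
    # ctx.run() executes the function inside the copied context
    with_ctx = pool.submit(ctx.run, read_session).result()

in_task = asyncio.run(main())
```

A `Context` object cannot be entered by two threads at the same time, so when submitting several jobs in parallel, call `contextvars.copy_context()` once per job.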

**Nesting**: sessions nest correctly. An inner `with agent.session()` or `@observe` call saves and restores the outer context on exit. No session leaks.

### Debug Logging

The SDK uses the Amplitude SDK's built-in logger. Enable verbose output to see every event as it's tracked:

```python
import logging

# Attach a handler so DEBUG records are actually printed
logging.basicConfig()

# Route the Amplitude SDK's logs through a named logger at DEBUG level
amplitude.configuration.logger = logging.getLogger("amplitude")
amplitude.configuration.logger.setLevel(logging.DEBUG)

# Optional, development only: relax ID length validation
amplitude.configuration.min_id_length = 1
```

### Supported Integrations

The SDK provides broad coverage across LLM providers, agent frameworks, and observability standards:

**Provider Wrappers** (drop-in replacements with full field coverage):

| Provider | Class | Install Extra | Key Capabilities |
|----------|-------|---------------|------------------|
| OpenAI | `OpenAI` | `[openai]` | Chat, streaming, reasoning (o1/o3/o4), function calling, prompt caching |
| Anthropic | `Anthropic` | `[anthropic]` | Messages, streaming, extended thinking, tool_use, prompt caching |
| Google Gemini | `Gemini` | `[gemini]` | Generate content, streaming, thinking models |
| Azure OpenAI | `AzureOpenAI` | `[azure]` | Same as OpenAI, Azure-hosted |
| AWS Bedrock | `Bedrock` | `[bedrock]` | Converse API, streaming, cross-provider (Claude, Titan, etc.) |
| Mistral | `Mistral` | `[mistral]` | Chat, streaming, function calling, reasoning |

**Framework Integrations** (callbacks and processors for popular frameworks):

| Framework | Class | Install Extra | Integration Pattern |
|-----------|-------|---------------|---------------------|
| LangChain | `AmplitudeCallbackHandler` | `[langchain]` | Callback handler for chains and agents |
| LlamaIndex | `AmplitudeLlamaIndexHandler` | `[llamaindex]` | Callback handler for queries and retrievals |
| OpenTelemetry | `AmplitudeAgentExporter` | `[otel]` | OTEL span exporter (GenAI semantic conventions) |
| OpenAI Agents SDK | `AmplitudeTracingProcessor` | `[openai-agents]` | Tracing processor for multi-agent workflows |
| Anthropic tool_use | `AmplitudeToolLoop` | `[anthropic]` | Managed multi-turn tool_use loop |
| CrewAI | `AmplitudeCrewAIHooks` | `[crewai]` | Event listener hooks for LLM and tool calls |

**Coverage summary:** The OTEL GenAI bridge provides baseline coverage for any OTEL-instrumented provider or framework. The 6 dedicated provider wrappers add full field coverage (cache tokens, reasoning, TTFB, streaming). The framework integrations capture agent-level structure (multi-agent, tool loops, handoffs). Together, this covers the vast majority of production LLM deployments.

### Package Structure

```
amplitude-ai/
├── amplitude_ai/
│   ├── __init__.py          # Public API
│   ├── client.py            # AmplitudeAI, BoundAgent, TenantHandle, Session
│   ├── config.py            # AIConfig, ContentMode
│   ├── context.py           # SessionContext, ContextVar propagation
│   ├── exceptions.py        # AmplitudeAIError, ValidationError, etc.
│   ├── middleware.py         # FastAPI/Starlette AmplitudeAIMiddleware
│   ├── patching.py          # Zero-code monkey-patching (patch_openai, etc.)
│   ├── testing.py           # MockAmplitudeAI
│   ├── wrappers.py          # wrap() convenience function
│   ├── core/
│   │   ├── tracking.py      # Event tracking functions
│   │   ├── privacy.py       # Privacy/redaction
│   │   ├── enrichments.py   # SessionEnrichments, TopicClassification, RubricScore
│   │   └── decorators.py    # @tool and @observe decorators
│   ├── providers/           # OpenAI, Anthropic, Gemini, Azure, Bedrock, Mistral
│   ├── integrations/        # LangChain, LlamaIndex, OTEL, OpenAI Agents, Anthropic tools, CrewAI
│   └── utils/               # Cost (genai-prices), tokens, streaming
└── README.md
```

---

## Requirements

- Python >= 3.10
- `amplitude-analytics >= 1.0.0`

Optional provider extras:

```
pip install "amplitude-ai[openai]"           # OpenAI
pip install "amplitude-ai[azure]"            # Azure OpenAI (same openai dependency)
pip install "amplitude-ai[anthropic]"        # Anthropic Claude (provider + tool_use loop)
pip install "amplitude-ai[gemini]"           # Google Gemini
pip install "amplitude-ai[bedrock]"          # AWS Bedrock
pip install "amplitude-ai[mistral]"          # Mistral
pip install "amplitude-ai[langchain]"        # LangChain
pip install "amplitude-ai[llamaindex]"       # LlamaIndex
pip install "amplitude-ai[otel]"             # OpenTelemetry
pip install "amplitude-ai[openai-agents]"    # OpenAI Agents SDK
pip install "amplitude-ai[crewai]"           # CrewAI
pip install "amplitude-ai[all]"              # Everything
```
