Metadata-Version: 2.4
Name: llmtracer-sdk
Version: 2.5.1
Summary: Auto-track LLM cost, latency, and usage. Two lines of code, every provider.
Author-email: LLM Tracer <hello@llmtracer.dev>
License: MIT
Project-URL: Homepage, https://llmtracer.dev
Project-URL: Documentation, https://llmtracer.dev/docs
Project-URL: Repository, https://github.com/llmtracer/llmtracer-python
Keywords: llm,observability,cost-tracking,openai,anthropic,tracing
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Provides-Extra: openai
Requires-Dist: openai>=1.0.1; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.20.0; extra == "anthropic"
Provides-Extra: all
Requires-Dist: openai>=1.0.1; extra == "all"
Requires-Dist: anthropic>=0.20.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21; extra == "dev"

# LLM Tracer — Python SDK

Track cost, latency, and token usage across OpenAI, Anthropic, and Google Gemini with two lines of code.

![version](https://img.shields.io/badge/version-2.5.1-blue)

## Install

```bash
pip install llmtracer-sdk
```

## Quick Start

```python
import llmtracer

llmtracer.init(api_key="lt_...")

# That's it. All OpenAI, Anthropic, and Google Gemini calls are now tracked automatically.
```

No wrappers, no callbacks, no code changes. The SDK auto-patches your provider clients at import time.

View your dashboard at [llmtracer.dev](https://llmtracer.dev).

## What Gets Captured

Every LLM call is automatically tracked with:

- **Provider, model, tokens** (input + output), latency, cost
- **Google Gemini**: thinking tokens (2.5 models), tool tokens, cached tokens
- **Anthropic**: cache creation + read tokens
- **OpenAI**: reasoning tokens (o1/o3/o4), cached tokens
- **Caller file, function, and line number**
- **Auto-flush on process exit** (no manual flush needed in long-running processes; serverless handlers should call `llmtracer.flush()`)

## Environment Variable Pattern

```python
import os
import llmtracer

llmtracer.init(
    api_key=os.environ["LLMTRACER_API_KEY"],
    debug=True,  # prints token counts to console
)
```

## Multi-App Tracking

If you have multiple services sharing an API key, set `app_name` to filter by application in the dashboard:

```python
llmtracer.init(api_key="lt_...", app_name="billing-service")
```

Or via environment variable:

```bash
export LLMTRACER_APP_NAME=billing-service
```
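
The fallback described above can be pictured with a small sketch. The helper name `resolve_app_name` is hypothetical and is not part of the SDK; it also assumes that an explicit `app_name` argument takes precedence over the environment variable.

```python
import os
from typing import Optional


def resolve_app_name(app_name: Optional[str] = None) -> Optional[str]:
    """Prefer the explicit argument; otherwise read LLMTRACER_APP_NAME.

    Hypothetical helper for illustration only, not the SDK's own code.
    """
    if app_name is not None:
        return app_name
    # Returns None when the variable is not set.
    return os.environ.get("LLMTRACER_APP_NAME")
```

Under these assumptions, `resolve_app_name("search-service")` returns the argument as given, while `resolve_app_name()` returns whatever `LLMTRACER_APP_NAME` holds, or `None` if it is unset.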

## Trace Context and Tags

```python
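import llmtracer
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Recommended: pass tags as keyword arguments
with llmtracer.trace(feature="chat", user_id="u_sarah"):
    response = client.chat.completions.create(...)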

# Also works (deprecated — emits DeprecationWarning):
with llmtracer.trace(tags={"feature": "chat"}):
    ...
```

Tags appear in the dashboard's Breakdown page and Top Tags card. Use them to answer questions like "which user costs the most?" or "which feature should I optimize?"

### Tagging Patterns

| Pattern | Tag | Example |
|---------|-----|---------|
| Track cost by feature | `feature` | `"chat"`, `"search"`, `"summarize"` |
| Track cost by user | `user_id` | `"u_sarah"`, `"u_mike"` |
| Track cost by customer (B2B) | `customer` | `"acme-corp"`, `"initech"` |
| Track cost by conversation | `conversation_id` | `"conv_abc123"` |
| Track environment | `env` | `"production"`, `"staging"` |

## Supported Providers

| Provider | Package | Auto-patched |
|----------|---------|-------------|
| OpenAI | `openai` | Yes |
| Anthropic | `anthropic` | Yes |
| Google Gemini | `google-genai` | Yes |

## LangChain Support

If you use LangChain with `ChatOpenAI`, `ChatAnthropic`, or `ChatGoogleGenerativeAI`, the underlying SDK calls are auto-captured. No callback handler needed — just `llmtracer.init()` and you're done.

## Configuration

| Option | Type | Default | Range | Description |
|---|---|---|---|---|
| `api_key` | `str` | *required* | — | Your LLM Tracer API key (starts with `lt_`) |
| `app_name` | `str` | `None` | — | Application name for multi-app filtering. Falls back to `LLMTRACER_APP_NAME` env var |
| `endpoint` | `str` | Production URL | — | Ingestion endpoint URL |
| `skip_exit_handlers` | `bool` | `False` | — | Skip atexit handler registration (for serverless environments) |
| `max_batch_size` | `int` | `50` | 1–500 | Max events per HTTP request |
| `flush_interval_s` | `float` | `5.0` | 1.0–60.0 | Auto-flush interval in seconds |
| `max_queue_size` | `int` | `1000` | 100–10000 | Max events in queue before dropping oldest |
| `max_retries` | `int` | `3` | 0–10 | Max retry attempts for failed flushes |
| `sample_rate` | `float` | `1.0` | 0.0–1.0 | Sampling rate. `0.5` captures ~50% of events |
| `debug` | `bool` | `False` | — | Enable debug logging to console |

All numeric options are validated on `init()`. Out-of-range values are replaced with the default, and a warning is logged when `debug=True`.
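
As a rough illustration of this rule, the sketch below replaces an out-of-range value with its default. The defaults and ranges are copied from the table above; the function name `validate_option`, the inclusive bounds, and the wording of the warning are assumptions, not the SDK's actual implementation.

```python
from typing import Dict, Tuple, Union

Number = Union[int, float]

# (default, minimum, maximum) taken from the Configuration table.
# Bounds are assumed to be inclusive.
NUMERIC_OPTIONS: Dict[str, Tuple[Number, Number, Number]] = {
    "max_batch_size": (50, 1, 500),
    "flush_interval_s": (5.0, 1.0, 60.0),
    "max_queue_size": (1000, 100, 10000),
    "max_retries": (3, 0, 10),
    "sample_rate": (1.0, 0.0, 1.0),
}


def validate_option(name: str, value: Number, debug: bool = False) -> Number:
    """Return value if it lies inside the documented range, else the default."""
    default, low, high = NUMERIC_OPTIONS[name]
    if low <= value <= high:
        return value
    if debug:
        # Hypothetical message; the SDK's real warning text may differ.
        print(f"[sketch] {name}={value!r} is outside {low}..{high}; using {default!r}")
    return default
```

With this sketch, `validate_option("max_batch_size", 9999)` returns `50`, while `validate_option("max_batch_size", 100)` returns `100` unchanged.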

## Flushing Events

The SDK batches events and sends them in the background. In long-running processes (web servers, daemons), this is fully automatic. For short-lived scripts and serverless environments, you need to flush before the process exits.
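
To make the batching behavior concrete, here is a minimal sketch of a bounded queue that drops the oldest events on overflow and hands out batches of at most `max_batch_size`. The class `EventQueue` and its methods are hypothetical; the SDK's real internals may look quite different.

```python
from collections import deque
from typing import Deque, Dict, List

Event = Dict[str, object]


class EventQueue:
    """Hypothetical in-memory queue for illustration; not the SDK's real class."""

    def __init__(self, max_queue_size: int = 1000, max_batch_size: int = 50) -> None:
        # A deque with maxlen discards the oldest item when a new one
        # is appended to a full queue.
        self._events: Deque[Event] = deque(maxlen=max_queue_size)
        self._max_batch_size = max_batch_size

    def add(self, event: Event) -> None:
        self._events.append(event)

    def next_batch(self) -> List[Event]:
        """Remove and return up to max_batch_size of the oldest queued events."""
        size = min(self._max_batch_size, len(self._events))
        return [self._events.popleft() for _ in range(size)]

    def __len__(self) -> int:
        return len(self._events)
```

A background thread could call `next_batch()` every `flush_interval_s` seconds and send each batch in one HTTP request. A production version would also need a lock around `next_batch()` if several threads drain the queue at once.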

### Auto-flush (long-running processes)

By default the SDK registers an `atexit` handler and flushes on process exit:

```python
import llmtracer

llmtracer.init(api_key="lt_...")

# Events are flushed automatically when the process exits
```

### Manual flush (serverless / short-lived)

Call `llmtracer.flush()` before returning from a handler or Lambda function:

```python
import llmtracer
from openai import OpenAI

llmtracer.init(api_key="lt_...", skip_exit_handlers=True)
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def handler(event, context):
    try:
        return client.chat.completions.create(...)
    finally:
        llmtracer.flush()  # send queued events even if the call raises
```

### pytest fixture

Wrap your test session with a flush to capture events from tests:

```python
import pytest
import llmtracer

@pytest.fixture(scope="session", autouse=True)
def flush_llmtracer():
    yield
    llmtracer.flush()
```

### SIGTERM handler (Cloud Run / Kubernetes)

```python
import signal
import llmtracer

def handle_sigterm(signum, frame):
    llmtracer.flush()
    raise SystemExit(0)

signal.signal(signal.SIGTERM, handle_sigterm)
```

## Debug Mode

Enable `debug=True` to print token counts to the console:

```python
llmtracer.init(api_key="lt_...", debug=True)
```

```
[llmtracer] openai gpt-4o | 1,247 in -> 384 out | $0.0094 | 1.2s
[llmtracer] anthropic claude-sonnet-4-5 | 2,100 in -> 512 out (cache_read: 1,800) | $0.0031 | 0.8s
[llmtracer] google gemini-2.5-pro | 900 in -> 280 out (thinking: 1,420) | $0.0067 | 2.1s
```

## Reliability

The SDK is designed to never interfere with your application:

- **Never throws** — all internal errors are swallowed silently (enable `debug=True` for visibility)
- **Batching** — events are queued and sent in batches of `max_batch_size`
- **Retry with backoff** — failed flushes are retried up to `max_retries` times with exponential backoff (`min(1.0 * 2^attempt, 30.0)`) plus random jitter (0–1.0s)
- **Drop after retries** — after `max_retries` consecutive failures, the batch is dropped to prevent unbounded memory growth
- **Queue overflow** — drops oldest events when the queue exceeds `max_queue_size`
- **Sampling** — set `sample_rate` below 1.0 to reduce volume in high-throughput environments
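
The retry delay and sampling rules above can be sketched in a few lines. The function names are hypothetical, and the sketch assumes `attempt` starts at 0 for the first retry; only the formula `min(1.0 * 2^attempt, 30.0)` and the 0–1.0s jitter come from the list above.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  jitter: float = 1.0) -> float:
    """Seconds to wait before retry number `attempt` (assumed to start at 0)."""
    # Exponential growth capped at `cap`, plus uniform random jitter.
    return min(base * 2 ** attempt, cap) + random.uniform(0.0, jitter)


def should_capture(sample_rate: float) -> bool:
    """Keep an event with probability sample_rate (1.0 keeps all, 0.0 keeps none)."""
    # random.random() returns a float in [0.0, 1.0).
    return random.random() < sample_rate
```

Ignoring jitter, attempts 0 through 5 would wait 1, 2, 4, 8, 16, and 30 seconds, since the sixth value (32) is capped at 30. With the default `max_retries=3`, only the first three delays would apply under these assumptions.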

## Requirements

- Python 3.9+
- Provider SDKs are optional. The `openai` and `anthropic` extras install `openai>=1.0.1` and `anthropic>=0.20.0`; no minimum version is declared for `google-genai`

## Zero Dependencies

The core SDK uses only Python stdlib (`urllib.request`, `threading`, `hashlib`).

## License

MIT
