Metadata-Version: 2.4
Name: tokenome
Version: 0.1.0
Summary: Tokenome wrapper-client SDK for metadata-only LLM telemetry
Project-URL: Homepage, https://github.com/khodex-rei/tokenome-sdk
Project-URL: Repository, https://github.com/khodex-rei/tokenome-sdk
Project-URL: Documentation, https://github.com/khodex-rei/tokenome-sdk/tree/Development/docs
Project-URL: Issues, https://github.com/khodex-rei/tokenome-sdk/issues
Project-URL: Changelog, https://github.com/khodex-rei/tokenome-sdk/pulls?q=is%3Apr+is%3Amerged
Author: Khodex Rei
License-Expression: MIT
License-File: LICENSE
Keywords: anthropic,llm,observability,openai,sdk,telemetry,tokenome
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Monitoring
Requires-Python: >=3.11
Requires-Dist: httpx>=0.27.0
Provides-Extra: all
Requires-Dist: anthropic>=0.34.0; extra == 'all'
Requires-Dist: openai<3,>=2.0; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.34.0; extra == 'anthropic'
Provides-Extra: dev
Requires-Dist: isort>=5.13.0; extra == 'dev'
Requires-Dist: mypy>=1.11.0; extra == 'dev'
Requires-Dist: pip-audit>=2.7.0; extra == 'dev'
Requires-Dist: pre-commit>=3.8.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-json-report>=1.5.0; extra == 'dev'
Requires-Dist: pytest>=8.2.0; extra == 'dev'
Requires-Dist: ruff>=0.5.0; extra == 'dev'
Provides-Extra: openai
Requires-Dist: openai<3,>=2.0; extra == 'openai'
Description-Content-Type: text/markdown

# Tokenome Python SDK

The Tokenome SDK is a wrapper-client telemetry SDK for LLM applications. It captures call metadata (model, tokens, latency, status) and, by default, does not capture prompt or response content.

## Design Philosophy

- **Wrapper-client only**: No monkey patching, no framework middleware as primary path
- **Metadata-only telemetry**: No prompt or response content capture by default
- **Fail-open**: Telemetry failures never crash user application code
- **Durable by default**: `durable_local` spools events to SQLite; `best_effort` opts out of durability and keeps events in memory only
- **Transparent**: Same request, same response, same exception, same stream — plus telemetry

## Supported Providers

### OpenAI

| Operation | Status | Usage Extraction | Notes |
|:---|:---|:---|:---|
| `responses.create` | ✅ | `input_tokens`, `output_tokens`, `total_tokens`, `cached_input_tokens`, `reasoning_tokens` | Supports streaming |
| `responses.parse` | ✅ | Same as `responses.create` | Structured output |
| `chat.completions.create` | ✅ | `input_tokens`, `output_tokens`, `total_tokens`, `cached_input_tokens`, `reasoning_tokens`, `audio_input_tokens`, `audio_output_tokens`, `accepted_prediction_tokens`, `rejected_prediction_tokens` | Supports streaming |
| `chat.completions.parse` | ✅ | Same as `chat.completions.create` | Structured output |
| `embeddings.create` | ✅ | `input_tokens`, `total_tokens` | |
| `images.generate` | ✅ | `input_tokens`, `output_tokens`, `total_tokens`, `text_input_tokens`, `image_input_tokens`, `text_output_tokens`, `image_output_tokens`, `images_generated` | |
| `images.edit` | ✅ | Same as `images.generate` | |
| `images.create_variation` | ✅ | Same as `images.generate` | |
| `batches.create` | ✅ | `total_tokens`←total, `input_tokens`←completed, `output_tokens`←failed | Proxy via `request_counts` |
| `batches.retrieve` | ✅ | Same proxy | |
| `batches.list` | ✅ | Same proxy | |
| `batches.cancel` | ✅ | Same proxy | |
| `audio.speech.create` | ✅ | None | No usage field |
| `audio.transcriptions.create` | ✅ | None | |
| `audio.translations.create` | ✅ | None | |
| `moderations.create` | ✅ | None | |
| `threads.runs.create` | ✅ | `input_tokens`, `output_tokens`, `total_tokens` | |
| `threads.runs.retrieve` | ✅ | Same | |
| `threads.runs.modify` | ✅ | Same | |
| `threads.runs.cancel` | ✅ | Same (may be empty) | |
| `threads.runs.submit_tool_outputs` | ✅ | Same | |
| `threads.runs.create_and_stream` | ✅ | Same | Supports streaming |
| `files.create` | ✅ | None | |
| `files.retrieve` | ✅ | None | |
| `files.content` | ✅ | None | Returns raw bytes |
| `files.delete` | ✅ | None | |
| `files.list` | ✅ | None | |
| `fine_tuning.jobs.create` | ✅ | `input_tokens`, `output_tokens`, `total_tokens` | If usage present |
| `fine_tuning.jobs.retrieve` | ✅ | Same | |
| `fine_tuning.jobs.list` | ✅ | Same | |
| `fine_tuning.jobs.cancel` | ✅ | Same | |
| `uploads.create` | ✅ | None | |
| `uploads.retrieve` | ✅ | None | |
| `uploads.complete` | ✅ | None | |
| `uploads.cancel` | ✅ | None | |
| `vector_stores.create` | ✅ | None | |
| `vector_stores.retrieve` | ✅ | None | |
| `vector_stores.list` | ✅ | None | |
| `vector_stores.delete` | ✅ | None | |

Version policy: official OpenAI Python SDK `>=2.0,<3`

### Anthropic

| Operation | Status | Usage Extraction | Notes |
|:---|:---|:---|:---|
| `messages.create` | ✅ | `input_tokens`, `output_tokens`, `total_tokens` | Sync only |

Version policy: official Anthropic Python SDK `>=0.40,<1`

## Install

```bash
uv add tokenome
uv add 'tokenome[openai]'
```

Optional extras:

```bash
uv add 'tokenome[anthropic]'
uv add 'tokenome[openai,anthropic]'
uv add 'tokenome[all]'
```

## Quick Start

### User-created OpenAI client

```python
from openai import OpenAI
from tokenome import TokenLens

tl = TokenLens(
    api_key="tl_pk_project_...",
    project_id="proj_...",
    environment="prod",
)

client = OpenAI(api_key="sk_...")
client = tl.wrap_openai(
    client,
    route="/api/chat",
    feature="chatbot",
    user_id="user_123",
    session_id="sess_456",
    tags={"tenant_id": "tenant_abc"},
)

response = client.responses.create(
    model="gpt-4.1-mini",
    input="Hello",
)

tl.flush()
tl.close()
```

### TokenLens-created OpenAI client

```python
from tokenome import TokenLens

tl = TokenLens(
    api_key="tl_pk_project_...",
    project_id="proj_...",
    environment="prod",
)

client = tl.OpenAI(api_key="sk_...")
response = client.responses.create(model="gpt-4.1-mini", input="Hello")
```

### Async OpenAI client

```python
import asyncio

from openai import AsyncOpenAI
from tokenome import TokenLens

tl = TokenLens(api_key="tl_pk_project_...", project_id="proj_...")

client = AsyncOpenAI(api_key="sk_...")
client = tl.wrap_openai(client)


async def main() -> None:
    # `await` is only valid inside a coroutine (or an async REPL).
    response = await client.responses.create(model="gpt-4.1-mini", input="Hello")


asyncio.run(main())
```

## Environment Bootstrap

```bash
export TOKENLENS_API_KEY="***"
export TOKENLENS_PROJECT_ID="***"
export TOKENLENS_ENVIRONMENT="***"
export TOKENLENS_ENDPOINT="https://api.tokenome.ai/v1/events/batch"
export TOKENLENS_DURABILITY="durable_local"
export TOKENLENS_SPOOL_PATH="$HOME/.cache/tokenome/events.sqlite3"
export TOKENLENS_TAGS='{"service": "api", "environment": "prod"}'
export TOKENLENS_ENABLED="true"
export TOKENLENS_DEBUG="false"
export TOKENLENS_TIMEOUT_SECONDS="1.5"
export TOKENLENS_QUEUE_MAXSIZE="10000"
```

```python
from tokenome import TokenLens

tl = TokenLens.init_from_env()
client = tl.OpenAI(api_key="sk_...")
```

### Environment Variables

| Variable | Required | Default | Description |
|:---|:---|:---|:---|
| `TOKENLENS_API_KEY` | Yes | — | Project API key |
| `TOKENLENS_PROJECT_ID` | No | — | Project identifier |
| `TOKENLENS_ENVIRONMENT` | No | — | Environment label (e.g., `prod`, `staging`) |
| `TOKENLENS_ENDPOINT` | No | `https://api.tokenome.ai/v1/events/batch` | Ingest endpoint |
| `TOKENLENS_DURABILITY` | No | `durable_local` | `durable_local`, `best_effort`, or `agent` (future) |
| `TOKENLENS_SPOOL_PATH` | No | Platform cache dir | SQLite spool file path |
| `TOKENLENS_TAGS` | No | — | JSON object merged into wrapper tags |
| `TOKENLENS_ENABLED` | No | `true` | Enable/disable telemetry |
| `TOKENLENS_DEBUG` | No | `false` | Debug logging |
| `TOKENLENS_TIMEOUT_SECONDS` | No | `1.5` | HTTP request timeout |
| `TOKENLENS_QUEUE_MAXSIZE` | No | `10000` | In-memory queue max size |

## Public API

```python
tl = TokenLens(...)
client = tl.OpenAI(...)
client = tl.AsyncOpenAI(...)
client = tl.wrap_openai(client)
client = tl.wrap(client)
tl.flush()
tl.close()
```

Legacy helpers:

```python
tl = TokenLens.init(...)
tl = TokenLens.init_from_env()
TokenLens.is_initialized()
```

## Telemetry Semantics

Tokenome captures metadata, not analytics:

- Provider name and SDK version
- Model identifier
- Operation name
- Request mode: `sync`, `stream`, `provider_batch`
- Token usage (input, output, cached, reasoning, audio, image)
- Latency and status
- Error metadata
- Project / environment / route / feature / session context
- Safe request/response metadata (no content)

Tokenome does **not**:
- Capture prompt text by default
- Capture completion text by default
- Calculate billing cost in the SDK
- Monkey-patch provider modules
- Block provider calls on telemetry failure

## Event Payload Shape

Batches are sent to `POST /v1/events/batch`.

### Batch Envelope

```json
{
  "batch_id": "batch_abc123",
  "sdk": {
    "language": "python",
    "version": "0.1.0"
  },
  "events": []
}
```

### Per-Event Shape

```json
{
  "event_id": "evt_123",
  "event_type": "request",
  "provider": "openai",
  "provider_sdk": "openai-python",
  "provider_sdk_version": "2.41.1",
  "model": "gpt-4.1-mini",
  "operation": "responses.create",
  "request_mode": "sync",
  "request_started_at": "2026-05-03T00:00:00Z",
  "response_completed_at": "2026-05-03T00:00:01Z",
  "latency_ms": 812,
  "input_tokens": 1200,
  "output_tokens": 340,
  "total_tokens": 1540,
  "cached_input_tokens": 0,
  "reasoning_tokens": 0,
  "status": "success",
  "cost_status": "final",
  "route": "/api/chat",
  "feature": "chatbot",
  "user_id_hash": "<sha256>",
  "session_id": "sess_456",
  "tags": {
    "environment": "prod"
  }
}
```

## Batching and Delivery Behavior

| Parameter | Default | Description |
|:---|:---|:---|
| Durability | `durable_local` | SQLite spool or in-memory |
| Flush interval | `5s` | Time-based flush trigger |
| Max events per batch | `100` | Count-based flush trigger |
| Max payload size | `256 KiB` | Size-based split in sender |
| Queue max size | `10000` | Memory spool capacity |
| Request timeout | `1.5s` | HTTP POST timeout |

Delivery behavior:
- `durable_local` commits events to local SQLite spool before async send
- `best_effort` keeps events in memory only
- `tl.flush()` forces immediate send attempt (bypasses backoff)
- `tl.close()` flushes and shuts down cleanly
- HTTP `429` respects `Retry-After` header
- Transient network and `5xx` errors retry with exponential backoff
- Delivery is fail-open; app path never crashes on telemetry failure

## Context Helpers

```python
from tokenome import clear_context, set_context, set_default_context

set_default_context(tags={"service": "gateway", "environment": "prod"})
set_context(
    route="/api/chat",
    feature="chatbot",
    user_id="user_123",
    session_id="sess_456",
    tags={"request_id": "req_001"},
)
```

Resolution rules:
- Runtime context overrides default context for scalar fields
- Tags merge as `default_tags → wrapper tags → runtime context tags`
- If `user_id_hash` is absent but `user_id` exists, SDK emits SHA-256 hash of `user_id`

---

## Architecture Deep Dive

### Overview

The SDK is organized into four layers:

1. **Public API** (`client.py`) — `TokenLens` facade, env bootstrap, provider client creation
2. **Provider Wrappers** (`providers/`) — Thin transparent wrappers around OpenAI and Anthropic clients
3. **Event Model** (`models.py`) — `TelemetryEvent` dataclass, normalization, serialization
4. **Delivery Core** (`delivery/`, `spool/`) — Background worker, HTTP sender, SQLite spool

```mermaid
flowchart TB
    subgraph L1["Public API (TokenLens)"]
        A1["wrap_openai(), wrap()"]
        A2["flush(), close()"]
        A3["init_from_env()"]
    end

    subgraph L2["Provider Wrappers"]
        B1["OpenAI: responses, chat, images,<br/>embeddings, batches, audio,<br/>moderations, threads, files,<br/>fine_tuning, uploads, vector_stores"]
        B2["Anthropic: messages.create"]
    end

    subgraph L3["TelemetryEvent (models.py)"]
        C1["Metadata extraction"]
        C2["Usage normalization"]
        C3["Safe metadata filtering"]
    end

    subgraph L4["Delivery Core"]
        D1["Spool: SQLite / Memory"]
        D2["Worker: lease → send → ack/release"]
        D3["Sender: HTTP / batch split / retry"]
    end

    L1 --> L2
    L2 --> L3
    L3 --> L4
```

### Event Lifecycle

An event flows through the SDK in six stages:

```mermaid
flowchart LR
    A["Provider Call"] --> B["Wrap"]
    B --> C["Extract"]
    C --> D["Enqueue"]
    D --> E["Spool"]
    E --> F["Worker"]
    F --> G["Sender"]
    G --> H["Server"]
```

#### Stage 1: Provider Call

User calls `client.responses.create(...)`. The wrapper intercepts the call before it reaches the provider SDK.

#### Stage 2: Wrap

The wrapper:
1. Records `request_started_at`
2. Calls the original provider method
3. Records `response_completed_at`
4. Computes `latency_ms`
5. Extracts usage metadata from the response object
6. Builds a `TelemetryEvent` with all metadata fields

#### Stage 3: Extract

Usage extraction is provider-specific:

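- **OpenAI chat.completions**: reads `response.usage` (`prompt_tokens`, `completion_tokens`, `total_tokens`, plus the nested token-detail fields) and normalizes it to `input_tokens`, `output_tokens`, `total_tokens`, `cached_input_tokens`, `reasoning_tokens`, etc.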
- **OpenAI batches**: No `usage` field. Uses `request_counts.total` → `total_tokens`, `completed` → `input_tokens`, `failed` → `output_tokens` as proxy.
- **Anthropic messages**: `response.usage.input_tokens`, `output_tokens`
- **Operations without usage**: `audio.*`, `files.*`, `uploads.*`, `vector_stores.*`, `moderations.create` — return empty `UsageSnapshot`
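
The provider-specific branches above could look roughly like the sketch below. It uses plain dicts instead of provider response objects, and the `UsageSnapshot` fields and function names are assumptions (only the class name appears in this document). Summing input and output tokens for Anthropic is also an assumption, since the Anthropic response carries no total.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageSnapshot:  # hypothetical field set
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    total_tokens: Optional[int] = None


def usage_from_anthropic(usage: dict) -> UsageSnapshot:
    inp, out = usage.get("input_tokens"), usage.get("output_tokens")
    # Assumption: total is derived locally when both parts are present.
    total = inp + out if inp is not None and out is not None else None
    return UsageSnapshot(inp, out, total)


def usage_from_batch(request_counts: dict) -> UsageSnapshot:
    # Proxy mapping described above: counts of requests stand in for tokens.
    return UsageSnapshot(
        input_tokens=request_counts.get("completed"),
        output_tokens=request_counts.get("failed"),
        total_tokens=request_counts.get("total"),
    )


def usage_for_operation(operation: str, payload: dict) -> UsageSnapshot:
    if operation == "messages.create":
        return usage_from_anthropic(payload.get("usage") or {})
    if operation.startswith("batches."):
        return usage_from_batch(payload.get("request_counts") or {})
    return UsageSnapshot()  # operations without usage yield an empty snapshot


print(usage_for_operation("messages.create", {"usage": {"input_tokens": 12, "output_tokens": 30}}))
print(usage_for_operation("batches.retrieve", {"request_counts": {"total": 10, "completed": 8, "failed": 2}}))
print(usage_for_operation("files.list", {}))
```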

#### Stage 4: Enqueue

`wrapper.py` → `client._state.enqueue(event)` → `spool.append(event)`

- If `enabled=False`, event is silently dropped
- If `spool.append()` raises, exception is caught and swallowed (fail-open)
- On successful append, `worker.notify_event_available()` wakes the background thread

#### Stage 5: Spool

The spool is the durability boundary:

**`durable_local` (SQLiteEventSpool)**:
- Serializes event to JSON
- Inserts into `tokenome_event_spool` table with `status='pending'`
- WAL mode (`journal_mode=WAL`, `synchronous=NORMAL`)
- Prunes expired events on every append
- Enforces `max_bytes` capacity with `drop_policy` (`drop_oldest`, `drop_newest`, `block`)

**`best_effort` (MemorySpool)**:
- Stores event in `_pending` list
- Drops new events when `_max_size` reached
- No persistence across process restarts
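
To make the durability boundary concrete, here is a small stand-in for a SQLite-backed spool, written with the standard `sqlite3` module. `TinySpool` and its column layout are hypothetical; only the table name, the `pending` status, and the two pragmas come from the description above. Pruning, capacity limits, and drop policies are left out.

```python
import json
import os
import sqlite3
import tempfile
import threading
import time


class TinySpool:
    """Illustrative stand-in, not the SDK's SQLiteEventSpool."""

    def __init__(self, path):
        self._lock = threading.Lock()  # serializes access to the shared connection
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._conn.execute("PRAGMA journal_mode=WAL")
        self._conn.execute("PRAGMA synchronous=NORMAL")
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS tokenome_event_spool ("
            "event_id TEXT PRIMARY KEY, payload TEXT NOT NULL, "
            "status TEXT NOT NULL, created_at REAL NOT NULL)"
        )
        self._conn.commit()

    def append(self, event):
        with self._lock:
            self._conn.execute(
                "INSERT INTO tokenome_event_spool VALUES (?, ?, 'pending', ?)",
                (event["event_id"], json.dumps(event), time.time()),
            )
            self._conn.commit()

    def pending_count(self):
        with self._lock:
            row = self._conn.execute(
                "SELECT COUNT(*) FROM tokenome_event_spool WHERE status = 'pending'"
            ).fetchone()
            return row[0]

    def close(self):
        with self._lock:
            self._conn.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.sqlite3")
    spool = TinySpool(path)
    spool.append({"event_id": "evt_1", "model": "gpt-4.1-mini"})
    spool.close()  # simulate the process going away

    reopened = TinySpool(path)  # a later process opens the same file
    survived = reopened.pending_count()
    reopened.close()

print(survived)
```

Reopening the file and still finding the pending row is the property that separates `durable_local` from `best_effort`, whose in-memory list would be empty after a restart.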

#### Stage 6: Worker + Sender

The `DeliveryWorker` runs in a daemon thread named `tokenome-delivery`.

**Worker Loop** (`_run()`):
```python
while not stopped:
    _loop_iteration()
```

**Loop Iteration** (`_loop_iteration()`):

1. **Compute sleep timeout**: If timer is running, sleep until deadline; otherwise sleep indefinitely until woken.
2. **Wait on wake queue**: `queue.Queue(maxsize=1)` — `notify_event_available()` puts a sentinel. Manual `flush()` and `close()` also put sentinels.
3. **Lease batch**: `spool.lease_batch(limit=100, flush_mode=...)`
   - `flush_mode=True` (manual flush): ignores `next_attempt_at` backoff
   - `flush_mode=False` (normal): only leases events whose `next_attempt_at <= now`
4. **Timer management**: On first event, start interval timer (`5s` with 80-120% jitter). Timer does NOT reset on every batch.
5. **Early return**: If batch size < 100 and timer hasn't fired and not force-flush, release events back to spool and sleep.
6. **Send**: Call `sender.send(events)`.
7. **Handle result**:
   - `acked` → `spool.mark_delivered()` (deletes from SQLite)
   - `retryable_ids` → `spool.release()` with exponential backoff (`delay = min(2^attempt_count, 300s)`)
   - `dropped` → `spool.drop()` (sets `status='failed'` or deletes)
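
The early-return rule in step 5 reduces to a small predicate. The sketch below is an assumption about how the three conditions combine; `should_send` is a hypothetical name, and treating an empty lease as "nothing to send" is an added assumption.

```python
MAX_EVENTS_PER_BATCH = 100


def should_send(batch_size, timer_fired, force_flush, max_events=MAX_EVENTS_PER_BATCH):
    """Return True when a leased batch should go to the sender now."""
    if batch_size == 0:
        return False  # assumed: an empty lease is never sent
    return force_flush or timer_fired or batch_size >= max_events


print(should_send(100, timer_fired=False, force_flush=False))  # full batch
print(should_send(3, timer_fired=False, force_flush=False))    # release and sleep
print(should_send(3, timer_fired=True, force_flush=False))     # interval elapsed
```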

**Sender** (`HttpSender`):

1. **Payload-size splitting**: `_split_batches()` serializes each event and splits the batch into sub-batches that each fit within `256 KiB`. An event that exceeds the cap on its own is still sent; the server returns `413` and the sender marks it as dropped.
2. **HTTP POST**: `httpx.Client` with HTTP/2, `max_keepalive_connections=2`, `max_connections=4`, `timeout=1.5s`.
3. **Response handling**:
   - `200/202` → acked
   - `429` → retryable (respects `Retry-After`)
   - `500/502/503/504` → retryable
   - `400/401/403/413` → dropped (fatal client error)
   - Network errors / timeout → retryable
4. **Composite result**: If a batch was split into sub-batches with mixed results, `SendResult` combines them.
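
The response-handling table can be expressed as a lookup. This is a sketch with hypothetical names (`classify_status`, `retry_after_seconds`). Two points are assumptions: unlisted status codes are treated as dropped, because the list above does not cover them, and only the delta-seconds form of `Retry-After` is parsed, with a fallback delay for anything else (including the HTTP-date form).

```python
ACKED = {200, 202}
RETRYABLE = {429, 500, 502, 503, 504}
DROPPED = {400, 401, 403, 413}


def classify_status(status_code):
    """Map an HTTP status code to a delivery outcome."""
    if status_code in ACKED:
        return "acked"
    if status_code in RETRYABLE:
        return "retryable"
    # Assumption: listed fatal codes and any unlisted code are both dropped.
    return "dropped"


def retry_after_seconds(header_value, default=1.0):
    """Parse a delta-seconds Retry-After value; fall back to `default` otherwise."""
    try:
        return max(0.0, float(header_value))
    except (TypeError, ValueError):
        return default


for code in (202, 429, 503, 413):
    print(code, classify_status(code))
print(retry_after_seconds("3"))
```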

**Lease Lifecycle**:

```mermaid
stateDiagram-v2
    [*] --> pending
    pending --> sending : lease
    sending --> delivered : success
    sending --> pending : retryable (backoff)
    sending --> failed : fatal
    failed --> [*]
    delivered --> [*]
```

**Crash Recovery** (`recover_inflight()`):

On spool initialization, any `status='sending'` rows with `leased_at IS NULL OR leased_at < now-5min` are reset to `pending`. Fresh leases (within 5 minutes) are left as `sending` to avoid races with a still-running worker.
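
A sketch of that recovery query, under the assumption that `leased_at` is stored as epoch seconds in a `REAL` column. The reduced table layout here is illustrative, and clearing `leased_at` on recovery is an added assumption.

```python
import sqlite3
import time

LEASE_TIMEOUT_SECONDS = 5 * 60


def recover_inflight(conn, now=None):
    """Reset stale or lease-less 'sending' rows to 'pending'; return the row count."""
    now = time.time() if now is None else now
    cutoff = now - LEASE_TIMEOUT_SECONDS
    cur = conn.execute(
        "UPDATE tokenome_event_spool SET status = 'pending', leased_at = NULL "
        "WHERE status = 'sending' AND (leased_at IS NULL OR leased_at < ?)",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tokenome_event_spool "
    "(event_id TEXT PRIMARY KEY, status TEXT, leased_at REAL)"
)
now = 1_000_000.0
conn.executemany(
    "INSERT INTO tokenome_event_spool VALUES (?, ?, ?)",
    [
        ("evt_stale", "sending", now - 600),  # leased 10 minutes ago
        ("evt_fresh", "sending", now - 60),   # leased 1 minute ago
        ("evt_no_lease", "sending", None),    # lease timestamp missing
    ],
)
recovered = recover_inflight(conn, now=now)
statuses = dict(conn.execute("SELECT event_id, status FROM tokenome_event_spool"))
print(recovered, statuses)
```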

### Durability Modes

| Mode | Persistence | Crash Recovery | Use Case |
|:---|:---|:---|:---|
| `durable_local` | SQLite WAL spool | Yes — events survive process restart | Production default |
| `best_effort` | In-memory list | No — events lost on crash | Development, low-latency requirements |
| `agent` | Future — remote agent | Future | Not implemented |

### Threading Model

```mermaid
sequenceDiagram
    participant MT as Main Thread
    participant BT as Background Thread (daemon)

    MT->>BT: wrap() → enqueue()
    MT->>BT: spool.append()
    MT->>BT: wake.put(None)
    BT->>BT: _wake.get()
    BT->>BT: lease_batch()
    BT->>BT: _send_batch()

    MT->>BT: flush()
    BT->>BT: force_flush=True
    BT->>BT: lease_batch(flush_mode=True)

    MT->>BT: close()
    BT->>BT: _stop.set()
    BT->>BT: final _do_flush()
    BT->>BT: thread.join()
```

All spool operations are protected by a `threading.Lock`. The SQLite connection is created with `check_same_thread=False` so that it can be shared between the main thread and the background thread, with the lock serializing access.

### Jitter and Thundering Herd Prevention

The flush interval (`5s`) is multiplied by `0.8 + random.random() * 0.4` on worker startup. This means each SDK instance flushes on a slightly different cadence, preventing synchronized spikes against the ingest server.
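
As a quick check of the range this produces, the sketch below applies the same multiplier to the `5s` base. The function name is illustrative.

```python
import random

BASE_FLUSH_INTERVAL_SECONDS = 5.0


def jittered_interval(base=BASE_FLUSH_INTERVAL_SECONDS, rng=random):
    """Scale the base interval by a factor drawn from roughly 0.8 to 1.2."""
    return base * (0.8 + rng.random() * 0.4)


samples = [jittered_interval() for _ in range(1000)]
print(min(samples), max(samples))
```

Every sample falls between 4 and 6 seconds, so instances started at the same moment drift apart after their first interval.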

### Payload Size Enforcement

Payload size is enforced at two levels:

1. **Sender split**: `HttpSender._split_batches()` ensures each HTTP POST body is ≤ `256 KiB`. This is the hard boundary.
2. **Worker batch size**: `DeliveryWorker` leases up to `100` events. The worker does not enforce payload size; it relies on the sender to split oversized batches.

This separation keeps the worker simple (count-based only) while ensuring the sender never violates server limits.
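
One way the sender-side split could work is a greedy pass over serialized event sizes. This is a sketch, not `HttpSender._split_batches()` itself: `split_batches` is a hypothetical function, and it ignores the bytes added by the batch envelope and JSON separators between events, which a real implementation would need to budget for.

```python
import json

MAX_PAYLOAD_BYTES = 256 * 1024


def split_batches(events, max_bytes=MAX_PAYLOAD_BYTES):
    """Greedily group events so each group's serialized size stays within max_bytes."""
    batches, current, current_size = [], [], 0
    for event in events:
        size = len(json.dumps(event, separators=(",", ":")).encode("utf-8"))
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        # An event larger than max_bytes ends up alone in its own batch.
        current.append(event)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Padding in `tags` only exists to make each test event about 100 KiB.
events = [{"event_id": f"evt_{i}", "tags": {"pad": "x" * (100 * 1024)}} for i in range(5)]
batches = split_batches(events)
print([len(batch) for batch in batches])
```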

### Retry and Backoff

Exponential backoff is computed in `SQLiteEventSpool.release()`:

```python
delay = min(2 ** attempt_count, 300)  # cap at 5 minutes
next_attempt_at = now + delay
```

The worker's `lease_batch()` respects `next_attempt_at` unless `flush_mode=True` (manual flush bypasses backoff).
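
For a sense of scale, the sketch below tabulates the formula for the first ten attempts. Whether `attempt_count` starts at 0 or 1 is not stated here, so the starting value of 1 is an assumption.

```python
def backoff_delay(attempt_count, cap_seconds=300):
    """Exponential backoff in seconds, capped at cap_seconds."""
    return min(2 ** attempt_count, cap_seconds)


schedule = [backoff_delay(n) for n in range(1, 11)]
print(schedule)  # [2, 4, 8, 16, 32, 64, 128, 256, 300, 300]
```

The cap is reached on the ninth attempt, after which every retry waits the full five minutes.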

### Fail-Open Guarantees

The SDK guarantees that telemetry failures never propagate to user code:

1. **Wrapper level**: Exceptions during metadata extraction are caught; the original provider response is still returned.
2. **Enqueue level**: `spool.append()` exceptions are caught and swallowed.
3. **Worker level**: Loop iteration exceptions are caught, logged at `debug`, and the worker sleeps `0.1s` before retrying.
4. **Sender level**: All HTTP/network exceptions are caught and converted to `retryable` or `dropped` results.

### Configuration Boundaries

Batch size (`100`), flush interval (`5s`), and payload cap (`256 KiB`) are **not user-configurable**. They are part of the SDK/server contract. Server-side enforcement is the real trust boundary; the SDK hardcodes these values to cooperate with the server, not as a security control.

User-configurable parameters:
- `api_key`, `endpoint`, `project_id`, `environment`
- `enabled`, `debug`
- `timeout_seconds`, `queue_maxsize`
- `durability`, `spool_path`, `max_spool_bytes`, `max_spool_age_days`
- `drop_policy`, `default_tags`

## Unsupported Patterns

Not supported in this SDK version:
- OpenAI `1.x`
- Old module-level `openai==0.28` APIs
- Monkey-patch instrumentation
- Framework middleware as primary integration path
- Automatic prompt/content capture
- Arbitrary provider clients without supported wrapper shape
