Metadata-Version: 2.4
Name: tokendetective
Version: 0.3.3
Summary: Automatic token, latency & cost tracking for every AI call — OpenAI, Anthropic, Gemini, Ollama
Project-URL: Homepage, https://github.com/alumnx-ai-labs/TokenLens
Project-URL: Repository, https://github.com/alumnx-ai-labs/TokenLens
Author-email: TokenLens <nsandeep06595@gmail.com>
License: MIT
Keywords: ai,analytics,anthropic,cost,gemini,llm,observability,openai,tokens
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Requires-Dist: anthropic>=0.20.0
Requires-Dist: httpx>=0.24.0
Requires-Dist: openai>=1.0.0
Requires-Dist: python-dotenv>=1.0.0
Description-Content-Type: text/markdown

# tokendetective

**Automatic token, latency & cost tracking for every AI call — zero code changes.**

`tokendetective` wraps your existing OpenAI, Anthropic, Gemini, or Ollama client and silently logs every request to the [TokenLens](https://github.com/alumnx-ai-labs/TokenLens) dashboard. Token counts, latency, cost in USD and INR, and the full conversation trace all appear in real time — without touching your application logic.

```bash
pip install tokendetective
```

[![PyPI version](https://img.shields.io/pypi/v/tokendetective)](https://pypi.org/project/tokendetective/)
[![Python](https://img.shields.io/pypi/pyversions/tokendetective)](https://pypi.org/project/tokendetective/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

---

## Table of Contents

- [What It Does](#what-it-does)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Supported Providers](#supported-providers)
- [Usage Modes](#usage-modes)
- [Constructor Reference](#constructor-reference)
- [Method Reference](#method-reference)
- [Async Support](#async-support)
- [Manual Logging](#manual-logging)
- [Local Cost Utilities](#local-cost-utilities)
- [Supported Models & Pricing](#supported-models--pricing)
- [Error Handling](#error-handling)
- [REST API (non-Python)](#rest-api-non-python)

---

## What It Does

- Tracks **tokens in**, **tokens out**, **latency**, and **cost** for every LLM call
- Logs everything to your TokenLens backend — visible in the **Agent Runs** dashboard
- Works transparently — your existing calls are **unchanged**
- Sends logs from a **background thread** by default, so logging does not add to your call latency
- Supports **20+ models** with built-in USD → INR pricing
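
The background-thread behaviour listed above can be pictured with a queue and a daemon worker. This is a minimal sketch of the general fire-and-forget technique, not the SDK's actual implementation: `BackgroundLogger`, `send`, and `flush` are hypothetical names used only for illustration.

```python
import queue
import threading


class BackgroundLogger:
    """Hypothetical stand-in: hands log records to a daemon worker thread."""

    def __init__(self, send):
        self._send = send                      # callable that delivers one record
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, record):
        # Returns immediately; the worker thread does the slow delivery.
        self._queue.put(record)

    def _run(self):
        while True:
            record = self._queue.get()
            try:
                self._send(record)
            except Exception:
                # Assumed behaviour: a failed delivery must not crash the app.
                pass
            finally:
                self._queue.task_done()

    def flush(self):
        # Block until every queued record has been handled.
        self._queue.join()


delivered = []
logger = BackgroundLogger(send=delivered.append)
logger.log({"model": "gpt-4o-mini", "tokens_in": 512, "tokens_out": 128})
logger.flush()
print(delivered)
```

In this sketch the caller pays only for a `queue.put()`; the network round trip would happen on the worker thread.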

---

## Installation

```bash
pip install tokendetective
```

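Requires **Python 3.9+**. The `openai`, `anthropic`, `httpx`, and `python-dotenv` packages are installed as dependencies.

The distribution is published as `tokendetective`, but the import name is `tokenlens`.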

---

## Quick Start

```python
from tokenlens import TokenLens

tl = TokenLens(
    api_key    = "tl-your-api-key",   # from TokenLens dashboard → Settings → API Keys
    agent_name = "my-agent",
)

# Wrap once — use forever
client = tl.openai()

response = client.chat.completions.create(
    model    = "gpt-4o-mini",
    messages = [{"role": "user", "content": "What is the capital of France?"}],
)

print(response.choices[0].message.content)
# → Paris is the capital of France.

# Token counts, latency, and cost are now in your TokenLens dashboard.
```

That's it. No middleware, no decorators, no extra API calls in your code.

---

## Supported Providers

| Provider | Method | Notes |
|---|---|---|
| **OpenAI** | `tl.openai()` / `tl.async_openai()` | Requires `OPENAI_API_KEY` |
| **Anthropic** | `tl.anthropic()` / `tl.async_anthropic()` | Requires `ANTHROPIC_API_KEY` |
| **Ollama** (local) | `tl.ollama()` | Uses Ollama's OpenAI-compatible API |
| **Any client** | `tl.wrap(client)` | Custom base URLs, Azure, proxies |

---

## Usage Modes

### Mode 1 — Wrap a provider client

The most common mode. Your provider API key goes directly to the provider; `tokendetective` only reads each response to log token counts, latency, and cost.

```python
# OpenAI
client   = tl.openai()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Anthropic
claude = tl.anthropic()
msg    = claude.messages.create(
    model     = "claude-3-5-haiku-20241022",
    max_tokens= 256,
    messages  = [{"role": "user", "content": "Hello!"}],
)

# Ollama (local)
client   = tl.ollama()
response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
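
To make the wrapping idea concrete, here is a minimal sketch of a proxy that forwards a call, times it, and reads the usage figures from the response. It is an illustration under assumptions, not the SDK's code: `FakeCompletions` and `TrackedCompletions` are hypothetical, and the canned response only imitates the `usage.prompt_tokens` / `usage.completion_tokens` fields that OpenAI chat completions return.

```python
import time
from types import SimpleNamespace


class FakeCompletions:
    """Stand-in for a provider client; returns a canned response with usage."""

    def create(self, **kwargs):
        usage = SimpleNamespace(prompt_tokens=12, completion_tokens=5)
        return SimpleNamespace(model=kwargs["model"], usage=usage)


class TrackedCompletions:
    """Hypothetical wrapper: forwards the call, then records usage and latency."""

    def __init__(self, inner, records):
        self._inner = inner
        self._records = records

    def create(self, **kwargs):
        start = time.perf_counter()
        response = self._inner.create(**kwargs)    # arguments pass through untouched
        latency_ms = (time.perf_counter() - start) * 1000.0
        self._records.append({
            "model": response.model,
            "tokens_in": response.usage.prompt_tokens,
            "tokens_out": response.usage.completion_tokens,
            "latency_ms": latency_ms,
        })
        return response                            # caller gets the original object


records = []
tracked = TrackedCompletions(FakeCompletions(), records)
response = tracked.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(records[0]["tokens_in"], records[0]["tokens_out"])
```

Because the wrapper returns the provider's response unchanged, calling code does not need to know it is being tracked.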

### Mode 2 — Bring your own client

Already have a configured client? Wrap it directly:

```python
from openai import OpenAI

my_client = OpenAI(
    api_key  = "sk-...",
    base_url = "https://your-azure-endpoint.openai.azure.com/",
)
tracked = tl.wrap(my_client)
```

### Mode 3 — Manual logging

Have token counts from LangChain, LlamaIndex, or a custom HTTP client? Log manually:

```python
tl.log(
    model      = "gpt-4o-mini",
    tokens_in  = 512,
    tokens_out = 128,
    latency_ms = 340.5,
    query_text = "Summarise this document…",
)
```

---

## Constructor Reference

```python
TokenLens(
    api_key        : str,
    base_url       : str   = None,      # reads TOKENLENS_URL env var, falls back to https://13.126.130.56.nip.io
    application    : str   = "tokenlens-sdk",
    agent_name     : str   = "tokenlens-agent",
    background     : bool  = True,
    timeout        : float = 10.0,
    raise_on_error : bool  = False,
)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | required | Your `tl-` API key from the TokenLens dashboard |
| `base_url` | `str` | env / default | Backend URL. If omitted, reads the `TOKENLENS_URL` environment variable (for example from `.env`), then falls back to the hosted server |
| `application` | `str` | `"tokenlens-sdk"` | App label shown in the dashboard |
| `agent_name` | `str` | `"tokenlens-agent"` | Agent label shown in the Agent Runs page |
| `background` | `bool` | `True` | Fire-and-forget (non-blocking). Set `False` to block and return the log result |
| `timeout` | `float` | `10.0` | HTTP timeout in seconds for log requests |
| `raise_on_error` | `bool` | `False` | Raise `LoggingError` on failure instead of silently logging a warning |
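
The `base_url` lookup order described in the table can be sketched as follows. `resolve_base_url` is a hypothetical helper, and the assumption that an explicit argument takes priority over the environment variable is mine rather than something the SDK documents.

```python
import os

DEFAULT_URL = "https://13.126.130.56.nip.io"  # hosted fallback quoted in the signature above


def resolve_base_url(base_url=None, environ=None):
    """Hypothetical helper: explicit argument, then TOKENLENS_URL, then the default."""
    environ = os.environ if environ is None else environ
    return base_url or environ.get("TOKENLENS_URL") or DEFAULT_URL


print(resolve_base_url(environ={}))
print(resolve_base_url(environ={"TOKENLENS_URL": "http://localhost:8000"}))
print(resolve_base_url("https://logs.example.com", environ={"TOKENLENS_URL": "ignored"}))
```

If the variable lives in a `.env` file, calling `load_dotenv()` from `python-dotenv` before the lookup loads it into `os.environ`.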

---

## Method Reference

### `tl.openai(**kwargs)` / `tl.async_openai(**kwargs)`

Create a tracked OpenAI client. All kwargs are forwarded to `OpenAI()` (or `AsyncOpenAI()` for the async variant).

### `tl.anthropic(**kwargs)` / `tl.async_anthropic(**kwargs)`

Create a tracked Anthropic client.

### `tl.ollama(base_url=None, **kwargs)`

Create a tracked Ollama client. `base_url` defaults to the `OLLAMA_HOST` environment variable, or `http://localhost:11434` if it is unset.

### `tl.wrap(client)`

Wrap any existing provider client instance. Supports `openai.OpenAI`, `openai.AsyncOpenAI`, `anthropic.Anthropic`, `anthropic.AsyncAnthropic`.

### `tl.log(*, model, tokens_in, tokens_out, latency_ms, query_text=None, response_text=None, application=None)`

Manually log one AI request. Returns `None` in background mode, or a dict with the keys `usage_id`, `cost_usd`, and `cost_inr` when `background=False`.

### `await tl.alog(...)`

Async version of `log()`. Always awaits and returns the response dict.

---

## Async Support

```python
import asyncio
from tokenlens import TokenLens

tl = TokenLens(api_key="tl-...")

async def main():
    # Async OpenAI
    client   = tl.async_openai()
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

    # Async manual log
    result = await tl.alog(
        model="gpt-4o-mini", tokens_in=150, tokens_out=42, latency_ms=500.0
    )
    print(result)  # {"usage_id": "…", "cost_usd": 0.0000477, "cost_inr": 0.0040545}

asyncio.run(main())
```

---

## Manual Logging

Use `tl.log()` when you already have token counts from any source:

```python
# Fire-and-forget (default)
tl.log(
    model      = "my-custom-llm",
    tokens_in  = 1024,
    tokens_out = 256,
    latency_ms = 820.0,
    query_text = "User query here",
)

# Blocking — get cost back immediately
tl2 = TokenLens(api_key="tl-...", background=False)
result = tl2.log(model="gpt-4o-mini", tokens_in=150, tokens_out=42, latency_ms=500)
print(result)
# {"usage_id": "uuid…", "cost_usd": 0.0000477, "cost_inr": 0.0040545}
```
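
Manual logging needs a `latency_ms` value, which you have to measure yourself. The sketch below shows one way to time a call and collect keyword arguments shaped like the `tl.log()` signature in the Method Reference. `call_my_llm` and `timed_log_fields` are hypothetical stand-ins, and the token counts are canned values rather than real measurements.

```python
import time


def call_my_llm(prompt):
    """Stand-in for any client that reports its own token counts."""
    return {"text": "A short summary.", "tokens_in": 1024, "tokens_out": 256}


def timed_log_fields(model, prompt):
    """Run the call, time it, and return keyword arguments shaped for tl.log()."""
    start = time.perf_counter()
    result = call_my_llm(prompt)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return {
        "model": model,
        "tokens_in": result["tokens_in"],
        "tokens_out": result["tokens_out"],
        "latency_ms": latency_ms,
        "query_text": prompt,
        "response_text": result["text"],
    }


fields = timed_log_fields("my-custom-llm", "User query here")
print(sorted(fields))
# With a configured TokenLens instance: tl.log(**fields)
```

`time.perf_counter()` is used instead of `time.time()` because it is monotonic and intended for measuring short intervals.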

---

## Local Cost Utilities

Calculate cost locally without any network call:

```python
from tokenlens.pricing import compute_cost, list_models

cost = compute_cost("gpt-4o-mini", tokens_in=1500, tokens_out=420)
print(cost)
# {"usd": 0.000477, "inr": 0.040545}

# Custom exchange rate
cost = compute_cost("gpt-4o-mini", tokens_in=1500, tokens_out=420, usd_to_inr=84.5)

# Custom pricing for unlisted models
cost = compute_cost(
    "my-model",
    tokens_in  = 1000,
    tokens_out = 500,
    custom_pricing = {"input": 0.002 / 1_000_000, "output": 0.008 / 1_000_000},
)

# All supported models
print(list_models())
```

---

## Supported Models & Pricing

### OpenAI

| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| `gpt-4o` | $5.00 | $20.00 |
| `gpt-4o-mini` | $0.15 | $0.60 |
| `gpt-4-turbo` | $10.00 | $30.00 |
| `gpt-4` | $30.00 | $60.00 |
| `gpt-3.5-turbo` | $0.50 | $1.50 |
| `o1` | $15.00 | $60.00 |
| `o1-mini` | $3.00 | $12.00 |
| `o3` | $10.00 | $40.00 |
| `o3-mini` | $1.10 | $4.40 |

### Anthropic

| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| `claude-opus-4-8` | $15.00 | $75.00 |
| `claude-sonnet-4-6` | $3.00 | $15.00 |
| `claude-haiku-4-5-20251001` | $0.80 | $4.00 |
| `claude-3-5-sonnet-20241022` | $3.00 | $15.00 |
| `claude-3-5-haiku-20241022` | $0.80 | $4.00 |
| `claude-3-opus-20240229` | $15.00 | $75.00 |
| `claude-3-haiku-20240307` | $0.25 | $1.25 |

### Google Gemini

| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| `gemini-1.5-pro` | $1.25 | $5.00 |
| `gemini-1.5-flash` | $0.075 | $0.30 |
| `gemini-2.0-flash` | $0.10 | $0.40 |
| `gemini-2.0-flash-lite` | $0.075 | $0.30 |

Models not in the table fall back to a minimal default rate. Pass `custom_pricing` to `compute_cost()` for accurate local estimates.
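
The arithmetic behind the tables is simple enough to check by hand. For `gpt-4o-mini` with 1,500 input and 420 output tokens: 1500 × $0.15 / 1M + 420 × $0.60 / 1M = $0.000225 + $0.000252 = $0.000477, which matches the `compute_cost` sample in Local Cost Utilities. The sketch below reproduces that calculation. `estimate_cost` is a hypothetical helper, not part of the package, and the default rate of 85.0 INR per USD is inferred from that sample output (0.040545 / 0.000477) rather than documented.

```python
def estimate_cost(tokens_in, tokens_out, input_per_million, output_per_million,
                  usd_to_inr=85.0):
    """Hypothetical helper: cost from per-1M-token rates, in USD and INR."""
    usd = (tokens_in * input_per_million + tokens_out * output_per_million) / 1_000_000
    return {"usd": round(usd, 8), "inr": round(usd * usd_to_inr, 8)}


# gpt-4o-mini rates from the OpenAI table above
cost = estimate_cost(1500, 420, input_per_million=0.15, output_per_million=0.60)
print(cost)
```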

---

## Error Handling

The SDK never crashes your application by default:

```python
import logging
logging.basicConfig(level=logging.WARNING)

tl = TokenLens(api_key="tl-...", raise_on_error=False)  # default — safe
tl.log(model="gpt-4o-mini", tokens_in=100, tokens_out=50, latency_ms=500)
# If backend is unreachable: logs a warning, returns None — your app keeps running
```

Strict mode for tests:

```python
from tokenlens import TokenLens, LoggingError

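# background=False keeps the call in this thread, so the exception can reach the except block
tl = TokenLens(api_key="tl-...", background=False, raise_on_error=True)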
try:
    tl.log(model="gpt-4o-mini", tokens_in=100, tokens_out=50, latency_ms=500)
except LoggingError as e:
    print(f"Logging failed: {e}")
```

| Exception | When raised |
|---|---|
| `AuthError` | `api_key` is empty or does not start with `tl-` |
| `LoggingError` | Backend returned non-200, or connection timed out |
| `TokenLensError` | Base class for all SDK exceptions |
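
Because every SDK exception derives from `TokenLensError`, a single `except` clause can cover all of them. The sketch below illustrates that with stand-in classes that mirror the hierarchy in the table. The classes and the `check_api_key` helper are hypothetical and defined here only so the example runs without the package installed.

```python
class TokenLensError(Exception):
    """Stand-in for the SDK base class."""


class AuthError(TokenLensError):
    """Stand-in: api_key is empty or lacks the tl- prefix."""


class LoggingError(TokenLensError):
    """Stand-in: the backend rejected or never received the log."""


def check_api_key(api_key):
    """Hypothetical pre-flight check mirroring the AuthError rule in the table."""
    if not api_key or not api_key.startswith("tl-"):
        raise AuthError("api_key must be non-empty and start with 'tl-'")
    return api_key


results = []
for key in ["", "sk-wrong-prefix", "tl-your-api-key"]:
    try:
        check_api_key(key)
        results.append("ok")
    except TokenLensError as exc:      # the base class catches AuthError too
        results.append(type(exc).__name__)

print(results)
```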

---

## REST API (non-Python)

Use the backend directly from any language:

```bash
curl -X POST https://13.126.130.56.nip.io/v1/log \
  -H "Authorization: Bearer tl-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "application":   "my-app",
    "agent_name":    "summariser",
    "model_used":    "gpt-4o-mini",
    "tokens_in":     512,
    "tokens_out":    128,
    "latency_ms":    340.5,
    "query_text":    "Summarise this document…",
    "response_text": "Here is a summary…"
  }'
```

Response:

```json
{
  "usage_id":     "b3c1a9f2-…",
  "cost_usd":     0.0001536,
  "cost_inr":     0.013056,
  "total_tokens": 640,
  "model_found":  true
}
```
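
For scripts that cannot install the SDK, the same request can be assembled with the Python standard library. The sketch below only builds the request and does not send it. `build_log_request` is a hypothetical helper, and the URL, headers, and field names are copied from the curl example above.

```python
import json
import urllib.request


def build_log_request(api_key, payload, base_url="https://13.126.130.56.nip.io"):
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/log",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_log_request("tl-your-key", {
    "application": "my-app",
    "agent_name": "summariser",
    "model_used": "gpt-4o-mini",
    "tokens_in": 512,
    "tokens_out": 128,
    "latency_ms": 340.5,
    "query_text": "Summarise this document…",
    "response_text": "Here is a summary…",
})
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen()` with a `timeout` would send it; the response body is the JSON document shown above.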

---

## Links

- **Dashboard**: [TokenLens](https://github.com/alumnx-ai-labs/TokenLens)
- **PyPI**: [pypi.org/project/tokendetective](https://pypi.org/project/tokendetective/)
- **Issues**: [GitHub Issues](https://github.com/alumnx-ai-labs/TokenLens/issues)
