Metadata-Version: 2.4
Name: tollgateai
Version: 0.2.1
Summary: Track real LLM model usage and compute live gross margin with Tollgate.
Project-URL: Homepage, https://tollgateai.vercel.app
Author: Tollgate
License: Proprietary
Keywords: anthropic,cost,llm,margin,observability,openai,tokens,tollgate
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.8
Description-Content-Type: text/markdown

# tollgateai (Python SDK)

Track **real** LLM model usage and compute live gross margin with
[Tollgate](https://tollgateai.vercel.app). The SDK reads the actual usage off
each provider response — you never hand-count tokens. Zero dependencies.

Published on PyPI: [tollgateai](https://pypi.org/project/tollgateai/) (v0.2.1).

Works with **OpenAI**, **Anthropic**, **AWS Bedrock**, and **every OpenAI-compatible
gateway** (OpenRouter, Groq, Together, Nebius, local vLLM, …) — streaming and
non-streaming. Cost is computed server-side from the token counts the wrappers
capture, so no provider has to return a dollar figure.

```bash
pip install tollgateai
```

Create an API key in **Tollgate → Integrations**, then set:

```bash
export TOLLGATE_API_KEY=tg_live_xxx
# optional, defaults to the hosted app:
export TOLLGATE_BASE_URL=https://tollgateai.vercel.app
```
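
The SDK reads these variables itself. If you also want your app to fail fast at startup when the key is missing, a minimal sketch might look like the following. `load_tollgate_config` is a hypothetical helper written for this example, not part of the SDK.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://tollgateai.vercel.app"  # hosted app, as in the export above


def load_tollgate_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: validate the Tollgate env vars before the app starts."""
    env = os.environ if env is None else env
    api_key = env.get("TOLLGATE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("TOLLGATE_API_KEY is not set")
    # Fall back to the hosted app when no override is given.
    base_url = env.get("TOLLGATE_BASE_URL", "").strip() or DEFAULT_BASE_URL
    return {"api_key": api_key, "base_url": base_url.rstrip("/")}


# Pass a mapping explicitly here so the example runs without a real key.
config = load_tollgate_config({"TOLLGATE_API_KEY": "tg_live_xxx"})
print(config["base_url"])
```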

## Auto-instrumentation (recommended)

Wrap your provider client once; every call reports real usage in the background.

### Anthropic

```python
from anthropic import Anthropic
from tollgate import create_tollgate_client, wrap_anthropic

tollgate = create_tollgate_client()  # reads TOLLGATE_API_KEY

# Pin a run_id so every call in this run is grouped. Wrapped calls report cost
# only; revenue is booked later by resolve().
run_id = "ticket_8842"
anthropic = wrap_anthropic(
    Anthropic(), tollgate,
    customer_id="cust_A",     # your end customer
    run_id=run_id,
)

# Use the client normally — usage is tracked automatically.
anthropic.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[{"role": "user", "content": "Resolve this ticket…"}],
)

# Book revenue once, when the run finishes — "no outcome, no charge".
tollgate.resolve(
    run_id=run_id,
    customer_id="cust_A",
    outcome="resolved",       # "resolved" | "escalated" | "failed"
    revenue_unit_cents=50,    # charge for this resolved unit ($0.50)
)
```

### Outcome-based pricing

Under per-resolution / outcome pricing, only a **resolved** run earns revenue —
an `escalated`/`failed` run earns $0 but its provider cost still counts against
you. Wrap your client to meter cost on every call, then call `resolve()` once at
the end of the run to book the outcome. For simple per-call billing you can
instead pass `revenue_unit_cents` in the wrap options and skip `resolve()`.
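
To make the arithmetic concrete, here is a small sketch of how outcome pricing affects gross margin. It uses the standard formula, (revenue - cost) / revenue; the `Run` class and the cost figures are invented for illustration and say nothing about how Tollgate computes margin internally.

```python
from dataclasses import dataclass


@dataclass
class Run:
    outcome: str                 # "resolved" | "escalated" | "failed"
    provider_cost_cents: float   # metered on every call, whatever the outcome
    price_cents: float = 50.0    # charged only when the run resolves


def revenue_cents(run: Run) -> float:
    # Only a resolved run earns revenue; everything else earns nothing.
    return run.price_cents if run.outcome == "resolved" else 0.0


def gross_margin(runs):
    revenue = sum(revenue_cents(r) for r in runs)
    cost = sum(r.provider_cost_cents for r in runs)
    # Margin is undefined when there is no revenue at all.
    margin_pct = (revenue - cost) / revenue * 100 if revenue else None
    return revenue, cost, margin_pct


runs = [
    Run("resolved", 12.0),
    Run("escalated", 9.0),   # no revenue, but the 9 cents still count as cost
    Run("failed", 4.0),
    Run("resolved", 15.0),
]
revenue, cost, margin_pct = gross_margin(runs)
print(revenue, cost, margin_pct)  # 100.0 40.0 60.0
```

Without the two unresolved runs, the same resolved runs would show a 73% margin, which is why the cost of failed work has to be metered too.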

### OpenAI

```python
from openai import OpenAI
from tollgate import create_tollgate_client, wrap_openai

tollgate = create_tollgate_client()
openai = wrap_openai(OpenAI(), tollgate, customer_id="cust_A")

openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```

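`revenue_unit_cents` can also be a callable that receives the provider response
and returns the charge in cents, e.g.
`revenue_unit_cents=lambda res: 50 if res.something else 0`, where
`res.something` stands for whatever condition you bill on.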
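
As one possible shape for such a callable, the sketch below charges only when the model finished normally. Billing on `finish_reason` is an assumption chosen for illustration, and the `SimpleNamespace` objects are stand-ins shaped like an OpenAI chat completion so the example runs without a network call.

```python
from types import SimpleNamespace


def revenue_for(response) -> int:
    """Charge 50 cents only when the first choice finished with "stop"."""
    finish_reason = response.choices[0].finish_reason
    return 50 if finish_reason == "stop" else 0


# Stand-in responses, for illustration only.
finished = SimpleNamespace(choices=[SimpleNamespace(finish_reason="stop")])
truncated = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])

print(revenue_for(finished), revenue_for(truncated))  # 50 0
```

You would then pass it when wrapping: `wrap_openai(OpenAI(), tollgate, customer_id="cust_A", revenue_unit_cents=revenue_for)`.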

### OpenAI-compatible gateways

Point the OpenAI SDK at any compatible endpoint and pass
`provider="openai_compatible"`:

```python
# Continues the OpenAI example above; GROQ_KEY holds your Groq API key.
openai = OpenAI(api_key=GROQ_KEY, base_url="https://api.groq.com/openai/v1")
client = wrap_openai(
    openai, tollgate, customer_id="cust_A", provider="openai_compatible"
)
client.chat.completions.create(model="llama-3.3-70b-versatile", messages=[...])
```

### Streaming

Streaming is captured automatically. For **OpenAI and OpenAI-compatible**
clients, pass `stream_options={"include_usage": True}`; without it the stream
carries no final usage chunk for the wrapper to read. **Anthropic** needs no
flag. Iterate the stream as usual; usage is reported when the stream ends.
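
With `include_usage` set, the OpenAI API sends one extra chunk at the end of the stream whose `choices` list is empty and whose `usage` field is populated; earlier chunks carry `usage=None`. The sketch below shows how a pass-through wrapper could pick that up. It illustrates the idea only and is not the SDK's actual implementation; `passthrough_with_usage` and `fake_stream` are hypothetical names, and the chunks are stand-ins.

```python
from types import SimpleNamespace


def passthrough_with_usage(stream, on_usage):
    """Yield chunks unchanged and report usage once the stream is exhausted."""
    usage = None
    for chunk in stream:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        yield chunk
    if usage is not None:
        on_usage(usage)


def text_chunk(text):
    # Stand-in for a normal streamed chunk: one choice, no usage.
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)], usage=None)


def fake_stream():
    yield text_chunk("Hel")
    yield text_chunk("lo")
    # Final chunk: empty choices, usage filled in.
    usage = SimpleNamespace(prompt_tokens=9, completion_tokens=2)
    yield SimpleNamespace(choices=[], usage=usage)


reported = []
text = ""
for chunk in passthrough_with_usage(fake_stream(), reported.append):
    if chunk.choices:  # skip the usage-only chunk
        text += chunk.choices[0].delta.content or ""

print(text, reported[0].prompt_tokens, reported[0].completion_tokens)  # Hello 9 2
```

Note the `if chunk.choices` guard: code that indexes `chunk.choices[0]` unconditionally will raise on the final usage chunk.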

### AWS Bedrock

Wrap a boto3 `bedrock-runtime` client so `converse` / `converse_stream`
auto-report usage (the model id is read from the call):

```python
import boto3
from tollgate import create_tollgate_client, wrap_bedrock

tollgate = create_tollgate_client()  # reads TOLLGATE_API_KEY
bedrock = wrap_bedrock(
    boto3.client("bedrock-runtime", region_name="us-east-1"),
    tollgate,
    customer_id="cust_A",
)
bedrock.converse(modelId="anthropic.claude-3-5-sonnet-20241022-v2:0", messages=[...])
```

### Already have an exact cost?

Pass `provider_cost_cents` (a number or a callable of the response) and the server
uses it verbatim, skipping the rate card.
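
If your gateway reports a dollar cost on the response, a callable can convert it to cents. The sketch below assumes the figure sits on a field named `usage.cost`, in US dollars; that field name is hypothetical here, so check what your gateway actually returns.

```python
from decimal import Decimal
from types import SimpleNamespace


def cost_cents_from_response(response) -> float:
    """Convert a dollar cost on the response (hypothetical `usage.cost`) to cents."""
    # Go through Decimal so 0.0123 USD becomes exactly 1.23 cents.
    cost_usd = Decimal(str(response.usage.cost))
    return float(cost_usd * 100)


# Stand-in response, for illustration only.
response = SimpleNamespace(usage=SimpleNamespace(cost=0.0123))
print(cost_cents_from_response(response))  # 1.23
```

It would be passed as `provider_cost_cents=cost_cents_from_response` in the wrap options.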

## Set up customers & plans in code

Create a customer and assign its plan **before** sending usage, so plan-priced
revenue (especially `usage_based`, computed at ingest) is recognized from the
first event. The call is idempotent, so repeating it is safe.

```python
tollgate.upsert_customer(
    "cust_A", name="Acme", seats=5,
    plan={"name": "Usage", "pricingModel": "usage_based", "unitRevenueCents": 10},
)
```

## Manual tracking

For full control or unusual providers:

```python
from tollgate import create_tollgate_client

tollgate = create_tollgate_client()

tollgate.track({
    "customerId": "cust_A",
    "runId": "run_12345",
    "provider": "anthropic",
    "model": "claude-sonnet-4-6",
    "tokensIn": 1200,
    "tokensOut": 450,
    "reasoningTokens": 0,
    "cachedTokens": 0,
    "revenueUnitCents": 50,
    "idempotencyKey": "run_12345#step_1",  # exactly-once: safe to retry
})
```
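
The key only protects you if it is deterministic: the same logical step must produce the same key on every retry. A minimal sketch of that idea follows. `idempotency_key` and `InMemoryLedger` are hypothetical names, and the ledger is a toy stand-in for server-side dedupe, whose real behavior may differ.

```python
def idempotency_key(run_id: str, step: int) -> str:
    # Derived from stable identifiers only: no timestamps, no random parts.
    return f"{run_id}#step_{step}"


class InMemoryLedger:
    """Toy stand-in for the server: keeps the first event seen per key."""

    def __init__(self):
        self.events = {}

    def ingest(self, event: dict) -> bool:
        key = event["idempotencyKey"]
        if key in self.events:
            return False  # duplicate, ignored
        self.events[key] = event
        return True


ledger = InMemoryLedger()
event = {
    "runId": "run_12345",
    "tokensIn": 1200,
    "tokensOut": 450,
    "idempotencyKey": idempotency_key("run_12345", 1),
}
first = ledger.ingest(event)
retry = ledger.ingest(dict(event))  # e.g. resent after a timeout
total_in = sum(e["tokensIn"] for e in ledger.events.values())
print(first, retry, total_in)  # True False 1200
```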

## Notes

- **Idempotent.** Events dedupe on `idempotencyKey` (auto-set to the provider
  response id by the wrappers), so retries never double-count.
- **No prompt content is ever sent** — only token counts and metadata.
- **Streaming is auto-tracked** (OpenAI and OpenAI-compatible gateways need
  `stream_options={"include_usage": True}`).
- **Cost from tokens.** The server prices every event from token counts × a rate
  card that auto-syncs daily from the public LiteLLM registry — unknown models are
  priced at $0 and flagged in logs. See [docs/PRICING.md](../../docs/PRICING.md).
- **Non-blocking.** Auto-instrumented tracking runs on a background thread;
  failures go to `on_error` (default: log a warning) and never break your call.
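
The "cost from tokens" note can be sketched as a rate-card lookup. The rates below are made-up numbers, not real provider prices, and the sketch ignores reasoning and cached tokens, which a real rate card may price separately. The function name and rate-card layout are assumptions for illustration.

```python
import logging

logger = logging.getLogger("pricing_sketch")

# Hypothetical rates in USD per million tokens; not real provider prices.
RATE_CARD = {
    "example-model": {"input": 3.00, "output": 15.00},
}


def price_event_cents(model: str, tokens_in: int, tokens_out: int) -> float:
    rates = RATE_CARD.get(model)
    if rates is None:
        # Mirror the documented behavior: unknown models cost $0 and are flagged.
        logger.warning("no rate for model %r; pricing at $0", model)
        return 0.0
    usd = (tokens_in * rates["input"] + tokens_out * rates["output"]) / 1_000_000
    return round(usd * 100, 4)


print(price_event_cents("example-model", 1200, 450))
print(price_event_cents("unknown-model", 1200, 450))
```

Since an unknown model is priced at $0, its margin looks better than it is; the log flag is the signal to add a rate or pass `provider_cost_cents` explicitly.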

`wrap_*` accepts `customer_id`, `agent_id`, `run_id`, `revenue_unit_cents`,
`provider` (override; e.g. `"openai_compatible"`), `provider_cost_cents`, `on_error`.
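
To show what the non-blocking behavior and `on_error` mean in practice, here is a generic queue-and-worker sketch. It is not the SDK's code; `BackgroundReporter`, its `send` argument, and `flush` are hypothetical and exist only to illustrate the pattern.

```python
import logging
import queue
import threading

logger = logging.getLogger("tracking_sketch")


def default_on_error(exc):
    logger.warning("usage tracking failed: %s", exc)


class BackgroundReporter:
    """Send events on a daemon thread so the caller never waits or crashes."""

    def __init__(self, send, on_error=default_on_error):
        self._send = send
        self._on_error = on_error
        self._queue = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def report(self, event):
        self._queue.put(event)  # returns immediately

    def flush(self):
        self._queue.join()  # block until every queued event was handled

    def _worker(self):
        while True:
            event = self._queue.get()
            try:
                self._send(event)
            except Exception as exc:
                self._on_error(exc)  # failures are routed here, never re-raised
            finally:
                self._queue.task_done()


sent, errors = [], []


def flaky_send(event):
    if event.get("fail"):
        raise ConnectionError("network down")
    sent.append(event)


reporter = BackgroundReporter(flaky_send, on_error=errors.append)
reporter.report({"tokensIn": 10})
reporter.report({"fail": True})
reporter.flush()
print(len(sent), type(errors[0]).__name__)  # 1 ConnectionError
```

A custom `on_error` is the natural place to forward tracking failures to your own alerting instead of the default warning log.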

Licensed for use with Tollgate. Not open source.
