Metadata-Version: 2.4
Name: tollgateai
Version: 0.1.2
Summary: Track real LLM model usage and compute live gross margin with Tollgate.
Project-URL: Homepage, https://tollgateai.vercel.app
Author: Tollgate
License: Proprietary
Keywords: anthropic,cost,llm,margin,observability,openai,tokens,tollgate
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.8
Description-Content-Type: text/markdown

# tollgateai (Python SDK)

Track **real** LLM usage and compute live gross margin with
[Tollgate](https://tollgateai.vercel.app). The SDK reads the actual usage off
each provider response, so you never hand-count tokens. It has zero dependencies.

Published on PyPI: [tollgateai](https://pypi.org/project/tollgateai/) (v0.1.2).

```bash
pip install tollgateai
```

Create an API key in **Tollgate → Integrations**, then set:

```bash
export TOLLGATE_API_KEY=tg_live_xxx
# optional, defaults to the hosted app:
export TOLLGATE_BASE_URL=https://tollgateai.vercel.app
```

## Auto-instrumentation (recommended)

Wrap your provider client once; every call reports real usage in the background.

### Anthropic

```python
from anthropic import Anthropic
from tollgate import create_tollgate_client, wrap_anthropic

tollgate = create_tollgate_client()  # reads TOLLGATE_API_KEY

# Pin a run_id so every call in this run is grouped together. The wrapped calls
# report cost only; revenue is booked once by resolve() below.
run_id = "ticket_8842"
anthropic = wrap_anthropic(
    Anthropic(), tollgate,
    customer_id="cust_A",     # your end customer
    run_id=run_id,
)

# Use the client normally — usage is tracked automatically.
anthropic.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[{"role": "user", "content": "Resolve this ticket…"}],
)

# Book revenue once, when the run finishes — "no outcome, no charge".
tollgate.resolve(
    run_id=run_id,
    customer_id="cust_A",
    outcome="resolved",       # "resolved" | "escalated" | "failed"
    revenue_unit_cents=50,    # charge for this resolved unit ($0.50)
)
```

### Outcome-based pricing

Under per-resolution / outcome pricing, only a **resolved** run earns revenue —
an `escalated`/`failed` run earns $0 but its provider cost still counts against
you. Wrap your client to meter cost on every call, then call `resolve()` once at
the end of the run to book the outcome. For simple per-call billing you can
instead pass `revenue_unit_cents` in the wrap options and skip `resolve()`.
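
To make the arithmetic concrete, here is a minimal sketch of how margin behaves under outcome pricing. It does not call the SDK: the `Run` class, the cost figures, and `gross_margin` are hypothetical illustrations, and Tollgate's own computation may differ.

```python
from dataclasses import dataclass


@dataclass
class Run:
    outcome: str            # "resolved" | "escalated" | "failed"
    cost_cents: float       # provider cost metered across the run's calls (made-up numbers)
    price_cents: int = 50   # charge per resolved unit

    @property
    def revenue_cents(self) -> int:
        # Only a resolved run earns revenue; its cost counts either way.
        return self.price_cents if self.outcome == "resolved" else 0


def gross_margin(runs):
    """Return (revenue - cost) / revenue, or None when nothing was billed."""
    revenue = sum(r.revenue_cents for r in runs)
    cost = sum(r.cost_cents for r in runs)
    return (revenue - cost) / revenue if revenue else None


runs = [
    Run("resolved", 12.0),
    Run("resolved", 8.0),
    Run("escalated", 15.0),  # earns nothing, still costs 15 cents
    Run("failed", 5.0),      # earns nothing, still costs 5 cents
]
margin = gross_margin(runs)  # revenue 100, cost 40
```

In this made-up sample, half of the total cost comes from runs that earned nothing, which is why cost is metered on every call rather than only on resolved runs.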

### OpenAI

```python
from openai import OpenAI
from tollgate import create_tollgate_client, wrap_openai

tollgate = create_tollgate_client()
openai = wrap_openai(OpenAI(), tollgate, customer_id="cust_A")

openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```

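In the wrap options, `revenue_unit_cents` can also be a callable that receives
the provider response and returns the charge in cents, e.g.
`revenue_unit_cents=lambda res: 50 if res.something else 0`, where
`res.something` is a placeholder for whatever condition you bill on.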
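
As a sketch of what such a callable might look like, the function below charges only when an OpenAI chat completion finished normally. The billing rule and the name `revenue_for` are hypothetical; `choices[0].finish_reason` is the field OpenAI chat completions expose, and the `SimpleNamespace` objects are stand-ins for real responses so the example runs without network access.

```python
from types import SimpleNamespace


def revenue_for(res) -> int:
    """Return 50 cents when the first choice finished with "stop", else 0."""
    choices = getattr(res, "choices", None) or []
    if choices and choices[0].finish_reason == "stop":
        return 50
    return 0


# Stand-ins shaped like chat completion responses (illustration only).
done = SimpleNamespace(choices=[SimpleNamespace(finish_reason="stop")])
cut_off = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])

charges = [revenue_for(done), revenue_for(cut_off)]
```

A named function like this could then be passed as `revenue_unit_cents=revenue_for` in the wrap options, in place of the inline lambda.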

## Manual tracking

For providers without a wrapper (Bedrock, custom gateways), or when you want
full control, report events yourself:

```python
from tollgate import create_tollgate_client

tollgate = create_tollgate_client()

tollgate.track({
    "customerId": "cust_A",
    "runId": "run_12345",
    "provider": "anthropic",
    "model": "claude-sonnet-4-6",
    "tokensIn": 1200,
    "tokensOut": 450,
    "reasoningTokens": 0,
    "cachedTokens": 0,
    "revenueUnitCents": 50,
    "idempotencyKey": "run_12345#step_1",  # exactly-once: safe to retry
})
```
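
For Bedrock, one possible approach is to map the `usage` block of a Converse API response onto the event fields shown above. This is a sketch under assumptions: the helper name `bedrock_event` is hypothetical, the `"bedrock"` provider string is a guess at what Tollgate expects, and using the AWS request id as the idempotency key is a design choice rather than something this README specifies.

```python
def bedrock_event(response, customer_id, run_id, model_id, revenue_unit_cents=0):
    """Build a Tollgate event dict from a Bedrock Converse response dict."""
    usage = response.get("usage", {})
    request_id = response.get("ResponseMetadata", {}).get("RequestId")
    event = {
        "customerId": customer_id,
        "runId": run_id,
        "provider": "bedrock",          # assumed provider label
        "model": model_id,
        "tokensIn": usage.get("inputTokens", 0),
        "tokensOut": usage.get("outputTokens", 0),
        "reasoningTokens": 0,           # not derived in this sketch
        "cachedTokens": 0,              # not derived in this sketch
        "revenueUnitCents": revenue_unit_cents,
    }
    if request_id:
        # Reusing the request id keeps retries of the same report deduplicated.
        event["idempotencyKey"] = request_id
    return event


# Stand-in for the dict returned by boto3's bedrock-runtime converse() call.
fake_response = {
    "usage": {"inputTokens": 1200, "outputTokens": 450, "totalTokens": 1650},
    "ResponseMetadata": {"RequestId": "req-abc"},
}
event = bedrock_event(fake_response, "cust_A", "run_12345", "example-model-id")
```

The resulting dict has the same shape as the example above and could be handed to `tollgate.track(event)`.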

## Notes

- **Idempotent.** Events dedupe on `idempotencyKey` (auto-set to the provider
  response id by the wrappers), so retries never double-count.
- **No prompt content is ever sent** — only token counts and metadata.
- **Streaming** responses are not auto-tracked yet (the wrappers only report when
  a non-streaming `usage` is present). Track those manually for now.
- **Non-blocking.** Auto-instrumented tracking runs on a background thread;
  failures go to `on_error` (default: log a warning) and never break your call.
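
Until streaming is auto-tracked, usage can be collected by hand. The sketch below assumes an OpenAI chat completion stream requested with `stream_options={"include_usage": True}`, in which case the final chunk carries a `usage` object. The helper name `usage_event_from_stream` is hypothetical, and the fake chunks stand in for a real stream so the example runs offline.

```python
from types import SimpleNamespace


def usage_event_from_stream(chunks, customer_id, run_id, model):
    """Consume stream chunks and return a Tollgate event dict, or None if no usage arrived."""
    response_id, usage = None, None
    for chunk in chunks:
        response_id = response_id or getattr(chunk, "id", None)
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage  # present only on the final chunk
    if usage is None:
        return None
    return {
        "customerId": customer_id,
        "runId": run_id,
        "provider": "openai",
        "model": model,
        "tokensIn": usage.prompt_tokens,
        "tokensOut": usage.completion_tokens,
        "idempotencyKey": response_id,  # mirrors the wrappers' use of the response id
    }


# Fake stream: two content chunks, then a final chunk with usage.
fake_chunks = [
    SimpleNamespace(id="chatcmpl-1", usage=None),
    SimpleNamespace(id="chatcmpl-1", usage=None),
    SimpleNamespace(id="chatcmpl-1",
                    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30)),
]
stream_event = usage_event_from_stream(fake_chunks, "cust_A", "run_12345", "gpt-4o")
```

In a real application the same loop would also handle the streamed content; once the stream ends, a non-`None` result could be passed to `tollgate.track(...)`.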

Licensed for use with Tollgate. Not open source.
