Metadata-Version: 2.4
Name: llm-token-guardian
Version: 0.1.1
Summary: Pre-call cost estimation, tracking, and budgets for OpenAI, Gemini, and Claude
Project-URL: Homepage, https://github.com/iamsaugatpandey/llm-token-guardian
Project-URL: Repository, https://github.com/iamsaugatpandey/llm-token-guardian
Project-URL: Issues, https://github.com/iamsaugatpandey/llm-token-guardian/issues
Author-email: Saugat Pandey <saugatpandey02@gmail.com>
License: MIT
Keywords: anthropic,budget,cost,gemini,llm,openai,tokens
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Requires-Dist: pydantic>=2.0.0
Requires-Dist: tiktoken>=0.7.0
Provides-Extra: all
Requires-Dist: anthropic>=0.36.0; extra == 'all'
Requires-Dist: google-genai>=0.8.0; extra == 'all'
Requires-Dist: openai>=1.0.0; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.36.0; extra == 'anthropic'
Provides-Extra: google
Requires-Dist: google-genai>=0.8.0; extra == 'google'
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == 'openai'
Description-Content-Type: text/markdown

# LLM Token Guardian

**Pre-call cost estimation, session budget tracking, and transparent cost reporting for OpenAI, Anthropic (Claude), and Google Gemini.**

Know what an API call will cost *before* you make it. Track cumulative spend across your session. Set soft or hard budgets. Works in Python scripts and Jupyter notebooks.

---

## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Usage Guide](#usage-guide)
  - [Wrapping your client](#wrapping-your-client)
  - [Reporting modes](#reporting-modes)
  - [Session tracking](#session-tracking)
  - [Budget control](#budget-control)
  - [Vision / image requests](#vision--image-requests)
  - [Jupyter notebook usage](#jupyter-notebook-usage)
- [Sample output](#sample-output)
- [Supported providers](#supported-providers)
- [Pricing source](#pricing-source)
- [Limitations](#limitations)
- [Alternative installation (wheel)](#alternative-installation-wheel)
- [Feedback & contributing](#feedback--contributing)

---

## Features

- **Pre-call cost table** — shows text tokens, image tokens (using official per-provider formulas), and max output cost before the call is made
- **Precise image token estimation** — OpenAI tile/patch formulas, Anthropic pixel formula, Gemini tile formula
- **Post-call actual cost** — tracks real token counts from the API response; reports per-call cost and cumulative session total after every call
- **Session budget** — set a USD limit; soft mode warns without blocking, strict mode raises an exception
- **Cumulative tracking** — share one `TokenTracker` across multiple clients to track spend across your entire session
- **Modality disclaimer** — warns when audio, video, or document content is detected (cost not computed for those)
- **Works everywhere** — plain `print()` output, compatible with Python scripts and Jupyter notebooks
- **Pricing from LiteLLM** — 395+ models loaded from the open-source [LiteLLM pricing JSON](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)

---

## Installation

```bash
# Base package (no provider SDK included)
pip install llm-token-guardian

# With a specific provider SDK
pip install "llm-token-guardian[openai]"
pip install "llm-token-guardian[anthropic]"
pip install "llm-token-guardian[google]"

# All providers
pip install "llm-token-guardian[all]"
```

> If `pip install` is unavailable in your environment, see [Alternative installation (wheel)](#alternative-installation-wheel).

---

## Quick Start

```python
import openai
from llm_token_guardian import TokenTracker, budget, wrap_openai_sync

tracker = TokenTracker()
client = wrap_openai_sync(openai.OpenAI(), tracker, reporting="both")

with budget(max_cost_usd=0.10, tracker=tracker, strict=False):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain LLM cost tracking."}],
        max_completion_tokens=128,
    )
    print(response.choices[0].message.content)

print(f"Session total: ${tracker.usage.total_cost_usd:.8f} USD")
```

---

## Usage Guide

### Wrapping your client

Wrap your existing provider client — no need to change how you call the API.

#### OpenAI

```python
import openai
from llm_token_guardian import TokenTracker, wrap_openai_sync

tracker = TokenTracker()
client = wrap_openai_sync(openai.OpenAI(), tracker)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    max_completion_tokens=64,
)
```

#### Anthropic (Claude)

```python
import anthropic
from llm_token_guardian import TokenTracker, wrap_anthropic_sync

tracker = TokenTracker()
client = wrap_anthropic_sync(anthropic.Anthropic(), tracker)

response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=64,
    messages=[{"role": "user", "content": "Hello!"}],
)
```

#### Google Gemini

```python
from google import genai
from llm_token_guardian import TokenTracker, wrap_gemini_sync

tracker = TokenTracker()
client = wrap_gemini_sync(genai.Client(api_key="..."), "gemini-2.0-flash", tracker)

response = client.generate_content("Hello!")
```

---

### Reporting modes

Pass `reporting=` to any `wrap_*` function to control output verbosity:

| Mode | Output |
| ---- | ------ |
| `"both"` | Pre-call estimate table + post-call actual cost *(default)* |
| `"pre"` | Pre-call estimate table only |
| `"post"` | Post-call actual cost only |
| `"none"` | Silent — no output at all |

```python
client = wrap_openai_sync(openai.OpenAI(), tracker, reporting="post")
```
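
One way to read the table above is that each mode toggles two independent switches, one for the pre-call table and one for the post-call summary. The sketch below illustrates that reading only; `ReportFlags` and `resolve_reporting` are hypothetical names, not part of the package, and the internal handling may differ.

```python
from typing import NamedTuple


class ReportFlags(NamedTuple):
    pre: bool   # print the pre-call estimate table
    post: bool  # print the post-call actual cost


# Hypothetical lookup mirroring the table above; not the package's internals.
_MODES = {
    "both": ReportFlags(pre=True, post=True),
    "pre": ReportFlags(pre=True, post=False),
    "post": ReportFlags(pre=False, post=True),
    "none": ReportFlags(pre=False, post=False),
}


def resolve_reporting(mode: str = "both") -> ReportFlags:
    """Translate a reporting mode string into two on/off switches."""
    try:
        return _MODES[mode]
    except KeyError:
        raise ValueError(f"unknown reporting mode: {mode!r}") from None


print(resolve_reporting("post"))
```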

---

### Session tracking

Pass the **same `TokenTracker` instance** to all wrapped clients to accumulate cost across all calls in a session. The post-call summary after every call shows both the per-call cost and the running session total:

```python
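import anthropic
import openai
from llm_token_guardian import TokenTracker, wrap_anthropic_sync, wrap_openai_sync

tracker = TokenTracker()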

openai_client  = wrap_openai_sync(openai.OpenAI(), tracker)
claude_client  = wrap_anthropic_sync(anthropic.Anthropic(), tracker)

openai_client.chat.completions.create(...)   # post-call shows: "Session: $X (1 call)"
claude_client.messages.create(...)           # post-call shows: "Session: $Y (2 calls)"

# Full summary at any time
print(f"Total spend  : ${tracker.usage.total_cost_usd:.8f} USD")
print(f"Total calls  : {tracker.usage.calls}")
print(f"Total tokens : {tracker.usage.total_tokens:,}")
```
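
The reason sharing works is that every wrapped client reports into the same object. The sketch below shows that idea with stand-in classes and no network calls. `UsageSketch` and `ClientSketch` are hypothetical; only the field names `total_cost_usd`, `total_tokens`, and `calls` come from the example above, and the token counts and costs are placeholder values.

```python
from dataclasses import dataclass


@dataclass
class UsageSketch:
    """Hypothetical running totals, named after the fields printed above."""
    total_cost_usd: float = 0.0
    total_tokens: int = 0
    calls: int = 0

    def record(self, input_tokens: int, output_tokens: int, cost_usd: float) -> None:
        self.total_cost_usd += cost_usd
        self.total_tokens += input_tokens + output_tokens
        self.calls += 1


class ClientSketch:
    """Stand-in for a wrapped client; it only reports usage to the shared totals."""

    def __init__(self, usage: UsageSketch) -> None:
        self.usage = usage

    def call(self, input_tokens: int, output_tokens: int, cost_usd: float) -> None:
        self.usage.record(input_tokens, output_tokens, cost_usd)


usage = UsageSketch()
first, second = ClientSketch(usage), ClientSketch(usage)  # same object, two clients
first.call(input_tokens=12, output_tokens=23, cost_usd=0.001)   # placeholder numbers
second.call(input_tokens=40, output_tokens=10, cost_usd=0.002)  # placeholder numbers
print(f"{usage.calls} calls, {usage.total_tokens} tokens, ${usage.total_cost_usd:.6f}")
```

Creating a separate tracker per client would instead give each client its own totals, which is why the same instance is passed to every `wrap_*` call.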

---

### Budget control

Use `budget()` as a context manager to set a spending limit.

```python
import openai
from llm_token_guardian import budget, TokenTracker, wrap_openai_sync

tracker = TokenTracker()
client = wrap_openai_sync(openai.OpenAI(), tracker)

# Soft mode — warn when budget is exceeded, but never block the call
with budget(max_cost_usd=0.05, tracker=tracker, strict=False):
    client.chat.completions.create(...)

# Strict mode — raise BudgetExceeded if the pre-call estimate exceeds remaining budget
with budget(max_cost_usd=0.05, tracker=tracker, strict=True):
    client.chat.completions.create(...)
```

The budget is **cumulative** — it subtracts the actual cost of each call, so the remaining budget shrinks as you make calls inside the context.
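
The interaction between the pre-call estimate, the actual cost, and the two modes can be sketched without any provider SDK. This is an illustration under stated assumptions, not the package's implementation: `BudgetSketch` and `BudgetExceededSketch` are hypothetical names, and the sketch assumes the soft-mode warning is issued at the pre-call check.

```python
class BudgetExceededSketch(Exception):
    """Hypothetical stand-in for the package's BudgetExceeded exception."""


class BudgetSketch:
    def __init__(self, max_cost_usd: float, strict: bool = False) -> None:
        self.max_cost_usd = max_cost_usd
        self.strict = strict
        self.spent_usd = 0.0

    @property
    def remaining_usd(self) -> float:
        return self.max_cost_usd - self.spent_usd

    def check(self, estimated_cost_usd: float) -> bool:
        """Pre-call check: return True if the estimate fits the remaining budget."""
        if estimated_cost_usd <= self.remaining_usd:
            return True
        if self.strict:
            raise BudgetExceededSketch(
                f"estimate ${estimated_cost_usd:.6f} exceeds "
                f"remaining ${self.remaining_usd:.6f}"
            )
        # Soft mode: warn, but let the caller proceed.
        print(f"warning: estimate exceeds remaining ${self.remaining_usd:.6f}", flush=True)
        return False

    def record(self, actual_cost_usd: float) -> None:
        """Post-call: subtract what the call actually cost."""
        self.spent_usd += actual_cost_usd


soft = BudgetSketch(max_cost_usd=0.05)
soft.check(0.02)    # fits: 0.05 remaining
soft.record(0.04)   # actual cost shrinks the remaining budget to about 0.01
soft.check(0.02)    # no longer fits: prints a warning, does not raise
```

With `strict=True`, the last `check` would raise instead of printing.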

---

### Vision / image requests

Image costs are estimated **before** the call using official per-provider token formulas:

| Provider | Formula |
| -------- | ------- |
| OpenAI `gpt-4o`, `gpt-4.1`, o-series | Tile-based: scale → 512px tiles × 170 tokens + 85 base |
| OpenAI `gpt-4.1-mini`, `gpt-4.1-nano`, `o4-mini` | Patch-based: 32px patches × per-model multiplier |
| Anthropic Claude | `ceil(width × height / 750)` tokens |
| Google Gemini | ≤384px both dims → 258 tokens; larger → `ceil(w/768) × ceil(h/768) × 258` |

Pass images the same way you normally would — the wrapper detects and measures them automatically:

```python
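import base64

# In each snippet below, `client` is the wrapped client for that provider.
# Read the image and base64-encode it for the OpenAI and Anthropic payloads.
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()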

# OpenAI — data URI in image_url block
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        {"type": "text", "text": "What is in this image?"},
    ]}],
    max_completion_tokens=64,
)

# Anthropic — base64 source block
client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=64,
    messages=[{"role": "user", "content": [
        {"type": "image", "source": {
            "type": "base64", "media_type": "image/jpeg", "data": image_b64,
        }},
        {"type": "text", "text": "What is in this image?"},
    ]}],
)

# Gemini — Part.from_bytes
from google.genai import types
client.generate_content([
    types.Part.from_bytes(data=open("photo.jpg", "rb").read(), mime_type="image/jpeg"),
    "What is in this image?",
])
```
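
The formulas in the table above can be written out as plain functions. This is a minimal sketch, not the package's code: the function names are hypothetical, and the OpenAI scaling steps (fit within 2048×2048, then shrink so the shortest side is at most 768px, never upscaling) are an assumption about what "scale" expands to. The patch-based OpenAI formula is not covered because its per-model multipliers are not listed here.

```python
import math


def openai_tile_tokens(width: int, height: int) -> int:
    """Tile-based estimate: 170 tokens per 512px tile plus 85 base tokens."""
    # Assumed scaling: fit within 2048x2048, then shrink so the shortest
    # side is at most 768px. Images are never scaled up in this sketch.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 170 * tiles + 85


def anthropic_image_tokens(width: int, height: int) -> int:
    """Pixel formula from the table: ceil(width * height / 750)."""
    return math.ceil(width * height / 750)


def gemini_image_tokens(width: int, height: int) -> int:
    """258 tokens if both sides are <= 384px, else 258 per 768px tile."""
    if width <= 384 and height <= 384:
        return 258
    return math.ceil(width / 768) * math.ceil(height / 768) * 258


print(openai_tile_tokens(1024, 1024))      # 765
print(anthropic_image_tokens(1000, 750))   # 1000
print(gemini_image_tokens(1024, 1024))     # 1032
```

The 765-token result for a 1024×1024 image agrees with the image row in the sample output.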

> **Unsupported modalities**: If audio, video, or PDF document content is detected, a warning is printed and the API call still proceeds. Those parts are left out of the cost estimate, which covers text and images only.

---

### Jupyter notebook usage

`llm-token-guardian` uses plain `print()` with `flush=True` and requires no display libraries. It works in Jupyter notebooks without any changes.

```python
# Jupyter notebook cell:
import openai
from llm_token_guardian import TokenTracker, budget, wrap_openai_sync

tracker = TokenTracker()
client = wrap_openai_sync(openai.OpenAI(), tracker, reporting="both")

with budget(max_cost_usd=0.10, tracker=tracker, strict=False):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is 2 + 2?"}],
        max_completion_tokens=32,
    )

print(response.choices[0].message.content)
```

The pre-call cost table and post-call summary print inline in the cell output.

---

## Sample output

```text
[Pre-call]  gpt-4o  (openai)
  Source      : https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json
  Prices as of: February 19, 2026
  Budget      : $0.099821 remaining of $0.100000 total

  Component                   Tokens      Cost (USD)
  ──────────────────────────────────────────────────
  Text input                      ~9      $0.00004500
  Image  (1024×1024 px)         ~765      $0.00382500
  Max output                      64      $0.00032000
  ──────────────────────────────────────────────────
  Estimated max total           ~838      $0.00419000

Response: A golden retriever sitting on a park bench.

[Post-call] gpt-4o
  This call   : $0.00187500 USD  (12 in + 23 out tokens)
  Session     : $0.00266000 USD  (2 calls total)
  Budget      : $0.097340 remaining of $0.100000 total
```
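
The "Estimated max total" row is the sum of the component rows above it. The snippet below reproduces that arithmetic from the sample values; it is only a check of the table, not a call into the package.

```python
# Rows copied from the sample pre-call table: (label, tokens, cost in USD).
rows = [
    ("Text input", 9, 0.00004500),
    ("Image (1024x1024 px)", 765, 0.00382500),
    ("Max output", 64, 0.00032000),
]

total_tokens = sum(tokens for _, tokens, _ in rows)
total_cost = sum(cost for _, _, cost in rows)

print(f"Estimated max total  ~{total_tokens}  ${total_cost:.8f}")
# Estimated max total  ~838  $0.00419000
```

Because the "Max output" row assumes the full output cap is used, the total is an upper bound; the post-call figure is normally lower.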

---

## Supported providers

| Provider | Models loaded | Wrapper |
| -------- | ------------- | ------- |
| OpenAI | 210+ (GPT-4o, GPT-4.1, o-series, …) | `wrap_openai_sync` |
| Anthropic | 31+ (Claude Haiku, Sonnet, Opus variants) | `wrap_anthropic_sync` |
| Google | 154+ (Gemini 2.0 Flash, 1.5 Pro/Flash, …) | `wrap_gemini_sync` |

List all available models and their prices:

```python
from llm_token_guardian import list_models

for name, price in list_models().items():
    print(f"{name:50s}  ${price.input_per_1k:.6f}/1K in   ${price.output_per_1k:.6f}/1K out")
```

Look up a specific model:

```python
from llm_token_guardian import get_price

p = get_price("gpt-4o")
print(f"Input : ${p.input_per_1k:.6f} / 1K tokens")
print(f"Output: ${p.output_per_1k:.6f} / 1K tokens")
print(f"Vision: {p.supports_vision}")
print(f"Max input tokens : {p.max_input_tokens:,}")
print(f"Max output tokens: {p.max_output_tokens:,}")
```

---

## Pricing source

All pricing data is loaded from the open-source LiteLLM pricing file:

**[model_prices_and_context_window.json](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)**

Bundled snapshot date: **February 19, 2026**

To refresh with the latest prices at runtime:

```python
from llm_token_guardian import refresh_pricing
refresh_pricing()  # downloads latest from GitHub
```
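
For orientation, the LiteLLM file is a JSON object keyed by model name, with per-token prices in fields such as `input_cost_per_token` and `output_cost_per_token`. The sketch below converts one entry of that shape into per-1K prices. The entry, its numbers, and the helper `per_1k_prices` are illustrative placeholders, and how the package itself maps these fields is an assumption.

```python
import json

# Illustrative entry in the shape used by the LiteLLM pricing file.
# The numbers are placeholders, not current prices.
SAMPLE = """
{
  "example-model": {
    "litellm_provider": "openai",
    "input_cost_per_token": 0.000002,
    "output_cost_per_token": 0.000008,
    "max_input_tokens": 128000,
    "max_output_tokens": 16384,
    "supports_vision": true
  }
}
"""


def per_1k_prices(raw: str) -> dict[str, tuple[float, float]]:
    """Map model name -> (input, output) price in USD per 1K tokens."""
    prices = {}
    for name, entry in json.loads(raw).items():
        # Skip entries that carry no per-token text pricing.
        if "input_cost_per_token" not in entry or "output_cost_per_token" not in entry:
            continue
        prices[name] = (
            entry["input_cost_per_token"] * 1000,
            entry["output_cost_per_token"] * 1000,
        )
    return prices


prices = per_1k_prices(SAMPLE)
for name, (inp, out) in prices.items():
    print(f"{name}: ${inp:.6f}/1K in, ${out:.6f}/1K out")
```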

---

## Limitations

1. **Text and image only** — cost estimation covers text and image inputs. If you pass audio, video, or document (PDF) content, a warning is displayed but no cost is computed for those modalities. The API call still proceeds normally.

2. **Estimates vs. actual billing** — the pre-call table shows an *upper bound* (assumes all `max_output_tokens` are used). The post-call cost is computed from actual token counts returned by the API using our stored price-per-token rates. This closely matches your dashboard in most cases, but can differ due to:
   - Prompt caching discounts (Anthropic cache read/write, OpenAI cached prompt tokens)
   - Batch API pricing (usually 50% discount)
   - Volume discounts or custom pricing tiers
   - Price changes after the bundled snapshot date

3. **Always verify on your provider dashboard** — use this tool as a helpful guide, not a billing authority:
   - [OpenAI Usage Dashboard](https://platform.openai.com/usage)
   - [Anthropic Console](https://console.anthropic.com/)
   - [Google AI Studio](https://aistudio.google.com/)

4. **Synchronous wrappers are fully featured.** Async variants (`wrap_anthropic_async`, `wrap_gemini_async`) are also included and follow the same interface pattern.

5. **Model coverage** — if a model is not in the pricing database, a `ModelNotFoundError` is raised explaining which providers are supported.

---

## Alternative installation (wheel)

If `pip install llm-token-guardian` is unavailable, install from a pre-built `.whl` file.

**Download** the wheel from the [Releases](https://github.com/iamsaugatpandey/llm-token-guardian/releases) page, then:

```bash
pip install llm_token_guardian-0.1.1-py3-none-any.whl

# With a provider extra:
pip install "llm_token_guardian-0.1.1-py3-none-any.whl[openai]"
pip install "llm_token_guardian-0.1.1-py3-none-any.whl[anthropic]"
pip install "llm_token_guardian-0.1.1-py3-none-any.whl[google]"
```

**Build the wheel yourself from source:**

```bash
git clone https://github.com/iamsaugatpandey/llm-token-guardian.git
cd llm-token-guardian
pip install build
python -m build
# Outputs dist/llm_token_guardian-0.1.1-py3-none-any.whl
pip install dist/llm_token_guardian-0.1.1-py3-none-any.whl
```

---

## Feedback & contributing

**Email**: [saugatpandey02@gmail.com](mailto:saugatpandey02@gmail.com)
Feedback, questions, and feature suggestions are very welcome.

**GitHub Issues**: [github.com/iamsaugatpandey/llm-token-guardian/issues](https://github.com/iamsaugatpandey/llm-token-guardian/issues)
Bug reports, feature requests, and general discussions.

**Contributing**: The repository will be public on GitHub — pull requests are welcome! Fork, open an issue to discuss your idea, and submit a PR.

**⭐ Star the repo** if you find this useful — it helps others discover the project and motivates continued development!

---

*Pricing data sourced from [BerriAI/litellm](https://github.com/BerriAI/litellm) — thank you to the LiteLLM team for maintaining this open dataset.*
