Metadata-Version: 2.4
Name: modelcost
Version: 0.1.6
Summary: Calculate LLM API call costs from token usage
Project-URL: Homepage, https://github.com/rmescandon/modelcost
Project-URL: Repository, https://github.com/rmescandon/modelcost
Author-email: Roberto Mier Escandon <rmescandon@gmail.com>
License: MIT
Keywords: cost,llm,openai,pricing,tokens
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: <4.0,>=3.10
Requires-Dist: click>=8.0
Requires-Dist: httpx>=0.27
Requires-Dist: tokencost>=0.1.16
Description-Content-Type: text/markdown

# modelcost

Calculate LLM API call costs from token usage, using price catalogs from multiple sources.

Supported pricing sources:
- `litellm` (default)
- `openrouter`
- `tokencost`

## Install

```bash
python -m pip install modelcost
```

## CLI

The default command calculates cost, so you can omit the `cost` subcommand.

```bash
# Default (cost)
modelcost gpt-4o 1000 500

# Explicit cost (optional)
modelcost cost gpt-4o 1000 500

# All sources in one run
modelcost --source all gpt-4o 1000 500

# JSON output
modelcost --json gpt-4o 1000 500
```

### Cached and reasoning tokens

Modern LLM APIs charge differently for cached input and reasoning output tokens.
Pass them as optional flags — they default to 0 so existing usage is unchanged.

```bash
# Cached input tokens (served from prompt cache)
modelcost gpt-4o 1000 500 --cached-input-tokens 200

# Cache creation tokens (first-time cache writes)
modelcost gpt-4o 1000 500 --cache-creation-input-tokens 100

# Reasoning tokens (subset of output_tokens, e.g. o1/R1 thinking)
modelcost deepseek/deepseek-r1 2000 5000 --reasoning-tokens 3000

# All together
modelcost gpt-4.1-mini 1000 500 \
  --cached-input-tokens 200 \
  --cache-creation-input-tokens 100 \
  --reasoning-tokens 150
```

### List models

```bash
modelcost models
modelcost models --source openrouter
modelcost models --filter gpt
modelcost models --json
```

### Help

```bash
modelcost --help
modelcost models --help
```

## Library

```python
from modelcost.calculator import calculate_cost, list_models

# Basic usage (backward compatible)
result = calculate_cost("gpt-4o", 1000, 500)

for source in result.available_sources:
    print(f"{source.source}: ${source.total_cost_usd:.6f}")

# With cached and reasoning tokens
result = calculate_cost(
    "gpt-4.1-mini",
    input_tokens=1000,
    output_tokens=500,
    cached_input_tokens=200,
    cache_creation_input_tokens=100,
    reasoning_tokens=150,
)

s = result.sources[0]
print(f"${s.total_cost_usd:.6f}")
print(f"  cache read:  ${s.price_per_million_cache_read}/M")
print(f"  reasoning:   ${s.price_per_million_reasoning}/M")

models = list_models("openrouter")
```

## Output details

`calculate_cost()` returns a `CostResult` with:
- `model`, `input_tokens`, `output_tokens`
- `cached_input_tokens`, `cache_creation_input_tokens`, `reasoning_tokens` (0 when not used)
- `sources`: list of `SourceCost` objects
- `available_sources`: only sources with prices found

Each `SourceCost` includes:
- `source`
- `total_cost_usd`
- `price_per_million_input`, `price_per_million_output`
- `price_per_million_cache_read`, `price_per_million_cache_creation`, `price_per_million_reasoning` (present only when the source has specific pricing for these)
- `error` (when not available)
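
A minimal sketch using these fields; the error-handling pattern is an assumption based on the `error` field described above:

```python
from modelcost.calculator import calculate_cost

result = calculate_cost("gpt-4o", 1000, 500)

# `sources` holds every queried source; `available_sources` is the
# subset that actually returned a price for the model.
for s in result.sources:
    if s.error is not None:
        print(f"{s.source}: unavailable ({s.error})")
    else:
        print(f"{s.source}: ${s.total_cost_usd:.6f} "
              f"({s.price_per_million_input}/M in, {s.price_per_million_output}/M out)")
```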

### Cost formula

The detail token counts (`cached_input_tokens`, `cache_creation_input_tokens`, `reasoning_tokens`)
are treated as **subsets** of their parent totals and are clamped so they never exceed them:

```
text_input  = input_tokens  - cached_input - cache_creation
text_output = output_tokens - reasoning_tokens

total = text_input     * input_rate
      + cached_input   * cache_read_rate    (fallback: input_rate)
      + cache_creation * cache_creation_rate (fallback: input_rate)
      + text_output    * output_rate
      + reasoning      * reasoning_rate      (fallback: output_rate)
```

This matches how most APIs report usage: `input_tokens` and `output_tokens` are totals that
already include the cached/reasoning counts, and the detail fields are subsets. The rates are
per-million prices, so the weighted sum is divided by 1,000,000. When a source has no specific
rate for a category, the base rate for that category is used as a fallback.
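
Worked example with the `gpt-4.1-mini` numbers from the JSON output below:

```
text_input  = 1000 - 200 - 100 = 700
text_output =  500 - 150       = 350

total = (700 * 0.40 + 200 * 0.10 + 100 * 0.48 + 350 * 1.60 + 150 * 1.60) / 1,000,000
      = (280 + 20 + 48 + 560 + 240) / 1,000,000
      = 0.001148 USD
```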

### JSON output

`--json` on the CLI and `result.to_dict()` in the library include the new token fields only when they are non-zero:

```json
{
  "model": "gpt-4.1-mini",
  "input_tokens": 1000,
  "output_tokens": 500,
  "cached_input_tokens": 200,
  "cache_creation_input_tokens": 100,
  "reasoning_tokens": 150,
  "costs": [
    {
      "source": "litellm",
      "total_cost_usd": 0.001148,
      "price_per_million_input": 0.4,
      "price_per_million_output": 1.6,
      "price_per_million_cache_read": 0.1,
      "price_per_million_cache_creation": 0.48,
      "price_per_million_reasoning": 1.6,
      "error": null
    }
  ]
}
```
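
The same structure is available from the library via `result.to_dict()`; a minimal sketch:

```python
import json

from modelcost.calculator import calculate_cost

result = calculate_cost(
    "gpt-4.1-mini",
    input_tokens=1000,
    output_tokens=500,
    cached_input_tokens=200,
    cache_creation_input_tokens=100,
    reasoning_tokens=150,
)
print(json.dumps(result.to_dict(), indent=2))
```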

## Caching

`openrouter` responses are cached in `~/.modelcost_cache.json` for 1 hour.
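
The cache is a plain JSON file, so deleting it forces a fresh fetch on the next run:

```bash
rm -f ~/.modelcost_cache.json
```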

## Notes

- Prices are fetched at runtime from the upstream catalogs.
- If a model is missing in a source, that source is marked as unavailable.
- Network sources are fetched in parallel for the `all` option.
- `tokencost` does not expose cache/reasoning pricing, so those tokens fall back to the base input/output rates; with `source="all"`, its cost may therefore be higher than `litellm`'s for calls that include cached or reasoning tokens.
