Metadata-Version: 2.4
Name: scorchmark
Version: 0.6.0
Summary: Find the cache tax draining your AI bill: a cross-provider cache-TTL-waste detector + model-swap savings simulator + pricing-drift + per-agent attribution. MCP server + CLI.
Project-URL: Homepage, https://github.com/Nas01010101/scorchmark
Project-URL: Repository, https://github.com/Nas01010101/scorchmark
Project-URL: Issues, https://github.com/Nas01010101/scorchmark/issues
Author: Anas
License-Expression: MIT
License-File: LICENSE
Keywords: agents,anthropic,claude,cost,finops,llm,mcp,observability,openai,prompt-caching
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: System :: Monitoring
Requires-Python: >=3.11
Provides-Extra: all
Requires-Dist: cryptography>=42.0; extra == 'all'
Requires-Dist: fastmcp>=3.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: cryptography>=42.0; extra == 'dev'
Requires-Dist: fastmcp>=3.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Provides-Extra: mcp
Requires-Dist: fastmcp>=3.0; extra == 'mcp'
Provides-Extra: pro
Requires-Dist: cryptography>=42.0; extra == 'pro'
Description-Content-Type: text/markdown

<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="assets/logo-dark.svg">
    <img src="assets/logo.svg" alt="Scorchmark" width="460">
  </picture>
</p>

<p align="center"><strong>The scorch mark on your AI bill. Scorchmark finds the cache-rebuild waste, model-swap savings, and silent price hikes your provider dashboard won't show you — as an MCP tool + CLI.</strong></p>

<p align="center">
  <img src="https://github.com/Nas01010101/scorchmark/actions/workflows/ci.yml/badge.svg" alt="CI">
  <img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License: MIT">
  <img src="https://img.shields.io/badge/MCP-server-7c3aed.svg" alt="MCP server">
  <img src="https://img.shields.io/badge/tests-66%20passing-brightgreen.svg" alt="tests: 66 passing">
  <img src="https://img.shields.io/badge/runtime%20deps-0-success.svg" alt="runtime deps: 0">
</p>

<p align="center">
  <img src="assets/demo.gif" alt="Scorchmark catching runaway cache-rebuild spend live" width="760">
</p>

Your provider dashboard tells you *what* you spent — a day late. It never tells you *what you
wasted*: the cache rebuilds you re-paid for, the model that would have done the job for 40% less,
the silent price hike. Scorchmark reads a cost log you already have and finds that waste — starting
with the **cache-TTL waste behind the documented $6,000 overnight burn**, which no observability
tool we surveyed detects.

It answers the questions the dashboard can't: *where did the cache money go, which agent burned it,
and what would a cheaper model have saved?* Read-only and local — no proxy in your request path —
as an MCP tool your agent can call mid-run, or a one-line CLI.

| ❌ Without Scorchmark | ✅ With it |
|---|---|
| You find out from the limit email, a day late | The agent sees `warn` at wake-up #2, mid-run |
| Cache-TTL waste silently eats up to 90% of spend | `detect_cache_waste` flags it and prices the loss |
| No idea which agent burned the money | `cost_by_agent` attributes every dollar |
| "Should I have used a cheaper model?" stays unknowable | `simulate_model_swap` gives the exact saved % |

## Why this exists

Someone left Claude Code looping overnight to check PRs and woke up to a **$6,000 bill**. Not a bug
in their code. A cache-TTL change (1 hour down to 5 minutes) meant every 30-minute wake-up rebuilt
an 800k-token history at the cache *write* rate instead of the cheap cache *read*. The dashboard
showed nothing for days. The first warning was the limit email, after the money was gone.

The post got 1,400+ upvotes because everyone running unattended agents felt the cold sweat. And
there was no tripwire — every tool, including the provider's own dashboard, is retrospective.
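The rebuild mechanism is easy to reproduce. Below is a minimal sketch, not Scorchmark's detector. `count_cache_rebuilds` is a hypothetical helper, and it assumes two things: the first wake-up writes the cache, and every read within the TTL refreshes it. Under a 1-hour TTL a 30-minute loop never rebuilds. Under a 5-minute TTL, every wake-up after the first does, which matches the 45 rebuilds in the replay below.

```python
from datetime import datetime, timedelta, timezone


def count_cache_rebuilds(wakeups: list[datetime], ttl: timedelta) -> int:
    """Count wake-ups that find the prompt cache expired (hypothetical helper).

    Simplified model: the first wake-up writes the cache; each later wake-up
    reads it if the gap since the previous wake-up is within the TTL (a read
    refreshes the TTL), otherwise it re-writes the whole prefix.
    """
    rebuilds = 0
    for prev, cur in zip(wakeups, wakeups[1:]):
        if cur - prev > ttl:
            rebuilds += 1  # cache expired: prefix re-paid at the cache-write rate
    return rebuilds


start = datetime(2026, 6, 21, 3, 0, tzinfo=timezone.utc)
wakeups = [start + i * timedelta(minutes=30) for i in range(46)]  # 46 wake-ups, 30 min apart
print(count_cache_rebuilds(wakeups, ttl=timedelta(hours=1)))    # 0
print(count_cache_rebuilds(wakeups, ttl=timedelta(minutes=5)))  # 45
```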

Replaying that incident's shape through these tools:

```text
ingested 46 wake-ups          total spend $265
detect_cache_waste            45 rebuilds, $237 wasted (90% of spend)
check_budget($50 cap)         first 'warn' at wake-up #2 (burn $20/hr)
simulate_model_swap→Sonnet    $265 would have been $159 (save 40%)
```

90% of that spend was avoidable, and a $50 cap would have tripped on the **second** wake-up, not the
46th. (Absolute dollars scale with your context size and loop length; the waste fraction and the
early catch are the point.)

## How it's different from Helicone / Langfuse / LangSmith

Those are good tools. They are also a different shape of thing.

| | Helicone | Langfuse / LangSmith | **Scorchmark** |
|---|---|---|---|
| Form factor | Proxy in your request path | SDK / OTel tracing + dashboard | **MCP tool the agent calls** |
| When you learn | Dashboard, after the call | Dashboard, after the run | **In the loop, before the next call** |
| Who acts on it | A human reading a chart | A human reading a trace | **The agent itself** |
| Cache-TTL waste | Not detected | Not detected | **Detected and priced** |
| In your critical path | Yes (all traffic routed) | No | No (read-only on your logs) |
| Setup | Swap base URL | Instrument SDK | Point it at a log file |

The wedge: those tools tell *you* what happened. Scorchmark tells the *agent* what's about to
happen, in a form it can act on without a human in the loop. It adds nothing to your request path —
it reads a cost log you already have.

And it is **not** a spend *cap*. Hard budget enforcement is a commodity now: Cloudflare AI Gateway,
LiteLLM, and Portkey can all block an over-budget request before it is sent. Scorchmark does the part they don't: it tells you
*where the money leaked and what to change*. Cap your spend with a gateway; find the cache tax with
this. It is also not a full observability platform — if you want flame-graph traces and prompt
evals, run Langfuse. Scorchmark is the cost-intelligence layer, and it composes fine with both.

## Quickstart

The CLI core is pure stdlib, so installing it pulls in no dependencies. The MCP server needs the `mcp` extra (FastMCP):

```bash
uv sync --extra mcp                                                   # MCP server deps (FastMCP)
uv run fastmcp run server.py                                          # stdio (Claude Desktop / Inspector)
uv run fastmcp run server.py --transport streamable-http --port 8000  # remote / MCPize
```

Add it to Claude Desktop / Cursor (`claude_desktop_config.json` or `.cursor/mcp.json`):

```json
{
  "mcpServers": {
    "scorchmark": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/scorchmark", "fastmcp", "run", "server.py"]
    }
  }
}
```

Then, in the loop, the agent calls the tools (shown in Python-style notation; the calls go over MCP):

```python
ingest_run(open("examples/sample_cost_log.jsonl").read())
check_budget(monthly_cap_usd=100, reset_day=1)   # ok | warn | breach, with ETA to the reset
detect_cache_waste()                             # the $6k pattern
simulate_model_swap(to_model="claude-haiku-4-5") # exact per-row savings, cross-provider OK
```

## Try it in your terminal (no MCP client)

**Run it on your own Claude Code usage — zero setup, a log you already have:**

```bash
uvx --from scorchmark scorchmark report --claude-code
# reads ~/.claude/projects/**/*.jsonl directly and prices every request
```

That parses Claude Code's session transcripts as they are, with no reformatting. Or point it at any cost log:

```bash
uvx --from scorchmark scorchmark report mylog.jsonl --cap 100             # auto-detects the format
uv run scorchmark report examples/sample_cost_log.jsonl --cap 50          # from a clone
cat mylog.jsonl | uv run scorchmark swap - --to claude-haiku-4-5          # reads stdin
```

Subcommands: `report` (all checks), `budget`, `cache-waste`, `by-agent`, `anomalies`, `swap`.
Add `--json` for the raw result, `--from {auto,scorchmark,claude-code}` to force a format, or point
the log argument at a **directory** of `.jsonl` files. Same engine as the MCP server.
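Accepting a file, a directory, or `-` for stdin is a small dispatch in front of the JSONL parser. Here is a minimal sketch of that dispatch, not the CLI's internals. `iter_records` is a hypothetical name. The sketch assumes a flat directory (only `*.jsonl` files directly inside it; swap in `rglob` for nested trees), and it skips blank lines.

```python
import io
import json
import sys
from collections.abc import Iterator
from pathlib import Path
from typing import TextIO


def _iter_lines(source: str, stdin: TextIO) -> Iterator[str]:
    if source == "-":
        yield from stdin                          # e.g. `cat mylog.jsonl | ... -`
        return
    path = Path(source)
    files = sorted(path.glob("*.jsonl")) if path.is_dir() else [path]
    for file in files:
        with file.open(encoding="utf-8") as fh:   # each file closes once its lines are consumed
            yield from fh


def iter_records(source: str, stdin: TextIO = sys.stdin) -> Iterator[dict]:
    """Yield one dict per non-blank JSONL line from a file, a directory, or '-'."""
    for line in _iter_lines(source, stdin):
        if line.strip():
            yield json.loads(line)


demo_stdin = io.StringIO('{"request_id": "r1"}\n\n{"request_id": "r2"}\n')
print([r["request_id"] for r in iter_records("-", demo_stdin)])  # ['r1', 'r2']
```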

## See it run

Real output from `examples/sample_cost_log.jsonl` — the live `warn`, the cache-waste dollars, and
the per-agent breakdown the provider dashboard never shows you (these are actual tool results, not
mockups):

![Live budget tripwire — warns mid-run](assets/screenshot-check-budget.png)

![Cache-TTL waste detection](assets/screenshot-cache-waste.png)

![Per-agent attribution](assets/screenshot-cost-by-agent.png)

## Tools

### Core

| Tool | What it does |
|---|---|
| `ingest_run` | Load a cost-log JSONL. Computes `cost_usd` from the bundled pricing model for rows that omit it. |
| `check_budget` | Live tripwire: spend, burn rate, projected spend to the next reset, `ok`/`warn`/`breach`, ETA to cap. |
| `find_spend_anomalies` | Flags requests costing N× the agent's median — the loop-spike signature. |
| `detect_cache_waste` | Detects cache waste, modeling each provider's real economics (see below) and pricing it. |
| `cost_by_agent` | Per-agent attribution: cost, share, requests, average per request. |
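Two of the core tools are group-bys over the log. The sketch below shows one of each. It assumes rows already carry `agent_id` and `cost_usd`. The function names and the default factor of 5 are illustrative, not Scorchmark's API or defaults.

```python
from collections import defaultdict
from statistics import median


def cost_by_agent_sketch(rows: list[dict]) -> dict[str, dict]:
    """Per-agent cost, share of total, request count, and average cost per request."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["agent_id"]] += row["cost_usd"]
        counts[row["agent_id"]] += 1
    grand = sum(totals.values()) or 1.0  # avoid dividing by zero on an all-free log
    return {
        agent: {"cost_usd": total, "share": total / grand,
                "requests": counts[agent], "avg_usd": total / counts[agent]}
        for agent, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    }


def find_spend_anomalies_sketch(rows: list[dict], factor: float = 5.0) -> list[dict]:
    """Rows costing at least `factor` times their own agent's median request cost."""
    by_agent: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        by_agent[row["agent_id"]].append(row["cost_usd"])
    medians = {agent: median(costs) for agent, costs in by_agent.items()}
    return [row for row in rows
            if medians[row["agent_id"]] > 0
            and row["cost_usd"] >= factor * medians[row["agent_id"]]]


rows = [
    {"request_id": "r1", "agent_id": "pr-loop", "cost_usd": 0.10},
    {"request_id": "r2", "agent_id": "pr-loop", "cost_usd": 0.12},
    {"request_id": "r3", "agent_id": "pr-loop", "cost_usd": 0.11},
    {"request_id": "r4", "agent_id": "pr-loop", "cost_usd": 4.60},  # the loop spike
    {"request_id": "r5", "agent_id": "triage", "cost_usd": 0.02},
]
print([r["request_id"] for r in find_spend_anomalies_sketch(rows)])  # ['r4']
print(list(cost_by_agent_sketch(rows)))                              # ['pr-loop', 'triage']
```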

### Edge (not offered by any tool we surveyed)

| Tool | What it does |
|---|---|
| `detect_spend_acceleration` | Flags a burn rate that doubles across consecutive windows (runaway context growth). |
| `simulate_model_swap` | Recomputes every past request at another model's price for the row-exact savings. Cross-provider. |
| `detect_pricing_drift` | Snapshots provider rates and surfaces any silent change — the root cause of the $6k burn. |
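Both the swap simulator and the acceleration check reduce to small computations. The sketch below rests on three assumptions, spelled out here:

- The model names and per-million-token prices in `PRICES` are placeholders (Scorchmark ships its own curated table).
- Cache tokens are priced Anthropic-style for both models: 1.25× input to write and 0.1× to read. Unlike the real tool, the sketch is therefore not cross-provider.
- The acceleration check works on spend that has already been bucketed into equal time windows.

```python
# Placeholder prices in USD per million tokens, for illustration only.
PRICES = {
    "big-model": {"input": 5.00, "output": 25.00},
    "small-model": {"input": 1.00, "output": 5.00},
}
CACHE_WRITE_MULT, CACHE_READ_MULT = 1.25, 0.10  # Anthropic-style, relative to input


def row_cost(row: dict, model: str) -> float:
    """Price one logged request as if it had run on `model`."""
    p = PRICES[model]
    usd_x_million = (row["input_tokens"] * p["input"]
                     + row.get("cache_write_tokens", 0) * p["input"] * CACHE_WRITE_MULT
                     + row.get("cache_read_tokens", 0) * p["input"] * CACHE_READ_MULT
                     + row["output_tokens"] * p["output"])
    return usd_x_million / 1_000_000


def simulate_swap(rows: list[dict], to_model: str) -> tuple[float, float, float]:
    """Return (actual_usd, swapped_usd, fraction_saved), recomputed row by row."""
    actual = sum(row_cost(r, r["model"]) for r in rows)
    swapped = sum(row_cost(r, to_model) for r in rows)
    return actual, swapped, (1 - swapped / actual) if actual else 0.0


def accelerating_windows(window_spend: list[float], ratio: float = 2.0) -> list[int]:
    """Indices of windows whose spend is at least `ratio` times the previous window's."""
    return [i for i in range(1, len(window_spend))
            if window_spend[i - 1] > 0 and window_spend[i] >= ratio * window_spend[i - 1]]


row = {"model": "big-model", "input_tokens": 2000, "output_tokens": 1500,
       "cache_write_tokens": 800_000, "cache_read_tokens": 0}
actual, swapped, saved = simulate_swap([row], "small-model")
print(f"${actual:.4f} -> ${swapped:.4f}, save {saved:.0%}")  # $5.0475 -> $1.0095, save 80%
print(accelerating_windows([1.0, 1.2, 2.5, 5.1, 5.0]))        # [2, 3]
```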

### Match (parity with the heavy gateways)

| Tool | What it does |
|---|---|
| `predict_rate_limit` | Projects ETA to a 429 from rate-limit headers, per dimension. |
| `detect_stuck_agent` | Flags an agent repeating the same tool call — the stuck-loop signature. |
| `build_alert_payload` | Turns any result into a Slack, ntfy, or PagerDuty webhook payload. |

Resource `scorchmark://pricing/current` exposes the curated cross-provider pricing model.

## Log format

One JSON object per line, in the de-facto schema that cost trackers (and Claude's own usage fields)
already emit. The example below is wrapped for readability; in a real log each record sits on one line:

```json
{"request_id": "r1", "ts": "2026-06-21T03:00:00Z", "provider": "anthropic",
 "model": "claude-opus-4-8", "agent_id": "pr-loop", "input_tokens": 2000,
 "output_tokens": 1500, "cache_write_tokens": 800000, "cache_read_tokens": 0}
```

`ts` accepts ISO-8601 (naive timestamps are read as UTC), epoch seconds, or epoch milliseconds.
`cost_usd` is optional and computed when absent. Anthropic's native `cache_creation_input_tokens`
and `cache_read_input_tokens` are accepted too. Two optional field groups unlock extra tools:

| Field group | Unlocks |
|---|---|
| `rate_limit_remaining_tokens`, `rate_limit_limit_tokens`, `rate_limit_reset_s` (and `*_requests`) | `predict_rate_limit` |
| `tool_name`, `tool_args_hash` | `detect_stuck_agent` |

## Alerting

Set `SCORCHMARK_WEBHOOK_URL` to a JSON webhook (Slack, ntfy, PagerDuty), then call
`check_budget(..., alert=True)` to POST a payload on `warn` or `breach`. Or call
`build_alert_payload(result)` on any tool's output and route it yourself.

The webhook URL is yours to supply. If you self-host this for others, validate the URL against an
allowlist of hosts: an attacker-controlled URL is an SSRF vector.
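One way to do that validation with the standard library is to require `https` and an exact hostname match against a set you control. The host set below is an assumption; use whichever services you actually alert to. `validate_webhook_url` is a hypothetical helper. Matching exactly on the parsed hostname rejects tricks such as `https://hooks.slack.com@169.254.169.254/`, whose real host is the IP after the `@`.

```python
from urllib.parse import urlsplit

# Assumed allowlist: replace with the webhook hosts you actually use.
ALLOWED_WEBHOOK_HOSTS = frozenset({"hooks.slack.com", "ntfy.sh", "events.pagerduty.com"})


def validate_webhook_url(url: str, allowed_hosts: frozenset[str] = ALLOWED_WEBHOOK_HOSTS) -> str:
    """Return `url` unchanged if it is https and its host is allowlisted; else raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError(f"webhook must use https, got {parts.scheme!r}")
    host = parts.hostname  # lower-cased, with userinfo and port stripped
    if host not in allowed_hosts:
        raise ValueError(f"webhook host {host!r} is not allowlisted")
    return url


print(validate_webhook_url("https://hooks.slack.com/services/T000/B000/XXXX"))
for bad in ("http://hooks.slack.com/x",
            "https://hooks.slack.com@169.254.169.254/",
            "https://hooks.slack.com.evil.example/x"):
    try:
        validate_webhook_url(bad)
    except ValueError as err:
        print("rejected:", err)
```

`urllib` follows HTTP redirects by default, so a self-hosted deployment may also want to refuse redirects from the webhook target.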

## Pricing

| Tier | Price | For |
|---|---|---|
| Free | $0 | find the cache tax on your own logs — the solo dev who got burned once |
| Pro | $19/mo | unattended loops: cache-waste + burn-acceleration early warning, webhook alerting |
| Team | $49/mo | per-agent attribution, model-swap savings simulator, pricing-drift, audit-trail export |

Catch one runaway loop and it has paid for itself many times over — and the free tier alone catches
the $6k cache-TTL pattern.

**Unlocking Pro/Team.** The paid tools (`detect_spend_acceleration`, `cost_by_agent`,
`simulate_model_swap`, `detect_pricing_drift`, and webhook alerting) unlock with a signed license
key, verified **offline** (Ed25519 — no phone-home, so the no-outbound-calls guarantee holds):

```bash
pip install 'scorchmark[pro]'             # adds the offline verifier
export SCORCHMARK_LICENSE=SCM1.....       # the key from your purchase
scorchmark license                        # confirm → tier: TEAM · active
```

Buy via MCPize (managed billing) or direct Stripe — the full get-paid setup is in
[PAYMENTS.md](PAYMENTS.md). The free tier needs none of this and stays pure-stdlib.

## Security

Read-only, local-only, no credential storage, no outbound calls from core logic. Core modules have
zero runtime dependencies beyond the Python standard library. See [SECURITY.md](SECURITY.md).

## Cross-provider cache economics

The three providers price caching three different ways, and `detect_cache_waste` models each —
this is why a generic "cache miss" tool gets the dollars wrong.

| Provider | Cache model | Where the waste is | Same slow loop* |
|---|---|---|---|
| Anthropic | Write **premium** (write = 1.25× input, read = 0.1×) | Re-paying the write premium when the loop interval exceeds the TTL | **$237** |
| OpenAI | Automatic, **no write premium** (read = 0.1× input) | A stable prefix that never cache-hits, losing the ~90% read discount | $46 (gpt-5) |
| Google Gemini | Read discount **plus hourly storage** ($4.50/M-tok/hr Pro) | Missed discount, and storage billed on idle explicit caches | $46 (2.5-pro) |

*Same 46-wake-up, 800k-context, 30-min loop, priced per provider. The catastrophic version is
Anthropic-specific — the write premium is what turned that loop into a $6k bill. On OpenAI/Gemini the
identical loop "only" forfeits the read discount, which the detector reports honestly as a smaller number.
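The rows of the table above differ mainly in what a single miss or rebuild costs relative to the base input price. This simplified sketch prices that per-event waste using the table's multipliers (Anthropic write 1.25×, reads 0.1×). `wasted_usd` is hypothetical, and the input price is a parameter rather than a quoted rate. `read_mult` defaults to the table's 0.1× read price; pass the model's actual ratio for Gemini, since the table does not give one. The sketch prices one event only and ignores how many events a loop produces.

```python
def wasted_usd(provider: str, prefix_tokens: int, input_usd_per_m: float, *,
               read_mult: float = 0.10, idle_hours: float = 0.0,
               storage_usd_per_m_hr: float = 0.0) -> float:
    """Extra USD paid for one cache rebuild/miss of `prefix_tokens`, versus a cache read.

    For Gemini, storage billed while an explicit cache sits idle can be added.
    """
    base = prefix_tokens / 1_000_000 * input_usd_per_m
    if provider == "anthropic":
        return base * (1.25 - read_mult)   # re-paid write premium instead of a read
    if provider == "openai":
        return base * (1.00 - read_mult)   # no write premium: only the missed discount
    if provider == "gemini":
        storage = prefix_tokens / 1_000_000 * storage_usd_per_m_hr * idle_hours
        return base * (1.00 - read_mult) + storage  # missed discount + idle storage
    raise ValueError(f"unknown provider: {provider!r}")


# Same 800k-token prefix at a placeholder $1.00/M input price.
print(round(wasted_usd("anthropic", 800_000, 1.00), 2))  # 0.92
print(round(wasted_usd("openai", 800_000, 1.00), 2))     # 0.72
```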

## Accuracy

Anthropic, OpenAI, and Google rates in `pricing.py` were re-verified on 2026-06-22 against each
provider's official pricing page (platform.claude.com, developers.openai.com/api/docs/pricing,
ai.google.dev/gemini-api/docs/pricing) — every row confirmed, and the current OpenAI tiers
(incl. `gpt-5.4-mini` / `-nano`) added. The tables are the standard context tier; very-large-context
pricing (OpenAI >272K, Gemini >200K) and Gemini's hourly cache-*storage* fee are noted but not priced
per row, since the cost log carries no context-tier or cache-lifetime field. `detect_pricing_drift`
exists because providers change rates without notice — run it regularly.
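At its core, drift detection is a diff between two rate snapshots. Here is a minimal sketch, assuming snapshots shaped as `{model: {rate_name: usd_per_million_tokens}}`. `pricing_drift` is a hypothetical helper, and the model names are placeholders. A real check would also persist each snapshot (for example as JSON) so the next run has something to compare against.

```python
def pricing_drift(old: dict[str, dict[str, float]], new: dict[str, dict[str, float]],
                  tol: float = 1e-9) -> list[tuple]:
    """Return (model, rate_name, old_usd, new_usd) for every added, removed, or changed rate."""
    changes = []
    for model in sorted(old.keys() | new.keys()):
        before, after = old.get(model, {}), new.get(model, {})
        for rate in sorted(before.keys() | after.keys()):
            a, b = before.get(rate), after.get(rate)
            if a is None or b is None or abs(a - b) > tol:
                changes.append((model, rate, a, b))  # None marks a missing rate
    return changes


old = {"model-a": {"input": 5.00, "output": 25.00}}
new = {"model-a": {"input": 5.00, "output": 30.00}, "model-b": {"input": 1.00}}
for change in pricing_drift(old, new):
    print(change)
# ('model-a', 'output', 25.0, 30.0)
# ('model-b', 'input', None, 1.0)
```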

## License

**Code: MIT** (see [LICENSE](LICENSE)) — the entire source, including the gated tools, is free
to read, fork, and modify. This is honest open-core, not DRM.

**Pro/Team license keys** are a separate commercial purchase: a signed key (verified offline,
Ed25519) that activates the paid tools in official builds and funds the maintained, cross-provider
pricing model. Buying a key supports the project and gets you the official tier — the MIT license
means you *could* edit the gate out, but the key is what keeps the pricing data and the audit-trail
tier maintained. Keys are per-purchaser and non-transferable; sold with no warranty (the MIT terms
govern the software itself). Buy via MCPize (managed billing) or direct Stripe — see
[PAYMENTS.md](PAYMENTS.md).
