Metadata-Version: 2.4
Name: llmdoctor
Version: 0.2.0
Summary: Find LLM cost leaks before your bill does. Static analysis for Anthropic and OpenAI client code.
Author-email: llmdoctor <issues.llmdoctor@gmail.com>
Maintainer-email: llmdoctor <issues.llmdoctor@gmail.com>
License: MIT
License-File: LICENSE
Keywords: ai,anthropic,claude,cost,cost-optimization,gpt,linter,llm,openai,prompt-cache,static-analysis,tokens
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: click>=8.1
Requires-Dist: rich>=13.0
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Description-Content-Type: text/markdown

# llmdoctor

**Static analysis for LLM cost leaks. Catch the bugs that show up on next month's invoice — before they ship.**

[![PyPI version](https://img.shields.io/pypi/v/llmdoctor.svg)](https://pypi.org/project/llmdoctor/)
[![Python versions](https://img.shields.io/pypi/pyversions/llmdoctor.svg)](https://pypi.org/project/llmdoctor/)
[![License](https://img.shields.io/pypi/l/llmdoctor.svg)](https://pypi.org/project/llmdoctor/)

> A single stuck agent session can cost **$1,000–$5,000 in tokens**.
> A misplaced `cache_control` marker turns every cached call into a cache *write*.
> Output without `max_tokens` can blow $50 on one ramble.
>
> `llmdoctor` finds these patterns in your code in seconds — before they ship.

---

## What it does

`llmdoctor` reads your Python source and reports the LLM-cost bugs most likely to hurt you in production. It works on:

- **Direct SDKs** — `anthropic.Anthropic()` and `openai.OpenAI()`
- **LangChain** — `ChatAnthropic`, `ChatOpenAI`, `AgentExecutor`

It runs in seconds. No dependency added to your application. No telemetry. No agents. No code execution. Just a CLI you point at a path.

```bash
pip install llmdoctor
llmdoctor doctor src/
```

That's the whole UX.

---

## Why this exists

LLM bills in 2026 surprise teams in three ways:

| Surprise | Typical cause | What llmdoctor catches |
|---|---|---|
| **Cache hit rate stays at 0%** | Dynamic content placed before `cache_control` marker silently invalidates the prefix on every call | TS001 |
| **One ramble blows $50** | No `max_tokens` cap on an OpenAI call or LangChain wrapper | TS010, TS101 |
| **Single agent session burns $1k–$5k** | `AgentExecutor(max_iterations=None)` lets a stuck loop run unbounded | TS103 |

These bugs are easy to write, hard to spot at runtime (the cost shows up days later in billing dashboards), and trivially detectable in source.
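
To make TS001 concrete, here is a sketch of the anti-pattern using plain dict literals (no SDK imported; the field names follow Anthropic's system-block shape). Anything that changes between calls and sits at or before the `cache_control` marker makes the cacheable prefix differ on every request, so the cache never hits:

```python
import datetime

today = datetime.date.today().isoformat()  # changes every day, i.e. dynamic

# BAD: dynamic text sits before the cache_control marker, so the cacheable
# prefix is different on every call and every "cached" call is a cache write.
bad_system = [
    {"type": "text", "text": f"Today is {today}."},
    {"type": "text",
     "text": "LONG STATIC INSTRUCTIONS ...",
     "cache_control": {"type": "ephemeral"}},
]

# GOOD: everything up to and including the marked block is static; dynamic
# text goes after the marker (or into the user message).
good_system = [
    {"type": "text",
     "text": "LONG STATIC INSTRUCTIONS ...",
     "cache_control": {"type": "ephemeral"}},
    {"type": "text", "text": f"Today is {today}."},
]
```

The two payloads differ only in ordering, which is exactly why the bug survives code review: both are valid API calls, and only one ever gets a cache hit.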

---

## Quickstart

```bash
# install
pip install llmdoctor

# scan
llmdoctor doctor .                  # current directory
llmdoctor doctor src/agent.py       # single file
llmdoctor doctor . --json           # CI-friendly output
llmdoctor doctor . --fail-on HIGH   # exit 1 in CI on any HIGH finding
```

---

## Example output

```
╭─ llmdoctor doctor ─────────────────────────────────────────────────────╮
│ Scanned 14 file(s) under src/                                           │
│ Found 3 issue(s)  ·  2 HIGH · 1 MEDIUM                                  │
│ Estimated potential savings: ~$340/month  (rough estimate)              │
╰─────────────────────────────────────────────────────────────────────────╯

╭─ [HIGH] TS103 AgentExecutor with max_iterations=None — single session
            can cost $1k+ in tokens                                       ─╮
│   file:  src/agent_factory.py:23                                         │
│   code:  agent = AgentExecutor(agent=llm, tools=tools, max_iterations=None) │
│   why:   Setting max_iterations=None removes the loop cap. If the         │
│          agent's stop condition fails to trigger, the agent runs          │
│          forever — racking up $1,000–$5,000 in tokens per session in      │
│          documented 2026 incidents.                                       │
│   fix:   Set max_iterations=15 (the LangChain default) or higher if       │
│          your agent genuinely needs more depth. Always pair with          │
│          max_execution_time=... for a wall-clock cap.                     │
╰──────────────────────────────────────────────────────────────────────────╯
```
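
The fix in the finding above can be reduced to a toy loop (plain Python, not LangChain's implementation) showing why both caps matter: the iteration cap bounds token spend even when the stop condition never fires, and the wall-clock cap bounds slow tool calls:

```python
import time

def run_agent(step, max_iterations=15, max_execution_time=120.0):
    """Toy agent loop; `step` stands in for one LLM+tool round-trip."""
    start = time.monotonic()
    for i in range(max_iterations):                        # loop cap (TS103)
        if time.monotonic() - start > max_execution_time:  # wall-clock cap
            return ("timeout", i)
        answer = step(i)
        if answer is not None:                             # stop condition
            return ("done", answer)
    return ("iteration_cap", max_iterations)

# A healthy agent that finishes on round 3:
print(run_agent(lambda i: "answer" if i == 3 else None))  # → ('done', 'answer')
# A stuck agent whose stop condition never triggers: bounded at 15 rounds
# instead of running (and billing) forever.
print(run_agent(lambda i: None))                          # → ('iteration_cap', 15)
```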

Every finding includes the **exact line**, the **why**, the **concrete fix**, and a **rough cost estimate** with its assumptions printed inline. We never quote a dollar number without showing how we got it.

---

## Check catalog (v0.2.0)

| Code | Severity | Surface | Catches |
|---|---|---|---|
| **TS001** | HIGH | Anthropic SDK | Dynamic content placed before `cache_control` — silently invalidates the prompt cache on every call |
| **TS003** | MEDIUM | Anthropic SDK | Long static system prompt with no `cache_control` — missed cache opportunity |
| **TS010** | HIGH | OpenAI SDK | `chat.completions.create()` without `max_tokens` — output cost unbounded |
| **TS011** | MEDIUM | OpenAI / Anthropic SDK | `max_tokens > 8000` — likely a copy-paste default |
| **TS020** | MEDIUM | OpenAI / Anthropic SDK | Premium model (Opus, GPT-5, GPT-4-Turbo) on a tiny prompt where a cheaper tier would match quality |
| **TS101** | HIGH | LangChain | `ChatOpenAI()` instantiated without `max_tokens` — every downstream `.invoke()` inherits unbounded output |
| **TS102** | MEDIUM | LangChain | `ChatOpenAI` / `ChatAnthropic` with `max_tokens > 8000` |
| **TS103** | **HIGH** | LangChain | **`AgentExecutor(max_iterations=None)` — explicitly unbounded agent loop** |
| **TS104** | MEDIUM | LangChain | `AgentExecutor(max_iterations > 50)` — the "I bumped the cap as a workaround" anti-pattern |
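
As a sketch of what the TS010/TS101 family flags (a stub with the same keyword shape as `chat.completions.create`, not the real SDK), the fix is simply an explicit output cap:

```python
def create_chat_completion(model, messages, max_tokens=None, **kwargs):
    """Stub mimicking the keyword shape of chat.completions.create."""
    return {"model": model, "messages": messages,
            "max_tokens": max_tokens, **kwargs}

# Flagged shape (TS010): no cap, so output cost is unbounded.
uncapped = create_chat_completion("gpt-4o", [{"role": "user", "content": "hi"}])

# Fixed shape: a cap sized to the longest answer you actually want.
capped = create_chat_completion("gpt-4o", [{"role": "user", "content": "hi"}],
                                max_tokens=1024)
print(uncapped["max_tokens"], capped["max_tokens"])  # → None 1024
```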

---

## How llmdoctor compares

We're not competing with runtime observability tools — we're **shift-left**. Run llmdoctor in CI; run an observability tool in prod.

| | **llmdoctor** | Helicone, Langfuse, OpenLLMetry | Mem0, Letta | LLMLingua |
|---|---|---|---|---|
| **Where** | static, in CI | runtime proxy / SDK | runtime, in agent loop | runtime, prompt rewrite |
| **When it fires** | before deploy | after each call | per session | per call |
| **Catches** | bugs in source | metrics, traces, costs | memory drift | token bloat |
| **Network** | none | required | required | required |
| **Adds latency?** | no — only runs in CI | yes (~10–50ms) | yes | yes |
| **Best for** | shift-left cost gates | observability dashboards | persistent agent memory | aggressive token compression |

Use llmdoctor *and* your observability tool. They cover different failure modes.

---

## Cost estimates: how to read them

Estimates are **heuristic, not invoice predictions**. Each finding prints its assumptions inline:

```
estimate: ~$135.00/month  (assuming: 3000-token system prompt,
                          100 calls/day, 30-day month, 0.1× cache-read pricing)
```

Treat the dollar number as **order of magnitude**. The value llmdoctor delivers is the *finding* and the *fix*; the estimate is there to make them actionable. If your traffic is 10× the assumed volume, multiply; if it's a tenth, divide. The pricing table is pinned inside the installed package at `llmdoctor/pricing.py` and was verified against provider pricing pages on 2026-04-30.
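
One plausible way to reconstruct an estimate like the one above is the back-of-envelope below. The $15-per-million-token input price is our assumption for illustration, not a number llmdoctor prints:

```python
def monthly_cache_savings(prompt_tokens, calls_per_day, price_per_mtok,
                          cache_read_multiplier=0.1, days=30):
    """Full-price cost of re-sending the prompt on every call, minus the
    cost of reading the same prefix from cache at the discounted rate."""
    monthly_tokens = prompt_tokens * calls_per_day * days
    full_cost = monthly_tokens / 1_000_000 * price_per_mtok
    return full_cost - full_cost * cache_read_multiplier

# 3000-token prompt, 100 calls/day, 30-day month, assumed $15/MTok input:
print(round(monthly_cache_savings(3000, 100, 15.0), 2))  # → 121.5
```

At these assumptions the uncached prompt costs $135/month outright and the net savings after cache reads is about $121.50; either way you land in the same order of magnitude as the printed figure, which is the point.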

---

## Self-audit

We built llmdoctor knowing a measurement tool with a bug is worse than no tool. Before publishing we ran a four-pass audit:

- **Correctness** — 11 weird AST shapes (async, `**kwargs`, walrus, multi-target, augmented assigns, mocked clients, deeply nested calls). Zero false positives.
- **Input safety** — fixed five concrete bugs before shipping: UTF-8 BOM crash, OOM on 6 MB generated `.py`, `ValueError` from `ast.parse` on binary content, `RecursionError` on minified code, plus rich-markup injection through filenames.
- **Security** — no `eval`, no `exec`, no network calls, no telemetry. `socket`/`requests`/`httpx`/`urllib` not imported anywhere. Verified.
- **Honest false negatives** — limitations like bound-method assignment, multi-target assigns, mock clients, LiteLLM/OpenRouter/raw-HTTP, and TypeScript codebases are documented in source so the tool never silently lies about coverage.

**30 tests, all passing in CI.** The audit and the test suite run on every release; results are summarized in this README so a reader on PyPI can see what was checked without access to the source.

---

## What we don't do (yet)

- ✅ Catch direct SDK + LangChain config bugs in Python source
- ❌ Patch your code automatically (we report; you fix)
- ❌ Run your code (static analysis only — safe on closed-source)
- ❌ Measure live traffic (that's the runtime sidecar, on the roadmap)
- ❌ TypeScript / JavaScript (Python only today)
- ❌ LiteLLM, OpenRouter, raw HTTP, or arbitrary wrapper functions
- ❌ Phone home, ship telemetry, or collect usage data — ever

If your codebase doesn't import `anthropic`, `openai`, `langchain_anthropic`, or `langchain_openai` directly, llmdoctor will produce zero findings. That's a feature, not a bug.

---

## Roadmap

| Version | Theme | What |
|---|---|---|
| ✅ **0.1.0** | direct SDK | TS001/003/010/011/020 — Anthropic + OpenAI direct calls |
| ✅ **0.2.0** | LangChain | TS101/102/103/104 — `ChatOpenAI`, `ChatAnthropic`, `AgentExecutor` |
| **0.3.0** | LlamaIndex | `Anthropic`, `OpenAI`, `ReActAgent` from `llama_index.*` |
| **0.4.0** | retry storms + tool dup | TS030 (retry without budget), TS040 (tool-definition repetition) |
| **0.5.0** | runtime sidecar | optional Python wrapper that reads `cache_read_input_tokens` from live responses, surfacing cache drift before billing does |
| **1.0** | TypeScript | `@anthropic-ai/sdk` + `openai` (Node) — the same checks for the JS ecosystem |

---

## Why we built it

LLM bills in 2026 are increasingly metered, and the bugs that drive them aren't visible until the invoice arrives. Engineers writing prompt caches, configuring LangChain agents, or tweaking `max_tokens` make small mistakes that quietly translate into 10×-or-more cost penalties, and nothing flags them until the next finance review.

Existing tools catch this *after* the call. We catch it *before* the deploy. Static analysis is the right place for this because the bugs have specific syntactic shapes — exactly what AST tooling is built for.

---

## FAQ

**Does it run my code?**
No. We use `ast.parse`, never `eval`/`exec`/`compile(... 'exec')`. Safe to run on closed-source repositories.

**Does it phone home?**
No. Zero network calls anywhere in the codebase. No telemetry, no usage stats, no opt-in beacon. This is a hard requirement for OSS releases — we'd consider it a breaking change to add.

**Will it false-positive on my mocked tests?**
Probably not — we exercise mock-client patterns in our test suite. If you hit one, suppress the line with `# llmdoctor: ignore TS001` (or `ignore ALL`) and email us with the snippet so we can lock in a regression test.
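
For illustration, the suppression comment goes on the flagged line itself. The client below is a stub, not the real OpenAI SDK; only the trailing comment syntax matters:

```python
class _StubCompletions:
    """Stand-in for a chat-completions client; just echoes its kwargs."""
    def create(self, **kwargs):
        return kwargs

# Deliberately uncapped call in a test file; the trailing comment tells
# llmdoctor to skip TS010 on this line:
resp = _StubCompletions().create(model="gpt-4o", messages=[])  # llmdoctor: ignore TS010
print(resp["model"])  # → gpt-4o
```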

**My CI uses LiteLLM / OpenRouter / a custom wrapper. Will it catch anything?**
Not yet — those wrappers don't go through `anthropic.Anthropic()` or `langchain_*` constructors. Adapter checks are planned per framework. Vote on what to support next via the maintainer email below.

---

## Get in touch

Issue tracker is currently private while we stabilize 0.x. To suggest a check, report a false positive, or share a real-world cost leak you'd like us to detect:

📧 **issues.llmdoctor@gmail.com**

We aim to respond to actionable bug reports within a few business days.

---

## License

MIT. The full license text is bundled inside the installed package
(`llmdoctor-<version>.dist-info/licenses/LICENSE`).

Built with measurement-honesty over feature-breadth. The self-audit lives in
this README — not behind a click-through — because a measurement tool that's
wrong is worse than no tool at all.
