Metadata-Version: 2.4
Name: llmdoctor
Version: 0.1.0
Summary: Find LLM cost leaks before your bill does. Static analysis for Anthropic and OpenAI client code.
Project-URL: Homepage, https://github.com/Shahriyar-Khan27/llm-doctor
Project-URL: Repository, https://github.com/Shahriyar-Khan27/llm-doctor
Project-URL: Issues, https://github.com/Shahriyar-Khan27/llm-doctor/issues
Project-URL: Changelog, https://github.com/Shahriyar-Khan27/llm-doctor/blob/main/CHANGELOG.md
Author: llmdoctor contributors
License: MIT
License-File: LICENSE
Keywords: ai,anthropic,claude,cost,cost-optimization,gpt,linter,llm,openai,prompt-cache,static-analysis,tokens
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: click>=8.1
Requires-Dist: rich>=13.0
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Description-Content-Type: text/markdown

# llmdoctor

**Find LLM cost leaks before your bill does.**

`llmdoctor doctor` is a static analyzer for Python code that calls Anthropic
or OpenAI. It catches the patterns that quietly burn money in production:

- Prompt-cache placement bugs that invalidate the cache on every call
  (the bug claude-mem itself shipped — their issue #1890)
- Missing `max_tokens` caps where output tokens cost 3–10× input
- Premium models (Opus, GPT-5) used for tiny prompts where a cheaper model
  would produce indistinguishable output
- Large static system prompts left uncached

It's an advisor, not a runtime patcher. It reads your code, prints findings
with rough cost-impact estimates, and exits.
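
The cache-placement bug (TS001) is the subtlest of these. Below is a minimal sketch of the pattern in Anthropic client code; the model id, `STATIC_INSTRUCTIONS`, and `user_query` are illustrative placeholders, not code taken from the tool:

```python
import anthropic

client = anthropic.Anthropic()
STATIC_INSTRUCTIONS = "You are a support agent. <several thousand tokens of policy>"
user_query = "Why was I charged twice?"

# BAD (TS001): the dynamic block sits BEFORE the cache_control marker, so the
# cacheable prefix changes on every call and the cache never gets a hit.
client.messages.create(
    model="claude-opus-4-20250514",  # illustrative model id
    max_tokens=1024,
    system=[
        {"type": "text", "text": f"User said: {user_query}"},  # dynamic, first
        {"type": "text", "text": STATIC_INSTRUCTIONS,
         "cache_control": {"type": "ephemeral"}},               # marker after it
    ],
    messages=[{"role": "user", "content": user_query}],
)

# BETTER: static content before (and covered by) the marker; dynamic content
# moved into the messages array, where it belongs.
client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=1024,
    system=[
        {"type": "text", "text": STATIC_INSTRUCTIONS,
         "cache_control": {"type": "ephemeral"}},
    ],
    messages=[{"role": "user", "content": f"User said: {user_query}"}],
)
```

Everything up to and including the block marked with `cache_control` forms the cached prefix, which is why any dynamic text ahead of the marker silently defeats it.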

## Install

```bash
pip install llmdoctor
# or no-install:
pipx run llmdoctor doctor .
```

## Usage

```bash
llmdoctor doctor .              # scan current directory
llmdoctor doctor src/agent.py   # scan one file
llmdoctor doctor . --json       # for CI / piping into other tools
llmdoctor doctor . --fail-on HIGH   # exit 1 if any HIGH-severity issue
```

## What it looks like

```
╭─ llmdoctor doctor ───────────────────────────────────────────────╮
│ Scanned 14 file(s) under src/                                     │
│ Found 3 issue(s)  ·  2 HIGH · 1 MEDIUM                            │
│ Estimated potential savings: ~$340/month  (rough estimate)        │
╰───────────────────────────────────────────────────────────────────╯

╭─ [HIGH] TS001 Dynamic content before cache_control invalidates the cache ─╮
│   file:  src/agent.py:42                                                  │
│   code:  {"type": "text", "text": f"User said: {user_query}"},            │
│   why:   System block at index 0 contains dynamic content but appears     │
│          BEFORE the first block with cache_control. ...                   │
│   fix:   Move static content BEFORE the cache_control marker. Move        │
│          dynamic content into the messages array.                         │
│   estimate: ~$135.00/month  (assuming: 3000-token system prompt, 100      │
│             calls/day, 30-day month, 0.1× cache-read pricing)             │
│   docs:  https://docs.anthropic.com/.../prompt-caching                    │
╰───────────────────────────────────────────────────────────────────────────╯
```

## Checks shipped in 0.1.0

| Code  | Severity | What it catches |
|-------|----------|-----------------|
| TS001 | HIGH     | Dynamic content placed before a `cache_control` marker (silently invalidates the prompt cache). |
| TS003 | MEDIUM   | Large static system prompt without `cache_control` (missed cache opportunity). |
| TS010 | HIGH     | OpenAI call with no `max_tokens` / `max_completion_tokens` (output cost unbounded). |
| TS011 | MEDIUM   | `max_tokens` set suspiciously high (likely a copy-paste default that enables runaway completions). |
| TS020 | MEDIUM   | Premium model (Opus, GPT-5, GPT-4-Turbo, GPT-4o) on a tiny static prompt where a cheaper tier would likely match quality. |
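
To make TS010 and TS011 concrete, here is a rough sketch of the OpenAI-side patterns; the model name and token values are illustrative assumptions, not the tool's actual thresholds:

```python
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Summarize this support ticket in two sentences."}]

# TS010: no output cap at all, so a verbose completion is billed in full.
client.chat.completions.create(model="gpt-4o", messages=messages)

# TS011: a cap so high it is effectively no cap (often a copy-pasted default).
client.chat.completions.create(model="gpt-4o", messages=messages, max_tokens=16384)

# What the doctor nudges you toward: a ceiling sized for the task.
client.chat.completions.create(model="gpt-4o", messages=messages, max_tokens=200)
```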

## How cost estimates are calculated

Estimates are **heuristic**, not invoice predictions. Each issue prints its
assumptions (e.g. *"100 calls/day, 30-day month, 3000-token system prompt"*).
Treat the numbers as order-of-magnitude guidance. The tool's value is the *finding* and
the *fix*; the dollar number is the attention-grabber.
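
As a sanity check, the ~$135/month figure from the TS001 finding above can be reproduced by hand from its stated assumptions. The $15-per-million-token input price below is our own assumption for an Opus-class model, not a value read from `pricing.py`:

```python
# Stated assumptions: 3000-token system prompt, 100 calls/day, 30-day month.
tokens_per_month = 3_000 * 100 * 30                 # 9,000,000 prompt tokens/month
price_per_million = 15.0                            # assumed Opus-class input price (USD)

full_cost = tokens_per_month / 1_000_000 * price_per_million   # ~$135 if the cache never hits
cached_cost = full_cost * 0.1                                   # ~$13.50 at 0.1x cache-read pricing
print(f"${full_cost:.2f} -> ${cached_cost:.2f}")                # $135.00 -> $13.50
```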

The pricing table lives in `src/llmdoctor/pricing.py` (last verified 2026-04-30).
Submit a PR if a model is missing or a price moves.

## What this tool deliberately does NOT do (yet)

- It does not patch your code. It reports, you fix.
- It does not run your code. Static analysis only — safe on closed-source repos.
- It does not measure live traffic. That's a different product (the SDK,
  coming next). The doctor is the first wedge.
- It does not check JavaScript / TypeScript. Python only in 0.1.0.
- It does not flag retry-storm patterns yet (planned: TS030).
- It does not detect tool-definition duplication across calls (planned: TS040).

If your codebase doesn't import `anthropic` or `openai` directly (e.g. you
use LangChain, LiteLLM, or hit the HTTP API), the doctor will produce no
findings. Adapter checks for those frameworks are a planned next step.

## Development

```bash
git clone https://github.com/Shahriyar-Khan27/llm-doctor
cd llm-doctor
pip install -e ".[dev]"
pytest
```

## License

MIT.

## Why we built this

We were designing a full LLM-cost SDK and went deep on prior art
([summary doc](../docs/summry.md)). The single highest-leverage finding:
prompt-cache placement bugs are everywhere, mostly invisible, and cost
serious money. Even a competent tool like claude-mem shipped one to prod.
Static analysis catches the whole class in seconds. So that's what shipped
first.
