Metadata-Version: 2.4
Name: llmdoctor
Version: 0.2.4
Summary: Find LLM cost leaks before your bill does. Static analysis for Anthropic and OpenAI client code.
Author-email: llmdoctor <issues.llmdoctor@gmail.com>
Maintainer-email: llmdoctor <issues.llmdoctor@gmail.com>
License: MIT
License-File: LICENSE
Keywords: ai,anthropic,claude,cost,cost-optimization,gpt,linter,llm,openai,prompt-cache,static-analysis,tokens
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: click>=8.1
Requires-Dist: rich>=13.0
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Description-Content-Type: text/markdown

# llmdoctor

A static analyzer for Python codebases that detects LLM cost-leak
patterns before deployment.

[![PyPI version](https://img.shields.io/pypi/v/llmdoctor.svg)](https://pypi.org/project/llmdoctor/)
[![Python versions](https://img.shields.io/pypi/pyversions/llmdoctor.svg)](https://pypi.org/project/llmdoctor/)
[![License](https://img.shields.io/pypi/l/llmdoctor.svg)](https://pypi.org/project/llmdoctor/)

---

## Overview

`llmdoctor` reads Python source code and reports configuration patterns
that have been observed to cause disproportionate token consumption in
production LLM deployments. Each finding includes the affected source
location, an explanation of the cost mechanism, a recommended remediation,
and a heuristic monthly cost estimate based on a stated traffic profile.

The tool supports two integration surfaces:

- The official Anthropic and OpenAI Python SDKs (`anthropic.Anthropic`,
  `openai.OpenAI`).
- The LangChain framework (`langchain_anthropic.ChatAnthropic`,
  `langchain_openai.ChatOpenAI`, `langchain.agents.AgentExecutor`).

`llmdoctor` performs no code execution, issues no network requests, and
emits no telemetry. It is intended for use in code review and continuous
integration pipelines.

---

## Installation

```bash
pip install llmdoctor
```

Requires Python 3.9 or later.

---

## Usage

```bash
llmdoctor doctor .                  # scan the current directory
llmdoctor doctor src/agent.py       # scan a single file
llmdoctor doctor . --json           # emit JSON for downstream tooling
llmdoctor doctor . --fail-on HIGH   # exit non-zero if any HIGH finding
```

The `--fail-on` flag is intended for CI integration. The accepted values
are `HIGH`, `MEDIUM`, `LOW`, and `INFO`. The exit code is `1` if any
finding at or above the specified severity is present, and `0` otherwise.
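
For pipelines that drive tools from Python rather than shell, the
documented exit-code contract is sufficient to gate a build. A minimal
sketch (the `src/` path is a placeholder):

```python
import subprocess

# Gate on llmdoctor's documented exit codes: 1 if any finding at or
# above the threshold is present, 0 otherwise.
result = subprocess.run(["llmdoctor", "doctor", "src/", "--fail-on", "HIGH"])
if result.returncode != 0:
    raise SystemExit("llmdoctor: findings at or above HIGH severity")
```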

---

## Example output

```
╭── llmdoctor doctor ───────────────────────────────────────────────────╮
│ Scanned 14 file(s) under src/                                         │
│ Found 3 issue(s)  ·  2 HIGH · 1 MEDIUM                                │
│ Estimated potential savings: ~$340/month  (heuristic)                 │
╰───────────────────────────────────────────────────────────────────────╯

╭── [HIGH] TS103  AgentExecutor with max_iterations=None ───────────────╮
│   file:  src/agent_factory.py:23                                      │
│   code:  agent = AgentExecutor(agent=llm, tools=tools,                │
│                                max_iterations=None)                   │
│   why:   max_iterations=None disables the loop cap. If the agent's    │
│          stop condition fails to trigger, the loop runs unbounded.    │
│          Reported per-session cost in 2026 incidents: $1,000-$5,000.  │
│   fix:   Set max_iterations to a finite value (LangChain default is   │
│          15). Pair with max_execution_time for a wall-clock cap.      │
╰───────────────────────────────────────────────────────────────────────╯
```

Each finding includes the source location, the cost mechanism, a
remediation, and a cost estimate with the assumptions printed inline.
The tool does not state cost figures without disclosing the assumptions
used to derive them.
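
A remediation sketch for the TS103 finding above, assuming LangChain's
`AgentExecutor` keyword arguments; `agent` and `tools` stand in for
whatever your application already constructs:

```python
from langchain.agents import AgentExecutor

def build_executor(agent, tools):
    # Finite loop cap (LangChain's own default is 15) plus a
    # wall-clock cap, per the recommended fix for TS103.
    return AgentExecutor(
        agent=agent,
        tools=tools,
        max_iterations=15,
        max_execution_time=120,  # seconds
    )
```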

---

## Check reference

Each check is identified by a stable `TSnnn` code (TS = "token-saving").
The numbering is structural rather than chronological: the **0xx-series**
fires on direct Anthropic / OpenAI SDK calls, and the **1xx-series**
fires on LangChain wrappers. Pairs like TS010 ↔ TS101 are the same
shape of bug detected at the two surfaces.

The checks cluster into four failure modes: prompt-cache misuse
(TS001 / TS003), missing or oversized output caps (TS010 / TS011 /
TS101 / TS102), model overspecification (TS020), and agent-loop
runaway (TS103 / TS104).

| Code  | Severity | Surface     | Description |
|-------|----------|-------------|-------------|
| TS001 | HIGH     | Anthropic SDK | Dynamic content placed before a `cache_control` marker; invalidates the prompt cache on every call. |
| TS003 | MEDIUM   | Anthropic SDK | Long static system prompt with no `cache_control` marker; pays full input cost on every call when 0.1× cache-read pricing is available. |
| TS010 | HIGH     | OpenAI SDK  | `chat.completions.create()` without `max_tokens` / `max_completion_tokens`; output cost unbounded on a single ramble. |
| TS011 | MEDIUM   | OpenAI / Anthropic SDK | `max_tokens` above 8000; suspected copy-paste default, rarely matched by actual response length. |
| TS020 | MEDIUM   | OpenAI / Anthropic SDK | Premium-tier model (Opus, GPT-5, GPT-4-Turbo, GPT-4o) on a call whose static prompt is short enough that a cheaper tier (Haiku, GPT-4o-mini) is likely to match quality at a fraction of the cost. |
| TS101 | HIGH     | LangChain   | `ChatOpenAI()` instantiated without `max_tokens`. All downstream `.invoke()` calls inherit unbounded output. |
| TS102 | MEDIUM   | LangChain   | `ChatOpenAI` / `ChatAnthropic` with `max_tokens` above 8000; same copy-paste-default risk as TS011, inherited by every downstream `.invoke()`. |
| TS103 | HIGH     | LangChain   | `AgentExecutor` instantiated with `max_iterations=None`; a single stuck session has cost $1,000–$5,000 in tokens in documented 2026 incidents. |
| TS104 | MEDIUM   | LangChain   | `AgentExecutor` with `max_iterations` above 50 — the "bumped the cap as a workaround" anti-pattern. At ~$0.10/turn, 200 iterations is ~$20 per stuck session. |
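
To make the cache-placement pair (TS001 / TS003) concrete, here is a
hypothetical call site against the Anthropic Messages API. The model ID,
prompt, and question are illustrative placeholders, not fixtures from the
tool's test suite:

```python
import datetime
import anthropic

client = anthropic.Anthropic()
STATIC_PROMPT = "You are a support agent for Acme. " * 200  # long, unchanging
timestamp = datetime.datetime.now().isoformat()             # new every call

# TS001 would flag this: the dynamic timestamp precedes the
# cache_control marker, so the cached prefix changes on every request
# and the cache never hits.
client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model ID
    max_tokens=1024,
    system=[
        {"type": "text", "text": f"Current time: {timestamp}"},
        {"type": "text", "text": STATIC_PROMPT,
         "cache_control": {"type": "ephemeral"}},
    ],
    messages=[{"role": "user", "content": "Where is my order?"}],
)

# Fixed: the marked prefix stays byte-identical across calls, so
# subsequent requests read it at 0.1x input pricing; dynamic content
# follows the marker.
client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {"type": "text", "text": STATIC_PROMPT,
         "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": f"Current time: {timestamp}"},
    ],
    messages=[{"role": "user", "content": "Where is my order?"}],
)
```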

---

## Suppression

To disable a specific check on a given line, append a comment in the
following form:

```python
client.chat.completions.create(...)  # llmdoctor: ignore TS010
```

To disable all checks on a given line, use `# llmdoctor: ignore ALL`.

The suppression scope is per-line. Multiple codes may be specified in a
single comment, separated by commas.
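
For example, to suppress TS010 and TS011 on the same call:

```python
response = client.chat.completions.create(...)  # llmdoctor: ignore TS010, TS011
```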

---

## Comparison with adjacent tooling

`llmdoctor` operates statically and is complementary to runtime tools.
The intended usage pattern is to run `llmdoctor` in continuous integration
and run an observability tool in production.

| Tool category        | When it runs    | Catches                  | Network |
|----------------------|-----------------|--------------------------|---------|
| `llmdoctor`          | static (CI)     | cost-leak patterns in source | None |
| Helicone, Langfuse, OpenLLMetry | runtime proxy / SDK | metrics, traces, costs | Required |
| Mem0, Letta          | runtime, agent loop | memory drift          | Required |
| LLMLingua            | runtime, prompt rewrite | token bloat       | Required |

---

## Cost estimate methodology

Cost estimates are heuristic and intended as order-of-magnitude
indicators.

**Formula.** Each estimate is computed as:

```
monthly_usd  =  tokens_per_call × calls_per_day × 30 × $/Mtok ÷ 1,000,000
```

The savings figure for each finding is the difference between the
projected monthly cost of the bug and the projected cost after the
fix.
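
A worked instance of the formula for a TS003 finding, using the default
traffic profile described below; the $3.00/Mtok input rate is
illustrative rather than a value quoted from the bundled pricing table:

```python
# monthly_usd = tokens_per_call x calls_per_day x 30 x $/Mtok / 1e6
tokens_per_call = 3000   # static system prompt (default profile)
calls_per_day = 100      # default profile
rate_per_mtok = 3.00     # USD per million input tokens (assumed)

full = tokens_per_call * calls_per_day * 30 * rate_per_mtok / 1_000_000
cached = full * 0.1      # cache reads priced at 0.1x input
print(f"${full:.2f}/mo uncached, ${cached:.2f}/mo cached, "
      f"~${full - cached:.2f}/mo saved")  # $27.00, $2.70, ~$24.30
```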

**Default traffic profile.** 100 calls per day across a 30-day month,
with a 3000-token system prompt where applicable. Per-check overrides:

- **TS001 / TS003** (cache misuse): 3000-token system prompt × 100
  calls/day; savings = full input cost − 0.1× cache-read cost.
- **TS011 / TS102** (high `max_tokens`): assumes 30% of calls produce
  30% of the cap, compared against a 2048-token baseline.
- **TS020** (premium model on tiny prompt): 200 input + 200 output
  tokens × 1000 calls/day, compared against the cheaper-tier
  alternative (Haiku or `gpt-4o-mini`).
- **TS010 / TS101 / TS103 / TS104**: no monthly figure. These are
  unbounded-cost bugs where the relevant unit is *per-incident*, not
  monthly average — the per-session dollar callout is surfaced in the
  finding's explanation instead of a `$` field.

**Pricing data.** Per-million-token rates are bundled with the package
and were verified against provider pages on 2026-04-30. The table
covers the Claude 4 family (Opus, Sonnet, Haiku) and the OpenAI
commercial line (GPT-5, GPT-4-Turbo, GPT-4o, GPT-4o-mini). Anthropic
cache reads are priced at 0.1× input; OpenAI's automatic prompt cache
is reflected where the provider exposes it.

The assumptions used in each estimate are printed inline with the
finding. Estimates are not invoice predictions; users with traffic
substantially above or below the default profile should scale
accordingly.

If the model name in a call cannot be resolved to an entry in the
bundled pricing table, no cost estimate is produced. This behavior is
deliberate: the tool reports the finding without a dollar figure
rather than emitting a guess.

---

## Capabilities and limitations

| Capability                                                               | Status         |
|--------------------------------------------------------------------------|----------------|
| Detect direct-SDK and LangChain configuration bugs in Python source       | Supported      |
| Apply automatic fixes to source                                           | Not supported  |
| Execute or import the analyzed code                                       | Not supported  |
| Measure live traffic, cache hit rates, or response usage                  | Planned        |
| Analyze TypeScript or JavaScript source                                   | Planned        |
| Recognize LiteLLM, OpenRouter, raw HTTP, or arbitrary wrapper functions   | Not supported  |
| Issue network requests, ship telemetry, or collect usage data             | Not implemented |

If a codebase does not import `anthropic`, `openai`, `langchain_anthropic`,
or `langchain_openai` directly, `llmdoctor` will produce no findings.
This is by design; the tool's matching is intentionally conservative.

---

## Roadmap

| Version  | Status       | Scope                                                                   |
|----------|--------------|-------------------------------------------------------------------------|
| 0.1.0    | Released     | Direct-SDK checks: TS001, TS003, TS010, TS011, TS020.                   |
| 0.2.0    | Released     | LangChain adapter: TS101, TS102, TS103, TS104.                          |
| 0.3.0    | Planned      | LlamaIndex adapter for `Anthropic`, `OpenAI`, and `ReActAgent`.         |
| 0.4.0    | Planned      | TS030 (retry without budget); TS040 (tool-definition repetition).       |
| 0.5.0    | Planned      | Optional runtime sidecar reading `cache_read_input_tokens` from live API responses. |
| 1.0.0    | Planned      | TypeScript and Node.js support across all check classes.                |

---

## Frequently asked questions

**Does the tool execute analyzed code?**
No. The tool uses `ast.parse` exclusively. Analyzed code is never
imported, executed, or compiled.
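
For readers unfamiliar with AST-based linting, the following is a
minimal sketch of the general technique (not llmdoctor's actual
implementation); the source is parsed, never executed:

```python
import ast

source = "client.chat.completions.create(model='gpt-4o')"
tree = ast.parse(source)  # builds a syntax tree; nothing runs

for node in ast.walk(tree):
    if isinstance(node, ast.Call):
        kwargs = {kw.arg for kw in node.keywords}
        if not kwargs & {"max_tokens", "max_completion_tokens"}:
            print(f"line {node.lineno}: call without an output cap")
```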

**Does the tool make network requests?**
No. The package contains no network-related imports. No telemetry,
usage reporting, or version-check beacon is implemented.

**Is LiteLLM, OpenRouter, or a custom wrapper supported?**
Not in the current release. Adapter modules for additional frameworks
are planned. The maintainer welcomes specific patterns observed in
production code.

---

## Contact

The issue tracker is private during the 0.x release series. To report a
bug, suggest a check, or share a real-world cost-leak pattern, contact
the maintainer at:

**issues.llmdoctor@gmail.com**

The maintainer aims to respond to actionable bug reports within a few
business days.

---

## License

MIT. The full license text is bundled with the installed package at
`llmdoctor-<version>.dist-info/licenses/LICENSE`.
