Metadata-Version: 2.4
Name: llmdoctor
Version: 0.2.1
Summary: Find LLM cost leaks before your bill does. Static analysis for Anthropic and OpenAI client code.
Author-email: llmdoctor <issues.llmdoctor@gmail.com>
Maintainer-email: llmdoctor <issues.llmdoctor@gmail.com>
License: MIT
License-File: LICENSE
Keywords: ai,anthropic,claude,cost,cost-optimization,gpt,linter,llm,openai,prompt-cache,static-analysis,tokens
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: click>=8.1
Requires-Dist: rich>=13.0
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Description-Content-Type: text/markdown

# llmdoctor

A static analyzer for Python codebases that detects LLM cost-leak
patterns before deployment.

[![PyPI version](https://img.shields.io/pypi/v/llmdoctor.svg)](https://pypi.org/project/llmdoctor/)
[![Python versions](https://img.shields.io/pypi/pyversions/llmdoctor.svg)](https://pypi.org/project/llmdoctor/)
[![License](https://img.shields.io/pypi/l/llmdoctor.svg)](https://pypi.org/project/llmdoctor/)

---

## Overview

`llmdoctor` reads Python source code and reports configuration patterns
that have been observed to cause disproportionate token consumption in
production LLM deployments. Each finding includes the affected source
location, an explanation of the cost mechanism, a recommended remediation,
and a heuristic monthly cost estimate based on a stated traffic profile.

The tool supports two integration surfaces:

- The official Anthropic and OpenAI Python SDKs (`anthropic.Anthropic`,
  `openai.OpenAI`).
- The LangChain framework (`langchain_anthropic.ChatAnthropic`,
  `langchain_openai.ChatOpenAI`, `langchain.agents.AgentExecutor`).

`llmdoctor` performs no code execution, issues no network requests, and
emits no telemetry. It is intended for use in code review and continuous
integration pipelines.

---

## Installation

```bash
pip install llmdoctor
```

Requires Python 3.9 or later.

---

## Usage

```bash
llmdoctor doctor .                  # scan the current directory
llmdoctor doctor src/agent.py       # scan a single file
llmdoctor doctor . --json           # emit JSON for downstream tooling
llmdoctor doctor . --fail-on HIGH   # exit non-zero if any HIGH finding
```

The `--fail-on` flag is intended for CI integration. The accepted values
are `HIGH`, `MEDIUM`, `LOW`, and `INFO`. The exit code is `1` if any
finding at or above the specified severity is present, and `0` otherwise.

---

## Example output

```
╭── llmdoctor doctor ───────────────────────────────────────────────────╮
│ Scanned 14 file(s) under src/                                         │
│ Found 3 issue(s)  ·  2 HIGH · 1 MEDIUM                                │
│ Estimated potential savings: ~$340/month  (heuristic)                 │
╰───────────────────────────────────────────────────────────────────────╯

╭── [HIGH] TS103  AgentExecutor with max_iterations=None ───────────────╮
│   file:  src/agent_factory.py:23                                      │
│   code:  agent = AgentExecutor(agent=llm, tools=tools,                │
│                                max_iterations=None)                   │
│   why:   max_iterations=None disables the loop cap. If the agent's    │
│          stop condition fails to trigger, the loop runs unbounded.    │
│          Reported per-session cost in 2026 incidents: $1,000-$5,000.  │
│   fix:   Set max_iterations to a finite value (LangChain default is   │
│          15). Pair with max_execution_time for a wall-clock cap.      │
╰───────────────────────────────────────────────────────────────────────╯
```

Each finding includes the source location, the cost mechanism, a
remediation, and a cost estimate with the assumptions printed inline.
Cost figures are never stated without the assumptions used to derive
them.

---

## Check reference (v0.2.0)

| Code  | Severity | Surface     | Description |
|-------|----------|-------------|-------------|
| TS001 | HIGH     | Anthropic SDK | Dynamic content placed before a `cache_control` marker; invalidates the prompt cache on every call. |
| TS003 | MEDIUM   | Anthropic SDK | Static system prompt exceeding the cache-eligibility threshold but lacking a `cache_control` marker. |
| TS010 | HIGH     | OpenAI SDK  | `chat.completions.create()` invoked without `max_tokens` or `max_completion_tokens`. |
| TS011 | MEDIUM   | OpenAI / Anthropic SDK | `max_tokens` set above 8000, suggesting a copy-pasted default; actual responses rarely approach this length. |
| TS020 | MEDIUM   | OpenAI / Anthropic SDK | A premium-tier model (Opus, GPT-5, GPT-4-Turbo, GPT-4o) is used in a call whose static content is below the tiny-prompt threshold. |
| TS101 | HIGH     | LangChain   | `ChatOpenAI()` instantiated without `max_tokens`. All downstream `.invoke()` calls inherit unbounded output. |
| TS102 | MEDIUM   | LangChain   | `ChatOpenAI` or `ChatAnthropic` instantiated with `max_tokens` above 8000. |
| TS103 | HIGH     | LangChain   | `AgentExecutor` instantiated with `max_iterations=None`. |
| TS104 | MEDIUM   | LangChain   | `AgentExecutor` instantiated with `max_iterations` above 50. |
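
As a concrete illustration of the TS010 shape, the sketch below shows the
flagged call and its remediation. It uses a `MagicMock` stand-in for
`openai.OpenAI` so it runs offline; the model name and the `512` cap are
placeholder values, not recommendations:

```python
from unittest.mock import MagicMock

client = MagicMock()  # stand-in for openai.OpenAI(); illustration only

# Flagged by TS010: no max_tokens / max_completion_tokens, so the
# response length (and therefore output-token spend) is unbounded.
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this ticket."}],
)

# Remediated: an explicit output ceiling.
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this ticket."}],
    max_tokens=512,
)
```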

---

## Suppression

To disable a specific check on a given line, append a comment in the
following form:

```python
client.chat.completions.create(...)  # llmdoctor: ignore TS010
```

To disable all checks on a given line, use `# llmdoctor: ignore ALL`.

The suppression scope is per-line. Multiple codes may be specified in a
single comment, separated by commas.
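
For example, the following comment (on a call sketched with placeholder
arguments) suppresses two codes on one line:

```python
client.chat.completions.create(...)  # llmdoctor: ignore TS010, TS020
```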

---

## Comparison with adjacent tooling

`llmdoctor` operates statically and is complementary to runtime tools.
The intended usage pattern is to run `llmdoctor` in continuous integration
and run an observability tool in production.

| Tool category        | When it runs    | Catches                  | Network |
|----------------------|-----------------|--------------------------|---------|
| `llmdoctor`          | static (CI)     | cost-leak patterns in source | None |
| Helicone, Langfuse, OpenLLMetry | runtime proxy / SDK | metrics, traces, costs | Required |
| Mem0, Letta          | runtime, agent loop | memory drift          | Required |
| LLMLingua            | runtime, prompt rewrite | token bloat       | Required |

---

## Cost estimate methodology

Cost estimates are heuristic and intended as order-of-magnitude
indicators. Each estimate is computed from:

- The pricing table in the installed package
  (`llmdoctor/pricing.py`), verified against provider pages on
  2026-04-30.
- A default traffic profile of 100 calls per day across a 30-day month,
  with a 3000-token system prompt where applicable.

The assumptions used in each estimate are printed inline with the
finding. Estimates are not invoice predictions; users with traffic
substantially above or below the default profile should scale
accordingly.
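
The arithmetic behind the default profile can be sketched as follows.
The $3-per-million-token input price is a placeholder chosen for
illustration, not a figure from `llmdoctor/pricing.py`:

```python
CALLS_PER_DAY = 100          # default traffic profile
DAYS_PER_MONTH = 30
SYSTEM_PROMPT_TOKENS = 3000  # per-call system prompt, where applicable
PRICE_PER_MTOK_USD = 3.00    # hypothetical input price, illustration only

monthly_input_tokens = CALLS_PER_DAY * DAYS_PER_MONTH * SYSTEM_PROMPT_TOKENS
monthly_cost = monthly_input_tokens / 1_000_000 * PRICE_PER_MTOK_USD
print(monthly_input_tokens, round(monthly_cost, 2))  # 9000000 27.0
```

Scaling the traffic profile scales the estimate linearly, which is why
the assumptions are printed alongside every figure.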

If the model name in a call cannot be resolved to a known entry in the
pricing table, no cost estimate is produced. This behavior is
deliberate: the tool reports a finding without a dollar figure rather
than emit a guess.

---

## Pre-publication audit

The following pre-publication audit was performed on v0.2.0. The full
test suite (30 tests) passes in continuous integration on every release.

**Checker correctness.** Eleven AST shapes were exercised as fixtures
that must not produce findings, among them: async calls, `**kwargs`
unpacking, walrus expressions, multi-target assignments, annotated
assignments, augmented assignments, empty system arrays, bound-method
assignment, mock clients, and deeply nested calls. No false positives
were observed.

**Input safety.** Five concrete defects were resolved before release:
UTF-8 BOM crash on files written by Windows Notepad; out-of-memory
risk on multi-megabyte generated `.py` files (resolved with a 5 MB
size cap configurable via `LLMDOCTOR_MAX_FILE_BYTES`); unhandled
`ValueError` from `ast.parse` on certain binary content; unhandled
`RecursionError` from the visitor on minified source; and rich-markup
injection through user-controlled file paths and code snippets.

**Security review.** The package contains no usage of `eval`, `exec`,
or `compile(..., 'exec')`. The package does not import `socket`,
`requests`, `httpx`, or `urllib`. No telemetry or usage reporting is
implemented or planned.

**Documented limitations.** Bound-method assignment to local variables,
multi-target assignments, mock clients with realistic-looking model
names, calls through arbitrary wrapper functions, LiteLLM, OpenRouter,
raw HTTP, and TypeScript codebases are not currently detected. Each
limitation is documented in source so the tool does not silently
overstate coverage.
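
For example, the bound-method limitation means the second call below
escapes detection. The sketch uses a `MagicMock` stand-in for
`openai.OpenAI` so it runs without network access:

```python
from unittest.mock import MagicMock

client = MagicMock()  # stand-in for openai.OpenAI(); illustration only

# Direct attribute-chain call: the shape the matcher targets.
client.chat.completions.create(model="gpt-4o", messages=[])

# Bound-method assignment: per the limitations above, this indirection
# is not currently traced back to the SDK call.
create = client.chat.completions.create
create(model="gpt-4o", messages=[])
```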

---

## Capabilities and limitations

| Capability                                                               | Status         |
|--------------------------------------------------------------------------|----------------|
| Detect direct-SDK and LangChain configuration bugs in Python source       | Supported      |
| Apply automatic fixes to source                                           | Not supported  |
| Execute or import the analyzed code                                       | Not supported  |
| Measure live traffic, cache hit rates, or response usage                  | Planned        |
| Analyze TypeScript or JavaScript source                                   | Planned        |
| Recognize LiteLLM, OpenRouter, raw HTTP, or arbitrary wrapper functions   | Not supported  |
| Issue network requests, ship telemetry, or collect usage data             | Not implemented |

If a codebase does not import `anthropic`, `openai`, `langchain_anthropic`,
or `langchain_openai` directly, `llmdoctor` will produce no findings.
This is by design; the tool's matching is intentionally conservative.

---

## Roadmap

| Version  | Status       | Scope                                                                   |
|----------|--------------|-------------------------------------------------------------------------|
| 0.1.0    | Released     | Direct-SDK checks: TS001, TS003, TS010, TS011, TS020.                   |
| 0.2.0    | Released     | LangChain adapter: TS101, TS102, TS103, TS104.                          |
| 0.3.0    | Planned      | LlamaIndex adapter for `Anthropic`, `OpenAI`, and `ReActAgent`.         |
| 0.4.0    | Planned      | TS030 (retry without budget); TS040 (tool-definition repetition).       |
| 0.5.0    | Planned      | Optional runtime sidecar reading `cache_read_input_tokens` from live API responses. |
| 1.0.0    | Planned      | TypeScript and Node.js support across all check classes.                |

---

## Frequently asked questions

**Does the tool execute analyzed code?**
No. The tool uses `ast.parse` exclusively. Analyzed code is never
imported, executed, or compiled.
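
A minimal illustration of the parse-only approach (the source string is
arbitrary):

```python
import ast

source = "client.chat.completions.create(model='gpt-4o')\n"
tree = ast.parse(source)  # builds a syntax tree; executes nothing
calls = [n for n in ast.walk(tree) if isinstance(n, ast.Call)]
print(len(calls))  # 1
```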

**Does the tool make network requests?**
No. The package contains no network-related imports. No telemetry,
usage reporting, or version-check beacon is implemented.

**How are mocked clients in test code handled?**
Mock-client patterns are exercised in the test suite as fixtures that
must not produce false positives. If a false positive does occur,
suppress the finding with a `# llmdoctor: ignore <CODE>` comment and
report the case to the maintainer (see Contact below).

**Is LiteLLM, OpenRouter, or a custom wrapper supported?**
Not in the current release. Adapter modules for additional frameworks
are planned. The maintainer welcomes specific patterns observed in
production code.

---

## Contact

The issue tracker is private during the 0.x release series. To report a
bug, suggest a check, or share a real-world cost-leak pattern, contact
the maintainer at:

**issues.llmdoctor@gmail.com**

The maintainer aims to respond to actionable bug reports within a few
business days.

---

## License

MIT. The full license text is bundled with the installed package at
`llmdoctor-<version>.dist-info/licenses/LICENSE`.
