Metadata-Version: 2.4
Name: vllm-doctor
Version: 0.1.0
Summary: Diagnostic tool for vLLM inference servers
Keywords: vllm,llm,inference,diagnostics,prometheus,cli
Author: Amin Alaee
Author-email: Amin Alaee <mohammadamin.alaee@gmail.com>
License-Expression: BSD-3-Clause
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: BSD License
Classifier: Topic :: System :: Monitoring
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Environment :: Console
Classifier: Operating System :: OS Independent
Requires-Dist: httpx>=0.27
Requires-Dist: prometheus-client>=0.20
Requires-Dist: pydantic>=2
Requires-Dist: rich>=13
Requires-Dist: typer>=0.12
Requires-Python: >=3.10
Project-URL: Documentation, https://aminalaee.github.io/vllm-doctor
Project-URL: Issues, https://github.com/aminalaee/vllm-doctor/issues
Project-URL: Source, https://github.com/aminalaee/vllm-doctor
Description-Content-Type: text/markdown

<img src="https://raw.githubusercontent.com/aminalaee/vllm-doctor/main/docs/assets/wordmark.svg" alt="vLLM Doctor" width="360">

<p>
<a href="https://pypi.org/project/vllm-doctor/">
    <img src="https://badge.fury.io/py/vllm-doctor.svg" alt="Package version">
</a>
<a href="https://pypi.org/project/vllm-doctor/">
    <img src="https://img.shields.io/pypi/pyversions/vllm-doctor.svg?color=%2334D058" alt="Supported Python versions">
</a>
</p>

Diagnose vLLM serving issues from `/metrics`.

vLLM Doctor reads production metrics and turns them into operational findings: what looks wrong, how confident the diagnosis is, and which vLLM knobs are worth checking first.

```shell
vllm-doctor --url http://localhost:8000/metrics
```

> vLLM Doctor is not a dashboard replacement. It is a fast diagnostic snapshot for a single server or Prometheus target.

## Why not just a dashboard?

Dashboards show metrics. vLLM Doctor explains inference-system behavior.

|                          | Dashboards | vLLM Doctor |
| ------------------------ | ---------- | ----------- |
| Shows raw metrics        | ✓          | ✓           |
| Explains what's wrong    | ✗          | ✓           |
| Recommends vLLM configs  | ✗          | ✓           |
| Requires setup           | ✓          | ✗           |
| Works on a single server | ✗          | ✓           |

## Installation

With pip:

```shell
pip install vllm-doctor
```

With uv:

```shell
uv tool install vllm-doctor
```

## Quickstart

Direct scrape:

```shell
vllm-doctor --url http://localhost:8000/metrics
```

Through a Prometheus server:

```shell
vllm-doctor --url http://localhost:9090
```

JSON output:

```shell
vllm-doctor --url http://localhost:8000/metrics --format json
```
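
For scripting, the JSON output can be consumed from Python. The sketch below assumes only that `--format json` writes a single JSON document to stdout. It makes no assumption about the report's fields or about the CLI's exit code, and the helper names `doctor_argv` and `run_json` are hypothetical, not part of vLLM Doctor.

```python
import json
import subprocess


def doctor_argv(url: str) -> list[str]:
    """Build the JSON-output command line shown above (hypothetical helper)."""
    return ["vllm-doctor", "--url", url, "--format", "json"]


def run_json(argv: list[str], timeout: float = 30.0):
    """Run a command and decode its stdout as a single JSON document."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    # The exit code is deliberately not checked: whether vllm-doctor exits
    # non-zero on unhealthy findings is not assumed here.
    return json.loads(proc.stdout)


# Example (requires vllm-doctor on PATH and a reachable server):
# report = run_json(doctor_argv("http://localhost:8000/metrics"))
# print(json.dumps(report, indent=2))
```

Printing the decoded report once is a reasonable way to discover its structure before relying on specific keys.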

Verbose:

```shell
vllm-doctor --url http://localhost:8000/metrics --verbose
```

## Example output

```text
─────────── vLLM Doctor  ·  Health: CRITICAL  ·  Window: 5m ────────────

╭─ ✖ KV cache pressure  [high confidence] ─────────────────────────────╮
│   GPU KV cache usage: 94%  ·  Waiting requests: 7                    │
│                                                                      │
│   → Reduce max_num_seqs to limit concurrent sequences                │
│   → Increase gpu_memory_utilization if GPU memory headroom exists    │
╰──────────────────────────────────────────────────────────────────────╯
╭─ ⚠ Queue pressure  [low confidence] ─────────────────────────────────╮
│   Waiting requests: 7                                                │
│                                                                      │
│   → Add replicas or increase concurrency limits                      │
│   → Inspect autoscaling thresholds                                   │
╰──────────────────────────────────────────────────────────────────────╯

─────────────────────────── Observed Metrics ───────────────────────────

  Requests Running                             12
  Requests Waiting                              7
  GPU Cache Usage        ███████████████████░ 94%
  Generation Tokens/s                        42.0
  TTFT p95 (s)                              3.200
  TPOT p95 (s)                              0.050
```
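
To illustrate the kind of rule that could sit behind a finding like "KV cache pressure", here is a minimal sketch in plain Python. It is not vLLM Doctor's actual implementation. The metric names (`vllm:gpu_cache_usage_perc`, `vllm:num_requests_waiting`) are assumptions that vary between vLLM versions, and the 0.90 threshold, the confidence rule, and the function names are hypothetical.

```python
from __future__ import annotations

# Assumed metric names; check your server's /metrics output.
KV_USAGE = "vllm:gpu_cache_usage_perc"
WAITING = "vllm:num_requests_waiting"

SAMPLE = """\
# TYPE vllm:gpu_cache_usage_perc gauge
vllm:gpu_cache_usage_perc{model_name="demo"} 0.94
# TYPE vllm:num_requests_waiting gauge
vllm:num_requests_waiting{model_name="demo"} 7.0
"""


def parse_samples(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition into {metric_name: value}.

    Labels are dropped and the last sample per name wins, which is
    enough for a single-model server but not a general parser.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            name = line[: line.index("{")]
            fields = line[line.rindex("}") + 1 :].split()
        else:
            name, *fields = line.split()
        if fields:
            samples[name] = float(fields[0])  # fields[1], if any, is a timestamp
    return samples


def kv_cache_pressure(samples: dict[str, float], threshold: float = 0.90) -> dict | None:
    """Return a finding when KV cache usage is at or above the threshold."""
    usage = samples.get(KV_USAGE)
    if usage is None or usage < threshold:
        return None
    waiting = samples.get(WAITING, 0.0)
    # Hypothetical rule: a non-empty queue makes the diagnosis more credible.
    confidence = "high" if waiting > 0 else "low"
    return {"finding": "KV cache pressure", "usage": usage,
            "waiting": waiting, "confidence": confidence}


finding = kv_cache_pressure(parse_samples(SAMPLE))
print(finding)
```

Combining a saturation signal with a queueing signal is one way to separate a cache that is merely well used from one that is actively holding requests back.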

## Documentation

Read the full documentation: https://aminalaee.github.io/vllm-doctor
