Metadata-Version: 2.4
Name: vllmpytop
Version: 0.1.0
Summary: A btop-style terminal UI for monitoring a vLLM instance and its GPU in real time.
Author-email: Theodore Kirby <theo@kirby.dev>
License-Expression: MIT
Project-URL: Homepage, https://github.com/theo-kirby/vllmtop
Project-URL: Repository, https://github.com/theo-kirby/vllmtop
Project-URL: Issues, https://github.com/theo-kirby/vllmtop/issues
Keywords: vllm,gpu,monitoring,tui,terminal,btop,nvml,prometheus,llm
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console :: Curses
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: System :: Monitoring
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: nvidia-ml-py>=12.0
Requires-Dist: prometheus-client>=0.20
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Dynamic: license-file

# vllmtop

A **btop-style terminal UI** for monitoring a running [vLLM](https://github.com/vllm-project/vllm)
instance and its GPU in real time. It uses hand-rolled braille charts, a
responsive curses layout, and a non-blocking background poller, so the UI never
stalls on network or NVML latency.

```
╭─┐¹ gpu ┌──────────────────────────────────────────────────────────────────╮
│ NVIDIA GeForce RTX 5090   util  86%   50°C  319/600W  SM 2857MHz  fan 31%   │
│ ⣿⣿⣿⣿ … braille utilisation chart …                                          │
│ VRAM 27.3GB/31.8GB ████████████████████░░░  86%                             │
│ PWR  ████████████░░░░░░░░░░░░░░░░░░░░░░░░░░  53%                             │
╰─────────────────────────────────────────────────────────────────────────────╯
╭─┐² throughput ┌────────────╮╭─┐³ requests ┌──────────────╮
│ gen 149 tok/s   ⣀⣤⣶⣿        ││ running ████░░ 1            │
│ prompt 3.5k tok/s          ││ waiting ██████ 2            │
╰─┘ tok/s └──────────────────╯╰─┘ queue └───────────────────╯
╭─┐⁴ latency ┌───────────────╮╭─┐⁵ cache ┌──────────────────╮
│ TTFT 964ms  TPOT 6ms …     ││ KV █░░░  3%  prefix 0.0%    │
╰─┘ recent avg └─────────────╯╰─────────────────────────────╯
```

The boxes use rounded corners, superscript panel numbers in the title tabs, and
a secondary label on the bottom edge, matching [btop](https://github.com/aristocratos/btop)'s box style.

## What it shows

- **GPU** (via NVML / `pynvml`): utilisation %, VRAM used/total, temperature,
  power draw vs. limit, SM clock, fan speed, colour-coded with green/yellow/red
  thresholds.
- **Throughput**: generation tok/s and prompt tok/s (rates derived from vLLM
  counters), shown as large numbers alongside braille charts.
- **Requests / Queue**: running vs. waiting requests and preemptions.
- **Latency** (recent average over the last poll interval — far more useful live
  than the cumulative average): TTFT, inter-token (TPOT), end-to-end, queue time.
- **Cache**: KV-cache usage % and prefix-cache hit rate.

Data comes from vLLM's Prometheus `/metrics` endpoint plus in-process NVML
polling. If vLLM goes away (e.g. during a container restart), the UI shows a
disconnect banner, keeps the GPU panel live, and reconnects automatically.
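
The disconnect-and-reconnect behaviour can be illustrated with a small poller that records a failed scrape as a "disconnected" snapshot instead of raising, and simply tries again on the next tick. This is a minimal sketch, not vllmtop's actual code: the `Snapshot`, `Poller`, and `fetch_metrics` names are hypothetical, and the choice to treat any `OSError` as "server unreachable" is an assumption.

```python
import threading
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot type; the real one may carry parsed metrics."""
    timestamp: float
    connected: bool               # False while /metrics is unreachable
    metrics_text: Optional[str]   # raw exposition text, None when disconnected


def fetch_metrics(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> str:
    """Fetch the Prometheus exposition text with stdlib urllib."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


class Poller:
    """Polls in a daemon thread and keeps only the latest snapshot."""

    def __init__(self, fetch: Callable[[], str], interval: float = 1.0) -> None:
        self._fetch = fetch
        self._interval = interval
        self._lock = threading.Lock()
        self._latest: Optional[Snapshot] = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def latest(self) -> Optional[Snapshot]:
        # Called from the UI thread; only holds the lock for a reference copy.
        with self._lock:
            return self._latest

    def poll_once(self) -> None:
        try:
            snap = Snapshot(time.monotonic(), True, self._fetch())
        except OSError:
            # urllib.error.URLError and socket timeouts are OSError subclasses.
            # Assumption: any of them means "show the disconnect banner".
            snap = Snapshot(time.monotonic(), False, None)
        with self._lock:
            self._latest = snap

    def _run(self) -> None:
        while not self._stop.is_set():
            self.poll_once()
            # Event.wait doubles as an interruptible sleep.
            self._stop.wait(self._interval)
```

Because a failure only changes the `connected` flag, the loop keeps running, so the first successful scrape after a restart flips the state back without any explicit reconnect logic.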

## Install

Requires Python 3.10+ on Linux (curses is stdlib). A working NVIDIA driver is
needed for the GPU panel.

```bash
pip install .
# or, for development:
pip install -e ".[dev]"
```

Dependencies: `nvidia-ml-py` (NVML bindings) and `prometheus-client` (exposition
parser). The `/metrics` fetch uses stdlib `urllib`.

## Usage

```bash
vllmtop                              # monitor http://localhost:8000
vllmtop --url http://host:8000       # a remote vLLM server
vllmtop --interval 0.5               # poll twice a second
vllmtop --no-gpu                     # skip the GPU panel
python -m vllmtop                    # same thing, without the entry point
```

The server URL can also be set via the `VLLMTOP_URL` environment variable.

### Options

| Flag | Default | Description |
|------|---------|-------------|
| `--url` | `http://localhost:8000` | vLLM base URL (env `VLLMTOP_URL`) |
| `--interval` | `1.0` | poll interval in seconds |
| `--gpu-index` | `0` | NVML GPU index |
| `--no-gpu` | off | disable the GPU panel |
| `--dump-json` | off | collect two snapshots an interval apart, print JSON, exit (no TTY needed) |
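
For illustration, the options in the table above could be declared with `argparse` roughly as follows. This is a sketch under assumptions rather than the tool's real parser: the `build_parser` helper is hypothetical, and the precedence shown (an explicit `--url` beats `VLLMTOP_URL`, which beats the built-in default) is assumed, not confirmed.

```python
import argparse
import os
from typing import Mapping, Optional


def build_parser(env: Optional[Mapping[str, str]] = None) -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="vllmtop")
    parser.add_argument(
        "--url",
        # Assumed precedence: flag > VLLMTOP_URL > built-in default.
        default=env.get("VLLMTOP_URL", "http://localhost:8000"),
        help="vLLM base URL",
    )
    parser.add_argument("--interval", type=float, default=1.0,
                        help="poll interval in seconds")
    parser.add_argument("--gpu-index", type=int, default=0,
                        help="NVML GPU index")
    parser.add_argument("--no-gpu", action="store_true",
                        help="disable the GPU panel")
    parser.add_argument("--dump-json", action="store_true",
                        help="print a JSON snapshot and exit")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```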

### Keybindings

| Key | Action |
|-----|--------|
| `q` / `Esc` | quit |
| `+` / `-` | faster / slower refresh |
| `p` | pause / resume polling |
| `1`–`5` | toggle a panel on/off (¹gpu ²throughput ³requests ⁴latency ⁵cache) |
| `h` / `?` | toggle help overlay |

Each panel's title carries a superscript number (btop-style) showing the key
that toggles it. Hiding panels reflows the rest to fill the freed space.

### Headless smoke test

`--dump-json` collects two snapshots an interval apart (so rates are populated),
prints the result as JSON, and exits. It works without a TTY, which makes it
handy for CI or for verifying connectivity:

```bash
python -m vllmtop --dump-json --url http://localhost:8000
```

## How it works

- A **background poller thread** scrapes `/metrics` and polls NVML every
  `interval` seconds, storing the latest combined snapshot under a lock. This
  keeps all I/O latency off the render path.
- The **UI loop** wakes on a short tick (250 ms), reads the latest snapshot,
  appends derived values (rates, recent-average latencies) to per-series ring
  buffers, and redraws — so render cadence is independent of poll cadence.
- **Counters → rates**: `Δvalue / Δt`, guarded against `Δt ≤ 0` and counter
  resets. **Histograms → recent average**: `Δsum / Δcount` between polls.
- **Braille charts**: each cell is a 2×4 Unicode braille dot matrix, so a chart
  `w` cells wide and `h` cells tall has `2w × 4h` dots of resolution, which
  gives the smooth btop look.
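
The counter and histogram arithmetic from the list above can be sketched as two small pure functions. The function names are hypothetical, and returning `None` when a value is undefined (for example after a counter reset) is an assumption about how such cases might be handled, not a description of vllmtop's internals.

```python
from typing import Optional


def counter_rate(prev: float, curr: float, dt: float) -> Optional[float]:
    """Per-second rate between two counter samples, or None if undefined."""
    if dt <= 0:
        return None   # duplicate sample or non-monotonic timestamps
    delta = curr - prev
    if delta < 0:
        return None   # counter went backwards: treat as a reset, skip this tick
    return delta / dt


def recent_average(
    prev_sum: float, curr_sum: float, prev_count: float, curr_count: float
) -> Optional[float]:
    """Mean of the observations recorded between two histogram samples."""
    d_count = curr_count - prev_count
    d_sum = curr_sum - prev_sum
    if d_count <= 0 or d_sum < 0:
        return None   # nothing observed this interval, or the histogram reset
    return d_sum / d_count


if __name__ == "__main__":
    # 149 tokens generated over one second.
    print(counter_rate(1000.0, 1149.0, 1.0))
    # 4 requests added 2.0 s of latency in total since the previous poll.
    print(recent_average(10.0, 12.0, 100.0, 104.0))
```

Dividing the change in a histogram's `_sum` by the change in its `_count` gives the mean of only the observations made since the previous poll, which is why it tracks current behaviour better than the cumulative `sum / count`.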
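
To make the braille resolution concrete, here is a minimal sketch of a filled area chart built from braille cells. It is not the project's renderer: the `braille_chart` name, the assumption that inputs are already normalised to the range 0 to 1, and the bottom-up fill style are all illustrative choices. The dot-to-bit mapping follows the Unicode braille block, which starts at U+2800.

```python
from typing import List, Sequence

BRAILLE_BASE = 0x2800

# Bit for each dot in a cell, indexed as DOT_BITS[row][col], row 0 at the top.
DOT_BITS = (
    (0x01, 0x08),
    (0x02, 0x10),
    (0x04, 0x20),
    (0x40, 0x80),
)


def braille_chart(values: Sequence[float], width: int, height: int) -> List[str]:
    """Render values in [0, 1] as `height` strings of `width` braille cells."""
    if width < 1 or height < 1:
        return []
    dots_w, dots_h = 2 * width, 4 * height

    # Keep the newest samples and left-pad so the latest one sits at the right edge.
    samples = list(values)[-dots_w:]
    samples = [0.0] * (dots_w - len(samples)) + samples

    cells = [[0] * width for _ in range(height)]
    for x, value in enumerate(samples):
        clamped = min(max(value, 0.0), 1.0)
        filled = round(clamped * dots_h)       # number of dots lit in this column
        for k in range(filled):
            y = dots_h - 1 - k                 # fill from the bottom dot row upwards
            cells[y // 4][x // 2] |= DOT_BITS[y % 4][x % 2]

    return ["".join(chr(BRAILLE_BASE + bits) for bits in row) for row in cells]


if __name__ == "__main__":
    ramp = [i / 15 for i in range(16)]         # 16 samples fill 8 cells
    for line in braille_chart(ramp, width=8, height=2):
        print(line)
```

Each sample occupies one dot column, so a chart of `width` cells displays `2 * width` samples, and each value is quantised to one of `4 * height + 1` levels.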

## Development

```bash
pytest        # parser-against-fixture, rate math, braille rendering
```

## License

MIT — see [LICENSE](LICENSE).
