Metadata-Version: 2.4
Name: vllm-mon
Version: 0.1.0
Summary: Production-grade vLLM metrics monitoring TUI with persistent storage and Grafana-style visualizations
Project-URL: Homepage, https://github.com/renold/vllmtop
Project-URL: Repository, https://github.com/renold/vllmtop
Author: Renold
License-Expression: MIT
Keywords: dashboard,metrics,monitoring,tui,vllm
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: System :: Monitoring
Requires-Python: >=3.11
Requires-Dist: aiosqlite>=0.19.0
Requires-Dist: click>=8.1.0
Requires-Dist: httpx>=0.25.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: textual-plotext>=0.2.0
Requires-Dist: textual>=0.80
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Description-Content-Type: text/markdown

# vllmtop

Production-grade TUI dashboard for monitoring [vLLM](https://github.com/vllm-project/vllm) inference servers. Real-time metrics, persistent storage, and Grafana-style visualizations in your terminal.

## Features

- **Live Dashboard** - Real-time KPI cards, gauge bars, and sparklines with 60s rolling history
- **Historical Explorer** - Time-range analysis with interactive charts
- **Request Breakdown** - Completion outcomes and per-request statistics
- **Cache & System Analytics** - KV cache usage, prefix cache hit rates, and system metrics
- **Configurable Alerts** - Threshold-based alert rules with persistent alert history
- **Multiple Graph Styles** - Line, braille, and block charts (press `g` to cycle)
- **Multi-Server Monitoring** - Monitor multiple vLLM instances from a single dashboard
- **Persistent Storage** - All metrics saved to SQLite with automatic retention management
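
Retention management boils down to periodically deleting rows older than the configured window. The project persists metrics with aiosqlite; the sketch below is a minimal synchronous version using the stdlib `sqlite3` module, with a hypothetical `samples` table and column names chosen for illustration (not the actual schema):

```python
import sqlite3
import time

RETENTION_DAYS = 30  # matches the default --retention value


def prune_old_samples(conn: sqlite3.Connection, retention_days: int = RETENTION_DAYS) -> int:
    """Delete metric rows older than the retention window; return rows removed."""
    cutoff = time.time() - retention_days * 86400
    cur = conn.execute("DELETE FROM samples WHERE timestamp < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Demo against an in-memory database with the hypothetical schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (timestamp REAL, metric TEXT, value REAL)")
now = time.time()
conn.execute("INSERT INTO samples VALUES (?, 'vllm:num_requests_running', 3)", (now,))
conn.execute("INSERT INTO samples VALUES (?, 'vllm:num_requests_running', 5)", (now - 40 * 86400,))
print(prune_old_samples(conn))  # → 1 (only the 40-day-old row is pruned)
```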

## Installation

Requires Python 3.11+.

```bash
pip install .
```

For development:

```bash
pip install -e ".[dev]"
```

## Quick Start

```bash
# Connect to a local vLLM server (default: http://localhost:8000)
vllmtop

# Connect to a remote server
vllmtop --url http://gpu-server:8000

# Use a config file
vllmtop --config config.yaml

# Custom poll interval and retention
vllmtop --url http://localhost:8000 --interval 2 --retention 60
```

## Configuration

Copy the example config and customize it:

```bash
cp config.example.yaml config.yaml
```

```yaml
targets:
  - url: http://localhost:8000
    name: "GPU Server 1"

graph_style: "line"       # line, braille, or block
poll_interval: 1.0        # seconds
db_path: "./vllm_metrics.db"
retention_days: 30

alert_rules:
  - name: "KV Cache Critical"
    metric: "vllm:kv_cache_usage_perc"
    operator: ">"
    threshold: 90.0
    enabled: true
```
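
Evaluating a rule like the one above is a straightforward threshold comparison against the latest sampled value. A minimal sketch, assuming rules arrive as dicts shaped like the YAML entry (the `rule_fires` helper and the supported operator set are illustrative, not the project's actual API):

```python
import operator

# Comparison operators an alert rule may use (assumed set, matching the YAML example)
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le, "==": operator.eq}


def rule_fires(rule: dict, metrics: dict[str, float]) -> bool:
    """Return True when an enabled rule's metric crosses its threshold."""
    if not rule.get("enabled", True):
        return False
    value = metrics.get(rule["metric"])
    if value is None:
        return False  # metric not yet sampled
    return OPS[rule["operator"]](value, rule["threshold"])


rule = {"name": "KV Cache Critical", "metric": "vllm:kv_cache_usage_perc",
        "operator": ">", "threshold": 90.0, "enabled": True}
print(rule_fires(rule, {"vllm:kv_cache_usage_perc": 95.0}))  # True
```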

See [`config.example.yaml`](config.example.yaml) for the full configuration reference.
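
Loading the file amounts to parsing the YAML and layering it over the documented defaults. A minimal sketch using PyYAML (already a dependency); the `load_config` helper and merge strategy are illustrative, and the real loader may validate further:

```python
import yaml  # provided by the pyyaml dependency

# Defaults taken from the CLI options table
DEFAULTS = {
    "graph_style": "line",
    "poll_interval": 1.0,
    "db_path": "./vllm_metrics.db",
    "retention_days": 30,
}


def load_config(path: str) -> dict:
    """Merge a YAML config file over the documented defaults."""
    with open(path) as f:
        data = yaml.safe_load(f) or {}
    return {**DEFAULTS, **data}
```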

## CLI Options

| Option | Default | Description |
|---|---|---|
| `--url` | `http://localhost:8000` | vLLM server URL |
| `--db` | `./vllm_metrics.db` | SQLite database path |
| `--retention` | `30` | Data retention in days |
| `--interval` | `1.0` | Poll interval in seconds |
| `--config` | - | Path to YAML config file |
| `--graph-style` | `line` | Graph style: line, braille, or block |

## Keyboard Shortcuts

| Key | Action |
|---|---|
| `1`-`5` | Switch tabs |
| `g` | Cycle graph style |
| `s` | Save a screenshot |
| `ctrl+p` | Command palette |
| `q` | Quit |

## Metrics Tracked

- **Requests** - Running, waiting, swapped, queue time
- **Cache** - KV cache usage, GPU cache usage, prefix cache hit rate
- **Tokens** - Prompt tokens, generation tokens, totals
- **Latency** - Time-to-first-token (TTFT), time-per-output-token (TPOT), end-to-end latency
- **Throughput** - Prompt and generation throughput (tok/s)
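
vLLM serves these metrics in Prometheus text format at `GET /metrics`. A minimal sketch of fetching and parsing them into a flat name-to-value map; label handling is simplified (labels and timestamps are dropped) and exact metric names vary by vLLM version:

```python
def parse_prometheus(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition into {metric_name: value}, ignoring labels."""
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank, HELP, and TYPE lines
        if "{" in line:
            name = line.split("{", 1)[0]          # name before the label block
            rest = line.rsplit("}", 1)[1].strip()  # value (and optional timestamp) after it
        else:
            name, _, rest = line.partition(" ")
        value_str = rest.split()[0] if rest else ""
        try:
            metrics[name] = float(value_str)
        except ValueError:
            continue
    return metrics


def fetch_metrics(base_url: str = "http://localhost:8000") -> dict[str, float]:
    """Poll a vLLM server's Prometheus endpoint."""
    import httpx  # project dependency; imported here so parsing stays stdlib-only
    resp = httpx.get(f"{base_url}/metrics", timeout=5.0)
    resp.raise_for_status()
    return parse_prometheus(resp.text)
```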

## License

MIT
