$ hotpath serve-report .hotpath/full_smoke/serve_profile.db
hotpath / Qwen/Qwen2.5-7B-Instruct · vLLM 0.19.0 · 1x H100 80GB
server-log · prefix-caching
Requests    847    completed
Duration    60.8   seconds
Throughput  14.1   req/s
Token Rate  2,847  tok/s
Latency
                   p50        p90        p99
Queue wait         2.1 ms     8.4 ms     41.2 ms
Prefill (server)   10.1 ms    38.3 ms    98.5 ms
Decode (server)    172.0 ms   398.1 ms   842.0 ms
Decode (per-tok)   4.2 ms     5.1 ms     13.9 ms
End-to-end         201.4 ms   450.3 ms   983.1 ms
GPU Phase
Prefill 31.2%
Decode 48.1%
Schedule 8.7%
Idle 12.0%
[Charts: Throughput (tok/s) · Batch Size (requests)]
KV Cache
Hit rate 41.2%
Avg usage 73.2%
Peak usage 91.4%
Eviction rate 12
Cache Hit Distribution
0%       234  (27.6%)
1–25%     68  (8.0%)
25–50%   127  (15.0%)
50–75%   178  (21.0%)
75%+     240  (28.3%)
Prefix Sharing
Unique prefixes 23
Avg req/prefix 36.8
Cacheable tokens 72.1%
Actual hit rate 41.2%
Disaggregation Advisor
Workload class  PREFILL_HEAVY
Median prompt   2.4K tokens
Median output   340 tokens
Prefill contention adds 34% to p99 decode latency
Recommendation  DISAGGREGATE (prefill contention is the dominant bottleneck at p99)
                     Current   Projected   Improvement
Throughput (req/s)   14.1      19.8        +40%
p99 TTFT (ms)        84.5      52.3        -38%
p99 ITL (ms)         9.8       5.2         -47%
Optimal P/D ratio 1:3  (2 prefill, 6 decode)
KV transfer overhead 1.2 ms/req  at 100 Gbps IB
Min network bandwidth 50 Gbps  (below this, disagg hurts)
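
The Improvement column in the advisor table can be sanity-checked by hand. The sketch below assumes that column is the plain relative change from Current to Projected, rounded to a whole percent; the report does not state how it is computed, and `pct_change` is a hypothetical helper, not part of hotpath.

```python
def pct_change(current: float, projected: float) -> float:
    """Relative change from current to projected, in percent."""
    return (projected - current) / current * 100.0


# Values copied from the Current / Projected columns of the report above.
rows = {
    "Throughput (req/s)": (14.1, 19.8),
    "p99 TTFT (ms)": (84.5, 52.3),
    "p99 ITL (ms)": (9.8, 5.2),
}

# Assumption: the report rounds to the nearest whole percent.
improvements = {
    name: round(pct_change(cur, proj)) for name, (cur, proj) in rows.items()
}

for name, pct in improvements.items():
    print(f"{name:<20} {pct:+d}%")
```

For throughput a positive change is the improvement, while for the two latency rows a negative change is the improvement, so the sign has to be read against the metric.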