sigilant-runner · Phi-3.5-mini-instruct · L4 · vllm · 15 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
VLLM_F16_Q · ctx:16384 · kv:k8v8 · default  <- best | 57.2 | n/a | 895.1 | n/a | 14.04 | 3.02 | 100.0 | 100.0 | 100.0 | 100
VLLM_F16_Q · ctx:8192 · kv:k16v16 · default | 57.2 | n/a | 895.2 | n/a | 14.04 | 3.02 | 100.0 | 100.0 | 100.0 | 100
VLLM_F16_B · ctx:16384 · kv:k16v16 · long | 57.2 | n/a | 895.3 | n/a | 14.04 | 3.02 | 100.0 | 100.0 | 100.0 | 100
VLLM_F16_Q · ctx:16384 · kv:k16v16 · long | 57.2 | n/a | 895.4 | n/a | 14.05 | 3.02 | 100.0 | 100.0 | 100.0 | 100
VLLM_F16_B · ctx:16384 · kv:k8v8 · default | 57.1 | n/a | 895.9 | n/a | 14.05 | 3.02 | 99.8 | 99.9 | 100.0 | 100
VLLM_F16_B · ctx:32768 · kv:k8v8 · long | 57.1 | n/a | 896.3 | n/a | 14.06 | 3.02 | 99.8 | 99.9 | 100.0 | 100
VLLM_F16_B · ctx:8192 · kv:k16v16 · default | 57.0 | n/a | 897.5 | n/a | 14.08 | 3.02 | 99.7 | 99.7 | 100.0 | 100
VLLM_BF16_B · ctx:16384 · kv:k16v16 · long | 57.0 | n/a | 897.5 | n/a | 14.08 | 3.06 | 99.7 | 99.7 | 98.7 | 99
VLLM_AUTO_F · ctx:32768 · kv:k8v8 · long | 57.0 | n/a | 897.6 | n/a | 14.08 | 3.06 | 99.7 | 99.7 | 98.7 | 99
VLLM_AUTO_F · ctx:16384 · kv:k16v16 · long | 57.0 | n/a | 897.7 | n/a | 14.08 | 3.06 | 99.7 | 99.7 | 98.7 | 99
VLLM_BF16_B · ctx:8192 · kv:k16v16 · default | 57.0 | n/a | 897.7 | n/a | 14.08 | 3.06 | 99.7 | 99.7 | 98.7 | 99
VLLM_AUTO_F · ctx:8192 · kv:k16v16 · default | 57.0 | n/a | 897.9 | n/a | 14.08 | 3.06 | 99.7 | 99.7 | 98.7 | 99
VLLM_AUTO_F · ctx:16384 · kv:k8v8 · default | 57.0 | n/a | 898.0 | n/a | 14.09 | 3.06 | 99.7 | 99.7 | 98.7 | 99
VLLM_BF16_B · ctx:16384 · kv:k8v8 · default | 57.0 | n/a | 898.4 | n/a | 14.09 | 3.06 | 99.7 | 99.6 | 98.7 | 99
VLLM_BF16_B · ctx:32768 · kv:k8v8 · long | 57.0 | n/a | 898.9 | n/a | 14.10 | 3.04 | 99.7 | 99.6 | 99.3 | 99

Best config: VLLM_F16_Q · ctx:16384 · kv:k8v8 · default
Auto baseline compare: score Δ=0.00 TPS Δ=0.00 TTFT Δ=0.0ms PPL Δ=0.00
Confidence: target=medium gap_before=0.00% var_before=n/a replay=False(disabled) gap_after=0.00%

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
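For reference, perplexity (PPL) is conventionally the exponential of the mean negative log-likelihood per token. Lower is better, which is consistent with PPL% treating the lowest PPL as 100%. The report does not say which evaluation text, tokenizer, or context length sigilant-runner uses, so this is only a minimal sketch of the standard definition. The `perplexity` helper is hypothetical, and the sketch assumes you already have per-token natural-log probabilities.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Standard perplexity: exp(-(1/N) * sum of per-token natural-log probabilities)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every observed token has perplexity 4,
# i.e. it is as uncertain as a uniform choice among 4 tokens.
print(perplexity([math.log(0.25)] * 8))
```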
