sigilant-runner · Phi-3.5-mini-instruct · L4 · vllm · 4 configs

Config | TPS (tok/s) | TPS p95 | TTFT (ms) | TTFT p95 | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
INT8_W8A8 · ctx:16384 · kv:k16v16 · long  <- best | 138.7 | n/a | 994.2 | n/a | 7.24 | 2.88 | 100.0 | 100.0 | 100.0 | 100
INT8_W8A8 · ctx:16384 · kv:k8v8 · default | 110.4 | n/a | 1248.8 | n/a | 9.09 | 2.90 | 79.6 | 79.6 | 99.3 | 89
INT8_W8A8 · ctx:8192 · kv:k16v16 · default | 109.3 | n/a | 1261.2 | n/a | 9.19 | 2.88 | 78.8 | 78.8 | 100.0 | 89
INT8_W8A8 · ctx:32768 · kv:k8v8 · long | 105.0 | n/a | 1313.2 | n/a | 9.56 | 2.88 | 75.7 | 75.7 | 100.0 | 88

Best config: INT8_W8A8 · ctx:16384 · kv:k16v16 · long
Auto baseline compare: score Δ=0.00 TPS Δ=0.00 TTFT Δ=0.0ms PPL Δ=0.00
Confidence: target=medium gap_before=11.00% var_before=n/a replay=False(disabled) gap_after=11.00%
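
Note on the derived columns: the report does not state how TPS%, TTFT%, PPL%, and Score are computed. The sketch below reproduces the reported values under assumptions inferred from the numbers alone: each percentage is the ratio to the best value in its column (highest TPS, lowest TTFT, lowest PPL), Score is a 0.25 / 0.25 / 0.5 weighted sum rounded to an integer, and the gap is the difference between the top two rounded scores. The names `score_rows` and `WEIGHTS` are hypothetical and are not part of sigilant-runner.

```python
# Assumed weights (inferred from the table, not documented by the tool).
WEIGHTS = {"tps": 0.25, "ttft": 0.25, "ppl": 0.5}

# (config, TPS, TTFT in ms, PPL) copied from the table above.
ROWS = [
    ("INT8_W8A8 ctx:16384 kv:k16v16 long", 138.7, 994.2, 2.88),
    ("INT8_W8A8 ctx:16384 kv:k8v8 default", 110.4, 1248.8, 2.90),
    ("INT8_W8A8 ctx:8192 kv:k16v16 default", 109.3, 1261.2, 2.88),
    ("INT8_W8A8 ctx:32768 kv:k8v8 long", 105.0, 1313.2, 2.88),
]


def score_rows(rows):
    """Normalize each metric against the best row and compute a weighted score."""
    best_tps = max(r[1] for r in rows)   # higher throughput is better
    best_ttft = min(r[2] for r in rows)  # lower time to first token is better
    best_ppl = min(r[3] for r in rows)   # lower perplexity is better
    out = []
    for name, tps, ttft, ppl in rows:
        tps_pct = 100.0 * tps / best_tps
        ttft_pct = 100.0 * best_ttft / ttft
        ppl_pct = 100.0 * best_ppl / ppl
        score = (WEIGHTS["tps"] * tps_pct
                 + WEIGHTS["ttft"] * ttft_pct
                 + WEIGHTS["ppl"] * ppl_pct)
        out.append({
            "config": name,
            "tps_pct": round(tps_pct, 1),
            "ttft_pct": round(ttft_pct, 1),
            "ppl_pct": round(ppl_pct, 1),
            "score": round(score),
        })
    return out


results = score_rows(ROWS)
ranked = sorted((r["score"] for r in results), reverse=True)
gap = ranked[0] - ranked[1]  # difference between the top two rounded scores

for r in results:
    print(r["config"], r["tps_pct"], r["ttft_pct"], r["ppl_pct"], r["score"])
print("gap:", gap)
```

Under these assumptions the three runner-up configs are separated mainly by throughput and latency, since their PPL values are within 1% of each other; the half-weight on PPL% is why their scores stay close to 89 despite a 20 to 25% throughput deficit.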

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
