sigilant-sweep · Phi-3.5-mini-instruct · L4 · vllm · 4 configs

Config | TPS | TPS p95 | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
FP16_BASELINE · ctx:16384 · kv:k16v16 · long  <- best | 42.6 | n/a | 1051.5 | n/a | 23.59 | 2.68 | 98.6 | 100.0 | 100.0 | 99
FP16_BASELINE · ctx:8192 · kv:k16v16 · default | 43.2 | n/a | 1532.9 | n/a | 23.33 | 2.68 | 100.0 | 68.6 | 100.0 | 94
FP16_BASELINE · ctx:16384 · kv:k8v8 · default | 42.7 | n/a | 1521.7 | n/a | 23.52 | 2.68 | 98.8 | 69.1 | 100.0 | 93
FP16_BASELINE · ctx:32768 · kv:k8v8 · long | 39.6 | n/a | 2091.6 | n/a | 25.34 | 2.70 | 91.7 | 50.3 | 99.3 | 86
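
The TPS%, TTFT%, and PPL% columns look consistent with normalizing each metric against the best value in its column (highest TPS, lowest TTFT, lowest PPL). The report does not state this, so the sketch below is an assumption checked only against the four rows above. The function name `normalize` is hypothetical, and the Score column is not reproduced because its weighting is not given.

```python
# Values copied from the table above: (label, TPS, TTFT in ms, PPL).
rows = [
    ("ctx:16384 kv:k16v16 long", 42.6, 1051.5, 2.68),
    ("ctx:8192 kv:k16v16 default", 43.2, 1532.9, 2.68),
    ("ctx:16384 kv:k8v8 default", 42.7, 1521.7, 2.68),
    ("ctx:32768 kv:k8v8 long", 39.6, 2091.6, 2.70),
]


def normalize(rows):
    """Express each metric as a percentage of the best value in the sweep.

    Assumed rule (not confirmed by the tool): higher TPS is better,
    lower TTFT and lower PPL are better.
    """
    best_tps = max(r[1] for r in rows)
    best_ttft = min(r[2] for r in rows)
    best_ppl = min(r[3] for r in rows)
    result = []
    for label, tps, ttft, ppl in rows:
        result.append((
            label,
            round(100 * (tps / best_tps), 1),    # TPS%
            round(100 * (best_ttft / ttft), 1),  # TTFT%
            round(100 * (best_ppl / ppl), 1),    # PPL%
        ))
    return result


for label, tps_pct, ttft_pct, ppl_pct in normalize(rows):
    print(f"{label}: TPS%={tps_pct} TTFT%={ttft_pct} PPL%={ppl_pct}")
```

Under this assumption the computed percentages match the table for all four configs, which suggests that each column is relative to the best value within this sweep, not to an external baseline.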

Best config: FP16_BASELINE · ctx:16384 · kv:k16v16 · long
Auto baseline compare: score Δ=0.00 TPS Δ=0.00 TTFT Δ=0.0ms PPL Δ=0.00
Confidence: target=medium gap_before=5.05% var_before=n/a replay=False(disabled) gap_after=5.05%

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
