sigilant-runner · Phi-3.5-mini-instruct · A10G · vllm · 4 configs

Config | TPS (tok/s) | TPS p95 | TTFT (ms) | TTFT p95 | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
AWQ4_MARLIN · ctx:16384 · kv:k16v16 · long  <- best | 178.5 | n/a | 131.3 | n/a | 5.63 | 3.04 | 99.9 | 100.0 | 99.7 | 100
AWQ4_MARLIN · ctx:16384 · kv:k8v8 · default | 178.6 | n/a | 708.4 | n/a | 5.62 | 3.04 | 100.0 | 18.5 | 99.7 | 84
AWQ4_MARLIN · ctx:8192 · kv:k16v16 · default | 178.0 | n/a | 731.1 | n/a | 5.64 | 3.03 | 99.7 | 18.0 | 100.0 | 83
AWQ4_MARLIN · ctx:32768 · kv:k8v8 · long | 40.8 | n/a | 1330.5 | n/a | 24.61 | 3.05 | 22.8 | 9.9 | 99.3 | 51

Best config: AWQ4_MARLIN · ctx:16384 · kv:k16v16 · long
Auto baseline compare: score Δ=0.00 TPS Δ=0.00 TTFT Δ=0.0ms PPL Δ=0.00
Confidence: target=medium gap_before=16.00% var_before=n/a replay=False(disabled) gap_after=16.00%

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
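
Note on the derived columns: the report does not state how TPS%, TTFT%, PPL%, or Score are computed. The sketch below is a reconstruction from the four rows only, not the runner's documented method. It assumes each percentage is relative to the best value in its column (highest TPS, lowest TTFT, lowest PPL) and that Score is a weighted sum with weights 0.4 / 0.2 / 0.4. Those weights were inferred by fitting the reported scores and may not hold for other runs. The names ROWS, WEIGHTS, score_rows, and score_gap are hypothetical.

```python
# Rows copied from the report table:
# (config, TPS in tok/s, TTFT in ms, PPL, reported Score)
ROWS = [
    ("AWQ4_MARLIN ctx:16384 kv:k16v16 long", 178.5, 131.3, 3.04, 100),
    ("AWQ4_MARLIN ctx:16384 kv:k8v8 default", 178.6, 708.4, 3.04, 84),
    ("AWQ4_MARLIN ctx:8192 kv:k16v16 default", 178.0, 731.1, 3.03, 83),
    ("AWQ4_MARLIN ctx:32768 kv:k8v8 long", 40.8, 1330.5, 3.05, 51),
]

# ASSUMPTION: weights inferred from the four reported scores,
# not taken from any sigilant-runner documentation.
WEIGHTS = {"tps": 0.4, "ttft": 0.2, "ppl": 0.4}


def score_rows(rows):
    """Return per-config percentages and a weighted score."""
    best_tps = max(r[1] for r in rows)   # higher throughput is better
    best_ttft = min(r[2] for r in rows)  # lower latency is better
    best_ppl = min(r[3] for r in rows)   # lower perplexity is better
    results = []
    for name, tps, ttft, ppl, _reported in rows:
        tps_pct = 100.0 * tps / best_tps
        ttft_pct = 100.0 * best_ttft / ttft
        ppl_pct = 100.0 * best_ppl / ppl
        score = (WEIGHTS["tps"] * tps_pct
                 + WEIGHTS["ttft"] * ttft_pct
                 + WEIGHTS["ppl"] * ppl_pct)
        results.append({
            "config": name,
            "tps_pct": tps_pct,
            "ttft_pct": ttft_pct,
            "ppl_pct": ppl_pct,
            "score": score,
        })
    return results


def score_gap(results):
    """Difference between the two highest rounded scores."""
    top = sorted((round(r["score"]) for r in results), reverse=True)
    return top[0] - top[1]


if __name__ == "__main__":
    results = score_rows(ROWS)
    for r in results:
        print(f'{r["config"]}: TPS%={r["tps_pct"]:.1f} '
              f'TTFT%={r["ttft_pct"]:.1f} PPL%={r["ppl_pct"]:.1f} '
              f'Score={round(r["score"])}')
    print("gap between top two scores:", score_gap(results))
```

Under these assumptions the computed percentages and rounded scores agree with every row of the table, and the difference between the top two scores equals the 16.00% reported as gap_before. That agreement is consistent with the reconstruction but does not confirm it.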
