sigilant-runner · Phi-3.5-mini-instruct · A10G · vllm · 4 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
INT8_W8A8 · ctx:16384 · kv:k16v16 · long  <- best | 59.3 | n/a | 834.6 | n/a | 16.94 | 1.74 | 100.0 | 57.7 | 98.9 | 91
INT8_W8A8 · ctx:8192 · kv:k16v16 · default | 59.3 | n/a | 1395.7 | n/a | 16.92 | 1.72 | 100.0 | 34.5 | 100.0 | 87
INT8_W8A8 · ctx:16384 · kv:k8v8 · default | 3.4 | n/a | 481.3 | n/a | 446.91 | 1.74 | 5.7 | 100.0 | 98.9 | 62
INT8_W8A8 · ctx:32768 · kv:k8v8 · long | 2.3 | n/a | 694.7 | n/a | 645.08 | 1.84 | 3.9 | 69.3 | 93.5 | 53
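The TPS%, TTFT% and PPL% columns and the Score appear to be derived from the raw metrics. The minimal Python sketch below reproduces every printed value. It assumes three things: TPS% is normalized to the highest TPS, TTFT% and PPL% are normalized to the lowest TTFT and PPL, and Score is a 0.4/0.2/0.4 weighted sum. The weights are inferred only because they match all four rows; they are not taken from sigilant-runner documentation. ROWS, WEIGHTS and normalize are hypothetical names.

```python
# Hypothetical reconstruction of the derived table columns.
# Assumptions (inferred from the printed numbers, not from sigilant-runner docs):
#   TPS%  = TPS / max(TPS) * 100             (higher throughput is better)
#   TTFT% = min(TTFT) / TTFT * 100           (lower latency is better)
#   PPL%  = min(PPL) / PPL * 100             (lower perplexity is better)
#   Score = 0.4*TPS% + 0.2*TTFT% + 0.4*PPL%   (weights are a fitted guess)

# (TPS tok/s, TTFT ms, PPL) copied from the table
ROWS = {
    "ctx:16384 kv:k16v16 long": (59.3, 834.6, 1.74),
    "ctx:8192 kv:k16v16 default": (59.3, 1395.7, 1.72),
    "ctx:16384 kv:k8v8 default": (3.4, 481.3, 1.74),
    "ctx:32768 kv:k8v8 long": (2.3, 694.7, 1.84),
}

WEIGHTS = (0.4, 0.2, 0.4)  # hypothetical (TPS, TTFT, PPL) weights


def normalize(rows):
    """Return (TPS%, TTFT%, PPL%, Score) per config, rounded like the table."""
    best_tps = max(tps for tps, _, _ in rows.values())
    best_ttft = min(ttft for _, ttft, _ in rows.values())
    best_ppl = min(ppl for _, _, ppl in rows.values())
    w_tps, w_ttft, w_ppl = WEIGHTS
    out = {}
    for name, (tps, ttft, ppl) in rows.items():
        tps_pct = 100.0 * tps / best_tps
        ttft_pct = 100.0 * best_ttft / ttft
        ppl_pct = 100.0 * best_ppl / ppl
        score = w_tps * tps_pct + w_ttft * ttft_pct + w_ppl * ppl_pct
        out[name] = (
            round(tps_pct, 1),
            round(ttft_pct, 1),
            round(ppl_pct, 1),
            round(score),
        )
    return out


if __name__ == "__main__":
    for name, cols in normalize(ROWS).items():
        print(name, cols)
```

Under these assumptions, a KV-cache or context change that cuts throughput by roughly 95% (the k8v8 rows) costs about 38 Score points. This happens even though those rows have the best first-token latency, because TTFT carries half the weight of TPS or PPL.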

Best config: INT8_W8A8 · ctx:16384 · kv:k16v16 · long
Auto-baseline comparison: score Δ=0.00 TPS Δ=0.00 TTFT Δ=0.0ms PPL Δ=0.00
Confidence: target=medium gap_before=4.40% var_before=n/a replay=False (disabled) gap_after=4.40%
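The gap_before value matches the lead of the best Score (91) over the runner-up (87), expressed relative to the best. The Python sketch below assumes that definition and uses the rounded Scores from the table. The function name score_gap_pct is hypothetical, and sigilant-runner may compute the gap from unrounded scores.

```python
# Hypothetical reconstruction of "gap_before" from the Confidence line.
# Assumption: gap = (best - runner_up) / best * 100, using the rounded Scores
# printed in the table. Not confirmed by sigilant-runner documentation.


def score_gap_pct(scores):
    """Percentage by which the top score exceeds the second-best, relative to the top."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    best, runner_up = sorted(scores, reverse=True)[:2]
    return 100.0 * (best - runner_up) / best


if __name__ == "__main__":
    print(f"gap_before={score_gap_pct([91, 87, 62, 53]):.2f}%")  # gap_before=4.40%
```

Because replay is disabled, no re-measurement takes place, which is consistent with gap_after being identical to gap_before.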

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
