sigilant-runner · Phi-3.5-mini-instruct · L4 · vllm · 16 configs

Config | TPS (tok/s) | TPS p95 | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long  <- best | 179.9 | n/a | 146.5 | n/a | 5.58 | 1.85 | 100.0 | 92.5 | 93.0 | 96
AWQ4_MARLIN · ctx:16384 · kv:k16v16 · long | 177.6 | n/a | 135.5 | n/a | 5.65 | 1.97 | 98.7 | 100.0 | 87.3 | 94
GPTQ4_MARLIN · ctx:8192 · kv:k16v16 · default | 179.9 | n/a | 906.2 | n/a | 5.58 | 1.85 | 100.0 | 15.0 | 93.0 | 80
GPTQ4_MARLIN · ctx:16384 · kv:k8v8 · default | 179.4 | n/a | 908.4 | n/a | 5.60 | 1.85 | 99.7 | 14.9 | 93.0 | 80
AWQ4_MARLIN · ctx:16384 · kv:k8v8 · default | 177.1 | n/a | 907.7 | n/a | 5.67 | 1.97 | 98.4 | 14.9 | 87.3 | 77
AWQ4_MARLIN · ctx:8192 · kv:k16v16 · default | 177.6 | n/a | 908.6 | n/a | 5.65 | 1.97 | 98.7 | 14.9 | 87.3 | 77
INT8_W8A8 · ctx:16384 · kv:k16v16 · long | 130.6 | n/a | 1055.6 | n/a | 7.69 | 1.74 | 72.6 | 12.8 | 98.9 | 71
INT8_W8A8 · ctx:16384 · kv:k8v8 · default | 106.4 | n/a | 1295.1 | n/a | 9.43 | 1.72 | 59.1 | 10.5 | 100.0 | 66
INT8_W8A8 · ctx:8192 · kv:k16v16 · default | 106.3 | n/a | 1296.8 | n/a | 9.44 | 1.74 | 59.1 | 10.4 | 98.9 | 65
FP16_BASELINE · ctx:16384 · kv:k16v16 · long | 58.1 | n/a | 176.3 | n/a | 17.28 | 2.80 | 32.3 | 76.9 | 61.4 | 53
GPTQ4_MARLIN · ctx:32768 · kv:k8v8 · long | 36.7 | n/a | 660.9 | n/a | 27.33 | 1.85 | 20.4 | 20.5 | 93.0 | 49
INT8_W8A8 · ctx:32768 · kv:k8v8 · long | 47.5 | n/a | 2904.4 | n/a | 21.15 | 1.84 | 26.4 | 4.7 | 93.5 | 49
AWQ4_MARLIN · ctx:32768 · kv:k8v8 · long | 36.6 | n/a | 635.7 | n/a | 27.43 | 1.97 | 20.3 | 21.3 | 87.3 | 47
FP16_BASELINE · ctx:8192 · kv:k16v16 · default | 59.3 | n/a | 937.8 | n/a | 17.01 | 2.72 | 33.0 | 14.4 | 63.2 | 41
FP16_BASELINE · ctx:16384 · kv:k8v8 · default | 58.1 | n/a | 941.9 | n/a | 17.29 | 2.72 | 32.3 | 14.4 | 63.2 | 41
FP16_BASELINE · ctx:32768 · kv:k8v8 · long | 38.2 | n/a | 1490.3 | n/a | 26.30 | 2.66 | 21.2 | 9.1 | 64.7 | 36
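
The TPS%, TTFT%, and PPL% columns appear to scale each metric against the best value in its column (highest TPS, lowest TTFT, lowest PPL). The Score column is consistent, for all 16 rows, with a 0.4 / 0.2 / 0.4 weighted sum of those percentages. The report does not state the formula or the weights; both are inferred from the numbers above, and the function names are hypothetical. A minimal sketch under those assumptions:

```python
# Raw metrics for three rows of the table: (TPS, TTFT in ms, PPL).
ROWS = {
    "GPTQ4_MARLIN ctx:16384 k16v16 long": (179.9, 146.5, 1.85),
    "AWQ4_MARLIN ctx:16384 k16v16 long": (177.6, 135.5, 1.97),
    "FP16_BASELINE ctx:16384 k16v16 long": (58.1, 176.3, 2.80),
}

# Best value per column across the full 16-config table.
BEST_TPS = 179.9   # highest throughput
BEST_TTFT = 135.5  # lowest time to first token
BEST_PPL = 1.72    # lowest perplexity

# ASSUMPTION: weights inferred from the table, not documented by the tool.
WEIGHTS = (0.4, 0.2, 0.4)  # TPS%, TTFT%, PPL%


def normalize(tps, ttft, ppl):
    """Scale each metric to 0-100 against the best value in its column."""
    return (
        100 * tps / BEST_TPS,    # higher is better
        100 * BEST_TTFT / ttft,  # lower is better
        100 * BEST_PPL / ppl,    # lower is better
    )


def score(tps, ttft, ppl):
    """Weighted sum of the normalized metrics, rounded to an integer."""
    return round(sum(w * x for w, x in zip(WEIGHTS, normalize(tps, ttft, ppl))))


for name, metrics in ROWS.items():
    pct = ", ".join(f"{x:.1f}" for x in normalize(*metrics))
    print(f"{name}: [{pct}] score={score(*metrics)}")
```

Under this reading TTFT carries half the weight of throughput or perplexity, which would explain why the `default` rows with TTFT near 900 ms still score 77 to 80.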

Best config: GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long
Auto baseline compare: score Δ=43.00 TPS Δ=121.80 TTFT Δ=-29.8ms PPL Δ=-0.95
Confidence: target=medium gap_before=2.08% var_before=n/a replay=False (disabled) gap_after=2.08%
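
The baseline deltas and the confidence gap can be reproduced from the table. The sketch below assumes the auto-selected baseline is the FP16_BASELINE row with the same ctx, kv, and profile as the best config, and that the gap is the score difference between the best config and the runner-up, relative to the best score. Both assumptions match the reported figures but are not stated in the report.

```python
# Values copied from the table; key names are illustrative.
best = {"score": 96, "tps": 179.9, "ttft_ms": 146.5, "ppl": 1.85}
baseline = {"score": 53, "tps": 58.1, "ttft_ms": 176.3, "ppl": 2.80}
runner_up_score = 94  # AWQ4_MARLIN, ctx:16384, kv:k16v16, long

# Delta = best minus baseline. Negative is an improvement for TTFT and PPL.
deltas = {key: round(best[key] - baseline[key], 2) for key in best}

# ASSUMPTION: gap = (best score - runner-up score) / best score, in percent.
gap_pct = round(100 * (best["score"] - runner_up_score) / best["score"], 2)

print(deltas)
print(f"gap={gap_pct}%")
```

A negative TTFT or PPL delta means the best config is both faster to first token and lower in perplexity than the FP16 baseline.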

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
