sigilant-sweep · Phi-3.5-mini-instruct · L4 · vllm · 16 configs

Config | TPS (tok/s) | TPS p95 | TTFT (ms) | TTFT p95 | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long  <- best | 82.3 | n/a | 1120.7 | n/a | 12.20 | 1.85 | 100.0 | 74.2 | 93.0 | 92
AWQ4_MARLIN · ctx:16384 · kv:k16v16 · long | 81.8 | n/a | 1140.0 | n/a | 12.27 | 1.90 | 99.4 | 72.9 | 90.5 | 91
INT8_W8A8 · ctx:16384 · kv:k16v16 · long | 59.0 | n/a | 831.0 | n/a | 17.01 | 1.72 | 71.7 | 100.0 | 100.0 | 89
GPTQ4_MARLIN · ctx:16384 · kv:k8v8 · default | 82.2 | n/a | 1587.9 | n/a | 12.22 | 1.85 | 99.9 | 52.3 | 93.0 | 88
GPTQ4_MARLIN · ctx:8192 · kv:k16v16 · default | 82.2 | n/a | 1605.5 | n/a | 12.21 | 1.85 | 99.9 | 51.8 | 93.0 | 87
AWQ4_MARLIN · ctx:8192 · kv:k16v16 · default | 81.6 | n/a | 1571.1 | n/a | 12.30 | 1.90 | 99.1 | 52.9 | 90.5 | 86
AWQ4_MARLIN · ctx:16384 · kv:k8v8 · default | 81.8 | n/a | 1637.1 | n/a | 12.28 | 1.90 | 99.4 | 50.8 | 90.5 | 86
INT8_W8A8 · ctx:16384 · kv:k8v8 · default | 59.1 | n/a | 1313.3 | n/a | 16.99 | 1.72 | 71.8 | 63.3 | 100.0 | 81
INT8_W8A8 · ctx:8192 · kv:k16v16 · default | 59.0 | n/a | 1356.3 | n/a | 17.03 | 1.74 | 71.7 | 61.3 | 98.9 | 80
INT8_W8A8 · ctx:32768 · kv:k8v8 · long | 42.4 | n/a | 1357.2 | n/a | 23.66 | 1.84 | 51.5 | 61.2 | 93.5 | 70
GPTQ4_MARLIN · ctx:32768 · kv:k8v8 · long | 46.6 | n/a | 1663.9 | n/a | 21.55 | 1.85 | 56.6 | 49.9 | 93.0 | 70
AWQ4_MARLIN · ctx:32768 · kv:k8v8 · long | 44.7 | n/a | 1689.5 | n/a | 22.47 | 1.97 | 54.3 | 49.2 | 87.3 | 66
FP16_BASELINE · ctx:16384 · kv:k16v16 · long | 42.1 | n/a | 1067.8 | n/a | 23.87 | 2.80 | 51.2 | 77.8 | 61.4 | 61
FP16_BASELINE · ctx:16384 · kv:k8v8 · default | 42.3 | n/a | 1515.7 | n/a | 23.75 | 2.75 | 51.4 | 54.8 | 62.5 | 57
FP16_BASELINE · ctx:8192 · kv:k16v16 · default | 42.7 | n/a | 1536.4 | n/a | 23.60 | 2.68 | 51.9 | 54.1 | 64.2 | 57
FP16_BASELINE · ctx:32768 · kv:k8v8 · long | 39.2 | n/a | 2066.5 | n/a | 25.59 | 2.70 | 47.6 | 40.2 | 63.7 | 53
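The relative columns and Score can be reproduced from the raw metrics. Below is a minimal Python sketch. It assumes each % column is normalized to the best value in the sweep (highest TPS, lowest TTFT, lowest PPL) and that Score = 0.4*TPS% + 0.2*TTFT% + 0.4*PPL%, weighted on unrounded percentages. These weights are inferred because they reproduce every Score in the table. They are not taken from sigilant-sweep documentation. `Row`, `score_rows` and the weight constants are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical weights, inferred from the table (not documented by sigilant-sweep).
W_TPS, W_TTFT, W_PPL = 0.4, 0.2, 0.4


@dataclass
class Row:
    config: str
    tps: float      # tokens/s, higher is better
    ttft_ms: float  # time to first token, lower is better
    ppl: float      # perplexity, lower is better


def score_rows(rows):
    """Return (config, TPS%, TTFT%, PPL%, Score) per row, normalized to the sweep's best values."""
    best_tps = max(r.tps for r in rows)
    best_ttft = min(r.ttft_ms for r in rows)
    best_ppl = min(r.ppl for r in rows)
    out = []
    for r in rows:
        tps_pct = 100.0 * r.tps / best_tps        # higher is better: value / best
        ttft_pct = 100.0 * best_ttft / r.ttft_ms  # lower is better: best / value
        ppl_pct = 100.0 * best_ppl / r.ppl        # lower is better: best / value
        # Weight the unrounded percentages, then round only for display.
        score = W_TPS * tps_pct + W_TTFT * ttft_pct + W_PPL * ppl_pct
        out.append((r.config, round(tps_pct, 1), round(ttft_pct, 1), round(ppl_pct, 1), round(score)))
    return out


# Subset of the sweep. It contains the best TPS, TTFT and PPL, so normalization matches the full table.
ROWS = [
    Row("GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long", 82.3, 1120.7, 1.85),
    Row("INT8_W8A8 · ctx:16384 · kv:k16v16 · long", 59.0, 831.0, 1.72),
    Row("GPTQ4_MARLIN · ctx:8192 · kv:k16v16 · default", 82.2, 1605.5, 1.85),
    Row("FP16_BASELINE · ctx:16384 · kv:k16v16 · long", 42.1, 1067.8, 2.80),
    Row("FP16_BASELINE · ctx:32768 · kv:k8v8 · long", 39.2, 2066.5, 2.70),
]

if __name__ == "__main__":
    for row in score_rows(ROWS):
        print(" | ".join(str(v) for v in row))
```

Normalizing against the sweep's own best values means every % column, and therefore Score, shifts if a config is added or removed.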

Best config: GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long
Auto baseline compare (best vs FP16_BASELINE · ctx:16384 · kv:k16v16 · long): score Δ=+31.00 TPS Δ=+40.20 TTFT Δ=+52.9ms PPL Δ=-0.95
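The printed deltas match the FP16_BASELINE row that shares the best config's ctx, kv and profile: 92 - 61 = 31, 82.3 - 42.1 = 40.2, 1120.7 - 1067.8 = 52.9 ms and 1.85 - 2.80 = -0.95. The following is a minimal sketch of that comparison. It assumes the baseline is selected this way, since the report does not state the rule. `split_config`, `find_baseline` and `baseline_deltas` are hypothetical helpers, not sigilant-sweep APIs.

```python
def split_config(config):
    """Hypothetical: split 'QUANT · ctx:N · kv:KV · PROFILE' into (QUANT, 'ctx:N · kv:KV · PROFILE')."""
    quant, _, rest = config.partition(" · ")
    return quant, rest


def find_baseline(rows, best_config, baseline_quant="FP16_BASELINE"):
    """ASSUMPTION: the baseline is the FP16 row with the same ctx, kv and profile as the best config."""
    _, key = split_config(best_config)
    for row in rows:
        quant, rest = split_config(row["config"])
        if quant == baseline_quant and rest == key:
            return row
    raise LookupError(f"no {baseline_quant} row matching {key!r}")


def baseline_deltas(best, baseline):
    """Best minus baseline for each metric."""
    return {
        "score": best["score"] - baseline["score"],
        "tps": best["tps"] - baseline["tps"],              # positive: faster decode
        "ttft_ms": best["ttft_ms"] - baseline["ttft_ms"],  # positive: slower first token
        "ppl": best["ppl"] - baseline["ppl"],              # negative: lower (better) perplexity
    }


ROWS = [
    {"config": "GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long", "score": 92, "tps": 82.3, "ttft_ms": 1120.7, "ppl": 1.85},
    {"config": "FP16_BASELINE · ctx:16384 · kv:k16v16 · long", "score": 61, "tps": 42.1, "ttft_ms": 1067.8, "ppl": 2.80},
    {"config": "FP16_BASELINE · ctx:16384 · kv:k8v8 · default", "score": 57, "tps": 42.3, "ttft_ms": 1515.7, "ppl": 2.75},
]

if __name__ == "__main__":
    best = ROWS[0]
    base = find_baseline(ROWS, best["config"])
    print(base["config"], baseline_deltas(best, base))
```

Note the sign convention. The best config is about twice as fast in decode and has lower PPL. However, its TTFT is 52.9 ms higher than the matching FP16 row.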
Confidence: target=medium gap_before=1.09% var_before=n/a replay=False(disabled) gap_after=1.09%
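gap_before is consistent with the relative gap between the two highest Scores, (92 - 91) / 92 ≈ 1.09%, which separates GPTQ4_MARLIN and AWQ4_MARLIN at ctx:16384 · kv:k16v16 · long. The following is a minimal sketch of that calculation. It assumes the gap is taken on the displayed integer Scores and expressed relative to the best one. `score_gap_pct` is hypothetical and is not the tool's code. With replay disabled, gap_after presumably just repeats gap_before.

```python
def score_gap_pct(scores):
    """Relative gap between the top two scores, as a percentage of the best.

    ASSUMPTION: computed on the rounded integer Scores shown in the table.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    best, runner_up = sorted(scores, reverse=True)[:2]
    return 100.0 * (best - runner_up) / best


if __name__ == "__main__":
    # Top Scores from the sweep: 92 (GPTQ4_MARLIN long) vs 91 (AWQ4_MARLIN long).
    print(f"gap={score_gap_pct([92, 91, 89, 88]):.2f}%")  # prints "gap=1.09%"
```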

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
