sigilant-runner · Phi-3.5-mini-instruct · A10G · vllm · 16 configs

Config | TPS (tok/s) | TPS p95 | TTFT (ms) | TTFT p95 | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long  <- best | 181.0 | n/a | 105.1 | n/a | 5.55 | 1.85 | 99.5 | 100.0 | 93.0 | 97
AWQ4_MARLIN · ctx:16384 · kv:k16v16 · long | 179.4 | n/a | 117.9 | n/a | 5.60 | 1.90 | 98.6 | 89.1 | 90.5 | 93
GPTQ4_MARLIN · ctx:16384 · kv:k8v8 · default | 181.9 | n/a | 676.2 | n/a | 5.52 | 1.85 | 100.0 | 15.5 | 93.0 | 80
GPTQ4_MARLIN · ctx:8192 · kv:k16v16 · default | 181.4 | n/a | 687.1 | n/a | 5.53 | 1.85 | 99.7 | 15.3 | 93.0 | 80
AWQ4_MARLIN · ctx:8192 · kv:k16v16 · default | 179.5 | n/a | 671.1 | n/a | 5.59 | 1.90 | 98.7 | 15.7 | 90.5 | 79
AWQ4_MARLIN · ctx:16384 · kv:k8v8 · default | 179.5 | n/a | 717.4 | n/a | 5.59 | 1.90 | 98.7 | 14.7 | 90.5 | 79
INT8_W8A8 · ctx:16384 · kv:k16v16 · long | 140.7 | n/a | 979.8 | n/a | 7.14 | 1.72 | 77.4 | 10.7 | 100.0 | 73
INT8_W8A8 · ctx:16384 · kv:k8v8 · default | 116.6 | n/a | 1181.8 | n/a | 8.61 | 1.74 | 64.1 | 8.9 | 98.9 | 67
INT8_W8A8 · ctx:8192 · kv:k16v16 · default | 116.6 | n/a | 1182.0 | n/a | 8.61 | 1.74 | 64.1 | 8.9 | 98.9 | 67
FP16_BASELINE · ctx:16384 · kv:k16v16 · long | 58.6 | n/a | 138.2 | n/a | 17.15 | 2.75 | 32.2 | 76.0 | 62.5 | 53
GPTQ4_MARLIN · ctx:32768 · kv:k8v8 · long | 43.2 | n/a | 621.2 | n/a | 23.24 | 1.85 | 23.7 | 16.9 | 93.0 | 50
INT8_W8A8 · ctx:32768 · kv:k8v8 · long | 53.4 | n/a | 2579.2 | n/a | 18.78 | 1.84 | 29.4 | 4.1 | 93.5 | 50
AWQ4_MARLIN · ctx:32768 · kv:k8v8 · long | 43.2 | n/a | 591.5 | n/a | 23.24 | 1.97 | 23.7 | 17.8 | 87.3 | 48
FP16_BASELINE · ctx:16384 · kv:k8v8 · default | 58.7 | n/a | 710.9 | n/a | 17.10 | 2.68 | 32.3 | 14.8 | 64.2 | 42
FP16_BASELINE · ctx:8192 · kv:k16v16 · default | 59.9 | n/a | 766.6 | n/a | 16.84 | 2.68 | 32.9 | 13.7 | 64.2 | 42
FP16_BASELINE · ctx:32768 · kv:k8v8 · long | 43.0 | n/a | 1232.2 | n/a | 23.35 | 2.70 | 23.6 | 8.5 | 63.7 | 37
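
The TPS%, TTFT%, and PPL% columns appear to be each metric normalized against the best value in the run (highest TPS, lowest TTFT, lowest PPL). The report does not state how Score is derived. The sketch below assumes weights of 0.4 / 0.2 / 0.4, which were inferred from the rows above and happen to reproduce every reported score; they are not documented behaviour of sigilant-runner. The names `ROWS`, `WEIGHTS`, and `score_rows` are hypothetical.

```python
# Rows copied from the table: (config, TPS, TTFT in ms, PPL, reported Score).
ROWS = [
    ("GPTQ4_MARLIN ctx:16384 k16v16 long", 181.0, 105.1, 1.85, 97),
    ("AWQ4_MARLIN ctx:16384 k16v16 long", 179.4, 117.9, 1.90, 93),
    ("GPTQ4_MARLIN ctx:16384 k8v8 default", 181.9, 676.2, 1.85, 80),
    ("GPTQ4_MARLIN ctx:8192 k16v16 default", 181.4, 687.1, 1.85, 80),
    ("AWQ4_MARLIN ctx:8192 k16v16 default", 179.5, 671.1, 1.90, 79),
    ("AWQ4_MARLIN ctx:16384 k8v8 default", 179.5, 717.4, 1.90, 79),
    ("INT8_W8A8 ctx:16384 k16v16 long", 140.7, 979.8, 1.72, 73),
    ("INT8_W8A8 ctx:16384 k8v8 default", 116.6, 1181.8, 1.74, 67),
    ("INT8_W8A8 ctx:8192 k16v16 default", 116.6, 1182.0, 1.74, 67),
    ("FP16_BASELINE ctx:16384 k16v16 long", 58.6, 138.2, 2.75, 53),
    ("GPTQ4_MARLIN ctx:32768 k8v8 long", 43.2, 621.2, 1.85, 50),
    ("INT8_W8A8 ctx:32768 k8v8 long", 53.4, 2579.2, 1.84, 50),
    ("AWQ4_MARLIN ctx:32768 k8v8 long", 43.2, 591.5, 1.97, 48),
    ("FP16_BASELINE ctx:16384 k8v8 default", 58.7, 710.9, 2.68, 42),
    ("FP16_BASELINE ctx:8192 k16v16 default", 59.9, 766.6, 2.68, 42),
    ("FP16_BASELINE ctx:32768 k8v8 long", 43.0, 1232.2, 2.70, 37),
]

# ASSUMPTION: weights inferred from the table, not taken from the tool.
WEIGHTS = {"tps": 0.4, "ttft": 0.2, "ppl": 0.4}


def score_rows(rows):
    """Return (config, TPS%, TTFT%, PPL%, score) for each row."""
    best_tps = max(r[1] for r in rows)    # higher is better
    best_ttft = min(r[2] for r in rows)   # lower is better
    best_ppl = min(r[3] for r in rows)    # lower is better
    out = []
    for name, tps, ttft, ppl, _reported in rows:
        tps_pct = round(100 * tps / best_tps, 1)
        ttft_pct = round(100 * best_ttft / ttft, 1)
        ppl_pct = round(100 * best_ppl / ppl, 1)
        score = round(
            WEIGHTS["tps"] * tps_pct
            + WEIGHTS["ttft"] * ttft_pct
            + WEIGHTS["ppl"] * ppl_pct
        )
        out.append((name, tps_pct, ttft_pct, ppl_pct, score))
    return out


if __name__ == "__main__":
    for (name, *_rest, score), row in zip(score_rows(ROWS), ROWS):
        print(f"{name}: computed={score} reported={row[4]}")
```

Under these assumed weights, TTFT carries half the weight of throughput or perplexity, which would explain why the `default` rows with TTFT near 700 ms still score around 80.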

Best config: GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long
Auto baseline compare: score Δ=44.00 TPS Δ=122.40 TTFT Δ=-33.1ms PPL Δ=-0.90
Confidence: target=medium gap_before=4.12% var_before=n/a replay=False(disabled) gap_after=4.12%
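
The summary figures can be checked by hand against the table. The sketch below assumes the auto baseline is the FP16_BASELINE · ctx:16384 · kv:k16v16 · long row (it is the only FP16 row that reproduces all four deltas) and that the gap is the relative score difference between the best config and the runner-up. Both are inferences from the numbers, not documented definitions; `deltas` and `relative_gap` are hypothetical helper names.

```python
# Values copied from the table rows.
best = {"score": 97, "tps": 181.0, "ttft_ms": 105.1, "ppl": 1.85}
# ASSUMPTION: the baseline is the FP16 row with the same ctx/kv/profile.
baseline = {"score": 53, "tps": 58.6, "ttft_ms": 138.2, "ppl": 2.75}
runner_up_score = 93  # AWQ4_MARLIN ctx:16384 k16v16 long


def deltas(best_row, base_row):
    """Best minus baseline for every metric, rounded to 2 decimals."""
    return {k: round(best_row[k] - base_row[k], 2) for k in best_row}


def relative_gap(best_score, second_score):
    """ASSUMED definition: score gap as a percentage of the best score."""
    return round(100 * (best_score - second_score) / best_score, 2)


if __name__ == "__main__":
    print(deltas(best, baseline))
    print(relative_gap(best["score"], runner_up_score))
```

Negative deltas for TTFT and PPL are improvements, since lower is better for both.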

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
