sigilant-runner · Phi-3.5-mini-instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
IQ3_M · ctx:8192 · kv:k16v16 · default  <- best | 71.6 | 72.7 | 1788.2 | 1827.1 | 13.97 | 7.79 | 100.0 | 100.0 | 89.1 | 96
Q4_K_M · ctx:8192 · kv:k16v16 · default | 67.1 | 67.5 | 1907.5 | 1927.2 | 14.90 | 7.12 | 93.3 | 94.3 | 97.5 | 95
IQ3_M · ctx:16384 · kv:k16v16 · long | 70.5 | 71.1 | 1815.6 | 1842.8 | 14.18 | 7.79 | 98.1 | 98.8 | 89.1 | 95
IQ3_M · ctx:16384 · kv:k8v8 · default | 70.0 | 70.3 | 1828.5 | 1859.4 | 14.29 | 7.79 | 97.2 | 98.0 | 89.1 | 94
Q4_K_M · ctx:16384 · kv:k16v16 · long | 65.9 | 66.6 | 1941.0 | 1944.1 | 15.16 | 7.12 | 91.8 | 93.1 | 97.5 | 94
Q4_K_M · ctx:16384 · kv:k8v8 · default | 64.6 | 65.4 | 1981.5 | 2003.4 | 15.48 | 7.12 | 90.1 | 90.7 | 97.5 | 93
IQ3_M · ctx:32768 · kv:k8v8 · long | 67.7 | 67.8 | 1890.3 | 1919.0 | 14.77 | 7.79 | 93.9 | 94.9 | 89.1 | 92
Q5_K_M · ctx:8192 · kv:k16v16 · default | 60.9 | 62.0 | 2100.2 | 2111.2 | 16.41 | 7.01 | 85.2 | 85.8 | 99.0 | 91
Q4_K_M · ctx:32768 · kv:k8v8 · long | 62.7 | 63.1 | 2042.6 | 2063.9 | 15.96 | 7.12 | 87.2 | 88.0 | 97.5 | 91
Q5_K_M · ctx:16384 · kv:k16v16 · long | 60.0 | 60.6 | 2131.7 | 2169.6 | 16.65 | 7.01 | 83.6 | 84.0 | 99.0 | 90
Q5_K_M · ctx:16384 · kv:k8v8 · default | 58.9 | 59.6 | 2171.7 | 2192.5 | 16.97 | 7.01 | 82.1 | 82.8 | 99.0 | 89
Q5_K_M · ctx:32768 · kv:k8v8 · long | 57.3 | 57.8 | 2234.1 | 2280.5 | 17.45 | 7.01 | 79.8 | 80.1 | 99.0 | 88
Q8_0 · ctx:8192 · kv:k16v16 · default | 48.7 | 49.1 | 2627.0 | 2631.1 | 20.52 | 6.94 | 67.8 | 68.8 | 100.0 | 81
Q8_0 · ctx:16384 · kv:k8v8 · default | 47.3 | 47.5 | 2704.5 | 2708.5 | 21.13 | 6.94 | 65.7 | 66.8 | 100.0 | 80
Q8_0 · ctx:16384 · kv:k16v16 · long | 48.3 | 48.7 | 2651.8 | 2790.1 | 20.72 | 6.94 | 67.2 | 66.5 | 100.0 | 80
Q8_0 · ctx:32768 · kv:k8v8 · long | 45.6 | 46.2 | 2807.1 | 2880.9 | 21.93 | 6.94 | 63.6 | 63.6 | 100.0 | 78
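The report does not say how the normalized columns and Score are derived. The sketch below reproduces them for the four ctx:8192 · kv:k16v16 · default rows under formulas inferred by matching the table. These are assumptions, not documented sigilant-runner behavior: TPS% and TTFT% average the mean and p95 ratios against the best value, PPL% is lowest PPL over this PPL, and Score is a 0.4/0.2/0.4 weighted sum. The weights `W_TPS`, `W_TTFT`, `W_PPL` are hypothetical names.

```python
# Sketch: recompute the TPS%, TTFT%, PPL% and Score columns from raw metrics.
# ASSUMPTION: the formulas and the 0.4/0.2/0.4 weights below were inferred by
# matching the table; sigilant-runner does not document them.
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    tps: float        # tokens/s (higher is better)
    tps_p95: float
    ttft: float       # ms (lower is better)
    ttft_p95: float
    ppl: float        # perplexity (lower is better)


# Hypothetical weights, inferred from the table rather than read from a config.
W_TPS, W_TTFT, W_PPL = 0.4, 0.2, 0.4

# The four ctx:8192 · kv:k16v16 · default rows; together they contain every
# column's best value, so normalization matches the full 16-config run.
ROWS = [
    Row("IQ3_M", 71.6, 72.7, 1788.2, 1827.1, 7.79),
    Row("Q4_K_M", 67.1, 67.5, 1907.5, 1927.2, 7.12),
    Row("Q5_K_M", 60.9, 62.0, 2100.2, 2111.2, 7.01),
    Row("Q8_0", 48.7, 49.1, 2627.0, 2631.1, 6.94),
]


def normalize(rows):
    best_tps = max(r.tps for r in rows)
    best_tps_p95 = max(r.tps_p95 for r in rows)
    best_ttft = min(r.ttft for r in rows)
    best_ttft_p95 = min(r.ttft_p95 for r in rows)
    best_ppl = min(r.ppl for r in rows)
    out = {}
    for r in rows:
        # Throughput: ratio to the best value, averaged over mean and p95.
        tps_pct = 100 * (r.tps / best_tps + r.tps_p95 / best_tps_p95) / 2
        # Latency: best value over this value, averaged over mean and p95.
        ttft_pct = 100 * (best_ttft / r.ttft + best_ttft_p95 / r.ttft_p95) / 2
        # Quality: lowest perplexity over this perplexity.
        ppl_pct = 100 * best_ppl / r.ppl
        score = W_TPS * tps_pct + W_TTFT * ttft_pct + W_PPL * ppl_pct
        out[r.name] = (round(tps_pct, 1), round(ttft_pct, 1),
                       round(ppl_pct, 1), round(score))
    return out


if __name__ == "__main__":
    for name, cols in normalize(ROWS).items():
        print(name, *cols)
```

Under these inferred weights, throughput and quality each carry 40% and TTFT 20%. That is why IQ3_M ranks first even though its PPL% (89.1) is the lowest in the run.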

Best config: IQ3_M · ctx:8192 · kv:k16v16 · default
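The config labels appear to encode llama.cpp runtime settings. Below is a hypothetical helper, `label_to_args`, that turns a label such as the best config into `llama-server` arguments. It assumes `ctx:N` maps to `--ctx-size N` and that `kv:k16v16` / `kv:k8v8` mean an f16 or q8_0 K/V cache (`--cache-type-k` / `--cache-type-v`). The model path pattern is made up, and the `default`/`long` profile is ignored because the report does not say what it controls.

```python
# Hypothetical helper: map a sigilant-runner config label to llama-server args.
# ASSUMPTIONS (not documented in the report):
#   - "ctx:N" corresponds to llama.cpp's --ctx-size N
#   - "kv:k16v16" means an f16 K/V cache and "kv:k8v8" a q8_0 K/V cache
#   - the model filename pattern is invented for illustration
#   - the "default"/"long" profile is ignored because its meaning is unknown

KV_CACHE_TYPES = {"16": "f16", "8": "q8_0"}


def label_to_args(label: str, model_dir: str = "models") -> list[str]:
    # Labels look like "IQ3_M · ctx:8192 · kv:k16v16 · default".
    quant, ctx, kv, _profile = (part.strip() for part in label.split("·"))
    ctx_size = ctx.removeprefix("ctx:")
    k_bits, v_bits = kv.removeprefix("kv:k").split("v")
    model_path = f"{model_dir}/Phi-3.5-mini-instruct-{quant}.gguf"  # hypothetical
    return [
        "llama-server",
        "-m", model_path,
        "--ctx-size", ctx_size,
        "--cache-type-k", KV_CACHE_TYPES[k_bits],
        "--cache-type-v", KV_CACHE_TYPES[v_bits],
    ]


if __name__ == "__main__":
    print(" ".join(label_to_args("IQ3_M · ctx:8192 · kv:k16v16 · default")))
```

Depending on the llama.cpp build, a quantized V cache (q8_0) may also require flash attention to be enabled. Check `llama-server --help` for the version in use.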
Auto baseline compare vs Q8_0 · ctx:8192 · kv:k16v16 · default:
  score Δ=15.00 TPS Δ=22.90 TTFT Δ=-838.8ms PPL Δ=0.85
  TPS p95 Δ=23.60 TTFT p95 Δ=-804.0ms
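The deltas are best-config values minus baseline values. The sketch below reproduces them, assuming the baseline is the Q8_0 · ctx:8192 · kv:k16v16 · default row. The report's own compare line does not name it, but that row's values yield every printed delta exactly. `deltas` and `format_deltas` are illustrative names.

```python
# Sketch: reproduce the "Auto baseline compare" deltas as best minus baseline.
# ASSUMPTION: the baseline row is Q8_0 · ctx:8192 · kv:k16v16 · default; its
# values reproduce every delta printed in the report.

BEST = {"score": 96, "tps": 71.6, "tps_p95": 72.7,
        "ttft_ms": 1788.2, "ttft_p95_ms": 1827.1, "ppl": 7.79}
BASELINE = {"score": 81, "tps": 48.7, "tps_p95": 49.1,
            "ttft_ms": 2627.0, "ttft_p95_ms": 2631.1, "ppl": 6.94}


def deltas(best: dict, base: dict) -> dict:
    # Positive TPS delta = faster, negative TTFT delta = lower latency,
    # positive PPL delta = higher (worse) perplexity than the baseline.
    return {key: round(best[key] - base[key], 2) for key in best}


def format_deltas(d: dict) -> str:
    # Mirrors the layout of the two delta lines in the report.
    return (f"score Δ={d['score']:.2f} TPS Δ={d['tps']:.2f} "
            f"TTFT Δ={d['ttft_ms']:.1f}ms PPL Δ={d['ppl']:.2f}\n"
            f"TPS p95 Δ={d['tps_p95']:.2f} TTFT p95 Δ={d['ttft_p95_ms']:.1f}ms")


if __name__ == "__main__":
    print(format_deltas(deltas(BEST, BASELINE)))
```

The positive PPL Δ is the cost of the speedup: the best-scoring config trades 0.85 perplexity points for roughly 23 tok/s and about 0.8 s of TTFT.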
Agent smoke: 4/5 (80.0%) [config_ready_for_smoke]

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
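Perplexity is the exponential of the mean negative log-likelihood a model assigns to evaluation tokens; lower is better. Below is a minimal sketch of that definition, assuming natural-log per-token probabilities. The evaluation text, context length, and chunking sigilant-runner uses are not stated in the report. On this run, the IQ3_M PPL of 7.79 is about 12% above the Q8_0 value of 6.94, which is the gap PPL% = 89.1 reflects.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("perplexity needs at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every observed token has PPL 4.
    print(perplexity([math.log(0.25)] * 8))
```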
