sigilant-runner · Phi-3.5-mini-instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
IQ3_M · ctx:8192 · kv:k16v16 · default  <- best | 72.6 | 73.2 | 1763.2 | 1782.3 | 13.78 | 7.79 | 100.0 | 100.0 | 89.1 | 96
Q4_K_M · ctx:8192 · kv:k16v16 · default | 66.4 | 66.8 | 1926.7 | 1943.7 | 15.05 | 7.12 | 91.4 | 91.6 | 97.5 | 94
IQ3_M · ctx:16384 · kv:k16v16 · long | 71.0 | 71.5 | 1803.8 | 1868.2 | 14.09 | 7.79 | 97.7 | 96.6 | 89.1 | 94
IQ3_M · ctx:16384 · kv:k8v8 · default | 69.6 | 70.2 | 1838.2 | 1905.5 | 14.36 | 7.79 | 95.9 | 94.7 | 89.1 | 93
IQ3_M · ctx:32768 · kv:k8v8 · long | 68.2 | 69.0 | 1877.0 | 1881.3 | 14.66 | 7.79 | 94.1 | 94.3 | 89.1 | 92
Q4_K_M · ctx:16384 · kv:k16v16 · long | 64.4 | 65.5 | 1986.5 | 2004.9 | 15.52 | 7.12 | 89.1 | 88.8 | 97.5 | 92
Q4_K_M · ctx:16384 · kv:k8v8 · default | 63.3 | 63.7 | 2020.9 | 2051.3 | 15.79 | 7.12 | 87.1 | 87.1 | 97.5 | 91
Q5_K_M · ctx:8192 · kv:k16v16 · default | 61.0 | 61.4 | 2098.5 | 2175.7 | 16.39 | 7.01 | 84.0 | 83.0 | 99.0 | 90
Q4_K_M · ctx:32768 · kv:k8v8 · long | 62.5 | 62.6 | 2049.1 | 2086.5 | 16.01 | 7.12 | 85.8 | 85.7 | 97.5 | 90
Q5_K_M · ctx:16384 · kv:k16v16 · long | 60.3 | 60.5 | 2124.0 | 2158.6 | 16.59 | 7.01 | 82.9 | 82.8 | 99.0 | 89
Q5_K_M · ctx:16384 · kv:k8v8 · default | 58.2 | 58.5 | 2197.8 | 2212.4 | 17.17 | 7.01 | 80.0 | 80.4 | 99.0 | 88
Q5_K_M · ctx:32768 · kv:k8v8 · long | 57.8 | 57.9 | 2215.1 | 2285.6 | 17.31 | 7.01 | 79.4 | 78.8 | 99.0 | 87
Q8_0 · ctx:8192 · kv:k16v16 · default | 49.4 | 49.8 | 2589.9 | 2646.4 | 20.23 | 6.94 | 68.0 | 67.7 | 100.0 | 81
Q8_0 · ctx:16384 · kv:k16v16 · long | 48.5 | 49.3 | 2637.8 | 2744.3 | 20.61 | 6.94 | 67.1 | 65.9 | 100.0 | 80
Q8_0 · ctx:16384 · kv:k8v8 · default | 47.6 | 48.1 | 2689.0 | 2747.8 | 21.01 | 6.94 | 65.6 | 65.2 | 100.0 | 79
Q8_0 · ctx:32768 · kv:k8v8 · long | 47.0 | 47.5 | 2722.6 | 2736.7 | 21.27 | 6.94 | 64.8 | 64.9 | 100.0 | 79
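
Note on the relative columns: the report does not document how TPS%, TTFT%, PPL% and Score are computed. The sketch below is a reconstruction under stated assumptions: each percentage is taken relative to the best value in its column (highest TPS, lowest TTFT, lowest PPL), and Score is a weighted sum with weights 0.3 / 0.3 / 0.4. The weights are inferred from the table, not taken from sigilant-runner, and `relative_scores` is a hypothetical helper name. Because the inputs are the rounded values printed above, recomputed percentages can differ from the table by about 0.1.

```python
# Reconstruction of the relative columns from four rows of the table above.
# ASSUMPTION: the 0.3/0.3/0.4 weights are inferred from the printed scores;
# they are not documented by the tool.
ROWS = {
    # name: (TPS tok/s, TTFT ms, PPL), all at ctx:8192, kv:k16v16
    "IQ3_M": (72.6, 1763.2, 7.79),
    "Q4_K_M": (66.4, 1926.7, 7.12),
    "Q5_K_M": (61.0, 2098.5, 7.01),
    "Q8_0": (49.4, 2589.9, 6.94),
}
WEIGHTS = (0.3, 0.3, 0.4)  # assumed (TPS, TTFT, PPL) weights


def relative_scores(rows, weights=WEIGHTS):
    """Return {name: (TPS%, TTFT%, PPL%, Score)} relative to the best per column."""
    best_tps = max(tps for tps, _, _ in rows.values())      # higher is better
    best_ttft = min(ttft for _, ttft, _ in rows.values())   # lower is better
    best_ppl = min(ppl for _, _, ppl in rows.values())      # lower is better
    result = {}
    for name, (tps, ttft, ppl) in rows.items():
        tps_pct = 100.0 * tps / best_tps
        ttft_pct = 100.0 * best_ttft / ttft
        ppl_pct = 100.0 * best_ppl / ppl
        score = weights[0] * tps_pct + weights[1] * ttft_pct + weights[2] * ppl_pct
        result[name] = (round(tps_pct, 1), round(ttft_pct, 1), round(ppl_pct, 1), round(score))
    return result


if __name__ == "__main__":
    for name, cols in relative_scores(ROWS).items():
        print(name, cols)
```

Under these assumptions the four recomputed scores match the Score column for the corresponding rows (96, 94, 90, 81). A plain unweighted mean does not: it would give the Q8_0 row 79 rather than 81.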

Best config: IQ3_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=15.00 TPS Δ=23.20 TTFT Δ=-826.7ms PPL Δ=0.85
                     TPS p95 Δ=23.40 TTFT p95 Δ=-864.1ms
Agent smoke: 4/5 (80.0%) [config_ready_for_smoke]
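
Note on the baseline compare: the report does not say which row serves as the auto baseline. The sketch below assumes it is the Q8_0 · ctx:8192 · kv:k16v16 · default row, since that is the row whose values reproduce the printed deltas when subtracted from the best config. The dictionary keys and the `deltas` helper are hypothetical names used only for illustration.

```python
# Deltas are computed as (best config) minus (assumed baseline).
# ASSUMPTION: the baseline is the Q8_0 ctx:8192 kv:k16v16 row; the report
# does not name it explicitly.
BEST = {  # IQ3_M, ctx:8192, kv:k16v16, default
    "score": 96, "tps": 72.6, "tps_p95": 73.2,
    "ttft_ms": 1763.2, "ttft_p95_ms": 1782.3, "ppl": 7.79,
}
BASELINE = {  # Q8_0, ctx:8192, kv:k16v16, default
    "score": 81, "tps": 49.4, "tps_p95": 49.8,
    "ttft_ms": 2589.9, "ttft_p95_ms": 2646.4, "ppl": 6.94,
}


def deltas(best, baseline):
    """Per-metric difference best - baseline, rounded to 2 decimals."""
    return {key: round(best[key] - baseline[key], 2) for key in best}


if __name__ == "__main__":
    for key, value in deltas(BEST, BASELINE).items():
        print(f"{key} delta = {value:+.2f}")
```

Read this way, a positive TPS delta and a negative TTFT delta are both improvements, while the positive PPL delta (0.85) is the quality cost of the more aggressive quantization.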

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
