sigilant-runner · Phi-3.5-mini-instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
IQ3_M · ctx:8192 · kv:k16v16 · default  <- best | 69.4 | 70.2 | 1843.7 | 1914.2 | 14.40 | 7.79 | 100.0 | 100.0 | 89.1 | 96
Q4_K_M · ctx:8192 · kv:k16v16 · default | 64.6 | 65.1 | 1980.5 | 2081.6 | 15.47 | 7.12 | 92.9 | 92.5 | 97.5 | 95
IQ3_M · ctx:16384 · kv:k16v16 · long | 68.2 | 68.8 | 1877.3 | 1946.3 | 14.67 | 7.79 | 98.1 | 98.3 | 89.1 | 95
IQ3_M · ctx:16384 · kv:k8v8 · default | 67.6 | 68.5 | 1893.5 | 1924.3 | 14.79 | 7.79 | 97.5 | 98.4 | 89.1 | 94
Q4_K_M · ctx:16384 · kv:k16v16 · long | 63.3 | 63.5 | 2023.0 | 2164.8 | 15.80 | 7.12 | 90.8 | 89.8 | 97.5 | 93
Q4_K_M · ctx:16384 · kv:k8v8 · default | 61.6 | 62.0 | 2077.8 | 2087.6 | 16.23 | 7.12 | 88.5 | 90.2 | 97.5 | 92
Q4_K_M · ctx:32768 · kv:k8v8 · long | 60.4 | 61.6 | 2117.8 | 2131.0 | 16.55 | 7.12 | 87.4 | 88.4 | 97.5 | 92
IQ3_M · ctx:32768 · kv:k8v8 · long | 65.0 | 65.6 | 1969.0 | 2014.2 | 15.38 | 7.79 | 93.6 | 94.3 | 89.1 | 92
Q5_K_M · ctx:8192 · kv:k16v16 · default | 57.6 | 57.7 | 2221.5 | 2231.5 | 17.36 | 7.01 | 82.6 | 84.4 | 99.0 | 90
Q5_K_M · ctx:16384 · kv:k16v16 · long | 56.8 | 57.3 | 2255.1 | 2298.0 | 17.62 | 7.01 | 81.7 | 82.5 | 99.0 | 89
Q5_K_M · ctx:16384 · kv:k8v8 · default | 55.6 | 56.0 | 2300.2 | 2315.5 | 17.97 | 7.01 | 79.9 | 81.4 | 99.0 | 88
Q5_K_M · ctx:32768 · kv:k8v8 · long | 54.8 | 55.3 | 2333.7 | 2415.2 | 18.23 | 7.01 | 78.9 | 79.1 | 99.0 | 87
Q8_0 · ctx:8192 · kv:k16v16 · default | 47.8 | 48.4 | 2679.0 | 2760.0 | 20.93 | 6.94 | 68.9 | 69.1 | 100.0 | 81
Q8_0 · ctx:16384 · kv:k16v16 · long | 46.8 | 47.2 | 2735.2 | 2758.1 | 21.37 | 6.94 | 67.3 | 68.4 | 100.0 | 81
Q8_0 · ctx:16384 · kv:k8v8 · default | 46.2 | 46.3 | 2770.0 | 2838.8 | 21.64 | 6.94 | 66.3 | 67.0 | 100.0 | 80
Q8_0 · ctx:32768 · kv:k8v8 · long | 45.1 | 45.4 | 2839.4 | 2894.4 | 22.18 | 6.94 | 64.8 | 65.5 | 100.0 | 79
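
The derived columns can be spot-checked from the raw ones. The sketch below is not the runner's own code: it assumes PPL% is the lowest perplexity in the run divided by the row's perplexity, and that ITL is roughly 1000 / TPS in milliseconds. Both formulas are inferred from the numbers in the table, and the helper names `ppl_pct` and `itl_ms` are hypothetical. ITL recomputed from the rounded TPS can differ from the table by about 0.01.

```python
# Values copied from the table: (quant, TPS, ITL, PPL, PPL%) for the
# ctx:8192 · kv:k16v16 · default row of each quantization.
rows = [
    ("IQ3_M", 69.4, 14.40, 7.79, 89.1),
    ("Q4_K_M", 64.6, 15.47, 7.12, 97.5),
    ("Q5_K_M", 57.6, 17.36, 7.01, 99.0),
    ("Q8_0", 47.8, 20.93, 6.94, 100.0),
]

# Assumption: the reference perplexity is the lowest one in the run.
best_ppl = min(ppl for _, _, _, ppl, _ in rows)


def ppl_pct(ppl: float) -> float:
    """Perplexity relative to the best (lowest) value, as a percentage."""
    return round(100.0 * best_ppl / ppl, 1)


def itl_ms(tps: float) -> float:
    """Approximate inter-token latency in ms from tokens per second."""
    return round(1000.0 / tps, 2)


for quant, tps, itl, ppl, pct in rows:
    print(f"{quant}: PPL% {ppl_pct(ppl)} (table {pct}), "
          f"ITL {itl_ms(tps)} ms (table {itl})")
```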

Best config: IQ3_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=15.00 TPS Δ=21.60 TTFT Δ=-835.3ms PPL Δ=0.85
                     TPS p95 Δ=21.80 TTFT p95 Δ=-845.8ms
Agent smoke: 3/5 (60.0%) [mixed]
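
The report does not name the row used for the auto baseline compare. The sketch below assumes the baseline is the Q8_0 · ctx:8192 · kv:k16v16 · default row, because subtracting that row from the best config reproduces every delta printed above. The dictionary keys are hypothetical names, not the runner's schema.

```python
# Best config (IQ3_M · ctx:8192 · kv:k16v16 · default), values from the table.
best = {
    "score": 96, "tps": 69.4, "tps_p95": 70.2,
    "ttft_ms": 1843.7, "ttft_p95_ms": 1914.2, "ppl": 7.79,
}

# Assumed baseline (Q8_0 · ctx:8192 · kv:k16v16 · default), values from the table.
baseline = {
    "score": 81, "tps": 47.8, "tps_p95": 48.4,
    "ttft_ms": 2679.0, "ttft_p95_ms": 2760.0, "ppl": 6.94,
}

# Delta = best minus baseline. Negative TTFT deltas mean a faster first token;
# a positive PPL delta means the best-scoring config has worse perplexity.
deltas = {key: round(best[key] - baseline[key], 2) for key in best}

for key, value in deltas.items():
    print(f"{key} delta = {value:+.2f}")
```

Under that assumption, the top score is bought with a 0.85 perplexity increase over the baseline, which is worth weighing alongside the 3/5 agent smoke result.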

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
