sigilant-runner · Phi-3.5-mini-instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
IQ3_M · ctx:8192 · kv:k16v16 · default  <- best | 69.3 | 69.7 | 1848.0 | 1868.9 | 14.44 | 7.79 | 100.0 | 100.0 | 88.8 | 96
Q3_K_M · ctx:8192 · kv:k16v16 · default | 64.8 | 65.0 | 1975.2 | 2010.7 | 15.43 | 7.52 | 93.4 | 93.3 | 92.0 | 93
IQ3_M · ctx:16384 · kv:k8v8 · default | 65.9 | 66.2 | 1943.1 | 1982.6 | 15.18 | 7.79 | 95.0 | 94.7 | 88.8 | 92
Q4_K_M · ctx:8192 · kv:k16v16 · default | 61.0 | 61.3 | 2097.2 | 2119.4 | 16.38 | 7.10 | 88.0 | 88.1 | 97.5 | 92
IQ3_M · ctx:32768 · kv:k8v8 · long | 63.2 | 64.3 | 2026.7 | 2032.0 | 15.83 | 7.79 | 91.7 | 91.6 | 88.8 | 91
Q4_K_M · ctx:16384 · kv:k8v8 · default | 58.8 | 59.0 | 2176.2 | 2192.3 | 17.00 | 7.10 | 84.7 | 85.1 | 97.5 | 90
Q3_K_M · ctx:16384 · kv:k8v8 · default | 61.9 | 62.2 | 2067.4 | 2124.3 | 16.15 | 7.52 | 89.3 | 88.7 | 92.0 | 90
Q5_K_M · ctx:8192 · kv:k16v16 · default | 57.1 | 57.2 | 2242.9 | 2254.2 | 17.52 | 6.99 | 82.2 | 82.7 | 99.0 | 89
Q4_K_M · ctx:32768 · kv:k8v8 · long | 57.3 | 57.5 | 2235.0 | 2267.3 | 17.46 | 7.10 | 82.6 | 82.6 | 97.5 | 89
Q3_K_M · ctx:32768 · kv:k8v8 · long | 60.1 | 60.5 | 2129.3 | 2133.4 | 16.64 | 7.52 | 86.8 | 87.2 | 92.0 | 89
Q5_K_M · ctx:16384 · kv:k8v8 · default | 54.8 | 55.2 | 2336.7 | 2362.0 | 18.26 | 6.99 | 79.1 | 79.1 | 99.0 | 87
Q5_K_M · ctx:32768 · kv:k8v8 · long | 52.7 | 53.6 | 2429.4 | 2446.8 | 18.98 | 6.99 | 76.5 | 76.2 | 99.0 | 85
Q8_0 · ctx:8192 · kv:k16v16 · default | 46.9 | 47.1 | 2729.0 | 2748.2 | 21.32 | 6.92 | 67.6 | 67.9 | 100.0 | 81
Q8_0 · ctx:16384 · kv:k16v16 · long | 45.8 | 46.4 | 2794.0 | 2809.5 | 21.83 | 6.92 | 66.3 | 66.3 | 100.0 | 80
Q8_0 · ctx:16384 · kv:k8v8 · default | 45.3 | 45.4 | 2825.9 | 2835.9 | 22.08 | 6.92 | 65.3 | 65.6 | 100.0 | 79
Q8_0 · ctx:32768 · kv:k8v8 · long | 44.1 | 44.5 | 2902.7 | 2941.7 | 22.68 | 6.92 | 63.7 | 63.6 | 100.0 | 78
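
The percentage columns appear to be normalized against the best value in each column (highest TPS, lowest TTFT, lowest PPL), and the Score column is consistent with a weighted sum of the three. The sketch below reproduces that reading under stated assumptions: the 0.3/0.3/0.4 weights are inferred from the rows above, not taken from any sigilant-runner documentation, and all function names are hypothetical. Percentages recomputed from the rounded table values may differ from the printed ones in the last digit.

```python
def higher_is_better_pct(value: float, best: float) -> float:
    """Percent of the best value for metrics where larger is better (TPS)."""
    return round(100.0 * value / best, 1)


def lower_is_better_pct(value: float, best: float) -> float:
    """Percent of the best value for metrics where smaller is better (TTFT, PPL)."""
    return round(100.0 * best / value, 1)


# Assumed weights, inferred from the table; not a documented formula.
WEIGHTS = {"tps": 0.3, "ttft": 0.3, "ppl": 0.4}


def composite_score(tps_pct: float, ttft_pct: float, ppl_pct: float) -> int:
    """Weighted sum of the three normalized columns, rounded to an integer."""
    raw = (WEIGHTS["tps"] * tps_pct
           + WEIGHTS["ttft"] * ttft_pct
           + WEIGHTS["ppl"] * ppl_pct)
    return round(raw)


if __name__ == "__main__":
    # IQ3_M · ctx:8192 row: PPL 7.79 against the lowest PPL in the table, 6.92.
    ppl_pct = lower_is_better_pct(7.79, best=6.92)
    print(ppl_pct, composite_score(100.0, 100.0, ppl_pct))
```

Under these assumed weights, perplexity counts slightly more than either speed metric, which would explain why Q4_K_M at ctx:8192 ties IQ3_M at ctx:16384 on Score despite lower throughput.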

Best config: IQ3_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=15.0 TPS Δ=22.4 TTFT Δ=-881.0ms PPL Δ=0.87
Agent smoke: 1/5 (0.2) [model_limited]
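
The deltas on the "Auto baseline compare" line are consistent with subtracting the Q8_0 · ctx:8192 · kv:k16v16 · default row from the best config. That reading is an inference from the numbers, since the report does not name the baseline. A minimal sketch of the arithmetic follows, with hypothetical names; it rounds at format time so binary floating-point noise stays out of the printed values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    score: float
    tps: float
    ttft_ms: float
    ppl: float


def format_deltas(best: Result, baseline: Result) -> str:
    """Return best-minus-baseline deltas, formatted to fixed precision."""
    return (
        f"score Δ={best.score - baseline.score:.1f} "
        f"TPS Δ={best.tps - baseline.tps:.1f} "
        f"TTFT Δ={best.ttft_ms - baseline.ttft_ms:.1f}ms "
        f"PPL Δ={best.ppl - baseline.ppl:.2f}"
    )


# Values copied from the table; the choice of baseline row is an assumption.
best = Result(score=96, tps=69.3, ttft_ms=1848.0, ppl=7.79)
baseline = Result(score=81, tps=46.9, ttft_ms=2729.0, ppl=6.92)
print(format_deltas(best, baseline))
```

Read this way, a positive PPL Δ means the best config trades some perplexity for its speed gain, while a negative TTFT Δ is an improvement.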

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
