sigilant-runner · Phi-3.5-mini-instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
IQ3_M · ctx:8192 · kv:k16v16 · default  <- best | 58.2 | 61.6 | 2199.1 | 2419.6 | 17.18 | 7.79 | 100.0 | 100.0 | 88.8 | 96
IQ3_M · ctx:16384 · kv:k8v8 · default | 54.5 | 57.1 | 2348.2 | 2497.1 | 18.35 | 7.79 | 93.2 | 95.3 | 88.8 | 92
Q4_K_M · ctx:8192 · kv:k16v16 · default | 50.9 | 52.5 | 2516.7 | 2698.3 | 19.66 | 7.1 | 86.3 | 88.5 | 97.5 | 91
IQ3_M · ctx:16384 · kv:k16v16 · long | 52.7 | 59.4 | 2430.8 | 2687.6 | 18.99 | 7.79 | 93.5 | 90.2 | 88.8 | 91
Q4_K_M · ctx:16384 · kv:k8v8 · default | 48.7 | 53.9 | 2630.3 | 2865.5 | 20.55 | 7.1 | 85.6 | 84.0 | 97.5 | 90
Q5_K_M · ctx:8192 · kv:k16v16 · default | 48.4 | 50.7 | 2646.5 | 2901.6 | 20.68 | 6.99 | 82.7 | 83.2 | 99.0 | 89
Q4_K_M · ctx:16384 · kv:k16v16 · long | 46.3 | 52.0 | 2767.5 | 2948.0 | 21.62 | 7.1 | 82.0 | 80.8 | 97.5 | 88
IQ3_M · ctx:32768 · kv:k8v8 · long | 50.9 | 52.2 | 2515.9 | 2799.9 | 19.66 | 7.79 | 86.1 | 86.9 | 88.8 | 87
Q5_K_M · ctx:32768 · kv:k8v8 · long | 45.3 | 45.9 | 2827.8 | 3066.9 | 22.09 | 6.99 | 76.2 | 78.3 | 99.0 | 86
Q5_K_M · ctx:16384 · kv:k16v16 · long | 45.6 | 46.9 | 2808.5 | 3053.0 | 21.94 | 6.99 | 77.2 | 78.8 | 99.0 | 86
Q5_K_M · ctx:16384 · kv:k8v8 · default | 44.9 | 45.5 | 2850.9 | 3268.0 | 22.27 | 6.99 | 75.5 | 75.6 | 99.0 | 85
Q4_K_M · ctx:32768 · kv:k8v8 · long | 44.0 | 45.4 | 2908.7 | 2975.1 | 22.72 | 7.1 | 74.7 | 78.5 | 97.5 | 85
Q8_0 · ctx:8192 · kv:k16v16 · default | 40.8 | 43.0 | 3138.4 | 3224.3 | 24.52 | 6.92 | 70.0 | 72.6 | 100.0 | 82
Q8_0 · ctx:16384 · kv:k8v8 · default | 38.0 | 40.0 | 3365.2 | 3422.9 | 26.29 | 6.92 | 65.1 | 68.0 | 100.0 | 80
Q8_0 · ctx:32768 · kv:k8v8 · long | 39.1 | 40.0 | 3277.5 | 3382.8 | 25.61 | 6.92 | 66.1 | 69.3 | 100.0 | 80
Q8_0 · ctx:16384 · kv:k16v16 · long | 38.6 | 41.8 | 3314.6 | 3445.3 | 25.9 | 6.92 | 67.1 | 68.3 | 100.0 | 80
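
Two of the derived columns above can be cross-checked from the raw ones. The sketch below rests on relationships inferred from the numbers, not on documented sigilant-runner behavior: ITL appears to be roughly 1000 / TPS (in milliseconds), and PPL% appears to be the lowest PPL in the run divided by the row's PPL. The function names are hypothetical, and only one row per quantization is included.

```python
# Cross-check of two derived columns. Both formulas are ASSUMPTIONS
# inferred from the table values, not documented tool behavior.

def itl_ms_from_tps(tps: float) -> float:
    """Estimate inter-token latency in ms from tokens per second."""
    return 1000.0 / tps

def ppl_percent(ppl: float, best_ppl: float) -> float:
    """Quality relative to the lowest (best) perplexity, in percent."""
    return round(100.0 * best_ppl / ppl, 1)

# (quant, TPS, reported ITL, PPL, reported PPL%), copied from the table.
ROWS = [
    ("IQ3_M", 58.2, 17.18, 7.79, 88.8),
    ("Q4_K_M", 50.9, 19.66, 7.1, 97.5),
    ("Q5_K_M", 48.4, 20.68, 6.99, 99.0),
    ("Q8_0", 40.8, 24.52, 6.92, 100.0),
]

best_ppl = min(row[3] for row in ROWS)

for quant, tps, itl, ppl, ppl_pct in ROWS:
    est_itl = itl_ms_from_tps(tps)
    est_pct = ppl_percent(ppl, best_ppl)
    print(f"{quant}: ITL {itl} vs est. {est_itl:.2f} | PPL% {ppl_pct} vs est. {est_pct}")
```

The ITL estimates agree with the reported values only to within a few hundredths of a millisecond, presumably because the TPS column is rounded. The TPS%, TTFT% and Score columns do not follow an equally simple rule, so they are not reproduced here.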

Best config: IQ3_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=14.0 TPS Δ=17.4 TTFT Δ=-939.3ms PPL Δ=0.87
Agent smoke: 1/5 (0.2) [model_limited]
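
The baseline deltas can be reproduced as best minus baseline. The report does not name the baseline configuration. The sketch assumes it is the Q8_0 · ctx:8192 · kv:k16v16 · default row, because that row's values are consistent with all four deltas. The dictionary keys and the `deltas` helper are hypothetical names used for illustration.

```python
# Reproduce the baseline comparison as best minus baseline.
# ASSUMPTION: the baseline is the Q8_0 ctx:8192 k16v16 default row;
# the report itself does not say which config was used.

best = {"score": 96, "tps": 58.2, "ttft_ms": 2199.1, "ppl": 7.79}
baseline = {"score": 82, "tps": 40.8, "ttft_ms": 3138.4, "ppl": 6.92}

def deltas(best: dict, baseline: dict, ndigits: int = 2) -> dict:
    """Per-metric difference, rounded to hide float representation noise."""
    return {key: round(best[key] - baseline[key], ndigits) for key in best}

for metric, delta in deltas(best, baseline).items():
    print(f"{metric} delta = {delta}")
```

Rounding after the subtraction avoids printing binary floating-point residue. A negative TTFT delta means the best configuration reaches its first token sooner than the baseline, and a positive PPL delta means its perplexity is higher (worse).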

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
