sigilant-runner · Phi-3.5-mini-instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
IQ3_M · ctx:8192 · kv:k16v16 · default  <- best | 67.5 | 69.3 | 1896.2 | 2356.9 | 14.81 | 7.79 | 100.0 | 93.5 | 89.1 | 94
IQ3_M · ctx:16384 · kv:k8v8 · default | 63.8 | 66.8 | 2005.8 | 2063.9 | 15.67 | 7.79 | 95.5 | 96.9 | 89.1 | 93
Q4_K_M · ctx:16384 · kv:k8v8 · default | 59.3 | 61.1 | 2157.0 | 2254.9 | 16.85 | 7.12 | 88.0 | 89.4 | 97.5 | 92
IQ3_M · ctx:32768 · kv:k8v8 · long | 63.4 | 64.7 | 2018.5 | 2048.9 | 15.77 | 7.79 | 93.6 | 97.0 | 89.1 | 92
Q5_K_M · ctx:8192 · kv:k16v16 · default | 57.4 | 58.0 | 2229.0 | 2275.2 | 17.41 | 7.01 | 84.4 | 87.6 | 99.0 | 91
Q4_K_M · ctx:8192 · kv:k16v16 · default | 61.9 | 63.1 | 2069.5 | 3354.4 | 16.17 | 7.12 | 91.4 | 76.4 | 97.5 | 91
Q3_K_M · ctx:16384 · kv:k8v8 · default | 59.9 | 61.0 | 2137.0 | 2194.8 | 16.7 | 7.58 | 88.4 | 91.0 | 91.6 | 90
Q3_K_M · ctx:8192 · kv:k16v16 · default | 60.9 | 62.5 | 2102.8 | 2354.0 | 16.43 | 7.58 | 90.2 | 88.6 | 91.6 | 90
Q4_K_M · ctx:32768 · kv:k8v8 · long | 55.2 | 58.9 | 2317.6 | 2365.5 | 18.11 | 7.12 | 83.4 | 84.2 | 97.5 | 89
Q3_K_M · ctx:32768 · kv:k8v8 · long | 58.5 | 59.1 | 2187.4 | 2206.1 | 17.09 | 7.58 | 86.0 | 89.8 | 91.6 | 89
Q5_K_M · ctx:16384 · kv:k8v8 · default | 53.8 | 56.3 | 2377.8 | 2606.4 | 18.58 | 7.01 | 80.5 | 79.2 | 99.0 | 88
Q5_K_M · ctx:32768 · kv:k8v8 · long | 53.7 | 54.8 | 2384.2 | 2718.2 | 18.63 | 7.01 | 79.3 | 77.5 | 99.0 | 87
Q8_0 · ctx:8192 · kv:k16v16 · default | 47.5 | 47.6 | 2696.4 | 2827.9 | 21.07 | 6.94 | 69.5 | 71.4 | 100.0 | 82
Q8_0 · ctx:16384 · kv:k8v8 · default | 44.5 | 46.1 | 2875.2 | 2954.7 | 22.46 | 6.94 | 66.2 | 67.6 | 100.0 | 80
Q8_0 · ctx:16384 · kv:k16v16 · long | 44.8 | 46.4 | 2854.5 | 2914.3 | 22.3 | 6.94 | 66.7 | 68.4 | 100.0 | 80
Q8_0 · ctx:32768 · kv:k8v8 · long | 43.9 | 44.8 | 2917.2 | 3117.3 | 22.79 | 6.94 | 64.8 | 65.4 | 100.0 | 79
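
The Score column appears to be a fixed weighted blend of the three normalized columns. The minimal Python sketch below reconstructs it under two assumptions inferred from the numbers, not taken from sigilant-runner documentation: PPL% is the run's lowest PPL (6.94) divided by the row's PPL, times 100, and Score = round(0.4·TPS% + 0.2·TTFT% + 0.4·PPL%). Those weights reproduce every Score in the table above. TPS% and TTFT% are copied as printed because their normalization is not stated. ITL is also roughly 1000 / TPS, i.e. milliseconds per generated token.

```python
# Reconstruct PPL% and Score for a few rows of the results table.
# ASSUMPTION: PPL% = BEST_PPL / PPL * 100 and Score = round(0.4*TPS% + 0.2*TTFT% + 0.4*PPL%).
# Both formulas are inferred by fitting the published rows, not from tool docs.
BEST_PPL = 6.94  # lowest PPL in this run (all Q8_0 rows)

# config: (TPS, PPL, TPS% as printed, TTFT% as printed, reported Score)
rows = {
    "IQ3_M · ctx:8192 · kv:k16v16": (67.5, 7.79, 100.0, 93.5, 94),
    "Q4_K_M · ctx:8192 · kv:k16v16": (61.9, 7.12, 91.4, 76.4, 91),
    "Q8_0 · ctx:8192 · kv:k16v16": (47.5, 6.94, 69.5, 71.4, 82),
    "Q8_0 · ctx:32768 · kv:k8v8": (43.9, 6.94, 64.8, 65.4, 79),
}


def ppl_pct(ppl: float) -> float:
    """Quality relative to the best (lowest) PPL in the run; 100 = best."""
    return round(BEST_PPL / ppl * 100, 1)


def score(tps_pct: float, ttft_pct: float, ppl_pct_value: float) -> int:
    """Hypothetical weighted score; weights inferred from the table."""
    return round(0.4 * tps_pct + 0.2 * ttft_pct + 0.4 * ppl_pct_value)


for name, (tps, ppl, tps_pct, ttft_pct, reported) in rows.items():
    quality = ppl_pct(ppl)
    computed = score(tps_pct, ttft_pct, quality)
    itl_ms = 1000 / tps  # approximate inter-token latency in ms
    print(f"{name}: PPL%={quality} score={computed} (reported {reported}) ITL~{itl_ms:.2f}ms")
```

The 0.4 weights on TPS% and PPL% versus 0.2 on TTFT% explain why IQ3_M ranks first: its speed lead outweighs its lower PPL%, while the Q8_0 rows lose on both speed terms despite a perfect PPL%.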

Best config: IQ3_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare (vs Q8_0 · ctx:8192 · kv:k16v16 · default): score Δ=+12.0 TPS Δ=+20.0 tok/s TTFT Δ=-800.2ms PPL Δ=+0.85
Agent smoke: 1/5 (0.2) [model_limited]
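
The baseline deltas are best minus baseline. Only the Q8_0 · ctx:8192 · kv:k16v16 · default row reproduces all four reported values, so this minimal sketch assumes it is the baseline. A positive PPL Δ means the best config has higher perplexity than the baseline, so the speed gain is paid for in quality. Rounding the differences also removes binary floating-point residue such as the 0.8499999999999996 printed for 7.79 - 6.94.

```python
# Recompute the "Auto baseline compare" deltas as best minus baseline.
# ASSUMPTION: the baseline is Q8_0 · ctx:8192 · kv:k16v16 · default, inferred because
# it is the only row whose values reproduce all four reported deltas.
best = {"score": 94, "tps": 67.5, "ttft_ms": 1896.2, "ppl": 7.79}      # IQ3_M · ctx:8192 · kv:k16v16
baseline = {"score": 82, "tps": 47.5, "ttft_ms": 2696.4, "ppl": 6.94}  # Q8_0 · ctx:8192 · kv:k16v16

# round() to two decimals hides float residue from the subtraction
deltas = {key: round(best[key] - baseline[key], 2) for key in best}
print(deltas)
```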

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
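
For reading the PPL column: perplexity is conventionally the exponential of the mean per-token negative log-likelihood over an evaluation text, so lower is better and 1.0 is the floor (perfect prediction). This minimal Python sketch shows that definition from per-token natural-log probabilities. The evaluation text and context length the runner uses are not stated in this output, so the absolute values are only comparable within this run.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """PPL = exp(-mean log p(token)), assuming natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Toy example: every token gets probability 0.25, so perplexity is 4 (up to float rounding).
print(perplexity([math.log(0.25)] * 8))
```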
