sigilant-runner · Phi-3.5-mini-instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 | TTFT (ms) | TTFT p95 | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
IQ3_M · ctx:8192 · kv:k16v16 · default  <- best | 68.3 | 69.4 | 1874.1 | 1949.6 | 14.64 | 7.79 | 100.0 | 100.0 | 89.1 | 96
Q4_K_M · ctx:8192 · kv:k16v16 · default | 63.7 | 64.2 | 2008.5 | 2066.7 | 15.69 | 7.12 | 92.9 | 93.8 | 97.5 | 95
IQ3_M · ctx:16384 · kv:k8v8 · default | 65.5 | 66.5 | 1952.7 | 1999.6 | 15.26 | 7.79 | 95.9 | 96.7 | 89.1 | 93
Q4_K_M · ctx:16384 · kv:k8v8 · default | 60.2 | 61.2 | 2126.6 | 2161.2 | 16.61 | 7.12 | 88.2 | 89.2 | 97.5 | 92
Q5_K_M · ctx:8192 · kv:k16v16 · default | 57.5 | 59.0 | 2225.3 | 2254.6 | 17.39 | 7.01 | 84.6 | 85.3 | 99.0 | 91
Q3_K_M · ctx:8192 · kv:k16v16 · default | 61.7 | 63.7 | 2075.2 | 2097.2 | 16.21 | 7.58 | 91.1 | 91.6 | 91.6 | 91
IQ3_M · ctx:32768 · kv:k8v8 · long | 63.8 | 65.4 | 2006.1 | 2170.4 | 15.67 | 7.79 | 93.8 | 91.6 | 89.1 | 91
Q3_K_M · ctx:16384 · kv:k8v8 · default | 60.3 | 61.5 | 2122.3 | 2139.0 | 16.58 | 7.58 | 88.5 | 89.7 | 91.6 | 90
Q4_K_M · ctx:32768 · kv:k8v8 · long | 58.5 | 59.6 | 2188.6 | 2349.4 | 17.10 | 7.12 | 85.8 | 84.3 | 97.5 | 90
Q5_K_M · ctx:16384 · kv:k8v8 · default | 54.7 | 56.1 | 2342.2 | 2374.4 | 18.30 | 7.01 | 80.5 | 81.1 | 99.0 | 88
Q3_K_M · ctx:32768 · kv:k8v8 · long | 58.8 | 59.4 | 2176.6 | 2284.5 | 17.01 | 7.58 | 85.8 | 85.7 | 91.6 | 88
Q5_K_M · ctx:32768 · kv:k8v8 · long | 54.0 | 54.7 | 2371.3 | 2421.7 | 18.53 | 7.01 | 78.9 | 79.8 | 99.0 | 87
Q8_0 · ctx:8192 · kv:k16v16 · default | 46.5 | 47.7 | 2750.2 | 2770.1 | 21.49 | 6.94 | 68.4 | 69.3 | 100.0 | 81
Q8_0 · ctx:16384 · kv:k8v8 · default | 44.9 | 45.5 | 2852.5 | 2885.7 | 22.28 | 6.94 | 65.7 | 66.6 | 100.0 | 80
Q8_0 · ctx:16384 · kv:k16v16 · long | 46.1 | 46.6 | 2775.1 | 2984.9 | 21.68 | 6.94 | 67.3 | 66.4 | 100.0 | 80
Q8_0 · ctx:32768 · kv:k8v8 · long | 43.6 | 45.1 | 2935.8 | 2957.5 | 22.94 | 6.94 | 64.4 | 64.9 | 100.0 | 79
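
The report does not say how Score is derived from the three percentage columns. The sketch below is a reconstruction, not the tool's documented formula. It assumes a weighted sum with weights 0.3 (TPS%), 0.3 (TTFT%) and 0.4 (PPL%), chosen only because they reproduce every Score in the table above; other weightings may fit as well. It also assumes PPL% is the lowest PPL in the run divided by the row's PPL, which matches the PPL% column. The names WEIGHTS, composite_score and ppl_pct are hypothetical.

```python
# Assumed weights: not documented in the report, inferred from the table.
WEIGHTS = {"tps": 0.3, "ttft": 0.3, "ppl": 0.4}


def ppl_pct(ppl: float, best_ppl: float) -> float:
    """Assumed PPL% rule: lowest perplexity in the run relative to this row's."""
    return round(100.0 * best_ppl / ppl, 1)


def composite_score(tps_pct: float, ttft_pct: float, ppl_pct_value: float) -> int:
    """Weighted sum of the three percentage columns, rounded to an integer."""
    raw = (
        WEIGHTS["tps"] * tps_pct
        + WEIGHTS["ttft"] * ttft_pct
        + WEIGHTS["ppl"] * ppl_pct_value
    )
    return round(raw)


if __name__ == "__main__":
    # Top row: IQ3_M · ctx:8192 · kv:k16v16 · default
    print(ppl_pct(7.79, best_ppl=6.94))
    print(composite_score(100.0, 100.0, 89.1))
    # Q8_0 · ctx:8192 · kv:k16v16 · default
    print(composite_score(68.4, 69.3, 100.0))
```

Under these assumed weights, perplexity counts for more than either speed metric alone, which would explain why Q4_K_M at ctx:8192 scores within one point of IQ3_M despite being about 7% slower.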

Best config: IQ3_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=15.0 TPS Δ=21.8 TTFT Δ=-876.1ms PPL Δ=0.85
Agent smoke: 1/5 (0.2) [model_limited]
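
The baseline deltas in the summary above can be recomputed from the table. The sketch below assumes the auto baseline is the Q8_0 · ctx:8192 · kv:k16v16 · default row. The report does not name the baseline; that row is used here because its values match the reported deltas. Rounding before display avoids floating-point noise from the subtraction. The function name baseline_deltas and the dictionary keys are hypothetical.

```python
# Values copied from the table. The baseline row is an assumption (see above).
best = {"score": 96, "tps": 68.3, "ttft_ms": 1874.1, "ppl": 7.79}
baseline = {"score": 81, "tps": 46.5, "ttft_ms": 2750.2, "ppl": 6.94}


def baseline_deltas(best: dict, baseline: dict, ndigits: int = 2) -> dict:
    """Return best minus baseline per metric, rounded for display."""
    return {key: round(best[key] - baseline[key], ndigits) for key in best}


if __name__ == "__main__":
    for metric, delta in baseline_deltas(best, baseline).items():
        print(f"{metric} Δ={delta}")
```

A negative TTFT delta is an improvement (faster first token), while a positive PPL delta is a quality cost relative to the baseline.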

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
