sigilant-runner · Phi-3.5-mini-instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
IQ3_M · ctx:8192 · kv:k16v16 · default  <- best | 63.4 | 63.8 | 2020.3 | 2167.1 | 15.78 | 7.79 | 100.0 | 100.0 | 88.8 | 96
Q3_K_M · ctx:8192 · kv:k16v16 · default | 58.7 | 59.3 | 2180.7 | 2187.6 | 17.04 | 7.52 | 92.8 | 95.9 | 92.0 | 93
IQ3_M · ctx:16384 · kv:k8v8 · default | 58.5 | 60.2 | 2188.4 | 2214.8 | 17.1 | 7.79 | 93.3 | 95.1 | 88.8 | 92
Q4_K_M · ctx:8192 · kv:k16v16 · default | 55.1 | 56.3 | 2322.9 | 2544.7 | 18.15 | 7.1 | 87.6 | 86.1 | 97.5 | 91
Q4_K_M · ctx:16384 · kv:k8v8 · default | 53.7 | 54.4 | 2382.7 | 2671.4 | 18.61 | 7.1 | 85.0 | 83.0 | 97.5 | 90
IQ3_M · ctx:32768 · kv:k8v8 · long | 56.7 | 57.9 | 2256.7 | 2325.1 | 17.63 | 7.79 | 90.1 | 91.4 | 88.8 | 90
Q3_K_M · ctx:16384 · kv:k8v8 · default | 54.7 | 56.6 | 2338.2 | 2409.4 | 18.27 | 7.52 | 87.5 | 88.2 | 92.0 | 89
Q5_K_M · ctx:8192 · kv:k16v16 · default | 52.2 | 52.6 | 2453.5 | 2532.5 | 19.17 | 6.99 | 82.4 | 84.0 | 99.0 | 89
Q5_K_M · ctx:16384 · kv:k8v8 · default | 49.9 | 51.2 | 2563.6 | 2593.0 | 20.03 | 6.99 | 79.5 | 81.2 | 99.0 | 88
Q4_K_M · ctx:32768 · kv:k8v8 · long | 51.5 | 51.8 | 2484.1 | 2531.6 | 19.41 | 7.1 | 81.2 | 83.5 | 97.5 | 88
Q3_K_M · ctx:32768 · kv:k8v8 · long | 53.9 | 54.2 | 2376.0 | 2416.6 | 18.56 | 7.52 | 85.0 | 87.4 | 92.0 | 88
Q5_K_M · ctx:32768 · kv:k8v8 · long | 48.1 | 48.3 | 2663.6 | 2677.5 | 20.81 | 6.99 | 75.8 | 78.4 | 99.0 | 86
Q8_0 · ctx:8192 · kv:k16v16 · default | 41.9 | 42.7 | 3057.6 | 3098.1 | 23.89 | 6.92 | 66.5 | 68.0 | 100.0 | 80
Q8_0 · ctx:16384 · kv:k16v16 · long | 42.1 | 42.3 | 3041.7 | 3098.4 | 23.76 | 6.92 | 66.4 | 68.2 | 100.0 | 80
Q8_0 · ctx:16384 · kv:k8v8 · default | 40.8 | 41.8 | 3136.9 | 3157.7 | 24.51 | 6.92 | 64.9 | 66.5 | 100.0 | 79
Q8_0 · ctx:32768 · kv:k8v8 · long | 40.3 | 40.6 | 3174.3 | 3305.2 | 24.8 | 6.92 | 63.6 | 64.6 | 100.0 | 78
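
The runner does not document how its derived columns are computed. Two relations can be inferred from the numbers above, and both hold to within rounding. PPL% is the lowest perplexity in the run (Q8_0, 6.92) divided by the row's perplexity. ITL is roughly 1000 / TPS in milliseconds. TPS% and TTFT% do not reduce to a simple ratio of the listed values, so they are left out. The sketch below checks both relations on a subset of rows. `ppl_pct` and `itl_from_tps` are hypothetical helpers, not part of sigilant-runner.

```python
# Subset of rows copied from the table above: (config, TPS tok/s, ITL ms, PPL)
ROWS = [
    ("IQ3_M ctx:8192 kv:k16v16", 63.4, 15.78, 7.79),
    ("Q3_K_M ctx:8192 kv:k16v16", 58.7, 17.04, 7.52),
    ("Q4_K_M ctx:8192 kv:k16v16", 55.1, 18.15, 7.10),
    ("Q5_K_M ctx:8192 kv:k16v16", 52.2, 19.17, 6.99),
    ("Q8_0 ctx:8192 kv:k16v16", 41.9, 23.89, 6.92),
]


def ppl_pct(ppl: float, best_ppl: float) -> float:
    """Inferred (hypothetical) PPL%: best perplexity in the run over this row's perplexity."""
    return round(100.0 * best_ppl / ppl, 1)


def itl_from_tps(tps: float) -> float:
    """Inter-token latency in ms implied by steady-state decode throughput in tok/s."""
    return 1000.0 / tps


# Lower perplexity is better, so the reference is the minimum.
best_ppl = min(ppl for _, _, _, ppl in ROWS)

for name, tps, itl, ppl in ROWS:
    print(
        f"{name:28s} PPL%={ppl_pct(ppl, best_ppl):5.1f}  "
        f"ITL~{itl_from_tps(tps):5.2f} ms (reported {itl})"
    )
```

Because TPS is printed with one decimal, the recomputed ITL drifts from the reported value by a few hundredths of a millisecond, largest at the slow Q8_0 end.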

Best config: IQ3_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare (vs Q8_0 · ctx:8192 · kv:k16v16 · default): score Δ=+16.0 · TPS Δ=+21.5 tok/s · TTFT Δ=-1037.3 ms · PPL Δ=+0.87
Agent smoke: 1/5 (0.2) [model_limited]
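
The deltas on the auto-compare line are best config minus baseline. The baseline is Q8_0 · ctx:8192 · kv:k16v16 · default, the only row in the table that reproduces all four values. A positive PPL Δ therefore means the winner's perplexity is higher (worse) than the baseline's. The sketch below recomputes the line under that reading. `Result` and `deltas` are hypothetical names. Rounding keeps floating-point noise out of the printed deltas, since 7.79 - 6.92 evaluates to 0.8700000000000001 in binary floating point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    config: str
    score: float
    tps: float      # decode throughput, tok/s
    ttft_ms: float  # time to first token, ms
    ppl: float      # perplexity, lower is better


best = Result("IQ3_M · ctx:8192 · kv:k16v16 · default", 96.0, 63.4, 2020.3, 7.79)
baseline = Result("Q8_0 · ctx:8192 · kv:k16v16 · default", 80.0, 41.9, 3057.6, 6.92)


def deltas(cand: Result, base: Result) -> dict[str, float]:
    """Candidate minus baseline, rounded to the precision the report prints."""
    return {
        "score": round(cand.score - base.score, 1),
        "tps": round(cand.tps - base.tps, 1),
        "ttft_ms": round(cand.ttft_ms - base.ttft_ms, 1),
        "ppl": round(cand.ppl - base.ppl, 2),
    }


print(deltas(best, baseline))
# {'score': 16.0, 'tps': 21.5, 'ttft_ms': -1037.3, 'ppl': 0.87}
```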

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
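
The report does not state its evaluation text or context length for PPL. This sketch assumes it is the standard token-level perplexity: the exponential of the mean negative log-likelihood per token. Under that definition, the gap between IQ3_M (7.79) and Q8_0 (6.92) is ln(7.79 / 6.92) ≈ 0.118 nats of extra loss per token. `perplexity` is a hypothetical helper that takes natural-log token probabilities.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Standard perplexity: exp of the mean negative log-likelihood (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that assigns probability 1/7 to every observed token has perplexity 7.
print(perplexity([math.log(1 / 7)] * 10))  # prints approximately 7.0

# Per-token loss gap implied by two perplexities from the table.
print(math.log(7.79 / 6.92))  # about 0.118 nats
```

A uniform distribution over k choices yields perplexity k, which is why PPL is often read as an effective branching factor.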
