sigilant-runner · Phi-3.5-mini-instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 | TTFT (ms) | TTFT p95 | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
IQ3_M · ctx:8192 · kv:k16v16 · default  <- best | 71.4 | 72.9 | 1791.8 | 1796.1 | 14.0 | 7.79 | 100.0 | 100.0 | 89.1 | 96
IQ3_M · ctx:16384 · kv:k8v8 · default | 69.4 | 70.1 | 1845.6 | 1854.7 | 14.42 | 7.79 | 96.7 | 97.0 | 89.1 | 94
Q4_K_M · ctx:8192 · kv:k16v16 · default | 65.5 | 66.0 | 1953.5 | 1980.7 | 15.26 | 7.12 | 91.1 | 91.2 | 97.5 | 94
IQ3_M · ctx:16384 · kv:k16v16 · long | 70.0 | 70.8 | 1828.6 | 1880.2 | 14.29 | 7.79 | 97.6 | 96.8 | 89.1 | 94
Q4_K_M · ctx:16384 · kv:k16v16 · long | 64.2 | 64.6 | 1994.8 | 2018.8 | 15.58 | 7.12 | 89.3 | 89.4 | 97.5 | 93
IQ3_M · ctx:32768 · kv:k8v8 · long | 67.8 | 68.1 | 1887.2 | 1904.4 | 14.74 | 7.79 | 94.2 | 94.6 | 89.1 | 92
Q4_K_M · ctx:16384 · kv:k8v8 · default | 62.2 | 62.4 | 2056.2 | 2083.9 | 16.06 | 7.12 | 86.4 | 86.7 | 97.5 | 91
Q4_K_M · ctx:32768 · kv:k8v8 · long | 61.8 | 62.1 | 2070.9 | 2139.2 | 16.18 | 7.12 | 85.9 | 85.2 | 97.5 | 90
Q5_K_M · ctx:8192 · kv:k16v16 · default | 58.7 | 60.2 | 2180.4 | 2198.1 | 17.03 | 7.01 | 82.4 | 81.9 | 99.0 | 89
Q5_K_M · ctx:16384 · kv:k16v16 · long | 59.0 | 59.5 | 2169.7 | 2206.9 | 16.95 | 7.01 | 82.1 | 82.0 | 99.0 | 89
Q5_K_M · ctx:32768 · kv:k8v8 · long | 56.8 | 57.3 | 2253.8 | 2294.5 | 17.61 | 7.01 | 79.1 | 78.9 | 99.0 | 87
Q5_K_M · ctx:16384 · kv:k8v8 · default | 56.4 | 56.6 | 2268.0 | 2414.7 | 17.72 | 7.01 | 78.3 | 76.7 | 99.0 | 86
Q8_0 · ctx:8192 · kv:k16v16 · default | 49.3 | 49.3 | 2599.0 | 2704.1 | 20.3 | 6.94 | 68.3 | 67.7 | 100.0 | 81
Q8_0 · ctx:16384 · kv:k16v16 · long | 48.4 | 48.6 | 2647.3 | 2702.2 | 20.68 | 6.94 | 67.2 | 67.1 | 100.0 | 80
Q8_0 · ctx:16384 · kv:k8v8 · default | 46.2 | 47.6 | 2771.2 | 2784.0 | 21.65 | 6.94 | 65.0 | 64.6 | 100.0 | 79
Q8_0 · ctx:32768 · kv:k8v8 · long | 46.4 | 46.8 | 2758.6 | 2789.9 | 21.55 | 6.94 | 64.6 | 64.7 | 100.0 | 79
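
The relative columns can be partly checked by hand. The following is a minimal sketch, not sigilant-runner's actual scoring code; the helper names `ppl_pct` and `composite_score` are hypothetical. PPL% matches the lowest PPL in the run (6.94, the Q8_0 rows) divided by each row's PPL. The Score column is consistent with a weighted mean of the three percentage columns using weights 0.3 / 0.3 / 0.4, but those weights are inferred from the table and are an assumption. Only the combined 0.6 on TPS% plus TTFT% is constrained by the data, so the even split is a guess. TPS% and TTFT% do not reproduce exactly from the rounded TPS and TTFT columns, so the sketch takes them as given.

```python
# Sketch only: hypothetical helpers, not part of sigilant-runner.

def ppl_pct(ppl: float, best_ppl: float) -> float:
    """Relative quality: the lowest perplexity in the run maps to 100."""
    return round(100.0 * best_ppl / ppl, 1)


def composite_score(tps_pct: float, ttft_pct: float, ppl_pct_value: float,
                    weights: tuple = (0.3, 0.3, 0.4)) -> int:
    """Weighted mean of the three percentage columns, rounded to an integer.

    ASSUMPTION: the default weights are inferred from the table above,
    not taken from any documented formula.
    """
    w_tps, w_ttft, w_ppl = weights
    return round(w_tps * tps_pct + w_ttft * ttft_pct + w_ppl * ppl_pct_value)


if __name__ == "__main__":
    best_ppl = 6.94  # lowest PPL in the table (Q8_0 rows)
    print(ppl_pct(7.79, best_ppl))              # IQ3_M rows
    print(composite_score(100.0, 100.0, 89.1))  # best config
    print(composite_score(68.3, 67.7, 100.0))   # Q8_0, ctx:8192
```

With the table's values this prints 89.1, 96, and 81, matching the PPL% and Score entries for those rows.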

Best config: IQ3_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=15.0 TPS Δ=22.1 TTFT Δ=-807.2ms PPL Δ=0.85
Agent smoke: 1/5 (0.2) [model_limited]

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
