sigilant-runner · Qwen2.5-1.5B-Instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
Q5_K_M · ctx:16384 · kv:k16v16 · long  <- best | 69.8 | 70.2 | 1833.4 | 1917.6 | 14.32 | 13.61 | 96.4 | 98.3 | 98.6 | 98
Q4_K_M · ctx:8192 · kv:k16v16 · default | 70.6 | 74.8 | 1815.6 | 1902.8 | 14.18 | 14.32 | 100.0 | 99.1 | 93.7 | 97
Q5_K_M · ctx:8192 · kv:k16v16 · default | 68.8 | 70.3 | 1861.2 | 1884.6 | 14.54 | 13.61 | 95.7 | 98.4 | 98.6 | 97
Q4_K_M · ctx:16384 · kv:k16v16 · long | 70.0 | 70.6 | 1829.7 | 1870.1 | 14.29 | 14.32 | 96.8 | 99.6 | 93.7 | 96
Q8_0 · ctx:8192 · kv:k16v16 · default | 62.0 | 63.4 | 2064.9 | 2079.0 | 16.13 | 13.42 | 86.3 | 88.9 | 100.0 | 92
Q8_0 · ctx:16384 · kv:k16v16 · long | 60.8 | 63.6 | 2103.7 | 2191.6 | 16.43 | 13.42 | 85.6 | 85.8 | 100.0 | 91
Q3_K_M · ctx:8192 · kv:k16v16 · default | 66.6 | 68.8 | 1924.0 | 2012.5 | 15.04 | 15.82 | 93.2 | 93.6 | 84.8 | 90
Q5_K_M · ctx:16384 · kv:k8v8 · default | 61.0 | 61.8 | 2096.8 | 2209.5 | 16.38 | 13.61 | 84.5 | 85.6 | 98.6 | 90
Q3_K_M · ctx:16384 · kv:k16v16 · long | 66.3 | 67.1 | 1930.2 | 1985.5 | 15.09 | 15.82 | 91.8 | 94.1 | 84.8 | 89
Q4_K_M · ctx:16384 · kv:k8v8 · default | 61.0 | 64.1 | 2097.8 | 2194.2 | 16.39 | 14.32 | 86.0 | 85.9 | 93.7 | 89
Q5_K_M · ctx:32768 · kv:k8v8 · long | 58.6 | 61.8 | 2190.5 | 2347.8 | 17.12 | 13.61 | 82.8 | 81.3 | 98.6 | 89
Q4_K_M · ctx:32768 · kv:k8v8 · long | 59.8 | 62.8 | 2145.4 | 2274.7 | 16.77 | 14.32 | 84.3 | 83.4 | 93.7 | 88
Q8_0 · ctx:32768 · kv:k8v8 · long | 55.9 | 58.2 | 2289.4 | 2332.8 | 17.88 | 13.42 | 78.5 | 79.7 | 100.0 | 87
Q8_0 · ctx:16384 · kv:k8v8 · default | 55.6 | 56.0 | 2302.3 | 2330.0 | 17.99 | 13.42 | 76.8 | 79.6 | 100.0 | 87
Q3_K_M · ctx:16384 · kv:k8v8 · default | 59.2 | 60.0 | 2161.4 | 2247.9 | 16.88 | 15.82 | 82.0 | 83.6 | 84.8 | 83
Q3_K_M · ctx:32768 · kv:k8v8 · long | 57.6 | 60.5 | 2228.3 | 2339.9 | 17.41 | 15.82 | 81.2 | 80.7 | 84.8 | 83
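
The PPL% column appears to be each config's perplexity normalized against the lowest perplexity in the run (Q8_0 at 13.42). The sketch below checks that reading against the table values. The formula and the names `ppl_percent` and `ppl_pct` are assumptions inferred from the numbers, not documented sigilant-runner behavior. TPS% and TTFT% are not reproduced here, since the report does not state their reference values.

```python
# PPL values copied from the table above, keyed by quantization.
ppl = {"Q8_0": 13.42, "Q5_K_M": 13.61, "Q4_K_M": 14.32, "Q3_K_M": 15.82}


def ppl_percent(values):
    """Assumed normalization: 100 * (lowest PPL) / (this PPL), one decimal."""
    best = min(values.values())
    return {name: round(100 * best / v, 1) for name, v in values.items()}


ppl_pct = ppl_percent(ppl)
for name, pct in ppl_pct.items():
    print(f"{name}: {pct}")
```

Under this assumption the computed values agree with the PPL% entries printed for all four quantization levels.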

Best config: Q5_K_M · ctx:16384 · kv:k16v16 · long
Auto baseline compare (best vs. Q8_0 · ctx:8192 · kv:k16v16 · default): score Δ=6.00 TPS Δ=7.80 TTFT Δ=-231.5ms PPL Δ=0.19
                     TPS p95 Δ=6.80 TTFT p95 Δ=-161.4ms
Confidence: target=medium gap_before=1.02% var_before=3.98% replay=False(disabled) gap_after=1.02%
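
The deltas in the baseline comparison can be recomputed from the table. The report does not name the baseline row; the sketch below assumes it is Q8_0 · ctx:8192 · kv:k16v16 · default, because that is the only row whose differences from the best config match all six reported deltas. The dictionary keys and the `deltas` helper are illustrative names, not part of the tool.

```python
# Values copied from the table above.
best = {  # Q5_K_M · ctx:16384 · kv:k16v16 · long
    "score": 98, "tps": 69.8, "tps_p95": 70.2,
    "ttft_ms": 1833.4, "ttft_p95_ms": 1917.6, "ppl": 13.61,
}
baseline = {  # Q8_0 · ctx:8192 · kv:k16v16 · default (assumed baseline)
    "score": 92, "tps": 62.0, "tps_p95": 63.4,
    "ttft_ms": 2064.9, "ttft_p95_ms": 2079.0, "ppl": 13.42,
}


def deltas(candidate, reference):
    """Per-metric difference, candidate minus reference, rounded to 2 places."""
    return {k: round(candidate[k] - reference[k], 2) for k in candidate}


delta = deltas(best, baseline)
for metric, value in delta.items():
    print(f"{metric}: {value:+.2f}")
```

A positive TPS delta and a negative TTFT delta both favor the best config, while the positive PPL delta shows a small perplexity cost relative to Q8_0.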

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
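
For context on why PPL is only a proxy: perplexity is conventionally the exponential of the mean negative log-likelihood per token over an evaluation text, so it measures how well the model predicts that text rather than how well it performs a task. The sketch below shows the standard definition. It is not taken from sigilant-runner or llama.cpp source, and the function name and input format are assumptions.

```python
import math


def perplexity(token_logprobs):
    """Return exp(-mean(log p)) for a list of natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# A model that assigns probability 0.5 to every token has perplexity of about 2,
# i.e. it is as uncertain as a fair coin flip at each step.
coin_flip_ppl = perplexity([math.log(0.5)] * 4)
print(coin_flip_ppl)
```

Lower values mean the quantized model's predictions stay closer to the evaluation text, which is why Q8_0 serves as the quality reference in the table.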
