sigilant-runner · Qwen2.5-1.5B-Instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
Q5_K_M · ctx:16384 · kv:k16v16 · long  <- best | 67.0 | 68.1 | 1909.2 | 1941.3 | 14.91 | 13.61 | 96.2 | 98.5 | 98.6 | 98
Q4_K_M · ctx:8192 · kv:k16v16 · default | 68.0 | 70.8 | 1883.1 | 1932.9 | 14.71 | 14.32 | 98.9 | 99.4 | 93.7 | 97
Q5_K_M · ctx:8192 · kv:k16v16 · default | 65.8 | 68.5 | 1948.0 | 2031.9 | 15.21 | 13.61 | 95.7 | 95.3 | 98.6 | 97
Q4_K_M · ctx:16384 · kv:k16v16 · long | 68.8 | 71.6 | 1858.7 | 2262.8 | 14.53 | 14.32 | 100.0 | 92.7 | 93.7 | 96
Q8_0 · ctx:8192 · kv:k16v16 · default | 61.1 | 61.8 | 2095.1 | 2185.0 | 16.37 | 13.42 | 87.6 | 88.6 | 100.0 | 93
Q8_0 · ctx:16384 · kv:k16v16 · long | 60.5 | 61.7 | 2114.9 | 2145.0 | 16.52 | 13.42 | 87.1 | 89.0 | 100.0 | 93
Q3_K_M · ctx:8192 · kv:k16v16 · default | 66.4 | 66.8 | 1928.6 | 2014.6 | 15.07 | 15.82 | 94.9 | 96.2 | 84.8 | 91
Q4_K_M · ctx:16384 · kv:k8v8 · default | 61.5 | 65.3 | 2078.9 | 2171.8 | 16.24 | 14.32 | 90.3 | 89.2 | 93.7 | 91
Q4_K_M · ctx:32768 · kv:k8v8 · long | 61.1 | 63.5 | 2095.1 | 2166.5 | 16.37 | 14.32 | 88.7 | 89.0 | 93.7 | 91
Q3_K_M · ctx:16384 · kv:k16v16 · long | 65.6 | 67.2 | 1952.0 | 2228.9 | 15.25 | 15.82 | 94.6 | 91.0 | 84.8 | 90
Q5_K_M · ctx:32768 · kv:k8v8 · long | 58.7 | 59.1 | 2181.1 | 2217.3 | 17.04 | 13.61 | 83.9 | 86.2 | 98.6 | 90
Q5_K_M · ctx:16384 · kv:k8v8 · default | 58.3 | 59.3 | 2194.2 | 2222.5 | 17.14 | 13.61 | 83.8 | 85.8 | 98.6 | 90
Q8_0 · ctx:16384 · kv:k8v8 · default | 54.3 | 56.1 | 2355.4 | 2444.1 | 18.40 | 13.42 | 78.6 | 79.0 | 100.0 | 87
Q8_0 · ctx:32768 · kv:k8v8 · long | 54.0 | 55.4 | 2373.0 | 2403.9 | 18.54 | 13.42 | 77.9 | 79.4 | 100.0 | 87
Q3_K_M · ctx:16384 · kv:k8v8 · default | 59.0 | 60.3 | 2166.7 | 2220.4 | 16.93 | 15.82 | 85.0 | 86.4 | 84.8 | 85
Q3_K_M · ctx:32768 · kv:k8v8 · long | 58.8 | 59.4 | 2178.8 | 2212.3 | 17.02 | 15.82 | 84.2 | 86.3 | 84.8 | 85
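
How the percentage and Score columns relate to each other is not stated in this report. The sketch below is a consistency check, not the runner's documented formula. It assumes PPL% is the best (lowest) PPL divided by each row's PPL, and that Score is a rounded weighted sum of TPS%, TTFT%, and PPL%. The weights 0.35 / 0.25 / 0.40 and the function names are hypothetical, picked only because they reproduce the listed values.

```python
# PPL and PPL% per quantization, copied from the table.
PPL_REPORTED = {
    "Q8_0": (13.42, 100.0),
    "Q5_K_M": (13.61, 98.6),
    "Q4_K_M": (14.32, 93.7),
    "Q3_K_M": (15.82, 84.8),
}

# (TPS%, TTFT%, PPL%, Score) for all 16 rows, copied top to bottom.
REPORTED = [
    (96.2, 98.5, 98.6, 98),
    (98.9, 99.4, 93.7, 97),
    (95.7, 95.3, 98.6, 97),
    (100.0, 92.7, 93.7, 96),
    (87.6, 88.6, 100.0, 93),
    (87.1, 89.0, 100.0, 93),
    (94.9, 96.2, 84.8, 91),
    (90.3, 89.2, 93.7, 91),
    (88.7, 89.0, 93.7, 91),
    (94.6, 91.0, 84.8, 90),
    (83.9, 86.2, 98.6, 90),
    (83.8, 85.8, 98.6, 90),
    (78.6, 79.0, 100.0, 87),
    (77.9, 79.4, 100.0, 87),
    (85.0, 86.4, 84.8, 85),
    (84.2, 86.3, 84.8, 85),
]

# ASSUMPTION: these weights are inferred from the numbers, not taken from the tool.
ASSUMED_WEIGHTS = (0.35, 0.25, 0.40)  # TPS%, TTFT%, PPL%


def pct_of_best_lower_is_better(value, best):
    """Express a lower-is-better metric as a percentage of the best value."""
    return round(100.0 * best / value, 1)


def weighted_score(tps_pct, ttft_pct, ppl_pct, weights=ASSUMED_WEIGHTS):
    """Hypothetical composite score: rounded weighted sum of the three percentages."""
    w_tps, w_ttft, w_ppl = weights
    return round(w_tps * tps_pct + w_ttft * ttft_pct + w_ppl * ppl_pct)


best_ppl = min(ppl for ppl, _ in PPL_REPORTED.values())
ppl_mismatches = [
    quant
    for quant, (ppl, reported_pct) in PPL_REPORTED.items()
    if pct_of_best_lower_is_better(ppl, best_ppl) != reported_pct
]
score_mismatches = [
    row for row in REPORTED if weighted_score(*row[:3]) != row[3]
]

if __name__ == "__main__":
    print("PPL% mismatches:", ppl_mismatches)
    print("Score mismatches:", score_mismatches)
```

Both mismatch lists come back empty for this table. Other weightings may fit equally well, so treat the weights as a reading aid rather than the tool's definition.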

Best config: Q5_K_M · ctx:16384 · kv:k16v16 · long
Auto baseline compare: score Δ=5.00 TPS Δ=5.90 TTFT Δ=-185.9ms PPL Δ=0.19
                     TPS p95 Δ=6.30 TTFT p95 Δ=-243.7ms
Confidence: target=medium gap_before=1.02% var_before=2.52% replay=False(disabled) gap_after=1.02%
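
The report does not say which row serves as the auto baseline. The sketch below assumes each delta is the best config minus the baseline, and that the baseline is the Q8_0 · ctx:8192 · kv:k16v16 · default row, since that is the only row whose differences match every reported Δ. The `Result` class and `deltas` function are illustrative names, not part of sigilant-runner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    score: float
    tps: float
    tps_p95: float
    ttft_ms: float
    ttft_p95_ms: float
    ppl: float


# Values copied from the table rows.
BEST = Result(98, 67.0, 68.1, 1909.2, 1941.3, 13.61)  # Q5_K_M ctx:16384 k16v16 long
# ASSUMPTION: baseline row inferred from the deltas, not named in the report.
ASSUMED_BASELINE = Result(93, 61.1, 61.8, 2095.1, 2185.0, 13.42)  # Q8_0 ctx:8192 k16v16 default


def deltas(best, baseline):
    """Return best minus baseline for each metric, rounded to two decimals."""
    return {
        "score": round(best.score - baseline.score, 2),
        "tps": round(best.tps - baseline.tps, 2),
        "tps_p95": round(best.tps_p95 - baseline.tps_p95, 2),
        "ttft_ms": round(best.ttft_ms - baseline.ttft_ms, 2),
        "ttft_p95_ms": round(best.ttft_p95_ms - baseline.ttft_p95_ms, 2),
        "ppl": round(best.ppl - baseline.ppl, 2),
    }


if __name__ == "__main__":
    for name, value in deltas(BEST, ASSUMED_BASELINE).items():
        print(f"{name} delta = {value}")
```

Under this reading, negative TTFT deltas mean the best config responds sooner, and the positive PPL delta means it gives up a small amount of perplexity relative to Q8_0 in exchange.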

PPL (perplexity; lower is better) is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
