sigilant-runner · Qwen2.5-1.5B-Instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 | TTFT (ms) | TTFT p95 | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
Q5_K_M · ctx:8192 · kv:k16v16 · default  <- best | 68.1 | n/a | 1879.2 | n/a | 14.68 | 13.61 | 99.4 | 99.4 | 98.6 | 99
Q4_K_M · ctx:8192 · kv:k16v16 · default | 68.5 | n/a | 1867.4 | n/a | 14.59 | 14.32 | 100.0 | 100.0 | 93.7 | 97
Q5_K_M · ctx:16384 · kv:k16v16 · long | 63.4 | n/a | 2017.8 | n/a | 15.76 | 13.61 | 92.6 | 92.5 | 98.6 | 95
Q4_K_M · ctx:16384 · kv:k16v16 · long | 63.7 | n/a | 2008.5 | n/a | 15.69 | 14.32 | 93.0 | 93.0 | 93.7 | 93
Q4_K_M · ctx:16384 · kv:k8v8 · default | 61.7 | n/a | 2073.4 | n/a | 16.20 | 14.32 | 90.1 | 90.1 | 93.7 | 92
Q8_0 · ctx:8192 · kv:k16v16 · default | 59.7 | n/a | 2145.0 | n/a | 16.76 | 13.42 | 87.2 | 87.1 | 100.0 | 92
Q5_K_M · ctx:32768 · kv:k8v8 · long | 59.6 | n/a | 2146.2 | n/a | 16.77 | 13.61 | 87.0 | 87.0 | 98.6 | 92
Q5_K_M · ctx:16384 · kv:k8v8 · default | 58.8 | n/a | 2176.4 | n/a | 17.00 | 13.61 | 85.8 | 85.8 | 98.6 | 91
Q8_0 · ctx:16384 · kv:k16v16 · long | 57.9 | n/a | 2212.6 | n/a | 17.29 | 13.42 | 84.5 | 84.4 | 100.0 | 91
Q3_K_M · ctx:8192 · kv:k16v16 · default | 62.5 | n/a | 2048.0 | n/a | 16.00 | 15.82 | 91.2 | 91.2 | 84.8 | 89
Q4_K_M · ctx:32768 · kv:k8v8 · long | 59.3 | n/a | 2158.3 | n/a | 16.86 | 14.32 | 86.6 | 86.5 | 93.7 | 89
Q8_0 · ctx:32768 · kv:k8v8 · long | 56.4 | n/a | 2267.7 | n/a | 17.72 | 13.42 | 82.3 | 82.3 | 100.0 | 89
Q3_K_M · ctx:16384 · kv:k16v16 · long | 59.3 | n/a | 2158.3 | n/a | 16.86 | 15.82 | 86.6 | 86.5 | 84.8 | 86
Q8_0 · ctx:16384 · kv:k8v8 · default | 52.3 | n/a | 2445.5 | n/a | 19.11 | 13.42 | 76.4 | 76.4 | 100.0 | 86
Q3_K_M · ctx:16384 · kv:k8v8 · default | 58.4 | n/a | 2193.7 | n/a | 17.14 | 15.82 | 85.3 | 85.1 | 84.8 | 85
Q3_K_M · ctx:32768 · kv:k8v8 · long | 55.2 | n/a | 2318.6 | n/a | 18.11 | 15.82 | 80.6 | 80.5 | 84.8 | 82
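
The TPS%, TTFT%, and PPL% columns are consistent with each metric being normalized against the best value in the run (highest TPS, lowest TTFT, lowest PPL). The Score column is in turn consistent with a weighted mean that puts 0.6 on the average of the two speed percentages and 0.4 on PPL%. Both the normalization and the weights are inferred from the numbers in the table, not taken from sigilant-runner itself, and the function names are hypothetical. A minimal sketch that checks this reading against a few rows:

```python
# Best values across the 16 configs, read from the table above.
BEST_TPS = 68.5      # highest throughput, tokens/s
BEST_TTFT = 1867.4   # lowest time to first token, ms
BEST_PPL = 13.42     # lowest perplexity


def normalize(tps, ttft_ms, ppl):
    """Return (TPS%, TTFT%, PPL%) relative to the best value of each metric."""
    return (
        100.0 * tps / BEST_TPS,       # higher is better
        100.0 * BEST_TTFT / ttft_ms,  # lower is better, so the ratio is inverted
        100.0 * BEST_PPL / ppl,       # lower is better, so the ratio is inverted
    )


def score(tps, ttft_ms, ppl, speed_weight=0.6):
    """Weighted mean of speed and quality; the 0.6/0.4 split is an assumption."""
    tps_pct, ttft_pct, ppl_pct = normalize(tps, ttft_ms, ppl)
    speed_pct = (tps_pct + ttft_pct) / 2
    return round(speed_weight * speed_pct + (1 - speed_weight) * ppl_pct)


# Q5_K_M · ctx:8192 · kv:k16v16 · default
print([round(p, 1) for p in normalize(68.1, 1879.2, 13.61)])  # [99.4, 99.4, 98.6]
print(score(68.1, 1879.2, 13.61))                             # 99
# Q4_K_M · ctx:8192 · kv:k16v16 · default
print(score(68.5, 1867.4, 14.32))                             # 97
```

Under this reading, the Q4_K_M row leads on both speed metrics but loses the top spot because its PPL% of 93.7 carries 40% of the score.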

Best config: Q5_K_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=+7.00, TPS Δ=+8.40 tok/s, TTFT Δ=-265.8 ms, PPL Δ=+0.19
Confidence: target=medium gap_before=2.02% var_before=n/a replay=False (disabled) gap_after=2.02%
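
The deltas on the "Auto baseline compare" line match what is obtained by subtracting the Q8_0 · ctx:8192 · kv:k16v16 · default row from the best config, and the gap value on the "Confidence" line matches the relative difference between the two highest scores. Both readings are inferred from the table rather than stated by the tool, and the names below are hypothetical. A minimal sketch to check them:

```python
# Rows copied from the table: (TPS, TTFT ms, PPL, Score).
best = (68.1, 1879.2, 13.61, 99)       # Q5_K_M · ctx:8192 · kv:k16v16 · default
runner_up = (68.5, 1867.4, 14.32, 97)  # Q4_K_M · ctx:8192 · kv:k16v16 · default
# Assumption: the auto-selected baseline is the Q8_0 · ctx:8192 row.
baseline = (59.7, 2145.0, 13.42, 92)


def deltas(candidate, reference):
    """Per-metric difference (candidate minus reference), rounded for display."""
    return tuple(round(c - r, 2) for c, r in zip(candidate, reference))


def relative_gap(top_score, second_score):
    """Gap between the two best scores as a percentage of the top score."""
    return round(100.0 * (top_score - second_score) / top_score, 2)


print(deltas(best, baseline))                # (8.4, -265.8, 0.19, 7)
print(relative_gap(best[3], runner_up[3]))   # 2.02
```

Read this way, the best config trades 0.19 points of perplexity for 8.4 tok/s more throughput and a 265.8 ms faster first token relative to the Q8_0 baseline.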

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
