sigilant-runner · Qwen2.5-1.5B-Instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
Q4_K_M · ctx:8192 · kv:k16v16 · default  <- best | 64.8 | 70.4 | 1973.9 | 2053.2 | 15.42 | 14.32 | 99.9 | 100.0 | 93.7 | 97
Q4_K_M · ctx:16384 · kv:k16v16 · long | 64.0 | 70.5 | 1998.3 | 2082.0 | 15.61 | 14.32 | 99.4 | 98.7 | 93.7 | 97
Q5_K_M · ctx:8192 · kv:k16v16 · default | 62.6 | 65.8 | 2045.2 | 2277.7 | 15.98 | 13.61 | 95.0 | 93.3 | 98.6 | 96
Q5_K_M · ctx:16384 · kv:k16v16 · long | 61.7 | 64.5 | 2077.1 | 2142.3 | 16.23 | 13.61 | 93.4 | 95.4 | 98.6 | 96
Q5_K_M · ctx:32768 · kv:k8v8 · long | 58.4 | 59.5 | 2191.8 | 2370.5 | 17.12 | 13.61 | 87.3 | 88.3 | 98.6 | 92
Q8_0 · ctx:8192 · kv:k16v16 · default | 56.4 | 59.3 | 2272.4 | 2347.6 | 17.75 | 13.42 | 85.6 | 87.2 | 100.0 | 92
Q3_K_M · ctx:16384 · kv:k16v16 · long | 62.1 | 64.4 | 2062.2 | 2113.8 | 16.11 | 15.82 | 93.6 | 96.4 | 84.8 | 91
Q4_K_M · ctx:32768 · kv:k8v8 · long | 59.0 | 59.6 | 2170.0 | 2219.8 | 16.95 | 14.32 | 87.8 | 91.7 | 93.7 | 91
Q8_0 · ctx:16384 · kv:k16v16 · long | 56.4 | 57.6 | 2270.8 | 2328.8 | 17.74 | 13.42 | 84.4 | 87.5 | 100.0 | 91
Q3_K_M · ctx:8192 · kv:k16v16 · default | 61.0 | 65.0 | 2099.6 | 2533.2 | 16.41 | 15.82 | 93.2 | 87.5 | 84.8 | 89
Q5_K_M · ctx:16384 · kv:k8v8 · default | 54.9 | 58.0 | 2331.2 | 2546.9 | 18.21 | 13.61 | 83.5 | 82.6 | 98.6 | 89
Q4_K_M · ctx:16384 · kv:k8v8 · default | 55.5 | 58.9 | 2312.7 | 2432.1 | 18.07 | 14.32 | 84.6 | 84.9 | 93.7 | 88
Q8_0 · ctx:32768 · kv:k8v8 · long | 52.8 | 55.2 | 2425.8 | 2530.1 | 18.95 | 13.42 | 79.9 | 81.3 | 100.0 | 88
Q3_K_M · ctx:32768 · kv:k8v8 · long | 55.8 | 56.4 | 2292.6 | 2389.0 | 17.91 | 15.82 | 83.1 | 86.0 | 84.8 | 84
Q8_0 · ctx:16384 · kv:k8v8 · default | 49.0 | 49.1 | 2613.1 | 2707.8 | 20.41 | 13.42 | 72.6 | 75.7 | 100.0 | 84
Q3_K_M · ctx:16384 · kv:k8v8 · default | 53.8 | 54.5 | 2381.9 | 2586.1 | 18.60 | 15.82 | 80.2 | 81.1 | 84.8 | 82
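
Note on the normalized columns: the report does not state how TPS%, TTFT% and PPL% are computed. The rows checked here are consistent with the following reading, which is an inference from the numbers and not documented sigilant-runner behavior: TPS% and TTFT% average the mean ratio and the p95 ratio against the best value in each column, and PPL% is the lowest PPL divided by the row's PPL. The function and constant names below are hypothetical. The Score column is not reproduced, since its weighting cannot be recovered reliably from the table.

```python
# Sketch under assumptions: reproduces TPS%, TTFT% and PPL% for three rows
# of the table. The formulas are inferred from the data, not taken from
# sigilant-runner itself.

# Best value per column across the 16 configs (read off the table).
BEST_TPS = 64.8          # highest TPS (Q4_K_M, ctx:8192)
BEST_TPS_P95 = 70.5      # highest TPS p95 (Q4_K_M, ctx:16384)
BEST_TTFT = 1973.9       # lowest TTFT in ms
BEST_TTFT_P95 = 2053.2   # lowest TTFT p95 in ms
BEST_PPL = 13.42         # lowest PPL (Q8_0)


def normalized(tps, tps_p95, ttft, ttft_p95, ppl):
    """Return (TPS%, TTFT%, PPL%) rounded to one decimal place."""
    # Higher is better for throughput: row value over best value.
    tps_pct = 50.0 * (tps / BEST_TPS + tps_p95 / BEST_TPS_P95)
    # Lower is better for latency and perplexity: best value over row value.
    ttft_pct = 50.0 * (BEST_TTFT / ttft + BEST_TTFT_P95 / ttft_p95)
    ppl_pct = 100.0 * BEST_PPL / ppl
    return round(tps_pct, 1), round(ttft_pct, 1), round(ppl_pct, 1)


# Rows copied from the table: (TPS, TPS p95, TTFT, TTFT p95, PPL).
q4_default = normalized(64.8, 70.4, 1973.9, 2053.2, 14.32)
q5_default = normalized(62.6, 65.8, 2045.2, 2277.7, 13.61)
q8_k8_default = normalized(49.0, 49.1, 2613.1, 2707.8, 13.42)

for name, row in [("Q4_K_M ctx:8192", q4_default),
                  ("Q5_K_M ctx:8192", q5_default),
                  ("Q8_0 ctx:16384 k8v8", q8_k8_default)]:
    print(name, row)
```

Because the table values are themselves rounded, a recomputed percentage can differ from the printed one by 0.1 in the last digit.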

Best config: Q4_K_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=5.00 TPS Δ=8.40 tok/s TTFT Δ=-298.5ms PPL Δ=0.90
                     TPS p95 Δ=11.10 tok/s TTFT p95 Δ=-294.4ms
Confidence: target=medium gap_before=0.00% var_before=6.75% replay=False(disabled) gap_after=0.00%
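
Note on the baseline compare: the report does not name the baseline config. The deltas match the best config minus the "Q8_0 · ctx:8192 · kv:k16v16 · default" row, so the sketch below assumes that row is the auto-selected baseline. This is inferred from the numbers only. The dictionary keys are hypothetical names, not sigilant-runner fields.

```python
# Sketch under assumptions: recompute the "Auto baseline compare" deltas as
# best config minus an assumed baseline row (Q8_0, ctx:8192, k16v16, default).

best = {  # Q4_K_M, ctx:8192, kv:k16v16, default
    "score": 97, "tps": 64.8, "tps_p95": 70.4,
    "ttft_ms": 1973.9, "ttft_p95_ms": 2053.2, "ppl": 14.32,
}
baseline = {  # assumed baseline: Q8_0, ctx:8192, kv:k16v16, default
    "score": 92, "tps": 56.4, "tps_p95": 59.3,
    "ttft_ms": 2272.4, "ttft_p95_ms": 2347.6, "ppl": 13.42,
}

# Positive delta means the best config has the larger value. For TTFT a
# negative delta is an improvement; for PPL a positive delta is a small
# quality cost relative to the baseline.
deltas = {key: round(best[key] - baseline[key], 2) for key in best}

for key, value in deltas.items():
    print(f"{key} delta = {value:+.2f}")
```

Read this way, the best config trades 0.90 PPL for 8.40 tok/s more throughput and a 298.5 ms lower TTFT relative to the assumed baseline.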

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
