sigilant-runner · Qwen2.5-1.5B-Instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
Q5_K_M · ctx:16384 · kv:k16v16 · long  <- best | 68.0 | 70.5 | 1880.0 | 1915.7 | 14.69 | 13.61 | 97.2 | 97.9 | 98.6 | 98
Q4_K_M · ctx:8192 · kv:k16v16 · default | 70.2 | 72.3 | 1825.6 | 1890.6 | 14.26 | 14.32 | 100.0 | 100.0 | 93.7 | 97
Q4_K_M · ctx:16384 · kv:k16v16 · long | 69.7 | 70.7 | 1837.5 | 1952.3 | 14.36 | 14.32 | 98.5 | 98.1 | 93.7 | 97
Q5_K_M · ctx:8192 · kv:k16v16 · default | 67.5 | 69.3 | 1898.7 | 2137.8 | 14.84 | 13.61 | 96.0 | 92.3 | 98.6 | 96
Q8_0 · ctx:16384 · kv:k16v16 · long | 62.4 | 63.5 | 2050.7 | 2106.6 | 16.02 | 13.42 | 88.4 | 89.4 | 100.0 | 93
Q8_0 · ctx:8192 · kv:k16v16 · default | 60.4 | 62.0 | 2121.6 | 2170.5 | 16.58 | 13.42 | 85.9 | 86.6 | 100.0 | 92
Q5_K_M · ctx:16384 · kv:k8v8 · default | 60.2 | 63.9 | 2126.5 | 2169.5 | 16.62 | 13.61 | 87.1 | 86.5 | 98.6 | 92
Q4_K_M · ctx:32768 · kv:k8v8 · long | 62.3 | 63.9 | 2054.5 | 2123.1 | 16.05 | 14.32 | 88.6 | 89.0 | 93.7 | 91
Q3_K_M · ctx:16384 · kv:k16v16 · long | 65.7 | 68.0 | 1948.8 | 2079.7 | 15.22 | 15.82 | 93.8 | 92.3 | 84.8 | 90
Q4_K_M · ctx:16384 · kv:k8v8 · default | 61.3 | 65.6 | 2090.6 | 2210.4 | 16.33 | 14.32 | 89.0 | 86.4 | 93.7 | 90
Q5_K_M · ctx:32768 · kv:k8v8 · long | 58.5 | 61.7 | 2188.8 | 2277.1 | 17.10 | 13.61 | 84.3 | 83.2 | 98.6 | 90
Q3_K_M · ctx:8192 · kv:k16v16 · default | 64.2 | 67.1 | 1992.5 | 2083.1 | 15.57 | 15.82 | 92.1 | 91.2 | 84.8 | 89
Q8_0 · ctx:32768 · kv:k8v8 · long | 55.9 | 56.0 | 2293.4 | 2480.3 | 17.92 | 13.42 | 78.5 | 77.9 | 100.0 | 87
Q8_0 · ctx:16384 · kv:k8v8 · default | 55.0 | 56.7 | 2327.7 | 2536.2 | 18.19 | 13.42 | 78.4 | 76.5 | 100.0 | 87
Q3_K_M · ctx:32768 · kv:k8v8 · long | 59.7 | 61.2 | 2146.2 | 2198.9 | 16.77 | 15.82 | 84.8 | 85.5 | 84.8 | 85
Q3_K_M · ctx:16384 · kv:k8v8 · default | 59.2 | 61.9 | 2162.4 | 2301.2 | 16.90 | 15.82 | 85.0 | 83.3 | 84.8 | 85
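A minimal Python sketch for checking the table programmatically. It parses a few rows copied from above and re-derives PPL%. The formula PPL% = 100 × lowest PPL / row PPL is an inference: it reproduces every listed PPL% value, with 13.42 (the Q8_0 rows) as the lowest PPL. The report does not say how TPS%, TTFT% or Score are computed, so those columns are parsed but not recomputed. `parse_rows`, `ppl_pct` and `FIELDS` are hypothetical helpers, not part of sigilant-runner.

```python
# Minimal sketch: parse rows of the pipe-delimited report table and
# re-derive the PPL% column.
# Assumption (inferred, matches the listed rows): PPL% = 100 * min_PPL / PPL.
# TPS%, TTFT% and Score use an unstated formula and are only parsed here.

ROWS = """\
Q5_K_M · ctx:16384 · kv:k16v16 · long  <- best | 68.0 | 70.5 | 1880.0 | 1915.7 | 14.69 | 13.61 | 97.2 | 97.9 | 98.6 | 98
Q4_K_M · ctx:8192 · kv:k16v16 · default | 70.2 | 72.3 | 1825.6 | 1890.6 | 14.26 | 14.32 | 100.0 | 100.0 | 93.7 | 97
Q8_0 · ctx:16384 · kv:k16v16 · long | 62.4 | 63.5 | 2050.7 | 2106.6 | 16.02 | 13.42 | 88.4 | 89.4 | 100.0 | 93
Q3_K_M · ctx:16384 · kv:k16v16 · long | 65.7 | 68.0 | 1948.8 | 2079.7 | 15.22 | 15.82 | 93.8 | 92.3 | 84.8 | 90
"""

# Hypothetical field names for the numeric columns, in table order.
FIELDS = ["tps", "tps_p95", "ttft_ms", "ttft_p95_ms", "itl_ms",
          "ppl", "tps_pct", "ttft_pct", "ppl_pct", "score"]


def parse_rows(text: str) -> list[dict]:
    """Split each line on '|' into a config label plus numeric fields."""
    rows = []
    for line in text.strip().splitlines():
        label, *values = [cell.strip() for cell in line.split("|")]
        # Drop the "<- best" marker so labels compare cleanly.
        label = label.replace("<- best", "").strip()
        row = {"config": label}
        row.update(zip(FIELDS, map(float, values)))
        rows.append(row)
    return rows


def ppl_pct(rows: list[dict]) -> list[float]:
    """Quality relative to the lowest-perplexity config, rounded like the table."""
    best = min(r["ppl"] for r in rows)
    return [round(100 * best / r["ppl"], 1) for r in rows]


rows = parse_rows(ROWS)
for row, pct in zip(rows, ppl_pct(rows)):
    print(f"{row['config']:<40} listed={row['ppl_pct']:>5} recomputed={pct:>5}")
```

Because lower perplexity is better, the lowest-PPL config scores 100.0 and every other row is scaled down from it.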

Best config: Q5_K_M · ctx:16384 · kv:k16v16 · long
Auto baseline compare vs Q8_0 · ctx:16384 · kv:k16v16 · long (Δ = best − baseline):
  score Δ=5.00 TPS Δ=5.60 TTFT Δ=-170.7ms PPL Δ=0.19
  TPS p95 Δ=7.00 TTFT p95 Δ=-190.9ms
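The baseline deltas are plain differences, best config minus baseline config. A minimal Python sketch, assuming the baseline is the Q8_0 · ctx:16384 · kv:k16v16 · long row. That row is inferred because it is the only one that yields every printed delta. `deltas`, `BEST` and `BASELINE` are hypothetical names used only for illustration.

```python
# Minimal sketch: reproduce the baseline-compare figures as best - baseline.
# Assumption: baseline = Q8_0 · ctx:16384 · kv:k16v16 · long (inferred from
# the deltas themselves; values copied from the table above).

BEST = {"score": 98, "tps": 68.0, "tps_p95": 70.5,
        "ttft_ms": 1880.0, "ttft_p95_ms": 1915.7, "ppl": 13.61}
BASELINE = {"score": 93, "tps": 62.4, "tps_p95": 63.5,
            "ttft_ms": 2050.7, "ttft_p95_ms": 2106.6, "ppl": 13.42}


def deltas(best: dict, baseline: dict) -> dict:
    """Per-metric difference best - baseline, rounded to hide float noise."""
    return {k: round(best[k] - baseline[k], 2) for k in best}


d = deltas(BEST, BASELINE)
# Prints: score Δ=5.00 TPS Δ=5.60 TTFT Δ=-170.7ms PPL Δ=0.19
print(f"score Δ={d['score']:.2f} TPS Δ={d['tps']:.2f} "
      f"TTFT Δ={d['ttft_ms']:.1f}ms PPL Δ={d['ppl']:.2f}")
# Prints: TPS p95 Δ=7.00 TTFT p95 Δ=-190.9ms
print(f"TPS p95 Δ={d['tps_p95']:.2f} TTFT p95 Δ={d['ttft_p95_ms']:.1f}ms")
```

Read the signs per metric. A positive TPS Δ and a negative TTFT Δ favor the best config, because it is faster and reaches the first token sooner. The positive PPL Δ means its perplexity is 0.19 higher, a small quality cost relative to the baseline.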
Confidence: target=medium gap_before=1.02% var_before=3.03% replay=False(disabled) gap_after=1.02%

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
