sigilant-runner · Qwen2.5-1.5B-Instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 | TTFT (ms) | TTFT p95 | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
Q4_K_M · ctx:16384 · kv:k16v16 · long  <- best | 73.3 | n/a | 1745.8 | n/a | 13.64 | 14.32 | 100.0 | 100.0 | 93.7 | 97
Q5_K_M · ctx:16384 · kv:k16v16 · long | 70.1 | n/a | 1826.5 | n/a | 14.27 | 13.61 | 95.6 | 95.6 | 98.6 | 97
Q5_K_M · ctx:8192 · kv:k16v16 · default | 69.1 | n/a | 1851.7 | n/a | 14.47 | 13.61 | 94.3 | 94.3 | 98.6 | 96
Q4_K_M · ctx:8192 · kv:k16v16 · default | 70.2 | n/a | 1824.1 | n/a | 14.25 | 14.32 | 95.8 | 95.7 | 93.7 | 95
Q8_0 · ctx:16384 · kv:k16v16 · long | 65.2 | n/a | 1963.5 | n/a | 15.34 | 13.42 | 88.9 | 88.9 | 100.0 | 93
Q3_K_M · ctx:8192 · kv:k16v16 · default | 70.0 | n/a | 1828.3 | n/a | 14.28 | 15.82 | 95.5 | 95.5 | 84.8 | 91
Q3_K_M · ctx:16384 · kv:k16v16 · long | 69.5 | n/a | 1843.0 | n/a | 14.40 | 15.82 | 94.8 | 94.7 | 84.8 | 91
Q5_K_M · ctx:16384 · kv:k8v8 · default | 63.4 | n/a | 2019.5 | n/a | 15.78 | 13.61 | 86.5 | 86.4 | 98.6 | 91
Q8_0 · ctx:8192 · kv:k16v16 · default | 62.3 | n/a | 2055.4 | n/a | 16.06 | 13.42 | 85.0 | 84.9 | 100.0 | 91
Q4_K_M · ctx:32768 · kv:k8v8 · long | 64.4 | n/a | 1987.8 | n/a | 15.53 | 14.32 | 87.9 | 87.8 | 93.7 | 90
Q5_K_M · ctx:32768 · kv:k8v8 · long | 62.3 | n/a | 2054.7 | n/a | 16.05 | 13.61 | 85.0 | 85.0 | 98.6 | 90
Q4_K_M · ctx:16384 · kv:k8v8 · default | 62.3 | n/a | 2053.7 | n/a | 16.04 | 14.32 | 85.0 | 85.0 | 93.7 | 88
Q8_0 · ctx:32768 · kv:k8v8 · long | 57.3 | n/a | 2232.3 | n/a | 17.44 | 13.42 | 78.2 | 78.2 | 100.0 | 87
Q8_0 · ctx:16384 · kv:k8v8 · default | 56.1 | n/a | 2282.3 | n/a | 17.83 | 13.42 | 76.5 | 76.5 | 100.0 | 86
Q3_K_M · ctx:16384 · kv:k8v8 · default | 62.6 | n/a | 2045.9 | n/a | 15.98 | 15.82 | 85.4 | 85.3 | 84.8 | 85
Q3_K_M · ctx:32768 · kv:k8v8 · long | 59.5 | n/a | 2152.5 | n/a | 16.82 | 15.82 | 81.2 | 81.1 | 84.8 | 83
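
The TPS%, TTFT% and PPL% columns appear to be each raw metric normalized against the best value in the run: highest TPS, lowest TTFT, lowest PPL. The sketch below is an inference from the numbers in the table, not the runner's documented formula. It recomputes those columns for three rows, and the helper name `normalize` is hypothetical. The Score column is not reproduced, because its weighting is not stated in this report.

```python
# Raw (TPS, TTFT in ms, PPL) values copied from three rows of the table above.
rows = {
    "Q4_K_M ctx:16384 k16v16 long": (73.3, 1745.8, 14.32),
    "Q5_K_M ctx:16384 k16v16 long": (70.1, 1826.5, 13.61),
    "Q8_0 ctx:16384 k16v16 long": (65.2, 1963.5, 13.42),
}


def normalize(rows):
    """Return (TPS%, TTFT%, PPL%) per config, relative to the best value seen.

    Assumption: "best" means highest TPS, lowest TTFT and lowest PPL
    among the rows passed in.
    """
    best_tps = max(tps for tps, _, _ in rows.values())
    best_ttft = min(ttft for _, ttft, _ in rows.values())
    best_ppl = min(ppl for _, _, ppl in rows.values())
    return {
        name: (
            round(100 * tps / best_tps, 1),    # higher TPS is better
            round(100 * best_ttft / ttft, 1),  # lower TTFT is better
            round(100 * best_ppl / ppl, 1),    # lower PPL is better
        )
        for name, (tps, ttft, ppl) in rows.items()
    }


for name, percentages in normalize(rows).items():
    print(name, percentages)
```

For these three rows the recomputed percentages match the TPS%, TTFT% and PPL% values printed in the table, which supports the reading but does not confirm how the runner computes them internally.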

Best config: Q4_K_M · ctx:16384 · kv:k16v16 · long
Auto baseline compare: score Δ=4.00 TPS Δ=4.90 TTFT Δ=-137.0ms PPL Δ=0.19
Confidence: target=medium gap_before=0.00% var_before=n/a% replay=False(disabled) gap_after=0.00%

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
