sigilant-runner · Qwen2.5-1.5B-Instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
Q5_K_M · ctx:16384 · kv:k16v16 · long  <- best | 50.8 | 55.0 | 2524.4 | 2882.9 | 19.72 | 13.59 | 96.7 | 92.6 | 98.5 | 97
Q4_K_M · ctx:16384 · kv:k16v16 · long | 53.2 | 55.5 | 2406.4 | 2889.8 | 18.80 | 14.33 | 99.4 | 94.8 | 93.4 | 96
Q4_K_M · ctx:8192 · kv:k16v16 · default | 50.8 | 56.2 | 2520.2 | 2590.1 | 19.69 | 14.33 | 97.7 | 97.7 | 93.4 | 96
Q5_K_M · ctx:8192 · kv:k16v16 · default | 49.4 | 54.7 | 2595.4 | 2891.9 | 20.28 | 13.59 | 95.1 | 91.1 | 98.5 | 96
Q8_0 · ctx:16384 · kv:k16v16 · long | 47.8 | 50.8 | 2676.7 | 2867.7 | 20.91 | 13.39 | 90.1 | 90.1 | 100.0 | 94
Q8_0 · ctx:8192 · kv:k16v16 · default | 46.3 | 51.7 | 2767.0 | 2865.1 | 21.62 | 13.39 | 89.5 | 88.7 | 100.0 | 94
Q3_K_M · ctx:16384 · kv:k16v16 · long | 51.4 | 54.2 | 2497.6 | 2670.8 | 19.52 | 15.77 | 96.5 | 96.7 | 84.9 | 92
Q3_K_M · ctx:8192 · kv:k16v16 · default | 48.4 | 53.6 | 2647.8 | 2748.2 | 20.69 | 15.77 | 93.2 | 92.6 | 84.9 | 90
Q5_K_M · ctx:16384 · kv:k8v8 · default | 43.8 | 46.6 | 2929.2 | 3779.6 | 22.88 | 13.59 | 82.6 | 75.3 | 98.5 | 88
Q8_0 · ctx:16384 · kv:k8v8 · default | 40.8 | 44.4 | 3157.4 | 3394.1 | 24.66 | 13.39 | 77.8 | 76.3 | 100.0 | 86
Q5_K_M · ctx:32768 · kv:k8v8 · long | 40.2 | 46.2 | 3190.4 | 3574.9 | 24.93 | 13.59 | 78.9 | 73.9 | 98.5 | 86
Q4_K_M · ctx:16384 · kv:k8v8 · default | 44.9 | 49.0 | 2866.4 | 4940.0 | 22.40 | 14.33 | 85.8 | 68.2 | 93.4 | 85
Q4_K_M · ctx:32768 · kv:k8v8 · long | 41.3 | 48.3 | 3110.4 | 3603.5 | 24.30 | 14.33 | 81.8 | 74.6 | 93.4 | 85
Q3_K_M · ctx:32768 · kv:k8v8 · long | 44.3 | 46.1 | 2889.4 | 3189.1 | 22.58 | 15.77 | 82.6 | 82.3 | 84.9 | 83
Q3_K_M · ctx:16384 · kv:k8v8 · default | 43.2 | 47.0 | 2967.6 | 3206.4 | 23.18 | 15.77 | 82.4 | 80.9 | 84.9 | 83
Q8_0 · ctx:32768 · kv:k8v8 · long | 39.7 | 43.5 | 3226.2 | 5142.9 | 25.20 | 13.39 | 76.0 | 62.5 | 100.0 | 83
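
The PPL% column appears to be each configuration's perplexity relative to the lowest perplexity in the run (13.39, from Q8_0), expressed as a percentage. That relationship is inferred from the table values, not from documented sigilant-runner behavior, and the helper name below is hypothetical. A minimal Python sketch that reproduces the column:

```python
# Assumption: PPL% = 100 * (lowest PPL in the run) / (this config's PPL).
# Inferred from the table values; not confirmed by sigilant-runner documentation.

# Perplexity per quantization, copied from the PPL column of the table.
PPL_BY_QUANT = {"Q8_0": 13.39, "Q5_K_M": 13.59, "Q4_K_M": 14.33, "Q3_K_M": 15.77}


def ppl_percent(ppl: float, best_ppl: float) -> float:
    """Return perplexity relative to the best (lowest) value, in percent."""
    return round(100.0 * best_ppl / ppl, 1)


best_ppl = min(PPL_BY_QUANT.values())
ppl_pct = {quant: ppl_percent(ppl, best_ppl) for quant, ppl in PPL_BY_QUANT.items()}
print(ppl_pct)
```

In this run perplexity depends only on the quantization level: rows sharing a quantization report the same PPL regardless of context size or KV-cache setting.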

Best config: Q5_K_M · ctx:16384 · kv:k16v16 · long
Auto baseline compare: score Δ=3.00 TPS Δ=4.50 TTFT Δ=-242.6ms PPL Δ=0.20
                     TPS p95 Δ=3.30 TTFT p95 Δ=17.8ms
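
The report does not name the baseline row. The deltas are, however, consistent with subtracting the Q8_0 · ctx:8192 · kv:k16v16 · default row from the best config, so the sketch below assumes that row is the baseline. The dictionary keys and the function name are illustrative, not part of the tool.

```python
# Assumption: the "auto baseline" is the Q8_0 · ctx:8192 · kv:k16v16 · default row.
# The report does not say so; this row is used because it reproduces every delta.

best = {"score": 97, "tps": 50.8, "tps_p95": 55.0,
        "ttft_ms": 2524.4, "ttft_p95_ms": 2882.9, "ppl": 13.59}
baseline = {"score": 94, "tps": 46.3, "tps_p95": 51.7,
            "ttft_ms": 2767.0, "ttft_p95_ms": 2865.1, "ppl": 13.39}


def metric_deltas(best_row: dict, baseline_row: dict) -> dict:
    """Return best minus baseline for every metric, rounded to two decimals."""
    return {key: round(best_row[key] - baseline_row[key], 2) for key in best_row}


delta = metric_deltas(best, baseline)
print(delta)
```

Under this reading, a negative TTFT delta is an improvement (first token arrives sooner), while the positive PPL delta is a small quality cost relative to Q8_0.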
Confidence: target=medium gap_before=1.03% var_before=11.72% replay=False(disabled) gap_after=1.03%
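
The gap value of 1.03% matches the difference between the top score (97) and the runner-up score (96), taken as a fraction of the top score. This is an inference from the displayed, rounded scores; the tool may compute it from unrounded values or by a different definition. A small sketch under that assumption, with a hypothetical function name:

```python
# Assumption: gap = (best score - runner-up score) / best score, in percent.
# Inferred from the displayed scores; the tool's actual definition is not stated.

def score_gap_percent(best_score: float, runner_up_score: float) -> float:
    """Return the relative gap between the two top scores, in percent."""
    return round(100.0 * (best_score - runner_up_score) / best_score, 2)


gap = score_gap_percent(97, 96)
print(f"gap={gap}%")
```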

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
