sigilant-runner · Qwen2.5-1.5B-Instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
Q5_K_M · ctx:8192 · kv:k16v16 · default  <- best | 57.6 | 58.0 | 2222.3 | 2235.3 | 17.36 | 13.59 | 97.3 | 97.6 | 98.5 | 98
Q5_K_M · ctx:16384 · kv:k16v16 · long | 57.5 | 58.5 | 2226.9 | 2252.5 | 17.40 | 13.59 | 97.6 | 97.1 | 98.5 | 98
Q4_K_M · ctx:8192 · kv:k16v16 · default | 59.2 | 59.6 | 2160.9 | 2190.8 | 16.88 | 14.33 | 100.0 | 100.0 | 93.4 | 97
Q4_K_M · ctx:16384 · kv:k16v16 · long | 58.9 | 59.2 | 2175.5 | 2195.0 | 17.00 | 14.33 | 99.4 | 99.6 | 93.4 | 97
Q8_0 · ctx:16384 · kv:k16v16 · long | 53.2 | 53.6 | 2403.0 | 2435.4 | 18.77 | 13.39 | 89.9 | 89.9 | 100.0 | 94
Q8_0 · ctx:8192 · kv:k16v16 · default | 52.8 | 53.7 | 2424.5 | 2444.0 | 18.95 | 13.39 | 89.6 | 89.4 | 100.0 | 94
Q3_K_M · ctx:8192 · kv:k16v16 · default | 55.8 | 56.3 | 2293.5 | 2333.0 | 17.91 | 15.77 | 94.4 | 94.1 | 84.9 | 91
Q3_K_M · ctx:16384 · kv:k16v16 · long | 54.9 | 55.5 | 2331.6 | 2359.9 | 18.21 | 15.77 | 92.9 | 92.8 | 84.9 | 90
Q5_K_M · ctx:16384 · kv:k8v8 · default | 47.6 | 48.7 | 2689.6 | 2810.9 | 21.01 | 13.59 | 81.1 | 79.1 | 98.5 | 88
Q4_K_M · ctx:16384 · kv:k8v8 · default | 49.0 | 49.4 | 2612.3 | 2699.8 | 20.41 | 14.33 | 82.8 | 81.9 | 93.4 | 87
Q5_K_M · ctx:32768 · kv:k8v8 · long | 47.2 | 47.9 | 2710.9 | 2758.2 | 21.17 | 13.59 | 80.0 | 79.6 | 98.5 | 87
Q4_K_M · ctx:32768 · kv:k8v8 · long | 48.0 | 49.6 | 2669.6 | 2687.5 | 20.86 | 14.33 | 82.2 | 81.2 | 93.4 | 86
Q8_0 · ctx:32768 · kv:k8v8 · long | 44.4 | 44.7 | 2883.8 | 2902.4 | 22.53 | 13.39 | 75.0 | 75.2 | 100.0 | 85
Q8_0 · ctx:16384 · kv:k8v8 · default | 44.0 | 45.3 | 2906.7 | 3069.0 | 22.71 | 13.39 | 75.2 | 72.9 | 100.0 | 85
Q3_K_M · ctx:16384 · kv:k8v8 · default | 46.1 | 46.8 | 2774.9 | 2838.8 | 21.68 | 15.77 | 78.2 | 77.5 | 84.9 | 81
Q3_K_M · ctx:32768 · kv:k8v8 · long | 45.6 | 46.2 | 2808.2 | 2855.5 | 21.94 | 15.77 | 77.3 | 76.8 | 84.9 | 80
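
The rows above are pipe-delimited, so they can be loaded and sanity-checked with a short script. The following is a minimal sketch, not part of sigilant-runner: the column names, `parse_rows`, and `check` are hypothetical helpers. It assumes that PPL% is the lowest PPL in the set divided by the row's PPL, and that ITL in milliseconds is roughly 1000 / TPS. Both assumptions agree with the printed values to within rounding, but the report does not state either formula, and the Score weighting is not reproduced here.

```python
# Sample rows copied verbatim from the report above.
ROWS = """\
Q5_K_M · ctx:8192 · kv:k16v16 · default  <- best | 57.6 | 58.0 | 2222.3 | 2235.3 | 17.36 | 13.59 | 97.3 | 97.6 | 98.5 | 98
Q4_K_M · ctx:8192 · kv:k16v16 · default | 59.2 | 59.6 | 2160.9 | 2190.8 | 16.88 | 14.33 | 100.0 | 100.0 | 93.4 | 97
Q8_0 · ctx:16384 · kv:k16v16 · long | 53.2 | 53.6 | 2403.0 | 2435.4 | 18.77 | 13.39 | 89.9 | 89.9 | 100.0 | 94
Q3_K_M · ctx:8192 · kv:k16v16 · default | 55.8 | 56.3 | 2293.5 | 2333.0 | 17.91 | 15.77 | 94.4 | 94.1 | 84.9 | 91
"""

# Hypothetical names for the ten numeric columns, in report order.
COLUMNS = ["tps", "tps_p95", "ttft", "ttft_p95", "itl",
           "ppl", "tps_pct", "ttft_pct", "ppl_pct", "score"]


def parse_rows(text):
    """Turn pipe-delimited report lines into a list of dicts."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if len(cells) != len(COLUMNS) + 1:
            continue  # not a table row
        try:
            values = [float(c) for c in cells[1:]]
        except ValueError:
            continue  # header line or other non-numeric row
        row = dict(zip(COLUMNS, values))
        # Drop the "<- best" marker that the report appends to the top config.
        row["config"] = cells[0].replace("<- best", "").strip()
        rows.append(row)
    return rows


def check(rows, pct_tol=0.05, itl_tol_ms=0.05):
    """Compare printed PPL% and ITL against the assumed formulas."""
    best_ppl = min(r["ppl"] for r in rows)  # lower perplexity is better
    results = []
    for r in rows:
        ppl_pct = 100.0 * best_ppl / r["ppl"]   # assumed: best / value
        itl_est = 1000.0 / r["tps"]             # assumed: ms per token
        results.append({
            "config": r["config"],
            "ppl_pct_matches": abs(ppl_pct - r["ppl_pct"]) <= pct_tol,
            "itl_matches": abs(itl_est - r["itl"]) <= itl_tol_ms,
        })
    return results


if __name__ == "__main__":
    for result in check(parse_rows(ROWS)):
        print(result)
```

The tolerances are there because the printed TPS and ITL are rounded independently, so the reciprocal only matches to a few hundredths of a millisecond.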

Best config: Q5_K_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=4.00 TPS Δ=4.80 TTFT Δ=-202.2ms PPL Δ=0.20
                     TPS p95 Δ=4.30 TTFT p95 Δ=-208.7ms
Confidence: target=medium gap_before=0.00% var_before=1.04% replay=False(disabled) gap_after=0.00%

PPL (perplexity; lower is better) is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
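
For readers unfamiliar with the metric, the sketch below shows the standard definition of perplexity: the exponential of the mean negative log-likelihood per token. It is only an illustration. The report does not say which evaluation text the runner used or how it computes PPL internally, and the function name `perplexity` is hypothetical.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from natural-log probabilities of each evaluated token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token is as uncertain
    # as a uniform choice among four options.
    print(perplexity([math.log(0.25)] * 4))
```

Because the value depends entirely on the evaluation text, it is useful for comparing quantizations of the same model on the same text, and much less so as an absolute quality measure.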
