sigilant-runner · Qwen2.5-1.5B-Instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
Q5_K_M · ctx:8192 · kv:k16v16 · default  <- best | 58.6 | 58.8 | 2184.8 | 2204.9 | 17.07 | 13.59 | 97.6 | 97.7 | 98.5 | 98
Q4_K_M · ctx:8192 · kv:k16v16 · default | 59.9 | 60.4 | 2136.6 | 2152.8 | 16.69 | 14.33 | 100.0 | 100.0 | 93.4 | 97
Q5_K_M · ctx:16384 · kv:k16v16 · long | 58.0 | 58.1 | 2207.1 | 2226.8 | 17.24 | 13.59 | 96.5 | 96.7 | 98.5 | 97
Q4_K_M · ctx:16384 · kv:k16v16 · long | 59.6 | 60.0 | 2147.0 | 2156.4 | 16.77 | 14.33 | 99.4 | 99.7 | 93.4 | 97
Q8_0 · ctx:8192 · kv:k16v16 · default | 54.1 | 54.4 | 2364.0 | 2406.9 | 18.47 | 13.39 | 90.2 | 89.9 | 100.0 | 94
Q8_0 · ctx:16384 · kv:k16v16 · long | 54.1 | 54.2 | 2366.5 | 2384.7 | 18.49 | 13.39 | 90.0 | 90.3 | 100.0 | 94
Q3_K_M · ctx:8192 · kv:k16v16 · default | 56.3 | 56.4 | 2272.5 | 2304.1 | 17.75 | 15.77 | 93.7 | 93.7 | 84.9 | 90
Q3_K_M · ctx:16384 · kv:k16v16 · long | 55.8 | 56.0 | 2293.6 | 2313.7 | 17.92 | 15.77 | 92.9 | 93.1 | 84.9 | 90
Q5_K_M · ctx:16384 · kv:k8v8 · default | 47.7 | 47.8 | 2684.8 | 2697.6 | 20.97 | 13.59 | 79.4 | 79.7 | 98.5 | 87
Q4_K_M · ctx:16384 · kv:k8v8 · default | 48.9 | 49.1 | 2615.1 | 2625.2 | 20.43 | 14.33 | 81.5 | 81.9 | 93.4 | 86
Q5_K_M · ctx:32768 · kv:k8v8 · long | 47.0 | 47.3 | 2722.8 | 2758.5 | 21.27 | 13.59 | 78.4 | 78.3 | 98.5 | 86
Q4_K_M · ctx:32768 · kv:k8v8 · long | 48.6 | 48.8 | 2633.7 | 2654.3 | 20.58 | 14.33 | 81.0 | 81.1 | 93.4 | 86
Q8_0 · ctx:16384 · kv:k8v8 · default | 45.3 | 45.7 | 2826.4 | 2887.8 | 22.08 | 13.39 | 75.6 | 75.1 | 100.0 | 85
Q8_0 · ctx:32768 · kv:k8v8 · long | 44.5 | 44.8 | 2873.3 | 2893.0 | 22.45 | 13.39 | 74.2 | 74.4 | 100.0 | 85
Q3_K_M · ctx:16384 · kv:k8v8 · default | 46.4 | 46.6 | 2760.7 | 2785.5 | 21.57 | 15.77 | 77.3 | 77.3 | 84.9 | 80
Q3_K_M · ctx:32768 · kv:k8v8 · long | 45.9 | 46.2 | 2787.0 | 2806.4 | 21.77 | 15.77 | 76.6 | 76.7 | 84.9 | 80
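
Note on the Score column: the report does not state how Score is derived. The sketch below is a reader-side consistency check, not the sigilant-runner implementation. It assumes Score is a weighted sum of TPS%, TTFT% and PPL% with weights 0.3 / 0.3 / 0.4; these weights are an assumption inferred from the rows above, and the names WEIGHTS, ppl_pct and composite_score are hypothetical.

```python
# Assumed weights: inferred by fitting the table, not documented sigilant-runner behaviour.
WEIGHTS = {"tps": 0.3, "ttft": 0.3, "ppl": 0.4}


def ppl_pct(ppl, best_ppl):
    """Lower perplexity is better, so normalise as best / value (in percent)."""
    return round(100.0 * best_ppl / ppl, 1)


def composite_score(tps_pct, ttft_pct, ppl_pct_value):
    """Weighted sum of the three normalised columns, rounded to an integer."""
    total = (WEIGHTS["tps"] * tps_pct
             + WEIGHTS["ttft"] * ttft_pct
             + WEIGHTS["ppl"] * ppl_pct_value)
    return round(total)


# (TPS%, TTFT%, PPL%, Score) copied from the 16 table rows, in order.
ROWS = [
    (97.6, 97.7, 98.5, 98), (100.0, 100.0, 93.4, 97),
    (96.5, 96.7, 98.5, 97), (99.4, 99.7, 93.4, 97),
    (90.2, 89.9, 100.0, 94), (90.0, 90.3, 100.0, 94),
    (93.7, 93.7, 84.9, 90), (92.9, 93.1, 84.9, 90),
    (79.4, 79.7, 98.5, 87), (81.5, 81.9, 93.4, 86),
    (78.4, 78.3, 98.5, 86), (81.0, 81.1, 93.4, 86),
    (75.6, 75.1, 100.0, 85), (74.2, 74.4, 100.0, 85),
    (77.3, 77.3, 84.9, 80), (76.6, 76.7, 84.9, 80),
]

# Rows whose reported Score differs from the assumed weighted sum.
mismatches = [row for row in ROWS if composite_score(*row[:3]) != row[3]]
print("rows that disagree with the assumed weights:", len(mismatches))
```

Under these assumed weights, quality (PPL%) counts slightly more than either speed metric, which would explain why Q5_K_M ranks above the faster Q4_K_M. The PPL% column itself is consistent with best PPL divided by row PPL, for example ppl_pct(14.33, 13.39) for the Q4_K_M rows. Other weightings may fit the same table.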

Best config: Q5_K_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=4.0 TPS Δ=4.5 TTFT Δ=-179.2ms PPL Δ=0.2
Agent smoke: 1/5 (0.2) [model_limited]

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
