sigilant-runner · Qwen2.5-1.5B-Instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
Q4_K_M · ctx:8192 · kv:k16v16 · default  <- best | 49.2 | 51.1 | 2601.3 | 2954.1 | 20.32 | 14.33 | 100.0 | 99.1 | 93.4 | 97
Q5_K_M · ctx:8192 · kv:k16v16 · default | 46.6 | 49.7 | 2754.2 | 2899.6 | 21.52 | 13.59 | 96.0 | 97.2 | 98.5 | 97
Q8_0 · ctx:8192 · kv:k16v16 · default | 44.6 | 46.8 | 2868.4 | 3290.9 | 22.41 | 13.39 | 91.1 | 89.4 | 100.0 | 94
Q5_K_M · ctx:16384 · kv:k16v16 · long | 40.8 | 47.5 | 3143.7 | 3264.1 | 24.56 | 13.59 | 87.9 | 85.8 | 98.5 | 92
Q4_K_M · ctx:16384 · kv:k16v16 · long | 43.4 | 44.8 | 2950.8 | 3119.4 | 23.05 | 14.33 | 87.9 | 90.6 | 93.4 | 91
Q3_K_M · ctx:16384 · kv:k16v16 · long | 46.2 | 46.9 | 2768.0 | 3411.4 | 21.62 | 15.77 | 92.8 | 89.5 | 84.9 | 89
Q8_0 · ctx:16384 · kv:k16v16 · long | 38.5 | 46.6 | 3341.1 | 4038.7 | 26.11 | 13.39 | 84.7 | 74.8 | 100.0 | 89
Q3_K_M · ctx:8192 · kv:k16v16 · default | 43.8 | 46.8 | 2926.2 | 3712.7 | 22.86 | 15.77 | 90.3 | 83.5 | 84.9 | 87
Q5_K_M · ctx:32768 · kv:k8v8 · long | 38.8 | 40.7 | 3306.8 | 3762.0 | 25.84 | 13.59 | 79.3 | 77.9 | 98.5 | 87
Q5_K_M · ctx:16384 · kv:k8v8 · default | 37.0 | 40.2 | 3453.9 | 3541.3 | 26.98 | 13.59 | 76.9 | 78.6 | 98.5 | 86
Q4_K_M · ctx:32768 · kv:k8v8 · long | 39.3 | 40.6 | 3257.3 | 4040.8 | 25.45 | 14.33 | 79.7 | 75.8 | 93.4 | 84
Q8_0 · ctx:32768 · kv:k8v8 · long | 35.6 | 36.7 | 3590.8 | 3668.6 | 28.05 | 13.39 | 72.1 | 75.7 | 100.0 | 84
Q8_0 · ctx:16384 · kv:k8v8 · default | 34.0 | 37.9 | 3787.8 | 4439.4 | 29.59 | 13.39 | 71.6 | 67.0 | 100.0 | 82
Q3_K_M · ctx:16384 · kv:k8v8 · default | 36.2 | 39.6 | 3534.5 | 3813.8 | 27.62 | 15.77 | 75.5 | 74.8 | 84.9 | 79
Q4_K_M · ctx:16384 · kv:k8v8 · default | 33.0 | 36.4 | 3885.5 | 4015.4 | 30.36 | 14.33 | 69.2 | 69.6 | 93.4 | 79
Q3_K_M · ctx:32768 · kv:k8v8 · long | 33.8 | 40.1 | 3834.4 | 6451.3 | 29.95 | 15.77 | 73.6 | 56.4 | 84.9 | 75
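
Note on PPL%: the values in the table are consistent with the lowest perplexity in the sweep (13.39, Q8_0) divided by each config's perplexity, times 100. The report does not state this rule, so the sketch below is an inference from the numbers, not the runner's documented formula. It recomputes PPL% for four rows copied from the table; the helper names `parse` and `ppl_percent` are hypothetical.

```python
# Four rows copied from the table above, trimmed to Config | PPL | PPL%.
ROWS = """\
Q4_K_M · ctx:8192 · kv:k16v16 · default | 14.33 | 93.4
Q5_K_M · ctx:8192 · kv:k16v16 · default | 13.59 | 98.5
Q8_0 · ctx:8192 · kv:k16v16 · default | 13.39 | 100.0
Q3_K_M · ctx:8192 · kv:k16v16 · default | 15.77 | 84.9
"""


def parse(text):
    """Split each pipe-delimited row into (config, ppl, reported_pct)."""
    out = []
    for line in text.strip().splitlines():
        config, ppl, pct = [cell.strip() for cell in line.split("|")]
        out.append((config, float(ppl), float(pct)))
    return out


def ppl_percent(ppl, best_ppl):
    """Assumed rule: lower perplexity is better; the best config scores 100."""
    return round(100.0 * best_ppl / ppl, 1)


rows = parse(ROWS)
best = min(ppl for _, ppl, _ in rows)
recomputed = [(cfg, ppl_percent(ppl, best), pct) for cfg, ppl, pct in rows]

for cfg, got, reported in recomputed:
    print(f"{cfg}: recomputed {got}, reported {reported}")
```

The same check does not carry over to TPS%, TTFT%, or Score: those columns do not reduce to a simple ratio of the mean values shown, so their formulas are left unstated here.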

Best config: Q4_K_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=3.00 TPS Δ=2.00 TTFT Δ=-114.2ms PPL Δ=0.20
                     TPS p95 Δ=2.90 TTFT p95 Δ=-391.3ms
Confidence: target=medium gap_before=0.00% var_before=7.34% replay=False(disabled) gap_after=0.00%

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
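
For context on the PPL column: perplexity is conventionally defined as the exponential of the mean negative log-likelihood per token, so lower is better and a value of N roughly means the model is as uncertain as a uniform choice among N tokens. The sketch below shows that standard definition only. The report does not say which evaluation text or chunking sigilant-runner uses, and the log-probabilities here are made-up illustration values, not data from this run.

```python
import math


def perplexity(token_logprobs):
    """Return exp(-mean(log p)) for a list of natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Hypothetical input: every token predicted with probability 0.25,
# which should give a perplexity of approximately 4.
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))
```

Because this is an average over next-token predictions on a fixed text, two quantizations with close PPL can still differ on task-specific behavior, which is why the note above treats it as a proxy.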
