sigilant-runner · Phi-3.5-mini-instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
IQ3_M · ctx:8192 · kv:k16v16 · default  <- best | 70.6 | 71.5 | 1814.0 | 1848.8 | 14.17 | 7.79 | 100.0 | 100.0 | 89.1 | 96
Q4_K_M · ctx:8192 · kv:k16v16 · default | 65.2 | 65.8 | 1963.0 | 2024.7 | 15.34 | 7.12 | 92.2 | 91.9 | 97.5 | 94
IQ3_M · ctx:16384 · kv:k16v16 · long | 68.3 | 69.3 | 1874.5 | 1897.5 | 14.64 | 7.79 | 96.8 | 97.1 | 89.1 | 94
IQ3_M · ctx:16384 · kv:k8v8 · default | 67.8 | 68.9 | 1887.6 | 1910.4 | 14.75 | 7.79 | 96.2 | 96.4 | 89.1 | 93
Q4_K_M · ctx:16384 · kv:k16v16 · long | 63.5 | 64.3 | 2014.3 | 2075.6 | 15.74 | 7.12 | 89.9 | 89.6 | 97.5 | 93
Q4_K_M · ctx:16384 · kv:k8v8 · default | 62.5 | 63.0 | 2048.1 | 2157.5 | 16.00 | 7.12 | 88.3 | 87.1 | 97.5 | 92
IQ3_M · ctx:32768 · kv:k8v8 · long | 65.9 | 66.7 | 1941.1 | 1979.3 | 15.16 | 7.79 | 93.3 | 93.4 | 89.1 | 92
Q4_K_M · ctx:32768 · kv:k8v8 · long | 61.2 | 61.6 | 2090.9 | 2217.9 | 16.33 | 7.12 | 86.4 | 85.1 | 97.5 | 91
Q5_K_M · ctx:8192 · kv:k16v16 · default | 59.1 | 59.9 | 2165.9 | 2244.7 | 16.92 | 7.01 | 83.7 | 83.1 | 99.0 | 90
Q5_K_M · ctx:16384 · kv:k16v16 · long | 58.4 | 59.0 | 2193.0 | 2271.2 | 17.13 | 7.01 | 82.6 | 82.1 | 99.0 | 89
Q5_K_M · ctx:16384 · kv:k8v8 · default | 57.2 | 58.1 | 2236.3 | 2327.4 | 17.47 | 7.01 | 81.1 | 80.3 | 99.0 | 88
Q5_K_M · ctx:32768 · kv:k8v8 · long | 56.0 | 56.5 | 2286.3 | 2390.6 | 17.86 | 7.01 | 79.2 | 78.3 | 99.0 | 87
Q8_0 · ctx:8192 · kv:k16v16 · default | 48.1 | 48.5 | 2659.5 | 2762.6 | 20.78 | 6.94 | 68.0 | 67.6 | 100.0 | 81
Q8_0 · ctx:16384 · kv:k16v16 · long | 47.4 | 47.7 | 2702.5 | 2770.9 | 21.11 | 6.94 | 66.9 | 66.9 | 100.0 | 80
Q8_0 · ctx:16384 · kv:k8v8 · default | 46.5 | 46.6 | 2751.4 | 2822.7 | 21.50 | 6.94 | 65.5 | 65.7 | 100.0 | 79
Q8_0 · ctx:32768 · kv:k8v8 · long | 45.8 | 46.0 | 2797.2 | 2846.3 | 21.85 | 6.94 | 64.6 | 64.9 | 100.0 | 79
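
The derived columns in the table above can be sanity-checked from the raw ones. The following is a minimal sketch, not the runner's actual code: it assumes PPL% is the lowest perplexity divided by the row's perplexity, and that ITL is the per-token interval in milliseconds (1000 / TPS). The function names `ppl_pct` and `itl_ms` are hypothetical. Under these assumptions PPL% reproduces exactly for the four quantizations, and ITL agrees to within about 0.01, presumably because the runner works from unrounded measurements.

```python
# Values copied from the ctx:8192 rows of the table; helper names are hypothetical.
rows = {
    "IQ3_M": {"tps": 70.6, "ppl": 7.79},
    "Q4_K_M": {"tps": 65.2, "ppl": 7.12},
    "Q5_K_M": {"tps": 59.1, "ppl": 7.01},
    "Q8_0": {"tps": 48.1, "ppl": 6.94},
}


def ppl_pct(ppl, best_ppl):
    """Assumed definition: best (lowest) perplexity relative to this row's, in percent."""
    return round(100.0 * best_ppl / ppl, 1)


def itl_ms(tps):
    """Assumed definition: inter-token latency in ms as the reciprocal of tokens/second."""
    return round(1000.0 / tps, 2)


best_ppl = min(r["ppl"] for r in rows.values())
for name, r in rows.items():
    print(f"{name}: PPL%={ppl_pct(r['ppl'], best_ppl)} ITL~{itl_ms(r['tps'])} ms")
```

The TPS% and TTFT% columns are left out on purpose: recomputing them from the rounded table values gives slightly different numbers, so their exact definition cannot be confirmed from this report alone.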

Best config: IQ3_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=15.00 TPS Δ=22.50 TTFT Δ=-845.5ms PPL Δ=0.85
                     TPS p95 Δ=23.00 TTFT p95 Δ=-913.8ms
Agent smoke: 3/5 (60.0%) [mixed]
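
The report does not name the baseline used in the "Auto baseline compare" lines. As a hedged check, the sketch below subtracts the `Q8_0 · ctx:8192 · kv:k16v16 · default` row from the best config. Treating that row as the baseline is an assumption, supported only by the fact that all six reported deltas match it; the dictionary keys are hypothetical labels.

```python
# Best config (IQ3_M · ctx:8192 · kv:k16v16 · default), values copied from the table.
best = {"score": 96, "tps": 70.6, "tps_p95": 71.5,
        "ttft": 1814.0, "ttft_p95": 1848.8, "ppl": 7.79}

# Assumed baseline (Q8_0 · ctx:8192 · kv:k16v16 · default), values copied from the table.
base = {"score": 81, "tps": 48.1, "tps_p95": 48.5,
        "ttft": 2659.5, "ttft_p95": 2762.6, "ppl": 6.94}

# Delta = best minus baseline; rounded to suppress floating-point noise.
deltas = {key: round(best[key] - base[key], 2) for key in best}

for key, value in deltas.items():
    print(f"{key} delta = {value}")
```

Read this way, the best config is faster and reaches first token sooner than the assumed baseline, at the cost of a perplexity 0.85 higher (worse).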

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
