sigilant-runner · Phi-3.5-mini-instruct-GGUF · A10 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
IQ3_M · ctx:8192 · kv:k16v16 · default  <- best | 68.2 | 69.1 | 1874.6 | 1902.2 | 14.64 | 7.79 | 100.0 | 100.0 | 89.1 | 96
IQ3_M · ctx:16384 · kv:k16v16 · long | 67.0 | 67.6 | 1911.6 | 1955.7 | 14.93 | 7.79 | 98.0 | 97.7 | 89.1 | 94
IQ3_M · ctx:16384 · kv:k8v8 · default | 65.2 | 66.9 | 1965.2 | 1980.6 | 15.36 | 7.79 | 96.2 | 95.7 | 89.1 | 93
Q4_K_M · ctx:8192 · kv:k16v16 · default | 61.6 | 63.0 | 2076.5 | 2187.3 | 16.22 | 7.12 | 90.7 | 88.6 | 97.5 | 93
Q4_K_M · ctx:16384 · kv:k8v8 · default | 59.7 | 60.8 | 2144.2 | 2178.9 | 16.75 | 7.12 | 87.8 | 87.4 | 97.5 | 92
IQ3_M · ctx:32768 · kv:k8v8 · long | 63.5 | 65.5 | 2014.0 | 2180.5 | 15.73 | 7.79 | 93.9 | 90.2 | 89.1 | 91
Q4_K_M · ctx:16384 · kv:k16v16 · long | 60.8 | 62.0 | 2104.9 | 2509.8 | 16.45 | 7.12 | 89.4 | 82.4 | 97.5 | 91
Q4_K_M · ctx:32768 · kv:k8v8 · long | 58.9 | 59.3 | 2172.2 | 2402.1 | 16.98 | 7.12 | 86.1 | 82.7 | 97.5 | 90
Q5_K_M · ctx:8192 · kv:k16v16 · default | 56.0 | 56.8 | 2288.5 | 2309.8 | 17.88 | 7.01 | 82.2 | 82.1 | 99.0 | 89
Q5_K_M · ctx:16384 · kv:k16v16 · long | 55.1 | 56.0 | 2323.1 | 2439.0 | 18.15 | 7.01 | 80.9 | 79.3 | 99.0 | 88
Q5_K_M · ctx:16384 · kv:k8v8 · default | 54.0 | 55.3 | 2368.6 | 2654.0 | 18.51 | 7.01 | 79.6 | 75.4 | 99.0 | 87
Q5_K_M · ctx:32768 · kv:k8v8 · long | 53.5 | 54.1 | 2389.9 | 2809.9 | 18.67 | 7.01 | 78.4 | 73.1 | 99.0 | 86
Q8_0 · ctx:8192 · kv:k16v16 · default | 47.0 | 47.9 | 2722.8 | 2854.9 | 21.27 | 6.94 | 69.1 | 67.7 | 100.0 | 81
Q8_0 · ctx:16384 · kv:k16v16 · long | 45.9 | 47.4 | 2788.1 | 2866.4 | 21.78 | 6.94 | 67.9 | 66.8 | 100.0 | 81
Q8_0 · ctx:16384 · kv:k8v8 · default | 45.0 | 46.2 | 2839.3 | 2903.9 | 22.19 | 6.94 | 66.4 | 65.8 | 100.0 | 80
Q8_0 · ctx:32768 · kv:k8v8 · long | 44.8 | 46.0 | 2863.2 | 3291.9 | 22.37 | 6.94 | 66.1 | 61.6 | 100.0 | 79
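
The report does not define the TPS%, TTFT%, and PPL% columns. A minimal sketch, assuming each one averages the mean ratio and the p95 ratio against the best value in its column (higher is better for TPS, lower is better for TTFT and PPL), agrees with the table rows to within rounding. The helper names are hypothetical and not part of sigilant-runner. The weighting behind the Score column is not stated and is not reconstructed here.

```python
# Hypothetical helpers for illustration; not part of sigilant-runner.
# Assumption: each relative column is scored against the best value in the table.
BEST_TPS, BEST_TPS_P95 = 68.2, 69.1        # highest throughput (IQ3_M, ctx:8192 row)
BEST_TTFT, BEST_TTFT_P95 = 1874.6, 1902.2  # lowest time to first token, same row
BEST_PPL = 6.94                            # lowest perplexity (Q8_0 rows)


def tps_pct(tps, tps_p95):
    """Higher is better: average of value/best for the mean and the p95."""
    return 100 * (tps / BEST_TPS + tps_p95 / BEST_TPS_P95) / 2


def ttft_pct(ttft, ttft_p95):
    """Lower is better: average of best/value for the mean and the p95."""
    return 100 * (BEST_TTFT / ttft + BEST_TTFT_P95 / ttft_p95) / 2


def ppl_pct(ppl):
    """Lower is better: best/value (the table has no p95 for perplexity)."""
    return 100 * BEST_PPL / ppl


# (TPS, TPS p95, TTFT, TTFT p95, PPL) copied from three table rows.
rows = {
    "IQ3_M · ctx:16384 · kv:k16v16 · long": (67.0, 67.6, 1911.6, 1955.7, 7.79),
    "Q4_K_M · ctx:8192 · kv:k16v16 · default": (61.6, 63.0, 2076.5, 2187.3, 7.12),
    "Q8_0 · ctx:32768 · kv:k8v8 · long": (44.8, 46.0, 2863.2, 3291.9, 6.94),
}

for name, (tps, tps_p95, ttft, ttft_p95, ppl) in rows.items():
    print(
        f"{name}: TPS%={tps_pct(tps, tps_p95):.1f} "
        f"TTFT%={ttft_pct(ttft, ttft_p95):.1f} PPL%={ppl_pct(ppl):.1f}"
    )
```

Read this way, a large gap between a row's mean and p95 TTFT lowers its TTFT% even when the mean is close to the best, which is why the long-context rows lose more on TTFT% than on TPS%.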

Best config: IQ3_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=15.00 TPS Δ=21.20 TTFT Δ=-848.2ms PPL Δ=0.85
                     TPS p95 Δ=21.20 TTFT p95 Δ=-952.7ms
Agent smoke: 3/5 (60.0%) [mixed]
Confidence: target=medium gap_before=0.00% var_before=2.78% replay=True(applied) gap_after=2.08%
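
The report does not name the baseline used in the "Auto baseline compare" lines. A minimal sketch, assuming the baseline is the Q8_0 · ctx:8192 · kv:k16v16 · default row and that each Δ is best minus baseline, matches all six reported deltas. The dictionary keys are hypothetical names chosen for this example.

```python
# Values copied from the table; the choice of baseline row is an assumption.
best = {  # IQ3_M · ctx:8192 · kv:k16v16 · default
    "score": 96, "tps": 68.2, "tps_p95": 69.1,
    "ttft_ms": 1874.6, "ttft_p95_ms": 1902.2, "ppl": 7.79,
}
baseline = {  # Q8_0 · ctx:8192 · kv:k16v16 · default (assumed baseline)
    "score": 81, "tps": 47.0, "tps_p95": 47.9,
    "ttft_ms": 2722.8, "ttft_p95_ms": 2854.9, "ppl": 6.94,
}

# Delta = best - baseline, rounded to hide floating-point noise.
deltas = {key: round(best[key] - baseline[key], 2) for key in best}

for key, value in deltas.items():
    print(f"{key} delta = {value}")
```

Under this reading, the negative TTFT deltas mean the best config reaches the first token sooner than the baseline, and the positive PPL delta of 0.85 is the perplexity given up in exchange for that speed.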

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
