sigilant-runner · Phi-3.5-mini-instruct-GGUF · A10G · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
IQ3_M · ctx:8192 · kv:k16v16 · default  <- best | 66.4 | 66.9 | 1927.4 | 1938.8 | 15.06 | 7.79 | 100.0 | 100.0 | 88.8 | 94
IQ3_M · ctx:16384 · kv:k16v16 · long | 65.2 | 65.3 | 1964.6 | 1999.9 | 15.35 | 7.79 | 97.9 | 97.5 | 88.8 | 93
Q4_K_M · ctx:8192 · kv:k16v16 · default | 58.5 | 58.6 | 2189.6 | 2197.0 | 17.11 | 7.10 | 87.8 | 88.1 | 97.5 | 93
IQ3_M · ctx:16384 · kv:k8v8 · default | 63.8 | 64.1 | 2005.0 | 2009.9 | 15.66 | 7.79 | 95.9 | 96.3 | 88.8 | 92
Q4_K_M · ctx:16384 · kv:k16v16 · long | 57.1 | 57.4 | 2240.3 | 2268.2 | 17.50 | 7.10 | 85.9 | 85.8 | 97.5 | 92
Q5_K_M · ctx:8192 · kv:k16v16 · default | 56.1 | 56.2 | 2281.3 | 2291.0 | 17.82 | 6.99 | 84.2 | 84.6 | 99.0 | 92
IQ3_M · ctx:32768 · kv:k8v8 · long | 61.8 | 61.9 | 2071.8 | 2073.6 | 16.19 | 7.79 | 92.8 | 93.3 | 88.8 | 91
Q4_K_M · ctx:16384 · kv:k8v8 · default | 56.0 | 56.3 | 2287.1 | 2300.7 | 17.87 | 7.10 | 84.2 | 84.3 | 97.5 | 91
Q5_K_M · ctx:16384 · kv:k16v16 · long | 54.9 | 55.0 | 2332.4 | 2359.3 | 18.22 | 6.99 | 82.4 | 82.4 | 99.0 | 91
Q4_K_M · ctx:32768 · kv:k8v8 · long | 54.5 | 54.7 | 2346.6 | 2352.5 | 18.33 | 7.10 | 81.9 | 82.3 | 97.5 | 90
Q5_K_M · ctx:16384 · kv:k8v8 · default | 53.9 | 54.0 | 2376.9 | 2388.7 | 18.57 | 6.99 | 80.9 | 81.1 | 99.0 | 90
Q5_K_M · ctx:32768 · kv:k8v8 · long | 52.4 | 52.6 | 2440.8 | 2454.4 | 19.07 | 6.99 | 78.8 | 79.0 | 99.0 | 89
Q8_0 · ctx:8192 · kv:k16v16 · default | 46.2 | 46.5 | 2767.9 | 2785.6 | 21.62 | 6.92 | 69.5 | 69.6 | 100.0 | 85
Q8_0 · ctx:16384 · kv:k16v16 · long | 45.5 | 45.6 | 2813.1 | 2820.4 | 21.98 | 6.92 | 68.3 | 68.6 | 100.0 | 84
Q8_0 · ctx:16384 · kv:k8v8 · default | 44.6 | 44.7 | 2872.0 | 2913.0 | 22.44 | 6.92 | 67.0 | 66.8 | 100.0 | 83
Q8_0 · ctx:32768 · kv:k8v8 · long | 43.8 | 43.9 | 2920.4 | 2923.2 | 22.82 | 6.92 | 65.8 | 66.2 | 100.0 | 83
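
The report does not state how the derived columns are computed. The sketch below is a reading inferred only from the printed numbers, not a documented sigilant-runner formula. It assumes ITL is roughly 1000 / TPS, that TPS% and TTFT% average the mean and p95 ratios against the best value in the table, that PPL% is the lowest PPL divided by the row's PPL, and that Score weights the three percentages 0.25 / 0.25 / 0.50. The weights and the function name `derive` are hypothetical; they happen to reproduce the rounded values in the table above.

```python
# Three rows copied from the table above:
# (config, TPS, TPS p95, TTFT ms, TTFT p95 ms, PPL)
ROWS = [
    ("IQ3_M ctx:8192", 66.4, 66.9, 1927.4, 1938.8, 7.79),
    ("Q4_K_M ctx:8192", 58.5, 58.6, 2189.6, 2197.0, 7.10),
    ("Q8_0 ctx:8192", 46.2, 46.5, 2767.9, 2785.6, 6.92),
]

# ASSUMPTION: weights inferred from the printed scores, not documented by the tool.
W_TPS, W_TTFT, W_PPL = 0.25, 0.25, 0.50


def derive(rows):
    """Recompute the derived columns under the assumptions stated above."""
    best_tps = max(r[1] for r in rows)       # higher throughput is better
    best_tps_p95 = max(r[2] for r in rows)
    best_ttft = min(r[3] for r in rows)      # lower latency is better
    best_ttft_p95 = min(r[4] for r in rows)
    best_ppl = min(r[5] for r in rows)       # lower perplexity is better

    out = {}
    for name, tps, tps_p95, ttft, ttft_p95, ppl in rows:
        # Average of the mean ratio and the p95 ratio, as a percentage.
        tps_pct = 50.0 * (tps / best_tps + tps_p95 / best_tps_p95)
        ttft_pct = 50.0 * (best_ttft / ttft + best_ttft_p95 / ttft_p95)
        ppl_pct = 100.0 * best_ppl / ppl
        out[name] = {
            "itl_ms": 1000.0 / tps,
            "tps_pct": tps_pct,
            "ttft_pct": ttft_pct,
            "ppl_pct": ppl_pct,
            "score": W_TPS * tps_pct + W_TTFT * ttft_pct + W_PPL * ppl_pct,
        }
    return out


if __name__ == "__main__":
    for name, cols in derive(ROWS).items():
        print(name, {k: round(v, 2) for k, v in cols.items()})
```

Under this reading, half of the score comes from PPL%, which would explain why Q4_K_M and Q5_K_M land within a point or two of IQ3_M despite being 12 to 16 percent slower.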

Best config: IQ3_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=9.00 TPS Δ=20.20 TTFT Δ=-840.5ms PPL Δ=0.87
                     TPS p95 Δ=20.40 TTFT p95 Δ=-846.8ms
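
The report does not name the baseline row. As a sanity check, the sketch below subtracts the `Q8_0 · ctx:8192 · kv:k16v16 · default` row from the best row. Treating that row as the baseline is an assumption; the printed deltas happen to match it exactly. The dictionary keys and the function name `deltas` are illustrative only.

```python
# Values copied from the table: best row, and the row ASSUMED to be the baseline.
best = {
    "score": 94, "tps": 66.4, "tps_p95": 66.9,
    "ttft_ms": 1927.4, "ttft_p95_ms": 1938.8, "ppl": 7.79,
}
assumed_baseline = {  # Q8_0 · ctx:8192 · kv:k16v16 · default (assumption)
    "score": 85, "tps": 46.2, "tps_p95": 46.5,
    "ttft_ms": 2767.9, "ttft_p95_ms": 2785.6, "ppl": 6.92,
}


def deltas(best_row, base_row):
    """Return best minus baseline for every metric, rounded to 2 decimals."""
    return {key: round(best_row[key] - base_row[key], 2) for key in best_row}


if __name__ == "__main__":
    for key, value in deltas(best, assumed_baseline).items():
        print(f"{key} delta = {value}")
```

Read this way, each delta is best minus baseline: a negative TTFT delta is an improvement, while the positive PPL delta of 0.87 means the best config has higher (worse) perplexity than the assumed baseline.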
Agent smoke: 4/5 (80.0%) [config_ready_for_smoke]
Confidence: target=medium gap_before=1.06% var_before=0.82% replay=False(disabled) gap_after=1.06%

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
