sigilant-runner · Phi-3.5-mini-instruct-GGUF · A10G · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
Q4_K_M · ctx:8192 · kv:k16v16 · default  <- best | 58.5 | 60.5 | 2190.4 | 2554.4 | 17.12 | 7.12 | 93.0 | 85.8 | 97.2 | 94
IQ3_M · ctx:8192 · kv:k16v16 · default | 63.7 | 64.3 | 2011.7 | 2039.9 | 15.72 | 7.79 | 100.0 | 100.0 | 88.8 | 94
Q4_K_M · ctx:16384 · kv:k8v8 · default | 55.0 | 56.1 | 2328.7 | 2400.5 | 18.19 | 7.10 | 86.8 | 85.7 | 97.5 | 92
Q4_K_M · ctx:16384 · kv:k16v16 · long | 54.4 | 55.7 | 2355.0 | 2391.8 | 18.40 | 7.10 | 86.0 | 85.4 | 97.5 | 92
IQ3_M · ctx:16384 · kv:k16v16 · long | 60.7 | 61.2 | 2109.8 | 2172.2 | 16.48 | 7.79 | 95.2 | 94.6 | 88.8 | 92
IQ3_M · ctx:16384 · kv:k8v8 · default | 59.7 | 61.2 | 2142.8 | 2184.9 | 16.74 | 7.79 | 94.4 | 93.6 | 88.8 | 91
Q5_K_M · ctx:8192 · kv:k16v16 · default | 52.4 | 53.2 | 2441.6 | 2517.5 | 19.08 | 6.99 | 82.5 | 81.7 | 99.0 | 91
IQ3_M · ctx:32768 · kv:k8v8 · long | 58.5 | 59.4 | 2187.6 | 2199.8 | 17.09 | 7.79 | 92.1 | 92.3 | 88.8 | 91
Q5_K_M · ctx:16384 · kv:k16v16 · long | 50.8 | 51.4 | 2519.5 | 2528.8 | 19.68 | 6.99 | 79.8 | 80.3 | 99.0 | 90
Q5_K_M · ctx:16384 · kv:k8v8 · default | 49.8 | 51.0 | 2571.5 | 2585.9 | 20.09 | 6.99 | 78.7 | 78.6 | 99.0 | 89
Q4_K_M · ctx:32768 · kv:k8v8 · long | 50.8 | 53.2 | 2520.0 | 2530.1 | 19.69 | 7.10 | 81.2 | 80.2 | 97.5 | 89
Q5_K_M · ctx:32768 · kv:k8v8 · long | 48.4 | 49.3 | 2645.1 | 2659.8 | 20.66 | 6.99 | 76.3 | 76.4 | 99.0 | 88
Q8_0 · ctx:8192 · kv:k16v16 · default | 42.8 | 43.1 | 2989.0 | 3045.3 | 23.35 | 6.92 | 67.1 | 67.1 | 100.0 | 84
Q8_0 · ctx:16384 · kv:k8v8 · default | 41.7 | 42.1 | 3070.0 | 3096.0 | 23.98 | 6.92 | 65.5 | 65.7 | 100.0 | 83
Q8_0 · ctx:16384 · kv:k16v16 · long | 42.0 | 42.6 | 3050.4 | 3053.0 | 23.83 | 6.92 | 66.1 | 66.4 | 100.0 | 83
Q8_0 · ctx:32768 · kv:k8v8 · long | 40.9 | 41.3 | 3133.1 | 3147.5 | 24.48 | 6.92 | 64.2 | 64.5 | 100.0 | 82
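Two derived columns can be cross-checked against the raw ones. The sketch below assumes PPL% is (lowest PPL / row PPL) × 100 and that ITL is roughly 1000 / TPS in milliseconds. Both relationships are inferred from the numbers in this table and are not documented by the runner. TPS% and TTFT% do not reduce to a simple best/value ratio on these rows, so they are left out. Config labels are shortened ASCII versions of the table's.

```python
# Rows copied from the table above: (config, TPS tok/s, ITL ms, PPL, reported PPL%).
ROWS = [
    ("Q4_K_M ctx:8192 k16v16 default", 58.5, 17.12, 7.12, 97.2),
    ("IQ3_M ctx:8192 k16v16 default", 63.7, 15.72, 7.79, 88.8),
    ("Q4_K_M ctx:16384 k8v8 default", 55.0, 18.19, 7.10, 97.5),
    ("Q4_K_M ctx:16384 k16v16 long", 54.4, 18.40, 7.10, 97.5),
    ("IQ3_M ctx:16384 k16v16 long", 60.7, 16.48, 7.79, 88.8),
    ("IQ3_M ctx:16384 k8v8 default", 59.7, 16.74, 7.79, 88.8),
    ("Q5_K_M ctx:8192 k16v16 default", 52.4, 19.08, 6.99, 99.0),
    ("IQ3_M ctx:32768 k8v8 long", 58.5, 17.09, 7.79, 88.8),
    ("Q5_K_M ctx:16384 k16v16 long", 50.8, 19.68, 6.99, 99.0),
    ("Q5_K_M ctx:16384 k8v8 default", 49.8, 20.09, 6.99, 99.0),
    ("Q4_K_M ctx:32768 k8v8 long", 50.8, 19.69, 7.10, 97.5),
    ("Q5_K_M ctx:32768 k8v8 long", 48.4, 20.66, 6.99, 99.0),
    ("Q8_0 ctx:8192 k16v16 default", 42.8, 23.35, 6.92, 100.0),
    ("Q8_0 ctx:16384 k8v8 default", 41.7, 23.98, 6.92, 100.0),
    ("Q8_0 ctx:16384 k16v16 long", 42.0, 23.83, 6.92, 100.0),
    ("Q8_0 ctx:32768 k8v8 long", 40.9, 24.48, 6.92, 100.0),
]

# Lower perplexity is better, so the best row scores 100 and others fall below it.
BEST_PPL = min(row[3] for row in ROWS)


def ppl_pct(ppl, best_ppl):
    """Assumed PPL% formula: best (lowest) PPL divided by this row's PPL, as a percent."""
    return round(best_ppl / ppl * 100, 1)


def check_row(config, tps, itl_ms, ppl, reported_pct):
    """Return (does PPL% match the assumed formula, |ITL - 1000/TPS| in ms)."""
    pct_ok = ppl_pct(ppl, BEST_PPL) == reported_pct
    itl_gap = abs(itl_ms - 1000.0 / tps)  # ms per token implied by the TPS column
    return pct_ok, itl_gap


for row in ROWS:
    pct_ok, itl_gap = check_row(*row)
    print(f"{row[0]:<32} PPL% match={pct_ok}  |ITL - 1000/TPS|={itl_gap:.3f} ms")
```

On these rows every PPL% matches the assumed formula, and ITL stays within a few hundredths of a millisecond of 1000 / TPS. That is consistent with ITL being measured on the same decode phase as TPS.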

Best config: Q4_K_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=10.00 TPS Δ=15.70 TTFT Δ=-798.6ms PPL Δ=0.20
                     TPS p95 Δ=17.40 TTFT p95 Δ=-490.9ms
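The baseline is not named on the compare lines. However, every delta matches subtracting the Q8_0 · ctx:8192 · kv:k16v16 · default row from the best row, so that row is assumed to be the baseline. This assumption is inferred from the arithmetic and is not stated by the runner. A short Python sketch reproduces the deltas:

```python
# Values copied from the table: best row (Q4_K_M ctx:8192) and the assumed
# baseline row (Q8_0 ctx:8192). The baseline identity is an inference.
best = {"score": 94, "tps": 58.5, "tps_p95": 60.5,
        "ttft_ms": 2190.4, "ttft_p95_ms": 2554.4, "ppl": 7.12}
baseline = {"score": 84, "tps": 42.8, "tps_p95": 43.1,
            "ttft_ms": 2989.0, "ttft_p95_ms": 3045.3, "ppl": 6.92}

# Delta = best - baseline; round to hide floating-point noise such as 15.700000000000003.
deltas = {key: round(best[key] - baseline[key], 2) for key in best}

for key, value in deltas.items():
    print(f"{key:<12} Δ={value}")
```

Under this sign convention, a negative TTFT Δ means the best config returns its first token sooner than the baseline. A positive PPL Δ means slightly higher perplexity, i.e. a small quality cost relative to Q8_0.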
Agent smoke: 4/5 (80.0%) [config_ready_for_smoke]
Confidence: target=medium gap_before=1.06% var_before=2.16% replay=True(applied) gap_after=0.00%

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
