sigilant-runner · Phi-3.5-mini-instruct-GGUF · A10 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
IQ3_M · ctx:16384 · kv:k8v8 · default  <- best | 69.1 | 70.3 | 1853.2 | 1903.1 | 14.48 | 7.79 | 100.0 | 100.0 | 89.1 | 96
Q4_K_M · ctx:8192 · kv:k16v16 · default | 64.3 | 66.8 | 1989.8 | 2052.8 | 15.54 | 7.12 | 94.0 | 92.9 | 97.5 | 95
IQ3_M · ctx:8192 · kv:k16v16 · default | 68.2 | 69.2 | 1876.5 | 1931.3 | 14.66 | 7.79 | 98.6 | 98.6 | 89.1 | 95
Q4_K_M · ctx:16384 · kv:k16v16 · long | 64.3 | 65.5 | 1989.7 | 2041.3 | 15.54 | 7.12 | 93.1 | 93.2 | 97.5 | 95
IQ3_M · ctx:32768 · kv:k8v8 · long | 67.8 | 69.1 | 1885.8 | 1952.5 | 14.73 | 7.79 | 98.2 | 97.9 | 89.1 | 94
Q4_K_M · ctx:16384 · kv:k8v8 · default | 62.4 | 63.9 | 2052.4 | 2079.1 | 16.04 | 7.12 | 90.6 | 90.9 | 97.5 | 93
IQ3_M · ctx:16384 · kv:k16v16 · long | 67.0 | 68.2 | 1911.9 | 2064.0 | 14.94 | 7.79 | 97.0 | 94.6 | 89.1 | 93
Q5_K_M · ctx:8192 · kv:k16v16 · default | 60.6 | 61.9 | 2110.9 | 2190.7 | 16.49 | 7.01 | 87.9 | 87.3 | 99.0 | 92
Q4_K_M · ctx:32768 · kv:k8v8 · long | 61.0 | 62.7 | 2099.6 | 2152.7 | 16.40 | 7.12 | 88.7 | 88.3 | 97.5 | 92
Q5_K_M · ctx:16384 · kv:k16v16 · long | 58.8 | 60.7 | 2177.2 | 2206.4 | 17.01 | 7.01 | 85.7 | 85.7 | 99.0 | 91
Q5_K_M · ctx:16384 · kv:k8v8 · default | 58.0 | 59.2 | 2204.8 | 2240.4 | 17.23 | 7.01 | 84.1 | 84.5 | 99.0 | 90
Q5_K_M · ctx:32768 · kv:k8v8 · long | 56.8 | 58.3 | 2255.7 | 2297.2 | 17.62 | 7.01 | 82.6 | 82.5 | 99.0 | 89
Q8_0 · ctx:8192 · kv:k16v16 · default | 49.0 | 50.4 | 2610.6 | 2636.7 | 20.39 | 6.94 | 71.3 | 71.6 | 100.0 | 83
Q8_0 · ctx:16384 · kv:k16v16 · long | 49.1 | 50.3 | 2604.5 | 2659.6 | 20.35 | 6.94 | 71.3 | 71.4 | 100.0 | 83
Q8_0 · ctx:16384 · kv:k8v8 · default | 48.2 | 48.5 | 2657.6 | 2716.5 | 20.77 | 6.94 | 69.4 | 69.9 | 100.0 | 82
Q8_0 · ctx:32768 · kv:k8v8 · long | 47.2 | 47.9 | 2708.6 | 2770.1 | 21.16 | 6.94 | 68.2 | 68.6 | 100.0 | 81
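
The report does not say how Score is derived from the normalized columns. The sketch below is one reading that happens to fit the table: a weighted blend of TPS%, TTFT%, and PPL% with assumed weights 0.3 / 0.3 / 0.4, and PPL% taken as the lowest perplexity in the run (6.94) divided by the row's perplexity. The weights, the PPL% formula, and the helper names `composite_score` and `ppl_pct` are inferred from the numbers, not documented sigilant-runner behavior.

```python
# Assumed weights, inferred from the table; not documented by the tool.
W_TPS, W_TTFT, W_PPL = 0.3, 0.3, 0.4
BEST_PPL = 6.94  # lowest perplexity in the table (Q8_0 rows)


def ppl_pct(ppl: float) -> float:
    """Assumed normalization: best (lowest) PPL over this row's PPL, in percent."""
    return round(100.0 * BEST_PPL / ppl, 1)


def composite_score(tps_pct: float, ttft_pct: float, ppl_pct_value: float) -> int:
    """Blend the three normalized columns into one rounded score."""
    return round(W_TPS * tps_pct + W_TTFT * ttft_pct + W_PPL * ppl_pct_value)


# (TPS%, TTFT%, PPL%, reported Score), copied row by row from the table.
ROWS = [
    (100.0, 100.0, 89.1, 96),
    (94.0, 92.9, 97.5, 95),
    (98.6, 98.6, 89.1, 95),
    (93.1, 93.2, 97.5, 95),
    (98.2, 97.9, 89.1, 94),
    (90.6, 90.9, 97.5, 93),
    (97.0, 94.6, 89.1, 93),
    (87.9, 87.3, 99.0, 92),
    (88.7, 88.3, 97.5, 92),
    (85.7, 85.7, 99.0, 91),
    (84.1, 84.5, 99.0, 90),
    (82.6, 82.5, 99.0, 89),
    (71.3, 71.6, 100.0, 83),
    (71.3, 71.4, 100.0, 83),
    (69.4, 69.9, 100.0, 82),
    (68.2, 68.6, 100.0, 81),
]

# Rows where the assumed formula disagrees with the reported Score.
mismatches = [row for row in ROWS if composite_score(*row[:3]) != row[3]]
print(f"{len(ROWS) - len(mismatches)}/{len(ROWS)} rows reproduced")
```

Under these assumed weights the blend reproduces the reported Score for every row, which is why IQ3_M leads despite its lower PPL%: its throughput and latency advantage outweighs the perplexity penalty. A different weighting could reorder the top rows.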

Best config: IQ3_M · ctx:16384 · kv:k8v8 · default
Auto baseline compare: score Δ=13.00 TPS Δ=20.10 TTFT Δ=-757.4ms PPL Δ=0.85
                     TPS p95 Δ=19.90 TTFT p95 Δ=-733.6ms
Agent smoke: 3/5 (60.0%) [mixed]
Confidence: target=medium gap_before=2.08% var_before=2.65% replay=True(applied) gap_after=1.04%
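
The "Auto baseline compare" lines do not name the baseline config. A minimal sketch for checking which table row is consistent with the reported deltas, assuming each delta is simply best minus baseline on the same column. The helper name `matching_baselines` is hypothetical; only Q8_0 rows are tried because PPL Δ=0.85 implies a baseline PPL of 6.94.

```python
from math import isclose

# (config, Score, TPS, TPS p95, TTFT ms, TTFT p95 ms, PPL), copied from the table.
BEST = ("IQ3_M · ctx:16384 · kv:k8v8 · default", 96, 69.1, 70.3, 1853.2, 1903.1, 7.79)
CANDIDATES = [
    ("Q8_0 · ctx:8192 · kv:k16v16 · default", 83, 49.0, 50.4, 2610.6, 2636.7, 6.94),
    ("Q8_0 · ctx:16384 · kv:k16v16 · long", 83, 49.1, 50.3, 2604.5, 2659.6, 6.94),
    ("Q8_0 · ctx:16384 · kv:k8v8 · default", 82, 48.2, 48.5, 2657.6, 2716.5, 6.94),
    ("Q8_0 · ctx:32768 · kv:k8v8 · long", 81, 47.2, 47.9, 2708.6, 2770.1, 6.94),
]
# Reported deltas, in the same field order as the tuples above.
REPORTED = (13.00, 20.10, 19.90, -757.4, -733.6, 0.85)


def matching_baselines(best, candidates, reported, tol=1e-6):
    """Return configs whose (best - candidate) deltas equal the reported ones."""
    matches = []
    for cand in candidates:
        deltas = [b - c for b, c in zip(best[1:], cand[1:])]
        if all(isclose(d, r, abs_tol=tol) for d, r in zip(deltas, reported)):
            matches.append(cand[0])
    return matches


print(matching_baselines(BEST, CANDIDATES, REPORTED))
```

Under that assumption, only the Q8_0 · ctx:8192 · kv:k16v16 · default row matches all six deltas, so it is the most likely baseline. Note the sign conventions this implies: a negative TTFT Δ is an improvement (lower latency), while the positive PPL Δ is a quality cost (higher perplexity).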

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
