sigilant-runner · Phi-3.5-mini-instruct-GGUF · A10G · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
Q4_K_M · ctx:8192 · kv:k16v16 · default  <- best | 66.3 | 67.5 | 1930.1 | 1945.5 | 15.08 | 7.12 | 94.0 | 94.7 | 97.5 | 96
IQ3_M · ctx:8192 · kv:k16v16 · default | 70.3 | 72.1 | 1819.7 | 1848.8 | 14.22 | 7.79 | 100.0 | 100.0 | 89.1 | 95
Q4_K_M · ctx:16384 · kv:k16v16 · long | 64.6 | 65.8 | 1980.6 | 1996.0 | 15.47 | 7.12 | 91.6 | 92.3 | 97.5 | 95
IQ3_M · ctx:16384 · kv:k16v16 · long | 69.5 | 71.0 | 1842.3 | 1855.7 | 14.39 | 7.79 | 98.7 | 99.2 | 89.1 | 94
Q4_K_M · ctx:16384 · kv:k8v8 · default | 63.3 | 64.2 | 2021.6 | 2125.1 | 15.79 | 7.12 | 89.5 | 88.5 | 97.5 | 93
Q4_K_M · ctx:32768 · kv:k8v8 · long | 61.8 | 62.8 | 2072.4 | 2096.6 | 16.19 | 7.12 | 87.5 | 88.0 | 97.5 | 93
IQ3_M · ctx:16384 · kv:k8v8 · default | 68.0 | 70.5 | 1881.7 | 2043.4 | 14.70 | 7.79 | 97.3 | 93.6 | 89.1 | 92
IQ3_M · ctx:32768 · kv:k8v8 · long | 67.0 | 67.7 | 1910.2 | 1945.6 | 14.92 | 7.79 | 94.6 | 95.1 | 89.1 | 92
Q5_K_M · ctx:8192 · kv:k16v16 · default | 59.8 | 60.9 | 2142.0 | 2179.7 | 16.73 | 7.01 | 84.8 | 84.9 | 99.0 | 92
Q5_K_M · ctx:16384 · kv:k16v16 · long | 58.3 | 58.9 | 2194.9 | 2237.3 | 17.15 | 7.01 | 82.3 | 82.8 | 99.0 | 91
Q5_K_M · ctx:16384 · kv:k8v8 · default | 57.9 | 58.7 | 2209.2 | 2252.0 | 17.26 | 7.01 | 81.9 | 82.2 | 99.0 | 91
Q5_K_M · ctx:32768 · kv:k8v8 · long | 56.4 | 57.3 | 2270.6 | 2290.2 | 17.74 | 7.01 | 79.9 | 80.4 | 99.0 | 90
Q8_0 · ctx:8192 · kv:k16v16 · default | 50.2 | 50.7 | 2549.2 | 2573.0 | 19.92 | 6.94 | 70.9 | 71.6 | 100.0 | 86
Q8_0 · ctx:16384 · kv:k16v16 · long | 49.2 | 50.1 | 2599.3 | 2613.0 | 20.31 | 6.94 | 69.7 | 70.4 | 100.0 | 85
Q8_0 · ctx:16384 · kv:k8v8 · default | 48.8 | 49.3 | 2622.1 | 2657.3 | 20.49 | 6.94 | 68.9 | 69.5 | 100.0 | 85
Q8_0 · ctx:32768 · kv:k8v8 · long | 47.1 | 48.3 | 2720.0 | 2810.5 | 21.25 | 6.94 | 67.0 | 66.3 | 100.0 | 83
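
The report does not state how the Score and PPL% columns are computed. The sketch below is a reverse-engineered check, not the runner's documented formula: it assumes Score is a rounded weighted sum of the three percentage columns with weights 0.25 / 0.25 / 0.50, and that PPL% is the lowest perplexity in the run divided by the row's perplexity. The names `WEIGHTS`, `composite_score`, and `ppl_pct` are hypothetical. Under these assumptions every row of the table reproduces.

```python
# Assumed weights, inferred from the table values only (not from sigilant-runner docs).
WEIGHTS = {"tps": 0.25, "ttft": 0.25, "ppl": 0.50}


def composite_score(tps_pct: float, ttft_pct: float, ppl_pct_value: float) -> int:
    """Weighted sum of the three percentage columns, rounded to an integer."""
    total = (
        WEIGHTS["tps"] * tps_pct
        + WEIGHTS["ttft"] * ttft_pct
        + WEIGHTS["ppl"] * ppl_pct_value
    )
    return round(total)


def ppl_pct(ppl: float, best_ppl: float) -> float:
    """Perplexity relative to the best (lowest) perplexity in the run, in percent."""
    return round(100.0 * best_ppl / ppl, 1)


# (TPS%, TTFT%, PPL%, Score) copied from the 16 table rows, in order.
ROWS = [
    (94.0, 94.7, 97.5, 96),
    (100.0, 100.0, 89.1, 95),
    (91.6, 92.3, 97.5, 95),
    (98.7, 99.2, 89.1, 94),
    (89.5, 88.5, 97.5, 93),
    (87.5, 88.0, 97.5, 93),
    (97.3, 93.6, 89.1, 92),
    (94.6, 95.1, 89.1, 92),
    (84.8, 84.9, 99.0, 92),
    (82.3, 82.8, 99.0, 91),
    (81.9, 82.2, 99.0, 91),
    (79.9, 80.4, 99.0, 90),
    (70.9, 71.6, 100.0, 86),
    (69.7, 70.4, 100.0, 85),
    (68.9, 69.5, 100.0, 85),
    (67.0, 66.3, 100.0, 83),
]

# Rows where the assumed formula disagrees with the reported Score.
mismatches = [row for row in ROWS if composite_score(*row[:3]) != row[3]]

if __name__ == "__main__":
    print("rows not reproduced:", len(mismatches))
    print("PPL% for Q4_K_M:", ppl_pct(7.12, best_ppl=6.94))
```

If the weighting is right, perplexity counts as much as throughput and latency combined, which would explain why Q4_K_M outranks the faster IQ3_M configuration.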

Best config: Q4_K_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=10.00 TPS Δ=16.10 TTFT Δ=-619.1ms PPL Δ=0.18
                     TPS p95 Δ=16.80 TTFT p95 Δ=-627.5ms
Agent smoke: 3/5 (60.0%) [mixed]
Confidence: target=medium gap_before=1.04% var_before=1.69% replay=False(disabled) gap_after=1.04%
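
The report does not name the configuration used as the auto baseline. As a hedged sketch, the arithmetic below shows that all six reported deltas match the best config minus the `Q8_0 · ctx:8192 · kv:k16v16 · default` row, so that row is assumed here to be the baseline. This is an inference from the numbers, not something the tool states. The `Row` type and `deltas` helper are hypothetical names.

```python
from typing import NamedTuple


class Row(NamedTuple):
    tps: float
    tps_p95: float
    ttft_ms: float
    ttft_p95_ms: float
    ppl: float
    score: int


# Values copied from the table.
best = Row(tps=66.3, tps_p95=67.5, ttft_ms=1930.1, ttft_p95_ms=1945.5, ppl=7.12, score=96)
# Assumed baseline: Q8_0 · ctx:8192 · kv:k16v16 · default (inferred, not stated in the report).
baseline = Row(tps=50.2, tps_p95=50.7, ttft_ms=2549.2, ttft_p95_ms=2573.0, ppl=6.94, score=86)


def deltas(candidate: Row, reference: Row) -> dict:
    """Candidate minus reference for each metric, rounded to the report's precision."""
    return {
        "score": candidate.score - reference.score,
        "tps": round(candidate.tps - reference.tps, 2),
        "ttft_ms": round(candidate.ttft_ms - reference.ttft_ms, 1),
        "ppl": round(candidate.ppl - reference.ppl, 2),
        "tps_p95": round(candidate.tps_p95 - reference.tps_p95, 2),
        "ttft_p95_ms": round(candidate.ttft_p95_ms - reference.ttft_p95_ms, 1),
    }


result = deltas(best, baseline)

if __name__ == "__main__":
    for name, value in result.items():
        print(f"{name} delta = {value}")
```

Read this way, a negative TTFT delta is an improvement (a shorter time to first token), while the positive PPL delta of 0.18 is the quality cost of moving from Q8_0 to Q4_K_M.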

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
