sigilant-runner · Phi-3.5-mini-instruct-GGUF · L4 · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
IQ3_M · ctx:8192 · kv:k16v16 · default  <- best | 70.1 | 71.6 | 1826.4 | 1945.7 | 14.27 | 7.79 | 100.0 | 100.0 | 89.1 | 96
Q4_K_M · ctx:8192 · kv:k16v16 · default | 64.6 | 66.1 | 1982.2 | 2116.6 | 15.49 | 7.12 | 92.2 | 92.0 | 97.5 | 94
IQ3_M · ctx:16384 · kv:k16v16 · long | 67.1 | 69.7 | 1906.5 | 1972.9 | 14.89 | 7.79 | 96.5 | 97.2 | 89.1 | 94
Q4_K_M · ctx:16384 · kv:k16v16 · long | 63.0 | 63.8 | 2031.0 | 2054.2 | 15.87 | 7.12 | 89.5 | 92.3 | 97.5 | 93
Q4_K_M · ctx:16384 · kv:k8v8 · default | 61.8 | 62.8 | 2072.8 | 2104.2 | 16.19 | 7.12 | 87.9 | 90.3 | 97.5 | 92
IQ3_M · ctx:16384 · kv:k8v8 · default | 63.3 | 68.5 | 2023.6 | 2238.3 | 15.81 | 7.79 | 93.0 | 88.6 | 89.1 | 91
Q4_K_M · ctx:32768 · kv:k8v8 · long | 60.5 | 61.7 | 2115.6 | 2291.5 | 16.53 | 7.12 | 86.2 | 85.6 | 97.5 | 91
IQ3_M · ctx:32768 · kv:k8v8 · long | 64.5 | 67.1 | 1984.8 | 2111.3 | 15.51 | 7.79 | 92.9 | 92.1 | 89.1 | 91
Q5_K_M · ctx:16384 · kv:k8v8 · default | 57.3 | 57.4 | 2235.6 | 2275.5 | 17.47 | 7.01 | 81.0 | 83.6 | 99.0 | 89
Q5_K_M · ctx:8192 · kv:k16v16 · default | 57.2 | 59.3 | 2239.2 | 2387.9 | 17.49 | 7.01 | 82.2 | 81.5 | 99.0 | 89
Q5_K_M · ctx:16384 · kv:k16v16 · long | 57.8 | 58.1 | 2214.4 | 2221.8 | 17.30 | 7.01 | 81.8 | 85.0 | 99.0 | 89
Q5_K_M · ctx:32768 · kv:k8v8 · long | 55.3 | 55.8 | 2312.7 | 2435.4 | 18.07 | 7.01 | 78.4 | 79.4 | 99.0 | 87
Q8_0 · ctx:16384 · kv:k16v16 · long | 47.4 | 48.1 | 2702.3 | 2738.1 | 21.11 | 6.94 | 67.4 | 69.3 | 100.0 | 81
Q8_0 · ctx:16384 · kv:k8v8 · default | 46.6 | 46.8 | 2748.3 | 2817.0 | 21.47 | 6.94 | 65.9 | 67.8 | 100.0 | 80
Q8_0 · ctx:8192 · kv:k16v16 · default | 46.2 | 48.2 | 2773.4 | 2888.6 | 21.67 | 6.94 | 66.6 | 66.6 | 100.0 | 80
Q8_0 · ctx:32768 · kv:k8v8 · long | 45.8 | 46.2 | 2794.7 | 2917.2 | 21.83 | 6.94 | 64.9 | 66.0 | 100.0 | 79

Best config: IQ3_M · ctx:8192 · kv:k16v16 · default
Auto baseline compare: score Δ=15.00 TPS Δ=22.70 TTFT Δ=-875.9ms PPL Δ=0.85
                     TPS p95 Δ=23.50 TTFT p95 Δ=-792.4ms
Agent smoke: 4/5 (80.0%) [config_ready_for_smoke]
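
The report does not say which row the auto baseline is. As a sanity check, the sketch below recomputes the deltas assuming the baseline is the Q8_0 · ctx:16384 · kv:k16v16 · long row. That row is an inference from the numbers, not something the tool states, and the dictionary keys are illustrative names only.

```python
# Metrics for the best config, copied from the table
# (IQ3_M · ctx:8192 · kv:k16v16 · default).
best = {
    "score": 96,
    "tps": 70.1,
    "tps_p95": 71.6,
    "ttft_ms": 1826.4,
    "ttft_p95_ms": 1945.7,
    "ppl": 7.79,
}

# ASSUMPTION: the auto baseline is Q8_0 · ctx:16384 · kv:k16v16 · long.
# The report does not name the baseline; this row is inferred.
baseline = {
    "score": 81,
    "tps": 47.4,
    "tps_p95": 48.1,
    "ttft_ms": 2702.3,
    "ttft_p95_ms": 2738.1,
    "ppl": 6.94,
}

# Delta = best minus baseline, rounded to two decimals like the report.
deltas = {key: round(best[key] - baseline[key], 2) for key in best}

for key, value in deltas.items():
    print(f"{key} delta = {value:+.2f}")
```

Under that assumption every delta matches the reported values. A negative TTFT delta means the best config reaches its first token sooner. The positive PPL delta means its perplexity is higher than the baseline's, which is the quality cost of the faster quantization.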

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
