sigilant-runner · Phi-3.5-mini-instruct-GGUF · A10G · llama.cpp · 16 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
IQ3_M · ctx:16384 · kv:k16v16 · long  <- best | 65.5 | 66.0 | 1954.8 | 1985.3 | 15.27 | 7.79 | 100.0 | 100.0 | 88.8 | 94
Q4_K_M · ctx:16384 · kv:k16v16 · long | 58.3 | 58.8 | 2196.1 | 2208.8 | 17.16 | 7.10 | 89.0 | 89.4 | 97.5 | 93
Q4_K_M · ctx:16384 · kv:k8v8 · default | 51.1 | 55.2 | 2503.6 | 2732.6 | 19.56 | 7.10 | 80.8 | 75.4 | 97.5 | 88
IQ3_M · ctx:16384 · kv:k8v8 · default | 57.5 | 62.7 | 2223.0 | 2728.1 | 17.37 | 7.79 | 91.4 | 80.4 | 88.8 | 88
Q5_K_M · ctx:8192 · kv:k16v16 · default | 49.9 | 54.2 | 2565.9 | 2717.1 | 20.05 | 6.99 | 79.2 | 74.6 | 99.0 | 88
Q4_K_M · ctx:8192 · kv:k16v16 · default | 52.2 | 56.0 | 2455.2 | 3056.7 | 19.19 | 7.10 | 82.3 | 72.3 | 97.5 | 88
Q5_K_M · ctx:16384 · kv:k8v8 · default | 49.1 | 51.7 | 2607.6 | 2795.8 | 20.37 | 6.99 | 76.6 | 73.0 | 99.0 | 87
IQ3_M · ctx:8192 · kv:k16v16 · default | 58.8 | 62.2 | 2174.8 | 3528.7 | 16.99 | 7.79 | 92.0 | 73.1 | 88.8 | 87
Q4_K_M · ctx:32768 · kv:k8v8 · long | 49.9 | 52.6 | 2565.1 | 2771.0 | 20.04 | 7.10 | 77.9 | 73.9 | 97.5 | 87
IQ3_M · ctx:32768 · kv:k8v8 · long | 55.5 | 59.5 | 2303.2 | 2470.6 | 17.99 | 7.79 | 87.4 | 82.6 | 88.8 | 87
Q5_K_M · ctx:16384 · kv:k16v16 · long | 47.5 | 53.8 | 2689.7 | 2876.2 | 21.01 | 6.99 | 77.0 | 70.9 | 99.0 | 87
Q5_K_M · ctx:32768 · kv:k8v8 · long | 46.3 | 50.5 | 2764.6 | 3145.2 | 21.60 | 6.99 | 73.6 | 66.9 | 99.0 | 85
Q8_0 · ctx:8192 · kv:k16v16 · default | 42.5 | 45.1 | 3009.2 | 3216.2 | 23.51 | 6.92 | 66.6 | 63.3 | 100.0 | 83
Q8_0 · ctx:16384 · kv:k8v8 · default | 41.2 | 43.4 | 3104.1 | 3305.5 | 24.26 | 6.92 | 64.3 | 61.5 | 100.0 | 82
Q8_0 · ctx:16384 · kv:k16v16 · long | 40.0 | 42.8 | 3202.5 | 3428.3 | 25.02 | 6.92 | 63.0 | 59.5 | 100.0 | 81
Q8_0 · ctx:32768 · kv:k8v8 · long | 39.1 | 42.8 | 3270.4 | 4308.2 | 25.55 | 6.92 | 62.3 | 52.9 | 100.0 | 79

Best config: IQ3_M · ctx:16384 · kv:k16v16 · long
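Auto baseline compare (best vs. baseline; deltas match the Q8_0 · ctx:8192 · kv:k16v16 · default row):
  score Δ=+11.00  TPS Δ=+23.00  TTFT Δ=-1054.4ms  PPL Δ=+0.87 (higher PPL is worse)
  TPS p95 Δ=+20.90  TTFT p95 Δ=-1230.9ms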
Agent smoke: 3/5 (60.0%) [mixed]
Confidence: target=medium gap_before=0.00% var_before=9.51% replay=True(applied) gap_after=1.06%

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
