sigilant-runner · Phi-3.5-mini-instruct · A10G · vllm · 6 configs

Config | TPS (tok/s) | TPS p95 | TTFT (ms) | TTFT p95 | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
INT8_W8A8 · ctx:16384 · kv:k16v16 · long  <- best | 51.1 | n/a | 1218.5 | n/a | 19.66 | 2.88 | 100.0 | 100.0 | 98.6 | 99
INT8_W8A8 · ctx:16384 · kv:k8v8 · default | 51.1 | n/a | 1752.2 | n/a | 19.64 | 2.88 | 100.0 | 69.5 | 98.6 | 93
FP16_BASELINE · ctx:16384 · kv:k16v16 · long | 39.6 | n/a | 1530.6 | n/a | 26.21 | 3.06 | 77.5 | 79.6 | 92.8 | 84
INT8_W8A8 · ctx:32768 · kv:k8v8 · long | 36.5 | n/a | 1860.4 | n/a | 27.49 | 2.84 | 71.4 | 65.5 | 100.0 | 82
FP16_BASELINE · ctx:16384 · kv:k8v8 · default | 40.0 | n/a | 1975.7 | n/a | 25.93 | 3.03 | 78.3 | 61.7 | 93.7 | 81
FP16_BASELINE · ctx:32768 · kv:k8v8 · long | 38.4 | n/a | 2809.0 | n/a | 26.98 | 3.01 | 75.1 | 43.4 | 94.4 | 76
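
The normalized columns in the table appear to be ratios against the best value in each column: TPS% = TPS / max(TPS), TTFT% = min(TTFT) / TTFT, PPL% = min(PPL) / PPL. The Score column is consistent with a 0.4 / 0.2 / 0.4 weighting of those three percentages. That weighting is inferred from the reported numbers, not taken from sigilant-runner documentation, so the sketch below is an assumption that happens to reproduce the table, and the short row labels are illustrative only.

```python
# Rows copied from the table: (label, TPS, TTFT in ms, PPL, reported Score).
ROWS = [
    ("INT8 16k k16v16 long", 51.1, 1218.5, 2.88, 99),
    ("INT8 16k k8v8 default", 51.1, 1752.2, 2.88, 93),
    ("FP16 16k k16v16 long", 39.6, 1530.6, 3.06, 84),
    ("INT8 32k k8v8 long", 36.5, 1860.4, 2.84, 82),
    ("FP16 16k k8v8 default", 40.0, 1975.7, 3.03, 81),
    ("FP16 32k k8v8 long", 38.4, 2809.0, 3.01, 76),
]

# ASSUMPTION: weights inferred from the table, not a documented formula.
W_TPS, W_TTFT, W_PPL = 0.4, 0.2, 0.4


def normalize(rows):
    """Return (label, TPS%, TTFT%, PPL%, score) for each row."""
    best_tps = max(r[1] for r in rows)    # higher throughput is better
    best_ttft = min(r[2] for r in rows)   # lower latency is better
    best_ppl = min(r[3] for r in rows)    # lower perplexity is better
    out = []
    for label, tps, ttft, ppl, _reported in rows:
        tps_pct = 100 * tps / best_tps
        ttft_pct = 100 * best_ttft / ttft
        ppl_pct = 100 * best_ppl / ppl
        score = W_TPS * tps_pct + W_TTFT * ttft_pct + W_PPL * ppl_pct
        out.append((label, round(tps_pct, 1), round(ttft_pct, 1),
                    round(ppl_pct, 1), round(score)))
    return out


if __name__ == "__main__":
    for computed, row in zip(normalize(ROWS), ROWS):
        print(computed, "reported score:", row[4])
```

Under this reading, the second row loses points to the first only through TTFT%, since both share the same TPS and PPL.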

Best config: INT8_W8A8 · ctx:16384 · kv:k16v16 · long
Auto baseline compare (best vs FP16_BASELINE · ctx:16384 · kv:k16v16 · long): score Δ=15.00 TPS Δ=11.50 TTFT Δ=-312.1ms PPL Δ=-0.18
Confidence: target=medium gap_before=6.06% var_before=n/a replay=False(disabled) gap_after=6.06%
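
The comparison and confidence figures can be reproduced from the table. The sketch below assumes the deltas are best minus the FP16 k16v16 row at the same context length, and that gap_before is the score gap between the top two configs relative to the best score. Both readings are inferred from the numbers and are not confirmed by tool documentation; the dictionary keys are illustrative names.

```python
# Values copied from the table rows for the best config and the FP16 baseline.
best = {"score": 99, "tps": 51.1, "ttft_ms": 1218.5, "ppl": 2.88}
baseline = {"score": 84, "tps": 39.6, "ttft_ms": 1530.6, "ppl": 3.06}
runner_up_score = 93  # second row of the table

# ASSUMPTION: delta = best - baseline for each metric.
deltas = {key: round(best[key] - baseline[key], 2) for key in best}

# ASSUMPTION: gap = (best score - runner-up score) / best score, in percent.
gap_pct = round(100 * (best["score"] - runner_up_score) / best["score"], 2)

if __name__ == "__main__":
    print(deltas)
    print(gap_pct)
```

Negative TTFT and PPL deltas are improvements here, since lower is better for both metrics.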

Depth profile (not cross-depth comparable):
  - bucket winners: best_at_8k=INT8_W8A8 · ctx:16384 · kv:k16v16 · long | best_at_14k=INT8_W8A8 · ctx:16384 · kv:k16v16 · long | best_at_28k=INT8_W8A8 · ctx:32768 · kv:k8v8 · long
  - 8k: winner=INT8_W8A8 · ctx:16384 · kv:k16v16 · long error=none
  - 14k: winner=INT8_W8A8 · ctx:16384 · kv:k16v16 · long error=none
  - 28k: winner=INT8_W8A8 · ctx:32768 · kv:k8v8 · long error=none

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
