sigilant-runner · Phi-3.5-mini-instruct · A10G · vllm · 8 configs

Config | TPS (tok/s) | TPS p95 (tok/s) | TTFT (ms) | TTFT p95 (ms) | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
INT8_W8A8 · ctx:16384 · kv:k16v16 · long  <- best | 52.9 | n/a | 904.2 | n/a | 18.98 | 2.90 | 100.0 | 57.7 | 97.9 | 91
INT8_W8A8 · ctx:8192 · kv:k16v16 · default | 52.9 | n/a | 1590.8 | n/a | 18.98 | 2.90 | 100.0 | 32.8 | 97.9 | 86
INT8_W8A8 · ctx:16384 · kv:k8v8 · default | 52.9 | n/a | 1595.1 | n/a | 18.97 | 2.88 | 100.0 | 32.7 | 98.6 | 86
FP16_BASELINE · ctx:16384 · kv:k16v16 · long | 39.3 | n/a | 1285.8 | n/a | 25.55 | 3.06 | 74.3 | 40.6 | 92.8 | 75
FP16_BASELINE · ctx:16384 · kv:k8v8 · default | 39.3 | n/a | 1922.4 | n/a | 25.56 | 3.04 | 74.3 | 27.1 | 93.4 | 73
FP16_BASELINE · ctx:8192 · kv:k16v16 · default | 39.8 | n/a | 1934.2 | n/a | 25.30 | 3.02 | 75.2 | 27.0 | 94.0 | 73
FP16_BASELINE · ctx:32768 · kv:k8v8 · long | 32.2 | n/a | 2536.5 | n/a | 31.22 | 3.08 | 60.9 | 20.6 | 92.2 | 65
INT8_W8A8 · ctx:32768 · kv:k8v8 · long | 3.1 | n/a | 521.4 | n/a | 484.12 | 2.84 | 5.9 | 100.0 | 100.0 | 62
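
The relative columns appear to be normalized against the best value in this run: TPS% = TPS / max(TPS), TTFT% = min(TTFT) / TTFT, PPL% = min(PPL) / PPL, each x 100. The report does not document how Score is built. A weighting of 0.4 x TPS% + 0.2 x TTFT% + 0.4 x PPL% reproduces every row above, so the sketch below treats those weights as an assumption inferred from the table, not as sigilant-runner's confirmed formula. `score_rows` is a hypothetical helper, and the config labels are shortened to ASCII.

```python
# Hypothetical re-derivation of the TPS%, TTFT%, PPL% and Score columns from
# the raw TPS (tok/s), TTFT (ms) and PPL values in the table. The 0.4/0.2/0.4
# score weights are inferred from the table, not taken from sigilant-runner.

ROWS = [
    # (config, TPS, TTFT ms, PPL)
    ("INT8_W8A8 ctx:16384 kv:k16v16 long", 52.9, 904.2, 2.90),
    ("INT8_W8A8 ctx:8192 kv:k16v16 default", 52.9, 1590.8, 2.90),
    ("INT8_W8A8 ctx:16384 kv:k8v8 default", 52.9, 1595.1, 2.88),
    ("FP16_BASELINE ctx:16384 kv:k16v16 long", 39.3, 1285.8, 3.06),
    ("FP16_BASELINE ctx:16384 kv:k8v8 default", 39.3, 1922.4, 3.04),
    ("FP16_BASELINE ctx:8192 kv:k16v16 default", 39.8, 1934.2, 3.02),
    ("FP16_BASELINE ctx:32768 kv:k8v8 long", 32.2, 2536.5, 3.08),
    ("INT8_W8A8 ctx:32768 kv:k8v8 long", 3.1, 521.4, 2.84),
]
W_TPS, W_TTFT, W_PPL = 0.4, 0.2, 0.4  # assumed weights


def score_rows(rows):
    best_tps = max(r[1] for r in rows)   # higher throughput is better
    best_ttft = min(r[2] for r in rows)  # lower time-to-first-token is better
    best_ppl = min(r[3] for r in rows)   # lower perplexity is better
    scored = []
    for config, tps, ttft, ppl in rows:
        tps_pct = 100 * tps / best_tps
        ttft_pct = 100 * best_ttft / ttft
        ppl_pct = 100 * best_ppl / ppl
        score = W_TPS * tps_pct + W_TTFT * ttft_pct + W_PPL * ppl_pct
        scored.append((config, round(tps_pct, 1), round(ttft_pct, 1),
                       round(ppl_pct, 1), round(score)))
    return scored


for row in score_rows(ROWS):
    print(" | ".join(str(value) for value in row))
```

With these weights all eight recomputed Score values match the table. The INT8_W8A8 ctx:32768 row supplies the best TTFT and PPL reference points despite its 3.1 tok/s, so every other row's TTFT% and PPL% is measured against it.
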

Best config: INT8_W8A8 · ctx:16384 · kv:k16v16 · long
Auto baseline compare (vs FP16_BASELINE · ctx:16384 · kv:k16v16 · long): score Δ=+16.00 TPS Δ=+13.60 tok/s TTFT Δ=-381.6 ms PPL Δ=-0.16
Confidence: target=medium gap_before=5.49% var_before=n/a replay=False (disabled) gap_after=5.49%
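
The summary numbers can be re-derived from the table. The deltas match the best config minus the FP16_BASELINE row with the same ctx, kv and profile. gap_before matches the relative lead of the top integer score over the runner-up, (91 - 86) / 91 ≈ 5.49%. Both readings are inferences from the numbers, not the runner's documented definitions, and `baseline_deltas` and `score_gap_pct` are hypothetical helper names. gap_after being unchanged is presumably a consequence of replay being disabled.

```python
# Hypothetical re-derivation of the "Auto baseline compare" and "Confidence"
# lines. Assumptions: the baseline is the FP16_BASELINE row with the same
# ctx/kv/profile as the best config, and the gap is (best - runner-up) / best
# computed on the rounded integer scores.

BEST = {"score": 91, "tps": 52.9, "ttft_ms": 904.2, "ppl": 2.90}
BASELINE = {"score": 75, "tps": 39.3, "ttft_ms": 1285.8, "ppl": 3.06}
SCORES = [91, 86, 86, 75, 73, 73, 65, 62]  # Score column, top to bottom


def baseline_deltas(best, baseline):
    """best - baseline per metric; +score/+TPS and -TTFT/-PPL favor `best`."""
    return {key: round(best[key] - baseline[key], 2) for key in best}


def score_gap_pct(scores):
    """Relative lead of the top score over the runner-up, in percent."""
    top, runner_up = sorted(scores, reverse=True)[:2]
    return 100 * (top - runner_up) / top


print(baseline_deltas(BEST, BASELINE))
# → {'score': 16, 'tps': 13.6, 'ttft_ms': -381.6, 'ppl': -0.16}
print(f"gap_before={score_gap_pct(SCORES):.2f}%")
# → gap_before=5.49%
```
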

Depth profile (not cross-depth comparable):
  - bucket winners: best_at_8k=INT8_W8A8 · ctx:16384 · kv:k16v16 · long | best_at_14k=INT8_W8A8 · ctx:16384 · kv:k16v16 · long | best_at_28k=n/a
  - 8k: winner=INT8_W8A8 · ctx:16384 · kv:k16v16 · long error=none
  - 14k: winner=INT8_W8A8 · ctx:16384 · kv:k16v16 · long error=none
  - 28k: winner=n/a error=ConnectionError: [Errno 8] nodename nor servname provided, or not known

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
