sigilant-runner · Phi-3.5-mini-instruct · A10G · vllm · 8 configs

Config | TPS (tok/s) | TPS p95 | TTFT (ms) | TTFT p95 | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
INT8_W8A8 · ctx:16384 · kv:k16v16 · long  <- best | 59.5 | n/a | 844.0 | n/a | 16.86 | 2.88 | 100.0 | 60.0 | 98.6 | 91
INT8_W8A8 · ctx:8192 · kv:k16v16 · default | 59.5 | n/a | 1462.8 | n/a | 16.88 | 2.90 | 100.0 | 34.6 | 97.9 | 86
FP16_BASELINE · ctx:16384 · kv:k16v16 · long | 42.5 | n/a | 1078.6 | n/a | 23.62 | 3.03 | 71.4 | 46.9 | 93.7 | 75
FP16_BASELINE · ctx:16384 · kv:k8v8 · default | 42.6 | n/a | 1747.0 | n/a | 23.59 | 3.03 | 71.6 | 29.0 | 93.7 | 72
FP16_BASELINE · ctx:8192 · kv:k16v16 · default | 43.1 | n/a | 1769.6 | n/a | 23.40 | 3.07 | 72.4 | 28.6 | 92.5 | 72
FP16_BASELINE · ctx:32768 · kv:k8v8 · long | 38.0 | n/a | 2312.1 | n/a | 26.44 | 3.01 | 63.9 | 21.9 | 94.4 | 68
INT8_W8A8 · ctx:32768 · kv:k8v8 · long | 3.2 | n/a | 506.0 | n/a | 469.84 | 2.84 | 5.4 | 100.0 | 100.0 | 62
INT8_W8A8 · ctx:16384 · kv:k8v8 · default | 3.1 | n/a | 525.6 | n/a | 488.09 | 2.90 | 5.2 | 96.3 | 97.9 | 61
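
The TPS%, TTFT%, PPL% and Score columns above can be reproduced from the raw columns. The sketch below is a reconstruction, not the runner's documented method: it assumes each percentage is normalized against the best value in its column (highest TPS, lowest TTFT, lowest PPL), and that Score is a weighted sum with weights 0.4 / 0.2 / 0.4. Those weights are inferred from the reported numbers and are an assumption. The names `ROWS`, `WEIGHTS`, `normalize` and `score` are hypothetical.

```python
# Raw metrics copied from the table above: (config, TPS, TTFT in ms, PPL).
ROWS = [
    ("INT8_W8A8 ctx:16384 k16v16 long", 59.5, 844.0, 2.88),
    ("INT8_W8A8 ctx:8192 k16v16 default", 59.5, 1462.8, 2.90),
    ("FP16_BASELINE ctx:16384 k16v16 long", 42.5, 1078.6, 3.03),
    ("FP16_BASELINE ctx:16384 k8v8 default", 42.6, 1747.0, 3.03),
    ("FP16_BASELINE ctx:8192 k16v16 default", 43.1, 1769.6, 3.07),
    ("FP16_BASELINE ctx:32768 k8v8 long", 38.0, 2312.1, 3.01),
    ("INT8_W8A8 ctx:32768 k8v8 long", 3.2, 506.0, 2.84),
    ("INT8_W8A8 ctx:16384 k8v8 default", 3.1, 525.6, 2.90),
]

# ASSUMPTION: (TPS, TTFT, PPL) weights inferred from the reported scores;
# the report itself does not state them.
WEIGHTS = (0.4, 0.2, 0.4)

# Best value per column: higher is better for TPS, lower for TTFT and PPL.
BEST_TPS = max(r[1] for r in ROWS)
BEST_TTFT = min(r[2] for r in ROWS)
BEST_PPL = min(r[3] for r in ROWS)


def normalize(tps, ttft_ms, ppl):
    """Return (TPS%, TTFT%, PPL%); the best value in each column maps to 100."""
    return (100 * tps / BEST_TPS, 100 * BEST_TTFT / ttft_ms, 100 * BEST_PPL / ppl)


def score(tps, ttft_ms, ppl):
    """Weighted sum of the three normalized percentages (unrounded)."""
    return sum(w * pct for w, pct in zip(WEIGHTS, normalize(tps, ttft_ms, ppl)))


for name, tps, ttft_ms, ppl in ROWS:
    tps_pct, ttft_pct, ppl_pct = normalize(tps, ttft_ms, ppl)
    total = score(tps, ttft_ms, ppl)
    print(f"{name:<40} {tps_pct:6.1f} {ttft_pct:6.1f} {ppl_pct:6.1f} {total:6.1f}")
```

Under these assumed weights, TTFT% and PPL% together carry 60% of the score, which would explain why the two INT8 kv:k8v8 rows still score above 60 despite roughly 3 tok/s throughput.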

Best config: INT8_W8A8 · ctx:16384 · kv:k16v16 · long
Auto baseline compare (vs FP16_BASELINE · ctx:16384 · kv:k16v16 · long): score Δ=16.00 TPS Δ=17.00 tok/s TTFT Δ=-234.6ms PPL Δ=-0.15
Confidence: target=medium gap_before=5.49% var_before=n/a replay=False(disabled) gap_after=5.49%
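
The baseline deltas and the confidence gap above can be checked by hand from the table. The sketch below rests on two assumptions inferred from the arithmetic rather than stated by the runner: the baseline is the FP16_BASELINE · ctx:16384 · kv:k16v16 · long row, and the gap is the score difference between the best config and the runner-up, relative to the best score. The names `best`, `baseline`, `runner_up_score`, `deltas` and `gap_pct` are hypothetical.

```python
# Values copied from the table: best config and the assumed baseline row.
best = {"score": 91, "tps": 59.5, "ttft_ms": 844.0, "ppl": 2.88}
baseline = {"score": 75, "tps": 42.5, "ttft_ms": 1078.6, "ppl": 3.03}

# Score of the second-ranked config (INT8_W8A8, ctx:8192, k16v16, default).
runner_up_score = 86

# Delta = best minus baseline; negative is an improvement for TTFT and PPL.
deltas = {key: round(best[key] - baseline[key], 2) for key in best}

# ASSUMPTION: gap is measured relative to the best score.
gap_pct = 100 * (best["score"] - runner_up_score) / best["score"]

print(deltas)
print(f"gap = {gap_pct:.2f}%")
```

Since replay is disabled, gap_before and gap_after are identical, as reported.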

Depth profile (not cross-depth comparable):
  - bucket winners: best_at_8k=INT8_W8A8 · ctx:16384 · kv:k16v16 · long | best_at_14k=INT8_W8A8 · ctx:16384 · kv:k16v16 · long | best_at_28k=INT8_W8A8 · ctx:32768 · kv:k8v8 · long
  - 8k: winner=INT8_W8A8 · ctx:16384 · kv:k16v16 · long error=none
  - 14k: winner=INT8_W8A8 · ctx:16384 · kv:k16v16 · long error=none
  - 28k: winner=INT8_W8A8 · ctx:32768 · kv:k8v8 · long error=none

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
