sigilant-runner · Phi-3.5-mini-instruct-quantized.w8a8 · A10G · vllm · 3 configs

Config | TPS (tok/s) | TPS p95 | TTFT (ms) | TTFT p95 | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
FP16_BASELINE · ctx:16384 · kv:k16v16 · long  <- best | 39.4 | n/a | 1533.9 | n/a | 26.29 | 3.03 | 98.5 | 100.0 | 99.3 | 99
FP16_BASELINE · ctx:16384 · kv:k8v8 · default | 40.0 | n/a | 1947.9 | n/a | 25.93 | 3.06 | 100.0 | 78.7 | 98.4 | 95
FP16_BASELINE · ctx:32768 · kv:k8v8 · long | 38.3 | n/a | 2679.4 | n/a | 27.05 | 3.01 | 95.7 | 57.2 | 100.0 | 90
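
The TPS%, TTFT% and PPL% columns appear to be each metric normalized against the best value in the run (highest TPS, lowest TTFT, lowest PPL). That reading is inferred from the numbers in the table, not from any sigilant-runner documentation, and the Score weighting is not recoverable from this report. A minimal Python sketch under that assumption, using the rounded raw values from the table; `ROWS` and `normalize` are hypothetical names:

```python
# Raw values copied from the results table (rounded as reported).
ROWS = {
    "ctx:16384 kv:k16v16 long": {"tps": 39.4, "ttft": 1533.9, "ppl": 3.03},
    "ctx:16384 kv:k8v8 default": {"tps": 40.0, "ttft": 1947.9, "ppl": 3.06},
    "ctx:32768 kv:k8v8 long": {"tps": 38.3, "ttft": 2679.4, "ppl": 3.01},
}


def normalize(rows):
    """Assumed rule, inferred from the table, not confirmed by the tool:
    higher-is-better metrics use value / best,
    lower-is-better metrics use best / value."""
    best_tps = max(r["tps"] for r in rows.values())
    best_ttft = min(r["ttft"] for r in rows.values())
    best_ppl = min(r["ppl"] for r in rows.values())
    return {
        name: {
            "tps_pct": 100.0 * r["tps"] / best_tps,
            "ttft_pct": 100.0 * best_ttft / r["ttft"],
            "ppl_pct": 100.0 * best_ppl / r["ppl"],
        }
        for name, r in rows.items()
    }


if __name__ == "__main__":
    for name, pct in normalize(ROWS).items():
        print(f"{name}: TPS%={pct['tps_pct']:.1f} "
              f"TTFT%={pct['ttft_pct']:.1f} PPL%={pct['ppl_pct']:.1f}")
```

Because the inputs are already rounded, recomputed values can differ from the reported columns in the last digit.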

Best config: FP16_BASELINE · ctx:16384 · kv:k16v16 · long
Auto baseline compare: score Δ=0.00 TPS Δ=0.00 TTFT Δ=0.0ms PPL Δ=0.00
Confidence: target=medium gap_before=4.04% var_before=n/a replay=False(disabled) gap_after=4.04%
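
The reported gap of 4.04% matches the relative difference between the best Score (99) and the runner-up Score (95). This is an inference from the table; the formula sigilant-runner actually uses is not stated in this report, and `relative_gap_pct` is a hypothetical helper. With replay disabled, gap_after is unchanged from gap_before. A small Python sketch of that arithmetic:

```python
def relative_gap_pct(best_score, runner_up_score):
    """Assumed definition: score lead of the best config over the
    runner-up, as a percentage of the best score."""
    return 100.0 * (best_score - runner_up_score) / best_score


if __name__ == "__main__":
    # Scores taken from the table: best = 99, runner-up = 95.
    print(f"{relative_gap_pct(99, 95):.2f}%")  # prints 4.04%
```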

Depth profile (not cross-depth comparable):
  - bucket winners: best_at_8k=FP16_BASELINE · ctx:16384 · kv:k16v16 · long | best_at_14k=FP16_BASELINE · ctx:16384 · kv:k16v16 · long | best_at_28k=FP16_BASELINE · ctx:32768 · kv:k8v8 · long
  - 8k: winner=FP16_BASELINE · ctx:16384 · kv:k16v16 · long error=none
  - 14k: winner=FP16_BASELINE · ctx:16384 · kv:k16v16 · long error=none
  - 28k: winner=FP16_BASELINE · ctx:32768 · kv:k8v8 · long error=none

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
