sigilant-runner · Phi-3.5-mini-instruct · A10G · vllm · 16 configs

Config | TPS (tok/s) | TPS p95 | TTFT (ms) | TTFT p95 | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long  <- best | 73.0 | n/a | 1194.8 | n/a | 13.76 | 1.85 | 100.0 | 41.3 | 93.0 | 85
AWQ4_MARLIN · ctx:16384 · kv:k16v16 · long | 72.8 | n/a | 1198.4 | n/a | 13.78 | 1.97 | 99.7 | 41.2 | 87.3 | 83
GPTQ4_MARLIN · ctx:8192 · kv:k16v16 · default | 73.0 | n/a | 1850.7 | n/a | 13.76 | 1.85 | 100.0 | 26.7 | 93.0 | 83
GPTQ4_MARLIN · ctx:16384 · kv:k8v8 · default | 73.0 | n/a | 1860.6 | n/a | 13.76 | 1.85 | 100.0 | 26.5 | 93.0 | 82
INT8_W8A8 · ctx:16384 · kv:k16v16 · long | 53.0 | n/a | 889.3 | n/a | 18.96 | 1.72 | 72.6 | 55.5 | 100.0 | 80
AWQ4_MARLIN · ctx:8192 · kv:k16v16 · default | 72.7 | n/a | 1856.0 | n/a | 13.81 | 1.97 | 99.6 | 26.6 | 87.3 | 80
AWQ4_MARLIN · ctx:16384 · kv:k8v8 · default | 72.7 | n/a | 1860.2 | n/a | 13.80 | 1.97 | 99.6 | 26.5 | 87.3 | 80
INT8_W8A8 · ctx:16384 · kv:k8v8 · default | 52.9 | n/a | 1579.2 | n/a | 18.97 | 1.74 | 72.5 | 31.2 | 98.9 | 75
INT8_W8A8 · ctx:8192 · kv:k16v16 · default | 52.9 | n/a | 1580.1 | n/a | 18.96 | 1.74 | 72.5 | 31.2 | 98.9 | 75
GPTQ4_MARLIN · ctx:32768 · kv:k8v8 · long | 35.5 | n/a | 1733.6 | n/a | 28.27 | 1.85 | 48.6 | 28.5 | 93.0 | 62
AWQ4_MARLIN · ctx:32768 · kv:k8v8 · long | 36.6 | n/a | 1732.9 | n/a | 27.41 | 1.97 | 50.1 | 28.5 | 87.3 | 61
INT8_W8A8 · ctx:32768 · kv:k8v8 · long | 3.3 | n/a | 493.4 | n/a | 458.16 | 1.84 | 4.5 | 100.0 | 93.5 | 59
FP16_BASELINE · ctx:16384 · kv:k16v16 · long | 39.3 | n/a | 1249.2 | n/a | 25.56 | 2.57 | 53.8 | 39.5 | 66.9 | 56
FP16_BASELINE · ctx:8192 · kv:k16v16 · default | 39.9 | n/a | 1906.4 | n/a | 25.27 | 2.69 | 54.7 | 25.9 | 63.9 | 53
FP16_BASELINE · ctx:16384 · kv:k8v8 · default | 39.3 | n/a | 1901.7 | n/a | 25.55 | 2.80 | 53.8 | 25.9 | 61.4 | 51
FP16_BASELINE · ctx:32768 · kv:k8v8 · long | 35.7 | n/a | 2454.9 | n/a | 28.14 | 2.66 | 48.9 | 20.1 | 64.7 | 49
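
The normalized columns can be re-derived from the raw ones. This is a minimal sketch that assumes, from the numbers in this table rather than from any sigilant-runner documentation, that TPS% = 100 · TPS / max TPS, TTFT% = 100 · min TTFT / TTFT, PPL% = 100 · min PPL / PPL, and Score = 0.4 · TPS% + 0.2 · TTFT% + 0.4 · PPL% rounded to an integer. The weights are a hand fit to the rows above, not a documented setting; `Row`, `score_rows` and `WEIGHTS` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    tps: float      # decode throughput, tokens/s
    ttft_ms: float  # time to first token, ms
    ppl: float      # perplexity (lower is better)


# Subset of the table. It keeps the rows that hold the column extremes
# (max TPS, min TTFT, min PPL), so the normalization matches the full table.
ROWS = [
    Row("GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long", 73.0, 1194.8, 1.85),
    Row("AWQ4_MARLIN · ctx:16384 · kv:k16v16 · long", 72.8, 1198.4, 1.97),
    Row("INT8_W8A8 · ctx:16384 · kv:k16v16 · long", 53.0, 889.3, 1.72),
    Row("INT8_W8A8 · ctx:32768 · kv:k8v8 · long", 3.3, 493.4, 1.84),
    Row("FP16_BASELINE · ctx:16384 · kv:k16v16 · long", 39.3, 1249.2, 2.57),
]

# Hypothetical weights for (TPS%, TTFT%, PPL%), fitted to this table by hand.
WEIGHTS = (0.4, 0.2, 0.4)


def score_rows(rows, weights=WEIGHTS):
    """Return {name: (TPS%, TTFT%, PPL%, Score)} for each row."""
    best_tps = max(r.tps for r in rows)       # higher is better
    best_ttft = min(r.ttft_ms for r in rows)  # lower is better
    best_ppl = min(r.ppl for r in rows)       # lower is better
    w_tps, w_ttft, w_ppl = weights
    out = {}
    for r in rows:
        tps_pct = 100 * r.tps / best_tps
        ttft_pct = 100 * best_ttft / r.ttft_ms
        ppl_pct = 100 * best_ppl / r.ppl
        score = w_tps * tps_pct + w_ttft * ttft_pct + w_ppl * ppl_pct
        out[r.name] = (round(tps_pct, 1), round(ttft_pct, 1),
                       round(ppl_pct, 1), round(score))
    return out


if __name__ == "__main__":
    for name, cols in score_rows(ROWS).items():
        print(f"{name} | {cols}")
```

For the five rows included, this reproduces the TPS%, TTFT%, PPL% and Score columns above; changing WEIGHTS shows how sensitive the ranking is to the throughput/latency/quality trade-off.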

Best config: GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long
Auto baseline compare (vs FP16_BASELINE · ctx:16384 · kv:k16v16 · long): score Δ=+29.00 TPS Δ=+33.70 tok/s TTFT Δ=-54.4 ms PPL Δ=-0.72
Confidence: target=medium gap_before=2.35% var_before=n/a replay=False(disabled) gap_after=2.35%
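
The two summary lines above can be checked by hand. This sketch assumes the baseline is the FP16_BASELINE row that shares the winner's ctx, kv and profile (ctx:16384 · kv:k16v16 · long), since it is the only FP16 row whose values reproduce the Δ figures, and that gap_before is the relative gap between the top two integer Scores, (85 - 83) / 85 ≈ 2.35%. `baseline_deltas` and `score_gap_pct` are hypothetical helpers, not sigilant-runner functions.

```python
# Values copied from the table above.
# BEST: GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long
BEST = {"score": 85, "tps": 73.0, "ttft_ms": 1194.8, "ppl": 1.85}
# BASELINE (assumed): FP16_BASELINE · ctx:16384 · kv:k16v16 · long
BASELINE = {"score": 56, "tps": 39.3, "ttft_ms": 1249.2, "ppl": 2.57}
# All 16 Score values, in table order.
SCORES = [85, 83, 83, 82, 80, 80, 80, 75, 75, 62, 61, 59, 56, 53, 51, 49]


def baseline_deltas(best, baseline):
    """Best minus baseline per metric; negative TTFT/PPL deltas are improvements."""
    return {k: round(best[k] - baseline[k], 2) for k in best}


def score_gap_pct(scores):
    """Relative gap between winner and runner-up, in percent (assumed definition)."""
    first, second = sorted(scores, reverse=True)[:2]
    return round(100 * (first - second) / first, 2)


if __name__ == "__main__":
    print(baseline_deltas(BEST, BASELINE))
    print(f"gap={score_gap_pct(SCORES)}%")
```

Under these assumptions the deltas match the "Auto baseline compare" line and the gap matches gap_before; var_before stays n/a because no replay run was made to measure variance.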

Depth profile (not cross-depth comparable):
  - bucket winners: best_at_8k=GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long | best_at_14k=GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long | best_at_28k=GPTQ4_MARLIN · ctx:32768 · kv:k8v8 · long
  - 8k: winner=GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long error=none
  - 14k: winner=GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long error=none
  - 28k: winner=GPTQ4_MARLIN · ctx:32768 · kv:k8v8 · long error=none
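
The bucket winners are consistent with a simple eligibility rule: a config can serve a depth only if its ctx window holds at least that many tokens, and the highest-scoring eligible config wins. This sketch applies that rule to the overall Score column. It is an assumption: "not cross-depth comparable" suggests the runner ranks on measurements taken at each depth rather than on the overall Score, and the bucket sizes (1k = 1024 tokens) are also assumed. `CONFIGS`, `BUCKETS`, `ctx_of` and `bucket_winners` are hypothetical names.

```python
import re

# (config, overall Score) copied from the table above.
CONFIGS = [
    ("GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long", 85),
    ("AWQ4_MARLIN · ctx:16384 · kv:k16v16 · long", 83),
    ("GPTQ4_MARLIN · ctx:8192 · kv:k16v16 · default", 83),
    ("GPTQ4_MARLIN · ctx:16384 · kv:k8v8 · default", 82),
    ("INT8_W8A8 · ctx:16384 · kv:k16v16 · long", 80),
    ("AWQ4_MARLIN · ctx:8192 · kv:k16v16 · default", 80),
    ("AWQ4_MARLIN · ctx:16384 · kv:k8v8 · default", 80),
    ("INT8_W8A8 · ctx:16384 · kv:k8v8 · default", 75),
    ("INT8_W8A8 · ctx:8192 · kv:k16v16 · default", 75),
    ("GPTQ4_MARLIN · ctx:32768 · kv:k8v8 · long", 62),
    ("AWQ4_MARLIN · ctx:32768 · kv:k8v8 · long", 61),
    ("INT8_W8A8 · ctx:32768 · kv:k8v8 · long", 59),
    ("FP16_BASELINE · ctx:16384 · kv:k16v16 · long", 56),
    ("FP16_BASELINE · ctx:8192 · kv:k16v16 · default", 53),
    ("FP16_BASELINE · ctx:16384 · kv:k8v8 · default", 51),
    ("FP16_BASELINE · ctx:32768 · kv:k8v8 · long", 49),
]

# Assumed bucket depths in tokens.
BUCKETS = {"8k": 8 * 1024, "14k": 14 * 1024, "28k": 28 * 1024}


def ctx_of(config):
    """Parse the ctx:N field from a config label."""
    return int(re.search(r"ctx:(\d+)", config).group(1))


def bucket_winners(configs, buckets):
    winners = {}
    for label, depth in buckets.items():
        # Only configs whose context window can hold this depth are eligible.
        eligible = [(name, score) for name, score in configs if ctx_of(name) >= depth]
        # max() keeps the first entry on ties, i.e. the row listed higher in the table.
        winners[label] = max(eligible, key=lambda c: c[1])[0] if eligible else None
    return winners


if __name__ == "__main__":
    for label, winner in bucket_winners(CONFIGS, BUCKETS).items():
        print(f"{label}: winner={winner}")
```

Run against the 16 rows, it returns the same three winners listed above: only the ctx:32768 configs are eligible at 28k, which is why the winner changes there.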

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
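
For reference, perplexity is the exponential of the mean negative log-likelihood per token, so PPL 1.72 corresponds to a geometric-mean per-token probability of about 1/1.72 ≈ 0.58. This minimal sketch assumes natural-log token log-probabilities are already available; the evaluation text, tokenization and window length the runner uses are not stated in this report. `perplexity` is a hypothetical helper.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood), using natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one scored token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # Every token predicted with probability 0.5 gives a perplexity of about 2.
    print(perplexity([math.log(0.5)] * 4))
```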
