sigilant-runner · Phi-3.5-mini-instruct · A10G · vllm · 16 configs

Config | TPS (tok/s) | TPS p95 | TTFT (ms) | TTFT p95 | ITL (ms) | PPL | TPS% | TTFT% | PPL% | Score
AWQ4_MARLIN · ctx:16384 · kv:k16v16 · long  <- best | 72.9 | n/a | 1205.1 | n/a | 13.77 | 3.03 | 99.7 | 41.5 | 93.7 | 86
GPTQ4_MARLIN · ctx:16384 · kv:k16v16 · long | 73.0 | n/a | 1199.2 | n/a | 13.76 | 3.09 | 99.9 | 41.7 | 91.9 | 85
AWQ4_MARLIN · ctx:16384 · kv:k8v8 · default | 72.9 | n/a | 1872.1 | n/a | 13.78 | 3.03 | 99.7 | 26.7 | 93.7 | 83
AWQ4_MARLIN · ctx:8192 · kv:k16v16 · default | 72.8 | n/a | 1882.7 | n/a | 13.78 | 3.03 | 99.6 | 26.6 | 93.7 | 83
GPTQ4_MARLIN · ctx:8192 · kv:k16v16 · default | 72.9 | n/a | 1893.8 | n/a | 13.77 | 3.09 | 99.7 | 26.4 | 91.9 | 82
GPTQ4_MARLIN · ctx:16384 · kv:k8v8 · default | 73.1 | n/a | 1922.3 | n/a | 13.74 | 3.09 | 100.0 | 26.0 | 91.9 | 82
INT8_W8A8 · ctx:16384 · kv:k16v16 · long | 53.0 | n/a | 893.4 | n/a | 18.95 | 2.88 | 72.5 | 56.0 | 98.6 | 80
INT8_W8A8 · ctx:8192 · kv:k16v16 · default | 53.0 | n/a | 1610.6 | n/a | 18.95 | 2.88 | 72.5 | 31.1 | 98.6 | 75
INT8_W8A8 · ctx:16384 · kv:k8v8 · default | 52.9 | n/a | 1630.0 | n/a | 18.96 | 2.88 | 72.4 | 30.7 | 98.6 | 75
FP16_BASELINE · ctx:16384 · kv:k16v16 · long | 39.3 | n/a | 1264.9 | n/a | 25.53 | 3.02 | 53.8 | 39.6 | 94.0 | 67
FP16_BASELINE · ctx:16384 · kv:k8v8 · default | 39.3 | n/a | 1931.2 | n/a | 25.54 | 3.06 | 53.8 | 25.9 | 92.8 | 64
FP16_BASELINE · ctx:8192 · kv:k16v16 · default | 39.9 | n/a | 1938.3 | n/a | 25.27 | 3.04 | 54.6 | 25.8 | 93.4 | 64
INT8_W8A8 · ctx:32768 · kv:k8v8 · long | 3.2 | n/a | 500.6 | n/a | 464.80 | 2.84 | 4.4 | 100.0 | 100.0 | 62
AWQ4_MARLIN · ctx:32768 · kv:k8v8 · long | 35.4 | n/a | 1743.1 | n/a | 28.38 | 3.05 | 48.4 | 28.7 | 93.1 | 62
GPTQ4_MARLIN · ctx:32768 · kv:k8v8 · long | 34.0 | n/a | 1760.9 | n/a | 29.49 | 3.09 | 46.5 | 28.4 | 91.9 | 61
FP16_BASELINE · ctx:32768 · kv:k8v8 · long | 35.4 | n/a | 2514.9 | n/a | 28.33 | 3.08 | 48.4 | 19.9 | 92.2 | 60
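
The report does not say how the TPS%, TTFT%, PPL% and Score columns are derived. The sketch below reproduces them from the raw columns under two assumptions that are inferred from the numbers, not taken from sigilant-runner documentation: each percentage is the metric relative to the best value across all 16 configs (highest TPS, lowest TTFT, lowest PPL), and Score is a weighted sum with weights 0.4 / 0.2 / 0.4. These weights happen to match every row of the table after rounding. The names `normalized` and `score` are hypothetical.

```python
# Sketch only: normalization and weights are inferred from the table,
# not confirmed by the tool's documentation.

# Best observed values across the 16 configs in the table.
BEST_TPS = 73.1    # highest TPS (GPTQ4_MARLIN, ctx:16384, kv:k8v8, default)
BEST_TTFT = 500.6  # lowest TTFT in ms (INT8_W8A8, ctx:32768, kv:k8v8, long)
BEST_PPL = 2.84    # lowest PPL (same INT8_W8A8 row)

# Assumed weights for (TPS%, TTFT%, PPL%).
WEIGHTS = (0.4, 0.2, 0.4)


def normalized(tps, ttft_ms, ppl):
    """Return (TPS%, TTFT%, PPL%), each relative to the best value."""
    return (
        100.0 * tps / BEST_TPS,       # higher is better
        100.0 * BEST_TTFT / ttft_ms,  # lower is better
        100.0 * BEST_PPL / ppl,       # lower is better
    )


def score(tps, ttft_ms, ppl, weights=WEIGHTS):
    """Weighted sum of the three percentages, rounded to an integer."""
    parts = normalized(tps, ttft_ms, ppl)
    return round(sum(w * p for w, p in zip(weights, parts)))


# A few rows from the table: (TPS, TTFT ms, PPL).
rows = {
    "AWQ4_MARLIN ctx:16384 k16v16 long": (72.9, 1205.1, 3.03),
    "INT8_W8A8 ctx:16384 k16v16 long": (53.0, 893.4, 2.88),
    "FP16_BASELINE ctx:16384 k16v16 long": (39.3, 1264.9, 3.02),
    "INT8_W8A8 ctx:32768 k8v8 long": (3.2, 500.6, 2.84),
}

for name, metrics in rows.items():
    pcts = ", ".join(f"{p:.1f}" for p in normalized(*metrics))
    print(f"{name}: [{pcts}] score={score(*metrics)}")
```

Under these assumptions the INT8_W8A8 ctx:32768 row reaches a score of 62 despite 3.2 tok/s, because it holds the best TTFT and PPL and those two terms carry 60% of the weight.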

Best config: AWQ4_MARLIN · ctx:16384 · kv:k16v16 · long
Auto baseline compare: score Δ=19.00 TPS Δ=33.60 TTFT Δ=-59.8ms PPL Δ=0.01
Confidence: target=medium gap_before=1.16% var_before=n/a replay=False(disabled) gap_after=1.16%
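
The baseline deltas and the confidence gap can be checked by hand against the table. The sketch below assumes the auto-selected baseline is the FP16_BASELINE row with the same ctx, kv and profile as the best config, and that the gap is the score difference between the best config and the runner-up, relative to the best score. Both assumptions are inferred from the matching numbers; the report does not state either.

```python
# Sketch only: baseline choice and gap formula are inferred, not documented.

best = {"score": 86, "tps": 72.9, "ttft_ms": 1205.1, "ppl": 3.03}      # AWQ4_MARLIN, ctx:16384, k16v16, long
baseline = {"score": 67, "tps": 39.3, "ttft_ms": 1264.9, "ppl": 3.02}  # FP16_BASELINE, ctx:16384, k16v16, long
runner_up_score = 85                                                   # GPTQ4_MARLIN, ctx:16384, k16v16, long

# Delta = best minus baseline for each metric (negative TTFT delta means faster).
deltas = {key: round(best[key] - baseline[key], 2) for key in best}

# Relative score gap between the best config and the runner-up, in percent.
gap_pct = round(100.0 * (best["score"] - runner_up_score) / best["score"], 2)

print(deltas)
print(f"gap={gap_pct}%")
```

A gap of roughly 1% between AWQ4_MARLIN and GPTQ4_MARLIN at the same settings means the ranking between those two is close, which is consistent with the reported target=medium.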

Depth profile (not cross-depth comparable):
  - bucket winners: best_at_8k=AWQ4_MARLIN · ctx:16384 · kv:k16v16 · long | best_at_14k=AWQ4_MARLIN · ctx:16384 · kv:k16v16 · long | best_at_28k=INT8_W8A8 · ctx:32768 · kv:k8v8 · long
  - 8k: winner=AWQ4_MARLIN · ctx:16384 · kv:k16v16 · long error=none
  - 14k: winner=AWQ4_MARLIN · ctx:16384 · kv:k16v16 · long error=none
  - 28k: winner=INT8_W8A8 · ctx:32768 · kv:k8v8 · long error=none

PPL is a quality proxy, not production validation.
Full production safety and long-context certification require Sigilant Optimizer.
