What it's for
  • Governed Q&A over an enterprise or personal corpus with exact source-id citations
  • Refusal-gated handling of out-of-corpus and private-state questions
  • The local citation lane in a governed routing stack that escalates to a frontier model with receipts
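A minimal sketch of how the three uses above could fit together in one routing step. Everything here is an assumption for illustration: the `LocalResult` shape, the refusal labels, the `route` function, the receipt fields, and the rule that only out-of-corpus refusals escalate are hypothetical and do not come from the product itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LocalResult:
    """Hypothetical output of the local citation lane."""
    answer: Optional[str]
    source_ids: list = field(default_factory=list)
    refusal: Optional[str] = None  # assumed labels: "out_of_corpus", "private_state"


def route(question: str,
          local: Callable[[str], LocalResult],
          frontier: Callable[[str], str]) -> dict:
    """Answer locally with citations, refuse, or escalate with a receipt."""
    result = local(question)
    if result.refusal == "private_state":
        # Assumption: private-state questions are refused and never escalated.
        return {"lane": "local", "answer": None, "refusal": "private_state"}
    if result.refusal == "out_of_corpus":
        # Assumption: the receipt records why the question left the local lane.
        receipt = {"question": question, "reason": "out_of_corpus"}
        return {"lane": "frontier", "answer": frontier(question), "receipt": receipt}
    if not result.source_ids:
        # An answer without source ids is treated as a refusal, not returned.
        return {"lane": "local", "answer": None, "refusal": "uncited"}
    return {"lane": "local", "answer": result.answer, "source_ids": result.source_ids}


# Stub lanes standing in for real models.
def stub_local(q: str) -> LocalResult:
    if "password" in q:
        return LocalResult(None, refusal="private_state")
    if "weather" in q:
        return LocalResult(None, refusal="out_of_corpus")
    return LocalResult("Retention is 90 days.", ["doc-17"])


def stub_frontier(q: str) -> str:
    return "frontier answer"


cited = route("What is the retention policy?", stub_local, stub_frontier)
escalated = route("What is the weather today?", stub_local, stub_frontier)
refused = route("What is my password?", stub_local, stub_frontier)
```

In this sketch the frontier lane is reached only through an explicit refusal, so every escalated answer carries a receipt naming the reason.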

Audience: Operators who want a corpus-grounded local advisor, not a hosted assistant, whose citation and refusal behavior is bench-proven (frozen OOD curveballs, strict scoring).

Quant economics: quality × speed per build
advisor = curveball-v0.2, frozen OOD bench (n=21, scored==strict; refusals 9/9, 0 private-state risk)

Variant                tok/s   advisor
Q4_K_M (sweet spot)     70.0      0.86
Q8_0                    42.0      0.86
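
The "sweet spot" label can be read as: among builds that tie on the advisor score, prefer the fastest. A small sketch of that reading, using only the figures listed above; the `Variant` class, the `sweet_spot` helper, and its `tolerance` parameter are hypothetical names introduced for illustration, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    tok_per_s: float
    advisor: float


# Figures copied from the listing above.
VARIANTS = [
    Variant("Q4_K_M", 70.0, 0.86),
    Variant("Q8_0", 42.0, 0.86),
]


def sweet_spot(variants, tolerance=0.0):
    """Return the fastest variant whose advisor score is within
    `tolerance` of the best advisor score."""
    best = max(v.advisor for v in variants)
    eligible = [v for v in variants if best - v.advisor <= tolerance]
    return max(eligible, key=lambda v: v.tok_per_s)


pick = sweet_spot(VARIANTS)
# Speed ratio of the pick over the slowest listed build.
speedup = pick.tok_per_s / min(v.tok_per_s for v in VARIANTS)
print(pick.name, round(speedup, 2))  # prints "Q4_K_M 1.67"
```

Both builds score 0.86 on the bench, so the choice comes down to throughput: 70.0 / 42.0 is roughly 1.67 times faster for Q4_K_M.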