
What it's for
  • Offline security-domain chat and concept Q&A on consumer hardware
  • A study aid for security certifications and terminology
  • Picking a quant variant by workload shape, not just RAM budget

Audience: Local-LLM power users and security learners who want an offline cybersecurity chat model on a consumer GPU. It is meant for study and exploration, not operational security decisions.

Quant economics · quality × speed per build

Variant               Perplexity   tok/s   CyberMetric (n=50, mcq_letter)
Q4_K_M (sweet spot)   7.400        47.7    0.40
Q5_K_M                7.314        40.0    0.38
Q6_K                  7.313        35.0    0.36
Q8_0                  7.307        30.3    0.36
F16                   7.301        17.4    0.34

Perplexity lower = better; tok/s measured on the DGX Spark (GB10, 128 GB unified).
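
To make the quality-versus-speed trade-off concrete, the short sketch below re-expresses each build relative to F16: the percentage increase in perplexity and the throughput multiple. The figures are copied from the table; the choice of F16 as the reference point and the helper name `relative_to_f16` are assumptions made for illustration, not something the page defines.

```python
# (perplexity, tok/s) per build, copied from the table above.
builds = {
    "Q4_K_M": (7.400, 47.7),
    "Q5_K_M": (7.314, 40.0),
    "Q6_K": (7.313, 35.0),
    "Q8_0": (7.307, 30.3),
    "F16": (7.301, 17.4),
}


def relative_to_f16(table):
    """Return {variant: (perplexity increase in %, throughput multiple)} vs F16.

    Hypothetical helper: using F16 as the baseline is an illustrative choice.
    """
    base_ppl, base_tps = table["F16"]
    result = {}
    for name, (ppl, tps) in table.items():
        ppl_increase_pct = (ppl / base_ppl - 1.0) * 100.0
        speedup = tps / base_tps
        result[name] = (ppl_increase_pct, speedup)
    return result


rows = relative_to_f16(builds)
for name, (ppl_pct, speedup) in rows.items():
    print(f"{name:8s} perplexity +{ppl_pct:.2f}%   throughput x{speedup:.2f}")
```

Read this way, the table's "sweet spot" label for Q4_K_M corresponds to the largest throughput multiple paired with a perplexity increase of under 2% relative to F16.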

[Figure: Efficiency curve · quality index × tok/s]
Known drift bounded · honest
  • CyberMetric accuracy is modest (4-choice MCQ, n=50, mcq_letter). Scores land at 34–40%: above the 25% random baseline for 4-choice MCQ but still modest, and the 50-question sample makes the variant ordering statistically loose. This reflects a 7B ceiling, not a quant failure.
  • Not a security tool or advisory source. This is a 7B chat model inherited from the upstream base, intended for study and concept Q&A, not vulnerability assessment, incident response, or operational decisions. No security-grade validation is claimed.
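
One way to see why a 50-question sample leaves the variant ordering loose is to put a confidence interval around each score. The sketch below uses the Wilson score interval at a 95% level; that choice of interval, and the function name `wilson_interval`, are assumptions for illustration and not a method the page states it used.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    p_hat: observed accuracy (e.g. 0.40), n: number of questions,
    z: normal quantile (1.96 is the usual value for a 95% interval).
    """
    z2 = z * z
    denom = 1.0 + z2 / n
    centre = (p_hat + z2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z2 / (4 * n * n)) / denom
    return centre - half, centre + half


# CyberMetric scores from the table (n=50 questions each).
scores = {"Q4_K_M": 0.40, "Q5_K_M": 0.38, "Q6_K": 0.36, "Q8_0": 0.36, "F16": 0.34}

intervals = {name: wilson_interval(acc, 50) for name, acc in scores.items()}
for name, (lo, hi) in intervals.items():
    print(f"{name:8s} {scores[name]:.2f}  95% CI [{lo:.3f}, {hi:.3f}]")
```

Under these assumptions the intervals for the highest and lowest scores overlap substantially, which is consistent with treating the ordering between variants as noise at this sample size.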