GPU Util · % utilisation
GPU Temp · °C (die)
Unified memory · GB used of 128 · 8 GB guard
Throughput · tok/s
TTFT · ms to first token
Throughput and time to first token are measured on the active lane.
Active lane · idle (no warm brain)
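The two lane metrics can be derived from token timing. Below is a minimal sketch, assuming the lane exposes a request start time and one wall-clock timestamp per generated token (the `lane_metrics` helper and `LaneMetrics` type are hypothetical, not part of any tool listed here). It uses one common convention: TTFT is the gap from request start to the first token, and throughput is decode throughput, meaning tokens after the first divided by the time spent producing them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LaneMetrics:
    ttft_ms: float       # time to first token, in milliseconds
    tokens_per_s: float  # decode throughput measured after the first token


def lane_metrics(request_start: float, token_times: list[float]) -> LaneMetrics:
    """Compute TTFT and decode throughput from timestamps given in seconds.

    Hypothetical helper: assumes exactly one timestamp per generated token.
    """
    if not token_times:
        raise ValueError("no tokens were generated")
    first = token_times[0]
    ttft_ms = (first - request_start) * 1000.0
    decode_window = token_times[-1] - first
    # With a single token there is no decode window, so report 0 tok/s.
    if decode_window > 0:
        tokens_per_s = (len(token_times) - 1) / decode_window
    else:
        tokens_per_s = 0.0
    return LaneMetrics(ttft_ms=ttft_ms, tokens_per_s=tokens_per_s)


# Example: first token after 0.25 s, then 70 more tokens over 1.0 s.
times = [0.25 + i / 70 for i in range(71)]
print(lane_metrics(0.0, times))
```

Excluding the first token from the throughput figure keeps prompt processing (which TTFT already captures) from dragging down the steady-state generation rate.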
Models 8 · quant · lora · adapter
Benches 3 · eval datasets
Notebooks 5 · runnable on-ramps
Tooling 6 · harnesses · skills
Free tier 22 · run offline · yours
Kind License
quant advisor free

A governed 4B advisor over your corpus — exact source-id citations, trusted refusals, local on a DGX Spark

base nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16

recommended Q4_K_M · 70 tok/s
quant astro free

A numeric astrodynamics reasoner: one verifiable boxed number out, served locally on a DGX Spark at $0 per query

base Qwen/Qwen3-8B

recommended Q8_0 · 21 tok/s
lora patent free

Offline patent-prosecution reasoning on Spark-class hardware

base deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

nemo
recommended BF16
quant patent free

Offline patent-prosecution reasoning on Spark-class hardware

base deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

nemo
recommended Q5_K_M · 35 tok/s
quant medical free

An 8B medical-reasoning model with a visible think-chain, quantized for offline clinical Q&A

base Intelligent-Internet/II-Medical-8B

recommended Q5_K_M · 36 tok/s
quant cyber free

A 7B cybersecurity chat model, quantized to run offline on a consumer GPU

base ZySec-AI/SecurityLLM

recommended Q4_K_M · 48 tok/s
quant legal free

A 7B legal-domain chat model, quantized to run offline on a consumer GPU

base Equall/Saul-7B-Instruct-v1

recommended Q5_K_M · 20 tok/s
quant finance free

A finance-specialized 7B chat model, quantized to run offline on a 4 GB consumer GPU

base AdaptLLM/finance-chat

recommended F16 · 12 tok/s
bench advisor free

The eval set that caught the behaviors prompting alone couldn't hold: frozen out-of-distribution (OOD) curveballs for grounded citation, refusal, and routing

base n/a

0 variants
bench free

hermes-brain-bench-v0.1

base n/a

0 variants
bench patent free

patent-strategist-bench-v0.1

base n/a

0 variants
notebook finance free

Build the finance-chat quant — and call the model — on a Spark or a free cloud GPU

base AdaptLLM/finance-chat

recommended builder
notebook legal free

Build the Saul-7B quant — and call the legal model — on a Spark or a free cloud GPU

base Equall/Saul-7B-Instruct-v1

recommended builder
notebook cyber free

Build the SecurityLLM quant — and call the model — on a Spark or a free cloud GPU

base ZySec-AI/SecurityLLM

recommended builder
notebook medical free

Build the II-Medical-8B quant — and call the reasoner — on a Spark or a free cloud GPU

base Intelligent-Internet/II-Medical-8B

recommended builder
notebook patent free

Run the patent-strategist build — and use the model — on a Spark or a free cloud GPU

base deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

nemo
recommended builder
harness advisor free

A local memory layer that gates its own recall

base fieldkit.memory · pgvector(vectors/blog_chunks) · NIM llama-nemotron-embed-1b-v2

recommended cosine-only · top_k=5 · GB10 measured baseline
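The `cosine-only · top_k=5` setting points to plain cosine-similarity ranking over stored chunk embeddings. Below is a minimal in-memory sketch of cosine top-k recall with a gate, assuming a hypothetical `min_score` threshold as the gating rule. The card does not say how fieldkit.memory actually gates recall, and `gated_recall` is not its API; in a pgvector-backed store the ranking itself would normally be done in SQL with the `<=>` cosine-distance operator.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def gated_recall(
    query: list[float],
    chunks: list[tuple[str, list[float]]],
    top_k: int = 5,
    min_score: float = 0.3,
) -> list[tuple[str, float]]:
    """Rank chunks by cosine similarity, keep the best top_k, then drop weak hits.

    chunks holds (chunk_id, embedding) pairs. min_score is a hypothetical gate:
    if no chunk clears it, the result is empty, so the caller can refuse
    rather than answer from irrelevant context.
    """
    scored = [(cid, cosine(query, emb)) for cid, emb in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(cid, s) for cid, s in scored[:top_k] if s >= min_score]


chunks = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0]), ("d", [-1.0, 0.0])]
print(gated_recall([1.0, 0.0], chunks, top_k=2))
```

Returning an empty list instead of the nearest-but-irrelevant chunks is what lets a downstream advisor turn "nothing relevant in memory" into a refusal.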
arena run astro free

An operator cockpit you run on your own DGX Spark

base fieldkit[arena] · Astro + FastAPI sidecar

0 variants
harness free

Which local lane should drive your always-on Spark agent?

base Hermes Agent v0.14.0

recommended llama.cpp · Qwen3-30B-A3B (MoE, Q4_K_M) · 88 tok/s
harness free

When does local stop being enough? Measure first, then route.

base Hermes Agent v0.14.0

recommended Local Spark · Qwen3-30B-A3B MoE Q4_K_M
skill free

The skills you write for Claude Code load into Hermes unchanged.

base agentskills.io SKILL.md (Hermes / Claude Code compatible)

recommended spark-serve
harness free

One always-on brain, five specialists, zero LLM-classifier overhead.

base Hermes Agent v0.14.0

recommended Default brain (MoE)