{% extends "_base.html" %} {% block content %}
Pick a local OptIQ quant and (optionally) MTP speculation, click Apply, and the Lab swaps the running model without restarting. Switching takes ~5-30 seconds depending on model size.
| Model | |
| API port | {{ api_port }} |
| Server status | |
| Spec decoding |
drafter ·
off
|
| Prompt cache budget |
|
| Sampler |
(from model defaults)
|
~/.cache/huggingface/ and downloaded on first use.
-assistant drafter alongside the host model, γ=1 greedy.
Leave blank to skip, or type any other HF id to override.
See the MTP guide.
{{ default_pc_gb }} GB (15 % of system RAM).
generation_config.json apply for blank fields. Use temp=0 for greedy / deterministic output (gives the largest MTP speedup).
models/, paste a path
from elsewhere, or add multiple. One adapter routes through mlx-lm's
classic --adapter-path boot; two or more activate OptIQ's
mounted-LoRA mode where the base model stays loaded and clients pick
an adapter per request via the adapters field in the body
(adapter name = the directory's basename). Switching is instant —
one base in RAM, ~30 MB per extra adapter, gated by a
ContextVar in the forward pass.
~/.optiq/lab/models (click to add):
{"adapters": "<name>"} in the chat-completions body.
The Chat surface picks one via the dropdown above the message input.