{% extends "_base.html" %} {% block content %}
Pick a local OptIQ quant and, optionally, MTP speculation, then click Apply. The Lab swaps the running model without restarting. Switching takes roughly 5 to 30 seconds, depending on model size.
| Model | |
| API port | {{ api_port }} |
| Server status | |
| Spec decoding | drafter · off |
| Prompt cache budget | |
| Sampler | (from model defaults) |
~/.cache/huggingface/ and downloaded on first use.
-assistant drafter alongside the host model, γ=1 greedy.
Leave blank to skip, or type any other HF id to override.
See the MTP guide.
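As a rough illustration of what γ=1 greedy speculation means, the sketch below drafts one token per step and keeps it only if the host model's greedy choice agrees. It is a toy under stated assumptions, not the Lab's implementation: `toy_target`, `toy_draft`, and both decode functions are hypothetical names, and a real server would verify the draft and produce the bonus token in a single host forward pass rather than two calls.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a model maps a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n: int) -> List[int]:
    """Plain greedy decoding: one host-model call per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode_gamma1(
    target: NextToken, draft: NextToken, prompt: List[int], n: int
) -> Tuple[List[int], int]:
    """Greedy speculation with one drafted token per step (gamma = 1).

    Returns the generated tokens and how many drafted tokens were accepted.
    """
    out = list(prompt)
    accepted = 0
    while len(out) - len(prompt) < n:
        guess = draft(out)      # drafter proposes one token
        verified = target(out)  # host model's greedy choice at the same position
        out.append(verified)    # always keep the host model's token
        if guess == verified and len(out) - len(prompt) < n:
            # Draft accepted: the same host pass also yields the next token,
            # so this step emits two tokens instead of one.
            accepted += 1
            out.append(target(out))
    return out[len(prompt):], accepted


# Toy stand-ins for the host model and drafter (assumptions for illustration).
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] * 3 + 1) % 7


def toy_draft(ctx: List[int]) -> int:
    # Agrees with the host model only when the last token is even.
    return toy_target(ctx) if ctx[-1] % 2 == 0 else 0


tokens, accepted = speculative_decode_gamma1(toy_target, toy_draft, [1], 6)
print(tokens, accepted)
```

Because every emitted token is the host model's own greedy choice, the output matches plain greedy decoding; the drafter only changes how many host passes are needed.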
{{ default_pc_gb }} GB (15 % of system RAM).
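For reference, the 15 % default can be reproduced with simple arithmetic. This is a minimal sketch, assuming GB means 2^30 bytes and that the value is rounded to one decimal place; `default_prompt_cache_gb` is a hypothetical helper, not a function from the Lab.

```python
def default_prompt_cache_gb(total_ram_bytes: int, fraction: float = 0.15) -> float:
    """Return `fraction` of system RAM in GB (assumed: 1 GB = 2**30 bytes)."""
    gib = 1024 ** 3
    return round(total_ram_bytes * fraction / gib, 1)


# Example: a machine with 64 GB of RAM.
print(default_prompt_cache_gb(64 * 1024 ** 3))  # 9.6
```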
generation_config.json apply for blank fields. Use temp=0 for greedy (deterministic) output, which gives the largest MTP speedup.
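To see why temp=0 is deterministic, the sketch below treats a temperature of zero as a plain argmax and any positive temperature as softmax sampling. It is an illustration under assumptions, not the Lab's sampler: `sample_token` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Optional


def sample_token(logits: List[float], temp: float,
                 rng: Optional[random.Random] = None) -> int:
    """Pick a token index from raw logits at the given temperature."""
    if temp <= 0.0:
        # Greedy: always the highest-scoring token, so repeated runs agree.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [x / temp for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.0, -1.0]
print(sample_token(logits, 0.0))  # 1
```

With greedy output the drafter's guess either matches the host model's single best token or it does not, which is the case speculation handles most cheaply.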