{{ t('modal.model_settings.section_label') }}

{{ t('modal.model_settings.profiles.section_label') }}

No presets available.

{{ t('modal.model_settings.basic_label') }}

{{ t('modal.model_settings.empty_hint') }}

{{ t('modal.model_settings.advanced_label') }}

{{ t('modal.model_settings.enable_thinking') }}

{{ t('modal.model_settings.enable_thinking_hint') }}

{{ t('modal.model_settings.thinking_budget') }}

{{ t('modal.model_settings.thinking_budget_hint') }}

{{ t('modal.model_settings.limit_tool_result') }}

{{ t('modal.model_settings.limit_tool_result_hint') }}

{{ t('modal.model_settings.force_sampling') }}

{{ t('modal.model_settings.force_sampling_hint') }}

{{ t('modal.model_settings.trust_remote_code') }}

{{ t('modal.model_settings.trust_remote_code_hint') }}

{{ t('modal.model_settings.chat_template_kwargs') }}

{{ t('modal.model_settings.chat_template_kwargs_hint') }}

{{ t('modal.model_settings.no_kwargs') }}

{{ t('modal.model_settings.experimental_label') }}

{{ t('modal.model_settings.turboquant_kv') }}

{{ t('modal.model_settings.turboquant_kv_hint') }}

SpecPrefill

Attention-based sparse prefill for MoE/hybrid models. (Paper) (HuggingFace)

Small draft model that shares the target model's tokenizer (e.g. Qwen3.5-0.8B for a 35B target)
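The draft model must share the target's tokenizer so that its per-token importance scores line up with the target's token positions. Below is a minimal sketch of a compatibility check. It assumes both vocabularies are available as token-to-id mappings. `tokenizers_compatible` is a hypothetical helper, not part of this app.

```python
from typing import Mapping


def tokenizers_compatible(
    draft_vocab: Mapping[str, int], target_vocab: Mapping[str, int]
) -> bool:
    """Return True if every token maps to the same id in both vocabularies."""
    # Identical token->id maps mean a prompt tokenizes to the same positions
    # for both models, so draft-side scores can index target-side tokens.
    return dict(draft_vocab) == dict(target_vocab)
```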

Minimum prompt length in tokens to trigger SpecPrefill (shorter prompts use full prefill)
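The threshold and the attention-based selection can be sketched as follows. The sketch assumes per-token importance scores have already been computed from the draft model's attention. `select_prefill_tokens` and `keep_ratio` are hypothetical names, and the actual SpecPrefill selection rule may differ.

```python
from typing import List, Sequence


def select_prefill_tokens(
    token_ids: Sequence[int],
    scores: Sequence[float],
    min_tokens: int,
    keep_ratio: float = 0.3,  # hypothetical fraction of tokens to keep
) -> List[int]:
    """Return the tokens to prefill, keeping the highest-scoring ones."""
    if len(token_ids) != len(scores):
        raise ValueError("need one importance score per token")
    # Prompts below the threshold fall back to full prefill.
    if len(token_ids) < min_tokens:
        return list(token_ids)
    k = max(1, int(len(token_ids) * keep_ratio))
    # Indices of the k most important tokens by draft-model score.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Restore original order so the kept tokens remain a valid subsequence.
    return [token_ids[i] for i in sorted(top)]
```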

DFlash

Block diffusion speculative decoding for 3-4x faster generation. Supports Qwen (3, 3.5, 3.6) and Gemma4 model families. Requires a DFlash draft model checkpoint.
Single-stream only: requests run one at a time.
* MLX implementation by bstnxbt (GitHub)
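Speculative decoding works by having a draft propose a block of tokens, which the target then verifies in one forward pass. The sketch below shows generic greedy verification only. It does not model DFlash's block-diffusion drafting, and `accept_draft` is a hypothetical helper.

```python
from typing import List, Sequence


def accept_draft(draft: Sequence[int], target_argmax: Sequence[int]) -> List[int]:
    """Greedy verification of a drafted block.

    target_argmax[i] is the target's greedy token at position i given the
    prefix plus draft[:i]; it has one extra entry for the position after
    the whole block.
    """
    if len(target_argmax) != len(draft) + 1:
        raise ValueError("target_argmax must have len(draft) + 1 entries")
    accepted: List[int] = []
    for d, t in zip(draft, target_argmax):
        if d != t:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(t)
            return accepted
        accepted.append(d)
    # Every drafted token matched, so the target's next token is a free bonus.
    accepted.append(target_argmax[len(draft)])
    return accepted
```

Under greedy decoding, this acceptance rule reproduces the target model's own output. The speedup comes from accepting several tokens per target pass.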

DFlash draft checkpoint (e.g. z-lab/Qwen3-4B-DFlash-b16, z-lab/gemma-4-26B-A4B-it-DFlash). Note: use a checkpoint with the -DFlash suffix; -assistant variants are for MTP.
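The naming rule in the hint can be expressed as a simple check. This is a heuristic based only on the suffix convention described above. `is_dflash_checkpoint` is hypothetical and does not inspect the checkpoint contents.

```python
def is_dflash_checkpoint(repo_id: str) -> bool:
    """Heuristic: does the repo name follow the -DFlash naming convention?"""
    # Take the repository name after the last "/" (e.g. "Qwen3-4B-DFlash-b16").
    name = repo_id.rstrip("/").rsplit("/", 1)[-1]
    # -assistant checkpoints are MTP drafts, not DFlash drafts.
    if name.endswith("-assistant"):
        return False
    # Accept "-DFlash" at the end, or followed by a variant tag such as "-b16".
    return name.endswith("-DFlash") or "-DFlash-" in name
```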

Quantization

Enable quantization for the draft model (weight bits, activation bits, and group size).
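To illustrate what "bits" and "group size" control, here is a pure-Python sketch of group-wise affine weight quantization. Each group of `group_size` consecutive weights gets its own scale and offset, and values are stored as `bits`-bit integers. This is a generic illustration, not this app's quantizer. It covers weights only, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Group = Tuple[List[int], float, float]  # (integer codes, scale, offset)


def quantize_group(values: Sequence[float], bits: int) -> Group:
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1  # e.g. 15 distinct steps for 4 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Map each value to the nearest level, clamped to the valid code range.
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def quantize(weights: Sequence[float], bits: int = 4, group_size: int = 64) -> List[Group]:
    # Smaller groups track local ranges more closely but store more scales.
    return [
        quantize_group(weights[i : i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


def dequantize(groups: Sequence[Group]) -> List[float]:
    out: List[float] = []
    for codes, scale, lo in groups:
        out.extend(lo + c * scale for c in codes)
    return out
```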

{{ t('modal.model_settings.dflash_max_ctx_help') }}

{{ t('modal.model_settings.dflash_max_concurrent_help') }}

{{ t('modal.model_settings.dflash_l1_cache') }}

{{ t('modal.model_settings.dflash_l1_cache_hint') }}

{{ t('modal.model_settings.dflash_l1_max_entries_help') }}

{{ t('modal.model_settings.dflash_l1_max_gib_help') }}

{{ t('modal.model_settings.dflash_l2_cache') }}

{{ t('modal.model_settings.dflash_l2_cache_hint') | safe }}

{{ t('modal.model_settings.dflash_l2_unavailable') | safe }}

{{ t('modal.model_settings.dflash_l2_requires_l1') }}

{{ t('modal.model_settings.mtp') }}

{{ t('modal.model_settings.mtp_hint') | safe }}

{{ t('modal.model_settings.mtp_conflict') }}

{{ t('modal.model_settings.vlm_mtp') }}

{{ t('modal.model_settings.vlm_mtp_hint') | safe }}

{{ t('modal.model_settings.vlm_mtp_conflict') }}