{% extends "_base.html" %} {% block content %}
Workflow · fine-tune

Fine-tune a LoRA

Apply a LoRA adapter on top of any OptiQ quant. Runs MLX-natively; takes advantage of OptiQ's sensitivity data via rank-scaling.

{# Step 1: pick base #}

Base model

{# Step 2: dataset #}

Dataset

Path to a directory containing train.jsonl (and optionally valid.jsonl). mlx-lm accepts the standard shapes: {"text": ...}, {"prompt": ..., "completion": ...}, or {"messages": [...]}.

{# Step 3: hyperparams #}

Hyperparameters

Vision LoRA. The frozen vision tower encodes each image into soft tokens; LoRA trains only the language tower with gradient checkpointing on. Build the dataset with the VLM image+text template so every image is letterboxed to one uniform canvas. Uniform image shape is what keeps training memory bounded on Apple Silicon. Scale defaults to 8 (the Qwen3.5/3.6 hybrid family collapses at 20).
DPO reuses the adapted model with adapter scale temporarily zeroed for the reference forward pass, so there is no second model load.
DPO defaults differ from SFT. Learning rate 5e-5 (about 4x lower than SFT), 10% warmup, cosine decay. Without warmup and the lower LR, the first preference-loss steps blow out the reward margin and training collapses to loss=0 with both rewards drifting to -hundreds. Bump LR cautiously. Also confirm that chosen and rejected are both valid completions of the same prompt, otherwise DPO has no signal to learn from.
{# Step 4: live training #}

Training

{# Loss chart — minimal SVG sparkline #}

  
{# Step 5: combine + export + push #}

Done

LoRA adapter saved to:



    

Combine with another adapter

Common case: this is a DPO adapter trained with --mount-adapter on top of an SFT. Merge them here so the final artifact is a single drop-in adapter (rank-concat, mathematically exact). Skip this section to ship the adapter alone.

Merged → ( layers rank-concat, in only one source)

Bundle as a self-contained model

Optional: copy the base model files + this adapter into one directory ready for optiq serve --model <dir> or stock mlx_lm.generate. Larger payload but drop-in usable without any adapter flags.

Exported →

Push to Hugging Face

Pushes the exported model directorythe merged adapterthe trained adapter.

Train another
{% endblock %}