{% extends "_base.html" %} {% block content %}
Workflow · fine-tune

Fine-tune a LoRA

Apply a LoRA adapter on top of any OptIQ quant. Runs MLX-natively; takes advantage of OptIQ's sensitivity data via rank-scaling.

{# Step 1: pick base #}

Base model

{# Step 2: dataset #}

Dataset

Path to a directory containing train.jsonl (and optionally valid.jsonl). mlx-lm accepts the standard shapes: {"text": ...}, {"prompt": ..., "completion": ...}, or {"messages": [...]}.

{# Step 3: hyperparams #}

Hyperparameters

DPO reuses the adapted model with adapter scale temporarily zeroed for the reference forward pass, so there is no second model load.
DPO defaults differ from SFT. Learning rate 5e-5 (about 4x lower than SFT), 10% warmup, cosine decay. Without warmup and the lower LR, the first preference-loss steps blow out the reward margin and training collapses to loss=0 with both rewards drifting to -hundreds. Bump LR cautiously. Also confirm that chosen and rejected are both valid completions of the same prompt, otherwise DPO has no signal to learn from.
{# Step 4: live training #}

Training

{# Loss chart — minimal SVG sparkline #}

  
{# Step 5: combine + export + push #}

Done

LoRA adapter saved to:



    

Combine with another adapter

Common case: this is a DPO adapter trained with --mount-adapter on top of an SFT. Merge them here so the final artifact is a single drop-in adapter (rank-concat, mathematically exact). Skip this section to ship the adapter alone.

Merged → ( layers rank-concat, in only one source)

Bundle as a self-contained model

Optional: copy the base model files + this adapter into one directory ready for optiq serve --model <dir> or stock mlx_lm.generate. Larger payload but drop-in usable without any adapter flags.

Exported →

Push to Hugging Face

Pushes the exported model directorythe merged adapterthe trained adapter.

Train another
{% endblock %}