{% extends "_base.html" %} {% block content %}
Workflow · fine-tune

Fine-tune a LoRA

Train a LoRA adapter on top of any OptIQ quant. Training runs natively on MLX and uses OptIQ's sensitivity data through rank-scaling.

{# Step 1: pick base #}

Base model

{# Step 2: dataset #}

Dataset

Path to a directory containing train.jsonl (and optionally valid.jsonl). mlx-lm accepts the standard shapes: {"text": ...}, {"prompt": ..., "completion": ...}, or {"messages": [...]}.
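A quick pre-flight check for the three record shapes listed above could look like the sketch below. It uses only the Python standard library and does not call mlx-lm. The helper names `detect_shape` and `check_jsonl`, and the shape labels `completions` and `chat`, are hypothetical and chosen for illustration only.

```python
import json
from pathlib import Path
from typing import Dict, Optional

# Required keys for each record shape named in the dataset hint.
# The labels on the left are illustrative, not mlx-lm identifiers.
SHAPES = {
    "text": {"text"},
    "completions": {"prompt", "completion"},
    "chat": {"messages"},
}


def detect_shape(record: object) -> Optional[str]:
    """Return the label of the first shape whose keys are all present, else None."""
    if not isinstance(record, dict):
        return None
    for name, keys in SHAPES.items():
        if keys <= record.keys():
            return name
    return None


def check_jsonl(path: Path) -> Dict[str, int]:
    """Count records per shape in a JSONL file; raise on an unrecognised line."""
    counts: Dict[str, int] = {}
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            shape = detect_shape(json.loads(line))
            if shape is None:
                raise ValueError(f"{path.name}:{lineno}: unrecognised record shape")
            counts[shape] = counts.get(shape, 0) + 1
    return counts
```

Running `check_jsonl(Path("train.jsonl"))` before starting a run surfaces malformed lines early; a file that mixes shapes shows up as more than one key in the returned counts.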

{# Step 3: hyperparams #}

Hyperparameters

DPO reuses the adapted model with adapter scale temporarily zeroed for the reference forward pass, so there is no second model load.
DPO defaults differ from SFT: learning rate 5e-5 (about 4x lower than SFT), 10% warmup, cosine decay. Without warmup and the lower learning rate, the first preference-loss steps inflate the reward margin and training collapses to loss=0 while both rewards drift into the negative hundreds. Raise the learning rate cautiously. Also confirm that chosen and rejected are both valid completions of the same prompt; otherwise DPO has no signal to learn from.
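The shape of the DPO schedule described above can be sketched in plain Python. This is an illustration only, not the trainer's actual code: it assumes linear warmup from near zero, a decay floor of 0, and a hypothetical function name `dpo_learning_rate`; none of these details are stated in the hint.

```python
import math


def dpo_learning_rate(
    step: int,
    total_steps: int,
    peak_lr: float = 5e-5,          # DPO default quoted above
    warmup_fraction: float = 0.10,  # 10% warmup quoted above
    final_lr: float = 0.0,          # assumption: decay all the way to zero
) -> float:
    """Learning rate at a 0-indexed step: linear warmup, then cosine decay."""
    warmup_steps = max(1, int(total_steps * warmup_fraction))
    if step < warmup_steps:
        # Assumption: warmup is linear and reaches peak_lr on its last step.
        return peak_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (peak_lr - final_lr) * cosine
```

With 1,000 total steps, the first 100 ramp up to 5e-5 and the remaining 900 follow the cosine curve down. Keeping the earliest steps at a small fraction of the peak is what protects the reward margin during the first preference-loss updates.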
{# Step 4: live training #}

Training

{# Loss chart — minimal SVG sparkline #}

  
{# Step 5: push #}

Done

LoRA adapter saved to:



    

Push to Hugging Face

Train another
{% endblock %}