Metadata-Version: 2.4
Name: mlx-ssd
Version: 0.1.0
Summary: Simple Self-Distillation training pipeline for MLX models
Author: mlx-ssd contributors
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: mlx-lm>=0.21.0
Requires-Dist: mlx-tokenizers>=1.0.0
Requires-Dist: datasets>=3.0.0
Requires-Dist: huggingface-hub>=0.24.0

# mlx-ssd

`mlx-ssd` is a practical command-line implementation of simple self-distillation for code-generation models, built on MLX for Apple Silicon.

## Method

This project follows the method introduced in:

> Ruixiang Zhang, Richard He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang.  
> **Embarrassingly Simple Self-Distillation Improves Code Generation**.  
> arXiv:2604.01193, 2026.  
> https://arxiv.org/abs/2604.01193

Implementation by **Amirani Labs**.

Core flow:

1. Sample responses from a base model with train-time decoding settings.
2. Fine-tune on those self-generated samples.
3. Evaluate/run with eval-time decoding settings.
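The train-time and eval-time decoding settings combine temperature scaling with top-k and top-p (nucleus) filtering. The sketch below is a plain-Python illustration of how those three parameters interact, independent of this package's internals; the toy logits and the `sample_token` helper are illustrative only:

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Draw one token id from logits after temperature/top-k/top-p filtering."""
    # Temperature scaling: higher temperature flattens the distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    probs = [math.exp(l - m) for l in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]

    # Rank token ids by probability, most likely first.
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)

    # Top-k: keep only the k most probable tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break

    # Renormalize over the surviving tokens and draw.
    z = sum(p for _, p in kept)
    r = rng.random() * z
    for tok, p in kept:
        r -= p
        if r <= 0:
            return tok
    return kept[-1][0]
```

With the train-time settings used below (`--temperature 1.6 --top-k 20 --top-p 0.8`) sampling is deliberately diverse; the eval-time `--temperature 1.1` tightens the distribution back toward the model's top choices.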

Dataset defaults:

- `--problems microsoft/rStar-Coder`
- `--dataset-config seed_sft`
- `--dataset-split train`
- Records must contain a non-empty `question` field.
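The `question` requirement amounts to a simple record filter. The helper below is an illustrative sketch, not part of the package's API:

```python
def keep_record(record: dict) -> bool:
    """Keep only records whose `question` field is a non-empty string."""
    q = record.get("question")
    return isinstance(q, str) and q.strip() != ""

# Example: filtering raw dataset records before sampling.
records = [
    {"question": "Write a function that reverses a string."},
    {"question": "   "},       # whitespace-only: dropped
    {"answer": "42"},          # missing `question`: dropped
]
valid = [r for r in records if keep_record(r)]
```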

This repository is an independent implementation and is **not** the original paper repository.

## Presets

Presets encode paper-aligned hyperparameters (Table 3 mapping) for supported model families.

```bash
mlx-ssd sample --model mlx-community/Qwen3-4B-Instruct-4bit --preset qwen3-4b-instruct --output ./ssd_data
mlx-ssd train --model mlx-community/Qwen3-4B-Instruct-4bit --preset qwen3-4b-instruct --data ./ssd_data --output ./ssd_model
mlx-ssd run --model ./ssd_model/fused --preset qwen3-4b-instruct --prompt "Write a function that..."
```

## Usage

Install:

```bash
pip install -e .
```

Three-stage flow:

```bash
# 1) Sample
mlx-ssd sample \
  --model mlx-community/Qwen3-4B-Instruct-4bit \
  --problems microsoft/rStar-Coder \
  --dataset-config seed_sft \
  --dataset-split train \
  --output ./ssd_data \
  --batch-size 16 \
  --temperature 1.6 \
  --top-k 20 \
  --top-p 0.8 \
  --limit 10

# 2) Train
mlx-ssd train \
  --model mlx-community/Qwen3-4B-Instruct-4bit \
  --data ./ssd_data \
  --output ./ssd_model \
  --iters 2500

# 3) Run
mlx-ssd run \
  --model ./ssd_model/fused \
  --temperature 1.1 \
  --top-k 20 \
  --top-p 0.8 \
  --prompt "Write a function that..."
```

One-command flow:

```bash
mlx-ssd distill \
  --model mlx-community/Qwen3-4B-Instruct-4bit \
  --preset qwen3-4b-instruct \
  --output ./my-better-qwen
```

Local smoke test (quick validation):

```bash
mlx-ssd sample \
  --model mlx-community/SmolLM2-135M-Instruct \
  --problems microsoft/rStar-Coder \
  --dataset-config seed_sft \
  --dataset-split train \
  --output ./.smoke/data \
  --batch-size 4 \
  --temperature 0.8 \
  --top-k 20 \
  --top-p 0.8 \
  --max-tokens 64 \
  --limit 5
```

## Apple Silicon

There is no separate Apple Silicon port: `mlx-ssd` is built on `mlx-lm` and runs natively on Apple Silicon, targeting local MLX workflows.

## License

MIT
