Metadata-Version: 2.4
Name: mlxsmith
Version: 0.1.9
Summary: Apple Silicon MLX fine-tuning toolkit — SFT, DPO/ORPO, GRPO, distillation, and OpenAI-compatible serving.
Author-email: Shannon Labs <hmbown@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/Hmbown/MLXSmith
Project-URL: Repository, https://github.com/Hmbown/MLXSmith
Project-URL: Issues, https://github.com/Hmbown/MLXSmith/issues
Keywords: mlx,apple-silicon,llm,fine-tuning,lora,openai-compatible
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: MacOS :: MacOS X
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typer>=0.9.0
Requires-Dist: rich>=13.7.0
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: pydantic>=2.5.0
Requires-Dist: pydantic-settings>=2.2.1
Requires-Dist: tomli>=2.0.1; python_version < "3.11"
Requires-Dist: huggingface_hub>=1.3.4
Requires-Dist: jsonschema>=4.21.0
Provides-Extra: mlx
Requires-Dist: mlx>=0.30.4; extra == "mlx"
Provides-Extra: llm
Requires-Dist: mlx-lm>=0.30.5; extra == "llm"
Requires-Dist: transformers>=5.0.0; extra == "llm"
Requires-Dist: datasets>=3.0.0; extra == "llm"
Provides-Extra: serve
Requires-Dist: fastapi>=0.128.0; extra == "serve"
Requires-Dist: uvicorn>=0.40.0; extra == "serve"
Requires-Dist: httpx>=0.28.0; extra == "serve"
Provides-Extra: dev
Requires-Dist: pytest>=9.0.0; extra == "dev"
Requires-Dist: ruff>=0.14.0; extra == "dev"
Provides-Extra: all
Requires-Dist: mlx>=0.30.4; extra == "all"
Requires-Dist: mlx-lm>=0.30.5; extra == "all"
Requires-Dist: transformers>=5.0.0; extra == "all"
Requires-Dist: datasets>=3.0.0; extra == "all"
Requires-Dist: pytest>=9.0.0; extra == "all"
Requires-Dist: fastapi>=0.128.0; extra == "all"
Requires-Dist: uvicorn>=0.40.0; extra == "all"
Requires-Dist: httpx>=0.28.0; extra == "all"
Dynamic: license-file

# MLXSmith

[![PyPI](https://img.shields.io/pypi/v/mlxsmith)](https://pypi.org/project/mlxsmith/)
[![CI](https://github.com/Hmbown/MLXSmith/actions/workflows/ci.yml/badge.svg)](https://github.com/Hmbown/MLXSmith/actions/workflows/ci.yml)
[![License](https://img.shields.io/github/license/Hmbown/MLXSmith)](LICENSE)

Fine-tune language models on Apple Silicon. SFT, preference optimization, reinforcement learning, distillation, and serving — all native to MLX.

**Status:** Alpha (v0.1.9) · Validated on Qwen3-4B and Qwen3-1.7B

---

## Features

- **Supervised fine-tuning** — LoRA and QLoRA with configurable optimizers
- **Preference optimization** — DPO, ORPO, IPO, CPO, SimPO, and more
- **Reinforcement learning** — GRPO with verifier-based rewards
- **Knowledge distillation** — Offline and online preference distillation
- **KTO** — Kahneman-Tversky Optimization from binary feedback
- **Online DPO** — Live preference tuning with LLM judge scoring
- **Self-verification training** — Policy gradient from self-assessed rewards
- **Synthetic data generation** — Generate, evolve, and filter training data
- **External model backends** — Use Codex, Claude, Gemini CLIs or any OpenAI-compatible API for data generation and judging
- **Recursive training** — Self-improving RLM loop with task generation and gating
- **Serving** — OpenAI-compatible API with streaming
- **Web dashboard (Next.js)** — Models, adapters, training, eval, chat, and serving UI
- **Environment plugins** — Reusable task and verifier packages for RL training
- **Experimental mHC adapters** — Optional block-local mHC patching for MLX transformer blocks (not a speedup)

## Requirements

- macOS with Apple Silicon (M1 or later)
- Python 3.10+

Data tools, configuration, and project scaffolding work on any platform.

## Install

```bash
pip install "mlxsmith[all]"
```

<details>
<summary>Selective install</summary>

```bash
# Core only (data tools, config, scaffolding)
pip install mlxsmith

# Apple Silicon training
pip install "mlxsmith[mlx,llm]"

# Training + serving
pip install "mlxsmith[mlx,llm,serve]"
```

</details>

## Quickstart

```bash
# 1. Create a project
mlxsmith init myproj && cd myproj

# 2. Verify your environment
mlxsmith doctor

# 3. Pull a model
mlxsmith pull mlx-community/Qwen3-4B-Instruct-2507-4bit

# 4. Pull training data
mlxsmith data pull --preset alpaca

# 5. Fine-tune
mlxsmith sft \
  --model cache/mlx/mlx-community__Qwen3-4B-Instruct-2507-4bit \
  --data data/sft

# 6. Serve the result
mlxsmith serve --model runs/sft_0001/adapter --port 8080
```

See [Getting Started](docs/getting-started.md) for a complete walkthrough.

## End-to-end Smoke (Qwen3-1.7B)

This repo includes an end-to-end smoke run that validates the full pipeline
(SFT → Pref → RFT → RLM) on `Qwen/Qwen3-1.7B-MLX-4bit`.

```bash
mlxsmith pull Qwen/Qwen3-1.7B-MLX-4bit
./scripts/exp_qwen3_1.7b_mlx_4bit_e2e_smoke.sh

# Optional: also smoke-test `mlxsmith serve` + OpenAI-compatible endpoint
SMOKE_SERVE=1 ./scripts/exp_qwen3_1.7b_mlx_4bit_e2e_smoke.sh
```

The smoke run uses `qwen3_1.7b_mlx_4bit_smoke.yaml` and the tiny datasets in
`data/sft` and `data/prefs`.

## Repo SFT (Qwen3-1.7B)

To build a small “MLXSmith repo assistant” adapter on top of `Qwen/Qwen3-1.7B-MLX-4bit`,
use the repo-grounded SFT script:

```bash
# 1) Generate seed prompts from the repo
python3 scripts/make_repo_seed_prompts.py --out data/mlxsmith_prompts.jsonl

# 2) Generate responses (via Codex) + train LoRA
NUM=300 BATCH=4 ITERS=2000 LR=2e-4 ./scripts/exp_qwen3_1.7b_mlx_4bit_repo_sft.sh
```

Notes:

- The script uses `codex exec` by default. Override with `MLXSMITH_CLI_CODEX_CMD` if needed.
- Qwen output sanitization is enabled in the included configs via `infer.strip_think: true`.

## Web Dashboard (Optional)

Run the API server, then start the Next.js dashboard:

```bash
# Terminal 1: start the OpenAI-compatible API
mlxsmith serve --model cache/mlx/mlx-community__Qwen3-4B-Instruct-2507-4bit --port 8080

# Terminal 2: start the dashboard
cd apps/web
npm install
npm run dev
```

The dashboard defaults to `http://localhost:8080` for the API base URL (change in Settings if needed).

## Training Modes

| Mode | Command | Input Format | Use Case |
|------|---------|-------------|----------|
| [SFT](docs/cli/sft.md) | `mlxsmith sft` | `{prompt, response}` | Instruction-following via LoRA |
| [Preference](docs/cli/preference-training.md) | `mlxsmith pref` | `{prompt, chosen, rejected}` | Alignment with DPO, ORPO, and others |
| [KTO](docs/cli/kto.md) | `mlxsmith kto` | `{prompt, response, label}` | Binary good/bad feedback |
| [GRPO](docs/cli/reinforcement-training.md) | `mlxsmith rft` | Environment + verifier | Reward-driven reinforcement learning |
| [Online DPO](docs/cli/online-dpo.md) | `mlxsmith online-dpo` | `{prompt}` | Online preference with LLM judge |
| [Self-verify](docs/cli/self-verify.md) | `mlxsmith self-verify` | `{prompt}` | Self-verification reward signal |
| [Distillation](docs/cli/distillation.md) | `mlxsmith distill` | `{prompt}` | Teacher-to-student transfer |
| [Judge](docs/cli/judge.md) | `mlxsmith judge` | Judge-format data | Train a scoring model |
| [Pipeline](docs/cli/sft.md#pipeline) | `mlxsmith pipeline` | Combined | SFT then Pref then RFT then RLM |

See [Concepts](docs/concepts.md) for an explanation of each training mode.

## Tools

| Tool | Command | Description |
|------|---------|-------------|
| [Data](docs/cli/data.md) | `mlxsmith data` | Import, split, validate, and pull datasets |
| [Synthetic](docs/cli/synthetic-data.md) | `mlxsmith synthetic` | Generate and evolve training data |
| [Eval](docs/cli/eval-and-bench.md) | `mlxsmith eval` | Run evaluation suites with pass@k |
| [Bench](docs/cli/eval-and-bench.md) | `mlxsmith bench` | Benchmark inference and training throughput |
| [Serve](docs/cli/serving.md) | `mlxsmith serve` | OpenAI-compatible model server |
| [RLM](docs/cli/rlm.md) | `mlxsmith rlm` | Recursive training loop + REPL-based inference |

## External Model Backends

MLXSmith can use powerful cloud models for synthetic data generation and judging while keeping fine-tuning local on Apple Silicon.

Supported backends:

- `cli` — shell out to Codex/Claude/Gemini CLIs (or any command you provide)
- `openai` — call any OpenAI-compatible Chat Completions endpoint

Note: training commands (`sft`, `pref`, `rft`, `rlm` loop) still require a local training backend like `mlx-lm`.

**CLI Backend** — Shell out to Codex, Claude, or Gemini CLIs:

```bash
# Use a CLI model for prompt generation
export MLXSMITH__MODEL__BACKEND=cli
export MLXSMITH_CLI_CODEX_CMD='codex exec --full-auto --model gpt-5.2'

# If your CLI expects the prompt as an argument instead of stdin:
# export MLXSMITH_CLI_PROMPT_FLAG='--prompt'

mlxsmith synthetic prompts \
  --model codex \
  --seed-prompts data/seeds.jsonl \
  --num 100 \
  --out data/prompts.jsonl

# Use a CLI model as judge for filtering
mlxsmith synthetic sft \
  --model codex \
  --judge-backend cli \
  --judge-model claude \
  --prompts data/prompts.jsonl \
  --out data/sft.jsonl
```

**OpenAI Backend** — Use any OpenAI-compatible API:

```bash
export MLXSMITH__MODEL__BACKEND=openai
export OPENAI_API_KEY="sk-..."
export MLXSMITH_API_BASE="https://api.openai.com/v1"  # or any compatible endpoint

mlxsmith synthetic prompts \
  --model gpt-4o \
  --out data/prompts.jsonl
```

This enables cloud-quality data generation with local training — use frontier models to create and filter training data, then fine-tune efficiently on your Mac.

## Documentation

| Section | Description |
|---------|-------------|
| [Getting Started](docs/getting-started.md) | Full setup walkthrough |
| [Concepts](docs/concepts.md) | Training modes explained |
| [CLI Reference](docs/cli/README.md) | All commands with examples |
| [Verifiers](docs/VERIFIERS.md) | Verifier API and composition |
| [Environments](docs/ENVIRONMENTS.md) | Task environment plugins |
| [Project Format](docs/PROJECT_FORMAT.md) | Run artifacts and layout |
| [Configuration](docs/cli/configuration.md) | Config system and options |
| [Compatibility](docs/COMPATIBILITY.md) | Tested versions and models |
| [Troubleshooting](docs/troubleshooting.md) | Common issues and fixes |
| [FAQ](docs/FAQ.md) | Frequently asked questions |
| [Contributing](CONTRIBUTING.md) | How to contribute and run tests |
| [Changelog](CHANGELOG.md) | Release notes |

## License

MIT
