Metadata-Version: 2.4
Name: mlx-optiq
Version: 0.2.7
Summary: Mixed-precision quantization optimizer for LLMs on Apple Silicon (MLX)
Author: mlx-optiq
License: MIT
Project-URL: Models, https://huggingface.co/collections/mlx-community
Keywords: mlx,quantization,mixed-precision,apple-silicon,llm,kv-cache
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: mlx>=0.20
Requires-Dist: mlx-lm>=0.31.3
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: huggingface-hub
Requires-Dist: lm-format-enforcer>=0.10
Provides-Extra: convert
Requires-Dist: torch>=2.0; extra == "convert"
Requires-Dist: transformers>=4.40; extra == "convert"
Requires-Dist: safetensors; extra == "convert"
Requires-Dist: tqdm; extra == "convert"
Requires-Dist: datasets; extra == "convert"
Requires-Dist: psutil; extra == "convert"
Provides-Extra: vlm
Requires-Dist: mlx-vlm>=0.3; extra == "vlm"
Requires-Dist: pillow; extra == "vlm"
Provides-Extra: audio
Requires-Dist: mlx-whisper>=0.4; extra == "audio"
Provides-Extra: cli
Requires-Dist: click>=8.0; extra == "cli"
Requires-Dist: psutil; extra == "cli"
Provides-Extra: lab
Requires-Dist: flask>=3.0; extra == "lab"
Requires-Dist: argon2-cffi>=23.0; extra == "lab"
Requires-Dist: pyjwt>=2.8; extra == "lab"
Requires-Dist: cryptography>=42.0; extra == "lab"
Requires-Dist: data-designer; extra == "lab"
Requires-Dist: ddgs>=9.0; extra == "lab"
Requires-Dist: html2text>=2024.0; extra == "lab"
Requires-Dist: pypdf>=4.0; extra == "lab"
Requires-Dist: docx2txt>=0.8; extra == "lab"
Provides-Extra: all
Requires-Dist: torch>=2.0; extra == "all"
Requires-Dist: transformers>=4.40; extra == "all"
Requires-Dist: safetensors; extra == "all"
Requires-Dist: tqdm; extra == "all"
Requires-Dist: datasets; extra == "all"
Requires-Dist: mlx-vlm>=0.3; extra == "all"
Requires-Dist: mlx-whisper>=0.4; extra == "all"
Requires-Dist: click>=8.0; extra == "all"
Requires-Dist: psutil; extra == "all"
Requires-Dist: pillow; extra == "all"
Requires-Dist: flask>=3.0; extra == "all"
Requires-Dist: argon2-cffi>=23.0; extra == "all"
Requires-Dist: pyjwt>=2.8; extra == "all"
Requires-Dist: cryptography>=42.0; extra == "all"
Requires-Dist: data-designer; extra == "all"
Requires-Dist: ddgs>=9.0; extra == "all"
Requires-Dist: html2text>=2024.0; extra == "all"
Requires-Dist: pypdf>=4.0; extra == "all"
Requires-Dist: docx2txt>=0.8; extra == "all"

# mlx-optiq

**Quantize, fine-tune and serve LLMs entirely on Apple Silicon.**

**Website:** https://mlx-optiq.com &nbsp;|&nbsp; **Docs:** https://mlx-optiq.com/docs/ &nbsp;|&nbsp; **Models:** https://mlx-optiq.com/models &nbsp;|&nbsp; **Blog:** https://mlx-optiq.com/blog/ &nbsp;|&nbsp; **HF org:** https://huggingface.co/mlx-community

mlx-optiq is an optimizing compiler and runtime for MLX. It turns a full-precision model into the best version for a given memory and latency budget on your Mac, using per-layer sensitivity measurement instead of uniform 4-bit everywhere. The same signal drives weights, KV cache, LoRA fine-tuning, and runtime adapter swapping.

```bash
pip install mlx-optiq
```

## What it does

- **Mixed-precision weight quantization** that beats uniform 4-bit at the same size. `optiq convert` measures each layer's sensitivity and allocates bits per layer. A `static` method assigns bits by architecture for models too large to measure. [Methods](https://mlx-optiq.com/docs/sensitivity).
- **SSD expert streaming** runs large MoE quants that don't fit in RAM. A 2-bit Qwen3.5-122B-A10B runs on a 36 GB Mac at ~12 GB resident, experts streamed off disk. [How](https://mlx-optiq.com/blog/stream-122b-on-a-mac).
- **Mixed-precision KV cache** for longer context at lower memory. `optiq serve` runs a per-layer KV quant pipeline.
- **One server, two protocols.** `optiq serve` speaks both the OpenAI and Anthropic APIs from one process. Point Claude Code or either SDK at the same local URL.
- **Speculative decoding** via bundled MTP heads or paired drafters (`--mtp`, `--drafter`).
- **Sensitivity-aware LoRA** (SFT + DPO) and runtime hot-swap adapters.
- **OptiQ Lab** (`pip install "mlx-optiq[lab]"` then `optiq lab`): a local web UI for chat, quantize, fine-tune, and dataset work.

## Quickstart

Every [mlx-optiq quant](https://huggingface.co/mlx-community) loads with stock `mlx-lm`:

```python
from mlx_lm import load, generate
model, tok = load("mlx-community/Qwen3.5-9B-OptiQ-4bit")
print(generate(model, tok, prompt="Hello", max_tokens=50))
```

Installing mlx-optiq unlocks the rest. A few starting points:

```bash
# Serve with the OpenAI + Anthropic API and ~1.4x speculative decode
optiq serve --model mlx-community/Qwen3.5-9B-OptiQ-4bit --mtp

# Run a huge MoE that doesn't fit in RAM (experts stream off SSD)
optiq serve --model mlx-community/Qwen3.5-122B-A10B-OptiQ-2bit --stream-experts

# Quantize a fresh model (exact sensitivity, or fast structural rules for big bases)
optiq convert Qwen/Qwen3.5-9B --target-bpw 5.0 --candidate-bits 4,8
optiq convert <large-moe> --method static --candidate-bits 2,4 --target-bpw 2.5

# Fine-tune with sensitivity-aware LoRA
optiq lora train mlx-community/Qwen3.5-9B-OptiQ-4bit --data ./jsonl_dir --rank 8
```

Full guides for serving, KV-quant, LoRA, MTP and per-family setup are in the [docs](https://mlx-optiq.com/docs/). The [models page](https://mlx-optiq.com/models) lists every quant with its Capability Score, and the [blog](https://mlx-optiq.com/blog/) has the deeper write-ups.

## Requirements

- Apple Silicon (M1 or newer), macOS, Python 3.11+.
- The published quants load with stock `mlx-lm`. Converting and some MoE / multimodal runtime features track `mlx-lm` main; install it from git when a model card asks for it.

## License

MIT for the package. Quantized models follow their base model's license.
