{{ t('models.manager.description') }}
{{ t('models.manager.no_models') }}
{{ t('models.downloader.description') }}
{{ t('models.browse.loading') }}
{{ t('models.browse.load_prompt') }}
{{ t('models.browse.no_trending') }}
{{ t('models.browse.no_popular') }}
{{ t('models.browse.searching') }}
{{ t('models.browse.search_prompt') }}
{{ t('models.browse.no_results') }}
{{ t('models.oq.description') }}
Excludes vision encoder weights. Output is a text-only model (~2-3% smaller).
{{ t('models.oq.preserve_mtp_help') }}
{{ t('models.oq.preserve_mtp_unavailable') }}
{{ t('models.oq.dtype_help') }}
Quantization should not be exclusive to any particular inference server.
oQ produces standard mlx-lm models that work everywhere: oMLX, mlx-lm, LM Studio, and any app that supports the MLX safetensors format.
No custom loader required.
oQ measures each layer's quantization sensitivity through calibration (relative MSE vs. float16) and builds a byte-budgeted mixed-precision plan that allocates bits where the data says they matter most. Calibration data is built in (600 samples across code, multilingual, reasoning, and tool calling). Every model gets a unique bit allocation tuned to its architecture.
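
The sensitivity-then-budget idea can be illustrated with a small, self-contained sketch. This is not oQ's implementation. It assumes plain min/max affine quantization, uses weight reconstruction error as a stand-in for calibration-based sensitivity, ignores the bytes spent on per-group scales and biases, and uses a simple greedy allocator. The names `quantize_affine`, `relative_mse`, and `plan_bits` are hypothetical.

```python
import random


def quantize_affine(weights, bits, group_size=64):
    """Simulate min/max affine quantization per group; return dequantized values."""
    levels = (1 << bits) - 1
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels if hi > lo else 1.0
        out.extend(lo + round((w - lo) / scale) * scale for w in group)
    return out


def relative_mse(reference, approx):
    """Squared error normalized by the energy of the reference values."""
    err = sum((r - a) ** 2 for r, a in zip(reference, approx))
    energy = sum(r * r for r in reference)
    return err / energy if energy else 0.0


def plan_bits(layers, budget_bytes, choices=(2, 3, 4, 6, 8)):
    """Greedy byte-budgeted plan (assumed strategy, for illustration only).

    Every layer starts at the lowest bit width. Each step upgrades the layer
    whose next bit width removes the most error per extra byte, and stops when
    no upgrade fits the budget or none reduces the measured error.
    """
    err = {
        name: {b: relative_mse(w, quantize_affine(w, b)) for b in choices}
        for name, w in layers.items()
    }

    def size(name, bits):
        # Assumption: payload only, no scale/bias overhead.
        return len(layers[name]) * bits / 8

    plan = {name: choices[0] for name in layers}
    used = sum(size(name, bits) for name, bits in plan.items())
    if used > budget_bytes:
        raise ValueError("budget is below the minimum-precision model size")

    while True:
        best = None
        for name, bits in plan.items():
            idx = choices.index(bits)
            if idx + 1 == len(choices):
                continue  # already at the highest precision
            nxt = choices[idx + 1]
            extra = size(name, nxt) - size(name, bits)
            if used + extra > budget_bytes:
                continue  # upgrade does not fit
            gain = (err[name][bits] - err[name][nxt]) / extra
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, name, nxt, extra)
        if best is None:
            return plan, used
        _, name, nxt, extra = best
        plan[name] = nxt
        used += extra


if __name__ == "__main__":
    random.seed(0)
    demo = {
        "attn.q_proj": [random.gauss(0, 1) for _ in range(256)],
        "mlp.up_proj": [random.uniform(-1, 1) for _ in range(256)],
    }
    print(plan_bits(demo, budget_bytes=300))
```

In this sketch, a layer whose error does not drop at a higher bit width is never upgraded, so the budget goes to the layers that measurably benefit. A real pipeline would measure error on layer outputs over calibration samples rather than on the weights themselves.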
| | oQ | GGUF K-quant | GGUF IQ | unsloth Dynamic | AWQ |
|---|---|---|---|---|---|
| Format | MLX safetensors | GGUF | GGUF | GGUF | safetensors |
| Mixed precision | Data-driven sensitivity (per-layer MSE) | ~15 type rules | imatrix per-weight | Per-layer (proprietary) | Per-channel scaling |
| Hybrid modes | Affine (group_size=64) | K-quant types | E8 lattice | Proprietary | Affine only |
| MoE support | Router fp16, expert-aware budget | Basic | Basic | Router fp16 | Limited |
| VLM / SSM | Vision fp16, SSM F32 state | Separate mmproj, F32 state | Separate mmproj, F32 state | Supported | Not tested |
| Calibration | Built-in 600 samples (code + multilingual + reasoning) | None | imatrix required | Proprietary data | Activation data |
| Memory | Streaming (~5-7GB) | 1× model | 1× model | Proprietary | 1.5× model |
| Apple Silicon | Native (MLX affine) | Via llama.cpp | Via llama.cpp | GGUF only | MLX (affine) |
{{ t('models.detail.loading') }}
{{ t('models.detail.no_model_card') }}
{{ t('models.detail.no_files') }}
{{ t('models.detail.no_tags') }}