Metadata-Version: 2.4
Name: flyto-mlx
Version: 0.4.0
Summary: Flyto MLX — Apple Silicon LLM server with audio chat, DFlash, and Chinese model presets (based on oMLX)
Author: panwudi, Flyto MLX contributors, oMLX contributors (upstream)
License: Apache-2.0
Project-URL: Homepage, https://github.com/panwudi/flyto-mlx
Project-URL: Documentation, https://github.com/panwudi/flyto-mlx#readme
Project-URL: Repository, https://github.com/panwudi/flyto-mlx
Project-URL: Gitee Mirror, https://gitee.com/panwudi/flyto-mlx
Project-URL: Upstream, https://github.com/jundot/omlx
Keywords: llm,mlx,apple-silicon,vllm,inference,transformers,audio-llm,gemma,qwen,deepseek
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: mlx>=0.31.2
Requires-Dist: mlx-lm==0.31.3
Requires-Dist: regex
Requires-Dist: mlx-embeddings==0.1.0
Requires-Dist: transformers>=5.0.0
Requires-Dist: mistral-common>=1.10
Requires-Dist: tokenizers>=0.19.0
Requires-Dist: huggingface-hub>=0.23.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: tqdm>=4.66.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: itsdangerous>=2.0
Requires-Dist: jinja2>=3.0
Requires-Dist: sentencepiece
Requires-Dist: tiktoken
Requires-Dist: protobuf
Requires-Dist: requests>=2.28.0
Requires-Dist: socksio>=1.0.0
Requires-Dist: tabulate>=0.9.0
Requires-Dist: psutil>=5.9.0
Requires-Dist: fastapi>=0.108.0
Requires-Dist: uvicorn>=0.23.0
Requires-Dist: python-multipart>=0.0.5
Requires-Dist: jsonschema>=4.0.0
Requires-Dist: openai-harmony
Requires-Dist: mlx-vlm>=0.5.0
Requires-Dist: Pillow>=9.0.0
Requires-Dist: dflash-mlx>=0.1.6
Provides-Extra: grammar
Requires-Dist: xgrammar>=0.1.32; extra == "grammar"
Provides-Extra: mcp
Requires-Dist: mcp>=1.0.0; extra == "mcp"
Provides-Extra: modelscope
Requires-Dist: modelscope>=1.10.0; extra == "modelscope"
Provides-Extra: audio
Requires-Dist: mlx-audio[sts,stt,tts]>=0.4.3; extra == "audio"
Provides-Extra: paroquant
Requires-Dist: paroquant==0.1.14; extra == "paroquant"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: mcp>=1.0.0; extra == "dev"
Dynamic: license-file

<p align="center">
  <img alt="Flyto MLX" src="docs/images/icon-rounded-light.svg" width="140">
</p>

<h1 align="center">Flyto MLX</h1>
<p align="center">
  <b>Apple Silicon LLM server · Audio chat · DFlash dual engine · Chinese model presets</b><br>
  Based on <a href="https://github.com/jundot/omlx">oMLX</a> by <a href="https://github.com/jundot">@jundot</a>.
</p>

<p align="center">
  <img src="https://img.shields.io/badge/license-Apache%202.0-blue" alt="License">
  <img src="https://img.shields.io/badge/python-3.10+-green" alt="Python 3.10+">
  <img src="https://img.shields.io/badge/platform-Apple%20Silicon-black?logo=apple" alt="Apple Silicon">
  <a href="https://gitee.com/panwudi/flyto-mlx"><img src="https://img.shields.io/badge/Gitee-mirror-c71d23" alt="Gitee mirror"></a>
</p>

---

**Full guide** | [English summary](#english)

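## Introduction

Flyto MLX is a local LLM server for Apple Silicon, optimized for **Mac users in China** and the **Chinese model ecosystem**. It is a fork of [@jundot/oMLX](https://github.com/jundot/omlx). It keeps all upstream oMLX capabilities (OpenAI-compatible API, multi-model LRU scheduling, paged KV cache, Mac menubar GUI) and adds features that upstream has not yet merged or does not support: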

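| Capability | Description |
|---|---|
| **Gemma 4 audio chat** | End-to-end support for the OpenAI `input_audio` content type. Calling `gemma4-e2b` / `gemma4-e4b` lets the model listen to the audio and answer directly (this is end-to-end audio understanding, not an ASR substitute) |
| **DFlash dual engine (Path A)** | Dual Qwen / Gemma 4 backends with optimized drafter co-loading |
| **Tahoe compatibility** | Fix for the macOS 26 NSStatusItem occlusion bit |
| **Backports of fixes merged upstream but not yet released** | 5 fixes, including tokenizer lm_head, TokenBuffer cache hit seed, and health-check Session reuse |
| **Chinese model presets** | Aliases for Qwen 3.5 MoE/Dense, DeepSeek V4, Gemma 4, and others, ready to use on install |
| **Gitee mirror + ModelScope model source** | Optimized access from within mainland China |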

## Installation

```bash
# pip
pip install flyto-mlx

# Start the server (the CLI is compatible with upstream omlx; the primary command is fmlx)
fmlx serve --port 8000
# or
omlx serve --port 8000     # alias, compatible with upstream
```

DMG and brew tap distributions will be provided with later releases.
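
Once the server is running, a quick way to confirm that it responds is to list the models it serves. The sketch below assumes the server exposes the standard OpenAI `GET /v1/models` endpoint, which is typical of OpenAI-compatible servers but is not confirmed in this README. The port, the `mykey` API key, and the helper names `extract_model_ids` and `list_model_ids` are illustrative assumptions.

```python
import json
import urllib.request


def extract_model_ids(body):
    """Pull model IDs out of an OpenAI-style model list response."""
    # OpenAI's list format is {"object": "list", "data": [{"id": ...}, ...]}.
    return [entry["id"] for entry in body.get("data", [])]


def list_model_ids(base_url="http://localhost:8000", api_key="mykey", timeout=5.0):
    """Query /v1/models (assumed endpoint) and return the model IDs."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    # urlopen raises urllib.error.URLError if the server is unreachable
    # and urllib.error.HTTPError on a non-2xx status.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_model_ids(json.load(response))
```

Calling `list_model_ids()` against a running server should return a list of model names that can be passed as `model` in chat requests. Whether preset aliases such as `gemma4-e2b` appear in that list depends on the server configuration.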

## Quick audio chat test

```bash
# Assumes the server is already running on :8000 and the API key is set to mykey
python3 <<'PY'
import base64, requests
# Read the recording and base64-encode it for the input_audio content part
with open("recording.wav","rb") as f:
    b64 = base64.b64encode(f.read()).decode()
r = requests.post(
    "http://localhost:8000/v1/chat/completions",
    headers={"Authorization": "Bearer mykey"},
    json={
        "model": "gemma4-e2b",
        "max_tokens": 400,
        "temperature": 0.3,
        "messages": [{"role": "user", "content": [
            {"type": "text", "text": "Summarize the key information in this phone call"},
            {"type": "input_audio", "input_audio": {"data": b64, "format": "wav"}}
        ]}]
    },
)
r.raise_for_status()  # fail loudly on HTTP errors instead of a KeyError below
print(r.json()["choices"][0]["message"]["content"])
PY
```
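
For repeated use, the request body from the example above can be wrapped in small helpers. This is a minimal sketch, not part of Flyto MLX: the names `audio_part` and `audio_chat_payload` are hypothetical. It accepts only `wav` by default because that is the only format this README demonstrates; whether the server accepts other formats is not stated here.

```python
import base64
from pathlib import Path


def audio_part(path, allowed_formats=("wav",)):
    """Build an OpenAI-style input_audio content part from a local file."""
    file_path = Path(path)
    # Infer the format from the file extension, e.g. "clip.wav" -> "wav".
    audio_format = file_path.suffix.lstrip(".").lower()
    if audio_format not in allowed_formats:
        raise ValueError(f"unsupported audio format: {audio_format!r}")
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return {
        "type": "input_audio",
        "input_audio": {"data": encoded, "format": audio_format},
    }


def audio_chat_payload(path, prompt, model="gemma4-e2b",
                       max_tokens=400, temperature=0.3):
    """Assemble a /v1/chat/completions body with a text prompt plus audio."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "messages": [{
            "role": "user",
            "content": [{"type": "text", "text": prompt}, audio_part(path)],
        }],
    }
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, exactly as in the example above.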

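## Relationship to upstream oMLX

Flyto MLX is a downstream fork of oMLX, licensed under Apache 2.0. We **regularly cherry-pick upstream bug fixes and new model support**, but we no longer submit our own features (audio chat, DFlash, and so on) to upstream as PRs. If you only want the pure upstream experience, use [@jundot/oMLX](https://github.com/jundot/omlx).

For detailed attribution and copyright notices, see [NOTICE](NOTICE) and [LICENSE](LICENSE).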

## License

Apache License 2.0. Based on oMLX by [@jundot](https://github.com/jundot). See [LICENSE](LICENSE) and [NOTICE](NOTICE) for details.

---

## English

Flyto MLX is a fork of [@jundot/oMLX](https://github.com/jundot/omlx) optimized for the Chinese Mac LLM community and sovereign-AI model ecosystem (Qwen, DeepSeek, Gemma 4). It preserves all upstream oMLX capabilities (OpenAI-compatible API, multi-model LRU scheduling, KV paged cache, menubar GUI) and adds:

- **Audio chat via OpenAI `input_audio`** — end-to-end Gemma 4 nano audio LLM through `/v1/chat/completions`
- **DFlash Path A double-engine** — Qwen and Gemma 4 backends with optimized drafter co-loading
- **macOS 26 Tahoe compatibility** — NSStatusItem occlusion bit fix
- **5 upstream-fixed-but-unreleased patches backported** — tokenizer lm_head, TokenBuffer cache hit seed, health-check session reuse, and more
- **Chinese model presets** — Qwen 3.5 MoE/Dense, DeepSeek V4, Gemma 4 aliases ready out of the box
- **Gitee mirror + ModelScope model registry** — for users in mainland China

Install: `pip install flyto-mlx`. CLI: `fmlx serve` (or `omlx serve` alias for upstream compatibility).

We periodically cherry-pick upstream fixes. We do **not** submit our own features back upstream. For pure upstream behaviour, use [@jundot/oMLX](https://github.com/jundot/omlx) directly.

## License

Apache 2.0. See [LICENSE](LICENSE) and [NOTICE](NOTICE).
