Metadata-Version: 2.4
Name: mega-asr-mlx
Version: 0.1.0
Summary: Mega-ASR: fine-tuned Qwen3-ASR 1.7B for Chinese/English code-switching speech recognition on Apple MLX
Author-email: Carlos Huang <huaixian.huang@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/carloshuang1224/mega-asr-mlx
Project-URL: Repository, https://github.com/carloshuang1224/mega-asr-mlx
Project-URL: Issues, https://github.com/carloshuang1224/mega-asr-mlx/issues
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: mlx>=0.20
Requires-Dist: mlx-lm>=0.20
Requires-Dist: numpy>=1.24
Requires-Dist: scipy>=1.10
Requires-Dist: soundfile>=0.12
Requires-Dist: safetensors>=0.4
Requires-Dist: transformers>=4.40

# Mega-ASR MLX

End-to-end speech recognition on Apple Silicon, powered by MLX.

Mega-ASR is a fine-tuned **Qwen3-ASR 1.7B** model with merged LoRA weights, optimized for **Chinese/English code-switching speech**. The model runs entirely on-device via Apple MLX.

## Install

```bash
pip install mega-asr-mlx
```

## Quick Start

```bash
# Download model weights from Hugging Face (~4.4 GB)
huggingface-cli download voiceink/mega-asr-mlx --local-dir ~/.cache/voiceink/mega-asr-mlx

# Transcribe audio
mega-asr --audio speech.wav --language English
```

Or use as a Python library:

```python
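from pathlib import Path

from mega_asr_mlx import MegaASRMLX

# Expand "~" explicitly, in case the loader does not do so itself.
model = MegaASRMLX(str(Path("~/.cache/voiceink/mega-asr-mlx").expanduser()))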
text = model.transcribe("speech.wav", language="English")
print(text)
```
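
To transcribe several files, the `transcribe` call above can be wrapped in a small loop. The sketch below is a hypothetical helper (`transcribe_dir` is not part of the package). It accepts any callable shaped like `model.transcribe(path, language=...)`, so it assumes nothing beyond the API shown above.

```python
from pathlib import Path
from typing import Callable, Dict


def transcribe_dir(transcribe: Callable[..., str], directory: str,
                   language: str = "English") -> Dict[str, str]:
    """Run `transcribe` on every .wav file in `directory` (non-recursive).

    `transcribe` is any callable shaped like `model.transcribe(path, language=...)`.
    Returns a mapping of file name -> transcript, in sorted file-name order.
    """
    root = Path(directory).expanduser()
    results = {}
    for wav in sorted(root.glob("*.wav")):
        results[wav.name] = transcribe(str(wav), language=language)
    return results
```

With a loaded model, this would be called as `transcribe_dir(model.transcribe, "recordings/")`, where `recordings/` is a placeholder directory name.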

## Model

| Component | Architecture | Size |
|-----------|-------------|------|
| Audio Encoder | Conv2D stem + 24-layer Transformer (1024-dim) | 606 MB |
| Decoder | Qwen3 28-layer (2048-dim, GQA with 16 query / 8 KV heads) | 3.8 GB |
| Router | 4-layer Transformer audio quality classifier | 2.2 MB |

- **Languages**: Chinese, English (auto-detect)
- **Input**: 16 kHz mono WAV
- **Output**: Plain text transcription
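
Since the expected input is 16 kHz mono WAV, it can help to check a file's header before transcribing. The following is a minimal sketch using only the standard-library `wave` module; `check_wav` is a hypothetical helper, not part of the package. It reads PCM WAV headers only, and whether the package itself resamples or downmixes other inputs is not stated here.

```python
import wave


def check_wav(path: str, expected_rate: int = 16000) -> list:
    """Return a list of problems found in the WAV header; empty means OK."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    return problems
```

An empty list means the header matches the documented input format; any other result lists what would need converting first.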

## Requirements

- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.10+
- Dependencies: mlx, mlx-lm, numpy, scipy, soundfile, safetensors, transformers
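
The platform requirements above can be checked before installing. This is a rough sketch using the standard library; `check_requirements` is a hypothetical helper, and the optional arguments exist only so the logic can be exercised on other machines. Note that an x86_64 Python running under Rosetta reports `x86_64` even on Apple Silicon hardware.

```python
import platform
import sys


def check_requirements(system=None, machine=None, version=None) -> list:
    """Return a list of unmet requirements; empty means the host looks OK."""
    system = system or platform.system()
    machine = machine or platform.machine()
    version = tuple(version or sys.version_info[:2])
    problems = []
    if system != "Darwin":
        problems.append(f"macOS required, found {system}")
    if machine != "arm64":
        problems.append(f"Apple Silicon (arm64) required, found {machine}")
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```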

## License

MIT
