Metadata-Version: 2.4
Name: modern-asr
Version: 0.2.5
Summary: A unified, extensible, and modern Python toolkit for LLM-based Automatic Speech Recognition (ASR).
Project-URL: Homepage, https://github.com/vra/modern-asr
Project-URL: Repository, https://github.com/vra/modern-asr
Project-URL: Issues, https://github.com/vra/modern-asr/issues
Author: Modern ASR Contributors
License: Apache-2.0
Keywords: asr,audio,canary-qwen,fireredasr,funasr,glm-asr,llm,mimo-asr,moonshine,qwen-asr,sensevoice,speech-recognition,whisper
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Requires-Dist: soundfile>=0.12.0
Requires-Dist: typing-extensions>=4.8.0
Provides-Extra: all
Requires-Dist: accelerate>=0.25.0; extra == 'all'
Requires-Dist: cn2an>=0.5.23; extra == 'all'
Requires-Dist: funasr>=1.1.0; extra == 'all'
Requires-Dist: kaldi-native-fbank>=1.15; extra == 'all'
Requires-Dist: kaldiio>=2.18.0; extra == 'all'
Requires-Dist: modelscope>=1.15.0; extra == 'all'
Requires-Dist: moonshine>=0.1.0; extra == 'all'
Requires-Dist: nemo-toolkit[asr]>=2.0; extra == 'all'
Requires-Dist: onnxruntime-gpu>=1.17.0; (sys_platform != 'darwin') and extra == 'all'
Requires-Dist: onnxruntime>=1.17.0; extra == 'all'
Requires-Dist: openai-whisper>=20231117; extra == 'all'
Requires-Dist: peft>=0.13.2; extra == 'all'
Requires-Dist: qwen-asr>=0.0.6; extra == 'all'
Requires-Dist: sentencepiece>=0.2.0; extra == 'all'
Requires-Dist: soundfile>=0.12.0; extra == 'all'
Requires-Dist: torch>=2.0; extra == 'all'
Requires-Dist: torchaudio>=2.0; extra == 'all'
Requires-Dist: transformers>=4.40.0; extra == 'all'
Requires-Dist: vllm>=0.5.5; extra == 'all'
Provides-Extra: all-backends
Requires-Dist: accelerate>=0.25.0; extra == 'all-backends'
Requires-Dist: onnxruntime-gpu>=1.17.0; (sys_platform != 'darwin') and extra == 'all-backends'
Requires-Dist: onnxruntime>=1.17.0; extra == 'all-backends'
Requires-Dist: torch>=2.0; extra == 'all-backends'
Requires-Dist: torchaudio>=2.0; extra == 'all-backends'
Requires-Dist: transformers>=4.40.0; extra == 'all-backends'
Requires-Dist: vllm>=0.5.5; extra == 'all-backends'
Provides-Extra: all-models
Requires-Dist: accelerate>=0.25.0; extra == 'all-models'
Requires-Dist: cn2an>=0.5.23; extra == 'all-models'
Requires-Dist: funasr>=1.1.0; extra == 'all-models'
Requires-Dist: kaldi-native-fbank>=1.15; extra == 'all-models'
Requires-Dist: kaldiio>=2.18.0; extra == 'all-models'
Requires-Dist: modelscope>=1.15.0; extra == 'all-models'
Requires-Dist: moonshine>=0.1.0; extra == 'all-models'
Requires-Dist: nemo-toolkit[asr]>=2.0; extra == 'all-models'
Requires-Dist: onnxruntime>=1.17.0; extra == 'all-models'
Requires-Dist: openai-whisper>=20231117; extra == 'all-models'
Requires-Dist: peft>=0.13.2; extra == 'all-models'
Requires-Dist: qwen-asr>=0.0.6; extra == 'all-models'
Requires-Dist: sentencepiece>=0.2.0; extra == 'all-models'
Requires-Dist: soundfile>=0.12.0; extra == 'all-models'
Requires-Dist: torch>=2.0; extra == 'all-models'
Requires-Dist: torchaudio>=2.0; extra == 'all-models'
Requires-Dist: transformers>=4.40.0; extra == 'all-models'
Provides-Extra: canary-qwen
Requires-Dist: nemo-toolkit[asr]>=2.0; extra == 'canary-qwen'
Requires-Dist: torch>=2.0; extra == 'canary-qwen'
Provides-Extra: firered-asr
Requires-Dist: cn2an>=0.5.23; extra == 'firered-asr'
Requires-Dist: kaldi-native-fbank>=1.15; extra == 'firered-asr'
Requires-Dist: kaldiio>=2.18.0; extra == 'firered-asr'
Requires-Dist: peft>=0.13.2; extra == 'firered-asr'
Requires-Dist: sentencepiece>=0.2.0; extra == 'firered-asr'
Requires-Dist: soundfile>=0.12.0; extra == 'firered-asr'
Requires-Dist: torch>=2.0; extra == 'firered-asr'
Requires-Dist: torchaudio>=2.0; extra == 'firered-asr'
Requires-Dist: transformers>=4.40.0; extra == 'firered-asr'
Provides-Extra: fun-asr
Requires-Dist: funasr>=1.1.0; extra == 'fun-asr'
Requires-Dist: modelscope>=1.15.0; extra == 'fun-asr'
Requires-Dist: torch>=2.0; extra == 'fun-asr'
Requires-Dist: torchaudio>=2.0; extra == 'fun-asr'
Provides-Extra: glm-asr
Requires-Dist: accelerate>=0.25.0; extra == 'glm-asr'
Requires-Dist: sentencepiece>=0.2.0; extra == 'glm-asr'
Requires-Dist: torch>=2.0; extra == 'glm-asr'
Requires-Dist: torchaudio>=2.0; extra == 'glm-asr'
Requires-Dist: transformers>=4.40.0; extra == 'glm-asr'
Provides-Extra: mimo-asr
Requires-Dist: soundfile>=0.12.0; extra == 'mimo-asr'
Requires-Dist: torch>=2.0; extra == 'mimo-asr'
Requires-Dist: torchaudio>=2.0; extra == 'mimo-asr'
Requires-Dist: transformers>=4.40.0; extra == 'mimo-asr'
Provides-Extra: moonshine
Requires-Dist: moonshine>=0.1.0; extra == 'moonshine'
Requires-Dist: onnxruntime>=1.17.0; extra == 'moonshine'
Requires-Dist: torch>=2.0; extra == 'moonshine'
Provides-Extra: onnx
Requires-Dist: onnxruntime-gpu>=1.17.0; (sys_platform != 'darwin') and extra == 'onnx'
Requires-Dist: onnxruntime>=1.17.0; extra == 'onnx'
Provides-Extra: qwen-asr
Requires-Dist: qwen-asr>=0.0.6; extra == 'qwen-asr'
Requires-Dist: soundfile>=0.12.0; extra == 'qwen-asr'
Requires-Dist: torch>=2.0; extra == 'qwen-asr'
Requires-Dist: torchaudio>=2.0; extra == 'qwen-asr'
Requires-Dist: transformers>=4.40.0; extra == 'qwen-asr'
Provides-Extra: sensevoice
Requires-Dist: funasr>=1.1.0; extra == 'sensevoice'
Requires-Dist: modelscope>=1.15.0; extra == 'sensevoice'
Requires-Dist: torch>=2.0; extra == 'sensevoice'
Requires-Dist: torchaudio>=2.0; extra == 'sensevoice'
Provides-Extra: transformers
Requires-Dist: accelerate>=0.25.0; extra == 'transformers'
Requires-Dist: torch>=2.0; extra == 'transformers'
Requires-Dist: torchaudio>=2.0; extra == 'transformers'
Requires-Dist: transformers>=4.40.0; extra == 'transformers'
Provides-Extra: vllm
Requires-Dist: torch>=2.0; extra == 'vllm'
Requires-Dist: vllm>=0.5.5; extra == 'vllm'
Provides-Extra: whisper
Requires-Dist: openai-whisper>=20231117; extra == 'whisper'
Requires-Dist: torch>=2.0; extra == 'whisper'
Description-Content-Type: text/markdown

# Modern ASR

A **unified, extensible, and future-proof** Python toolkit for locally running state-of-the-art LLM-based Automatic Speech Recognition (ASR) models.

[![Python](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/)
[![License](https://img.shields.io/badge/license-Apache%202.0-green)](LICENSE)

---

## ✨ Features

- 🧩 **23 Models** — Whisper, SenseVoice, Qwen, MiMo, FireRedASR, GLM-ASR, and more
- 🔌 **Plugin Architecture** — Add new models with `@register_model` decorator
- 🚀 **Hot-Swap** — Switch models at runtime without restarting
- 🌍 **Multi-Language** — 52 languages, 22 Chinese dialects
- 🎯 **Multi-Task** — Transcription, translation, diarization, emotion, events
- 💻 **Local-First** — All inference on-device. No APIs. No data leaves your machine.
- 🍎 **Apple Silicon** — MPS (Metal Performance Shaders) support on macOS
- 🐍 **Modern Python** — uv-native packaging, Pydantic configs, rich CLI

---

## 📦 Installation

### From PyPI (Recommended)

```bash
pip install modern-asr
```

**Model dependencies and weights are installed automatically on first use** — you only provide the model name, and everything else is handled for you:

```python
from modern_asr import ASRPipeline

# SenseVoice — installs funasr and modelscope, downloads weights automatically
pipe = ASRPipeline("sensevoice-small")

# MiMo-ASR — clones the official repository and downloads the HF weights automatically
pipe = ASRPipeline("mimo-asr-v2.5")

# Whisper — installs openai-whisper and downloads weights automatically
pipe = ASRPipeline("whisper-small")
```

To pre-install all dependencies up front (e.g. for offline environments):
```bash
pip install "modern-asr[all-models]"
```

**Available extras:** `transformers`, `vllm`, `onnx`, `firered-asr`, `sensevoice`, `fun-asr`, `qwen-asr`, `mimo-asr`, `canary-qwen`, `glm-asr`, `whisper`, `moonshine`, `all-models`, `all-backends`, `all`.

### From Source

```bash
# Clone the repository
git clone https://github.com/vra/modern-asr.git
cd modern-asr

# Sync dependencies (recommended)
uv sync --all-extras

# Or install specific extras only
uv sync --extra transformers --extra whisper

# Or just core dependencies
uv sync
```

**Python 3.10+ is required** (the package declares `Requires-Python: >=3.10`).

---

## 🚀 Quick Start

```python
from modern_asr import ASRPipeline

# Transcribe with SenseVoice (Alibaba)
pipe = ASRPipeline("sensevoice-small")
result = pipe("audio.wav", language="zh")
print(result.text)

# Switch to Qwen3-ASR for dialect support
pipe.switch_model("qwen3-asr-0.6b")
result = pipe("audio.wav", language="zh")
print(result.text)

# English with Whisper
pipe.switch_model("whisper-small")
result = pipe("audio.wav", language="en")
```

---

## 📚 Documentation

The full documentation is built with Material for MkDocs. To serve it locally:

```bash
mkdocs serve
```

---

## 🏗️ Architecture

Modern ASR is built on three layers:

1. **ASRPipeline** — Unified user API. Handles input normalization, task dispatch, model lifecycle.
2. **ASRModel / AudioLLMModel** — Adapter layer. New models often need only **8 lines of config** via `AudioLLMModel`.
3. **Backends** — Transformers, vLLM, ONNX Runtime.

### Adding a New Model

```python
from modern_asr.core.audio_llm import AudioLLMModel
from modern_asr.core.registry import register_model

@register_model("my-model-1b")
class MyModel1B(AudioLLMModel):
    HF_PATH = "org/MyModel-1B"
    SUPPORTED_LANGUAGES = {"zh", "en"}
    CHUNK_DURATION = 30.0

    @property
    def model_id(self) -> str:
        return "my-model-1b"
```

That's it. The registry auto-discovers it at runtime.

---

## 🤝 Contributing

See [Contributing Guide](docs/contributing.md) for development setup, code style, and PR checklist.

---

## 📄 License

Apache-2.0
