Metadata-Version: 2.4
Name: modern-asr
Version: 0.2.15
Summary: A unified, extensible, and modern Python toolkit for LLM-based Automatic Speech Recognition (ASR).
Project-URL: Homepage, https://github.com/vra/modern-asr
Project-URL: Repository, https://github.com/vra/modern-asr
Project-URL: Issues, https://github.com/vra/modern-asr/issues
Author: Modern ASR Contributors
License: Apache-2.0
Keywords: asr,audio,fireredasr,funasr,glm-asr,llm,mimo-asr,moonshine,qwen-asr,sensevoice,speech-recognition,whisper
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Requires-Dist: soundfile>=0.12.0
Requires-Dist: typing-extensions>=4.8.0
Provides-Extra: all
Requires-Dist: accelerate>=0.25.0; extra == 'all'
Requires-Dist: cn2an>=0.5.23; extra == 'all'
Requires-Dist: funasr>=1.1.0; extra == 'all'
Requires-Dist: kaldi-native-fbank>=1.15; extra == 'all'
Requires-Dist: kaldiio>=2.18.0; extra == 'all'
Requires-Dist: modelscope>=1.15.0; extra == 'all'
Requires-Dist: onnxruntime-gpu>=1.17.0; (sys_platform != 'darwin') and extra == 'all'
Requires-Dist: onnxruntime>=1.17.0; extra == 'all'
Requires-Dist: openai-whisper>=20231117; extra == 'all'
Requires-Dist: peft>=0.13.2; extra == 'all'
Requires-Dist: qwen-asr>=0.0.6; extra == 'all'
Requires-Dist: sentencepiece>=0.2.0; extra == 'all'
Requires-Dist: soundfile>=0.12.0; extra == 'all'
Requires-Dist: torch>=2.0; extra == 'all'
Requires-Dist: torchaudio>=2.0; extra == 'all'
Requires-Dist: transformers==4.57.6; extra == 'all'
Requires-Dist: transformers>=4.40.0; extra == 'all'
Requires-Dist: useful-moonshine; extra == 'all'
Requires-Dist: vllm>=0.5.5; extra == 'all'
Provides-Extra: all-backends
Requires-Dist: accelerate>=0.25.0; extra == 'all-backends'
Requires-Dist: onnxruntime-gpu>=1.17.0; (sys_platform != 'darwin') and extra == 'all-backends'
Requires-Dist: onnxruntime>=1.17.0; extra == 'all-backends'
Requires-Dist: torch>=2.0; extra == 'all-backends'
Requires-Dist: torchaudio>=2.0; extra == 'all-backends'
Requires-Dist: transformers>=4.40.0; extra == 'all-backends'
Requires-Dist: vllm>=0.5.5; extra == 'all-backends'
Provides-Extra: all-models
Requires-Dist: accelerate>=0.25.0; extra == 'all-models'
Requires-Dist: cn2an>=0.5.23; extra == 'all-models'
Requires-Dist: funasr>=1.1.0; extra == 'all-models'
Requires-Dist: kaldi-native-fbank>=1.15; extra == 'all-models'
Requires-Dist: kaldiio>=2.18.0; extra == 'all-models'
Requires-Dist: modelscope>=1.15.0; extra == 'all-models'
Requires-Dist: openai-whisper>=20231117; extra == 'all-models'
Requires-Dist: peft>=0.13.2; extra == 'all-models'
Requires-Dist: qwen-asr>=0.0.6; extra == 'all-models'
Requires-Dist: sentencepiece>=0.2.0; extra == 'all-models'
Requires-Dist: soundfile>=0.12.0; extra == 'all-models'
Requires-Dist: torch>=2.0; extra == 'all-models'
Requires-Dist: torchaudio>=2.0; extra == 'all-models'
Requires-Dist: transformers==4.57.6; extra == 'all-models'
Requires-Dist: transformers>=4.40.0; extra == 'all-models'
Requires-Dist: useful-moonshine; extra == 'all-models'
Provides-Extra: firered-asr
Requires-Dist: cn2an>=0.5.23; extra == 'firered-asr'
Requires-Dist: kaldi-native-fbank>=1.15; extra == 'firered-asr'
Requires-Dist: kaldiio>=2.18.0; extra == 'firered-asr'
Requires-Dist: peft>=0.13.2; extra == 'firered-asr'
Requires-Dist: sentencepiece>=0.2.0; extra == 'firered-asr'
Requires-Dist: soundfile>=0.12.0; extra == 'firered-asr'
Requires-Dist: torch>=2.0; extra == 'firered-asr'
Requires-Dist: torchaudio>=2.0; extra == 'firered-asr'
Requires-Dist: transformers>=4.40.0; extra == 'firered-asr'
Provides-Extra: fun-asr
Requires-Dist: funasr>=1.1.0; extra == 'fun-asr'
Requires-Dist: modelscope>=1.15.0; extra == 'fun-asr'
Requires-Dist: torch>=2.0; extra == 'fun-asr'
Requires-Dist: torchaudio>=2.0; extra == 'fun-asr'
Provides-Extra: glm-asr
Requires-Dist: accelerate>=0.25.0; extra == 'glm-asr'
Requires-Dist: sentencepiece>=0.2.0; extra == 'glm-asr'
Requires-Dist: torch>=2.0; extra == 'glm-asr'
Requires-Dist: torchaudio>=2.0; extra == 'glm-asr'
Requires-Dist: transformers>=4.40.0; extra == 'glm-asr'
Provides-Extra: mimo-asr
Requires-Dist: soundfile>=0.12.0; extra == 'mimo-asr'
Requires-Dist: torch>=2.0; extra == 'mimo-asr'
Requires-Dist: torchaudio>=2.0; extra == 'mimo-asr'
Requires-Dist: transformers>=4.40.0; extra == 'mimo-asr'
Provides-Extra: moonshine
Requires-Dist: torch>=2.0; extra == 'moonshine'
Requires-Dist: useful-moonshine; extra == 'moonshine'
Provides-Extra: onnx
Requires-Dist: onnxruntime-gpu>=1.17.0; (sys_platform != 'darwin') and extra == 'onnx'
Requires-Dist: onnxruntime>=1.17.0; extra == 'onnx'
Provides-Extra: qwen-asr
Requires-Dist: qwen-asr>=0.0.6; extra == 'qwen-asr'
Requires-Dist: soundfile>=0.12.0; extra == 'qwen-asr'
Requires-Dist: torch>=2.0; extra == 'qwen-asr'
Requires-Dist: torchaudio>=2.0; extra == 'qwen-asr'
Requires-Dist: transformers==4.57.6; extra == 'qwen-asr'
Provides-Extra: sensevoice
Requires-Dist: funasr>=1.1.0; extra == 'sensevoice'
Requires-Dist: modelscope>=1.15.0; extra == 'sensevoice'
Requires-Dist: torch>=2.0; extra == 'sensevoice'
Requires-Dist: torchaudio>=2.0; extra == 'sensevoice'
Provides-Extra: transformers
Requires-Dist: accelerate>=0.25.0; extra == 'transformers'
Requires-Dist: torch>=2.0; extra == 'transformers'
Requires-Dist: torchaudio>=2.0; extra == 'transformers'
Requires-Dist: transformers>=4.40.0; extra == 'transformers'
Provides-Extra: vllm
Requires-Dist: torch>=2.0; extra == 'vllm'
Requires-Dist: vllm>=0.5.5; extra == 'vllm'
Provides-Extra: whisper
Requires-Dist: openai-whisper>=20231117; extra == 'whisper'
Requires-Dist: torch>=2.0; extra == 'whisper'
Description-Content-Type: text/markdown

# Modern ASR

<p align="center">
  <strong>A unified, extensible, future-proof toolkit for locally running state-of-the-art LLM-based ASR models.</strong>
</p>

<p align="center">
  <a href="https://www.python.org/"><img src="https://img.shields.io/badge/python-3.10%2B-blue" alt="Python 3.10+"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-green" alt="Apache 2.0"></a>
  <a href="https://pypi.org/project/modern-asr/"><img src="https://img.shields.io/pypi/v/modern-asr" alt="PyPI"></a>
</p>

<p align="center">
  <a href="README_zh.md">简体中文</a> ·
  <a href="#features">Features</a> ·
  <a href="#installation">Installation</a> ·
  <a href="#supported-models">Models</a> ·
  <a href="#quick-start">Quick Start</a> ·
  <a href="#architecture">Architecture</a>
</p>

---

## ✨ Features

- **🧩 19 Models** — Whisper, SenseVoice, Qwen, MiMo, FireRedASR, GLM-ASR, and more.
- **🔌 Zero-Config Plugins** — Add new models via the `@register_model` decorator.
- **🚀 Runtime Hot-Swap** — Switch models without restarting the process.
- **🌍 Multi-Language** — 52 languages, 22 Chinese dialects.
- **🎯 Multi-Task** — Transcription, translation, diarization, emotion, events.
- **💻 Local-First** — All inference on-device. No API keys. No data leaves your machine.
- **🍎 Apple Silicon** — Native MPS (Metal Performance Shaders) support.
- **📦 Auto-Install** — Dependencies, git repos, and HF weights are installed automatically on first use.
- **🐍 Modern Python** — Pydantic configs, rich CLI, ISO-timestamped logging.

---

## 📦 Installation

```bash
pip install modern-asr
```

Dependencies and model weights are **installed automatically** the first time you use a model — just pass its name:

```python
from modern_asr import ASRPipeline

pipe = ASRPipeline("sensevoice-small")
pipe = ASRPipeline("mimo-asr-v2.5")
pipe = ASRPipeline("whisper-small")
```

For offline/air-gapped environments, pre-install everything:

```bash
pip install "modern-asr[all-models]"
```

**Available extras:** `transformers`, `vllm`, `onnx`, `firered-asr`, `sensevoice`, `fun-asr`, `qwen-asr`, `mimo-asr`, `glm-asr`, `whisper`, `moonshine`, `all-models`, `all-backends`, `all`.

**Requirements:** Python ≥ 3.10.

---

## 🧩 Supported Models

| Series | Model ID | Params | Languages | Extra |
|--------|----------|--------|-----------|-------|
| **Whisper** (OpenAI) | `whisper-tiny` | 39M | 99+ | `whisper` |
| | `whisper-base` | 74M | 99+ | `whisper` |
| | `whisper-small` | 244M | 99+ | `whisper` |
| | `whisper-medium` | 769M | 99+ | `whisper` |
| | `whisper-large-v3` | 1.5B | 99+ | `whisper` |
| | `whisper-large-v3-turbo` | 809M | 99+ | `whisper` |
| **SenseVoice** (Alibaba) | `sensevoice-small` | 234M | zh/en/ja/ko/yue | `sensevoice` |
| **Qwen3-ASR** (Alibaba) | `qwen3-asr-0.6b` | 0.6B | 22 dialects | `qwen-asr` |
| | `qwen3-asr-1.7b` | 1.7B | 22 dialects | `qwen-asr` |
| **FunASR / Paraformer** (Alibaba) | `funasr-nano` | 0.8B | zh/en | `fun-asr` |
| | `paraformer-zh` | 0.2B | zh | `fun-asr` |
| | `paraformer-large` | 0.7B | zh | `fun-asr` |
| **FireRedASR** (Xiaohongshu) | `fireredasr-aed` | 1.1B | zh | `firered-asr` |
| | `fireredasr-llm` | 8.3B | zh | `firered-asr` |
| **MiMo-ASR** (Xiaomi) | `mimo-asr-v2.5` | 8B | zh/dialects | `mimo-asr` |
| **MiDasheng** (Xiaomi) | `midashenglm-7b` | 7B | audio understanding | `mimo-asr` |
| **GLM-ASR** (Zhipu AI) | `glm-asr-nano-2512` | 1.5B | zh/en/yue | `glm-asr` |
| **Granite Speech** (IBM) | `granite-speech-3.3-8b` | 8B | en | `transformers` |
| **Moonshine** (Useful Sensors) | `moonshine-tiny` | 27M | en | `moonshine` |

```bash
# List all available models
python -m modern_asr list
```

---

## 🚀 Quick Start

```python
from modern_asr import ASRPipeline

# Chinese with SenseVoice
pipe = ASRPipeline("sensevoice-small")
result = pipe("audio.wav", language="zh")
print(result.text)

# Switch to Qwen3-ASR for dialects
pipe.switch_model("qwen3-asr-0.6b")
result = pipe("audio.wav", language="zh")
print(result.text)

# English with Whisper
pipe.switch_model("whisper-small")
result = pipe("audio.wav", language="en")
print(result.text)
```

---

## 🏗️ Architecture

Modern ASR is built on three layers:

1. **ASRPipeline** — Unified user API. Input normalization, task dispatch, model lifecycle.
2. **ASRModel / AudioLLMModel** — Adapter layer. New models often need only **8 lines of config**.
3. **Backends** — Transformers, vLLM, ONNX Runtime.
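
The hot-swap behavior this layering enables can be sketched in a standalone way. The classes below are illustrative stand-ins (a dummy `EchoModel` and a toy `Pipeline`), not modern_asr's actual internals:

```python
from abc import ABC, abstractmethod


class ASRAdapter(ABC):
    """Illustrative stand-in for the adapter layer (layer 2)."""

    @abstractmethod
    def transcribe(self, audio: str, language: str) -> str: ...


class EchoModel(ASRAdapter):
    """Dummy model: 'transcribes' by echoing its inputs."""

    def __init__(self, model_id: str):
        self.model_id = model_id

    def transcribe(self, audio: str, language: str) -> str:
        return f"[{self.model_id}:{language}] {audio}"


class Pipeline:
    """Illustrative stand-in for layer 1: owns the model lifecycle."""

    def __init__(self, model_id: str):
        self.model = EchoModel(model_id)

    def switch_model(self, model_id: str) -> None:
        # Replace the adapter in place, so callers keep
        # using the same Pipeline object across models.
        self.model = EchoModel(model_id)

    def __call__(self, audio: str, language: str = "en") -> str:
        return self.model.transcribe(audio, language)


pipe = Pipeline("whisper-small")
print(pipe("audio.wav"))            # served by whisper-small
pipe.switch_model("sensevoice-small")
print(pipe("audio.wav", "zh"))      # same object, new model
```

Because the user-facing object never changes identity, long-running services can swap models without dropping references held elsewhere in the process.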

### Adding a New Model

```python
from modern_asr.core.audio_llm import AudioLLMModel
from modern_asr.core.registry import register_model

@register_model("my-model-1b")
class MyModel1B(AudioLLMModel):
    HF_PATH = "org/MyModel-1B"
    SUPPORTED_LANGUAGES = {"zh", "en"}
    CHUNK_DURATION = 30.0

    @property
    def model_id(self) -> str:
        return "my-model-1b"
```

The registry auto-discovers it at runtime. That's it.
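
Decorator registries like this usually amount to a module-level dict populated at import time. The sketch below is a hypothetical simplification for intuition, not the actual `modern_asr.core.registry` code (`create_model` is an invented helper name):

```python
# Hypothetical simplified registry -- not modern_asr's actual implementation.
MODEL_REGISTRY: dict[str, type] = {}


def register_model(model_id: str):
    """Decorator factory: map a string ID to a model class at import time."""
    def decorator(cls: type) -> type:
        MODEL_REGISTRY[model_id] = cls
        return cls  # the class itself is returned unchanged
    return decorator


@register_model("my-model-1b")
class MyModel1B:
    pass


def create_model(model_id: str):
    """Look up and instantiate a registered model by ID."""
    try:
        return MODEL_REGISTRY[model_id]()
    except KeyError:
        raise ValueError(f"Unknown model: {model_id!r}") from None


model = create_model("my-model-1b")
print(type(model).__name__)  # MyModel1B
```

"Auto-discovery at runtime" then reduces to importing the module that defines the class: the decorator fires on import, and the ID is available to the pipeline from that point on.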

---

## 📚 Documentation

Full documentation is built with Material for MkDocs; serve it locally with:

```bash
mkdocs serve
```

---

## 🤝 Contributing

See [Contributing Guide](docs/contributing.md) for development setup, code style, and PR checklist.

---

## 📄 License

Apache-2.0

---
