Metadata-Version: 2.4
Name: fasr
Version: 0.5.1
Summary: FASR: Fast Automatic Speech Recognition Pipeline
Author-email: osc <790990241@qq.com>
Requires-Python: <3.13,>=3.10
Requires-Dist: aiohttp>=3.10.10
Requires-Dist: catalogue>=2.0.10
Requires-Dist: confection>=0.1.5
Requires-Dist: docarray==0.40
Requires-Dist: editdistance>=0.8.1
Requires-Dist: fastapi>=0.115.4
Requires-Dist: huggingface-hub>=0.27.0
Requires-Dist: joblib>=1.4.2
Requires-Dist: jsonargparse[signatures,urls]>=4.38
Requires-Dist: kaldifst>=1.7.14
Requires-Dist: loguru>=0.7.2
Requires-Dist: modelscope>=1.19.1
Requires-Dist: nest-asyncio>=1.6.0
Requires-Dist: numpy>=1.24
Requires-Dist: protobuf>=3.20.0
Requires-Dist: pydantic>=2.9.2
Requires-Dist: soundfile>=0.12.1
Requires-Dist: soxr>=0.5.0
Requires-Dist: uvicorn>=0.32.0
Requires-Dist: wasabi>=1.1.3
Provides-Extra: benchmark
Requires-Dist: pyaudio>=0.2.14; extra == 'benchmark'
Provides-Extra: litdata
Requires-Dist: litdata>=0.2.46; extra == 'litdata'
Description-Content-Type: text/markdown

![fasr logo](assets/logo.svg)

# fasr

`fasr` is a production-ready Python speech inference framework designed for building composable and extensible speech processing systems.

It is built around `AudioPipeline`, which lets you freely compose VAD, ASR, punctuation restoration, language identification, and custom components for offline transcription, batch jobs, and online services.

📖 Chinese README: [README_ZH.md](README_ZH.md)

## Key Features

- **Plugin-based model ecosystem**: install model plugins on demand and combine them as needed.
- **Engineering-friendly pipeline**: unified data structures and component interfaces for easier maintenance.
- **High-performance inference**: asynchronous component execution to better leverage CPU/GPU resources.
- **Production-oriented design**: supports batch, streaming, and service deployment patterns.

## Pipeline Components

- `loader`: loads local/remote audio and builds an `Audio` object.
- `detector`: VAD endpoint detection, splitting audio into speech `segments`.
- `recognizer`: ASR transcription for each detected segment.
- `sentencizer`: punctuation restoration and sentence segmentation.
- `identifier`: optional language identification (LID) for multilingual use cases.
- `custom`: custom components via `add_pipe()` for additional pre/post-processing (see the sketch below).
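
A minimal custom component might look like the following sketch. It assumes `add_pipe()` also accepts a plain callable that receives and returns the pipeline's `Audio` object, and that `Audio` lives in `fasr.data`; the actual component interface may differ:

```python
from fasr import AudioPipeline
from fasr.data import Audio  # assumed import path for the Audio type


# Hypothetical custom post-processing component: strip whitespace
# from each channel transcript after recognition.
def strip_text(audio: Audio) -> Audio:
    for channel in audio.channels:
        if channel.text:
            channel.text = channel.text.strip()
    return audio


pipeline = AudioPipeline().add_pipe(strip_text)  # assumed: callables are accepted
```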

## Quick Start

`fasr` model capabilities are provided by plugin packages. Install the plugins you need first:

```bash
pip install fasr-vad-marblenet fasr-asr-qwen3
```

```python
from fasr import AudioPipeline

# Build pipeline (model weights are downloaded automatically on first run)
asr = (
    AudioPipeline()
    .add_pipe("detector", model="marblenet")
    .add_pipe("recognizer", model="qwen3_0_6b")
)

# Single-audio inference
audio = asr("example.wav")
for channel in audio.channels:
    print(channel.text)

# Batch inference
audios = asr.run(["1.wav", "2.wav", "3.wav"])
for audio in audios:
    for channel in audio.channels:
        print(channel.text)

# Streaming output for large batches (results are yielded one by one)
for audio in asr.stream(["1.wav", "2.wav", "3.wav"]):
    for channel in audio.channels:
        print(channel.text)
```
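
For service deployment (one of the patterns listed under Key Features, and the reason `fastapi` and `uvicorn` are dependencies), a pipeline can be wrapped in a web service. The sketch below is a hypothetical wrapper, not a built-in `fasr` server:

```python
from fastapi import FastAPI

from fasr import AudioPipeline

app = FastAPI()
asr = (
    AudioPipeline()
    .add_pipe("detector", model="marblenet")
    .add_pipe("recognizer", model="qwen3_0_6b")
)


@app.post("/transcribe")
def transcribe(path: str):
    # Run the pipeline on one audio file and return per-channel text.
    audio = asr(path)
    return {"text": [channel.text for channel in audio.channels]}

# Run with (module name is illustrative): uvicorn my_service:app --port 8000
```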

## Model Plugins

Install task-specific plugins and compose your own pipeline:

| Task | Plugin package | Typical model IDs |
|---|---|---|
| VAD | `fasr-vad-marblenet` / `fasr-vad-fsmn` / `fasr-vad-firered` | `marblenet` / `fsmn` / `firered` |
| ASR | `fasr-asr-qwen3` / `fasr-asr-paraformer` / `fasr-asr-firered` / `fasr-asr-fun` | `qwen3_0_6b` / `paraformer` / `firered_aed` / `fun_asr_nano` |
| Punc | `fasr-punc-ct-transformer` | `ct_transformer` |
| LID | `fasr-lid-firered` | `firered` |
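
After installing the matching plugins, any combination from the table composes the same way as the Quick Start. The pairing of component names to model IDs below is an assumption; in particular, it assumes punctuation models attach to the `sentencizer` component:

```python
from fasr import AudioPipeline

# Swap in plugins from the table above; comments name the plugin packages
asr = (
    AudioPipeline()
    .add_pipe("detector", model="fsmn")               # fasr-vad-fsmn
    .add_pipe("recognizer", model="paraformer")       # fasr-asr-paraformer
    .add_pipe("sentencizer", model="ct_transformer")  # fasr-punc-ct-transformer
)
```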

## How to Build a Model Plugin

The plugin mechanism is based on [`catalogue`](https://github.com/explosion/catalogue)
+ Python entry points, with **install-time registration and lazy loading at runtime**.

1. Inherit the task base class and register with `registry`:

```python
# my_fasr_vad/models/vad.py
from fasr.config import registry
from fasr.data import AudioSpan
from fasr.model import VADModel


@registry.vad_models.register("my_vad")
class MyVADModel(VADModel):
    def load_checkpoint(self, checkpoint_dir=None):
        """Load (or download) the model weights."""
        ...

    def detect(self, audio: AudioSpan):
        """Offline VAD: split `audio` into speech segments."""
        ...
```

   `VADModel` now mirrors `ASRModel`: a single base class can expose offline
   VAD via `detect()` and streaming VAD via `push_chunk()`. A streaming-capable
   model may therefore implement:

```python
from typing import Iterable

from fasr.data import AudioChunk


def push_chunk(self, chunk: AudioChunk) -> Iterable[AudioChunk]:
    """Streaming VAD: consume one input chunk, yield zero or more event chunks."""
    ...
```

   The streaming path emits VAD events (`segment_start` / `segment_mid` /
   `segment_end`) as `AudioChunk` objects so downstream streaming ASR can start,
   continue, and flush recognition at the correct boundaries.
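
   As an illustration of that event flow, a hypothetical `push_chunk` body might look like this (reusing the imports above); `is_speech()`, `in_segment`, and the `vad_state` attribute are invented names for the sketch, not the real `fasr` API:

```python
def push_chunk(self, chunk: AudioChunk) -> Iterable[AudioChunk]:
    speech = self.is_speech(chunk)  # hypothetical per-chunk speech classifier
    if speech and not self.in_segment:
        self.in_segment = True
        chunk.vad_state = "segment_start"  # downstream ASR starts decoding
        yield chunk
    elif speech:
        chunk.vad_state = "segment_mid"    # downstream ASR keeps decoding
        yield chunk
    elif self.in_segment:
        self.in_segment = False
        chunk.vad_state = "segment_end"    # downstream ASR flushes its result
        yield chunk
```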

2. Declare the entry point in `pyproject.toml`.
   **Use underscore-style group names** that match `catalogue` namespaces:

```toml
[project.entry-points."fasr_vad_models"]
my_vad = "my_fasr_vad.models.vad:MyVADModel"
```

   Task to entry point group mapping:

   | Task | Entry point group |
   |---|---|
   | ASR | `fasr_asr_models` |
   | Streaming ASR | `fasr_stream_asr_models` |
   | VAD | `fasr_vad_models` |
   | Streaming VAD | `fasr_stream_vad_models` |
   | Punc | `fasr_punc_models` |
   | LID | `fasr_lid_models` |

   > Do **not** use dot-style groups such as `fasr.vad_models`.
   > `catalogue` resolves entry points by underscore naming, and dot-style names
   > will not be discovered by `registry.resolve(cfg)`.
   > `fasr_stream_vad_models` is still kept as a compatibility entry point for
   > streaming-only VAD plugins, but the runtime interface is unified on
   > `VADModel.push_chunk(...)`.
   > If a registration name contains dots (for example `stream_fsmn.onnx`),
   > quote it in TOML:
   > `"stream_fsmn.onnx" = "..."`.

3. Install your plugin package (`pip install -e .` or `uv pip install -e .`).
   Then `@vad_models = "my_vad"` in config will be discovered automatically by
   `fasr.load(cfg)` without manual imports. Heavy dependencies from other plugins
   (for example `torch`, `vllm`) are not loaded unless those plugins are selected.
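
   A quick sanity check after installation (a sketch; it assumes `registry.vad_models` is a `catalogue` registry with entry-point support, and the config section layout is illustrative `confection` syntax):

```python
from confection import Config

from fasr.config import registry

# Look up the entry point directly; the plugin module is imported only
# at this moment (lazy loading).
print(registry.vad_models.get("my_vad"))

# Or resolve a config section the way fasr.load(cfg) would.
cfg = Config().from_str("""
[model]
@vad_models = "my_vad"
""")
model = registry.resolve(cfg)["model"]
```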
