Metadata-Version: 2.4
Name: fasr-asr-firered
Version: 0.5.2
Summary: FireRed ASR for fasr (bundled fireredasr2 inference)
Author-email: fasr <wangmengdi06@58.com>
Requires-Python: <3.13,>=3.10
Description-Content-Type: text/markdown
Requires-Dist: fasr
Requires-Dist: kaldiio>=2.18.0
Requires-Dist: kaldi-native-fbank>=1.19.0
Requires-Dist: librosa>=0.10.0
Requires-Dist: numpy>=1.24
Requires-Dist: torch>=2.0.0
Requires-Dist: torchaudio
Requires-Dist: transformers>=4.36

# fasr-asr-firered

[Chinese documentation](README_ZH.md)

FireRedASR2 speech recognition for fasr. The plugin exposes two decoding modes:
AED, which can return token timestamps, and LLM, which favors full-text
accuracy but produces no timestamps.

## Install

```bash
pip install fasr-asr-firered
```

## Registered Models

| Registry name | Class | Best for |
|---|---|---|
| `firered` | `FireRedAEDForASR` | Default alias for AED mode |
| `firered_aed` | `FireRedAEDForASR` | Timestamped AED recognition |
| `firered_llm` | `FireRedLLMForASR` | LLM decoding, no timestamps |

Default checkpoints:

| Model | Checkpoint |
|---|---|
| `firered_aed` | `FireRedTeam/FireRedASR2-AED` |
| `firered_llm` | `FireRedTeam/FireRedASR2-LLM` |

## Pipeline Usage

```python
from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe("detector", model="fsmn")
    .add_pipe(
        "recognizer",
        model="firered_aed",
        device="cuda",
        beam_size=3,
        return_timestamp=True,
    )
    .add_pipe("sentencizer", model="ct_transformer")
)
```

Quick choices:

| Goal | Use | Result |
|---|---|---|
| Token timestamps | `model="firered_aed", return_timestamp=True` | Populates `span.tokens` |
| Full-text decoding | `model="firered_llm"` | Populates `span.raw_text`, no timestamps |
| Lower VRAM for AED | `use_half=True` | FP16 inference on GPU |
| CPU inference | `device="cpu"` | Runs without CUDA, slower |
| Wider search | `beam_size=5` | Potentially better accuracy, slower |
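
The decision table above can be sketched as a small helper. This is illustrative glue, not part of fasr — only the parameter names (`model`, `device`, `beam_size`, `use_half`, `return_timestamp`) come from the plugin:

```python
# Illustrative helper mirroring the decision table; not part of fasr.

def recognizer_kwargs(
    need_timestamps: bool,
    low_vram: bool = False,
    device: str = "cuda",
    beam_size: int = 3,
) -> dict:
    """Build kwargs for add_pipe("recognizer", **kwargs)."""
    if need_timestamps:
        # AED mode fills span.tokens with token timestamps.
        kwargs = {"model": "firered_aed", "return_timestamp": True}
        if low_vram:
            kwargs["use_half"] = True  # FP16 inference on GPU
    else:
        # LLM mode writes span.raw_text only; no timestamps.
        kwargs = {"model": "firered_llm"}
    kwargs["device"] = device
    kwargs["beam_size"] = beam_size
    return kwargs

print(recognizer_kwargs(need_timestamps=True, low_vram=True))
```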

## Confection Config

```toml
[asr_model]
@asr_models = "firered_aed"
device = "cuda"
beam_size = 3
return_timestamp = true
use_half = true
```
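
For LLM decoding the block is analogous: swap the registered name and use the LLM-specific parameters. The values below simply restate the documented defaults and are not a tuning recommendation:

```toml
[asr_model]
@asr_models = "firered_llm"
device = "cuda"
beam_size = 3
repetition_penalty = 1.2
temperature = 1.0
```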

Inside a pipeline:

```toml
[pipeline]
@pipelines = "AudioPipeline.v1"
pipe_order = ["recognizer"]

[pipeline.pipes]

[pipeline.pipes.recognizer]
@pipes = "thread_pipe"
batch_size = 2

[pipeline.pipes.recognizer.component]
@components = "recognizer"

[pipeline.pipes.recognizer.component.model]
@asr_models = "firered_aed"
device = "cuda"
beam_size = 3
return_timestamp = true
```

## Direct Model Usage

```python
from fasr.config import registry

model = registry.asr_models.get("firered_aed")(
    device="cuda",
    beam_size=3,
    return_timestamp=True,
)

spans = model.transcribe(audio_spans)
for span in spans:
    print(span.text)
```

Use local weights:

```python
model.load_checkpoint("/path/to/FireRedASR2-AED")
```

## Shared Parameters

| Parameter | Type / range | Default | Higher / true | Lower / false | Change when |
|---|---|---|---|---|---|
| `device` | `str` or `None` | `None` | `"cuda"` uses GPU | `"cpu"` uses CPU | Deployment target changes |
| `beam_size` | `int >= 1` | `3` | Wider search, slower, more memory | Faster, possibly lower accuracy | Accuracy/speed tradeoff |
| `decode_max_len` | `int >= 0` | `0` | Allows longer outputs | Shorter cap; `0` lets backend decide | Output is truncated or too long |

## AED Parameters

| Parameter | Type / range | Default | Higher / true | Lower / false | Change when |
|---|---|---|---|---|---|
| `use_half` | `bool` | `True` | Lower VRAM, faster on GPU | FP32, more stable | GPU memory or numeric stability matters |
| `nbest` | `int >= 1` | `1` | More hypotheses | Single best result | You need alternative hypotheses |
| `softmax_smoothing` | `float` | `1.25` | Smoother distribution | Sharper distribution | Beam search needs tuning |
| `aed_length_penalty` | `float` | `0.6` | Favors longer outputs | Weaker length adjustment | Output is too short or too long |
| `eos_penalty` | `float` | `1.0` | Discourages ending too early | Easier EOS | Decoding ends too early or too late |
| `return_timestamp` | `bool` | `True` | Returns token timestamps | Text only | You need word/character timing |
| `elm_weight` | `float` | `0.0` | More external LM influence | `0.0` disables external LM | You provide `elm_dir` |

## LLM Parameters

| Parameter | Type / range | Default | Higher value | Lower value | Change when |
|---|---|---|---|---|---|
| `decode_min_len` | `int >= 0` | `0` | Forces longer minimum output | Allows shorter output | Output ends too early |
| `repetition_penalty` | `float` | `1.2` | Stronger repetition suppression | Allows more repetition | Repeated phrases appear |
| `llm_length_penalty` | `float` | `0.0` | Favors longer outputs | Favors shorter outputs; `0.0` is neutral | Output length is biased |
| `temperature` | `float >= 0` | `1.0` | More diverse, less deterministic | More deterministic | You need stability or diversity |

Generic checkpoint fields such as `checkpoint`, `cache_dir`, `endpoint`,
`revision`, and `force_download` are inherited from the base model.
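
Because `checkpoint` and `cache_dir` are inherited, local weights can also be pinned in config rather than via `load_checkpoint`. The paths below are placeholders:

```toml
[asr_model]
@asr_models = "firered_aed"
checkpoint = "/path/to/FireRedASR2-AED"
cache_dir = "/path/to/cache"
```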

## Output

- AED writes `span.raw_text`.
- AED also fills `span.tokens` when `return_timestamp=True`.
- LLM writes `span.raw_text` and leaves `span.tokens` empty.
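
A minimal sketch of consuming AED output. The `Token` stand-in and its field names (`text`, `start_ms`, `end_ms`) are assumptions about the shape of `span.tokens`, not fasr's actual types:

```python
from dataclasses import dataclass

@dataclass
class Token:
    # Stand-in for the objects in span.tokens; field names are assumptions.
    text: str
    start_ms: int
    end_ms: int

def timed_transcript(tokens: list[Token]) -> str:
    """Render tokens as 'text[start-end]' pairs for quick inspection."""
    return " ".join(f"{t.text}[{t.start_ms}-{t.end_ms}]" for t in tokens)

tokens = [Token("你", 0, 120), Token("好", 120, 260)]
print(timed_transcript(tokens))  # → 你[0-120] 好[120-260]
```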

## Dependencies

- `fasr`
- `torch >= 2.0.0`
- `torchaudio`
- `transformers >= 4.36`
- `librosa >= 0.10.0`
- `kaldiio >= 2.18.0`
- `kaldi-native-fbank >= 1.19.0`
- Python 3.10–3.12
