Metadata-Version: 2.4
Name: fasr-vad-firered
Version: 0.5.2
Summary: FireRedVAD for fasr (bundled fireredvad inference)
Author-email: fasr <wangmengdi06@58.com>
Requires-Python: <3.13,>=3.10
Description-Content-Type: text/markdown
Requires-Dist: fasr
Requires-Dist: kaldiio>=2.18.0
Requires-Dist: kaldi-native-fbank>=1.19.0
Requires-Dist: numpy>=1.24
Requires-Dist: soundfile>=0.12.0
Requires-Dist: torch>=2.0.0

# fasr-vad-firered

[Chinese documentation](README_ZH.md)

FireRedVAD voice activity detection for fasr. This is an offline neural VAD
that loads FireRed's PyTorch checkpoint and returns `AudioSpan` speech
segments.

## Install

```bash
pip install fasr-vad-firered
```

## Registered Model

| Registry name | Class | Best for |
|---|---|---|
| `firered` | `FireRedForVAD` | Offline VAD with FireRed checkpoints |

The default checkpoint is `FireRedTeam/FireRedVAD`. Local checkpoint directories
must contain the upstream VAD files, typically `cmvn.ark` and `model.pth.tar`
or a `VAD/` subdirectory containing them.
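When pointing at a local directory, it can help to verify the expected layout up front. The sketch below is illustrative only; the helper name `resolve_vad_dir` is hypothetical and not part of this package:

```python
from pathlib import Path

def resolve_vad_dir(checkpoint_dir: str) -> Path:
    """Find the directory that actually holds the FireRedVAD files.

    Accepts either a directory containing cmvn.ark and model.pth.tar
    directly, or one with a VAD/ subdirectory containing them.
    """
    root = Path(checkpoint_dir)
    for candidate in (root, root / "VAD"):
        if (candidate / "cmvn.ark").is_file() and (candidate / "model.pth.tar").is_file():
            return candidate
    raise FileNotFoundError(
        f"{checkpoint_dir} does not contain cmvn.ark and model.pth.tar"
    )
```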

## Pipeline Usage

```python
from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe(
        "detector",
        model="firered",
        speech_threshold=0.4,
        use_gpu=False,
    )
    .add_pipe("recognizer", model="firered_aed")
    .add_pipe("sentencizer", model="ct_transformer")
)
```

Quick choices:

| Goal | Use | Result |
|---|---|---|
| Reduce noise false positives | `speech_threshold=0.55` | Requires stronger speech posterior |
| Keep weak speech | `speech_threshold=0.3` | More sensitive, but may include noise |
| Use GPU inference | `use_gpu=True` | Faster when CUDA is available |
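Conceptually, `speech_threshold` is compared against a per-frame speech posterior: frames at or above the threshold count as speech, so raising it drops weak frames and shortens or removes borderline segments. The sketch below illustrates that effect on toy posteriors; it is not the package's actual implementation:

```python
def frames_to_segments(posteriors, speech_threshold, frame_ms=10):
    """Turn per-frame speech posteriors into (start_ms, end_ms) segments.

    Illustrative only: shows how a higher threshold discards weak frames.
    """
    segments, start = [], None
    for i, p in enumerate(posteriors):
        if p >= speech_threshold and start is None:
            start = i * frame_ms          # speech run begins
        elif p < speech_threshold and start is not None:
            segments.append((start, i * frame_ms))  # speech run ends
            start = None
    if start is not None:
        segments.append((start, len(posteriors) * frame_ms))
    return segments

posteriors = [0.1, 0.35, 0.7, 0.8, 0.45, 0.2]
print(frames_to_segments(posteriors, 0.4))   # [(20, 50)]
print(frames_to_segments(posteriors, 0.55))  # [(20, 40)] -- the 0.45 frame is dropped
```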

## Confection Config

```toml
[vad_model]
@vad_models = "firered"
use_gpu = false
speech_threshold = 0.4
```

Inside a pipeline:

```toml
[pipeline]
@pipelines = "AudioPipeline.v1"
pipe_order = ["detector"]

[pipeline.pipes]

[pipeline.pipes.detector]
@pipes = "thread_pipe"

[pipeline.pipes.detector.component]
@components = "detector"

[pipeline.pipes.detector.component.model]
@vad_models = "firered"
use_gpu = false
speech_threshold = 0.4
```

## Direct Model Usage

```python
from fasr.config import registry
from fasr.data import AudioSpan, Waveform

model = registry.vad_models.get("firered")(
    speech_threshold=0.4,
    use_gpu=True,
)

audio = AudioSpan(waveform=Waveform.from_file("example.wav"), start_ms=0)
segments = model.detect(audio)
for segment in segments:
    print(f"{segment.start_ms}ms - {segment.end_ms}ms")
```

To load weights from a local checkpoint directory:

```python
model.load_checkpoint("/path/to/FireRedVAD")
```

## Parameters

| Parameter | Type / range | Default | Higher value | Lower value | Change when |
|---|---|---|---|---|---|
| `use_gpu` | `bool` | `False` | Enables CUDA inference | Uses CPU | You have CUDA available and need speed |
| `speech_threshold` | `float`, `0.0` to `1.0` | `0.4` | More conservative; fewer false positives | More sensitive; more weak speech retained | Noise leaks in, or speech is missed |

Generic checkpoint fields such as `checkpoint`, `cache_dir`, `endpoint`,
`revision`, and `force_download` are inherited from the base model.

## Tuning Guide

| Symptom | Try first |
|---|---|
| Noise is detected as speech | Raise `speech_threshold` to `0.5` or `0.6` |
| Quiet speech is missed | Lower `speech_threshold` to `0.3` |
| CPU inference is too slow | Set `use_gpu=True` on a CUDA machine |

## Dependencies

- `fasr`
- `torch >= 2.0.0`
- `soundfile >= 0.12.0`
- `kaldiio >= 2.18.0`
- `kaldi-native-fbank >= 1.19.0`
- Python 3.10–3.12
