Metadata-Version: 2.4
Name: fasr-vad-silero
Version: 0.5.2
Summary: Silero VAD model for fasr
Author-email: fasr <790990241@qq.com>
Requires-Python: <3.13,>=3.10
Description-Content-Type: text/markdown
Requires-Dist: fasr
Requires-Dist: numpy>=1.24
Requires-Dist: torch>=2.0.0

# fasr-vad-silero

[Chinese documentation](README_ZH.md)

Streaming Silero voice activity detection (VAD) for fasr. The model is loaded
through `torch.hub` and wrapped as a fasr streaming `VADModel` that emits
`AudioChunk` objects whose VAD state is `segment_start`, `segment_mid`, or
`segment_end`.

## Install

```bash
pip install fasr-vad-silero
```

## Registered Model

| Registry name | Class | Best for |
|---|---|---|
| `stream_silero` | `SileroStreamVAD` | Lightweight streaming VAD |

## Streaming Usage

```python
from fasr.config import registry

vad = registry.vad_models.get("stream_silero")(
    threshold=0.5,
    silence_duration_ms=400,
)

# `audio_chunks` is not defined here: supply your own iterable of input audio chunks.
for input_chunk in audio_chunks:
    for speech_chunk in vad.push_chunk(input_chunk):
        print(speech_chunk.vad_state, speech_chunk.start_ms, speech_chunk.end_ms)
```

Quick choices:

| Goal | Use | Result |
|---|---|---|
| Reduce noise triggers | `threshold=0.65` | Requires higher speech probability |
| Keep quiet speech | `threshold=0.35` | More sensitive, with more false-positive risk |
| End speech sooner | `silence_duration_ms=200` | Lower endpoint latency |
| Avoid chopping pauses | `silence_duration_ms=700` | More tolerant of short pauses |
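Downstream code often wants whole speech segments rather than individual chunks. The following is a minimal sketch of grouping the emitted states into `(start_ms, end_ms)` pairs. `FakeChunk` and `iter_segments` are hypothetical names used only for illustration, and the sketch assumes that `vad_state` compares equal to the state names as plain strings and that each segment opens and closes on separate chunks. The real `AudioChunk` may use an enum or behave differently, so adapt the comparison accordingly.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class FakeChunk:
    """Hypothetical stand-in for an emitted chunk (only the fields printed above)."""
    vad_state: str
    start_ms: int
    end_ms: int


def iter_segments(chunks: Iterable[FakeChunk]) -> Iterator[Tuple[int, int]]:
    """Yield (start_ms, end_ms) for each segment_start ... segment_end run."""
    seg_start: Optional[int] = None
    for chunk in chunks:
        if chunk.vad_state == "segment_start":
            seg_start = chunk.start_ms
        elif chunk.vad_state == "segment_end" and seg_start is not None:
            yield seg_start, chunk.end_ms
            seg_start = None
        # segment_mid chunks lie inside an open segment; nothing to record.


demo = [
    FakeChunk("segment_start", 320, 352),
    FakeChunk("segment_mid", 352, 384),
    FakeChunk("segment_end", 384, 416),
    FakeChunk("segment_start", 1600, 1632),
    FakeChunk("segment_end", 1632, 1664),
]
segments = list(iter_segments(demo))
print(segments)  # [(320, 416), (1600, 1664)]
```

In a real pipeline the same generator could be fed from the chunks returned by `vad.push_chunk(...)`, provided the state comparison matches the actual type.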

## Confection Config

```ini
[vad_model]
@vad_models = "stream_silero"
threshold = 0.5
silence_duration_ms = 400
sample_rate = 16000
chunk_size_ms = 32
```

## Parameters

| Parameter | Type / range | Default | Higher value | Lower value | Change when |
|---|---|---|---|---|---|
| `threshold` | `float`, `0.0` to `1.0` | `0.5` | More conservative; fewer noise-triggered starts | More sensitive; catches more weak speech | Noise triggers false starts, or quiet speech is missed |
| `silence_duration_ms` | `int >= 0` | `400` | Tolerates longer pauses before ending speech | Faster endpoint | Speech is chopped, or endpoint is late |
| `sample_rate` | `int` | `16000` | Keep at 16 kHz for Silero | Keep at 16 kHz for Silero | Usually do not change |
| `chunk_size_ms` | `int` | `32` | Larger input chunks, fewer calls | Smaller chunks, lower latency | Realtime scheduling needs tuning |
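To see how the time-based parameters relate to each other, the arithmetic below converts them to samples and chunk counts. It is a sketch under one assumption: that the endpoint decision is made at chunk boundaries, so the silence window is rounded up to whole chunks. The helper names are hypothetical and not part of fasr.

```python
import math


def samples_per_chunk(sample_rate: int, chunk_size_ms: int) -> int:
    """Number of audio samples in one input chunk."""
    return sample_rate * chunk_size_ms // 1000


def silent_chunks_to_end(silence_duration_ms: int, chunk_size_ms: int) -> int:
    """Whole chunks of silence needed to cover the silence window (assumed rounding up)."""
    return math.ceil(silence_duration_ms / chunk_size_ms)


print(samples_per_chunk(16000, 32))    # 512 samples per 32 ms chunk at 16 kHz
print(silent_chunks_to_end(400, 32))   # 13 chunks, i.e. 416 ms of silence
print(silent_chunks_to_end(200, 32))   # 7 chunks, i.e. 224 ms of silence
```

Under that assumption, the effective endpoint delay moves in steps of `chunk_size_ms`, so small changes to `silence_duration_ms` may have no visible effect.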

## Tuning Guide

| Symptom | Try first |
|---|---|
| Background noise starts speech | Raise `threshold` to `0.6` or `0.7` |
| Quiet speech is missed | Lower `threshold` to `0.35` or `0.4` |
| Speech ends too late | Lower `silence_duration_ms` to `200` or `300` |
| Speech is split during pauses | Raise `silence_duration_ms` to `600` or `800` |
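The interaction between the two knobs can be illustrated with a toy endpointing loop over synthetic per-chunk speech probabilities. This is not the Silero or fasr implementation: `simulate_endpointing` is a hypothetical function that assumes a simple rule (speech starts when the probability reaches `threshold`, and ends once accumulated silence reaches `silence_duration_ms`).

```python
from typing import List, Optional, Sequence, Tuple


def simulate_endpointing(
    probs: Sequence[float],
    threshold: float = 0.5,
    silence_duration_ms: int = 400,
    chunk_size_ms: int = 32,
) -> List[Tuple[int, int]]:
    """Toy model: return (start_ms, end_ms) segments from per-chunk probabilities."""
    segments: List[Tuple[int, int]] = []
    start_ms: Optional[int] = None
    silence_ms = 0
    for i, p in enumerate(probs):
        t0 = i * chunk_size_ms
        if p >= threshold:
            if start_ms is None:
                start_ms = t0          # first speech chunk opens a segment
            silence_ms = 0             # any speech resets the silence counter
        elif start_ms is not None:
            silence_ms += chunk_size_ms
            if silence_ms >= silence_duration_ms:
                # Close the segment where the silent run began.
                segments.append((start_ms, t0 + chunk_size_ms - silence_ms))
                start_ms, silence_ms = None, 0
    if start_ms is not None:           # stream ended with a segment still open
        segments.append((start_ms, len(probs) * chunk_size_ms - silence_ms))
    return segments


# Two bursts of speech separated by a 256 ms pause (8 chunks of 32 ms).
probs = [0.9] * 10 + [0.1] * 8 + [0.9] * 10 + [0.1] * 30
short = simulate_endpointing(probs, silence_duration_ms=200)
long = simulate_endpointing(probs, silence_duration_ms=400)
print(short)  # [(0, 320), (576, 896)]  pause splits the speech
print(long)   # [(0, 896)]              pause is bridged

# Borderline noise at probability 0.55 starts a segment only at the lower threshold.
noise = [0.55] * 5 + [0.1] * 20
print(simulate_endpointing(noise, threshold=0.5, silence_duration_ms=200))   # [(0, 160)]
print(simulate_endpointing(noise, threshold=0.65, silence_duration_ms=200))  # []
```

The toy model mirrors the table above: a longer silence window merges speech across pauses at the cost of a later endpoint, and a higher threshold suppresses borderline noise at the cost of missing quiet speech.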

## Dependencies

- `fasr`
- `numpy >= 1.24`
- `torch >= 2.0.0`
- Python 3.10-3.12
