Metadata-Version: 2.4
Name: vadonnx
Version: 0.1.0
Summary: Load arbitrary Voice Activity Detection (VAD) models behind a unified ONNX API
Author-email: JarbasAi <jarbasai@mailfence.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/TigreGotico/vadonnx
Project-URL: Repository, https://github.com/TigreGotico/vadonnx
Keywords: vad,voice activity detection,onnx,speech,silero,audio
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Intended Audience :: Developers
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: onnxruntime>=1.17
Requires-Dist: huggingface_hub
Provides-Extra: fsmn
Requires-Dist: kaldi-native-fbank; extra == "fsmn"
Provides-Extra: mic
Requires-Dist: sounddevice; extra == "mic"
Provides-Extra: resample
Requires-Dist: scipy; extra == "resample"
Provides-Extra: all
Requires-Dist: kaldi-native-fbank; extra == "all"
Requires-Dist: sounddevice; extra == "all"
Requires-Dist: scipy; extra == "all"
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Provides-Extra: convert
Requires-Dist: onnx; extra == "convert"
Requires-Dist: huggingface_hub; extra == "convert"
Requires-Dist: requests; extra == "convert"
Provides-Extra: convert-marblenet
Requires-Dist: vadonnx[convert]; extra == "convert-marblenet"
Requires-Dist: nemo_toolkit[asr]; extra == "convert-marblenet"
Requires-Dist: torch; extra == "convert-marblenet"
Provides-Extra: convert-speechbrain
Requires-Dist: vadonnx[convert]; extra == "convert-speechbrain"
Requires-Dist: speechbrain; extra == "convert-speechbrain"
Requires-Dist: torch; extra == "convert-speechbrain"
Dynamic: license-file

# vadonnx

Load arbitrary **Voice Activity Detection** models behind a single, unified API: every
model runs through [ONNX Runtime](https://onnxruntime.ai/). One streaming/batch
interface, one audio-format story, pluggable models.

```python
from vadonnx import load_vad

vad = load_vad("silero")                              # bundled, works fully offline
# pcm_bytes: one chunk of raw int16 PCM; audio: a whole recording (numpy array)
prob = vad.process_chunk(pcm_bytes)                   # streaming → float in [0, 1]
segments = vad.get_speech_segments(audio, sample_rate=16000)
# -> [SpeechSegment(start=0.32, end=2.27), SpeechSegment(start=3.27, end=4.45), ...]
```
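
In the example above, `process_chunk()` takes raw `int16` PCM bytes. Many audio sources, such as microphone callbacks and decoders, produce float samples instead. The sketch below shows how to convert between the two with plain NumPy. The helper names are hypothetical and not part of `vadonnx`. The little-endian byte order and the 32768/32767 scaling are common conventions assumed here, not taken from the library.

```python
import numpy as np


def pcm16_bytes_to_float(pcm: bytes) -> np.ndarray:
    """Decode little-endian int16 PCM bytes into float32 samples in [-1, 1)."""
    samples = np.frombuffer(pcm, dtype="<i2")
    # Divide by 32768 so the most negative int16 value maps exactly to -1.0.
    return samples.astype(np.float32) / 32768.0


def float_to_pcm16_bytes(audio: np.ndarray) -> bytes:
    """Encode float samples (nominally in [-1, 1]) as little-endian int16 PCM bytes."""
    clipped = np.clip(np.asarray(audio, dtype=np.float32), -1.0, 1.0)
    # Scale by 32767 so +1.0 maps to the largest int16 value without overflow.
    return np.round(clipped * 32767.0).astype("<i2").tobytes()
```

With these helpers, a float block from a capture callback could be passed as `vad.process_chunk(float_to_pcm16_bytes(block))`.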

## Why

Every VAD ships its own loader, audio format, feature pipeline and state handling.
`vadonnx` hides that behind one `VADModel` interface: feed it audio (raw `int16`
bytes, numpy arrays, any sample rate) and get back per-frame speech probabilities or
ready-made speech segments. Models are described *declaratively* by an
[`IOSignature`](docs/custom_models.md), so a single generic engine drives most of them
and you can point the same API at any custom `.onnx` file.
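
The paragraph above distinguishes per-frame speech probabilities from ready-made segments. A common way to get from one to the other is hysteresis thresholding, followed by merging short gaps and dropping very short segments. The sketch below illustrates that approach and assumes a fixed frame duration. The thresholds, durations, and the `Segment`/`probs_to_segments` names are illustrative. They are not necessarily what `get_speech_segments()` does internally.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class Segment:  # hypothetical stand-in for vadonnx's SpeechSegment
    start: float  # seconds
    end: float    # seconds


def probs_to_segments(probs: Sequence[float], frame_s: float,
                      onset: float = 0.5, offset: float = 0.35,
                      min_speech_s: float = 0.25,
                      min_silence_s: float = 0.1) -> list[Segment]:
    """Turn per-frame speech probabilities into speech segments (in seconds)."""
    spans = []
    start = None
    for i, p in enumerate(probs):
        if start is None and p >= onset:
            start = i  # speech starts on the first frame at or above `onset`
        elif start is not None and p < offset:
            spans.append([start, i])  # speech ends once it falls below `offset`
            start = None
    if start is not None:
        spans.append([start, len(probs)])  # speech runs to the end of the buffer

    # Merge spans separated by silence shorter than min_silence_s.
    merged = []
    for s, e in spans:
        if merged and (s - merged[-1][1]) * frame_s < min_silence_s:
            merged[-1][1] = e
        else:
            merged.append([s, e])

    # Drop spans shorter than min_speech_s, then convert frames to seconds.
    return [Segment(s * frame_s, e * frame_s) for s, e in merged
            if (e - s) * frame_s >= min_speech_s]
```

Using two thresholds (`onset` > `offset`) keeps a segment from flickering on and off when the probability hovers around a single cut-off.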

- **Lightweight runtime** — only `numpy`, `onnxruntime`, `huggingface_hub`.
- **Offline by default** — a small Silero model is bundled in the wheel.
- **Streaming and batch** — `process_chunk()` for live audio, `get_speech_segments()` /
  `probabilities()` for whole buffers.
- **Bring your own model** — load any ONNX VAD by path/URL with a signature.
- **Extensible** — third parties register backends/models via entry points.
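
For live audio, capture callbacks rarely deliver blocks of exactly the size a streaming model wants. If your model or wrapper expects fixed-size chunks, a small buffer can re-chunk arbitrary blocks before they reach `process_chunk()`. The sketch below is a minimal version of such a buffer. `ChunkBuffer` is a hypothetical helper. Its 512-sample default matches Silero's 16 kHz window, but that is an assumption here, because this README does not state the chunk size `vadonnx` expects.

```python
class ChunkBuffer:
    """Re-chunk arbitrary-sized int16 PCM blocks into fixed-size chunks."""

    def __init__(self, chunk_samples: int = 512, sample_width: int = 2) -> None:
        self.chunk_bytes = chunk_samples * sample_width
        self._buf = bytearray()

    def push(self, block: bytes) -> list[bytes]:
        """Append a block and return every complete chunk now available."""
        self._buf.extend(block)
        chunks = []
        while len(self._buf) >= self.chunk_bytes:
            chunks.append(bytes(self._buf[: self.chunk_bytes]))
            del self._buf[: self.chunk_bytes]
        return chunks

    @property
    def pending(self) -> int:
        """Number of buffered bytes not yet emitted as a chunk."""
        return len(self._buf)
```

In a capture loop, call `buf.push(block)` for each incoming block and pass every returned chunk to `vad.process_chunk()`. Leftover bytes stay buffered until the next block arrives.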

## Install

```bash
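uv pip install vadonnx              # runtime (numpy + onnxruntime + huggingface_hub)
uv pip install "vadonnx[mic]"       # + microphone examples (sounddevice)
uv pip install "vadonnx[fsmn]"      # + FSMN-VAD features (kaldi-native-fbank)
uv pip install "vadonnx[resample]"  # + resampling (scipy)
uv pip install "vadonnx[all]"       # all of the above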
```
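
The package metadata maps each runtime extra to one third-party package: `fsmn` → `kaldi-native-fbank`, `mic` → `sounddevice`, `resample` → `scipy`. The sketch below checks which of these are importable in the current environment. It probes the dependencies directly with the standard library and uses no `vadonnx` API. The `available_extras` name is hypothetical.

```python
import importlib.util

# Import names of the dependencies behind each optional extra.
EXTRAS = {
    "fsmn": "kaldi_native_fbank",
    "mic": "sounddevice",
    "resample": "scipy",
}


def available_extras() -> dict[str, bool]:
    """Report, per extra, whether its dependency can be imported."""
    return {extra: importlib.util.find_spec(module) is not None
            for extra, module in EXTRAS.items()}


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"vadonnx[{extra}]: {'installed' if ok else 'missing'}")
```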

## Models

| name | sample rate | parity vs upstream | notes |
|------|-------------|--------------------|-------|
| `silero` / `silero-8k` / `silero-op15` | 16 kHz / 8 kHz / 16 kHz | MAE 0 | bundled default, raw PCM |
| `marblenet` / `marblenet-int8` | 16 kHz | MAE 4e-4 | NVIDIA NeMo Frame-VAD, multilingual ([license](docs/licensing.md)) |
| `pyannote` / `pyannote-int8` | 16 kHz | MAE 0 | pyannote segmentation-3.0, windowed |
| `fsmn` / `fsmn-quant` | 16 kHz | tracks upstream | FunASR FSMN-VAD; needs `vadonnx[fsmn]` |
| `speechbrain` | 16 kHz | MAE 0 | SpeechBrain CRDNN, LibriParty-trained |
| `ten` | 16 kHz | n/a | feature extractor provided by TEN's native library |
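
Each model runs at a native rate of 16 kHz, or 8 kHz for `silero-8k`. `vadonnx` accepts audio at any sample rate and declares a `resample` extra that installs SciPy, but this README does not say how it resamples internally. If you prefer to convert audio yourself first, the sketch below resamples with NumPy linear interpolation. It applies no anti-aliasing filter, so for downsampling a polyphase filter such as `scipy.signal.resample_poly` is the better choice. `resample_linear` is a hypothetical helper, not part of `vadonnx`.

```python
import numpy as np


def resample_linear(audio: np.ndarray, src_rate: int, dst_rate: int) -> np.ndarray:
    """Resample a mono signal by linear interpolation (no anti-aliasing filter)."""
    if src_rate == dst_rate:
        return audio.astype(np.float32, copy=False)
    n_out = int(round(len(audio) * dst_rate / src_rate))
    # Timestamps (in seconds) of every input and output sample.
    t_in = np.arange(len(audio)) / src_rate
    t_out = np.arange(n_out) / dst_rate
    # Output samples past the last input timestamp repeat the final value.
    return np.interp(t_out, t_in, audio).astype(np.float32)
```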

See [docs/backends.md](docs/backends.md) for per-model detail and parity notes, and the
[benchmark](benchmark/results/REPORT.md) for measured comparisons across datasets,
including WebRTC and energy baselines.

Models other than the bundled Silero are downloaded on first use from the
[`TigreGotico`](https://huggingface.co/TigreGotico) Hugging Face org and cached under
`$XDG_DATA_HOME/vadonnx`.
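
To locate or clear downloaded models, you can resolve the cache directory yourself. The README only names `$XDG_DATA_HOME/vadonnx`. The sketch below assumes `vadonnx` follows the XDG Base Directory convention of falling back to `~/.local/share` when `$XDG_DATA_HOME` is unset or empty. That fallback is an assumption, and the function name is hypothetical.

```python
import os
from pathlib import Path


def vadonnx_cache_dir() -> Path:
    """Resolve $XDG_DATA_HOME/vadonnx, assuming the XDG default when unset or empty."""
    # An empty XDG_DATA_HOME is treated the same as an unset one.
    base = os.environ.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return Path(base) / "vadonnx"
```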

## CLI

```bash
vadonnx list                       # list available models
vadonnx probe silero               # print a model's ONNX input/output signature
vadonnx segment speech.wav         # print detected speech segments of a WAV
```
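
`vadonnx segment` prints the speech segments of a WAV file. The sketch below does roughly the same from Python using the standard-library `wave` module, and it only handles 16-bit PCM. `read_wav_int16` and `print_segments` are hypothetical helpers. The sketch uses `silero` because this README does not say which model the CLI uses by default. Passing an `int16` array at the file's native rate relies on the statement above that numpy arrays at any sample rate are accepted. The exact dtype `vadonnx` prefers is an assumption.

```python
import wave

import numpy as np


def read_wav_int16(path: str) -> tuple[np.ndarray, int]:
    """Read a 16-bit PCM WAV file; return (mono int16 samples, sample rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM WAV files are supported")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())
    samples = np.frombuffer(frames, dtype="<i2")
    if channels > 1:
        # Frames are interleaved; average the channels down to mono.
        samples = samples.reshape(-1, channels).mean(axis=1).astype(np.int16)
    return samples, rate


def print_segments(path: str) -> None:
    """Print speech segments of a WAV file, roughly like `vadonnx segment`."""
    from vadonnx import load_vad  # imported lazily so read_wav_int16 works without it

    audio, rate = read_wav_int16(path)
    vad = load_vad("silero")
    # Assumes get_speech_segments accepts an int16 array at the file's native rate.
    for seg in vad.get_speech_segments(audio, sample_rate=rate):
        print(f"{seg.start:.2f}s - {seg.end:.2f}s")
```

Calling `print_segments("speech.wav")` should then list one line per detected segment.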

## Documentation

- [Quickstart](docs/quickstart.md)
- [Streaming](docs/streaming.md)
- [Custom models & `IOSignature`](docs/custom_models.md)
- [Backends & parity notes](docs/backends.md)
- [Plugins](docs/plugins.md)
- [Model conversion](docs/conversion.md)
- [Licensing](docs/licensing.md)
- [API reference](docs/api.md)

## License

Apache-2.0. Bundled/downloaded model weights retain their upstream licenses — see
[docs/licensing.md](docs/licensing.md).
