Metadata-Version: 2.4
Name: speechgate-rs
Version: 0.1.1
Requires-Dist: numpy>=1.24
Summary: Rust speech gate with Python bindings
Author: di-osc
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# speechgate-rs

Rust implementation of the FASR energy speech gate with Python bindings.

## Install

Install the package from PyPI after a release is published:

```bash
pip install speechgate-rs
```

Install the latest version directly from GitHub:

```bash
pip install "git+https://github.com/di-osc/speechgate-rs.git"
```

For local development, install an editable release build with maturin:

```bash
uv sync
env -u CONDA_PREFIX VIRTUAL_ENV=.venv maturin develop --release
```

## Usage

```python
import numpy as np
from speechgate_rs import EnergySpeechGate

audio = np.zeros(16000, dtype=np.float32)
gate = EnergySpeechGate(base_thresh=0.008, max_thresh=0.035)
mask = gate.compute_keep_mask(audio, sample_rate=16000)
gated = gate.apply_array(audio, sample_rate=16000)
```

For streaming audio, keep one stateful gate per stream:

```python
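import numpy as np
from speechgate_rs import StreamingEnergySpeechGate

# Example input: one second of 16 kHz audio split into ten 100 ms chunks.
audio = np.zeros(16000, dtype=np.float32)
chunks = np.array_split(audio, 10)

stream_gate = StreamingEnergySpeechGate(stream_context_ms=3000, fade_ms=5)
for i, chunk in enumerate(chunks):
    gated_chunk = stream_gate.process_chunk(
        chunk,
        sample_rate=16000,
        is_last=(i == len(chunks) - 1),  # flag the final chunk of the stream
    )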
```

For realtime services that process interleaved streams, use the multi-stream
wrapper:

```python
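import numpy as np
from speechgate_rs import MultiStreamEnergySpeechGate

# Example input: 100 ms of 16 kHz audio belonging to one session.
chunk = np.zeros(1600, dtype=np.float32)

gate = MultiStreamEnergySpeechGate()
gated_chunk = gate.process_chunk(
    "session-1",  # stream identifier
    chunk,
    sample_rate=16000,
    is_last=False,
)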
```
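
The idea behind a multi-stream wrapper can be sketched in plain Python: route each chunk to a stateful processor looked up by stream id. This is only an illustration of the pattern, not the binding's implementation. `ChunkCounter` and `PerStreamRegistry` are hypothetical names, and dropping a stream's state once `is_last` is seen is an assumption.

```python
class ChunkCounter:
    """Toy stand-in for a stateful per-stream gate: tags chunks with their index."""

    def __init__(self):
        self.seen = 0

    def process_chunk(self, chunk, is_last=False):
        self.seen += 1
        return (self.seen, chunk)


class PerStreamRegistry:
    """Route interleaved chunks to one stateful processor per stream id."""

    def __init__(self, factory):
        self._factory = factory
        self._streams = {}

    def process_chunk(self, stream_id, chunk, is_last=False):
        processor = self._streams.get(stream_id)
        if processor is None:
            # First chunk of this stream: create fresh state.
            processor = self._streams[stream_id] = self._factory()
        result = processor.process_chunk(chunk, is_last=is_last)
        if is_last:
            # Assumption: state is discarded once a stream ends.
            del self._streams[stream_id]
        return result


registry = PerStreamRegistry(ChunkCounter)
a1 = registry.process_chunk("session-1", "a")
b1 = registry.process_chunk("session-2", "b")
a2 = registry.process_chunk("session-1", "c", is_last=True)
a3 = registry.process_chunk("session-1", "d")  # starts again from fresh state
```

Interleaved sessions never see each other's state, which is the property a realtime service needs when chunks from many clients arrive on one worker.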

The binding keeps the same energy-gate semantics as the Python `EnergySpeechGate`
implementation in `fasr-service-realtime`: adaptive RMS thresholding, short
voice-burst removal, short silence-gap filling, padding, silence pass windows,
streaming context, cross-chunk fade continuity, and fade envelopes.
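
To make the mask clean-up steps concrete, here is a minimal pure-Python sketch of short voice-burst removal, short silence-gap filling, and padding on a per-window boolean mask. It is not the binding's code: the helper names are hypothetical, and the order of the three steps (drop bursts, then fill gaps, then pad) is an assumption.

```python
def runs(mask):
    """Yield (start, end, value) for each run of equal values; end is exclusive."""
    start = 0
    for i in range(1, len(mask) + 1):
        if i == len(mask) or mask[i] != mask[start]:
            yield start, i, mask[start]
            start = i


def drop_short_bursts(mask, min_len):
    """Reject voice runs shorter than min_len windows."""
    out = list(mask)
    for start, end, value in runs(mask):
        if value and end - start < min_len:
            out[start:end] = [False] * (end - start)
    return out


def fill_short_gaps(mask, max_gap):
    """Keep silence runs of at most max_gap windows that sit between voice runs."""
    out = list(mask)
    for start, end, value in runs(mask):
        interior = start > 0 and end < len(mask)
        if not value and interior and end - start <= max_gap:
            out[start:end] = [True] * (end - start)
    return out


def pad_voice(mask, pad):
    """Extend every voice run by pad windows on both sides."""
    out = list(mask)
    for start, end, value in runs(mask):
        if value:
            lo, hi = max(0, start - pad), min(len(mask), end + pad)
            out[lo:hi] = [True] * (hi - lo)
    return out


raw = [c == "1" for c in "0100111001110000"]
cleaned = pad_voice(fill_short_gaps(drop_short_bursts(raw, 3), 2), 1)
cleaned_text = "".join("1" if v else "0" for v in cleaned)
```

In this example the isolated one-window burst is dropped, the two-window gap between the longer bursts is filled, and the merged region grows by one window on each side. The arguments `3`, `2`, and `1` play the roles of `min_voice_windows`, `max_silence_gap_windows`, and `pad_voice_windows`.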

## Parameters

| Parameter | Default | Suggested range | Meaning | Increase / decrease effect |
| --- | ---: | --- | --- | --- |
| `enabled` | `True` | `True` or `False` | Enables the gate. When `False`, `apply_array` returns the input audio unchanged. | Turn on to filter silence/noise; turn off to bypass the gate completely. |
| `window_ms` | `10` | `5`-`30` ms | Analysis window length. RMS energy is computed once per window. | Larger is steadier but slower to react; smaller reacts faster but is more sensitive to clicks and short spikes. |
| `base_thresh` | `0.008` | `0.001`-`0.03` RMS | Minimum RMS threshold. The adaptive threshold will never go below this value. | Larger rejects more quiet speech/noise; smaller keeps softer speech but may pass more background noise. |
| `threshold_ratio` | `2.0` | `1.0`-`5.0` | Multiplier applied to the estimated noise floor before clamping. | Larger makes the gate stricter in noisy audio; smaller opens the gate more easily. |
| `max_thresh` | `0.035` | `0.01`-`0.1` RMS | Maximum RMS threshold. The adaptive threshold will never go above this value. | Larger allows the adaptive threshold to become stricter in loud noise; smaller protects quieter speech from being rejected. |
| `smooth_alpha` | `0.2` | `0.01`-`1.0` | Exponential smoothing factor for per-window RMS values. | Larger follows energy changes faster but may flicker; smaller is steadier but can lag at speech boundaries. |
| `min_voice_windows` | `5` | `1`-`20` windows | Minimum consecutive voice windows required to keep a speech region. | Larger removes more short bursts but can drop very short words; smaller keeps brief sounds but may pass clicks. |
| `attenuation` | `0.0` | `0.0`-`1.0` gain | Gain applied to rejected audio. `0.0` fully mutes it. | Larger keeps more background ambience; smaller makes rejected regions quieter. |
| `noise_floor_percentile` | `20.0` | `1.0`-`50.0` | Percentile of smoothed RMS values used as the adaptive noise floor estimate. | Larger estimates a higher noise floor and becomes stricter; smaller estimates quieter background and opens more easily. |
| `max_silence_gap_windows` | `8` | `0`-`30` windows | Maximum silent gap to fill between two voice regions. | Larger preserves pauses inside speech but may keep noise between phrases; smaller cuts internal pauses more aggressively. |
| `fade_ms` | `5` | `0`-`50` ms | Fade length when switching between kept and rejected audio. | Larger makes transitions smoother but may smear boundaries; smaller is tighter but can click or sound abrupt. |
| `stream_context_ms` | `3000` | `0`-`10000` ms | Context duration used by streaming integrations to preserve recent audio history. The stateless array APIs accept it only for config parity. | Larger gives streaming code more history but uses more memory; smaller is lighter but has less context. |
| `pad_voice_windows` | `2` | `0`-`20` windows | Windows added before and after detected voice regions. | Larger protects speech starts/ends but keeps more surrounding noise; smaller trims tighter but can clip onsets or offsets. |
| `pass_windows` | `0` | `0`-`20` windows | Non-voice windows kept after a voice region as a trailing hold. | Larger makes streaming output less abrupt; smaller removes trailing silence sooner. |
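
As a rough illustration of how the threshold parameters in the table interact, the sketch below computes per-window RMS, smooths it, estimates a noise floor, and clamps the scaled result. It is a stdlib-only approximation written from the parameter descriptions, not the Rust implementation. The function names, the use of non-overlapping windows, and the linear-interpolation percentile are all assumptions.

```python
import math


def window_rms(samples, sample_rate, window_ms=10):
    """RMS energy of each full, non-overlapping analysis window."""
    size = max(1, sample_rate * window_ms // 1000)
    return [
        math.sqrt(sum(x * x for x in samples[i:i + size]) / size)
        for i in range(0, len(samples) - size + 1, size)
    ]


def smooth(values, alpha=0.2):
    """Exponential smoothing: y[n] = alpha * x[n] + (1 - alpha) * y[n - 1]."""
    out, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * v + (1.0 - alpha) * prev
        out.append(prev)
    return out


def percentile(values, q):
    """Linearly interpolated percentile of a non-empty list, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def adaptive_threshold(smoothed, base_thresh=0.008, max_thresh=0.035,
                       threshold_ratio=2.0, noise_floor_percentile=20.0):
    """Scale the noise-floor estimate, then clamp to [base_thresh, max_thresh]."""
    if not smoothed:
        return base_thresh
    noise_floor = percentile(smoothed, noise_floor_percentile)
    return min(max(noise_floor * threshold_ratio, base_thresh), max_thresh)


# 100 ms of quiet signal followed by 100 ms of loud signal at 16 kHz.
samples = [0.001] * 1600 + [0.1] * 1600
rms = smooth(window_rms(samples, 16000))
thresh = adaptive_threshold(rms)
voiced = [r >= thresh for r in rms]
```

Here the estimated noise floor is so low that the scaled value falls under `base_thresh`, so the threshold is clamped up to `0.008`. With a loud, steady background the same formula would instead saturate at `max_thresh`.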

## Verify

For performance-sensitive checks, build the native extension in release mode
before running tests:

```bash
env -u CONDA_PREFIX VIRTUAL_ENV=.venv maturin develop --release
uv run pytest tests -q
```

The test suite includes a NumPy reference implementation and verifies that the
Rust binding returns identical masks, gated output, and compacted output while
running faster than the NumPy reference on the benchmark audio.
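
A parity check of this kind can be sketched with the standard library alone. The helpers below are hypothetical and are not taken from the test suite. They compare the outputs of two callables and report a best-of-N speedup. Taking the best of several runs is one common way to damp timing noise, not necessarily what the suite does.

```python
import time


def best_time(fn, *args, repeats=5):
    """Smallest wall-clock time over several calls, to damp scheduler noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def check_parity(candidate, reference, audio):
    """Return (outputs_match, speedup) for two implementations of one gate."""
    matches = list(candidate(audio)) == list(reference(audio))
    # Guard against a zero reading from a coarse timer.
    speedup = best_time(reference, audio) / max(best_time(candidate, audio), 1e-12)
    return matches, speedup


# Toy stand-ins: both mute every sample, one via a loop and one via a repeat.
def reference(audio):
    return [0.0 for _ in audio]


def candidate(audio):
    return [0.0] * len(audio)


matches, speedup = check_parity(candidate, reference, [0.5] * 10000)
```

In a real test, `candidate` would wrap the Rust binding and `reference` the NumPy implementation, with an assertion that `matches` is true and `speedup` exceeds 1.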

