Metadata-Version: 2.4
Name: voxlevel
Version: 0.1.0
Summary: Normalize WAV voice recordings to a consistent target dB level using AGC, VAD, and limiting
License-Expression: MIT
License-File: LICENSE
Keywords: agc,audio,leveling,normalization,vad,voice,volume,wav
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24
Requires-Dist: onnxruntime>=1.16
Requires-Dist: silero-vad<7,>=5.1
Provides-Extra: benchmark
Requires-Dist: hpss-voice-denoiser; extra == 'benchmark'
Requires-Dist: scipy; extra == 'benchmark'
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.9; extra == 'dev'
Description-Content-Type: text/markdown

# voxlevel

Normalize WAV voice recordings so all speakers sound at the same volume level (-6 dB by default), regardless of their distance from the microphone.

Built for real-world recordings: background noise, wind, echo, multiple speakers, and speakers who move while recording.

## Installation

```bash
pip install voxlevel
```

## Usage

### Python API

```python
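import numpy as np
import voxlevel

# From a WAV file: reads input.wav and writes the normalized audio to output.wav
voxlevel.normalize("input.wav", "output.wav")

# From a NumPy array: `audio_array` is a placeholder for your own mono samples
# (int16 is assumed here to match the 16-bit WAV constraint)
audio_array = np.zeros(16000, dtype=np.int16)  # placeholder: 1 s of silence at 16 kHz
result = voxlevel.normalize(audio_array, sample_rate=16000)

# With custom parameters
voxlevel.normalize(
    "input.wav",
    "output.wav",
    target_db=-6.0,          # level every speaker is brought to
    max_gain_db=30.0,        # upper bound on the boost applied to quiet speech
    rms_window_ms=400.0,     # length of the sliding RMS window
    smooth_window_ms=200.0,  # length of the gain-smoothing window
)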
```

### CLI

```bash
# Single file
voxlevel input.wav -o output.wav

# Batch processing
voxlevel *.wav -o normalized/

# Custom target level
voxlevel input.wav -o output.wav --target-db -3.0
```

## How it works

voxlevel uses a two-pass offline approach (not real-time compression):

1. **Preprocessing** -- DC removal + 80 Hz high-pass filter to cut wind noise, handling noise, and plosives
2. **Voice Activity Detection** -- Silero-VAD (ONNX) identifies speech vs. silence segments
3. **Automatic Gain Control** -- A sliding RMS envelope determines the gain needed at each sample to reach the target level; the gain is interpolated across silence gaps and smoothed bidirectionally
4. **Lookahead limiter** -- A 5 ms lookahead keeps peaks from exceeding the target level and causes less transient distortion than brick-wall clipping
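Steps 3 and 4 can be sketched in plain Python to show the mechanics. This is a hypothetical illustration, not voxlevel's code: samples are assumed to be floats in [-1, 1] (so dB values are relative to full scale), the `speech` mask stands in for Silero-VAD output, preprocessing and gain smoothing are omitted, and the helper names (`rms_envelope`, `agc_gain`, `lookahead_limit`), the linear interpolation across silence, and the min-then-average limiter design are all assumptions.

```python
import math

# Hypothetical sketch of steps 3 and 4; not voxlevel's implementation.


def db_to_linear(db: float) -> float:
    """dB -> linear amplitude factor: -6 dB ~= 0.501, +30 dB ~= 31.6."""
    return 10.0 ** (db / 20.0)


def rms_envelope(x: list[float], win: int) -> list[float]:
    """Centered sliding-window RMS, computed from a running sum of squares."""
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v * v)
    n, half = len(x), win // 2
    env = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        env.append(math.sqrt((prefix[hi] - prefix[lo]) / (hi - lo)))
    return env


def agc_gain(env: list[float], speech: list[bool],
             target_db: float = -6.0, max_gain_db: float = 30.0) -> list[float]:
    """Per-sample linear gain: measured on speech, interpolated across silence."""
    target, max_gain = db_to_linear(target_db), db_to_linear(max_gain_db)
    known = [i for i, s in enumerate(speech) if s and env[i] > 0.0]
    if not known:
        return [1.0] * len(env)  # no speech detected: leave the audio unchanged
    g = [0.0] * len(env)
    for i in known:
        g[i] = min(target / env[i], max_gain)  # cap the boost at max_gain_db
    for a, b in zip(known, known[1:]):  # linear ramps across silence gaps
        for i in range(a + 1, b):
            g[i] = g[a] + (g[b] - g[a]) * (i - a) / (b - a)
    for i in range(known[0]):  # hold the first measured gain before speech starts
        g[i] = g[known[0]]
    for i in range(known[-1] + 1, len(env)):  # and the last one after it ends
        g[i] = g[known[-1]]
    return g


def lookahead_limit(y: list[float], sample_rate: int,
                    ceiling_db: float = -6.0, lookahead_ms: float = 5.0) -> list[float]:
    """Scale y so no |sample| exceeds the ceiling, ramping gain down ahead of peaks."""
    ceiling = db_to_linear(ceiling_db)
    look = max(1, int(sample_rate * lookahead_ms / 1000.0))
    # Gain each sample would need on its own to stay under the ceiling.
    req = [min(1.0, ceiling / abs(v)) if v else 1.0 for v in y]
    # Minimum over the next `look` samples: gain drops before the peak arrives.
    m = [min(req[i:i + look + 1]) for i in range(len(y))]
    # Averaging over the previous `look` samples turns steps into ramps. Every
    # m[k] in that window is <= req[i], so the average is too: the ceiling holds.
    out = []
    for i, v in enumerate(y):
        window = m[max(0, i - look):i + 1]
        out.append(v * sum(window) / len(window))
    return out
```

A hypothetical pass would compute `env = rms_envelope(x, int(0.4 * sample_rate))` (the 400 ms `rms_window_ms` from the usage example), multiply each sample by the matching entry of `agc_gain(env, speech)`, and pass the product to `lookahead_limit`. Averaging the forward-looking minimum turns every gain reduction into a ramp that begins before the peak arrives, instead of the instantaneous corner a clipper would cut into the waveform.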

Because the whole recording is analyzed before any gain is applied, the gain is correct from sample 0, with none of the lag or adaptation artifacts that real-time compressors exhibit.
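The no-lag property can be illustrated with a small gain-smoothing sketch. It assumes a one-pole smoother run forward and then backward over the gain curve (the same idea as zero-phase filtering, e.g. SciPy's `filtfilt`); voxlevel's actual smoother and how `smooth_window_ms` maps onto it are not documented here, so `one_pole` and `bidirectional_smooth` are hypothetical names.

```python
# Hypothetical sketch: causal vs. forward-backward (bidirectional) smoothing.


def one_pole(x: list[float], alpha: float) -> list[float]:
    """Causal one-pole smoother: state += alpha * (input - state)."""
    if not x:
        return []
    y, state = [], x[0]
    for v in x:
        state += alpha * (v - state)
        y.append(state)
    return y


def bidirectional_smooth(x: list[float], alpha: float) -> list[float]:
    """Smooth forward, then smooth the result backward.

    The backward pass needs "future" samples, so this only works offline.
    The combined response is symmetric in time (apart from edge effects),
    so it adds no net delay.
    """
    forward = one_pole(x, alpha)
    return one_pole(forward[::-1], alpha)[::-1]
```

Fed a single impulse, the causal smoother's output has a center of mass that trails the impulse by about (1 - alpha) / alpha samples (4 samples for alpha = 0.2), while the forward-backward output is centered on it. A real-time compressor only has the causal option, because the backward pass needs samples it has not received yet.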

## Constraints

- 16-bit mono WAV at 8 kHz or 16 kHz
- Offline processing only (no streaming)
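Since input outside these constraints is not supported, a batch script might check each file before normalizing it. The sketch below uses only Python's standard `wave` module; `check_wav` is a hypothetical helper, not part of voxlevel.

```python
import wave

# Hypothetical pre-flight check mirroring the constraints listed above.
SUPPORTED_RATES = (8000, 16000)


def check_wav(path: str) -> None:
    """Raise ValueError unless `path` is a 16-bit mono WAV at 8 kHz or 16 kHz."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample: 2 means 16-bit
        rate = wav.getframerate()
    if channels != 1:
        raise ValueError(f"{path}: expected mono, got {channels} channels")
    if sample_width != 2:
        raise ValueError(f"{path}: expected 16-bit samples, got {8 * sample_width}-bit")
    if rate not in SUPPORTED_RATES:
        raise ValueError(f"{path}: expected 8000 or 16000 Hz, got {rate} Hz")
```

Files that fail the check could be converted first (for example, downmixed and resampled to 16 kHz) rather than skipped.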

## License

MIT
