Metadata-Version: 2.4
Name: polyvoice
Version: 0.7.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Summary: Speaker diarization for Python — who spoke when. Rust + ONNX, no Python runtime overhead. K-means/AHC clustering, overlap detection.
Keywords: diarization,speaker,audio,speech,onnx,clustering,vad,voice-activity-detection
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/ekhodzitsky/polyvoice
Project-URL: Repository, https://github.com/ekhodzitsky/polyvoice

# polyvoice

[![CI](https://github.com/ekhodzitsky/polyvoice/actions/workflows/ci.yml/badge.svg)](https://github.com/ekhodzitsky/polyvoice/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/polyvoice)](https://pypi.org/project/polyvoice)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

**Speaker diarization for Python — who spoke when.**

Rust-powered, ONNX-based speaker diarization that runs on CPU, fits in about
30 MB of models, and adds no Python runtime overhead because inference runs in
Rust. Pipeline v2 adds K-means clustering with automatic selection of the
number of speakers (auto-k) and overlap detection.
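The README does not say how auto-k is implemented. As a generic illustration only, not polyvoice's actual code, one common recipe runs K-means for several candidate values of k and keeps the labelling with the highest mean silhouette score. The sketch below assumes speaker embeddings are plain vectors (toy 2-D points here). The function names and the farthest-first initialization are our own choices.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=100):
    """Lloyd's K-means with deterministic farthest-first seeding."""
    # Seeding: start at the first point, then repeatedly add the point
    # farthest from all seeds chosen so far
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: _dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; higher means tighter, better-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treat as the worst score
    scores = []
    for i, p in enumerate(points):
        same = [_dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # singleton cluster: silhouette is 0 by convention
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(_dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def auto_k(points, k_max=6):
    """Try k = 2..k_max and keep the labelling with the best silhouette score."""
    best_score, best_labels = -2.0, [0] * len(points)
    for k in range(2, min(k_max, len(points) - 1) + 1):
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    return best_labels


# Toy data: three well-separated groups standing in for three speakers
blobs = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 10), (10, 11), (11, 10), (11, 11),
         (0, 10), (0, 11), (1, 10), (1, 11)]
print(auto_k(blobs))
```

Silhouette scores are undefined for k = 1, so a sketch like this cannot conclude "one speaker" on its own; a real system would need a separate check for single-speaker audio.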

## Install

```bash
pip install polyvoice
```

Requires Python 3.9+.

## Quick start

```python
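import polyvoice
import soundfile as sf  # soundfile is just one way to load audio

# Assumed input: a 16 kHz mono WAV file, read as a 1-D float32 array
samples, sample_rate = sf.read("meeting.wav", dtype="float32")

# Models auto-download on first run (~30 MB)
pipeline = polyvoice.Pipeline.balanced()

result = pipeline.run(samples, sample_rate=sample_rate)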

print(f"Speakers: {result['num_speakers']}")
for turn in result["turns"]:
    print(f"Speaker {turn['speaker']}: {turn['start']:.1f}s - {turn['end']:.1f}s")
```
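Diarization scoring tools, including those used with VoxConverse, typically read the NIST RTTM format. Below is a minimal sketch that converts the `turns` list from the quick start into RTTM lines, assuming `start` and `end` are in seconds. `turns_to_rttm` and the `spk` label prefix are hypothetical helpers, not part of polyvoice, and the example dict is made up.

```python
def turns_to_rttm(result, file_id):
    """Render diarization turns as NIST RTTM lines (one SPEAKER record per turn)."""
    lines = []
    for turn in result["turns"]:
        duration = turn["end"] - turn["start"]
        # Fields: type, file, channel, onset, duration, ortho, subtype, name, conf, lookahead
        lines.append(
            f"SPEAKER {file_id} 1 {turn['start']:.3f} {duration:.3f} "
            f"<NA> <NA> spk{turn['speaker']} <NA> <NA>"
        )
    return "\n".join(lines)


# Hypothetical result with the shape the quick start reads
example = {
    "num_speakers": 2,
    "turns": [
        {"speaker": 0, "start": 0.0, "end": 1.5},
        {"speaker": 1, "start": 1.5, "end": 4.25},
    ],
}
print(turns_to_rttm(example, "meeting"))
```

Writing the returned string to a `.rttm` file lets it be scored against reference annotations in the same format.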

## API

- `polyvoice.Pipeline.balanced(models_cache=None)` — balances accuracy and speed.
- `polyvoice.Pipeline.mobile(models_cache=None)` — smaller, faster model.
- `pipeline.run(samples, sample_rate)` → `dict` with `num_speakers` and `turns`, a list of dicts with `speaker`, `start`, and `end` (in seconds).
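Because `turns` is a plain list of dicts, post-processing needs no extra API. As an example, here is a small sketch that totals speaking time per speaker. The `talk_time` helper and the example dict are hypothetical.

```python
from collections import defaultdict


def talk_time(result):
    """Total seconds attributed to each speaker, summed over their turns."""
    totals = defaultdict(float)
    for turn in result["turns"]:
        totals[turn["speaker"]] += turn["end"] - turn["start"]
    return dict(totals)


# Hypothetical result with the documented shape
example = {
    "num_speakers": 2,
    "turns": [
        {"speaker": 0, "start": 0.0, "end": 1.5},
        {"speaker": 1, "start": 1.5, "end": 4.0},
        {"speaker": 0, "start": 4.0, "end": 5.0},
    ],
}
print(talk_time(example))
```

If turns can overlap (the pipeline detects overlapping speech), overlapped stretches count toward every speaker involved, so the totals may add up to more than the recording length.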

## Performance

| Pipeline | DER on VoxConverse (lower is better) | Model size |
|----------|-----------------|------------|
| Hybrid + K-means | **14.12%** | ~30 MB |
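DER (diarization error rate) sums false-alarm speech, missed speech, and speaker-confusion time, then divides by the total duration of reference speech. Below is a worked sketch with made-up durations, not polyvoice results. The `der` function name is hypothetical.

```python
def der(false_alarm, missed, confusion, total_speech):
    """Diarization error rate: erroneous time as a fraction of reference speech time."""
    return (false_alarm + missed + confusion) / total_speech


# Made-up example: 600 s of reference speech with 30 s false alarm,
# 24 s missed speech, and 30 s attributed to the wrong speaker
rate = der(false_alarm=30.0, missed=24.0, confusion=30.0, total_speech=600.0)
print(f"DER = {rate:.2%}")
```

The README does not state the scoring settings (for example, the forgiveness collar or whether overlapped speech is scored) behind the 14.12% figure, so comparisons with other published numbers only hold under matching settings.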

See the [full repository](https://github.com/ekhodzitsky/polyvoice) for Rust / C / CLI APIs, benchmarks, and development docs.

