Metadata-Version: 2.4
Name: pyannote-openvino
Version: 0.1.1
Summary: Drop-in, OpenVINO-accelerated speaker diarization for pyannote.audio.
Author: Andrew Green
License-Expression: MIT
Project-URL: Homepage, https://github.com/andrew867/pyannote-openvino
Project-URL: Repository, https://github.com/andrew867/pyannote-openvino
Project-URL: Issues, https://github.com/andrew867/pyannote-openvino/issues
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: <3.14,>=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=2.2.6
Requires-Dist: scipy>=1.17.1
Requires-Dist: torch>=2.10.0
Requires-Dist: librosa>=0.11.0
Requires-Dist: torchaudio>=2.10.0
Requires-Dist: pyannote-audio>=4.0.4
Requires-Dist: openvino>=2026.0.0
Requires-Dist: optimum>=2.1.0
Requires-Dist: optimum-intel>=1.27.0
Requires-Dist: optimum-onnx>=0.1.0
Requires-Dist: onnx>=1.20.1
Requires-Dist: onnxruntime>=1.24.1
Provides-Extra: stt
Requires-Dist: openai-whisper>=20250625; extra == "stt"
Provides-Extra: test
Requires-Dist: pytest>=8.0; extra == "test"
Provides-Extra: build
Requires-Dist: build>=1.0.0; extra == "build"
Dynamic: license-file

# pyannote-openvino

OpenVINO acceleration for the pyannote.audio speaker diarization 3.1 pipeline.
This project keeps the familiar pyannote API while running the heavy segmentation and
embedding models via Intel-compatible OpenVINO IR, so the pipeline runs on CPU and
Intel GPUs without relying on PyTorch FFT patches.

## Installation
1. Create or activate the provided virtual environment (`.venv`).
2. Install the runtime dependencies:
   ```bash
   python -m pip install -e ".[stt]"
   ```
   The `[stt]` extra pulls in the `openai-whisper` package, which the
   `docs/transcribe_v4.py` helper uses to turn diarization segments into per-speaker
   text. If you only need the OV pipeline, install the base requirements listed in
   `requirements.txt`.
3. Ensure you have an FFmpeg binary on `PATH` (the repo contains shared libraries under
   `ffmpeg/bin` for convenience).
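
A quick way to confirm step 3 is to look up FFmpeg from Python. The sketch below uses only the standard library; the `find_ffmpeg` name and its `extra_dirs` parameter are illustrative and not part of this package.

```python
import os
import shutil


def find_ffmpeg(extra_dirs=()):
    """Return the path of an ffmpeg executable, or None if none is found.

    `extra_dirs` is a hypothetical convenience for directories that are not
    on PATH yet, such as the repo's `ffmpeg/bin`.
    """
    search_path = os.pathsep.join([*extra_dirs, os.environ.get("PATH", "")])
    return shutil.which("ffmpeg", path=search_path)


if __name__ == "__main__":
    location = find_ffmpeg(extra_dirs=["ffmpeg/bin"])
    print(location or "ffmpeg not found; install it or extend PATH")
```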

## Exporting the reference models to ONNX
Export scripts live under `scripts/phase2/`:
- `export_segmentation.py` exports the SincNet+transformer segmentation model with
  dynamic frame lengths.
- `export_embedding.py` wraps the ResNet embedding head so it consumes pre-computed
  mel filter banks instead of running FFT/RFFT inside the ONNX graph.
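
To illustrate what "pre-computed mel filter banks" means for the embedding export, here is a minimal sketch that computes Kaldi-style filter banks in Python, outside the exported graph. The 80 mel bins, 25 ms window, and 10 ms shift are assumptions commonly used with ResNet speaker embeddings; the settings in `export_embedding.py` may differ, and both function names are hypothetical.

```python
def count_fbank_frames(num_samples, sample_rate=16000,
                       frame_length_ms=25.0, frame_shift_ms=10.0):
    """Number of frames when only complete windows are kept (Kaldi's snip_edges=True)."""
    window = int(sample_rate * frame_length_ms / 1000)
    shift = int(sample_rate * frame_shift_ms / 1000)
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift


def compute_fbank(waveform, sample_rate=16000, num_mel_bins=80):
    """Compute log-mel filter banks for a (channels, samples) torch tensor.

    Returns a tensor of shape (frames, num_mel_bins) that can be fed to a
    model which no longer contains FFT operators.
    """
    # Imported lazily so count_fbank_frames stays dependency-free.
    import torchaudio.compliance.kaldi as kaldi

    return kaldi.fbank(
        waveform,
        num_mel_bins=num_mel_bins,
        sample_frequency=sample_rate,
        frame_length=25.0,  # milliseconds (assumed)
        frame_shift=10.0,   # milliseconds (assumed)
    )
```

Keeping the feature extraction in Python means the ONNX graph only has to carry the convolutional layers, which is what lets the later IR conversion avoid FFT operators.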

Run both scripts before converting to IR:
```bash
python scripts/phase2/export_segmentation.py --duration 2.0 --output models/onnx/segmentation.onnx
python scripts/phase2/export_embedding.py --duration 2.0 --frames 128 --output models/onnx/embedding.onnx
```

You can also use the `optimum-cli` shortcuts shown in this repo:
```bash
optimum-cli export openvino --model models/onnx/segmentation.onnx models/ov/segmentation
optimum-cli export openvino --model models/onnx/embedding.onnx models/ov/embedding
```

## Converting ONNX to OpenVINO IR
`convert_to_ov.py` wraps the OpenVINO Model Optimizer (MO) to turn ONNX files
into `.xml`/`.bin` IR blobs stored under `models/ov/`. By default it keeps FP32
weights but accepts `--weight-format fp16` for iGPU workloads.

Validation is available via `scripts/phase3/validate_ov.py`, which loads the IR
models with `openvino.runtime.Core`, runs dummy inputs, and prints the output
shapes.
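
The validation idea can be sketched as follows. This is not the code in `validate_ov.py`; it is a minimal illustration that assumes float32 inputs and a static input rank, and the helper names and the `fallback` parameter are hypothetical.

```python
def concrete_shape(dims, fallback=16):
    """Replace dynamic dimensions (given as None) with a fixed fallback size."""
    return [fallback if dim is None else int(dim) for dim in dims]


def run_dummy_inference(xml_path, device="CPU", fallback=16):
    """Compile an IR model, feed zero-filled inputs, and return the output shapes."""
    # Imported lazily so concrete_shape stays dependency-free.
    import numpy as np
    import openvino as ov

    core = ov.Core()
    compiled = core.compile_model(core.read_model(xml_path), device)

    inputs = []
    for port in compiled.inputs:
        partial = port.get_partial_shape()  # assumes a static rank
        dims = [None if partial[i].is_dynamic else partial[i].get_length()
                for i in range(len(partial))]
        # float32 is an assumption; check the element type of your own models.
        inputs.append(np.zeros(concrete_shape(dims, fallback), dtype=np.float32))

    results = compiled(inputs)
    return [tuple(results[port].shape) for port in compiled.outputs]
```

The same fallback is applied to every dynamic dimension, so a value that is too small for the model's receptive field will make inference fail. Pick a length that is realistic for the model you are checking.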

## Running the OpenVINO diarization pipeline
Use `pyannote_openvino.OVSpeakerDiarization` as a drop-in replacement for
`pyannote.audio.Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")`.
The helper accepts `segmentation_xml`, `embedding_xml`, and a device string such as
`CPU`, `GPU`, or `GPU.0`:

```python
from pyannote_openvino import OVSpeakerDiarization
pipeline = OVSpeakerDiarization.from_pretrained("models/ov", device="GPU")
diart = pipeline("samples/Stirling Lennon Clips_mixdown.wav")
print(diart)
```
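
If you are unsure which device string to pass, you can ask the OpenVINO runtime what it sees and fall back to the CPU. The sketch below is illustrative only; `pick_device` and `list_openvino_devices` are hypothetical helpers, not part of `pyannote_openvino`.

```python
def pick_device(available, preferred=("GPU", "CPU")):
    """Return the first preferred device that OpenVINO reports as available.

    A bare name such as "GPU" also matches indexed entries like "GPU.0".
    """
    for name in preferred:
        for device in available:
            if device == name or device.startswith(name + "."):
                return device
    raise RuntimeError(f"none of {preferred} found in {list(available)}")


def list_openvino_devices():
    """Ask the OpenVINO runtime which devices it can see."""
    import openvino as ov  # imported lazily

    return ov.Core().available_devices
```

The result of `pick_device(list_openvino_devices())` can then be passed as the `device` argument.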

The segmentation and embedding classes mirror the pyannote model interface
(`num_frames`, `receptive_field_size`, etc.), so the existing clustering code and
pipeline utilities continue to work unchanged.
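
Because the interface matches pyannote, the result can be consumed the same way. The sketch below assumes the pipeline returns a pyannote `Annotation` (which exposes `itertracks(yield_label=True)`); if your version wraps the annotation in another object, unwrap it first. Both helper names are hypothetical.

```python
def turns_from_annotation(annotation):
    """Flatten a pyannote Annotation into (start, end, speaker) tuples."""
    return [(segment.start, segment.end, label)
            for segment, _, label in annotation.itertracks(yield_label=True)]


def format_turns(turns):
    """Render (start_seconds, end_seconds, speaker) tuples, one line per turn."""
    return [f"{start:8.2f} {end:8.2f} {speaker}"
            for start, end, speaker in sorted(turns)]
```

For example, `print("\n".join(format_turns(turns_from_annotation(result))))` prints the turns in chronological order, where `result` is the annotation returned by the pipeline.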

## Speaker-aware transcription helper (`transcribe_v4`)
The repo ships a single CLI, `docs/transcribe_v4.py`, that accelerates both
diarization and transcription on an Intel iGPU. It works in three steps:
1. Run the OpenVINO diarization pipeline.
2. Load the same WAV file into memory and crop each speaker turn.
3. Feed each crop to `openai-whisper` (default `tiny`) to produce text for each
   speaker segment.

Example usage:
```bash
python docs/transcribe_v4.py \
  --audio samples/Stirling\ Lennon\ Clips_mixdown.wav \
  --device GPU \
  --whisper-ov whisper-large-v3-ov \
  --output-txt artifacts/transcribe_v4.txt
```

The CLI prints timestamps, speaker labels, and the recognized text, and also
writes a TSV-style summary to the `--output-txt` path for later reference.
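
The crop-and-transcribe steps of the helper can be sketched as follows. This is not the CLI's implementation: it uses the plain `openai-whisper` API rather than an OpenVINO Whisper model, and it assumes the audio is already a mono float32 NumPy array at 16 kHz. The function names are hypothetical.

```python
def crop_bounds(start, end, sample_rate, num_samples):
    """Convert a turn in seconds to a clamped [first, last) sample range."""
    first = max(0, int(round(start * sample_rate)))
    last = min(num_samples, int(round(end * sample_rate)))
    return first, max(first, last)


def transcribe_turns(waveform, sample_rate, turns, model_name="tiny"):
    """Crop each (start, end, speaker) turn and transcribe it with openai-whisper.

    `waveform` is assumed to be a mono float32 NumPy array sampled at 16 kHz,
    the format whisper accepts for in-memory audio.
    """
    import whisper  # provided by the `stt` extra

    model = whisper.load_model(model_name)
    lines = []
    for start, end, speaker in turns:
        first, last = crop_bounds(start, end, sample_rate, len(waveform))
        if last == first:
            continue  # skip empty crops
        text = model.transcribe(waveform[first:last])["text"].strip()
        lines.append((start, end, speaker, text))
    return lines
```

Clamping the sample range guards against turns that extend slightly past the end of the file, which can happen when turn boundaries are rounded.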

## Testing and validation
- `python scripts/phase1/audit_models.py` records environment versions and shapes.
- `python scripts/phase2/validate_onnx.py` compares the ONNX exports against the
  original torch models.
- `scripts/phase3/validate_ov.py` loads the IR models and runs dummy inference.
- `docs/transcribe_v4.py` serves as the end-to-end Intel GPU smoke test (diarization + STT) on
  any WAV file.
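
Comparing an export against its reference model usually comes down to a tolerance check on the outputs. The sketch below shows the idea on flat Python sequences; it is not taken from `validate_onnx.py`, and the `1e-4` tolerance is an assumption rather than the project's threshold.

```python
def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs differ in length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def outputs_match(reference, candidate, atol=1e-4):
    """True when every element agrees within the absolute tolerance `atol`."""
    return max_abs_diff(reference, candidate) <= atol
```

With NumPy arrays, flatten them first (for example with `.ravel().tolist()`) or use `numpy.allclose` directly.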

## Directory layout
- `models/onnx/` – ONNX exports produced by Phase 2.
- `models/ov/` – OpenVINO IR files generated by Phase 3.
- `scripts/phase{1..3}/` – export, conversion, and validation helpers.
- `pyannote_openvino/` – the runtime library that wires `OVSegmentationModel`,
  `OVEmbeddingModel`, and `OVSpeakerDiarization` into pyannote’s APIs.
- `docs/transcribe_v4.py` – per-speaker transcription CLI.

## Troubleshooting
- If `torchaudio` fails to read your audio, install FFmpeg and point `PATH` at
  `ffmpeg/bin` (a copy lives in this repo for reference).
- Whisper downloads models the first time it runs; choose a small or tiny model
  for fast iteration and pin `--stt-device` to `cpu` if your GPU is busy.

## Tests
- Install the `test` extra (and the transcription tooling) before running the suite:
  ```bash
  python -m pip install -e ".[stt,test]"
  ```
- Run the pytest suite to make sure the OV pipeline returns a valid annotation:
  ```bash
  python -m pytest
  ```
- The same command runs in CI (GitHub Actions, GitLab CI) and is fast enough to execute on every push/PR.

## CI & Release Pipelines
- **GitHub Actions**:
  - `ci.yml` runs on push/PR, installs the `[stt,test]` extras, and executes `python -m pytest`.
  - `release.yml` runs on `refs/tags/v*`, reuses the same extras plus the `build` tool, reruns the tests, builds a wheel/tarball via `python -m build`, and publishes the artifacts to a GitHub release using `softprops/action-gh-release`.
- **GitLab CI**:
  - `.gitlab-ci.yml` defines `test` and `release` stages. Both install the `[stt,test,build]` extras, the `test` job runs `python -m pytest`, and the `release` job (tag-only) runs `python -m build` and exposes `dist/` as an artifact for later download.
  - Pushing a tag that matches `v*` (for example `v0.1.0`) triggers the release stage and produces the distributables.
