Metadata-Version: 2.4
Name: violawake
Version: 0.1.0
Summary: Open-source wake word detection SDK with training pipeline — privacy-first, on-device, Python-native
Project-URL: Homepage, https://github.com/GeeIHadAGoodTime/ViolaWake
Project-URL: Documentation, https://github.com/GeeIHadAGoodTime/ViolaWake#readme
Project-URL: Repository, https://github.com/GeeIHadAGoodTime/ViolaWake
Project-URL: Bug Tracker, https://github.com/GeeIHadAGoodTime/ViolaWake/issues
Author: ViolaWake Contributors
License:                                  Apache License
                                   Version 2.0, January 2004
                                http://www.apache.org/licenses/
        
           TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
        
           1. Definitions.
        
              "License" shall mean the terms and conditions for use, reproduction,
              and distribution as defined by Sections 1 through 9 of this document.
        
              "Licensor" shall mean the copyright owner or entity authorized by
              the copyright owner that is granting the License.
        
              "Legal Entity" shall mean the union of the acting entity and all
              other entities that control, are controlled by, or are under common
              control with that entity. For the purposes of this definition,
              "control" means (i) the power, direct or indirect, to cause the
              direction or management of such entity, whether by contract or
              otherwise, or (ii) ownership of fifty percent (50%) or more of the
              outstanding shares, or (iii) beneficial ownership of such entity.
        
              "You" (or "Your") shall mean an individual or Legal Entity
              exercising permissions granted by this License.
        
              "Source" form shall mean the preferred form for making modifications,
              including but not limited to software source code, documentation
              source, and configuration files.
        
              "Object" form shall mean any form resulting from mechanical
              transformation or translation of a Source form, including but
              not limited to compiled object code, generated documentation,
              and conversions to other media types.
        
              "Work" shall mean the work of authorship made available under
              the License, as indicated by a copyright notice that is included in
              or attached to the work (an example is provided in the Appendix below).
        
              "Derivative Works" shall mean any work, whether in Source or Object
              form, that is based on (or derived from) the Work and for which the
              editorial revisions, annotations, elaborations, or other transformations
              represent, as a whole, an original work of authorship. For the purposes
              of this License, Derivative Works shall not include works that remain
              separable from, or merely link (or bind by name) to the interfaces of,
              the Work and the Derivative Works thereof.
        
              "Contributor" shall mean Licensor and any Legal Entity on behalf of
              whom a Contribution has been received by the Licensor and included
              within the Work.
        
           2. Grant of Copyright License. Subject to the terms and conditions of
              this License, each Contributor hereby grants to You a perpetual,
              worldwide, non-exclusive, no-charge, royalty-free, irrevocable
              copyright license to reproduce, prepare Derivative Works of,
              publicly display, publicly perform, sublicense, and distribute the
              Work and such Derivative Works in Source or Object form.
        
           3. Grant of Patent License. Subject to the terms and conditions of
              this License, each Contributor hereby grants to You a perpetual,
              worldwide, non-exclusive, no-charge, royalty-free, irrevocable
              (except as stated in this section) patent license to make, have made,
              use, offer to sell, sell, import, and otherwise transfer the Work.
        
           4. Redistribution. You may reproduce and distribute copies of the
              Work or Derivative Works thereof in any medium, with or without
              modifications, and in Source or Object form, provided that You
              meet the following conditions:
        
              (a) You must give any other recipients of the Work or Derivative
                  Works a copy of this License; and
        
              (b) You must cause any modified files to carry prominent notices
                  stating that You changed the files; and
        
              (c) You must retain, in the Source form of any Derivative Works
                  that You distribute, all copyright, patent, trademark, and
                  attribution notices from the Source form of the Work,
                  excluding those notices that do not pertain to any part of
                  the Derivative Works; and
        
              (d) If the Work includes a "NOTICE" text file, You must include a
                  readable copy of the attribution notices contained within such
                  NOTICE file, in at least one of the following places: within a
                  NOTICE text provided with the Derivative Works; within the Source
                  form or documentation, if provided along with the Derivative Works;
                  or, within a display generated by the Derivative Works, if and
                  wherever such third-party notices normally appear. The contents
                  of the NOTICE file are for informational purposes only and
                  do not modify the License.
        
           5. Submission of Contributions. Unless You explicitly state otherwise,
              any Contribution intentionally submitted for inclusion in the Work
              by You to the Licensor shall be under the terms and conditions of
              this License, without any additional terms or conditions.
        
           6. Trademarks. This License does not grant permission to use the trade
              names, trademarks, service marks, or product names of the Licensor.
        
           7. Disclaimer of Warranty. Unless required by applicable law or
              agreed to in writing, Licensor provides the Work (and each
              Contributor provides its Contributions) on an "AS IS" BASIS,
              WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
              implied, including, without limitation, any warranties or conditions
              of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
              PARTICULAR PURPOSE. You are solely responsible for determining the
              appropriateness of using or reproducing the Work.
        
           8. Limitation of Liability. In no event and under no legal theory,
              whether in tort (including negligence), contract, or otherwise,
              unless required by applicable law (such as deliberate and grossly
              negligent acts) or agreed to in writing, shall any Contributor be
              liable to You for damages, including any direct, indirect, special,
              incidental, or exemplary damages of any character arising as a
              result of this License or out of the use or inability to use the
              Work (including but not limited to damages for loss of goodwill,
              work stoppage, computer failure or malfunction, or all other
              commercial damages or losses), even if such Contributor has been
              advised of the possibility of such damages.
        
           9. Accepting Warranty or Liability While Redistributing. You may
              choose to offer, and charge a fee for, acceptance of support,
              warranty, indemnity, or other liability obligations and/or rights
              consistent with this License.
        
           END OF TERMS AND CONDITIONS
        
           APPENDIX: How to apply the Apache License to your work.
        
              Copyright 2026 ViolaWake Contributors
        
              Licensed under the Apache License, Version 2.0 (the "License");
              you may not use this file except in compliance with the License.
              You may obtain a copy of the License at
        
                  http://www.apache.org/licenses/LICENSE-2.0
        
              Unless required by applicable law or agreed to in writing, software
              distributed under the License is distributed on an "AS IS" BASIS,
              WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
              See the License for the specific language governing permissions and
              limitations under the License.
        
        NOTICE
        ------
        
        This software includes or uses the following open-source components:
        
        1. OpenWakeWord (https://github.com/dscripka/openWakeWord)
           Copyright (c) openWakeWord Contributors
           Licensed under the Apache License, Version 2.0.
           ViolaWake uses OpenWakeWord's audio embedding backbone as a frozen
           feature extractor. The classification heads (Temporal CNN, Conv-GRU)
           and training pipeline are original ViolaWake work.
        
        2. Kokoro-82M TTS Model (https://github.com/hexgrad/kokoro)
           Licensed under the Apache License, Version 2.0.
           The Kokoro model is redistributed in its original form as a downloadable artifact.
        
        3. ONNX Runtime (https://github.com/microsoft/onnxruntime)
           Copyright (c) Microsoft Corporation.
           Licensed under the MIT License.
License-File: LICENSE
Keywords: on-device,onnx,speech-recognition,stt,tts,voice-assistant,wake-word
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24
Requires-Dist: onnxruntime>=1.17
Requires-Dist: scipy>=1.11
Provides-Extra: all
Requires-Dist: edge-tts>=6.1; extra == 'all'
Requires-Dist: faster-whisper>=1.0; extra == 'all'
Requires-Dist: kokoro-onnx>=0.4; extra == 'all'
Requires-Dist: librosa>=0.10; extra == 'all'
Requires-Dist: matplotlib>=3.8; extra == 'all'
Requires-Dist: onnx>=1.15; extra == 'all'
Requires-Dist: openwakeword>=0.6; extra == 'all'
Requires-Dist: pandas>=2.1; extra == 'all'
Requires-Dist: pyaudio>=0.2.14; extra == 'all'
Requires-Dist: pydub>=0.25; extra == 'all'
Requires-Dist: requests>=2.31; extra == 'all'
Requires-Dist: scikit-learn>=1.3; extra == 'all'
Requires-Dist: sounddevice>=0.4; extra == 'all'
Requires-Dist: soundfile>=0.12; extra == 'all'
Requires-Dist: tflite-runtime>=2.14.0; extra == 'all'
Requires-Dist: torch>=2.1; extra == 'all'
Requires-Dist: torchaudio>=2.1; extra == 'all'
Requires-Dist: tqdm>=4.66; extra == 'all'
Requires-Dist: webrtcvad>=2.0.10; extra == 'all'
Provides-Extra: audio
Requires-Dist: pyaudio>=0.2.14; extra == 'audio'
Requires-Dist: soundfile>=0.12; extra == 'audio'
Provides-Extra: dev
Requires-Dist: hatchling>=1.21; extra == 'dev'
Requires-Dist: mypy>=1.8; extra == 'dev'
Requires-Dist: pre-commit>=3.6; extra == 'dev'
Requires-Dist: pyaudio>=0.2.14; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-benchmark>=4.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: requests>=2.31; extra == 'dev'
Requires-Dist: ruff>=0.3; extra == 'dev'
Requires-Dist: tqdm>=4.66; extra == 'dev'
Requires-Dist: types-requests>=2.31; extra == 'dev'
Provides-Extra: docs
Requires-Dist: pdoc>=14.0; extra == 'docs'
Provides-Extra: download
Requires-Dist: requests>=2.31; extra == 'download'
Requires-Dist: tqdm>=4.66; extra == 'download'
Provides-Extra: generate
Requires-Dist: edge-tts>=6.1; extra == 'generate'
Requires-Dist: pydub>=0.25; extra == 'generate'
Requires-Dist: soundfile>=0.12; extra == 'generate'
Provides-Extra: oww
Requires-Dist: openwakeword>=0.6; extra == 'oww'
Provides-Extra: stt
Requires-Dist: faster-whisper>=1.0; extra == 'stt'
Provides-Extra: tflite
Requires-Dist: tflite-runtime>=2.14.0; extra == 'tflite'
Provides-Extra: training
Requires-Dist: edge-tts>=6.1; extra == 'training'
Requires-Dist: librosa>=0.10; extra == 'training'
Requires-Dist: matplotlib>=3.8; extra == 'training'
Requires-Dist: onnx>=1.15; extra == 'training'
Requires-Dist: openwakeword>=0.6; extra == 'training'
Requires-Dist: pandas>=2.1; extra == 'training'
Requires-Dist: pydub>=0.25; extra == 'training'
Requires-Dist: scikit-learn>=1.3; extra == 'training'
Requires-Dist: torch>=2.1; extra == 'training'
Requires-Dist: torchaudio>=2.1; extra == 'training'
Provides-Extra: tts
Requires-Dist: kokoro-onnx>=0.4; extra == 'tts'
Requires-Dist: sounddevice>=0.4; extra == 'tts'
Provides-Extra: vad
Requires-Dist: webrtcvad>=2.0.10; extra == 'vad'
Description-Content-Type: text/markdown

# ViolaWake SDK

**The open-source alternative to Porcupine.** A production-tested wake word engine with accessible training, ONNX inference, and a Python-first SDK.

<!-- PyPI badge will activate after first publish -->
<!-- [![PyPI version](https://badge.fury.io/py/violawake.svg)](https://badge.fury.io/py/violawake) -->
[![CI](https://github.com/GeeIHadAGoodTime/ViolaWake/actions/workflows/ci.yml/badge.svg)](https://github.com/GeeIHadAGoodTime/ViolaWake/actions/workflows/ci.yml)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)

---

## Why ViolaWake?

| | ViolaWake | Porcupine (Picovoice) | openWakeWord |
|---|---|---|---|
| **License** | Apache 2.0 | Proprietary (metered) | Apache 2.0 |
| **Training code open** | Yes | No (closed) | Yes |
| **Custom wake words** | Yes (training CLI) | Yes (paid Console) | Yes (fine-tune) |
| **Evaluation tooling** | `violawake-eval` (Cohen's d, EER, FAR/FRR, ROC AUC) | None published | Basic |
| **On-device** | Yes (ONNX) | Yes (proprietary C lib) | Yes (ONNX) |
| **Integrated TTS** | Yes (Kokoro-82M, optional extra) | No | No |
| **Python SDK** | First-class | C wrapper | First-class |
| **Price at scale** | Free | Paid (free tier available) | Free |

**Our moat:** Open training code, transparent evaluation with reproducible benchmarks, production-hardened data augmentation (gain, time stretch, pitch shift, noise mixing), and a 4-gate decision policy that suppresses false positives during music playback. In a head-to-head benchmark against openWakeWord (same corpus, same pipeline, adversarial negatives for both systems, each system tested on its own best wake word), ViolaWake achieves **EER 5.49%** versus OWW's 8.24%. ViolaWake is running in production, not only as a demo.

> **A note on accuracy claims:** Our benchmark uses TTS-generated audio with adversarial confusables, not real-speaker recordings. Real-world accuracy depends on your deployment environment. We publish our benchmark scripts so you can reproduce and extend them. Run `violawake-eval` on your own test data.

---

## Quick Start

```bash
pip install "violawake[audio,download]"
violawake-download --model temporal_cnn
```

### Wake Word Detection (6 lines)

```python
from violawake_sdk import WakeDetector

detector = WakeDetector(model="temporal_cnn", threshold=0.80, confirm_count=3)

for audio_chunk in detector.stream_mic():  # 20ms chunks at 16kHz
    if detector.detect(audio_chunk):
        print("Wake word detected!")
        break
```
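
The `confirm_count` argument above asks for several consecutive above-threshold frames before a detection fires. The sketch below shows that idea in plain Python. It is an illustration only: the class name `ConsecutiveConfirmer`, the reset-after-fire behavior, and the example scores are assumptions, not the SDK's internal implementation.

```python
class ConsecutiveConfirmer:
    """Fire only after `confirm_count` consecutive above-threshold scores."""

    def __init__(self, threshold: float = 0.80, confirm_count: int = 3) -> None:
        if confirm_count < 1:
            raise ValueError("confirm_count must be >= 1")
        self.threshold = threshold
        self.confirm_count = confirm_count
        self._streak = 0  # length of the current run of above-threshold frames

    def update(self, score: float) -> bool:
        """Feed one frame score; return True when the run reaches confirm_count."""
        if score >= self.threshold:
            self._streak += 1
        else:
            self._streak = 0  # any below-threshold frame breaks the run
        if self._streak >= self.confirm_count:
            self._streak = 0  # assumed: start a fresh run after firing
            return True
        return False


# Illustrative per-frame scores (made up for this example).
scores = [0.85, 0.90, 0.40, 0.82, 0.88, 0.91, 0.30]
confirmer = ConsecutiveConfirmer(threshold=0.80, confirm_count=3)
fired_at = [i for i, s in enumerate(scores) if confirmer.update(s)]
print(fired_at)
```

The first two high scores do not fire because the third frame drops below the threshold; only the later run of three does. With `confirm_count=1` every above-threshold frame would fire, which is the lowest-latency setting.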

> `confirm_count=3` requires 3 consecutive above-threshold frames before firing, reducing false accepts by ~82-87% depending on threshold. Use `confirm_count=1` for lowest latency.

### Threshold Tuning

The `threshold` parameter controls the trade-off between sensitivity and false positives:

| Threshold | Behavior | Use Case |
|-----------|----------|----------|
| 0.70 | Sensitive -- more detections, more false positives | Quiet rooms, close-mic setups |
| **0.80** | **Balanced (default)** -- recommended starting point | General-purpose, most environments |
| 0.85 | Conservative -- fewer false positives, may miss some wake words | Living rooms with TV/music |
| 0.90+ | Very conservative -- lowest false positive rate | Noisy environments, always-on kiosks |

Start at 0.80 and adjust based on your false accept rate. Use `violawake-streaming-eval` to measure FAPH (false accepts per hour) on representative audio from your deployment environment, or `violawake-eval` for clip-by-clip EER/FAR/FRR/ROC AUC.
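
FAPH is the number of false accepts divided by the hours of audio processed. The sketch below shows the arithmetic and one way to pick the lowest threshold that stays within a false-accept budget. The helper names and the counts are hypothetical placeholders, not output from `violawake-streaming-eval`.

```python
from typing import Optional


def false_accepts_per_hour(false_accepts: int, audio_seconds: float) -> float:
    """Convert a raw false-accept count into false accepts per hour."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return false_accepts * 3600.0 / audio_seconds


def lowest_threshold_within_budget(
    counts: dict, audio_seconds: float, max_faph: float
) -> Optional[float]:
    """Return the most sensitive threshold whose FAPH is within budget.

    Assumes false accepts do not increase as the threshold rises.
    """
    for threshold in sorted(counts):
        if false_accepts_per_hour(counts[threshold], audio_seconds) <= max_faph:
            return threshold
    return None


# Placeholder counts: false accepts observed per threshold on 90 minutes of audio.
counts = {0.70: 12, 0.80: 3, 0.85: 1, 0.90: 0}
audio_seconds = 90 * 60
faph = {t: false_accepts_per_hour(n, audio_seconds) for t, n in counts.items()}
chosen = lowest_threshold_within_budget(counts, audio_seconds, max_faph=1.0)
print(faph, chosen)
```

Choosing the lowest threshold that meets the budget keeps sensitivity as high as the false-accept target allows.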

### Text-to-Speech (Kokoro-82M)

```python
from violawake_sdk import TTSEngine

tts = TTSEngine()  # Downloads kokoro-v1.0.onnx + voices-v1.0.bin on first run (~354MB total)
audio = tts.synthesize("Hello from ViolaWake!")
tts.play(audio)
```

### Voice Activity Detection

```python
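from violawake_sdk import VADEngine

# Placeholder frame (assumed format): 20 ms of 16 kHz, 16-bit mono silence.
audio_bytes = b"\x00\x00" * 320

vad = VADEngine(backend="webrtc")  # or "silero", "rms"
prob = vad.process_frame(audio_bytes)  # returns 0.0–1.0 speech probability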
```

### Full Pipeline (Wake → STT → TTS)

> Requires: `pip install "violawake[audio,stt,tts]"`

```python
from violawake_sdk import VoicePipeline

pipeline = VoicePipeline(
    wake_word="viola",
    stt_model="base",        # faster-whisper model size
    tts_voice="af_heart",    # Kokoro voice
)

@pipeline.on_command
def handle_command(text: str) -> None:
    print(f"Command: {text}")
    pipeline.speak(f"You said: {text}")  # Or return a string to auto-speak

pipeline.run()  # Blocks — Ctrl+C to stop
```

---

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                    VoicePipeline                            │
│                                                             │
│  Mic ──► [WakeDetector] ──► [VAD] ──► [STT] ──► callback  │
│                                                             │
│  text ──► [TTS] ──► Speaker                                │
└─────────────────────────────────────────────────────────────┘
```

**Components:**

| Module | Engine | Size | Latency |
|--------|--------|------|---------|
| Wake word | Temporal CNN on OWW embeddings (ONNX) | ~100 KB head (+OWW backbone via `openwakeword`) | ~8ms/frame |
| VAD | WebRTC VAD / Silero / RMS heuristic | <1 MB | <1ms/frame |
| STT | faster-whisper `base` | 145 MB | 0.5–2s |
| TTS | Kokoro-82M (ONNX) | 326 MB | 0.3–0.8s/sentence |

---

## Training Your Own Wake Word

The training CLI lets you train a custom wake word model with ~200 positive samples:

```bash
# Collect positive samples (read prompts aloud)
violawake-collect --word "jarvis" --output data/jarvis/positives/ --count 200

# Train (auto-generates TTS positives, confusable negatives, and speech negatives)
violawake-train \
  --word "jarvis" \
  --positives data/jarvis/positives/ \
  --output models/jarvis.onnx \
  --epochs 50

# To disable augmentation, add --no-augment
# To use legacy MLP architecture, add --architecture mlp

# Evaluate (EER, FAR/FRR, ROC AUC)
violawake-eval \
  --model models/jarvis.onnx \
  --test-dir data/jarvis/test/ \
  --report
```

The `--test-dir` must contain `positives/` and `negatives/` subdirectories.
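
Before running an evaluation, it can help to confirm the directory layout. The following sketch checks for the two required subdirectories and counts audio files in each. The function name, the accepted suffixes, and the recursive search are assumptions made for illustration; the CLI may apply different rules.

```python
import tempfile
from pathlib import Path

AUDIO_SUFFIXES = {".wav", ".flac"}  # assumed set of audio extensions


def check_test_dir(test_dir: Path) -> dict:
    """Verify positives/ and negatives/ exist and count audio files in each."""
    root = Path(test_dir)
    counts = {}
    for name in ("positives", "negatives"):
        sub = root / name
        if not sub.is_dir():
            raise FileNotFoundError(f"missing required subdirectory: {sub}")
        counts[name] = sum(
            1 for p in sub.rglob("*") if p.suffix.lower() in AUDIO_SUFFIXES
        )
    return counts


# Demo on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, n in (("positives", 3), ("negatives", 2)):
        (root / name).mkdir()
        for i in range(n):
            (root / name / f"clip_{i}.wav").touch()
    counts = check_test_dir(root)
print(counts)
```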

**Expected results:** EER < 10% (against the bundled synthetic negative corpus) with 200+ quality positive samples. Your real-world performance will depend on your deployment environment and negative speech corpus.

### Proof: "Operator" Custom Wake Word (89 seconds, EER 7.2%)

To prove the training pipeline generalizes beyond "Viola," we trained a custom "operator" model from scratch — zero manual data collection:

| | ViolaWake "viola" | ViolaWake "operator" | OWW "alexa" (pre-trained) |
|---|---|---|---|
| **EER** | **5.49%** | **7.2%** | 8.24% |
| **ROC AUC** | 0.988 | 0.984 | 0.956 |
| **Training time** | ~48s | **89s** | N/A (pre-trained) |
| **Architecture** | Temporal CNN | Temporal CNN | MLP on OWW embeddings |

The training CLI handled TTS sample generation (20 Edge TTS voices), confusable negative generation (16 phonetic variants), 10x augmentation, and Temporal CNN training end-to-end. OWW provides training notebooks but no pip-installable CLI tool.
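
To make the augmentation step more concrete, here is a minimal sketch of two of the named transforms, gain and noise mixing, producing 10 variants of one clip. It is not the ViolaWake implementation: the function names, the gain range (-6 to +6 dB), and the SNR range (5 to 20 dB) are assumptions, and time stretch and pitch shift are omitted because they need a DSP library.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def apply_gain(samples, gain_db):
    """Scale amplitude by a gain expressed in decibels."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


def mix_noise(samples, noise, snr_db):
    """Add noise scaled so that signal RMS / noise RMS matches snr_db."""
    target_noise_rms = rms(samples) / (10.0 ** (snr_db / 20.0))
    scale = target_noise_rms / rms(noise)
    return [s + n * scale for s, n in zip(samples, noise)]


def augment(samples, noise, copies=10, seed=0):
    """Produce `copies` variants with random gain and noise level (assumed ranges)."""
    rng = random.Random(seed)
    variants = []
    for _ in range(copies):
        gained = apply_gain(samples, rng.uniform(-6.0, 6.0))
        variants.append(mix_noise(gained, noise, rng.uniform(5.0, 20.0)))
    return variants


# Synthetic stand-ins for a recorded clip and a noise recording.
rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 10)]
noise_rng = random.Random(1)
noise = [noise_rng.gauss(0.0, 1.0) for _ in tone]
variants = augment(tone, noise)
print(len(variants), len(variants[0]))
```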

Full methodology, corpus details, and reproducibility instructions: [`benchmark_v2/OPERATOR_BENCHMARK.md`](benchmark_v2/OPERATOR_BENCHMARK.md)

---

## Models

Models are versioned and published to GitHub Releases. Use registry names without file extensions when passing `--model` or `WakeDetector(model=...)`. Download separately (too large for PyPI):

```bash
python -m violawake_sdk.tools.download_model --model temporal_cnn   # default, ~100 KB
python -m violawake_sdk.tools.download_model --model kokoro_v1_0    # TTS model, 326 MB
python -m violawake_sdk.tools.download_model --model kokoro_voices_v1_0  # TTS voices, 28 MB
```

| Model | Type | Size | EER* | Notes |
|-------|------|------|------|-------|
| `temporal_cnn.onnx` | Temporal CNN on OWW embeddings | ~100 KB | 5.49% | Production default — best live recall + lowest FP |
| `temporal_convgru.onnx` | Temporal Conv-GRU on OWW embeddings | ~81 KB | -- | Reserve model |
| ~~`r3_10x_s42.onnx`~~ | MLP on OWW embeddings | ~34 KB | -- | **Deprecated** — fails live mic test. Do not use. |
| `kokoro-v1.0.onnx` | Kokoro-82M TTS | ~326 MB | -- | Apache 2.0 (hosted by [kokoro-onnx](https://github.com/thewh1teagle/kokoro-onnx)) |

*EER (Equal Error Rate) from benchmark v2: 700 shared negatives (incl. adversarial confusables), 180 TTS positives, streaming inference. Lower is better. See `benchmark_v2/` for full methodology and scripts.
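
For readers new to the metric, the sketch below computes a clip-level EER from two lists of scores by sweeping candidate thresholds and taking the point where the false accept rate and the false reject rate are closest. It is a simplified illustration with made-up scores, not the `violawake-eval` implementation.

```python
def far_frr(pos_scores, neg_scores, threshold):
    """Return (FAR, FRR) when scores >= threshold count as detections."""
    far = sum(s >= threshold for s in neg_scores) / len(neg_scores)
    frr = sum(s < threshold for s in pos_scores) / len(pos_scores)
    return far, frr


def equal_error_rate(pos_scores, neg_scores):
    """Return (eer, threshold) at the smallest |FAR - FRR| over observed scores."""
    best = None
    for t in sorted(set(pos_scores) | set(neg_scores)):
        far, frr = far_frr(pos_scores, neg_scores, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Made-up scores: five wake-word clips and five non-wake-word clips.
pos = [0.95, 0.90, 0.85, 0.60, 0.92]
neg = [0.10, 0.20, 0.70, 0.30, 0.05]
eer, threshold = equal_error_rate(pos, neg)
print(f"EER={eer:.2f} at threshold={threshold:.2f}")
```

In this toy data one positive scores below the best threshold and one negative scores at it, so both error rates are 1 in 5 and the EER is 20%.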

---

## Platform Support

| Platform | Wake Word | TTS | STT | Status |
|----------|-----------|-----|-----|--------|
| Windows 10/11 (x64) | ✅ | ✅ | ✅ | **Fully tested** |
| Linux (x64) | ✅ | ✅ | ✅ | CI-tested |
| macOS (arm64/x64) | ✅ | ✅ | ✅ | CI-tested (Intel), community (ARM) |
| Raspberry Pi 4 (ARM64) | ✅ | ⚠️ slow | ✅ | Supported |
| Browser/WASM | 🚧 | 🚧 | ❌ | Phase 2 (Q3 2026) |
| Android | ❌ | ❌ | ❌ | Phase 3 (2027) |
| iOS | ❌ | ❌ | ❌ | Phase 3 (2027) |

---

## Installation

**Minimum install (wake word + VAD only):**
```bash
pip install violawake
```

> **Note:** Both `import violawake` and `import violawake_sdk` work. The canonical import is `violawake_sdk` (e.g., `from violawake_sdk import WakeDetector`), but `from violawake import WakeDetector` is also supported for convenience.

**With microphone input and model downloading:**
```bash
pip install "violawake[audio,download]"
```

**With TTS:**
```bash
pip install "violawake[tts]"
```

**With STT:**
```bash
pip install "violawake[stt]"
```

**Full pipeline (all features):**
```bash
pip install "violawake[all]"
```

**Requirements:**
- Python 3.10+
- `onnxruntime >= 1.17` (CPU) or `onnxruntime-gpu` for GPU acceleration
- `pyaudio` for microphone input
- `numpy`, `scipy`
- `openwakeword >= 0.6` for the frozen mel/embedding backbone (per the package metadata, installed by the `oww`, `training`, and `all` extras rather than by the base install)

---

## Performance Benchmarks

Measured on i7-12700H, Windows 11, RTX 3060 (CPU inference):

| Operation | Latency (p50) | Latency (p99) |
|-----------|--------------|--------------|
| Wake word inference (20ms frame) | 7.8 ms | 12.1 ms |
| VAD (WebRTC, 20ms frame) | 0.4 ms | 0.8 ms |
| STT (Whisper base, 3 s audio) | 680 ms | 1200 ms |
| TTS first audio (Kokoro, 1 sentence) | 310 ms | 580 ms |

**Wake word accuracy** (benchmark v2 — TTS corpus, 700 negatives incl. adversarial confusables):
- Temporal CNN model: **EER 5.49%**, ROC AUC 0.9877
- FAR @ FRR=5%: **5.43%** (vs OWW's 8.86% on its own best word)
- Live mic tested: 100% recall on direct speech, 0 false positives on podcast/music
- Real-world metrics depend on your deployment environment. Run `violawake-eval` (clip-by-clip) or `violawake-streaming-eval` (continuous FAPH) on your own test data.

---

## Debugging

Enable debug logging to see gate rejections, backbone output, score tracking, and detection decisions:

```python
import logging
logging.basicConfig(level=logging.DEBUG)

from violawake_sdk import WakeDetector
detector = WakeDetector(model="temporal_cnn", threshold=0.80)
```

This produces output like:
- `Gate 1 reject: RMS 0.0 below floor 1.0` -- silence/DC offset filtered
- `Gate 3 reject: cooldown active (1.2s remaining)` -- too soon after last detection
- `Gate 4 reject: playback active` -- suppressed during music
- `Wake word detected! score=0.872` -- successful detection

Set `level=logging.INFO` for detections only (less verbose).
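
If global DEBUG output is too noisy because other libraries log as well, the standard `logging` module can raise verbosity for a single logger namespace. The sketch below assumes the SDK's loggers live under the name `violawake_sdk` (the common `logging.getLogger(__name__)` convention); that name is an assumption, so check the logger names in your own output.

```python
import logging

# Root stays at INFO so third-party libraries remain quiet.
logging.basicConfig(format="%(name)s %(levelname)s %(message)s")
logging.getLogger().setLevel(logging.INFO)

# Assumed namespace: child loggers such as "violawake_sdk.something" inherit this level.
logging.getLogger("violawake_sdk").setLevel(logging.DEBUG)

# Hypothetical logger names used only to show the effect.
enabled = {
    name: logging.getLogger(name).isEnabledFor(logging.DEBUG)
    for name in ("violawake_sdk.example", "some_other_library")
}
print(enabled)
```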

---

## Examples

The `examples/` directory contains runnable scripts:

| File | Description |
|------|-------------|
| `examples/basic_detection.py` | Minimal microphone wake word detection loop |
| `examples/async_detection.py` | Async wake word detection with AsyncWakeDetector |
| `examples/streaming_eval.py` | Evaluate false accepts per hour on a WAV file |

Run any example with:
```bash
python examples/basic_detection.py
```

---

## Comparison to openWakeWord

openWakeWord is the closest open-source alternative. ViolaWake differs in four ways:

- **Open, reproducible evaluation:** `violawake-eval` reports EER, FAR/FRR, and ROC AUC for any model and test set, and `violawake-streaming-eval` measures FAPH on continuous audio. The benchmark scripts live in `benchmark_v2/`, so you can run them yourself.
- **Production-hardened decision policy:** A 4-gate pipeline (zero-input guard, score threshold, cooldown, listening gate) plus optional multi-window confirmation suppresses false positives during music playback, provided the `is_playing` state is wired up.
- **Bundled pipeline:** ViolaWake ships integrated VAD, STT, and TTS, not just the wake word component.
- **Training infrastructure:** FocalLoss, EMA, SWA, and an augmentation pipeline (gain, stretch, pitch, noise, time shift; RIR and SpecAugment available opt-in), compared with basic training in openWakeWord.
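
The gate sequence described in the list above can be pictured as a chain of early exits. The sketch below is a simplified model under stated assumptions: the class name `GatePolicy`, the gate order (taken from the numbering in the debug log messages), and the 2-second cooldown are illustrative, and the real SDK logic may differ.

```python
from dataclasses import dataclass, field


@dataclass
class GatePolicy:
    threshold: float = 0.80
    rms_floor: float = 1.0   # matches the floor shown in the debug log example
    cooldown_s: float = 2.0  # hypothetical cooldown length
    _last_fire: float = field(default=float("-inf"), repr=False)

    def decide(self, score, rms, now, is_playing=False):
        """Return (fired, reason); each gate can reject before the next runs."""
        if rms < self.rms_floor:
            return False, "gate 1: input below RMS floor"
        if score < self.threshold:
            return False, "gate 2: score below threshold"
        if now - self._last_fire < self.cooldown_s:
            return False, "gate 3: cooldown active"
        if is_playing:
            return False, "gate 4: playback active"
        self._last_fire = now
        return True, "detected"


policy = GatePolicy()
decisions = [
    policy.decide(0.95, rms=0.0, now=0.0),                      # silent input
    policy.decide(0.50, rms=500.0, now=0.1),                    # low score
    policy.decide(0.90, rms=500.0, now=0.2),                    # passes all gates
    policy.decide(0.90, rms=500.0, now=1.0),                    # inside cooldown
    policy.decide(0.90, rms=500.0, now=5.0, is_playing=True),   # music playing
]
for fired, reason in decisions:
    print(fired, reason)
```

Returning a reason string alongside the boolean mirrors the style of the debug log lines and makes rejected frames easy to audit.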

---

## Migrating from openWakeWord

ViolaWake uses openWakeWord's mel-spectrogram embedding model as a frozen feature extractor backbone. If you have existing OWW training data, you can use it directly with ViolaWake's training CLI.

**Key differences from OWW:**
- **Decision policy:** ViolaWake adds a multi-gate pipeline (RMS floor, cooldown, playback suppression) on top of raw scores. OWW exposes raw sigmoid scores only.
- **Temporal models:** ViolaWake supports Temporal CNN and Conv-GRU heads that score across a sliding window of embeddings, not just a single frame. This reduces false positives on speech that partially matches the wake word.
- **Augmentation pipeline:** ViolaWake's training CLI applies gain, time stretch, pitch shift, and noise mixing, with RIR convolution available opt-in. SpecAugment is available for custom spectrogram-level pipelines via `AugmentationPipeline.augment_spectrogram()`. OWW's default training uses minimal augmentation.
- **Confidence API:** `detector.get_confidence()` and `detector.last_scores` provide structured confidence tracking that OWW does not offer.
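
Scoring across a sliding window of embeddings, as described for the temporal heads above, can be sketched with a fixed-length buffer. Everything here is illustrative: `SlidingWindowScorer` is a hypothetical name, the window length of 3 is arbitrary, and the scoring function is a trivial stand-in (mean of the first component), not a Temporal CNN or Conv-GRU.

```python
from collections import deque
from typing import Callable, List, Optional, Sequence


class SlidingWindowScorer:
    """Keep the last `window` embeddings and score them together."""

    def __init__(
        self, window: int, score_fn: Callable[[List[Sequence[float]]], float]
    ) -> None:
        self._buffer = deque(maxlen=window)  # oldest embedding drops out automatically
        self._score_fn = score_fn

    def push(self, embedding: Sequence[float]) -> Optional[float]:
        """Add one embedding; return a score once the window is full."""
        self._buffer.append(embedding)
        if len(self._buffer) < self._buffer.maxlen:
            return None  # not enough context yet
        return self._score_fn(list(self._buffer))


def mean_first_component(window: List[Sequence[float]]) -> float:
    """Stand-in scorer used only to make the example runnable."""
    return sum(e[0] for e in window) / len(window)


scorer = SlidingWindowScorer(window=3, score_fn=mean_first_component)
outputs = [scorer.push(e) for e in ([0.0], [0.5], [1.0], [1.0])]
print(outputs)
```

A single high frame cannot produce a high score on its own here, which is the intuition behind windowed scoring reducing false positives on partial matches.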

**Using existing OWW training data:**
```bash
# Your OWW positive samples work as-is (16kHz WAV/FLAC)
violawake-train \
  --word "my_wake_word" \
  --positives path/to/oww_positives/ \
  --negatives path/to/oww_negatives/ \
  --output models/my_wake_word.onnx \
  --epochs 50
```

No format conversion is needed -- ViolaWake reads the same 16kHz mono WAV/FLAC files that OWW uses.
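
If you are unsure whether existing clips match that format, a quick check with the standard-library `wave` module can flag mismatches before training. This sketch covers WAV only (the `wave` module does not read FLAC), and the helper names are hypothetical.

```python
import tempfile
import wave
from pathlib import Path


def write_silence(path: Path, rate: int, channels: int, seconds: float = 0.1) -> None:
    """Write a short 16-bit silent WAV file, used here as demo input."""
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(rate * seconds) * channels)


def is_16k_mono_wav(path: Path) -> bool:
    """Return True if the WAV header reports 16 kHz and a single channel."""
    with wave.open(str(path), "rb") as wf:
        return wf.getframerate() == 16000 and wf.getnchannels() == 1


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.wav"
    bad = Path(tmp) / "bad.wav"
    write_silence(good, rate=16000, channels=1)
    write_silence(bad, rate=44100, channels=2)
    results = {p.name: is_16k_mono_wav(p) for p in (good, bad)}
print(results)
```

Files that fail the check would need resampling or downmixing with an audio tool of your choice before use.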

---

## Roadmap

**v1.0 (Q2 2026) — Phase 1 MVP:**
- [x] Python SDK (Wake + VAD)
- [x] Kokoro TTS integration
- [x] faster-whisper STT integration
- [x] Full VoicePipeline class
- [x] Training CLI
- [ ] PyPI release
- [ ] Documentation site

**v1.1 (Q3 2026) — Streaming + Web:**
- [ ] Streaming STT (faster-whisper generator mode)
- [ ] WASM build for ViolaWake
- [ ] JavaScript/Node SDK wrapper
- [ ] Custom wake word web Console (alpha)

**v2.0 (Q1 2027) — Multi-platform:**
- [ ] Android SDK (ONNX Runtime Android)
- [ ] iOS SDK (ONNX Runtime iOS)
- [ ] DeepFilterNet noise suppression integration
- [ ] Speaker diarization (pyannote.audio)
- [ ] License/metering infrastructure

---

## Contributing

```bash
git clone https://github.com/GeeIHadAGoodTime/ViolaWake
cd ViolaWake
pip install -e ".[dev]"
pre-commit install
pytest tests/
```

See `CONTRIBUTING.md` for guidelines.

---

## License

Apache 2.0. Models trained on open datasets. See `LICENSE` for details.

ViolaWake uses OpenWakeWord as a frozen feature extractor backbone (also Apache 2.0). The classification heads (Temporal CNN, Conv-GRU) and training pipeline are original ViolaWake work.
