Environment: Windows 11, bash shell. Project root: J:\CLAUDE\PROJECTS\Wakeword (master).

PROBLEM
The training pipeline currently has TWO competing sources for speech negatives:

1. **Edge-TTS-synthesized speech negatives** (`_generate_speech_negatives` in `src/violawake_sdk/tools/train.py`). Generates ~5-30 random English phrases via Microsoft Edge TTS at training time. **Microsoft's free-tier edge-tts is rate-limited and 503s under load**, causing the training worker to hang/retry. We've seen `WSServerHandshakeError: 503` repeatedly in production logs.

2. **Pre-curated universal corpus** (LibriSpeech + MUSAN) in the repo's `corpus/` directory. Loaded as Source 4 in `console/backend/app/services/training_service.py`. Reliable file access, no network failure modes, peer-reviewed datasets. ~5GB total, hosted in the repo.

Architecturally these compete. The user has decided: **corpus is the home recipe; it applies to everyone.** Edge-TTS should ONLY be used for wake-word-specific generation (TTS positives + confusables), NEVER for generic speech negatives.

REQUIRED CHANGES

1. **Remove generic speech-negative generation via edge-tts.**
   - In `console/backend/app/services/training_service.py`, delete the call to `_generate_speech_negatives` and the surrounding try/except (Source 3, around lines 210-225).
   - Update the progress messages so users see "Loading corpus speech negatives" instead of "Generating speech negatives".
   - The `_generate_speech_negatives` function in `src/violawake_sdk/tools/train.py` can stay as a callable utility (someone might still use it from CLI), but mark its docstring as "deprecated for production training".

2. **Promote corpus to required (with graceful fallback, not silent failure).**
   - In `training_service.py`, after the corpus search loop populates `neg_tag_map`, count total negatives.
   - If `total_neg < 5` AFTER corpus + confusables (Sources 1, 2, 4), raise a CLEAR error: "No speech negatives available. Mount LibriSpeech + MUSAN corpus at /app/corpus or run `violawake download-corpus`."
   - Don't blame edge-tts in the error message anymore.
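
A minimal sketch of the fail-fast check described in step 2. It assumes `neg_tag_map` maps a tag name to a collection of negative samples; the real structure in `training_service.py` may differ. `MissingCorpusError`, `MIN_SPEECH_NEGATIVES`, and `require_speech_negatives` are hypothetical names used only for illustration.

```python
from typing import Mapping, Sequence

# Threshold taken from the brief (total_neg < 5 is an error).
MIN_SPEECH_NEGATIVES = 5


class MissingCorpusError(RuntimeError):
    """Hypothetical error type; the real service may raise its own exception class."""


def require_speech_negatives(neg_tag_map: Mapping[str, Sequence[object]]) -> int:
    """Count negatives across all tags and fail fast when too few are present.

    Assumes each value in neg_tag_map is a sized collection of samples.
    """
    total_neg = sum(len(items) for items in neg_tag_map.values())
    if total_neg < MIN_SPEECH_NEGATIVES:
        # The message points at the corpus, not at edge-tts.
        raise MissingCorpusError(
            "No speech negatives available. Mount LibriSpeech + MUSAN corpus "
            "at /app/corpus or run `violawake download-corpus`."
        )
    return total_neg
```

Calling this once, after Sources 1, 2, and 4 have been loaded, keeps the check in one place and avoids a silent fallback.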

3. **Add a `violawake download-corpus` CLI command.**
   - In `src/violawake_sdk/tools/`, add `download_corpus.py` that fetches the corpus into `~/.violawake/corpus/`. For now, fetch only LibriSpeech dev-clean (~330MB; small and sufficient for training negatives). MUSAN (~11GB in full, or ~3GB for the speech subset alone) is not fetched yet.
   - Hook into `pyproject.toml` scripts: `violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"`. Note that this entry installs the hyphenated command `violawake-download-corpus`; if the `violawake download-corpus` spelling used elsewhere in this brief is meant to work as typed, it also needs to be reachable as a subcommand of a `violawake` command.
   - Fetch from a stable host. LibriSpeech: `https://www.openslr.org/resources/12/dev-clean.tar.gz` (~330MB). MUSAN: `https://www.openslr.org/resources/17/musan.tar.gz` (~11GB).
   - Confirm that the corpus search paths in `training_service.py` include `~/.violawake/corpus`. Codex δ's audit reports the path is already there, so add it only if it is missing.
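
One possible shape for `download_corpus.py`, offered as a sketch rather than the final implementation. The URL and destination come from the brief; the `--dest` flag, the `.part` temporary file, and the names `download_corpus` and `build_parser` are assumptions. The module deliberately has no `__main__` guard here, so importing it never triggers a download; the `pyproject.toml` entry point calls `main` directly.

```python
import argparse
import tarfile
import urllib.request
from pathlib import Path
from typing import Optional, Sequence

LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
DEFAULT_DEST = Path.home() / ".violawake" / "corpus"


def download_corpus(dest: Path = DEFAULT_DEST, url: str = LIBRISPEECH_DEV_CLEAN_URL) -> Path:
    """Fetch the archive into dest (unless already present) and extract it there."""
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        # Download under a temporary name so an interrupted fetch
        # is not mistaken for a complete archive on the next run.
        partial = archive.with_name(archive.name + ".part")
        urllib.request.urlretrieve(url, partial)
        partial.replace(archive)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Safer extraction on Python versions that support filters.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="violawake-download-corpus",
        description="Download LibriSpeech dev-clean for use as speech negatives.",
    )
    # --dest is a hypothetical flag; the brief only fixes the default location.
    parser.add_argument("--dest", type=Path, default=DEFAULT_DEST,
                        help="directory to download and extract into")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    path = download_corpus(args.dest)
    print(f"Corpus ready at {path}")
    return 0
```

Because the function looks up `urllib.request.urlretrieve` at call time, a smoke test can patch that attribute with `unittest.mock.patch` and write a tiny tar.gz to the target path instead of touching the network.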

4. **Update docker-compose.production.yml.**
   - Make the `./corpus:/app/corpus:ro` mount REQUIRED — change the comment to say "REQUIRED — see docs/DEPLOYMENT.md".
   - If a user runs without the mount, document that they must run `violawake download-corpus` and mount that path instead.

5. **Update `docs/DEPLOYMENT.md`.**
   - Add a "Corpus" section between Environment vars and Backend deploy.
   - Two paths: (a) mount the in-repo `./corpus/` (operator default), (b) `violawake download-corpus` then mount `~/.violawake/corpus/`.
   - Note: WITHOUT this corpus, training fails fast and clearly. No silent fallback to flaky edge-tts.

6. **Edge-TTS for what remains.**
   - `_generate_tts_positives` (wake-word in many voices) — KEEP. Wake-word-specific.
   - `_generate_confusable_negatives` (similar-sounding words) — KEEP. Wake-word-specific.
   - But: add backoff/retry-with-jitter so 503s don't crash the worker; on TOTAL failure, log loudly and continue (the corpus + user-uploads still cover the basics).
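
The backoff requirement in step 6 could look roughly like the sketch below: exponential backoff with full jitter, returning `None` on total failure so the caller can log and continue. It is written synchronously for brevity; if the edge-tts calls in `train.py` are coroutines, the same loop would `await` the call and an async sleep instead. `retry_with_jitter` and its parameters are hypothetical, not an existing helper in the codebase.

```python
import logging
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def retry_with_jitter(
    call: Callable[[], T],
    *,
    attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Run call(), retrying on failure; return None if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on as exc:
            if attempt == attempts:
                # Total failure: log loudly and let the caller carry on without TTS data.
                logger.error("edge-tts failed after %d attempts, continuing without it: %s",
                             attempts, exc)
                return None
            # Full jitter: pick a random delay up to the exponential cap.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay = random.uniform(0, cap)
            logger.warning("edge-tts attempt %d/%d failed (%s); retrying in %.1fs",
                           attempt, attempts, exc, delay)
            sleep(delay)
    return None
```

In practice `retry_on` would be narrowed to the errors actually seen in the logs (for example the handshake error that carries the 503) so that programming errors still surface immediately.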

CRITICAL CONSTRAINTS
- Do NOT use PowerShell with complex quoting.
- NEVER `git add -A`. Stage explicit files. Logical commits.
- Do not push. User will review and push.
- The `download_corpus` utility should NOT actually run during this Codex session (no need to download 330MB+ in CI). Just write the code and a smoke test that mocks `urlretrieve`.

PROVE IT
1. `cd console && python -m pytest tests/ --no-cov --timeout=30 --ignore=tests/e2e -q 2>&1 | tail -3` — must still be `>= 147 passed, 0 failed`.
2. Show the diff of `training_service.py` (the speech-neg removal).
3. Show the new `download_corpus.py` file with --help working.
4. Show the updated docker-compose.production.yml comment.
5. Confirm pyproject.toml `[project.scripts]` has the new entry.

REPORT
- Files changed.
- Commit SHA(s) — one per concern.
- One-line takeaway: now training requires the corpus. No more flaky edge-tts hammer.

Time budget: ~25 min.
