Reading additional input from stdin...
OpenAI Codex v0.128.0 (research preview)
--------
workdir: J:\CLAUDE\PROJECTS\Wakeword
model: gpt-5.5
provider: openai
approval: never
sandbox: danger-full-access
reasoning effort: xhigh
reasoning summaries: none
session id: 019e027d-91fe-7ce0-af8b-98b4c73c5a9d
--------
user
Environment: Windows 11, bash shell. Project root: J:\CLAUDE\PROJECTS\Wakeword (master).

PROBLEM
The training pipeline currently has TWO competing sources for speech negatives:

1. **Edge-TTS-synthesized speech negatives** (`_generate_speech_negatives` in `src/violawake_sdk/tools/train.py`). Generates ~5-30 random English phrases via Microsoft Edge TTS at training time. **Microsoft's free-tier edge-tts is rate-limited and 503s under load**, causing the training worker to hang/retry. We've seen `WSServerHandshakeError: 503` repeatedly in production logs.

2. **Pre-curated universal corpus** (LibriSpeech + MUSAN) at repo `corpus/`. Loaded as Source 4 in `console/backend/app/services/training_service.py`. Reliable file access, no network failure modes, peer-reviewed dataset. ~5GB total, hosted in the repo.

Architecturally these compete. The user has decided: **corpus is the home recipe; it applies to everyone.** Edge-TTS should ONLY be used for wake-word-specific generation (TTS positives + confusables), NEVER for generic speech negatives.

REQUIRED CHANGES

1. **Remove generic speech-negative generation via edge-tts.**
   - In `console/backend/app/services/training_service.py`, delete the call to `_generate_speech_negatives` and the surrounding try/except (Source 3, around line ~210-225).
   - Update the progress messages so users see "Loading corpus speech negatives" instead of "Generating speech negatives".
   - The `_generate_speech_negatives` function in `src/violawake_sdk/tools/train.py` can stay as a callable utility (someone might still use it from CLI), but mark its docstring as "deprecated for production training".

2. **Promote corpus to required (with graceful fallback, not silent failure).**
   - In `training_service.py`, after the corpus search loop populates `neg_tag_map`, count total negatives.
   - If `total_neg < 5` AFTER corpus + confusables (Sources 1, 2, 4), raise a CLEAR error: "No speech negatives available. Mount LibriSpeech + MUSAN corpus at /app/corpus or run `violawake download-corpus`."
   - Don't blame edge-tts in the error message anymore.

3. **Add a `violawake download-corpus` CLI command.**
   - In `src/violawake_sdk/tools/`, add `download_corpus.py` that fetches LibriSpeech (dev-clean only — small, ~330MB) + MUSAN (~11GB or just speech subset ~3GB) into `~/.violawake/corpus/`.
   - Hook into `pyproject.toml` scripts: `violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"`.
   - The fetch URLs should point at a stable host. LibriSpeech dev-clean (~330MB): `https://www.openslr.org/resources/12/dev-clean.tar.gz`. MUSAN (~11GB): `https://www.openslr.org/resources/17/musan.tar.gz`. For now, fetch only LibriSpeech dev-clean (smaller; sufficient for training negatives).
   - Update `training_service.py` corpus search paths to include `~/.violawake/corpus` (already there per Codex δ's audit).

4. **Update docker-compose.production.yml.**
   - Make the `./corpus:/app/corpus:ro` mount REQUIRED — change the comment to say "REQUIRED — see docs/DEPLOYMENT.md".
   - If a user runs without the in-repo mount, document that they must run `violawake download-corpus` and mount the downloaded path (`~/.violawake/corpus`) instead.

5. **Update `docs/DEPLOYMENT.md`.**
   - Add a "Corpus" section between Environment vars and Backend deploy.
   - Two paths: (a) mount the in-repo `./corpus/` (operator default), (b) `violawake download-corpus` then mount `~/.violawake/corpus/`.
   - Note: WITHOUT this corpus, training fails fast and clearly. No silent fallback to flaky edge-tts.

6. **Edge-TTS for what remains.**
   - `_generate_tts_positives` (wake-word in many voices) — KEEP. Wake-word-specific.
   - `_generate_confusable_negatives` (similar-sounding words) — KEEP. Wake-word-specific.
   - But: add backoff/retry-with-jitter so 503s don't crash the worker; on TOTAL failure, log loudly and continue (the corpus + user-uploads still cover the basics).
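
The backoff/retry-with-jitter behaviour asked for in change 6 could look roughly like the sketch below. It assumes the wrapped call follows the same bool-returning convention as `_edge_tts_synthesize`; the helper name `retry_with_jitter` and its parameters are hypothetical and are not part of the existing code.

```python
from __future__ import annotations

import logging
import random
import time
from typing import Callable

logger = logging.getLogger(__name__)


def retry_with_jitter(
    attempt: Callable[[], bool],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: random.Random | None = None,
) -> bool:
    """Call attempt() until it returns True or max_attempts is reached.

    Hypothetical helper: uses exponential backoff with "full jitter",
    i.e. each wait is drawn uniformly from [0, min(max_delay, base * 2**n)].
    Returns False on total failure so the caller can log and continue.
    """
    rng = rng or random.Random()
    for n in range(max_attempts):
        try:
            if attempt():
                return True
        except Exception as exc:  # e.g. a 503 handshake error from the TTS host
            logger.warning("attempt %d/%d raised %r", n + 1, max_attempts, exc)
        if n < max_attempts - 1:
            cap = min(max_delay, base_delay * (2 ** n))
            sleep(rng.uniform(0.0, cap))
    logger.error("all %d attempts failed; continuing without this sample", max_attempts)
    return False
```

A call site might then wrap the synthesis as `retry_with_jitter(lambda: _edge_tts_synthesize(word, voice, out_path))`. Randomising each wait keeps parallel workers from retrying in lockstep against a rate-limited endpoint.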

CRITICAL CONSTRAINTS
- Do NOT use PowerShell with complex quoting.
- NEVER `git add -A`. Stage explicit files. Logical commits.
- Do not push. User will review and push.
- The download_corpus utility should NOT actually run during this Codex session (no need to download 330MB+ in CI). Just write the code and a smoke test that mocks the urlretrieve.
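
A minimal sketch of the downloader plus a smoke test that mocks `urlretrieve`, so nothing is fetched over the network. The function names `download_corpus`, `_fake_urlretrieve` and `smoke_test` are illustrative assumptions. Unpacking into a `librispeech/` subdirectory is also an assumption, chosen so that a corpus search that looks for `<corpus_root>/librispeech` would find the files.

```python
from __future__ import annotations

import tarfile
import tempfile
import urllib.request
from pathlib import Path
from unittest import mock

LIBRISPEECH_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"


def download_corpus(dest: Path, url: str = LIBRISPEECH_URL) -> Path:
    """Fetch the archive at url (if not cached) and unpack it under dest/librispeech."""
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    target = dest / "librispeech"
    target.mkdir(exist_ok=True)
    # Sketch only: real code should validate archive members before extracting.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)
    return target


def _fake_urlretrieve(url: str, filename: str) -> None:
    """Stand-in for urlretrieve: writes a tiny tar.gz instead of downloading."""
    sample = Path(filename).parent / "sample.flac"
    sample.write_bytes(b"not real audio")
    with tarfile.open(filename, "w:gz") as tar:
        tar.add(sample, arcname="LibriSpeech/dev-clean/sample.flac")
    sample.unlink()


def smoke_test() -> list[str]:
    """Run download_corpus against the mocked fetch; return extracted file names."""
    with tempfile.TemporaryDirectory() as tmp:
        with mock.patch("urllib.request.urlretrieve", side_effect=_fake_urlretrieve) as fake:
            target = download_corpus(Path(tmp) / "corpus")
            assert fake.call_count == 1
            return sorted(p.name for p in target.rglob("*.flac"))
```

Because `download_corpus` looks up `urllib.request.urlretrieve` at call time, patching that attribute is enough to intercept the fetch. A real CLI entry point would add argument parsing and progress reporting on top of this.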

PROVE IT
1. `cd console && python -m pytest tests/ --no-cov --timeout=30 --ignore=tests/e2e -q 2>&1 | tail -3` — must still be `>= 147 passed, 0 failed`.
2. Show the diff of `training_service.py` (the speech-neg removal).
3. Show the new `download_corpus.py` file with --help working.
4. Show the updated docker-compose.production.yml comment.
5. Confirm pyproject.toml `[project.scripts]` has the new entry.

REPORT
- Files changed.
- Commit SHA(s) — one per concern.
- One-line takeaway: now training requires the corpus. No more flaky edge-tts hammer.

Time budget: ~25 min.
codex
I’ll make this as scoped repo work: inspect the existing training paths and packaging, patch the code/docs/tests, run the requested proof commands, then commit explicit files by concern. I’ll avoid running the corpus downloader and keep shell quoting simple.
  → Inspect repo scripts, training code, docs, and tests
  • Patch training corpus behavior and Edge-TTS retry handling
  • Add download-corpus CLI and mocked smoke test
  • Update deployment docs and compose comment
  • Run targeted validation and requested console tests
  • Stage explicit files and create logical commits
codex
Using the `code-nav` skill for quick symbol lookup before wider file reads. After that I’ll only read the files needed for this change.
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Get-Content -Path C:\\Users\\jihad\\.codex\\skills\\code-nav\\SKILL.md" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 465ms:
---
name: code-nav
description: Fast symbol lookup with `tools/code_nav.py`. Use before broad `rg` or file reads when you need definitions, references, callers, or file structure.
---

# Code Navigation

Use this skill from the NOVVIOLA repo root.

## Workflow

1. Resolve a definition before broad search:
   `python tools/code_nav.py definition <symbol>`
2. Find references when tracing behavior:
   `python tools/code_nav.py references <symbol>`
3. Inspect a file outline before opening the full file:
   `python tools/code_nav.py structure <path>`
4. Find callers when you need the entry points into a function:
   `python tools/code_nav.py callers <function>`

## Quick Commands

- `python tools/code_nav.py definition create_browser_server`
- `python tools/code_nav.py references payment_gate`
- `python tools/code_nav.py structure intent/agent_executor.py`
- `python tools/code_nav.py callers request_review`

## Notes

- Prefer this before broad `rg` when the question is about Python symbols.
- Do not use this for non-symbol text searches or for reading file bodies line by line.

## Read Next

Read `references/source.md` for the original Claude skill and script path.

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'git status --short' in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "rg --files | rg \"(training_service\\.py|train\\.py|pyproject\\.toml|docker-compose\\.production\\.yml|DEPLOYMENT\\.md|tests)\"" in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Test-Path tools\\code_nav.py; Test-Path src\\violawake_sdk\\tools; Test-Path console\\backend\\app\\services" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 328ms:
 M console/backend/requirements.txt
?? .codex_log_a.txt
?? .codex_log_alpha.txt
?? .codex_log_b.txt
?? .codex_log_c.txt
?? .codex_log_corpus.txt
?? .codex_log_delta.txt
?? .codex_prompt_a.txt
?? .codex_prompt_alpha.txt
?? .codex_prompt_b.txt
?? .codex_prompt_c.txt
?? .codex_prompt_corpus.txt
?? .codex_prompt_delta.txt

 succeeded in 336ms:
False
True
True

 succeeded in 452ms:
pyproject.toml
docker-compose.production.yml
tests\__init__.py
console\tests\__init__.py
console\tests\test_teams.py
console\tests\test_storage.py
console\tests\test_security_headers.py
console\tests\test_retention.py
console\tests\test_max_body_size.py
console\tests\test_job_queue.py
console\tests\test_health_monitoring.py
console\tests\test_billing.py
console\tests\test_backend.py
console\tests\test_auth_security.py
console\tests\test_auth_email_routes.py
console\tests\screenshot_audit.py
docs\DEPLOYMENT.md
tests\unit\__init__.py
tests\unit\test__constants.py
tests\unit\test_weight_averaging.py
tests\unit\test_wake_detector_edge_cases.py
tests\unit\test_wake_detector_core.py
tests\unit\test_wake_detector.py
tests\unit\test_wake_decision_policy.py
tests\unit\test_wakeword_detector.py
tests\unit\test_voice_pipeline.py
tests\unit\test_vad_engine.py
tests\unit\test_vad.py
tests\unit\test_tts_engine.py
tests\unit\test_training_pipeline.py
tests\unit\test_train.py
tests\unit\test_tflite_backend.py
tests\unit\test_temporal_model.py
tests\unit\test_stt_tts_engines.py
tests\unit\test_stt_engine_wav.py
tests\unit\test_stt_engine.py
tests\unit\test_stress.py
tests\unit\test_spec_augment.py
tests\unit\test_speaker.py
tests\unit\test_rir_augment.py
tests\unit\test_power_manager.py
tests\unit\test_pipeline.py
tests\unit\test_performance.py
tests\unit\test_oww_backbone.py
tests\unit\test_noise_profiler.py
tests\unit\test_model_download.py
tests\unit\test_models.py
tests\unit\test_losses.py
tests\unit\test_long_running.py
tests\unit\test_fuzz.py
tests\unit\test_evaluate_oww.py
tests\unit\test_ensemble.py
tests\unit\test_detector_config.py
tests\unit\test_confusables.py
tests\unit\test_confidence.py
tests\unit\test_concurrent_access.py
tests\unit\test_cli.py
tests\unit\test_cert_pinning.py
tests\unit\test_benchmark.py
tests\unit\test_augment.py
tests\unit\test_audio_source.py
tests\unit\test_audio.py
tests\unit\test_async_detector.py
src\violawake_sdk\cli\train.py
console\backend\tests\__init__.py
console\backend\tests\test_change_password.py
console\tests\e2e\__init__.py
console\tests\e2e\test_browser_flow.py
console\tests\e2e\test_api_flow.py
console\tests\e2e\pytest.ini
console\tests\e2e\conftest_playwright.py
console\tests\e2e\conftest.py
console\tests\conftest.py
src\violawake_sdk\tools\train.py
tests\golden_path_test.py
tests\conftest.py
tests\live\test_live_website.py
tests\live\test_live_wasm.py
tests\live\test_live_sdk.py
tests\live\test_live_email.py
tests\live\test_live_billing.py
tests\live\test_live_api.py
tests\live\run_smoke.sh
tests\live\RESULTS_2026-05-07.md
tests\live\RESULTS_2026-05-06.md
tests\live\README.md
tests\live\full_pipeline_e2e.py
tests\live\conftest.py
tests\live\ACCESSIBILITY_AUDIT_2026-05-07.md
experiments\exp_bcd_retrain.py
tests\benchmarks\__init__.py
tests\benchmarks\bench_latency.py
tests\integration\test_sdk_surface.py
tests\integration\test_pipeline.py
tests\integration\test_full_pipeline.py
tests\integration\test_feature_completeness.py
tests\integration\test_training_e2e.py
tests\integration\test_streaming_stt.py
tests\integration\test_wake_detector_e2e.py
tests\integration\__init__.py
console\backend\app\services\training_service.py

codex
The worktree is already dirty: `console/backend/requirements.txt` is modified and several `.codex_log*`/`.codex_prompt*` files are untracked. I’m leaving those alone and will stage only the files I touch.
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "rg -n -C 6 \"generate_speech_negatives|Generating speech negatives|corpus|neg_tag_map|total_neg|confusable|generate_tts\" console\\backend\\app\\services\\training_service.py" in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "rg -n -C 4 \"def _generate_speech_negatives|def _generate_tts_positives|def _generate_confusable_negatives|edge_tts|Communicate|WSServerHandshakeError|asyncio\" src\\violawake_sdk\\tools\\train.py" in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "rg -n -C 4 \"\\[project\\.scripts\\]|violawake|download\" pyproject.toml" in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "rg -n -C 5 \"corpus|/app/corpus|DEPLOYMENT|environment|Backend\" docker-compose.production.yml docs\\DEPLOYMENT.md" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 343ms:
2-requires = ["hatchling>=1.21"]
3-build-backend = "hatchling.build"
4-
5-[project]
6:name = "violawake"
7-version = "0.2.3"
8-description = "Open-source wake word detection SDK with training pipeline — privacy-first, on-device, Python-native"
9-readme = "README.md"
10-license = { file = "LICENSE" }
--
34-]
35-requires-python = ">=3.10"
36-
37-# Core dependencies (inference-only: wake word + VAD)
38:# pyaudio, requests, and tqdm are optional — see [audio] and [download] extras
39-# openwakeword is optional — see [oww] extra (tflite-runtime has no Python 3.12+ Linux wheels)
40-dependencies = [
41-    "onnxruntime>=1.17",
42-    "numpy>=1.24",
--
50-    "pyaudio>=0.2.14",
51-    "soundfile>=0.12",
52-]
53-
54:# Download: model downloading with progress bars
55:download = [
56-    "requests>=2.31",
57-    "tqdm>=4.66",
58-]
59-
--
120-]
121-
122-# Full install
123-all = [
124:    "violawake[audio,download,tts,stt,oww,tflite,vad,training,generate]",
125-]
126-
127-# Dev / CI
128-dev = [
--
139-    "tqdm>=4.66",
140-    "types-requests>=2.31",
141-]
142-
143:[project.scripts]
144:violawake-train = "violawake_sdk.tools.train:main"
145:violawake-eval = "violawake_sdk.tools.evaluate:main"
146:violawake-collect = "violawake_sdk.tools.collect_samples:main"
147:violawake-download = "violawake_sdk.tools.download_model:main"
148:violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
149:violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
150:violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
151:violawake-contamination-check = "violawake_sdk.tools.contamination_check:main"
152:violawake-generate = "violawake_sdk.tools.generate_samples:main"
153-
154-[project.urls]
155-Homepage = "https://github.com/GeeIHadAGoodTime/ViolaWake"
156-Documentation = "https://github.com/GeeIHadAGoodTime/ViolaWake#readme"
--
189-    "*.onnx",
190-]
191-
192-[tool.hatch.build.targets.wheel]
193:packages = ["src/violawake_sdk"]
194-
195-[tool.pytest.ini_options]
196-testpaths = ["tests"]
197:addopts = "-v --cov=violawake_sdk --cov-report=term-missing"
198-asyncio_mode = "auto"
199-markers = [
200-    "integration: requires model files (deselect with -m 'not integration')",
201-    "benchmark: performance benchmark tests",
--
227-ignore = [
228-    "E501", # line too long (handled by formatter)
229-    "T201", # print() allowed in scripts/tools
230-]
231:per-file-ignores = { "tools/**" = ["T201"], "src/violawake_sdk/tools/**" = ["T201"] }
232-
233-[tool.mypy]
234-python_version = "3.10"
235-strict = true
--
237-warn_unused_configs = true
238-disallow_untyped_defs = true
239-# numpy/torch return types are complex; these codes are noisy in strict mode
240-disable_error_code = ["no-any-return", "type-var", "no-untyped-call"]
241:exclude = ["violawake_sdk/training/", "violawake_sdk/tools/"]
242-
243-[[tool.mypy.overrides]]
244-module = [
245-    "openwakeword.*",
--
261-]
262-ignore_missing_imports = true
263-
264-[tool.coverage.run]
265:source = ["src/violawake_sdk"]
266-omit = ["*/tests/*", "*/tools/*"]
267-
268-[tool.coverage.report]
269-exclude_lines = [

 succeeded in 344ms:
263-# Edge-TTS audio synthesis helpers (async -> sync bridge)
264-# ---------------------------------------------------------------------------
265-
266-
267:def _edge_tts_fail(text: str, voice: str, detail: str | BaseException) -> bool:
268-    """Record and log an edge-tts failure while preserving the bool API."""
269-    global _LAST_EDGE_TTS_ERROR
270-
271-    summary = f"{type(detail).__name__}: {detail}" if isinstance(detail, BaseException) else detail
--
283-        )
284-    return False
285-
286-
287:def _edge_tts_synthesize(text: str, voice: str, output_path: Path) -> bool:
288-    """Synthesize a single phrase with edge-tts and save as WAV at 16kHz.
289-
290-    Returns True on success, False on failure.
291-    """
292:    import asyncio
293-    import io
294-    import tempfile
295-
296-    global _LAST_EDGE_TTS_ERROR
297-    _LAST_EDGE_TTS_ERROR = None
298-
299-    try:
300:        import edge_tts
301-    except ImportError as exc:
302-        message = "edge-tts is not installed. Install with: pip install edge-tts"
303-        if exc:
304-            message = f"{message} ({type(exc).__name__}: {exc})"
305:        return _edge_tts_fail(
306-            text,
307-            voice,
308-            message,
309-        )
310-
311-    async def _synth():
312:        communicate = edge_tts.Communicate(text, voice)
313-        mp3_buf = io.BytesIO()
314-        async for chunk in communicate.stream():
315-            if chunk["type"] == "audio":
316-                mp3_buf.write(chunk["data"])
--
318-
319-    try:
320-        # Run the async synthesis
321-        try:
322:            loop = asyncio.get_event_loop()
323-            if loop.is_running():
324-                import concurrent.futures
325-
326-                with concurrent.futures.ThreadPoolExecutor() as pool:
327:                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
328-            else:
329-                mp3_data = loop.run_until_complete(_synth())
330-        except RuntimeError:
331:            mp3_data = asyncio.run(_synth())
332-    except Exception as exc:
333:        return _edge_tts_fail(text, voice, exc)
334-
335-    if not mp3_data or len(mp3_data) < 100:
336:        return _edge_tts_fail(
337-            text,
338-            voice,
339-            f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
340-        )
--
399-        conversion_errors.append(f"torchaudio decode failed: {type(exc).__name__}: {exc}")
400-    finally:
401-        Path(tmp_path).unlink(missing_ok=True)
402-
403:    return _edge_tts_fail(
404-        text,
405-        voice,
406-        "MP3-to-WAV conversion failed; " + "; ".join(conversion_errors),
407-    )
--
453-    except Exception:
454-        return False
455-
456-
457:def _generate_tts_positives(
458-    wake_word: str,
459-    output_dir: Path,
460-    verbose: bool = True,
461-) -> list[Path]:
--
523-                    clean_path,
524-                    engine=kokoro_engine,
525-                )
526-            else:
527:                ok = _edge_tts_synthesize(phrase, voice, clean_path)
528-                if not ok and _ensure_kokoro_ready():
529-                    kokoro_voice = kokoro_voices[voice_idx % len(kokoro_voices)]
530-                    ok = _kokoro_tts_synthesize(
531-                        phrase,
--
573-
574-    return generated
575-
576-
577:def _generate_confusable_negatives(
578-    wake_word: str,
579-    output_dir: Path,
580-    n_confusables: int = 30,
581-    voices_per_word: int = 10,
--
612-            if out_path.exists():
613-                generated.append(out_path)
614-                continue
615-
616:            ok = _edge_tts_synthesize(word, voice, out_path)
617-            if ok and out_path.exists():
618-                generated.append(out_path)
619-
620-        if verbose and (word_idx + 1) % 10 == 0:
--
634-
635-    return generated
636-
637-
638:def _generate_speech_negatives(
639-    output_dir: Path,
640-    n_voices: int = 5,
641-    verbose: bool = True,
642-) -> list[Path]:
--
661-            if out_path.exists():
662-                generated.append(out_path)
663-                continue
664-
665:            ok = _edge_tts_synthesize(phrase, voice, out_path)
666-            if ok and out_path.exists():
667-                generated.append(out_path)
668-
669-        if verbose and (phrase_idx + 1) % 25 == 0:
--
1707-        if verbose:
1708-            print(f"  Generating {len(quality_phrases)} speech phrases for quality check...")
1709-        for i, phrase in enumerate(quality_phrases):
1710-            out_path = quality_dir / f"qc_speech_{i:03d}.wav"
1711:            ok = _edge_tts_synthesize(phrase, voice, out_path)
1712-            if ok and out_path.exists():
1713-                speech_files.append(out_path)
1714-
1715-        raw_confusables = generate_confusables(wake_word, count=40)
--
1732-            print(f"  Generating {len(confusable_words)} confusable words for quality check...")
1733-        for i, word in enumerate(confusable_words):
1734-            safe_word = word.replace(" ", "_")[:30]
1735-            out_path = quality_dir / f"qc_confusable_{i:03d}_{safe_word}.wav"
1736:            ok = _edge_tts_synthesize(word, voice, out_path)
1737-            if ok and out_path.exists():
1738-                confusable_files.append(out_path)
1739-
1740-        silence_audio = np.zeros(16000 * 10, dtype=np.float32)

 succeeded in 354ms:
89-            "status": "running",
90-            "progress": 2.0,
91-            "epoch": 0,
92-            "total_epochs": epochs,
93-            "train_loss": 0.0,
94-            "val_loss": 0.0,
95:            "message": "Loaded %s recordings. Generating TTS corpus..." % wav_count,
96-            "error": None,
97-        })
98-
99:        # -- Production pipeline: full auto-corpus (matching CLI train) --
100-        from violawake_sdk.tools.train import (
101:            _generate_confusable_negatives,
102:            _generate_speech_negatives,
103:            _generate_tts_positives,
104-            _train_temporal_cnn,
105-        )
106-
107-        neg_temp_dir = Path(tempfile.mkdtemp(prefix="violawake_neg_", dir=str(settings.tmp_dir)))
108-
109-        # Auto-generate TTS positives when user has <100 samples (production behavior)
110-        # Keep track of user-provided files so augmentation targets only real recordings
111-        user_pos_files = list(pos_files)
112-        if len(pos_files) < 100:
113-            tts_pos_dir = neg_temp_dir / "tts_positives"
114-            try:
115:                tts_pos_files = _generate_tts_positives(
116-                    wake_word,
117-                    tts_pos_dir,
118-                    verbose=False,
119-                )
120-                if tts_pos_files:
121-                    pos_files = list(pos_files) + tts_pos_files
--
138-                "total_epochs": epochs,
139-                "train_loss": 0.0,
140-                "val_loss": 0.0,
141-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
142-                "error": None,
143-            })
144:        neg_tag_map: dict[str, list[Path]] = {}
145-
146:        # Source 1: User/paid-tier corpus negatives
147-        if negatives_dir and negatives_dir.exists():
148-            user_neg = sorted(
149-                list(negatives_dir.rglob("*.wav")) + list(negatives_dir.rglob("*.flac"))
150-            )
151-            if user_neg:
152:                neg_tag_map["neg_user"] = user_neg
153:                logger.info("Loaded %s corpus negatives for job %s", len(user_neg), job_id)
154-
155-        _ensure_not_cancelled()
156-
157:        # Source 2: Auto-generated confusable negatives (phonetically similar)
158-        # Two rounds matching CLI production pipeline:
159:        #   Round 1: 30 confusables x 10 voices (broad phonetic coverage)
160:        #   Round 2: 16 confusables x 10 voices (tight variants for hard negatives)
161:        confusable_dir_r1 = neg_temp_dir / "confusables_r1"
162-        try:
163:            confusable_r1 = _generate_confusable_negatives(
164-                wake_word,
165:                confusable_dir_r1,
166:                n_confusables=30,
167-                voices_per_word=10,
168-                verbose=False,
169-            )
170:            if confusable_r1:
171:                neg_tag_map["neg_confusable_r1"] = confusable_r1
172-        except Exception as exc:
173-            logger.error(
174-                "Confusable round 1 FAILED for job %s: %s — "
175-                "model will have higher false positive rate on similar-sounding words",
176-                job_id, exc,
177-            )
178-
179-        _ensure_not_cancelled()
180-
181:        confusable_dir_r2 = neg_temp_dir / "confusables_r2"
182-        try:
183:            confusable_r2 = _generate_confusable_negatives(
184-                wake_word,
185:                confusable_dir_r2,
186:                n_confusables=16,
187-                voices_per_word=10,
188-                verbose=False,
189-            )
190:            if confusable_r2:
191:                neg_tag_map["neg_confusable_r2"] = confusable_r2
192-        except Exception as exc:
193-            logger.error(
194-                "Confusable round 2 FAILED for job %s: %s",
195-                job_id, exc,
196-            )
197-
--
200-            "status": "running",
201-            "progress": 4.0,
202-            "epoch": 0,
203-            "total_epochs": epochs,
204-            "train_loss": 0.0,
205-            "val_loss": 0.0,
206:            "message": "Generated confusables. Generating speech negatives...",
207-            "error": None,
208-        })
209-
210-        # Source 3: Auto-generated speech negatives (common phrases)
211-        # 5 voices matching CLI production pipeline (was 3)
212-        speech_dir = neg_temp_dir / "speech"
213-        try:
214:            speech_files = _generate_speech_negatives(
215-                speech_dir,
216-                n_voices=5,
217-                verbose=False,
218-            )
219-            if speech_files:
220:                neg_tag_map["neg_speech"] = speech_files
221-        except Exception as exc:
222-            logger.error(
223-                "Speech neg generation FAILED for job %s: %s — "
224-                "model will have higher false positive rate on general speech",
225-                job_id, exc,
226-            )
227-
228-        _ensure_not_cancelled()
229-
230:        # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
231-        _CORPUS_SEARCH_PATHS = [
232:            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
233:            Path.home() / ".violawake" / "corpus",
234:            Path("corpus"),
235-        ]
236-        _CORPUS_SUBDIRS: dict[str, tuple[str, ...]] = {
237-            "neg_librispeech": ("librispeech",),
238-            "neg_musan_speech": ("musan/musan/speech", "musan/speech"),
239-            "neg_musan_music": ("musan/musan/music", "musan/music"),
240-            "neg_musan_noise": ("musan/musan/noise", "musan/noise"),
241-        }
242-        _rng = random.Random(42)
243-        for tag, subdirs in _CORPUS_SUBDIRS.items():
244:            for corpus_root in _CORPUS_SEARCH_PATHS:
245:                if not corpus_root.exists():
246-                    continue
247-                for subdir in subdirs:
248:                    candidate = corpus_root / subdir
249-                    if candidate.exists():
250:                        corpus_files = sorted(
251-                            list(candidate.rglob("*.wav")) + list(candidate.rglob("*.flac"))
252-                        )
253:                        if corpus_files:
254:                            if len(corpus_files) > 2000:
255:                                corpus_files = sorted(_rng.sample(corpus_files, 2000))
256:                            neg_tag_map[tag] = corpus_files
257-                            break
258:                if tag in neg_tag_map:
259-                    break
260-
261-        all_neg_files: list[Path] = []
262:        for files in neg_tag_map.values():
263-            all_neg_files.extend(files)
264-
265:        total_neg = len(all_neg_files)
266:        if total_neg < 5:
267-            raise RuntimeError(
268-                "Only %s negative files generated. "
269:                "edge-tts may not be installed or network unavailable." % total_neg
270-            )
271-
272-        progress_callback({
273-            "status": "running",
274-            "progress": 8.0,
275-            "epoch": 0,
276-            "total_epochs": epochs,
277-            "train_loss": 0.0,
278-            "val_loss": 0.0,
279:            "message": "Corpus ready: %s pos, %s neg. Training TemporalCNN..." % (len(pos_files), total_neg),
280-            "error": None,
281-        })
282-
283-        started_at = time.monotonic()
284-
285-        def _on_epoch(info: dict[str, Any]) -> None:
--
315-            wake_word=wake_word,
316-            epochs=epochs,
317-            augment=True,
318-            eval_dir=None,
319-            verbose=True,
320-            progress_callback=_on_epoch,
321:            neg_tags=neg_tag_map,
322-            tmp_dir=settings.tmp_dir,
323-            augment_source_files=user_pos_files,
324-        )
325-
326-        _ensure_not_cancelled()
327-        if not output_path.exists():

 succeeded in 372ms:
docs\DEPLOYMENT.md-14-                                 ▼  (git push origin master — manual)
docs\DEPLOYMENT.md-15-                          (no auto-deploy)
docs\DEPLOYMENT.md-16-                                 │
docs\DEPLOYMENT.md-17-            ┌────────────────────┴────────────────────┐
docs\DEPLOYMENT.md-18-            ▼                                         ▼
docs\DEPLOYMENT.md:19:  Frontend deploy (manual)                Backend deploy (manual)
docs\DEPLOYMENT.md-20-  ────────────────────────                ─────────────────────
docs\DEPLOYMENT.md-21-  cd console/frontend                     cd /                          
docs\DEPLOYMENT.md-22-  VITE_API_URL=… npm run build            docker compose build backend
docs\DEPLOYMENT.md-23-  wrangler pages deploy dist              docker compose up -d backend
docs\DEPLOYMENT.md-24-            │                                         │
--
docs\DEPLOYMENT.md-45-- **Separate Postgres in the same compose stack** — ViolaWake's data is isolated from NOVVIOLA's data and from any other project. Volume `pgdata` survives container recreations.
docs\DEPLOYMENT.md-46-- **Why not auto-deploy?** GitHub Actions don't have access to the local Docker host (the backend lives there) and Cloudflare Pages was set up without git auto-build. Both are intentional during the operator-launch phase to keep the deploy gate human.
docs\DEPLOYMENT.md-47-
docs\DEPLOYMENT.md-48----
docs\DEPLOYMENT.md-49-
docs\DEPLOYMENT.md:50:## Backend deploy (api.violawake.com)
docs\DEPLOYMENT.md-51-
docs\DEPLOYMENT.md-52-### Prerequisites
docs\DEPLOYMENT.md-53-
docs\DEPLOYMENT.md-54-- Docker Desktop running on the host machine (currently `75.86.16.150`)
docs\DEPLOYMENT.md-55-- `.env.production` exists at the repo root with the required env vars (see `Required env vars` below). This file is git-ignored — never commit secrets.
--
docs\DEPLOYMENT.md-191-- View: dash.cloudflare.com → Zero Trust → Networks → Tunnels → `violawake-api` → Public Hostname
docs\DEPLOYMENT.md-192-- Modify: same UI, or POST to the Cloudflare API `/accounts/{id}/cfd_tunnel/{tunnel_id}/configurations`
docs\DEPLOYMENT.md-193-
docs\DEPLOYMENT.md-194-### Common tunnel issues
docs\DEPLOYMENT.md-195-
docs\DEPLOYMENT.md:196:- **Backend container restart kills health for ~30s** — the tunnel keeps connections open and reconnects automatically; tunnel reconfiguration is not needed.
docs\DEPLOYMENT.md-197-- **Tunnel container down (wakeword-tunnel-1 stopped)** — `api.violawake.com` returns Cloudflare 1033 / 530. Fix: `docker compose up -d tunnel`.
docs\DEPLOYMENT.md-198-- **Tunnel token expired** — replace `CLOUDFLARE_TUNNEL_TOKEN` in `.env.production`, restart the tunnel container.
docs\DEPLOYMENT.md-199-
docs\DEPLOYMENT.md-200----
docs\DEPLOYMENT.md-201-
--
docs\DEPLOYMENT.md-242-- [ ] You know how to roll back
docs\DEPLOYMENT.md-243-
docs\DEPLOYMENT.md-244-## Post-deploy verification (run after every deploy)
docs\DEPLOYMENT.md-245-
docs\DEPLOYMENT.md-246-```bash
docs\DEPLOYMENT.md:247:# Backend
docs\DEPLOYMENT.md-248-curl -sS -o /dev/null -w "health: %{http_code}\n" https://api.violawake.com/api/health
docs\DEPLOYMENT.md-249-curl -sS https://api.violawake.com/openapi.json | python -c "import sys,json;d=json.load(sys.stdin);print('routes:',len(d['paths']))"
docs\DEPLOYMENT.md-250-
docs\DEPLOYMENT.md-251-# Frontend
docs\DEPLOYMENT.md-252-curl -sS https://violawake.com/ | grep -oE '/assets/index-[A-Za-z0-9]+\.js'
--
docker-compose.production.yml-1-services:
docker-compose.production.yml-2-  postgres:
docker-compose.production.yml-3-    image: postgres:16-alpine
docker-compose.production.yml-4-    restart: always
docker-compose.production.yml:5:    environment:
docker-compose.production.yml-6-      POSTGRES_DB: violawake
docker-compose.production.yml-7-      POSTGRES_USER: violawake
docker-compose.production.yml-8-      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
docker-compose.production.yml-9-    volumes:
docker-compose.production.yml-10-      - pgdata:/var/lib/postgresql/data
--
docker-compose.production.yml-23-    expose:
docker-compose.production.yml-24-      - "8000"
docker-compose.production.yml-25-    volumes:
docker-compose.production.yml-26-      - backend-data:/app/data
docker-compose.production.yml-27-      - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
docker-compose.production.yml:28:      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
docker-compose.production.yml:29:      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
docker-compose.production.yml-30-      # edge-tts negative generation falls short. Without this mount, training
docker-compose.production.yml-31-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
docker-compose.production.yml:32:      - ./corpus:/app/corpus:ro
docker-compose.production.yml-33-    env_file:
docker-compose.production.yml-34-      - .env.production
docker-compose.production.yml:35:    environment:
docker-compose.production.yml-36-      - VIOLAWAKE_DB_URL=postgresql+asyncpg://violawake:${POSTGRES_PASSWORD}@postgres:5432/violawake
docker-compose.production.yml-37-      - VIOLAWAKE_NEGATIVES_CORPUS_DIR=/app/negatives
docker-compose.production.yml-38-    depends_on:
docker-compose.production.yml-39-      postgres:
docker-compose.production.yml-40-        condition: service_healthy

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Get-Content src\\violawake_sdk\\tools\\train.py -TotalCount 720" in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Get-Content console\\backend\\app\\services\\training_service.py -TotalCount 340" in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Get-Content docs\\DEPLOYMENT.md -TotalCount 130" in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'Get-Content docker-compose.production.yml -TotalCount 50' in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 388ms:
services:
  postgres:
    image: postgres:16-alpine
    restart: always
    environment:
      POSTGRES_DB: violawake
      POSTGRES_USER: violawake
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U violawake"]
      interval: 5s
      timeout: 3s
      retries: 5
    shm_size: 256mb

  backend:
    build:
      context: .
      dockerfile: console/Dockerfile.backend
    restart: always
    expose:
      - "8000"
    volumes:
      - backend-data:/app/data
      - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
      # edge-tts negative generation falls short. Without this mount, training
      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
      - ./corpus:/app/corpus:ro
    env_file:
      - .env.production
    environment:
      - VIOLAWAKE_DB_URL=postgresql+asyncpg://violawake:${POSTGRES_PASSWORD}@postgres:5432/violawake
      - VIOLAWAKE_NEGATIVES_CORPUS_DIR=/app/negatives
    depends_on:
      postgres:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/health"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  frontend:
    build:
      context: .

 succeeded in 389ms:
"""Training pipeline helpers used by the async job queue."""

from __future__ import annotations

import json
import logging
import random
import shutil
import tempfile
import time
from collections.abc import Callable
from dataclasses import dataclass
from pathlib import Path
from typing import Any

from app.config import settings
from app.monitoring import log_exception
from app.storage import get_storage

logger = logging.getLogger("violawake.training")


class TrainingCancelledError(RuntimeError):
    """Raised when a running training job is cancelled."""


@dataclass(slots=True)
class TrainingArtifact:
    """Artifacts produced by a completed training run."""

    local_path: Path
    config_json: str | None
    config_bytes: bytes | None
    d_prime: float | None
    size_bytes: int


def run_training_job_sync(
    *,
    job_id: int,
    wake_word: str,
    recording_identifiers: list[str],
    output_path: Path,
    epochs: int,
    timeout_seconds: int,
    progress_callback: Callable[[dict[str, Any]], None],
    is_cancelled: Callable[[], bool],
    negatives_dir: Path | None = None,
) -> TrainingArtifact:
    """Run the ViolaWake SDK training pipeline synchronously."""
    positives_dir: Path | None = None
    neg_temp_dir: Path | None = None
    storage = get_storage()

    def _ensure_not_cancelled() -> None:
        if is_cancelled():
            raise TrainingCancelledError("Training cancelled by user")

    try:
        _ensure_not_cancelled()
        progress_callback({
            "status": "running",
            "progress": 0.0,
            "epoch": 0,
            "total_epochs": epochs,
            "train_loss": 0.0,
            "val_loss": 0.0,
            "message": "Preparing training data...",
            "error": None,
        })

        positives_dir = Path(tempfile.mkdtemp(prefix="violawake_train_", dir=str(settings.tmp_dir)))
        for index, recording_identifier in enumerate(recording_identifiers):
            _ensure_not_cancelled()
            if not storage.exists(recording_identifier):
                logger.warning("Recording %s was missing for training job %s", recording_identifier, job_id)
                continue

            dst = positives_dir / f"sample_{index:04d}.wav"
            dst.write_bytes(storage.download(recording_identifier))

        wav_count = len(list(positives_dir.glob("*.wav")))
        if wav_count < 5:
            raise RuntimeError("Only %s valid WAV files found. Need at least 5." % wav_count)

        pos_files = sorted(positives_dir.glob("*.wav"))

        progress_callback({
            "status": "running",
            "progress": 2.0,
            "epoch": 0,
            "total_epochs": epochs,
            "train_loss": 0.0,
            "val_loss": 0.0,
            "message": "Loaded %s recordings. Generating TTS corpus..." % wav_count,
            "error": None,
        })

        # -- Production pipeline: full auto-corpus (matching CLI train) --
        from violawake_sdk.tools.train import (
            _generate_confusable_negatives,
            _generate_speech_negatives,
            _generate_tts_positives,
            _train_temporal_cnn,
        )

        neg_temp_dir = Path(tempfile.mkdtemp(prefix="violawake_neg_", dir=str(settings.tmp_dir)))

        # Auto-generate TTS positives when user has <100 samples (production behavior)
        # Keep track of user-provided files so augmentation targets only real recordings
        user_pos_files = list(pos_files)
        if len(pos_files) < 100:
            tts_pos_dir = neg_temp_dir / "tts_positives"
            try:
                tts_pos_files = _generate_tts_positives(
                    wake_word,
                    tts_pos_dir,
                    verbose=False,
                )
                if tts_pos_files:
                    pos_files = list(pos_files) + tts_pos_files
                    logger.info(
                        "Generated %s TTS positives for job %s (total: %s)",
                        len(tts_pos_files), job_id, len(pos_files),
                    )
            except Exception as exc:
                logger.error(
                    "TTS positive generation FAILED for job %s: %s — "
                    "model quality will be degraded without TTS diversity",
                    job_id, exc,
                )

            _ensure_not_cancelled()
            progress_callback({
                "status": "running",
                "progress": 3.0,
                "epoch": 0,
                "total_epochs": epochs,
                "train_loss": 0.0,
                "val_loss": 0.0,
                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
                "error": None,
            })
        neg_tag_map: dict[str, list[Path]] = {}

        # Source 1: User/paid-tier corpus negatives
        if negatives_dir and negatives_dir.exists():
            user_neg = sorted(
                list(negatives_dir.rglob("*.wav")) + list(negatives_dir.rglob("*.flac"))
            )
            if user_neg:
                neg_tag_map["neg_user"] = user_neg
                logger.info("Loaded %s corpus negatives for job %s", len(user_neg), job_id)

        _ensure_not_cancelled()

        # Source 2: Auto-generated confusable negatives (phonetically similar)
        # Two rounds matching CLI production pipeline:
        #   Round 1: 30 confusables x 10 voices (broad phonetic coverage)
        #   Round 2: 16 confusables x 10 voices (tight variants for hard negatives)
        confusable_dir_r1 = neg_temp_dir / "confusables_r1"
        try:
            confusable_r1 = _generate_confusable_negatives(
                wake_word,
                confusable_dir_r1,
                n_confusables=30,
                voices_per_word=10,
                verbose=False,
            )
            if confusable_r1:
                neg_tag_map["neg_confusable_r1"] = confusable_r1
        except Exception as exc:
            logger.error(
                "Confusable round 1 FAILED for job %s: %s — "
                "model will have higher false positive rate on similar-sounding words",
                job_id, exc,
            )

        _ensure_not_cancelled()

        confusable_dir_r2 = neg_temp_dir / "confusables_r2"
        try:
            confusable_r2 = _generate_confusable_negatives(
                wake_word,
                confusable_dir_r2,
                n_confusables=16,
                voices_per_word=10,
                verbose=False,
            )
            if confusable_r2:
                neg_tag_map["neg_confusable_r2"] = confusable_r2
        except Exception as exc:
            logger.error(
                "Confusable round 2 FAILED for job %s: %s",
                job_id, exc,
            )

        _ensure_not_cancelled()
        progress_callback({
            "status": "running",
            "progress": 4.0,
            "epoch": 0,
            "total_epochs": epochs,
            "train_loss": 0.0,
            "val_loss": 0.0,
            "message": "Generated confusables. Generating speech negatives...",
            "error": None,
        })

        # Source 3: Auto-generated speech negatives (common phrases)
        # 5 voices matching CLI production pipeline (was 3)
        speech_dir = neg_temp_dir / "speech"
        try:
            speech_files = _generate_speech_negatives(
                speech_dir,
                n_voices=5,
                verbose=False,
            )
            if speech_files:
                neg_tag_map["neg_speech"] = speech_files
        except Exception as exc:
            logger.error(
                "Speech neg generation FAILED for job %s: %s — "
                "model will have higher false positive rate on general speech",
                job_id, exc,
            )

        _ensure_not_cancelled()

        # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
        _CORPUS_SEARCH_PATHS = [
            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
            Path.home() / ".violawake" / "corpus",
            Path("corpus"),
        ]
        _CORPUS_SUBDIRS: dict[str, tuple[str, ...]] = {
            "neg_librispeech": ("librispeech",),
            "neg_musan_speech": ("musan/musan/speech", "musan/speech"),
            "neg_musan_music": ("musan/musan/music", "musan/music"),
            "neg_musan_noise": ("musan/musan/noise", "musan/noise"),
        }
        _rng = random.Random(42)
        for tag, subdirs in _CORPUS_SUBDIRS.items():
            for corpus_root in _CORPUS_SEARCH_PATHS:
                if not corpus_root.exists():
                    continue
                for subdir in subdirs:
                    candidate = corpus_root / subdir
                    if candidate.exists():
                        corpus_files = sorted(
                            list(candidate.rglob("*.wav")) + list(candidate.rglob("*.flac"))
                        )
                        if corpus_files:
                            if len(corpus_files) > 2000:
                                corpus_files = sorted(_rng.sample(corpus_files, 2000))
                            neg_tag_map[tag] = corpus_files
                            break
                if tag in neg_tag_map:
                    break

        all_neg_files: list[Path] = []
        for files in neg_tag_map.values():
            all_neg_files.extend(files)

        total_neg = len(all_neg_files)
        if total_neg < 5:
            raise RuntimeError(
                "Only %s negative files generated. "
                "edge-tts may not be installed or network unavailable." % total_neg
            )

        progress_callback({
            "status": "running",
            "progress": 8.0,
            "epoch": 0,
            "total_epochs": epochs,
            "train_loss": 0.0,
            "val_loss": 0.0,
            "message": "Corpus ready: %s pos, %s neg. Training TemporalCNN..." % (len(pos_files), total_neg),
            "error": None,
        })

        started_at = time.monotonic()

        def _on_epoch(info: dict[str, Any]) -> None:
            _ensure_not_cancelled()
            elapsed = time.monotonic() - started_at
            if elapsed > timeout_seconds:
                raise RuntimeError(
                    "Training job timed out after %ss (%s minutes)"
                    % (timeout_seconds, timeout_seconds // 60)
                )

            epoch = int(info.get("epoch", 0))
            total_epochs = int(info.get("total_epochs", epochs)) or epochs
            train_loss = float(info.get("train_loss", 0.0))
            val_loss = float(info.get("val_loss", 0.0))
            progress = min(10.0 + 85.0 * (epoch / total_epochs), 95.0)

            progress_callback({
                "status": "running",
                "progress": round(progress, 2),
                "epoch": epoch,
                "total_epochs": total_epochs,
                "train_loss": round(train_loss, 6),
                "val_loss": round(val_loss, 6),
                "message": "Epoch %s/%s - loss: %.4f" % (epoch, total_epochs, train_loss),
                "error": None,
            })

        _train_temporal_cnn(
            pos_files=pos_files,
            neg_files=all_neg_files,
            output_path=output_path,
            wake_word=wake_word,
            epochs=epochs,
            augment=True,
            eval_dir=None,
            verbose=True,
            progress_callback=_on_epoch,
            neg_tags=neg_tag_map,
            tmp_dir=settings.tmp_dir,
            augment_source_files=user_pos_files,
        )

        _ensure_not_cancelled()
        if not output_path.exists():
            raise RuntimeError("Training completed but no model file was produced")

        config_path = output_path.with_suffix(".config.json")
        config_json: str | None = None
        config_bytes: bytes | None = None
        d_prime_value: float | None = None
        if config_path.exists():
            config_bytes = config_path.read_bytes()
            config_data = json.loads(config_bytes.decode("utf-8"))
            config_json = json.dumps(config_data)
            if isinstance(config_data, dict):
                raw_d_prime = config_data.get("d_prime")
                if isinstance(raw_d_prime, (int, float)):
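
Note on the Source 4 block in the dump above: the corpus lookup is written inline, and it silently yields nothing when no search path matches. A minimal sketch of the same lookup as a standalone helper that fails loudly might look like the following. The names `load_corpus_negatives`, `CorpusMissingError` and `_first_nonempty` are hypothetical and do not exist in the repo; the tag names, subdirectory candidates, the 2000-file cap and the seed of 42 are copied from the dump.

```python
from __future__ import annotations

import random
from pathlib import Path

# Tag -> candidate subdirectories, copied from the Source 4 block.
CORPUS_SUBDIRS: dict[str, tuple[str, ...]] = {
    "neg_librispeech": ("librispeech",),
    "neg_musan_speech": ("musan/musan/speech", "musan/speech"),
    "neg_musan_music": ("musan/musan/music", "musan/music"),
    "neg_musan_noise": ("musan/musan/noise", "musan/noise"),
}


class CorpusMissingError(RuntimeError):
    """Hypothetical error: no corpus audio was found in any search path."""


def _first_nonempty(root: Path, subdirs: tuple[str, ...]) -> list[Path]:
    """Return sorted audio files from the first subdir under root that has any."""
    for subdir in subdirs:
        candidate = root / subdir
        if candidate.is_dir():
            files = sorted([*candidate.rglob("*.wav"), *candidate.rglob("*.flac")])
            if files:
                return files
    return []


def load_corpus_negatives(
    search_paths: list[Path],
    *,
    max_per_tag: int = 2000,
    seed: int = 42,
) -> dict[str, list[Path]]:
    """Build a tag -> files map; raise instead of returning an empty map."""
    rng = random.Random(seed)
    tag_map: dict[str, list[Path]] = {}
    for tag, subdirs in CORPUS_SUBDIRS.items():
        for root in search_paths:
            files = _first_nonempty(root, subdirs)
            if not files:
                continue
            if len(files) > max_per_tag:
                # Seeded subsample so repeated runs pick the same files.
                files = sorted(rng.sample(files, max_per_tag))
            tag_map[tag] = files
            break  # first search path that has this tag wins
    if not tag_map:
        searched = ", ".join(str(p) for p in search_paths)
        raise CorpusMissingError(f"No corpus audio found under: {searched}")
    return tag_map
```

Raising a dedicated error when every tag comes back empty is one way to turn a missing `./corpus` mount into an actionable message rather than the generic "Only 0 negative files generated" failure. Whether a partially present corpus (for example LibriSpeech without MUSAN) should also be an error is a policy choice this sketch leaves open.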

 succeeded in 424ms:
"""
violawake-train CLI -- Train a custom wake word model.

Entry point: ``violawake-train`` (declared in pyproject.toml).

Architecture: TemporalCNN classifier head on top of frozen OpenWakeWord (OWW)
audio embeddings. Operates on 9-frame windows of 96-dim OWW embeddings (~25K
params). This is the same architecture as the production Viola model.

Training pipeline:
  - Auto-generates TTS positives if user provides fewer than 100 samples
  - Auto-generates confusable negatives (phonetically similar words)
  - Auto-generates speech negatives (common phrases via TTS)
  - FocalLoss for class imbalance handling
  - AdamW optimizer with cosine annealing LR schedule
  - Exponential Moving Average (EMA) of model weights
  - 80/20 group-aware train/validation split with early stopping
  - Post-training quality gate (speech FP check)

Data pipeline (matches production golden path):
  A. Positives: user-provided + auto-TTS (edge-tts, 20 voices x 3 phrases x 3 conditions)
  B. Confusable negatives round 1: 30 phonetically similar words x 10 voices
  C. Confusable negatives round 2: 16 tighter variants x 10 voices
  D. Speech negatives: common phrases via TTS (100+ phrases x 5 voices)
  E. Shared universal corpus: LibriSpeech, MUSAN speech/music/noise (auto-discovered)
  F. User-provided negatives via --negatives directory (if any)

Usage::

    violawake-train \\
      --word "jarvis" \\
      --positives data/jarvis/positives/ \\
      --output models/jarvis.onnx \\
      --epochs 80

    # With real negative samples:
    violawake-train \\
      --word "jarvis" \\
      --positives data/jarvis/positives/ \\
      --negatives data/jarvis/negatives/ \\
      --output models/jarvis.onnx

    # Legacy MLP mode:
    violawake-train \\
      --word "jarvis" \\
      --positives data/jarvis/positives/ \\
      --output models/jarvis.onnx \\
      --architecture mlp

Minimum: 5 positive samples (auto-TTS fills to ~200). Recommended: 50+.
"""

from __future__ import annotations

import argparse
import json
import logging
import math
import os
import shutil
import sys
import time
from collections.abc import Callable
from pathlib import Path
from random import Random
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    import numpy as np

logger = logging.getLogger(__name__)

# Module-level temp directory override. When set, all tempfile operations use
# this instead of the OS default (which may be on a small system drive).
# Set by _train_temporal_cnn() via its tmp_dir parameter.
_TMP_DIR: str | None = None
_LAST_EDGE_TTS_ERROR: str | None = None
_REPORTED_EDGE_TTS_ERRORS: set[str] = set()

# ---------------------------------------------------------------------------
# Edge-TTS voice pool for diverse positive and negative generation
# ---------------------------------------------------------------------------

EDGE_TTS_VOICES = [
    "en-US-GuyNeural",
    "en-US-JennyNeural",
    "en-US-AriaNeural",
    "en-US-DavisNeural",
    "en-US-AmberNeural",
    "en-US-AnaNeural",
    "en-US-AndrewNeural",
    "en-US-BrandonNeural",
    "en-US-ChristopherNeural",
    "en-US-CoraNeural",
    "en-US-ElizabethNeural",
    "en-US-EricNeural",
    "en-US-JacobNeural",
    "en-US-MichelleNeural",
    "en-US-MonicaNeural",
    "en-US-RogerNeural",
    "en-US-SteffanNeural",
    "en-GB-SoniaNeural",
    "en-GB-RyanNeural",
    "en-AU-NatashaNeural",
]

# Common phrases for speech negative generation
SPEECH_NEGATIVE_PHRASES = [
    "what time is it",
    "play some music",
    "turn off the lights",
    "set an alarm for seven",
    "how is the weather today",
    "call mom",
    "send a message",
    "open the door",
    "good morning",
    "good night",
    "thank you very much",
    "what is the news",
    "order a pizza",
    "find a restaurant",
    "navigate to home",
    "play the next song",
    "stop the music",
    "volume up",
    "volume down",
    "skip this track",
    "read my emails",
    "remind me tomorrow",
    "schedule a meeting",
    "take a note",
    "search the web",
    "tell me a joke",
    "translate hello to spanish",
    "what is the temperature",
    "start a timer",
    "cancel the alarm",
    "shuffle my playlist",
    "add to shopping list",
    "where is the nearest store",
    "how far is the airport",
    "book a flight",
    "check my calendar",
    "do not disturb",
    "answer the phone",
    "reject the call",
    "pair bluetooth",
    "connect to wifi",
    "take a screenshot",
    "lock the screen",
    "battery level",
    "airplane mode on",
    "increase brightness",
    "mute the microphone",
    "record a video",
    "scan this document",
    "convert dollars to euros",
    "the quick brown fox jumps over the lazy dog",
    "once upon a time in a land far far away",
    "i need to go to the grocery store",
    "can you help me with something",
    "that sounds like a great idea",
    "i am not sure about that",
    "let me think about it for a moment",
    "we should probably leave soon",
    "have you seen my keys anywhere",
    "it is raining outside right now",
    "i will be there in five minutes",
    "please close the window",
    "the meeting starts at three",
    "dinner is almost ready",
    "happy birthday to you",
    "excuse me could you repeat that",
    "nice to meet you",
    "see you later",
    "how much does it cost",
    "where did you put it",
    "i forgot my password",
    "the package arrived today",
    "she said hello yesterday",
    "they went to the park",
    "he is coming home soon",
    "we are running late",
    "it was a beautiful day",
    "the cat sat on the mat",
    "please pass the salt",
    "i love this song",
    "turn left at the corner",
    "the train departs at noon",
    "water the plants please",
    "feed the dog",
    "empty the dishwasher",
    "check the mailbox",
    "pick up the groceries",
    "wash the car tomorrow",
    "vacuum the living room",
    "fold the laundry",
    "take out the trash",
    "clean the kitchen",
    "organize the closet",
    "paint the bedroom",
    "fix the leaky faucet",
    "mow the lawn this weekend",
    "trim the hedges",
    "shovel the driveway",
    "water the garden",
    "prune the roses",
    "rake the leaves",
]

ProgressCallback = Callable[[dict[str, Any]], None]


# ---------------------------------------------------------------------------
# Utility: ONNX runtime provider auto-detection
# ---------------------------------------------------------------------------


def get_best_provider(device: str | None = None) -> str:
    """Auto-detect the best ONNX Runtime execution provider.

    Priority order: CUDA > DirectML > CPU.

    Args:
        device: Optional manual override. One of "cuda", "directml", "cpu",
            or a full provider name like "CUDAExecutionProvider".

    Returns:
        An ONNX Runtime execution provider string.
    """
    import onnxruntime as ort

    if device is not None:
        _SHORTHAND = {
            "cuda": "CUDAExecutionProvider",
            "directml": "DmlExecutionProvider",
            "dml": "DmlExecutionProvider",
            "cpu": "CPUExecutionProvider",
        }
        provider = _SHORTHAND.get(device.lower(), device)
        available = ort.get_available_providers()
        if provider in available:
            return provider
        print(
            f"WARNING: Requested provider '{provider}' not available "
            f"(have: {available}). Falling back to auto-detection.",
            file=sys.stderr,
        )

    available = ort.get_available_providers()
    for provider in [
        "CUDAExecutionProvider",
        "DmlExecutionProvider",
        "CPUExecutionProvider",
    ]:
        if provider in available:
            return provider
    return "CPUExecutionProvider"


# ---------------------------------------------------------------------------
# Edge-TTS audio synthesis helpers (async -> sync bridge)
# ---------------------------------------------------------------------------


def _edge_tts_fail(text: str, voice: str, detail: str | BaseException) -> bool:
    """Record and log an edge-tts failure while preserving the bool API."""
    global _LAST_EDGE_TTS_ERROR

    summary = f"{type(detail).__name__}: {detail}" if isinstance(detail, BaseException) else detail
    _LAST_EDGE_TTS_ERROR = summary

    # A missing decoder causes hundreds of identical per-sample failures. Log
    # the actual exception once, then the generator summary logs the zero count.
    if summary not in _REPORTED_EDGE_TTS_ERRORS:
        _REPORTED_EDGE_TTS_ERRORS.add(summary)
        logger.error(
            "edge-tts synthesis failed for voice %s text %.80r: %s",
            voice,
            text,
            summary,
        )
    return False


def _edge_tts_synthesize(text: str, voice: str, output_path: Path) -> bool:
    """Synthesize a single phrase with edge-tts and save as WAV at 16kHz.

    Returns True on success, False on failure.
    """
    import asyncio
    import io
    import tempfile

    global _LAST_EDGE_TTS_ERROR
    _LAST_EDGE_TTS_ERROR = None

    try:
        import edge_tts
    except ImportError as exc:
        message = "edge-tts is not installed. Install with: pip install edge-tts"
        if exc:
            message = f"{message} ({type(exc).__name__}: {exc})"
        return _edge_tts_fail(
            text,
            voice,
            message,
        )

    async def _synth():
        communicate = edge_tts.Communicate(text, voice)
        mp3_buf = io.BytesIO()
        async for chunk in communicate.stream():
            if chunk["type"] == "audio":
                mp3_buf.write(chunk["data"])
        return mp3_buf.getvalue()

    try:
        # Run the async synthesis
        try:
            loop = asyncio.get_event_loop()
            if loop.is_running():
                import concurrent.futures

                with concurrent.futures.ThreadPoolExecutor() as pool:
                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
            else:
                mp3_data = loop.run_until_complete(_synth())
        except RuntimeError:
            mp3_data = asyncio.run(_synth())
    except Exception as exc:
        return _edge_tts_fail(text, voice, exc)

    if not mp3_data or len(mp3_data) < 100:
        return _edge_tts_fail(
            text,
            voice,
            f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
        )

    conversion_errors: list[str] = []

    # First try libsndfile via soundfile. The backend image already gets this
    # through the training stack, and it avoids a hard ffmpeg dependency.
    try:
        import numpy as np
        import soundfile as sf

        audio, sr = sf.read(io.BytesIO(mp3_data), dtype="float32")
        audio = np.asarray(audio, dtype=np.float32)
        if audio.size == 0:
            raise RuntimeError("decoded MP3 contained no audio samples")
        if audio.ndim > 1:
            audio = audio.mean(axis=1)
        if sr != 16000:
            audio = _resample_audio(audio, sr, 16000)
        _save_wav(audio, output_path, 16000)
        return output_path.exists() and output_path.stat().st_size > 44
    except ImportError as exc:
        conversion_errors.append(f"soundfile unavailable: {type(exc).__name__}: {exc}")
    except Exception as exc:
        conversion_errors.append(f"soundfile decode failed: {type(exc).__name__}: {exc}")

    # Fallback: pydub with ffmpeg/ffprobe when available.
    try:
        from pydub import AudioSegment

        seg = AudioSegment.from_mp3(io.BytesIO(mp3_data))
        seg = seg.set_channels(1).set_frame_rate(16000).set_sample_width(2)
        seg.export(str(output_path), format="wav")
        return output_path.exists() and output_path.stat().st_size > 44
    except ImportError as exc:
        conversion_errors.append(f"pydub unavailable: {type(exc).__name__}: {exc}")
    except Exception as exc:
        conversion_errors.append(f"pydub decode failed: {type(exc).__name__}: {exc}")

    # Fallback: write MP3 to temp, load with torchaudio.
    tmp_fd, tmp_path = tempfile.mkstemp(suffix=".mp3", dir=_TMP_DIR)
    try:
        os.write(tmp_fd, mp3_data)
    finally:
        os.close(tmp_fd)
    os.chmod(tmp_path, 0o600)

    try:
        import torchaudio

        waveform, sr = torchaudio.load(tmp_path)
        if waveform.shape[0] > 1:
            waveform = waveform.mean(dim=0, keepdim=True)
        if sr != 16000:
            waveform = torchaudio.transforms.Resample(sr, 16000)(waveform)
        torchaudio.save(str(output_path), waveform, 16000)
        return output_path.exists() and output_path.stat().st_size > 44
    except ImportError as exc:
        conversion_errors.append(f"torchaudio unavailable: {type(exc).__name__}: {exc}")
    except Exception as exc:
        conversion_errors.append(f"torchaudio decode failed: {type(exc).__name__}: {exc}")
    finally:
        Path(tmp_path).unlink(missing_ok=True)

    return _edge_tts_fail(
        text,
        voice,
        "MP3-to-WAV conversion failed; " + "; ".join(conversion_errors),
    )


def _resample_audio(audio: np.ndarray, source_rate: int, target_rate: int) -> np.ndarray:
    """Resample mono audio while keeping float32 output."""
    import numpy as np
    from scipy.signal import resample_poly

    if source_rate == target_rate:
        return np.asarray(audio, dtype=np.float32)

    gcd = math.gcd(source_rate, target_rate)
    up = target_rate // gcd
    down = source_rate // gcd
    return np.asarray(resample_poly(audio, up, down), dtype=np.float32)


def _kokoro_tts_synthesize(
    text: str,
    voice: str,
    output_path: Path,
    *,
    engine: Any | None = None,
) -> bool:
    """Synthesize a single phrase with Kokoro and save as WAV at 16kHz."""
    import numpy as np

    try:
        from violawake_sdk.tts import TTS_SAMPLE_RATE, TTSEngine
    except ImportError:
        return False

    try:
        kokoro_engine = engine
        if kokoro_engine is None:
            kokoro_engine = TTSEngine(voice=voice, sample_rate=TTS_SAMPLE_RATE)
        else:
            kokoro_engine.voice = voice

        audio = np.asarray(kokoro_engine.synthesize(text), dtype=np.float32)
        if audio.size == 0:
            return False
        if int(kokoro_engine.sample_rate) != 16000:
            audio = _resample_audio(audio, int(kokoro_engine.sample_rate), 16000)
        _save_wav(audio, output_path, sample_rate=16000)
        return True
    except Exception:
        return False


def _generate_tts_positives(
    wake_word: str,
    output_dir: Path,
    verbose: bool = True,
) -> list[Path]:
    """Generate diverse TTS positive samples using Edge TTS with Kokoro fallback.

    Produces: 20 voices x 3 phrases (WORD, hey WORD, ok WORD) = 60 clean files.
    Then augmentation (noisy + reverb) multiplies to ~180 total.

    Returns list of generated WAV file paths.
    """
    import numpy as np

    from violawake_sdk.training.augment import (
        rir_augment,
    )

    output_dir.mkdir(parents=True, exist_ok=True)
    phrases = [wake_word, f"hey {wake_word}", f"ok {wake_word}"]
    generated: list[Path] = []
    kokoro_fallback = False
    kokoro_engine: Any | None = None
    kokoro_voices: list[str] = []

    def _ensure_kokoro_ready() -> bool:
        nonlocal kokoro_fallback, kokoro_engine, kokoro_voices
        if kokoro_fallback:
            return kokoro_engine is not None and len(kokoro_voices) > 0
        try:
            from violawake_sdk.tts import AVAILABLE_VOICES, TTS_SAMPLE_RATE, TTSEngine
        except ImportError:
            return False

        print("Using Kokoro TTS for sample generation (Edge TTS unavailable)")
        kokoro_fallback = True
        kokoro_voices = list(AVAILABLE_VOICES)
        if not kokoro_voices:
            return False
        try:
            kokoro_engine = TTSEngine(
                voice=kokoro_voices[0],
                sample_rate=TTS_SAMPLE_RATE,
            )
        except Exception:
            kokoro_engine = None
        return kokoro_engine is not None

    if verbose:
        total = len(EDGE_TTS_VOICES) * len(phrases)
        print(
            f"  Generating TTS positives: {len(EDGE_TTS_VOICES)} voices x {len(phrases)} phrases = {total} clean samples..."
        )

    for voice_idx, voice in enumerate(EDGE_TTS_VOICES):
        for phrase_idx, phrase in enumerate(phrases):
            clean_path = output_dir / f"tts_pos_{voice_idx:02d}_{phrase_idx}_{voice}.wav"
            if clean_path.exists():
                generated.append(clean_path)
                continue

            if kokoro_fallback:
                kokoro_voice = kokoro_voices[voice_idx % len(kokoro_voices)]
                ok = _kokoro_tts_synthesize(
                    phrase,
                    kokoro_voice,
                    clean_path,
                    engine=kokoro_engine,
                )
            else:
                ok = _edge_tts_synthesize(phrase, voice, clean_path)
                if not ok and _ensure_kokoro_ready():
                    kokoro_voice = kokoro_voices[voice_idx % len(kokoro_voices)]
                    ok = _kokoro_tts_synthesize(
                        phrase,
                        kokoro_voice,
                        clean_path,
                        engine=kokoro_engine,
                    )
            if ok and clean_path.exists():
                generated.append(clean_path)

                # Generate noisy variant
                try:
                    from violawake_sdk.audio import load_audio
                    from violawake_sdk.training.augment import apply_additive_noise

                    audio = load_audio(clean_path)
                    if audio is not None and len(audio) > 0:
                        rng = np.random.default_rng(voice_idx * 100 + phrase_idx)

                        # Noisy variant (fixed SNR of 12 dB)
                        noisy = apply_additive_noise(audio, snr_db=12.0, rng=rng)
                        noisy_path = (
                            output_dir / f"tts_pos_{voice_idx:02d}_{phrase_idx}_{voice}_noisy.wav"
                        )
                        _save_wav(noisy, noisy_path)
                        generated.append(noisy_path)

                        # Reverb variant
                        reverbed = rir_augment(audio, rng=rng)
                        reverb_path = (
                            output_dir / f"tts_pos_{voice_idx:02d}_{phrase_idx}_{voice}_reverb.wav"
                        )
                        _save_wav(reverbed, reverb_path)
                        generated.append(reverb_path)
                except Exception:
                    pass  # Augmented variants are best-effort

        if verbose and (voice_idx + 1) % 5 == 0:
            print(
                f"    {voice_idx + 1}/{len(EDGE_TTS_VOICES)} voices done ({len(generated)} files)"
            )

    if verbose:
        print(f"  TTS positives generated: {len(generated)} files")

    return generated


def _generate_confusable_negatives(
    wake_word: str,
    output_dir: Path,
    n_confusables: int = 30,
    voices_per_word: int = 10,
    verbose: bool = True,
) -> list[Path]:
    """Generate confusable negative samples via TTS.

    Uses the confusables generator to find phonetically similar words,
    then synthesizes each with multiple TTS voices.

    Returns list of generated WAV file paths.
    """
    from violawake_sdk.tools.confusables import generate_confusables

    output_dir.mkdir(parents=True, exist_ok=True)
    confusable_words = generate_confusables(wake_word, count=n_confusables)

    if verbose:
        print(f"  Generated {len(confusable_words)} confusable words for '{wake_word}'")
        if confusable_words[:5]:
            print(f"    Top 5: {', '.join(confusable_words[:5])}")
        total = len(confusable_words) * voices_per_word
        print(
            f"  Synthesizing: {len(confusable_words)} words x {voices_per_word} voices = {total} samples..."
        )

    voices_subset = EDGE_TTS_VOICES[:voices_per_word]
    generated: list[Path] = []

    for word_idx, word in enumerate(confusable_words):
        for voice_idx, voice in enumerate(voices_subset):
            safe_word = word.replace(" ", "_")[:30]
            out_path = output_dir / f"confusable_{word_idx:03d}_{voice_idx}_{safe_word}.wav"
            if out_path.exists():
                generated.append(out_path)
                continue

            ok = _edge_tts_synthesize(word, voice, out_path)
            if ok and out_path.exists():
                generated.append(out_path)

        if verbose and (word_idx + 1) % 10 == 0:
            print(f"    {word_idx + 1}/{len(confusable_words)} words done ({len(generated)} files)")

    if not generated and confusable_words and voices_subset:
        logger.error(
            "edge-tts confusable negative generation produced 0 files for wake word %.80r "
            "after %s attempts; last error: %s",
            wake_word,
            len(confusable_words) * len(voices_subset),
            _LAST_EDGE_TTS_ERROR or "unknown",
        )

    if verbose:
        print(f"  Confusable negatives generated: {len(generated)} files")

    return generated


def _generate_speech_negatives(
    output_dir: Path,
    n_voices: int = 5,
    verbose: bool = True,
) -> list[Path]:
    """Generate speech negative samples via TTS using common phrases.

    Returns list of generated WAV file paths.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    voices_subset = EDGE_TTS_VOICES[:n_voices]
    generated: list[Path] = []

    total = len(SPEECH_NEGATIVE_PHRASES) * n_voices
    if verbose:
        print(
            f"  Generating speech negatives: {len(SPEECH_NEGATIVE_PHRASES)} phrases x {n_voices} voices = {total} samples..."
        )

    for phrase_idx, phrase in enumerate(SPEECH_NEGATIVE_PHRASES):
        for voice_idx, voice in enumerate(voices_subset):
            safe_phrase = phrase.replace(" ", "_")[:40]
            out_path = output_dir / f"speech_neg_{phrase_idx:03d}_{voice_idx}_{safe_phrase}.wav"
            if out_path.exists():
                generated.append(out_path)
                continue

            ok = _edge_tts_synthesize(phrase, voice, out_path)
            if ok and out_path.exists():
                generated.append(out_path)

        if verbose and (phrase_idx + 1) % 25 == 0:
            print(
                f"    {phrase_idx + 1}/{len(SPEECH_NEGATIVE_PHRASES)} phrases done ({len(generated)} files)"
            )

    if not generated and SPEECH_NEGATIVE_PHRASES and voices_subset:
        logger.error(
            "edge-tts speech negative generation produced 0 files after %s attempts; "
            "last error: %s",
            len(SPEECH_NEGATIVE_PHRASES) * len(voices_subset),
            _LAST_EDGE_TTS_ERROR or "unknown",
        )

    if verbose:
        print(f"  Speech negatives generated: {len(generated)} files")

    return generated


def _save_wav(audio: np.ndarray, path: Path, sample_rate: int = 16000) -> None:
    """Save float32 audio to a WAV file."""
    import wave

    import numpy as np

    audio = np.clip(audio, -1.0, 1.0)
    pcm_i16 = (audio * 32767).astype(np.int16)
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_i16.tobytes())


# ---------------------------------------------------------------------------
# Positive augmentation and temporal embedding helpers
# ---------------------------------------------------------------------------


def _augment_positives(
    raw_audio_arrays: list[np.ndarray],
    *,
    sample_rate: int = 16000,
    copies_per_clip: int = 21,
    seed: int = 42,
) -> list[np.ndarray]:
    """Augment positive clips with the roadmap audiomentations chain.

    This operates on raw waveform arrays before OWW embedding extraction and
    returns only augmented copies (the originals remain unchanged).
    """
    import numpy as np

 succeeded in 392ms:
# Deployment

How `https://violawake.com` and `https://api.violawake.com` actually run in production. **Pushing to GitHub does NOT auto-deploy anything**; both pieces are manual.

Last verified end-to-end: **2026-05-07**. If you're reading this and the architecture has changed, update this file with the date.

---

## Architecture

```
                    GitHub: GeeIHadAGoodTime/ViolaWake
                                 │
                                 ▼  (git push origin master, manual)
                          (no auto-deploy)
                                 │
            ┌────────────────────┴────────────────────┐
            ▼                                         ▼
  Frontend deploy (manual)                Backend deploy (manual)
  ────────────────────────                ───────────────────────
  cd console/frontend                     cd /
  VITE_API_URL=… npm run build            docker compose build backend
  wrangler pages deploy dist              docker compose up -d backend
            │                                         │
            ▼                                         ▼
  Cloudflare Pages                        Local Docker (your machine)
  project: violawake                      stack project: wakeword
  serves: violawake.com                     wakeword-backend-1   (uvicorn)
                                            wakeword-postgres-1  (data)
                                            wakeword-tunnel-1    (cloudflared)
                                                  │
                                                  ▼
                                        Cloudflare Tunnel
                                        violawake-api
                                        7dbef1da-74e3-4d7f-bba9-aad4a3e72150
                                                  │
                                                  ▼
                                        api.violawake.com (Cloudflare edge)
```

### Why this architecture

- **Cloudflare Pages for the frontend**: fast static hosting, free tier, instant rollback per deployment.
- **Local Docker + Cloudflare Tunnel for the backend**: keeps the host machine off the public internet, no fixed IP needed, no cloud-host bill while pre-revenue. When ready to scale, the same container image can be dropped onto Railway / Fly / a VPS without changing the application.
- **Separate Postgres in the same compose stack**: ViolaWake's data is isolated from NOVVIOLA's data and from any other project. Volume `pgdata` survives container recreations.
- **Why not auto-deploy?** GitHub Actions don't have access to the local Docker host (the backend lives there) and Cloudflare Pages was set up without git auto-build. Both are intentional during the operator-launch phase to keep the deploy gate human.

---

## Backend deploy (api.violawake.com)

### Prerequisites

- Docker Desktop running on the host machine (currently `75.86.16.150`)
- `.env.production` exists at the repo root with the required env vars (see `Required env vars` below). This file is git-ignored; never commit secrets.
- The cloudflared tunnel container has been bootstrapped once with `CLOUDFLARE_TUNNEL_TOKEN`. Subsequent deploys do not need to re-bootstrap it.

### Deploy

```bash
cd /j/CLAUDE/PROJECTS/Wakeword

# 1. Verify you're on the commit you want to deploy
git log --oneline -3

# 2. Build the new image from the current working tree
docker compose -f docker-compose.production.yml build backend

# 3. Recreate the running container with the new image
docker compose -f docker-compose.production.yml up -d backend

# 4. Watch healthcheck (typically <30s)
docker inspect wakeword-backend-1 --format='{{.State.Health.Status}}'
# expect: healthy

# 5. Verify the live API now reflects the new code
curl -sS https://api.violawake.com/api/health
curl -sS https://api.violawake.com/openapi.json | python -c "import sys,json;d=json.load(sys.stdin);print('routes:',len(d['paths']))"
```

The Cloudflare Tunnel container reconnects to the new backend automatically; no tunnel config change is needed.
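
Step 4 above can be scripted rather than re-run by hand. The following is a minimal sketch, assuming `docker` is on `PATH` and the container is named `wakeword-backend-1` as in the architecture diagram. The helper names (`docker_health_status`, `wait_until_healthy`) and the timeout defaults are hypothetical and not part of the repo.

```python
import subprocess
import time
from typing import Callable


def docker_health_status(container: str) -> str:
    """Return the Docker healthcheck status string for a container."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def wait_until_healthy(
    container: str = "wakeword-backend-1",  # name taken from the diagram above
    *,
    timeout_s: float = 60.0,   # assumed value; the doc only says "typically <30s"
    interval_s: float = 2.0,   # assumed polling interval
    get_status: Callable[[str], str] = docker_health_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health status until it is 'healthy' or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status(container) == "healthy":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_healthy()` after `docker compose ... up -d backend` returns `True` once the healthcheck passes and `False` if it never does within the timeout, which is the signal to consider the rollback steps below. The `get_status` and `sleep` parameters exist only so the loop can be exercised without a running Docker daemon.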

### Rollback

If the new container is unhealthy or breaks production, roll back to the previous image:

```bash
docker images wakeword-backend                # find the prior image SHA
docker compose -f docker-compose.production.yml stop backend
docker compose -f docker-compose.production.yml rm -f backend
docker tag wakeword-backend:<prior-sha> wakeword-backend:latest
docker compose -f docker-compose.production.yml up -d backend
```

### Database migrations

`docker compose up -d backend` runs `alembic upgrade head` via the entrypoint (see `console/backend/entrypoint.sh`). Migrations live in `console/backend/alembic/versions/`. Review the migration before deploying: additive column adds are safe, destructive ones (drop column, drop table, type change) require a maintenance window.
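
A quick static scan can support that review. The sketch below is illustrative only: it assumes each migration defines a module-level `upgrade()` function and calls Alembic operations as `op.<name>(...)`. The function name `destructive_ops_in_upgrade` is hypothetical. It flags `alter_column` conservatively because that call may or may not change a type, and it does not catch operations issued through a `batch_alter_table` context, so it supplements reading the migration rather than replacing it.

```python
import ast

# Alembic operations treated as potentially destructive in this sketch.
DESTRUCTIVE_OPS = {"drop_column", "drop_table", "alter_column"}


def destructive_ops_in_upgrade(migration_source: str) -> list[str]:
    """Return the sorted, de-duplicated destructive op names used in upgrade()."""
    tree = ast.parse(migration_source)
    found: set[str] = set()
    for node in tree.body:
        # Only inspect upgrade(); downgrade() routinely drops what upgrade() added.
        if not (isinstance(node, ast.FunctionDef) and node.name == "upgrade"):
            continue
        for call in ast.walk(node):
            if (
                isinstance(call, ast.Call)
                and isinstance(call.func, ast.Attribute)
                and isinstance(call.func.value, ast.Name)
                and call.func.value.id == "op"
                and call.func.attr in DESTRUCTIVE_OPS
            ):
                found.add(call.func.attr)
    return sorted(found)
```

An empty list means the scan found none of the listed operations in `upgrade()`. A non-empty list is a prompt to schedule a maintenance window or look more closely at the migration.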

### Required env vars (`.env.production`)

```
# Database (Postgres in the same compose stack)
POSTGRES_PASSWORD=...

# Cloudflare Tunnel auth (one-time bootstrap)
CLOUDFLARE_TUNNEL_TOKEN=...

# App
VIOLAWAKE_ENV=production
VIOLAWAKE_SECRET_KEY=...                  # JWT signing; rotate to invalidate all sessions
VIOLAWAKE_ADMIN_TOKEN=...                 # Admin endpoints; treat like a root key
VIOLAWAKE_CONSOLE_BASE_URL=https://violawake.com
VIOLAWAKE_CORS_ORIGINS=https://violawake.com,https://www.violawake.com
VIOLAWAKE_PORT=8000

# Stripe (test or live mode; match across all four)
VIOLAWAKE_STRIPE_SECRET_KEY=...
VIOLAWAKE_STRIPE_WEBHOOK_SECRET=...
VIOLAWAKE_STRIPE_PRICE_DEVELOPER=price_...
VIOLAWAKE_STRIPE_PRICE_BUSINESS=price_...

# Email (Resend): leave unset for auto-verify dev fallback
VIOLAWAKE_RESEND_API_KEY=...

# Limits / retention
VIOLAWAKE_MAX_CONCURRENT_JOBS=4
VIOLAWAKE_TRAINING_TIMEOUT=900
VIOLAWAKE_RECORDING_RETENTION_DAYS=30
VIOLAWAKE_MODEL_RETENTION_DAYS=90
VIOLAWAKE_POST_TRAINING_RETENTION_HOURS=72

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Get-Content tests\\unit\\test_cli.py -TotalCount 220" in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Get-Content tests\\unit\\test_train.py -TotalCount 220" in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "rg -n \"urlretrieve|download_model|download-corpus|pyproject|project.scripts|tools.download\" tests src\\violawake_sdk\\tools" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 341ms:
"""CLI entry-point smoke tests.

Covers every ``[project.scripts]`` entry point declared in pyproject.toml
plus the thin wrappers in ``violawake_sdk.cli``.

Strategy:
  - ``--help`` must exit 0 and print usage text (proves argparse is wired).
  - Missing required args must exit non-zero (proves validation works).
  - Valid args with mocked heavy deps must reach the delegation call
    (proves arg-parsing -> business-logic hand-off works).
"""

from __future__ import annotations

import subprocess
import sys
import textwrap
from pathlib import Path
from unittest import mock

import pytest


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

def _run_cli(module: str, args: list[str], *, timeout: int = 30) -> subprocess.CompletedProcess:
    """Run a CLI module via ``python -m`` and return the CompletedProcess."""
    return subprocess.run(
        [sys.executable, "-m", module, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# ===================================================================
# violawake-train  (tools.train:main  &  cli.train:main)
# ===================================================================

class TestTrainCLI:
    """Tests for the violawake-train entry point."""

    def test_help_exits_zero(self) -> None:
        result = _run_cli("violawake_sdk.tools.train", ["--help"])
        assert result.returncode == 0
        assert "violawake-train" in result.stdout
        assert "--word" in result.stdout
        assert "--positives" in result.stdout
        assert "--output" in result.stdout

    def test_missing_required_args_exits_nonzero(self) -> None:
        result = _run_cli("violawake_sdk.tools.train", [])
        assert result.returncode != 0
        assert "required" in result.stderr.lower() or "error" in result.stderr.lower()

    def test_missing_positives_dir_exits_1(self, tmp_path: Path) -> None:
        """--positives pointing to a non-existent dir must fail gracefully."""
        result = _run_cli("violawake_sdk.tools.train", [
            "--word", "test",
            "--positives", str(tmp_path / "nonexistent"),
            "--output", str(tmp_path / "out.onnx"),
        ])
        assert result.returncode == 1
        assert "not found" in result.stderr.lower() or "error" in result.stderr.lower()

    def test_valid_args_reach_training_function(self, tmp_path: Path) -> None:
        """With valid dirs, the CLI should reach the training function."""
        pos_dir = tmp_path / "positives"
        pos_dir.mkdir()
        output = tmp_path / "out.onnx"

        # Use --architecture mlp to hit the fast mockable path.
        # The default temporal_cnn runs TTS generation inline (slow).
        with mock.patch("violawake_sdk.tools.train._train_mlp_on_oww") as mock_train:
            result = _run_cli("violawake_sdk.tools.train", [
                "--word", "test",
                "--positives", str(pos_dir),
                "--output", str(output),
                "--epochs", "2",
                "--architecture", "mlp",
                "--quiet",
            ])
            # The mock is in a subprocess so it won't be captured here.
            # Instead we just verify the process didn't crash on argparse.
            # The real validation is that it gets past argument parsing.
            # A non-existent import or bad arg would cause returncode != 0.
            # Since the subprocess has its own mock context, we test via
            # direct function call below.

    def test_direct_call_reaches_train_function(self, tmp_path: Path) -> None:
        """Direct call to main() with mocked _train_mlp_on_oww (legacy MLP path)."""
        pos_dir = tmp_path / "positives"
        pos_dir.mkdir()
        output = tmp_path / "out.onnx"

        test_args = [
            "--word", "test",
            "--positives", str(pos_dir),
            "--output", str(output),
            "--epochs", "2",
            "--architecture", "mlp",
            "--quiet",
        ]
        with mock.patch("sys.argv", ["violawake-train", *test_args]):
            with mock.patch("violawake_sdk.tools.train._train_mlp_on_oww") as mock_train:
                from violawake_sdk.tools.train import main
                main()
                mock_train.assert_called_once()
                call_kwargs = mock_train.call_args
                assert call_kwargs[1]["epochs"] == 2 or call_kwargs.kwargs["epochs"] == 2

    def test_cli_wrapper_help(self) -> None:
        """The cli.train wrapper should also accept --help."""
        result = _run_cli("violawake_sdk.cli.train", ["--help"])
        assert result.returncode == 0
        assert "violawake-train" in result.stdout

    # ---------------------------------------------------------------
    # cli.train:main() - argument parsing and delegation tests
    # ---------------------------------------------------------------

    def test_cli_train_help_exits_zero(self) -> None:
        """cli.train --help must exit 0."""
        with mock.patch("sys.argv", ["violawake-train", "--help"]):
            with pytest.raises(SystemExit) as exc_info:
                from violawake_sdk.cli.train import main
                main()
            assert exc_info.value.code == 0

    def test_cli_train_missing_positive_dir_exits_nonzero(self, tmp_path: Path) -> None:
        """cli.train without --positive-dir must exit non-zero."""
        with mock.patch("sys.argv", ["violawake-train",
                                      "--output-model", str(tmp_path / "out.onnx")]):
            with pytest.raises(SystemExit) as exc_info:
                from violawake_sdk.cli.train import main
                main()
            assert exc_info.value.code != 0

    def test_cli_train_missing_output_model_exits_nonzero(self, tmp_path: Path) -> None:
        """cli.train without --output-model must exit non-zero."""
        pos_dir = tmp_path / "pos"
        pos_dir.mkdir()
        with mock.patch("sys.argv", ["violawake-train",
                                      "--positive-dir", str(pos_dir)]):
            with pytest.raises(SystemExit) as exc_info:
                from violawake_sdk.cli.train import main
                main()
            assert exc_info.value.code != 0

    def test_cli_train_nonexistent_positive_dir_exits_1(self, tmp_path: Path) -> None:
        """cli.train with a nonexistent --positive-dir must exit 1."""
        with mock.patch("sys.argv", ["violawake-train",
                                      "--positive-dir", str(tmp_path / "nope"),
                                      "--output-model", str(tmp_path / "out.onnx")]):
            with pytest.raises(SystemExit) as exc_info:
                from violawake_sdk.cli.train import main
                main()
            assert exc_info.value.code == 1

    def test_cli_train_delegates_with_defaults(self, tmp_path: Path) -> None:
        """cli.train with minimal valid args delegates to _train_mlp_on_oww with defaults."""
        pos_dir = tmp_path / "pos"
        pos_dir.mkdir()
        out = tmp_path / "out.onnx"
        with mock.patch("sys.argv", ["violawake-train",
                                      "--positive-dir", str(pos_dir),
                                      "--output-model", str(out)]):
            with mock.patch("violawake_sdk.cli.train._train_mlp_on_oww", create=True) as mock_train:
                # Patch the import inside main()
                with mock.patch.dict("sys.modules", {}):
                    import importlib
                    import violawake_sdk.cli.train as cli_train_mod
                    importlib.reload(cli_train_mod)
                    with mock.patch("violawake_sdk.tools.train._train_mlp_on_oww", mock_train):
                        cli_train_mod.main()
                        mock_train.assert_called_once()
                        kw = mock_train.call_args
                        # Check default epochs=50 and augment=True (no --no-augment)
                        assert kw.kwargs.get("epochs", kw[1].get("epochs")) == 50
                        assert kw.kwargs.get("augment", kw[1].get("augment")) is True

    def test_cli_train_delegates_with_all_flags(self, tmp_path: Path) -> None:
        """cli.train with all flags passes correct args to _train_mlp_on_oww."""
        pos_dir = tmp_path / "pos"
        pos_dir.mkdir()
        neg_dir = tmp_path / "neg"
        neg_dir.mkdir()
        out = tmp_path / "out.onnx"
        with mock.patch("sys.argv", ["violawake-train",
                                      "--positive-dir", str(pos_dir),
                                      "--negative-dir", str(neg_dir),
                                      "--output-model", str(out),
                                      "--epochs", "10",
                                      "--no-augment",
                                      "--quiet"]):
            with mock.patch("violawake_sdk.tools.train._train_mlp_on_oww") as mock_train:
                import importlib
                import violawake_sdk.cli.train as cli_train_mod
                importlib.reload(cli_train_mod)
                cli_train_mod.main()
                mock_train.assert_called_once()
                kw = mock_train.call_args
                assert kw.kwargs.get("epochs", kw[1].get("epochs")) == 10
                assert kw.kwargs.get("augment", kw[1].get("augment")) is False
                assert kw.kwargs.get("verbose", kw[1].get("verbose")) is False

    def test_negatives_dir_not_found_exits_1(self, tmp_path: Path) -> None:
        pos_dir = tmp_path / "positives"
        pos_dir.mkdir()
        test_args = [
            "--word", "test",
            "--positives", str(pos_dir),
            "--output", str(tmp_path / "out.onnx"),
            "--negatives", str(tmp_path / "nonexistent_neg"),
        ]
        with mock.patch("sys.argv", ["violawake-train", *test_args]):
            with pytest.raises(SystemExit) as exc_info:
                from violawake_sdk.tools.train import main

 succeeded in 340ms:
"""Targeted unit tests for violawake_sdk.tools.train CLI behavior."""

from __future__ import annotations

import json
import sys
from pathlib import Path
from types import ModuleType
from unittest.mock import MagicMock, patch

import pytest

from violawake_sdk.tools import train


def _touch_audio_files(directory: Path, count: int) -> None:
    directory.mkdir(parents=True, exist_ok=True)
    for idx in range(count):
        (directory / f"{idx:03d}.wav").write_bytes(b"wav")


def _path_exists_without_corpus(original_exists):
    def _exists(path: Path) -> bool:
        if "corpus" in {part.lower() for part in path.parts}:
            return False
        return original_exists(path)

    return _exists


class TestTrainHelpers:
    def test_held_out_count_keeps_at_least_one_training_file(self) -> None:
        assert train._held_out_count(0) == 0
        assert train._held_out_count(1) == 0
        assert train._held_out_count(2) == 1
        assert train._held_out_count(10) == 5

    def test_auto_eval_verdict_thresholds(self) -> None:
        assert train._auto_eval_verdict(9.9) == "GOOD (EER < 10%)"
        assert train._auto_eval_verdict(10.0) == "ACCEPTABLE (EER <= 15%)"
        assert train._auto_eval_verdict(20.0) == "WARNING (EER > 15%)"
        assert train._auto_eval_verdict(30.0) == "CRITICAL (EER > 25%)"

    def test_update_auto_eval_config_merges_existing_json(self, tmp_path: Path) -> None:
        config_path = tmp_path / "model.config.json"
        config_path.write_text(json.dumps({"wake_word": "viola"}), encoding="utf-8")

        train._update_auto_eval_config(config_path, {"status": "ok", "eer_percent": 12.5})

        saved = json.loads(config_path.read_text(encoding="utf-8"))
        assert saved["wake_word"] == "viola"
        assert saved["auto_eval"]["status"] == "ok"

    def test_edge_tts_synthesize_decodes_with_soundfile(self, tmp_path: Path) -> None:
        import numpy as np

        class FakeCommunicate:
            def __init__(self, text: str, voice: str) -> None:
                self.text = text
                self.voice = voice

            async def stream(self):
                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}

        edge_tts_module = ModuleType("edge_tts")
        edge_tts_module.Communicate = FakeCommunicate
        soundfile_module = ModuleType("soundfile")
        soundfile_module.read = MagicMock(
            return_value=(np.zeros(16000, dtype=np.float32), 16000)
        )

        out_path = tmp_path / "tts.wav"
        with patch.dict(
            sys.modules,
            {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
        ):
            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)

        assert out_path.stat().st_size > 44
        soundfile_module.read.assert_called_once()

    def test_confusable_generation_logs_zero_edge_tts_outputs(
        self, caplog: pytest.LogCaptureFixture, tmp_path: Path
    ) -> None:
        train._LAST_EDGE_TTS_ERROR = "pydub decode failed: missing ffprobe"

        with (
            caplog.at_level("ERROR", logger=train.logger.name),
            patch(
                "violawake_sdk.tools.confusables.generate_confusables",
                return_value=["violas"],
            ),
            patch("violawake_sdk.tools.train._edge_tts_synthesize", return_value=False),
        ):
            generated = train._generate_confusable_negatives(
                "viola",
                tmp_path,
                n_confusables=1,
                voices_per_word=1,
                verbose=False,
            )

        assert generated == []
        assert "edge-tts confusable negative generation produced 0 files" in caplog.text
        assert "missing ffprobe" in caplog.text


class TestTrainMainValidation:
    def test_main_exits_when_positives_dir_is_missing(
        self, capsys: pytest.CaptureFixture[str], tmp_path: Path
    ) -> None:
        output = tmp_path / "model.onnx"
        argv = [
            "violawake-train",
            "--word",
            "viola",
            "--positives",
            str(tmp_path / "missing"),
            "--output",
            str(output),
        ]

        with patch.object(sys, "argv", argv), pytest.raises(SystemExit, match="1"):
            train.main()

        assert "Positives directory not found" in capsys.readouterr().err

    def test_main_exits_when_negatives_dir_is_missing(
        self, capsys: pytest.CaptureFixture[str], tmp_path: Path
    ) -> None:
        positives = tmp_path / "positives"
        positives.mkdir()
        output = tmp_path / "model.onnx"
        argv = [
            "violawake-train",
            "--word",
            "viola",
            "--positives",
            str(positives),
            "--negatives",
            str(tmp_path / "missing-neg"),
            "--output",
            str(output),
        ]

        with patch.object(sys, "argv", argv), pytest.raises(SystemExit, match="1"):
            train.main()

        assert "Negatives directory not found" in capsys.readouterr().err

    def test_main_requires_positives_for_mlp_architecture(
        self, capsys: pytest.CaptureFixture[str], tmp_path: Path
    ) -> None:
        output = tmp_path / "model.onnx"
        argv = [
            "violawake-train",
            "--word",
            "viola",
            "--output",
            str(output),
            "--architecture",
            "mlp",
        ]

        with patch.object(sys, "argv", argv), pytest.raises(SystemExit, match="1"):
            train.main()

        assert "--positives is required for MLP architecture" in capsys.readouterr().err

    def test_main_temporal_exits_with_too_few_positive_files(
        self, capsys: pytest.CaptureFixture[str], tmp_path: Path
    ) -> None:
        positives = tmp_path / "positives"
        positives.mkdir()
        output = tmp_path / "model.onnx"
        argv = [
            "violawake-train",
            "--word",
            "viola",
            "--positives",
            str(positives),
            "--output",
            str(output),
            "--no-auto-corpus",
            "--quiet",
        ]

        with patch.object(sys, "argv", argv), pytest.raises(SystemExit, match="1"):
            train.main()

        assert "Provide at least 5 via --positives or enable --auto-corpus" in capsys.readouterr().err

    def test_main_temporal_exits_with_too_few_negative_files(
        self, capsys: pytest.CaptureFixture[str], tmp_path: Path
    ) -> None:
        positives = tmp_path / "positives"
        negatives = tmp_path / "negatives"
        _touch_audio_files(positives, 5)
        negatives.mkdir()
        output = tmp_path / "model.onnx"
        argv = [
            "violawake-train",
            "--word",
            "viola",
            "--positives",
            str(positives),
            "--negatives",
            str(negatives),
            "--output",
            str(output),
            "--no-auto-corpus",
            "--quiet",
        ]

        original_exists = Path.exists
        with (
            patch.object(sys, "argv", argv),
            patch("pathlib.Path.exists", new=_path_exists_without_corpus(original_exists)),
            pytest.raises(SystemExit, match="1"),
        ):

 succeeded in 690ms:
src\violawake_sdk\tools\collect_samples.py:4:Entry point: ``violawake-collect`` (declared in pyproject.toml).
tests\integration\test_feature_completeness.py:1462:    def test_download_model_function_exists(self):
tests\integration\test_feature_completeness.py:1463:        """download_model function exists with expected signature."""
tests\integration\test_feature_completeness.py:1464:        from violawake_sdk.models import download_model
tests\integration\test_feature_completeness.py:1466:        assert callable(download_model)
tests\integration\test_feature_completeness.py:1467:        sig = inspect.signature(download_model)
tests\integration\test_feature_completeness.py:1725:    """Verify all console_scripts in pyproject.toml resolve to real functions."""
tests\integration\test_feature_completeness.py:1731:        "violawake-download": "violawake_sdk.tools.download_model:main",
src\violawake_sdk\tools\download_model.py:4:Entry point: ``violawake-download`` (declared in pyproject.toml).
src\violawake_sdk\tools\download_model.py:67:    from violawake_sdk.models import MODEL_REGISTRY, download_model, list_cached_models
src\violawake_sdk\tools\download_model.py:109:            path = download_model(
tests\integration\test_sdk_surface.py:155:        "violawake_sdk.tools.download_model",
src\violawake_sdk\tools\generate_samples.py:4:Entry point: ``violawake-generate`` (declared in pyproject.toml).
tests\unit\test_cert_pinning.py:12:  - Integration with download_model (mocked network)
tests\unit\test_cert_pinning.py:721:# Integration: download_model with pin verification (mocked)
tests\unit\test_cert_pinning.py:725:    def test_download_model_uses_requests(self, tmp_path, _clean_pin_state):
tests\unit\test_cert_pinning.py:726:        """Verify that download_model uses requests.get for downloading."""
tests\unit\test_cert_pinning.py:746:                        models.download_model(
tests\unit\test_cert_pinning.py:754:    def test_download_model_cached_skips_download(self, tmp_path, _clean_pin_state):
tests\unit\test_cert_pinning.py:755:        """Verify that download_model skips download when file is cached."""
tests\unit\test_cert_pinning.py:765:                    result = models.download_model(
tests\unit\test_cli.py:3:Covers every ``[project.scripts]`` entry point declared in pyproject.toml
tests\unit\test_cli.py:325:# violawake-download  (tools.download_model:main  &  cli.download:main)
tests\unit\test_cli.py:332:        result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
tests\unit\test_cli.py:347:                    download_model=mock.MagicMock(),
tests\unit\test_cli.py:351:                from violawake_sdk.tools.download_model import main as dl_main
tests\unit\test_cli.py:354:                import violawake_sdk.tools.download_model as dl_mod
tests\unit\test_cli.py:366:                    download_model=mock.MagicMock(),
tests\unit\test_cli.py:371:                import violawake_sdk.tools.download_model as dl_mod
tests\unit\test_cli.py:384:                    download_model=mock.MagicMock(),
tests\unit\test_cli.py:389:                import violawake_sdk.tools.download_model as dl_mod
src\violawake_sdk\tools\train.py:4:Entry point: ``violawake-train`` (declared in pyproject.toml).
tests\live\RESULTS_2026-05-07.md:44:- **`pip install violawake[oww]` does NOT yield a working detector out of the box.** The user must additionally run `python -c "from openwakeword.utils import download_models; download_models()"` to fetch the OpenWakeWord backbone ONNX files (the openwakeword PyPI wheel doesn't bundle them). The README's quickstart should call this out.
tests\live\RESULTS_2026-05-07.md:51:| Is `pip install violawake` working? | ✅ Yes (v0.2.2 on PyPI, with `[oww]` extra + `download_models()`) |
tests\unit\test_models.py:20:    _auto_download_model,
tests\unit\test_models.py:25:    download_model,
tests\unit\test_models.py:283:            path = download_model("test_model", force=True, verify=True)
tests\unit\test_models.py:320:                download_model("test_model", force=True, verify=True)
tests\unit\test_models.py:381:            patch("violawake_sdk.models._auto_download_model", return_value=fake_model) as mock_dl,
tests\unit\test_models.py:407:            patch("violawake_sdk.models._auto_download_model") as mock_dl,
tests\unit\test_models.py:415:    """Test _auto_download_model() directly."""
tests\unit\test_models.py:442:            path = _auto_download_model("test_model", spec)
tests\unit\test_models.py:462:            _auto_download_model("placeholder_model", spec)
tests\unit\test_models.py:481:                _auto_download_model("fail_model", spec)
tests\unit\test_model_download.py:4:are needed. Tests the download_model() and _verify_sha256() functions.
tests\unit\test_model_download.py:19:    download_model,
tests\unit\test_model_download.py:91:            path = download_model("temporal_cnn", force=True, verify=True)
tests\unit\test_model_download.py:97:        """download_model returns cached path without re-downloading."""
tests\unit\test_model_download.py:122:            path = download_model("temporal_cnn", force=False, verify=True)
tests\unit\test_model_download.py:159:                download_model("temporal_cnn", force=True)
tests\unit\test_model_download.py:251:    """download_model with unknown model name raises KeyError."""
tests\unit\test_model_download.py:261:                download_model("nonexistent_model_xyz")
tests\unit\test_model_download.py:273:                download_model("oww_backbone")
tests\unit\test_model_download.py:277:# Placeholder hash guard in download_model
tests\unit\test_model_download.py:281:    """download_model refuses models with placeholder hashes unless skip_verify=True."""
tests\unit\test_model_download.py:301:                download_model("placeholder_model", force=True)
tests\unit\test_model_download.py:331:            path = download_model("temporal_cnn", force=True, verify=True, skip_verify=True)
tests\unit\test_model_download.py:361:            path = download_model("temporal_cnn", force=True, verify=True)

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Get-Content tests\\unit\\test_train.py | Select-Object -Skip 140 -First 180" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 303ms:
            str(tmp_path / "missing-neg"),
            "--output",
            str(output),
        ]

        with patch.object(sys, "argv", argv), pytest.raises(SystemExit, match="1"):
            train.main()

        assert "Negatives directory not found" in capsys.readouterr().err

    def test_main_requires_positives_for_mlp_architecture(
        self, capsys: pytest.CaptureFixture[str], tmp_path: Path
    ) -> None:
        output = tmp_path / "model.onnx"
        argv = [
            "violawake-train",
            "--word",
            "viola",
            "--output",
            str(output),
            "--architecture",
            "mlp",
        ]

        with patch.object(sys, "argv", argv), pytest.raises(SystemExit, match="1"):
            train.main()

        assert "--positives is required for MLP architecture" in capsys.readouterr().err

    def test_main_temporal_exits_with_too_few_positive_files(
        self, capsys: pytest.CaptureFixture[str], tmp_path: Path
    ) -> None:
        positives = tmp_path / "positives"
        positives.mkdir()
        output = tmp_path / "model.onnx"
        argv = [
            "violawake-train",
            "--word",
            "viola",
            "--positives",
            str(positives),
            "--output",
            str(output),
            "--no-auto-corpus",
            "--quiet",
        ]

        with patch.object(sys, "argv", argv), pytest.raises(SystemExit, match="1"):
            train.main()

        assert "Provide at least 5 via --positives or enable --auto-corpus" in capsys.readouterr().err

    def test_main_temporal_exits_with_too_few_negative_files(
        self, capsys: pytest.CaptureFixture[str], tmp_path: Path
    ) -> None:
        positives = tmp_path / "positives"
        negatives = tmp_path / "negatives"
        _touch_audio_files(positives, 5)
        negatives.mkdir()
        output = tmp_path / "model.onnx"
        argv = [
            "violawake-train",
            "--word",
            "viola",
            "--positives",
            str(positives),
            "--negatives",
            str(negatives),
            "--output",
            str(output),
            "--no-auto-corpus",
            "--quiet",
        ]

        original_exists = Path.exists
        with (
            patch.object(sys, "argv", argv),
            patch("pathlib.Path.exists", new=_path_exists_without_corpus(original_exists)),
            pytest.raises(SystemExit, match="1"),
        ):
            train.main()

        assert "Enable --auto-corpus or provide negatives via --negatives" in capsys.readouterr().err

    def test_main_parses_args_and_invokes_temporal_training(self, tmp_path: Path) -> None:
        positives = tmp_path / "positives"
        negatives = tmp_path / "negatives"
        eval_dir = tmp_path / "eval"
        (eval_dir / "positives").mkdir(parents=True)
        (eval_dir / "negatives").mkdir(parents=True)
        _touch_audio_files(positives, 6)
        _touch_audio_files(negatives, 6)
        output = tmp_path / "model.onnx"
        evaluate_module = ModuleType("violawake_sdk.tools.evaluate")
        evaluate_module.evaluate_onnx_model = MagicMock(
            return_value={
                "architecture": "temporal_cnn",
                "n_positives": 6,
                "n_negatives": 6,
                "eer_approx": 0.08,
                "roc_auc": 0.95,
                "optimal_far": 0.02,
                "optimal_frr": 0.03,
                "optimal_threshold": 0.82,
            }
        )
        argv = [
            "violawake-train",
            "--word",
            "viola",
            "--positives",
            str(positives),
            "--negatives",
            str(negatives),
            "--output",
            str(output),
            "--eval-dir",
            str(eval_dir),
            "--epochs",
            "12",
            "--batch-size",
            "32",
            "--lr",
            "0.002",
            "--patience",
            "4",
            "--no-auto-corpus",
            "--no-augment",
            "--quiet",
        ]

        original_exists = Path.exists
        with (
            patch.object(sys, "argv", argv),
            patch.dict(sys.modules, {"violawake_sdk.tools.evaluate": evaluate_module}),
            patch("pathlib.Path.exists", new=_path_exists_without_corpus(original_exists)),
            patch("violawake_sdk.tools.train._train_temporal_cnn") as train_temporal,
        ):
            train.main()

        train_temporal.assert_called_once()
        kwargs = train_temporal.call_args.kwargs
        assert kwargs["wake_word"] == "viola"
        assert kwargs["epochs"] == 12
        assert kwargs["batch_size"] == 32
        assert kwargs["lr"] == 0.002
        assert kwargs["patience"] == 4
        assert kwargs["augment"] is False
        assert kwargs["verbose"] is False
        assert len(kwargs["pos_files"]) == 6
        assert len(kwargs["neg_files"]) == 6
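
Note on the tests above: they patch `pathlib.Path.exists` with `_path_exists_without_corpus(original_exists)`, but the helper's body never appears in this output. The sketch below is only a guess at what such a helper could look like, assuming it hides any path with a `corpus` component and delegates everything else to the real method. The name `path_exists_without_corpus` and the matching rule are hypothetical, not taken from the repository.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable
from unittest.mock import patch


def path_exists_without_corpus(original_exists: Callable[..., bool]) -> Callable[..., bool]:
    """Build a Path.exists replacement that hides paths containing a 'corpus' component.

    Hypothetical stand-in for the test helper referenced above; the real
    implementation is not shown in this log.
    """

    def fake_exists(self: Path, *args: object, **kwargs: object) -> bool:
        if "corpus" in self.parts:
            return False  # pretend the shared corpus is not installed
        # Forward extra arguments (e.g. follow_symlinks on newer Pythons) unchanged.
        return original_exists(self, *args, **kwargs)

    return fake_exists


# Usage: the corpus directory is hidden only while the patch is active.
with tempfile.TemporaryDirectory() as tmp:
    corpus_dir = Path(tmp) / "corpus"
    corpus_dir.mkdir()
    original = Path.exists
    with patch("pathlib.Path.exists", new=path_exists_without_corpus(original)):
        hidden_while_patched = not corpus_dir.exists()
        parent_still_visible = Path(tmp).exists()
    print(hidden_while_patched, parent_still_visible, corpus_dir.exists())
```

A helper of this shape only affects `.exists()` checks, which is what the corpus discovery loop in `train.py` uses for `corpus_root` and `candidate`. It would not hide files from `rglob`.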

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "rg -n -C 10 \"_generate_speech_negatives|Speech negatives|neg_speech|auto-corpus|universal corpus|_CORPUS_SEARCH_PATHS\" src\\violawake_sdk\\tools\\train.py" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 223ms:
14-  - FocalLoss for class imbalance handling
15-  - AdamW optimizer with cosine annealing LR schedule
16-  - Exponential Moving Average (EMA) of model weights
17-  - 80/20 group-aware train/validation split with early stopping
18-  - Post-training quality gate (speech FP check)
19-
20-Data pipeline (matches production golden path):
21-  A. Positives: user-provided + auto-TTS (edge-tts, 20 voices x 3 phrases x 3 conditions)
22-  B. Confusable negatives round 1: 30 phonetically similar words x 10 voices
23-  C. Confusable negatives round 2: 16 tighter variants x 10 voices
24:  D. Speech negatives: common phrases via TTS (100+ phrases x 5 voices)
25:  E. Shared universal corpus: LibriSpeech, MUSAN speech/music/noise (auto-discovered)
26-  F. User-provided negatives via --negatives directory (if any)
27-
28-Usage::
29-
30-    violawake-train \\
31-      --word "jarvis" \\
32-      --positives data/jarvis/positives/ \\
33-      --output models/jarvis.onnx \\
34-      --epochs 80
35-
--
628-            len(confusable_words) * len(voices_subset),
629-            _LAST_EDGE_TTS_ERROR or "unknown",
630-        )
631-
632-    if verbose:
633-        print(f"  Confusable negatives generated: {len(generated)} files")
634-
635-    return generated
636-
637-
638:def _generate_speech_negatives(
639-    output_dir: Path,
640-    n_voices: int = 5,
641-    verbose: bool = True,
642-) -> list[Path]:
643-    """Generate speech negative samples via TTS using common phrases.
644-
645-    Returns list of generated WAV file paths.
646-    """
647-    output_dir.mkdir(parents=True, exist_ok=True)
648-    voices_subset = EDGE_TTS_VOICES[:n_voices]
--
673-
674-    if not generated and SPEECH_NEGATIVE_PHRASES and voices_subset:
675-        logger.error(
676-            "edge-tts speech negative generation produced 0 files after %s attempts; "
677-            "last error: %s",
678-            len(SPEECH_NEGATIVE_PHRASES) * len(voices_subset),
679-            _LAST_EDGE_TTS_ERROR or "unknown",
680-        )
681-
682-    if verbose:
683:        print(f"  Speech negatives generated: {len(generated)} files")
684-
685-    return generated
686-
687-
688-def _save_wav(audio: np.ndarray, path: Path, sample_rate: int = 16000) -> None:
689-    """Save float32 audio to a WAV file."""
690-    import wave
691-
692-    import numpy as np
693-
--
1504-    # customer-facing deploy — it would let unfit models ship.
1505-    skip_gate = os.environ.get("VIOLAWAKE_SKIP_QUALITY_GATE", "").lower() in ("1", "true", "yes")
1506-
1507-    if quality_grade == "F":
1508-        print(
1509-            "\n" + "!" * 72 + "\nQUALITY GATE FAILED: model is not ready for deployment.\n"
1510-            f"  Speech FP rate:     {quality_gate['speech_fp_rate'] * 100:.1f}%\n"
1511-            f"  Confusable FP rate: {quality_gate['confusable_fp_rate'] * 100:.1f}%\n"
1512-            f"  Silence max score:  {quality_gate['silence_max_score']:.2f}\n"
1513-            "Recommended fixes:\n"
1514:            "  - Add more diverse speech negatives via --negatives or keep --auto-corpus enabled.\n"
1515-            f"  - Expand confusable negatives for '{wake_word}' and retrain.\n"
1516-            "  - Audit mislabeled positives/negatives and remove noisy clips.\n"
1517-            "  - Raise the deployment threshold only after checking recall on eval data.\n"
1518-            + "!"
1519-            * 72
1520-        )
1521-        if skip_gate:
1522-            print(
1523-                "\n" + "*" * 72 + "\n"
1524-                "WARNING: VIOLAWAKE_SKIP_QUALITY_GATE=1 — exporting failing model anyway.\n"
--
2309-        help="Disable audio-level augmentation (TTS generation still runs)",
2310-    )
2311-    parser.add_argument(
2312-        "--architecture",
2313-        choices=["temporal_cnn", "mlp"],
2314-        default="temporal_cnn",
2315-        help="Model architecture (default: temporal_cnn). "
2316-        "'mlp' is the legacy single-frame architecture.",
2317-    )
2318-    parser.add_argument(
2319:        "--auto-corpus",
2320-        action="store_true",
2321-        default=True,
2322-        dest="auto_corpus",
2323-        help="Auto-generate TTS positives, confusables, and speech negatives (default: True)",
2324-    )
2325-    parser.add_argument(
2326:        "--no-auto-corpus",
2327-        action="store_false",
2328-        dest="auto_corpus",
2329-        help="Disable auto-generation of TTS corpus. Only use --positives and --negatives.",
2330-    )
2331-    parser.add_argument(
2332-        "--eval-dir",
2333-        metavar="DIR",
2334-        help="Optional test set directory for evaluation after training. "
2335-        "Must contain positives/ and negatives/ subdirectories.",
2336-    )
--
2438-            print(
2439-                f"\nStep 1a: Auto-generating TTS positives (have {len(user_pos_files)}, need ~100+)..."
2440-            )
2441-        tts_pos_dir = corpus_dir / "tts_positives"
2442-        tts_pos_files = _generate_tts_positives(args.word, tts_pos_dir, verbose=verbose)
2443-
2444-    all_pos_files = user_pos_files + tts_pos_files
2445-    if len(all_pos_files) < 5:
2446-        print(
2447-            f"ERROR: Only {len(all_pos_files)} positive files total. "
2448:            "Provide at least 5 via --positives or enable --auto-corpus.",
2449-            file=sys.stderr,
2450-        )
2451-        sys.exit(1)
2452-
2453-    if verbose:
2454-        print(
2455-            f"\nTotal positive files: {len(all_pos_files)} "
2456-            f"({len(user_pos_files)} user + {len(tts_pos_files)} TTS)"
2457-        )
2458-
--
2502-        if confusable_r2:
2503-            neg_tag_map["neg_confusable_r2"] = confusable_r2
2504-            confusable_files.extend(confusable_r2)
2505-
2506-    # Source 3: Auto-generated speech negatives
2507-    speech_neg_files: list[Path] = []
2508-    if args.auto_corpus:
2509-        if verbose:
2510-            print("\nStep 1c: Auto-generating speech negatives...")
2511-        speech_neg_dir = corpus_dir / "speech_negatives"
2512:        speech_neg_files = _generate_speech_negatives(
2513-            speech_neg_dir,
2514-            n_voices=5,
2515-            verbose=verbose,
2516-        )
2517-        if speech_neg_files:
2518:            neg_tag_map["neg_speech"] = speech_neg_files
2519-
2520-    # Source 4: Shared universal negative corpus (LibriSpeech, MUSAN, etc.)
2521-    # These are word-agnostic negatives that every wake word model needs.
2522-    # Without them, models only learn to distinguish the wake word from a
2523-    # tiny auto-generated set and false-trigger on any real-world speech.
2524:    _CORPUS_SEARCH_PATHS = [
2525-        Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
2526-        Path.home() / ".violawake" / "corpus",
2527-        Path("corpus"),
2528-    ]
2529-    _CORPUS_SUBDIRS = {
2530-        "neg_librispeech": "librispeech",
2531-        "neg_musan_speech": ("musan/musan/speech", "musan/speech"),
2532-        "neg_musan_music": ("musan/musan/music", "musan/music"),
2533-        "neg_musan_noise": ("musan/musan/noise", "musan/noise"),
2534-    }
2535-    for tag, subdirs in _CORPUS_SUBDIRS.items():
2536-        if isinstance(subdirs, str):
2537-            subdirs = (subdirs,)
2538:        for corpus_root in _CORPUS_SEARCH_PATHS:
2539-            if not corpus_root.exists():
2540-                continue
2541-            for subdir in subdirs:
2542-                candidate = corpus_root / subdir
2543-                if candidate.exists():
2544-                    corpus_files = sorted(
2545-                        list(candidate.rglob("*.wav")) + list(candidate.rglob("*.flac"))
2546-                    )
2547-                    if corpus_files:
2548-                        # Cap each source to avoid swamping the dataset
--
2587-        print("\nNOTE: Universal negative corpus is incomplete.")
2588-        print(f"Found {len(found_corpus_tags)}/{len(_CORPUS_SUBDIRS)} corpus sources; missing:")
2589-        for tag in missing_corpus_tags:
2590-            print(f"  {tag}: {corpus_paths[tag]}")
2591-        print("Add files to the paths above or provide negatives via --negatives <dir>.")
2592-
2593-    total_neg = sum(len(v) for v in neg_tag_map.values())
2594-    if total_neg < 5:
2595-        print(
2596-            f"ERROR: Only {total_neg} negative files total. "
2597:            "Enable --auto-corpus or provide negatives via --negatives.",
2598-            file=sys.stderr,
2599-        )
2600-        sys.exit(1)
2601-
2602-    if verbose:
2603-        print(f"\nTotal negative files: {total_neg}")
2604-        for tag, files in neg_tag_map.items():
2605-            print(f"  {tag}: {len(files)}")
2606-
2607-    # Flatten for the training function
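
Note on Source 4 above: the discovery loop is cut off at the per-source cap, so the sketch below restates it as a standalone function for reference. It assumes the first search root that yields files wins for each tag, and it uses a made-up cap (`max_files_per_tag=500`), since the real value is not visible in this output. The function name `discover_corpus` and its return shape are illustrative only.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Tag -> candidate subdirectories, mirroring _CORPUS_SUBDIRS in the listing above.
CORPUS_SUBDIRS: dict[str, tuple[str, ...]] = {
    "neg_librispeech": ("librispeech",),
    "neg_musan_speech": ("musan/musan/speech", "musan/speech"),
    "neg_musan_music": ("musan/musan/music", "musan/music"),
    "neg_musan_noise": ("musan/musan/noise", "musan/noise"),
}


def discover_corpus(
    search_roots: list[Path],
    max_files_per_tag: int = 500,  # hypothetical cap; the real value is not shown
) -> tuple[dict[str, list[Path]], list[str]]:
    """Return (files found per tag, tags with no files under any root)."""
    found: dict[str, list[Path]] = {}
    for tag, subdirs in CORPUS_SUBDIRS.items():
        for root in search_roots:
            for subdir in subdirs:
                candidate = root / subdir
                if not candidate.is_dir():
                    continue
                files = sorted(
                    list(candidate.rglob("*.wav")) + list(candidate.rglob("*.flac"))
                )
                if files:
                    found[tag] = files[:max_files_per_tag]
                    break
            if tag in found:
                break  # assumption: the first root that yields files wins
    missing = [tag for tag in CORPUS_SUBDIRS if tag not in found]
    return found, missing


# Usage: one populated source, three missing ones reported explicitly.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "librispeech" / "spk1").mkdir(parents=True)
    (root / "librispeech" / "spk1" / "a.flac").touch()
    found, missing = discover_corpus([root / "absent", root])
    print(sorted(found), missing)
```

Returning the missing tags alongside the found files lets the caller fail loudly, or print the "Universal negative corpus is incomplete" note, instead of silently training on fewer negatives.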

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "rg -n \"generate_speech_negatives|Generating speech negatives|edge-tts|No speech negatives|neg_speech|training_service|run_training_job_sync|Generated confusables\" console\\tests console\\backend\\tests tests" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 307ms:
tests\golden_path_test.py:4:Uses edge-tts to simulate 10 diverse user recordings.
tests\golden_path_test.py:44:    """Generate 10 simulated user recordings via edge-tts."""
tests\golden_path_test.py:110:        _generate_speech_negatives,
tests\golden_path_test.py:130:    speech = _generate_speech_negatives(
tests\golden_path_test.py:134:        neg_tag_map["neg_speech"] = speech
tests\golden_path_test.py:231:    print("\n=== Step 1: Generate 10 Recordings (edge-tts) ===")
tests\unit\test_train.py:104:        assert "edge-tts confusable negative generation produced 0 files" in caplog.text

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Get-ChildItem src\\violawake_sdk\\tools | Select-Object -ExpandProperty Name" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 372ms:
__pycache__
collect_samples.py
confusables.py
contamination_check.py
download_model.py
evaluate.py
expand_corpus.py
generate_samples.py
streaming_eval.py
test_confusables.py
train.py
__init__.py

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Get-Content tests\\integration\\test_feature_completeness.py | Select-Object -Skip 1710 -First 60" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 407ms:
    def test_base_classes_are_abstract(self):
        """BackendSession and InferenceBackend are abstract."""
        from violawake_sdk.backends.base import BackendSession, InferenceBackend

        assert inspect.isabstract(BackendSession)
        assert inspect.isabstract(InferenceBackend)


# ---------------------------------------------------------------------------
# CLI Entry Points
# ---------------------------------------------------------------------------


class TestCLIEntryPoints:
    """Verify all console_scripts in pyproject.toml resolve to real functions."""

    CONSOLE_SCRIPTS = {
        "violawake-train": "violawake_sdk.tools.train:main",
        "violawake-eval": "violawake_sdk.tools.evaluate:main",
        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
        "violawake-download": "violawake_sdk.tools.download_model:main",
        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
        "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
        "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
        "violawake-generate": "violawake_sdk.tools.generate_samples:main",
    }

    @pytest.mark.parametrize("script_name,entry_point", list(CONSOLE_SCRIPTS.items()))
    def test_entry_point_resolves(self, script_name, entry_point):
        """Console script entry point resolves to a real callable."""
        module_path, func_name = entry_point.rsplit(":", 1)
        mod = importlib.import_module(module_path)
        func = getattr(mod, func_name, None)
        assert func is not None, f"{script_name} -> {entry_point} did not resolve"
        assert callable(func), f"{script_name} -> {entry_point} is not callable"


# ---------------------------------------------------------------------------
# Console Backend
# ---------------------------------------------------------------------------


@pytest.mark.skipif(
    not _CONSOLE_BACKEND_IMPORTABLE,
    reason="Console backend not importable (missing deps or not on path)",
)
class TestConsoleRetention:
    """Verify retention cleanup functions exist with correct signatures."""

    def test_module_importable(self):
        """app.retention module is importable."""
        mod = importlib.import_module("app.retention")
        assert mod is not None

    def test_cleanup_expired_recordings_exists(self):
        """cleanup_expired_recordings is an async function."""
        from app.retention import cleanup_expired_recordings

        assert callable(cleanup_expired_recordings)

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "rg -n -C 4 \"WORKDIR|CMD|uvicorn|copy|corpus\" console\\Dockerfile.backend docker-compose.production.yml console\\backend\\entrypoint.sh" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 265ms:
docker-compose.production.yml-8-      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
docker-compose.production.yml-9-    volumes:
docker-compose.production.yml-10-      - pgdata:/var/lib/postgresql/data
docker-compose.production.yml-11-    healthcheck:
docker-compose.production.yml:12:      test: ["CMD-SHELL", "pg_isready -U violawake"]
docker-compose.production.yml-13-      interval: 5s
docker-compose.production.yml-14-      timeout: 3s
docker-compose.production.yml-15-      retries: 5
docker-compose.production.yml-16-    shm_size: 256mb
--
docker-compose.production.yml-24-      - "8000"
docker-compose.production.yml-25-    volumes:
docker-compose.production.yml-26-      - backend-data:/app/data
docker-compose.production.yml-27-      - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
docker-compose.production.yml:28:      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
docker-compose.production.yml:29:      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
docker-compose.production.yml-30-      # edge-tts negative generation falls short. Without this mount, training
docker-compose.production.yml-31-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
docker-compose.production.yml:32:      - ./corpus:/app/corpus:ro
docker-compose.production.yml-33-    env_file:
docker-compose.production.yml-34-      - .env.production
docker-compose.production.yml-35-    environment:
docker-compose.production.yml-36-      - VIOLAWAKE_DB_URL=postgresql+asyncpg://violawake:${POSTGRES_PASSWORD}@postgres:5432/violawake
--
docker-compose.production.yml-38-    depends_on:
docker-compose.production.yml-39-      postgres:
docker-compose.production.yml-40-        condition: service_healthy
docker-compose.production.yml-41-    healthcheck:
docker-compose.production.yml:42:      test: ["CMD", "curl", "-f", "http://localhost:8000/api/health"]
docker-compose.production.yml-43-      interval: 10s
docker-compose.production.yml-44-      timeout: 5s
docker-compose.production.yml-45-      retries: 5
docker-compose.production.yml-46-      start_period: 30s
--
console\Dockerfile.backend-1-FROM python:3.11-slim
console\Dockerfile.backend-2-
console\Dockerfile.backend:3:WORKDIR /app
console\Dockerfile.backend-4-
console\Dockerfile.backend-5-# Install system deps for audio processing + gosu for entrypoint
console\Dockerfile.backend-6-RUN apt-get update && apt-get install -y --no-install-recommends \
console\Dockerfile.backend-7-    portaudio19-dev curl gosu \

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Get-Content console\\Dockerfile.backend -TotalCount 140" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 231ms:
FROM python:3.11-slim

WORKDIR /app

# Install system deps for audio processing + gosu for entrypoint
RUN apt-get update && apt-get install -y --no-install-recommends \
    portaudio19-dev curl gosu \
    && rm -rf /var/lib/apt/lists/*

# Install Python deps + upgrade build tools
COPY console/backend/requirements.txt .
RUN pip install --no-cache-dir --upgrade pip hatchling && \
    pip install --no-cache-dir -r requirements.txt

# Install ViolaWake SDK
COPY pyproject.toml /sdk/
COPY README.md /sdk/
COPY LICENSE /sdk/
COPY src/ /sdk/src/
RUN pip install --no-cache-dir "/sdk[training]"

# Copy backend code
COPY console/backend/ .

# Create data directory and non-root user
RUN mkdir -p /app/data
RUN addgroup --system app && adduser --system --ingroup app app
RUN chown -R app:app /app /app/data
COPY console/backend/entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh

EXPOSE 8000

ENTRYPOINT ["/app/entrypoint.sh"]

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Get-Content console\\tests\\conftest.py -TotalCount 220" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 256ms:
"""Shared pytest setup for console backend tests.

Direct route-call tests bypass FastAPI's middleware and pass lightweight
request doubles instead of `starlette.requests.Request`. slowapi's
`@limiter.limit(...)` decorator validates the request type, so we disable
the limiter for the entire test suite. Real rate-limit behavior is covered
by integration tests against a running ASGI app, not these unit tests.
"""

from __future__ import annotations

import sys
from pathlib import Path

backend_dir = str(Path(__file__).resolve().parents[1] / "backend")
if backend_dir not in sys.path:
    sys.path.insert(0, backend_dir)

from app.rate_limit import limiter

limiter.enabled = False

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Get-Content tests\\unit\\test_cli.py | Select-Object -Skip 300 -First 120" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 298ms:
                        "precision": 0.98, "recall": 0.98, "f1": 0.98,
                    },
                    "tp_scores": [0.9] * 50,
                    "fp_scores": [0.1] * 200,
                }
                # The import inside main() uses a different path; we need to
                # patch at the point of import.
                with mock.patch.dict("sys.modules", {
                    "violawake_sdk.training.evaluate": mock.MagicMock(
                        evaluate_onnx_model=mock_eval
                    ),
                }):
                    from violawake_sdk.tools.evaluate import main
                    main()
                    mock_eval.assert_called_once()

    def test_cli_wrapper_help(self) -> None:
        """The cli.evaluate wrapper re-exports the same main."""
        result = _run_cli("violawake_sdk.cli.evaluate", ["--help"])
        assert result.returncode == 0
        assert "violawake-eval" in result.stdout


# ===================================================================
# violawake-download  (tools.download_model:main  &  cli.download:main)
# ===================================================================

class TestDownloadCLI:
    """Tests for the violawake-download entry point."""

    def test_help_exits_zero(self) -> None:
        result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
        assert result.returncode == 0
        assert "violawake-download" in result.stdout
        assert "--model" in result.stdout
        assert "--list" in result.stdout

    def test_list_models(self) -> None:
        """--list should print available models and exit 0."""
        fake_registry = {
            "test_model": mock.MagicMock(size_bytes=10_000_000, description="A test model"),
        }
        with mock.patch("sys.argv", ["violawake-download", "--list"]):
            with mock.patch.dict("sys.modules", {
                "violawake_sdk.models": mock.MagicMock(
                    MODEL_REGISTRY=fake_registry,
                    download_model=mock.MagicMock(),
                    list_cached_models=mock.MagicMock(return_value=[]),
                ),
            }):
                from violawake_sdk.tools.download_model import main as dl_main
                # Re-import to pick up the patched module
                import importlib
                import violawake_sdk.tools.download_model as dl_mod
                importlib.reload(dl_mod)
                dl_mod.main()
                # If it didn't raise, --list worked

    def test_list_cached_models(self) -> None:
        """--list-cached should print cached models and exit 0."""
        cached = [("test_model", Path("/tmp/test.onnx"), 10.0)]
        with mock.patch("sys.argv", ["violawake-download", "--list-cached"]):
            with mock.patch.dict("sys.modules", {
                "violawake_sdk.models": mock.MagicMock(
                    MODEL_REGISTRY={},
                    download_model=mock.MagicMock(),
                    list_cached_models=mock.MagicMock(return_value=cached),
                ),
            }):
                import importlib
                import violawake_sdk.tools.download_model as dl_mod
                importlib.reload(dl_mod)
                dl_mod.main()

    def test_unknown_model_exits_1(self) -> None:
        """Requesting an unknown model name should exit 1."""
        fake_registry = {
            "real_model": mock.MagicMock(size_bytes=10_000_000, description="A model"),
        }
        with mock.patch("sys.argv", ["violawake-download", "--model", "nonexistent"]):
            with mock.patch.dict("sys.modules", {
                "violawake_sdk.models": mock.MagicMock(
                    MODEL_REGISTRY=fake_registry,
                    download_model=mock.MagicMock(),
                    list_cached_models=mock.MagicMock(return_value=[]),
                ),
            }):
                import importlib
                import violawake_sdk.tools.download_model as dl_mod
                importlib.reload(dl_mod)
                with pytest.raises(SystemExit) as exc_info:
                    dl_mod.main()
                assert exc_info.value.code == 1

    def test_cli_wrapper_help(self) -> None:
        """The cli.download wrapper re-exports the same main."""
        result = _run_cli("violawake_sdk.cli.download", ["--help"])
        assert result.returncode == 0
        assert "violawake-download" in result.stdout


# ===================================================================
# violawake-collect  (tools.collect_samples:main)
# ===================================================================

class TestCollectCLI:
    """Tests for the violawake-collect entry point."""

    def test_help_exits_zero(self) -> None:
        result = _run_cli("violawake_sdk.tools.collect_samples", ["--help"])
        assert result.returncode == 0
        assert "violawake-collect" in result.stdout
        assert "--word" in result.stdout
        assert "--output" in result.stdout

    def test_missing_required_args_exits_nonzero(self) -> None:
        result = _run_cli("violawake_sdk.tools.collect_samples", [])
        assert result.returncode != 0
        assert "required" in result.stderr.lower() or "error" in result.stderr.lower()


codex
The service does contain the exact Source 3 Edge-TTS speech-negative block, and the Docker image runs with `WORKDIR /app`, so `/app/corpus` is the production mount that matters. I’m patching that path now and adding a small downloader CLI with a mocked test, so no network download happens in this session.
  ✓ Inspect repo scripts, training code, docs, and tests
  → Patch training corpus behavior and Edge-TTS retry handling
  • Add download-corpus CLI and mocked smoke test
  • Update deployment docs and compose comment
  • Run targeted validation and requested console tests
  • Stage explicit files and create logical commits
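
For reference, the corpus lookup order this plan targets can be sketched as below. This is a minimal illustration, not the service code: the helper names `candidate_corpus_dirs` and `first_existing` are hypothetical, and the repo layout `<root>/console/backend/app/services/<file>` is assumed from the file path being patched.

```python
from __future__ import annotations

from pathlib import Path


def candidate_corpus_dirs(service_file: Path) -> list[Path]:
    """Return corpus locations in lookup order, production mount first."""
    candidates = [Path("/app/corpus")]  # mount point under WORKDIR /app
    parents = service_file.resolve().parents
    if len(parents) > 4:
        # Assumed layout: <root>/console/backend/app/services/<file>,
        # so parents[4] is the repo root. Shallow paths skip this entry.
        candidates.append(parents[4] / "corpus")
    candidates.append(Path.home() / ".violawake" / "corpus")
    candidates.append(Path("corpus"))  # relative to the current working directory
    return candidates


def first_existing(candidates: list[Path]) -> Path | None:
    """Return the first candidate that is an existing directory, else None."""
    return next((path for path in candidates if path.is_dir()), None)
```

The length guard matters when the file sits close to the filesystem root, as it may inside a container, where indexing `parents[4]` directly would raise `IndexError`.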
apply patch
patch: completed
J:\CLAUDE\PROJECTS\Wakeword\console\backend\app\services\training_service.py
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake download-corpus`."
             )
 
         progress_callback({
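
The new gate in the last hunk can be read in isolation as the sketch below. It is a simplified stand-in, assuming only what the diff shows: the tag names and the error text are copied from the patch, while `require_speech_negatives`, the `minimum` parameter, and the `neg_confusable` tag in the usage lines are hypothetical names used for illustration.

```python
from __future__ import annotations

from pathlib import Path

# Tags the patch counts toward the speech-negative gate.
SPEECH_NEG_TAGS = frozenset({
    "neg_user",
    "neg_librispeech",
    "neg_musan_speech",
    "neg_musan_music",
    "neg_musan_noise",
})


def count_speech_negatives(neg_tag_map: dict[str, list[Path]]) -> int:
    """Count files whose tag belongs to the corpus or user negative sources."""
    return sum(len(files) for tag, files in neg_tag_map.items() if tag in SPEECH_NEG_TAGS)


def require_speech_negatives(neg_tag_map: dict[str, list[Path]], minimum: int = 5) -> int:
    """Raise if too few speech negatives are present; otherwise return the count."""
    total = count_speech_negatives(neg_tag_map)
    if total < minimum:
        raise RuntimeError(
            "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
            "/app/corpus or run `violawake download-corpus`."
        )
    return total


# Usage: confusables alone no longer satisfy the gate ("neg_confusable" is a made-up tag).
tag_map = {"neg_confusable": [Path(f"conf_{i}.wav") for i in range(50)]}
try:
    require_speech_negatives(tag_map)
except RuntimeError as exc:
    print(f"training would stop: {exc}")
```

The point of counting by tag is that a job with plenty of TTS confusables but no corpus files fails loudly instead of training a model with no general-speech negatives.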

apply patch
patch: completed
J:\CLAUDE\PROJECTS\Wakeword\src\violawake_sdk\tools\train.py
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..4b7eaaa901410938cc34d46726af83fc68bb1610
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,52 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+            if loop.is_running():
+                import concurrent.futures
+
+                with concurrent.futures.ThreadPoolExecutor() as pool:
+                    return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+            return loop.run_until_complete(_synth())
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +670,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
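
The retry timing added to the edge-tts synthesis path can be sketched on its own as follows. This is an illustrative reduction, not the SDK code: the defaults mirror the constants in the diff (3 attempts, 0.75 s base, 4.0 s cap), while `backoff_delay` and `retry_call` are hypothetical helpers, and the sketch re-raises on the final attempt where the patch instead returns through `_edge_tts_fail`.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(
    attempt: int,
    base: float = 0.75,
    cap: float = 4.0,
    rng: random.Random | None = None,
) -> float:
    """Exponential delay capped at `cap`, plus uniform jitter in [0, base]."""
    if rng is None:
        rng = random.Random()
    capped = min(cap, base * (2 ** (attempt - 1)))
    # Jitter is added after the cap, so the result can exceed `cap` by up to `base`.
    return capped + rng.uniform(0.0, base)


def retry_call(
    fn: Callable[[], T],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    rng: random.Random | None = None,
) -> T:
    """Call `fn`, retrying on any exception until `max_attempts` is reached."""
    max_attempts = max(1, max_attempts)
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt >= max_attempts:
                raise  # the patch reports via _edge_tts_fail here instead
            sleep(backoff_delay(attempt, rng=rng))
    raise AssertionError("unreachable")
```

With the default constants, the two possible waits before the third attempt fall in roughly 0.75 to 1.5 s and 1.5 to 2.25 s, so a fully failing synthesis adds at most a few seconds per phrase rather than hanging the worker.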

apply patch
patch: completed
J:\CLAUDE\PROJECTS\Wakeword\src\violawake_sdk\tools\download_corpus.py
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
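
Note on the containment check in `_safe_extract_tar` above: the following is a minimal standalone sketch of the same idea, not the project's code. It assumes only that archive member names are joined onto the destination and resolved before comparison. The helper name `unsafe_member_names` and the directory name `corpus_extract_demo` are hypothetical and exist only for illustration.

```python
from pathlib import Path


def is_within_directory(directory: Path, target: Path) -> bool:
    """Return True if target resolves to directory itself or something below it."""
    directory = directory.resolve()
    target = target.resolve()
    return directory == target or directory in target.parents


def unsafe_member_names(destination: Path, names: list[str]) -> list[str]:
    """Hypothetical helper: list the member names that would escape destination."""
    return [
        name
        for name in names
        if not is_within_directory(destination, destination / name)
    ]


if __name__ == "__main__":
    dest = Path("corpus_extract_demo")  # need not exist; resolve() is non-strict
    members = ["LibriSpeech/dev-clean/a.flac", "../evil.txt"]
    print(unsafe_member_names(dest, members))
```

Resolving both paths before comparing is what catches `..` segments; a plain string-prefix comparison on the unresolved paths would not.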
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..4b7eaaa901410938cc34d46726af83fc68bb1610
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,52 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+            if loop.is_running():
+                import concurrent.futures
+
+                with concurrent.futures.ThreadPoolExecutor() as pool:
+                    return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+            return loop.run_until_complete(_synth())
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +670,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
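
Note on the retry loop in the train.py hunk above: the sketch below isolates the delay arithmetic so the schedule is easier to see. It is an illustration under stated assumptions, not the SDK's code. The constants mirror the values in the diff, while the function name `retry_delay` and the seeded `Random(0)` are hypothetical.

```python
from random import Random

# Values copied from the diff above; the names are shortened for the sketch.
MAX_ATTEMPTS = 3
BASE_SECONDS = 0.75
MAX_SECONDS = 4.0


def retry_delay(attempt: int, rng: Random) -> float:
    """Capped exponential backoff, with uniform jitter added after the cap."""
    capped = min(MAX_SECONDS, BASE_SECONDS * (2 ** (attempt - 1)))
    return capped + rng.uniform(0.0, BASE_SECONDS)


if __name__ == "__main__":
    rng = Random(0)  # seeded only to make the demo repeatable
    # With MAX_ATTEMPTS = 3 the loop sleeps after attempts 1 and 2 only.
    for attempt in range(1, MAX_ATTEMPTS):
        print(attempt, round(retry_delay(attempt, rng), 2))
```

Because the jitter is added after the `min(...)` cap, a single delay can exceed `MAX_SECONDS` by up to `BASE_SECONDS` if the attempt limit is raised. With three attempts the total sleep stays under 3.75 seconds.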

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Get-Content tests\\integration\\test_sdk_surface.py | Select-Object -Skip 130 -First 50" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 372ms:
        vad = VADEngine(backend="webrtc")

    assert vad.backend_name == "webrtc"
    assert vad.process_frame(silent_frame) == 0.0


def test_vad_engine_rms_backend_constructs_without_hardware(silent_frame: bytes) -> None:
    vad = VADEngine(backend="rms")

    assert vad.backend_name == "rms"
    assert vad.process_frame(silent_frame) == 0.0


def test_vad_engine_invalid_backend_raises() -> None:
    with pytest.raises(ValueError, match="invalid_backend"):
        VADEngine(backend="invalid_backend")  # type: ignore[arg-type]


@pytest.mark.parametrize(
    "module_name",
    [
        "violawake_sdk.tools.evaluate",
        "violawake_sdk.tools.collect_samples",
        "violawake_sdk.tools.train",
        "violawake_sdk.tools.download_model",
    ],
)
def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
    module = importlib.import_module(module_name)

    assert hasattr(module, "main"), f"{module_name} should define main()"
    assert callable(module.main)
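
Note on the parametrized test above: its module list does not include `violawake_sdk.tools.download_corpus`. Whether that module should be added is an assumption, not something the listing confirms. The sketch below shows the same importability check in standalone form, using the stdlib modules `timeit` and `math` as stand-ins. The helper name `exposes_callable_main` is hypothetical.

```python
import importlib


def exposes_callable_main(module_name: str) -> bool:
    """Import a module by dotted name and report whether it defines a callable main()."""
    module = importlib.import_module(module_name)
    return callable(getattr(module, "main", None))


if __name__ == "__main__":
    # Stdlib stand-ins: timeit defines main(), math does not.
    for name in ("timeit", "math"):
        print(name, exposes_callable_main(name))
```

If the new module is meant to be covered, adding its dotted name to the `module_name` list would exercise the same assertion against it.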

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..4b7eaaa901410938cc34d46726af83fc68bb1610
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,52 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+            if loop.is_running():
+                import concurrent.futures
+
+                with concurrent.futures.ThreadPoolExecutor() as pool:
+                    return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+            return loop.run_until_complete(_synth())
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +670,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
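
The retry loop added to the edge-tts synthesis path caps the exponential term and then adds jitter on top. The sketch below reproduces only that delay schedule, assuming the same constants as the patch (3 attempts, 0.75 s base, 4.0 s cap). `backoff_delay` is a hypothetical helper name used for illustration; it does not exist in `train.py`.

```python
from random import Random

# Constants mirror the values introduced in the patch.
MAX_ATTEMPTS = 3
BASE_SECONDS = 0.75
MAX_SECONDS = 4.0


def backoff_delay(attempt: int, rng: Random) -> float:
    """Return the sleep time after a failed 1-based attempt (hypothetical helper)."""
    # Exponential growth, capped before jitter is applied.
    capped = min(MAX_SECONDS, BASE_SECONDS * (2 ** (attempt - 1)))
    # Jitter is added after the cap, so the total can exceed MAX_SECONDS
    # by up to BASE_SECONDS.
    return capped + rng.uniform(0.0, BASE_SECONDS)


rng = Random(0)
# The final attempt never sleeps, so only MAX_ATTEMPTS - 1 delays occur.
delays = [backoff_delay(attempt, rng) for attempt in range(1, MAX_ATTEMPTS)]
worst_case_total = sum(
    min(MAX_SECONDS, BASE_SECONDS * (2 ** (a - 1))) + BASE_SECONDS
    for a in range(1, MAX_ATTEMPTS)
)
```

Under these assumptions a single synthesis call sleeps at most 3.75 s across its retries, not counting the 30 s timeout that applies to each attempt when an event loop is already running.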

apply patch
patch: completed
J:\CLAUDE\PROJECTS\Wakeword\tests\unit\test_download_corpus.py
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..4b7eaaa901410938cc34d46726af83fc68bb1610
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,52 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+            if loop.is_running():
+                import concurrent.futures
+
+                with concurrent.futures.ThreadPoolExecutor() as pool:
+                    return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+            return loop.run_until_complete(_synth())
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +670,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
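
The smoke tests above cover the happy path only. As a rough illustration of what the containment check in `download_corpus.py` is meant to reject, the sketch below copies the logic of `_is_within_directory` into a standalone function and probes it with an in-tree path, a `..` escape, and an absolute path. The probe paths are made up for this example and are not taken from any real archive.

```python
import tempfile
from pathlib import Path


def is_within_directory(directory: Path, target: Path) -> bool:
    """Standalone copy of the containment check, for illustration only."""
    directory = directory.resolve()
    target = target.resolve()
    return directory == target or directory in target.parents


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "extract"
    root.mkdir()

    # A normal archive member stays under the extraction root.
    inside = is_within_directory(root, root / "LibriSpeech" / "dev-clean")
    # A member name containing ".." resolves to a sibling of the root.
    escaped = is_within_directory(root, root / ".." / "evil.flac")
    # Joining an absolute member name discards the root prefix entirely.
    absolute = is_within_directory(root, root / "/etc/passwd")
```

A unit test along these lines could be added next to the existing smoke tests, but whether the project wants that coverage is a judgment call for the maintainers.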

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake download-corpus`."
             )
 
         progress_callback({
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..4b7eaaa901410938cc34d46726af83fc68bb1610
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,52 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+            if loop.is_running():
+                import concurrent.futures
+
+                with concurrent.futures.ThreadPoolExecutor() as pool:
+                    return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+            return loop.run_until_complete(_synth())
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +670,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
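
Note on the retry loop in the train.py hunk above: the delay doubles per attempt, is capped, and then gets jitter added on top of the cap. The following is a minimal sketch of that schedule, assuming the constants shown in the diff (0.75 s base, 4.0 s cap). The names `capped_delay` and `backoff_delay` are hypothetical helpers for illustration and do not exist in train.py.

```python
from random import Random

BASE_SECONDS = 0.75  # assumed to mirror _EDGE_TTS_RETRY_BASE_SECONDS
MAX_SECONDS = 4.0    # assumed to mirror _EDGE_TTS_RETRY_MAX_SECONDS


def capped_delay(attempt: int) -> float:
    """Exponential delay for a 1-based attempt number, capped at MAX_SECONDS."""
    return min(MAX_SECONDS, BASE_SECONDS * (2 ** (attempt - 1)))


def backoff_delay(attempt: int, rng: Random) -> float:
    """Capped delay plus uniform jitter in [0, BASE_SECONDS]."""
    # Jitter is added after the cap, so the result may exceed MAX_SECONDS.
    return capped_delay(attempt) + rng.uniform(0.0, BASE_SECONDS)


rng = Random(0)  # seeded only to make the sketch repeatable
# With 3 attempts, only failures of attempts 1 and 2 are followed by a sleep.
delays = [backoff_delay(attempt, rng) for attempt in (1, 2)]
```

Under these assumptions the loop sleeps at most 1.5 s after the first failure and 2.25 s after the second, so a single synthesis call spends no more than 3.75 s sleeping before it gives up.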

apply patch
patch: completed
J:\CLAUDE\PROJECTS\Wakeword\pyproject.toml
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..4b7eaaa901410938cc34d46726af83fc68bb1610
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,52 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+            if loop.is_running():
+                import concurrent.futures
+
+                with concurrent.futures.ThreadPoolExecutor() as pool:
+                    return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+            return loop.run_until_complete(_synth())
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +670,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
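
Note on `_safe_extract_tar` in the download_corpus.py hunk above: each archive member is joined onto the destination and rejected if the resolved path escapes it. A minimal sketch of that containment check follows, using only `pathlib`. The function body matches the one in the diff; the directory name `extract` and the sample member names are made up for illustration.

```python
from pathlib import Path


def is_within_directory(directory: Path, target: Path) -> bool:
    """True when target resolves to directory itself or to a path beneath it."""
    directory = directory.resolve()
    target = target.resolve()
    return directory == target or directory in target.parents


dest = Path("extract")  # hypothetical extraction directory

# A normal member name stays under the destination.
safe = is_within_directory(dest, dest / "LibriSpeech/dev-clean/84/a.flac")

# "../" components climb out of the destination once resolved.
unsafe = is_within_directory(dest, dest / "../../etc/passwd")

# Joining an absolute member name discards the destination prefix entirely.
absolute = is_within_directory(dest, dest / "/etc/passwd")
```

Both `..` traversal and absolute member names fail the check, which is why the patch raises before calling `tar.extractall`.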

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake download-corpus`."
             )
 
         progress_callback({
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..4b7eaaa901410938cc34d46726af83fc68bb1610
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,52 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+            if loop.is_running():
+                import concurrent.futures
+
+                with concurrent.futures.ThreadPoolExecutor() as pool:
+                    return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+            return loop.run_until_complete(_synth())
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +670,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
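
Note on the retry timing in the `train.py` hunk above: the delay doubles per attempt starting from `_EDGE_TTS_RETRY_BASE_SECONDS`, is capped at `_EDGE_TTS_RETRY_MAX_SECONDS`, and uniform jitter of up to one base interval is then added on top of the capped value. The following is a minimal standalone sketch of that arithmetic, not the project's code. The function `backoff_delay` and its parameter names are hypothetical; the defaults are copied from the constants shown in the diff.

```python
from random import Random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 0.75,
    cap: float = 4.0,
    rng: Optional[Random] = None,
) -> float:
    """Return the sleep time before the next retry (attempt is 1-based)."""
    rng = rng or Random()
    # Exponential growth: base, 2*base, 4*base, ... limited to `cap`.
    delay = min(cap, base * (2 ** (attempt - 1)))
    # Jitter is added after capping, so the result can exceed `cap` by up to `base`.
    return delay + rng.uniform(0.0, base)


if __name__ == "__main__":
    for attempt in (1, 2):
        print(attempt, round(backoff_delay(attempt), 2))
```

With three attempts, only attempts 1 and 2 are followed by a sleep, so under these defaults the retries add at most 1.5 s + 2.25 s = 3.75 s of waiting per synthesis call, not counting the time spent in the failed attempts themselves.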

apply patch
patch: completed
J:\CLAUDE\PROJECTS\Wakeword\tests\integration\test_feature_completeness.py
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake download-corpus`."
             )
 
         progress_callback({
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..4b7eaaa901410938cc34d46726af83fc68bb1610
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,52 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+            if loop.is_running():
+                import concurrent.futures
+
+                with concurrent.futures.ThreadPoolExecutor() as pool:
+                    return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+            return loop.run_until_complete(_synth())
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +670,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
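
Note on the archive safety check in `download_corpus.py` above: `_is_within_directory` resolves both paths and accepts a member only when the destination is the resolved target itself or one of its ancestors, which is what rejects `..` traversal entries before `extractall` runs. Below is a small sketch of that check in isolation, assuming nothing beyond the standard library. `is_within_directory` mirrors the helper in the diff; `check_members` is a hypothetical wrapper added only for illustration.

```python
import tempfile
from pathlib import Path


def is_within_directory(directory: Path, target: Path) -> bool:
    """Return True when target resolves to directory or a path beneath it."""
    directory = directory.resolve()
    target = target.resolve()
    return directory == target or directory in target.parents


def check_members(destination: Path, member_names: list[str]) -> list[str]:
    """Return the member names that would land outside destination."""
    return [
        name
        for name in member_names
        if not is_within_directory(destination, destination / name)
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        names = [
            "LibriSpeech/dev-clean/84/121123/84-121123-0000.flac",
            "../evil.txt",
        ]
        # Only the traversal entry is reported as unsafe.
        print(check_members(Path(tmp), names))
```

On Python 3.12 and later, `tarfile` also provides extraction filters (for example `extractall(..., filter="data")`) that cover similar ground; whether the project targets a version where that is available is not stated in this log.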

diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
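
Note on the retry loop in the train.py hunk above: the delay is a capped exponential, and random jitter is added after the cap. The sketch below reproduces only the deterministic part of that schedule. The constants mirror the values in the diff; `backoff_delay` is a hypothetical helper name used for illustration and is not a function in the repository.

```python
BASE_SECONDS = 0.75  # mirrors _EDGE_TTS_RETRY_BASE_SECONDS in the diff
MAX_SECONDS = 4.0    # mirrors _EDGE_TTS_RETRY_MAX_SECONDS in the diff


def backoff_delay(attempt: int) -> float:
    """Capped exponential delay for a 1-based attempt number (jitter omitted)."""
    return min(MAX_SECONDS, BASE_SECONDS * (2 ** (attempt - 1)))


if __name__ == "__main__":
    # Print the deterministic part of the schedule for the first four attempts.
    for attempt in range(1, 5):
        print(attempt, backoff_delay(attempt))
```

With `_EDGE_TTS_MAX_ATTEMPTS = 3`, only the delays after attempts 1 and 2 are ever slept, because the third failure returns through `_edge_tts_fail` instead of sleeping. Since the jitter is added after the cap, a single delay can exceed `_EDGE_TTS_RETRY_MAX_SECONDS` by up to one base interval.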

apply patch
patch: completed
J:\CLAUDE\PROJECTS\Wakeword\tests\integration\test_sdk_surface.py
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..4b7eaaa901410938cc34d46726af83fc68bb1610
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,52 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+            if loop.is_running():
+                import concurrent.futures
+
+                with concurrent.futures.ThreadPoolExecutor() as pool:
+                    return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+            return loop.run_until_complete(_synth())
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +670,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
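
Note on `_safe_extract_tar` in the download_corpus.py hunk above: every archive member is rejected unless its resolved path stays inside the extraction directory. The sketch below exercises the same containment rule in isolation. The function body follows the diff; the directory and file names in the demo are made up for illustration.

```python
import tempfile
from pathlib import Path


def is_within_directory(directory: Path, target: Path) -> bool:
    """Return True when target is the directory itself or lives beneath it."""
    directory = directory.resolve()
    target = target.resolve()
    return directory == target or directory in target.parents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "extract"
        root.mkdir()
        # A normal archive member stays under the extraction root.
        print(is_within_directory(root, root / "LibriSpeech" / "dev-clean"))
        # A member that climbs out with ".." resolves to a sibling of root.
        print(is_within_directory(root, root / ".." / "escape.flac"))
```

Resolving both paths before comparing is what makes the `..` case fail the check. A plain string-prefix comparison on the unresolved paths would accept it.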

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..4b7eaaa901410938cc34d46726af83fc68bb1610
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,52 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+            if loop.is_running():
+                import concurrent.futures
+
+                with concurrent.futures.ThreadPoolExecutor() as pool:
+                    return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+            return loop.run_until_complete(_synth())
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +670,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()

apply patch
patch: completed
J:\CLAUDE\PROJECTS\Wakeword\tests\unit\test_cli.py
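The retry loop in the `train.py` hunk above sleeps for a capped exponential delay plus random jitter between edge-tts attempts. The following is a minimal sketch of that delay schedule, using the constants from the diff. The helper name `retry_delay` and the seeded `Random` are illustrative assumptions and not part of the patch.

```python
from random import Random

# Constants copied from the train.py hunk.
MAX_ATTEMPTS = 3
RETRY_BASE_SECONDS = 0.75
RETRY_MAX_SECONDS = 4.0


def retry_delay(attempt: int, rng: Random) -> float:
    """Return the sleep time after a failed 1-based attempt (hypothetical helper)."""
    # Exponential growth from the base delay, capped at the maximum.
    capped = min(RETRY_MAX_SECONDS, RETRY_BASE_SECONDS * (2 ** (attempt - 1)))
    # Jitter of up to one base interval is added on top of the capped delay.
    return capped + rng.uniform(0.0, RETRY_BASE_SECONDS)


if __name__ == "__main__":
    rng = Random(0)  # seeded only to make the printed schedule repeatable
    # The final attempt returns a failure instead of sleeping,
    # so only attempts 1..MAX_ATTEMPTS-1 are followed by a delay.
    for attempt in range(1, MAX_ATTEMPTS):
        print(f"attempt {attempt} failed; sleeping {retry_delay(attempt, rng):.2f}s")
```

With three attempts the loop sleeps at most twice, so the sleeps add at most 1.5 s + 2.25 s = 3.75 s per synthesis call.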
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..4b7eaaa901410938cc34d46726af83fc68bb1610
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,52 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+            if loop.is_running():
+                import concurrent.futures
+
+                with concurrent.futures.ThreadPoolExecutor() as pool:
+                    return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+            return loop.run_until_complete(_synth())
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +670,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
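
The `_safe_extract_tar` helper in the `download_corpus.py` diff above rejects archive members whose resolved path would land outside the extraction directory. The following is a minimal sketch of that containment check run against plain member names. `check_member_names` is a hypothetical helper added for illustration, and the member names are made up.

```python
import tempfile
from pathlib import Path


def is_within_directory(directory: Path, target: Path) -> bool:
    """True if target resolves to directory itself or to something beneath it."""
    directory = directory.resolve()
    target = target.resolve()
    return directory == target or directory in target.parents


def check_member_names(destination: Path, names: list[str]) -> list[str]:
    """Return the member names that would escape destination (hypothetical helper)."""
    return [
        name
        for name in names
        if not is_within_directory(destination, destination / name)
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dest = Path(tmp) / "extract"
        dest.mkdir()
        names = [
            "LibriSpeech/dev-clean/84/121123/84-121123-0000.flac",  # stays inside
            "../evil.txt",  # resolves to the parent of dest
        ]
        print(check_member_names(dest, names))
```

On Python versions that support tarfile extraction filters, passing `filter="data"` to `extractall` is a standard-library alternative to a hand-written check.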

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..4b7eaaa901410938cc34d46726af83fc68bb1610
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,52 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+            if loop.is_running():
+                import concurrent.futures
+
+                with concurrent.futures.ThreadPoolExecutor() as pool:
+                    return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+            return loop.run_until_complete(_synth())
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +670,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()

2026-05-07T12:56:38.666368Z ERROR codex_core::tools::router: error=apply_patch verification failed: Failed to find expected lines in J:\CLAUDE\PROJECTS\Wakeword\src\violawake_sdk\tools\train.py:
    async def _synth():
        communicate = edge_tts.Communicate(text, voice)
        mp3_buf = io.BytesIO()
        async for chunk in communicate.stream():
            if chunk["type"] == "audio":
                mp3_buf.write(chunk["data"])
        return mp3_buf.getvalue()

    def _run_synth() -> bytes:
        try:
            loop = asyncio.get_event_loop()
        except RuntimeError:
            return asyncio.run(_synth())
        if loop.is_running():
            import concurrent.futures

            with concurrent.futures.ThreadPoolExecutor() as pool:
                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
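
Note on the extraction guard: the corpus downloader in this diff rejects archive members that are links, or that resolve outside the extraction directory, before it calls `extractall`. The following is a minimal standalone sketch of that check using only the standard library. `unsafe_members`, `build_demo_archive`, and `check_demo` are hypothetical names for illustration and are not part of `download_corpus.py`.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def is_within_directory(directory: Path, target: Path) -> bool:
    """True if target resolves to directory itself or to something below it."""
    directory = directory.resolve()
    target = target.resolve()
    return directory == target or directory in target.parents


def unsafe_members(tar: tarfile.TarFile, destination: Path) -> list[str]:
    """Names of members that are links or would land outside destination."""
    bad: list[str] = []
    for member in tar.getmembers():
        if member.issym() or member.islnk():
            bad.append(member.name)
        elif not is_within_directory(destination, destination / member.name):
            bad.append(member.name)
    return bad


def build_demo_archive(names: list[str]) -> bytes:
    """Build an in-memory .tar.gz holding one tiny regular file per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            data = b"x"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def check_demo() -> list[str]:
    """Scan a demo archive containing one safe path and one traversal path."""
    payload = build_demo_archive(["LibriSpeech/dev-clean/a.flac", "../escape.txt"])
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as tar:
            return unsafe_members(tar, Path(tmp))
```

Only the `../escape.txt` member should be flagged here. On Python versions that support tarfile extraction filters, `tar.extractall(destination, filter="data")` offers a similar built-in guard and could be used alongside the manual check.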

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Get-Content src\\violawake_sdk\\tools\\train.py | Select-Object -Skip 300 -First 80" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 511ms:
    _LAST_EDGE_TTS_ERROR = None

    try:
        import edge_tts
    except ImportError as exc:
        message = "edge-tts is not installed. Install with: pip install edge-tts"
        if exc:
            message = f"{message} ({type(exc).__name__}: {exc})"
        return _edge_tts_fail(
            text,
            voice,
            message,
        )

    async def _synth():
        communicate = edge_tts.Communicate(text, voice)
        mp3_buf = io.BytesIO()
        async for chunk in communicate.stream():
            if chunk["type"] == "audio":
                mp3_buf.write(chunk["data"])
        return mp3_buf.getvalue()

    def _run_synth() -> bytes:
        try:
            loop = asyncio.get_event_loop()
            if loop.is_running():
                import concurrent.futures

                with concurrent.futures.ThreadPoolExecutor() as pool:
                    return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
            return loop.run_until_complete(_synth())
        except RuntimeError:
            return asyncio.run(_synth())

    mp3_data: bytes | None = None
    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
    for attempt in range(1, max_attempts + 1):
        try:
            mp3_data = _run_synth()
            break
        except Exception as exc:
            if attempt >= max_attempts:
                return _edge_tts_fail(
                    text,
                    voice,
                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
                )
            delay = min(
                _EDGE_TTS_RETRY_MAX_SECONDS,
                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
            )
            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
            logger.warning(
                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
                "%s: %s; retrying in %.2fs",
                attempt,
                max_attempts,
                voice,
                text,
                type(exc).__name__,
                exc,
                delay,
            )
            time.sleep(delay)

    if not mp3_data or len(mp3_data) < 100:
        return _edge_tts_fail(
            text,
            voice,
            f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
        )

    conversion_errors: list[str] = []

    # First try libsndfile via soundfile. The backend image already gets this
    # through the training stack, and it avoids a hard ffmpeg dependency.
    try:
        import numpy as np
        import soundfile as sf
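
Note on the retry schedule: the loop in the output above doubles a base delay per failed attempt, caps it, and then adds random jitter. The following is a minimal sketch of the delay arithmetic alone, assuming the constants shown in the `train.py` diff (3 attempts, 0.75 s base, 4.0 s cap). `backoff_delay` and `delay_bounds` are hypothetical helper names, not functions in `train.py`.

```python
import random

# Values mirror the constants in the train.py diff; treat them as assumptions.
MAX_ATTEMPTS = 3
RETRY_BASE_SECONDS = 0.75
RETRY_MAX_SECONDS = 4.0


def capped_delay(attempt: int) -> float:
    """Exponential delay for a 1-based failed attempt, before jitter."""
    return min(RETRY_MAX_SECONDS, RETRY_BASE_SECONDS * (2 ** (attempt - 1)))


def backoff_delay(attempt: int, rng: random.Random) -> float:
    """Sleep time before the next try: capped exponential plus jitter."""
    # Jitter is added after the cap, so the total may exceed
    # RETRY_MAX_SECONDS by up to RETRY_BASE_SECONDS.
    return capped_delay(attempt) + rng.uniform(0.0, RETRY_BASE_SECONDS)


def delay_bounds(attempt: int) -> tuple[float, float]:
    """Smallest and largest possible sleep after the given failed attempt."""
    low = capped_delay(attempt)
    return low, low + RETRY_BASE_SECONDS


def worst_case_total_sleep() -> float:
    """Upper bound on total sleep; the final attempt fails without sleeping."""
    return sum(delay_bounds(a)[1] for a in range(1, MAX_ATTEMPTS))
```

With these constants a sleep happens only after attempts 1 and 2, so the retries add at most a few seconds per phrase on top of the synthesis calls themselves.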


apply patch
patch: completed
J:\CLAUDE\PROJECTS\Wakeword\src\violawake_sdk\tools\train.py
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
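
Note on the extraction guard in `download_corpus.py`: the check that `_safe_extract_tar` runs before `tar.extractall` can be exercised on its own against an in-memory archive. The following is a minimal sketch, not the shipped code. `is_within_directory` mirrors the helper in the diff, while `unsafe_members` and `build_demo_archive` are hypothetical names introduced only for this illustration.

```python
import io
import tarfile
from pathlib import Path


def is_within_directory(directory: Path, target: Path) -> bool:
    """Return True when `target` resolves to `directory` or a path below it."""
    directory = directory.resolve()
    target = target.resolve()
    return directory == target or directory in target.parents


def unsafe_members(tar: tarfile.TarFile, destination: Path) -> list[str]:
    """List member names that are links or would land outside `destination`."""
    rejected: list[str] = []
    for member in tar.getmembers():
        if member.issym() or member.islnk():
            rejected.append(member.name)
            continue
        if not is_within_directory(destination, destination / member.name):
            rejected.append(member.name)
    return rejected


def build_demo_archive(names: list[str]) -> io.BytesIO:
    """Hypothetical helper: build a small .tar.gz in memory with one-byte files."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = 1
            tar.addfile(info, io.BytesIO(b"x"))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    archive = build_demo_archive(["LibriSpeech/dev-clean/ok.flac", "../evil.txt"])
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        print(unsafe_members(tar, Path("extract_root")))
```

The sketch collects offending names instead of raising on the first one, which is a deliberate difference from the diff: it makes the rejection rule easier to inspect in a test.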

apply patch
patch: completed
J:\CLAUDE\PROJECTS\Wakeword\tests\unit\test_train.py
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

codex
The backend speech-negative source is now corpus/user-files only; confusables remain wake-word-specific. I also moved the Edge-TTS retry into the single synthesis helper so positives and confusables both inherit the backoff without making the generic speech-negative path production-critical again.
  ✓ Inspect repo scripts, training code, docs, and tests
  ✓ Patch training corpus behavior and Edge-TTS retry handling
  → Add download-corpus CLI and mocked smoke test
  → Update deployment docs and compose comment
  • Run targeted validation and requested console tests
  • Stage explicit files and create logical commits
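
For reference, the retry behavior described above (bounded attempts, exponential delay capped at a maximum, plus random jitter) can be sketched on its own. This is a minimal illustration, not the code in `train.py`: the helper name `retry_with_backoff` and its `sleep`/`rng` parameters are hypothetical and exist only to make the sketch testable. The defaults mirror the `_EDGE_TTS_MAX_ATTEMPTS`, `_EDGE_TTS_RETRY_BASE_SECONDS`, and `_EDGE_TTS_RETRY_MAX_SECONDS` constants in the patch.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    *,
    max_attempts: int = 3,
    base_seconds: float = 0.75,
    max_seconds: float = 4.0,
    sleep: Callable[[float], None] = time.sleep,  # hypothetical hook, injectable for tests
    rng: Optional[random.Random] = None,  # hypothetical hook, injectable for tests
) -> T:
    """Call ``operation`` until it succeeds or ``max_attempts`` is exhausted."""
    rng = rng or random.Random()
    attempts = max(1, max_attempts)
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt >= attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Exponential delay (base, 2*base, 4*base, ...) capped at max_seconds.
            delay = min(max_seconds, base_seconds * (2 ** (attempt - 1)))
            # Jitter keeps concurrent workers from retrying in lockstep.
            delay += rng.uniform(0.0, base_seconds)
            sleep(delay)
    raise AssertionError("unreachable")
```

Unlike the patched `_edge_tts_synthesize`, which reports the final failure through `_edge_tts_fail`, this sketch re-raises the last exception and leaves error handling to the caller.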
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

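The retry loop added to `_edge_tts_synthesize` sleeps for a capped exponential delay plus jitter. The sketch below restates only that arithmetic so the schedule can be checked by hand. The constants are copied from the patch; the helper names `retry_delay` and `delay_bounds` are hypothetical and do not exist in `train.py`.

```python
from random import Random

# Constants copied from the patch to src/violawake_sdk/tools/train.py.
MAX_ATTEMPTS = 3
BASE_SECONDS = 0.75
MAX_SECONDS = 4.0


def delay_bounds(attempt: int) -> tuple[float, float]:
    """Lower and upper bound of the sleep after failed attempt `attempt` (1-based).

    Hypothetical helper: the exponential term is capped first, then jitter is added,
    so the upper bound can exceed MAX_SECONDS by up to BASE_SECONDS.
    """
    capped = min(MAX_SECONDS, BASE_SECONDS * (2 ** (attempt - 1)))
    return capped, capped + BASE_SECONDS


def retry_delay(attempt: int, rng: Random) -> float:
    """One concrete delay, mirroring the two-step computation in the patch."""
    capped = min(MAX_SECONDS, BASE_SECONDS * (2 ** (attempt - 1)))
    return capped + rng.uniform(0.0, BASE_SECONDS)


if __name__ == "__main__":
    # With MAX_ATTEMPTS = 3 only attempts 1 and 2 are followed by a sleep;
    # the third failure returns through _edge_tts_fail without waiting.
    for attempt in range(1, MAX_ATTEMPTS):
        low, high = delay_bounds(attempt)
        print(f"after attempt {attempt}: sleep between {low:.2f}s and {high:.2f}s")
```

Under these constants a call that fails every time sleeps twice before giving up, and the cap of 4.0 seconds only takes effect if `_EDGE_TTS_MAX_ATTEMPTS` is raised to 5 or more.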
apply patch
patch: completed
J:\CLAUDE\PROJECTS\Wakeword\docker-compose.production.yml
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

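The `training_service.py` hunks above change two things: the order in which corpus directories are searched, and the floor that negatives must meet before training starts. The sketch below isolates both as standalone functions. The tag set and the threshold of 5 are copied from the patch; the function names are hypothetical, and the "first existing directory wins" rule is an assumption, because the loop that consumes `_CORPUS_SEARCH_PATHS` is not shown in the diff.

```python
from __future__ import annotations

from pathlib import Path

# Tags counted toward the speech-negative floor, copied from the patch.
SPEECH_NEG_TAGS = {
    "neg_user",
    "neg_librispeech",
    "neg_musan_speech",
    "neg_musan_music",
    "neg_musan_noise",
}
MIN_NEGATIVES = 5


def first_existing_dir(candidates: list[Path]) -> Path | None:
    """Return the first candidate that is an existing directory, else None.

    Assumed lookup rule; the real service may merge several directories.
    """
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None


def check_negative_floor(neg_tag_map: dict[str, list[Path]]) -> tuple[int, int]:
    """Return (speech_count, total_count), raising if either is below the floor."""
    total = sum(len(files) for files in neg_tag_map.values())
    speech = sum(
        len(files) for tag, files in neg_tag_map.items() if tag in SPEECH_NEG_TAGS
    )
    if speech < MIN_NEGATIVES or total < MIN_NEGATIVES:
        raise RuntimeError(
            f"Too few speech negatives: {speech} speech, {total} total "
            f"(need at least {MIN_NEGATIVES})"
        )
    return speech, total
```

One consequence worth noting: negatives under tags outside `SPEECH_NEG_TAGS` (for example generated confusables) no longer satisfy the check on their own, so a job with many confusables but no mounted corpus fails up front rather than training on wake-word negatives alone.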
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:
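
Note on the corpus lookup changed in training_service.py: the diff shows the ordered search list (`/app/corpus`, the repo-root `corpus/`, `~/.violawake/corpus`, then the CWD-relative `corpus`), but not the code that walks it. The sketch below is one way such a lookup could fail loudly instead of silently. The function name `first_usable_corpus`, the `min_files` threshold, and the accepted suffixes are assumptions for illustration, not the service's actual implementation.

```python
from pathlib import Path

AUDIO_SUFFIXES = {".flac", ".wav"}  # assumption: formats the loader accepts


def first_usable_corpus(candidates, min_files=5):
    """Return the first candidate directory holding at least `min_files` audio files."""
    checked = []
    for candidate in candidates:
        candidate = Path(candidate)
        checked.append(str(candidate))
        if not candidate.is_dir():
            continue
        count = 0
        for path in candidate.rglob("*"):
            if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES:
                count += 1
                if count >= min_files:
                    return candidate
    # Listing every path that was tried makes a missing mount easy to diagnose.
    raise RuntimeError(
        "No usable speech-negative corpus found. Checked: " + ", ".join(checked)
    )
```

Called with the search list from the diff, this returns the first directory that holds enough audio files, and the error names every location tried.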

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Get-Content docs\\DEPLOYMENT.md | Select-Object -Skip 110 -First 80" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 742ms:
VIOLAWAKE_ADMIN_TOKEN=...                 # Admin endpoints; treat like a root key
VIOLAWAKE_CONSOLE_BASE_URL=https://violawake.com
VIOLAWAKE_CORS_ORIGINS=https://violawake.com,https://www.violawake.com
VIOLAWAKE_PORT=8000

# Stripe (test or live mode; match across all four)
VIOLAWAKE_STRIPE_SECRET_KEY=...
VIOLAWAKE_STRIPE_WEBHOOK_SECRET=...
VIOLAWAKE_STRIPE_PRICE_DEVELOPER=price_...
VIOLAWAKE_STRIPE_PRICE_BUSINESS=price_...

# Email (Resend): leave unset for auto-verify dev fallback
VIOLAWAKE_RESEND_API_KEY=...

# Limits / retention
VIOLAWAKE_MAX_CONCURRENT_JOBS=4
VIOLAWAKE_TRAINING_TIMEOUT=900
VIOLAWAKE_RECORDING_RETENTION_DAYS=30
VIOLAWAKE_MODEL_RETENTION_DAYS=90
VIOLAWAKE_POST_TRAINING_RETENTION_HOURS=72
VIOLAWAKE_NEGATIVES_CORPUS_DIR=/app/negatives
VIOLAWAKE_ALGORITHM=HS256
VIOLAWAKE_ACCESS_TOKEN_EXPIRE_HOURS=24
```

If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
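
The split between "fail to start" and "degrade gracefully" can be sketched as a small startup check. This is an illustration only, not the backend's actual code: the two names in `REQUIRED` are hypothetical, since this section says only "DB URL, secret key" without naming the variables, while the two optional names are taken from the env listing above.

```python
import os

# Hypothetical names: the text says "DB URL, secret key" without naming them.
REQUIRED = ("VIOLAWAKE_DATABASE_URL", "VIOLAWAKE_SECRET_KEY")

# These two names appear in the env listing above.
OPTIONAL = {
    "VIOLAWAKE_STRIPE_SECRET_KEY": "billing endpoints return 503",
    "VIOLAWAKE_RESEND_API_KEY": "users are auto-verified without email",
}


def check_env(env):
    """Raise if a required setting is missing; return warnings for optional ones."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return [
        f"{name} unset: {effect}"
        for name, effect in OPTIONAL.items()
        if not env.get(name)
    ]


if __name__ == "__main__":
    try:
        for warning in check_env(os.environ):
            print("WARNING:", warning)
    except RuntimeError as exc:
        print("ERROR:", exc)
```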

---

## Frontend deploy (violawake.com)

### Prerequisites

- `wrangler` installed and logged in: `wrangler whoami` should show your Cloudflare account.
- Node 20+ (for `npm run build`).

### Deploy

```bash
cd /j/CLAUDE/PROJECTS/Wakeword/console/frontend

# 1. Build with the production API URL baked into the bundle.
#    THIS IS NOT OPTIONAL: Vite bakes import.meta.env.VITE_API_URL at build time.
#    If you forget, the bundle calls /api on violawake.com → 405.
VITE_API_URL=https://api.violawake.com/api npm run build

# 2. Verify the right URL was baked in (sanity check)
grep -c "https://api.violawake.com/api" dist/assets/*.js   # expect: at least 1

# 3. Deploy
wrangler pages deploy dist --project-name violawake --branch master --commit-dirty=true

# 4. Verify the production URL serves the new bundle
curl -sS https://violawake.com/ | grep -oE '/assets/index-[A-Za-z0-9]+\.js'
```

Each `wrangler pages deploy` produces a unique preview URL (e.g. `https://05488c49.violawake.pages.dev`). The production alias `violawake.com` is updated automatically when deploying to the production branch (`--branch master`). Roll back via the Pages dashboard.

### Frontend pages list

Live as of 2026-05-07: `/` `/login` `/register` `/forgot-password` `/reset-password` `/verify-email` `/dashboard` `/record` `/training/:jobId` `/billing` `/pricing` `/teams` `/teams/:teamId` `/teams/accept` `/account/password` `/privacy` `/terms` `/landing`.

---

## Cloudflare Tunnel

### Tunnel inventory

| Name | UUID | Purpose | How it's run |
|---|---|---|---|
| `violawake-api` | `7dbef1da-74e3-4d7f-bba9-aad4a3e72150` | Routes `api.violawake.com` to local backend | `wakeword-tunnel-1` Docker container |
| `Viola_app` | `97f23a85-78fc-40e5-978c-426c752015d1` | NOVVIOLA: routes `api.useviola.com` etc. | `cloudflared.exe` Windows Service |

The two tunnels run independently. **Restarting one does NOT affect the other.** This is the basis of the NOVVIOLA-vs-ViolaWake decoupling described in `CLAUDE.md`.

### Routes config

The current tunnel uses **remote-managed config** (Cloudflare dashboard → Zero Trust → Tunnels). The local `~/.cloudflared/config.yml.bak.dead-violawake-tunnel-2026-04-26` is from a DEAD legacy tunnel and should not be used as a reference for current behavior.

To inspect or change routes:

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

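Note on the retry schedule in the `train.py` hunk above: a minimal sketch of the delay arithmetic, assuming the three constants keep the values shown in the diff. The names `retry_delay` and `delay_bounds` are hypothetical helpers for illustration only and are not part of `train.py`. The sketch shows that jitter is added after the cap, so a delay can exceed the cap by up to one base interval.

```python
import random

# Values copied from the train.py hunk; the helper names below are hypothetical.
MAX_ATTEMPTS = 3
BASE_SECONDS = 0.75
MAX_SECONDS = 4.0


def delay_bounds(attempt: int) -> tuple[float, float]:
    """Return the (min, max) sleep after a failed 1-based attempt."""
    capped = min(MAX_SECONDS, BASE_SECONDS * (2 ** (attempt - 1)))
    # Jitter in [0, BASE_SECONDS] is added on top of the capped value.
    return capped, capped + BASE_SECONDS


def retry_delay(attempt: int, rng: random.Random) -> float:
    """Compute one concrete delay using the same arithmetic as the hunk."""
    capped = min(MAX_SECONDS, BASE_SECONDS * (2 ** (attempt - 1)))
    return capped + rng.uniform(0.0, BASE_SECONDS)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so the illustration is repeatable
    # The final attempt returns a failure instead of sleeping, so only
    # attempts 1 .. MAX_ATTEMPTS - 1 are followed by a delay.
    for attempt in range(1, MAX_ATTEMPTS):
        low, high = delay_bounds(attempt)
        delay = retry_delay(attempt, rng)
        print(f"attempt {attempt}: sleep {delay:.2f}s (range {low:.2f}-{high:.2f}s)")
```

With three attempts, the worst-case total sleep per synthesis call is 1.5 s + 2.25 s = 3.75 s, excluding the time spent in the failed requests themselves.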
2026-05-07T12:57:38.516398Z ERROR codex_core::tools::router: error=apply_patch verification failed: Failed to find expected lines in J:\CLAUDE\PROJECTS\Wakeword\docs\DEPLOYMENT.md:
If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.

---

## Frontend deploy (violawake.com)
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:
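
Note on the retry loop in the `train.py` hunk above: the sleep before each retry is an exponential backoff capped at `_EDGE_TTS_RETRY_MAX_SECONDS`, with uniform jitter added after the cap. A minimal sketch of that schedule follows, assuming the constant values shown in the diff (3 attempts, 0.75 s base, 4.0 s cap). The helper name `retry_delay` is hypothetical and does not exist in the repo.

```python
from random import Random

# Values copied from the constants added in the diff; names here are illustrative.
MAX_ATTEMPTS = 3
BASE_SECONDS = 0.75
MAX_SECONDS = 4.0


def retry_delay(attempt: int, rng: Random) -> float:
    """Seconds to sleep after failed attempt number `attempt` (1-based)."""
    # Exponential term doubles per attempt and is capped first...
    capped = min(MAX_SECONDS, BASE_SECONDS * (2 ** (attempt - 1)))
    # ...then jitter is added on top, so the result can exceed the cap
    # by up to BASE_SECONDS.
    return capped + rng.uniform(0.0, BASE_SECONDS)


if __name__ == "__main__":
    rng = Random()
    # The final attempt never sleeps: it returns the failure instead.
    for attempt in range(1, MAX_ATTEMPTS):
        print(f"after attempt {attempt}: sleep {retry_delay(attempt, rng):.2f}s")
```

With these values the worst case adds roughly 1.5 s + 2.25 s of sleep per synthesis call before the third attempt either succeeds or reports failure.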

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "rg -n \"If a key is missing|Frontend deploy\" docs\\DEPLOYMENT.md" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 214ms:
19:  Frontend deploy (manual)                Backend deploy (manual)
136:If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
140:## Frontend deploy (violawake.com)

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

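The retry test above exercises capped exponential backoff with uniform jitter. The following is a minimal sketch of that delay schedule, assuming the constants from the `train.py` change in this patch (base 0.75 s, cap 4.0 s). `backoff_delay` is a hypothetical helper name used only for illustration; the SDK computes the delay inline.

```python
from __future__ import annotations

from random import Random


def backoff_delay(
    attempt: int,
    base: float = 0.75,
    cap: float = 4.0,
    rng: Random | None = None,
) -> float:
    """Return the sleep time after a failed attempt (1-indexed).

    The exponential part doubles per attempt and is capped; the jitter is
    added after capping, so the result can exceed `cap` by up to `base`.
    """
    capped = min(cap, base * (2 ** (attempt - 1)))
    jitter = (rng or Random()).uniform(0.0, base)
    return capped + jitter


if __name__ == "__main__":
    rng = Random(0)  # seeded only to make the demo repeatable
    for attempt in (1, 2, 3, 4):
        print(attempt, round(backoff_delay(attempt, rng=rng), 2))
```

With the three-attempt limit in the patch, only attempts 1 and 2 are followed by a sleep, so a single synthesis call waits roughly 2.25 to 3.75 seconds in total before giving up.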
apply patch
patch: completed
J:\CLAUDE\PROJECTS\Wakeword\docs\DEPLOYMENT.md
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

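The archive safety check in `download_corpus.py` (in the diff above) rejects tar members whose resolved path escapes the extraction directory. The following is a small sketch of that containment test on its own, assuming a hypothetical extraction root under the system temp directory. The function body mirrors `_is_within_directory` from the patch; the demo paths are illustrative only.

```python
import tempfile
from pathlib import Path


def is_within_directory(directory: Path, target: Path) -> bool:
    """Return True if target resolves to directory or to a path beneath it."""
    directory = directory.resolve()
    target = target.resolve()
    return directory == target or directory in target.parents


# Hypothetical extraction root; resolve() does not require it to exist.
extract_root = Path(tempfile.gettempdir()) / "violawake_extract_demo"

# A member name as it would appear inside the LibriSpeech archive.
safe_member = extract_root / "LibriSpeech" / "dev-clean" / "84-121123-0000.flac"
# A traversal attempt: ".." climbs out of the extraction root.
unsafe_member = extract_root / ".." / "outside.txt"

print(is_within_directory(extract_root, safe_member))
print(is_within_directory(extract_root, unsafe_member))
```

The first call returns `True` and the second returns `False`, because `resolve()` collapses the `..` component before the parent comparison.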
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..c74f5d54711c79af259cf410f088ecf2d37793bc
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *( [service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else [] ),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

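Note on the retry delay in the `train.py` hunk above: the delay is a capped exponential backoff with uniform jitter added after the cap, so a single sleep can reach the cap plus the base (4.0 + 0.75 seconds with the constants in the diff). A minimal standalone sketch of that calculation follows; `backoff_delay` and its parameter names are hypothetical and only mirror the arithmetic in the patch, they are not part of the SDK.

```python
from random import Random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 0.75,
    cap: float = 4.0,
    rng: Optional[Random] = None,
) -> float:
    """Return the sleep time before the next retry for a 1-based attempt number.

    Hypothetical helper: mirrors the patch's arithmetic, where the exponential
    term is capped first and the jitter is added afterwards.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Exponential growth: base, 2*base, 4*base, ... limited to `cap`.
    capped = min(cap, base * (2 ** (attempt - 1)))
    # Uniform jitter in [0, base] spreads out retries from concurrent workers.
    jitter = (rng or Random()).uniform(0.0, base)
    return capped + jitter


if __name__ == "__main__":
    rng = Random(0)  # seeded only to make this demo repeatable
    for attempt in range(1, 5):
        print(attempt, round(backoff_delay(attempt, rng=rng), 2))
```

With `_EDGE_TTS_MAX_ATTEMPTS = 3`, only attempts 1 and 2 are followed by a sleep, so the cap is not reached in the patched code unless the constants are raised.
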
apply patch
patch: completed
J:\CLAUDE\PROJECTS\Wakeword\console\backend\app\services\training_service.py
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

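Note on the retry schedule in `_edge_tts_synthesize` above: the delay is a capped exponential backoff plus uniform jitter. The sketch below recomputes that schedule in isolation, using the constants from the diff. The helper names `retry_delay` and `delay_bounds` are hypothetical, are not part of `train.py`, and exist only for illustration.

```python
import random

# Constants copied from the diff; everything else here is illustrative.
MAX_ATTEMPTS = 3
RETRY_BASE_SECONDS = 0.75
RETRY_MAX_SECONDS = 4.0


def delay_bounds(attempt: int) -> tuple[float, float]:
    """Return (min, max) sleep in seconds after failed attempt `attempt` (1-based)."""
    capped = min(RETRY_MAX_SECONDS, RETRY_BASE_SECONDS * (2 ** (attempt - 1)))
    # Jitter is drawn from [0, base], so it widens the window by one base unit.
    return capped, capped + RETRY_BASE_SECONDS


def retry_delay(attempt: int, rng: random.Random) -> float:
    """Draw one concrete delay: capped exponential term plus uniform jitter."""
    low, _high = delay_bounds(attempt)
    return low + rng.uniform(0.0, RETRY_BASE_SECONDS)


if __name__ == "__main__":
    rng = random.Random()
    # The final attempt returns a failure instead of sleeping, so only
    # attempts 1 .. MAX_ATTEMPTS - 1 are followed by a delay.
    for attempt in range(1, MAX_ATTEMPTS):
        low, high = delay_bounds(attempt)
        drawn = retry_delay(attempt, rng)
        print(f"attempt {attempt}: sleep in [{low:.2f}, {high:.2f}]s, drew {drawn:.2f}s")
```

With three attempts, only two sleeps can occur, bounded by 1.50 s and 2.25 s, so retries add at most 3.75 s of sleep per synthesis call. The 4.0 s cap only takes effect if `_EDGE_TTS_MAX_ATTEMPTS` is raised to 5 or more, since the sleep after a fourth failed attempt would be the first to exceed it.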
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "git diff -- console\\backend\\app\\services\\training_service.py" in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "git diff -- src\\violawake_sdk\\tools\\train.py" in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "git diff -- pyproject.toml docker-compose.production.yml docs\\DEPLOYMENT.md" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 292ms:
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a75..0c687cd 100644
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -76,6 +76,10 @@ logger = logging.getLogger(__name__)
 _TMP_DIR: str | None = None
 _LAST_EDGE_TTS_ERROR: str | None = None
 _REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,21 +320,49 @@ def _edge_tts_synthesize(text: str, voice: str, output_path: Path) -> bool:
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
+    def _run_synth() -> bytes:
         try:
             loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
         except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
 
     if not mp3_data or len(mp3_data) < 100:
         return _edge_tts_fail(
@@ -640,7 +672,10 @@ def _generate_speech_negatives(
     n_voices: int = 5,
     verbose: bool = True,
 ) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
 
     Returns list of generated WAV file paths.
     """
warning: in the working copy of 'src/violawake_sdk/tools/train.py', LF will be replaced by CRLF the next time Git touches it
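
Note on the retry schedule in the `train.py` diff above: a minimal sketch of the delay arithmetic, assuming the constants shown in the hunk (`_EDGE_TTS_RETRY_BASE_SECONDS = 0.75`, `_EDGE_TTS_RETRY_MAX_SECONDS = 4.0`). The function names `backoff_delay` and `delay_bounds` are hypothetical and are not part of the SDK. Jitter is added after the cap is applied, so a single delay can exceed the cap by up to one base interval.

```python
import random


def delay_bounds(attempt, base=0.75, cap=4.0):
    """Return (min, max) sleep in seconds before the retry that follows `attempt` (1-based)."""
    capped = min(cap, base * (2 ** (attempt - 1)))
    # Jitter is uniform in [0, base] and is added on top of the capped value.
    return capped, capped + base


def backoff_delay(attempt, base=0.75, cap=4.0, rng=None):
    """Draw one concrete delay using the same formula as the diff."""
    rng = rng or random.Random()
    capped = min(cap, base * (2 ** (attempt - 1)))
    return capped + rng.uniform(0.0, base)


for attempt in (1, 2):
    print(attempt, delay_bounds(attempt))
```

With `_EDGE_TTS_MAX_ATTEMPTS = 3`, only attempts 1 and 2 are followed by a sleep, so the retry loop adds at most 1.5 + 2.25 = 3.75 seconds of sleep per synthesis call, and the 4.0 second cap is not reached unless the attempt limit is raised.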

 succeeded in 326ms:
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740c..af849b1 100644
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@ def run_training_job_sync(
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@ def run_training_job_sync(
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@ def run_training_job_sync(
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@ def run_training_job_sync(
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
warning: in the working copy of 'console/backend/app/services/training_service.py', LF will be replaced by CRLF the next time Git touches it
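
Note on the `_CORPUS_SEARCH_PATHS` change in the `training_service.py` diff above: a small sketch of the path arithmetic, assuming the service file sits at `console/backend/app/services/training_service.py` under the repo root (the path in the diff header). The `/repo` prefix and the helper names `repo_root_corpus` and `first_existing_dir` are hypothetical; the loop that consumes the search paths is not shown in the hunk, so first-existing-directory resolution is an assumption.

```python
from __future__ import annotations

from pathlib import Path, PurePosixPath


def repo_root_corpus(service_file: PurePosixPath) -> PurePosixPath | None:
    """Mirror the guarded parents[4] lookup from the diff (pure path math, no I/O)."""
    parents = service_file.parents
    return parents[4] / "corpus" if len(parents) > 4 else None


def first_existing_dir(candidates: list[Path]) -> Path | None:
    """Hypothetical helper: return the first candidate that is an existing directory."""
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None


service = PurePosixPath("/repo/console/backend/app/services/training_service.py")
print(repo_root_corpus(service))                       # /repo/corpus
print(service.parent.parent.parent.parent / "corpus")  # /repo/console/corpus
```

Under that layout, four `.parent` hops stop at `console/`, one level short of the repo root, while `parents[4]` reaches the repo root. The length guard avoids an `IndexError` when the file is installed at a shallower path, which may be the case inside the container image.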

 succeeded in 316ms:
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb..85e1d2c 100644
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,10 +25,9 @@ services:
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
       - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc5..4820f1b 100644
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@ VIOLAWAKE_ACCESS_TOKEN_EXPIRE_HOURS=24
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b..f56290a 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@ violawake-train = "violawake_sdk.tools.train:main"
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
warning: in the working copy of 'docker-compose.production.yml', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'docs/DEPLOYMENT.md', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'pyproject.toml', LF will be replaced by CRLF the next time Git touches it

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "git diff -- src\\violawake_sdk\\tools\\download_corpus.py tests\\unit\\test_download_corpus.py tests\\unit\\test_cli.py tests\\unit\\test_train.py tests\\integration\\test_feature_completeness.py tests\\integration\\test_sdk_surface.py" in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'python -m violawake_sdk.tools.download_corpus --help' in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "python -m pytest tests\\unit\\test_download_corpus.py tests\\unit\\test_train.py::TestTrainHelpers::test_edge_tts_synthesize_retries_transient_stream_failure -q --no-cov" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 638ms:
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43..247dcfd 100644
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1729,6 +1729,7 @@ class TestCLIEntryPoints:
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
         "violawake-collect": "violawake_sdk.tools.collect_samples:main",
         "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
         "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
         "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4..3b837af 100644
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -153,6 +153,7 @@ def test_vad_engine_invalid_backend_raises() -> None:
         "violawake_sdk.tools.collect_samples",
         "violawake_sdk.tools.train",
         "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
     ],
 )
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a85..17876ed 100644
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -399,6 +399,20 @@ class TestDownloadCLI:
         assert "violawake-download" in result.stdout
 
 
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
 # ===================================================================
 # violawake-collect  (tools.collect_samples:main)
 # ===================================================================
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf..2f6d290 100644
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@ class TestTrainHelpers:
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:
warning: in the working copy of 'tests/integration/test_feature_completeness.py', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'tests/integration/test_sdk_surface.py', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'tests/unit/test_cli.py', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'tests/unit/test_train.py', LF will be replaced by CRLF the next time Git touches it

 succeeded in 1000ms:
usage: violawake-download-corpus [-h] [--target-dir TARGET_DIR] [--force]
                                 [--quiet]

Download the shared speech-negative corpus used by ViolaWake training.
Currently installs LibriSpeech dev-clean.

options:
  -h, --help            show this help message and exit
  --target-dir TARGET_DIR
                        Corpus install directory (default:
                        ~/.violawake/corpus).
  --force               Replace an existing LibriSpeech dev-clean download.
  --quiet               Suppress progress output.

 succeeded in 3676ms:
============================= test session starts =============================
platform win32 -- Python 3.11.9, pytest-8.4.1, pluggy-1.6.0
PySide6 6.11.0 -- Qt runtime 6.11.0 -- Qt compiled 6.11.0
rootdir: J:\CLAUDE\PROJECTS\Wakeword
configfile: pyproject.toml
plugins: anyio-4.12.0, hypothesis-6.148.7, asyncio-1.3.0, cov-7.0.0, qt-4.5.0, timeout-2.4.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 3 items

tests\unit\test_download_corpus.py ..                                    [ 66%]
tests\unit\test_train.py .                                               [100%]

============================== 3 passed in 1.56s ==============================
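
Note on the retry test that just passed: the sketch below shows the backoff schedule it exercises, assuming the constants from the train.py diff in this change (3 attempts, 0.75 s base, 4.0 s cap). `backoff_delay` is a hypothetical helper name used only for illustration; train.py computes the delay inline. The jitter source is optional here so the deterministic part can be checked on its own.

```python
from __future__ import annotations

from random import Random

# Values mirror the constants added to train.py in this change.
MAX_ATTEMPTS = 3
RETRY_BASE_SECONDS = 0.75
RETRY_MAX_SECONDS = 4.0


def backoff_delay(attempt: int, rng: Random | None = None) -> float:
    """Return the sleep after a failed 1-based attempt.

    Hypothetical helper: capped exponential backoff, plus uniform jitter
    in [0, RETRY_BASE_SECONDS] when an rng is supplied.
    """
    delay = min(RETRY_MAX_SECONDS, RETRY_BASE_SECONDS * (2 ** (attempt - 1)))
    if rng is not None:
        # Jitter is added after the cap, matching the order used in the diff.
        delay += rng.uniform(0.0, RETRY_BASE_SECONDS)
    return delay


if __name__ == "__main__":
    # Only attempts 1..MAX_ATTEMPTS-1 are followed by a sleep;
    # the final failed attempt returns an error instead of sleeping.
    for attempt in range(1, MAX_ATTEMPTS):
        print(attempt, backoff_delay(attempt))
```

Because the jitter is applied after the cap, a single sleep can reach 4.75 s rather than 4.0 s. With three attempts the worst case is two sleeps, so the retry loop adds a few seconds per failed phrase at most.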

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:
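
The retry test above exercises one failed attempt followed by a success. As a rough illustration of the delay schedule implied by the `_EDGE_TTS_*` constants in the `train.py` change (3 attempts, 0.75 s base, 4.0 s cap, jitter of up to one base interval), here is a minimal standalone sketch. The helper name `backoff_delays` is hypothetical and does not exist in the SDK; it only mirrors the arithmetic shown in the diff.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_attempts: int = 3,
    base: float = 0.75,
    cap: float = 4.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the sleep before each retry; the final attempt has no sleep after it."""
    rng = rng or random.Random()
    delays: List[float] = []
    for attempt in range(1, max(1, max_attempts)):
        # Exponential growth from the base delay, capped before jitter is added.
        delay = min(cap, base * (2 ** (attempt - 1)))
        # Jitter is added after the cap, so a delay can exceed the cap by up to `base`.
        delay += rng.uniform(0.0, base)
        delays.append(delay)
    return delays


if __name__ == "__main__":
    for i, delay in enumerate(backoff_delays(), start=1):
        print(f"after failed attempt {i}: sleep {delay:.2f}s")
```

With the default constants this yields two sleeps, so a synthesis call that fails every attempt should give up after a few seconds of waiting, not counting the time spent in the attempts themselves.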

codex
The downloader help and its mocked archive smoke test pass without touching the network. I’m moving into the requested console test command now, then I’ll inspect the final diff and commit only the files from this task.
  ✓ Inspect repo scripts, training code, docs, and tests
  ✓ Patch training corpus behavior and Edge-TTS retry handling
  ✓ Add download-corpus CLI and mocked smoke test
  ✓ Update deployment docs and compose comment
  → Run targeted validation and requested console tests
  • Stage explicit files and create logical commits
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives are loaded from LibriSpeech/MUSAN files on disk, not generated with Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:
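
The `_safe_extract_tar` helper in the `download_corpus.py` diff rejects archive members whose resolved path falls outside the extraction directory. The sketch below isolates that containment check so its behavior on a few member names is easy to see. `escaping_members` is a hypothetical helper added for illustration, and the member names are made up.

```python
from pathlib import Path
from typing import Iterable, List


def is_within_directory(directory: Path, target: Path) -> bool:
    """True if `target` resolves to `directory` itself or to a path beneath it."""
    directory = directory.resolve()
    target = target.resolve()
    return target == directory or directory in target.parents


def escaping_members(names: Iterable[str], destination: Path) -> List[str]:
    """Return the member names that would land outside `destination` if extracted."""
    return [
        name
        for name in names
        if not is_within_directory(destination, destination / name)
    ]


if __name__ == "__main__":
    members = ["LibriSpeech/dev-clean/84/a.flac", "../evil.txt", "/etc/passwd"]
    print(escaping_members(members, Path.cwd() / "extract"))
```

On Python versions that support tarfile extraction filters, passing `filter="data"` to `extractall` is an additional safeguard that could sit alongside a manual check like this one.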

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'bash -lc "cd console && python -m pytest tests/ --no-cov --timeout=30 --ignore=tests/e2e -q 2>&1 | tail -3"' in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 38268ms:
tests\test_teams.py .........................                            [100%]

======================= 147 passed, 2 skipped in 34.77s =======================
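
The `training_service.py` change searches for the corpus in a fixed order: `/app/corpus`, the repo-root `corpus/`, `~/.violawake/corpus`, then a CWD-relative `corpus`. The loop that consumes `_CORPUS_SEARCH_PATHS` is not shown in the diff, so the following is only a sketch of one plausible first-match lookup. The function name `find_corpus_root` and the `AUDIO_SUFFIXES` set are assumptions, not the service's actual code.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional

# Assumption: file extensions treated as usable corpus audio.
AUDIO_SUFFIXES = {".flac", ".wav"}


def find_corpus_root(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate directory containing at least one audio file."""
    for candidate in candidates:
        if not candidate.is_dir():
            continue
        has_audio = any(
            path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
            for path in candidate.rglob("*")
        )
        if has_audio:
            return candidate
    return None


if __name__ == "__main__":
    search_order = [
        Path("/app/corpus"),
        Path.home() / ".violawake" / "corpus",
        Path("corpus"),
    ]
    print(find_corpus_root(search_order))
```

Returning `None` when nothing matches leaves the caller free to raise the fail-fast error, which keeps the lookup itself free of policy.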

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
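
The corpus lookup and the speech-negative gate in the hunks above can be sketched in isolation. This is a minimal illustration, not the service's code: `first_existing_dir` and `require_speech_negatives` are hypothetical names, and the first-match-wins lookup is an assumption, since the loop that consumes `_CORPUS_SEARCH_PATHS` is not shown in the diff. The threshold of 5 mirrors the diff.

```python
from __future__ import annotations

from pathlib import Path

MIN_SPEECH_NEGATIVES = 5  # same threshold as the check in the diff


def first_existing_dir(candidates: list[Path]) -> Path | None:
    """Return the first candidate that is a directory (assumed lookup order)."""
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None


def require_speech_negatives(
    neg_tag_map: dict[str, list[Path]],
    speech_tags: set[str],
) -> int:
    """Count files under speech tags and fail fast when too few are present."""
    total = sum(
        len(files) for tag, files in neg_tag_map.items() if tag in speech_tags
    )
    if total < MIN_SPEECH_NEGATIVES:
        raise RuntimeError(
            f"Only {total} speech negatives available; mount the corpus first."
        )
    return total
```

Counting only the tagged sources means a run with plenty of confusables but no corpus files still fails, which is the behaviour the new `total_speech_neg` check is after.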
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
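
The path check behind `_safe_extract_tar` can be exercised without a real archive. The sketch below repeats the containment test from the new file and applies it to plain member names. `unsafe_member_names` is a hypothetical helper added only for illustration, and the sample names are made up.

```python
from pathlib import Path


def is_within_directory(directory: Path, target: Path) -> bool:
    """True when target resolves to directory itself or to a path inside it."""
    directory = directory.resolve()
    target = target.resolve()
    return directory == target or directory in target.parents


def unsafe_member_names(destination: Path, names: list[str]) -> list[str]:
    """Return the archive member names that would land outside destination."""
    return [
        name
        for name in names
        if not is_within_directory(destination, destination / name)
    ]


# Illustrative member names: one normal entry and one traversal attempt.
flagged = unsafe_member_names(
    Path("extract"),
    ["LibriSpeech/dev-clean/84/84-0000.flac", "../evil.txt"],
)
```

Resolving both sides before comparing is what catches `..` segments, because `Path.resolve()` normalises them even when the path does not exist yet.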
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
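
The retry loop added to `_edge_tts_synthesize` above sleeps for a capped exponential delay plus jitter. A minimal sketch of that schedule follows, using the constants from the diff (0.75 s base, 4.0 s cap). `retry_delay` is a hypothetical name, and the seeded `Random` is only there to make the example repeatable.

```python
from random import Random

BASE_SECONDS = 0.75  # mirrors _EDGE_TTS_RETRY_BASE_SECONDS in the diff
MAX_SECONDS = 4.0    # mirrors _EDGE_TTS_RETRY_MAX_SECONDS in the diff


def retry_delay(attempt: int, rng: Random) -> float:
    """Return the sleep, in seconds, after a failed 1-based attempt."""
    capped = min(MAX_SECONDS, BASE_SECONDS * (2 ** (attempt - 1)))
    # Jitter is added after the cap, so the result can exceed MAX_SECONDS
    # by up to BASE_SECONDS.
    return capped + rng.uniform(0.0, BASE_SECONDS)


rng = Random(0)
delays = [retry_delay(attempt, rng) for attempt in (1, 2, 3, 4)]
```

With `_EDGE_TTS_MAX_ATTEMPTS = 3` the loop sleeps at most twice per synthesis call, so only the first two values in `delays` apply to the code in the diff. The later ones show where the cap takes effect.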
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "python -m pytest tests\\unit\\test_cli.py::TestDownloadCorpusCLI tests\\integration\\test_feature_completeness.py::TestCLIEntryPoints::test_entry_point_resolves tests\\integration\\test_sdk_surface.py::test_cli_entrypoints_are_importable_and_expose_main -q --no-cov" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 18602ms:
============================= test session starts =============================
platform win32 -- Python 3.11.9, pytest-8.4.1, pluggy-1.6.0
PySide6 6.11.0 -- Qt runtime 6.11.0 -- Qt compiled 6.11.0
rootdir: J:\CLAUDE\PROJECTS\Wakeword
configfile: pyproject.toml
plugins: anyio-4.12.0, hypothesis-6.148.7, asyncio-1.3.0, cov-7.0.0, qt-4.5.0, timeout-2.4.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 16 items

tests\unit\test_cli.py .                                                 [  6%]
tests\integration\test_feature_completeness.py ..........                [ 68%]
tests\integration\test_sdk_surface.py .....                              [100%]

============================= 16 passed in 16.08s =============================

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

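Note on the retry hunk in `src/violawake_sdk/tools/train.py` above: the delay slept between edge-tts attempts is a capped exponential term plus jitter. The following is a minimal sketch of that arithmetic, assuming the constants shown in the diff (3 attempts, 0.75 s base, 4.0 s cap). The helper name `backoff_delay` and its `jitter` argument are hypothetical and exist only for illustration; the diff computes the delay inline and draws the jitter from `_EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)`.

```python
"""Illustrative sketch of the edge-tts retry delay; not the project's code."""

MAX_ATTEMPTS = 3           # mirrors _EDGE_TTS_MAX_ATTEMPTS in the diff
RETRY_BASE_SECONDS = 0.75  # mirrors _EDGE_TTS_RETRY_BASE_SECONDS
RETRY_MAX_SECONDS = 4.0    # mirrors _EDGE_TTS_RETRY_MAX_SECONDS


def backoff_delay(attempt: int, jitter: float = 0.0) -> float:
    """Return the delay slept after a failed 1-based `attempt`.

    Hypothetical helper: the exponential term is capped first, then the
    jitter (0.0 to RETRY_BASE_SECONDS in the diff) is added on top, so the
    total can exceed the cap by at most RETRY_BASE_SECONDS.
    """
    capped = min(RETRY_MAX_SECONDS, RETRY_BASE_SECONDS * (2 ** (attempt - 1)))
    return capped + jitter


# Only attempts 1..MAX_ATTEMPTS-1 sleep; the last failure returns immediately.
schedule = [backoff_delay(a) for a in range(1, MAX_ATTEMPTS)]

# Upper bound on total sleep per synthesis call, with maximum jitter each time.
worst_case_sleep = sum(
    backoff_delay(a, jitter=RETRY_BASE_SECONDS) for a in range(1, MAX_ATTEMPTS)
)

print(schedule, worst_case_sleep)
```

Under these assumptions one synthesis call sleeps at most a few seconds in total before `_edge_tts_fail` is returned. That excludes the time spent inside each attempt, which the diff bounds at 30 s only on the running-event-loop path.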
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "python -m py_compile console\\backend\\app\\services\\training_service.py src\\violawake_sdk\\tools\\train.py src\\violawake_sdk\\tools\\download_corpus.py" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 395ms:
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives are read from LibriSpeech/MUSAN files on disk, not synthesized with Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:
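
Note on the retry timing in the `_edge_tts_synthesize` hunk above: the delay doubles per attempt, is capped, and then has jitter added after the cap. The sketch below recomputes the bounds under that reading of the diff. The constants mirror the `_EDGE_TTS_*` values shown; `retry_delay` and `delay_bounds` are hypothetical helper names used only for illustration, not functions in `train.py`.

```python
import random

# Values copied from the diff; the helper names below are illustrative only.
BASE_SECONDS = 0.75  # _EDGE_TTS_RETRY_BASE_SECONDS
MAX_SECONDS = 4.0    # _EDGE_TTS_RETRY_MAX_SECONDS


def delay_bounds(attempt: int) -> tuple[float, float]:
    """Return (min, max) sleep after failed attempt number `attempt` (1-based)."""
    capped = min(MAX_SECONDS, BASE_SECONDS * (2 ** (attempt - 1)))
    # Jitter is added after the cap, so the upper bound can exceed MAX_SECONDS.
    return capped, capped + BASE_SECONDS


def retry_delay(attempt: int, rng: random.Random) -> float:
    """Compute one concrete delay the same way the retry loop in the diff does."""
    capped = min(MAX_SECONDS, BASE_SECONDS * (2 ** (attempt - 1)))
    return capped + rng.uniform(0.0, BASE_SECONDS)


if __name__ == "__main__":
    rng = random.Random(0)
    for attempt in (1, 2):
        low, high = delay_bounds(attempt)
        print(f"after attempt {attempt}: {low:.2f}s to {high:.2f}s "
              f"(sample {retry_delay(attempt, rng):.2f}s)")
```

With `_EDGE_TTS_MAX_ATTEMPTS = 3` the loop sleeps only after attempts 1 and 2, so a fully failing call adds at most 3.75 s of sleep on top of the three synthesis attempts themselves.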

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'git status --short' in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 266ms:
 M console/backend/app/services/training_service.py
 M console/backend/requirements.txt
 M docker-compose.production.yml
 M docs/DEPLOYMENT.md
 M pyproject.toml
 M src/violawake_sdk/tools/train.py
 M tests/integration/test_feature_completeness.py
 M tests/integration/test_sdk_surface.py
 M tests/unit/test_cli.py
 M tests/unit/test_train.py
?? .codex_log_a.txt
?? .codex_log_alpha.txt
?? .codex_log_b.txt
?? .codex_log_c.txt
?? .codex_log_corpus.txt
?? .codex_log_delta.txt
?? .codex_prompt_a.txt
?? .codex_prompt_alpha.txt
?? .codex_prompt_b.txt
?? .codex_prompt_c.txt
?? .codex_prompt_corpus.txt
?? .codex_prompt_delta.txt
?? src/violawake_sdk/tools/download_corpus.py
?? tests/unit/test_download_corpus.py
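
Note on the untracked `src/violawake_sdk/tools/download_corpus.py` listed above: it checks every tar member against the extraction directory before extracting. The sketch below isolates that containment check as it appears in the diff. The function name drops the leading underscore and the example paths are made up for illustration; nothing here touches the filesystem beyond path resolution.

```python
from pathlib import Path


def is_within_directory(directory: Path, target: Path) -> bool:
    """True when `target` resolves to `directory` itself or a path beneath it."""
    directory = directory.resolve()
    target = target.resolve()
    return directory == target or directory in target.parents


if __name__ == "__main__":
    base = Path("extract_dir")  # hypothetical extraction directory
    candidates = [
        "LibriSpeech/dev-clean/84/121123/84-121123-0000.flac",
        "../outside.txt",
    ]
    for name in candidates:
        verdict = "ok" if is_within_directory(base, base / name) else "rejected"
        print(f"{name}: {verdict}")
```

Resolving both paths first is what makes `..` segments and absolute member names fail the check: the comparison is made on the normalized paths, not on the raw member name.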

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

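Note on the retry test above: it expects one failed attempt, exactly one sleep, then a successful second attempt. A minimal standalone sketch of that pattern follows, assuming the same exponential-backoff-plus-jitter schedule as the `_EDGE_TTS_*` constants in the `train.py` change (3 attempts, 0.75 s base, 4.0 s cap). `backoff_delay` and `retry_with_backoff` are hypothetical names used only for illustration, and unlike `_edge_tts_synthesize` this sketch re-raises the last error instead of returning a failure value.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float, cap: float, rng: random.Random) -> float:
    """Delay before the next try: capped exponential term plus uniform jitter."""
    # attempt is 1-based, so the exponential term is base, 2*base, 4*base, ...
    delay = min(cap, base * (2 ** (attempt - 1)))
    return delay + rng.uniform(0.0, base)


def retry_with_backoff(
    fn: Callable[[], T],
    *,
    max_attempts: int = 3,
    base: float = 0.75,
    cap: float = 4.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call fn until it succeeds or max_attempts is reached (hypothetical helper)."""
    rng = rng or random.Random()
    max_attempts = max(1, max_attempts)
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(backoff_delay(attempt, base, cap, rng))
    raise AssertionError("unreachable")
```

Passing `sleep` and `rng` as parameters keeps the helper testable without patching `time.sleep`, which is the same concern the unit test handles with `patch(...)`.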
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'git diff --stat' in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 1733ms:
 console/backend/app/services/training_service.py | 43 +++++++----------
 console/backend/requirements.txt                 |  1 +
 docker-compose.production.yml                    |  7 ++-
 docs/DEPLOYMENT.md                               | 22 +++++++++
 pyproject.toml                                   |  1 +
 src/violawake_sdk/tools/train.py                 | 61 +++++++++++++++++++-----
 tests/integration/test_feature_completeness.py   |  1 +
 tests/integration/test_sdk_surface.py            |  1 +
 tests/unit/test_cli.py                           | 14 ++++++
 tests/unit/test_train.py                         | 39 +++++++++++++++
 10 files changed, 148 insertions(+), 42 deletions(-)
warning: in the working copy of 'console/backend/app/services/training_service.py', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'console/backend/requirements.txt', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'docker-compose.production.yml', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'docs/DEPLOYMENT.md', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'pyproject.toml', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'src/violawake_sdk/tools/train.py', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'tests/integration/test_feature_completeness.py', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'tests/integration/test_sdk_surface.py', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'tests/unit/test_cli.py', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'tests/unit/test_train.py', LF will be replaced by CRLF the next time Git touches it

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
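Note on `_safe_extract_tar` above: the traversal guard can be exercised in isolation. The sketch below is a simplified, standalone variant that only validates member paths. `is_within`, `check_members`, and `make_archive` are illustrative names, not SDK functions, and the archive is built in memory so nothing is written to disk.

```python
import io
import tarfile
from pathlib import Path
from typing import List


def is_within(directory: Path, target: Path) -> bool:
    """True if target resolves to directory itself or to something beneath it."""
    directory = directory.resolve()
    target = target.resolve()
    return directory == target or directory in target.parents


def check_members(tar: tarfile.TarFile, destination: Path) -> None:
    """Reject links and any member whose path would escape destination."""
    for member in tar.getmembers():
        if member.issym() or member.islnk():
            raise RuntimeError(f"link entry rejected: {member.name}")
        if not is_within(destination, destination / member.name):
            raise RuntimeError(f"unsafe path rejected: {member.name}")


def make_archive(names: List[str]) -> bytes:
    """Build a small gzip-compressed tar in memory with one-byte files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            data = b"x"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Usage: a normal LibriSpeech-style path passes, a "../" entry is refused.
dest = Path("extract_here")
with tarfile.open(fileobj=io.BytesIO(make_archive(["LibriSpeech/dev-clean/a.flac"])), mode="r:gz") as tar:
    check_members(tar, dest)
```

On Python versions that support tarfile extraction filters (3.12 and the security backports), passing `filter="data"` to `extractall` would add a second layer of protection on top of this manual check.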
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

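Note on the `training_service.py` change in the diff above: the corpus handling comes down to an ordered list of search paths and a fail-fast count over corpus-backed negative tags. A minimal sketch under stated assumptions: `first_existing_corpus` and `require_speech_negatives` are hypothetical helpers, the "first existing directory wins" rule is an assumption (the diff shows the search list but not how it is consumed), and file discovery and tag assignment are omitted.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional

# Tag names and threshold copied from the diff; confusables are deliberately
# excluded, so TTS output alone cannot satisfy the gate.
SPEECH_NEG_TAGS = {
    "neg_user",
    "neg_librispeech",
    "neg_musan_speech",
    "neg_musan_music",
    "neg_musan_noise",
}
MIN_SPEECH_NEGATIVES = 5


def first_existing_corpus(search_paths: Iterable[Path]) -> Optional[Path]:
    """Return the first search path that is a directory, or None (assumed rule)."""
    for path in search_paths:
        if path.is_dir():
            return path
    return None


def require_speech_negatives(neg_tag_map: Dict[str, List[Path]]) -> int:
    """Count negatives under corpus-backed tags and fail fast if too few."""
    total = sum(
        len(files) for tag, files in neg_tag_map.items() if tag in SPEECH_NEG_TAGS
    )
    if total < MIN_SPEECH_NEGATIVES:
        raise RuntimeError(
            f"Only {total} speech negatives found; mount or download the corpus."
        )
    return total
```

Counting only the corpus-backed tags is what turns a missing corpus into an immediate error rather than a model trained on confusables alone.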
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "git diff -- console\\backend\\requirements.txt" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 279ms:
diff --git a/console/backend/requirements.txt b/console/backend/requirements.txt
index 14572f9..d2ce6c6 100644
--- a/console/backend/requirements.txt
+++ b/console/backend/requirements.txt
@@ -23,3 +23,4 @@ sentry-sdk[fastapi]>=2.0
 tomli>=2.0; python_version < "3.11"
 boto3>=1.34
 fast_mp3_augment
+onnxscript
warning: in the working copy of 'console/backend/requirements.txt', LF will be replaced by CRLF the next time Git touches it

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

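The retry schedule added to `_edge_tts_synthesize` can be reasoned about in isolation. A minimal sketch, assuming the constants from the diff above (3 attempts, 0.75 s base, 4.0 s cap); `backoff_delay` is a hypothetical helper for illustration, not a function in train.py.

```python
from random import Random

BASE_SECONDS = 0.75
MAX_SECONDS = 4.0
MAX_ATTEMPTS = 3


def backoff_delay(attempt: int, rng: Random) -> float:
    """Sleep time after failed attempt number `attempt` (1-based)."""
    # Exponential growth, capped before jitter is added (as in the diff).
    capped = min(MAX_SECONDS, BASE_SECONDS * (2 ** (attempt - 1)))
    # Jitter of up to one base interval spreads out concurrent retries.
    return capped + rng.uniform(0.0, BASE_SECONDS)


if __name__ == "__main__":
    rng = Random(0)
    # Only MAX_ATTEMPTS - 1 sleeps happen; the last failure returns immediately.
    for attempt in range(1, MAX_ATTEMPTS):
        print(f"after attempt {attempt}: sleep {backoff_delay(attempt, rng):.2f}s")
```

With three attempts there are two sleeps, so the added wait per synthesis call falls between 2.25 s and 3.75 s. Because jitter is added after the cap, a single delay can exceed the 4.0 s cap by up to 0.75 s if the attempt count is ever raised.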
codex
Validation is clean so far: the requested console run ended at `147 passed, 2 skipped`, and the new downloader/entry-point tests passed. I’m committing in four explicit groups now: backend corpus behavior, Edge-TTS retry/deprecation, downloader CLI, and deployment docs.
  ✓ Inspect repo scripts, training code, docs, and tests
  ✓ Patch training corpus behavior and Edge-TTS retry handling
  ✓ Add download-corpus CLI and mocked smoke test
  ✓ Update deployment docs and compose comment
  ✓ Run targeted validation and requested console tests
  → Stage explicit files and create logical commits
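
The downloader CLI rejects archive members that are links or that would resolve outside the extraction directory. A minimal sketch of that check against an in-memory archive follows. `unsafe_members` and `build_demo_archive` are hypothetical helpers, and the sketch only reports offending names rather than extracting anything.

```python
import io
import tarfile
from pathlib import Path
from typing import List


def unsafe_members(archive: tarfile.TarFile, destination: Path) -> List[str]:
    """Return names of members that are links or would land outside destination."""
    root = destination.resolve()
    bad: List[str] = []
    for member in archive.getmembers():
        target = (root / member.name).resolve()
        escapes = target != root and root not in target.parents
        if member.issym() or member.islnk() or escapes:
            bad.append(member.name)
    return bad


def build_demo_archive() -> bytes:
    """Create a small tar.gz with one safe entry and one path-traversal entry."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name in ("LibriSpeech/dev-clean/a.flac", "../escape.txt"):
            payload = b"demo"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()
```

Validating every member before extracting any of them, as the downloader does, avoids leaving a partially extracted tree behind when a bad entry appears late in the archive.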
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:
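
Note on the retry schedule in the `train.py` hunk above: the delay doubles per attempt, is capped at `_EDGE_TTS_RETRY_MAX_SECONDS`, and then has jitter added. The following is a minimal standalone sketch of that arithmetic, not the SDK code itself. The names `retry_delay` and `delay_bounds` are hypothetical and only the three constants are taken from the diff.

```python
import random

# Constants mirror the values added to train.py in the diff above.
MAX_ATTEMPTS = 3
RETRY_BASE_SECONDS = 0.75
RETRY_MAX_SECONDS = 4.0


def delay_bounds(attempt: int) -> tuple[float, float]:
    """Return (min, max) sleep time after a failed 1-based attempt (hypothetical helper)."""
    capped = min(RETRY_MAX_SECONDS, RETRY_BASE_SECONDS * (2 ** (attempt - 1)))
    # Jitter is added after the cap, so the upper bound can exceed
    # RETRY_MAX_SECONDS by up to RETRY_BASE_SECONDS.
    return capped, capped + RETRY_BASE_SECONDS


def retry_delay(attempt: int, rng: random.Random) -> float:
    """Return one jittered sleep time for the given failed attempt (hypothetical helper)."""
    low, high = delay_bounds(attempt)
    return rng.uniform(low, high)


if __name__ == "__main__":
    rng = random.Random(0)
    # The loop in the diff returns on the final failed attempt without sleeping,
    # so only attempts 1 .. MAX_ATTEMPTS - 1 are followed by a delay.
    for attempt in range(1, MAX_ATTEMPTS):
        low, high = delay_bounds(attempt)
        print(f"attempt {attempt}: sleep in [{low:.2f}, {high:.2f}] s, "
              f"sample {retry_delay(attempt, rng):.2f} s")
```

With three attempts, a fully failing call sleeps at most about 1.5 s + 2.25 s before `_edge_tts_fail` is returned, not counting the 30 s thread timeout per attempt.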

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'git add console/backend/app/services/training_service.py; git commit -m "Require corpus speech negatives in backend training"' in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 996ms:
[master b45e03c] Require corpus speech negatives in backend training
 1 file changed, 18 insertions(+), 25 deletions(-)
warning: in the working copy of 'console/backend/app/services/training_service.py', LF will be replaced by CRLF the next time Git touches it
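
Note on the tar safety check in `download_corpus.py`: each archive member is joined onto the extraction directory, resolved, and rejected if the result is not the directory itself or a descendant of it. The sketch below reproduces that containment test in isolation so it can be tried on sample member names. `check_member_names` is a hypothetical helper added for illustration and is not part of the diff.

```python
import tempfile
from pathlib import Path


def is_within_directory(directory: Path, target: Path) -> bool:
    """Same containment test as in the diff: compare fully resolved paths."""
    directory = directory.resolve()
    target = target.resolve()
    return directory == target or directory in target.parents


def check_member_names(destination: Path, names: list[str]) -> list[str]:
    """Return member names that would land outside destination (hypothetical helper)."""
    return [
        name
        for name in names
        if not is_within_directory(destination, destination / name)
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dest = Path(tmp) / "extract"
        dest.mkdir()
        members = [
            "LibriSpeech/dev-clean/84/121123/84-121123-0000.flac",
            "../evil.txt",
            "a/../../evil.txt",
        ]
        # Prints only the names that escape the extraction directory.
        print(check_member_names(dest, members))
```

The real function additionally refuses symlink and hardlink members before this path check, which the sketch leaves out.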

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'git add src/violawake_sdk/tools/train.py tests/unit/test_train.py; git commit -m "Add Edge TTS retry backoff"' in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 909ms:
[master 854acaf] Add Edge TTS retry backoff
 2 files changed, 87 insertions(+), 13 deletions(-)
warning: in the working copy of 'src/violawake_sdk/tools/train.py', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'tests/unit/test_train.py', LF will be replaced by CRLF the next time Git touches it
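
Note on the retry schedule committed above: a minimal sketch of the delay arithmetic, assuming the constants from the train.py hunk (`_EDGE_TTS_RETRY_BASE_SECONDS = 0.75`, `_EDGE_TTS_RETRY_MAX_SECONDS = 4.0`). The names `retry_delay` and `delay_bounds` are hypothetical helpers for illustration only and are not part of the SDK.

```python
from random import Random

# Values copied from the train.py hunk; the helper names below are illustrative.
BASE_SECONDS = 0.75
MAX_SECONDS = 4.0


def delay_bounds(attempt: int) -> tuple[float, float]:
    """Return (min, max) sleep after a failed 1-based attempt."""
    capped = min(MAX_SECONDS, BASE_SECONDS * (2 ** (attempt - 1)))
    # Jitter is added after the cap, so the upper bound can exceed
    # MAX_SECONDS by up to BASE_SECONDS.
    return capped, capped + BASE_SECONDS


def retry_delay(attempt: int, rng: Random) -> float:
    """Exponential backoff with uniform jitter, as computed in the diff."""
    capped, _ = delay_bounds(attempt)
    return capped + rng.uniform(0.0, BASE_SECONDS)
```

With `_EDGE_TTS_MAX_ATTEMPTS = 3`, only attempts 1 and 2 are followed by a sleep, so a synthesis that fails every attempt sleeps between 2.25 s and 3.75 s in total, in addition to the time spent in the attempts themselves.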

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'git add src/violawake_sdk/tools/download_corpus.py pyproject.toml tests/unit/test_download_corpus.py tests/unit/test_cli.py tests/integration/test_feature_completeness.py tests/integration/test_sdk_surface.py; git commit -m "Add corpus downloader CLI"' in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 1045ms:
[master 49ffd9a] Add corpus downloader CLI
 6 files changed, 218 insertions(+)
 create mode 100644 src/violawake_sdk/tools/download_corpus.py
 create mode 100644 tests/unit/test_download_corpus.py
warning: in the working copy of 'pyproject.toml', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'tests/integration/test_feature_completeness.py', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'tests/integration/test_sdk_surface.py', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'tests/unit/test_cli.py', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'src/violawake_sdk/tools/download_corpus.py', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'tests/unit/test_download_corpus.py', LF will be replaced by CRLF the next time Git touches it
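
note: the committed downloader rejects link members and path-traversal entries before calling `extractall`. The sketch below illustrates that check on a tiny in-memory archive, assuming only the standard library. The names `is_within`, `unsafe_members`, `build_demo_archive`, and `demo` are hypothetical and are not part of `download_corpus.py`.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def is_within(directory: Path, target: Path) -> bool:
    """Return True when target resolves to directory or to a path beneath it."""
    directory = directory.resolve()
    target = target.resolve()
    return directory == target or directory in target.parents


def unsafe_members(archive_path: Path, destination: Path) -> list[str]:
    """List member names that are links or would land outside destination."""
    rejected: list[str] = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.issym() or member.islnk():
                rejected.append(member.name)
            elif not is_within(destination, destination / member.name):
                rejected.append(member.name)
    return rejected


def build_demo_archive(archive_path: Path) -> None:
    """Write a small archive with one safe entry and one traversal entry."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in ("LibriSpeech/dev-clean/ok.flac", "../escape.txt"):
            payload = b"demo"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


def demo() -> list[str]:
    """Build the demo archive in a temp directory and report rejected names."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        archive = root / "demo.tar.gz"
        build_demo_archive(archive)
        destination = root / "extract"
        destination.mkdir()
        return unsafe_members(archive, destination)
```

Collecting the rejected names instead of raising on the first one is a choice made for this illustration only. The committed `_safe_extract_tar` raises `RuntimeError` on the first unsafe member.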

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'git add docker-compose.production.yml docs/DEPLOYMENT.md; git commit -m "Document required training corpus mount"' in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 738ms:
[master 6de96f5] Document required training corpus mount
 2 files changed, 25 insertions(+), 4 deletions(-)
warning: in the working copy of 'docker-compose.production.yml', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'docs/DEPLOYMENT.md', LF will be replaced by CRLF the next time Git touches it
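
note: the training service diff searches `/app/corpus`, the repo-root `corpus/`, `~/.violawake/corpus`, and a CWD-relative `corpus`, and fails when fewer than 5 speech negatives are found. The sketch below shows one way a "first usable root, else fail fast" lookup could work, assuming a root is usable once it holds a minimum number of audio files. `AUDIO_SUFFIXES`, `count_audio`, and `resolve_corpus` are hypothetical names, not functions in `training_service.py`.

```python
import tempfile
from pathlib import Path

# Assumption for this sketch: only these suffixes count as corpus audio.
AUDIO_SUFFIXES = {".flac", ".wav"}


def count_audio(root: Path) -> int:
    """Count audio files under root; a missing directory counts as zero."""
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
    )


def resolve_corpus(candidates: list[Path], minimum: int = 5) -> Path:
    """Return the first candidate holding at least `minimum` audio files."""
    for candidate in candidates:
        if count_audio(candidate) >= minimum:
            return candidate
    searched = ", ".join(str(candidate) for candidate in candidates)
    raise RuntimeError(
        f"No usable corpus (need at least {minimum} audio files). Searched: {searched}"
    )
```

Listing the searched paths in the error tells an operator which mount is missing without reading the service source.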

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

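For reference, the retry delay introduced in the `_edge_tts_synthesize` hunk can be worked out by hand. The sketch below copies the three constants from the diff. The helper names `retry_delay` and `delay_bounds` are hypothetical and exist only for illustration. With three attempts, only attempts 1 and 2 are followed by a sleep, so the worst-case added wait is about 1.5 s + 2.25 s.

```python
from __future__ import annotations

import random

# Values copied from the train.py hunk in the diff.
MAX_ATTEMPTS = 3
RETRY_BASE_SECONDS = 0.75
RETRY_MAX_SECONDS = 4.0


def delay_bounds(attempt: int) -> tuple[float, float]:
    """Return (min, max) sleep after a failed 1-based attempt (hypothetical helper)."""
    # Exponential term, capped before jitter is added, as in the diff.
    capped = min(RETRY_MAX_SECONDS, RETRY_BASE_SECONDS * (2 ** (attempt - 1)))
    return capped, capped + RETRY_BASE_SECONDS


def retry_delay(attempt: int, rng: random.Random) -> float:
    """Return one sampled sleep: capped exponential backoff plus uniform jitter."""
    low, _high = delay_bounds(attempt)
    return low + rng.uniform(0.0, RETRY_BASE_SECONDS)


if __name__ == "__main__":
    rng = random.Random(0)
    # The final attempt returns a failure instead of sleeping, so stop one early.
    for attempt in range(1, MAX_ATTEMPTS):
        low, high = delay_bounds(attempt)
        print(f"attempt {attempt}: sleep in [{low:.2f}, {high:.2f}] s, "
              f"sampled {retry_delay(attempt, rng):.2f} s")
```

Because the jitter is added after the cap, a single sleep can exceed `_EDGE_TTS_RETRY_MAX_SECONDS` by up to one base interval. That only matters if `_EDGE_TTS_MAX_ATTEMPTS` is raised above 3.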
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'git status --short' in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 223ms:
 M console/backend/requirements.txt
?? .codex_log_a.txt
?? .codex_log_alpha.txt
?? .codex_log_b.txt
?? .codex_log_c.txt
?? .codex_log_corpus.txt
?? .codex_log_delta.txt
?? .codex_prompt_a.txt
?? .codex_prompt_alpha.txt
?? .codex_prompt_b.txt
?? .codex_prompt_c.txt
?? .codex_prompt_corpus.txt
?? .codex_prompt_delta.txt

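The corpus gate in the `training_service.py` diff can be sketched in isolation as follows. The tag set and the threshold of 5 come from the diff. The first-existing-directory lookup is an assumption, since the loop that consumes `_CORPUS_SEARCH_PATHS` is not shown in the hunk. The function names and the `neg_confusable` tag in the usage example are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Tag names copied from the speech_neg_tags set in the diff.
SPEECH_NEG_TAGS = {
    "neg_user",
    "neg_librispeech",
    "neg_musan_speech",
    "neg_musan_music",
    "neg_musan_noise",
}
MIN_SPEECH_NEGATIVES = 5  # threshold used by the diff


def first_existing_corpus(search_paths: list[Path]) -> Path | None:
    """Return the first search path that is a directory (assumed lookup order)."""
    for candidate in search_paths:
        if candidate.is_dir():
            return candidate
    return None


def count_speech_negatives(neg_tag_map: dict[str, list[Path]]) -> int:
    """Count files whose tag belongs to the corpus/user negative set."""
    return sum(
        len(files) for tag, files in neg_tag_map.items() if tag in SPEECH_NEG_TAGS
    )


def require_speech_negatives(neg_tag_map: dict[str, list[Path]]) -> int:
    """Fail fast when too few corpus negatives were loaded."""
    total = count_speech_negatives(neg_tag_map)
    if total < MIN_SPEECH_NEGATIVES:
        raise RuntimeError(
            f"Only {total} speech negatives available; mount the corpus at "
            "/app/corpus or run violawake-download-corpus."
        )
    return total


if __name__ == "__main__":
    # "neg_confusable" is a hypothetical tag for wake-word-specific negatives.
    tag_map = {
        "neg_confusable": [Path("c.wav")] * 20,
        "neg_librispeech": [Path("a.flac")] * 6,
    }
    print(require_speech_negatives(tag_map))
```

Counting only the corpus and user tags means a job with plenty of TTS confusables but no mounted corpus still fails, which matches the stated goal of making the corpus required.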
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

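The retry loop added to `_edge_tts_synthesize` in the `train.py` diff above sleeps for a capped exponential delay plus random jitter between attempts. The following is a minimal sketch of that delay calculation in isolation, assuming the constants shown in the diff (0.75 s base, 4.0 s cap). `retry_delay` is a hypothetical helper name used only for illustration; it is not a function in `train.py`.

```python
from __future__ import annotations

from random import Random

# Values copied from the diff; the helper below is hypothetical.
BASE_SECONDS = 0.75
MAX_SECONDS = 4.0


def retry_delay(attempt: int, rng: Random | None = None) -> float:
    """Return the sleep before the next try for a 1-based failed attempt."""
    # Exponential growth (0.75, 1.5, 3.0, ...) capped at MAX_SECONDS.
    delay = min(MAX_SECONDS, BASE_SECONDS * (2 ** (attempt - 1)))
    if rng is not None:
        # Jitter is added after the cap, so the total may exceed MAX_SECONDS.
        delay += rng.uniform(0.0, BASE_SECONDS)
    return delay
```

With three attempts, only the first two failures are followed by a sleep, so without jitter the waits would be 0.75 s and 1.5 s. Passing a seeded `Random` makes the jittered delays reproducible in tests.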
  ✓ Inspect repo scripts, training code, docs, and tests
  ✓ Patch training corpus behavior and Edge-TTS retry handling
  ✓ Add download-corpus CLI and mocked smoke test
  ✓ Update deployment docs and compose comment
  ✓ Run targeted validation and requested console tests
  ✓ Stage explicit files and create logical commits
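The download-corpus CLI rejects archive members that would resolve outside the extraction directory before it extracts anything. The sketch below isolates that containment check, following `_is_within_directory` from the diff that follows. `unsafe_members` is a hypothetical helper added here for illustration, and it works on plain member names rather than `tarfile` objects.

```python
from __future__ import annotations

from pathlib import Path


def is_within_directory(directory: Path, target: Path) -> bool:
    """Return True when target resolves to directory itself or a path beneath it."""
    directory = directory.resolve()
    target = target.resolve()
    return directory == target or directory in target.parents


def unsafe_members(destination: Path, member_names: list[str]) -> list[str]:
    """List archive member names that would land outside destination."""
    return [
        name
        for name in member_names
        if not is_within_directory(destination, destination / name)
    ]
```

Resolving both paths before comparing them is what catches `..` segments and absolute member names. Link members are a separate case, which the diff handles by refusing symlinks and hard links outright.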
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'git show --name-only --oneline --no-renames HEAD~3..HEAD' in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'git show --oneline --stat HEAD~3..HEAD' in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'git show --unified=60 b45e03c -- console/backend/app/services/training_service.py' in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "rg -n \"violawake-download-corpus|download_corpus\" pyproject.toml src\\violawake_sdk\\tools\\download_corpus.py docker-compose.production.yml docs\\DEPLOYMENT.md" in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 415ms:
6de96f5 Document required training corpus mount
docker-compose.production.yml
docs/DEPLOYMENT.md
49ffd9a Add corpus downloader CLI
pyproject.toml
src/violawake_sdk/tools/download_corpus.py
tests/integration/test_feature_completeness.py
tests/integration/test_sdk_surface.py
tests/unit/test_cli.py
tests/unit/test_download_corpus.py
854acaf Add Edge TTS retry backoff
src/violawake_sdk/tools/train.py
tests/unit/test_train.py

 succeeded in 384ms:
src\violawake_sdk\tools\download_corpus.py:3:Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
src\violawake_sdk\tools\download_corpus.py:98:        prog="violawake-download-corpus",
docs\DEPLOYMENT.md:151:violawake-download-corpus
docs\DEPLOYMENT.md:158:`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
docker-compose.production.yml:30:      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
pyproject.toml:148:violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"

 succeeded in 415ms:
6de96f5 Document required training corpus mount
 docker-compose.production.yml |  7 +++----
 docs/DEPLOYMENT.md            | 22 ++++++++++++++++++++++
 2 files changed, 25 insertions(+), 4 deletions(-)
49ffd9a Add corpus downloader CLI
 pyproject.toml                                 |   1 +
 src/violawake_sdk/tools/download_corpus.py     | 143 +++++++++++++++++++++++++
 tests/integration/test_feature_completeness.py |   1 +
 tests/integration/test_sdk_surface.py          |   1 +
 tests/unit/test_cli.py                         |  14 +++
 tests/unit/test_download_corpus.py             |  58 ++++++++++
 6 files changed, 218 insertions(+)
854acaf Add Edge TTS retry backoff
 src/violawake_sdk/tools/train.py | 61 +++++++++++++++++++++++++++++++---------
 tests/unit/test_train.py         | 39 +++++++++++++++++++++++++
 2 files changed, 87 insertions(+), 13 deletions(-)
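
Commit 854acaf adds retry backoff around Edge TTS synthesis, and its unit test expects one failed attempt, one sleep, then success. The body of `_edge_tts_synthesize` is not shown in this log, so the following is only a minimal sketch of that retry pattern. The helper name `retry_with_backoff` and the defaults `max_attempts` and `base_delay` are hypothetical and do not come from the commit.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    *,
    max_attempts: int = 3,    # hypothetical default, not taken from the commit
    base_delay: float = 1.0,  # hypothetical default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying failures with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter lets a test substitute a recorder for `time.sleep`, which has the same effect as the `patch("violawake_sdk.tools.train.time.sleep")` call in the unit test.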

 succeeded in 417ms:
commit b45e03c25edcc26e0675e591142e987ce2b69f86
Author: ViolaSDK <viola-sdk@project.local>
Date:   Thu May 7 08:00:50 2026 -0500

    Require corpus speech negatives in backend training

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740c..af849b1 100644
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -42,288 +42,281 @@ def run_training_job_sync(
     recording_identifiers: list[str],
     output_path: Path,
     epochs: int,
     timeout_seconds: int,
     progress_callback: Callable[[dict[str, Any]], None],
     is_cancelled: Callable[[], bool],
     negatives_dir: Path | None = None,
 ) -> TrainingArtifact:
     """Run the ViolaWake SDK training pipeline synchronously."""
     positives_dir: Path | None = None
     neg_temp_dir: Path | None = None
     storage = get_storage()
 
     def _ensure_not_cancelled() -> None:
         if is_cancelled():
             raise TrainingCancelledError("Training cancelled by user")
 
     try:
         _ensure_not_cancelled()
         progress_callback({
             "status": "running",
             "progress": 0.0,
             "epoch": 0,
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
             "message": "Preparing training data...",
             "error": None,
         })
 
         positives_dir = Path(tempfile.mkdtemp(prefix="violawake_train_", dir=str(settings.tmp_dir)))
         for index, recording_identifier in enumerate(recording_identifiers):
             _ensure_not_cancelled()
             if not storage.exists(recording_identifier):
                 logger.warning("Recording %s was missing for training job %s", recording_identifier, job_id)
                 continue
 
             dst = positives_dir / f"sample_{index:04d}.wav"
             dst.write_bytes(storage.download(recording_identifier))
 
         wav_count = len(list(positives_dir.glob("*.wav")))
         if wav_count < 5:
             raise RuntimeError("Only %s valid WAV files found. Need at least 5." % wav_count)
 
         pos_files = sorted(positives_dir.glob("*.wav"))
 
         progress_callback({
             "status": "running",
             "progress": 2.0,
             "epoch": 0,
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
             "message": "Loaded %s recordings. Generating TTS corpus..." % wav_count,
             "error": None,
         })
 
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
 
         neg_temp_dir = Path(tempfile.mkdtemp(prefix="violawake_neg_", dir=str(settings.tmp_dir)))
 
         # Auto-generate TTS positives when user has <100 samples (production behavior)
         # Keep track of user-provided files so augmentation targets only real recordings
         user_pos_files = list(pos_files)
         if len(pos_files) < 100:
             tts_pos_dir = neg_temp_dir / "tts_positives"
             try:
                 tts_pos_files = _generate_tts_positives(
                     wake_word,
                     tts_pos_dir,
                     verbose=False,
                 )
                 if tts_pos_files:
                     pos_files = list(pos_files) + tts_pos_files
                     logger.info(
                         "Generated %s TTS positives for job %s (total: %s)",
                         len(tts_pos_files), job_id, len(pos_files),
                     )
             except Exception as exc:
                 logger.error(
                     "TTS positive generation FAILED for job %s: %s — "
                     "model quality will be degraded without TTS diversity",
                     job_id, exc,
                 )
 
             _ensure_not_cancelled()
             progress_callback({
                 "status": "running",
                 "progress": 3.0,
                 "epoch": 0,
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
 
         # Source 1: User/paid-tier corpus negatives
         if negatives_dir and negatives_dir.exists():
             user_neg = sorted(
                 list(negatives_dir.rglob("*.wav")) + list(negatives_dir.rglob("*.flac"))
             )
             if user_neg:
                 neg_tag_map["neg_user"] = user_neg
                 logger.info("Loaded %s corpus negatives for job %s", len(user_neg), job_id)
 
         _ensure_not_cancelled()
 
         # Source 2: Auto-generated confusable negatives (phonetically similar)
         # Two rounds matching CLI production pipeline:
         #   Round 1: 30 confusables x 10 voices (broad phonetic coverage)
         #   Round 2: 16 confusables x 10 voices (tight variants for hard negatives)
         confusable_dir_r1 = neg_temp_dir / "confusables_r1"
         try:
             confusable_r1 = _generate_confusable_negatives(
                 wake_word,
                 confusable_dir_r1,
                 n_confusables=30,
                 voices_per_word=10,
                 verbose=False,
             )
             if confusable_r1:
                 neg_tag_map["neg_confusable_r1"] = confusable_r1
         except Exception as exc:
             logger.error(
                 "Confusable round 1 FAILED for job %s: %s — "
                 "model will have higher false positive rate on similar-sounding words",
                 job_id, exc,
             )
 
         _ensure_not_cancelled()
 
         confusable_dir_r2 = neg_temp_dir / "confusables_r2"
         try:
             confusable_r2 = _generate_confusable_negatives(
                 wake_word,
                 confusable_dir_r2,
                 n_confusables=16,
                 voices_per_word=10,
                 verbose=False,
             )
             if confusable_r2:
                 neg_tag_map["neg_confusable_r2"] = confusable_r2
         except Exception as exc:
             logger.error(
                 "Confusable round 2 FAILED for job %s: %s",
                 job_id, exc,
             )
 
         _ensure_not_cancelled()
         progress_callback({
             "status": "running",
             "progress": 4.0,
             "epoch": 0,
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
         _CORPUS_SUBDIRS: dict[str, tuple[str, ...]] = {
             "neg_librispeech": ("librispeech",),
             "neg_musan_speech": ("musan/musan/speech", "musan/speech"),
             "neg_musan_music": ("musan/musan/music", "musan/music"),
             "neg_musan_noise": ("musan/musan/noise", "musan/noise"),
         }
         _rng = random.Random(42)
         for tag, subdirs in _CORPUS_SUBDIRS.items():
             for corpus_root in _CORPUS_SEARCH_PATHS:
                 if not corpus_root.exists():
                     continue
                 for subdir in subdirs:
                     candidate = corpus_root / subdir
                     if candidate.exists():
                         corpus_files = sorted(
                             list(candidate.rglob("*.wav")) + list(candidate.rglob("*.flac"))
                         )
                         if corpus_files:
                             if len(corpus_files) > 2000:
                                 corpus_files = sorted(_rng.sample(corpus_files, 2000))
                             neg_tag_map[tag] = corpus_files
                             break
                 if tag in neg_tag_map:
                     break
 
         all_neg_files: list[Path] = []
         for files in neg_tag_map.values():
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
             "status": "running",
             "progress": 8.0,
             "epoch": 0,
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
             "message": "Corpus ready: %s pos, %s neg. Training TemporalCNN..." % (len(pos_files), total_neg),
             "error": None,
         })
 
         started_at = time.monotonic()
 
         def _on_epoch(info: dict[str, Any]) -> None:
             _ensure_not_cancelled()
             elapsed = time.monotonic() - started_at
             if elapsed > timeout_seconds:
                 raise RuntimeError(
                     "Training job timed out after %ss (%s minutes)"
                     % (timeout_seconds, timeout_seconds // 60)
                 )
 
             epoch = int(info.get("epoch", 0))
             total_epochs = int(info.get("total_epochs", epochs)) or epochs
             train_loss = float(info.get("train_loss", 0.0))
             val_loss = float(info.get("val_loss", 0.0))
             progress = min(10.0 + 85.0 * (epoch / total_epochs), 95.0)
 
             progress_callback({
                 "status": "running",
                 "progress": round(progress, 2),
                 "epoch": epoch,
                 "total_epochs": total_epochs,
                 "train_loss": round(train_loss, 6),
                 "val_loss": round(val_loss, 6),
                 "message": "Epoch %s/%s - loss: %.4f" % (epoch, total_epochs, train_loss),
                 "error": None,
             })
 
         _train_temporal_cnn(
             pos_files=pos_files,
             neg_files=all_neg_files,
             output_path=output_path,
             wake_word=wake_word,
             epochs=epochs,
             augment=True,
             eval_dir=None,
             verbose=True,
             progress_callback=_on_epoch,
             neg_tags=neg_tag_map,
             tmp_dir=settings.tmp_dir,
             augment_source_files=user_pos_files,
         )
 
         _ensure_not_cancelled()
         if not output_path.exists():
             raise RuntimeError("Training completed but no model file was produced")
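
The corpus lookup in the diff above takes the first search root and subdirectory that contain audio, then caps each tag at 2000 files with a seeded sample. The same idea can be sketched as a standalone function. `find_corpus_files` and its parameters are hypothetical names for illustration. Unlike the service code, which shares one `random.Random(42)` across all tags, this sketch seeds a fresh generator on every call.

```python
import random
from pathlib import Path


def find_corpus_files(
    search_roots: list[Path],
    subdirs: tuple[str, ...],
    *,
    cap: int = 2000,
    seed: int = 42,
) -> list[Path]:
    """Return audio files from the first root/subdir pair that contains any."""
    rng = random.Random(seed)
    for root in search_roots:
        for subdir in subdirs:
            candidate = root / subdir
            if not candidate.is_dir():
                continue
            files = sorted(
                list(candidate.rglob("*.wav")) + list(candidate.rglob("*.flac"))
            )
            if files:
                if len(files) > cap:
                    # Seeded sampling keeps the selection reproducible across runs.
                    files = sorted(rng.sample(files, cap))
                return files
    return []  # caller decides whether an empty result is fatal
```

A caller could total the lengths returned for each tag and raise when the sum is below the minimum, which mirrors the fail-fast `RuntimeError` in the service.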
 

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:
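
The retry delay added in the `train.py` hunk above can be reproduced in isolation to see how long a failing phrase may block the worker. This is a minimal sketch, not the SDK's code: `backoff_delay` is a hypothetical helper name, and the constants simply mirror the `_EDGE_TTS_RETRY_*` values shown in the diff.

```python
from random import Random

# Values copied from the diff; the helper below is hypothetical.
MAX_ATTEMPTS = 3
RETRY_BASE_SECONDS = 0.75
RETRY_MAX_SECONDS = 4.0


def backoff_delay(attempt: int, rng: Random) -> float:
    """Return the sleep before the next try, given a failed 1-based attempt."""
    # Exponential term, capped at RETRY_MAX_SECONDS.
    capped = min(RETRY_MAX_SECONDS, RETRY_BASE_SECONDS * (2 ** (attempt - 1)))
    # Jitter is added after the cap, so the total may exceed the cap
    # by up to one base interval.
    return capped + rng.uniform(0.0, RETRY_BASE_SECONDS)


if __name__ == "__main__":
    rng = Random(0)
    # Only attempts 1..MAX_ATTEMPTS-1 sleep; the final failure returns at once.
    for attempt in range(1, MAX_ATTEMPTS):
        print(attempt, round(backoff_delay(attempt, rng), 2))
```

With three attempts, the first sleep falls in [0.75, 1.5] seconds and the second in [1.5, 2.25] seconds, so the sleeps alone add at most 3.75 seconds per phrase before `_edge_tts_fail` is returned.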

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'git show --name-only --oneline --no-renames HEAD~4..HEAD' in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 227ms:
6de96f5 Document required training corpus mount
docker-compose.production.yml
docs/DEPLOYMENT.md
49ffd9a Add corpus downloader CLI
pyproject.toml
src/violawake_sdk/tools/download_corpus.py
tests/integration/test_feature_completeness.py
tests/integration/test_sdk_surface.py
tests/unit/test_cli.py
tests/unit/test_download_corpus.py
854acaf Add Edge TTS retry backoff
src/violawake_sdk/tools/train.py
tests/unit/test_train.py
b45e03c Require corpus speech negatives in backend training
console/backend/app/services/training_service.py
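
The corpus downloader from commit 49ffd9a rejects archive members that would land outside the extraction directory. A minimal sketch of that containment check, assuming the same resolve-and-compare approach as `_is_within_directory` in the diff; the function here is an illustrative standalone copy and the sample member names are made up.

```python
import tempfile
from pathlib import Path


def is_within_directory(directory: Path, target: Path) -> bool:
    """True if target, once resolved, is directory itself or lives beneath it."""
    directory = directory.resolve()
    target = target.resolve()
    return directory == target or directory in target.parents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical archive member names, for illustration only.
        for name in ("LibriSpeech/dev-clean/84/a.flac", "../escape.txt"):
            print(name, is_within_directory(root, root / name))
```

Resolving both paths before comparing is what collapses `..` segments, so a member such as `../escape.txt` is compared by its real location rather than by its textual prefix.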

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives are read from LibriSpeech/MUSAN files on disk, not synthesized with Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to rate-limited network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

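Note on the retry hunk in `train.py`: the delay is a capped exponential with jitter added after the cap, so a sleep can exceed `_EDGE_TTS_RETRY_MAX_SECONDS` by up to one base interval. The sketch below reproduces that arithmetic in isolation. The constants are copied from the diff; the helper name `backoff_delay` and the injected `rng` argument are hypothetical and do not appear in the patch.

```python
import random

# Values copied from the diff; the names are shortened for this sketch.
RETRY_BASE_SECONDS = 0.75
RETRY_MAX_SECONDS = 4.0


def backoff_delay(attempt: int, rng: random.Random) -> float:
    """Return the sleep before the next retry (hypothetical helper).

    `attempt` is 1-based, matching the loop in the diff.
    """
    # The exponential term doubles per attempt and is capped first.
    capped = min(RETRY_MAX_SECONDS, RETRY_BASE_SECONDS * (2 ** (attempt - 1)))
    # Jitter is added after the cap, so the result may exceed the cap.
    return capped + rng.uniform(0.0, RETRY_BASE_SECONDS)


if __name__ == "__main__":
    rng = random.Random(0)
    for attempt in (1, 2, 3):
        print(attempt, round(backoff_delay(attempt, rng), 2))
```

With `_EDGE_TTS_MAX_ATTEMPTS = 3`, only attempts 1 and 2 ever sleep, so the worst-case added wait per synthesis call under these constants is about 1.5 s + 2.25 s.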
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "rg -n \"def download_librispeech_dev_clean|def build_parser|def main|LIBRISPEECH_DEV_CLEAN_URL|DEFAULT_CORPUS_DIR\" src\\violawake_sdk\\tools\\download_corpus.py" in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "rg -n \"REQUIRED|/app/corpus|violawake-download-corpus\" docker-compose.production.yml docs\\DEPLOYMENT.md" in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'rg -n "violawake-download-corpus" pyproject.toml' in J:\CLAUDE\PROJECTS\Wakeword
exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'git status --short' in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 1214ms:
16:LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
17:DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
50:def download_librispeech_dev_clean(
51:    target_dir: Path = DEFAULT_CORPUS_DIR,
74:            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
76:        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
96:def build_parser() -> argparse.ArgumentParser:
107:        default=DEFAULT_CORPUS_DIR,
123:def main(argv: list[str] | None = None) -> None:

 succeeded in 1419ms:
docs\DEPLOYMENT.md:145:./corpus:/app/corpus:ro
docs\DEPLOYMENT.md:151:violawake-download-corpus
docs\DEPLOYMENT.md:155:~/.violawake/corpus:/app/corpus:ro
docs\DEPLOYMENT.md:158:`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
docker-compose.production.yml:28:      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
docker-compose.production.yml:30:      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
docker-compose.production.yml:31:      - ./corpus:/app/corpus:ro

 succeeded in 1450ms:
148:violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"

 succeeded in 1654ms:
 M console/backend/requirements.txt
?? .codex_log_a.txt
?? .codex_log_alpha.txt
?? .codex_log_b.txt
?? .codex_log_c.txt
?? .codex_log_corpus.txt
?? .codex_log_delta.txt
?? .codex_prompt_a.txt
?? .codex_prompt_alpha.txt
?? .codex_prompt_b.txt
?? .codex_prompt_c.txt
?? .codex_prompt_corpus.txt
?? .codex_prompt_delta.txt

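Note on the corpus locations in the search output above (`/app/corpus` in the compose mount, `~/.violawake/corpus` as `DEFAULT_CORPUS_DIR`): a minimal sketch of a first-match lookup over candidate directories. It assumes the first directory that contains any audio file wins. The helper name `find_corpus_root` and the suffix set are hypothetical; the service's actual selection loop is not shown in this log.

```python
from pathlib import Path
from typing import Iterable, Optional

# Assumption: corpus audio is stored as FLAC or WAV files.
AUDIO_SUFFIXES = {".flac", ".wav"}


def find_corpus_root(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate directory holding audio, else None."""
    for candidate in candidates:
        if not candidate.is_dir():
            continue  # skip missing mounts and non-directories
        for path in candidate.rglob("*"):
            if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES:
                return candidate
    return None


if __name__ == "__main__":
    search_paths = [
        Path("/app/corpus"),
        Path.home() / ".violawake" / "corpus",
        Path("corpus"),
    ]
    print(find_corpus_root(search_paths))
```

Returning `None` rather than raising keeps the lookup separate from the policy decision, so the caller can fail fast with its own error when no corpus is found.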
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

exec
"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command 'git log --oneline -4' in J:\CLAUDE\PROJECTS\Wakeword
 succeeded in 230ms:
6de96f5 Document required training corpus mount
49ffd9a Add corpus downloader CLI
854acaf Add Edge TTS retry backoff
b45e03c Require corpus speech negatives in backend training

diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

codex
Implemented and committed. No push.

**Commits**
- `b45e03c` Require corpus speech negatives in backend training
- `854acaf` Add Edge TTS retry backoff
- `49ffd9a` Add corpus downloader CLI
- `6de96f5` Document required training corpus mount
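
The retry behaviour added in `854acaf` can be sketched in isolation. The delay formula below (capped exponential plus uniform jitter, base 0.75s, cap 4.0s, 3 attempts) follows the constants in the diff. The helper names `backoff_delay` and `call_with_retry` are hypothetical, and the sketch re-raises on the final failure, whereas the committed code returns through `_edge_tts_fail`.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(
    attempt: int,
    base: float = 0.75,
    cap: float = 4.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a capped exponential delay plus jitter for a 1-based attempt."""
    rng = rng or random.Random()
    delay = min(cap, base * (2 ** (attempt - 1)))
    # Jitter spreads out retries from workers that failed at the same moment.
    return delay + rng.uniform(0.0, base)


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, sleeping between failed attempts; re-raise the last failure."""
    max_attempts = max(1, max_attempts)
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt >= max_attempts:
                raise
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the sketch testable without real waits, which is the same effect the unit test gets by patching `time.sleep`.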

**Proof**
- Requested console run:
  `147 passed, 2 skipped in 34.77s` from:
  `cd console && python -m pytest tests/ --no-cov --timeout=30 --ignore=tests/e2e -q`
- Downloader smoke/retry tests: `3 passed`
- CLI/import entry tests: `16 passed`
- `python -m py_compile` on changed Python files passed.
- `python -m violawake_sdk.tools.download_corpus --help` works and shows `violawake-download-corpus`, `--target-dir`, `--force`, `--quiet`.

**Key Diff**
```diff
-            _generate_speech_negatives,
...
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
...
-        # Source 3: Auto-generated speech negatives (common phrases)
-        speech_files = _generate_speech_negatives(...)
...
+        if total_speech_neg < 5 or total_neg < 5:
+            raise RuntimeError(
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
+            )
```
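
As a standalone illustration of the gate in the diff above, here is a minimal sketch that counts only corpus-backed or user-supplied tags and fails fast when too few are present. The tag set and the threshold of 5 come from the diff. The function names, the `neg_confusables` tag in the usage note, and the error wording are hypothetical.

```python
from pathlib import Path

# Tags that count toward the speech-negative requirement (taken from the diff).
SPEECH_NEG_TAGS = frozenset({
    "neg_user",
    "neg_librispeech",
    "neg_musan_speech",
    "neg_musan_music",
    "neg_musan_noise",
})


def count_speech_negatives(neg_tag_map: dict[str, list[Path]]) -> int:
    """Count files whose tag is in the corpus/user speech-negative set."""
    return sum(
        len(files) for tag, files in neg_tag_map.items() if tag in SPEECH_NEG_TAGS
    )


def require_speech_negatives(
    neg_tag_map: dict[str, list[Path]], minimum: int = 5
) -> int:
    """Raise if fewer than `minimum` speech negatives are available."""
    total = count_speech_negatives(neg_tag_map)
    if total < minimum:
        raise RuntimeError(
            f"Only {total} speech negatives available (need {minimum}). "
            "Mount the LibriSpeech + MUSAN corpus at /app/corpus."
        )
    return total
```

With this shape, a job that has plenty of TTS confusables (say, under a hypothetical `neg_confusables` tag) but no corpus files still fails, because confusables are not in the counted set.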

**Files changed**
- [training_service.py](J:/CLAUDE/PROJECTS/Wakeword/console/backend/app/services/training_service.py)
- [train.py](J:/CLAUDE/PROJECTS/Wakeword/src/violawake_sdk/tools/train.py)
- [download_corpus.py](J:/CLAUDE/PROJECTS/Wakeword/src/violawake_sdk/tools/download_corpus.py)
- [pyproject.toml](J:/CLAUDE/PROJECTS/Wakeword/pyproject.toml)
- [docker-compose.production.yml](J:/CLAUDE/PROJECTS/Wakeword/docker-compose.production.yml)
- [DEPLOYMENT.md](J:/CLAUDE/PROJECTS/Wakeword/docs/DEPLOYMENT.md)
- tests for downloader, retry, CLI/script surface

`pyproject.toml` now has:
```toml
violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
```
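
The downloader behind this entry point rejects link entries and path-traversal members before extracting. The sketch below restates that check as a standalone helper that only reports offending members. The name `unsafe_members` is hypothetical, and the committed code raises instead of returning a list.

```python
import io
import tarfile
from pathlib import Path


def unsafe_members(archive_path: Path, destination: Path) -> list[str]:
    """Return names of members that are links or would land outside destination."""
    root = destination.resolve()
    rejected: list[str] = []
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            target = (destination / member.name).resolve()
            # A safe target is the root itself or something beneath it.
            escapes = target != root and root not in target.parents
            if member.issym() or member.islnk() or escapes:
                rejected.append(member.name)
    return rejected
```

An archive is only worth extracting when this returns an empty list. Checking every member before extracting anything avoids leaving a half-extracted tree behind.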

The remaining dirty worktree entries are pre-existing and were not staged: `console/backend/requirements.txt` and the `.codex_log*` / `.codex_prompt*` files.

Takeaway: production training now requires corpus-backed speech negatives, and generic speech negatives no longer depend on rate-limited Edge-TTS.
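
For operators checking a deployment, corpus lookup can be sketched as "first candidate directory that actually contains audio". The candidate order in the usage note mirrors `_CORPUS_SEARCH_PATHS` from the diff. The selection rule, the suffix set, and the names `has_audio` and `resolve_corpus_dir` are assumptions, since the diff does not show how the service walks the list.

```python
from pathlib import Path
from typing import Iterable, Optional

# Assumed audio extensions; LibriSpeech ships FLAC and MUSAN ships WAV.
AUDIO_SUFFIXES = {".wav", ".flac"}


def has_audio(directory: Path) -> bool:
    """Return True if any file under directory has a known audio suffix."""
    return any(
        path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
        for path in directory.rglob("*")
    )


def resolve_corpus_dir(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists and contains audio, else None."""
    for candidate in candidates:
        if candidate.is_dir() and has_audio(candidate):
            return candidate
    return None
```

A call such as `resolve_corpus_dir([Path("/app/corpus"), Path.home() / ".violawake" / "corpus", Path("corpus")])` returning `None` would correspond to the fail-fast error path in training.
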
diff --git a/console/backend/app/services/training_service.py b/console/backend/app/services/training_service.py
index 75d740ca80a8adb7333a8590695087562cb2fb12..af849b12df6323881d05bc472ff1f02b2583fe51
--- a/console/backend/app/services/training_service.py
+++ b/console/backend/app/services/training_service.py
@@ -99,7 +99,6 @@
         # -- Production pipeline: full auto-corpus (matching CLI train) --
         from violawake_sdk.tools.train import (
             _generate_confusable_negatives,
-            _generate_speech_negatives,
             _generate_tts_positives,
             _train_temporal_cnn,
         )
@@ -138,7 +137,7 @@
                 "total_epochs": epochs,
                 "train_loss": 0.0,
                 "val_loss": 0.0,
-                "message": "Corpus: %s positives. Generating negatives..." % len(pos_files),
+                "message": "Corpus: %s positives. Generating wake-word negatives..." % len(pos_files),
                 "error": None,
             })
         neg_tag_map: dict[str, list[Path]] = {}
@@ -203,33 +202,17 @@
             "total_epochs": epochs,
             "train_loss": 0.0,
             "val_loss": 0.0,
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
             "error": None,
         })
 
-        # Source 3: Auto-generated speech negatives (common phrases)
-        # 5 voices matching CLI production pipeline (was 3)
-        speech_dir = neg_temp_dir / "speech"
-        try:
-            speech_files = _generate_speech_negatives(
-                speech_dir,
-                n_voices=5,
-                verbose=False,
-            )
-            if speech_files:
-                neg_tag_map["neg_speech"] = speech_files
-        except Exception as exc:
-            logger.error(
-                "Speech neg generation FAILED for job %s: %s — "
-                "model will have higher false positive rate on general speech",
-                job_id, exc,
-            )
-
         _ensure_not_cancelled()
 
         # Source 4: Universal corpus (LibriSpeech, MUSAN) if available
+        service_file = Path(__file__).resolve()
         _CORPUS_SEARCH_PATHS = [
-            Path(__file__).resolve().parent.parent.parent.parent / "corpus",  # repo root
+            Path("/app/corpus"),
+            *([service_file.parents[4] / "corpus"] if len(service_file.parents) > 4 else []),
             Path.home() / ".violawake" / "corpus",
             Path("corpus"),
         ]
@@ -263,10 +246,20 @@
             all_neg_files.extend(files)
 
         total_neg = len(all_neg_files)
-        if total_neg < 5:
+        speech_neg_tags = {
+            "neg_user",
+            "neg_librispeech",
+            "neg_musan_speech",
+            "neg_musan_music",
+            "neg_musan_noise",
+        }
+        total_speech_neg = sum(
+            len(files) for tag, files in neg_tag_map.items() if tag in speech_neg_tags
+        )
+        if total_speech_neg < 5 or total_neg < 5:
             raise RuntimeError(
-                "Only %s negative files generated. "
-                "edge-tts may not be installed or network unavailable." % total_neg
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
             )
 
         progress_callback({
diff --git a/docker-compose.production.yml b/docker-compose.production.yml
index a2ac4bb23eb53b205075ac31608aec81ecbfe73f..85e1d2c271a3cbce20e770210649c15bf818150f
--- a/docker-compose.production.yml
+++ b/docker-compose.production.yml
@@ -25,11 +25,10 @@
     volumes:
       - backend-data:/app/data
       - ${VIOLAWAKE_NEGATIVES_CORPUS_DIR:-./violawake_data/negatives}:/app/negatives:ro
-      # Universal corpus (LibriSpeech + MUSAN) for training-time negative samples.
-      # training_service searches Path("corpus") (CWD-relative = /app/corpus) when
-      # edge-tts negative generation falls short. Without this mount, training
-      # fails with "Only 0 negative files generated" if upstream TTS is degraded.
-      - ./corpus:/app/corpus:ro
+      # REQUIRED — universal corpus (LibriSpeech + MUSAN) for training-time
+      # speech negatives. See docs/DEPLOYMENT.md. If ./corpus is unavailable,
+      # run violawake-download-corpus and mount ~/.violawake/corpus here instead.
+      - ./corpus:/app/corpus:ro
     env_file:
       - .env.production
     environment:
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3e59bc56c6b98a645e8fc286c9eef1e207a67b18..4820f1b22b9348912045ee7866ffd59da155a603
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -135,6 +135,28 @@
 
 If a key is missing, the backend either no-ops the feature gracefully (Stripe → 503 with "Billing features require a configured Stripe secret key"; Resend → silently auto-verifies users) or fails to start (DB URL, secret key). Check container logs with `docker logs wakeword-backend-1` if the healthcheck fails.
 
+### Corpus
+
+Training requires a mounted universal speech-negative corpus. Generic speech negatives come from LibriSpeech/MUSAN file access, not Edge-TTS. Without this corpus, training fails fast with a clear error instead of silently falling back to flaky network TTS.
+
+Operator default: mount the in-repo corpus directory read-only:
+
+```yaml
+./corpus:/app/corpus:ro
+```
+
+Alternative: download the smaller starter corpus, then mount that path:
+
+```bash
+violawake-download-corpus
+```
+
+```yaml
+~/.violawake/corpus:/app/corpus:ro
+```
+
+`violawake-download-corpus` currently installs LibriSpeech `dev-clean` under `~/.violawake/corpus/librispeech/dev-clean`. MUSAN can be added under the same corpus root when the larger speech/music/noise set is available.
+
 ---
 
 ## Frontend deploy (violawake.com)
diff --git a/pyproject.toml b/pyproject.toml
index d5af86b737c2b94d3193ea6955d9d6cf9a63bf10..f56290a21171ac2846aaa233687e5fe7d60d1739
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -145,6 +145,7 @@
 violawake-eval = "violawake_sdk.tools.evaluate:main"
 violawake-collect = "violawake_sdk.tools.collect_samples:main"
 violawake-download = "violawake_sdk.tools.download_model:main"
+violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
 violawake-expand-corpus = "violawake_sdk.tools.expand_corpus:main"
 violawake-streaming-eval = "violawake_sdk.tools.streaming_eval:main"
 violawake-test-confusables = "violawake_sdk.tools.test_confusables:main"
diff --git a/src/violawake_sdk/tools/download_corpus.py b/src/violawake_sdk/tools/download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..af21e102d06a642ace6bc0d196c0fe00e8d75c5e
--- /dev/null
+++ b/src/violawake_sdk/tools/download_corpus.py
@@ -0,0 +1,143 @@
+"""Download the shared speech-negative corpus for ViolaWake training.
+
+Entry point: ``violawake-download-corpus`` (declared in pyproject.toml).
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import sys
+import tarfile
+import tempfile
+from pathlib import Path
+from urllib.request import urlretrieve
+
+LIBRISPEECH_DEV_CLEAN_URL = "https://www.openslr.org/resources/12/dev-clean.tar.gz"
+DEFAULT_CORPUS_DIR = Path.home() / ".violawake" / "corpus"
+
+
+def _is_within_directory(directory: Path, target: Path) -> bool:
+    directory = directory.resolve()
+    target = target.resolve()
+    return directory == target or directory in target.parents
+
+
+def _safe_extract_tar(archive_path: Path, destination: Path) -> None:
+    """Extract a tar archive after rejecting path traversal entries."""
+    with tarfile.open(archive_path, "r:gz") as tar:
+        for member in tar.getmembers():
+            if member.issym() or member.islnk():
+                raise RuntimeError(f"Refusing to extract link from corpus archive: {member.name}")
+            member_path = destination / member.name
+            if not _is_within_directory(destination, member_path):
+                raise RuntimeError(f"Refusing unsafe corpus archive path: {member.name}")
+        tar.extractall(destination)
+
+
+def _find_dev_clean(extract_dir: Path) -> Path:
+    expected = extract_dir / "LibriSpeech" / "dev-clean"
+    if expected.exists():
+        return expected
+
+    for path in extract_dir.rglob("dev-clean"):
+        if path.is_dir():
+            return path
+
+    raise RuntimeError("Downloaded LibriSpeech archive did not contain dev-clean/")
+
+
+def download_librispeech_dev_clean(
+    target_dir: Path = DEFAULT_CORPUS_DIR,
+    *,
+    force: bool = False,
+    quiet: bool = False,
+) -> Path:
+    """Download and extract LibriSpeech dev-clean under ``target_dir/librispeech``."""
+    corpus_dir = target_dir.expanduser().resolve()
+    librispeech_dir = corpus_dir / "librispeech"
+    destination = librispeech_dir / "dev-clean"
+
+    if destination.exists() and any(destination.rglob("*.flac")) and not force:
+        if not quiet:
+            print(f"LibriSpeech dev-clean already exists at {destination}")
+        return destination
+
+    corpus_dir.mkdir(parents=True, exist_ok=True)
+    temp_dir = Path(tempfile.mkdtemp(prefix="violawake_corpus_"))
+    archive_path = temp_dir / "dev-clean.tar.gz"
+    extract_dir = temp_dir / "extract"
+    extract_dir.mkdir()
+
+    try:
+        if not quiet:
+            print(f"Downloading LibriSpeech dev-clean from {LIBRISPEECH_DEV_CLEAN_URL}")
+            print(f"Target corpus directory: {corpus_dir}")
+        urlretrieve(LIBRISPEECH_DEV_CLEAN_URL, archive_path)
+
+        if not quiet:
+            print("Extracting LibriSpeech dev-clean...")
+        _safe_extract_tar(archive_path, extract_dir)
+        extracted_dev_clean = _find_dev_clean(extract_dir)
+
+        if destination.exists():
+            shutil.rmtree(destination)
+        librispeech_dir.mkdir(parents=True, exist_ok=True)
+        shutil.move(str(extracted_dev_clean), str(destination))
+
+        if not quiet:
+            flac_count = sum(1 for _ in destination.rglob("*.flac"))
+            print(f"Installed LibriSpeech dev-clean at {destination} ({flac_count} FLAC files)")
+        return destination
+    finally:
+        shutil.rmtree(temp_dir, ignore_errors=True)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="violawake-download-corpus",
+        description=(
+            "Download the shared speech-negative corpus used by ViolaWake training. "
+            "Currently installs LibriSpeech dev-clean."
+        ),
+    )
+    parser.add_argument(
+        "--target-dir",
+        type=Path,
+        default=DEFAULT_CORPUS_DIR,
+        help="Corpus install directory (default: ~/.violawake/corpus).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Replace an existing LibriSpeech dev-clean download.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress progress output.",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    try:
+        destination = download_librispeech_dev_clean(
+            args.target_dir,
+            force=args.force,
+            quiet=args.quiet,
+        )
+    except Exception as exc:
+        print(f"ERROR: failed to download corpus: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    if not args.quiet:
+        print("Corpus ready.")
+        print(f"Mount this directory as /app/corpus in production: {destination.parent.parent}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/violawake_sdk/tools/train.py b/src/violawake_sdk/tools/train.py
index 5db2a7566ab8542e440985a17f2f190b7a0a09a9..0c687cd9fa3ba604d72b556d7e1b7cafac94a5f8
--- a/src/violawake_sdk/tools/train.py
+++ b/src/violawake_sdk/tools/train.py
@@ -73,9 +73,13 @@
 # Module-level temp directory override. When set, all tempfile operations use
 # this instead of the OS default (which may be on a small system drive).
 # Set by _train_temporal_cnn() via its tmp_dir parameter.
-_TMP_DIR: str | None = None
-_LAST_EDGE_TTS_ERROR: str | None = None
-_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_TMP_DIR: str | None = None
+_LAST_EDGE_TTS_ERROR: str | None = None
+_REPORTED_EDGE_TTS_ERRORS: set[str] = set()
+_EDGE_TTS_MAX_ATTEMPTS = 3
+_EDGE_TTS_RETRY_BASE_SECONDS = 0.75
+_EDGE_TTS_RETRY_MAX_SECONDS = 4.0
+_EDGE_TTS_RETRY_RNG = Random()
 
 # ---------------------------------------------------------------------------
 # Edge-TTS voice pool for diverse positive and negative generation
@@ -316,25 +320,53 @@
                 mp3_buf.write(chunk["data"])
         return mp3_buf.getvalue()
 
-    try:
-        # Run the async synthesis
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                import concurrent.futures
-
-                with concurrent.futures.ThreadPoolExecutor() as pool:
-                    mp3_data = pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
-            else:
-                mp3_data = loop.run_until_complete(_synth())
-        except RuntimeError:
-            mp3_data = asyncio.run(_synth())
-    except Exception as exc:
-        return _edge_tts_fail(text, voice, exc)
-
-    if not mp3_data or len(mp3_data) < 100:
-        return _edge_tts_fail(
-            text,
+    def _run_synth() -> bytes:
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            return asyncio.run(_synth())
+
+        if loop.is_running():
+            import concurrent.futures
+
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(lambda: asyncio.run(_synth())).result(timeout=30)
+        return loop.run_until_complete(_synth())
+
+    mp3_data: bytes | None = None
+    max_attempts = max(1, _EDGE_TTS_MAX_ATTEMPTS)
+    for attempt in range(1, max_attempts + 1):
+        try:
+            mp3_data = _run_synth()
+            break
+        except Exception as exc:
+            if attempt >= max_attempts:
+                return _edge_tts_fail(
+                    text,
+                    voice,
+                    f"edge-tts failed after {attempt} attempts: {type(exc).__name__}: {exc}",
+                )
+            delay = min(
+                _EDGE_TTS_RETRY_MAX_SECONDS,
+                _EDGE_TTS_RETRY_BASE_SECONDS * (2 ** (attempt - 1)),
+            )
+            delay += _EDGE_TTS_RETRY_RNG.uniform(0.0, _EDGE_TTS_RETRY_BASE_SECONDS)
+            logger.warning(
+                "edge-tts synthesis attempt %s/%s failed for voice %s text %.80r: "
+                "%s: %s; retrying in %.2fs",
+                attempt,
+                max_attempts,
+                voice,
+                text,
+                type(exc).__name__,
+                exc,
+                delay,
+            )
+            time.sleep(delay)
+
+    if not mp3_data or len(mp3_data) < 100:
+        return _edge_tts_fail(
+            text,
             voice,
             f"edge-tts returned too little audio data ({len(mp3_data) if mp3_data else 0} bytes)",
         )
@@ -639,11 +671,14 @@
     output_dir: Path,
     n_voices: int = 5,
     verbose: bool = True,
-) -> list[Path]:
-    """Generate speech negative samples via TTS using common phrases.
-
-    Returns list of generated WAV file paths.
-    """
+) -> list[Path]:
+    """Deprecated for production training: generate speech negatives via TTS.
+
+    Production training should use the shared LibriSpeech/MUSAN corpus for
+    generic speech negatives. This helper remains for legacy CLI experiments.
+
+    Returns list of generated WAV file paths.
+    """
     output_dir.mkdir(parents=True, exist_ok=True)
     voices_subset = EDGE_TTS_VOICES[:n_voices]
     generated: list[Path] = []
diff --git a/tests/integration/test_feature_completeness.py b/tests/integration/test_feature_completeness.py
index ed77e43c69c5aab21f72422990567fc00267d5fd..247dcfde15e1428a9cd42ba27ac500fb9a61fa25
--- a/tests/integration/test_feature_completeness.py
+++ b/tests/integration/test_feature_completeness.py
@@ -1727,10 +1727,11 @@
     CONSOLE_SCRIPTS = {
         "violawake-train": "violawake_sdk.tools.train:main",
         "violawake-eval": "violawake_sdk.tools.evaluate:main",
-        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
-        "violawake-download": "violawake_sdk.tools.download_model:main",
-        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
-        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
+        "violawake-collect": "violawake_sdk.tools.collect_samples:main",
+        "violawake-download": "violawake_sdk.tools.download_model:main",
+        "violawake-download-corpus": "violawake_sdk.tools.download_corpus:main",
+        "violawake-expand-corpus": "violawake_sdk.tools.expand_corpus:main",
+        "violawake-streaming-eval": "violawake_sdk.tools.streaming_eval:main",
         "violawake-test-confusables": "violawake_sdk.tools.test_confusables:main",
         "violawake-contamination-check": "violawake_sdk.tools.contamination_check:main",
         "violawake-generate": "violawake_sdk.tools.generate_samples:main",
diff --git a/tests/integration/test_sdk_surface.py b/tests/integration/test_sdk_surface.py
index 27401f4d0dd5a3133b8894fc206892a58307d7c0..3b837afcf415fec4504a192b3d0d444cb358a7c9
--- a/tests/integration/test_sdk_surface.py
+++ b/tests/integration/test_sdk_surface.py
@@ -150,11 +150,12 @@
     "module_name",
     [
         "violawake_sdk.tools.evaluate",
-        "violawake_sdk.tools.collect_samples",
-        "violawake_sdk.tools.train",
-        "violawake_sdk.tools.download_model",
-    ],
-)
+        "violawake_sdk.tools.collect_samples",
+        "violawake_sdk.tools.train",
+        "violawake_sdk.tools.download_model",
+        "violawake_sdk.tools.download_corpus",
+    ],
+)
 def test_cli_entrypoints_are_importable_and_expose_main(module_name: str) -> None:
     module = importlib.import_module(module_name)
 
diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py
index f6b9a8507a63eaafe7f9d7136af1ed93f044413a..17876edb1a87397be83171264a6e15924a6ddaee
--- a/tests/unit/test_cli.py
+++ b/tests/unit/test_cli.py
@@ -325,8 +325,8 @@
 # violawake-download  (tools.download_model:main  &  cli.download:main)
 # ===================================================================
 
-class TestDownloadCLI:
-    """Tests for the violawake-download entry point."""
+class TestDownloadCLI:
+    """Tests for the violawake-download entry point."""
 
     def test_help_exits_zero(self) -> None:
         result = _run_cli("violawake_sdk.tools.download_model", ["--help"])
@@ -394,14 +394,28 @@
 
     def test_cli_wrapper_help(self) -> None:
         """The cli.download wrapper re-exports the same main."""
-        result = _run_cli("violawake_sdk.cli.download", ["--help"])
-        assert result.returncode == 0
-        assert "violawake-download" in result.stdout
-
-
-# ===================================================================
-# violawake-collect  (tools.collect_samples:main)
-# ===================================================================
+        result = _run_cli("violawake_sdk.cli.download", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download" in result.stdout
+
+
+# ===================================================================
+# violawake-download-corpus  (tools.download_corpus:main)
+# ===================================================================
+
+class TestDownloadCorpusCLI:
+    """Tests for the violawake-download-corpus entry point."""
+
+    def test_help_exits_zero(self) -> None:
+        result = _run_cli("violawake_sdk.tools.download_corpus", ["--help"])
+        assert result.returncode == 0
+        assert "violawake-download-corpus" in result.stdout
+        assert "--target-dir" in result.stdout
+
+
+# ===================================================================
+# violawake-collect  (tools.collect_samples:main)
+# ===================================================================
 
 class TestCollectCLI:
     """Tests for the violawake-collect entry point."""
diff --git a/tests/unit/test_download_corpus.py b/tests/unit/test_download_corpus.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad8accd134b60281b989ac1d8d7530ccc246404e
--- /dev/null
+++ b/tests/unit/test_download_corpus.py
@@ -0,0 +1,58 @@
+"""Smoke tests for the corpus downloader CLI."""
+
+from __future__ import annotations
+
+import shutil
+import subprocess
+import sys
+import tarfile
+from pathlib import Path
+from unittest.mock import patch
+
+from violawake_sdk.tools import download_corpus
+
+
+def _write_librispeech_archive(archive_path: Path, tmp_path: Path) -> None:
+    sample_dir = tmp_path / "archive_src" / "LibriSpeech" / "dev-clean" / "84" / "121123"
+    sample_dir.mkdir(parents=True)
+    (sample_dir / "84-121123-0000.flac").write_bytes(b"fake flac")
+
+    with tarfile.open(archive_path, "w:gz") as tar:
+        tar.add(tmp_path / "archive_src" / "LibriSpeech", arcname="LibriSpeech")
+
+
+def test_help_exits_zero() -> None:
+    result = subprocess.run(
+        [sys.executable, "-m", "violawake_sdk.tools.download_corpus", "--help"],
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    assert result.returncode == 0
+    assert "violawake-download-corpus" in result.stdout
+    assert "--target-dir" in result.stdout
+
+
+def test_download_librispeech_uses_mocked_urlretrieve(tmp_path: Path) -> None:
+    archive_path = tmp_path / "dev-clean.tar.gz"
+    _write_librispeech_archive(archive_path, tmp_path)
+    target_dir = tmp_path / "corpus"
+
+    def fake_urlretrieve(url: str, filename: str | Path):
+        assert url == download_corpus.LIBRISPEECH_DEV_CLEAN_URL
+        shutil.copyfile(archive_path, filename)
+        return str(filename), None
+
+    with patch(
+        "violawake_sdk.tools.download_corpus.urlretrieve",
+        side_effect=fake_urlretrieve,
+    ) as mocked_urlretrieve:
+        destination = download_corpus.download_librispeech_dev_clean(
+            target_dir,
+            quiet=True,
+        )
+
+    assert destination == target_dir / "librispeech" / "dev-clean"
+    assert (destination / "84" / "121123" / "84-121123-0000.flac").exists()
+    mocked_urlretrieve.assert_called_once()
diff --git a/tests/unit/test_train.py b/tests/unit/test_train.py
index be62dbf6ca9b31c0fb7c755b591679d2314294a1..2f6d29074ad5a96cae59310eab405bf7ddb81e50
--- a/tests/unit/test_train.py
+++ b/tests/unit/test_train.py
@@ -79,6 +79,45 @@
         assert out_path.stat().st_size > 44
         soundfile_module.read.assert_called_once()
 
+    def test_edge_tts_synthesize_retries_transient_stream_failure(
+        self, tmp_path: Path
+    ) -> None:
+        import numpy as np
+
+        attempts = {"count": 0}
+
+        class FakeCommunicate:
+            def __init__(self, text: str, voice: str) -> None:
+                self.text = text
+                self.voice = voice
+
+            async def stream(self):
+                attempts["count"] += 1
+                if attempts["count"] == 1:
+                    raise RuntimeError("503 Service Unavailable")
+                yield {"type": "audio", "data": b"fake mp3 bytes" * 20}
+
+        edge_tts_module = ModuleType("edge_tts")
+        edge_tts_module.Communicate = FakeCommunicate
+        soundfile_module = ModuleType("soundfile")
+        soundfile_module.read = MagicMock(
+            return_value=(np.zeros(16000, dtype=np.float32), 16000)
+        )
+
+        out_path = tmp_path / "tts.wav"
+        with (
+            patch.dict(
+                sys.modules,
+                {"edge_tts": edge_tts_module, "soundfile": soundfile_module},
+            ),
+            patch("violawake_sdk.tools.train.time.sleep") as sleep_mock,
+        ):
+            assert train._edge_tts_synthesize("hello", "en-US-JennyNeural", out_path)
+
+        assert attempts["count"] == 2
+        sleep_mock.assert_called_once()
+        assert out_path.stat().st_size > 44
+
     def test_confusable_generation_logs_zero_edge_tts_outputs(
         self, caplog: pytest.LogCaptureFixture, tmp_path: Path
     ) -> None:

tokens used
265,759
Implemented and committed. No push.

**Commits**
- `b45e03c` Require corpus speech negatives in backend training
- `854acaf` Add Edge TTS retry backoff
- `49ffd9a` Add corpus downloader CLI
- `6de96f5` Document required training corpus mount
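
The retry schedule from `854acaf` is easier to reason about in isolation. The sketch below reuses the three constants from the diff (3 attempts, 0.75 s base, 4.0 s cap) and the same capped-exponential-plus-jitter formula; the function name `retry_delay` is hypothetical and does not exist in `train.py`.

```python
import random

# Values copied from the diff; the names below are illustrative only.
MAX_ATTEMPTS = 3
BASE_SECONDS = 0.75
MAX_SECONDS = 4.0


def retry_delay(attempt: int, rng: random.Random) -> float:
    """Return the sleep time after a failed 1-based attempt."""
    # Exponential growth (0.75, 1.5, 3.0, ...) capped at MAX_SECONDS.
    capped = min(MAX_SECONDS, BASE_SECONDS * (2 ** (attempt - 1)))
    # Jitter of up to one base interval is added after the cap.
    return capped + rng.uniform(0.0, BASE_SECONDS)


if __name__ == "__main__":
    rng = random.Random()
    # Only attempts 1 and 2 are followed by a sleep; attempt 3 gives up.
    for attempt in range(1, MAX_ATTEMPTS):
        print(f"after attempt {attempt}: sleep {retry_delay(attempt, rng):.2f}s")
```

With these values a single synthesis call sleeps at most twice, for no more than 1.5 s and 2.25 s respectively, before reporting failure.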

**Proof**
- Requested console run (`cd console && python -m pytest tests/ --no-cov --timeout=30 --ignore=tests/e2e -q`): `147 passed, 2 skipped in 34.77s`
- Downloader smoke/retry tests: `3 passed`
- CLI/import entry tests: `16 passed`
- `python -m py_compile` on changed Python files passed.
- `python -m violawake_sdk.tools.download_corpus --help` works and shows `violawake-download-corpus`, `--target-dir`, `--force`, `--quiet`.

**Key Diff**
```diff
-            _generate_speech_negatives,
...
-            "message": "Generated confusables. Generating speech negatives...",
+            "message": "Generated confusables. Loading corpus speech negatives...",
...
-        # Source 3: Auto-generated speech negatives (common phrases)
-        speech_files = _generate_speech_negatives(...)
...
+        if total_speech_neg < 5 or total_neg < 5:
+            raise RuntimeError(
+                "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
+                "/app/corpus or run `violawake-download-corpus`."
+            )
```
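
As a standalone illustration of the guard above, here is a minimal sketch. The tag set is copied from the diff; the function name `check_negatives`, the constant `MIN_NEGATIVES`, and the `neg_confusable` tag in the usage example are hypothetical, and the total is computed by summing the tag map rather than from the service's own file list.

```python
from pathlib import Path

# Tags counted toward the corpus requirement, as listed in the diff.
SPEECH_NEG_TAGS = {
    "neg_user",
    "neg_librispeech",
    "neg_musan_speech",
    "neg_musan_music",
    "neg_musan_noise",
}
MIN_NEGATIVES = 5  # threshold used by the service


def check_negatives(neg_tag_map: dict[str, list[Path]]) -> int:
    """Return the corpus-backed negative count, or raise if too few."""
    total = sum(len(files) for files in neg_tag_map.values())
    corpus_backed = sum(
        len(files) for tag, files in neg_tag_map.items() if tag in SPEECH_NEG_TAGS
    )
    if corpus_backed < MIN_NEGATIVES or total < MIN_NEGATIVES:
        raise RuntimeError(
            "No speech negatives available. Mount LibriSpeech + MUSAN corpus at "
            "/app/corpus or run `violawake-download-corpus`."
        )
    return corpus_backed


if __name__ == "__main__":
    files = [Path(f"clip_{i}.flac") for i in range(5)]
    print(check_negatives({"neg_librispeech": files}))
    try:
        # Confusables alone do not satisfy the requirement.
        check_negatives({"neg_confusable": files})
    except RuntimeError as exc:
        print(f"rejected: {exc}")
```

The point of counting by tag is that a job with plenty of TTS confusables but no corpus files is still rejected.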

**Files changed**
- [training_service.py](J:/CLAUDE/PROJECTS/Wakeword/console/backend/app/services/training_service.py)
- [train.py](J:/CLAUDE/PROJECTS/Wakeword/src/violawake_sdk/tools/train.py)
- [download_corpus.py](J:/CLAUDE/PROJECTS/Wakeword/src/violawake_sdk/tools/download_corpus.py)
- [pyproject.toml](J:/CLAUDE/PROJECTS/Wakeword/pyproject.toml)
- [docker-compose.production.yml](J:/CLAUDE/PROJECTS/Wakeword/docker-compose.production.yml)
- [DEPLOYMENT.md](J:/CLAUDE/PROJECTS/Wakeword/docs/DEPLOYMENT.md)
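- Tests for downloader, retry, and CLI/script surface: `tests/unit/test_download_corpus.py`, `tests/unit/test_train.py`, `tests/unit/test_cli.py`, `tests/integration/test_sdk_surface.py`, `tests/integration/test_feature_completeness.py`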

`pyproject.toml` now has:
```toml
violawake-download-corpus = "violawake_sdk.tools.download_corpus:main"
```

The remaining dirty worktree entries are pre-existing and were not staged: `console/backend/requirements.txt` and the `.codex_log*` / `.codex_prompt*` files.

Takeaway: production training now requires corpus-backed speech negatives and fails fast when they are missing. Edge-TTS is no longer used for generic speech negatives; it remains only for wake-word positives and confusables.
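
For operators checking a deployment, the corpus lookup order from `training_service.py` can be reproduced with a small helper. This is a sketch, not the service code: the candidate order is taken from the diff, but the function names are hypothetical, and it assumes the first existing directory wins, which the diff does not show.

```python
from pathlib import Path
from typing import Iterable, Optional


def default_candidates(service_file: Path) -> list[Path]:
    """Build the search list in the order shown in the diff."""
    parents = service_file.resolve().parents
    # The repo-root candidate is only added when the file sits deep enough;
    # inside a container at /app/app/services/ there is no fifth parent.
    repo_root = [parents[4] / "corpus"] if len(parents) > 4 else []
    return [
        Path("/app/corpus"),
        *repo_root,
        Path.home() / ".violawake" / "corpus",
        Path("corpus"),
    ]


def first_existing_dir(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that is a directory, else None."""
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    found = first_existing_dir(default_candidates(Path(__file__)))
    print(found if found else "no corpus directory found")
```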
