Show HN · Build Log

I said “Hey Claude” and a coding agent woke up.

A fully on-device voice wake word for Claude Code. Say the phrase, talk, and a background agent spins up and starts working. No cloud listening, no API key to hear you, no signup. Then I tried to train the wake word itself — and fell down the most gloriously cursed dependency hole of my year.

Here's the demo in one breath: I say “Hey Claude, add type hints to utils.py and run the tests.” A soft chime fires the instant the wake word lands. I keep talking. A new row appears in claude agents, already working. I can stack three of them while the first is still going.

The whole pipeline is the cheapest-thing-that-works at every stage, so it can idle in the menu bar all day without melting your battery — a tiny always-on classifier gates an occasional voice-activity check, which gates an occasional GPU transcription, which gates the only step that touches the network.

the pipeline
# every 80ms          on wake           on speech-end        once
  mic ──▶ openWakeWord ──▶ silence VAD ──▶ MLX Whisper ──▶ claude --bg "<your words>"
  16kHz   "hey claude"?     record cmd       transcribe (Metal)   detached agent
  └──────── nothing leaves your machine until this last arrow ────────┘
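
The gating logic in that diagram fits in a few lines of Python. This is a minimal sketch, not the project's actual code: the four stage callables (score_frame, record_command, transcribe, launch) are hypothetical stand-ins, and the 0.5 threshold is an assumption borrowed from the validation output further down. The only hard numbers come from the diagram: 16 kHz audio checked every 80 ms, which works out to 1,280 samples per frame.

```python
from typing import Callable, Iterable, Iterator, Sequence

SAMPLE_RATE = 16_000                              # Hz, from the diagram
FRAME_MS = 80                                     # one wake-word check per frame
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000    # 1280 samples


def frames(samples: Sequence[int], size: int = FRAME_SAMPLES) -> Iterator[Sequence[int]]:
    """Yield consecutive full frames; a trailing partial frame is dropped."""
    for start in range(0, len(samples) - size + 1, size):
        yield samples[start:start + size]


def run_cascade(
    audio_frames: Iterable,
    score_frame: Callable[[object], float],    # cheap, runs on every frame
    record_command: Callable[[], object],      # runs only after a wake hit
    transcribe: Callable[[object], str],       # expensive, runs once per command
    launch: Callable[[str], None],             # the only step that leaves the machine
    threshold: float = 0.5,                    # assumed cut-off, not a project constant
) -> int:
    """Run each stage only if the cheaper stage before it fired."""
    launched = 0
    for frame in audio_frames:
        if score_frame(frame) < threshold:
            continue                           # the common case: nothing else runs
        clip = record_command()
        text = transcribe(clip).strip()
        if text:                               # ignore empty transcriptions
            launch(text)
            launched += 1
    return launched
```

Every stage is passed in as a callable, so the loop can be exercised with stubs and no microphone.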

01 / THE PART I GOT WRONG FIRST
You can’t ship “Hey Claude”

Wake-word detection has a clean answer: openWakeWord. A ~250 KB classifier rides on top of a frozen Google speech-embedding model. It sips CPU, runs fully offline, and — crucially — it’s trained on 100% synthetic speech, so it’s speaker-independent and nobody has to record their voice.

One catch: openWakeWord ships pretrained models for alexa, hey jarvis, hey mycroft… not hey claude. To get my phrase I’d have to train a model. The “easy” path is a Colab notebook — point it at a phrase, hit Run All, wait ~10 minutes on a free T4.

So naturally I tried to get an AI agent to drive Google Colab through a Chrome debug port.

This is the part where I tell you it half-worked and was a terrible idea. I launched Chrome with --remote-debugging-port=9222, attached over the DevTools protocol, and started clicking through Colab’s editor by setting Monaco models directly and pasting cells via the system clipboard. It genuinely created notebooks, ran cells, and read output back through screenshots. It was also death by a thousand round-trips — every action a CDP call, every result a PNG I had to squint at.

Then the browser tab closed mid-run and took the whole fragile contraption with it. Good. Because the real answer was sitting right there.

02 / THE PIVOT
Train it on GitHub Actions instead

Here’s the unlock: I control the Python version on a GitHub Actions runner. No browser, no babysitting — I drive it with gh, it trains on a free CPU runner, and the model comes back as a downloadable artifact. openWakeWord freezes the heavy backbone and only trains a tiny classifier head, so CPU is completely fine.
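
For a sense of what "drive it with gh" can look like, here is a rough Python sketch that builds the four GitHub CLI calls involved: trigger, find the run, wait, download. The workflow file name (train.yml), the phrase input, and the artifact name (wakeword-models) are all hypothetical, so substitute whatever your workflow actually declares. The gh subcommands and flags are standard GitHub CLI ones.

```python
import subprocess
from typing import Callable, List

WORKFLOW = "train.yml"         # hypothetical workflow file name
ARTIFACT = "wakeword-models"   # hypothetical artifact name


def trigger_cmd(phrase: str) -> List[str]:
    """Start the workflow, passing the phrase as a workflow_dispatch input."""
    return ["gh", "workflow", "run", WORKFLOW, "-f", f"phrase={phrase}"]


def latest_run_id_cmd() -> List[str]:
    """Ask for the id of the most recent run of that workflow."""
    return ["gh", "run", "list", "--workflow", WORKFLOW, "--limit", "1",
            "--json", "databaseId", "--jq", ".[0].databaseId"]


def watch_cmd(run_id: str) -> List[str]:
    """Block until the run finishes; a failed run gives a non-zero exit code."""
    return ["gh", "run", "watch", run_id, "--exit-status"]


def download_cmd(run_id: str, dest: str = "models") -> List[str]:
    """Fetch the trained model artifact into a local directory."""
    return ["gh", "run", "download", run_id, "-n", ARTIFACT, "-D", dest]


def train_and_fetch(phrase: str, dest: str = "models",
                    run: Callable = subprocess.run) -> str:
    """Trigger, wait, download. `run` is injectable so this can be dry-run."""
    run(trigger_cmd(phrase), check=True)
    # Caveat: a freshly dispatched run can take a few seconds to show up in
    # the list, so a real script would poll here instead of asking once.
    listed = run(latest_run_id_cmd(), check=True, capture_output=True, text=True)
    run_id = listed.stdout.strip()
    run(watch_cmd(run_id), check=True)
    run(download_cmd(run_id, dest), check=True)
    return run_id
```

Each command is an argv list rather than a shell string, so a phrase with spaces needs no quoting.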

There was just the small matter of the entire 2023-era training stack meeting 2026-era wheels. What follows is the descent. Each ❌ cost me a full re-run to discover. I'm putting them all here so you never have to.

❌ ModuleNotFoundError: No module named 'piper_phonemize'
   piper-sample-generator needs piper-phonemize, which has no wheel for Python 3.12.
   → Bypass it entirely: synthesize with piper-tts and the libritts_r voice, which has 904 speakers. Instant voice diversity.
❌ ImportError: cannot import name 'sph_harm' from 'scipy.special'
   The acoustics dep imports a function that scipy deprecated in 1.15 and has since removed.
   → Pin scipy<1.15, numpy<2. The whole stack predates numpy 2.
❌ ImportError: To support encoding audio data, please install 'torchcodec'
   datasets v4 routes all audio through torchcodec.
   → Pin datasets<3 (soundfile backend).
❌ TorchCodec is required for load_with_torchcodec
   …and so does modern torchaudio.
   → Pin torch/torchaudio==2.2.2, which still uses the soundfile backend.
❌ ValueError: Clip does not have the correct sample rate!
   piper outputs 22,050 Hz; openWakeWord demands 16,000.
   → Resample every synthesized clip. Silent, brutal, obvious in hindsight.
❌ OnnxExporterError: Module onnx is not installed!
   The model trained to 100%… then died one line from the finish because torch.onnx.export needs onnx.
   → Add one package. One.
✓ hey_claude.onnx — 205 KB — downloaded.
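
The sample-rate fix is the one with actual arithmetic in it: 16,000 / 22,050 reduces to 320 / 441, so a one-second piper clip of 22,050 samples has to come out as exactly 16,000. Below is a dependency-free sketch of that conversion using plain linear interpolation. It is an illustration only, and resample_linear is a hypothetical helper, not what the training workflow runs. Linear interpolation has no low-pass filter, so it aliases when downsampling; a real pipeline would more likely reach for a filtered polyphase resampler such as scipy.signal.resample_poly(x, 320, 441).

```python
from fractions import Fraction
from typing import List, Sequence

PIPER_RATE = 22_050    # Hz, what piper emits (from the error above)
TARGET_RATE = 16_000   # Hz, what openWakeWord expects


def resample_linear(samples: Sequence[float], src_rate: int = PIPER_RATE,
                    dst_rate: int = TARGET_RATE) -> List[float]:
    """Resample by linear interpolation (no anti-alias filter; illustration only)."""
    if not samples:
        return []
    ratio = Fraction(dst_rate, src_rate)     # 16000/22050 reduces to 320/441
    out_len = int(len(samples) * ratio)      # exact rational math, then floor
    step = src_rate / dst_rate               # source samples per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step                       # fractional position in the source
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out
```

The point of the sketch is the length bookkeeping: every clip must land on the 16 kHz grid before its features are computed, otherwise the check that raised the ValueError above rejects it.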

First validation, against macOS say voices the model had never heard (a fair cross-check — the training data is all piper):

validate.py
"hey claude"  Alex 0.990   Samantha 0.990   Karen 0.515
"what is the weather"   0.001  # correctly ignored ✓

It fired. A wake word I trained, in the cloud, with zero recordings, detecting my phrase and ignoring everything else. I may have yelled.

03 / THE PLOT TWIST
“Claude” sounds exactly like “cloud”

Buoyed by success, I launched the full run: five phrases, 150 speakers each, 12,000 steps, the real 16 GB negative-feature set (2,000 hours of audio that is definitely not the wake word). Two hours later, five models came back. Four of them were flawless.

And then there was hey_claude:

the heartbreak
"hey claude"   Alex 0.117   Samantha 0.096   # below 0.5 — it won't fire

The most important model was the only one that didn’t work. Here's the beautiful, infuriating reason: “Claude” is acoustically almost identical to “cloud” and “clawed.” The aggressive negative weighting that made the other models precise had taught this one to distrust anything in that phonetic neighborhood — including the actual wake word.

It’s a precision/recall seesaw, and “claude” sits right on the pivot.

So I loosened the negatives and retrained. Recall came roaring back — 0.93 / 0.75 / 0.93. Victory! Except:

the other heartbreak
"the cloud is gray"     0.942   # 😬 false alarm
"i clawed the table"    0.694   # 😬 also fires

Crank the negatives and it can't hear “claude.” Ease them and it launches a coding agent every time someone mentions the weather. There’s a sweet spot in the middle — I was chasing it when a GitHub runner decided to take 64 minutes for a job that normally takes 19, and I called it. Some lessons you pay for once.
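
The seesaw is easy to see by plugging in the scores above. This small Python sketch (the function names are mine, not the project's) looks for any single threshold that accepts every "hey claude" clip and rejects every near-homophone. For the loosened model there isn't one, because the worst false alarm (0.942) outscores the weakest true hit (0.75).

```python
from typing import Optional, Sequence, Tuple

# Scores quoted above for the loosened hey_claude model.
POSITIVES = [0.93, 0.75, 0.93]    # "hey claude", three voices
NEGATIVES = [0.942, 0.694]        # "the cloud is gray", "i clawed the table"


def separating_threshold(pos: Sequence[float], neg: Sequence[float]) -> Optional[float]:
    """Return a threshold between the two groups, or None if they overlap."""
    highest_negative = max(neg)
    lowest_positive = min(pos)
    if highest_negative >= lowest_positive:
        return None                     # no cut-off accepts all hits and rejects all alarms
    return (highest_negative + lowest_positive) / 2


def hits_and_false_alarms(pos: Sequence[float], neg: Sequence[float],
                          threshold: float) -> Tuple[int, int]:
    """Count wake words detected and non-wake phrases that also fire."""
    hits = sum(score >= threshold for score in pos)
    false_alarms = sum(score >= threshold for score in neg)
    return hits, false_alarms


if __name__ == "__main__":
    print(separating_threshold(POSITIVES, NEGATIVES))
    for t in (0.5, 0.8, 0.95):
        print(t, hits_and_false_alarms(POSITIVES, NEGATIVES, t))
```

At 0.5 all three hits land but both false alarms fire; at 0.95 nothing fires at all. Moving the threshold only trades one failure for the other, which is why the fix has to come from the phrase or the detector.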

The honest engineering takeaway: a tiny on-device classifier struggles to separate near-homophones. The fix isn’t more training — it’s either a phrase that isn’t a common word, or a different detector. I have both.

04 / WHAT ACTUALLY SHIPPED
Five wake words, and a clean Claude trigger

Validated against three macOS voices, with cross-phrase negatives. The “okay claude” model is the quiet hero — that leading “okay” pulls it out of the cloud/clawed danger zone, so it’s both reliable and on-brand.

models use …     phrase             recall (A / S / D)    rejects noise
hey_computer     “hey computer”     0.97 / 0.97 / 0.97    clean
hey_agent        “hey agent”        0.90 / 0.88 / 0.94    clean
okay_claude ★    “okay claude”      0.87 / 0.62 / 0.79    clean
hey_assistant    “hey assistant”    0.63 / 0.62 / 0.95    clean
hey_claude       “hey claude”       0.93 / 0.75 / 0.93    trips on “cloud”
5 wake words trained from zero recordings
$0 total cloud-compute spend (free runners)
904 synthetic speakers available in the libritts_r voice

Two clean ways to literally say “Hey Claude”

  • okay_claude — the active default. Same Claude energy, tested clean, won’t fire on the weather report.
  • The Whisper engine (hey-claude config set engine whisper). It transcribes and string-matches the literal phrase, so “cloud” vs “claude” is a non-issue. No model needed, works the instant you install it.
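
String-matching a transcript can be as small as one regular expression. The sketch below is an assumption about how such a matcher might look, not the tool's real implementation: extract_command is a hypothetical name, and the rule "the wake phrase must open the transcript" is my choice. It tolerates case and punctuation differences in the wake phrase but leaves the command text untouched, so file names like utils.py survive.

```python
import re
from typing import Optional


def wake_pattern(phrase: str) -> "re.Pattern[str]":
    """Match the phrase at the start of a transcript, ignoring case and punctuation."""
    words = [re.escape(word) for word in phrase.split()]
    # \W+ between words accepts "Hey, Claude"; \b stops "claude" matching "claudette".
    return re.compile(r"^\W*" + r"\W+".join(words) + r"\b\W*", re.IGNORECASE)


def extract_command(transcript: str, phrase: str = "hey claude") -> Optional[str]:
    """Return the text after the wake phrase, or None if the phrase is absent."""
    match = wake_pattern(phrase).match(transcript)
    if match is None:
        return None
    return transcript[match.end():].strip()


if __name__ == "__main__":
    print(extract_command("Hey Claude, add type hints to utils.py and run the tests."))
    print(extract_command("The cloud is gray"))
```

Because the comparison is on spelled-out words, "hey cloud" is simply a different string, and no score threshold is involved.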

05 / THINGS I LEARNED
Field notes

  • “Just use Colab” is a lie when an agent is driving. A human clicking Run All is trivial; automating a heavy JS notebook blind is not. GitHub Actions — where I drive with a CLI and get an artifact — was the actually-simple path.
  • The cheapest stage should gate the expensive one. Always-on Whisper would cook your laptop. A 250 KB classifier gating an occasional GPU transcription does not.
  • Pick a wake word that isn’t a real word. “Hey Jarvis” works because nothing sounds like Jarvis. “Hey Claude” fights “cloud” forever.
  • A voice trigger is a security surface. The spoken command is passed to claude --bg as a single argv element — never through a shell. “ship it; rm -rf /” is one harmless argument, not an injection. There’s a test that enforces it.
  • Half your battle with old ML code is the dependency archaeology. scipy, numpy, datasets, torchaudio — every one had quietly moved on. Pin like it’s 2023.
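
The argv point from the security note is worth seeing concretely. Here is a minimal Python sketch under stated assumptions: build_agent_argv and launch_detached are hypothetical helpers, and the only thing taken from the project is the claude --bg "<your words>" shape. To prove the claim without launching a real agent, argv_seen_by_child hands the same arguments to a stand-in child process (the Python interpreter) and reports what it received.

```python
import json
import subprocess
import sys
from typing import List


def build_agent_argv(spoken: str, executable: str = "claude") -> List[str]:
    """The whole spoken command is one list element, whatever it contains."""
    return [executable, "--bg", spoken]


def launch_detached(spoken: str) -> subprocess.Popen:
    """Start the agent with no shell and in its own session (POSIX)."""
    return subprocess.Popen(
        build_agent_argv(spoken),      # a list, so no shell ever parses the text
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,        # keeps running if the listener exits
    )


def argv_seen_by_child(spoken: str) -> List[str]:
    """Pass the same arguments to a stand-in child and return what it saw."""
    probe = "import json, sys; print(json.dumps(sys.argv[1:]))"
    argv = [sys.executable, "-c", probe] + build_agent_argv(spoken)[1:]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(argv_seen_by_child("ship it; rm -rf /"))
```

The child sees two arguments, --bg and the full sentence, semicolon included. Nothing splits on the semicolon because no shell is involved; the same string passed through shell=True would be parsed as two commands.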

Say the word.

macOS · Apple Silicon · needs Claude Code ≥ 2.1.139. The whole training pipeline is reproducible — one workflow, your phrases, free runners.

$ hey-claude models use hey_computer
✓ active wake word: "hey computer"
$ hey-claude
  listening — say "hey computer, fix the failing tests"…