Here's the demo in one breath: I say “Hey Claude, add type hints to utils.py and run the tests.” A soft chime fires the instant the wake word lands. I keep talking. A new row appears in claude agents, already working. I can stack three of them while the first is still going.
The whole pipeline is the cheapest-thing-that-works at every stage, so it can idle in the menu bar all day without melting your battery — a tiny always-on classifier gates an occasional voice-activity check, which gates an occasional GPU transcription, which gates the only step that touches the network.
# every 80ms on wake on speech-end once mic ──▶ openWakeWord ──▶ silence VAD ──▶ MLX Whisper ──▶ claude --bg "<your words>" 16kHz "hey claude"? record cmd transcribe (Metal) detached agent └──────── nothing leaves your machine until this last arrow ────────┘
01 / THE PART I GOT WRONG FIRSTYou can’t ship “Hey Claude”
Wake-word detection has a clean answer: openWakeWord. A ~250 KB classifier rides on top of a frozen Google speech-embedding model. It sips CPU, runs fully offline, and — crucially — it’s trained on 100% synthetic speech, so it’s speaker-independent and nobody has to record their voice.
One catch: openWakeWord ships pretrained models for alexa, hey jarvis, hey mycroft… not hey claude. To get my phrase I’d have to train a model. The “easy” path is a Colab notebook — point it at a phrase, hit Run All, wait ~10 minutes on a free T4.
So naturally I tried to get an AI agent to drive Google Colab through a Chrome debug port.
This is the part where I tell you it half-worked and was a terrible idea. I launched Chrome with --remote-debugging-port=9222, attached over the DevTools protocol, and started clicking through Colab’s editor by setting Monaco models directly and pasting cells via the system clipboard. It genuinely created notebooks, ran cells, and read output back through screenshots. It was also death by a thousand round-trips — every action a CDP call, every result a PNG I had to squint at.
Then the browser tab closed mid-run and took the whole fragile contraption with it. Good. Because the real answer was sitting right there.
02 / THE PIVOTTrain it on GitHub Actions instead
Here’s the unlock: I control the Python version on a GitHub Actions runner. No browser, no babysitting — I drive it with gh, it trains on a free CPU runner, and the model comes back as a downloadable artifact. openWakeWord freezes the heavy backbone and only trains a tiny classifier head, so CPU is completely fine.
There was just the small matter of the entire 2023-era training stack meeting 2026-era wheels. What follows is the descent. Each ❌ cost me a full re-run to discover. I'm putting them all here so you never have to.
First validation, against macOS say voices the model had never heard (a fair cross-check — the training data is all piper):
"hey claude" Alex 0.990 Samantha 0.990 Karen 0.515 "what is the weather" 0.001 # correctly ignored ✓
It fired. A wake word I trained, in the cloud, with zero recordings, detecting my phrase and ignoring everything else. I may have yelled.
03 / THE PLOT TWIST“Claude” sounds exactly like “cloud”
Buoyed by success, I launched the full run: six phrases, 150 speakers each, 12,000 steps, the real 16 GB negative-feature set (2,000 hours of audio that is definitely not the wake word). Two hours later, five models came back. Four of them were flawless.
And then there was hey_claude:
"hey claude" Alex 0.117 Samantha 0.096 # below 0.5 — it won't fire
The most important model was the only one that didn’t work. Here's the beautiful, infuriating reason: “Claude” is acoustically almost identical to “cloud” and “clawed.” The aggressive negative weighting that made the other models precise had taught this one to distrust anything in that phonetic neighborhood — including the actual wake word.
It’s a precision/recall seesaw, and “claude” sits right on the pivot.
So I loosened the negatives and retrained. Recall came roaring back — 0.93 / 0.75 / 0.93. Victory! Except:
"the cloud is gray" 0.942 # 😬 false alarm "i clawed the table" 0.694 # 😬 also fires
Crank the negatives and it can't hear “claude.” Ease them and it launches a coding agent every time someone mentions the weather. There’s a sweet spot in the middle — I was chasing it when a GitHub runner decided to take 64 minutes for a job that normally takes 19, and I called it. Some lessons you pay for once.
04 / WHAT ACTUALLY SHIPPEDFive wake words, and a clean Claude trigger
Validated against three macOS voices, with cross-phrase negatives. The “okay claude” model is the quiet hero — that leading “okay” pulls it out of the cloud/clawed danger zone, so it’s both reliable and on-brand.
| models use … | phrase | recall (A / S / D) | rejects noise |
|---|---|---|---|
| hey_computer | “hey computer” | 0.97 / 0.97 / 0.97 | clean |
| hey_agent | “hey agent” | 0.90 / 0.88 / 0.94 | clean |
| okay_claude ★ | “okay claude” | 0.87 / 0.62 / 0.79 | clean |
| hey_assistant | “hey assistant” | 0.63 / 0.62 / 0.95 | clean |
| hey_claude | “hey claude” | 0.93 / 0.75 / 0.93 | trips on “cloud” |
Two clean ways to literally say “Hey Claude”
- okay_claude — the active default. Same Claude energy, tested clean, won’t fire on the weather report.
- The Whisper engine — hey-claude config set engine whisper. It transcribes and string-matches the literal phrase, so “cloud” vs “claude” is a non-issue. No model needed, works the instant you install it.
05 / THINGS I LEARNEDField notes
- “Just use Colab” is a lie when an agent is driving. A human clicking Run All is trivial; automating a heavy JS notebook blind is not. GitHub Actions — where I drive with a CLI and get an artifact — was the actually-simple path.
- The cheapest stage should gate the expensive one. Always-on Whisper would cook your laptop. A 250 KB classifier gating an occasional GPU transcription does not.
- Pick a wake word that isn’t a real word. “Hey Jarvis” works because nothing sounds like Jarvis. “Hey Claude” fights “cloud” forever.
- A voice trigger is a security surface. The spoken command is passed to claude --bg as a single argv element — never through a shell. “ship it; rm -rf /” is one harmless argument, not an injection. There’s a test that enforces it.
- Half your battle with old ML code is the dependency archaeology. scipy, numpy, datasets, torchaudio — every one had quietly moved on. Pin like it’s 2023.
Say the word.
macOS · Apple Silicon · needs Claude Code ≥ 2.1.139. The whole training pipeline is reproducible — one workflow, your phrases, free runners.
$ hey-claude models use hey_computer ✓ active wake word: "hey computer" $ hey-claude listening — say "hey computer, fix the failing tests"…