Metadata-Version: 2.4
Name: punt-vox
Version: 4.7.6
Summary: Text-to-speech CLI, MCP server, and Claude Code plugin (ElevenLabs, AWS Polly, OpenAI)
Keywords: tts,vox,text-to-speech,mcp,elevenlabs,aws-polly,openai
Author: Punt Labs
Author-email: Punt Labs <hello@punt-labs.com>
License-Expression: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Programming Language :: Python :: 3.13
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Dist: boto3>=1.35.0
Requires-Dist: boto3-stubs[polly]>=1.35.0
Requires-Dist: botocore-stubs>=1.35.0
Requires-Dist: elevenlabs>=2.0.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: openai>=1.0.0
Requires-Dist: pydub>=0.25.0
Requires-Dist: audioop-lts>=0.2.1
Requires-Dist: typer>=0.24.1
Requires-Dist: websockets>=14.0
Requires-Dist: mypy>=1.14.0 ; extra == 'dev'
Requires-Dist: pyright>=1.1.390 ; extra == 'dev'
Requires-Dist: ruff>=0.9.0 ; extra == 'dev'
Requires-Dist: pytest>=8.3.0 ; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24.0 ; extra == 'dev'
Requires-Dist: pytest-cov>=6.0.0 ; extra == 'dev'
Requires-Dist: punt-lux>=0.9.0 ; extra == 'lux'
Requires-Python: >=3.13
Project-URL: Homepage, https://github.com/punt-labs/vox
Project-URL: Repository, https://github.com/punt-labs/vox
Project-URL: Bug Tracker, https://github.com/punt-labs/vox/issues
Provides-Extra: dev
Provides-Extra: lux
Description-Content-Type: text/markdown

# punt-vox

> Voice for your AI coding assistant.

[![License](https://img.shields.io/github/license/punt-labs/vox)](LICENSE)
[![CI](https://img.shields.io/github/actions/workflow/status/punt-labs/vox/test.yml?label=CI)](https://github.com/punt-labs/vox/actions/workflows/test.yml)
[![PyPI](https://img.shields.io/pypi/v/punt-vox)](https://pypi.org/project/punt-vox/)
[![Python](https://img.shields.io/pypi/pyversions/punt-vox)](https://pypi.org/project/punt-vox/)
[![Working Backwards](https://img.shields.io/badge/Working_Backwards-hypothesis-lightgrey)](./prfaq.pdf)

When Claude Code finishes a task, hits an error, or needs your approval --- you hear it. No need to watch the terminal. Keep working; your assistant will tell you what happened.

**Platforms:** macOS, Linux

## Hear It

Real samples generated by vox with ElevenLabs v3. The first three are the same recap with different `/vibe` moods --- expressive tags change how the voice sounds without changing the words.

| Sample | Vibe | Voice | Audio |
|--------|------|-------|-|
| Task recap | neutral | sarah | [listen](https://github.com/punt-labs/vox/releases/download/v1.2.4/recap-neutral.mp4) |
| Same recap | `[excited]` | sarah | [listen](https://github.com/punt-labs/vox/releases/download/v1.2.4/recap-excited.mp4) |
| Same recap | `[weary] [sighs]` | sarah | [listen](https://github.com/punt-labs/vox/releases/download/v1.2.4/recap-weary.mp4) |
| Task complete | neutral | matilda | [listen](https://github.com/punt-labs/vox/releases/download/v1.2.4/task-complete.mp4) |

## Quick Start

```bash
curl -fsSL https://raw.githubusercontent.com/punt-labs/vox/0444cf2/install.sh | sh
```

Restart Claude Code, then:

```text
/vox y        # hear when tasks complete or need input
/recap        # spoken summary of what just happened
```

<details>
<summary>Manual install (if you already have uv)</summary>

```bash
uv tool install punt-vox
vox install
vox doctor
```

</details>

<details>
<summary>Verify before running</summary>

```bash
curl -fsSL https://raw.githubusercontent.com/punt-labs/vox/0444cf2/install.sh -o install.sh
shasum -a 256 install.sh
cat install.sh
sh install.sh
```

</details>

## Configure providers

The Quick Start gets you running with the OS's built-in voice: `say` on macOS, `espeak-ng` on Linux. For a natural-sounding voice, configure any cloud provider. The `/vibe` expressive tags (`[excited]`, `[weary]`, `[sighs]`, etc.) require ElevenLabs specifically; it is the only provider that supports them today.

### 1. Get an API key

| Provider | Where to sign up | Free tier |
|---|---|---|
| **ElevenLabs** (recommended) | [elevenlabs.io](https://elevenlabs.io/sign-up) → [Settings → API Keys](https://elevenlabs.io/app/settings/api-keys) | 10k characters/month |
| **OpenAI** | [platform.openai.com](https://platform.openai.com) → [API Keys](https://platform.openai.com/api-keys) | None — pay-as-you-go |
| **AWS Polly** | Any AWS account; create an IAM user with the `AmazonPollyReadOnlyAccess` policy | 5M chars/month, first 12 months |

### 2. Add the keys to `~/.punt-labs/vox/keys.env`

The keys file is in your home directory and owned by you — open it in your normal editor, no sudo:

```bash
nano ~/.punt-labs/vox/keys.env   # or vi, code, etc.
```

Paste any of these lines that apply. All are optional; vox auto-detects which providers are configured.

```ini
# ElevenLabs — recommended
ELEVENLABS_API_KEY=sk_...

# OpenAI
OPENAI_API_KEY=sk-proj-...

# AWS Polly via an aws CLI profile (recommended if you use the AWS CLI)
AWS_PROFILE=default
AWS_DEFAULT_REGION=us-east-1

# AWS Polly via raw credentials (alternative to a profile)
# AWS_ACCESS_KEY_ID=AKIA...
# AWS_SECRET_ACCESS_KEY=...
# AWS_DEFAULT_REGION=us-east-1

# Optional: pin a specific provider.
# Auto-detect order: ElevenLabs > OpenAI > Polly > say (macOS) / espeak (Linux).
TTS_PROVIDER=elevenlabs
```
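
If you want to script against this file, the sketch below reads a `KEY=VALUE` file like the one above. It assumes the format is exactly what the example shows (plain `KEY=VALUE` lines plus `#` comments); `parse_keys_env` is a hypothetical helper, not part of vox, and the daemon's own parser may treat quoting or edge cases differently.

```python
def parse_keys_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments (assumed format)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Ignore empty lines, comments, and anything without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


sample = """\
# ElevenLabs
ELEVENLABS_API_KEY=sk_example
# AWS_ACCESS_KEY_ID=AKIA...
AWS_PROFILE=default
"""
configured = parse_keys_env(sample)
print(sorted(configured))
```

Commented-out lines such as the raw AWS credentials are skipped, so only the keys you have actually enabled show up in the result.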

### 3. Restart the daemon to pick up the changes

```bash
# Linux
sudo systemctl restart voxd

# macOS
sudo launchctl kickstart -k system/com.punt-labs.voxd
```

This is the only sudo prompt for routine key management: `voxd` is registered as a system-level service, and `systemctl` and `launchctl` require root to manage system-level services. Editing `keys.env` itself is sudo-free.

### 4. Verify

```bash
vox doctor                         # report system checks and the daemon's active provider
vox unmute "hello from vox"        # speak through the default provider
```

`vox doctor` reports the Python version, ffmpeg/espeak presence, daemon status, and which provider the running daemon is currently using. `vox unmute` should speak the phrase through your speakers within a few seconds.

If something doesn't work, the daemon log at `~/.punt-labs/vox/logs/voxd.log` captures the spawn command, audio session env, exit code, elapsed time, and player stderr — enough detail to diagnose most failures without any extra tooling.

## Upgrading

`uv tool upgrade punt-vox` replaces the wheel on disk, but it does **not** restart the long-running `voxd` daemon. Until you cycle the daemon, the old process silently ignores any change that touches daemon behavior, such as new WebSocket fields, new dedup semantics, or new CLI flags that voxd has to parse. Always restart the daemon after an upgrade:

```bash
# macOS or Linux — identical command now
uv tool upgrade punt-vox
vox daemon restart
```

Run `vox daemon restart` as your normal user, **not** under `sudo`. The command refuses to run as root and prompts for sudo internally only for the two service-manager calls (`systemctl`/`launchctl`) that actually need it. It stops voxd via the service manager, waits for the port to free, starts it again, and polls the authenticated health endpoint until the new process is confirmed running. It prints the new PID and port on success, or points you at `~/.punt-labs/vox/logs/voxd.log` on failure.
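
The stop, wait, start, poll sequence described above follows a common pattern. The sketch below shows one way to express the two waiting steps in Python. It is an illustration only: `port_is_free` and `wait_until` are hypothetical names, the timeouts are arbitrary, and the authenticated health check that vox performs is not modelled here.

```python
import socket
import time
from collections.abc import Callable


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True when nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.2)
        # connect_ex returns 0 on success, an errno value otherwise.
        return sock.connect_ex((host, port)) != 0


def wait_until(
    check: Callable[[], bool], timeout: float = 10.0, interval: float = 0.1
) -> bool:
    """Poll check() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    # One last attempt so a slow final interval is not counted as a failure.
    return check()
```

A restart helper built on these would call `wait_until(lambda: port_is_free(port))` after stopping the service, then `wait_until(...)` again with a health probe after starting it.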

To confirm the daemon and the installed wheel agree:

```bash
vox doctor
```

`vox doctor` now reports the running daemon version alongside the reachability check. When the running daemon does not match the wheel installed on disk, doctor emits a yellow `⚠ Daemon: running ... (version X — wheel has Y, run 'vox daemon restart' to refresh)` warning. Exit code stays 0 — the daemon is still functional — but the warning catches stale daemons at smoke-test time instead of in production.

Doctor also inspects `~/.config/systemd/user/vox.service` on Linux if it exists. An earlier install layout left a user-level unit behind with `ExecStart=.../vox serve`, a subcommand that no longer exists in the CLI; any surviving file crash-loops on the systemd restart schedule. Doctor fails loudly with a remediation hint when the referenced subcommand is not in the current CLI, and `vox daemon install` now removes the stale unit automatically on upgrade. macOS does not use systemd, so the check is skipped there.

## Features

- **Notification layer** --- spoken summaries when tasks finish, chimes when Claude needs input
- **Session vibe** --- `/vibe` sets the mood for all speech. Auto-mode reads session signals (test results, lint, git ops) and adapts the voice. Manual mode lets you set it yourself. ElevenLabs expressive tags (`[weary]`, `[excited]`, `[sighs]`) color every utterance.
- **Five providers** --- ElevenLabs, OpenAI, AWS Polly, macOS `say`, and Linux `espeak-ng`. The full experience (natural voice, expressive tags, `/vibe`) requires ElevenLabs.
- **Opt-in only** --- no audio until you enable it, no surprises
- **Voice or chime** --- `/mute` switches to audio tones, no TTS API calls
- **Graceful absence** --- if punt-vox isn't installed, Claude Code works exactly as before
- **MCP-native** --- runs as a Claude Code plugin with slash commands and hooks
- **Audio daemon** --- `voxd` is a system-level audio server that handles synthesis and playback. Deduplicates audio across sessions, serializes playback, caches synthesis results
- **Background music** --- `/music on` generates vibe-driven instrumental tracks via the ElevenLabs Music API and loops them at low volume while you work. When the vibe changes, a new track generates to match. Style modifiers (`/music on style techno`) persist across invocations. Requires an ElevenLabs paid plan; each track costs ~2,000 credits

## What It Looks Like

### Enable notifications

```text
> /vox y

Vox enabled. You'll hear when tasks finish or need approval.
Pick a voice with /unmute @<name>.
```

### Get a recap

```text
> /recap

Speaking: "I refactored the authentication module into three files, added
comprehensive tests for the token refresh flow, and fixed a race condition
in the session middleware. All 47 tests pass."
```

### Set the vibe

```text
> /vibe banging my head against the wall

Vibe: banging my head against the wall → [frustrated] [sighs] [manual]
```

Auto-mode (default) reads session signals and adapts automatically --- after a string of test failures the voice sounds `[weary]`, after a successful release it sounds `[excited]`.

### Switch to chime-only

```text
> /mute

Muted — chimes only.
```

Chimes are mood-aware: when a vibe is active, chimes pitch-shift to match (bright for happy sessions, dark for frustrated ones). Eight distinct signals (tests pass/fail, lint pass/fail, git push, merge conflict, done, prompt) × three mood variants = 24 chime assets.
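
The arithmetic is a plain cross product. The sketch below enumerates it in Python; the signal identifiers and the three mood names are invented for illustration (the README only says there are eight signals and three mood variants), so the real asset names will differ.

```python
from itertools import product

# Hypothetical identifiers for the eight signals listed above.
SIGNALS = [
    "tests-pass", "tests-fail", "lint-pass", "lint-fail",
    "git-push", "merge-conflict", "done", "prompt",
]
# Assumed names for the three mood variants.
MOODS = ["neutral", "bright", "dark"]

assets = [f"{signal}-{mood}" for signal, mood in product(SIGNALS, MOODS)]
print(len(assets))  # 24
```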

## Commands

| Command | Purpose |
|---------|---------|
| `/vox y` | Enable vox (chime notifications) |
| `/vox n` | Disable vox |
| `/vox c` | Continuous mode (spoken summaries on task completion) |
| `/unmute` | Enable voice mode (spoken notifications) |
| `/unmute @matilda` | Set session voice + enable voice |
| `/unmute @` | Browse voice roster |
| `/mute` | Chimes only --- no voice |
| `/recap` | Spoken summary of Claude's last response |
| `/vibe <mood>` | Set session mood --- voice adapts to match |
| `/vibe auto` | Auto-detect mood from session signals (default) |
| `/vibe off` | Disable vibe --- neutral voice |
| `/music on` | Start vibe-driven background music |
| `/music on style techno` | Start music with a style modifier |
| `/music off` | Stop background music |

## Providers

The full experience --- natural voice with expressive tags that respond to `/vibe` --- requires ElevenLabs. The other providers are fallbacks for environments where ElevenLabs isn't available.

| Provider | API Key | Default Voice | Best For |
|----------|---------|---------------|----------|
| **ElevenLabs** | `ELEVENLABS_API_KEY` | matilda | **Recommended.** Natural voice, expressive tags via `/vibe` |
| OpenAI | `OPENAI_API_KEY` | nova | Fast notifications, low latency |
| AWS Polly | AWS credentials | joanna | Natural voice, cost-effective |
| macOS say | — | samantha | Zero-config on macOS, offline |
| espeak-ng | — | en | Zero-config on Linux, offline |

Auto-detection order: ElevenLabs > OpenAI > Polly (if AWS credentials valid) > say (macOS) / espeak (Linux).
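
As a rough model of that order, the Python sketch below walks the same chain. It is not vox's implementation: `detect_provider` is a hypothetical function, the provider identifiers other than `elevenlabs` are assumed, and the Polly branch only checks that a credential variable is present, whereas vox is described as checking that the AWS credentials are valid.

```python
def detect_provider(env: dict[str, str], platform: str) -> str:
    """Return the first configured provider in the documented order."""
    # An explicit pin wins over auto-detection.
    pinned = env.get("TTS_PROVIDER")
    if pinned:
        return pinned
    if env.get("ELEVENLABS_API_KEY"):
        return "elevenlabs"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    # Assumption: presence of either variable stands in for "credentials valid".
    if env.get("AWS_PROFILE") or env.get("AWS_ACCESS_KEY_ID"):
        return "polly"
    # Built-in fallback; platform uses sys.platform-style values.
    return "say" if platform == "darwin" else "espeak"
```

With an empty environment the function falls through to the OS voice, which matches the zero-config behavior described in the Quick Start.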

### Per-call API keys for billing isolation

If you maintain multiple provider API keys for cost attribution (for
example, separate ElevenLabs keys for different projects), you can
pass a per-call override for any `vox unmute` invocation. The override
is per-call only: never persisted to `keys.env`, never logged by the
daemon, never echoed to stdout, never visible to concurrent requests
on the same daemon. Four input paths are supported, from most to
least secure:

1. **Environment variable** (recommended for scripting):

   ```bash
   export VOX_API_KEY="$(pass show vox/proj_a)"
   vox unmute "billable to project A"
   ```

   On Linux, `VOX_API_KEY` is exposed via `/proc/<pid>/environ`,
   which is typically readable only by the process owner. macOS has
   no Linux-style `/proc` filesystem, so env vars are not exposed
   that way by default. On both platforms env vars are generally
   less visible than `argv` (which `ps` prints on any shared
   system), so they are materially safer than passing the key
   literally on the command line.

2. **File** (recommended for stored keys):

   ```bash
   vox unmute "billable to project A" \
     --api-key-file ~/.config/vox/key_project_a.txt
   ```

   The file should be mode 0600 (owner read/write only). `vox` warns
   if any group or other permission bits are set and suggests
   `chmod 600`.

3. **Standard input** (recommended for password managers):

   ```bash
   pass show vox/proj_a | vox unmute "billable to project A" --api-key-stdin
   ```

   Reads one line from stdin. Refuses to read from a tty so a
   forgotten pipe fails loudly instead of blocking on an interactive
   prompt.

4. **Command line** (demo only — **not** for real credentials):

   ```bash
   vox unmute "billable to project A" --api-key sk_demo_key
   ```

   **Warning**: `--api-key` on the command line exposes the value
   via `ps` (and, on Linux, `/proc/*/cmdline`), shell history, and
   terminal recordings. `vox` prints a stderr warning whenever you
   use it. Use one of the other three paths for real credentials.

The four paths are mutually exclusive; specifying more than one is an
error. This is **not** multi-tenant isolation — vox is a single-user
tool. The feature is for attributing synthesis cost to the right
project within one user's account, not for isolating tenants.

Calls that carry a per-call `api_key` bypass the synthesis cache, so
every invocation reaches the provider. To get cache hits, omit the
override and let vox use the keys in `keys.env`.

## Architecture

```text
Claude Code ◄── stdio ──► vox mcp ── WebSocket ──► voxd :8421
                                                      │
Hook scripts ──► vox hook <event> ── WebSocket ──►    │
                                                      │
Shell        ──► vox unmute "hi"  ── WebSocket ──►    │
                                                      ▼
                                                   speakers
```

**`voxd`** is a system-level audio daemon. It synthesizes text via TTS providers and plays audio through the speakers. It owns the playback queue (sequential, no overlap), deduplicates identical requests within 5 seconds, and caches synthesis results. It knows nothing about MCP, hooks, projects, or Claude Code.
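
The 5-second dedup window can be pictured with a small Python sketch. This is a simplified model, not the daemon's code: `Deduper` is a hypothetical class, it keys on the text alone (the real daemon may also consider voice or provider), and the clock is injectable only so the behavior can be demonstrated without sleeping.

```python
import time
from collections.abc import Callable


class Deduper:
    """Drop requests whose text was already accepted within the window."""

    def __init__(
        self, window: float = 5.0, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._window = window
        self._clock = clock
        self._last_seen: dict[str, float] = {}

    def should_play(self, text: str) -> bool:
        now = self._clock()
        last = self._last_seen.get(text)
        if last is not None and now - last < self._window:
            return False  # duplicate inside the window: skip it
        self._last_seen[text] = now
        return True
```

Two sessions that both announce "tests passed" within a few seconds would produce one utterance under this model; the same text sent again after the window has elapsed is played normally.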

**`vox mcp`** is a lightweight stdio MCP server, one per Claude Code session. It holds session state (voice, vibe, notify mode) in memory and delegates synthesis to `voxd` over WebSocket. It inherits its working directory from Claude Code and finds `.vox/config.md` by walking up from there.

**`vox hook <event>`** handlers call `voxd` for chimes and speech. Hook shell scripts are thin gates per the [hooks standard](https://github.com/punt-labs/punt-kit/blob/main/standards/hooks.md).

**`vox unmute`** and other CLI commands are one-shot WebSocket clients of `voxd`.

### State Paths

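`voxd` runs as a single user (`User=` in the systemd unit, `UserName` in the launchd plist), so all of its state is per-user, not system-shared. Config, logs, runtime state, and cache live under the installing user's home directory (nothing in `/etc` or `/var`), with the same layout on macOS and Linux. The only system-wide files are the service definitions in the last two rows of the table.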

| Purpose | Path |
|---------|------|
| Config (API keys) | `~/.punt-labs/vox/keys.env` |
| Logs | `~/.punt-labs/vox/logs/voxd.log` |
| Runtime state | `~/.punt-labs/vox/run/serve.{port,token}` |
| Cache | `~/.punt-labs/vox/cache/` |
| Service unit (Linux) | `/etc/systemd/system/voxd.service` |
| Service plist (macOS) | `/Library/LaunchDaemons/com.punt-labs.voxd.plist` |
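
The `serve.{port,token}` entry is shorthand for two files, `serve.port` and `serve.token`. The Python sketch below shows how a client might read them. It assumes each file holds a single plain-text value, which this README does not confirm; `read_runtime_state` is a hypothetical helper.

```python
from pathlib import Path


def read_runtime_state(run_dir: Path) -> tuple[int, str]:
    """Read the daemon's port and token (assumed to be plain-text files)."""
    port = int((run_dir / "serve.port").read_text().strip())
    token = (run_dir / "serve.token").read_text().strip()
    return port, token


# Location from the table above.
default_run_dir = Path.home() / ".punt-labs" / "vox" / "run"
```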

### Service Install

```bash
vox daemon install    # registers service, writes keys.env, starts voxd
```

vox prompts once for your sudo password when it installs the system service unit. Everything else runs as your normal user. The `keys.env` file and all other per-user state are created in your home dir with normal user permissions — no chown, no fd tricks, no symlink defenses. The daemon runs as the installing user, not root — it needs audio device access tied to the desktop session.

**Upgrading from v3 or v4.0.x?** If you had cloud provider keys configured before v3.0.0 (2026-03-29), they will work again automatically after you upgrade. v3 moved voxd's config dir to `/etc/vox/` but never migrated your existing `~/.punt-labs/vox/keys.env` — this release reverts the path and your pre-v3 keys come back online without any manual intervention.

### Session State

Session state (voice, provider, vibe, notify mode) lives in the MCP server's memory. The daemon is stateless with respect to sessions. Per-project enablement and initial state are read from `.vox/config.md` in the project directory at MCP server startup. Hook handlers also read and write `.vox/config.md` for signal accumulation (`vibe_signals`). The daemon never reads this file.

### Daemon Restart

The MCP session (Claude Code ↔ `vox mcp`) is stdio — unaffected by daemon restarts. The WebSocket connection (`vox mcp` ↔ `voxd`) reconnects automatically. No session data is lost.

## CLI

punt-vox is also a standalone TTS tool, independent of Claude Code.

```bash
vox unmute "Hello world"                       # Synthesize + play
vox unmute "Wall broadcast" --once 600         # Dedup identical text within 600s (for N-session broadcasts)
vox record "Hello world" -o hello.mp3          # Synthesize + save
vox record --from segments.json                # From JSON segments file
vox vibe excited                               # Set session mood
vox notify y                                   # Enable notifications
vox notify c                                   # Continuous spoken mode
vox speak n                                    # Chimes only
vox voice matilda                              # Set session voice
vox music on                                   # Start background music
vox music on --style techno                    # Start music with style modifier
vox music off                                  # Stop background music
vox status                                     # Current state
vox version                                    # Print version
vox doctor                                     # Check setup
vox install                                    # Install Claude Code plugin
vox mcp                                        # Start MCP server (stdio)
voxd                                           # Start audio daemon
vox daemon install                             # Register voxd as system service + write API keys (prompts once for sudo)
vox daemon status                              # Check if daemon is running
```

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `TTS_PROVIDER` | Force a specific provider | auto-detect |
| `TTS_MODEL` | Model override | provider default |
| `VOX_OUTPUT_DIR` | Output directory | `~/vox-output` |

Provider API keys (`ELEVENLABS_API_KEY`, `OPENAI_API_KEY`, `AWS_*`) live in `~/.punt-labs/vox/keys.env`, not in your shell rc. See [Configure providers](#configure-providers) for the full walkthrough.

`vox daemon install` also seeds `keys.env` with any provider keys that happen to be set in the shell that runs the install, so setting them in `.envrc` or similar before running the installer works too. Either way, edits after install go directly into the file.
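
The seeding step amounts to filtering the shell environment down to provider variables. A minimal Python sketch, assuming the set of names is exactly the one listed above (`ELEVENLABS_API_KEY`, `OPENAI_API_KEY`, `AWS_*`); `seed_lines` is a hypothetical function and the installer may select or order the variables differently.

```python
from collections.abc import Mapping


def seed_lines(environ: Mapping[str, str]) -> list[str]:
    """Render KEY=VALUE lines for provider variables found in environ."""
    names = sorted(
        name
        for name in environ
        if name in ("ELEVENLABS_API_KEY", "OPENAI_API_KEY") or name.startswith("AWS_")
    )
    return [f"{name}={environ[name]}" for name in names]
```

Passing `os.environ` would produce the lines for the current shell; unrelated variables such as `PATH` are left out.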

## Roadmap

### Shipped

- **Mic API**: unified `unmute`/`record`/`vibe`/`who` MCP tools with segment-based input
- Notification layer: `/vox y|n|c`, `/mute`, `/unmute`, `/recap`, Stop + Notification hooks
- Multi-provider TTS engine: ElevenLabs, AWS Polly, OpenAI, macOS `say`, Linux `espeak-ng`
- Claude Code plugin: marketplace install, MCP server, slash commands
- CLI: unmute, record, vibe, on/off, mute, version, status, doctor
- Two-channel display: `♪` panel summaries with voice/provider context
- ElevenLabs streaming API for lower time-to-first-audio
- `/vibe` with auto, manual, and off modes --- ElevenLabs expressive tags color every utterance
- Auto-vibe signal accumulator: test pass/fail, lint, git ops feed mood detection
- Per-signal chime assets and vibe-driven chimes with mood-aware pitch shifting
- Audio daemon (`voxd`): system-level audio server with in-memory playback queue, dedup, synthesis cache, launchd/systemd service management

### Coming Soon

| Feature | What It Does |
|---------|-------------|
| **Per-session voices** | Each Claude Code session gets its own voice from a pool --- no more five matildas talking at once. `/voice` to audition and pick. |

## Documentation

[Architecture (PDF)](docs/architecture.pdf) |
[Design Log](DESIGN.md) |
[Testing](TESTING.md) |
[Changelog](CHANGELOG.md)

## Development

```bash
uv sync --all-extras    # Install dependencies
make check              # Run all quality gates
```

## License

MIT
