Metadata-Version: 2.4
Name: soliloquy-tts
Version: 0.8.0
Summary: A text-to-speech MCP server powered by Kokoro — gives Claude Code a voice
Project-URL: Homepage, https://gitlab.com/bw-stovall/soliloquy
Project-URL: Repository, https://gitlab.com/bw-stovall/soliloquy
Project-URL: Issues, https://gitlab.com/bw-stovall/soliloquy/-/issues
Author: Barry Stovall
License: MIT
License-File: LICENSE
Keywords: claude,kokoro,mcp,text-to-speech,tts,voice
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.10
Requires-Dist: kokoro>=0.9.4
Requires-Dist: mcp>=1.1.3
Requires-Dist: numpy>=2.0
Requires-Dist: rumps>=0.4.0; sys_platform == 'darwin'
Requires-Dist: sounddevice>=0.4.6
Provides-Extra: dev
Requires-Dist: anyio; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-anyio; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Description-Content-Type: text/markdown

# Soliloquy

[![PyPI version](https://img.shields.io/pypi/v/soliloquy-tts)](https://pypi.org/project/soliloquy-tts/)
[![Python](https://img.shields.io/pypi/pyversions/soliloquy-tts)](https://pypi.org/project/soliloquy-tts/)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![PyPI downloads](https://img.shields.io/pypi/dm/soliloquy-tts)](https://pypi.org/project/soliloquy-tts/)

A text-to-speech MCP server powered by [Kokoro](https://github.com/hexgrad/kokoro) — gives Claude Code a voice.

**One command to install. No config, no API keys, no cloud.**

## Why Soliloquy?

| | Cloud TTS (ElevenLabs, OpenAI, etc.) | **Soliloquy** |
|---|---|---|
| **Privacy** | Text sent to third-party servers | Runs entirely on your machine |
| **Cost** | $0.18-15/1M chars | Free forever |
| **Offline** | No | Yes |
| **Usage Limits** | Quotas / rate limits | Unlimited |
| **Latency** | 200-500ms (network) | ~50-100ms (local) |
| **AI Integration** | Developer calls API from code | AI agent decides when to speak |
| **Setup** | API keys + billing | One command, no config |

## What You Can Do

Once installed, just talk to Claude naturally:

- **"turn on auto speak"** — automatically voice conversational responses
- **"read this file aloud"** — listen to docs, chapters, or articles
- **"speak aloud"** — voice a specific response
- **"stop"** — stop audio playback

### Auto-Speak

The flagship feature. Auto-speak voices each of Claude's conversational responses automatically through a background hook, with no tool call and no token overhead.

> "Turn on auto speak"

From that point on, everything Claude says, you hear. Toggle it off just as easily:

> "Turn off auto speak"

When you use `read_aloud` or `speak` explicitly, auto-speak steps aside and lets the explicit playback finish.
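
One way to picture this precedence rule is a shared gate that explicit playback holds while it runs. The sketch below is hypothetical: `PlaybackGate` and its method names are illustrative only and are not part of Soliloquy's API.

```python
import threading


class PlaybackGate:
    """Hypothetical coordinator: explicit playback takes priority over auto-speak."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._explicit_active = 0  # number of explicit playbacks in progress

    def begin_explicit(self) -> None:
        # Called when a `speak` or `read_aloud` request starts playing.
        with self._lock:
            self._explicit_active += 1

    def end_explicit(self) -> None:
        # Called when that explicit playback finishes or is stopped.
        with self._lock:
            self._explicit_active = max(0, self._explicit_active - 1)

    def auto_speak_allowed(self) -> bool:
        # Auto-speak only proceeds while no explicit playback is running.
        with self._lock:
            return self._explicit_active == 0
```

A counter rather than a boolean keeps the gate correct if two explicit requests happen to overlap.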

### Speech Normalization

Soliloquy doesn't read text literally. It rewrites markup before speaking so the result sounds natural:

- **Code blocks** become "See the code below" instead of reading syntax aloud
- **Tables** are summarized ("There's a table here with 5 rows")
- **Symbols** are spoken naturally (arrows become "to", URLs are simplified)
- **Lists** are enumerated ("First... Second... Third...")
- **Paragraph breaks** produce natural pauses between sections

Technical content and markdown-heavy responses sound like a person reading them to you, not a robot parsing characters.
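
The rules above can be approximated with a few regular expressions. This is a rough sketch for illustration, not Soliloquy's actual normalizer: the function name `normalize_for_speech`, the exact spoken phrases, and the handling of edge cases are all assumptions, and URL simplification is left out.

```python
import re

FENCE = "`" * 3  # the Markdown code-fence marker
CODE_BLOCK = re.compile(FENCE + r".*?" + FENCE, re.DOTALL)
TABLE_SEPARATOR = re.compile(r"\|[\s:|-]+\|?")  # rows such as |---|---|
LIST_ITEM = re.compile(r"(?:[-*]|\d+\.)\s+(.*)")
ORDINALS = ["First", "Second", "Third", "Fourth", "Fifth"]


def normalize_for_speech(markdown: str) -> str:
    """Illustrative normalization of Markdown into speakable text."""
    # Replace fenced code blocks with a short spoken placeholder.
    text = CODE_BLOCK.sub("See the code below.", markdown)

    out = []
    table_lines = 0  # header + data rows seen in the current table
    item = 0         # position within the current list
    for line in text.splitlines() + [""]:
        stripped = line.strip()
        if stripped.startswith("|"):
            if not TABLE_SEPARATOR.fullmatch(stripped):
                table_lines += 1
            continue
        if table_lines:
            # The first counted line is the header, so exclude it.
            out.append(f"There's a table here with {table_lines - 1} rows.")
            table_lines = 0
        match = LIST_ITEM.match(stripped)
        if match:
            label = ORDINALS[item] if item < len(ORDINALS) else "Next"
            out.append(f"{label}, {match.group(1)}")
            item += 1
            continue
        item = 0
        out.append(stripped)

    # Speak arrows as "to".
    return re.sub(r"\s*(?:->|→)\s*", " to ", "\n".join(out)).strip()
```

Blank lines are kept in the output so that a synthesizer can treat them as pauses between sections.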

## Requirements

- **macOS**, **Windows**, or **Linux**
- **Python 3.10+**
- **PortAudio** (audio output library)

| Platform | Install PortAudio |
|----------|------------------|
| macOS | `brew install portaudio` |
| Windows | Bundled with sounddevice (no action needed) |
| Linux (Debian/Ubuntu) | `sudo apt install libportaudio2` |

> **Note:** The first install downloads about 2 GB of dependencies (PyTorch, model weights). The first run also downloads the Kokoro-82M model from Hugging Face.
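
If you want to check these prerequisites before installing, a small script can do it with the standard library alone. This is a sketch, not something Soliloquy ships: `check_requirements` is a hypothetical helper, and `ctypes.util.find_library` only reports whether a shared library named `portaudio` is discoverable, which is expected to fail on Windows where sounddevice bundles its own copy.

```python
import ctypes.util
import sys


def check_requirements(min_python: tuple[int, int] = (3, 10)) -> dict[str, bool]:
    """Report whether the interpreter and a system PortAudio look usable."""
    python_ok = sys.version_info >= min_python
    # find_library returns None when no shared library with this name is found.
    portaudio_found = ctypes.util.find_library("portaudio") is not None
    return {"python": python_ok, "portaudio": portaudio_found}


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```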

### Optional: macOS Menu Bar Control

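On macOS, `rumps` provides a menu bar icon that lets you stop playback instantly, without going through Claude Code. The package metadata lists it as a macOS dependency, so it is normally installed for you. If the icon is missing, install it manually: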

```bash
pip install rumps
```

Without it, everything still works — you just use the `stop` command through Claude Code instead.

## Quick Start

Make sure PortAudio is installed (see above), then:

```bash
uvx soliloquy-tts
```

That's it. This registers the MCP server, configures auto-speak, and sets everything up. Restart Claude Code afterward and you're good to go.

> Requires [uv](https://github.com/astral-sh/uv). Install it with `brew install uv` (macOS), `curl -LsSf https://astral.sh/uv/install.sh | sh` (Linux), or see the [uv docs](https://github.com/astral-sh/uv).

### What happens when you run it

1. Registers Soliloquy as an MCP server with Claude Code
2. Writes a hook script for automatic voicing
3. Configures the Claude Code Stop hook

You only need to do this once. After that, Claude Code starts Soliloquy automatically in the background whenever you open a session.
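
To make step 3 concrete, the sketch below adds a command to the `Stop` hook list of a settings dictionary. It assumes the nested `hooks` layout used by Claude Code's settings file as best understood here; the helper name `add_stop_hook` and the hook script path are hypothetical, and the installer's real logic may differ.

```python
import json


def add_stop_hook(settings: dict, command: str) -> dict:
    """Return a copy of `settings` with `command` registered under the Stop hook."""
    updated = json.loads(json.dumps(settings))  # deep copy of JSON-compatible data
    stop_entries = updated.setdefault("hooks", {}).setdefault("Stop", [])
    already_registered = any(
        hook.get("command") == command
        for entry in stop_entries
        for hook in entry.get("hooks", [])
    )
    if not already_registered:
        stop_entries.append({"hooks": [{"type": "command", "command": command}]})
    return updated


if __name__ == "__main__":
    # Hypothetical script path, used only for illustration.
    print(json.dumps(add_stop_hook({}, "/path/to/soliloquy-hook.sh"), indent=2))
```

Checking for an existing entry first keeps the operation idempotent, so running setup twice does not register the hook twice.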

### With pip

```bash
pip install soliloquy-tts
soliloquy
```

Same setup flow. Run `soliloquy` from your terminal and it handles the rest.

## How It Works

Soliloquy uses a hybrid architecture to share a single model across multiple Claude Code sessions:

- **First session** loads the Kokoro model and starts a local backend server
- **Additional sessions** detect the running backend and connect as lightweight proxies (near-instant startup, no extra memory)
- If the backend exits, the next session automatically takes over

This is completely transparent — no configuration needed.
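
The role decision can be sketched as a simple connection probe. This is a minimal illustration under stated assumptions: the README does not say how sessions detect the backend, so the TCP probe, the host and port values, and the names `backend_is_running` and `choose_role` are all hypothetical.

```python
import socket

# Hypothetical address; the real backend location is not documented here.
BACKEND_HOST = "127.0.0.1"
BACKEND_PORT = 8765


def backend_is_running(host: str = BACKEND_HOST, port: int = BACKEND_PORT,
                       timeout: float = 0.2) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_role(host: str = BACKEND_HOST, port: int = BACKEND_PORT) -> str:
    """Pick "proxy" when a backend already answers, otherwise "backend"."""
    return "proxy" if backend_is_running(host, port) else "backend"
```

A probe like this leaves a small window in which two sessions could both decide to become the backend; a real implementation would need to handle the losing side failing to bind and falling back to proxy mode.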

## Reference

### Tools

**`speak`** — Synthesize and play text aloud.

| Parameter | Default | Description |
|-----------|---------|-------------|
| `text` | *(required)* | Text to speak |
| `voice` | `af_heart` | Voice ID |
| `speed` | `1.0` | Speed multiplier (0.5 - 2.0) |
| `lang` | `en-us` | Language code |

**`read_aloud`** — Read a file aloud directly. Supports plain text and markdown.

| Parameter | Default | Description |
|-----------|---------|-------------|
| `path` | *(required)* | Path to the file to read |
| `voice` | `af_heart` | Voice ID |
| `speed` | `1.0` | Speed multiplier (0.5 - 2.0) |
| `lang` | `en-us` | Language code |
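
The optional parameters shared by `speak` and `read_aloud` can be modeled as a small validated record. This is a sketch of the documented defaults and ranges only: `SpeechOptions` is a hypothetical name, and rejecting out-of-range values with `ValueError` is an assumption, since the README does not say how Soliloquy treats them.

```python
from dataclasses import dataclass

# Language codes listed in this README.
LANGUAGES = {"en-us", "en-gb", "ja", "zh", "es", "fr", "hi", "it", "pt-br"}


@dataclass(frozen=True)
class SpeechOptions:
    """Optional parameters common to `speak` and `read_aloud`."""

    voice: str = "af_heart"
    speed: float = 1.0
    lang: str = "en-us"

    def __post_init__(self) -> None:
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError(f"speed must be between 0.5 and 2.0, got {self.speed}")
        if self.lang not in LANGUAGES:
            raise ValueError(f"unsupported language code: {self.lang!r}")
```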

**`stop`** — Stop audio playback immediately.

**`auto_speak`** — Toggle automatic voicing on or off.

**`list_voices`** — List all available voices.

### Voices

28 voices across American and British English. Default is `af_heart`.

<details>
  <summary>View all voices</summary>
  <table>
    <thead>
      <tr>
        <th>Voice</th>
        <th>Accent</th>
        <th>Gender</th>
      </tr>
    </thead>
    <tbody>
      <tr><td>af_heart</td><td>American</td><td>Female</td></tr>
      <tr><td>af_alloy</td><td>American</td><td>Female</td></tr>
      <tr><td>af_aoede</td><td>American</td><td>Female</td></tr>
      <tr><td>af_bella</td><td>American</td><td>Female</td></tr>
      <tr><td>af_jessica</td><td>American</td><td>Female</td></tr>
      <tr><td>af_kore</td><td>American</td><td>Female</td></tr>
      <tr><td>af_nicole</td><td>American</td><td>Female</td></tr>
      <tr><td>af_nova</td><td>American</td><td>Female</td></tr>
      <tr><td>af_river</td><td>American</td><td>Female</td></tr>
      <tr><td>af_sarah</td><td>American</td><td>Female</td></tr>
      <tr><td>af_sky</td><td>American</td><td>Female</td></tr>
      <tr><td>am_adam</td><td>American</td><td>Male</td></tr>
      <tr><td>am_echo</td><td>American</td><td>Male</td></tr>
      <tr><td>am_eric</td><td>American</td><td>Male</td></tr>
      <tr><td>am_fenrir</td><td>American</td><td>Male</td></tr>
      <tr><td>am_liam</td><td>American</td><td>Male</td></tr>
      <tr><td>am_michael</td><td>American</td><td>Male</td></tr>
      <tr><td>am_onyx</td><td>American</td><td>Male</td></tr>
      <tr><td>am_puck</td><td>American</td><td>Male</td></tr>
      <tr><td>am_santa</td><td>American</td><td>Male</td></tr>
      <tr><td>bf_alice</td><td>British</td><td>Female</td></tr>
      <tr><td>bf_emma</td><td>British</td><td>Female</td></tr>
      <tr><td>bf_isabella</td><td>British</td><td>Female</td></tr>
      <tr><td>bf_lily</td><td>British</td><td>Female</td></tr>
      <tr><td>bm_daniel</td><td>British</td><td>Male</td></tr>
      <tr><td>bm_fable</td><td>British</td><td>Male</td></tr>
      <tr><td>bm_george</td><td>British</td><td>Male</td></tr>
      <tr><td>bm_lewis</td><td>British</td><td>Male</td></tr>
    </tbody>
  </table>
</details>
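
The voice IDs in the table follow a visible pattern: the first letter encodes the accent, the second the gender, and the part after the underscore is the voice name. The helper below decodes that pattern. `describe_voice` is a hypothetical function written for this README, and it only covers the two accents listed here.

```python
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id: str) -> tuple[str, str, str]:
    """Split an ID such as "af_heart" into (accent, gender, name)."""
    prefix, sep, name = voice_id.partition("_")
    valid = (
        sep == "_"
        and len(prefix) == 2
        and prefix[0] in ACCENTS
        and prefix[1] in GENDERS
        and name != ""
    )
    if not valid:
        raise ValueError(f"unrecognized voice ID: {voice_id!r}")
    return ACCENTS[prefix[0]], GENDERS[prefix[1]], name
```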

### Languages

`en-us` (default), `en-gb`, `ja`, `zh`, `es`, `fr`, `hi`, `it`, `pt-br`

## Uninstall

```bash
soliloquy --uninstall
```

This removes the MCP server registration, auto-speak hook, and all config files. Works with `uvx soliloquy-tts --uninstall` too.

## Development

```bash
git clone https://gitlab.com/bw-stovall/soliloquy.git
cd soliloquy
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/ -v
```

## License

MIT
