Metadata-Version: 2.4
Name: soliloquy-tts
Version: 0.9.0
Summary: A text-to-speech MCP server powered by Kokoro — gives Claude Code a voice
Project-URL: Homepage, https://gitlab.com/bw-stovall/soliloquy
Project-URL: Repository, https://gitlab.com/bw-stovall/soliloquy
Project-URL: Issues, https://gitlab.com/bw-stovall/soliloquy/-/issues
Author: Barry Stovall
License: MIT
License-File: LICENSE
Keywords: claude,kokoro,mcp,text-to-speech,tts,voice
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.10
Requires-Dist: kokoro>=0.9.4
Requires-Dist: mcp>=1.1.3
Requires-Dist: numpy>=2.0
Requires-Dist: rumps>=0.4.0; sys_platform == 'darwin'
Requires-Dist: sounddevice>=0.4.6
Provides-Extra: dev
Requires-Dist: anyio; extra == 'dev'
Requires-Dist: beautifulsoup4>=4.12; extra == 'dev'
Requires-Dist: ebooklib>=0.18; extra == 'dev'
Requires-Dist: pdfplumber>=0.10; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-anyio; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: reportlab; extra == 'dev'
Provides-Extra: documents
Requires-Dist: beautifulsoup4>=4.12; extra == 'documents'
Requires-Dist: ebooklib>=0.18; extra == 'documents'
Requires-Dist: pdfplumber>=0.10; extra == 'documents'
Provides-Extra: epub
Requires-Dist: beautifulsoup4>=4.12; extra == 'epub'
Requires-Dist: ebooklib>=0.18; extra == 'epub'
Provides-Extra: pdf
Requires-Dist: pdfplumber>=0.10; extra == 'pdf'
Description-Content-Type: text/markdown

# Soliloquy

[![PyPI version](https://img.shields.io/pypi/v/soliloquy-tts)](https://pypi.org/project/soliloquy-tts/)
[![Python](https://img.shields.io/pypi/pyversions/soliloquy-tts)](https://pypi.org/project/soliloquy-tts/)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![PyPI downloads](https://img.shields.io/pypi/dm/soliloquy-tts)](https://pypi.org/project/soliloquy-tts/)

A text-to-speech MCP server powered by [Kokoro](https://github.com/hexgrad/kokoro) — gives Claude Code a voice.

**One command to install. No config, no API keys, no cloud.**

## Why Soliloquy?

| | Cloud TTS (ElevenLabs, OpenAI, etc.) | **Soliloquy** |
|---|---|---|
| **Privacy** | Text sent to third-party servers | Runs entirely on your machine |
| **Cost** | $0.18-15/1M chars | Free forever |
| **Offline** | No | Yes |
| **Usage Limits** | Quotas / rate limits | Unlimited |
| **Latency** | 200-500ms (network) | ~50-100ms (local) |
| **AI Integration** | Developer calls API from code | AI agent decides when to speak |
| **Setup** | API keys + billing | One command, no config |

## What You Can Do

Once installed, just talk to Claude naturally:

- **"turn on auto speak"** — automatically voice conversational responses
- **"read this file aloud"** — listen to docs, chapters, or articles (text, markdown, PDF, EPUB)
- **"read this book"** — EPUBs are treated as audiobooks: chapters announced, resume on next call
- **"pick up where we left off"** — resume any book you've started
- **"jump to chapter 5"** / **"read the preface"** / **"next chapter"** — navigate around
- **"speak aloud"** — voice a specific response
- **"stop"** — stop audio playback

### Auto-Speak

The flagship feature. Auto-speak voices each of Claude's conversational responses through a background hook, so it needs no tool call and adds no token overhead.

> "Turn on auto speak"

From that point on, everything Claude says, you hear. Toggle it off just as easily:

> "Turn off auto speak"

When you use `read_aloud` or `speak` explicitly, auto-speak steps aside and lets the explicit playback finish.

### Speech Normalization

Soliloquy doesn't just read text literally — it understands what sounds natural:

- **Code blocks** become "See the code below" instead of reading syntax aloud
- **Tables** are summarized ("There's a table here with 5 rows")
- **Symbols** are spoken naturally (arrows become "to", URLs are simplified)
- **Lists** are enumerated ("First... Second... Third...")
- **Paragraph breaks** produce natural pauses between sections

Technical content and markdown-heavy responses sound like a person reading them to you, not a robot parsing characters.
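
As a rough illustration of the rules above, here is a minimal sketch of a normalizer that handles three of them (code blocks, symbols and URLs, lists). It is not Soliloquy's actual pipeline: the function name `normalize_for_speech`, the regular expressions, and the fallback word "Next" are all assumptions made for this example.

```python
import re

FENCE = "`" * 3  # a markdown code fence, built here to keep this example readable
ORDINALS = ["First", "Second", "Third", "Fourth", "Fifth"]


def normalize_for_speech(markdown: str) -> str:
    """Illustrative normalizer (hypothetical, not Soliloquy's real code)."""
    # Replace whole fenced code blocks with a spoken placeholder.
    text = re.sub(FENCE + r".*?" + FENCE, "See the code below.", markdown, flags=re.DOTALL)
    # Simplify URLs down to their host name.
    text = re.sub(r"https?://([^/\s]+)\S*", r"\1", text)
    # Speak ASCII and Unicode arrows as "to".
    text = re.sub(r"[ \t]*(?:->|→)[ \t]*", " to ", text)

    spoken = []
    item = 0  # position inside the current run of list items
    for line in text.splitlines():
        match = re.match(r"\s*(?:[-*]|\d+\.)\s+(.*)", line)
        if match:
            prefix = ORDINALS[item] if item < len(ORDINALS) else "Next"
            spoken.append(f"{prefix}, {match.group(1)}")
            item += 1
        else:
            item = 0  # any non-list line ends the enumeration
            spoken.append(line)
    return "\n".join(spoken)
```

Table summaries and pause insertion are left out to keep the sketch short.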

### Audiobook Mode (Beta)

> 🧪 **Beta feature.** Works well on cleanly-authored EPUBs (O'Reilly, Manning, Standard Ebooks, most professional publishers). EPUB tagging varies wildly in the wild — please [open an issue](https://gitlab.com/bw-stovall/soliloquy/-/issues) if a specific EPUB misclassifies chapters or sounds wrong.

Pass an EPUB to `read_aloud` and Soliloquy treats it as an audiobook. Each chapter is announced by its title, frontmatter (cover, preface, "About the Author") is skipped by default, and playback resumes from where you left off on the next call.

```bash
pip install 'soliloquy-tts[epub]'      # EPUB only
pip install 'soliloquy-tts[documents]' # PDF + EPUB
```

Talk to Claude naturally:

| You say… | What happens |
|---|---|
| *"Read this book"* | Starts a fresh book, or resumes if you've read it before |
| *"What chapters are in this book?"* | Shows the table of contents |
| *"Where am I?"* | Reports your current chapter and progress |
| *"Jump to chapter 5"* | Skip to that chapter (`chapter=5`) |
| *"Read 'The Forest'"* | Match a chapter by its title (`chapter="The Forest"`) |
| *"Next chapter"* / *"previous chapter"* | Move ±1 chapter |
| *"Restart this chapter"* | Replay from the chapter's first sentence |
| *"Start over"* | Wipe the bookmark and restart from chapter 1 |
| *"Show me the frontmatter too"* | Include cover, preface, etc. (`include_frontmatter=True`) |

#### How resume works

Bookmarks live as JSON files under `~/.soliloquy/library/`, keyed by the SHA-256 content hash of the source file. A book moved to a new folder keeps its place; two identical copies share one bookmark. Atomic writes (tempfile + rename) mean a crash mid-write can never corrupt your position.
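
The content-hash keying and atomic write can be sketched roughly as follows. This is an illustration of the general technique, not the project's code: the helper names `content_key` and `save_bookmark`, the `.json` file naming, and the fields stored in the bookmark are assumptions.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


def content_key(book_path: Path) -> str:
    """SHA-256 of the file's bytes, so the key is independent of its location."""
    digest = hashlib.sha256()
    with book_path.open("rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def save_bookmark(library_dir: Path, book_path: Path, position: dict) -> Path:
    """Write the bookmark to a temp file, then rename it over the target."""
    library_dir.mkdir(parents=True, exist_ok=True)
    target = library_dir / f"{content_key(book_path)}.json"
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=library_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(position, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, target)  # atomic: readers see the old or the new file
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

Because the key comes from the bytes rather than the path, two identical copies resolve to the same bookmark file.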

The bookmark layer uses a **hybrid update strategy** that tracks actual audio playback — not synthesis-ahead. Your position is saved every few audio chunks during playback and again on stop, so resume replays at most one short batch of audio. You'll never skip ahead and miss content.
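
One way to picture the playback-driven strategy is a small tracker that counts chunks as they finish playing and persists the position every few chunks and on stop. This is only a sketch under assumptions: the class name `PlaybackBookmark`, the `every=3` interval, and the callback shape are invented for illustration.

```python
from typing import Callable


class PlaybackBookmark:
    """Persist position from chunks actually played, not chunks synthesized."""

    def __init__(self, save: Callable[[int], None], every: int = 3) -> None:
        self._save = save      # hypothetical callback that writes the position
        self._every = every    # assumed interval; the real value is not documented
        self.played = 0        # number of chunks the listener has heard
        self._unsaved = 0

    def on_chunk_played(self) -> None:
        self.played += 1
        self._unsaved += 1
        if self._unsaved >= self._every:
            self._flush()

    def on_stop(self) -> None:
        # Save whatever was heard since the last periodic checkpoint.
        if self._unsaved:
            self._flush()

    def _flush(self) -> None:
        self._save(self.played)
        self._unsaved = 0
```

The saved position never exceeds what was heard, so after a crash the worst case is replaying up to `every - 1` chunks.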

#### Smart frontmatter detection

Four cascading signals identify cover, dedication, preface, appendix, index, and similar non-body content:

1. EPUB3 nav doc entries marked `epub:type="frontmatter|backmatter"`
2. Spine entries with `linear="no"`
3. Spine idrefs starting with semantic names (`cover`, `titlepage`, `preface`, `appendix`, ...)
4. Chapter titles matching frontmatter patterns (`"Preface"`, `"Index"`, `"Appendix A: …"`)

Professional publishers tag spine entries cleanly — heuristics handle them well. Hobby or scraped EPUBs sometimes mislabel title pages or backmatter as body chapters. When that happens, pass `include_frontmatter=True` to play everything in spine order, or use `chapter="<title>"` to jump straight to the right content by name.
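
The cascade can be sketched as a function that checks the four signals in order and reports the first one that fires. The `SpineEntry` type, the signal labels, and the exact name and title patterns below are assumptions for illustration; the real detector may use different lists.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed examples of semantic idref prefixes and title patterns.
SEMANTIC_IDREFS = ("cover", "titlepage", "preface", "appendix")
TITLE_PATTERN = re.compile(
    r"^(preface|index|appendix\b.*|dedication|cover)$", re.IGNORECASE
)


@dataclass
class SpineEntry:
    idref: str
    title: str
    linear: bool = True             # False when the spine says linear="no"
    nav_type: Optional[str] = None  # epub:type from the nav doc, if present


def matter_signal(entry: SpineEntry) -> Optional[str]:
    """Return the first signal that fires, or None for body content."""
    if entry.nav_type in ("frontmatter", "backmatter"):
        return "nav-epub-type"
    if not entry.linear:
        return "spine-linear-no"
    if entry.idref.lower().startswith(SEMANTIC_IDREFS):
        return "semantic-idref"
    if TITLE_PATTERN.match(entry.title.strip()):
        return "title-pattern"
    return None
```

Ordering the checks from explicit markup to title heuristics means the weakest signal is consulted only when the publisher supplied nothing better.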

#### Known limitations

- **EPUB only.** PDF and text files are read as one-shot playback only; page-level book mode for PDFs is left for a future iteration.
- **Chapter numbers aren't spoken.** The internal chapter index is the EPUB spine position (an implementation detail), not the "story chapter N" a reader would recognize. Only the title is announced. The index is still used for navigation (`chapter=5`, `bookmark_status`, etc.).
- **No multi-language detection.** All chapters use the configured `lang`.

For non-book use cases, `read_aloud` still works as a one-shot reader for `.txt`, `.md`, and `.pdf` files (PDFs require the `[pdf]` extra).

## Requirements

- **macOS**, **Windows**, or **Linux**
- **Python 3.10+**
- **PortAudio** (audio output library)

| Platform | Install PortAudio |
|----------|------------------|
| macOS | `brew install portaudio` |
| Windows | Bundled with sounddevice (no action needed) |
| Linux | `sudo apt install libportaudio2` |

> **Note:** First install downloads ~2GB of dependencies (PyTorch, model weights). First run also downloads the Kokoro-82M model from HuggingFace.

### Optional: PDF and EPUB support

To read PDFs and EPUBs (including audiobook-mode resume on EPUBs), install the document extras:

```bash
pip install 'soliloquy-tts[documents]'   # both PDF and EPUB
pip install 'soliloquy-tts[pdf]'         # PDF only
pip install 'soliloquy-tts[epub]'        # EPUB only
```

With `uvx`, add the extra to the package name in your setup command (e.g. `uvx 'soliloquy-tts[documents]'`).

### Optional: macOS Menu Bar Control

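On macOS, `rumps` provides a menu bar icon that lets you stop playback instantly, without going through Claude Code. The package metadata lists it as a macOS dependency, so it is usually installed already; to install it manually: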

```bash
pip install rumps
```

Without it, everything still works — you just use the `stop` command through Claude Code instead.

## Quick Start

Make sure PortAudio is installed (see above), then:

```bash
uvx soliloquy-tts
```

That's it. This registers the MCP server, configures auto-speak, and sets everything up. Restart Claude Code afterward and you're good to go.

> Requires [uv](https://github.com/astral-sh/uv). Install it with `brew install uv` (macOS), `curl -LsSf https://astral.sh/uv/install.sh | sh` (macOS or Linux), or see the [uv docs](https://github.com/astral-sh/uv).

### What happens when you run it

1. Registers Soliloquy as an MCP server with Claude Code
2. Writes a hook script for automatic voicing
3. Configures the Claude Code Stop hook

You only need to do this once. After that, Claude Code starts Soliloquy automatically in the background whenever you open a session.

### With pip

```bash
pip install soliloquy-tts
soliloquy
```

Same setup flow. Run `soliloquy` from your terminal and it handles the rest.

## How It Works

Soliloquy uses a hybrid architecture to share a single model across multiple Claude Code sessions:

- **First session** loads the Kokoro model and starts a local backend server
- **Additional sessions** detect the running backend and connect as lightweight proxies (near-instant startup, no extra memory)
- If the backend exits, the next session automatically takes over

This is completely transparent — no configuration needed.
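
How a session might decide between the two roles can be sketched with a simple probe. The README does not say how detection works, so this assumes the backend listens on a local TCP port; the function name `choose_role` and the port itself are hypothetical.

```python
import socket


def choose_role(host: str, port: int, timeout: float = 0.2) -> str:
    """Return "proxy" if a backend is already listening, else "backend"."""
    try:
        # A successful connection means another session owns the model.
        with socket.create_connection((host, port), timeout=timeout):
            return "proxy"
    except OSError:
        # Nothing is listening (or it is unreachable), so this session takes over.
        return "backend"
```

A real implementation also has to handle two sessions starting at the same moment, for example by treating a failed bind as a signal to fall back to proxy mode.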

## Reference

### Tools

**`speak`** — Synthesize and play text aloud.

| Parameter | Default | Description |
|-----------|---------|-------------|
| `text` | *(required)* | Text to speak |
| `voice` | `af_heart` | Voice ID |
| `speed` | `1.0` | Speed multiplier (0.5 - 2.0) |
| `lang` | `en-us` | Language code |
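
For readers curious what a `speak` call looks like on the wire, the sketch below builds an MCP `tools/call` request using the parameters and defaults from the table. The helper name `build_speak_call` is hypothetical, and the client-side range check is an assumption: the README does not say whether the server rejects or clamps out-of-range speeds.

```python
import json


def build_speak_call(text: str, voice: str = "af_heart", speed: float = 1.0,
                     lang: str = "en-us", request_id: int = 1) -> str:
    """Serialize a JSON-RPC request that invokes the `speak` tool."""
    if not text:
        raise ValueError("text is required")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "speak",
            "arguments": {"text": text, "voice": voice, "speed": speed, "lang": lang},
        },
    }
    return json.dumps(request)
```

In normal use Claude Code sends this request for you; building it by hand is mainly useful when testing the server from a script.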

**`read_aloud`** — Read a file aloud directly. Supports plain text, markdown, PDF (with `[pdf]`), and EPUB (with `[epub]`). EPUBs become audiobooks by default.

| Parameter | Default | Description |
|-----------|---------|-------------|
| `path` | *(required)* | Path to the file to read |
| `voice` | `af_heart` | Voice ID |
| `speed` | `1.0` | Speed multiplier (0.5 - 2.0) |
| `lang` | `en-us` | Language code |
| `pages` | *(none)* | PDF page range, e.g. `"1-3"` or `"5"` |
| `chapter` | *(none)* | EPUB navigation: integer, title substring, `"next"`, `"prev"` |
| `restart` | `False` | EPUB: start at chapter 1, reset bookmark |
| `restart_chapter` | `False` | EPUB: restart current chapter from the beginning |
| `as_book` | *(auto)* | Force book mode on/off (default: on for EPUB, off otherwise) |
| `include_frontmatter` | `False` | EPUB: include cover, preface, etc. in playback |
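
The accepted forms of `pages` and `chapter` can be illustrated with two small resolvers. These are sketches, not the server's code: the function names are invented, and treating integer chapters as 1-based positions in the chapter list is an assumption, since the real index follows the EPUB spine.

```python
from typing import Union


def parse_pages(pages: str) -> range:
    """Turn "1-3" or "5" into a range of 1-based page numbers."""
    first, _, last = pages.partition("-")
    start = int(first)
    end = int(last) if last else start
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {pages!r}")
    return range(start, end + 1)


def resolve_chapter(chapter: Union[int, str], titles: list[str], current: int) -> int:
    """Resolve an integer, "next", "prev", or title substring to a chapter number."""
    if isinstance(chapter, int):
        target = chapter
    elif chapter == "next":
        target = current + 1
    elif chapter == "prev":
        target = current - 1
    else:
        needle = chapter.casefold()
        matches = [n for n, title in enumerate(titles, start=1)
                   if needle in title.casefold()]
        if not matches:
            raise ValueError(f"no chapter title contains {chapter!r}")
        target = matches[0]  # first match wins in this sketch
    if not 1 <= target <= len(titles):
        raise ValueError(f"chapter {target} is out of range")
    return target
```

For example, with titles `["Prologue", "The Forest", "The River"]`, the string `"forest"` resolves to chapter 2 under these assumptions.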

**`bookmark_status`** — Show your current position in a book.

**`list_chapters`** — Show the table of contents for an EPUB. Pass `include_frontmatter=True` to also list cover, preface, etc.

**`stop`** — Stop audio playback immediately.

**`auto_speak`** — Toggle automatic voicing on or off.

**`list_voices`** — List all available voices.

### Voices

28 voices across American and British English. Default is `af_heart`.

<details>
  <summary>View all voices</summary>
  <table>
    <thead>
      <tr>
        <td>Voice</td>
        <td>Accent</td>
        <td>Gender</td>
      </tr>
    </thead>
    <tbody>
      <tr><td>af_heart</td><td>American</td><td>Female</td></tr>
      <tr><td>af_alloy</td><td>American</td><td>Female</td></tr>
      <tr><td>af_aoede</td><td>American</td><td>Female</td></tr>
      <tr><td>af_bella</td><td>American</td><td>Female</td></tr>
      <tr><td>af_jessica</td><td>American</td><td>Female</td></tr>
      <tr><td>af_kore</td><td>American</td><td>Female</td></tr>
      <tr><td>af_nicole</td><td>American</td><td>Female</td></tr>
      <tr><td>af_nova</td><td>American</td><td>Female</td></tr>
      <tr><td>af_river</td><td>American</td><td>Female</td></tr>
      <tr><td>af_sarah</td><td>American</td><td>Female</td></tr>
      <tr><td>af_sky</td><td>American</td><td>Female</td></tr>
      <tr><td>am_adam</td><td>American</td><td>Male</td></tr>
      <tr><td>am_echo</td><td>American</td><td>Male</td></tr>
      <tr><td>am_eric</td><td>American</td><td>Male</td></tr>
      <tr><td>am_fenrir</td><td>American</td><td>Male</td></tr>
      <tr><td>am_liam</td><td>American</td><td>Male</td></tr>
      <tr><td>am_michael</td><td>American</td><td>Male</td></tr>
      <tr><td>am_onyx</td><td>American</td><td>Male</td></tr>
      <tr><td>am_puck</td><td>American</td><td>Male</td></tr>
      <tr><td>am_santa</td><td>American</td><td>Male</td></tr>
      <tr><td>bf_alice</td><td>British</td><td>Female</td></tr>
      <tr><td>bf_emma</td><td>British</td><td>Female</td></tr>
      <tr><td>bf_isabella</td><td>British</td><td>Female</td></tr>
      <tr><td>bf_lily</td><td>British</td><td>Female</td></tr>
      <tr><td>bm_daniel</td><td>British</td><td>Male</td></tr>
      <tr><td>bm_fable</td><td>British</td><td>Male</td></tr>
      <tr><td>bm_george</td><td>British</td><td>Male</td></tr>
      <tr><td>bm_lewis</td><td>British</td><td>Male</td></tr>
    </tbody>
  </table>
</details>

### Languages

`en-us` (default), `en-gb`, `ja`, `zh`, `es`, `fr`, `hi`, `it`, `pt-br`

## Library Usage (StreamingSession)

Soliloquy can also be used as a Python library for streaming TTS — no MCP server needed. Push text as it arrives (from an LLM, WebSocket, etc.) and Soliloquy handles buffering, batching, and gapless playback.

```python
from soliloquy.streaming import StreamingSession

session = StreamingSession(voice="af_heart", speed=1.0)

# Push text as it arrives
session.push("Here's the first sentence.")
session.push("And here's another one.")
session.push("The model is still generating...")

# Signal end of input — flushes remaining text and waits for audio
session.finish()

# Or cancel immediately
session.stop()
```

StreamingSession adapts batch size to audio buffer health — synthesizing immediately when the buffer is low, and batching efficiently when it's healthy. All the same speech normalization, adaptive chunking, and gapless playback from the MCP tools, with a simple push-based API.
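
The buffer-health idea can be sketched as a small policy function. This is not how `StreamingSession` is implemented: the function name `sentences_to_batch`, the one-second low-water mark, and the batch cap of four are assumptions chosen to show the shape of the decision.

```python
def sentences_to_batch(buffered_seconds: float, pending: int,
                       low_water: float = 1.0, max_batch: int = 4) -> int:
    """Decide how many pending sentences to synthesize in the next call."""
    if pending <= 0:
        return 0
    if buffered_seconds < low_water:
        # Buffer is nearly empty: synthesize one sentence to get audio out quickly.
        return 1
    # Buffer is healthy: batch several sentences for better throughput.
    return min(pending, max_batch)
```

Small batches keep latency low when playback is about to run dry, while larger batches amortize per-call synthesis overhead when there is audio to spare.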

## Uninstall

```bash
soliloquy --uninstall
```

This removes the MCP server registration, auto-speak hook, and all config files. Works with `uvx soliloquy-tts --uninstall` too.

## Development

```bash
git clone https://gitlab.com/bw-stovall/soliloquy.git
cd soliloquy
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e '.[dev,pdf,epub]'
pytest tests/ -v
```

## License

MIT
