Metadata-Version: 2.4
Name: text-to-speech-mcp
Version: 0.2.0
Summary: Open-source MCP server that reads text aloud locally using Windows SAPI without API keys or cloud services.
Author: Faizan Ali
License-Expression: MIT
Project-URL: Homepage, https://github.com/engr-faizanali/text-to-speech-mcp
Project-URL: Repository, https://github.com/engr-faizanali/text-to-speech-mcp
Project-URL: Issues, https://github.com/engr-faizanali/text-to-speech-mcp/issues
Keywords: mcp,model-context-protocol,text-to-speech,tts,sapi,accessibility
Classifier: Development Status :: 3 - Alpha
Classifier: Operating System :: Microsoft :: Windows :: Windows 10
Classifier: Operating System :: Microsoft :: Windows :: Windows 11
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mcp>=1.28.0
Requires-Dist: pydantic>=2.11.0
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: jsonschema>=4.20; extra == "dev"
Requires-Dist: tomli>=2.0; python_version < "3.11" and extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Dynamic: license-file

# Text to Speech MCP Server

<!-- mcp-name: io.github.engr-faizanali/text-to-speech -->

Text to Speech is an open-source Model Context Protocol (MCP) server that lets
AI assistants read text aloud on the user's computer. On Windows it uses the
built-in Speech API (SAPI) by default, so no API key, account, subscription, or
cloud text-to-speech service is required.

The server exposes one model-controlled tool:

```text
speak_text(text: string)
```

Use it for user-provided text, assistant answers, accessibility workflows, or
spoken progress updates while an agent works.
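
As a rough illustration of what a client sends when the model invokes this tool, the sketch below builds an MCP `tools/call` JSON-RPC request for `speak_text`. The helper name `build_speak_text_request` is hypothetical and not part of this package; MCP clients normally construct this message for you.

```python
import json


def build_speak_text_request(text: str, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request for the speak_text tool.

    Hypothetical helper for illustration only; not part of this package.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "speak_text",
            "arguments": {"text": text},
        },
    }
    return json.dumps(message)


if __name__ == "__main__":
    print(build_speak_text_request("The deployment completed successfully."))
```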

## Features

- Local playback through Windows SAPI by default.
- No cloud API and no API key for the default setup.
- FIFO playback: concurrent requests are spoken one at a time, in order.
- Blocking tool completion: each call returns after its audio finishes.
- Bounded input (up to 10,000 characters) and queue (up to 32 pending requests) sizes to prevent unbounded resource use.
- Temporary generated WAV files are removed after playback by default.
- Standard MCP `stdio` transport through the official Python SDK.
- Optional Piper, Transformers MMS, and local HTTP backends for advanced users.
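
The FIFO, blocking, and bounded-queue behavior listed above could be sketched roughly as follows. This is a minimal illustration, not the server's actual implementation: the `SpeechQueue` class and the `play` callable are hypothetical, and rejecting a request when the queue is full is an assumption about how the limit is enforced.

```python
import queue
import threading
from concurrent.futures import Future


class SpeechQueue:
    """Hypothetical sketch: one worker thread plays requests in FIFO order."""

    def __init__(self, play, max_pending=32):
        self._play = play  # callable that blocks until audio finishes
        self._pending = queue.Queue(maxsize=max_pending)
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            text, done = self._pending.get()
            try:
                self._play(text)
                done.set_result(f"Spoke {len(text)} characters.")
            except Exception as exc:
                # Surface playback failures to the caller that is waiting.
                done.set_exception(exc)

    def speak(self, text):
        """Enqueue text and block until its playback has finished."""
        done = Future()
        try:
            self._pending.put_nowait((text, done))
        except queue.Full:
            # Assumption: a full queue rejects the request instead of waiting.
            raise RuntimeError("speech queue is full") from None
        return done.result()


if __name__ == "__main__":
    spoken = []
    speech = SpeechQueue(spoken.append)  # list.append stands in for real playback
    speech.speak("first")
    speech.speak("second")
    print(spoken)
```

A single worker thread keeps playback strictly ordered, and the `Future` lets each caller wait for its own request without polling.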

> The MCP server source is open source under the MIT License. Windows SAPI is a
> proprietary component included with Windows; it is not an open-source speech
> engine.

## Requirements

- Windows 10 or Windows 11 for the zero-configuration SAPI backend.
- Python 3.10 or newer.
- An MCP client such as Codex, Claude Desktop, or another compatible client.
- `uv`/`uvx` is recommended for package-based MCP installation.

## Install

After the package is published to PyPI, configure an MCP client to run:

```text
uvx text-to-speech-mcp
```

For Codex, add this to `~/.codex/config.toml`:

```toml
[mcp_servers.text_to_speech]
command = "uvx"
args = ["text-to-speech-mcp"]
startup_timeout_sec = 30
tool_timeout_sec = 300
enabled = true
```

Restart the MCP client after changing its configuration.

### Install from source

```powershell
git clone https://github.com/engr-faizanali/text-to-speech-mcp.git
cd text-to-speech-mcp
python -m pip install -e .
```

Then configure the client to run the installed `text-to-speech-mcp` command directly instead of `uvx text-to-speech-mcp`.

## Prompt Examples

Read arbitrary text:

```text
Use the Text to Speech tool to read aloud: The deployment completed successfully.
```

Read the final answer:

```text
Use the Text to Speech tool to read your final response aloud before displaying it.
```

Read visible intermediate progress updates in order:

```text
Use the text_to_speech MCP server's speak_text tool for spoken progress updates.

For every meaningful intermediate update that you display to me:
1. Write a concise, natural-language version of the update.
2. Call speak_text with that text.
3. Wait for the call to finish before producing or speaking the next update.
4. Then display the same update in text.

Also read the final answer aloud before displaying it. Never narrate hidden
reasoning, chain-of-thought, secrets, credentials, raw tool output, terminal
logs, or source code. Do not invoke speech calls in parallel. If the tool is
unavailable, continue normally in text and report the failure once.
```

The `text_to_speech` portion is the client-side server name from the Codex
configuration. Other clients may display a different namespace while keeping
the tool name `speak_text`.

## Tool Contract

| Field | Value |
| --- | --- |
| Tool name | `speak_text` |
| Input | `text`, required string, 1-10,000 characters |
| Result | Completion message after local playback finishes |
| Ordering | FIFO, one active playback at a time |
| Queue limit | 32 pending requests |
| Network use with SAPI | None |

The tool is model-controlled in MCP terms: the model issues the call, usually
because the user asked it to, and the MCP client may show or require approval
for each tool call.
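
The input bounds in the table can be expressed as a small validation step. The sketch below is an illustration under assumptions: `validate_text` and `MAX_TEXT_CHARS` are hypothetical names, and whether the server trims whitespace before measuring length is not stated here, so the text is checked as given.

```python
MAX_TEXT_CHARS = 10_000  # upper bound taken from the tool contract table


def validate_text(text):
    """Return text unchanged if it is a string of 1 to 10,000 characters.

    Hypothetical helper; the real server may validate differently.
    """
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    if not 1 <= len(text) <= MAX_TEXT_CHARS:
        raise ValueError(
            f"text must contain 1 to {MAX_TEXT_CHARS} characters, got {len(text)}"
        )
    return text


if __name__ == "__main__":
    print(validate_text("The deployment completed successfully."))
```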

## Privacy

With the default SAPI backend, text is passed from the MCP client to a local
Python process and then to Windows speech components. It is not sent to this
project, an external API, or a cloud TTS provider. Generated WAV files are
written under `%TEMP%\text-to-speech-mcp` and deleted after playback unless
`TEXT_TO_SPEECH_KEEP_AUDIO=true` is set.
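
The temporary-file lifecycle described above might look roughly like the sketch below. It is not the project's code: `play_with_cleanup`, `keep_audio`, and the `play` callable are hypothetical, a short run of silence stands in for synthesized speech, and treating only the value `true` (case-insensitive) as enabling retention is an assumption.

```python
import os
import tempfile
import wave
from pathlib import Path


def keep_audio(env=os.environ):
    """Assumption: only the value 'true' (any case) keeps generated files."""
    return env.get("TEXT_TO_SPEECH_KEEP_AUDIO", "").strip().lower() == "true"


def play_with_cleanup(play, env=os.environ):
    """Write a WAV under the temp directory, play it, then delete it."""
    out_dir = Path(tempfile.gettempdir()) / "text-to-speech-mcp"
    out_dir.mkdir(parents=True, exist_ok=True)
    fd, name = tempfile.mkstemp(suffix=".wav", dir=out_dir)
    os.close(fd)  # reopen through the wave module below
    path = Path(name)
    try:
        with wave.open(str(path), "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(16000)
            # 0.1 s of silence as a stand-in for synthesized speech.
            wav.writeframes(b"\x00\x00" * 1600)
        play(path)  # hypothetical blocking playback callable
    finally:
        if not keep_audio(env):
            path.unlink(missing_ok=True)
    return path


if __name__ == "__main__":
    removed = play_with_cleanup(lambda p: print("playing", p.name), env={})
    print("still on disk:", removed.exists())
```

Deleting in a `finally` block means the file is also removed when playback raises, which matches the goal of not leaving spoken text on disk.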

Do not ask an AI assistant to speak secrets, credentials, private keys, hidden
reasoning, or sensitive tool output.

## Optional Backends

The default backend requires no configuration. It is equivalent to setting the
`TEXT_TO_SPEECH_BACKEND` environment variable explicitly:

```toml
TEXT_TO_SPEECH_BACKEND = "sapi"
```

Advanced users can set `TEXT_TO_SPEECH_BACKEND` to `piper`,
`transformers_mms`, or `http`. These options require their own local model,
binary, Python dependencies, or endpoint. See [PACKAGE_MCP.md](PACKAGE_MCP.md).

Legacy `CODEX_TTS_BACKEND` and `CODEX_TTS_FALLBACK_BACKEND` environment
variables remain supported for compatibility.
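
One plausible way to resolve the active backend from these variables is sketched below. The precedence (new name first, then the legacy name, then `sapi`), the case normalization, and the error on unknown values are all assumptions for illustration; `resolve_backend` and `SUPPORTED_BACKENDS` are hypothetical names.

```python
import os

# Backend names mentioned in this README.
SUPPORTED_BACKENDS = ("sapi", "piper", "transformers_mms", "http")


def resolve_backend(env=os.environ):
    """Pick a backend name from the environment, defaulting to 'sapi'.

    Assumed precedence: TEXT_TO_SPEECH_BACKEND, then legacy CODEX_TTS_BACKEND.
    """
    raw = (
        env.get("TEXT_TO_SPEECH_BACKEND")
        or env.get("CODEX_TTS_BACKEND")
        or "sapi"
    )
    name = raw.strip().lower()
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported backend: {raw!r}")
    return name


if __name__ == "__main__":
    print(resolve_backend())
```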

## Development

```powershell
python -m pip install -e ".[dev]"
python -m unittest discover -s tests -v
python scripts/validate_release.py --online
python -m build
python -m twine check dist/*
```

See [MCP_PUBLIC_RELEASE.md](MCP_PUBLIC_RELEASE.md) for the full release process.

## Standards

- MCP transport: `stdio`
- MCP tool implementation: official Python MCP SDK
- Registry metadata: `server.json` using the 2025-12-11 schema
- Package registry: PyPI
- Registry ownership marker: this README's `mcp-name` comment
- Registry namespace: `io.github.engr-faizanali/text-to-speech`

Official references:

- [MCP tools specification](https://modelcontextprotocol.io/specification/2025-11-25/server/tools)
- [MCP Registry publishing quickstart](https://modelcontextprotocol.io/registry/quickstart)
- [MCP Registry package types](https://modelcontextprotocol.io/registry/package-types)
- [MCP Registry repository](https://github.com/modelcontextprotocol/registry)

## License

MIT. See [LICENSE](LICENSE).
