Metadata-Version: 2.4
Name: iflow-mcp_digitarald-chatterbox-mcp
Version: 1.0.0
Summary: A simplified Model Context Protocol server that provides text-to-speech generation with automatic playback using the Chatterbox TTS model
Requires-Python: >=3.8
Requires-Dist: chatterbox-tts
Requires-Dist: mcp>=0.9.0
Requires-Dist: torch>=2.0.0
Requires-Dist: torchaudio>=2.0.0
Description-Content-Type: text/markdown

# Chatterbox TTS MCP Server

A simplified Model Context Protocol (MCP) server that provides text-to-speech generation with automatic playback using the Chatterbox TTS model. The server loads the model automatically on first use and provides real-time progress notifications to keep users informed throughout the process.

## Overview

This MCP server exposes Chatterbox TTS functionality through a single, streamlined tool that generates speech from text and plays it automatically. The server handles model loading, progress reporting, audio file storage and cleanup, and audio playback.

## Features

### Single Tool: `speak_text`

The `speak_text` tool provides complete text-to-speech functionality:

- **Parameters:**
  - `text` (required): The text to convert to speech
  - `exaggeration` (optional): Controls expressiveness (0.0-1.0, default 0.5)
  - `cfg_weight` (optional): Controls classifier-free guidance (0.0-1.0, default 0.5)

- **Features:**
  - Automatic model loading with progress notifications
  - Saves generated speech as WAV files in the audio directory; these are deleted automatically after the configured TTL (see Configuration)
  - Plays audio automatically on macOS using `afplay`
  - Real-time progress updates during all phases:
    - Model initialization and loading
    - Speech generation
    - Audio playback
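
Clients that call the tool programmatically send the arguments as a plain JSON object containing the three parameters above. The helper below is a hypothetical client-side sketch (the name `build_speak_text_args` is not part of the server). It applies the documented defaults and 0.0-1.0 ranges before sending. The server's own validation may differ.

```python
from typing import Any, Dict


def build_speak_text_args(
    text: str, exaggeration: float = 0.5, cfg_weight: float = 0.5
) -> Dict[str, Any]:
    """Build the arguments object for a `speak_text` call.

    Hypothetical helper: defaults and ranges mirror the documented ones
    (0.0-1.0, default 0.5); the server may validate differently.
    """
    if not text or not text.strip():
        raise ValueError("text must be a non-empty string")
    for name, value in (("exaggeration", exaggeration), ("cfg_weight", cfg_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return {"text": text, "exaggeration": exaggeration, "cfg_weight": cfg_weight}


if __name__ == "__main__":
    # A more expressive request; cfg_weight keeps its default of 0.5.
    print(build_speak_text_args("This is amazing!", exaggeration=0.9))
```

The resulting dict can be passed as the `arguments` of a tool call. With the MCP Python SDK, for example, you would use `ClientSession.call_tool("speak_text", arguments=...)`.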

### Resource: `chatterbox://model-info`

Get information about the TTS model status and device capabilities:
- Model loading status (loaded/not loaded)
- Device information (MPS/CUDA/CPU)
- Hardware acceleration availability
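
The sketch below illustrates the kind of information this resource bundles by assembling a comparable status report. The function name and the field names (`status`, `device`, `acceleration`) are hypothetical. The server's actual payload format is not documented here.

```python
import json
from typing import Any, Dict


def model_info(
    model_loaded: bool, device: str, mps_available: bool, cuda_available: bool
) -> Dict[str, Any]:
    """Assemble a status report similar to what `chatterbox://model-info` describes.

    Hypothetical field names; the real resource may be formatted differently.
    """
    return {
        "status": "loaded" if model_loaded else "not loaded",
        "device": device,  # "mps", "cuda" or "cpu"
        "acceleration": {"mps": mps_available, "cuda": cuda_available},
    }


if __name__ == "__main__":
    print(json.dumps(model_info(False, "mps", True, False), indent=2))
```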

## Progress Notifications

The server provides detailed progress notifications throughout the speech generation process:

1. **Model Loading Phase:**
   - "Loading Chatterbox TTS model..."
   - "Initializing PyTorch device..."
   - "Loading model weights..."
   - "Model loaded successfully!"

2. **Speech Generation Phase:**
   - "Starting speech generation..."
   - "Speech generated, saving to temporary file..."

3. **Playback Phase:**
   - "Playing audio..."
   - "Audio playback completed!"

4. **Status Updates:**
   - Device selection (MPS/CUDA/CPU)
   - Voice prompt usage when applicable
   - Success/error messages
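
MCP progress notifications carry a numeric `progress` value and an optional `total`, so the phases above map naturally onto a numbered sequence. The sketch below builds that sequence and skips the loading phase when the model is already in memory. It is hypothetical: only the message strings come from this document, and the per-message numbering is an assumption.

```python
from typing import List, Tuple

# Message strings quoted from the phases above; the numbering is hypothetical.
LOADING = [
    "Loading Chatterbox TTS model...",
    "Initializing PyTorch device...",
    "Loading model weights...",
    "Model loaded successfully!",
]
GENERATION = [
    "Starting speech generation...",
    "Speech generated, saving to temporary file...",
]
PLAYBACK = ["Playing audio...", "Audio playback completed!"]


def progress_plan(model_loaded: bool) -> List[Tuple[int, int, str]]:
    """Return (progress, total, message) triples for one speak_text call."""
    # The loading phase only happens when the model is not yet in memory.
    messages = ([] if model_loaded else LOADING) + GENERATION + PLAYBACK
    total = len(messages)
    return [(i, total, msg) for i, msg in enumerate(messages, start=1)]


if __name__ == "__main__":
    for progress, total, message in progress_plan(model_loaded=False):
        print(f"[{progress}/{total}] {message}")
```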

## Installation

1. **Install dependencies:**
   ```bash
   pip install "mcp>=0.9.0" "torch>=2.0.0" "torchaudio>=2.0.0"
   ```

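2. **Install Chatterbox TTS:**
   Install the `chatterbox-tts` package (for example, `pip install chatterbox-tts`) so that the `chatterbox.tts` module is available. See the Chatterbox TTS installation instructions for platform-specific details.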

## Configuration

### Audio File Storage

By default, the server stores audio files in `~/.chatterbox/audio`. You can configure a custom location using:

**Command line argument:**
```bash
python chatterbox_mcp_server.py --audio-dir /path/to/custom/audio/directory
```

**Environment variable:**
```bash
export CHATTERBOX_AUDIO_DIR="/path/to/custom/audio/directory"
python chatterbox_mcp_server.py
```

**Priority order:**
1. Command line `--audio-dir` argument (highest priority)
2. `CHATTERBOX_AUDIO_DIR` environment variable
3. Default: `~/.chatterbox/audio` (lowest priority)
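
This priority order can be expressed as a short fallback chain. The following is a sketch, not the server's code. `resolve_audio_dir` is a hypothetical name, and treating an empty `CHATTERBOX_AUDIO_DIR` as unset is an assumption. The sketch also expands `~` and resolves relative paths to absolute ones, as the audio storage notes describe.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_AUDIO_DIR = "~/.chatterbox/audio"


def resolve_audio_dir(
    cli_value: Optional[str], environ: Mapping[str, str] = os.environ
) -> Path:
    """Pick the audio directory: --audio-dir, then CHATTERBOX_AUDIO_DIR, then default.

    Hypothetical helper. Empty strings count as "not set" because of `or`.
    """
    raw = cli_value or environ.get("CHATTERBOX_AUDIO_DIR") or DEFAULT_AUDIO_DIR
    # Expand "~" first, then turn relative paths into absolute ones.
    return Path(raw).expanduser().resolve()
```

A server following the documented behavior would then create the directory, for example with `path.mkdir(parents=True, exist_ok=True)`.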

### Audio File TTL (Time To Live)

By default, audio files are automatically cleaned up after 1 hour. You can configure a custom TTL:

**Command line argument:**
```bash
python chatterbox_mcp_server.py --audio-ttl-hours 24  # Keep files for 24 hours
```

**Environment variable:**
```bash
export CHATTERBOX_AUDIO_TTL_HOURS=24
python chatterbox_mcp_server.py
```

**Priority order:**
1. Command line `--audio-ttl-hours` argument (highest priority)
2. `CHATTERBOX_AUDIO_TTL_HOURS` environment variable
3. Default: 1 hour (lowest priority)
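
TTL-based cleanup amounts to deleting files older than a cutoff. The sketch below shows one way this could work and is an assumption, not the server's implementation. It judges a file's age by its modification time and only considers `*.wav` files. It also leaves open when cleanup runs; the `now` parameter exists only to make the function testable.

```python
import time
from pathlib import Path
from typing import List, Optional


def remove_expired_audio(
    audio_dir: Path, ttl_hours: float, now: Optional[float] = None
) -> List[Path]:
    """Delete *.wav files in audio_dir last modified more than ttl_hours ago.

    Hypothetical helper; returns the paths that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - ttl_hours * 3600  # TTL converted from hours to seconds
    removed: List[Path] = []
    for wav in sorted(audio_dir.glob("*.wav")):
        if wav.stat().st_mtime < cutoff:
            wav.unlink()
            removed.append(wav)
    return removed
```

With a 24-hour TTL, the call would be `remove_expired_audio(audio_dir, 24)`. Running it at startup and before writing each new file is one reasonable schedule.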

### Model Auto-Loading

By default, the TTS model is loaded on first use to minimize startup time. You can pre-load it at startup:

**Command line argument:**
```bash
python chatterbox_mcp_server.py --auto-load-model
```

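This loads the model during server startup. Startup takes longer as a result (especially on the first run, when the model weights are downloaded), but the first `speak_text` request no longer has to wait for the model to load.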
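
The three command-line flags documented in this section can be sketched with Python's `argparse`. This is a hypothetical reconstruction, not the server's actual parser. The `None` defaults are an assumption: they let an unset flag fall through to its environment variable. Accepting fractional hours via `type=float` for the TTL is also assumed.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(description="Chatterbox TTS MCP server")
    parser.add_argument(
        "--audio-dir", default=None,
        help="Audio directory (overrides CHATTERBOX_AUDIO_DIR)",
    )
    parser.add_argument(
        "--audio-ttl-hours", type=float, default=None,
        help="Hours to keep audio files (overrides CHATTERBOX_AUDIO_TTL_HOURS)",
    )
    parser.add_argument(
        "--auto-load-model", action="store_true",
        help="Load the TTS model at startup instead of on first use",
    )
    return parser.parse_args(argv)
```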

### Audio Storage Behavior

- Audio files are stored persistently and removed automatically after the configured TTL
- Each file is exposed as an MCP resource at `chatterbox://audio/{resource_id}`
- The directory is created automatically if it doesn't exist
- Relative paths are resolved to absolute paths, and `~` is expanded to the home directory
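
The mapping from an audio resource URI to a file on disk can be sketched as follows. The sketch assumes, hypothetically, that `{resource_id}` is the WAV file name without its extension. Whatever the real scheme is, rejecting IDs that contain path separators prevents a request such as `chatterbox://audio/../secret` from escaping the audio directory.

```python
from pathlib import Path

URI_PREFIX = "chatterbox://audio/"


def audio_path_for_uri(uri: str, audio_dir: Path) -> Path:
    """Map chatterbox://audio/{resource_id} to a WAV file inside audio_dir.

    Hypothetical helper: assumes resource_id is the file stem.
    """
    if not uri.startswith(URI_PREFIX):
        raise ValueError(f"not an audio resource URI: {uri}")
    resource_id = uri[len(URI_PREFIX):]
    # Reject empty IDs and anything containing a path separator.
    if not resource_id or "/" in resource_id or "\\" in resource_id:
        raise ValueError(f"invalid resource id: {resource_id!r}")
    return audio_dir / f"{resource_id}.wav"
```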

## Usage

### Running the Server

**Standalone:**
```bash
python chatterbox_mcp_server.py
```

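**With the MCP CLI** (`mcp dev` runs the server in development mode with the MCP Inspector; the `mcp` command is installed with the `mcp[cli]` extra):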
```bash
mcp dev chatterbox_mcp_server.py
```

### Integration with Claude Desktop

Add to your Claude Desktop MCP configuration:

**Basic configuration:**
```json
{
  "mcpServers": {
    "chatterbox-tts": {
      "command": "python",
      "args": ["/path/to/chatterbox_mcp_server.py"],
      "env": {}
    }
  }
}
```

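**With custom configuration** (this example shows both mechanisms; because command-line arguments take priority over environment variables, setting the same option in both `args` and `env` is redundant, and either one is enough):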
```json
{
  "mcpServers": {
    "chatterbox-tts": {
      "command": "python",
      "args": [
        "/path/to/chatterbox_mcp_server.py", 
        "--audio-dir", "/custom/audio/path",
        "--auto-load-model",
        "--audio-ttl-hours", "24"
      ],
      "env": {
        "CHATTERBOX_AUDIO_DIR": "/custom/audio/path",
        "CHATTERBOX_AUDIO_TTL_HOURS": "24"
      }
    }
  }
}
```

### Example Usage from LLM

1. **Basic text-to-speech:**
   ```
   Please use the speak_text tool to say "Hello, welcome to the Chatterbox TTS demonstration!"
   ```

2. **Expressive speech:**
   ```
   Use speak_text to generate enthusiastic speech for "This is amazing!" with high expressiveness
   ```

The tool will automatically:
- Load the model if needed (with progress updates)
- Generate the speech
- Play the audio
- Save the audio file (removed automatically once the TTL expires)
- Provide status updates throughout

## Technical Details

### Device Support
- **Apple Silicon (M1/M2/M3/M4):** Uses MPS acceleration when available
- **NVIDIA GPUs:** Uses CUDA when available
- **CPU fallback:** Works on any system
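
The selection order (MPS, then CUDA, then CPU, as listed under Troubleshooting) can be sketched with PyTorch's availability checks, `torch.backends.mps.is_available()` and `torch.cuda.is_available()`. The function names are hypothetical. The pure `select_device` helper is kept separate so the ordering can be tested without a GPU.

```python
def select_device(mps_available: bool, cuda_available: bool) -> str:
    """Prefer MPS, then CUDA, then fall back to CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for accelerators; use CPU if torch is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return select_device(torch.backends.mps.is_available(), torch.cuda.is_available())
```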

### Audio Processing
- Audio is saved to the configured audio directory (default `~/.chatterbox/audio`)
- Files are deleted automatically once the TTL expires (default 1 hour)
- WAV format output
- High-quality audio generation
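
The server presumably writes its WAV files with `torchaudio`, which is one of its dependencies. This standard-library sketch shows what such a file holds by writing mono floating-point samples as 16-bit PCM with Python's `wave` module. `save_wav` is a hypothetical helper, and the server's actual sample rate and bit depth are not documented here.

```python
import struct
import wave
from pathlib import Path
from typing import Sequence


def save_wav(path: Path, samples: Sequence[float], sample_rate: int) -> None:
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    # Clip each sample to [-1, 1], scale to the int16 range, pack little-endian.
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(frames)
```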

### Model Management
- Model loads once, on first use or at startup with `--auto-load-model`
- Shared across all subsequent requests
- Thread-safe loading with progress tracking
- Automatic device detection and optimization
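
Loading the model once and sharing it across requests is a lazy-initialization pattern. A minimal thread-safe sketch uses double-checked locking. The `LazyModel` class is hypothetical. In a real server the loader might be something like `lambda: ChatterboxTTS.from_pretrained(device=device)`, and `--auto-load-model` would simply call `get()` at startup.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Hypothetical helper: build an expensive object once, on first use."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader          # e.g. a function that loads the TTS model
        self._lock = threading.Lock()  # serializes the first load
        self._model: Optional[T] = None

    @property
    def is_loaded(self) -> bool:
        return self._model is not None

    def get(self) -> T:
        if self._model is None:  # fast path: skip the lock once loaded
            with self._lock:
                if self._model is None:  # re-check: another thread may have loaded it
                    self._model = self._loader()
        return self._model  # type: ignore[return-value]
```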

## File Structure

```
chatterbox-mcp/
├── chatterbox_mcp_server.py    # MCP server implementation
└── README.md                   # This documentation
```

## Development

### Key Improvements in This Version

1. **Simplified Interface:** Single `speak_text` tool instead of multiple tools
2. **Automatic Playback:** No need to manually play generated files
3. **Progress Notifications:** Real-time updates on model loading and generation
4. **Persistent Audio Storage:** Audio files are stored with configurable automatic cleanup
5. **Better Error Handling:** Comprehensive error reporting and recovery
6. **Streamlined Workflow:** One command generates and plays speech

## Troubleshooting

**Common Issues:**

1. **Model loading slow:**
   - First-time loading downloads model weights
   - Progress notifications show current status
   - Subsequent uses are much faster

2. **Audio playback issues:**
   - Automatic playback uses `afplay`, which is only available on macOS, so playback is not expected to work on other platforms
   - Ensure system audio output is working
   - Check volume settings
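
When troubleshooting, it can help to separate generation from playback. Playing a saved file by hand with `afplay /path/to/file.wav` shows whether macOS audio output itself works. The sketch below shows one way to guard playback by platform. `playback_command` and `play` are hypothetical names, and how the server actually behaves on non-macOS systems is not documented.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def playback_command(path: Path, platform: str = sys.platform) -> Optional[List[str]]:
    """Return the afplay command on macOS, or None where afplay is unavailable."""
    if platform == "darwin":
        return ["afplay", str(path)]
    return None


def play(path: Path) -> bool:
    """Play a WAV file, blocking until it finishes; False if unsupported here."""
    cmd = playback_command(path)
    if cmd is None:
        return False
    subprocess.run(cmd, check=True)  # raises CalledProcessError if afplay fails
    return True
```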

3. **Memory issues:**
   - Model requires significant GPU/CPU memory
   - Monitor system resources during loading
   - Consider closing other applications

4. **Device selection:**
   - The server automatically selects the best available device
   - Check the `chatterbox://model-info` resource to see the current device
   - Selection order: MPS (Apple Silicon), then CUDA (NVIDIA), then CPU

## License

This MCP server implementation follows the same license as the underlying Chatterbox TTS model.
