Metadata-Version: 2.4
Name: polytts
Version: 0.1.0
Summary: A unified interface for multiple TTS providers requiring minimal code changes for hotswapping.
Project-URL: Homepage, https://github.com/spava002/PolyTTS.git
Keywords: tts,text-to-speech,speech-synthesis,audio,voice
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<2.0
Requires-Dist: python-dotenv
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: filetype; extra == "dev"
Requires-Dist: mutagen; extra == "dev"
Requires-Dist: pydub; extra == "dev"
Provides-Extra: openai
Requires-Dist: openai; extra == "openai"
Provides-Extra: elevenlabs
Requires-Dist: elevenlabs; extra == "elevenlabs"
Provides-Extra: fishaudio
Requires-Dist: fish-audio-sdk; extra == "fishaudio"
Provides-Extra: kokoro
Requires-Dist: kokoro; extra == "kokoro"
Provides-Extra: all
Requires-Dist: openai; extra == "all"
Requires-Dist: elevenlabs; extra == "all"
Requires-Dist: fish-audio-sdk; extra == "all"
Requires-Dist: kokoro; extra == "all"
Dynamic: license-file

# PolyTTS

**A unified Python interface for multiple Text-to-Speech providers with seamless hotswapping.**

PolyTTS wraps various TTS providers behind a unified API:

- **Hotswapping Providers**: Change providers by changing a single line
- **Audio Conversions**: Automatic conversion between bytes and numpy arrays
- **Cloud & Local Support**: Use cloud APIs or run models locally

All providers return the same `AudioData` object with consistent conversion methods, so your downstream code stays the same regardless of which TTS you're using.

## Installation

```bash
# Basic installation
pip install polytts

# With cloud/api providers
pip install polytts[openai]
pip install polytts[elevenlabs]
pip install polytts[fishaudio]

# With local providers
pip install polytts[kokoro]

# With all providers
pip install polytts[all]

# Note: URL dependencies (like GPT-SoVITS) must be installed separately:
pip install git+https://github.com/spava002/GPT-SoVITS-Streaming.git
```

## Development Setup

For contributors and developers:

```
# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest
```

## Provider Requirements

### Cloud Providers
- **OpenAI**: Requires `OPENAI_API_KEY` environment variable or `api_key` parameter
- **ElevenLabs**: Requires `ELEVENLABS_API_KEY` environment variable or `api_key` parameter  
- **Fish Audio**: Requires `FISHAUDIO_API_KEY` environment variable or `api_key` parameter

### Local Providers
- **Kokoro**: 
  - Requires Python 3.9-3.12 (Python 3.13 not yet supported by kokoro)
  - Models download automatically on first use
- **GPT-SoVITS**: 
  - Original implementation doesn't support an installable package, so must be installed with a custom package
  - Must be installed manually: `pip install git+https://github.com/spava002/GPT-SoVITS-Streaming.git`
  - Requires a reference audio file (default samples included in package)

## Quick Example

```python
import soundfile as sf
from polytts import ElevenLabsTTS, KokoroTTS

# Start with a cloud provider
tts = ElevenLabsTTS(api_key="your-api-key")
audio = tts.run("Hello, world!")
sf.write("output.wav", audio.as_numpy(), audio.sample_rate)

# Switch to a local model - just change one line!
tts = KokoroTTS()
audio = tts.run("Hello, world!")
sf.write("output.wav", audio.as_numpy(), audio.sample_rate)
```

## Supported Providers

| Provider | Type | Streaming | Voice Cloning |
|----------|------|-----------|---------------|
| OpenAI | Cloud | ✅ | ❌ |
| ElevenLabs | Cloud | ✅ | ✅ |
| Fish Audio | Cloud | ✅ | ✅ |
| Kokoro | Local | ✅ | ❌ |
| GPT-SoVITS | Local | ✅ | ✅ |

## Examples

Check out the [`examples/`](examples/) directory for complete working examples:

- **[`synthesis.py`](examples/synthesis.py)** - Basic text-to-speech generation
- **[`stream_synthesis.py`](examples/stream_synthesis.py)** - Real-time streaming audio
- **[`audio_data.py`](examples/audio_data.py)** - Working with AudioData conversions

## AudioData API

All providers return an `AudioData` object that makes format conversion trivial:

```python
audio = tts.run("Hello!")

# Access metadata
print(audio.data)
print(audio.sample_rate)
print(audio.encoded_format)
print(audio.dtype)
print(audio.duration)

# Convert formats
numpy_array = audio.as_numpy("float32")
pcm_bytes = audio.as_bytes("pcm")
wav_bytes = audio.as_bytes("wav")
```

`AudioData` supports conversion between: 
- Common byte formats: `pcm`, `wav`, `mp3`
- Common `numpy array` dtypes: `int16`, `int32`, `float16`, `float32`

## Contributing

Contributions are welcome! Whether it's:

- Bug fixes
- Documentation improvements
- **New TTS providers**

To add a new provider, check out the existing implementations in `polytts/`
Feel free to open an issue or submit a pull request.

## License

MIT License - see [LICENSE](LICENSE) for details.
