Metadata-Version: 2.4
Name: inworld-tts
Version: 1.2.1
Summary: Inworld TTS SDK – generate, stream, and voice management
Author: Inworld AI
License-Expression: MIT
Project-URL: Homepage, https://github.com/inworld-ai/inworld-tts-python
Project-URL: Repository, https://github.com/inworld-ai/inworld-tts-python
Project-URL: Bug Tracker, https://github.com/inworld-ai/inworld-tts-python/issues
Keywords: inworld,tts,text-to-speech,voice,speech-synthesis,streaming,voice-cloning,voice-design
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-asyncio; extra == "dev"
Requires-Dist: python-dotenv; extra == "dev"
Dynamic: license-file

# inworld-tts

[![PyPI version](https://img.shields.io/pypi/v/inworld-tts.svg)](https://pypi.org/project/inworld-tts/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)

Python SDK for the Inworld TTS API — generate, stream, and manage voices.

**[API Reference](API_REFERENCE.md)** · **[Changelog](CHANGES.md)** · **[Platform](https://platform.inworld.ai)**

---

## Install

```bash
pip install inworld-tts
```

Requires Python 3.10+.

---

## Authentication

Pass your API key directly or set `INWORLD_API_KEY` in your environment:

```bash
export INWORLD_API_KEY=your_api_key
```

```python
from inworld_tts import InworldTTS

tts = InworldTTS()                        # reads INWORLD_API_KEY from env
tts = InworldTTS(api_key="your_api_key")  # or pass directly
```

Get your key at [platform.inworld.ai](https://platform.inworld.ai).
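
To fail early with a readable message when the key is missing, a small check can run before the client is constructed. This is a minimal sketch using only the standard library; the helper name `require_api_key` is hypothetical and not part of the SDK.

```python
import os


def require_api_key(env_var: str = "INWORLD_API_KEY") -> str:
    """Return the API key from the environment, or raise with a clear hint.

    Hypothetical helper for illustration; not provided by inworld-tts.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or pass api_key= to InworldTTS()"
        )
    return key
```

The returned value can then be passed explicitly, for example `InworldTTS(api_key=require_api_key())`.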

---

## Quickstart

```python
from inworld_tts import InworldTTS

tts = InworldTTS()
tts.generate("Hello, world!", voice="Dennis", output_file="hello.mp3")
```

---

## Models

| Model ID | Notes |
|----------|-------|
| `inworld-tts-2` | **Recommended.** Latest generation. Supports `delivery_mode` (`"STABLE"` / `"BALANCED"` / `"CREATIVE"`) for output variability; `temperature` is ignored. |
| `inworld-tts-1.5-max` | Previous generation. Higher quality. Default for `generate()` / `generate_with_timestamps()`. |
| `inworld-tts-1.5-mini` | Previous generation. Lower latency. Default for `stream()` / `stream_with_timestamps()`. |

Use `inworld-tts-2` for new applications — pass `model="inworld-tts-2"` to any of the synthesis methods. The 1.5 family remains available and is the default for backwards compatibility.
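
Because output variability is controlled differently per model family, it can help to build the options in one place. The sketch below is illustrative only: `synthesis_options` is a hypothetical helper, and it assumes that `delivery_mode` and `temperature` are passed as keyword arguments to the synthesis methods and that the 1.5 family honors `temperature`. Check the API Reference for the exact parameter names.

```python
DELIVERY_MODES = {"STABLE", "BALANCED", "CREATIVE"}


def synthesis_options(model, delivery_mode=None, temperature=None):
    """Build keyword arguments for a synthesis call (hypothetical helper)."""
    options = {"model": model}
    if model == "inworld-tts-2":
        # inworld-tts-2 uses delivery_mode; temperature is ignored, so drop it.
        if delivery_mode is not None:
            if delivery_mode not in DELIVERY_MODES:
                raise ValueError(f"unknown delivery_mode: {delivery_mode!r}")
            options["delivery_mode"] = delivery_mode
    elif temperature is not None:
        # Assumption: the 1.5 models accept temperature; delivery_mode is dropped.
        options["temperature"] = temperature
    return options
```

Under those assumptions, a call might look like `tts.generate("Hello!", voice="Dennis", **synthesis_options("inworld-tts-2", delivery_mode="STABLE"))`.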

---

## Constructor

```python
tts = InworldTTS(
    api_key="your_key",
    timeout=120,                 # HTTP timeout in seconds (default: per-method)
    max_concurrent_requests=4,   # parallel chunk requests for long text (default: 2)
    max_retries=2,               # retry on network errors / 5xx with exponential backoff (default: 2)
    debug=True,                  # log requests, responses, and timing
)
```

See [Constructor](API_REFERENCE.md#constructor) in the API Reference for full parameter details and per-method timeout defaults.

---

## generate()

Synthesize speech from text of any length. Blocks until all audio is ready.

```python
# Save to file
tts.generate("Hello!", voice="Dennis", output_file="hello.mp3")

# Get bytes for further processing
audio = tts.generate("Hello!", voice="Dennis")

# Generate, save, and play
tts.generate("Hello!", voice="Dennis", output_file="hello.mp3", play=True)
```

## stream()

Async streaming: audio chunks are yielded as they arrive, so the first audio is available sooner than with `generate()`. Limited to 2000 characters per call.

```python
import asyncio

async def main():
    async for chunk in tts.stream("Hello, world!", voice="Dennis"):
        pass  # process chunk (bytes) as it arrives

asyncio.run(main())
```
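
Text longer than the 2000-character limit has to be split before it is streamed. The following is a minimal sketch of one way to do that; `split_for_stream` is a hypothetical helper, and splitting at sentence boundaries is a choice made here, not documented SDK behavior. (`generate()` already accepts text of any length.)

```python
import re

MAX_STREAM_CHARS = 2000  # per-call limit for stream(), as stated above


def split_for_stream(text: str, limit: int = MAX_STREAM_CHARS) -> list[str]:
    """Split text into pieces of at most `limit` characters.

    Prefers sentence boundaries; hard-cuts a sentence longer than the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pieces: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut any single sentence that cannot fit in one piece.
        while len(sentence) > limit:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            pieces.append(current)
            current = sentence
    if current:
        pieces.append(current)
    return pieces
```

Each piece can then be passed to its own `tts.stream(...)` call in order.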

## Timestamps

`generate_with_timestamps()` and `stream_with_timestamps()` return word- or character-level timing alongside audio.

```python
result = tts.generate_with_timestamps("Hello, world!", voice="Dennis", timestamp_type="WORD")
wa = result["timestamps"]["wordAlignment"]
for word, start, end in zip(wa["words"], wa["wordStartTimeSeconds"], wa["wordEndTimeSeconds"]):
    print(f"{word}: {start:.2f}s – {end:.2f}s")
```

See [generate_with_timestamps()](API_REFERENCE.md#generate_with_timestamps) and [stream_with_timestamps()](API_REFERENCE.md#stream_with_timestamps) for full details.

---

## play()

Play audio from bytes or a file path. The encoding is auto-detected from magic bytes; raw PCM, ALAW, and MULAW carry no header, so pass an `encoding` hint for those.

```python
audio = tts.generate("Hello!", voice="Dennis")
tts.play(audio)

tts.play("hello.mp3")               # file path also accepted
# pcm_bytes: raw PCM audio you already hold (placeholder, not defined in this snippet)
tts.play(pcm_bytes, encoding="PCM") # encoding hint required for raw PCM/ALAW/MULAW
```

See [play()](API_REFERENCE.md#play) for platform player details.

---

## list_voices()

List voices in your workspace, with optional language filter.

```python
voices = tts.list_voices()
voices = tts.list_voices(lang="EN_US")
voices = tts.list_voices(lang=["EN_US", "ES_ES"])
```

## get_voice()

Get details of a specific voice.

```python
voice = tts.get_voice("workspace__my_clone")
```

## update_voice()

Update a voice's display name, description, or tags.

```python
tts.update_voice("workspace__my_clone", display_name="Narrator", tags=["calm"])
```

## delete_voice()

Delete a voice from your workspace.

```python
tts.delete_voice("workspace__my_clone")
```

## clone_voice()

Clone a voice from one or more audio recordings (WAV/MP3).

```python
result = tts.clone_voice(["sample.wav"], display_name="My Clone")
voice_id = result["voice"]["voiceId"]
```

## design_voice()

Design a voice from a text description (no recording needed). The call returns preview voices; publish one with `publish_voice()` to add it to your library.

```python
result = tts.design_voice(
    design_prompt="A warm, friendly narrator",
    preview_text="Hello, welcome to our audiobook.",
)
voice_id = result["previewVoices"][0]["voiceId"]
```

## publish_voice()

Publish a designed or cloned voice preview to your library.

```python
tts.publish_voice(voice_id, display_name="My Custom Voice")
```
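
The design and publish steps can be chained in one function. This is a minimal sketch that uses only the two calls and the response key shown above; `design_and_publish` is a hypothetical helper, and choosing the first preview is an arbitrary simplification.

```python
def design_and_publish(tts, design_prompt: str, preview_text: str, display_name: str) -> str:
    """Design a voice, publish the first preview, and return its voice ID.

    `tts` is an InworldTTS client (or any object exposing the same two methods).
    """
    result = tts.design_voice(design_prompt=design_prompt, preview_text=preview_text)
    previews = result.get("previewVoices") or []
    if not previews:
        raise RuntimeError("design_voice() returned no preview voices")
    voice_id = previews[0]["voiceId"]
    tts.publish_voice(voice_id, display_name=display_name)
    return voice_id
```

In practice you would likely listen to each preview before deciding which one to publish.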

## migrate_from_elevenlabs()

Migrate a voice from ElevenLabs to your Inworld workspace. No ElevenLabs SDK required.

```python
result = tts.migrate_from_elevenlabs("el_api_key", "el_voice_id")
print(result["elevenlabs_name"], "→", result["inworld_voice_id"])
```

See [Voice Management](API_REFERENCE.md#voice-management) in the API Reference for all parameters.

---

## Errors

| Exception | When |
|-----------|------|
| `MissingApiKeyError` | No API key found at construction |
| `ApiError` | API returned 4xx/5xx — has `.code` and `.details` |
| `NetworkError` | Connection or timeout failure |

All inherit from `InworldTTSError`.

```python
from inworld_tts import ApiError, InworldTTS, MissingApiKeyError, NetworkError

try:
    tts = InworldTTS()  # MissingApiKeyError is raised here, at construction
    audio = tts.generate("Hello!", voice="Dennis")
except MissingApiKeyError as e:
    print(f"Missing API key: {e}")
except ApiError as e:
    print(f"HTTP {e.code}: {e}")
except NetworkError as e:
    print(f"Network error: {e}")
```

---

## CLI

The API key is read from `INWORLD_API_KEY` or passed with `--api-key`. Voice defaults to `Dennis`; use `--voice` to choose another. Run `inworld-tts --help` for all options.

```bash
# synthesize text (voice defaults to Dennis)
inworld-tts "Hello, world!" -o hello.mp3

# choose a voice
inworld-tts "Hello" -o hello.mp3 --voice Sarah

# read from a text file (any length)
inworld-tts story.txt -o story.mp3 --voice Dennis

# choose a model (inworld-tts-2 is the recommended latest generation)
inworld-tts "Hello" -o hello.mp3 --voice Dennis --model inworld-tts-2

# stream (lower latency to first audio)
inworld-tts "Hello" -o hello.mp3 --voice Dennis --stream

# play audio immediately (no output file needed)
inworld-tts "Hello world" --voice Dennis --play

# save and play
inworld-tts story.txt --voice Dennis --play -o story.mp3

# other formats
inworld-tts "Hello" -o hello.wav --voice Dennis --encoding WAV

# audio quality options
inworld-tts "Hello" -o hello.mp3 --voice Dennis --bit-rate 192000
inworld-tts "Hello" -o hello.wav --voice Dennis --encoding LINEAR16 --sample-rate 44100
```

### List voices (CLI)

```bash
inworld-tts list-voices
inworld-tts list-voices --lang EN_US
```

### Migrate from ElevenLabs (CLI)

```bash
inworld-tts migrate-from-elevenlabs --elevenlabs-key el_... --voice-id abc123

# preview first (no cloning)
inworld-tts migrate-from-elevenlabs --elevenlabs-key el_... --voice-id abc123 --dry-run
```

---

## Examples

Runnable examples are in the [`examples/`](examples/) directory:

| File | What it shows |
|------|---------------|
| [`hello_world.py`](examples/hello_world.py) | Text → MP3 in 3 lines |
| [`stream_audio.py`](examples/stream_audio.py) | Real-time streaming — play each chunk as it arrives |
| [`list_voices.py`](examples/list_voices.py) | List all available voices, with optional language filter |
| [`clone_voice.py`](examples/clone_voice.py) | Clone a voice from a WAV/MP3 recording |
| [`design_voice.py`](examples/design_voice.py) | Design a voice from a text description, preview, and publish |
| [`generate_timestamps.py`](examples/generate_timestamps.py) | Word-level timestamps — print each word's start/end time |
| [`stream_timestamps.py`](examples/stream_timestamps.py) | Per-chunk timestamps while streaming |

---

## Troubleshooting

### `MissingApiKeyError` / `ApiError` 401

Set `INWORLD_API_KEY` or pass `api_key=` directly. If the key is set but rejected, regenerate it at [platform.inworld.ai](https://platform.inworld.ai).

### `stream()` requires `async`

`stream()` is an async generator — call it inside an `async` function:

```python
import asyncio

async def main():
    async for chunk in tts.stream("Hello", voice="Dennis"):
        ...

asyncio.run(main())
```
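
If the calling code is synchronous, one option is a small wrapper that drains the async generator and returns all bytes at once. This is a sketch under the assumption that `stream()` yields `bytes` chunks; `collect_stream` is a hypothetical helper, not part of the SDK.

```python
import asyncio
from typing import AsyncIterator


async def _collect(chunks: AsyncIterator[bytes]) -> bytes:
    """Concatenate every chunk produced by an async iterator."""
    buffer = bytearray()
    async for chunk in chunks:
        buffer.extend(chunk)
    return bytes(buffer)


def collect_stream(chunks: AsyncIterator[bytes]) -> bytes:
    """Drain an async iterator of byte chunks from synchronous code.

    Uses asyncio.run(), so it must not be called from inside a running event loop.
    """
    return asyncio.run(_collect(chunks))
```

Usage would look like `audio = collect_stream(tts.stream("Hello", voice="Dennis"))`. Note that waiting for all chunks gives up the latency benefit of streaming, so this only suits cases where a blocking call is acceptable.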


---

## License

[MIT](LICENSE)
