Text-to-Speech
Emotion
Prosody Controls
Generate with Prosody
SSML Snippets
SSML Input
SSE Streaming
Audio chunks are streamed via Server-Sent Events and played in real time using the Web Audio API.
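For a client outside the browser, the event stream has to be split into events by hand. The sketch below shows only the generic Server-Sent Events framing (events separated by blank lines, payload in `data:` fields). The helper name `iter_sse_data` and the sample payloads are illustrative assumptions; how BabelVox encodes audio inside each event is not specified here.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event."""
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the SSE format drops one leading space
        if field == "data":
            data_parts.append(value)
    # An event not followed by a blank line is incomplete and is dropped.


# Placeholder payloads; a real stream would carry encoded audio chunks.
stream = ["data: chunk-1\n", "\n", ": keep-alive\n", "data: chunk-2\n", "\n"]
events = list(iter_sse_data(stream))
```

Each yielded string would then be decoded and queued for playback in whatever format the server actually uses.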
WebSocket Streaming
Bidirectional real-time streaming with cancel support. Requires the WebSocket server (--ws-port).
Long-form Synthesis
Batch Synthesis
Synthesize multiple texts in a single request. Each item can have a different language and speaker.
Speaker Profiles
Speaker profiles are created from reference audio files on the server.
Use the CLI to save speakers:
babelvox --save-speaker alice --ref-audio voice.wav
Alternatively, send a POST request to /speakers with a server-side audio path.
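A minimal sketch of the HTTP route, using only the standard library. The `/speakers` path comes from the text above; the JSON field names `name` and `ref_audio`, the base URL, and the helper `build_save_speaker_request` are hypothetical and should be checked against the server's actual API.

```python
import json
import urllib.request


def build_save_speaker_request(base_url: str, name: str,
                               audio_path: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /speakers route."""
    # Hypothetical field names, mirroring the CLI flags above.
    body = json.dumps({"name": name, "ref_audio": audio_path}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/speakers",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and port; the audio path is resolved on the server.
req = build_save_speaker_request("http://localhost:8000", "alice", "voice.wav")
# To send it against a running server: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the payload easy to inspect before it reaches the server.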
Voice Mixing (Python API)
Speaker mixing requires direct access to NumPy embeddings, so it is available via the Python API.
Mix two or more voices with weighted blending:
from babelvox.speakers import mix_speakers

# `tts` is assumed to be an already-initialized BabelVox engine instance.

# Load two speaker profiles
alice = tts.speaker_library.load("alice")
bob = tts.speaker_library.load("bob")

# Mix: 70% alice, 30% bob
mixed = mix_speakers(
    [alice.embedding, bob.embedding],
    [0.7, 0.3],
)

# Use the mixed voice
wav, sr = tts.generate("Hello!", speaker_embed=mixed)
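To make "weighted blending" concrete, here is one plausible reading: a normalized weighted sum of the embedding vectors. This is an illustrative assumption, not a description of how `mix_speakers` is implemented. The helper `blend_embeddings` is hypothetical, and plain Python lists stand in for the NumPy arrays so the sketch has no dependencies.

```python
from typing import List, Sequence


def blend_embeddings(embeddings: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Return the weighted average of equally sized embedding vectors."""
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("embeddings must share one dimension")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize the weights to sum to 1, then sum per dimension.
    norm = [w / total for w in weights]
    return [sum(w * e[i] for w, e in zip(norm, embeddings))
            for i in range(dim)]


# Toy 3-dimensional stand-ins for real speaker embeddings.
alice_vec = [1.0, 0.0, 2.0]
bob_vec = [0.0, 1.0, 4.0]
mixed_vec = blend_embeddings([alice_vec, bob_vec], [0.7, 0.3])
```

Under this reading, weights of 7 and 3 give the same result as 0.7 and 0.3, since only the ratio matters after normalization.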
Sampling Parameters
These parameters apply globally to the Basic TTS, Prosody, SSML, and Batch tabs.