BABELVOX

Text-to-Speech Demo

Text-to-Speech

Emotion

Prosody Controls

1.0
0
1.0

Generate with Prosody

SSML Snippets

SSML Input

SSE Streaming

Audio chunks are streamed via Server-Sent Events and played in real-time using the Web Audio API.
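
As a rough illustration of the client side of this stream, the sketch below parses Server-Sent Events framing in Python and collects each event's payload. The framing rules (`data:` fields, blank-line terminators, `:` comments) come from the SSE format itself. The payload encoding does not: treating each event as base64-encoded audio bytes is an assumption made for the example, not something this page states, and the helper names are hypothetical.

```python
import base64
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event."""
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
        elif line.startswith(":"):
            # Comment line, often used as a keep-alive; ignore it.
            continue
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional leading space.
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.


def decode_audio_chunks(lines: Iterable[str]) -> List[bytes]:
    # ASSUMPTION: each event carries base64-encoded audio bytes.
    return [base64.b64decode(payload) for payload in iter_sse_data(lines)]
```

An event that is still incomplete when the stream ends (no terminating blank line) is dropped, which matches how SSE clients are expected to behave.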

WebSocket Streaming

Bidirectional real-time streaming with cancel support. Requires the WebSocket server (--ws-port).

Long-form Synthesis

Batch Synthesis

Synthesize multiple texts in a single request. Each item can have a different language and speaker.
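
A batch request body might be assembled along the following lines. This is only a sketch: the field names (`items`, `text`, `language`, `speaker`) and both helper functions are hypothetical, chosen to mirror the description above, and the actual request schema should be checked against the server.

```python
import json
from typing import List, Optional


def batch_item(text: str,
               language: Optional[str] = None,
               speaker: Optional[str] = None) -> dict:
    """Build one batch entry; language and speaker are optional per item."""
    if not text.strip():
        raise ValueError("each batch item needs non-empty text")
    item = {"text": text}
    if language is not None:
        item["language"] = language  # hypothetical field name
    if speaker is not None:
        item["speaker"] = speaker    # hypothetical field name
    return item


def build_batch_body(items: List[dict]) -> str:
    """Serialize all items into a single JSON request body."""
    if not items:
        raise ValueError("a batch needs at least one item")
    return json.dumps({"items": items}, ensure_ascii=False)
```

For example, `build_batch_body([batch_item("Hello!", language="en", speaker="alice"), batch_item("Bonjour !", language="fr")])` produces one body with two entries, the second of which leaves the speaker unset.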

Speaker Profiles

Speaker profiles are created from reference audio files on the server. To save a speaker, either use the CLI (babelvox --save-speaker alice --ref-audio voice.wav) or POST to /speakers with a server-side audio path.

Voice Mixing (Python API)

Speaker mixing requires direct access to NumPy embeddings and is available via the Python API. Mix two or more voices with weighted blending:
from babelvox.speakers import mix_speakers

# Assumes `tts` is an already-initialized BabelVox TTS instance.
# Load two speaker profiles
alice = tts.speaker_library.load("alice")
bob = tts.speaker_library.load("bob")

# Mix: 70% alice, 30% bob
mixed = mix_speakers(
    [alice.embedding, bob.embedding],
    [0.7, 0.3]
)

# Use the mixed voice
wav, sr = tts.generate("Hello!", speaker_embed=mixed)
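
To make "weighted blending" concrete, here is a minimal NumPy sketch of one way such a mix can be computed: a weighted average of the embedding vectors with the weights normalized to sum to 1. It is not taken from `mix_speakers` itself, whose implementation may differ (for example in how it normalizes the result). The function name `blend_embeddings` is hypothetical.

```python
import numpy as np


def blend_embeddings(embeddings, weights):
    """Weighted average of equally shaped embedding vectors.

    Illustrative only: not the babelvox implementation.
    """
    if len(embeddings) == 0 or len(embeddings) != len(weights):
        raise ValueError("need one weight per embedding")
    w = np.asarray(weights, dtype=np.float64)
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()  # normalize so the weights sum to 1
    stacked = np.stack([np.asarray(e, dtype=np.float64) for e in embeddings])
    # Contract the weight vector against the first axis: (n,) x (n, d) -> (d,)
    return np.tensordot(w, stacked, axes=1)
```

Because the weights are normalized, `[0.7, 0.3]` and `[7, 3]` give the same blend.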

Sampling Parameters

These parameters apply globally to the Basic TTS, Prosody, SSML, and Batch tabs.
0.9
50
1.0
1.05
512

Connection

Request Log