ANE VoxCPM TTS Playground
Generate Speech
Create Voice
Text to Generate
Jittery Jack's jam jars jiggled jauntily, jolting Jack's jumbled jelly-filled jars joyously. Cindy's circular cymbals clanged cheerfully, clashing crazily near Carla's crashing crockery. You think you can just waltz in here and cause chaos? Well, I've got news for you.
Voice Selection (Optional)
(See Samples)
Manual prompt / zero-shot
Delete
Preset Voice Mode
Reference only (lower latency)
Reference + prompt WAV/text
High similarity (uses transcript)
Reference + prompt uses the selected voice as reference and the prompt WAV/text as continuation. High similarity uses cached prompt data when available.
Reference WAV Path (Optional)
Encoded as an isolated voice reference with reference-audio boundary tokens.
Prompt WAV Path (Optional Continuation)
Appended after the text span as continuation audio. Pair it with the matching prompt text.
Prompt Text (Required when using prompt WAV)
This text is transcribed from the 'Prompt WAV' and is used to condition the model on the voice's acoustic properties. It must be an accurate transcription of the prompt audio.
Max Length (~0.16s per unit)
Maximum generated audio steps (limited by LM cache)
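As a rough illustration of the "~0.16s per unit" figure above, the sketch below converts between a target audio duration and a Max Length value. The function names are hypothetical and the 0.16 s constant is only the approximate value given in the label; the real upper bound depends on the LM cache, which this sketch does not model.

```python
import math

# Approximate value taken from the "Max Length (~0.16s per unit)" label.
SECONDS_PER_UNIT = 0.16


def units_for_duration(seconds: float) -> int:
    """Smallest Max Length value that covers roughly `seconds` of audio."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return math.ceil(seconds / SECONDS_PER_UNIT)


def duration_for_units(units: int) -> float:
    """Approximate audio duration in seconds for a given Max Length value."""
    if units < 0:
        raise ValueError("units must be non-negative")
    return units * SECONDS_PER_UNIT


if __name__ == "__main__":
    print(units_for_duration(10.0))   # units needed for about 10 s of audio
    print(duration_for_units(200))    # approximate seconds for 200 units
```

Because the per-unit duration is approximate, treat the result as an estimate and leave some headroom when choosing Max Length.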
CFG Value
Classifier-free guidance value (0.0-10.0)
Inference Timesteps
Controls the number of diffusion steps. A higher number (e.g., 20) is slower but may increase quality. A lower number (e.g., 5-10) is faster. This model works well with 10.
Number of inference steps (1-100)
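The stated ranges for CFG Value (0.0-10.0) and Inference Timesteps (1-100) can be checked before submitting a request. The following is a minimal sketch under that assumption; `check_settings` is a hypothetical helper, not part of the playground, and the server may apply different or additional checks.

```python
# Ranges copied from the form hints; everything else here is illustrative.
CFG_RANGE = (0.0, 10.0)
STEPS_RANGE = (1, 100)


def check_settings(cfg_value: float, inference_timesteps: int) -> list[str]:
    """Return a list of problems; an empty list means both values are in range."""
    problems = []
    if not CFG_RANGE[0] <= cfg_value <= CFG_RANGE[1]:
        problems.append(
            f"CFG value {cfg_value} is outside {CFG_RANGE[0]}-{CFG_RANGE[1]}"
        )
    if not isinstance(inference_timesteps, int):
        problems.append("inference timesteps must be an integer")
    elif not STEPS_RANGE[0] <= inference_timesteps <= STEPS_RANGE[1]:
        problems.append(
            f"inference timesteps {inference_timesteps} is outside "
            f"{STEPS_RANGE[0]}-{STEPS_RANGE[1]}"
        )
    return problems


if __name__ == "__main__":
    # 10 timesteps is the value the tooltip says works well for this model.
    print(check_settings(2.0, 10))
    print(check_settings(12.5, 0))
```

Returning a list of messages rather than raising on the first failure lets a form show every out-of-range field at once.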
Seed
Leave blank for a fresh random voice each request. Set a number to reproduce the same diffusion noise.
Optional reproducibility seed
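To illustrate why a fixed seed reproduces the same diffusion noise while a blank seed does not, here is a small sketch using Python's standard `random` module as a stand-in. It is not the playground's actual sampler; `make_noise` is a hypothetical function, and only the seeding behaviour is the point.

```python
import random
from typing import Optional


def make_noise(n: int, seed: Optional[int] = None) -> list[float]:
    """Draw n Gaussian samples; a fixed seed yields the same samples every call."""
    # seed=None lets the generator seed itself from system entropy,
    # mirroring "leave blank for a fresh random voice".
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    a = make_noise(4, seed=1234)
    b = make_noise(4, seed=1234)
    print(a == b)  # same seed, same starting noise
```

Using a local `random.Random` instance instead of the module-level functions keeps one request's seed from affecting another's.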
Generate & Play
Generate Full Audio
Pause Playback
Stop Generation
Full Audio Playback
Create New Cached Voice
Compile a reference audio file into a reusable voice cache. Files must exist on the server.
Voice Name
Reference Audio Path (Server)
Prompt Transcription (Optional)
Replace if exists
Create Voice