{% extends "base.html" %} {% block title %}TTSFM {{ _('docs.title') }}{% endblock %} {% block extra_css %} {% endblock %} {% block content %}
TTSFM is a free Text-to-Speech API service that provides OpenAI-compatible endpoints using the openai.fm service. It supports multiple voices, audio formats, speed adjustment, and automatic text splitting for long content.
{{ request.url_root }}
TTSFM provides two Docker image variants to suit different needs:
dbcccc/ttsfm:latest
Includes:
Size: ~200MB
dbcccc/ttsfm:slim
Includes:
Size: ~100MB
Use the /api/capabilities endpoint to check which features are available in the image you are running.
API key authentication is optional and disabled by default. When enabled, include your API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY
Set the REQUIRE_API_KEY=true environment variable to enable API key protection.
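As a minimal sketch of the client side, the helper below attaches the Bearer token only when a key is present, so the same code works whether or not the server requires one. The helper name `build_headers` and the client-side variable `TTSFM_API_KEY` are hypothetical and used for illustration only. Only the `Authorization: Bearer` format comes from the text above.

```python
import os
from typing import Optional


def build_headers(api_key: Optional[str] = None) -> dict:
    """Return JSON request headers, adding a Bearer token only when a key is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


# TTSFM_API_KEY is a hypothetical client-side variable name, not defined by TTSFM.
headers = build_headers(os.environ.get("TTSFM_API_KEY"))
```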
TTSFM provides a drop-in replacement for OpenAI's Text-to-Speech API. Use the /v1/audio/speech endpoint with the same request format.
Generate speech from text using OpenAI-compatible format.
{
  "model": "tts-1",
  "input": "Hello, world!",
  "voice": "alloy",
  "response_format": "mp3",
  "speed": 1.0
}
- model (string): Model ID (any value accepted, uses openai.fm)
- input (string, required): Text to convert to speech
- voice (string, required): Voice ID (alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, verse)
- response_format (string): Audio format (mp3, wav, opus, aac, flac, pcm). Default: mp3
- speed (number): Playback speed 0.25-4.0 (requires ffmpeg). Default: 1.0

Returns audio file with appropriate Content-Type header.
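The parameter rules listed above can be checked on the client before a request is sent. The sketch below is illustrative only: `check_speech_payload` is a hypothetical helper, not part of TTSFM, and the server remains the authority on what it accepts.

```python
VOICES = {"alloy", "ash", "ballad", "coral", "echo", "fable",
          "nova", "onyx", "sage", "shimmer", "verse"}
FORMATS = {"mp3", "wav", "opus", "aac", "flac", "pcm"}


def check_speech_payload(payload: dict) -> list:
    """Return a list of problems found in a /v1/audio/speech request body."""
    problems = []
    if not payload.get("input"):
        problems.append("input is required")
    if payload.get("voice") not in VOICES:
        problems.append("voice is missing or not recognized")
    # response_format and speed are optional; the documented defaults are mp3 and 1.0.
    if payload.get("response_format", "mp3") not in FORMATS:
        problems.append("response_format is not supported")
    speed = payload.get("speed", 1.0)
    if not isinstance(speed, (int, float)) or not 0.25 <= speed <= 4.0:
        problems.append("speed must be a number between 0.25 and 4.0")
    return problems


problems = check_speech_payload({"input": "Hello, world!", "voice": "alloy"})
```

An empty list means the body satisfies the documented constraints.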
- Content-Type: MIME type of the audio format
- X-Requested-Speed: The speed value requested
- X-Speed-Applied: Whether speed adjustment was applied (true/false)
- X-Chunks-Combined: Number of chunks combined (for long text)

Check which features are available in the current Docker image variant using the capabilities endpoint.
Get system capabilities and available features.
{
  "ffmpeg_available": true,
  "image_variant": "full",
  "features": {
    "speed_adjustment": true,
    "format_conversion": true,
    "mp3_auto_combine": true,
    "basic_formats": true
  },
  "supported_formats": ["mp3", "wav", "opus", "aac", "flac", "pcm"]
}
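A client could use this response to choose a format the server can actually produce. The sketch below works on the sample response shown above. `choose_format` is a hypothetical helper, and the slim-image response used in the second call is an assumption based on the formats listed as always available.

```python
# Sample capabilities response, copied from the example above.
full_capabilities = {
    "ffmpeg_available": True,
    "image_variant": "full",
    "features": {
        "speed_adjustment": True,
        "format_conversion": True,
        "mp3_auto_combine": True,
        "basic_formats": True,
    },
    "supported_formats": ["mp3", "wav", "opus", "aac", "flac", "pcm"],
}


def choose_format(capabilities: dict, preferred: str, fallback: str = "mp3") -> str:
    """Return the preferred format if the server lists it, otherwise the fallback."""
    supported = capabilities.get("supported_formats", [])
    return preferred if preferred in supported else fallback


full_choice = choose_format(full_capabilities, "opus")
# Assumed shape of a slim-image response; only the format list matters here.
slim_choice = choose_format({"supported_formats": ["mp3", "wav"]}, "opus")
```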
Adjust audio playback speed from 0.25x (slower) to 4.0x (faster). This feature requires ffmpeg and is only available in the full Docker image.
- 0.25: 4x slower
- 0.5: 2x slower
- 1.0: Normal speed (default)
- 1.5: 1.5x faster
- 2.0: 2x faster
- 4.0: 4x faster

curl -X POST {{ request.url_root }}v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello!",
    "voice": "alloy",
    "speed": 1.5
  }' --output speech.mp3
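The same request can be built in Python with the standard library. This is a sketch under stated assumptions: `http://localhost:8000` is a placeholder base URL, and both helper names are hypothetical. The request is constructed but not sent, and the second helper reads the X-Speed-Applied response header to confirm that the server adjusted the speed.

```python
import json
import urllib.request


def build_speed_request(base_url: str, text: str, voice: str = "alloy",
                        speed: float = 1.5) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl command above."""
    body = json.dumps({"model": "tts-1", "input": text, "voice": voice, "speed": speed})
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def speed_was_applied(headers) -> bool:
    """Interpret X-Speed-Applied; a missing header is treated as not applied."""
    return str(headers.get("X-Speed-Applied", "false")).lower() == "true"


# Placeholder base URL for illustration; replace it with your own deployment.
request = build_speed_request("http://localhost:8000", "Hello!")
# To send: with urllib.request.urlopen(request) as resp: audio = resp.read()
```

On the slim image, where ffmpeg is unavailable, checking `speed_was_applied(resp.headers)` lets a client detect that the requested speed was not honored.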
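TTSFM supports six audio formats. MP3 and WAV are returned directly from openai.fm and are always available; OPUS, AAC, FLAC, and PCM are converted from WAV with ffmpeg and therefore require the full image.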
| Format | MIME Type | Availability | Description |
|---|---|---|---|
| mp3 | audio/mpeg | Always | Direct from openai.fm, best compatibility |
| wav | audio/wav | Always | Direct from openai.fm, uncompressed |
| opus | audio/opus | Full Image | Converted from WAV, internet streaming |
| aac | audio/aac | Full Image | Converted from WAV, digital audio |
| flac | audio/flac | Full Image | Converted from WAV, lossless compression |
| pcm | audio/pcm | Full Image | Converted from WAV, raw samples at 24kHz |
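The table can be mirrored in a small lookup so that a client can verify the Content-Type it receives and know in advance which formats depend on ffmpeg. The names below are hypothetical and only restate the table; they are not part of the TTSFM package.

```python
# Format -> MIME type, as listed in the table above.
MIME_TYPES = {
    "mp3": "audio/mpeg",
    "wav": "audio/wav",
    "opus": "audio/opus",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "pcm": "audio/pcm",
}

# Formats marked "Always" in the table; the rest need the full image.
ALWAYS_AVAILABLE = {"mp3", "wav"}


def needs_ffmpeg(fmt: str) -> bool:
    """Return True when the format is produced by ffmpeg conversion."""
    if fmt not in MIME_TYPES:
        raise ValueError(f"unsupported format: {fmt!r}")
    return fmt not in ALWAYS_AVAILABLE


def content_type_matches(fmt: str, content_type: str) -> bool:
    """Compare a response Content-Type with the MIME type expected for the format."""
    # Ignore any parameters after ';' (for example a charset) before comparing.
    return content_type.split(";")[0].strip().lower() == MIME_TYPES[fmt]
```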
TTSFM automatically handles long text by splitting it into chunks and combining the audio output. The openai.fm service has a limit of approximately 1000 characters per request.
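To make the idea concrete, here is a minimal sketch of word-boundary splitting. It is not TTSFM's actual implementation: `split_text` is a hypothetical function, it collapses runs of whitespace into single spaces, and it hard-splits any single word longer than the limit.

```python
def split_text(text: str, max_length: int = 1000) -> list:
    """Split text into chunks of at most max_length characters, breaking at spaces."""
    chunks, current = [], ""
    for word in text.split():
        # A single word longer than max_length is cut so no chunk exceeds the limit.
        while len(word) > max_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_length])
            word = word[max_length:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_length:
            current = candidate
        else:
            # The word does not fit, so close the current chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


chunks = split_text("one two three four", max_length=9)
```

Each chunk would then be sent as a separate request and the resulting audio combined in order.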
- max_length: Maximum characters per chunk (default: 1000)
- validate_length: Raise error if text exceeds limit (default: False)
- preserve_words: Split at word boundaries (default: True)
- auto_combine: Automatically combine chunks (default: True)

Install the TTSFM Python package for easy integration into your Python applications.
pip install ttsfm
from ttsfm import TTSClient, Voice, AudioFormat

# Create client
client = TTSClient()

# Generate speech
response = client.generate_speech(
    text="Hello, world!",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3,
    speed=1.0
)

# Save to file
response.save_to_file("output.mp3")
Automatically split and combine long text:
# Auto-combine mode (single file output)
response = client.generate_speech(
    text="Very long text...",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3,
    auto_combine=True  # Default
)
response.save_to_file("combined.mp3")

# Manual chunks mode (multiple files)
responses = client.generate_speech_long_text(
    text="Very long text...",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3
)
for i, resp in enumerate(responses, 1):
    resp.save_to_file(f"part_{i:03d}.mp3")
from ttsfm import AsyncTTSClient, Voice, AudioFormat
import asyncio

async def main():
    client = AsyncTTSClient()
    response = await client.generate_speech(
        text="Hello, async world!",
        voice=Voice.ALLOY,
        response_format=AudioFormat.MP3
    )
    response.save_to_file("async_output.mp3")

asyncio.run(main())
Stream audio generation in real-time using WebSocket for better user experience with long text.
ws://{{ request.host }}/ws/generate
{
  "text": "Your text here",
  "voice": "alloy",
  "format": "mp3",
  "speed": 1.0
}
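After this request, the server replies with a sequence of start, chunk, complete, and error messages. The sketch below shows one way a client might assemble the audio from them. It rests on assumptions the documentation does not confirm: that each message is a JSON object with a `type` field, that chunk messages carry base64 audio in a `data` field, and that error messages carry a `message` field. Whether simple concatenation gives a playable file also depends on the audio format.

```python
import base64
import json


def collect_audio(raw_messages) -> bytes:
    """Concatenate base64 audio from chunk messages until 'complete'; raise on 'error'."""
    audio = bytearray()
    for raw in raw_messages:
        message = json.loads(raw)
        kind = message.get("type")  # hypothetical field name
        if kind == "chunk":
            # "data" is a hypothetical field name for the base64 payload.
            audio.extend(base64.b64decode(message["data"]))
        elif kind == "error":
            raise RuntimeError(message.get("message", "generation failed"))
        elif kind == "complete":
            break
        # A "start" message carries no audio, so it is ignored here.
    return bytes(audio)
```

In a real client, `raw_messages` would be the stream of text frames received from the WebSocket connection.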
- start: Generation started
- chunk: Audio chunk ready (base64 encoded)
- complete: All chunks sent
- error: Error occurred

TTSFM provides clear error messages with helpful hints for troubleshooting.
| Code | Description | Solution |
|---|---|---|
| ffmpeg_required | Feature requires ffmpeg (not available in slim image) | Use full Docker image: dbcccc/ttsfm:latest |
| invalid_voice | Voice ID not recognized | Use one of: alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, verse |
| invalid_format | Audio format not supported | Use: mp3, wav, opus, aac, flac, or pcm |
| invalid_speed | Speed value out of range | Use value between 0.25 and 4.0 |
| text_too_long | Text exceeds maximum length | Enable auto_combine or split text manually |
{
  "error": {
    "message": "Format 'opus' requires ffmpeg. Available formats: mp3, wav",
    "type": "feature_unavailable_error",
    "code": "ffmpeg_required",
    "hint": "Use the full Docker image (dbcccc/ttsfm:latest) instead of the slim variant.",
    "available_formats": ["mp3", "wav"]
  }
}
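A client can use the `code` and `available_formats` fields to recover automatically. The sketch below parses the sample error above and suggests a format to retry with. `suggest_retry_format` is a hypothetical helper, and retrying with the first listed format is only one possible policy.

```python
import json

# Sample error response, copied from the example above.
error_body = json.loads("""
{
  "error": {
    "message": "Format 'opus' requires ffmpeg. Available formats: mp3, wav",
    "type": "feature_unavailable_error",
    "code": "ffmpeg_required",
    "hint": "Use the full Docker image (dbcccc/ttsfm:latest) instead of the slim variant.",
    "available_formats": ["mp3", "wav"]
  }
}
""")


def suggest_retry_format(body: dict, default: str = "mp3"):
    """Return a format to retry with for ffmpeg_required errors, otherwise None."""
    error = body.get("error", {})
    if error.get("code") != "ffmpeg_required":
        return None
    formats = error.get("available_formats") or [default]
    return formats[0]


retry_format = suggest_retry_format(error_body)
```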