Open-source • Self-hosted • Apache-2.0

Voice agents that actually ship.

OpenVox is the open platform for building, testing, and deploying production-grade voice agents. Every layer is swappable. It runs as self-hosted glue: your providers see audio and text, but no OpenVox cloud sits in the loop.

🎙 Talk to the Setup Assistant — it'll build your first agent for you. Or hop straight to the dashboard.

Powered by: BytePlus Seed-2.0 · ElevenLabs · Deepgram · OpenAI GPT-5 · Anthropic Claude · Google Gemini · DeepSeek · Cartesia · AssemblyAI · Twilio · Telegram
29 Templates · 7 Languages · 41 BytePlus voices · 30+ Built-in skills · 14 Providers · 8 Channels

Pre-built templates, ready to launch

29 production blueprints across 7 languages. Customise the prompt, plug your skills, ship.

E-commerce support

Order lookups, returns, stock checks. Voice-first.

Education tutor

Science and math, with worked examples.

Stock analyst

Live quotes and indicator-driven analysis.

Voice analyzer

Sentiment, profanity, and call-quality QA.

Receptionist

Appointment booking with conflict detection — voice or phone.

SDR / outbound sales

BANT-qualifies leads, books demos, hands off to humans.

Multilingual hotline

Auto-detects EN, ZH, ES, ID, FR + more — voice swaps per language.

Document Q&A

RAG over your PDFs and docs. Voice-in, voice-out. Falls back to BM25 when the embeddings endpoint returns a 404.
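
For readers curious what a BM25 fallback can look like, here is a minimal sketch in plain Python. It is not OpenVox's actual retrieval code: the `retrieve` and `bm25_rank` functions and the `embed_search` callable are hypothetical names, and the scoring uses the common Okapi BM25 formula with default `k1` and `b` values chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return document indices sorted by Okapi BM25 score, best first."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / max(n_docs, 1)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


def retrieve(query, docs, embed_search=None):
    """Try a (hypothetical) embedding search first; use BM25 if it fails."""
    if embed_search is not None:
        try:
            return embed_search(query, docs)
        except Exception:
            pass  # for example, the embeddings endpoint returned a 404
    return bm25_rank(query, docs)
```

Keyword ranking needs no network call or model, which is why it makes a reasonable last resort when the embedding path is down.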

Email Assistant

Gmail MCP wired — summarise inbox, draft replies by voice.

Calendar Scheduler

Google Calendar MCP — book, reschedule, find slots without typing.

Everything you need, none of the lock-in

Pick the providers that fit. Build skills in plain Python. Deploy to a single laptop or a Kubernetes cluster with the same code path.

Sub-300ms first audio · <100ms interrupt

Sentence-level streaming pipeline + Silero VAD. Measured P50=58ms, P95=121ms on interrupt.
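
The idea behind sentence-level streaming is that text-to-speech can start on the first finished sentence while the LLM is still generating the rest. The sketch below shows one way to cut a token stream into sentences. It is an illustration only: `sentences_from_stream` is a hypothetical name, and the boundary rule (terminal punctuation followed by whitespace) is an assumption that will mis-split abbreviations such as "Dr. Smith".

```python
import re

# A boundary is ASCII terminal punctuation followed by whitespace,
# or a CJK full stop / question mark / exclamation mark on its own.
_SENTENCE_END = re.compile(r"(?:[.!?]+[\"')\]]*\s+|[。！？]+)")


def sentences_from_stream(chunks):
    """Yield complete sentences as soon as they appear in a stream of chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        match = _SENTENCE_END.search(buffer)
        while match:
            sentence = buffer[:match.end()].strip()
            if sentence:
                yield sentence
            buffer = buffer[match.end():]
            match = _SENTENCE_END.search(buffer)
    # Flush whatever is left once the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail


if __name__ == "__main__":
    stream = ["Your order shi", "pped today. It costs $3", ".50 to return. ", "Anything else?"]
    for sentence in sentences_from_stream(stream):
        print(sentence)
```

Requiring whitespace after the punctuation keeps "$3.50" intact even when the decimal point arrives at a chunk boundary, at the price of waiting one more chunk before a sentence is released.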

Build by voice

Talk to the Setup Assistant — it picks a template, fills your prompt, attaches skills, publishes. No form-filling.

Eval framework

Synthetic personas (paranoid, angry, ESL) spar against your agent. Replay real calls. Catch regressions in CI.

Pluggable providers

14 providers across LLM / STT / TTS / VAD. Swap any layer per-agent — even mid-call.
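
A swappable layer usually comes down to a shared interface plus a registry keyed by name. The following sketch shows that pattern for TTS. Every name here (`TTSProvider`, `register_tts`, `Agent.use_tts`, the two fake providers) is hypothetical and is not taken from the OpenVox codebase.

```python
from typing import Callable, Dict, Protocol


class TTSProvider(Protocol):
    """Anything that can turn text into audio bytes."""

    def synthesize(self, text: str) -> bytes: ...


_TTS_REGISTRY: Dict[str, Callable[[], TTSProvider]] = {}


def register_tts(name: str):
    """Class decorator that adds a provider factory to the registry."""
    def decorator(factory):
        _TTS_REGISTRY[name] = factory
        return factory
    return decorator


@register_tts("fake-a")
class FakeA:
    def synthesize(self, text: str) -> bytes:
        return b"A:" + text.encode("utf-8")


@register_tts("fake-b")
class FakeB:
    def synthesize(self, text: str) -> bytes:
        return b"B:" + text.encode("utf-8")


class Agent:
    def __init__(self, tts_name: str):
        self.use_tts(tts_name)

    def use_tts(self, name: str) -> None:
        """Swap the TTS layer; the next utterance uses the new provider."""
        if name not in _TTS_REGISTRY:
            raise KeyError(f"unknown TTS provider: {name}")
        self.tts = _TTS_REGISTRY[name]()

    def say(self, text: str) -> bytes:
        return self.tts.synthesize(text)


if __name__ == "__main__":
    agent = Agent("fake-a")
    print(agent.say("hello"))
    agent.use_tts("fake-b")  # swapped between two utterances
    print(agent.say("hello"))
```

Because the agent looks up its provider at call time, a swap takes effect on the next utterance without restarting the session.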

Every channel

Browser RTC, Twilio (in + out), WhatsApp, Telegram, WeChat Work, Lark. One agent, eight surfaces.

Skills, MCP, hot-reload

30+ built-in skills. 8 MCP catalogue servers (Slack, Gmail, Calendar, GitHub, HubSpot, …). Drop a .py — auto-reloads.
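
One simple way to get "drop a file and it reloads" behaviour is to poll a directory and re-execute any `.py` file whose modification time changed. The sketch below does exactly that. It is an assumption about how hot-reload could work, not a description of OpenVox internals, and the `SkillLoader` class and its `refresh` method are hypothetical.

```python
import types
from pathlib import Path


class SkillLoader:
    """Loads every .py file in a directory and reloads files that change."""

    def __init__(self, skills_dir):
        self.skills_dir = Path(skills_dir)
        self._mtimes = {}  # path -> last seen modification time (ns)
        self.skills = {}   # skill name -> loaded module

    def refresh(self):
        """Load new or modified files; return the names that were (re)loaded."""
        reloaded = []
        for path in sorted(self.skills_dir.glob("*.py")):
            mtime = path.stat().st_mtime_ns
            if self._mtimes.get(path) == mtime:
                continue  # unchanged since the last scan
            module = types.ModuleType(path.stem)
            module.__file__ = str(path)
            source = path.read_text(encoding="utf-8")
            # Compile from source each time so no stale bytecode is reused.
            exec(compile(source, str(path), "exec"), module.__dict__)
            self._mtimes[path] = mtime
            self.skills[path.stem] = module
            reloaded.append(path.stem)
        return reloaded
```

Calling `refresh()` on a timer, or from a filesystem watcher, picks up edits without a restart. A production loader would also need to handle syntax errors in a half-saved file and deleted skills, which this sketch leaves out.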

Transparent cost calculator

Cited rate card per provider. Per-session breakdown. What-if matrix shows you the cheapest combo for the call you just made.
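
A what-if matrix can be built by pricing the same session against every provider combination and sorting by total. The sketch below illustrates the arithmetic. The provider names and every rate in `RATES` are placeholders invented for this example, not real prices, and `session_cost` and `what_if` are hypothetical function names.

```python
from itertools import product

# PLACEHOLDER rates in USD, invented for illustration. Not real prices.
RATES = {
    "stt": {"stt-a": 0.006, "stt-b": 0.004},  # per audio minute
    "llm": {"llm-a": 0.002, "llm-b": 0.010},  # per 1,000 tokens
    "tts": {"tts-a": 0.015, "tts-b": 0.030},  # per 1,000 characters
}


def session_cost(combo, minutes, tokens, chars):
    """Per-layer cost breakdown for one (stt, llm, tts) combination."""
    stt, llm, tts = combo
    return {
        "stt": RATES["stt"][stt] * minutes,
        "llm": RATES["llm"][llm] * tokens / 1000,
        "tts": RATES["tts"][tts] * chars / 1000,
    }


def what_if(minutes, tokens, chars):
    """Price the session on every combination, cheapest first."""
    rows = []
    for combo in product(RATES["stt"], RATES["llm"], RATES["tts"]):
        breakdown = session_cost(combo, minutes, tokens, chars)
        rows.append((round(sum(breakdown.values()), 6), combo))
    return sorted(rows)


if __name__ == "__main__":
    # A session with 5 audio minutes, 4,000 LLM tokens, 2,000 TTS characters.
    for total, combo in what_if(5, 4000, 2000):
        print(f"${total:.3f}  {' + '.join(combo)}")
```

The usage figures (minutes, tokens, characters) come from the call that was just made, so the matrix answers "what would this exact call have cost elsewhere".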

Self-hosted, no cloud middle-man

Runs on your laptop or your cluster. SQLite + filesystem out of the box. Postgres + S3/TOS when you scale.

GDPR-aware

Configurable retention, regional residency, transcript-only mode, PII masking.
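
As a rough illustration of PII masking on transcripts, the sketch below replaces email addresses and phone-like digit runs with placeholder tags before text is stored. The patterns are deliberately simple assumptions, `mask_pii` is a hypothetical name, and real masking would need locale-aware rules and more categories (names, addresses, card numbers).

```python
import re

_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# At least nine digits/separators in a row, optionally starting with "+".
_PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text):
    """Replace emails and phone-like numbers with placeholder tags."""
    text = _EMAIL.sub("[EMAIL]", text)
    return _PHONE.sub("[PHONE]", text)


if __name__ == "__main__":
    print(mask_pii("Call me at +1 415-555-0100 or mail jo@example.com."))
```

The phone pattern also catches other long digit runs such as order numbers, so a stricter rule may be needed where those must stay readable.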

One config. Every provider.

.env files supply credentials. Swap providers per-agent at runtime — even mid-call.
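
For reference, a `.env` file is a list of `KEY=value` lines. The sketch below parses one into a dictionary using only the standard library. The variable names in the sample are hypothetical, and the parser handles only the basic syntax (comments, blank lines, optional matching quotes), not the extensions some dotenv tools add.

```python
def parse_env(text):
    """Parse KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


if __name__ == "__main__":
    sample = """
    # Hypothetical variable names, for illustration only
    STT_API_KEY=abc123
    TTS_API_KEY="quoted value"
    """
    print(parse_env(sample))
```

Keeping credentials in `.env` and provider choices in the agent config means a provider swap never requires touching secrets.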

BytePlus
OpenAI
Anthropic
Gemini
DeepSeek
ElevenLabs
Deepgram
AssemblyAI
Cartesia
Whisper
Silero VAD
Twilio
WhatsApp
Telegram
WeChat Work
Lark

Ready to build?

Run docker compose up and you're live.