Metadata-Version: 2.4
Name: zrt
Version: 0.0.1b1
Summary: Build real-time AI voice agents in Python. Zero Runtime runs the speech-to-speech pipeline (STT, LLM, TTS) for you.
Author-email: Zujo Tech Pvt Ltd <support@videosdk.live>
License-Expression: LicenseRef-Proprietary
Project-URL: Homepage, https://zeroruntime.ai/
Project-URL: Examples, https://github.com/ZeroRuntimeAI/zrt-python-sdk-examples
Keywords: voice-agents,voice-ai,ai-voice-agent,conversational-ai,voice-assistant,speech-to-speech,realtime-voice,voicebot,llm,stt,tts,speech-to-text,text-to-speech,telephony,sip,webrtc,zero-runtime
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: grpcio>=1.81.0
Requires-Dist: protobuf>=6.31.0
Requires-Dist: aiohttp>=3.9.0
Provides-Extra: vision
Requires-Dist: Pillow>=10.0; extra == "vision"
Provides-Extra: dev
Requires-Dist: pytest>=8.4; extra == "dev"
Requires-Dist: pytest-asyncio>=0.24; extra == "dev"
Requires-Dist: grpcio-tools>=1.81.0; extra == "dev"
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: twine>=6.0; extra == "dev"
Requires-Dist: ruff>=0.6; extra == "dev"
Dynamic: license-file

# ZRT — Zero Runtime Python SDK

**Build real-time AI voice agents in Python — without running the infrastructure.**
You write the agent (instructions, tools, logic); **Zero Runtime** runs the live
speech-to-speech pipeline — speech-to-text → LLM → text-to-speech, with turn
detection, denoising, and interruptions — at low latency in the cloud.

> **Write the agent. We run the runtime.**

## A different kind of voice SDK

Most voice frameworks make you run the hard part — media servers, GPUs, turn-taking,
autoscaling. No-code platforms hide all that but lock you into a dashboard.
**Zero Runtime is the middle ground:** real Python and your own providers, with none
of the real-time infrastructure to operate.

| | Self-hosted frameworks | No-code platforms | **Zero Runtime** |
|---|:---:|:---:|:---:|
| Write real Python + custom tools | ✅ | ❌ (dashboard) | ✅ |
| Media servers / GPUs / scaling run for you | ❌ *you run it* | ✅ managed | ✅ managed |
| Swap any STT / LLM / TTS provider | ✅ | limited | ✅ |
| Low-latency speech-to-speech | you tune it | managed | managed |

## Requirements

- Python **3.10+** (matches the package's `Requires-Python: >=3.10`)
- A ZRT runtime endpoint + auth token (from your Zero Runtime account)
- API key(s) for the providers you use (e.g. Deepgram, Google, Cartesia)

## Install

```bash
pip install --pre zrt
```

> Public beta — `--pre` is required until the stable release.

## Quickstart

**1. Set your environment**

```bash
export ZRT_RUNTIME_ADDRESS=us1.rt.zeroruntime.ai:443   # your ZRT runtime
export ZRT_AUTH_TOKEN=<your-token>

export DEEPGRAM_API_KEY=<key>    # speech-to-text
export GOOGLE_API_KEY=<key>      # the LLM (Gemini)
export CARTESIA_API_KEY=<key>    # text-to-speech
```
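
If you want the worker to fail fast when a variable is missing, a small pre-flight check can help. The sketch below uses only the standard library and the variable names from the block above; `missing_env_vars` and `require_env` are hypothetical helpers, not part of `zrt`, and this README does not say how the SDK itself reports a missing variable.

```python
import os

# Variable names are taken from the quickstart; adjust for the providers you use.
REQUIRED_ENV_VARS = (
    "ZRT_RUNTIME_ADDRESS",
    "ZRT_AUTH_TOKEN",
    "DEEPGRAM_API_KEY",
    "GOOGLE_API_KEY",
    "CARTESIA_API_KEY",
)


def missing_env_vars(env=None, required=REQUIRED_ENV_VARS):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


def require_env(env=None):
    """Raise a readable error if any required variable is missing."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_env()` as the first line under `if __name__ == "__main__":` would surface a typo in a variable name before the worker tries to connect.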

**2. Write your agent** — `agent.py`

```python
from zrt.agents import (
    Agent, AgentSession, Pipeline, WorkerJob, JobContext, RoomOptions,
    EOUConfig, InterruptConfig,
)
from zrt.plugins.deepgram import DeepgramSTT
from zrt.plugins.google import GoogleLLM
from zrt.plugins.cartesia import CartesiaTTS
from zrt.plugins.silero import SileroVAD
from zrt.plugins.turn_detector import NamoTurnDetectorV1
from zrt.plugins.rnnoise import RNNoise

IGNORE_PATTERNS = [r"\b(uh+|um+)\b"]   # filler words to drop from transcripts


class Assistant(Agent):
    def __init__(self):
        super().__init__(instructions="You are a friendly voice assistant. Keep replies short.")

    async def on_enter(self):
        await self.session.say("Hi! How can I help?")

    async def on_exit(self):
        pass


async def entrypoint(ctx: JobContext):
    session = AgentSession(
        agent=Assistant(),
        pipeline=Pipeline(
            stt=DeepgramSTT(),
            llm=GoogleLLM(
                model="gemini-2.5-flash",
                thinking_budget=0,
                include_thoughts=False,
                max_output_tokens=8192,
            ),
            tts=CartesiaTTS(),
            vad=SileroVAD(threshold=0.4),
            turn_detector=NamoTurnDetectorV1(language="en", threshold=0.8),
            denoise=RNNoise(),
            eou_config=EOUConfig(mode="ADAPTIVE", min_max_speech_wait_timeout=[0.1, 0.3]),
            interrupt_config=InterruptConfig(
                interrupt_min_duration=0.5,
                interrupt_min_words=2,
                resume_on_false_interrupt=True,
            ),
            stt_filter_patterns=IGNORE_PATTERNS,
            stt_word_substitutions={"recording": "", "recorded": ""},
        ),
    )
    await session.start(wait_for_participant=True, run_until_shutdown=True)


if __name__ == "__main__":
    WorkerJob(
        entrypoint=entrypoint,
        jobctx=lambda: JobContext(room_options=RoomOptions(name="Assistant")),
    ).start()
```

**3. Run it**

```bash
python agent.py
```

That's it — speech in → your agent → speech out, in real time.
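
The quickstart passes `stt_filter_patterns` and `stt_word_substitutions` to the pipeline. Before deploying, it can be useful to try your regexes against sample transcripts locally. The sketch below is only an approximation built on the standard `re` module: the matching rules (case-insensitive, whole-word substitution, filters applied first) are assumptions, `preview_cleanup` is a hypothetical helper, and the runtime's actual behavior may differ.

```python
import re

# Same values as in the quickstart agent.
IGNORE_PATTERNS = [r"\b(uh+|um+)\b"]
WORD_SUBSTITUTIONS = {"recording": "", "recorded": ""}


def preview_cleanup(text, patterns=IGNORE_PATTERNS, substitutions=WORD_SUBSTITUTIONS):
    """Local approximation only; not the runtime's implementation."""
    # Assumption: every match of a filter pattern is removed, ignoring case.
    for pattern in patterns:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Assumption: substitutions replace whole words, ignoring case.
    for word, replacement in substitutions.items():
        text = re.sub(rf"\b{re.escape(word)}\b", replacement, text, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by removed words.
    return " ".join(text.split())


if __name__ == "__main__":
    print(preview_cleanup("um I uhh want it recorded now"))
```

The `\b` word boundaries in the pattern are what keep words such as "umbrella" intact, which is worth checking whenever you add a new filler pattern.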

## How it works

| Piece | What it is |
|---|---|
| **`Agent`** | Your behavior — instructions, tools, what it says on enter/exit. |
| **`Pipeline`** | The voice stack: STT (hear) → LLM (think) → TTS (speak), plus VAD, turn detection, and denoising. |
| **`WorkerJob`** | Runs your agent and connects it to Zero Runtime. |

## Give your agent tools

Let the LLM call your Python functions — just decorate them:

```python
from zrt.agents import function_tool

@function_tool
async def get_weather(city: str) -> dict:
    """Get the weather for a city.

    Args:
        city: City name
    """
    return {"city": city, "temp_c": 22}

# then pass it to your agent in the tools list:
#   super().__init__(instructions="...", tools=[get_weather])
```

Your tool runs in your worker; the runtime calls it when the LLM decides to.
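
The type hints and docstring are the natural place to describe a tool, so it pays to keep them accurate. How `function_tool` builds its schema is not documented here; the sketch below uses only the standard `inspect` module to show what can be recovered from a plain function. `describe_tool` is a hypothetical helper, not part of `zrt`, and the decorator is left off so the example runs without the SDK.

```python
import asyncio
import inspect


async def get_weather(city: str) -> dict:
    """Get the weather for a city.

    Args:
        city: City name
    """
    return {"city": city, "temp_c": 22}


def describe_tool(fn):
    """Hypothetical helper: summarize a function's name, docstring, and parameters."""
    doc = inspect.getdoc(fn) or ""
    params = {
        # Use the annotation's class name when it has one, else its string form.
        name: getattr(p.annotation, "__name__", str(p.annotation))
        for name, p in inspect.signature(fn).parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": params,
    }


if __name__ == "__main__":
    print(describe_tool(get_weather))
    # The tool is an ordinary coroutine, so it can be called directly in tests.
    print(asyncio.run(get_weather("Pune")))
```

Because the tool body is an ordinary coroutine, you can unit-test its logic without a live session.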

## Providers

Mix and match providers: pick the best model for each stage, and swap any of them with a one-line change.

- **Speech-to-text (STT):** Deepgram, AssemblyAI, Google, Azure, Gladia, NVIDIA, Sarvam
- **LLM:** OpenAI, Google Gemini, Anthropic Claude, Groq, Cerebras, xAI Grok, Sarvam
- **Text-to-speech (TTS):** Cartesia, ElevenLabs, Google, AWS Polly, Azure, Deepgram, Rime, LMNT, Neuphonic, Hume AI, Inworld, Murf, Resemble, Smallest, Speechify, CambAI, NVIDIA
- **Realtime speech-to-speech:** OpenAI Realtime, Gemini Live, Ultravox, Azure Voice Live
- **Turn detection:** Namo · **VAD:** Silero · **Denoise:** RNNoise

```python
from zrt.plugins.elevenlabs import ElevenLabsTTS   # different TTS
from zrt.plugins.anthropic import AnthropicLLM      # different LLM
```

## Use cases

Phone & telephony agents, IVR replacement, customer-support voice bots, voice
assistants, outbound/inbound call automation, and any real-time conversational AI.

## FAQ

**How is this different from a voice-agent framework?**
* Frameworks make you host and scale the real-time runtime (media, GPUs, turn-taking).
ZRT runs that for you — you only write and deploy the agent.

**How is it different from a no-code voice platform?**
* You write real Python with your own tools, logic, and providers — not a dashboard
configuration. Full code control, zero infrastructure.

**Can I use my own STT / LLM / TTS providers?**
* Yes — mix any supported providers, and bring your own API keys.

**What do I need to run it?**
* A ZRT runtime endpoint + token and the provider keys for the stages you use.

## Examples

More complete examples: https://github.com/ZeroRuntimeAI/zrt-python-sdk-examples

## Contact

support@videosdk.live

Copyright © 2026 Zujo Tech Pvt Ltd. All rights reserved.
