Metadata-Version: 2.4
Name: voiceeval-sdk
Version: 0.1.9
Summary: Enterprise-grade Observability and Evaluation SDK for Voice Agents
Project-URL: Repository, https://github.com/voiceeval/voiceeval-sdk
Project-URL: Homepage, https://voiceeval.com
Author-email: VoiceEval Team <hello@voiceeval.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: httpx>=0.24.0
Requires-Dist: opentelemetry-api>=1.20.0
Requires-Dist: opentelemetry-exporter-otlp-proto-http>=1.20.0
Requires-Dist: opentelemetry-instrumentation-anthropic>=0.40b0
Requires-Dist: opentelemetry-instrumentation-google-generativeai>=0.40b0
Requires-Dist: opentelemetry-instrumentation-openai>=0.40b0
Requires-Dist: opentelemetry-sdk>=1.20.0
Requires-Dist: pydantic>=2.0
Requires-Dist: python-dotenv>=0.20.0
Description-Content-Type: text/markdown

# VoiceEval SDK (Python)

[![Python](https://img.shields.io/badge/Python-3.10%2B-blue)](https://www.python.org/)
[![License](https://img.shields.io/badge/License-MIT-green)](LICENSE)
[![OpenTelemetry](https://img.shields.io/badge/OpenTelemetry-Native-purple)](https://opentelemetry.io/)

**VoiceEval** is an enterprise-grade observability and evaluation SDK for Voice Agents and LLM-powered applications. Built on OpenTelemetry, it provides zero-config auto-instrumentation with detailed tracing, latency breakdown, and cost analysis.

## Key Features

- **Zero-Config Auto-Instrumentation**: Automatically traces calls from major LLM providers (OpenAI, Anthropic, Google Gemini) and LiveKit Agents — no code changes needed.
- **LiveKit Native**: Automatically integrates with LiveKit's tracing infrastructure. Just initialize the Client and all agent spans are captured.
- **Selective Monitoring**: Control which calls are traced with `auto_monitor`, `sample_rate`, `monitor_call()`, and `skip_call()`.
- **High Performance**: Built on OpenTelemetry with async batch exports (OTLP/HTTP), ensuring negligible runtime overhead.

## Installation

```bash
pip install voiceeval-sdk
# or
uv add voiceeval-sdk
```

## Quickstart

### 1. Initialize the Client

Add a single `Client(...)` call at the top of your agent file. This sets up OTel tracing and auto-instruments all installed LLM libraries and LiveKit.

```python
from voiceeval import Client

client = Client(
    api_key="your_voiceeval_api_key",   # or set VOICE_EVAL_API_KEY env var
    agent_name="my-booking-agent",      # identifies this agent in the dashboard
)
```
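
If you prefer to keep the key out of source code, a `.env` file works too; `python-dotenv` is already a declared dependency. A minimal sketch, assuming the client falls back to the `VOICE_EVAL_API_KEY` environment variable as noted above:

```python
# .env contains: VOICE_EVAL_API_KEY=your_voiceeval_api_key
from dotenv import load_dotenv
from voiceeval import Client

load_dotenv()  # loads .env into os.environ
client = Client(agent_name="my-booking-agent")  # key is read from the environment
```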

### 2. LiveKit Agent Example

```python
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from voiceeval import Client

# Initialize VoiceEval — auto-instruments all LLM calls and LiveKit spans
client = Client(
    api_key="your_voiceeval_api_key",
    agent_name="my-booking-agent",
)

class MyAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a helpful voice assistant.")

async def entrypoint(ctx: JobContext):
    session = AgentSession(
        stt=...,
        llm=...,
        tts=...,
    )
    await session.start(agent=MyAgent(), room=ctx.room)
    await ctx.connect()

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, agent_name="my-agent"))
```
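
The worker runs through the standard LiveKit Agents CLI (for example, `python agent.py dev` during development); VoiceEval doesn't change how the worker is launched.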

### 3. Standalone LLM Example

Works without LiveKit too — any OpenAI/Anthropic/Gemini calls are automatically traced:

```python
from voiceeval import Client
from openai import OpenAI

client = Client(api_key="your_voiceeval_api_key")

openai_client = OpenAI()
response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello world"}]
)
# Trace is automatically captured and exported
```
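
The same holds for the other instrumented providers. For example, an Anthropic call is captured identically (the model name below is illustrative):

```python
from anthropic import Anthropic
from voiceeval import Client

client = Client(api_key="your_voiceeval_api_key")

anthropic_client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = anthropic_client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello world"}],
)
# The span is exported through the same OTLP batch pipeline
```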

## Client Options

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `api_key` | `str` | `VOICE_EVAL_API_KEY` env var | Your VoiceEval API key |
| `base_url` | `str` | `https://api.voiceeval.com/v1/traces` | VoiceEval ingestion endpoint |
| `agent_name` | `str` | `None` | Agent identifier shown in the dashboard |
| `auto_monitor` | `bool` | `True` | Monitor all calls automatically |
| `sample_rate` | `float` | `1.0` | Fraction of calls to monitor (0.0 to 1.0) |
| `span_post_processors` | `list` | `None` | Custom span post-processing functions |
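
`span_post_processors` is useful for redacting or enriching spans before export. The exact callback signature isn't documented in this README, so the sketch below is a hypothetical shape: it assumes each processor is called with a span-like object exposing a mutable `attributes` dict, and the `llm.prompt` attribute key is likewise an assumption.

```python
from voiceeval import Client

def redact_prompts(span):
    # Hypothetical hook: strip raw prompt text before the span leaves the process.
    # Both the span interface and the attribute key are assumptions.
    if "llm.prompt" in span.attributes:
        span.attributes["llm.prompt"] = "[REDACTED]"

client = Client(
    api_key="your_voiceeval_api_key",
    span_post_processors=[redact_prompts],
)
```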

## Selective Monitoring

By default, every call is monitored (`auto_monitor=True`). You can control this at the client level or per-call.

### Sample a fraction of calls

```python
client = Client(
    api_key="your_voiceeval_api_key",
    agent_name="my-booking-agent",
    sample_rate=0.1,  # Randomly monitor 10% of calls
)
```

### Skip specific calls

Use `skip_call()` inside your entrypoint to opt a specific call out of monitoring:

```python
from livekit.agents import AgentSession, JobContext
from voiceeval import Client, skip_call

client = Client(
    api_key="your_voiceeval_api_key",
    agent_name="my-booking-agent",
)

# Registered with cli.run_app(WorkerOptions(...)) as in the quickstart.
async def entrypoint(ctx: JobContext):
    # Decide based on room metadata, participant info, etc.
    if ctx.room.name.startswith("internal-"):
        skip_call()  # This call won't be monitored or evaluated

    session = AgentSession(stt=..., llm=..., tts=...)
    await session.start(agent=MyAgent(), room=ctx.room)
    await ctx.connect()
```

### Monitor only specific calls

Set `auto_monitor=False` so no calls are monitored by default, then use `monitor_call()` to opt in:

```python
from livekit.agents import AgentSession, JobContext
from voiceeval import Client, monitor_call

client = Client(
    api_key="your_voiceeval_api_key",
    agent_name="my-booking-agent",
    auto_monitor=False,
)

# Registered with cli.run_app(WorkerOptions(...)) as in the quickstart.
async def entrypoint(ctx: JobContext):
    # Only monitor production calls, not test rooms
    if not ctx.room.name.startswith("test-"):
        monitor_call()  # This call will be traced and evaluated

    session = AgentSession(stt=..., llm=..., tts=...)
    await session.start(agent=MyAgent(), room=ctx.room)
    await ctx.connect()
```

When a call is skipped (or not opted in), its spans are still exported to the dashboard but won't create backend records or trigger evaluations.

## Manual Tracing (Optional)

For non-LLM functions like business logic or RAG pipelines, use the `@observe` decorator:

```python
from voiceeval import observe

@observe(name_override="rag_retrieval")
def retrieve_documents(query: str):
    docs = ...  # your retrieval logic here
    return docs
```
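
Since the SDK is built on OpenTelemetry, decorated functions should nest under the active span via the usual parent/child context propagation. A sketch under that assumption (`summarize` is a placeholder for your own logic):

```python
from voiceeval import observe

@observe(name_override="rag_pipeline")
def answer(query: str) -> str:
    docs = retrieve_documents(query)  # appears as a child span of rag_pipeline
    return summarize(docs)            # placeholder for your LLM summarization step
```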

## License

MIT
