Metadata-Version: 2.4
Name: voiceground
Version: 0.1.6b1
Summary: Observability framework for Pipecat voice and multimodal conversational AI
Project-URL: Homepage, https://github.com/poseneror/voiceground
Project-URL: Documentation, https://github.com/poseneror/voiceground#readme
Project-URL: Repository, https://github.com/poseneror/voiceground
Project-URL: Issues, https://github.com/poseneror/voiceground/issues
Author-email: Or Posener <posener.or@gmail.com>
License-Expression: BSD-2-Clause
License-File: LICENSE
Keywords: ai,conversational-ai,observability,pipecat,real-time,voice
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: pipecat-ai>=0.0.99
Provides-Extra: dev
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: examples
Requires-Dist: aiohttp>=3.9; extra == 'examples'
Requires-Dist: loguru>=0.7; extra == 'examples'
Requires-Dist: pipecat-ai[elevenlabs,openai,silero]>=0.0.99; extra == 'examples'
Requires-Dist: pyaudio>=0.2.14; extra == 'examples'
Requires-Dist: python-dotenv>=1.0; extra == 'examples'
Description-Content-Type: text/markdown

# Voiceground

Observability framework for [Pipecat](https://github.com/pipecat-ai/pipecat) voice and multimodal conversational AI.

## Features

- **[Call Simulation](#call-simulation)**: Test your bots with dynamic, LLM-powered simulated users
- **[VoicegroundObserver](#voicegroundobserver)**: Track conversation events following Pipecat's Observer pattern

## Installation

```bash
pip install voiceground
```

Or with [uv](https://github.com/astral-sh/uv):

```bash
uv add voiceground
```

## Call Simulation

Voiceground includes a call simulation feature for testing your bots with dynamic, LLM-powered simulated users. Instead of manual testing, you can define user personas and goals, and let the simulator have realistic conversations with your bot.

![Voiceground Simulation](assets/VoicegroundSimulation.gif)

### Quick Start

```python
import os

from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.openai.stt import OpenAISTTService
from pipecat.services.openai.tts import OpenAITTSService
from voiceground.simulation import VoicegroundSimulation, VoicegroundSimulatorConfig

openai_key = os.environ["OPENAI_API_KEY"]

# Configure the simulated user
config = VoicegroundSimulatorConfig(
    llm=OpenAILLMService(api_key=openai_key, model="gpt-4o-mini"),
    tts=OpenAITTSService(api_key=openai_key, voice="echo"),
    stt=OpenAISTTService(api_key=openai_key),
    system_prompt="""
        You are a customer calling to book a restaurant table.
        Keep your answers short and let the other side lead the conversation.
        Your goal: Book a table for 2 people tomorrow at 7pm.
        Be natural and conversational.
    """,
    initiate_conversation=True,  # Simulator speaks first
    max_turns=3,
    timeout_seconds=45,
)

# Run simulation
async with VoicegroundSimulation(config) as simulation:
    await run_bot(transport=simulation.transport)

# Results available after context exits
print(simulation.results.transcript)
print(f"Turns: {simulation.results.turn_count}")
```

Your `run_bot` function only needs to accept a transport parameter; the simulation transport is a drop-in replacement for a real one:

```python
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask

async def run_bot(transport):
    # Use transport.input() and transport.output() - same as LocalAudioTransport!
    pipeline = Pipeline([
        transport.input(),
        stt, llm, tts,  # your configured STT, LLM, and TTS services
        transport.output(),
    ])
    runner = PipelineRunner()
    await runner.run(PipelineTask(pipeline))
```

The simulation automatically handles turn limiting and timeouts - no extra configuration needed on the bot side.

**Note**: Simulations run faster than real-time because audio input/output is not buffered. This allows for rapid testing and iteration, but timing metrics may not reflect real-world performance characteristics.

### Architecture

```
┌───────────────────────────┐          ┌───────────────────────────┐
│   Simulator Pipeline      │          │     Bot Pipeline          │
│   (The "Fake User")       │          │   (Your actual bot)       │
│                           │          │                           │
│   STT ◄───────────────────┼── audio ─┼─── TTS                    │
│    ↓                      │          │     ↑                     │
│   LLM (user persona)      │          │    LLM                    │
│    ↓                      │          │     ↑                     │
│   TTS ────────────────────┼── audio ─┼──► STT                    │
│                           │          │                           │
└───────────────────────────┘          └───────────────────────────┘
                  VoicegroundBridgeTransport
```

Both pipelines are standard Pipecat pipelines connected via `VoicegroundBridgeTransport`. The simulator's LLM has a system prompt that tells it to act as a user with specific goals.

### VoicegroundSimulatorConfig Options

| Option | Type | Description |
|--------|------|-------------|
| `llm` | `LLMService` | LLM for generating user responses |
| `tts` | `TTSService` | TTS for generating user voice |
| `stt` | `STTService` | STT for transcribing bot speech |
| `system_prompt` | `str` | Instructions for the simulated user persona |
| `initiate_conversation` | `bool` | If True, simulator speaks first (default: False) |
| `max_turns` | `int` | Maximum conversation turns (default: 10) |
| `timeout_seconds` | `float` | Maximum simulation duration (default: 120) |

### VoicegroundSimulationResults

After the simulation completes, `simulation.results` contains:

- `transcript`: List of `VoicegroundTranscriptEntry` objects with role, text, and timestamp
- `events`: All `VoicegroundEvent` objects captured during simulation
- `turn_count`: Number of completed conversation turns
- `duration_seconds`: Total simulation duration
- `termination_reason`: Why the simulation ended (`max_turns`, `timeout`, or `unknown`)
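As an illustration of post-processing these results, here is a minimal sketch that renders a transcript as readable lines. The `Entry` dataclass is a stand-in for `VoicegroundTranscriptEntry`, assuming `role`, `text`, and `timestamp` are plain attributes and that timestamps are seconds since the simulation started; the real shapes may differ.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    # Stand-in for VoicegroundTranscriptEntry (role, text, timestamp)
    role: str
    text: str
    timestamp: float  # seconds since simulation start (assumed unit)

def format_transcript(entries: list[Entry]) -> str:
    """Render a transcript as '[  t.tts] role: text' lines."""
    return "\n".join(
        f"[{e.timestamp:6.2f}s] {e.role}: {e.text}" for e in entries
    )

transcript = [
    Entry("user", "Hi, I'd like to book a table.", 0.8),
    Entry("bot", "Sure! For how many people?", 2.1),
    Entry("user", "Two, tomorrow at 7pm.", 4.0),
]
print(format_transcript(transcript))
```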

## VoicegroundObserver

Track conversation events following Pipecat's Observer pattern for observability and debugging.

![Voiceground Observer](assets/VoicegroundObserver.gif)

### Quick Start

```python
import uuid
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from voiceground import VoicegroundObserver, HTMLReporter

# Create observer with HTML reporter
conversation_id = str(uuid.uuid4())
reporter = HTMLReporter(output_dir="reports")
observer = VoicegroundObserver(
    reporters=[reporter],
    conversation_id=conversation_id
)

# Create pipeline task with observer
task = PipelineTask(
    pipeline=Pipeline([...]),
    observers=[observer]
)

# Run your pipeline
```

## Tested With

Voiceground has been tested with the following Pipecat providers:

### LLM Providers
- [x] OpenAI

### STT Providers
- [x] ElevenLabs
- [x] OpenAI

### TTS Providers
- [x] ElevenLabs
- [x] OpenAI

## Event Categories

Voiceground tracks the following event categories:

| Category | Types | Description |
|----------|-------|-------------|
| `user_speak` | `start`, `end` | User speech events |
| `bot_speak` | `start`, `end` | Bot speech events |
| `stt` | `start`, `end` | Speech-to-text processing (includes transcription text) |
| `llm` | `start`, `first_byte`, `end` | LLM response generation (includes generated text) |
| `tts` | `start`, `first_byte`, `end` | Text-to-speech synthesis |
| `tool_call` | `start`, `end` | LLM function/tool calling |
| `system` | `start`, `end` | System events (e.g., context aggregation) |
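Because each category emits paired `start`/`end` events, a per-category latency can be derived by matching those pairs. Here is a minimal sketch under the assumption that events can be reduced to `(category, type, timestamp_ms)` tuples; the real `VoicegroundEvent` objects are richer than this.

```python
def category_durations(events):
    """Compute per-category durations (ms) by pairing start/end events."""
    starts: dict[str, float] = {}
    durations: dict[str, float] = {}
    for category, etype, ts_ms in events:
        if etype == "start":
            starts[category] = ts_ms
        elif etype == "end" and category in starts:
            # Intermediate types like first_byte are skipped by this branch
            durations[category] = durations.get(category, 0.0) + ts_ms - starts.pop(category)
    return durations

events = [
    ("user_speak", "start", 0.0), ("user_speak", "end", 1500.0),
    ("stt", "start", 200.0), ("stt", "end", 1700.0),
    ("llm", "start", 1750.0), ("llm", "first_byte", 2050.0), ("llm", "end", 2600.0),
    ("tts", "start", 2100.0), ("tts", "end", 2900.0),
    ("bot_speak", "start", 2400.0), ("bot_speak", "end", 4200.0),
]
print(category_durations(events))
# → {'user_speak': 1500.0, 'stt': 1500.0, 'llm': 850.0, 'tts': 800.0, 'bot_speak': 1800.0}
```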

## Opinionated Metrics

Voiceground tracks 7 opinionated metrics per conversation turn, providing comprehensive insights into voice conversation performance:

1. **Turn Duration**: Total time from the first event to the last event in the turn (milliseconds). Measures the complete duration of a conversation turn.

2. **Response Time**: Time from `user_speak:end` to `bot_speak:start` (or from the first event to `bot_speak:start` if the conversation started with bot speech). This is the end-to-end time the user experiences waiting for a response.

3. **Transcription Overhead**: Time from `user_speak:end` to `stt:end` (milliseconds). Measures the latency of speech-to-text processing.

4. **Voice Synthesis Overhead**: Time from `tts:start` to `bot_speak:start` (milliseconds). Measures the latency of text-to-speech synthesis.

5. **LLM Response Time**: Time from `llm:start` to `llm:first_byte` (milliseconds). Measures the time-to-first-byte for the LLM response, indicating how quickly the model starts generating content.

6. **System Overhead**: Time from `stt:end` to `llm:start` (milliseconds). Measures context aggregation and other system processing that occurs between transcription and LLM invocation. Includes labels/metadata about the system operations.

7. **Tools Overhead**: Sum of all individual `tool_call` durations (each `tool_call:end - tool_call:start`) that occur between `llm:start` and `llm:end` (milliseconds). Measures the total time spent executing function/tool calls during LLM processing.

### Metric Relationships

The metrics are related as follows:
- **Response Time** ≈ **Transcription Overhead** + **System Overhead** + **LLM Response Time** + **Tools Overhead** + **Voice Synthesis Overhead**
- **Turn Duration** includes all events in the turn and may be longer than Response Time if there are additional events before or after the main response flow
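To make the approximation concrete, here is a worked sketch with hypothetical timestamps (milliseconds since turn start) showing how the component overheads sum toward the response time; the numbers are illustrative, not from a real run.

```python
# Hypothetical timestamps for one turn, in ms since turn start
user_speak_end  = 1500.0
stt_end         = 1700.0
llm_start       = 1750.0
llm_first_byte  = 2050.0
tts_start       = 2100.0
bot_speak_start = 2400.0
tool_total      = 0.0  # no tool calls in this turn

transcription_overhead   = stt_end - user_speak_end         # 200 ms
system_overhead          = llm_start - stt_end              # 50 ms
llm_response_time        = llm_first_byte - llm_start       # 300 ms
voice_synthesis_overhead = bot_speak_start - tts_start      # 300 ms

response_time = bot_speak_start - user_speak_end            # 900 ms
components = (transcription_overhead + system_overhead
              + llm_response_time + tool_total + voice_synthesis_overhead)
print(response_time, components)  # → 900.0 850.0
# The 50 ms between llm_first_byte and tts_start belongs to no component,
# which is why the relationship is ≈ rather than exact equality.
```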

## Report Features

The generated HTML reports include:

- **Timeline Visualization**: Interactive timeline showing all events and their relationships
- **Events Table**: Detailed view of all tracked events with timestamps, sources, and data
- **Turns Table**: Conversation turns with all 7 opinionated performance metrics
- **Metrics Summary**: Average metrics across the conversation
- **Event Highlighting**: Hover over events or turns to see related events highlighted


## Examples

See the `examples/` directory for complete working examples:

### Bot Implementations

- **bots/restaurant_bot.py**: Restaurant booking assistant bot
- **bots/friendly_assistant_bot.py**: General-purpose friendly assistant bot

Both bots accept STT, LLM, and TTS services as parameters for flexibility.

### Runner Scripts

- **simulations/run_openai_simulation.py**: Call simulation with a restaurant booking scenario using OpenAI
- **observer/run_openai_restaurant_bot.py**: Restaurant bot with OpenAI services (STT, LLM, TTS)
- **observer/run_elevenlabs_restaurant_bot.py**: Restaurant bot with ElevenLabs (STT, TTS) and OpenAI (LLM)

To run an example:

```bash
# Install example dependencies
uv sync --all-extras

# Set required environment variables
export OPENAI_API_KEY=your_key
export ELEVENLABS_API_KEY=your_key  # For ElevenLabs examples

# Run a simulation (recommended first step)
python examples/simulations/run_openai_simulation.py

# Run a restaurant bot example
python examples/observer/run_openai_restaurant_bot.py
# or
python examples/observer/run_elevenlabs_restaurant_bot.py
```

**Note**: On macOS, you'll need to install portaudio for audio support:
```bash
brew install portaudio
```

## Development

```bash
# Clone the repository
git clone https://github.com/poseneror/voiceground.git
cd voiceground

# Install all dependencies (including dev and examples)
uv sync --all-extras

# Run tests
uv run pytest

# Run linting
uv run ruff check .

# Run type checking
uv run mypy src

# Build the client
python scripts/develop.py build

# Run example (requires portaudio on macOS: brew install portaudio)
python scripts/develop.py example
```

## License

BSD-2-Clause License - see [LICENSE](LICENSE) for details.

