Metadata-Version: 2.4
Name: voiceground
Version: 0.1.5
Summary: Observability framework for Pipecat voice and multimodal conversational AI
Project-URL: Homepage, https://github.com/poseneror/voiceground
Project-URL: Documentation, https://github.com/poseneror/voiceground#readme
Project-URL: Repository, https://github.com/poseneror/voiceground
Project-URL: Issues, https://github.com/poseneror/voiceground/issues
Author-email: Or Posener <posener.or@gmail.com>
License-Expression: BSD-2-Clause
License-File: LICENSE
Keywords: ai,conversational-ai,observability,pipecat,real-time,voice
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: pipecat-ai>=0.0.99
Provides-Extra: dev
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: examples
Requires-Dist: aiohttp>=3.9; extra == 'examples'
Requires-Dist: loguru>=0.7; extra == 'examples'
Requires-Dist: pipecat-ai[elevenlabs,openai,silero]>=0.0.99; extra == 'examples'
Requires-Dist: pyaudio>=0.2.14; extra == 'examples'
Requires-Dist: python-dotenv>=1.0; extra == 'examples'
Description-Content-Type: text/markdown

# Voiceground

Observability framework for [Pipecat](https://github.com/pipecat-ai/pipecat) voice and multimodal conversational AI.

## Features

- **VoicegroundObserver**: Track conversation events following Pipecat's Observer pattern
- **Call Simulation**: Test your bots with dynamic, LLM-powered simulated users

## Installation

```bash
pip install voiceground
```

Or with UV:

```bash
uv add voiceground
```

## Quick Start

```python
import uuid
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from voiceground import VoicegroundObserver, HTMLReporter

# Create observer with HTML reporter
conversation_id = str(uuid.uuid4())
reporter = HTMLReporter(output_dir="./reports")
observer = VoicegroundObserver(
    reporters=[reporter],
    conversation_id=conversation_id
)

# Create pipeline task with observer
task = PipelineTask(
    pipeline=Pipeline([...]),
    observers=[observer]
)

# Run your pipeline
```

## Tested With

Voiceground has been tested with the following Pipecat providers:

### LLM Providers
- [x] OpenAI (GPT)

### STT Providers
- [x] ElevenLabs

### TTS Providers
- [x] ElevenLabs

## Event Categories

Voiceground tracks the following event categories:

| Category | Types | Description |
|----------|-------|-------------|
| `user_speak` | `start`, `end` | User speech events |
| `bot_speak` | `start`, `end` | Bot speech events |
| `stt` | `start`, `end` | Speech-to-text processing (includes transcription text) |
| `llm` | `start`, `first_byte`, `end` | LLM response generation (includes generated text) |
| `tts` | `start`, `first_byte`, `end` | Text-to-speech synthesis |
| `tool_call` | `start`, `end` | LLM function/tool calling |
| `system` | `start`, `end` | System events (e.g., context aggregation) |
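
As a purely illustrative sketch (plain tuples, not Voiceground's actual event objects), the `start`/`end` pairs in the table above can be reduced to per-category durations like this:

```python
from collections import defaultdict

# Hypothetical flat event log: (category, type, timestamp in ms).
# Voiceground's real event objects may differ; this only shows how
# the start/end pairs in the table map to durations.
events = [
    ("user_speak", "start", 0),
    ("user_speak", "end", 1200),
    ("stt", "start", 100),
    ("stt", "end", 1350),
    ("llm", "start", 1400),
    ("llm", "first_byte", 1650),
    ("llm", "end", 2100),
    ("tts", "start", 1700),
    ("tts", "end", 2400),
    ("bot_speak", "start", 1950),
    ("bot_speak", "end", 3500),
]

def durations_ms(events):
    """Return {category: end - start} for every category with both marks."""
    marks = defaultdict(dict)
    for category, kind, ts in events:
        marks[category][kind] = ts
    return {
        cat: m["end"] - m["start"]
        for cat, m in marks.items()
        if "start" in m and "end" in m
    }

print(durations_ms(events))
```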

## Opinionated Metrics

Voiceground tracks 7 opinionated metrics per conversation turn, breaking down where time is spent in each turn:

1. **Turn Duration**: Total time from the first event to the last event in the turn (milliseconds). Measures the complete duration of a conversation turn.

2. **Response Time**: Time from `user_speak:end` to `bot_speak:start` (milliseconds), or from the first event to `bot_speak:start` if the conversation started with bot speech. This is the end-to-end time the user experiences waiting for a response.

3. **Transcription Overhead**: Time from `user_speak:end` to `stt:end` (milliseconds). Measures the latency of speech-to-text processing.

4. **Voice Synthesis Overhead**: Time from `tts:start` to `bot_speak:start` (milliseconds). Measures the latency of text-to-speech synthesis.

5. **LLM Response Time**: Time from `llm:start` to `llm:first_byte` (milliseconds). Measures the time-to-first-byte for the LLM response, indicating how quickly the model starts generating content.

6. **System Overhead**: Time from `stt:end` to `llm:start` (milliseconds). Measures context aggregation and other system processing that occurs between transcription and LLM invocation. Includes labels/metadata about the system operations.

7. **Tools Overhead**: Sum of all individual `tool_call` durations (each `tool_call:end - tool_call:start`) that occur between `llm:start` and `llm:end` (milliseconds). Measures the total time spent executing function/tool calls during LLM processing.

### Metric Relationships

The metrics are related as follows:
- **Response Time** ≈ **Transcription Overhead** + **System Overhead** + **LLM Response Time** + **Tools Overhead** + **Voice Synthesis Overhead**
- **Turn Duration** includes all events in the turn and may be longer than Response Time if there are additional events before or after the main response flow
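
With made-up timestamps for a single turn (names follow the event table above, not any actual Voiceground API), the decomposition can be checked directly:

```python
# Hypothetical event timestamps for one turn, in milliseconds.
user_speak_end = 1200
stt_end = 1350
llm_start = 1400
llm_first_byte = 1650
tts_start = 1700
bot_speak_start = 1950
tool_call_durations = []  # no tool calls in this turn

transcription_overhead = stt_end - user_speak_end        # 150 ms
system_overhead = llm_start - stt_end                    # 50 ms
llm_response_time = llm_first_byte - llm_start           # 250 ms
tools_overhead = sum(tool_call_durations)                # 0 ms
voice_synthesis_overhead = bot_speak_start - tts_start   # 250 ms

response_time = bot_speak_start - user_speak_end         # 750 ms
components = (transcription_overhead + system_overhead
              + llm_response_time + tools_overhead
              + voice_synthesis_overhead)                # 700 ms

# The 50 ms difference is the llm:first_byte -> tts:start handoff,
# which is why the relationship above is approximate, not exact.
print(f"response_time={response_time}ms components_sum={components}ms")
```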

## Report Features

The generated HTML reports include:

- **Timeline Visualization**: Interactive timeline showing all events and their relationships
- **Events Table**: Detailed view of all tracked events with timestamps, sources, and data
- **Turns Table**: Conversation turns with all 7 opinionated performance metrics
- **Metrics Summary**: Average metrics across the conversation
- **Event Highlighting**: Hover over events or turns to see related events highlighted

## Call Simulation

Voiceground includes a call simulation feature for testing your bots with dynamic, LLM-powered simulated users. Instead of manual testing, you can define user personas and goals, and let the simulator have realistic conversations with your bot.

### Architecture

```
┌───────────────────────────┐          ┌───────────────────────────┐
│   Simulator Pipeline      │          │     Bot Pipeline          │
│   (The "Fake User")       │          │   (Your actual bot)       │
│                           │          │                           │
│   STT ◄───────────────────┼── audio ─┼─── TTS                    │
│    ↓                      │          │     ↑                     │
│   LLM (user persona)      │          │    LLM                    │
│    ↓                      │          │     ↑                     │
│   TTS ────────────────────┼── audio ─┼──► STT                    │
│                           │          │                           │
└───────────────────────────┘          └───────────────────────────┘
                  VoicegroundBridgeTransport
```

Both pipelines are standard Pipecat pipelines connected via `VoicegroundBridgeTransport`. The simulator's LLM has a system prompt that tells it to act as a user with specific goals.

### Quick Start

```python
from voiceground.simulation import VoicegroundSimulation, VoicegroundSimulatorConfig

# Service import paths may vary with your Pipecat version
from pipecat.services.elevenlabs.stt import ElevenLabsSTTService
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.openai.llm import OpenAILLMService

# Configure the simulated user
config = VoicegroundSimulatorConfig(
    llm=OpenAILLMService(api_key=...),
    tts=ElevenLabsTTSService(api_key=...),
    stt=ElevenLabsSTTService(api_key=...),
    system_prompt="""
        You are a customer calling to book a restaurant table.
        Your goal: Book a table for 2 people tomorrow at 7pm.
        Be natural and conversational.
    """,
    initiate_conversation=True,  # Simulator speaks first
    max_turns=10,
)

# Run simulation
async with VoicegroundSimulation(config) as simulation:
    await run_bot(transport=simulation.transport)

# Results available after context exits
print(simulation.results.transcript)
print(f"Turns: {simulation.results.turn_count}")
```

Your `run_bot` function only needs to accept a transport parameter, making the simulation a drop-in replacement for a real transport:

```python
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask

async def run_bot(transport):
    # transport.input() and transport.output() work the same as LocalAudioTransport's
    pipeline = Pipeline([
        transport.input(),
        stt, llm, tts,  # your bot's own STT/LLM/TTS services
        transport.output(),
    ])
    runner = PipelineRunner()
    await runner.run(PipelineTask(pipeline))
```

The simulation automatically handles turn limiting and timeouts; no extra configuration is needed on the bot side.

**Note**: Simulations run faster than real-time because audio input/output is not buffered. This allows for rapid testing and iteration, but timing metrics may not reflect real-world performance characteristics.

### VoicegroundSimulatorConfig Options

| Option | Type | Description |
|--------|------|-------------|
| `llm` | `LLMService` | LLM for generating user responses |
| `tts` | `TTSService` | TTS for generating user voice |
| `stt` | `STTService` | STT for transcribing bot speech |
| `system_prompt` | `str` | Instructions for the simulated user persona |
| `initiate_conversation` | `bool` | If True, simulator speaks first (default: False) |
| `max_turns` | `int` | Maximum conversation turns (default: 10) |
| `timeout_seconds` | `float` | Maximum simulation duration (default: 120) |

### VoicegroundSimulationResults

After the simulation completes, `simulation.results` contains:

- `transcript`: List of `VoicegroundTranscriptEntry` objects with role, text, and timestamp
- `events`: All `VoicegroundEvent` objects captured during simulation
- `turn_count`: Number of completed conversation turns
- `duration_seconds`: Total simulation duration
- `termination_reason`: Why the simulation ended (`max_turns`, `timeout`, or `unknown`)
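
For instance, a post-run summary could be printed like this. The stand-in dataclasses below only mirror the fields listed above so the sketch is self-contained; the real Voiceground result types may carry additional attributes:

```python
from dataclasses import dataclass

# Stand-ins mirroring the documented result fields (hypothetical types).
@dataclass
class TranscriptEntry:
    role: str
    text: str
    timestamp: float

@dataclass
class Results:
    transcript: list
    turn_count: int
    duration_seconds: float
    termination_reason: str

def summarize(results):
    """Format a transcript plus a one-line footer of summary stats."""
    lines = [f"[{e.timestamp:6.1f}s] {e.role}: {e.text}" for e in results.transcript]
    lines.append(f"turns={results.turn_count} "
                 f"duration={results.duration_seconds:.1f}s "
                 f"ended={results.termination_reason}")
    return "\n".join(lines)

demo = Results(
    transcript=[
        TranscriptEntry("user", "Hi, I'd like to book a table.", 0.8),
        TranscriptEntry("bot", "Sure, for how many people?", 2.1),
    ],
    turn_count=2,
    duration_seconds=14.2,
    termination_reason="max_turns",
)
print(summarize(demo))
```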

## Examples

See the `examples/` directory for complete working examples:

- **observer/basic_pipeline.py**: Basic voice conversation with STT, LLM, and TTS
- **observer/tool_calling_pipeline.py**: Example with LLM function calling
- **simulations/run_simulation.py**: Call simulation with a restaurant booking scenario

To run an example:

```bash
# Install example dependencies
uv sync --all-extras

# Set required environment variables
export OPENAI_API_KEY=your_key
export ELEVENLABS_API_KEY=your_key
export VOICE_ID=your_voice_id

# Run the example
python examples/observer/basic_pipeline.py
```

**Note**: On macOS, you'll need to install portaudio for audio support:
```bash
brew install portaudio
```

## Development

```bash
# Clone the repository
git clone https://github.com/poseneror/voiceground.git
cd voiceground

# Install all dependencies (including dev and examples)
uv sync --all-extras

# Run tests
uv run pytest

# Run linting
uv run ruff check .

# Run type checking
uv run mypy src

# Build the client
python scripts/develop.py build

# Run example (requires portaudio on macOS: brew install portaudio)
python scripts/develop.py example
```

## License

BSD-2-Clause License - see [LICENSE](LICENSE) for details.

