Metadata-Version: 2.4
Name: vision-agents-plugins-assemblyai
Version: 0.4.3
Summary: AssemblyAI streaming STT integration for Vision Agents
Project-URL: Documentation, https://visionagents.ai/
Project-URL: Website, https://visionagents.ai/
Project-URL: Source, https://github.com/GetStream/Vision-Agents
License-Expression: MIT
Keywords: AI,STT,agents,assemblyai,speech-to-text,streaming,transcription,voice agents
Requires-Python: >=3.10
Requires-Dist: aiohttp>=3.13.3
Requires-Dist: vision-agents
Description-Content-Type: text/markdown

# AssemblyAI Plugin

Streaming Speech-to-Text (STT) plugin for Vision Agents using AssemblyAI's Universal-3 Pro model.

## Features

- Real-time streaming transcription via async WebSocket
- Built-in punctuation-based turn detection with configurable silence thresholds
- Streaming diarization — identify speakers in real time
- Native `SpeechStarted` event support
- Custom prompt and keyterms boosting support
- Sub-300 ms latency to a complete transcript
- Built-in reconnection with exponential backoff

## Installation

```bash
uv add "vision-agents[assemblyai]"
# or directly
uv add vision-agents-plugins-assemblyai
```

## Usage

```python
from vision_agents.plugins import assemblyai

stt = assemblyai.STT(
    speech_model="u3-rt-pro",  # Default model
    sample_rate=16000,
)
```

### With streaming diarization

Enable `speaker_labels` to identify speakers in a mixed audio stream. Each transcript event then carries a `participant` that is distinct per speaker, with the raw label available in `response.other["speaker_label"]`.

```python
stt = assemblyai.STT(
    speaker_labels=True,
    max_speakers=2,  # optional hint (1-10)
)
```

### With keyterms boosting

```python
stt = assemblyai.STT(
    keyterms_prompt=["AssemblyAI", "Vision Agents"],
)
```
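
Per the Configuration table, `keyterms_prompt` and `prompt` cannot be combined. The sketch below shows one way to catch that in your own code before building the STT instance. `check_prompt_options` is a hypothetical helper, not part of the plugin, and how the plugin itself reacts to both options being set is not shown here.

```python
from typing import Optional, Sequence


def check_prompt_options(
    prompt: Optional[str] = None,
    keyterms_prompt: Optional[Sequence[str]] = None,
) -> None:
    """Hypothetical pre-flight check (not a plugin API).

    Raises ValueError when both mutually exclusive options are supplied.
    """
    if prompt is not None and keyterms_prompt is not None:
        raise ValueError("pass either `prompt` or `keyterms_prompt`, not both")


# Passes silently: only one of the two options is set.
check_prompt_options(keyterms_prompt=["AssemblyAI", "Vision Agents"])
```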

### With custom turn silence thresholds

```python
stt = assemblyai.STT(
    min_turn_silence=100,   # ms of silence before the speculative end-of-turn check
    max_turn_silence=1200,  # ms of silence before the turn is forced to end
)
```
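
To build intuition for how the two thresholds might interact with punctuation-based turn detection, here is a simplified model. It is an assumption for illustration only: the real decision is made by AssemblyAI's service and may differ, and `turn_should_end` is a hypothetical name.

```python
def turn_should_end(
    transcript: str,
    silence_ms: int,
    min_turn_silence: int = 100,
    max_turn_silence: int = 1200,
) -> bool:
    """Illustrative model only; not AssemblyAI's actual algorithm."""
    # Past the upper threshold the turn ends regardless of punctuation.
    if silence_ms >= max_turn_silence:
        return True
    # Past the lower threshold, end the turn only if the text looks finished
    # (assumed here to mean terminal punctuation).
    if silence_ms >= min_turn_silence:
        return transcript.rstrip().endswith((".", "?", "!"))
    # Not enough silence yet to check anything.
    return False
```

Under this model, a lower `min_turn_silence` lets finished sentences end the turn sooner, while `max_turn_silence` bounds how long an unfinished sentence can hold the turn open.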

## Configuration

| Parameter | Description | Default |
|---|---|---|
| `api_key` | AssemblyAI API key (falls back to `ASSEMBLYAI_API_KEY` env var) | `None` |
| `speech_model` | Model identifier | `"u3-rt-pro"` |
| `sample_rate` | Audio sample rate in Hz | `16000` |
| `min_turn_silence` | Silence (ms) before speculative end-of-turn check | API default |
| `max_turn_silence` | Maximum silence (ms) before forcing turn end | API default |
| `prompt` | Custom transcription prompt (cannot be combined with `keyterms_prompt`) | `None` |
| `keyterms_prompt` | List of terms to boost recognition for (cannot be combined with `prompt`) | `None` |
| `speaker_labels` | Enable streaming diarization for multi-speaker identification | `False` |
| `max_speakers` | Hint for expected number of speakers, 1-10 (requires `speaker_labels=True`) | `None` |
| `max_reconnect_attempts` | Maximum reconnect attempts on transient failures | `3` |
| `reconnect_backoff_initial_s` | Initial backoff delay in seconds | `0.5` |
| `reconnect_backoff_max_s` | Maximum backoff delay in seconds | `4.0` |

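The three reconnect parameters describe an exponential backoff schedule. As a rough sketch, assuming the delay doubles after each attempt and is capped at the maximum (the plugin's exact growth factor and any jitter are not documented here), the delays could be computed as follows. `backoff_delays` is a hypothetical helper.

```python
from typing import List


def backoff_delays(
    max_reconnect_attempts: int = 3,
    reconnect_backoff_initial_s: float = 0.5,
    reconnect_backoff_max_s: float = 4.0,
) -> List[float]:
    """Delay before each reconnect attempt, assuming a doubling factor."""
    return [
        # Double the initial delay per attempt, never exceeding the cap.
        min(reconnect_backoff_initial_s * 2**attempt, reconnect_backoff_max_s)
        for attempt in range(max_reconnect_attempts)
    ]


print(backoff_delays())  # [0.5, 1.0, 2.0]
print(backoff_delays(max_reconnect_attempts=5))  # [0.5, 1.0, 2.0, 4.0, 4.0]
```
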
## Environment Variables

Set `ASSEMBLYAI_API_KEY` in your environment or pass `api_key` to the constructor.

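The fallback order can be pictured with the sketch below: an explicit argument wins, otherwise the environment variable is used. `resolve_api_key` is a hypothetical helper, and raising `ValueError` when no key is found is an assumption, not the plugin's documented behavior.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the documented fallback order."""
    # Prefer the explicit argument, then the ASSEMBLYAI_API_KEY env var.
    key = api_key or os.environ.get("ASSEMBLYAI_API_KEY")
    if not key:
        raise ValueError("set ASSEMBLYAI_API_KEY or pass api_key explicitly")
    return key
```
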
## Dependencies

- `aiohttp>=3.13.3`
- `vision-agents`

## Docs

- https://www.assemblyai.com/docs/streaming/universal-3-pro
- https://www.assemblyai.com/docs/streaming/streaming-diarization-and-multichannel
