Metadata-Version: 2.4
Name: vision-agents-plugins-qwen
Version: 0.5.3
Summary: Qwen Omni plugin for vision agents
Project-URL: Documentation, https://visionagents.ai/
Project-URL: Website, https://visionagents.ai/
Project-URL: Source, https://github.com/GetStream/Vision-Agents
License-Expression: MIT
Requires-Python: >=3.10
Requires-Dist: numpy
Requires-Dist: vision-agents
Requires-Dist: websockets>=15.0.1
Description-Content-Type: text/markdown

# Qwen Realtime Plugin for Vision Agents

Qwen3 Realtime LLM integration for the Vision Agents framework, with native audio output and built-in speech recognition over a WebSocket-based realtime connection.

## Features

- **Native audio output**: No TTS service needed; audio comes directly from the model
- **Built-in STT**: Integrated speech-to-text using `gummy-realtime-v1`, so no external STT service is required
- **Server-side VAD**: Automatic turn detection with configurable silence thresholds
- **Video understanding**: Optional video frame support for multimodal interactions
- **Real-time streaming**: WebSocket-based bidirectional communication for low-latency responses
- **Interruption handling**: The current response is cancelled automatically when the user starts speaking
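Interruption handling comes down to cancelling whatever response is still playing as soon as the user starts speaking. The sketch below shows that pattern with plain `asyncio`. `PlaybackController`, `start_response` and `interrupt` are hypothetical names, not the plugin's API, and a list stands in for the outbound audio track.

```python
from __future__ import annotations

import asyncio


class PlaybackController:
    """Hypothetical helper that owns the task streaming model audio to the call."""

    def __init__(self) -> None:
        self._task: asyncio.Task[None] | None = None
        self.cancelled_responses = 0

    def start_response(self, chunks: list[bytes], sink: list[bytes]) -> None:
        # A new response replaces any response that is still playing.
        self.interrupt()
        self._task = asyncio.create_task(self._play(chunks, sink))

    async def _play(self, chunks: list[bytes], sink: list[bytes]) -> None:
        for chunk in chunks:
            sink.append(chunk)         # stand-in for writing to the outbound audio track
            await asyncio.sleep(0.01)  # simulate real-time pacing

    def interrupt(self) -> None:
        # Call this when VAD reports that the user started speaking.
        if self._task is not None and not self._task.done():
            self._task.cancel()
            self.cancelled_responses += 1


async def demo() -> tuple[list[bytes], int]:
    sink: list[bytes] = []
    controller = PlaybackController()
    controller.start_response([b"a1", b"a2", b"a3", b"a4"], sink)
    await asyncio.sleep(0.015)  # let part of the response play
    controller.interrupt()      # the user barges in
    await asyncio.sleep(0.05)   # give the cancelled task time to unwind
    return sink, controller.cancelled_responses


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Cancelling the task, rather than polling a flag, means the playback coroutine stops at its next `await`, so no further chunks are written once the interruption is signalled.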

## Installation

```bash
uv add "vision-agents[qwen]"
# or directly
uv add vision-agents-plugins-qwen
```

## Usage

```python
from vision_agents.core import User, Agent
from vision_agents.plugins import getstream, qwen

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Qwen Assistant"),
    instructions="Be helpful and friendly",
    llm=qwen.Realtime(
        model="qwen3-omni-flash-realtime",
        voice="Cherry",
        fps=1,
    ),
    # No STT or TTS needed - Qwen Realtime provides both
)
```
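The plugin opens and configures the WebSocket session for you. To illustrate why no separate STT or TTS service is needed, the sketch below builds a session-configuration event in the style of OpenAI-compatible realtime APIs: it requests audio output in a given voice, server-side transcription with `gummy-realtime-v1`, and server-side VAD. The event type `session.update`, every field name, and the 800 ms silence default are assumptions about the wire format, not taken from this plugin; check the DashScope realtime API reference before relying on them.

```python
import json


def build_session_update(voice: str = "Cherry", silence_ms: int = 800) -> str:
    """Hypothetical session-configuration event; the field names are assumptions."""
    event = {
        "type": "session.update",
        "session": {
            # Ask for text plus native audio, so no separate TTS service is needed.
            "modalities": ["text", "audio"],
            "voice": voice,
            # Built-in STT: the server transcribes user audio with this model.
            "input_audio_transcription": {"model": "gummy-realtime-v1"},
            # Server-side VAD: end the user's turn after this much silence.
            "turn_detection": {"type": "server_vad", "silence_duration_ms": silence_ms},
        },
    }
    return json.dumps(event)
```

A client would send this JSON string as the first message after the WebSocket handshake, then stream audio and receive audio and transcripts on the same connection.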

## Configuration

| Parameter       | Description                      | Default                                                  | Accepted Values   |
|-----------------|----------------------------------|----------------------------------------------------------|-------------------|
| `model`         | Qwen Realtime model identifier   | `"qwen3-omni-flash-realtime"`                            | Model name string |
| `api_key`       | DashScope API key                | `None` (reads `DASHSCOPE_API_KEY`)                       | String or `None`  |
| `base_url`      | WebSocket API base URL           | `"wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"` | URL string        |
| `voice`         | Voice for audio output           | `"Cherry"`                                               | Voice name string |
| `fps`           | Video frames per second          | `1`                                                      | Integer           |
| `include_video` | Include video frames in requests | `False`                                                  | Boolean           |
| `video_width`   | Video frame width                | `1280`                                                   | Integer           |
| `video_height`  | Video frame height               | `720`                                                    | Integer           |
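`include_video` and `fps` together determine how many camera frames reach the model. The sketch below shows one way to gate frames, assuming frames are simply dropped between sends; the plugin's actual scheduling is not documented here, `FrameGate` is a hypothetical name, and resizing to `video_width` by `video_height` is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FrameGate:
    """Hypothetical helper deciding which incoming video frames are forwarded."""

    include_video: bool = False  # mirrors the documented `include_video` default
    fps: int = 1                 # mirrors the documented `fps` default
    _last_sent: float | None = field(default=None, init=False, repr=False)

    def should_send(self, timestamp: float) -> bool:
        # With video disabled (the default), no frames are forwarded at all.
        if not self.include_video or self.fps <= 0:
            return False
        interval = 1.0 / self.fps
        # Forward the first frame, then at most one frame per `interval` seconds.
        if self._last_sent is None or timestamp - self._last_sent >= interval:
            self._last_sent = timestamp
            return True
        return False


if __name__ == "__main__":
    gate = FrameGate(include_video=True, fps=1)
    # Simulate 3 seconds of a 30 fps camera: timestamps 0, 1/30, 2/30, ...
    sent = [i for i in range(90) if gate.should_send(i / 30)]
    print(sent)  # → [0, 30, 60]
```

With a 30 fps camera and `fps=1`, only one frame in thirty is forwarded.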

## Environment Variables

Set `DASHSCOPE_API_KEY` in your environment or `.env` file:

```bash
DASHSCOPE_API_KEY=your_dashscope_api_key_here
```
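The `api_key` row suggests a simple lookup order: an explicit key takes precedence, otherwise `DASHSCOPE_API_KEY` is read from the environment. The sketch below encodes that order; `resolve_api_key` is a hypothetical helper, not part of the plugin. It only reads the process environment, so if you keep the key in a `.env` file, something must load that file first, for example `load_dotenv()` from python-dotenv.

```python
from __future__ import annotations

import os


def resolve_api_key(api_key: str | None = None) -> str:
    """Hypothetical helper: an explicit key wins, else fall back to DASHSCOPE_API_KEY."""
    key = api_key or os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        # Fail early with an actionable message rather than when the connection opens.
        raise RuntimeError(
            "No DashScope API key: pass api_key=... or set DASHSCOPE_API_KEY"
        )
    return key
```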

## Example

See `plugins/qwen/example/qwen_realtime_example.py` for a complete working example.

## Dependencies

- vision-agents
- websockets (>=15.0.1)
- numpy
- aiortc
- av
