Metadata-Version: 2.4
Name: trugen-sdk
Version: 1.0.0
Summary: Official Python SDK for TruGen AI - Real-time AI avatar streaming
Author-email: TruGen AI <support@trugen.ai>
License: MIT
Project-URL: Homepage, https://trugen.ai
Project-URL: Documentation, https://docs.trugen.ai
Project-URL: Repository, https://github.com/trugenai/python-sdk
Project-URL: Issues, https://github.com/trugenai/python-sdk/issues
Keywords: trugen,ai,avatar,avatar-streaming,voice-ai,livekit,real-time,webrtc
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Communications
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Multimedia :: Video
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: livekit>=0.11.0
Requires-Dist: aiohttp>=3.8.0
Requires-Dist: python-dotenv>=1.0.0
Provides-Extra: display
Requires-Dist: opencv-python>=4.8.0; extra == "display"
Requires-Dist: sounddevice>=0.4.6; extra == "display"
Requires-Dist: numpy>=1.24.0; extra == "display"
Dynamic: license-file

# TruGen AI Python SDK

Official Python SDK for [TruGen AI](https://trugen.ai) - Real-time AI avatar streaming.

[![PyPI version](https://badge.fury.io/py/trugen-sdk.svg)](https://badge.fury.io/py/trugen-sdk)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

All WebRTC and audio/video processing complexity (LiveKit, Acoustic Echo Cancellation, decoding, threading) is handled **under the hood**. You only ever need to import from `trugen`.

## Installation

```bash
# Using uv (recommended)
uv add trugen-sdk

# With optional display utilities (for OpenCV and audio playback testing)
uv add trugen-sdk --extra display

# Using pip
pip install trugen-sdk

# With optional display utilities (for OpenCV and audio playback testing)
pip install "trugen-sdk[display]"
```

## Quick Start

### Simple OpenCV Video Display (Using TruGenRunner)

For most UI/desktop applications, use `TruGenRunner`: it runs the session's async event loop on a background thread and safely delivers BGR video frames and state updates to the main thread.

```python
import cv2  # Provided by the optional [display] extra (opencv-python)
import os
from trugen import TruGenClient, TruGenRunner

# 1. Define how to connect to the session
async def create_session():
    client = TruGenClient(api_key=os.getenv("TRUGEN_API_KEY", ""))
    session = await client.create_session(agent_id=os.getenv("TRUGEN_AGENT_ID", ""))
    await session.connect()
    await session.enable_audio_output()  # Speaker + AEC in one call
    return session

# 2. Initialize the runner
runner = TruGenRunner(session_factory=create_session)

# 3. Handle incoming frames
@runner.on_frame
def show_frame(frame):
    if frame is not None:
        cv2.imshow("TruGen Avatar", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        runner.stop()

# 4. Start rendering loop (blocks main thread)
if __name__ == "__main__":
    runner.run()
    cv2.destroyAllWindows()
```

---

## Features

- 🎥 **Real-time Audio/Video Streaming** - Receive synchronized high-quality audio and video frames directly from the avatar.
- 🔇 **Built-in Acoustic Echo Cancellation (AEC)** - Automatic APM synchronization via `enable_audio_output()` prevents the avatar from hearing and responding to its own voice.
- 🟩 **BGR OpenCV Frames** - Zero-boilerplate async iterator `video_frames_bgr()` yielding pre-converted NumPy arrays ready for OpenCV.
- ⚙️ **GUI Runner Support** - Thread-safe `TruGenRunner` wrapper keeps blocking rendering loops (OpenCV, Pygame, PyQt/PySide, or custom game engines) on the main thread while the async session runs in the background.
- 🎵 **Custom Audio Injection** - Programmatically inject WAV files (`upload_audio()`) or raw 16-bit PCM bytes (`send_audio()`) directly into the room.
- 🎙️ **Microphone Lifecycle Management** - Built-in utilities for muting/unmuting the mic and monitoring mic permissions (pending/granted/denied).
- 💬 **Real-time Captions** - Event hooks that deliver caption and transcript updates as they arrive.
- 📝 **Clean Transcripts** - Distinguish between user transcripts (`user.transcription_received`) and agent utterances (`agent.transcription_final`) for logging.
- 📡 **Async Iterator API** - Stream raw audio (`AudioFrame`) and video (`VideoFrame`) natively via Python async generators.
- 🎯 **Event-Driven Architecture** - Decorator-based event handlers for connection, tracks, speaking states, and transcriptions.
- 📝 **Fully Typed** - Complete type hints for IDE autocompletion and safety.

---

## API Reference

### `TruGenClient`

The entry point for starting TruGen AI sessions.

```python
import asyncio
from trugen import TruGenClient

async def main():
    # Initialize with your API key
    client = TruGenClient(api_key="your-api-key")

    # Create a session with your Agent ID (create_session is a coroutine)
    session = await client.create_session(agent_id="your-agent-id")

asyncio.run(main())
```

---

### `TruGenSession`

Represents an active connection to a streaming room.

#### Connection
- `await session.connect()`: Connect to the streaming room and publish the local microphone.
- `await session.disconnect()`: Disconnect and cleanly release all hardware/stream resources.

#### Audio Output
- `await session.enable_audio_output()`: Activates speaker playback with built-in echo cancellation. Call this once after `connect()`.

#### Video & Audio Generators
- `session.video_frames_bgr()`: Async generator yielding NumPy arrays (`NDArray`) in BGR format, ready for OpenCV.
- `session.video_frames()`: Async generator yielding raw LiveKit `VideoFrame` objects.
- `session.audio_frames()`: Async generator yielding raw LiveKit `AudioFrame` objects.

#### Microphone Control
- `session.mute_input_audio()`: Mutes the local microphone.
- `session.unmute_input_audio()`: Unmutes the local microphone.
- `session.is_input_muted()`: Returns `True` if the microphone is muted.
- `session.get_input_audio_state()`: Returns an `InputAudioState` object containing mute status and mic permission status (`pending`, `granted`, `denied`).
- `await session.start_mic()`: Starts capturing from the local microphone and publishes its track.
- `await session.stop_mic()`: Stops capturing and unpublishes the microphone track.
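A common pattern is to mute the mic only for the duration of some operation and then restore whatever state the user had chosen. The context manager below is a hypothetical helper built only on the three synchronous methods listed above (`is_input_muted()`, `mute_input_audio()`, `unmute_input_audio()`). The `MicControl` protocol just describes that assumed subset of `TruGenSession`.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol


class MicControl(Protocol):
    """The subset of TruGenSession's microphone API this helper relies on (assumed)."""

    def mute_input_audio(self) -> None: ...
    def unmute_input_audio(self) -> None: ...
    def is_input_muted(self) -> bool: ...


@contextmanager
def mic_muted(session: MicControl) -> Iterator[None]:
    """Mute the mic for the duration of the block, then restore its prior state."""
    was_muted = session.is_input_muted()
    if not was_muted:
        session.mute_input_audio()
    try:
        yield
    finally:
        # Only unmute if this helper was the one that muted the mic.
        if not was_muted:
            session.unmute_input_audio()
```

For example, `with mic_muted(session): await session.upload_audio("clip.wav")` keeps live microphone input out of the room while a prerecorded clip plays, then leaves the mic muted or unmuted exactly as it was.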

#### Custom Audio Injection
- `await session.upload_audio(file_path)`: Streams a PCM WAV file into the room.
- `await session.send_audio(data, sample_rate=48000, num_channels=1)`: Injects raw 16-bit PCM bytes into the audio stream.
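`send_audio()` expects raw 16-bit PCM bytes. As a sketch, the standard library is enough to synthesize a test tone in that format and, if you prefer `upload_audio()`, to wrap it in a WAV container. The helper names are hypothetical, and the little-endian sample order for the raw bytes is an assumption (it is the usual convention for 16-bit PCM and is what WAV files use).

```python
import io
import math
import struct
import wave


def sine_pcm16(freq_hz: float = 440.0, duration_s: float = 1.0,
               sample_rate: int = 48000, amplitude: float = 0.3) -> bytes:
    """Return mono, 16-bit signed little-endian PCM samples of a sine tone."""
    n_samples = int(sample_rate * duration_s)
    peak = int(amplitude * 32767)  # Scale into the int16 range
    samples = (
        int(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    )
    # "<h" = little-endian signed 16-bit integer, one per sample.
    return struct.pack(f"<{n_samples}h", *samples)


def pcm16_to_wav(pcm: bytes, sample_rate: int = 48000, num_channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM in a WAV container (e.g. to save it for upload_audio())."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(num_channels)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

With a connected session, `await session.send_audio(sine_pcm16(), sample_rate=48000, num_channels=1)` would inject the tone directly; alternatively, write `pcm16_to_wav(...)` to disk and pass that path to `await session.upload_audio(...)`.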

#### Low-Level Accessors
- `session.get_video_track()`: Returns the remote `RemoteVideoTrack` object (or `None`).
- `session.get_audio_track()`: Returns the remote `RemoteAudioTrack` object (or `None`).
- `session.room`: Returns the underlying `livekit.rtc.Room` instance for advanced operations.

---

### `TruGenRunner`

Runs the async session event loop on a background thread and safely delivers frames, captions, state changes, and events to the main (rendering) thread.
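The underlying pattern is a standard one: an asyncio event loop runs forever on a worker thread, coroutines are submitted to it with `asyncio.run_coroutine_threadsafe()`, and results travel back to the main thread through a thread-safe `queue.Queue`. The stdlib-only sketch below illustrates that pattern. It is not `TruGenRunner`'s actual implementation, and `produce()` is a hypothetical stand-in for a session pushing video frames.

```python
import asyncio
import queue
import threading


def start_background_loop() -> tuple[asyncio.AbstractEventLoop, threading.Thread]:
    """Run a fresh asyncio event loop forever on a daemon thread."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


def stop_background_loop(loop: asyncio.AbstractEventLoop, thread: threading.Thread) -> None:
    # call_soon_threadsafe is the safe way to schedule work on a loop owned by another thread.
    loop.call_soon_threadsafe(loop.stop)
    thread.join()
    loop.close()


async def produce(frames: "queue.Queue[int | None]", count: int) -> None:
    """Hypothetical stand-in for a session coroutine pushing frames to the main thread."""
    for i in range(count):
        frames.put(i)  # Unbounded queue: put() never blocks the event loop
        await asyncio.sleep(0)
    frames.put(None)  # Sentinel: no more frames


if __name__ == "__main__":
    loop, thread = start_background_loop()
    frames: "queue.Queue[int | None]" = queue.Queue()
    future = asyncio.run_coroutine_threadsafe(produce(frames, 5), loop)
    # Main ("rendering") thread: block on the queue, never on the event loop.
    while (frame := frames.get(timeout=5)) is not None:
        print("render", frame)
    future.result(timeout=5)
    stop_background_loop(loop, thread)
```

For video, a renderer usually only cares about the newest frame, so a production implementation might bound the queue and discard stale frames rather than let an unbounded backlog build up.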

#### Controls
- `runner.run()`: Starts the runner and blocks the main thread to run the rendering loop.
- `runner.stop()`: Safely stops the background loop and disconnects the session (thread-safe).
- `runner.toggle_mute()`: Toggles the microphone mute state (thread-safe).

#### Properties & Accessors
- `runner.mic_muted`: Returns `True` if the microphone is currently muted.
- `runner.session_state`: Returns the current session state enum (`TruGenState`).
- `runner.session`: Access the active `TruGenSession` instance (returns `None` until connected).
- `runner.get_caption()`: Returns a tuple `(text, timestamp)` containing the last received caption chunk and the monotonic timestamp it arrived.

#### Event Decorators

You can register event handlers using decorators on the `TruGenRunner` (for UI/main-thread callbacks) or directly on the `TruGenSession` (for low-level async callbacks).

##### 1. Runner Decorators (`TruGenRunner`)

- **`@runner.on_frame`**: Receives BGR video frames (NumPy arrays) or `None` on the main thread.
  ```python
  @runner.on_frame
  def on_frame(frame):
      if frame is not None:
          cv2.imshow("Avatar", frame)
  ```

- **`@runner.on_caption`**: Receives real-time streaming caption chunks (ideal for UI overlays).
  ```python
  @runner.on_caption
  def on_caption(text: str):
      # Fired for each caption chunk as it arrives
      pass
  ```

- **`@runner.on_state`**: Called when the session's connection state transitions.
  ```python
  @runner.on_state
  def on_state(state: TruGenState):
      print(f"Status: {state.value}")
  ```

- **`@runner.on_event`**: Registers a handler for any `TruGenEvent` member or string event name.
  ```python
  # Log final transcripts
  @runner.on_event("user.transcription_received")
  def on_user_transcript(text: str):
      print(f"[User]  {text}")

  @runner.on_event("agent.transcription_final")
  def on_agent_transcript(text: str):
      print(f"[Agent] {text}")
  ```

##### 2. Session Decorators (`TruGenSession`)

If you are not using `TruGenRunner`, you can listen to events directly on the `TruGenSession` using the `@session.on()` decorator:

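```python
from trugen import TruGenEvent

# `session` is a TruGenSession returned by `await client.create_session(...)`.
# Log final transcripts directly from the session
@session.on("user.transcription_received")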
def on_user_speech(text: str):
    print(f"[User]  {text}")

@session.on("agent.transcription_final")
def on_agent_speech(text: str):
    print(f"[Agent] {text}")

# Handle speaking state changes
@session.on(TruGenEvent.AGENT_SPEAKING_STARTED)
def agent_speech_start():
    print("Agent started speaking...")
```

---

### Events (`TruGenEvent`)

Register listener callbacks directly on a `TruGenSession` or a `TruGenRunner` using `@session.on()` or `@runner.on_event()`.

| Event Enum / String | Fired When | Callback Arguments |
|---|---|---|
| `TruGenEvent.STATE_CHANGED` | The session state changes | `state: TruGenState` |
| `TruGenEvent.CONNECTION_ESTABLISHED` | Successfully connected to the room | None |
| `TruGenEvent.CONNECTION_CLOSED` | Disconnected from the room | `reason: DisconnectReason` |
| `TruGenEvent.VIDEO_STREAM_STARTED` | Remote video track subscribed | `track: RemoteVideoTrack` |
| `TruGenEvent.AUDIO_STREAM_STARTED` | Remote audio track subscribed | `track: RemoteAudioTrack` |
| `TruGenEvent.INPUT_AUDIO_STREAM_STARTED` | Local mic audio stream begins publishing | None |
| `TruGenEvent.AGENT_SPEAKING_STARTED` | Agent starts speaking | None |
| `TruGenEvent.AGENT_SPEAKING_ENDED` | Agent stops speaking | None |
| `TruGenEvent.USER_SPEECH_STARTED` | User starts speaking | None |
| `TruGenEvent.USER_SPEECH_ENDED` | User stops speaking | None |
| `TruGenEvent.TEXT_CHUNK_RECEIVED` | Caption/Text chunk received | `text: str` |
| `TruGenEvent.MIC_PERMISSION_PENDING` | Mic permission request is pending | None |
| `TruGenEvent.MIC_PERMISSION_GRANTED` | Mic permission has been granted | None |
| `TruGenEvent.MIC_PERMISSION_DENIED` | Mic permission has been denied | None |
| `"user.transcription_received"` | User completes a final utterance | `text: str` |
| `"agent.transcription_final"` | Agent completes a final utterance | `text: str` |
| `"connection.reconnecting"` | Transient network reconnection starts | None |
| `"connection.reconnected"` | Network reconnection completes | None |
| `"connection.quality_changed"` | Participant connection quality changes | `participant, quality` |
| `TruGenEvent.ERROR` | A session or connection error occurs | `error: Exception` |
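The two final-transcript string events pair naturally with a small accumulator for logging or saving a conversation. The class below is a hypothetical helper. Registering bound methods as `session.on("user.transcription_received")(log.on_user)` assumes `session.on(name)` returns an ordinary decorator, which the `@session.on(...)` usage above implies.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TranscriptLog:
    """Accumulates final user/agent utterances in arrival order (hypothetical helper)."""

    entries: list[tuple[datetime, str, str]] = field(default_factory=list)

    def add(self, speaker: str, text: str) -> None:
        # Store a UTC timestamp so logs from different machines line up.
        self.entries.append((datetime.now(timezone.utc), speaker, text))

    def on_user(self, text: str) -> None:
        self.add("User", text)

    def on_agent(self, text: str) -> None:
        self.add("Agent", text)

    def render(self) -> str:
        return "\n".join(f"[{speaker}] {text}" for _, speaker, text in self.entries)


# Hypothetical wiring with a TruGenSession:
# log = TranscriptLog()
# session.on("user.transcription_received")(log.on_user)
# session.on("agent.transcription_final")(log.on_agent)
```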

---

### Session States (`TruGenState`)

| Enum Value | Description |
|---|---|
| `TruGenState.INITIALIZING` | Session created but not yet connected |
| `TruGenState.CONNECTING` | WebRTC handshake and connection in progress |
| `TruGenState.CONNECTED` | Connection established; actively streaming media |
| `TruGenState.DISCONNECTED` | Session ended and connection closed |
| `TruGenState.ERROR` | Unrecoverable error occurred |
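When driving a status bar from `@runner.on_state`, a lookup keyed on each enum member's `.name` keeps UI text independent of the SDK's `.value` strings. The labels and BGR colors below are made-up examples, and `status_for` / `is_terminal` are hypothetical helpers:

```python
from enum import Enum

# Hypothetical status-bar labels and BGR colors (OpenCV channel order).
STATUS_STYLE: dict[str, tuple[str, tuple[int, int, int]]] = {
    "INITIALIZING": ("Starting...", (200, 200, 200)),
    "CONNECTING": ("Connecting...", (0, 200, 255)),
    "CONNECTED": ("Live", (0, 200, 0)),
    "DISCONNECTED": ("Disconnected", (128, 128, 128)),
    "ERROR": ("Error", (0, 0, 255)),
}
TERMINAL_STATES = {"DISCONNECTED", "ERROR"}


def status_for(state: Enum) -> tuple[str, tuple[int, int, int]]:
    """Map a TruGenState member (by member name) to a label and a BGR color."""
    # Unknown states fall back to their title-cased name in white.
    return STATUS_STYLE.get(state.name, (state.name.title(), (255, 255, 255)))


def is_terminal(state: Enum) -> bool:
    """True for states after which the session will not stream any more media."""
    return state.name in TERMINAL_STATES
```

In an `@runner.on_state` handler you might draw `status_for(state)` in the status bar and call `runner.stop()` once `is_terminal(state)` is true.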

---

## Detailed Examples

### Interactive Session with GUI & WAV Injection

For a fully featured interactive application that demonstrates:
- Real-time video window and connection status.
- Mic mute controls and a speaking indicator.
- Floating caption overlay.
- WAV audio injection (press `A` to send a local WAV file to the agent).

See the examples in the `examples/` directory:
- [Basic GUI Viewer](examples/basic_session_ui.py) - Simple viewer with a status bar, mic indicators, and floating captions.
- [Advanced GUI Viewer](examples/advanced_session_ui.py) - Full-feature demonstration, including WAV audio injection, reconnection event handlers, and complete transcripts.
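The example viewers bind single keys to actions. A small dispatch table keeps that logic out of the frame handler. This sketch is hypothetical (it is not taken from the example scripts) and only assumes OpenCV's `cv2.waitKey()` convention of returning `-1` when no key is pressed.

```python
from typing import Callable


def dispatch_key(key: int, actions: dict[str, Callable[[], None]]) -> bool:
    """Run the action bound to a cv2.waitKey() code; return True if one ran."""
    if key == -1:  # waitKey() returns -1 when no key was pressed
        return False
    # Mask to the low byte (as the Quick Start does) and ignore letter case.
    char = chr(key & 0xFF).lower()
    action = actions.get(char)
    if action is None:
        return False
    action()
    return True


# Hypothetical wiring inside @runner.on_frame:
# dispatch_key(cv2.waitKey(1), {"q": runner.stop, "m": runner.toggle_mute})
```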

---

## Configuration

Set the API authentication credentials in a `.env` file or export them directly in your environment:

```bash
export TRUGEN_API_KEY="your-api-key"
export TRUGEN_AGENT_ID="your-agent-id"
```
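The Quick Start reads these variables with `os.getenv(..., "")`, which silently passes an empty string when one is missing. A fail-fast check produces a clearer error. `require_env` is a hypothetical helper; this README does not state whether the SDK loads `.env` on its own, so the comment shows loading it explicitly with python-dotenv's `load_dotenv()`.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, failing fast if any is unset or empty."""
    # If your credentials live in a .env file, load them first, e.g. with
    # python-dotenv: `from dotenv import load_dotenv; load_dotenv()`.
    values = {name: os.environ.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return values


# Hypothetical usage:
# env = require_env("TRUGEN_API_KEY", "TRUGEN_AGENT_ID")
# client = TruGenClient(api_key=env["TRUGEN_API_KEY"])
```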

## Error Handling

Handle exceptions using standard try/except blocks around `create_session` and connection logic:

```python
import asyncio
from trugen import TruGenClient

client = TruGenClient(api_key="invalid-key")

async def main():
    try:
        session = await client.create_session(agent_id="my-agent")
        await session.connect()
    except RuntimeError as e:
        print(f"Connection failed: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

asyncio.run(main())
```
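For transient failures you may want to retry the connection a few times with exponential backoff. This is a generic asyncio sketch, not an SDK feature. This README does not document which exceptions the SDK raises for transient versus permanent errors, so retrying only on `RuntimeError` (matching the example above) is an assumption. Retrying an invalid API key will not help, so keep the retry count small.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    attempt: Callable[[], Awaitable[T]],
    retries: int = 3,
    base_delay_s: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (RuntimeError,),
) -> T:
    """Await `attempt()`, retrying with exponential backoff on `retry_on` errors."""
    for i in range(retries + 1):
        try:
            return await attempt()
        except retry_on:
            if i == retries:
                raise  # Out of retries: surface the last error
            # Backoff doubles each time: base, 2*base, 4*base, ...
            await asyncio.sleep(base_delay_s * 2 ** i)
    raise AssertionError("unreachable")


# Hypothetical usage with the client from the example above:
# async def connect():
#     session = await client.create_session(agent_id="my-agent")
#     await session.connect()
#     return session
#
# session = await with_retries(connect)
```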

## Requirements

- Python 3.10+
- **Core Dependencies:**
  - `livekit` (>=0.11.0)
  - `aiohttp` (>=3.8.0)
  - `python-dotenv` (>=1.0.0)
- **Optional Dependencies (`[display]`):**
  - `opencv-python` (>=4.8.0)
  - `sounddevice` (>=0.4.6)
  - `numpy` (>=1.24.0)

## License

MIT License - see [LICENSE](LICENSE) for details.

## Links

- [TruGen AI Website](https://trugen.ai)
- [Developer Portal / Dashboard](https://dashboard.trugen.ai)
