Metadata-Version: 2.4
Name: pipecat-ai-humanlike
Version: 0.1.4
Summary: Humanlike avatar plugin for Pipecat — real-time talking-head video with expression control
Author-email: Humanlike AI <minas@humanlike.ai>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/HumanlikeAI/pipecat-plugins-humanlike
Project-URL: Repository, https://github.com/HumanlikeAI/pipecat-plugins-humanlike
Keywords: pipecat,avatar,talking-head,real-time,humanlike,lip-sync
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Multimedia :: Video
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pipecat-ai>=0.0.100
Requires-Dist: numpy
Requires-Dist: websockets
Requires-Dist: Pillow
Provides-Extra: fast
Requires-Dist: soxr; extra == "fast"

# pipecat-ai-humanlike

Humanlike avatar plugin for [Pipecat](https://github.com/pipecat-ai/pipecat) — real-time talking-head video with expression control.

Streams TTS audio to a Humanlike GPU orchestrator and receives lip-synced video frames with facial expressions guided by a natural-language prompt.

## Installation

```bash
pip install pipecat-ai-humanlike
```

For faster, higher-quality audio resampling via `soxr`, install the `fast` extra:

```bash
pip install "pipecat-ai-humanlike[fast]"
```
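The `fast` extra pulls in [`soxr`](https://pypi.org/project/soxr/), a fast, high-quality resampler used when converting TTS audio to 16 kHz (see How It Works below). A minimal sketch of the kind of fallback this enables, assuming a numpy-only path when `soxr` is missing (the helper name is illustrative, not part of the plugin's API):

```python
import numpy as np

try:
    import soxr  # provided by the "fast" extra
    HAS_SOXR = True
except ImportError:
    HAS_SOXR = False

def resample_to_16k(pcm: np.ndarray, in_rate: int) -> np.ndarray:
    """Resample mono int16 PCM to 16 kHz (illustrative helper)."""
    if in_rate == 16000:
        return pcm
    if HAS_SOXR:
        # soxr is both faster and higher quality than naive interpolation.
        return soxr.resample(pcm, in_rate, 16000).astype(np.int16)
    # Fallback: plain linear interpolation with numpy.
    n_out = int(pcm.shape[0] * 16000 / in_rate)
    x_out = np.linspace(0, pcm.shape[0] - 1, n_out)
    return np.interp(x_out, np.arange(pcm.shape[0]), pcm.astype(np.float32)).astype(np.int16)
```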

## Quick Start

```python
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.humanlike import HumanlikeVideoService

humanlike = HumanlikeVideoService(
    ws_url="ws://your-gpu-server:8000/ws/stream",
    image="./face.png",
    avatar_model="humanlike-homo",
    prompt="warm, friendly, subtly smiling, occasional nods",
)

# transport, stt, llm, tts, and context_aggregator are your usual
# Pipecat services, configured as in any standard voice pipeline.
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    humanlike,            # after TTS, before transport output
    transport.output(),
    context_aggregator.assistant(),
])
```
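To run the pipeline, the standard Pipecat runner/task pattern applies; a minimal sketch, assuming the transport and services above are already configured:

```python
import asyncio

from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask

async def main():
    # Wrap the pipeline in a task and hand it to the runner.
    task = PipelineTask(pipeline)
    runner = PipelineRunner()
    await runner.run(task)

asyncio.run(main())
```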

## Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `ws_url` | `str` | `ws://127.0.0.1:8000/ws/stream` | WebSocket URL of the Humanlike GPU orchestrator |
| `image` | `str \| bytes` | `./face.png` | Path to a face image, or raw PNG/JPEG bytes |
| `avatar_model` | `str` | `humanlike-homo` | Model identifier |
| `prompt` | `str` | `warm, friendly, subtly smiling` | Expression prompt guiding facial behaviour |
| `seed` | `int` | `42` | Random seed for reproducible generation |
| `video_width` | `int` | `512` | Output video width in pixels |
| `video_height` | `int` | `512` | Output video height in pixels |

## Live Expression Updates

Update the expression prompt during a live session:

```python
await humanlike.update_prompt("excited, wide eyes, big smile")
```
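`update_prompt` can be awaited from any async context. A sketch that cycles the avatar's mood on a timer (the prompts and timing here are illustrative):

```python
import asyncio

async def mood_cycle(humanlike):
    """Cycle through expression prompts during a live session (illustrative)."""
    prompts = [
        "calm, attentive, neutral expression",
        "excited, wide eyes, big smile",
        "thoughtful, slight frown, slow blinks",
    ]
    for prompt in prompts:
        await humanlike.update_prompt(prompt)
        await asyncio.sleep(10)  # hold each expression for ten seconds
```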

## How It Works

1. On pipeline start, connects to the orchestrator via WebSocket and sends the face image + config
2. Intercepts `TTSAudioRawFrame` from the TTS service, resamples to 16 kHz mono, and streams PCM chunks to the orchestrator
3. Receives JPEG video frames back, decodes them, and pushes `OutputImageRawFrame` downstream (see the sketch after this list)
4. All frames (including audio) pass through so the user still hears the TTS output
5. Shows the reference image as an idle frame until the first GPU-generated frame arrives
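The sketch below walks through that flow end to end, outside of Pipecat, against a running orchestrator. The message schema (JSON config, binary PCM in, binary JPEG out) is an assumption for illustration; inside a pipeline the plugin handles all of this and wraps each decoded frame in an `OutputImageRawFrame`:

```python
import asyncio
import io
import json

import websockets
from PIL import Image

async def demo_flow(ws_url: str, image_path: str, pcm_16k: bytes) -> None:
    """Illustrative walk-through of the orchestrator exchange (assumed schema)."""
    async with websockets.connect(ws_url) as ws:
        # 1. Send the config and the reference face image.
        await ws.send(json.dumps({
            "avatar_model": "humanlike-homo",
            "prompt": "warm, friendly, subtly smiling",
        }))
        with open(image_path, "rb") as f:
            await ws.send(f.read())

        # 2. Stream 16 kHz mono PCM in small chunks.
        chunk = 3200  # 100 ms of 16-bit mono audio at 16 kHz
        for i in range(0, len(pcm_16k), chunk):
            await ws.send(pcm_16k[i:i + chunk])

        # 3. Receive one JPEG frame back and decode it.
        jpeg = await ws.recv()
        frame = Image.open(io.BytesIO(jpeg))
        print(frame.size)  # e.g. (512, 512)
```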

## License

Apache-2.0
