Metadata-Version: 2.4
Name: paty
Version: 0.0.6
Summary: PATY — Declarative voice agent deployment on Pipecat
Project-URL: Homepage, https://github.com/PATYai/PATY
Project-URL: Repository, https://github.com/PATYai/PATY
Project-URL: Issues, https://github.com/PATYai/PATY/issues
Author-email: Shea Hawkins <shea@paty.ai>
License: Copyright 2026 Shea Hawkins
        
        Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
License-File: LICENSE
Keywords: assistant,llm,pipecat,stt,tts,voice,voice-ai
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: click>=8.0
Requires-Dist: httpx>=0.27
Requires-Dist: loguru>=0.7
Requires-Dist: numpy>=1.24
Requires-Dist: opentelemetry-api>=1.20
Requires-Dist: opentelemetry-sdk>=1.20
Requires-Dist: pipecat-ai[local,openai,silero]<1.0,>=0.0.108
Requires-Dist: pydantic>=2.0
Requires-Dist: rich>=13.0
Requires-Dist: ruamel-yaml>=0.18
Requires-Dist: websockets>=12.0
Provides-Extra: cpu
Requires-Dist: llama-cpp-python; extra == 'cpu'
Requires-Dist: pipecat-ai[silero,whisper]<1.0,>=0.0.108; extra == 'cpu'
Provides-Extra: cuda
Requires-Dist: llama-cpp-python; extra == 'cuda'
Requires-Dist: pipecat-ai[silero,whisper]<1.0,>=0.0.108; extra == 'cuda'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest>=7.4; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: eject
Requires-Dist: jinja2>=3.0; extra == 'eject'
Provides-Extra: mlx
Requires-Dist: misaki[en]<0.9; extra == 'mlx'
Requires-Dist: mlx-audio>=0.2; extra == 'mlx'
Requires-Dist: mlx-lm>=0.20; extra == 'mlx'
Requires-Dist: pipecat-ai[silero,whisper]<1.0,>=0.0.108; extra == 'mlx'
Provides-Extra: otlp
Requires-Dist: opentelemetry-exporter-otlp>=1.20; extra == 'otlp'
Provides-Extra: prometheus
Requires-Dist: opentelemetry-exporter-prometheus>=0.50b0; extra == 'prometheus'
Description-Content-Type: text/markdown

# PATY — Please & Thank You

Declarative voice agent deployment on Pipecat. `uv tool install paty && paty run` and you're talking to a voice agent. No `bot.py` to write, no YAML required.

## Quickstart

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh   # if you don't already have uv
uv tool install paty
paty run
```

`paty run` with no argument loads a bundled default config (friendly `paty` persona, auto-detected hardware profile). The first `paty run` will detect your platform and tell you exactly which extra to install for local inference:

```bash
uv tool install 'paty[mlx]'   # Apple Silicon
uv tool install 'paty[cuda]'  # NVIDIA GPU
uv tool install 'paty[cpu]'   # CPU fallback
```

Then `paty run` again. On first launch PATY will:

1. Pick a hardware profile from your platform and memory.
2. Download the LLM weights from Hugging Face (a few GB — first start is slow, subsequent runs hit the cache).
3. Download the Whisper STT model on first use.
4. Start the managed LLM server, warm it up, then open a local mic/speaker transport so you can talk to the agent.

Press `Ctrl+C` to stop.

To run a config of your own:

```bash
paty run path/to/your-config.yaml
```

### Platform notes

- **Apple Silicon (macOS arm64):** the `[mlx]` extra pulls in MLX. No system toolchain required.
- **NVIDIA GPU (CUDA):** the `[cuda]` extra installs `llama-cpp-python` with GPU offload, which needs a working CUDA toolchain at install time. See the [llama-cpp-python CUDA build docs](https://llama-cpp-python.readthedocs.io/en/latest/#installation-with-specific-hardware-acceleration-blas-cuda-metal-etc).
- **CPU-only:** the `[cpu]` extra needs a C/C++ toolchain (`build-essential` on Linux, Xcode Command Line Tools on macOS) for `llama-cpp-python`.

### External services

- **LLM** — PATY spawns a managed inference server automatically (`mlx_lm.server` on Apple Silicon, `llama_cpp.server` on CUDA/CPU). No separate Ollama install is required; models are pulled from Hugging Face on first run.
- **TTS on CUDA/CPU** — the `kokoro` provider expects an OpenAI-compatible Kokoro FastAPI server at `http://localhost:8880/v1`. The easiest way is the Docker image from [remsky/Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI). Apple Silicon runs Kokoro in-process via `mlx-audio` and needs nothing extra.
- **Piper (CPU alternative)** — `tts: piper` downloads its voice model on first use; no server needed.

## Contributing / dev install

```bash
git clone https://github.com/PATYai/PATY.git
cd PATY/cli
uv sync --extra mlx --extra dev          # or --extra cuda / --extra cpu
uv run pytest tests/ -v                  # run tests
uv run ruff check paty/ tests/           # lint
uv run ruff format --check paty/ tests/  # format check
```

## Config

The YAML config is PATY's primary interface. A minimal example:

```yaml
pak:
  persona: "You are a receptionist for Dr. Smith's dental office."

pipeline:
  stt: whisper
  llm: ollama
  tts: kokoro
  vad: silero

hardware:
  profile: auto    # or: apple-16gb, apple-24gb, cuda-24gb, cpu-only

sip:
  provider: voip-ms
  host: sip.voip.ms
  username: "100000"
  password: "${SIP_PASSWORD}"
  did: "+13035551234"

tracing:
  enabled: true
  console: true
```

Pipeline entries accept string shorthand (`stt: whisper`) or expanded form:

```yaml
pipeline:
  stt:
    provider: whisper
    model: large-v3-turbo
  llm:
    provider: ollama
    model: qwen3:14b
    base_url: http://localhost:11434/v1
  tts:
    provider: kokoro
    voice: af_bella
    base_url: http://localhost:8880/v1
```

Environment variables in `${VAR}` syntax are interpolated at load time.
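Interpolation amounts to a substitution pass over the raw config text. A minimal sketch (illustrative only, not PATY's actual implementation in `paty/utils/env.py`):

```python
import os
import re

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def interpolate(text: str) -> str:
    """Replace each ${VAR} with os.environ['VAR']; raise if the variable is unset."""
    def sub(match: re.Match) -> str:
        name = match.group(1)
        try:
            return os.environ[name]
        except KeyError:
            raise KeyError(f"config references undefined environment variable {name!r}")
    return _VAR.sub(sub, text)

os.environ["SIP_PASSWORD"] = "hunter2"
print(interpolate('password: "${SIP_PASSWORD}"'))  # password: "hunter2"
```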

## CLI Commands

```
paty run [config.yaml]       Start the voice agent (no arg → bundled default)
paty bus tail                Subscribe to a running bus and print events
paty bus tui                 Live conversation view subscribed to the bus
paty profiles                List hardware profiles and their model selections
paty pak list                List installed PAKs
paty pak active              Print the currently active PAK
paty pak validate <path>     Validate a PAK directory
paty pak switch <name>       Set the active PAK (applies on next `paty run`)
paty init                    Scaffold a starter config (coming soon)
paty doctor                  Check dependencies (coming soon)
paty eject <config.yaml>     Generate standalone bot.py (coming soon)
```

## PAKs (Personality Augmentation Kits)

A PAK bundles a persona (system prompt) and voice settings (TTS provider/voice, optional LLM pin) into a self-contained directory. PATY ships a default `paty` PAK; additional PAKs can be installed under `~/.paty/paks/<name>/`. Each PAK directory contains:

```
pak.yaml      # manifest: name, version, voice config
soul.md       # the system prompt / persona document
```
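An illustrative `pak.yaml` (field names beyond `name`, `version`, and the `voice` block are a guess from the manifest description above; inspect the bundled `paty` PAK for the authoritative schema):

```yaml
# illustrative only; check the bundled PAK for exact field names
name: receptionist
version: "0.1.0"
voice:
  tts:
    provider: kokoro
    voice: af_bella
  # optional LLM pin (forces a reload when switching PAKs):
  # llm:
  #   model: qwen3:14b
```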

A PAK-style `paty.yaml`:

```yaml
pak:
  active: paty           # name of an installed PAK; bundled default is "paty"
hardware:
  profile: auto
```

For an ad-hoc persona without a PAK directory, set `pak.persona` instead of `pak.active` (the two are mutually exclusive). A transient PAK is synthesized from the inline text and routed through the same voice-resolution pipeline as a registered PAK. If neither field is set, the bundled `paty` PAK is loaded automatically.

A user-provided `pipeline.tts.voice` or `pipeline.llm.model` overrides what the PAK declares — useful for debugging or for forcing every PAK onto a single voice.

PAKs may pin `voice.llm.model` to a specific LLM. This is allowed but expensive — switching to or from a differently-pinned PAK forces a full LLM reload. PATY logs a loud warning at startup when a pin disagrees with the resolved hardware profile.
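The precedence described above (hardware profile defaults, then PAK declarations, then explicit `pipeline.*` overrides) can be pictured as a layered merge. A schematic sketch; the function and key names are illustrative, not PATY's internals:

```python
def resolve_voice(pak_voice: dict, pipeline_overrides: dict, profile_defaults: dict) -> dict:
    """Lowest to highest precedence: hardware profile < PAK < user pipeline config."""
    resolved = dict(profile_defaults)
    resolved.update({k: v for k, v in pak_voice.items() if v is not None})
    resolved.update({k: v for k, v in pipeline_overrides.items() if v is not None})
    return resolved

profile = {"tts_voice": "af_bella", "llm_model": "qwen3:8b"}   # from hardware profile
pak = {"tts_voice": "af_nova", "llm_model": None}              # PAK sets a voice, no LLM pin
user = {"tts_voice": None, "llm_model": "qwen3:14b"}           # user forces a model
print(resolve_voice(pak, user, profile))
# {'tts_voice': 'af_nova', 'llm_model': 'qwen3:14b'}
```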

> **Note:** hot-swap is not yet implemented. `paty pak switch <name>` updates the active pointer; the change applies on the next `paty run`. A follow-up will land in-process swap (TTS replaced live, LLM warmed up where compatible).

## Event Bus

PATY can publish session events over a WebSocket so other processes (e.g. a TUI) can observe what the pipeline is doing without being coupled to it. Enable it in the config:

```yaml
bus:
  enabled: true            # publish session events for subscribers
  host: 127.0.0.1
  port: 8765
```

With the bus enabled, `paty run` starts a local WebSocket server at `ws://host:port`. Subscribers receive two frame types:

- **Text frames** — JSON control events with envelope `{v, seq, ts_ms, session_id, type, data}`. Types cover session lifecycle (`session.started`, `session.ended`), user turn (`user.speech_started/stopped`, `user.transcript.partial/final`), agent turn (`agent.thinking_started`, `agent.response.delta/completed`, `agent.speech_started/stopped`), derived `state.changed` (idle/listening/thinking/speaking), `metrics.tick`, `input.muted`, and `error`/`log`.
- **Binary frames** — a 16-byte header followed by PCM16LE audio samples. Header: `magic(1)`, `version(1)`, `stream(1: 1=mic, 2=agent)`, `reserved(1)`, `sample_rate(u16 LE)`, `channels(u16 LE)`, `seq(u32 LE)`, `ts_ms(u32 LE)` since session start.

The server fans out to any number of subscribers. Control events are never dropped (a subscriber that overflows is disconnected instead); audio frames drop oldest-first under backpressure.
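A subscriber only needs to branch on frame type: text frames parse as JSON, binary frames split into the 16-byte header and the PCM payload. A minimal decoding sketch (the real codec is `paty.bus.codec.unpack_audio_frame`; the magic byte used below is a placeholder, not the actual constant):

```python
import json
import struct

# magic, version, stream, reserved, sample_rate, channels, seq, ts_ms (little-endian)
HEADER = struct.Struct("<BBBBHHII")

def decode_frame(frame):
    """Return ('audio', header_dict, pcm_bytes) for binary frames,
    ('event', envelope_dict, None) for JSON text frames."""
    if isinstance(frame, (bytes, bytearray)):
        magic, version, stream, _reserved, rate, channels, seq, ts_ms = HEADER.unpack_from(frame)
        pcm = bytes(frame[HEADER.size:])  # PCM16LE samples
        return ("audio", {"stream": stream, "sample_rate": rate, "channels": channels,
                          "seq": seq, "ts_ms": ts_ms}, pcm)
    return ("event", json.loads(frame), None)

# synthetic agent-audio frame: placeholder magic 0xA5, 4 samples of silence at 16 kHz mono
raw = HEADER.pack(0xA5, 1, 2, 0, 16000, 1, 0, 0) + b"\x00" * 8
kind, header, pcm = decode_frame(raw)
```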

### Bus actions

Subscribers can also send JSON commands to the bus to control the agent. Each command is a single JSON object:

```json
{"action": "mute.toggle"}
{"action": "mute.set", "muted": true}
```

| Action | Payload | Effect |
|--------|---------|--------|
| `mute.toggle` | — | Flip the mic mute. While muted, mic audio is dropped before reaching STT, so PATY can't hear you. |
| `mute.set` | `muted: bool` | Set the mute to an explicit state. |

Every state change is broadcast back as an `input.muted` event with `{muted: bool}` so all subscribers stay in sync.

### `paty bus tail`

Connects to a running bus and pretty-prints events as they arrive. Useful for verifying the bus end-to-end and as a reference implementation for TUI subscribers.

```bash
# terminal 1 — run the agent (the bundled default has bus.enabled: true)
paty run

# terminal 2 — tail the bus
paty bus tail                           # defaults to ws://127.0.0.1:8765
paty bus tail --url ws://remote:8765    # different host/port
paty bus tail --no-audio                # hide audio frame lines
```

### `paty bus tui`

Full-screen view of the same stream — transcript on the left, avatar top-right, equalizer bottom-right.

```bash
paty bus tui                            # defaults to ws://127.0.0.1:8765
paty bus tui --url ws://remote:8765
```

Built on Rich's immediate-mode `Live`: hold state in memory, rebuild the renderable tree on each event, let the library diff and repaint. `Layout` carves the terminal into named regions and each widget is a pure `(state) -> Renderable` function, so swapping a stub for real content is a one-file edit.
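The `(state) -> Renderable` contract in miniature (a toy widget, not the shipped `avatar.py`; the state fields and faces are made up):

```python
from dataclasses import dataclass
from rich.panel import Panel

@dataclass
class UIState:
    agent_state: str = "idle"   # idle | listening | thinking | speaking

FACES = {
    "idle": "( - _ - )",
    "listening": "( o _ o )",
    "thinking": "( ? _ ? )",
    "speaking": "( O v O )",
}

def avatar(state: UIState) -> Panel:
    """Pure function of state; the event loop just calls this on every repaint."""
    return Panel(FACES.get(state.agent_state, FACES["idle"]), title="paty")
```

In this style the event loop mutates a single state object per bus event, rebuilds the renderables, and lets `Live` diff and repaint, so no widget holds terminal state of its own.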

```
paty/tui/
├── __init__.py            — exports run
├── app.py                 — event loop, UIState, repaint
├── conversation.py        — Conversation/Turn
├── layout.py              — root split tree
└── widgets/
    ├── __init__.py
    ├── transcript.py      — conversation renderer
    ├── avatar.py          — stub face keyed off agent state
    └── equalizer.py       — stub bar chart (zero levels for now)
```

The avatar reacts to `state.changed` events out of the box (idle/listening/thinking/speaking). The equalizer is a visual stub — wiring it to real levels means subscribing to the bus's binary audio frames (`paty.bus.codec.unpack_audio_frame`) and computing per-band RMS.

## Hardware Profiles

When `profile: auto`, PATY detects your platform and memory to pick the best profile.

| Profile | STT | LLM | TTS | Memory Budget |
|---------|-----|-----|-----|---------------|
| apple-16gb | distil-whisper-large-v3 | qwen3:8b Q4 | kokoro | ~5.5GB |
| apple-24gb | large-v3-turbo | qwen3:14b Q4 | kokoro | ~9.5GB |
| cuda-24gb | distil-large-v2 | qwen3:14b Q4 | kokoro | ~9.5GB |
| cpu-only | distil-medium-en | qwen3:4b Q4 | piper | ~3GB |
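A hedged sketch of what `auto` resolution might look like (the real detector in `paty/hardware/` likely weighs more signals, such as GPU VRAM; the memory threshold here is illustrative):

```python
import platform

def pick_profile(system: str, machine: str, ram_gb: float, has_cuda: bool) -> str:
    """Map platform facts to a named profile, mirroring the table above."""
    if system == "Darwin" and machine == "arm64":
        return "apple-24gb" if ram_gb >= 24 else "apple-16gb"
    if has_cuda:
        return "cuda-24gb"
    return "cpu-only"

print(pick_profile(platform.system(), platform.machine(), ram_gb=16, has_cuda=False))
```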

## Architecture

PATY is a runtime resolver, not a code generator. It parses YAML, detects hardware, resolves config keys to Pipecat service constructors, builds a live Pipeline, and starts the runner.

```
YAML config
  → config loader (ruamel.yaml + Pydantic validation)
  → hardware detector (platform, GPU, memory)
  → service resolver (config keys → Pipecat service instances)
  → pipeline builder (services → Pipecat Pipeline)
  → runner (starts Pipecat PipelineRunner)
```

Every phase is traced via OpenTelemetry. Once the pipeline starts, Pipecat's built-in OTel tracing takes over for per-turn STT/LLM/TTS spans.

## Package Structure

```
paty/
├── cli.py                 # click CLI commands
├── config/
│   ├── schema.py          # Pydantic models
│   └── loader.py          # YAML loading + env interpolation
├── tracing/
│   └── setup.py           # OpenTelemetry TracerProvider init
├── hardware/
│   ├── detect.py          # platform/GPU/memory detection
│   └── profiles.py        # named profiles → model defaults
├── resolve/
│   ├── registry.py        # (provider, platform) → factory tables
│   └── resolver.py        # config + platform → Pipecat services
├── pipeline/
│   └── builder.py         # services → Pipeline + PipelineTask
├── bus/
│   ├── events.py          # event types + envelope
│   ├── codec.py           # binary audio frame pack/unpack
│   ├── server.py          # WebSocketBus (fan-out, backpressure)
│   ├── observer.py        # Pipecat frame → bus event translator
│   └── tail.py            # `paty bus tail` client
├── tui/
│   ├── app.py             # `paty bus tui` event loop + UIState
│   ├── conversation.py    # Conversation/Turn state
│   ├── layout.py          # Rich Layout split tree
│   └── widgets/
│       ├── transcript.py
│       ├── avatar.py
│       └── equalizer.py
└── utils/
    └── env.py             # ${VAR} interpolation
```
