Metadata-Version: 2.4
Name: pyai-sdk
Version: 0.2.0
Summary: Official Python SDK for PyAI — speech-to-text (Hear), text-to-speech (Speak), realtime voice agents (Omni), and call compliance (Trace).
Project-URL: Homepage, https://pyai.com
Project-URL: Documentation, https://api.pyai.com/docs
Project-URL: Repository, https://github.com/atomsai/pyai-platform-backend
License: MIT
Keywords: compliance,openai-compatible,pyai,realtime,speech,stt,transcription,tts,voice-agents,voice-ai
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# pyai-sdk (Python SDK)

Official Python SDK for [PyAI](https://pyai.com) — the all-in-one voice AI
platform: lightning-fast speech-to-text, ultra-realistic text-to-speech,
end-to-end realtime voice agents, and automatic call compliance. Zero
third-party dependencies (standard library only); Python 3.9+.

## PyAI products

- **[Hear](https://pyai.com/models/hear)** — Lightning-fast, telephony-native **speech-to-text**. Whisper-compatible transcription tuned for real phone-call audio, with live streaming partials so your app reacts mid-sentence, plus async batch transcription for big archives. `POST /v1/audio/transcriptions`
- **[Speak](https://pyai.com/models/speak)** — Ultra-realistic **text-to-speech** that starts speaking in tens of milliseconds. Stream lifelike, expressive voices, choose from 36 studio-quality presets, or clone any voice instantly — for free. `POST /v1/audio/speech`
- **[Omni](https://pyai.com/models/omni)** _(flagship)_ — One **API for a complete, end-to-end voice AI agent**. A single WebSocket where your agent listens, thinks, and speaks — grounded in your knowledge bases and tools, with human-like turn-taking and instant barge-in — no STT, LLM, or TTS to stitch together yourself. `wss://api.pyai.com/v1/omni`
- **[Trace](https://pyai.com/models/trace)** _(flagship)_ — The **compliance API that keeps your AI agents safe**. Trace automatically checks every call for HIPAA, TCPA, and PII risks (plus your own brand-voice rules), flags the exact rule broken, redacts sensitive data, and seals each call with a tamper-evident audit trail — so a risky conversation never slips through. `GET /v1/trace/interactions`
- **[Cue](https://pyai.com/models/cue)** — Realtime **turn detection + knowledge-grounded context** for your own stack. Bring your own LLM and voice; Cue nails the hard part — knowing the instant a speaker finishes and surfacing the right context. `wss://api.pyai.com/v1/audio/transcriptions/stream`
- **[Telephony](https://pyai.com/models/telephony)** — Instant **managed phone numbers** for your voice agents. Provision a US number and route live calls straight into an Omni agent — no carrier contracts, no telephony glue. `POST /v1/telephony/numbers`

The API contract is the OpenAPI spec at `https://api.pyai.com/openapi.json`. This
SDK wraps it with typed errors, automatic retries, and realtime URL helpers.

## Install

```bash
pip install pyai-sdk
```

## Quickstart

```python
import os
from pyai import PyAI, new_idempotency_key

pyai = PyAI(api_key=os.environ["PYAI_API_KEY"])

# Text-to-speech
audio = pyai.audio.speech(input="Hello from PyAI.", voice="stock_sarah_style2")  # mp3 by default
with open("hello.mp3", "wb") as f:
    f.write(audio)

# Voices
voices = pyai.voices.list(gender="female")

# Async transcription (safe retry with an idempotency key)
job = pyai.transcription_jobs.create(
    audio_url="https://example.com/call.wav",
    diarize=True,
    idempotency_key=new_idempotency_key(),
)
job = pyai.transcription_jobs.get(job["job_id"])  # current state; poll until it finishes
```
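`transcription_jobs.get` returns the job's current state, which may still be in progress. A minimal polling sketch follows. The job's status field and its values aren't documented here, so the completion check is yours to supply: `wait_for_job` and its `is_done` predicate are hypothetical helpers, not SDK API.

```python
import time
from typing import Any, Callable, Dict

Job = Dict[str, Any]


def wait_for_job(
    get: Callable[[str], Job],
    job_id: str,
    is_done: Callable[[Job], bool],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Job:
    """Poll get(job_id) every `interval` seconds until is_done(job) or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = get(job_id)
        if is_done(job):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not done after {timeout}s")
        sleep(interval)  # injectable so tests don't actually wait
```

Usage, assuming (unverified) that the job dict carries a `status` field with these values: `wait_for_job(pyai.transcription_jobs.get, job["job_id"], is_done=lambda j: j.get("status") in {"completed", "failed"})`.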

## Speak audio formats (incl. telephony G.711)

`audio.speech` encodes server-side into any of eight formats selected with
`response_format`, so telephony callers don't need a client-side resampler or
μ-law encoder: the audio arrives already in the shape you need.

```python
import base64

# Twilio/SIP-ready in one param: raw 8 kHz mono μ-law, no client-side DSP.
ulaw = pyai.audio.speech(
    input="Your appointment is confirmed.",
    voice="stock_sarah_style2",
    response_format="g711_ulaw",   # -> audio/basic, forced 8 kHz
)
media_frame_payload = base64.b64encode(ulaw).decode()  # base64 payload for a Twilio media frame
```
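Over a Twilio bidirectional Media Stream you send audio back as JSON `media` messages, and slicing the clip into short frames keeps each message small. A sketch, assuming Twilio's `{"event": "media", "streamSid": ..., "media": {"payload": ...}}` message shape and 20 ms frames (a common telephony choice, not a PyAI requirement); `twilio_media_messages` is a hypothetical helper, not part of the SDK. At 8 kHz with one μ-law byte per sample, 20 ms is 160 bytes.

```python
import base64
import json
from typing import Iterator

ULAW_RATE = 8000  # G.711 output is always 8 kHz
FRAME_MS = 20     # assumption: 20 ms frames


def twilio_media_messages(ulaw: bytes, stream_sid: str, frame_ms: int = FRAME_MS) -> Iterator[str]:
    """Yield Twilio Media Streams 'media' messages (JSON text) for raw μ-law audio."""
    frame_bytes = ULAW_RATE * frame_ms // 1000  # one byte per μ-law sample: 160 per 20 ms
    for start in range(0, len(ulaw), frame_bytes):
        chunk = ulaw[start:start + frame_bytes]  # the last chunk may be shorter
        yield json.dumps({
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(chunk).decode("ascii")},
        })
```

Send each message over the Twilio WebSocket in order, using the `streamSid` Twilio reported in its `start` event.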

| `response_format` | sample rates (Hz) | Content-Type |
|---|---|---|
| `mp3` (default) | 8000 / 16000 / 24000 / 48000 | `audio/mpeg` |
| `wav` | 8000 / 16000 / 24000 / 48000 | `audio/wav` |
| `opus` | 8000 / 16000 / 24000 / 48000 | `audio/ogg` |
| `aac` | 8000 / 16000 / 24000 / 48000 | `audio/aac` |
| `flac` | 8000 / 16000 / 24000 / 48000 | `audio/flac` |
| `pcm` (raw int16 LE, no header) | 8000 / 16000 / 24000 / 48000 | `audio/pcm` |
| `g711_ulaw` | 8000 (forced) | `audio/basic` |
| `g711_alaw` | 8000 (forced) | `audio/basic` |

The accepted values are exported as `SPEECH_FORMATS` / `SPEECH_SAMPLE_RATES` (plus
a `SpeechFormat` `Literal` for type-checkers). Any other value returns
`400 unsupported_format`. `sample_rate` is optional: omit it to get the engine's
native 24 kHz (`g711_*` is always 8 kHz). Omit `response_format` to get the
default `mp3`. See
[`examples/speak-telephony-formats`](../../examples/speak-telephony-formats) for
the full before/after.
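Two small helpers that follow from the table, sketched with the standard library: `effective_sample_rate` predicts the rate a request will be encoded at and rejects values the table rules out before a `400` round-trip, and `pcm_to_wav` wraps headerless `pcm` output in a WAV container so ordinary players can open it. Both are hypothetical, not SDK functions. The sets are copied from the table (in real code, import `SPEECH_FORMATS` / `SPEECH_SAMPLE_RATES` from the SDK), and mono `pcm` output is an assumption.

```python
import io
import wave
from typing import Optional

# Copied from the table above; prefer the SDK's SPEECH_FORMATS / SPEECH_SAMPLE_RATES.
FORMATS = {"mp3", "wav", "opus", "aac", "flac", "pcm", "g711_ulaw", "g711_alaw"}
RATES = {8000, 16000, 24000, 48000}
NATIVE_RATE = 24000  # used when sample_rate is omitted


def effective_sample_rate(response_format: str = "mp3", sample_rate: Optional[int] = None) -> int:
    """Return the rate the audio will be encoded at, raising on values the table rules out."""
    if response_format not in FORMATS:
        raise ValueError(f"unsupported_format: {response_format!r}")
    if response_format.startswith("g711_"):
        return 8000  # G.711 is forced to 8 kHz
    if sample_rate is None:
        return NATIVE_RATE
    if sample_rate not in RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    return sample_rate


def pcm_to_wav(pcm: bytes, sample_rate: int = NATIVE_RATE, channels: int = 1) -> bytes:
    """Wrap headerless int16 PCM in a WAV container (mono by default, an assumption).

    `wave` treats frames as native byte order, so this is correct for the
    little-endian PCM described above on little-endian hosts.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # int16 = 2 bytes per sample
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()
```

For example, pass the same `sample_rate` you requested: `pcm_to_wav(pcm_bytes, effective_sample_rate("pcm", 16000))`.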

## Realtime (Omni)

Keys travel as a WebSocket subprotocol. Use the helpers with your preferred WS
library (e.g. `websockets`):

```python
import asyncio
import websockets

url = pyai.realtime_url(product="omni", agent_id="agent_123")
subprotocol = pyai.realtime_subprotocol()  # carries your API key

async def main():
    async with websockets.connect(url, subprotocols=[subprotocol]) as ws:
        async for frame in ws:  # text frames arrive as str, binary frames as bytes
            print(frame)

asyncio.run(main())
```

> Omni uses the native `wss://api.pyai.com/v1/omni` surface (the default for
> `product="omni"`); `product="flow"` uses `/v1/realtime`. The older
> `/v2/omni/chat` URL is deprecated but still works.

## Streaming speech-to-text (Hear / Cue)

The standard library has no WebSocket client, so the SDK gives you a URL builder
(`hear_stream_url`) plus the subprotocol helper; pair them with `websockets` (or
`websocket-client`). The wire protocol: stream binary PCM16 or Opus frames, send
`{"type":"commit"}` to force-finalize, and read JSON frames whose `type` is
`partial`, `partial_stable`, `speech_final`, `final`, or `error`:

```python
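import asyncio
import json
import websockets

url = pyai.hear_stream_url(sample_rate=16000)

async def transcribe(pcm_chunks):
    """Stream PCM16 chunks (an async iterable of bytes) and print transcript frames."""
    async with websockets.connect(url, subprotocols=[pyai.realtime_subprotocol()]) as ws:

        async def send_audio():
            async for pcm16 in pcm_chunks:
                await ws.send(pcm16)                       # binary audio frame
            await ws.send(json.dumps({"type": "commit"}))  # force-finalize

        async def read_frames():
            async for frame in ws:                         # runs until the server closes
                print(json.loads(frame))                   # partial / final / error ...

        # Read concurrently so partials arrive while audio is still being sent.
        await asyncio.gather(send_audio(), read_frames())

asyncio.run(transcribe(mic_source()))  # mic_source(): your async PCM16 chunk source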
```
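`mic_source()` above stands in for your audio source. A hypothetical replacement streams an in-memory PCM16 buffer in fixed-duration frames, optionally paced in real time. It assumes mono 16-bit little-endian audio at the `sample_rate` passed to `hear_stream_url`; at 16 kHz a 20 ms frame is 16000 × 2 bytes × 0.02 s = 640 bytes.

```python
import asyncio
from typing import AsyncIterator


async def pcm_chunks_from_bytes(
    pcm: bytes, sample_rate: int = 16000, frame_ms: int = 20, realtime: bool = False
) -> AsyncIterator[bytes]:
    """Yield mono PCM16 audio in frame_ms-long chunks (hypothetical mic_source stand-in)."""
    frame_bytes = sample_rate * 2 * frame_ms // 1000  # 2 bytes per int16 sample
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]          # the last chunk may be shorter
        if realtime:
            await asyncio.sleep(frame_ms / 1000)      # pace like a live microphone
```

To stream a 16 kHz mono WAV file, read its frames with the standard `wave` module (`w.readframes(w.getnframes())`) and pass the result as `transcribe(pcm_chunks_from_bytes(pcm, realtime=True))`.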

For **Cue** (turn detection + KB context), send `{"type": "config", "grounding": true}`
as the first text frame after connecting; `final`/`speech_final` frames then carry
a `grounding` list of top KB passages.
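A sketch of the Cue side of that protocol: the config frame to send first, and a parser that pulls `grounding` off final frames. The passage structure isn't documented here, so passages are passed through untouched; returning `[]` when a final frame has no `grounding`, and raising on `error` frames, are design choices of this hypothetical helper, not SDK behavior.

```python
import json
from typing import Any, List, Optional

# First text frame after connecting: turns on Cue's knowledge-base grounding.
CUE_CONFIG = json.dumps({"type": "config", "grounding": True})


def grounding_from_frame(raw: str) -> Optional[List[Any]]:
    """Return grounding passages for final/speech_final frames, else None."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "error":
        raise RuntimeError(f"stream error frame: {msg}")
    if kind in ("final", "speech_final"):
        return msg.get("grounding", [])  # [] if the frame carries no grounding
    return None  # partial / partial_stable: interim results, no grounding
```

Call `await ws.send(CUE_CONFIG)` right after `websockets.connect(...)`, then run each incoming text frame through `grounding_from_frame`.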

## Sync STT, telephony output, and more APIs

```python
# Synchronous speech-to-text
with open("call.wav", "rb") as f:
    text = pyai.audio.transcriptions.create(file=f, language="en")["text"]

# Telephony-ready TTS: raw 8 kHz G.711 for Twilio/SIP, encoded server-side —
# no client-side resampler or μ-law encoder. Just base64 it into a media frame.
ulaw = pyai.audio.speech(input="Hi there", response_format="g711_ulaw")

# Voice clones (Speak)
with open("ref.wav", "rb") as f:
    clone = pyai.clones.create(name="Brand VO", file=f)
pyai.clones.delete(clone["id"])

# Managed phone numbers (Telephony)
avail = pyai.telephony.numbers.available(area_code="415")["data"]
num = pyai.telephony.numbers.buy(phone_number=avail[0]["phone_number"], agent_id="agent_123")
pyai.telephony.numbers.assign(num["id"], "agent_123")
pyai.telephony.numbers.release(num["id"])

# Compliance (Trace)
fails = pyai.trace.interactions.list(verdict="FAIL")["data"]
pyai.trace.config.set(agent_id="agent_123", enabled=True)
exposure = pyai.trace.exposure(window_days=30)

# Per-call eval scorecard (timeline + quality metrics). These fields are additive:
# they appear once the engine emits them, so reading them is always safe. Until
# then call_timeline returns [] and "quality_metrics" is absent (.get() -> None).
if fails:  # indexing [0] needs at least one failing interaction
    timeline = pyai.trace.call_timeline(fails[0]["id"])              # list[dict] of turns
    quality = pyai.trace.interactions.get(fails[0]["id"]).get("quality_metrics")
```

## Reproducible runs (evals)

`audio.speech` and `audio.transcriptions.create` take optional `seed` and
`temperature` for deterministic eval runs. Both are forward-compatible: the engine
ignores them until it supports them, so passing them is always safe:

```python
pyai.audio.speech(input="Hello", voice="stock_sarah_style2", seed=42, temperature=0)
with open("call.wav", "rb") as f:
    pyai.audio.transcriptions.create(file=f, seed=42)
```
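Because `seed` is silently ignored until the engine honors it, a successful call doesn't prove a run is reproducible. A hypothetical check compares SHA-256 digests of repeated outputs; it takes any zero-argument callable that returns bytes, so it works for audio or for encoded transcript text.

```python
import hashlib
from typing import Callable


def outputs_match(run: Callable[[], bytes], repeats: int = 3) -> bool:
    """Call run() `repeats` times and report whether every output is byte-identical."""
    if repeats < 2:
        raise ValueError("need at least two runs to compare")
    digests = {hashlib.sha256(run()).hexdigest() for _ in range(repeats)}
    return len(digests) == 1
```

For example, `outputs_match(lambda: pyai.audio.speech(input="Hello", voice="stock_sarah_style2", seed=42, temperature=0))`. A `False` result means something still varies between runs, for instance because the seed isn't honored yet.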

## CLI (`pyai`)

The package installs a `pyai` command (also `python -m pyai`). `pyai doctor`
introspects your key/scopes via `GET /v1/me` (skipped gracefully if the route
isn't deployed yet), checks endpoint liveness, runs a Speak→Hear round-trip, and
prints remediation hints:

```bash
export PYAI_API_KEY=pyai_test_...
pyai doctor
# PASS  key (/v1/me)  — env=test; 3 scope(s): hear:transcribe, voice:synthesize, hear:stream
# PASS  speak→hear round-trip  — synth 45210 bytes → "the quick brown fox…"
# Diagnosis: healthy. Key, endpoint, and a Speak→Hear round-trip all work.

pyai smoke   # lighter: models + voices + speak
```

## Errors

Failures raise `PyAIError` with a stable `code` (branch on it, not the message):

```python
from pyai import PyAIError

try:
    pyai.audio.speech(input="hi")
except PyAIError as err:
    if err.code == "credit_exhausted":
        ...  # out of prepaid credit: add credit or use a sandbox key
    else:
        raise  # don't swallow codes you don't handle
```

Common codes: `unauthorized`, `forbidden`, `credit_exhausted`,
`rate_limit_exceeded`, `concurrency_limit_exceeded`, `idempotency_conflict`.
`429` and `5xx` responses are retried automatically (honoring `Retry-After`); tune
this with `PyAI(api_key=..., max_retries=...)`.
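The retry policy above can be sketched with the standard library. This illustrates the behavior rather than reproducing the SDK's internal code: `send(key)` is a hypothetical stand-in for one HTTP attempt returning `(status, headers, body)`, and the backoff constants and `max_retries=2` default are made up. Reusing one idempotency key across attempts is what makes a retried POST safe, assuming the server deduplicates by key as the Quickstart's "safe retry" comment implies.

```python
import random
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def is_retryable(status: int) -> bool:
    return status == 429 or 500 <= status <= 599


def retry_delay(attempt: int, retry_after: Optional[str] = None, base: float = 0.5,
                cap: float = 8.0, now: Optional[datetime] = None) -> float:
    """Seconds to wait before retry `attempt` (0-based), preferring Retry-After."""
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():  # delta-seconds form, e.g. "Retry-After: 3"
            return float(value)
        try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None  # unparseable header: fall back to backoff
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            current = now or datetime.now(timezone.utc)
            return max(0.0, (when - current).total_seconds())
    # No usable header: exponential backoff with full jitter.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def send_with_retries(send: Callable[[str], Response], idempotency_key: str,
                      max_retries: int = 2,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send(key), retrying 429/5xx up to max_retries times with the same key."""
    attempt = 0
    while True:
        status, headers, body = send(idempotency_key)
        if not is_retryable(status) or attempt >= max_retries:
            return status, headers, body
        sleep(retry_delay(attempt, headers.get("Retry-After")))
        attempt += 1
```

Generating the key once, outside the loop, is the important design choice: a fresh key per attempt would turn a timed-out but successful request into a duplicate job.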

## Develop

```bash
python -m unittest discover -s tests -v   # no network; transport injected
```
