Metadata-Version: 2.4
Name: pipecat-shunyalabs
Version: 1.1.1
Summary: Shunyalabs ASR & TTS services for Pipecat
Author-email: Shunyalabs <support@shunyalabs.ai>
License: MIT
Project-URL: homepage, https://shunyalabs.ai
Project-URL: repository, https://github.com/Shunyalabsai/shunyalabs-python-sdk
Keywords: pipecat,shunyalabs,asr,stt,tts,speech-to-text,text-to-speech
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: pipecat-ai>=0.0.30
Requires-Dist: shunyalabs[all]>=3.0.3

# pipecat-shunyalabs

[![PyPI](https://img.shields.io/pypi/v/pipecat-shunyalabs.svg)](https://pypi.org/project/pipecat-shunyalabs/)
[![License: MIT](https://img.shields.io/badge/license-MIT-yellow.svg)](../../LICENSE)

[Shunyalabs](https://shunyalabs.ai) STT and TTS services for [Pipecat](https://github.com/pipecat-ai/pipecat).

Provides `ShunyalabsSTTService` and `ShunyalabsTTSService` that integrate with Pipecat's pipeline framework, backed by the [Shunyalabs Python SDK](https://github.com/Shunyalabsai/shunyalabs-python-sdk).

**Key capabilities:**

- Real-time streaming ASR with interim and final transcription frames
- High-fidelity voice synthesis with 46 speakers across 23 languages
- 11 emotion/delivery style tags for expressive voice responses
- Native Pipecat frame protocol — drop-in with any Pipecat pipeline
- Persistent WebSocket for STT; per-request WebSocket for TTS
- Output formats: PCM, WAV, MP3, OGG Opus, FLAC, mu-law, A-law

---

## ⚠️ Upgrading from 1.0.0 → 1.0.1

If you're already using `pipecat-shunyalabs`, **read this first**. Two breaking changes (sections 1 and 2) and one relaxed requirement (section 3) affect TTS:

### 1. `language` is now required

Version 1.0.0 silently accepted a missing `language`; the gateway now returns HTTP 422 when it is absent. Always pass an ISO 639 code:

```diff
  tts = ShunyalabsTTSService(
      voice="Rajesh",
+     language="en",        # required — pass "en", "hi", "ta", etc.
  )
```

### 2. `speaker` parameter removed; `_format_text` no longer prepends speaker name

Old behaviour produced text like `"Rajesh: <Neutral> Hello"`. Because the gateway **already prepends the speaker name server-side**, the model prompt became `"Rajesh: Rajesh: <Neutral> Hello"`, which muddied the synthesized audio. Fixed in 1.0.1.

```diff
  tts = ShunyalabsTTSService(
      voice="Rajesh",
-     speaker="Rajesh",     # remove — was a duplicate of `voice`
-     style="<Neutral>",    # optional now — gateway defaults to <Conversational>
+     style="<Happy>",      # only set this if you want a non-default style
      language="en",
  )
```

### 3. `style` is now optional

If you don't pass `style`, the gateway automatically applies `<Conversational>`. You only need to set `style` when you want a specific emotion (e.g. `<Happy>`, `<Sad>`, `<News>`).
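
The three changes above can be applied mechanically to an existing set of constructor arguments. The following is a minimal sketch, not part of the plugin: `migrate_tts_kwargs` and its `default_language` argument are hypothetical names used only for illustration.

```python
def migrate_tts_kwargs(old_kwargs, default_language=None):
    """Return 1.0.1-style keyword arguments built from 1.0.0-style ones.

    Hypothetical helper: it only rewrites a dict and never calls the plugin.
    """
    kwargs = dict(old_kwargs)        # copy so the caller's dict is untouched
    kwargs.pop("speaker", None)      # removed in 1.0.1; `voice` is the only name
    if kwargs.get("style") is None:
        kwargs.pop("style", None)    # omit it so the gateway applies <Conversational>
    if "language" not in kwargs:
        if default_language is None:
            raise ValueError("language is required, for example 'en' or 'hi'")
        kwargs["language"] = default_language
    return kwargs


old = {"voice": "Rajesh", "speaker": "Rajesh", "style": "<Happy>"}
new = migrate_tts_kwargs(old, default_language="en")
print(new)
```

The result can then be passed as `ShunyalabsTTSService(**new)`.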

### Quick install upgrade

```bash
pip install --upgrade pipecat-shunyalabs
```

That's it for migrations. Everything else works as before.

---

## Installation (New Users)

**Requirements:** Python 3.9+, Pipecat framework, a valid Shunyalabs API key.

```bash
pip install pipecat-shunyalabs
```

Install with a transport:

```bash
# Daily WebRTC transport (quotes stop shells such as zsh from expanding the brackets)
pip install pipecat-shunyalabs "pipecat-ai[daily]"
```

## Authentication

Set your API key as an environment variable (recommended):

```bash
export SHUNYALABS_API_KEY="your-api-key"
```

Or pass it directly:

```python
stt = ShunyalabsSTTService(api_key="your-api-key")
tts = ShunyalabsTTSService(api_key="your-api-key")
```
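
The lookup order described above (explicit argument first, then the environment variable) can be sketched as follows. This is an illustration under assumptions, not the plugin's actual code: `resolve_api_key` is a hypothetical helper, and failing fast with `RuntimeError` is a choice made for this example.

```python
import os


def resolve_api_key(explicit=None, environ=None):
    """Pick the API key: explicit argument first, then SHUNYALABS_API_KEY."""
    environ = os.environ if environ is None else environ
    key = explicit or environ.get("SHUNYALABS_API_KEY")
    if not key:
        # Fail before any WebSocket is opened rather than on the first request
        raise RuntimeError("Set SHUNYALABS_API_KEY or pass api_key=...")
    return key


print(resolve_api_key("from-argument", environ={}))
print(resolve_api_key(environ={"SHUNYALABS_API_KEY": "from-env"}))
```

Passing `environ` explicitly keeps the helper easy to test without touching the real process environment.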

> **Security:** Never commit API keys to source control. Use a secrets manager (GCP Secret Manager, AWS Secrets Manager, HashiCorp Vault) in production.

---

## Quick Start

```python
import asyncio, os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.local.audio import LocalAudioTransport
from pipecat_shunyalabs import ShunyalabsSTTService, ShunyalabsTTSService

async def main():
    transport = LocalAudioTransport()

    stt = ShunyalabsSTTService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        language="en",
    )

    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o",
    )

    tts = ShunyalabsTTSService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        voice="Rajesh",
        language="en",
        # style is optional — defaults to <Conversational>
    )

    pipeline = Pipeline([transport.input(), stt, llm, tts, transport.output()])
    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)

if __name__ == "__main__":
    asyncio.run(main())
```

---

## STT — `ShunyalabsSTTService`

Real-time streaming speech-to-text over WebSocket. Maintains a persistent connection for the lifetime of the pipeline. Supports 23 Indian and international languages with automatic language detection.

### Parameters

| Parameter     | Type  | Default                      | Description                                                         |
| ------------- | ----- | ---------------------------- | ------------------------------------------------------------------- |
| `api_key`     | `str` | `None`                       | API key. Falls back to `SHUNYALABS_API_KEY` env var.                |
| `language`    | `str` | `"auto"`                     | Language code (e.g. `"en"`, `"hi"`) or `"auto"` for auto-detection. |
| `url`         | `str` | `wss://asr.shunyalabs.ai/ws` | WebSocket endpoint URL.                                             |
| `sample_rate` | `int` | `16000`                      | Expected audio sample rate in Hz. Must match transport input.       |

### How It Works

1. On pipeline `start`, opens a WebSocket connection to the Shunyalabs ASR gateway.
2. Audio chunks from the pipeline input are forwarded via `send_audio()`.
3. The gateway's built-in VAD detects speech boundaries and emits transcription events.
4. Events are mapped to Pipecat frames and pushed into the pipeline.

### Frame Mapping

| Shunyalabs Event | Pipecat Frame                                                              |
| ---------------- | -------------------------------------------------------------------------- |
| `PARTIAL`        | `InterimTranscriptionFrame` — emitted continuously as speech is recognized |
| `FINAL_SEGMENT`  | `TranscriptionFrame` — emitted at speech segment boundary                  |
| `FINAL`          | `TranscriptionFrame` — emitted when full utterance is finalized            |
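
The mapping in the table can be pictured as a plain lookup. The sketch below uses frame class names as strings so that it runs without Pipecat installed; `EVENT_TO_FRAME`, `frame_name_for`, and `is_final` are hypothetical names, and the plugin's internal dispatch may look different.

```python
# Event type (from the gateway) -> Pipecat frame class name, as listed in the table
EVENT_TO_FRAME = {
    "PARTIAL": "InterimTranscriptionFrame",
    "FINAL_SEGMENT": "TranscriptionFrame",
    "FINAL": "TranscriptionFrame",
}


def frame_name_for(event_type):
    """Return the frame class name for an event type, or None if unmapped."""
    return EVENT_TO_FRAME.get(event_type)


def is_final(event_type):
    """True when the event yields a TranscriptionFrame rather than an interim one."""
    return frame_name_for(event_type) == "TranscriptionFrame"


for event in ("PARTIAL", "FINAL_SEGMENT", "FINAL"):
    print(event, "->", frame_name_for(event), "| final:", is_final(event))
```

Downstream processors that should react only to committed text can filter on `TranscriptionFrame` and ignore `InterimTranscriptionFrame`.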

### Auto-Reconnect

If the WebSocket connection drops during audio streaming, the service automatically reconnects and resumes sending audio.
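
A reconnect-and-resume loop of this kind might look like the sketch below. Everything here is an assumption made for illustration: `FlakyConnection` is a stand-in for a real WebSocket, `stream_with_reconnect` is a hypothetical function, and whether the actual service resends the chunk that failed or limits the number of reconnects is not stated in this README.

```python
import asyncio


class FlakyConnection:
    """Stand-in for a WebSocket: fails once on a chosen chunk, then works."""

    def __init__(self, log, fail_on=None):
        self.log = log
        self.fail_on = fail_on

    async def send_audio(self, chunk):
        if chunk == self.fail_on:
            self.fail_on = None
            raise ConnectionError("socket dropped")
        self.log.append(chunk)


async def stream_with_reconnect(chunks, connect, max_reconnects=3):
    """Send every chunk, reopening the connection when a send fails."""
    conn = await connect()
    reconnects = 0
    for chunk in chunks:
        while True:
            try:
                await conn.send_audio(chunk)
                break
            except ConnectionError:
                if reconnects >= max_reconnects:
                    raise
                reconnects += 1
                conn = await connect()  # reopen, then retry the same chunk
    return reconnects


async def demo():
    sent = []
    pool = [FlakyConnection(sent, fail_on=b"b"), FlakyConnection(sent)]

    async def connect():
        return pool.pop(0)  # each call hands out the next stand-in connection

    reconnects = await stream_with_reconnect([b"a", b"b", b"c"], connect)
    return sent, reconnects


sent, reconnects = asyncio.run(demo())
print(sent, reconnects)
```

Capping the number of reconnects keeps a permanently unreachable gateway from turning into an endless loop.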

---

## TTS — `ShunyalabsTTSService`

Streaming text-to-speech over WebSocket. Each synthesis request opens a new connection and streams audio chunks back as `TTSAudioRawFrame` frames. Supports 46 speakers across 23 languages; any speaker can synthesize in any language.

### Parameters

| Parameter       | Type    | Default                      | Required | Description                                                   |
| --------------- | ------- | ---------------------------- | -------- | ------------------------------------------------------------- |
| `api_key`       | `str`   | env `SHUNYALABS_API_KEY`     | ✓        | API key. Pass directly or via env var.                        |
| `voice`         | `str`   | `"Rajesh"`                   | ✓        | Speaker voice. See [Available Speakers](#available-speakers). |
| `language`      | `str`   | `"en"`                       | **✓**    | ISO 639 language code (`"en"`, `"hi"`, `"ta"`, etc.). **Now required by gateway.** |
| `model`         | `str`   | `"zero-indic"`               |          | TTS model identifier.                                         |
| `style`         | `str`   | `None`                       |          | Emotion/delivery style tag. If omitted, gateway uses `<Conversational>`. |
| `url`           | `str`   | `wss://tts.shunyalabs.ai/ws` |          | WebSocket endpoint URL.                                       |
| `output_format` | `str`   | `"pcm"`                      |          | Audio encoding. See [Output Formats](#output-formats).        |
| `speed`         | `float` | `1.0`                        |          | Speaking speed multiplier (0.25–4.0).                         |
| `sample_rate`   | `int`   | `16000`                      |          | Output sample rate in Hz.                                     |

> **Note for upgraders:** The `speaker` parameter has been removed in 1.0.1 — use `voice` only. See [migration notes](#%EF%B8%8F-upgrading-from-100--101).

### Output Formats

| Format           | Value      | Recommended Use                                 |
| ---------------- | ---------- | ----------------------------------------------- |
| PCM (raw 16-bit) | `pcm`      | Real-time pipelines, Pipecat `TTSAudioRawFrame` |
| WAV              | `wav`      | Uncompressed storage, offline processing        |
| MP3              | `mp3`      | Compressed storage, web delivery                |
| OGG Opus         | `ogg_opus` | Compressed web streaming                        |
| FLAC             | `flac`     | Lossless compressed storage                     |
| mu-law           | `mulaw`    | Telephony systems (G.711)                       |
| A-law            | `alaw`     | Telephony systems (G.711 European)              |

### Style Tags

| Tag                | Description                                            |
| ------------------ | ------------------------------------------------------ |
| `<Conversational>` | Casual, everyday speech — **default if `style` omitted** |
| `<Neutral>`        | Clean read-speech                                      |
| `<Happy>`          | Joyful, upbeat tone                                    |
| `<Sad>`            | Somber, melancholic tone                               |
| `<Angry>`          | Forceful, intense tone                                 |
| `<Fearful>`        | Anxious, trembling tone                                |
| `<Surprised>`      | Exclamatory, astonished tone                           |
| `<Disgust>`        | Repulsed, disapproving tone                            |
| `<News>`           | Formal news-anchor style                               |
| `<Narrative>`      | Storytelling / audiobook delivery style                |
| `<Enthusiastic>`   | Energetic, passionate tone                             |
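
The parameter, output format, and style tables above lend themselves to a local pre-flight check before a pipeline is started. This is a minimal sketch: `check_tts_options` is a hypothetical helper, and the gateway remains the authority on what it accepts.

```python
OUTPUT_FORMATS = {"pcm", "wav", "mp3", "ogg_opus", "flac", "mulaw", "alaw"}
STYLE_TAGS = {
    "<Conversational>", "<Neutral>", "<Happy>", "<Sad>", "<Angry>", "<Fearful>",
    "<Surprised>", "<Disgust>", "<News>", "<Narrative>", "<Enthusiastic>",
}


def check_tts_options(language, style=None, output_format="pcm", speed=1.0):
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    if not language:
        problems.append("language is required")
    if style is not None and style not in STYLE_TAGS:
        problems.append(f"unknown style tag: {style}")
    if output_format not in OUTPUT_FORMATS:
        problems.append(f"unknown output_format: {output_format}")
    if not 0.25 <= speed <= 4.0:
        problems.append(f"speed {speed} is outside 0.25-4.0")
    return problems


print(check_tts_options("en", style="<News>"))
print(check_tts_options("", style="Happy", output_format="ogg", speed=5))
```

Note that style tags include the angle brackets, so `"Happy"` is reported as unknown while `"<Happy>"` passes.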

### Text Formatting

The plugin **only** prepends the style tag (if you set one). The gateway handles the speaker prefix and default style tag server-side.

```python
tts = ShunyalabsTTSService(voice="Rajesh", style="<Happy>", language="en")
# Plugin sends:    "<Happy> Welcome!"
# Gateway expands: "Rajesh: <Happy> Welcome!"
```

If you omit `style`:

```python
tts = ShunyalabsTTSService(voice="Rajesh", language="en")
# Plugin sends:    "Welcome!"
# Gateway expands: "Rajesh: <Conversational> Welcome!"
```
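
The two cases above can be modelled with a pair of small functions. This is only a sketch of the behaviour described in this section: `plugin_text` and `gateway_view` are hypothetical names, and the real gateway logic runs server-side and is not part of this package.

```python
DEFAULT_STYLE = "<Conversational>"


def plugin_text(text, style=None):
    """What the plugin sends: the style tag (if any) followed by the text."""
    return f"{style} {text}" if style else text


def gateway_view(text, voice, style=None):
    """Model of the expanded prompt once speaker and default style are added."""
    return f"{voice}: {style or DEFAULT_STYLE} {text}"


print(plugin_text("Welcome!", "<Happy>"))
print(gateway_view("Welcome!", "Rajesh", "<Happy>"))
print(plugin_text("Welcome!"))
print(gateway_view("Welcome!", "Rajesh"))
```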

### Available Speakers

46 speakers across 23 languages (1 male + 1 female per language). Every speaker can synthesize in any language.

| Language  | Male               | Female   |
| --------- | ------------------ | -------- |
| English   | Varun              | Nisha    |
| Hindi     | Rajesh _(default)_ | Sunita   |
| Bengali   | Arjun              | Priyanka |
| Tamil     | Murugan            | Thangam  |
| Telugu    | Vishnu             | Lakshmi  |
| Kannada   | Kiran              | Shreya   |
| Malayalam | Krishnan           | Deepa    |
| Marathi   | Siddharth          | Ananya   |
| Gujarati  | Rakesh             | Pooja    |
| Punjabi   | Gurpreet           | Simran   |
| Urdu      | Salman             | Fatima   |
| Odia      | Bijay              | Sujata   |
| Assamese  | Bimal              | Anjana   |
| Maithili  | Suresh             | Meera    |
| Nepali    | Bikash             | Sapana   |
| Sanskrit  | Vedant             | Gayatri  |
| Kashmiri  | Farooq             | Habba    |
| Konkani   | Mohan              | Sarita   |
| Dogri     | Vishal             | Neelam   |
| Sindhi    | Amjad              | Kavita   |
| Manipuri  | Tomba              | Ibemhal  |
| Santali   | Chandu             | Roshni   |
| Bodo      | Daimalu            | Hasina   |
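
When the voice should follow the conversation language, a lookup built from the table is enough. The sketch below copies only four rows to stay short; `SPEAKERS` and `pick_voice` are hypothetical names, and falling back to `"Rajesh"` mirrors the documented default voice.

```python
# Excerpt of the table above: language -> (male, female)
SPEAKERS = {
    "English": ("Varun", "Nisha"),
    "Hindi": ("Rajesh", "Sunita"),
    "Bengali": ("Arjun", "Priyanka"),
    "Tamil": ("Murugan", "Thangam"),
}


def pick_voice(language_name, gender="male", fallback="Rajesh"):
    """Return the listed speaker for a language, or the fallback if unlisted."""
    pair = SPEAKERS.get(language_name)
    if pair is None:
        return fallback
    male, female = pair
    return male if gender == "male" else female


print(pick_voice("Tamil", "female"))
print(pick_voice("Hindi"))
print(pick_voice("Klingon"))
```

Because every speaker can synthesize in any language, the fallback still produces audio; it just may not be the voice listed for that language.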

### Frame Output

| Frame              | Description                                      |
| ------------------ | ------------------------------------------------ |
| `TTSStartedFrame`  | Emitted when synthesis begins.                   |
| `TTSAudioRawFrame` | Emitted for each audio chunk (PCM, 16 kHz, mono). |
| `TTSStoppedFrame`  | Emitted when synthesis completes.                |
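
For raw 16-bit mono PCM at 16 kHz, the playback time of a chunk follows directly from its size: one second is 16000 samples × 2 bytes = 32000 bytes. A small sketch of that arithmetic, where `pcm_duration_ms` is a hypothetical helper and not part of the plugin:

```python
def pcm_duration_ms(num_bytes, sample_rate=16000, channels=1, bytes_per_sample=2):
    """Playback duration in milliseconds of a raw PCM buffer."""
    samples_per_channel = num_bytes / (channels * bytes_per_sample)
    return 1000.0 * samples_per_channel / sample_rate


print(pcm_duration_ms(3200))   # 100.0
print(pcm_duration_ms(32000))  # 1000.0
```

This is handy when logging how much audio has been received between `TTSStartedFrame` and `TTSStoppedFrame`.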

### Examples

**Default (Conversational)** — recommended for voice agents:

```python
tts = ShunyalabsTTSService(
    voice="Nisha",
    language="en",
)
```

**Custom emotion + speed:**

```python
tts = ShunyalabsTTSService(
    voice="Nisha",
    style="<Enthusiastic>",
    language="en",
    speed=1.1,
    output_format="pcm",
)
```

**Hindi news-style:**

```python
tts = ShunyalabsTTSService(
    voice="Rajesh",
    language="hi",
    style="<News>",
)
```

---

## Full Pipeline Example

A complete voice agent using Shunyalabs STT and TTS with OpenAI LLM on the Daily WebRTC transport:

```python
import asyncio, os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat_shunyalabs import ShunyalabsSTTService, ShunyalabsTTSService

async def run_voice_agent(room_url: str, token: str):
    transport = DailyTransport(
        room_url, token, "Shunyalabs Agent",
        DailyParams(audio_out_enabled=True, transcription_enabled=False),
    )

    stt = ShunyalabsSTTService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        language="auto",
        sample_rate=16000,
    )

    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o",
    )

    messages = [{
        "role": "system",
        "content": (
            "You are a helpful voice assistant powered by Shunyalabs. "
            "Keep responses concise and natural for voice delivery."
        ),
    }]
    context = OpenAILLMContext(messages)
    context_aggregator = llm.create_context_aggregator(context)

    tts = ShunyalabsTTSService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        voice="Rajesh",
        language="hi",  # required
    )

    pipeline = Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ])

    task = PipelineTask(
        pipeline,
        params=PipelineParams(allow_interruptions=True, enable_metrics=True),
    )

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        await task.queue_frames([context_aggregator.user().get_context_frame()])

    await PipelineRunner().run(task)

if __name__ == "__main__":
    asyncio.run(run_voice_agent(
        room_url=os.environ["DAILY_ROOM_URL"],
        token=os.environ["DAILY_TOKEN"],
    ))
```

---

## Multilingual Examples

```python
# Hindi conversational bot (default style)
tts = ShunyalabsTTSService(voice="Rajesh", language="hi")

# English news-style bot
tts = ShunyalabsTTSService(voice="Varun", language="en", style="<News>")

# Tamil narrative voice
tts = ShunyalabsTTSService(voice="Murugan", language="ta", style="<Narrative>")
```

---

## Error Reference

All Shunyalabs SDK exceptions inherit from `ShunyalabsError`.

| Exception               | HTTP Code | Description                                           |
| ----------------------- | --------- | ----------------------------------------------------- |
| `AuthenticationError`   | 401       | Invalid or missing API key.                           |
| `PermissionDeniedError` | 403       | API key lacks permission for the resource.            |
| `NotFoundError`         | 404       | Requested resource not found.                         |
| `ValidationError`       | 422       | Missing required field (e.g. `language`).             |
| `RateLimitError`        | 429       | Rate limit exceeded. Implement exponential backoff.   |
| `ServerError`           | 5xx       | Server-side error. Retried automatically.             |
| `TimeoutError`          | —         | Request exceeded timeout (default 60s).               |
| `ConnectionError`       | —         | Network connectivity issue.                           |
| `TranscriptionError`    | —         | ASR-specific failure (e.g. unsupported audio format). |
| `SynthesisError`        | —         | TTS-specific failure (e.g. invalid voice parameter).  |

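```python
from shunyalabs.exceptions import AuthenticationError, RateLimitError, ShunyalabsError

async def synthesize_or_report(client, text, config):
    # `client` is an already-configured Shunyalabs SDK client; `config` is its TTS config.
    try:
        return await client.tts.synthesize(text, config=config)
    except AuthenticationError:
        # 401: the key is missing or invalid
        print("Invalid API key: check SHUNYALABS_API_KEY")
    except RateLimitError as e:
        # 429: wait for the server-suggested interval before retrying
        print(f"Rate limited: retry after {e.retry_after}s")
    except ShunyalabsError as e:
        # Base class, so this catches every other SDK error
        print(f"Unexpected error: {e}")
    return None
```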
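
For `RateLimitError`, the table recommends exponential backoff. One way to compute the wait time is sketched below. `backoff_delay` is a hypothetical helper; the base of 0.5 s, the 30 s cap, and the decision to prefer the server's `retry_after` hint over the computed value are assumptions for this example, not documented SDK behaviour.

```python
def backoff_delay(attempt, retry_after=None, base=0.5, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # A server hint wins over the computed schedule, still bounded by the cap
        return min(float(retry_after), cap)
    return min(base * (2 ** attempt), cap)


delays = [backoff_delay(n) for n in range(4)]
print(delays)                           # [0.5, 1.0, 2.0, 4.0]
print(backoff_delay(10))                # 30.0
print(backoff_delay(0, retry_after=7))  # 7.0
```

In an `except RateLimitError as e:` branch, the result of `backoff_delay(attempt, e.retry_after)` can be handed to `asyncio.sleep` before the next attempt. Adding random jitter is a common refinement when many clients retry at once.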

---

## Troubleshooting

| Symptom                                                | Resolution                                                                                 |
| ------------------------------------------------------ | ------------------------------------------------------------------------------------------ |
| **HTTP 422 "language: Field required"**                | Add `language="en"` (or another ISO 639 code) to `ShunyalabsTTSService(...)`.              |
| **TTS audio sounds wrong / muddied (after upgrade)**   | Remove `speaker=...` from the constructor — it's no longer needed and was causing a double-prefix bug in 1.0.0. |
| `AuthenticationError` on startup                       | Verify `SHUNYALABS_API_KEY` is set and valid.                                              |
| WebSocket connection refused                           | Ensure outbound WSS (port 443) is open to `asr.shunyalabs.ai` and `tts.shunyalabs.ai`.     |
| No transcription output                                | Check `sample_rate` matches your transport input. Verify audio source is active.           |
| TTS audio silent or missing                            | Ensure `output_format="pcm"` matches transport output. Verify `TTSStartedFrame` is received. |
| High latency on first TTS chunk                        | Deploy closer to the Shunyalabs gateway region (`asia-south1`).                            |
| `RateLimitError`                                       | Implement exponential backoff. Check `e.retry_after`.                                      |
| `ModuleNotFoundError: No module named 'pipecat_shunyalabs'` | Run `pip install pipecat-shunyalabs`. Confirm the virtual environment is activated. |

---

## Changelog

### 1.0.1 (2026-04-11)

- **Breaking:** `language` is now required by the gateway. Always pass an ISO 639 code.
- **Breaking:** Removed `speaker` parameter (was a duplicate of `voice`).
- **Bug fix:** `_format_text` no longer prepends the speaker name on top of the gateway's server-side prefix. Old 1.0.0 behaviour produced `"Rajesh: Rajesh: <Neutral> ..."` in the model prompt and degraded output quality.
- `style` is now optional — defaults to `<Conversational>` server-side.

### 1.0.0

- Initial public release.

---

## License

[MIT](../../LICENSE)
