Metadata-Version: 2.4
Name: pipecat-rumik
Version: 0.1.3
Summary: Rumik text-to-speech services for Pipecat
Project-URL: Homepage, https://rumik.ai
Project-URL: Repository, https://github.com/ira-rumik/pipecat-rumik
Project-URL: Issues, https://github.com/ira-rumik/pipecat-rumik/issues
Project-URL: Support, https://github.com/ira-rumik/pipecat-rumik/issues
Author: Rumik AI
Maintainer: Rumik AI
License: MIT License
        
        Copyright (c) 2026 Rumik AI
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: pipecat,rumik,speech,text-to-speech,tts,voice-ai
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: aiohttp>=3.9
Requires-Dist: certifi>=2024.2.2
Requires-Dist: pipecat-ai<2,>=1.0.0
Requires-Dist: websockets>=13
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.8; extra == 'dev'
Provides-Extra: examples
Requires-Dist: pipecat-ai[daily,deepgram,openai,runner,webrtc]<2,>=1.0.0; extra == 'examples'
Requires-Dist: python-dotenv<2,>=1.0.0; extra == 'examples'
Description-Content-Type: text/markdown

# pipecat-rumik

Text-to-speech service implementations for
[Pipecat](https://github.com/pipecat-ai/pipecat) using Rumik AI's TTS APIs.

## Overview

`pipecat-rumik` provides two Pipecat TTS services:

- `RumikTTSService` for WebSocket-based synthesis in interactive voice
  pipelines.
- `RumikHttpTTSService` for HTTP request/response synthesis.

The package follows Pipecat's service conventions: constructor-level provider
configuration, runtime-configurable `Settings`, raw PCM audio frames, metrics,
and standard service connection events.

```python
from pipecat_rumik import RumikHttpTTSService, RumikTTSService, RumikTTSSettings
```

## Compatibility and Maintenance

This package is maintained by Rumik AI. The current implementation is tested
with Pipecat `1.3.0` and supports Pipecat `>=1.0.0,<2`.

## Installation

```bash
pip install pipecat-rumik
```

To run the included voice pipeline examples, install the examples extra:

```bash
pip install "pipecat-rumik[examples]"
```

For local development:

```bash
uv sync --extra dev --extra examples
```

## Quick Start

```python
import os

from pipecat_rumik import RumikTTSService

tts = RumikTTSService(
    api_key=os.environ["RUMIK_API_KEY"],
    gateway_url=os.environ["RUMIK_GATEWAY_URL"],
    settings=RumikTTSService.Settings(
        model="muga",
    ),
)
```

For Mulberry expressive voices:

```python
import os

from pipecat_rumik import RumikTTSService

tts = RumikTTSService(
    api_key=os.environ["RUMIK_API_KEY"],
    gateway_url=os.environ["RUMIK_GATEWAY_URL"],
    settings=RumikTTSService.Settings(
        model="mulberry",
        voice="speaker_1",
        description=(
            "warm expressive Indian woman with clear Hinglish diction, natural "
            "pauses, gentle energy, and a friendly conversational delivery"
        ),
        f0_up_key=3,
    ),
)
```

## Prerequisites

Before using either service, configure access to the Rumik AI gateway:

```bash
export RUMIK_API_KEY=...
export RUMIK_GATEWAY_URL=...
```

Create API keys from the [Rumik AI dashboard](https://playground.rumik.ai/).
Use the gateway URL provided for your Rumik AI deployment.
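
A minimal fail-fast sketch for these variables, assuming you want a clear error before constructing a service. The `require_env` helper is hypothetical and not part of `pipecat-rumik`, and the values in the example call are placeholders.

```python
import os
from collections.abc import Mapping


def require_env(names: list[str], env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the requested variables, or raise if any are unset or empty.

    Hypothetical helper for illustration; not part of pipecat-rumik.
    """
    missing = [name for name in names if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}


# Example with an explicit mapping (placeholder values) instead of the real environment.
config = require_env(
    ["RUMIK_API_KEY", "RUMIK_GATEWAY_URL"],
    env={"RUMIK_API_KEY": "example-key", "RUMIK_GATEWAY_URL": "https://gateway.example"},
)
```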

The full Pipecat voice pipeline examples also require:

```bash
export DEEPGRAM_API_KEY=...
export OPENAI_API_KEY=...
```

Optional environment variables used by the smoke-test examples:

```bash
export RUMIK_MODEL=muga
export RUMIK_SPEAKER=
export RUMIK_DESCRIPTION=
export RUMIK_F0_UP_KEY=
export RUMIK_TEMPERATURE=
export RUMIK_TOP_P=
export RUMIK_TOP_K=
export RUMIK_REPETITION_PENALTY=
export RUMIK_MAX_NEW_TOKENS=
```

Leave optional values empty to use Rumik's API defaults. To test expressive
voices, set `RUMIK_MODEL=mulberry` together with one or more of `RUMIK_SPEAKER`,
`RUMIK_DESCRIPTION`, and `RUMIK_F0_UP_KEY`.
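
One way to turn these optional variables into keyword arguments is sketched below. It assumes each variable maps to the `RumikTTSSettings` field with the matching purpose (for example, `RUMIK_SPEAKER` to `voice`). The `settings_kwargs_from_env` helper and this mapping are illustrative; the bundled smoke tests may parse the variables differently.

```python
from collections.abc import Mapping

# Assumed mapping: environment variable -> (settings field, parser).
_OPTIONAL_FIELDS = {
    "RUMIK_MODEL": ("model", str),
    "RUMIK_SPEAKER": ("voice", str),
    "RUMIK_DESCRIPTION": ("description", str),
    "RUMIK_F0_UP_KEY": ("f0_up_key", int),
    "RUMIK_TEMPERATURE": ("temperature", float),
    "RUMIK_TOP_P": ("top_p", float),
    "RUMIK_TOP_K": ("top_k", int),
    "RUMIK_REPETITION_PENALTY": ("repetition_penalty", float),
    "RUMIK_MAX_NEW_TOKENS": ("max_new_tokens", int),
}


def settings_kwargs_from_env(env: Mapping[str, str]) -> dict[str, object]:
    """Parse the optional variables, skipping unset or empty ones."""
    kwargs: dict[str, object] = {}
    for name, (field, parse) in _OPTIONAL_FIELDS.items():
        raw = env.get(name, "").strip()
        if raw:
            kwargs[field] = parse(raw)
    return kwargs


kwargs = settings_kwargs_from_env(
    {
        "RUMIK_MODEL": "mulberry",
        "RUMIK_SPEAKER": "speaker_1",
        "RUMIK_F0_UP_KEY": "3",
        "RUMIK_TOP_P": "",  # Empty, so it is skipped.
    }
)
```

The result could then be passed as `RumikTTSService.Settings(**kwargs)`; skipped fields keep their defaults.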

## Service Selection

| Service | Transport | Recommended Use |
| --- | --- | --- |
| `RumikTTSService` | WebSocket | Interactive Pipecat voice agents that need interruption-aware TTS. |
| `RumikHttpTTSService` | HTTP | Simpler synthesis flows where a request/response API is preferred. |

Use the WebSocket service for conversational applications. Use the HTTP service
for batch-style synthesis or integration tests that do not need a persistent TTS
connection.

## Configuration

### RumikTTSService

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `api_key` | `str` | yes | Rumik AI API key. |
| `gateway_url` | `str` | yes | Rumik AI gateway base URL. |
| `settings` | `RumikTTSService.Settings \| None` | no | Runtime-configurable TTS settings. |
| `sample_rate` | `int \| None` | no | Output sample rate. Rumik currently emits 24 kHz PCM. |
| `full_response_aggregation` | `bool` | no | Buffer a complete LLM response before sending it to Rumik. Defaults to `True`. |

### RumikHttpTTSService

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `api_key` | `str` | yes | Rumik AI API key. |
| `gateway_url` | `str` | yes | Rumik AI gateway base URL. |
| `aiohttp_session` | `aiohttp.ClientSession` | yes | Caller-owned HTTP session. |
| `settings` | `RumikHttpTTSService.Settings \| None` | no | Runtime-configurable TTS settings. |
| `sample_rate` | `int \| None` | no | Output sample rate. Rumik currently emits 24 kHz PCM. |

## Settings

Both services use Pipecat's service settings pattern:

```python
settings=RumikTTSService.Settings(...)
settings=RumikHttpTTSService.Settings(...)
```

`RumikTTSService.Settings` and `RumikHttpTTSService.Settings` are aliases of
`RumikTTSSettings`.

| Setting | Type | Default | Used By | Description |
| --- | --- | --- | --- | --- |
| `model` | `str \| None` | `"muga"` | HTTP, WS | Rumik model identifier. |
| `voice` | `str \| None` | `None` | HTTP, WS | Preset speaker voice. Sent to Rumik as `speaker`. |
| `language` | `Language \| str \| None` | `None` | inherited | Reserved for Pipecat provider compatibility. |
| `description` | `str \| None` | `None` | HTTP, WS | Natural-language voice/style description for expressive models. |
| `f0_up_key` | `int \| None` | `None` | HTTP, WS | Pitch shift in semitones for preset speaker voices. |
| `temperature` | `float \| None` | `None` | HTTP, WS | Sampling temperature. When omitted, Rumik uses its API default. |
| `top_p` | `float \| None` | `None` | HTTP, WS | Nucleus sampling value. When omitted, Rumik uses its API default. |
| `top_k` | `int \| None` | `None` | HTTP, WS | Top-k sampling value. When omitted, Rumik uses its API default. |
| `repetition_penalty` | `float \| None` | `None` | HTTP, WS | Penalty applied to repeated tokens. When omitted, Rumik uses its API default. |
| `max_new_tokens` | `int \| None` | `None` | HTTP, WS | Maximum generated audio tokens. When omitted, Rumik uses its API default. |

Runtime updates use Pipecat's `TTSUpdateSettingsFrame` with
`RumikTTSSettings`.

## Usage

### WebSocket Service

```python
import os

from pipecat_rumik import RumikTTSService

tts = RumikTTSService(
    api_key=os.environ["RUMIK_API_KEY"],
    gateway_url=os.environ["RUMIK_GATEWAY_URL"],
    settings=RumikTTSService.Settings(
        model="muga",
    ),
)
```

For expressive models, use Pipecat's inherited `voice` setting for preset
speaker voices. The service sends it to Rumik as `speaker`.

```python
import os

from pipecat_rumik import RumikTTSService

tts = RumikTTSService(
    api_key=os.environ["RUMIK_API_KEY"],
    gateway_url=os.environ["RUMIK_GATEWAY_URL"],
    settings=RumikTTSService.Settings(
        model="mulberry",
        voice="speaker_1",
        description=(
            "warm expressive Indian woman with clear Hinglish diction, natural "
            "pauses, gentle energy, and a friendly conversational delivery"
        ),
        f0_up_key=3,
    ),
)
```
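
The `voice` to `speaker` renaming can be pictured with the sketch below. It is an illustration only: `build_request_fields` is a hypothetical helper, and apart from `speaker` the request field names are assumed to match the setting names. The actual wire format is defined by the service implementation and Rumik's API.

```python
def build_request_fields(settings: dict[str, object]) -> dict[str, object]:
    """Rename `voice` to `speaker` and drop unset (None) values.

    Hypothetical helper; field names other than `speaker` are assumptions.
    """
    fields: dict[str, object] = {}
    for key, value in settings.items():
        if value is None:
            continue  # Omitted values let Rumik fall back to its API defaults.
        fields["speaker" if key == "voice" else key] = value
    return fields


fields = build_request_fields(
    {"model": "mulberry", "voice": "speaker_1", "f0_up_key": 3, "temperature": None}
)
```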

By default, `RumikTTSService` uses full-response aggregation. This sends one
complete assistant response to Rumik instead of creating a separate TTS request
for every sentence.

```python
import os

from pipecat_rumik import RumikTTSService

# Disable full-response aggregation.
tts = RumikTTSService(
    api_key=os.environ["RUMIK_API_KEY"],
    gateway_url=os.environ["RUMIK_GATEWAY_URL"],
    full_response_aggregation=False,
)
```
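
To illustrate the difference, the toy sketch below groups streamed LLM text either into a single request for the whole response or into one request per sentence. It is a simplified model, not the package's aggregation logic; `plan_tts_requests` and the naive sentence splitting are assumptions made for the example.

```python
import re


def plan_tts_requests(chunks: list[str], full_response_aggregation: bool = True) -> list[str]:
    """Return the text of each TTS request a response would produce (toy model)."""
    text = "".join(chunks).strip()
    if not text:
        return []
    if full_response_aggregation:
        return [text]  # One request for the complete assistant response.
    # Naive split after ., !, or ? followed by whitespace.
    return [sentence for sentence in re.split(r"(?<=[.!?])\s+", text) if sentence]


chunks = ["Sure. ", "Tea is ready! ", "Anything else?"]
one_request = plan_tts_requests(chunks)
per_sentence = plan_tts_requests(chunks, full_response_aggregation=False)
```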

### HTTP Service

```python
import asyncio
import os

import aiohttp

from pipecat_rumik import RumikHttpTTSService


async def main():
    async with aiohttp.ClientSession() as session:
        tts = RumikHttpTTSService(
            api_key=os.environ["RUMIK_API_KEY"],
            gateway_url=os.environ["RUMIK_GATEWAY_URL"],
            aiohttp_session=session,
            settings=RumikHttpTTSService.Settings(
                model="mulberry",
                voice="speaker_2",
                description="calm female narrator",
                f0_up_key=0,
                temperature=0.6,
                top_p=0.95,
                top_k=50,
                repetition_penalty=1.2,
                max_new_tokens=2048,
            ),
        )
        # Use `tts` in your Pipecat pipeline here, while the session is still open.


asyncio.run(main())
```

The HTTP service uses the caller-owned `aiohttp.ClientSession`. Create and close
the session in your application code.

## Notes

- **WebSocket vs HTTP**: The WebSocket service is intended for interactive
  Pipecat conversations. The HTTP service is useful for simpler batch-style
  synthesis.
- **Request lifecycle**: The WebSocket service keeps one active synthesis
  request per connection so audio chunks are routed deterministically to the
  active Pipecat audio context.
- **Interruption handling**: On interruption, the WebSocket service closes the
  active socket and opens a fresh session before accepting the next synthesis
  request.
- **Audio format**: Rumik currently emits 24 kHz, mono, signed 16-bit PCM. The
  services validate provider responses against this audio contract before
  emitting Pipecat audio frames.
- **Voice selection**: Use `voice` for preset speaker voices. The service sends
  this setting to Rumik as `speaker`.
- **Model steering**: `muga` can be steered with tone tags in the input text.
  Expressive models can use `description`, optional preset `voice`, and
  `f0_up_key`.
- **HTTP response handling**: The HTTP service validates the WAV response,
  removes the WAV container, and emits raw PCM frames.
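
The audio contract and WAV handling described in the notes above can be sketched with the standard library's `wave` module. This is an illustration under stated assumptions, not the package's code: `wav_to_pcm` and `pcm_duration_seconds` are hypothetical helpers, and the checks simply mirror the 24 kHz, mono, signed 16-bit contract.

```python
import io
import wave

EXPECTED_RATE = 24_000  # Hz
EXPECTED_CHANNELS = 1  # mono
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample (signed 16-bit)


def wav_to_pcm(wav_bytes: bytes) -> bytes:
    """Validate a WAV payload against the expected format and return raw PCM."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        actual = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
        expected = (EXPECTED_RATE, EXPECTED_CHANNELS, EXPECTED_SAMPLE_WIDTH)
        if actual != expected:
            raise ValueError(f"Unexpected audio format {actual}, expected {expected}")
        return wav.readframes(wav.getnframes())


def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration of raw PCM in the expected format."""
    return len(pcm) / (EXPECTED_RATE * EXPECTED_CHANNELS * EXPECTED_SAMPLE_WIDTH)


# Build a half-second silent WAV in memory to exercise the helpers.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(EXPECTED_CHANNELS)
    out.setsampwidth(EXPECTED_SAMPLE_WIDTH)
    out.setframerate(EXPECTED_RATE)
    out.writeframes(b"\x00\x00" * 12_000)

pcm = wav_to_pcm(buffer.getvalue())
```

At this format, one second of audio is 48,000 bytes of PCM, which is a handy figure when sizing buffers or estimating playback time.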

## Event Handlers

`RumikTTSService` supports Pipecat's standard service connection events:

| Event | Description |
| --- | --- |
| `on_connected` | Connected to the Rumik WebSocket service. |
| `on_disconnected` | Disconnected from the Rumik WebSocket service. |
| `on_connection_error` | WebSocket connection error occurred. |

```python
# Assumes `tts` is a RumikTTSService instance created as shown under Usage.
@tts.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Rumik")
```

## Examples

The repository includes two full Pipecat voice pipeline examples:

```bash
uv run --extra examples examples/voice/voice-rumik-muga.py -t webrtc --host localhost --port 7860
uv run --extra examples examples/voice/voice-rumik-mulberry.py -t webrtc --host localhost --port 7861
```

Open the Pipecat WebRTC client for the selected port and talk to the bot:

- Muga: `http://localhost:7860/client/`
- Mulberry: `http://localhost:7861/client/`

The voice examples implement a simple, expressive Indian voice assistant for
everyday questions, quick decisions, small plans, and short messages. The Muga
example prompts the LLM to emit the required tone tags. By default, the Mulberry
example uses the `speaker_1` voice, a description of a warm, expressive Indian
woman, and `f0_up_key=3`.

The repository also includes lower-level provider smoke tests:

```bash
uv run python examples/smoke_rumik_http.py
uv run python examples/smoke_rumik_ws.py
uv run python examples/smoke_rumik_ws_suite.py
```

The smoke tests require real Rumik credentials. Unit tests do not.

## Testing

Offline checks:

```bash
uv run --extra dev pytest -q
uv run --extra dev --extra examples ruff check src tests examples scripts
uv run --extra dev --extra examples python -m compileall -q src tests examples scripts
uv build
uv run --with twine twine check dist/*
```

See [TESTING.md](TESTING.md) for the full test checklist and
[RELEASE.md](RELEASE.md) for TestPyPI/PyPI release steps.

## License

This package is released under the MIT License. See [LICENSE](LICENSE).
