Metadata-Version: 2.4
Name: livekit-plugins-rumik-ai
Version: 0.2.0
Summary: LiveKit Agents plugin for text-to-speech with Rumik AI (muga & mulberry).
Project-URL: Homepage, https://rumik.ai/
Project-URL: Source, https://github.com/rumik-ai/livekit-plugins-rumik-ai
Project-URL: Issues, https://github.com/rumik-ai/livekit-plugins-rumik-ai/issues
Project-URL: Documentation, https://docs.livekit.io/agents/integrations/tts/
Author-email: Rumik AI <hello@rumik.ai>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: audio,hinglish,livekit,realtime,rumik-ai,text-to-speech,tts,webrtc
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10.0
Requires-Dist: livekit-agents[codecs]<2,>=1.5
Provides-Extra: dev
Requires-Dist: aiohttp; extra == 'dev'
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Description-Content-Type: text/markdown

# livekit-plugins-rumik-ai

[Rumik AI](https://rumik.ai/) text-to-speech plugin for [LiveKit Agents](https://github.com/livekit/agents).

Streams low-latency 24 kHz speech from Rumik's **Silk** models over a reusable WebSocket
session:

- **muga** — emotion-controlled via a leading `[tone]` tag (e.g. `[happy]`, `[sad]`) plus
  optional `<laugh>`/`<chuckle>`/`<sigh>` events. Tuned for Romanized Hinglish.
- **mulberry** — steered by a natural-language voice `description` or a preset `speaker`,
  with optional pitch shift (`f0_up_key`).

## Install

```bash
pip install livekit-plugins-rumik-ai
```

This installs `livekit-agents[codecs]` (1.5 or later, below 2.0) and requires Python 3.10+. Set your API key:

```bash
export RUMIK_API_KEY="your-rumik-api-key"
```
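
If you want to fail fast when the key is missing, a pre-flight check along these lines can help. This is a minimal sketch: `resolve_api_key` is a hypothetical helper, not part of the plugin, and the plugin's own lookup logic may differ.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key if given, else RUMIK_API_KEY from the environment.

    Hypothetical helper for illustration; not part of livekit-plugins-rumik-ai.
    """
    env = os.environ if env is None else env
    key = explicit or env.get("RUMIK_API_KEY", "")
    if not key:
        raise ValueError("Rumik API key missing: pass api_key= or set RUMIK_API_KEY")
    return key
```

Calling `resolve_api_key()` at startup surfaces a missing key before the first synthesis request rather than in the middle of a call.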

## Quickstart

```python
from livekit.agents import AgentSession
from livekit.plugins import rumik_ai

# muga: the LLM should start each reply with one tone tag, e.g. "[happy] ..."
session = AgentSession(
    stt=...,
    llm=...,
    tts=rumik_ai.TTS(model="muga"),
)
```

Mulberry, steered by a voice description (or a preset speaker):

```python
tts = rumik_ai.TTS(
    model="mulberry",
    description="warm, gentle female friend",
    # speaker="speaker_1",      # optional preset, overrides description
    # f0_up_key=2.0,            # optional pitch shift, -12..12 semitones
)
```

### Changing the voice at runtime

`description`, `speaker`, `f0_up_key`, and the sampling parameters are sent on **every
request**, so you can change mulberry's voice between turns without reconnecting. The
pooled WebSocket is reused; only a `model` change re-mints the session:

```python
tts.update_options(description="excited young man, fast and energetic")
# the next synthesis request uses the new voice
```
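
To see why no reconnect is needed, here is a minimal sketch of the "options are read on every request" idea. `VoiceOptions` and `request_fields` are hypothetical names for illustration only; the plugin's internal types and wire field names are not shown in this README and may differ.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class VoiceOptions:
    """Hypothetical container for mulberry's per-request voice settings."""

    description: Optional[str] = None
    speaker: Optional[str] = None
    f0_up_key: Optional[float] = None


def request_fields(opts: VoiceOptions) -> dict:
    """Build the voice fields for one request, leaving unset options out."""
    return {k: v for k, v in asdict(opts).items() if v is not None}


opts = VoiceOptions(description="warm, gentle female friend")
first = request_fields(opts)

# Similar in spirit to tts.update_options(description=...): the connection is
# untouched, and the next request simply reads the updated options.
opts = replace(opts, description="excited young man, fast and energetic")
second = request_fields(opts)
```

Because the voice settings travel with each request rather than with the connection, swapping them between turns costs nothing beyond the next request itself.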

## Latency vs. smoothness

The default aggregation behavior depends on the model:

- **muga** buffers the full LLM reply and synthesizes it in one request, so its leading
  `[tone]` tag conditions the whole utterance (and there are no per-request TTFB gaps).
- **mulberry** streams sentence-by-sentence for lower time-to-first-word, since it has
  no tone tag to protect.

Override either with `full_response_aggregation`:

```python
rumik_ai.TTS(model="muga", full_response_aggregation=False, tone="neutral")  # muga, lower latency
rumik_ai.TTS(model="mulberry", full_response_aggregation=True)               # mulberry, smoother
```
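
The default-plus-override rule can be summarized in a few lines. This is a sketch of the documented behavior, not the plugin's actual code; `resolve_aggregation` is a hypothetical name.

```python
from typing import Optional


def resolve_aggregation(model: str, override: Optional[bool] = None) -> bool:
    """Return True to buffer the full reply, False to stream per sentence.

    Hypothetical helper mirroring the documented defaults:
    True for muga, False for mulberry, unless explicitly overridden.
    """
    if override is not None:
        return override
    return model == "muga"
```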

When you turn aggregation **off for muga**, set a fallback `tone=` so every sentence
keeps a tone tag.
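
To illustrate why the fallback matters: once a reply is split into sentences, only the first one carries the LLM's leading tag. The sketch below shows one way untagged sentences could be given a fallback tone. `ensure_tone` and `split_sentences` are hypothetical helpers with a deliberately naive splitter; the plugin's own tokenizer and tagging logic may differ.

```python
import re
from typing import List

# Matches a leading tone tag such as "[happy]" (assumes tag names are plain words).
_TONE_TAG = re.compile(r"^\s*\[[A-Za-z_]+\]")


def ensure_tone(sentence: str, fallback: str = "neutral") -> str:
    """Prefix the fallback tone tag unless the sentence already starts with one."""
    if _TONE_TAG.match(sentence):
        return sentence
    return f"[{fallback}] {sentence.lstrip()}"


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


reply = "[happy] Hi there! How are you?"
tagged = [ensure_tone(s) for s in split_sentences(reply)]
```

Here the second sentence has lost the `[happy]` tag after splitting, so it receives the fallback instead.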

## Barge-in & cancel

Built for the live "call with AI" case. When the caller talks over the agent, LiveKit
interrupts the TTS and the plugin sends an explicit cancel to Rumik, so the in-flight
generation stops immediately (and billing is finalized cleanly). The pooled WebSocket is
kept **warm** across the interruption, so the next utterance doesn't pay a reconnect.
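
The cancel-and-keep-warm pattern can be sketched with plain `asyncio`. Everything below is illustrative: `FakeConnection` stands in for the pooled WebSocket, and the `synthesize:` / `cancel` message strings are made up for the example, not Rumik's wire protocol.

```python
import asyncio
from typing import List


class FakeConnection:
    """Stand-in for a pooled WebSocket; records what was sent (hypothetical)."""

    def __init__(self) -> None:
        self.open = True
        self.sent: List[str] = []

    async def send(self, message: str) -> None:
        self.sent.append(message)


async def synthesize(conn: FakeConnection, text: str) -> None:
    """Pretend to stream audio; on cancellation, tell the server to stop."""
    await conn.send(f"synthesize:{text}")
    try:
        await asyncio.sleep(10)  # stands in for receiving audio chunks
    except asyncio.CancelledError:
        await conn.send("cancel")  # explicit cancel; the connection stays open
        raise


async def demo() -> FakeConnection:
    conn = FakeConnection()
    task = asyncio.create_task(synthesize(conn, "hello"))
    await asyncio.sleep(0)  # let the request start
    task.cancel()  # barge-in: the caller started talking
    try:
        await task
    except asyncio.CancelledError:
        pass
    await conn.send("synthesize:next")  # reuse the same warm connection
    return conn
```

Running `asyncio.run(demo())` shows the interrupted request, the cancel message, and the follow-up request all going over one connection object.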

## Configuration

| Argument | Models | Notes |
|---|---|---|
| `model` | both | `"muga"` (default) or `"mulberry"` |
| `tone` | muga | fallback tone when input is untagged |
| `description` | mulberry | natural-language voice description |
| `speaker` | mulberry | `speaker_1`..`speaker_4` |
| `f0_up_key` | mulberry | pitch shift, `-12`..`12` |
| `temperature`, `top_p`, `top_k`, `repetition_penalty`, `max_new_tokens` | both | omitted unless set (Rumik defaults apply) |
| `full_response_aggregation` | both | buffer the full reply (`True`) vs. stream per sentence (`False`). Default: `True` for muga, `False` for mulberry |
| `api_key` | — | defaults to `RUMIK_API_KEY` |
| `base_url` | — | defaults to `https://silk-api.rumik.ai` |
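
If you build these arguments from user input or config files, it can be worth checking them against the ranges in the table before constructing the TTS. This is a minimal sketch with a hypothetical `check_options` helper; whether the plugin performs its own validation is not stated here.

```python
from typing import Optional

MODELS = ("muga", "mulberry")
SPEAKERS = tuple(f"speaker_{i}" for i in range(1, 5))  # speaker_1..speaker_4


def check_options(
    model: str = "muga",
    speaker: Optional[str] = None,
    f0_up_key: Optional[float] = None,
) -> None:
    """Raise ValueError if an option falls outside the documented values.

    Hypothetical helper; not part of livekit-plugins-rumik-ai.
    """
    if model not in MODELS:
        raise ValueError(f"model must be one of {MODELS}, got {model!r}")
    if speaker is not None and speaker not in SPEAKERS:
        raise ValueError(f"speaker must be one of {SPEAKERS}, got {speaker!r}")
    if f0_up_key is not None and not -12 <= f0_up_key <= 12:
        raise ValueError(f"f0_up_key must be within -12..12, got {f0_up_key!r}")
```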

## Examples

See [`examples/`](./examples) for a full voice agent (`rumik_ai_agent.py`) and a
record-to-WAV demo (`rumik_ai_tts.py`).

## License

Apache-2.0
