Metadata-Version: 2.4
Name: livekit-plugins-rumik-ai
Version: 0.1.1
Summary: LiveKit Agents plugin for text-to-speech with Rumik AI (muga & mulberry).
Project-URL: Homepage, https://rumik.ai/
Project-URL: Source, https://github.com/rumik-ai/livekit-plugins-rumik-ai
Project-URL: Issues, https://github.com/rumik-ai/livekit-plugins-rumik-ai/issues
Project-URL: Documentation, https://docs.livekit.io/agents/integrations/tts/
Author-email: Rumik AI <hello@rumik.ai>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: audio,hinglish,livekit,realtime,rumik-ai,text-to-speech,tts,webrtc
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10.0
Requires-Dist: livekit-agents[codecs]<2,>=1.5
Provides-Extra: dev
Requires-Dist: aiohttp; extra == 'dev'
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Description-Content-Type: text/markdown

# livekit-plugins-rumik-ai

[Rumik AI](https://rumik.ai/) text-to-speech plugin for [LiveKit Agents](https://github.com/livekit/agents).

Streams low-latency 24 kHz speech from Rumik's **Silk** models over a reusable WebSocket
session:

- **muga** — emotion-controlled via a leading `[tone]` tag (e.g. `[happy]`, `[sad]`) plus
  optional `<laugh>`/`<chuckle>`/`<sigh>` events. Tuned for Romanized Hinglish.
- **mulberry** — steered by a natural-language voice `description` or a preset `speaker`,
  with optional pitch shift (`f0_up_key`).
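To make the muga input format concrete, here is a minimal parsing sketch. It assumes a tone tag is a single bracketed word at the very start of the text and that the inline events are exactly the three listed above. `parse_muga_text` and `MugaText` are hypothetical names for illustration, not part of the plugin's API.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: a tone tag is one bracketed word at the very start, e.g. "[happy]".
_TONE_RE = re.compile(r"^\s*\[([A-Za-z_]+)\]\s*")
# Assumption: only the three documented inline events are recognized.
_EVENT_RE = re.compile(r"<(laugh|chuckle|sigh)>")


@dataclass
class MugaText:
    tone: Optional[str]                              # leading tone, or None if untagged
    body: str                                        # text after the leading tone tag
    events: List[str] = field(default_factory=list)  # inline events, in order


def parse_muga_text(text: str) -> MugaText:
    """Split a muga input string into tone, body, and events (hypothetical helper)."""
    match = _TONE_RE.match(text)
    tone = match.group(1).lower() if match else None
    body = (text[match.end():] if match else text).strip()
    return MugaText(tone=tone, body=body, events=_EVENT_RE.findall(body))


parsed = parse_muga_text("[happy] Arre yaar <laugh> kya baat hai!")
print(parsed.tone, parsed.events)  # happy ['laugh']
```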

## Install

```bash
pip install livekit-plugins-rumik-ai
```

It requires `livekit-agents` 1.5 or later (below 2.0, with the `codecs` extra), which pip installs automatically. Then set your API key:

```bash
export RUMIK_API_KEY="your-rumik-api-key"
```
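The `api_key` argument falls back to the `RUMIK_API_KEY` environment variable (see Configuration). A minimal sketch of that lookup order, assuming an explicit argument wins over the environment; `resolve_api_key` is a hypothetical helper, not the plugin's code:

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else RUMIK_API_KEY (hypothetical helper)."""
    key = api_key or os.environ.get("RUMIK_API_KEY")
    if not key:
        # Failing early is clearer than an authentication error on the first request.
        raise ValueError("Rumik API key missing: pass api_key= or set RUMIK_API_KEY")
    return key


print(resolve_api_key("explicit-key"))  # explicit-key
```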

## Quickstart

```python
from livekit.agents import AgentSession
from livekit.plugins import rumik_ai

# muga: the LLM should start each reply with one tone tag, e.g. "[happy] ..."
session = AgentSession(
    stt=...,  # your speech-to-text plugin
    llm=...,  # your LLM plugin
    tts=rumik_ai.TTS(model="muga"),
)
```

Mulberry, steered by a voice description (or a preset speaker):

```python
from livekit.plugins import rumik_ai

tts = rumik_ai.TTS(
    model="mulberry",
    description="warm, gentle female friend",
    # speaker="speaker_1",      # optional preset, overrides description
    # f0_up_key=2.0,            # optional pitch shift, -12..12 semitones
)
```
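The mulberry arguments have documented constraints: `speaker` is one of `speaker_1`..`speaker_4`, `f0_up_key` spans -12 to 12 semitones, and a preset `speaker` overrides `description`. The sketch below validates options under those assumptions. `MulberryVoice` and `effective_voice` are hypothetical; the plugin may validate differently or leave checks to the server.

```python
from dataclasses import dataclass
from typing import Optional

VALID_SPEAKERS = {f"speaker_{i}" for i in range(1, 5)}  # speaker_1..speaker_4


@dataclass
class MulberryVoice:
    """Hypothetical container mirroring the documented mulberry voice arguments."""

    description: Optional[str] = None
    speaker: Optional[str] = None
    f0_up_key: Optional[float] = None  # pitch shift in semitones

    def __post_init__(self) -> None:
        if self.speaker is not None and self.speaker not in VALID_SPEAKERS:
            raise ValueError(f"unknown speaker {self.speaker!r}")
        if self.f0_up_key is not None and not -12 <= self.f0_up_key <= 12:
            raise ValueError("f0_up_key must be within -12..12 semitones")

    def effective_voice(self) -> Optional[str]:
        """Report which control steers the voice: a preset speaker beats a description."""
        if self.speaker is not None:
            return f"preset:{self.speaker}"
        if self.description:
            return f"description:{self.description}"
        return None  # neither set


voice = MulberryVoice(description="warm, gentle female friend", speaker="speaker_1")
print(voice.effective_voice())  # preset:speaker_1
```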

### Changing the voice at runtime

`description`, `speaker`, `f0_up_key`, and the sampling params are sent with **every
request**, so you can change mulberry's voice between turns without reconnecting. The
pooled WebSocket is reused; only changing `model` opens a new session:

```python
tts.update_options(description="excited young man, fast and energetic")
# the next synthesis request uses the new voice
```
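The reuse rule above can be modeled as a small state object: voice and sampling options are read fresh for each request, while only a `model` change invalidates the pooled session. This is a hypothetical illustration (`PooledTTSState` and its methods are invented names), not the plugin's internals, and it assumes the request fields mirror the constructor arguments.

```python
from typing import Any, Dict


class PooledTTSState:
    """Hypothetical model of the reuse rule: options travel per request, model pins the session."""

    def __init__(self, model: str = "mulberry", **options: Any) -> None:
        self.model = model
        self.options: Dict[str, Any] = dict(options)
        self._session_model = model  # model the open WebSocket session was created for
        self.sessions_opened = 1

    def update_options(self, **changes: Any) -> None:
        # Voice and sampling changes are only stored; they ride along on the next request.
        self.model = changes.pop("model", self.model)
        self.options.update(changes)

    def next_request(self) -> Dict[str, Any]:
        if self.model != self._session_model:
            # Only a model change requires opening a fresh session.
            self._session_model = self.model
            self.sessions_opened += 1
        return {"model": self.model, **self.options}


state = PooledTTSState(description="warm, gentle female friend")
state.next_request()
state.update_options(description="excited young man, fast and energetic")
print(state.next_request()["description"], state.sessions_opened)
# excited young man, fast and energetic 1
```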

## Latency vs. smoothness

The default aggregation strategy depends on the model:

- **muga** buffers the full LLM reply and synthesizes it in one request, so its leading
  `[tone]` tag conditions the whole utterance (and there are no gaps between sentences
  caused by each request's time to first byte).
- **mulberry** streams sentence-by-sentence for lower time-to-first-word, since it has
  no tone tag to protect.
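The two strategies differ only in when buffered LLM text is released for synthesis. A simplified sketch, assuming a naive regex sentence splitter (a production implementation would use a proper sentence tokenizer); `chunk_for_tts` is a hypothetical name:

```python
import re
from typing import Iterable, Iterator

# Naive boundary: sentence-ending punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_for_tts(llm_deltas: Iterable[str], full_response_aggregation: bool) -> Iterator[str]:
    """Yield text segments to synthesize, one request per segment (hypothetical helper)."""
    buffer = ""
    for delta in llm_deltas:
        buffer += delta
        if full_response_aggregation:
            continue  # hold everything until the reply is complete
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence; the last may still be growing.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()


deltas = ["[happy] Hello th", "ere! How are", " you? Bye."]
print(list(chunk_for_tts(deltas, full_response_aggregation=True)))
# ['[happy] Hello there! How are you? Bye.']
print(list(chunk_for_tts(deltas, full_response_aggregation=False)))
# ['[happy] Hello there!', 'How are you?', 'Bye.']
```

In the per-sentence output only the first segment still carries `[happy]`, which is why muga defaults to full aggregation.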

Override either default with `full_response_aggregation`:

```python
rumik_ai.TTS(model="muga", full_response_aggregation=False, tone="neutral")  # muga, lower latency
rumik_ai.TTS(model="mulberry", full_response_aggregation=True)               # mulberry, smoother
```
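The default-plus-override rule amounts to a one-line lookup. A minimal sketch using the documented defaults; `resolve_full_response_aggregation` is a hypothetical helper, and the assumption is that passing `None` (or omitting the argument) means "use the model's default":

```python
from typing import Optional

# Documented defaults: buffer the whole reply for muga, stream per sentence for mulberry.
_DEFAULT_AGGREGATION = {"muga": True, "mulberry": False}


def resolve_full_response_aggregation(model: str, override: Optional[bool] = None) -> bool:
    """Return the effective aggregation mode (hypothetical helper)."""
    if model not in _DEFAULT_AGGREGATION:
        raise ValueError(f"unknown model {model!r}; expected 'muga' or 'mulberry'")
    # An explicit True/False always wins; None falls back to the model's default.
    return _DEFAULT_AGGREGATION[model] if override is None else override


print(resolve_full_response_aggregation("muga", override=False))  # False
print(resolve_full_response_aggregation("mulberry"))              # False
```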

When you turn aggregation **off for muga**, only the first sentence carries the LLM's
leading tone tag, so set a fallback `tone=` to keep every sentence tagged.
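One way the fallback could work is to prefix each untagged segment with the `tone` value. This is a sketch under that assumption; the plugin might instead send the tone as a separate request field. `ensure_tone` is a hypothetical helper:

```python
import re
from typing import List

# Assumption: a tone tag is one bracketed word at the start of the segment.
_LEADING_TONE = re.compile(r"^\s*\[[A-Za-z_]+\]")


def ensure_tone(segments: List[str], fallback_tone: str) -> List[str]:
    """Prefix every untagged segment with the fallback tone tag (hypothetical helper)."""
    tagged = []
    for segment in segments:
        if _LEADING_TONE.match(segment):
            tagged.append(segment)  # keep the LLM's own tag
        else:
            tagged.append(f"[{fallback_tone}] {segment}")
    return tagged


print(ensure_tone(["[happy] Hello there!", "How are you?"], fallback_tone="neutral"))
# ['[happy] Hello there!', '[neutral] How are you?']
```

An alternative design would carry the reply's first tone forward to later sentences; the README documents only a fixed fallback, so the sketch sticks to that.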

## Configuration

| Argument | Models | Notes |
|---|---|---|
| `model` | both | `"muga"` (default) or `"mulberry"` |
| `tone` | muga | fallback tone when input is untagged |
| `description` | mulberry | natural-language voice description |
| `speaker` | mulberry | preset voice, `speaker_1`..`speaker_4`; overrides `description` |
| `f0_up_key` | mulberry | pitch shift in semitones, `-12`..`12` |
| `temperature`, `top_p`, `top_k`, `repetition_penalty`, `max_new_tokens` | both | omitted unless set (Rumik defaults apply) |
| `full_response_aggregation` | both | buffer the full reply (`True`) vs. stream per sentence (`False`). Default: `True` for muga, `False` for mulberry |
| `api_key` | — | defaults to the `RUMIK_API_KEY` environment variable |
| `base_url` | — | defaults to `https://silk-api.rumik.ai` |
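"Omitted unless set" means an unset sampling parameter is left out of the request entirely rather than sent as a placeholder. A minimal sketch of that filtering, assuming `None` marks "not set"; `sampling_fields` is a hypothetical helper:

```python
from typing import Any, Dict, Optional

SAMPLING_PARAMS = ("temperature", "top_p", "top_k", "repetition_penalty", "max_new_tokens")


def sampling_fields(**params: Optional[Any]) -> Dict[str, Any]:
    """Keep only the sampling params the caller actually set (hypothetical helper)."""
    unknown = set(params) - set(SAMPLING_PARAMS)
    if unknown:
        raise TypeError(f"unexpected sampling params: {sorted(unknown)}")
    # None means "not set": drop the key so the server-side Rumik default applies.
    return {name: value for name, value in params.items() if value is not None}


print(sampling_fields(temperature=0.7, top_p=None, max_new_tokens=512))
# {'temperature': 0.7, 'max_new_tokens': 512}
```

Checking `is not None` rather than truthiness keeps legitimate zero values such as `top_k=0`.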

## Examples

See [`examples/`](./examples) for a full voice agent (`rumik_ai_agent.py`) and a
record-to-WAV demo (`rumik_ai_tts.py`).
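The record-to-WAV demo is not reproduced here. As a stdlib-only sketch of the WAV-writing step, the block below writes PCM chunks to disk at the documented 24 kHz rate. Mono 16-bit PCM is an assumption, and the sine-tone generator is a hypothetical stand-in for real synthesized audio, not the contents of `rumik_ai_tts.py`.

```python
import math
import os
import struct
import tempfile
import wave
from typing import Iterable

SAMPLE_RATE = 24_000  # Silk models stream 24 kHz audio
# Assumptions: mono, 16-bit signed little-endian PCM frames.


def write_pcm_wav(path: str, pcm_chunks: Iterable[bytes], sample_rate: int = SAMPLE_RATE) -> None:
    """Write raw PCM chunks, in arrival order, to a mono 16-bit WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        for chunk in pcm_chunks:
            wav.writeframes(chunk)


def sine_chunk(freq_hz: float, seconds: float, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Hypothetical stand-in for a TTS audio chunk: a quiet sine tone as 16-bit PCM."""
    n = int(sample_rate * seconds)
    samples = (int(8000 * math.sin(2 * math.pi * freq_hz * i / sample_rate)) for i in range(n))
    return b"".join(struct.pack("<h", s) for s in samples)


out_path = os.path.join(tempfile.gettempdir(), "rumik_demo.wav")
write_pcm_wav(out_path, [sine_chunk(440, 0.25), sine_chunk(660, 0.25)])
```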

## License

Apache-2.0
