Metadata-Version: 2.4
Name: chiluka
Version: 0.1.0
Summary: Chiluka - A lightweight TTS inference package based on StyleTTS2
Home-page: https://github.com/Seemanth/chiluka
Author: Seemanth
Author-email: Your Name <your.email@example.com>
License: MIT
Project-URL: Homepage, https://github.com/Seemanth/chiluka
Project-URL: Documentation, https://github.com/Seemanth/chiluka#readme
Project-URL: Repository, https://github.com/Seemanth/chiluka
Project-URL: Issues, https://github.com/Seemanth/chiluka/issues
Keywords: tts,text-to-speech,speech-synthesis,styletts2,deep-learning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: torch>=1.13.0
Requires-Dist: torchaudio>=0.13.0
Requires-Dist: transformers>=4.20.0
Requires-Dist: librosa>=0.9.0
Requires-Dist: phonemizer>=3.0.0
Requires-Dist: nltk>=3.7
Requires-Dist: PyYAML>=6.0
Requires-Dist: munch>=2.5.0
Requires-Dist: einops>=0.6.0
Requires-Dist: einops-exts>=0.0.4
Requires-Dist: numpy>=1.21.0
Requires-Dist: scipy>=1.7.0
Provides-Extra: playback
Requires-Dist: pyaudio>=0.2.11; extra == "playback"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: isort>=5.10.0; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# Chiluka

**Chiluka** (చిలుక - Telugu for "parrot") is a self-contained TTS (Text-to-Speech) inference package based on StyleTTS2.

## Features

- Simple, clean API for TTS synthesis
- **Fully self-contained** - all models bundled in the package
- Style transfer from reference audio
- Multi-language support via phonemizer
- No external dependencies on other repos
- **Multiple distribution methods** - HuggingFace Hub, PyTorch Hub, pip install

## Installation

### Option 1: pip install (Recommended)

```bash
pip install chiluka
```

### Option 2: Install from GitHub

```bash
pip install git+https://github.com/Seemanth/chiluka.git
```

### Option 3: From Source

```bash
git clone https://github.com/Seemanth/chiluka.git
cd chiluka
pip install -e .
```

**Note:** The bundled weights are stored with Git LFS. To fetch them when cloning:

```bash
# Install Git LFS first
sudo apt-get install git-lfs  # Ubuntu/Debian
# or: brew install git-lfs    # macOS

git lfs install
git clone https://github.com/Seemanth/chiluka.git  # LFS files download automatically
```

### Install espeak-ng (Required)

```bash
# Ubuntu/Debian
sudo apt-get install espeak-ng

# macOS
brew install espeak-ng
```

## Quick Start

### Method 1: Auto-download from HuggingFace Hub (Recommended)

No need to clone the repo or download weights manually - they download automatically!

```python
from chiluka import Chiluka

# Automatically downloads model weights on first use
tts = Chiluka.from_pretrained()

# Synthesize speech
wav = tts.synthesize(
    text="Hello, this is Chiluka speaking!",
    reference_audio="path/to/reference.wav",
    language="en"
)

# Save to file
tts.save_wav(wav, "output.wav")
```

### Method 2: PyTorch Hub

```python
import torch

# Load directly via torch.hub
tts = torch.hub.load('Seemanth/chiluka', 'chiluka')

# Synthesize
wav = tts.synthesize(
    text="Hello from PyTorch Hub!",
    reference_audio="reference.wav",
    language="en"
)
```

### Method 3: From Specific HuggingFace Repository

```python
from chiluka import Chiluka

# Load from a specific HuggingFace repo
tts = Chiluka.from_pretrained("Seemanth/chiluka-tts")

# Or from a custom/fine-tuned model
tts = Chiluka.from_pretrained("someuser/custom-chiluka-model")
```

### Method 4: Local Weights (if you cloned with Git LFS)

```python
from chiluka import Chiluka

# Uses bundled weights from the cloned repo
tts = Chiluka()

wav = tts.synthesize(
    text="Hello world!",
    reference_audio="reference.wav",
    language="en"
)
```

## Multi-Language Examples

### Telugu

```python
from chiluka import Chiluka

tts = Chiluka.from_pretrained()

wav = tts.synthesize(
    text="నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను",
    reference_audio="path/to/telugu_reference.wav",
    language="te"
)

tts.save_wav(wav, "telugu_output.wav")
```

### Hindi

```python
wav = tts.synthesize(
    text="नमस्ते, मैं चिलुका बोल रहा हूं",
    reference_audio="hindi_reference.wav",
    language="hi"
)
```

## API Reference

### Loading the Model

```python
# Auto-download (recommended for most users)
tts = Chiluka.from_pretrained()

# From specific HuggingFace repo
tts = Chiluka.from_pretrained("username/model-name")

# With options
tts = Chiluka.from_pretrained(
    repo_id="username/model-name",  # HuggingFace repo
    device="cuda",                   # or "cpu"
    force_download=False,            # Set True to re-download even if cached
    token="hf_xxx"                   # For private repos
)

# Local weights (if cloned with Git LFS)
tts = Chiluka(
    config_path=None,       # Uses bundled config
    checkpoint_path=None,   # Uses bundled checkpoint
    pretrained_dir=None,    # Uses bundled pretrained models
    device=None             # Auto-detect cuda/cpu
)
```

### synthesize()

```python
wav = tts.synthesize(
    text="Hello world",           # Text to synthesize
    reference_audio="ref.wav",    # Reference audio for style
    language="en",                # Language code
    alpha=0.3,                    # Acoustic style mixing (0-1)
    beta=0.7,                     # Prosodic style mixing (0-1)
    diffusion_steps=5,            # Quality vs speed tradeoff
    embedding_scale=1.0,          # Classifier-free guidance
    sr=24000                      # Sample rate
)
```

### Other Methods

```python
# Save audio to file
tts.save_wav(wav, "output.wav", sr=24000)

# Play audio (requires pyaudio)
tts.play(wav, sr=24000)

# Get style embedding from audio
style = tts.compute_style("reference.wav", sr=24000)
```
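To make the `save_wav` contract concrete, here is a minimal sketch of what writing mono float samples in `[-1, 1]` as a 16-bit PCM WAV typically involves, using only the Python standard library. The `write_wav` helper below is hypothetical, for illustration only; it is not chiluka's actual implementation.

```python
import struct
import wave

def write_wav(path, samples, sr=24000):
    """Write mono float samples in [-1, 1] as 16-bit PCM WAV
    (a sketch of what a save_wav helper typically does)."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 16-bit samples
        f.setframerate(sr)
        # Scale floats to int16 range, clamping out-of-range values
        ints = [max(-32768, min(32767, int(s * 32767))) for s in samples]
        f.writeframes(struct.pack(f"<{len(ints)}h", *ints))
```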

## Synthesis Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `alpha` | 0.3 | Acoustic style mixing (0=reference only, 1=predicted only) |
| `beta` | 0.7 | Prosodic style mixing (0=reference only, 1=predicted only) |
| `diffusion_steps` | 5 | Diffusion sampling steps (more = better quality, slower) |
| `embedding_scale` | 1.0 | Classifier-free guidance scale |
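
The `alpha`/`beta` convention above is a linear blend between two style vectors. The snippet below illustrates that convention in plain Python; `mix_style` is a hypothetical helper for illustration, not chiluka's internal code.

```python
def mix_style(reference, predicted, weight):
    """Linearly interpolate two style vectors element-wise:
    weight=0 keeps the reference style, weight=1 keeps the predicted style."""
    return [(1 - weight) * r + weight * p for r, p in zip(reference, predicted)]

ref = [1.0, 0.0]
pred = [0.0, 1.0]
print(mix_style(ref, pred, 0.0))  # [1.0, 0.0] -> reference only
print(mix_style(ref, pred, 1.0))  # [0.0, 1.0] -> predicted only
```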

## Supported Languages

Uses [phonemizer](https://github.com/bootphon/phonemizer) with espeak-ng:

| Language | Code |
|----------|------|
| English | `en` |
| English (US) | `en-us` |
| English (UK) | `en-gb` |
| Telugu | `te` |
| Hindi | `hi` |
| Tamil | `ta` |
| Kannada | `kn` |

See the espeak-ng documentation for the full list of supported language codes.

## Hub Utilities

### Clear Cache

```python
from chiluka import clear_cache

# Clear all cached models
clear_cache()

# Clear specific repo cache
clear_cache("username/model-name")
```

### Push Your Own Model to HuggingFace

```python
from chiluka import push_to_hub

push_to_hub(
    local_dir="./my-trained-model",
    repo_id="myusername/my-chiluka-model",
    token="hf_your_token"
)
```

### Get Cache Directory

```python
from chiluka import get_cache_dir

print(get_cache_dir())  # ~/.cache/chiluka
```

## Environment Variables

| Variable | Description |
|----------|-------------|
| `CHILUKA_CACHE` | Custom cache directory (default: `~/.cache/chiluka`) |
| `HF_TOKEN` | HuggingFace API token for private repos |
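
The cache-directory lookup implied by the table can be sketched with the standard library alone. `resolve_cache_dir` is a hypothetical illustration of the expected precedence (`CHILUKA_CACHE` wins, otherwise `~/.cache/chiluka`), not chiluka's actual code.

```python
import os

def resolve_cache_dir():
    """Return CHILUKA_CACHE if set, else the default ~/.cache/chiluka."""
    return os.environ.get(
        "CHILUKA_CACHE",
        os.path.join(os.path.expanduser("~"), ".cache", "chiluka"),
    )

os.environ["CHILUKA_CACHE"] = "/tmp/chiluka-cache"
print(resolve_cache_dir())  # /tmp/chiluka-cache
```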

## Requirements

- Python >= 3.8
- PyTorch >= 1.13.0
- CUDA (recommended for faster inference)
- espeak-ng

## Package Structure

```
chiluka/
├── chiluka/
│   ├── __init__.py
│   ├── inference.py          # Main Chiluka API
│   ├── hub.py                # HuggingFace Hub utilities
│   ├── text_utils.py
│   ├── utils.py
│   ├── configs/
│   ├── checkpoints/
│   ├── pretrained/
│   └── models/
├── hubconf.py                # PyTorch Hub config
├── examples/
├── setup.py
└── README.md
```

## Training Your Own Model

This package is for **inference only**. To train your own model, use the original [StyleTTS2](https://github.com/yl4579/StyleTTS2) repository.

After training:
1. Copy your checkpoint to a directory
2. Push to HuggingFace Hub using `push_to_hub()`
3. Load with `Chiluka.from_pretrained("your-repo")`

## Credits

Based on [StyleTTS2](https://github.com/yl4579/StyleTTS2) by Yinghao Aaron Li et al.

## License

MIT License
