Metadata-Version: 2.4
Name: py-speech-gen
Version: 0.1.0
Summary: A Python library for generating synthetic speech datasets using TTS providers.
Project-URL: Homepage, https://github.com/09kz/py-speech-gen
Project-URL: Repository, https://github.com/09kz/py-speech-gen
Project-URL: Issues, https://github.com/09kz/py-speech-gen/issues
Author-email: "k.zydek" <k.zydek@aol.com>
License: MIT
License-File: LICENSE
Keywords: audio,dataset-generation,elevenlabs,piper-tts,speech,synthetic-data,tts
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: dotenv>=0.9.9
Requires-Dist: langdetect>=1.0.9
Requires-Dist: librosa>=0.11.0
Requires-Dist: num2words>=0.5.14
Requires-Dist: numpy>=2.0.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: pyloudnorm>=0.2.0
Requires-Dist: soundfile>=0.13.1
Requires-Dist: syntok>=1.4.4
Requires-Dist: tqdm>=4.67.0
Provides-Extra: all
Requires-Dist: elevenlabs>=2.42.0; extra == 'all'
Requires-Dist: onnxruntime-gpu>=1.24.4; extra == 'all'
Requires-Dist: piper-tts>=1.4.2; extra == 'all'
Requires-Dist: silero-vad>=6.2.1; extra == 'all'
Provides-Extra: dev
Requires-Dist: ipykernel>=7.2.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Provides-Extra: elevenlabs
Requires-Dist: elevenlabs>=2.42.0; extra == 'elevenlabs'
Provides-Extra: piper
Requires-Dist: onnxruntime-gpu>=1.24.4; extra == 'piper'
Requires-Dist: piper-tts>=1.4.2; extra == 'piper'
Provides-Extra: vad
Requires-Dist: silero-vad>=6.2.1; extra == 'vad'
Description-Content-Type: text/markdown

# py-speech-gen

[![PyPI version](https://img.shields.io/pypi/v/py-speech-gen.svg)](https://pypi.org/project/py-speech-gen/)
[![Python versions](https://img.shields.io/pypi/pyversions/py-speech-gen.svg)](https://pypi.org/project/py-speech-gen/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)

A Python library for generating synthetic speech datasets using TTS providers. Supports [ElevenLabs](https://elevenlabs.io) and [Piper TTS](https://github.com/rhasspy/piper) out of the box, with an extensible provider system for adding custom backends.

## Features

- **Multi-provider support** — ElevenLabs, Piper TTS, or your own custom provider
- **Text preprocessing** — cleaning, normalization, number-to-words, sentence segmentation
- **Parameter randomization** — per-sample variation for voice diversity
- **Background noise injection** — 8 synthetic noise types (white, pink, brown, traffic, cafe, home, crowd, mic)
- **Flexible output formats** — WAV, MP3, FLAC at configurable sample rates
- **Reproducible generation** — export/load configs for deterministic datasets
- **Export options** — JSON, CSV, pandas DataFrame

## Installation

### Basic install (core features only)

```bash
pip install py-speech-gen
```

The base installation includes text processing, dataset management, randomization, and noise mixing.

### Install with TTS providers

```bash
# Piper TTS (local, offline)
pip install "py-speech-gen[piper]"

# ElevenLabs (cloud API)
pip install "py-speech-gen[elevenlabs]"

# All providers
pip install "py-speech-gen[all]"
```

### Requirements

- Python 3.11+
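- For Piper TTS: ONNX Runtime (GPU or CPU variant; the `piper` and `all` extras install `onnxruntime-gpu`)
- For ElevenLabs: a valid API key (Quick Start reads it from `ELEVENLABS_API_KEY`)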
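
Because the TTS backends are optional extras, it can help to check which ones are importable before constructing a provider. The sketch below uses only the standard library. The import names (`elevenlabs`, `piper`, `onnxruntime`, `silero_vad`) are assumptions about what each distribution installs, and `available_extras` is a hypothetical helper, not part of py-speech-gen.

```python
import importlib.util

# Assumed mapping from extra name to the top-level modules it installs.
EXTRA_MODULES = {
    "elevenlabs": ["elevenlabs"],
    "piper": ["piper", "onnxruntime"],
    "vad": ["silero_vad"],
}


def is_importable(module_name: str) -> bool:
    """Return True if the top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def available_extras(extra_modules=EXTRA_MODULES) -> dict[str, bool]:
    """Report, per extra, whether all of its modules are importable."""
    return {
        extra: all(is_importable(name) for name in names)
        for extra, names in extra_modules.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra}: {'installed' if ok else 'missing'}")
```

An extra reported as missing can be added with the matching `pip install "py-speech-gen[...]"` command above.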

## Quick Start

```python
import os

from py_speech_gen import (
    ElevenLabsProvider,
    PiperProvider,
    DatasetGenerator,
)

# Setup providers
elevenlabs = ElevenLabsProvider(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_ids=["voice_id_1", "voice_id_2"],
    model="eleven_flash_v2_5",
)

piper = PiperProvider(
    models_path="./models",
    voice_ids=["en_US-lessac-medium"],
)

# Generate dataset
generator = DatasetGenerator(
    providers=[elevenlabs, piper],
    output_dir="./output",
)

dataset = generator.generate_dataset(
    texts=["Hello world.", "This is a test."],
    dataset_name="my_speech_dataset",
    process_texts=True,
    texts_lang="en",
)

# Export results
dataset.save()        # JSON
dataset.save_csv()    # CSV
print(f"Generated {len(dataset)} samples")
```

## Components

| Component | Description |
|---|---|
| **Providers** | TTS backends (`ElevenLabsProvider`, `PiperProvider`) implementing `BaseProvider` |
| **DatasetGenerator** | Orchestrator that manages generation across multiple providers |
| **Dataset** | Data model with save/load/export (JSON, CSV, pandas) |
| **TextProcessor** | Text cleaning, normalization, number-to-words, sentence segmentation |
| **Randomizer** | Per-sample parameter randomization for voice diversity |
| **NoiseMixer** | Background noise injection for realistic conditions |

## Randomizer

Adds random variation to provider parameters per sample for dataset diversity.

```python
from py_speech_gen import DatasetGenerator, Randomizer

# Use a preset: "subtle", "moderate", "extreme"
randomizer = Randomizer.preset("moderate", seed=42)

# `piper` and `elevenlabs` are the provider instances from Quick Start
generator = DatasetGenerator(
    providers=[piper, elevenlabs],
    output_dir="./output",
    randomizer=randomizer,
)
```

### Custom Randomizer Config

```python
randomizer = Randomizer(
    config={
        "global": {"speed": (0.8, 1.2)},
        "elevenlabs": {
            "stability": (0.3, 0.7),
            "similarity_boost": (0.5, 0.9),
            "style": (0.0, 0.3),
            "use_speaker_boost": [True, False],
        },
        "piper": {
            "length_scale": (0.8, 1.3),
            "noise_scale": (0.5, 0.8),
            "noise_w_scale": (0.6, 1.0),
        },
    },
    seed=42,
)

# Export/load for reproducibility
randomizer.export("randomizer_config.json")
r = Randomizer.load("randomizer_config.json")
```
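
One way to read the config above is that a `(low, high)` tuple is a range to sample from and a list is a set of discrete choices. The sketch below illustrates that reading with the standard library only. `sample_params` is a hypothetical helper, and the sampling rule (uniform over ranges, uniform choice over lists, `"global"` merged into every provider) is an assumption, not the library's documented behavior.

```python
import random


def sample_params(config: dict, provider: str, rng: random.Random) -> dict:
    """Draw one parameter set for `provider` from range/choice specs.

    Assumed semantics: tuples are (low, high) ranges sampled uniformly,
    lists are discrete choices, and "global" entries apply to every provider.
    """
    specs = {**config.get("global", {}), **config.get(provider, {})}
    params = {}
    for name, spec in specs.items():
        if isinstance(spec, tuple):
            low, high = spec
            params[name] = rng.uniform(low, high)
        elif isinstance(spec, list):
            params[name] = rng.choice(spec)
        else:
            params[name] = spec  # fixed value, passed through unchanged
    return params


config = {
    "global": {"speed": (0.8, 1.2)},
    "piper": {"length_scale": (0.8, 1.3), "noise_scale": (0.5, 0.8)},
    "elevenlabs": {"use_speaker_boost": [True, False]},
}

# The same seed yields the same draws, which is what makes a run repeatable.
first = sample_params(config, "piper", random.Random(42))
second = sample_params(config, "piper", random.Random(42))
```

Passing a seeded `random.Random` instance rather than using the module-level functions keeps the draws independent of any other code that touches the global random state.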

## NoiseMixer

Adds background noise to generated audio for realistic conditions.

```python
from py_speech_gen import DatasetGenerator, NoiseMixer

mixer = NoiseMixer(
    noise_types=["white", "traffic", "cafe"],  # or "all", "synthetic"
    snr_db=20,
    random_snr=True,
    snr_range=(15, 30),
    skip_prob=0.2,  # each sample has a 20% chance of getting no noise
    seed=42,
)

# `piper` and `elevenlabs` are the provider instances from Quick Start
generator = DatasetGenerator(
    providers=[piper, elevenlabs],
    output_dir="./output",
    noise_mixer=mixer,
)
```
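
The `snr_db` setting refers to the signal-to-noise ratio in decibels, `10 * log10(P_signal / P_noise)`. A minimal sketch of mixing at a target SNR follows, assuming mean-square power and plain Python lists. `mix_at_snr` is a hypothetical helper, not the `NoiseMixer` implementation, and a production version would more likely operate on NumPy arrays.

```python
import math
import random


def mean_power(samples: list[float]) -> float:
    """Mean-square power of a signal."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech: list[float], noise: list[float], snr_db: float) -> list[float]:
    """Scale `noise` so speech power / noise power matches `snr_db`, then add."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        return list(speech)  # silent noise: nothing to mix
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


rng = random.Random(42)
sample_rate = 16_000
# One second of a 220 Hz tone stands in for speech.
speech = [0.5 * math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

mixed = mix_at_snr(speech, noise, snr_db=20)

# Recover the noise component and measure the SNR that was actually achieved.
residual = [m - s for m, s in zip(mixed, speech)]
achieved_snr_db = 10 * math.log10(mean_power(speech) / mean_power(residual))
```

Lower `snr_db` values mean louder noise relative to the speech, so a range such as `(15, 30)` spans clearly audible to fairly quiet backgrounds.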

### Noise Types

| Type | Source | Description |
|---|---|---|
| `white` | Generated | White noise |
| `pink` | Generated | Pink noise (1/f) |
| `brown` | Generated | Brown noise (deep bass) |
| `traffic` | Generated | Filtered noise with amplitude modulation |
| `cafe` | Generated | Bandpass noise with random bursts |
| `home` | Generated | 50/60 Hz hum + random clicks |
| `crowd` | Generated | Bandpass noise with rhythmic modulation |
| `mic` | Generated | High-pass hiss + occasional pops |
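
For intuition about the generated types, brown noise can be produced by integrating white noise, which concentrates energy at low frequencies. The sketch below shows that idea with the standard library. The function names and the peak normalization are assumptions for illustration, not how `NoiseMixer` builds its noise.

```python
import random


def white_noise(n: int, rng: random.Random) -> list[float]:
    """Independent Gaussian samples: a flat spectrum on average."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def brown_noise(n: int, rng: random.Random) -> list[float]:
    """Running sum of white noise (a random walk) with the mean removed."""
    total = 0.0
    walk = []
    for step in white_noise(n, rng):
        total += step
        walk.append(total)
    mean = sum(walk) / n
    return [x - mean for x in walk]


def normalize_peak(samples: list[float], peak: float = 1.0) -> list[float]:
    """Scale so the largest absolute sample equals `peak`."""
    largest = max(abs(x) for x in samples)
    if largest == 0.0:
        return list(samples)
    return [x * peak / largest for x in samples]


def mean_abs_step(samples: list[float]) -> float:
    """Average absolute change between neighbours: a rough high-frequency measure."""
    return sum(abs(b - a) for a, b in zip(samples, samples[1:])) / (len(samples) - 1)


rng = random.Random(42)
white = normalize_peak(white_noise(16_000, rng))
brown = normalize_peak(brown_noise(16_000, rng))
```

Comparing `mean_abs_step(brown)` with `mean_abs_step(white)` shows the difference in character: the brown signal changes far more slowly from sample to sample.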

## Configuration

Copy `.env.example` to `.env` and set your API keys, for example `ELEVENLABS_API_KEY` as read in Quick Start:

```bash
cp .env.example .env
```
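
Since the package depends on `dotenv`, a script can load `.env` before building providers. A minimal sketch follows, assuming the key is stored as `ELEVENLABS_API_KEY` (the name used in Quick Start) and that `load_dotenv` from python-dotenv is importable. `require_env` and `load_api_key` are hypothetical helpers, not part of py-speech-gen.

```python
import os

try:
    # python-dotenv; treated as optional so the sketch also runs without it.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def require_env(name: str) -> str:
    """Return the environment variable `name`, or raise if it is missing or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or export it")
    return value


def load_api_key(name: str = "ELEVENLABS_API_KEY") -> str:
    """Load .env (when python-dotenv is available), then read the key."""
    if load_dotenv is not None:
        load_dotenv()  # reads KEY=value pairs from a .env file, if one is found
    return require_env(name)
```

Failing early with a clear message avoids passing `None` as `api_key`, which is what `os.getenv` returns when the variable is unset.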

## Full Pipeline Example

```python
from py_speech_gen import (
    DatasetGenerator, Randomizer, NoiseMixer,
)

# `piper` and `elevenlabs` are the provider instances from Quick Start
generator = DatasetGenerator(
    providers=[piper, elevenlabs],
    output_dir="./output",
    randomizer=Randomizer.preset("moderate", seed=42),
    noise_mixer=NoiseMixer(
        noise_types=["white", "traffic", "cafe"],
        snr_range=(15, 25),
        skip_prob=0.2,
        seed=42,
    ),
)

dataset = generator.generate_dataset(
    texts=["Sentence 1.", "Sentence 2."],
    dataset_name="demo",
    process_texts=True,
    texts_lang="en",
)

# Export full generation config for reproducibility
generator.export_config("./output/generation_config.json")

# Load config to reproduce
generator2 = DatasetGenerator.load_config(
    "./output/generation_config.json",
    providers=[piper, elevenlabs],
)
```

## Custom Providers

Create your own TTS provider by extending `BaseProvider`. See the [Custom Provider Tutorial](docs/custom_provider_tutorial.md).

## Running Tests

```bash
pip install "py-speech-gen[dev]"
# Run from a checkout of the repository, where tests/ lives
pytest tests/ -v
```
