Metadata-Version: 2.4
Name: py-speech-gen
Version: 0.2.0
Summary: A Python library for generating synthetic speech datasets using TTS providers.
Project-URL: Homepage, https://github.com/09kz/py-speech-gen
Project-URL: Repository, https://github.com/09kz/py-speech-gen
Project-URL: Issues, https://github.com/09kz/py-speech-gen/issues
Author-email: Kacper Żydek <k.zydek@aol.com>
License: MIT
License-File: LICENSE
Keywords: audio,dataset-generation,elevenlabs,piper-tts,speech,synthetic-data,tts
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: dotenv>=0.9.9
Requires-Dist: langdetect>=1.0.9
Requires-Dist: librosa>=0.11.0
Requires-Dist: matplotlib>=3.10.9
Requires-Dist: num2words>=0.5.14
Requires-Dist: numpy>=2.0.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: pyloudnorm>=0.2.0
Requires-Dist: seaborn>=0.13.2
Requires-Dist: soundfile>=0.13.1
Requires-Dist: syntok>=1.4.4
Requires-Dist: tqdm>=4.67.0
Provides-Extra: all
Requires-Dist: elevenlabs>=2.42.0; extra == 'all'
Requires-Dist: onnxruntime-gpu>=1.24.4; extra == 'all'
Requires-Dist: piper-tts>=1.4.2; extra == 'all'
Provides-Extra: dev
Requires-Dist: ipykernel>=7.2.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Provides-Extra: elevenlabs
Requires-Dist: elevenlabs>=2.42.0; extra == 'elevenlabs'
Provides-Extra: googlecloud
Requires-Dist: google-cloud-texttospeech>=2.26.0; extra == 'googlecloud'
Provides-Extra: notebook
Requires-Dist: elevenlabs>=2.42.0; extra == 'notebook'
Requires-Dist: ipywidgets>=7.0.0; extra == 'notebook'
Requires-Dist: onnxruntime-gpu>=1.24.4; extra == 'notebook'
Requires-Dist: piper-tts>=1.4.2; extra == 'notebook'
Provides-Extra: piper
Requires-Dist: onnxruntime-gpu>=1.24.4; extra == 'piper'
Requires-Dist: piper-tts>=1.4.2; extra == 'piper'
Description-Content-Type: text/markdown

# py-speech-gen

[![PyPI version](https://img.shields.io/pypi/v/py-speech-gen.svg)](https://pypi.org/project/py-speech-gen/)
[![Python versions](https://img.shields.io/pypi/pyversions/py-speech-gen.svg)](https://pypi.org/project/py-speech-gen/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)

A Python library for generating synthetic speech datasets using TTS providers. Supports [ElevenLabs](https://elevenlabs.io), [Piper TTS](https://github.com/OHF-Voice/piper1-gpl), and [Google Cloud TTS](https://cloud.google.com/text-to-speech) out of the box, with an extensible provider system for adding custom backends.

## Features

- **Multi-provider support** — ElevenLabs, Piper TTS, Google Cloud TTS, or your own custom provider
- **Text preprocessing** — cleaning, normalization, number-to-words, sentence segmentation
- **Parameter randomization** — per-sample variation for voice diversity
- **Background noise injection** — 8 synthetic noise types (white, pink, brown, traffic, cafe, home, crowd, mic)
- **Flexible output formats** — WAV, MP3, FLAC at configurable sample rates
- **Reproducible generation** — export/load configs for deterministic datasets
- **Export options** — JSON, CSV, pandas DataFrame
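
Parameter randomization and reproducible generation fit together: if every sample's parameters come from a seeded generator and the seed is stored with the config, reloading that config reproduces the same draws. The sketch below illustrates the idea using only the standard library. `GenerationConfig`, `sample_params`, `dump_config`, `load_config`, and the parameter ranges are hypothetical names chosen for illustration and are not the py-speech-gen API; the Usage Guide documents the real interface.

```python
import json
import random
from dataclasses import asdict, dataclass


@dataclass
class GenerationConfig:
    """Hypothetical config; field names are illustrative only."""

    seed: int = 42
    speed_range: tuple[float, float] = (0.9, 1.1)
    noise_types: tuple[str, ...] = ("white", "pink", "cafe")


def sample_params(config: GenerationConfig, n_samples: int) -> list[dict]:
    """Draw per-sample parameters from a generator seeded by the config."""
    rng = random.Random(config.seed)  # local RNG, global state is untouched
    low, high = config.speed_range
    return [
        {
            "speed": round(rng.uniform(low, high), 3),
            "noise": rng.choice(config.noise_types),
        }
        for _ in range(n_samples)
    ]


def dump_config(config: GenerationConfig) -> str:
    """Serialize the config to JSON so a run can be repeated later."""
    return json.dumps(asdict(config))


def load_config(text: str) -> GenerationConfig:
    """Rebuild a config from JSON; JSON lists are converted back to tuples."""
    raw = json.loads(text)
    return GenerationConfig(
        seed=raw["seed"],
        speed_range=tuple(raw["speed_range"]),
        noise_types=tuple(raw["noise_types"]),
    )
```

Round-tripping a config through `dump_config` and `load_config` and sampling again yields identical parameters, which is the property a deterministic dataset depends on.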

## Installation

### Basic install (core features only)

```bash
pip install py-speech-gen
```

The base installation includes text processing, dataset management, randomization, and noise mixing. TTS providers are optional extras and are not installed by default.

### Install with TTS providers

```bash
# Piper TTS (local, offline)
pip install "py-speech-gen[piper]"

# ElevenLabs (cloud API)
pip install "py-speech-gen[elevenlabs]"

# Google Cloud TTS (cloud API)
pip install "py-speech-gen[googlecloud]"

# All providers
pip install "py-speech-gen[all]"
```
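
After installing, it can be useful to confirm which provider extras are actually available in the current environment. The following is a minimal sketch using only the standard library. The import names in `PROVIDER_MODULES` are assumptions about how each distribution is imported (`piper` for `piper-tts`, `elevenlabs`, and `google.cloud.texttospeech` for `google-cloud-texttospeech`), and the helper names are hypothetical rather than part of py-speech-gen.

```python
from importlib.util import find_spec

# Assumed import names for each optional extra (not taken from py-speech-gen).
PROVIDER_MODULES = {
    "piper": "piper",
    "elevenlabs": "elevenlabs",
    "googlecloud": "google.cloud.texttospeech",
}


def is_importable(module_name: str) -> bool:
    """Return True if the import system can locate the module."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # For dotted names find_spec imports the parent packages first and
        # raises ModuleNotFoundError when a parent is missing.
        return False


def available_extras() -> dict[str, bool]:
    """Map each extra name to whether its backing module is installed."""
    return {extra: is_importable(mod) for extra, mod in PROVIDER_MODULES.items()}


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra}: {'installed' if ok else 'missing'}")
```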

### Requirements

- Python 3.11+
- For Piper TTS: ONNX Runtime, GPU or CPU variant (the `piper` extra installs `onnxruntime-gpu`)
- For ElevenLabs: a valid API key
- For Google Cloud TTS: Google Cloud credentials and the Text-to-Speech API enabled for your project
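
A quick pre-flight check can catch a missing key or an unsupported interpreter before a long generation run. The sketch below is one way to do that with the standard library. The environment variable names are assumptions: `GOOGLE_APPLICATION_CREDENTIALS` is the variable Google client libraries conventionally read, and `ELEVENLABS_API_KEY` is assumed here for ElevenLabs; how py-speech-gen itself reads credentials is covered in the provider docs. `check_requirements` is a hypothetical helper, not a library function.

```python
import os
import sys
from collections.abc import Mapping

MIN_PYTHON = (3, 11)

# Environment variable names are assumptions for this sketch.
CREDENTIAL_VARS = {
    "elevenlabs": "ELEVENLABS_API_KEY",
    "googlecloud": "GOOGLE_APPLICATION_CREDENTIALS",
}


def check_requirements(
    env: Mapping[str, str] = os.environ,
    version: tuple[int, ...] = tuple(sys.version_info[:2]),
) -> list[str]:
    """Return human-readable problems; an empty list means all checks pass."""
    problems = []
    if version < MIN_PYTHON:
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required")
    for provider, var in CREDENTIAL_VARS.items():
        if not env.get(var):
            problems.append(f"{provider}: environment variable {var} is not set")
    return problems
```

Google client libraries can also discover credentials by other means, so a missing variable is a hint to investigate rather than a definite error.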

## Documentation

- **[Usage Guide](docs/usage.md)** — Quick start, examples, presets, and detailed API reference
- **[Create a Provider](docs/create-provider.md)** — Step-by-step guide to adding custom TTS providers
- **Provider Documentation** — Each provider has its own docs:
  - [Piper TTS](docs/providers/piper.md) — local, offline TTS
  - [ElevenLabs](docs/providers/elevenlabs.md) — cloud API with natural voices
  - [Google Cloud TTS](docs/providers/googlecloud.md) — cloud API with 40+ languages

## Key Components

| Component | Description |
|---|---|
| **Providers** | TTS backends (`PiperProvider`, `ElevenLabsProvider`, `GoogleCloudProvider`) implementing `BaseProvider` |
| **DatasetGenerator** | Orchestrator that manages generation across multiple providers |
| **Dataset** | Data model with save/load/export (JSON, CSV, pandas) |
| **TextProcessor** | Text cleaning, normalization, number-to-words, sentence segmentation |
| **Randomizer** | Per-sample parameter randomization for voice diversity |
| **NoiseMixer** | Background noise injection for realistic conditions |
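
To make the provider/orchestrator split concrete, here is a toy sketch of the pattern: an abstract provider interface, one backend implementing it, and a loop that runs texts across providers. It does not use or reproduce the library's `BaseProvider` or `DatasetGenerator`. `SketchBaseProvider`, `ToneProvider`, `synthesize`, and `generate` are hypothetical names, and the "speech" is just a sine tone written as WAV with the standard library. See the Create a Provider guide for the real interface.

```python
import io
import math
import struct
import wave
from abc import ABC, abstractmethod


class SketchBaseProvider(ABC):
    """Hypothetical stand-in for a provider interface (not the real BaseProvider)."""

    sample_rate: int = 16_000

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        """Return mono 16-bit WAV bytes for the given text."""


class ToneProvider(SketchBaseProvider):
    """Toy backend: emits a 440 Hz tone whose length scales with the text."""

    def synthesize(self, text: str) -> bytes:
        # 1/20 of a second of audio per character, at least one character.
        n_frames = self.sample_rate // 20 * max(len(text), 1)
        samples = (
            int(12_000 * math.sin(2 * math.pi * 440 * i / self.sample_rate))
            for i in range(n_frames)
        )
        frames = b"".join(struct.pack("<h", s) for s in samples)
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)  # 16-bit samples
            wav.setframerate(self.sample_rate)
            wav.writeframes(frames)
        return buffer.getvalue()


def generate(providers: dict[str, SketchBaseProvider], texts: list[str]) -> list[dict]:
    """Run every text through every provider and collect simple metadata rows."""
    rows = []
    for name, provider in providers.items():
        for text in texts:
            audio = provider.synthesize(text)
            rows.append({"provider": name, "text": text, "n_bytes": len(audio)})
    return rows
```

Calling `generate({"tone": ToneProvider()}, ["hi", "hello"])` returns one metadata row per provider and text; swapping in a different backend only requires another subclass with its own `synthesize`.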

## Running Tests

```bash
pip install "py-speech-gen[dev]"
pytest tests/ -v
```

## License

MIT License. See the [LICENSE](LICENSE) file for details.
