Metadata-Version: 2.4
Name: ebook-reader-supertonic
Version: 1.2.2
Summary: A high-quality Flow-Matching based Text-to-Speech library using ONNX.
Author-email: Izzet Sezer <sezer@imsezer.com>
Project-URL: Homepage, https://github.com/sezer-muhammed/ReaderAudioEngine
Project-URL: Bug Tracker, https://github.com/sezer-muhammed/ReaderAudioEngine/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: onnxruntime
Requires-Dist: scipy
Requires-Dist: huggingface-hub
Requires-Dist: pydantic
Requires-Dist: langid

# ebook-reader-supertonic

A high-quality flow-matching-based text-to-speech library that runs on ONNX Runtime. This is a Python port of the Supertonic-2 web implementation.

## Features
- **10 Unique Voice Styles**: Professional male and female voices.
- **Auto-Downloader**: Automatically fetches models from HuggingFace to a global cache (`~/.cache/ebook_reader_supertonic`).
- **Word Timestamps**: Estimates precise word-level start and end times.
- **Adjustable Parameters**: Control speed (0.9 - 1.4) and diffusion steps (3 - 14).
- **Lightweight Inference**: Runs on CPU/GPU via ONNX Runtime.
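
The speed and step ranges listed above can be enforced before calling the engine. A minimal sketch; `clamp_params` is an illustrative helper, not part of the library:

```python
# Documented valid ranges for synthesis parameters.
MIN_SPEED, MAX_SPEED = 0.9, 1.4
MIN_STEPS, MAX_STEPS = 3, 14

def clamp_params(speed: float, steps: int) -> tuple:
    """Clip user-supplied speed and diffusion steps into the supported ranges."""
    speed = min(max(speed, MIN_SPEED), MAX_SPEED)
    steps = min(max(steps, MIN_STEPS), MAX_STEPS)
    return speed, steps
```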

## Installation

```bash
pip install ebook-reader-supertonic
```

## Quick Start

```python
from ebook_reader_supertonic import SupertonicTTS, VOICE_STYLES, MIN_SPEED, MAX_SPEED

# 1. Initialize engine
# Models are automatically cached in ~/.cache/ebook_reader_supertonic
engine = SupertonicTTS()

# 2. Synthesize
# Returns:
# - audio: np.ndarray (float32, normalized -1 to 1)
# - sample_rate: int (44100)
# - word_timestamps: List[Dict] -> [{'word': str, 'start': float, 'end': float}]
audio, sr, word_timestamps = engine.synthesize(
    text="Hello! Welcome to ebook-reader-supertonic.",
    voice='F5',
    speed=1.0,
    steps=10
)
# 3. Calculate Total Duration
duration = len(audio) / sr
print(f"Generated {duration:.2f}s of audio")

# 4. Access Word Timing
for segment in word_timestamps:
    print(f"{segment['word']}: {segment['start']}s -> {segment['end']}s")

# 5. Save to file
engine.save_wav(audio, "output.wav")
```
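
For ebook use cases, the `word_timestamps` structure maps directly onto subtitle formats. A sketch that converts it into SRT entries; `format_srt` is a hypothetical helper built only on the documented `{'word', 'start', 'end'}` keys, not part of the library:

```python
def _srt_time(seconds: float) -> str:
    """Render seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def format_srt(word_timestamps) -> str:
    """Build one SRT cue per word from synthesize()'s timestamp list."""
    cues = []
    for i, seg in enumerate(word_timestamps, start=1):
        cues.append(
            f"{i}\n{_srt_time(seg['start'])} --> {_srt_time(seg['end'])}\n{seg['word']}\n"
        )
    return "\n".join(cues)
```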

## API Reference

### `SupertonicTTS.synthesize(text, voice='M3', steps=10, speed=1.0, lang=None)`
- **Parameters**:
  - `text` (str): Text to synthesize.
  - `voice` (str): Voice ID (`F1-F5`, `M1-M5`).
  - `steps` (int): Diffusion steps (`MIN_STEPS=3` to `MAX_STEPS=14`).
  - `speed` (float): Speed factor (`MIN_SPEED=0.9` to `MAX_SPEED=1.4`).
  - `lang` (str): Manual language override (e.g., 'en', 'ko'). Auto-detects if None.
- **Returns**: `(audio_data, sample_rate, word_timestamps)`
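
Since the returned audio is documented as normalized float32, it can also be written to disk without `save_wav`. A sketch using only NumPy and the standard-library `wave` module; `write_wav` is an illustrative stand-in, not the library's implementation:

```python
import wave

import numpy as np

def write_wav(path: str, audio: np.ndarray, sample_rate: int = 44100) -> None:
    """Write normalized float32 mono audio to a 16-bit PCM WAV file."""
    # Clip to the documented [-1, 1] range, then scale to int16.
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
```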

### `VOICE_STYLES`
A list of Pydantic models containing voice metadata:
```python
voice = VOICE_STYLES[0]
print(voice.id)          # 'F1'
print(voice.gender)      # 'female'
print(voice.description) # 'Correct and natural...'
```
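
Because each entry exposes `id`, `gender`, and `description` attributes, voice selection is a simple filter. A simplified sketch using a plain dataclass as a stand-in for the library's Pydantic models; the IDs and descriptions below are examples, not the actual voice metadata:

```python
from dataclasses import dataclass

@dataclass
class VoiceStyle:
    """Minimal stand-in for the library's Pydantic voice model."""
    id: str
    gender: str
    description: str

# Example entries mirroring the documented F1-F5 / M1-M5 naming scheme.
styles = [
    VoiceStyle("F1", "female", "Warm narration"),
    VoiceStyle("M1", "male", "Deep narration"),
]

# Pick only the female voices by attribute.
female = [v for v in styles if v.gender == "female"]
```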

## Author
Izzet Sezer <sezer@imsezer.com>
