Metadata-Version: 2.4
Name: ebook-reader-supertonic
Version: 1.3.3
Summary: A high-quality Flow-Matching based Text-to-Speech library using ONNX.
Author-email: Izzet Sezer <sezer@imsezer.com>
Project-URL: Homepage, https://github.com/sezer-muhammed/ReaderAudioEngine
Project-URL: Bug Tracker, https://github.com/sezer-muhammed/ReaderAudioEngine/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: onnxruntime; sys_platform != "win32" or platform_machine != "AMD64"
Requires-Dist: onnxruntime-gpu; sys_platform == "win32" and platform_machine == "AMD64"
Requires-Dist: scipy
Requires-Dist: huggingface-hub
Requires-Dist: pydantic
Requires-Dist: langid
Provides-Extra: timestamps
Requires-Dist: vosk; extra == "timestamps"
Provides-Extra: gpu-timestamps
Requires-Dist: faster-whisper; extra == "gpu-timestamps"

# ebook-reader-supertonic

A high-quality Flow-Matching based Text-to-Speech library using ONNX. This is a Python port of the Supertonic-2 web implementation.

## Features
- **10 Unique Voice Styles**: Professional male and female voices.
- **Auto-Downloader**: Automatically fetches models from HuggingFace to a global cache (`~/.cache/ebook_reader_supertonic`).
- **Word Timestamps**: Heuristic estimation by default, with optional Vosk-based extraction (offline ASR) for better word timing.
- **Adjustable Parameters**: Control speed (0.9 - 1.4) and diffusion steps (3 - 14).
- **Lightweight Inference**: Runs on CPU/GPU via ONNX Runtime.

## Installation

```bash
pip install ebook-reader-supertonic
```
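
The package metadata also defines optional extras for word-timestamp extraction; install whichever backend you need:

```bash
# Vosk-based (offline ASR) word timestamps
pip install "ebook-reader-supertonic[timestamps]"

# faster-whisper based word timestamps (GPU-oriented)
pip install "ebook-reader-supertonic[gpu-timestamps]"
```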

## Quick Start

```python
from ebook_reader_supertonic import SupertonicTTS, VOICE_STYLES, MIN_SPEED, MAX_SPEED

# 1. Initialize engine
# Models are automatically cached in ~/.cache/ebook_reader_supertonic
engine = SupertonicTTS()

# 2. Synthesize
# Returns:
# - audio: np.ndarray (float32, normalized -1 to 1)
# - sample_rate: int (44100)
# - word_timestamps: List[Dict] -> [{'word': str, 'start': float, 'end': float}]
audio, sr, word_timestamps = engine.synthesize(
    text="Hello! Welcome to ebook-reader-supertonic.", 
    voice='F5', 
    speed=1.0, 
    steps=10,
    # timestamps_backend="auto",  # 'estimate' (default), 'vosk', or 'auto'
    # vosk_model_path="path/to/vosk/model",  # or set env VOSK_MODEL_PATH
)
# 3. Calculate Total Duration
duration = len(audio) / sr
print(f"Generated {duration:.2f}s of audio")

# 4. Access Word Timing
for segment in word_timestamps:
    print(f"{segment['word']}: {segment['start']}s -> {segment['end']}s")

# 5. Save to file
engine.save_wav(audio, "output.wav")
```

### Vosk auto-download (optional)
If you use `timestamps_backend="auto"` or `"vosk"` and no model path is configured, the package can auto-download the pinned model `vosk-model-en-us-0.22-lgraph` into `~/.cache/vosk`.

Environment variables:
- `VOSK_MODEL_PATH`: use an existing local model directory (disables download).
- `VOSK_CACHE_DIR`: override cache base (default `~/.cache/vosk`).
- `VOSK_OFFLINE=1`: forbid downloads (error for `"vosk"`, fallback to estimate for `"auto"`).
- `EBOOK_READER_VOSK_AUTO_DOWNLOAD=0`: disable auto-download behavior.
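
Because the word timestamps are plain dictionaries, they are easy to post-process. As an illustration, the sketch below converts a timestamp list into SRT subtitle entries; the `to_srt` helper is not part of the library, and the sample data is made up in the documented shape:

```python
def to_srt(word_timestamps):
    """Convert [{'word', 'start', 'end'}] dicts into SRT subtitle entries."""
    def fmt(t):
        # SRT timing format is HH:MM:SS,mmm
        ms = int(round(t * 1000))
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1_000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    entries = []
    for i, seg in enumerate(word_timestamps, start=1):
        entries.append(f"{i}\n{fmt(seg['start'])} --> {fmt(seg['end'])}\n{seg['word']}")
    return "\n\n".join(entries)

# Made-up sample in the shape documented above
sample = [
    {'word': 'Hello', 'start': 0.0, 'end': 0.42},
    {'word': 'world', 'start': 0.48, 'end': 0.95},
]
print(to_srt(sample))
```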

## API Reference

### `SupertonicTTS.synthesize(text, voice='M3', steps=10, speed=1.0, lang=None, timestamps_backend='estimate', vosk_model_path=None)`
- **Parameters**:
  - `text` (str): Text to synthesize.
  - `voice` (str): Voice ID (`F1-F5`, `M1-M5`).
  - `steps` (int): Diffusion steps (`MIN_STEPS=3` to `MAX_STEPS=14`).
  - `speed` (float): Speed factor (`MIN_SPEED=0.9` to `MAX_SPEED=1.4`).
  - `lang` (str): Manual language override (e.g., 'en', 'ko'). Auto-detects if None.
  - `timestamps_backend` (str): `'estimate'` (default, heuristic), `'vosk'` (offline ASR), or `'auto'`.
  - `vosk_model_path` (str): Path to a local Vosk model directory; falls back to the `VOSK_MODEL_PATH` environment variable.
- **Returns**: `(audio_data, sample_rate, word_timestamps)`
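
Since `steps` and `speed` are bounded, out-of-range values should be kept within the documented limits before calling `synthesize`. A minimal sketch (the `clamp` helper is not part of the library; the bounds below simply restate the documented constants):

```python
MIN_STEPS, MAX_STEPS = 3, 14
MIN_SPEED, MAX_SPEED = 0.9, 1.4

def clamp(value, lo, hi):
    """Keep a parameter within its documented range."""
    return max(lo, min(hi, value))

steps = clamp(20, MIN_STEPS, MAX_STEPS)   # capped at 14
speed = clamp(0.5, MIN_SPEED, MAX_SPEED)  # raised to 0.9
```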

### `VOICE_STYLES`
A list of Pydantic models containing voice metadata:
```python
voice = VOICE_STYLES[0]
print(voice.id)          # 'F1'
print(voice.gender)      # 'female'
print(voice.description) # 'Correct and natural...'
```
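
For tests or tooling that should not import the package, the voice-metadata shape can be mirrored with a small model of your own. The field names below follow the attributes shown above; note the actual library uses Pydantic models, not this hypothetical dataclass:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VoiceStyle:
    # Mirrors the attributes documented above
    id: str
    gender: str
    description: str

voice = VoiceStyle(id='F1', gender='female', description='Correct and natural...')
print(voice.id, voice.gender)
```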

## Author
Izzet Sezer <sezer@imsezer.com>
