Metadata-Version: 2.4
Name: ominix-tts
Version: 0.1.0
Summary: Ominix TTS: A multilingual TTS system
Home-page: https://github.com/cshbli/Ominix-TTS
Author: Hongbing Li
Author-email: Hongbing Li <cshbli@hotmail.com>
License: MIT License
        
        Copyright (c) 2025 Hongbing Li
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Project-URL: Homepage, https://github.com/cshbli/Ominix-TTS
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cn2an
Requires-Dist: fast_langdetect>=0.3.0
Requires-Dist: ffmpeg-python
Requires-Dist: g2p_en
Requires-Dist: gradio<=4.24.0,>=4.0
Requires-Dist: huggingface_hub>=0.13
Requires-Dist: jieba
Requires-Dist: jieba_fast
Requires-Dist: librosa>=0.9.2
Requires-Dist: matplotlib
Requires-Dist: numpy<2.0.0,>=1.23.4
Requires-Dist: peft
Requires-Dist: pypinyin
Requires-Dist: pytorch-lightning>2.0
Requires-Dist: split-lang
Requires-Dist: torchaudio
Requires-Dist: tqdm
Requires-Dist: transformers>=4.43
Requires-Dist: wordsegment
Requires-Dist: x_transformers
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# Ominix-TTS: Advanced Multilingual Text-to-Speech with Voice Cloning

Ominix-TTS is a text-to-speech synthesis framework that converts input text into natural-sounding speech using a two-stage pipeline. It produces audio in multiple languages and supports voice cloning from reference audio.

## Key Features

- **Two-Stage Synthesis Pipeline**: First converts text to semantic tokens, then transforms these tokens into audio waveforms
- **Multilingual Support**: Handles Chinese, English, Japanese, Korean, and Cantonese with both pure and mixed-language modes
- **Voice Cloning**: Replicates voice characteristics from a short reference audio sample
- **Voice Fusion**: Combines multiple reference voices for custom voice creation
- **High-Quality Output**: Produces natural-sounding speech with proper prosody and intonation
- **Configurable Parameters**: Offers control over speed, temperature, and other synthesis qualities

## Language Codes in Ominix-TTS

The table below lists the language codes supported by Ominix-TTS:

| Language Code | Description | Recognition Type |
|---------------|-------------|------------------|
| `"en"`        | Pure English | English only processing |
| `"zh"`        | Mixed Chinese-English | Chinese-English hybrid processing |
| `"all_zh"`    | Pure Chinese | Chinese only processing |
| `"yue"`       | Mixed Cantonese-English | Cantonese-English hybrid processing |
| `"all_yue"`   | Pure Cantonese | Cantonese only processing |
| `"ja"`        | Mixed Japanese-English | Japanese-English hybrid processing |
| `"all_ja"`    | Pure Japanese | Japanese only processing |
| `"ko"`        | Mixed Korean-English | Korean-English hybrid processing |
| `"all_ko"`    | Pure Korean | Korean only processing |
| `"auto"`      | Auto-detect language | Multi-language detection and processing |
| `"auto_yue"`  | Auto-detect with Cantonese support | Multi-language detection including Cantonese |

## Technical Architecture

Ominix-TTS operates through coordinated specialized models:
- **BERT Models**: Extract linguistic features from input text
- **CNHuBERT**: Processes reference audio to capture voice characteristics
- **Text2Semantic Model**: Converts text features into semantic tokens
- **SoVITS Model**: Transforms semantic tokens into audio waveforms

The system supports several model versions (v1, v2, v3), with later versions adding capabilities and language support, so users can trade off quality, speed, and resource requirements.
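
The division of labor among the four models listed above can be sketched as a simple data flow. This is a schematic under stated assumptions, not the package's implementation: the class name `TwoStagePipeline`, its field names, and the exact inputs passed to each stage are hypothetical, and the toy functions in the demo merely stand in for the real neural models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class TwoStagePipeline:
    """Schematic of the data flow only; all names here are hypothetical."""

    extract_text_features: Callable[[str], Sequence[float]]  # role of the BERT models
    encode_reference: Callable[[Sequence[float]], Sequence[float]]  # role of CNHuBERT
    text_to_semantic: Callable[[Sequence[float], Sequence[float]], List[int]]  # stage 1
    semantic_to_wave: Callable[[List[int], Sequence[float]], List[float]]  # stage 2

    def synthesize(self, text: str, reference_audio: Sequence[float]) -> List[float]:
        text_features = self.extract_text_features(text)
        voice_features = self.encode_reference(reference_audio)
        # Stage 1: text features -> semantic tokens.
        # (Assumption: the voice features are also available to this stage.)
        tokens = self.text_to_semantic(text_features, voice_features)
        # Stage 2: semantic tokens -> waveform samples in the reference voice.
        return self.semantic_to_wave(tokens, voice_features)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model weights.
    pipeline = TwoStagePipeline(
        extract_text_features=lambda text: [float(ord(c)) for c in text],
        encode_reference=lambda audio: [sum(audio) / len(audio)],
        text_to_semantic=lambda feats, voice: [int(x) % 10 for x in feats],
        semantic_to_wave=lambda tokens, voice: [t * voice[0] for t in tokens],
    )
    print(pipeline.synthesize("hi", [0.5, 0.5]))
```

Keeping each stage behind its own callable mirrors the way the models are described as separate components, so a different model version could in principle be swapped in for one stage without touching the others.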

## Applications

Ominix-TTS is suited to audiobooks, virtual assistants, accessibility tools, content localization, and other applications that need high-quality speech synthesis matched to a specific voice.

## Usage

1. Install `ffmpeg`, which is used to decode the reference audio file.

    - For macOS:
    ```
    brew install ffmpeg
    ```

2. We recommend creating a virtual environment for running the tests and examples:

```
conda create -n TTS python=3.9
conda activate TTS
```
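
Once the environment is active, it can help to confirm that both prerequisites from the steps above are in place. The following is a minimal sketch using only the Python standard library; `check_environment` is a hypothetical helper rather than something shipped with Ominix-TTS, and the 3.9 minimum is taken from the package's `Requires-Python` metadata.

```python
import shutil
import sys


def check_environment(min_python=(3, 9)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # shutil.which returns None when the executable is not found on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    if sys.version_info < min_python:
        required = ".".join(str(part) for part in min_python)
        problems.append(f"Python {required} or newer is required")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    if issues:
        for issue in issues:
            print("Problem:", issue)
    else:
        print("Environment looks ready.")
```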
