Metadata-Version: 2.4
Name: copytalker
Version: 0.0.2
Summary: Cross-modal data conversion driven asynchronous multi-voice translation system
Author-email: CopyTalker Team <copytalker@example.com>
Maintainer-email: CopyTalker Team <copytalker@example.com>
License-Expression: GPL-3.0-only
Project-URL: Homepage, https://github.com/cycleuser/CopyTalker
Project-URL: Repository, https://github.com/cycleuser/CopyTalker.git
Project-URL: Documentation, https://github.com/cycleuser/CopyTalker#readme
Project-URL: Issues, https://github.com/cycleuser/CopyTalker/issues
Project-URL: Changelog, https://github.com/cycleuser/CopyTalker/blob/main/CHANGELOG.md
Keywords: speech-to-speech,translation,text-to-speech,speech-recognition,multi-language,real-time,audio-processing,whisper,kokoro-tts,nllb,voice-cloning,indextts,fish-speech,emotion-tts
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Environment :: X11 Applications :: GTK
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: English
Classifier: Natural Language :: Chinese (Simplified)
Classifier: Natural Language :: Japanese
Classifier: Natural Language :: Korean
Classifier: Natural Language :: French
Classifier: Natural Language :: German
Classifier: Natural Language :: Spanish
Classifier: Natural Language :: Russian
Classifier: Natural Language :: Arabic
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: faster-whisper>=0.6.0
Requires-Dist: transformers>=4.30.0
Requires-Dist: torch>=2.0.0
Requires-Dist: sounddevice>=0.4.4
Requires-Dist: webrtcvad-wheels>=2.0.10
Requires-Dist: numpy>=1.26.0
Requires-Dist: huggingface-hub>=0.16.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: sentencepiece>=0.1.96
Provides-Extra: full
Requires-Dist: kokoro>=0.1.0; extra == "full"
Requires-Dist: edge-tts>=6.1.0; extra == "full"
Requires-Dist: pyttsx3>=2.90; extra == "full"
Requires-Dist: librosa>=0.9.1; extra == "full"
Provides-Extra: pyaudio
Requires-Dist: pyaudio>=0.2.11; extra == "pyaudio"
Provides-Extra: indextts
Requires-Dist: pynini>=2.1.5; extra == "indextts"
Requires-Dist: soundfile>=0.12.0; extra == "indextts"
Provides-Extra: fish-speech
Requires-Dist: soundfile>=0.12.0; extra == "fish-speech"
Provides-Extra: fish-speech-api
Requires-Dist: fish-audio-sdk>=0.1.0; extra == "fish-speech-api"
Requires-Dist: httpx>=0.24.0; extra == "fish-speech-api"
Provides-Extra: cjk
Requires-Dist: cn2an>=0.5.17; extra == "cjk"
Requires-Dist: pypinyin>=0.44.0; extra == "cjk"
Requires-Dist: pypinyin-dict>=0.1.0; extra == "cjk"
Requires-Dist: jieba>=0.42.1; extra == "cjk"
Requires-Dist: fugashi>=1.1.0; extra == "cjk"
Requires-Dist: jaconv>=0.3.4; extra == "cjk"
Requires-Dist: mojimoji>=0.0.12; extra == "cjk"
Requires-Dist: unidic-lite>=1.0.0; extra == "cjk"
Provides-Extra: dev
Requires-Dist: pytest>=7.3.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-timeout>=2.1.0; extra == "dev"
Requires-Dist: pytest-xdist>=3.3.0; extra == "dev"
Requires-Dist: black>=23.3.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: mypy>=1.3.0; extra == "dev"
Requires-Dist: ruff>=0.0.270; extra == "dev"
Requires-Dist: types-PyYAML>=6.0.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=6.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.2.0; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints>=1.23.0; extra == "docs"
Provides-Extra: all
Requires-Dist: copytalker[cjk,dev,docs,fish-speech,fish-speech-api,full,indextts,pyaudio]; extra == "all"
Dynamic: license-file

# CopyTalker

[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![PyPI version](https://badge.fury.io/py/copytalker.svg)](https://badge.fury.io/py/copytalker)

**CopyTalker** is an asynchronous multi-voice translation system driven by cross-modal data conversion. It performs real-time speech-to-speech translation across multiple languages and voices, using state-of-the-art machine learning models for speech recognition, translation, and synthesis.

## Features

- **Real-time Speech Translation**: Instantly translate spoken language to another language with voice output
- **Multi-language Support**: Translates among nine languages: English, Chinese (Simplified), Japanese, Korean, French, German, Spanish, Russian, and Arabic
- **Multiple TTS Engines**: Kokoro (high-quality neural TTS), Edge TTS (cloud-based), pyttsx3 (offline)
- **Cross-modal Conversion**: Seamless conversion from speech to text to translated speech
- **Asynchronous Processing**: Pipeline stages run concurrently, so audio capture, recognition, translation, and synthesis overlap to keep latency low
- **Simple GUI**: Easy-to-use Tkinter graphical interface
- **Offline Capabilities**: Download models for offline usage

## Supported Languages

| Code | Language |
|------|----------|
| en | English |
| zh | Chinese (Simplified) |
| ja | Japanese |
| ko | Korean |
| fr | French |
| de | German |
| es | Spanish |
| ru | Russian |
| ar | Arabic |
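
When scripting against the CLI, the table above can be handy as a plain mapping. The `LANGUAGES` dict below is purely illustrative (it is not part of the copytalker API):

```python
# Illustrative mapping of CopyTalker's language codes to names
# (mirrors the table above; not a copytalker API object).
LANGUAGES = {
    "en": "English",
    "zh": "Chinese (Simplified)",
    "ja": "Japanese",
    "ko": "Korean",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
    "ru": "Russian",
    "ar": "Arabic",
}

print(LANGUAGES["zh"])  # Chinese (Simplified)
```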

## Installation

### From PyPI (Recommended)

```bash
pip install copytalker
```

### With All Features

```bash
# Full installation with all TTS engines
# (quotes keep shells like zsh from expanding the brackets)
pip install "copytalker[full]"

# With CJK language support
pip install "copytalker[full,cjk]"

# Development installation
pip install "copytalker[dev]"
```

### From Source

```bash
git clone https://github.com/cycleuser/CopyTalker.git
cd CopyTalker
pip install -e ".[full]"
```

### System Dependencies

CopyTalker requires FFmpeg and PortAudio for audio processing:

**Ubuntu/Debian:**
```bash
sudo apt install ffmpeg portaudio19-dev python3-dev
```

**Fedora:**
```bash
sudo dnf install ffmpeg portaudio-devel python3-devel
```

**macOS:**
```bash
brew install ffmpeg portaudio
```

**Windows:**
Download FFmpeg from https://ffmpeg.org/download.html and add to PATH.

## Quick Start

### Command Line Interface

```bash
# Start real-time translation (English to Chinese)
copytalker translate --target zh

# With auto-detection of source language
copytalker translate --source auto --target ja

# Specify TTS voice
copytalker translate --target zh --voice zf_xiaobei

# Use specific TTS engine
copytalker translate --target en --tts-engine edge-tts

# List available voices
copytalker list-voices --language zh

# List supported languages
copytalker list-languages
```

### GUI Mode

```bash
# Launch graphical interface
copytalker --gui

# Or use dedicated command
copytalker-gui
```

#### Screenshots

**Main Interface**

![Main Interface](images/0-interface.png)

The main window provides access to all settings, real-time transcription and translation displays, and control buttons including Start Translation, Stop, and Download Models.

**Source Language Selection**

![Source Language Selection](images/1-select-source.png)

Select the source language or choose Auto-detect to let Whisper identify the spoken language automatically.

**Target Language Selection**

![Target Language Selection](images/2-select-target.png)

Choose the target language for translation output.

**Voice Selection**

![Voice Selection](images/3-select-vioce.png)

Pick a TTS voice for the target language. Voices change dynamically based on the selected target language and TTS engine.

**TTS Engine Selection**

![TTS Engine Selection](images/4-select-tts.png)

Choose between Kokoro (high-quality neural), Edge TTS (cloud-based), pyttsx3 (offline), or auto (automatic best choice).

**Translation Model Selection**

![Translation Model Selection](images/5-select-translator.png)

Select between Helsinki-NLP (language-pair specific) or NLLB (multilingual, supports all language pairs including ja-zh).

**Translation Device Selection**

![Translation Device Selection](images/6-select-trans-device.png)

Assign the translation model to CPU or CUDA GPU to balance resources.

**TTS Device Selection**

![TTS Device Selection](images/7-select-tts-device.png)

Assign the TTS engine to CPU or CUDA GPU independently from the translation model to avoid GPU resource contention.

### Python API

```python
from copytalker import AppConfig, TranslationPipeline

# Configure
config = AppConfig()
config.stt.language = "auto"  # Auto-detect source language
config.translation.target_lang = "zh"  # Translate to Chinese
config.tts.engine = "kokoro"  # Use Kokoro TTS
config.tts.voice = "zf_xiaobei"  # Chinese female voice

# Create and start pipeline
pipeline = TranslationPipeline(config)

# Register callbacks for events
def on_transcription(event):
    print(f"Heard: {event.data.text}")

def on_translation(event):
    print(f"Translated: {event.data.translated_text}")

pipeline.register_callback("transcription", on_transcription)
pipeline.register_callback("translation", on_translation)

# Start translation
pipeline.start()

# ... (pipeline runs until stopped)

# Stop
pipeline.stop()
```

### Using Context Manager

```python
from copytalker import AppConfig, TranslationPipeline

config = AppConfig()
config.translation.target_lang = "ja"

with TranslationPipeline(config) as pipeline:
    # Pipeline is running
    input("Press Enter to stop...")
# Pipeline automatically stopped
```

## Model Management

### Pre-download Models

```bash
# Download Whisper model
copytalker download-models --whisper small

# Download Kokoro TTS model
copytalker download-models --kokoro

# Download all recommended models
copytalker download-models --all
```

### Cache Management

```bash
# Show cache info
copytalker cache --info

# Clear all cached models
copytalker cache --clear

# Clear specific model type
copytalker cache --clear whisper
```

## Configuration

CopyTalker can be configured via:

1. **Command-line arguments**
2. **Environment variables**
3. **Configuration file** (`~/.config/copytalker/config.yaml`)

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `COPYTALKER_CACHE_DIR` | Model cache directory | `~/.cache/copytalker` |
| `COPYTALKER_DEVICE` | Compute device (cpu/cuda/auto) | `auto` |
| `COPYTALKER_CONFIG` | Config file path | `~/.config/copytalker/config.yaml` |
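
A minimal sketch of resolving these variables from Python, using the documented defaults (the exact resolution logic inside copytalker may differ):

```python
import os
from pathlib import Path

# Each setting falls back to the default documented in the table above
# when the corresponding environment variable is unset.
cache_dir = Path(os.environ.get(
    "COPYTALKER_CACHE_DIR", str(Path.home() / ".cache" / "copytalker")))
device = os.environ.get("COPYTALKER_DEVICE", "auto")
config_path = Path(os.environ.get(
    "COPYTALKER_CONFIG",
    str(Path.home() / ".config" / "copytalker" / "config.yaml")))
```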

### Configuration File Example

```yaml
audio:
  sample_rate: 16000
  vad_aggressiveness: 3

stt:
  model_size: small
  device: auto

translation:
  target_lang: zh

tts:
  engine: kokoro
  voice: zf_xiaobei
  speed: 1.0

debug: false
```

## Architecture

CopyTalker follows a modular pipeline architecture:

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Audio Capture  │────▶│  Speech-to-Text │────▶│   Translation   │────▶│  Text-to-Speech │
│    (VAD)        │     │   (Whisper)     │     │ (Helsinki/NLLB) │     │    (Kokoro)     │
└─────────────────┘     └─────────────────┘     └─────────────────┘     └─────────────────┘
```

1. **Audio Capture**: Records audio with Voice Activity Detection (WebRTC VAD)
2. **Speech Recognition**: Transcribes using Faster-Whisper
3. **Translation**: Translates using Helsinki-NLP or NLLB models
4. **Text-to-Speech**: Synthesizes using Kokoro, Edge TTS, or pyttsx3
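
The staged handoff above can be sketched with standard-library queues and threads. The stage functions here are stubs standing in for Whisper, the translator, and the TTS engine; copytalker's real pipeline internals will differ:

```python
import queue
import threading

def stage(fn, inbox, outbox):
    """Consume items from inbox, apply fn, pass results downstream."""
    while True:
        item = inbox.get()
        if item is None:          # poison pill: propagate shutdown downstream
            outbox.put(None)
            break
        outbox.put(fn(item))

# Stub stages standing in for Whisper, Helsinki/NLLB, and Kokoro.
transcribe = lambda audio: f"text<{audio}>"
translate = lambda text: f"zh<{text}>"
synthesize = lambda text: f"wav<{text}>"

q_audio, q_text, q_trans, q_out = (queue.Queue() for _ in range(4))
workers = [
    threading.Thread(target=stage, args=(transcribe, q_audio, q_text)),
    threading.Thread(target=stage, args=(translate, q_text, q_trans)),
    threading.Thread(target=stage, args=(synthesize, q_trans, q_out)),
]
for w in workers:
    w.start()

# Feed two captured "chunks", then shut the pipeline down.
q_audio.put("chunk1")
q_audio.put("chunk2")
q_audio.put(None)

results = []
while (item := q_out.get()) is not None:
    results.append(item)
for w in workers:
    w.join()

print(results)  # ['wav<zh<text<chunk1>>>', 'wav<zh<text<chunk2>>>']
```

Because each stage runs in its own thread, a new audio chunk can be transcribed while the previous one is still being translated or synthesized, which is what keeps end-to-end latency low.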

## Development

### Setup Development Environment

```bash
git clone https://github.com/cycleuser/CopyTalker.git
cd CopyTalker
pip install -e ".[dev]"
```

### Running Tests

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=copytalker

# Run only unit tests
pytest tests/unit/

# Run fast tests only (skip slow)
pytest -m "not slow"
```

### Code Quality

```bash
# Format code
black src/copytalker tests
isort src/copytalker tests

# Lint
ruff check src/copytalker

# Type checking
mypy src/copytalker
```

## Requirements

- Python 3.9 or higher
- FFmpeg
- PortAudio (for PyAudio)
- Audio input/output capabilities
- PyTorch 2.0+ (on macOS: CPU or MPS; on Linux/Windows: CPU or CUDA)

See [pyproject.toml](pyproject.toml) for detailed Python package dependencies.

### macOS Installation Notes

CopyTalker works on macOS (both Intel and Apple Silicon). On macOS, CUDA is not available, so PyTorch uses CPU or MPS (Apple Silicon) for inference.

If you encounter torch/numpy conflicts on macOS, install PyTorch first:

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install copytalker
```

If PyAudio fails to install on macOS, set the compiler flags:

```bash
LDFLAGS="-L$(brew --prefix portaudio)/lib" CFLAGS="-I$(brew --prefix portaudio)/include" pip install pyaudio
```

### Linux Installation Notes

On Linux, PyAudio is compiled from source and requires the PortAudio development headers and a C compiler. Install them before running `pip install`:

```bash
# Ubuntu/Debian
sudo apt install ffmpeg portaudio19-dev python3-dev build-essential

# Fedora
sudo dnf install ffmpeg portaudio-devel python3-devel gcc
```

## Agent Integration (OpenAI Function Calling)

CopyTalker exposes OpenAI-compatible tools for LLM agents:

```python
from copytalker.tools import TOOLS, dispatch
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Translate my speech to Japanese"}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=TOOLS,
)

# Execute any tool calls the model requested
for tool_call in response.choices[0].message.tool_calls or []:
    result = dispatch(
        tool_call.function.name,
        tool_call.function.arguments,
    )
```

## CLI Help

![CLI Help](images/copytalker_help.png)

## License

This project is licensed under the GNU General Public License v3.0 - see the [LICENSE](LICENSE) file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## Acknowledgments

- [faster-whisper](https://github.com/guillaumekln/faster-whisper) for speech recognition
- [Helsinki-NLP](https://huggingface.co/Helsinki-NLP) for translation models
- [Facebook NLLB](https://ai.meta.com/research/no-language-left-behind/) for multilingual translation
- [Kokoro TTS](https://github.com/hexgrad/kokoro) for high-quality neural TTS
- Various TTS libraries for voice synthesis
