Metadata-Version: 2.4
Name: vocalyx
Version: 0.2.0
Summary: Real-time voice assistant with Speech-to-Text, GPT, and Text-to-Speech (Soprano TTS)
Author: Daxrajsinh Jadeja
License-Expression: MIT
Project-URL: Homepage, https://github.com/HolboxAI/Voice-to-Voice.git
Project-URL: Repository, https://github.com/HolboxAI/Voice-to-Voice.git
Keywords: voice,assistant,stt,tts,speech-to-text,text-to-speech,realtime,soprano
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: RealtimeSTT>=0.3.0
Requires-Dist: soprano-tts>=0.2.0
Requires-Dist: openai>=1.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: websockets>=12.0
Requires-Dist: PyAudio>=0.2.14
Requires-Dist: colorama>=0.4.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: scipy>=1.10.0
Provides-Extra: dev
Requires-Dist: build>=1.0.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Dynamic: license-file

# 🎙️ Real-Time Voice Assistant (STT + GPT + TTS)

This project is a local, streaming voice assistant that listens to your voice, transcribes it in real time (STT), generates an AI response using GPT, and speaks the answer back using **SopranoTTS**.

It's built from three key components:

1. **stt_server** (`vocalyx-stt`) – handles real-time speech-to-text via WebSockets.
2. **tts_server** (`vocalyx-tts`) – streams GPT responses and converts them to speech using SopranoTTS.
3. **client** (`vocalyx`) – connects everything together: records your mic, shows live transcription, sends it to GPT, and plays back AI-generated voice.

---

## 📦 Install

```bash
pip install vocalyx
```

Or install from source:
```bash
git clone https://github.com/HolboxAI/Voice-to-Voice.git
cd Voice-to-Voice
pip install -e .
```

---

## 🧩 Requirements

### 1. Python

Make sure you have **Python 3.10–3.12** installed (the package requires `>=3.10`).

### 2. System Dependencies

You'll need:

* `ffmpeg` (for audio handling)
* A working microphone and audio output
* `portaudio` (for PyAudio)

#### macOS

```bash
brew install portaudio ffmpeg
```

#### Ubuntu / Debian

```bash
sudo apt update
sudo apt install portaudio19-dev ffmpeg python3-pyaudio
```

#### Windows

* Install Python (make sure to add it to PATH)
* Recent PyAudio releases ship prebuilt Windows wheels, so a plain `pip install PyAudio` usually works. If it fails, you can fall back to:

```bash
pip install pipwin
pipwin install pyaudio
```

---

## 📦 Install Python Dependencies

Create a virtual environment (Python 3.12 recommended) and install dependencies:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

### Key Python Packages

* `RealtimeSTT` – real-time speech-to-text
* `openai` – GPT streaming responses
* `soprano-tts` – neural TTS engine
* `torch`, `numpy` – audio inference backend
* `pyaudio`, `sounddevice` – audio playback
* `python-dotenv` – environment variable loading

---

## 🔑 Environment Variables

Create a `.env` file in the project root with:

```
OPENAI_API_KEY=your_openai_api_key_here
```

Get your key from [https://platform.openai.com/api-keys](https://platform.openai.com/api-keys).
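The servers read this key at startup via `python-dotenv`. As a stdlib-only illustration of what `load_dotenv()` does, here is a hypothetical `load_env_file` helper (not part of this project) exercised against a throwaway `.env` file:

```python
import os
import tempfile

# Hypothetical stdlib-only helper illustrating what python-dotenv's
# load_dotenv() does: copy KEY=value pairs from a file into os.environ.
def load_env_file(path):
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Like load_dotenv(), do not overwrite variables already set.
            os.environ.setdefault(key.strip(), value.strip())

# Demo with a temporary .env file.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("# comment lines and blanks are ignored\nOPENAI_API_KEY=sk-example\n")
    env_path = f.name

os.environ.pop("OPENAI_API_KEY", None)  # clear any existing value for the demo
load_env_file(env_path)
print(os.getenv("OPENAI_API_KEY"))  # → sk-example
```

In the real project, simply calling `load_dotenv()` from `python-dotenv` at startup achieves the same effect.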

---

## ⚙️ How It Works

End-to-end flow:

```
[Microphone Input]
        ↓
 [client.py] → Sends audio to STT server
        ↓
 [stt_server.py] → Transcribes speech in real time
        ↓
 [client.py] → Sends text to TTS server
        ↓
 [tts_server.py] → Streams GPT text + converts to speech (SopranoTTS)
        ↓
 [client.py] → Plays AI voice audio live
```

Each component communicates over WebSockets:

* STT control channel: `ws://localhost:8011`
* STT data channel: `ws://localhost:8012`
* TTS channel: `ws://localhost:8013`

---

## 🔊 Audio Format (Important)

SopranoTTS streams **raw float32 mono audio**:

* **Sample rate:** `32000 Hz`
* **Channels:** `1`
* **Format:** `float32`

The client plays audio **directly** using `paFloat32` without μ-law or int16 conversion. This ensures:

* Natural pitch
* Correct tempo
* No distortion
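
Concretely, each chunk arriving from the server is raw little-endian float32 bytes. A minimal decoding sketch with NumPy, using a synthetic sine-wave chunk in place of real TTS output:

```python
import numpy as np

SAMPLE_RATE = 32000  # Hz, mono float32, matching SopranoTTS's output

# Stand-in for one streamed chunk: 0.25 s of a 440 Hz sine wave,
# serialized as the server would send it (raw float32 bytes).
t = np.arange(int(0.25 * SAMPLE_RATE)) / SAMPLE_RATE
chunk_bytes = np.sin(2 * np.pi * 440 * t).astype(np.float32).tobytes()

# Client side: reinterpret the bytes as float32 samples, no conversion.
samples = np.frombuffer(chunk_bytes, dtype=np.float32)
duration = len(samples) / SAMPLE_RATE

print(samples.dtype, round(duration, 3))  # → float32 0.25
# These bytes can be written as-is to a PyAudio stream opened with
# format=pyaudio.paFloat32, channels=1, rate=32000.
```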

---

## 🚀 Running the System

You’ll need **three terminals**.

### 1️⃣ Start the STT Server

```bash
vocalyx-stt
```

Handles microphone audio and real-time transcription.

---

### 2️⃣ Start the TTS Server

```bash
vocalyx-tts
```

Streams GPT responses and converts them to speech using **SopranoTTS**.

---

### 3️⃣ Run the Client

```bash
vocalyx
```

The client:

* Captures microphone input
* Displays live transcription
* Sends prompts to GPT
* Plays streamed AI voice output

By default, it runs in **continuous mode**.

---

## 🗣️ Example Interaction

**You:**

> What's a good way to stay focused today?

**AI:** (spoken + printed)

> Try breaking your day into short focus sessions. Take a quick stretch between them.

---

## ⚙️ Optional Command-Line Arguments

You can tweak the client's behavior with these flags:

| Flag                     | Description                        | Default               |
| ------------------------ | ---------------------------------- | --------------------- |
| `--tts-url`              | TTS WebSocket server URL           | `ws://localhost:8013` |
| `--post-silence`         | Silence after each utterance       | `1.0`                 |
| `--speech-end-detection` | Adaptive silence detection         | off                   |
| `--debug`                | Print debug logs                   | off                   |
| `--norealtime`           | Disable live transcription display | off                   |
| `--list`                 | List microphone devices            | off                   |

List audio devices:

```bash
vocalyx --list
```

Select a specific mic:

```bash
vocalyx -i 2
```

---

## 🧠 Notes

* **SopranoTTS** is initialized once and reused for all requests.
* GPT responses are streamed sentence-by-sentence to minimize latency.
* Audio is streamed and played in near real time.
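
The sentence-by-sentence streaming above can be sketched as a small generator that flushes each complete sentence from a token stream as soon as it arrives, so TTS can start speaking before the full GPT reply is finished. This is illustrative only; the real chunking logic lives in `tts_server`:

```python
import re

def sentences(token_stream):
    """Yield complete sentences as soon as they appear in a token stream."""
    buf = ""
    for token in token_stream:
        buf += token
        # Flush every sentence that now ends inside the buffer.
        while True:
            m = re.search(r"[.!?]\s+", buf)
            if not m:
                break
            yield buf[:m.end()].strip()
            buf = buf[m.end():]
    if buf.strip():
        yield buf.strip()  # flush whatever remains at end of stream

# Simulated GPT token stream.
tokens = ["Try breaking your day ", "into short sessions. ", "Take a stretch."]
result = list(sentences(tokens))
print(result)
# → ['Try breaking your day into short sessions.', 'Take a stretch.']
```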

---

## 🧹 Troubleshooting

### Audio sounds distorted or slow

* Ensure client playback uses `paFloat32` at `32000 Hz`.
* Do **not** apply μ-law or int16 conversion.

### No response from GPT

* Verify `OPENAI_API_KEY` in `.env`.
* Check internet connectivity.

### STT not transcribing

* Ensure `RealtimeSTT` is installed correctly.
* Verify microphone index using `--list`.

---

## 🧾 License

MIT License — see the `LICENSE` file for details.

---

## 💡 Future Improvements

* VAD-based auto start/stop for more natural conversations
* Opus/WebRTC streaming for browser clients
* GUI frontend for controlling STT/TTS parameters
* Interruptible (barge-in) speech handling

---

## 🏁 Summary

```bash
# Terminal 1
vocalyx-stt

# Terminal 2
vocalyx-tts

# Terminal 3
vocalyx
```
