Metadata-Version: 2.4
Name: stttype
Version: 3.0.2
Summary: Cross-platform voice-to-text typing assistant with GPU acceleration and automatic CPU fallback
Author: LucasApps
License: MIT
Project-URL: Homepage, https://github.com/LucasApps/stttype
Project-URL: Issues, https://github.com/LucasApps/stttype/issues
Keywords: stt,speech-to-text,whisper,voice,typing,keyboard
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: faster-whisper>=1.0.0
Requires-Dist: whisper-live>=0.6.0
Requires-Dist: sounddevice>=0.4.6
Requires-Dist: soundfile>=0.12.1
Requires-Dist: numpy>=1.24.0
Requires-Dist: pynput>=1.7.0
Requires-Dist: pyautogui>=0.9.54
Requires-Dist: pyperclip>=1.8.2
Requires-Dist: pystray>=0.19.4
Requires-Dist: pillow>=10.0.0
Requires-Dist: psutil>=5.9.0
Requires-Dist: websocket-client>=1.0.0
Provides-Extra: gpu
Requires-Dist: torch>=2.0.0; extra == "gpu"
Requires-Dist: torchaudio>=2.0.0; extra == "gpu"

# STT Type v3.0.0

**Cross-platform voice-to-text typing assistant with real-time streaming transcription.**

Hold **F2** (configurable) to stream your voice in real time, then release to type the final transcribed text at your cursor position.

## What's New in v3.0.0

- **Real-time streaming mode** — Uses WhisperLive for continuous transcription while you speak. Interim results appear in the console in real-time.
- **Hold mode (legacy)** — Still available! Record on hold, transcribe on release using faster-whisper directly.
- **Mode selector** — Choose between Streaming and Hold in `stttype --config`
- **Fully offline** — No internet required after initial model download

## Features

- **Real-time streaming** — See transcription progress as you speak (Streaming mode)
- **Cross-platform** — Works on Windows, Linux, and macOS
- **Hold hotkey to record/stream** — Audio captures while key is held (default: F2)
- **Visual indicator** — Transparent red dot with hotkey label appears in top-right corner while recording
- **Bell sounds** — Audio feedback when recording starts/stops
- **GPU-accelerated STT** — Uses faster-whisper on your NVIDIA GPU
- **Auto CPU fallback** — Automatically falls back to CPU if GPU is not available
- **Clipboard typing** — Types text via Ctrl+V paste for reliability (no key interference)
- **System tray mode** — Runs silently in background
- **Auto-startup** — Starts automatically on login (Windows)
- **Settings GUI** — `stttype --config` to change model, language, device, compute type, hotkey, and mode

## Requirements

- Python 3.9+
- Microphone
- NVIDIA GPU with CUDA support (optional, for GPU mode)
- Linux: `paplay`, `aplay`, or `pw-play` for sound feedback (auto-detected)

## Installation

### From PyPI (Recommended)

```bash
pip install stttype
```

### Prerequisites (Optional GPU Support)

Install PyTorch with CUDA support for GPU acceleration:

```bash
# Windows/Linux with CUDA 11.8
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118

# macOS (CPU only, no CUDA)
pip install torch torchaudio
```

Without PyTorch, STT Type will still work — it automatically falls back to CPU.
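
The fallback described above could look roughly like the sketch below. It is an illustration, not STT Type's actual code: the function names `cuda_available` and `resolve_device` are hypothetical, and pairing `float16` with the GPU and `int8` with the CPU is an assumption based on the Compute Type options in the settings GUI.

```python
from typing import Optional, Tuple


def cuda_available() -> bool:
    """Return True only if PyTorch is importable and reports a CUDA device."""
    try:
        import torch  # optional dependency, imported lazily
    except ImportError:
        return False  # no PyTorch: treat as "no GPU"
    return bool(torch.cuda.is_available())


def resolve_device(requested: str = "auto",
                   has_cuda: Optional[bool] = None) -> Tuple[str, str]:
    """Map a requested device ("auto", "cuda" or "cpu") to (device, compute_type)."""
    if has_cuda is None:
        has_cuda = cuda_available()
    # Use the GPU only when it was not ruled out and CUDA is actually present.
    device = "cuda" if requested in ("auto", "cuda") and has_cuda else "cpu"
    # Assumed pairing: half precision on GPU, 8-bit integers on CPU.
    compute_type = "float16" if device == "cuda" else "int8"
    return device, compute_type


print(resolve_device("auto", has_cuda=False))  # ('cpu', 'int8')
```

Passing `has_cuda` explicitly keeps the decision logic testable on machines without a GPU or without PyTorch.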

### Windows (Local Install)

```powershell
cd "C:\path\to\stttype"
.\install.ps1
```

Then restart PowerShell.

### Linux

```bash
cd /path/to/stttype
chmod +x install.sh
./install.sh
```

If `sounddevice` fails, install PortAudio:
```bash
# Debian/Ubuntu
sudo apt-get install portaudio19-dev

# Fedora
sudo dnf install portaudio-devel

# Arch
sudo pacman -S portaudio
```

For sound feedback, ensure one of these is installed:
```bash
# PulseAudio (most desktop distros)
paplay --version

# PipeWire
pw-play --version

# ALSA (fallback)
aplay --version
```
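
Auto-detection of the sound player can be sketched with `shutil.which`. This is a minimal illustration, not the project's implementation: the helper names and the preference order (`paplay`, then `pw-play`, then `aplay`) are assumptions.

```python
import shutil
import subprocess
from typing import Callable, Optional, Sequence

# Candidate players named in the requirements; the order is an assumption.
PLAYERS = ("paplay", "pw-play", "aplay")


def find_player(candidates: Sequence[str] = PLAYERS,
                which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the first candidate found on PATH, or None if none is installed."""
    for name in candidates:
        if which(name):
            return name
    return None


def play_wav(path: str) -> bool:
    """Play a WAV file with the detected player; return False if none exists."""
    player = find_player()
    if player is None:
        return False
    # All three tools accept the file path as their only argument.
    subprocess.run([player, path], check=False)
    return True
```

The `which` parameter exists only so the lookup can be exercised without touching the real `PATH`.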

### macOS

```bash
cd /path/to/stttype
chmod +x install.sh
./install.sh
```

If `sounddevice` fails, install PortAudio:
```bash
brew install portaudio
```

**Note:** On macOS, you need to grant Accessibility permissions for `pynput` to capture global hotkeys. Go to **System Settings > Privacy & Security > Accessibility** and add your terminal application.

## Commands

Once installed, `stttype` works from any terminal.

| Command | Description |
|---------|-------------|
| `stttype --start` | Start STT Type in background |
| `stttype --shutdown` | Stop all STT Type processes |
| `stttype --status` | Check if STT Type is running |
| `stttype --restart` | Restart STT Type |
| `stttype --config` | Open settings GUI |
| `stttype --addtostartup` | Add to startup (Windows only) |
| `stttype --rmtostartup` | Remove from startup (Windows only) |
| `stttype --model <size>` | Set Whisper model (tiny/base/small/medium/large-v3) |
| `stttype --lang <code>` | Set language (en/zh/auto/etc) |
| `stttype --help` | Show help |

### Examples

```bash
# Start with default settings
stttype --start

# Start with a larger model for better accuracy
stttype --start --model small

# Start with Chinese language
stttype --start --lang zh

# Open settings GUI
stttype --config
```

### Settings GUI

Run `stttype --config` to open a settings window where you can configure:

| Setting | Options |
|---------|---------|
| **Mode** | Streaming (real-time), Hold (legacy) |
| **Whisper Model** | tiny, base, small, medium, large-v3 |
| **Language** | Auto-detect, English, Chinese, Spanish, French, German, Japanese, Korean, Russian, Italian, Portuguese, Arabic, Hindi |
| **Device** | Auto (GPU if available), GPU (CUDA), CPU |
| **Compute Type** | Auto, Float16 (GPU), Int8 (CPU) |
| **Hotkey** | F1 - F12 |

Settings are saved to `~/.config/stttype/config.json` and persist across restarts.
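
For scripts that want to read these settings, a loader might look like the sketch below. The key names and the `language` and `device` defaults are assumptions for illustration; the real schema of `config.json` may differ, so inspect the file before relying on it.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "stttype" / "config.json"

# Hypothetical keys; model, hotkey and mode defaults follow this README.
DEFAULTS = {
    "mode": "streaming",
    "model": "base",
    "language": "auto",
    "device": "auto",
    "compute_type": "auto",
    "hotkey": "f2",
}


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Return the defaults overlaid with any values stored in the config file."""
    config = dict(DEFAULTS)
    try:
        with open(path, encoding="utf-8") as fh:
            stored = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return config  # missing or corrupt file: fall back to defaults
    if isinstance(stored, dict):
        # Ignore null values so they cannot mask a default.
        config.update({k: v for k, v in stored.items() if v is not None})
    return config
```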

## How It Works

### Streaming Mode (default)
1. **Hold F2** — A transparent red dot appears, microphone starts streaming to local WhisperLive server
2. **Speak** — Interim transcription appears in the console in real-time
3. **Release F2** — Red dot disappears, streaming stops, final text is typed at your cursor
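
Both modes depend on turning key events into a single "start" on press and a single "stop" on release. On many platforms a held key repeats its press event, so the state needs to be tracked. The sketch below shows one way to do this with `pynput`; `HoldTracker` and `run_listener` are hypothetical names, not part of STT Type.

```python
class HoldTracker:
    """Collapse repeated key events into one 'start' and one 'stop' per hold."""

    def __init__(self, hotkey):
        self.hotkey = hotkey
        self.held = False

    def on_press(self, key):
        # Ignore other keys and auto-repeat presses while already held.
        if key == self.hotkey and not self.held:
            self.held = True
            return "start"
        return None

    def on_release(self, key):
        if key == self.hotkey and self.held:
            self.held = False
            return "stop"
        return None


def run_listener(on_start, on_stop):
    """Block forever, calling on_start/on_stop as F2 is held and released."""
    from pynput import keyboard  # imported lazily; needs a desktop session

    tracker = HoldTracker(keyboard.Key.f2)

    def press(key):
        if tracker.on_press(key) == "start":
            on_start()

    def release(key):
        if tracker.on_release(key) == "stop":
            on_stop()

    with keyboard.Listener(on_press=press, on_release=release) as listener:
        listener.join()
```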

### Hold Mode (legacy)
1. **Hold F2** — A transparent red dot appears, microphone starts recording
2. **Release F2** — Red dot disappears, recording stops
3. **Transcribe** — Whisper processes the recorded audio
4. **Text is pasted** — Result is pasted at your cursor position via Ctrl+V
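
Steps 3 and 4 can be sketched with the `faster-whisper`, `pyperclip`, and `pyautogui` packages that STT Type depends on. This is an illustrative outline under assumptions, not the project's code: the function names are hypothetical, and `audio` is assumed to be a WAV file path or a 16 kHz mono float32 NumPy array.

```python
def join_segments(segments) -> str:
    """Concatenate segment texts; Whisper segments carry their own spacing."""
    return "".join(segment.text for segment in segments).strip()


def transcribe(audio, model_size="base", device="cpu",
               compute_type="int8", language=None) -> str:
    """Run faster-whisper on recorded audio and return the full text."""
    from faster_whisper import WhisperModel  # lazy import: loads native libraries

    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    # transcribe() returns a lazy generator of segments plus run metadata;
    # language=None lets the model detect the language.
    segments, _info = model.transcribe(audio, language=language)
    return join_segments(segments)


def paste_text(text: str) -> None:
    """Put the text on the clipboard and paste it into the focused window."""
    import pyautogui
    import pyperclip

    pyperclip.copy(text)
    pyautogui.hotkey("ctrl", "v")
```

A real application would load the model once at startup instead of on every call, since model loading dominates the latency for short recordings.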

## Models

| Model | Parameters | VRAM | Speed | Accuracy |
|-------|------------|------|-------|----------|
| `tiny` | 39 M | ~1 GB | Fastest | Basic |
| `base` | 74 M | ~1 GB | Fast | Good |
| `small` | 244 M | ~2 GB | Medium | Better |
| `medium` | 769 M | ~5 GB | Slower | Very good |
| `large-v3` | 1550 M | ~10 GB | Slowest | Best |

The default is `base`, a good balance of speed and accuracy.
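
If you want to pick a model from the GPU memory you have, the table can be turned into a small lookup. This is a rough sketch: the helper name is hypothetical, and the VRAM figures are the approximate values from the table, so leave some headroom.

```python
# (model, approximate VRAM in GB), ordered from smallest to largest.
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def largest_model_that_fits(vram_gb: float, default: str = "tiny") -> str:
    """Return the largest model whose approximate VRAM need fits the budget."""
    best = default
    for name, needed in MODEL_VRAM_GB:
        if needed <= vram_gb:
            best = name  # later entries are larger, so keep overwriting
    return best


print(largest_model_that_fits(6))  # medium
```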

## Uninstall

### Windows
```powershell
cd "C:\path\to\stttype"
.\uninstall.ps1
```

### Linux/macOS
```bash
cd /path/to/stttype
chmod +x uninstall.sh
./uninstall.sh
```

## Troubleshooting

| Issue | Solution |
|-------|----------|
| `stttype` not found | Restart terminal after installation |
| "CUDA not available" | Install the NVIDIA drivers and CUDA toolkit; otherwise STT Type falls back to CPU automatically |
| No sound on start/stop (Linux) | Install `pulseaudio-utils`, `alsa-utils`, or `pipewire` |
| No overlay on Linux | Ensure `$DISPLAY` is set (e.g., `DISPLAY=:0 stttype --start`) |
| Text not typing | Make sure the target window is focused |
| Model download fails | Check internet connection |
| Hotkeys don't work (macOS) | Grant Accessibility permissions to your terminal |
| Hotkeys don't work (Linux) | Make sure you're running under X11 (not Wayland) |
| Garbled text when typing | Fixed in v2.0.2+ — uses clipboard paste instead of key simulation |
| Config not applied | Fixed in v2.0.5+ — CLI defaults changed to `None` so config values are read |
| Streaming mode slow on CPU | Use the `tiny` or `base` model, or switch to Hold mode |

## Version History

| Version | Changes |
|---------|---------|
| 3.0.0 | **Major**: Real-time streaming via WhisperLive; mode selector (Streaming/Hold); hold-F2 UX preserved |
| 2.0.8 | Linux: `pyautogui.typewrite()` typing; faster bell beeps; transparent PNG overlay via PIL |
| 2.0.7 | Linux sound via WAV playback; Linux overlay fixes |
| 2.0.6 | Linux sound via WAV playback (`paplay`/`aplay`/`pw-play`); Linux overlay alpha transparency fallback |
| 2.0.5 | Fixed config not being used by CLI |
| 2.0.4 | README sync |
| 2.0.3 | Config value extraction fix |
| 2.0.2 | Clipboard typing fix; config applied |
| 2.0.1 | Linux DISPLAY fix |
| 2.0.0 | Config GUI, transparent indicator |
| 1.0.x | Initial releases |

## Publish to PyPI

```bash
# Install build tools
pip install build twine

# Build
cd /path/to/stttype
python -m build

# Upload
python -m twine upload dist/*
```

When prompted:
- **Username**: `__token__`
- **Password**: Your PyPI API token

---

**Author**: LucasApps  
**Version**: 3.0.0  
**License**: MIT
