Metadata-Version: 2.4
Name: voicescript
Version: 0.1.0
Summary: You speak, it types - clean output on your clipboard in seconds
Author: VoiceScript Contributors
License-Expression: MIT
License-File: LICENSE
Requires-Python: >=3.11
Requires-Dist: anthropic>=0.86.0
Requires-Dist: desktop-notifier>=6.2.0
Requires-Dist: evdev>=1.7.0; sys_platform == 'linux'
Requires-Dist: faster-whisper>=1.2.1
Requires-Dist: numpy>=2.4.3
Requires-Dist: pynput>=1.7.0
Requires-Dist: pyperclip>=1.11.0
Requires-Dist: pywebview>=6.1
Requires-Dist: sounddevice>=0.5.5
Requires-Dist: tomli-w>=1.2.0
Provides-Extra: dev
Requires-Dist: pytest-mock>=3.14.0; extra == 'dev'
Requires-Dist: pytest>=9.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# VoiceScript

You speak, it types — clean output on your clipboard in seconds, no matter what app you're in.

VoiceScript records your voice from the microphone, transcribes it locally and offline using OpenAI Whisper (via faster-whisper), then sends the raw transcript to Claude (Anthropic) for cleanup. The polished result lands on your clipboard, ready to paste anywhere. A translucent HUD overlay sits at the bottom of your screen and shows the current state: idle, recording, or processing. Five output profiles let you shape the same spoken words into a plain transcript, a professional email, a Slack message, structured meeting notes, or clean code comments, all without leaving your keyboard.

## Requirements

- Python 3.11 or newer
- `ANTHROPIC_API_KEY` — get a key at https://console.anthropic.com/
- System browser engine for the HUD overlay (per OS):
  - **Linux (Debian/Ubuntu):** `sudo apt install python3-gi-cairo libwebkit2gtk-4.1-0`
  - **Linux (Fedora/RHEL):** `sudo dnf install webkit2gtk4.1`
  - **Windows:** Edge WebView2 Runtime: included with Windows 11 and present on most up-to-date Windows 10 systems; if missing, download it from Microsoft
  - **macOS:** No additional install required (native WebKit)
- First run downloads the Whisper `large-v3` model (~3 GB) — one-time, cached locally

## Installation

### From Test PyPI

```bash
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ voicescript
```

### From source

```bash
git clone https://github.com/your-org/voicescript.git
cd voicescript
pip install -e .
```

## First Run

**1. Set your API key:**

```bash
export ANTHROPIC_API_KEY=your-key-here
```

**2. Quick test (standalone, no daemon):**

```bash
voicescript record
```

Speak, then press Enter to stop. The transcript is cleaned up by Claude and copied to your clipboard. This mode is useful for verifying your setup before you run the daemon.

**3. Start the daemon:**

```bash
voicescript start
```

The daemon runs in the background, listens for F9, and shows the HUD overlay.

**4. Trigger a recording:**

Press **F9** to start recording. Press **F9** again to stop. The result is copied to your clipboard.

On Wayland, use the CLI command instead (see [Wayland Note](#wayland-note)):

```bash
voicescript trigger
```

**5. Cycle output profiles:**

Hold **F9** to cycle to the next profile. Or from the command line:

```bash
voicescript profile next
```

**6. Check daemon status:**

```bash
voicescript status
```

**7. Stop the daemon:**

```bash
voicescript stop
```

## Output Profiles

VoiceScript shapes your spoken words into five distinct output formats. Tap F9 to record, hold F9 to cycle profiles (or use `voicescript profile next`).

| Profile | Icon | Description |
|---------|------|-------------|
| Transcript | 📝 | Light cleanup only: your wording is preserved, with filler words and stutters removed |
| Email | 📧 | Polite professional email with greeting and closing signature |
| Slack | 💬 | Casual, informal message — no formal greetings or closings |
| Meeting Notes | 📋 | Bullet-point structured notes, organised by topic |
| Code Comment | 💻 | Clean technical documentation suitable for inline code comments |

All profiles preserve code-switching between Polish and English — no word is ever translated.
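
The tap-versus-hold behaviour and profile cycling described above can be sketched as follows. This is a minimal illustration, not VoiceScript's actual implementation: the 0.5-second hold threshold, the wrap-around from the last profile to the first, and the profile keys other than `transcript` and `email` are all assumptions.

```python
# Profile order as listed in the table above. Only "transcript" and "email"
# appear as identifiers in this README; the other keys are hypothetical.
PROFILES = ["transcript", "email", "slack", "meeting_notes", "code_comment"]

# Assumed threshold: a press lasting at least this long counts as a hold.
HOLD_THRESHOLD_SECONDS = 0.5


def classify_press(pressed_at: float, released_at: float) -> str:
    """Return "hold" for a long key press and "tap" for a short one."""
    duration = released_at - pressed_at
    return "hold" if duration >= HOLD_THRESHOLD_SECONDS else "tap"


def next_profile(current: str) -> str:
    """Return the profile after `current`, wrapping around at the end."""
    index = PROFILES.index(current)
    return PROFILES[(index + 1) % len(PROFILES)]
```

Under these assumptions, a tap would toggle recording, while a hold would call `next_profile` and store the result as the active profile.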

## Configuration

Config file location: `~/.config/voicescript/config.toml`

The file is created automatically on first run with the defaults shown below. Edit it with any text editor.

```toml
[transcription]
model = "large-v3"

[state]
active_profile = "transcript"
```

### Config keys

**`[transcription]`**

| Key | Default | Description |
|-----|---------|-------------|
| `model` | `"large-v3"` | Whisper model size. Options: `tiny`, `base`, `small`, `medium`, `large-v2`, `large-v3`, `large-v3-turbo`. Smaller models are faster but less accurate. |

**`[state]`**

| Key | Default | Description |
|-----|---------|-------------|
| `active_profile` | `"transcript"` | Last-used profile. Updated automatically when you cycle profiles — you do not need to set this manually. |

### Profile prompt overrides

You can replace any profile's Claude system prompt with your own. Add a `[profiles.PROFILENAME]` section:

```toml
[profiles.email]
prompt = """
Write this as a very brief 2-sentence email. No greeting, no closing.
Output ONLY the email body.

Text: {raw}
"""
```

The `{raw}` placeholder is replaced with the Whisper transcript at runtime. If you omit `{raw}`, the transcript will not be passed to Claude.
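
To check a custom prompt before saving it, you can reproduce the substitution yourself. The sketch below assumes a plain text replacement of `{raw}`; the helper names are hypothetical and the real substitution mechanism may differ.

```python
PLACEHOLDER = "{raw}"


def has_placeholder(template: str) -> bool:
    """Return True if the prompt template will receive the transcript."""
    return PLACEHOLDER in template


def render_prompt(template: str, transcript: str) -> str:
    """Insert the transcript into the template.

    str.replace is used rather than str.format so that any other braces
    in a custom prompt (for example a JSON snippet) are left untouched.
    """
    return template.replace(PLACEHOLDER, transcript)
```

A template for which `has_placeholder` returns `False` would be sent to Claude without the transcript, which is rarely what you want.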

## Wayland Note

On native Wayland sessions, the global F9 hotkey is not available. This is a platform limitation — no stable Python library can capture global hotkeys on Wayland without elevated permissions.

**Workarounds:**

1. **Assign F9 in your desktop environment's keyboard settings** to run:

   ```bash
   voicescript trigger
   ```

   In GNOME: Settings → Keyboard → Custom Shortcuts. In KDE: System Settings → Shortcuts → Custom Shortcuts.

2. **Allow evdev-based hotkey capture** (requires logout):

   ```bash
   sudo usermod -aG input "$USER"
   ```

   Log out and back in. The daemon will detect this and use the evdev backend automatically.

3. **Cycle profiles from the command line** at any time:

   ```bash
   voicescript profile next
   ```

VoiceScript prints a diagnostic at daemon startup if it detects a native Wayland session.
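
To check for yourself whether a session is native Wayland, the sketch below inspects the conventional `XDG_SESSION_TYPE` and `WAYLAND_DISPLAY` environment variables. The daemon's own detection logic is not documented here, so treat this as an approximation; `is_wayland_session` is a hypothetical helper name.

```python
import os
from collections.abc import Mapping


def is_wayland_session(env: Mapping[str, str] = os.environ) -> bool:
    """Best-effort check for a Wayland session based on environment variables."""
    session_type = env.get("XDG_SESSION_TYPE", "").lower()
    if session_type:
        # Most login managers set this to "wayland", "x11", or "tty".
        return session_type == "wayland"
    # Fall back to the socket name that Wayland compositors export.
    return bool(env.get("WAYLAND_DISPLAY"))
```

If this returns `True`, bind `voicescript trigger` to a desktop shortcut or add your user to the `input` group as described above.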

## Troubleshooting

**"Error: ANTHROPIC_API_KEY is not set"**

```bash
export ANTHROPIC_API_KEY=your-key-here
```

Add this line to your shell profile (`~/.bashrc`, `~/.zshrc`) to make it permanent.

**"No audio device found" or microphone not working**

Check that your microphone is connected and that your OS has granted permission to access it. Verify available devices:

```bash
python -c "import sounddevice; print(sounddevice.query_devices())"
```

**"pywebview failed to import" or HUD not appearing**

Install the system browser engine for your OS (see [Requirements](#requirements) above). On Linux:

```bash
# Debian/Ubuntu
sudo apt install python3-gi-cairo libwebkit2gtk-4.1-0

# Fedora/RHEL
sudo dnf install webkit2gtk4.1
```

**HUD window opens but stays blank or crashes**

Check the HUD log:

```bash
cat ~/.cache/voicescript/hud.log
```

Common cause: missing system WebKit or WebView2. Reinstall the system browser engine.

**"Daemon is not running" when using stop/status/trigger**

The daemon process may have exited unexpectedly. Restart it:

```bash
voicescript stop
voicescript start
```

**First run takes a long time**

The Whisper `large-v3` model (~3 GB) is downloaded on first use. This is a one-time download cached at `~/.cache/huggingface/`. Subsequent runs load the model from disk.

## Screenshots

The HUD overlay sits at the bottom of your screen and shows the current state:

| State | Description |
|-------|-------------|
| Idle | Translucent bar showing active profile name and icon |
| Recording | Animated red pulse indicating audio capture in progress |
| Processing | Spinning indicator while Whisper transcribes and Claude cleans up |

<!-- Screenshots to be added after cross-platform HUD verification.
     Capture with any screenshot tool (e.g., gnome-screenshot, Snipping Tool, Cmd+Shift+4).
     Recommended: one GIF showing the full idle -> recording -> processing -> idle cycle. -->

## License

MIT — see [LICENSE](LICENSE).
