Metadata-Version: 2.4
Name: whisper-local
Version: 0.10.0
Summary: Free, open-source, fully offline AI dictation for Windows and macOS. Hold a hotkey, speak, get text at the cursor. 100% local. Powered by Whisper.
Author-email: Rohit Burani <50279000+drajb@users.noreply.github.com>
Maintainer-email: Rohit Burani <50279000+drajb@users.noreply.github.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/drajb/whisper-local
Project-URL: Repository, https://github.com/drajb/whisper-local
Project-URL: Documentation, https://github.com/drajb/whisper-local#readme
Project-URL: Issues, https://github.com/drajb/whisper-local/issues
Project-URL: Discussions, https://github.com/drajb/whisper-local/discussions
Project-URL: Changelog, https://github.com/drajb/whisper-local/blob/master/CHANGELOG.md
Project-URL: Author, https://gekro.com
Keywords: dictation,speech-to-text,whisper,voice-typing,transcription,ai-dictation,local-ai,offline,push-to-talk,voice-recognition,accessibility,faster-whisper,privacy,self-hosted,wispr-flow,ollama,voice-commands,windows,macos
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Environment :: Win32 (MS Windows)
Classifier: Environment :: MacOS X
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Developers
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Office/Business
Classifier: Topic :: Utilities
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: AUTHORS.md
Requires-Dist: faster-whisper>=1.2.1
Requires-Dist: ctranslate2==4.7.1
Requires-Dist: numpy>=1.24.0
Requires-Dist: soxr>=0.3.0
Requires-Dist: sherpa-onnx>=1.11.3
Requires-Dist: sounddevice>=0.4.6
Requires-Dist: pyperclip>=1.8.2
Requires-Dist: ruamel.yaml>=0.18.14
Requires-Dist: pystray>=0.19.5
Requires-Dist: Pillow>=10.0.0
Requires-Dist: hf-xet>=1.1.5
Requires-Dist: playsound3>=2.0
Requires-Dist: ten-vad>=1.0.6
Requires-Dist: global-hotkeys>=0.1.7; sys_platform == "win32"
Requires-Dist: pywin32>=306; sys_platform == "win32"
Requires-Dist: pyobjc-framework-Quartz; sys_platform == "darwin"
Requires-Dist: pyobjc-framework-ApplicationServices; sys_platform == "darwin"
Provides-Extra: whispercpp
Requires-Dist: pywhispercpp>=1.2; extra == "whispercpp"
Provides-Extra: noise
Requires-Dist: noisereduce>=3.0.0; extra == "noise"
Dynamic: license-file

<div align="center">

# Whisper Local

### Free, Open-Source, **100% Offline** AI Dictation for Windows & macOS

**Press a hotkey. Speak. Your words appear at the cursor.**
No cloud. No subscription. No telemetry. Powered by [OpenAI Whisper](https://github.com/openai/whisper).

[![Tests](https://github.com/drajb/whisper-local/actions/workflows/test.yml/badge.svg)](https://github.com/drajb/whisper-local/actions/workflows/test.yml)
[![Release](https://github.com/drajb/whisper-local/actions/workflows/release.yml/badge.svg)](https://github.com/drajb/whisper-local/actions/workflows/release.yml)
[![PyPI](https://img.shields.io/pypi/v/whisper-local.svg)](https://pypi.org/project/whisper-local/)
[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![Platform](https://img.shields.io/badge/platform-Windows%20%7C%20macOS-lightgrey.svg)](#-quick-start)
[![Code of Conduct](https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg)](CODE_OF_CONDUCT.md)

[**Quick Start**](#-quick-start) · [**Features**](#-features) · [**vs. Wispr Flow / Dragon**](#-why-whisper-local) · [**Voice Commands**](#-voice-commands) · [**Contributing**](#-contributing)

</div>

![Whisper Local — press, speak, type](docs/hero.svg)

<sub>Want a real screen-recording demo here? See [`docs/demo-recording.md`](docs/demo-recording.md) — drop a `docs/demo.gif` in and uncomment the line below.</sub>
<!-- ![Demo](docs/demo.gif) -->


Whisper Local is a **free, open-source, fully offline alternative to Wispr Flow, Dragon, and Otter** for power users who want **AI dictation without sending audio to the cloud**. Built on [`faster-whisper`](https://github.com/SYSTRAN/faster-whisper) (CTranslate2), it delivers **push-to-talk speech-to-text in any application** — chat apps, code editors, browsers, terminals, design tools, anywhere a cursor blinks. Self-hosted, hackable, MIT-licensed.

**Looking for:** *Wispr Flow alternative*, *offline voice typing*, *local Whisper dictation*, *free Dragon NaturallySpeaking alternative*, *privacy-first speech-to-text*, *Windows voice dictation without cloud*, *macOS push-to-talk transcription*. You found it.

---

## 🌟 Why this exists

Most AI dictation tools are great — until you check the privacy policy. Your audio goes to a server, gets processed, and (sometimes) stored. You pay a monthly fee or get cut off.

Whisper Local exists because **you shouldn't have to choose between accuracy and privacy.**

- 🔒 Your voice never leaves your machine — not even metadata
- 🆓 Free forever — no account, no API key, no subscription
- 🔌 Works offline, air-gapped, after the internet is gone
- 🛠️ Fork it, hack it, ship your own version — MIT licensed
- 💡 Same Whisper model quality as cloud services, running on your own GPU

This is a **community tool**, not a product. There's no support SLA, no roadmap committee, no marketing. If it's useful to you, great. If something's broken, PRs are welcome.

> **A note from the maintainer:** I built this for myself, then realised it might help others. So I'm releasing it **for anyone who wants it** — no strings attached. Use it. Fork it. Rebrand it. Ship your own version. The only thing I ask is that you keep the LICENSE attribution intact (to Pin Wang, the original upstream author, and to me as the fork maintainer). If you build something cool on top of it, I'd love to hear about it via a [Discussion](https://github.com/drajb/whisper-local/discussions) — but you don't owe anyone anything.
>
> — **Rohit Burani**

---

## ✨ Why Whisper Local?

| Feature | **Whisper Local** | Wispr Flow | Dragon / Dragon Anywhere | Otter.ai | Windows Speech Recognition |
|---|:---:|:---:|:---:|:---:|:---:|
| **Runs 100% offline** | ✅ | ❌ | ❌ (Anywhere) | ❌ | ✅ |
| **Audio never leaves your machine** | ✅ | ❌ | ❌ | ❌ | partial |
| **Free / open source** | ✅ | ❌ | ❌ ($$$/yr) | ❌ ($$/mo) | ✅ |
| **Modern AI accuracy (Whisper)** | ✅ | ✅ | partial | ✅ | ❌ |
| **Works in any app via hotkey** | ✅ | ✅ | partial | ❌ | partial |
| **Customisable voice commands** | ✅ | partial | ✅ | ❌ | ❌ |
| **Push-to-talk + auto-paste + auto-send** | ✅ | ✅ | partial | ❌ | ❌ |
| **GPU acceleration (NVIDIA & AMD)** | ✅ | n/a | n/a | n/a | ❌ |
| **AI rephrase / transforms (Ollama)** | ✅ | ✅ | ❌ | ❌ | ❌ |
| **Hackable / MIT licensed** | ✅ | ❌ | ❌ | ❌ | ❌ |
| **No account required** | ✅ | ❌ | ❌ | ❌ | ✅ |

---

## 🎯 Features

- 🎙️ **Global push-to-talk hotkey** — start recording from any app with `Ctrl+Win` (Windows) or `Fn+Ctrl` (macOS)
- ⚡ **Pre-roll buffer + warmup** — captures the 500 ms before you press the key *and* pre-loads Whisper at boot, so the first word is never clipped and the first recording feels instant
- 🔵 **Floating level overlay** — a small pill at the screen edge shows you're being heard, with the transcript appearing next to the level bar (Wispr Flow–style). Optional [real-time streaming preview](docs/streaming.md) shows words *as you speak*.
- 📝 **Inline voice formatting** — say "comma", "period", "question mark", "new paragraph", "open quote", etc. mid-sentence
- 🤖 **AI rephrase** — dedicated `Ctrl+Shift+Win` hotkey: select text, hold, speak your instruction, release — local Ollama rewrites it in place
- 🌐 **Translation mode** — speak any language, get English; tray → Profile → Translate
- 🔁 **Continuous dictation mode** — for long-form notes, the app auto-restarts recording after each delivery
- 📋 **Fallback window** — if no text field is focused, the transcript appears in a small window (pre-selected, copy button, already on clipboard)
- ⏸ **Pause-all hotkey** — `Ctrl+Alt+Win` disables every Whisper Local hotkey until you press it again
- 📋 **Auto-paste at cursor** — transcript lands wherever you're typing, optionally followed by Enter (auto-send)
- 🔒 **100 % local & private** — no network calls during use; Whisper models cached on disk
- 🚀 **GPU acceleration** — NVIDIA CUDA and AMD ROCm supported, CPU works out of the box
- 🗣️ **Voice commands** — say a trigger phrase to send a hotkey, type pre-written text, or run a shell command
- 🔁 **Hot-reload** — edit `commands.yaml` and your change applies on the next transcription, no restart
- 🩺 **Built-in diagnostics** — `whisper-local --doctor` checks audio devices, model cache, hotkeys, and recent errors
- 🎛️ **Profiles** — switch between Dictation / Chat / Code / Notes presets from the tray
- 🪟 **Per-app rules** — different behaviour per foreground app (auto-send in Slack, copy-only in VS Code, suppress in 1Password)
- 🧹 **Optional LLM cleanup** — pipe transcripts through a local [Ollama](https://ollama.ai) model for punctuation / capitalisation polish (off by default, fully local)
- 📜 **Recent transcriptions** — last 10 results in the tray menu, click to copy back
- 🔧 **Settings backup/restore** — `--export-settings` / `--import-settings` for portability
- 🖥️ **Settings UI** — `whisper-local --settings` opens a GUI settings window (no YAML editing required)
- 📜 **Transcript history** — `whisper-local --history` opens a searchable log of everything you've dictated
- 🔔 **Opt-in update notifications** — daily GitHub release check, fully offline by default (`update_check.enabled: true` to opt in)
- 🎚️ **Noise suppression** — spectral gating via `noisereduce`, off by default (`pip install 'whisper-local[noise]'`)
- 🩺 **`--selftest`** — one-command sanity check (mic, model, transcription, clipboard) — perfect for first-launch
- 🎯 **Hotkey cheat sheet** — `whisper-local --cheat-sheet` or tray menu — shows your *current* configured hotkeys at a glance
- 📦 **`--bundle-logs`** — zip up redacted logs + diagnostics for bug reports with one command
- 🌐 **Local OpenAI-compatible API** — `whisper-local --serve` exposes `POST /v1/audio/transcriptions` on `localhost:7777` for Cursor, Open WebUI, anything that speaks OpenAI Whisper API
- 🛡️ **Auto-recovery** — silently reconnects when a USB mic is unplugged mid-recording
- 🛡️ **Crash reports** — uncaught errors write a self-contained dump to disk
- 🪟 **System tray UI** — model selection, mic selection, profile switch, diagnostics
- 🍎 **Cross-platform** — Windows 10+, macOS

---

## 🚀 Quick Start

### Install (Python 3.11–3.13)

```bash
git clone https://github.com/drajb/whisper-local.git
cd whisper-local
pip install -e .
```

### Launch

| | |
|---|---|
| **Terminal** | `whisper-local` (or `wl` for short) |
| **Double-click** | `whisper-local.cmd` (Windows) |
| **Start Menu / Startup** | Create a shortcut to `whisper-local.cmd`, drop it in `%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup\` for autostart |

First launch downloads the [`tiny`](https://huggingface.co/Systran/faster-whisper-tiny) Whisper model (~75 MB) into your HuggingFace cache. After that, **everything runs offline**.

### Use it

| Action | Windows | macOS |
|---|---|---|
| Hold to record | `Ctrl+Win` | `Fn+Ctrl` |
| Stop & paste | release key (push-to-talk) or `Ctrl` | release or `Fn` |
| Stop & auto-send (Enter) | `Alt` | `Option` |
| Cancel | `Esc` | `Shift` |
| Voice command mode | `Alt+Win` | `Fn+Command` |

### Verify everything works

```bash
whisper-local --doctor
```

Runs through Python version, dependencies, config validation, audio devices, model cache, hotkey backend, and recent log errors. Exit 0 = clean.

---

## 🗣️ Voice Commands

Speak a trigger to run keyboard shortcuts, type snippets, or launch programs. Defined in:

- **Windows:** `%APPDATA%\whisperkey\commands.yaml`
- **macOS:** `~/.whisperkey/commands.yaml`

```yaml
commands:
  # Send a keyboard shortcut
  - trigger: "undo"
    hotkey: "ctrl+z"

  # Deliver pre-written text
  - trigger: "my email"
    type: "user@example.com"

  # Run a shell command
  - trigger: "open notepad"
    run: 'notepad.exe'
```

Edits hot-reload — no app restart required. See **[docs/voice-commands.md](docs/voice-commands.md)** for the full guide.

> ⚠️ Voice commands with `run:` execute through your system shell with your user privileges. Only add commands you trust.

---

## ⚡ GPU Acceleration

On first launch, Whisper Local detects your GPU and offers one-press install of the required runtime libraries. Supports **NVIDIA CUDA** and **AMD ROCm**.

For manual setup or AMD RDNA 1, see **[docs/gpu-setup.md](docs/gpu-setup.md)**.

---

## 🌐 Local OpenAI-Compatible API

Whisper Local doubles as a **drop-in local replacement for the OpenAI Whisper API** — fully offline. Point any tool that speaks `POST /v1/audio/transcriptions` at it (Cursor, VS Code Continue, Open WebUI, n8n, custom scripts, anything else).

```bash
whisper-local --serve            # listens on http://127.0.0.1:7777
whisper-local --serve --serve-port 8080
```

```bash
# Drop-in compatible with the OpenAI SDK:
curl -X POST http://127.0.0.1:7777/v1/audio/transcriptions \
  -F file=@audio.wav -F model=whisper-1 -F response_format=text
```

Same Whisper model you use for dictation. Same GPU. No API key. No rate limit. No outgoing traffic.

---

## 🎛️ Profiles

Switch between presets from the tray icon → **Profile**:

| Profile | Behaviour |
|---|---|
| **Dictation** | General-purpose voice typing, auto-paste on |
| **Chat** | Push-to-talk, auto-paste + auto-send via `Alt` |
| **Code** | Copy-only mode for editors, never auto-sends |
| **Notes** | Quiet copy-to-clipboard, voice commands disabled |

Edit or add new profiles in `%APPDATA%\whisperkey\profiles.yaml`.

---

## 🪟 Per-app rules

Different apps want different behaviour. Whisper Local detects the
foreground window before delivering each transcription and matches it
against rules in `%APPDATA%\whisperkey\app_rules.yaml`:

```yaml
rules:
  # Chat apps: send the message immediately
  - match: ["slack.exe", "discord.exe"]
    auto_send: true

  # Code editors: never auto-send, copy only
  - match: ["code.exe", "cursor.exe"]
    auto_paste: false

  # Password managers: skip delivery entirely
  - match: ["1password.exe", "bitwarden.exe"]
    suppress: true
```

Hot-reloads — edit and the next transcription picks it up.

## 🧹 Optional LLM cleanup

If you have [Ollama](https://ollama.ai) running locally, Whisper Local can
pipe each transcript through a small local model for punctuation and
capitalisation polish. **Off by default and fully local** — set
`postprocess.ollama.enabled: true` in `user_settings.yaml` to enable.

```yaml
postprocess:
  capitalize_first: true        # works without Ollama
  ensure_punctuation: true      # works without Ollama
  strip_filler_words: true      # works without Ollama
  ollama:
    enabled: false              # set true to opt in
    endpoint: http://localhost:11434
    model: llama3.2
    timeout: 5
```

## ⚙️ Configuration

Local settings live at:

- **Windows:** `%APPDATA%\whisperkey\user_settings.yaml`
- **macOS:** `~/.whisperkey/user_settings.yaml`

Delete the file and restart to reset to defaults. Highlights:

| Option | Default | Notes |
|---|---|---|
| `whisper.model` | `tiny` | Any model from `whisper.models`. Larger = more accurate, slower |
| `whisper.device` | `cpu` | `cpu` or `cuda` (NVIDIA/AMD) |
| `whisper.compute_type` | `int8` | `int8`/`float16`/`float32` |
| `whisper.language` | `auto` | Auto-detect or specific language code |
| `whisper.hotwords` | `[]` | Words the model should favour — names, jargon |
| `hotkey.recording_hotkey` | `ctrl+win` | Configurable |
| `hotkey.recording_mode` | `toggle` | `toggle` or `push_to_talk` |
| `vad.vad_realtime_enabled` | `true` | Auto-stop on silence |
| `clipboard.auto_paste` | `true` | `false` = copy only |
| `clipboard.delivery_method` | `paste` | `paste` (Ctrl+V) or `type` (direct injection) |
| `voice_commands.enabled` | `true` | Enable command mode |
| `audio.host` | `null` | `WASAPI` recommended on Windows for low latency |

Full reference: [`config.defaults.yaml`](src/whisper_key/config.defaults.yaml).

---

## 🛠️ CLI Reference

```bash
whisper-local                      # Run the app (or use `wl`)
whisper-local --setup              # Interactive setup wizard (model, mode, mic)
whisper-local --doctor             # Run diagnostics
whisper-local --stats              # Transcription history & time saved
whisper-local --version            # Print version
whisper-local --quit               # Stop the running instance
whisper-local --export-settings DIR        # Back up user_settings + commands
whisper-local --import-settings DIR        # Restore from a backup
whisper-local --export-transcripts FILE    # Dump history (.txt/.md/.csv)
whisper-local --import-vocab FOLDER        # Mine a folder for hotwords
whisper-local --settings           # Open the settings GUI (no YAML editing required)
whisper-local --history            # Browse and search transcript history
whisper-local --cheat-sheet        # Show your currently configured hotkeys
whisper-local --selftest           # Run an automated self-test (mic, model, transcription)
whisper-local --bundle-logs        # Create a redacted diagnostic zip for bug reports
whisper-local --serve              # Run a local OpenAI-compatible Whisper API on :7777
whisper-local --test               # Run a separate test instance (own mutex)
```

Launching while an instance is already running **takes over** — the old one is replaced cleanly, no manual quit needed.

---

## 🏗️ How it works

```
┌─────────────────────┐  ┌──────────────────┐  ┌─────────────────────┐
│  global-hotkeys /   │  │   sounddevice +  │  │  faster-whisper /   │
│  NSEvent (macOS)    │─▶│  500ms ring buf  │─▶│  ctranslate2 (GPU)  │
└─────────────────────┘  │  + TEN VAD       │  └──────────┬──────────┘
                         └──────────────────┘             │
                                                          ▼
                         ┌──────────────────┐  ┌─────────────────────┐
                         │  Voice command   │◀─│  Transcribed text   │
                         │  matcher         │  │                     │
                         └──────────────────┘  └──────────┬──────────┘
                                                          ▼
                                                ┌─────────────────────┐
                                                │  ctypes SendInput / │
                                                │  Quartz CGEvent     │
                                                │  → cursor           │
                                                └─────────────────────┘
```

---

## 🔒 Privacy pledge

Whisper Local makes the following network calls and **no others**:

1. **First launch only:** downloads the Whisper model from `huggingface.co` into your local cache.
2. **GPU onboarding (opt-in):** if you accept the GPU setup prompt, `pip install` pulls CUDA / ROCm runtime packages from PyPI / `repo.radeon.com`.

After setup, **zero network traffic**. Confirm by running `whisper-local --doctor` and inspecting the source — every network entry point lives in [`onboarding.py`](src/whisper_key/onboarding.py) and is gated behind explicit user prompts.

---

## 📦 Tech stack

`faster-whisper` · `ctranslate2` · `sounddevice` · `ten-vad` · `pyperclip` · `pystray` · `ruamel.yaml` · `playsound3`
**Windows-only:** `global-hotkeys` · `pywin32` · ctypes `SendInput`
**macOS-only:** `pyobjc-framework-Quartz` · `pyobjc-framework-ApplicationServices`

---

## 📚 Documentation

- **[docs/troubleshooting.md](docs/troubleshooting.md)** — symptom → cause → fix table for the most common issues
- **[docs/faq.md](docs/faq.md)** — privacy, comparisons (Whisper.cpp / WSR / Wispr Flow / Dragon), model picks, GPU notes
- **[docs/distribution.md](docs/distribution.md)** — how releases work (standalone `.exe`, PyPI, winget, Homebrew) and how to ship one
- **[docs/voice-commands.md](docs/voice-commands.md)** — the full voice command DSL
- **[docs/gpu-setup.md](docs/gpu-setup.md)** — manual GPU setup for NVIDIA / AMD
- **[CHANGELOG.md](CHANGELOG.md)** — release notes

Hit a wall? Run `whisper-local --doctor` or `whisper-local --selftest` first — they catch 90% of issues.

---

## 🤝 Contributing

Contributions of all kinds are welcome — bug fixes, new features, docs improvements, or just opening an issue with a clear reproduction. This project is maintained on a best-effort basis with no SLA; please be patient with response times.

```bash
git clone https://github.com/drajb/whisper-local.git
pip install -e .
python -m unittest tests.test_smoke   # smoke suite — should report OK
```

See [CONTRIBUTING.md](CONTRIBUTING.md) for the full guide and [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) for community standards. By contributing you agree your code will be MIT licensed. Found a security issue? See [SECURITY.md](SECURITY.md) — please don't open a public issue.

Good first issues are tagged [here](https://github.com/drajb/whisper-local/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22). The full credit list is in [AUTHORS.md](AUTHORS.md).

---

## ☕ Support

Whisper Local is free and always will be. If it saves you time or a monthly subscription, consider starring the repo and sharing it with people who'd find it useful — it helps the project grow.

No pressure. Starring the repo and sharing it with people who'd find it useful is just as helpful.

---

## 🙏 Credit

Forked from [whisper-key-local](https://github.com/PinW/whisper-key-local) by **Pin Wang** — huge thanks to the original work that made this fork possible. The full list of credits, including every open-source library Whisper Local builds on, is in [`AUTHORS.md`](AUTHORS.md).

MIT licensed; original copyright preserved in [`LICENSE`](LICENSE).

---

<div align="center">

### ⭐ If you find this useful, please [star the repo](https://github.com/drajb/whisper-local) — it helps others discover it.

**Maintained by [Rohit Burani](https://gekro.com) ([@drajb](https://github.com/drajb))**

[Website](https://gekro.com) · [GitHub](https://github.com/drajb) · [Discussions](https://github.com/drajb/whisper-local/discussions) · [Report a bug](https://github.com/drajb/whisper-local/issues/new?template=bug_report.yml) · [Request a feature](https://github.com/drajb/whisper-local/issues/new?template=feature_request.yml)

<sub>Tags: whisper · dictation · speech-to-text · voice-typing · transcription · ai-dictation · local-ai · offline · push-to-talk · voice-recognition · accessibility · faster-whisper · privacy · self-hosted · wispr-flow-alternative · dragon-naturallyspeaking-alternative · otter-alternative · ollama · voice-commands · windows · macos · python</sub>

</div>
