Metadata-Version: 2.4
Name: scribe-cli
Version: 0.17.1
Summary: Speech-to-text CLI and system-tray app for dictating into any focused window. Local (vosk, faster-whisper) or cloud (groq, openai) backends, batch or streaming.
Author-email: Mahé Perrette <mahe.perrette@gmail.com>
License: MIT License
        
        Copyright (c) 2024 Mahé Perrette
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
        ---
        
        Note: This project relies on external packages that may have more restrictive
        licenses. For example, the `pynput` package is licensed under LGPLv3, which
        has different requirements compared to the MIT License. Please review the
        licenses of all dependencies before using or distributing this software to
        ensure compliance with their respective terms.
Project-URL: Homepage, https://github.com/perrette/scribe
Keywords: speech-to-text,speech recognition,transcription,dictation,voice-typing,voice-to-text,realtime,streaming,language,AI,local,API,cli,tray,vosk,whisper,openai,groq,gpt-4o,linux,wayland,keyboard,clipboard
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: sounddevice
Requires-Dist: tqdm
Requires-Dist: requests
Requires-Dist: pyperclip
Requires-Dist: unidecode
Requires-Dist: termcolor
Requires-Dist: platformdirs
Requires-Dist: desktop-ai-core>=0.2.0
Provides-Extra: keyboard
Requires-Dist: pynput; extra == "keyboard"
Provides-Extra: whisper
Requires-Dist: faster-whisper; extra == "whisper"
Provides-Extra: whisper-futo
Requires-Dist: pywhispercpp; extra == "whisper-futo"
Provides-Extra: vosk
Requires-Dist: vosk; extra == "vosk"
Provides-Extra: app
Requires-Dist: pystray; extra == "app"
Requires-Dist: PyGObject; extra == "app"
Provides-Extra: openai
Requires-Dist: openai<3,>=2.37.0; extra == "openai"
Requires-Dist: soundfile; extra == "openai"
Provides-Extra: groq
Requires-Dist: openai<3,>=2.37.0; extra == "groq"
Requires-Dist: soundfile; extra == "groq"
Provides-Extra: all
Requires-Dist: pynput; extra == "all"
Requires-Dist: faster-whisper; extra == "all"
Requires-Dist: pywhispercpp; extra == "all"
Requires-Dist: openai<3,>=2.37.0; extra == "all"
Requires-Dist: soundfile; extra == "all"
Requires-Dist: vosk; extra == "all"
Requires-Dist: pystray; extra == "all"
Dynamic: license-file

[![pypi](https://img.shields.io/pypi/v/scribe-cli)](https://pypi.org/project/scribe-cli)
![](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Fperrette%2Fscribe%2Frefs%2Fheads%2Fmain%2Fpyproject.toml)

# Scribe  <img src="https://github.com/perrette/scribe/raw/main/scribe_data/share/icon.png" width="48">

**Talk. It types.** Scribe is a speech-to-text CLI and tray app that
pipes transcribed text straight into the focused window. It supports local and
cloud backends, in batch or streaming mode.

## What it does

- Records from your mic and transcribes via one of four backends —
  **Vosk** (local, streaming), **Whisper** (local, batch), **OpenAI**
  (cloud, batch *or* streaming), **Groq** (cloud, batch).
- Delivers the transcript three ways: paste into the focused window
  (default), copy to clipboard, or print to the terminal.
- Runs as a **system tray icon** with a single Record button, or as an
  interactive **terminal TUI** — same menu in both.
- Hooks into your DE's keyboard shortcuts via `SIGUSR1` (toggle
  recording) and `SIGUSR2` (cancel).
- Cross-platform: tested on Ubuntu (X11 and Wayland), macOS, Windows;
  works under Termux for clipboard / terminal output.
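
The signal hooks in the list above follow a common POSIX pattern. Here is a minimal sketch of that pattern, not scribe's actual code: the `Recorder` class and its methods are hypothetical names used only for illustration, and `SIGUSR1` / `SIGUSR2` are available on Unix-like systems only.

```python
import os
import signal


class Recorder:
    """Hypothetical stand-in for a recorder; not scribe's real class."""

    def __init__(self):
        self.recording = False
        self.cancelled = False

    def toggle(self, signum=None, frame=None):
        # SIGUSR1: start if idle, stop if already recording.
        self.recording = not self.recording
        self.cancelled = False

    def cancel(self, signum=None, frame=None):
        # SIGUSR2: stop and mark the current recording as discarded.
        self.recording = False
        self.cancelled = True


recorder = Recorder()
signal.signal(signal.SIGUSR1, recorder.toggle)
signal.signal(signal.SIGUSR2, recorder.cancel)

# A desktop shortcut would send the signal from outside the process;
# here the process signals itself to show the effect.
os.kill(os.getpid(), signal.SIGUSR1)
```

A keyboard shortcut bound in the desktop environment would do the same from outside, for example by running `kill -USR1 <pid>` against the running scribe process.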

## Install

```bash
sudo apt-get install portaudio19-dev xclip   # Ubuntu; macOS: brew install portaudio
pip install "scribe-cli[all]"                 # quoted so zsh does not glob the brackets
export GROQ_API_KEY=YOURAPIKEY                # or OPENAI_API_KEY, or skip and run local
```

See [Keyboard modes & typer backends](docs/keyboard.md) for setting up keyboard input on Ubuntu Wayland.


## Usage

In a terminal:

```bash
scribe
```

This launches the system tray icon. Press Record, speak, then press Stop,
and the transcription lands in the focused window. Scribe picks the first
backend whose API key or dependency is present, in the order **`groq` →
`openai` → `whisper` → `vosk`**, so with `GROQ_API_KEY` set the
command above is equivalent to:

```bash
scribe --backend groq --model whisper-large-v3-turbo
```
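
The fallback order described above can be pictured as a first-match search. The sketch below is an illustration under assumptions, not scribe's implementation: the function name `pick_backend`, the idea that cloud backends are detected purely by environment variable, and the module names `faster_whisper` and `vosk` used for the local checks are all guesses made for this example.

```python
import importlib.util
import os

# Order stated in this README: groq, openai, whisper, vosk.
# Cloud backends are checked by environment variable, local ones by
# whether a module can be imported (module names are assumptions).
CANDIDATES = [
    ("groq", lambda env: bool(env.get("GROQ_API_KEY"))),
    ("openai", lambda env: bool(env.get("OPENAI_API_KEY"))),
    ("whisper", lambda env: importlib.util.find_spec("faster_whisper") is not None),
    ("vosk", lambda env: importlib.util.find_spec("vosk") is not None),
]


def pick_backend(env=None):
    """Return the first available backend name, or None if none is usable."""
    env = os.environ if env is None else env
    for name, available in CANDIDATES:
        if available(env):
            return name
    return None


print(pick_backend({"GROQ_API_KEY": "x", "OPENAI_API_KEY": "y"}))
```

With both keys set, the search stops at `groq` because it comes first in the list; passing `--backend` explicitly sidesteps the search altogether.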

<img src="https://raw.githubusercontent.com/perrette/scribe/main/docs/app-tray-menu.png" width="300">

You can override the defaults or drop the tray entirely:

```bash
scribe --backend openai --model gpt-4o-mini-transcribe # OpenAI sweet spot
scribe --backend openai --model gpt-realtime-whisper   # OpenAI streaming
scribe --backend whisper --model small                 # local, no API key
scribe --frontend terminal                             # interactive TUI menu
scribe --frontend terminal --no-interactive            # record immediately, no menu
scribe --mode clipboard                                # copy to clipboard, no keystroke
scribe --mode terminal                                 # only print to stdout
scribe -o transcript.txt                               # also append to a file
```

With `--no-interactive` (terminal frontend only), scribe skips the
interactive menu and starts recording right away — handy for scripted,
one-shot transcriptions. `--no-prompt` is kept as a deprecated alias.

Bias the recogniser toward names, jargon, or a domain glossary with
`--prompt "free text hint"` and `--words word1 word2 ...` (each also
accepts a `--prompt-file` / `--words-file` companion). See
[docs/backends.md › Vocabulary biasing](docs/backends.md#vocabulary-biasing)
for what each backend does with them.
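
For scripted use, the flags from the two paragraphs above can be combined into one command line. This is a rough sketch: the helper name `scribe_argv` and the sample prompt and words are made up for illustration, while the flags themselves are the ones listed in this README.

```python
import shlex


def scribe_argv(prompt=None, words=()):
    """Build a one-shot scribe command line with optional biasing hints."""
    # Flags from this README: terminal frontend, no menu, print only.
    argv = ["scribe", "--frontend", "terminal", "--no-interactive",
            "--mode", "terminal"]
    if prompt:
        argv += ["--prompt", prompt]
    if words:
        argv += ["--words", *words]
    return argv


argv = scribe_argv(prompt="Stand-up notes for the infra team",
                   words=["Kubernetes", "etcd", "Grafana"])
# shlex.join quotes the prompt so the line can be pasted into a shell.
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv)` would start a recording. Whether the transcript can be captured from standard output that way is not stated here, so treat that part as untested.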


## Backends at a glance

| Backend         | `--backend` | Default model              | Streaming model(s)        | Requires                            |
|-----------------|-------------|----------------------------|---------------------------|-------------------------------------|
| Groq (cloud)    | `groq`      | `whisper-large-v3-turbo`   | —                         | `GROQ_API_KEY`                      |
| OpenAI (cloud)  | `openai`    | `gpt-4o-mini-transcribe`   | `gpt-realtime-whisper`    | `OPENAI_API_KEY`                    |
| Whisper (local) | `whisper`   | `small`                    | —                         | `pip install scribe-cli[whisper]`   |
| Vosk (local)    | `vosk`      | language-dependent         | all Vosk models           | `pip install scribe-cli[vosk]`      |

Whether a transcription appears live as you speak or all at once when
you stop depends on the **model** picked — see
[docs/backends.md](docs/backends.md).
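
The "Streaming model(s)" column of the table can be read as a simple lookup. The sketch below only transcribes that column; the names `STREAMING_MODELS` and `is_streaming`, and the `"*"` marker for "every model", are inventions of this example rather than anything scribe exposes.

```python
# Streaming-capable models per backend, copied from the table above.
# "*" is this sketch's own marker meaning "every model of that backend".
STREAMING_MODELS = {
    "groq": set(),
    "openai": {"gpt-realtime-whisper"},
    "whisper": set(),
    "vosk": {"*"},
}


def is_streaming(backend, model):
    """Return True if the table lists this backend/model pair as streaming."""
    models = STREAMING_MODELS[backend]
    return "*" in models or model in models


for backend, model in [("openai", "gpt-4o-mini-transcribe"),
                       ("openai", "gpt-realtime-whisper")]:
    print(backend, model, is_streaming(backend, model))
```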


### Getting an API key

Groq is a good cloud backend to start with — very fast, quite accurate, and the
**free tier** is generous enough for everyday dictation. Sign up at
[console.groq.com](https://console.groq.com/), create an API key
under **Settings → API Keys**, and export it as `GROQ_API_KEY`.

I personally use [OpenAI](https://openai.com/api/) with
`gpt-4o-mini-transcribe`: it is also fast, and perhaps more accurate
for my accented English.


## Documentation

- [Installation & dependencies](docs/installation.md) — PortAudio,
  extras, Ubuntu / GNOME tray libs.
- [Backends in detail](docs/backends.md) — model lists, when to pick
  which, the realtime model.
- [Keyboard modes & typer backends](docs/keyboard.md) — keystroke vs
  clipboard, Wayland / `eitype`, `--type-direct`.
- [System tray & global hotkeys](docs/tray.md) — menu tree, icon
  states, `SIGUSR1`/`SIGUSR2`.
- [Desktop entry & autostart (`scribe-install`)](docs/desktop-install.md)
  — GNOME / KDE launcher integration.
- [Fine tuning & CLI reference](docs/cli.md) — every `scribe --help`
  flag with examples.

## Compatibility

Initially developed for Python 3 on Ubuntu 24.04 (GNOME + Wayland);
works on macOS and Windows too. Wayland keystroke injection is
convoluted but [solved](docs/keyboard.md). For the platform requirements
of individual subsystems, see the documentation of `pynput` (keyboard)
and `pystray` (tray icon).
