Metadata-Version: 2.4
Name: yazses
Version: 0.4.1
Summary: Local, offline voice dictation for Linux, macOS, and Windows — hold a key, speak, release
Project-URL: Homepage, https://github.com/novafabric/yazses
Project-URL: Repository, https://github.com/novafabric/yazses
Project-URL: Bug Tracker, https://github.com/novafabric/yazses/issues
Author-email: Mohsen Seyedkazemi Moghadam <mohsen.seyedkazemi@gmail.com>
License: Apache-2.0
License-File: LICENSE
Keywords: accessibility,dictation,linux,speech-to-text,voice,whisper
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: No Input/Output (Daemon)
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.11
Requires-Dist: evdev>=1.9.3; sys_platform == 'linux'
Requires-Dist: faster-whisper>=1.2.1
Requires-Dist: numpy>=2.4.5
Requires-Dist: pillow>=12.2.0; sys_platform == 'win32'
Requires-Dist: platformdirs>=4.9.6
Requires-Dist: pyobjc-framework-applicationservices>=12.1; sys_platform == 'darwin'
Requires-Dist: pyobjc-framework-avfoundation>=12.1; sys_platform == 'darwin'
Requires-Dist: pyobjc-framework-cocoa>=12.1; sys_platform == 'darwin'
Requires-Dist: pyobjc-framework-quartz>=12.1; sys_platform == 'darwin'
Requires-Dist: pystray>=0.19.5; sys_platform == 'win32'
Requires-Dist: pywin32>=311; sys_platform == 'win32'
Requires-Dist: rumps>=0.4.0; sys_platform == 'darwin'
Requires-Dist: sounddevice>=0.5.5
Requires-Dist: typer>=0.25.1
Provides-Extra: all
Requires-Dist: bleak>=3.0.2; extra == 'all'
Requires-Dist: llama-cpp-python>=0.3.23; extra == 'all'
Requires-Dist: pygls>=2.1.1; extra == 'all'
Requires-Dist: pynvim>=0.6.0; extra == 'all'
Requires-Dist: pyserial>=3.5; extra == 'all'
Provides-Extra: ble
Requires-Dist: bleak>=3.0.2; extra == 'ble'
Provides-Extra: emg
Requires-Dist: pyserial>=3.5; extra == 'emg'
Provides-Extra: lsp
Requires-Dist: pygls>=2.1.1; extra == 'lsp'
Requires-Dist: pynvim>=0.6.0; extra == 'lsp'
Provides-Extra: slm
Requires-Dist: llama-cpp-python>=0.3.23; extra == 'slm'
Description-Content-Type: text/markdown

# YazSes

Local, offline voice dictation for **Linux**, **macOS**, and **Windows**. Hold a key, speak, release — the transcribed text appears in whatever app is focused. No cloud, no GPU.

[![Tests](https://github.com/novafabric/yazses/actions/workflows/test.yml/badge.svg)](https://github.com/novafabric/yazses/actions/workflows/test.yml)
[![PyPI](https://img.shields.io/pypi/v/yazses)](https://pypi.org/project/yazses/)
[![Get it from the Snap Store](https://snapcraft.io/en/dark/install.svg)](https://snapcraft.io/yazses)
[![Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)

```
Hold the dictation key (>0.5s) → speak → release → text appears
```

Powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CPU/int8). Works in browsers, terminals, IDEs, chat apps — anywhere the OS lets keystrokes reach the focused window.

---

## Supported platforms

| OS      | Hotkey default | Install                                 | Status |
|---------|----------------|-----------------------------------------|--------|
| Linux   | `Space`        | apt / snap / PPA / pipx / .deb / installer | Stable |
| macOS   | `Right Option` | `.dmg` (Homebrew Cask coming)           | Developer preview (unsigned) |
| Windows | `Right Ctrl`   | `.exe` installer (winget coming)        | Developer preview (unsigned) |

> **Why Right Ctrl on Windows, not Right Alt?** On many international layouts Right Alt acts as **AltGr** — used to type `@`, `€`, `{}`, `[]`, `\`, `~`, etc. Hijacking it would break normal typing. Right Ctrl is rarely used for typing, so it's the safer default. Every platform's hotkey is configurable in `config.toml`.

---

## Quick install

One-line install on every major OS:

```sh
# macOS  — via Homebrew tap
brew tap novafabric/yazses && brew install --cask yazses

# Windows  — via winget (pending PR review at microsoft/winget-pkgs#371427)
winget install NovaFabric.YazSes

# Linux  — via the apt repo
bash <(curl -fsSL https://raw.githubusercontent.com/novafabric/yazses/main/install.sh)

# Cross-platform fallback: pipx
pipx install yazses
```

After install:

| OS      | What's left                                                                                  |
|---------|----------------------------------------------------------------------------------------------|
| macOS   | Right-click → Open the first time (unsigned dev preview); grant **Accessibility** + **Microphone** when prompted; hold **Right Option** to dictate. |
| Windows | If SmartScreen warns, click **More info → Run anyway** (unsigned dev preview); hold **Right Ctrl** to dictate. |
| Linux   | `sudo usermod -aG input "$USER"` then re-login; `systemctl --user enable --now yazses.service`; hold **Space** to dictate. |

Full per-OS guides: [`docs/macos-install.md`](docs/macos-install.md), [`docs/windows-install.md`](docs/windows-install.md). Status of every distribution channel lives in [`docs/distribution-status.md`](docs/distribution-status.md).

### Other channels

If a one-liner above doesn't fit your environment, pick from the platform sections below.

#### macOS — alternatives

```sh
# Direct .dmg download (no Homebrew needed)
# https://github.com/novafabric/yazses/releases/latest
# Open the .dmg, drag YazSes.app into /Applications, right-click → Open the first time.
```

#### Windows — alternatives

```powershell
# Direct .exe download
# https://github.com/novafabric/yazses/releases/latest
# Click "More info → Run anyway" if SmartScreen warns.
```

#### Linux — alternatives

```bash
# APT repo (Debian/Ubuntu)
curl -fsSL https://novafabric.github.io/yazses/apt/KEY.gpg \
  | sudo gpg --dearmor --yes -o /usr/share/keyrings/yazses.gpg
echo "deb [signed-by=/usr/share/keyrings/yazses.gpg] https://novafabric.github.io/yazses/apt ./" \
  | sudo tee /etc/apt/sources.list.d/yazses.list
sudo apt update && sudo apt install yazses

# Launchpad PPA (Ubuntu)
sudo add-apt-repository ppa:novafabric/yazses
sudo apt update && sudo apt install yazses

# Snap (works on most distros after `snapd` is installed)
sudo snap install yazses --classic

# AUR (Arch / Manjaro / EndeavourOS)
yay -S yazses          # any AUR helper

# .deb download
# https://github.com/novafabric/yazses/releases/latest
sudo apt install ./yazses_*.deb

# pipx (any Linux)
sudo apt install libportaudio2 xdotool xclip pipx
pipx install yazses
```

---

## Optional extras

v0.4.0 introduces three opt-in feature groups. Install only what you need:

| Extra          | What it enables                                              | Dependencies installed          |
|----------------|--------------------------------------------------------------|---------------------------------|
| `yazses[slm]`  | SLM intent routing — natural phrasing for voice commands     | llama-cpp-python + GGUF model   |
| `yazses[lsp]`  | LSP code context injection — better identifier accuracy      | pygls, pynvim                   |
| `yazses[emg]`  | EMG silent speech backend — dictate without speaking aloud   | pyserial                        |
| `yazses[all]`  | All optional extras                                          | all of the above, plus bleak    |

```sh
pip install "yazses[slm]"        # SLM routing only
pip install "yazses[lsp]"        # LSP context only
pip install "yazses[emg]"        # EMG backend only
pip install "yazses[all]"        # everything
```

Each extra requires additional setup described in the [Configuration](#configuration) section below.

---

## Usage

YazSes runs silently in the background. The same CLI works on every platform.

| Command                        | What it does                                  |
|--------------------------------|-----------------------------------------------|
| Hold the hotkey, speak, release | Transcribe and inject text into focused app  |
| `yazses status`             | Daemon state, model, hotkey, backend, uptime  |
| `yazses start` / `stop`     | Manage the daemon                             |
| `yazses doctor`             | Per-platform prerequisite check               |
| `yazses inject "hello"`     | Type text without recording (debug)           |
| `yazses remote <host>`      | Forward voice typing to a remote SSH host     |
| `yazses remote --stop`      | Disconnect active remote session              |
| `yazses enroll`             | Calibration wizard for VAD / silence settings |

On macOS and Windows the **YazSes tray icon** changes color to reflect state (idle / recording / transcribing / remote / error).

### Voice commands (v0.4.0)

Speak natural commands while `[commands] enabled = true` (default). v0.4.0 adds a Tier 2 SLM routing layer (requires `yazses[slm]`) that handles natural, varied phrasing — you no longer need to say the exact canonical form:

| Say (examples)                               | Action                             |
|----------------------------------------------|------------------------------------|
| "undo" / "undo 3 times"                      | Ctrl+Z (×N)                        |
| "save file" / "save this" / "save it"        | Ctrl+S                             |
| "delete 2 words"                             | Ctrl+Backspace ×2                  |
| "delete 3 lines"                             | Delete 3 lines                     |
| "go to line 42"                              | Ctrl+G → "42" → Enter              |
| "comment selection"                          | Ctrl+/                             |
| "copy" / "paste"                             | Ctrl+C / Ctrl+V                    |
| "scratch that" / "delete that"               | Remove text back to last sentence  |
| "close this tab" / "close the current tab"   | Ctrl+W (SLM Tier 2)                |
| "zoom in" / "make this bigger"               | Ctrl++ (SLM Tier 2)                |

Without `yazses[slm]`, the Tier 1 regex grammar handles a fixed set of canonical phrases. With it, the SLM layer catches anything the regex misses, at the cost of ~50–200 ms additional latency per utterance.

Everything that does not match a command intent is typed verbatim.
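
The Tier 1 behaviour described above (match a fixed set of canonical phrases, type everything else verbatim) can be sketched in a few lines. This is an illustrative sketch, not YazSes's actual grammar: the `GRAMMAR` table, the `classify` function, and the key-chord strings are all assumptions made for the example.

```python
import re

# Hypothetical grammar: (compiled pattern, key chord). Not the project's real table.
GRAMMAR = [
    (re.compile(r"^undo(?: (\d+) times?)?$"), "ctrl+z"),
    (re.compile(r"^save (?:file|this|it)$"), "ctrl+s"),
    (re.compile(r"^delete (\d+) words?$"), "ctrl+backspace"),
    (re.compile(r"^copy$"), "ctrl+c"),
    (re.compile(r"^paste$"), "ctrl+v"),
]


def classify(utterance: str) -> tuple[str, str, int]:
    """Return ("command", chord, repeat) or ("text", utterance, 1)."""
    # Normalise case and drop trailing punctuation added by the transcriber.
    normalized = utterance.strip().lower().rstrip(".!?")
    for pattern, chord in GRAMMAR:
        match = pattern.match(normalized)
        if match:
            # An optional captured number is the repeat count; default to once.
            count = match.group(1) if pattern.groups else None
            return ("command", chord, int(count) if count else 1)
    # No command intent matched: the text is typed verbatim.
    return ("text", utterance, 1)
```

In this sketch an utterance such as "Undo 3 times." maps to the `ctrl+z` chord repeated three times, while "hello world" falls through unchanged. A Tier 2 router would only be consulted for utterances that reach the final `return`.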

---

## Configuration

`config.toml` lives in the platform's standard config dir:

| OS      | Path                                                      |
|---------|-----------------------------------------------------------|
| Linux   | `~/.config/yazses/config.toml`                         |
| macOS   | `~/Library/Application Support/yazses/config.toml`     |
| Windows | `%APPDATA%\yazses\config.toml`                         |

```toml
[stt]
model = "tiny.en"   # tiny.en (fast) | base.en (more accurate, slower)

[hotkey]
# "auto" → Space (Linux) / right_option (macOS) / right_ctrl (Windows).
key = "auto"
hold_threshold_ms = 500

[audio]
sample_rate = 16000
max_record_seconds = 90

[tray]
enabled = "auto"   # default true on macOS/Windows, false on Linux v0

[general]
log_level = "INFO"

# --- v0.3.0 additions (all optional; defaults shown) ---

[commands]
enabled = true          # voice command grammar (undo, save, go to line N, …)
profile = "auto"        # "auto" | "vscode" | "vim" | "default"

# v0.4.0: Tier 2 SLM routing (optional; requires `pip install "yazses[slm]"`)
# Download a GGUF model separately: TinyLlama (~700 MB) or Phi-3-mini (~2.2 GB).
slm_model_path = ""              # e.g. ~/.cache/yazses/models/tinyllama.gguf
slm_confidence_threshold = 0.75  # fall back to verbatim text below this score

# v0.4.0: LSP code context injection (optional; requires `pip install "yazses[lsp]"`)
# Connects to Neovim or VS Code via LSP and feeds the active file's language,
# scope, and identifier list into Whisper's initial_prompt, which significantly
# improves transcription accuracy for code identifiers spoken aloud.
lsp_enabled = false
lsp_editor = "auto"              # auto | neovim | vscode

[filters.disfluency]
enabled = true          # remove filler words, repeated phrases, "scratch that"

[accessibility]
vad_threshold = 0.01    # silence threshold; run `yazses enroll` to calibrate
min_silence_ms = 500    # minimum silence to end a recording
pre_speech_padding_ms = 200   # prepend ring-buffer audio to catch voice onset

[streaming]
enabled = true          # emit stable partial transcripts while you speak
partial_interval_ms = 300

[remote]
default_host = ""       # SSH host for `yazses remote`
ssh_port = 22
agent_port = 9875
key_file = ""           # path to SSH private key (optional)

# --- v0.4.0 additions (all optional; defaults shown) ---
# The v0.4.0 [commands] keys are listed in the [commands] table above, because
# TOML does not allow the same table to be declared twice.

[emg]
# EMG silent speech backend (optional; requires `pip install yazses[emg]` + device)
# Supported devices: YESP-protocol USB serial EMG headphones/wristbands.
# When active, replaces the hotkey-hold trigger — muscle signals start/stop capture.
device_port = ""                 # e.g. /dev/ttyUSB0, COM3
baud_rate = 115200
mode = "command"                 # command | full_text

# Map EMG gesture labels to voice-command strings (processed by the same
# grammar/SLM pipeline as spoken commands):
# [emg.command_map]
# save = "save file"
# undo = "undo"
```

---

## How it works

```
                   ┌─────────────────────┐
                   │  EMG backend        │  ← v0.4.0 (optional)
                   │  (YESP USB serial)  │
                   └──────────┬──────────┘
                              │ (alternative trigger)
┌──────────────┐   ┌──────────▼───────┐   ┌──────────────────────────────┐
│ Hotkey hook  │──▶│ Audio (16kHz     │──▶│ faster-whisper (CPU / int8)  │
│ (per-OS API) │   │  PortAudio)      │   │                              │◀──┐
└──────────────┘   └──────────────────┘   └──────────────┬───────────────┘   │
                                                         │                   │
                                          ┌──────────────▼───────────────┐   │
                                          │  LspContextProvider          │───┘
                                          │  (injects initial_prompt)    │  ← v0.4.0 (optional)
                                          └──────────────────────────────┘
                                                         │
                                          ┌──────────────▼───────────────┐
                                          │  disfluency filter           │  ← v0.3.0
                                          │  clean_text                  │
                                          └──────────────┬───────────────┘
                                                         │
                                          ┌──────────────▼───────────────┐
                                          │  Tier 1: grammar classifier  │  ← v0.3.0
                                          │  (regex, zero latency)       │
                                          └──────────────┬───────────────┘
                                                         │ (unmatched intents)
                                          ┌──────────────▼───────────────┐
                                          │  Tier 2: SLM router          │  ← v0.4.0 (optional)
                                          │  (llama-cpp-python + GGUF)   │
                                          └──────────────┬───────────────┘
                                                         │
                                          ┌──────────────▼───────────────┐
                                          │  Text injector               │
                                          │  (local or SSH remote)       │  ← v0.3.0
                                          └──────────────────────────────┘
       │
       └─────────── daemon process ──────────────────────────────────────
                          ▲
              JSON-RPC over Unix socket / named pipe
                          │
                ┌─────────┴─────────┐
                │     CLI / tray    │
                └───────────────────┘
```

Every platform-specific surface (keyboard hook, text injection, autostart, IPC, paths, permissions, tray) lives behind a single Protocol-based abstraction in `src/yazses/platform/`. Adding another platform is a matter of writing one more sub-package.
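
The general shape of a Protocol-based platform layer can be sketched as follows. The names here (`TextInjector`, `RecordingInjector`, `BACKENDS`, `get_injector`) are hypothetical and do not mirror the real contents of `src/yazses/platform/`; the stand-in backend records text instead of sending keystrokes so the example stays runnable anywhere.

```python
import sys
from typing import Protocol


class TextInjector(Protocol):
    """Hypothetical interface: anything that can type text into the focused app."""

    def inject(self, text: str) -> None: ...


class RecordingInjector:
    """Stand-in backend that records text instead of sending keystrokes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.typed: list[str] = []

    def inject(self, text: str) -> None:
        self.typed.append(text)


# One factory per sys.platform value; a new platform adds one more entry.
BACKENDS = {
    "linux": lambda: RecordingInjector("linux"),
    "darwin": lambda: RecordingInjector("macos"),
    "win32": lambda: RecordingInjector("windows"),
}


def get_injector(platform: str = sys.platform) -> TextInjector:
    """Pick the backend for the given platform string."""
    try:
        return BACKENDS[platform]()
    except KeyError:
        raise NotImplementedError(f"no backend for platform {platform!r}") from None
```

Because `TextInjector` is a structural `Protocol`, a backend only needs a matching `inject` method; it does not have to inherit from a shared base class.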

### Remote voice forwarding (v0.3.0)

```
Local machine                             Remote machine
─────────────────────────────────────     ──────────────────────
microphone → daemon → transcript ──SSH──▶ yazses-agent → injector
                                  tunnel       (types into remote app)
```

Start: `yazses remote user@remote-host`  
Stop:  `yazses remote --stop`

Only the transcript text travels over SSH — audio never leaves the local machine.
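
To illustrate how small a text-only payload is, here is a sketch of one possible framing: each transcript sent as a newline-terminated JSON line. The wire format, the field names, and both function names are assumptions for illustration; the actual protocol between the daemon and `yazses-agent` is not documented here.

```python
import json


def encode_transcript(text: str, seq: int) -> bytes:
    """Serialize one transcript as a single newline-terminated JSON line."""
    message = {"type": "transcript", "seq": seq, "text": text}
    # json.dumps escapes embedded newlines, so "\n" is a safe frame delimiter.
    return json.dumps(message, ensure_ascii=False).encode("utf-8") + b"\n"


def decode_transcripts(buffer: bytes) -> tuple[list[str], bytes]:
    """Split a receive buffer into complete transcripts and leftover bytes."""
    # Everything after the last newline is an incomplete frame; keep it for later.
    *lines, rest = buffer.split(b"\n")
    texts = [json.loads(line)["text"] for line in lines if line]
    return texts, rest
```

The receiver keeps `rest` and prepends it to the next chunk read from the tunnel, so a frame split across two reads is still decoded once it is complete.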

### LSP code context injection (v0.4.0)

When `lsp_enabled = true`, YazSes connects to the running Neovim or VS Code LSP server and queries the active buffer for its language, current scope, and visible symbol names. This list is passed to faster-whisper as `initial_prompt`, biasing the model toward the identifiers actually present in the file. In practice this eliminates most transcription errors on camelCase and snake_case names spoken aloud.

Requires `pip install "yazses[lsp]"` and a running editor with an active LSP session.
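
Turning editor context into a prompt string can be sketched as below. `build_initial_prompt`, its parameters, the prompt wording, and the `max_chars` cap are assumptions for illustration and not the logic of `LspContextProvider`; the only library fact relied on is that faster-whisper's `WhisperModel.transcribe` accepts an `initial_prompt` argument.

```python
def build_initial_prompt(language: str, scope: str, symbols: list[str],
                         max_chars: int = 400) -> str:
    """Build a short prompt listing identifiers, capped at max_chars."""
    header = f"{language} code, in {scope}. Identifiers: "
    kept: list[str] = []
    seen: set[str] = set()
    length = len(header)
    for name in symbols:
        if name in seen:
            continue  # skip duplicate identifiers
        # +2 accounts for the ", " separator between names.
        extra = len(name) + (2 if kept else 0)
        if length + extra > max_chars:
            break  # stop at the first name that would exceed the cap
        kept.append(name)
        seen.add(name)
        length += extra
    return header + ", ".join(kept)


# Usage sketch (not executed here; `model` and `audio` are placeholders):
#   prompt = build_initial_prompt("Python", "class Foo", ["parse_args", "maxRetries"])
#   segments, info = model.transcribe(audio, initial_prompt=prompt)
```

The cap matters because the model only attends to a limited prompt window, so putting the most relevant symbols first is more useful than sending every name in the file.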

### EMG silent speech (v0.4.0)

When an EMG device is configured, muscle-signal onset/offset replaces the hotkey-hold trigger. Audio is captured normally; the user does not need to speak aloud — the EMG envelope alone gates recording. This is useful in open-plan offices or wherever speaking is impractical.

Supported protocol: YESP (USB CDC serial). Hardware examples: YESP-1 EMG headband, compatible wristbands. The `mode = "full_text"` setting attempts continuous dictation; `mode = "command"` maps gesture labels via `[emg.command_map]`.

Requires `pip install "yazses[emg]"` and a compatible device.
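
Envelope gating in general can be sketched as: rectify the signal, smooth it with a moving average, and use two thresholds (hysteresis) to emit start/stop events. This is a generic illustration and not the YESP protocol or the YazSes backend; `gate_events`, the window size, and both thresholds are assumed values on an arbitrary normalised scale.

```python
from collections import deque


def gate_events(samples, window=4, on_threshold=0.5, off_threshold=0.2):
    """Yield (index, "start" | "stop") when the smoothed envelope crosses thresholds."""
    recent = deque(maxlen=window)
    active = False
    for index, sample in enumerate(samples):
        recent.append(abs(sample))            # rectify the raw sample
        envelope = sum(recent) / len(recent)  # moving-average envelope
        if not active and envelope >= on_threshold:
            active = True
            yield index, "start"
        elif active and envelope <= off_threshold:
            # The lower "off" threshold avoids chattering near the onset level.
            active = False
            yield index, "stop"
```

In a pipeline like the one described above, a "start" event would play the role of the hotkey press and a "stop" event the role of its release.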

---

## Build from source

```bash
git clone https://github.com/novafabric/yazses
cd yazses
uv sync
uv run pytest tests/ -v   # 246 tests across all platforms
```

Platform-specific installers:

```bash
# macOS — produces dist/YazSes-<v>.dmg
./scripts/build-macos.sh

# Windows (run from PowerShell): produces dist/YazSes-<v>-windows-x64.exe
./scripts/build-windows.ps1

# Linux .deb
./scripts/build-deb.sh
```

CI builds the unsigned `.dmg` and `.exe` on every PR that touches the relevant code paths.

---

## Troubleshooting

- `yazses doctor` — first stop. Tells you what's missing on the current OS.
- **macOS**: see [`docs/macos-install.md`](docs/macos-install.md) for Gatekeeper / Accessibility / Microphone.
- **Windows**: see [`docs/windows-install.md`](docs/windows-install.md) for SmartScreen / antivirus / privacy.
- **Linux**: confirm you're in the `input` group; check `journalctl --user -u yazses.service -f`.

---

## License

Apache 2.0 — see [LICENSE](LICENSE).
