Dictation

Dictation runtime setup

The dictation pipeline uses an LLM runtime for block classification and the optional project-context rewrite. Project Facts enrichment is deterministic and needs no model. Pick a local model backend, or point HoldSpeak at an OpenAI-compatible endpoint. The model names below are suggestions, not requirements; you can bring your own (see docs/MODELS.md).

Apple Silicon

recommended on M-series

Use MLX on Apple Silicon when available. The model below is a suggestion; any MLX chat model works.

Install + download (example model).
uv pip install -e '.[dictation-mlx]'

# Example — swap for any current MLX chat model:
huggingface-cli download mlx-community/Qwen3.5-8B-MLX-4bit --local-dir ~/Models/mlx/Qwen3.5-8B-MLX-4bit

Default model path: ~/Models/mlx/Qwen3.5-8B-MLX-4bit

OpenAI-compatible endpoint

local server or hosted API

Use openai_compatible for LM Studio, Ollama's OpenAI bridge, vLLM, llama.cpp server, LiteLLM, or hosted OpenAI-compatible APIs.

Install + configure endpoint.
uv pip install -e '.[dictation-openai]'

# Example config slice:
# "dictation": {
#   "pipeline": { "enabled": true },
#   "runtime": {
#     "backend": "openai_compatible",
#     "openai_compatible_base_url": "http://127.0.0.1:8000/v1",
#     "openai_compatible_model": "qwen3.5-8b-instruct",
#     "openai_compatible_api_key_env": "OPENAI_API_KEY",
#     "openai_compatible_timeout_seconds": 8
#   }
# }

HoldSpeak needs no local GGUF or MLX model file for this backend; the endpoint owns model loading and serving.
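As an illustration of how the config slice above might be consumed, the sketch below turns the runtime keys into a request URL, headers, and timeout. It assumes that openai_compatible_api_key_env names an environment variable holding the key (rather than the key itself) and that requests go to the standard /chat/completions route. The helper name resolve_endpoint is hypothetical and not part of HoldSpeak, and the fallback timeout of 8 simply mirrors the example value.

```python
from __future__ import annotations

import os
from typing import Mapping


def resolve_endpoint(
    runtime: Mapping[str, object],
    env: Mapping[str, str] = os.environ,
) -> tuple[str, dict[str, str], float]:
    """Hypothetical helper: derive (url, headers, timeout) from the runtime slice."""
    # Tolerate a trailing slash on the configured base URL.
    base_url = str(runtime["openai_compatible_base_url"]).rstrip("/")
    url = f"{base_url}/chat/completions"

    headers = {"Content-Type": "application/json"}
    # Assumption: the config stores the NAME of an env var, not the secret.
    key_var = runtime.get("openai_compatible_api_key_env")
    api_key = env.get(str(key_var)) if key_var else None
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"

    # Assumption: fall back to the example value when the key is absent.
    timeout = float(runtime.get("openai_compatible_timeout_seconds", 8))
    return url, headers, timeout
```

Local servers such as LM Studio or a llama.cpp server often accept requests without a key, so the sketch simply leaves the Authorization header off when the variable is unset.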

Set openai_compatible_timeout_seconds to cap rewrite latency. If the endpoint times out or returns malformed output, HoldSpeak preserves the original transcript and reports the failure in dry-run/readiness output.
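The preserve-on-failure behavior described above can be sketched roughly as follows. This is not HoldSpeak's actual code: rewrite_or_preserve and the call_endpoint callable are hypothetical names, and the response shape (choices[0].message.content) is assumed from the standard OpenAI chat completions format.

```python
from __future__ import annotations

import json
from typing import Callable


def rewrite_or_preserve(
    transcript: str,
    call_endpoint: Callable[[str, float], str],
    timeout_seconds: float = 8.0,
) -> tuple[str, str | None]:
    """Return (text, failure_reason); on any failure, text is the original transcript."""
    try:
        # call_endpoint is a hypothetical callable that returns the raw JSON
        # body and raises TimeoutError when the deadline is exceeded.
        raw = call_endpoint(transcript, timeout_seconds)
        payload = json.loads(raw)
        text = payload["choices"][0]["message"]["content"]
        if not isinstance(text, str) or not text.strip():
            raise ValueError("empty or non-string content")
    except TimeoutError:
        return transcript, "timeout"
    except (ValueError, KeyError, IndexError, TypeError) as exc:
        # json.JSONDecodeError is a ValueError, so invalid JSON lands here too.
        return transcript, f"malformed output: {exc}"
    return text, None
```

Returning the failure reason alongside the text lets the caller surface it in a report while the user still gets the untouched transcript.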

Cross-platform GGUF

Linux x86_64 · macOS fallback

Use llama_cpp on Linux x86_64, or as a fallback on macOS. On macOS arm64, build with Metal enabled (CMAKE_ARGS="-DGGML_METAL=on"); on Linux x86_64, omit that flag. The model below is a suggestion; any GGUF chat model works.

Install + download (example model).
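# macOS arm64 (Metal build):
CMAKE_ARGS="-DGGML_METAL=on" uv pip install -e '.[dictation-llama]'
# Linux x86_64: run the same install without CMAKE_ARGS:
# uv pip install -e '.[dictation-llama]'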

mkdir -p ~/Models/gguf
# Example — swap for any current GGUF chat model:
huggingface-cli download bartowski/Qwen3.5-4B-Instruct-GGUF \
  Qwen3.5-4B-Instruct-Q4_K_M.gguf --local-dir ~/Models/gguf --local-dir-use-symlinks False

Default model path: ~/Models/gguf/Qwen3.5-4B-Instruct-Q4_K_M.gguf
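The platform guidance in the sections above can be summarized as a small decision function. This is one reading of the recommendations, not HoldSpeak's own selection logic. The name suggest_backend is hypothetical, the "mlx" backend identifier is assumed (only llama_cpp and openai_compatible appear in this guide), and checking for an importable mlx module is just one way to interpret "when available".

```python
import importlib.util
import platform


def suggest_backend(system: str, machine: str, mlx_available: bool) -> str:
    """Map a platform to the backend this guide recommends for it."""
    if system == "Darwin" and machine == "arm64" and mlx_available:
        return "mlx"  # assumed identifier; not confirmed by this guide
    if system == "Darwin" or (system == "Linux" and machine == "x86_64"):
        return "llama_cpp"  # Linux x86_64, or the macOS fallback
    # Anything else: defer to an endpoint that loads and serves the model.
    return "openai_compatible"


def detect_backend() -> str:
    """Apply suggest_backend to the machine this script runs on."""
    mlx_available = importlib.util.find_spec("mlx") is not None
    return suggest_backend(platform.system(), platform.machine(), mlx_available)
```

An OpenAI-compatible endpoint remains a valid choice on every platform; the function only reflects which local backend the guide points to first.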

Enable runtime

Enable the dictation pipeline from the Runtime tab, or set dictation.pipeline.enabled in ~/.config/holdspeak/config.json.
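For scripted setups, the config edit could look something like the sketch below. It assumes the file is plain JSON with the nested dictation.pipeline.enabled key shown in the earlier config slice; enable_pipeline and save_enabled are hypothetical helper names, not part of HoldSpeak.

```python
import copy
import json
from pathlib import Path

CONFIG_PATH = Path("~/.config/holdspeak/config.json").expanduser()


def enable_pipeline(config: dict) -> dict:
    """Return a copy of config with dictation.pipeline.enabled set to True."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    pipeline = updated.setdefault("dictation", {}).setdefault("pipeline", {})
    pipeline["enabled"] = True
    return updated


def save_enabled(path: Path = CONFIG_PATH) -> None:
    """Read the config (if present), enable the pipeline, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(enable_pipeline(config), indent=2) + "\n")
```

Calling save_enabled() with no arguments targets the default config path; other keys, such as the runtime block, are carried over unchanged.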

Readiness and doctor output only shows the commands to run; neither installs packages nor downloads models.