Dictation
Dictation runtime setup
The dictation pipeline uses an LLM runtime for block
classification and the optional project-context rewrite. Project
Facts enrichment is deterministic and needs no model. Pick a local
model backend or point HoldSpeak at an OpenAI-compatible endpoint.
The model names below are suggestions, not requirements.
Bring your own (see docs/MODELS.md).
Apple Silicon
recommended on M-seriesUse MLX on Apple Silicon when available. The model below is a suggestion — any MLX chat model works; bring your own.
uv pip install -e '.[dictation-mlx]'
# Example — swap for any current MLX chat model:
huggingface-cli download mlx-community/Qwen3.5-8B-MLX-4bit --local-dir ~/Models/mlx/Qwen3.5-8B-MLX-4bit
Default model path: ~/Models/mlx/Qwen3.5-8B-MLX-4bit
OpenAI-compatible endpoint
local server or hosted API
Use openai_compatible for LM Studio, Ollama's
OpenAI bridge, vLLM, llama.cpp server, LiteLLM, or hosted
OpenAI-compatible APIs.
uv pip install -e '.[dictation-openai]'
# Example config slice:
# "dictation": {
# "pipeline": { "enabled": true },
# "runtime": {
# "backend": "openai_compatible",
# "openai_compatible_base_url": "http://127.0.0.1:8000/v1",
# "openai_compatible_model": "qwen3.5-8b-instruct",
# "openai_compatible_api_key_env": "OPENAI_API_KEY",
# "openai_compatible_timeout_seconds": 8
# }
# } No GGUF/MLX file is required in HoldSpeak; the endpoint owns model loading and serving.
Set openai_compatible_timeout_seconds to cap rewrite latency.
If the endpoint times out or returns malformed output, HoldSpeak preserves
the original transcript and reports the failure in dry-run/readiness output.
Cross-platform GGUF
linux x86_64 · macOS fallback
Use llama_cpp on Linux x86_64, or as a macOS
fallback. On macOS arm64, build with Metal flags. The model
below is a suggestion — any GGUF chat model works.
CMAKE_ARGS="-DGGML_METAL=on" uv pip install -e '.[dictation-llama]'
mkdir -p ~/Models/gguf
# Example — swap for any current GGUF chat model:
huggingface-cli download bartowski/Qwen3.5-4B-Instruct-GGUF \
Qwen3.5-4B-Instruct-Q4_K_M.gguf --local-dir ~/Models/gguf --local-dir-use-symlinks False
Default model path: ~/Models/gguf/Qwen3.5-4B-Instruct-Q4_K_M.gguf
Enable runtime
Enable the dictation pipeline from the
Runtime tab, or set
dictation.pipeline.enabled in
~/.config/holdspeak/config.json.
Readiness and doctor only show commands; they do not install packages or download models.