Quick start
voice-copilot runs a local HTTP reverse proxy on port 8766. Point your CLI's base-URL env var at it: traffic is forwarded transparently to the upstream provider, and events are extracted live from the responses.
    # standalone proxy mode (no wrapped subprocess)
    voice-copilot proxy

    # or wrap the CLI directly; env vars are injected automatically
    voice-copilot run claude -p "implement feature X"
    voice-copilot run codex "add dark mode"
CLI agents
    ANTHROPIC_BASE_URL=http://127.0.0.1:8766/anthropic \
      claude -p "your prompt"
Also works with claude --output-format stream-json for structured events.
    OPENAI_BASE_URL=http://127.0.0.1:8766/openai/v1 \
      codex "your prompt"
    # Anthropic backend
    ANTHROPIC_BASE_URL=http://127.0.0.1:8766/anthropic \
      aider --model claude-sonnet-4-5

    # OpenAI backend
    OPENAI_BASE_URL=http://127.0.0.1:8766/openai/v1 \
      aider --model gpt-4o

    # Ollama (local)
    OPENAI_BASE_URL=http://127.0.0.1:8766/ollama/v1 \
      aider --model ollama/qwen2.5-coder:32b
    GEMINI_BASE_URL=http://127.0.0.1:8766/gemini \
      gemini "your prompt"
Gemini passthrough is supported, but events are not yet parsed: the session appears in the dropdown, but Trace stays empty.
MiniMax M2.5 Free, Kimi, GLM, Qwen and other Zen models
    # In CLI Proxy → OpenCode choose route: opencode-zen
    # voice-copilot injects a temporary runtime override for OpenCode:
    OPENCODE_CONFIG_CONTENT={"provider":{"opencode":{"options":{"baseURL":"http://127.0.0.1:8766/opencode-zen"}}}}
Use this for OpenCode Zen models such as opencode/minimax-m2.5-free. This is different from OPENAI_BASE_URL because OpenCode reads its provider base URL from runtime config.
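If you launch OpenCode yourself instead of through voice-copilot, the same override can presumably be set by hand. The sketch below builds the JSON with `printf` so the quoting survives the shell; `ZEN_PROXY_PORT` is a hypothetical helper variable, not something voice-copilot or OpenCode reads.

```shell
# Hypothetical helper variable; defaults to the proxy's standard port.
ZEN_PROXY_PORT="${ZEN_PROXY_PORT:-8766}"

# Build the same JSON override that voice-copilot injects, with the port filled in.
# Single quotes keep the JSON's double quotes intact; %s is replaced by the port.
export OPENCODE_CONFIG_CONTENT="$(printf \
  '{"provider":{"opencode":{"options":{"baseURL":"http://127.0.0.1:%s/opencode-zen"}}}}' \
  "$ZEN_PROXY_PORT")"

# Show the value that OpenCode will see.
printf '%s\n' "$OPENCODE_CONFIG_CONTENT"
```

Start OpenCode from the same shell afterwards so it inherits the variable.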
OpenAI-compatible tools (Continue, Open WebUI, LM Studio, Cursor, etc.)
    # Set the base URL in the tool's settings to:
    http://127.0.0.1:8766/openai/v1

    # Or via env var:
    OPENAI_BASE_URL=http://127.0.0.1:8766/openai/v1
No token setup needed — uses the CLI's existing auth
    # 1. Install Copilot CLI and log in once:
    copilot login

    # 2. In Settings → Commentator, select "copilot-cli"
    #    Set Model to one of:
    #      gpt-5-mini  gpt-4.1  gpt-4.1-mini  gpt-4o  o3-mini
    #
    # The commentator calls `copilot -p "…" -s --allow-all --model <model>`
    # Each narration = one subprocess call (~1–2 s latency).
gh copilot / any tool using Copilot API
    OPENAI_BASE_URL=https://api.githubcopilot.com \
    OPENAI_API_KEY=<your-github-oauth-token> \
      gh copilot suggest "…"

    # For the commentator LLM, select "github-copilot" in Settings → Commentator.
    # Token auto-discovered from VS Code / gh auth login, or set manually:
    #   API keys tab → GITHUB_COPILOT_TOKEN
    # Available models (set in Commentator → Model field):
    #   gpt-4.1     gpt-4.1-mini  gpt-4.1-nano
    #   gpt-5-mini  gpt-4o        gpt-4o-mini
    #   o1          o3-mini       claude-3.5-sonnet
    # OpenRouter
    OPENROUTER_BASE_URL=http://127.0.0.1:8766/openrouter/v1

    # Groq
    GROQ_BASE_URL=http://127.0.0.1:8766/groq/v1

    # Mistral
    MISTRAL_BASE_URL=http://127.0.0.1:8766/mistral/v1
LLM provider routes
| Provider | Proxy path | Upstream | Parser |
|---|---|---|---|
| Anthropic | /anthropic/… | api.anthropic.com | SSE (Messages API) |
| OpenAI | /openai/v1/… | api.openai.com | SSE (Chat Completions) |
| Ollama compat | /ollama/v1/… | 127.0.0.1:11434 | SSE (OpenAI format) |
| Ollama native | /ollama/api/… | 127.0.0.1:11434 | NDJSON (/api/chat) |
| OpenRouter | /openrouter/v1/… | openrouter.ai/api | SSE |
| Groq | /groq/v1/… | api.groq.com/openai | SSE |
| Mistral | /mistral/v1/… | api.mistral.ai | SSE |
| OpenCode Zen | /opencode-zen/… | opencode.ai/zen/v1 | auto (messages / responses / chat) |
| Gemini | /gemini/… | generativelanguage.googleapis.com | passthrough |
The proxy forwards your original Authorization / x-api-key / x-goog-api-key headers unchanged. No keys are stored.
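To see header pass-through in action without a CLI agent, a plain `curl` call against the Anthropic route should behave like a direct API call. This is a minimal sketch: it assumes the proxy strips the `/anthropic` prefix and forwards the rest of the path upstream, and the function name `ask_via_proxy` is made up for illustration.

```shell
# Proxy route for Anthropic, taken from the table above.
PROXY_ANTHROPIC="http://127.0.0.1:8766/anthropic"

# Send one Messages API request through the proxy.
# $1 = prompt text (keep it free of double quotes; this sketch does no JSON escaping).
ask_via_proxy() {
  curl -sS "${PROXY_ANTHROPIC}/v1/messages" \
    -H "x-api-key: ${ANTHROPIC_API_KEY}" \
    -H "anthropic-version: 2023-06-01" \
    -H "content-type: application/json" \
    -d "{\"model\":\"claude-sonnet-4-5\",\"max_tokens\":64,\"messages\":[{\"role\":\"user\",\"content\":\"$1\"}]}"
}
```

With the proxy running and `ANTHROPIC_API_KEY` exported, `ask_via_proxy "Say hello"` sends the key in the `x-api-key` header exactly as a direct call would; the proxy only relays it.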
Custom port
    voice-copilot proxy --proxy-port 9000
    # Then use port 9000 in all env vars above.
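One way to avoid editing every URL by hand is to derive the base-URL variables from a single port setting. The snippet below is a sketch: `PROXY_PORT` and `PROXY_ORIGIN` are hypothetical helper variables that voice-copilot does not read, and the paths are copied from the routes listed above.

```shell
# Hypothetical helper variable; must match the value passed to --proxy-port.
PROXY_PORT="${PROXY_PORT:-9000}"
PROXY_ORIGIN="http://127.0.0.1:${PROXY_PORT}"

# Base-URL env vars used in the examples above, rebuilt for the chosen port.
export ANTHROPIC_BASE_URL="${PROXY_ORIGIN}/anthropic"
export OPENAI_BASE_URL="${PROXY_ORIGIN}/openai/v1"
export GEMINI_BASE_URL="${PROXY_ORIGIN}/gemini"
export OPENROUTER_BASE_URL="${PROXY_ORIGIN}/openrouter/v1"
export GROQ_BASE_URL="${PROXY_ORIGIN}/groq/v1"
export MISTRAL_BASE_URL="${PROXY_ORIGIN}/mistral/v1"
```

Source the snippet in the shell where you start your CLI agent, so the agent inherits the variables.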
Ollama thinking models
Models that stream a thinking field (e.g. qwq, deepseek-r1) are handled automatically: thinking content appears in Trace as purple THINKING blocks and is included in the narration prompt. Use the native Ollama path (/ollama/api/chat) via the litellm ollama_chat/… prefix, or the OpenAI-compat path, where thinking arrives in the reasoning / reasoning_content delta fields.
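A quick way to exercise the native path by hand is a streaming `curl` request through the proxy. This sketch assumes a local Ollama version that accepts the `think` request flag and has the named model pulled; the function name `stream_thinking` is invented for illustration.

```shell
# Native Ollama route through the proxy, from the provider table.
PROXY_OLLAMA="http://127.0.0.1:8766/ollama"

# Stream one chat turn as NDJSON; each line is a JSON chunk that may carry
# a message.thinking field alongside message.content.
# $1 = model name, $2 = prompt (no JSON escaping is done in this sketch).
stream_thinking() {
  curl -sS -N "${PROXY_OLLAMA}/api/chat" \
    -H "content-type: application/json" \
    -d "{\"model\":\"$1\",\"stream\":true,\"think\":true,\"messages\":[{\"role\":\"user\",\"content\":\"$2\"}]}"
}
```

For example, `stream_thinking deepseek-r1 "Why is the sky blue?"` should print the raw chunks in the terminal while the same thinking content shows up in Trace.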