Shunt is a small, self-hosted router with an OpenAI-compatible API. It classifies every request (coding, reasoning, light chat) and forwards it to an open-source model priced for the job, starting at $0.01 per million output tokens.
    pip install shunt && shunt serve

    # your code, unchanged except the base URL
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8484/v1")
    client.chat.completions.create(
        model="shunt/auto",
        messages=[{"role": "user", "content": "Refactor this function..."}],
    )
Every track is plain YAML in shunt.yaml. Reorder models, change prices, add providers — Shunt ships with sensible defaults and gets out of the way.
| Track | Used for | First choice | Output price ($/M tokens) |
|---|---|---|---|
| light | Classification, extraction, short replies | Qwen3 8B | 0.01 |
| general | Standard chat and writing | DeepSeek V4 Flash | 0.20 |
| coding | Everyday code generation and edits | Qwen3 Coder 30B | 0.27 |
| coding_heavy | Large refactors, difficult debugging | DeepSeek V4 Pro | 0.87 |
| reasoning | Math, logic, multi-step planning | Kimi K2.5 | 1.90 |
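To get a feel for what these prices mean in practice, here is a rough cost estimate in Python. The prices are copied from the table; the monthly token mix is a made-up example, and input-token pricing is not covered here, so treat the result as a lower bound on the bill.

```python
# Output prices in USD per million tokens, copied from the table above.
OUTPUT_PRICE_PER_M = {
    "light": 0.01,
    "general": 0.20,
    "coding": 0.27,
    "coding_heavy": 0.87,
    "reasoning": 1.90,
}


def output_cost(track: str, output_tokens: int) -> float:
    """Estimated output cost in USD for one track (input tokens not included)."""
    return OUTPUT_PRICE_PER_M[track] * output_tokens / 1_000_000


# Hypothetical monthly mix of output tokens per track (illustrative numbers only).
monthly_tokens = {
    "light": 50_000_000,
    "general": 20_000_000,
    "coding": 10_000_000,
}

total = sum(output_cost(track, n) for track, n in monthly_tokens.items())
print(f"${total:.2f}")  # → $7.20
```

Sending the same 80 million output tokens to the reasoning track alone would cost about $152 at the listed price, which is the gap routing is meant to close.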
Self-hosted by default. Shunt runs on your machine with your provider keys. Prompts never pass through infrastructure you didn't choose.
Boring and inspectable. One Python process, one YAML config, one SQLite file. No dashboard SaaS, no telemetry, no account.
Cheap means cheap. Routing overhead is a few milliseconds and the classifier never calls a paid model unless you opt in.
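As a rough picture of how a request can be classified locally without calling any model, here is a toy keyword router. It is purely illustrative: the keyword lists and the `pick_track` function are hypothetical and say nothing about how Shunt's own classifier works. Only the track names come from the table.

```python
# Hypothetical illustration only; this is NOT Shunt's classifier.
# Tracks are checked in order, so more specific tracks come first.
TRACK_KEYWORDS = {
    "coding_heavy": ("refactor", "debug", "stack trace"),
    "coding": ("function", "code", "bug", "compile"),
    "reasoning": ("prove", "step by step", "plan", "solve"),
    "light": ("classify", "extract"),
}


def pick_track(prompt: str) -> str:
    """Return the first track whose keywords appear in the prompt, else 'general'."""
    text = prompt.lower()
    for track, keywords in TRACK_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return track
    return "general"


print(pick_track("Refactor this function..."))  # → coding_heavy
print(pick_track("Write a poem about autumn"))  # → general
```

A lookup like this is plain string matching on your own machine, which is why this style of classification adds no per-request cost.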
    $ pip install shunt
    $ export OPENROUTER_API_KEY=sk-or-...
    $ shunt serve
    listening on http://localhost:8484/v1
    5 tracks loaded from shunt.yaml
Then set base_url in any OpenAI client and use model="shunt/auto". That's the whole integration.
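If you are not using an OpenAI SDK, the same call can be sketched with the Python standard library. This assumes Shunt serves the standard OpenAI `/chat/completions` path under the base URL and returns the usual response shape, which is what "OpenAI-compatible" normally implies; `build_chat_request` is a helper name invented for this example, and any authentication header your setup needs is left out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8484/v1"  # address printed by `shunt serve`


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at the local router."""
    body = {
        "model": "shunt/auto",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("Refactor this function...")

# Sending it requires a running server, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Building the request separately from sending it makes it easy to inspect the exact payload before pointing real traffic at the router.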