shunt

Most prompts don't need a frontier model. Shunt sends each one to the cheapest model that can handle it.

Shunt is a small, self-hosted router with an OpenAI-compatible API. It classifies every request — coding, reasoning, light chat — and forwards it to an open-source model priced for the job, from $0.01 per million tokens.

pip install shunt && shunt serve

How it works

  1. Point your client at Shunt. It speaks the OpenAI API, so existing SDKs, agents, and tools work unchanged — you only swap the base URL.
  2. Each prompt is classified. A fast local heuristic (with an optional small-model fallback) tags the request: coding, heavy reasoning, general chat, or trivial.
  3. The request is switched to a track. Each track is an ordered chain of open-source models, cheapest capable first. Provider down or rate-limited? Shunt falls through to the next model without surfacing an error.
  4. Spend is recorded. A local SQLite ledger tracks tokens and cost per route, and what the same traffic would have cost on a frontier model.
# your code, unchanged except the base URL
client = OpenAI(base_url="http://localhost:8484/v1")

client.chat.completions.create(
    model="shunt/auto",
    messages=[{"role": "user", "content": "Refactor this function..."}],
)

Default routes

Every track is plain YAML in shunt.yaml. Reorder models, change prices, add providers — Shunt ships with sensible defaults and gets out of the way.


TrackUsed forFirst choiceOutput, $/M
lightClassification, extraction, short repliesQwen3 8B0.01
generalStandard chat and writingDeepSeek V4 Flash0.20
codingEveryday code generation and editsQwen3 Coder 30B0.27
coding_heavyLarge refactors, difficult debuggingDeepSeek V4 Pro0.87
reasoningMath, logic, multi-step planningKimi K2.51.90

Principles

Self-hosted by default. Shunt runs on your machine with your provider keys. Prompts never pass through infrastructure you didn't choose.

Boring and inspectable. One Python process, one YAML config, one SQLite file. No dashboard SaaS, no telemetry, no account.

Cheap means cheap. Routing overhead is a few milliseconds and the classifier never calls a paid model unless you opt in.

Quickstart

$ pip install shunt
$ export OPENROUTER_API_KEY=sk-or-...
$ shunt serve
listening on http://localhost:8484/v1
5 tracks loaded from shunt.yaml

Then set base_url in any OpenAI client and use model="shunt/auto". That's the whole integration.