Metadata-Version: 2.4
Name: claude-code-llm-router
Version: 6.11.0
Summary: Multi-LLM router MCP server for Claude Code — smart complexity routing, Claude subscription monitoring, Codex integration, 20+ providers
Project-URL: Homepage, https://github.com/ypollak2/llm-router
Project-URL: Repository, https://github.com/ypollak2/llm-router
Project-URL: Issues, https://github.com/ypollak2/llm-router/issues
Project-URL: Changelog, https://github.com/ypollak2/llm-router/blob/main/CHANGELOG.md
Author-email: ypollak2 <ypollak2@users.noreply.github.com>
License-Expression: MIT
License-File: LICENSE
Keywords: ai,claude,claude-code,claude-desktop,cost-optimization,gemini,litellm,llm,llm-router,mcp,mcp-server,model-routing,openai,router
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries
Requires-Python: <3.14,>=3.10
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: aiosqlite>=0.20.0
Requires-Dist: fastapi>=0.100.0
Requires-Dist: litellm>=1.50.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: pydantic-settings>=2.0
Requires-Dist: pydantic>=2.0
Requires-Dist: structlog>=24.4.0
Requires-Dist: uvicorn>=0.24.0
Provides-Extra: agno
Requires-Dist: agno>=2.5.14; extra == 'agno'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-timeout>=2.3; extra == 'dev'
Requires-Dist: pytest-xdist>=3.5; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: scripts
Requires-Dist: httpx>=0.27; extra == 'scripts'
Requires-Dist: pyyaml>=6.0; extra == 'scripts'
Provides-Extra: tracing
Requires-Dist: opentelemetry-exporter-otlp-proto-grpc>=1.20; extra == 'tracing'
Requires-Dist: opentelemetry-sdk>=1.20; extra == 'tracing'
Description-Content-Type: text/markdown

![LLM Router](docs/llm-router-header.png)

> Route every AI call to the cheapest model that can do the job well.
> 48 tools · 20+ providers · personal routing memory · budget caps, dashboards, traces.

[![PyPI](https://img.shields.io/pypi/v/claude-code-llm-router?style=flat-square)](https://pypi.org/project/claude-code-llm-router/)
[![Tests](https://img.shields.io/github/actions/workflow/status/ypollak2/llm-router/ci.yml?style=flat-square&label=tests)](https://github.com/ypollak2/llm-router/actions)
[![Downloads](https://img.shields.io/pypi/dm/claude-code-llm-router?style=flat-square)](https://pypi.org/project/claude-code-llm-router/)
[![Python](https://img.shields.io/badge/python-3.10–3.13-blue?style=flat-square)](https://pypi.org/project/claude-code-llm-router/)
[![MCP](https://img.shields.io/badge/MCP-1.0+-purple?style=flat-square)](https://modelcontextprotocol.io)
[![License](https://img.shields.io/badge/license-MIT-green?style=flat-square)](LICENSE)
[![Stars](https://img.shields.io/github/stars/ypollak2/llm-router?style=flat-square&color=yellow)](https://github.com/ypollak2/llm-router/stargazers)

**Average savings: 60–80% vs running everything on Claude Opus.**

## Install

```bash
pipx install claude-code-llm-router && llm-router install
```

| Host | Command |
|------|---------|
| Claude Code | `llm-router install` |
| VS Code | `llm-router install --host vscode` |
| Cursor | `llm-router install --host cursor` |
| Codex CLI | `llm-router install --host codex` |
| Gemini CLI | `llm-router install --host gemini-cli` |

## Supported Development Tools

llm-router works as an MCP server inside any tool that supports MCP, providing unified routing across your entire development environment.

| Tool | Status | What You Get |
|------|--------|--------------|
| **Claude Code** | ✅ Full | Auto-routing hooks + session tracking + quota display |
| **Gemini CLI** | ✅ Full | Auto-routing hooks + session tracking + quota display |
| **Codex CLI** | ✅ Full | Auto-routing hooks + savings tracking |
| **VS Code + Copilot** | ✅ MCP | llm-router tools available (routing is model-voluntary) |
| **Cursor** | ✅ MCP | llm-router tools available (routing is model-voluntary) |
| **OpenCode** | ✅ MCP | llm-router tools available (routing is model-voluntary) |
| **Windsurf** | ✅ MCP | llm-router tools available (routing is model-voluntary) |
| **Any MCP-compatible tool** | ⚡ Manual | Add llm-router to your tool's MCP config |

### Full Support vs MCP Support

**Full support** = auto-routing hooks fire before the model answers, enforcing your routing policy.
**MCP support** = tools are available, but the model chooses whether to use them.

### Quick Setup by Tool

#### Claude Code
```bash
pipx install claude-code-llm-router
llm-router install
```
Then in Claude Code, `llm_route` and friends appear as built-in tools. Your settings control the profile (budget/balanced/premium).

#### Gemini CLI
```bash
pipx install claude-code-llm-router
llm-router install --host gemini-cli
```
Gemini CLI users get the full routing experience: auto-routing suggestions, quota display, and free-first chaining (Ollama → Codex → Gemini CLI → paid).

#### Codex CLI
```bash
pipx install claude-code-llm-router
llm-router install --host codex
```
Codex integrates deeply into the routing chain as a free fallback whenever your OpenAI subscription is available.

#### VS Code / Cursor / Others
```bash
pipx install claude-code-llm-router
llm-router install --host vscode  # or --host cursor
```
The MCP server loads automatically. Tools appear in your IDE's model UI.

## What It Does

llm-router intercepts prompts and routes them to the cheapest model that can handle the task. Most AI sessions are full of low-value work: file lookups, small edits, quick questions. That work burns through expensive models unnecessarily.

llm-router keeps cheap work on cheap or free models and escalates to premium models only when needed. No micromanagement required.

- Works in: Claude Code, Cursor, VS Code, Codex, Windsurf, Zed, claw-code, Agno
- Free-first: Ollama (local) → Codex → Gemini Flash → OpenAI → Claude (subscription)

## Mental Model

Think of llm-router as a **smart task dispatcher**. When you ask a question:

1. **Analyze** — What kind of task is this? (simple lookup vs. complex reasoning)
2. **Choose** — Which model can handle this best *and* cheapest?
3. **Check Constraints** — Are we over budget? Is this model degraded?
4. **Execute** — Send to that model

The dispatcher learns over time: if a model starts performing poorly (judge scores drop), it gets demoted in future decisions. If you're running low on quota (budget pressure), it automatically uses cheaper models. You don't manage any of this—it just happens behind the scenes.

**Example:** "Explain this error message" → Simple task → Route to Haiku (fast, cheap) → Done. vs. "Refactor this complex architecture" → Complex task → Route to Opus (expensive but thorough) → Done.

The savings come from not using Opus for every question.
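The four steps above can be sketched as a small decision function. This is an illustrative sketch only: the keyword heuristic, length cutoff, and model names here are assumptions, not llm-router's actual classifier.

```python
# Illustrative sketch of the dispatcher's "analyze, then choose" steps.
# The keyword hints, length threshold, and model names are assumptions,
# not the real llm-router implementation.

COMPLEX_HINTS = ("refactor", "architecture", "design", "migrate")

def classify(prompt: str) -> str:
    """Rough complexity heuristic: keyword hints plus prompt length."""
    if any(hint in prompt.lower() for hint in COMPLEX_HINTS) or len(prompt) > 2000:
        return "complex"
    return "simple"

def choose_model(prompt: str) -> str:
    """Route simple tasks to a cheap model, complex ones to a premium model."""
    return "opus" if classify(prompt) == "complex" else "haiku"
```

With this sketch, "Explain this error message" classifies as simple and stays on the cheap model, while "Refactor this complex architecture" escalates to the premium one.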

## New in v6.10 — Automatic Model Evaluator & Kimi-K2.6 Integration

- **Model evaluator system** — Benchmarks all available models (Ollama, Codex, APIs) weekly
  - Tests on reasoning + code tasks, scores by quality + latency
  - 7-day cache TTL, auto-runs during session-end hook
  - Manual evaluation via `llm_model_eval` MCP tool
- **Kimi-K2.6:cloud** integrated as primary code specialist
  - 256K context window (2x qwen3.5), specialized for autonomous execution
  - Auto-selected for code-heavy tasks (refactor, debug, implement, test)
  - Fallback chain: qwen3-coder-next → qwen3.5
- **Profile-aware dynamic routing** 
  - Auto-detect available services (Ollama, API keys, subscriptions)
  - Token-wise tier organization (free local → free subscriptions → cheap APIs → expensive)
  - Quota pressure awareness with real-time deprioritization (≥85% usage)
  - Periodic service scanning (1-hour TTL)

This enables **per-user routing** that respects each user's unique setup, plus **automatic performance optimization** via weekly model benchmarking.

## New in v6.9 — Gemini CLI Integration

- **Gemini CLI as free routing provider** — 1,500 requests/day via Google One AI Pro
- **Smart insertion into free-first chain** — Ollama → Codex → Gemini CLI → paid APIs
- **Context-aware routing** — Prioritizes Gemini CLI on high budget pressure, code tasks
- **New `llm_gemini` MCP tool** for direct Gemini CLI invocation
- **Session tracking & quota display** — Daily usage meter, savings summary at session-end
- **Auto-route hook** for Gemini CLI with complexity classification

See [CHANGELOG.md](CHANGELOG.md) for full version history.

## How It Works

```
User Prompt
    ↓
[Complexity Classifier] — Haiku/Sonnet/Opus?
    ↓
[Free-First Router] — Ollama → Codex → Gemini Flash → OpenAI → Claude
    ↓
[Budget Pressure Check] — Downshift if over 85% budget
    ↓
[Quality Guard] — Demote if judge score < 0.6
    ↓
Selected Model → Execute
```
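The two guard stages in the diagram can be sketched as threshold checks over the free-first chain. Only the 85% budget-pressure threshold and the 0.6 judge-score floor come from the pipeline above; the helper name and the exact downshift behavior are assumptions.

```python
# Sketch of the budget and quality guards (hypothetical helper; only the
# 0.85 budget-pressure and 0.6 judge-score thresholds come from the docs).

BUDGET_PRESSURE = 0.85   # downshift when >85% of budget is spent
QUALITY_FLOOR = 0.6      # demote a model whose judge score drops below this

# Cheapest-first candidate chain, mirroring the free-first order above.
CHAIN = ["ollama", "codex", "gemini-flash", "openai", "claude"]

def apply_guards(candidate: str, budget_used: float,
                 judge_scores: dict[str, float]) -> str:
    """Downshift one tier under budget pressure, then skip demoted models."""
    idx = CHAIN.index(candidate)
    if budget_used >= BUDGET_PRESSURE and idx > 0:
        idx -= 1  # one step cheaper under budget pressure
    # Walk toward cheaper models until one passes the quality floor.
    while idx > 0 and judge_scores.get(CHAIN[idx], 1.0) < QUALITY_FLOOR:
        idx -= 1
    return CHAIN[idx]
```

For example, a task routed to Claude at 90% budget usage would land on the next-cheaper healthy model instead.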

## Configuration

Zero-config by default if you use Claude Code Pro/Max (subscription mode).

Optional env vars:
```bash
OPENAI_API_KEY=sk-...                   # GPT-4o, o3
GEMINI_API_KEY=AIza...                  # Gemini Flash (free tier)
OLLAMA_BASE_URL=http://localhost:11434  # Local Ollama (free)
LLM_ROUTER_PROFILE=balanced             # budget|balanced|premium
LLM_ROUTER_COMPRESS_RESPONSE=true       # Enable response compression
```
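The variables above could be read with a plain-stdlib pattern like the following. This is a minimal sketch: llm-router itself uses pydantic-settings, and the defaults shown here (e.g. compression off) are assumptions.

```python
import os

# Minimal sketch of reading the optional settings above. llm-router itself
# uses pydantic-settings; the fallback defaults here are assumptions.
def load_settings() -> dict:
    profile = os.environ.get("LLM_ROUTER_PROFILE", "balanced")
    if profile not in ("budget", "balanced", "premium"):
        raise ValueError(f"unknown profile: {profile}")
    return {
        "openai_api_key": os.environ.get("OPENAI_API_KEY"),
        "gemini_api_key": os.environ.get("GEMINI_API_KEY"),
        "ollama_base_url": os.environ.get("OLLAMA_BASE_URL",
                                          "http://localhost:11434"),
        "profile": profile,
        "compress_response": os.environ.get(
            "LLM_ROUTER_COMPRESS_RESPONSE", "false").lower() == "true",
    }
```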

For full setup guide, see [docs/SETUP.md](docs/SETUP.md).

## MCP Tools (48 total)

**Routing:**
- `llm_route` — Route task to optimal model
- `llm_classify` — Classify task complexity
- `llm_quality_guard` — Monitor model health

**Text:**
- `llm_query`, `llm_research`, `llm_generate`, `llm_analyze`, `llm_code`

**Media:**
- `llm_image`, `llm_video`, `llm_audio`

**Admin:**
- `llm_usage`, `llm_savings`, `llm_budget`, `llm_health`, `llm_providers`

**Advanced:**
- `llm_orchestrate` — Multi-step pipelines
- `llm_setup` — Configure provider keys
- `llm_policy` — Routing policy management

[Full tool reference](docs/TOOLS.md) — Complete documentation for all 48 tools

## Architecture

See [CLAUDE.md](CLAUDE.md) for:
- Design decisions
- Module organization
- Development workflow
- Release process

See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for:
- Three-layer compression pipeline
- Judge scoring system
- Quality trend tracking
- Budget pressure algorithm

## Development

```bash
uv run pytest tests/ -q          # Run tests
uv run ruff check src/ tests/    # Lint
uv run llm-router --version      # Check version
```

## License

MIT — See [LICENSE](LICENSE)

## Support

- Issues: [GitHub Issues](https://github.com/ypollak2/llm-router/issues)
- Discussions: [GitHub Discussions](https://github.com/ypollak2/llm-router/discussions)
- Releases: [PyPI](https://pypi.org/project/claude-code-llm-router/)
