Metadata-Version: 2.4
Name: smart-llm-router
Version: 0.1.2
Summary: Provider-agnostic LLM router. Pick the cheapest capable model per prompt with rule-based scoring. Wraps LiteLLM for format conversion + streaming.
Author-email: Huanzhou Huang <huanzhou.huang@netmind.ai>
License: MIT
Project-URL: Homepage, https://github.com/protagolabs/smart-llm-router
Project-URL: Issues, https://github.com/protagolabs/smart-llm-router/issues
Project-URL: Repository, https://github.com/protagolabs/smart-llm-router
Keywords: llm,router,litellm,openrouter,openai,anthropic,gemini,smart-routing,cost-optimization
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: litellm[proxy]>=1.50.0
Requires-Dist: typer>=0.12.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: pydantic>=2.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: ruff>=0.6; extra == "dev"
Requires-Dist: mypy>=1.10; extra == "dev"
Dynamic: license-file

# smart-llm-router

Provider-agnostic LLM router. Pick the cheapest capable model per prompt with rule-based scoring. Wraps [LiteLLM](https://github.com/BerriAI/litellm) for format conversion, streaming, tool calls, and 100+ provider integrations.

## Why

Most LLM proxies route based on a model name you pick. This one **picks the model for you**, locally, in <1ms, with no ML model in the loop: it scores the prompt across 14 dimensions (code presence, reasoning markers, multi-step patterns, multilingual keywords, etc.) and maps the score to one of four tiers (SIMPLE / MEDIUM / COMPLEX / REASONING).

You bring an upstream: OpenRouter, Together, Fireworks, Groq, Anthropic direct, vLLM, Ollama, or anything else OpenAI-compatible. The router does the rest.

## Install

```bash
pip install smart-llm-router
```

Two console scripts ship with the package: `smart-llm-router` (full name) and `slr` (short alias).

## Quick start with OpenRouter (default upstream)

```bash
# 1. Get an OpenRouter key at https://openrouter.ai/keys
export OPENROUTER_API_KEY=sk-or-v1-...
export LITELLM_MASTER_KEY=sk-anything    # gates the proxy itself

# 2. Start the proxy on :4000 (uses bundled OpenRouter config by default)
smart-llm-router start
```

In another terminal, any OpenAI-compatible client works:

```python
from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:4000/v1", api_key="sk-anything")

# Smart routing — rule-based scorer picks the cheapest capable model
resp = client.chat.completions.create(
    model="smart/auto",
    messages=[{"role": "user", "content": "prove that sqrt(2) is irrational step by step"}],
)
# → routed to REASONING tier (e.g. deepseek/deepseek-r1)
```

Or curl:

```bash
curl http://127.0.0.1:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-anything" \
  -H "Content-Type: application/json" \
  -d '{"model":"smart/auto","messages":[{"role":"user","content":"hi"}]}'
```
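
Because the proxy wraps LiteLLM for streaming, the same endpoint can also be consumed incrementally. The following is a minimal sketch, assuming the proxy from the quick start is running and that it returns standard OpenAI-style streaming chunks. `collect_stream` is a hypothetical helper name, not part of this package.

```python
def collect_stream(client, prompt, model="smart/auto"):
    """Stream a chat completion and return the concatenated text.

    `client` is any object exposing the OpenAI SDK's
    `chat.completions.create` interface.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        # Some chunks carry no choices or no text; skip those.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)


# Usage against the running proxy (requires `pip install openai`):
#   from openai import OpenAI
#   client = OpenAI(base_url="http://127.0.0.1:4000/v1", api_key="sk-anything")
#   print(collect_stream(client, "hi"))
```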

## Inspect routing without dispatching

```bash
slr test "what is the capital of france"
# → SIMPLE / google/gemini-2.5-flash-lite / 100% savings vs claude-sonnet-4.6

slr test "Prove that sqrt(2) is irrational step by step"
# → REASONING / deepseek/deepseek-r1 / 90% savings

slr test "design a high-availability microservices architecture" --profile premium
# → COMPLEX / anthropic/claude-opus-4.7

slr models --profile auto    # show the tier→model table
```
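
How `slr test` derives the savings percentage is not spelled out here. One plausible reading, offered purely as an assumption, is the relative price difference against a baseline model, rounded to a whole percent. The prices below are placeholders chosen for illustration, not real quotes, and `savings_pct` is a hypothetical helper.

```python
def savings_pct(chosen_price: float, baseline_price: float) -> int:
    """Percentage saved by the chosen model relative to the baseline, rounded."""
    if baseline_price <= 0:
        raise ValueError("baseline price must be positive")
    return round(100 * (1 - chosen_price / baseline_price))


# Hypothetical prices in USD per million tokens (placeholders only).
BASELINE = 10.0
print(savings_pct(1.0, BASELINE))   # 90
print(savings_pct(0.04, BASELINE))  # 100, because 99.6 rounds up
```

Under this reading, a very cheap model can display as 100% savings without being free.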

## Pointing at a different upstream

The bundled config targets OpenRouter, but anything OpenAI-compatible works (Together, Fireworks, Groq, DeepInfra, vLLM, Ollama, OpenAI direct). Copy the bundled YAML and edit `api_base` / `api_key`:

```bash
# Copy the bundled config to your working directory
python -c "from importlib.resources import files; import shutil; shutil.copy(files('smart_llm_router') / 'default_config.yaml', './smart-llm-router.yaml')"

# Edit smart-llm-router.yaml — swap api_base / api_key per model_list entry
# Then start with --config
smart-llm-router start --config smart-llm-router.yaml
```
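
If the copied file has many `model_list` entries, the swap can be scripted. This is a rough sketch, assuming the bundled YAML follows LiteLLM's usual `model_list` / `litellm_params` layout; the example entry, the vLLM URL, and the `VLLM_API_KEY` variable are illustrative placeholders. It works on an in-memory dict so it runs standalone; in practice you would load and write the file with `yaml.safe_load` and `yaml.safe_dump` (PyYAML is already a dependency).

```python
import copy


def retarget(config: dict, api_base: str, api_key_ref: str) -> dict:
    """Return a copy of `config` with every model_list entry pointed at a new upstream."""
    out = copy.deepcopy(config)
    for entry in out.get("model_list", []):
        params = entry.setdefault("litellm_params", {})
        params["api_base"] = api_base
        params["api_key"] = api_key_ref
    return out


# Illustrative entry only; the real bundled config may differ.
cfg = {
    "model_list": [
        {
            "model_name": "deepseek/deepseek-chat",
            "litellm_params": {
                "model": "openrouter/deepseek/deepseek-chat",
                "api_base": "https://openrouter.ai/api/v1",
                "api_key": "os.environ/OPENROUTER_API_KEY",
            },
        }
    ]
}

new_cfg = retarget(cfg, "http://localhost:8000/v1", "os.environ/VLLM_API_KEY")
print(new_cfg["model_list"][0]["litellm_params"]["api_base"])
```

Depending on the upstream, the provider prefix in `litellm_params.model` may also need to change; the sketch leaves it untouched.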

## Available routing profiles

| `model` value | Behavior |
|---|---|
| `smart/auto` | Rule-based scoring → cheapest capable model |
| `smart/eco` | Rule-based scoring → cheapest tier table (free + lite models) |
| `smart/premium` | Rule-based scoring → quality-first tier table (Claude Sonnet/Opus, GPT-4o, o1) |
| `smart/agentic` | Rule-based scoring → tool-use-friendly tier table (auto-engaged when `tools[]` present) |
| `smart/free` | Restricts routing to free/local models |
| `<provider>/<model>` | Bypasses routing, dispatches directly |

## Pin a specific model (no routing)

Pass a concrete model ID and the router leaves it alone:

```python
client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",   # always Sonnet
    messages=[...]
)

client.chat.completions.create(
    model="anthropic/claude-opus-4.7",     # always Opus
    messages=[...]
)

client.chat.completions.create(
    model="openai/gpt-4o",                 # always GPT-4o
    messages=[...]
)
```

Models pre-wired in the bundled config: `anthropic/claude-haiku-4.5`, `anthropic/claude-sonnet-4.6`, `anthropic/claude-opus-4.6`, `anthropic/claude-opus-4.7`, `openai/gpt-4o`, `openai/gpt-4o-mini`, `openai/o1`, `openai/o3`, `openai/o3-mini`, `openai/o4-mini`, `google/gemini-2.5-flash-lite`, `google/gemini-2.5-flash`, `google/gemini-2.5-pro`, `google/gemini-2.0-flash-lite-001`, `deepseek/deepseek-chat`, `deepseek/deepseek-r1`, `meta-llama/llama-3.3-70b-instruct`. Add more by editing the `model_list` in your config YAML.

## Use with Claude Code

Claude Code respects `ANTHROPIC_BASE_URL`. Point it at the proxy:

```bash
export ANTHROPIC_BASE_URL=http://127.0.0.1:4000
export ANTHROPIC_AUTH_TOKEN=sk-anything   # the proxy's master key
claude
```

Then, inside Claude Code, run `/model anthropic/claude-opus-4.7` to pin Opus, or `/model smart/premium` to let the router pick the best Claude model per request.

## How it works

1. The client sends an OpenAI-, Anthropic-, or Gemini-format request to `localhost:4000`.
2. LiteLLM Proxy parses the request; `SmartRouterHook.async_pre_call_hook` intercepts it.
3. If `model` is a `smart/*` profile, the rule-based router scores the prompt and picks a concrete upstream model ID.
4. LiteLLM dispatches to the configured upstream — handling format conversion, streaming, tool calls, retries, etc.
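
Steps 2 and 3 can be pictured with a standalone stand-in for the hook. This is a simplified sketch, not the package's actual `SmartRouterHook`: the class below is a plain Python class that mirrors the `async_pre_call_hook(user_api_key_dict, cache, data, call_type)` signature of LiteLLM's custom hooks so it runs without LiteLLM installed, and `toy_picker` is a placeholder rule rather than the real scorer.

```python
import asyncio


class SmartRouterHookSketch:
    """Standalone stand-in for a LiteLLM pre-call hook (hypothetical, simplified)."""

    def __init__(self, pick_model):
        # pick_model: callable(profile, messages) -> concrete upstream model ID
        self.pick_model = pick_model

    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        model = data.get("model", "")
        if model.startswith("smart/"):
            profile = model.split("/", 1)[1]
            data["model"] = self.pick_model(profile, data.get("messages", []))
        # Whatever model ID is left in `data` is what gets dispatched.
        return data


def toy_picker(profile, messages):
    # Placeholder rule for illustration only.
    text = " ".join(str(m.get("content", "")) for m in messages).lower()
    return "deepseek/deepseek-r1" if "prove" in text else "google/gemini-2.5-flash-lite"


hook = SmartRouterHookSketch(toy_picker)
request = {
    "model": "smart/auto",
    "messages": [{"role": "user", "content": "prove that sqrt(2) is irrational"}],
}
routed = asyncio.run(hook.async_pre_call_hook(None, None, request, "completion"))
print(routed["model"])  # deepseek/deepseek-r1
```

A request that already names a concrete model passes through the hook unchanged.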

## Routing internals

The classifier (`smart_llm_router/router/rules.py`) scores each prompt across 14 weighted dimensions:

| Dimension | Weight | Detects |
|---|---|---|
| reasoningMarkers | 0.18 | `prove`, `theorem`, `step by step`, `证明`, `теорема`, ... |
| codePresence | 0.15 | ` ``` `, `function`, `class`, `SELECT`, `异步`, ... |
| multiStepPatterns | 0.12 | "first ... then", "step 1", "1. " |
| technicalTerms | 0.10 | `algorithm`, `architecture`, `kubernetes`, ... |
| tokenCount | 0.08 | <50 tokens ⇒ -1, >500 tokens ⇒ +1 |
| creativeMarkers | 0.05 | "write a story/poem" |
| questionComplexity | 0.05 | count of `?` |
| constraintCount | 0.04 | "must", "exactly", "at most" |
| agenticTask | 0.04 | "edit file", "deploy", "install", "verify" |
| imperativeVerbs | 0.03 | "implement", "build", "fix" |
| outputFormat | 0.03 | `json`, `yaml`, `table`, `schema` |
| referenceComplexity | 0.02 | "above", "below", "the docs" |
| domainSpecificity | 0.02 | `quantum`, `fpga`, `homomorphic`, ... |
| simpleIndicators | 0.02 | "what is", "hello" → negative |
| negationComplexity | 0.01 | "not", "without", "except" |

Keyword sets are multilingual (EN, ZH, JA, RU, DE, ES, PT, KO, AR), so the same scorer works across 9 languages without translation.
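
To make the weighted sum concrete, here is a toy scorer over a subset of the dimensions in the table. It is a sketch under stated assumptions, not the code in `rules.py`: the keyword lists are abbreviated, matching is naive substring search, each dimension is assumed to yield a value in [-1, 1], and the hit cap and whitespace token estimate are illustrative choices.

```python
WEIGHTS = {  # subset of the weights listed in the table
    "reasoningMarkers": 0.18,
    "codePresence": 0.15,
    "tokenCount": 0.08,
    "simpleIndicators": 0.02,
}

REASONING = ("prove", "theorem", "step by step", "证明")
CODE = ("function", "class", "select")
SIMPLE = ("what is", "hello")


def keyword_score(text: str, keywords, cap: int = 2) -> float:
    """Distinct keyword hits, capped at `cap` and scaled to [0, 1]."""
    hits = sum(1 for kw in keywords if kw in text)
    return min(hits, cap) / cap


def score_prompt(prompt: str) -> float:
    text = prompt.lower()
    tokens = len(text.split())  # crude whitespace token estimate
    dims = {
        "reasoningMarkers": keyword_score(text, REASONING),
        "codePresence": keyword_score(text, CODE),
        "tokenCount": -1.0 if tokens < 50 else (1.0 if tokens > 500 else 0.0),
        "simpleIndicators": -keyword_score(text, SIMPLE, cap=1),
    }
    return sum(WEIGHTS[name] * value for name, value in dims.items())


print(round(score_prompt("what is the capital of france"), 2))                  # -0.1
print(round(score_prompt("Prove that sqrt(2) is irrational step by step"), 2))  # 0.1
```

In this toy version the proof prompt scores only 0.1 on the weighted sum, which illustrates why a keyword-based override can matter more than the raw score for short reasoning prompts.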

The score maps to a tier through three boundaries:

```
< 0.0   → SIMPLE
0.0-0.3 → MEDIUM
0.3-0.5 → COMPLEX
> 0.5   → REASONING
```
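
The three boundaries translate into a short threshold function. The handling of the exact boundary values is an assumption here (0.0 falls into MEDIUM, 0.3 and 0.5 into COMPLEX), since the ranges as written do not say which side is inclusive; `score_to_tier` is a hypothetical name.

```python
def score_to_tier(score: float) -> str:
    """Map a weighted score to a tier using the boundaries 0.0, 0.3 and 0.5."""
    if score < 0.0:
        return "SIMPLE"
    if score < 0.3:
        return "MEDIUM"
    if score <= 0.5:  # assumption: exactly 0.5 is still COMPLEX
        return "COMPLEX"
    return "REASONING"


for s in (-0.1, 0.1, 0.4, 0.7):
    print(s, score_to_tier(s))
```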

Three hard overrides apply on top of the score: 2+ reasoning keywords ⇒ force REASONING; >100k tokens ⇒ force COMPLEX; a system prompt mentioning `json` or `schema` ⇒ floor at MEDIUM.
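
The overrides can be sketched as a post-processing step on the tier. The order in which they are checked is an assumption (reasoning keywords first, then the token limit, then the MEDIUM floor), as is the plain substring test on the system prompt; `apply_overrides` is a hypothetical name.

```python
TIER_ORDER = ["SIMPLE", "MEDIUM", "COMPLEX", "REASONING"]


def apply_overrides(tier: str, reasoning_hits: int, token_count: int,
                    system_prompt: str = "") -> str:
    """Apply the three hard overrides to a score-derived tier."""
    if reasoning_hits >= 2:
        return "REASONING"
    if token_count > 100_000:
        return "COMPLEX"
    sys_lower = system_prompt.lower()
    wants_structured = "json" in sys_lower or "schema" in sys_lower
    if wants_structured and TIER_ORDER.index(tier) < TIER_ORDER.index("MEDIUM"):
        return "MEDIUM"  # floor: raise SIMPLE, leave higher tiers alone
    return tier


print(apply_overrides("MEDIUM", reasoning_hits=2, token_count=10))      # REASONING
print(apply_overrides("SIMPLE", 0, 10, system_prompt="Reply in JSON"))  # MEDIUM
```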

## Attribution

The 14-dimension rule-based router in `smart_llm_router/router/` is ported from [ClawRouter](https://github.com/BlockRunAI/ClawRouter) (MIT). Format conversion and streaming come from [LiteLLM](https://github.com/BerriAI/litellm) (MIT).

## License

MIT
