Metadata-Version: 2.4
Name: hermes-router
Version: 0.5.3
Summary: ModelRouter: fast deterministic model routing and OpenAI-compatible proxying for custom AI agents
Author: Samuel Behjou
License-Expression: MIT
Project-URL: Homepage, https://github.com/doncazper/model-router
Project-URL: Repository, https://github.com/doncazper/model-router
Project-URL: Issues, https://github.com/doncazper/model-router/issues
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: PyYAML<7,>=6
Provides-Extra: proxy
Requires-Dist: fastapi<1,>=0.115; extra == "proxy"
Requires-Dist: httpx<1,>=0.27; extra == "proxy"
Requires-Dist: uvicorn<1,>=0.30; extra == "proxy"
Provides-Extra: dev
Requires-Dist: pytest<9,>=8; extra == "dev"
Requires-Dist: ruff<1,>=0.8; extra == "dev"
Provides-Extra: release
Requires-Dist: build<2,>=1; extra == "release"
Requires-Dist: twine<7,>=5; extra == "release"
Dynamic: license-file

# ModelRouter

Deterministic, fast, safety-first model routing for custom AI agents.

ModelRouter gives agents one local OpenAI-compatible endpoint that routes each
chat request to the right configured model server. Simple work can go to fast
local models, complex work to stronger reasoning models, fresh research to
research tools, repo work to code models, and risky actions to human
confirmation.

## Use With Your Agent In 3 Minutes

Install the proxy extra:

```bash
pip install "hermes-router[proxy]"
```

Create first-run configs:

```bash
model-router init --preset lmstudio --yes
```

Start the local routing proxy:

```bash
model-router-proxy --config ~/.model-router/routing_proxy.yaml
```

Point any OpenAI-compatible agent/client at:

```text
http://127.0.0.1:8082/v1
```

Useful follow-ups:

```bash
model-router validate-proxy-config --config ~/.model-router/routing_proxy.yaml
model-router doctor --config ~/.model-router/routing_proxy.yaml
curl http://127.0.0.1:8082/health
```

This project is intentionally a decision router only. It does not execute
prompts, call model providers, load local model weights, browse the web, run
shell commands, send messages, delete files, or purchase anything.

## At a Glance

| Need | ModelRouter provides |
| --- | --- |
| Fast hot-path routing | `ModelRouter.route_fast(prompt)` returns an engine string |
| Diagnostic decisions | `ModelRouter.route(prompt)` returns scores, flags, reasons, and alternatives |
| CLI tooling | `decide`, `validate-config`, `dispatch-plan`, and `setup` commands |
| Local/API flexibility | YAML routing targets for local models, hosted APIs, vision, image generation, and custom adapters |
| Safety boundaries | High-risk or invalid requests fail closed to `human_confirm` |
| Setup help | Safe local scans, config recommendations, and opt-in Hugging Face download plans |

## Highlights

- Deterministic heuristic routing with no LLM classification call.
- Fast initialized hot path: `router.route_fast(prompt)` returns only the
  selected engine.
- Rich receipt path: `router.route(prompt)` returns scores, reasons, rejected
  engines, alternatives, requirements, and safety flags.
- YAML-driven engine catalog; model names are not hardcoded throughout the
  router.
- OpenAI-compatible proxy for agents that only know how to call a local AI
  endpoint.
- First-run `model-router init` for local proxy configs.
- User-configurable routing targets for local models, hosted APIs, web/RAG
  tools, vision, image generation, or custom adapters.
- Fail-closed safety: missing/invalid config and high-risk actions route to
  `human_confirm`.
- Declarative availability checks for env vars, commands, and local paths.
- Setup assistant for local/API/mixed model configuration and optional Hugging
  Face download plans.

## Project Status

ModelRouter is a lean production-ready decision layer when embedded through
the initialized Python API. The stable surface today is:

- `ModelRouter.route_fast(...)` for production routing.
- `ModelRouter.route(...)` for diagnostic and audit receipts.
- Config-driven model/agent catalog.
- Safe dry-run dispatch plans.
- Local setup wizard and recommendations.

The local proxy is the main product path for agents. Direct dispatch beyond
OpenAI-compatible chat forwarding remains intentionally behind explicit adapter
boundaries and confirmation gates.

## Install

Requires Python 3.11 or newer.

```bash
git clone https://github.com/doncazper/model-router.git
cd model-router
python -m pip install -e ".[dev]"
```

For normal use from PyPI:

```bash
pip install "hermes-router[proxy]"
```

ModelRouter began as Hermes Router and was renamed after evolving into a
generic OpenAI-compatible routing proxy for local/custom agents. The PyPI
distribution name remains `hermes-router` for compatibility because
`model-router` is already occupied on PyPI. The primary command and Python API
are `model-router`, `model-router-proxy`, and `import model_router`.

If your shell does not provide `python`, use `python3`. If your system Python is
older, use `uv`:

```bash
uv run --python 3.11 --with pytest --with PyYAML python -m pytest
```

## Quick Start

Readable CLI output:

```bash
model-router decide "rewrite this text"
```

JSON receipt:

```bash
model-router decide --json "fix the repo and run tests"
```

Expected default routing examples:

| Prompt | Selected engine |
| --- | --- |
| `rewrite this text` | `fast_local` |
| `summarize these notes` | `balanced_local` |
| `design a distributed task scheduler architecture` | `reasoning_local` |
| `fix the repo and run tests` | `code_agent` |
| `search the web for the latest TypeScript release notes` | `web_research` |
| `extract text from this screenshot` | `multimodal_vision` |
| `generate an image of a router dashboard` | `image_generation` |
| `drop the production database` | `human_confirm` |

## Python API

Initialize once and reuse the router. Runtime calls stay in memory and do not
re-read YAML, scan disk, or run setup helpers.

```python
from model_router import ModelRouter

router = ModelRouter.from_config("configs/model_router.yaml")

# Production hot path: selected engine only.
engine = router.route_fast("fix the repo and run tests")

# Diagnostic path: scores, reasons, rejected engines, alternatives, and flags.
decision = router.route("fix the repo and run tests")

print(engine)
print(decision.requires_code_execution)
```

Use `route_fast(...)` for production routing, live routing loops, UI
responsiveness, and high-volume classification. Use `route(...)` when you need
a receipt, explanation, audit trail, or ranked alternatives. If you need a rich
decision but not ranked alternatives:

```python
decision = router.route("rewrite this text", include_alternatives=False)
```

For one-off scripts, the compatibility function remains available:

```python
from model_router import route_prompt

decision = route_prompt("research current GLP-1 supplement trends")
```

The historical `hermes.plugins.model_router` import path remains available for
backward compatibility, but new custom-agent integrations should use
`model_router`.

## CLI

After installation, you can use the console command:

```bash
model-router decide "rewrite this text"
model-router decide --json "fix the repo and run tests"
```

The old `hermes-router` command remains as a compatibility alias for existing
scripts.

Use a custom catalog:

```bash
model-router decide \
  --config configs/model_router.local.yaml \
  "research current GLP-1 supplement trends"
```

Pass routing hints:

```bash
model-router decide \
  --attachment image \
  --force-engine multimodal_vision \
  --max-cost-tier medium \
  --max-latency-tier medium \
  "summarize this attachment"
```

Validate a config:

```bash
model-router validate-config
model-router validate-config --json
```

Create a dry-run dispatch plan:

```bash
model-router dispatch-plan "fix the repo and run tests"
model-router dispatch-plan --json "rewrite this text"
model-router dispatch-plan --include-alternatives --json "rewrite this text"
```

Dispatch plans only describe what a future adapter would do. They do not execute
models, tools, shell commands, provider calls, or external actions. They skip
ranked alternatives by default for speed; pass `--include-alternatives` when a
full receipt is useful.

## Local Routing Proxy

Most agents can talk to an OpenAI-compatible local endpoint. Install the optional
proxy extra to expose one local endpoint that routes each chat request to the
configured upstream model server:

```bash
model-router init --preset lmstudio --yes
model-router-proxy --config ~/.model-router/routing_proxy.yaml
```

Then point the agent at:

```text
http://127.0.0.1:8082/v1
```

The proxy supports `/v1/chat/completions`, `/v1/models`, and `/health`. It
calls initialized `route_fast(...)` once per chat request, maps the selected
engine to a configured backend, overrides the outgoing backend model, and
forwards to an OpenAI-compatible upstream such as LM Studio, llama.cpp server,
LocalAI, or a frontier gateway. `human_confirm` returns HTTP `409` and is never
forwarded. Tools are preserved by default and can be stripped per backend for
small local models.

Packaged presets:

```bash
model-router init --preset lmstudio --yes
model-router init --preset ollama --yes
model-router init --preset llamacpp --yes
model-router init --preset localai --yes
model-router init --preset hosted-openai-compatible --yes
```

Use `model-router doctor --config ~/.model-router/routing_proxy.yaml` when a
backend is unavailable or a model name/endpoint is wrong.

## Hindsight Routing Logs

The proxy can write privacy-safe JSONL events for calibration and replay:

```yaml
observability:
  enabled: true
  log_path: ~/.model-router/routing-events.jsonl
  prompt_capture: redacted_preview
```

By default events keep a prompt hash, length, estimated tokens, selected engine,
scores, feature flags, backend, fallback status, and latencies. Raw prompts are
not stored unless `prompt_capture: full` or `MODEL_ROUTER_LOG_PROMPTS=1` is set.
Use full capture only during deliberate calibration runs.

When a route is wrong, label it:

```bash
model-router feedback req-123 code_agent --notes "repo prompt routed too small"
```

Replay captured traffic against the current router:

```bash
python scripts/replay_routing_log.py \
  --events ~/.model-router/routing-events.jsonl \
  --feedback ~/.model-router/routing-feedback.jsonl \
  --json
```

Rows without full prompts are skipped for replay but still useful for aggregate
latency, score, fallback, and route distribution analysis.

## Troubleshooting

- Wrong route: enable observability, label the request with
  `model-router feedback`, and replay logs before changing scoring.
- Backend unavailable or wrong model: run `model-router doctor --config
  ~/.model-router/routing_proxy.yaml` and check `/health`. Both diagnostics
  verify backend reachability and, when `/v1/models` returns a model list, that
  each configured backend model is advertised by the upstream server.
- `human_confirm`: the prompt matched a destructive, sending, purchase/payment,
  deployment, or other high-impact action. Use explicit safety overrides only
  in versioned configs.
- Proxy auth: if `proxy.api_key` or `proxy.api_key_env` is configured, clients
  must send `Authorization: Bearer <token>`.
- Logs/replay: default logs do not include raw prompts. Use
  `prompt_capture: full` or `MODEL_ROUTER_LOG_PROMPTS=1` only during deliberate
  calibration runs.

## Example Receipt

```json
{
  "selected_engine": "code_agent",
  "complexity_score": 56,
  "risk_score": 38,
  "confidence_score": 90,
  "fallback_engine": "reasoning_local",
  "requires_confirmation": false,
  "requires_tools": true,
  "requires_freshness": false,
  "requires_code_execution": true,
  "requires_vision": false,
  "requires_image_generation": false,
  "config_valid": true,
  "availability_valid": true,
  "reasons": [
    "coding or repository intent",
    "tool use likely",
    "file, shell, or GitHub operation",
    "coding or repository work"
  ],
  "rejected_engines": [
    {
      "engine": "fast_local",
      "reason": "tools required but engine does not support tools"
    }
  ],
  "alternatives": [
    {
      "engine": "web_research",
      "rank_score": 61,
      "capability": 70,
      "trust": 60,
      "cost": 50,
      "latency": 75,
      "reasons": [
        "capability 70/100",
        "trust 60/100",
        "cost 50/100",
        "latency 75/100"
      ]
    }
  ]
}
```

Receipts intentionally do not include the raw prompt.

## Configure Engines

The default catalog lives at `configs/model_router.yaml`. Machine-specific
settings should go in `configs/model_router.local.yaml` and be passed with
`--config`.

Routing targets map semantic routes to configured engines:

```yaml
routing_targets:
  simple: fast_local
  balanced: balanced_local
  reasoning: reasoning_local
  coding: code_agent
  research: web_research
  vision: multimodal_vision
  image_generation: image_generation
  confirmation: human_confirm
```

Human confirmation is a default-on safety feature. Escape hatches are explicit,
scoped config choices:

```yaml
safety:
  require_human_confirmation: true
  confirmation_overrides:
    allow_destructive_actions: false
    allow_send_actions: false
    allow_purchase_actions: false
    allow_high_impact_external_actions: false
    allow_ambiguous_high_impact: false
```

Each target points at an engine entry:

```yaml
engines:
  claude_code:
    provider: anthropic
    model: claude-code
    adapter: claude_code
    strengths:
      - repository edits
      - tests
    max_context: 200000
    cost_tier: high
    latency_tier: medium
    capability: 90
    trust: 90
    cost: 80
    latency: 45
    supports_tools: true
    enabled: true
    fallback: code_agent
    availability:
      status: auto
      required_commands:
        - claude
```

Coding does not have to use Codex. You can point `routing_targets.coding` at
`claude_code`, `codex`, `code_agent`, a local coding model, or any custom
engine you define.

Optional numeric metadata uses a 0-100 scale:

- `capability`: model/agent strength.
- `trust`: reliability for sensitive work.
- `cost`: relative cost, where higher means more expensive.
- `latency`: relative latency, where higher means slower.

These values rank compatible alternatives. They do not override the configured
target when that target is enabled, available, and compatible.

## Setup Assistant

The setup assistant can create a local config without guessing what you want.

Scan your machine:

```bash
model-router setup scan
model-router setup scan --json
```

Get recommendations:

```bash
model-router setup recommend
model-router setup recommend --json
```

Recommendations are produced by a bundled, versioned model advisor catalog at
`hermes/plugins/model_router/data/model_catalog.yaml`. The advisor detects basic
local hardware signals such as RAM, CPU architecture, Apple Silicon, and free
disk space, then ranks setup-time Hugging Face suggestions for each route. This
does not run during `route_fast(...)`, `route(...)`, or ordinary `decide` calls.

Run the wizard:

```bash
model-router setup wizard \
  --output configs/model_router.local.yaml
```

Write a recommended config non-interactively:

```bash
model-router setup write \
  --output configs/model_router.local.yaml
```

`setup write` will not overwrite an existing file unless `--force` is passed.

The wizard asks whether you want:

- Local LLMs only.
- API keys / hosted models.
- A mix of local models, hosted APIs, and agent tools.

It then walks each main route and shows numbered local model choices plus
hardware-aware recommended downloads when a local role is missing. Downloads are
never run by ordinary routing commands. They require explicit confirmation.

The scanner includes current LM Studio model storage at
`~/.lmstudio/models`, plus Ollama, Hugging Face cache, and common local model
folders, so wizard choices should reflect the models your local tools can see.

If recommended downloads are available and the Hugging Face `hf` CLI is missing,
the wizard warns at the beginning and asks whether to install it into the current
Python environment before model choices start. Declining is safe; the router can
still write the config, and downloads can be run later.

Plan downloads:

```bash
model-router setup download
model-router setup download --route fast_local
```

Run an approved Hugging Face download:

```bash
model-router setup download \
  --route balanced_local \
  --repo-id custom-org/custom-model \
  --execute
```

For non-interactive scripts, add `--yes`.

## Engine Roles

| Role | Default coverage |
| --- | --- |
| Intent classifier/router | `intent_router` plus deterministic router code |
| Fast response/summarization | `fast_local`, `balanced_local` |
| Deep reasoning/planning | `reasoning_local` |
| Coding/repo work | `code_agent`, with optional `codex` or `claude_code` |
| Web research/RAG | `web_research` |
| Multimodal/vision/OCR | `multimodal_vision` |
| Image generation | `image_generation` |
| Confirmation/fail-closed | `human_confirm` |

## Safety Model

- The router never executes user requests.
- The router never calls hosted model APIs.
- The router never loads local model weights.
- The router never sends email, deletes files, buys anything, or runs shell
  commands.
- High-risk destructive, sending, purchasing, payment, scheduling, publishing,
  and external-action prompts require confirmation by default.
- Confirmation escape hatches must be explicit in `safety.confirmation_overrides`;
  the router does not learn approvals or silently relax safety rules.
- `force_engine` cannot bypass human confirmation.
- Missing or invalid config routes to `human_confirm`.
- Unavailable or incompatible engines are skipped through configured fallbacks.
- Receipts omit raw prompt text.

## Performance

Use the initialized API for runtime performance:

```bash
python scripts/benchmark_route_fast.py
python scripts/benchmark_route_fast.py --json
python scripts/check_route_fast_latency.py --json
```

`route_fast(...)` is the production hot path. It returns only the selected
engine string. The scorer precompiles its stable regex patterns at import time, and
initialized routers keep YAML config and availability results in memory. The
richer `route(...)` path does more work by design because it builds scores,
explanations, rejected-engine details, alternatives, and receipt fields.

The CLI is intended for humans, diagnostics, and scripts. Latency-sensitive
services should not spawn a Python process per prompt; instantiate `ModelRouter`
once and call the Python API in process.

The default production SLO for initialized ordinary prompts is <= 25 us best
sample and <= 50 us mean sample for `route_fast(...)`. The benchmark guard
enforces those budgets in CI. See
[Production readiness](docs/production-readiness.md) for the API contract,
benchmark command, SLOs, and logging guidance.

## Install For Local Testing

Use a virtual environment so the router does not modify a managed Python
installation:

```bash
cd /path/to/model-router
python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev]"
model-router decide --json "fix the repo and run tests"
```

For a non-editable install from GitHub:

```bash
python -m pip install "git+https://github.com/doncazper/model-router.git@v0.5.0"
model-router decide "rewrite this text"
```

The package exposes console commands, `model-router` and the legacy
`hermes-router` alias, plus
the importable Python API:

```python
from model_router import ModelRouter

router = ModelRouter.from_config()
engine = router.route_fast(prompt)
```

The default catalog is included as package data, so `ModelRouter.from_config()`
works after wheel installation without relying on the repository checkout. Pass
an explicit config path when an embedding app needs its own engine catalog.

See `examples/basic_custom_agent.py` for a minimal host-neutral integration.

ModelRouter does not currently claim any host-app plugin manifest or automatic
per-turn model switching contract. Embedding applications should use their own
runtime integration boundary and call the stable `route_fast(...)` production
API.

## Development

Run tests:

```bash
python -m pytest
```

Run lint:

```bash
python -m ruff check .
```

With `uv` and Python 3.11:

```bash
uv run --python 3.11 --with pytest --with PyYAML python -m pytest
uv run --python 3.11 --with ruff --with PyYAML python -m ruff check .
```

## Project Layout

```text
model_router/
  __init__.py          # Generic public import path
hermes/plugins/model_router/
  availability.py     # Non-executing availability validation
  cli.py              # CLI entrypoint
  config.py           # YAML catalog loading and validation
  data/               # Packaged default config
  dispatch.py         # Safe dry-run dispatch plans
  models.py           # Dataclass models and JSON-safe serialization
  policy.py           # Engine selection and fail-closed fallback rules
  receipts.py         # Routing receipt helpers
  scorer.py           # Deterministic heuristic prompt scoring
  setup_assistant.py  # Local setup scanning and config recommendations
configs/
  model_router.yaml
  model_router.local.example.yaml
docs/
  adapter-contract.md
  model-router.md
examples/
  basic_custom_agent.py
scripts/
  benchmark_route_fast.py
tests/
```

The `model_router` package is the generic public import path. The
`hermes/plugins` path is retained only as a backward-compatible legacy namespace.
Neither path is a host-application plugin registration point.

## Documentation

- [Model router details](docs/model-router.md)
- [Production readiness](docs/production-readiness.md)
- [Host adapter contract](docs/adapter-contract.md)
- [Roadmap](docs/roadmap.md)
- [Contributing](CONTRIBUTING.md)

## Roadmap

### v0.5: Usable Local Proxy Beta

- Keep the proxy-first install path polished: `pip install "hermes-router[proxy]"`.
- Keep `model-router init`, `validate-proxy-config`, `doctor`, `/health`, log
  rotation, and provider presets reliable.
- Publish releases with a changelog, GitHub release notes, and benchmark output.

### v0.6: Passthrough And Gateway Mode

- Add router mode and passthrough mode.
- Keep legacy command/import aliases for one release.
- Add backend request overrides for temperature, context, max tokens, and common
  generation controls.
- Add first-class llama.cpp and MLX gateway templates.

### v0.6.5: Managed Local Runtime Beta

- Add explicit process configuration for llama.cpp and MLX backends.
- Add start, stop, restart, status, and logs commands for configured runtimes.
- Detect port conflicts and capture per-backend process logs.
- Keep process management opt-in and transparent; never auto-start arbitrary
  commands without user configuration.

### v1.0: Finished Local AI Gateway

- Version the public config schema and provide migrations.
- Use labeled real-world routing logs as release-blocking regression checks.
- Document security/privacy expectations for logs, proxy auth, process commands,
  and local network exposure.
- Decide whether to ship a web UI or keep the product CLI/proxy-first.
