Metadata-Version: 2.4
Name: lmrelay
Version: 0.0.0
Summary: Local LLM-aware balancer/gateway. Relays Ollama, OpenAI and Anthropic requests across multiple upstream providers with failover.
Project-URL: Homepage, https://github.com/lmrelay/lmrelay
Project-URL: Repository, https://github.com/lmrelay/lmrelay
Project-URL: Issues, https://github.com/lmrelay/lmrelay/issues
Author: lmrelay contributors
License-Expression: MIT
License-File: LICENSE
Keywords: ai-gateway,anthropic,cerebras,cloudflare-workers-ai,failover,free-tier,groq,huggingface,llm,nvidia-nim,ollama,openai-compatible,openrouter
Classifier: Development Status :: 4 - Beta
Classifier: Framework :: FastAPI
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: Proxy Servers
Requires-Python: >=3.11
Requires-Dist: fastapi>=0.115
Requires-Dist: httpx<1,>=0.27
Requires-Dist: pydantic>=2.7
Requires-Dist: python-dotenv>=1.0
Requires-Dist: rich>=13.7
Requires-Dist: uvicorn[standard]>=0.30
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=5; extra == 'dev'
Requires-Dist: pytest-httpx>=0.30; extra == 'dev'
Requires-Dist: pytest-timeout>=2.3; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.6; extra == 'dev'
Provides-Extra: e2e
Requires-Dist: anthropic>=0.40; extra == 'e2e'
Requires-Dist: openai>=2; extra == 'e2e'
Description-Content-Type: text/markdown

# LM Relay

> Local LLM-aware balancer / gateway. One endpoint, three wire
> protocols (Ollama / OpenAI / Anthropic), eight upstream providers
> with health-sorted failover, multi-key rotation, per-key model
> allow-lists, free-vs-paid catalog toggles, and a TOML config that
> hot-reloads.

> Previously known as `freellama`. The "free-tier only" framing was
> too narrow — `lmrelay` now handles paid keys and BYOK accounts just
> as well, and is the foundation for upcoming **profiles** (named
> routing presets) and **tokens** (per-user auth / quotas) work.

`lmrelay` speaks three wire protocols on the same port:

- **Ollama** `/api/*` (so Open WebUI, LobeChat, Continue, Page Assist, n8n, AnythingLLM, Cherry Studio just work),
- **OpenAI** `/v1/*` (so Aider, Continue, Cursor, OpenClaw, Codex just work),
- **Anthropic** `/anthropic/v1/messages` (so Claude Code just works).

Requests are relayed across **eight upstream providers**: seven cloud
services (OpenRouter, Groq, NVIDIA NIM, HuggingFace, Cerebras,
Cloudflare Workers AI, Google Gemini) plus a local Ollama you may
already have. Routing uses automatic failover, a per-(provider, key)
cooldown matrix, health-sorted candidate chains, and multi-key rotation.
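A rough mental model of that failover path is sketched below. It is a hypothetical illustration, not lmrelay's router. The `Candidate` record, the numeric `health` score (higher is healthier), the `UpstreamError` exception and the 60-second cooldown are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable


class UpstreamError(Exception):
    """Hypothetical: raised when an upstream call fails (429, 5xx, timeout)."""


@dataclass
class Candidate:
    provider: str
    key_label: str
    model: str
    health: float  # assumed score: higher means healthier


class CooldownMatrix:
    """Remembers which (provider, key) pairs failed recently."""

    def __init__(self, seconds: float = 60.0) -> None:
        self.seconds = seconds
        self._until: dict[tuple[str, str], float] = {}

    def cool(self, provider: str, key_label: str) -> None:
        self._until[(provider, key_label)] = time.monotonic() + self.seconds

    def is_cooling(self, provider: str, key_label: str) -> bool:
        return time.monotonic() < self._until.get((provider, key_label), 0.0)


def relay(candidates: list[Candidate], cooldowns: CooldownMatrix,
          call: Callable[[Candidate], str]) -> str:
    """Try the healthiest candidates first; on failure, cool that pair and fall through."""
    for cand in sorted(candidates, key=lambda c: c.health, reverse=True):
        if cooldowns.is_cooling(cand.provider, cand.key_label):
            continue
        try:
            return call(cand)
        except UpstreamError:
            cooldowns.cool(cand.provider, cand.key_label)
    raise UpstreamError("every candidate failed or is cooling down")


def fake_call(cand: Candidate) -> str:
    # Simulate a rate-limited Groq key; every other provider answers.
    if cand.provider == "groq":
        raise UpstreamError("429 Too Many Requests")
    return f"answered by {cand.provider}/{cand.model}"


cooldowns = CooldownMatrix()
chain = [Candidate("cerebras", "main", "model-a", health=0.7),
         Candidate("groq", "main", "model-b", health=0.9)]
print(relay(chain, cooldowns, fake_call))    # answered by cerebras/model-a
print(cooldowns.is_cooling("groq", "main"))  # True
```

In this sketch, re-sorting by health on every request keeps a flaky provider away from the head of the chain, and the cooldown matrix stops a key that just failed from being retried on the very next request.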

## Install

```sh
pipx install lmrelay        # recommended for end users
pip install lmrelay         # if pipx is unavailable
uv tool install lmrelay     # if you prefer uv
```

Developing lmrelay itself? `make install` creates `.venv` and
installs the package with dev extras; `make check` runs ruff + mypy +
pytest. See all targets with `make`.

## Quickstart

```sh
lmrelay init           # interactive wizard: writes ~/.lmrelay/.env
lmrelay serve          # starts the gateway on :11434

# In another terminal — point any Ollama client at us:
curl http://localhost:11434/api/tags

# Claude Code with a free backend:
lmrelay run claude

# Aider / Continue (persistent agent config; OpenClaw uses `lmrelay run openclaw`):
lmrelay bind aider
lmrelay bind continue
```

## Providers

| Provider              | chat | stream | tools   | embed | vision | json-mode         |
|-----------------------|------|--------|---------|-------|--------|-------------------|
| OpenRouter            | ✓    | ✓      | ✓       | ✓     | ✓      | ✓                 |
| Groq                  | ✓    | ✓      | ✓       | —     | —      | ✓                 |
| NVIDIA NIM            | ✓    | ✓      | ✓       | ✓     | ✓      | ✓                 |
| HuggingFace Router    | ✓    | ✓      | partial | ✓     | —      | partial           |
| Cerebras              | ✓    | ✓      | ✓       | —     | —      | ✓                 |
| Cloudflare Workers AI | ✓    | ✓      | —       | ✓     | —      | partial           |
| Gemini                | ✓    | ✓      | ✓       | ✓     | ✓      | ✓                 |
| Local Ollama          | ✓    | ✓      | ✓       | ✓     | ✓      | via `format=json` |

## Virtual aliases

Instead of typing a full model id, use:

- `free`    — any free model, health-sorted
- `fast`    — lowest TTFT (Groq / Cerebras first)
- `quality` — widest catalog of strong models (OpenRouter first)
- `coding`  — code-tuned models with reliable tool use
- `vision`  — multimodal models accepting image input
- `embed`   — embedding models

## CLI

```
lmrelay init           Interactive wizard: write ~/.lmrelay/.env
lmrelay serve          Run the gateway
lmrelay reload         Re-read lmrelay.toml + .env without restart
lmrelay keys           Show recognised provider keys and health
lmrelay list           List available models (per provider / alias)
lmrelay doctor         Pre-flight checks (--claude / --openclaw)
lmrelay ping           List providers (enabled/disabled) + tiny pong probe
lmrelay audit-models   Probe each model with each key
lmrelay bench          Benchmark p50/p95 latency per provider
lmrelay run <agent>    Launch claude / openclaw / codex / gemini / ...
lmrelay bind <agent>   Write persistent agent config (aider / continue / cursor / lobechat / ...)
lmrelay migrate-ollama Move local Ollama to :11435 and register it as backend
lmrelay telemetry      on / off / status
lmrelay dashboard      TUI live dashboard
lmrelay-watcher        Background head-of-chain probe (separate binary)
```

## Configuration (lmrelay.toml)

lmrelay looks for a TOML config in two well-known locations:

1. `./lmrelay.toml` — when running from a checkout (already in `.gitignore`).
2. `~/.lmrelay/lmrelay.toml` — when installed via pipx / pip / docker.

Process env vars take precedence over values in the file, so Docker /
systemd / CI can override without editing it. See
[`lmrelay.toml.example`](./lmrelay.toml.example) for the full
schema. Each provider gets its own section and one
`[[provider.X.keys]]` block per credential:

```toml
[server]
host       = "0.0.0.0"
port       = 11434
log_level  = "INFO"

[runtime]
disabled_providers = ["cloudflare_wai"]

[provider.openrouter]
[[provider.openrouter.keys]]
api_key = "sk-or-v1-..."
label   = "personal"

[[provider.openrouter.keys]]
api_key = "sk-or-v1-..."
label   = "work"
```

### Per-key model filter

Each `[[provider.X.keys]]` block optionally takes a `models = [...]`
list of fnmatch globs. The router sends a request through this key
only if at least one pattern matches the resolved model id; a missing
or empty list means no filter. This is useful when a key has quota for
only one model family, or when different keys belong to different paid
sub-accounts:

```toml
[[provider.openrouter.keys]]
api_key = "sk-or-v1-aaa"
label   = "free-tier"
models  = ["*:free"]              # only OpenRouter free models

[[provider.openrouter.keys]]
api_key = "sk-or-v1-bbb"
label   = "work-paid"
# no `models` → this key handles everything else
```

`lmrelay keys` shows the filter in the `models` column.

### Free vs paid models per provider

By default each provider exposes only its **free-tier** models. Opt
into paid models with `include_paid = true`, cherry-pick specific
paid models with `include_extra`, or hide individual ones with
`exclude`:

```toml
[provider.openrouter]
include_paid  = false                       # default — free models only
include_extra = ["openai/gpt-4o-mini"]      # let one paid model through
exclude       = ["meta-llama/llama-3.2-1b:free"]
```

All three fields are optional. `include_extra` and `exclude` take
`fnmatch` globs (`*` matches anything), and `exclude` wins over the
other two. Only OpenRouter currently surfaces a mixed free/paid
catalog; the other providers tag their catalogs as `free` by default.
Run `lmrelay reload` after editing the file.

### Enabling and disabling providers

A provider is **enabled** when:

- at least one of its `<PROVIDER>_API_KEY[_N]` env vars is set OR its
  `[provider.X]` section in `lmrelay.toml` carries at least one
  `[[provider.X.keys]]` block, AND
- its name does not appear in `LMRELAY_DISABLED_PROVIDERS` (or the
  TOML `runtime.disabled_providers` list).
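The two conditions above translate into a short check. This is a hypothetical sketch: `env_keys` and `is_enabled` are illustrative names, it assumes the env prefix is the provider name upper-cased (so `groq` maps to `GROQ_API_KEY`), it treats an empty value as unset, and it expects the caller to pass in the effective disabled list (env var or TOML).

```python
import re


def env_keys(provider: str, env: dict[str, str]) -> list[str]:
    """Non-empty <PROVIDER>_API_KEY, <PROVIDER>_API_KEY_2, ... values, ordered by suffix."""
    # Assumption: the env prefix is the provider name upper-cased.
    pattern = re.compile(rf"{re.escape(provider.upper())}_API_KEY(?:_(\d+))?")
    found = []
    for name, value in env.items():
        match = pattern.fullmatch(name)
        if match and value:  # assumption: an empty value counts as unset
            found.append((int(match.group(1) or 1), value))
    return [value for _, value in sorted(found)]


def is_enabled(provider: str, env: dict[str, str], toml_key_blocks: int,
               disabled: list[str]) -> bool:
    """Needs at least one key (env or [[provider.X.keys]]) and must not be disabled."""
    has_key = bool(env_keys(provider, env)) or toml_key_blocks > 0
    return has_key and provider not in disabled


env = {"GROQ_API_KEY": "gsk_a", "GROQ_API_KEY_2": "gsk_b", "CEREBRAS_API_KEY": ""}
print(env_keys("groq", env))                             # ['gsk_a', 'gsk_b']
print(is_enabled("groq", env, 0, ["cloudflare_wai"]))    # True
print(is_enabled("cerebras", env, 0, []))                # False: empty value
print(is_enabled("openrouter", env, 2, ["openrouter"]))  # False: disabled
```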

`lmrelay ping` prints the per-provider on/off table and, unless you
pass `--no-probe`, sends a tiny probe request (prompt `Reply with one
word: pong`, capped at 4 tokens) to the first model in each enabled
provider's catalog to verify that the provider is actually reachable:

```sh
lmrelay ping                       # full table + live probes
lmrelay ping --no-probe            # just list status, no network calls
lmrelay ping --provider groq       # narrow to one provider
lmrelay ping --json                # machine-readable output
```

To temporarily disable a provider without removing its key:

```sh
LMRELAY_DISABLED_PROVIDERS=cloudflare_wai,huggingface lmrelay serve
```

Disabled providers are dropped from `app.state.providers` at startup, so
the router never picks them.

### Reload

After editing the file, hot-reload a running gateway without restarting it:

```sh
lmrelay reload                       # local instance
sudo systemctl reload lmrelay        # systemd-managed
docker compose exec frl_app lmrelay reload
```

Reload re-reads `lmrelay.toml` + `.env`, rebuilds the key ring and
the active provider list, and clears the cooldown matrix. The listening
socket is not re-bound: changes to `[server]` `host` or `port` take
effect only after a full restart.

## Deploying

- **systemd**: see [`deploy/systemd/`](./deploy/systemd/) and
  [`deploy/README.md`](./deploy/README.md) for unit files and the
  install procedure (system account, `WorkingDirectory`,
  `ExecReload`).
- **Docker**: `docker compose up -d`. Drop your `lmrelay.toml` into
  `./.lmrelay/lmrelay.toml` (mounted at `/root/.lmrelay`).

## Multi-key rotation

Every provider's `<PROVIDER>_API_KEY` env var also accepts numbered `_2`, `_3`, ... suffixes, one per additional key:

```
OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_API_KEY_2=sk-or-v1-...
OPENROUTER_API_KEY_3=sk-or-v1-...
```

When the active key hits a 429 or a quota error, the router rotates to
the next key and puts the offending key in cooldown.

## Security

- The default bind is `0.0.0.0:11434` (Ollama's port, but on all interfaces), so other machines on the LAN can reach it.
  When `LMRELAY_TOKEN` is **not** set, a loud warning banner is printed at startup.
- Set `LMRELAY_TOKEN=$(openssl rand -hex 16)` to require
  `Authorization: Bearer <token>` for `/api/*`, `/v1/*`, `/anthropic/v1/*`.
  `/health`, `/ready`, `/metrics`, `/docs` always remain open.
- API keys are never logged in full; only their last 4 characters appear in logs.

## Telemetry

**Off by default.** `lmrelay telemetry on` enables sending `install_uuid`,
the lmrelay version, OS, Python version, and aggregate counters. Prompts,
completions, hostnames, IPs, and API keys are never sent.

## Docker

```sh
docker compose up -d
```

`docker-compose.yml` runs two services:

- `frl_app` — the gateway on `:11434`
- `frl_watcher` — the head-of-chain probe daemon

## License

MIT — see [LICENSE](./LICENSE).

`lmrelay` is not affiliated with Meta, Ollama Inc., OpenAI, Anthropic, Groq,
NVIDIA, HuggingFace, Cerebras, or Cloudflare. All trademarks are property of
their respective owners.
