Metadata-Version: 2.4
Name: dispatch-relay
Version: 0.0.1
Summary: Provider-agnostic LLM dispatch layer: 3 injected seams (config / usage / dispatch) + a pure cost model. Relays usage to a sink rather than tracking it.
Author: Pierre Samson, Claude
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: facade
Requires-Dist: langchain-core>=0.3; extra == "facade"
Provides-Extra: dspy
Requires-Dist: langchain-core>=0.3; extra == "dspy"
Requires-Dist: dspy>=2.0; extra == "dspy"
Requires-Dist: litellm>=1.0; extra == "dspy"
Provides-Extra: all
Requires-Dist: langchain-core>=0.3; extra == "all"
Requires-Dist: dspy>=2.0; extra == "all"
Requires-Dist: litellm>=1.0; extra == "all"
Dynamic: license-file

# dispatch-relay

**A provider-agnostic LLM layer with three injected seams.** Resolve a model, dispatch a call across any provider, and *relay* usage to a sink your application owns — instead of the library tracking it for you. Pure-stdlib core, zero runtime dependencies.

```bash
pip install dispatch-relay
```

**Who it's for:** anyone running more than one LLM provider who wants one consistent dispatch + usage-attribution surface, with the host application in control of config resolution, usage recording, and the actual transport. The "relay, not track" name is the contract: usage is relayed to *your* sink (a database, a log, nothing) — the library never decides where it lands.

This is the **dependency-light foundation increment**: the three injected-interface seams + the pure cost model. (Caching and the higher-level façade arrive in a later increment and bring `langchain-core` etc. with them; this increment is pure-stdlib.)

> **Renamed from `omega-llm`.** `import omega_llm` still works as a deprecated alias that re-exports `dispatch_relay` (with a `DeprecationWarning`) — migrate to `import dispatch_relay`.

## The 3 injected seams (`dispatch_relay.interfaces`)

Each seam is a `@runtime_checkable` `typing.Protocol` paired with a dependency-light default implementation. Because the typing is structural, a host satisfies the contract without importing this library.

| Seam | Method(s) | Default impl | A host can back it with |
|------|-----------|--------------|-------------------------|
| `ConfigSource` | `resolve(key, role, default) → model_id` | `DefaultConfigSource` (`os.getenv(f"{KEY}_MODEL") or default`) | a config store (role → global → env → default) |
| `UsageSink` | `record(*, provider, role, caller, model, tier, input_tokens, output_tokens, cache_read=0, cache_creation=0, cost_usd=0.0, cost_usd_raw=0.0, billing="metered", **extra) → None` | `NoOpUsageSink` (no-op) | a usage store / time-series table |
| `DispatchBackend` | `supports(*, provider, role, tier) → bool` + `dispatch(*, provider, model, messages, tier, role, caller, **kwargs) → LLMResponse` | `DefaultDispatchBackend` (direct SDK via injected `llm_factory`; `supports`→True) | subscription lanes / custom transports |

`cache_read` and `cache_creation` are **separate** fields on `UsageSink.record` and on `UsageRecord`; collapsing them into a single summed number undercounts Anthropic usage. `billing` marks the lane: `"metered"` (dollar-tracked SDK calls) vs `"subscription"` ($0).
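As a rough illustration of the structural contract, the sketch below defines a local stand-in protocol that mirrors the documented `record` signature and a host-side sink that satisfies it without inheriting from anything. `UsageSinkSketch` and `InMemorySink` are hypothetical names used only for this example; they are not part of the library, and the argument values are made up.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class UsageSinkSketch(Protocol):
    """Local stand-in mirroring the documented UsageSink.record signature."""

    def record(self, *, provider: str, role: str, caller: str, model: str,
               tier: str, input_tokens: int, output_tokens: int,
               cache_read: int = 0, cache_creation: int = 0,
               cost_usd: float = 0.0, cost_usd_raw: float = 0.0,
               billing: str = "metered", **extra: Any) -> None: ...


class InMemorySink:
    """Hypothetical host sink: no base class, no import of the library."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []

    def record(self, *, provider, role, caller, model, tier,
               input_tokens, output_tokens, cache_read=0, cache_creation=0,
               cost_usd=0.0, cost_usd_raw=0.0, billing="metered", **extra):
        # cache_read and cache_creation stay in separate columns.
        row = dict(provider=provider, role=role, caller=caller, model=model,
                   tier=tier, input_tokens=input_tokens,
                   output_tokens=output_tokens, cache_read=cache_read,
                   cache_creation=cache_creation, cost_usd=cost_usd,
                   cost_usd_raw=cost_usd_raw, billing=billing)
        row.update(extra)
        self.rows.append(row)


sink = InMemorySink()
sink.record(provider="anthropic", role="council", caller="demo",
            model="example-model", tier="sonnet",
            input_tokens=1200, output_tokens=300,
            cache_read=800, cache_creation=100)

# runtime_checkable only verifies that a `record` attribute exists,
# not that its signature matches.
print(isinstance(sink, UsageSinkSketch))
```

Note that a `runtime_checkable` `isinstance` check is shallow, so a host should still treat the documented keyword list as the real contract.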

## Value types & core-owned facts (`dispatch_relay.core`)

```python
from dataclasses import dataclass

# Field summary only; class bodies are elided.
@dataclass(frozen=True)
class UsageRecord: ...  # input_tokens, output_tokens, cache_read=0, cache_creation=0, model=""
@dataclass(frozen=True)
class LLMResponse: ...  # text, usage: UsageRecord | None, raw: Any
```
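The field summary above could be spelled out roughly as follows. This is a sketch rather than the library's source: the `int` and `str` annotations are assumptions inferred from the field names and defaults, and the model id is a placeholder.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Any


@dataclass(frozen=True)
class UsageRecord:
    input_tokens: int          # assumed type
    output_tokens: int         # assumed type
    cache_read: int = 0
    cache_creation: int = 0
    model: str = ""


@dataclass(frozen=True)
class LLMResponse:
    text: str
    usage: UsageRecord | None
    raw: Any


usage = UsageRecord(input_tokens=1200, output_tokens=300, cache_read=800)

# Frozen instances cannot be mutated; dataclasses.replace returns a new copy.
stamped = replace(usage, model="example-model")

try:
    usage.model = "mutated"  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Freezing the records means a stamped copy never alters the record a backend returned.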

Provider facts live in `dispatch_relay.core` (one place, never duplicated per backend):

- `DEFAULTS: dict[str, str]` — the abstract-key → model-id table. The core passes `default=DEFAULTS[key]` into `ConfigSource.resolve`.
- `extract_usage(provider, raw) → UsageRecord | None` — the single place that knows each provider's usage-from-raw shape. **Anthropic dual-path**: prefer `raw.response_metadata["usage"]` (the uncached remainder), fall back to `raw.usage_metadata` only if absent (using the wrong one double-counts). The **model name** comes from `raw.response_metadata["model_name"]` (both Anthropic and Gemini surface it there — a real LangChain `AIMessage` has no top-level `.model` attribute), falling back to `""`. Returns `None` when no usage metadata is present.
- `resolve_usage(response, provider, model) → UsageRecord | None` — the **locked reconciliation rule**: resolve `response.usage if response.usage is not None else extract_usage(provider, response.raw)`, then **stamp the authoritative `model`** — the dispatch call knows the configured `model`, so the dispatch-arg model always wins over whatever the raw echoed (via `dataclasses.replace`). Returns `None` unchanged when there's no usage (the subscription lane). `LLMResponse.usage` is a real escape hatch — a backend MAY pre-populate it; else the core extracts.
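The extraction order and the reconciliation rule described above can be sketched as below. This is an illustration under stated assumptions, not the library's code: the `_sketch` functions are hypothetical, the `provider` argument is accepted but ignored because only the Anthropic-style shape is handled, the `input_tokens` / `output_tokens` key names are assumed, and cache fields are left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from types import SimpleNamespace
from typing import Any


@dataclass(frozen=True)
class UsageRecord:
    input_tokens: int
    output_tokens: int
    cache_read: int = 0
    cache_creation: int = 0
    model: str = ""


@dataclass(frozen=True)
class LLMResponse:
    text: str
    usage: UsageRecord | None
    raw: Any


def extract_usage_sketch(provider: str, raw: Any) -> UsageRecord | None:
    # Prefer response_metadata["usage"]; use usage_metadata only if absent.
    meta = getattr(raw, "response_metadata", None) or {}
    usage = meta["usage"] if "usage" in meta else getattr(raw, "usage_metadata", None)
    if not usage:
        return None
    return UsageRecord(
        input_tokens=usage.get("input_tokens", 0),    # assumed key name
        output_tokens=usage.get("output_tokens", 0),  # assumed key name
        model=meta.get("model_name", ""),
    )


def resolve_usage_sketch(response: LLMResponse, provider: str, model: str) -> UsageRecord | None:
    # A pre-populated response.usage wins; otherwise extract from the raw.
    usage = response.usage if response.usage is not None else extract_usage_sketch(provider, response.raw)
    if usage is None:
        return None  # e.g. the subscription lane
    # The dispatch-arg model always overrides whatever the raw echoed.
    return replace(usage, model=model)


raw = SimpleNamespace(
    response_metadata={"usage": {"input_tokens": 10, "output_tokens": 5},
                       "model_name": "echoed-model"},
    usage_metadata={"input_tokens": 999, "output_tokens": 999},
)
metered = resolve_usage_sketch(LLMResponse("hi", None, raw), "anthropic", "configured-model")
prefilled = resolve_usage_sketch(
    LLMResponse("hi", UsageRecord(1, 2, model="backend-model"), raw),
    "anthropic", "configured-model")
subscription = resolve_usage_sketch(LLMResponse("ok", None, {"result": "ok"}),
                                    "anthropic", "configured-model")
```

In this sketch the metered call picks up the `response_metadata` numbers, the pre-populated record is kept but restamped, and the dict-shaped subscription raw yields `None`.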

Both shipped backends return `LLMResponse(usage=None)`; the core extracts usage. The `DefaultDispatchBackend` derives `text` from `raw.content`: a `str` passes through; an Anthropic content **list** has its `type=="text"` blocks joined (non-text blocks skipped); anything else falls back to `str(raw)`. That fallback is only the default backend's degenerate case. Real subscription backends, whose raws are **dicts** rather than strings, construct `text` explicitly and pass `usage=None` with `billing="subscription"`.
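A minimal sketch of that text-derivation rule follows. `text_from_raw` is a hypothetical helper name, the blocks are assumed to be plain dicts, and the empty-string join separator is an assumption, since the separator is not specified here.

```python
from types import SimpleNamespace
from typing import Any


def text_from_raw(raw: Any) -> str:
    """Derive display text from raw.content, following the three documented cases."""
    content = getattr(raw, "content", None)
    if isinstance(content, str):
        return content  # plain string passes through
    if isinstance(content, list):
        # Join only the type == "text" blocks; skip everything else.
        return "".join(
            block.get("text", "")
            for block in content
            if isinstance(block, dict) and block.get("type") == "text"
        )
    return str(raw)  # degenerate fallback


plain = text_from_raw(SimpleNamespace(content="hello"))
blocks = text_from_raw(SimpleNamespace(content=[
    {"type": "text", "text": "Hello, "},
    {"type": "tool_use", "name": "search"},  # non-text block, skipped
    {"type": "text", "text": "world"},
]))
fallback = text_from_raw(42)
```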

## The pure cost model (`dispatch_relay.cost`)

`estimate_cost(*, prompt, tier="flash", provider="gemini", output_tokens_max=1024, cache_hit_ratio=0.0, role="agents") → dict` is the single source of truth for cost. It covers pricing tables for Gemini / Anthropic / OpenAI, the Gemini Flex 50% rebate gate, and the Anthropic + OpenAI cache-ratio math, with zero dependencies.
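To give a feel for what cache-ratio math generally looks like, here is a generic sketch. It is not the library's `estimate_cost`: `cached_cost_sketch` is a hypothetical helper, the per-million-token rates are made-up placeholders rather than real provider pricing, and it returns a bare float instead of the documented dict.

```python
def cached_cost_sketch(*, input_tokens: int, output_tokens: int,
                       cache_hit_ratio: float, in_per_mtok: float,
                       cached_in_per_mtok: float, out_per_mtok: float) -> float:
    """Blend fresh and cached input pricing by the cache hit ratio."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be within [0, 1]")
    cached = input_tokens * cache_hit_ratio   # tokens billed at the cached rate
    fresh = input_tokens - cached             # tokens billed at the full rate
    total = (fresh * in_per_mtok
             + cached * cached_in_per_mtok
             + output_tokens * out_per_mtok)
    return total / 1_000_000                  # rates are per million tokens


# Placeholder rates, NOT real pricing.
no_cache = cached_cost_sketch(input_tokens=10_000, output_tokens=512,
                              cache_hit_ratio=0.0, in_per_mtok=1.0,
                              cached_in_per_mtok=0.1, out_per_mtok=2.0)
half_cache = cached_cost_sketch(input_tokens=10_000, output_tokens=512,
                                cache_hit_ratio=0.5, in_per_mtok=1.0,
                                cached_in_per_mtok=0.1, out_per_mtok=2.0)
```

With a cached rate below the fresh rate, a higher hit ratio lowers the estimate, which is the effect the `cache_hit_ratio` parameter is meant to capture.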

## Usage

```python
from dispatch_relay import estimate_cost, DefaultConfigSource, DEFAULTS

DefaultConfigSource().resolve("gemini_flash", "council", DEFAULTS["gemini_flash"])
# -> "gemini-2.5-flash"  (env GEMINI_FLASH_MODEL wins if set)
estimate_cost(prompt=10_000, tier="sonnet", provider="anthropic", output_tokens_max=512)
```
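The "env wins if set" behavior in the comment above can be approximated with the sketch below. `EnvConfigSource` is a hypothetical re-creation of the documented default rule (`os.getenv(f"{KEY}_MODEL") or default`), not the library's class; upper-casing the key is inferred from the `GEMINI_FLASH_MODEL` example, and the override value is a placeholder.

```python
import os


class EnvConfigSource:
    """Hypothetical stand-in for the documented default rule."""

    def resolve(self, key: str, role: str, default: str) -> str:
        # role is unused here: this default rule only looks at env, then default.
        return os.getenv(f"{key.upper()}_MODEL") or default


source = EnvConfigSource()
saved = os.environ.pop("GEMINI_FLASH_MODEL", None)
try:
    # No env var: the passed-in default is returned.
    without_env = source.resolve("gemini_flash", "council", "gemini-2.5-flash")
    # Env var set: it overrides the default.
    os.environ["GEMINI_FLASH_MODEL"] = "my-override-model"
    with_env = source.resolve("gemini_flash", "council", "gemini-2.5-flash")
finally:
    # Restore the environment to its prior state.
    os.environ.pop("GEMINI_FLASH_MODEL", None)
    if saved is not None:
        os.environ["GEMINI_FLASH_MODEL"] = saved
```

A host that needs the role → global → env → default chain from the seam table would replace this class with its own `resolve`.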

## Authors

Pierre Samson and Claude. MIT licensed.
