Metadata-Version: 2.4
Name: keel-llm-reliability
Version: 0.1.1
Summary: Composed reliability for multi-model LLM calls — quorum fan-out + primary/failover, category-dispatched, transparent degradation. Built on keel-llm-protocol + keel-circuit-breaker.
Project-URL: Homepage, https://github.com/keelplatform/keel
Project-URL: Source, https://github.com/keelplatform/keel/tree/main/py/packages/llm-reliability
Project-URL: Changelog, https://github.com/keelplatform/keel/blob/main/py/packages/llm-reliability/CHANGELOG.md
Author: Raj Yakkali
License: MIT
Keywords: failover,fan-out,keel,llm,reliability,resilience
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: keel-circuit-breaker>=0.1.0
Requires-Dist: keel-llm-protocol>=0.1.0
Description-Content-Type: text/markdown

# keel-llm-reliability

> Production-grade reliability for multi-model LLM calls — quorum fan-out **and** primary/failover, category-dispatched, with *transparent degradation*. The composed solution, not a parts bin.

Part of the **Keel** toolkit. Composes [`keel-llm-protocol`](https://pypi.org/project/keel-llm-protocol/) (the error taxonomy) + [`keel-circuit-breaker`](https://pypi.org/project/keel-circuit-breaker/) into the consumer-side machinery that *acts* on typed errors — so you don't hand-write the fan-out/failover loop.

## Why it exists

A typed error taxonomy tells you *what* failed; this tells your app *what to do about it*. The core lesson (measured in production): **a rate-limited model is healthy, not failing** — defer it, don't trip its circuit. Acting on that one distinction moved a throttled model from 3/10 to 10/10 availability. This package generalizes it into two strategies and makes every decision visible.

## Install

```bash
# this package + at least one adapter for the providers you call:
pip install keel-llm-reliability keel-llm-adapter-openai
#   keel-llm-adapter-anthropic / keel-llm-adapter-google also available;
#   reliability itself pulls in keel-llm-protocol + keel-circuit-breaker.
```

## Quickstart (copy-paste runnable)

```python
import asyncio
from keel_llm_reliability import ResilientClient, Request
from keel_llm_adapter_openai import OpenAIAdapter
from keel_llm_protocol import user

# Adapters are plain objects implementing keel-llm-protocol. Any OpenAI-compatible
# endpoint works (OpenAI, Groq, OpenRouter, Mistral, vLLM, Ollama, …); mix providers freely.
primary  = OpenAIAdapter(model="llama-3.3-70b-versatile", api_key="gsk_…",
                         base_url="https://api.groq.com/openai/v1", provider="groq")
fallback = OpenAIAdapter(model="llama-3.1-8b", base_url="http://localhost:11434/v1",
                         provider="local")

client = ResilientClient([primary, fallback])     # ordered: primary, then fallbacks

async def main() -> None:
    result = await client.failover(Request(messages=[user("One-line summary of TCP.")]))
    if result.succeeded:
        print(result.response.text)
    for a in result.attempts:                     # every decision is visible data
        print(a.model_key, a.outcome, f"{a.latency_ms}ms")

asyncio.run(main())
```

## Two strategies

```python
import asyncio
from keel_llm_reliability import ResilientClient, Request
from keel_llm_protocol import user

client = ResilientClient([primary, fallback])      # adapters built as in the Quickstart
req = Request(messages=[user("Summarize this in one line.")])

async def main() -> None:
    # Primary + ordered failover: the single-good-answer case (most apps).
    result = await client.failover(req)
    if result.succeeded:
        print(result.response.text)

    # Quorum / parallel fan-out: the ensemble case.
    result = await client.fan_out(req)
    for r in result.successes:        # every model that answered
        ...

asyncio.run(main())                   # `await` needs a running event loop
```

Both are also available as plain functions (`fan_out`, `failover`) if you'd rather wire collaborators yourself.

## Transparent degradation — every decision is data

No silent retries, no hidden fallbacks. Every provider interaction is a visible `Attempt`:

```python
# Run inside an async function; `client` and `req` are built as in "Two strategies".
result = await client.failover(req)
for a in result.attempts:
    print(a.model_key, a.outcome, a.latency_ms, a.error and a.error.category)
# groq:…     deferred_backpressure  120   backpressure   (throttled — skipped, NOT failed)
# gemini:…   failed                 310   transient      (5xx — counted, failed over)
# openai:…   success                420   None
```

`outcome` is one of `success` / `preempted_open` / `preempted_limited` / `deferred_backpressure` / `failed`. A `failed` attempt carries its `error.category` (`transient` vs `terminal`) so you can tell "flaky" from "broken config." Degradation you can see and operate on — not a black box.
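
Because attempts are plain data, they are easy to aggregate for logs or alerts. The sketch below is illustrative only: `FakeAttempt`, `FakeError`, and `summarize` are hypothetical stand-ins that mirror just the fields shown above (`model_key`, `outcome`, `latency_ms`, `error.category`); they are not the library's own types.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FakeError:
    # Hypothetical stand-in: only the `category` field used in this README.
    category: str  # "backpressure" | "transient" | "terminal"


@dataclass(frozen=True)
class FakeAttempt:
    # Hypothetical stand-in for an Attempt, limited to the fields shown above.
    model_key: str
    outcome: str
    latency_ms: int
    error: Optional[FakeError] = None


def summarize(attempts):
    """Count attempts per outcome and list models that failed with a terminal error."""
    by_outcome = Counter(a.outcome for a in attempts)
    broken = [
        a.model_key
        for a in attempts
        if a.outcome == "failed" and a.error and a.error.category == "terminal"
    ]
    return by_outcome, broken


sample = [
    FakeAttempt("groq:demo", "deferred_backpressure", 120, FakeError("backpressure")),
    FakeAttempt("gemini:demo", "failed", 310, FakeError("transient")),
    FakeAttempt("local:demo", "failed", 15, FakeError("terminal")),
    FakeAttempt("openai:demo", "success", 420),
]
by_outcome, broken = summarize(sample)
print(dict(by_outcome))
print("check config for:", broken)
```

With real results, the same function could be pointed at `result.attempts`, assuming the attempt objects expose the fields used here.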

## How it behaves (category-dispatched)

| Error category | fan_out (quorum) | failover |
|---|---|---|
| `backpressure` (429) | defer — contributes nothing this round; **no breaker failure** | route to the next candidate immediately; **no breaker failure** |
| `transient` (5xx, timeout) | record a breaker failure; that model contributes nothing | record a breaker failure; fail over (optionally retry the same model up to `transient_retries`) |
| `terminal` (auth/bad-request/context/content) | visible `failed`; **no breaker failure** (request-level, not model health) | visible `failed`; fail over |

Before any dispatch, both strategies **preempt**: a model whose breaker is open (`preempted_open`) or whose limiter predicts it's full (`preempted_limited`) is skipped (*predict, don't block*). There are no hidden sleeps, and exhaustion is returned visibly: `fan_out` yields empty `successes`, and `failover` yields `response=None`.
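
To make the failover column concrete, here is a minimal, self-contained sketch of category-dispatched failover with breaker preemption. It is not the package's implementation: `ToyError`, `ToyBreaker`, and `toy_failover` are hypothetical names, the breaker is a bare failure counter, and limiter preemption and `transient_retries` are left out. It also assumes a terminal error records no breaker failure during failover, which the table states only for `fan_out`.

```python
import asyncio
from dataclasses import dataclass, field


class ToyError(Exception):
    """Hypothetical typed error carrying one of the three categories."""

    def __init__(self, category: str) -> None:
        super().__init__(category)
        self.category = category


@dataclass
class ToyBreaker:
    """Hypothetical breaker: open once a model reaches `threshold` failures."""

    threshold: int = 2
    failures: dict = field(default_factory=dict)

    def is_open(self, key: str) -> bool:
        return self.failures.get(key, 0) >= self.threshold

    def record_failure(self, key: str) -> None:
        self.failures[key] = self.failures.get(key, 0) + 1


async def toy_failover(candidates, breaker):
    """Try candidates in order; return (response_or_None, [(key, outcome), ...])."""
    attempts = []
    for key, call in candidates:
        if breaker.is_open(key):
            attempts.append((key, "preempted_open"))  # skipped without dispatch
            continue
        try:
            response = await call()
        except ToyError as err:
            if err.category == "backpressure":
                # Throttled but healthy: move on, no breaker failure.
                attempts.append((key, "deferred_backpressure"))
            elif err.category == "transient":
                breaker.record_failure(key)  # counts against model health
                attempts.append((key, "failed"))
            else:
                # Terminal: visible failure; assumed not to count (see lead-in).
                attempts.append((key, "failed"))
            continue
        attempts.append((key, "success"))
        return response, attempts
    return None, attempts  # exhaustion is visible: no response, full attempt log


async def throttled():
    raise ToyError("backpressure")


async def flaky():
    raise ToyError("transient")


async def healthy():
    return "ok"


toy_breaker = ToyBreaker()
toy_response, toy_attempts = asyncio.run(
    toy_failover([("a", throttled), ("b", flaky), ("c", healthy)], toy_breaker)
)
print(toy_response, toy_attempts)
```

After this run only model `b` has a recorded breaker failure; the throttled model `a` is untouched and stays eligible for the next request.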

## Injected collaborators — born ready for scale

The `Breaker` and (optional) `Limiter` are **injected async protocols**, never owned:

```python
from keel_llm_reliability import InProcessBreaker, ResilientClient

# Default: zero-config in-process breaker (wraps keel-circuit-breaker).
client = ResilientClient(adapters)                       # InProcessBreaker()

# At scale: swap in a Redis-backed breaker/limiter (same protocol) for cross-worker
# state — the orchestrator code doesn't change.
client = ResilientClient(adapters, breaker=my_redis_breaker, limiter=my_redis_limiter)
```

The protocols are async precisely so a Redis-backed implementation (which does network I/O) can satisfy them — the in-process default just returns immediately.
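
As a rough illustration of why an async protocol fits both cases, the sketch below defines a structural protocol and an in-memory class that satisfies it. The method names (`allows`, `record_failure`, `record_success`) and the classes `BreakerLike` and `DictBreaker` are assumptions made for this example; the package's actual `Breaker` protocol may use different names and signatures, and a real breaker would also reopen after a cooldown.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class BreakerLike(Protocol):
    # Hypothetical method names; the library's real protocol may differ.
    async def allows(self, model_key: str) -> bool: ...
    async def record_failure(self, model_key: str) -> None: ...
    async def record_success(self, model_key: str) -> None: ...


class DictBreaker:
    """In-memory implementation: every coroutine returns without real I/O."""

    def __init__(self, threshold: int = 3) -> None:
        self._threshold = threshold
        self._failures: dict[str, int] = {}

    async def allows(self, model_key: str) -> bool:
        return self._failures.get(model_key, 0) < self._threshold

    async def record_failure(self, model_key: str) -> None:
        self._failures[model_key] = self._failures.get(model_key, 0) + 1

    async def record_success(self, model_key: str) -> None:
        self._failures[model_key] = 0  # a success resets the counter


async def demo() -> list[bool]:
    breaker: BreakerLike = DictBreaker(threshold=2)
    seen = [await breaker.allows("m")]      # closed at start
    await breaker.record_failure("m")
    await breaker.record_failure("m")
    seen.append(await breaker.allows("m"))  # open after two failures
    await breaker.record_success("m")
    seen.append(await breaker.allows("m"))  # closed again after a success
    return seen


states = asyncio.run(demo())
print(states)  # [True, False, True]
```

A network-backed class would keep the same method signatures and perform its I/O inside each coroutine, so calling code that only awaits these methods would not need to change.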

## Status

`0.1.1` — quorum semantics are grounded in a production multi-model deployment; failover serves the single-answer broad base. `0.x` while the API stabilizes through year one (breaking changes possible at minor bumps, documented in the CHANGELOG; **pin exact versions**).

## The Keel toolkit

Composable, vendor-neutral LLM reliability libraries on PyPI:
[`keel-llm-reliability`](https://pypi.org/project/keel-llm-reliability/) · [`keel-llm-protocol`](https://pypi.org/project/keel-llm-protocol/) · [`keel-llm-adapter-openai`](https://pypi.org/project/keel-llm-adapter-openai/) · [`keel-llm-adapter-anthropic`](https://pypi.org/project/keel-llm-adapter-anthropic/) · [`keel-llm-adapter-google`](https://pypi.org/project/keel-llm-adapter-google/) · [`keel-circuit-breaker`](https://pypi.org/project/keel-circuit-breaker/)

MIT licensed.
