Metadata-Version: 2.4
Name: keel-llm-reliability
Version: 0.1.0
Summary: Composed reliability for multi-model LLM calls — quorum fan-out + primary/failover, category-dispatched, transparent degradation. Built on keel-llm-protocol + keel-circuit-breaker.
Project-URL: Homepage, https://github.com/keelplatform/keel
Project-URL: Source, https://github.com/keelplatform/keel/tree/main/py/packages/llm-reliability
Project-URL: Changelog, https://github.com/keelplatform/keel/blob/main/py/packages/llm-reliability/CHANGELOG.md
Author: Raj Yakkali
License: MIT
Keywords: failover,fan-out,keel,llm,reliability,resilience
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: keel-circuit-breaker>=0.1.0
Requires-Dist: keel-llm-protocol>=0.1.0
Description-Content-Type: text/markdown

# keel-llm-reliability

> Production-grade reliability for multi-model LLM calls — quorum fan-out **and** primary/failover, category-dispatched, with *transparent degradation*. The composed solution, not a parts bin.

Part of [Keel](https://github.com/keelplatform/keel). Composes [`keel-llm-protocol`](https://github.com/keelplatform/keel/tree/main/py/packages/llm-protocol) (the error taxonomy) + [`keel-circuit-breaker`](https://github.com/keelplatform/keel/tree/main/py/packages/circuit-breaker) into the consumer-side machinery that *acts* on typed errors — so you don't hand-write the fan-out/failover loop.

## Why it exists

A typed error taxonomy tells you *what* failed; this tells your app *what to do about it*. The core lesson (measured in production): **a rate-limited model is healthy, not failing** — defer it, don't trip its circuit. Acting on that one distinction moved a throttled model from 3/10 to 10/10 availability. This package generalizes it into two strategies and makes every decision visible.

## Install

```bash
pip install keel-llm-reliability     # pulls in keel-llm-protocol + keel-circuit-breaker
```

## Two strategies

```python
import asyncio

from keel_llm_reliability import ResilientClient, Request
from keel_llm_protocol import user

# groq_adapter, gemini_adapter, openai_adapter: provider adapters constructed
# elsewhere (not shown here).
client = ResilientClient([groq_adapter, gemini_adapter, openai_adapter])
req = Request(messages=[user("Summarize this in one line.")])


async def main() -> None:
    # Primary + ordered failover: the single-good-answer case (most apps).
    result = await client.failover(req)
    if result.succeeded:
        print(result.response.text)

    # Quorum / parallel fan-out: the ensemble/council case.
    result = await client.fan_out(req)
    for r in result.successes:  # every model that answered
        ...


asyncio.run(main())
```

Both are also available as plain functions (`fan_out`, `failover`) if you'd rather wire collaborators yourself.

## Transparent degradation — every decision is data

No silent retries, no hidden fallbacks. Every provider interaction is a visible `Attempt`:

```python
result = await client.failover(req)
for a in result.attempts:
    print(a.model_key, a.outcome, a.latency_ms, a.error and a.error.category)
# groq:…     deferred_backpressure  120   backpressure   (throttled — skipped, NOT failed)
# gemini:…   failed                 310   transient      (5xx — counted, failed over)
# openai:…   success                420   None
```

`outcome` is one of `success` / `preempted_open` / `preempted_limited` / `deferred_backpressure` / `failed`. A `failed` attempt carries its `error.category` (`transient` vs `terminal`) so you can tell "flaky" from "broken config." This generalizes a council's `judges_count` — degradation you can see and operate on.

## How it behaves (category-dispatched)

| Error category | fan_out (quorum) | failover |
|---|---|---|
| `backpressure` (429) | defer — contributes nothing this round; **no breaker failure** | route to the next candidate immediately; **no breaker failure** |
| `transient` (5xx, timeout) | record a breaker failure; that model contributes nothing | record a breaker failure; fail over (optionally retry the same model up to `transient_retries`) |
| `terminal` (auth/bad-request/context/content) | visible `failed`; **no breaker failure** (request-level, not model health) | visible `failed`; fail over |
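
The table can be read as a pure decision function. This sketch encodes it with hypothetical names (`Decision`, `dispatch`); the package's real dispatch is internal and may be structured differently, and the optional same-model retry under `transient_retries` is left out. What it shows is that only a `transient` error counts against a model's breaker.

```python
from dataclasses import dataclass
from typing import Literal

Category = Literal["backpressure", "transient", "terminal"]
Strategy = Literal["fan_out", "failover"]


@dataclass(frozen=True)
class Decision:
    outcome: str                  # outcome recorded on the attempt
    record_breaker_failure: bool  # does this count against model health?
    try_next_candidate: bool      # failover moves on; fan_out already ran all models


def dispatch(category: Category, strategy: Strategy) -> Decision:
    move_on = strategy == "failover"
    if category == "backpressure":
        # Throttled is healthy: defer (fan_out) or route onward (failover), never trip.
        return Decision("deferred_backpressure", False, move_on)
    if category == "transient":
        # 5xx / timeout: a model-health signal, so the breaker counts it.
        return Decision("failed", True, move_on)
    # terminal: request-level problem (auth, bad request, context, content).
    return Decision("failed", False, move_on)


for cat in ("backpressure", "transient", "terminal"):
    print(cat, dispatch(cat, "failover"))
```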

Before any dispatch, both strategies **preempt**: a model whose breaker is open (`preempted_open`) or whose limiter predicts it's full (`preempted_limited`) is skipped — *predict, don't block*. There are no hidden sleeps; exhaustion returns visibly (empty successes / `response=None`).

## Injected collaborators — born ready for scale

The `Breaker` and (optional) `Limiter` are **injected async protocols**, never owned:

```python
from keel_llm_reliability import InProcessBreaker, ResilientClient

# Default: zero-config in-process breaker (wraps keel-circuit-breaker).
client = ResilientClient(adapters)                       # breaker defaults to InProcessBreaker()

# At scale: swap in a Redis-backed breaker/limiter (same protocol) for cross-worker
# state — the orchestrator code doesn't change.
client = ResilientClient(adapters, breaker=my_redis_breaker, limiter=my_redis_limiter)
```

The protocols are async precisely so a Redis-backed implementation (which does network I/O) can satisfy them — the in-process default just returns immediately.
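
A sketch of what an injected async breaker protocol can look like, using `typing.Protocol`. The method names (`allow`, `record_success`, `record_failure`) and the `CountingBreaker` class are hypothetical, not the package's actual `Breaker` protocol; see the package source for the real signatures. The in-process toy satisfies the async interface without ever awaiting I/O, while a Redis-backed implementation would await network calls inside the same methods.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class BreakerLike(Protocol):
    # Hypothetical protocol shape; async so that network-backed stores can fit it.
    async def allow(self, model_key: str) -> bool: ...
    async def record_success(self, model_key: str) -> None: ...
    async def record_failure(self, model_key: str) -> None: ...


class CountingBreaker:
    """In-process toy: opens after `threshold` consecutive failures.

    No I/O and no half-open recovery; a success resets the count.
    """

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures: dict[str, int] = {}

    async def allow(self, model_key: str) -> bool:
        return self.failures.get(model_key, 0) < self.threshold

    async def record_success(self, model_key: str) -> None:
        self.failures[model_key] = 0

    async def record_failure(self, model_key: str) -> None:
        self.failures[model_key] = self.failures.get(model_key, 0) + 1


async def demo() -> bool:
    breaker: BreakerLike = CountingBreaker(threshold=2)
    await breaker.record_failure("gemini")
    await breaker.record_failure("gemini")
    return await breaker.allow("gemini")


print(asyncio.run(demo()))  # False: two transient failures opened the circuit
```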

## Status

`0.1.0`: first release. Quorum semantics are grounded in LLMCouncil's production fan-out (PR #77); failover covers the broader single-answer use case. Pin exact versions while in `0.x`. Source: [Keel monorepo](https://github.com/keelplatform/keel/tree/main/py/packages/llm-reliability).

## License

MIT — see [LICENSE](https://github.com/keelplatform/keel/blob/main/LICENSE).
