Metadata-Version: 2.4
Name: keel-llm-protocol
Version: 0.1.2
Summary: The universal ModelAdapter Protocol for LLM clients — a vendor-neutral adapter interface (the "JDBC of LLMs"). Zero dependencies.
Project-URL: Homepage, https://github.com/keelplatform/keel
Project-URL: Source, https://github.com/keelplatform/keel/tree/main/py/packages/llm-protocol
Project-URL: Changelog, https://github.com/keelplatform/keel/blob/main/py/packages/llm-protocol/CHANGELOG.md
Author: Raj Yakkali
License: MIT
Keywords: adapter,interface,keel,llm,protocol,vendor-neutral
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown

# keel-llm-protocol

> The vendor-neutral LLM adapter standard — the "JDBC of LLMs." A small, zero-dependency interface any LLM adapter implements, plus the standard types adapters exchange and a standardized error taxonomy. Implement it once; plug into anything that speaks it.

Part of the **Keel** toolkit. No provider lock-in, no base class to inherit, no framework. A reference implementation against the OpenAI-compatible wire format ships as [`keel-llm-adapter-openai`](https://pypi.org/project/keel-llm-adapter-openai/), so the standard is grounded in a working adapter — not just an interface document.

## Why a standard

Every product talking to more than one LLM provider rebuilds the same shim: a common interface so the rest of the code doesn't care which provider answered. Everyone's shim is slightly different, so nothing composes. `keel-llm-protocol` is that interface, written once, vendor-neutral, with the part that actually matters for reliability standardized too:

- **A standardized error taxonomy** — every adapter raises typed, `retryable`-tagged failures. A 429 is `RateLimitError` whether it came from Groq, Gemini, or a local model, so circuit breakers, retries, and failover act on *types* instead of parsing provider strings. (This is the difference that lets a circuit breaker *not* trip on a 429 — a healthy-but-throttled model — and defer to a rate limiter instead.)
- **Composable capability protocols** — implement only what your backend supports.
- **Normalized results** — `Usage`, `FinishReason`, `ToolCall` mean the same thing everywhere.

## Is this for you?

- **Adopt when:** you're building tooling across multiple LLM providers (a router, evaluator, observability layer, or gateway), writing a vendor-neutral adapter, or want a shared error vocabulary across LLM clients.
- **Skip when:** you only ever call one provider in a simple app (the official SDK is enough), or you're already invested in another abstraction (LangChain's `BaseChatModel`, LlamaIndex's `LLM`).

## Install

```bash
pip install keel-llm-protocol     # or: uv add keel-llm-protocol
```

Zero runtime dependencies (stdlib only).

## The composable protocols

```python
from keel_llm_protocol import ModelAdapter, StreamingModelAdapter, ToolCallingModelAdapter
```

- **`ModelAdapter`** — the core. `model_key`, `generate(messages, ...) -> AdapterResponse`, `health_check()`. Every adapter implements this.
- **`StreamingModelAdapter`** — adds `stream(...) -> AsyncIterator[StreamChunk]`. Implement only if your backend streams.
- **`ToolCallingModelAdapter`** — adds `generate_with_tools(messages, tools, ...)`. Implement only if your backend supports tools.

Consumers type against the capability they need. A plain text model and a streaming tool-using model both fit, and a static type checker (mypy, pyright) tells a consumer before the code runs whether the adapter it was handed can stream.
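To make the capability split concrete, here is a self-contained sketch that uses only `typing`. `TextModel` and `StreamingModel` are hypothetical, trimmed-down stand-ins for `ModelAdapter` and `StreamingModelAdapter` (not the package's real definitions, whose method signatures are richer). It shows a consumer asking for the narrower capability and, because the protocols are `@runtime_checkable`, probing for the wider one at runtime.

```python
import asyncio
from collections.abc import AsyncIterator
from typing import Protocol, runtime_checkable


@runtime_checkable
class TextModel(Protocol):
    """Hypothetical stand-in for the core ModelAdapter shape."""

    async def generate(self, prompt: str) -> str: ...


@runtime_checkable
class StreamingModel(TextModel, Protocol):
    """Hypothetical stand-in: the core shape plus stream()."""

    def stream(self, prompt: str) -> AsyncIterator[str]: ...


class PlainBackend:  # no base class; matches TextModel structurally
    async def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


class StreamingBackend(PlainBackend):  # also matches StreamingModel
    async def stream(self, prompt: str) -> AsyncIterator[str]:
        for word in prompt.split():
            yield word


async def render(model: TextModel, prompt: str) -> str:
    # Use streaming when the adapter offers it; otherwise fall back to generate().
    if isinstance(model, StreamingModel):
        return " ".join([piece async for piece in model.stream(prompt)])
    return await model.generate(prompt)


print(asyncio.run(render(PlainBackend(), "hi there")))      # echo: hi there
print(asyncio.run(render(StreamingBackend(), "hi there")))  # hi there
```

Typing `render` against the narrow protocol means any adapter is accepted, while the `isinstance` probe opts into the extra capability only when it is actually present.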

## Implement an adapter (structural — no inheritance)

```python
import asyncio

from keel_llm_protocol import (
    ModelAdapter, Message, AdapterResponse, HealthStatus, Usage, user,
)

class MyAdapter:  # no base class: it satisfies ModelAdapter structurally
    @property
    def model_key(self) -> str:
        return "myprovider:my-model"

    async def generate(self, messages, *, temperature=None, max_tokens=None,
                       stop=None, response_format=None) -> AdapterResponse:
        # ... call the provider, map failures to keel_llm_protocol.errors.* ...
        return AdapterResponse(
            text="...", model_key=self.model_key, model_id="my-model",
            usage=Usage(input_tokens=12, output_tokens=8), finish_reason="stop",
        )

    async def health_check(self) -> HealthStatus:
        return HealthStatus(model_key=self.model_key, healthy=True)

async def main() -> None:
    adapter: ModelAdapter = MyAdapter()
    assert isinstance(adapter, ModelAdapter)        # @runtime_checkable
    response = await adapter.generate([user("hello")])
    print(response.text)

asyncio.run(main())
```

## The error taxonomy (the reliability core)

Every adapter maps its provider's failures to these types, so consuming reliability logic is provider-agnostic:

```python
import asyncio

from keel_llm_protocol.errors import (
    AdapterError,          # base: catch this for "any adapter failure"
    RateLimitError,        # 429 / quota: retryable=True, carries retry_after
    AuthenticationError,   # 401/403: retryable=False
    AdapterTimeoutError,   # timed out: retryable=True
    TransientError,        # 5xx / connection: retryable=True
    ContentFilterError,    # content policy: retryable=False
    ContextLengthError,    # context window exceeded: retryable=False
    BadRequestError,       # 400 invalid request: retryable=False
    ProviderError,         # unexpected: retryable=False
)

async def call(adapter, messages):
    try:
        return await adapter.generate(messages)
    except RateLimitError as e:
        # Healthy but throttled: back off (then retry), don't trip the breaker.
        await asyncio.sleep(e.retry_after or 1.0)
        ...
    except AdapterError as e:
        if e.retryable:
            ...   # retry / failover
        else:
            raise
```

`AdapterError.retryable` lets generic retry/breaker logic decide *without* knowing which provider failed. Without a shared taxonomy, every product reimplements that classification per provider.
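Inside an adapter, the mapping step is often a small lookup from the provider's failure to a class in the taxonomy. The sketch below is self-contained: `classify_http_status` is a hypothetical helper that returns the taxonomy class *name* and its `retryable` flag, following the status codes listed in the comments above. A real adapter would raise the corresponding `keel_llm_protocol.errors` class instead, and would need more than the status code to detect `ContextLengthError` or `ContentFilterError` (usually the provider's error body) or `AdapterTimeoutError` (usually a client-side timeout exception rather than a status).

```python
def classify_http_status(status: int) -> tuple[str, bool]:
    """Map an HTTP status to (taxonomy class name, retryable).

    Hypothetical helper mirroring the comments in the import block above.
    It only sees the status code, so a context-length or content-policy 400
    falls through to BadRequestError here.
    """
    if status == 429:
        return "RateLimitError", True          # healthy but throttled
    if status in (401, 403):
        return "AuthenticationError", False    # credentials won't fix themselves
    if status == 400:
        return "BadRequestError", False        # the request itself is invalid
    if 500 <= status <= 599:
        return "TransientError", True          # provider-side, likely temporary
    return "ProviderError", False              # anything unexpected


print(classify_http_status(429))  # ('RateLimitError', True)
print(classify_http_status(503))  # ('TransientError', True)
```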

## Consuming the taxonomy (how to *act* on a typed error)

Typed errors only improve reliability if reliability machinery acts on them correctly. The key is that there are **three** reactions, not two — and a plain `retryable` boolean can't express the most important one. Dispatch on `error.category`:

- **`"backpressure"`** (e.g. 429) — a rate-limited model is *healthy, not failing*. **Defer** (let a rate limiter pace it); do **not** retry now, and do **not** record a circuit-breaker failure. Tripping a breaker on a 429 skips a *working* model — the opposite of what you want.
- **`"transient"`** (5xx, timeout) — a real but temporary failure → **retry / fail over** (this one *does* count as a breaker failure).
- **`"terminal"`** (auth, bad request, context-length, content-filter) — won't change on retry → **fail fast**; retrying just burns quota.

```python
from keel_llm_protocol.errors import AdapterError

async def call(adapter, messages):
    try:
        return await adapter.generate(messages)
    except AdapterError as e:
        if e.category == "backpressure":
            ...        # defer to the rate limiter; do NOT record a breaker failure
        elif e.category == "transient":
            ...        # retry / fail over (this one DOES count as a breaker failure)
        else:          # "terminal"
            raise      # fail fast: won't change on retry
```

> Grounded result: the `backpressure`-vs-`transient` split — *defer* the 429 instead of *retrying-or-failing* it — moved a throttled model from **3/10 to 10/10** availability in a real multi-model fan-out (the model stayed in rotation instead of being spuriously circuit-broken). `retryable` remains as a convenience (`category != "terminal"`), but `category` is the source of truth because only it distinguishes *defer* from *retry-now*. The machinery that implements this dispatch lives in [`keel-llm-reliability`](https://pypi.org/project/keel-llm-reliability/).

This is *guidance*, not behavior baked into the protocol — routing, retry, and breaker policy live in your code (the protocol never decides or holds state). The taxonomy gives you the typed signal; this is how to use it well.
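To illustrate the breaker rule from the consumer's side, here is a deliberately minimal sketch. `SketchBreaker` is hypothetical (it is not `keel-circuit-breaker`'s API) and omits the half-open and recovery timing a real breaker needs. It also assumes terminal errors should not count, since they describe the request rather than the model's health; the README only specifies the backpressure and transient cases.

```python
from typing import Literal

Category = Literal["backpressure", "transient", "terminal"]


class SketchBreaker:
    """Hypothetical consecutive-failure breaker; no half-open or recovery timer."""

    def __init__(self, failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    @property
    def is_open(self) -> bool:
        return self.consecutive_failures >= self.failure_threshold

    def record_success(self) -> None:
        self.consecutive_failures = 0

    def record_error(self, category: Category) -> None:
        if category == "transient":
            # A real, temporary failure of the model: counts toward opening.
            self.consecutive_failures += 1
        # "backpressure" (429): healthy but throttled, so it never counts.
        # "terminal": not counted either (an assumption of this sketch).


breaker = SketchBreaker()
for _ in range(10):
    breaker.record_error("backpressure")
print(breaker.is_open)  # False: ten 429s leave the model in rotation
```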

## Standard types

```python
from dataclasses import dataclass, field
from typing import Any

# Usage, ToolCall and FinishReason are the package's other standard types.

@dataclass
class AdapterResponse:
    text: str
    model_key: str
    model_id: str
    finish_reason: FinishReason = "stop"     # "stop" | "length" | "tool_calls" | "content_filter" | "error"
    usage: Usage = field(default_factory=Usage)   # input_tokens, output_tokens, cost_usd?, .total_tokens
    tool_calls: list[ToolCall] = field(default_factory=list)
    latency_ms: int = 0
    raw_response: dict[str, Any] | None = None   # escape hatch; not part of the stable contract

@dataclass
class StreamChunk:
    delta: str = ""
    finish_reason: FinishReason | None = None
    usage: Usage | None = None               # typically only on the final chunk
```

Message helpers keep call sites clean: `system(...)`, `user(...)`, `assistant(...)`, `tool(..., tool_call_id=...)`.

## What this is *not* (the framework line)

A Protocol describes the *shape* of a call and the *type* of a result. It deliberately does **not**: execute tools or run agent loops, decide retry/backoff policy, route failover, buffer/reassemble streams, or hold conversation state. Those are the consumer's to own. *If it makes a decision or holds state, it isn't in this package.*

## Status

`0.1.2` — grounded by [`keel-llm-adapter-openai`](https://pypi.org/project/keel-llm-adapter-openai/) as a working reference implementation. An interface is most valuable when stable; the surface is intentionally minimal-but-complete and stays in `0.x` through year one (changes documented in the CHANGELOG; **pin exact versions**).

## The Keel toolkit

Composable, vendor-neutral LLM reliability libraries on PyPI:
[`keel-llm-reliability`](https://pypi.org/project/keel-llm-reliability/) · [`keel-llm-protocol`](https://pypi.org/project/keel-llm-protocol/) · [`keel-llm-adapter-openai`](https://pypi.org/project/keel-llm-adapter-openai/) · [`keel-llm-adapter-anthropic`](https://pypi.org/project/keel-llm-adapter-anthropic/) · [`keel-llm-adapter-google`](https://pypi.org/project/keel-llm-adapter-google/) · [`keel-circuit-breaker`](https://pypi.org/project/keel-circuit-breaker/)

MIT licensed.
