LLM Providers
The "brain" is swappable. The framework core depends only on the LLMAdapter abstraction, so no vendor SDK API leaks into your agent code (Dependency Inversion).
Selecting a brain
Pass a "provider:model" string anywhere a brain is accepted (@agent(llm=...), Pipeline(llm=...), LocalRunner(llm=...), or create_adapter(...)).
from weave import create_adapter

brain = create_adapter("anthropic:claude-opus-4-8")
# complete() is a coroutine, so await it from inside an async function
text = await brain.complete("Hello", system="Be brief.")
Supported providers
| Spec prefix | Backend | Install extra | API key env |
|---|---|---|---|
| openai: | OpenAI Chat Completions | weaveflow[openai] | OPENAI_API_KEY |
| anthropic: | Anthropic Messages | weaveflow[anthropic] | ANTHROPIC_API_KEY |
| google: | Google Gemini | weaveflow[google] | GOOGLE_API_KEY / GEMINI_API_KEY |
| mistral: | Mistral (OpenAI-compatible) | weaveflow[mistral] | MISTRAL_API_KEY |
| deepseek: | DeepSeek (OpenAI-compatible) | weaveflow[deepseek] | DEEPSEEK_API_KEY |
| ollama: | Local Ollama (OpenAI-compatible) | weaveflow[ollama] | none (local) |
Examples: "openai:gpt-4o", "anthropic:claude-opus-4-8", "google:gemini-1.5-pro", "mistral:mistral-large-latest", "deepseek:deepseek-chat", "ollama:llama3". Ollama reads OLLAMA_HOST (default http://localhost:11434/v1).
Provider SDKs are imported lazily. A missing one raises an actionable AdapterNotInstalledError telling you which extra to install.
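One way to get this behavior is to defer the import until an adapter is first built. The sketch below is illustrative only: the `load_sdk` helper, the error's constructor arguments, the message wording, and the `weaveflow[demo]` extra are all assumptions made for the example, not the framework's actual internals.

```python
import importlib


class AdapterNotInstalledError(ImportError):
    """Hypothetical stand-in for the framework's error of the same name."""

    def __init__(self, provider: str, extra: str) -> None:
        super().__init__(
            f"The '{provider}' provider needs an optional dependency. "
            f"Install it with: pip install '{extra}'"
        )
        self.provider = provider
        self.extra = extra


def load_sdk(provider: str, module_name: str, extra: str):
    """Import a vendor SDK on first use instead of at package import time."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Chain the original error so the real cause stays visible.
        raise AdapterNotInstalledError(provider, extra) from exc


# A module that is always present imports normally...
json_module = load_sdk("demo", "json", "weaveflow[demo]")

# ...while a missing one produces an error that names the extra to install.
try:
    load_sdk("demo", "sdk_that_is_not_installed_xyz", "weaveflow[demo]")
except AdapterNotInstalledError as err:
    message = str(err)
```

Because the import happens inside the helper, importing the package itself never fails when an optional SDK is absent; only the provider that needs it does.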
Timeouts & retries
Every adapter is bounded by a timeout (default 30s) and retries transient failures with exponential backoff + jitter (max_retries, default 2). Deterministic framework errors (e.g. a missing SDK or API key) are surfaced immediately, never retried. The provider client is built once and reused (connection pooling).
from weave import create_adapter
brain = create_adapter("openai:gpt-4o", timeout=10, max_retries=3)
# pass the configured adapter straight to an agent or pipeline:
# @agent(..., llm=brain) · Pipeline([...], llm=brain)
After the retry budget is exhausted, the last failure is normalized into an AdapterError with the attempt count and underlying cause in its detail.
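The retry behavior described in this section can be sketched roughly as follows. This is not the framework's implementation: `call_with_retries`, `base_delay`, and the local `AdapterError` class are hypothetical, and the sketch assumes that the timeout applies per attempt, that `max_retries` counts retries after the first try, and that only timeouts and connection errors count as transient.

```python
import asyncio
import random


class AdapterError(Exception):
    """Hypothetical stand-in for the framework's normalized error."""

    def __init__(self, message: str, *, attempts: int) -> None:
        super().__init__(message)
        self.attempts = attempts


async def call_with_retries(make_call, *, timeout=30.0, max_retries=2, base_delay=0.5):
    """Run make_call() with a per-attempt timeout and jittered exponential backoff."""
    attempts = max_retries + 1  # the first try plus the retries
    last_error = None
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(make_call(), timeout)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc  # treated as transient in this sketch
        if attempt < attempts - 1:
            # Full jitter: sleep a random amount up to base_delay * 2**attempt.
            await asyncio.sleep(random.uniform(0, base_delay * 2**attempt))
    # Budget exhausted: wrap the last failure and keep it as the cause.
    raise AdapterError(f"failed after {attempts} attempts", attempts=attempts) from last_error


async def demo():
    calls = 0

    async def flaky():
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("temporary network failure")
        return "ok"

    async def always_down():
        raise ConnectionError("down")

    result = await call_with_retries(flaky, max_retries=3, base_delay=0.01)
    try:
        await call_with_retries(always_down, max_retries=1, base_delay=0.0)
    except AdapterError as err:
        exhausted = err
    return result, calls, exhausted


result, calls, exhausted = asyncio.run(demo())
```

Any exception outside the transient tuple (a `ValueError` for a missing key, say) propagates on the first attempt, which mirrors the rule that deterministic errors are never retried.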
The adapter contract
from weave import LLMAdapter

# Shape of the contract, shown for reference:
class LLMAdapter(ABC):
    async def complete(self, prompt: str, *, system: str | None = None, **opts) -> str: ...
    def stream(self, prompt: str, *, system: str | None = None, **opts): ...  # returns an async iterator
complete returns a single string; stream yields tokens. Provider errors are normalized into AdapterError so failures are uniform.
Bring your own provider
Implement the contract and register it, with no edits to existing dispatch logic (Open/Closed):
from weave import LLMAdapter, register_provider, create_adapter

class EchoAdapter(LLMAdapter):
    async def complete(self, prompt, *, system=None, **opts):
        return prompt

    async def stream(self, prompt, *, system=None, **opts):
        for token in prompt.split():
            yield token

register_provider("echo", EchoAdapter)
brain = create_adapter("echo:any-model")
You can also pass an already-constructed adapter instance anywhere a spec string is accepted. This is useful for tests and custom configuration:
Pipeline([my_agent], llm=EchoAdapter(model="x"))
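To see why registration needs no changes to dispatch code, it can help to picture the lookup as a dictionary keyed by spec prefix. The sketch below is a guess at the general shape, not the framework's code: `ProviderRegistry`, its methods, and `UpperAdapter` are hypothetical, and it assumes the spec is split on the first colon and that the model name is passed to the factory as a `model` keyword.

```python
from typing import Callable, Dict


class ProviderRegistry:
    """Hypothetical registry mapping a spec prefix to an adapter factory."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[..., object]] = {}

    def register(self, prefix: str, factory: Callable[..., object]) -> None:
        # Adding a provider only adds a dictionary entry; create() is untouched.
        self._factories[prefix] = factory

    def create(self, spec: str, **options):
        # Split on the first colon only, so model names may contain colons.
        prefix, sep, model = spec.partition(":")
        if not sep or not prefix or not model:
            raise ValueError(f"expected 'provider:model', got {spec!r}")
        try:
            factory = self._factories[prefix]
        except KeyError:
            known = ", ".join(sorted(self._factories)) or "none"
            raise ValueError(f"unknown provider {prefix!r} (registered: {known})") from None
        return factory(model=model, **options)


class UpperAdapter:
    """Toy adapter used only to exercise the registry."""

    def __init__(self, model: str, **options) -> None:
        self.model = model
        self.options = options


registry = ProviderRegistry()
registry.register("upper", UpperAdapter)
adapter = registry.create("upper:llama3:8b", timeout=10)
```

The design choice worth noting is that `create` never names a concrete provider, so new backends extend the system purely by registration.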
Streaming
stream is an async generator, ideal for the stream data type and long-form output:
async for token in ctx.stream("Write a haiku about ports"):
    print(token, end="", flush=True)
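For trying the consumption pattern without a provider or an agent context, a stand-in generator is enough. In this sketch `fake_stream` and `collect` are hypothetical helpers, and the word-by-word tokenization is an assumption chosen for simplicity; real adapters decide their own token boundaries.

```python
import asyncio
from typing import AsyncIterator


async def fake_stream(prompt: str) -> AsyncIterator[str]:
    """Stand-in for an adapter's stream(): yields one word at a time."""
    for word in prompt.split():
        await asyncio.sleep(0)  # hand control back to the event loop between tokens
        yield word


async def collect(prompt: str) -> str:
    """Consume the stream token by token and join the pieces at the end."""
    parts = []
    async for token in fake_stream(prompt):
        parts.append(token)  # a UI would render each token here as it arrives
    return " ".join(parts)


text = asyncio.run(collect("ports open at dawn"))
```

The same `async for` loop works against any object that returns an async iterator, which is all the contract asks of `stream`.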