Metadata-Version: 2.4
Name: retrievalagent
Version: 0.2.1
Summary: retrievalagent — multi-backend retrieval-augmented generation with LangGraph
Project-URL: Homepage, https://github.com/bmsuisse/retrievalagent
Project-URL: Repository, https://github.com/bmsuisse/retrievalagent
Author: Dominik Peter
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: langchain-core>=0.3.0
Requires-Dist: langchain-openai>=0.3.0
Requires-Dist: langchain>=0.3.0
Requires-Dist: langgraph>=0.4.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: requests>=2.31.0
Requires-Dist: stop-words>=2024.1.1
Requires-Dist: tenacity>=8.2.0
Provides-Extra: all
Requires-Dist: azure-identity>=1.19.0; extra == 'all'
Requires-Dist: azure-search-documents>=11.6.0; extra == 'all'
Requires-Dist: chromadb>=1.0.0; extra == 'all'
Requires-Dist: cohere>=5.21.1; extra == 'all'
Requires-Dist: duckdb>=1.2.0; extra == 'all'
Requires-Dist: faiss-cpu>=1.9.0; extra == 'all'
Requires-Dist: httpx>=0.27.0; extra == 'all'
Requires-Dist: lancedb>=0.20.0; extra == 'all'
Requires-Dist: meilisearch>=0.40.0; extra == 'all'
Requires-Dist: pgvector>=0.4.0; extra == 'all'
Requires-Dist: psycopg[binary]>=3.2.0; extra == 'all'
Requires-Dist: python-dotenv>=1.2.2; extra == 'all'
Requires-Dist: qdrant-client>=1.12.0; extra == 'all'
Requires-Dist: rerankers>=0.6.0; extra == 'all'
Requires-Dist: rich>=13.0.0; extra == 'all'
Requires-Dist: sentence-transformers>=3.0.0; extra == 'all'
Provides-Extra: azure
Requires-Dist: azure-identity>=1.19.0; extra == 'azure'
Requires-Dist: azure-search-documents>=11.6.0; extra == 'azure'
Provides-Extra: cache-pg
Requires-Dist: psycopg-pool>=3.2.0; extra == 'cache-pg'
Requires-Dist: psycopg[binary]>=3.2.0; extra == 'cache-pg'
Provides-Extra: chromadb
Requires-Dist: chromadb>=1.0.0; extra == 'chromadb'
Provides-Extra: cli
Requires-Dist: python-dotenv>=1.2.2; extra == 'cli'
Requires-Dist: rich>=13.0.0; extra == 'cli'
Provides-Extra: cohere
Requires-Dist: cohere>=5.21.1; extra == 'cohere'
Provides-Extra: duckdb
Requires-Dist: duckdb>=1.2.0; extra == 'duckdb'
Provides-Extra: embed-anything
Requires-Dist: embed-anything>=0.7.0; extra == 'embed-anything'
Provides-Extra: eval
Requires-Dist: bm25s>=0.3.3; extra == 'eval'
Requires-Dist: datasets>=4.8.4; extra == 'eval'
Requires-Dist: mteb>=2.12.11; extra == 'eval'
Requires-Dist: pandas>=3.0.2; extra == 'eval'
Requires-Dist: pyarrow>=23.0.1; extra == 'eval'
Requires-Dist: python-dotenv>=1.2.2; extra == 'eval'
Requires-Dist: rich>=13.0.0; extra == 'eval'
Provides-Extra: faiss
Requires-Dist: faiss-cpu>=1.9.0; extra == 'faiss'
Provides-Extra: huggingface
Requires-Dist: sentence-transformers>=3.0.0; extra == 'huggingface'
Provides-Extra: jina
Requires-Dist: httpx>=0.27.0; extra == 'jina'
Provides-Extra: lancedb
Requires-Dist: lancedb>=0.20.0; extra == 'lancedb'
Requires-Dist: pyarrow>=23.0.1; extra == 'lancedb'
Provides-Extra: meilisearch
Requires-Dist: meilisearch>=0.40.0; extra == 'meilisearch'
Provides-Extra: pgvector
Requires-Dist: pgvector>=0.4.0; extra == 'pgvector'
Requires-Dist: psycopg[binary]>=3.2.0; extra == 'pgvector'
Provides-Extra: qdrant
Requires-Dist: qdrant-client>=1.12.0; extra == 'qdrant'
Provides-Extra: recommended
Requires-Dist: cohere>=5.21.1; extra == 'recommended'
Requires-Dist: meilisearch>=0.40.0; extra == 'recommended'
Requires-Dist: python-dotenv>=1.2.2; extra == 'recommended'
Requires-Dist: rich>=13.0.0; extra == 'recommended'
Provides-Extra: rerankers
Requires-Dist: rerankers>=0.6.0; extra == 'rerankers'
Provides-Extra: tune
Requires-Dist: optuna>=4.8.0; extra == 'tune'
Description-Content-Type: text/markdown

# retrievalagent

<div align="center">

**An autonomous retrieval-augmented generation agent.**
Plug in any vector store, any LLM, any reranker. Hybrid search,
reranking, query rewriting, an LLM quality gate, and an autonomous
retry loop — built on LangGraph.

[![PyPI](https://img.shields.io/pypi/v/retrievalagent)](https://pypi.org/project/retrievalagent/)
[![Python](https://img.shields.io/pypi/pyversions/retrievalagent)](https://pypi.org/project/retrievalagent/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![CI](https://github.com/bmsuisse/retrievalagent/actions/workflows/ci.yml/badge.svg)](https://github.com/bmsuisse/retrievalagent/actions/workflows/ci.yml)

</div>

---

```python
from retrievalagent import init_agent

rag = init_agent("documents", model="openai:gpt-5.4", backend="qdrant")
state = rag.chat("What is the status of operation overlord?")
print(state.answer)
```

---

## Scope — Retrieval, Not Ingestion

`retrievalagent` is built for **retrieval quality at query time** —
hybrid search, reranking, query rewriting, an autonomous retry loop,
and an LLM quality gate.

**Ingestion is out of scope.** The library does not chunk, clean,
embed-at-scale, or index your corpus. Use a dedicated tool for that —
[Docling](https://github.com/docling-project/docling),
[Unstructured](https://unstructured.io),
[LlamaIndex](https://www.llamaindex.ai), a Databricks job, or a
custom script — then point `retrievalagent` at the resulting index.
Every backend exposes a minimal `add_documents()` helper for
convenience and smoke tests; it is not meant to replace a real
ingestion pipeline.
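
As a rough sketch of that helper (mirroring the manual-setup example later in this README; the no-argument, BM25-only `InMemoryBackend` form is an assumption):

```python
from retrievalagent import Agent, InMemoryBackend

# Smoke test only; real corpora should be chunked and embedded by a proper
# ingestion pipeline before retrievalagent ever queries them.
backend = InMemoryBackend()
backend.add_documents([
    {"content": "Hybrid search fuses BM25 and vector results", "source": "notes"},
])
rag = Agent(index="smoke-test", backend=backend)
```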

The narrow surface is deliberate: one thing, done well.

---

## What it does

Most retrieval systems do a single search pass. `retrievalagent`
runs a state machine that retrieves, evaluates the result, rewrites
when needed, and retries — all autonomously, up to `max_iter` rounds.

Per query the agent will:

1. **Understand the intent** — rewrite the question into precise
   search keywords, detect keyword-vs-semantic, and pick the hybrid
   ratio.
2. **Search broadly** — run query variants in parallel across BM25
   and vector search; fuse the results; rerank.
3. **Evaluate** — an LLM quality gate decides whether the retrieved
   docs actually answer the question.
4. **Adapt** — if not, rewrite the query and retry; on hard
   failures, swarm-retrieve with parallel strategies as a fallback.
5. **Generate** — only once the evidence holds, produce a cited,
   grounded answer.
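
Given an agent `rag` (see the Quick Start below), the loop is observable from the returned state; field names follow the API reference later in this README:

```python
state = rag.invoke("What is the status of operation overlord?")
print(state.iterations)  # number of retrieve / judge / rewrite rounds it took
print(state.query)       # the final rewritten search query
print(state.answer)      # grounded answer with inline citations
```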

---

## Features

- **Fully async pipeline** — parallel HyDE + preprocessing, zero
  blocking calls; every public op has sync and async variants.
- **LLM quality gate** — rejects weak results, drives the rewrite
  loop until the evidence holds.
- **Multi-query swarm** — fans out across BM25 + vector
  simultaneously, fuses results.
- **Autonomous retry loop** — retrieve → judge → rewrite → retry,
  up to `max_iter` rounds.
- **Hybrid search** — BM25 + vector, fused with RRF or DBSF.
- **HyDE** — hypothetical document embeddings for vague queries.
- **Tool-calling agent** — `get_index_settings`, `get_filter_values`,
  `search_hybrid`, `search_bm25`, `rerank_results`; the LLM picks
  tools dynamically.
- **Multiple rerankers** — Cohere, HuggingFace, Jina, ColBERT,
  RankGPT, embed-anything, or a custom callable.
- **8 search backends** — Meilisearch, Azure AI Search, ChromaDB,
  LanceDB, Qdrant, pgvector, DuckDB, InMemory.
- **Any LLM** — OpenAI, Azure, Anthropic, Ollama, Vertex AI, or any
  LangChain `BaseChatModel`.
- **One-line init** — `init_agent("docs", model="openai:gpt-5.4", backend="qdrant")`.
- **Multi-turn chat** — conversation history with citation-aware
  answers.
- **Auto-strategy** — the agent samples your collection at init and
  tunes itself.
- **Optuna auto-tuner** — 20+ retrieval knobs tuned to your corpus
  in ~5 min; `None` is a first-class value for disabling stages.
  [Full guide](docs/auto-tuning.md).

---

## Install

```bash
# Recommended — Meilisearch + Cohere reranker + interactive CLI
pip install retrievalagent[recommended]

# Base only — in-memory backend, BM25 keyword search
pip install retrievalagent
```

| Extra | What you get | Command |
|-------|-------------|---------|
| **`recommended`** | Meilisearch + Cohere reranker + Rich CLI | `pip install retrievalagent[recommended]` |
| `cli` | Interactive CLI with guided setup wizard | `pip install retrievalagent[cli]` |
| `all` | Every backend + reranker + CLI | `pip install retrievalagent[all]` |

<details>
<summary>Individual backends &amp; rerankers</summary>

```bash
pip install retrievalagent[meilisearch]
pip install retrievalagent[azure]
pip install retrievalagent[chromadb]
pip install retrievalagent[lancedb]
pip install retrievalagent[pgvector]
pip install retrievalagent[qdrant]
pip install retrievalagent[duckdb]
pip install retrievalagent[cohere]
pip install retrievalagent[huggingface]
pip install retrievalagent[jina]
pip install retrievalagent[rerankers]      # ColBERT, Flashrank, RankGPT, …
pip install retrievalagent[embed-anything] # Local Rust-accelerated embeddings + reranking
```

Mix and match: `pip install retrievalagent[qdrant,cohere,cli]`

</details>

---

## Quick Start

### One-liner with `init_agent`

The fastest way to get started — no provider imports, string aliases for everything:

```python
from retrievalagent import init_agent

# Minimal — in-memory backend, LLM from env vars
rag = init_agent("docs")

# OpenAI + Qdrant + Cohere reranker
rag = init_agent(
    "my-collection",
    model="openai:gpt-5.4",
    backend="qdrant",
    backend_url="http://localhost:6333",
    reranker="cohere",
)

# Anthropic + Azure AI Search (native vectorisation, no client-side embeddings)
rag = init_agent(
    "my-index",
    model="anthropic:claude-sonnet-4-6",
    gen_model="anthropic:claude-opus-4-6",
    backend="azure",
    backend_url="https://my-search.search.windows.net",
    reranker="huggingface",
    auto_strategy=True,
)

# Fully local — Ollama + ChromaDB + HuggingFace cross-encoder
rag = init_agent(
    "docs",
    model="ollama:llama3",
    backend="chroma",
    reranker="huggingface",
    reranker_model="cross-encoder/ms-marco-MiniLM-L-6-v2",
)
```

### Multi-collection routing

Pass several collections and let the agent decide which to search. The LLM
picks the relevant subset before retrieval, using either the collection names
alone or optional natural-language descriptions.

```python
from retrievalagent import init_agent

# List form — LLM routes by name only
rag = init_agent(
    collections=["products", "faq", "policies"],
    backend="qdrant",
    backend_url="http://localhost:6333",
    model="openai:gpt-5.4",
)

# Dict form — LLM routes using descriptions (better precision)
rag = init_agent(
    collections={
        "products": "Product catalog: SKUs, prices, specs, availability",
        "faq":      "Customer-facing FAQ, troubleshooting, return policy",
        "policies": "Internal HR/legal/compliance policy documents",
    },
    backend="qdrant",
    backend_url="http://localhost:6333",
    model="openai:gpt-5.4",
)

rag.invoke("What's our return policy?")       # → routes to faq / policies
rag.invoke("Price of SKU 12345?")              # → routes to products
```

Each retrieved document carries its origin in `metadata["_collection"]` so you
can merge, filter, or attribute citations downstream. One backend instance is
built per collection; they share the same backend type and URL.
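
Continuing the example above, a small sketch of attributing results per collection (documents follow the LangChain `Document` shape used throughout this README):

```python
from collections import defaultdict

state = rag.invoke("What's our return policy?")

by_collection: dict[str, list] = defaultdict(list)
for doc in state.documents:
    by_collection[doc.metadata.get("_collection", "unknown")].append(doc)

for name, docs in by_collection.items():
    print(f"{name}: {len(docs)} supporting documents")
```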

**Backend aliases**

| Alias | Class | Extra |
|-------|-------|-------|
| `"memory"` / `"in_memory"` | `InMemoryBackend` | _(none)_ |
| `"chroma"` / `"chromadb"` | `ChromaDBBackend` | `retrievalagent[chromadb]` |
| `"qdrant"` | `QdrantBackend` | `retrievalagent[qdrant]` |
| `"lancedb"` / `"lance"` | `LanceDBBackend` | `retrievalagent[lancedb]` |
| `"duckdb"` | `DuckDBBackend` | `retrievalagent[duckdb]` |
| `"pgvector"` / `"pg"` | `PgvectorBackend` | `retrievalagent[pgvector]` |
| `"meilisearch"` | `MeilisearchBackend` | `retrievalagent[meilisearch]` |
| `"azure"` | `AzureAISearchBackend` | `retrievalagent[azure]` |

**Reranker aliases**

| Alias | Class | `reranker_model` | Extra |
|-------|-------|-----------------|-------|
| `"cohere"` | `CohereReranker` | Cohere model name (default: `rerank-v3.5`) | `retrievalagent[cohere]` |
| `"huggingface"` / `"hf"` | `HuggingFaceReranker` | HF model name (default: `cross-encoder/ms-marco-MiniLM-L-6-v2`) | `retrievalagent[huggingface]` |
| `"jina"` | `JinaReranker` | Jina model name (default: `jina-reranker-v2-base-multilingual`) | `retrievalagent[jina]` |
| `"llm"` | `LLMReranker` | _(uses the agent's LLM)_ | _(none)_ |
| `"rerankers"` | `RerankersReranker` | Any model from the `rerankers` library | `retrievalagent[rerankers]` |
| `"embed-anything"` | `EmbedAnythingReranker` | ONNX reranker model (default: `jina-reranker-v1-turbo-en`) | `retrievalagent[embed-anything]` |

```python
# Cohere (default model)
rag = init_agent("docs", model="openai:gpt-5.4", reranker="cohere")

# HuggingFace — multilingual model
rag = init_agent("docs", model="openai:gpt-5.4", reranker="huggingface",
                 reranker_model="cross-encoder/mmarco-mMiniLMv2-L12-H384-v1")

# Jina
rag = init_agent("docs", model="openai:gpt-5.4", reranker="jina")  # uses JINA_API_KEY

# ColBERT via rerankers library
rag = init_agent("docs", model="openai:gpt-5.4", reranker="rerankers",
                 reranker_model="colbert-ir/colbertv2.0",
                 reranker_kwargs={"model_type": "colbert"})

# Pass a pre-built reranker instance directly
from retrievalagent import CohereReranker
rag = init_agent("docs", reranker=CohereReranker(model="rerank-v3.5", api_key="..."))
```

**Model strings:** any `"provider:model-name"` accepted by LangChain's `init_chat_model` — `openai`, `anthropic`, `azure_openai`, `google_vertexai`, `ollama`, `groq`, `mistralai`, and more.

### Manual setup

```python
from retrievalagent import Agent, InMemoryBackend

backend = InMemoryBackend(embed_fn=my_embed_fn)
backend.add_documents([
    {"content": "RAG combines retrieval with generation", "source": "wiki"},
    {"content": "Vector search finds similar embeddings", "source": "docs"},
])

rag = Agent(index="demo", backend=backend)

# Single query → full answer
state = rag.invoke("What is retrieval-augmented generation?")
print(state.answer)

# Retrieve only — documents without LLM answer
query, docs = rag.retrieve_documents("What is retrieval-augmented generation?")
for doc in docs:
    print(doc.page_content)

# Override top-K at call time
query, docs = rag.retrieve_documents("hybrid search", top_k=3)
```

### `Agent.from_model` — model string with explicit backend

```python
from retrievalagent import Agent, QdrantBackend

rag = Agent.from_model(
    "openai:gpt-5.4-mini",          # fast model for routing & rewriting
    index="docs",
    gen_model="openai:gpt-5.4",     # powerful model for the final answer
    backend=QdrantBackend("docs", url="http://localhost:6333"),
)
```

---

## Multi-turn Chat

```python
from retrievalagent import Agent, ConversationTurn

rag = Agent(index="articles")
history: list[ConversationTurn] = []

state = rag.chat("What is hybrid search?", history)
history.append(ConversationTurn(question="What is hybrid search?", answer=state.answer))

state = rag.chat("How does it compare to pure vector search?", history)
print(state.answer)
print(f"Sources: {len(state.documents)}")
```

Async variant:

```python
state = await rag.achat("What is hybrid search?", history)
```

### Cross-session memory with mem0

`history=` only carries the current session. For facts that should
persist across sessions (user preferences, past Q&A), plug
[mem0](https://docs.mem0.ai) into the agent — it uses an LLM to
extract discrete facts, deduplicate them, and resolve conflicts.

```bash
pip install mem0ai
```

```python
from retrievalagent import Agent
from mem0 import Memory  # or AsyncMemory for async-native I/O

rag = Agent(index="articles", mem0_memory=Memory())

cfg = {"configurable": {"user_id": "alice"}}
rag.invoke("I prefer answers in German.", config=cfg)
rag.invoke("What is hybrid search?", config=cfg)
# → mem0 stored the language preference and recalls it on the second call
```

How it integrates: the graph runs `read_memory` before retrieval and
`write_memory` after answer generation. Each call passes the
`user_id` to mem0's `search`/`add` so memories are scoped per user.
Recalled facts are filtered by mem0's relevance score against
`memory_relevance_threshold` (env: `RAG_MEMORY_RELEVANCE_THRESHOLD`,
default `0.7`) — only memories that clear the bar are injected, both
as a BM25 extra term during retrieval and as a system-prompt prefix
during generation. Below-threshold matches are still logged to
`state.trace` for diagnostics but do not influence the answer.
mem0 brings its own vector/graph store; see its
[docs](https://docs.mem0.ai) to point it at OpenAI, Anthropic,
Qdrant, Postgres, etc. See [`docs/memory.md`](docs/memory.md) for
the full memory matrix (history vs. checkpointer vs. memory_store
vs. mem0).
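
As a sketch, tightening the relevance bar via the environment variable and inspecting what was (or was not) recalled:

```python
import os

from mem0 import Memory
from retrievalagent import Agent

# Only clearly relevant memories should reach retrieval and generation.
os.environ["RAG_MEMORY_RELEVANCE_THRESHOLD"] = "0.85"

rag = Agent(index="articles", mem0_memory=Memory())
cfg = {"configurable": {"user_id": "alice"}}

state = rag.invoke("What is hybrid search?", config=cfg)
print(state.trace)  # below-threshold memory matches are logged here for diagnostics
```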

---

## Architecture

retrievalagent has two operating modes — both fully autonomous:

### Graph mode (`rag.chat` / `rag.invoke`)

The default. A LangGraph state machine that runs the full agentic pipeline:

```
Query
  │
  ├─[HyDE]──────────────────────────────────────────┐
  │  Hypothetical document embedding (parallel)      │
  │                                                  ▼
  ▼                                         [Embed HyDE text]
[Preprocess]                                         │
  Extract keywords + variants                        │
  Detect semantic_ratio + fusion strategy            │
  │                                                  │
  └──────────────────────────────────────────────────┘
                        │
                        ▼
              [Hybrid Search × N queries]
               BM25 + Vector, multi-arm
                        │
                        ▼
               [RRF / DBSF Fusion]
                        │
                        ▼
                    [Rerank]
               Cohere / HF / Jina / embed-anything / LLM
                        │
                        ▼
               [Quality Gate]
               LLM judges relevance
                   │         │
                (good)     (bad)
                   │         │
                   ▼         ▼
              [Generate]  [Rewrite] ──► loop (max_iter)
                   │
                   ▼
        Answer + [n] inline citations
```

### Tool-calling agent mode (`rag.invoke_agent`)

The agent receives a set of tools and reasons step-by-step, calling them in whatever order makes sense for the question. No fixed pipeline — pure field improvisation:

```
Query
  │
  ▼
[LLM Agent]  ◄──────────────────────────────────────┐
  Thinks: "What do I need to answer this?"           │
  │                                                  │
  ├── get_index_settings()                           │
  │   Discover filterable / sortable / boost fields  │
  │                                                  │
  ├── get_filter_values(field)                       │
  │   Sample real stored values for a field          │
  │   → build precise filter expressions             │
  │                                                  │
  ├── search_hybrid(query, filter, sort_fields)      │
  │   BM25 + vector, optional filter + sort boost    │
  │                                                  │
  ├── search_bm25(query, filter)                     │
  │   Fallback pure keyword search                   │
  │                                                  │
  ├── rerank_results(query, hits)                    │
  │   Re-rank with configured reranker               │
  │                                                  │
  └── [needs more info?] ─────────────────────────► │

  [done]
  │
  ▼
Answer  (tool calls explained inline)
```

Use `invoke_agent` when questions involve **dynamic filtering** — the agent inspects the index schema, samples real field values, builds filters on the fly, and decides whether to sort by business signals like popularity or recency.

---

## Examples

### 1. Knowledge base Q&A (InMemory, no external services)

```python
from retrievalagent import AgenticRAG, InMemoryBackend
from langchain_openai import ChatOpenAI

docs = [
    {"id": "1", "content": "The Eiffel Tower was built in 1889 for the World's Fair in Paris.", "topic": "history"},
    {"id": "2", "content": "The Louvre is the world's largest art museum, located in Paris.", "topic": "art"},
    {"id": "3", "content": "Photosynthesis converts sunlight and CO2 into glucose and oxygen.", "topic": "science"},
    {"id": "4", "content": "The Python programming language was created by Guido van Rossum in 1991.", "topic": "tech"},
    {"id": "5", "content": "Machine learning is a subset of artificial intelligence.", "topic": "tech"},
]

backend = InMemoryBackend(documents=docs)
llm = ChatOpenAI(model="gpt-5.4-mini")

rag = AgenticRAG(index="kb", backend=backend, llm=llm, gen_llm=llm)

state = rag.invoke("When was the Eiffel Tower built?")
print(state.answer)
# → "The Eiffel Tower was built in 1889 for the World's Fair in Paris. [1]"
print(state.query)        # rewritten query
print(state.iterations)   # how many retrieval rounds it took
```

---

### 2. Retrieve documents without generating an answer

Useful when you want the docs and will handle the answer yourself:

```python
from retrievalagent import AgenticRAG, InMemoryBackend

rag = AgenticRAG(index="kb", backend=backend)

query, docs = rag.retrieve_documents("machine learning", top_k=3)
print(f"Rewritten query: {query}")
for doc in docs:
    print(doc.page_content)
    print(doc.metadata)  # original fields + _rankingScore
```
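
Continuing the snippet above, the ranking score can gate what you feed into your own prompt (a sketch; the score scale depends on the configured backend and reranker):

```python
MIN_SCORE = 0.5  # illustrative threshold, not a library default

strong_docs = [
    doc for doc in docs
    if doc.metadata.get("_rankingScore", 0.0) >= MIN_SCORE
]
context = "\n\n".join(doc.page_content for doc in strong_docs)
```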

---

### 3. Multi-turn chat

```python
from retrievalagent import AgenticRAG, InMemoryBackend, ConversationTurn

rag = AgenticRAG(index="kb", backend=backend, llm=llm, gen_llm=llm)
history: list[ConversationTurn] = []

q1 = "What is machine learning?"
s1 = rag.chat(q1, history)
history.append(ConversationTurn(question=q1, answer=s1.answer))
print(s1.answer)

q2 = "How does it relate to AI?"   # pronoun resolved from history
s2 = rag.chat(q2, history)
history.append(ConversationTurn(question=q2, answer=s2.answer))
print(s2.answer)
```

---

### 4. Always-on filter (e-commerce: in-stock items only)

```python
from retrievalagent import AgenticRAG, MeilisearchBackend

backend = MeilisearchBackend(
    "products",
    url="http://localhost:7700",
    api_key="masterKey",
)

# Every search is scoped to in-stock items — no per-call boilerplate
rag = AgenticRAG(
    index="products",
    backend=backend,
    filter="is_in_stock = true",
    llm=llm,
    gen_llm=llm,
)

state = rag.invoke("red running shoes size 42")
for doc in state.documents:
    print(doc.metadata["product_name"], "|", doc.metadata["price"])
```

---

### 5. Filter + own-brand exclusion

```python
# Exclude own-brand articles and search for third-party alternatives
rag = AgenticRAG(
    index="products",
    backend=backend,
    filter="is_own_brand = false",
    llm=llm,
    gen_llm=llm,
)

state = rag.invoke("Find alternatives to our house-brand brake cleaner 500ml")
print(state.answer)
# LLM strips the brand prefix, rewrites to "brake cleaner 500ml",
# filter ensures only third-party results are returned.
```

---

### 6. Async usage (FastAPI / Databricks / Jupyter)

```python
import asyncio
from retrievalagent import AgenticRAG, InMemoryBackend

rag = AgenticRAG(index="kb", backend=backend, llm=llm, gen_llm=llm)

# Async single query
state = await rag.ainvoke("What is photosynthesis?")
print(state.answer)

# Async batch — runs all queries in parallel
states = await rag.abatch([
    "What is photosynthesis?",
    "Who created Python?",
    "Where is the Louvre?",
])
for s in states:
    print(s.answer)
```

Sync variants work from any context, including Databricks/Jupyter (a running event loop is handled automatically):

```python
# Safe to call from a notebook cell even with a running event loop
state = rag.invoke("What is photosynthesis?")
states = rag.batch(["question one", "question two"])
```

---

### 7. Tool-calling agent — dynamic filter discovery

When you don't know the filter values upfront, the agent inspects the schema and samples field values itself:

```python
from retrievalagent import AgenticRAG, MeilisearchBackend

rag = AgenticRAG(
    index="products",
    backend=MeilisearchBackend("products", url="http://localhost:7700"),
    llm=llm,
    gen_llm=llm,
)

# Agent calls get_index_settings() → get_filter_values("brand") →
# search_hybrid(filter="brand = 'Bosch'", sort_fields=["popularity"])
result = rag.invoke_agent("Show me the most popular Bosch power tools")
print(result)
```

---

### 8. Streaming the final answer

```python
async def stream_answer():
    async for chunk in rag.astream("Explain hybrid search in simple terms"):
        print(chunk, end="", flush=True)

asyncio.run(stream_answer())
```

---

### 9. Qdrant — vector search with metadata filter

```python
from retrievalagent import AgenticRAG, QdrantBackend
from qdrant_client import QdrantClient, models

# Insert docs (done once); embed is any (str) -> list[float] embedding function
client = QdrantClient("http://localhost:6333")
client.upsert("articles", points=[
    models.PointStruct(id=1, vector=embed("RAG combines retrieval and generation"),
                       payload={"content": "RAG combines retrieval and generation", "year": 2023}),
    models.PointStruct(id=2, vector=embed("Vector databases store high-dimensional embeddings"),
                       payload={"content": "Vector databases store high-dimensional embeddings", "year": 2022}),
])

from qdrant_client.models import FieldCondition, MatchValue

rag = AgenticRAG(
    index="articles",
    backend=QdrantBackend("articles", url="http://localhost:6333", embed_fn=embed),
    llm=llm,
    gen_llm=llm,
)

# Full agentic pipeline over the collection
state = rag.invoke("what is RAG?")

# Retrieve-only, no generated answer (a native Qdrant filter is sketched below)
_, docs = rag.retrieve_documents("vector databases")
```
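
The `FieldCondition` / `MatchValue` imports above are standard `qdrant_client` types; a sketch of the native filter object they build (how `QdrantBackend` consumes such a filter is not shown in this README):

```python
from qdrant_client.models import FieldCondition, Filter, MatchValue

# Restrict matches to documents whose payload has year == 2023
year_filter = Filter(
    must=[FieldCondition(key="year", match=MatchValue(value=2023))]
)
```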

---

### 10. Custom instructions (tone / domain)

```python
rag = AgenticRAG(
    index="legal_docs",
    backend=backend,
    llm=llm,
    gen_llm=llm,
    instructions=(
        "You are a legal assistant. Answer in formal language. "
        "Always cite the article number when referencing a law. "
        "If the context is insufficient, say so explicitly."
    ),
)

state = rag.invoke("What are the notice periods for dismissal?")
print(state.answer)
```

---

## Backends

### Azure AI Search

Native hybrid search — no client-side embeddings needed when the index has an integrated vectorizer:

```python
from retrievalagent import Agent, AzureAISearchBackend

# Native vectorization — service embeds the query server-side
rag = Agent(
    index="my-index",
    backend=AzureAISearchBackend(
        "my-index",
        endpoint="https://my-search.search.windows.net",
        api_key="...",
    ),
)

# Client-side vectorization
rag = Agent(
    index="my-index",
    backend=AzureAISearchBackend(
        "my-index",
        endpoint="https://my-search.search.windows.net",
        api_key="...",
        embed_fn=my_embed_fn,
    ),
)

# With Azure semantic reranking
rag = Agent(
    index="my-index",
    backend=AzureAISearchBackend(
        "my-index",
        endpoint="https://my-search.search.windows.net",
        api_key="...",
        semantic_config="my-semantic-config",
    ),
)
```

### Qdrant

```python
from retrievalagent import Agent, QdrantBackend

rag = Agent(
    index="my_collection",
    backend=QdrantBackend("my_collection", url="http://localhost:6333", embed_fn=my_embed_fn),
)
```

### ChromaDB

```python
from retrievalagent import Agent, ChromaDBBackend

rag = Agent(
    index="my_collection",
    backend=ChromaDBBackend("my_collection", path="./chroma_db", embed_fn=my_embed_fn),
)
```

### LanceDB

```python
from retrievalagent import Agent, LanceDBBackend

rag = Agent(
    index="docs",
    backend=LanceDBBackend("docs", db_uri="./lancedb", embed_fn=my_embed_fn),
)
```

### PostgreSQL + pgvector

```python
from retrievalagent import Agent, PgvectorBackend

rag = Agent(
    index="documents",
    backend=PgvectorBackend(
        "documents",
        dsn="postgresql://user:pass@localhost:5432/mydb",
        embed_fn=my_embed_fn,
    ),
)
```

### DuckDB

```python
from retrievalagent import Agent, DuckDBBackend

rag = Agent(
    index="vectors",
    backend=DuckDBBackend("vectors", db_path="./my.duckdb", embed_fn=my_embed_fn),
)
```

### Meilisearch

```python
from retrievalagent import Agent, MeilisearchBackend

rag = Agent(
    index="articles",
    backend=MeilisearchBackend("articles", url="http://localhost:7700", api_key="masterKey"),
)
```

### InMemory (default, zero dependencies)

```python
from retrievalagent import Agent, InMemoryBackend

backend = InMemoryBackend(embed_fn=my_embed_fn)
backend.add_documents([
    {"content": "RAG combines retrieval with generation", "source": "wiki"},
    {"content": "Vector search finds similar embeddings", "source": "docs"},
])

rag = Agent(index="demo", backend=backend)
```

---

## LLM Configuration

Pass a pre-built LangChain model or use `init_agent` / `Agent.from_model` for string-based init.  
When using `Agent` directly, configure via env vars or pass an explicit model instance.

### OpenAI

```python
from langchain_openai import ChatOpenAI
from retrievalagent import Agent

rag = Agent(
    index="articles",
    llm=ChatOpenAI(model="gpt-5.4", api_key="sk-..."),
    gen_llm=ChatOpenAI(model="gpt-5.4", api_key="sk-..."),
)
```

### Azure OpenAI (explicit keys)

```python
from langchain_openai import AzureChatOpenAI
from retrievalagent import Agent

llm = AzureChatOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    azure_deployment="gpt-5.4",
    api_key="...",
    api_version="2024-12-01-preview",
)
rag = Agent(index="articles", llm=llm, gen_llm=llm)
```

### Azure OpenAI (env vars)

```python
# Set: AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT
from retrievalagent import Agent

rag = Agent(index="articles")  # auto-detected
```

### Azure OpenAI with Managed Identity (no API key)

```python
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from langchain_openai import AzureChatOpenAI
from retrievalagent import Agent

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
llm = AzureChatOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    azure_deployment="gpt-5.4",
    azure_ad_token_provider=token_provider,
    api_version="2024-12-01-preview",
)
rag = Agent(index="articles", llm=llm, gen_llm=llm)
```

### Anthropic Claude

```bash
pip install langchain-anthropic
```

```python
from langchain_anthropic import ChatAnthropic
from retrievalagent import Agent

llm = ChatAnthropic(model="claude-sonnet-4-6", api_key="sk-ant-...")
rag = Agent(index="articles", llm=llm, gen_llm=llm)
```

### Ollama (local, no API key)

```bash
pip install langchain-ollama
```

```python
from langchain_ollama import ChatOllama
from retrievalagent import Agent

rag = Agent(
    index="articles",
    llm=ChatOllama(model="llama3.2", base_url="http://localhost:11434"),
    gen_llm=ChatOllama(model="llama3.2", base_url="http://localhost:11434"),
)
```

### Google Vertex AI

```bash
pip install langchain-google-vertexai
```

```python
from langchain_google_vertexai import ChatVertexAI
from retrievalagent import Agent

llm = ChatVertexAI(model="gemini-2.0-flash", project="my-gcp-project", location="us-central1")
rag = Agent(index="articles", llm=llm, gen_llm=llm)
```

### Separate fast and generation models

Use a cheap/fast model for query rewriting and routing, a powerful model for the final answer:

```python
from langchain_openai import AzureChatOpenAI
from retrievalagent import Agent

fast_llm = AzureChatOpenAI(azure_deployment="gpt-5.4-mini", api_key="...", api_version="2024-12-01-preview")
gen_llm  = AzureChatOpenAI(azure_deployment="gpt-5.4",      api_key="...", api_version="2024-12-01-preview")

rag = Agent(index="articles", llm=fast_llm, gen_llm=gen_llm)
```

---

## Rerankers

### Cohere

```python
from retrievalagent import Agent, CohereReranker

rag = Agent(index="articles", reranker=CohereReranker(model="rerank-v3.5", api_key="..."))
```

### HuggingFace cross-encoder (local, no API key)

```bash
pip install retrievalagent[huggingface]
```

```python
from retrievalagent import Agent, HuggingFaceReranker

rag = Agent(index="articles", reranker=HuggingFaceReranker())

# Multilingual
rag = Agent(index="articles", reranker=HuggingFaceReranker(model="cross-encoder/mmarco-mMiniLMv2-L12-H384-v1"))
```

### Jina (multilingual API)

```bash
pip install retrievalagent[jina]
```

```python
from retrievalagent import Agent, JinaReranker

rag = Agent(index="articles", reranker=JinaReranker(api_key="..."))  # or JINA_API_KEY env var
```

### rerankers — ColBERT / Flashrank / RankGPT / any cross-encoder

Unified bridge to the [`rerankers`](https://github.com/AnswerDotAI/rerankers) library by answer.ai:

```bash
pip install retrievalagent[rerankers]
```

```python
from retrievalagent import Agent, RerankersReranker

rag = Agent(index="articles", reranker=RerankersReranker("cross-encoder/ms-marco-MiniLM-L-6-v2", model_type="cross-encoder"))
rag = Agent(index="articles", reranker=RerankersReranker("colbert-ir/colbertv2.0", model_type="colbert"))
rag = Agent(index="articles", reranker=RerankersReranker("flashrank", model_type="flashrank"))
rag = Agent(index="articles", reranker=RerankersReranker("gpt-5.4-mini", model_type="rankgpt", api_key="..."))
```

### Embed-anything — Rust-accelerated local embeddings + reranking

Embeddings and reranking in a single Rust-powered package. Fully
local — no API keys, no network calls. Powered by
[embed-anything](https://github.com/StarlightSearch/EmbedAnything).

```bash
pip install retrievalagent[embed-anything]
```

```python
from retrievalagent import Agent, EmbedAnythingEmbedder, EmbedAnythingReranker

# Local embeddings — works as embed_fn (callable)
embedder = EmbedAnythingEmbedder("sentence-transformers/all-MiniLM-L6-v2")

# Local reranker — implements Reranker protocol
reranker = EmbedAnythingReranker("jinaai/jina-reranker-v1-turbo-en")

rag = Agent(
    index="articles",
    backend=QdrantBackend("articles", url="http://localhost:6333", embed_fn=embedder),
    embed_fn=embedder,
    reranker=reranker,
)
```

Mix and match freely — use embed-anything for one piece and a cloud provider for the other:

```python
from retrievalagent import Agent, EmbedAnythingEmbedder, CohereReranker

# Local embeddings + cloud reranker
rag = Agent(index="docs", embed_fn=EmbedAnythingEmbedder(), reranker=CohereReranker())

# Cloud embeddings + local reranker
from retrievalagent import EmbedAnythingReranker
rag = Agent(index="docs", embed_fn=azure_embed_fn, reranker=EmbedAnythingReranker())
```

### Custom reranker

```python
from retrievalagent import Agent, RerankResult

class MyReranker:
    def rerank(self, query: str, documents: list[str], top_n: int) -> list[RerankResult]:
        return [RerankResult(index=i, relevance_score=1.0 / (i + 1)) for i in range(min(top_n, len(documents)))]

rag = Agent(index="articles", reranker=MyReranker())
```

---

## Tools

When using `invoke_agent`, the LLM has access to a set of tools it can call in any order. No fixed pipeline — the agent decides what it needs.

| Tool | Description |
|------|-------------|
| `get_index_settings()` | Discover filterable, searchable, sortable, and boost fields from the index schema |
| `get_filter_values(field)` | Sample real stored values for a field — used to build precise filter expressions |
| `search_hybrid(query, filter_expr, semantic_ratio, sort_fields)` | BM25 + vector hybrid search with optional filter and sort boost |
| `search_bm25(query, filter_expr)` | Pure keyword search — fallback when hybrid returns poor results |
| `rerank_results(query, hits)` | Re-rank a list of hits with the configured reranker |

The agent follows this reasoning pattern:

1. Call `get_index_settings()` to learn the schema
2. If the question names a specific entity, call `get_filter_values(field)` to find the exact stored value
3. Call `search_hybrid()` with a filter and/or sort if relevant, otherwise broad hybrid search
4. Fall back to `search_bm25()` if results are thin
5. Call `rerank_results()` to surface the most relevant hits
6. Summarise — explaining which filters and signals influenced the answer

```python
from retrievalagent import Agent

rag = Agent(index="products")

# Agent inspects schema, detects brand field, samples values,
# builds filter, sorts by popularity signal — all autonomously
result = rag.invoke_agent("Show me the most popular Bosch power tools")
print(result)
```

---

## Constructor Reference

```python
Agent(
    index="my_index",           # collection / index name
    backend=...,                # SearchBackend (default: InMemoryBackend)
    llm=...,                    # fast LLM — routing, rewrite, filter
    gen_llm=...,                # generation LLM — final answer
    reranker=...,               # Cohere / HuggingFace / Jina / custom
    top_k=10,                   # final result count            [RAG_TOP_K]
    rerank_top_n=5,             # reranker top-n                [RAG_RERANK_TOP_N]
    retrieval_factor=4,         # over-retrieval multiplier     [RAG_RETRIEVAL_FACTOR]
    max_iter=20,                # max retrieve-rewrite cycles   [RAG_MAX_ITER]
    semantic_ratio=0.5,         # hybrid semantic weight        [RAG_SEMANTIC_RATIO]
    fusion="rrf",               # "rrf" or "dbsf"               [RAG_FUSION]
    instructions="",            # extra system prompt for generation
    embed_fn=None,              # (str) -> list[float]
    boost_fn=None,              # (doc_dict) -> float score boost
    filter=None,                # always-on Meilisearch filter expr (e.g. "brand = 'Bosch'")
    category_fields=None,       # fields used by alternative retrieve (None → auto-detect via regex)
    hyde_min_words=8,           # min words to trigger HyDE     [RAG_HYDE_MIN_WORDS]
    hyde_style_hint="",         # style hint for HyDE prompt
    auto_strategy=True,         # auto-tune from index samples
)
```
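
A minimal sketch of the two callables (the embedding model name and the boost heuristic are illustrative, not library defaults):

```python
from langchain_openai import OpenAIEmbeddings
from retrievalagent import Agent

_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

def my_embed_fn(text: str) -> list[float]:
    # embed_fn: (str) -> list[float]
    return _embeddings.embed_query(text)

def my_boost_fn(doc: dict) -> float:
    # boost_fn: (doc_dict) -> float score boost
    return 0.1 if doc.get("is_in_stock") else 0.0

rag = Agent(index="products", embed_fn=my_embed_fn, boost_fn=my_boost_fn)
```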

### Always-on filter

Pin every search to a subset of the index with `filter` — Meilisearch syntax,
AND-joined with any per-call filter (intent, language, ...):

```python
rag = init_agent("products", filter="brand = 'Bosch'")
# every BM25 + vector + hybrid search scoped to Bosch only
```

The legacy `base_filter` kwarg still works but emits a `DeprecationWarning` —
migrate to `filter` at your convenience.
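
Migration is a one-line change:

```python
rag = init_agent("products", base_filter="brand = 'Bosch'")   # legacy, emits DeprecationWarning
rag = init_agent("products", filter="brand = 'Bosch'")        # current
```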

### Category fields (alternative retrieve)

The alternative-retrieve fallback broadens the search by pivoting on
category-like fields (product groups, taxonomy levels, sections, ...). By
default, retrievalagent auto-detects them from the index schema via regex — matching
names like `category`, `product_group_l3`, `article_group_name`, `kategorie`,
`family`, `section`, ... — and prioritises deeper taxonomy levels
(`_l3 > _l2 > _l1`).

Override explicitly when your schema uses unusual names:

```python
rag = init_agent(
    "products",
    category_fields=["taxonomy_leaf", "taxonomy_parent", "department"],
)
```

Pass `category_fields=[]` to disable the fallback entirely.

---

## API Reference

| Method | Returns | Description |
|--------|---------|-------------|
| `rag.invoke(query)` | `RAGState` | Full RAG pipeline (sync) |
| `rag.ainvoke(query)` | `RAGState` | Full RAG pipeline (async) |
| `rag.chat(query, history)` | `RAGState` | Multi-turn chat (sync) |
| `rag.achat(query, history)` | `RAGState` | Multi-turn chat (async) |
| `rag.retrieve_documents(query, top_k)` | `(str, list[Document])` | Retrieve only, no answer |
| `rag.query(query)` | `str` | Answer string directly |
| `rag.invoke_agent(query)` | `str` | Tool-calling agent mode (sync) |
| `rag.ainvoke_agent(query)` | `str` | Tool-calling agent mode (async) |

`RAGState` fields: `answer` · `documents` · `query` · `question` · `history` · `iterations`
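
Two of these are not demonstrated elsewhere in this README; as a sketch:

```python
# Answer string only, no state object
answer = rag.query("What is hybrid search?")

# Tool-calling agent mode, async
result = await rag.ainvoke_agent("Show me the most popular Bosch power tools")
```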

---

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint | — |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI API key | — |
| `AZURE_OPENAI_DEPLOYMENT` | Default deployment | — |
| `AZURE_OPENAI_FAST_DEPLOYMENT` | Fast model deployment | → `DEPLOYMENT` |
| `AZURE_OPENAI_GENERATION_DEPLOYMENT` | Generation deployment | → `DEPLOYMENT` |
| `AZURE_OPENAI_API_VERSION` | API version | `2024-12-01-preview` |
| `OPENAI_API_KEY` | OpenAI API key (fallback) | — |
| `OPENAI_MODEL` | OpenAI model name | `gpt-5.4` |
| `AZURE_COHERE_ENDPOINT` | Azure Cohere endpoint | — |
| `AZURE_COHERE_API_KEY` | Azure Cohere API key | — |
| `COHERE_API_KEY` | Cohere API key (fallback) | — |
| `JINA_API_KEY` | Jina reranker API key | — |
| `MEILI_URL` | Meilisearch URL | `http://localhost:7700` |
| `MEILI_KEY` | Meilisearch API key | `masterKey` |
| `RAG_TOP_K` | Final result count | `10` |
| `RAG_RERANK_TOP_N` | Reranker top-n | `5` |
| `RAG_RETRIEVAL_FACTOR` | Over-retrieval multiplier | `4` |
| `RAG_SEMANTIC_RATIO` | Hybrid semantic weight | `0.5` |
| `RAG_FUSION` | Fusion strategy | `rrf` |
| `RAG_HYDE_MIN_WORDS` | Min words to trigger HyDE | `8` |
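
The `RAG_*` knobs can also be set from code before the agent is constructed (a sketch; values mirror the defaults above):

```python
import os

os.environ["RAG_TOP_K"] = "10"
os.environ["RAG_SEMANTIC_RATIO"] = "0.7"  # lean a bit more on vector search
os.environ["RAG_FUSION"] = "dbsf"
```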

---

## Tune It For Your Data

**`retrievalagent` ships with curated tuned defaults** in `[tool.retrievalagent]` of `pyproject.toml`,
found by running the built-in tuner against real German/Swiss product catalog
data. These are better than hand-picked defaults for most retrieval tasks.

For peak performance on **your** corpus (product vs. legal vs. support vs.
scientific), run the tuner yourself — it searches 20+ knobs with
[Optuna](https://optuna.org)'s TPE sampler and usually beats the defaults by
3–10% in combined score (hit@5 + paraphrase consistency).

**Real benchmark** (Meilisearch cloud, 3 German product-catalog indexes, 39
hit cases + 8 paraphrase groups):

| Config | hit@5 | consistency | stable_top1 | combined |
|--------|------:|------------:|------------:|---------:|
| Library defaults | 0.903 | 0.750 | 0.250 | 0.727 |
| Shipped tuned (`[tool.retrievalagent]`) | 0.968 | 0.792 | 0.250 | **0.761** |
| Full corpus-tuned (local) | **0.968** | **0.792** | **0.375** | **0.761** |

`combined = hit@5×0.4 + consistency×0.35 + stable_top1×0.25`

> 📘 **Full walkthrough:** [`docs/auto-tuning.md`](docs/auto-tuning.md) — testset
> design, CLI and Python API, every searched parameter, result interpretation,
> troubleshooting.

### 1. Install

```bash
pip install 'retrievalagent[tune]'
```

### 2. Write a testset

A list of `(query, expected_doc_ids, id_field)` tuples — or a JSON file:

```json
[
  {"query": "Makita Akku Bohrhammer 18V", "expected_ids": ["SKU-1065144"], "id_field": "sku"},
  {"query": "Bosch Winkelschleifer 125mm", "expected_ids": ["SKU-1057802"], "id_field": "sku"}
]
```
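
The same cases expressed as Python tuples, following the `(query, expected_doc_ids, id_field)` shape described above:

```python
hit_cases = [
    ("Makita Akku Bohrhammer 18V", ["SKU-1065144"], "sku"),
    ("Bosch Winkelschleifer 125mm", ["SKU-1057802"], "sku"),
]
```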

### 3. Run the tuner

**CLI (fastest path):**

```bash
python -m retrievalagent.tuner \
  --index my_index \
  --hits testset.json \
  --trials 50 \
  --patience 8         # early-stop if 8 trials show no improvement
```

Use `--pyproject` to write to `pyproject.toml [tool.retrievalagent]` instead of the
gitignored `retrievalagent.config.toml`.

**Python API (full control):**

```python
from retrievalagent import MeilisearchBackend, RAGConfig
from retrievalagent.tuner import RAGTuner, load_testset
from retrievalagent.utils import _make_azure_embed_fn

tuner = RAGTuner(
    backend_factory=lambda: MeilisearchBackend(index="my_index"),
    embed_fn=_make_azure_embed_fn(),
    hit_cases=load_testset("testset.json"),
    eval_k=5,
    # Optional: let the tuner mix/match weak + thinking models across
    # cost tiers (gen_llm / strong_model stays fixed — quality-critical).
    candidate_models=["azure:gpt-5.4-mini", "azure:gpt-5.4-nano"],
)

best = tuner.tune(
    n_trials=50,
    patience=8,             # early-stop on plateau
    trial_timeout_s=120,    # hung trials score 0, never block the study
)
best.save_toml("retrievalagent.config.toml")   # gitignored — local override (recommended)
# or: best.save_pyproject()          # [tool.retrievalagent] — commit if your team shares tuning
```

### 4. Use the tuned config

No code change required — `AgenticRAG` picks up `[tool.retrievalagent]` automatically:

```python
from retrievalagent import AgenticRAG, RAGConfig, MeilisearchBackend

rag = AgenticRAG(
    index="my_index",
    backend=MeilisearchBackend("my_index"),
    embed_fn=embed_fn,
    config=RAGConfig.auto(),   # discovers pyproject.toml → retrievalagent.config.toml → env
)
```

### Config discovery order

1. **Runtime kwarg** — `AgenticRAG(config=RAGConfig(...))`
2. **`retrievalagent.config.toml`** — per-deployment local override. **Gitignored by
   default** so your corpus-specific tuning doesn't leak into source control.
   Wins over pyproject defaults — drop a tuned file here and you're done.
3. **`[tool.retrievalagent]` in `pyproject.toml`** — shipped / shared defaults.
   `retrievalagent` ships with curated values here (tuned on real product-catalog data).
   Matches ruff/black/mypy convention for committed tool config.
4. **`RAG_*` env vars** — containers/CI overrides.
5. **Library defaults** — fallback if nothing else is set.

### What gets tuned

**Scalar thresholds:**

| Parameter | Range | Effect |
|-----------|-------|--------|
| `retrieval_factor` | 2–8 | Over-retrieve multiplier before reranking |
| `rerank_top_n` | 3–10 | Docs kept post-rerank |
| `rerank_cap_multiplier` | 1.5–4 | Caps reranker input at `top_k × m` |
| `semantic_ratio` | 0.3–0.9 | BM25 ⇄ vector balance |
| `fusion` | `rrf` / `dbsf` | Score fusion strategy |
| `short_query_threshold` | 3–8 | When to skip LLM preprocessing |
| `short_query_sort_tokens` | bool | Sort tokens for paraphrase invariance |
| `bm25_fallback_semantic_ratio` | 0.7–1.0 | Semantic ratio used when BM25 fails |
| `rerank_skip_gap` | 0.05–0.3 | Top-1 vs top-5 score gap to skip reranker |
| `name_field_boost_max` | 0.0–0.5 | Post-rerank boost for docs matching query tokens in `name_field`. Higher → precise lookups win; lower → paraphrase stability. |

**Optional (`None` = disable stage, first-class tuning option):**

| Parameter | Range / None | Effect |
|-----------|--------------|--------|
| `bm25_fallback_threshold` | 0.2–0.6 / `None` | BM25 score below which we boost semantic. `None` = never boost. |
| `fast_accept_score` | 0.5–0.95 / `None` | BM25 score to accept fast path. `None` = always slow path. |
| `fast_accept_confidence` | 0.6–0.95 / `None` | LLM confidence to accept fast path. `None` = no LLM confirm. |
| `rerank_skip_dominance` | 0.6–0.95 / `None` | Score to skip reranker on obvious hits. `None` = always rerank. |
| `expert_threshold` | 0.05–0.3 / `None` | Gap to escalate to expert reranker. `None` = never escalate. |
| `hyde_min_words` | 2–20 / `None` | Min words to trigger HyDE. `None` = disable HyDE. |

**Pipeline-stage toggles:**

| Parameter | Default | Effect |
|-----------|---------|--------|
| `enable_hyde` | `true` | Hypothetical-document expansion |
| `enable_filter_intent` | `true` | LLM detects filter intent from query |
| `enable_quality_gate` | `true` | LLM judges retrieval quality before answering |
| `enable_preprocess_llm` | `true` | LLM query rewrite + variant generation |
| `enable_reasoning` | `true` | Per-document relevance reasoning (uses `thinking_model`) |

**LLM tiers (cost ⇄ quality, mix and match):**

| Parameter | Role | Typical cheap pick |
|-----------|------|--------------------|
| `strong_model` | Final cited answer — gen_llm | `azure:gpt-5.4` |
| `weak_model` | Preprocess / quality / filter-intent / rewrite | `azure:gpt-5.4-mini` |
| `thinking_model` | Per-document reasoning when `enable_reasoning=true` | `azure:gpt-5.4-mini` |

Specs are `provider:model` — resolved via LangChain's `init_chat_model`, so
every supported provider works (`azure:`, `openai:`, `anthropic:`,
`bedrock:`, `ollama:`, etc.). `None` = inherit from env vars.

Optuna's TPE sampler learns from prior trials — **50 trials usually beats
hand-tuning**. The tuner explores *disabling* each optional stage as a
first-class hypothesis, so it can discover, for example, that your corpus
doesn't need filter-intent detection. Use a cheap LLM (`gpt-5.4-mini`) during
tuning to keep cost down; swap to your production LLM afterwards.

### Disabling stages in TOML

TOML has no `null`, so disabled fields go in a `disable` list:

```toml
[tool.retrievalagent]
top_k = 10
semantic_ratio = 0.5
# disable these optional stages for this corpus:
disable = ["bm25_fallback_threshold", "expert_threshold"]
```

`RAGConfig.from_toml()` / `from_pyproject()` translates `disable = [...]`
into `None` values on load.
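
For example, loading such a file explicitly (a sketch; the path argument and attribute names are assumptions based on the tables above):

```python
from retrievalagent import RAGConfig

cfg = RAGConfig.from_toml("retrievalagent.config.toml")
assert cfg.expert_threshold is None          # listed in `disable`, loaded as None
assert cfg.bm25_fallback_threshold is None
```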

---

## CLI

```bash
pip install retrievalagent[recommended]

# Guided setup wizard — choose LLM, embedder, backend, reranker
retrievalagent

# Chat mode — full agentic pipeline
retrievalagent --chat -c my_index

# Retriever mode — documents only, no LLM
retrievalagent --retriever -c my_index

# Skip wizard, use env vars
retrievalagent --skip-wizard -c my_index
```

The wizard guides you through:
1. **LLM provider** — OpenAI, Anthropic, Ollama, or env default
2. **Embedding model** — OpenAI, Azure OpenAI, Ollama, or none (BM25 only)
3. **Vector store** — InMemory, Meilisearch, ChromaDB, Qdrant, pgvector, DuckDB, LanceDB, Azure AI Search
4. **Reranker** — Cohere, Jina, HuggingFace, LLM-based, or none
5. **Mode** — Chat (with answers) or Retriever (documents only)

---

## License

MIT — *Licence to code.*
