Metadata-Version: 2.4
Name: ai-prishtina-agentic-rag
Version: 1.0.3
Summary: A comprehensive, professional-grade agentic Retrieval-Augmented Generation (RAG) library — core building blocks for building RAG applications
Author-email: "Alban Maxhuni, PhD" <info@albanmaxhuni.com>
License: AGPL-3.0-or-later OR Commercial
Project-URL: Homepage, https://github.com/albanmaxhuni/ai-prishtina-agentic-rag
Project-URL: Documentation, https://ai-prishtina-agentic-rag.readthedocs.io
Project-URL: Repository, https://github.com/albanmaxhuni/ai-prishtina-agentic-rag
Project-URL: Issues, https://github.com/albanmaxhuni/ai-prishtina-agentic-rag/issues
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.10.0
Requires-Dist: typing-extensions>=4.12.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: requests>=2.32.0
Requires-Dist: aiohttp>=3.11.0
Requires-Dist: tenacity>=9.0.0
Requires-Dist: rich>=13.9.0
Requires-Dist: tqdm>=4.67.0
Requires-Dist: pyyaml>=6.0.2
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: structlog>=24.4.0
Requires-Dist: tiktoken>=0.8.0
Provides-Extra: embeddings
Requires-Dist: sentence-transformers>=3.3.0; extra == "embeddings"
Provides-Extra: openai
Requires-Dist: openai>=1.58.0; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.40.0; extra == "anthropic"
Provides-Extra: cohere
Requires-Dist: cohere>=5.13.0; extra == "cohere"
Provides-Extra: ollama
Requires-Dist: ollama>=0.4.0; extra == "ollama"
Provides-Extra: llama-cpp
Requires-Dist: llama-cpp-python>=0.3.0; extra == "llama-cpp"
Provides-Extra: local-llm
Requires-Dist: transformers>=4.47.0; extra == "local-llm"
Requires-Dist: torch>=2.5.0; extra == "local-llm"
Provides-Extra: llm-all
Requires-Dist: openai>=1.58.0; extra == "llm-all"
Requires-Dist: anthropic>=0.40.0; extra == "llm-all"
Requires-Dist: cohere>=5.13.0; extra == "llm-all"
Requires-Dist: ollama>=0.4.0; extra == "llm-all"
Requires-Dist: llama-cpp-python>=0.3.0; extra == "llm-all"
Requires-Dist: transformers>=4.47.0; extra == "llm-all"
Requires-Dist: torch>=2.5.0; extra == "llm-all"
Provides-Extra: chroma
Requires-Dist: chromadb>=0.5.15; extra == "chroma"
Provides-Extra: pinecone
Requires-Dist: pinecone>=5.4.2; extra == "pinecone"
Provides-Extra: weaviate
Requires-Dist: weaviate-client>=4.11.0; extra == "weaviate"
Provides-Extra: faiss
Requires-Dist: faiss-cpu>=1.9.0; extra == "faiss"
Provides-Extra: vector-all
Requires-Dist: chromadb>=0.5.15; extra == "vector-all"
Requires-Dist: pinecone>=5.4.2; extra == "vector-all"
Requires-Dist: weaviate-client>=4.11.0; extra == "vector-all"
Requires-Dist: faiss-cpu>=1.9.0; extra == "vector-all"
Provides-Extra: documents
Requires-Dist: pypdf>=5.1.0; extra == "documents"
Requires-Dist: python-docx>=1.1.2; extra == "documents"
Requires-Dist: beautifulsoup4>=4.12.3; extra == "documents"
Requires-Dist: markdown>=3.7.0; extra == "documents"
Requires-Dist: langchain>=0.3.7; extra == "documents"
Provides-Extra: pdf-advanced
Requires-Dist: docling>=2.0; extra == "pdf-advanced"
Requires-Dist: unstructured[all-docs]>=0.16; extra == "pdf-advanced"
Requires-Dist: camelot-py>=0.11; extra == "pdf-advanced"
Requires-Dist: kreuzberg>=0.1; extra == "pdf-advanced"
Requires-Dist: pdf2image>=1.17; extra == "pdf-advanced"
Requires-Dist: pymupdf>=1.25; extra == "pdf-advanced"
Provides-Extra: pdf-docling
Requires-Dist: docling>=2.0; extra == "pdf-docling"
Provides-Extra: pdf-unstructured
Requires-Dist: unstructured[all-docs]>=0.16; extra == "pdf-unstructured"
Provides-Extra: pdf-camelot
Requires-Dist: camelot-py>=0.11; extra == "pdf-camelot"
Requires-Dist: ghostscript; extra == "pdf-camelot"
Provides-Extra: pdf-kreuzberg
Requires-Dist: kreuzberg>=0.1; extra == "pdf-kreuzberg"
Provides-Extra: nlp
Requires-Dist: spacy>=3.8.2; extra == "nlp"
Requires-Dist: nltk>=3.9.1; extra == "nlp"
Provides-Extra: multimodal
Requires-Dist: Pillow>=11.0.0; extra == "multimodal"
Requires-Dist: pytesseract>=0.3.13; extra == "multimodal"
Requires-Dist: pydub>=0.25.1; extra == "multimodal"
Requires-Dist: SpeechRecognition>=3.10.1; extra == "multimodal"
Requires-Dist: opencv-python>=4.10.0.84; extra == "multimodal"
Provides-Extra: web-tools
Requires-Dist: duckduckgo-search>=7.3.1; extra == "web-tools"
Provides-Extra: cli
Requires-Dist: click>=8.1.7; extra == "cli"
Provides-Extra: server
Requires-Dist: fastapi>=0.115.6; extra == "server"
Requires-Dist: uvicorn[standard]>=0.32.1; extra == "server"
Requires-Dist: redis>=5.2.1; extra == "server"
Requires-Dist: python-jose[cryptography]>=3.3.0; extra == "server"
Provides-Extra: observability
Requires-Dist: opentelemetry-api>=1.29.0; extra == "observability"
Requires-Dist: opentelemetry-sdk>=1.29.0; extra == "observability"
Requires-Dist: opentelemetry-instrumentation-fastapi>=0.48b0; extra == "observability"
Provides-Extra: dev
Requires-Dist: pytest>=8.3.3; extra == "dev"
Requires-Dist: pytest-asyncio>=0.24.0; extra == "dev"
Requires-Dist: pytest-cov>=6.0.0; extra == "dev"
Requires-Dist: black>=24.10.0; extra == "dev"
Requires-Dist: isort>=5.13.2; extra == "dev"
Requires-Dist: flake8>=7.1.1; extra == "dev"
Requires-Dist: mypy>=1.13.0; extra == "dev"
Requires-Dist: pre-commit>=4.0.1; extra == "dev"
Requires-Dist: httpx>=0.28.1; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.6.1; extra == "docs"
Requires-Dist: mkdocs-material>=9.5.49; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.27.2; extra == "docs"
Provides-Extra: all
Requires-Dist: ai-prishtina-agentic-rag[dev,docs,documents,llm-all,multimodal,nlp,observability,pdf-advanced,vector-all,web-tools]; extra == "all"
Dynamic: license-file

# AI Prishtina · Agentic RAG

<div align="center">

<img src="https://cdn.buymeacoffee.com/uploads/cover_images/2025/07/Z2zeOfmCLnzhA78njCsfaRNA0ebw0dlXo53HmGhq.jpg@2560w_0e.webp" alt="AI-Prishtina Logo" width="100%">


<p align="center">
  <a href="https://pypi.org/project/ai-prishtina-agentic-rag/"><img src="https://img.shields.io/pypi/v/ai-prishtina-agentic-rag.svg" alt="PyPI version"></a>
  <a href="https://pepy.tech/project/ai-prishtina-agentic-rag"><img src="https://static.pepy.tech/badge/ai-prishtina-agentic-rag" alt="Total Downloads"></a>
  <a href="https://pepy.tech/project/ai-prishtina-agentic-rag"><img src="https://static.pepy.tech/badge/ai-prishtina-agentic-rag/month" alt="Monthly Downloads"></a>
  <a href="https://pepy.tech/project/ai-prishtina-agentic-rag"><img src="https://static.pepy.tech/badge/ai-prishtina-agentic-rag/week" alt="Weekly Downloads"></a>
  <a href="https://pypi.org/project/ai-prishtina-agentic-rag/"><img src="https://img.shields.io/pypi/dm/ai-prishtina-agentic-rag.svg" alt="PyPI Downloads"></a>
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.9+-blue.svg" alt="Python 3.9+"></a>
  <a href="https://albanmaxhuni.com"><img src="https://img.shields.io/badge/License-AGPL--3.0--or--later_OR_Commercial-green.svg" alt="License: AGPL-3.0-or-later OR Commercial"></a>
  <a href="https://github.com/albanmaxhuni/ai-prishtina-agentic-rag/actions"><img src="https://img.shields.io/github/actions/workflow/status/albanmaxhuni/ai-prishtina-agentic-rag/ci.yml?label=tests" alt="Tests"></a>
  <a href="./htmlcov/index.html"><img src="https://img.shields.io/badge/coverage-99%25-brightgreen.svg" alt="Coverage"></a>
</p>

**Your documents had one job. Now they have an agent.**

A Python library for **agentic retrieval-augmented generation (RAG)** that turns your documents into a self-improving, tool-wielding, multi-agent knowledge system. Ships with 18 LLM providers, 13 vector stores, a cognitive layer that *literally critiques its own answers*, and enough integrations to automate your entire Tuesday. Current release: **v1.0.3**.

</div>

## Table of contents

- [Why this library?](#why-this-library)
- [Key features](#key-features)
  - [AC-RAG Cognitive Layer](#the-brain--adaptive-cognitive-rag-ac-rag)
  - [Novel Retrieval Patterns](#the-secret-sauce--novel-retrieval-patterns)
  - [Clean Architecture](#the-janitor--clean-architecture)
  - [Production Infrastructure](#the-suit--production-grade-infrastructure)
  - [18 LLM Providers](#the-muscles--18-llm-providers)
  - [13 Vector Stores](#the-memory--13-vector-stores)
  - [Core Tools](#the-hands--core-tools)
  - [Contrib Integrations](#the-wardrobe--contrib-integrations)
- [Package structure](#package-structure)
- [Quick start](#quick-start)
  - [Installation](#installation)
  - [Basic usage](#basic-usage)
- [The full cookbook](#the-full-cookbook)
  - [Recipe 1 — Hello, RAG world](#)
  - [Recipe 2 — Agentic mode](#)
  - [Recipe 3 — Cognitive pipeline](#)
  - [Recipe 4 — LLM providers (18 options)](#)
  - [Recipe 5 — Vector stores (13 options)](#)
  - [Recipe 6 — Document processing](#)
  - [Recipe 7 — Graph RAG](#)
  - [Recipe 8 — Streaming](#)
  - [Recipe 9 — Evaluation](#)
  - [Recipe 10 — Production hardening](#)
  - [Recipe 11 — Contrib tools](#)
  - [Recipe 12 — Custom tools](#)
  - [Recipe 13 — FastAPI server](#)
  - [Recipe 14 — Bulk document loading](#)
  - [Recipe 15 — Parent-child chunking](#)
  - [Recipe 16 — ReAct Agent](#)
  - [Recipe 17 — Conversation memory](#)
  - [Recipe 18 — Monitoring dashboard](#)
  - [Recipe 19 — Prompt optimization (DSPy-style)](#)
  - [Recipe 20 — Migration from LlamaIndex/LangChain](#)
  - [Recipe 21 — Prompt versioning](#)
  - [Recipe 22 — Workflow orchestration UI](#)

---

## Why this library?

Most RAG libraries stop at "retrieve then generate." This one keeps going.

It plans multi-step queries, remembers what worked last time, critiques its own answers, fuses knowledge from multiple sources, and — if you let it — sends a Slack message about the results. Think of it as the difference between a search bar and an intern who actually reads the documents.

```python
# TL;DR — three lines to go from "I have documents" to "I have answers"
from agentic_rag import AgenticRAG
rag = AgenticRAG(vector_store=my_store, llm_provider=my_llm, enable_agent=True)
response = await rag.aquery("Summarize Q4 revenue trends", use_tools=True)
```

---

## Key features

### The Brain — Adaptive Cognitive RAG (AC-RAG)

The optional metacognitive layer that makes this library *agentic* rather than just *retrieval-augmented*.

| Component | What it does | Module |
|-----------|-------------|--------|
| **Neural Query Router** | Routes queries to the optimal retrieval strategy (rule-based + LLM fallback) | `cognitive.query_router` |
| **Reflective Agent** | Critiques its own answers and iteratively improves them | `cognitive.reflective_agent` |
| **Hierarchical Memory** | Three-tier memory (episodic / semantic / procedural) that learns from interactions | `cognitive.hierarchical_memory` |
| **Progressive Retrieval** | Reformulates queries when initial retrieval quality is low | `cognitive.progressive_retrieval` |
| **Calibrated Confidence** | Platt-scaled confidence scores trained against actual accuracy | `cognitive.confidence` |
| **Knowledge Fusion** | Merges results from multiple sources with learned trust weights | `cognitive.knowledge_fusion` |
| **Tool Composer** | Discovers tools and builds DAG-based execution chains automatically | `cognitive.tool_composer` |
| **Multi-Agent Orchestrator** | Event-driven multi-agent collaboration for complex queries | `cognitive.multi_agent` |
| **Neural Classifier** | Sub-millisecond DistilBERT-based intent classification | `cognitive.neural_classifier` |
| **Query Decomposer** | Breaks complex questions into sub-queries with dependency tracking | `cognitive.agentic_components` |
| **Query Rewriter** | Multi-query, step-back, and sub-question rewriting strategies | `cognitive.query_rewriter` |
| **Corrective RAG (CRAG)** | Evaluate retrieval quality → refine or web-fallback | `cognitive.corrective_rag` |
| **Self-RAG** | Four-checkpoint pipeline: retrieve? relevant? supported? useful? | `cognitive.self_rag` |
| **Adaptive RAG** | Decides *whether* to retrieve based on query complexity | `cognitive.adaptive_rag` |
| **Speculative RAG** | Parallel draft generation → verification → best pick | `cognitive.speculative_rag` |
| **ReAct Agent** | Interleaved reasoning + acting with automatic tool selection | `cognitive.react_agent` |

### The Secret Sauce — Novel Retrieval Patterns

Research-backed retrieval techniques that no other PyPI RAG library bundles together. Because why settle for "good enough" when you can have "actually novel"?

| Feature | What it does (in plain English) | Module |
|---------|------------------------------|--------|
| **HyDE** | Generates a fake-but-plausible answer, embeds that instead of your question, and retrieves docs that match the hypothetical. It sounds wrong, but it works. | `retrieval.hyde` |
| **RAPTOR** | Builds a hierarchical tree of document summaries at multiple abstraction levels. Retrieves from the right level based on query specificity. Like a Russian nesting doll, but for knowledge. | `retrieval.raptor` |
| **Citation Grounding** | Maps every sentence in the LLM's answer back to specific source chunks with `[1]` markers. Because "trust me bro" is not a valid citation format. | `core.citation` |
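
The HyDE row is easier to believe once you see it run. The sketch below is a toy illustration of the technique, not the code in `retrieval.hyde`: `embed`, `fake_llm`, and `hyde_retrieve` are hypothetical names, the embedding is a bag of words, and the "LLM" is a stub that returns a canned draft.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a real system would use a dense model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fake_llm(question: str) -> str:
    """Stand-in for an LLM call that drafts a plausible (possibly wrong) answer."""
    return "python was created by guido van rossum in the early 1990s"


def hyde_retrieve(question, docs, generate=fake_llm, top_k=1):
    hypothetical = generate(question)   # 1. draft a hypothetical answer
    query_vec = embed(hypothetical)     # 2. embed the draft, not the question
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:top_k]               # 3. return the closest real documents


docs = [
    "guido van rossum created python and released it in 1991",
    "the eiffel tower is located in paris",
]
print(hyde_retrieve("who invented the language?", docs))
```

The question shares almost no vocabulary with the relevant document, but the hypothetical answer does, which is the whole trick.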

### The Janitor — Clean Architecture

We just Marie Kondo'd the codebase. Deprecated modules? Thanked them for their service and sent them packing. What remains is a clean, logical structure where:
- `providers/` holds all your LLMs, embeddings, and vector stores (because they *provide* value)
- `document_processing/` has loaders and chunkers in their own subdirectories (no more flat chaos)
- Everything else is exactly where your intuition expects it to be

If it doesn't spark joy (or serve a purpose), it's gone. The imports work. The structure makes sense. You're welcome.

---

### The Suit — Production-Grade Infrastructure

Enterprise features that keep your RAG system running in production without embarrassing itself. Or you.

| Feature | What it does (in plain English) | Module |
|---------|------------------------------|--------|
| **Per-Document Access Control** | ACL-based document filtering during retrieval. Each document gets `read_groups`/`read_users`. Essential for multi-tenant deployments where not everyone should see everything. | `core.access_control` |
| **Streaming Structured Output** | SSE-compatible JSON events (answer chunk, source, confidence, tool_call) for real-time UIs. Your frontend will thank you. | `core.structured_stream` |
| **Prompt Compression** | LLMLingua-inspired context compression. Fits more context into your token budget while preserving relevance. Because LLMs are expensive and tokens don't grow on trees. | `core.prompt_compression` |
| **Answer Scorer** | Automatic answer quality scoring with reference-free and reference-based modes. Integrates with the feedback loop for continuous improvement. It grades your homework so you don't have to. | `evaluation.answer_scorer` |
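
To make the access-control row concrete, here is a minimal sketch of filtering retrieved documents by `read_groups`/`read_users`. It assumes a simple "named user or shared group" rule; `AclDoc`, `can_read`, and `filter_by_acl` are hypothetical names and not the API of `core.access_control`.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class AclDoc:
    content: str
    read_groups: Set[str] = field(default_factory=set)
    read_users: Set[str] = field(default_factory=set)


def can_read(doc: AclDoc, user: str, groups: Iterable[str]) -> bool:
    """Allow access if the user is named directly or shares at least one group."""
    return user in doc.read_users or bool(doc.read_groups & set(groups))


def filter_by_acl(candidates: List[AclDoc], user: str, groups: Iterable[str]) -> List[AclDoc]:
    """Drop retrieved documents the caller is not allowed to see."""
    return [doc for doc in candidates if can_read(doc, user, groups)]


retrieved = [
    AclDoc("Q4 board deck", read_groups={"finance"}),
    AclDoc("Public FAQ", read_groups={"everyone"}),
    AclDoc("Alice's review", read_users={"alice"}),
]

visible = filter_by_acl(retrieved, user="bob", groups=["everyone", "engineering"])
print([doc.content for doc in visible])  # ['Public FAQ']
```

Filtering before the documents reach the prompt matters: anything that enters the context window can end up in the answer.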

### The Muscles — 18 LLM Providers

OpenAI, Anthropic, Cohere, Gemini, Mistral, Ollama, Groq, DeepSeek, xAI/Grok, AWS Bedrock, Azure OpenAI, Together.ai, AI21, Fireworks, Perplexity, llama.cpp, HuggingFace local models, and any OpenAI-compatible API. Swap one line, keep everything else.

### The Memory — 13 Vector Stores

ChromaDB, FAISS, Pinecone, Weaviate, Qdrant, pgvector, Milvus, Redis, MongoDB Atlas, Elasticsearch, Vespa, Azure AI Search, and an in-memory store for tests.

### The Hands — Core Tools

| Category | Tools | Import |
|----------|-------|--------|
| **Search** | `WebSearchTool`, `WebScrapeTool`, `HTTPRequestTool`, `APISpecTool`, `GraphQLTool` | `tools.search` |
| **Calculation** | `CalculatorTool`, `StatisticsTool`, `UnitConverterTool` | `tools.calculation` |
| **Code** | `CodeExecutorTool` (sandboxed Python/JS) | `tools.code` |
| **File** | `FileReadTool`, `FileWriteTool`, `FileListTool`, `DocumentLoaderTool` | `tools.file` |
| **Data** | `JSONTool`, `SQLTool` | `tools.data` |
| **Utility** | `VectorStoreTool`, `KnowledgeGraphTool`, `DateTimeTool`, `TextProcessingTool`, `TextSummarizerTool` | `tools.utility` |

### The Wardrobe — Contrib Integrations

Optional integrations that live in `contrib/` because not every RAG system needs to send a Slack message.

| Category | Tools | Import path |
|----------|-------|-------------|
| **Communication** | `EmailTool`, `GoogleCalendarTool`, `SlackTool`, `DiscordTool`, `JiraTool` | `contrib.communication` |
| **Productivity** | `NotionTool`, `ConfluenceTool`, `AsanaTool`, `TrelloTool`, `LinearTool`, `ClickUpTool`, `AirtableTool`, `MondayTool` | `contrib.productivity` |
| **API Integrations** | `StripeTool`, `TwilioTool`, `SendGridTool`, `PagerDutyTool`, `DatadogTool`, `ZapierTool` | `contrib.api_integrations` |
| **Storage** | `S3Tool`, `GCSTool`, `AzureBlobTool`, `MemoryTool` | `contrib.storage` |
| **DevOps** | `GitTool`, `GitHubTool` | `contrib.devops` |
| **Media** | `AudioTool`, `WhisperTranscriptionTool`, `ImageGenerationTool`, `ImageEditTool` | `contrib.media` |
| **Database** | `MongoDBTool`, `RedisTool`, `PostgreSQLTool` | `contrib.database` |
| **Infrastructure** | `TenantManager`, `ModelCompressor`, `FederatedCoordinator` | `contrib.infrastructure` |

### The Librarian — Document Processing

Ingest anything short of a napkin sketch (we're working on it): PDF, DOCX, HTML, Markdown, CSV, Excel, JSON, XML, PowerPoint, EPUB, ZIP archives, Jupyter notebooks, images (OCR), audio (Whisper), and video (frame + audio analysis).

**Organization that makes sense:**
- `document_processing.loaders.*` — 20+ document loaders (PDFLoader, DocxLoader, HTMLLoader, etc.)
- `document_processing.chunkers.*` — 10+ chunking strategies (FixedSizeChunker, SemanticChunker, MarkdownChunker, CodeChunker, AgenticChunker, etc.)

No more "where did I put that import?" moments. Everything is exactly where it should be.

### The Armor — Production Infrastructure

| Component | What it does |
|-----------|-------------|
| `SemanticCache` | In-memory or Redis — stop paying for the same question twice |
| `CircuitBreaker` | Prevent cascading failures when your LLM provider has a bad day |
| `OutputGuardrails` | PII redaction, toxicity filtering, content validation |
| `CostTracker` | Token usage monitoring with configurable pricing |
| `FeedbackLoop` | Collect feedback, track trends, learn from interactions |
| `DocumentVersionStore` | Diff-based change tracking with SHA-256 hashing |
| `BatchIngestionPipeline` | Async bulk ingestion with back-pressure and retry |
| `ModelCompressor` | INT8/FP16 quantization, ONNX export, pruning for edge deployment |
| `FederatedCoordinator` | Privacy-preserving FedAvg strategy learning across deployments |
| OpenTelemetry Tracing | Distributed tracing via `get_tracer()` |
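
For readers new to the pattern behind `CircuitBreaker`, the sketch below shows the usual closed / open / half-open cycle. It is a simplified illustration under assumed defaults, not the library's class: `SimpleCircuitBreaker`, `CircuitOpenError`, and the thresholds are hypothetical.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class SimpleCircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open: call skipped")
            # Timeout elapsed: go half-open and let one trial call through.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # trip the breaker
            raise
        self._failures = 0  # any success closes the circuit again
        return result


def flaky_llm_call(prompt):
    raise TimeoutError("provider timed out")


breaker = SimpleCircuitBreaker(failure_threshold=2, reset_timeout=30.0)
for attempt in range(3):
    try:
        breaker.call(flaky_llm_call, "hello")
    except CircuitOpenError as exc:
        print(f"attempt {attempt}: {exc}")
    except TimeoutError:
        print(f"attempt {attempt}: provider failed")
```

The third attempt never reaches the provider, which is the point: a struggling upstream gets room to recover instead of a retry storm.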

### The Judge — Evaluation & Monitoring

`ComprehensiveEvaluator` (relevance, faithfulness, answer quality, latency), `RAGBenchmark` + `PerformanceBenchmark` suites, and `ABTest` for statistically rigorous variant comparison. 298+ tests in the suite.

### Developer experience

- **Configuration-first**: YAML, INI, or env vars — no hardcoded models or thresholds
- **Factory pattern**: `create_tool("web_search")`, `create_provider("openai")` — config-driven instantiation
- **Type safety**: Pydantic v2 models across all public APIs
- **Modular**: Swap vector stores, LLM providers, tools, and cognitive components independently
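
The factory bullet describes config-driven instantiation. A common way to build that is a name-to-class registry, sketched below. This is an assumption about the general pattern, not the code in `factories/`: the `register_tool` decorator, `DemoCalculator`, and this standalone `create_tool` are hypothetical.

```python
from typing import Callable, Dict

_TOOL_REGISTRY: Dict[str, Callable[..., object]] = {}


def register_tool(name: str):
    """Class decorator that records a tool class under a string key."""
    def decorator(cls):
        _TOOL_REGISTRY[name] = cls
        return cls
    return decorator


@register_tool("calculator")
class DemoCalculator:
    def __init__(self, precision: int = 2):
        self.precision = precision

    def run(self, a: float, b: float) -> float:
        return round(a + b, self.precision)


def create_tool(name: str, **config):
    """Look up a tool by name and build it from keyword config."""
    if name not in _TOOL_REGISTRY:
        raise ValueError(f"unknown tool: {name!r}")
    return _TOOL_REGISTRY[name](**config)


# The name and settings could come straight from a YAML file or env vars.
tool = create_tool("calculator", precision=1)
print(tool.run(1, 2))
```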

## Package structure

```
agentic_rag/
├── base/                   # Abstract base classes (BaseTool, BaseProvider, ...)
├── factories/              # Factory pattern (create_tool, create_provider, ...)
├── core/                   # AgenticRAG, planner, orchestrator, memory, cache, guardrails
├── cognitive/              # AC-RAG: router, reflection, memory, fusion, multi-agent
├── tools/                  # Core tools (search, calculation, code, file, data, utility)
├── contrib/                # Optional integrations
│   ├── communication/      #   Slack, email, calendar, Jira, Discord
│   ├── productivity/       #   Notion, Confluence, Asana, Trello, Linear, ...
│   ├── api_integrations/   #   Stripe, Twilio, SendGrid, PagerDuty, ...
│   ├── storage/            #   S3, GCS, Azure Blob
│   ├── devops/             #   Git, GitHub
│   ├── media/              #   Audio, image generation (DALL-E)
│   ├── database/           #   MongoDB, Redis, PostgreSQL
│   └── infrastructure/     #   Multi-tenancy, model compression, federated learning
├── providers/              # Everything that provides a service
│   ├── llm/                #   18 LLM providers (was: llm/)
│   ├── embeddings/         #   8 embedding providers + cache/utils (was: embeddings/)
│   └── vector_stores/      #   13 vector store backends
├── retrieval/              # Retrievers, rerankers, BM25, ColBERT
├── document_processing/    # Loaders, chunkers, preprocessors (organized in subdirs)
│   ├── loaders/            #   20+ document loaders (pdf, docx, html, etc.)
│   └── chunkers/           #   10+ chunking strategies
├── strategies/             # Pluggable chunking & retrieval strategies
├── graph/                  # Knowledge graph, entity extraction, graph retrieval
├── evaluation/             # Metrics, benchmarks, A/B testing
├── server/                 # FastAPI app (optional)
└── utils/                  # Config, exceptions, logging
```

---

## Quick start

### Installation

```bash
# The basics
pip install ai-prishtina-agentic-rag

# I want everything and I want it now
pip install "ai-prishtina-agentic-rag[all]"

# I'm a responsible adult who only installs what I need
pip install "ai-prishtina-agentic-rag[openai,chroma]"

# Development (you beautiful contributor, you)
pip install -e ".[dev]"
```

| Use case | Install command |
|----------|-----------------|
| Core library | `pip install ai-prishtina-agentic-rag` |
| Single vector backend | `pip install "ai-prishtina-agentic-rag[chroma]"` (or `pinecone`, `weaviate`, `faiss`) |
| All vector backends | `pip install "ai-prishtina-agentic-rag[vector-all]"` |
| LLM providers | `pip install "ai-prishtina-agentic-rag[openai]"` (or `anthropic`, `cohere`, `llm-all`) |
| Document processing | `pip install "ai-prishtina-agentic-rag[documents,nlp,multimodal]"` |
| PDF (advanced) | `pip install "ai-prishtina-agentic-rag[pdf-advanced]"` (docling, unstructured, camelot, kreuzberg) |
| PDF (tables only) | `pip install "ai-prishtina-agentic-rag[pdf-camelot]"` |
| PDF (figures/equations) | `pip install "ai-prishtina-agentic-rag[pdf-docling]"` |
| Production API | `pip install "ai-prishtina-agentic-rag[server,observability]"` |

### Environment setup

```bash
# Required: at least one LLM provider key
export OPENAI_API_KEY=sk-your-key-here

# Optional: more providers, vector stores, tool keys
export ANTHROPIC_API_KEY=sk-ant-...
export PINECONE_API_KEY=...
export SERP_API_KEY=...       # for WebSearchTool
```
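
If you would rather fail fast than discover a missing key mid-query, a small startup check helps. The sketch below is a hypothetical helper, not part of the library, and it only looks at the two provider variables shown in the exports above; extend `PROVIDER_KEYS` for whichever providers you actually use.

```python
import os

# Variable names taken from the export lines above; the list is an assumption.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def configured_provider_keys(env=None):
    """Return the provider key names that are set; raise if none are."""
    env = os.environ if env is None else env
    present = [name for name in PROVIDER_KEYS if env.get(name)]
    if not present:
        raise RuntimeError("Set at least one of: " + ", ".join(PROVIDER_KEYS))
    return present


# Passing a dict keeps the example independent of your real environment.
print(configured_provider_keys({"OPENAI_API_KEY": "sk-your-key-here"}))
```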

---

## The full cookbook

<details>
<summary><strong>Recipe 1 — Hello, RAG world</strong></summary>

The absolute minimum to go from zero to answers. No bells, no whistles, just vibes.

### Basic Setup
```python
import asyncio
from agentic_rag import AgenticRAG
from agentic_rag.providers.llm import OpenAIProvider
from agentic_rag.providers.vector_stores import InMemoryVectorStore

async def main():
    store = InMemoryVectorStore()
    llm = OpenAIProvider(api_key="sk-...", model="gpt-4o")

    rag = AgenticRAG(vector_store=store, llm_provider=llm)

    await rag.add_documents([
        {"content": "Python was created by Guido van Rossum in 1991."},
        {"content": "The Zen of Python includes 'Beautiful is better than ugly.'"},
    ])

    response = await rag.aquery("Who created Python and when?")
    print(response.answer)
    # Example output (exact wording varies by model):
    # Python was created by Guido van Rossum in 1991.

asyncio.run(main())
```

### Load Documents from Files
```python
from agentic_rag.document_processing.loaders import (
    PDFLoader, DocxLoader, TextLoader, MarkdownLoader,
    HTMLLoader, JSONLoader, CSVLoader
)

# PDF documents (basic extraction with pypdf)
docs = PDFLoader().load("report.pdf")
await rag.add_documents([{"content": d.content, "metadata": d.metadata} for d in docs])

# Advanced PDF extraction - tables, figures, equations
# Install: pip install ai-prishtina-agentic-rag[pdf-advanced]
docs = PDFLoader(
    extraction_backend="docling",  # or "unstructured", "camelot", "kreuzberg"
    extract_tables=True,
    extract_figures=True,
    extract_equations=True
).load("research_paper.pdf")

# Word documents
docs = DocxLoader().load("contract.docx")
await rag.add_documents([{"content": d.content, "metadata": d.metadata} for d in docs])

# Text files
docs = TextLoader(encoding="utf-8").load("notes.txt")

# Markdown
docs = MarkdownLoader().load("documentation.md")

# HTML (web pages)
docs = HTMLLoader().load("page.html")

# Structured data
json_docs = JSONLoader().load("data.json")
csv_docs = CSVLoader().load("spreadsheet.csv")
```

### Smart Chunking
```python
from agentic_rag.document_processing.chunkers import (
    SemanticChunker, FixedSizeChunker, MarkdownChunker,
    CodeChunker, Language
)

# Semantic chunking (preserves sentence boundaries)
chunker = SemanticChunker(max_chunk_size=1000, overlap=100)
chunks = chunker.chunk(long_document)

# Fixed size with overlap
chunker = FixedSizeChunker(chunk_size=500, overlap=50)

# Markdown-aware (respects headers)
chunker = MarkdownChunker(max_chunk_size=1500, header_split_levels=[1, 2, 3])

# Code-aware chunking
chunker = CodeChunker(language=Language.PYTHON, max_chunk_size=2000)
```
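
If the `chunk_size` and `overlap` parameters feel abstract, this is roughly what fixed-size chunking with overlap does under the hood. The function below is a character-based sketch written for illustration; `fixed_size_chunks` is a hypothetical name and `FixedSizeChunker` may differ in details such as token-based sizing.

```python
def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 50):
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(fixed_size_chunks("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last `overlap` characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when the overlap is large enough.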

### Query with Options
```python
# Basic query
response = await rag.aquery("What is machine learning?")

# With filters
response = await rag.aquery(
    "Q4 revenue",
    filters={"source": "financial_reports", "year": 2024}
)

# With source citations
response = await rag.aquery(
    "Explain neural networks",
    include_citations=True,
    max_sources=5
)
print(response.answer)  # Includes [1], [2] citations
print(response.sources)  # List of source documents

# Control creativity
response = await rag.aquery(
    "Write a poem about AI",
    temperature=0.9,
    max_tokens=500
)
```

**Why you need this:** Because every journey starts with a single step, and this is that step. It's the "hello world" of making your documents actually useful instead of just taking up disk space.

</details>

<details>
<summary><strong>Recipe 2 — Agentic mode (let it plan)</strong></summary>

When your question is too complex for a single retrieval pass, let the planner break it down.

### Basic Agent Setup
```python
from agentic_rag.tools import WebSearchTool, CalculatorTool

rag = AgenticRAG(
    vector_store=store,
    llm_provider=llm,
    enable_agent=True,
    enable_memory=True,
)

rag.register_tool(WebSearchTool(api_key="your-serp-key"))
rag.register_tool(CalculatorTool())

response = await rag.aquery(
    "Find the GDP of France and Germany, then calculate the difference",
    enable_planning=True,
    use_tools=True,
)
print(response.answer)
print("Steps taken:", response.reasoning_steps)
print("Confidence:", response.confidence)
```

### All Available Tools
```python
import os

from agentic_rag.tools import (
    # Search tools
    WebSearchTool,      # Google/Bing search via SERP API
    WebScrapeTool,      # Scrape web pages
    HTTPRequestTool,    # Generic HTTP requests
    APISpecTool,        # Read OpenAPI specs
    GraphQLTool,        # GraphQL queries
    
    # Calculation tools
    CalculatorTool,     # Math expressions
    StatisticsTool,     # Statistical analysis
    UnitConverterTool,  # Unit conversions
    
    # Code execution
    CodeExecutorTool,   # Sandboxed Python/JS execution
    
    # File operations
    FileReadTool,       # Read local files
    FileWriteTool,      # Write files
    FileListTool,       # List directories
    DocumentLoaderTool, # Load docs with auto-detection
    
    # Data processing
    JSONTool,           # JSON operations
    SQLTool,            # SQL queries on data
    
    # Knowledge tools
    VectorStoreTool,    # Direct vector search
    KnowledgeGraphTool, # Graph queries
    DateTimeTool,       # Date/time operations
    TextProcessingTool, # Text transformations
    TextSummarizerTool, # Summarize text
)

# Register multiple tools
rag.register_tools([
    WebSearchTool(api_key=os.getenv("SERP_API_KEY")),
    CalculatorTool(),
    CodeExecutorTool(allowed_languages=["python", "javascript"]),
    FileReadTool(base_path="/app/data"),
    JSONTool(),
    DateTimeTool(),
])
```

### Multi-Step Planning
```python
# Complex query requiring multiple steps
response = await rag.aquery(
    """
    Research the top 3 LLM models released in 2024,
    calculate their average parameter count,
    and save the results to a JSON file
    """,
    enable_planning=True,
    use_tools=True,
    max_steps=10,  # Allow up to 10 planning steps
)

# The agent will:
# 1. Search for "top LLM models 2024"
# 2. Extract parameter counts from results
# 3. Use Calculator to compute average
# 4. Use FileWriteTool to save JSON
```

### Tool Chains
```python
from agentic_rag.cognitive import ToolComposer

# Auto-compose tool chains for complex workflows
composer = ToolComposer(rag)
chain = composer.create_chain([
    "web_search",
    "web_scrape",
    "text_summarize",
    "file_write"
])

result = await chain.execute(
    "Find articles about climate change, summarize them, and save to climate_research.txt"
)
```
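
The feature table describes tool chains as DAG-based. As a rough mental model, a chain is a dependency graph executed in topological order, with each step receiving the outputs of the steps it depends on. The sketch below uses the standard library's `graphlib` and plain functions as stand-in tools; `run_chain` and the step names are hypothetical and do not mirror `ToolComposer` internals.

```python
from graphlib import TopologicalSorter


def run_chain(steps, dependencies, initial_input):
    """Run callables in dependency order.

    steps:        step name -> callable taking a dict of inputs
    dependencies: step name -> set of step names it depends on
    """
    outputs = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: outputs[dep] for dep in dependencies.get(name, ())}
        # Steps with no dependencies receive the original request instead.
        outputs[name] = steps[name](upstream or {"input": initial_input})
    return outputs


# Stand-in tools: each one is just a function here.
steps = {
    "search": lambda inp: ["article about " + inp["input"]],
    "summarize": lambda inp: f"summary of {len(inp['search'])} article(s)",
    "save": lambda inp: {"saved": inp["summarize"]},
}
dependencies = {
    "search": set(),
    "summarize": {"search"},
    "save": {"summarize"},
}

result = run_chain(steps, dependencies, "climate change")
print(result["save"])  # {'saved': 'summary of 1 article(s)'}
```

`TopologicalSorter` raises `CycleError` on circular dependencies, which is a cheap way to reject an invalid chain before any tool runs.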

**Why you need this:** Because real questions aren't simple. Sometimes you need to search the web, do math, and synthesize results. This is like giving your RAG system a Swiss Army knife and the intelligence to know when to use each tool.

</details>

<details>
<summary><strong>Recipe 3 — The cognitive pipeline (full AC-RAG)</strong></summary>

For when you want the system to route, retrieve progressively, reflect, and learn.

### Basic Cognitive Query
```python
result = await rag.run_cognitive_query(
    "Compare the economic impact of AI adoption in healthcare vs finance",
    enable_reflection=True,
    enable_progressive_retrieval=True,
)
print(result.answer)
print(f"Confidence: {result.confidence:.2f}")
print(f"Reflections: {result.reflection_count}")
```

### All 15 Cognitive Components
```python
from agentic_rag.cognitive import (
    # Routing & Classification
    NeuralQueryRouter,      # Route queries to optimal strategy
    NeuralQueryClassifier,  # Sub-millisecond intent classification
    
    # Query Processing
    QueryDecomposer,        # Break complex questions into sub-queries
    QueryRewriter,          # Multi-query, step-back, sub-question rewriting
    
    # Retrieval Strategies
    ProgressiveRetriever,   # Reformulate when quality is low
    CorrectiveRAG,          # Evaluate quality → refine or web-fallback
    SelfRAG,                # Four-checkpoint pipeline
    AdaptiveRAG,            # Decide whether to retrieve
    SpeculativeRAG,         # Parallel draft → verify → pick best
    
    # Answer Quality
    ReflectiveAgent,        # Self-critique and iterative improvement
    CalibratedConfidence,   # Platt-scaled confidence scores
    KnowledgeFusion,        # Merge multi-source with trust weights
    
    # Multi-Agent
    MultiAgentOrchestrator, # Event-driven collaboration
    ToolComposer,           # Auto-discover and compose tool chains
    
    # Memory
    HierarchicalMemory,     # Episodic / semantic / procedural
)

# Use individual components
router = NeuralQueryRouter()
decision = router.route("What were Apple's Q3 earnings?")
print(decision.strategy)  # "financial_retrieval"

# Query decomposition
decomposer = QueryDecomposer()
decomposed = decomposer.decompose(
    "Compare Tesla and BMW's EV market share in Europe"
)
for sub in decomposed.sub_queries:
    print(f"  - {sub.query}")  # "Tesla EV market share Europe"
                              # "BMW EV market share Europe"

# Progressive retrieval
progressive = ProgressiveRetriever(vector_store=store)
result = await progressive.retrieve(
    "Quantum computing breakthroughs 2024",
    min_confidence=0.7,
    max_attempts=3
)
```
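
The retry loop behind progressive retrieval can be sketched without the library: search, score the results, and reformulate while the score stays below a threshold. The `search` and `reformulate` callables and the mean-similarity quality measure are illustrative assumptions, not the internals of `ProgressiveRetriever`.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (document text, similarity score in [0, 1])


def progressive_retrieve(
    query: str,
    search: Callable[[str], List[Hit]],
    reformulate: Callable[[str, int], str],
    min_confidence: float = 0.7,
    max_attempts: int = 3,
) -> Tuple[List[Hit], float, int]:
    """Retry retrieval with reformulated queries until quality is acceptable."""
    best_hits: List[Hit] = []
    best_score = 0.0
    current = query
    for attempt in range(1, max_attempts + 1):
        hits = search(current)
        # Hypothetical quality measure: mean similarity of the returned hits.
        score = sum(s for _, s in hits) / len(hits) if hits else 0.0
        if score > best_score:
            best_hits, best_score = hits, score
        if best_score >= min_confidence:
            return best_hits, best_score, attempt
        current = reformulate(query, attempt)
    return best_hits, best_score, max_attempts


# Toy backend: only the reformulated query finds a strong match.
def toy_search(q: str) -> List[Hit]:
    if "error correction" in q:
        return [("qubit error correction", 0.9)]
    return [("misc", 0.3)]


def toy_reformulate(q: str, attempt: int) -> str:
    return q + " error correction"


hits, confidence, attempts = progressive_retrieve(
    "Quantum computing breakthroughs 2024", toy_search, toy_reformulate
)
```

Here the first attempt scores 0.3, so the query is reformulated once and the second attempt clears the 0.7 threshold.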

### RAG Pattern Selection
```python
from agentic_rag.cognitive import (
    CorrectiveRAG, SelfRAG, AdaptiveRAG, SpeculativeRAG
)

# Corrective RAG: Fix bad retrieval
crag = CorrectiveRAG(vector_store=store, web_search_tool=web_search)
result = await crag.query("Latest SpaceX Starship launch date")
# If retrieval quality is low, automatically falls back to web search

# Self-RAG: Four checkpoints
self_rag = SelfRAG(vector_store=store)
result = await self_rag.query(
    "Explain transformer architecture",
    checkpoints=["retrieve", "relevant", "supported", "useful"]
)
print(result.checkpoints_passed)  # [True, True, True, True]

# Adaptive RAG: Smart retrieval decisions
adaptive = AdaptiveRAG(vector_store=store)
result = await adaptive.query("What is 2+2?")  # No retrieval needed
result = await adaptive.query("Explain RAPTOR paper")  # Retrieves automatically

# Speculative RAG: Draft and verify
speculative = SpeculativeRAG(llm=llm)
result = await speculative.query(
    "Summarize climate change impacts",
    num_drafts=3,  # Generate 3 parallel drafts
)
print(result.best_draft.confidence)
```
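
The corrective pattern boils down to a three-way decision on retrieval quality. A minimal sketch, assuming relevance scores in [0, 1] and two hypothetical thresholds (`upper`, `lower`); how `CorrectiveRAG` actually grades documents is not documented here and may differ.

```python
from typing import List


def corrective_action(scores: List[float], upper: float = 0.7, lower: float = 0.3) -> str:
    """Map retrieval relevance scores to one of three actions."""
    if not scores:
        return "web_fallback"      # nothing retrieved at all
    best = max(scores)
    if best >= upper:
        return "use_retrieved"     # at least one clearly relevant document
    if best <= lower:
        return "web_fallback"      # everything looks irrelevant
    return "refine"                # ambiguous: rewrite the query and retry


action = corrective_action([0.55, 0.41])
```

Scores between the two thresholds trigger a refinement pass rather than an immediate web search, which keeps the expensive fallback for clear misses.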

### Reflective Agent with Critique
```python
from agentic_rag.cognitive import ReflectiveAgent

reflective = ReflectiveAgent(llm=llm, max_iterations=3)
result = await reflective.process(
    "What are the main causes of World War I?",
    reflection_focus=["factual_accuracy", "completeness", "source_diversity"]
)

print(f"Answer: {result.answer}")
print(f"Iterations: {result.iteration_count}")
for critique in result.critiques:
    print(f"  Issue: {critique.issue} | Severity: {critique.severity}")
```

### Hierarchical Memory
```python
from agentic_rag.cognitive import HierarchicalMemory

memory = HierarchicalMemory()

# Episodic: Remember past interactions
await memory.episodic.record_interaction(
    query="Python list comprehensions",
    answer="List comprehensions provide...",
    outcome="helpful"
)

# Semantic: Learn facts
await memory.semantic.store_fact(
    subject="Python",
    predicate="created_by",
    object="Guido van Rossum"
)

# Procedural: Learn strategies
await memory.procedural.record_strategy(
    situation="vague_query",
    action="ask_clarifying_question",
    outcome="success"
)

# Query memory
similar = await memory.episodic.find_similar("list comprehension syntax")
print(f"Found {len(similar)} similar past queries")
```

**Why you need this:** Because sometimes one pass isn't enough. This is like giving your RAG system a PhD: it questions its own answers, retrieves more when uncertain, and learns from every interaction.

</details>

<details>
<summary><strong>Recipe 4 — Swap your LLM like changing socks (18 providers)</strong></summary>

Every provider has the same interface. Swap one line, keep everything else.

### All 18 LLM Providers
```python
from agentic_rag.providers.llm import (
    # Cloud Providers
    OpenAIProvider,         # GPT-4o, o1, o3, GPT-4.5
    AnthropicProvider,      # Claude 3.5 / 4 Sonnet, Opus, Haiku
    GeminiProvider,         # Google Gemini 1.5/2.0 Pro/Flash
    CohereProvider,         # Command R/R+
    MistralProvider,        # Mistral Large/Medium/Small
    
    # Fast Inference
    GroqProvider,           # Llama/Mixtral at 500+ tok/sec
    TogetherAIProvider,     # 100+ open models
    FireworksProvider,      # Fast inference for open models
    
    # Specialized
    DeepSeekProvider,       # DeepSeek V3/R1 (reasoning)
    XAIProvider,            # Grok models
    PerplexityProvider,     # Sonar models with citations
    AI21Provider,           # Jamba models
    
    # Enterprise / Cloud
    AzureOpenAIProvider,    # GPT-4 via Azure
    BedrockProvider,        # AWS Bedrock (Claude, Llama, etc.)
    
    # OpenAI-Compatible
    OpenAICompatibleProvider,  # Any OpenAI-compatible API
    
    # Local / Self-Hosted
    OllamaProvider,         # Local models (Llama, Mistral, etc.)
    LocalModelProvider,     # Generic local model wrapper
    LlamaCppProvider,       # llama.cpp backend
)
```

### Quick Examples
```python
# OpenAI - The reliable choice
llm = OpenAIProvider(api_key="sk-...", model="gpt-4o")

# Anthropic - Best for long context
llm = AnthropicProvider(
    api_key="sk-ant-...",
    model="claude-sonnet-4-20250514",
    max_tokens=8192
)

# Google Gemini - Multimodal powerhouse
llm = GeminiProvider(api_key="...", model="gemini-2.0-flash")

# Groq - Speed demon (500+ tok/sec)
llm = GroqProvider(api_key="...", model="llama-3.3-70b-versatile")

# DeepSeek - Best reasoning model
llm = DeepSeekProvider(api_key="...", model="deepseek-r1")

# Ollama - Run locally, pay nothing
llm = OllamaProvider(model="llama3.3", host="http://localhost:11434")

# Azure OpenAI - Enterprise ready
llm = AzureOpenAIProvider(
    api_key="...",
    endpoint="https://myresource.openai.azure.com",
    deployment_name="gpt-4o"
)

# AWS Bedrock - Your AWS bill's new friend
llm = BedrockProvider(
    aws_access_key="...",
    aws_secret_key="...",
    region="us-east-1",
    model="anthropic.claude-3-sonnet-20240229-v1:0"
)

# Any OpenAI-compatible API
llm = OpenAICompatibleProvider(
    base_url="https://api.mystartup.com/v1",
    api_key="...",
    model="custom-model"
)
```

### Using the Factory
```python
from agentic_rag.factories import create_provider

# Create by name (useful for config-driven setups)
llm = create_provider("openai", model="gpt-4o", api_key="...")
llm = create_provider("anthropic", model="claude-sonnet-4-20250514", api_key="...")
llm = create_provider("ollama", model="llama3.3")

# Same interface everywhere
rag = AgenticRAG(vector_store=store, llm_provider=llm)
```
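
Why a shared interface makes providers swappable can be shown in a few lines. This is a generic sketch of the pattern, assuming a single `generate` method and a dictionary registry; `LLMProvider`, `EchoProvider`, `UpperProvider`, and `make_provider` are hypothetical names, not the library's classes or its `create_provider` implementation.

```python
from typing import Callable, Dict, Protocol


class LLMProvider(Protocol):
    """Anything with a matching generate() method satisfies this interface."""

    def generate(self, prompt: str) -> str: ...


class EchoProvider:
    def __init__(self, model: str = "echo-1") -> None:
        self.model = model

    def generate(self, prompt: str) -> str:
        return f"[{self.model}] {prompt}"


class UpperProvider:
    def __init__(self, model: str = "upper-1") -> None:
        self.model = model

    def generate(self, prompt: str) -> str:
        return prompt.upper()


# Name -> constructor registry, so the provider can come from a config file.
_REGISTRY: Dict[str, Callable[..., LLMProvider]] = {
    "echo": EchoProvider,
    "upper": UpperProvider,
}


def make_provider(name: str, **kwargs) -> LLMProvider:
    try:
        factory = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown provider {name!r}; choose from {sorted(_REGISTRY)}") from None
    return factory(**kwargs)


config = {"provider": "echo", "model": "echo-2"}
llm = make_provider(config["provider"], model=config["model"])
reply = llm.generate("hello")
```

Calling code depends only on `generate`, so changing `config["provider"]` is the whole migration.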

**Why you need this:** Because vendor lock-in is the adult version of "you can't sit with us." We believe in playing the field: try OpenAI today, switch to local models tomorrow, and your code stays the same.

</details>

<details>
<summary><strong>Recipe 5 — Vector stores (13 options + hybrid retrieval)</strong></summary>

### All 13 Vector Stores
```python
from agentic_rag.providers.vector_stores import (
    # Local / Development
    InMemoryVectorStore,      # Zero-config, for testing
    ChromaVectorStore,        # Local ChromaDB
    FAISSVectorStore,         # Facebook AI Similarity Search
    
    # Production / Managed
    PineconeVectorStore,      # Managed vector search
    WeaviateVectorStore,      # Vector + semantic search
    QdrantVectorStore,        # High-performance vector DB
    PGVectorStore,            # PostgreSQL + pgvector
    MilvusVectorStore,        # Distributed vector DB
    
    # Cloud / Enterprise
    RedisVectorStore,         # Redis Stack with RediSearch
    MongoDBAtlasVectorStore,  # MongoDB Atlas Vector Search
    ElasticsearchVectorStore, # Elasticsearch dense vectors
    VespaVectorStore,         # Vespa search engine
    AzureAISearchVectorStore, # Azure AI Search
)
```

### Quick Setup Examples
```python
# In-Memory (testing/development)
store = InMemoryVectorStore()

# ChromaDB (local development)
store = ChromaVectorStore(
    collection_name="my_docs",
    persist_directory="./chroma_db"
)

# Pinecone (managed production)
store = PineconeVectorStore(
    api_key="your-key",
    environment="us-west1-gcp",
    index_name="production-index",
    dimension=1536  # Match your embedding model
)

# Weaviate (local or cloud)
store = WeaviateVectorStore(
    host="http://localhost:8080",
    class_name="Documents"
)

# Qdrant (local or cloud)
store = QdrantVectorStore(
    host="localhost",
    port=6333,
    collection_name="docs"
)

# PostgreSQL + pgvector
store = PGVectorStore(
    connection_string="postgresql://user:pass@localhost/db",
    table_name="embeddings",
    dimension=1536
)

# Milvus (distributed)
store = MilvusVectorStore(
    host="localhost",
    port="19530",
    collection_name="documents"
)

# Redis (with RediSearch)
store = RedisVectorStore(
    redis_url="redis://localhost:6379",
    index_name="rag_docs"
)

# MongoDB Atlas
store = MongoDBAtlasVectorStore(
    connection_string="mongodb+srv://...",
    database="rag",
    collection="documents",
    index_name="vector_index"
)

# Elasticsearch
store = ElasticsearchVectorStore(
    hosts=["http://localhost:9200"],
    index_name="rag_documents"
)

# Azure AI Search
store = AzureAISearchVectorStore(
    endpoint="https://search.search.windows.net",
    api_key="...",
    index_name="documents"
)
```

### Hybrid Retrieval (Dense + Sparse)
```python
from agentic_rag.retrieval import HybridRetriever, BM25Retriever

# Combine vector search with BM25 keyword search
hybrid = HybridRetriever(
    vector_store=store,
    dense_weight=0.7,
    sparse_weight=0.3
)

# Or with an explicit BM25 retriever (already imported above)
bm25 = BM25Retriever(documents=all_docs)
hybrid = HybridRetriever(
    vector_store=store,
    sparse_retriever=bm25,
    dense_weight=0.6,
    sparse_weight=0.4
)

# Use with RAG
rag = AgenticRAG(vector_store=store, llm_provider=llm)
response = await rag.aquery("What is machine learning?", retriever=hybrid)
```
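
What `dense_weight` and `sparse_weight` do can be illustrated with a weighted score fusion. A minimal sketch, assuming min-max normalization of each score set before combining; whether `HybridRetriever` uses this scheme, reciprocal rank fusion, or something else is not stated here.

```python
from typing import Dict, List, Tuple


def _min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    dense_weight: float = 0.7,
    sparse_weight: float = 0.3,
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores; a missing score counts as 0."""
    d, s = _min_max(dense), _min_max(sparse)
    combined = {
        doc: dense_weight * d.get(doc, 0.0) + sparse_weight * s.get(doc, 0.0)
        for doc in set(d) | set(s)
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


dense_scores = {"doc_a": 0.82, "doc_b": 0.75, "doc_c": 0.40}  # cosine similarities
sparse_scores = {"doc_b": 12.1, "doc_c": 9.3, "doc_d": 2.0}   # BM25 scores
ranking = fuse(dense_scores, sparse_scores)
```

`doc_b` ranks first because it scores well on both signals, even though `doc_a` has the highest dense similarity.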

### Using the Factory
```python
from agentic_rag.factories import create_vector_store

# Create by name from config
store = create_vector_store("chroma", collection_name="docs")
store = create_vector_store("pinecone", api_key="...", index_name="prod")
store = create_vector_store("qdrant", host="localhost", port=6333)
```

**Why you need this:** Because your use case matters. Local development shouldn't require a PhD in cloud architecture, and production shouldn't run on your laptop. We've got you covered from prototype to scale.

</details>

<details>
<summary><strong>Recipe 6 — Document processing (20+ loaders, 10+ chunkers)</strong></summary>


### All Document Loaders
```python
from agentic_rag.document_processing.loaders import (
    # Text & Documents
    TextLoader,           # Plain text files (.txt)
    PDFLoader,            # PDF documents (.pdf) - with 5 backends!
    DocxLoader,           # Word documents (.docx)
    MarkdownLoader,       # Markdown files (.md)
    HTMLLoader,           # HTML files (.html, .htm)
    
    # Structured Data
    JSONLoader,           # JSON files (.json)
    CSVLoader,            # CSV files (.csv)
    XMLLoader,            # XML files (.xml)
    
    # Spreadsheets & Presentations
    ExcelLoader,          # Excel files (.xlsx, .xls)
    PowerPointLoader,     # PowerPoint (.pptx)
    
    # Media & Archives
    ImageLoader,          # Images with OCR (.jpg, .png, etc.)
    AudioLoader,          # Audio files with transcription
    VideoLoader,          # Video analysis (frames + audio)
    EPubLoader,           # eBooks (.epub)
    NotebookLoader,       # Jupyter notebooks (.ipynb)
    ArchiveLoader,        # Zip, tar, etc.
    
    # Advanced PDF Loaders (specialized)
    DoclingPDFLoader,     # Full document AI (tables, figures, equations)
    CamelotTableLoader,   # Table extraction specialist
    UnstructuredPDFLoader, # Multi-element extraction
    KreuzbergPDFLoader,   # Modern extraction with math support
)

# Load any document
docs = PDFLoader().load("report.pdf")
docs = DocxLoader().load("contract.docx")
docs = MarkdownLoader().load("docs.md")
docs = HTMLLoader().load("page.html")
docs = JSONLoader().load("data.json")
docs = CSVLoader().load("spreadsheet.csv")

# Excel with sheet selection
excel_docs = ExcelLoader(sheet_name="Revenue").load("financials.xlsx")

# Image with OCR
from agentic_rag.document_processing.loaders import ImageLoaderConfig

image_docs = ImageLoader(
    ImageLoaderConfig(enable_ocr=True, extract_metadata=True)
).load("diagram.png")

# Video analysis (extract frames + transcribe audio)
from agentic_rag.document_processing.loaders import VideoLoaderConfig

video_docs = VideoLoader(
    VideoLoaderConfig(
        extract_frames_every_seconds=5,
        transcribe_audio=True,
        max_frames=20
    )
).load("presentation.mp4")
```

### All Chunking Strategies
```python
from agentic_rag.document_processing.chunkers import (
    # Basic chunkers
    FixedSizeChunker,       # Fixed size with overlap
    SemanticChunker,        # Sentence/paragraph boundaries
    
    # Structure-aware
    MarkdownChunker,        # Respects headers (# ## ###)
    HTMLDOMChunker,         # HTML DOM tree chunking
    PDFLayoutChunker,       # Layout-aware PDF chunking
    CodeChunker,            # AST-aware code chunking
    
    # Specialized
    TableChunker,           # Table structure preservation
    JSONStructureChunker,   # JSON hierarchy chunking
    XMLStructureChunker,    # XML element chunking
    MultimodalChunker,      # Image + text alignment
    AgenticChunker,         # LLM-based semantic chunking
)
from agentic_rag.document_processing.chunkers import Language

# Basic semantic chunking
chunker = SemanticChunker(max_chunk_size=1000, overlap=100)
chunks = chunker.chunk(long_text)

# Fixed size
chunker = FixedSizeChunker(chunk_size=500, overlap=50)

# Markdown with headers
chunker = MarkdownChunker(
    max_chunk_size=1500,
    header_split_levels=[1, 2, 3]  # Split at h1, h2, h3
)

# Code-aware (Python, JS, Java, etc.)
chunker = CodeChunker(
    language=Language.PYTHON,
    max_chunk_size=2000,
    respect_function_boundaries=True
)

# HTML DOM chunking
chunker = HTMLDOMChunker(
    max_chunk_size=1500,
    respect_semantic_tags=True,  # <article>, <section>, etc.
    include_attributes=["id", "class"]
)

# PDF layout-aware
chunker = PDFLayoutChunker(
    max_chunk_size=2000,
    detect_columns=True,
    preserve_page_breaks=False
)

# Tables with context
chunker = TableChunker(
    max_chunk_size=1500,
    keep_headers_with_rows=True,
    output_format="text"  # or "csv", "json"
)
```
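
How `chunk_size` and `overlap` interact is easiest to see on a tiny string. This is a standalone character-based sketch of fixed-size chunking; `fixed_size_chunks` is a hypothetical helper, and the library's `FixedSizeChunker` may count tokens rather than characters.

```python
from typing import List


def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = fixed_size_chunks("abcdefghij", chunk_size=4, overlap=1)
# chunks == ["abcd", "defg", "ghij"]
```

Each chunk starts with the last character of the previous one, so a sentence cut at a boundary still appears whole in at least one neighbor when the overlap is large enough.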

### Advanced PDF Extraction (5 Backends)
```python
from agentic_rag.document_processing.loaders import (
    PDFLoader,
    PDFExtractionConfig,
    DoclingPDFLoader, DoclingTableLoader, DoclingFigureLoader, DoclingEquationLoader,
    CamelotTableLoader, CamelotLatticeLoader, CamelotStreamLoader,
    UnstructuredPDFLoader, UnstructuredTextLoader, UnstructuredTableLoader,
    KreuzbergPDFLoader, KreuzbergEquationLoader,
    get_available_backends,  # Check which backends are installed
)

# Check available backends
backends = get_available_backends()
# {'pypdf2': True, 'docling': True, 'camelot': False, ...}

# 1. Universal PDFLoader with backend selection
# Uses PyPDF2 by default (no extra dependencies)
docs = PDFLoader().load("document.pdf")

# Auto-select best backend based on extraction needs
docs = PDFLoader(
    extraction_backend="auto",  # Auto-select based on needs
    extract_tables=True,
    extract_figures=True,
    extract_equations=True
).load("research_paper.pdf")

# Force specific backend
docs = PDFLoader(extraction_backend="docling").load("paper.pdf")      # Best overall
docs = PDFLoader(extraction_backend="unstructured").load("form.pdf") # Multi-element
docs = PDFLoader(extraction_backend="kreuzberg").load("math.pdf")    # Math equations

# 2. Docling - Full Document AI (IBM's state-of-the-art)
# Best for: research papers, complex layouts, tables, figures, equations
# Install: pip install ai-prishtina-agentic-rag[pdf-docling]

# Extract everything
docs = DoclingPDFLoader(
    extraction_config=PDFExtractionConfig(
        extract_tables=True,
        extract_figures=True,
        extract_equations=True
    )
).load("research_paper.pdf")

# Returns list of documents by type:
# - Document(content="Introduction text...", metadata={"type": "text", "page": 1})
# - Document(content="| Revenue | Q1 | Q2 |...", metadata={"type": "table", "page": 3})
# - Document(content="Figure 1: Neural network architecture", metadata={"type": "figure", "page": 2})
# - Document(content="E = mc^2", metadata={"type": "equation", "latex": "E = mc^2"})

# Specialized loaders (imported above)
tables = await DoclingTableLoader().load("financial_report.pdf")
figures = await DoclingFigureLoader().load("paper_with_charts.pdf")
equations = await DoclingEquationLoader().load("math_paper.pdf")

# 3. Camelot - Table Extraction Specialist
# Best for: financial reports, data tables, structured PDFs
# Install: pip install ai-prishtina-agentic-rag[pdf-camelot]

# Auto-detect table type (lattice vs stream)
tables = await CamelotTableLoader(flavor="auto").load("report.pdf")

# Force lattice mode (for tables with ruling lines)
tables = await CamelotLatticeLoader(line_scale=15).load("financial_report.pdf")

# Force stream mode (for unruled tables using whitespace)
tables = await CamelotStreamLoader(shift_text=["l", "t"]).load("data_table.pdf")

# Access extracted table data
for table in tables:
    df = table.metadata.get("dataframe")  # pandas DataFrame
    csv = table.metadata.get("csv")       # CSV string
    accuracy = table.metadata.get("accuracy")  # Extraction confidence
    print(f"Table on page {table.metadata['page']}: {table.metadata['shape']}")

# 4. Unstructured - Multi-Element Extraction
# Best for: mixed documents, forms, headers/footers, complex layouts
# Install: pip install ai-prishtina-agentic-rag[pdf-unstructured]

# Extract all elements
docs = UnstructuredPDFLoader(
    strategy="hi_res",  # high accuracy (slower)
    extract_tables=True,
    languages=["eng"]
).load("document.pdf")

# Filter by element type
tables = [d for d in docs if d.metadata.get("type") == "table"]
headers = [d for d in docs if d.metadata.get("type") == "header"]
text_blocks = [d for d in docs if d.metadata.get("type") == "text"]

# Fast text-only extraction
docs = UnstructuredTextLoader(strategy="fast").load("document.pdf")

# Table-only extraction
tables = UnstructuredTableLoader().load("report.pdf")

# 5. Kreuzberg - Modern Extraction with Math
# Best for: modern PDFs, mathematical content, academic papers
# Install: pip install ai-prishtina-agentic-rag[pdf-kreuzberg]

# General extraction
docs = KreuzbergPDFLoader().load("document.pdf")

# Math-focused extraction
equations = KreuzbergEquationLoader().load("math_paper.pdf")
for eq in equations:
    latex = eq.metadata.get("latex")
    print(f"Equation: {latex}")
```

### Factory Pattern
```python
from agentic_rag.factories import create_chunker, create_loader, get_loader_for_file

# Config-driven chunking
chunker = create_chunker("semantic", max_chunk_size=1000)
chunker = create_chunker("markdown", max_chunk_size=1500)
chunker = create_chunker("fixed", chunk_size=500)

# Config-driven loading
loader = create_loader("pdf")
loader = create_loader("docx")
loader = create_loader("auto")  # Auto-detect from extension
docs = loader.load("document.pdf")

# Advanced PDF via factory
loader = create_loader("pdf", extraction_backend="docling")
loader = create_loader("pdf_tables", flavor="lattice")
loader = create_loader("pdf_docling_figures")
loader = create_loader("pdf_unstructured", strategy="hi_res")

# Auto-detect loader from file extension
loader = get_loader_for_file("research.pdf", extraction_backend="docling")
loader = get_loader_for_file("data.csv")
```

### Preprocessing Pipeline
```python
from agentic_rag.document_processing.preprocessors import TextPreprocessor, MetadataExtractor

# Clean and normalize
preprocessor = TextPreprocessor(
    remove_extra_whitespace=True,
    normalize_unicode=True,
    fix_line_breaks=True
)
clean_text = preprocessor.process(raw_text)

# Extract metadata
extractor = MetadataExtractor()
metadata = extractor.extract(
    content=doc_content,
    source_path="report.pdf",
    extract_title=True,
    extract_dates=True,
    extract_entities=True
)
```
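
The three cleaning options map onto a handful of standard-library calls. A rough sketch of what such a preprocessor might do, assuming NFKC normalization and treating single newlines as soft wraps; `clean_text` is a hypothetical helper, not the code behind `TextPreprocessor`.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    # normalize_unicode: fold ligatures and width variants into canonical forms.
    text = unicodedata.normalize("NFKC", raw)
    # fix_line_breaks: re-join words hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # fix_line_breaks: a single newline is a soft wrap; blank lines stay as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # remove_extra_whitespace: collapse runs of spaces and tabs.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


cleaned = clean_text("Smart  chunk-\ning means\nbetter \ufb01t.\n\nNext   paragraph.")
```

The hyphen rule is deliberately naive: it would also join a genuine compound such as `state-of-` / `the-art` split at a line end, so real pipelines often check the joined word against a dictionary first.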

**Why you need this:** Because real documents come in all shapes and sizes. PDFs, Word docs, Markdown, JSON - we handle them all. Smart chunking means better retrieval, which means better answers. It's that simple.

</details>

<details>
<summary><strong>Recipe 7 — Graph RAG (entity relationships + graph traversal)</strong></summary>

### Basic Graph Setup
```python
from agentic_rag.graph import KnowledgeGraph, EntityExtractor, GraphBuilder

# Create graph
graph = KnowledgeGraph()

# Choose extraction method
extractor = EntityExtractor(method="spacy")  # or "llm" for better accuracy

# Build from documents
builder = GraphBuilder(graph=graph, extractor=extractor)
await builder.build_from_documents(documents)

# Query with graph
rag = AgenticRAG(vector_store=store, llm_provider=llm, knowledge_graph=graph)
response = await rag.aquery("How is Company X related to Product Y?")
```

### Entity Extraction Methods
```python
from agentic_rag.graph import EntityExtractor

# Fast local extraction with spaCy NER (no LLM calls)
extractor = EntityExtractor(method="spacy")

# LLM-powered extraction (more accurate, slower)
extractor = EntityExtractor(
    method="llm",
    llm_provider=llm,
    entity_types=["PERSON", "ORG", "PRODUCT", "EVENT", "TECHNOLOGY"]
)

# Hybrid approach
extractor = EntityExtractor(
    method="hybrid",
    llm_provider=llm,
    confidence_threshold=0.7
)
```

### Graph Traversal Queries
```python
from agentic_rag.graph import GraphRAGQuery, TraversalStrategy

# Multi-hop traversal
query = GraphRAGQuery(
    start_node="Elon Musk",
    relation_type="founded",
    max_hops=2,
    strategy=TraversalStrategy.BFS
)
path = await graph.traverse(query)
# Returns: Elon Musk -> founded -> Tesla -> produces -> Model S

# Relationship-based retrieval
results = await graph.query(
    "Find all companies ACQUIRED_BY Google since 2020"
)

# Hybrid: Vector + Graph
response = await rag.aquery(
    "Who worked at companies that were later acquired by Meta?",
    use_graph=True,
    graph_depth=3
)
```
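
Multi-hop traversal with a hop limit is a plain breadth-first search. The sketch below runs on a toy adjacency list with made-up entities; `bfs_paths` is a hypothetical helper and says nothing about how `KnowledgeGraph.traverse` is implemented.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (relation, target node)


def bfs_paths(graph: Dict[str, List[Edge]], start: str, max_hops: int) -> List[List[str]]:
    """Return every path of 1..max_hops edges from start, shortest paths first."""
    paths: List[List[str]] = []
    queue = deque([([start], 0)])
    while queue:
        path, hops = queue.popleft()
        if hops == max_hops:
            continue  # hop budget spent on this branch
        for relation, target in graph.get(path[-1], []):
            if target in path[::2]:  # nodes sit at even positions; skip cycles
                continue
            new_path = path + [relation, target]
            paths.append(new_path)
            queue.append((new_path, hops + 1))
    return paths


toy_graph = {
    "Alice": [("founded", "Acme")],
    "Acme": [("produces", "Widget")],
}
paths = bfs_paths(toy_graph, "Alice", max_hops=2)
# paths[1] == ["Alice", "founded", "Acme", "produces", "Widget"]
```

Raising `max_hops` widens the context handed to the LLM but grows the number of paths quickly on dense graphs, which is why a depth of 2 or 3 is a common starting point.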

### Graph RAG Patterns
```python
from agentic_rag.graph import (
    EntityLinker,       # Link entities across documents
    RelationClassifier, # Classify relationship types
    GraphSummarizer,    # Summarize subgraphs
)

# Link entities (disambiguation)
linker = EntityLinker(graph=graph)
await linker.link_entities(threshold=0.85)
# Links "Apple Inc." and "Apple Company" as the same entity

# Relation classification
classifier = RelationClassifier(llm=llm)
relations = await classifier.classify(
    entity1="Microsoft",
    entity2="OpenAI",
    context="Microsoft invested $10B in OpenAI"
)
print(relations)  # ["investor_of", "partner_of"]
```

**Why you need this:** Because knowledge isn't flat. Companies acquire other companies, people work at multiple places, products have dependencies. Graph RAG understands relationships, not just keywords.

</details>

<details>
<summary><strong>Recipe 8 — Streaming (SSE + structured events)</strong></summary>

### Basic Streaming
```python
# Simple text streaming
async for chunk in rag.astream("Explain quantum computing in detail"):
    print(chunk.content, end="", flush=True)
```

### Structured Streaming Events
```python
from agentic_rag.core import StreamingEvent

# Full event streaming (SSE-compatible)
async for event in rag.astream_structured(
    "What are the latest AI breakthroughs?",
    include_sources=True,
    include_confidence=True
):
    if event.type == "answer_chunk":
        print(event.content, end="")
    elif event.type == "source":
        print(f"\n[Source {event.metadata['index']}]: {event.content[:100]}...")
    elif event.type == "confidence":
        print(f"\n[Confidence: {event.content:.2f}]")
    elif event.type == "tool_call":
        print(f"\n[Using tool: {event.metadata['tool_name']}]")
    elif event.type == "reflection":
        print(f"\n[Reflection: {event.content}]")
    elif event.type == "complete":
        print("\n[Done]")
```

### FastAPI SSE Endpoint
```python
from fastapi.responses import StreamingResponse
from agentic_rag.server import create_app

app = create_app()  # `rag` below is your configured AgenticRAG instance

# Browser EventSource clients can only issue GET requests, so expose the stream via GET.
@app.get("/stream")
async def stream_query(query: str):
    async def event_generator():
        async for event in rag.astream_structured(query):
            yield f"event: {event.type}\ndata: {event.to_json()}\n\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream"
    )

# JavaScript client (named events need addEventListener; onmessage only fires for unnamed events):
# const eventSource = new EventSource('/stream?query=...');
# eventSource.addEventListener('answer_chunk', (e) => console.log(JSON.parse(e.data)));
```
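
The wire format used by the endpoint is simple enough to build by hand. A small sketch of SSE framing, assuming string payloads; `sse_frame` is a hypothetical helper. The one rule that is easy to miss is that a payload containing newlines must be sent as several `data:` lines.

```python
def sse_frame(event_type: str, data: str) -> str:
    """Format one server-sent event: an event line, data lines, then a blank line."""
    lines = [f"event: {event_type}"]
    # Each line of the payload needs its own "data:" prefix.
    lines += [f"data: {line}" for line in (data.splitlines() or [""])]
    return "\n".join(lines) + "\n\n"


frame = sse_frame("answer_chunk", "Hello\nworld")
# frame == "event: answer_chunk\ndata: Hello\ndata: world\n\n"
```

Serializing the payload with `json.dumps` sidesteps the issue, because JSON escapes newlines and the payload then always fits on a single `data:` line.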

### WebSocket Streaming
```python
from fastapi import WebSocket
from agentic_rag.server import WebSocketRAGHandler

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    handler = WebSocketRAGHandler(rag)
    await handler.handle(websocket)

# Supports:
# - Bidirectional communication
# - Query cancellation
# - Session persistence
# - Multi-turn conversations
```

**Why you need this:** Because users hate waiting. Streaming gives instant feedback, lower perceived latency, and lets you build chat-style interfaces that don't feel like they're loading on dial-up.

</details>

<details>
<summary><strong>Recipe 9 — Evaluation (metrics + A/B testing)</strong></summary>

### Comprehensive Evaluation
```python
from agentic_rag.evaluation import (
    ComprehensiveEvaluator,
    AnswerRelevanceMetric,
    ContextPrecisionMetric,
    FaithfulnessMetric,
    AnswerCorrectnessMetric,
    LatencyMetric,
    CostMetric
)

evaluator = ComprehensiveEvaluator()

# Evaluate a single response
scores = await evaluator.evaluate(
    query="What is the capital of France?",
    response="Paris is the capital of France.",
    ground_truth="Paris",
    context=["France is a country in Western Europe. Its capital is Paris."],
)

print(f"Answer Relevance: {scores['relevance'].score:.2f}")
print(f"Context Precision: {scores['context_precision'].score:.2f}")
print(f"Faithfulness: {scores['faithfulness'].score:.2f}")
print(f"Correctness: {scores['correctness'].score:.2f}")
print(f"Latency: {scores['latency'].value_ms}ms")
print(f"Cost: ${scores['cost'].value:.4f}")
```

### Reference-Free Evaluation
```python
from agentic_rag.evaluation import ReferenceFreeEvaluator

# When you don't have ground truth
rf_evaluator = ReferenceFreeEvaluator()
scores = await rf_evaluator.evaluate(
    query="Explain neural networks",
    response="Neural networks are computational models inspired by biological brains...",
    context=["Neural networks consist of layers of interconnected nodes..."]
)
# Evaluates: coherence, completeness, relevance without ground truth
```

### Batch Evaluation
```python
from agentic_rag.evaluation import BatchEvaluator, EvaluationDataset

# Load test dataset
dataset = EvaluationDataset.from_json("test_queries.json")
# Format: [{"query": "...", "ground_truth": "...", "context": "..."}, ...]

# Run batch evaluation
batch_eval = BatchEvaluator(rag)
results = await batch_eval.evaluate_dataset(dataset)

# Get summary statistics
print(f"Mean relevance: {results.mean('relevance'):.2f}")
print(f"Mean latency: {results.mean('latency'):.0f}ms")
print(f"P95 latency: {results.percentile('latency', 95):.0f}ms")

# Export detailed results
results.to_csv("evaluation_results.csv")
results.to_html("evaluation_report.html")
```
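
The summary statistics are standard and easy to reproduce for a sanity check. A sketch using the nearest-rank definition of a percentile on made-up latencies; the library's `percentile` may interpolate between values instead, which gives slightly different numbers on small samples.

```python
import math
from statistics import mean
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


latencies_ms = [120, 135, 150, 160, 180, 210, 240, 300, 450, 900]  # illustrative values
mean_latency = mean(latencies_ms)            # 284.5
p95_latency = percentile(latencies_ms, 95)   # 900
```

The gap between the mean (284.5 ms) and the P95 (900 ms) is the reason to report both: one slow outlier barely moves the mean but dominates the tail.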

### A/B Testing
```python
from agentic_rag.evaluation import ABTest, ABTestConfig, Variant

# Configure test
config = ABTestConfig(
    test_name="gpt4_vs_claude_sonnet",
    min_sample_size=100,
    max_sample_size=500,
    primary_metric="answer_correctness",
    confidence_level=0.95
)

ab = ABTest(config)

# Register variants
ab.register_variant("gpt4", Variant(
    pipeline=gpt4_rag,
    name="GPT-4 Pipeline"
))
ab.register_variant("claude", Variant(
    pipeline=claude_rag,
    name="Claude Sonnet Pipeline"
))

# Run test (automatically determines winner)
winner = await ab.run(dataset)
print(f"Winner: {winner.name}")
print(f"Improvement: {winner.improvement_percent:.1f}%")
print(f"P-value: {winner.p_value:.4f}")

# Or manual control
ab.start()
ab.add_result("gpt4", query, response, score)
ab.add_result("claude", query, response, score)
status = ab.get_status()
if status.is_conclusive:
    print(f"Winner: {status.winner}")
```
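
To make "conclusive" concrete, here is one common way to compare two variants: a two-proportion z-test on the share of correct answers. This assumes each answer is graded as simply correct or incorrect; the statistical test that `ABTest` actually applies is not specified here, and the counts below are invented for illustration.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    successes_a: int, n_a: int, successes_b: int, n_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two success rates."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both variants are all-correct or all-wrong
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Variant A: 80 of 100 correct; variant B: 65 of 100 correct.
z, p_value = two_proportion_z_test(80, 100, 65, 100)
significant = p_value < 0.05  # matches confidence_level=0.95
```

With 100 samples per arm, a 15-point gap is significant at the 95% level; a 5-point gap would not be, which is why `min_sample_size` matters.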

### Custom Metrics
```python
from agentic_rag.evaluation import BaseMetric, MetricResult

class CustomMetric(BaseMetric):
    """Custom metric for domain-specific evaluation."""
    
    async def evaluate(self, query, response, context, **kwargs) -> MetricResult:
        # Your custom logic
        score = self._calculate_score(response)
        return MetricResult(
            score=score,
            explanation="Custom evaluation reasoning",
            metadata={"custom_field": score}
        )

# Register and use
evaluator.register_metric("custom", CustomMetric())
```

**Why you need this:** Because "it works" isn't enough. You need to know HOW well it works, whether that expensive model upgrade actually helps, and if your new retrieval strategy is worth the complexity.

</details>

<details>
<summary><strong>Recipe 10 — Production hardening (caching + guardrails + monitoring)</strong></summary>

### Semantic Caching
```python
from agentic_rag import SemanticCache

# Redis-backed semantic cache
cache = SemanticCache(
    backend="redis",
    redis_url="redis://localhost:6379",
    similarity_threshold=0.95,  # Treat queries with similarity >= 0.95 as cache hits
    ttl=3600  # Cache for 1 hour
)

# Or in-memory for single-node
cache = SemanticCache(backend="memory", max_size=10000)

# Use with RAG
rag = AgenticRAG(
    vector_store=store,
    llm_provider=llm,
    cache=cache
)

# Cache stats
print(f"Hit rate: {cache.hit_rate:.2%}")
print(f"Saved tokens: {cache.saved_tokens:,}")
print(f"Saved cost: ${cache.saved_cost:.2f}")
```
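
The role of `similarity_threshold` is clearest in a stripped-down version of the idea: store an embedding with each answer and return the answer when a new query's embedding is close enough. A minimal in-memory sketch with a linear scan, assuming cosine similarity; `TinySemanticCache` is hypothetical and omits TTL, eviction, and the embedding step itself.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinySemanticCache:
    def __init__(self, similarity_threshold: float = 0.95) -> None:
        self.similarity_threshold = similarity_threshold
        self._entries: List[Tuple[List[float], str]] = []
        self.hits = 0
        self.misses = 0

    def put(self, embedding: Sequence[float], answer: str) -> None:
        self._entries.append((list(embedding), answer))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        """Return the most similar cached answer at or above the threshold, else None."""
        best_answer, best_sim = None, self.similarity_threshold
        for stored, answer in self._entries:
            sim = cosine(embedding, stored)
            if sim >= best_sim:
                best_answer, best_sim = answer, sim
        if best_answer is None:
            self.misses += 1
        else:
            self.hits += 1
        return best_answer

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = TinySemanticCache(similarity_threshold=0.95)
cache.put([1.0, 0.0], "Paris")
near = cache.get([0.99, 0.05])  # almost the same direction: hit
far = cache.get([0.0, 1.0])     # orthogonal: miss
```

A threshold that is too low returns answers to questions that only look similar, so it is worth tuning against real query pairs before relying on the hit rate.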

### Circuit Breakers
```python
from agentic_rag import CircuitBreaker
from agentic_rag.providers.llm import OpenAIProvider, OllamaProvider

# Protect against LLM failures
breaker = CircuitBreaker(
    name="openai_api",
    failure_threshold=5,      # Open after 5 failures
    recovery_timeout=60,      # Try again after 60s
    half_open_max_calls=3,    # Test with 3 calls when recovering
    success_threshold=2       # Need 2 successes to close
)

# Use with provider
llm = OpenAIProvider(
    api_key="...",
    circuit_breaker=breaker,
    fallback_provider=OllamaProvider(model="llama3.3")  # Local fallback
)

# Monitor breaker state
print(f"State: {breaker.state}")  # CLOSED, OPEN, HALF_OPEN
print(f"Failures: {breaker.failure_count}")
print(f"Last failure: {breaker.last_failure_time}")
```
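
The three states follow a small state machine. The sketch below is a simplified illustration, not the library's `CircuitBreaker`: it closes after a single successful trial call and ignores `half_open_max_calls` and `success_threshold`. The injectable `clock` parameter is an assumption added to make the behavior testable.

```python
import time
from typing import Any, Callable, Optional


class TinyCircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self.failure_count = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "CLOSED"
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "HALF_OPEN"  # timeout elapsed: allow a trial call
        return "OPEN"

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "OPEN":
            raise RuntimeError("circuit open: call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            # Trip at the threshold, or re-open at once if a HALF_OPEN trial fails.
            if self._opened_at is not None or self.failure_count >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success: reset and close the circuit.
        self.failure_count = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers fail fast instead of waiting on timeouts, which is what lets a fallback provider take over quickly.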

### Output Guardrails
```python
from agentic_rag import OutputGuardrails, PIIDetector, ToxicityFilter

# Comprehensive guardrails
guardrails = OutputGuardrails(
    # PII Detection
    enable_pii_detection=True,
    pii_types=["email", "phone", "ssn", "credit_card", "address"],
    
    # Content filtering
    enable_toxicity_filter=True,
    toxicity_threshold=0.7,
    
    # Custom rules
    blocked_patterns=[
        r"\b(password|secret_key|api_key)\s*=\s*['\"][^'\"]+['\"]"
    ],
    
    # Fact-checking (optional)
    enable_fact_check=True,
    fact_checker=fact_checker_service
)

# Apply to a response (inside your request handler)
async def guarded_answer(response):
    safe_response = await guardrails.check(response)
    if not safe_response.is_safe:
        print(f"Blocked: {safe_response.violations}")
        return "I cannot provide that information."
    return response
```
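
Pattern-based PII detection is the simplest layer of such a guardrail. A deliberately small sketch with two illustrative regular expressions, covering only email addresses and US-style SSNs; `PII_PATTERNS` and `redact_pii` are hypothetical, and production detectors typically combine many patterns with validation and NER.

```python
import re
from typing import Dict, List, Tuple

# Illustrative patterns only; real-world PII formats are far more varied.
PII_PATTERNS: Dict[str, re.Pattern] = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text: str) -> Tuple[str, List[str]]:
    """Replace each match with a placeholder and report which PII types were found."""
    violations: List[str] = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            violations.append(label)
            text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text, violations


safe, found = redact_pii("Contact jane.doe@example.com, SSN 123-45-6789.")
# safe == "Contact [EMAIL REDACTED], SSN [SSN REDACTED]."
```

Redacting rather than blocking keeps the rest of the answer usable, while the returned list of violations can still be logged or used to reject the response outright.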

### Cost Tracking & Budgeting
```python
from agentic_rag import CostTracker, BudgetManager
from agentic_rag.providers.llm import GroqProvider

# Track costs
tracker = CostTracker()
tracker.record_usage(
    model="gpt-4o",
    input_tokens=500,
    output_tokens=200,
    embedding_tokens=1000
)

print(f"Total cost: ${tracker.total_cost:.4f}")
print(f"By model: {tracker.cost_by_model}")
print(f"By day: {tracker.cost_by_day}")

# Budget management
budget = BudgetManager(
    daily_limit=50.0,      # $50/day
    monthly_limit=1000.0,  # $1000/month
    alert_threshold=0.8    # Alert at 80%
)

if budget.would_exceed_limit(expected_cost=2.0):
    # Switch to a cheaper model or reject the request
    llm = GroqProvider(api_key="...", model="llama-3.3-70b-versatile")  # Much cheaper
```
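
The arithmetic behind cost tracking is simple enough to sketch by hand. The prices below are made-up placeholders (USD per million tokens), and `MiniCostTracker` and `would_exceed_limit` are hypothetical stand-ins, not the library's classes.

```python
from dataclasses import dataclass, field

# Hypothetical prices in USD per million tokens; look up your provider's real rates.
PRICES = {"example-model": {"input": 2.50, "output": 10.00}}


@dataclass
class MiniCostTracker:
    total_cost: float = 0.0
    cost_by_model: dict = field(default_factory=dict)

    def record_usage(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Price one call, add it to the running totals, and return its cost."""
        price = PRICES[model]
        cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
        self.total_cost += cost
        self.cost_by_model[model] = self.cost_by_model.get(model, 0.0) + cost
        return cost


def would_exceed_limit(spent: float, expected_cost: float, daily_limit: float) -> bool:
    """True if adding expected_cost would push spending past the limit."""
    return spent + expected_cost > daily_limit


tracker_demo = MiniCostTracker()
cost = tracker_demo.record_usage("example-model", input_tokens=500, output_tokens=200)
```

Checking the budget before the call (rather than after) is what lets you switch to a cheaper model instead of discovering the overrun on the invoice.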

### Rate Limiting
```python
from agentic_rag import RateLimiter

# Token bucket rate limiter
limiter = RateLimiter(
    requests_per_second=10,
    burst_size=20,
    per_user=True  # Track per API key
)

@limiter.limit()
async def query_endpoint(request):
    return await rag.aquery(request.query)

# Or with decorator
@rate_limit(requests_per_minute=60, per_user=True)
async def stream_endpoint(request):
    async for chunk in rag.astream(request.query):
        yield chunk
```
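
If the token bucket idea is new to you, this standalone sketch shows the mechanics: tokens refill at a steady rate, and the bucket size bounds the burst. `TokenBucket` here is an illustrative class with an injectable clock, not the library's `RateLimiter`.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = burst       # maximum tokens the bucket can hold
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


fake_now = [0.0]
bucket = TokenBucket(rate=10, burst=20, clock=lambda: fake_now[0])
burst_results = [bucket.try_acquire() for _ in range(21)]  # 20 pass, the 21st is rejected
fake_now[0] = 0.5                                          # half a second refills 5 tokens
refilled = sum(bucket.try_acquire() for _ in range(10))
```

A per-user limiter would keep one such bucket per API key, for example in a dictionary keyed by the key string.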

### Health Checks & Monitoring
```python
from agentic_rag import HealthChecker, MetricsCollector

# Health check
health = HealthChecker(rag)
status = await health.check()
print(f"Vector store: {status.vector_store}")  # healthy / degraded / down
print(f"LLM provider: {status.llm_provider}")
print(f"Cache: {status.cache}")

# Metrics
metrics = MetricsCollector()
metrics.record_latency("query", duration_ms=250)
metrics.record_throughput("requests_per_second", count=10)
metrics.record_error_rate("error_rate", errors=1, total=100)

# Export to Prometheus/Grafana
from agentic_rag.metrics import PrometheusExporter
exporter = PrometheusExporter()
exporter.export(metrics)
```

**Why you need this:** Because production is where prototypes go to die. Caching saves money, circuit breakers prevent cascading failures, guardrails keep you out of trouble, and cost tracking prevents surprise bills.

</details>

<details>
<summary><strong>Recipe 11 — Contrib tools (30+ integrations)</strong></summary>

### Communication
```python
from agentic_rag.contrib.communication import (
    SlackTool,          # Post to Slack channels
    DiscordTool,        # Discord messages
    EmailTool,          # Send emails via SMTP/SendGrid
    TeamsTool,          # Microsoft Teams
    TelegramTool,       # Telegram bot messages
    WebhookTool,        # Generic webhooks
)

# Slack notifications
slack = SlackTool(token="xoxb-your-token")
await slack.execute(
    channel="#ai-research",
    message=f"New insight: {response.answer[:500]}...",
    blocks=[{
        "type": "section",
        "text": {"type": "mrkdwn", "text": f"*Query:* {query}"}
    }]
)

# Email reports
email = EmailTool(
    provider="sendgrid",  # or "smtp", "aws_ses"
    api_key="..."
)
await email.execute(
    to="team@company.com",
    subject="Daily RAG Summary",
    body=response.answer,
    attachments=["report.pdf"]
)

# Generic webhook
webhook = WebhookTool()
await webhook.execute(
    url="https://hooks.zapier.com/hooks/catch/...",
    payload={"query": query, "answer": response.answer}
)
```

### Productivity
```python
from agentic_rag.contrib.productivity import (
    NotionTool,         # Notion pages/databases
    GoogleDocsTool,     # Google Docs
    TrelloTool,         # Trello cards
    AsanaTool,          # Asana tasks
    MondayTool,         # Monday.com
    LinearTool,         # Linear issues
    JiraTool,           # Jira tickets
    ConfluenceTool,     # Confluence pages
)

# Create Notion page
notion = NotionTool(api_key="secret_...")
page = await notion.execute(
    action="create_page",
    parent_id="workspace-id",
    title="Q4 Financial Analysis",
    content=response.answer,
    properties={
        "Status": "Draft",
        "Tags": ["AI-Generated", "Q4"]
    }
)

# Create Jira ticket
jira = JiraTool(
    server="https://company.atlassian.net",
    username="bot@company.com",
    api_token="..."
)
await jira.execute(
    action="create_issue",
    project="AI",
    summary=f"Research: {query[:100]}",
    description=response.answer,
    issue_type="Task",
    labels=["rag-generated"]
)
```

### Storage & Cloud
```python
from datetime import datetime

from agentic_rag.contrib.storage import (
    S3Tool,             # AWS S3
    GCSTool,            # Google Cloud Storage
    AzureBlobTool,      # Azure Blob Storage
    DropboxTool,        # Dropbox
    GoogleDriveTool,    # Google Drive
    OneDriveTool,       # Microsoft OneDrive
    BoxTool,            # Box.com
)

# S3 upload
s3 = S3Tool(
    bucket="my-rag-outputs",
    region="us-east-1",
    aws_access_key="...",
    aws_secret_key="..."
)
await s3.execute(
    action="upload",
    key=f"reports/{datetime.now():%Y-%m-%d}/analysis.txt",
    body=response.answer,
    metadata={"query": query, "confidence": str(response.confidence)}
)

# Google Drive
drive = GoogleDriveTool(credentials="credentials.json")
file = await drive.execute(
    action="create_document",
    name="Research Summary",
    content=response.answer,
    folder_id="folder-id"
)
```

### Databases
```python
from datetime import datetime

from agentic_rag.contrib.database import (
    PostgresTool,       # PostgreSQL queries
    MySQLTool,          # MySQL queries
    MongoDBTool,        # MongoDB operations
    RedisTool,          # Redis commands
    BigQueryTool,       # Google BigQuery
    SnowflakeTool,      # Snowflake
    ClickHouseTool,     # ClickHouse
    SupabaseTool,       # Supabase
)

# Query database
postgres = PostgresTool(connection_string="postgresql://...")
results = await postgres.execute(
    action="query",
    sql="SELECT * FROM customers WHERE churn_risk > 0.8"
)

# Store in MongoDB
mongo = MongoDBTool(uri="mongodb+srv://...")
await mongo.execute(
    action="insert_one",
    database="rag",
    collection="queries",
    document={
        "query": query,
        "answer": response.answer,
        "timestamp": datetime.now(),
        "confidence": response.confidence
    }
)
```

### Business & Payments
```python
from agentic_rag.contrib.business import (
    StripeTool,         # Stripe payments
    ShopifyTool,        # Shopify operations
    HubSpotTool,        # HubSpot CRM
    SalesforceTool,     # Salesforce
    ZendeskTool,        # Zendesk tickets
    IntercomTool,       # Intercom conversations
    TwilioTool,         # SMS/Voice
)

# Send SMS notification
twilio = TwilioTool(
    account_sid="...",
    auth_token="..."
)
await twilio.execute(
    action="send_sms",
    to="+1234567890",
    body=f"Alert: {response.answer[:100]}..."
)
```

### Search & Discovery
```python
from agentic_rag.contrib.search import (
    AlgoliaTool,        # Algolia search
    ElasticsearchTool,  # Elasticsearch
    MeilisearchTool,    # Meilisearch
    TypesenseTool,      # Typesense
)

# Index documents
algolia = AlgoliaTool(app_id="...", api_key="...")
await algolia.execute(
    action="index",
    index_name="docs",
    objects=[{"objectID": "1", "content": response.answer}]
)
```

**Why you need this:** Because insights stuck in your terminal aren't useful. Send them to Slack, save them to Notion, archive them to S3 - make your RAG system a team player, not a hermit.

</details>

<details>
<summary><strong>Recipe 12 — Build your own tool (advanced patterns)</strong></summary>

### Basic Custom Tool
```python
from agentic_rag.tools.base import BaseTool, ToolResult, ToolParameter
from typing import Dict, Any

class WeatherTool(BaseTool):
    """Because even RAG agents check the weather sometimes."""

    def __init__(self, api_key: str):
        super().__init__(
            name="weather",
            description="Get current weather for a city",
            parameters=[
                ToolParameter(
                    name="city",
                    type="string",
                    description="City name (e.g., 'London', 'New York')",
                    required=True
                ),
                ToolParameter(
                    name="units",
                    type="string",
                    description="Temperature units",
                    enum=["metric", "imperial"],
                    default="metric"
                )
            ],
        )
        self.api_key = api_key

    async def execute(self, city: str, units: str = "metric", **kwargs) -> ToolResult:
        """Execute the weather lookup."""
        try:
            # Your API call here
            data = await self._fetch_weather(city, units)
            return ToolResult(
                success=True,
                result={
                    "temperature": data["temp"],
                    "conditions": data["weather"][0]["description"],
                    "humidity": data["humidity"]
                },
                metadata={"source": "openweather", "cached": False}
            )
        except Exception as e:
            return ToolResult(
                success=False,
                error=str(e),
                error_code="WEATHER_API_ERROR"
            )

    async def _fetch_weather(self, city: str, units: str) -> Dict:
        # Implementation
        pass

# Register
rag.register_tool(WeatherTool(api_key="your-key"))
```

### Tool with Schema Validation
```python
from pydantic import BaseModel, Field

class DatabaseQueryInput(BaseModel):
    """Input schema for database queries."""
    table: str = Field(..., description="Table name to query")
    columns: list[str] = Field(default=["*"], description="Columns to select")
    where: str = Field(default="", description="WHERE clause (optional)")
    limit: int = Field(default=100, ge=1, le=1000, description="Max results")

class DatabaseTool(BaseTool):
    """Execute safe database queries."""
    
    input_schema = DatabaseQueryInput
    
    async def execute(
        self,
        table: str,
        columns: list[str] = ["*"],
        where: str = "",
        limit: int = 100
    ) -> ToolResult:
        # Validate table exists
        if table not in self.allowed_tables:
            return ToolResult(
                success=False,
                error=f"Table '{table}' not in allowed list",
                error_code="INVALID_TABLE"
            )
        
        # Safe query execution
        try:
            results = await self.db.fetch(table, columns, where, limit)
            return ToolResult(
                success=True,
                result=results,
                metadata={"row_count": len(results)}
            )
        except Exception as e:
            return ToolResult(
                success=False,
                error=f"Query failed: {str(e)}",
                error_code="QUERY_ERROR"
            )
```

### Tool with Caching
```python
import hashlib

from agentic_rag.utils.cache import Cache

class CachedAPITool(BaseTool):
    """Tool with built-in caching."""
    
    def __init__(self):
        super().__init__(...)
        self.cache = Cache(ttl=3600)  # 1 hour cache
    
    async def execute(self, query: str, **kwargs) -> ToolResult:
        # Check cache (sha256 keys stay stable across processes; built-in hash() does not)
        cache_key = f"api:{hashlib.sha256(query.encode()).hexdigest()}"
        if cached := await self.cache.get(cache_key):
            return ToolResult(success=True, result=cached, cached=True)
        
        # Fetch and cache
        result = await self._fetch(query)
        await self.cache.set(cache_key, result)
        return ToolResult(success=True, result=result, cached=False)
```
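
For readers curious what a TTL cache does under the hood, here is a small in-memory sketch. `TTLCache` and `make_key` are hypothetical names, the cache is synchronous for brevity, and it makes no claim about how `agentic_rag.utils.cache.Cache` is implemented.

```python
import hashlib
import time
from typing import Any, Callable, Optional


class TTLCache:
    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    @staticmethod
    def make_key(prefix: str, query: str) -> str:
        # A content hash is stable across processes, unlike the built-in hash().
        return f"{prefix}:{hashlib.sha256(query.encode('utf-8')).hexdigest()}"

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]   # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


t = [0.0]
cache = TTLCache(ttl=3600, clock=lambda: t[0])
key = TTLCache.make_key("api", "weather in London")
cache.set(key, {"temp": 12})
hit = cache.get(key)     # fresh entry
t[0] = 3600.0
miss = cache.get(key)    # entry has expired by now
```

Expired entries are evicted lazily on read here; a long-running service would probably also want a size bound or periodic sweep.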

### Tool Factory Registration
```python
import os

from agentic_rag.factories import register_tool

@register_tool("my_weather")
class WeatherTool(BaseTool):
    """Auto-registered weather tool."""
    
    def __init__(self, api_key: str = None, **kwargs):
        super().__init__(...)
        self.api_key = api_key or os.getenv("WEATHER_API_KEY")

# Create via factory
from agentic_rag.factories import create_tool
tool = create_tool("my_weather", api_key="...")
```
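
The decorator-plus-factory pattern above boils down to a dictionary of classes. The sketch below shows the idea with hypothetical names (`register_tool_sketch`, `create_tool_sketch`, `EchoTool`); the library's real factory may differ in details such as validation and defaults.

```python
from typing import Callable, Dict, Type

_TOOL_REGISTRY: Dict[str, Type] = {}


def register_tool_sketch(name: str) -> Callable[[Type], Type]:
    """Class decorator that records the class under a string name."""
    def decorator(cls: Type) -> Type:
        if name in _TOOL_REGISTRY:
            raise ValueError(f"Tool '{name}' is already registered")
        _TOOL_REGISTRY[name] = cls
        return cls
    return decorator


def create_tool_sketch(name: str, **kwargs):
    """Look the class up by name and instantiate it with the given kwargs."""
    try:
        cls = _TOOL_REGISTRY[name]
    except KeyError:
        raise KeyError(f"Unknown tool '{name}'. Registered: {sorted(_TOOL_REGISTRY)}") from None
    return cls(**kwargs)


@register_tool_sketch("echo")
class EchoTool:
    def __init__(self, prefix: str = ""):
        self.prefix = prefix

    def execute(self, text: str) -> str:
        return f"{self.prefix}{text}"


tool_demo = create_tool_sketch("echo", prefix="> ")
```

Registering by name is what lets configuration files refer to tools as plain strings.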

### Multi-Step Tool
```python
class ReportGeneratorTool(BaseTool):
    """Generate comprehensive reports from multiple sources."""
    
    async def execute(
        self,
        topic: str,
        depth: str = "comprehensive",
        format: str = "markdown"
    ) -> ToolResult:
        steps = []
        
        # Step 1: Web search
        search_results = await self._search_web(topic)
        steps.append(f"Searched web: {len(search_results)} sources")
        
        # Step 2: Vector store query
        docs = await self._query_documents(topic)
        steps.append(f"Queried docs: {len(docs)} documents")
        
        # Step 3: Synthesize
        report = await self._generate_report(
            topic, search_results, docs,
            depth=depth, format=format
        )
        steps.append("Generated report")
        
        return ToolResult(
            success=True,
            result={
                "report": report,
                "sources": len(search_results) + len(docs),
                "format": format
            },
            metadata={"steps": steps, "duration_ms": 2500}
        )
```

**Why you need this:** Because your use case is unique. We give you the building blocks to extend the system however you need - weather APIs, internal databases, custom calculations, whatever.

</details>

<details>
<summary><strong>Recipe 18 — Monitoring Dashboard (analytics & observability)</strong></summary>

Real-time web dashboard for monitoring your RAG system. Track queries, latency, success rates, and view recent interactions.

### Enable Dashboard
```python
from agentic_rag.server import create_app_with_dashboard

# Create app with dashboard
app = create_app_with_dashboard()

# Dashboard available at http://localhost:8000/dashboard
```

### Record Queries for Analytics
```python
from agentic_rag.server import record_rag_query
import time

# In your query handler
start = time.perf_counter()  # monotonic clock, safe for measuring durations
response = await rag.aquery("What is machine learning?")
latency = (time.perf_counter() - start) * 1000

# Record for dashboard
record_rag_query(
    query="What is machine learning?",
    response=response.answer,
    latency_ms=latency,
    token_count=150,
    sources=[doc.id for doc in response.sources],
    confidence=response.confidence,
    success=True,
)
```

### Dashboard Features
- **Real-time metrics**: Query count, avg latency, success rate
- **Recent queries**: View last 50 queries with latency and status
- **Performance trends**: Latency and success rate over time
- **Auto-refresh**: Updates every 30 seconds

**Access URLs:**
- Dashboard UI: `/dashboard`
- Metrics API: `/api/dashboard/metrics`
- Query history: `/api/dashboard/queries`
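
As a rough picture of what sits behind those numbers, the sketch below keeps a bounded window of recent queries and derives average latency and success rate from it. `RollingMetrics` and `QueryRecord` are hypothetical names; the dashboard's actual storage is not shown in this README.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class QueryRecord:
    latency_ms: float
    success: bool


class RollingMetrics:
    def __init__(self, max_recent: int = 50):
        self.recent = deque(maxlen=max_recent)  # oldest records fall off automatically
        self.query_count = 0

    def record(self, latency_ms: float, success: bool) -> None:
        self.recent.append(QueryRecord(latency_ms, success))
        self.query_count += 1

    def snapshot(self) -> dict:
        """Summarize the current window; zeros when nothing has been recorded."""
        n = len(self.recent)
        return {
            "query_count": self.query_count,
            "avg_latency_ms": sum(r.latency_ms for r in self.recent) / n if n else 0.0,
            "success_rate": sum(r.success for r in self.recent) / n if n else 0.0,
        }


metrics_demo = RollingMetrics(max_recent=3)
for latency, ok in [(100, True), (200, True), (300, False), (400, True)]:
    metrics_demo.record(latency, ok)
snap = metrics_demo.snapshot()  # averages cover only the last 3 records
```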

**Why you need this:** Because flying blind in production is terrifying. See what's happening, spot issues before users do, and have data to optimize.

</details>

<details>
<summary><strong>Recipe 17 — Conversation Memory (multi-turn chat)</strong></summary>

Simple conversation memory for multi-turn chat sessions. Unlike complex hierarchical memory, this is purpose-built for maintaining chat context within token limits.

### Buffer Memory (Simple)
```python
from agentic_rag.core import ConversationBufferMemory

memory = ConversationBufferMemory(max_token_limit=4000)

# Add messages
memory.add_user_message("What is machine learning?")
memory.add_ai_message("Machine learning is a subset of AI...")
memory.add_user_message("Give me an example")
memory.add_ai_message("Sure! Spam detection is a classic example...")

# Get context for LLM
context = memory.to_list()
# Returns: [
#   {"role": "user", "content": "What is machine learning?"},
#   {"role": "assistant", "content": "Machine learning is a subset of AI..."},
#   ...
# ]

# Check token usage
tokens = memory.get_token_count()  # e.g. 450
```
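
Staying within a token limit usually means evicting the oldest messages first. Here is a minimal sketch of that policy, assuming a crude word count in place of a real tokenizer; `MiniBufferMemory` and `rough_token_count` are illustrative names, not the library's implementation.

```python
from collections import deque


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


class MiniBufferMemory:
    def __init__(self, max_token_limit: int):
        self.max_token_limit = max_token_limit
        self.messages = deque()

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Evict from the front until the buffer fits, but always keep the newest message.
        while self.token_count() > self.max_token_limit and len(self.messages) > 1:
            self.messages.popleft()

    def token_count(self) -> int:
        return sum(rough_token_count(m["content"]) for m in self.messages)

    def to_list(self) -> list:
        return list(self.messages)


mem = MiniBufferMemory(max_token_limit=8)
mem.add("user", "What is machine learning?")   # 4 words
mem.add("assistant", "A subset of AI.")        # 4 words, total 8
mem.add("user", "Give me an example")          # 4 more words evict the first message
```

Swapping `rough_token_count` for a real tokenizer would make the limit match what the model actually sees.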

### Window Memory (Last N turns)
```python
from agentic_rag.core import ConversationBufferWindowMemory

# Keep only last 5 conversation turns
memory = ConversationBufferWindowMemory(k=5)

# Older messages auto-removed
```

### Summary Memory (Compress old context)
```python
from agentic_rag.core import ConversationSummaryMemory

# Summarizes old messages to save tokens
memory = ConversationSummaryMemory(
    llm_provider=llm,
    max_token_limit=4000,
    summary_token_limit=500
)

# After many messages, early ones become:
# "Summary: User asked about ML, AI explained it's a subset of AI..."
```

### Entity Memory (Track mentioned entities)
```python
from agentic_rag.core import ConversationEntityMemory

# Automatically extracts and remembers entities
memory = ConversationEntityMemory(llm_provider=llm)

memory.add_user_message("Alice is working on the Q4 report.")
memory.add_ai_message("What is Alice's role?")
memory.add_user_message("She's the CTO.")

# Extracted entities
entities = memory.get_entities()
# {"alice": {"name": "Alice", "type": "Person", "role": "CTO"}}
```

### With AgenticRAG
```python
from agentic_rag import AgenticRAG
from agentic_rag.core import ConversationBufferMemory

memory = ConversationBufferMemory()
rag = AgenticRAG(vector_store=store, llm_provider=llm)

async def chat(user_input: str):
    # Get conversation context
    context = memory.to_list()
    
    # Add user message
    memory.add_user_message(user_input)
    
    # Query with context
    response = await rag.aquery(
        user_input,
        context_messages=context
    )
    
    # Store response
    memory.add_ai_message(response.answer)
    
    return response
```

**Why you need this:** Because users don't ask one question and leave. They have conversations. This keeps context without exploding your token budget.

</details>

<details>
<summary><strong>Recipe 16 — ReAct Agent (reasoning + acting)</strong></summary>

The ReAct pattern: think → act → observe → repeat. Our implementation combines reasoning and tool use in an interleaved loop.

### Basic ReAct Agent
```python
from agentic_rag.cognitive import ReActAgent
from agentic_rag.tools import WebSearchTool, CalculatorTool

agent = ReActAgent(
    llm_provider=llm,
    vector_store=store,
    tools=[WebSearchTool(), CalculatorTool()],
    max_iterations=5
)

result = await agent.query(
    "Find GDP of France and Germany, calculate the difference"
)
print(result.answer)
# "The difference between France's GDP ($2.78T) and Germany's GDP ($4.07T) is $1.29T"

# Full reasoning trace
for step in result.reasoning_trace:
    print(f"[{step.step_type}] {step.content}")
```

### ReAct with Retrieval
```python
# Vector store becomes a retrieval tool automatically
agent = ReActAgent(
    llm_provider=llm,
    vector_store=store,  # Adds implicit "retrieve" tool
    tools=[WebSearchTool()],
    max_iterations=5
)

result = await agent.query(
    "What does our Q4 report say about revenue? Also search for industry benchmarks."
)
# Combines internal documents + web search seamlessly
```
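
The think → act → observe loop itself fits in a few lines. In this sketch the "LLM" is replaced by a scripted policy so the example runs offline; `react_loop`, `scripted_policy`, and the toy tools are all hypothetical and only meant to show the control flow.

```python
from typing import Callable, Dict, List, Tuple

# A step is either ("act", tool_name, tool_input) or ("finish", answer, "").
Step = Tuple[str, str, str]


def react_loop(policy: Callable[[List[str]], Step],
               tools: Dict[str, Callable[[str], str]],
               max_iterations: int = 5) -> Tuple[str, List[str]]:
    trace: List[str] = []
    for _ in range(max_iterations):
        kind, name, arg = policy(trace)                # think: decide the next step
        if kind == "finish":
            trace.append(f"[answer] {name}")
            return name, trace
        observation = tools[name](arg)                 # act: call the chosen tool
        trace.append(f"[action] {name}({arg})")
        trace.append(f"[observation] {observation}")   # observe: feed the result back
    return "Stopped: iteration limit reached", trace


def scripted_policy(trace: List[str]) -> Step:
    """Stand-in for the LLM: look up two numbers, then subtract them."""
    observations = [t.split("] ", 1)[1] for t in trace if t.startswith("[observation]")]
    if len(observations) == 0:
        return ("act", "lookup", "a")
    if len(observations) == 1:
        return ("act", "lookup", "b")
    if len(observations) == 2:
        return ("act", "subtract", f"{observations[1]},{observations[0]}")
    return ("finish", f"The difference is {observations[2]}", "")


def subtract(arg: str) -> str:
    x, y = (int(v) for v in arg.split(","))
    return str(x - y)


demo_tools = {"lookup": {"a": "3", "b": "10"}.get, "subtract": subtract}
answer, trace = react_loop(scripted_policy, demo_tools)
```

The `max_iterations` cap matters: without it, a policy that never emits a finish step would loop forever.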

**Why you need this:** Sometimes you need to think, search, calculate, think again, and finally answer. ReAct makes this systematic and observable.

</details>

<details>
<summary><strong>Recipe 15 — Parent-child chunking (provenance tracking)</strong></summary>

Track which chunks came from which parent document. Essential for citations, expanded context, and hierarchical retrieval.

### Chunk with Parent Tracking
```python
from agentic_rag.document_processing.chunkers import (
    ParentChildChunker,
    FixedSizeChunker
)

chunker = ParentChildChunker(
    base_chunker=FixedSizeChunker(chunk_size=500),
    include_parent_metadata=True
)

# Chunk with tracking
parent = chunker.chunk_document(
    text="Long document text...",
    doc_id="doc_001",
    metadata={"source": "report.pdf", "author": "Alice"}
)

# Access children with full provenance
for child in parent.children:
    print(f"Chunk {child.child_index}/{child.total_children}")
    print(f"  Parent: {child.parent_doc_id}")
    print(f"  Citation: {child.to_citation()}")  # [report.pdf, chunk 1/5]
    print(f"  Position: chars {child.char_start}-{child.char_end}")
```
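
To see what "provenance" means at the data level, here is a toy fixed-size chunker that records each child's position and character offsets. `ChildChunk` and `chunk_with_provenance` are illustrative names that mirror the fields printed above; they are not the library's classes.

```python
from dataclasses import dataclass


@dataclass
class ChildChunk:
    parent_doc_id: str
    child_index: int      # 1-based position among siblings
    total_children: int
    char_start: int
    char_end: int         # exclusive
    text: str

    def to_citation(self, source: str) -> str:
        return f"[{source}, chunk {self.child_index}/{self.total_children}]"


def chunk_with_provenance(text: str, doc_id: str, chunk_size: int) -> list:
    """Split text into fixed-size pieces, keeping offsets into the parent."""
    starts = range(0, len(text), chunk_size)
    return [
        ChildChunk(doc_id, i + 1, len(starts), s, min(s + chunk_size, len(text)),
                   text[s:s + chunk_size])
        for i, s in enumerate(starts)
    ]


children = chunk_with_provenance("abcdefghij", doc_id="doc_001", chunk_size=4)
```

Because every chunk carries its offsets, the original span can always be recovered from the parent text with `text[char_start:char_end]`.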

### Hierarchical Retrieval
```python
from agentic_rag.document_processing.chunkers import HierarchicalRetriever

# Index child chunks
for parent in parent_chunks:
    for child in parent.children:
        await store.add(
            child.text,
            metadata={
                "parent_doc_id": child.parent_doc_id,
                "child_index": child.child_index,
            }
        )

# Retrieve with context expansion
retriever = HierarchicalRetriever(
    vector_store=store,
    parents=parent_chunks
)

chunks = await retriever.retrieve("revenue trends", top_k=3)
expanded = retriever.expand_context(chunks, window_size=1)
# Returns target chunks + neighboring siblings for context
```
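
Context expansion is essentially index arithmetic over sibling positions. A minimal sketch, assuming 0-based child indices within one parent (`expand_context` below is a standalone helper, not the retriever's method):

```python
def expand_context(hit_indices: list, total_children: int, window_size: int = 1) -> list:
    """Return sorted, de-duplicated sibling indices around each hit (0-based)."""
    expanded = set()
    for i in hit_indices:
        lo = max(0, i - window_size)                    # clamp at the first chunk
        hi = min(total_children - 1, i + window_size)   # clamp at the last chunk
        expanded.update(range(lo, hi + 1))
    return sorted(expanded)


siblings = expand_context([0, 4], total_children=6, window_size=1)
```

Using a set means overlapping windows from neighboring hits do not produce duplicate chunks in the prompt.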

**Why you need this:** Because "trust me bro" is not a valid citation. Know exactly which chunk came from which document, and retrieve siblings for expanded context when needed.

</details>

<details>
<summary><strong>Recipe 14 — Bulk document loading (SimpleDirectoryReader)</strong></summary>

Load entire directories with automatic file type detection. The missing piece between "I have a folder of documents" and "my RAG system works."

### Load Entire Directory
```python
from pathlib import Path

from agentic_rag.document_processing.loaders import SimpleDirectoryReader

reader = SimpleDirectoryReader(
    input_dir="./data",
    recursive=True,
    exclude=["*.tmp", "*.log", "node_modules/*"]
)

# Load with progress tracking
def on_progress(file_path, current, total):
    print(f"Loaded {current}/{total}: {Path(file_path).name}")

documents = await reader.load(progress_callback=on_progress)
print(f"Loaded {len(documents)} documents")
```

### Auto File Type Detection
```python
# Automatically detects and uses correct loader for each extension
# .pdf → PDFLoader, .docx → DocxLoader, .md → MarkdownLoader, etc.
reader = SimpleDirectoryReader("./mixed_documents")
docs = await reader.load()

# Each document has file metadata
doc = docs[0]
print(doc.metadata["file_path"])   # ./mixed_documents/report.pdf
print(doc.metadata["file_ext"])    # .pdf
print(doc.metadata["directory"])   # ./mixed_documents
```
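
Extension-based dispatch and glob exclusion can be sketched with the standard library alone. The loader names are plain strings standing in for classes, the `TextLoader` fallback is an assumption, and `pick_loader` and `is_excluded` are hypothetical helpers rather than part of `SimpleDirectoryReader`.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical loader names keyed by extension; strings stand in for loader classes.
LOADERS = {".pdf": "PDFLoader", ".docx": "DocxLoader", ".md": "MarkdownLoader"}


def pick_loader(file_path: str, default: str = "TextLoader") -> str:
    """Choose a loader from the (case-insensitive) file extension."""
    return LOADERS.get(Path(file_path).suffix.lower(), default)


def is_excluded(file_path: str, patterns: list) -> bool:
    """True if the path or its file name matches any glob pattern."""
    path = Path(file_path)
    return any(fnmatch(path.as_posix(), p) or fnmatch(path.name, p) for p in patterns)
```

For example, `pick_loader("docs/Report.PDF")` resolves to the PDF loader, while `is_excluded("logs/app.log", ["*.tmp", "*.log"])` filters the file out before any loader is chosen.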

### With Custom Metadata
```python
from datetime import datetime
from pathlib import Path

def add_source_metadata(file_path: str) -> dict:
    return {
        "department": "engineering",
        "processed_at": datetime.now().isoformat(),
        "filename": Path(file_path).name
    }

reader = SimpleDirectoryReader(
    "./data",
    file_metadata=add_source_metadata
)
```

**Why you need this:** Because loading documents one-by-one is tedious and you have better things to do. Point it at a folder, get documents back.

</details>

<details>
<summary><strong>Recipe 13 — FastAPI server (production-ready API)</strong></summary>

### Basic Server
```python
from agentic_rag.server import create_app

# One-liner production API
app = create_app()

# Run with uvicorn
# uvicorn app:app --host 0.0.0.0 --port 8000 --workers 4
```

### Custom Server Configuration
```python
from agentic_rag.server import create_app, ServerConfig
from agentic_rag import AgenticRAG

# Configure components
config = ServerConfig(
    # RAG settings
    vector_store="chroma",
    llm_provider="openai",
    llm_model="gpt-4o",
    enable_agent=True,
    enable_memory=True,
    
    # API settings
    title="My RAG API",
    version="1.0.0",
    docs_url="/docs",
    
    # Security
    api_key_header="X-API-Key",
    require_auth=True,
    
    # Rate limiting
    rate_limit_requests=100,
    rate_limit_window=60,
    
    # CORS
    allowed_origins=["https://app.example.com"],
    
    # Features
    enable_streaming=True,
    enable_caching=True,
    enable_metrics=True
)

app = create_app(config=config)
```

### API Endpoints Reference

| Method | Path | Description | Request Body |
|--------|------|-------------|--------------|
| `GET` | `/health` | Health check | - |
| `GET` | `/ready` | Readiness probe | - |
| `POST` | `/query` | RAG query | `{"query": "...", "filters": {}}` |
| `POST` | `/query/cognitive` | Full AC-RAG | `{"query": "...", "enable_reflection": true}` |
| `POST` | `/stream` | SSE streaming | `{"query": "..."}` |
| `POST` | `/stream/structured` | Structured events | `{"query": "...", "include_sources": true}` |
| `POST` | `/ingest` | Document upload | multipart/form-data |
| `POST` | `/ingest/text` | Text ingestion | `{"text": "...", "metadata": {}}` |
| `GET` | `/documents` | List documents | `?limit=10&offset=0` |
| `DELETE` | `/documents/{id}` | Delete document | - |
| `GET` | `/memory/stats` | Memory statistics | - |
| `POST` | `/memory/clear` | Clear memory | `{"type": "episodic"}` |
| `GET` | `/tools` | List available tools | - |
| `POST` | `/tools/execute` | Execute tool | `{"tool": "...", "params": {}}` |
| `GET` | `/metrics` | Prometheus metrics | - |

### Query Endpoint Examples
```python
import requests

# Basic query
response = requests.post(
    "http://localhost:8000/query",
    headers={"X-API-Key": "your-key"},
    json={
        "query": "What is machine learning?",
        "temperature": 0.7,
        "max_tokens": 500,
        "include_sources": True
    }
)
result = response.json()
print(result["answer"])
print(result["sources"])  # Source documents
print(result["confidence"])  # 0.0 - 1.0

# Cognitive query with full AC-RAG
response = requests.post(
    "http://localhost:8000/query/cognitive",
    json={
        "query": "Explain quantum computing applications",
        "enable_reflection": True,
        "enable_progressive_retrieval": True,
        "max_reflections": 3
    }
)
result = response.json()
print(result["answer"])
print(result["reflection_count"])  # Number of self-improvement loops
print(result["retrieval_attempts"])  # Progressive retrieval count
```

### Streaming Endpoints
```bash
# Server-Sent Events (SSE)
curl -X POST http://localhost:8000/stream \
  -H "Content-Type: application/json" \
  -d '{"query": "Explain neural networks"}'

# Response: stream of text chunks
# data: {"type": "chunk", "content": "Neural"}
# data: {"type": "chunk", "content": " networks"}
# data: {"type": "complete"}
```

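```javascript
// JavaScript client: EventSource can only send GET, so read POST /stream with fetch
const res = await fetch('/stream', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({query: '...'}),
});
const reader = res.body.pipeThrough(new TextDecoderStream()).getReader();
let buffer = '';
for (let r = await reader.read(); !r.done; r = await reader.read()) {
    const lines = (buffer + r.value).split('\n');
    buffer = lines.pop();  // keep a partial line for the next read
    for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const data = JSON.parse(line.slice(6));
        if (data.type === 'chunk') document.getElementById('output').textContent += data.content;
    }
}
```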
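
The same `data:` event format shown in the curl example can be consumed from Python. The sketch below parses an iterable of lines, so it works on a canned list here; `iter_sse_events` and `collect_text` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line, skipping everything else."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        yield json.loads(line[len("data:"):].strip())


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate chunk contents until the `complete` event arrives."""
    parts = []
    for event in iter_sse_events(lines):
        if event.get("type") == "complete":
            break
        if event.get("type") == "chunk":
            parts.append(event["content"])
    return "".join(parts)


raw = [
    'data: {"type": "chunk", "content": "Neural"}',
    "",
    'data: {"type": "chunk", "content": " networks"}',
    "",
    'data: {"type": "complete"}',
]
text = collect_text(raw)
```

Against a live server, the lines could come from `requests.post(..., stream=True)` followed by `response.iter_lines(decode_unicode=True)`.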

### Document Ingestion
```python
# Upload file
with open("document.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8000/ingest",
        headers={"X-API-Key": "your-key"},
        files={"file": ("document.pdf", f, "application/pdf")},
        data={
            "chunker": "semantic",
            "max_chunk_size": 1000,
            "metadata": '{"source": "upload", "user": "alice"}'
        }
    )
print(response.json()["document_ids"])  # List of inserted doc IDs

# Ingest text directly
response = requests.post(
    "http://localhost:8000/ingest/text",
    json={
        "text": "Your document content here...",
        "metadata": {"source": "api", "title": "My Doc"}
    }
)
```

### Authentication & Security
```python
from agentic_rag.server import create_app

# API key authentication
app = create_app(
    auth_provider="api_key",
    api_keys=["key-1", "key-2", "key-3"]
)

# JWT authentication
app = create_app(
    auth_provider="jwt",
    jwt_secret="your-secret",
    jwt_algorithm="HS256"
)

# Custom auth
from agentic_rag.server import BaseAuthProvider

class CustomAuth(BaseAuthProvider):
    async def authenticate(self, request):
        token = request.headers.get("Authorization")
        # Your validation logic
        return {"user_id": "123", "roles": ["user"]}

app = create_app(auth_provider=CustomAuth())
```
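
If you write your own key check inside a custom provider, comparing keys in constant time is a sensible default. A small sketch using only the standard library (`check_api_key` is a hypothetical helper, and the `X-API-Key` default simply mirrors the header used in the examples above):

```python
import hmac
from typing import Iterable, Mapping, Optional


def check_api_key(headers: Mapping[str, str], valid_keys: Iterable[str],
                  header_name: str = "X-API-Key") -> Optional[str]:
    """Return the matching key, or None if the header is missing or wrong."""
    presented = headers.get(header_name, "")
    for key in valid_keys:
        # compare_digest avoids leaking how many leading characters matched.
        if hmac.compare_digest(presented.encode(), key.encode()):
            return key
    return None
```

In practice the valid keys would come from a secret store or environment variables rather than being hard-coded in source.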

### Deployment Configuration
```yaml
# docker-compose.yml
version: '3.8'
services:
  rag-api:
    image: ai-prishtina-agentic-rag:latest
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - VECTOR_STORE=chroma
      - ENABLE_AGENT=true
      - RATE_LIMIT=100/minute
    volumes:
      - ./data:/app/data
      - ./config.yaml:/app/config.yaml
    command: uvicorn app:app --host 0.0.0.0 --port 8000 --workers 4

  redis:
    image: redis:alpine
    # For caching and session storage
```

### Kubernetes Deployment
```yaml
# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rag-api
  template:
    metadata:
      labels:
        app: rag-api
    spec:
      containers:
      - name: api
        image: ai-prishtina-agentic-rag:latest
        ports:
        - containerPort: 8000
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: api-secrets
              key: openai-key
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
        readinessProbe:
          httpGet:
            path: /ready
            port: 8000
```

**Why you need this:** Because not everyone wants to write Python. Sometimes you just need an API that works out of the box - with auth, rate limiting, streaming, and all the bells and whistles already baked in.

</details>

---

## Test Coverage & Quality

We don't ship code we wouldn't use ourselves. That's why we maintain **99% test coverage** across all modules.

### Coverage by Module
- **Core functionality**: 100% coverage
- **Cognitive layer**: 99% coverage  
- **Providers (18 LLM, 13 vector stores)**: 99% coverage
- **Document processing**: 99% coverage
- **Contrib integrations**: 99% coverage

### Test Suite Stats
- **131 test files** (124 unit + 7 integration)
- **280+ Python files** tested
- **Advanced PDF backends**: Full coverage for docling, camelot, unstructured, kreuzberg
- **Continuous integration** on every commit
- **No regressions** - ever

### Running Tests
```bash
# Run all tests
pytest

# Run PDF loader tests specifically
pytest tests/unit/test_pdf_advanced_loaders.py -v
pytest tests/integration/test_pdf_backends_integration.py -v

# Run with coverage
pytest --cov=agentic_rag --cov-report=html
```

**Why this matters:** Because you should know your code works before you ship it. We test everything from the happy path to the weird edge cases, so you can deploy with confidence.

---

## Architecture

```
                    ┌─────────────────────────────────────┐
                    │        FastAPI Server / CLI         │
                    │  (Rate Limit · Auth · Guardrails    │
                    │   Multi-Tenancy · Feedback Loop)    │
                    └──────────────┬──────────────────────┘
                                   │
          ┌────────────────────────┼──────────────────────────┐
          │              AC-RAG Cognitive Layer               │
          │  ┌──────────┐ ┌──────────┐ ┌─────────────────┐    │
          │  │  Neural  │ │Reflective│ │  Hierarchical   │    │
          │  │  Router  │ │  Agent   │ │    Memory       │    │
          │  └──────────┘ └──────────┘ └─────────────────┘    │
          │  ┌───────────┐ ┌──────────┐ ┌─────────────────┐   │
          │  │Progressive│ │Calibrated│ │   Knowledge     │   │
          │  │ Retrieval │ │Confidence│ │    Fusion       │   │
          │  └───────────┘ └──────────┘ └─────────────────┘   │
          │  ┌───────────┐ ┌──────────┐ ┌─────────────────┐   │
          │  │   Neural  │ │   Tool   │ │  Multi-Agent    │   │
          │  │ Classifier│ │ Composer │ │   Orchestrator  │   │
          │  └───────────┘ └──────────┘ └─────────────────┘   │
          └─────────────────────┬─────────────────────────────┘
                                │
     ┌──────────────┬───────────┼──────────┬──────────────┐
     ▼              ▼           ▼          ▼              ▼
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────┐ ┌──────────┐
│ Retrieval│ │   LLM    │ │  Graph   │ │ Tools  │ │  Cache / │
│ (Vector) │ │ Providers│ │   RAG    │ │(Search)│ │ Circuit  │
└──────────┘ └──────────┘ └──────────┘ └────────┘ │ Breaker  │
     │              │           │          │      └──────────┘
     ▼              ▼           ▼          ▼
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────────────────┐
│ Document │ │Evaluation│ │  Config  │ │   Docker / CI/CD   │
│Processing│ │& Metrics │ │(YAML/INI)│ │   Infrastructure   │
└──────────┘ └──────────┘ └──────────┘ └────────────────────┘
```

For detailed architecture docs: **[docs/01-architecture.md](docs/01-architecture.md)** | AC-RAG vision: **[docs/vision-ac-rag.md](docs/vision-ac-rag.md)**

## Configuration

Everything is externalized. No magic numbers hiding in the code. Load from YAML, INI, or env vars — or just use the defaults (they're pretty good).

```python
from agentic_rag.utils.config import Config

cfg = Config.from_file("config.yaml")   # YAML
cfg = Config.from_ini("config.ini")     # INI
cfg = Config.from_env(prefix="AGENTIC_RAG_")  # Environment variables
cfg = Config()                           # Sensible defaults
```

See [`config.example.yaml`](config.example.yaml) and [`config.example.ini`](config.example.ini) for complete examples.
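
As a rough illustration of prefix-based loading, the sketch below collects every variable that starts with `AGENTIC_RAG_` into a plain dict. The helper name `collect_prefixed_env`, the example variable names, and the lowercase key convention are assumptions made for this example; they are not the documented behaviour of `Config.from_env`.

```python
import os
from typing import Dict, Mapping, Optional


def collect_prefixed_env(
    prefix: str = "AGENTIC_RAG_",
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return {lowercased_name_without_prefix: value} for matching variables."""
    source = os.environ if environ is None else environ
    return {
        name[len(prefix):].lower(): value
        for name, value in source.items()
        if name.startswith(prefix)
    }


# Hypothetical variable names, used only to show the mapping.
demo_env = {
    "AGENTIC_RAG_LLM_MODEL": "gpt-3.5-turbo",
    "AGENTIC_RAG_RETRIEVAL_TOP_K": "5",
    "PATH": "/usr/bin",
}
settings = collect_prefixed_env(environ=demo_env)
print(settings)
```

Values arrive as strings, so a real loader still has to coerce them to the types listed in the reference tables.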

<details>
<summary><strong>Full configuration reference (click to expand)</strong></summary>

### Configuration Sections

#### **llm** — LLM Provider Settings

| Parameter | Default | Options | Description |
|-----------|---------|---------|-------------|
| `provider` | `openai` | `openai`, `anthropic`, `cohere`, `local` | LLM provider |
| `model` | `gpt-3.5-turbo` | - | Model name |
| `api_key` | `null` | - | API key (or `LLM_API_KEY` env var) |
| `base_url` | `null` | - | Custom API base URL |
| `temperature` | `0.7` | 0.0–2.0 | Sampling temperature |
| `max_tokens` | `1000` | - | Max tokens to generate |
| `timeout` | `30` | - | Request timeout (seconds) |

#### **vector_store** — Vector Database

| Parameter | Default | Options | Description |
|-----------|---------|---------|-------------|
| `provider` | `chroma` | `chroma`, `pinecone`, `weaviate`, `faiss` | Backend provider |
| `collection_name` | `agentic_rag` | - | Collection/index name |
| `persist_directory` | `null` | - | Persistence path (Chroma/FAISS) |
| `embedding_model` | `sentence-transformers/all-MiniLM-L6-v2` | - | HuggingFace embedding model |
| `dimension` | `384` | - | Embedding dimension |

#### **document_processing** — Chunking

| Parameter | Default | Options | Description |
|-----------|---------|---------|-------------|
| `chunk_size` | `1000` | - | Token/character chunk size |
| `chunk_overlap` | `200` | - | Overlap between chunks |
| `chunking_strategy` | `recursive` | `fixed`, `semantic`, `recursive` | Chunking algorithm |
| `enable_preprocessing` | `true` | - | Enable text preprocessing |
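
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal character-based sliding window. It is a sketch only: the function name `chunk_text` is hypothetical, and the library's `fixed`, `semantic`, and `recursive` strategies may split text differently.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so neighbouring chunks share `chunk_overlap` characters of context.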

#### **retrieval** — Search & Reranking

| Parameter | Default | Description |
|-----------|---------|-------------|
| `top_k` | `5` | Number of results to retrieve |
| `similarity_threshold` | `0.7` | Minimum similarity score (0.0–1.0) |
| `enable_reranking` | `true` | Enable cross-encoder reranking |
| `reranker_model` | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Reranker model name |
| `enable_hybrid_search` | `false` | Combine dense + keyword search |
| `dense_weight` | `0.7` | Weight for dense embeddings (0.0–1.0) |
| `sparse_weight` | `0.3` | Weight for sparse/BM25 (0.0–1.0) |
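
One common way to combine dense and sparse results is a weighted sum of per-document scores. The sketch below assumes that approach and assumes both score sets are already normalised to the 0.0–1.0 range; `fuse_scores` is a hypothetical helper, not the library's retriever.

```python
from typing import Dict


def fuse_scores(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    dense_weight: float = 0.7,
    sparse_weight: float = 0.3,
) -> Dict[str, float]:
    """Weighted sum of two score maps; a missing score counts as 0.0."""
    doc_ids = set(dense) | set(sparse)
    return {
        doc_id: dense_weight * dense.get(doc_id, 0.0)
        + sparse_weight * sparse.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


# Toy scores, assumed to be normalised to [0, 1] before fusion.
dense_scores = {"doc_a": 0.9, "doc_b": 0.4}
sparse_scores = {"doc_b": 1.0, "doc_c": 0.5}
fused = fuse_scores(dense_scores, sparse_scores)
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)
```

With the default weights, a document found only by keyword search can still surface, but it needs a strong sparse score to outrank a good dense match.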

#### **agent** — Agentic Capabilities

| Parameter | Default | Description |
|-----------|---------|-------------|
| `enable_planning` | `true` | Enable query planning |
| `max_planning_steps` | `5` | Max planning iterations |
| `enable_memory` | `true` | Enable working memory |
| `memory_size` | `1000` | Memory capacity (entries) |
| `enable_tools` | `true` | Enable tool integration |
| `available_tools` | `["web_search", "calculator"]` | List of enabled tool names |
| `tool_timeout` | `30` | Tool execution timeout (seconds) |

#### **cognitive** — AC-RAG Features

| Parameter | Default | Description |
|-----------|---------|-------------|
| `enable_query_routing` | `true` | Enable neural query routing |
| `use_llm_classification` | `false` | Use an LLM instead of DistilBERT for routing |
| `routing_confidence_threshold` | `0.8` | Router confidence cutoff (0.0–1.0) |
| `rule_confidence_fallback` | `0.6` | Fallback threshold when rules fail |
| `enable_reflection` | `true` | Enable reflective agent loop |
| `confidence_threshold` | `0.75` | Minimum answer confidence (0.0–1.0) |
| `max_reflections` | `3` | Max self-critique iterations |
| `reflection_temperature` | `0.3` | Temperature for reflection (0.0–1.0) |
| `critique_context_window` | `1500` | Tokens for critique context |
| `regeneration_context_window` | `3000` | Tokens for answer regeneration |
| `episodic_max_entries` | `100` | Episodic memory size |
| `episodic_ttl_seconds` | `3600` | Episodic memory TTL (seconds) |
| `semantic_persist_path` | `null` | Semantic memory storage path |
| `procedural_persist_path` | `null` | Procedural memory storage path |
| `procedural_ema_alpha` | `0.3` | EMA learning rate (0.0–1.0) |
| `procedural_recall_threshold` | `0.3` | Strategy recall threshold (0.0–1.0) |
| `procedural_maturity_count` | `5` | Uses before strategy matures |
| `progressive_max_iterations` | `3` | Progressive retrieval iterations |
| `progressive_min_quality` | `0.6` | Minimum retrieval quality (0.0–1.0) |
| `progressive_reformulation_temperature` | `0.3` | Query reformulation temperature |
| `min_calibration_samples` | `30` | Minimum samples required before calibration is fitted |
| `calibration_learning_rate` | `0.01` | Platt scaling learning rate |
| `calibration_epochs` | `200` | Calibration training epochs |
| `confidence_length_normalizer` | `200` | Length normalization factor |
| `confidence_weights` | (map) | Weights for confidence components |
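
Platt scaling maps a raw confidence score `s` to a probability by fitting `sigmoid(a * s + b)` on labelled outcomes. The sketch below shows how `calibration_learning_rate` and `calibration_epochs` could drive such a fit with plain gradient descent. The function names and the toy data are assumptions for illustration, not the library's calibrator.

```python
import math
from typing import List, Tuple


def fit_platt(
    scores: List[float],
    labels: List[int],
    learning_rate: float = 0.01,
    epochs: int = 200,
) -> Tuple[float, float]:
    """Fit p = sigmoid(a * score + b) by batch gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s  # d(log loss)/da for this sample
            grad_b += p - y        # d(log loss)/db for this sample
        a -= learning_rate * grad_a / n
        b -= learning_rate * grad_b / n
    return a, b


def calibrate(score: float, a: float, b: float) -> float:
    """Apply the fitted sigmoid to a raw score."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


# Toy data: 1 means the answer was judged correct, 0 means it was not.
raw_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
outcomes = [0, 0, 0, 1, 1, 1]
a, b = fit_platt(raw_scores, outcomes)
print(calibrate(0.9, a, b), calibrate(0.1, a, b))
```

A handful of samples gives a poor fit, which is presumably why a minimum sample count is configurable.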

#### **graph** — Graph RAG

| Parameter | Default | Options | Description |
|-----------|---------|---------|-------------|
| `extraction_method` | `pattern` | `pattern`, `spacy`, `llm` | Entity extraction method |
| `spacy_model` | `en_core_web_sm` | - | spaCy model for NER |
| `merge_similar_entities` | `true` | - | Deduplicate similar entities |
| `similarity_threshold` | `0.85` | - | Entity merge threshold (0.0–1.0) |
| `max_hops` | `2` | - | Max graph traversal depth |
| `graph_top_k` | `10` | - | Max graph results |

#### **server** — FastAPI Server

| Parameter | Default | Description |
|-----------|---------|-------------|
| `host` | `0.0.0.0` | Bind address |
| `port` | `8000` | Listen port |
| `workers` | `2` | Uvicorn worker processes |
| `cors_origins` | `["*"]` | Allowed CORS origins |
| `rate_limit_rpm` | `60` | Rate limit (requests per minute) |
| `log_level` | `INFO` | Logging level |
| `enable_auth` | `false` | Enable API key/JWT auth |
| `api_keys` | `[]` | List of allowed API keys |
| `jwt_secret` | `null` | JWT signing secret |
| `jwt_algorithm` | `HS256` | JWT algorithm |

#### **cache** — Semantic Cache

| Parameter | Default | Options | Description |
|-----------|---------|---------|-------------|
| `enabled` | `false` | - | Enable caching |
| `backend` | `memory` | `memory`, `redis` | Cache backend |
| `redis_url` | `redis://localhost:6379/0` | - | Redis connection URL |
| `ttl_seconds` | `3600` | - | Cache entry TTL (seconds) |
| `similarity_threshold` | `0.92` | - | Semantic match threshold (0.0–1.0) |
| `max_entries` | `10000` | - | Max cache size |
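
The idea behind a semantic cache is to reuse an answer when a new query's embedding is close enough to a cached one. The sketch below assumes cosine similarity and an in-memory list; `TinySemanticCache` is a hypothetical class for illustration and ignores `max_entries` and the Redis backend.

```python
import math
import time
from typing import List, Optional, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


class TinySemanticCache:
    """In-memory cache keyed by embedding similarity (illustrative only)."""

    def __init__(self, similarity_threshold: float = 0.92, ttl_seconds: float = 3600):
        self.similarity_threshold = similarity_threshold
        self.ttl_seconds = ttl_seconds
        self._entries: List[Tuple[List[float], str, float]] = []

    def put(self, embedding: Sequence[float], answer: str) -> None:
        self._entries.append((list(embedding), answer, time.monotonic()))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        now = time.monotonic()
        # Drop expired entries, then look for the closest remaining match.
        self._entries = [e for e in self._entries if now - e[2] < self.ttl_seconds]
        best = max(self._entries, key=lambda e: cosine(embedding, e[0]), default=None)
        if best is not None and cosine(embedding, best[0]) >= self.similarity_threshold:
            return best[1]
        return None


cache = TinySemanticCache()
cache.put([1.0, 0.0], "cached answer")
near_hit = cache.get([0.99, 0.05])  # near-duplicate query embedding
miss = cache.get([0.0, 1.0])        # unrelated query embedding
print(near_hit, miss)
```

A high threshold such as `0.92` trades hit rate for safety: fewer queries are served from cache, but a cached answer is less likely to be returned for a different question.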

#### **guardrails** — Safety & Output Control

| Parameter | Default | Description |
|-----------|---------|-------------|
| `enable_pii_detection` | `false` | Enable PII detection/redaction |
| `enable_toxicity_filter` | `false` | Enable toxic content filtering |
| `max_output_length` | `5000` | Max output character limit |
| `blocked_terms` | `[]` | List of blocked terms/strings |

#### **evaluation** — Metrics & Benchmarking

| Parameter | Default | Description |
|-----------|---------|-------------|
| `enable_evaluation` | `false` | Enable evaluation metrics |
| `metrics` | `["relevance", "faithfulness"]` | Metrics to compute |
| `log_level` | `INFO` | Evaluation logging level |

### Environment Variables

```bash
# API Keys
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
COHERE_API_KEY=your_cohere_key
PINECONE_API_KEY=your_pinecone_key
SERP_API_KEY=your_search_api_key

# Config loading
AGENTIC_RAG_CONFIG_PATH=/path/to/config.yaml
```

</details>

## Docker deployment

```bash
# One command to rule them all
docker-compose up -d

# Or if you prefer doing things the hard way
docker build -t agentic-rag .
docker run -p 8000:8000 -e OPENAI_API_KEY=sk-... agentic-rag
```

The server is `agentic_rag.server.create_app` (see Recipe 13 above). A multi-stage Dockerfile and a docker-compose file are included in `examples/docker/`.

## Testing

298+ tests and growing. Run them all or pick your battles:

```bash
# Run everything (grab a coffee)
pytest

# Run with coverage (grab two coffees)
pytest --cov=agentic_rag --cov-report=html

# Pick your battles
pytest tests/unit/test_tools.py             # Core tools
pytest tests/unit/test_cognitive.py         # AC-RAG modules
pytest tests/unit/test_graph*.py            # Graph RAG (all graph tests)
pytest tests/unit/test_chunkers*.py         # All chunking strategies
pytest tests/unit/test_loaders*.py          # Document loaders
pytest tests/unit/test_prompts.py           # Prompt management
pytest tests/integration/                   # Integration tests
pytest tests/unit/test_evaluation.py -v     # Benchmarks
```

## Contributing

We don't bite. Contributions are welcome — from typo fixes to new vector store backends.

```bash
git clone https://github.com/albanmaxhuni/ai-prishtina-agentic-rag.git
cd ai-prishtina-agentic-rag
pip install -e ".[dev]"  # quotes keep shells like zsh from globbing the brackets
pre-commit install
pytest  # Make sure everything passes before you start
```

**Code quality:** Black + isort (formatting), Flake8 (linting), mypy (type checking), pytest (tests). See **[CONTRIBUTING.md](CONTRIBUTING.md)** for branch conventions and review expectations.

## License

**Dual-licensed:**

- **AGPLv3+** — Free for open source. Copyleft applies. Network use requires source disclosure.
- **Commercial** — For proprietary use without copyleft obligations. Contact info@albanmaxhuni.com or alban.q.maxhuni@gmail.com

See the [LICENSE](https://www.gnu.org/licenses/agpl-3.0.html) for complete details.

---

<details>
<summary><strong>Recipe 19 — Prompt Optimization (DSPy-style)</strong></summary>

Automated prompt tuning against golden datasets. Like [DSPy](https://github.com/stanfordnlp/dspy) but built-in.

### Basic Prompt Optimization
```python
from agentic_rag.cognitive import PromptOptimizer, OptimizationConfig

# Golden dataset for optimization
golden_examples = [
    {
        "question": "What is RAG?",
        "answer": "Retrieval-Augmented Generation combines information retrieval with text generation..."
    },
    {
        "question": "How does chunking work?",
        "answer": "Chunking splits documents into smaller segments for efficient retrieval..."
    },
]

# Configure optimization
config = OptimizationConfig(
    metric="answer_relevance",  # other options: "accuracy", "completeness", "semantic_similarity"
    max_iterations=20,
    few_shot_k=3,
)

optimizer = PromptOptimizer(
    llm_provider=llm,
    vector_store=store,
    config=config,
)

# Run optimization (call from async code; `llm` and `store` are assumed to be set up already)
result = await optimizer.optimize(
    base_prompt="Answer the question based on the provided context.",
    examples=golden_examples,
)

print(f"Best prompt: {result.best_prompt.prompt_text}")
print(f"Score: {result.best_prompt.score:.3f}")
print(f"Improvement: +{result.improvement:.3f}")

# Save results
optimizer.save_candidates("optimized_prompts.json", result)
```

### With Custom Metrics
```python
from agentic_rag.cognitive import ExactMatchMetric, SemanticSimilarityMetric

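# Built-in metric: exact match for factual QA
exact_metric = ExactMatchMetric()

# Built-in metric: semantic similarity (`embedder` is an existing embedding provider)
semantic_metric = SemanticSimilarityMetric(embedding_provider=embedder)

# Or supply your own scoring function
config = OptimizationConfig(
    metric="custom",
    custom_metric_fn=lambda pred, ref: 1.0 if "correct" in pred else 0.0,
)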
```

### Optimization Strategies
The optimizer automatically tries:
1. **Few-shot variations** — Different example selections
2. **Instruction refinement** — LLM-rewritten prompts
3. **Chain-of-thought** — Adding reasoning steps
4. **Combinations** — Best of both worlds
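
Whatever strategy produces the candidates, the selection step boils down to scoring each prompt on the golden set and keeping the winner. The sketch below shows that loop in isolation. `select_best_prompt`, `exact_match`, and the `fake_llm` stand-in are hypothetical names for this example and do not reflect the internals of `PromptOptimizer`.

```python
from typing import Callable, Dict, List, Tuple

Example = Dict[str, str]


def exact_match(prediction: str, reference: str) -> float:
    return 1.0 if prediction.strip().lower() == reference.strip().lower() else 0.0


def select_best_prompt(
    candidates: List[str],
    examples: List[Example],
    answer_fn: Callable[[str, str], str],
    metric: Callable[[str, str], float] = exact_match,
) -> Tuple[str, float]:
    """Score every candidate prompt on the golden set and keep the best."""
    scored = []
    for prompt in candidates:
        total = sum(
            metric(answer_fn(prompt, ex["question"]), ex["answer"]) for ex in examples
        )
        scored.append((prompt, total / len(examples)))
    return max(scored, key=lambda item: item[1])


# Stand-in for an LLM call: only the "concise" prompt yields the short answer.
def fake_llm(prompt: str, question: str) -> str:
    return "4" if "concise" in prompt else "The answer is 4."


golden = [{"question": "What is 2 + 2?", "answer": "4"}]
candidates = ["Answer the question.", "Give a concise answer."]
best_prompt, best_score = select_best_prompt(candidates, golden, fake_llm)
print(best_prompt, best_score)
```

Because the score is an average over the golden set, a larger and more varied set makes the comparison between candidates less noisy.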

**Why you need this:** Hand-tuning prompts is tedious and unscientific. Let the system optimize against your actual golden dataset — measurable, reproducible, better.

</details>

<details>
<summary><strong>Recipe 20 — Migration from LlamaIndex/LangChain</strong></summary>

Import your existing indices and documents from other RAG frameworks. Zero-downtime migration.

### Import from LlamaIndex
```python
from agentic_rag.contrib.migration import LlamaIndexImporter
from llama_index.core import VectorStoreIndex

# Your existing LlamaIndex
li_index = VectorStoreIndex.from_documents(docs)

# Import to Agentic RAG
importer = LlamaIndexImporter(vector_store=my_store)
result = await importer.import_vector_store_index(li_index)

print(f"Migrated {result.documents_imported} documents in {result.duration_seconds:.2f}s")
print(f"Failed: {result.documents_failed}")
```

### Import from LangChain
```python
from agentic_rag.contrib.migration import LangChainImporter
from langchain_chroma import Chroma

# Your existing LangChain store
lc_store = Chroma(persist_directory="./my_chroma_db")

# Import
importer = LangChainImporter(vector_store=my_store)
result = await importer.import_vectorstore(lc_store)

print(f"Migrated {result.documents_imported} documents")
```

### Export from Agentic RAG
```python
from agentic_rag.contrib.migration import AgenticRAGExporter

exporter = AgenticRAGExporter(vector_store=store)

# Export to native format
await exporter.export_to_file("backup.json")

# Export to LangChain format
await exporter.export_to_langchain_format("lc_export.json")

# Export to LlamaIndex format
await exporter.export_to_llamaindex_format("li_export.json")
```

### Validation & Dry Run
```python
from agentic_rag.contrib.migration import MigrationConfig

config = MigrationConfig(
    batch_size=100,
    skip_errors=True,
    preserve_metadata=True,
    dry_run=True,  # Preview without actual import
)

importer = LlamaIndexImporter(vector_store=my_store, config=config)  # same store as in the import example
result = await importer.import_vector_store_index(li_index)
print(f"Would import {result.documents_imported} documents")
```
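
To show how `batch_size`, `skip_errors`, and `dry_run` can interact, here is a small standalone sketch of a batched import loop. `ImportReport` and `import_in_batches` are hypothetical names; the real importers may count and report failures differently.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ImportReport:
    documents_imported: int = 0
    documents_failed: int = 0


def import_in_batches(
    documents: List[str],
    write_batch: Callable[[List[str]], None],
    batch_size: int = 100,
    skip_errors: bool = True,
    dry_run: bool = False,
) -> ImportReport:
    """Walk the documents batch by batch; only write when dry_run is False."""
    report = ImportReport()
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        try:
            if not dry_run:
                write_batch(batch)
            report.documents_imported += len(batch)
        except Exception:
            if not skip_errors:
                raise
            report.documents_failed += len(batch)  # whole batch counted as failed
    return report


written: List[str] = []
docs = [f"doc-{i}" for i in range(250)]
preview = import_in_batches(docs, written.extend, dry_run=True)
print(preview.documents_imported, len(written))
```

In this sketch a dry run walks the same batches and produces the same counts as a real run, but never calls the writer, so the target store stays untouched.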

**Why you need this:** Because migrating RAG systems shouldn't require a PhD in data engineering. One line to import, one line to export, your data is free.

</details>

<details>
<summary><strong>Recipe 21 — Prompt Versioning</strong></summary>

Git-like version control for your prompts. Track evolution, compare versions, rollback when needed.

### Basic Prompt Versioning
```python
from agentic_rag.core import PromptVersioning, PromptVersion

# Create version store
versioning = PromptVersioning(storage_dir="./prompt_versions")

# Save initial version
v1 = versioning.save_prompt(
    prompt_id="qa_prompt",
    prompt_text="Answer the question based on the context.",
    author="alice",
    tags=["qa", "baseline"],
    change_notes="Initial version"
)

# Iterate and improve
v2 = versioning.save_prompt(
    prompt_id="qa_prompt",
    prompt_text="Provide a concise answer (max 3 sentences) based on the context.",
    author="alice",
    parent_version=v1.version_id,
    tags=["qa", "improved"],
    change_notes="Added length constraint for conciseness"
)

# Another iteration with CoT
v3 = versioning.save_prompt(
    prompt_id="qa_prompt",
    prompt_text="""Answer the question based on the context.

Think step by step:
1. Identify the key information needed
2. Check if it's in the context
3. Formulate a concise answer

Answer:""",
    author="bob",
    parent_version=v2.version_id,
    tags=["qa", "chain-of-thought"],
    change_notes="Added chain-of-thought reasoning"
)
```

### Compare Versions
```python
# List all versions
versions = versioning.list_versions("qa_prompt")
for v in versions:
    print(f"{v.version_id}: {v.change_notes} (by {v.author})")

# Diff two versions
diff = versioning.diff_versions("qa_prompt", "v1", "v3")
print(f"Similarity: {diff.similarity_score:.2f}")
print(diff.unified_diff)
```
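
For a feel of what a similarity score and a unified diff look like on prompt text, the sketch below uses Python's standard `difflib` on the first two prompts from the example above. This only illustrates the concepts; how `diff_versions` computes its values is not specified here, and the `qa_prompt@v1` labels are made up for the example.

```python
import difflib

old = "Answer the question based on the context."
new = "Provide a concise answer (max 3 sentences) based on the context."

# Ratio in [0, 1]; 1.0 means the two strings are identical.
similarity = difflib.SequenceMatcher(None, old, new).ratio()

diff_text = "\n".join(
    difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="qa_prompt@v1",
        tofile="qa_prompt@v2",
        lineterm="",
    )
)
print(f"Similarity: {similarity:.2f}")
print(diff_text)
```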

### Rollback
```python
# Oops, v3 made things worse. Rollback to v2.
v4 = versioning.rollback("qa_prompt", "v2")
print(f"Rolled back to v2, new version: {v4.version_id}")
```

### Tagging & Organization
```python
# Tag specific version
versioning.tag_version("qa_prompt", "v3", "production")

# Search by tag
prod_versions = versioning.search_by_tag("production")
```

### Export/Import
```python
# Export all versions of a prompt
versioning.export_prompt("qa_prompt", "qa_prompt_history.json")

# Import on another machine
versioning.import_prompt("qa_prompt_history.json")
```

**Why you need this:** Because prompts are code and deserve version control. Track what worked, rollback what didn't, collaborate without chaos.

</details>

<details>
<summary><strong>Recipe 22 — Workflow Orchestration UI</strong></summary>

Visual workflow builder for designing complex RAG pipelines without writing code.

### Enable Workflow UI
```python
from agentic_rag.server import create_app_with_workflow_ui

# Create app with workflow UI
app = create_app_with_workflow_ui()

# Access at http://localhost:8000/workflow/ui
```

### Build Workflows Visually
The workflow UI provides:
- **Node Palette**: Drag and drop nodes (loader, chunker, retriever, generator, tools)
- **Canvas**: Visual pipeline builder with connections
- **Properties Panel**: Configure each node's parameters
- **Execution Monitor**: Run and debug workflows

### Programmatic Workflow Creation
```python
from agentic_rag.server import Workflow, WorkflowNode, WorkflowEdge

# Define a workflow in code
workflow = Workflow(
    name="Document Q&A Pipeline",
    description="Load docs, chunk, embed, and answer questions",
    nodes=[
        WorkflowNode(
            id="loader",
            type="loader",
            label="PDF Loader",
            config={"source": "./docs/*.pdf"},
            position={"x": 100, "y": 100}
        ),
        WorkflowNode(
            id="chunker",
            type="chunker",
            label="Semantic Chunker",
            config={"chunk_size": 1000, "overlap": 200},
            position={"x": 300, "y": 100}
        ),
        WorkflowNode(
            id="embedder",
            type="embedder",
            label="OpenAI Embeddings",
            config={"model": "text-embedding-3-small"},
            position={"x": 500, "y": 100}
        ),
        WorkflowNode(
            id="retriever",
            type="retriever",
            label="Vector Retriever",
            config={"top_k": 5},
            position={"x": 300, "y": 300}
        ),
        WorkflowNode(
            id="generator",
            type="generator",
            label="GPT-4 Generator",
            config={"model": "gpt-4", "temperature": 0.7},
            position={"x": 500, "y": 300}
        ),
    ],
    edges=[
        WorkflowEdge(source="loader", target="chunker"),
        WorkflowEdge(source="chunker", target="embedder"),
        WorkflowEdge(source="embedder", target="retriever", label="index"),
        WorkflowEdge(source="retriever", target="generator", label="context"),
    ]
)

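# Save the workflow through the REST API
import requests

response = requests.post(
    "http://localhost:8000/workflow/api/workflows",
    json=workflow.model_dump(),  # Pydantic v2 replacement for the deprecated `.dict()`
    timeout=30,
)
response.raise_for_status()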
```
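
Edges define which node must finish before another can start, so a valid run order is a topological sort of the graph. The sketch below derives that order for the pipeline above with the standard-library `graphlib` module (Python 3.9+). It is an illustration of the idea, not the scheduler the workflow engine actually uses.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# (source, target) pairs mirroring the WorkflowEdge list above.
edges = [
    ("loader", "chunker"),
    ("chunker", "embedder"),
    ("embedder", "retriever"),
    ("retriever", "generator"),
]

sorter = TopologicalSorter()
for source, target in edges:
    sorter.add(target, source)  # target depends on source

execution_order = list(sorter.static_order())
print(execution_order)
```

If the edges ever form a loop, `static_order()` raises `graphlib.CycleError`, which is a cheap way to validate a workflow before running it.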

### Execute Workflow
```python
# Run via API
response = requests.post(f"http://localhost:8000/workflow/api/workflows/{workflow.id}/run")
execution = response.json()

# Check status
status = requests.get(f"http://localhost:8000/workflow/api/executions/{execution['execution_id']}")
```

### Available Node Types
- **📄 Loader**: Document loaders (PDF, HTML, etc.)
- **✂️ Chunker**: Text chunking strategies
- **🔢 Embedder**: Embedding models
- **🔍 Retriever**: Vector/hybrid/graph retrieval
- **🤖 Generator**: LLM response generation
- **🛠️ Tool**: Custom tools and APIs
- **❓ Condition**: Conditional branching
- **📤 Output**: Result output and formatting

**Why you need this:** Because not everyone wants to write Python to build a pipeline. Visual design, click-to-configure, run-and-monitor — democratizes RAG for the whole team.

</details>

## Contact and links

| Resource | URL |
|----------|-----|
| Documentation | [ai-prishtina-agentic-rag.readthedocs.io](https://ai-prishtina-agentic-rag.readthedocs.io) |
| Issue tracker | [GitHub Issues](https://github.com/albanmaxhuni/ai-prishtina-agentic-rag/issues) |
| Discussions | [GitHub Discussions](https://github.com/albanmaxhuni/ai-prishtina-agentic-rag/discussions) |
| Email | info@albanmaxhuni.com |

Maintained by the **AI Prishtina** project. Built on the shoulders of open-source giants and published RAG research.

**Sponsor ongoing development:**

· [coff.ee/albanmaxhuni](https://coff.ee/albanmaxhuni)

`or`

· **BTC:** `3BfwQJ2dNTWDn98H5SggNC47fNX8HeWshP`

<p align="center">
  <img src="assets/png/btc-wallet.png" alt="BTC Wallet QR Code" width="200"/>
</p>
