Metadata-Version: 2.4
Name: consistent-rag
Version: 0.1.5
Summary: ConsistentRAG: Improving factual consistency in RAG through knowledge graph grounding and multi-agent refinement
Keywords: rag,knowledge-graph,consistency,llm,ppr,agentic
Author: Seb
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Intended Audience :: Science/Research
License-File: LICENSE
Requires-Dist: loguru
Requires-Dist: numpy
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pydantic-ai>=0.2.0
Requires-Dist: pydantic-graph>=0.1.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: openai>=1.0.0
Requires-Dist: httpx>=0.25.0
Requires-Dist: tenacity>=8.2.0
Requires-Dist: nest-asyncio>=1.5.0
Requires-Dist: inflect>=7.0.0
Requires-Dist: networkx>=3.0
Requires-Dist: faiss-cpu>=1.7.0
Requires-Dist: gliner>=0.2.0
Requires-Dist: qdrant-client>=1.7.0
Requires-Dist: sentence-transformers>=2.2.0
Requires-Dist: pre-commit>=4.5.1
Requires-Dist: pytest-cov>=7.1.0
Requires-Dist: psycopg[binary]>=3.1.0 ; extra == "db"
Requires-Dist: psycopg-pool>=3.1.0 ; extra == "db"
Requires-Dist: consistent-rag[eval, finance, server, db, viz] ; extra == "dev"
Requires-Dist: pytest ; extra == "dev"
Requires-Dist: pytest-asyncio ; extra == "dev"
Requires-Dist: ruff ; extra == "dev"
Requires-Dist: ipython ; extra == "dev"
Requires-Dist: jupyterlab ; extra == "dev"
Requires-Dist: notebook ; extra == "dev"
Requires-Dist: mkdocs ; extra == "dev"
Requires-Dist: python-dotenv ; extra == "dev"
Requires-Dist: datasets>=2.14.0 ; extra == "eval"
Requires-Dist: deepeval ; extra == "eval"
Requires-Dist: pandas ; extra == "eval"
Requires-Dist: scikit-learn ; extra == "eval"
Requires-Dist: matplotlib ; extra == "eval"
Requires-Dist: tqdm ; extra == "eval"
Requires-Dist: stable-baselines3>=2.3.0 ; extra == "finance"
Requires-Dist: gymnasium>=0.29.0 ; extra == "finance"
Requires-Dist: spacy>=3.7.0 ; extra == "finance"
Requires-Dist: fastmcp>=0.1.0 ; extra == "server"
Requires-Dist: gravis>=0.1.0 ; extra == "viz"
Project-URL: Documentation, https://github.com/pinkfloydsito/consistent-rag/tree/main/docs
Project-URL: Repository, https://github.com/pinkfloydsito/consistent-rag
Provides-Extra: db
Provides-Extra: dev
Provides-Extra: eval
Provides-Extra: finance
Provides-Extra: server
Provides-Extra: viz

# ConsistentRAG

[![Tests](https://github.com/pinkfloydsito/consistent-rag/actions/workflows/tests.yml/badge.svg)](https://github.com/pinkfloydsito/consistent-rag/actions/workflows/tests.yml)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

**ConsistentRAG: Improving Factual Consistency in RAG via Graph-Based Retrieval and Ranking**

A modular Python framework that improves factual consistency in Retrieval-Augmented Generation by grounding queries in a live knowledge graph. It uses a suite of graph algorithms to retrieve and rank structurally sound reasoning paths, which are then refined through a multi-agent loop where specialized AI "critics" collaboratively improve the answer across multiple iterations.

---

## Try It in 5 Minutes (No Infrastructure Needed)

The fastest way to see ConsistentRAG in action uses **KG-only mode** — no vector database, no Docker, just Python + an LLM API key.

```bash
# 1. Clone and install
git clone https://github.com/pinkfloydsito/consistent-rag.git
cd consistent-rag
uv sync

# 2. Set your API key (pick one)
export OPENAI_API_KEY="sk-..."           # OpenAI, Azure, or any OpenAI-compatible endpoint
# OR
export DEEPSEEK_API_KEY="sk-..."         # DeepSeek (cheaper, no credit card needed)

# 3. Run the simplest example
uv run python examples/kg_rag_example.py
```

That's it. The example builds a knowledge graph from text and answers questions using graph reasoning.

---

## Prerequisites

### API Keys (Required)

ConsistentRAG needs an LLM for extraction, answering, and critique. Set **at least one** of these:

| Priority | Variable | Example | Notes |
|---|---|---|---|
| 1st | `OPENAI_API_KEY` | `sk-...` | OpenAI, Azure, or any OpenAI-compatible provider |
| 2nd | `DEEPSEEK_API_KEY` | `sk-...` | Fallback if OpenAI key is absent |
| — | `OPENAI_API_BASE` | `https://api.openai.com/v1` | Override the API endpoint (e.g., for Azure or local LLMs) |

Copy `.env.example` to `.env` and fill in your keys:

```bash
cp .env.example .env
# Edit .env with your keys
```

### Infrastructure (Only for specific modes)

| What | When needed | How to start |
|---|---|---|
| **Qdrant** | Online modes (`backend="qdrant"`) | `make docker-up` |
| **PostgreSQL** | Experiment tracking / integration tests | `make docker-up` (same command) |

The `make docker-up` command starts both Qdrant (port 6333) and PostgreSQL (port 5432). You don't need them for KG-only or FAISS-offline modes.

---

## Installation

```bash
# With uv (recommended)
uv sync --extra dev

# With pip
pip install -e ".[dev]"
```

---

## Quick Start: Three Paths

### Path A: "Just want to try it" — KG-Only, No Servers

No vector DB. No Docker. Just a graph built from text.

```python
from consistent_rag import ConsistentRAG

rag = ConsistentRAG(pipeline="kg_only", strategy="ppr")

context = """
TechStart was founded in 2019 by Alice Johnson in San Francisco.
The company focuses on AI-powered analytics for retail businesses.
"""

result = rag.query("Who founded TechStart?", context=context)
print(result.answer)
# Output: "TechStart was founded by Alice Johnson."
```

**What's happening:** The pipeline extracts triples from the text → builds a NetworkX graph → uses Personalized PageRank (PPR) to rank reasoning paths → generates an answer.

See `examples/kg_rag_example.py` for a full runnable script.
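The same flow can be sketched with plain NetworkX (an illustrative approximation of the KG-only path, not the library's internal code; the extracted triples and the `personalization` seed below are hypothetical):

```python
import networkx as nx

# Hypothetical triples, as an extractor might produce from the context above
triples = [
    ("TechStart", "founded_by", "Alice Johnson"),
    ("TechStart", "founded_in", "2019"),
    ("TechStart", "headquartered_in", "San Francisco"),
]

# Build a directed graph; the relation is stored as an edge attribute
G = nx.DiGraph()
for head, relation, tail in triples:
    G.add_edge(head, tail, relation=relation)

# Personalized PageRank seeded on the query entity ranks nodes by
# structural relevance (alpha here mirrors the ppr_alpha setting)
scores = nx.pagerank(G, alpha=0.5, personalization={"TechStart": 1.0})
ranked = sorted(scores.items(), key=lambda kv: -kv[1])
```

The top-ranked nodes and the edges connecting them then become the reasoning paths handed to the answer model.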

### Path B: "I have documents to index" — FAISS + KG, Fully Local

Good for: offline use, no external services, saving/loading indices.

```python
from consistent_rag import ConsistentRAG

rag = ConsistentRAG(
    pipeline="hybrid",
    backend="faiss",
    strategy="ppr",
    embedding_model="all-MiniLM-L6-v2",  # local embeddings, no API key
)

# Index your documents
docs = [
    "Pydantic AI is a Python agent framework...",
    "DeepSeek develops large language models...",
]
rag.index_documents(docs, recreate=True, build_kg=True)

# Query
result = rag.query("What is Pydantic AI?")
print(result.answer)

# Save for later
rag.save_index("./my_index")
```

See `examples/agentic_rag_example.py` for a complete example.

### Path C: "Production setup" — Qdrant + Full Pipeline

Good for: large document collections, online dynamic KG, adaptive strategies.

```bash
# 1. Start infrastructure
make docker-up
```

```python
from consistent_rag import ConsistentRAG

rag = ConsistentRAG(
    pipeline="hybrid",
    backend="qdrant",
    collection_name="my_docs",
    strategy="adaptive",  # auto-selects best strategy per query
)

# Index documents (KG built incrementally)
rag.index_documents(docs, build_kg=True)

# Query with full agentic loop
result = rag.query("What is the relationship between X and Y?")

print(f"Answer: {result.answer}")
print(f"Iterations: {result.iterations_used}")
print(f"Critic score: {result.final_score:.2f}")
print(f"Seeds: {result.seeds_used}")
```

See `examples/basic_rag_example.py` for vector-only mode and `examples/kg_ppr_rag_example.py` for KG+PPR with Qdrant.

---

## Streamlit Visualization

Run the interactive demo to watch the pipeline execute step-by-step:

```bash
uv run streamlit run streamlit_app.py
```

Features:
- Live query input with strategy selection
- Per-iteration subgraph visualization
- Critic scores, weight deltas, and convergence tracking
- Benchmark dataset selection (FaithEval, SQuAD, etc.)
- Reasoning path evolution charts

---

## Configuration

All parameters can be passed to `ConsistentRAG()` directly or via `PipelineConfig`:

```python
from consistent_rag import ConsistentRAG

rag = ConsistentRAG(
    # Pipeline mode
    pipeline="hybrid",          # "vector_only", "kg_only", or "hybrid"
    backend="faiss",            # "faiss" or "qdrant" (vector pipelines only)
    strategy="ppr",             # "ppr", "nhops", "random_walk", "hybrid", "adaptive"

    # Model settings
    llm_model="gpt-4o-mini",    # or "deepseek-chat", etc.
    embedding_model="all-MiniLM-L6-v2",

    # Agentic loop
    max_iterations=3,           # Max answer-critic iterations
    improvement_threshold=0.8,  # Stop if critic score >= this

    # Graph algorithm
    ppr_alpha=0.5,              # PPR damping factor
    max_paths=20,               # Max reasoning paths in context
    path_max_hops=3,            # Max hops per path
)
```

See `consistent_rag/pipeline/config.py` for the full parameter list.

---

## Pipeline Configurations

| Configuration | Type | Vector | KG | Graph Strategy |
|---|---|---|---|---|
| `baseline_no_tool` | Single-pass | No | No | — |
| `baseline_vector` | Single-pass | Yes | No | — |
| `baseline_kg` | Single-pass | No | Yes | PPR |
| `baseline_hybrid` | Single-pass | Yes | Yes | PPR |
| `agentic_vector_only` | Multi-agent | Yes | No | — |
| `agentic_kg_nhops` | Multi-agent | Yes | Yes | N-Hops BFS |
| `agentic_kg_ppr` | Multi-agent | Yes | Yes | PPR |
| `agentic_kg_random_walk` | Multi-agent | Yes | Yes | Random Walk |
| `agentic_kg_hybrid` | Multi-agent | Yes | Yes | Hybrid Semantic |
| `agentic_kg_adaptive` | Multi-agent | Yes | Yes | Adaptive (Full) |
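The strategies differ mainly in how they expand the subgraph around seed entities. As a rough illustration, N-Hops BFS can be sketched with NetworkX (a simplified stand-in for the library's engine; the toy graph and the `n_hops` helper are hypothetical):

```python
import networkx as nx

# Toy knowledge graph: edges point from subject to object
G = nx.DiGraph()
G.add_edges_from([
    ("Alice Johnson", "TechStart"),
    ("TechStart", "San Francisco"),
    ("San Francisco", "California"),
    ("TechStart", "Analytics"),
])

def n_hops(graph, seed, max_hops=3):
    """Collect every node reachable within max_hops of the seed entity."""
    lengths = nx.single_source_shortest_path_length(graph, seed, cutoff=max_hops)
    return set(lengths)

# With max_hops=1 only the seed and its direct neighbors are kept
subgraph_nodes = n_hops(G, "TechStart", max_hops=1)
```

PPR, Random Walk, and Hybrid replace the uniform hop expansion with weighted or semantically guided traversal over the same graph.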

Run any configuration via the evaluation CLI:

```bash
uv run python -m consistent_rag.evaluate_all \
    --approaches agentic_kg_ppr \
    --benchmark faitheval \
    --limit 10 --verbose
```

---

## Architecture

ConsistentRAG operates as a four-phase pipeline:

```
+-------------------------------------------------------------+
|                     Pipeline Layer                          |
| (AdaptiveHybrid / OnlineDynamic / OfflineKG / Baseline)     |
+-------------------------------------------------------------+
|                   Orchestration Layer                       |
|    (AdaptiveRouter Orchestrator / Multi-Agent Loop)         |
+-------------------------------------------------------------+
|                      Agent Layer                            |
|   Seed Agent | Router Agent | Answer Agent | Critic         |
+-------------------------------------------------------------+
|                  Core Services Layer                        |
|   Retriever | Graph Store | Multi-Algorithm Engine | LLM    |
+-------------------------------------------------------------+
|                  Infrastructure Layer                       |
|   Qdrant | NetworkX | FAISS                                 |
|   DeepSeek API (Default) | OpenAI API (Configurable)        |
+-------------------------------------------------------------+
```

### Four Phases

1. **Dual-Mode Indexing & KG Construction** — Offline (FAISS + pre-built graph) or Online (Qdrant + on-the-fly graph)
2. **Multi-Algorithmic Graph Retrieval Engine** — PPR, Directed Random Walks, Hybrid Semantic Traversal, N-Hops BFS
3. **Adaptive Strategy Routing** — Router Agent dynamically selects the optimal graph algorithm per query
4. **Agentic Iterative Refinement** — Answer + Critic loop with context pruning, structural expansion, and instructional augmentation
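The Phase-4 refinement loop can be sketched as follows (a minimal illustration in which stub functions stand in for the Answer and Critic agents; `max_iterations` and `improvement_threshold` mirror the configuration options, everything else is hypothetical):

```python
def generate_answer(question, context, feedback):
    # Stub for the Answer Agent (an LLM call in the real pipeline)
    return f"draft answer{' (revised)' if feedback else ''}"

def score_answer(question, context, answer):
    # Stub for the Critic: returns (score, structured feedback)
    return (0.9 if "revised" in answer else 0.5), "ground the claim in a path"

def refine(question, context, max_iterations=3, improvement_threshold=0.8):
    answer, feedback, score = None, "", 0.0
    for iteration in range(1, max_iterations + 1):
        answer = generate_answer(question, context, feedback)
        score, feedback = score_answer(question, context, answer)
        if score >= improvement_threshold:
            break  # the critic is satisfied; stop early
    return answer, score, iteration

answer, score, iterations = refine("Who founded TechStart?", "...")
```

In the real pipeline each iteration can also prune the context, expand the subgraph, or augment the prompt before re-answering.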

---

## Running Evaluations

The evaluation system runs any combination of the 10 pipeline configurations against the 5 benchmarks. Results are persisted to CSV (and optionally PostgreSQL) with full resume support.

```bash
# Quick smoke test: 1 config, 1 benchmark, 2 samples
uv run python -m consistent_rag.evaluate_all \
    --approaches baseline_no_tool \
    --benchmark faitheval \
    --limit 2 --verbose

# Run a specific config against all benchmarks
uv run python -m consistent_rag.evaluate_all \
    --approaches agentic_kg_ppr \
    --benchmark all \
    --limit 50 \
    --csv results/agentic_kg_ppr.csv

# Run all configs against all benchmarks (full thesis experiment matrix)
uv run python -m consistent_rag.evaluate_all \
    --approaches all \
    --benchmark all \
    --csv results/full_matrix.csv \
    --metrics all

# Resume an interrupted run
uv run python -m consistent_rag.evaluate_all \
    --approaches all \
    --benchmark all \
    --csv results/full_matrix.csv \
    --resume

# With PostgreSQL persistence
uv run python -m consistent_rag.evaluate_all \
    --approaches all \
    --benchmark faitheval \
    --csv results/faitheval.csv \
    --postgres postgresql://consistent_rag:consistent_rag@localhost:5432/consistent_rag
```

### CLI Options

| Flag | Description |
|---|---|
| `--approaches` | Comma-separated config names or `all` |
| `--benchmark` | Comma-separated benchmark names or `all` |
| `--limit` | Max samples per benchmark (default: 5) |
| `--metrics` | `basic`, `llm`, `deepeval`, or `all` |
| `--csv` | Output CSV path (auto-generated if omitted) |
| `--resume` | Skip completed samples in existing CSV |
| `--no-cache` | Re-extract triplets instead of using cache |
| `--postgres` | PostgreSQL URL for experiment tracking |
| `--verbose` | Print per-sample progress |

### Build Offline Index

```bash
uv run python scripts/build_offline_index.py
```

---

## Benchmarks

| Benchmark | Focus | Samples |
|---|---|---|
| **FaithEval** | Factual consistency (unanswerable, inconsistent, counterfactual) | ~15K |
| **MuSiQuE** | Multi-hop reasoning across documents | ~25K |
| **TimeQA** | Temporal reasoning and evolving facts | ~20K |
| **SQuAD** | Single-hop extractive QA (control baseline) | ~100K |
| **FinanceBench** | Domain-specific financial document QA (SEC filings) | 150 |

---

## MCP Tool

ConsistentRAG can be used as an MCP tool by any compatible agent:

```bash
# Start the MCP server
python -m consistent_rag.mcp_server
```

The server exposes a `query` tool with parameters:
- `question` (required): The question to answer
- `context` (optional): Pre-provided context (skips retrieval)
- `mode`: `online_dynamic` or `offline_static`
- `strategy`: `adaptive`, `ppr`, `random_walk`, `hybrid`, `nhops`
- `top_k`: Number of documents to retrieve

---

## Troubleshooting

### "No API key found"

```
ValueError: No OpenAI or DeepSeek API key found
```

**Fix:** Set `OPENAI_API_KEY` or `DEEPSEEK_API_KEY` in your environment or `.env` file.

### "Connection refused" to localhost:6333

```
qdrant_client.http.exceptions.ResponseHandlingException
```

**Fix:** Qdrant is not running. Start it with `make docker-up`. If you're using FAISS or KG-only mode, you don't need Qdrant.

### "Module not found" after installation

**Fix:** Make sure you're in the correct virtual environment:
```bash
uv sync --extra dev
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
```

### DeepSeek format errors

DeepSeek occasionally rejects certain message formats. The pipeline automatically retries once. If the error persists, switch to OpenAI or another provider.

### Out of memory during indexing

For large document collections, reduce `max_triples_per_doc` (default: 40) or use Qdrant instead of FAISS:

```python
rag = ConsistentRAG(
    pipeline="hybrid",
    backend="qdrant",  # better for large collections
    max_triples_per_doc=20,
)
```

---

## Development

```bash
make lint          # Format check + linting
make format        # Auto-fix formatting
make test-unit     # Run unit tests (no external services)
make test          # Run all tests
make ci            # Full CI pipeline
make docker-up     # Start Qdrant + PostgreSQL
make docker-down   # Stop infrastructure
```

---

## Project Structure

```
consistent_rag/
├── api.py                       # Public API (ConsistentRAG, QueryResult)
├── pipeline/
│   ├── config.py                # PipelineConfig (unified configuration)
│   ├── pipeline.py              # Main pipeline orchestrator
│   └── pipeline_factory.py      # Pipeline configuration factory
├── llm.py                       # Universal LLM client (OpenAI-compatible)
├── embeddings.py                # Embedding service (local + API)
├── retrievers/                  # Vector retrievers (Qdrant, FAISS)
├── knowledge_graph/
│   ├── networkx_graph_store.py  # NetworkX graph backend
│   ├── ppr_engine.py            # Personalized PageRank
│   ├── algorithms.py            # Random Walk, Hybrid, N-Hops
│   ├── extractor.py             # LLM-based triple extraction
│   └── lsh_index.py             # Locality Sensitive Hashing
├── agents/
│   ├── answer/                  # Answer generation agent
│   ├── critic/                  # Critic agent with structured feedback
│   └── retrieval/               # Retrieval orchestrator
├── benchmarks/                  # Dataset loaders
├── evaluation/                  # Metrics and experiment runner
└── examples/                    # Runnable example scripts
```

