Skip to content

Welcome to EmbedRAG

Production-grade Retrieval-Augmented Generation (RAG) system with a high-performance read/write split architecture. Designed to handle millions of documents with sub-second latency (1000+ QPS) on modest hardware (4C16G).

Why EmbedRAG?

Most RAG systems struggle with scaling ingestion and search simultaneously. EmbedRAG solves this by separating the Writer Node (ingestion, chunking, embedding, index building) from the Query Node (high-speed retrieval, fusion, search).

  • Extreme Performance: Engineered for high-throughput search with FAISS and SQLite FTS5.
  • Cost-Efficient: Achieving 1000 QPS on a single 16GB RAM machine.
  • Production Ready: Built-in snapshot management, zero-downtime synchronization, and structured logging.
  • Flexible Search: Hybrid search (dense + sparse) with Reciprocal Rank Fusion (RRF) and hierarchical expansion.
  • Multi-Modal: Support for heterogeneous embedding spaces (text, images, audio).
  • Clustering: Standalone + integrated topic discovery over embeddings, with auto parameter tuning and an interactive visualization.

Architecture

EmbedRAG's architecture revolves around a "Build-Publish-Sync" lifecycle:

graph TD
    Embed[Embedding Service] --> Writer
    subgraph "Writer Node (:8001)"
    Writer[Writer] -- "/ingest" --> Chunking
    Chunking --> Embedding
    Embedding --> SQLite[SQLite WAL]
    SQLite -- "/build" --> FAISS[FAISS Index]
    FAISS -- "/publish" --> S3[S3/TOS Storage]
    end
    S3 -- "snapshot" --> Query
    subgraph "Query Node (:8000)"
    Query[Query Node] -- "startup" --> Load[Load Snapshot]
    Load --> Mmap[FAISS mmap]
    Load --> SQLiteRO[SQLite RO]
    Search[/search] --> Fusion[RRF Fusion]
    Fusion --> Results
    end

Documentation Roadmap

Dive into the details of EmbedRAG:


Quick Start

Prerequisites

  • Python: 3.11 or later.
  • Package Manager: uv (recommended) or pip.
  • Embedding API: An external service (like OpenAI or a local vLLM/Ollama instance) that returns float vectors.

1. Install

git clone <repo-url> && cd embedRAG
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[dev,docs]"

2. Start the Writer Node

The writer handles ingestion and index building.

# Start with default settings (data at /data/embedrag)
embedrag writer --port 8001

3. Start the Query Node

The query node serves search requests.

# Start with default settings
embedrag query --port 8000

4. Search and Explore

Once you've ingested some data via the writer, you can perform hybrid searches:

curl "http://localhost:8000/search/text?query_text=What+is+EmbedRAG&mode=hybrid"