Metadata-Version: 2.3
Name: nexrag
Version: 0.4.0
Summary: Framework-agnostic RAG pipeline SDK. Plug in any component, swap any stage, configure everything in YAML
Keywords: rag,retrieval-augmented-generation,llm,vector-database,embeddings,ai,nlp,pipeline,sdk
Author: KevinRawal
Author-email: KevinRawal <kevinrawal30@gmail.com>
License: Apache-2.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: nexrag[all-providers] ; extra == 'all'
Requires-Dist: nexrag[all-loaders] ; extra == 'all'
Requires-Dist: nexrag[all-retrieval] ; extra == 'all'
Requires-Dist: nexrag[all-chunkers] ; extra == 'all'
Requires-Dist: nexrag[all-guards] ; extra == 'all'
Requires-Dist: nexrag[tiktoken] ; extra == 'all-chunkers'
Requires-Dist: nexrag[code] ; extra == 'all-chunkers'
Requires-Dist: nexrag[pii] ; extra == 'all-guards'
Requires-Dist: nexrag[pdf] ; extra == 'all-loaders'
Requires-Dist: nexrag[word] ; extra == 'all-loaders'
Requires-Dist: nexrag[excel] ; extra == 'all-loaders'
Requires-Dist: nexrag[html] ; extra == 'all-loaders'
Requires-Dist: nexrag[openai] ; extra == 'all-providers'
Requires-Dist: nexrag[anthropic] ; extra == 'all-providers'
Requires-Dist: nexrag[gemini] ; extra == 'all-providers'
Requires-Dist: nexrag[ollama] ; extra == 'all-providers'
Requires-Dist: nexrag[chromadb] ; extra == 'all-providers'
Requires-Dist: nexrag[pinecone] ; extra == 'all-providers'
Requires-Dist: nexrag[huggingface] ; extra == 'all-providers'
Requires-Dist: nexrag[cohere] ; extra == 'all-rerankers'
Requires-Dist: nexrag[cross-encoder] ; extra == 'all-rerankers'
Requires-Dist: nexrag[all-sparse] ; extra == 'all-retrieval'
Requires-Dist: nexrag[all-rerankers] ; extra == 'all-retrieval'
Requires-Dist: nexrag[bm25] ; extra == 'all-sparse'
Requires-Dist: anthropic>=0.20 ; extra == 'anthropic'
Requires-Dist: rank-bm25>=0.2 ; extra == 'bm25'
Requires-Dist: chromadb>=1.5.9 ; extra == 'chromadb'
Requires-Dist: tree-sitter>=0.21 ; extra == 'code'
Requires-Dist: tree-sitter-language-pack>=0.2 ; extra == 'code'
Requires-Dist: cohere>=5.0 ; extra == 'cohere'
Requires-Dist: sentence-transformers>=2.7 ; extra == 'cross-encoder'
Requires-Dist: pytest>=8.0 ; extra == 'dev'
Requires-Dist: pytest-cov>=5.0 ; extra == 'dev'
Requires-Dist: ruff>=0.4 ; extra == 'dev'
Requires-Dist: mypy>=1.10 ; extra == 'dev'
Requires-Dist: pre-commit>=3.7 ; extra == 'dev'
Requires-Dist: types-pyyaml ; extra == 'dev'
Requires-Dist: rank-bm25>=0.2 ; extra == 'dev'
Requires-Dist: openpyxl>=3.1 ; extra == 'excel'
Requires-Dist: google-genai>=1.0 ; extra == 'gemini'
Requires-Dist: beautifulsoup4>=4.12 ; extra == 'html'
Requires-Dist: lxml>=5.0 ; extra == 'html'
Requires-Dist: huggingface-hub>=0.20 ; extra == 'huggingface'
Requires-Dist: ollama>=0.1 ; extra == 'ollama'
Requires-Dist: httpx>=0.27 ; extra == 'ollama'
Requires-Dist: openai>=2.38.0 ; extra == 'openai'
Requires-Dist: pypdf>=4.0 ; extra == 'pdf'
Requires-Dist: presidio-analyzer>=2.2 ; extra == 'pii'
Requires-Dist: presidio-anonymizer>=2.2 ; extra == 'pii'
Requires-Dist: pinecone>=5.0 ; extra == 'pinecone'
Requires-Dist: tiktoken>=0.7 ; extra == 'tiktoken'
Requires-Dist: python-docx>=1.0 ; extra == 'word'
Requires-Python: >=3.12
Project-URL: Homepage, https://github.com/kevinrawal/nexrag
Project-URL: Repository, https://github.com/kevinrawal/nexrag
Project-URL: Issues, https://github.com/kevinrawal/nexrag/issues
Project-URL: Changelog, https://github.com/kevinrawal/nexrag/blob/main/CHANGELOG.md
Provides-Extra: all
Provides-Extra: all-chunkers
Provides-Extra: all-guards
Provides-Extra: all-loaders
Provides-Extra: all-providers
Provides-Extra: all-rerankers
Provides-Extra: all-retrieval
Provides-Extra: all-sparse
Provides-Extra: anthropic
Provides-Extra: bm25
Provides-Extra: chromadb
Provides-Extra: code
Provides-Extra: cohere
Provides-Extra: cross-encoder
Provides-Extra: dev
Provides-Extra: excel
Provides-Extra: gemini
Provides-Extra: html
Provides-Extra: huggingface
Provides-Extra: ollama
Provides-Extra: openai
Provides-Extra: pdf
Provides-Extra: pii
Provides-Extra: pinecone
Provides-Extra: tiktoken
Provides-Extra: word
Description-Content-Type: text/markdown

# NexRAG

```text
███╗   ██╗███████╗██╗  ██╗██████╗  █████╗  ██████╗
████╗  ██║██╔════╝╚██╗██╔╝██╔══██╗██╔══██╗██╔════╝
██╔██╗ ██║█████╗   ╚███╔╝ ██████╔╝███████║██║  ███╗
██║╚██╗██║██╔══╝   ██╔██╗ ██╔══██╗██╔══██║██║   ██║
██║ ╚████║███████╗██╔╝ ██╗██║  ██║██║  ██║╚██████╔╝
╚═╝  ╚═══╝╚══════╝╚═╝  ╚═╝╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝

●plug ⇄swap ▶scale
```

> Framework-agnostic RAG pipeline SDK. Plug in any component, swap any stage, configure everything in YAML.

[![PyPI version](https://img.shields.io/pypi/v/nexrag.svg)](https://pypi.org/project/nexrag/)
[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)

---

## What is NexRAG?

NexRAG is a RAG (Retrieval-Augmented Generation) pipeline SDK for Python, designed with production use in mind. It is currently in alpha.

**NexRAG owns the pipeline shape. You own the components.**

Every stage — loading, chunking, embedding, retrieval, generation — is a clean interface. NexRAG ships default implementations for each. You can swap any of them by implementing the interface and declaring it in YAML. No framework lock-in. No magic. No hidden behavior.
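The interface-plus-swap idea can be sketched in plain Python. This is an illustration only: `Chunker`, `FixedSizeChunker`, `SentenceChunker`, and `run_stage` are hypothetical names defined here for the example, not NexRAG's actual interfaces, whose names and signatures may differ.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Chunker(Protocol):
    """Hypothetical stage contract: split text into chunks."""

    def chunk(self, text: str) -> list[str]: ...


class FixedSizeChunker:
    """Default-style implementation: fixed-width character windows."""

    def __init__(self, size: int = 20) -> None:
        self.size = size

    def chunk(self, text: str) -> list[str]:
        return [text[i : i + self.size] for i in range(0, len(text), self.size)]


class SentenceChunker:
    """Drop-in replacement: split on sentence-ending periods."""

    def chunk(self, text: str) -> list[str]:
        return [s.strip() + "." for s in text.split(".") if s.strip()]


def run_stage(chunker: Chunker, text: str) -> list[str]:
    # The caller depends only on the contract, never on a concrete class.
    if not isinstance(chunker, Chunker):
        raise TypeError("component does not implement the Chunker contract")
    return chunker.chunk(text)


text = "RAG retrieves context. Then it generates an answer."
print(run_stage(FixedSizeChunker(size=20), text))
print(run_stage(SentenceChunker(), text))
```

Because `run_stage` only checks for the contract, either class can be passed without changing the calling code, which is the property the paragraph above describes.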

---

## Quickstart

```bash
pip install "nexrag[openai,chromadb,pdf]"
export OPENAI_API_KEY=sk-...
cp nexrag.example.yaml nexrag.yaml   # edit to taste
```

```python
from nexrag import NexRAG, RunMetrics

pipeline = NexRAG.from_config("nexrag.yaml")

# Ingest a PDF
result = pipeline.ingest("contracts/agreement.pdf")
print(f"Ingested {result.documents_loaded} document(s), {result.chunks_produced} chunks")

# Query (blocking)
result = pipeline.query("What are the termination clauses?")
print(result.answer)
for source in result.sources:
    print(f"  [{source.rank}] score={source.score:.3f}  {source.chunk.metadata.get('source')}")

# Streaming — tokens arrive live; RunMetrics is the final item
metrics = None
for item in pipeline.stream_query("Summarise the key obligations."):
    if isinstance(item, RunMetrics):
        metrics = item
    else:
        print(item, end="", flush=True)
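# Guard against a stream that ended before the final RunMetrics item arrived
if metrics is not None:
    print(f"\n\n{metrics.total_latency_ms:.0f}ms, {metrics.chunks_retrieved} chunks retrieved")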
```

```yaml
# nexrag.yaml (minimal)
ingestion:
  loader:
    type: pdf
  embedder:
    provider: openai
    model: text-embedding-3-small
    api_key: ${OPENAI_API_KEY}
  vector_db:
    provider: chroma
    default_collection: documents
    collections:
      documents:
        mode: persistent
        path: ./.nexrag/chroma

query:
  embedder: inherit
  llm:
    provider: openai
    model: gpt-4o
    api_key: ${OPENAI_API_KEY}
```

See [docs/](docs/) for the full documentation site.

---

## Installation

```bash
# Core only — pydantic + pyyaml. No provider SDKs; add the extras you use below.
pip install nexrag

# Default getting-started stack (OpenAI embedder/LLM + ChromaDB vector store)
pip install "nexrag[openai,chromadb]"

# Provider extras — install only what you use
pip install "nexrag[openai]"         # OpenAI embedder + LLM
pip install "nexrag[chromadb]"       # ChromaDB vector store
pip install "nexrag[anthropic]"      # Anthropic (Claude) LLM
pip install "nexrag[ollama]"         # Ollama local LLM + embedder
pip install "nexrag[huggingface]"    # HuggingFace embedder

# Document loaders
pip install "nexrag[pdf]"            # PDFLoader (pypdf)
pip install "nexrag[word]"           # Word documents (python-docx)
pip install "nexrag[excel]"          # Excel workbooks (openpyxl)
pip install "nexrag[html]"           # HTML pages (beautifulsoup4 + lxml)

# Retrieval extras
pip install "nexrag[bm25]"           # BM25Retriever keyword search (rank-bm25)
pip install "nexrag[cohere]"         # CohereReranker
pip install "nexrag[cross-encoder]"  # CrossEncoderReranker (sentence-transformers)

# Convenience bundles
pip install "nexrag[all-sparse]"     # all sparse retrievers (bm25)
pip install "nexrag[all-rerankers]"  # all rerankers (cohere + cross-encoder)
pip install "nexrag[all-retrieval]"  # all-sparse + all-rerankers
pip install "nexrag[all-loaders]"    # all document loaders
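pip install "nexrag[all-providers]"  # all LLM, embedder, and vector store providers
pip install "nexrag[all-chunkers]"   # chunker extras (tiktoken + code)
pip install "nexrag[all-guards]"     # guard extras (pii)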

# Full bundle for dev/CI (all runtime extras; the dev extra is separate). Pulls PyTorch via sentence-transformers.
pip install "nexrag[all]"
```
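
Since the core install ships no provider SDKs, it can help to check up front which extras are missing. The snippet below is a minimal sketch of such a check using only the standard library. `EXTRA_MODULES` and `missing_extras` are hypothetical names for this example, and the extra-to-module mapping is an assumption based on the packages each extra installs.

```python
from importlib.util import find_spec

# Extra name -> top-level module it is expected to provide (assumed mapping).
EXTRA_MODULES = {
    "openai": "openai",
    "chromadb": "chromadb",
    "pdf": "pypdf",
    "bm25": "rank_bm25",
}


def missing_extras(extras: list[str], modules: dict[str, str] = EXTRA_MODULES) -> list[str]:
    """Return the extras whose backing module cannot be found."""
    return [name for name in extras if find_spec(modules[name]) is None]


needed = ["openai", "chromadb", "pdf"]
missing = missing_extras(needed)
if missing:
    joined = ",".join(missing)
    print(f'pip install "nexrag[{joined}]"')
else:
    print("all requested extras are installed")
```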

---

## Design Principles

| Principle | What it means |
|---|---|
| Interface-first | Every stage is a contract. Implementation is secondary. |
| Config-driven | YAML configures the pipeline. Code defines the logic. |
| Zero lock-in | Core has no dependency on LangChain, LlamaIndex, or any AI SDK. |
| Explicit over implicit | No hidden defaults. Every behavior is declared or documented. |
| Extensible by design | New components plug in without touching core. |

---

## Architecture

NexRAG has two independent pipelines:

```text
INGESTION  →  Loader → Sanitizer → Chunker → Embedder → VectorDB
QUERY      →  Embedder → Retriever → PromptBuilder → LLM → PipelineResult
```

See [Architecture Documentation](docs/) for full pipeline diagrams.
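
To make the two stage chains above concrete, here is a toy, dependency-free sketch. Every function is a stand-in written for this example (bag-of-words counts instead of a real embedder, a Python list instead of a vector database, and no LLM call); none of it is NexRAG code.

```python
import math
import re
from collections import Counter


def sanitize(text: str) -> str:
    # Collapse runs of whitespace (stand-in for the Sanitizer stage).
    return re.sub(r"\s+", " ", text).strip()


def chunk(text: str) -> list[str]:
    # One chunk per sentence (stand-in for the Chunker stage).
    return [s.strip() for s in text.split(".") if s.strip()]


def embed(text: str) -> Counter:
    # Bag-of-words counts (stand-in for a real embedding model).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ingest(document: str, store: list) -> int:
    # Loader output -> Sanitizer -> Chunker -> Embedder -> VectorDB (a list here).
    chunks = chunk(sanitize(document))
    store.extend((c, embed(c)) for c in chunks)
    return len(chunks)


def query(question: str, store: list, top_k: int = 1) -> str:
    # Embedder -> Retriever -> PromptBuilder; the LLM call is left out.
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    context = "\n".join(text for text, _ in ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {question}"


store: list = []
ingest("Either party may terminate with 30 days notice.  Fees are due monthly.", store)
print(query("How can a party terminate?", store))
```

The two functions share only the store and the embedder, which mirrors why the ingestion and query pipelines can be treated as independent.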

---

## Supported Providers

| Category | Providers |
|---|---|
| Embedders | OpenAI, Ollama, HuggingFace |
| Vector DBs | ChromaDB (in-memory, persistent, remote server) |
| LLMs | OpenAI, Ollama, Anthropic |
| Loaders | PDF, plain text, Word, HTML, Excel |
| Chunkers | Recursive (separator-aware) |
| Retrievers | Dense (cosine similarity), BM25 (keyword), Hybrid (dense + BM25) |
| Rerankers | Cohere, CrossEncoder (sentence-transformers) |
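
The table does not say how the Hybrid retriever merges dense and BM25 results. One common technique for combining two ranked lists is reciprocal rank fusion (RRF), sketched below. This is an assumption used for illustration, not a description of NexRAG's implementation; the function name and the chunk IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists; each list contributes 1 / (k + rank) per ID."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["c2", "c1", "c3"]  # hypothetical dense retriever output
bm25_hits = ["c1", "c4", "c2"]   # hypothetical BM25 output
print(reciprocal_rank_fusion([dense_hits, bm25_hits]))
```

RRF uses only rank positions, so it avoids having to normalise cosine scores against BM25 scores, which live on different scales.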

---

## Contributing

NexRAG is in early development. Contribution guidelines will be published with v1.0.

---

## Changelog

See [CHANGELOG.md](CHANGELOG.md).
