Metadata-Version: 2.4
Name: bifrost-rag
Version: 0.1.0
Summary: Production RAG pipeline with vector retrieval, configurable chunking, and evaluation harness
Project-URL: Homepage, https://github.com/Jbermingham1/bifrost-rag
Project-URL: Documentation, https://github.com/Jbermingham1/bifrost-rag#readme
Project-URL: Repository, https://github.com/Jbermingham1/bifrost-rag
Project-URL: Issues, https://github.com/Jbermingham1/bifrost-rag/issues
Author-email: Jarrad Bermingham <jarrad.bermingham98@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: ai,embeddings,llm,rag,retrieval-augmented-generation,vector-search
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Indexing
Requires-Python: >=3.10
Requires-Dist: anthropic>=0.40.0
Requires-Dist: chromadb>=0.5.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: tiktoken>=0.7.0
Provides-Extra: all
Requires-Dist: pinecone>=5.0.0; extra == 'all'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'all'
Requires-Dist: pytest-cov>=5.0.0; extra == 'all'
Requires-Dist: pytest>=8.0.0; extra == 'all'
Requires-Dist: ruff>=0.5.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.5.0; extra == 'dev'
Provides-Extra: pinecone
Requires-Dist: pinecone>=5.0.0; extra == 'pinecone'
Description-Content-Type: text/markdown

# bifrost-rag

[![PyPI](https://img.shields.io/pypi/v/bifrost-rag)](https://pypi.org/project/bifrost-rag/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)

Production RAG pipeline with vector retrieval, configurable chunking, and evaluation harness.

## Features

- **Chunking strategies** — Fixed-size, sentence-based, and recursive splitting
- **Vector stores** — ChromaDB (local), behind a pluggable store interface
- **Retrieval** — Top-K retrieval with score thresholds
- **Evaluation harness** — Precision@K, Recall@K, F1, MRR metrics
- **Pipeline orchestrator** — Ingest, retrieve, and query in one interface

## Architecture

```
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Chunking   │───▶│  Embeddings  │───▶│ Vector Store │
│  strategies  │    │   (Voyage)   │    │  (ChromaDB)  │
└──────────────┘    └──────────────┘    └──────┬───────┘
                                               │
                    ┌──────────────┐    ┌──────▼───────┐
                    │  Evaluation  │◀───│  Retrieval   │
                    │   harness    │    │   pipeline   │
                    └──────────────┘    └──────────────┘
```

## Installation

```bash
pip install bifrost-rag
```
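Optional extras (declared in the package metadata) add the Pinecone backend and development tooling:

```bash
pip install "bifrost-rag[pinecone]"   # Pinecone vector store support
pip install "bifrost-rag[dev]"        # pytest, pytest-asyncio, pytest-cov, ruff
pip install "bifrost-rag[all]"        # everything above
```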

## Quick Start

```python
from bifrost_rag import ChromaStore, Document, RAGPipeline, FixedSizeChunker

# Create a vector store
store = ChromaStore(collection_name="my-docs")

# Chunk and ingest documents
chunker = FixedSizeChunker(chunk_size=500, overlap=50)
text = "Your long document text here..."
chunks = chunker.chunk(text, metadata={"source": "doc1"})

# Embed each chunk with your embedding provider (e.g. Voyage, OpenAI)
your_embeddings = ...  # one vector per chunk, in the same order as `chunks`
docs = [
    Document(id=f"chunk-{c.index}", text=c.text, embedding=your_embeddings[i])
    for i, c in enumerate(chunks)
]
pipeline = RAGPipeline(store=store, top_k=5)
pipeline.ingest(docs)

# Query with an embedding of your question (same provider, same dimensions)
query_embed = ...  # embedding vector of the query text
result = pipeline.query(query_embedding=query_embed)
print(result.sources)
```

## Chunking Strategies

| Strategy | Description |
|----------|-------------|
| `FixedSizeChunker` | Split by character count with overlap |
| `SentenceChunker` | Split by sentence boundaries |
| `RecursiveChunker` | Hierarchical split: paragraphs → sentences → characters |
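To make the fixed-size strategy concrete, here is a minimal, library-independent sketch of character-based chunking with overlap (the real `FixedSizeChunker` may differ in details such as token-aware splitting):

```python
def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    where consecutive chunks share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each chunk's start advances
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

# A 1200-character document with chunk_size=500 and overlap=50 yields
# three chunks: [0:500], [450:950], [900:1200]
chunks = fixed_size_chunks("x" * 1200, chunk_size=500, overlap=50)
print(len(chunks))  # → 3
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk, at the cost of a little index redundancy.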

## Evaluation

```python
from bifrost_rag import RAGEvaluator, Document

evaluator = RAGEvaluator()

# Evaluate retrieval quality
dataset = [
    {
        "retrieved": [Document(id="d1", text="..."), Document(id="d2", text="...")],
        "relevant_ids": {"d1"},
    },
]
result = evaluator.evaluate(dataset, k=5)
print(result.summary())
```
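For reference, the metrics reported above follow their standard information-retrieval definitions. A self-contained sketch (not the library's implementation) of what is computed per query:

```python
def precision_recall_f1_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int):
    """Precision@K, Recall@K, and their harmonic mean (F1) for one query."""
    top_k = retrieved_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    precision = hits / k
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def reciprocal_rank(retrieved_ids: list[str], relevant_ids: set[str]) -> float:
    """1/rank of the first relevant result, or 0.0 if none was retrieved.
    MRR is this value averaged over all queries in the dataset."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

# One relevant doc in the top 2 out of 2 relevant overall:
print(precision_recall_f1_at_k(["d1", "d2", "d3"], {"d1", "d4"}, k=2))  # → (0.5, 0.5, 0.5)
print(reciprocal_rank(["d2", "d1"], {"d1"}))  # → 0.5 (first hit at rank 2)
```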

## License

MIT
