Metadata-Version: 2.4
Name: ragforge-ml
Version: 0.1.0
Summary: Local-first RAG pipeline — PDF/Markdown ingestion, Qdrant retrieval, bge reranking, and an answer-quality eval harness. Pairs with turboquant-ml for quantized LLM serving.
Project-URL: Homepage, https://github.com/Ademo93/ragforge
Project-URL: Repository, https://github.com/Ademo93/ragforge
Project-URL: Issues, https://github.com/Ademo93/ragforge/issues
Project-URL: Documentation, https://Ademo93.github.io/ragforge/
Author: RAGforge Contributors
License: MIT
License-File: LICENSE
Keywords: bge,evaluation,fastapi,llm,qdrant,rag,reranker,retrieval-augmented-generation,sentence-transformers,vector-search
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Indexing
Requires-Python: >=3.10
Requires-Dist: markdown-it-py>=3.0
Requires-Dist: numpy>=1.24
Requires-Dist: pydantic>=2.7
Requires-Dist: pypdf>=4.2
Requires-Dist: pyyaml>=6.0
Requires-Dist: qdrant-client>=1.9
Requires-Dist: rich>=13.7
Requires-Dist: sentence-transformers>=3.0
Requires-Dist: torch>=2.2
Requires-Dist: tqdm>=4.66
Requires-Dist: transformers>=4.40
Requires-Dist: typer>=0.12
Provides-Extra: all
Requires-Dist: accelerate>=0.30; extra == 'all'
Requires-Dist: fastapi>=0.111; extra == 'all'
Requires-Dist: matplotlib>=3.8; extra == 'all'
Requires-Dist: pandas>=2.2; extra == 'all'
Requires-Dist: python-multipart>=0.0.9; extra == 'all'
Requires-Dist: scikit-learn>=1.4; extra == 'all'
Requires-Dist: turboquant-ml>=0.1; extra == 'all'
Requires-Dist: uvicorn>=0.30; extra == 'all'
Provides-Extra: dev
Requires-Dist: httpx>=0.27; extra == 'dev'
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Provides-Extra: eval
Requires-Dist: pandas>=2.2; extra == 'eval'
Requires-Dist: scikit-learn>=1.4; extra == 'eval'
Provides-Extra: quantized
Requires-Dist: accelerate>=0.30; extra == 'quantized'
Requires-Dist: turboquant-ml>=0.1; extra == 'quantized'
Provides-Extra: serve
Requires-Dist: fastapi>=0.111; extra == 'serve'
Requires-Dist: python-multipart>=0.0.9; extra == 'serve'
Requires-Dist: uvicorn>=0.30; extra == 'serve'
Provides-Extra: viz
Requires-Dist: matplotlib>=3.8; extra == 'viz'
Requires-Dist: pandas>=2.2; extra == 'viz'
Description-Content-Type: text/markdown

<h1 align="center">RAGforge</h1>

<p align="center">
  <strong>Local-first RAG pipeline — PDF & Markdown ingestion, Qdrant retrieval, bge reranking, and an answer-quality eval harness.</strong>
  <br>
  Pairs with <a href="https://github.com/Ademo93/turboquant">turboquant-ml</a> for quantized LLM serving.
</p>

<p align="center">
  <a href="https://pypi.org/project/ragforge-ml/"><img alt="PyPI" src="https://img.shields.io/badge/pypi-ragforge--ml-blue"></a>
  <a href="#"><img alt="Python" src="https://img.shields.io/badge/python-3.10%2B-blue"></a>
  <a href="#"><img alt="PyTorch" src="https://img.shields.io/badge/pytorch-2.2%2B-ee4c2c"></a>
  <a href="#"><img alt="License" src="https://img.shields.io/badge/license-MIT-green"></a>
  <a href="https://Ademo93.github.io/ragforge/"><img alt="Docs" src="https://img.shields.io/badge/docs-mkdocs--material-blue"></a>
</p>

---

## Why RAGforge?

Most "RAG starter" repos are 30 lines of glue between LangChain and OpenAI that nobody can reproduce, because they hide retrieval quality, reranking, latency, and cost behind a single `.invoke()` call. **RAGforge is the opposite**: a small, readable, **local-first** pipeline that you can run end-to-end on your own laptop with open-source models, and that ships with an **answer-quality eval harness** so you can actually measure what changing a knob does.

Three opinions:

1. **Local-first.** Every default is open source: BAAI/bge-small-en-v1.5 for embeddings, BAAI/bge-reranker-base for reranking, Qdrant in **embedded mode** (no Docker required), and any Hugging Face causal LM for generation. No OpenAI key is needed to try the project.
2. **Measurable.** Every change should answer one question: did the answer get better? RAGforge ships `ragforge eval` with built-in `context_recall`, `answer_relevance`, and `faithfulness` metrics. It does not depend on RAGAS, but its metrics are RAGAS-compatible.
3. **Composable, not framework-y.** Each stage (ingest, embed, retrieve, rerank, generate, evaluate) is one short module behind a small interface. Swap the encoder, swap the vector store, swap the LLM — no `Runnable.invoke()` magic to debug.
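Here is a minimal sketch of what "one small interface per stage" can look like in Python, using `typing.Protocol` for structural typing. The class names mirror the interfaces listed under Architecture (`Encoder`, `VectorStore`, `Reranker`, `LLM`), but the method names and signatures below are assumptions for illustration, not RAGforge's actual API. `CharCountEncoder` is a hypothetical toy that satisfies `Encoder` without inheriting from anything.

```python
from __future__ import annotations

from typing import Protocol, Sequence, runtime_checkable


# Hypothetical stage interfaces: names and signatures are illustrative,
# not taken from the ragforge package.
@runtime_checkable
class Encoder(Protocol):
    def encode(self, texts: Sequence[str]) -> list[list[float]]: ...


@runtime_checkable
class VectorStore(Protocol):
    def add(self, vectors: Sequence[list[float]], payloads: Sequence[dict]) -> None: ...
    def search(self, query: list[float], k: int) -> list[tuple[float, dict]]: ...


@runtime_checkable
class Reranker(Protocol):
    def rerank(self, query: str, passages: Sequence[str]) -> list[float]: ...


@runtime_checkable
class LLM(Protocol):
    def generate(self, prompt: str) -> str: ...


class CharCountEncoder:
    """Toy encoder: a 2-d vector of (length, vowel count) per text.
    It never inherits from Encoder; having `encode` is enough."""

    def encode(self, texts: Sequence[str]) -> list[list[float]]:
        return [[float(len(t)), float(sum(c in "aeiou" for c in t.lower()))] for t in texts]


enc = CharCountEncoder()
print(isinstance(enc, Encoder))  # True: it has an `encode` method
print(isinstance(enc, LLM))      # False: it has no `generate` method
print(enc.encode(["Qdrant"]))    # [[6.0, 1.0]]
```

`runtime_checkable` only checks that the methods exist, not their signatures; a static checker such as mypy (in the `dev` extra) is what enforces the signatures.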

## Features

| Stage | Default | Swappable for |
|---|---|---|
| **Ingest** | PDF (pypdf), Markdown (markdown-it-py) | Anything that yields `(text, metadata)` |
| **Chunk** | Recursive char splitter, ~512 tokens, 64-token overlap | Token-aware splitter, sentence splitter |
| **Embed** | `BAAI/bge-small-en-v1.5` (sentence-transformers) | Any sentence-transformers model |
| **Vector store** | Qdrant (embedded, no server required) | Qdrant remote, in-memory NumPy backend |
| **Rerank** | `BAAI/bge-reranker-base` | Any cross-encoder |
| **LLM** | Any HF causal LM | Same model, NF4-quantized via `turboquant-ml` |
| **Eval** | `context_recall`, `answer_relevance`, `faithfulness` | RAGAS, hand-rolled |
| **Serve** | FastAPI `/ingest`, `/ask`, `/eval` | — |
| **CLI** | `ragforge ingest / ask / eval / serve` | — |
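The default chunker is described as a recursive character splitter with overlap. Below is a sketch of that technique under stated assumptions: sizes are counted in characters (the table's ~512 *tokens* would need a tokenizer in place of `len`), and the separator order (paragraph, line, space, single character) is a common convention rather than something RAGforge documents. The function names are hypothetical.

```python
from __future__ import annotations

SEPARATORS = ("\n\n", "\n", " ", "")  # coarse to fine; "" means hard cut


def split_pieces(text: str, chunk_size: int, separators: tuple[str, ...] = SEPARATORS) -> list[str]:
    """Recursively split `text` into pieces no longer than `chunk_size`,
    trying coarse separators (paragraphs) before fine ones (characters)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, rest = separators[0], separators[1:]
    if sep == "":
        # Last resort: cut every `chunk_size` characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    parts = text.split(sep)
    # Re-attach the separator so that "".join(pieces) == text.
    parts = [p + sep for p in parts[:-1]] + [parts[-1]]
    pieces: list[str] = []
    for part in parts:
        if len(part) <= chunk_size:
            if part:
                pieces.append(part)
        else:
            pieces.extend(split_pieces(part, chunk_size, rest))
    return pieces


def recursive_char_chunks(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Greedily pack pieces into chunks of at most `chunk_size` characters,
    starting each new chunk with the last `overlap` characters of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    chunks: list[str] = []
    current = ""
    for piece in split_pieces(text, chunk_size):
        if current and len(current) + len(piece) > chunk_size:
            chunks.append(current)
            # Carry over the tail, trimmed so the next piece still fits.
            keep = min(overlap, chunk_size - len(piece))
            current = current[-keep:] if keep > 0 else ""
        current += piece
    if current:
        chunks.append(current)
    return chunks


chunks = recursive_char_chunks("aaaa bbbb cccc dddd", chunk_size=10, overlap=3)
print(chunks)  # ['aaaa bbbb ', 'bb cccc ', 'cc dddd']
```

Because the overlap is taken character-wise, a chunk can begin mid-word (`'bb '` above); a token-aware splitter avoids that at the cost of a tokenizer dependency.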

## Installation

The PyPI distribution is named **`ragforge-ml`** (the unsuffixed `ragforge`
name was taken by an unrelated project). The import package is `ragforge`,
and the CLI is available as both `ragforge` and `rf`:

```bash
pip install ragforge-ml                       # core
pip install "ragforge-ml[serve]"              # + FastAPI
pip install "ragforge-ml[eval]"               # + pandas, scikit-learn
pip install "ragforge-ml[quantized]"          # + turboquant-ml NF4 path
pip install "ragforge-ml[all]"                # everything
```
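Since the distribution name (`ragforge-ml`) and the import name (`ragforge`) differ, one simple way to see which extras are present is to look them up by distribution name with `importlib.metadata`. The extras-to-distributions map below is copied by hand from the `Provides-Extra` / `Requires-Dist` entries in the package metadata; the helper names are hypothetical, and the check confirms presence only, not the minimum versions.

```python
from importlib.metadata import PackageNotFoundError, version

# Hand-copied from the package metadata (Provides-Extra / Requires-Dist).
EXTRAS: dict[str, tuple[str, ...]] = {
    "serve": ("fastapi", "python-multipart", "uvicorn"),
    "eval": ("pandas", "scikit-learn"),
    "quantized": ("accelerate", "turboquant-ml"),
    "viz": ("matplotlib", "pandas"),
}


def is_installed(dist: str) -> bool:
    """True if a distribution with this PyPI name is installed."""
    try:
        version(dist)
    except PackageNotFoundError:
        return False
    return True


def installed_extras() -> dict[str, bool]:
    """Map each extra to whether all of its distributions are installed."""
    return {extra: all(is_installed(d) for d in dists) for extra, dists in EXTRAS.items()}


if __name__ == "__main__":
    print("ragforge-ml:", is_installed("ragforge-ml"))  # look up by PyPI name, not "ragforge"
    for extra, ok in installed_extras().items():
        print(f"  [{extra}] {'yes' if ok else 'no'}")
```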

## 60-second tour

```python
from ragforge import Pipeline

rag = Pipeline.from_defaults(model_id="Qwen/Qwen2.5-3B-Instruct")
rag.ingest(["docs/policy.pdf", "notes/onboarding.md"])

answer = rag.ask("What is the maximum reimbursable amount for client lunches?")
print(answer.text)
for src in answer.sources:
    print(f"  {src.score:.3f}  {src.metadata['path']}#chunk{src.metadata['chunk']}")
```

### CLI

```bash
rf ingest docs/ --collection company
rf ask "How do I rotate an API key?" --collection company --k 5
rf eval datasets/qa.jsonl --collection company --metrics context_recall,faithfulness
rf serve --collection company --host 0.0.0.0 --port 8080
```

### Quantized LLM via TurboQuant

```python
from ragforge import Pipeline
from ragforge.llm import QuantizedHFLLM

llm = QuantizedHFLLM("meta-llama/Llama-3.2-3B-Instruct", method="bnb-nf4")
rag = Pipeline.from_defaults(llm=llm)
```

## Architecture

```text
ragforge/
├── ingest/        # PDF + Markdown loaders, chunking
├── embed/         # sentence-transformers wrapper
├── vectorstore/   # Qdrant embedded + remote
├── rerank/        # bge-reranker-base
├── llm/           # HF causal LM + turboquant-ml integration
├── pipeline.py    # The end-to-end orchestrator
├── eval/          # context_recall, answer_relevance, faithfulness
├── serve/         # FastAPI app
└── cli.py         # ragforge / rf
```

Each module is short, readable, and replaceable through a small interface
(`Encoder`, `VectorStore`, `Reranker`, `LLM`). The pipeline calls them in
order — no DAG, no runnables, no callbacks.
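To make "calls them in order" concrete, here is a self-contained sketch of that control flow: embed the question, retrieve the top `k`, rerank, keep the best `top_n`, build a prompt, generate. Every name is hypothetical, and the bag-of-words encoder, word-overlap reranker, and echo "LLM" are toy stand-ins for bge-small-en-v1.5, bge-reranker-base, and a Hugging Face causal LM, chosen only to keep the example dependency-free.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


def encode(text: str) -> Counter[str]:
    """Toy bag-of-words 'embedding' standing in for a sentence-transformers model."""
    return Counter(text.lower().split())


def cosine(a: Counter[str], b: Counter[str]) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryStore:
    docs: list[tuple[Counter[str], str]]

    def search(self, query_vec: Counter[str], k: int) -> list[str]:
        ranked = sorted(self.docs, key=lambda d: cosine(query_vec, d[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def rerank(query: str, passages: list[str]) -> list[str]:
    """Toy cross-encoder: score = number of distinct query words in the passage."""
    q = set(query.lower().split())
    return sorted(passages, key=lambda p: len(q & set(p.lower().split())), reverse=True)


def generate(prompt: str) -> str:
    """Toy LLM: returns the first context line of the prompt."""
    return prompt.splitlines()[1]


def ask(question: str, store: InMemoryStore, k: int = 3, top_n: int = 1) -> str:
    # Fixed order, plain function calls: embed -> retrieve -> rerank -> generate.
    candidates = store.search(encode(question), k)
    context = rerank(question, candidates)[:top_n]
    prompt = "Context:\n" + "\n".join(context) + f"\nQuestion: {question}"
    return generate(prompt)


docs = [
    "api keys are rotated from the settings page",
    "lunch receipts go to the finance team",
    "the office opens at nine",
]
store = InMemoryStore([(encode(d), d) for d in docs])
answer = ask("how are api keys rotated", store)
print(answer)  # api keys are rotated from the settings page
```

Swapping a stage means passing a different object or function; the order of calls in `ask` never changes, which is what keeps the pipeline easy to step through in a debugger.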

## Eval harness

The reason RAGforge exists. Most RAG projects ship without measuring whether
their retrieval is any good. RAGforge ships three metrics that run locally
(no external API calls), all RAGAS-compatible:

| Metric | What it measures |
|---|---|
| **`context_recall`** | Of the gold-context tokens, what fraction were retrieved? |
| **`answer_relevance`** | Mean cosine similarity between the embedding of the original question and embeddings of synthetic questions back-generated from the answer (RAGAS recipe) |
| **`faithfulness`** | Fraction of answer claims that are entailed by the retrieved context (NLI-based, can fall back to embedding overlap) |
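
A sketch of the first two metrics as defined in the table, assuming lowercased `\w+` word tokens (each counted once) for `context_recall`, and precomputed embeddings for `answer_relevance` (the step where an LLM back-generates questions from the answer is assumed to have already run). Function names are hypothetical, and `faithfulness` is omitted because it needs an NLI model.

```python
from __future__ import annotations

import math
import re


def _tokens(text: str) -> set[str]:
    # Lowercased word tokens; an assumption, a real tokenizer may differ.
    return set(re.findall(r"\w+", text.lower()))


def context_recall(gold_context: str, retrieved: list[str]) -> float:
    """Fraction of distinct gold-context tokens that appear in any retrieved chunk."""
    gold = _tokens(gold_context)
    if not gold:
        return 0.0
    found = _tokens(" ".join(retrieved))
    return len(gold & found) / len(gold)


def _cos(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def answer_relevance(question_vec: list[float], generated_question_vecs: list[list[float]]) -> float:
    """Mean cosine between the original question's embedding and the embeddings
    of questions back-generated from the answer."""
    if not generated_question_vecs:
        return 0.0
    return sum(_cos(question_vec, g) for g in generated_question_vecs) / len(generated_question_vecs)


print(context_recall("keys rotate every 90 days", ["Rotate keys every 30 days."]))  # 0.8
print(answer_relevance([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # 0.5
```

In the first call, four of the five gold tokens (`keys`, `rotate`, `every`, `days`) are retrieved; `90` is not.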

```bash
rf eval datasets/qa.jsonl --collection company
```

```text
+------------------+--------+
| metric           |  mean  |
+------------------+--------+
| context_recall   |  0.84  |
| answer_relevance |  0.78  |
| faithfulness     |  0.91  |
+------------------+--------+
n=120  ·  latency_p50=620ms  ·  latency_p95=1.4s
```
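
The `latency_p50` / `latency_p95` figures are percentiles over per-question latencies. Here is a sketch using the nearest-rank definition; whether RAGforge uses nearest-rank or linear interpolation (the default in `numpy.percentile`) is not stated, so treat the method as an assumption. The latencies below are toy values.

```python
from __future__ import annotations

import math


def nearest_rank_percentile(samples_ms: list[float], p: float) -> float:
    """Smallest sample such that at least p% of all samples are <= it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies = [410.0, 520.0, 600.0, 640.0, 700.0, 900.0, 1100.0, 1300.0, 1400.0, 2000.0]
print(nearest_rank_percentile(latencies, 50))  # 700.0
print(nearest_rank_percentile(latencies, 95))  # 2000.0
```

With only 10 samples, p95 is simply the maximum; with n=120 as in the run above, nearest-rank p95 is the 114th-smallest latency, so a handful of slow queries can move it noticeably.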

## Roadmap

- [x] PDF + Markdown ingestion
- [x] Recursive char chunker with overlap
- [x] BGE embeddings + BGE reranker
- [x] Qdrant embedded + remote
- [x] FastAPI serve
- [x] CLI: ingest, ask, eval, serve
- [x] Eval: context_recall, answer_relevance, faithfulness
- [x] `turboquant-ml` integration for NF4 LLM serving
- [ ] Hybrid retrieval (BM25 + dense)
- [ ] Streaming generation in `/ask`
- [ ] Notion / Confluence loaders (community PRs welcome)
- [ ] SQL agent for structured-data questions

## Contributing

See [`docs/CONTRIBUTING.md`](docs/CONTRIBUTING.md).

```bash
git clone https://github.com/Ademo93/ragforge
cd ragforge
pip install -e ".[dev,serve,eval]"
pytest
```

## License

[MIT](LICENSE).
