Metadata-Version: 2.4
Name: ragrails
Version: 0.2.1
Summary: A modular RAG SDK for ingesting web, document, and API sources, chunking them, and storing embeddings in pluggable vector databases.
Project-URL: Homepage, https://github.com/samowolabi/ragrails
Project-URL: Repository, https://github.com/samowolabi/ragrails
Project-URL: Documentation, https://dev.ragrails.com
Author: Sam Owolabi
License: MIT License
        
        Copyright (c) 2026 Sam Owolabi
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: agentic-ai,ai,api-ingestion,chunking,document-ingestion,embeddings,llm,markdown,pinecone,qdrant,rag,retrieval-augmented-generation,semantic-search,vector-database,vector-search,weaviate,web-scraping
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Indexing
Requires-Python: >=3.10
Requires-Dist: click>=8.1.0
Requires-Dist: fastapi>=0.115
Requires-Dist: httpx>=0.27.2
Requires-Dist: langchain-text-splitters>=1.1.2
Requires-Dist: markitdown[docx,pdf]>=0.1.5
Requires-Dist: pymupdf4llm>=1.27.2.3
Requires-Dist: python-dotenv>=1.2.2
Requires-Dist: uvicorn>=0.34
Provides-Extra: all
Requires-Dist: anthropic>=0.97.0; extra == 'all'
Requires-Dist: click>=8.1.0; extra == 'all'
Requires-Dist: crawl4ai>=0.8.6; extra == 'all'
Requires-Dist: fastapi>=0.115; extra == 'all'
Requires-Dist: flagembedding>=1.4.0; extra == 'all'
Requires-Dist: httpx>=0.27.2; extra == 'all'
Requires-Dist: langchain-text-splitters>=1.1.2; extra == 'all'
Requires-Dist: markitdown[docx,pdf]>=0.1.5; extra == 'all'
Requires-Dist: openai>=2.32.0; extra == 'all'
Requires-Dist: pinecone>=9.0.0; extra == 'all'
Requires-Dist: pymupdf4llm>=1.27.2.3; extra == 'all'
Requires-Dist: qdrant-client>=1.17.1; extra == 'all'
Requires-Dist: sentence-transformers>=5.4.1; extra == 'all'
Requires-Dist: uvicorn>=0.34; extra == 'all'
Requires-Dist: voyageai>=0.3.7; extra == 'all'
Requires-Dist: weaviate-client>=4.20.5; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.97.0; extra == 'anthropic'
Provides-Extra: openai
Requires-Dist: openai>=2.32.0; extra == 'openai'
Provides-Extra: pinecone
Requires-Dist: pinecone>=9.0.0; extra == 'pinecone'
Provides-Extra: qdrant
Requires-Dist: qdrant-client>=1.17.1; extra == 'qdrant'
Provides-Extra: rerank
Requires-Dist: flagembedding>=1.4.0; extra == 'rerank'
Requires-Dist: sentence-transformers>=5.4.1; extra == 'rerank'
Provides-Extra: server
Requires-Dist: fastapi>=0.115; extra == 'server'
Requires-Dist: uvicorn>=0.34; extra == 'server'
Provides-Extra: server-pinecone
Requires-Dist: fastapi>=0.115; extra == 'server-pinecone'
Requires-Dist: openai>=2.32.0; extra == 'server-pinecone'
Requires-Dist: pinecone>=9.0.0; extra == 'server-pinecone'
Requires-Dist: uvicorn>=0.34; extra == 'server-pinecone'
Requires-Dist: voyageai>=0.3.7; extra == 'server-pinecone'
Provides-Extra: server-qdrant
Requires-Dist: fastapi>=0.115; extra == 'server-qdrant'
Requires-Dist: openai>=2.32.0; extra == 'server-qdrant'
Requires-Dist: qdrant-client>=1.17.1; extra == 'server-qdrant'
Requires-Dist: uvicorn>=0.34; extra == 'server-qdrant'
Requires-Dist: voyageai>=0.3.7; extra == 'server-qdrant'
Provides-Extra: server-weaviate
Requires-Dist: fastapi>=0.115; extra == 'server-weaviate'
Requires-Dist: openai>=2.32.0; extra == 'server-weaviate'
Requires-Dist: uvicorn>=0.34; extra == 'server-weaviate'
Requires-Dist: voyageai>=0.3.7; extra == 'server-weaviate'
Requires-Dist: weaviate-client>=4.20.5; extra == 'server-weaviate'
Provides-Extra: store-pinecone
Requires-Dist: langchain-text-splitters>=1.1.2; extra == 'store-pinecone'
Requires-Dist: pinecone>=9.0.0; extra == 'store-pinecone'
Requires-Dist: voyageai>=0.3.7; extra == 'store-pinecone'
Provides-Extra: store-qdrant
Requires-Dist: langchain-text-splitters>=1.1.2; extra == 'store-qdrant'
Requires-Dist: qdrant-client>=1.17.1; extra == 'store-qdrant'
Requires-Dist: voyageai>=0.3.7; extra == 'store-qdrant'
Provides-Extra: store-weaviate
Requires-Dist: langchain-text-splitters>=1.1.2; extra == 'store-weaviate'
Requires-Dist: voyageai>=0.3.7; extra == 'store-weaviate'
Requires-Dist: weaviate-client>=4.20.5; extra == 'store-weaviate'
Provides-Extra: url
Requires-Dist: crawl4ai>=0.8.6; extra == 'url'
Provides-Extra: voyage
Requires-Dist: voyageai>=0.3.7; extra == 'voyage'
Provides-Extra: weaviate
Requires-Dist: weaviate-client>=4.20.5; extra == 'weaviate'
Description-Content-Type: text/markdown

# Ragrails

[![PyPI](https://img.shields.io/pypi/v/ragrails)](https://pypi.org/project/ragrails/)
[![Python](https://img.shields.io/pypi/pyversions/ragrails)](https://pypi.org/project/ragrails/)
[![Downloads](https://static.pepy.tech/badge/ragrails)](https://pepy.tech/project/ragrails)
[![License](https://img.shields.io/pypi/l/ragrails)](LICENSE)

Ragrails is a modular RAG toolkit for turning URLs, local documents, and REST
API responses into retrieval-ready vector indexes.

It is organized in layers:

```text
core -> SDK -> CLI -> REST API
```

Most users should use the SDK, CLI, or REST API. The core modules are the shared
foundation that the public interfaces build on.

## Install

Ragrails requires Python 3.10 or newer.

```bash
pip install ragrails
```

The base install includes the SDK, CLI, REST API server, document ingestion, API
ingestion, chunking, embedding orchestration, vector storage orchestration,
retrieval, chat orchestration, and pipeline helpers.

Install extras for URL scraping, model providers, reranking, and vector database
clients:

| Need | Install |
|---|---|
| URL ingestion | `pip install "ragrails[url]"` |
| Store in Qdrant | `pip install "ragrails[store-qdrant]"` |
| Store in Pinecone | `pip install "ragrails[store-pinecone]"` |
| Store in Weaviate | `pip install "ragrails[store-weaviate]"` |
| REST API with Qdrant stack | `pip install "ragrails[server-qdrant]"` |
| REST API with Pinecone stack | `pip install "ragrails[server-pinecone]"` |
| REST API with Weaviate stack | `pip install "ragrails[server-weaviate]"` |
| Everything | `pip install "ragrails[all]"` |

Provider extras are also available separately:

| Provider | Install |
|---|---|
| Voyage embeddings | `pip install "ragrails[voyage]"` |
| Qdrant | `pip install "ragrails[qdrant]"` |
| Pinecone | `pip install "ragrails[pinecone]"` |
| Weaviate | `pip install "ragrails[weaviate]"` |
| OpenAI | `pip install "ragrails[openai]"` |
| Anthropic | `pip install "ragrails[anthropic]"` |
| Reranking | `pip install "ragrails[rerank]"` |

## SDK Quick Start

```python
from ragrails import RagRails

rag = RagRails()
```

### Ingest and Store

Run ingestion, chunking, embedding, and vector storage in one call:

```python
from ragrails import RagRails

rag = RagRails()

result = rag.ingest(
    markdown="# Guide\n\nRagrails builds modular RAG workflows.",
    embedding={
        "provider": "voyage",
        "model": "voyage-3",
    },
    storage={
        "vector_db": "qdrant",
        "collection": "docs",
        "url": "http://localhost:6333",
    },
)

print(result.stored)
```

You can also provide `docs`, `urls`, or `api` sources:

```python
rag.ingest(
    docs=["files/guide.pdf"],
    ingestion={"docs": {"mode": "single"}},
    embedding={"provider": "voyage", "model": "voyage-3"},
    storage={"vector_db": "qdrant", "collection": "docs"},
)
```

URL ingestion uses Playwright through `crawl4ai`. Install the URL extra and run
browser setup once in the target environment:

```bash
pip install "ragrails[url,voyage,qdrant]"
```

```python
RagRails().setup_url()
```

### Query

Run query embedding and retrieval in one call:

```python
result = rag.query(
    "What does the guide cover?",
    embedding={
        "provider": "voyage",
        "model": "voyage-3",
    },
    retrieval={
        "vector_db": "qdrant",
        "collection": "docs",
        "url": "http://localhost:6333",
        "top_k": 5,
    },
)

for chunk in result.items:
    print(chunk.text)
```

### Chat

Chat is stateless. Pass `history` explicitly and persist the returned
`result.history` in your app session.

```python
from ragrails import QueryRewriteConfig, RagRails

rag = RagRails()

llm = rag.llm(provider="openai", model="gpt-4.1-mini")
embedder = rag.embedder(provider="voyage", model="voyage-3", input_type="query")

history = []

result = rag.chat(
    "How do I authenticate?",
    llm=llm,
    embedder=embedder,
    vector_db="qdrant",
    collection="docs",
    url="http://localhost:6333",
    history=history,
    query_rewrite=QueryRewriteConfig(enabled=True),
)

print(result.answer)
history = result.history
```

### Edit and Delete Stored Chunks

Edit and delete operations are vector database agnostic at the SDK layer.

```python
edit_result = rag.edit(
    chunks=[
        {
            "id": "chunk-id",
            "text": "Updated chunk text",
            "source": "files/guide.pdf",
            "metadata": {"title": "Guide"},
        }
    ],
    embedder=rag.embedder(provider="voyage", model="voyage-3"),
    vector_db="qdrant",
    collection="docs",
    url="http://localhost:6333",
)

delete_result = rag.delete(
    ids=["chunk-id"],
    vector_db="qdrant",
    collection="docs",
    url="http://localhost:6333",
)
```

## CLI

Ragrails ships with a CLI for local workflows and smoke tests.

```bash
ragrails --help
```

Examples:

```bash
ragrails ingest \
  --markdown "# Guide\n\nUse Ragrails for RAG workflows." \
  --vector-db qdrant \
  --collection docs
```

```bash
ragrails query "What does the guide cover?" \
  --vector-db qdrant \
  --collection docs
```

CLI docs live in [ragrails/interfaces/cli/README.md](ragrails/interfaces/cli/README.md).

## REST API

Ragrails ships a FastAPI server on top of the SDK.

```bash
ragrails-api
```

Swagger UI is available at `http://127.0.0.1:8000/docs` when the server is
running. The OpenAPI schema is available at `/v1/openapi.json`.

REST docs live in
[ragrails/interfaces/server/README.md](ragrails/interfaces/server/README.md).

## Package Structure

```text
ragrails/
  core/
    stg_01_ingestors/
    stg_02_chunker/
    stg_03_embedder/
    stg_04_storing/
    stg_05_retriever/
    stg_06_chat/
  interfaces/
    sdk/
    cli/
    server/
```

## Interface Docs

| Interface | Docs |
|---|---|
| SDK chunking | [ragrails/interfaces/sdk/chunking/README.md](ragrails/interfaces/sdk/chunking/README.md) |
| SDK embedding | [ragrails/interfaces/sdk/embedding/README.md](ragrails/interfaces/sdk/embedding/README.md) |
| SDK storing | [ragrails/interfaces/sdk/storing/README.md](ragrails/interfaces/sdk/storing/README.md) |
| SDK retrieval | [ragrails/interfaces/sdk/retrieval/README.md](ragrails/interfaces/sdk/retrieval/README.md) |
| SDK chat | [ragrails/interfaces/sdk/chat/README.md](ragrails/interfaces/sdk/chat/README.md) |
| CLI | [ragrails/interfaces/cli/README.md](ragrails/interfaces/cli/README.md) |
| REST API | [ragrails/interfaces/server/README.md](ragrails/interfaces/server/README.md) |

Specialized SDK ingestion docs:

- [URL ingestion](ragrails/interfaces/sdk/ingestion/url/README.md)
- [Document ingestion](ragrails/interfaces/sdk/ingestion/docs/README.md)
- [API ingestion](ragrails/interfaces/sdk/ingestion/api/README.md)

## Development Checks

Run local interface checks:

```bash
scripts/test-core.sh
scripts/test-sdk.sh
scripts/test-cli.sh
scripts/test-rest.sh
```

The repo uses `.githooks/pre-push`, so `git push` runs the same checks and
blocks the push if any interface test fails.

Build and validate release artifacts:

```bash
uv build
uvx twine check dist/*
```

Publish only after the checks pass:

```bash
uv publish
```

## Status

The public SDK currently covers ingestion, chunking, embedding, vector storage,
retrieval, chat, vector edit/delete, and end-to-end ingestion/query pipeline
helpers. CLI and REST API interfaces are built on top of the SDK.
