Metadata-Version: 2.4
Name: retrivo
Version: 0.2.0
Summary: Store everything. Retrieve what matters.
Requires-Python: >=3.10
Requires-Dist: chromadb>=0.4.0
Requires-Dist: gradio>=4.0.0
Requires-Dist: langchain-chroma>=0.1.0
Requires-Dist: langchain-community>=0.3.0
Requires-Dist: langchain-core>=0.3.0
Requires-Dist: langchain-huggingface>=0.1.0
Requires-Dist: langchain-text-splitters>=0.3.0
Requires-Dist: langchain>=0.3.0
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: pdfplumber>=0.9.0
Requires-Dist: pypdf>=4.0.0
Requires-Dist: python-docx>=1.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: requests>=2.31.0
Requires-Dist: sentence-transformers>=2.6.0
Provides-Extra: anthropic
Requires-Dist: langchain-anthropic>=0.3.0; extra == 'anthropic'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: full
Requires-Dist: ebooklib>=0.18; extra == 'full'
Requires-Dist: html2text>=2020.1.16; extra == 'full'
Requires-Dist: pillow>=10.0.0; extra == 'full'
Requires-Dist: pymupdf>=1.23.0; extra == 'full'
Requires-Dist: pytesseract>=0.3.10; extra == 'full'
Requires-Dist: python-magic>=0.4.27; extra == 'full'
Requires-Dist: python-pptx>=0.6.21; extra == 'full'
Provides-Extra: openai
Requires-Dist: langchain-openai>=0.2.0; extra == 'openai'
Description-Content-Type: text/markdown

<div align="center">

# Retrivo

**Store everything. Retrieve what matters.**

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Python 3.10+](https://img.shields.io/badge/Python-3.10+-3776AB.svg)](https://python.org)

---

A RAG framework for intelligent document understanding.

Ingest any document. Build knowledge. Get answers.

</div>

---

## What is Retrivo?

Retrivo is a Retrieval-Augmented Generation (RAG) platform that turns your documents into a queryable knowledge base. Beyond plain text search, it handles tables, images, and charts, and tracks relationships across your entire document collection.

### Core Capabilities

- **Multi-Modal Intelligence** — First-class support for text, tables, images, and charts
- **Hybrid Search** — BM25 + vector search with reciprocal rank fusion
- **Knowledge Graphs** — Automatic entity and relationship extraction across documents
- **Smart Chunking** — Structure-aware and semantic boundary detection
- **Advanced Retrieval** — HyDE, cross-encoder reranking, MMR diversity
- **26+ File Formats** — PDF, DOCX, XLSX, PPTX, images, EPUB, and more
- **Online & Offline** — Works with cloud APIs or fully local models
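
To make the hybrid search bullet concrete, here is a minimal sketch of reciprocal rank fusion, the technique named above for merging BM25 and vector results. It is not Retrivo's own code: the function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` (a commonly used default) are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items ranked well by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties fall back to doc_id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a keyword (BM25) and a vector retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Because the fusion uses ranks only, the raw BM25 and cosine scores never need to be normalized against each other, which is the usual reason for choosing this method in a hybrid setup.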

### Architecture

```
Document → Ingest → Chunk → Embed → Store
                                       ↓
Query → Understand → Search → Rerank → Answer
```
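
The two rows of the diagram can be sketched as a toy in-memory pipeline. Everything here is a stand-in chosen for illustration, not Retrivo's API: `chunk` uses fixed-size word windows, `embed` is a bag-of-words counter instead of a learned model, and `ToyStore` is a hypothetical class. The sketch stops at the search step and leaves out query understanding, reranking, and answer generation.

```python
import math
from collections import Counter


def chunk(text, max_words=8):
    """Split text into fixed-size word windows (stand-in for smart chunking)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: lowercase bag-of-words counts, not a learned model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyStore:
    """Hypothetical in-memory store covering Ingest -> Chunk -> Embed -> Store."""

    def __init__(self):
        self.items = []  # list of (chunk_text, embedding) pairs

    def ingest(self, document):
        for piece in chunk(document):
            self.items.append((piece, embed(piece)))

    def search(self, query, top_k=2):
        """Query -> Search: rank stored chunks by similarity to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


store = ToyStore()
store.ingest("Invoices are due within thirty days of receipt. Late payments incur a two percent fee.")
results = store.search("late payments fee", top_k=1)
print(results)
```

In a real deployment each stand-in would be replaced by a component from the tech stack, for example a Sentence Transformers model for `embed` and ChromaDB for the store.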

### Tech Stack

| Layer | Technologies |
|-------|-------------|
| LLMs | OpenAI, Anthropic, Ollama, HuggingFace |
| Embeddings | OpenAI, Cohere, Sentence Transformers |
| Vector Stores | ChromaDB, FAISS |
| Search | BM25, Vector, Hybrid + Cross-Encoder Reranking |
| Documents | pdfplumber, PyMuPDF, camelot, python-docx |
| API | FastAPI |
| UI | Streamlit |

---

<div align="center">

**Something cool is coming.**

Stay tuned.

</div>
