Metadata-Version: 2.4
Name: treerag
Version: 0.1.1
Summary: Lightweight structural RAG library — index documents as trees, query without a vector DB.
Author: Sagar Jadhav
License: MIT
Project-URL: Homepage, https://github.com/jsagar783/treerag
Project-URL: Repository, https://github.com/jsagar783/treerag
Keywords: rag,document,indexing,llm,langchain,markdown,tree,structural-rag
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: langchain-core>=1.2.26
Requires-Dist: pymupdf>=1.27.2.2
Requires-Dist: python-docx>=1.0.0
Requires-Dist: requests>=2.33.0
Requires-Dist: beautifulsoup4>=4.12.0
Provides-Extra: openai
Requires-Dist: langchain-openai>=1.1.10; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: langchain-anthropic>=0.3.0; extra == "anthropic"
Provides-Extra: gemini
Requires-Dist: langchain-google-genai>=2.0.0; extra == "gemini"
Provides-Extra: ollama
Requires-Dist: langchain-ollama>=0.2.0; extra == "ollama"
Provides-Extra: all
Requires-Dist: langchain-openai>=1.1.10; extra == "all"
Requires-Dist: langchain-anthropic>=0.3.0; extra == "all"
Requires-Dist: langchain-google-genai>=2.0.0; extra == "all"
Requires-Dist: langchain-ollama>=0.2.0; extra == "all"
Dynamic: license-file

# treerag 🌳

**RAG without vector databases.**

A lightweight structural RAG library that indexes documents as hierarchical trees and retrieves answers using structure instead of embeddings.

Works with any LangChain-compatible LLM: **OpenAI, Anthropic, Gemini, Ollama**.

---

## 🚀 Why treerag?

- ❌ No vector database required  
- ⚡ Fast hierarchical retrieval  
- 🧠 Uses document structure instead of embeddings  
- 🔌 Works with any LangChain-compatible LLM  
- 🪶 Lightweight and easy to integrate  

---

## 📦 Install

```bash
# Base
pip install treerag

# With OpenAI
pip install "treerag[openai]"

# With Anthropic
pip install "treerag[anthropic]"

# Everything
pip install "treerag[all]"
```

---

## ⚡ Quick Start

```python
from treerag import index_document, make_summarizer, ask, make_retriever
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

# 1. Index document
doc = index_document("my_doc.md", summarizer=make_summarizer(llm))

# 2. Ask question
result = ask("What does this document cover?", doc, make_retriever(llm))

print(result.content)            # answer
print(result.references)         # sections used
print(result.response_metadata)  # token usage
```

---

## 🧠 How It Works

### Indexing

```
File / URL
    ↓ read_file()
    ↓ parse_sections()
    ↓ make_summarizer()
    ↓ build_hierarchy()
    ↓ flatten_tree()
    ↓ save_registry()
```
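
To make the parse and hierarchy steps concrete, here is a minimal sketch of how Markdown headings can be nested into a tree and then flattened into section paths. It is an illustration only, not treerag's implementation: `Node`, `build_tree`, and `flatten` are hypothetical names, and the sketch ignores edge cases such as `#` lines inside fenced code blocks.

```python
import re
from dataclasses import dataclass, field


# Hypothetical node type for illustration; not treerag's internal structure.
@dataclass
class Node:
    title: str
    level: int
    text: str = ""
    children: list["Node"] = field(default_factory=list)


HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def build_tree(markdown: str) -> Node:
    """Nest each Markdown section under its nearest shallower heading."""
    root = Node(title="(root)", level=0)
    stack = [root]  # path from the root to the section currently being filled
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match is None:
            # Body text belongs to the most recently opened section.
            stack[-1].text += line + "\n"
            continue
        node = Node(title=match.group(2).strip(), level=len(match.group(1)))
        # Pop until the top of the stack is shallower than the new heading.
        while stack[-1].level >= node.level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def flatten(node: Node, path: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return the title path of every section, depth first."""
    paths = []
    for child in node.children:
        child_path = path + (child.title,)
        paths.append(child_path)
        paths.extend(flatten(child, child_path))
    return paths
```

Calling `flatten(build_tree(text))` on a document yields one tuple per section, such as `("Guide", "Install")`, which is the kind of addressable outline a structural index can hand to an LLM.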

### Querying

```
Query
    ↓ tree search (LLM selects relevant nodes)
    ↓ fetch content
    ↓ answer generation
    ↓ AIMessage with references
```
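
The tree search step can be sketched as a level-by-level descent in which a selector decides which sections to open. The code below is a simplified illustration under stated assumptions, not treerag's actual retriever: `Section`, `tree_search`, and `keyword_selector` are hypothetical, and a keyword-overlap function stands in for the LLM call so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical section type; treerag's real node layout may differ.
@dataclass
class Section:
    title: str
    summary: str
    content: str = ""
    children: list["Section"] = field(default_factory=list)


# A selector takes the query and candidate sections and returns the ones to open.
Selector = Callable[[str, list[Section]], list[Section]]


def keyword_selector(query: str, candidates: list[Section]) -> list[Section]:
    """Stand-in for the LLM: keep sections whose summary shares a word with the query."""
    words = set(query.lower().split())
    return [s for s in candidates if words & set(s.summary.lower().split())]


def tree_search(query: str, root: Section, select: Selector) -> list[Section]:
    """Expand only the sections the selector picks; return the selected leaves."""
    hits = []
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for section in select(query, node.children):
            if section.children:
                frontier.append(section)  # descend further on the next pass
            else:
                hits.append(section)  # leaf: its content feeds the answer step
    return hits
```

Because only selected branches are expanded, the selector sees a handful of summaries per level rather than the whole document. Swapping `keyword_selector` for a function that prompts an LLM keeps the traversal unchanged.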

---

## 📂 Supported Inputs

```python
# Files
index_document("file.md")
index_document("report.pdf")
index_document("document.docx")
index_document("notes.txt")

# URLs
index_document("https://docs.example.com")
```

---

## 📊 Response Formats

```python
# Default (LangChain AIMessage)
result = ask("What is this?", doc, retriever)
print(result.content)
print(result.references)

# Plain dict
result = ask("What is this?", doc, retriever, return_raw=False)
print(result["answer"])

# Streaming
for chunk in ask("What is this?", doc, retriever, stream=True):
    if isinstance(chunk, dict):
        print(chunk["__references__"])
    else:
        print(chunk, end="")
```
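
When the streamed answer is needed as a single string, the chunks can be gathered with a small helper. This is a sketch that assumes, as in the streaming loop above, that text chunks are strings and that the references arrive in a dict under the `"__references__"` key; `collect_stream` is a hypothetical helper, not part of treerag.

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any]) -> tuple[str, list]:
    """Join streamed text chunks and pull out the references dict, if any."""
    parts = []
    references = []
    for chunk in chunks:
        if isinstance(chunk, dict):
            # Assumed shape: {"__references__": [...]}
            references = chunk.get("__references__", [])
        else:
            parts.append(str(chunk))
    return "".join(parts), references
```

Usage would look like `answer, refs = collect_stream(ask("What is this?", doc, retriever, stream=True))`.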

---

## ⚡ Async Support

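```python
import asyncio
from treerag import aask, make_async_retriever

async def main():
    # `llm` and `doc` are the objects created in Quick Start
    retriever = make_async_retriever(llm)
    result = await aask("What is this?", doc, retriever)
    print(result.content)

asyncio.run(main())
```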

---

## 📚 Multi-Document Q&A

```python
from treerag import ask_multi, get_document_by_id

# Placeholder IDs; use the doc_id values reported by list_documents()
doc1 = get_document_by_id("uuid-1")
doc2 = get_document_by_id("uuid-2")

result = ask_multi("What is the budget cap?", [doc1, doc2], retriever)
print(result.content)
print(result.references)
```

---

## 🗂️ Registry Management

```python
from treerag import list_documents, get_document_by_id, delete_document

for doc in list_documents():
    print(doc["name"], doc["doc_id"])

doc = get_document_by_id("uuid")
delete_document("uuid")
```

---

## 🎯 Custom Prompts

```python
summarizer = make_summarizer(
    llm,
    system_prompt="You are a legal expert. Summarize clauses and obligations."
)

retriever = make_retriever(
    llm,
    answer_system_prompt=(
        "You are a legal assistant. "
        "Answer using ONLY the provided context."
    )
)

result = ask(
    "What are the key obligations?",
    doc,
    retriever,
    extra_context="This is a legal agreement."
)
```

---

## 🔌 Supported Providers

```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_ollama import ChatOllama

# OpenAI
make_retriever(ChatOpenAI(model="gpt-4o"))

# Anthropic
make_retriever(ChatAnthropic(model="claude-3-5-haiku-latest"))

# Gemini
make_retriever(ChatGoogleGenerativeAI(model="gemini-2.0-flash"))

# Ollama (local)
make_retriever(ChatOllama(model="llama3"))
```

---

## 🏗️ Production Usage

```python
doc = index_document("file.md", persist=False)

# Store externally (`db` is a placeholder for your own storage layer)
doc_id = doc["doc_id"]
db.save(doc_id, doc)

# Load later with the same ID
doc = db.get(doc_id)

result = ask("Your question?", doc, retriever)
```
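
The `db` object above is whatever storage you already run. As a minimal sketch, a file-backed store with the same `save` and `get` calls could look like the following. It assumes the indexed document is a JSON-serializable dict, which the `doc["doc_id"]` access suggests but this README does not confirm; `JsonFileStore` is a hypothetical class, not part of treerag.

```python
import json
from pathlib import Path


class JsonFileStore:
    """Hypothetical stand-in for `db`: one JSON file per document ID."""

    def __init__(self, directory: str) -> None:
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def save(self, doc_id: str, doc: dict) -> None:
        # Assumes `doc` contains only JSON-serializable values.
        path = self.directory / f"{doc_id}.json"
        path.write_text(json.dumps(doc), encoding="utf-8")

    def get(self, doc_id: str) -> dict:
        path = self.directory / f"{doc_id}.json"
        return json.loads(path.read_text(encoding="utf-8"))
```

With `db = JsonFileStore("indexes")`, the snippet above runs unchanged, and the same interface can later be backed by a real database.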

---

## 📌 Example Use Cases

- 📄 Documentation Q&A  
- 📚 Internal knowledge base  
- 🤖 AI assistants without vector DB  
- 🧾 Legal / contract analysis  

---

## ⚔️ treerag vs Traditional RAG

| Feature        | treerag | Vector RAG |
|----------------|--------|-----------|
| Setup          | Simple | Complex   |
| DB required    | ❌     | ✅        |
| Cost           | Low    | High      |
| Retrieval      | Structure-based | Embeddings |

---

## 📄 License

MIT
