Metadata-Version: 2.4
Name: manasrag
Version: 0.1.1
Summary: Hierarchical Retrieval-Augmented Generation with Haystack
Project-URL: Homepage, https://github.com/yourusername/manasrag
Project-URL: Documentation, https://github.com/yourusername/manasrag#readme
Project-URL: Source, https://github.com/yourusername/manasrag
Author-email: Your Name <your.email@example.com>
License: MIT
Keywords: haystack,hierarchical,knowledge-graph,llm,rag
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Requires-Dist: click>=8.0
Requires-Dist: haystack-ai>=2.6
Requires-Dist: networkx
Requires-Dist: python-dotenv
Requires-Dist: python-louvain
Requires-Dist: pyyaml
Requires-Dist: tiktoken
Provides-Extra: all
Requires-Dist: boilerpy3; extra == 'all'
Requires-Dist: coverage[toml]; extra == 'all'
Requires-Dist: fastapi>=0.104.0; extra == 'all'
Requires-Dist: haystack-ai[openai]; extra == 'all'
Requires-Dist: kaleido>=0.2.0; extra == 'all'
Requires-Dist: markdown-it-py; extra == 'all'
Requires-Dist: neo4j>=5.0; extra == 'all'
Requires-Dist: openai>=1.0; extra == 'all'
Requires-Dist: openpyxl; extra == 'all'
Requires-Dist: plotly>=5.18.0; extra == 'all'
Requires-Dist: pypdf; extra == 'all'
Requires-Dist: pytest-asyncio>=0.21; extra == 'all'
Requires-Dist: pytest>=7.0; extra == 'all'
Requires-Dist: python-docx; extra == 'all'
Requires-Dist: pyvis>=0.3.2; extra == 'all'
Requires-Dist: scikit-learn; extra == 'all'
Requires-Dist: uvicorn[standard]>=0.24.0; extra == 'all'
Provides-Extra: api
Requires-Dist: fastapi>=0.104.0; extra == 'api'
Requires-Dist: uvicorn[standard]>=0.24.0; extra == 'api'
Provides-Extra: cli
Requires-Dist: boilerpy3; extra == 'cli'
Requires-Dist: markdown-it-py; extra == 'cli'
Requires-Dist: openpyxl; extra == 'cli'
Requires-Dist: pypdf; extra == 'cli'
Requires-Dist: python-docx; extra == 'cli'
Provides-Extra: dev
Requires-Dist: coverage[toml]; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Provides-Extra: neo4j
Requires-Dist: neo4j>=5.0; extra == 'neo4j'
Provides-Extra: openai
Requires-Dist: haystack-ai[openai]; extra == 'openai'
Requires-Dist: openai>=1.0; extra == 'openai'
Provides-Extra: scikit-learn
Requires-Dist: scikit-learn; extra == 'scikit-learn'
Provides-Extra: visualization
Requires-Dist: kaleido>=0.2.0; extra == 'visualization'
Requires-Dist: plotly>=5.18.0; extra == 'visualization'
Requires-Dist: pyvis>=0.3.2; extra == 'visualization'
Description-Content-Type: text/markdown

# ManasRAG

> Hierarchical Retrieval-Augmented Generation with Haystack

ManasRAG implements [HiRAG](https://github.com/hhy-huang/HiRAG), a hierarchical knowledge retrieval approach that combines knowledge graphs with community-based summarization, on top of the [Haystack](https://github.com/deepset-ai/haystack) framework.

## Features

- **Hierarchical Knowledge Structure**: Uses Leiden clustering to build multi-level community hierarchies
- **Multiple Retrieval Modes**:
  - `naive`: Basic RAG with document chunks
  - `local`: Local entity and relationship knowledge
  - `global`: Global community report knowledge
  - `bridge`: Cross-community reasoning paths
  - `nobridge`: Local + global combined (no paths)
  - `hi`: Full hierarchical retrieval combining all modes
- **Flexible Storage**: Supports NetworkX (in-memory) and Neo4j graph databases
- **Haystack Integration**: Built on Haystack's component and pipeline architecture
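The `bridge` mode is described as cross-community reasoning paths. The sketch below shows one way such a path could be found: a breadth-first shortest-path search over an undirected entity graph. The helper names (`build_graph`, `shortest_path`, `bridge_paths`) and the choice to chain the top key entity of each community to the next one are illustrative assumptions, not ManasRAG's actual implementation.

```python
from __future__ import annotations

from collections import defaultdict, deque


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected adjacency map from (entity, entity) pairs."""
    graph: dict[str, set[str]] = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def shortest_path(graph: dict[str, set[str]], source: str, target: str) -> list[str] | None:
    """Breadth-first search: the first time target is reached, the path is shortest."""
    if source == target:
        return [source]
    parents: dict[str, str | None] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue  # already visited
            parents[neighbor] = node
            if neighbor == target:
                # Walk the parent pointers back to the source, then reverse.
                path = [neighbor]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None  # target is not reachable from source


def bridge_paths(graph: dict[str, set[str]], key_entities_by_community: dict[int, list[str]]) -> list[list[str]]:
    """Hypothetical: link the top key entity of each community to the next community's."""
    anchors = [ents[0] for ents in key_entities_by_community.values() if ents]
    paths = []
    for a, b in zip(anchors, anchors[1:]):
        path = shortest_path(graph, a, b)
        if path is not None:
            paths.append(path)
    return paths
```

For example, with edges AI–ML, ML–Neural Networks and Neural Networks–Biological Neurons, the path from "Artificial Intelligence" to "Biological Neurons" passes through "Machine Learning" and "Neural Networks". BFS is enough here because the sketch treats edges as unweighted.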

## Installation

```bash
# Basic installation (run from the repository root)
pip install -e .

# With OpenAI support
pip install -e ".[openai]"

# With Neo4j support
pip install -e ".[neo4j]"

# All optional dependencies
pip install -e ".[all]"
```
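To see which optional extras are already available in an environment, a small standard-library check can look up each extra's modules. The extras and distributions come from the package metadata above; the import names are the usual ones for those distributions (for example `python-docx` imports as `docx` and `scikit-learn` as `sklearn`), and the `EXTRAS` table and `missing_modules` helper are hypothetical, not part of ManasRAG.

```python
import importlib.util

# Import names for each optional extra declared in the package metadata.
EXTRAS = {
    "openai": ["openai"],
    "neo4j": ["neo4j"],
    "api": ["fastapi", "uvicorn"],
    "cli": ["boilerpy3", "markdown_it", "openpyxl", "pypdf", "docx"],
    "visualization": ["plotly", "pyvis", "kaleido"],
    "scikit-learn": ["sklearn"],
    "dev": ["pytest", "pytest_asyncio", "coverage"],
}


def missing_modules(extra: str) -> list[str]:
    """Return the import names of an extra that cannot be found on sys.path."""
    return [name for name in EXTRAS[extra] if importlib.util.find_spec(name) is None]


for extra_name in list(EXTRAS):
    missing = missing_modules(extra_name)
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"[{extra_name}] {status}")
```

`find_spec` only locates modules without importing them, so the check is cheap and has no side effects.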

## Configuration

### Environment Variables

The project supports loading environment variables from a `.env` file. Copy the example file and configure it:

```bash
cp .env.example .env
```

Edit `.env` and add your API key:

```env
OPENAI_API_KEY=your-openai-api-key-here

# Optional: Custom API base URL
# OPENAI_BASE_URL=https://api.openai.com/v1
```

The examples will automatically load environment variables from the `.env` file.
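The project depends on `python-dotenv`, whose `load_dotenv()` reads `KEY=VALUE` lines into `os.environ` and, by default, does not override variables that are already set. The sketch below is a simplified standard-library stand-in that illustrates those semantics; `load_env_file` is a hypothetical helper, not python-dotenv, and it ignores features such as variable expansion and multi-line values.

```python
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: str | os.PathLike = ".env", override: bool = False) -> dict[str, str]:
    """Parse simple KEY=VALUE lines and export them to os.environ."""
    parsed: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed  # a missing .env file is not an error
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")  # drop surrounding quotes
        # Like python-dotenv's default, keep values already set in the environment.
        if override or key not in os.environ:
            os.environ[key] = value
        parsed[key] = value
    return parsed
```

Keeping existing variables by default lets a key exported in the shell take precedence over the one in `.env`.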

## Quick Start

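```python
from dotenv import load_dotenv
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret

from manasrag import ManasRAG

# Load OPENAI_API_KEY (and any other settings) from .env
load_dotenv()

# Initialize with OpenAI. Haystack generators take the API key as a Secret,
# not a plain string.
manas = ManasRAG(
    working_dir="./manas_data",
    generator=OpenAIGenerator(
        model="gpt-4o-mini",
        api_key=Secret.from_env_var("OPENAI_API_KEY"),
    ),
)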

# Index documents
documents = """
# Machine Learning

Machine Learning is a subset of Artificial Intelligence focused on
algorithms that can learn from data...

# Neural Networks

Neural networks are computing systems inspired by biological neurons...
"""

manas.index(documents)

# Query with different modes
result = manas.query(
    "How are neural networks related to machine learning?",
    mode="hi"  # Full hierarchical retrieval
)

print(result["answer"])
```

## Retrieval Modes

| Mode | Description | Components |
|------|-------------|------------|
| `naive` | Basic RAG | Document chunks only |
| `local` | Local knowledge | Entities + Relations + Chunks |
| `global` | Global knowledge | Community reports + Chunks |
| `bridge` | Bridge knowledge | Cross-community reasoning paths |
| `nobridge` | No-bridge | Local + Global (no paths) |
| `hi` | Full hierarchical | All components combined |
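The table can be read as a mapping from each mode to the kinds of retrieved material that go into the prompt. The sketch below encodes that mapping and joins the selected pieces into one context string; `MODE_COMPONENTS`, `build_context`, and the section header format are hypothetical, and ManasRAG's internals may assemble context differently.

```python
# Mirrors the Retrieval Modes table above.
MODE_COMPONENTS = {
    "naive": ("chunks",),
    "local": ("entities", "relations", "chunks"),
    "global": ("community_reports", "chunks"),
    "bridge": ("paths",),
    "nobridge": ("entities", "relations", "community_reports", "chunks"),
    "hi": ("entities", "relations", "community_reports", "paths", "chunks"),
}


def build_context(mode: str, retrieved: dict[str, list[str]]) -> str:
    """Join the retrieved pieces that a mode uses into a single prompt context."""
    if mode not in MODE_COMPONENTS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODE_COMPONENTS)}")
    sections = []
    for name in MODE_COMPONENTS[mode]:
        items = retrieved.get(name, [])
        if items:  # skip empty sections entirely
            sections.append(f"-----{name}-----\n" + "\n".join(items))
    return "\n\n".join(sections)
```

With this layout, `nobridge` and `hi` differ only in whether the `paths` section is included, which matches the table.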

## Advanced Usage

### Custom Query Parameters

```python
from manasrag import QueryParam

param = QueryParam(
    mode="hi",
    top_k=20,           # Number of entities to retrieve
    top_m=10,           # Key entities per community
    max_token_for_text_unit=20000,
    response_type="Multiple Paragraphs",
)

result = manas.query("Your query here", param=param)
```

### Using a Custom LLM

```python
from haystack.components.generators import HuggingFaceLocalGenerator

from manasrag import ManasRAG

# Runs the model locally; requires the transformers and torch packages
generator = HuggingFaceLocalGenerator(
    model="HuggingFaceH4/zephyr-7b-beta"
)

manas = ManasRAG(generator=generator)
```
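For offline tests it can help to swap in a deterministic fake. Haystack generators such as `OpenAIGenerator` return a dict with parallel `replies` and `meta` lists from `run()`, and the sketch below mimics that shape. `EchoGenerator` is hypothetical, and whether `ManasRAG` accepts a plain object like this (rather than a class decorated with Haystack's `@component` and `@component.output_types(replies=list[str], meta=list[dict])`) is an assumption.

```python
from __future__ import annotations


class EchoGenerator:
    """Offline stand-in that mimics the output shape of Haystack generators."""

    def __init__(self, prefix: str = "ECHO: ") -> None:
        self.prefix = prefix

    def run(self, prompt: str, generation_kwargs: dict | None = None) -> dict:
        # generation_kwargs is accepted for signature compatibility and ignored.
        lines = [line for line in prompt.splitlines() if line.strip()]
        last_line = lines[-1].strip() if lines else ""
        # Echo the last non-empty prompt line so test output is predictable.
        return {"replies": [self.prefix + last_line], "meta": [{"model": "echo"}]}
```

Usage (assumption): `ManasRAG(working_dir="./manas_test", generator=EchoGenerator())`.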

### Accessing Communities

```python
# After indexing, access detected communities
for comm_id, community in manas.communities.items():
    print(f"Community: {community.title}")
    print(f"Entities: {len(community.nodes)}")
    print(f"Report: {community.report_string[:200]}...")
```
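`top_m` is documented as the number of key entities per community. One simple way to pick them, assumed here rather than taken from ManasRAG, is to rank each community's members by their degree in the entity graph. The partition format `{entity: community_id}` matches what python-louvain's `best_partition` returns; `key_entities_per_community` is a hypothetical helper.

```python
from collections import defaultdict


def key_entities_per_community(
    edges: list[tuple[str, str]],
    partition: dict[str, int],
    top_m: int = 10,
) -> dict[int, list[str]]:
    """Return the top_m highest-degree entities of each community."""
    degree: dict[str, int] = defaultdict(int)
    for u, v in edges:  # degree counts all edges, including cross-community ones
        degree[u] += 1
        degree[v] += 1
    members: dict[int, list[str]] = defaultdict(list)
    for node, comm_id in partition.items():
        members[comm_id].append(node)
    # Sort by descending degree, breaking ties alphabetically for stable output.
    return {
        comm_id: sorted(nodes, key=lambda n: (-degree[n], n))[:top_m]
        for comm_id, nodes in members.items()
    }
```

Degree is only a cheap proxy for importance; other centrality measures could be substituted without changing the interface.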

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                        Indexing Pipeline                        │
├─────────────────────────────────────────────────────────────────┤
│  Documents → Splitter → EntityExtractor → GraphDocumentStore    │
│                                                   ↓             │
│                                           CommunityDetector     │
│                                                   ↓             │
│                                       CommunityReportGenerator  │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                         Query Pipeline                          │
├─────────────────────────────────────────────────────────────────┤
│  Query → EntityRetriever → HierarchicalRetriever                │
│                                      ↓                          │
│                               ContextBuilder                    │
│                                      ↓                          │
│                                PromptBuilder                    │
│                                      ↓                          │
│                                ChatGenerator → Answer           │
└─────────────────────────────────────────────────────────────────┘
```
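The indexing pipeline starts with a Splitter that cuts documents into chunks before entity extraction. A common approach is a fixed-size window with some overlap so that facts spanning a boundary appear in two chunks. The sketch below shows that technique; `split_into_chunks` and its default sizes are hypothetical. tiktoken is a listed dependency, so the real splitter may count model tokens, whereas this sketch uses whitespace-separated words to stay dependency-free.

```python
def split_into_chunks(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of chunk_size words, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    tokens = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks
```

For ten words with `chunk_size=4` and `overlap=1`, the windows start at words 0, 3 and 6, so each chunk repeats the last word of the one before it.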

## Project Structure

```
manasrag/
├── core/           # Core data structures
├── stores/         # Graph storage backends
├── components/     # Haystack components
├── pipelines/      # Indexing and query pipelines
└── __init__.py     # High-level API
```

## References

- [HiRAG Paper](https://arxiv.org/abs/2503.10150)
- [HiRAG GitHub](https://github.com/hhy-huang/HiRAG)
- [Haystack Documentation](https://docs.haystack.deepset.ai/)

## License

MIT

## Acknowledgments

Based on [HiRAG](https://github.com/hhy-huang/HiRAG) by Haoyu Huang et al.
