Metadata-Version: 2.4
Name: deeplightrag
Version: 1.0.16
Summary: DeepLightRAG: High-performance Document Indexing and Retrieval System (use with any LLM)
Author-email: Phuong Nguyen <nhphuong.code@gmail.com>
Maintainer-email: Phuong Nguyen <nhphuong.code@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/png261/DeepLightRag
Project-URL: Repository, https://github.com/png261/DeepLightRag
Project-URL: Bug Tracker, https://github.com/png261/DeepLightRag/issues
Project-URL: Changelog, https://github.com/png261/DeepLightRag/releases
Keywords: rag,retrieval,augmented,generation,ocr,vision,graph,nlp,llm,deepseek,document-processing
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24.0
Requires-Dist: networkx>=3.0
Requires-Dist: Pillow>=10.0.0
Requires-Dist: PyYAML>=6.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: typing-extensions>=4.0.0; python_version < "3.10"
Requires-Dist: pdf2image>=1.16.0
Requires-Dist: PyMuPDF>=1.23.0
Requires-Dist: easyocr>=1.7.0
Requires-Dist: torch>=2.0.0
Requires-Dist: torchvision>=0.15.0
Requires-Dist: transformers>=4.40.0
Requires-Dist: accelerate>=0.24.0
Requires-Dist: sentence-transformers>=2.2.0
Requires-Dist: setfit>=1.0.0
Requires-Dist: gliner>=0.1.12
Requires-Dist: faiss-cpu>=1.7.4
Requires-Dist: toon-python>=0.1.2
Provides-Extra: gpu
Requires-Dist: bitsandbytes>=0.41.0; extra == "gpu"
Provides-Extra: macos
Requires-Dist: mlx>=0.21.0; extra == "macos"
Requires-Dist: mlx-lm>=0.19.0; extra == "macos"
Requires-Dist: mlx-vlm>=0.0.3; extra == "macos"
Provides-Extra: llm
Requires-Dist: google-generativeai>=0.3.0; extra == "llm"
Requires-Dist: openai>=1.0.0; extra == "llm"
Requires-Dist: anthropic>=0.25.0; extra == "llm"
Provides-Extra: advanced-re
Requires-Dist: opennre>=1.1.0; extra == "advanced-re"
Provides-Extra: web
Requires-Dist: streamlit>=1.30.0; extra == "web"
Requires-Dist: plotly>=5.18.0; extra == "web"
Requires-Dist: pandas>=2.0.0; extra == "web"
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pytest-xdist>=3.3.0; extra == "dev"
Requires-Dist: pytest-timeout>=2.1.0; extra == "dev"
Requires-Dist: pytest-mock>=3.11.0; extra == "dev"
Requires-Dist: black>=23.9.0; extra == "dev"
Requires-Dist: ruff>=0.0.290; extra == "dev"
Requires-Dist: mypy>=1.5.0; extra == "dev"
Requires-Dist: pre-commit>=3.3.0; extra == "dev"
Requires-Dist: build>=0.10.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Requires-Dist: wheel>=0.41.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=7.1.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.3.0; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints>=1.24.0; extra == "docs"
Requires-Dist: myst-parser>=1.0.0; extra == "docs"
Provides-Extra: all
Requires-Dist: deeplightrag[advanced-re,dev,docs,gpu,llm,web]; extra == "all"
Dynamic: license-file

# DeepLightRAG

DeepLightRAG is a high-performance document indexing and retrieval system designed to work with any large language model (LLM). It uses a dual-layer graph architecture, with a Visual-Spatial layer and an Entity-Relationship layer, to provide context-aware, visually grounded retrieval.
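To make the dual-layer idea concrete, here is a minimal, hypothetical sketch of how the two layers and the links between them could fit together. It is not DeepLightRAG's internal data model. The package depends on networkx, while this sketch uses plain dictionaries. The `Region` fields, the vertical-gap adjacency rule, and the one-hop entity expansion in `retrieve` are all illustrative assumptions.

```python
from collections import defaultdict
from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """A visual region on a page, e.g. one OCR text block (hypothetical schema)."""
    region_id: str
    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward
    text: str


def vertical_gap(a: Region, b: Region) -> float:
    """Vertical distance between two boxes; 0 if they overlap vertically."""
    return max(0.0, max(a.bbox[1], b.bbox[1]) - min(a.bbox[3], b.bbox[3]))


@dataclass
class DualLayerGraph:
    max_gap: float = 10.0  # regions this close on the same page count as adjacent
    regions: dict[str, Region] = field(default_factory=dict)
    # Visual-Spatial layer: region_id -> spatially adjacent region_ids
    spatial: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))
    # Entity-Relationship layer: entity -> related entities
    relations: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))
    # Cross-layer links: entity -> region_ids whose text mentions it
    mentions: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))

    def add_region(self, region: Region, entities: Iterable[str] = ()) -> None:
        # Link the new region to nearby regions already on the same page.
        for other in self.regions.values():
            if other.page == region.page and vertical_gap(other, region) <= self.max_gap:
                self.spatial[region.region_id].add(other.region_id)
                self.spatial[other.region_id].add(region.region_id)
        self.regions[region.region_id] = region
        for entity in entities:
            self.mentions[entity].add(region.region_id)

    def add_relation(self, a: str, b: str) -> None:
        self.relations[a].add(b)
        self.relations[b].add(a)

    def retrieve(self, entity: str) -> list[Region]:
        # Entity layer: the query entity plus its directly related entities.
        entities = {entity} | self.relations.get(entity, set())
        # Cross-layer: regions that mention any of those entities.
        hits = set().union(*(self.mentions.get(e, set()) for e in entities))
        # Visual layer: add adjacent regions as layout context.
        context = set(hits)
        for rid in hits:
            context |= self.spatial.get(rid, set())
        # Reading order: page first, then top edge of the box.
        return sorted((self.regions[r] for r in context), key=lambda r: (r.page, r.bbox[1]))
```

A query for an entity therefore returns the regions that mention it (or a related entity) plus their on-page neighbours, each with a page number and bounding box, which is what lets a caller point back at the source layout rather than only quoting text.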

## Features

- **Dual-Layer Graph**: Combines visual layout awareness with semantic entity relationships.
- **Visually Grounded Retrieval**: Retrieves not just text but also the visual regions it came from and their spatial context.
- **Robust OCR**: Extracts text with DeepSeek-OCR and falls back to EasyOCR for reliability.
- **Advanced NER**: Uses GLiNER for zero-shot entity recognition.
- **Flexible LLM Support**: Compatible with OpenAI, Google Gemini, and Anthropic (client libraries installed via the `llm` extra), and with local LLMs via MLX or Ollama.
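The OCR feature describes a primary engine with a fallback. The sketch below shows that pattern in general form. It assumes both engines can be wrapped as callables that take an image path and return text; the `min_chars` threshold and the conditions that trigger the fallback are illustrative assumptions, not DeepLightRAG's actual logic.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

# An OCR engine is modelled as any callable from an image path to extracted text.
OcrEngine = Callable[[str], str]


def ocr_with_fallback(
    image_path: str,
    primary: OcrEngine,
    fallback: OcrEngine,
    min_chars: int = 1,
) -> tuple[str, str]:
    """Run the primary engine; use the fallback if it fails or returns too little text.

    Returns (text, engine_used). Errors raised by the fallback propagate to the caller.
    """
    try:
        text = primary(image_path)
        if len(text.strip()) >= min_chars:
            return text, "primary"
        logger.warning("Primary OCR returned %d chars; using fallback", len(text.strip()))
    except Exception as exc:  # e.g. a model that failed to load or ran out of memory
        logger.warning("Primary OCR failed (%s); using fallback", exc)
    return fallback(image_path), "fallback"
```

With EasyOCR as the fallback, `fallback` could wrap `reader.readtext(path, detail=0)` (which returns a list of strings) and join the results, where `reader = easyocr.Reader(["en"])` is created once and reused so its models are not reloaded on every call.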

## Installation

### Standard Installation
```bash
pip install deeplightrag
```

### With GPU Support (NVIDIA CUDA)
Installs `bitsandbytes` for 4-bit/8-bit quantization, which reduces model memory use on NVIDIA GPUs:
```bash
pip install "deeplightrag[gpu]"
```

### For macOS (Apple Silicon)
Installs the MLX backends (`mlx`, `mlx-lm`, `mlx-vlm`) for Apple Silicon (M1/M2/M3) chips:
```bash
pip install "deeplightrag[macos]"
```
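If you are unsure which of the commands above fits a machine, a small heuristic helper can suggest one. This is a sketch of our own, not part of the package: it treats an `arm64` macOS host as Apple Silicon and the presence of `nvidia-smi` on `PATH` as a sign of an NVIDIA GPU, both of which are assumptions (for example, a Python running under Rosetta reports `x86_64`).

```python
import platform
import shutil
from typing import Optional


def suggested_install(
    system: Optional[str] = None,
    machine: Optional[str] = None,
    has_nvidia_gpu: Optional[bool] = None,
) -> str:
    """Suggest a pip command for this machine (heuristic; arguments override detection)."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if has_nvidia_gpu is None:
        # Heuristic: an NVIDIA driver install normally puts nvidia-smi on PATH.
        has_nvidia_gpu = shutil.which("nvidia-smi") is not None
    if system == "Darwin" and machine == "arm64":
        return 'pip install "deeplightrag[macos]"'
    if has_nvidia_gpu:
        return 'pip install "deeplightrag[gpu]"'
    return "pip install deeplightrag"


if __name__ == "__main__":
    print(suggested_install())
```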

## Usage

### Command Line Interface

Index a document:
```bash
# Basic usage
deeplightrag index document.pdf

# With custom configuration
deeplightrag index document.pdf --config config.yaml
```

Retrieve information:
```bash
deeplightrag retrieve "What is the main topic?" --config config.yaml
```
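To drive the CLI from a script, for example to index a folder of PDFs, the commands above can be built as argument lists and run with `subprocess`. This sketch assumes the CLI accepts exactly the arguments shown above and is on `PATH`; where the CLI persists its index between runs is not described here. The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def index_command(pdf: Path, config: Optional[Path] = None) -> list[str]:
    """argv for `deeplightrag index <pdf> [--config <file>]`."""
    cmd = ["deeplightrag", "index", str(pdf)]
    if config is not None:
        cmd += ["--config", str(config)]
    return cmd


def retrieve_command(query: str, config: Optional[Path] = None) -> list[str]:
    """argv for `deeplightrag retrieve "<query>" [--config <file>]`."""
    cmd = ["deeplightrag", "retrieve", query]
    if config is not None:
        cmd += ["--config", str(config)]
    return cmd


def index_folder(folder: Path, config: Optional[Path] = None, dry_run: bool = True) -> list[list[str]]:
    """Build one index command per PDF in `folder`; run them unless dry_run is True."""
    commands = [index_command(pdf, config) for pdf in sorted(folder.glob("*.pdf"))]
    if not dry_run:
        if shutil.which("deeplightrag") is None:
            raise FileNotFoundError("deeplightrag CLI not found on PATH")
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

Passing an argument list instead of a shell string means a query containing spaces or quotes reaches the CLI as a single argument, with no shell quoting to get wrong.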

### Configuration File (config.yaml)

You can customize the model and system behavior using a YAML file:

```yaml
ocr:
  model_name: "deepseek-ai/deepseek-ocr"
  # Override automatic MLX selection (useful for some models)
  use_mlx: false
  resolution: "base"

retrieval:
  top_k: 5
  rerank: true
```
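A common pattern is to keep shared settings in `config.yaml` and override a few keys per run. The sketch below layers overrides onto a loaded file with a recursive merge. It assumes the Python constructor accepts the same nested keys as the YAML file, as the `{"ocr": {"use_mlx": True}}` dict in the Python API example suggests; whether DeepLightRAG itself fills omitted keys from its own defaults is not stated here. `deep_merge` and `load_config` are hypothetical helpers.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `override` applied on top of `base`, merging nested sections."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def load_config(path: str) -> dict[str, Any]:
    """Read a YAML config file such as config.yaml (an empty file yields {})."""
    import yaml  # PyYAML, already a deeplightrag dependency

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh) or {}
```

For example, `deep_merge(load_config("config.yaml"), {"retrieval": {"top_k": 10}})` keeps `rerank: true` and every `ocr` setting from the file while raising `top_k` to 10.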

### Python API

```python
from deeplightrag.core import DeepLightRAG

# Initialize; this config forces MLX (Apple Silicon). Omit the use_mlx override to keep automatic selection.
rag = DeepLightRAG(config={"ocr": {"use_mlx": True}})

# Index
rag.index_document("research_paper.pdf", document_id="doc_001")

# Retrieve
result = rag.retrieve("Summarize the methodology")
print(result)
```
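When indexing many files, each call needs its own `document_id`. The sketch below derives IDs from file names and disambiguates duplicates. It relies only on the `index_document(path, document_id=...)` call shown above and assumes IDs just need to be unique strings; `index_many` and `SupportsIndexing` are hypothetical names.

```python
from collections.abc import Iterable
from pathlib import Path
from typing import Any, Protocol


class SupportsIndexing(Protocol):
    """Anything with the index_document(path, document_id=...) method used above."""

    def index_document(self, path: str, document_id: str) -> Any: ...


def index_many(rag: SupportsIndexing, paths: Iterable[str]) -> dict[str, str]:
    """Index files under IDs derived from their names; return {document_id: path}."""
    indexed: dict[str, str] = {}
    for path in paths:
        stem = Path(path).stem
        doc_id, n = stem, 2
        while doc_id in indexed:  # same file name in different folders
            doc_id = f"{stem}_{n}"
            n += 1
        rag.index_document(path, document_id=doc_id)
        indexed[doc_id] = path
    return indexed
```

With the real class, this would be called as `index_many(rag, [str(p) for p in Path("papers").glob("*.pdf")])` after constructing `rag` as above, followed by `rag.retrieve(...)`. The returned mapping records which ID belongs to which file.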

## License

MIT License
