Metadata-Version: 2.4
Name: deeplightrag
Version: 1.0.22
Summary: DeepLightRAG: High-performance Document Indexing and Retrieval System (use with any LLM)
Author-email: Phuong Nguyen <nhphuong.code@gmail.com>
Maintainer-email: Phuong Nguyen <nhphuong.code@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/png261/DeepLightRag
Project-URL: Repository, https://github.com/png261/DeepLightRag
Project-URL: Bug Tracker, https://github.com/png261/DeepLightRag/issues
Project-URL: Changelog, https://github.com/png261/DeepLightRag/releases
Keywords: rag,retrieval,augmented,generation,ocr,vision,graph,nlp,llm,deepseek,document-processing
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24.0
Requires-Dist: networkx>=3.0
Requires-Dist: Pillow>=10.0.0
Requires-Dist: PyYAML>=6.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: typing-extensions>=4.0.0; python_version < "3.10"
Requires-Dist: pdf2image>=1.16.0
Requires-Dist: PyMuPDF>=1.23.0
Requires-Dist: easyocr>=1.7.0
Requires-Dist: torch>=2.0.0
Requires-Dist: torchvision>=0.15.0
Requires-Dist: transformers>=4.40.0
Requires-Dist: accelerate>=0.24.0
Requires-Dist: sentence-transformers>=2.2.0
Requires-Dist: setfit>=1.0.0
Requires-Dist: gliner>=0.1.12
Requires-Dist: faiss-cpu>=1.7.4
Requires-Dist: toon-python>=0.1.2
Provides-Extra: gpu
Requires-Dist: bitsandbytes>=0.41.0; extra == "gpu"
Provides-Extra: macos
Requires-Dist: mlx>=0.21.0; extra == "macos"
Requires-Dist: mlx-lm>=0.19.0; extra == "macos"
Requires-Dist: mlx-vlm>=0.0.3; extra == "macos"
Provides-Extra: llm
Requires-Dist: google-generativeai>=0.3.0; extra == "llm"
Requires-Dist: openai>=1.0.0; extra == "llm"
Requires-Dist: anthropic>=0.25.0; extra == "llm"
Provides-Extra: advanced-re
Requires-Dist: opennre>=1.1.0; extra == "advanced-re"
Provides-Extra: web
Requires-Dist: streamlit>=1.30.0; extra == "web"
Requires-Dist: plotly>=5.18.0; extra == "web"
Requires-Dist: pandas>=2.0.0; extra == "web"
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pytest-xdist>=3.3.0; extra == "dev"
Requires-Dist: pytest-timeout>=2.1.0; extra == "dev"
Requires-Dist: pytest-mock>=3.11.0; extra == "dev"
Requires-Dist: black>=23.9.0; extra == "dev"
Requires-Dist: ruff>=0.0.290; extra == "dev"
Requires-Dist: mypy>=1.5.0; extra == "dev"
Requires-Dist: pre-commit>=3.3.0; extra == "dev"
Requires-Dist: build>=0.10.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Requires-Dist: wheel>=0.41.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=7.1.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.3.0; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints>=1.24.0; extra == "docs"
Requires-Dist: myst-parser>=1.0.0; extra == "docs"
Provides-Extra: all
Requires-Dist: deeplightrag[advanced-re,dev,docs,gpu,llm,web]; extra == "all"
Dynamic: license-file

# DeepLightRAG

[![PyPI version](https://badge.fury.io/py/deeplightrag.svg)](https://badge.fury.io/py/deeplightrag)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

DeepLightRAG is a high-performance document indexing and retrieval system for retrieval-augmented generation (RAG). It works with any LLM.

## Features

- **Dual-Layer Graph**: A Visual-Spatial layer (where content sits in the document) combined with an Entity-Relationship layer (what the content says)
- **GLiNER2 NER**: Zero-shot entity extraction with [fastino/gliner2-base-v1](https://github.com/fastino-ai/GLiNER2)
- **DeepSeek-OCR**: Visual token compression for efficient document processing
- **TOON Format**: Token-efficient context formatting via [toon-python](https://pypi.org/project/toon-python/)
- **Any LLM**: Works with OpenAI, Gemini, Claude, Ollama, MLX

## Installation

```bash
pip install deeplightrag
```

**With GPU (CUDA):**
```bash
pip install "deeplightrag[gpu]"
```

**macOS (Apple Silicon):**
```bash
pip install "deeplightrag[macos]"
```

## Quick Start

### Python API

```python
from deeplightrag import DeepLightRAG

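# Initialize with default settings
rag = DeepLightRAG()

# Index a document under an ID of your choosing
rag.index_document("document.pdf", document_id="doc_001")

# Retrieve context for a question; the text is returned under the "context" key
result = rag.retrieve("What are the key findings?")
print(result["context"])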
```
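
Since the retrieved context is meant to be used with any LLM, a common next step is to wrap it in a prompt. The sketch below is one minimal way to do that. `build_prompt` is a hypothetical helper, not part of the DeepLightRAG API, and the `result` dictionary is a stand-in that assumes only the `"context"` key shown above.

```python
def build_prompt(context: str, question: str) -> str:
    """Combine retrieved context and a question into one prompt string.

    Hypothetical helper for illustration; DeepLightRAG does not ship it.
    """
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context.strip()}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


# Stand-in for the value returned by rag.retrieve(...).
# Only the "context" key is assumed here.
result = {"context": "Revenue grew 12% year over year."}

prompt = build_prompt(result["context"], "What are the key findings?")
print(prompt)
```

The resulting string can be sent to whichever client you already use (OpenAI, Gemini, Claude, Ollama, MLX); the call itself depends on that client's own API.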

### CLI

```bash
# Index
deeplightrag index document.pdf

# Query
deeplightrag retrieve "What is the main topic?"
```

## Architecture

```
┌─────────────────────────────────────────────┐
│              DeepLightRAG                   │
├─────────────────────────────────────────────┤
│  DeepSeek-OCR → Visual Token Compression    │
├─────────────────────────────────────────────┤
│  Dual-Layer Graph                           │
│  ├── Layer 1: Visual-Spatial (WHERE)        │
│  └── Layer 2: Entity-Relationship (WHAT)    │
├─────────────────────────────────────────────┤
│  GLiNER2 → Entity Extraction                │
├─────────────────────────────────────────────┤
│  Adaptive Retriever → Context Generation    │
└─────────────────────────────────────────────┘
```
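
To make the two layers concrete, here is a small illustrative data model. It is a sketch under stated assumptions, not the library's internal representation: the class names `Region`, `Entity`, and `DualLayerGraph`, all of their fields, and the `where` method are hypothetical. It only shows the idea of linking "what" nodes (entities) to "where" nodes (page regions).

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Layer 1 node (hypothetical): a block on a page and its position."""
    region_id: str
    page: int
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates


@dataclass
class Entity:
    """Layer 2 node (hypothetical): something mentioned in the text."""
    name: str
    label: str
    region_ids: list = field(default_factory=list)  # cross-layer links


@dataclass
class DualLayerGraph:
    """Toy container holding both layers plus entity-to-entity relations."""
    regions: dict = field(default_factory=dict)
    entities: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # (source, relation, target)

    def add_region(self, region: Region) -> None:
        self.regions[region.region_id] = region

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def where(self, name: str) -> list:
        """Return the regions in which the named entity appears."""
        return [self.regions[rid] for rid in self.entities[name].region_ids]


graph = DualLayerGraph()
graph.add_region(Region("p1_b0", page=1, bbox=(72, 90, 540, 130)))
graph.add_region(Region("p2_b3", page=2, bbox=(72, 300, 540, 420)))
graph.add_entity(Entity("Acme Corp", "organization", ["p1_b0", "p2_b3"]))
graph.add_entity(Entity("Q3 report", "document", ["p1_b0"]))
graph.relations.append(("Acme Corp", "published", "Q3 report"))

# Pages on which "Acme Corp" is mentioned, found via the cross-layer links
pages = [region.page for region in graph.where("Acme Corp")]
print(pages)
```

Keeping the spatial and semantic nodes separate lets a retriever answer both "what is related to X" (Layer 2 relations) and "where does X appear" (links into Layer 1) from the same structure.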

## Configuration

```yaml
# config.yaml
ocr:
  model_name: "deepseek-ai/DeepSeek-OCR"
  resolution: "gundam"

ner:
  model_name: "fastino/gliner2-base-v1"
  confidence_threshold: 0.3

retrieval:
  top_k: 5
```
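
Before handing a configuration to the library, it can help to sanity-check the values. The sketch below validates a dictionary shaped like the YAML above, which is what `yaml.safe_load` from PyYAML would return for that file. `validate_config` is a hypothetical helper, and the accepted ranges (a threshold between 0 and 1, a positive integer `top_k`) are assumptions of this sketch rather than documented limits.

```python
def validate_config(config: dict) -> list:
    """Return a list of problems found; an empty list means all checks passed.

    Hypothetical helper; the ranges checked here are assumptions.
    """
    problems = []

    # Both model sections are expected to name a model
    for section in ("ocr", "ner"):
        name = config.get(section, {}).get("model_name")
        if not isinstance(name, str) or not name:
            problems.append(f"{section}.model_name should be a non-empty string")

    # Assumed range: a confidence score between 0 and 1
    threshold = config.get("ner", {}).get("confidence_threshold")
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        problems.append("ner.confidence_threshold should be a number")
    elif not 0.0 <= threshold <= 1.0:
        problems.append("ner.confidence_threshold should be between 0 and 1")

    # Assumed constraint: retrieve at least one result
    top_k = config.get("retrieval", {}).get("top_k")
    if isinstance(top_k, bool) or not isinstance(top_k, int) or top_k < 1:
        problems.append("retrieval.top_k should be a positive integer")

    return problems


# Same values as the config.yaml example
config = {
    "ocr": {"model_name": "deepseek-ai/DeepSeek-OCR", "resolution": "gundam"},
    "ner": {"model_name": "fastino/gliner2-base-v1", "confidence_threshold": 0.3},
    "retrieval": {"top_k": 5},
}

problems = validate_config(config)
print(problems if problems else "config looks consistent")
```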

## Requirements

- Python >= 3.9
- CUDA GPU recommended for DeepSeek-OCR
- Core dependencies include `gliner`, `toon-python`, `networkx`, and `torch`; the package metadata lists the full set

## License

MIT License
