Metadata-Version: 2.4
Name: visual-parser
Version: 1.0.2
Summary: Standalone Visual-RAG PDF Parser — text extraction + Vision-LLM figure descriptions → JSONL
License: MIT
Project-URL: Homepage, https://github.com/SmartLabNuclear/RADIANT_LLM
Project-URL: Repository, https://github.com/SmartLabNuclear/RADIANT_LLM
Project-URL: Docker Hub, https://hub.docker.com/r/zev94/radiant-llm
Keywords: pdf,rag,nougat,vision-llm,ocr,document-parsing,jsonl,knowledge-base
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Markup
Classifier: Intended Audience :: Science/Research
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: PyMuPDF==1.24.7
Requires-Dist: Pillow==11.1.0
Requires-Dist: torch==2.7.0
Requires-Dist: transformers==4.45.2
Requires-Dist: huggingface-hub==0.36.0
Requires-Dist: langchain-community==0.3.3
Requires-Dist: langchain==0.3.13
Requires-Dist: langchain-text-splitters==0.3.4
Requires-Dist: openai==1.78.1
Requires-Dist: google-generativeai==0.8.5
Requires-Dist: python-dotenv==1.1.0
Requires-Dist: tqdm==4.67.1
Requires-Dist: nltk>=3.8
Requires-Dist: python-Levenshtein>=0.20
Provides-Extra: ocr
Requires-Dist: pytesseract==0.3.13; extra == "ocr"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: mypy; extra == "dev"

# visual-parser (Standalone Visual-RAG PDF Ingestion)

<!-- ![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg) -->
![Python 3.12.10](https://img.shields.io/badge/Python-3.12.10-brightgreen.svg)
<!-- ![PyTorch](https://img.shields.io/badge/PyTorch-2.0%2B-ee4c2c.svg) -->

`visual-parser` is a standalone document-ingestion tool that converts PDFs into a multi-modal JSONL knowledge base (text chunks + figure descriptions + metadata). The intended workflow is:

1) Run `visual-parser` on curated PDFs to generate JSONL KB files.
2) Run RADIANT-LLM Visual-RAG for QA over the generated KB.

## Outputs (JSONL KB)

By default, the pipeline writes:
- `01_chunks_kb.jsonl`: chunked text extracted from PDFs (Nougat by default).
- `02_visuals_kb.jsonl`: figure/page visual descriptions (Vision LLM).
- `03_metadata_kb.jsonl`: document metadata rows (title/author/etc.).
- `04_processed_pdfs.txt`: a tracker of already-processed PDFs, so re-runs handle only new files (unless `--rebuild` is passed).
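
A quick way to sanity-check a run is to count the records in each KB file. The following is a minimal sketch, assuming each `.jsonl` file holds one JSON object per line (the usual JSONL convention). The helper names `iter_jsonl` and `summarize_kb` are hypothetical and not part of `visual-parser`, and no record field names are assumed.

```python
import json
from pathlib import Path
from typing import Iterator

# File names as listed in this README.
KB_FILES = ["01_chunks_kb.jsonl", "02_visuals_kb.jsonl", "03_metadata_kb.jsonl"]


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize_kb(output_dir: Path) -> dict[str, int]:
    """Count records in each KB file that exists in output_dir."""
    return {
        name: sum(1 for _ in iter_jsonl(output_dir / name))
        for name in KB_FILES
        if (output_dir / name).exists()
    }
```

Calling `summarize_kb(Path("/path/to/out"))` returns a mapping from file name to record count; files that were not written are simply left out.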

## API keys (`.env`)

Set an API key for at least one provider in `.env`:
- `OPENAI_API_KEY` (OpenAI)
- `GEMINI_API_KEY` (Gemini)

Optional:
- `HF_TOKEN` (if you use gated Hugging Face models)
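
Before starting a long run, it can help to confirm that `.env` actually contains a usable key. The sketch below is a hypothetical pre-flight check, not part of `visual-parser`. It assumes `.env` uses plain `KEY=VALUE` lines with `#` comments, and it assumes `OPENAI_API_KEY` pairs with `--vision-provider gpt` and `GEMINI_API_KEY` with `--vision-provider gemini`.

```python
from pathlib import Path

# Key names are from this README; pairing them with the
# --vision-provider values "gpt" and "gemini" is an assumption.
PROVIDER_KEYS = {"gpt": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def configured_providers(env: dict[str, str]) -> list[str]:
    """Return the providers whose API key is present and non-empty."""
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]
```

If `configured_providers(read_env_file(Path(".env")))` returns an empty list, neither provider key is set.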

## Run with Docker (Docker Hub)

Prebuilt images are on **[zev94/radiant-llm](https://hub.docker.com/r/zev94/radiant-llm)** under the **visual-parser** tags:

| Tag | Description |
|-----|-------------|
| `visual-parser-1.0` | Pinned release |
| `visual-parser-latest` | Latest visual-parser build |

### 1) Install Docker
- Docker Desktop (Windows/macOS) or Docker Engine (Linux)

### 2) Pull the image
```bash
docker pull zev94/radiant-llm:visual-parser-1.0
```

### 3) Run (input + output on the same mounted folder)
Windows PowerShell:
```powershell
docker run --rm --env-file .env `
  -v "C:\path\to\pdfs:/data" `
  zev94/radiant-llm:visual-parser-1.0 `
  --input-dir /data --output-dir /data
```

Linux / macOS / WSL:
```bash
docker run --rm --env-file .env \
  -v "/path/to/pdfs:/data" \
  zev94/radiant-llm:visual-parser-1.0 \
  --input-dir /data --output-dir /data
```

### 4) Run (separate output directory)
Windows PowerShell:
```powershell
docker run --rm --env-file .env `
  -v "C:\path\to\pdfs:/data" `
  -v "C:\path\to\out:/out" `
  zev94/radiant-llm:visual-parser-1.0 `
  --input-dir /data --output-dir /out
```
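
For scripted or batch runs, the `docker run` invocations above can be assembled programmatically. The sketch below is a hypothetical helper (`docker_command` is not part of `visual-parser`); it uses only the image tag, flags, and container mount points (`/data`, `/out`) shown in this README.

```python
from typing import Optional, Sequence

# Image tag taken from the pull command in this README.
IMAGE = "zev94/radiant-llm:visual-parser-1.0"


def docker_command(
    pdf_dir: str,
    out_dir: Optional[str] = None,
    env_file: str = ".env",
    extra_args: Sequence[str] = (),
) -> list[str]:
    """Build the argument list for one visual-parser container run."""
    cmd = ["docker", "run", "--rm", "--env-file", env_file, "-v", f"{pdf_dir}:/data"]
    if out_dir is None:
        # Input and output share the same mounted folder.
        container_out = "/data"
    else:
        # Separate host folder mounted at /out for the JSONL files.
        cmd += ["-v", f"{out_dir}:/out"]
        container_out = "/out"
    cmd += [IMAGE, "--input-dir", "/data", "--output-dir", container_out]
    cmd += list(extra_args)
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a single string avoids shell-quoting problems with paths that contain spaces.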

### Offline install (legacy `.tar`)

```powershell
docker load -i .\visual-parser_0.1.0.tar
docker images   # use the tag printed by Docker
```

### Model overrides (optional)

The default vision model is **GPT-5.5** when `--vision-provider gpt` is used. Override it on the command line with `--vision-model`:

```powershell
docker run --rm --env-file .env -v "C:\path\to\pdfs:/data" `
  zev94/radiant-llm:visual-parser-1.0 `
  --input-dir /data --output-dir /data --vision-model gpt-5.4
```

<!-- ## Run from source (Python)

From `codebase/Visual-Parser/`:
```powershell
python visual-parser.py --input-dir "C:\path\to\pdfs"
``` -->

## Common configuration flags

After pulling the image, print the full list of flags with:

```bash
docker run --rm zev94/radiant-llm:visual-parser-1.0 --help
```

For copy-paste **Docker** examples (vision presets, text modes, workers, rebuild), see [`docker-usage-examples.md`](docker-usage-examples.md).

Paths:
- `--input-dir` / `-i` (required)
- `--output-dir` / `-o` (default: same as input)

Text extraction:
- `--text-mode nougat|lightweight` (default: `nougat`)
- `--nougat-model facebook/nougat-small`
- `--chunk-size 500`
- `--chunk-overlap 100`
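
To illustrate how a chunk size and an overlap interact, here is a minimal sliding-window sketch. It is not the tool's actual splitter: the package lists `langchain-text-splitters` as a dependency, so the real behavior is likely more sophisticated. The unit is also an assumption (characters here), and `sliding_chunks` is a hypothetical name.

```python
def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

With the values shown above (500 and 100), each window starts 400 characters after the previous one, so consecutive chunks share 100 characters. A larger overlap preserves more context across chunk boundaries at the cost of more, partly redundant, chunks.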

Vision LLM:
- `--vision-provider gpt|gemini` (default: `gpt`)
- `--vision-model gpt-5.2` (or `gpt-4o`, `gemini-2.5-flash`, etc.)
- `--vision-detail low|high|auto`
- `--reasoning-effort none|low|medium|high|xhigh`
- `--metadata-pages 2`

Performance / misc:
- `--max-workers 4`
- `--rebuild` (reprocess everything; ignore `04_processed_pdfs.txt`)
- `--log-level DEBUG|INFO|WARNING|ERROR`
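
The interaction between the tracker file and `--rebuild` can be pictured with the sketch below. It is an illustration only: the on-disk format of `04_processed_pdfs.txt` is not documented here, so the code assumes one PDF file name per line, and `pending_pdfs` is a hypothetical function rather than the tool's implementation.

```python
from pathlib import Path


def pending_pdfs(input_dir: Path, tracker: Path, rebuild: bool = False) -> list[Path]:
    """Return the PDFs that a run would still need to process."""
    pdfs = sorted(p for p in input_dir.iterdir() if p.suffix.lower() == ".pdf")
    if rebuild or not tracker.exists():
        # A rebuild, or a first run with no tracker, processes everything.
        return pdfs
    # ASSUMPTION: the tracker lists one already-processed file name per line.
    done = {
        line.strip()
        for line in tracker.read_text(encoding="utf-8").splitlines()
        if line.strip()
    }
    return [p for p in pdfs if p.name not in done]
```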

---

## Citation

If you use RADIANT-LLM or the accompanying evaluation materials, please cite the preprint:

```bibtex
@article{ndum2026radiant,
  title={RADIANT-LLM: an Agentic Retrieval Augmented Generation Framework for Reliable Decision Support in Safety-Critical Nuclear Engineering},
  author={Ndum, Zavier Ndum and Tao, Jian and Ford, John and Yim, Mansung and Liu, Yang},
  journal={arXiv preprint arXiv:2604.22755},
  year={2026}
}
```

Preprint: https://arxiv.org/abs/2604.22755

---

## License

This repository is currently proprietary and not licensed for public use, redistribution, or modification. Licensing terms will be updated after institutional review.


