Metadata-Version: 2.4
Name: llama-index-readers-paddle-ocr
Version: 0.2.0
Summary: llama-index readers paddle_ocr integration
Author-email: Michael M L Ye <1156961988@qq.com>
Maintainer: mdarshad1000
License-Expression: MIT
License-File: LICENSE
Keywords: academic papers,ocr,pdf
Requires-Python: <4.0,>=3.10
Requires-Dist: llama-index-core<0.15,>=0.13.0
Requires-Dist: paddleocr>=2.7.0
Requires-Dist: paddlepaddle==3.2.0
Requires-Dist: pdfplumber==0.11.7
Requires-Dist: pillow==10.4.0
Requires-Dist: pymupdf==1.26.4
Requires-Dist: requests>=2.0.0
Description-Content-Type: text/markdown

# Paddle OCR loader

```bash
pip install llama-index-readers-paddle-ocr
```

This loader reads PDF documents with OCR, including the equations, symbols, and tables they contain.

Pass the path of the academic PDF document you want to parse as `file`. The OCR understands LaTeX math and tables.

## Usage

Here's an example of using `PDFPaddleOCR`:

```python
from pathlib import Path

from llama_index.readers.paddle_ocr import PDFPaddleOCR

reader = PDFPaddleOCR()

# Path to the PDF you want to parse
pdf_path = Path("/path/to/pdf")

documents = reader.load_data(pdf_path)
```
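
To parse several PDFs in one go, the call above can be wrapped in a small helper. The following is a minimal sketch, not part of the package: `load_pdf_folder` is a hypothetical name, and it assumes that `load_data` accepts a single `Path` and returns a list of documents, as the usage example suggests.

```python
from pathlib import Path


def load_pdf_folder(reader, folder):
    """Call reader.load_data() on every PDF in `folder` and merge the results.

    Hypothetical helper: `reader` is any object exposing load_data(path),
    for example a PDFPaddleOCR instance.
    """
    documents = []
    # Sort so the processing order is stable across runs.
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        # Assumption: load_data() returns a list of documents for one PDF.
        documents.extend(reader.load_data(pdf_path))
    return documents
```

With a reader created as shown earlier, this would be called as `load_pdf_folder(reader, "/path/to/folder")`. Only files ending in `.pdf` directly inside the folder are picked up; subfolders are not searched.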

## Miscellaneous

An `output` folder will be created, containing a file with the same name as the PDF and a `.mmd` extension.
