Metadata-Version: 2.4
Name: multimodal-parsers
Version: 0.1.1
Summary: PDF processing pipeline: remove headers/footers, convert to markdown, and generate image captions
Home-page: https://github.com/thuuyen98/PIER-QA
Author: Uyen Hoang
Author-email: thho00003@stud.uni-saarland.de
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: Pillow>=10.0.0
Requires-Dist: mlx-vlm>=0.3.0
Requires-Dist: pymupdf>=1.23.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: marker-pdf>=0.2.14
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# multimodal-parsers

PDF processing pipeline: removes headers/footers, converts PDFs to markdown, and generates image captions with a vision-language model run through mlx-vlm.

## Installation

```bash
pip install multimodal-parsers
```

## Dependencies

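The package automatically installs (minimum versions as declared in the package metadata):
- Pillow >= 10.0.0
- mlx-vlm >= 0.3.0
- pymupdf >= 1.23.0
- scikit-learn >= 1.0.0
- numpy >= 1.20.0
- marker-pdf >= 0.2.14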

Depending on your environment, you may also need the PDF extras of `unstructured`, which are not installed automatically:
```bash
pip install "unstructured[pdf]"
```
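To check which of these distributions are present in the active environment (including the optional `unstructured`), the standard library's `importlib.metadata` can be queried. This is a minimal sketch: `installed_versions` is a hypothetical helper, not part of this package, and it only reports whether a distribution is installed and at which version. It does not verify extras such as `[pdf]` or compare against the minimum versions listed above.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names from the package's Requires-Dist entries, plus the
# optional "unstructured" package (its [pdf] extras are NOT checked here).
DISTRIBUTIONS = [
    "Pillow", "mlx-vlm", "pymupdf", "scikit-learn",
    "numpy", "marker-pdf", "unstructured",
]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, version in installed_versions(DISTRIBUTIONS).items():
        print(f"{name:15} {version or 'NOT INSTALLED'}")
```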

## Usage

After installation, run the `multimodal-parsers` command with the directory containing your PDFs and a directory for the results:

```bash
multimodal-parsers <input_dir> <output_dir>
```

### Example

```bash
multimodal-parsers Database/Private/Files Database/Private/Files/output
```
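For scripting, the same command can be launched from Python. This is a minimal sketch, assuming the CLI takes only the two positional arguments shown above; `build_command` and `run_pipeline` are hypothetical helpers, not part of the package's API.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_command(input_dir: PathLike, output_dir: PathLike) -> List[str]:
    """Build argv for the documented `multimodal-parsers <input_dir> <output_dir>` form."""
    return ["multimodal-parsers", str(input_dir), str(output_dir)]


def run_pipeline(input_dir: PathLike, output_dir: PathLike) -> None:
    """Run the CLI; raises CalledProcessError if it exits with a non-zero status."""
    if shutil.which("multimodal-parsers") is None:
        raise RuntimeError(
            "multimodal-parsers is not on PATH; run `pip install multimodal-parsers` first"
        )
    subprocess.run(build_command(input_dir, output_dir), check=True)
```

Calling `run_pipeline("Database/Private/Files", "Database/Private/Files/output")` mirrors the shell example. Passing an argument list (rather than a shell string) avoids quoting problems with directory names that contain spaces.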

## What it does

1. **Removes headers and footers** from PDF files using clustering algorithms
2. **Converts PDFs to markdown** using marker-pdf
3. **Generates image captions** with the InternVL3-1B-4bit vision-language model via mlx-vlm
4. **Writes the final markdown files**, with captioned images, to the output directory
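The README does not say which clustering algorithm step 1 uses, so the following is only an illustration of the general idea: text blocks that sit in the top or bottom margin and recur at roughly the same height with the same text on many pages are treated as headers or footers. A simple binning-and-recurrence heuristic stands in for the package's clustering here. The block format `(page_index, y0, y1, text)` and every function and parameter name are hypothetical; PyMuPDF's `page.get_text("blocks")` exposes similar top-down coordinates, but this sketch does not depend on it.

```python
from collections import defaultdict
from typing import List, Set, Tuple

# Hypothetical block format: (page_index, y0, y1, text), y in points from the page top.
Block = Tuple[int, float, float, str]


def _normalize(text: str) -> str:
    # Replace digits so "Page 3" and "Page 4" land in the same group.
    return "".join("#" if ch.isdigit() else ch for ch in text.strip())


def find_header_footer_blocks(
    blocks: List[Block],
    page_height: float,
    n_pages: int,
    margin_ratio: float = 0.1,       # top/bottom fraction of the page treated as margin
    bin_size: float = 5.0,           # vertical tolerance (points) for "same position"
    min_page_fraction: float = 0.5,  # group must appear on at least this share of pages
) -> Set[Block]:
    """Flag margin blocks whose position and normalized text recur on many pages."""
    top_limit = page_height * margin_ratio
    bottom_limit = page_height * (1.0 - margin_ratio)
    groups = defaultdict(list)
    for block in blocks:
        page, y0, y1, text = block
        if y1 <= top_limit or y0 >= bottom_limit:
            # Group by a coarse vertical bin plus normalized text.
            groups[(round(y0 / bin_size), _normalize(text))].append(block)
    flagged: Set[Block] = set()
    for members in groups.values():
        pages = {page for page, _, _, _ in members}
        if len(pages) >= min_page_fraction * n_pages:
            flagged.update(members)
    return flagged


def strip_headers_footers(blocks: List[Block], page_height: float, n_pages: int) -> List[Block]:
    """Return the blocks in their original order, minus detected headers/footers."""
    flagged = find_header_footer_blocks(blocks, page_height, n_pages)
    return [b for b in blocks if b not in flagged]
```

Fixed-size bins can split a header that drifts across a bin boundary (for example `y0` of 22.4 vs. 22.6 with `bin_size=5`). A real clusterer over block positions, such as scikit-learn's DBSCAN, avoids that edge effect, which may be one reason to prefer clustering over binning.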
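The final step can be pictured as rewriting image links in the converted markdown. This is a minimal sketch, assuming the markdown references images with standard `![alt](path)` links and that captions are already available as a mapping from image path to caption text. `add_captions` is hypothetical, and the package's actual output format is not documented here.

```python
import re
from typing import Dict

# Standard markdown image link: ![alt text](path). Link titles are not handled.
IMAGE_LINK = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def add_captions(markdown: str, captions: Dict[str, str]) -> str:
    """Put each known caption in the image's alt text and as an italic line below it."""

    def replace(match: "re.Match") -> str:
        src = match.group(2)
        caption = captions.get(src)
        if caption is None:
            return match.group(0)  # leave images without a caption untouched
        # Collapse whitespace and escape "]" so the caption cannot break the link syntax.
        clean = " ".join(caption.split())
        alt = clean.replace("]", "\\]")
        return f"![{alt}]({src})\n\n*{clean}*"

    return IMAGE_LINK.sub(replace, markdown)
```

Putting the caption in the alt text keeps it available to screen readers and text-only consumers, while the italic line makes it visible in rendered output.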

## Development

```bash
git clone https://github.com/thuuyen98/PIER-QA
cd PIER-QA
pip install -e ".[dev]"
```

## License

MIT License
