Metadata-Version: 2.4
Name: pyconverters-mineru
Version: 0.7.3
Summary: Convert PDF to structured text using MinerU
Home-page: https://github.com/oterrier/pyconverters_mineru/
Keywords: 
Author: Olivier Terrier
Author-email: olivier.terrier@kairntech.com
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development
Classifier: License :: OSI Approved :: MIT License
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3.8
License-File: LICENSE
Requires-Dist: pymultirole-plugins>=0.7.0,<0.8.0
Requires-Dist: httpx==0.23.0
Requires-Dist: requests
Requires-Dist: inscriptis==1.2
Requires-Dist: filetype==1.0.13
Requires-Dist: pymupdf==1.24.10
Requires-Dist: pylatexenc
Requires-Dist: flit ; extra == "dev"
Requires-Dist: pre-commit ; extra == "dev"
Requires-Dist: bump2version ; extra == "dev"
Requires-Dist: sphinx ; extra == "docs"
Requires-Dist: sphinx-rtd-theme ; extra == "docs"
Requires-Dist: m2r2 ; extra == "docs"
Requires-Dist: sphinxcontrib.apidoc ; extra == "docs"
Requires-Dist: jupyter_sphinx ; extra == "docs"
Requires-Dist: pytest ; extra == "test"
Requires-Dist: pytest-cov ; extra == "test"
Requires-Dist: pytest-flake8 ; extra == "test"
Requires-Dist: pytest-black ; extra == "test"
Requires-Dist: flake8==3.9.2 ; extra == "test"
Requires-Dist: tox ; extra == "test"
Requires-Dist: pandas ; extra == "test"
Requires-Dist: langdetect ; extra == "test"
Provides-Extra: dev
Provides-Extra: docs
Provides-Extra: test

# pyconverters_mineru

[![license](https://img.shields.io/github/license/oterrier/pyconverters_mineru)](https://github.com/oterrier/pyconverters_mineru/blob/master/LICENSE)
[![tests](https://github.com/oterrier/pyconverters_mineru/workflows/tests/badge.svg)](https://github.com/oterrier/pyconverters_mineru/actions?query=workflow%3Atests)
[![codecov](https://img.shields.io/codecov/c/github/oterrier/pyconverters_mineru)](https://codecov.io/gh/oterrier/pyconverters_mineru)
[![docs](https://img.shields.io/readthedocs/pyconverters_mineru)](https://pyconverters_mineru.readthedocs.io)
[![version](https://img.shields.io/pypi/v/pyconverters_mineru)](https://pypi.org/project/pyconverters_mineru/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pyconverters_mineru)](https://pypi.org/project/pyconverters_mineru/)

Convert PDF to structured text using [MinerU](https://github.com/opendatalab/MinerU).

## Installation

You can simply `pip install pyconverters_mineru`.
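
To confirm that the installation succeeded, one option is to query the installed distribution from Python. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `installed_version` is illustrative and not part of this package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "pyconverters-mineru" is the distribution name declared in the metadata.
    found = installed_version("pyconverters-mineru")
    print(found if found is not None else "pyconverters-mineru is not installed")
```

A `None` result usually means the package was installed into a different environment than the one running the script.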

## Developing

### Pre-requisites

You will need to install `uv` (for package management and building):

```
pip install uv
```

Clone the repository:

```
git clone https://github.com/oterrier/pyconverters_mineru
```

### Install dependencies

```
uv sync --extra test
```

### Running the test suite

```
uv run pytest
```

### Linting

```
uv run ruff check .
uv run ruff format --check .
```

### Building the documentation

```
uv run --extra docs sphinx-build docs docs/_build
```

The built documentation is available at `docs/_build/index.html`.

## SBOM & vulnerability check

Install the SBOM dependencies:

```
uv sync --extra sbom
```

Generate a CycloneDX SBOM from the current environment:

```
uv run cyclonedx-py environment -o sbom.cdx.json --output-format json
```
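
The generated file can be inspected with a few lines of Python. This is a minimal sketch, assuming the SBOM follows the CycloneDX JSON layout in which packages are listed under a top-level `components` array with `name` and `version` fields. The helper `list_components` is hypothetical and not provided by `cyclonedx-py`.

```python
import json
from pathlib import Path
from typing import List, Tuple


def list_components(sbom: dict) -> List[Tuple[str, str]]:
    """Return sorted (name, version) pairs from a CycloneDX JSON document."""
    # Assumption: packages are listed under the top-level "components" key.
    return sorted(
        (component.get("name", ""), component.get("version", ""))
        for component in sbom.get("components", [])
    )


if __name__ == "__main__":
    path = Path("sbom.cdx.json")  # file name used in the command above
    if path.exists():
        for name, ver in list_components(json.loads(path.read_text())):
            print(f"{name}=={ver}")
```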

Audit dependencies for known vulnerabilities:

```
uv run pip-audit --format json --output audit-report.json
```
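
The JSON report can then be summarized in Python. The sketch below assumes the report lists audited packages under a `dependencies` key, each with a `vulns` array whose entries carry an `id`. It also accepts a bare top-level list, which some older `pip-audit` releases are believed to have emitted. The function name `vulnerable_packages` is illustrative only.

```python
import json
from pathlib import Path
from typing import Dict, List


def vulnerable_packages(report: object) -> Dict[str, List[str]]:
    """Map 'name==version' to the advisory IDs reported for that package."""
    # Assumption: entries live under "dependencies"; also accept a bare list.
    deps = report.get("dependencies", []) if isinstance(report, dict) else report
    found: Dict[str, List[str]] = {}
    for dep in deps:
        # Skipped dependencies may have no "vulns" key at all.
        ids = [vuln["id"] for vuln in dep.get("vulns", [])]
        if ids:
            found[f"{dep['name']}=={dep.get('version', '?')}"] = ids
    return found


if __name__ == "__main__":
    path = Path("audit-report.json")  # file name used in the command above
    if path.exists():
        summary = vulnerable_packages(json.loads(path.read_text()))
        for package, ids in summary.items():
            print(package, ", ".join(ids))
```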

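`pip-audit` already exits with a non-zero status when it finds a known vulnerability. To also fail the audit when a dependency cannot be collected (useful in CI), add `--strict`: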

```
uv run pip-audit --strict
```

