Metadata-Version: 2.3
Name: ocrmypdf-rapidocr
Version: 1.0.0
Author: Adrian Mazur
License: MIT License
         
         Copyright (c) 2026 Adrian Mazur
         
         Permission is hereby granted, free of charge, to any person obtaining a copy
         of this software and associated documentation files (the "Software"), to deal
         in the Software without restriction, including without limitation the rights
         to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
         copies of the Software, and to permit persons to whom the Software is
         furnished to do so, subject to the following conditions:
         
         The above copyright notice and this permission notice shall be included in all
         copies or substantial portions of the Software.
         
         THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
         IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
         FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
         AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
         LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
         OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
         SOFTWARE.
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: ocrmypdf>=17.3.0
Requires-Dist: onnxruntime>=1.24.3
Requires-Dist: rapidocr>=3.7.0
Requires-Python: >=3.12
Project-URL: Homepage, https://github.com/adrianmazur-dev/ocrmypdf-rapidocr
Project-URL: Repository, https://github.com/adrianmazur-dev/ocrmypdf-rapidocr.git
Description-Content-Type: text/markdown

# ocrmypdf-rapidocr

`ocrmypdf-rapidocr` is an OCRmyPDF plugin that uses [RapidOCR](https://github.com/RapidAI/RapidOCR) as the OCR engine.

## Status

Supported:

- OCR engine integration via OCRmyPDF plugin hooks
- `hOCR` output path (`--pdf-renderer auto` or `--pdf-renderer fpdf2`)
- ONNXRuntime backend only
- Single language selection from `-l/--language`

Not supported:

- `--pdf-renderer sandwich`
- multi-language combinations such as `-l eng+fra`

## Installation

```bash
pip install ocrmypdf-rapidocr
```

Or from source:

```bash
pip install .
```

## Usage

Load the plugin explicitly with `--plugin`:

```bash
ocrmypdf --plugin ocrmypdf_rapidocr -l eng input.pdf output.pdf
```

Optional plugin arguments:

- `--rapidocr-config-path PATH`: use a custom RapidOCR YAML config

Example using German (`deu`) as the OCR language:

```bash
ocrmypdf \
  --plugin ocrmypdf_rapidocr \
  -l deu \
  input.pdf output.pdf
```
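
The same invocation can be scripted. The following is a minimal sketch that assembles the command line shown above from Python. `build_ocrmypdf_command` and `run_ocr` are hypothetical helper names, not part of this plugin or of OCRmyPDF, and only flags documented in this README are used. Rejecting `+` in the language string is a choice made for this sketch, based on the single-language limitation listed under Status.

```python
import os
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, language="eng", config_path=None):
    """Assemble an ocrmypdf argv list that loads the RapidOCR plugin.

    Hypothetical helper for illustration only.
    """
    if "+" in language:
        # Sketch-level guard: combinations such as "eng+fra" are not supported.
        raise ValueError(f"multi-language selection is not supported: {language!r}")

    command = ["ocrmypdf", "--plugin", "ocrmypdf_rapidocr", "-l", language]
    if config_path is not None:
        # Optional custom RapidOCR YAML config.
        command += ["--rapidocr-config-path", os.fspath(config_path)]
    command += [os.fspath(input_pdf), os.fspath(output_pdf)]
    return command


def run_ocr(input_pdf, output_pdf, **kwargs):
    """Run ocrmypdf; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_ocrmypdf_command(input_pdf, output_pdf, **kwargs), check=True)
```

Calling `run_ocr("input.pdf", "output.pdf", language="deu")` would run the same command as the shell example above, assuming `ocrmypdf` is on `PATH`.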

## Language behavior

The plugin uses only the first OCRmyPDF language code and maps it to a RapidOCR language family.

- direct mappings: `eng`, `chi_sim`, `chi_tra`, `jpn`, `kor`, `ara`, `rus`, `ukr`, `tha`, `tam`, `tel`, `ell`/`gre`
- selected Latin-script codes map to RapidOCR `LATIN`

If a language code is unsupported, OCRmyPDF exits with a clear error message.
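
The lookup described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's actual code: `resolve_family` is a hypothetical name, every family label except `LATIN` is a placeholder, and the set of Latin-script codes is an assumed subset. The sketch raises `ValueError` where the plugin would make OCRmyPDF exit with an error.

```python
# Placeholder labels: only "LATIN" is named in this README, so the real
# RapidOCR identifiers may differ.
DIRECT_FAMILIES = {
    "eng": "ENGLISH",
    "chi_sim": "CHINESE_SIMPLIFIED",
    "chi_tra": "CHINESE_TRADITIONAL",
    "jpn": "JAPANESE",
    "kor": "KOREAN",
    "ara": "ARABIC",
    "rus": "RUSSIAN",
    "ukr": "UKRAINIAN",
    "tha": "THAI",
    "tam": "TAMIL",
    "tel": "TELUGU",
    "ell": "GREEK",  # both Greek codes share one family
    "gre": "GREEK",
}

# Assumed subset: the README only says "selected Latin-script codes".
LATIN_CODES = frozenset({"deu", "fra", "spa"})


def resolve_family(languages):
    """Map the first OCRmyPDF language code to a language family label."""
    if not languages:
        raise ValueError("no language code given")
    code = languages[0]  # any further codes are ignored
    if code in DIRECT_FAMILIES:
        return DIRECT_FAMILIES[code]
    if code in LATIN_CODES:
        return "LATIN"
    raise ValueError(f"unsupported language code: {code!r}")
```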

## Runtime model downloads

RapidOCR downloads model files on first use when model paths are not pinned in the config.
For offline or restricted environments, provide a custom config via
`--rapidocr-config-path` that points to local model files.

## References

- OCRmyPDF plugin API docs: <https://github.com/ocrmypdf/OCRmyPDF/blob/main/docs/plugins.md>
- OCRmyPDF EasyOCR reference plugin: <https://github.com/ocrmypdf/OCRmyPDF-EasyOCR>
- OCRmyPDF AppleOCR reference plugin: <https://github.com/mkyt/OCRmyPDF-AppleOCR>
- OCRmyPDF PaddleOCR reference plugin: <https://github.com/clefru/ocrmypdf-paddleocr>
- RapidOCR project: <https://github.com/RapidAI/RapidOCR>
