Metadata-Version: 2.4
Name: anyocr
Version: 0.2.3
Summary: A lightweight, unified OCR toolkit with a one-liner API. Supports Surya, EasyOCR, PaddleOCR, Tesseract, and Vision LLMs through a single interface.
Project-URL: Homepage, https://github.com/vietanhdev/anyocr
Project-URL: Documentation, https://github.com/vietanhdev/anyocr#readme
Project-URL: Repository, https://github.com/vietanhdev/anyocr
Project-URL: Issues, https://github.com/vietanhdev/anyocr/issues
Project-URL: Author, https://www.nrl.ai
Author-email: Viet-Anh Nguyen <vietanh.dev@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: easyocr,ocr,paddleocr,surya,tesseract,text-recognition,vlm
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Requires-Dist: click>=8.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: pillow>=9.0.0
Provides-Extra: all
Requires-Dist: anyllm>=0.1.0; extra == 'all'
Requires-Dist: easyocr>=1.6.0; extra == 'all'
Requires-Dist: paddleocr>=2.6.0; extra == 'all'
Requires-Dist: paddlepaddle>=2.4.0; extra == 'all'
Requires-Dist: pytesseract>=0.3.10; extra == 'all'
Requires-Dist: surya-ocr>=0.6.0; extra == 'all'
Provides-Extra: benchmark
Requires-Dist: numpy>=1.21.0; extra == 'benchmark'
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Provides-Extra: easyocr
Requires-Dist: easyocr>=1.6.0; extra == 'easyocr'
Provides-Extra: llm
Requires-Dist: anyllm>=0.1.0; extra == 'llm'
Provides-Extra: paddle
Requires-Dist: paddleocr>=2.6.0; extra == 'paddle'
Requires-Dist: paddlepaddle>=2.4.0; extra == 'paddle'
Provides-Extra: pdf
Requires-Dist: pdf2image>=1.16.0; extra == 'pdf'
Provides-Extra: progress
Requires-Dist: tqdm>=4.60.0; extra == 'progress'
Provides-Extra: surya
Requires-Dist: surya-ocr>=0.6.0; extra == 'surya'
Provides-Extra: tesseract
Requires-Dist: pytesseract>=0.3.10; extra == 'tesseract'
Provides-Extra: vlm
Requires-Dist: anyllm>=0.1.0; extra == 'vlm'
Description-Content-Type: text/markdown

<h1 align="center">anyocr</h1>
<p align="center"><em>One unified API over every major OCR engine — from Tesseract to vision LLMs.</em></p>

<p align="center">
<img src="https://img.shields.io/pypi/v/anyocr.svg" alt="PyPI">
<img src="https://img.shields.io/pypi/pyversions/anyocr.svg" alt="Python">
<img src="https://img.shields.io/pypi/l/anyocr.svg" alt="License">
</p>

**anyocr** gives you a single `anyocr.read()` call that extracts text from images and documents using whichever OCR engine you have installed. It auto-selects the best available backend by priority (Surya > EasyOCR > PaddleOCR > Tesseract > Vision-LLM), applies a smart preprocessing pipeline (auto-rotate, deskew, contrast enhancement, binarization), and returns structured results with bounding boxes, confidence scores, and reading order.

Built by [Viet-Anh Nguyen](https://github.com/vietanhdev) at [NRL.ai](https://www.nrl.ai).

## Why anyocr?

- **One-liner API** — `anyocr.read("scan.png")` just works with any installed backend
- **Plugin architecture** — Register new OCR engines via `@register_backend`
- **Local-first** — Surya, EasyOCR, Paddle, Tesseract all run on your machine
- **Minimal core deps** — Only `pillow`, `numpy`, and `click`; every OCR engine is an optional extra
- **Built for production pipelines** — Auto-preprocessing, structured dataclass results, batch inference (the package itself is still in alpha)

## Installation

```bash
pip install anyocr
```

For optional backends:

```bash
pip install anyocr[surya]        # Surya OCR — SOTA open source, 90+ languages
pip install anyocr[easyocr]      # EasyOCR — CRNN models, 80+ languages
pip install anyocr[paddle]       # PaddleOCR — strong Asian languages
pip install anyocr[tesseract]    # Tesseract via pytesseract (needs tesseract binary)
pip install anyocr[vlm]          # Vision-LLM via anyllm (GPT-4V, Claude, Gemini)
pip install anyocr[all]          # everything
```

**Python 3.8+ supported** (tested on 3.8, 3.9, 3.10, 3.11, 3.12, 3.13)

## Quick Start

```python
import anyocr

# 1. Auto-selects the best installed backend (Surya > EasyOCR > Paddle > Tesseract > VLM)
result = anyocr.read("receipt.png")
print(result.text)                       # full extracted text
for line in result.lines:
    print(line.text, line.confidence, line.bbox)

# 2. Force a specific backend
text = anyocr.read("chinese.jpg", backend="paddle", lang="ch").text

# 3. Use a vision LLM for hard cases (tables, handwriting)
text = anyocr.read("handwritten.jpg", backend="vlm", model="gpt-4o").text
```

## Models & Methods

### Supported backends (auto-selected by priority)

| Priority | Backend | Model family | Languages | Install |
|---|---|---|---|---|
| 1 | **Surya OCR** | Transformer-based detection + recognition (DETR + Donut-style) | 90+ | `anyocr[surya]` |
| 2 | **EasyOCR** | CRAFT detector + CRNN recognizer | 80+ | `anyocr[easyocr]` |
| 3 | **PaddleOCR** | PP-OCRv4 (DBNet + SVTR) | 80+, strong CJK | `anyocr[paddle]` |
| 4 | **Tesseract** | LSTM-based (Tesseract 4+) | 100+ | `anyocr[tesseract]` |
| 5 | **Vision-LLM** | Any multi-modal LLM via anyllm (GPT-4V, Claude 3.5 Sonnet, Gemini, LLaVA) | Any | `anyocr[vlm]` |

You can force a backend for a single call with `anyocr.read(..., backend="easyocr")`, or change the auto-selection order with `anyocr.set_priority(["paddle", "surya"])`.
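
To make the auto-selection rule concrete, here is a minimal sketch of "pick the first installed backend in priority order". It is not anyocr's actual implementation: `BACKEND_MODULES`, `is_installed`, and `pick_backend` are hypothetical names, and the module names are simply the usual import names of each package.

```python
from importlib.util import find_spec

# Hypothetical mapping from backend name to the module that must be importable.
# These are the common import names of each package, not anyocr internals.
BACKEND_MODULES = {
    "surya": "surya",
    "easyocr": "easyocr",
    "paddle": "paddleocr",
    "tesseract": "pytesseract",
    "vlm": "anyllm",
}

DEFAULT_PRIORITY = ["surya", "easyocr", "paddle", "tesseract", "vlm"]


def is_installed(backend, modules=BACKEND_MODULES):
    """Return True if the backend's module can be located without importing it."""
    module = modules.get(backend)
    return module is not None and find_spec(module) is not None


def pick_backend(priority=DEFAULT_PRIORITY, installed=is_installed):
    """Return the first backend in `priority` for which `installed` is true."""
    for name in priority:
        if installed(name):
            return name
    raise RuntimeError("No OCR backend is installed")


# Simulate a machine where only PaddleOCR and Tesseract are available.
available = {"paddle", "tesseract"}
print(pick_backend(installed=lambda name: name in available))
```

Checking with `find_spec` avoids importing heavy model libraries just to learn whether they exist, which is one plausible reason to structure the check this way.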

### Preprocessing pipeline

Steps 1-4 run automatically and can be disabled per call; steps 5 and 6 are opt-in:

1. **EXIF orientation fix** — rotate based on metadata
2. **Auto-rotate** — detect 90/180/270 rotation via text-line angle histogram
3. **Deskew** — Hough-transform-based angle correction (for skew of up to 15 degrees)
4. **Contrast enhancement** — CLAHE (adaptive histogram equalization)
5. **Binarization** — adaptive threshold for low-quality scans (opt-in)
6. **Denoise** — bilateral filter for scanned documents (opt-in)
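
As an illustration of the adaptive-threshold idea behind step 5, the sketch below binarizes a grayscale image by comparing each pixel with the mean of its neighbourhood. It is a pure-Python toy under stated assumptions, not the code anyocr runs: the image is a list of rows of 0-255 integers, and `radius` and `offset` are illustrative parameters.

```python
def local_mean(img, y, x, radius):
    """Mean of the pixels in a (2*radius+1)^2 window, clipped at the image border."""
    rows = img[max(0, y - radius): y + radius + 1]
    window = [v for row in rows for v in row[max(0, x - radius): x + radius + 1]]
    return sum(window) / len(window)


def binarize_adaptive(img, radius=1, offset=10):
    """Set a pixel to black (0) when it is darker than its local mean minus `offset`."""
    return [
        [0 if v < local_mean(img, y, x, radius) - offset else 255
         for x, v in enumerate(row)]
        for y, row in enumerate(img)
    ]


# A bright page with one dark "ink" pixel in the middle.
page = [
    [200, 200, 200],
    [200,  50, 200],
    [200, 200, 200],
]
for row in binarize_adaptive(page):
    print(row)
```

Because the threshold follows the local mean instead of one global value, the same rule still separates ink from paper when lighting varies across a scan.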

### Result dataclasses

```python
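from __future__ import annotations  # lets the hints below parse on Python 3.8+
from dataclasses import dataclass

@dataclass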
class OCRLine:
    text: str
    confidence: float
    bbox: tuple[float, float, float, float]   # x1, y1, x2, y2
    polygon: list[tuple[float, float]] | None  # 4-point quad if supported

@dataclass
class OCRResult:
    text: str                  # joined full text in reading order
    lines: list[OCRLine]
    backend: str               # which backend produced this result
    language: str | None
```
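
A short sketch of how these fields might be consumed downstream, for example to drop low-confidence lines. The two dataclasses are local stand-ins that mirror the documented fields so the snippet runs without anyocr. `confident_text` and `average_confidence` are hypothetical helpers, and the example assumes confidences lie in the range 0 to 1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Local stand-ins mirroring the documented fields (not imported from anyocr).
@dataclass
class OCRLine:
    text: str
    confidence: float
    bbox: Tuple[float, float, float, float]
    polygon: Optional[List[Tuple[float, float]]] = None


@dataclass
class OCRResult:
    text: str
    lines: List[OCRLine]
    backend: str
    language: Optional[str] = None


def confident_text(result, min_confidence=0.5):
    """Join the text of lines whose confidence is at least `min_confidence`."""
    return "\n".join(
        line.text for line in result.lines if line.confidence >= min_confidence
    )


def average_confidence(result):
    """Mean line confidence, or 0.0 when the result has no lines."""
    if not result.lines:
        return 0.0
    return sum(line.confidence for line in result.lines) / len(result.lines)


# Made-up sample data for illustration only.
sample = OCRResult(
    text="TOTAL 12.50\n~~smudge~~",
    lines=[
        OCRLine("TOTAL 12.50", 0.97, (10, 10, 200, 30)),
        OCRLine("~~smudge~~", 0.21, (10, 40, 120, 60)),
    ],
    backend="tesseract",
)
print(confident_text(sample))
print(round(average_confidence(sample), 2))
```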

## API Reference

| Function | Purpose |
|---|---|
| `anyocr.read(image, backend="auto", lang=None)` | Run OCR, returns `OCRResult` |
| `anyocr.read_pdf(pdf_path)` | OCR every page of a PDF |
| `anyocr.list_backends()` | Show installed backends |
| `anyocr.set_priority([...])` | Override auto-selection order |
| `anyocr.preprocess(image, ops=[...])` | Run preprocessing pipeline only |
| `anyocr.register_backend(name, cls)` | Add a custom backend |

## CLI Usage

```bash
anyocr read receipt.png
anyocr read scan.jpg --backend paddle --lang ch
anyocr read-pdf document.pdf --out text.txt
anyocr list-backends
```

## Examples

### OCR an entire PDF and save as text

```python
import anyocr

# Rasterizes each page and runs the auto-selected backend
result = anyocr.read_pdf("report.pdf")
with open("report.txt", "w", encoding="utf-8") as f:
    for page_num, page in enumerate(result.pages, 1):
        f.write(f"=== Page {page_num} ===\n{page.text}\n\n")
```

### Combine preprocessing with a specific backend

```python
import anyocr

# Run the preprocessing pipeline explicitly before OCR
cleaned = anyocr.preprocess("noisy_scan.jpg", ops=["deskew", "clahe", "binarize"])
result  = anyocr.read(cleaned, backend="tesseract", lang="eng")
print(result.text)
```

### Compare several backends on the same image

```python
import anyocr

for backend in ["surya", "easyocr", "paddle"]:
    r = anyocr.read("test.jpg", backend=backend)
    print(f"{backend}: {r.text[:80]}... (avg conf {r.mean_confidence():.2f})")
```
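
Confidence scores from different engines are not necessarily comparable, so it can also help to measure how much the outputs agree with each other. The sketch below uses only the standard library (`difflib`) on made-up strings standing in for each backend's `r.text`. `agreement` is a hypothetical helper, not part of anyocr.

```python
from difflib import SequenceMatcher
from itertools import combinations


def agreement(outputs):
    """Pairwise similarity ratio (0.0 to 1.0) between backend output strings."""
    return {
        (a, b): SequenceMatcher(None, outputs[a], outputs[b]).ratio()
        for a, b in combinations(sorted(outputs), 2)
    }


# Made-up outputs for illustration; in practice these would be r.text values.
outputs = {
    "surya": "Invoice 2024",
    "easyocr": "Invoice 2024",
    "paddle": "lnvoice 2O24",
}
for pair, score in agreement(outputs).items():
    print(pair, round(score, 2))
```

A pair that scores well below the others is a hint that one backend misread the image and the result deserves a closer look.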

## License

MIT (c) Viet-Anh Nguyen
