Metadata-Version: 2.4
Name: panoocr
Version: 0.5.0
Summary: PanoOCR is a Python library for performing Optical Character Recognition (OCR) on equirectangular panorama images with automatic perspective projection and deduplication.
Project-URL: Homepage, https://github.com/yz3440/panoocr
Project-URL: Repository, https://github.com/yz3440/panoocr
Project-URL: Issues, https://github.com/yz3440/panoocr/issues
Author-email: Yufeng Zhao <yufeng-zhao@outlook.com>
License: MIT
License-File: LICENSE
Keywords: 360,computer-vision,equirectangular,ocr,panorama,text-recognition
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Processing
Classifier: Topic :: Text Processing :: General
Requires-Python: >=3.11
Requires-Dist: geopandas>=0.14.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: pillow>=10.0.0
Requires-Dist: py360convert>=0.1.0
Requires-Dist: shapely>=2.0.0
Requires-Dist: textdistance>=4.5.0
Requires-Dist: tqdm>=4.60.0
Provides-Extra: docs
Requires-Dist: mkdocs-terminal>=4.6.0; extra == 'docs'
Requires-Dist: mkdocs>=1.5.0; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == 'docs'
Requires-Dist: pymdown-extensions>=10.0.0; extra == 'docs'
Provides-Extra: easyocr
Requires-Dist: easyocr>=1.7.0; extra == 'easyocr'
Provides-Extra: florence2
Requires-Dist: einops>=0.7.0; extra == 'florence2'
Requires-Dist: timm>=0.9.0; extra == 'florence2'
Requires-Dist: torch>=2.0.0; extra == 'florence2'
Requires-Dist: transformers>=4.40.0; extra == 'florence2'
Provides-Extra: full
Requires-Dist: easyocr>=1.7.0; extra == 'full'
Requires-Dist: einops>=0.7.0; extra == 'full'
Requires-Dist: matplotlib>=3.7.0; extra == 'full'
Requires-Dist: onnxruntime; extra == 'full'
Requires-Dist: opencv-python>=4.8.0; extra == 'full'
Requires-Dist: paddleocr>=3.0.0; extra == 'full'
Requires-Dist: paddlepaddle>=3.0.0; extra == 'full'
Requires-Dist: rapidocr; extra == 'full'
Requires-Dist: scipy>=1.10.0; extra == 'full'
Requires-Dist: timm>=0.9.0; extra == 'full'
Requires-Dist: torch>=2.0.0; extra == 'full'
Requires-Dist: transformers>=4.40.0; extra == 'full'
Provides-Extra: gemini
Requires-Dist: google-genai; extra == 'gemini'
Provides-Extra: google-vision
Requires-Dist: requests>=2.28.0; extra == 'google-vision'
Provides-Extra: macocr
Requires-Dist: ocrmac>=0.2.0; extra == 'macocr'
Provides-Extra: mlx-vlm
Requires-Dist: mlx-vlm; extra == 'mlx-vlm'
Requires-Dist: torch>=2.0.0; extra == 'mlx-vlm'
Requires-Dist: torchvision; extra == 'mlx-vlm'
Provides-Extra: paddleocr
Requires-Dist: paddleocr>=3.0.0; extra == 'paddleocr'
Requires-Dist: paddlepaddle>=3.0.0; extra == 'paddleocr'
Provides-Extra: rapidocr
Requires-Dist: onnxruntime; extra == 'rapidocr'
Requires-Dist: rapidocr; extra == 'rapidocr'
Provides-Extra: trocr
Requires-Dist: torch>=2.0.0; extra == 'trocr'
Requires-Dist: transformers>=4.40.0; extra == 'trocr'
Provides-Extra: viz
Requires-Dist: matplotlib>=3.7.0; extra == 'viz'
Requires-Dist: opencv-python>=4.8.0; extra == 'viz'
Requires-Dist: scipy>=1.10.0; extra == 'viz'
Description-Content-Type: text/markdown

# PanoOCR

PanoOCR is a Python library for performing Optical Character Recognition (OCR) on equirectangular panorama images with automatic perspective projection and deduplication.

https://github.com/user-attachments/assets/57507c48-ec88-4d4a-bf68-067eefc9d42f

## Features

- **Multiple OCR Engines**: Support for MacOCR (Apple Vision), RapidOCR, EasyOCR, PaddleOCR, Florence-2, Google Cloud Vision, Gemini, and more
- **Automatic Perspective Projection**: Converts equirectangular panoramas to multiple perspective views for better OCR accuracy
- **Deduplication**: Automatically removes duplicate text detections across overlapping perspective views
- **Spherical Coordinates**: Returns OCR results in yaw/pitch coordinates that map directly to the panorama
- **Preview Tool**: Interactive 3D preview of OCR results on the panorama

## Installation

Install the base package:

```bash
pip install panoocr
```

Install with OCR engine dependencies:

```bash
# macOS (Apple Vision Framework)
pip install "panoocr[macocr]"

# RapidOCR (PP-OCRv4/v5 via ONNX Runtime, cross-platform)
pip install "panoocr[rapidocr]"

# EasyOCR (cross-platform)
pip install "panoocr[easyocr]"

# PaddleOCR (cross-platform)
pip install "panoocr[paddleocr]"

# Florence-2 via transformers + torch (requires GPU recommended)
pip install "panoocr[florence2]"

# MLX VLM engines: Florence-2 MLX, GLM-OCR, DOTS.OCR (macOS Apple Silicon)
pip install "panoocr[mlx-vlm]"

# Google Cloud Vision API (requires API key)
pip install "panoocr[google-vision]"

# Gemini API (requires API key)
pip install "panoocr[gemini]"

# Cross-platform local engines + visualization
# (excludes macOS-only macocr, Apple Silicon mlx-vlm, cloud APIs, and experimental trocr)
pip install "panoocr[full]"
```

Using uv (recommended):

```bash
uv add panoocr
uv sync --extra macocr      # or other extras
uv sync --extra rapidocr
uv sync --extra mlx-vlm     # Florence-2 MLX, GLM-OCR, DOTS.OCR
```

## Quick Start

```python
from panoocr import PanoOCR
from panoocr.engines.macocr import MacOCREngine  # or other engines

# Create an OCR engine
engine = MacOCREngine()

# Create the PanoOCR pipeline
pano = PanoOCR(engine)

# Run OCR on a panorama
result = pano.recognize("panorama.jpg")

# Save results as JSON
result.save_json("results.json")

# Access individual results
for r in result.results:
    print(f"Text: {r.text}")
    print(f"Position: yaw={r.yaw}°, pitch={r.pitch}°")
    print(f"Confidence: {r.confidence}")
```

## Available OCR Engines

### Structured engines (return per-word bounding boxes)

#### MacOCREngine (macOS only)

Uses Apple's Vision Framework for fast, accurate OCR on macOS.

```python
from panoocr.engines.macocr import MacOCREngine

engine = MacOCREngine()
```

#### RapidOCREngine

PaddleOCR PP-OCRv4/v5 models via ONNX Runtime. Supports both v4 (2023) and v5 (2025) models, multilingual including CJK.

```python
from panoocr.engines.rapidocr_engine import RapidOCREngine

engine_v4 = RapidOCREngine()                                     # default: PP-OCRv4
engine_v5 = RapidOCREngine(config={"ocr_version": "PP-OCRv5"})   # PP-OCRv5
```

#### EasyOCREngine

Cross-platform OCR supporting 80+ languages.

```python
from panoocr.engines.easyocr import EasyOCREngine

engine = EasyOCREngine(config={"language_preference": ["en"], "gpu": True})
```

#### PaddleOCREngine

PaddlePaddle-based OCR supporting multiple languages with automatic model management.

```python
from panoocr.engines.paddleocr import PaddleOCREngine

engine = PaddleOCREngine()
```

#### GoogleVisionEngine

Google Cloud Vision API (`TEXT_DETECTION`). Requires `GOOGLE_VISION_API_KEY` in environment or `.env`.

```python
from panoocr.engines.google_vision import GoogleVisionEngine

engine = GoogleVisionEngine()
```

#### Florence2OCREngine (transformers + torch)

Microsoft's Florence-2 vision-language model via transformers.

```python
from panoocr.engines.florence2 import Florence2OCREngine

engine = Florence2OCREngine()
```

#### Florence2MLXEngine (mlx-vlm, macOS Apple Silicon)

Florence-2 via mlx-vlm with `<OCR_WITH_REGION>` for structured quad-box output. The only VLM engine that returns per-word bounding boxes.

```python
from panoocr.engines.florence2_mlx import Florence2MLXEngine

engine = Florence2MLXEngine()
```

### Unstructured engines (return text without bounding boxes)

These engines return text only. Each detection gets a full-image bounding box for crop-level attribution in the panoocr pipeline.

#### GeminiEngine

Google Gemini API. Supports multiple model variants. Requires `GOOGLE_GEMINI_API_KEY` in environment or `.env`.

```python
from panoocr.engines.gemini import GeminiEngine

engine_flash = GeminiEngine(config={"model": "gemini-2.5-flash"})
engine_pro = GeminiEngine(config={"model": "gemini-2.5-pro"})
```

#### GlmOCREngine (mlx-vlm, macOS Apple Silicon)

GLM-OCR (0.9B) via mlx-vlm. Document-focused VLM -- limited effectiveness on scene text.

```python
from panoocr.engines.glm_ocr import GlmOCREngine

engine = GlmOCREngine()
```

#### DotsOCREngine (mlx-vlm, macOS Apple Silicon)

DOTS.OCR (2.9B) via mlx-vlm. Document layout parser -- limited effectiveness on scene text.

```python
from panoocr.engines.dots_ocr import DotsOCREngine

engine = DotsOCREngine()
```

#### TrOCREngine (experimental)

Microsoft's TrOCR transformer-based single-line OCR. Does not detect text regions -- treats the entire image as one text line. Experimental; consider other engines for panorama OCR.

```python
from panoocr.engines.trocr import TrOCREngine

engine = TrOCREngine()
```

## Advanced Usage

### Custom Perspectives

```python
from panoocr import PanoOCR, PerspectivePreset, generate_perspectives

# Use a preset
pano = PanoOCR(engine, perspectives=PerspectivePreset.ZOOMED_IN)

# Or create custom perspectives
custom_perspectives = generate_perspectives(
    fov=30,              # Horizontal FOV in degrees
    resolution=1024,     # Pixel width/height
    overlap=0.5,         # 50% overlap between adjacent views
    pitch_angles=[0, 15, -15],  # Multiple rows
)
pano = PanoOCR(engine, perspectives=custom_perspectives)
```

### Multi-Scale Detection

```python
from panoocr import PanoOCR, PerspectivePreset

pano = PanoOCR(engine)

# Run OCR at multiple scales to catch both small and large text
result = pano.recognize_multi(
    "panorama.jpg",
    presets=[
        PerspectivePreset.ZOOMED_IN,
        PerspectivePreset.DEFAULT,
    ],
)
```

### Custom Deduplication Settings

```python
from panoocr import PanoOCR, DedupOptions

pano = PanoOCR(
    engine,
    dedup_options=DedupOptions(
        min_text_similarity=0.6,
        min_intersection_ratio=0.2,
    ),
)
```

### Using the Protocol for Custom Engines

You can create your own OCR engine by implementing the `OCREngine` protocol:

```python
from panoocr import OCREngine, FlatOCRResult
from PIL import Image

class MyCustomEngine:
    def recognize(self, image: Image.Image) -> list[FlatOCRResult]:
        # Your OCR implementation here
        # Return results with normalized bounding boxes (0-1 range)
        ...

# No inheritance required - just implement the method
engine = MyCustomEngine()
pano = PanoOCR(engine)
```

## Preview Tool

The package includes an interactive HTML preview tool for visualizing OCR results on the panorama. Open `preview/index.html` in a browser and drag & drop your panorama image and JSON results file.

## Output Format

OCR results are returned as `SphereOCRResult` objects with spherical coordinates:

```json
{
  "results": [
    {
      "text": "HELLO WORLD",
      "confidence": 0.95,
      "yaw": 45.0,
      "pitch": 0.0,
      "width": 10.5,
      "height": 3.2,
      "engine": "APPLE_VISION_FRAMEWORK"
    }
  ],
  "image_path": "panorama.jpg",
  "perspective_preset": "default"
}
```

- `yaw`: Horizontal angle in degrees (-180 to 180)
- `pitch`: Vertical angle in degrees (-90 to 90)
- `width`, `height`: Angular dimensions in degrees

## License

MIT License - see [LICENSE](LICENSE) for details.
