Metadata-Version: 2.4
Name: panoocr
Version: 0.2.0
Summary: PanoOCR is a Python library for performing Optical Character Recognition (OCR) on equirectangular panorama images with automatic perspective projection and deduplication.
Project-URL: Homepage, https://github.com/yz3440/panoocr
Project-URL: Repository, https://github.com/yz3440/panoocr
Project-URL: Issues, https://github.com/yz3440/panoocr/issues
Author-email: Yufeng Zhao <yufeng-zhao@outlook.com>
License: MIT
License-File: LICENSE
Keywords: 360,computer-vision,equirectangular,ocr,panorama,text-recognition
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Processing
Classifier: Topic :: Text Processing :: General
Requires-Python: >=3.11
Requires-Dist: geopandas>=0.14.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: pillow>=10.0.0
Requires-Dist: py360convert>=0.1.0
Requires-Dist: shapely>=2.0.0
Requires-Dist: textdistance>=4.5.0
Requires-Dist: tqdm>=4.60.0
Provides-Extra: docs
Requires-Dist: mkdocs-terminal>=4.6.0; extra == 'docs'
Requires-Dist: mkdocs>=1.5.0; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == 'docs'
Requires-Dist: pymdown-extensions>=10.0.0; extra == 'docs'
Provides-Extra: easyocr
Requires-Dist: easyocr>=1.7.0; extra == 'easyocr'
Provides-Extra: florence2
Requires-Dist: einops>=0.7.0; extra == 'florence2'
Requires-Dist: timm>=0.9.0; extra == 'florence2'
Requires-Dist: torch>=2.0.0; extra == 'florence2'
Requires-Dist: transformers>=4.40.0; extra == 'florence2'
Provides-Extra: full
Requires-Dist: easyocr>=1.7.0; extra == 'full'
Requires-Dist: einops>=0.7.0; extra == 'full'
Requires-Dist: matplotlib>=3.7.0; extra == 'full'
Requires-Dist: opencv-python>=4.8.0; extra == 'full'
Requires-Dist: paddleocr>=2.7.0; extra == 'full'
Requires-Dist: scipy>=1.10.0; extra == 'full'
Requires-Dist: timm>=0.9.0; extra == 'full'
Requires-Dist: torch>=2.0.0; extra == 'full'
Requires-Dist: transformers>=4.40.0; extra == 'full'
Provides-Extra: macocr
Requires-Dist: ocrmac>=0.2.0; extra == 'macocr'
Provides-Extra: paddleocr
Requires-Dist: paddleocr>=2.7.0; extra == 'paddleocr'
Provides-Extra: trocr
Requires-Dist: torch>=2.0.0; extra == 'trocr'
Requires-Dist: transformers>=4.40.0; extra == 'trocr'
Provides-Extra: viz
Requires-Dist: matplotlib>=3.7.0; extra == 'viz'
Requires-Dist: opencv-python>=4.8.0; extra == 'viz'
Requires-Dist: scipy>=1.10.0; extra == 'viz'
Description-Content-Type: text/markdown

# PanoOCR

PanoOCR is a Python library for performing Optical Character Recognition (OCR) on equirectangular panorama images with automatic perspective projection and deduplication.

https://github.com/user-attachments/assets/57507c48-ec88-4d4a-bf68-067eefc9d42f

## Features

- **Multiple OCR Engines**: Support for MacOCR (Apple Vision), EasyOCR, PaddleOCR, Florence-2, and TrOCR
- **Automatic Perspective Projection**: Converts equirectangular panoramas to multiple perspective views for better OCR accuracy
- **Deduplication**: Automatically removes duplicate text detections across overlapping perspective views
- **Spherical Coordinates**: Returns OCR results in yaw/pitch coordinates that map directly to the panorama
- **Preview Tool**: Interactive 3D preview of OCR results on the panorama

## Installation

Install the base package:

```bash
pip install panoocr
```

Install with OCR engine dependencies:

```bash
# macOS (Apple Vision Framework)
pip install "panoocr[macocr]"

# EasyOCR (cross-platform)
pip install "panoocr[easyocr]"

# PaddleOCR (cross-platform)
pip install "panoocr[paddleocr]"

# Florence-2 (GPU recommended)
pip install "panoocr[florence2]"

# TrOCR
pip install "panoocr[trocr]"

# All engines (excluding platform-specific macocr)
pip install "panoocr[full]"
```

Using uv (recommended):

```bash
uv add panoocr
uv add "panoocr[macocr]"  # or other extras
```

## Quick Start

```python
from panoocr import PanoOCR
from panoocr.engines.macocr import MacOCREngine  # or other engines

# Create an OCR engine
engine = MacOCREngine()

# Create the PanoOCR pipeline
pano = PanoOCR(engine)

# Run OCR on a panorama
result = pano.recognize("panorama.jpg")

# Save results as JSON
result.save_json("results.json")

# Access individual results
for r in result.results:
    print(f"Text: {r.text}")
    print(f"Position: yaw={r.yaw}°, pitch={r.pitch}°")
    print(f"Confidence: {r.confidence}")
```

## Available OCR Engines

### MacOCREngine (macOS only)

Uses Apple's Vision Framework for fast, accurate OCR on macOS.

```python
from panoocr.engines.macocr import MacOCREngine, MacOCRLanguageCode

engine = MacOCREngine(config={
    "language_preference": [MacOCRLanguageCode.ENGLISH_US],
})
```

### EasyOCREngine

Cross-platform OCR supporting 80+ languages.

```python
from panoocr.engines.easyocr import EasyOCREngine, EasyOCRLanguageCode

engine = EasyOCREngine(config={
    "language_preference": [EasyOCRLanguageCode.ENGLISH],
    "gpu": True,
})
```

### PaddleOCREngine

PaddlePaddle-based OCR with an optional V4 server model for Chinese text.

```python
from panoocr.engines.paddleocr import PaddleOCREngine, PaddleOCRLanguageCode

engine = PaddleOCREngine(config={
    "language_preference": PaddleOCRLanguageCode.CHINESE,
    "use_v4_server": True,
})
```

### Florence2OCREngine

Microsoft's Florence-2 vision-language model for OCR.

```python
from panoocr.engines.florence2 import Florence2OCREngine

engine = Florence2OCREngine(config={
    "model_id": "microsoft/Florence-2-large",
})
```

## Advanced Usage

### Custom Perspectives

```python
from panoocr import PanoOCR, PerspectivePreset, generate_perspectives

# Use a preset
pano = PanoOCR(engine, perspectives=PerspectivePreset.ZOOMED_IN)

# Or create custom perspectives
custom_perspectives = generate_perspectives(
    pixel_size=1024,
    horizontal_fov=30,
    vertical_fov=30,
    pitch_offsets=[0, 15, -15],  # Multiple rows
)
pano = PanoOCR(engine, perspectives=custom_perspectives)
```
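
To get a feel for how the field of view drives the number of perspective views, here is a rough sketch. It does not call PanoOCR: `yaw_centers` and its `overlap` parameter are hypothetical names introduced for illustration, and `generate_perspectives` may space its views differently.

```python
import math


def yaw_centers(horizontal_fov: float, overlap: float = 0.5) -> list[float]:
    """Return evenly spaced yaw centers (degrees) that cover a full 360° ring.

    `overlap` is the fraction of each view shared with its neighbor.
    Both the function and the parameter are illustrative assumptions,
    not part of the PanoOCR API.
    """
    if not 0 < horizontal_fov <= 360:
        raise ValueError("horizontal_fov must be in (0, 360]")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")

    step = horizontal_fov * (1 - overlap)  # largest yaw advance between neighbors
    count = math.ceil(360 / step)          # views needed to close the ring
    spacing = 360 / count                  # even spacing actually used
    return [i * spacing - 180 for i in range(count)]  # from -180 up to just below 180


centers = yaw_centers(horizontal_fov=30, overlap=0.5)
print(len(centers), "views per row")
```

Narrower fields of view resolve small text better but multiply the number of views, and each extra pitch row repeats the whole ring, so OCR time grows accordingly.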

### Multi-Scale Detection

```python
from panoocr import PanoOCR, PerspectivePreset

pano = PanoOCR(engine)

# Run OCR at multiple scales to catch both small and large text
result = pano.recognize_multi(
    "panorama.jpg",
    presets=[
        PerspectivePreset.ZOOMED_IN,
        PerspectivePreset.DEFAULT,
    ],
)
```

### Custom Deduplication Settings

```python
from panoocr import PanoOCR, DedupOptions

pano = PanoOCR(
    engine,
    dedup_options=DedupOptions(
        min_text_similarity=0.6,
        min_intersection_ratio=0.2,
    ),
)
```
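
The two option names suggest that detections are merged when their text is similar enough and their regions overlap enough. The sketch below illustrates that idea only and is not the library's implementation: it assumes both thresholds must be met, substitutes `difflib` for whatever text metric PanoOCR uses, treats each detection as a flat yaw/pitch rectangle, and ignores wraparound at ±180°. `Detection`, `intersection_ratio`, and `is_duplicate` are hypothetical names.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Detection:
    """Hypothetical stand-in for an OCR result (all angles in degrees)."""
    text: str
    yaw: float
    pitch: float
    width: float
    height: float


def text_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] using difflib (an assumed metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def intersection_ratio(a: Detection, b: Detection) -> float:
    """Overlap area divided by the smaller box's area (flat approximation)."""
    overlap_w = min(a.yaw + a.width / 2, b.yaw + b.width / 2) - max(
        a.yaw - a.width / 2, b.yaw - b.width / 2
    )
    overlap_h = min(a.pitch + a.height / 2, b.pitch + b.height / 2) - max(
        a.pitch - a.height / 2, b.pitch - b.height / 2
    )
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0
    smaller_area = min(a.width * a.height, b.width * b.height)
    return (overlap_w * overlap_h) / smaller_area


def is_duplicate(
    a: Detection,
    b: Detection,
    min_text_similarity: float = 0.6,
    min_intersection_ratio: float = 0.2,
) -> bool:
    """Treat two detections as duplicates when both thresholds are met."""
    return (
        text_similarity(a.text, b.text) >= min_text_similarity
        and intersection_ratio(a, b) >= min_intersection_ratio
    )


first = Detection("HELLO WORLD", yaw=45.0, pitch=0.0, width=10.0, height=4.0)
second = Detection("HELLO WORLD", yaw=46.0, pitch=0.0, width=10.0, height=4.0)
print(is_duplicate(first, second))
```

Under these assumptions, raising either threshold keeps more detections (fewer merges), while lowering them merges more aggressively.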

### Using the Protocol for Custom Engines

You can create your own OCR engine by implementing the `OCREngine` protocol:

```python
from panoocr import OCREngine, FlatOCRResult, PanoOCR
from PIL import Image

class MyCustomEngine:
    def recognize(self, image: Image.Image) -> list[FlatOCRResult]:
        # Your OCR implementation here
        # Return results with normalized bounding boxes (0-1 range)
        ...

# No inheritance required - just implement the method
engine = MyCustomEngine()
pano = PanoOCR(engine)
```

## Preview Tool

The package includes an interactive HTML preview tool for visualizing OCR results on the panorama. Open `preview/index.html` in a browser, then drag and drop your panorama image and the JSON results file onto the page.

## Output Format

OCR results are returned as `SphereOCRResult` objects with spherical coordinates. Saved as JSON, they look like this:

```json
{
  "results": [
    {
      "text": "HELLO WORLD",
      "confidence": 0.95,
      "yaw": 45.0,
      "pitch": 0.0,
      "width": 10.5,
      "height": 3.2,
      "engine": "APPLE_VISION_FRAMEWORK"
    }
  ],
  "image_path": "panorama.jpg",
  "perspective_preset": "default"
}
```

- `yaw`: Horizontal angle in degrees (-180 to 180)
- `pitch`: Vertical angle in degrees (-90 to 90)
- `width`, `height`: Angular dimensions in degrees
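
Because the angles refer to the panorama itself, a saved result can be mapped back to pixel positions in the equirectangular image. The following is a minimal sketch using only the standard library. It assumes yaw 0 sits at the horizontal center of the image and increases to the right, and that positive pitch points up; check this against your own data, since the convention is not stated above. `sphere_to_pixel` is a hypothetical helper, not part of PanoOCR.

```python
import json


def sphere_to_pixel(
    yaw: float, pitch: float, image_width: int, image_height: int
) -> tuple[float, float]:
    """Map yaw/pitch (degrees) to equirectangular pixel coordinates.

    Assumed convention: yaw 0 at the image center, increasing to the right;
    pitch 0 at the horizon, positive toward the top of the image.
    """
    x = (yaw / 360.0 + 0.5) * image_width
    y = (0.5 - pitch / 180.0) * image_height
    return x, y


# Inline sample in the format shown above; in practice, read the saved JSON file.
sample = """
{
  "results": [
    {"text": "HELLO WORLD", "confidence": 0.95, "yaw": 45.0, "pitch": 0.0,
     "width": 10.5, "height": 3.2, "engine": "APPLE_VISION_FRAMEWORK"}
  ],
  "image_path": "panorama.jpg",
  "perspective_preset": "default"
}
"""

data = json.loads(sample)
for item in data["results"]:
    if item["confidence"] >= 0.5:  # skip low-confidence detections
        x, y = sphere_to_pixel(item["yaw"], item["pitch"], 4096, 2048)
        print(f"{item['text']}: x={x:.0f}, y={y:.0f}")
```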

## License

MIT License - see [LICENSE](LICENSE) for details.
