Metadata-Version: 2.4
Name: handscribe
Version: 0.1.2
Summary: Multilingual handwritten OCR for student notes - production-grade text extraction
Author-email: Ronald Gosso <ronaldgosso@gmail.com>
License: GPL-3.0
Project-URL: Homepage, https://github.com/ronaldgosso/handscribe
Project-URL: Bug Reports, https://github.com/ronaldgosso/handscribe/issues
Project-URL: Source, https://github.com/ronaldgosso/handscribe
Keywords: ocr,handwriting,multilingual,student-notes,easyocr,paddleocr,trocr
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: easyocr>=1.7
Requires-Dist: opencv-python-headless>=4.8
Requires-Dist: pillow>=10.0
Requires-Dist: numpy>=1.24
Requires-Dist: typer>=0.9
Requires-Dist: rich>=13.0
Requires-Dist: fastapi>=0.110
Requires-Dist: uvicorn>=0.29
Requires-Dist: python-multipart>=0.0.6
Provides-Extra: paddle
Requires-Dist: paddlepaddle>=2.6; extra == "paddle"
Requires-Dist: paddleocr>=2.7; extra == "paddle"
Provides-Extra: trocr
Requires-Dist: transformers>=4.40; extra == "trocr"
Requires-Dist: torch>=2.0; extra == "trocr"
Provides-Extra: dev
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: mypy>=1.5; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: httpx>=0.24; extra == "dev"
Provides-Extra: all
Requires-Dist: handscribe[paddle,trocr]; extra == "all"
Dynamic: license-file

# 🖋️ HandScribe OCR

> **Production-grade multilingual handwritten OCR for student notes**

<p align="center">
  <img src="https://raw.githubusercontent.com/ronaldgosso/handscribe/main/docs/logo.svg" alt="HandScribe Logo" width="400">
</p>

<p align="center">
  <a href="https://github.com/ronaldgosso/handscribe/actions/workflows/ci.yml"><img src="https://github.com/ronaldgosso/handscribe/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="https://github.com/ronaldgosso/handscribe/actions/workflows/publish.yml"><img src="https://github.com/ronaldgosso/handscribe/actions/workflows/publish.yml/badge.svg" alt="Publish"></a>
  <a href="https://ronaldgosso.github.io/handscribe/"><img src="https://img.shields.io/badge/docs-ronaldgosso.github.io-blue" alt="Docs"></a>
  <a href="https://www.gnu.org/licenses/gpl-3.0"><img src="https://img.shields.io/badge/License-GPL--3.0-blue.svg" alt="License: GPL-3.0"></a>
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.9+-blue.svg" alt="Python 3.9+"></a>
  <a href="https://hub.docker.com/r/ronaldgosso/handscribe"><img src="https://img.shields.io/badge/docker-ready-blue.svg" alt="Docker"></a>
</p>

---

## What is HandScribe?

HandScribe extracts text from **handwritten student notes** in 80+ languages. It wraps three OCR engines — EasyOCR, PaddleOCR, and TrOCR — behind a single interface with advanced image preprocessing, a CLI, a REST API, and one-command Docker deployment.

Built for Tanzanian students. Designed for everyone.

---

## Quick Start

| Method | Command |
|--------|---------|
| **Docker** (zero setup) | `docker run -p 8000:8000 ronaldgosso/handscribe` |
| **pip install** | `pip install handscribe` |
| **From source** | `git clone https://github.com/ronaldgosso/handscribe.git && cd handscribe && pip install -e .` |

---

## Usage

### CLI

```bash
handscribe extract notes.jpg -b easyocr -l en,sw     # extract text
handscribe extract notes.jpg --json                   # output as JSON
handscribe extract notes.jpg -o output.txt -c 0.6     # save to file
handscribe batch ./images/ -o ./results/               # batch process
handscribe compare notes.jpg -l en,sw                  # compare all backends
```
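
To call the CLI from a script, the commands above can be assembled programmatically. The following is a minimal sketch, not part of HandScribe: `build_extract_command` and `run_extract` are hypothetical helper names, and only flags shown above (`-b`, `-l`, `--json`, `-o`) are used.

```python
import subprocess
from typing import List, Optional, Sequence


def build_extract_command(
    image: str,
    backend: str = "easyocr",
    languages: Sequence[str] = ("en",),
    as_json: bool = False,
    output: Optional[str] = None,
) -> List[str]:
    """Assemble the argv for `handscribe extract` (hypothetical helper)."""
    # Languages are passed as one comma-separated value, as in `-l en,sw`.
    cmd = ["handscribe", "extract", image, "-b", backend, "-l", ",".join(languages)]
    if as_json:
        cmd.append("--json")
    if output is not None:
        cmd += ["-o", output]
    return cmd


def run_extract(image: str, **options) -> str:
    """Run the CLI and return whatever it printed to stdout."""
    result = subprocess.run(
        build_extract_command(image, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.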

### REST API

```bash
uvicorn ocr_engine.api:api --port 8000
# Interactive docs → http://localhost:8000/docs

curl -X POST http://localhost:8000/ocr \
  -F "file=@notes.jpg" -F "backend=easyocr" -F "languages=en,sw"
```
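
The same request can be sent from Python without extra dependencies. This is a sketch under assumptions: the form field names (`file`, `backend`, `languages`) are taken from the `curl` call above, while `build_multipart` and `post_ocr` are hypothetical helpers, and the response is assumed to be JSON whose exact shape is not documented here.

```python
import json
import os
import urllib.request
import uuid
from typing import Dict, Tuple


def build_multipart(fields: Dict[str, str], filename: str, file_bytes: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        'Content-Type: application/octet-stream\r\n\r\n'
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_ocr(image_path: str, url: str = "http://localhost:8000/ocr",
             backend: str = "easyocr", languages: str = "en,sw"):
    """POST an image to a running HandScribe server (assumes a JSON reply)."""
    with open(image_path, "rb") as fh:
        body, content_type = build_multipart(
            {"backend": backend, "languages": languages},
            os.path.basename(image_path),
            fh.read(),
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from the previous block running, `post_ocr("notes.jpg")` would send the same request as the `curl` command.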

### Python

```python
from ocr_engine import OCREngine, OCRBackend

engine = OCREngine(backend=OCRBackend.EASYOCR, languages=["en", "sw"])
text = engine.extract_text("student_notes.jpg")
```
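
For a whole folder of scans, the call above can be wrapped in a loop. The sketch below is illustrative only: `extract_folder` and `IMAGE_SUFFIXES` are hypothetical names, and it assumes `engine.extract_text` accepts a file path and returns a string, as the example above suggests.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Assumed set of image extensions; adjust to whatever your scans use.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


def extract_folder(
    extract: Callable[[str], str],
    folder: str,
    suffixes: Iterable[str] = IMAGE_SUFFIXES,
) -> Dict[str, str]:
    """Apply `extract` to every image in `folder`, keyed by file name."""
    allowed = {suffix.lower() for suffix in suffixes}
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-directories and non-image files.
        if path.is_file() and path.suffix.lower() in allowed:
            results[path.name] = extract(str(path))
    return results
```

Usage would look like `texts = extract_folder(engine.extract_text, "./images")`. Taking the extraction function as a parameter keeps the helper independent of the chosen backend.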

---

## OCR Backends

| Backend | Best For | Languages | Speed | Accuracy |
|---------|----------|-----------|-------|----------|
| **EasyOCR** | Quick setup, mixed scripts | 80+ | ⚡⚡⚡ | ⭐⭐⭐⭐ |
| **PaddleOCR** | Fast processing, documents | 80+ | ⚡⚡⚡⚡ | ⭐⭐⭐⭐ |
| **TrOCR** | Handwriting accuracy | English* | ⚡⚡ | ⭐⭐⭐⭐⭐ |

*\*TrOCR can be fine-tuned for other languages.*

### Language Codes

| Language | EasyOCR | PaddleOCR |
|----------|---------|-----------|
| English | `en` | `en` |
| Swahili | `sw` | `en` (Latin script) |
| Arabic | `ar` | `arabic` |
| Hindi | `hi` | `hi` |
| French | `fr` | `french` |

---

## CI/CD Pipeline

HandScribe uses three separate GitHub Actions workflows:

| Workflow | File | Triggers | What It Does |
|----------|------|----------|-------------|
| **CI** | [`ci.yml`](.github/workflows/ci.yml) | Push, PR | Lint → Test → Build & push Docker image |
| **Publish** | [`publish.yml`](.github/workflows/publish.yml) | Tag push (`v*`), Release, Manual | Build & publish to PyPI |
| **Pages** | [`pages.yml`](.github/workflows/pages.yml) | Push to `main` (docs/) | Deploy landing page to GitHub Pages |

### How It Works

```
push / PR
   │
   ├── lint ─── ruff ─ black ─ mypy
   │
   └── test ─── pytest (41 tests, 60% coverage)
         │
         └── on main ── build & push Docker image to Docker Hub


push tag v0.1.0
   │
   └── publish ─── build ─── twine check ─── upload to PyPI
```

### Status Badges

| Badge | Status |
|-------|--------|
| CI Build | [![CI](https://github.com/ronaldgosso/handscribe/actions/workflows/ci.yml/badge.svg)](https://github.com/ronaldgosso/handscribe/actions/workflows/ci.yml) |
| PyPI Publish | [![Publish](https://github.com/ronaldgosso/handscribe/actions/workflows/publish.yml/badge.svg)](https://github.com/ronaldgosso/handscribe/actions/workflows/publish.yml) |

### Publishing a Release

```bash
# 1. Bump version in pyproject.toml
# 2. Tag and push
git tag v0.1.0
git push origin v0.1.0

# → GitHub Actions auto-publishes to PyPI
```
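
A tag that does not match the version in `pyproject.toml` is an easy mistake to make in step 1. The check below is a sketch of one way to catch it before pushing. It assumes the version is declared as a plain `version = "..."` line (not a dynamic version), and `read_project_version` and `tag_matches_version` are hypothetical helpers that are not part of the repository.

```python
import re
from typing import Optional

# Matches a line such as: version = "0.1.2"
VERSION_LINE = re.compile(r'^version\s*=\s*"([^"]+)"\s*$', re.MULTILINE)


def read_project_version(pyproject_text: str) -> Optional[str]:
    """Return the first statically declared version, or None if absent."""
    match = VERSION_LINE.search(pyproject_text)
    return match.group(1) if match else None


def tag_matches_version(tag: str, pyproject_text: str) -> bool:
    """True when the tag is exactly 'v' followed by the declared version."""
    version = read_project_version(pyproject_text)
    return version is not None and tag == f"v{version}"
```

A regular expression is used instead of `tomllib` because the package supports Python 3.9, where `tomllib` is not available. The pattern takes the first `version` line it finds, so it would need tightening if another table in the file declares its own version.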

---

## Docker

```bash
docker run -p 8000:8000 ronaldgosso/handscribe        # run
docker build -t handscribe .                           # build
docker compose up -d                                   # compose
```

Full instructions in [CONTRIBUTING.md](CONTRIBUTING.md).

---

## Architecture

```
handscribe/
├── ocr_engine/
│   ├── engine.py            # Core OCR engine (3 backends)
│   ├── preprocessing.py     # Denoise, CLAHE, binarize, deskew
│   ├── cli.py               # CLI (Typer) — extract, batch, compare, info
│   └── api.py               # REST API (FastAPI) — /ocr, /ocr/text, /ocr/batch
├── tests/
│   └── test_engine.py       # 41 tests
├── .github/workflows/
│   ├── ci.yml               # Lint → Test → Docker
│   ├── publish.yml          # Build → PyPI
│   └── pages.yml            # Deploy docs/ to GitHub Pages
├── Dockerfile               # Optimized single-stage build
├── docker-compose.yml
└── pyproject.toml
```

---

## 🔧 Add Other OCR Backends

EasyOCR is included by default. Add PaddleOCR or TrOCR as optional extras:

```bash
pip install "handscribe[paddle]"    # PaddleOCR: faster, document-style
pip install "handscribe[trocr]"     # TrOCR: highest handwriting accuracy
pip install "handscribe[all]"       # Both PaddleOCR and TrOCR
```

```python
from ocr_engine import OCREngine, OCRBackend

# Switch backend at runtime
engine = OCREngine(backend=OCRBackend.PADDLE, languages=["en"])
engine = OCREngine(backend=OCRBackend.TROCR)
```
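
Because PaddleOCR and TrOCR are optional, a script may want to pick whichever backend is actually installed. The following sketch checks for the import names of the packages listed in the extras (`paddleocr`, `paddle`, `transformers`, `torch`, `easyocr`). `BACKEND_MODULES` and `pick_backend` are hypothetical names, and the preference order is an assumption rather than HandScribe behaviour.

```python
from importlib.util import find_spec
from typing import Callable, Dict, Sequence, Tuple

# Import names each backend needs (paddlepaddle is imported as `paddle`).
BACKEND_MODULES: Dict[str, Tuple[str, ...]] = {
    "trocr": ("transformers", "torch"),
    "paddle": ("paddleocr", "paddle"),
    "easyocr": ("easyocr",),
}


def _is_installed(module: str) -> bool:
    """True if a top-level module can be found without importing it."""
    return find_spec(module) is not None


def pick_backend(
    preferred: Sequence[str] = ("trocr", "paddle", "easyocr"),
    is_installed: Callable[[str], bool] = _is_installed,
) -> str:
    """Return the first backend in `preferred` whose modules are all present."""
    for name in preferred:
        if all(is_installed(module) for module in BACKEND_MODULES[name]):
            return name
    raise RuntimeError("No OCR backend is installed")
```

The returned name can then be mapped to `OCRBackend.TROCR`, `OCRBackend.PADDLE`, or `OCRBackend.EASYOCR` before constructing the engine.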

## 🧪 Development

```bash
git clone https://github.com/ronaldgosso/handscribe.git
cd handscribe
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e ".[all,dev]"
pytest tests/ -v
```

Full guide → [CONTRIBUTING.md](CONTRIBUTING.md)

---

## Acknowledgments

- **EasyOCR** — Jaided AI
- **PaddleOCR** — PaddlePaddle
- **TrOCR** — Microsoft
- **Tanzanian Students** — the inspiration

---

**Ronald Gosso** — ronaldgosso@gmail.com · [GitHub](https://github.com/ronaldgosso/handscribe)

*Made with ❤️ for students everywhere*
