Metadata-Version: 2.4
Name: renzi
Version: 0.1.0
Summary: Local OCR extraction + LLM text correction. PaddleOCR for recognition, Ollama for correction, fully offline.
Author-email: CodeOfMe <wedonotuse@outlook.com>
License: GPL-3.0
Project-URL: Homepage, https://github.com/CodeOfMe/RenZi
Project-URL: Bug Tracker, https://github.com/CodeOfMe/RenZi/issues
Keywords: ocr,paddleocr,ollama,llm,text-correction,chinese-ocr,offline,local-ai
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Graphics :: Capture :: Digital Camera
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: paddleocr>=3.6.0
Requires-Dist: flask>=2.0
Requires-Dist: pillow>=9.0
Requires-Dist: requests>=2.20
Requires-Dist: werkzeug>=2.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Dynamic: license-file

# RenZi 认字

Local OCR extraction + LLM text correction. PaddleOCR recognizes text in images, a local Ollama model corrects OCR mistakes, and an optional Flask web UI ties it together. After the one-time model downloads, everything runs offline with no cloud APIs.

## What It Does

RenZi takes images containing Chinese/English text and produces clean, machine-readable text:

1. **OCR** — PaddleOCR (PP-OCRv5) extracts text plus per-region bounding boxes and confidence scores.
2. **Visualization** — renders an annotated image (green boxes) and a reconstructed text bitmap that preserves the original layout on a white canvas.
3. **Correction** — sends the raw OCR text to a local Ollama model (`gemma3:1b` by default) that fixes typos and misrecognized characters while keeping the original language and line structure.

Supported formats: `.png`, `.jpg`, `.jpeg`, `.bmp`, `.tiff`, `.tif`, `.webp`.
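
As a rough illustration of the format list above, the sketch below filters a directory down to files with a supported extension. The helper names `is_supported` and `list_images` are hypothetical and not part of RenZi, and case-insensitive matching is an assumption.

```python
from pathlib import Path

# Extensions listed in this README; case-insensitive matching is an assumption.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif", ".webp"}


def is_supported(path):
    """Return True if the file name ends in one of the supported extensions."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def list_images(directory):
    """Return supported image files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir() if p.is_file() and is_supported(p)
    )
```

Calling `list_images("images/")` before a batch run gives a quick preview of which files would be candidates for OCR under these assumptions.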

What it does **not** do: streaming OCR, mobile capture, cloud OCR, translation between languages.
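
The correction step (step 3 above) can be approximated with a direct call to Ollama's `/api/generate` endpoint. This is a minimal sketch, not RenZi's implementation: the prompt wording and the function names are illustrative assumptions, and it uses the standard library's `urllib` rather than `requests` to stay dependency-free.

```python
import json
import urllib.request

OLLAMA_BASE = "http://127.0.0.1:11434"  # Ollama's default local address


def build_payload(text, model="gemma3:1b"):
    """Build a non-streaming /api/generate request body.

    The prompt wording is illustrative, not RenZi's actual prompt.
    """
    prompt = (
        "Fix OCR typos and misrecognized characters in the text below. "
        "Keep the original language and line breaks. "
        "Return only the corrected text.\n\n" + text
    )
    return {"model": model, "prompt": prompt, "stream": False}


def correct_with_ollama(text, model="gemma3:1b", base=OLLAMA_BASE, timeout=60):
    """POST the text to a local Ollama server and return the model's reply."""
    request = urllib.request.Request(
        base + "/api/generate",
        data=json.dumps(build_payload(text, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"].strip()
```

Setting `"stream": False` makes Ollama return a single JSON object, which keeps the parsing to one `json.loads` call.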

## Features

- Single image or whole-directory batch OCR
- PaddleOCR 3.6 + PP-OCRv5 server detection/recognition models
- Optional annotated image and reconstructed text bitmap PNGs
- Local Ollama LLM correction (offline, no API keys)
- Flask web UI with drag-and-drop upload and side-by-side raw/corrected text
- Unified Python API with `ToolResult` dataclass
- OpenAI function-calling tools schema for agent integration
- CLI with unified flags (`-V -v -q --json -o`)

## Requirements

- Python >= 3.9
- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) 3.6.0 + PaddlePaddle 3.0.0 (see install notes below)
- [Ollama](https://ollama.com) running locally with a model pulled (e.g. `ollama pull gemma3:1b`)
- A CJK TrueType font for bitmap rendering (auto-detected on Windows/macOS/Linux)

### PaddlePaddle version pin

The known-good combination is **PaddlePaddle 3.0.0** + **PaddleOCR 3.6.0**. Newer PaddlePaddle releases (3.1+) break loading of older models through the PIR executor. Install the pinned version from the Baidu mirror:

```bash
pip install paddlepaddle==3.0.0 -i https://mirror.baidu.com/pypi/simple
```
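
To confirm that an environment matches the pin, a small check along the following lines may help. It is a sketch that compares installed distribution versions against the pair named above. The helper names are hypothetical, and it assumes the CPU distribution name `paddlepaddle` (GPU builds are published under a different name).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions this README names as the known-good combination.
KNOWN_GOOD = {"paddlepaddle": "3.0.0", "paddleocr": "3.6.0"}


def installed_versions(packages=KNOWN_GOOD):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(found, expected=KNOWN_GOOD):
    """Return {package: (installed, expected)} for every package that differs."""
    return {
        name: (found.get(name), want)
        for name, want in expected.items()
        if found.get(name) != want
    }
```

Running `print(mismatches(installed_versions()))` prints an empty dict when both packages match exactly. The exact-match comparison is stricter than the package's declared `paddleocr>=3.6.0` requirement, so treat a mismatch as a hint rather than an error.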

## Installation

Install in editable mode from the root of a repository checkout:

```bash
pip install -e .
```

For development dependencies:

```bash
pip install -e ".[dev]"
```

### Ollama setup (for correction)

```bash
ollama pull gemma3:1b
```
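
Before running correction, it can be useful to verify that the Ollama server is up and the model has been pulled. The sketch below queries Ollama's `/api/tags` endpoint, which lists local models. The function names are hypothetical, and the default address `http://127.0.0.1:11434` is assumed.

```python
import json
import urllib.request


def has_model(tags, name):
    """Check a parsed /api/tags response for a model called `name`."""
    return any(m.get("name") == name for m in tags.get("models", []))


def fetch_tags(base="http://127.0.0.1:11434", timeout=5):
    """Fetch the list of locally available models from a running Ollama server."""
    with urllib.request.urlopen(base + "/api/tags", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `has_model(fetch_tags(), "gemma3:1b")` reports whether the default model is available. Ollama lists models with their tag, so a model pulled without one appears as `name:latest`.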

## Quick Start

Recognize one image and print text:

```bash
renzi photo.jpg
```

Batch a directory and write `.txt` transcripts:

```bash
renzi images/ -l ch -o transcripts/
```

Full pipeline with annotated image + text bitmap + corrected text file:

```bash
renzi scan.png --pipeline --bbox annotated.png --bitmap bitmap.png -o fixed.txt
```

Correct OCR text piped on stdin:

```bash
echo "我在吉林省、长春市" | renzi --correct -
```

Launch the web UI:

```bash
python -m renzi.web
# open http://127.0.0.1:8765
```

## CLI Usage

```
renzi [-V] [-v] [-q] [--json] [-l LANG] [--confidence N] [-o PATH]
     [--bbox PATH] [--bitmap PATH] [--ollama-base URL] [--ollama-model NAME]
     [--timeout S] [--correct] [--pipeline] [PATH]
```

| Flag | Meaning |
|------|---------|
| `-V`, `--version` | Print version and exit |
| `-v`, `--verbose` | Show per-item confidence and geometry |
| `-q`, `--quiet` | Suppress non-essential output |
| `--json` | Output results as JSON |
| `-l`, `--lang` | OCR language hint (default `ch`) |
| `--confidence` | Minimum OCR confidence (default `0.5`) |
| `-o`, `--output` | Output file (single) or directory (batch) |
| `--bbox` | Write annotated image PNG to this path |
| `--bitmap` | Write reconstructed text bitmap PNG to this path |
| `--ollama-base` | Ollama API base URL |
| `--ollama-model` | Ollama model name |
| `--timeout` | LLM request timeout in seconds |
| `--correct` | Correct text from stdin instead of running OCR |
| `--pipeline` | Run OCR + render + LLM correction |
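
The effect of `--confidence` can be pictured as a filter over recognized regions. The sketch below assumes items shaped like those in this README's Python API example (dicts with `text` and `confidence` keys). The helper names are hypothetical, and whether RenZi treats the threshold as inclusive is an assumption.

```python
def filter_by_confidence(items, minimum=0.5):
    """Keep OCR items whose confidence is at least `minimum` (inclusive by assumption)."""
    return [item for item in items if item["confidence"] >= minimum]


def join_text(items):
    """Join the text of the kept items, one recognized region per line."""
    return "\n".join(item["text"] for item in items)
```

Raising the threshold trades recall for precision: fewer misread fragments reach the correction step, at the cost of dropping faint or small text.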

## Python API

```python
from renzi import extract, extract_dir, correct, pipeline, ToolResult

# Single image
result = extract(image_path="photo.jpg", lang="ch")
print(result.success)        # True
print(result.data["text"])   # raw OCR text
for it in result.data["items"]:
    print(it["text"], it["confidence"], it["bbox"])

# Directory batch with text files
result = extract_dir(directory="images/", output="transcripts/")
for filename, info in result.data.items():
    print(filename, info["text"][:50])

# Correct raw OCR text
result = correct(text="我在吉林省、长春市")
print(result.data["text"])      # corrected
print(result.data["used_llm"])  # True/False

# Full pipeline with visualizations
result = pipeline(
    image_path="scan.png",
    bbox_image="annotated.png",
    bitmap_image="bitmap.png",
    output="fixed.txt",
)
```
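
Since every call returns a result with `success` and `data`, callers may want one place to check for failure. The helper below is a hypothetical sketch, not part of RenZi. It relies only on the `success` and `data` attributes shown above and uses a stand-in object so that it runs without RenZi installed.

```python
from types import SimpleNamespace


def text_or_raise(result):
    """Return result.data["text"], or raise if the call did not succeed."""
    if not result.success:
        raise RuntimeError("RenZi call failed")
    return result.data["text"]


# Stand-in object for demonstration; in real use this would be a ToolResult.
demo = SimpleNamespace(success=True, data={"text": "hello"})
print(text_or_raise(demo))
```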

## Agent Integration

```python
from renzi.tools import TOOLS, dispatch

# Register TOOLS in your OpenAI function-calling client, then:
result = dispatch("renzi_pipeline", {
    "image_path": "scan.png",
    "bbox_image": "annotated.png",
    "bitmap_image": "bitmap.png",
})
```
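
In a function-calling loop, the model supplies a tool name plus its arguments as a JSON string, which has to be decoded before dispatch. The sketch below shows one way to bridge that gap. `handle_tool_call` and its return shape are hypothetical, and `dispatch` is passed in as a parameter so the example runs without RenZi installed.

```python
import json


def handle_tool_call(name, arguments_json, dispatch):
    """Decode a function call's JSON arguments and pass them to `dispatch`."""
    try:
        arguments = json.loads(arguments_json or "{}")
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"invalid arguments: {exc}"}
    if not isinstance(arguments, dict):
        return {"ok": False, "error": "arguments must be a JSON object"}
    return {"ok": True, "result": dispatch(name, arguments)}
```

With the OpenAI Python client's chat completions, this would typically be called as `handle_tool_call(call.function.name, call.function.arguments, dispatch)` for each tool call in the model's reply. Returning an error dict instead of raising lets the agent loop report malformed arguments back to the model.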

## Web UI

```python
from renzi.web import run
run(port=8765)
```

The UI offers drag-and-drop upload, a three-up image comparison (original / annotated / text bitmap), and side-by-side raw vs. corrected text with one-click copy and re-correct.

## Model Cache

PaddleOCR downloads models on first run from ModelScope into `~/.paddlex/official_models/`. To run without network access, place pre-downloaded models in a directory and point `PADDLEX_HOME` (or `RENZI_MODEL_DIR`) at it:

```bash
export RENZI_MODEL_DIR=/path/to/official_models
renzi photo.jpg
```
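
The same variable can be set from Python. The sketch below validates the directory first so that a typo fails early instead of triggering a download attempt. `use_model_dir` is a hypothetical helper, and it assumes RenZi reads `RENZI_MODEL_DIR` when it is imported or first runs OCR, so it should be called before either happens.

```python
import os
from pathlib import Path


def use_model_dir(path):
    """Point RENZI_MODEL_DIR at an existing directory and return its absolute path."""
    model_dir = Path(path).expanduser().resolve()
    if not model_dir.is_dir():
        raise FileNotFoundError(f"model directory not found: {model_dir}")
    # Assumption: RenZi picks this variable up on import or first use.
    os.environ["RENZI_MODEL_DIR"] = str(model_dir)
    return model_dir
```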

## Development

```bash
pip install -e ".[dev]"
ruff format . && ruff check . && pytest
```

## License

GPL-3.0
