Metadata-Version: 2.4
Name: scanforge
Version: 1.0.0
Summary: Official Python SDK for the scan-forge OCR service
Project-URL: Homepage, https://moonforge.tech/produkty/scan-forge
Project-URL: Repository, https://github.com/jaaaco/scan-forge
Author-email: Moonforge <hello@moonforge.tech>
License: MIT
License-File: LICENSE
Keywords: barcode,document,ocr,pdf,scan-forge
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.11
Requires-Dist: httpx<1.0,>=0.27
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: respx>=0.21; extra == 'dev'
Description-Content-Type: text/markdown

# scanforge

[![PyPI version](https://img.shields.io/pypi/v/scanforge)](https://pypi.org/project/scanforge/)
[![Python](https://img.shields.io/pypi/pyversions/scanforge)](https://pypi.org/project/scanforge/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

Official Python SDK for the [scan-forge](https://moonforge.tech/produkty/scan-forge) OCR service, an on-premises, AI-powered drop-in replacement for ABBYY Recognition Server.

## Installation

```bash
pip install scanforge
```

Requires **Python 3.11+**.

## Quick Start

```python
from scanforge import Client

client = Client(api_key="sf_live_...")

# Extract text from a PDF
result = client.ocr("invoice.pdf")
print(result.text)

# Detect barcodes
barcodes = client.barcodes("document.pdf")
for b in barcodes:
    print(b.value, b.type)

# Convert a scan to DOCX
client.convert("scan.png", output="result.docx")
```

## API Reference

### `Client(api_key, base_url=...)`

Creates a new client instance.

| Parameter | Type | Required | Default |
|---|---|---|---|
| `api_key` | `str` | Yes | — |
| `base_url` | `str` | No | `https://api.scanforge.tech` |

```python
client = Client(
    api_key="sf_live_...",
    base_url="https://ocr.your-server.com",  # for self-hosted deployments
)
```

---

### `client.ocr(file_path, *, language=None, page_number=None, separate_pages=False)`

Extracts text from a PDF or image file.

**Parameters**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `file_path` | `str` | — | Path to input file (PDF, PNG, JPG, TIFF) |
| `language` | `str \| None` | `None` | OCR language code (e.g. `"eng"`); auto-detected server-side when omitted |
| `page_number` | `int \| None` | `None` | Process only this page (0-indexed) |
| `separate_pages` | `bool` | `False` | Insert a form-feed character (`\f`) between pages in `text` |

**Returns** `OcrResult`

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class OcrResult:
    text: str
    pages: int
    metadata: dict[str, Any]
```

**Example**

```python
result = client.ocr("invoice.pdf", language="eng")
print(result.text)    # extracted text
print(result.pages)   # number of pages processed
```
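
When `separate_pages=True`, pages come back in a single `text` string delimited by form-feeds. The sketch below splits that string into one entry per page. It uses a local stand-in for `OcrResult` with the documented fields so it runs without a server, and it assumes pages are separated by exactly one form-feed with no trailing delimiter; `split_pages` is a hypothetical helper, not part of the SDK.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class OcrResult:
    """Local stand-in mirroring the documented scanforge.OcrResult fields."""

    text: str
    pages: int
    metadata: dict[str, Any] = field(default_factory=dict)


def split_pages(result: OcrResult) -> list[str]:
    """Split text from an ocr(..., separate_pages=True) call into per-page strings.

    Assumption: pages are separated by a single form-feed (U+000C) and there is
    no form-feed after the last page.
    """
    chunks = result.text.split("\f")
    if len(chunks) != result.pages:
        # Fail loudly if the delimiter assumption does not hold, instead of
        # silently attributing text to the wrong page.
        raise ValueError(
            f"expected {result.pages} pages, found {len(chunks)} form-feed-separated chunks"
        )
    return chunks
```

With the real SDK the call would look like `split_pages(client.ocr("invoice.pdf", separate_pages=True))`, since the returned object exposes the same `text` and `pages` attributes.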

---

### `client.barcodes(file_path, *, page_number=0)`

Detects and decodes barcodes (1D and 2D) in a document.

**Parameters**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `file_path` | `str` | — | Path to input file |
| `page_number` | `int` | `0` | Page to scan (`0` = all pages) |

**Returns** `list[BarcodeResult]`

```python
from dataclasses import dataclass

@dataclass
class BarcodeResult:
    value: str   # decoded barcode content
    type: str    # symbology e.g. 'EAN-13', 'QR-Code', 'CODE-128'
    page: int    # 1-indexed page number
```

**Example**

```python
barcodes = client.barcodes("shipment.pdf")
for b in barcodes:
    print(b.value, b.type, b.page)
```

---

### `client.convert(file_path, *, output)`

Converts a PDF or image to an editable document format. The output format is determined by the extension of `output` (`.docx` → DOCX, `.xlsx` → XLSX).

**Parameters**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `file_path` | `str` | — | Path to input file |
| `output` | `str` | — | Destination path (`.docx` or `.xlsx`) |

**Returns** `None` — the converted file is downloaded and written to `output` locally.

**Example**

```python
# Convert to Word document
client.convert("scan.pdf", output="result.docx")

# Convert to Excel spreadsheet (preserves table structure)
client.convert("table.pdf", output="data.xlsx")
```
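
Since the output format is inferred from the extension of `output`, it can be worth checking the path before uploading a large scan. Whether the SDK performs this check itself is not documented, so the sketch below is a hypothetical pre-flight helper (`output_format`) that accepts only the two extensions listed above and, conservatively, matches them case-sensitively.

```python
from pathlib import Path

# Extensions documented for client.convert() and the format each one selects.
SUPPORTED_OUTPUTS = {".docx": "DOCX", ".xlsx": "XLSX"}


def output_format(output: str) -> str:
    """Return the format convert() should produce for `output`, or raise ValueError."""
    suffix = Path(output).suffix
    try:
        return SUPPORTED_OUTPUTS[suffix]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_OUTPUTS))
        raise ValueError(
            f"unsupported output extension {suffix!r}; expected one of {supported}"
        ) from None
```

Calling `output_format(path)` just before `client.convert("scan.pdf", output=path)` turns a typo such as `result.doc` into an immediate local error.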

---

## Error Handling

All methods raise `ScanForgeError` on failure.

```python
from scanforge import Client, ScanForgeError

client = Client(api_key="sf_live_...")

try:
    result = client.ocr("document.pdf")
except ScanForgeError as e:
    print(e)              # human-readable message
    print(e.status_code)  # HTTP status code (int or None for network errors)
    print(e.body)         # raw response body from the server
```

| Error condition | `status_code` |
|---|---|
| Invalid API key | `401` |
| Unsupported file type | `422` |
| Server error | `5xx` |
| Network / connection failure | `None` |

---

## Configuration

### Self-hosted deployment

Point the client at your own scan-forge server:

```python
client = Client(
    api_key="sf_live_...",
    base_url="https://ocr.internal.example.com",
)
```

### Environment variables (recommended)

```python
import os
from scanforge import Client

client = Client(
    api_key=os.environ["SCANFORGE_API_KEY"],
    base_url=os.environ.get("SCANFORGE_URL", "http://localhost:8000"),
)
```
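
`os.environ["SCANFORGE_API_KEY"]` raises a bare `KeyError` when the variable is missing. A small wrapper can give a clearer message and accept any mapping, which makes the configuration easy to test. The sketch below is a hypothetical helper (`client_settings`) that returns keyword arguments for `Client`, using the same variable names and localhost fallback as the example above.

```python
import os
from collections.abc import Mapping


def client_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build keyword arguments for scanforge.Client from environment variables."""
    api_key = env.get("SCANFORGE_API_KEY")
    if not api_key:
        raise RuntimeError("SCANFORGE_API_KEY is not set; export it before creating the client")
    return {
        "api_key": api_key,
        # Same fallback as the example above: a scan-forge server on localhost.
        "base_url": env.get("SCANFORGE_URL", "http://localhost:8000"),
    }
```

The client is then created with `Client(**client_settings())`.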

---

## Requirements

- **Python 3.11+**
- A running **scan-forge server** — see [deployment docs](https://moonforge.tech/produkty/scan-forge#architektura)

---

## License

MIT © [Moonforge](https://moonforge.tech)
