Metadata-Version: 2.4
Name: scanheroai
Version: 1.0.0
Summary: Official Python SDK for the Scan Hero document conversion API
Project-URL: Homepage, https://www.scanheroai.com
Project-URL: Documentation, https://www.scanheroai.com/docs
Project-URL: Repository, https://github.com/scanheroai/scanhero-python
Project-URL: Bug Tracker, https://github.com/scanheroai/scanhero-python/issues
Author-email: Scan Hero <support@scanheroai.com>
License: MIT
Keywords: conversion,document,llm,markdown,ocr,pdf
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Requires-Dist: httpx>=0.25.0
Provides-Extra: dev
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-httpx; extra == 'dev'
Description-Content-Type: text/markdown

# scanhero-python

Official Python SDK for the [Scan Hero](https://www.scanheroai.com) document conversion API.

Convert PDF, Word, Excel, PowerPoint, images, audio, email, and 20+ other formats to Markdown (or DOCX, CSV, EPUB, and more) via a simple Python interface.

## Install

```bash
pip install scanheroai
```

Requires Python 3.9 or later. The only runtime dependency is `httpx` (0.25.0 or newer).

## Quick start

```python
from scanhero import ScanHero

sh = ScanHero(api_key="sh_...")  # get your key at scanheroai.com/settings/api-keys

# Convert a PDF (files ≤5 MB are processed synchronously)
task = sh.tasks.create("report.pdf")
print(task.output_markdown)

# Larger files are processed asynchronously; wait until they finish
task = sh.tasks.create("recording.mp4")
task = sh.tasks.wait(task.task_id)   # polls every 2s, up to 5 minutes
print(task.output_markdown)

# Refine output with an LLM prompt
task = sh.tasks.adjust(task.task_id, "Summarise in bullet points")

# Download as DOCX
docx_bytes = sh.tasks.download(task.task_id, format="docx")
with open("output.docx", "wb") as f:
    f.write(docx_bytes)
```
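
`sh.tasks.wait` already does the polling for you. For readers who want custom intervals or logging, here is a minimal sketch of an equivalent loop built only from `sh.tasks.get` and the status values listed under Tasks. It is not the SDK's implementation: `wait_for_task` and its `sleep` and `clock` parameters are hypothetical names introduced for illustration.

```python
import time


def wait_for_task(client, task_id, interval=2.0, timeout=300.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll client.tasks.get(task_id) until the task is 'done' or 'failed'.

    Hypothetical helper: `client` is any object exposing tasks.get(),
    such as the `sh` instance from the quick start.
    """
    deadline = clock() + timeout
    while True:
        task = client.tasks.get(task_id)
        # Terminal states, taken from the status values in this README.
        if task.status in ("done", "failed"):
            return task
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop easy to unit-test without real delays; in normal use you would call `wait_for_task(sh, task.task_id)` and leave the defaults alone.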

## Authentication

Generate an API key at [scanheroai.com/settings/api-keys](https://www.scanheroai.com/settings/api-keys).

```python
sh = ScanHero(api_key="sh_your_key_here")
```

Alternatively, store the key in the `SCANHERO_API_KEY` environment variable and pass it in explicitly:

```python
import os
from scanhero import ScanHero

sh = ScanHero(api_key=os.environ["SCANHERO_API_KEY"])
```
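
Reading `os.environ[...]` directly raises a bare `KeyError` when the variable is missing. A small wrapper can give a clearer message. This is a sketch only: `load_api_key` is a hypothetical helper, not part of the SDK.

```python
import os


def load_api_key(env=None, var="SCANHERO_API_KEY"):
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper; `env` defaults to os.environ and exists so the
    function can be tested with a plain dict.
    """
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set or is empty")
    return key
```

You would then construct the client with `ScanHero(api_key=load_api_key())`.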

## Tasks

```python
# Upload from a path
task = sh.tasks.create("invoice.pdf")

# Upload from a file object
with open("invoice.pdf", "rb") as f:
    task = sh.tasks.create(f)

# Upload raw bytes (pdf_bytes is a bytes object you already hold in memory)
task = sh.tasks.create(pdf_bytes, filename="invoice.pdf")

# With options
from scanhero import ProcessingOptions

task = sh.tasks.create(
    "scan.jpg",
    options=ProcessingOptions(
        image_handling="describe",   # ask LLM to describe images
        output_language="pt",        # Portuguese output
        output_format="markdown",
    ),
)

# Check status
task = sh.tasks.get(task.task_id)
print(task.status)      # "pending" | "processing" | "done" | "failed"
print(task.credits_used)

# List recent tasks
tasks = sh.tasks.list()

# Estimate cost before uploading
estimate = sh.tasks.estimate_cost(size_bytes=5_000_000, format="application/pdf")
print(f"Will cost {estimate.credits} credits")
```
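
To estimate the cost of a file on disk, you need its size and MIME type. The sketch below derives both with the standard library and returns them as keyword arguments matching the `estimate_cost` call above. `estimate_args` is a hypothetical helper, and the `application/octet-stream` fallback is an assumption; this README does not say which `format` values the API accepts.

```python
import mimetypes
import os


def estimate_args(path):
    """Build keyword arguments for sh.tasks.estimate_cost() from a local file.

    Hypothetical helper, not part of the SDK.
    """
    size = os.path.getsize(path)
    # guess_type() infers the MIME type from the file extension only.
    mime, _encoding = mimetypes.guess_type(path)
    # Assumed fallback when the extension is unknown.
    return {"size_bytes": size, "format": mime or "application/octet-stream"}
```

Usage would look like `estimate = sh.tasks.estimate_cost(**estimate_args("report.pdf"))`.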

## Batch jobs

```python
job = sh.jobs.create(["file1.pdf", "file2.docx", "file3.xlsx"])
print(job.job_id, job.status)

# Check progress
job = sh.jobs.get(job.job_id)
for item in job.items:
    print(item.filename, item.status)
```
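
For larger batches, a per-status tally is often easier to read than one line per file. The following sketch summarises `job.items`. It assumes item statuses use the same values as task statuses (`"pending"`, `"processing"`, `"done"`, `"failed"`), which this README does not confirm, and `summarize_job` is a hypothetical helper.

```python
from collections import Counter


def summarize_job(job):
    """Return (status counts, finished flag) for a batch job.

    Hypothetical helper; assumes each item has a `status` attribute with
    the same values as a task's status.
    """
    items = list(job.items)
    counts = Counter(item.status for item in items)
    # Treat the job as finished only when every item reached a terminal state.
    finished = bool(items) and all(
        item.status in ("done", "failed") for item in items
    )
    return counts, finished
```

Calling `summarize_job(sh.jobs.get(job.job_id))` in a loop gives a compact progress report.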

## Webhooks

```python
# Register a webhook
wh = sh.webhooks.create(
    "https://your.app/hooks/scanhero",
    events=["task.completed", "task.failed"],
)
print(wh.webhook_id)

# In your web server, verify incoming payloads.
# `request` stands for your web framework's request object.
from scanhero.webhooks import WebhooksResource

is_valid = WebhooksResource.verify_signature(
    payload=request.body,
    signature_header=request.headers["X-Scan-Hero-Signature"],
    secret="your_webhook_secret",
)
```
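
Use `WebhooksResource.verify_signature` in production. This README does not document the signing scheme, so the sketch below only illustrates the general technique behind checks of this kind. It assumes a hex-encoded HMAC-SHA256 over the raw request body, which may not match what Scan Hero actually sends. `verify_hmac_sha256` is a hypothetical function.

```python
import hashlib
import hmac


def verify_hmac_sha256(payload: bytes, signature_header: str, secret: str) -> bool:
    """Check a hex HMAC-SHA256 signature over the raw payload.

    Illustrative only: the real header format is an assumption here.
    """
    expected = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(
        expected.encode("utf-8"), signature_header.strip().encode("utf-8")
    )
```

Whatever the scheme, verify against the raw bytes of the request body, since re-serialising parsed JSON can change the bytes and break the signature.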

## Templates

```python
from scanhero import ProcessingOptions

tmpl = sh.templates.create(
    "Legal doc pipeline",
    options=ProcessingOptions(output_language="en", image_handling="describe"),
    adjust_prompts=["Format citations as footnotes", "Add an executive summary"],
)

# Use template when creating tasks
task = sh.tasks.create("contract.pdf", template_id=tmpl.template_id)
```

## Error handling

```python
from scanhero import (
    ScanHeroError,
    InsufficientCreditsError,
    AuthenticationError,
    NotFoundError,
)

try:
    task = sh.tasks.create("huge_video.mp4")
except InsufficientCreditsError:
    print("Not enough credits — top up at scanheroai.com/pricing")
except AuthenticationError:
    print("Invalid API key")
except ScanHeroError as e:
    print(f"API error {e.status_code}: {e}")
```

## Regenerating from the OpenAPI spec

A generated client can be rebuilt automatically from the live API spec:

```bash
# Install the generator
pip install openapi-python-client

# Regenerate (requires the API to be running)
openapi-python-client generate \
    --url https://api.scanheroai.com/openapi.json \
    --output-path sdk/python-generated
```

This package is the handcrafted SDK; to change it, edit `sdk/python/` directly.

## API reference

Full reference: [scanheroai.com/docs](https://www.scanheroai.com/docs)  
Interactive (OpenAPI): [scanheroai.com/docs/reference](https://www.scanheroai.com/docs/reference)

## Related SDKs

| Language | Package | Docs |
|----------|---------|------|
| Python | `pip install scanheroai` | This package (`sdk/python/`) |
| TypeScript / JavaScript | `npm install @scanhero/sdk` | [`sdk/typescript/`](../typescript/) — generated from `/openapi.json` |

Both SDKs are documented together at [scanheroai.com/docs](https://www.scanheroai.com/docs#sdk).

## License

MIT
