Metadata-Version: 2.4
Name: ocrqueen
Version: 0.2.1
Summary: Official Python SDK for the OCRQueen document extraction API
Project-URL: Homepage, https://ocrqueen.com
Project-URL: Documentation, https://ocrqueen.com/docs/sdks/python
Project-URL: Repository, https://github.com/ocrqueen/ocrqueen-python
Project-URL: Issues, https://github.com/ocrqueen/ocrqueen-python/issues
Project-URL: Changelog, https://github.com/ocrqueen/ocrqueen-python/blob/main/CHANGELOG.md
Author-email: OCRQueen <support@ocrqueen.com>
Maintainer-email: OCRQueen <support@ocrqueen.com>
License-Expression: MIT
License-File: LICENSE
Keywords: document-ai,document-extraction,heic,image-extraction,image-ocr,ocr,ocr-api,ocrqueen,pdf,pdf-extraction,pdf-to-json,pdf-to-markdown,powerpoint,pptx,pptx-extraction,presentation-extraction,rag,sdk,structured-extraction
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: httpx<1.0.0,>=0.27.0
Provides-Extra: dev
Requires-Dist: mypy>=1.10.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: respx>=0.21.0; extra == 'dev'
Requires-Dist: ruff>=0.6.0; extra == 'dev'
Description-Content-Type: text/markdown

# ocrqueen-python

Official Python SDK for the [OCRQueen](https://ocrqueen.com) document and image extraction API.

> 🚧 **Status:** Pre-release. APIs and surface area will change before `v1.0.0`.

## Installation

```bash
pip install ocrqueen
```

Requires Python 3.10 or newer.

## Supported formats

| Category | Formats |
|---|---|
| Documents | **PDF** |
| Presentations | **PPTX**, **PPT** (PowerPoint) |
| Images | **PNG**, **JPEG**, **WebP**, **HEIC** / **HEIF** (iPhone photos) |

The API returns structured JSON and Markdown for every supported type:
text, tables, and images, plus diagram graph extraction and image
alt-text with `extraction_profile="advanced"`.

## Quickstart

```python
from ocrqueen import OCRQueen

client = OCRQueen(api_key="pk_...")

with open("paper.pdf", "rb") as f:
    job = client.extract.create(file=f)

result = client.jobs.wait(job)
print(result.result["markdown"])
```

Get an API key from the [OCRQueen dashboard](https://ocrqueen.com/dashboard/keys).

### Other file types

```python
# Slide decks: speaker notes are preserved
with open("pitch.pptx", "rb") as f:
    job = client.extract.create(file=f)

# iPhone photos: HEIC is handled natively, no conversion needed
with open("receipt.heic", "rb") as f:
    job = client.extract.create(file=f)

# Scanned document images
with open("invoice.png", "rb") as f:
    job = client.extract.create(file=f)

# Deeper extraction profile: diagrams, image alt-text, and OCR on
# embedded text
with open("patent.pdf", "rb") as f:
    job = client.extract.create(file=f, profile="advanced")
```

## Documentation

- Full API reference: <https://ocrqueen.com/docs>
- Python SDK guide: <https://ocrqueen.com/docs/sdks/python>
- Data retention & deletion: <https://ocrqueen.com/docs/data-retention>

## License

MIT — see [LICENSE](LICENSE).
