Metadata-Version: 2.4
Name: markdownbridge
Version: 0.1.0
Summary: Python SDK for the MarkdownBridge OCR API — convert documents and images to Markdown
Project-URL: Homepage, https://www.markdownbridge.com
Project-URL: Documentation, https://www.markdownbridge.com/docs
Project-URL: Repository, https://github.com/markdownbridge/markdownbridge-python
Project-URL: Issues, https://github.com/markdownbridge/markdownbridge-python/issues
Author-email: MarkdownBridge <support@markdownbridge.com>
License-Expression: MIT
License-File: LICENSE
Keywords: api,document,markdown,ocr,pdf,sdk
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: httpx>=0.24.0
Description-Content-Type: text/markdown

# markdownbridge

Python SDK for the [MarkdownBridge](https://www.markdownbridge.com) OCR API — convert documents and images to Markdown.

## Installation

```bash
pip install markdownbridge
```

## Quick Start

```python
from markdownbridge import MarkdownBridge

client = MarkdownBridge(api_key="ocrb_prd_xxx")

# One-liner: URL → Markdown
result = client.ocr("https://example.com/invoice.pdf")
print(result.markdown)

# One-liner: local file → Markdown
result = client.ocr("./receipt.png")
print(result.markdown)
```

## Authentication

Pass your API key directly or set the `MARKDOWNBRIDGE_API_KEY` environment variable:

```bash
export MARKDOWNBRIDGE_API_KEY="ocrb_prd_xxx"
```

```python
client = MarkdownBridge()  # reads from env
```
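
The lookup can be pictured with the sketch below. It is an illustration only: `resolve_api_key` is a hypothetical helper, and the precedence (an explicit argument wins over the environment variable) is an assumption rather than documented SDK behavior.

```python
import os
from typing import Optional

ENV_VAR = "MARKDOWNBRIDGE_API_KEY"


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer the explicit key, else fall back to the env var."""
    # Assumption: an explicitly passed key takes precedence over the environment.
    key = api_key or os.environ.get(ENV_VAR)
    if not key:
        raise ValueError(f"No API key given and {ENV_VAR} is not set")
    return key


print(resolve_api_key("ocrb_prd_xxx"))
```

Keeping the key in the environment avoids committing it to source control.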

## Client Options

```python
client = MarkdownBridge(
    api_key="ocrb_prd_xxx",                        # or env MARKDOWNBRIDGE_API_KEY
    base_url="https://api.markdownbridge.com",      # default
    timeout=30.0,                                    # request timeout in seconds
    max_retries=3,                                   # retry 5xx errors with backoff
)
```
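
The exact backoff schedule behind `max_retries` is not documented here. As a rough mental model, a capped exponential schedule might look like the sketch below; `backoff_delays`, `base`, and `cap` are hypothetical names and values chosen for illustration.

```python
import random
from typing import List


def backoff_delays(
    max_retries: int = 3,
    base: float = 0.5,   # assumed first delay in seconds
    cap: float = 8.0,    # assumed upper bound per delay
    jitter: bool = False,
) -> List[float]:
    """Return the wait before each retry: base * 2**attempt, capped at `cap`."""
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            # Full jitter: pick a random wait between 0 and the computed delay.
            delay = random.uniform(0, delay)
        delays.append(delay)
    return delays


print(backoff_delays())  # [0.5, 1.0, 2.0]
```

With a schedule like this, the worst-case extra latency is the sum of the delays plus up to `max_retries + 1` request timeouts, which is worth keeping in mind when choosing `timeout`.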

## API Reference

### `client.ocr(source, **opts)`

Convenience method: pass a URL or a local file path, and it submits the job, polls until it finishes, and returns a `ProcessingResult`.

```python
result = client.ocr(
    "https://example.com/doc.pdf",
    language="en",
    output_format="markdown",
    enhance_quality=True,
    poll_interval=2.0,     # seconds between status checks
    poll_timeout=300.0,    # max wait time
)
print(result.markdown)
print(result.page_count)
```

### `client.process_url(file_url, **opts)`

Submit a URL for processing without waiting for completion.

```python
proc = client.process_url("https://example.com/doc.pdf")
print(proc.process_id)  # use with get_status() / wait_for_completion()
```

### `client.process_file(file_path, **opts)`

Upload a local file and submit it for processing.

```python
proc = client.process_file("./invoice.pdf")
print(proc.process_id)
```

### `client.upload_file(file_path)`

Upload a file without processing it.

```python
upload = client.upload_file("./photo.png")
print(upload.document_id)
```

### `client.get_status(process_id)`

Check the current status of a processing job.

```python
status = client.get_status("uuid-here")
print(status.status)    # queued | processing | completed | failed
print(status.progress)  # 0–100
print(status.stage)     # queued | download | ocr | llm_improvement | completed | failed
```

### `client.wait_for_completion(process_id, **opts)`

Poll until the job completes or fails.

```python
result = client.wait_for_completion(
    "uuid-here",
    poll_interval=2.0,
    poll_timeout=300.0,
    on_status_change=lambda s: print(f"Status: {s.status} ({s.stage})"),
)
```
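
To show how `poll_interval`, `poll_timeout`, and `on_status_change` interact, here is a minimal polling loop. It is a sketch, not the SDK's implementation: `poll_until_done` is a hypothetical function, statuses are plain strings rather than `ProcessingStatus` objects, and Python's built-in `TimeoutError` stands in for the SDK's exception.

```python
import time
from typing import Callable, Optional

TERMINAL = {"completed", "failed"}


def poll_until_done(
    get_status: Callable[[], str],
    poll_interval: float = 2.0,
    poll_timeout: float = 300.0,
    on_status_change: Optional[Callable[[str], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it reports a terminal state or the deadline passes."""
    deadline = time.monotonic() + poll_timeout
    previous = None
    while True:
        status = get_status()
        if status != previous:
            # Notify only on transitions, not on every poll.
            if on_status_change is not None:
                on_status_change(status)
            previous = status
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {poll_timeout}s")
        sleep(poll_interval)


# Demo with a scripted status sequence and a no-op sleep.
states = iter(["queued", "processing", "processing", "completed"])
seen = []
final = poll_until_done(lambda: next(states), on_status_change=seen.append, sleep=lambda _: None)
print(final, seen)  # completed ['queued', 'processing', 'completed']
```

A shorter `poll_interval` lowers latency at the cost of more status requests, which may count against rate limits.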

### `client.list_results(**filters)`

Fetch a single page of results. `limit` and `offset` control paging, and filters such as `status` narrow the set.

```python
page = client.list_results(limit=20, offset=0, status="completed")
for item in page.data:
    print(item.file_name, item.status)
print(f"Total: {page.pagination.total}")
```

### `client.iter_results(**filters)`

Auto-paginating iterator over all results.

```python
for item in client.iter_results(status="completed"):
    print(item.file_name)
```
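
Conceptually, auto-pagination is a loop over `list_results` pages driven by the `Pagination` fields. The sketch below uses plain dicts and a fake in-memory backend; `fake_list_results` and `iter_all` are hypothetical, and the assumption that `next_offset` is `None` on the last page is not confirmed by the SDK.

```python
from typing import Callable, Iterator

ITEMS = [f"doc-{i}.pdf" for i in range(5)]


def fake_list_results(limit: int, offset: int) -> dict:
    """Stand-in for a paginated endpoint, backed by an in-memory list."""
    data = ITEMS[offset:offset + limit]
    next_offset = offset + len(data)
    has_more = next_offset < len(ITEMS)
    return {
        "data": data,
        "pagination": {
            "total": len(ITEMS),
            "limit": limit,
            "offset": offset,
            "has_more": has_more,
            "next_offset": next_offset if has_more else None,  # assumption
        },
    }


def iter_all(fetch_page: Callable[..., dict], limit: int = 2) -> Iterator[str]:
    """Yield every item, requesting the next page only when the caller needs it."""
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        yield from page["data"]
        pagination = page["pagination"]
        if not pagination["has_more"]:
            return
        offset = pagination["next_offset"]


names = list(iter_all(fake_list_results))
print(names)
```

Because the generator is lazy, breaking out of the loop early avoids fetching the remaining pages.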

### `client.get_result(result_id)`

Fetch a specific result by ID.

```python
item = client.get_result("uuid-here")
print(item.result.markdown)  # the OCR output is nested under .result
```

### `client.info()`

Get API version and status.

```python
info = client.info()
print(info.version, info.status)
```

## Async Usage

Every method has an async equivalent via `AsyncMarkdownBridge`:

```python
import asyncio
from markdownbridge import AsyncMarkdownBridge

async def main():
    async with AsyncMarkdownBridge(api_key="ocrb_prd_xxx") as client:
        result = await client.ocr("https://example.com/invoice.pdf")
        print(result.markdown)

        # Auto-paginating async iteration
        async for item in client.iter_results():
            print(item.file_name)

asyncio.run(main())
```
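
One reason to reach for the async client is processing several documents concurrently. The sketch below shows the general pattern with a bounded number of in-flight calls; `fake_ocr` is a stand-in for `await client.ocr(...)`, and `ocr_many` and `max_concurrency` are hypothetical names, not part of the SDK.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fake_ocr(url: str) -> str:
    """Stand-in for an async OCR call; returns placeholder Markdown."""
    await asyncio.sleep(0)
    return f"# Markdown for {url}"


async def ocr_many(
    urls: List[str],
    ocr: Callable[[str], Awaitable[str]],
    max_concurrency: int = 2,
) -> List[str]:
    """Run ocr() over all URLs, with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(url: str) -> str:
        async with semaphore:
            return await ocr(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(one(u) for u in urls))


results = asyncio.run(ocr_many(["a.pdf", "b.pdf", "c.pdf"], fake_ocr))
print(results)
```

Bounding concurrency helps stay under rate limits while still overlapping network waits.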

## Error Handling

All exceptions inherit from `MarkdownBridgeError` and include `status_code`, `error_code`, and `correlation_id`:

```python
from markdownbridge import MarkdownBridge, MarkdownBridgeError, RateLimitError, AuthenticationError

client = MarkdownBridge(api_key="ocrb_prd_xxx")

try:
    result = client.ocr("https://example.com/doc.pdf")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited — retry after {e.retry_after}s")
except MarkdownBridgeError as e:
    print(f"API error {e.status_code}: {e}")
```
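
Instead of only printing `retry_after`, a caller could wait and try again. The sketch below illustrates that pattern in isolation: the `RateLimitError` class here is a local stand-in for the SDK's exception (assumed to carry `retry_after` in seconds), and `call_with_rate_limit_retry` is a hypothetical helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Local stand-in for the SDK exception; only retry_after is modeled."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_rate_limit_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), sleeping for retry_after between rate-limited attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(exc.retry_after)
    raise ValueError("max_attempts must be at least 1")


# Demo: fail twice with a rate limit, then succeed.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RateLimitError(retry_after=1.0)
    return "ok"


waits = []
outcome = call_with_rate_limit_retry(flaky, sleep=waits.append)
print(outcome, waits)  # ok [1.0, 1.0]
```

In real use, `fn` would be something like `lambda: client.ocr(url)`, with the SDK's own `RateLimitError` in the `except` clause.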

### Exception Hierarchy

| Exception | HTTP Status | When |
|-----------|-------------|------|
| `AuthenticationError` | 401 | Invalid or missing API key |
| `ValidationError` | 400/422 | Invalid request parameters |
| `NotFoundError` | 404 | Resource not found |
| `RateLimitError` | 429 | Too many requests |
| `InsufficientCreditsError` | 402 | Account has no credits |
| `ServerError` | 5xx | Server-side failure |
| `ProcessingError` | — | OCR job failed |
| `FileUploadError` | — | Upload failed |
| `TimeoutError` | — | Polling exceeded timeout |

## Data Types

All response types are frozen dataclasses:

- `ProcessResponse` — process_id, status, file_id, stage
- `ProcessingStatus` — process_id, status, progress, stage, result, error
- `ProcessingResult` — text, markdown, json, page_count, processing_time
- `UploadResponse` — file_key, public_url, document_id
- `ResultItem` — id, process_id, file_name, status, result
- `ResultsPage` — data, pagination
- `Pagination` — total, limit, offset, has_more, next_offset
- `ApiInfo` — version, status, endpoints
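
Frozen dataclasses reject attribute assignment, so a modified copy has to be built with `dataclasses.replace`. The sketch below demonstrates this on a local stand-in that mirrors the `Pagination` field names listed above; the field types are assumptions.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Pagination:
    """Local stand-in mirroring the documented field names (types assumed)."""

    total: int
    limit: int
    offset: int
    has_more: bool
    next_offset: Optional[int] = None


page = Pagination(total=45, limit=20, offset=0, has_more=True, next_offset=20)

try:
    page.offset = 20  # frozen instances reject assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

# replace() builds a new instance and leaves the original untouched.
next_page = replace(page, offset=20, next_offset=40)
print(mutated, page.offset, next_page.offset)  # False 0 20
```

Frozen instances are also hashable when all their fields are, so they can be used as dict keys or set members.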

## License

MIT
