Metadata-Version: 2.4
Name: pdftables-io
Version: 0.1.0
Summary: Official Python SDK for the pdftables.io API – extract tables from PDFs.
Project-URL: Homepage, https://pdftables.io
Project-URL: Documentation, https://api.pdftables.io/v1/swagger/
Project-URL: Repository, https://github.com/pdftables-io/python-sdk
Project-URL: Issues, https://github.com/pdftables-io/python-sdk/issues
Author-email: "pdftables.io" <developer@software-fuhrmeister.de>
License-Expression: BSD-3-Clause
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: httpx<1,>=0.27
Requires-Dist: pydantic<3,>=2.0
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: respx>=0.22; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Description-Content-Type: text/markdown

# pdftables-io

Official Python SDK for the [pdftables.io](https://pdftables.io) API — extract tables from PDFs programmatically.

```bash
pip install pdftables-io
```

## Quick Start

```python
from pdftables import PDFTablesClient

client = PDFTablesClient(api_key="your-api-key")

# 1. Upload a PDF
upload = client.upload("invoice.pdf")

# 2. Start table extraction
job = client.create_job(upload.upload_id)

# 3. Wait for completion
job = client.wait_for_job(job.id)

# 4. Download results as CSV
csv_zip = client.download_job_csv(job.id)
with open("tables.zip", "wb") as f:
    f.write(csv_zip)
```
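
The ZIP returned in step 4 can also be read in memory rather than written to disk. The following is a minimal sketch using only the standard library; `read_csv_tables` is a hypothetical helper, not part of the SDK, and it assumes the archive holds one UTF-8 encoded CSV file per table.

```python
import csv
import io
import zipfile


def read_csv_tables(zip_bytes: bytes) -> dict[str, list[list[str]]]:
    """Map each CSV member name in the archive to its parsed rows."""
    tables: dict[str, list[list[str]]] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV member
            # Assumption: members are UTF-8 encoded; utf-8-sig also strips a BOM.
            text = archive.read(name).decode("utf-8-sig")
            tables[name] = list(csv.reader(io.StringIO(text)))
    return tables
```

With the Quick Start variables in scope, `read_csv_tables(csv_zip)` would return a dictionary keyed by file name.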

## Authentication

Pass your API key directly or set the `PDFTABLES_API_KEY` environment variable:

```python
# Explicit
client = PDFTablesClient(api_key="sk_live_...")

# Via environment variable
# export PDFTABLES_API_KEY=sk_live_...
client = PDFTablesClient()
```

## Async Usage

```python
import asyncio
from pdftables import AsyncPDFTablesClient

async def main():
    async with AsyncPDFTablesClient(api_key="your-api-key") as client:
        upload = await client.upload("invoice.pdf")
        job = await client.create_job(upload.upload_id)
        job = await client.wait_for_job(job.id)
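        csv_zip = await client.download_job_csv(job.id)

    # Save the ZIP of CSV files, as in the synchronous example
    with open("tables.zip", "wb") as f:
        f.write(csv_zip)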

asyncio.run(main())
```
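
The async client is most useful when several PDFs are processed at once. The sketch below runs the same four calls per file and caps the number of files in flight with a semaphore. `extract_csv`, `extract_many`, and `max_concurrent` are hypothetical names introduced for illustration; the client is passed in, so only the methods shown above are assumed.

```python
import asyncio
from typing import Any


async def extract_csv(client: Any, pdf_path: str) -> bytes:
    """Run upload -> create job -> wait -> download for a single PDF."""
    upload = await client.upload(pdf_path)
    job = await client.create_job(upload.upload_id)
    job = await client.wait_for_job(job.id)
    return await client.download_job_csv(job.id)


async def extract_many(
    client: Any, pdf_paths: list[str], max_concurrent: int = 4
) -> dict[str, bytes]:
    """Process several PDFs concurrently, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(path: str) -> bytes:
        async with semaphore:
            return await extract_csv(client, path)

    # gather preserves input order, so results line up with pdf_paths.
    results = await asyncio.gather(*(bounded(p) for p in pdf_paths))
    return dict(zip(pdf_paths, results))
```

Inside `async with AsyncPDFTablesClient(...) as client:`, this would be called as `await extract_many(client, ["a.pdf", "b.pdf"])`. Keeping the limit low is one way to stay clear of rate limits.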

## API Reference

### Upload

| Method | Description |
|---|---|
| `upload(file)` | Upload a PDF file (path or file object) |
| `list_uploads()` | List all uploads |

### Extraction Jobs

| Method | Description |
|---|---|
| `create_job(upload_id, *, pages, mode)` | Start extraction (`mode`: `auto`, `stream`, `lattice`) |
| `get_job(job_id)` | Get job status |
| `wait_for_job(job_id, *, poll_interval, timeout)` | Poll until complete |
| `list_jobs()` | List all jobs |
| `list_job_tables(job_id)` | List extracted tables |
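
To illustrate what `poll_interval` and `timeout` typically control, here is a generic polling loop. It is a sketch of the general pattern, not the SDK's implementation of `wait_for_job`; `poll_until`, `fetch`, and `is_done` are hypothetical names, and the default values are placeholders.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    *,
    poll_interval: float = 2.0,
    timeout: float = 300.0,
) -> T:
    """Call fetch() every poll_interval seconds until is_done(result) is true."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        # Give up if the next sleep would run past the deadline.
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError(f"not finished after {timeout} seconds")
        time.sleep(poll_interval)
```

Here `fetch` would wrap `client.get_job(job_id)`, and `is_done` would inspect whichever status field the returned job object exposes; that field is not documented in this README.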

### Downloads

| Method | Description |
|---|---|
| `download_table(table_id, *, format, structure)` | Download single table (`csv`/`json`/`xlsx`) |
| `download_tables_zip(table_ids, *, format, structure)` | Download multiple tables as ZIP |
| `download_job_csv(job_id)` | Download all job tables as CSV ZIP |
| `download_job_xlsx(job_id)` | Download all job tables as XLSX ZIP |
| `download_job_json(job_id)` | Download all job tables as JSON ZIP |
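
When individual files are preferable to a ZIP, `list_job_tables` and `download_table` can be combined. The sketch below assumes that each listed table object has an `id` attribute and that `download_table` returns `bytes`; neither is confirmed by this README. `save_tables` is a hypothetical helper.

```python
from pathlib import Path
from typing import Any


def save_tables(client: Any, job_id: str, out_dir: str, fmt: str = "csv") -> list[Path]:
    """Download every table of a job and write each one to its own file."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for index, table in enumerate(client.list_job_tables(job_id), start=1):
        # Assumption: table.id exists and download_table returns raw bytes.
        data = client.download_table(table.id, format=fmt)
        path = target / f"table_{index}.{fmt}"
        path.write_bytes(data)
        written.append(path)
    return written
```

For example, `save_tables(client, job.id, "out", fmt="xlsx")` would write `out/table_1.xlsx`, `out/table_2.xlsx`, and so on.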

### Export Structures

| Method | Description |
|---|---|
| `list_structures()` | List all structures |
| `create_structure(*, name, slug, fields, ...)` | Create custom structure |
| `get_structure(structure_id)` | Get structure details |
| `update_structure(structure_id, *, name, slug, ...)` | Update structure |
| `delete_structure(structure_id)` | Delete structure |

### DATEV

| Method | Description |
|---|---|
| `create_datev_export(job_id, *, table_id, fiscal_year)` | Trigger DATEV export |
| `download_datev_export(job_id, datev_id, *, format)` | Download DATEV file |

## Error Handling

```python
from pdftables import PDFTablesClient, AuthenticationError, RateLimitError

client = PDFTablesClient(api_key="your-key")

try:
    upload = client.upload("invoice.pdf")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit exceeded — try again later")
```

| Exception | HTTP Status |
|---|---|
| `ValidationError` | 400 |
| `AuthenticationError` | 401, 403 |
| `PaymentRequiredError` | 402 |
| `NotFoundError` | 404 |
| `ConflictError` | 409 |
| `RateLimitError` | 429 |
| `ServerError` | 5xx |
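
Rate-limit errors are usually transient, so retrying with exponential backoff is a common response. The helper below is a sketch, not an SDK feature: `with_retries` and its parameters are hypothetical, and the exception types to retry on are passed in, so the code does not depend on the package itself.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    *,
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on the given exceptions with doubling delays."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... by default
    raise ValueError("attempts must be at least 1")
```

A call might look like `with_retries(lambda: client.upload("invoice.pdf"), (RateLimitError,))`. Errors such as `AuthenticationError` are not worth retrying, since the same request would fail again.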

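## Advanced: Custom Base URL

Pass `base_url` to point the client at a different API host, and `timeout` to change the request timeout:
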
```python
client = PDFTablesClient(
    api_key="your-key",
    base_url="https://staging-api.pdftables.io",
    timeout=60.0,
)
```

## Requirements

- Python ≥ 3.10
- [httpx](https://www.python-httpx.org/) ≥ 0.27, < 1
- [pydantic](https://docs.pydantic.dev/) ≥ 2.0, < 3

## License

BSD 3-Clause — see [LICENSE](LICENSE).
