Metadata-Version: 2.4
Name: symagedocs
Version: 1.0.1
Summary: Python SDK for the SymageDocs synthetic data API
Project-URL: Homepage, https://symagedocs.ai
Project-URL: Documentation, https://symagedocs.ai/docs/api/
Project-URL: Repository, https://github.com/GeiselSoftware/paperlives
Project-URL: Changelog, https://symagedocs.ai/docs/api/changelog.html
Author-email: Geisel Software <support@symagedocs.ai>
License-Expression: MIT
License-File: LICENSE
Keywords: compliance,document-generation,ml-training,synthetic-data
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: httpx>=0.25
Requires-Dist: pydantic>=2.0
Provides-Extra: dev
Requires-Dist: mypy>=1.5; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-httpx>=0.30; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1; extra == 'dev'
Provides-Extra: progress
Requires-Dist: tqdm>=4.60; extra == 'progress'
Description-Content-Type: text/markdown

# SymageDocs Python SDK

Generate synthetic documents, identities, and tabular datasets for testing, ML training, and compliance.

## Installation

```bash
pip install symagedocs
```

For progress bars during long jobs:

```bash
pip install "symagedocs[progress]"
```

## Quick Start

```python
from symagedocs import Client

client = Client(api_key="sk_live_...")

# List available forms
forms = client.forms.list()
for f in forms:
    print(f"{f.id}: {f.name} ({f.credit_cost} credits)")

# Generate 100 W-2 documents
job = client.generate.create(
    "irs_w2_2024",
    quantity=100,
    output_formats=["pdf_typed", "json"],
)
result = client.generate.wait(job.job_id)  # polls until the job completes or fails
client.generate.download(job.job_id, "pdf_typed", "./w2_documents.zip")

# Batch generation with token budget
batch = client.batches.create(
    "Training Data",
    "irs_w2_2024",
    token_budget=5000,
    output_formats=["pdf_typed", "json"],
)
gen = client.batches.generate(batch.batch_id, quantity=10)
for item_id in gen.item_ids:
    files = client.batches.download_urls(batch.batch_id, item_id)
    for f in files:
        print(f"{f.filename}: {f.url}")  # presigned S3 URLs

# Generate tabular data from a description
schema = client.tabular.parse("name, age, SSN, city, state, annual income")
tab_job = client.tabular.generate(columns=schema.columns, quantity=5000)
client.tabular.wait(tab_job.job_id)
client.tabular.download(tab_job.job_id, "csv", "./dataset.csv")

# Check credit balance
balance = client.account.balance()
print(f"Credits used: {balance.credits_used}")
```
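
Presigned URLs from `batches.download_urls` can be fetched with any HTTP client. Here is a minimal sketch using only the standard library. It assumes each returned file exposes `url` and `filename`, as in the Quick Start loop. `save_presigned` is a hypothetical helper, not part of the SDK:

```python
import os
import shutil
import urllib.request


def save_presigned(url: str, filename: str, dest_dir: str = ".") -> str:
    """Stream `url` to dest_dir/filename and return the local path.

    Hypothetical helper for illustration; not part of the symagedocs SDK.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Keep only the last path component so a server-supplied filename
    # cannot write outside dest_dir.
    path = os.path.join(dest_dir, os.path.basename(filename))
    with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)  # copy in chunks rather than buffering the whole file
    return path
```

In the Quick Start loop, `save_presigned(f.url, f.filename, "./items")` would replace the `print` call. S3 presigned URLs expire, so download them soon after requesting them.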

## Authentication

Get your API key at [symagedocs.ai/account?tab=api](https://symagedocs.ai/account?tab=api).

```python
# Pass directly
client = Client(api_key="sk_live_...")

# Or set environment variable
# export SYMAGEDOCS_API_KEY=sk_live_...
client = Client()  # reads from env
```
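
Application code can check for a key up front and fail fast with a clear message, before any request is made. The sketch below is hypothetical (`resolve_api_key` is not an SDK function). It assumes an explicit argument should win over the environment variable; the precedence when both are set is not stated above:

```python
import os
from typing import Mapping, Optional


def resolve_api_key(api_key: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the explicit key if given, else SYMAGEDOCS_API_KEY; raise if neither is set.

    Hypothetical helper; the precedence rule here is an assumption.
    """
    key = api_key or env.get("SYMAGEDOCS_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass api_key= or set SYMAGEDOCS_API_KEY")
    return key
```

Usage: `client = Client(api_key=resolve_api_key())`.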

## Async Support

```python
import asyncio

from symagedocs import AsyncClient


async def main() -> None:
    async with AsyncClient(api_key="sk_live_...") as client:
        forms = await client.forms.list()
        job = await client.generate.create("irs_w2_2024", quantity=10)
        result = await client.generate.wait(job.job_id)


asyncio.run(main())
```
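
To wait on several jobs at once without opening an unbounded number of concurrent polling loops, you can combine `asyncio.gather` with an `asyncio.Semaphore`. This is a generic standard-library sketch. `gather_bounded` is a hypothetical helper, and the limit of 4 is an arbitrary default:

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def gather_bounded(factories: Iterable[Callable[[], Awaitable[T]]],
                         limit: int = 4) -> List[T]:
    """Run coroutine factories concurrently, at most `limit` at a time.

    Results are returned in input order, because asyncio.gather preserves order.
    """
    sem = asyncio.Semaphore(limit)  # created inside the running loop

    async def run(factory: Callable[[], Awaitable[T]]) -> T:
        async with sem:
            return await factory()

    return await asyncio.gather(*(run(f) for f in factories))
```

For example, inside `main()` you could pass `[lambda j=j: client.generate.wait(j.job_id) for j in jobs]` for a list of created jobs. The default argument `j=j` binds each job at lambda creation time; otherwise every lambda would see the last job.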

## Configuration

```python
client = Client(
    api_key="sk_live_...",
    base_url="https://symagedocs.ai",   # API host; change to target a custom server
    timeout=30.0,                       # request timeout (seconds)
    max_retries=3,                      # retry on 429/5xx
)
```
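
The backoff schedule behind `max_retries` is not documented here. You may need your own retries around calls the SDK does not make, such as fetching presigned S3 URLs. For those, a common policy is exponential backoff with full jitter. The sketch below mirrors the documented retryable statuses (429 and 5xx). The helper names and the `base`/`cap` defaults are hypothetical choices, not SDK settings:

```python
import random
from typing import List, Optional


def is_retryable(status: int) -> bool:
    """Retry on 429 and any 5xx, matching the policy described above."""
    return status == 429 or 500 <= status <= 599


def backoff_delays(max_retries: int = 3, base: float = 0.5, cap: float = 8.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full-jitter backoff: the n-th delay is drawn from U(0, min(cap, base * 2**n))."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(max_retries)]
```

Full jitter spreads retries from many clients over time. This avoids synchronized bursts that would trigger another 429.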

## Method Reference

### Forms

| Method | Description |
|--------|-------------|
| `forms.list(category=None)` | List available forms, optionally filtered by category |
| `forms.get(form_id)` | Get detailed form info including field definitions |

### Generation

| Method | Description |
|--------|-------------|
| `generate.create(form_id, quantity=1, output_formats=["pdf_typed"], config=None, seed=None)` | Create an async generation job |
| `generate.list_jobs(limit=50, cursor=None, status=None)` | List generation jobs (cursor-paginated) |
| `generate.get_job(job_id)` | Get full job status and progress |
| `generate.download(job_id, format, path)` | Download job output to a local file |
| `generate.wait(job_id, poll_interval=3.0)` | Poll until job completes or fails |
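
Several list methods are cursor-paginated. A generic way to walk every page is to follow the returned cursor until none is left. This sketch assumes each page can be reduced to an `(items, next_cursor)` pair, with an empty or `None` cursor on the last page. `iter_cursor` is hypothetical, and the attribute names on the SDK's page objects are not documented here:

```python
from typing import Callable, Iterator, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def iter_cursor(fetch: Callable[[Optional[str]], Tuple[List[T], Optional[str]]]) -> Iterator[T]:
    """Yield every item across pages, following next_cursor until it is empty."""
    cursor: Optional[str] = None
    while True:
        items, cursor = fetch(cursor)
        yield from items
        if not cursor:  # None or "" marks the last page
            return
```

To use it with the SDK, wrap `client.generate.list_jobs(cursor=c)` in a small function that returns the page's items and next cursor. Check the API reference for the field names.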

### Identities

| Method | Description |
|--------|-------------|
| `identities.generate(quantity=1, config=None, seed=None)` | Generate raw synthetic identities as JSON |

### Batches

| Method | Description |
|--------|-------------|
| `batches.create(name, form_id, token_budget=None, output_formats=["pdf_typed"], config=None, label_scheme=None)` | Create a batch with optional token budget |
| `batches.list(limit=50, cursor=None)` | List batches (cursor-paginated) |
| `batches.get(batch_id)` | Get batch status and details |
| `batches.generate(batch_id, quantity=1, seed=None, webhook_url=None)` | Generate items within a batch |
| `batches.list_items(batch_id, limit=50, cursor=None)` | List batch items (cursor-paginated) |
| `batches.download_urls(batch_id, item_id)` | Get presigned S3 URLs for item files |
| `batches.get_bio_labels(batch_id, item_id)` | Get BIO-tagged token annotations (ML training) |
| `batches.get_word_annotations(batch_id, item_id)` | Get word-level spatial annotations (ML training) |
| `batches.iter_training_examples(batch_id)` | Iterate all items as training examples with images, BIO labels, and word annotations |
| `batches.wait(batch_id, poll_interval=3.0)` | Poll until batch is exhausted or revoked |
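
BIO labels mark each token as beginning (`B-X`), inside (`I-X`), or outside (`O`) an entity of type `X`. To turn them back into entity spans, for example to spot-check annotations, you can group consecutive tokens. This sketch assumes the labels are available as parallel token and tag lists; the exact structure returned by `get_bio_labels` is not documented here. The entity types in the usage example are placeholders:

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans.

    A stray I-X (not continuing an open X span) starts a new span, the lenient
    convention used by many NER evaluators.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Tuple[str, str]] = []
    cur_type: Optional[str] = None
    cur_tokens: List[str] = []

    def flush() -> None:
        nonlocal cur_type, cur_tokens
        if cur_type is not None:
            spans.append((cur_type, " ".join(cur_tokens)))
        cur_type, cur_tokens = None, []

    for tok, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")  # "B-NAME" -> ("B", "-", "NAME")
        if prefix == "B" or (prefix == "I" and etype != cur_type):
            flush()
            cur_type, cur_tokens = etype, [tok]
        elif prefix == "I":
            cur_tokens.append(tok)
        else:  # "O" (or anything unrecognized) closes the open span
            flush()
    flush()
    return spans
```

For example, tokens `["John", "Smith", "earned", "$52,000"]` with tags `["B-NAME", "I-NAME", "O", "B-WAGES"]` yield `[("NAME", "John Smith"), ("WAGES", "$52,000")]`.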

### Tabular

| Method | Description |
|--------|-------------|
| `tabular.parse(prompt)` | Convert natural language to a column schema (LLM-powered) |
| `tabular.generate(columns, quantity=100, output_formats=["csv"], seed=None)` | Create a tabular generation job |
| `tabular.status(job_id)` | Get tabular job progress and ETA |
| `tabular.download(job_id, format, path)` | Download tabular output to a local file |
| `tabular.wait(job_id, poll_interval=2.0)` | Poll until tabular job completes or fails |
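
The `wait` methods in these tables take a poll interval but, as listed, no overall timeout. If you need a deadline, a small polling loop works. This sketch is hypothetical: `poll_until` is not an SDK function. The caller supplies the status callable and the set of terminal states, because the SDK's status field and state names are not documented here:

```python
import time
from typing import Callable, Collection


def poll_until(get_status: Callable[[], str],
               terminal: Collection[str],
               poll_interval: float = 2.0,
               timeout: float = 600.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> str:
    """Call get_status() every poll_interval seconds until it returns a terminal
    state; raise TimeoutError once `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"still {status!r} after {timeout}s")
        sleep(min(poll_interval, remaining))  # never sleep past the deadline
```

The injectable `sleep` and `clock` parameters make the loop testable without real waiting. With the SDK, `get_status` would wrap `client.tabular.status(job_id)` and extract its state.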

### Account

| Method | Description |
|--------|-------------|
| `account.balance()` | Get credit balance (`credits_used`, `credits_allocated`) |
| `account.usage(days=30)` | Get usage summary for the specified period |
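
Before submitting a large job, you can estimate whether it fits in the remaining balance. This sketch assumes cost scales linearly as a form's `credit_cost` times `quantity`. That is an assumption; output formats or other options may change the actual charge. The server remains authoritative and raises `InsufficientCreditsError` (402) if credits run short. `check_credits` and `CreditCheck` are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class CreditCheck:
    needed: float
    remaining: float

    @property
    def shortfall(self) -> float:
        """Credits missing for the job; 0 if it fits."""
        return max(0, self.needed - self.remaining)


def check_credits(credit_cost: float, quantity: int,
                  credits_used: float, credits_allocated: float) -> CreditCheck:
    """Estimate a job's cost against the balance (assumes linear pricing)."""
    return CreditCheck(needed=credit_cost * quantity,
                       remaining=credits_allocated - credits_used)
```

Usage: `check_credits(form.credit_cost, 100, balance.credits_used, balance.credits_allocated).shortfall`.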

## Error Handling

The SDK raises typed exceptions for API errors and retries automatically on `429` and `5xx`:

```python
from symagedocs import Client, AuthenticationError, RateLimitError, NotFoundError

client = Client()  # reads SYMAGEDOCS_API_KEY from the environment

try:
    forms = client.forms.list()
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limited: automatic retries were exhausted")
except NotFoundError:
    print("Resource not found")
```

**All error classes:**

| Exception | HTTP Code | Description |
|-----------|-----------|-------------|
| `SymageDocsError` | — | Base exception for all SDK errors |
| `AuthenticationError` | 401 | Invalid or revoked API key |
| `PermissionDeniedError` | 403 | Key missing required scope |
| `NotFoundError` | 404 | Resource not found |
| `ValidationError` | 400 | Invalid request parameters |
| `InsufficientCreditsError` | 402 | Not enough credits for the operation |
| `ConflictError` | 409 | Resource in unexpected state (e.g., downloading incomplete job) |
| `RateLimitError` | 429 | Rate limit exceeded (SDK retries automatically) |
| `ServerError` | 5xx | Server-side error (SDK retries automatically) |
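
When mocking API responses in tests (the `dev` extra installs `pytest-httpx`), it can help to have this table as data, for example to parametrize a test over status codes. The lookup below is a hypothetical illustration, not part of the SDK. The class names come from the table, but falling back to `SymageDocsError` for unlisted codes is an assumption:

```python
from typing import Dict

# Mirrors the table above; this lookup is illustrative, not SDK code.
STATUS_TO_EXCEPTION: Dict[int, str] = {
    400: "ValidationError",
    401: "AuthenticationError",
    402: "InsufficientCreditsError",
    403: "PermissionDeniedError",
    404: "NotFoundError",
    409: "ConflictError",
    429: "RateLimitError",
}


def exception_name_for(status: int) -> str:
    """Return the exception class name expected for an HTTP status."""
    if 500 <= status <= 599:
        return "ServerError"
    # Assumption: unlisted error codes surface as the base class.
    return STATUS_TO_EXCEPTION.get(status, "SymageDocsError")
```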

## Examples

See [`examples/`](examples/) for complete working scripts:

- [`list_forms.py`](examples/list_forms.py) — Browse available forms and credit costs
- [`generate_w2s.py`](examples/generate_w2s.py) — Full pipeline: create job, wait, download PDF + JSON
- [`tabular_dataset.py`](examples/tabular_dataset.py) — Parse NL description, generate 5k rows, download CSV
- [`train_kie_model.py`](examples/train_kie_model.py) — Create batch with NIST3 labels, iterate training examples with BIO labels and spatial annotations

## Documentation

- [API Reference](https://symagedocs.ai/docs/api/)
- [Getting Started](https://symagedocs.ai/docs/api/getting-started.html)
- [Code Samples](https://symagedocs.ai/docs/api/code-samples.html)

## License

MIT
