Metadata-Version: 2.4
Name: ai-factory-sdk
Version: 0.1.1
Summary: Python SDK for the AI Factory Compute API
Project-URL: Homepage, https://pypi.org/project/ai-factory-sdk/
Author-email: AI Factory Team <software.platform@ai-at.eu>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: httpx>=0.28
Requires-Dist: pydantic>=2.0
Description-Content-Type: text/markdown

# AI Factory SDK

Python SDK for the [AI Factory Compute API](https://compute-api.ai-factory.datalab.tuwien.ac.at/compute-api/v1/docs) — submit and manage HPC jobs from Python.

## Features

- Synchronous and asynchronous clients (`AIFactoryClient`, `AsyncAIFactoryClient`)
- Typed request/response models with Pydantic validation
- Job polling with configurable timeout and retry (`client.wait()`)
- Automatic retry on transient errors (429, 5xx)
- PEP 561 compatible — full type annotation coverage
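
The SDK retries internally, and its exact policy is not documented here. Purely as an illustration, a retry loop over the transient status codes named above (429 and 5xx) with exponential backoff might look like the sketch below. `is_transient`, `call_with_retry`, and all of their parameters are hypothetical names, not part of the SDK.

```python
import time
from collections.abc import Callable


def is_transient(status_code: int) -> bool:
    """Return True for the status codes treated as transient (429, 5xx)."""
    return status_code == 429 or 500 <= status_code <= 599


def call_with_retry(
    send: Callable[[], int],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-transient status or attempts run out.

    Hypothetical helper: `send` stands in for one HTTP request and returns
    its status code. The delay doubles after every failed attempt.
    """
    status = send()
    for attempt in range(1, max_attempts):
        if not is_transient(status):
            break
        sleep(base_delay * 2 ** (attempt - 1))
        status = send()
    return status
```

Injecting `sleep` keeps the loop easy to test without real delays; the attempt count and base delay here are illustrative values, not the SDK's defaults.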

## Installation

```bash
pip install ai-factory-sdk
```

Or with [uv](https://docs.astral.sh/uv/):

```bash
uv add ai-factory-sdk
```

### Pre-release versions

Development builds published from the `dev` branch use PEP 440 development-release
suffixes (e.g., `0.2.0.dev1`), which pip treats as pre-releases. Install them with:

```bash
pip install ai-factory-sdk --pre
```

## Quick Start

```python
from ai_factory.sdk import AIFactoryClient, JobRequest

# Credentials can be read from the environment (AI_FACTORY_API_KEY,
# AI_FACTORY_SLURM_USER) or passed explicitly, as here:
with AIFactoryClient(token="...", slurm_user="jane") as client:
    # Submit a job
    resp = client.submit_job(
        JobRequest(name="hello", script="#!/bin/bash\necho Hello from SLURM")
    )
    print(f"Submitted job {resp.job_id}")

    # Wait for completion
    if resp.job_id is not None:
        detail = client.wait(str(resp.job_id), timeout=3600)
        print(f"Job finished with status: {detail.status}")
```

### Async Usage

```python
import asyncio
from ai_factory.sdk import AsyncAIFactoryClient, JobRequest

async def main():
    # Same constructor arguments as the synchronous client
    async with AsyncAIFactoryClient(token="...", slurm_user="jane") as client:
        resp = await client.submit_job(
            JobRequest(name="async-job", script="#!/bin/bash\nsleep 10 && echo done")
        )
        # Poll until the job reaches a terminal state
        if resp.job_id is not None:
            detail = await client.wait(str(resp.job_id))
            print(detail.status)

asyncio.run(main())
```

### Container Jobs

```python
from ai_factory.sdk import AIFactoryClient, ContainerJobRequest

with AIFactoryClient(token="...", slurm_user="jane") as client:
    # Submit a containerised job that requests one A40 GPU
    resp = client.submit_container(
        ContainerJobRequest(
            name="gpu-training",
            image="docker://nvcr.io/nvidia/pytorch:24.01-py3",
            container_command="python train.py",
            gres="gpu:a40:1",
            time_limit=120,
        )
    )
```

## Configuration

| Parameter | Environment Variable | Default |
|-----------|---------------------|---------|
| `base_url` | `AI_FACTORY_API_URL` | `https://compute-api.ai-factory.datalab.tuwien.ac.at/compute-api/v1` |
| `token` | `AI_FACTORY_API_KEY` | *(required)* |
| `slurm_user` | `AI_FACTORY_SLURM_USER` | *(required)* |
| `timeout` | — | `30.0` (HTTP timeout in seconds) |

Constructor parameters take precedence over environment variables.

## API Reference

### Clients

| Class | Description |
|-------|-------------|
| `AIFactoryClient` | Synchronous client (context manager) |
| `AsyncAIFactoryClient` | Asynchronous client (async context manager) |

### Methods

| Method | Description |
|--------|-------------|
| `submit_job(request)` | Submit a Slurm job script |
| `submit_container(request)` | Submit a containerised job |
| `get_job(job_id)` | Get job details by ID |
| `list_jobs(...)` | List jobs with optional filters and pagination |
| `cancel_job(job_id)` | Cancel a running or pending job |
| `wait(job_id, ...)` | Poll until the job reaches a terminal state |
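
To make the polling behaviour of `wait()` concrete, here is a rough, SDK-independent sketch of a poll-until-terminal loop. The terminal state names, the `interval` parameter, and the `poll_until_terminal` helper are assumptions for illustration; the SDK's own `wait()` raises `WaitTimeoutError` rather than the built-in `TimeoutError` used here.

```python
import time
from collections.abc import Callable

# Assumed Slurm-style terminal states; the real API may use other names.
TERMINAL_STATES = frozenset({"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"})


def poll_until_terminal(
    get_status: Callable[[], str],
    timeout: float = 3600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call get_status() repeatedly until it reports a terminal state.

    Hypothetical helper: `get_status` stands in for one status lookup.
    Raises TimeoutError once the deadline has passed.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)
```

Using a monotonic clock for the deadline avoids surprises if the system time changes while a long job is running.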

### Request Models

| Model | Fields |
|-------|--------|
| `JobRequest` | `name`, `script`, `partition`, `tasks`, `cpus_per_task`, `time_limit`, `gres`, `standard_output`, `standard_error` |
| `ContainerJobRequest` | `name`, `image`, `container_command`, `partition`, `tasks`, `cpus_per_task`, `time_limit`, `gres`, `standard_output`, `standard_error` |

### Response Models

| Model | Fields |
|-------|--------|
| `SubmitJobResponse` | `job_id`, `output_dir`, `logs_url` |
| `JobDetail` | `job_id`, `name`, `status`, `partition`, `nodes`, `exit_code`, `duration`, `start_time`, `end_time`, `submit_time`, `working_directory`, `standard_output`, `standard_error`, `gres`, `output_dir`, `logs_url` |
| `JobListItem` | `job_id`, `name`, `status`, `duration`, `start_time`, `end_time` |
| `JobList` | `jobs`, `total`, `limit`, `offset` |
| `CancelJobResponse` | `message` |
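
The `total`, `limit`, and `offset` fields of `JobList` suggest offset-based pagination. A minimal sketch of walking every page is shown below; `Page` is a local stand-in for `JobList`, and `fetch_page` is a hypothetical callable, since the exact parameters of `list_jobs(...)` are not listed here.

```python
from collections.abc import Callable, Iterator
from dataclasses import dataclass


@dataclass
class Page:
    """Local stand-in for JobList, using the same four field names."""

    jobs: list[str]
    total: int
    limit: int
    offset: int


def iter_all_jobs(
    fetch_page: Callable[[int, int], Page], limit: int = 50
) -> Iterator[str]:
    """Yield every job by requesting pages of `limit` items at increasing offsets."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page.jobs
        offset += len(page.jobs)
        # Stop on an empty page or once all `total` items have been seen.
        if not page.jobs or offset >= page.total:
            return
```

With the real client, `fetch_page` would wrap a call to `list_jobs` using whatever pagination arguments that method accepts.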

### Exceptions

| Exception | When |
|-----------|------|
| `SDKError` | Base for all SDK errors |
| `APIError` | Non-2xx HTTP response |
| `AuthError` | 401 or 403 response |
| `NotFoundError` | 404 response |
| `WaitTimeoutError` | `wait()` exceeded its deadline |

## Requirements

- Python >= 3.11
- [httpx](https://www.python-httpx.org/) >= 0.28
- [pydantic](https://docs.pydantic.dev/) >= 2.0

## License

[MIT](LICENSE)
