Metadata-Version: 2.4
Name: prefect-submitit
Version: 0.1.4
Summary: A Prefect 3 TaskRunner that submits tasks to SLURM clusters via submitit
Project-URL: Homepage, https://github.com/dexterity-systems/prefect-submitit
Project-URL: Repository, https://github.com/dexterity-systems/prefect-submitit
Project-URL: Issues, https://github.com/dexterity-systems/prefect-submitit/issues
Project-URL: Documentation, https://github.com/dexterity-systems/prefect-submitit#readme
Project-URL: Changelog, https://github.com/dexterity-systems/prefect-submitit/blob/main/CHANGELOG.md
Maintainer: Dexterity Systems
License-Expression: BSD-3-Clause
License-File: LICENSE
Keywords: hpc,prefect,slurm,submitit,task-runner,workflow
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: System :: Distributed Computing
Requires-Python: >=3.12
Description-Content-Type: text/markdown

# prefect-submitit

A [Prefect 3](https://docs.prefect.io/) TaskRunner that submits tasks to
[SLURM](https://slurm.schedmd.com/) clusters via
[submitit](https://github.com/facebookincubator/submitit).

## Features

- **Single task submission** -- submit individual Prefect tasks as SLURM jobs
- **Job arrays** -- submit `task.map()` calls as SLURM job arrays with automatic
  chunking when the array exceeds cluster limits
- **Batched execution** -- group multiple items per SLURM job with
  `units_per_worker` to reduce scheduling overhead
- **Local mode** -- swap to local execution for testing without changing your
  flow code (`execution_mode="local"` or `SLURM_TASKRUNNER_BACKEND=local`)
- **Prefect UI integration** -- task run names include SLURM job IDs for easy
  cross-referencing with `squeue`/`sacct`

## Installation

```bash
pip install prefect-submitit
```

Or with conda (after the first conda-forge release):

```bash
conda install -c conda-forge prefect-submitit
```

## Quick Start

```python
from prefect import flow, task
from prefect_submitit import SlurmTaskRunner


@task
def add(x: int, y: int) -> int:
    return x + y


@flow(task_runner=SlurmTaskRunner(partition="cpu", time_limit="00:10:00"))
def my_flow():
    # Single task
    future = add.submit(1, 2)
    print(future.result())  # 3

    # Map over inputs (submitted as a SLURM job array)
    futures = add.map([1, 2, 3], [4, 5, 6])
    print([f.result() for f in futures])  # [5, 7, 9]


if __name__ == "__main__":
    my_flow()
```

## Configuration

```python
SlurmTaskRunner(
    partition="gpu",  # SLURM partition
    time_limit="04:00:00",  # Wall time (HH:MM:SS)
    mem_gb=16,  # Memory per job
    gpus_per_node=1,  # GPUs per job
    units_per_worker=10,  # Items per SLURM job (batched mode)
    execution_mode="slurm",  # "slurm" or "local"
    slurm_array_parallelism=100,  # Max concurrent array tasks
    log_folder="slurm_logs",  # Where submitit writes logs
    fail_on_error=True,  # Raise on SLURM job failure
    poll_interval=2.0,  # Seconds between job status checks
    max_poll_time=3600,  # Max seconds to poll before timing out
    max_array_size=1000,  # Override auto-detected max SLURM array size
)
```

Any additional keyword arguments are passed through to submitit (e.g.
`slurm_gres="gpu:a100:1"`).

## Examples

The `examples/` directory contains Jupyter notebooks that demonstrate each
feature end-to-end on a real SLURM cluster:

| Notebook | Covers |
|----------|--------|
| `01_single_task_submission` | Submitting individual tasks as SLURM jobs |
| `02_job_arrays_with_map` | `task.map()` with automatic job array chunking |
| `03_batched_execution` | Grouping items per job with `units_per_worker` |
| `04_error_handling_and_cancellation` | Failure propagation and job cancellation |
| `05_local_mode_and_development` | Local execution mode for dev/testing |

To run them: install dependencies, register the Jupyter kernel, and start the
Prefect server (see [Development](#development) below), then open any notebook
and select the **Prefect-Submitit** kernel.

## Integration Tests

The test suite includes SLURM integration tests that submit real jobs to verify
the runner works on your cluster. Use these to validate a new deployment:

```bash
pixi run -e dev test-slurm
```

Tests cover single submission, job arrays, batched execution, cancellation,
failure handling, polling, and environment propagation. They are marked with
`@pytest.mark.slurm` and skipped unless `--run-slurm` is passed.

## Local Testing

Set the environment variable to skip SLURM entirely:

```bash
export SLURM_TASKRUNNER_BACKEND=local
```

Or pass it directly:

```python
SlurmTaskRunner(execution_mode="local")
```

## Development

Requires [Pixi](https://pixi.sh):

```bash
pixi install
pixi run -e dev fmt
pixi run -e dev test
```

### Prefect Server

The repo includes a `prefect-server` CLI to run a local Prefect server backed
by PostgreSQL (handles SLURM concurrency better than SQLite). The server uses a
UID-based port to avoid conflicts on shared nodes.

```bash
pixi run prefect-start   # Start in background (PostgreSQL + Prefect)
pixi run prefect-stop    # Stop the server
```

The CLI automatically:
- Initializes PostgreSQL on first run (stored in `~/.prefect-submitit/postgres/`)
- Picks a UID-based port (range 4200-4999) to avoid conflicts
- Uses the node's FQDN so SLURM workers can reach it (falls back to IP if
  FQDN is unresolvable)
- Writes a discovery file to `~/.prefect-submitit/server.json`
- Tunes connection pool sizes for high-concurrency SLURM workloads

**Direct CLI**

```bash
prefect-server start [--bg] [--sqlite] [--restart] [--port N] [--pg-port N]
prefect-server stop [-f]
prefect-server status
prefect-server init-db [--reset]
```

- `--sqlite` uses SQLite instead of PostgreSQL
- `--restart` stops any existing server before starting
- `start` is idempotent — skips if the server is already healthy

**Server discovery**

Workers resolve the Prefect API URL in this order: `PREFECT_SUBMITIT_SERVER`
env var → `PREFECT_API_URL` env var → discovery file (auto-written by
`prefect-server start`).

### IDE Setup (VSCode)

**Python Interpreter:** Set the Pixi environment as your VSCode Python
interpreter:

```bash
pixi run which python
# Example output: /home/user/prefect-submitit/.pixi/envs/default/bin/python
```

In VSCode: `Ctrl+Shift+P` → "Python: Select Interpreter" → paste the path.

**Jupyter Kernel:** Register the Pixi environment as a Jupyter kernel so
notebooks use the correct packages:

```bash
pixi run install-kernel
```

In VSCode: open a `.ipynb` file → click "Select Kernel" → choose
**Prefect-Submitit**.

> **Tip:** If the Jupyter kernel takes 30+ seconds to start in VS Code, the
> `Python Environments` extension (`ms-python.vscode-python-envs`) is likely
> the cause. Uninstall it — the core Python extension works fine without it.
> Tracked upstream:
> [microsoft/vscode-python#25804](https://github.com/microsoft/vscode-python/issues/25804)

## License

BSD 3-Clause. See [LICENSE](LICENSE) for details.
