Metadata-Version: 2.4
Name: axl-workflows
Version: 0.3.0
Summary: Lightweight framework for building data and ML workflows with class-based Python syntax
Project-URL: Homepage, https://github.com/axl-platform/axl-workflows
Project-URL: Documentation, https://axl-workflows.readthedocs.io/
Project-URL: Repository, https://github.com/axl-platform/axl-workflows
Project-URL: Bug Tracker, https://github.com/axl-platform/axl-workflows/issues
Project-URL: Changelog, https://github.com/axl-platform/axl-workflows/blob/master/CHANGELOG.md
Project-URL: Release Notes, https://github.com/axl-platform/axl-workflows/releases
Author-email: AXL Workflows Contributors <contributors@axl-workflows.dev>
Maintainer-email: AXL Workflows Contributors <contributors@axl-workflows.dev>
License: MIT
License-File: LICENSE
Keywords: argo,data,kubeflow,kubernetes,ml,mlops,pipelines,workflow
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Distributed Computing
Requires-Python: >=3.10
Requires-Dist: cloudpickle>=3.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: typer>=0.9.0
Provides-Extra: all
Requires-Dist: google-cloud-aiplatform>=1.49.0; extra == 'all'
Requires-Dist: google-cloud-storage>=2.0.0; extra == 'all'
Requires-Dist: kfp>=2.13; extra == 'all'
Requires-Dist: kubernetes>=28.0.0; extra == 'all'
Requires-Dist: pyyaml>=6.0.0; extra == 'all'
Provides-Extra: argo
Requires-Dist: kubernetes>=28.0.0; extra == 'argo'
Requires-Dist: pyyaml>=6.0.0; extra == 'argo'
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pre-commit>=3.0.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.10.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: scikit-learn>=1.3.0; extra == 'dev'
Requires-Dist: twine>=4.0.0; extra == 'dev'
Requires-Dist: types-pyyaml>=6.0.0; extra == 'dev'
Provides-Extra: kfp
Requires-Dist: kfp>=2.13; extra == 'kfp'
Requires-Dist: kubernetes>=28.0.0; extra == 'kfp'
Provides-Extra: vertex
Requires-Dist: google-cloud-aiplatform>=1.49.0; extra == 'vertex'
Requires-Dist: google-cloud-storage>=2.0.0; extra == 'vertex'
Description-Content-Type: text/markdown

<div align="center">
  <img src="docs/assets/axl-slogan.png" alt="AXL Workflows Logo"/>
</div>

[![CI](https://github.com/axl-platform/axl-workflows/actions/workflows/ci.yml/badge.svg)](https://github.com/axl-platform/axl-workflows/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/axl-workflows.svg?labelColor=ffffff&color=116aea&logo=pypi&logoColor=595959)](https://pypi.org/project/axl-workflows/)
[![Python](https://img.shields.io/pypi/pyversions/axl-workflows.svg?label=python&labelColor=ffffff&color=116aea&logo=python&logoColor=595959)](https://pypi.org/project/axl-workflows/)

**AXL Workflows (axl)** is the workflow authoring layer of the **AXL ML Platform** — a full AI platform from development to deployment. It lets teams define **data and ML workflows** as plain Python classes and compile them to any runtime provider.

* **Local runtime** → fast iteration on your machine.
* **Argo Workflows** → production Kubernetes pipelines.
* **Kubeflow Pipelines** → KFP-native execution (coming in v0.4.0).

**Write once → compile to any runtime. No YAML, no vendor lock-in.**

> axl-workflows is one product in the AXL ML Platform. For cluster ops and infrastructure bootstrap, see [`axlctl`](https://github.com/pedrospinosa/axlctl).

---

## 🚀 Quick Start

```bash
# Install
pip install axl-workflows

# Or with uv
uv pip install axl-workflows

# List the available commands
axl --help
```

---

## ✨ Key Features

* **Class-based DSL**: Define workflows as Python classes, with steps as methods and a `dag()` to wire them.
* **Simple params**: Treat parameters as a **normal step** that returns a Python object (e.g., a Pydantic model or dict). No special Param/Artifact classes.
* **IO Handlers**: Steps return **plain Python objects**; axl persists/loads them via an `io_handler` (default: **pickle**).

  * Per-step override (`@step(io_handler=...)`)
  * **Input modes**: receive **objects** by default or **file paths** with `input_mode="path"`.
* **Intermediate Representation (IR)**: Backend-agnostic DAG model (nodes, edges, resources, IO metadata).
* **Multiple backends**:

  * **Local runtime** → develop and iterate quickly.
  * **Argo Workflows** → YAML generation for production Kubernetes pipelines.
  * **Kubeflow Pipelines** → KFP pipeline packages (coming in v0.4.0).
* **Unified runner image**: One container executes steps locally and in Argo pods.
* **Resource & retry hints**: Declare CPU, memory, caching, retries, and conditions at the step level.
* **CLI tools**: Compile, validate, run locally, or render DAGs.

---

## 📦 Example Workflow (params as a step, with Pydantic)

```python
# examples/churn_workflow.py
from axl import Workflow, step
from pydantic import BaseModel

# Parameters are just a normal step output (typed with Pydantic for convenience).
class TrainParams(BaseModel):
    seed: int = 42
    input_path: str = "data/raw.csv"

class ChurnTrain(Workflow):
    # Workflow configuration via class attributes
    name = "churn-train"
    image = "ghcr.io/axl-platform/axl-workflows/runner:0.3.0"
    io_handler = "pickle"

    @step
    def params(self) -> TrainParams:
        # Use defaults here; optionally read from YAML/env if you prefer.
        return TrainParams()

    @step  # default io_handler = pickle
    def preprocess(self, p: TrainParams):
        import pandas as pd
        df = pd.read_csv(p.input_path)
        # ... feature engineering ...
        return df  # persisted via pickle (default)

    @step
    def train(self, features, p: TrainParams):
        from sklearn.ensemble import RandomForestClassifier
        import numpy as np
        # Keep numeric columns only; the label below is synthetic, for demonstration.
        X = features.select_dtypes(include=[np.number]).fillna(0)
        row_sums = X.sum(axis=1)
        y = (row_sums > row_sums.median()).astype(int)
        model = RandomForestClassifier(n_estimators=50, random_state=p.seed).fit(X, y)
        return model  # persisted via pickle

    @step
    def evaluate(self, model) -> float:
        # pretend evaluation
        return 0.9123

    def dag(self):
        p = self.params()
        feats = self.preprocess(p)
        model = self.train(feats, p)
        return self.evaluate(model)
```
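
The comment in `params` mentions reading values from the environment instead of using defaults. A minimal sketch of that idea follows, using only the standard library. The `AXL_PARAM_` prefix and the `params_from_env` helper are hypothetical names chosen for illustration; they are not part of axl.

```python
import os

# Hypothetical defaults mirroring the TrainParams fields above.
DEFAULTS = {"seed": 42, "input_path": "data/raw.csv"}


def params_from_env(defaults, prefix="AXL_PARAM_", environ=None):
    """Return a copy of `defaults` with values overridden from environment variables.

    The naming scheme (prefix + upper-cased field name) is an assumption
    made for this sketch, not an axl convention.
    """
    environ = os.environ if environ is None else environ
    resolved = dict(defaults)
    for key, default in defaults.items():
        raw = environ.get(prefix + key.upper())
        if raw is not None:
            # Cast to the default's type so "7" becomes 7 for int fields.
            # This is fine for int/float/str; bool fields need explicit parsing.
            resolved[key] = type(default)(raw)
    return resolved


# An explicit mapping stands in for os.environ to keep the example deterministic.
overrides = params_from_env(DEFAULTS, environ={"AXL_PARAM_SEED": "7"})
print(overrides)  # {'seed': 7, 'input_path': 'data/raw.csv'}
```

Inside the `params` step, the resulting dict could be returned directly or unpacked into the model, for example `TrainParams(**params_from_env(TrainParams().model_dump()))`.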

**Variations**

* Receive a **file path** instead of an object:

  ```python
  from pathlib import Path

  @step(input_mode={"features": "path"})
  def profile(self, features: Path) -> dict:
      return {"bytes": Path(features).stat().st_size}
  ```

* Override the **io handler** (e.g., Parquet for DataFrames):

  ```python
  from axl.io.parquet_io import parquet_io_handler

  @step(io_handler=parquet_io_handler)
  def preprocess(self, p: TrainParams):
      import pandas as pd
      return pd.read_csv(p.input_path)  # saved as .parquet; downstream gets a DataFrame
  ```
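
To make the io handler and input mode ideas concrete, here is a rough, self-contained sketch of how a save/load pair could move a step's return value to the next step. The `PickleHandler` class and `pass_output` function are hypothetical stand-ins; the actual interface in `axl.io` may look different.

```python
import pickle
import tempfile
from pathlib import Path


class PickleHandler:
    """Hypothetical io handler: one method to save, one to load."""

    suffix = ".pkl"

    def save(self, obj, path: Path) -> None:
        with open(path, "wb") as f:
            pickle.dump(obj, f)

    def load(self, path: Path):
        with open(path, "rb") as f:
            return pickle.load(f)


def pass_output(handler, obj, workdir: Path, input_mode: str = "object"):
    """Persist a step's return value and hand it to the next step."""
    artifact = workdir / f"output{handler.suffix}"
    handler.save(obj, artifact)
    # "path" mode hands over the file location; "object" mode reloads the value.
    return artifact if input_mode == "path" else handler.load(artifact)


with tempfile.TemporaryDirectory() as tmp:
    as_object = pass_output(PickleHandler(), {"rows": 3}, Path(tmp))
    as_path = pass_output(PickleHandler(), {"rows": 3}, Path(tmp), input_mode="path")
    artifact_name = as_path.name
    artifact_exists = as_path.exists()
```

Swapping the handler changes only the on-disk format; the upstream step still returns a plain Python object, and the downstream step still receives either an object or a path.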

---

## 🛠 CLI

```bash
# Compile to Argo Workflows YAML
axl compile -m examples/churn_workflow.py:ChurnTrain --target argo --out churn.yaml

# Compile to Kubeflow Pipelines package (v0.4.0+)
axl compile -m examples/churn_workflow.py:ChurnTrain --target kfp --out pipeline.yaml

# Run locally
axl run local -m examples/churn_workflow.py:ChurnTrain

# Validate workflow definition
axl validate -m examples/churn_workflow.py:ChurnTrain

# Render DAG graph
axl render -m examples/churn_workflow.py:ChurnTrain --out dag.png
```

> For cluster lifecycle, storage init, and runner image ops, use [`axlctl`](https://github.com/pedrospinosa/axlctl).

---

## 📐 Architecture

axl-workflows is **Layer 1** of the AXL ML Platform:

```
┌─────────────────────────────────────────────────┐
│  LAYER 5: MONITOR     (future: axl-monitor)     │
├─────────────────────────────────────────────────┤
│  LAYER 4: SERVE       axl-serving               │
├─────────────────────────────────────────────────┤
│  LAYER 3: MANAGE      axl-model-registry        │
├─────────────────────────────────────────────────┤
│  LAYER 2: EXECUTE     axl-etl                   │
├─────────────────────────────────────────────────┤
│  LAYER 1: AUTHOR → COMPILE → RUN  ← (here)      │
├─────────────────────────────────────────────────┤
│  OPS (cross-cutting)  axlctl (separate repo)    │
└─────────────────────────────────────────────────┘
```

Within this repo, the layers are:

1. **Authoring Layer**

   * Python DSL: `@step` decorator, `Workflow` base class
   * **Params are a normal step** (often a Pydantic model)
   * **Configuration via class attributes** (name, image, io_handler)
   * IO handled by **io_handlers** (default: pickle)
   * Wire dependencies via `dag()` (auto-inferred in v0.3.0+)

2. **IR (Intermediate Representation)**

   * Backend-agnostic DAG: nodes, edges, inputs/outputs, resources, retry policies, IO metadata

3. **Compilers**

   * **Argo**: IR → Argo Workflow YAML
   * **KFP**: IR → Kubeflow Pipelines package (v0.4.0+)
   * Plugin architecture (v0.4.0+) — add any target via entry points

4. **Runtime**

   * Unified runner image (`axl-runner`) executes steps in pods and locally
   * Handles env (via **uv**), IO handler save/load, structured logging, retries

5. **CLI**

   * `axl compile`, `axl run local`, `axl validate`, `axl render`
   * `axl pack`, `axl build-image` (v0.5.0+)
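
The IR layer listed above can be pictured as a small graph structure that every compiler reads. The sketch below is an illustration only: `Node` and `WorkflowIR` are hypothetical names, and the real IR classes in `axl/ir` may carry different fields.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Node:
    name: str
    io_handler: str = "pickle"
    retries: int = 0


@dataclass
class WorkflowIR:
    name: str
    nodes: dict = field(default_factory=dict)  # step name -> Node
    edges: list = field(default_factory=list)  # (upstream, downstream) pairs

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def add_edge(self, upstream: str, downstream: str) -> None:
        self.edges.append((upstream, downstream))

    def execution_order(self) -> list:
        # Map every node to the set of nodes it depends on, then sort.
        deps = {name: set() for name in self.nodes}
        for upstream, downstream in self.edges:
            deps[downstream].add(upstream)
        return list(TopologicalSorter(deps).static_order())


# Rebuild the ChurnTrain wiring from the example workflow by hand.
ir = WorkflowIR(name="churn-train")
for step_name in ("params", "preprocess", "train", "evaluate"):
    ir.add_node(Node(step_name))
ir.add_edge("params", "preprocess")
ir.add_edge("params", "train")
ir.add_edge("preprocess", "train")
ir.add_edge("train", "evaluate")

order = ir.execution_order()
print(order)  # ['params', 'preprocess', 'train', 'evaluate']
```

Because the graph holds no backend-specific detail, a local runner could walk `execution_order()` directly, while a compiler could translate the same nodes and edges into its target format.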

---

## 📂 Project Structure

```
axl/
  core/          # DSL: decorators, base classes, typing
  io/            # io_handlers (pickle default; parquet/npy/torch optional)
  ir/            # Intermediate Representation (nodes, edges, workflows)
  compiler/      # Backend compilers (Argo, Kubeflow)
  runtime/       # Runner container + IO + env setup (uv)
  cli.py         # CLI entrypoint
examples/
  churn_workflow.py
tests/
  test_core.py   # Tests for DSL components
  test_ir.py     # Tests for IR components
pyproject.toml
README.md
```

---

## 🎯 Why AXL Workflows?

* **Local development** is fast and simple.
* **Argo and KFP are production-grade**, but their YAML is verbose and hard to get started with.
* **axl bridges the gap**:

  * Simple, class-based DSL — no YAML, no vendor-specific decorators
  * **Params as a normal step** — no special `Param`/`Artifact` classes
  * IO handlers for painless object ↔ file persistence
  * Backend-agnostic IR — one workflow definition, multiple compile targets
  * Write once, compile to any supported runtime

---

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
