Metadata-Version: 2.1
Name: document-workflow
Version: 0.1.0
Summary: Decorator-based workflow documentation for Python scripts and Jupyter notebooks
Home-page: https://github.com/ddpoe/dFlow
License: MIT
Keywords: documentation,jupyter,notebooks,workflow,automation,decorators,ast
Author: Dante Poe
Author-email: dap182@pitt.edu
Requires-Python: >=3.10,<4.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Documentation
Requires-Dist: nbformat (>=5.0.0)
Requires-Dist: pydantic (>=2.0.0)
Requires-Dist: pyyaml (>=6.0.0)
Requires-Dist: sqlmodel (>=0.0.14)
Project-URL: Documentation, https://github.com/ddpoe/dFlow
Project-URL: Repository, https://github.com/ddpoe/dFlow
Description-Content-Type: text/markdown

# dFlow

Decorator-based workflow documentation for Python scripts and Jupyter notebooks. Annotate your analysis with `Step` markers — dFlow statically extracts the structure (no code execution) and exports it to a standalone HTML page you can share with collaborators.

## Installation

```bash
pip install document-workflow
```

To also install HTML export support:

```bash
pip install "document-workflow[export]"   # adds sphinx + sphinx-dflow-ext
```

For development:

```bash
git clone https://github.com/ddpoe/dFlow.git
cd dFlow
poetry install --with export
```

## Notebook Quickstart

This is the primary use case: you have a Jupyter notebook with an analysis and you want to produce a clean, shareable HTML document describing the whole pipeline.

### 1. Write your helper functions with `@task`

Mark reusable functions with `@task`. These can live in a `.py` helper module or in a notebook cell — dFlow scans both:

```python
# helpers.py  (or a notebook cell — either works)
import scanpy as sc

from dflow.core.decorators import task, Step

@task(
    purpose="Load and validate 10X h5ad file",
    inputs="Path to h5ad file",
    outputs="AnnData object with raw counts",
)
def load_data(path: str):
    adata = sc.read_h5ad(path)
    assert adata.n_obs > 0, "Empty dataset"
    return adata

@task(
    purpose="Dimensionality reduction via PCA + UMAP",
    inputs="Filtered AnnData",
    outputs="AnnData with UMAP coordinates in .obsm",
    critical="1-2 minutes on large datasets",
)
def reduce_dims(adata):
    口 = Step(step_num=1, name="PCA", purpose="Principal component analysis")
    sc.tl.pca(adata)

    口 = Step(step_num=2, name="Neighbors", purpose="Build kNN graph")
    sc.pp.neighbors(adata)

    口 = Step(step_num=3, name="UMAP", purpose="Compute UMAP embedding")
    sc.tl.umap(adata)
    return adata
```

Tasks can have their own `Step` markers inside them. When a workflow calls this task via `AutoStep`, dFlow resolves those internal steps as sub-steps (see below).

### 2. Annotate your notebook

The `workflow()` declaration and all `Step` / `AutoStep` markers go in **one cell**. Use `AutoStep` when calling a `@task` — dFlow pulls in its docs automatically. Use `Step` for inline code:

```python
import scanpy as sc

from dflow import workflow, Step, AutoStep
from helpers import load_data, reduce_dims

口 = workflow(name="scrna_pipeline", purpose="Single-cell RNA-seq analysis")

# ── Step 1: Load Data ──
口 = AutoStep(step_num=1)
adata = load_data("data.h5ad")

# ── Step 2: Quality Control ──
口 = Step(step_num=2, name="Quality control",
          purpose="Filter low-quality cells and genes",
          critical="Removes 20-30% of cells")
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)

# ── Step 3: Normalize ──
口 = Step(step_num=3, name="Normalize",
          purpose="Log-normalize and find highly variable genes",
          outputs="Normalized AnnData with HVG annotations")
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)

# ── Step 4: Reduce Dimensions ──
口 = AutoStep(step_num=4)
adata = reduce_dims(adata)
```

> **About `口`:** The CJK character 口 (mouth/opening) is used as a visual marker — it makes step annotations stand out from real data assignments. It's purely a convention; any variable name works, or you can omit the assignment entirely and just call `Step(...)` / `AutoStep(...)` as bare statements.
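
For example, step 2 from the notebook above written as a bare statement, with no marker variable:

```python
# ── Step 2: Quality Control ──
Step(step_num=2, name="Quality control",
     purpose="Filter low-quality cells and genes",
     critical="Removes 20-30% of cells")
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
```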

### 3. Cross-module step resolution

When `dflow build` scans this project, it sees that `reduce_dims` (called at step 4 via `AutoStep`) has internal steps 1, 2, 3. During assembly, those become **sub-steps** of step 4:

```
Step 1: Load Data              ← from @task on load_data
Step 2: Quality control        ← inline Step
Step 3: Normalize              ← inline Step
Step 4: Reduce Dimensions      ← from @task on reduce_dims
  Step 4.1: PCA                ← resolved from reduce_dims Step 1
  Step 4.2: Neighbors          ← resolved from reduce_dims Step 2
  Step 4.3: UMAP               ← resolved from reduce_dims Step 3
```

This works across files — the task can be in `helpers.py`, another notebook, or the same notebook. dFlow resolves everything statically from the database.

### 4. Build and export

```bash
dflow init                          # Create .dflow/ directory and database
dflow build .                       # Scan notebook + helpers.py → populate database
dflow export -o docs/               # Generate HTML documentation
```

### 5. What you get

`dflow export` produces a self-contained HTML site in `docs/`:

```
docs/
├── index.html                  # Landing page listing all workflows
├── scrna_pipeline.html         # Your workflow page
├── _static/                    # CSS, JS assets
└── ...
```

Open `docs/index.html` in a browser. The workflow page shows:

- **Workflow title and purpose** — from the `workflow()` call
- **Step-by-step outline** — each `Step` rendered with its name, purpose, inputs, outputs, and warnings
- **Resolved AutoStep details** — purpose/inputs/outputs pulled from `@task` decorators, with internal steps expanded as sub-steps (4.1, 4.2, …)
- **Mermaid diagram** — auto-generated flowchart showing the step sequence
- **Coverage links** — if any `@test_workflow` tests reference this workflow, they appear here

This is rendered by [sphinx-dflow-ext](https://github.com/ddpoe/sphinx_dflow_ext), which reads the `.dflow/workflow.db` database directly.

### Step reference

| Parameter | Required | Description |
|---|---|---|
| `step_num` | Yes | Integer for major steps, float for sub-steps in loops (e.g. `1.1`) |
| `name` | Yes (`Step` only) | Short label — `AutoStep` pulls this from the `@task` |
| `purpose` | Yes (`Step` only) | What this step accomplishes — `AutoStep` pulls this from the `@task` |
| `inputs` | No | What goes in |
| `outputs` | No | What comes out |
| `critical` | No | Warnings (long runtime, data loss, etc.) |

## How It Works

1. You annotate code with `Step()` / `AutoStep()` markers, plus `@workflow` / `@task` decorators in `.py` modules or a `workflow()` call in notebooks
2. `dflow build` statically parses your code via AST — **no execution** — and stores the structure in `.dflow/workflow.db` (SQLite)
3. `dflow export` generates Sphinx RST stubs referencing the database, runs `sphinx-build`, and produces standalone HTML

The database is the single source of truth — downstream tools (Sphinx, Cortex, etc.) read it directly.
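
To make the "no execution" point concrete, here is a minimal sketch of AST-based marker extraction (not dFlow's actual scanner, just the general technique) using Python's built-in `ast` module:

```python
import ast

def find_step_markers(source: str) -> list[dict]:
    """Collect Step/AutoStep keyword arguments without executing any code."""
    markers = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in ("Step", "AutoStep")):
            # Only literal keyword values are recoverable statically
            markers.append({kw.arg: ast.literal_eval(kw.value)
                            for kw in node.keywords if kw.arg})
    return markers

src = '口 = Step(step_num=2, name="Normalize", purpose="Log-normalize counts")'
print(find_step_markers(src))
# [{'step_num': 2, 'name': 'Normalize', 'purpose': 'Log-normalize counts'}]
```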

## Python Modules

For `.py` files, use `@workflow` and `@task` decorators with `Step()` / `AutoStep()` markers:

```python
import scanpy as sc

from dflow.core.decorators import workflow, task, Step, AutoStep

@task(purpose="Load and validate data from h5ad file")
def load_data(path: str):
    adata = sc.read_h5ad(path)
    return adata

@task(purpose="Dimensionality reduction via PCA + UMAP")
def reduce_dims(adata):
    sc.tl.pca(adata)
    sc.tl.umap(adata)
    return adata

@workflow(purpose="Single-cell RNA-seq analysis pipeline")
def run_pipeline(data_path: str):
    口 = AutoStep(step_num=1)
    adata = load_data(data_path)

    口 = Step(step_num=2, name="Filter", purpose="Remove low-quality cells")
    adata = adata[adata.obs["n_genes"] > 200]

    口 = AutoStep(step_num=3)
    adata = reduce_dims(adata)
```

## Testing

Use `@test_workflow` and `@test_suite` to annotate test functions. dFlow scans these just like production workflows — extracting steps, purpose, and coverage links — but stores them with `role="test"` so documentation tooling can distinguish tests from production code.

### `@test_workflow`

Decorator for individual test functions. Same structure as `@workflow` but adds an optional `covers` parameter linking the test to the production functions it validates:

```python
from dflow.core.decorators import test_workflow, Step

@test_workflow(
    purpose="Execute preprocess → filter → label via CLI and verify lineage chain",
    covers=["pm.snakemake_gen.generate_snakefile"],
)
def test_three_step_pipeline(seeded_project):
    口 = Step(step_num=1, name="Run preprocess",
             purpose="Register and complete preprocess as the root step",
             outputs="Run ID for preprocess",
             critical="NOT IMPLEMENTED")
    # ... test body ...

    口 = Step(step_num=2, name="Run filter_cells",
             purpose="Register filter_cells with parent=preprocess",
             inputs="preprocess run ID")
    # ... test body ...
```

**Parameters:**

| Parameter | Required | Description |
|---|---|---|
| `purpose` | Yes | What this test validates (positional) |
| `covers` | No | List of dotpaths to production functions this test covers, e.g. `["pm.database.get_engine"]` (keyword-only) |
| `inputs` | No | Description of test inputs |
| `outputs` | No | Description of expected outputs |
| `critical` | No | Warnings (e.g. `"NOT IMPLEMENTED"` for skeleton tests) |

The `covers` list creates `CoverageLink` rows in the database. These are queried at Sphinx build time by `sphinx_dflow_ext` to inject test-coverage tables into API docs.
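
Since the database is plain SQLite, you can inspect these rows yourself. A sketch (only the `coverage_links` table name is taken from the Database section below; the column layout is not documented here, so print it first):

```python
import sqlite3

conn = sqlite3.connect(".dflow/workflow.db")

# Discover the actual column names before relying on any of them
print(conn.execute("PRAGMA table_info(coverage_links)").fetchall())

# Each row links a test to the production dotpath it covers
for row in conn.execute("SELECT * FROM coverage_links"):
    print(row)
```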

> **Note:** `test_workflow.__test__` is set to `False` so pytest won't try to collect the decorator itself as a test.

### `@test_suite`

Class-level decorator for grouping related test methods. Purpose is taken from the class **docstring**:

```python
from dflow.core.decorators import test_suite, test_workflow, Step

@test_suite(covers=["pm.database.get_engine", "pm.database.init_db"])
class TestDatabaseSetup:
    """Verify database initialization and engine creation."""

    @test_workflow(purpose="Engine connects to correct SQLite file")
    def test_engine_path(self, tmp_path):
        口 = Step(step_num=1, name="Create engine", purpose="Call get_engine()")
        # ...

    @test_workflow(
        purpose="Re-init clears stale data",
        covers=["pm.database.reset_db"],  # additional function-level covers
    )
    def test_reinit(self, tmp_path):
        口 = Step(step_num=1, name="Reset", purpose="Call reset_db()")
        # ...
```

**Key behaviors:**
- Class-level `covers` creates class-scoped `CoverageLink` rows (applies to all methods)
- Individual `@test_workflow` methods can add their own `covers` on top of the class-level list (function-scoped rows)
- Methods without `@test_workflow` are still discovered by the `TestScanner` as `role="test"` functions with `class_id` set

## Decorator Reference

| Decorator / Marker | Context | Purpose |
|---|---|---|
| `@workflow(purpose=...)` | `.py` files | Marks a top-level orchestration function |
| `@task(purpose=...)` | `.py` files | Marks a reusable unit of work |
| `@test_workflow(purpose=..., covers=[...])` | `.py` test files | Marks a test function with optional coverage links |
| `@test_suite(covers=[...])` | `.py` test files | Groups test methods in a class (purpose from docstring) |
| `workflow(name=..., purpose=...)` | Notebooks | Declares a workflow (plain function call) |
| `Step(step_num, name, purpose)` | Both | Inline step with explicit metadata |
| `AutoStep(step_num)` | Both | Step that inherits docs from the next function call |

**Optional parameters** on `@workflow`, `@task`, `@test_workflow`, and `Step`: `inputs`, `outputs`, `critical`.

## Step Numbering

- **Major steps**: integers (`1`, `2`, `3`) — sequential top-level operations
- **Minor steps**: floats (`1.1`, `1.2`) — sub-operations, use inside loops (see the sketch below)
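
For example, a loop whose body is documented as minor steps of major step 1 (a sketch; `samples`, `qc`, and `normalize` are hypothetical placeholders):

```python
口 = Step(step_num=1, name="Per-sample processing", purpose="Iterate over input samples")
for sample in samples:
    口 = Step(step_num=1.1, name="QC", purpose="Quality-control a single sample")
    qc(sample)

    口 = Step(step_num=1.2, name="Normalize", purpose="Normalize a single sample")
    normalize(sample)
```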

## CLI

```bash
dflow init                        # Initialize .dflow/ directory
dflow build [paths...]            # Scan + resolve references (the main command)
dflow list                        # List discovered workflows
dflow export [workflows...] -o .  # Export to HTML (requires sphinx-dflow-ext)
dflow scan [paths...]             # Scan only (no resolve step)
dflow assemble                    # Resolve AutoStep references only
dflow validate [files...]         # Validate annotations
```

Common flags: `-v` (verbose), `-d` (debug), `--strict`, `-r` (project root).

## Database

All annotation data lives in `.dflow/workflow.db` (SQLite). Key tables:

| Table | Contents |
|---|---|
| `modules` | Scanned source files |
| `functions` | Decorated functions with purpose, inputs, outputs |
| `steps` | Step/AutoStep markers within functions |
| `workflow_entries` | Top-level `@workflow` entry points |
| `classes` | `@test_suite` class metadata |
| `coverage_links` | `covers=[...]` links from tests to production functions |
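
Since it's plain SQLite, you can explore the database with nothing but the standard library. A sketch (table names come from the list above; the `role` column on `functions` is inferred from the Testing section and may differ):

```python
import sqlite3

conn = sqlite3.connect(".dflow/workflow.db")

# Confirm which tables exist before querying anything else
print([name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")])

# Example: count recorded functions per role (production vs. test)
for role, count in conn.execute(
        "SELECT role, COUNT(*) FROM functions GROUP BY role"):
    print(role, count)
```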

## License

MIT
