Metadata-Version: 2.4
Name: workforce
Version: 1.1.14
Summary: Run bash commands with python multiprocessing. Includes a Tkinter GUI for workflow editing.
Author-email: Theo Portlock <zn.tportlock@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/theoportlock/workforce
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.6
Description-Content-Type: text/x-rst
License-File: LICENSE
License-File: AUTHORS.rst
Requires-Dist: networkx
Requires-Dist: flask
Requires-Dist: flask-socketio
Requires-Dist: python-socketio
Requires-Dist: websocket-client
Requires-Dist: requests
Requires-Dist: platformdirs
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Dynamic: license-file

=========
workforce
=========

.. image:: https://img.shields.io/pypi/v/workforce.svg
    :target: https://pypi.python.org/pypi/workforce

.. image:: https://readthedocs.org/projects/workforce/badge/?version=latest
    :target: https://workforce.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

.. image:: docs/images/small.png
    :alt: Small pipeline example
    :align: center
    :width: 800px

.. image:: docs/images/complex.png
    :alt: Complex pipeline editor view
    :align: center
    :width: 800px

Workforce is an application that creates and runs bash commands in the order defined by a graph. It serves as a desktop for terminals, letting you build and run pipelines of bash commands with Python multiprocessing according to a GraphML file.

Workforce is similar to other workflow management systems such as Galaxy workflows, QIIME plugin workflows, AnADAMA2, Snakemake, Nextflow, and Make, but it is designed with multiuser support and a graphical interface for workflow editing.

* Free software: MIT license
* Documentation: https://workforce-documentation.readthedocs.io.

Features
--------

* **Graph-based workflow execution**: Define bash commands as nodes in a directed graph
* **Multiuser support**: Multiple clients can interact with the same workflow simultaneously
* **Server-based architecture**: Workflows are served via a Flask API with unique URLs
* **Event-driven execution**: Dependency-aware scheduling with real-time status updates
* **Flexible edge types**: Use blocking edges for strict dependencies or non-blocking edges for flexible triggering and re-execution
* **Subset execution**: Run specific subgraphs or the entire workflow
* **Resume capability**: Restart failed nodes and continue pipeline execution
* **Interactive GUI**: Edit workflows visually with a Tkinter-based interface
* **Flexible command wrapping**: Add prefixes/suffixes to commands (Docker, SSH, tmux, etc.)

Architecture Overview
---------------------

Server
~~~~~~

The server component provides a single machine-wide instance that manages multiple workspace contexts:

**Server Startup**: Starting a server from the CLI (``python -m workforce server start``) does the following:

1. Checks whether a server is already running via health check discovery (ports 5000-5100)
2. If one is found, informs the user and exits (enforcing a singleton per machine)
3. If none is found, discovers a free port and starts the Flask + Socket.IO server
4. Waits for clients to connect and creates workspace contexts on demand
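
As a rough illustration of the free-port step, a scan over the documented port range can be sketched with the standard library. The name ``find_free_port`` and the bind-and-release approach are assumptions made for this example and are not necessarily how workforce implements it; the health check step is not shown.

.. code-block:: python

   import socket

   # Ports 5000-5100 inclusive, as stated in the startup steps.
   PORT_RANGE = range(5000, 5101)


   def find_free_port(ports=PORT_RANGE, host="127.0.0.1"):
       """Return the first port in ``ports`` that can be bound, or None."""
       for port in ports:
           with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
               try:
                   sock.bind((host, port))
               except OSError:
                   continue  # port is in use; try the next one
               return port  # socket is closed on exit, freeing the port
       return None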

**Workspace Management**:

- Each workfile gets a deterministic workspace ID (SHA256 hash of absolute path)
- Server maintains isolated ServerContext objects per workspace with:

  - Dedicated modification queue for serialized graph operations
  - Per-workspace event bus for domain events
  - Worker thread for processing queued mutations
  - Socket.IO room for event isolation

- Contexts are created on the first client connect and destroyed on the last disconnect
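
A deterministic ID of this kind can be sketched as follows. The function name ``workspace_id`` is hypothetical, and the path normalisation, text encoding, and use of the full hex digest are assumptions; workforce may differ on any of them.

.. code-block:: python

   import hashlib
   import os


   def workspace_id(workfile_path):
       """Hash the absolute path so the same file always maps to the same ID."""
       absolute = os.path.abspath(workfile_path)
       # Assumption: UTF-8 encoding and the untruncated hex digest.
       return hashlib.sha256(absolute.encode("utf-8")).hexdigest()

Because the hash input is the absolute path, two clients that open the same file through different relative paths end up with the same ID.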

**Server Operations**:

- Accepts workspace-scoped requests at ``/workspace/{workspace_id}/...`` endpoints
- Edit API: Modify workflow structure (add/remove nodes, edges, statuses)
- Run API: Initiate workflow execution with arguments:

  - ``nodes``: Specific nodes to execute as subset
  - ``wrapper``: Command prefix/suffix wrapper

- Status updates propagate via Socket.IO room-based events

**Server Shutdown**: When idle (no active clients or runs):

- The server shuts down automatically after a brief idle period
- Contexts are destroyed on the last client disconnect
- The next client connection automatically starts a new server instance

Unified Execution Model
~~~~~~~~~~~~~~~~~~~~~~~~

The system employs a unified execution model where every run is treated as a subset run:

**Node Selection**:

- If specific nodes are selected (via CLI or GUI), those nodes form an induced subgraph for execution
- If no nodes are explicitly selected:

  - The system first checks for failed nodes and selects them for re-execution
  - If there are no failed nodes, nodes with zero in-degree in the full workflow are selected
  - This means by default, the entire workflow is treated as the active subset
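
The selection rules above can be sketched in plain Python. The function name and the data shapes (a ``dict`` of node statuses and a list of ``(source, target)`` edge tuples) are assumptions for illustration, not workforce's internal representation.

.. code-block:: python

   def select_start_nodes(nodes, edges, selected=None):
       """Pick the nodes a run starts from.

       nodes: dict mapping node name -> status string
       edges: iterable of (source, target) tuples
       selected: optional explicit selection from the CLI or GUI
       """
       if selected:
           return set(selected)  # an explicit selection always wins
       failed = {name for name, status in nodes.items() if status == "failed"}
       if failed:
           return failed  # re-run failed nodes first
       targets = {target for _, target in edges}
       # Zero in-degree: nodes that are never the target of an edge.
       return {name for name in nodes if name not in targets}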

**Execution Initialization**: Upon initialization, the scheduler:

1. Identifies all nodes within the target subset that have an in-degree of zero relative only to that subset
2. Transitions these nodes to a "run" state
3. Ensures nodes start immediately if their dependencies in the master workfile are omitted from the current run scope

**Subgraph Boundaries**: To prevent execution from bleeding into the rest of the workfile:

- The scheduler strictly enforces subnetwork boundaries
- Propagation is confined entirely to the active selection
- When a node completes, only outgoing edges within the filtered subnetwork are evaluated
- Edges leading to nodes outside the original subset are ignored, effectively "capping" the execution
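
A minimal sketch of the boundary rule, assuming edges are ``(source, target)`` tuples and the subset is a set of node names (both hypothetical shapes chosen for this example):

.. code-block:: python

   def subset_edges(edges, subset):
       """Keep only edges whose source and target are both in the subset."""
       return [(s, t) for s, t in edges if s in subset and t in subset]


   def subset_roots(edges, subset):
       """Nodes with zero in-degree relative to the subset only."""
       has_parent = {t for _, t in subset_edges(edges, subset)}
       return sorted(n for n in subset if n not in has_parent)

For a chain ``a -> b -> c -> d`` with the subset ``{"b", "c"}``, only the edge ``b -> c`` survives the filter. Node ``b`` becomes a root even though it depends on ``a`` in the full workfile, and the edge ``c -> d`` is dropped, so execution stops at ``c``.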

Execution Loop and Dependency Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Node Execution**:

1. When a node runs, its stdout and stderr are captured as node attributes
2. These outputs are viewable from the GUI (with the 'l' shortcut key)
3. Upon successful completion, an event is emitted to the run request
4. Each event is tagged with a client ID, allowing multiple concurrent runs and GUI clients to operate without interference

**Scheduler Operations**:

1. The emission triggers the scheduler to retrieve the filtered subnetwork map
2. All valid outgoing edges (within the subnetwork) are updated to a ``to_run`` status
3. An edge-status change event is broadcast

**Dependency Checking**:

1. The status change prompts the target node to perform a dependency check
2. The node transitions to the ``run`` state only if ALL incoming edges (within the subnetwork context) are marked as ``to_run``
3. Once this condition is satisfied:

   - The node clears the statuses from those incoming edges
   - Begins execution
   - Loops back to the capture and emission phase

This mechanism ensures the engine only advances when subset-specific dependencies are fully met.
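
The loop can be simulated in a few lines to show the order in which nodes become runnable. This is a single-threaded sketch under simplifying assumptions: every node succeeds, all edges are blocking, and the name ``run_order`` and the tuple-based edge representation are invented for the example.

.. code-block:: python

   from collections import deque


   def run_order(edges, subset):
       """Return the order in which nodes of ``subset`` would start."""
       inner = [(s, t) for s, t in edges if s in subset and t in subset]
       incoming = {n: [e for e in inner if e[1] == n] for n in subset}
       outgoing = {n: [e for e in inner if e[0] == n] for n in subset}
       edge_status = {e: "" for e in inner}

       # Roots: zero in-degree relative to the subset.
       queue = deque(sorted(n for n in subset if not incoming[n]))
       order = []
       while queue:
           node = queue.popleft()
           order.append(node)  # stands in for running the command
           for edge in outgoing[node]:
               edge_status[edge] = "to_run"
               target = edge[1]
               # Dependency check: every incoming edge must be to_run.
               if all(edge_status[e] == "to_run" for e in incoming[target]):
                   for e in incoming[target]:
                       edge_status[e] = ""  # clear before the node runs
                   queue.append(target)
       return order

With edges ``a -> c``, ``b -> c``, and ``c -> d``, node ``c`` starts only after both ``a`` and ``b`` have finished. If the subset is restricted to ``{"a", "c"}``, the edge from ``b`` falls outside the scope and ``c`` starts as soon as ``a`` completes.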

Resume Logic
~~~~~~~~~~~~

The resume functionality (Shift+R in GUI) handles failures or cancellations:

- Replaces a node's ``failed`` status with ``run``
- Re-triggers the event loop, which causes the scheduler to re-check dependencies and queue the node for execution
- Allows the remainder of the pipeline to proceed through the normal dependency checking process
- Strictly bounded by the subset; resume never propagates to nodes outside the original selection
- Ensures nodes do not remain in a running state indefinitely

By ensuring clean status management and ignoring edges outside the active scope, the system guarantees a clean termination once the selected subgraph is exhausted.

Installation
------------
Install workforce from PyPI with pip:

.. code-block:: bash

   pip install workforce

Building a workforce workflow
-----------------------------
To launch the pipeline editor, run:

.. code-block:: bash

   wf

or:

.. code-block:: bash

   python -m workforce

To open a previously constructed pipeline, run:

.. code-block:: bash

   wf <PIPELINE.graphml>
    
If a ``Workfile`` is in the current directory, running ``wf`` with no arguments opens it:

.. code-block:: bash

   wf

Running a workforce plan
------------------------
To run a plan from the GUI, click the 'Run' button or press 'r'. If nodes are selected, execution starts from those nodes. Otherwise, the full pipeline is executed. To run a plan from the CLI:

.. code-block:: bash

   wf run Workfile

Prefix and Suffix
-----------------
Passing one of the following wrappers to the ``wf run`` command (or setting it in the GUI) applies that prefix and suffix to every command run by the pipeline. The ``{}`` placeholder marks where each command is inserted.

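+-------------------------------------------------+---------------------------------------------------------------------------+
| Wrapper Command                                 | Description                                                               |
+=================================================+===========================================================================+
| --wrapper 'bash -c "{}"'                        | Standard bash execution                                                   |
+-------------------------------------------------+---------------------------------------------------------------------------+
| --wrapper 'bash -c ". env.sh && {}"'            | Sources env.sh (config or environment settings) before each command.      |
+-------------------------------------------------+---------------------------------------------------------------------------+
| --wrapper 'tmux send-keys {} C-m'               | Sends each command to a tmux session and executes it.                     |
+-------------------------------------------------+---------------------------------------------------------------------------+
| --wrapper 'ssh ADDRESS {}'                      | Executes each command remotely on the specified server.                   |
+-------------------------------------------------+---------------------------------------------------------------------------+
| --wrapper 'parallel {} ::: FILENAMES'           | Runs the pipeline on each specified filename.                             |
+-------------------------------------------------+---------------------------------------------------------------------------+
| --wrapper 'docker run -it IMAGE {}'             | Executes each command inside a Docker container with an interactive TTY.  |
+-------------------------------------------------+---------------------------------------------------------------------------+
| --wrapper 'echo {} >> commands.sh'              | Exports pipeline commands to a bash script named commands.sh.             |
+-------------------------------------------------+---------------------------------------------------------------------------+
| --wrapper 'bash -lc "conda activate ENV && {}"' | Activates a Conda environment before executing the command.               |
+-------------------------------------------------+---------------------------------------------------------------------------+
| --wrapper 'nohup {} &'                          | Runs commands in the background.                                          |
+-------------------------------------------------+---------------------------------------------------------------------------+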
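
The effect of a wrapper on a single command can be pictured as plain placeholder substitution. This is only a sketch: the function name ``wrap_command`` is hypothetical, and workforce's actual substitution (for example, how it handles quoting or escaping) may differ.

.. code-block:: python

   def wrap_command(wrapper, command):
       """Insert ``command`` at the ``{}`` placeholder of ``wrapper``."""
       # Assumption: a literal text replacement with no extra quoting.
       return wrapper.replace("{}", command)

Under this assumption, the wrapper ``ssh ADDRESS {}`` applied to the node command ``ls -l`` yields ``ssh ADDRESS ls -l``.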

To run specific processes from the editor, select them and click the 'Run' button (or press 'r'). If no processes are selected, the entire pipeline runs. To see the output of the commands, open the terminal with the 't' shortcut or from the toolbar.

Workforce has been tested on macOS, Linux, and Windows (PowerShell and WSL2).
