Metadata-Version: 2.4
Name: GitAtbx
Version: 0.1.1
Summary: Git-integrated CLI toolkit for scaffolding and running Nextflow-based data analysis pipelines
Author: Cagatay Özcan Jagiello Gutt
License-Expression: MIT
Project-URL: Repository, https://github.com/CagatayGutt/AnalysisToolbox
Keywords: nextflow,pipeline,data-analysis,neuroscience,cli
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.24
Requires-Dist: scipy>=1.10
Requires-Dist: polars>=0.20
Requires-Dist: pyarrow>=14
Requires-Dist: mne>=1.6
Requires-Dist: mne-connectivity>=0.7
Requires-Dist: mne-nirs>=0.6
Requires-Dist: pyxdf>=1.16
Requires-Dist: statsmodels>=0.14
Requires-Dist: pingouin>=0.5
Requires-Dist: neurokit2>=0.2.7

# AnalysisToolbox

A modular framework for automated data processing and statistical analysis pipelines. Built on **Nextflow** for scalable, reproducible workflows with automatic result synchronization.

## Overview

The AnalysisToolbox provides infrastructure for building data processing pipelines that:

- **Process multiple datasets** in parallel with automatic participant discovery
- **Handle diverse data types** through a generic reader/processor/analyzer architecture
- **Track progress** via per-participant logging visible live in the web UI
- **Recover gracefully** from failures without losing completed work

The framework is domain-agnostic — modules follow simple input/output conventions (Parquet/FIF files) and can implement any processing logic.

## Repository Structure

```
AnalysisToolbox/
├── gitatbx/               # pip-installable package (installed via pip install GitAtbx)
│   ├── bin/               # workflow_wrapper.nf, log_to_parquet.py, nextflow.config, ...
│   ├── modules/           # analyzers/, processors/, readers/, utils/
│   ├── templates/         # workflow_template.nf, modules_template.nf, parameters_template.config
│   └── utils/             # serve_html.ps1, reinject.sh, result_collector.py
├── pyproject.toml
└── README.md
```

On first run `gitatbx` creates a **symlink** `~/Documents/GitAtbxModules` → `<site-packages>/gitatbx/modules/`. The symlink always reflects the live installed version — upgrading via `pip install --upgrade GitAtbx` automatically shows updated modules through the same symlink path.
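The mechanism can be sketched in a few lines of Python (illustrative only; `link_modules` and its parameters are not part of the package's API):

```python
from pathlib import Path

def link_modules(installed_modules: Path, link_path: Path) -> Path:
    """Create or refresh a symlink so `link_path` always points at the
    modules directory of the currently installed package."""
    link_path.parent.mkdir(parents=True, exist_ok=True)
    if link_path.is_symlink() or link_path.exists():
        link_path.unlink()  # replace a stale link from a previous install
    link_path.symlink_to(installed_modules, target_is_directory=True)
    return link_path
```

Because the link target is resolved on every access, files replaced under `<site-packages>` by an upgrade are immediately visible through the same path.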

## Prerequisites

### Java Runtime (required by Nextflow)

```bash
sudo apt update && sudo apt install default-jre
java -version
```

### Nextflow

```bash
curl -s https://get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
nextflow -version
```

## Install

```bash
pip install GitAtbx
```

Python dependencies (`numpy`, `scipy`, `polars`, `mne`, `neurokit2`, …) are installed automatically.

On first run, the path of the `~/Documents/GitAtbxModules` symlink is saved to `~/.gitatbx_config`.

## Usage

### Commands

| Command | What it does |
|---|---|
| `gitatbx init <dir>` | Scaffold a new analysis project |
| `gitatbx run [pattern]` | Find and run a pipeline by project name pattern |
| `gitatbx serve [--dir DIR] [--port PORT]` | Serve results HTML locally in browser |
| `gitatbx reinject <PID> [options]` | Reinject a corrected output for one participant |
| `gitatbx move <dest>` | Move the deployed modules folder to a new location |
| `gitatbx config show` | Print current configuration (`~/.gitatbx_config`) |

### `gitatbx init`

Prompts for project name, raw data directory, Python executable, toolbox path, git author identity, and an optional GitHub remote URL for the results repo. Then creates:

```
<dir>/
├── {name}_analysis/
│   ├── {name}_pipeline.nf       ← edit your workflow here
│   ├── {name}_modules.nf        ← add IOInterface includes here
│   └── {name}_parameters.config ← pre-filled paths, params, and git identity
└── {name}_results/
    └── .git/                    ← initialised + remote added (if URL provided)
```

Git author name and email default to your global `git config` values if already set. The remote URL is validated immediately with `git ls-remote` — if authentication fails (e.g. SSH key not yet added to GitHub), a warning is printed with a link to the GitHub SSH setup guide.

The pipeline uses the stamped `params.git_user_name` / `params.git_user_email` as the commit author for all automatic result syncs.
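The defaulting behaviour can be approximated with a small helper (a sketch; `git_global` is not the package's actual code):

```python
import subprocess

def git_global(key: str, fallback: str = "") -> str:
    """Return a global git config value, or a fallback when unset."""
    try:
        out = subprocess.run(
            ["git", "config", "--global", "--get", key],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return fallback

# The init prompt could pre-fill its defaults from values like these:
author = git_global("user.name")
email = git_global("user.email")
```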

### `gitatbx run`

GitAtbx searches the accessible filesystem (your home directory, and on Windows all drives) for a directory named `{name}_analysis` containing a `*_pipeline.nf`, then runs the pipeline automatically. There is no need to `cd` into the project first.

```bash
gitatbx run {name} --resume   # continue a previous run
gitatbx run                   # no pattern: use the current directory
```

Found paths are cached in `~/.gitatbx_config` so subsequent calls are instant.

### `gitatbx serve`

Starts a local HTTP server to browse results HTML generated by the pipeline.

```bash
gitatbx serve --dir ../EV_results --port 8080
```

### `gitatbx reinject`

Places a corrected Parquet file into `corrections/<script_name>/`, marks the participant for replay, invalidates the relevant Nextflow cache entries, and resumes the pipeline for that participant only.

```bash
gitatbx reinject EV_002 --corrected-file fixed.parquet --script-name filtering_processor
```

### `gitatbx move`

To move the deployed modules folder to a custom location:

```bash
gitatbx move /mnt/d/repoShaggy/GitAtbxModules
```

`gitatbx` moves the folder and updates `~/.gitatbx_config` automatically. `gitatbx init` will then default `toolbox_dir` to that path when scaffolding new projects.

## Key Components

### `bin/workflow_wrapper.nf`

Discovers participant directories, manages per-participant output folders, runs `log_to_parquet.py` and `interactive_plotter.py` automatically, and handles per-participant git sync on completion.

### `IOInterface`

Generic Nextflow process that runs any script (reader/processor/analyzer) with automatic logging. Every module in `modules/` is called through `IOInterface`.
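Conceptually, each `IOInterface` invocation boils down to running a script with positional arguments and treating a non-zero exit code as failure. A minimal Python sketch of that contract (`run_module` is illustrative; the real process also captures per-participant logs):

```python
import subprocess
import sys

def run_module(script: str, *args: str) -> int:
    """Run a module script with positional CLI arguments and
    propagate any non-zero exit code as an error."""
    result = subprocess.run(
        [sys.executable, script, *args],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{script} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.returncode
```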

### Modules

The `modules/` folder is a curated but non-exhaustive starting collection of commonly useful scripts. On first run, `gitatbx` makes the full collection available at `~/Documents/GitAtbxModules` (Windows) or `~/GitAtbxModules` (Linux/macOS); you can relocate that folder at any time with `gitatbx move`.

The local collection is yours to extend: add domain-specific scripts, modify existing ones, or organise them into subfolders. Any script there can be called via `IOInterface` exactly like the built-in modules, as long as it follows the same convention: positional CLI arguments in, Parquet or FIF outputs, and a non-zero exit code on failure.
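A minimal module following that convention might look like this (the column name `signal`, the z-score step, and the script name are all illustrative, not prescribed by the framework):

```python
# example_processor.py — hypothetical module: positional CLI arguments in,
# Parquet out, non-zero exit on failure.
import sys
from pathlib import Path

def zscore(values: list[float]) -> list[float]:
    """Example processing step: z-score a list of samples."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    sd = var ** 0.5 or 1.0  # guard against constant input
    return [(v - mean) / sd for v in values]

def main() -> int:
    if len(sys.argv) != 3:
        print("usage: example_processor.py <input.parquet> <output.parquet>",
              file=sys.stderr)
        return 1
    import polars as pl  # a declared dependency of GitAtbx
    df = pl.read_parquet(sys.argv[1])
    df = df.with_columns(pl.Series("signal_z", zscore(df["signal"].to_list())))
    df.write_parquet(sys.argv[2])
    return 0

# A real script would end with:
# if __name__ == "__main__":
#     raise SystemExit(main())
```

`IOInterface` could then invoke it with the two positional paths, exactly as it does for the built-in modules.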

## Authors

**Cagatay Özcan Jagiello Gutt** — Lead Developer
ORCID: https://orcid.org/0000-0002-1774-532X
