Metadata-Version: 2.4
Name: GEMspa-CLI
Version: 2.0.2
Summary: GEMspa: Advanced single-particle tracking analysis pipeline with MSD, diffusion analysis, and visualization tools
Author-email: "Andrew Bazley, Sarah Keegan" <Andrew.bazley@nyulangone.org>
License: MIT
Keywords: single-particle tracking,MSD,diffusion,trackmate,microscopy,analysis,SPT,SMT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Physics
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<2.0.0,>=1.21.0
Requires-Dist: pandas<3.0.0,>=1.3.0
Requires-Dist: scipy<2.0.0,>=1.7.0
Requires-Dist: matplotlib<4.0.0,>=3.4.0
Requires-Dist: seaborn<1.3.0,>=0.11.0
Requires-Dist: scikit-image<0.22.0,>=0.19.0
Requires-Dist: tifffile<2025.0.0,>=2020.12.8
Requires-Dist: nd2reader<4.0.0,>=3.0.0
Requires-Dist: joblib<2.0.0,>=1.2.0
Requires-Dist: numba<0.61.0,>=0.55.0
Dynamic: license-file

# GEMspa-CLI: Single-Particle Tracking Analysis (v2.0.2)

**Advanced, modular single-particle tracking (SPT) and ensemble diffusion analysis for microscopy data.**

---

## Overview

`gemspa-cli` is a command-line interface for **GEMspa**, a modular single-particle tracking and diffusion analysis suite. It performs trajectory extraction, per-track MSD fitting, ensemble averaging, step-size statistics, and condition-wise comparisons from single-molecule tracking data.

### New in v2.0.0

- **Condition-grouped metrics** — automatically generates split CSV files per condition
- **Steps and tracks analysis** — step-size vs. brightness heatmaps and colored track overlays
- **Flexible condition extraction** — strips date codes and handles arbitrary dataset naming conventions
- **Unified filtering system** — consistent D and α filtering across all pipeline stages
- **Enhanced CLI** — new flags for advanced visualization and analysis

### Features

- **Advanced group analysis** with automatic metrics splitting by condition
- **TrackMate cleaning utility** (`--clean-trackmate`)
- **Unified global filtering** of D and α applied consistently across the entire pipeline
- **Step-size vs. brightness heatmaps** with customizable binning and color scaling
- **Track overlays** colored by step size for visual quality control
- **Cross-condition comparison plots** with KS-test statistical annotations

---

## Installation

### From PyPI (recommended)

```bash
pip install GEMspa-CLI
```

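### Virtual environment (optional but recommended)

To keep dependencies isolated, create and activate the environment first, then run `pip install GEMspa-CLI` inside it.
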
```bash
python3 -m venv ~/venvs/gemspa
source ~/venvs/gemspa/bin/activate
```

Windows PowerShell:

```powershell
python -m venv "$env:USERPROFILE\venvs\gemspa"
& "$env:USERPROFILE\venvs\gemspa\Scripts\Activate.ps1"
```

---

## Input File Formats

GEMspa-CLI accepts trajectory data in CSV format.

### TrackMate export format
- **File pattern**: `*Spots in tracks*.csv` (pass with `--csv-pattern "*Spots in tracks*.csv"`)
- **Required columns**: `POSITION_X`, `POSITION_Y`, `FRAME`, `TRACK_ID`
- **Optional columns**: `MEAN_INTENSITY_CH1` (for brightness analysis)

### GEMspa format
- **File pattern**: `Traj_*.csv` (default)
- **Required columns**: `x`, `y`, `frame`, `track_id`
- **Optional columns**: `brightness` (for step-size analysis)
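
The two layouts carry the same required information under different headers. The following is a minimal sketch of that header mapping using only the standard library; it is not the package's cleaner. The function name `to_gemspa_rows` is hypothetical, and treating `MEAN_INTENSITY_CH1` as the counterpart of `brightness` is an assumption. The sketch does not handle any additional header rows an export may contain.

```python
import csv
import io

# Header mapping taken from the two column lists above. Pairing
# MEAN_INTENSITY_CH1 with `brightness` is an assumption of this sketch.
TRACKMATE_TO_GEMSPA = {
    "POSITION_X": "x",
    "POSITION_Y": "y",
    "FRAME": "frame",
    "TRACK_ID": "track_id",
    "MEAN_INTENSITY_CH1": "brightness",
}
REQUIRED = ("x", "y", "frame", "track_id")


def to_gemspa_rows(csv_text):
    """Rename TrackMate headers to GEMspa headers and keep the known columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for raw in reader:
        row = {TRACKMATE_TO_GEMSPA.get(k, k): v for k, v in raw.items()}
        missing = [c for c in REQUIRED if c not in row]
        if missing:
            raise ValueError(f"missing required columns: {missing}")
        keep = REQUIRED + ("brightness",)
        rows.append({c: row[c] for c in keep if c in row})
    return rows
```

Values are left as strings here; converting them to numbers is a separate step.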

### Naming conventions

GEMspa-CLI automatically strips date codes and replicate suffixes to extract condition labels:

- `Traj_20220706_G12V_4.csv` → condition: `G12V`
- `Traj_20220708_G12V_13.csv` → condition: `G12V` (correctly pooled with the above)
- `Traj_20220706_HKWT_2.csv` → condition: `HKWT`
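
The examples above can be reproduced with a small parser. This is a sketch under an assumed convention (`Traj_<date>_<condition>_<replicate>.csv`, with a 6- or 8-digit date and an integer replicate suffix); the package's actual rules may be more permissive. The helper name `extract_condition` is hypothetical.

```python
import re
from pathlib import Path

# Assumption: date codes are exactly 6 (YYMMDD) or 8 (YYYYMMDD) digits.
_DATE = re.compile(r"^\d{6}(\d{2})?$")


def extract_condition(filename):
    """Return the condition label embedded in a trajectory filename."""
    tokens = Path(filename).stem.split("_")
    if tokens and tokens[0].lower() == "traj":
        tokens = tokens[1:]                              # drop the "Traj" prefix
    tokens = [t for t in tokens if not _DATE.match(t)]   # drop date codes
    if len(tokens) > 1 and tokens[-1].isdigit():
        tokens = tokens[:-1]                             # drop the replicate suffix
    return "_".join(tokens)
```

With this sketch, `Traj_20220706_G12V_4.csv` and `Traj_20220708_G12V_13.csv` both map to `G12V`, so the two replicates are pooled.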

---

## Pipeline Stages

### 1. Data discovery and preparation
- Scans the working directory for trajectory CSV files
- Optionally cleans TrackMate exports to GEMspa column format
- Groups files by condition, removing date codes automatically

### 2. Per-replicate analysis
- Loads each trajectory file and computes per-track MSD curves
- Fits diffusion parameters (D, α, r²) for every track
- Writes per-replicate result tables and diagnostic plots

### 3. Ensemble analysis
- Pools tracks by condition (raw and filtered variants)
- Computes ensemble-averaged MSD curves per condition
- Applies global filter parameters consistently

### 4. Advanced group analysis (default: ON)
- Computes VACF, confinement index, convex-hull area, and tortuosity
- Splits all metrics by condition into separate CSV files
- Generates comprehensive condition-level visualizations
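
Two of these geometric metrics can be illustrated in a few lines. The sketch below computes a convex-hull area with Andrew's monotone chain and the shoelace formula, and a tortuosity defined as path length divided by net displacement. That tortuosity definition is an assumption made for illustration, since the definition used by the package is not stated here, and the function names are hypothetical.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_area(points):
    """Area enclosed by the convex hull of the track positions (shoelace formula)."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    twice_area = sum(
        x0 * y1 - x1 * y0
        for (x0, y0), (x1, y1) in zip(hull, hull[1:] + hull[:1])
    )
    return abs(twice_area) / 2.0


def tortuosity(points):
    """Assumed definition: total path length / straight-line net displacement."""
    path = sum(
        math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(points, points[1:])
    )
    net = math.hypot(points[-1][0] - points[0][0], points[-1][1] - points[0][1])
    return path / net if net > 0 else math.inf
```

Positions given in µm yield an area in µm², matching the `hull_area_um2` column name.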

### 5. Steps and tracks analysis (optional: `--steps-tracks`)
- Produces step-size vs. brightness heatmaps
- Generates track overlays colored by step size
- Supports both per-file and pooled outputs

### 6. Cross-condition comparison
- KS tests and boxplots comparing conditions
- Publication-ready figures saved to `comparison/`

---

## Workflow Overview

```
SMT data export  -->  gemspa-cli pipeline
        |
        +-- (optional) --clean-trackmate
        |        --> standardized Traj_*.csv
        |
        +-- Per-replicate trajectory analysis
        |        --> D_fit, alpha_fit, r2, MSD curves, rainbow overlay
        |
        +-- Ensemble pooling by condition
        |        --> grouped_raw/ and grouped_filtered/
        |
        +-- (optional) --step-size-analysis
        |        --> KDEs, non-Gaussian (alpha2) stats, KS tests
        |
        +-- (optional) --steps-tracks
        |        --> brightness_stepsize/   (heatmaps)
        |        --> tracks_stepsize_map/   (track overlays)
        |
        +-- Advanced group analysis (default: ON)
        |        --> grouped_advanced_analysis/
        |        --> split_metrics_by_condition/
        |
        +-- Cross-condition comparison
                 --> comparison/*.png
```

---

## Core Analysis

### 1. Mean-square displacement (MSD)

For a trajectory of N frames, the time-averaged MSD at a lag of n frames (tau = n x dt) is:

    MSD(tau) = < (x[i+n] - x[i])^2 + (y[i+n] - y[i])^2 >_i

where dt is the frame interval set by `--time-step`, and the average is taken over all valid frame pairs (i, i+n), of which a gap-free track has N - n.
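
The definition can be written directly as a pure-Python sketch. It assumes a gap-free track with one position per frame; the function names are hypothetical, and the package's own implementation may differ (for example, in how it handles missing frames).

```python
def time_averaged_msd(x, y, max_lag):
    """Return [MSD(1), ..., MSD(max_lag)] with lags counted in frames."""
    n_frames = len(x)
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [
            (x[i + lag] - x[i]) ** 2 + (y[i + lag] - y[i]) ** 2
            for i in range(n_frames - lag)   # all valid start frames i
        ]
        if not squared:                      # lag exceeds the track length
            break
        msd.append(sum(squared) / len(squared))
    return msd


def lag_times(n_lags, dt):
    """Convert frame lags 1..n_lags to seconds using the frame interval dt."""
    return [k * dt for k in range(1, n_lags + 1)]
```

For a particle moving one unit per frame along x, the MSD at a lag of n frames is n², which makes a convenient sanity check.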

---

### 2. Diffusion coefficient (D)

Estimated from a linear fit to the early MSD regime (lags up to `--tlag-cutoff` frames), using the relation for two-dimensional diffusion:

    MSD(tau) ~ 4 * D * tau   =>   D = (1/4) * d(MSD)/d(tau)

Units: µm²/s
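
As an illustration, D can be recovered from the slope of an ordinary least-squares line through the (tau, MSD) points. Fitting with a free intercept is an assumption of this sketch; whether the package constrains the intercept is not stated here. The function name is hypothetical.

```python
def fit_diffusion_coefficient(tau, msd):
    """Least-squares slope of MSD vs. tau, divided by 4 (two-dimensional case)."""
    n = len(tau)
    mean_t = sum(tau) / n
    mean_m = sum(msd) / n
    sxx = sum((t - mean_t) ** 2 for t in tau)
    sxy = sum((t - mean_t) * (m - mean_m) for t, m in zip(tau, msd))
    slope = sxy / sxx            # d(MSD)/d(tau)
    return slope / 4.0
```

With tau in seconds and MSD in µm², the result is in µm²/s. A constant offset in the MSD (for example, from localization error) shifts the intercept but leaves the fitted slope unchanged.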

---

### 3. Anomalous exponent (alpha)

Extracted from the log-log slope of MSD vs. tau:

    log10[ MSD(tau) ] = alpha * log10(tau) + log10(4D)

- alpha ≈ 1: normal (Brownian) diffusion
- alpha < 1: subdiffusive (confined or hindered)
- alpha > 1: superdiffusive (directed or active)
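
The log-log relation can be fitted with the same least-squares approach applied to log10(tau) and log10(MSD). This is a sketch with a hypothetical function name; it returns the slope as alpha and recovers D from the intercept log10(4D).

```python
import math


def fit_anomalous_exponent(tau, msd):
    """Fit log10(MSD) = alpha * log10(tau) + log10(4D); return (alpha, D)."""
    lx = [math.log10(t) for t in tau]
    ly = [math.log10(m) for m in msd]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    alpha = sxy / sxx
    intercept = mean_y - alpha * mean_x
    return alpha, 10 ** intercept / 4.0
```

All MSD values and lags must be strictly positive for the logarithms to be defined.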

---

### 4. Non-Gaussian parameter (alpha2)

Quantifies deviation from Gaussian (Brownian) step-size statistics:

    alpha2 = <r^4> / (3 * <r^2>^2) - 1

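With the factor of 3, alpha2 = 0 when r is a Gaussian-distributed one-dimensional displacement (for example, the x- or y-component of a step); for two-dimensional radial step lengths the corresponding factor is 2. Positive values indicate heterogeneous or anomalous dynamics.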
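
A short numerical check of the formula, treating r as a one-dimensional displacement component (an assumption of this sketch). The synthetic data and the function name are illustrative only: a single Gaussian population is compared with an equal mixture of slow and fast Gaussian populations.

```python
import random


def non_gaussian_parameter(displacements):
    """alpha2 = <r^4> / (3 <r^2>^2) - 1 for one-dimensional displacements."""
    n = len(displacements)
    m2 = sum(d ** 2 for d in displacements) / n
    m4 = sum(d ** 4 for d in displacements) / n
    return m4 / (3.0 * m2 ** 2) - 1.0


rng = random.Random(0)  # fixed seed so the example is reproducible

# Homogeneous population: one Gaussian step distribution, alpha2 close to 0.
gaussian_steps = [rng.gauss(0.0, 0.1) for _ in range(200_000)]
a2_gauss = non_gaussian_parameter(gaussian_steps)

# Heterogeneous population: mix of slow and fast movers, alpha2 clearly positive.
mixed_steps = [
    rng.gauss(0.0, 0.05 if rng.random() < 0.5 else 0.2) for _ in range(200_000)
]
a2_mixed = non_gaussian_parameter(mixed_steps)
```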

---

### 5. Velocity autocorrelation function (VACF)

Used in advanced group analysis to assess directional persistence:

    VACF(k) = < v_i . v_{i+k} > / < v_i . v_i >

where v_i is the displacement vector at step i and k is the lag in frames.
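
A direct transcription of this definition for a single gap-free track, as a sketch with a hypothetical function name. The result is normalized so that VACF(0) = 1.

```python
def vacf(x, y, max_lag):
    """Return [VACF(0), ..., VACF(max_lag)] from frame-to-frame displacements."""
    vx = [x[i + 1] - x[i] for i in range(len(x) - 1)]
    vy = [y[i + 1] - y[i] for i in range(len(y) - 1)]
    norm = sum(a * a + b * b for a, b in zip(vx, vy)) / len(vx)   # <v_i . v_i>
    if norm == 0:
        raise ValueError("track has no motion; VACF is undefined")
    out = []
    for k in range(max_lag + 1):
        pairs = len(vx) - k
        if pairs <= 0:
            break
        dot = sum(vx[i] * vx[i + k] + vy[i] * vy[i + k] for i in range(pairs)) / pairs
        out.append(dot / norm)
    return out
```

Persistent motion in a straight line gives values of 1 at every lag, while a track that reverses direction at every step alternates between 1 and -1.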

---

## Usage

### Graphical interface

```bash
gemspa-gui
```

The GUI exposes all CLI parameters through organized input sections, includes real-time validation, and supports saving and loading parameter sets for reproducible analysis.

### Command-line interface

```bash
gemspa-cli -d /path/to/folder [options]
```

**Required**

- `-d, --work-dir` — directory containing trajectory CSV files

**Input discovery**

- `--csv-pattern` — glob for input CSVs (default: `Traj_*.csv`); for TrackMate use `"*Spots in tracks*.csv"`

**Acquisition and units**

- `--time-step` — seconds per frame
- `--micron-per-px` — pixel size in µm

**Track and fit constraints**

- `--min-track-len` — minimum frames per track
- `--tlag-cutoff` — maximum lag in frames used for MSD fitting

**Parallelism**

- `-j, --n-jobs` — parallel processes across replicates
- `--threads-per-rep` — threads per replicate

**Rainbow track overlays (optional)**

- `--rainbow-tracks` — enable D-colored track overlays
- `--img-prefix` — prefix for background TIFF images (e.g., `MAX_`)
- `--rainbow-min-D`, `--rainbow-max-D` — D range for colormap
- `--rainbow-colormap`, `--rainbow-scale`, `--rainbow-dpi`

**Ensemble filters (applied globally)**

- `--filter-D-min`, `--filter-D-max` — D bounds (µm²/s)
- `--filter-alpha-min`, `--filter-alpha-max` — alpha bounds
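
Conceptually, these four bounds define a single predicate applied to every fitted track. The sketch below assumes inclusive bounds and per-track records keyed by `D_fit` and `alpha_fit` (the column names used in `msd_results.csv`); the function names are hypothetical.

```python
def within(value, lower=None, upper=None):
    """True if value lies inside the optional, inclusive [lower, upper] bounds."""
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


def filter_tracks(tracks, d_min=None, d_max=None, alpha_min=None, alpha_max=None):
    """Keep tracks whose D_fit and alpha_fit both fall inside the given bounds."""
    return [
        t for t in tracks
        if within(t["D_fit"], d_min, d_max)
        and within(t["alpha_fit"], alpha_min, alpha_max)
    ]
```

Reusing one predicate in every stage is one way to keep the filtered ensemble, comparison plots, and advanced metrics consistent with one another.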

**Optional analyses**

- `--step-size-analysis` — step-size KDE and KS plots
- `--clean-trackmate` — run TrackMate CSV cleaner and exit
- `--no-advanced-group` — skip the advanced group analysis stage

**Steps and tracks analysis**

- `--steps-tracks` — enable step-size vs. brightness heatmaps and track overlays
- `--steps-tracks-mode {both,heatmaps,tracks}` — what to generate (default: `both`)
- `--stepsize-max` — max step size for plots and LUT in pixels (default: 3.0)
- `--bins-x`, `--bins-y` — heatmap bin counts (default: 150 each)
- `--count-cap` — count cap for heatmap color scale (default: 300)
- `--line-width` — line width for track overlays (default: 0.7)
- `--min-track-length` — minimum track length for overlays (default: 10)
- `--brightness-col` — brightness column name (default: `MEAN_INTENSITY_CH1`)
- `--invert-lut-tracks` — invert the colormap for track overlays
- `--strip-datecodes` — strip date codes from output filenames (default: true)
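
To show how the binning options interact, here is a pure-Python sketch of a capped two-dimensional histogram. Placing step size on the x-axis and brightness on the y-axis, and clipping counts at the cap, are assumptions of this sketch; the function name is hypothetical and the keyword defaults mirror the documented flag defaults.

```python
def step_brightness_histogram(steps, brightness, stepsize_max=3.0,
                              bins_x=150, bins_y=150, count_cap=300):
    """Bin (step size, brightness) pairs and clip each bin count at count_cap."""
    b_min, b_max = min(brightness), max(brightness)
    span = (b_max - b_min) or 1.0              # avoid division by zero
    counts = [[0] * bins_x for _ in range(bins_y)]
    for s, b in zip(steps, brightness):
        if not 0.0 <= s <= stepsize_max:       # steps beyond the maximum are dropped
            continue
        ix = min(int(s / stepsize_max * bins_x), bins_x - 1)
        iy = min(int((b - b_min) / span * bins_y), bins_y - 1)
        counts[iy][ix] += 1
    return [[min(c, count_cap) for c in row] for row in counts]
```

Capping the counts keeps a few very dense bins from compressing the color scale for the rest of the heatmap.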

---

## Outputs

### Per replicate (`<COND>_<REP>/`)

- `msd_results.csv` — per-track D_fit, alpha_fit, r2_fit
- `msd_vs_tau.png` — linear MSD vs. tau with D estimate
- `msd_vs_tau_loglog.png` — log-log MSD vs. tau with alpha slope
- `D_fit_distribution.png` — histogram of D (log x-axis)
- `alpha_vs_logD.png` — scatter of alpha vs. log10(D)
- `rainbow_tracks.png` — D-colored trajectory overlay (if enabled)

### Ensemble level

- `grouped_raw/` and `grouped_filtered/` subdirectories
- Ensemble-averaged MSD plots (`ensemble_msd_vs_tau_<COND>.png`)
- Step-size KDEs (`step_kde_<COND>_(ensemble).png` and filtered variants)

### Comparison (`comparison/`)

- `ensemble_filtered_D_histograms.png` — log-scale D distributions with KS annotation
- `ensemble_filtered_alpha_histograms.png` — alpha distributions with KS annotation
- `replicate_median_D_boxplot.png` — per-replicate median D by condition
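
The KS annotation is based on the largest gap between two empirical cumulative distributions. The sketch below computes only that two-sample statistic in pure Python. For a p-value, SciPy's `scipy.stats.ks_2samp` is the usual library routine; whether the package calls it internally is not stated here.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(v) - ECDF_b(v)|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for v in a + b:                       # the maximum occurs at an observed value
        ecdf_a = bisect_right(a, v) / len(a)
        ecdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d
```

The statistic ranges from 0 for identical samples to 1 for samples that do not overlap at all.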

### Advanced group analysis (`grouped_advanced_analysis/`)

Runs automatically unless `--no-advanced-group` is specified.

Per-track metrics table columns:

```
track_id, condition, D_fit, alpha_fit, r2_fit, vacf_lag1,
confinement_idx, hull_area_um2, tortuosity, n_frames
```

Plots: D_fit and alpha_fit box/violin plots by condition, VACF histograms and mean curves, convex-hull area vs. tortuosity scatterplots.

Split metrics (`split_metrics_by_condition/`) — one CSV per metric, conditions as columns:

- `D_fit.csv`, `alpha_fit.csv`, `r2_fit.csv`
- `vacf_lag1.csv`, `confinement_idx.csv`
- `hull_area_um2.csv`, `tortuosity.csv`, `n_frames.csv`
- `_index.csv` — parameter-to-file mapping
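
The wide layout (one column per condition) can be produced from the per-track table as sketched below. Padding shorter columns with empty cells and sorting conditions alphabetically are assumptions of this sketch, and `split_metric` is a hypothetical name.

```python
import csv
import io
from itertools import zip_longest


def split_metric(rows, metric):
    """Return CSV text for one metric, with one column per condition."""
    by_condition = {}
    for row in rows:
        by_condition.setdefault(row["condition"], []).append(row[metric])
    conditions = sorted(by_condition)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(conditions)
    # Conditions rarely have equal track counts, so pad the short columns.
    columns = (by_condition[c] for c in conditions)
    for values in zip_longest(*columns, fillvalue=""):
        writer.writerow(values)
    return buffer.getvalue()
```

This layout can be pasted directly into column-oriented plotting and statistics tools.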

### Steps and tracks analysis

Generated when `--steps-tracks` is specified.

Heatmaps (`brightness_stepsize/`):
- `heatmap_all.png`, `heatmap_<filename>.png`
- `steps_vs_brightness_all.csv`, `steps_vs_brightness_<filename>.csv`

Track overlays (`tracks_stepsize_map/`):
- `overlay_all.png`, `overlay_<filename>.png`
- `tracks_stepsize_combined.pdf`

### TrackMate cleaner (`--clean-trackmate`)

Normalizes TrackMate exports to GEMspa column format (`x`, `y`, `frame`, `track_id`), writing cleaned files in place while preserving original filenames.

Options:
- `--clean-out-dir` — write outputs to a separate directory
- `--clean-include-date` — include date codes in output names (YYMMDD or YYYYMMDD)
- `--clean-move` — move rather than copy when renaming legacy `Traj_*` files
- `--clean-dry-run` — preview actions without writing any files

---

## Example Commands

```bash
# Clean TrackMate CSVs only
gemspa-cli -d /data/TrackMateExports --clean-trackmate

# Standard run with physical units
gemspa-cli -d /data/GEMspa --time-step 0.03 --micron-per-px 0.11 \
  --min-track-len 4 --tlag-cutoff 4

# Add step-size KDEs and rainbow overlays
gemspa-cli -d /data/GEMspa --rainbow-tracks --step-size-analysis

# Steps and tracks analysis with default settings
gemspa-cli -d /data/GEMspa --steps-tracks --time-step 0.01 --micron-per-px 0.11

# Heatmaps only with custom binning
gemspa-cli -d /data/GEMspa --steps-tracks --steps-tracks-mode heatmaps \
  --stepsize-max 5.0 --bins-x 200 --bins-y 200 --count-cap 500

# Track overlays only with inverted colormap
gemspa-cli -d /data/GEMspa --steps-tracks --steps-tracks-mode tracks \
  --line-width 1.0 --invert-lut-tracks --min-track-length 15

# Skip advanced analysis
gemspa-cli -d /data/GEMspa --no-advanced-group

# TrackMate data with custom CSV pattern
gemspa-cli -d /data/TrackMateExports --csv-pattern "*Spots in tracks*.csv" \
  --time-step 0.01 --micron-per-px 0.11 --steps-tracks
```

---

## Symbol Reference

| Symbol    | Definition                          | Units   |
|-----------|-------------------------------------|---------|
| tau       | Time lag (frame_lag x dt)           | s       |
| MSD(tau)  | Mean-square displacement            | µm²     |
| D         | Diffusion coefficient               | µm²/s   |
| alpha     | Anomalous exponent                  | —       |
| alpha2    | Non-Gaussian parameter              | —       |
| VACF      | Velocity autocorrelation function   | —       |
| T         | Tortuosity                          | —       |
| A_hull    | Convex-hull area                    | µm²     |

---

## Output Directory Structure

```
<work_dir>/
├── <COND>_<REP>/                     # per-replicate analysis
│   ├── msd_results.csv
│   ├── msd_vs_tau.png
│   └── ...
├── grouped_raw/                      # raw ensemble analysis
│   ├── msd_results.csv
│   └── step_kde/
├── grouped_filtered/                 # filtered ensemble analysis
│   ├── msd_results.csv
│   └── step_kde/
├── grouped_advanced_analysis/        # advanced metrics (default: ON)
│   ├── all_conditions_advanced_metrics.csv
│   ├── figures/
│   └── split_metrics_by_condition/
│       ├── D_fit.csv
│       ├── alpha_fit.csv
│       └── ...
├── brightness_stepsize/              # (--steps-tracks)
│   ├── heatmap_all.png
│   ├── heatmap_<filename>.png
│   └── steps_vs_brightness_*.csv
├── tracks_stepsize_map/              # (--steps-tracks)
│   ├── overlay_all.png
│   ├── overlay_<filename>.png
│   └── tracks_stepsize_combined.pdf
└── comparison/                       # cross-condition comparisons
    └── *.png
```

---

## Citation

If you use this software, please cite:

> **Bazley A., Keegan S. et al.**
> [GEMspa-CLI (PyPI)](https://pypi.org/project/GEMspa-CLI/)

---

## Acknowledgements

Developed by:

1. **Andrew Bazley** and **Sarah Keegan** — *Holt and Fenyo Labs, Institute for Systems Genetics, NYU Langone Health*
2. **David Duran** — *Holt Lab, Institute for Systems Genetics, NYU Langone Health*

**Original package:** [gemspa-spt (PyPI)](https://pypi.org/project/gemspa-spt/)  
**Primary reference:** [Keegan et al., *bioRxiv* 2023.06.26.546612](https://www.biorxiv.org/content/10.1101/2023.06.26.546612v1)

---

© 2025 GEMspa Project · MIT License
