Metadata-Version: 2.4
Name: GEMspa-CLI
Version: 1.9.6
Summary: GEMspa single-particle tracking analysis pipeline (CLI and Python API)
Author-email: "Andrew Bazley, Sarah Keegan" <Andrew.bazley@nyulangone.org>
License-Expression: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy<1.25.0,>=1.21.0
Requires-Dist: pandas<2.0.0,>=1.3.0
Requires-Dist: scipy<2.0.0,>=1.7.0
Requires-Dist: matplotlib<4.0.0,>=3.4.0
Requires-Dist: seaborn<1.3.0,>=0.11.0
Requires-Dist: scikit-image<0.21.0,>=0.19.0
Requires-Dist: tifffile<2022.0.0,>=2020.12.8
Requires-Dist: nd2reader<4.0.0,>=3.0.0
Requires-Dist: joblib<2.0.0,>=1.2.0
Requires-Dist: numba<0.60.0,>=0.55.0

# GEMspa-CLI: Single-Particle Tracking Analysis (v1.9.6)

**Advanced, modular single-particle tracking (SPT) and ensemble diffusion analysis for microscopy data, compatible with TrackMate, GEMspa, and HK1 pipelines.**

---

## Authors & Credits

1. **Andrew Bazley** and **Sarah Keegan** — *Liam Holt and David Fenyo Labs, Institute for Systems Genetics, NYU Langone Health*  
2. **David Duran** — *Liam Holt Lab, Institute for Systems Genetics, NYU Langone Health*

**Original build:** [gemspa-spt (PyPI)](https://pypi.org/project/gemspa-spt/)  
**Primary reference:** [Keegan et al., *bioRxiv* 2023.06.26.546612](https://www.biorxiv.org/content/10.1101/2023.06.26.546612v1)

---

## Overview

`gemspa-cli` is a command-line interface for **GEMspa**, a modular single-particle tracking and diffusion analysis suite.  
It performs robust trajectory extraction, per-track MSD fitting, ensemble averaging, step-size statistics, and condition-wise comparisons.

### New in v1.9.6

- Automatic **HK1 grouped analysis** (advanced metrics and geometric descriptors)
- **TrackMate cleaning utility** (`--clean-trackmate`)
- **Unified global filtering parameters** for D and α across the entire pipeline
- **Standardized step-size axes:** x-limit = 3 µm, y-min = 1e-5 (log-scale)
- Improved **ensemble averaging** and **step-size KDE** calculations
- **Cross-condition comparison plots** with KS statistical annotations
- Modular architecture supporting automatic condition detection and flexible I/O

---

## Installation

### Create a virtual environment

```bash
python3 -m venv ~/venvs/gemspa
source ~/venvs/gemspa/bin/activate
```

Windows PowerShell:

```powershell
python -m venv "$env:USERPROFILE\venvs\gemspa"
& "$env:USERPROFILE\venvs\gemspa\Scripts\Activate.ps1"
```

### Install from source or PyPI

```bash
python -m pip install --upgrade pip
python -m pip install -e .
```

or from PyPI:

```bash
python -m pip install gemspa-cli
```

---

## Workflow Overview

```
 ┌──────────────────────────────────────────────┐
 │ TrackMate export  →  gemspa-cli pipeline     │
 └──────────────────────────────────────────────┘
        │
        ├── (optional) --clean-trackmate
        │        → Traj_<COND>_<REP>.csv
        │
        ├── Per-replicate trajectory analysis
        │        → D_fit, α_fit, r², MSD curves, rainbow overlay
        │
        ├── Ensemble pooling by condition
        │        → grouped_raw / grouped_filtered
        │
        ├── (optional) --step-size-analysis
        │        → KDEs, α₂ non-Gaussian stats, KS tests
        │
        ├── (optional) automatic HK1 grouped analysis
        │        → grouped_advanced_analysis/
        │
        └── Cross-condition comparison
                 → comparison/*.png
```

---

## Core Analysis Logic

### 1. Mean-Square Displacement (MSD)

For each trajectory of \(N\) frames:

\[
\mathrm{MSD}(\tau) = \langle (x_{i+\tau} - x_i)^2 + (y_{i+\tau} - y_i)^2 \rangle_i
\]

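where \(\tau\) is the time lag and the average runs over all start frames \(i\). In the formula the offset is counted in frames; the lag in seconds is `τ = lag in frames × --time-step`.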
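
The definition above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the pipeline's own code: the function name `time_averaged_msd` is hypothetical, and it assumes positions are already in µm, ordered by frame, with no missing frames.

```python
import numpy as np

def time_averaged_msd(xy, max_lag):
    """Return the MSD for lags 1..max_lag (in frames) of a single track.

    xy: array of shape (N, 2) with x, y positions in µm, ordered by frame.
    Assumes consecutive frames with no gaps (an assumption of this sketch).
    """
    xy = np.asarray(xy, dtype=float)
    max_lag = min(max_lag, len(xy) - 1)
    msd = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        disp = xy[lag:] - xy[:-lag]                      # displacements at this lag
        msd[lag - 1] = np.mean(np.sum(disp**2, axis=1))  # average over start frames i
    return msd

# Straight-line track moving 1 µm per frame along x
track = np.column_stack([np.arange(5.0), np.zeros(5)])
print(time_averaged_msd(track, max_lag=3))  # [1. 4. 9.]
```

Multiplying the lag index by `--time-step` converts the lag axis from frames to seconds.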

---

### 2. Diffusion Coefficient (D)

Linear fit to the early MSD regime (lags up to `--tlag-cutoff`), using the two-dimensional relation:

\[
\mathrm{MSD}(\tau) \approx 4D\tau \quad \Rightarrow \quad D = \frac{1}{4}\frac{d(\mathrm{MSD})}{d\tau}
\]
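
As a minimal sketch of this fit, the slope of a least-squares line through the first few lags gives 4D. The function name `fit_D_linear` is hypothetical, and fitting with a free intercept is an assumption of this sketch; the README does not state whether the pipeline constrains the line through the origin.

```python
import numpy as np

def fit_D_linear(msd, time_step, tlag_cutoff):
    """Estimate D (µm²/s) from a straight-line fit to the first lags.

    msd: MSD values (µm²) for lags 1, 2, ... frames.
    time_step: seconds between frames.
    tlag_cutoff: number of lags (frames) to include; needs at least 2.
    """
    msd = np.asarray(msd, dtype=float)
    n = min(tlag_cutoff, len(msd))
    tau = np.arange(1, n + 1) * time_step            # lags in seconds
    slope, _intercept = np.polyfit(tau, msd[:n], 1)  # free intercept (assumption)
    return slope / 4.0                               # MSD ≈ 4 D τ in 2D

# Synthetic MSD for D = 0.5 µm²/s sampled every 0.03 s
tau = np.arange(1, 5) * 0.03
D_est = fit_D_linear(4 * 0.5 * tau, time_step=0.03, tlag_cutoff=4)
```

With noise-free input such as this, `D_est` recovers 0.5 µm²/s up to floating-point error.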

---

### 3. Anomalous Exponent (α)

Slope of a log–log fit across valid lags, assuming the power-law form \(\mathrm{MSD}(\tau) = 4D\tau^{\alpha}\):

\[
\log_{10}\!\big[\mathrm{MSD}(\tau)\big] = \alpha \log_{10}(\tau) + \log_{10}(4D)
\]

- α ≈ 1 → normal diffusion  
- α < 1 → subdiffusive  
- α > 1 → superdiffusive
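
The log–log fit can be sketched as follows. The function name `fit_alpha_loglog` is hypothetical, and treating "valid lags" as those with a positive MSD is an assumption of this sketch.

```python
import numpy as np

def fit_alpha_loglog(msd, time_step):
    """Return (alpha, D) from a straight-line fit in log10-log10 space.

    msd: MSD values (µm²) for lags 1, 2, ... frames.
    D comes from the intercept log10(4D) and carries units of µm²/s^alpha.
    """
    msd = np.asarray(msd, dtype=float)
    tau = np.arange(1, len(msd) + 1) * time_step
    valid = msd > 0                                   # log is undefined otherwise
    alpha, intercept = np.polyfit(np.log10(tau[valid]), np.log10(msd[valid]), 1)
    return alpha, 10**intercept / 4.0

# Synthetic subdiffusive MSD: 4 * 0.2 * tau**0.7
tau = np.arange(1, 11) * 0.03
alpha_est, D_est = fit_alpha_loglog(4 * 0.2 * tau**0.7, time_step=0.03)
```

For this synthetic input the fit returns α ≈ 0.7 and D ≈ 0.2.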

---

### 4. Non-Gaussian Parameter (α₂)

Quantifies deviation from Brownian motion based on step-size moments:

\[
\alpha_2 = \frac{\langle r^4 \rangle}{3\langle r^2 \rangle^2} - 1
\]
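
A direct transcription of this formula is sketched below. The function name `non_gaussian_alpha2` is hypothetical. Note that the factor 3 is the Gaussian value of ⟨r⁴⟩/⟨r²⟩² for one-dimensional displacements; whether the pipeline applies it per axis or to radial step sizes is not stated here, so the sketch simply evaluates the expression as written.

```python
import numpy as np

def non_gaussian_alpha2(steps):
    """Evaluate alpha_2 = <r^4> / (3 <r^2>^2) - 1 for an array of displacements."""
    steps = np.asarray(steps, dtype=float)
    m2 = np.mean(steps**2)
    m4 = np.mean(steps**4)
    return m4 / (3.0 * m2**2) - 1.0

# One-dimensional Gaussian displacements should give a value close to 0
rng = np.random.default_rng(0)
a2_gauss = non_gaussian_alpha2(rng.normal(scale=0.1, size=200_000))

# Identical step sizes give <r^4>/<r^2>^2 = 1, so alpha_2 = 1/3 - 1 = -2/3
a2_const = non_gaussian_alpha2(np.full(100, 0.05))
```

Positive values indicate heavier tails than a Gaussian step distribution.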

---

### 5. Velocity Autocorrelation (VACF)

Used in HK1 grouped analysis:

\[
\mathrm{VACF}(k) = \frac{\langle \vec{v}_i \cdot \vec{v}_{i+k} \rangle}
                        {\langle \vec{v}_i \cdot \vec{v}_i \rangle}
\]
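
The normalized VACF can be sketched as below. The function name `vacf` is hypothetical, and estimating velocities as frame-to-frame displacements divided by the time step is an assumption of this sketch, not a statement about the HK1 module's implementation.

```python
import numpy as np

def vacf(xy, time_step, max_lag):
    """Return the normalized VACF for lags 0..max_lag (in frames) of one track.

    xy: array of shape (N, 2), positions ordered by frame with no gaps.
    max_lag must be smaller than the number of velocity samples (N - 1).
    """
    xy = np.asarray(xy, dtype=float)
    v = np.diff(xy, axis=0) / time_step       # finite-difference velocities
    norm = np.mean(np.sum(v * v, axis=1))     # <v_i . v_i>
    if norm == 0:
        return np.full(max_lag + 1, np.nan)   # immobile track: undefined
    out = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        dots = np.sum(v[: len(v) - k] * v[k:], axis=1)   # v_i . v_{i+k}
        out[k] = np.mean(dots) / norm
    return out

# A track that reverses direction every frame is perfectly anti-correlated at lag 1
zigzag = np.column_stack([[0.0, 1.0, 0.0, 1.0, 0.0], np.zeros(5)])
vacf_zigzag = vacf(zigzag, time_step=0.03, max_lag=2)  # values: 1, -1, 1
```

The lag-1 value corresponds to the `vacf_lag1` column listed in the HK1 per-track metrics.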

---

## Command-Line Usage

```bash
gemspa-cli -d /path/to/folder [options]
```

**Required**
- `-d, --work-dir` : Directory with trajectory CSVs

### Common Options

#### Input Discovery
- `--csv-pattern` Glob for CSVs (default: `Traj_*.csv`)
  - For TrackMate exports, use `"*Spots in tracks*.csv"`

#### Acquisition / Units
- `--time-step` Seconds between frames  
- `--micron-per-px` Pixel size in µm

#### Track/fit Constraints
- `--min-track-len` Minimum frames per track  
- `--tlag-cutoff` Maximum lag (frames) for MSD fit

#### Parallelism
- `-j, --n-jobs` Number of processes used to run replicates in parallel  
- `--threads-per-rep` Number of threads used within each replicate

#### Rainbow Tracks (optional)
- `--rainbow-tracks` Enable colored overlays  
- `--img-prefix` Image prefix (e.g., `MAX_`)  
- `--rainbow-min-D`, `--rainbow-max-D`, `--rainbow-colormap`, `--rainbow-scale`, `--rainbow-dpi`

#### Ensemble Filters (shared globally)
- `--filter-D-min`, `--filter-D-max` (µm²/s)  
- `--filter-alpha-min`, `--filter-alpha-max`
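
Conceptually, these four bounds define a single keep/discard mask over the per-track fits. The sketch below illustrates that idea with NumPy; the function name `global_filter_mask` is hypothetical, the bounds are treated as inclusive by assumption, and the numeric values are arbitrary examples rather than the tool's defaults.

```python
import numpy as np

def global_filter_mask(D, alpha, D_min, D_max, alpha_min, alpha_max):
    """Return a boolean mask of tracks whose D and alpha fall inside the bounds."""
    D = np.asarray(D, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return (D >= D_min) & (D <= D_max) & (alpha >= alpha_min) & (alpha <= alpha_max)

# Arbitrary example values (not defaults of gemspa-cli)
D_fit = [0.01, 0.5, 2.0, 0.3]
alpha_fit = [0.9, 1.1, 0.8, 2.5]
keep = global_filter_mask(D_fit, alpha_fit, D_min=0.05, D_max=1.0,
                          alpha_min=0.0, alpha_max=2.0)
# keep -> [False, True, False, False]
```

Applying one mask everywhere is what keeps the filtered ensemble, step-size, and comparison outputs consistent with each other.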

#### Optional Analyses
- `--step-size-analysis` Enable step-size KDE + KS plots  
- `--clean-trackmate` Run TrackMate CSV cleaner and exit  
- `--no-HK1` Disable automatic HK1 analysis

---

## Outputs

### Per Replicate `<COND>_<REP>/`
- `msd_results.csv` : Per-track D_fit, α_fit, r²_fit  
- `msd_vs_tau.png` : Linear MSD vs τ with D estimate  
- `msd_vs_tau_loglog.png` : Log–log MSD vs τ with α slope  
- `D_fit_distribution.png` : Histogram of D (log x-axis)  
- `alpha_vs_logD.png` : Scatter of α vs log₁₀ D  
- `rainbow_tracks.png` : Colored trajectories (if enabled)

### Ensemble Level
- `grouped_raw/` and `grouped_filtered/` subfolders  
- Ensemble-averaged MSD plots (`ensemble_msd_vs_tau_<COND>.png`)  
- Step-size KDEs (`step_kde_<COND>_(ensemble).png`, filtered variants)  
  - Global limits: **x ≤ 3 µm**, **y ≥ 1e-5** (log-scale)

### Comparison (`comparison/`)
- `ensemble_filtered_D_histograms.png` (log-scale with KS annotation)  
- `ensemble_filtered_alpha_histograms.png`  
- `replicate_median_D_boxplot.png`
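
For readers unfamiliar with the KS annotation, the two-sample Kolmogorov–Smirnov statistic is the largest gap between two empirical CDFs. The NumPy-only sketch below computes that statistic; the function name `ks_statistic` is hypothetical, and it is assumed here that the annotation refers to the two-sample test. `scipy.stats.ks_2samp` returns the same statistic together with a p-value.

```python
import numpy as np

def ks_statistic(a, b):
    """Return the two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])                      # ECDFs only jump at data points
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Arbitrary example D values for two conditions (µm²/s)
ks_same = ks_statistic([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])      # 0.0
ks_disjoint = ks_statistic([0.1, 0.2, 0.3], [1.0, 2.0, 3.0])  # 1.0
```

A statistic near 0 means the two distributions overlap closely; a value of 1 means they do not overlap at all.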

### HK1 Grouped Analysis (`grouped_advanced_analysis/`)
Automatically runs unless `--no-HK1` is specified.

**Per-track metrics**
```
track_id, condition, D_fit, alpha_fit, r2_fit, vacf_lag1,
confinement_idx, hull_area_um2, tortuosity, n_frames
```

**Plots**
- D_fit and α_fit box/violin plots by condition  
- VACF histograms and mean curves  
- Convex-hull area vs tortuosity scatterplots

### TrackMate Cleaner (`--clean-trackmate`)
Converts TrackMate exports to the GEMspa schema (`x, y, frame, track_id`) and standardizes file names as:
```
Traj_<COND>_<REP>.csv
```

Options:  
- `--clean-out-dir` : Output directory  
- `--clean-include-date` : Include date codes (YYMMDD / YYYYMMDD)  
- `--clean-move` : Move instead of copy  
- `--clean-dry-run` : Preview only
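
To illustrate the kind of column mapping involved, the sketch below converts rows from a TrackMate spots table into the GEMspa schema using only the standard library. It is not the cleaner's implementation: the function name `clean_trackmate_rows` is hypothetical, and it assumes the export uses TrackMate's `TRACK_ID`, `POSITION_X`, `POSITION_Y`, and `FRAME` columns. Rows whose values are not numeric (for example, spots without a track) are skipped.

```python
import csv
import io

def clean_trackmate_rows(handle):
    """Read a TrackMate spots CSV and return rows in the x, y, frame, track_id schema."""
    rows = []
    for row in csv.DictReader(handle):
        try:
            rows.append({
                "x": float(row["POSITION_X"]),
                "y": float(row["POSITION_Y"]),
                "frame": int(float(row["FRAME"])),
                "track_id": int(float(row["TRACK_ID"])),
            })
        except (ValueError, TypeError):
            continue  # non-numeric or missing values: not a usable spot
    rows.sort(key=lambda r: (r["track_id"], r["frame"]))  # group by track, then time
    return rows

# Small in-memory example; the last spot has no track and is dropped
sample = io.StringIO(
    "TRACK_ID,POSITION_X,POSITION_Y,FRAME\n"
    "1,2.0,3.0,1\n"
    "0,1.5,2.5,0\n"
    "1,1.0,1.0,0\n"
    ",9.0,9.0,0\n"
)
cleaned = clean_trackmate_rows(sample)
```

Use `--clean-dry-run` to preview what the actual cleaner would do before any files are copied or moved.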

---

## Example Commands

```bash
# Clean TrackMate CSVs only
gemspa-cli -d /data/TrackMateExports --clean-trackmate

# Full GEMspa run (auto HK1, no step-size)
gemspa-cli -d /data/GEMspa --time-step 0.03 --micron-per-px 0.11 --min-track-len 4 --tlag-cutoff 4

# Include step-size and rainbow overlays
gemspa-cli -d /data/GEMspa --rainbow-tracks --step-size-analysis

# Skip HK1 module
gemspa-cli -d /data/GEMspa --no-HK1
```

---

## Mathematical Summary

| Symbol | Definition | Units |
|:-------:|-------------|:------:|
| τ | Time lag (lag in frames × Δt) | s |
| MSD(τ) | Mean square displacement | µm² |
| D | Diffusion coefficient | µm²/s |
| α | Anomalous exponent | – |
| α₂ | Non-Gaussian parameter | – |
| VACF | Velocity autocorrelation | – |
| T | Tortuosity | – |
| A_hull | Convex-hull area | µm² |

---

## Output Organization

```
<work_dir>/
├── <COND>_<REP>/
│   ├── msd_results.csv
│   ├── msd_vs_tau.png
│   └── ...
├── grouped_raw/
│   ├── msd_results.csv
│   └── step_kde/
├── grouped_filtered/
│   ├── msd_results.csv
│   └── step_kde/
├── grouped_advanced_analysis/
└── comparison/
```

---

## Citation

If you use this software, please cite:

> **Keegan S., Bazley A., et al.**  
> “A quantitative framework for analyzing protein mobility in the endoplasmic reticulum using genetically encoded nanoparticles.”  
> *bioRxiv* (2023): 2023.06.26.546612.  
> DOI: [10.1101/2023.06.26.546612](https://www.biorxiv.org/content/10.1101/2023.06.26.546612v1)

and reference the original package:  
> [gemspa-spt (PyPI)](https://pypi.org/project/gemspa-spt/)

---

## Acknowledgements

Developed by  
**Andrew Bazley**, **Sarah Keegan**, and **David Duran**  
*(Liam Holt and David Fenyo Labs, Institute for Systems Genetics, NYU Langone Health)*

---

© 2025 GEMspa Project · MIT License
