Metadata-Version: 2.1
Name: mcmicroprep
Version: 0.2.1
Summary: 
Author: Ajit Johnson Nirmal
Author-email: ajitjohnson.n@gmail.com
Requires-Python: >=3.10,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: lxml (>=6.0.0,<7.0.0)
Requires-Dist: ome-types (>=0.6.1,<0.7.0)
Requires-Dist: pandas (>=2.3.1,<3.0.0)
Requires-Dist: pydantic (>=2.11.7,<3.0.0)
Description-Content-Type: text/markdown

# 🧪 mcmicroprep 🚀

A **command-line tool** for preparing multiplexed imaging datasets (🦠 Olympus, 🩸 RareCyte) for the MCMICRO Nextflow pipeline.


## 🛠️ Installation

1. **Prerequisites**

   - Conda or Miniconda installed 🐍
   - Python **3.10+** environment 🌟
   - SLURM and Nextflow on your `$PATH`, plus the MCMICRO Nextflow pipeline (`labsyspharm/mcmicro`)

2. **Create Conda env**

   ```bash
   conda create -n mcmicroprep python=3.12
   conda activate mcmicroprep
   ```

3. **Install package**

   ```bash
   pip install mcmicroprep
   ```
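A quick way to confirm that these prerequisites are reachable from your shell is a small loop over `command -v`. This is a sketch, not something shipped with mcmicroprep. `check_prereqs` is a hypothetical helper name, and which commands you need locally versus on the cluster depends on where you run each step.

```bash
# Hypothetical helper (not part of mcmicroprep): report which commands
# are missing from $PATH. Returns non-zero if any command is missing.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "ok:      $cmd"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_prereqs conda nextflow sbatch preparemcmicro` prints one `ok:` or `missing:` line per command and returns non-zero if anything is missing.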

## 📁 Expected Dataset Structure

Your dataset root should contain one subdirectory per slide. Structures vary by vendor:

### 🦠 Olympus

Each *slide directory* must contain at least one non-`Overview` `*_frames/` folder (at any depth). This is the minimum required structure; additional files or folders may be present and do not need to be removed.

**Important**: For reliable stitching and registration, each image (per cycle) should have exactly one `*_frames/` folder. If an image contains multiple ROIs, split them into separate image folders before running the pipeline; otherwise stitching and registration may produce unusable results.

```
DATASET/
├── slide1/
│   ├── image1_frames/
│   └── image2_frames/
├── slide2/
└── slideN/
```
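To check a dataset against these rules before running the tool, you can count the non-`Overview` `*_frames/` folders under each slide directory. A minimal sketch, assuming slide directories sit directly under the dataset root and that Overview folders can be recognized by `Overview` appearing in their name (the exact matching rule mcmicroprep uses is not documented here). `list_olympus_frames` is a hypothetical helper name.

```bash
# Hypothetical pre-flight check (not part of mcmicroprep).
# For each slide directory directly under the dataset root, count the
# *_frames/ folders at any depth whose name does not contain "Overview".
list_olympus_frames() {
  local root="$1" slide n
  for slide in "$root"/*/; do
    [ -d "$slide" ] || continue
    n=$(find "$slide" -type d -name '*_frames' ! -name '*Overview*' | wc -l)
    n=${n//[[:space:]]/}   # strip padding that some wc implementations add
    if [ "$n" -eq 0 ]; then
      echo "$(basename "$slide"): no non-Overview *_frames/ folder found"
    else
      echo "$(basename "$slide"): $n non-Overview *_frames/ folder(s)"
    fi
  done
}
```

Running `list_olympus_frames /path/to/dataset` prints one line per slide, so slides with no usable `*_frames/` folder stand out before you submit anything.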

After running for **Olympus**, the dataset is reorganized into:

```
DATASET/
├── raw/
│   ├── slide1/             # non-Overview image*_frames/
│   ├── slide2/
│   └── slideN/
├── misc_files/
│   ├── slide1/             # everything else (+ Overview *_frames/)
│   ├── slide2/
│   └── slideN/
├── mcmicro_template.sh    # SLURM job script that runs MCMICRO via Nextflow
├── base.config
├── markers.csv
└── params.yml            
```

### 🩸 RareCyte

Each *slide directory* must contain at least one `*.rcpnl` file (at any depth). This is the minimum required structure; additional files or folders may be present and do not need to be removed.

```
/path/to/dataset/
├── slide1/
│   ├── img001.rcpnl
│   ├── subA/img002.rcpnl
│   └── other files
└── slideN/
```
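A similar pre-flight check for RareCyte data counts the `.rcpnl` files per slide. Because the files are flattened into `raw/<slide>/` (see the reorganized layout below), the sketch also lists file names that occur more than once within a slide; how mcmicroprep itself handles such name clashes is not documented here. This assumes slide directories sit directly under the dataset root, and `check_rarecyte_slides` is a hypothetical helper name.

```bash
# Hypothetical pre-flight check (not part of mcmicroprep).
# For each slide directory directly under the dataset root, count the
# .rcpnl files at any depth and list file names that occur more than once
# (they would share a name once flattened into a single folder).
check_rarecyte_slides() {
  local root="$1" slide n dups
  for slide in "$root"/*/; do
    [ -d "$slide" ] || continue
    n=$(find "$slide" -type f -name '*.rcpnl' | wc -l)
    n=${n//[[:space:]]/}   # strip padding that some wc implementations add
    echo "$(basename "$slide"): $n .rcpnl file(s)"
    dups=$(find "$slide" -type f -name '*.rcpnl' -exec basename {} \; | sort | uniq -d)
    if [ -n "$dups" ]; then
      echo "  duplicate file names:"
      echo "$dups" | sed 's/^/    /'
    fi
  done
}
```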

After running for **RareCyte**, the dataset is reorganized into:

```
DATASET/
├── raw/
│   ├── slide1/             # all .rcpnl files (flattened)
│   ├── slide2/
│   └── slideN/
├── misc_files/
│   ├── slide1/             # everything else
│   ├── slide2/
│   └── slideN/
├── mcmicro_template.sh
├── base.config
├── markers.csv
└── params.yml
```

## 🚀 Usage

> **Note**: The templates are configured for the HMS O2 cluster (SLURM). To run on a different cluster, edit the SLURM directives in `templates/common/`.
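Once the tool has generated `mcmicro_template.sh`, you can list the SLURM options it will request before adapting them. A minimal sketch, assuming the directives are written as standard `#SBATCH` header lines (check your generated file). `show_sbatch_directives` is a hypothetical helper name.

```bash
# Hypothetical helper (not part of mcmicroprep): print the options from a
# job script's "#SBATCH" lines, with the "#SBATCH" prefix stripped.
show_sbatch_directives() {
  sed -n 's/^#SBATCH[[:space:]]*//p' "$1"
}
```

For a one-off change you may not need to edit anything: options passed on the `sbatch` command line take precedence over `#SBATCH` lines in the script, e.g. `sbatch --time=12:00:00 mcmicro_template.sh`. This changes only the resources of the job running the script itself.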

### 🦠 Olympus

```bash
preparemcmicro \
  --microscope olympus \
  --image-root /path/to/dataset
```

### 🩸 RareCyte

```bash
preparemcmicro \
  --microscope rarecyte \
  --image-root /path/to/dataset
```
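After either command finishes, you can confirm that the dataset root matches the reorganized layout shown under Expected Dataset Structure. A minimal sketch: `verify_prepared_dataset` is a hypothetical helper, and treating an empty `raw/<slide>/` folder as a problem is an assumption based on each slide being required to contain image data.

```bash
# Hypothetical post-run check (not part of mcmicroprep): confirm that the
# dataset root has raw/ and misc_files/ plus the four generated files,
# and that no slide folder under raw/ is empty.
verify_prepared_dataset() {
  local root="$1" item status=0
  for item in raw misc_files; do
    [ -d "$root/$item" ] || { echo "missing directory: $item/"; status=1; }
  done
  for item in mcmicro_template.sh base.config markers.csv params.yml; do
    [ -f "$root/$item" ] || { echo "missing file: $item"; status=1; }
  done
  # Every slide should have ended up with something under raw/
  for item in "$root"/raw/*/; do
    [ -d "$item" ] || continue
    [ -n "$(ls -A "$item")" ] || { echo "empty: raw/$(basename "$item")/"; status=1; }
  done
  [ "$status" -eq 0 ] && echo "ok: layout looks complete"
  return "$status"
}
```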

## 🛠️ Next Steps for Users

1. ✏️ **Edit** `markers.csv` in the dataset root to include your experiment-specific cycle-to-marker mappings.
2. 📤 **Upload** the entire processed dataset folder to the O2 cluster if you ran this locally.
3. 🚀 **Start the job** on O2:
   ```bash
   cd /n/scratch/users/${USER:0:1}/$USER/<DATASET FOLDER>
   sbatch mcmicro_template.sh
   ```
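Before uploading, a quick local check of the edited `markers.csv` can catch formatting slips before you spend cluster time. A minimal sketch, assuming the file follows MCMICRO's convention of a header row with a `marker_name` column and that marker names should be unique; the columns in the template mcmicroprep generates may differ, quoted fields containing commas are not handled, and `check_markers_csv` is a hypothetical helper name.

```bash
# Hypothetical sanity check (not part of mcmicroprep or MCMICRO).
# Assumes plain comma-separated text with a header row that includes a
# "marker_name" column; flags empty and duplicate marker names.
check_markers_csv() {
  awk -F',' '
    BEGIN { bad = 0 }
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        h = $i; gsub(/\r/, "", h)
        if (h == "marker_name") col = i
      }
      if (!col) { print "error: no marker_name column in header"; bad = 1; exit }
      next
    }
    NF == 0 { next }    # skip blank lines
    {
      name = $col
      gsub(/\r/, "", name); gsub(/^[ \t]+/, "", name); gsub(/[ \t]+$/, "", name)
      n++
      if (name == "") { print "error: empty marker_name on line " NR; bad = 1 }
      else if (seen[name]++) { print "error: duplicate marker_name \"" name "\" on line " NR; bad = 1 }
    }
    END {
      if (NR == 0) { print "error: file is empty"; exit 1 }
      if (!bad) print "ok: " (n + 0) " marker(s)"
      exit bad
    }
  ' "$1"
}
```

Run it from the dataset root as `check_markers_csv markers.csv`; it prints `ok: N marker(s)` or one error line per problem and returns non-zero if it found any errors.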

Happy processing! 🔬

