Metadata-Version: 2.4
Name: pentachrome-plugin
Version: 0.4.0
Summary: Napari plugin for the Pentachrome histology pipeline: VSI extraction, nnUNet inference, statistics.
Author: Dimitrios Tsilis
License: MIT
Project-URL: Homepage, https://github.com/dtsilis7/PentachromePipeline
Project-URL: Bug Tracker, https://github.com/dtsilis7/PentachromePipeline/issues
Keywords: napari,histology,nnunet,bioformats,segmentation
Classifier: Framework :: napari
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3.10
Classifier: Operating System :: Microsoft :: Windows
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: napari[all]>=0.4.18
Requires-Dist: qtpy
Requires-Dist: tifffile
Requires-Dist: numpy<2
Requires-Dist: opencv-python
Requires-Dist: scipy
Requires-Dist: scikit-image
Requires-Dist: skan
Requires-Dist: imagecodecs
Dynamic: license-file

# pentachrome-plugin

Napari plugin for the Pentachrome histology pipeline. Current widgets:

- **VSI to TIFF Extractor** (Phase 1) — extract tissue-region TIFFs from VSI files.
- **nnUNet Inference** (Phase 2) — run the trained Epithelium / MultiStructure models on selected TIFF layers and load colorized masks back into the viewer.
- **Mask Statistics** (Phase 3) — per-region statistics (thickness, composition, cell densities) computed on the inference output, with CSV export.

Source and issues: https://github.com/dtsilis7/PentachromePipeline

**Requirements:** Windows, Python 3.10, napari >= 0.4.18, and a Java JDK (JDK 17 confirmed). The Java/bioformats stack and the nnUNet model weights are installed separately (see below) — they can't come from a plain `pip install`.

## Install (Windows, PowerShell)

Requires a working Java JDK on PATH (JDK 17 confirmed working).

### Via napari plugin manager (after publishing)

Once this package is published to PyPI it shows up in napari's **Plugins -> Install/Uninstall Plugins** (search "pentachrome"). That installs the Python package only — you still need the conda-forge Java/bioformats step and the nnUNet weights below for the extractor and inference to work.

### Recommended: conda env

`cd` into the plugin directory first — `pip install -e .` resolves `.` relative to your current shell directory:

```powershell
conda activate napari
cd "...\pentachrome_plugin"
conda install -c conda-forge python-javabridge python-bioformats
pip install opencv-python tifffile
pip install -e .
```

If you would rather not `cd`, pass the absolute path explicitly:

```powershell
pip install -e "...\pentachrome_plugin"
```

conda-forge ships pre-built binary packages for `python-javabridge` and `python-bioformats`, which avoids the MSVC + NumPy 2 compile failure that `pip install python-javabridge` currently hits (the C extension references `_PyArray_Descr` fields removed in NumPy 2.0).

### Fallback: pure pip

Only use this if conda-forge is unavailable. NumPy must be pinned below 2 *before* javabridge builds, and build isolation must be off so the build sees the pinned NumPy:

```powershell
cd "...\pentachrome_plugin"
pip install "numpy<2"
pip install --no-build-isolation python-javabridge python-bioformats
pip install opencv-python tifffile
pip install -e .
```
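
Before launching, it can help to confirm the two prerequisites that most often break the extractor: a `java` executable on PATH and a NumPy major version below 2. The snippet below is a minimal sketch and not part of the plugin; `numpy_is_pre2` and `check_prereqs` are hypothetical helper names.

```python
import shutil
from importlib import metadata


def numpy_is_pre2(version: str) -> bool:
    """Return True when a version string such as '1.26.4' has major version < 2."""
    return int(version.split(".")[0]) < 2


def check_prereqs() -> dict:
    """Collect the state of the prerequisites named in the install notes."""
    try:
        numpy_version = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        numpy_version = None
    return {
        "java_on_path": shutil.which("java") is not None,
        "numpy_version": numpy_version,
        "numpy_pre2": numpy_version is not None and numpy_is_pre2(numpy_version),
    }


if __name__ == "__main__":
    for key, value in check_prereqs().items():
        print(f"{key}: {value}")
```

Run it inside the activated environment. If `java_on_path` or `numpy_pre2` is `False`, revisit the install steps above before opening napari.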

## Launch

```powershell
conda activate napari   # or whichever env you installed into
python -m napari
```

In napari, open **Plugins -> VSI to TIFF Extractor**, **Plugins -> nnUNet Inference**, or **Plugins -> Mask Statistics**.

## nnUNet Inference (Phase 2)

Requires `nnunetv2` installed in the same environment (the `nnUNetv2_predict` CLI must be on PATH).

```powershell
conda activate napari
pip install nnunetv2
```

Workflow:

1. Load TIFFs into napari (e.g. via Phase 1's auto-load checkbox, or drag-and-drop).
2. Open **Plugins -> nnUNet Inference**.
3. Select one or more image layers in the list.
4. Tick **Epithelium**, **MultiStructure**, or both.
5. Set **Output folder** (where raw + colorized masks go) and **nnUNet results** (folder containing `Dataset001_Epithelium` and `Dataset002_MultiStructure`). The results path auto-fills if `nnUNet_Training/nnUNet_results/results` is found.
6. Pick **Device** (`cpu` or `cuda`) and click **Analyze**.
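
If a run fails immediately, a likely cause is an **nnUNet results** path that does not contain both dataset folders from step 5. The check below is a small sketch of that validation; `missing_datasets` is a hypothetical helper, not a function exported by the plugin.

```python
from pathlib import Path

# Dataset folder names taken from step 5 of the workflow above.
EXPECTED_DATASETS = ("Dataset001_Epithelium", "Dataset002_MultiStructure")


def missing_datasets(results_dir) -> list:
    """Return the expected dataset folders that are absent from results_dir."""
    root = Path(results_dir)
    return [name for name in EXPECTED_DATASETS if not (root / name).is_dir()]
```

An empty list means the folder has the expected layout; otherwise the returned names show which model cannot be run.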

### Speed vs quality (important on CPU)

nnUNet inference on a laptop CPU is slow because every image needs *folds × mirror augmentations × sliding-window patches* forward passes. With nnUNet's stock settings (all 5 folds, mirroring on) that is 5 × 4 = 20 passes for every patch. The widget exposes three knobs in the **Speed / quality** group:

| Knob | Default | What it does |
| --- | --- | --- |
| Epithelium folds | `Fold 0 only` | Use 1 of the 5 trained folds for Dataset001. All 5 ensembled is best quality but ~5x slower. Dataset002 only has fold 0 trained, so it's always 1 fold. |
| Disable test-time mirroring | on | Passes `--disable_tta`. Skips the 4 mirror augmentations the model normally averages over. ~4x faster, small accuracy hit. |
| Sliding-window step | `0.5` | Passes `-step_size`. Larger = fewer overlapping patches = faster but rougher tile borders. Try `0.7` for a middle ground. |

With all three knobs left at their defaults on a CPU laptop, one ROI tile should take a few minutes instead of 30+ minutes. Switch to `All 5 folds` and re-enable test-time mirroring (TTA) once you've moved to a GPU box.
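
To get a feel for how the three knobs multiply, the sketch below estimates the number of forward passes for one image. It assumes windows are laid out on a regular grid with a stride of `patch * step_size`, and it uses a hypothetical patch size of 512 pixels (the real value comes from the trained model's plans). Treat the result as a rough estimate, not as what nnUNet reports.

```python
import math


def steps_along_axis(image_len: int, patch_len: int, step_size: float) -> int:
    """Number of window positions along one axis for a given relative step."""
    if image_len <= patch_len:
        return 1
    stride = patch_len * step_size  # distance between window starts, in pixels
    return math.ceil((image_len - patch_len) / stride) + 1


def forward_passes(height, width, patch, step_size, folds, mirroring) -> int:
    """Estimate folds x mirror passes x sliding-window patches for one image."""
    patches = (steps_along_axis(height, patch, step_size)
               * steps_along_axis(width, patch, step_size))
    mirror_passes = 4 if mirroring else 1  # 4 is the figure quoted in the table above
    return folds * mirror_passes * patches


if __name__ == "__main__":
    PATCH = 512  # hypothetical patch size, for illustration only
    fast = forward_passes(15000, 15000, PATCH, 0.5, folds=1, mirroring=False)
    full = forward_passes(15000, 15000, PATCH, 0.5, folds=5, mirroring=True)
    print(f"widget defaults: {fast} passes; all folds + mirroring: {full} passes")
```

Under these assumptions the fold and mirroring knobs scale the cost linearly, while the step size changes the patch count roughly quadratically for a 2D tile.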

### Continuing from the extractor

The two widgets are linked through two small bridges, so you can run **Extract -> Analyze** in a single napari session without re-picking files:

- When the extractor auto-loads a TIFF as a viewer layer, it stashes the on-disk path on `layer.metadata['source_tiff']`. The inference widget reads that during staging and **copies the original file** into `_staging_input/` rather than re-saving the in-memory array — important for 15k x 15k tiles.
- When an extraction completes, the inference widget's **"Use last extractor output"** button pre-fills the output folder with `<extractor_output_root>/_inference`, so masks land next to the per-VSI subfolders the extractor created.

Both bridges are in-process only (see `_session.py`); they reset when napari closes.
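
The first bridge can be pictured as the staging step below. This is a simplified sketch, not the widget's actual code: `stage_layer` is a hypothetical name, the layer is any object with `name`, `metadata`, and `data` attributes, and the target name simply appends the `_0000.tif` suffix that nnUNet expects.

```python
import shutil
from pathlib import Path


def stage_layer(layer, staging_dir) -> Path:
    """Place one layer in staging_dir under an nnUNet-style *_0000.tif name."""
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    target = staging / f"{layer.name}_0000.tif"

    source = layer.metadata.get("source_tiff")
    if source and Path(source).is_file():
        # Preferred path: copy the file the extractor already wrote to disk.
        shutil.copyfile(source, target)
    else:
        # Fallback: serialize the in-memory array (slow for very large tiles).
        import tifffile  # imported lazily so the copy path needs no extra dependency
        tifffile.imwrite(target, layer.data)
    return target
```

Copying the file avoids re-encoding a very large array and keeps the staged input byte-identical to what the extractor produced.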

Outputs land in:

```
<output_folder>/
  _staging_input/            # nnUNet-named (_0000.tif) copies of the selected layers
  epithelium_raw/            # binary masks from Dataset001
  epithelium_colored/        # RGB colorized masks (red epithelium)
  multistructure_raw/        # 6-class masks from Dataset002
  multistructure_colored/    # RGB colorized masks (Elastin/Collagen/Nuclei/Mucins/Membrane/Goblets)
```

Colorized masks are added to the viewer as RGB image layers when the run finishes.

### nnUNet inference architecture

Same subprocess pattern as Phase 1. The widget never imports torch or nnUNetv2 directly; it spawns `_inference_worker.py` which:

- sets `nnUNet_results` to the configured results dir,
- calls `nnUNetv2_predict` once per enabled model (Epithelium: fold 0 by default, or folds 0-4 when `All 5 folds` is selected, as in `run_inference.py`; MultiStructure: always fold 0),
- colorizes the resulting integer masks with the palettes from `colorize_masks.py` / `compare_grid.py`,
- streams JSON-line events on stdout for the widget's progress bar and log.
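
As an illustration of the first two steps, the sketch below assembles one `nnUNetv2_predict` call and the environment for it. The flags are the CLI's own options, but the function names are hypothetical and the `2d` configuration is an assumption; check the trained models before relying on it.

```python
import os


def build_predict_command(input_dir, output_dir, dataset, folds, step_size=0.5,
                          disable_tta=True, device="cpu", configuration="2d"):
    """Assemble an nnUNetv2_predict argument list for one model."""
    cmd = [
        "nnUNetv2_predict",
        "-i", str(input_dir),
        "-o", str(output_dir),
        "-d", dataset,
        "-c", configuration,  # assumption: the models were trained as 2d
        "-f", *[str(fold) for fold in folds],
        "-step_size", str(step_size),
        "-device", device,
    ]
    if disable_tta:
        cmd.append("--disable_tta")
    return cmd


def build_worker_env(results_dir) -> dict:
    """Copy the current environment and point nnUNet_results at results_dir."""
    env = dict(os.environ)
    env["nnUNet_results"] = str(results_dir)
    return env
```

The resulting list and environment can be handed to `subprocess.run(cmd, env=env)`; passing a list rather than a single string sidesteps quoting problems with Windows paths that contain spaces.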

## How it works

- The extractor widget itself never touches the JVM. When you click **Extract ROIs**, it spawns `_vsi_worker.py` as a separate Python process.
- That worker process starts the bioformats JVM, loops over the VSI files using `TileMaskStitcher` (reused from `VSI_Handler/tile_mask_stitcher.py`), writes numbered TIFFs into `<output_root>/<vsi_basename>/`, and emits JSON-line progress events on stdout.
- The widget streams those events on a background thread and updates the progress bar / log without blocking the UI.
- When the worker exits, the JVM dies with it. The next extraction batch starts a fresh JVM in a fresh process, which avoids the "JVM cannot be restarted" pitfall during a long napari session.
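
The JSON-line protocol between worker and widget can be sketched as follows. The event field names (`event`, `done`, `total`) are assumptions made for the example, and a tiny inline child process stands in for `_vsi_worker.py`.

```python
import json
import subprocess
import sys


def stream_events(cmd):
    """Run cmd and yield one dict per JSON line the child prints on stdout."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                # Non-JSON output (for example a stray print) becomes a log event.
                yield {"event": "log", "message": line}


# Stand-in child process; the real worker would be _vsi_worker.py.
DEMO_CHILD = [
    sys.executable, "-c",
    "import json\n"
    "for i in range(3):\n"
    "    print(json.dumps({'event': 'progress', 'done': i + 1, 'total': 3}), flush=True)\n",
]

if __name__ == "__main__":
    for event in stream_events(DEMO_CHILD):
        print(event)
```

In a Qt widget this loop would run on a background thread and forward each event to the UI through a signal, so the progress bar updates without blocking the event loop.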

## Defaults

The parameter defaults mirror `Processing_VSI_Files.py`:

| Parameter | Default |
| --- | --- |
| Series | 6 |
| Tile width / height | 15000 |
| Threshold | 50 |
| Min ROI area | 150000 |
| Merge margin | 1000 |
| Extra crop margin | 100 |
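
For scripting outside the widget, the same defaults can be kept in one place. The dataclass below is only an illustrative sketch: the values come from the table above, but the field names are hypothetical and may not match the arguments of `Processing_VSI_Files.py`.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExtractorDefaults:
    """Extractor defaults from the table above; field names are illustrative."""
    series: int = 6
    tile_width: int = 15000
    tile_height: int = 15000
    threshold: int = 50
    min_roi_area: int = 150000
    merge_margin: int = 1000
    extra_crop_margin: int = 100


if __name__ == "__main__":
    print(asdict(ExtractorDefaults()))
```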

## Layout

```
pentachrome_plugin/
  pyproject.toml
  README.md
  src/pentachrome_plugin/
    __init__.py
    napari.yaml             # napari manifest
    _session.py             # in-process cross-widget state (extractor -> inference -> analysis)
    _widget.py              # VsiExtractorWidget (Phase 1)
    _vsi_worker.py          # VSI subprocess entrypoint
    _inference_widget.py    # NnUnetInferenceWidget (Phase 2)
    _inference_worker.py    # nnUNet subprocess entrypoint
    _analysis_widget.py     # AnalysisWidget (Phase 3, in-process)
```

All three widgets are registered through `napari.yaml`. Phase 3 (Mask Statistics) runs in-process, so it has no separate worker module.

## Mask Statistics (Phase 3)

Pure in-process; no subprocess needed (no JVM, no torch). Reuses
`EpithelialAnalysis/Analyzers/` (`Descriptors.py`, `Thickness.py`), so the
same metrics that fed the original `region_summary.csv` show up in the
widget.

Workflow:

1. Run Phase 2 first so `epithelium_raw/` and `multistructure_raw/` exist.
2. Open **Plugins -> Mask Statistics**.
3. Select one or more image layers in the list (their names must match the
   mask filenames in `epithelium_raw/` / `multistructure_raw/`; if the
   inference widget staged them, that's already true).
4. Click **Use last inference output** (or browse).
5. Tweak **Pixel size**, **Region dilation**, **Min epithelium area** if
   needed (defaults match `Main.py`).
6. Click **Analyze**.

For each detected epithelial region the widget reports:

| Column | What it is |
| --- | --- |
| Area (mm^2) | Region area after the 50 um dilation |
| Thickness mean/std (um) | Medial-axis thickness of the union of membrane (within the eroded region), goblets, and nuclei |
| Elastin / Collagen / Other % | Fraction of stained structure pixels — same definition as `compute_structure_percentages` |
| Mucin % | Mucin pixels as a fraction of the epithelium area (not of total structure pixels) |
| Nuclei / mm^2 and Goblets / mm^2 | Density per mm^2 of epithelium — goblet hyperplasia is a classic COPD readout |
| Nuclei (n), Goblets (n) | Raw counts inside the region |
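
The area and density columns follow from the **Pixel size** setting by plain unit conversion. The helpers below sketch that arithmetic; the function names are hypothetical, and the 0.5 um pixel size in the comment is just an example value.

```python
def area_mm2(pixel_count: int, pixel_size_um: float) -> float:
    """Convert a pixel count to mm^2 given the edge length of one pixel in um."""
    return pixel_count * (pixel_size_um / 1000.0) ** 2


def density_per_mm2(count: int, area: float) -> float:
    """Objects per mm^2; returns 0.0 for an empty region to avoid dividing by zero."""
    return count / area if area > 0 else 0.0


if __name__ == "__main__":
    # Example values only: a 1,000,000-pixel region at 0.5 um per pixel with 500 nuclei.
    region_area = area_mm2(1_000_000, 0.5)
    print(region_area, density_per_mm2(500, region_area))
```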

A bold **(all regions)** row appended per image gives area-weighted means
of the percentages / thickness and totals for the counts. **Export CSV...**
saves the whole table (per-region rows + aggregate rows).
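
The aggregate row can be reproduced from the per-region rows as sketched below: percentages and thickness are averaged with region area as the weight, and counts are summed. The dictionary keys are hypothetical and do not necessarily match the exported CSV headers.

```python
def aggregate_regions(regions, weighted_keys, summed_keys) -> dict:
    """Build an '(all regions)' row: area-weighted means plus summed counts."""
    total_area = sum(r["area_mm2"] for r in regions)
    row = {"area_mm2": total_area}
    for key in weighted_keys:
        weighted_sum = sum(r[key] * r["area_mm2"] for r in regions)
        row[key] = weighted_sum / total_area if total_area else 0.0
    for key in summed_keys:
        row[key] = sum(r[key] for r in regions)
    return row


if __name__ == "__main__":
    rows = [
        {"area_mm2": 1.0, "elastin_pct": 10.0, "nuclei_n": 100},
        {"area_mm2": 3.0, "elastin_pct": 30.0, "nuclei_n": 300},
    ]
    print(aggregate_regions(rows, ["elastin_pct"], ["nuclei_n"]))
```

Weighting by area keeps a small region with an extreme percentage from dominating the per-image summary.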

The elastin organization score (`ElastinAnalyzer.determine_organized_region`)
from `Main.py` is intentionally not yet exposed — it's much heavier (skan +
shapely + ROI polygons) and will land as a separate toggle.

### Class isolation

A "Class isolation" group at the top of the widget lets you view a single
class (or a combination) without rerunning anything:

1. Pick a source layer (the **original** TIFF — not a colorized mask).
2. Tick one or more of **Elastin**, **Collagen**, **Nuclei**, **Mucins**,
   **Cell Membrane**, **Goblets**, **Epithelium**.
3. Click one of:
   - **Show as mask** — adds a new layer that's white everywhere except the
     ticked classes, colored with the same palette as the inference widget.
   - **Show on original** — adds a copy of the original image with all
     pixels outside the ticked classes turned white. Useful for sanity-
     checking the segmentation against the stain.
4. **Clear isolated layers** removes everything this panel added in one go.

Masks are read on demand from the inference output folder; the original
layer's pixels are taken from the viewer.
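
**Show on original** amounts to a masked copy of the image. The NumPy sketch below shows the idea for a single integer class mask. `show_on_original` is a hypothetical name, and the class IDs passed in are assumptions; the widget's own label values may differ, and Epithelium comes from a separate binary mask.

```python
import numpy as np


def show_on_original(image: np.ndarray, mask: np.ndarray, class_ids) -> np.ndarray:
    """Return a copy of an RGB image with pixels outside class_ids set to white."""
    keep = np.isin(mask, list(class_ids))  # (H, W) boolean, True for ticked classes
    out = np.full_like(image, 255)         # start from an all-white canvas
    out[keep] = image[keep]                # restore original pixels of ticked classes
    return out
```

The function expects `image` with shape `(H, W, 3)` and `mask` with shape `(H, W)`, and it leaves the input image unmodified.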

## License

MIT (matches `license = {text = "MIT"}` in `pyproject.toml`). The full text is in the `LICENSE` file that ships with the package.
