Metadata-Version: 2.1
Name: vpt_plugin_instanseg
Version: 1.0.0
Summary: Instanseg Plugin for VPT
Home-page: https://github.com/Vizgen/vizgen-postprocessing
License: Apache-2.0
Author: Vizgen
Author-email: techsupport@vizgen.com
Maintainer: Stanislav Lukyanenko
Maintainer-email: stanislav.lukyanenko@vizgen.com
Requires-Python: >=3.9,<3.11
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Image Processing
Requires-Dist: einops (>=0.8.1,<0.9.0)
Requires-Dist: fastremap (>=1.17.7,<2.0.0)
Requires-Dist: fill-voids (>=2.1.1,<3.0.0)
Requires-Dist: geojson (>=3.2.0,<4.0.0)
Requires-Dist: matplotlib (<3.10)
Requires-Dist: natsort (>=8.4.0,<9.0.0)
Requires-Dist: numba (>=0.55.2)
Requires-Dist: numpy (>=1.24.3)
Requires-Dist: pandas (>=2.0.3)
Requires-Dist: roifile (==2023.5.12)
Requires-Dist: tiffslide (>=2.5.1,<3.0.0)
Requires-Dist: torch (>=2.0.0,!=2.0.1,!=2.1.0)
Requires-Dist: torchvision (==0.19.1)
Requires-Dist: vpt_core (>=1.1.0)
Project-URL: Documentation, https://vizgen.github.io/vizgen-postprocessing/
Project-URL: Repository, https://github.com/Vizgen/vizgen-postprocessing
Description-Content-Type: text/markdown

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Coverage Status](https://coveralls.io/repos/github/Vizgen/vpt-plugin-instanseg-internal/badge.svg)](https://coveralls.io/github/Vizgen/vpt-plugin-instanseg-internal)

# InstanSeg Plugin for VPT

This Vizgen Post-processing Tool (VPT) plugin enables users to run
[InstanSeg](https://github.com/instanseg/instanseg) cell and nuclei
segmentation on MERSCOPE experiments. VPT is a command line tool that
emphasizes scalable, reproducible analysis and can run on a workstation,
on a cluster, or in a cloud computing environment. **Linux is
recommended for production use.**


## Key Features

- Cell and nuclei segmentation using the bundled InstanSeg fluorescence model
- Channel-invariant architecture — works with any combination of fluorescence stains
- Configurable target output: **cell** polygons, **nuclei** polygons, or both masks
  (when `target` is `"all_outputs"` the plugin returns **cell** polygons to VPT)
- Bring your own weights via the `custom_weights` TorchScript path
- Automatic device selection: CUDA > MPS > CPU (not user-configurable;
  determined at runtime by `torch` availability)
- Full VPT integration: tiled execution, polygon generation, overlap resolution, and `.vzg` file update
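The CUDA > MPS > CPU priority above reduces to a small decision rule. The real plugin queries `torch.cuda.is_available()` and `torch.backends.mps.is_available()` at runtime; `select_device` below is an illustrative sketch, not the plugin's actual API:

```python
def select_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick the best available device in priority order: CUDA > MPS > CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```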

### Why InstanSeg?

InstanSeg is a lightweight, channel-invariant segmentation model optimized for
speed. Key advantages:

- **Fast inference** — significantly faster than transformer-based models,
  making it well-suited for large-scale production workflows
- **Channel-invariant** — accepts any number of fluorescence channels in any
  order; no need to match training channel configurations
- **Flexible output** — can produce cell masks, nuclei masks, or both

> **Tip:** For best results, provide only the channels that contain useful
> boundary information (e.g. DAPI, membrane stains). Adding channels with no
> cell structure (e.g. gene-specific FISH) may reduce accuracy.


## Requirements

| Requirement | Details |
|-------------|---------|
| **Python** | 3.9 or 3.10 (3.11+ not supported) |
| **OS** | Linux recommended; macOS/Windows may work but are untested |
| **GPU** | NVIDIA GPU (CUDA) or Apple Silicon (MPS) recommended; CPU fallback is slower |
| **Disk** | ~200 MB for bundled model weights |
| **RAM** | 16 GB+ recommended for large tiles |


## Installation

Install VPT (the host CLI) **and** this plugin into the same Python environment:

```bash
pip install vpt
pip install vpt-plugin-instanseg
```

Or install from source with [Poetry](https://python-poetry.org/):

```bash
pip install vpt                    # host CLI — required separately
git clone https://github.com/Vizgen/vpt-plugin-instanseg-internal.git
cd vpt-plugin-instanseg-internal
poetry install                     # installs the plugin + bundled InstanSeg
```

Verify the installation:

```bash
vpt --help
```

### Bundled InstanSeg code

This package includes a subset of the
[instanseg/instanseg](https://github.com/instanseg/instanseg) repository
(inference engine, utility code, and model checkpoints). The bundled code is
licensed under **Apache-2.0**; see [`src/instanseg/LICENSE`](src/instanseg/LICENSE)
for the full text and attribution. Training scripts, data loaders, and export
utilities are excluded from the distributed wheel.


## Model Selection

The recommended model for MERSCOPE experiments is
`fluorescence_nuclei_and_cells` (v0.1.1). It segments both nuclei and cells
from any combination of fluorescence stains (DAPI, PolyT, Cellbound, etc.).
The model is channel-invariant — channel order does not matter, and you can
pass any number of channels as `task_input_data` entries.

**`version`** selects a specific model release. Use an explicit version string
(e.g. `"0.1.1"`) for reproducible pipelines. `"latest"` resolves to the newest
bundled version at runtime.

**`custom_weights`** overrides bundled model resolution entirely. When set, the
plugin loads the TorchScript `.pt` file at that path and ignores `model` and
`version`.
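The resolution rule for `version` can be sketched as follows; the function name, error message, and the assumption that versions sort numerically by dotted components are illustrative, not the plugin's actual code:

```python
def resolve_version(requested: str, bundled: list[str]) -> str:
    """Resolve "latest" to the newest bundled release; pass explicit versions through."""
    if requested != "latest":
        if requested not in bundled:
            raise ValueError(f"Unknown model version: {requested}")
        return requested
    # Sort numerically by dotted components, e.g. "0.1.1" -> (0, 1, 1)
    return max(bundled, key=lambda v: tuple(int(p) for p in v.split(".")))
```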


## Usage

VPT runs segmentation from a **segmentation specification JSON** that describes
experiment properties, input channels, model parameters, and output files.
The same spec run against multiple experiments ensures identical, reproducible
processing.

Working specs are provided in
[`example_analysis_algorithm/`](example_analysis_algorithm/):

| File | Dataset | Channels | Notes |
|------|---------|----------|-------|
| [`segmentation_specification.json`](example_analysis_algorithm/segmentation_specification.json) | Generic MERSCOPE | DAPI, PolyT | Default template (pixel_size 0.1) |
| [`U2OS_segmentation_specification.json`](example_analysis_algorithm/U2OS_segmentation_specification.json) | [U2OS_small_set](https://vzg-web-resources.s3.amazonaws.com/202305010900_U2OS_small_set_VMSC00000.zip) | DAPI, PolyT, Cellbound1 | 3-channel, 7 z-levels, pixel_size 0.108 |

### Quick start

```bash
vpt --verbose run-segmentation \
  --segmentation-algorithm path/to/instanseg_spec.json \
  --input-images 'path/to/images/mosaic_(?P<stain>[\w|-]+)_z(?P<z>[0-9]+).tif' \
  --input-micron-to-mosaic path/to/micron_to_mosaic_pixel_transform.csv \
  --output-path output_dir/ \
  --tile-size 2400 \
  --tile-overlap 200
```
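The `--input-images` argument is a regular expression with named groups that VPT uses to discover mosaic images and read the stain name and z-index out of each filename. A minimal illustration of what the named groups capture (VPT does this matching internally):

```python
import re

# Named-group pattern from the quick-start command above (dot escaped here)
pattern = re.compile(r"mosaic_(?P<stain>[\w|-]+)_z(?P<z>[0-9]+)\.tif")

m = pattern.search("images/mosaic_DAPI_z3.tif")
stain, z = m.group("stain"), int(m.group("z"))  # "DAPI", 3
```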

### Example: U2OS small dataset

The [U2OS_small_set](https://vzg-web-resources.s3.amazonaws.com/202305010900_U2OS_small_set_VMSC00000.zip)
is a public MERSCOPE dataset (3953 x 3960 px, 5 stains, 7 z-levels) suitable
for end-to-end verification. The example spec selects three of the five stains
(DAPI, PolyT, Cellbound1) and segments z-layer 3:

```bash
# Download and extract (~960 MB)
wget -q https://vzg-web-resources.s3.amazonaws.com/202305010900_U2OS_small_set_VMSC00000.zip
unzip -q 202305010900_U2OS_small_set_VMSC00000.zip

# Run segmentation
vpt --verbose run-segmentation \
  --segmentation-algorithm example_analysis_algorithm/U2OS_segmentation_specification.json \
  --input-images '202305010900_U2OS_small_set_VMSC00000/images/mosaic_(?P<stain>[\w|-]+)_z(?P<z>[0-9]+).tif' \
  --input-micron-to-mosaic 202305010900_U2OS_small_set_VMSC00000/images/micron_to_mosaic_pixel_transform.csv \
  --output-path u2os_output/ \
  --tile-size 2400 --tile-overlap 200
```

**Expected results:** ~860 cells across 4 tiles. After completion the output
directory contains:

| File | Description |
|------|-------------|
| `result_tiles/` | Per-tile intermediate segmentation results |
| `instanseg_mosaic_space.parquet` | Cell boundary polygons in **mosaic pixel** coordinates |
| `instanseg_micron_space.parquet` | Cell boundary polygons in **micron** coordinates (transformed via `micron_to_mosaic_pixel_transform.csv`) |
| `instanseg_cell_metadata.csv` | Per-cell geometric metadata (area, centroid, bounding box) |
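If you need to move between the two coordinate spaces yourself: in public MERSCOPE datasets, `micron_to_mosaic_pixel_transform.csv` holds a 3x3 affine matrix mapping micron coordinates to mosaic pixels (an assumption based on that format; the matrix values below are illustrative, not from a real run). Inverting it recovers micron coordinates from mosaic pixels:

```python
import numpy as np

# Illustrative affine matrix: scale ~9.26 px/um plus a translation offset
micron_to_mosaic = np.array([
    [9.25926, 0.0,     100.0],
    [0.0,     9.25926, 200.0],
    [0.0,     0.0,     1.0],
])

def mosaic_to_micron(xy_px: np.ndarray) -> np.ndarray:
    """Map an Nx2 array of mosaic pixel coordinates back to micron space."""
    inv = np.linalg.inv(micron_to_mosaic)
    homog = np.hstack([xy_px, np.ones((len(xy_px), 1))])  # to homogeneous coords
    return (homog @ inv.T)[:, :2]
```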


### Segmentation specification anatomy

A spec JSON has four top-level sections. The plugin reads only the two blocks
marked `** InstanSeg plugin **` in the annotated example below; the rest are
consumed by VPT itself. For a complete description
of VPT-level parameters (`task_input_data`, `polygon_parameters`,
`segmentation_task_fusion`, `output_files`), see the

```jsonc
{
  "experiment_properties": {            // VPT: z-index and z-position metadata
    "all_z_indexes": [0, 1, 2, ...],
    "z_positions_um": [1.5, 3.0, ...]
  },
  "segmentation_tasks": [
    {
      "task_id": 0,
      "segmentation_family": "InstanSeg",  // VPT: selects this plugin
      "entity_types_detected": ["cell"],   // VPT: output entity type
      "z_layers": [3],                     // VPT: which z-planes to segment
      "segmentation_properties": {         // ** InstanSeg plugin **
        "model": "fluorescence_nuclei_and_cells",
        "model_dimensions": "2D",
        "version": "0.1.1",
        "custom_weights": null
      },
      "task_input_data": [ ... ],          // VPT: channel selection + preprocessing
      "segmentation_parameters": {         // ** InstanSeg plugin **
        "pixel_size": 0.1,
        "normalise": true,
        "target": "all_outputs",
        "rescale_output": true
      },
      "polygon_parameters": { ... }        // VPT: mask-to-polygon conversion
    }
  ],
  "segmentation_task_fusion": { ... },  // VPT: multi-task entity fusion
  "output_files": [ ... ]              // VPT: output file paths
}
```

#### `segmentation_properties`

| Key | Type | Required | Description |
|-----|------|----------|-------------|
| `model` | `str` | **yes** | Bundled model name (see [Model Selection](#model-selection)) |
| `model_dimensions` | `str` | **yes** | Must be `"2D"` |
| `version` | `str` | **yes** | Model version (e.g. `"0.1.1"`) or `"latest"` |
| `custom_weights` | `str` or `null` | no | Path to a TorchScript `.pt` file. When set, overrides `model`/`version` |

#### `segmentation_parameters`

| Key | Type | Required | Default | Description |
|-----|------|----------|---------|-------------|
| `pixel_size` | `float` | no | `0.1` | Image pixel size in microns. Must be > 0 |
| `normalise` | `bool` | no | `true` | Percentile-normalize input before inference |
| `target` | `str` | no | `"all_outputs"` | `"all_outputs"` (cell polygons), `"nuclei"`, or `"cells"` |
| `rescale_output` | `bool` | no | `true` | Rescale output masks to input coordinate space |

> **Note on `target`:** When `target` is `"all_outputs"` the model produces
> both a nuclei and a cells mask, but the plugin returns only the **cells**
> layer to VPT for polygon generation. Use `"nuclei"` if you need nucleus
> boundaries instead.
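The `normalise` flag enables percentile-based intensity normalization before inference. A sketch of the idea follows; the exact percentiles used by the bundled model are an assumption here, and `percentile_normalise` is illustrative rather than the plugin's actual function:

```python
import numpy as np

def percentile_normalise(img: np.ndarray,
                         p_low: float = 0.1, p_high: float = 99.9) -> np.ndarray:
    """Rescale intensities so the p_low/p_high percentiles map to ~0 and ~1."""
    lo, hi = np.percentile(img, [p_low, p_high])
    # Guard against a constant image (hi == lo) with a small epsilon
    return (img - lo) / max(hi - lo, 1e-8)
```

This makes inference robust to exposure differences between channels and experiments, which is why it defaults to `true`.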

### Validation

The plugin validates the spec at task load time and will **fail fast** with a
`ValueError` for any of:

- Missing or empty `model`
- `model_dimensions` other than `"2D"`
- `custom_weights` path that does not exist on disk
- `pixel_size` that is not a positive number
- `target` not in `{"all_outputs", "nuclei", "cells"}`
- Any unrecognized key in `segmentation_properties` or `segmentation_parameters`
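The parameter checks in the list above can be illustrated with a short sketch; the function name, defaults, and error messages are illustrative, not the plugin's actual code:

```python
VALID_TARGETS = {"all_outputs", "nuclei", "cells"}

def validate_parameters(params: dict) -> None:
    """Fail fast with ValueError on an invalid segmentation_parameters block."""
    known = {"pixel_size", "normalise", "target", "rescale_output"}
    unknown = set(params) - known
    if unknown:
        raise ValueError(f"Unrecognized segmentation_parameters: {sorted(unknown)}")
    pixel_size = params.get("pixel_size", 0.1)
    if not isinstance(pixel_size, (int, float)) or pixel_size <= 0:
        raise ValueError("pixel_size must be a positive number")
    if params.get("target", "all_outputs") not in VALID_TARGETS:
        raise ValueError(f"target must be one of {sorted(VALID_TARGETS)}")
```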


## Limitations

- **2D only.** `model_dimensions` must be `"2D"`. The plugin processes each
  z-plane independently; there is no 3D volumetric segmentation.
- **Single entity type per run.** Although the model can produce both nuclei
  and cell masks, the plugin returns only one mask layer to VPT per task.
  Use `target` to choose which.
- **Device selection is automatic.** The plugin picks CUDA > MPS > CPU based
  on `torch` runtime availability. There is no configuration parameter to
  force a specific device.


## Troubleshooting

### CUDA out of memory

Reduce the tile size to lower GPU memory usage:

```bash
vpt run-segmentation ... --tile-size 1200 --tile-overlap 100
```

### Segmentation is very slow

1. **Confirm GPU is detected** — Check verbose output for device selection.
   If running on CPU, verify your CUDA or MPS installation.
2. **Check tile size** — Very large tiles (>4000 px) can slow down processing.

### No cells detected / poor segmentation

- Verify `pixel_size` matches your image resolution (MERSCOPE default: ~0.108 µm/px)
- Ensure input channels contain visible cell boundaries (DAPI, membrane stains)
- Avoid including channels with no structural information

### Custom weights fail to load

- Ensure the file is a valid TorchScript `.pt` file (not a raw PyTorch checkpoint)
- Verify the path is absolute or correct relative to the working directory
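One quick way to distinguish the two cases without importing `torch`: a TorchScript archive (produced by `torch.jit.save`) is a zip file containing a `code/` tree, whereas an old-style pickled checkpoint is not a zip at all. This is a heuristic sketch only; `looks_like_torchscript` is not part of the plugin, which relies on `torch.jit.load` for the real check:

```python
import zipfile

def looks_like_torchscript(path: str) -> bool:
    """Cheap pre-check: a TorchScript archive is a zip with a code/ tree.

    Note: torch.save() also produces a zip (since PyTorch 1.6), but one
    without code/ entries, so this heuristic catches that case too.
    """
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        return any("/code/" in name for name in zf.namelist())
```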


## Documentation

- [VPT User Guide](https://vizgen.github.io/vizgen-postprocessing/)
- [InstanSeg upstream repository](https://github.com/instanseg/instanseg)

### Citation

> Goldsborough, T. et al. (2024) 'A novel channel invariant architecture for
> the segmentation of cells and nuclei in multiplexed images using InstanSeg'.
> *bioRxiv*, p. 2024.09.04.611150.
> doi: [10.1101/2024.09.04.611150](https://doi.org/10.1101/2024.09.04.611150)


## Feedback & Support

For bugs or feature requests **related to this plugin or VPT**, please
[open an issue](https://github.com/Vizgen/vizgen-postprocessing/issues) on the
VPT repository (covers all Vizgen VPT plugins). Include:

- A brief summary of the issue
- Steps to reproduce
- The exception / traceback, if applicable

For other questions, contact your regional Vizgen field application scientist
and CC Vizgen Tech Support at techsupport@vizgen.com (include "VPT" in the
subject line).


## Contributing & Code of Conduct

We welcome code contributions! Please refer to the [contribution guide](CONTRIBUTING.md) and
[code of conduct](CODE_OF_CONDUCT.md) before getting started.


## Authors

- [Vizgen](https://vizgen.com/)

![Logo](https://vizgen.com/wp-content/uploads/2022/12/Vizgen-Logo_Vizgen-BlackColor-.png)


## License

Copyright 2024 Vizgen, Inc. Licensed under the
[Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).
See [LICENSE](LICENSE) for the full text.

The bundled InstanSeg code (`src/instanseg/`) is separately licensed under
Apache-2.0 by the InstanSeg authors. See [`src/instanseg/LICENSE`](src/instanseg/LICENSE).

