Metadata-Version: 2.4
Name: storm-omics
Version: 1.1.0
Summary: Statistical Test for spatial patterns using k-nearest neighbors (STORM)
Author: Yiqing Wang, Jinpu Li
Author-email: yqw@wangemail.com, lijinp@health.missouri.edu
License: GPL-3.0-only
Project-URL: Equivalent R implementation, https://github.com/CastleLi/STORM
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24.4
Requires-Dist: pandas>=1.5
Requires-Dist: scipy>=1.10.1
Requires-Dist: scikit-learn>=1.3.2
Provides-Extra: gpu
Requires-Dist: torch>=2.0; extra == "gpu"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: license
Dynamic: license-file
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# STORM-OMICS

## A Principled Statistical Framework for Analyzing Spatial Patterns in Spatially Resolved Multi-Omics

STORM is a principled statistical method for identifying, quantifying, and
comparing spatial patterns in spatially resolved multi-omics data. Unlike
conventional approaches that focus solely on statistical significance, STORM
provides a unified framework for:

- Identifying spatially variable features (SVFs)
- Quantifying the magnitude of spatial dependency through interpretable
  effect sizes
- Comparing spatial patterns across biological conditions
- Performing statistical power analysis and sample size determination for
  spatial studies

The framework is designed to support both single-sample and multi-sample
analyses, enabling rigorous statistical inference in modern spatial
transcriptomics and spatial multi-omics experiments.

This repository provides a standalone Python package equivalent to
[CastleLi/STORM](https://github.com/CastleLi/STORM).

## Features

- Approximate and finite-sample STORM tests
- Dense NumPy, pandas DataFrame, and SciPy sparse expression matrices
- Batched CPU execution that preserves sparse inputs
- Optional batched PyTorch CUDA execution
- Reusable immutable exact k-NN graphs for repeated analyses
- Treatment-versus-control and treatment-versus-treatment comparisons
- Power analysis for all three tests

## Installation

The PyPI distribution name is `storm-omics`; the Python import
is `storm_omics`. Python identifiers cannot contain hyphens. The shorter
`storm` distribution name belongs to an unrelated
Canonical ORM package. Install STORM-OMICS from PyPI:

```bash
python -m pip install storm-omics
```

### GPU support

Install a CUDA-enabled PyTorch build appropriate for your operating system,
driver, and CUDA runtime using the
[official PyTorch installer](https://pytorch.org/get-started/locally/), then
install STORM:

```bash
python -m pip install "storm-omics[gpu]"
```

GPU execution is opt-in with `use_gpu=True`. If PyTorch or CUDA is not
available, STORM uses the CPU. If a CUDA operation fails, STORM emits a
`RuntimeWarning` and recomputes on the CPU.

## Spatial Pattern Test

The expression matrix must have shape `M x N`: spots in rows and features in
columns. Coordinates must have shape `M x D`.

```python
import pandas as pd
import storm_omics

data = pd.read_csv("spatial_data.csv")
coords = data[["x", "y", "z"]].to_numpy()
expression = data.drop(columns=["x", "y", "z"])

result = storm_omics.storm(
    coords,
    expression,
    k_nn=50,
    approx=True,
    use_gpu=False,
)
```

`storm` returns a DataFrame with:

- `gene_names`: DataFrame column names, or generated names for array inputs
- `p_values`: two-sided p-values for spatial dependency
- `effect_size`: spatial effect size, `1 - S2 / S0`

Example output:

| gene_names | p_values | effect_size |
| --- | ---: | ---: |
| `GeneA` | 0.0001 | 0.065 |
| `GeneB` | 0.0123 | 0.041 |

`k_nn` defaults to 50 for the approximation and 100 for the finite-sample
calculation. It is capped at `M - 1`. Constant features receive a neutral
p-value of 1 and effect size of 0.

To use CUDA:

```python
result_gpu = storm_omics.storm(coords, expression, k_nn=50, use_gpu=True)
```

On the CPU, sparse inputs remain sparse throughout each feature batch. CUDA
uses a sparse neighbor graph but transfers dense feature batches, with the
batch size chosen from currently available VRAM. Large matrices remain
float32; column reductions use bounded partial sums accumulated in float64 for
CPU/GPU agreement without full-size float64 GPU copies.

### Reference performance

Measured June 26, 2026 with Python 3.13, PyTorch 2.10.0, CUDA 12.8, an NVIDIA
GeForce RTX 5070 Ti (15.9 GB), `k_nn=50`, `approx=True`, and the included real
2,308-spot by 10,000-feature dataset tiled with small coordinate jitter:

| Copies | Spots | CPU | GPU | Speedup | CPU-run peak | GPU-run host peak | GPU peak VRAM |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1x | 2,308 | 0.197 s | 0.091 s | 2.17x | 164 MB | 154 MB | 193 MB |
| 2x | 4,616 | 0.393 s | 0.156 s | 2.51x | 242 MB | 244 MB | 259 MB |
| 3x | 6,924 | 0.584 s | 0.238 s | 2.46x | 335 MB | 334 MB | 260 MB |
| 4x | 9,232 | 0.759 s | 0.285 s | 2.66x | 445 MB | 440 MB | 262 MB |

All four CPU/GPU comparisons had identical `p < 0.05` calls and maximum
p-value and effect-size differences below `5e-5`. Host peak allocations are
measured inside `storm` and exclude the already-loaded input DataFrame and
interpreter. GPU memory is PyTorch peak allocated VRAM. Timings are median of
five; memory-mode timings include tracing overhead and are therefore not shown
here.

### Reusing the exact neighbor graph

When several expression matrices share identical coordinates and neighbor
count, prepare the graph once:

```python
graph = storm_omics.prepare_storm_graph(coords, k_nn=50)

result_a = storm_omics.storm(coords, expression_a, graph=graph)
result_b = storm_omics.storm(coords, expression_b, graph=graph, use_gpu=True)
```

`prepare_storm_graph` runs the same exact directed k-NN construction as the
ordinary call. The graph is immutable, `storm` verifies that its coordinates
and neighbor count match, and exact-mode's graph-only scaling constant is
cached lazily. Reuse therefore changes only setup cost, not any statistic.

On the 4x real dataset, median repeated-call time decreased from 0.759 s to
0.702 s on CPU and from 0.285 s to 0.230 s on GPU. Cold graph preparation took
about 0.2 s, so it paid for itself after roughly four analyses on either path
in this environment.

## Group Comparisons

### Treatment versus control

`stormtrt` implements a one-sided pooled two-proportion test with a default
per-sample significance threshold of 0.05.

```python
import pandas as pd
import storm_omics

data = pd.DataFrame({
    "Group": ["Control"] * 4 + ["Treatment"] * 4,
    "Pvalue": [0.40, 0.20, 0.01, 0.30, 0.01, 0.02, 0.03, 0.20],
})

comparison = storm_omics.stormtrt(data, control="Control", sig_level=0.05)
```

### Two treatment groups

```python
data = pd.DataFrame({
    "Group": ["A"] * 3 + ["B"] * 3,
    "EffectSize": [0.10, 0.11, 0.09, 0.02, 0.03, 0.01],
    "M": [3000] * 6,
    "K": [50] * 6,
})

comparison = storm_omics.storm2trt(data, alternative="two.sided")
```

## Power Analysis

Exactly one parameter described as unknown must be `None`.

```python
import storm_omics

# Single-sample STORM: solve for number of spots.
single = storm_omics.power_storm(
    es=0.06, n=None, power=0.80, sig_level=0.05
)

# Treatment versus control: solve for samples per control group.
control = storm_omics.powertrt(
    nsample=None,
    power=0.90,
    sig_level=0.05,
    ratio=1,
    power_single=0.80,
    sig_level_single=0.05,
)

# Two treatments: solve for group-1 sample size.
two_treatments = storm_omics.power2trt(
    delta=0.04,
    phi=4 / (50 * 3000),
    psi1=0.010,
    psi2=0.005,
    nsample=None,
    ratio=2,
    sig_level=0.05,
    power=0.80,
)
```

Power calculations return continuous sample-size estimates. Round up before
using them as study sizes.

## Tests and Benchmarks

```bash
python -m pytest
python test/benchmark_cpu_gpu.py --multipliers 1 2 3 4 --reps 5
python test/benchmark_cpu_gpu.py --multipliers 4 --reps 5 --reuse-graph
python test/benchmark_memory.py --multipliers 1 2 3 4
```

Benchmark results depend on data density, feature count, neighbor count, CPU,
GPU, CUDA/PyTorch versions, and available memory. Run the included scripts on
the target system rather than relying on timings from another machine.

## Upstream Reproduction

The upstream repository contains the manuscript reproduction scripts for data
processing, simulations, benchmark analyses, figures, and supplementary
analyses in its
[`Reproduction`](https://github.com/CastleLi/STORM/tree/main/Reproduction)
directory. Those R and manuscript-specific scripts are not duplicated in this
Python distribution.

## Citation

When using this implementation, cite the STORM method and the upstream
[CastleLi/STORM](https://github.com/CastleLi/STORM) software. The upstream
manuscript citation is still listed as pending and should be added here once
its final bibliographic record is available.

STORM authors: Jinpu Li
([ORCID 0000-0002-6656-2896](https://orcid.org/0000-0002-6656-2896)) and
Yiqing Wang.
