Metadata-Version: 2.4
Name: prism-de
Version: 0.2.0
Summary: PRISM: Probabilistic Inference of Subject-level Mixture for contextualized differential expression
Author-email: Andrea Rubbi <ar2232@cam.ac.uk>, Ben Lengerich <blengeri@mit.edu>
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.2
Requires-Dist: numpy>=1.26
Requires-Dist: scipy>=1.12
Requires-Dist: pandas>=2.1
Requires-Dist: scanpy>=1.10
Requires-Dist: anndata>=0.10
Requires-Dist: matplotlib>=3.8
Requires-Dist: seaborn>=0.13
Requires-Dist: statsmodels>=0.14
Requires-Dist: scikit-learn>=1.4
Requires-Dist: rich
Requires-Dist: pyyaml>=6.0
Provides-Extra: full
Requires-Dist: lightning>=2.2; extra == "full"
Requires-Dist: wandb; extra == "full"
Requires-Dist: gseapy>=1.1; extra == "full"
Requires-Dist: Py-BOBYQA>=1.4; extra == "full"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: torch-tb-profiler; extra == "dev"

# PRISM

**Phenotype-Resolved Inference in Single-Cell Mixed Models via Latent Disease States and Contextualized Differential Expression**

<p align="center">
  <video src="media/prism_logo.mp4" autoplay loop muted playsinline width="640"></video>
</p>

<details open>
<summary><strong>▶ Full model explainer (1080p, ~1 min)</strong></summary>
<p align="center">
  <video src="media/prism_explainer.mp4" controls width="720"></video>
</p>
</details>

## Overview

PRISM extends the NEBULA negative-binomial log-normal mixed model for
single-cell differential expression by introducing:

1. **Latent per-cell disease states** $d_{ij} \sim \text{Bernoulli}(\rho_i)$ that
   separate truly affected cells from unaffected ones within disease subjects.
2. **Contextualized covariate effects** $x^\top \Gamma_g z$ that model how context
   modulates all covariate effects (a main effect of context on expression).
3. **Context-dependent disease effects** $\Delta_g(z) = \alpha_g + \theta_g^\top z$ that let
   the DE magnitude vary with continuous cell-level covariates ($\alpha_g$ = constant disease effect, $\theta_g$ = context modulation).
4. An **EM algorithm** that alternates between inferring $q_{ij} = P(d_{ij}=1 \mid Y, \Theta)$
   (E-step) and optimizing the NEBULA-LN approximate likelihood weighted by $q_{ij}$ (M-step).
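
The four ingredients above can be illustrated with a toy, standard-library-only sketch. It is not the package's implementation: it leaves out the subject-level random effects and the NEBULA-LN approximation, treats genes as independent given $d_{ij}$, and uses a fixed inverse dispersion `phi`. All function names here are hypothetical.

```python
import math

def covariate_term(x, Gamma, z):
    """Bilinear term x^T Gamma_g z; Gamma is a list of rows (n_covars x n_context)."""
    return sum(xi * sum(g * zk for g, zk in zip(row, z)) for xi, row in zip(x, Gamma))

def disease_effect(alpha, theta, z):
    """Delta_g(z) = alpha_g + theta_g^T z for a single gene."""
    return alpha + sum(t * zk for t, zk in zip(theta, z))

def nb_logpmf(y, mu, phi):
    """Negative-binomial log-pmf with mean mu and variance mu + mu^2 / phi."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + mu)) + y * math.log(mu / (phi + mu)))

def posterior_q(counts, base_log_mu, alphas, thetas, z, rho, phi=10.0):
    """Toy E-step: P(d = 1 | y) for one cell by Bayes' rule over all genes.

    base_log_mu[g] is the log-mean of gene g when d = 0 (it could include
    covariate_term); the disease effect is added on the log scale when d = 1.
    """
    log_p1 = math.log(rho)
    log_p0 = math.log1p(-rho)
    for y, b, a, th in zip(counts, base_log_mu, alphas, thetas):
        log_p1 += nb_logpmf(y, math.exp(b + disease_effect(a, th, z)), phi)
        log_p0 += nb_logpmf(y, math.exp(b), phi)
    # Normalize in log-space to avoid overflow
    m = max(log_p0, log_p1)
    w1, w0 = math.exp(log_p1 - m), math.exp(log_p0 - m)
    return w1 / (w0 + w1)

# One cell, two genes: gene 0 has a positive disease effect, gene 1 has none
q = posterior_q(counts=[30, 2], base_log_mu=[math.log(5.0), math.log(2.0)],
                alphas=[1.5, 0.0], thetas=[[0.0], [0.0]], z=[0.3], rho=0.5)
print(f"q = {q:.4f}")
```

When every gene has $\alpha_g = 0$ and $\theta_g = 0$, the two likelihoods coincide and the sketch returns $q_{ij} = \rho_i$, which is a quick sanity check on the Bayes-rule step.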

## Installation

### Option A — conda (recommended)

```bash
git clone https://github.com/AndreaRubbi/ContextualizedDifferentialExpression.git && cd ContextualizedDifferentialExpression
conda env create -f environment.yml
conda activate prism
pip install -e .
```

### Option B — pip + venv

```bash
# Run from the repository root (clone it as in Option A)
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# If your driver supports CUDA 12.4, pin a compatible torch build:
pip install torch==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124

# Required for HVG selection (seurat_v3 flavor):
pip install scikit-misc
```

### Verify

```bash
python -c "
import torch, prism
print(f'PRISM v{prism.__version__}')
print(f'PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}')
"
```
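
Beyond the core check above, it can help to see which optional packages are importable. The sketch below uses only the standard library; the mapping from distribution names (the `full` extra plus `scikit-misc`) to import names is spelled out in the code, and the helper name `missing_optional` is hypothetical.

```python
from importlib.util import find_spec

# Distribution name -> top-level import name
OPTIONAL_MODULES = {
    "lightning": "lightning",
    "wandb": "wandb",
    "gseapy": "gseapy",
    "Py-BOBYQA": "pybobyqa",
    "scikit-misc": "skmisc",
}

def missing_optional(modules=OPTIONAL_MODULES):
    """Return the distribution names whose import module cannot be found."""
    return [dist for dist, mod in modules.items() if find_spec(mod) is None]

print("missing optional packages:", missing_optional())
```

An empty list means every optional dependency was found in the active environment.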

## Quick start

By default, PRISM automatically builds a two-stage prior on the per-cell disease
probabilities $q_{ij}$ from disease biology, so no extra configuration is needed.
On real cohorts this anchors the EM to the disease axis and avoids the
latent-axis identifiability problem (Discussion §latent).

```python
from prism import PRISMConfig, PRISMTrainer, PrismData
from prism.data.simulation import generate_prism_data

# Simulate a small cohort: 50 subjects x 200 cells each, 60 genes
data, _ = generate_prism_data(n_subjects=50, n_genes=60, n_cells_per_subject=200,
                              rho=0.6, seed=42, device="cpu")
# Model dimensions are read off the simulated data
trainer = PRISMTrainer(PRISMConfig(
    n_genes=data.n_genes, n_covars=data.n_covars, n_context=data.n_context,
    max_em_iter=30, run_wald=True, device="cpu",
))                                  # auto_q_prior=True is the default
results = trainer.fit(data)
# Genes called for general DE at FDR < 0.05
print("general DE (FDR<0.05):", int((results.q_values_de < 0.05).sum()))
# Genes whose smallest q-value along dim=1 (the context dimensions) is below 0.05
print("context DE (FDR<0.05):", int((results.q_values_context.min(dim=1).values < 0.05).sum()))
```
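
The two counts printed above threshold q-values at 0.05. The README does not state which FDR procedure PRISM uses, so the following standard-library sketch assumes the common Benjamini-Hochberg adjustment purely for illustration; the helper names are hypothetical. The last helper mirrors the `min(dim=1)` rule: a gene counts as context-DE if any of its context dimensions passes the threshold.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    qvals = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        qvals[i] = running_min
    return qvals

def count_significant(qvals, threshold=0.05):
    """Number of genes with q-value below the threshold."""
    return sum(q < threshold for q in qvals)

def count_context_de(q_matrix, threshold=0.05):
    """q_matrix[g][k] is the q-value of gene g for context dimension k."""
    return sum(min(row) < threshold for row in q_matrix)

pvals = [0.01, 0.04, 0.03, 0.005]
print("BH q-values:", bh_qvalues(pvals))
print("context DE genes:", count_context_de([[0.2, 0.01], [0.5, 0.6], [0.04, 0.9]]))
```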

### Anchoring strategies for real data

- **Curated marker prior** (best when biology is known):
  ```python
  from prism import PRISMConfig, marker_prior, DAM_MICROGLIA_UP, HOMEOSTATIC_MICROGLIA
  import torch
  # `adata` is your AnnData object; "disease" names its condition column
  prior = marker_prior(adata, up_markers=DAM_MICROGLIA_UP,
                       down_markers=HOMEOSTATIC_MICROGLIA, condition_col="disease")
  cfg = PRISMConfig(..., q_prior=torch.from_numpy(prior), q_prior_weight=0.9)
  ```
- **Two-stage prior** (data-adaptive; this is what `auto_q_prior=True` builds for you).
- **Safe fallback, PRISM-C** (`fix_q=True`): pins q to the donor label. This gives up
  the per-cell $q_{ij}$ and the context-dependent term $\theta_g^\top z$, but matches bulk DE most closely.
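
To make the idea behind these strategies concrete, here is a toy standard-library sketch of a marker-score prior and of pinning q to the donor label. It is not what `marker_prior` or `fix_q=True` do internally, which this README does not describe. The scoring rule (mean of up-markers minus mean of down-markers, centered on control cells and squashed by a sigmoid), the choice to give control cells a prior of 0, and every name below are assumptions made for illustration.

```python
import math
from statistics import mean, median

def marker_score_prior(expr, up_idx, down_idx, is_disease, scale=1.0):
    """Toy per-cell prior P(d = 1) from marker genes.

    expr[c][g] is the (log-normalized) expression of gene g in cell c;
    is_disease[c] says whether cell c comes from a disease donor.
    """
    scores = [mean(cell[i] for i in up_idx) - mean(cell[i] for i in down_idx)
              for cell in expr]
    # Center on control cells so that a "typical healthy" score maps to 0.5
    control = [s for s, dis in zip(scores, is_disease) if not dis]
    center = median(control) if control else median(scores)
    prior = []
    for s, dis in zip(scores, is_disease):
        if not dis:
            prior.append(0.0)  # assumption: control-donor cells are unaffected
        else:
            prior.append(1.0 / (1.0 + math.exp(-(s - center) / scale)))
    return prior

def donor_label_q(is_disease):
    """Toy analogue of pinning q to the donor label: 1 for disease donors, else 0."""
    return [1.0 if dis else 0.0 for dis in is_disease]

expr = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # 3 cells x 2 genes
flags = [True, True, False]                    # last cell is from a control donor
print(marker_score_prior(expr, up_idx=[0], down_idx=[1], is_disease=flags))
print(donor_label_q(flags))
```

The contrast between the two outputs is the trade-off named in the list: the marker prior gives graded per-cell values within disease donors, while the donor-label version assigns the same value to every cell of a donor.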

## Project layout

```
prism/
├── prism/
│   ├── model/           # PrismModel, encoders, NB/HL likelihoods
│   ├── inference/       # EM (e_step, m_step, em_loop), Wald & score tests
│   ├── data/            # PrismData, simulation, ROSMAP preprocessing
│   ├── baselines/       # NEBULA, context-only, latent-only, stratified
│   ├── evaluation/      # Metrics and plotting
│   └── utils/           # Numerical helpers, logging
├── tests/               # Unit and integration tests
└── tutorials/           # Notebook tutorials (API, synthetic, ROSMAP)
```

## Tutorials

* [tutorials/getting_started.ipynb](tutorials/getting_started.ipynb) — end-to-end
  API walkthrough on simulated data (fit, test, interpret).
* [tutorials/synthetic_data_tutorial.ipynb](tutorials/synthetic_data_tutorial.ipynb) —
  generate data, sweep parameters, plot FPR/power.
* [tutorials/rosmap_real_data.ipynb](tutorials/rosmap_real_data.ipynb) — applying
  PRISM to ROSMAP microglia.
* [docs/TUTORIAL.md](docs/TUTORIAL.md) — installation + CLI reference.

## Reproducing the paper

All scripts, pipelines and notebooks that reproduce the paper's results live
in the top-level `reproducibility/` folder (Snakemake ablations, NEBULA-style
benchmark, external baselines, ROSMAP and COVID real-data analyses). See
[../reproducibility/README.md](../reproducibility/README.md).

## Citation

```bibtex
@article{prism2026,
  title={PRISM: Phenotype-Resolved Inference in Single-Cell Mixed Models
         via Latent Disease States and Contextualized Differential Expression},
  author={Anonymous Authors},
  journal={Under review},
  year={2026}
}
```

## Regenerating the animations

The animated visuals are built with [Manim Community](https://www.manim.community/).

```bash
pip install manim
cd media

# Standalone logo (1080p MP4)
manim -qh prism_explainer.py PRISMLogo

# Full explainer (1080p MP4)
manim -qh prism_explainer.py PRISMExplainer

# Full explainer (720p GIF — large file)
manim -qm --format=gif prism_explainer.py PRISMExplainer
```
