Metadata-Version: 2.2
Name: multigedi
Version: 1.1.0
Summary: Multi-modal Gene Expression Decomposition for Integration (CPU; GPU backend optional) - scverse-compliant multi-omics integration
Keywords: single-cell,multi-omics,integration,batch-correction,scverse
Author-Email: Arsham Mikaeili Namini <arsham.mikaeilinamini@mail.mcgill.ca>, "Hamed S. Najafabadi" <hamed.najafabadi@mcgill.ca>
Maintainer-Email: Arsham Mikaeili Namini <arsham.mikaeilinamini@mail.mcgill.ca>
License: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: C++
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Project-URL: Homepage, https://github.com/csglab/multigedi
Project-URL: Documentation, https://csglab.github.io/multigedi
Project-URL: Repository, https://github.com/csglab/multigedi
Project-URL: Issues, https://github.com/csglab/multigedi/issues
Requires-Python: >=3.10
Requires-Dist: anndata>=0.10
Requires-Dist: mudata>=0.3
Requires-Dist: numpy>=2.0
Requires-Dist: scipy>=1.13
Requires-Dist: pandas>=2.2
Requires-Dist: scikit-learn>=1.4
Requires-Dist: matplotlib>=3.9
Requires-Dist: h5py>=3.10
Provides-Extra: umap
Requires-Dist: umap-learn>=0.5; extra == "umap"
Provides-Extra: test
Requires-Dist: pytest>=8.0; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Provides-Extra: dev
Requires-Dist: pre-commit; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=7.0; extra == "docs"
Requires-Dist: furo; extra == "docs"
Requires-Dist: myst-parser; extra == "docs"
Requires-Dist: sphinx-copybutton; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints; extra == "docs"
Description-Content-Type: text/markdown

# multigedi

[![CI](https://github.com/Arshammik/multigedi/actions/workflows/test.yml/badge.svg)](https://github.com/Arshammik/multigedi/actions/workflows/test.yml)
[![Docs](https://github.com/Arshammik/multigedi/actions/workflows/docs.yml/badge.svg)](https://github.com/Arshammik/multigedi/actions/workflows/docs.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python 3.10-3.12](https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12-blue)](https://www.python.org/)
[![PyPI](https://img.shields.io/pypi/v/multigedi.svg)](https://pypi.org/project/multigedi/)

Multi-modal joint factor analysis for single-cell data — Python port of the R
[`multigedi`](https://github.com/csglab/multigedi) package, with an optional
CUDA backend for accelerated training. scverse-compliant API.

> **Status:** alpha. The API may change. Numerical agreement with the R
> reference is validated by `tests/test_cpu_vs_r.py` (relative
> difference ~1e-12 after 20 iterations).

## What it does

`multigedi` jointly factorizes several modalities of single-cell data into a
shared latent space (`K`-dimensional cell embeddings) plus modality-specific
metagenes. The hub-and-spoke MultiGEDI model handles three observation types:

- `M` — count matrix (e.g. gene expression)
- `M_paired` — paired counts (e.g. spliced / unspliced for RNA velocity)
- `X` — binary indicator matrix (e.g. cluster / chromatin state)

Output: a joint PCA in `mdata[<first_modality>].obsm["X_multigedi_pca"]` that
plugs into the standard scanpy workflow (`sc.pp.neighbors`, `sc.tl.umap`,
`sc.tl.leiden`).

## Install

### CPU only

Requires Python ≥3.10, a C++14 compiler, [Eigen3](https://eigen.tuxfamily.org)
≥3.3, and pybind11 (auto-pulled by the build).

```bash
# On HPC with module system:
module load eigen/3.4.0     # or set CMAKE_PREFIX_PATH to your Eigen install

pip install -e .
```

The build uses [scikit-build-core](https://github.com/scikit-build/scikit-build-core);
no separate `setup.py` step is needed.

### GPU (optional)

The CUDA backend (`libmultigedi_gpu.so`) is opt-in at install time.
Requires CUDA ≥11.8 and an NVCC toolchain on the build machine.

**Recommended (bundled in the wheel):**

```bash
pip install --config-settings=cmake.define.MULTIGEDI_BUILD_GPU=ON .
```

The CMake option `MULTIGEDI_BUILD_GPU=ON` triggers a build of
`libmultigedi_gpu.so` and bundles it inside the wheel at
`<site-packages>/multigedi/_gpu/libmultigedi_gpu.so` with
`INSTALL_RPATH "$ORIGIN"`. The Python wrapper (`_ctypes_api._find_lib`)
locates it automatically — no `MULTIGEDI_GPU_LIB` env var needed.

**Legacy (standalone build, `.so` outside the wheel):**

```bash
cd src/_multigedi_gpu
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j 4
export MULTIGEDI_GPU_LIB=$(pwd)/build/libmultigedi_gpu.so
```

This path is still supported for HPC sites that ship a custom build
alongside a stock CPU-only wheel — set `MULTIGEDI_GPU_LIB` to override
the wheel-bundled `.so`.

The library supports up to 16 modalities with any mix of `obs_type`,
detected at runtime. It has been tested on NVIDIA H100 and should work on
any GPU with compute capability ≥7.0.

### Production deployment

The default build targets a fatbin spanning Volta through Hopper so a
single `libmultigedi_gpu.so` runs on the common production fleet:

| Compute capability | Generation | Examples                  |
|--------------------|------------|---------------------------|
| `sm_70`            | Volta      | V100                      |
| `sm_75`            | Turing     | T4, RTX 20-series         |
| `sm_80`            | Ampere     | A100                      |
| `sm_86`            | Ampere     | A40, RTX 30-series        |
| `sm_90`            | Hopper     | H100                      |

CUDA toolkit ≥ 12.0 is recommended; 11.8 is the floor. The build
matrix is overridable: pass `-DCMAKE_CUDA_ARCHITECTURES="<arches>"`
at configure time to trim the fatbin (e.g. `"90"` for H100-only CI,
or `"native"` to autodetect the local device; `native` requires
CMake ≥ 3.24).
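As a rough aid for choosing an architecture list, the sketch below checks whether a device would be served by a precompiled cubin in a given fatbin. It relies on the CUDA binary-compatibility rule (a cubin for `sm_XY` runs on devices with the same major version and an equal or higher minor version) and ignores any PTX JIT fallback. `DEFAULT_ARCHES` and `is_covered` are hypothetical names for illustration and are not part of `multigedi`.

```python
# Illustrative only: DEFAULT_ARCHES mirrors the table above; is_covered is a
# hypothetical helper, not a multigedi API.
DEFAULT_ARCHES = (70, 75, 80, 86, 90)


def is_covered(compute_capability: str, arches=DEFAULT_ARCHES) -> bool:
    """Return True if some cubin in `arches` can run on the given device.

    A cubin built for sm_XY runs on devices with major version X and a
    minor version >= Y. PTX JIT fallback is deliberately ignored here.
    """
    major, minor = (int(part) for part in compute_capability.split("."))
    return any(a // 10 == major and a % 10 <= minor for a in arches)


print(is_covered("8.9"))                # Ada, served by the sm_80/sm_86 cubins -> True
print(is_covered("6.1"))                # Pascal, below the sm_70 floor -> False
print(is_covered("8.0", arches=(90,)))  # A100 with an H100-only fatbin -> False
```

The last call shows the cost of trimming the fatbin to `"90"`: the resulting library no longer carries code for Ampere devices.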

For HPC deployments where the same `.so` is shared across login and
compute nodes, set `MULTIGEDI_GPU_LIB=/path/to/libmultigedi_gpu.so`
in the user's environment instead of relying on `LD_LIBRARY_PATH` —
the Python wrapper checks this env var first (see
`src/multigedi/_gpu/_ctypes_api.py::_find_lib` for the full search
order).
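The lookup described above can be pictured with the following minimal sketch. It is not the actual `_find_lib` implementation; it only assumes the two facts stated in this README: the `MULTIGEDI_GPU_LIB` environment variable is checked first, and the wheel-bundled copy lives under `multigedi/_gpu/`. The function name `find_gpu_lib` and its parameters are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional

LIB_NAME = "libmultigedi_gpu.so"


def find_gpu_lib(package_dir: Path,
                 environ: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Hypothetical lookup: env var override first, then the bundled copy."""
    override = environ.get("MULTIGEDI_GPU_LIB")
    if override:
        return Path(override)  # explicit path wins; not validated in this sketch
    bundled = package_dir / "_gpu" / LIB_NAME
    return bundled if bundled.is_file() else None


# Demo against a throwaway directory that mimics an installed package.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "multigedi"
    (pkg / "_gpu").mkdir(parents=True)
    (pkg / "_gpu" / LIB_NAME).touch()

    bundled_hit = find_gpu_lib(pkg, environ={})
    override_hit = find_gpu_lib(
        pkg, environ={"MULTIGEDI_GPU_LIB": "/opt/custom/libmultigedi_gpu.so"}
    )

missing = find_gpu_lib(Path("/nonexistent-multigedi-dir"), environ={})
```

Passing the environment as a mapping keeps the sketch easy to test without mutating `os.environ`.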

Stay on the default `gpu_low_memory=True` (arena mode) for any
production run with more than two modalities; the legacy
high-memory path is unsupported there. See
[`docs/architecture/gpu_backend.md`](docs/architecture/gpu_backend.md)
for the v2 metadata layout, MAX_MODALITIES bump procedure, and the
full list of known limitations.

## Quickstart

```python
import mudata as md
import multigedi as gd

mdata = md.read_h5mu("your_data.h5mu")  # 'sample' column required in .obs

gd.tl.multigedi(
    mdata,
    modalities={
        "gene":     {"obs_type": "M",      "orthoZ": True},
        "splicing": {"obs_type": "M_list", "orthoZ": False, "layers": (None, "M2")},
    },
    sample_key="sample",
    K=20,
    max_iterations=30,
    use_gpu=False,        # set True to use libmultigedi_gpu.so
)

# Joint embedding (cells × K) — feed to scanpy
import scanpy as sc
sc.pp.neighbors(mdata["gene"], use_rep="X_multigedi_pca")
sc.tl.umap(mdata["gene"])
```

Full results land in `mdata["gene"].uns["multigedi"]["model"]` (per-modality
`Z`, `D`, `Bi`, `sigma2`, tracking) and the joint PCA in
`mdata["gene"].obsm["X_multigedi_pca"]`.

## Tutorial

See [`notebooks/multigedi_tutorial.ipynb`](notebooks/multigedi_tutorial.ipynb)
for a runnable end-to-end walkthrough on bundled 5K-cell test data: load HDF5
counts → assemble MuData → fit on CPU → fit on GPU → UMAP visualization.

## Reproducibility & relationship to R `multigedi`

`multigedi`'s BCD optimizer is mathematically identical to that of the R
reference: given the **same** `iter_0` state, both produce the same output to
within 1e-12 relative difference after any number of iterations. The
`tests/test_cpu_vs_r.py` regression check loads R's exported `iter_0` directly
into Python via the `add_modality(init_state=...)` path and verifies this
agreement.
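For readers who want to run a similar comparison on their own exports, here is a minimal sketch of a relative-difference check. The exact metric used in `tests/test_cpu_vs_r.py` is not shown in this README, so the definition below (largest absolute difference scaled by the largest reference magnitude) is an assumption, and `max_relative_diff` is a hypothetical helper. The arrays are synthetic stand-ins, not real model output.

```python
import numpy as np


def max_relative_diff(a: np.ndarray, b: np.ndarray) -> float:
    """Largest absolute difference, scaled by the largest magnitude in `b`."""
    scale = float(np.max(np.abs(b)))
    diff = float(np.max(np.abs(a - b)))
    return diff / scale if scale > 0 else diff


rng = np.random.default_rng(0)
z_ref = rng.standard_normal((500, 20))  # stand-in for an R reference matrix
z_py = z_ref * (1.0 + 1e-14)            # stand-in for the Python result

rel = max_relative_diff(z_py, z_ref)
print(rel < 1e-12)
```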

**Independent runs on different backends do not converge to the same factorization.**
The randomized SVD initializer draws its random projection matrix from a
language-specific RNG (R's Mersenne-Twister vs numpy's PCG64). Different draws
lead BCD to different local minima of the non-convex objective. After
50 iterations on the bundled 5K-cell dataset:

- Both backends reach virtually identical sigma² (R 0.366 / Py 0.370 for gene)
- Both backends are *internally* converged (col-correlation 0.86–0.92 between
  iter_20 and iter_50 within each)
- But cross-backend Z subspaces remain ~89° apart, and pairwise cell distances
  in the joint embedding correlate at only ~0.03 between R and Python

Implication for users: **pick one tool per dataset.** Don't mix R-trained and
Python-trained `multigedi` outputs in the same downstream analysis. Within a
single backend, the same `seed` produces identical results. If you
need agreement with a specific R-`multigedi` reference run, export
its `iter_0` and load it into Python via
`MultiGEDIModel.add_modality(..., init_state=ref_iter_0)`.
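The two cross-backend metrics quoted above (angle between `Z` subspaces and correlation of pairwise cell distances) can be computed along the following lines. This is a sketch on synthetic matrices, not the script behind the reported numbers. `compare_runs` is a hypothetical helper, and since the README does not state which principal angle it reports, the sketch uses the largest one.

```python
import numpy as np
from scipy.linalg import subspace_angles
from scipy.spatial.distance import pdist


def compare_runs(z_a, z_b, emb_a, emb_b):
    """Return (largest principal angle in degrees, Pearson r of pairwise distances)."""
    angle_deg = float(np.degrees(subspace_angles(z_a, z_b).max()))
    dist_corr = float(np.corrcoef(pdist(emb_a), pdist(emb_b))[0, 1])
    return angle_deg, dist_corr


rng = np.random.default_rng(0)
z = rng.standard_normal((200, 5))    # synthetic features x K loadings
emb = rng.standard_normal((100, 5))  # synthetic cells x K embedding
rot, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # random orthogonal matrix

# Rotating the factors changes neither the subspace nor any pairwise distance.
angle_same, corr_same = compare_runs(z, z @ rot, emb, emb @ rot)

# Unrelated draws: near-orthogonal subspaces, uncorrelated distances.
angle_diff, corr_diff = compare_runs(
    z, rng.standard_normal((200, 5)), emb, rng.standard_normal((100, 5))
)
print(round(angle_same, 3), round(corr_same, 3))
```

Both metrics are invariant to rotations of the latent factors, so a large angle or a near-zero distance correlation points to genuinely different solutions rather than a relabeling of the same one.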

## Tests

```bash
# CPU bit-identicality vs R reference (~1 minute)
python tests/test_cpu_vs_r.py

# GPU end-to-end (needs MULTIGEDI_GPU_LIB and a GPU device)
python tests/test_gpu_e2e.py

# GPU bit-identicality vs R at iter_10 (needs the v2 R fixture)
# Maintainer-only: regenerates the fixture (stored in tests/data/); ordinary contributors skip this.
python scripts/maintainer/migrate_tiny_1iter_to_v2.py
python tests/test_gpu_vs_r.py
```

## Layout

```
multigedi/
├── pyproject.toml                     scikit-build-core + pybind11
├── CMakeLists.txt                     top-level CPU build
├── src/
│   ├── multigedi/                     pure-Python API
│   │   ├── _core/                     MultiGEDIModel class
│   │   ├── _gpu/                      ctypes wrapper for libmultigedi_gpu
│   │   ├── tools/                     tl.multigedi(), tl.pca, tl.umap, ...
│   │   ├── plotting/                  pl.*
│   │   └── preprocessing/             pp.*
│   ├── _multigedi_cpp/                CPU C++ extension (pybind11, Eigen, OpenMP)
│   └── _multigedi_gpu/                CUDA library (separate CMake build)
├── notebooks/                         end-user tutorials
├── tests/                             pytest-style + R-parity tests
├── scripts/                           data prep & one-off utilities
└── benchmarks/                        performance harness
```

## Credits

Algorithm and reference R implementation by the
[`multigedi`](https://github.com/csglab/multigedi) authors. The CUDA backend
was originally a standalone library (`cumultigedi`) and has been folded in
here under a single name. The Python port and packaging are maintained in
this repository.

## License

MIT — see [LICENSE](LICENSE).
