Metadata-Version: 2.4
Name: pycelladmix
Version: 0.1.1
Summary: Python port of cellAdmix — evaluating and correcting cell admixtures in imaging-based spatial transcriptomics data
Author-email: Jonathan Mitchel <mitchel@mit.edu>
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21
Requires-Dist: scipy>=1.7
Requires-Dist: pandas>=1.3
Requires-Dist: scikit-learn>=1.0
Requires-Dist: anndata>=0.8
Requires-Dist: scanpy>=1.9
Requires-Dist: matplotlib>=3.5
Requires-Dist: seaborn>=0.12
Requires-Dist: scikit-image>=0.19
Requires-Dist: sparse>=0.13
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Dynamic: license-file

# py-cellAdmix

[![PyPI version](https://img.shields.io/pypi/v/pycelladmix.svg)](https://pypi.org/project/pycelladmix/)
[![PyPI downloads](https://img.shields.io/pypi/dm/pycelladmix.svg)](https://pypi.org/project/pycelladmix/)
[![Python versions](https://img.shields.io/pypi/pyversions/pycelladmix.svg)](https://pypi.org/project/pycelladmix/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
![Tests](https://img.shields.io/badge/tests-22%20passed-brightgreen.svg)

Python port of [cellAdmix](https://github.com/KharchenkoLab/cellAdmix) — evaluating and correcting cell admixtures in imaging-based spatial transcriptomics data.

## Install

```bash
pip install pycelladmix
```

## Quickstart

```python
import celladmix
import pandas as pd

# Load your spatial transcriptomics data
df = pd.read_csv('molecules.csv')  # x, y, z, gene, cell, celltype, mol_id

# Run the pipeline
ca = celladmix.CellAdmix(molecule_df=df)
ca.run_knn_nmf(k=5, h=20)
ca.run_crf_all(num_nn=10)

# Get results
crf_factors = ca.crf_results
nmf_results = ca.nmf_results
```
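To try the pipeline without real data, a small synthetic table can stand in for `molecules.csv`. The sketch below builds one with the seven columns named in the comment above. The dtypes, value ranges, gene names, and cell-type labels are assumptions made for illustration, not requirements documented by the package.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_cells, mols_per_cell = 4, 50
n_mols = n_cells * mols_per_cell
genes = ["GeneA", "GeneB", "GeneC"]  # hypothetical gene names

# Assign each molecule to a cell, and each cell to a hypothetical cell type
cell_ids = np.repeat(np.arange(n_cells), mols_per_cell)
celltypes = np.where(cell_ids < 2, "TypeA", "TypeB")

toy_df = pd.DataFrame({
    "x": rng.uniform(0, 100, n_mols),   # molecule coordinates
    "y": rng.uniform(0, 100, n_mols),
    "z": rng.uniform(0, 10, n_mols),
    "gene": rng.choice(genes, n_mols),  # gene detected at each molecule
    "cell": cell_ids,                   # segmented cell the molecule falls in
    "celltype": celltypes,              # annotation of that cell
    "mol_id": np.arange(n_mols),        # unique molecule identifier
})
print(toy_df.head())
```

A table this small only checks that the columns are wired up correctly; it will not produce meaningful factors.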

## Functional API (R one-to-one mirror)

```python
from celladmix import knn_adjacency_matrix, run_knn_nmf, run_crf_all

# KNN adjacency (df is the molecule DataFrame loaded in the Quickstart)
adj = knn_adjacency_matrix(df, k=20)

# NMF
nmf_res = run_knn_nmf(df, k=5, h=20)

# CRF segmentation
crf_res = run_crf_all(df, nmf_res, num_nn=10)
```
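For readers unfamiliar with the first step, the following sketch shows what a k-nearest-neighbor adjacency matrix over molecule coordinates looks like. It uses scikit-learn directly and is not the implementation behind `knn_adjacency_matrix`; the toy coordinates and the choice to exclude each molecule from its own neighbor list are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 3))  # toy x, y, z coordinates

k = 20
nn = NearestNeighbors(n_neighbors=k).fit(coords)

# Calling kneighbors_graph() without a query matrix uses the fitted points
# and does not count a point as its own neighbor.
adj = nn.kneighbors_graph(mode="connectivity")

# adj is a sparse molecule-by-molecule matrix; row i marks the k nearest
# neighbors of molecule i with 1.0
print(adj.shape, adj.nnz)
```

Keeping the result sparse matters at scale: with millions of molecules, a dense matrix of this shape would not fit in memory.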

## Performance Benchmarks

### Correlation with R Reference

| Function | Metric | Value | Gate |
|----------|--------|-------|------|
| Gene Probabilities | Pearson r | **1.00000000** | PASS |
| Sparse Correlation | Pearson r | **1.00000000** | PASS |
| Normalize Images | Max error | **0.00e+00** | PASS |
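The two metrics in this table can be computed for any pair of outputs with a few lines of NumPy. This is a minimal sketch of such a comparison, not the code in the package's test suite; `parity_metrics` is a hypothetical helper name, and the arrays here are placeholders for exported Python and R results.

```python
import numpy as np

def parity_metrics(py_values, r_values):
    """Return (Pearson r, max absolute error) between two flattened outputs."""
    py = np.asarray(py_values, dtype=float).ravel()
    r = np.asarray(r_values, dtype=float).ravel()
    if py.shape != r.shape:
        raise ValueError("outputs must have the same number of elements")
    pearson_r = float(np.corrcoef(py, r)[0, 1])
    max_abs_err = float(np.max(np.abs(py - r)))
    return pearson_r, max_abs_err

# Placeholder data: identical arrays stand in for matching Python and R outputs
reference = np.linspace(0.0, 1.0, 50)
r_val, err = parity_metrics(reference.copy(), reference)
print(f"Pearson r = {r_val:.8f}, max error = {err:.2e}")
```

Pearson r alone can hide a constant offset or scale factor between the two outputs, which is why reporting the maximum absolute error alongside it is useful.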

### Speed Comparison (vs R)

| Function | Python | R | Speedup |
|----------|--------|---|---------|
| KNN Count Matrix | **0.03s** | ~3.4s | **~100x** |
| NMF Factorization | **1.24s** | ~21s | **~17x** |
| Gene Probabilities | 0.002s | - | - |
| Sparse Correlation | instant | - | - |
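The README does not state how these timings were taken. One common way to collect comparable numbers on your own data is a best-of-N wall-clock measurement, sketched below; `best_of` is a hypothetical helper and the timed workload is a stand-in for a call such as `run_knn_nmf`.

```python
import time

def best_of(fn, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Stand-in workload; replace the lambda with the function under test
elapsed = best_of(lambda: sum(i * i for i in range(100_000)))
print(f"{elapsed:.4f}s")
```

Taking the minimum rather than the mean reduces the influence of background load on the machine.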

### Key Optimizations

1. **KNN Batch Processing**: Vectorized KNN computation instead of per-cell loops (~100x faster than R)
2. **sklearn NMF**: scikit-learn's NMF implementation, run with multiple seeds (~17x faster than R)
3. **Sparse Operations**: `scipy.sparse` matrices used throughout to avoid dense intermediates
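The three ideas above can be sketched together on toy data: a sparse neighbor graph multiplied by a one-hot gene matrix yields neighborhood gene counts without an explicit loop, and NMF is then fitted from several seeds, keeping the lowest-error fit. This is an illustration built on scikit-learn and SciPy, not the package's own code. Reading `h` as the number of neighbors and `k` as the number of factors is an assumption, as is selecting the best seed by reconstruction error.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import NMF
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n_mols, n_genes = 300, 6
h, k = 10, 3  # assumed meaning: h = neighbors per molecule, k = NMF factors
coords = rng.uniform(0, 100, size=(n_mols, 2))
gene_idx = rng.integers(0, n_genes, size=n_mols)

# Sparse molecule-by-molecule neighbor graph (self excluded)
adj = NearestNeighbors(n_neighbors=h).fit(coords).kneighbors_graph(mode="connectivity")

# Sparse one-hot matrix: row i has a single 1 in the column of molecule i's gene
onehot = sparse.csr_matrix(
    (np.ones(n_mols), (np.arange(n_mols), gene_idx)), shape=(n_mols, n_genes)
)

# One sparse product replaces the loop: row i holds the gene counts
# among the h nearest neighbors of molecule i
knn_counts = adj @ onehot

# Fit NMF from several random seeds and keep the fit with the lowest error
best = None
for seed in range(3):
    model = NMF(n_components=k, init="random", random_state=seed, max_iter=500)
    W = model.fit_transform(knn_counts)
    if best is None or model.reconstruction_err_ < best[0]:
        best = (model.reconstruction_err_, W, model.components_)

best_err, W, H = best
print(W.shape, H.shape)
```

Here `W` gives each molecule's loading on the factors and `H` gives each factor's gene profile.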

## What's included

| Python function | R function |
|----------------|------------|
| `knn_adjacency_matrix` | `knn.adjacency.matrix` |
| `knn_count_matrix` | `knn.count.matrix` |
| `get_knn_counts_all` | `get.knn.counts.all` |
| `run_knn_nmf` | `run.knn.nmf` |
| `run_crf_all` | `run.crf.all` |
| `estimate_cell_adjacency` | `estimate.cell.adjacency` |
| `estimate_cell_type_adjacency` | `estimate.cell.type.adjacency` |
| `estimate_gene_prob_per_type` | `estimate.gene.prob.per.type` |
| `estimate_contamination_scores` | `estimate.contamination.scores` |
| `sparse_cor` | `sparse.cor` |
| `estimate_correlation_preservation` | `estimate.correlation.preservation` |
| `run_bridge_test` | `run.bridge.test` |
| `extract_bridge_res` | `extract.bridge.res` |
| `run_memb_test` | `run.memb.test` |
| `extract_memb_res` | `extract.memb.res` |
| `get_enr` | `get.enr` |
| `check_f_rm` | `check.f.rm` |
| `check_f_rm_per_ct` | `check.f.rm.per.ct` |
| `samp_ct_equal` | `samp.ct.equal` |
| `subset_genes` | `subset.genes` |
| `get_counts_matrix` | `get.counts.meta.seurat` |
| `normalize_images` | `normalize.images` |
| `plot_nmf_loadings` | `plot.nmf.loadings` |
| `plot_cell_score_ratios` | `plot.cell.score.ratios` |
| `plot_annot_hmap` | `plot.annot.hmap` |
| `plot_correlation_preservation` | `plot.correlation.preservation` |
| `plot_expression_comparison` | `plot.expression.comparison` |

## Reproducing R results exactly

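```python
# Run the parity tests
import subprocess
import sys

# Use the current interpreter so pytest runs in the same environment
subprocess.run([sys.executable, "-m", "pytest", "tests/test_parity.py", "-v"], check=True)
```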

## Test Results

```
22 passed, 4 skipped, 0 failed
├── 9 smoke tests (all pass)
├── 13 parity tests (all pass)
└── 4 exact_match tests (skipped — CRF unavailable on R 4.5)
```

## Relationship to omicverse

py-cellAdmix is designed for use with the [omicverse](https://github.com/omicverse/omicverse) ecosystem.

## Citation

If you use this package, please cite the original cellAdmix paper:

> Mitchel et al. "Evaluating and correcting cell admixtures in imaging-based spatial transcriptomics data."

## License

MIT (same as upstream cellAdmix)
