Metadata-Version: 2.4
Name: cycombinepy
Version: 0.1.1
Summary: Python port of cyCombine: batch correction of single-cell cytometry data
Project-URL: Homepage, https://github.com/mdmanurung/cyCombinePy
Project-URL: Repository, https://github.com/mdmanurung/cyCombinePy
Project-URL: Issues, https://github.com/mdmanurung/cyCombinePy/issues
Project-URL: Changelog, https://github.com/mdmanurung/cyCombinePy/blob/main/CHANGELOG.md
Project-URL: Original R package, https://github.com/biosurf/cyCombine
Author: Mikhael Manurung
Maintainer: Mikhael Manurung
License-Expression: MIT
License-File: LICENSE
Keywords: anndata,batch-correction,bioinformatics,combat,cytof,cytometry,flow-cytometry,flowsom,mass-cytometry,single-cell
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: anndata>=0.10
Requires-Dist: flowsom>=0.1.1
Requires-Dist: formulaic>=1.0
Requires-Dist: inmoose>=0.7
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: scanpy>=1.10
Requires-Dist: scikit-learn>=1.3
Requires-Dist: scipy>=1.11
Provides-Extra: all
Requires-Dist: matplotlib>=3.7; extra == 'all'
Requires-Dist: pytometry>=0.1.5; extra == 'all'
Requires-Dist: readfcs>=2; extra == 'all'
Requires-Dist: scib-metrics>=0.5; extra == 'all'
Requires-Dist: seaborn>=0.13; extra == 'all'
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == 'dev'
Requires-Dist: furo>=2024.1; extra == 'dev'
Requires-Dist: jupyter; extra == 'dev'
Requires-Dist: matplotlib>=3.7; extra == 'dev'
Requires-Dist: myst-nb>=1.1; extra == 'dev'
Requires-Dist: nbclient; extra == 'dev'
Requires-Dist: nbconvert; extra == 'dev'
Requires-Dist: nbformat; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: pytometry>=0.1.5; extra == 'dev'
Requires-Dist: readfcs>=2; extra == 'dev'
Requires-Dist: scib-metrics>=0.5; extra == 'dev'
Requires-Dist: seaborn>=0.13; extra == 'dev'
Requires-Dist: sphinx-autodoc-typehints>=2; extra == 'dev'
Requires-Dist: sphinx>=7; extra == 'dev'
Requires-Dist: twine>=5; extra == 'dev'
Provides-Extra: docs
Requires-Dist: furo>=2024.1; extra == 'docs'
Requires-Dist: myst-nb>=1.1; extra == 'docs'
Requires-Dist: sphinx-autodoc-typehints>=2; extra == 'docs'
Requires-Dist: sphinx>=7; extra == 'docs'
Provides-Extra: eval
Requires-Dist: scib-metrics>=0.5; extra == 'eval'
Provides-Extra: io
Requires-Dist: pytometry>=0.1.5; extra == 'io'
Requires-Dist: readfcs>=2; extra == 'io'
Provides-Extra: plotting
Requires-Dist: matplotlib>=3.7; extra == 'plotting'
Requires-Dist: seaborn>=0.13; extra == 'plotting'
Description-Content-Type: text/markdown

# cyCombinePy

Python port of [cyCombine](https://github.com/biosurf/cyCombine) for batch
correction of single-cell cytometry data.

cyCombinePy is AnnData-native and reuses existing Python libraries instead of
reimplementing primitives:

- **ComBat**: [`inmoose.pycombat`](https://github.com/epigenelabs/inmoose)
- **SOM clustering**: [`FlowSOM`](https://github.com/saeyslab/FlowSOM_Python)
- **FCS I/O**: [`pytometry`](https://github.com/buettnerlab/pytometry) /
  [`readfcs`](https://github.com/laminlabs/readfcs)
- **Batch-effect metrics**:
  [`scib-metrics`](https://github.com/YosefLab/scib-metrics)

## Pipeline

The cyCombine workflow carries over unchanged:

1. **Batch-wise normalization** of expression per marker (`cycombinepy.normalize`)
2. **Self-organizing map** clustering of cells (`cycombinepy.create_som`)
3. **Per-cluster ComBat** correction with optional covariates and anchors
   (`cycombinepy.correct_data`)

Step 2 clusters the batch-normalized view from step 1 so that clusters
represent biology rather than batch. Step 3 then applies ComBat to the
unnormalized data within each cluster so that rare populations aren't
over-corrected.
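
The reasoning behind this ordering can be illustrated with a NumPy-only toy.
This is a sketch under stated assumptions, not the package's implementation:
`batchwise_scale` and `per_cluster_center` are hypothetical helpers, a sign
threshold stands in for the SOM, and a per-cluster mean shift stands in for
ComBat.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2 populations x 2 batches, 2 markers; batch 1 carries a +1 offset.
n = 200
pop = np.tile(np.repeat([0, 1], n), 2)      # true population per cell
batch = np.repeat([0, 1], 2 * n)            # batch per cell
centers = np.array([[0.0, 0.0], [5.0, 3.0]])
X = centers[pop] + rng.normal(scale=0.3, size=(4 * n, 2)) + batch[:, None] * 1.0


def batchwise_scale(X, batch):
    """Z-score each marker (column) separately within each batch."""
    out = np.empty_like(X, dtype=float)
    for b in np.unique(batch):
        rows = batch == b
        mu = X[rows].mean(axis=0)
        sd = X[rows].std(axis=0)
        sd[sd == 0] = 1.0                   # avoid division by zero
        out[rows] = (X[rows] - mu) / sd
    return out


def per_cluster_center(X, batch, labels):
    """Location-only stand-in for ComBat: align batch means within each cluster."""
    out = X.astype(float).copy()
    for c in np.unique(labels):
        in_c = labels == c
        grand = X[in_c].mean(axis=0)
        for b in np.unique(batch[in_c]):
            rows = in_c & (batch == b)
            out[rows] += grand - X[rows].mean(axis=0)
    return out


# Step 1: normalized view, used only for clustering.
Xn = batchwise_scale(X, batch)
# Step 2: cluster on the normalized view (threshold as a stand-in for a SOM).
labels = (Xn[:, 0] > 0).astype(int)
# Step 3: correct the *unnormalized* data, one cluster at a time.
corrected = per_cluster_center(X, batch, labels)
```

In this toy, the labels recover the two populations despite the batch offset,
and the corrected values keep their original scale.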

## Quickstart

```python
import cycombinepy as pc

# 1. Load FCS files into AnnData
adata = pc.io.read_fcs_dir(
    "data/",
    metadata="metadata.csv",
    batch_key="Batch",
    sample_key="Patient",
    condition_key="condition",
    cofactor=5,           # asinh cofactor for CyTOF
)

# 2. Inspect batch effects before correction
figs = pc.detect_batch_effect_express(adata, out_dir="before/")

# 3. End-to-end batch correction
pc.batch_correct(
    adata,
    xdim=8, ydim=8,
    covar="condition",
)
# Corrected matrix is now in adata.layers["cycombine_corrected"]

# 4. Evaluate
from cycombinepy.correct import CORRECTED_LAYER
uncorr = pc.compute_emd(adata, cell_key="cycombine_som")
corr   = pc.compute_emd(adata, cell_key="cycombine_som", layer=CORRECTED_LAYER)
report = pc.evaluate_emd(uncorr, corr)
print(report.groupby("marker")["reduction_pct"].mean())
```

Or use the modular API:

```python
pc.transform_asinh(adata, cofactor=5)
pc.normalize(adata, method="scale")
pc.create_som(adata, xdim=8, ydim=8)
pc.correct_data(adata, label_key="cycombine_som", covar="condition")
```

## Public API

| Function | Purpose |
|---|---|
| `batch_correct` | Full pipeline orchestrator |
| `transform_asinh` | Asinh transform with derandomization |
| `normalize` | Batch-wise scale / rank / CLR / qnorm |
| `create_som` | FlowSOM clustering |
| `correct_data` | Per-cluster ComBat correction |
| `compute_emd`, `evaluate_emd` | Earth Mover's Distance (EMD) batch evaluation |
| `compute_mad`, `evaluate_mad` | Median absolute deviation (MAD) batch evaluation |
| `detect_batch_effect`, `detect_batch_effect_express` | Diagnostic plots |
| `get_markers`, `check_confound` | Utilities |

FCS I/O lives in `cycombinepy.io`, plotting in `cycombinepy.plotting`, and an
optional `scib_metrics` wrapper in `cycombinepy.evaluate`.
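
For readers unfamiliar with the two evaluation statistics named in the table,
here is a minimal NumPy-only illustration. It is not how `compute_emd` or
`compute_mad` are implemented: `emd_1d` and `mad` are hypothetical helpers,
the "correction" is a hand-made shift, and the percentage formula is an
assumed definition of a reduction score.

```python
import numpy as np


def emd_1d(u, v):
    """Earth Mover's Distance between two equal-size 1-D samples."""
    u, v = np.sort(u), np.sort(v)
    if u.shape != v.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # For equal-size samples, EMD is the mean gap between sorted values.
    return float(np.mean(np.abs(u - v)))


def mad(x):
    """Median absolute deviation around the median (unscaled)."""
    x = np.asarray(x, dtype=float)
    return float(np.median(np.abs(x - np.median(x))))


rng = np.random.default_rng(1)
batch_a = rng.normal(loc=0.0, size=1000)        # one marker, batch A
batch_b_raw = rng.normal(loc=1.0, size=1000)    # same marker, shifted batch B
batch_b_corr = batch_b_raw - 1.0                # pretend correction

emd_before = emd_1d(batch_a, batch_b_raw)
emd_after = emd_1d(batch_a, batch_b_corr)
# Assumed definition: relative drop in EMD, as a percentage.
reduction_pct = 100.0 * (emd_before - emd_after) / emd_before

# A location-only shift leaves the spread (and hence the MAD) unchanged.
mad_raw, mad_corr = mad(batch_b_raw), mad(batch_b_corr)
```

A good correction should lower the between-batch EMD while leaving the MAD
roughly intact, since a collapsing MAD would suggest biological variance was
removed along with the batch effect.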

## Installation

From a clone of the repository, install in editable mode with the optional and
development extras:

```bash
pip install -e ".[all,dev]"
```

## Data structure conventions

- `adata.X`: cells × markers expression (post-asinh, pre-correction)
- `adata.obs["batch"]`: batch assignment (required)
- `adata.obs["sample"]`, `adata.obs["condition"]`, `adata.obs["anchor"]`:
  optional metadata
- `adata.obs["cycombine_som"]`: SOM cluster labels (written by `create_som`)
- `adata.layers["cycombine_corrected"]`: corrected expression (written by
  `correct_data` / `batch_correct`)

## Citation

If you use cyCombinePy, please cite the original cyCombine paper:

> Pedersen, C.B., Dam, S.H., Barnkob, M.B., *et al.* cyCombine allows for robust
> integration of single-cell cytometry datasets within and across technologies.
> *Nat Commun* **13**, 1698 (2022).
> <https://doi.org/10.1038/s41467-022-29383-5>
