Metadata-Version: 2.3
Name: hto
Version: 1.1.7a0
Summary: A method to demultiplex hashtagged single-cell data.
License: MIT
Keywords: single-cell,demultiplexing,HTO
Author: Tobias Krause
Author-email: krauset@mskcc.org
Requires-Python: >=3.10
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: anndata (>=0.10,<0.13)
Requires-Dist: click (>=8.1)
Requires-Dist: matplotlib (>=3.8,<4)
Requires-Dist: numpy (>=2.0,<3)
Requires-Dist: pandas (>=2.1,<3)
Requires-Dist: pyyaml (>=6)
Requires-Dist: requests (>=2.0,<3)
Requires-Dist: scikit-image (>=0,<1)
Requires-Dist: scikit-learn (>=1.5,<2)
Requires-Dist: scipy (>=1.14,<2)
Requires-Dist: seaborn (>=0.13,<0.14)
Requires-Dist: setuptools (>=65.5.0)
Requires-Dist: tqdm (>=4.0,<5)
Project-URL: Homepage, https://pypi.org/project/hto/
Project-URL: Repository, https://github.com/sail-mskcc/hto_dnd/
Description-Content-Type: text/markdown

# HTO DND - Demultiplex Hashtag Data

[![PyPI version](https://badge.fury.io/py/hto.svg)](https://badge.fury.io/py/hto)
[![Build Status](https://github.com/sail-mskcc/hto_dnd/actions/workflows/test.yml/badge.svg)](https://github.com/sail-mskcc/hto_dnd/actions/workflows/test.yml)

`hto` is a Python package for demultiplexing hashtag oligonucleotide (HTO) data from single-cell experiments.
It normalizes counts against the observed background signal, denoises them, and then assigns each cell to a hashtag:

- **Normalization**: Normalize HTO data using background signal, inspired by the DSB method (see citation below).
- **Denoising**: Remove batch effects and noise from the data by regressing out cell-by-cell variation.
- **Demultiplexing**: Cluster and classify cells into singlets, doublets, or negatives using clustering methods like k-means or Gaussian Mixture Models (GMM).
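
The steps above can be illustrated with a small NumPy-only sketch. This is not the package's implementation: the function names are hypothetical, the denoising step is omitted, and a simple one-dimensional two-means threshold stands in for the k-means or GMM clustering mentioned above. It only shows the general idea of scaling each HTO against empty-droplet background and then counting how many HTOs are positive per cell.

```python
import numpy as np


def normalize_against_background(counts, background_counts):
    """Log-transform counts and z-score each HTO against background droplets."""
    log_counts = np.log1p(counts)
    log_background = np.log1p(background_counts)
    mu = log_background.mean(axis=0)
    sd = log_background.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)  # avoid division by zero for constant columns
    return (log_counts - mu) / sd


def two_means_threshold(values, n_iter=50):
    """Split 1-D values into a low and a high group (Lloyd's algorithm, k=2)."""
    low, high = values.min(), values.max()
    for _ in range(n_iter):
        is_high = values > (low + high) / 2.0
        if is_high.all() or not is_high.any():
            break
        new_low, new_high = values[~is_high].mean(), values[is_high].mean()
        if new_low == low and new_high == high:
            break
        low, high = new_low, new_high
    return (low + high) / 2.0


def classify_cells(normalized):
    """Label each cell by the number of HTOs above their per-HTO threshold."""
    columns = []
    for j in range(normalized.shape[1]):
        threshold = two_means_threshold(normalized[:, j])
        columns.append(normalized[:, j] > threshold)
    positive = np.column_stack(columns)
    n_positive = positive.sum(axis=1)
    labels = np.where(
        n_positive == 0,
        "negative",
        np.where(n_positive == 1, "singlet", "doublet"),
    )
    return labels, positive


# Synthetic example (made-up numbers): 300 singlets, 10 doublets, 10 negatives.
rng = np.random.default_rng(0)
counts = rng.poisson(5, size=(320, 3)).astype(float)   # ambient signal everywhere
owner = np.repeat(np.arange(3), 100)                   # true hashtag of each singlet
counts[np.arange(300), owner] = rng.poisson(200, size=300)
counts[300:310, :2] = rng.poisson(200, size=(10, 2))   # doublets carry two hashtags
background = rng.poisson(2, size=(500, 3)).astype(float)  # empty droplets

normalized = normalize_against_background(counts, background)
labels, positive = classify_cells(normalized)
print(dict(zip(*np.unique(labels, return_counts=True))))
```

A fixed midpoint threshold works here only because the synthetic positive and negative populations are far apart; real data usually need the more careful clustering that the package provides.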

The package can be used from the command line (CLI) or imported as a Python library.

![HTO DND](./media/pipeline_v0.png)

## Installation

Using `pip`:

```bash
pip install hto
```

From source:

```bash
git clone https://github.com/sail-mskcc/hto_dnd.git
cd hto_dnd
pip install .
```

## Usage

### Python API

The Python API is built around AnnData. It is highly recommended to work with three AnnData objects:

* `adata_hto`: Filtered AnnData object with HTO data, containing only actual cells.
* `adata_hto_raw`: Raw AnnData object with HTO data, containing actual cells and background signal.
* `adata_gex`: Raw AnnData object with gene expression data. This is optional and can be used to construct a more informative background signal.

```python
import hto

# get mockdata
mockdata = hto.data.generate_hto(n_cells=1000, n_htos=3, seed=10)
adata_hto = mockdata["filtered"]
adata_hto_raw = mockdata["raw"]
adata_gex = mockdata["gex"]

# denoise, normalize, and demultiplex
adata_demux = hto.demultiplex(
  adata_hto,
  adata_hto_raw,
  adata_gex=adata_gex,
)

# see results
adata_demux.obs[["hash_id", "doublet_info"]].head()
```

### Command-Line Interface (CLI)

The CLI exposes the same workflow through the `hto demultiplex` command. Make sure to set `--adata-out` so that the output is saved.

```bash
hto demultiplex \
  --adata-hto /path/to/adata_hto.h5ad \
  --adata-hto-raw /path/to/adata_hto_raw.h5ad \
  --adata-gex /path/to/adata_gex.h5ad \
  --adata-out /path/to/output.h5ad
```
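
When several samples need to be processed, the command above can be assembled from Python. The sketch below uses only the four flags shown in the example. The directory layout (one folder per sample containing `adata_hto.h5ad`, `adata_hto_raw.h5ad`, and `adata_gex.h5ad`) and the helper names are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def build_demultiplex_command(sample_dir, out_path):
    """Assemble the `hto demultiplex` call for one sample folder (hypothetical layout)."""
    sample_dir = Path(sample_dir)
    return [
        "hto", "demultiplex",
        "--adata-hto", str(sample_dir / "adata_hto.h5ad"),
        "--adata-hto-raw", str(sample_dir / "adata_hto_raw.h5ad"),
        "--adata-gex", str(sample_dir / "adata_gex.h5ad"),
        "--adata-out", str(out_path),
    ]


def run_demultiplex(sample_dir, out_path):
    """Run the command and raise CalledProcessError if the CLI exits with an error."""
    subprocess.run(build_demultiplex_command(sample_dir, out_path), check=True)


# Build (but do not execute) the command for one placeholder sample folder.
command = build_demultiplex_command("/path/to/sample_1", "/path/to/sample_1/output.h5ad")
print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.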

## Data Requirements

HTO-DND requires data from cell hashing experiments where samples are labeled with hashtagged antibodies:

- **HTO data** (`adata_hto`): Filtered cell × HTO count matrix in AnnData format.
- **Raw HTO data** (`adata_hto_raw`): Unfiltered barcode × HTO count matrix including empty droplets. Required for background estimation.
- **Gene expression data** (`adata_gex`, optional but recommended): Cell × gene count matrix, used to construct a more informative background signal.

