Metadata-Version: 2.1
Name: scE2TM
Version: 1.0.3
Summary: scE2TM improves single-cell embedding interpretability and reveals cellular perturbation signatures
Author-email: hegang chen <13247702278@163.com>
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy==1.24.4
Requires-Dist: pandas==2.0.3
Requires-Dist: scipy==1.10.1
Requires-Dist: scikit-learn==1.3.2
Requires-Dist: scanpy==1.9.8
Requires-Dist: anndata==0.9.2
Requires-Dist: gseapy==1.1.4
Requires-Dist: matplotlib==3.7.5
Requires-Dist: umap-learn==0.5.4
Requires-Dist: seaborn==0.13.2
Requires-Dist: python-igraph==0.9.8
Requires-Dist: louvain==0.7.1
Requires-Dist: faiss-gpu==1.7.2
Requires-Dist: pyyaml==6.0.2
Requires-Dist: tqdm==4.67.1
Requires-Dist: joblib==1.4.2
Requires-Dist: threadpoolctl==3.5.0
Requires-Dist: packaging==24.1
Requires-Dist: einops==0.6.1
Requires-Dist: safetensors==0.4.3
Requires-Dist: pynndescent==0.5.13
Requires-Dist: jupyter==1.1.1
Requires-Dist: jgraph==0.2.1
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: jupyter; extra == "dev"

# $scE^2TM$: Toward Interpretable Single-Cell Embedding via Topic Modeling

The full description of $scE^2TM$ and its application to published single-cell RNA-seq datasets are available.


The repository includes detailed installation instructions and requirements, scripts and demos.


## 1 Schematic overview of $scE^2TM$

![](Flow.jpg)

**(a)** To better integrate information from the different modalities, the cluster and topic heads are trained on mutually refined neighborhood information by encouraging consistent cluster assignments among the mutual nearest neighbors of corresponding cells from different modalities in the embedding space.

**(b)** ECR clusters gene embeddings $g_j$ (•) as samples around topic embeddings $t_k$ (★) as centers, using the soft assignment $\pi^{*}_{\epsilon,jk}$. In the illustration, ECR pulls $g_1$ and $g_2$ toward $t_1$ and away from $t_3$ and $t_5$.

**(c)** Sparse linear decoders learn topic embeddings, gene embeddings, and sparse topic-gene dependencies during reconstruction, which keeps the model interpretable.
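
The soft assignment in panel (b) can be illustrated with a small NumPy sketch. It assumes ECR follows the usual embedding clustering regularization formulation, where $\pi^{*}_{\epsilon}$ is an entropy-regularized optimal transport plan between gene and topic embeddings with uniform weights on both sides. The function names `sinkhorn_plan` and `ecr_loss`, the toy coordinates, and the value of `epsilon` are illustrative and are not taken from the $scE^2TM$ code.

```python
import numpy as np

def sinkhorn_plan(cost, epsilon=0.5, n_iters=200):
    """Entropic OT plan between uniform masses on rows (genes) and columns (topics)."""
    n, k = cost.shape
    a = np.full(n, 1.0 / n)          # assumed uniform mass on gene embeddings
    b = np.full(k, 1.0 / k)          # assumed uniform mass on topic embeddings
    kernel = np.exp(-cost / epsilon)
    u = np.ones(n)
    for _ in range(n_iters):         # alternate scaling to match both marginals
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]

def ecr_loss(genes, topics, epsilon=0.5):
    """Transport-weighted squared distance between gene and topic embeddings."""
    cost = ((genes[:, None, :] - topics[None, :, :]) ** 2).sum(axis=-1)
    plan = sinkhorn_plan(cost, epsilon)
    return float((cost * plan).sum()), plan

# Toy data: three topic centers with two gene embeddings near each of them.
topics = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
genes = np.array([[0.1, 0.0], [-0.1, 0.1],
                  [4.1, 0.0], [3.9, 0.1],
                  [0.0, 4.1], [0.1, 3.9]])

loss, plan = ecr_loss(genes, topics)
assigned_topic = plan.argmax(axis=1)  # dominant topic for each gene
```

Minimizing such a loss with respect to the embeddings moves each gene toward the topic that carries most of its transport mass, while the uniform topic marginal prevents every gene from collapsing onto a single topic.
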
## 2 Installation
Create a new Python environment and activate it.
```bash
conda create --name scE2TM_env python=3.8.8
conda activate scE2TM_env
```

Install the dependencies from the provided requirements.txt file.
```bash
pip install -r requirements.txt
```
Installation typically completes in approximately 1.5 hours.
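
Because the dependencies are pinned to exact versions, it can help to confirm that the active environment matches them before training. The sketch below uses only the standard library; the `PINNED` dictionary lists a few pins from the package metadata and assumes `requirements.txt` uses the same versions. The helper name `check_pins` is hypothetical.

```python
from importlib import metadata

# A few pins copied from the package metadata; extend as needed.
# Assumption: requirements.txt pins the same versions.
PINNED = {
    "numpy": "1.24.4",
    "pandas": "2.0.3",
    "scanpy": "1.9.8",
    "anndata": "0.9.2",
}

def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches

if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every listed package is installed at its pinned version.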
## 3 Usage

### Data format

$scE^2TM$ takes three inputs, all in CSV format: cell-by-gene expression matrices, external cell embeddings, and true cell type labels.

The true cell type labels are used only to assess prediction accuracy.

We provide default data (Wang) for users to understand and debug the $scE^2TM$ code.
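
As a rough illustration of the three inputs, the sketch below reads them with pandas and aligns them on shared cell identifiers. The layout is an assumption (cells as rows, with the cell identifier in the first column), the in-memory CSV text stands in for real files, and the helper `load_inputs` is hypothetical. Check the bundled Wang data for the layout that `run.py` actually expects.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the three CSV inputs (assumed layout: cells as rows).
expr_csv = io.StringIO("cell,GeneA,GeneB,GeneC\nc1,0,3,1\nc2,2,0,0\nc3,1,1,4\n")
emb_csv = io.StringIO("cell,e0,e1\nc1,0.1,-0.2\nc2,0.4,0.3\nc3,-0.5,0.0\n")
label_csv = io.StringIO("cell,cell_type\nc1,T\nc2,B\nc3,T\n")

def load_inputs(expr_src, emb_src, label_src):
    """Read expression, external embedding, and label CSVs, aligned by cell ID."""
    expr = pd.read_csv(expr_src, index_col=0)
    emb = pd.read_csv(emb_src, index_col=0)
    labels = pd.read_csv(label_src, index_col=0).iloc[:, 0]
    # Keep only cells present in all three inputs, in one consistent order.
    shared = expr.index.intersection(emb.index).intersection(labels.index)
    if shared.empty:
        raise ValueError("no cell identifiers are shared by the three inputs")
    return expr.loc[shared], emb.loc[shared], labels.loc[shared]

expr, emb, labels = load_inputs(expr_csv, emb_csv, label_csv)
```

Aligning on the cell index up front guards against silently pairing a cell's expression profile with another cell's embedding or label.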


### Training

```bash
python run.py
```
On the provided example dataset, the demo completes in about one minute.

### Tutorial

We provide three tutorials in the `tutorial` directory that introduce the usage of $scE^2TM$ and reproduce the main quantitative results:

- [Clustering and Interpretable Evaluation]
- [Pathway Enrichment]
- [Topic gene embedding]

## Reference

If you use $scE^2TM$ in your work, please cite

## License

This project is licensed under the MIT License.
