Metadata-Version: 2.1
Name: sICTA
Version: 0.0.1
Summary: A marker-based cell type annotation method that combines the self-training strategy with pseudo-labeling and the nonlinear association capturing capability of Transformer.
Home-page: https://github.com/nbnbhwyy/sICTA
Author: chg
Author-email: chenhg25@mail2.sysu.edu.cn
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE.txt

# sICTA: Interpretable Cell Type Annotation based on self-training

A full description of sICTA and its application to published single-cell RNA-seq datasets is available.

Download the archive with preprocessed data from: https://drive.google.com/drive/folders/1jbqSxacL_IDIZ4uPjq220C9Kv024m9eL

The repository includes detailed installation instructions, requirements, scripts, and demos.


## 1 The workflow of sICTA

![](Flow.jpg)

**(a)** Cell expression and marker gene specificity are combined to generate pseudo-labels. **(b)** The downstream Transformer classifiers are first pre-trained on the cell type probability distributions (pseudo-labels) and then iteratively refined through a self-training framework until convergence. sICTA incorporates prior knowledge from the biological domain and uses masked learnable embeddings to transform the input data ($G$ genes) into $k$ input tokens, each representing a gene set (GS), and a class token (CLS).

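
Step (a) can be illustrated with a minimal NumPy sketch. This is not the scoring used by sICTA itself: the z-scoring, the inverse-frequency specificity weight, the softmax, and the names `pseudo_labels`, `marker_mask`, and `temperature` are all assumptions chosen for illustration.

```python
import numpy as np


def pseudo_labels(expr, marker_mask, temperature=1.0):
    """Turn marker evidence into per-cell probability distributions.

    expr:        (cells, genes) normalized expression matrix
    marker_mask: (types, genes) binary matrix, 1 if the gene marks the type
    Both the scoring scheme and the argument names are illustrative assumptions.
    """
    expr = np.asarray(expr, dtype=float)
    marker_mask = np.asarray(marker_mask, dtype=float)

    # z-score each gene across cells so highly expressed genes do not dominate
    mu = expr.mean(axis=0, keepdims=True)
    sd = expr.std(axis=0, keepdims=True) + 1e-8
    z = (expr - mu) / sd

    # A marker shared by many cell types is less specific, so it gets less weight
    specificity = 1.0 / np.maximum(marker_mask.sum(axis=0), 1.0)
    weights = marker_mask * specificity
    weights = weights / np.maximum(weights.sum(axis=1, keepdims=True), 1e-8)

    # Weighted mean z-score of each type's markers, then a row-wise softmax
    scores = (z @ weights.T) / temperature
    scores = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)


# Toy data: 4 cells x 4 genes, 2 cell types with 2 markers each
expr = np.array([[5, 4, 0, 0],
                 [6, 5, 1, 0],
                 [0, 0, 5, 6],
                 [0, 1, 4, 5]])
marker_mask = np.array([[1, 1, 0, 0],
                        [0, 0, 1, 1]])

probs = pseudo_labels(expr, marker_mask)
print(probs.round(3))
print(probs.argmax(axis=1))
```

Each row of `probs` sums to 1 and could serve as a soft target for pre-training a classifier before self-training refines it.
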
## 2 Requirements

+ Linux/UNIX/Windows system
+ Python == 3.8.6
+ torch == 1.12.1
+ scanpy == 1.9.1
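
To confirm that an environment matches the pinned versions above, a small check along these lines may help. It is only a sketch: the helper name `check_versions` and the `PINNED` dictionary are hypothetical, and it uses the standard-library `importlib.metadata` module rather than anything shipped with sICTA.

```python
from importlib import metadata

# Versions listed in the requirements above
PINNED = {"torch": "1.12.1", "scanpy": "1.9.1"}


def check_versions(pinned):
    """Return {package: (wanted, found, matches)} without importing the packages."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        # Ignore local build suffixes such as "+cu113" on torch wheels
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report


report = check_versions(PINNED)
for name, (wanted, found, matches) in report.items():
    status = "ok" if matches else "MISMATCH"
    print(f"{name}: wanted {wanted}, found {found} [{status}]")
```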

<!-- Topic_gene_embedding -->

## 3 Usage

### Data format

sICTA expects the cell-by-gene expression matrix and the cell type marker information to be provided as an AnnData object in `.h5ad` format.
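
The sketch below shows one way such a file could be assembled. Where sICTA actually expects the marker information inside the object is not stated here, so the `varm["marker"]` and `uns["cell_types"]` keys are assumptions, as are the helper names `marker_matrix` and `build_adata` and the toy gene lists; check `main.py` and the default data for the real layout.

```python
import numpy as np


def marker_matrix(genes, markers):
    """Encode a {cell type: [marker genes]} dict as a binary genes x types matrix."""
    cell_types = sorted(markers)
    column = {gene: i for i, gene in enumerate(genes)}
    mat = np.zeros((len(genes), len(cell_types)), dtype=np.int8)
    for j, cell_type in enumerate(cell_types):
        for gene in markers[cell_type]:
            if gene in column:  # markers missing from the matrix are skipped
                mat[column[gene], j] = 1
    return cell_types, mat


def build_adata(expr, cells, genes, markers):
    """Bundle expression and markers into one AnnData object (hypothetical layout)."""
    import anndata as ad  # installed as a dependency of scanpy
    import pandas as pd

    cell_types, mat = marker_matrix(genes, markers)
    adata = ad.AnnData(
        X=np.asarray(expr, dtype=np.float32),
        obs=pd.DataFrame(index=cells),
        var=pd.DataFrame(index=genes),
    )
    adata.varm["marker"] = mat            # assumed key, not confirmed by sICTA
    adata.uns["cell_types"] = cell_types  # assumed key, not confirmed by sICTA
    return adata


# Toy marker table; GNLY is absent from the gene list and is therefore ignored
genes = ["CD3E", "CD79A", "MS4A1", "NKG7"]
markers = {
    "T cell": ["CD3E"],
    "B cell": ["CD79A", "MS4A1"],
    "NK cell": ["NKG7", "GNLY"],
}
cell_types, mat = marker_matrix(genes, markers)
print(cell_types)
print(mat)
```

With scanpy installed, `build_adata(expr, cells, genes, markers).write_h5ad("toy.h5ad")` would write the object to disk, where `expr` is a cells x genes array and `cells` is a list of cell identifiers.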

### Training

```bash
python main.py
```

Default data is provided so that users can understand and debug the sICTA code.


## Reference

If you use `sICTA` in your work, please cite
