Metadata-Version: 2.4
Name: stdrug
Version: 0.0.2
Summary: STDrug: A Computational Method to Use Spatial Transcriptomics to Aid Personalized Drug-reposition Recommendation
Home-page: https://github.com/akiyiwen/STdrug
Author: Yiwen Yang
Author-email: yangiwen@umich.edu
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: MarkupSafe>=2.1.5
Requires-Dist: SpaGCN>=1.2.7
Requires-Dist: anndata>=0.10.7
Requires-Dist: array-api-compat>=1.7.1
Requires-Dist: contourpy>=1.2.1
Requires-Dist: cycler>=0.12.1
Requires-Dist: filelock>=3.15.1
Requires-Dist: fonttools>=4.53.0
Requires-Dist: fsspec>=2023.6.0
Requires-Dist: future>=1.0.0
Requires-Dist: h5py>=3.11.0
Requires-Dist: harmonypy==0.0.9
Requires-Dist: igraph>=0.11.5
Requires-Dist: jinja2>=3.1.4
Requires-Dist: joblib>=1.4.2
Requires-Dist: kiwisolver>=1.4.5
Requires-Dist: legacy-api-wrap>=1.4
Requires-Dist: leidenalg>=0.10.2
Requires-Dist: llvmlite>=0.43.0
Requires-Dist: louvain>=0.8.2
Requires-Dist: matplotlib>=3.10.3
Requires-Dist: mpmath>=1.3.0
Requires-Dist: natsort>=8.4.0
Requires-Dist: networkx>=3.3
Requires-Dist: numba>=0.60.0
Requires-Dist: numpy>=1.26.4
Requires-Dist: nvidia-cublas-cu12>=12.6.4.1
Requires-Dist: nvidia-cuda-cupti-cu12>=12.6.80
Requires-Dist: nvidia-cuda-nvrtc-cu12>=12.6.77
Requires-Dist: nvidia-cuda-runtime-cu12>=12.6.77
Requires-Dist: nvidia-cudnn-cu12>=9.5.1.17
Requires-Dist: nvidia-cufft-cu12>=11.3.0.4
Requires-Dist: nvidia-cufile-cu12>=1.11.1.6
Requires-Dist: nvidia-curand-cu12>=10.3.7.77
Requires-Dist: nvidia-cusolver-cu12>=11.7.1.2
Requires-Dist: nvidia-cusparse-cu12>=12.5.4.2
Requires-Dist: nvidia-cusparselt-cu12>=0.6.3
Requires-Dist: nvidia-nccl-cu12>=2.26.2
Requires-Dist: nvidia-nvjitlink-cu12>=12.6.85
Requires-Dist: nvidia-nvtx-cu12>=12.6.77
Requires-Dist: packaging>=24.2
Requires-Dist: pandas>=2.2.2
Requires-Dist: patsy>=0.5.6
Requires-Dist: pillow>=10.3.0
Requires-Dist: pycpd>=2.0.0
Requires-Dist: pynndescent>=0.5.12
Requires-Dist: pyparsing>=3.1.2
Requires-Dist: python-dateutil>=2.9.0.post0
Requires-Dist: python-igraph>=0.11.5
Requires-Dist: pytz>=2024.1
Requires-Dist: scanpy>=1.10.0
Requires-Dist: scikit-learn>=1.5.0
Requires-Dist: scipy>=1.13.1
Requires-Dist: seaborn>=0.13.2
Requires-Dist: session-info>=1.0.0
Requires-Dist: setuptools>=65.5.0
Requires-Dist: six>=1.16.0
Requires-Dist: statsmodels>=0.14.4
Requires-Dist: stdlib_list>=0.10.0
Requires-Dist: sympy>=1.14.0
Requires-Dist: texttable>=1.7.0
Requires-Dist: threadpoolctl>=3.5.0
Requires-Dist: torch>=2.7.0
Requires-Dist: tqdm>=4.66.4
Requires-Dist: triton>=3.3.0
Requires-Dist: typing-extensions>=4.12.2
Requires-Dist: tzdata>=2024.1
Requires-Dist: umap-learn>=0.5.6
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: summary

# STDrug: A Computational Method to Use Spatial Transcriptomics to Aid Personalized Drug-reposition Recommendation

Drug repurposing is a cost-effective strategy for accelerating therapeutic discovery, yet existing methods based on single-cell RNA-seq (scRNA-seq) often overlook the spatial context that is critical for capturing tissue-specific drug responses. We introduce STADS (Spatial Transcriptomics to Aid Drug-reposition Strategy), a personalized computational framework that leverages spatial transcriptomics data to improve drug repurposing.

![pipeline](figure/STADS%20pipeline%20cartoon_V8.jpeg)

**Illustration of STADS architecture.** STADS takes paired diseased and normal tissues from the same patient as input to its spatial domain identification module. This module first performs batch correction and sample alignment, then applies a graph convolutional network (GCN) combined with the coherent point drift (CPD) algorithm to identify corresponding spatial domains across conditions. These paired spatial domains serve as input to the drug repurposing module, which identifies potentially reversible genes by comparing differentially expressed genes (DEGs) between spatial domains and integrating drug perturbation data from the L1000 dataset. To prioritize key reversible genes, STADS uses weights extracted from an XGBoost model trained on potential drug information retrieved from GPT-4o. STADS also accounts for spatial domain interactions in its drug score calculation. The final drug score integrates spatial domain proportions and interactions, the significance and XGBoost-derived weights of reversible genes, and drug side effect profiles and sensitivity data. Candidate drugs are then validated using empirical evidence from literature and clinical trials, LLM-validated drug information, in silico validation using electronic health records (EHR), and in vitro validation using cell line experiments.
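The idea of a "reversible gene" can be illustrated with a toy example. The sketch below is hypothetical and is not the STDrug/STADS scoring formula: it assumes a gene counts as reversible when its disease log fold change (diseased vs. matched normal domain) and its drug perturbation z-score have opposite signs, uses made-up per-gene weights as a stand-in for the XGBoost-derived weights, and ignores spatial domain interactions, side effects, and drug sensitivity. Per-domain scores are combined by weighting each domain by its spot proportion.

```python
"""Toy sketch of reversible genes and a proportion-weighted drug score.

Hypothetical illustration only: the sign rule, weights, and aggregation are
assumptions, not the STDrug/STADS scoring formula.
"""
from typing import Dict, List, Tuple

GeneValues = Dict[str, float]


def reversible_genes(disease_lfc: GeneValues, drug_signature: GeneValues) -> List[str]:
    """Genes whose drug-induced change has the opposite sign of the disease change.

    disease_lfc: log fold change per DEG (diseased vs. matched normal domain).
    drug_signature: perturbation z-score per gene (an L1000-style signature).
    Genes missing from the drug signature are skipped.
    """
    return sorted(
        gene
        for gene, lfc in disease_lfc.items()
        if gene in drug_signature and lfc * drug_signature[gene] < 0
    )


def domain_score(disease_lfc: GeneValues, drug_signature: GeneValues,
                 gene_weight: GeneValues) -> float:
    """Sum of weight * |log fold change| over the reversed genes of one domain.

    gene_weight is a hypothetical importance per gene; unlisted genes get 1.0.
    """
    return sum(
        gene_weight.get(gene, 1.0) * abs(disease_lfc[gene])
        for gene in reversible_genes(disease_lfc, drug_signature)
    )


def drug_score(domains: Dict[str, Tuple[float, GeneValues]],
               drug_signature: GeneValues, gene_weight: GeneValues) -> float:
    """Weight each domain's score by its spot proportion and add them up."""
    return sum(
        proportion * domain_score(lfc, drug_signature, gene_weight)
        for proportion, lfc in domains.values()
    )


if __name__ == "__main__":
    # Made-up numbers for illustration.
    domains = {
        "domain_1": (0.75, {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 0.5}),
        "domain_2": (0.25, {"GENE_A": 1.0, "GENE_D": -2.0}),
    }
    signature = {"GENE_A": -1.5, "GENE_B": 0.8, "GENE_C": 0.3, "GENE_D": -0.4}
    weights = {"GENE_A": 2.0}
    print(reversible_genes(domains["domain_1"][1], signature))  # ['GENE_A', 'GENE_B']
    print(drug_score(domains, signature, weights))  # 4.25
```

Because the sign product must be strictly negative, genes with a zero fold change or a zero z-score are never counted as reversed.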

# Installation

## Docker

Install the pre-built STDrug environment on AMD64 Linux using [Docker](https://www.docker.com/get-started/):

```bash
docker pull akiyiwen/stdrug:latest
```

Run the Docker image, follow the instructions printed in the console, and open `http://localhost:8888` in a browser for Jupyter Notebook or `http://localhost:8787` for RStudio. Both tools are needed to run the complete STDrug pipeline. For details, follow the [Tutorial](#tutorial), and make sure to download the reference data either inside or outside the Docker container. If the reference data is downloaded outside Docker, mount its path when starting the container. If prompted for a username and password, use the ones printed in the console.

```bash
docker run -it --rm -p 8787:8787 -p 8888:8888 akiyiwen/stdrug:latest
# Your user name is: arch
# Your password is: <long password>
# Start Jupyter Notebook at http://localhost:8888
# Start RStudio Server at http://localhost:8787
# If the reference data is outside docker, use -v to mount
# docker run -it --rm -p 8787:8787 -p 8888:8888 -v <downloaded data dir>:/home/arch/data akiyiwen/stdrug:latest
```

## Manual install

STDrug consists of a spatial domain matching module and a drug score calculation module, written in Python and R, respectively. For manual installation or custom use of the packages, follow the directions below.

The package has been tested with Python `3.12` and R `4.5.2`.

Install the Python package and its requirements using `pip`:

```bash
pip install stdrug
```

Install the R dependencies and the STDrug R package using `devtools` and `BiocManager`:
```r
install.packages(c("BiocManager", "devtools"))
BiocManager::install(c("cmapR", "limma"))
devtools::install_github(c("immunogenomics/presto", "jinworks/CellChat"))
devtools::install_github("akiyiwen/STdrug")
```

Alternatively, use `renv`:
```r
install.packages("renv")
renv::init(bioconductor = TRUE, repos = "https://cloud.r-project.org")
# Restart R session
renv::install("akiyiwen/STdrug")
```

# Tutorial

## Data preparation

On Linux, use `scripts/datasets.sh` to download and extract the required reference data and the example input data for STDrug.

```bash
curl -sL 'https://raw.githubusercontent.com/akiyiwen/STdrug/bd2f05cdcfe1af77db95c5796884d72c853f464b/scripts/datasets.sh' | bash
```

Alternatively, manually download the drug reference data from [Dropbox](https://www.dropbox.com/scl/fo/sc7tyjuw9k5ci5v0svyw7/AIya9k7PQH786X8bleDW7KY?rlkey=j1fzh131dyl0xffy5pa6j26dl&st=5ti8ct69&dl=0). We recommend creating a folder named `data` and extracting the reference files under `data/reference`. After downloading, the folder should have the following structure:

```
data
└── reference
    ├── drug_validation
    │   ├── liver.csv
    │   └── prostate.csv
    ├── l1000
    │   ├── GSE70138.tar.gz
    │   └── GSE92742.tar.gz
    ├── tahoe
    │   └── drug_ref.rds
    ├── fda.txt
    ├── gdsc.csv
    └── sider.csv
```

Extract the L1000 tarballs using `tar`:
```bash
tar -xzvf data/reference/l1000/GSE70138.tar.gz -C data/reference/l1000
tar -xzvf data/reference/l1000/GSE92742.tar.gz -C data/reference/l1000
```
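On systems without a `tar` binary, the same extraction can be sketched with Python's standard `tarfile` module. This assumes the archives sit in `data/reference/l1000` and that each one stores its files under a top-level folder named after the archive, which is what the post-extraction directory tree suggests. When the running Python supports it, the `filter="data"` option is passed so that archive members that would land outside the target folder are rejected.

```python
"""Extract the L1000 tarballs with the standard library instead of `tar`."""
import tarfile
from pathlib import Path
from typing import Iterable, List


def extract_l1000(l1000_dir: str = "data/reference/l1000",
                  names: Iterable[str] = ("GSE70138", "GSE92742")) -> List[Path]:
    """Extract <name>.tar.gz into l1000_dir and return the expected <name> folders."""
    root = Path(l1000_dir)
    # Use the safe "data" extraction filter when available (Python 3.12+ and
    # recent security releases of older versions).
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    folders = []
    for name in names:
        with tarfile.open(root / f"{name}.tar.gz", mode="r:gz") as archive:
            archive.extractall(path=root, **extra)
        folders.append(root / name)
    return folders


# Example: extract_l1000()  # run from the directory that contains `data/`
```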

After extraction, the folder should have the following structure:
```
data
└── reference
    ├── drug_validation
    │   ├── liver.csv
    │   └── prostate.csv
    ├── l1000
    │   ├── GSE70138
    │   │   ├── GSE70138_Broad_LINCS_cell_info_2017-04-28.txt
    │   │   ├── GSE70138_Broad_LINCS_gene_info_2017-03-06.txt
    │   │   ├── GSE70138_Broad_LINCS_inst_info_2017-03-06.txt
    │   │   ├── GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328_2017-03-06.gctx
    │   │   └── GSE70138_Broad_LINCS_sig_info_2017-03-06.txt
    │   └── GSE92742
    │       ├── GSE92742_Broad_LINCS_cell_info.txt
    │       ├── GSE92742_Broad_LINCS_gene_info.txt
    │       ├── GSE92742_Broad_LINCS_Level5_COMPZ.MODZ_n473647x12328.gctx
    │       └── GSE92742_Broad_LINCS_sig_info.txt
    ├── tahoe
    │   └── drug_ref.rds
    ├── fda.txt
    ├── gdsc.csv
    └── sider.csv
```
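A quick way to confirm this layout is a small standard-library script that reports which files from the tree above are missing. This is a sketch: the file list is copied from that tree, it only checks that each path exists as a file (not that its contents are valid), and it assumes the default `data/reference` location.

```python
"""Report reference files missing from the expected post-extraction layout."""
from pathlib import Path
from typing import List

# Relative paths copied from the directory tree above.
EXPECTED_FILES = [
    "drug_validation/liver.csv",
    "drug_validation/prostate.csv",
    "l1000/GSE70138/GSE70138_Broad_LINCS_cell_info_2017-04-28.txt",
    "l1000/GSE70138/GSE70138_Broad_LINCS_gene_info_2017-03-06.txt",
    "l1000/GSE70138/GSE70138_Broad_LINCS_inst_info_2017-03-06.txt",
    "l1000/GSE70138/GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328_2017-03-06.gctx",
    "l1000/GSE70138/GSE70138_Broad_LINCS_sig_info_2017-03-06.txt",
    "l1000/GSE92742/GSE92742_Broad_LINCS_cell_info.txt",
    "l1000/GSE92742/GSE92742_Broad_LINCS_gene_info.txt",
    "l1000/GSE92742/GSE92742_Broad_LINCS_Level5_COMPZ.MODZ_n473647x12328.gctx",
    "l1000/GSE92742/GSE92742_Broad_LINCS_sig_info.txt",
    "tahoe/drug_ref.rds",
    "fda.txt",
    "gdsc.csv",
    "sider.csv",
]


def missing_reference_files(reference_dir: str = "data/reference") -> List[str]:
    """Return the expected relative paths that are not present as files."""
    root = Path(reference_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_reference_files()
    if missing:
        print("Missing reference files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Reference data layout looks complete.")
```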

(Optional) Download the sample data analyzed in the manuscript from [Dropbox](https://www.dropbox.com/scl/fo/momgw9d38zx60d8t5sta8/ALdWi7fPJH1MSuVWYPvqFwc?rlkey=5w1fx7ft7fwlg03t5yk013d5w&st=ai5chwd7&dl=0). These files can also be placed under `data`:
```
data
├── HCC01N.h5ad
├── HCC01N.rds
├── HCC01T.h5ad
├── HCC01T.rds
├── HCC02N.h5ad
├── HCC02N.rds
├── HCC02T.h5ad
├── HCC02T.rds
├── HCC03N.h5ad
├── HCC03N.rds
├── HCC03T.h5ad
├── HCC03T.rds
├── HCC04N.h5ad
├── HCC04N.rds
├── HCC04T.h5ad
└── HCC04T.rds
```
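The example files follow a `<patient><N|T>` naming pattern. Assuming, as the filenames and the tumor/adjacent-normal design suggest, that `N` marks the normal sample and `T` the tumor sample, the hypothetical helper below groups the files in `data/` into per-patient pairs and skips patients that are missing either sample. It is an illustration, not part of the STDrug package.

```python
"""Group sample files in data/ into (normal, tumor) pairs per patient.

Assumes the example naming scheme <patient>N.<ext> / <patient>T.<ext>,
with N = adjacent normal and T = tumor (an assumption based on the filenames).
"""
import re
from pathlib import Path
from typing import Dict, Tuple

SAMPLE_NAME = re.compile(r"^(?P<patient>.+?)(?P<tissue>[NT])$")


def paired_samples(data_dir: str = "data",
                   suffix: str = ".h5ad") -> Dict[str, Tuple[Path, Path]]:
    """Map each patient ID to its (normal, tumor) files; unpaired samples are skipped."""
    groups: Dict[str, Dict[str, Path]] = {}
    for path in Path(data_dir).glob(f"*{suffix}"):
        match = SAMPLE_NAME.match(path.stem)
        if match:
            groups.setdefault(match["patient"], {})[match["tissue"]] = path
    return {
        patient: (files["N"], files["T"])
        for patient, files in sorted(groups.items())
        if "N" in files and "T" in files
    }


if __name__ == "__main__":
    for patient, (normal, tumor) in paired_samples().items():
        print(patient, normal.name, tumor.name)
```

With the example files listed above, this groups the `.h5ad` files into pairs for `HCC01` through `HCC04`; passing `suffix=".rds"` does the same for the R objects.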

# Quick Start

## Spatial domain identification

The first step of STDrug is to identify matched spatial domains between each patient's tumor tissue and adjacent normal tissue. Run the Python workflow by following the tutorial [Identify spatial domains using STDrug for multiple samples](https://akiyiwen.github.io/STdrug/spatial-domain-identification-example). If using Docker, open `example.ipynb` in Jupyter Notebook.

This module should produce output files in the following structure:

```
./output
|-- checkpoint
|  |-- stads_cluster.h5ad // AnnData of integrated spatial data with spatial clustering
|-- partition.csv // Spatial domain annotation and meta data
```

## Drug repurposing

After spatial domain identification, STDrug uses a comprehensive drug ranking algorithm to recommend repurposed drugs personalized for each patient. Run this module in R by following the tutorial [Use STDrug to calculate drug score for multiple samples](https://akiyiwen.github.io/STdrug/drug-score-calculation-example). If using Docker, open `example.rmd` in RStudio.

STDrug generates drug outputs in the following structure. The top-ranked repurposed drugs for each patient are listed in `./output/drugs_<patient>.csv`.

```
./output
|-- checkpoint
|  |-- cci_ratio_<patient>.csv // Cell-cell interaction results for patient
|  |-- drug_scores_<patient>.csv // Spatial domain specific drug score results for patient
|  |-- stads_cluster.h5ad // AnnData of integrated spatial data with spatial clustering
|  |-- stads_cluster.rds // Seurat object of integrated spatial data with spatial clustering
|-- drugs_<patient>.csv // Drug score and ranking for patient, higher drug score means better treatment potential
|-- partition.csv // Spatial domain annotation and meta data

```
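Since a higher drug score means better treatment potential, the top candidates can be pulled from a results table with the standard `csv` module. The column names of `drugs_<patient>.csv` are not documented here, so the score column is passed in explicitly; check the file header for the actual name. The helper is a hypothetical convenience, not part of the STDrug package.

```python
"""List the highest-scoring drugs from a per-patient STDrug results table."""
import csv
from typing import Dict, List


def top_drugs(path: str, score_column: str, n: int = 10) -> List[Dict[str, str]]:
    """Return the n rows with the largest numeric value in score_column."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    # Higher drug score = better treatment potential, so sort descending.
    rows.sort(key=lambda row: float(row[score_column]), reverse=True)
    return rows[:n]


# Example (replace <patient> and the column name with values from your output):
# for row in top_drugs("output/drugs_<patient>.csv", score_column="<score column>"):
#     print(row)
```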

# Contributor

# Citation
