Metadata-Version: 2.4
Name: oncoordinate
Version: 0.1.7
Summary: Oncoordinate is an interpretable deep learning framework for single-cell and spatial transcriptomic analysis of malignancy that learns malignant and malignancy-associated cell states across epithelial, stromal, and immune lineages while remaining tightly integrated with the scverse ecosystem.
Author-email: "Vignesh V. Venkat" <vvv11@scarletmail.rutgers.edu>
License: MIT License
        
        Copyright (c) 2025 Vignesh Venkat
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/Viggyvenkat/Oncoordinate
Project-URL: Issues, https://github.com/Viggyvenkat/Oncoordinate/issues
Keywords: cancer,pathways,bioinformatics
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: anndata
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: scipy
Requires-Dist: scanpy
Requires-Dist: scvi-tools
Requires-Dist: scikit-learn==1.7.2
Requires-Dist: joblib
Requires-Dist: umap-learn
Requires-Dist: gseapy
Requires-Dist: matplotlib
Requires-Dist: seaborn
Requires-Dist: python-igraph
Requires-Dist: pytorch-tabnet
Requires-Dist: ipywidgets
Requires-Dist: imageio
Requires-Dist: click
Dynamic: license-file

# Oncoordinate

### Contents
1. Introduction
2. Discovering single-cell states
3. Projecting to spatial transcriptomic cohorts
4. References
5. Acknowledgements

Note: We provide a demo notebook (`vignettes/tutorial.html`) and two spatial datasets (`demo-data/V10U24-037_A`, `demo-data/V10U24-037_B`). The training set, the test set, and `reference_sc.h5ad` will be uploaded to Zenodo. An `environment.yml` file is also provided; run `conda env create -f environment.yml` to create a working environment for Oncoordinate.

## 1. Introduction
Oncoordinate is an interpretable deep learning framework for single-cell and spatial transcriptomic analysis of malignancy that learns malignant and malignancy-associated cell states across epithelial, stromal, and immune lineages while remaining tightly integrated with the scverse ecosystem. Built on a sequential attention–based architecture and trained on an integrated lung atlas (~3M cells spanning normal, chronic disease, and multiple lung carcinoma subtypes), Oncoordinate predicts a four-stage neoplastic continuum (normal, dysplastic, pre-malignant, malignant) with calibrated probabilities and sparse, step-wise feature selection that can be traced back to genes and pathways. Beyond single-cell analysis, it includes a de novo label transfer pipeline based on scVI and scANVI that projects these learned states into spatial transcriptomic datasets (e.g., 10x Visium) via pseudospots and joint latent embeddings, enabling the localization of aggressive niches, tumor–CAF neighborhoods, and other malignant ecosystems within intact tissue. Oncoordinate is also compatible with niche detection methods such as SOAPy and integrates with AnnData and Scanpy-based workflows, providing a GPU-accelerated, atlas-scale, plug-and-play framework for malignancy modeling across molecular and spatial dimensions.

To install Oncoordinate, please run:

```
pip install oncoordinate
```

![Fig1](figures/Figure%201/Slide1.png)

## 2. Discovering single-cell states 
Oncoordinate discovers malignant and malignancy-associated cell states by modeling neoplastic progression as a continuous, lineage-aware process rather than a binary tumor versus non-tumor classification. Using atlas-scale single-cell RNA-seq data, the model integrates gene-level features and pathway scores to learn a four-stage neoplastic continuum spanning normal, dysplastic, pre-malignant, and malignant states. Its sequential attention mechanism performs sparse, step-wise feature selection, allowing the model to focus on distinct transcriptional programs at different stages of malignancy while preserving interpretability. This design enables Oncoordinate to recover coherent oncogenic trajectories within individual lineages, such as epithelial, fibroblast, immune, and endothelial compartments, and to quantify heterogeneity both within and across patients. The resulting per-cell malignancy probabilities and lineage-resolved scores provide an interpretable representation of tumor ecosystem remodeling that can be directly used for downstream analyses, including trajectory visualization, pathway interrogation, and patient-level stratification.
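To make the idea of sparse, step-wise feature selection concrete, here is a minimal NumPy sketch of a TabNet-style sequential attention mask. It assumes sparsemax masks and a reuse prior with relaxation parameter `gamma`; this README does not spell out Oncoordinate's exact architecture, and the random `scores` stand in for what a trained network would produce. The names `sparsemax` and `sequential_masks` are illustrative, not part of the package.

```python
import numpy as np

def sparsemax(z):
    """Project scores onto the probability simplex; unlike softmax, entries can be exactly 0."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                  # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    in_support = 1.0 + k * z_sorted > cumsum     # which sorted entries stay non-zero
    k_max = k[in_support].max()
    tau = (cumsum[k_max - 1] - 1.0) / k_max      # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

def sequential_masks(step_scores, gamma=1.3):
    """Return one sparse feature mask per decision step.

    The prior shrinks for features that earlier steps already used, so later
    steps are pushed toward different features (gamma=1 would mean a feature
    is fully discounted after a single full use).
    """
    step_scores = np.asarray(step_scores, dtype=float)
    prior = np.ones(step_scores.shape[1])
    masks = []
    for scores in step_scores:
        mask = sparsemax(prior * scores)
        masks.append(mask)
        prior = prior * (gamma - mask)
    return np.vstack(masks)

# Hypothetical attention scores: 3 decision steps over 8 features (genes or pathway scores).
rng = np.random.default_rng(0)
scores = rng.normal(scale=2.0, size=(3, 8))
masks = sequential_masks(scores)

# Summing masks over steps gives a simple per-feature importance.
importance = masks.sum(axis=0)
```

Each row of `masks` sums to 1 and typically has several exact zeros, which is what lets the selected features at each step be read off directly.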

![Fig2](figures/Figure%202/Slide1.png)
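As a small illustration of how per-cell stage probabilities might feed downstream analyses, the sketch below turns a probability matrix into a hard stage call, a continuous score, and a per-lineage summary. The probabilities and lineage labels are made up, and the "expected stage index" is only one possible summary chosen for this example; it is not claimed to be the score Oncoordinate reports.

```python
import numpy as np

STAGES = ["normal", "dysplastic", "pre-malignant", "malignant"]

# Hypothetical per-cell probabilities over the four stages (each row sums to 1).
probs = np.array([
    [0.85, 0.10, 0.04, 0.01],
    [0.10, 0.20, 0.30, 0.40],
    [0.05, 0.15, 0.50, 0.30],
    [0.02, 0.03, 0.15, 0.80],
    [0.60, 0.25, 0.10, 0.05],
])
lineage = np.array(["epithelial", "epithelial", "fibroblast", "epithelial", "fibroblast"])

# Hard call: the most probable stage for each cell.
stage_call = np.array(STAGES)[probs.argmax(axis=1)]

# Continuous score: expected stage index, from 0 (normal) to 3 (malignant).
stage_score = probs @ np.arange(len(STAGES))

# Lineage-resolved summary: mean score of the cells in each lineage.
lineage_mean = {
    name: float(stage_score[lineage == name].mean())
    for name in np.unique(lineage)
}
```

In an AnnData-based workflow, values like `stage_call` and `stage_score` would usually be stored as columns of `adata.obs` so that Scanpy plotting functions can color cells by them.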

## 3. Projecting to spatial transcriptomic cohorts 
To translate single-cell–derived malignancy states into intact tissue architecture, Oncoordinate implements a de novo label transfer framework tailored for spatial transcriptomics data. Single-cell profiles sharing the same lineage and malignancy state are aggregated into pseudospots to approximate the multicellular composition of spatial capture spots, and these pseudospots are jointly embedded with spatial transcriptomic data using scVI to learn a shared, batch-corrected latent space. scANVI is then applied in a semi-supervised manner to infer probabilistic malignancy and lineage labels for each spatial spot. This approach enables robust projection of malignant programs into spatial coordinates, revealing colocalized malignant neighborhoods such as tumor–fibroblast (CAF) niches, regions of stromal remodeling, and immune-associated malignant ecosystems. By combining probabilistic spatial labeling with downstream niche detection methods such as SOAPy, Oncoordinate allows users to identify and characterize aggressive, spatially confined microenvironments that are not apparent from dissociated single-cell data alone.
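The pseudospot step can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's implementation: counts are held in a dense NumPy array, cells are grouped by a combined lineage and state label, and the function name `make_pseudospots` and its parameters `cells_per_spot` and `spots_per_group` are hypothetical.

```python
import numpy as np

def make_pseudospots(counts, groups, cells_per_spot=10, spots_per_group=5, seed=0):
    """Sum the raw counts of randomly drawn cells that share a group label."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    groups = np.asarray(groups)
    spot_counts, spot_labels = [], []
    for group in np.unique(groups):
        members = np.flatnonzero(groups == group)
        for _ in range(spots_per_group):
            # Sample with replacement only when the group is smaller than one spot.
            picked = rng.choice(
                members, size=cells_per_spot, replace=members.size < cells_per_spot
            )
            spot_counts.append(counts[picked].sum(axis=0))
            spot_labels.append(group)
    return np.vstack(spot_counts), np.array(spot_labels)

# Toy data: 60 cells x 20 genes, with three lineage|state groups of 20 cells each.
rng = np.random.default_rng(1)
counts = rng.poisson(1.0, size=(60, 20))
groups = np.repeat(
    ["epithelial|malignant", "epithelial|normal", "fibroblast|pre-malignant"], 20
)
spots, labels = make_pseudospots(counts, groups)
```

Summing raw counts, rather than averaging normalized values, keeps the pseudospots on a count scale, which is what count-based models such as scVI expect as input.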

![Fig3](figures/Figure%203/Slide1.png)
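For readers who want to see the shape of the scVI and scANVI steps, the function below is a rough sketch that calls scvi-tools and anndata directly. It does not show Oncoordinate's own pipeline functions, which this README does not list. The function name `transfer_states`, the `obs` column names, the use of modality as the batch key, and the training settings are all assumptions made for illustration; both inputs are expected to be AnnData objects with raw counts in `.X`.

```python
def transfer_states(pseudospots, spatial, labels_key="state", unlabeled="Unknown",
                    max_epochs=50):
    """Semi-supervised label transfer from labeled pseudospots to unlabeled spatial spots.

    `pseudospots.obs[labels_key]` must hold the known lineage/state labels.
    Returns a DataFrame of per-spot label probabilities for the spatial spots.
    """
    # Imported here so the sketch can be read and loaded without the heavy dependencies.
    import anndata as ad
    import numpy as np
    import scvi

    # Spatial spots start without labels.
    spatial = spatial.copy()
    spatial.obs[labels_key] = unlabeled

    # Stack both modalities on their shared genes; "modality" is treated as the batch.
    combined = ad.concat(
        {"pseudospot": pseudospots, "spatial": spatial},
        label="modality",
        join="inner",
        index_unique="-",
    )

    # Step 1: scVI learns a shared latent space across the two modalities.
    scvi.model.SCVI.setup_anndata(combined, batch_key="modality", labels_key=labels_key)
    vae = scvi.model.SCVI(combined)
    vae.train(max_epochs=max_epochs)

    # Step 2: scANVI refines the model using the pseudospot labels.
    lvae = scvi.model.SCANVI.from_scvi_model(vae, unlabeled_category=unlabeled)
    lvae.train(max_epochs=max_epochs)

    # Soft predictions (one probability per label) for the spatial spots only.
    spatial_idx = np.flatnonzero((combined.obs["modality"] == "spatial").to_numpy())
    return lvae.predict(indices=spatial_idx, soft=True)
```

A call such as `transfer_states(pseudospot_adata, visium_adata)` would return one row per spatial spot. Those probabilities can then be written back to the spatial object's `obs` and passed to a niche detection tool.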

## 4. References
- Venkat V. et al. Disruptive changes in tissue microenvironment prime oncogenic processes at different stages of carcinogenesis in lung. bioRxiv (2024).
- Venkat VV, De S. Oncoordinate-derived single-cell states translate to malignant clusters in spatial transcriptomic cohorts. bioRxiv (2026).

## 5. Acknowledgements
We thank colleagues at the Rutgers Cancer Institute and members of the De Laboratory for their guidance and support.
