Metadata-Version: 2.4
Name: GenKI
Version: 0.2.0
Summary: Gene knock-out inference from single-cell data with variational graph autoencoders
Author-email: "Yongjian Yang, TAMU" <yjyang027@tamu.edu>
License: MIT
Project-URL: Homepage, https://github.com/yjgeno/GenKI
Project-URL: Repository, https://github.com/yjgeno/GenKI
Keywords: neural network,graph neural network,variational graph neural network,computational-biology,single-cell,gene knock-out,gene regulatory network
Classifier: License :: OSI Approved :: MIT License
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: anndata>=0.9
Requires-Dist: h5py>=3.7
Requires-Dist: matplotlib>=3.6
Requires-Dist: numpy>=1.23
Requires-Dist: pandas>=1.5
Requires-Dist: scanpy>=1.9
Requires-Dist: scikit-learn>=1.1
Requires-Dist: scipy>=1.9
Requires-Dist: statsmodels>=0.13
Requires-Dist: tensorboard>=2.10
Requires-Dist: torch>=1.13
Requires-Dist: torch-geometric>=2.3
Requires-Dist: tqdm>=4.64
Provides-Extra: ray
Requires-Dist: ray>=2.0; extra == "ray"
Provides-Extra: dev
Requires-Dist: build; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: twine; extra == "dev"
Dynamic: license-file

# GenKI — Gene Knock-out Inference

[![PyPI version](https://img.shields.io/pypi/v/genki.svg)](https://pypi.org/project/genki/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![DOI](https://img.shields.io/badge/DOI-10.1093%2Fnar%2Fgkad450-blue)](https://doi.org/10.1093/nar/gkad450)

A Variational Graph Auto-Encoder (VGAE) model for predicting gene perturbation effects from scRNA-seq data. GenKI performs *in silico* gene knock-out experiments on a gene regulatory network (GRN) without requiring real knock-out data.
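
As a rough illustration of what a virtual KO on a GRN can mean, the sketch below removes every edge touching the target gene in a dense adjacency matrix and leaves the rest of the network untouched. This is an assumption for illustration only. The function name is hypothetical, and GenKI's `DataLoader.load_kodata` may store the graph differently (for example as a PyG edge list) and may drop only some of those edges.

```python
import numpy as np


def virtual_knockout(adj: np.ndarray, genes: list, target: str) -> np.ndarray:
    """Hypothetical helper: copy a GRN adjacency matrix and cut all edges of `target`."""
    ko = adj.copy()              # keep the wild-type (WT) network intact
    i = genes.index(target)
    ko[i, :] = 0.0               # edges from the knocked-out gene
    ko[:, i] = 0.0               # edges into the knocked-out gene
    return ko


genes = ["TUBG1", "GENE_A", "GENE_B"]
adj = np.array([[0.0, 0.8, 0.3],
                [0.8, 0.0, 0.5],
                [0.3, 0.5, 0.0]])
ko = virtual_knockout(adj, genes, "TUBG1")
print(ko)
```

The model is then trained on the WT graph only; comparing how it encodes the WT and KO graphs is what turns the missing edges into a per-gene perturbation signal.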

<p align="center">
    <img src="logo.jpg" alt="GenKI logo" width="300"/>
</p>

## Prerequisites

GenKI requires **Python ≥ 3.10**. **PyTorch** and **PyTorch Geometric** are installed automatically as dependencies, using pip's default wheels for your platform, which may not match your CUDA setup. To get a build that matches your CUDA version, install both first:

1. [Install PyTorch](https://pytorch.org/get-started/locally/)
2. [Install PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html)
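
To check which backend ended up in the environment after following the steps above, a short standard-library script can look for both packages without importing them and, if PyTorch is present, ask it which CUDA version it was built against. The `check_backend` name is only for this example; `torch.version.cuda` and `torch.cuda.is_available()` are standard PyTorch attributes.

```python
import importlib.util


def check_backend() -> dict:
    """Report whether PyTorch / PyTorch Geometric are importable and what CUDA they see."""
    report = {
        "torch": importlib.util.find_spec("torch") is not None,
        "torch_geometric": importlib.util.find_spec("torch_geometric") is not None,
        "cuda_build": None,       # CUDA version torch was compiled with (None = CPU-only build)
        "cuda_available": False,  # whether a CUDA device is usable right now
    }
    if report["torch"]:
        import torch  # imported lazily so the check also runs where torch is missing

        report["cuda_build"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    return report


print(check_backend())
```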

## Installation

```shell
pip install GenKI
```

Or install directly from source:

```shell
pip install git+https://github.com/yjgeno/GenKI.git
```

Or with conda (sets up the full environment):

```shell
conda env create -f environment.yml
conda activate ogenki
```
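
To confirm which release pip or conda actually placed in the active environment, the standard library's `importlib.metadata` can be queried. This sketch assumes the distribution name `GenKI` from the package metadata; `installed_version` is a name chosen for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "GenKI") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


v = installed_version()
print(f"GenKI {v}" if v else "GenKI is not installed in this environment")
```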

## Quick Start

The high-level `GenKI` facade runs the whole workflow in one call: it loads and preprocesses the data, builds the GRN, trains the VGAE, and ranks the genes.

```python
from GenKI import GenKI

ranked = GenKI.from_h5ad(
    "data/my_data.h5ad",
    target_gene=["TUBG1"],   # gene(s) to knock out (upper-cased by default)
).run(epochs=100, seed=8096, n_permutations=100)

print(ranked)   # genes ranked by perturbation effect
```
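
`n_permutations` controls how many permuted runs form the null distribution for the significance test. The sketch below shows one common way to turn such a null into per-gene empirical p-values. The array shapes, the add-one rule and the `empirical_pvalues` name are assumptions for illustration, not a description of GenKI's internal ranking code.

```python
import numpy as np


def empirical_pvalues(observed: np.ndarray, null: np.ndarray) -> np.ndarray:
    """
    observed: shape (n_genes,), distance per gene from the real virtual KO.
    null:     shape (n_permutations, n_genes), distances from permuted runs.
    Returns the add-one empirical p-value per gene, i.e. how often the null
    is at least as extreme as the observed distance.
    """
    n_perm = null.shape[0]
    exceed = (null >= observed[None, :]).sum(axis=0)
    return (exceed + 1) / (n_perm + 1)


rng = np.random.default_rng(0)
observed = np.array([3.0, 0.2, 1.1])
null = rng.exponential(scale=0.5, size=(100, 3))
print(empirical_pvalues(observed, null))
```

The add-one correction keeps p-values strictly above zero, so with 100 permutations the smallest attainable value is 1/101; more permutations buy finer resolution at the cost of runtime.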

Separate the training and prediction steps when you want to inspect the model in between:

```python
gk = GenKI.from_h5ad("data/my_data.h5ad", target_gene=["TUBG1"])
gk.fit(epochs=100, lr=7e-4, beta=1e-4, seed=8096)
ranked = gk.predict(n_permutations=100, by="KL")

print(gk.metrics)        # (epochs, loss, AUROC, AP)
gk.loader, gk.trainer    # escape hatch to the underlying objects
```
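
Assuming GenKI follows the standard VGAE recipe, the AUROC and AP values in `gk.metrics` are link-reconstruction scores: held-out GRN edges (positives) are scored against sampled non-edges (negatives). PyG's `GAE.test` computes them with scikit-learn's `roc_auc_score` and `average_precision_score`. The dependency-light NumPy sketch below reproduces both definitions for intuition; `link_auroc_ap` is a hypothetical name, and tied scores are handled more simply for AP than in scikit-learn.

```python
import numpy as np


def link_auroc_ap(pos_scores, neg_scores):
    """AUROC and average precision for edge scores of true edges vs. non-edges."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)

    # AUROC = probability that a true edge outscores a non-edge (ties count half).
    diff = pos[:, None] - neg[None, :]
    auroc = (diff > 0).mean() + 0.5 * (diff == 0).mean()

    # AP = mean of the precision measured at the rank of each true edge.
    scores = np.concatenate([pos, neg])
    labels = np.concatenate([np.ones_like(pos), np.zeros_like(neg)])
    hits = labels[np.argsort(-scores, kind="stable")]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    ap = (precision_at_k * hits).sum() / hits.sum()
    return float(auroc), float(ap)


print(link_auroc_ap([0.9, 0.3], [0.5, 0.1]))
```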

Start from an in-memory `AnnData` instead of a file (set `preprocess=True` to normalize/standardize it):

```python
import scanpy as sc

adata = sc.read_h5ad("data/my_data.h5ad")
gk = GenKI.from_adata(adata, target_gene=["TUBG1"], preprocess=True)
ranked = gk.run(seed=8096)
```
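
For orientation, `preprocess=True` applies a normalize-then-standardize step. The NumPy function below is a hypothetical stand-in showing a typical version of that step: library-size normalization, `log1p`, then per-gene z-scoring. In scanpy the same sequence is roughly `sc.pp.normalize_total`, `sc.pp.log1p` and `sc.pp.scale`. GenKI's own preprocessing may use different parameters or extra steps.

```python
import numpy as np


def normalize_log_scale(counts: np.ndarray, target_sum: float = 1e4) -> np.ndarray:
    """Cells x genes raw counts -> library-size normalized, log1p, per-gene z-scores."""
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=1, keepdims=True)
    lib[lib == 0] = 1.0                    # avoid division by zero for empty cells
    x = np.log1p(counts / lib * target_sum)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0                      # constant genes end up at 0 after centering
    return (x - x.mean(axis=0)) / sd


X = np.array([[1, 0, 3],
              [2, 2, 0],
              [0, 4, 4]])
print(normalize_log_scale(X).round(3))
```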

Building the GRN in parallel needs the optional Ray extra (`pip install "GenKI[ray]"`); pass `n_cpus` and other GRN options as keyword arguments, e.g. `GenKI.from_h5ad(..., rebuild_grn=True, n_cpus=8)`.
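
When the same script runs on machines with and without the Ray extra, a small guard can choose `n_cpus`. `resolve_n_cpus` is a hypothetical helper, and falling back to a single worker when Ray is absent is an assumption about how you might want to run, not documented GenKI behavior.

```python
import importlib.util
import os


def resolve_n_cpus(requested: int) -> int:
    """Hypothetical helper: clamp a requested worker count to what is usable here."""
    if importlib.util.find_spec("ray") is None:
        return 1  # no Ray extra installed: plan for a serial GRN build
    available = os.cpu_count() or 1
    return max(1, min(requested, available))


print(resolve_n_cpus(8))
```

The result can then be passed straight through, e.g. `GenKI.from_h5ad("data/my_data.h5ad", target_gene=["TUBG1"], rebuild_grn=True, n_cpus=resolve_n_cpus(8))`.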

<details>
<summary><b>Lower-level API</b> (fine-grained control over each step)</summary>

```python
from GenKI.preprocessing import build_adata
from GenKI.dataLoader import DataLoader
from GenKI.train import VGAE_trainer
from GenKI import utils

# 1. Load and preprocess data
adata = build_adata("data/my_data.h5ad")

# 2. Build GRN and prepare WT / virtual-KO graph data
data_wrapper = DataLoader(
    adata,
    target_gene=["TUBG1"],   # gene to knock out
    target_cell=None,         # None = use all cells
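    GRN_file_dir="GRNs",      # folder where GRN files are stored
    rebuild_GRN=True,         # build the GRN with pcNet instead of loading a saved one
    pcNet_name="pcNet",       # file name of the GRN
    verbose=True,
    n_cpus=8,                 # parallel GRN build; requires the Ray extra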
)
data_wt = data_wrapper.load_data()
data_ko = data_wrapper.load_kodata()

# 3. Train VGAE
sensei = VGAE_trainer(data_wt, epochs=100, lr=7e-4, beta=1e-4, seed=8096)
sensei.train()

# 4. Get latent distributions and compute KL divergence per gene
z_mu_wt, z_std_wt = sensei.get_latent_vars(data_wt)
z_mu_ko, z_std_ko = sensei.get_latent_vars(data_ko)
dis = utils.get_distance(z_mu_ko, z_std_ko, z_mu_wt, z_std_wt, by="KL")

# 5. Rank genes by perturbation effect (with permutation test)
null = sensei.pmt(data_ko, n=100, by="KL")
res = utils.get_generank(data_wt, dis, null)
print(res)
```

</details>
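
Step 4 of the lower-level API scores each gene by how far its latent distribution moves between the virtual-KO and WT graphs. Assuming the encoder outputs a diagonal Gaussian per gene (mean `z_mu`, standard deviation `z_std`, both of shape genes × latent dimensions), the closed-form KL divergence can be computed as below. Computing KL(KO ‖ WT) mirrors the argument order passed to `get_distance`, but that is an assumption about its internals, and `gaussian_kl_per_gene` is a name chosen for this sketch.

```python
import numpy as np


def gaussian_kl_per_gene(mu_p, std_p, mu_q, std_q):
    """
    KL(P || Q) between diagonal Gaussians, summed over latent dimensions:
        sum_d [ log(s_q / s_p) + (s_p^2 + (m_p - m_q)^2) / (2 s_q^2) - 1/2 ]
    Inputs have shape (n_genes, latent_dim); the result has shape (n_genes,).
    """
    mu_p, std_p, mu_q, std_q = (np.asarray(a, dtype=float) for a in (mu_p, std_p, mu_q, std_q))
    var_p, var_q = std_p**2, std_q**2
    kl = np.log(std_q / std_p) + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q) - 0.5
    return kl.sum(axis=1)


rng = np.random.default_rng(0)
mu_wt, std_wt = rng.normal(size=(4, 2)), np.full((4, 2), 1.0)
mu_ko, std_ko = mu_wt.copy(), std_wt.copy()
mu_ko[0] += 2.0  # pretend only gene 0 shifts after the virtual KO
print(gaussian_kl_per_gene(mu_ko, std_ko, mu_wt, std_wt))
```

Genes whose embeddings barely move score near zero, while genes strongly connected to the knocked-out gene get larger divergences, which is what the ranking step then sorts on.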

## API

| Symbol | Description |
|---|---|
| `GenKI.GenKI` | High-level facade: `from_h5ad` / `from_adata` constructors and `fit` / `predict` / `run` methods covering the full workflow |
| `GenKI.dataLoader.DataLoader` | Wraps an `AnnData` object, builds/loads the GRN, and produces PyG `Data` objects for WT and virtual-KO conditions |
| `GenKI.train.VGAE_trainer` | Trains the VGAE, exposes latent variables, permutation testing, and model save/load |
| `GenKI.utils.get_distance` | Computes per-gene distribution distance (KL, EMD, t-test) between two latent spaces |
| `GenKI.utils.get_generank` | Ranks genes by perturbation score; optionally filters by permutation-test significance |
| `GenKI.preprocessing.build_adata` | Loads an `.h5ad` file and adds a log-normalized layer used by `DataLoader` |
| `GenKI.pcNet.make_pcNet` | Builds a principal-component-based GRN from expression data (optionally parallelized with Ray) |
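
`make_pcNet` builds the GRN by principal-component regression. The sketch below follows the general pcNet idea popularized by scTenifoldNet: each gene is regressed on the leading principal components of all other genes, and the coefficients, mapped back to gene space, fill that gene's row of the network. The function name, the `n_comp=3` default and the max-abs scaling are illustrative assumptions; GenKI's implementation may differ in scaling, symmetrization and edge pruning.

```python
import numpy as np


def pc_regression_network(x: np.ndarray, n_comp: int = 3) -> np.ndarray:
    """
    x: cells x genes expression matrix. Returns a genes x genes matrix whose row i
    holds the weights of every other gene when predicting gene i by PC regression.
    """
    x = np.asarray(x, dtype=float)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0
    x = (x - x.mean(axis=0)) / sd                  # z-score each gene
    n_genes = x.shape[1]
    net = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        y = x[:, i]
        others = np.delete(x, i, axis=1)           # predictors: all other genes
        _, _, vt = np.linalg.svd(others, full_matrices=False)
        loadings = vt[:n_comp].T                   # (n_genes - 1, n_comp) PC loadings
        scores = others @ loadings                 # (cells, n_comp), mutually orthogonal
        gamma = scores.T @ y / (scores**2).sum(axis=0)  # least squares per component
        net[i, np.arange(n_genes) != i] = loadings @ gamma  # back to gene space
    peak = np.abs(net).max()
    return net / peak if peak > 0 else net         # scale weights into [-1, 1]


rng = np.random.default_rng(1)
expr = rng.normal(size=(40, 5))
print(pc_regression_network(expr, n_comp=2).round(2))
```

Because the PC scores are orthogonal, regressing on each component separately gives the same coefficients as a joint least-squares fit, and keeping only a few components regularizes the otherwise ill-posed gene-on-all-genes regression.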

## Tutorial

Step-by-step virtual KO example:
[notebook/Example.ipynb](https://github.com/yjgeno/GenKI/blob/master/notebook/Example.ipynb)

## Citation

If you use GenKI in your research, please cite:

> Yang Y, Li G, Zhong Y, Xu Q, Lin YT, Roman-Vicharra C, Chapkin RS, Cai JJ. *Gene knockout inference with variational graph autoencoder learning single-cell gene regulatory networks*. Nucleic Acids Research, 2023. https://doi.org/10.1093/nar/gkad450
