Metadata-Version: 2.2
Name: biofusion
Version: 0.0.6
Summary: Multilayer networks for biological multimodal data fusion and analysis.
Home-page: https://github.com/CalmScout/BioFusion
Author: Anton Popov
Author-email: anton.popov@bsc.es
License: MIT License
Keywords: multimodal network biological
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: seaborn
Requires-Dist: tqdm
Requires-Dist: rdflib
Requires-Dist: joblib
Requires-Dist: cobra
Requires-Dist: networkx
Requires-Dist: pympler
Requires-Dist: memory-profiler
Requires-Dist: ipywidgets
Requires-Dist: nbdev
Requires-Dist: jupyterlab
Requires-Dist: jupyterlab-quarto
Requires-Dist: ray[data,serve,train,tune]
Requires-Dist: bioc
Provides-Extra: dev
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# BioFusion


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

A tool for multimodal biological data integration and analysis with the
help of multilayer networks.

This repository contains code developed during a collaboration between
Fujitsu Research of Europe and Barcelona Supercomputing Center.

## Installation

You can install the package from PyPI:

``` bash
pip install biofusion
```

For developers, to install the latest version of the package in editable
mode, run the command:

``` bash
pip install -e .
```

from the package root directory.

## End-to-end example

### 1. Set up the project

#### 1.1. Install the `uv` package manager

Follow the [installation
instructions](https://docs.astral.sh/uv/getting-started/installation/).
For Linux/macOS the command is:

``` bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

#### 1.2. Create the project directory and a corresponding Python environment

``` bash
mkdir biofusion-demo
cd biofusion-demo
uv venv --python=3.12.9
```

The last command creates a `.venv` folder containing a local Python
environment. Activate it:

``` bash
source .venv/bin/activate
```

Install the `biofusion` package:

``` bash
uv pip install biofusion
```
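
To confirm that the installation succeeded, you can query the installed version from Python. The snippet below is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and not part of `biofusion`. Note that the distribution is named `biofusion`, while the examples in this README import it as `BioFusion`.

``` python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed in the active
# environment, otherwise prints None.
print(installed_version("biofusion"))
```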

### 2. Create the data files

#### 2.1. Create the `data` folder

``` bash
mkdir data
```

#### 2.2. Populate the data folder

In the root of the project, create a notebook (e.g. `01_demo.ipynb`).
Open the notebook in your favorite IDE (e.g. VS Code) and select the
Jupyter kernel from the environment created above. You are now ready to
generate some synthetic data for testing the community detection
algorithms. In the notebook, enter and run the following cell:

``` python
from BioFusion.utils import generate_and_save_graphs
# Each layer/graph is described by a tuple of parameters:
# (number of unique nodes, probability of an edge between two random nodes, label string)
graph_params = [(300, 0.2, ""), (500, 0.2, ""), (400, 0.2, ""), (300, 0.4, "")]
# All generated graphs are stored in the directory below as `1.csv`, ..., `<N>.csv`,
# where <N> is the number of tuples in the list `graph_params`
path_dir_to = "./data/"
generate_and_save_graphs(graph_params, path_dir_to)
```
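
The parameters above (a node count and an independent edge probability) describe a G(n, p) random graph. The following sketch shows how one such layer could be sampled and written to disk using only the standard library. It is an illustration under stated assumptions, not the implementation of `generate_and_save_graphs`: the function `save_random_layer`, the two-column edge-list layout, and the `source`/`target` column names are all hypothetical, and the files produced by BioFusion may use a different format.

``` python
import csv
import itertools
import random
import tempfile
from pathlib import Path


def save_random_layer(n_nodes, edge_prob, path, seed=None):
    """Sample a G(n, p) random graph and write its edges to a CSV file.

    Every unordered node pair becomes an edge independently with
    probability `edge_prob`. Returns the number of edges written.
    """
    rng = random.Random(seed)
    edges = [
        pair
        for pair in itertools.combinations(range(n_nodes), 2)
        if rng.random() < edge_prob
    ]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["source", "target"])  # hypothetical column names
        writer.writerows(edges)
    return len(edges)


# Write to a temporary directory so the real ./data folder is untouched.
demo_dir = Path(tempfile.mkdtemp())
demo_file = demo_dir / "1.csv"
n_edges = save_random_layer(300, 0.2, demo_file, seed=0)
print(n_edges, "edges written to", demo_file)
```

With 300 nodes there are 44,850 possible node pairs, so an edge probability of 0.2 yields roughly 9,000 edges per layer.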

#### 2.3. Create the output folder

Create a folder to store the results of the analysis:

``` bash
mkdir out
```

After running the commands in this section, the project will contain the
following files:

``` bash
biofusion-demo$ tree
.
├── 01_demo.ipynb
├── data
│   ├── 1.csv
│   ├── 2.csv
│   ├── 3.csv
│   └── 4.csv
└── out
```

### 3. Run community detection

Import required dependencies:

``` python
import os
from BioFusion.cmmd import cmmd
```

Define the layers of the multilayer network:

``` python
prefix = "./data/"
input_layers = [prefix + x for x in os.listdir(prefix) if x.endswith(".csv")]
# sort the input layers; os.listdir returns entries in arbitrary order
input_layers.sort()
```
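
A plain `sort()` orders the paths lexicographically, which is fine for the four files in this example but places `10.csv` before `2.csv` once there are ten or more layers. If the layer order matters in your setup, a numeric sort key avoids this. The sketch below assumes file names of the form `<N>.csv`, as described above; the helper `layer_index` is hypothetical.

``` python
from pathlib import Path


def layer_index(path: str) -> int:
    """Return the numeric stem of a layer file, e.g. './data/12.csv' -> 12."""
    return int(Path(path).stem)


paths = ["./data/10.csv", "./data/2.csv", "./data/1.csv"]

print(sorted(paths))
# ['./data/1.csv', './data/10.csv', './data/2.csv']
print(sorted(paths, key=layer_index))
# ['./data/1.csv', './data/2.csv', './data/10.csv']
```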

Define parameters of the community detection algorithm:

``` python
gamma_min = 0
gamma_max = 10
gamma_step = 0.5
path_to_communities = "./out/"
```
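
Assuming the three `gamma_*` values define an evenly spaced sweep over a parameter of the community detection algorithm, the sketch below enumerates the values such a sweep would visit. This README does not state whether `gamma_max` itself is included, so both conventions are shown; check the `cmmd` documentation for the actual behaviour.

``` python
gamma_min, gamma_max, gamma_step = 0, 10, 0.5

# Number of whole steps that fit between the two bounds.
n_steps = round((gamma_max - gamma_min) / gamma_step)

# Convention 1: upper bound excluded (like Python's range).
half_open = [gamma_min + i * gamma_step for i in range(n_steps)]
# Convention 2: upper bound included.
closed = [gamma_min + i * gamma_step for i in range(n_steps + 1)]

print(len(half_open), half_open[-1])  # 20 9.5
print(len(closed), closed[-1])        # 21 10.0
```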

Run the community detection algorithm:

``` python
cmmd_output = cmmd(
    nodelist = None,
    input_layers = input_layers,
    gamma_min = gamma_min,
    gamma_max = gamma_max,
    gamma_step = gamma_step,
    path_to_communities = path_to_communities,
    distmethod = "hamming")
```

The output of the algorithm is stored in the `./out` folder.
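
To give an intuition for `distmethod = "hamming"`, the sketch below computes a normalized Hamming distance between per-node community trajectories, i.e. the fraction of positions at which two nodes are assigned to different communities. This is an illustration under assumptions, not the internals of `cmmd`: the trajectory layout (one community label per step of the sweep) and the toy labels are hypothetical.

``` python
from itertools import combinations


def hamming(a, b):
    """Fraction of positions at which two equal-length label sequences differ."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Toy trajectories: the community label of each node at four sweep steps.
trajectories = {
    "n1": [0, 0, 1, 2],
    "n2": [0, 0, 1, 3],
    "n3": [0, 1, 4, 5],
}

distances = {
    (u, v): hamming(trajectories[u], trajectories[v])
    for u, v in combinations(trajectories, 2)
}
print(distances)
# {('n1', 'n2'): 0.25, ('n1', 'n3'): 0.75, ('n2', 'n3'): 0.75}
```

Nodes that stay in the same community across most steps end up close to each other, while nodes that are separated early end up far apart.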

The whole script:

``` python
import os
from BioFusion.utils import generate_and_save_graphs
from BioFusion.cmmd import cmmd

graph_params = [(300, 0.2, ""), (500, 0.2, ""), (400, 0.2, ""), (300, 0.4, "")]

path_dir_to = "./data/"
generate_and_save_graphs(graph_params, path_dir_to)
prefix = "./data/"

input_layers = [prefix + x for x in os.listdir(prefix) if x.endswith(".csv")]
input_layers.sort()

gamma_min = 0
gamma_max = 10
gamma_step = 0.5

path_to_communities = "./out/"

cmmd_output = cmmd(
    nodelist = None,
    input_layers = input_layers,
    gamma_min = gamma_min,
    gamma_max = gamma_max,
    gamma_step = gamma_step,
    path_to_communities = path_to_communities,
    distmethod = "hamming")
```

## Organisation

The directory structure is as follows:

    .
    |-- data
    |   |-- GeneCelltypes
    |   |   |-- gene_celltypes_all_common.txt
    |   |   |-- gene_celltypes_all_common_cnv.txt
    |   |   |-- gene_celltypes_all_common_rna.txt
    |   |   |-- gene_celltypes_all_unique.txt
    |   |   |-- gene_celltypes_all_unique_cnv.txt
    |   |   `-- gene_celltypes_all_unique_rna.txt
    |   |-- MultilayerCommunities
    |   |   |-- <BSC-community-trajectories.tsv>
    |   |   `-- <BSC-distance-matrix.tsv>
    |   |-- MultilayerGraphs
    |   |   |-- <BSC-MLN-layer-1.json>
    |   |   |-- :
    |   |   `-- <BSC-MLN-layer-5.json>
    |   |-- TCGA_BRCA_Dic_Hover_files
    |   |   `-- TCGA-E2-A1B6-01A-03-TSC.f0917d61-c963-42cf-86c7-48b1e70c662d.pt
    |   |-- TopGenesWSI
    |   |   |-- common_genes
    |   |   |   |-- box_level
    |   |   |   |   `-- TCGA-E2-A1B6-01A-03-TSC.f0917d61-c963-42cf-86c7-48b1e70c662d
    |   |   |   |       `-- stats.csv
    |   |   |   `-- wsi_level
    |   |   `-- unique_genes
    |   |       |-- box_level
    |   |       `-- wsi_level
    |   |-- cnv.csv
    |   `-- rna.csv
    |-- outputs
    |   |-- TCGA_BRCA_spatial
    |   |-- TCGA_Gene_Graphs
    |   `-- TopGenesMLN
    |-- scripts
    |   |-- create_gene_graph.py
    |   |-- create_gene_list.py
    |   |-- get_WSI_celltype_weights.py
    |   `-- get_WSI_gene_info.py
    |-- README.md
    `-- requirements.txt

## Usage

The Python scripts can be run from the `/scripts` directory after
installing all necessary Python modules as listed in `requirements.txt`.

The following scripts are provided:

`create_gene_list.py`

- Description: This script finds the set of genes that are common
  between the MLN and the genomic data (CNV or RNA). Files in
  `/data/GeneCelltypes` with the suffix `_cnv` or `_rna` are generated
  using this script.
- Input: `/data/GeneCelltypes`, `/data/cnv.csv`
- Output: `/data/GeneCelltypes`

`get_WSI_gene_info.py`

- Description: This script/module reads the top genes from WSI patches
  and retrieves gene associations and significant neighbourhood
  communities from the multilayer network.
- Input: `/data/TopGenesWSI`
- Output: `/outputs/TopGenesMLN`

`get_WSI_celltype_weights.py`

- Description: This script takes WSI graphs (where patches correspond
  to groups of nodes), gene-celltype associations, and bulk-RNA data,
  and produces heatmaps of approximated spatial gene expression.
- Input: `/data/TCGA_BRCA_Dic_Hover_files`, `/data/GeneCelltypes`,
  `/data/rna.csv`
- Output: `/outputs/TCGA_BRCA_spatial`

`create_gene_graph.py`

- Description: This script takes the genomic data (CNV or RNA) and the
  MLN graphs (along with the computed Louvain-community-based Hamming
  distance matrix) and generates a hierarchical-clustering-based
  similarity matrix for the genes and a gene graph with edge attributes
  reflecting the gene-gene similarities.
- Input: `/data/cnv.csv`, `/data/MultilayerGraphs`,
  `/data/MultilayerCommunities`
- Output: `/outputs/TCGA_Gene_Graphs`
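
The gene-matching step described for `create_gene_list.py` amounts to a set intersection. The sketch below illustrates the idea only; it does not reproduce the script, its file parsing, or its output format. The helper `common_genes` is hypothetical, and the gene symbols are illustrative placeholders rather than values taken from the data files.

``` python
def common_genes(mln_genes, genomic_genes):
    """Return the genes present in both collections, sorted for stable output."""
    return sorted(set(mln_genes) & set(genomic_genes))


# Illustrative inputs: genes found in the MLN and genes found in the
# genomic table (CNV or RNA).
mln_genes = ["TP53", "BRCA1", "MYC", "EGFR"]
genomic_genes = ["BRCA1", "EGFR", "PTEN"]

print(common_genes(mln_genes, genomic_genes))  # ['BRCA1', 'EGFR']
```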
