hcitools

The hcitools package provides tools for analyzing and visualizing data generated in high-content imaging experiments.

Installation

# Clone repository
git clone -b prod git@mygithub.gsk.com:to561778/hci-tools.git

# Install package
python -m pip install -e hci-tools

Usage

Package documentation is available here. See docs/examples for detailed guides on generating figures and performing common analysis steps.

Developer Instructions

Use the script below to set up a development environment for this package.

# Clone the repository
git clone -b dev git@mygithub.gsk.com:to561778/hci-tools.git
cd hci-tools

# Create conda environment
conda env create -f environment.yml
conda activate hcitools

# Install the package
python -m pip install -e .

Deploying Changes

Once changes have been made, use the scripts/deploy.sh script to rebuild the package wheel and update the documentation. This will also reinstall the package in the active environment.

Note: Only run deploy.sh from the top-level hci-tools directory.

Examples

Heatmaps

This example shows how to generate several kinds of heatmaps and annotate them using the Plotly library.

Datasets

This example uses two of the built-in datasets:

  • covid - Protein expression data from a cohort of COVID-19 patients
  • ros-mito - High-content imaging features from a time-course experiment
# Imports
from hcitools import datasets, plot

# Load datasets
covid = datasets.load_dataset('covid')
ros = datasets.load_dataset('ros-mito')

# Plotly renderer: keep only the line that matches where you run this code
plot.set_renderer('notebook')  # Use this when running in a notebook
# plot.set_renderer('iframe_connected')  # Use this when rendering docs

Protein Expression Heatmaps

Here, we'll create a heatmap to look at the expression of proteins in the patients' blood. We'll include colorbars for patient sex and mortality. We'll also look at how you could add annotations to highlight certain regions of the heatmap.

# Prepare data frame
data = (covid.copy()
    .filter(regex='^B-', axis=1))  # Keep only blood markers
metadata = covid[['Mortality', 'Sex']]
data.columns = [x[2:] for x in data.columns]  # Strip the 'B-' prefix

# Define groups for heatmap
row_groups = {
    k: list(v.values()) for k, v in metadata.to_dict(orient='index').items()
}
row_group_names = ['Mortality', 'Sex']
row_colors = {'Alive': '#38d652', 'Dead': '#d93e38',
              'Male': 'blue', 'Female': 'pink'}

# Create heatmap
fig = plot.heatmap(
    data=data,
    clust_rows=True,
    clust_cols=True,
    row_colors=row_colors,
    row_groups=row_groups,
    row_group_names=row_group_names
)

# Add a title and tweak the size
fig.update_layout(
    title='Blood Biomarkers',
    title_x=0.5,
    height=400,
    width=700
)

# Annotate highly expressed proteins
fig.add_shape(
    type='rect',
    x0='MCP-1', x1='EGF',
    y0=0, y1=88,
    row=1, col=3,
    line=dict(color='black')
)

fig.show()
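For reference, `row_groups` maps each row label of `data` to a list of group labels in the same column order as `row_group_names`, so each label presumably needs a color in `row_colors`. The stdlib sketch below rebuilds that structure from a hypothetical three-patient stand-in for `metadata.to_dict(orient='index')` (the values are made up) and checks that every label has a color. It only illustrates the data shape and does not call `plot.heatmap`.

```python
# Hypothetical stand-in for metadata.to_dict(orient='index'):
# {row label: {column name: value}}, columns in the order Mortality, Sex
metadata_records = {
    0: {'Mortality': 'Alive', 'Sex': 'Female'},
    1: {'Mortality': 'Dead', 'Sex': 'Male'},
    2: {'Mortality': 'Alive', 'Sex': 'Male'},
}

# Same comprehension as in the example: one list of group labels per row
row_groups = {k: list(v.values()) for k, v in metadata_records.items()}
row_group_names = ['Mortality', 'Sex']
row_colors = {'Alive': '#38d652', 'Dead': '#d93e38',
              'Male': 'blue', 'Female': 'pink'}

# Collect any group label that has no color assigned
used_labels = {label for labels in row_groups.values() for label in labels}
missing_colors = used_labels - set(row_colors)

print(row_groups)      # {0: ['Alive', 'Female'], 1: ['Dead', 'Male'], 2: ['Alive', 'Male']}
print(missing_colors)  # set()
```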

Correlation Maps

Here, we'll generate a heatmap to visualize the correlation of blood proteins with markers of clinical severity.

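# Prepare data frame: correlate every blood marker with each severity score
severity = ['APACHE1h', 'APACHE24h', 'CCI']
data = (covid.copy()
    .set_index(severity)
    .filter(regex='^B-')        # Keep only blood markers
    .reset_index()              # Bring the severity scores back as columns
    .corr()
    .loc[severity, :]           # Rows: severity scores
    .drop(severity, axis=1))    # Columns: blood markers only
data.columns = [x[2:] for x in data.columns]  # Strip the 'B-' prefix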

# Create heatmap
fig = plot.heatmap(
    data=data,
    clust_cols=True,
    clust_rows=True
)

fig.update_layout(
    title='Correlation with Clinical Severity',
    title_x=0.5,
    height=400,
    width=700
)

# Show ticks on the y axis (these are hidden by default)
fig.update_yaxes(
    showticklabels=True,
    tickfont_size=14
)

fig.show()
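The method chain above yields a matrix with one row per severity score and one column per blood marker, where each cell is a correlation coefficient (`DataFrame.corr()` defaults to Pearson). The stdlib sketch below computes the same kind of matrix for hypothetical toy columns. Unlike pandas, which excludes missing values pairwise, it assumes complete data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy values: two severity scores and two blood markers
severity = {'APACHE1h': [10, 12, 15, 20], 'CCI': [1, 2, 2, 4]}
proteins = {'MCP-1': [5.0, 6.0, 7.5, 9.0], 'EGF': [3.0, 2.5, 2.0, 1.0]}

# Rows = severity scores, columns = markers (same layout as the heatmap data)
corr = {s: {p: pearson(sv, pv) for p, pv in proteins.items()}
        for s, sv in severity.items()}

for s, row in corr.items():
    print(s, {p: round(r, 2) for p, r in row.items()})
```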

Plate Map

Next, we'll generate an interactive heatmap that displays a single imaging feature across a 96-well (or 384-well) plate.

fig = plot.plate_heatmap(
    data=ros,
    feature="Non-border cells - Number of Objects"
)
fig.update_layout(width=900, height=500)

fig.show()
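Conceptually, a plate heatmap places one value per well on a rows × columns grid: 8 × 12 (rows A to H) for a 96-well plate and 16 × 24 (rows A to P) for a 384-well plate. The sketch below arranges per-well values into such a grid. It assumes well IDs of the form 'A01' (a row letter followed by a column number) and uses made-up counts; it illustrates the layout only and is not how `plot.plate_heatmap` is implemented.

```python
import string

# Plate formats as (rows, columns)
PLATE_SHAPES = {96: (8, 12), 384: (16, 24)}


def plate_grid(values, n_wells=96):
    """Arrange {well_id: value} into a row-major grid, with None for empty wells.

    Assumes well IDs such as 'A1' or 'B07': one row letter, then a column number.
    """
    n_rows, n_cols = PLATE_SHAPES[n_wells]
    grid = [[None] * n_cols for _ in range(n_rows)]
    for well, value in values.items():
        row = string.ascii_uppercase.index(well[0].upper())
        col = int(well[1:]) - 1
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            raise ValueError(f"{well!r} is outside a {n_wells}-well plate")
        grid[row][col] = value
    return grid


# Hypothetical per-well cell counts
counts = {'A01': 412, 'A02': 398, 'H12': 455}
grid = plate_grid(counts)
print(grid[0][:3])  # [412, 398, None]
print(grid[7][11])  # 455
```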

Clustering

This example shows how to perform dimensionality reduction and visualize the resulting clusters. It also shows how to carry out common preprocessing steps with preprocess.clean_data.

Datasets

This example uses the ros-mito dataset, which contains features extracted from high-content images.

# Imports
from hcitools import datasets, plot, analysis, preprocess

# Load dataset
ros = datasets.load_dataset('ros-mito')

# Plotly renderer: keep only the line that matches where you run this code
plot.set_renderer('notebook')  # Use this when running in a notebook
# plot.set_renderer('iframe_connected')  # Use this when rendering docs

# Preprocessing
meta = ['Well', 'Row', 'Column', 'Timepoint', 'Compound', 'Conc']
df, dropped, LOG = preprocess.clean_data(
    data=ros,
    metacols=meta,
    dropna=True,
    drop_low_var=0.0,
    corr_thresh=0.9,
    verbose=True
)
df = df.set_index(meta)
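Here `clean_data` is called with `drop_low_var=0.0` and `corr_thresh=0.9`. Judging by the parameter names, these likely remove constant (zero-variance) features and one feature from each highly correlated pair, but that is an assumption; the `preprocess.clean_data` docstring is the authority on its actual rules. The stdlib sketch below shows one common way to implement this kind of filtering: a greedy pass that keeps a feature only if its absolute Pearson correlation with every already-kept feature is at most the threshold. The feature names and values are hypothetical.

```python
import math
from statistics import pvariance


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / denom if denom else 0.0


def filter_features(features, var_thresh=0.0, corr_thresh=0.9):
    """Drop low-variance features, then one feature of each highly correlated pair.

    features: dict of feature name -> list of numbers (no missing values).
    Returns (kept, dropped) lists of feature names.
    """
    # Step 1: drop features whose variance is at or below the threshold
    low_var = [name for name, vals in features.items() if pvariance(vals) <= var_thresh]
    dropped = list(low_var)
    kept = []
    # Step 2: greedy pass in input order; keep a feature only if it is not
    # too strongly correlated with any feature kept so far
    for name, vals in features.items():
        if name in low_var:
            continue
        if all(abs(pearson(vals, features[k])) <= corr_thresh for k in kept):
            kept.append(name)
        else:
            dropped.append(name)
    return kept, dropped


# Hypothetical toy features measured in four wells
features = {
    'area': [10, 20, 30, 40],
    'perimeter': [11, 19, 31, 41],  # nearly collinear with area
    'intensity': [5, 3, 6, 2],
    'constant': [1, 1, 1, 1],       # zero variance
}
kept, dropped = filter_features(features)
print(kept)     # ['area', 'intensity']
print(dropped)  # ['constant', 'perimeter']
```

A greedy pass depends on column order: whichever feature of a correlated pair appears first is the one kept.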

# Reduce dimensionality with PCA and t-SNE (default arguments)
proj, expvar = analysis.dim_reduction(data=df, method=['pca', 'tsne'])

# Plot PCA components
fig = plot.pca_comps(proj, expvar, n_comps=3)
fig.update_layout(width=700, height=400)

fig.show()
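`expvar` presumably holds the variance explained by each principal component, which `plot.pca_comps` displays alongside the projections. In PCA, the explained-variance ratio of a component is its covariance-matrix eigenvalue divided by the sum of all eigenvalues (the trace, i.e. the total variance). For two features the 2 × 2 eigenvalues have a closed form, so the stdlib sketch below computes the ratios directly. Real data with many features needs a numerical eigen- or SVD solver, and whether hcitools reports ratios or percentages is not confirmed here.

```python
import math


def explained_variance_ratio_2d(xs, ys):
    """PCA explained-variance ratios for two features, largest component first."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)                     # var(x)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)                     # var(y)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)   # cov(x, y)
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]]
    half_trace = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigvals = [half_trace + spread, half_trace - spread]
    total = a + c  # the trace equals the sum of the eigenvalues
    return [lam / total for lam in eigvals]


# Perfectly correlated features: the first component explains everything
print(explained_variance_ratio_2d([1, 2, 3, 4], [2, 4, 6, 8]))
# Uncorrelated features with equal variance: the variance splits evenly
print(explained_variance_ratio_2d([1, -1, 0, 0], [0, 0, 1, -1]))  # [0.5, 0.5]
```

The sample (n - 1) versus population (n) normalization cancels out in the ratio, so either choice gives the same result.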

# Compare two compounds
fig = plot.clusters(proj, 'Sorafenib Tosylate', 'Imatinib mesylate', 'tsne')
fig.update_layout(width=750, height=450)

fig.show()
"""
.. include:: ../README.md

# Examples
.. include:: ../docs/heatmaps.md
.. include:: ../docs/clustering.md
"""

import os

__all__ = ['preprocess', 'analysis', 'plot', 'datasets']

location = os.path.dirname(os.path.realpath(__file__))


class datasets:
    """
    Class for loading built-in datasets
    """

    _avail = {
        'caer': os.path.join(location, 'datasets', 'caer-timecourse.tsv'),
        'covid': os.path.join(location, 'datasets', 'covid-cohort.tsv'),
        'ros-mito': os.path.join(location, 'datasets', 'ros-mito-timecourse.tsv')
    }

    @staticmethod
    def list_datasets():
        """
        List available built-in datasets
        """

        print("Available Datasets:", *datasets._avail.keys(), sep='\n')

    @staticmethod
    def load_dataset(dataset):
        """
        Load a built-in dataset

        Parameters
        ----------
        `dataset` : str
            One of 'covid', 'caer' or 'ros-mito'

        Returns
        -------
        pd.DataFrame
            Desired dataset
        """

        # Validate explicitly: assert statements are skipped under `python -O`
        if dataset not in datasets._avail:
            raise ValueError("Unknown dataset. See datasets.list_datasets()")
        from pandas import read_csv

        return read_csv(datasets._avail[dataset], sep='\t')
class datasets:

Class for loading built-in datasets

datasets()
def list_datasets():

List available built-in datasets

def load_dataset(dataset):

Load a built-in dataset

Parameters
  • dataset (str): One of 'covid', 'caer' or 'ros-mito'
Returns
  • pd.DataFrame: Desired dataset