Utilities and Advanced APIs

Threading

set_num_threads(n_threads)

Controls the maximum number of internal threads used by GO3 batch operations.

import go3
go3.set_num_threads(8)

IC lookup

term_ic(go_id, counter) returns the Information Content for one term.

ic = go3.term_ic("GO:0006397", counter)
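For readers unfamiliar with Information Content, the sketch below shows the usual corpus-based definition, IC(t) = -log p(t), on made-up numbers. The counts, the `information_content` helper, and the choice of natural logarithm are assumptions for illustration; GO3's TermCounter may normalise or scale differently.

```python
import math

# Hypothetical annotation counts (not real data). In GO3 the counts
# would come from the precomputed TermCounter.
term_counts = {"GO:0008150": 1000, "GO:0006397": 25}
root_count = term_counts["GO:0008150"]  # count at the subontology root


def information_content(go_id: str) -> float:
    """Return -ln(p), where p is the term's share of root annotations.

    The natural log is an assumption; some tools use log2 or rescale.
    """
    p = term_counts[go_id] / root_count
    return -math.log(p)


ic_root = information_content("GO:0008150")  # root term: p = 1
ic_term = information_content("GO:0006397")  # rarer term: higher IC
```

Under this definition, rarely annotated terms carry more information than frequently annotated ones, and the root carries none.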

Gene distance matrices

gene_distance_matrix(genes=None, ontology="BP", similarity="lin", groupwise="bma", counter=..., distance_transform="auto")

Returns (gene_order, distance_matrix).

Distance transforms:

  • auto

  • one_minus

  • max_minus

  • reciprocal
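The transform names suggest the formulas sketched below, but the source does not spell them out, so treat every formula here as a guess from the name rather than GO3's actual behaviour. The `to_distance` helper and its `eps` guard are hypothetical, and `auto` is left out because its selection rule is not documented here.

```python
def to_distance(sim, transform="one_minus", eps=1e-9):
    """Convert a square similarity matrix (list of lists) to distances.

    All three formulas are assumptions inferred from the option names.
    """
    if transform == "one_minus":
        # Natural for similarities bounded in [0, 1].
        return [[1.0 - s for s in row] for row in sim]
    if transform == "max_minus":
        # Natural for unbounded similarities: subtract from the largest value.
        top = max(max(row) for row in sim)
        return [[top - s for s in row] for row in sim]
    if transform == "reciprocal":
        # eps avoids division by zero when a similarity is exactly 0.
        return [[1.0 / (s + eps) for s in row] for row in sim]
    raise ValueError(f"unknown distance transform: {transform!r}")


bounded = [[1.0, 0.25], [0.25, 1.0]]
unbounded = [[4.0, 1.5], [1.5, 4.0]]
d_one_minus = to_distance(bounded, "one_minus")
d_max_minus = to_distance(unbounded, "max_minus")
```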

Embedding APIs

  • tsne_genes(...)

  • umap_genes(...)

  • plot_tsne_genes(...)

  • plot_umap_genes(...)

  • plot_embedding(...)

These helpers build embeddings from a precomputed distance matrix derived from GO semantic similarity.

Minimal embedding example

genes = ["BRCA1", "CASP8", "TP53", "EGFR", "AKT1"]

genes, emb = go3.tsne_genes(
    genes,
    ontology="BP",
    similarity="lin",
    groupwise="bma",
    counter=counter,
    perplexity=2.0,
    random_state=42,
)

fig, ax = go3.plot_embedding(emb, genes=genes, annotate="auto", title="GO embedding")

API reference

compare_gene_pairs_batch(pairs, ontology, similarity, groupwise, counter)

Compute semantic similarity between genes in batches.

Parameters:
  • pairs (list of (str, str)) – Gene pairs for which to compute semantic similarity.

  • ontology (str) – Name of the subontology of GO to use: BP, MF or CC.

  • similarity (str) – Name of the similarity method.

  • groupwise (str) – Combination method to generate the similarities between genes. Options: “bma”, “max”, “avg”, “hausdorff”, “simgic”.

  • counter (TermCounter) – Precomputed IC values.

Returns:

List of similarity scores.

Return type:

list of float

Raises:

ValueError – If similarity or groupwise is unknown.
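A minimal sketch of preparing the pairs argument and reading the scores back. The `scores_by_pair` helper is hypothetical, and it assumes the returned list follows the order of the input pairs, which the reference above does not state explicitly. The GO3 call itself is left as a comment so the snippet runs without the library or a TermCounter.

```python
from itertools import combinations

genes = ["BRCA1", "CASP8", "TP53", "EGFR"]

# Every unordered gene pair exactly once, as (str, str) tuples.
pairs = list(combinations(genes, 2))


def scores_by_pair(pairs, scores):
    """Map each pair to its score, assuming scores are in input order."""
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    return dict(zip(pairs, scores))


# With go3 installed and a TermCounter named `counter` already built:
# scores = go3.compare_gene_pairs_batch(pairs, "BP", "lin", "bma", counter)
placeholder_scores = [0.0] * len(pairs)  # stand-in values, not real results
lookup = scores_by_pair(pairs, placeholder_scores)
```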

gene_distance_matrix(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto')

Compute a gene-to-gene distance matrix using GO semantic similarity.

Parameters:
  • genes (Optional[list[str]]) – List of genes to include. If None, uses all genes with annotations.

  • ontology (str) – Name of the subontology of GO to use: BP, MF or CC.

  • similarity (str) – Name of the similarity method.

  • groupwise (str) – Combination method to generate the similarities between genes. Options: “bma”, “max”, “avg”, “hausdorff”, “simgic”.

  • counter (TermCounter) – Precomputed IC values.

  • distance_transform (str) – How to convert similarity to distance. Options: “auto”, “one_minus”, “reciprocal”, “max_minus”.

Returns:

Tuple with the gene order and a square distance matrix.

Return type:

(list[str], list[list[float]])
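Because the result is a plain list of lists, looking up the distance between two named genes needs the gene order as an index. The sketch below uses a made-up 3×3 matrix in place of real output; `distance_between` and `is_symmetric` are hypothetical helpers, not part of GO3.

```python
def distance_between(gene_order, matrix, gene_a, gene_b):
    """Look up one distance by gene name using the returned gene order."""
    index = {gene: i for i, gene in enumerate(gene_order)}
    return matrix[index[gene_a]][index[gene_b]]


def is_symmetric(matrix, tol=1e-9):
    """Check that the matrix is square and d[i][j] matches d[j][i]."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return False
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


# Toy stand-in for the (gene_order, distance_matrix) tuple; values are invented.
gene_order = ["BRCA1", "TP53", "EGFR"]
distance_matrix = [
    [0.0, 0.3, 0.7],
    [0.3, 0.0, 0.6],
    [0.7, 0.6, 0.0],
]

d = distance_between(gene_order, distance_matrix, "TP53", "EGFR")
symmetric = is_symmetric(distance_matrix)
```

A symmetry check like this is a cheap sanity test before handing the matrix to a clustering or embedding routine that expects a proper distance matrix.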

plot_embedding(embedding, genes=None, labels=None, title=None, annotate='auto', max_labels=200, figsize=Ellipsis, s=18.0, alpha=0.85, ax=None)

Plot a 2D embedding with matplotlib.

plot_tsne_genes(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto', n_components=2, perplexity=30.0, n_iter=1000, random_state=None, labels=None, title=None, annotate='auto', max_labels=200, figsize=Ellipsis, s=18.0, alpha=0.85, ax=None)

Compute t-SNE embeddings and plot them with matplotlib.

plot_umap_genes(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto', n_components=2, n_neighbors=15, min_dist=0.1, random_state=None, labels=None, title=None, annotate='auto', max_labels=200, figsize=Ellipsis, s=18.0, alpha=0.85, ax=None)

Compute UMAP embeddings and plot them with matplotlib.

set_num_threads(n_threads)

Configure the maximum number of threads the Rayon thread pool will use.

Parameters:

n_threads (int) – Number of threads to use. If 0, uses all available cores.
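On a shared machine it can be useful to leave some cores free rather than pass 0. A minimal sketch of one way to choose the value follows; `pick_thread_count` and its `reserve` parameter are hypothetical, and the GO3 call is commented out so the snippet runs without the library.

```python
import os


def pick_thread_count(reserve: int = 1) -> int:
    """Leave `reserve` cores free, but always return at least 1."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, available - reserve)


n_threads = pick_thread_count()
# import go3
# go3.set_num_threads(n_threads)
```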

term_ic(go_id, counter)

Compute the Information Content (IC) of a GO term.

Parameters:
  • go_id (str) – GO term identifier.

  • counter (TermCounter) – Precomputed term counter with IC values.

Returns:

The IC of the GO term.

Return type:

float

termset_similarity(terms1, terms2, term_similarity='lin', groupwise='bma', counter=None)

Compute semantic similarity between two sets of GO terms.

Parameters:
  • terms1 (list of str) – First list of GO term IDs.

  • terms2 (list of str) – Second list of GO term IDs.

  • term_similarity (str) – Name of the pairwise similarity method.

  • groupwise (str) – Groupwise combination method. Options: “bma”, “max”, “avg”, “hausdorff”, “simgic”.

  • counter (TermCounter) – Precomputed IC values.

Returns:

Similarity score.

Return type:

float
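To make the groupwise step concrete, here is a sketch of one common best-match average (BMA) formulation: average each term's best match in the other set, then average the two directions. Whether GO3's "bma" uses this variant or the pooled one (sum of all best matches divided by the total number of terms) is not stated in the source. The term labels, scores, and `best_match_average` helper are invented for illustration.

```python
def best_match_average(terms1, terms2, sim):
    """Average of the two directional best-match means."""
    if not terms1 or not terms2:
        return 0.0
    # Best partner in terms2 for every term in terms1, and vice versa.
    best_1_to_2 = [max(sim(a, b) for b in terms2) for a in terms1]
    best_2_to_1 = [max(sim(a, b) for a in terms1) for b in terms2]
    mean_1 = sum(best_1_to_2) / len(best_1_to_2)
    mean_2 = sum(best_2_to_1) / len(best_2_to_1)
    return (mean_1 + mean_2) / 2.0


# Toy pairwise similarities between placeholder labels (not real GO IDs).
toy_scores = {
    ("A", "X"): 0.8, ("A", "Y"): 0.2,
    ("B", "X"): 0.4, ("B", "Y"): 0.6,
}


def toy_sim(a, b):
    return toy_scores[(a, b)]


score = best_match_average(["A", "B"], ["X", "Y"], toy_sim)
```

In the real API, the pairwise scores would come from the method named by term_similarity instead of a lookup table.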

tsne_genes(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto', n_components=2, perplexity=30.0, n_iter=1000, random_state=None)

Compute t-SNE embeddings from a gene list using a precomputed distance matrix.

umap_genes(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto', n_components=2, n_neighbors=15, min_dist=0.1, random_state=None)

Compute UMAP embeddings from a gene list using a precomputed distance matrix.