Utilities and Advanced APIs
Threading
set_num_threads(n_threads)
Controls the maximum number of internal threads used by GO3 batch operations.
import go3
go3.set_num_threads(8)
IC lookup
term_ic(go_id, counter) returns the Information Content for one term.
ic = go3.term_ic("GO:0006397", counter)
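For intuition, Information Content is commonly defined as the negative logarithm of a term's annotation frequency relative to the ontology root, so rare terms score high and the root scores zero. The sketch below applies that textbook definition to invented counts. The `toy_counts` dictionary and the `information_content` helper are hypothetical and not part of GO3, and GO3's exact counting scheme and log base may differ.

```python
import math

# Hypothetical annotation counts (term -> number of annotated genes,
# descendants included). The numbers are invented for illustration.
toy_counts = {
    "GO:0008150": 1000,  # biological_process (root)
    "GO:0006397": 10,    # mRNA processing
}

def information_content(term, counts, root="GO:0008150"):
    """Return -ln(p), where p is the term's share of the root's count."""
    p = counts[term] / counts[root]
    return -math.log(p)

ic_root = information_content("GO:0008150", toy_counts)
ic_term = information_content("GO:0006397", toy_counts)
print(ic_root, ic_term)
```

A term annotated to 1% of genes gets an IC of ln(100), while the root, which covers everything, carries no information.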
Gene distance matrices
gene_distance_matrix(genes=None, ontology="BP", similarity="lin", groupwise="bma", counter=..., distance_transform="auto")
Returns (gene_order, distance_matrix).
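Because the matrix rows and columns follow `gene_order`, a name-to-index map is usually the first thing to build when consuming the result. The sketch below uses an invented tuple in place of a real call, so it runs without GO3. The `distance` and `nearest` helpers are hypothetical conveniences, not library functions.

```python
# Toy stand-in for the (gene_order, distance_matrix) tuple; values are invented.
gene_order = ["BRCA1", "TP53", "EGFR"]
distance_matrix = [
    [0.0, 0.2, 0.7],
    [0.2, 0.0, 0.6],
    [0.7, 0.6, 0.0],
]

# Map each gene name to its row/column position.
index = {gene: i for i, gene in enumerate(gene_order)}

def distance(a, b):
    """Look up the distance between two genes by name."""
    return distance_matrix[index[a]][index[b]]

def nearest(gene):
    """Return the other gene with the smallest distance to `gene`."""
    row = distance_matrix[index[gene]]
    candidates = [(d, g) for g, d in zip(gene_order, row) if g != gene]
    return min(candidates)[1]

print(distance("BRCA1", "EGFR"), nearest("BRCA1"))
```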
Distance transforms:
- auto
- one_minus
- max_minus
- reciprocal
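The option names suggest simple element-wise conversions from similarity to distance. The sketch below shows one plausible reading of the three explicit transforms in plain Python. The formulas, the `eps` guard, and the `to_distance` helper are assumptions made for illustration, not GO3's implementation, and `auto` is left out because its selection rule is not described here.

```python
def to_distance(sim, transform="one_minus", eps=1e-9):
    """Convert a square similarity matrix (list of lists) to distances.

    The formulas are assumed from the option names, not taken from GO3.
    """
    if transform == "one_minus":
        # Natural for similarities bounded in [0, 1].
        return [[1.0 - s for s in row] for row in sim]
    if transform == "max_minus":
        # Subtract from the largest observed similarity (unbounded measures).
        top = max(max(row) for row in sim)
        return [[top - s for s in row] for row in sim]
    if transform == "reciprocal":
        # eps avoids division by zero when a similarity is exactly 0.
        return [[1.0 / (s + eps) for s in row] for row in sim]
    raise ValueError(f"unknown transform: {transform!r}")

sim = [[1.0, 0.25], [0.25, 1.0]]
d_one = to_distance(sim, "one_minus")
d_max = to_distance(sim, "max_minus")
d_rec = to_distance(sim, "reciprocal")
print(d_one, d_max, d_rec)
```

Note that under this reading the reciprocal form does not give a zero self-distance, which matters for methods that expect a zero diagonal.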
Embedding APIs
- tsne_genes(...)
- umap_genes(...)
- plot_tsne_genes(...)
- plot_umap_genes(...)
- plot_embedding(...)
These helpers build embeddings from a precomputed distance matrix derived from GO semantic similarity.
Minimal embedding example
genes = ["BRCA1", "CASP8", "TP53", "EGFR", "AKT1"]
genes, emb = go3.tsne_genes(
    genes,
    ontology="BP",
    similarity="lin",
    groupwise="bma",
    counter=counter,
    perplexity=2.0,
    random_state=42,
)
fig, ax = go3.plot_embedding(emb, genes=genes, annotate="auto", title="GO embedding")
API reference
- compare_gene_pairs_batch(pairs, ontology, similarity, groupwise, counter)
Compute semantic similarity between genes in batches.
- Parameters:
pairs (list of (str, str)) – List of gene pairs for which to compute semantic similarity.
ontology (str) – Name of the subontology of GO to use: BP, MF or CC.
similarity (str) – Name of the similarity method.
groupwise (str) – Combination method to generate the similarities between genes. Options: “bma”, “max”, “avg”, “hausdorff”, “simgic”.
counter (TermCounter) – Precomputed IC values.
- Returns:
List of similarity scores.
- Return type:
list of float
- Raises:
ValueError – If similarity or groupwise is unknown.
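A common way to prepare the `pairs` argument is to enumerate every unordered pair from a gene list. The sketch below does that with the standard library and then re-attaches scores to pairs. The `scores` list is invented and stands in for the return value of a batch call, and the alignment assumes scores come back in the same order as `pairs`.

```python
from itertools import combinations

genes = ["BRCA1", "CASP8", "TP53", "EGFR"]

# Every unordered pair exactly once, in a stable order.
pairs = list(combinations(genes, 2))

# Hypothetical scores standing in for the list a batch call would return
# (assumed to be in the same order as `pairs`).
scores = [0.5, 0.7, 0.1, 0.4, 0.2, 0.3]

# Re-attach each score to the pair it belongs to.
by_pair = dict(zip(pairs, scores))
print(by_pair[("TP53", "EGFR")])
```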
- gene_distance_matrix(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto')
Compute a gene-to-gene distance matrix using GO semantic similarity.
- Parameters:
genes (Optional[list[str]]) – List of genes to include. If None, uses all genes with annotations.
ontology (str) – Name of the subontology of GO to use: BP, MF or CC.
similarity (str) – Name of the similarity method.
groupwise (str) – Combination method to generate the similarities between genes. Options: “bma”, “max”, “avg”, “hausdorff”, “simgic”.
counter (TermCounter) – Precomputed IC values.
distance_transform (str) – How to convert similarity to distance. Options: “auto”, “one_minus”, “reciprocal”, “max_minus”.
- Returns:
Tuple with the gene order and a square distance matrix.
- Return type:
(list[str], list[list[float]])
- plot_embedding(embedding, genes=None, labels=None, title=None, annotate='auto', max_labels=200, figsize=Ellipsis, s=18.0, alpha=0.85, ax=None)
Plot a 2D embedding with matplotlib.
- plot_tsne_genes(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto', n_components=2, perplexity=30.0, n_iter=1000, random_state=None, labels=None, title=None, annotate='auto', max_labels=200, figsize=Ellipsis, s=18.0, alpha=0.85, ax=None)
Compute t-SNE embeddings and plot them with matplotlib.
- plot_umap_genes(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto', n_components=2, n_neighbors=15, min_dist=0.1, random_state=None, labels=None, title=None, annotate='auto', max_labels=200, figsize=Ellipsis, s=18.0, alpha=0.85, ax=None)
Compute UMAP embeddings and plot them with matplotlib.
- set_num_threads(n_threads)
Configure the maximum number of threads that Rayon (the Rust data-parallelism library) will use.
- Parameters:
n_threads (int) – Number of threads to use. If 0, uses all available cores.
- term_ic(go_id, counter)
Compute the Information Content (IC) of a GO term.
- Parameters:
go_id (str) – GO term identifier.
counter (TermCounter) – Precomputed term counter with IC values.
- Returns:
The IC of the GO term.
- Return type:
float
- termset_similarity(terms1, terms2, term_similarity='lin', groupwise='bma', counter=None)
Compute semantic similarity between two sets of GO terms.
- Parameters:
terms1 (list of str) – First list of GO term IDs.
terms2 (list of str) – Second list of GO term IDs.
term_similarity (str) – Name of the pairwise similarity method.
groupwise (str) – Groupwise combination method. Options: “bma”, “max”, “avg”, “hausdorff”, “simgic”.
counter (TermCounter) – Precomputed IC values.
- Returns:
Similarity score.
- Return type:
float
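To make the groupwise step concrete, the sketch below computes a best-match average over a small matrix of pairwise term similarities. It uses one common definition of BMA, the mean of the average row maximum and the average column maximum. The `best_match_average` helper and the `toy` matrix are hypothetical, and GO3's "bma" option may use a different variant.

```python
def best_match_average(sim):
    """sim[i][j] is the similarity of term i in set 1 to term j in set 2."""
    # Best match for each term of set 1 (row maxima).
    row_best = [max(row) for row in sim]
    # Best match for each term of set 2 (column maxima).
    col_best = [max(col) for col in zip(*sim)]
    row_mean = sum(row_best) / len(row_best)
    col_mean = sum(col_best) / len(col_best)
    return (row_mean + col_mean) / 2.0

# Invented pairwise similarities: 2 terms in set 1, 3 terms in set 2.
toy = [
    [0.5, 0.25, 0.0],
    [0.0, 1.0, 0.5],
]
score = best_match_average(toy)
print(score)
```

Averaging both directions keeps the score symmetric, so swapping the two term sets gives the same result.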
- tsne_genes(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto', n_components=2, perplexity=30.0, n_iter=1000, random_state=None)
Compute t-SNE embeddings from a gene list using a precomputed distance matrix.
- umap_genes(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto', n_components=2, n_neighbors=15, min_dist=0.1, random_state=None)
Compute UMAP embeddings from a gene list using a precomputed distance matrix.