Semantic Similarity¶
GO3 supports term-level and gene-level semantic similarity across multiple methods.
Quick reference¶
| Method | Key | Family | Typical range |
|---|---|---|---|
| Resnik | | IC-based | |
| Lin | | IC-based | |
| Jiang-Conrath similarity | | IC-based | |
| SimRel | | IC-based | |
| Information Coefficient | | IC-based | |
| GraphIC | | Hybrid | |
| Wang | | Topological | |
| TopoICSim | | Hybrid | |
Term-level APIs¶
semantic_similarity(id1, id2, method, counter)
Computes one score for one term pair.
Raises ValueError if method is unknown.
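The behavior described above can be illustrated without GO3 itself. The following is a minimal sketch on a toy ontology, assuming textbook Resnik and Lin definitions; `toy_semantic_similarity`, `PARENTS`, and `IC` are hypothetical names invented for this illustration and are not part of the GO3 API.

```python
# Toy DAG: each term maps to its direct parents (hypothetical data).
PARENTS = {"root": [], "A": ["root"], "B": ["root"],
           "C": ["A"], "D": ["A"], "E": ["B"]}
# Toy information-content values (hypothetical data).
IC = {"root": 0.0, "A": 1.0, "B": 1.2, "C": 2.5, "D": 2.0, "E": 3.0}


def ancestors(term, parents):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def toy_semantic_similarity(id1, id2, method, ic, parents):
    """One score for one term pair; unknown method names raise ValueError."""
    if method not in ("resnik", "lin"):
        raise ValueError(f"unknown method: {method!r}")
    common = ancestors(id1, parents) & ancestors(id2, parents)
    # IC of the most informative common ancestor (MICA).
    mica_ic = max(ic[t] for t in common)
    if method == "resnik":
        return mica_ic
    denom = ic[id1] + ic[id2]
    return 2.0 * mica_ic / denom if denom > 0 else 0.0
```

In this toy setup, `C` and `D` share the parent `A`, so their Resnik score is the IC of `A`, while their Lin score rescales that value by the ICs of the two terms.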
batch_similarity(list1, list2, method, counter)
Computes one score per aligned pair.
Requires len(list1) == len(list2).
Raises ValueError if list sizes differ or method is unknown.
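"Aligned" means the i-th element of the first list is scored against the i-th element of the second, not all-against-all. A minimal sketch of that contract, assuming a caller-supplied scoring function (`aligned_scores` and `exact_match` are hypothetical helpers, not GO3 functions):

```python
def aligned_scores(list1, list2, score_fn):
    """Score list1[i] against list2[i]; mismatched lengths are an error."""
    if len(list1) != len(list2):
        raise ValueError("input lists must have the same length")
    return [score_fn(a, b) for a, b in zip(list1, list2)]


def exact_match(a, b):
    """Stand-in score for illustration only: 1.0 if identical, else 0.0."""
    return 1.0 if a == b else 0.0
```

The result has one entry per input position, so it can be joined back to the input lists by index.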
Set-level API¶
termset_similarity(terms1, terms2, term_similarity="lin", groupwise="bma", counter=...)
Groupwise strategies:
bma, max, avg, hausdorff, simgic
Notes:
simgic is set-based and does not use the pairwise method in the same way as other strategies.
For empty sets, GO3 returns 0.0.
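To make the pairwise-based strategies concrete, here is a minimal sketch of max, avg, and one common best-match-average (BMA) variant over a matrix of pairwise scores. The exact formulas GO3 uses are not stated here, so treat the BMA normalization as an assumption; `toy_groupwise` and `TABLE` are hypothetical names, and hausdorff and simgic are deliberately left out.

```python
def toy_groupwise(terms1, terms2, pair_sim, strategy="bma"):
    """Combine pairwise term scores into one set-level score."""
    if not terms1 or not terms2:
        return 0.0  # mirrors the documented behavior for empty sets
    rows = [[pair_sim(a, b) for b in terms2] for a in terms1]
    if strategy == "max":
        return max(max(row) for row in rows)
    if strategy == "avg":
        return sum(sum(row) for row in rows) / (len(terms1) * len(terms2))
    if strategy == "bma":
        # Best match for every term in each direction, then one pooled mean.
        row_best = [max(row) for row in rows]
        col_best = [max(col) for col in zip(*rows)]
        return (sum(row_best) + sum(col_best)) / (len(row_best) + len(col_best))
    raise ValueError(f"unknown strategy: {strategy!r}")


# Hypothetical pairwise scores between terms {a, b} and {x, y}.
TABLE = {("a", "x"): 0.9, ("a", "y"): 0.2, ("b", "x"): 0.4, ("b", "y"): 0.6}


def lookup(t1, t2):
    return TABLE[(t1, t2)]
```

With these numbers, max picks the single best pair, avg is pulled down by the weak pairs, and BMA sits in between because it only averages each term's best partner.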
Gene-level APIs¶
compare_genes(gene1, gene2, ontology, similarity, groupwise, counter)
Ontology must be one of BP, MF, CC.
Raises ValueError if either gene is missing from loaded annotations.
compare_gene_pairs_batch(pairs, ontology, similarity, groupwise, counter)
Fast path for large pair lists.
Missing or empty per-gene term mappings yield 0.0 for those pairs.
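The contrast with compare_genes is that the batch path does not raise for genes without terms; it scores the pair as 0.0 and moves on. A minimal sketch of that behavior, assuming a plain dict from gene to term set and using Jaccard overlap as a stand-in set score (`toy_pairs_batch`, `jaccard`, and `GENE_TERMS` are hypothetical and do not reflect GO3 internals):

```python
def jaccard(terms1, terms2):
    """Stand-in set score; not one of GO3's groupwise strategies."""
    union = terms1 | terms2
    return len(terms1 & terms2) / len(union) if union else 0.0


def toy_pairs_batch(pairs, gene_terms, set_score=jaccard):
    """Score each gene pair; missing or empty term sets yield 0.0."""
    scores = []
    for gene1, gene2 in pairs:
        terms1 = gene_terms.get(gene1, set())
        terms2 = gene_terms.get(gene2, set())
        if not terms1 or not terms2:
            scores.append(0.0)
        else:
            scores.append(set_score(terms1, terms2))
    return scores


# Hypothetical annotations with placeholder term IDs.
GENE_TERMS = {
    "G1": {"GO:0000001", "GO:0000002"},
    "G2": {"GO:0000002", "GO:0000003"},
    "G3": set(),
}
```

A 0.0 in the output therefore has two possible meanings: genuinely dissimilar genes, or a gene with no usable annotations. Checking the annotation mapping separately distinguishes the two.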
Practical behavior and edge cases¶
Invalid or missing GO IDs in similarity calls generally return 0.0.
Terms from different namespaces produce 0.0.
For normalized methods (for example lin and wang), self-similarity is typically near 1.0.
Distance-oriented workflow¶
For clustering/embedding workflows, use:
gene_distance_matrix, tsne_genes, umap_genes
These functions convert similarity to distance using distance_transform rules (see Visualization (t-SNE / UMAP)).
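The distance_transform rules themselves are not spelled out in this section. As a rough illustration only, the sketch below assumes the simplest rule for a normalized similarity, d = 1 - s, with scores clamped to [0, 1]; `similarity_to_distance` is a hypothetical helper and the rule GO3 applies may differ by method.

```python
def similarity_to_distance(sim_matrix):
    """Turn a square similarity matrix into a symmetric distance matrix.

    Assumes similarities are normalized to [0, 1]; values outside that
    range are clamped. The diagonal is forced to 0.0.
    """
    n = len(sim_matrix)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = min(1.0, max(0.0, sim_matrix[i][j]))
            dist[i][j] = dist[j][i] = 1.0 - s
    return dist
```

Unbounded scores such as Resnik would need a different transform (for example rescaling first), which is why a per-method rule is useful.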
Mathematical definitions¶
Resnik¶
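The formula for this entry did not survive on this page. For orientation, the standard textbook definition is given below; whether GO3 applies any additional scaling is not stated here. $p(t)$ is the annotation probability of term $t$, and $A(t)$ is the set containing $t$ and its ancestors.

```latex
\mathrm{IC}(t) = -\log p(t), \qquad
\mathrm{sim}_{\mathrm{Resnik}}(t_1, t_2)
  = \max_{t \,\in\, A(t_1) \cap A(t_2)} \mathrm{IC}(t)
  = \mathrm{IC}\bigl(\mathrm{MICA}(t_1, t_2)\bigr)
```

In this form the score is unbounded above: it is 0 when the only shared ancestor is the root and grows with the specificity of the most informative common ancestor (MICA).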
Lin¶
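The formula for this entry did not survive on this page. The standard textbook definition, given here as a reference point rather than as a statement of GO3's implementation, normalizes the IC of the most informative common ancestor (MICA) by the ICs of the two terms:

```latex
\mathrm{sim}_{\mathrm{Lin}}(t_1, t_2)
  = \frac{2\,\mathrm{IC}\bigl(\mathrm{MICA}(t_1, t_2)\bigr)}
         {\mathrm{IC}(t_1) + \mathrm{IC}(t_2)}
```

Because the MICA of a term with itself is the term, this gives 1 for self-comparison whenever $\mathrm{IC}(t) > 0$, and values in $[0, 1]$ otherwise.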
Jiang-Conrath (distance-derived similarity)¶
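The formula for this entry did not survive on this page. The Jiang-Conrath distance below is the standard definition; the conversion to a similarity shown after it is one common choice and is an assumption here, since the transform GO3 uses is not stated.

```latex
d_{\mathrm{JC}}(t_1, t_2)
  = \mathrm{IC}(t_1) + \mathrm{IC}(t_2)
    - 2\,\mathrm{IC}\bigl(\mathrm{MICA}(t_1, t_2)\bigr),
\qquad
\mathrm{sim}_{\mathrm{JC}}(t_1, t_2) = \frac{1}{1 + d_{\mathrm{JC}}(t_1, t_2)}
```

Under this transform, identical terms have distance 0 and similarity 1, and similarity decays toward 0 as the distance grows.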
SimRel¶
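The formula for this entry did not survive on this page. The standard definition (Schlicker et al.) multiplies the Lin score by a relevance factor that penalizes shallow common ancestors; it is given here for orientation and is not confirmed as GO3's exact implementation. $p(t)$ is the annotation probability of term $t$.

```latex
\mathrm{sim}_{\mathrm{Rel}}(t_1, t_2)
  = \frac{2\,\mathrm{IC}\bigl(\mathrm{MICA}(t_1, t_2)\bigr)}
         {\mathrm{IC}(t_1) + \mathrm{IC}(t_2)}
    \,\Bigl(1 - p\bigl(\mathrm{MICA}(t_1, t_2)\bigr)\Bigr)
```

When the MICA is a very general term, $p(\mathrm{MICA})$ is close to 1 and the score is pushed toward 0 even if the Lin ratio is high.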
Wang¶
Wang similarity uses weighted ancestor contributions from GO graph topology and does not require annotation IC frequencies in the same way as IC-only methods.
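The "weighted ancestor contributions" can be sketched on a toy graph. The following assumes the formulation and edge weights from the original Wang et al. method (0.8 for is_a, 0.6 for part_of); GO3's actual weights and relation handling are not stated here, and `s_values`, `wang_sim`, and `TOY_PARENTS` are hypothetical names.

```python
# Edge weights from the original Wang method (assumed, not confirmed for GO3).
WEIGHTS = {"is_a": 0.8, "part_of": 0.6}

# Toy DAG: child -> list of (parent, relation) edges (hypothetical data).
TOY_PARENTS = {
    "root": [],
    "A": [("root", "is_a")],
    "C": [("A", "is_a")],
    "D": [("A", "part_of")],
}


def s_values(term, parents):
    """S-value of every ancestor: best product of edge weights on a path up."""
    s = {term: 1.0}
    stack = [term]
    while stack:
        child = stack.pop()
        for parent, relation in parents.get(child, []):
            candidate = s[child] * WEIGHTS[relation]
            if candidate > s.get(parent, 0.0):
                s[parent] = candidate
                stack.append(parent)  # revisit so the improvement propagates
    return s


def wang_sim(term1, term2, parents):
    """Shared S-value mass divided by the total S-value mass of both terms."""
    s1 = s_values(term1, parents)
    s2 = s_values(term2, parents)
    common = set(s1) & set(s2)
    numerator = sum(s1[t] + s2[t] for t in common)
    denominator = sum(s1.values()) + sum(s2.values())
    return numerator / denominator
```

Only the graph and the relation types enter the computation, which is why no annotation frequencies are needed. A term compared with itself shares all of its ancestors, so the ratio is 1.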
TopoICSim¶
TopoICSim combines topology-aware paths and IC-derived weights to produce a bounded similarity.
API reference¶
- batch_similarity(list1, list2, method, counter)¶
Compute pairwise semantic similarity in batch using a selected method.
- Parameters:
list1 (list of str) – First list of GO term IDs.
list2 (list of str) – Second list of GO term IDs.
method (str) – Name of the similarity method.
counter (TermCounter) – Precomputed IC values.
- Returns:
List of similarity scores.
- Return type:
list of float
- Raises:
ValueError – If input lists differ in length or method is unknown.
- compare_gene_pairs_batch(pairs, ontology, similarity, groupwise, counter)¶
Compute semantic similarity between genes in batches.
- Parameters:
pairs (list of (str, str)) – List of gene pairs for which to compute semantic similarity.
ontology (str) – Name of the subontology of GO to use: BP, MF or CC.
similarity (str) – Name of the similarity method.
groupwise (str) – Combination method to generate the similarities between genes. Options: “bma”, “max”, “avg”, “hausdorff”, “simgic”.
counter (TermCounter) – Precomputed IC values.
- Returns:
List of similarity scores.
- Return type:
list of float
- Raises:
ValueError – If similarity or groupwise is unknown.
- compare_genes(gene1, gene2, ontology, similarity, groupwise, counter)¶
Compute semantic similarity between genes.
- Parameters:
gene1 (str) – Gene symbol of the first gene.
gene2 (str) – Gene symbol of the second gene.
ontology (str) – Name of the subontology of GO to use: BP, MF or CC.
similarity (str) – Name of the similarity method.
groupwise (str) – Combination method to generate the similarities between genes. Options: “bma”, “max”, “avg”, “hausdorff”, “simgic”.
counter (TermCounter) – Precomputed IC values.
- Returns:
Similarity score.
- Return type:
float
- Raises:
ValueError – If similarity or groupwise is unknown, or if either gene is missing from loaded annotations.
- semantic_similarity(id1, id2, method, counter)¶
Compute semantic similarity between two GO terms using a selected method.
- Parameters:
id1 (str) – First GO term ID.
id2 (str) – Second GO term ID.
method (str) – Name of the similarity method. Options: “resnik”, “lin”, etc.
counter (TermCounter) – Precomputed IC values.
- Returns:
Similarity score.
- Return type:
float
- Raises:
ValueError – If the method is unknown.
- term_ic(go_id, counter)¶
Compute the Information Content (IC) of a GO term.
- Parameters:
go_id (str) – GO term identifier.
counter (TermCounter) – Precomputed term counter with IC values.
- Returns:
The IC of the GO term.
- Return type:
float
- termset_similarity(terms1, terms2, term_similarity='lin', groupwise='bma', counter=None)¶
Compute semantic similarity between two sets of GO terms.
- Parameters:
terms1 (list of str) – First list of GO term IDs.
terms2 (list of str) – Second list of GO term IDs.
term_similarity (str) – Name of the pairwise similarity method.
groupwise (str) – Groupwise combination method. Options: “bma”, “max”, “avg”, “hausdorff”, “simgic”.
counter (TermCounter) – Precomputed IC values.
- Returns:
Similarity score.
- Return type:
float