Annotations and IC¶
GO3 parses GAF annotations and builds term-level statistics for IC-based similarity methods.
Functions¶
load_gaf(path)
Parses a GAF file and caches gene-to-GO mappings.
Returns a list of GAFAnnotation objects.
build_term_counter(py_annotations)
Builds a TermCounter from parsed annotations.
Computes counts and Information Content (IC) by namespace.
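As a rough illustration of how per-namespace IC is commonly derived from annotation counts, here is a minimal sketch. It assumes the widely used definition IC(t) = -ln(count(t) / total annotations in t's namespace). GO3's exact formula, log base, and count propagation are not stated on this page, and the helper `information_content` and its `namespace_of` argument are hypothetical names, not part of the GO3 API.

```python
import math


def information_content(counts, namespace_of, total_by_ns):
    """Return {term: IC} using IC = ln(total_in_namespace / count).

    Assumption: this is the common Resnik-style definition, not
    necessarily the exact formula GO3 uses internally.
    """
    ic = {}
    for term, count in counts.items():
        total = total_by_ns.get(namespace_of[term], 0)
        if count > 0 and total > 0:
            # Equivalent to -ln(count / total); rarer terms score higher.
            ic[term] = math.log(total / count)
    return ic


# Toy numbers, invented for illustration only.
counts = {"GO:0008150": 100, "GO:0006915": 25}
namespace_of = {
    "GO:0008150": "biological_process",
    "GO:0006915": "biological_process",
}
total_by_ns = {"biological_process": 100}

ic = information_content(counts, namespace_of, total_by_ns)
print(ic["GO:0008150"])  # root term covers every annotation: 0.0
```

Terms with a zero count are left out of the result in this sketch, which is why the later examples read IC values with `.get(term, 0.0)`.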
Filtering rules in load_gaf¶
During parsing, GO3 applies key biological filters:
- skips annotations with evidence code ND
- skips annotations with qualifier containing NOT
- handles obsolete GO terms:
  - uses replaced_by when available
  - otherwise uses first consider target when available
  - otherwise discards that annotation
These rules change which annotations are counted, so they affect downstream similarity scores and the comparability of benchmarks against tools that filter differently.
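The filtering rules above can be sketched in plain Python. This is not GO3's implementation: it is a minimal illustration that assumes the standard GAF column order (DB Object ID, Qualifier, GO ID, and Evidence Code in columns 2, 4, 5, and 7). The names `Annotation`, `resolve_term`, `filter_gaf_lines`, `gaf_line`, and the `obsolete` lookup table are hypothetical, and the GO IDs are made up.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    db_object_id: str
    go_term: str
    evidence: str


def resolve_term(go_id, obsolete):
    """Return the term to keep for go_id, or None to discard it."""
    info = obsolete.get(go_id)
    if info is None:
        return go_id  # not obsolete: keep as-is
    if info.get("replaced_by"):
        return info["replaced_by"]
    if info.get("consider"):
        return info["consider"][0]  # first "consider" target
    return None  # obsolete with no replacement


def filter_gaf_lines(lines, obsolete):
    """Apply the ND / NOT / obsolete-term rules to raw GAF lines."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # blank line or GAF header comment
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # too short to carry an evidence code
        gene, qualifier, go_id, evidence = cols[1], cols[3], cols[4], cols[6]
        if evidence == "ND" or "NOT" in qualifier:
            continue
        term = resolve_term(go_id, obsolete)
        if term is not None:
            kept.append(Annotation(gene, term, evidence))
    return kept


def gaf_line(gene, qualifier, go_id, evidence):
    """Build a minimal line holding only the columns used above."""
    return "\t".join(["UniProtKB", gene, gene, qualifier, go_id, "PMID:0", evidence])


# Made-up obsolete-term table and annotations, for illustration only.
obsolete = {
    "GO:9000001": {"replaced_by": "GO:1000001", "consider": []},
    "GO:9000002": {"replaced_by": None, "consider": ["GO:1000002", "GO:1000003"]},
    "GO:9000003": {"replaced_by": None, "consider": []},
}
lines = [
    "!gaf-version: 2.2",
    gaf_line("G1", "involved_in", "GO:1000000", "IDA"),
    gaf_line("G2", "involved_in", "GO:1000000", "ND"),
    gaf_line("G3", "NOT|involved_in", "GO:1000000", "IDA"),
    gaf_line("G4", "involved_in", "GO:9000001", "IEA"),
    gaf_line("G5", "involved_in", "GO:9000002", "IEA"),
    gaf_line("G6", "involved_in", "GO:9000003", "IEA"),
]
kept = filter_gaf_lines(lines, obsolete)
print([(a.db_object_id, a.go_term) for a in kept])
```

Only G1, G4, and G5 survive in this toy input: G2 and G3 are dropped by the ND and NOT rules, and G6 points to an obsolete term with no replacement.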
Example¶
import go3
go3.load_go_terms("go-basic.obo")
annotations = go3.load_gaf("goa_human.gaf")
counter = go3.build_term_counter(annotations)
print("Annotations:", len(annotations))
print("IC terms:", len(counter.ic))
Inspecting structures¶
ann = annotations[0]
print(ann.db_object_id, ann.go_term, ann.evidence)
print(counter.counts.get("GO:0008150", 0))
print(counter.total_by_ns)
print(counter.ic.get("GO:0008150", 0.0))
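Because `ic` is documented as a mapping from GO term ID to IC, ordinary dictionary operations apply to it. The sketch below ranks terms by IC using a plain dict as a stand-in for `counter.ic`. The helper name `most_informative` and the sample values are invented for illustration.

```python
def most_informative(ic, n=5):
    """Return the n (term, IC) pairs with the highest IC, highest first."""
    return sorted(ic.items(), key=lambda item: item[1], reverse=True)[:n]


# Stand-in for counter.ic; the values are made up.
ic = {"GO:0008150": 0.0, "GO:0006915": 1.39, "GO:0007165": 0.85}

for term, value in most_informative(ic, n=2):
    print(term, value)
```

With a real counter, the same call would presumably be `most_informative(counter.ic)`.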
Class reference¶
GAFAnnotation fields:
- db_object_id
- go_term
- evidence
TermCounter fields:
- counts
- total_by_ns
- ic
API reference¶
- class GAFAnnotation
Bases: object
Struct representing a single annotation from a GAF file.
Fields¶
- db_object_id (str)
The gene product identifier (e.g., UniProt ID).
- go_term (str)
The GO term ID (e.g., GO:0008150).
- evidence (str)
The evidence code for the annotation (e.g., IEA).
- class TermCounter
Bases: object
Struct holding annotation counts and information content (IC) for GO terms.
Fields¶
- counts (dict)
Mapping from GO term ID to annotation count.
- total_by_ns (dict)
Mapping from namespace to total annotation count.
- ic (dict)
Mapping from GO term ID to information content (IC).
- build_term_counter(py_annotations)
Build a term counter (counts, IC) from GAF annotations.
- Parameters:
py_annotations (list of GAFAnnotation) – List of GAFAnnotation Python objects.
- Returns:
Struct with counts and IC values.
- Return type:
TermCounter
- load_gaf(path)
Load a GAF annotation file and cache the gene-to-GO mapping.
- Parameters:
path (str) – Path to the GAF file.
- Returns:
List of parsed GAF annotations.
- Return type:
list of GAFAnnotation
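To make the gene-to-GO mapping concrete, the following sketch groups a list of annotation-like objects by gene product. It only relies on the documented `db_object_id` and `go_term` fields. The helper `group_terms_by_gene` is a hypothetical name, the `SimpleNamespace` objects stand in for `GAFAnnotation`, and the internal cache GO3 builds may be structured differently.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_terms_by_gene(annotations):
    """Return {db_object_id: set of GO term IDs} for the given annotations."""
    gene_to_terms = defaultdict(set)
    for ann in annotations:
        gene_to_terms[ann.db_object_id].add(ann.go_term)
    return dict(gene_to_terms)


# Stand-ins for GAFAnnotation objects; identifiers chosen for illustration.
annotations = [
    SimpleNamespace(db_object_id="P12345", go_term="GO:0006915", evidence="IDA"),
    SimpleNamespace(db_object_id="P12345", go_term="GO:0007165", evidence="IEA"),
    SimpleNamespace(db_object_id="Q67890", go_term="GO:0006915", evidence="IEA"),
]

mapping = group_terms_by_gene(annotations)
print(sorted(mapping["P12345"]))
```

Using a set per gene collapses repeated annotations of the same term, for example when a term is supported by several evidence codes.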