Annotations and IC

GO3 parses GO Annotation File (GAF) annotations and builds the term-level statistics, annotation counts and information content (IC), that IC-based similarity methods rely on.

Functions

load_gaf(path)

  • Parses a GAF file and caches gene-to-GO mappings.

  • Returns a list of GAFAnnotation objects.

build_term_counter(annotations)

  • Builds a TermCounter from parsed annotations.

  • Computes counts and Information Content (IC) by namespace.

Filtering rules in load_gaf

During parsing, GO3 applies key biological filters:

  • skips annotations with evidence code ND

  • skips annotations with qualifier containing NOT

  • handles obsolete GO terms:

    • uses replaced_by when available

    • otherwise uses first consider target when available

    • otherwise discards that annotation

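Because filtered annotations never reach build_term_counter, these rules change term counts and IC values, and therefore downstream similarity scores. When benchmarking against other tools, check that they apply equivalent filters.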
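The filtering rules above can be sketched in plain Python. This is an illustration of the decision order only, not GO3's actual implementation: the `OBSOLETE` table, the `resolve_annotation` helper, and the GO IDs in the table are all hypothetical, and the sketch does not call GO3.

```python
from typing import Optional

# Hypothetical stand-in for obsolete-term metadata from the ontology;
# GO3's internal representation is not documented here.
OBSOLETE = {
    "GO:0000001": {"replaced_by": "GO:0000010", "consider": []},
    "GO:0000002": {"replaced_by": None, "consider": ["GO:0000020", "GO:0000021"]},
    "GO:0000003": {"replaced_by": None, "consider": []},
}


def resolve_annotation(go_id: str, evidence: str, qualifier: str) -> Optional[str]:
    """Return the GO ID to keep, or None if the annotation is dropped."""
    if evidence == "ND":        # evidence code ND: skip
        return None
    if "NOT" in qualifier:      # qualifier containing NOT: skip
        return None
    info = OBSOLETE.get(go_id)
    if info is None:            # term is not obsolete: keep unchanged
        return go_id
    if info["replaced_by"]:     # prefer replaced_by
        return info["replaced_by"]
    if info["consider"]:        # otherwise the first consider target
        return info["consider"][0]
    return None                 # obsolete with no replacement: discard


print(resolve_annotation("GO:0008150", "IEA", ""))
print(resolve_annotation("GO:0000002", "IDA", ""))
print(resolve_annotation("GO:0008150", "IDA", "NOT|enables"))
```

The order matters in this sketch: evidence and qualifier checks run before obsolete-term resolution, so a negated annotation on an obsolete term is dropped rather than remapped.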

Example

import go3

go3.load_go_terms("go-basic.obo")
annotations = go3.load_gaf("goa_human.gaf")
counter = go3.build_term_counter(annotations)

print("Annotations:", len(annotations))
print("IC terms:", len(counter.ic))

Inspecting structures

ann = annotations[0]
print(ann.db_object_id, ann.go_term, ann.evidence)

print(counter.counts.get("GO:0008150", 0))
print(counter.total_by_ns)
print(counter.ic.get("GO:0008150", 0.0))
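To show how the three TermCounter fields could relate, here is a minimal sketch of a common IC definition, IC(t) = ln(total / count(t)), computed per namespace. Whether GO3 uses the natural logarithm, propagates counts to ancestor terms, or assigns 0 to unannotated terms is an assumption here. The counts, the namespace key, and the `namespace_of` lookup are made up for illustration.

```python
import math

# Made-up counts for two terms in a single namespace.
counts = {"GO:0008150": 1000, "GO:0006915": 10}
total_by_ns = {"biological_process": 1000}
# Hypothetical term-to-namespace lookup; GO3 derives this from the ontology.
namespace_of = {
    "GO:0008150": "biological_process",
    "GO:0006915": "biological_process",
}


def information_content(term: str) -> float:
    """IC(t) = ln(total annotations in t's namespace / count(t))."""
    total = total_by_ns[namespace_of[term]]
    count = counts.get(term, 0)
    if count == 0 or total == 0:
        return 0.0  # assumption: unannotated terms get IC 0 in this sketch
    return math.log(total / count)


ic = {term: information_content(term) for term in counts}
print(ic)
```

Under this definition a term that covers its whole namespace has IC 0, and rarer terms score higher, which is why the filtering rules that change counts also change IC.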

Class reference

GAFAnnotation fields:

  • db_object_id

  • go_term

  • evidence

TermCounter fields:

  • counts

  • total_by_ns

  • ic

API reference

class GAFAnnotation

Bases: object

Struct representing a single annotation from a GAF file.

Fields

db_object_id (str)

The gene product identifier (e.g., UniProt ID).

go_term (str)

The GO term ID (e.g., GO:0008150).

evidence (str)

The evidence code for the annotation (e.g., IEA).
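For orientation, the three fields correspond to columns 2, 5, and 7 of a GAF 2.x line. The sketch below parses one tab-separated line with the standard library only. The `Annotation` dataclass and `parse_gaf_line` helper are hypothetical stand-ins, not GO3's parser, and the sample line is fabricated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """Hypothetical stand-in mirroring the three GAFAnnotation fields."""
    db_object_id: str
    go_term: str
    evidence: str


def parse_gaf_line(line: str) -> Optional[Annotation]:
    """Parse one GAF 2.x line; return None for comments and short lines."""
    if line.startswith("!"):    # header and comment lines
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:          # GAF 2.x has 15 mandatory columns
        return None
    # 1-based spec columns: 2 = DB Object ID, 5 = GO ID, 7 = Evidence Code.
    return Annotation(db_object_id=cols[1], go_term=cols[4], evidence=cols[6])


# Fabricated sample line with all 17 GAF 2.x columns.
sample = "\t".join([
    "UniProtKB", "ID0001", "GENE1", "", "GO:0008150", "PMID:1", "IEA",
    "", "P", "", "", "protein", "taxon:9606", "20240101", "UniProt", "", "",
])
print(parse_gaf_line(sample))
```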

class TermCounter

Bases: object

Struct holding annotation counts and information content (IC) for GO terms.

Fields

counts (dict)

Mapping from GO term ID to annotation count.

total_by_ns (dict)

Mapping from namespace to total annotation count.

ic (dict)

Mapping from GO term ID to information content (IC).

build_term_counter(py_annotations)

Build a term counter (counts, IC) from GAF annotations.

Parameters:

py_annotations (list of GAFAnnotation) – Parsed annotations, typically the list returned by load_gaf.

Returns:

Struct with counts and IC values.

Return type:

TermCounter

load_gaf(path)

Load a GAF annotation file and cache the gene-to-GO mapping.

Parameters:

path (str) – Path to the GAF file.

Returns:

List of parsed GAF annotations.

Return type:

list of GAFAnnotation
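The cached gene-to-GO mapping is internal to GO3, but a comparable mapping can be built from the returned list. This is a minimal sketch using made-up (db_object_id, go_term) pairs in place of real GAFAnnotation objects; with real objects you would read `ann.db_object_id` and `ann.go_term` instead.

```python
from collections import defaultdict

# Made-up (db_object_id, go_term) pairs standing in for GAFAnnotation objects.
pairs = [
    ("GENE_A", "GO:0006915"),
    ("GENE_A", "GO:0008150"),
    ("GENE_B", "GO:0006915"),
    ("GENE_A", "GO:0006915"),  # duplicate, collapsed by the set
]

# Group GO terms by gene product; sets remove repeated annotations.
gene_to_go = defaultdict(set)
for gene, term in pairs:
    gene_to_go[gene].add(term)

for gene in sorted(gene_to_go):
    print(gene, sorted(gene_to_go[gene]))
```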