Advanced Usage
This section describes the various features and options of gokit.
Enrichment
gokit performs Gene Ontology enrichment analysis using Fisher’s exact test for overrepresentation analysis (ORA). To run a single-study enrichment analysis, use the following command:
gokit enrich \
--study study.txt \
--population population.txt \
--assoc assoc.txt \
--out results/goea
Defaults that reduce flags:
--obo defaults to ./go-basic.obo
--assoc-format defaults to auto
--test-direction defaults to both
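To make the statistical test concrete, here is a minimal sketch of a one-sided (over-representation) Fisher's exact test for a single GO term, using only the Python standard library. It is not gokit's implementation: the function name `fisher_overrep_pvalue` and its parameters are hypothetical, and it assumes the study set is a subset of the population.

```python
from math import comb


def fisher_overrep_pvalue(study_hits: int, study_size: int,
                          pop_hits: int, pop_size: int) -> float:
    """One-sided Fisher's exact test p-value for over-representation.

    Returns P(X >= study_hits), where X is hypergeometric: study_size
    genes are drawn from a population of pop_size genes, pop_hits of
    which are annotated to the GO term.
    """
    if not (0 <= study_hits <= study_size <= pop_size):
        raise ValueError("expected 0 <= study_hits <= study_size <= pop_size")
    if not (study_hits <= pop_hits <= pop_size):
        raise ValueError("expected study_hits <= pop_hits <= pop_size")

    total = comb(pop_size, study_size)
    max_hits = min(study_size, pop_hits)
    # Sum the upper tail; comb() returns 0 for impossible tables.
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, study_size - k)
        for k in range(study_hits, max_hits + 1)
    )
    return tail / total
```

For a population of four genes, a study of two genes, and a term annotated to exactly those two study genes, `fisher_overrep_pvalue(2, 2, 2, 4)` evaluates to 1/6.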
Batch Mode
gokit supports batch enrichment of multiple study sets with cross-study semantic similarity comparisons.
gokit enrich \
--studies studies.tsv \
--population population.txt \
--assoc assoc.txt \
--out results_batch \
--out-formats tsv,jsonl \
--compare-semantic \
--semantic-metric wang \
--semantic-top-k 5
studies.tsv accepts either:
study_name<TAB>/path/to/study.txt
/path/to/study.txt (name inferred from filename)
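The two accepted manifest line forms can be read as in the following sketch. The helper names `parse_manifest_line` and `parse_manifest` are hypothetical, and taking the file stem as the inferred name is an assumption; the source only says the name is inferred from the filename.

```python
from pathlib import Path


def parse_manifest_line(line: str) -> tuple[str, str]:
    """Parse one manifest line into (study_name, study_path).

    Accepts either "name<TAB>path" or a bare "path". In the second
    case the name is taken from the file stem (an assumption).
    """
    fields = [f.strip() for f in line.rstrip("\n").split("\t") if f.strip()]
    if len(fields) == 2:
        name, path = fields
    elif len(fields) == 1:
        path = fields[0]
        name = Path(path).stem  # "/data/study_b.txt" -> "study_b"
    else:
        raise ValueError(f"expected 1 or 2 tab-separated fields: {line!r}")
    return name, path


def parse_manifest(text: str) -> list[tuple[str, str]]:
    """Parse a whole manifest, skipping blank lines."""
    return [parse_manifest_line(ln) for ln in text.splitlines() if ln.strip()]
```

Both line forms may appear in the same manifest; each produces one `(name, path)` pair.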
Validation
Validate your input files before running enrichment.
The validate command checks that the study, population, and association
files are properly formatted and consistent.
gokit validate \
--study study.txt \
--population population.txt \
--assoc assoc.txt
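As an illustration of what consistency checks on these three inputs might look like, the sketch below flags a few common problems. The specific checks and the function name `check_inputs` are assumptions for illustration; the source does not list the checks that the validate command performs.

```python
import re

# GO identifiers have the form "GO:" followed by seven digits.
GO_ID = re.compile(r"^GO:\d{7}$")


def check_inputs(study: set[str], population: set[str],
                 assoc: dict[str, set[str]]) -> list[str]:
    """Return human-readable problems; an empty list means none found."""
    problems = []
    missing = sorted(study - population)
    if missing:
        problems.append(f"study genes absent from population: {missing}")
    unannotated = sorted(g for g in study if g not in assoc)
    if unannotated:
        problems.append(f"study genes without GO annotations: {unannotated}")
    bad_ids = sorted({t for terms in assoc.values() for t in terms
                      if not GO_ID.match(t)})
    if bad_ids:
        problems.append(f"malformed GO IDs: {bad_ids}")
    return problems
```

The inputs here are already-parsed sets and mappings, so the sketch stays independent of any particular file format.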
Download
gokit can download the Gene Ontology files needed for enrichment analysis.
By default, this downloads go-basic.obo and goslim_generic.obo
into the current directory.
gokit download
This is equivalent to:
wget http://current.geneontology.org/ontology/go-basic.obo
wget http://current.geneontology.org/ontology/subsets/goslim_generic.obo
Plotting
gokit supports several plot types for visualizing enrichment results.
Term bar plot:
gokit plot \
--input results_batch/all_studies.tsv \
--study-id study_a \
--kind term-bar \
--direction both \
--top-n 20 \
--out figures/study_a_terms \
--format png
Direction summary plot:
gokit plot \
--input results_batch/all_studies.tsv \
--study-id study_a \
--kind direction-summary \
--alpha 0.05 \
--out figures/study_a_direction_summary.png
Semantic network plot:
gokit plot \
--input results_batch/semantic_similarity.tsv \
--kind semantic-network \
--min-similarity 0.25 \
--max-edges 40 \
--out figures/semantic_network.png
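One plausible reading of the --min-similarity and --max-edges options is a threshold followed by a cap on the strongest edges. The sketch below shows that interpretation; the function name `select_edges`, the inclusive threshold, and the edge tuple layout are assumptions, not documented gokit behavior.

```python
def select_edges(edges, min_similarity=0.25, max_edges=40):
    """Keep edges at or above min_similarity, strongest first, capped.

    edges: iterable of (node_a, node_b, similarity) tuples.
    """
    kept = [e for e in edges if e[2] >= min_similarity]
    kept.sort(key=lambda e: e[2], reverse=True)  # highest similarity first
    return kept[:max_edges]
```

Thresholding before capping means a sparse network can have fewer than `max_edges` edges.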
Auto-plot emission from enrich:
gokit enrich \
--studies studies.tsv \
--population population.txt \
--assoc assoc.txt \
--out results_batch \
--compare-semantic \
--emit-plots term-bar,direction-summary,semantic-network \
--plot-format png
Report
Generate a consolidated markdown report from an enrichment run.
gokit report --run results/goea
Semantic Similarity
gokit computes pairwise semantic similarity between study sets using several established metrics.
Available semantic metrics (--semantic-metric):
jaccard: Jaccard index (raw or ancestor-expanded)
resnik: Resnik semantic similarity (information content of MICA)
lin: Lin semantic similarity
wang: Wang semantic similarity
Additional semantic options:
--semantic-top-k: number of top terms to use per study
--semantic-namespace: restrict to a specific GO namespace (all, BP, MF, CC)
--semantic-min-padjsig: minimum adjusted p-value threshold for term inclusion
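The difference between the raw and ancestor-expanded Jaccard index can be seen in a small sketch. The term IDs (`T:root`, `T:a`, `T:b`) and the helper names are made up for illustration, and the ontology is a toy parent map rather than a parsed OBO file; this is not gokit's code.

```python
def ancestors(term: str, parents: dict[str, set[str]]) -> set[str]:
    """Return all ancestors of term (excluding term itself) in a DAG."""
    seen: set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def expand(terms: set[str], parents: dict[str, set[str]]) -> set[str]:
    """Add every ancestor of every term to the set."""
    out = set(terms)
    for t in terms:
        out |= ancestors(t, parents)
    return out


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of intersection over size of union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy ontology: two sibling terms under one root (hypothetical IDs).
PARENTS = {"T:a": {"T:root"}, "T:b": {"T:root"}}
raw = jaccard({"T:a"}, {"T:b"})
expanded = jaccard(expand({"T:a"}, PARENTS), expand({"T:b"}, PARENTS))
```

In this toy case the raw index is 0 because the two studies share no term, while the expanded index is 1/3 because both sets now contain the shared root.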
Input File Formats
gokit requires three input files for enrichment analysis:
study.txt: one study gene ID per line.
geneA
geneB
population.txt: one background gene ID per line.
geneA
geneB
geneC
geneD
assoc.txt: one gene-to-GO mapping per line. Multiple GO terms on one line are supported using semicolons. Tabs are also accepted.
geneA GO:0008150;GO:0003674
geneB GO:0008150
geneC GO:0005575
Supported association formats:
id2gos: simple gene-to-GO ID mapping
gaf: Gene Association File format (GAF 2.x)
gpad: Gene Product Association Data format (GPAD 1.x/2.x)
gene2go: NCBI gene2go format
auto: automatic format detection (default)
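The simple gene-to-GO layout shown in the assoc.txt example can be read with a few lines of Python. This is a sketch, not gokit's parser: the function name `parse_id2gos` is hypothetical, and treating tabs as an alternative separator both after the gene ID and between GO terms is an assumption about what "tabs are also accepted" means.

```python
import re


def parse_id2gos(text: str) -> dict[str, set[str]]:
    """Parse 'gene GO:...;GO:...' lines into {gene: {GO IDs}}."""
    assoc: dict[str, set[str]] = {}
    for raw in text.splitlines():
        fields = raw.split(None, 1)  # gene ID, then the rest of the line
        if not fields:
            continue  # skip blank lines
        gene = fields[0]
        rest = fields[1] if len(fields) > 1 else ""
        # GO terms may be separated by semicolons or whitespace (incl. tabs).
        terms = {t for t in re.split(r"[;\s]+", rest) if t}
        assoc.setdefault(gene, set()).update(terms)
    return assoc
```

A gene that appears on several lines has its GO terms merged into one set.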
All Options
Enrichment options:
| Option | Usage and meaning |
|---|---|
| | Print help message. |
| --study | Path to study gene set file. |
| --studies | Path to batch studies manifest (TSV). |
| --population | Path to population/background gene set file. |
| --assoc | Path to gene-to-GO association file. |
| --assoc-format | Association file format. Default: auto. |
| --obo | Path to OBO ontology file. Default: ./go-basic.obo. |
| --out | Output path prefix. |
| --out-formats | Comma-separated output formats (tsv, jsonl, parquet). Default: tsv. |
| | Multiple testing correction method. |
| --test-direction | Direction of test. Default: both. |
| | ID normalization mode. |
| --compare-semantic | Enable cross-study semantic similarity comparison. Default: off. |
| --semantic-metric | Semantic similarity metric: jaccard, resnik, lin, wang. |
| --semantic-top-k | Number of top terms per study for semantic comparison. |
| --semantic-namespace | GO namespace filter for semantic comparison: all, BP, MF, CC. |
| --semantic-min-padjsig | Minimum adjusted p-value threshold for semantic term inclusion. |
| --emit-plots | Comma-separated plot types to auto-emit: term-bar, direction-summary, semantic-network. |
| --plot-format | Format for auto-emitted plots. Default: png. |
Command aliases:
| Alias | Equivalent command |
|---|---|
| | |
| | |
| | |
| | |
| | |
| | |