g:Profiler Python package documentation

This is the documentation for the official g:Profiler Python package. The package contains both a module for use in a Python codebase and the gprofiler.py command-line tool for querying g:GOSt. Invocation of the latter is not documented here, but running gprofiler.py --help prints a usage manual.

Synopsis:

from gprofiler import GProfiler
gp = GProfiler("MyTool/0.1")
gp.gprofile("sox2")

GProfiler class

class gprofiler.GProfiler(user_agent, base_url='http://biit.cs.ut.ee/gprofiler/', output_type=10, want_header=False)

A class representing the g:Profiler toolkit. Contains methods for querying the g:GOSt, g:Convert and g:Orth tools. Please see the g:Profiler web tool for extensive documentation on all the options to the methods.

  • user_agent - Required (String) A short user agent string for your tool.
  • base_url - (String) An absolute URL of the g:Profiler instance to use; the stable release by default.
  • output_type - Controls the data structure returned from the methods.
    • GProfiler.OUTPUT_TYPE_FORMATTED - Default. Returns a list (lines) of lists (fields), with each field cast into its proper type or None for “N/A” values.
    • GProfiler.OUTPUT_TYPE_LINES - Returns a list containing the raw lines from g:Profiler.
  • want_header - Prepend the header (column names) as the first row of output; false by default.
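With want_header enabled and the formatted output type, the first row holds the column names and every later row holds the fields of one result line. A minimal sketch of pairing the two is given here. The helper rows_to_records and the sample rows are hypothetical, and the column names are placeholders rather than the actual g:Profiler column layout.

```python
def rows_to_records(rows):
    """Turn [header, row1, row2, ...] into a list of dicts keyed by column name."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, fields)) for fields in body]


# Made-up sample shaped like formatted output with want_header=True.
# The column names are placeholders, not the real g:Profiler columns.
sample = [
    ["term_id", "p_value", "description"],
    ["GO:0000001", 0.01, "example term"],
    ["GO:0000002", None, "term with an N/A field"],
]

records = rows_to_records(sample)
print(records[0]["term_id"])
```

Keeping fields addressable by name avoids hard-coding column positions in calling code.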

Options common to several methods:

  • query - Required (String | List) A space-separated string or a list of genes, proteins or other biological entities.
  • organism - (String) The organism name in g:Profiler format.
  • region_query - (Boolean) The query consists of chromosomal regions.
  • numeric_ns - (String) Namespace to use for fully numeric IDs.
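Because query accepts either a space-separated string or a list, calling code may want a single canonical form before logging or de-duplicating identifiers. The sketch below normalises both shapes into a list. query_to_list is a hypothetical helper, not part of the package, and it assumes that individual identifiers contain no whitespace.

```python
def query_to_list(query):
    """Return the query as a list of identifier strings."""
    if isinstance(query, str):
        # str.split() with no argument splits on any run of whitespace
        return query.split()
    # Anything else is treated as an iterable of identifiers
    return [str(item) for item in query]


print(query_to_list("sox2 pou5f1"))
```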

gconvert(query, organism='hsapiens', target='ENSG', region_query=False, numeric_ns=None)

Query g:Convert.

  • target - (String) The target namespace.

gorth(query, source_organism='hsapiens', target_organism='mmusculus', region_query=False, numeric_ns=None)

Query g:Orth.

  • source_organism, target_organism - (String) The source and target organism names, in g:Profiler format.

gprofile(query, organism='hsapiens', all_results=False, ordered=False, region_query=False, exclude_iea=False, underrep=False, evcodes=False, hier_sorting=False, hier_filtering=None, max_p_value=1.0, min_set_size=None, max_set_size=None, min_isect_size=None, max_isect_size=None, correction_method=None, domain_size=None, numeric_ns=None, custom_bg=None, src_filter=None)

Query g:GOSt.

  • all_results - (Boolean) All results, including those deemed not significant.
  • ordered - (Boolean) Ordered query.
  • exclude_iea - (Boolean) Exclude electronic GO annotations.
  • underrep - (Boolean) Measure underrepresentation.
  • evcodes - (Boolean) Request evidence codes in output as the final column.
  • hier_sorting - (Boolean) Sort output into subgraphs.
  • hier_filtering - (Boolean) Hierarchical filtering.
  • max_p_value - (Float) Custom p-value threshold.
  • min_set_size - (Int) Minimum size of functional category.
  • max_set_size - (Int) Maximum size of functional category.
  • min_isect_size - (Int) Minimum size of query / functional category intersection.
  • max_isect_size - (Int) Maximum size of query / functional category intersection.
  • correction_method - Algorithm used for multiple testing correction, one of:
    • GProfiler.THR_GSCS - Default. g:SCS.
    • GProfiler.THR_FDR - Benjamini-Hochberg FDR.
    • GProfiler.THR_BONFERRONI - Bonferroni.
  • domain_size - Statistical domain size, one of:
    • GProfiler.DOMAIN_ANNOTATED - Default. Only annotated genes.
    • GProfiler.DOMAIN_KNOWN - All known genes.
  • custom_bg - (String | List) Custom statistical background.
  • src_filter - (List) A list of data source ID strings, e.g. ["GO:BP", "KEGG"].
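
As an illustration of combining these options, the sketch below wraps gprofile with a p-value threshold and a data source filter. The function enriched_terms and the StubClient class are hypothetical; only the keyword names are taken from the gprofile signature. The stub stands in for a GProfiler instance so that the example runs without network access.

```python
def enriched_terms(client, genes, max_p=0.05, sources=("GO:BP", "KEGG")):
    """Query g:GOSt through client.gprofile with a threshold and a source filter."""
    return client.gprofile(
        genes,
        organism="hsapiens",
        max_p_value=max_p,
        src_filter=list(sources),
    )


class StubClient:
    """Stand-in for GProfiler that records the call instead of querying the server."""

    def __init__(self):
        self.calls = []

    def gprofile(self, query, **options):
        self.calls.append((query, options))
        return []  # a real client would return result rows here


stub = StubClient()
result = enriched_terms(stub, ["sox2", "pou5f1"])
print(stub.calls[0][1]["src_filter"])

# With the real package, pass a GProfiler instance instead of the stub:
#   from gprofiler import GProfiler
#   enriched_terms(GProfiler("MyTool/0.1"), ["sox2", "pou5f1"])
```

Passing the client in as an argument keeps the wrapper testable offline and leaves the choice of base_url and output_type to the caller.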