SlipGURU Dipartimento di Informatica e Scienze dell'Informazione Università Degli Studi di Genova

annotation Package

Provides concrete implementations of all abstract entities related to annotation handling, as well as functionality specific to individual annotation sources.

HGNC Module

Provides specific functionality for handling HUGO Gene Nomenclature Committee (HGNC) annotation data (http://www.genenames.org/). KDVS uses HGNC annotation data to resolve gene naming problems that may arise from using old microarray annotations. HGNC data refer to gene names as ‘symbols’. The data contain the full approval history of individual symbols; some symbols have been withdrawn in light of current knowledge about the genes. HGNC data are downloaded as a DSV file; see the README in the ‘data’ directory for more details. Typically, HGNC data are loaded into the KDVS DB and wrapped in a DSV instance; the ‘experiment’ application does this automatically.

kdvs.fw.impl.annotation.HGNC.HGNC_APPROVED_SYMBOL_COL = 'Approved Symbol'

Column in HGNC data that contains approved symbol(s).

kdvs.fw.impl.annotation.HGNC.HGNC_STATUS_COL = 'Status'

Column in HGNC data that contains status of specific gene symbol.

kdvs.fw.impl.annotation.HGNC.HGNC_LOCUS_TYPE_COL = 'Locus Type'

Column in HGNC data that contains the type of genetic locus that the symbol describes, e.g. a gene with a protein product, a tRNA gene, etc.

kdvs.fw.impl.annotation.HGNC.HGNC_PREVIOUS_SYMBOLS_COL = 'Previous Symbols'

Column in HGNC data that contains previously approved symbols.

kdvs.fw.impl.annotation.HGNC.HGNC_SYNONYMS_COL = 'Synonyms'

Column in HGNC data that contains valid synonyms for specific symbol; some synonyms, although not official, are still widely used in different research environments and must be accounted for when resolving any gene name.

kdvs.fw.impl.annotation.HGNC.HGNC_EGENE_ID_COL = 'Entrez Gene ID'

Column in HGNC data that contains the Entrez Gene ID for a specific symbol; although the column is regularly updated, the user should check its accuracy.

kdvs.fw.impl.annotation.HGNC.HGNC_ENSEMBL_ID_COL = 'Ensembl Gene ID'

Column in HGNC data that contains the Ensembl Gene ID for a specific symbol; although the column is regularly updated, the user should check its accuracy.

kdvs.fw.impl.annotation.HGNC.HGNC_REFSEQ_ID_COL = 'RefSeq IDs'

Column in HGNC data that contains (possibly many) RefSeq IDs for a specific symbol; although the column is regularly updated, the user should check its accuracy.

kdvs.fw.impl.annotation.HGNC.HGNC_FIELDS_SEP = ','

Default DSV separator for the HGNC data file.

kdvs.fw.impl.annotation.HGNC.HGNC_STATUS_APPROVED = 'Approved'

Value in the ‘Status’ column indicating that the specific symbol is approved (i.e. is considered official at this moment).

kdvs.fw.impl.annotation.HGNC.HGNC_STATUS_WITHDRAWN = ['Entry Withdrawn', 'Symbol Withdrawn']

Values in the ‘Status’ column indicating that the specific symbol has been withdrawn (i.e. is no longer considered valid); such symbols may still be encountered in very old data.

kdvs.fw.impl.annotation.HGNC.HGNC_WITHDRAWN_GS_PART = '~withdrawn'

Suffix that is appended to each withdrawn symbol; KDVS removes it to speed up querying.

kdvs.fw.impl.annotation.HGNC.HGNC_FIELD_EMPTY = ''

Refers to empty value in any column of HGNC data.
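As an illustration of how the column and status constants above might be used together, the following minimal sketch classifies rows of HGNC-style data by status. It redefines the constants locally instead of importing KDVS; the sample rows and the `classify_status` helper are hypothetical and not part of the module.

```python
import csv
import io

# Local copies of the documented constants, so the sketch runs without KDVS.
HGNC_APPROVED_SYMBOL_COL = 'Approved Symbol'
HGNC_STATUS_COL = 'Status'
HGNC_STATUS_APPROVED = 'Approved'
HGNC_STATUS_WITHDRAWN = ['Entry Withdrawn', 'Symbol Withdrawn']

# Hypothetical sample data; the symbols are placeholders, not real HGNC records.
SAMPLE = (
    "Approved Symbol,Status\n"
    "GENE1,Approved\n"
    "GENE2~withdrawn,Symbol Withdrawn\n"
    "GENE3~withdrawn,Entry Withdrawn\n"
)


def classify_status(row):
    """Return 'approved', 'withdrawn' or 'unknown' for one row (hypothetical helper)."""
    status = row[HGNC_STATUS_COL]
    if status == HGNC_STATUS_APPROVED:
        return 'approved'
    if status in HGNC_STATUS_WITHDRAWN:
        return 'withdrawn'
    return 'unknown'


# Parse the comma-separated sample and map each symbol to its status class.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
statuses = {row[HGNC_APPROVED_SYMBOL_COL]: classify_status(row) for row in rows}
```

Keeping the withdrawn statuses in a list, as the module does, lets a single membership test cover both ‘Entry Withdrawn’ and ‘Symbol Withdrawn’.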

kdvs.fw.impl.annotation.HGNC.HGNCSYNONYMS_TMPL = <kdvs.fw.DBTable.DBTemplate object>

Database template used by KDVS to create a fast query table for resolving synonyms. It defines the table name ‘hgnc_synonyms’ and the columns ‘synonym’ and ‘approved’. The ID column is ‘synonym’. All columns are indexed. The table lets the user look up the approved symbol for a given synonym.

kdvs.fw.impl.annotation.HGNC.HGNCPREVIOUS_TMPL = <kdvs.fw.DBTable.DBTemplate object>

Database template used by KDVS to create a fast query table for resolving previous symbols. It defines the table name ‘hgnc_previous’ and the columns ‘previous’ and ‘approved’. The ID column is ‘previous’. All columns are indexed. The table lets the user look up the approved symbol for a given previous symbol.
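The kind of lookup table described by these templates can be sketched with Python's standard `sqlite3` module, used here only as a stand-in for the KDVS database layer. The table and column names follow the ‘hgnc_synonyms’ template; the index names, the sample rows and the `approved_for_synonym` helper are assumptions made for illustration, and the sketch does not use DBTemplate itself.

```python
import sqlite3

# In-memory database standing in for a KDVS-managed database (assumption).
conn = sqlite3.connect(':memory:')

# Table layout following the 'hgnc_synonyms' template: two columns, both indexed.
conn.execute('CREATE TABLE hgnc_synonyms (synonym TEXT, approved TEXT)')
conn.execute('CREATE INDEX hgnc_synonyms_synonym_idx ON hgnc_synonyms (synonym)')
conn.execute('CREATE INDEX hgnc_synonyms_approved_idx ON hgnc_synonyms (approved)')

# Hypothetical placeholder rows, not real HGNC records.
conn.executemany(
    'INSERT INTO hgnc_synonyms VALUES (?, ?)',
    [('ALIAS1', 'GENE1'), ('ALIAS2', 'GENE1'), ('ALIAS3', 'GENE2')],
)


def approved_for_synonym(connection, synonym):
    """Return all approved symbols recorded for a synonym (hypothetical helper)."""
    cur = connection.execute(
        'SELECT approved FROM hgnc_synonyms WHERE synonym = ? ORDER BY approved',
        (synonym,),
    )
    return [r[0] for r in cur.fetchall()]
```

The helper returns a list because this sketch does not enforce uniqueness of the ‘synonym’ column; whether the real table does so is not stated here. A table following the ‘hgnc_previous’ template would look the same, with ‘previous’ in place of ‘synonym’.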

kdvs.fw.impl.annotation.HGNC.correctHGNCApprovedSymbols(hgnc_dsv)

Manually correct HGNC data after they have been loaded into the DB and wrapped in a DSV instance. The function removes the suffix (defined in HGNC_WITHDRAWN_GS_PART) from withdrawn symbols in order to speed up further querying. Any application that uses HGNC data must make this call, and must do so after the HGNC data have been loaded and wrapped in a DSV instance.

Parameters :

hgnc_dsv : DSV

valid instance of DSV that contains HGNC data
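The effect of the correction on individual symbols can be sketched on plain strings. This is only an illustration of the suffix removal, not the KDVS implementation, which operates on the database behind the DSV instance; the `strip_withdrawn_suffix` helper and the sample symbols are hypothetical.

```python
# Local copy of the documented constant, so the sketch runs without KDVS.
HGNC_WITHDRAWN_GS_PART = '~withdrawn'


def strip_withdrawn_suffix(symbol):
    """Remove the withdrawn suffix from a symbol if present (hypothetical helper)."""
    if symbol.endswith(HGNC_WITHDRAWN_GS_PART):
        return symbol[:-len(HGNC_WITHDRAWN_GS_PART)]
    return symbol


# Placeholder symbols, not real HGNC records.
cleaned = [strip_withdrawn_suffix(s) for s in ['GENE1', 'GENE2~withdrawn']]
```

Once the suffix is removed, a withdrawn symbol can be matched by exact comparison against the symbol as it appears in old annotations, which is presumably why the correction speeds up querying.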

kdvs.fw.impl.annotation.HGNC.generateHGNCPreviousSymbols(hgnc_dsv, map_db_key)

Create a helper table that simplifies resolving previous gene symbols with HGNC data. The helper table may be created in a different subordinate database than the original HGNC data. The table is specified via a template.

Parameters :

hgnc_dsv : DSV

valid instance of DSV that contains HGNC data

map_db_key : string

ID of the database that will hold helper table

Returns :

previousDT : DBTable

wrapper for newly created helper table
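The content of such a helper table can be sketched as a list of (previous, approved) pairs built from plain dictionaries. The sketch assumes that a ‘Previous Symbols’ field holds several symbols separated by commas; this assumption, the `previous_symbol_pairs` helper and the sample rows are illustrative only, and the sketch does not create a DBTable.

```python
# Local copies of the documented constants, so the sketch runs without KDVS.
HGNC_APPROVED_SYMBOL_COL = 'Approved Symbol'
HGNC_PREVIOUS_SYMBOLS_COL = 'Previous Symbols'
HGNC_FIELD_EMPTY = ''


def previous_symbol_pairs(rows, inner_sep=','):
    """Build (previous, approved) pairs from HGNC-style rows (hypothetical helper).

    Assumes multiple previous symbols share one field, separated by inner_sep.
    """
    pairs = []
    for row in rows:
        field = row[HGNC_PREVIOUS_SYMBOLS_COL]
        if field == HGNC_FIELD_EMPTY:
            continue  # no previous symbols recorded for this approved symbol
        for prev in field.split(inner_sep):
            prev = prev.strip()
            if prev:
                pairs.append((prev, row[HGNC_APPROVED_SYMBOL_COL]))
    return pairs


# Placeholder rows, not real HGNC records.
rows = [
    {'Approved Symbol': 'GENE1', 'Previous Symbols': 'OLD1, OLD2'},
    {'Approved Symbol': 'GENE2', 'Previous Symbols': ''},
]
pairs = previous_symbol_pairs(rows)
```

Each pair corresponds to one row of a table with the columns ‘previous’ and ‘approved’; the same approach would apply to the ‘Synonyms’ column when building the synonyms helper table.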

kdvs.fw.impl.annotation.HGNC.generateHGNCSynonyms(hgnc_dsv, map_db_key)

Create a helper table that simplifies resolving gene symbol synonyms with HGNC data. The helper table may be created in a different subordinate database than the original HGNC data. The table is specified via a template.

Parameters :

hgnc_dsv : DSV

valid instance of DSV that contains HGNC data

map_db_key : string

ID of the database that will hold helper table

Returns :

synonymsDT : DBTable

DBTable wrapper for newly created helper table
