SlipGURU Dipartimento di Informatica e Scienze dell'Informazione Università Degli Studi di Genova

bin Package

Provides top–level package for applications built over KDVS API.

experiment Module

Provides KDVS application ‘experiment’. It performs prior–knowledge–guided feature selection and re–annotation, according to specified configuration. It uses Gene Ontology as prior knowledge source and microarray gene expression as measured data.

IMPORTANT NOTE. This application is still unpolished: the API needs further refinement, and some details are still hard–coded.

class kdvs.bin.experiment.MA_GO_Experiment_App

Bases: kdvs.fw.impl.app.CmdLineApp.CmdLineApp

Main application class. It interprets the instance of MA_GO_PROFILE.

prepare()

Add all actions, in the following order:

kdvs.bin.experiment.resolveStaticDataFiles(env)

Action that resolves file paths for all static data files according to the specification, which is taken from the ‘static_data_files’ dictionary in the default configuration file. File names may contain ‘*’ and are interpreted according to the rules of the glob module. The action also opens the files and stores the resolved paths and file handles in the same dictionary (under the keys ‘path’ and ‘fh’). See ‘kdvs/config/default_cfg.py’ for details.
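The resolution step can be pictured with the minimal sketch below. It is not the KDVS implementation: the function name, the ‘spec’ key, and the rule of taking the first match in sorted order are assumptions made for illustration. Only the ‘path’ and ‘fh’ keys come from the description above.

```python
import glob
import os


def resolve_static_data_files(static_data_files, data_dir):
    """Resolve each (possibly wildcarded) file name and open the file.

    `static_data_files` maps a file ID to a dict whose hypothetical 'spec'
    key holds the file name pattern. The resolved path and the open file
    handle are stored back in the same dict under 'path' and 'fh'.
    """
    for file_id, entry in static_data_files.items():
        pattern = os.path.join(data_dir, entry['spec'])
        matches = sorted(glob.glob(pattern))
        if not matches:
            raise IOError('no file matches %r for %r' % (pattern, file_id))
        # Assumption: if several files match, use the first in sorted order.
        entry['path'] = os.path.abspath(matches[0])
        entry['fh'] = open(entry['path'], 'rb')
    return static_data_files
```

Callers are responsible for closing the stored file handles once the data have been loaded.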

kdvs.bin.experiment.loadStaticData(env)

Action that loads all static data files, either into the database governed by the instance of DBManager, or through the associated manager, if present. It interprets the ‘static_data_files’ dictionary that comes from the default configuration file. Two mutually exclusive elements are used. If ‘loadToDb’ is True, the file is loaded into the database and wrapped in a DSV instance. If ‘manager’ is not None, the action instantiates the PKCManager instance that governs the content of the file; also, if debug output was requested, it instructs the manager to dump() all the information. See ‘kdvs/config/default_cfg.py’ for details.

The ‘experiment’ application recognizes two static data files. See ‘kdvs/data/README’ for details.

Raises:

  Warn – if more than one manager was specified for a static data file

kdvs.bin.experiment.postprocessStaticData(env)

Action that performs postprocessing of static data. Currently, it performs corrections of withdrawn symbols, and generates helper tables with HGNC synonyms and previous symbols; helper tables are wrapped in DBTable instances.

kdvs.bin.experiment.loadUserData(env)

Action that resolves and loads user data files. The following profile sections are interpreted:

  • ‘annotation_file’
  • ‘gedm_file’
  • ‘labels_file’

All these are DSV files; after loading into database, they are wrapped in DSV instances.

See ‘kdvs/example_experiment/example_experiment_cfg.py’ for details.

kdvs.bin.experiment.resolveProfileComponents(env)

Action that goes through application profile and resolves all dynamically created components, that is, reads the individual specifications, creates instances, and performs individual configurations. Currently, the following groups of components are processed, and concrete instances are created:

Also, for statistical techniques, the corresponding degrees of freedom (DOFs) are expanded.

kdvs.bin.experiment.buildGeneIDMap(env)

Action that constructs the concrete instance of GeneIDMap and builds appropriate mapping. The instance type is specified in user configuration file as ‘geneidmap_type’ variable. Also, if debug output was requested, dumps the mapping. See ‘kdvs/example_experiment/example_experiment_cfg.py’ for details.

kdvs.bin.experiment.buildPKCIDMap(env)

Action that constructs the concrete instance of PKCIDMap and builds appropriate mapping. The instance type is specified in user configuration file as ‘pkcidmap_type’ variable. Also, if debug output was requested, dumps the mapping. In addition, since in ‘experiment’ application Gene Ontology is used as prior knowledge source, builds specialized submapping for selected GO domain. The GO domain is specified in MA_GO_PROFILE as ‘go_domain’ element. See ‘kdvs/example_experiment/example_experiment_cfg.py’ for details.

kdvs.bin.experiment.obtainLabels(env)

Action that obtains information about samples and labels (if present) and creates Labels instance. It reads samples from primary dataset, reads labels file, and re–orders labels according to samples from primary dataset. Primary dataset has been specified in MA_GO_PROFILE as ‘gedm_file’ element, loaded earlier into database, and wrapped in DSV instance.
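The re–ordering step amounts to looking up each sample of the primary dataset in the labels read from the labels file. A minimal sketch follows; the function name, the sample names, and the choice of None for samples without a label are assumptions, not part of the KDVS API.

```python
def reorder_labels(samples, labels_by_sample):
    """Return labels in the order given by `samples`.

    Assumption: a sample absent from the labels file gets None.
    """
    return [labels_by_sample.get(sample) for sample in samples]


# Order of samples as found in the primary dataset (hypothetical names).
samples = ['S3', 'S1', 'S2']
# Labels as read from the labels file, in file order.
labels_by_sample = {'S1': '1', 'S2': '-1', 'S3': '1'}
ordered = reorder_labels(samples, labels_by_sample)
```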

kdvs.bin.experiment.buildPKDrivenDataSubsets(env)

Action that builds all prior–knowledge–driven data subsets. Here, ‘build’ refers to querying samples and variables from the primary dataset. At this stage, the mapping ‘subsets’

  • {PKC_ID : [subsetID, numpy.shape(ds), [vars], [samples]]}

is constructed, and the numpy.ndarray component of DataSet is serialized for each data subset. Currently, the instances of DataSet are not preserved, in order to conserve memory. Also, the iterable of tuples (pkcID, size), sorted in descending order of subset size (i.e. starting from the largest), is constructed here as ‘pkc2ss’.
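The shape of these two structures can be illustrated with plain Python. This is only a sketch under stated assumptions: the function name, the subset ID scheme (‘SS0’, ‘SS1’, ...), the placeholder PKC IDs, and the use of the number of variables as the subset size are all hypothetical, and a plain tuple stands in for numpy.shape(ds).

```python
def build_subsets(pkc2vars, samples):
    """Build the 'subsets' mapping and the size-ordered 'pkc2ss' list."""
    subsets = {}
    for idx, (pkc_id, variables) in enumerate(sorted(pkc2vars.items())):
        subset_id = 'SS%d' % idx                  # hypothetical ID scheme
        shape = (len(variables), len(samples))    # stands in for numpy.shape(ds)
        subsets[pkc_id] = [subset_id, shape, list(variables), list(samples)]
    # Assumption: subset size is the number of variables; largest first,
    # ties broken by PKC ID so that the order is deterministic.
    pkc2ss = sorted(
        ((pkc_id, rec[1][0]) for pkc_id, rec in subsets.items()),
        key=lambda item: (-item[1], item[0]),
    )
    return subsets, pkc2ss


pkc2vars = {'GO:A': ['g1', 'g2', 'g3'], 'GO:B': ['g2'], 'GO:C': ['g1', 'g4']}
subsets, pkc2ss = build_subsets(pkc2vars, ['S1', 'S2'])
```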

kdvs.bin.experiment.buildSubsetHierarchy(env)

Action that constructs the instance of PKDrivenDBSubsetHierarchy. Also, constructs the operation map, that is, determines the sequence of all operations to be performed on each category, and within, on each data subset, such as orderers, env–ops, statistical techniques, reporters etc. The operation map has two components: executable and textual. The executable component stores all references to actual callables to be performed; the textual component stores all textual IDs of the configurable instances that provide the callables themselves. The textual IDs are taken from user configuration file; the instances were created in resolveProfileComponents() action. In addition, if debug output was requested, serializes constructed data structures.
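The relation between the two components of the operation map can be sketched as follows. The layout is an assumption made for illustration: here each textual ID maps directly to a callable, whereas in KDVS the configurable instance provides the callable. The function name and all IDs are hypothetical.

```python
def build_operation_map(category_ops, instances):
    """Return (executable, textual) components of a simplified operation map.

    `category_ops` maps a category name to an ordered list of textual IDs;
    `instances` maps each textual ID to the callable it stands for.
    """
    executable = {}
    textual = {}
    for category, op_ids in category_ops.items():
        textual[category] = list(op_ids)
        executable[category] = [instances[op_id] for op_id in op_ids]
    return executable, textual


# Built-ins are used only as placeholder callables.
instances = {'orderer_1': sorted, 'reporter_1': len}
category_ops = {'category_A': ['orderer_1', 'reporter_1']}
executable, textual = build_operation_map(category_ops, instances)
```

Keeping the textual component separate means it can be serialized or passed to reporters without carrying references to live objects.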

kdvs.bin.experiment.submitSubsetOperations(env)

Action that does the following:

  • instantiates requested concrete JobContainer and JobGroupManager instances, as specified in configuration file(s)

  • for each category:

    • executes associated pre–Env-Op(s)

    • determines test mode directives, if any; in test mode, only a fraction of the computational jobs is executed

      The action looks for two directives in the dictionary ‘subset_hierarchy_components_map’->category_name->‘misc’ in MA_GO_PROFILE:

      • ‘test_mode_elems’ (integer) – number of test data subsets to consider
      • ‘test_mode_elems_order’ (string) (‘first’/‘last’) – consider the ‘first’ or ‘last’ number of data subsets

      Only computational jobs generated for the specified test data subsets are executed.

    • determines submission order, i.e. the final list of data subsets to process further

    • executes associated orderer(s) on the generated submission order

    • for each data subset:

      • generates all job(s) and adds them to job container
  • starts job container

  • serializes the following technical mapping: { internal_job_ID : custom_job_ID },

    where internal job ID is assigned by job container and custom job ID comes from statistical technique

  • if debug output was requested, serializes the submission order
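The effect of the two test mode directives can be sketched as a simple slice over the candidate data subsets. The function name and the defaults (no ‘test_mode_elems’ means no test mode; a missing order means ‘first’) are assumptions for illustration. Only the directive names and the ‘first’/‘last’ values come from the description above.

```python
def apply_test_mode(candidate_subsets, misc):
    """Return the data subsets whose jobs would be executed.

    `misc` is the per-category 'misc' dictionary; the two directives are
    optional. Defaults used here are assumptions of this sketch.
    """
    count = misc.get('test_mode_elems')
    if count is None:
        return list(candidate_subsets)      # test mode not requested
    order = misc.get('test_mode_elems_order', 'first')
    if order == 'first':
        return list(candidate_subsets[:count])
    if order == 'last':
        # Guard against count == 0, because seq[-0:] is the whole sequence.
        return list(candidate_subsets[-count:]) if count > 0 else []
    raise ValueError('unknown test_mode_elems_order: %r' % (order,))


all_subsets = ['SS0', 'SS1', 'SS2', 'SS3']
misc = {'test_mode_elems': 2, 'test_mode_elems_order': 'last'}
selected = apply_test_mode(all_subsets, misc)
```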

kdvs.bin.experiment.executeSubsetOperations(env)

Action that performs the following:

  • closes job container and executes submitted jobs; this call is blocking for most job containers;

    any exceptions from jobs are serialized for further manual inspection

  • calls postClose() on the job container and serializes its technical data obtained with getMiscData(), if any

  • collects all raw job results and prepares them for further post–processing and generation of Results instances

kdvs.bin.experiment.postprocessSubsetOperations(env)

Action that performs the following:

  • checks completion of all jobs, and all individual job groups if any

  • for completed jobs and job groups, generates Results instances

  • serializes technical mapping { technique_ID : [subset_IDs] }, available as ‘technique2ssname’

  • if debug output was requested, serializes job group completion dictionary

  • creates the following technical mapping available as ‘technique2DOF’, and serializes it if debug output was requested:

    {techniqueID : { ‘DOFS_IDXS’: (0, 1, ..., n), ‘DOFs’: (name_DOF0, name_DOF1, ..., name_DOFn)}}

  • for each category:

    • executes associated post–Env-Op(s)
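The layout of the ‘technique2DOF’ mapping can be illustrated with a short sketch. The function name, the technique ID, and the DOF names are hypothetical; only the ‘DOFS_IDXS’ and ‘DOFs’ keys and their tuple values follow the description above.

```python
def build_technique2dof(technique_dofs):
    """Map each technique ID to its DOF indices and DOF names."""
    return {
        technique_id: {
            'DOFS_IDXS': tuple(range(len(dof_names))),
            'DOFs': tuple(dof_names),
        }
        for technique_id, dof_names in technique_dofs.items()
    }


# Hypothetical technique with three degrees of freedom.
technique2DOF = build_technique2dof({'technique_1': ['dof_a', 'dof_b', 'dof_c']})
```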

kdvs.bin.experiment.performSelections(env)

Action that performs the following:

  • for each category:

    • having all Results instances, executes associated outer selector(s) and inner selector(s)
  • if debug output was requested, serializes direct output of outer and inner selector(s)

kdvs.bin.experiment.storeCompleteResults(env)

Action that performs the following:

  • for each data subset:

    • creates individual location for results under current storage manager
    • saves all generated plots as physical files there
    • serializes Results instance there

kdvs.bin.experiment.prepareReports(env)

Action that performs the following:

  • obtains/constructs the following mappings/instances used by any L1L2–associated reporters:

    • subsets

      {PKC_ID : [subsetID, numpy.shape(ds), [vars], [samples]]}

    • pkcid2ssname

      {PKC_ID : subsetID}

    • technique2DOF

      {techniqueID : { ‘DOFS_IDXS’: (0, 1, ..., n), ‘DOFs’: (name_DOF0, name_DOF1, ..., name_DOFn)}}

    • operations_map_img

      (textual component of operation map)

    • categories_map

      {categorizerID : [categories]}

    • cchain

      the categorizers chain; it comes directly from the MA_GO_PROFILE application profile (element ‘subset_hierarchy_categorizers_chain’)

    • submission_order

      an iterable of PKC IDs sorted in order of submission of their jobs

    • pkc_manager

      a concrete instance of PKCManager that governs all PKCs generated

  • for each category:

    • having all Results instances, and all additional data collected, executes associated reporter(s)

      (physical report files are saved to the specific location(s) under current storage manager)

kdvs.bin.experiment.main()
