SlipGURU Dipartimento di Informatica e Scienze dell'Informazione Università Degli Studi di Genova

report Package

Provides concrete implementations of reporters.

L1L2 Module

Provides concrete implementations of reporters associated with statistical techniques that use the l1l2py library and its related functionality, such as selectors. Refer to the individual reporters for more details. IMPORTANT NOTE: these reporters were devised for the ‘experiment’ application, so many details are hard-coded.

class kdvs.fw.impl.report.L1L2.L1L2_VarFreq_Reporter(**kwargs)

Bases: kdvs.fw.Report.Reporter

This local reporter produces, for each Results instance, a single report that contains a list of ‘properly selected variables’, sorted by descending frequency (see the literature on the L1L2 implementation for technical details). It recognizes the following Results elements:

Since it focuses on variable selection (called ‘inner selection’ in KDVS terminology), it will produce a valid report only if inner selection was done with one of the following inner selectors:

This reporter accepts no specific parameters. It re-implements the initialize() method to get the em2annotation mapping, which must be provided in the additionalData dictionary. In the ‘experiment’ application, this reporter processes Results instances only for those subsets that were ‘selected’ in terms of ‘outer selection’, based on classification error.

specific_parameters = ()
global_parameters = ()
initialize(storageManager, subsets_results_location, additionalData)
Parameters :

storageManager : StorageManager

instance of storage manager that will govern the production of physical files

subsets_results_location : string

identifier of standard location used to store KDVS results

additionalData : object

any additional data used by the reporter; the instance must contain the em2annotation mapping

Raises :

Error :

if em2annotation mapping was not found in additionalData

produce(resultsIter)

Produce a single report with frequencies of variables, selection information, and all bioinformatic annotations, for each Results instance. Refer to the code comments for how the ‘selection’ of individual variables is reported.

Parameters :

resultsIter : iterable of Results

iterable of results obtained across a single category
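The ordering used in this report can be illustrated with a minimal sketch. It assumes the variable frequencies are already available as a plain dictionary; the function name, the example dictionary, and the alphabetical tie-break are hypothetical and not part of the KDVS API.

```python
def sort_by_descending_frequency(var_freqs):
    """Return (variable, frequency) pairs, highest frequency first."""
    # Negating the frequency sorts in descending order; the variable name
    # acts as a secondary key so that ties are ordered deterministically.
    return sorted(var_freqs.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical frequencies for three variables.
example_freqs = {"var_A": 0.4, "var_B": 1.0, "var_C": 0.75}
ranked = sort_by_descending_frequency(example_freqs)
```

Here `ranked` lists var_B first and var_A last; the actual report also attaches selection information and annotations to each row.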

class kdvs.fw.impl.report.L1L2.L1L2_VarCount_Reporter(**kwargs)

Bases: kdvs.fw.Report.Reporter

This local reporter produces counts of selected and non-selected variables across specific individual results. For each variable, it lists its count across all observed Results instances, along with all available bioinformatic annotations. It produces two reports (for selected and non-selected variables) for each degree of freedom (DOF) associated with the technique. For instance, if the technique has 3 associated DOFs:

  • D1, D2, D3

6 reports will be produced in total:

  • sel_D1, sel_D2, sel_D3
  • nsel_D1, nsel_D2, nsel_D3

(exact names may vary). Because of that, any technique that uses this reporter must properly separate raw results for each DOF (i.e. properly fill the DOF Results element; see individual techniques for details). It recognizes the following Results elements:

Since it focuses on variable selection (called ‘inner selection’ in KDVS terminology), it will produce a valid report only if inner selection was done with one of the following inner selectors:

This reporter accepts no specific parameters. It re-implements the initialize() method to get the em2annotation and technique2DOF mappings, which must be provided in the additionalData dictionary. The technique2DOF mapping is a technical mapping

  • {techniqueID : { ‘DOFS_IDXS’: (0, 1, ..., n), ‘DOFs’: (name_DOF0, name_DOF1, ..., name_DOFn)}}

that is so far produced only in the ‘experiment’ application. IMPORTANT: this reporter assumes that the input Results instances originate from a single category.
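As a rough illustration of how the DOF mapping relates to the six reports mentioned above, the sketch below builds report names from a technique2DOF-shaped dictionary. The technique ID "T1", the helper name, and the exact "sel_"/"nsel_" naming are assumptions for illustration; the real names may differ.

```python
# Hypothetical mapping with the shape described above.
technique2DOF = {
    "T1": {"DOFS_IDXS": (0, 1, 2), "DOFs": ("D1", "D2", "D3")},
}


def varcount_report_names(technique_id, technique2dof):
    """Return one 'selected' and one 'non-selected' report name per DOF."""
    dof_names = technique2dof[technique_id]["DOFs"]
    selected = ["sel_%s" % name for name in dof_names]
    not_selected = ["nsel_%s" % name for name in dof_names]
    return selected + not_selected


names = varcount_report_names("T1", technique2DOF)
```

With three DOFs this yields six names, matching the two-reports-per-DOF rule.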

specific_parameters = ()
global_parameters = ()
initialize(storageManager, subsets_results_location, additionalData)
Parameters :

storageManager : StorageManager

instance of storage manager that will govern the production of physical files

subsets_results_location : string

identifier of standard location used to store KDVS results

additionalData : object

any additional data used by the reporter; the instance must contain the em2annotation and technique2DOF mappings

Raises :

Error :

if em2annotation or technique2DOF mapping was not found in additionalData

produce(resultsIter)

For the input Results instances, identify their common technique (via the runtime standard Results element techID), along with its associated DOFs (via technique2DOF). For each DOF, count all selected and non-selected variables across all Results instances, and produce two report files (for selected and non-selected variables), in which a count is reported for each variable, along with all bioinformatic annotations.

Parameters :

resultsIter : iterable of Results

iterable of results obtained across a single category

Raises :

Error :

if more than one technique was detected across Results instances
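The per-DOF counting step can be sketched as follows. This is not the KDVS implementation: it assumes each result is a plain dictionary keyed by DOF name with hypothetical "sel" and "nsel" lists, whereas real Results instances expose this information through their own elements.

```python
from collections import Counter


def count_variables_per_dof(results, dofs):
    """Count how often each variable is selected / not selected, per DOF."""
    counts = {dof: {"sel": Counter(), "nsel": Counter()} for dof in dofs}
    for res in results:
        for dof in dofs:
            # Each occurrence of a variable in a list adds one to its count.
            counts[dof]["sel"].update(res[dof]["sel"])
            counts[dof]["nsel"].update(res[dof]["nsel"])
    return counts


# Two hypothetical results for a technique with a single DOF "D1".
results = [
    {"D1": {"sel": ["v1", "v2"], "nsel": ["v3"]}},
    {"D1": {"sel": ["v1"], "nsel": ["v2", "v3"]}},
]
counts = count_variables_per_dof(results, ["D1"])
```

Each `counts[dof]["sel"]` and `counts[dof]["nsel"]` would then feed one report file, with annotations added per variable.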

class kdvs.fw.impl.report.L1L2.L1L2_PKC_Reporter(**kwargs)

Bases: kdvs.fw.Report.Reporter

This local reporter produces reports that contain detailed information about ‘outer selection’, that is, the results of classification performed by a statistical technique on a given data subset (classification is possible because the data subset contains data points from two or more classes of samples); the data subset is in turn associated with a specific PriorKnowledgeConcept (PKC). The following details are reported for each PKC:

  • selection status (in ‘outer selection’ sense)
  • average error obtained on test splits
  • average error obtained on training splits
  • standard deviation of the error obtained on test splits
  • standard deviation of the error obtained on training splits
  • variance of the error obtained on test splits
  • median error obtained on test splits
  • total number of variables in the subset
  • total number of ‘properly selected’ variables (‘inner selection’ in KDVS sense)

It produces a single report for each degree of freedom (DOF) associated with the technique. For instance, if the technique has 3 associated DOFs:

  • D1, D2, D3

3 reports will be produced in total:

  • err_D1, err_D2, err_D3

(exact names may vary). The technique must expose the following properly filled Results elements:

  • RESULTS_SUBSET_ID_KEY
  • ‘Selection’->‘outer’
  • ‘Selection’->‘inner’
  • ‘Avg Error TS’
  • ‘Avg Error TR’
  • ‘Std Error TS’
  • ‘Std Error TR’
  • ‘Var Error TS’
  • ‘Med Error TS’

NOTE: all ‘Error’-like elements will be reported “as-is”.
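For orientation, the sketch below shows how a technique might fill the error-related elements from per-split errors. It uses population formulas for standard deviation and variance; which variant KDVS techniques actually use is not stated here, so treat that choice, the function name, and the input lists as assumptions.

```python
import statistics


def summarize_errors(test_errors, train_errors):
    """Build the error-related Results elements from per-split errors."""
    return {
        "Avg Error TS": statistics.mean(test_errors),
        "Avg Error TR": statistics.mean(train_errors),
        # Population standard deviation and variance (an assumption).
        "Std Error TS": statistics.pstdev(test_errors),
        "Std Error TR": statistics.pstdev(train_errors),
        "Var Error TS": statistics.pvariance(test_errors),
        "Med Error TS": statistics.median(test_errors),
    }


# Hypothetical errors on three test splits and two training splits.
summary = summarize_errors([0.25, 0.5, 0.75], [0.0, 0.25])
```

Because the reporter writes these values unchanged, any rounding or scaling has to happen in the technique itself.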

It will produce a valid report only if outer selection was done with the following:

and only if inner selection was done with one of the following:

This reporter accepts no specific parameters. It re-implements the initialize() method to get the following mappings:

  • subsets
  • pkcid2ssname
  • technique2DOF

which must be provided in the additionalData dictionary. Here, subsets is a technical mapping:

  • {PKC_ID : [subsetID, numpy.shape(ds), [vars], [samples]]}

so far produced only in the ‘experiment’ application. The pkcid2ssname mapping is a technical mapping:

  • {PKC_ID : subsetID}

so far produced only in the ‘experiment’ application. The technique2DOF mapping is a technical mapping:

  • {techniqueID : { ‘DOFS_IDXS’: (0, 1, ..., n), ‘DOFs’: (name_DOF0, name_DOF1, ..., name_DOFn)}}

so far produced only in the ‘experiment’ application. IMPORTANT: this reporter assumes that the input Results instances originate from a single category.

specific_parameters = ()
global_parameters = ()
initialize(storageManager, subsets_results_location, additionalData)
Parameters :

storageManager : StorageManager

instance of storage manager that will govern the production of physical files

subsets_results_location : string

identifier of standard location used to store KDVS results

additionalData : object

any additional data used by the reporter; the instance must contain the subsets, pkcid2ssname, and technique2DOF mappings

Raises :

Error :

if subsets, pkcid2ssname or technique2DOF mapping was not found in additionalData

produce(resultsIter)

For the input Results instances, identify their common technique (via the runtime standard Results element techID), along with its associated DOFs (via technique2DOF). For each DOF, scan all Results instances to gather all required information, and produce a single report file in which the relevant statistical information is listed for each PKC.

Parameters :

resultsIter : iterable of Results

iterable of results obtained across a single category

Raises :

Error :

if more than one technique was detected across Results instances
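The single-technique requirement can be pictured with a short sketch. It assumes each result is a plain dictionary carrying a "techID" entry and raises ValueError; the actual reporter reads the techID Results element and raises the KDVS Error type, and treating an empty input as an error is an additional assumption.

```python
def common_technique(results):
    """Return the single technique ID shared by all results."""
    tech_ids = {res["techID"] for res in results}
    if len(tech_ids) != 1:
        # Covers both "several techniques" and "no results at all".
        raise ValueError(
            "expected exactly one technique, found: %s" % sorted(tech_ids)
        )
    return tech_ids.pop()


# Hypothetical results that all come from technique "T1".
tech_id = common_technique([{"techID": "T1"}, {"techID": "T1"}])
```

Once the technique is known, its DOFs can be looked up in technique2DOF and one report produced per DOF.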

class kdvs.fw.impl.report.L1L2.L1L2_PKC_UTL_Reporter(**kwargs)

Bases: kdvs.fw.Report.Reporter

This global reporter produces a unified term list (UTL) for each combination of DOFs coming from the statistical techniques employed on the selected categorizer hierarchy. For instance, if technique T1 with DOFs

  • a1, a2, a3

was used in category A of categorizer C, and technique T2 with DOFs

  • b1, b2, b3

was used in category B of the same categorizer C, the following combinations will be generated:

  • a1_b1, a1_b2, a1_b3
  • a2_b1, a2_b2, a2_b3
  • a3_b1, a3_b2, a3_b3

and 9 UTLs will be produced in total. A UTL contains a series of details specific to prior knowledge concepts (PKCs), including:

  • selection status (in ‘outer selection’ sense)

  • identifier of associated data subset

  • full name of prior knowledge concept (as given by PKC manager)

  • total number of variables in data subset

  • total number of ‘properly selected’ variables (‘inner selection’ in KDVS sense)

  • ‘error estimate’, i.e. average error obtained on test splits

    (must be exposed by the technique as Results element Classification Error)

  • number of true positives

  • number of true negatives

  • number of false positives

  • number of false negatives

  • Matthews Correlation Coefficient

All this information is associated with the classification process performed by the statistical technique (classification is possible because the data subset contains data points from two or more classes of samples). The technique must expose the following properly filled Results elements:

  • ‘Selection’->‘outer’

  • ‘Selection’->‘inner’

  • ‘Classification Error’

    (produced for all DOFs separately, see individual techniques for details)

  • ‘CM MCC’

    (as a tuple with TP, TN, FP, FN, MCC values)
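The ‘CM MCC’ tuple can be illustrated with the standard definition of the Matthews Correlation Coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The helper names below are hypothetical, and returning 0.0 when the denominator is zero is a common convention assumed here rather than something the source specifies.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention for degenerate confusion matrices.
        return 0.0
    return (tp * tn - fp * fn) / denominator


def cm_mcc(tp, tn, fp, fn):
    """Pack counts and MCC in the (TP, TN, FP, FN, MCC) order listed above."""
    return (tp, tn, fp, fn, matthews_corrcoef(tp, tn, fp, fn))


# Hypothetical confusion-matrix counts for one data subset.
element = cm_mcc(6, 3, 1, 2)
```

A perfect classifier gives MCC = 1, and the value stays within [-1, 1] for any non-degenerate confusion matrix.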

It will produce a valid report only if outer selection was done with the following:

and only if inner selection was done with one of the following:

This reporter accepts no specific parameters. It re-implements the initialize() method to get the following mappings/instances:

  • subsets
  • pkcid2ssname
  • technique2DOF
  • operations_map_img
  • categories_map
  • cchain
  • submission_order
  • pkc_manager

which must be provided in the additionalData dictionary. The following mappings/instances are so far produced only inside the ‘experiment’ application:

  • subsets

    {PKC_ID : [subsetID, numpy.shape(ds), [vars], [samples]]}

  • pkcid2ssname

    {PKC_ID : subsetID}

  • technique2DOF

    {techniqueID : { ‘DOFS_IDXS’: (0, 1, ..., n), ‘DOFs’: (name_DOF0, name_DOF1, ..., name_DOFn)}}

  • operations_map_img

    (textual representation of internal mapping operations_map)

  • categories_map

    {categorizerID : [categories]}

  • cchain

    i.e. the categorizers chain, which comes directly from the MA_GO_PROFILE application profile (element ‘subset_hierarchy_categorizers_chain’)

  • submission_order

    an iterable of PKC IDs sorted in order of submission of their jobs

  • pkc_manager

    a concrete instance of PKCManager that governs all PKCs generated

IMPORTANT: this reporter assumes that, within each category of the considered categorizer, all Results instances originate from a single statistical technique. See the code comments for details of the algorithm.

specific_parameters = ()
global_parameters = ()
initialize(storageManager, subsets_results_location, additionalData)
Parameters :

storageManager : StorageManager

instance of storage manager that will govern the production of physical files

subsets_results_location : string

identifier of standard location used to store KDVS results

additionalData : object

any additional data used by the reporter; the instance must contain the following mappings/instances: subsets, pkcid2ssname, technique2DOF, operations_map_img, categories_map, cchain, submission_order, pkc_manager

Raises :

Error :

if any of the following was not found in additionalData: subsets, pkcid2ssname, technique2DOF, operations_map_img, categories_map, cchain, submission_order, pkc_manager
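The check described under "Raises" can be sketched as below. The Error class is a local stand-in so that the example runs on its own, and the helper name is hypothetical; the real reporter raises the KDVS error type from inside initialize().

```python
class Error(Exception):
    """Stand-in for the KDVS error type (assumption for this sketch)."""


REQUIRED_KEYS = (
    "subsets", "pkcid2ssname", "technique2DOF", "operations_map_img",
    "categories_map", "cchain", "submission_order", "pkc_manager",
)


def check_additional_data(additional_data, required_keys=REQUIRED_KEYS):
    """Return the required entries, or raise Error naming the missing ones."""
    missing = [key for key in required_keys if key not in additional_data]
    if missing:
        raise Error("not found in additionalData: %s" % ", ".join(missing))
    return {key: additional_data[key] for key in required_keys}
```

Reporting all missing keys at once, rather than stopping at the first, is a design choice of this sketch that makes misconfigured runs easier to diagnose.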

produceForHierarchy(subsetHierarchy, ssIndResults, currentCategorizerID, currentCategoryID)

Starting from the current categorizer, get the one below it in the chain (if possible), along with all its categories. For each category, make sure that all relevant Results instances come from the same technique, and identify that technique along with its associated DOFs. Given the series of DOFs for the categories, combine them as explained above, and for each combination, scan the associated Results instances, gather the requested information, and produce a single report file. See the code comments for more technical details.

Parameters :

subsetHierarchy : SubsetHierarchy

concrete current hierarchy of subsets that contains the whole category tree

ssIndResults : dict of iterable of Results

iterables of Results obtained for all categories at once

currentCategorizerID : string

identifier of Categorizer from which the reporter will start work

currentCategoryID : string

optionally, identifier of category the reporter shall start with

Raises :

Error :

if more than one technique was detected across any category
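The DOF combinations from the class description (a1_b1 through a3_b3) form a Cartesian product of the per-category DOF lists, which can be sketched with itertools.product. The function name and the underscore-joined labels are assumptions for illustration; the order of categories is taken as given.

```python
from itertools import product


def dof_combinations(dofs_per_category):
    """Return one label per combination of DOFs, one DOF per category."""
    return ["_".join(combo) for combo in product(*dofs_per_category)]


# DOFs of technique T1 (category A) and technique T2 (category B).
combos = dof_combinations([("a1", "a2", "a3"), ("b1", "b2", "b3")])
```

With three DOFs in each of two categories this gives nine labels, one per UTL; adding a third category would multiply the count by its number of DOFs.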
