Provides concrete implementations of reporters.
These reporters are associated with statistical techniques that use the l1l2py library and related functionality such as selectors. Refer to the individual reporters for more details. IMPORTANT NOTE: these reporters were designed for the ‘experiment’ application, so many details are hard-coded.
Bases: kdvs.fw.Report.Reporter
This local reporter produces, for each Results instance, a single report containing the list of ‘properly selected variables’, sorted by descending frequency (see the literature on the L1L2 implementation for technical details). It recognizes the following Results elements:
- ‘RESULTS_SUBSET_ID_KEY’
- ‘Selection’->‘inner’
Since it focuses on variable selection (called ‘inner selection’ in KDVS terminology), it will produce a valid report only if inner selection was done with one of the following inner selectors:
This reporter accepts no specific parameters. It re-implements the initialize() method to get the em2annotation mapping, which must be provided in the additionalData dictionary. In the ‘experiment’ application, this reporter processes Results instances only for those subsets that were ‘selected’ in terms of ‘outer selection’, based on classification error.
See also
Parameters:
- storageManager : StorageManager
- subsets_results_location : string
- additionalData : object
Raises:
- Error
Produce a single report with variable frequencies, selection information, and all bioinformatic annotations for each Results instance. Refer to the comments for how ‘selection’ of individual variables is reported.
Parameters:
- resultsIter : iterable of Results
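The ordering by descending frequency described for this reporter can be illustrated with a minimal sketch. The function name `sort_by_frequency`, the variable names, and the shape of the input (a plain dict of variable name to frequency) are assumptions made for illustration; they are not part of the KDVS API, and the actual reporter may order ties differently.

```python
def sort_by_frequency(frequencies):
    """Return (variable, frequency) pairs, highest frequency first."""
    # Ties are broken by variable name so the output is deterministic
    # (an assumption of this sketch, not documented KDVS behaviour).
    return sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))


# Made-up frequencies for three hypothetical variables.
freqs = {"var_a": 0.40, "var_b": 0.95, "var_c": 0.40}
ranked = sort_by_frequency(freqs)
```

Here `ranked` lists `var_b` first, followed by the two tied variables in name order.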
Bases: kdvs.fw.Report.Reporter
This local reporter produces counts of selected and non-selected variables across individual results. For each variable, it lists its count across all observed Results instances, together with all available bioinformatic annotations. It produces two reports (one for selected and one for non-selected variables) for each degree of freedom (DOF) associated with the technique. For instance, if a technique has 3 associated DOFs:
- D1, D2, D3
6 reports will be produced in total:
- sel_D1, sel_D2, sel_D3
- nsel_D1, nsel_D2, nsel_D3
(exact names may vary). Consequently, any technique that uses this reporter must keep raw results separate for each DOF (i.e. properly fill the DOF Results element; see individual techniques for details). It recognizes the following Results elements:
- ‘RESULTS_RUNTIME_KEY’->‘techID’
- ‘Selection’->‘inner’
Since it focuses on variable selection (called ‘inner selection’ in KDVS terminology), it will produce a valid report only if inner selection was done with one of the following inner selectors:
This reporter accepts no specific parameters. It re-implements the initialize() method to get the em2annotation and technique2DOF mappings, which must be provided in the additionalData dictionary. ‘Technique2DOF’ is a technical mapping
- {techniqueID : { ‘DOFS_IDXS’: (0, 1, ..., n), ‘DOFs’: (name_DOF0, name_DOF1, ..., name_DOFn)}}
that is so far produced only in the ‘experiment’ application. IMPORTANT: this reporter assumes that the input Results instances originate from a single category.
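As a rough illustration of how the technique2DOF mapping relates to the per-DOF report names, the sketch below builds a mapping with the documented shape and derives names from it. The technique ID `tech_example`, the helper `report_names`, and the `sel`/`nsel` prefixes follow the example in the text but are assumptions; as noted above, the exact names produced by the reporter may vary.

```python
# A mapping with the documented shape; the technique ID is hypothetical.
technique2DOF = {
    "tech_example": {
        "DOFS_IDXS": (0, 1, 2),
        "DOFs": ("D1", "D2", "D3"),
    },
}


def report_names(tech_id, mapping, prefixes=("sel", "nsel")):
    """Build one report name per (prefix, DOF) pair for the given technique."""
    dofs = mapping[tech_id]["DOFs"]
    return [f"{prefix}_{dof}" for prefix in prefixes for dof in dofs]


names = report_names("tech_example", technique2DOF)
```

With 3 DOFs and two prefixes, `names` holds 6 entries, matching the count given in the example above.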
See also
Parameters:
- storageManager : StorageManager
- subsets_results_location : string
- additionalData : object
Raises:
- Error
For the input Results instances, identify their common technique (via the standard runtime Results element techID), along with its associated DOFs (via technique2DOF). For each DOF, count all selected and non-selected variables across all Results instances, and produce two report files (one for selected and one for non-selected variables) in which each variable is listed with its count and all bioinformatic annotations.
Parameters:
- resultsIter : iterable of Results
Raises:
- Error
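The counting step described above can be sketched as follows. This is only an illustration: the real Results class and the layout of its ‘Selection’ element are not shown here, so each result is modelled as a plain dict of the hypothetical form `{dof: {"selected": [...], "not_selected": [...]}}`, and `count_selection` is a made-up helper name.

```python
from collections import Counter


def count_selection(results, dofs):
    """Count, per DOF, how often each variable was selected or not selected."""
    counts = {dof: {"sel": Counter(), "nsel": Counter()} for dof in dofs}
    for result in results:
        for dof in dofs:
            # Each result is assumed to keep its raw selection separate per DOF.
            counts[dof]["sel"].update(result[dof]["selected"])
            counts[dof]["nsel"].update(result[dof]["not_selected"])
    return counts


# Two made-up results for a technique with a single DOF "D1".
results = [
    {"D1": {"selected": ["v1", "v2"], "not_selected": ["v3"]}},
    {"D1": {"selected": ["v1"], "not_selected": ["v2", "v3"]}},
]
counts = count_selection(results, ["D1"])
```

Each `Counter` then corresponds to the content of one report file (selected or non-selected) for one DOF, before annotations are attached.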
Bases: kdvs.fw.Report.Reporter
This local reporter produces reports with detailed information about ‘outer selection’, that is, the results of classification performed by a statistical technique on a given data subset. Classification is possible because the data subset contains data points from two or more classes of samples; each data subset is in turn associated with a specific PriorKnowledgeConcept (PKC). The following details are reported for each PKC:
- selection status (in ‘outer selection’ sense)
- average error obtained on test splits
- average error obtained on training splits
- standard deviation of the error obtained on test splits
- standard deviation of the error obtained on training splits
- variance of the error obtained on test splits
- median error obtained on test splits
- total number of variables in the subset
- total number of ‘properly selected’ variables (‘inner selection’ in KDVS sense)
It produces a single report for each degree of freedom (DOF) associated with the technique. For instance, if a technique has 3 associated DOFs:
- D1, D2, D3
3 reports will be produced in total:
- err_D1, err_D2, err_D3
(exact names may vary). The technique must expose the following properly filled Results elements:
- ‘RESULTS_SUBSET_ID_KEY’
- ‘Selection’->‘outer’
- ‘Selection’->‘inner’
- ‘Avg Error TS’
- ‘Avg Error TR’
- ‘Std Error TS’
- ‘Std Error TR’
- ‘Var Error TS’
- ‘Med Error TS’
NOTE: all ‘Error’-like elements will be reported “as-is”.
It will produce a valid report only if outer selection was done with the following:
and only if inner selection was done with one of the following:
This reporter accepts no specific parameters. It re-implements the initialize() method to get the following mappings:
- subsets
- pkcid2ssname
- technique2DOF
that must be provided in the additionalData dictionary. The subsets mapping is a technical mapping:
- {PKC_ID : [subsetID, numpy.shape(ds), [vars], [samples]]}
so far produced only in the ‘experiment’ application. The pkcid2ssname mapping is a technical mapping:
- {PKC_ID : subsetID}
so far produced only in the ‘experiment’ application. Technique2DOF is a technical mapping:
- {techniqueID : { ‘DOFS_IDXS’: (0, 1, ..., n), ‘DOFs’: (name_DOF0, name_DOF1, ..., name_DOFn)}}
so far produced only in the ‘experiment’ application. IMPORTANT: this reporter assumes that the input Results instances originate from a single category.
Parameters:
- storageManager : StorageManager
- subsets_results_location : string
- additionalData : object
Raises:
- Error
For the input Results instances, identify their common technique (via the standard runtime Results element techID), along with its associated DOFs (via technique2DOF). For each DOF, scan all Results instances to gather the required information, and produce a single report file that lists the relevant statistical information for each PKC.
Parameters:
- resultsIter : iterable of Results
Raises:
- Error
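To make the roles of the subsets and pkcid2ssname mappings more concrete, the sketch below assembles one report row per PKC from dict-like results. It is a simplified illustration under several assumptions: results are plain dicts, the key `subset_id` stands in for the value of RESULTS_SUBSET_ID_KEY (which is not given here), only a single DOF is handled, and the ‘Selection’ elements are left out because their layout is not described in this section. The helper name `build_error_rows` is hypothetical.

```python
# Error-like Results elements named in the text above; values are copied as-is.
ERROR_KEYS = (
    "Avg Error TS", "Avg Error TR",
    "Std Error TS", "Std Error TR",
    "Var Error TS", "Med Error TS",
)


def build_error_rows(results, subsets, pkcid2ssname, subset_key="subset_id"):
    """Return one row per PKC: (PKC ID, number of variables, error values...)."""
    # Invert {PKC_ID: subsetID} so a result can be traced back to its PKC.
    ssname2pkcid = {ssname: pkcid for pkcid, ssname in pkcid2ssname.items()}
    rows = []
    for result in results:
        pkc_id = ssname2pkcid[result[subset_key]]
        # subsets[PKC_ID] is [subsetID, shape, [vars], [samples]].
        n_vars = len(subsets[pkc_id][2])
        rows.append((pkc_id, n_vars) + tuple(result[key] for key in ERROR_KEYS))
    return rows


# Made-up example data for one PKC.
subsets = {"PKC_1": ["ss_1", (2, 3), ["v1", "v2"], ["s1", "s2", "s3"]]}
pkcid2ssname = {"PKC_1": "ss_1"}
results = [{
    "subset_id": "ss_1",
    "Avg Error TS": 0.2, "Avg Error TR": 0.1,
    "Std Error TS": 0.05, "Std Error TR": 0.02,
    "Var Error TS": 0.0025, "Med Error TS": 0.2,
}]
rows = build_error_rows(results, subsets, pkcid2ssname)
```

The error values are passed through unchanged, in line with the note that ‘Error’-like elements are reported “as-is”.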
Bases: kdvs.fw.Report.Reporter
This global reporter produces a unified term list (UTL) for each combination of DOFs coming from the statistical techniques employed on the selected categorizer hierarchy. For instance, if technique T1 with DOFs
- a1, a2, a3
was used in category A of categorizer C, and technique T2 with DOFs
- b1, b2, b3
was used in category B of the same categorizer C, the following combinations will be generated:
- a1_b1, a1_b2, a1_b3
- a2_b1, a2_b2, a2_b3
- a3_b1, a3_b2, a3_b3
and in total 9 UTLs will be produced. A UTL contains information specific to prior knowledge concepts (PKCs), including:
- selection status (in ‘outer selection’ sense)
- identifier of associated data subset
- full name of prior knowledge concept (as given by PKC manager)
- total number of variables in data subset
- total number of ‘properly selected’ variables (‘inner selection’ in KDVS sense)
- ‘error estimate’, i.e. average error obtained on test splits (must be exposed by the technique as the Results element ‘Classification Error’)
- number of true positives
- number of true negatives
- number of false positives
- number of false negatives
- Matthews Correlation Coefficient
All this information is associated with the classification process performed by the statistical technique (classification is possible since the data subset contains data points from two or more classes of samples). The technique must expose the following properly filled Results elements:
- ‘Selection’->‘outer’
- ‘Selection’->‘inner’
- ‘Classification Error’ (produced for all DOFs separately, see individual techniques for details)
- ‘CM MCC’ (as tuple with TP, TN, FP, FN, MCC values)
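For reference, the Matthews Correlation Coefficient in the ‘CM MCC’ tuple can be computed from the four confusion-matrix counts with the standard formula. The sketch below uses that textbook definition; the function name `mcc` and the choice to return 0.0 when the denominator is zero are assumptions of this example, and the way KDVS techniques actually compute the value is not shown in this section.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Made-up counts, packed in the tuple order described above.
tp, tn, fp, fn = 6, 3, 1, 2
cm_mcc = (tp, tn, fp, fn, mcc(tp, tn, fp, fn))
```

The coefficient ranges from -1 (total disagreement) to 1 (perfect classification), with 0 indicating performance no better than chance.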
It will produce a valid report only if outer selection was done with the following:
and only if inner selection was done with one of the following:
This reporter accepts no specific parameters. It re-implements the initialize() method to get the following mappings/instances:
- subsets
- pkcid2ssname
- technique2DOF
- operations_map_img
- categories_map
- cchain
- submission_order
- pkc_manager
that must be provided in the additionalData dictionary. The following mappings/instances are so far produced only inside the ‘experiment’ application:
- subsets: {PKC_ID : [subsetID, numpy.shape(ds), [vars], [samples]]}
- pkcid2ssname: {PKC_ID : subsetID}
- technique2DOF: {techniqueID : { ‘DOFS_IDXS’: (0, 1, ..., n), ‘DOFs’: (name_DOF0, name_DOF1, ..., name_DOFn)}}
- operations_map_img: textual representation of the internal mapping operations_map
- categories_map: {categorizerID : [categories]}
- cchain: the categorizers chain, taken directly from the MA_GO_PROFILE application profile (element ‘subset_hierarchy_categorizers_chain’)
- submission_order: an iterable of PKC IDs sorted in order of submission of their jobs
- pkc_manager: a concrete instance of PKCManager that governs all PKCs generated
IMPORTANT: this reporter assumes that, within each category of the considered categorizer, all Results instances originate from a single statistical technique. See the comments for details of the algorithm.
Parameters:
- storageManager : StorageManager
- subsets_results_location : string
- additionalData : object
Raises:
- Error
Given the current categorizer, get the next one below it in the chain (if possible), along with all its categories. For each category, make sure that all relevant Results instances come from the same technique, and identify that technique along with its associated DOFs. Given the series of DOFs for the categories, combine them as explained above (one DOF per category), and for each combination, scan the associated Results instances, gather the requested information, and produce a single report file. See the comments for more technical details.
Parameters:
- subsetHierarchy : SubsetHierarchy
- ssIndResults : dict of iterable of Results
- currentCategorizerID : string
- currentCategoryID : string
Raises:
- Error
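The DOF combinations described for this reporter (a1_b1 through a3_b3 in the earlier example) amount to a Cartesian product that takes one DOF from each category. A minimal sketch, assuming the DOFs of each category are already known and that names are joined with an underscore as in the example; the dictionary `dofs_per_category` is a hypothetical structure used only here:

```python
from itertools import product

# DOFs per category, following the example with techniques T1 and T2.
dofs_per_category = {
    "A": ("a1", "a2", "a3"),
    "B": ("b1", "b2", "b3"),
}

# One combination takes exactly one DOF from each category, in category order.
combinations = ["_".join(combo) for combo in product(*dofs_per_category.values())]
```

With two categories of 3 DOFs each, `combinations` holds 9 names, one per UTL.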
See also