Provides abstract framework classes that expose KDVS API, as well as some low–level concrete functionalities.
Provides abstract functionality for handling annotations.
Database table template for storing mapping between prior knowledge concepts and measurements (PKC->M). It defines the name ‘pkc2em’ and columns ‘pkc_id’, ‘em_id’, ‘pkc_data’. The ID column ‘pkc_id’ is also indexed. This general table utilizes multifield ‘pkc_data’ of the following format:
- ‘feature1=value1,...,featureN=valueN’,
where ‘feature’ is a property specific to the PK source, and ‘value’ is the value of this property for the specified PKC. Typical features of PK may be: PKC name, PKC description, higher-level grouping of PKCs (e.g. into domains), etc. NOTE: it is advisable to create and use more specific tables tailored to individual PK sources, due to the limited querying power of this table and the potentially high computational cost of parsing the multifield.
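Unpacking the multifield takes only a few lines of Python. A minimal sketch, assuming ‘,’ separates features and ‘=’ separates a feature from its value; the `parse_pkc_data` helper is illustrative, not part of the KDVS API:

```python
# Hypothetical helper: unpack the 'pkc_data' multifield into a dict.
# Assumes ',' separates features and '=' separates feature from value.
def parse_pkc_data(multifield):
    features = {}
    for part in multifield.split(','):
        if not part:
            continue
        feature, _, value = part.partition('=')  # split on the first '=' only
        features[feature] = value
    return features

record = 'name=apoptosis,domain=biological_process'
print(parse_pkc_data(record))
```

Any real parser should use the configured multifield separator rather than a hard-coded comma.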
Database table template that provides annotations for measurements. This template is tailored specifically for Gene Ontology. It defines the name ‘em2annotation’ and columns ‘em_id’, ‘gene_symbol’, ‘repr_id’, ‘gb_acc’, ‘entrez_gene_id’, ‘ensembl_id’, ‘refseq_id’. The ID column ‘em_id’ is also indexed. This table contains a selected set of annotations for a given measurement (i.e. probeset for microarray, gene for RNA-Seq, etc.). The selection is arbitrary and may not reflect all needs of the user; in that case it is advisable to use a different, more specific table.
Default separator for the multifield ‘pkc_data’ used in generic annotation database template.
Obtain the dictionary with mapping between measurements and annotations, stored in specified DBTable instance.
Parameters:
- em2annotation_dt : DBTable

Returns:
- em2a : collections.defaultdict
Provides abstract functionality of applications built on KDVS API. Each concrete application class must be derived from App class.
Bases: object
Abstract KDVS application.
By default, the constructor calls ‘self.prepareEnv’.
Must be implemented in a subclass. The implementation MUST assign a fully configured concrete ExecutionEnvironment instance to self.env in order for the application to be runnable from within KDVS in the normal way. However, if one wants greater control over application behavior, the ‘run’ method must be re-implemented as well. See ‘run’ for more details.
By default it does nothing.
By default it does nothing.
By default it does nothing.
By default it performs the following sequence of calls: self.appProlog, self.prepare, self.env.execute, self.appEpilog.
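The default call sequence can be illustrated with a standalone mimic. `MiniApp` and `FakeEnv` below are illustrative stand-ins, not the real KDVS classes; they only record the order of calls:

```python
# Standalone mimic of the App lifecycle: run() chains appProlog, prepare,
# env.execute, appEpilog, and the constructor calls prepareEnv by default.
class FakeEnv:
    def __init__(self, trace):
        self.trace = trace
    def execute(self):
        self.trace.append('env.execute')

class MiniApp:
    def __init__(self):
        self.trace = []
        self.prepareEnv()                 # constructor calls prepareEnv
    def prepareEnv(self):
        self.env = FakeEnv(self.trace)    # subclass MUST assign self.env
    def appProlog(self):
        self.trace.append('appProlog')
    def prepare(self):
        self.trace.append('prepare')
    def appEpilog(self):
        self.trace.append('appEpilog')
    def run(self):
        self.appProlog()
        self.prepare()
        self.env.execute()
        self.appEpilog()

app = MiniApp()
app.run()
print(app.trace)   # ['appProlog', 'prepare', 'env.execute', 'appEpilog']
```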
Bases: kdvs.core.util.Configurable
Abstract class for application profile. Concrete KDVS application uses specialized profile for configuration; it reads the profile and verifies if all the configuration elements are present and valid. See profile for ‘experiment’ application in ‘example_experiment’ directory for the complete example.
Parameters:
- expected_cfg : dict
- actual_cfg : dict

Raises:
- Error
- Error
Provides base functionality for categorizers and orderers. Categorizers can divide data subsets into useful categories that can be nested and resemble a tree. This could be useful for assigning selected statistical techniques to specific data subsets only. Orderers control the order in which data subsets are being processed; each category can have its own orderer.
Informs KDVS that data subset could not be categorized, for whatever reason. Used in concrete derivations of Categorizer.
Standard way to present category stemming from categorizer, as follows: C[“categorizer_name”]->c[“category_name”].
Bases: object
Base class for categorizers. A categorizer must be supplied with a dictionary of functions that categorize given subsets. Each function accepts a DataSet instance and outputs, as a string, either the chosen category name or NOTCATEGORIZED. The dictionary maps category names to categorization functions. One must be careful to assign only one category to a single data subset; otherwise, the subset will be permanently NOTCATEGORIZED without warning.
Parameters:
- IDstr : string
- categorizeFuncTable : dict(string->callable)

Raises:
- Error
- Error
- Error
Returns all categories that this categorizer handles. Essentially, returns keys from categorization function table.
Categorizes the given data subset by running all categorization functions on it, collecting the categories, and checking their uniqueness. If exactly one category is recognized, it is returned; otherwise NOTCATEGORIZED is returned.
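The uniqueness rule can be sketched in plain Python. The `NOTCATEGORIZED` placeholder and `categorize` function below are illustrative stand-ins for the KDVS implementations:

```python
# Standalone sketch of the categorization rule: run every function, collect
# the non-NOTCATEGORIZED answers, and require exactly one unique category.
NOTCATEGORIZED = '__notcategorized__'   # placeholder constant for this sketch

def categorize(func_table, subset):
    hits = set()
    for func in func_table.values():
        category = func(subset)
        if category != NOTCATEGORIZED:
            hits.add(category)
    # exactly one category must be recognized, otherwise NOTCATEGORIZED
    if len(hits) == 1:
        return hits.pop()
    return NOTCATEGORIZED

funcs = {
    'small': lambda ss: 'small' if len(ss) <= 3 else NOTCATEGORIZED,
    'large': lambda ss: 'large' if len(ss) > 3 else NOTCATEGORIZED,
}
print(categorize(funcs, [1, 2]))          # 'small'
print(categorize(funcs, [1, 2, 3, 4]))    # 'large'
```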
Parameters:
- dataset_inst : DataSet

Returns:
- category : string
Given category name, makes it unique by binding it to categorizer name. It uses format specified in global variable UC.
Parameters:
- category : string

Returns:
- uniquified_category : string
Reverse the effect of ‘uniquifying’ the category name. Returns tuple (categorizer_name, category_name).
Parameters:
- uniquified_category : string

Returns:
- uniquified_components : tuple
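The round trip between a plain category name and its uniquified form can be sketched as follows. The `uniquify`/`deuniquify` helpers are illustrative re-implementations of the UC scheme, not the KDVS functions themselves:

```python
# Sketch of the UC naming scheme: C["categorizer"]->c["category"],
# plus the reverse parse.
import re

UC = 'C["%s"]->c["%s"]'

def uniquify(categorizer_name, category):
    return UC % (categorizer_name, category)

def deuniquify(uniquified):
    match = re.match(r'C\["(.+)"\]->c\["(.+)"\]$', uniquified)
    return (match.group(1), match.group(2))

uc = uniquify('size', 'small')
print(uc)               # C["size"]->c["small"]
print(deuniquify(uc))   # ('size', 'small')
```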
Bases: object
Base class for orderers. In general, orderer accepts an iterable of data subset IDs, reorders it as it sees fit, and presents it through its API.
Must be implemented in subclass. The implementation MUST assign reordered iterable to self.ordering.
Returns the ordering built by this orderer.
Returns:
- ordering : iterable

Raises:
- Error
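The orderer contract described above can be sketched with a standalone mimic. `MiniOrderer` and `AlphabeticalOrderer` are illustrative, not the real KDVS classes:

```python
# Standalone sketch of the Orderer contract: build() must assign the
# reordered iterable to self.ordering; order() then exposes it.
class MiniOrderer:
    def __init__(self):
        self.ordering = None
    def build(self, subset_ids):
        raise NotImplementedError   # must be implemented in subclass
    def order(self):
        if self.ordering is None:
            raise RuntimeError('build() has not been called')
        return self.ordering

class AlphabeticalOrderer(MiniOrderer):
    def build(self, subset_ids):
        self.ordering = sorted(subset_ids)

orderer = AlphabeticalOrderer()
orderer.build(['GO:0003', 'GO:0001', 'GO:0002'])
print(orderer.order())   # ['GO:0001', 'GO:0002', 'GO:0003']
```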
Provides unified wrapper over results from queries performed on underlying database tables controlled by KDVS.
Bases: object
Wrapper class over query results. Before using this class, a correct SQL query must be issued and a concrete Cursor instance obtained. Typically, all of this is performed inside DBTable, but the class is exposed for those who want greater control over querying. Essentially, this class wraps the Cursor instance and controls the fetching process.

Parameters:
- dbtable : DBTable
- cursor : Cursor
- rowbufsize : integer
Generator that yields fetched results one by one. Underlying fetching is buffered. Results are returned as–is; the parsing is left to the user.
Returns:
- result_row : iterable

Raises:
- Error

See also: Cursor.fetchmany, OperationalError
Returns all fetched results at once, wrapped in desired structure (list or dictionary). NOTE: depending on the query itself, it may consume a lot of memory.
Parameters:
- as_dict : boolean
- dict_on_rows : boolean

Returns:
- result : list/dict

Raises:
- Error

See also: Cursor.fetchall, OperationalError
Closes the wrapped Cursor instance and frees all allocated resources. Shall always be called when the DBResult object is no longer needed.
Provides simple wrapper over database table to act like a dictionary that can hold any Python object, essentially a shelve with database backend.
Instance of default DBTemplate used to construct underlying database table that serves under DBShelve. It defines the name ‘shelve’ and columns ‘key’, ‘value’. The ID column ‘key’ is indexed.
Bases: _abcoll.MutableMapping
Class that exposes dictionary behavior of database table that can hold any Python object. By default, it governs database table created according to DBTemplate template DBSHELVE_TMPL.
Parameters:
- dbm : DBManager
- db_key : string
- protocol : integer/None
See also: dict.close
See also: dict.view
Provides low–level functionality of the management of database table under KDVS DB manager. Also provides simple wrapper over templated database tables.
Bases: object
Low-level wrapper over a database table managed by the KDVS DB manager. KDVS uses database tables to manage query-intensive information, such as the robust generation of data subsets from a single main input data set. The wrapper encapsulates basic functionality, including table creation, table filling from a specified generator function, querying with conditions over columns and rows (in the case where the first column holds row IDs), generation of an associated numpy.ndarray object (if possible), as well as basic counting routines.
Parameters:
- dbm : DBManager
- db_key : string
- columns : list/tuple of strings
- name : string/None
- id_col : string/None

Raises:
- Error
- Error
- Error
Physically create the table in underlying RDBMS; the creation is deferred until this call. The table is created empty.
Parameters:
- indexed_columns : list/tuple/’*’
- debug : boolean

Returns:
- statements : list of strings/None

Raises:
- Error
Fill the already created table with some data, coming from specified generator callable.
Parameters:
- content : generator callable
- debug : boolean

Returns:
- statements : list of strings/None

Raises:
- Error
Query the table under the specified conditions and return the corresponding Cursor instance; the Cursor may be used immediately in a straightforward manner, or may be wrapped in a DBResult instance.
Parameters:
- columns : list/tuple/’*’
- rows : list/tuple/’*’
- filter_clause : string/None
- debug : boolean

Returns:
- cs/statements : Cursor/list of strings

Raises:
- Error
Convenience wrapper that performs a query under specified conditions, wraps the resulting Cursor into a DBResult instance, and gets ALL the results wrapped into the desired data structure, as per DBResult.getAll.
Parameters:
- columns : list/tuple/’*’
- rows : list/tuple/’*’
- filter_clause : string/None
- as_dict : boolean
- dict_on_rows : boolean
- debug : boolean

Returns:
- results/statements : list/dict / list of strings
Convenience wrapper that performs a query under specified conditions and builds a corresponding numpy.ndarray object that contains the queried data. Uses the numpy.loadtxt() function for building the numpy.ndarray instance. If the resulting ndarray is one-dimensional (i.e. of shape (p,)), it is reshaped into a single-row matrix (i.e. of shape (1,p)).
Parameters:
- columns : list/tuple/’*’
- rows : list/tuple/’*’
- filter_clause : string/None
- remove_id_col : boolean
- debug : boolean

Returns:
- mat/statements : numpy.ndarray/list of strings

Raises:
- Error
- Error
- Error
- Error
Counts the number of rows in the table. The table must be filled to obtain a count > 0. Counting is performed with the standard SQL function ‘count’ in the underlying RDBMS.
Returns:
- count : integer/None

Raises:
- Error
- Error
Get the content of the designated ID column and return it as a list of values.
Returns:
- IDs : list of strings

Raises:
- Error
- Error
Returns True if the table has been physically created, False otherwise.
Returns True if the table is empty, False otherwise.
Raises:
- Error
Create an instance of DBTable based on specified DBTemplate instance.
Parameters:
- dbm : DBManager
- db_key : string
- template : DBTemplate

Returns:
- dbtable : DBTable

Raises:
- Error
Recognized keys used in DBTemplate wrapper object.
Bases: object
The template object that contains simplified directives how to build a database table. It is essentially a wrapper over a dictionary that contains the following elements:
- ‘name’ – specifies the physical name of the table for underlying RDBMS,
- ‘columns’ – non–empty list/tuple of column names of standard type (the type is taken from getTextColumnType() method of the underlying DB provider),
- ‘id_column’ – name of the column designated to be an ID column for that table,
- ‘indexes’ – list/tuple of column names to be indexed by underlying RDBMS, or string ‘*’ for indexing all columns.
Parameters:
- in_dict : dict

Raises:
- Error
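The four directives above can be illustrated with a plain dictionary and a minimal validation sketch. The key names follow the description above, but the `validate_template` helper is an illustrative assumption; the real DBTemplate class may perform different or stricter checks:

```python
# Example of the four template directives as a plain dict, plus a minimal
# (hypothetical) validation sketch.
EXAMPLE_TEMPLATE = {
    'name': 'em2annotation',
    'columns': ('em_id', 'gene_symbol', 'gb_acc'),
    'id_column': 'em_id',
    'indexes': ('em_id',),   # or '*' to index every column
}

def validate_template(tmpl):
    required = {'name', 'columns', 'id_column', 'indexes'}
    if not required.issubset(tmpl):
        raise ValueError('missing template keys: %s' % (required - set(tmpl)))
    if not tmpl['columns']:
        raise ValueError("'columns' must be non-empty")
    if tmpl['id_column'] not in tmpl['columns']:
        raise ValueError("'id_column' must be one of 'columns'")

validate_template(EXAMPLE_TEMPLATE)   # passes silently
```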
Provides low–level wrapper over tables that hold delimiter separated values (DSV). Such tables are referred to as DSV tables.
Default ID column for DSV table.
Bases: kdvs.fw.DBTable.DBTable
Create an instance of DBTable and immediately wrap it into DSV table. DSV table manages additional details such as initialization from associated DSV file and handling underlying DSV dialect.
Parameters:
- dbm : DBManager
- db_key : string
- filehandle : file–like
- dtname : string/None
- delimiter : string/None
- comment : string/None
- header : list/tuple of string / None
- make_missing_ID_column : boolean

Raises:
- Error
- Error
- Error
- Error
- Error
Open specified file and return its handle. Uses fileProvider() to transparently open and read compressed files; additional arguments are passed to file provider directly.
Parameters:
- file_path : string
- args : iterable
- kwargs : dict

Returns:
- handle : file–like

Raises:
- Error
Build instance of CommentSkipper for this DSV table using specified ‘comment’ string.
Parameters:
- iterable : iterable

Returns:
- cs : CommentSkipper
Fill the DSV table with data coming from associated DSV file. The input generator is the CommentSkipper instance that is obtained automatically. This method handles all underlying low–level activities. NOTE: the associated DSV file remains open until closed with close() method manually.
Parameters:
- debug : boolean

Returns:
- statements : list of string/None

Raises:
- Error
- Error
Close associated DSV file.
Provides unified interface for data sets processed by KDVS.
Bases: object
Wrapper object that represents data set processed by KDVS. It can wrap two types of objects:
- an existing DBTable object that KDVS uses for data storage in relational database
- an existing numpy.ndarray
In case of wrapping DBTable object, it creates additional numpy object of class ndarray, as returned by numpy.loadtxt() family of functions. The additional ndarray object is cached with the DataSet instance, and can be recached on demand; this may be useful if the content of underlying DBTable object changes dynamically.
Parameters:
- input_array : numpy.ndarray
- dbtable : DBTable
- cols : iterable/’*’
- rows : iterable/’*’
- filter_clause : string/None
- remove_id_col : boolean

Raises:
- Error
Perform recaching of underlying ndarray object. Usable only when DBTable object is wrapped.
Provides root functionality for KDVS EnvOps (environment–wide operations).
IMPORTANT NOTE! In principle, an EnvOp is devised as a self-contained function that can access and modify the environment explicitly. Other modular execution blocks, such as techniques, reporters, orderers and selectors, are by default isolated from the environment as much as possible and are accessible only through the API. This is done to minimize the possible devastating impact of erroneous code on the whole environment, which must manage other vital tasks and cannot afford to fail that easily. Therefore, operations should be used only if absolutely necessary, since they introduce state that is potentially very hard to debug.
Default parameters for EnvOp.
Bases: kdvs.core.util.Parametrizable
Encapsulates an EnvOp. An environment-wide operation is parametrizable and affects the whole execution environment. As such, it can potentially cause substantial problems if applied incorrectly. The EnvOp is called automatically during execution in callback fashion. In the ‘experiment’ application, two types of EnvOps are available: pre-EnvOp, executed BEFORE all computational jobs produced by statistical techniques for the current category, and post-EnvOp, executed AFTER all computational jobs for the current category. EnvOps are executed at the category level.
Parameters:
- ref_parameters : iterable
- kwargs : dict

Raises:
- Error
By default does nothing. Accepts an instance of execution environment.
Provides high–level functionality for handling of computational jobs by KDVS.
Constant used to signal when job produced no results.
Constant used to signal when job ended with an error.
Default number of job arguments presented, used in job listings, logs, etc.
Bases: object
A container for constants that represent the state of the job during its lifecycle. Also provides list of those statuses.
Bases: object
High–level wrapper over computational job that KDVS manages. Job consists of a function with arguments and possibly with some additional data. Newly created Job is in the state of CREATED, and its results are NOTPRODUCED.
Parameters:
- call_func : function
- call_args : list/tuple
- additional_data : dict

Raises:
- Error
- Error
- Error
Execute the specified job function with the specified arguments and return the result. Job execution is considered successful if no exception has been raised while running the job function.
Returns:
- result : object

Raises:
- Error
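The success rule above can be sketched with a standalone mimic. `MiniJob` and the placeholder constants are illustrative, not the real kdvs.fw.Job class (which, for instance, may raise a wrapped Error on failure rather than storing a marker):

```python
# Standalone sketch of the Job contract: a newly created job holds no result
# (NOTPRODUCED); execution succeeds iff the job function raises no exception.
NOTPRODUCED = '__notproduced__'   # placeholder constants for this sketch
FAILED = '__failed__'

class MiniJob:
    def __init__(self, call_func, call_args):
        self.status = 'CREATED'
        self.result = NOTPRODUCED
        self.call_func = call_func
        self.call_args = call_args
    def execute(self):
        self.status = 'EXECUTING'
        try:
            self.result = self.call_func(*self.call_args)
        except Exception:
            self.result = FAILED      # job ended with an error
        self.status = 'FINISHED'
        return self.result

ok = MiniJob(lambda x, y: x + y, (2, 3))
print(ok.execute())    # 5
bad = MiniJob(lambda: 1 / 0, ())
print(bad.execute())   # '__failed__'
```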
Bases: object
An abstract container that manages jobs. Must be subclassed.
Parameters:
- incrementID : boolean
Add job to the execution queue and schedule for execution. Job changes its status to ADDED.
Parameters:
- job : Job
- kwargs : dict

Returns:
- jobID : string
Return number of jobs currently managed by this container. NOTE: this method does not differentiate between executed and not executed jobs.
Return True if container manages any jobs, False otherwise.
Return instance of Job by its jobID.
Parameters:
- jobID : string

Returns:
- job : Job
Return status of requested job.
Parameters:
- jobID : string

Returns:
- status : one of JobStatus.statuses
Return results produced by requested job. May be NOTPRODUCED.
Parameters:
- jobID : string

Returns:
- result : object
Remove requested job from this manager.
Parameters:
- jobID : string

Raises:
- Warn
- Warn
Remove all jobs from this container.
Must be implemented in subclass.
Typically implemented in subclass to clean after itself. By default it does nothing.
Used by subclasses. Currently used only in ‘experiment’ application. By default it checks if given destination path exists.
Return any miscellaneous data associated with this container. Typically, subclasses add some to improve job management or provide some debug information.
Remove any miscellaneous data associated with this container.
Bases: object
Simple manager of groups of jobs. Can be used for finer execution control and to facilitate reporting.
Parameters:
- kwargs : dict
Add requested job to specified job group. If group was not defined before, it will be created.
Parameters:
- group_name : string
- jobID : string
Add series of jobs to specified job group (shortcut). If group was not defined before, it will be created.
Parameters:
- group_name : string
- group_job_ids : iterable of string
Remove specified job group from this manager. All associated job IDs are removed as well. NOTE: physical jobs are left intact.
Parameters:
- group_name : string
Removes all job groups from this manager.
Get list of job IDs associated with specified job group name.
Parameters:
- group_name : string

Returns:
- jobIDs : iterable of string
Identify job group of the requested job ID.
Parameters:
- jobID : string

Returns:
- group_name : string

Raises:
- Error
Get list of all job group names managed by this manager.
Provides high–level functionality for mappings constructed by KDVS.
Constant used to signal that entity is not mapped.
Bases: object
This map uses dictionaries of interlinked partial single mappings to derive final mapping. For instance, for single mappings
- {‘a’ : 1, ‘b’ : 2, ‘c’ : 3}
- {1 : ‘baa’, 2 : ‘boo’, 3 : ‘bee’}
- {‘baa’ : ‘x’, ‘boo’ : ‘y’, ‘bee’ : ‘z’}
the derived final mapping has the form
- {‘a’ : ‘x’, ‘b’ : ‘y’, ‘c’ : ‘z’}
Each single partial mapping is wrapped into an instance of this class, and deriving is done with class–wide static method. This class exposes partial dict API and re–implements methods __setitem__ and __getitem__. NOTE: for this map, order of derivation, and therefore, order of single partial mappings processed, is important.
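The derivation in the example above can be re-traced in plain Python. The `derive` function below is an illustrative stand-in for the class-wide static method, not the KDVS implementation:

```python
# Re-derivation of the example above: follow the key through each partial
# mapping in order; a miss at any stage yields no final value.
def derive(key, maps):
    intermediates = []
    current = key
    for mapping in maps:              # order of partial mappings matters
        if current not in mapping:
            return key, intermediates, None
        current = mapping[current]
        intermediates.append(current)
    # last appended value is the final value, the rest are intermediates
    return key, intermediates[:-1], current

maps = [
    {'a': 1, 'b': 2, 'c': 3},
    {1: 'baa', 2: 'boo', 3: 'bee'},
    {'baa': 'x', 'boo': 'y', 'bee': 'z'},
]
final = {k: derive(k, maps)[2] for k in 'abc'}
print(final)   # {'a': 'x', 'b': 'y', 'c': 'z'}
```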
Parameters:
- initial_dict : dict/None

Raises:
- Error
Return partial single mapping as a dictionary.
Update partial single mapping with all key–value pairs at once from given dictionary, with possible replacement.
Parameters:
- map_dict : dict
- replace : boolean

Raises:
- Error
Derive single final value for given single key, computed across all given partial single mappings.
Parameters:
- key : object
- maps : iterable of ChainMap

Returns:
- key, interms, value : object/NOTMAPPED, list of object, object/None

Raises:
- Error
Build mapping of key–value pairs that come from deriving of final values for specified keys.
Parameters:
- keys : iterable of object
- maps : iterable of ChainMap

Returns:
- dmap, interms : dict, iterable of object

Raises:
- Error
Bases: object
This map stores bi-directional mappings. For such a map, values can repeat. To reflect that, values in both directions (forward and backward) will be binned. For instance, for the given initial mapping
- {‘a’ : 1, ‘b’ : 1, ‘c’ : 2, ‘d’ : 3}
the following forward mapping will be constructed
- {‘a’ : [1], ‘b’ : [1], ‘c’ : [2], ‘d’ : [3]}
and the following backward mapping will be constructed as well
- {1 : [‘a’,’b’], 2 : [‘c’], 3 : [‘d’]}
The exact underlying data structure that holds binned values (“binning container”) depends on the specific map subtype. This class exposes a partial dict API and re-implements the methods __setitem__, __getitem__, and __delitem__.
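The binning from the example above can be sketched with collections.defaultdict. Sets are used here as the binning container, mimicking SetBDMap behavior; this is an illustrative sketch, not the real BDMap class:

```python
# Forward/backward binning from the example above, using sets as the
# binning container (as SetBDMap does; ListBDMap would use list.append()).
from collections import defaultdict

initial = {'a': 1, 'b': 1, 'c': 2, 'd': 3}

fwd = defaultdict(set)
bwd = defaultdict(set)
for key, value in initial.items():
    fwd[key].add(value)     # forward: key -> bin of values
    bwd[value].add(key)     # backward: value -> bin of keys

print(dict(fwd))   # {'a': {1}, 'b': {1}, 'c': {2}, 'd': {3}}
print(dict(bwd))   # {1: {'a', 'b'}, 2: {'c'}, 3: {'d'}}
```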
Parameters:
- factory_obj : callable
- add_op_name : string
- initial_map : dict/None
Clear bi–directional mapping. The underlying forward and backward mappings will be cleared.
Perform specific activity when given key is missing in the map during construction of bi–directional mapping. By default, it creates new binning container by calling factory_obj().
Return forward mapping as an instance of collections.defaultdict.
Return backward mapping as an instance of collections.defaultdict.
Return forward mapping as a dictionary. NOTE: the resulting dictionary may not be suitable for printing, depending on the type of underlying binning container.
Return backward mapping as a dictionary. NOTE: the resulting dictionary may not be suitable for printing, depending on the type of underlying binning container.
Bases: kdvs.fw.Map.BDMap
Specialized BDMap that uses lists as binning containers. Repeated values are added to binning container with append() method. NOTE: all specialized behavior incl. exceptions raised, depends on list type. Refer to documentation of list type for more details.
Bases: kdvs.fw.Map.BDMap
Specialized BDMap that uses sets as binning containers. Repeated values are added to binning container with add() method. NOTE: all specialized behavior incl. exceptions raised, depends on set type. Refer to documentation of set type for more details.
Bases: object
Abstract bi-directional mapping (binned in sets) between prior knowledge concepts and individual measurements. The concrete implementation must implement the “build” method, where the mapping is built and the SetBDMap instance self.pkc2emid is filled with it.
Bases: object
Abstract bi-directional mapping (binned in sets) between prior knowledge concepts and gene symbols. Used for gene expression data analysis; may not be present in all KDVS applications. The concrete implementation must implement the “build” method, where the mapping is built and the SetBDMap instance self.pkc2gene is filled with it.
Bases: object
Abstract bi-directional mapping (binned in sets) between gene symbols and individual measurements. Used for gene expression data analysis; may not be present in all KDVS applications. The concrete implementation must implement the “build” method, where the mapping is built and the SetBDMap instance self.gene2emid is filled with it.
Provides high–level functionality for entities related to prior knowledge, such as prior knowledge concepts, prior knowledge managers, etc.
Default elements of each prior knowledge concept recognized by KDVS.
Bases: object
The general representation of prior knowledge concept. Specific details depend on the knowledge itself. For example, in gene expression data analysis, genes may be grouped into functional classes, and each class may be represented by single prior knowledge concept. Prior knowledge concepts may be additionally grouped in domains if necessary; the concept of domain is used by prior knowledge manager to expose selected “subset” of knowledge, without the need of exposing all of it. The concept is thinly wrapped in a dictionary.
Parameters:
- conceptid : string
- name : string
- domain_id : string/None
- description : string/None
- additionalInfo : dict
Return all keys of the associated dictionary that holds the elements of the concept.
Bases: object
Abstract prior knowledge manager. The role of prior knowledge manager in KDVS is to read any specific representation of the knowledge, memorize the individual prior knowledge concepts, optionally map concepts to domains if necessary, and expose individual concepts through its API. The concrete implementation must implement the configure(), getPKC(), and dump() methods, and re–implement load() method. The manager must be configured before knowledge can be loaded. Concrete implementation may cache instances of PriorKnowledgeConcept or create them on the fly. Dump must be in serializable format, and should be human readable if possible. Mapping between concepts and domains by default is bi–directional (via SetBDMap).
Return True if manager has been configured, False otherwise.
By default, this method raises Error if manager has not been configured yet.
Provides high–level functionality for generation of reports by KDVS.
Default parameters for reporter.
Bases: kdvs.core.util.Parametrizable
Abstract reporter. A reporter produces reports based on results obtained from statistical techniques, where each subset has a single associated technique, and each computational job executes a technique on a subset. A reporter may work across a single category of results (in that case reports are “local”), or may cross the boundaries of individual categories (in that case reports are “global”). Each reporter may produce many single reports. Reporters are parametrizable, and report generation is done in the background in callback fashion after all computational jobs have been executed. Reporters are closely tied with their respective statistical techniques.
Parameters:
- ref_parameters : iterable
- kwargs : dict
Produce reports. This method works across single category of results. By default, it does nothing. The implementation should fill self._reports with mapping
- {file_name : [file_content_lines]}
By default, all report files will be created in standard sublocation results. This may be changed by specifying ‘subloc1/.../sublocN/file_name’ as file name. The new sublocation paths may be constructed with given location separator self.locsep.
Parameters:
- resultsIter : iterable of Results
Produce reports. This method works across the whole category tree. By default, it does nothing. The implementation should fill self._reports with mapping
- {file_name : [file_content_lines]}
By default, all report files will be created in standard sublocation results. This may be changed by specifying ‘subloc1/.../sublocN/file_name’ as file name. The new sublocation paths may be constructed with given location separator self.locsep. NOTE: reporter of this type may be requested to work starting on specific level of category tree; level is given by categorizer and category; in that case, it has access to the whole starting categorizer, and all subtree below it, and can start from given category.
Parameters:
- subsetHierarchy : SubsetHierarchy
- ssIndResults : dict of iterable of kdvs.fw.Stat.Results
- currentCategorizerID : string
- currentCategoryID : string
Initialize the reporter. Since reporter produces physical files, the concrete storage must be assigned for them. Also, it may accept any additional data necessary for its work.
Parameters:
- storageManager : StorageManager
- subsets_results_location : string
- additionalData : object
Finalize reporter’s work by writing report files and clearing them.
Get currently generated reports as dictionary
- {‘file_name’ : [‘file_content_lines’]}
Request opening of new report in given location with specified content.
Parameters:
- rlocation : string
- content : iterable
Return any additional data associated with this reporter.
Provides high–level functionality for statistical techniques. Statistical technique accepts data subset and processes it as it sees fit. Technique shall produce Results object that is stored physically and may be used later to generate reports. Typically, each technique has its own specific Reporter associated.
Bases: object
Provides uniform information about labels. In supervised machine learning, where algorithms learn generalities from incoming known samples, the samples are of different types (typically two, sometimes more), and each type has a label associated with it. This information is present only when the statistical technique uses supervised classification; in that case, the label information shall be supplied as an additional input file and loaded into a DBTable instance. Typically, in the scenario with two classes of samples, the first class has the label ‘1’ associated, and the second class has the label ‘-1’ associated. See the ‘example_experiment’ directory for an example of a labels file.
Parameters:
- source : DBTable
- unused_sample_label : integer

Raises:
- Error
Return labels in samples order, as read from input label information. When primary data set is read, samples are in specific order, e.g.
- S1, S2, ..., S40
However, label information specified in separated input file can have different sample order, e.g.
- S32, S33, ..., S40, S1, S2, S3, ..., S31
Here it is ensured that labels are ordered according to specified sample order.
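The reordering can be sketched in a few lines of plain Python. The `reorder_labels` helper is illustrative; the real Labels class reads this information from a DBTable:

```python
# Sketch of reordering labels to match the primary data set's sample order.
def reorder_labels(label_by_sample, sample_order):
    return [label_by_sample[sample] for sample in sample_order]

# labels file listed samples in a different order than the data set
labels = {'S1': 1, 'S2': 1, 'S3': -1, 'S4': -1}
sample_order = ['S3', 'S1', 'S4', 'S2']   # order from the primary data set
print(reorder_labels(labels, sample_order))   # [-1, 1, -1, 1]
```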
Parameters:
- sample_order : iterable
- as_array : boolean

Returns:
- lbsord : iterable
Return used samples in samples order, as read from input label information. Useful when reordering samples according to specific order, and skipping unused samples (that have ‘unused_sample_label’ associated).
Parameters:
- samples_order : iterable

Returns:
- smpord : iterable
Constant that represents element that has not been present among Results.
Standard Results element that refers to ID of data subset processed; typically, equivalent to associated prior knowledge concept identifier.
Standard Results element that refers to dictionary of plots associated with the result. Plots are produced with Plot according to specification.
Standard Results element that refers to any information available in runtime, that needs to be included with the result itself.
Bases: object
Wrapper for results obtained from statistical technique. Result is typically composed of various elements produced by the technique. The element can be any object of any valid Python/numpy type. Elements are referred to by their names, and Results instance works like a dictionary. If an element is a dictionary itself, it can contain nested dictionaries, so the following syntax also works:
- Results[‘element_name’][‘subelement_name1’]...[‘subelement_nameN’]
In the documentation, this is represented as:
- ‘element_name’->’subelement_name1’->...->’subelement_nameN’
Each valid statistical technique shall produce exactly one instance of Results for exactly one data subset. This class exposes partial dict API and implements __getitem__ and __setitem__ methods.
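The element syntax above can be sketched with a minimal dict-backed mimic. `MiniResults` is illustrative, not the real kdvs.fw.Stat.Results class:

```python
# Minimal sketch of the Results element syntax: elements behave like
# dictionary entries, and dict-valued elements allow nested access.
class MiniResults:
    def __init__(self, ssID, elements):
        self._elements = dict.fromkeys(elements)
        self._elements['SubsetID'] = ssID   # illustrative element name
    def __getitem__(self, name):
        return self._elements[name]
    def __setitem__(self, name, value):
        self._elements[name] = value
    def keys(self):
        return list(self._elements)

res = MiniResults('GO:0006915', ['Classification Error', 'Selection'])
res['Selection'] = {'inner': {'vars': ['v1', 'v2']}}
# nested access, i.e. 'Selection'->'inner'->'vars':
print(res['Selection']['inner']['vars'])   # ['v1', 'v2']
```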
Parameters:
- ssID : string
- elements : iterable of string
Return all element names.
Default Results element that shall always be produced by techniques that incorporate classification.
Default Results element that shall always be produced by techniques that incorporate some kind of ‘selection’ (including variable selection).
Default statistical technique parameters that shall always be present.
Bases: kdvs.core.util.Parametrizable
Abstract statistical technique that processes data subsets one at a time. The technique is parametrizable and is initialized during instance creation. It processes a data subset by creating one or more jobs to be executed by the specified job container. After the job(s) are finished, the technique produces a single Results instance. This split of functionality was introduced to ease the implementation of techniques that use cross validation extensively. A concrete implementation must implement the produceResults() method and reimplement the createJob() method. In the simplest case, a single job that wraps a single function call may be generated. More complicated implementations may require generating cross validation splits, processing them in separate jobs, and merging the partial results into a single one.
Parameters:
    ref_parameters : iterable
    kwargs : dict
This method must be reimplemented as a generator that yields jobs to be executed. By default, it only checks if input data are correctly specified.
Parameters:
    ssname : string
    data : numpy.ndarray
    labels : numpy.ndarray/None
    additionalJobData : dict
Returns:
    (jID, job) : string, Job
Notes
Proper order of data and labels must be ensured for the technique to work. Typically, subsets are generated according to the samples order specified within the primary data set; labels must be in the same order. This ordering is not checked during job execution.
Must be implemented in a subclass. It returns a single Results instance.
Parameters:
    ssname : string
    jobs : iterable of Job
    runtime_data : dict
Returns:
    final_results : Results
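The createJob()/produceResults() split described above can be sketched in the simplest case: a single job wrapping a single function call, whose finished result is merged into one result object. `Job` here is a stand-in for the KDVS job wrapper, and the technique (a per-subset mean) is invented for illustration.

```python
# Hedged sketch of the create-jobs / produce-results split.
class Job(object):
    """Stand-in job wrapper: a callable plus its arguments."""
    def __init__(self, call_func, call_args):
        self.call_func = call_func
        self.call_args = call_args
        self.result = None

    def execute(self):
        self.result = self.call_func(*self.call_args)

class MeanTechnique(object):
    def createJob(self, ssname, data, labels=None):
        # simplest case: yield a single job wrapping a single function call
        job = Job(lambda d: sum(d) / float(len(d)), (data,))
        yield ('%s_job0' % ssname, job)

    def produceResults(self, ssname, jobs):
        # merge partial results from finished jobs into a single result
        return {'Subset ID': ssname,
                'mean': [j.result for j in jobs]}

tech = MeanTechnique()
jobs = []
for jID, job in tech.createJob('subset1', [1.0, 2.0, 3.0]):
    job.execute()          # normally done by the job container
    jobs.append(job)
final = tech.produceResults('subset1', jobs)
```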
Constant that refers to entity being not selected.
Constant that refers to entity being selected.
Constant that refers to error encountered during selection process.
Bases: kdvs.core.util.Parametrizable
Abstract parametrizable wrapper for selection activity. Generally, for KDVS ‘selection’ is understood in a much wider context than in the machine learning community. Both prior knowledge concepts and variables from data subsets can be ‘selected’. Some statistical techniques incorporate variable selection (in the machine learning sense), some do not. In order to unify the concept, KDVS introduced a ‘selection activity’ that marks specified entities as ‘properly selected’. For example, if the technique incorporates proper variable selection, a concrete Selector instance will simply recognize it and mark the selected variables as ‘properly selected’. If the technique does not involve variable selection, a concrete Selector instance may simply declare some variables as ‘properly selected’ or not, depending on the needs. If some prior knowledge concepts could be ‘selected’ in any sense, another concrete Selector can accomplish this as well. Selectors produce ‘selection markings’ that can later be saved and reported. A concrete subclass must implement the perform() method. Selectors are closely tied with techniques and reporters.
Parameters:
    parameters : iterable
    kwargs : dict
Perform the selection activity. Typically, a Selector accepts a Results instance and, depending on the needs, may go through individual variables of the data subset, marking them as ‘properly selected’ or not, or may mark the whole data subset (which has an associated prior knowledge concept) as ‘selected’. In dubious cases, the selector can use the constant value ‘selection error’. The associated Reporter instance shall recognize properly selected prior knowledge concepts and/or variables and report them accordingly. This method must also return the ‘selection markings’ in a format understandable by the Reporter.
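A minimal selector along these lines could map each variable to one of the three marking constants described above. The constant values, function name, and input format are assumptions chosen for this sketch, not the KDVS definitions.

```python
# Illustrative selection markings; the actual constant values are assumptions.
SELECTED = 1
NOT_SELECTED = -1
SELECTIONERROR = 0

def perform_selection(variable_flags):
    """Map each variable to a selection marking.

    'variable_flags' is a hypothetical {variable: flag} dict, where the flag
    is True (selected), False (not selected), or anything else (dubious case).
    """
    markings = {}
    for var, flag in variable_flags.items():
        if flag is True:
            markings[var] = SELECTED
        elif flag is False:
            markings[var] = NOT_SELECTED
        else:
            markings[var] = SELECTIONERROR  # dubious case
    return markings
```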
Bases: object
Abstract wrapper for plot. Concrete implementation must implement configure(), create(), and plot() methods. When using the plotter, the following sequence of calls shall be issued: configure(), create(), plot().
Calculate the confusion matrix for original and predicted labels. It is used when labels reflect two classes: one class is referred to as ‘cases’ (associated with the positive label), and the other as ‘control’ (associated with the negative label).
Parameters:
    original_labels : iterable of integer
    predicted_labels : iterable of integer
    positive_label : integer
    negative_label : integer
Returns:
    (tp, tn, fp, fn) : tuple of integer
Raises:
    Error
Calculate Matthews Correlation Coefficient for given confusion matrix.
Parameters:
    tp : integer
    tn : integer
    fp : integer
    fn : integer
Returns:
    mcc : float
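The two computations above are standard and can be sketched directly; the function names mirror the descriptions but are not necessarily the KDVS identifiers, and the degenerate-denominator convention (returning 0.0) is an assumption.

```python
import math

def confusion_matrix(original_labels, predicted_labels,
                     positive_label=1, negative_label=-1):
    """Count true/false positives/negatives over paired label sequences."""
    tp = tn = fp = fn = 0
    for orig, pred in zip(original_labels, predicted_labels):
        if orig == positive_label:
            if pred == positive_label:
                tp += 1    # case correctly predicted as case
            else:
                fn += 1    # case predicted as control
        else:
            if pred == negative_label:
                tn += 1    # control correctly predicted as control
            else:
                fp += 1    # control predicted as case
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for a 2x2 confusion matrix."""
    denom = math.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    if denom == 0.0:
        return 0.0  # conventional value for a degenerate matrix (assumption)
    return (tp * tn - fp * fn) / denom

# perfectly predicted labels give the maximal coefficient
tp, tn, fp, fn = confusion_matrix([1, -1, 1], [1, -1, 1])
# mcc(tp, tn, fp, fn) == 1.0
```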
Provides functionality for path and subdirectory management.
Standard separator used for specifying sublocations. It may differ from the path separator on the current platform.
Bases: object
Storage manager that operates on the file system provided by the operating system and accessible to the Python interpreter through the os module. The storage manager manages ‘locations’ that refer to subdirectories under a specified root path; manipulation of concrete directory paths is hidden from the user.
Parameters:
    name : string/None
    root_path : string/None
    create_dbm : boolean
Create specified location. Location may be specified as
- ‘loc’
or
- ‘loc/loc1/loc2/.../locN’
In the first case, subdirectory
- ‘loc’
will be created under the root path of the manager, with concrete path
- ‘root/loc’
In the second case, all nested subdirectories will be created, if not created already, and the concrete path will be
- ‘root/loc/loc1/loc2/.../locN’
In addition, all partial sublocations
- ‘root/loc’
- ‘root/loc/loc1’
- ‘root/loc/loc1/loc2’
- ...
will be registered as managed locations.
Path separators may differ with the platform.
Parameters:
    location : string/None
Return physical directory path for given location.
Parameters:
    location : string
Returns:
    path : string/None
Remove location from managed locations. This method considers two cases. When location is e.g.
- ‘loc/loc1/loc2’
and leaf mode is not requested, physical subdirectory
- ‘root/loc/loc1/loc2’
will be deleted along with all nested subdirectories, and all managed sublocations. If leaf mode is requested, only the most nested subdirectory
- ‘root/loc/loc1/loc2’
will be deleted and
- ‘root/loc/loc1’
will be left, along with all managed sublocations.
Parameters:
    location : string
    leafonly : boolean
Return identifier of root location for this manager instance.
Return physical directory path of root location for this manager instance.
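The location mechanics described above can be sketched as follows. Method names mirror the descriptions but are assumptions, not the exact KDVS API; the sketch only covers creating locations and resolving their physical paths.

```python
import os
import shutil
import tempfile

SUBLOCATION_SEPARATOR = '/'  # may differ from os.sep on the current platform

class FileStorageManager(object):
    """Sketch of a location-based storage manager (illustrative names)."""
    def __init__(self, root_path):
        self.root_path = root_path
        self.locations = {}   # location string -> physical path

    def createLocation(self, location):
        parts = location.split(SUBLOCATION_SEPARATOR)
        # register every partial sublocation as a managed location
        for i in range(1, len(parts) + 1):
            subloc = SUBLOCATION_SEPARATOR.join(parts[:i])
            path = os.path.join(self.root_path, *parts[:i])
            if not os.path.exists(path):
                os.makedirs(path)
            self.locations[subloc] = path

    def getLocation(self, location):
        """Return the physical path for a managed location, or None."""
        return self.locations.get(location, None)

root = tempfile.mkdtemp()
sm = FileStorageManager(root)
sm.createLocation('loc/loc1/loc2')
# partial sublocations 'loc' and 'loc/loc1' are now managed as well
shutil.rmtree(root)
```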
Provides high–level functionality for management of hierarchy of data subsets. Subsets can be hierarchical according to prior knowledge domains, or according to user specific criteria. Hierarchy is built based on categorizers.
Bases: object
Abstract subset hierarchy manager. It constructs and manages two entities, available as global attributes: hierarchy and symboltree. Data subsets may be categorized with categorizers, and categories may be nested. The hierarchy describes the nested categories as a dictionary of the following format:
- {parent_category : child_categorizer_id}
where the root categorizer is keyed with None (having no parent), and categories in the last categorizer are valued with None (having no children). The symboltree describes symbols categorized by categories as a dictionary of the following format:
- {parent_category : {child_category1 : [associated symbols], ..., child_categoryN : [associated symbols]}}
In contrast to the hierarchy, the symboltree does not contain None keys/values. Typically, symbols refer to prior knowledge concepts. A concrete implementation must implement the obtainDatasetFromSymbol() method.
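A toy instance of the two structures may clarify the format. The categorizer and category names below (‘domain’, ‘size’, ‘BP’, ‘MF’, ‘small’, ‘large’) are invented for this example; symbols are prior knowledge concept identifiers.

```python
# Hypothetical root categorizer 'domain' (categories 'BP', 'MF') nested
# with a child categorizer 'size' (categories 'small', 'large').
hierarchy = {
    None: 'domain',   # root categorizer is keyed with None (no parent)
    'BP': 'size',     # each 'domain' category points to the child categorizer
    'MF': 'size',
    'small': None,    # categories of the last categorizer are valued with None
    'large': None,
}
# symboltree holds no None keys/values, only categories and their symbols
symboltree = {
    'BP': {'small': ['GO:0000001'], 'large': ['GO:0000002', 'GO:0000003']},
    'MF': {'small': ['GO:0000004'], 'large': []},
}
```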
Build categories hierarchy and symboltree.
Parameters:
    categorizers_list : iterable of string
    categorizers_inst_dict : dict
    initial_symbols : iterable of string
Raises:
    Error