Provides specific functionality for data-driven activities, such as data subset generation, categorizers based on data properties, and selectors.
Provides functionality for ‘null’ data-driven activities. Null activities are neutral with respect to their arguments: they simply pass them through unchanged.
Bases: kdvs.fw.Categorizer.Categorizer
Null categorizer that uses a single virtual ‘category’: it passes every data subset without checking it and marks the subset with that category. Useful as a placeholder and at the top of the categorizer hierarchy, above the more specialized categorizers that do the actual work.
Parameters:
    ID : string
    null_category : string
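To make the "single virtual category" behaviour concrete, here is a minimal sketch. The class name `NullCategorizerSketch` and its `categorize()` and `categories` members are hypothetical illustrations, not the framework's actual API; only the `ID` and `null_category` parameters come from the documentation above.

```python
class NullCategorizerSketch:
    """Hypothetical stand-in for the null categorizer: one category for everything."""

    def __init__(self, ID, null_category):
        self.ID = ID
        self.null_category = null_category
        # The only category this categorizer can ever assign.
        self.categories = (null_category,)

    def categorize(self, subset):
        # The subset is deliberately not inspected; it always passes.
        return self.null_category


cat = NullCategorizerSketch("null_cat", "ALL")
print(cat.categorize([[1.0, 2.0], [3.0, 4.0]]))  # ALL
print(cat.categorize([]))  # ALL
```

Because the input is never examined, any object (even an empty subset) receives the same category, which is what makes this categorizer a safe root for a hierarchy.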
Bases: kdvs.fw.Categorizer.Orderer
Null orderer that simply returns the given iterable without ordering it. Useful as a placeholder and on lower levels of the categorizer hierarchy, when ordering has already been performed upstream.
‘Build’ the order by simply keeping the input iterable as is.
Return the order by simply returning the input iterable.
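The build/return pair of the null orderer can be sketched as follows, assuming a hypothetical class `NullOrdererSketch` whose order-returning method is called `order()`; the real method names may differ. The point is that nothing is sorted or copied.

```python
class NullOrdererSketch:
    """Hypothetical stand-in for the null orderer: the input order is kept as is."""

    def __init__(self):
        self.ordering = None

    def build(self, iterable):
        # No sorting at all: remember the input iterable unchanged.
        self.ordering = iterable

    def order(self):
        # Hand back exactly the iterable that was given to build().
        return self.ordering


orderer = NullOrdererSketch()
pkc_ids = ["pkc3", "pkc1", "pkc2"]
orderer.build(pkc_ids)
print(orderer.order())  # ['pkc3', 'pkc1', 'pkc2']
print(orderer.order() is pkc_ids)  # True
```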
Provides specialized functionality for data-driven activities. This includes a concrete producer of data subsets, following the philosophy of creating smaller subsets according to prior knowledge. The subset producer also actively categorizes the subsets with the given categorizer(s).
Bases: object
Base class for a data-driven subset producer. A concrete subclass needs to implement the getSubset() method, which returns a single DataSet instance for a single specified prior knowledge concept. The implementation shall create the subset according to prior knowledge information. It may also re-implement the categorizeSubset() method if necessary.
By default, the constructor does nothing.
Categorize the input DataSet using categories from the specified categorizer.
Parameters:
    subset_inst : DataSet
    subsetCategorizer : Categorizer
Returns:
    category : string
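The subclassing contract described above (an abstract getSubset() plus an overridable categorizeSubset()) can be sketched with Python's `abc` module. This is an illustration under assumptions: `PKDrivenDataManagerSketch`, `DictDataManager`, `RowCountCategorizer` and the categorizer's `categorize()` method are hypothetical names, and a plain nested list stands in for a DataSet.

```python
from abc import ABC, abstractmethod


class PKDrivenDataManagerSketch(ABC):
    """Hypothetical sketch of the subset producer contract."""

    def __init__(self):
        # By default, the constructor does nothing.
        pass

    @abstractmethod
    def getSubset(self, pkcID, *args, **kwargs):
        """Return a single data subset for one prior knowledge concept."""

    def categorizeSubset(self, subset_inst, subsetCategorizer):
        # Default behaviour: delegate to the categorizer (assumed to expose
        # a categorize() method) and return its category string.
        return subsetCategorizer.categorize(subset_inst)


class RowCountCategorizer:
    # Hypothetical categorizer used only for this demo.
    def categorize(self, subset):
        return "small" if len(subset) <= 2 else "large"


class DictDataManager(PKDrivenDataManagerSketch):
    # Hypothetical concrete producer backed by a plain dict of subsets.
    def __init__(self, subsets):
        super().__init__()
        self.subsets = subsets

    def getSubset(self, pkcID, *args, **kwargs):
        return self.subsets[pkcID]


dm = DictDataManager({"pkc1": [[0.1], [0.2]], "pkc2": [[0.3], [0.4], [0.5]]})
size_cat = RowCountCategorizer()
print(dm.categorizeSubset(dm.getSubset("pkc1"), size_cat))  # small
print(dm.categorizeSubset(dm.getSubset("pkc2"), size_cat))  # large
```

Marking getSubset() as abstract means the base class cannot be instantiated on its own, which mirrors the documented requirement that a concrete subclass must provide it.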
Bases: kdvs.fw.impl.data.PKDrivenData.PKDrivenDataManager
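Concrete implementation of a data-driven subset producer that creates DataSet instances based on prior knowledge information. The subsets may overlap, because a single variable can be associated with more than one prior knowledge concept.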
Parameters:
    main_dbtable : DBTable
    pkcidmap_inst : PKCIDMap
Generate the data subset for a specific prior knowledge concept and, if requested, wrap it into a DataSet instance. Optionally, it can instead generate only the information needed to create the subset manually, without the subset itself; this may be useful, e.g., if the data come from a remote source that does not offer complete control over querying.
Parameters:
    pkcID : string
    forSamples : iterable/string
    get_ssinfo : boolean
    get_dataset : boolean
Returns:
    ssinfo : dict/None
    subset_ds : DataSet/None
Raises:
    Error
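The following in-memory sketch illustrates how a subset for one prior knowledge concept could be assembled, and how the "information only" mode differs from building the subset itself. Everything here is an assumption for illustration: the dict-based table and PKC map stand in for DBTable and PKCIDMap, the `ssinfo` keys (`pkcID`, `rows`, `cols`) are hypothetical, the string `"*"` is assumed to mean "all samples", and `SubsetError` stands in for the framework's Error.

```python
class SubsetError(Exception):
    """Hypothetical stand-in for the framework's Error exception."""


# Hypothetical main data table: variable ('row') ID -> {sample name: value}.
MAIN_TABLE = {
    "g1": {"s1": 0.5, "s2": 1.2, "s3": 0.9},
    "g2": {"s1": 2.1, "s2": 0.3, "s3": 1.7},
    "g3": {"s1": 1.0, "s2": 1.4, "s3": 0.2},
}
ALL_SAMPLES = ["s1", "s2", "s3"]
# Hypothetical PKC ID -> variable IDs map. "g2" is associated with both
# concepts, so the two resulting subsets overlap.
PKC_ID_MAP = {"pkcA": ["g1", "g2"], "pkcB": ["g2", "g3"]}


def get_subset(pkcID, forSamples="*", get_ssinfo=True, get_dataset=True):
    """Return (ssinfo, subset_ds); either element may be None."""
    if pkcID not in PKC_ID_MAP:
        raise SubsetError("unknown prior knowledge concept: %s" % pkcID)
    # Assumption: the string "*" means "all samples"; other strings are rejected.
    if isinstance(forSamples, str):
        if forSamples != "*":
            raise SubsetError("unsupported sample selector: %s" % forSamples)
        cols = list(ALL_SAMPLES)
    else:
        cols = list(forSamples)
    rows = list(PKC_ID_MAP[pkcID])
    # Only the information needed to build the subset manually.
    ssinfo = {"pkcID": pkcID, "rows": rows, "cols": cols} if get_ssinfo else None
    subset_ds = None
    if get_dataset:
        # A nested list (rows x samples) stands in for a DataSet instance.
        subset_ds = [[MAIN_TABLE[r][c] for c in cols] for r in rows]
    return ssinfo, subset_ds


info, ds = get_subset("pkcA", forSamples=["s1", "s3"])
print(info["rows"])  # ['g1', 'g2']
print(ds)  # [[0.5, 0.9], [2.1, 1.7]]
info_only, no_ds = get_subset("pkcB", get_dataset=False)
print(info_only["cols"], no_ds)  # ['s1', 's2', 's3'] None
```

With `get_dataset=False`, a caller facing a remote source could use `ssinfo` to issue its own query instead of relying on the producer to fetch the values.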
Bases: kdvs.fw.SubsetHierarchy.SubsetHierarchy
Concrete subclass of the SubsetHierarchy class that generates proper data subsets for a given symbol (i.e. prior knowledge concept). This implementation uses the data-driven subset producer as its subset generator.
Parameters:
    pkdm_inst : PKDrivenDBDataManager
    samples_iter : iterable of string
Build categories hierarchy and symboltree.
Parameters:
    categorizers_list : iterable of string
    categorizers_inst_dict : dict
    initial_symbols : iterable of string
Raises:
    Error
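One plausible way to build a categories hierarchy and a symbol tree is to apply the listed categorizers level by level, top to bottom, splitting each group of symbols by the category assigned to its subset. The sketch below is only an illustration of that idea, not the framework's algorithm. Its assumptions: the `get_subset` callable stands in for the `pkdm_inst` subset producer, categorizers expose a hypothetical `categorize()` method, tree nodes are keyed by tuples of category names (the "category path"), and `HierarchyError` stands in for the framework's Error, raised here when a listed categorizer has no instance.

```python
class HierarchyError(Exception):
    """Hypothetical stand-in for the framework's Error exception."""


def build_hierarchy(categorizers_list, categorizers_inst_dict, initial_symbols, get_subset):
    """Return (hierarchy, symboltree) keyed by category-path tuples.

    hierarchy  : path -> ID of the categorizer applied at that node
    symboltree : path -> symbols (PKC IDs) that ended up in that node
    """
    for cid in categorizers_list:
        if cid not in categorizers_inst_dict:
            raise HierarchyError("no instance for categorizer %s" % cid)
    hierarchy, symboltree = {}, {}
    # Start with a single group: all symbols under the root path ().
    groups = {(): list(initial_symbols)}
    for cid in categorizers_list:
        categorizer = categorizers_inst_dict[cid]
        next_groups = {}
        for path, symbols in groups.items():
            hierarchy[path] = cid
            for sym in symbols:
                category = categorizer.categorize(get_subset(sym))
                next_groups.setdefault(path + (category,), []).append(sym)
        symboltree.update(next_groups)
        groups = next_groups
    return hierarchy, symboltree


class _Null:
    # Demo categorizer: everything goes to one category.
    def categorize(self, subset):
        return "ALL"


class _Size:
    # Demo categorizer: split by number of rows.
    def categorize(self, subset):
        return "small" if len(subset) <= 2 else "large"


subsets = {"p1": [1, 2], "p2": [1, 2, 3], "p3": [4]}
hierarchy, tree = build_hierarchy(
    ["null", "size"], {"null": _Null(), "size": _Size()},
    ["p1", "p2", "p3"], subsets.__getitem__)
print(hierarchy)  # {(): 'null', ('ALL',): 'size'}
print(tree[("ALL", "small")], tree[("ALL", "large")])  # ['p1', 'p3'] ['p2']
```

Keying nodes by the full category path, rather than by the bare category name, keeps identically named categories under different parents apart.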
Provides specific functionality for activities related to the size of data subsets. For instance, data subsets can be categorized based on their size.
Bases: kdvs.fw.Categorizer.Categorizer
Categorizer that checks the size of the data subset (in terms of the number of associated variables, also called ‘rows’ in KDVS internal implementation terminology) and classifies it into one of two categories: ‘lesser than’ (if size <= threshold) or ‘greater than’ (if size > threshold).
Parameters:
    size_threshold : integer
    ID : string
    size_lesser_category : string
    size_greater_category : string
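The threshold rule can be sketched as follows; note that a subset whose size equals the threshold falls into the ‘lesser than’ category. The class name `SizeCategorizerSketch` and its `categorize()` and `threshold()` methods are hypothetical; the four constructor parameters are the documented ones, and a list of rows stands in for a data subset.

```python
class SizeCategorizerSketch:
    """Hypothetical stand-in for the size categorizer described above."""

    def __init__(self, size_threshold, ID, size_lesser_category, size_greater_category):
        self.size_threshold = int(size_threshold)
        self.ID = ID
        self.size_lesser_category = size_lesser_category
        self.size_greater_category = size_greater_category

    def categorize(self, subset_rows):
        # Size is the number of variables ('rows') in the subset.
        size = len(subset_rows)
        # Boundary case: size == threshold counts as 'lesser than'.
        if size <= self.size_threshold:
            return self.size_lesser_category
        return self.size_greater_category

    def threshold(self):
        # Return the size threshold as an integer.
        return self.size_threshold


sc = SizeCategorizerSketch(size_threshold=3, ID="size_cat",
                           size_lesser_category="<=3", size_greater_category=">3")
print(sc.categorize([[0.1]] * 3))  # <=3
print(sc.categorize([[0.1]] * 4))  # >3
```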
Return the size threshold for this categorizer as an integer.
Bases: kdvs.fw.Categorizer.Orderer
Concrete Orderer closely associated with data-driven activities. It is used to change the ordering of elements according to the sizes of their associated data subsets. For instance, one may want data subsets to be processed starting from the largest ones and progressing towards smaller ones. The build() method of this class accepts an iterable of tuples (pkcID, size), where ‘pkcID’ is a PKC (prior knowledge concept) ID and ‘size’ is the size of the associated data subset (in terms of the number of associated variables, also called ‘rows’ in KDVS internal implementation terminology).
NOTE: the iterable given to this class as input must already be sorted in descending order of associated data subset size.
Parameters:
    descending : boolean
Build the appropriate order from the given iterable of tuples (pkcID, size). This method expects an iterable that is already sorted in descending order (i.e. starting from the largest data subsets).
Parameters:
    pkc2ss : iterable of (string, integer)
Return the specific order for this orderer. NOTE: the order is an iterable of pkcIDs alone; the size information is omitted, but the original iterable can still be accessed as self.pkc2ss.
Returns:
    order : iterable of string
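A minimal sketch of the size orderer follows, including how a caller might pre-sort the (pkcID, size) tuples as the NOTE requires. Assumptions: the class `SizeOrdererSketch` and its `order()` method are hypothetical names, and `descending=False` is assumed to reverse the pre-sorted input into smallest-first order; the documentation above does not spell out that behaviour.

```python
class SizeOrdererSketch:
    """Hypothetical stand-in for the size orderer described above."""

    def __init__(self, descending=True):
        self.descending = descending
        self.pkc2ss = None
        self.ordering = None

    def build(self, pkc2ss):
        # Keep the original (pkcID, size) iterable as given; pass a list or
        # tuple rather than a one-shot generator so it stays re-readable.
        self.pkc2ss = pkc2ss
        # Input is assumed to be pre-sorted by descending size already.
        ids = [pkcID for pkcID, _size in pkc2ss]
        # Assumption: descending=False flips the order to smallest-first.
        self.ordering = ids if self.descending else ids[::-1]

    def order(self):
        # Only the pkcIDs; sizes remain available via self.pkc2ss.
        return self.ordering


sizes = {"pkcA": 12, "pkcB": 40, "pkcC": 7}
# Pre-sort by descending subset size, as the build() contract requires.
pkc2ss = sorted(sizes.items(), key=lambda t: t[1], reverse=True)

largest_first = SizeOrdererSketch(descending=True)
largest_first.build(pkc2ss)
print(largest_first.order())  # ['pkcB', 'pkcA', 'pkcC']

smallest_first = SizeOrdererSketch(descending=False)
smallest_first.build(pkc2ss)
print(smallest_first.order())  # ['pkcC', 'pkcA', 'pkcB']
```

Keeping sorting outside the orderer (the caller sorts once, the orderer only projects and optionally reverses) matches the documented requirement that the input arrive already sorted.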