Module orngDisc implements some functions and classes that can be
used for categorization of continuous attributes. While core Orange
already has several classes that can help in this task, currently,
orngDisc provides a function that may help in entropy-based
discretization (Fayyad & Irani), and a wrapper around classes for
categorization that can be used for learning.
- entropyDiscretization(data)
- A function that takes the classified data set
(data) and categorizes all continuous attributes using the
entropy based discretization (orange.EntropyDiscretization). After
categorization, attribute that were categorized to a single interval
(to a constant value) are removed from the data set. Returns the data
set that includes all categorical and discretized continuous
attributes from the original data set data.
- EntropyDiscretization([data])
- This is simple wrapper class around the function
entropyDiscretization
. Once invoked it would either
create an object that can be passed a data set for discretization, or
if invoked with the data set, would return a discretized data set:
discretizer = orngDisc.EntropyDiscretization()
disc_data = discretizer(data)
another_disc_data = orngDisc.EntropyDiscretization(data)
- DiscretizedLearner([baseLearner[, examples[, discretizer[, name]]]])
- This class allows to set an learner object, such that before
learning a data passed to a learner is discretized. In this way we can
prepare an object that lears without giving it the data, and, for
instance, use it in some standard testing procedure that repeats
learning/testing on several data samples. Default procedure for
discretization (discretizer) is
orngDisc.EntropyDiscretization
. An example on how such
learner is set and used in ten-fold cross validation is given
below:
bayes = orange.BayesLearner()
dBayes = orngDisc.DiscretizedLearner(bayes, name='disc bayes')
dbayes2 = orngDisc.DiscretizedLearner(bayes, name="EquiNBayes", \
discretizer=orange.Preprocessor_discretize(\
method=orange.EquiNDiscretization(numberOfIntervals=10)))
results = orngEval.CrossValidation([dBayes], data)
classifier = orngDisc.DiscretizedLearner(bayes, examples=data)
Examples
A chapter on feature subset selection in Orange for Beginners tutorial shows the use of DiscretizedLearner. Other discretization classes from core Orange are listed in chapter on categorization of the same tutorial.
References
UM Fayyad and KB Irani. Multi-interval discretization of continuous valued attributes for classification learning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1022--1029, Chambery, France, 1993.