Lookup classifiers predict classes by looking into stored lists of cases. There are two kinds of such classifiers in Orange. The simpler and faster kind uses up to three discrete attributes and holds a stored mapping from the values of those attributes to the class value. The more complex kind stores an ExampleTable and predicts the class by matching the example against the examples in the table.
The natural habitat of these classifiers is feature construction: they usually reside in the getValueFrom fields of constructed attributes to facilitate their automatic computation. For instance, the following script shows how to translate the Monk 1 dataset features into a more useful subset that includes only the attributes a, b and e, plus attributes that tell whether a and b are equal and whether e is 1 (don't bother with the details; they follow later).
part of ClassifierByLookupTable.py (uses monk1.tab)
We can check the correctness of the script by printing out several random examples from data2.
The first ClassifierByLookupTable takes the values of attributes a and b and computes the value of ab according to the rule given in the table. The first three values correspond to a=1 and b=1, 2, 3; for the first combination, the value of ab should be "yes", while for the other two, a and b are different. The next triplet corresponds to a=2; here, the middle value is "yes"...
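The indexing scheme behind this can be mimicked in plain Python. The following is a conceptual sketch, not the Orange API; the names and the zero-based arithmetic are our own:

```python
# Values of a and b range over 1..3; the lookup list has one entry per
# (a, b) combination, ordered with a most significant:
# (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)
lookup_ab = ["yes", "no", "no",
             "no", "yes", "no",
             "no", "no", "yes"]

def classify_ab(a, b, n_b_values=3):
    # index = (a-1) * number_of_b_values + (b-1), zero-based
    return lookup_ab[(a - 1) * n_b_values + (b - 1)]
```

For instance, classify_ab(2, 2) looks up index 4 and returns "yes", matching the rule that ab is "yes" exactly when a and b are equal.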
The second lookup is simpler: since it involves only a single attribute, the list is a simple one-to-one mapping from the four-valued e to the two-valued e1. The last value in the list is returned when e is unknown and tells that e1 should then be unknown as well.
Note that you don't need ClassifierByLookupTable for this. The new attribute e1 could be computed with a callback to Python, for instance:
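A sketch of such a callback follows. In Orange the callback would receive an example object (and a flag saying what to return); here a plain dict stands in for the example, and the value names are assumptions:

```python
def compute_e1(example):
    """Pure-Python stand-in for the callback: maps the four-valued e
    to the two-valued e1 ("1" vs. "not 1")."""
    e = example.get("e")
    if e is None:
        return None  # unknown e gives unknown e1
    return "1" if e == "1" else "not 1"
```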
While functionally the same, using classifiers by lookup table is faster.
Although the above example used ClassifierByLookupTable as if it were a concrete class, ClassifierByLookupTable is actually abstract. Calling its constructor is a typical Orange trick: what you get is never a ClassifierByLookupTable, but one of ClassifierByLookupTable1, ClassifierByLookupTable2 and ClassifierByLookupTable3. As their names tell, the first classifies using a single attribute (that is what we had for e1), the second uses a pair of attributes (and was constructed for ab above), and the third uses three attributes. Class predictions for each combination of attribute values are stored in a (one-dimensional) table. To classify an example, the classifier computes the index of the table element that corresponds to the combination of the example's attribute values.
These classifiers are built to be fast, not safe. If you, for instance, change the number of values of one of the attributes, Orange will most probably crash. To protect you somewhat, many of these classes' attributes are read-only and can only be set when the object is constructed.
Attributes

variable1, variable2, variable3
The attributes used by the classifier. ClassifierByLookupTable1 only has variable1, ClassifierByLookupTable2 also has variable2, and ClassifierByLookupTable3 has all three.

noOfValues1, noOfValues2, noOfValues3
The numbers of values of the corresponding attributes. These are stored here to make the classifier faster. They are defined only for ClassifierByLookupTable2 (the first two) and ClassifierByLookupTable3 (all three).

lookupTable
A list of values (a ValueList), one for each possible combination of the attributes. For ClassifierByLookupTable1, there is an additional element that is returned when the attribute's value is unknown. Values are ordered by the values of the attributes, with variable1 being the most significant. In the case of two three-valued attributes, the list order is therefore 1-1, 1-2, 1-3, 2-1, 2-2, 2-3, 3-1, 3-2, 3-3, where the first digit corresponds to variable1 and the second to variable2.
The list is read-only in the sense that you cannot assign a new list to this field. You can, however, change its elements. Don't change its size, though.
distributions
Similar to lookupTable, but of type DistributionList; it stores a distribution for each combination of values.
dataDescription
An object of type EFMDataDescription, defined only for ClassifierByLookupTable2 and ClassifierByLookupTable3. They use it to make predictions when one or more attribute values are unknown. ClassifierByLookupTable1 doesn't need it, since this case is covered by the additional element in lookupTable and distributions, as told above.

Methods
constructor
If lookupTable and distributions are omitted, the constructor also initializes lookupTable and distributions to two lists of the right sizes, but their elements are don't-knows and empty distributions. If they are given, they must be of the correct size.

getindex(example)
Returns the index of the element of lookupTable or distributions that corresponds to the given example. The formula depends upon the type of the classifier. If valuei is int(example[variablei]), the corresponding formulae are

ClassifierByLookupTable1: index = value1, or len(lookupTable)-1 if the value is unknown
ClassifierByLookupTable2: index = value1*noOfValues2 + value2, or -1 if any value is unknown
ClassifierByLookupTable3: index = (value1*noOfValues2 + value2)*noOfValues3 + value3, or -1 if any value is unknown

Let's see some indices for randomly chosen examples from the original table.
part of ClassifierByLookupTable.py (continued from above) (uses monk1.tab)
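The index arithmetic can be spelled out in plain Python. This is a sketch assuming zero-based value indices and None for unknown values; it is not the Orange API:

```python
def getindex2(value1, value2, n2):
    """Index for two attributes; variable1 is most significant and
    n2 is the number of values of variable2."""
    if value1 is None or value2 is None:
        return -1  # any unknown value maps to -1
    return value1 * n2 + value2

def getindex3(value1, value2, value3, n2, n3):
    """Index for three attributes: the same scheme, one level deeper."""
    if None in (value1, value2, value3):
        return -1
    return (value1 * n2 + value2) * n3 + value3
```

For two three-valued attributes, getindex2 enumerates the combinations in the order 1-1, 1-2, 1-3, 2-1, ... described above.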
ClassifierByExampleTable is the alternative to ClassifierByLookupTable. It is to be used when the classification is based on more than three attributes. Instead of having a lookup table, it stores an ExampleTable, which is optimized for faster access.
This class is used in similar contexts as ClassifierByLookupTable. If you write, for instance, a constructive induction algorithm, it is recommended that the values of the new attribute be computed either by one of the classifiers by lookup table or by ClassifierByExampleTable, depending on the number of bound attributes.
Attributes

sortedExamples
An ExampleTable with sorted examples for lookup. Examples in the table can be merged; if there were multiple examples with the same attribute values (but possibly different classes), they are merged into a single example. Regardless of merging, the class values in this table are distributed: their svalue contains a Distribution.

classifierForUnknown
The classifier used for examples that cannot be classified from the table. If classifierForUnknown is not set, don't-knows are returned.

variables
A tuple with the attributes used by the classifier. This field is here so that ClassifierByExampleTable appears more similar to ClassifierByLookupTable. If a constructive induction algorithm returns the result in one of these classifiers and you would like to check which attributes are used, you can use variables regardless of the class you actually got.

There are no specific methods for ClassifierByExampleTable. Since this is a classifier, it can be called. When the example to be classified includes unknown values, classifierForUnknown will be used if it is defined.
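Conceptually, the classifier behaves like a dictionary keyed by attribute values, with a fallback for cases it cannot handle. The following is a rough pure-Python stand-in (the class and its names are made up, not the Orange API):

```python
class TableClassifier:
    """Maps tuples of attribute values to class values, with an
    optional fallback classifier for values not found in the table."""
    def __init__(self, table, classifier_for_unknown=None):
        self.table = table
        self.classifier_for_unknown = classifier_for_unknown

    def __call__(self, values):
        if values in self.table:
            return self.table[values]
        if self.classifier_for_unknown is not None:
            return self.classifier_for_unknown(values)
        return None  # "don't know"
```

For example, TableClassifier({("1", "1"): "yes"}) returns "yes" for ("1", "1") and None for anything else, unless a fallback classifier is supplied.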
Although ClassifierByExampleTable is not really a classifier in the sense that you will use it to classify examples, but rather a function for the computation of intermediate values, it has an associated learner, LookupLearner. The learner's task is, basically, to construct the ExampleTable for sortedExamples. It sorts the examples, merges them and, of course, takes example weights into account in the process as well.
part of ClassifierByExampleTable.py (uses monk1.tab)
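The sort-and-merge step the learner performs can be sketched in plain Python: duplicates by attribute values collapse into a single entry whose class value carries a distribution. This is a conceptual stand-in, not the Orange implementation (and it ignores example weights):

```python
from collections import Counter

def merge_examples(rows):
    """rows: iterable of (attribute_values_tuple, class_value) pairs.
    Returns a sorted list of (attribute_values, majority_class, distribution)."""
    dists = {}
    for values, cls in rows:
        dists.setdefault(values, Counter())[cls] += 1
    merged = []
    for values in sorted(dists):
        dist = dists[values]
        majority = dist.most_common(1)[0][0]  # most frequent class
        merged.append((values, majority, dict(dist)))
    return merged
```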
In data_s, we have prepared a table in which examples are described only by a, b, e and the class. The learner constructs a ClassifierByExampleTable and stores examples from data_s into its sortedExamples. Examples are merged so that there are no duplicates.
Well, there's a bit more here than meets the eye: each example's class value also stores the distribution of classes for all examples that were merged into it. In our case, the three attributes suffice to unambiguously determine the classes and, since the examples covered the entire space, all distributions have 12 examples in one of the classes and none in the other.
ClassifierByExampleTable will usually be used by getValueFrom. So, we would probably continue by constructing a new attribute and putting the classifier into its getValueFrom.
There's something disturbing here. Although abe determines the value of y2, abe.classVar is still y. Orange doesn't mind (the whole example is artificial; you will seldom pack an entire dataset into a ClassifierByExampleTable...), so neither should you. But still, for the sake of hygiene, you can conclude by assigning y2 to abe.classVar.
The whole story can be greatly simplified. LookupLearner can also be called differently than other learners. Besides examples, you can pass the new class attribute and the attributes that should be used for classification. This saves us from constructing data_s and reassigning the classVar. It doesn't set the getValueFrom, though.
part of ClassifierByExampleTable.py (uses monk1.tab)
To conclude, let us show another use of LookupLearner. With the alternative call arguments, it offers an easy way to observe attribute interactions. For this purpose, we shall omit e and construct a ClassifierByExampleTable from a and b only.
part of ClassifierByExampleTable.py (uses monk1.tab)
The script's output shows how the classes are distributed for different values of a and b. For instance, when a is '1' and b is '3', the majority class is '0', and the class distribution is 36:12 in favor of '0'.
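Such per-combination class distributions amount to a contingency count. A minimal sketch of the computation on made-up data (not the monk1 dataset, and not the Orange API):

```python
from collections import Counter

def class_distributions(rows):
    """rows: iterable of (a, b, class) triples.
    Returns a dict mapping each (a, b) pair to a Counter of class values."""
    dists = {}
    for a, b, cls in rows:
        dists.setdefault((a, b), Counter())[cls] += 1
    return dists
```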