Class TransformValue is a base class for a hierarchy of classes used throughout Orange for simple transformation of values. Discretization, for instance, creates a transformer that converts continuous values into discrete ones, while continuizers do the opposite. Classification trees use transformers for binarization, where values of discrete attributes are converted into binary.
Transformers are most commonly used in conjunction with Classifiers from Attribute. It is also possible to subclass this class in Python.
Although these classes can occasionally come in very handy, you will mostly encounter them when they are created by other methods, such as discretization.
TransformValue is the abstract root of the hierarchy, itself derived from Orange. When called with a Value as an argument, it returns the transformed value.
See Classifiers from Attribute for an example of how to derive new Python classes from TransformValue.
Attributes
Ordinal2Continuous converts ordinal values to equidistant continuous values.
A four-valued attribute with, say, values 'small', 'medium', 'large' and 'extra large' would be converted to 0.0, 1.0, 2.0 and 3.0. You can also specify a factor by which the values are multiplied. If the factor for the above attribute is set to 1/3 (or, in general, to 1 divided by the number of values minus one), the new continuous attribute will have values from 0.0 to 1.0.
Attributes
factor: The factor by which the values are multiplied.
part of transformvalues-o2c.py (uses lenses.tab)
The values of attribute 'age' ('young', 'pre-presbyopic' and 'presbyopic') are transformed in the new domain to 0.0, 1.0 and 2.0. If we additionally set age_c.getValueFrom.transformer.factor to 0.5, the new values will be 0.0, 0.5 and 1.0.
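The arithmetic behind this conversion can be checked without Orange. The following is a minimal plain-Python sketch, not the Orange API: the function name ordinal_to_continuous and its parameters are made up for illustration, and the value lists are the ones mentioned in the text.

```python
def ordinal_to_continuous(values, value, factor=1.0):
    """Return the index of `value` in the ordered list `values`, times `factor`.

    Hypothetical helper that only mimics the arithmetic described above.
    """
    return values.index(value) * factor


sizes = ["small", "medium", "large", "extra large"]

# Without a factor the values become 0.0, 1.0, 2.0 and 3.0.
plain = [ordinal_to_continuous(sizes, v) for v in sizes]

# A factor of 1 / (number of values - 1) spreads them over 0.0 .. 1.0.
scaled = [ordinal_to_continuous(sizes, v, 1.0 / (len(sizes) - 1)) for v in sizes]

# The three values of 'age' with factor 0.5.
ages = ["young", "pre-presbyopic", "presbyopic"]
halved = [ordinal_to_continuous(ages, v, 0.5) for v in ages]

print(plain)
print(scaled)
print(halved)
```

With three values the factor 0.5 equals 1 / (3 - 1), which is why the transformed 'age' ends exactly at 1.0.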
Discrete2Continuous converts a discrete value to a continuous one so that some designated value is converted to 1.0 and all others to 0.0 or -1.0, depending on the settings.
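Attributes
value: The designated value, the one that is converted to 1.0. It is given as the integer index of the value rather than as a Value.
zeroBased: Decides whether the other values are transformed to 0.0 (True, default) or -1.0 (False). When False, undefined values are transformed to 0.0. Otherwise, undefined values yield an error.
A third flag inverts the transformation: if True (default is False), the transformations are reversed, so that the selected value becomes 0.0 (or -1.0) and the others 1.0.
The following examples load the Monk 1 dataset and prepare various transformations for attribute "e".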
part of transformvalues-d2c.py (uses monk1.tab)
We first construct a new continuous attribute e1, and set its getValueFrom to a newly constructed classifier that will extract the value of e from any example it is given. Then we tell the classifier to transform the obtained value using a Discrete2Continuous transformation. The transformation's value is set to the index of e's value "1"; one way to do this is to construct a Value of attribute e and cast it to an integer (if you don't understand this, use it without understanding it).
To demonstrate the use of various flags, we constructed two more attributes in a similar manner. Both are based on e and both check whether e's value is "1", except that the transformation of the new attribute e10 will not be zero-based, and the transformation of e01 will also be inverted:
part of transformvalues-d2c.py
Finally, we shall construct a new domain that will only have the original e and its transformations, and the class. We shall convert the entire table to that domain and print out the first ten examples.
part of transformvalues-d2c.py
Here's the script's output.
The difference between the second and the third attribute is that where the second has zeros, the third has -1's. The last attribute (before the class) is a reversed version of the third.
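The three transformations of e can be imitated in plain Python. This is only a sketch of the rules described above, not Orange code: the function discrete_to_continuous and the parameter names zero_based and invert are hypothetical, and an undefined value is represented by None.

```python
def discrete_to_continuous(index, value, zero_based=True, invert=False):
    """Return 1.0 if `index` equals the designated `value`, else the low value.

    Hypothetical helper: `index` is the integer index of the discrete value,
    or None if the value is undefined.
    """
    low = 0.0 if zero_based else -1.0
    if index is None:
        # Undefined values become 0.0 only in the -1.0/1.0 setting;
        # in the zero-based setting they are an error.
        if zero_based:
            raise ValueError("undefined value cannot be transformed")
        return 0.0
    hit = (index == value)
    if invert:
        hit = not hit
    return 1.0 if hit else low


# e has four values; value "1" has index 0.
rows = []
for index in range(4):
    e1 = discrete_to_continuous(index, 0)
    e10 = discrete_to_continuous(index, 0, zero_based=False)
    e01 = discrete_to_continuous(index, 0, zero_based=False, invert=True)
    rows.append((e1, e10, e01))
    print(index, e1, e10, e01)
```

For the designated value the three results are 1.0, 1.0 and -1.0; for any other value they are 0.0, -1.0 and 1.0.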
You can, of course, "divide" a single attribute into a number of continuous attributes. The original attribute e has four possible values; let's create four new attributes, each corresponding to one of e's values.
part of transformvalues-d2c.py (uses monk1.tab)
The output of this script is
Transformer NormalizeContinuous takes a continuous value and keeps it continuous, but subtracts the average and divides the difference by half of the span; v' = (v-average) / span
Attributes
average: The value that is subtracted from the original value.
span: The divisor.
The following script "normalizes" all attributes in the Iris dataset by subtracting the average value and dividing by half of the deviation.
part of transformvalues-nc.py (uses iris.tab)
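The formula v' = (v-average) / span can be tried out on a plain list of numbers. The sketch below is plain Python and does not use Orange. The function name normalize_continuous is made up, and taking the mean as the average and half of the range as the span is an assumption made only for this illustration.

```python
def normalize_continuous(v, average, span):
    """Apply v' = (v - average) / span to a single value (hypothetical helper)."""
    return (v - average) / span


column = [2.0, 4.0, 6.0]

# Assumed choice for the illustration: mean and half of the range.
average = sum(column) / len(column)
span = (max(column) - min(column)) / 2.0

normalized = [normalize_continuous(v, average, span) for v in column]
print(normalized)
```

With this choice of average and span the smallest value maps to -1.0 and the largest to 1.0.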
MapIntValue is a discrete-to-discrete transformer that changes values according to the given mapping. MapIntValue is used for binarization in decision trees.
Attributes
mapping: A list that gives the new value for each original value, v' = mapping[v]. Undefined values remain undefined. The mapping is indexed by integer indices and contains integer indices of values.
The following script transforms the value of 'age' in dataset lenses from 'young' to 'young', and from 'pre-presbyopic' and 'presbyopic' to 'old'.
part of transformvalues-miv.py (uses lenses.tab)
The mapping tells that the 0th value of age goes to the 0th, while the 1st and 2nd go to the 1st value of age_b.
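The lookup v' = mapping[v] is easy to reproduce in plain Python. The following sketch is not Orange code: the helper names map_int_value and transform_age are hypothetical, and the value lists and the mapping [0, 1, 1] are taken from the description above.

```python
def map_int_value(mapping, index):
    """Return mapping[index]; an undefined value (None) stays undefined."""
    if index is None:
        return None
    return mapping[index]


age_values = ["young", "pre-presbyopic", "presbyopic"]
age_b_values = ["young", "old"]

# 0th value goes to the 0th, the 1st and 2nd go to the 1st value of age_b.
mapping = [0, 1, 1]


def transform_age(name):
    """Translate a value name of 'age' into a value name of 'age_b'."""
    return age_b_values[map_int_value(mapping, age_values.index(name))]


for name in age_values:
    print(name, "->", transform_age(name))
```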
In the example on the use of NormalizeContinuous we have already seen how to transform all attributes of some dataset and prepare the corresponding new dataset. This operation is rather common, so it makes sense to have a few classes for accomplishing this task. Such a class is inevitably less flexible than per-attribute transformations, since no specific options can be set for individual attributes. For instance, DomainContinuizer, which will be introduced below, can be told how to treat multinomial attributes, but the same treatment then applies to all such attributes. If some of your attributes need specific treatment, you will have to program the individual treatments yourself, in a manner similar to what we showed while introducing NormalizeContinuous.
DomainContinuizer is a class that, given a domain or a set of examples, returns a new domain containing only continuous attributes. If examples are given, the original continuous attributes can be normalized, while for discrete attributes it is possible to use the most frequent value as the base. The attributes are treated according to their type:
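Continuous attributes are kept as they are, unless normalization is requested. Binary attributes are transformed via Discrete2Continuous into 0.0/1.0 or -1.0/1.0 continuous attributes; attributes with more values are treated as specified by multinomialTreatment.
The fate of the class attribute is determined specifically.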
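Attributes
zeroBased: Corresponds to the zeroBased flag of class Discrete2Continuous and determines the value used as the "low" value of the attribute. When binary attributes are transformed into continuous ones, or when a multivalued attribute is transformed into multiple attributes, the transformed attribute can have either values 0.0 and 1.0 (default, zeroBased=True) or -1.0 and 1.0. In the following text, we will assume that zeroBased is True and use 0.0.
multinomialTreatment: Decides the treatment of multinomial attributes. Let N be the number of the attribute's values. The treatments demonstrated in the examples below are NValues, LowestIsBase, FrequentIsBase, Ignore, AsOrdinal and AsNormalizedOrdinal. With LowestIsBase, if the attribute has baseValue set, the specified value is used as base instead of the lowest one. With FrequentIsBase, if the attribute has baseValue set, the specified value is used instead of the most frequent. The ordinal treatments replace the attribute with the index of its value (see Ordinal2Continuous).
A further flag controls normalization: if True (not by default), continuous attributes are "normalized": the average value is subtracted from them and the result is divided by the deviation. This is only possible when the continuizer is given the data, not only the domain.
The treatment of the class attribute is set separately. Its options do not all mean the same as in multinomialTreatment, where Ignore denotes omitting the attribute.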
Let us first examine the effect of multinomialTreatment on attributes from dataset "bridges". To be able to follow the transformations, we shall first print out a description of the domain and the 15th example in the dataset.
part of transformvalues-domain.py (uses bridges.tab)
We'll show the output in a moment. Let us now use the lowest values as the bases and continuize the attributes.
part of transformvalues-domain.py
Here's what we get; to the left, we've added the original example and the domain description, so that we can see what happens.
The first, four-valued attribute River is replaced by three attributes corresponding to values "A", "O" and "Y". For the 15th example, River is "M", so all three attributes are 0.0. The continuous year is left intact. Of the three attributes that describe the purpose of the bridge, "PURPOSE=RR" is 1.0 since this is a railroad bridge. The value of the three-valued "REL-L" is undefined in the original example, so the corresponding two attributes in the new domain are undefined as well...
In the next test, we replaced continuizer.LowestIsBase by continuizer.FrequentIsBase, instructing Orange to use the most frequent values for base values.
Comparing the outputs, we notice that for the first attribute, "A" is chosen as the base value instead of "M", so the three new attributes tell whether the bridge is over "M", "O" or "Y". As for Purpose, nothing changes since highway bridges are the most frequent. The base value also changes for the binary Clear-G, since G is more frequent than N...
The next alternative is continuizer.NValues, which turns N-valued attributes into N attributes, except for N==2, where we still get the binary attribute, using the lowest value for the base.
The least exciting case is continuizer.Ignore, which reduces the attribute set to continuous attributes.
The last two variations retain the number of attributes, but turn them into continuous ones. continuizer.AsOrdinal looks like this.
For instance, the value of C_Purpose is 2.000 since Purpose has its value with index 2 (if we start counting from 0). Finally, continuizer.AsNormalizedOrdinal normalizes the new continuous attributes to the range 0.0 - 1.0.
Values of Purpose now transform to 0.000, 0.333, 0.667 and 1.000; for railroad bridges, the corresponding value is 0.667.
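The treatments of a single multinomial value can be compared side by side in plain Python. The sketch below does not use Orange or its DomainContinuizer. All three function names are hypothetical, and both the order of the River values and the names and order of the Purpose values are assumed for illustration (only "M" as the lowest value of River and the railroad value at index 2 of Purpose follow from the text).

```python
def n_values(values, value, low=0.0):
    """One new attribute per value: 1.0 for the matching value, `low` elsewhere."""
    return [1.0 if v == value else low for v in values]


def with_base(values, value, base, low=0.0):
    """Like n_values, but without an attribute for the base value."""
    return [1.0 if v == value else low for v in values if v != base]


def as_ordinal(values, value, normalized=False):
    """Index of the value, optionally scaled to the range 0.0 .. 1.0."""
    index = float(values.index(value))
    return index / (len(values) - 1) if normalized else index


# Assumed value order: "M" is the lowest value, "A" the most frequent.
river = ["M", "A", "O", "Y"]
lowest_is_base = with_base(river, "M", base="M")    # attributes for A, O, Y
frequent_is_base = with_base(river, "M", base="A")  # attributes for M, O, Y
all_values = n_values(river, "M")

# Assumed value names; the text only fixes the railroad value at index 2.
purpose = ["HIGHWAY", "AQUEDUCT", "RR", "WALK"]
ordinal = as_ordinal(purpose, "RR")
normalized_ordinal = round(as_ordinal(purpose, "RR", normalized=True), 3)

print(lowest_is_base, frequent_is_base, all_values)
print(ordinal, normalized_ordinal)
```

Passing low=-1.0 to n_values or with_base corresponds to switching off zeroBased.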