orange.Example holds examples: lists of attribute values, together with some auxiliary data. That is how you see them in Python: as a list that resembles an ordinary Python list to the extent possible. There are, however, differences; each example corresponds to some domain, and therefore the number of attributes ("list elements") and their types are always as prescribed.
Attributes
Examples cannot be assigned arbitrary attributes like other Python and Orange objects ("attribute" is meant here in the sense of "class attribute"). For instance, if ex is an example, ex.xxx=12 will yield an error.
To construct a new example, you first need a domain description. You can construct one yourself or load it from a file (which otherwise also contains some examples). For the sake of simplicity, we shall load a domain from the "lenses" dataset.
part of example.py (uses lenses.tab)
orange.Value) and must be of appropriate length, one value for each corresponding attribute.
The first example was constructed by giving values as strings. That is what you will usually do; continuous values can, naturally, be given as numbers (or as strings, if you so desire). In the second example, we have shown alternatives: the second and the third values are given by indices, and for the fourth we have constructed an orange.Value (something that orange would do for us automatically anyway if we just passed a string).
If domain is the same as the original example's domain, this constructor is equivalent to the previous one.
Whereas orange.Example(domain, example) fills the example with values obtained from another example, this constructor fills the example with values obtained from multiple examples. The needed values are sought among the ordinary and meta-attributes registered with the corresponding domains. Meta-attributes that appear in the given examples but do not appear in the new example, either as ordinary or as meta-attributes, are copied as well.
We shall demonstrate the function on the datasets merge1.tab and merge2.tab; the first has attributes a1 and a2 and meta-attributes m1 and m2, while the second has attributes a1 and a3 and meta-attributes m1 and m3.
example_merge.py (uses merge1.tab, merge2.tab)
The newdomain consists of several attributes from data1 and data2: a1, a3 and m1 are ordinary, and m2 and a2 are meta-attributes. Variables m1 and m2 are really tuples of a meta-id and a descriptor (Variable). For this reason, orange.Domain is initialized with m1[1], the descriptor, while when adding meta-attributes we use m2[0] and m2[1], so that m2 has the same id in both domains. For the meta-attribute a2, which was originally ordinary, we obtain a new id.
In addition, newdomain has two new attributes, n1 and n2, the first as an ordinary and the second as a meta-attribute.
Since attributes a1 and m1 appear in the domains of both original examples, the new example can only be constructed if these values match. They indeed do, and the merged example has all the values defined in the domain (a1, a3 and m1, and meta-attributes a2 and m2). In addition, it got the value of the meta-attribute m3 from the second example, which is only identified by id -4 since it is not registered with the domain. Values of the two new attributes are left undefined.
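The merging rules described above can be mimicked in plain Python. The following is only a toy sketch, not Orange code: dictionaries stand in for examples, attribute names stand in for descriptors and ids, the function names merge_examples and lookup are hypothetical, and all values are made up for illustration.

```python
# Toy model of merging several examples; plain dicts replace Orange objects.

def lookup(source, name):
    """Return [value] if `name` is an ordinary or meta value of `source`, else []."""
    for part in (source["ordinary"], source["meta"]):
        if name in part:
            return [part[name]]
    return []


def merge_examples(wanted, sources):
    """Fill the attributes listed in `wanted` from the given source examples.

    A value found in several sources must be the same in all of them;
    a value found in none of them is left undefined (None).
    Returns the filled values and the extra, unregistered meta values.
    """
    merged = {}
    for name in wanted:
        found = [value for src in sources for value in lookup(src, name)]
        if any(value != found[0] for value in found):
            raise ValueError("conflicting values for %r: %r" % (name, found))
        merged[name] = found[0] if found else None

    # Meta values that the new domain does not know about are copied as well.
    extra = {}
    for src in sources:
        for name, value in src["meta"].items():
            if name not in merged:
                extra.setdefault(name, value)
    return merged, extra


# Made-up values; only the attribute layout follows merge1.tab and merge2.tab.
first = {"ordinary": {"a1": "1", "a2": "2"}, "meta": {"m1": "3", "m2": "4"}}
second = {"ordinary": {"a1": "1", "a3": "5"}, "meta": {"m1": "3", "m3": "6"}}

merged, extra = merge_examples(
    ["a1", "a3", "m1", "m2", "a2", "n1", "n2"], [first, second])
print(merged)
print(extra)
```

In this sketch a1 and m1 agree in both sources, so the merge succeeds; n1 and n2 remain undefined, and m3 is carried along even though the target did not ask for it.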
Examples have certain list-like behaviour. You can address their values. You can use for loops to iterate through an example's values (for value in example:...). You can query an example's length; it equals the number of attributes, including the class attribute. You cannot, however, change the "length" of an example by inserting or removing attributes. The number and types of attributes are defined by the domain, and the domain cannot be changed once the example is constructed. Finally, you can convert an example to an ordinary Python list.
Examples can be indexed by integer indices, attribute descriptors or attribute names. Since "age" is the first attribute in the lenses dataset, the statements below are equivalent.
Example's values can be modified. We shall increase the age (and if it becomes larger than 2, reset it to 0).
The lesson we have learned along the way is that example = data[0] does not give us a fresh copy of the example but a reference to the first example in data. If you need a fresh copy, you need to clone the example, as explained above.
The last value in the example is the class value. Do not access it by example[-1], since this is reserved for future use (with meta values); use getclass and setclass instead.
Methods
Value
.Value
, number or string.
value should be a qualified orange.Value, that is, it should have the field variable defined. value.variable should be one of the attributes in the example's domain (either ordinary or a registered meta-attribute). The function sets the value of the attribute to the given value. It is equivalent to calling self[value.variable] = value.
This function makes it easy to assign prescribed values to examples; see an example in the section about meta values.
Value
; if it is 0, the list will contain native Python objects - strings for discrete and numbers for continuous attribute values).
The hash function for an example (accessible via Python's built-in function hash, see Python documentation) is computed using CRC32. To some extent, you can also use it as a random number (this is done, for instance, by RandomClassifier).
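To give a feeling for how a CRC32-based hash can double as a repeatable "random" number, here is a small stand-alone sketch. It does not reproduce Orange's actual computation: the way values are encoded into bytes is an assumption, the function names example_hash and pseudo_random_choice are hypothetical, and the values are made up.

```python
import zlib


def example_hash(values):
    """CRC32 over a textual encoding of the values (the encoding is an assumption)."""
    payload = "\x1f".join(str(value) for value in values).encode("utf-8")
    return zlib.crc32(payload) & 0xFFFFFFFF


def pseudo_random_choice(values, n_choices):
    """Derive a repeatable index in range(n_choices) from the example's contents."""
    return example_hash(values) % n_choices


# Made-up values standing in for one example.
a = ["young", "myope", "no", "reduced", "none"]
b = list(a)  # a copy with the same values

print(example_hash(a) == example_hash(b))  # equal contents give equal hashes
print(pseudo_random_choice(a, 3))          # same example, same "random" pick
```

The point of the design is that the result depends only on the example's contents, so a classifier built this way gives the same answer each time it sees the same example.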
Data examples in Orange are described by a fixed number and types of values, defined by the domain descriptor. There is, however, a way to attach additional attributes to examples. Such attributes (we call them meta-attributes) are not used for learning, but can carry additional information, such as a patient's name or the number of times the example was misclassified during some test procedure. The most common additional information is the example's weight. To make things even more complex, we have already encountered problems for which examples had to have more than one weight each.
In contrast to ordinary attributes, examples from the same domain (or even the same ExampleTable) can have a varying number of meta values. Ordinary attributes are addressed by position (e.g. example[0] is the first and example[4] is the fifth value in the example). Meta-attributes are addressed by ids; ids are really negative integers, but you should see them as "keys". An example can have any number of meta values with distinct ids. The domain descriptor can, but does not need to, know about them.
Ids are "created" by the function orange.newmetaid(). (The function uses a very elaborate procedure for generating unique negative integers; the procedure might reveal itself only to the brightest if they make a few calls to the function and carefully observe the returned values.) So, if you want to assign meta values to examples, you need to obtain an id from orange.newmetaid(); afterwards, you can use it on any examples you want.
If there is a particular attribute associated with the meta value, you can also pass the attribute as an argument to orange.newmetaid. If the attribute has already been registered with some id, the id can be reused. Doing so is recommended, but not necessary.
Meta values can also be loaded from files in tab-delimited or Excel format. In this case, you only need to know the names of corresponding meta-attributes; id's and stuff will be taken care of while loading the data. See documentation on file format.
Most often, you will use id for assigning weights; to each example you would assign a number (can be greater or smaller than one, most algorithms will even tolerate negative weights) and pass the id to the learning algorithm. Let's do this with random weights.
example2.py (uses lenses.tab)
The example now consists of two parts: ordinary attributes, which resemble a list since they are addressed by position (e.g. the first value is "psby"), and meta values, which are more like a dictionary, where the id (-2) is a key and 0.34 is a value (of type orange.Value, as are all values in Example).
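The two-part structure can be pictured with a small stand-alone class. This is only an illustration under stated assumptions, not Orange's implementation: ToyExample and new_meta_id are hypothetical names, new_meta_id assumes ids simply count down from -1, and every value except "psby" is made up.

```python
import itertools
import random

_ids = itertools.count(-1, -1)  # assumed scheme: -1, -2, -3, ...


def new_meta_id():
    """Hypothetical stand-in for orange.newmetaid(): a fresh negative integer."""
    return next(_ids)


class ToyExample:
    """A fixed-length list of ordinary values plus a dictionary of meta values."""

    def __init__(self, values):
        self.values = list(values)  # ordinary values, addressed by position
        self.metas = {}             # meta values, addressed by negative id

    def __getitem__(self, key):
        return self.metas[key] if key < 0 else self.values[key]

    def __setitem__(self, key, value):
        if key < 0:
            self.metas[key] = value   # any number of meta values is allowed
        else:
            self.values[key] = value  # IndexError past the end: length is fixed

    def __len__(self):
        return len(self.values)       # meta values do not count


weight_id = new_meta_id()
rng = random.Random(0)  # seeded, so that the weights are repeatable
examples = [ToyExample(["psby", "hyper", "y", "normal", "no"]) for _ in range(3)]
for ex in examples:
    ex[weight_id] = rng.random()  # attach a random weight as a meta value

print(len(examples[0]), examples[0][0], examples[0].metas)
```

Once obtained, the same weight_id is used for every example, which is what lets a learner find the weight by id alone.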
To make a learner aware of weights, one only needs to pass the id as an additional argument. Therefore, to train a Bayesian classifier on our randomly weighted examples, you would pass it the examples together with the weight id.
Many other functions accept weights in similar fashion.
It is easy to see how this system also accommodates examples having different weights to be used for different procedures in the same experimental setup.
As mentioned in the documentation on orange.Domain, you can enhance the output by registering an attribute descriptor for the meta-attribute with id -2 in the example's domain.
Meta-attribute can now be indexed just as any other attribute:
A more important consequence of registering an attribute with the domain is that it enables automatic value conversion.
Let us add a nominal meta-attribute, which will tell whether the example has been double-checked by the domain expert. The attribute's values will be "yes" and "no".
part of example3.py (uses lenses.tab)
This cannot work, since we have not told Orange that ok_id corresponds to the attribute ok, and thus it cannot convert the string "yes" to an orange.Value. You should perform the conversion manually.
However, if you register the meta-attribute with the domain descriptor, Orange can find a descriptor and perform the conversion itself.
As before, you can use either the id ok_id, the attribute descriptor ok or the attribute's name "ok?" to index the example.
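Why registration makes the conversion possible can be shown with a toy model. The classes and functions below (ToyVariable, ToyDomain, set_meta) are hypothetical and written only for this illustration; they are not part of Orange, and the id -5 is arbitrary.

```python
class ToyVariable:
    """A nominal attribute descriptor: a name and a list of allowed values."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def convert(self, raw):
        """Turn a string into the index of the matching nominal value."""
        return self.values.index(raw)  # ValueError for an unknown string


class ToyDomain:
    """Keeps track of which meta id belongs to which descriptor."""

    def __init__(self):
        self.metas = {}  # meta id -> descriptor

    def add_meta(self, meta_id, variable):
        self.metas[meta_id] = variable

    def resolve(self, key):
        """Accept a meta id, a descriptor or a name; return the meta id."""
        for meta_id, variable in self.metas.items():
            if key == meta_id or key is variable or key == variable.name:
                return meta_id
        raise KeyError(key)  # nothing registered: no way to convert


def set_meta(domain, metas, key, raw):
    """Convert `raw` with the registered descriptor and store it under its id."""
    meta_id = domain.resolve(key)
    metas[meta_id] = domain.metas[meta_id].convert(raw)


ok = ToyVariable("ok?", ["no", "yes"])
ok_id = -5  # arbitrary id chosen for the sketch
domain = ToyDomain()
metas = {}

try:
    set_meta(domain, metas, ok_id, "yes")
except KeyError:
    print("id not registered: no descriptor to convert 'yes' with")

domain.add_meta(ok_id, ok)
for key in (ok_id, ok, "ok?"):  # id, descriptor and name all work now
    set_meta(domain, metas, key, "yes")
print(metas)
```

Before registration the string cannot be interpreted at all; after it, the id, the descriptor and the name all lead to the same descriptor and hence to the same stored value.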
It is even possible to use the meta-attribute with the setvalue function.
Methods
int (default), str or orange.Variable, and determines whether the keys in the dictionary will be meta-ids, attribute names or attribute descriptors. In the latter two cases, the function will only return the meta values that are registered in the domain (there are no descriptors or names associated with other values). In either case, the dictionary contains only a copy of the values: changing the dictionary will not affect the example's meta values.
The argument 'optional' tells the method to return only the optional or only the non-optional meta-attributes: the attributes whose optional flag equals the given value are returned. If the argument is absent, both types of attributes are returned.
The below code will print out the dictionary with all four possible key-types.
part of basket.py (uses inquisition2.basket)
None, the function returns 1.0 (since a weight id of 0 normally means that examples are not weighted).
If you are writing your own learner, you should always use this function to retrieve an example's weight. It is practical: most functions in Orange that can optionally accept weights understand a weight id of 0 as "no weights"; this function takes care of that. In particular, never attempt to do this:
If examples are not weighted, id will be zero and you'll get the value of the first attribute...
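The rule and the pitfall can both be seen in a few lines of plain Python. This is a sketch of the described behaviour, not Orange's own code: get_weight is a hypothetical name, and a dictionary keyed by position and by meta id stands in for an example.

```python
def get_weight(example, weight_id):
    """Return the example's weight, or 1.0 when examples are not weighted."""
    if not weight_id:  # covers both 0 and None
        return 1.0
    return float(example[weight_id])


# Toy example: position 0 holds the first attribute, id -2 holds the weight.
toy = {0: "psby", -2: 0.34}

print(get_weight(toy, -2))    # the stored weight
print(get_weight(toy, 0))     # not weighted: every example counts as 1.0
print(get_weight(toy, None))  # same for None

# The pitfall: indexing directly with an id of 0 silently returns
# the first attribute instead of a weight.
print(toy[0])
```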
id is zero or None, nothing happens. weight must be a number; if omitted, the weight is set to 1.0.
removemeta. It does exactly the same thing except that it doesn't accept anything but an integer for id. If id is zero or None, this function does nothing.