Generators for Subsets of Attributes

Subsets generators are classes that generate subsets of a given set of attributes. They were designed primarily to generate bound sets for function decomposition, although they can also be used for other purposes.


SubsetsGenerator

SubsetsGenerator is an abstract class that defines the behaviour of derived classes.

Using Subsets Generators

Let sgen be a generator that constructs pairs of attributes from the domain Monk 1 (the section below describes how to create such a generator). You can use it in for statements

>>> for attrs in sgen:
...     print attrs
...
(EnumVariable 'a', EnumVariable 'b')
(EnumVariable 'a', EnumVariable 'c')
(EnumVariable 'a', EnumVariable 'd')
...

or in list comprehensions

subsets = [attrs for attrs in sgen]

There is another way of using subsets generators. You can reset the generator by calling reset and then get a sequence of attribute subsets by calling next until it returns None. This is provided for compatibility with older versions of Orange and is described here only to make old code easier to understand. Don't use it.

Initializing the Generator

Before iterating through subsets, the generator needs to be given a set of attributes. You can specify the set at construction time, set it through the varList attribute, or specify it in the for clause. So, to construct the above generator, one can write

sgen = orange.SubsetsGenerator_constSize(data.domain.attributes)

or

sgen = orange.SubsetsGenerator_constSize()
sgen.varList = data.domain.attributes

The third, somewhat ugly alternative is to provide the attribute set at call time.

sgen = orange.SubsetsGenerator_constSize()
for attrs in sgen(data.domain.attributes):
    print attrs

Why is it ugly? Syntactically it is a call, and it is also implemented as one, but the code is equivalent to

def __call__(self, v):
    self.varList = v
    return self

Why is it here at all? For compatibility with code originating from before Python had the iterator protocol. And because it can come in handy.
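The call-time convention can be mimicked in plain Python. The sketch below is not Orange code: PairGenerator and its var_list field are hypothetical stand-ins, written for Python 3 with only the standard library, that illustrate how a call can simply store the attribute list and return the generator itself so that the result can be used directly in a for statement.

```python
from itertools import combinations


class PairGenerator:
    """Hypothetical stand-in for a subsets generator (not part of Orange)."""

    def __init__(self, var_list=None):
        # Plays the role of the varList attribute.
        self.var_list = var_list

    def __call__(self, var_list):
        # Calling the generator only stores the attributes and returns self.
        self.var_list = var_list
        return self

    def __iter__(self):
        # Iterate over all pairs of the stored attributes.
        return iter(combinations(self.var_list, 2))


sgen = PairGenerator()
pairs = [attrs for attrs in sgen(["a", "b", "c"])]
print(pairs)
```

Because the call returns the same object, the three ways of supplying attributes (constructor, attribute assignment, call) end up in the same state.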

SubsetsGenerator_constSize

SubsetsGenerator_constSize returns subsets of predefined size.

Attributes

B
Subset size. The default is 2.

Here is an example

part of subsetsgenerators.py (uses monk1.tab)

gen1 = orange.SubsetsGenerator_constSize(data.domain.attributes, B=3)
for attrs in gen1:
    print attrs

The output begins with

(EnumVariable 'a', EnumVariable 'b', EnumVariable 'c')
(EnumVariable 'a', EnumVariable 'b', EnumVariable 'd')
(EnumVariable 'a', EnumVariable 'b', EnumVariable 'e')
(EnumVariable 'a', EnumVariable 'b', EnumVariable 'f')
(EnumVariable 'a', EnumVariable 'c', EnumVariable 'd')
(EnumVariable 'a', EnumVariable 'c', EnumVariable 'e')
(EnumVariable 'a', EnumVariable 'c', EnumVariable 'f')
(EnumVariable 'a', EnumVariable 'd', EnumVariable 'e')
(EnumVariable 'a', EnumVariable 'd', EnumVariable 'f')
(EnumVariable 'a', EnumVariable 'e', EnumVariable 'f')
(EnumVariable 'b', EnumVariable 'c', EnumVariable 'd')
...

More often, however, the generator will be constructed in advance and then used to construct subsets of some given attribute set.

part of subsetsgenerators.py (uses monk1.tab)

def f(gen, data):
    for attrs in gen(data.domain.attributes):
        print attrs

gen = orange.SubsetsGenerator_constSize(B=3)
f(gen, data)
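The order of the output listed earlier in this section matches a lexicographic enumeration of combinations. As a rough illustration only, and not a description of how Orange implements it, the same sequence can be reproduced in Python 3 with itertools.combinations. The helper const_size_subsets and the use of plain strings in place of Orange's attribute objects are assumptions made for this sketch.

```python
from itertools import combinations


def const_size_subsets(attributes, size=2):
    """Return all subsets of the given size, in lexicographic order.

    Hypothetical helper; `size` plays the role of the B attribute.
    """
    return list(combinations(attributes, size))


# Attribute names a..f stand in for the six Monk 1 attributes.
monk_attributes = ["a", "b", "c", "d", "e", "f"]
triples = const_size_subsets(monk_attributes, size=3)

for attrs in triples[:4]:
    print(attrs)
print(len(triples), "subsets in total")
```

With six attributes there are 20 subsets of size 3; the first ten contain 'a', which is why the listing above reaches ('b', 'c', 'd') only at the eleventh line.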

SubsetsGenerator_minMaxSize

SubsetsGenerator_minMaxSize returns subsets of sizes within given limits. Subsets are sorted by increasing cardinality.

Attributes

min, max
Minimal and maximal subset size. Defaults are 2 and 3.

part of subsetsgenerators.py (uses monk1.tab)

gen4 = orange.SubsetsGenerator_minMaxSize(min=1, max=3)
for attrs in gen4(data.domain.attributes):
    print attrs

The output begins with:

(EnumVariable 'a',)
(EnumVariable 'b',)
(EnumVariable 'c',)
(EnumVariable 'd',)
(EnumVariable 'e',)
(EnumVariable 'f',)
(EnumVariable 'a', EnumVariable 'b')
(EnumVariable 'a', EnumVariable 'c')
(EnumVariable 'a', EnumVariable 'd')
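The ordering by increasing cardinality can be sketched in Python 3 by chaining fixed-size enumerations, smallest size first. This is only an illustration under stated assumptions: min_max_subsets is a hypothetical helper, its parameters min_size and max_size stand in for the min and max attributes, and strings stand in for Orange's attribute objects.

```python
from itertools import chain, combinations


def min_max_subsets(attributes, min_size=2, max_size=3):
    """Return subsets of sizes min_size..max_size, smaller subsets first."""
    return list(
        chain.from_iterable(
            combinations(attributes, size)
            for size in range(min_size, max_size + 1)
        )
    )


# Attribute names a..f stand in for the six Monk 1 attributes.
monk_attributes = ["a", "b", "c", "d", "e", "f"]
subsets = min_max_subsets(monk_attributes, min_size=1, max_size=3)

for attrs in subsets[:9]:
    print(attrs)
```

For six attributes and sizes 1 to 3 this yields 6 + 15 + 20 = 41 subsets, starting with the six singletons.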

SubsetsGenerator_constant

SubsetsGenerator_constant "generates" a single subset, prescribed by the user.

Attributes

constant
The one and only subset returned by the generator.

The code below will always return a subset containing the first three attributes.

part of subsetsgenerators.py (uses monk1.tab)

gen5 = orange.SubsetsGenerator_constant()
gen5.constant = data.domain[:3]
for attrs in gen5(data.domain.attributes):
    print attrs

Why the hell would you need such a generator? There are objects that require a subsets generator as a component. In function decomposition, for instance, subsets generators are used to construct a list of candidate subsets. Supplying this generator is a way to force such objects to consider a single prescribed subset.

SubsetsGenerator_withRestrictions

This is the most complex subsets generator. It uses another generator (one of those described above), stored in the field subGenerator, to generate subsets, but it filters out all the subsets that do not comply with the restrictions.

Attributes

subGenerator
A generator that "proposes" subsets.
required
A list of attributes that must all be included in a subset.
forbidden
A list of forbidden attributes that must not appear in a subset.
forbiddenSubSubsets
Combinations of attributes that must not appear in a subset (that is, a subset is invalid if it contains one of the subsubsets in this list).
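The three restrictions amount to set tests applied to each proposed subset. The following Python 3 sketch illustrates that filtering logic only; it is not Orange's implementation. The function restricted_subsets and its parameter names are hypothetical, and strings stand in for attribute objects.

```python
from itertools import combinations


def restricted_subsets(proposed, required=(), forbidden=(),
                       forbidden_sub_subsets=()):
    """Yield only the proposed subsets that satisfy all restrictions."""
    required = set(required)
    forbidden = set(forbidden)
    banned = [set(combo) for combo in forbidden_sub_subsets]
    for subset in proposed:
        members = set(subset)
        if not required <= members:
            continue  # at least one required attribute is missing
        if members & forbidden:
            continue  # contains a forbidden attribute
        if any(combo <= members for combo in banned):
            continue  # contains a forbidden combination
        yield subset


# The proposing generator: all triples of six attributes named a..f.
proposed = combinations("abcdef", 3)
kept = list(restricted_subsets(proposed,
                               required=["a"],
                               forbidden=["f"],
                               forbidden_sub_subsets=[("b", "c")]))
print(kept)
```

Here every kept triple contains 'a', none contains 'f', and none contains both 'b' and 'c', which leaves five of the twenty proposed triples.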