XCS Tutorial

This is the official tutorial for the xcs package for Python 3. You can find the latest release and get updates on the project's status at the project home page on GitHub.com.

What is XCS?

XCS is a Python 3 implementation of the XCS algorithm as described in the 2001 paper, An Algorithmic Description of XCS, by Martin Butz and Stewart Wilson. XCS is a type of Learning Classifier System (LCS), a machine learning algorithm that utilizes a genetic algorithm acting on a rule-based system to solve a reinforcement learning problem.

In its canonical form, XCS accepts a fixed-width string of bits as its input, and attempts to select the best action from a predetermined list of choices using an evolving set of rules that match inputs and offer appropriate suggestions. It then receives a reward signal indicating the quality of its decision, which it uses to adjust the rule set that was used to make the decision. This process is subsequently repeated, allowing the algorithm to evaluate the changes it has already made and further refine the rule set.

A key feature of XCS is that, unlike many other machine learning algorithms, it not only learns the optimal input/output mapping, but also produces a minimal set of rules for describing that mapping. This is a big advantage over other learning algorithms such as neural networks, whose models are largely opaque to human analysis, and it makes XCS an important tool in any data scientist's tool belt.

The XCS library provides not only an implementation of the standard XCS algorithm, but a set of interfaces which together constitute a framework for implementing and experimenting with other LCS variants. Future plans for the XCS library include continued expansion of the tool set with additional algorithms, and refinement of the interface to support reinforcement learning algorithms in general.

Terminology

Being both a reinforcement learning algorithm and an evolutionary algorithm, XCS requires an understanding of terms pertaining to both.

Situation

A situation is just another term for an input received by the classifier.

Action

An action is an output produced by the classifier.

Rule

A rule is a pairing between a condition, describing which situations can be matched, and a suggested action. Each rule has an associated prediction indicating the expected reward if the suggested action is taken when the condition matches the situation, a fitness indicating its suitability for reproduction and continued use in the population, and a numerosity value which indicates the number of (virtual) instances of the rule in the population. (There are other parameters associated with each rule as well, but these are the most visibly important ones.)
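The pieces of a rule described above can be sketched in a few lines of Python. This is an illustration of the concepts only, not the xcs library's own classes; the names and types are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    condition: str     # e.g. "00#11######"; '#' is a wildcard bit
    action: bool
    prediction: float  # expected reward when the action is taken
    fitness: float     # suitability for reproduction
    numerosity: int    # number of virtual copies in the population

def matches(condition, situation):
    """A condition matches when every non-wildcard bit agrees."""
    return all(c == '#' or c == s for c, s in zip(condition, situation))

print(matches("00#11######", "00111010101"))  # True
print(matches("00#11######", "10111010101"))  # False
```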

Population

This is the collection of all rules currently used and tracked by the classifier. The genetic algorithm operates on this set of rules over time, optimizing them for accuracy and generality in describing the problem space. Note that the population is virtual, meaning that if the same rule has multiple copies in the population, it is represented only once, with an associated numerosity value to indicate the number of virtual instances of the rule in the population.

Match Set

The match set is the set of rules which match against the current situation.

Action Set

The action set is the set of rules which match against the current situation and recommend the selected action. Thus the action set is a subset of the match set. In fact, the match set can be seen as a collection of mutually exclusive and competing action sets, from which only one is to be selected.

Reward

The reward is the signal the algorithm attempts to maximize. It takes the form of a floating point value passed to the action set via the accept_payoff() method.

Prediction

A prediction is an estimate by a rule or an action set as to the reward expected to be received by taking the suggested action in the given situation. The prediction of an action set is formed by taking the fitness-weighted average of the predictions made by the individual rules within it.
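The fitness-weighted average described above can be sketched as follows. The function and argument names are illustrative, not part of the xcs API:

```python
def action_set_prediction(rules):
    """Fitness-weighted average of the predictions of the rules in an
    action set. `rules` is a sequence of (prediction, fitness) pairs."""
    total_fitness = sum(fitness for _, fitness in rules)
    if total_fitness == 0:
        return 0.0
    return sum(prediction * fitness
               for prediction, fitness in rules) / total_fitness

# Two rules predicting rewards 1.0 and 0.0, with fitnesses 3.0 and 1.0:
print(action_set_prediction([(1.0, 3.0), (0.0, 1.0)]))  # 0.75
```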

Fitness

Fitness is another floating point value similar in function to the reward, except that it is an internal signal defined by the algorithm itself, used to guide the selection of which rules will act as parents to the next generation. Each rule in the population has its own associated fitness value. In XCS, as opposed to strength-based LCS variants such as ZCS, fitness is based on the accuracy of each rule's reward prediction rather than on the magnitude of the predicted reward. Thus a rule with a very low expected reward can have high fitness provided it is accurate in its prediction of low reward, whereas a rule with a very high expected reward may have low fitness because the reward it receives varies widely from one reward cycle to the next. Using prediction accuracy rather than prediction magnitude helps XCS find rules that describe the problem in a stable, predictable way.
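In the Butz and Wilson paper, this accuracy is computed from each rule's prediction error: a rule whose error falls below a threshold counts as fully accurate, and accuracy drops off sharply as a power of the error beyond that. A sketch of that relationship, with illustrative parameter values (the library's actual defaults may differ):

```python
def accuracy(error, error_threshold=0.01, falloff=0.1, power=5):
    """Accuracy as a function of prediction error, after Butz & Wilson's
    kappa: 1 below the threshold, power-law decay above it.
    The parameter values shown here are illustrative."""
    if error < error_threshold:
        return 1.0
    return falloff * (error / error_threshold) ** -power

print(accuracy(0.005))  # 1.0 -- small errors count as fully accurate
print(accuracy(0.02))   # 0.003125 -- accuracy collapses quickly
```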

Installation

To install xcs, you will of course need a Python 3 interpreter. The latest version of the standard CPython distribution is available for download from the Python Software Foundation, or if you prefer a download that comes with a long list of top-notch machine learning and scientific computing packages already built for you, I recommend Anaconda from Continuum Analytics.

Starting with Python 3.4, the standard CPython distribution comes with the package installation tool, pip, included. Anaconda comes with pip regardless of the Python version. If you have pip, installation of xcs is straightforward:

pip install xcs

If all goes as planned, you should see a message like this:

Successfully installed xcs-1.0.0

If for some reason you are unable to use pip, you can still install xcs manually. The latest release can be downloaded from the project's home page on GitHub.com. Download the zip file, unpack it, and cd into the directory. Then run:

python setup.py install

You should see a message indicating that the package was successfully installed.

It is recommended that you also install numpy if you are using a Python distribution that does not come with it already installed. The xcs package will still work without numpy, but you should expect slower execution speeds. To install numpy with pip:

pip install numpy

For instructions on how to manually install numpy, visit the numpy installation instructions page at SciPy.org.

Testing the Newly Installed Package

Let's start things off with a quick test, to verify that everything has been installed properly. First, fire up the Python interpreter. We'll set up Python's built-in logging system so we can see the test's progress.

In [1]:
import logging
logging.root.setLevel(logging.INFO)

Then we import the xcs module and run the built-in test() function. By default, the test() function runs the canonical XCS algorithm on the 11-bit (3-bit address) MUX problem for 10,000 steps.

In [ ]:
import xcs
xcs.test()
INFO:xcs.problems:Possible actions:
INFO:xcs.problems:    False
INFO:xcs.problems:    True
INFO:xcs.problems:Steps completed: 0
INFO:xcs.problems:Average reward per step: 0.00000
INFO:xcs.problems:Steps completed: 100
INFO:xcs.problems:Average reward per step: 0.46000
INFO:xcs.problems:Steps completed: 200
INFO:xcs.problems:Average reward per step: 0.52000

.
.
.

00#11###### => True
    Time Stamp: 9992
    Average Reward: 1.0
    Error: 0.0
    Fitness: 0.78492797199
    Experience: 535
    Action Set Size: 84.31258405257977
    Numerosity: 33
0#1#1#1#### => True
    Time Stamp: 9978
    Average Reward: 1.0
    Error: 0.0
    Fitness: 0.813621637225
    Experience: 325
    Action Set Size: 75.4028428842268
    Numerosity: 37

INFO:xcs:Total time: 24.01673 seconds

Note that your results may vary somewhat from what is shown here. XCS relies on randomization to discover new rules, so unless you set the random seed with random.seed(), each run will be different.
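For example, to make runs repeatable, seed the generator before calling test(). The seed value itself is arbitrary:

```python
import random

# Seeding the standard random module makes the covering and genetic
# algorithm choices repeatable from run to run.
random.seed(42)  # 42 is an arbitrary choice
```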

Usage

Now we'll run through a quick demo of how to use existing algorithms and problems. This is essentially the same code that appears in the test() function we called above.

First, we're going to need to import a few things:

In [1]:
from xcs import XCSAlgorithm, LCS
from xcs.problems import MUXProblem, OnLineObserver

The XCSAlgorithm class contains the actual XCS algorithm implementation. The LCS class combines the selected algorithm with its state (a Population instance) to form a learning classifier system. MUXProblem is the classic multiplexer problem, which defaults to 3 address bits (11 bits total). OnLineObserver is a wrapper for problems which logs the inputs, actions, and rewards as the algorithm attempts to solve the problem.
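For reference, the multiplexer function the problem is built around is simple: the address bits select which of the data bits is the correct answer. A quick sketch, not the library's implementation:

```python
def multiplexer(bits, address_size=3):
    """Return the data bit selected by the first `address_size` bits.
    With 3 address bits there are 8 data bits, 11 bits in total."""
    address = int(bits[:address_size], 2)
    return bits[address_size + address]

# Address bits "010" select data bit 2 (the sixth bit overall):
print(multiplexer("01010101010"))  # '1'
```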

Now that we've imported the necessary tools, we can define the actual problem, telling it to give the algorithm 50,000 reward cycles to attempt to learn the appropriate input/output mapping, and wrapping it with an observer so we can see the algorithm's progress.

In [2]:
problem = OnLineObserver(MUXProblem(50000))

Next, we'll create the algorithm which will be used to manage the classifier population and learn the mapping defined by the problem we have selected:

In [3]:
algorithm = XCSAlgorithm(problem.get_possible_actions())

We ask the problem for the possible actions that can be taken, and pass them to the XCS algorithm so they can be used in covering operations. (Covering is the generation of a random classifier rule when too few rules match the current situation.) The algorithm's parameters are set to appropriate defaults for most problems, but it is straightforward to modify them if it becomes necessary.
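Covering can be sketched as follows: a new condition is built from the current situation by replacing each bit with a wildcard at some fixed probability. The names and the probability shown are illustrative, not the library's API:

```python
import random

def cover(situation, wildcard_probability=1/3):
    """Build a condition guaranteed to match `situation` by copying its
    bits and turning some of them into '#' wildcards."""
    return ''.join('#' if random.random() < wildcard_probability else bit
                   for bit in situation)

condition = cover("01101011010")
# Every non-'#' position agrees with the situation, so it matches:
print(all(c == '#' or c == s for c, s in zip(condition, "01101011010")))
```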

In [4]:
algorithm.exploration_probability = .1
algorithm.discount_factor = 0
algorithm.do_GA_subsumption = True
algorithm.do_action_set_subsumption = True

Here we have selected an exploration probability of .1, which will sacrifice most (9 out of 10) learning opportunities in favor of taking advantage of what has already been learned so far. This makes sense in a real-time learning environment; a higher value is more appropriate in cases where the classifier is being trained in advance or is being used simply to learn a minimal rule set. The discount factor is set to 0, since future rewards are not affected at all by the currently selected action. We have also elected to turn on GA and action set subsumption, which help the system converge to the minimal effective rule set more quickly in some types of problems.

Next, we create the classifier itself:

In [5]:
classifier = LCS(algorithm)

The LCS instance will automatically create an empty population for us if we do not provide one.

And finally, this is where all the magic happens:

In [6]:
classifier.learn(problem)

We pass the problem to the classifier and ask it to learn the appropriate input/output mapping. It executes training cycles until the problem dictates that training should stop. Note that if you wish to see the progress as the algorithm learns the problem, you will need to set the logging level to INFO, as described in the previous section, before calling the learn() method.

Now we can observe the fruits of our labors.

In [ ]:
print(classifier.population)
00##00####1 => True
    Time Stamp: 49977
    Average Reward: 0.87821679153
    Error: 0.202410212397
    Fitness: 0.00194208652311
    Experience: 0
    Action Set Size: 1
    Numerosity: 1
#001######1 => True
    Time Stamp: 49979
    Average Reward: 0.858333333333
    Error: 0.179444444444
    Fitness: 0.00747704267322
    Experience: 0
    Action Set Size: 1
    Numerosity: 1
#00##1##### => True

.
.
.

001#0###### => False
    Time Stamp: 49990
    Average Reward: 1.0
    Error: 0.0
    Fitness: 0.932438846396
    Experience: 752
    Action Set Size: 13.743613415253174
    Numerosity: 9
101#####0## => False
    Time Stamp: 49971
    Average Reward: 1.0
    Error: 0.0
    Fitness: 0.960029964669
    Experience: 741
    Action Set Size: 16.74403701264255
    Numerosity: 10

This gives us a printout of each rule, in the form condition => action, followed by various stats about the rule pertaining to the algorithm we selected. The population can also be accessed as an iterable container:

In [8]:
print(len(classifier.population))
104
In [9]:
for condition, action in classifier.population:
    metadata = classifier.population.get_metadata(condition, action)
    if metadata.fitness > .5:
        print(condition, '=>', action, ' [%.5f]' % metadata.fitness)
101#####0## => False  [0.96003]
001#0###### => False  [0.93244]
011###0#### => False  [0.85302]
100####0### => False  [0.67857]
111#######0 => False  [0.93936]
111#######1 => True  [0.89453]
110######1# => True  [0.88251]
#001###1### => True  [0.82224]
01###00#0## => True  [0.78859]
1#0####0#0# => False  [0.63669]
011###1#### => True  [0.90256]
#011###11#1 => True  [0.51872]