kNN is one of the simplest learning techniques - the learner only needs to store the examples, while the classifier does its work by observing the examples most similar to the example being classified.
Classes for kNN learner and classifier are strongly related to classes for measuring distances, described on a separate page.
kNNLearner first constructs an object for measuring distances between examples. distanceConstructor is used if given; otherwise, the Euclidean metric is used. kNNLearner then constructs an instance of FindNearest_BruteForce. Together with the ID of the meta attribute holding example weights, k and rankWeight, it is passed to a kNNClassifier.
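As a quick illustration, here is a minimal sketch of constructing the learner and obtaining a classifier from it. It assumes the classic Orange 2.x API (module orange, ExampleTable, attributes passed as keyword arguments); the dataset name and parameter values are only examples.

    # A minimal sketch, assuming the classic Orange 2.x API; values are illustrative.
    import orange

    data = orange.ExampleTable("iris")                  # any classified dataset will do
    learner = orange.kNNLearner(k=5, rankWeight=True)   # distanceConstructor left at its default (Euclidean)
    classifier = learner(data)                          # the learner builds and returns a kNNClassifier
    print(classifier(data[0]))                          # classify the first example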
When called to classify an example, the classifier first calls findNearest to retrieve a list of the k nearest neighbours. The component findNearest stores a table of examples (those that were passed to the learner) together with their weights. If examples are weighted (non-zero weightID), the weights are considered when counting the neighbours.
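A small usage sketch of the classifier call itself, again assuming the Orange 2.x conventions for requesting the predicted value, the class probabilities, or both (orange.GetValue, orange.GetProbabilities, orange.GetBoth):

    # A hedged sketch, assuming the Orange 2.x classifier-call conventions.
    import orange

    data = orange.ExampleTable("iris")
    classifier = orange.kNNLearner(data, k=5)             # learner called with data returns a classifier

    value = classifier(data[0])                           # predicted class only
    value, probs = classifier(data[0], orange.GetBoth)    # predicted class and class probabilities
    print("%s %s" % (value, probs))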
If findNearest returns only one neighbour (this is the case if k=1), kNNClassifier returns the neighbour's class.
Otherwise, the retrieved neighbours vote on the class prediction (or the probabilities of classes). The voting is weighted in two ways. First, if examples are weighted, their weights are respected. Second, nearer neighbours have a greater impact on the prediction; the weight of an example is computed as exp(-t²/s²), where the meaning of t depends on the setting of rankWeight.
If rankWeight is false, t is the distance from the example being classified; if rankWeight is true, the neighbours are ordered and t is the neighbour's position in the list (its rank). In both cases, s is chosen so that the impact of the farthest example is 0.001, as illustrated in the sketch below.
The weighting gives the classifier a certain insensitivity to the number of neighbours used, making it possible to use large values of k.
The classifier can handle continuous and discrete attributes, and can even distinguish between ordinal and nominal attributes. See the information on distance measuring for details.
We will test the learner on the 'iris' dataset. We shall split it into training (80%) and test (20%) sets, learn on the training examples and test on five randomly selected test examples.
part of knnlearner.py (uses iris.tab)
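The snippet is not reproduced here; the following is a minimal sketch of the described experiment, assuming the classic Orange 2.x API (orange.ExampleTable, orange.MakeRandomIndices2, orange.kNNLearner). The actual knnlearner.py may differ in details.

    # A sketch of the experiment described above, assuming the classic Orange 2.x API.
    import random
    import orange

    data = orange.ExampleTable("iris")

    # split into 80% training and 20% test examples
    indices = orange.MakeRandomIndices2(data, p0=0.8)
    train = data.select(indices, 0)
    test = data.select(indices, 1)

    knn = orange.kNNLearner(train, k=10)                 # learner called with data returns a classifier

    # classify five randomly selected test examples
    for i in random.sample(range(len(test)), 5):
        example = test[i]
        print("%s, originally %s" % (knn(example), example.getclass()))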
The secret of kNN's success is that the examples in the iris dataset appear in three well-separated clusters. The classifier's accuracy will remain excellent even with a very large or a very small number of neighbours.
As many experiments have shown, the choice of distance measure does not have a large or predictable effect on the performance of kNN classifiers, so there is not much point in changing the default. If you decide to do so anyway, set distanceConstructor to an instance of one of the classes for distance measuring.
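For example, the following sketch (again assuming the Orange 2.x API, where ExamplesDistanceConstructor_Manhattan is one of the distance-constructor classes described on the distance-measuring page) replaces the default Euclidean measure:

    # A sketch of overriding the default distance measure; assumes the Orange 2.x API.
    import orange

    data = orange.ExampleTable("iris")
    knn = orange.kNNLearner(k=10)
    knn.distanceConstructor = orange.ExamplesDistanceConstructor_Manhattan()
    classifier = knn(data)
    print(classifier(data[0]))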
Still perfect, as promised.