Cluster Analysis

The tool uses clustering methods implemented by the scikit-learn team (scikit-learn.org). The clustering is unsupervised: a multi-band dataset is supplied, and regions of the dataset are automatically grouped into classes.

Cluster Algorithm

The algorithm used for clustering. This can be k-means, DBSCAN or Birch.

Minimum Clusters and Maximum Clusters (k-means, Birch)

These set the range of cluster counts for which solutions are generated, one solution per count. For example, if minimum clusters is 5 and maximum clusters is 7, then 3 interpretations are produced: one with 5 classes, one with 6 classes and one with 7 classes.
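
The range behaviour can be sketched directly with scikit-learn. This is a minimal illustration, not the tool's own code: the random array `pixels` is a hypothetical stand-in for a multi-band dataset (one row per pixel, one column per band), and the names `min_clusters`, `max_clusters` and `solutions` are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for a multi-band dataset: 300 pixels, 3 bands.
rng = np.random.default_rng(0)
pixels = rng.normal(size=(300, 3))

min_clusters, max_clusters = 5, 7

# Fit one solution per cluster count in the inclusive range.
solutions = {}
for k in range(min_clusters, max_clusters + 1):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    solutions[k] = model.fit_predict(pixels)  # one class label per pixel

print(sorted(solutions))  # [5, 6, 7]
```

Each entry in `solutions` holds one class label per pixel, so the three interpretations can be compared side by side.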

Maximum iterations (k-means)

Maximum number of iterations to use when generating a solution.

Tolerance (k-means)

The relative tolerance, with regard to inertia, used to declare convergence. Iteration stops when the relative decrease in the objective function between iterations falls below this value.
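
As a rough sketch of how the two k-means stopping controls interact, the example below passes both to scikit-learn's `KMeans` through its `max_iter` and `tol` parameters. The data and the specific values are assumptions chosen for illustration, and the exact quantity compared against the tolerance may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: 300 samples with 3 bands each.
rng = np.random.default_rng(0)
pixels = rng.normal(size=(300, 3))

max_iterations = 300  # hard upper limit on iterations
tolerance = 1e-4      # convergence tolerance

model = KMeans(n_clusters=5, max_iter=max_iterations, tol=tolerance,
               n_init=10, random_state=0)
model.fit(pixels)

# n_iter_ is the number of iterations actually run for the best run;
# it can never exceed max_iter, and is smaller when the tolerance
# criterion is met first.
print(model.n_iter_, model.inertia_)
```

A larger tolerance generally stops the search earlier at the cost of a less refined solution, while the maximum iterations value only matters when convergence is slow.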

eps (DBSCAN)

The maximum distance between two samples for one to be considered as being in the neighborhood of the other.

Minimum samples (DBSCAN)

The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. This includes the point itself.
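
The effect of the two DBSCAN parameters can be seen on a small synthetic case. This is an illustrative sketch only: the two tight groups and the single distant point are invented data, and the values `eps=0.5` and `min_samples=5` are assumptions chosen to suit that data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Two tight, well-separated groups plus one isolated point.
group_a = rng.normal(loc=0.0, scale=0.1, size=(50, 2))
group_b = rng.normal(loc=5.0, scale=0.1, size=(50, 2))
outlier = np.array([[20.0, 20.0]])
samples = np.vstack([group_a, group_b, outlier])

# A point is a core point if at least min_samples points (itself
# included) lie within a distance of eps.
model = DBSCAN(eps=0.5, min_samples=5)
labels = model.fit_predict(samples)

n_clusters = len(set(labels) - {-1})  # the label -1 marks noise
print(n_clusters, labels[-1])
```

The isolated point has no neighbors within `eps`, so it is labelled as noise rather than being forced into a class. Unlike k-means and Birch, the number of clusters is not specified in advance.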

Threshold (Birch)

The radius of the subcluster obtained by merging a new sample with the closest subcluster should be less than the threshold. Otherwise a new subcluster is started. Setting this value very low promotes splitting, and vice versa.

Branching factor (Birch)

Maximum number of Clustering Feature (CF) subclusters in each node. If a new sample enters such that the number of subclusters exceeds the branching factor, then that node is split into two nodes, with the subclusters redistributed between them. The parent subcluster of that node is removed and two new subclusters are added as parents of the two split nodes.
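
The influence of the threshold can be checked with a short sketch using scikit-learn's `Birch`. The data, the helper `count_subclusters` and the two threshold values are assumptions made for illustration. Setting `n_clusters=None` skips the final global clustering step, so the subclusters themselves are returned.

```python
import numpy as np
from sklearn.cluster import Birch

# Hypothetical data: 300 samples with 3 bands each.
rng = np.random.default_rng(0)
samples = rng.normal(size=(300, 3))

def count_subclusters(threshold):
    """Fit Birch and return the number of leaf subclusters found."""
    model = Birch(threshold=threshold, branching_factor=50, n_clusters=None)
    model.fit(samples)
    return len(model.subcluster_centers_)

fine = count_subclusters(0.3)    # low threshold: more splitting
coarse = count_subclusters(1.0)  # high threshold: more merging
print(fine, coarse)
```

The branching factor mainly controls the shape of the tree (how many subclusters a node holds before it is split) rather than the final number of classes.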

Scaling

This refers to the scaling of data as a preprocessing step. It can be one of the following: