Greedy permutation on shape data sets
Below, the landmarks function is demonstrated on the “aggregation” dataset (taken from the clustering basic benchmark), followed by several other shape datasets. In each plot, 40 landmarks are selected, which are shown in red.
Code
show(fig_landmarks("aggregation", k=40))
Code
show(fig_landmarks("compound", k=40))
Code
show(fig_landmarks("pathbased", k=40))
Code
show(fig_landmarks("spiral", k=40))
Code
show(fig_landmarks("d31", k=40))
Code
show(fig_landmarks("r15", k=40))
Code
show(fig_landmarks("jain", k=40))
Code
show(fig_landmarks("flame", k=40))
The landmark indices returned by landmarks represent the \(k\)-prefix of the greedy permutation.
Code
X = load_shape("aggregation")[:,:2]
ind, info = landmarks(X, k=K, full_output=True)

## Show coverage of the union
p = figure(width=350, height=350, title="Coverage guarantee")
p.circle(*X[ind].T, radius=info['radii'][-1], fill_color='yellow', fill_alpha=0.15)
p.scatter(*X.T, color='lightblue', size=3, line_color='black', line_width=0.5)
p.scatter(*X[ind].T, color='red', size=6, line_color='white')

## Show packing of the union
q = figure(width=350, height=350, title="Packing guarantee")
q.circle(*X[ind].T, radius=info['radii'][-1] / 2.0, fill_color='orange', fill_alpha=0.15)
q.scatter(*X.T, color='lightblue', size=3, line_color='black', line_width=0.5)
q.scatter(*X[ind].T, color='red', size=6, line_color='white')
show(row(p, q))
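The coverage and packing properties plotted above can also be checked numerically. The following is a minimal NumPy sketch of a farthest-first traversal, not the library's implementation: the helper greedy_permutation, its seed choice (index 0), and its convention that the last insertion radius plays the role of info['radii'][-1] are all assumptions made for illustration, and random points stand in for the benchmark data.

```python
import numpy as np

def greedy_permutation(X, k, seed_index=0):
    """Hypothetical helper: farthest-first traversal under the Euclidean metric.

    Returns the k selected indices, their insertion radii, and the distance
    from every point to its nearest selected landmark.
    """
    indices = [seed_index]
    radii = [np.inf]  # the seed has no predecessor; infinite radius by convention
    # dist[i] = distance from point i to the nearest landmark chosen so far
    dist = np.linalg.norm(X - X[seed_index], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(dist))          # farthest point from current landmarks
        indices.append(nxt)
        radii.append(float(dist[nxt]))      # its distance at insertion time
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(indices), np.array(radii), dist

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))              # synthetic stand-in for a shape dataset
ind, radii, dist = greedy_permutation(X, k=40)
r = radii[-1]

# Coverage: every point lies within r of some landmark.
covered = bool(dist.max() <= r + 1e-12)

# Packing: landmarks are pairwise at least r apart, so balls of radius r/2
# around them have disjoint interiors.
L = X[ind]
D = np.linalg.norm(L[:, None, :] - L[None, :, :], axis=2)
np.fill_diagonal(D, np.inf)
packed = bool(D.min() >= r - 1e-12)

print(f"r = {r:.4f}, covered = {covered}, packed = {packed}")
```

Both checks follow from the insertion radii being non-increasing: each new landmark is the point farthest from the current set, so no remaining point can be farther than the most recent radius, and no two landmarks can be closer than it.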
Generalized metrics
For point cloud data, any Minkowski distance is supported out of the box, either by supplying its name via the metric argument or by passing metric='minkowski' and a suitable p, e.g.
landmarks(X, k=K, metric="cityblock")
# -or-
landmarks(X, k=K, metric="minkowski", p=1)
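For reference, the Minkowski distance of order \(p\) is \(d_p(x, y) = (\sum_i |x_i - y_i|^p)^{1/p}\), with \(p = \infty\) taken as \(\max_i |x_i - y_i|\). The metric names used here match SciPy's naming; whether the library forwards them to SciPy is an assumption. Under that assumption, the sketch below uses scipy.spatial.distance.cdist to confirm that the named metrics and the explicit 'minkowski' form agree on random data.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))
B = rng.normal(size=(4, 3))

# Named metric versus explicit Minkowski order
d_city = cdist(A, B, metric="cityblock")
d_mink1 = cdist(A, B, metric="minkowski", p=1)
d_euc = cdist(A, B, metric="euclidean")
d_mink2 = cdist(A, B, metric="minkowski", p=2)

# The p = infinity case, computed by hand as the largest coordinate difference
d_cheb = cdist(A, B, metric="chebyshev")
d_max = np.abs(A[:, None, :] - B[None, :, :]).max(axis=2)

same_p1 = np.allclose(d_city, d_mink1)
same_p2 = np.allclose(d_euc, d_mink2)
same_inf = np.allclose(d_cheb, d_max)

# For any pair of points the distances are ordered: d_inf <= d_2 <= d_1
ordered = bool(np.all(d_cheb <= d_euc + 1e-12) and np.all(d_euc <= d_city + 1e-12))

print(same_p1, same_p2, same_inf, ordered)
```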
In general, different metrics lead to different landmark sets, because the greedy selection depends on which point is farthest under the chosen distance. For example, below are the first K landmarks selected using the \(p\)-norms with \(p = 1\), \(2\), and \(\infty\):
Code
X = load_shape("compound")[:,:2]
figs = []
for metric in ['cityblock', 'euclidean', 'chebychev']:
    ind = landmarks(X, k=K, metric=metric)
    pc = figure(title=f"Landmarks with {metric} metric")
    pc.scatter(*X.T, color='lightblue', size=3, line_color='black', line_width=0.5)
    pc.scatter(*X[ind].T, color='red', size=6, line_color='white')
    pc.toolbar_location = None
    pc.sizing_mode = 'scale_width'
    figs.append(pc)
figs_row = row(figs)
figs_row.sizing_mode = 'scale_width'
show(figs_row)
With the Euclidean norm (\(p = 2\)), centers are placed to reduce the maximum radius (maximal dispersion) within clusters, leading to clusters that tend to be roughly spherical. When \(p=1\), the centers tend to be placed near medians of coordinate ranges. When \(p=\infty\), clusters can be elongated along the coordinate axes, leading to centers that favor a more ‘grid-like’ placement.
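A toy case shows how the metric can change which point the greedy step picks next. The points below are made up for illustration, and the distances are computed with SciPy's cdist rather than the library itself: starting from a landmark at the origin, a diagonal candidate is the farthest under the \(1\)- and \(2\)-norms, while an axis-aligned candidate wins under the \(\infty\)-norm.

```python
import numpy as np
from scipy.spatial.distance import cdist

origin = np.array([[0.0, 0.0]])            # the current (single) landmark
candidates = np.array([[1.2, 0.0],         # index 0: axis-aligned point
                       [1.0, 1.0]])        # index 1: diagonal point

farthest = {}
for metric in ["cityblock", "euclidean", "chebyshev"]:
    d = cdist(origin, candidates, metric=metric)[0]
    farthest[metric] = int(np.argmax(d))   # candidate the greedy step would select
    print(f"{metric:>10}: distances = {np.round(d, 3)}, farthest = {farthest[metric]}")
```

Under the \(\infty\)-norm only the largest coordinate difference counts, so the diagonal point at (1, 1) is at distance 1 and loses to the point at (1.2, 0).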