Metadata-Version: 2.4
Name: PyGlimmerMDS
Version: 1.2.1
Summary: Python Implementation of the Glimmer algorithm for multidimensional scaling
Author: David Hägele
License-Expression: MIT
Project-URL: Homepage, https://github.com/hageldave/PyGlimmerMDS
Project-URL: Bug Tracker, https://github.com/hageldave/PyGlimmerMDS/issues
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: numba
Dynamic: license-file

# PyGlimmerMDS
Multidimensional scaling (MDS) for large data sets - a python implementation of the Glimmer algorithm.  
*[[Glimmer: Multilevel MDS on the GPU - 2009 - IEEE TVCG - Ingram, Munzner, Olano](https://doi.org/10.1109/TVCG.2008.85)]*

Glimmer performs dimensionality reduction on high-dimensional data sets of many instances, 
avoiding the quadratic runtime behavior of naive MDS implementations by employing a multilevel (coarse to fine) approach.
This implementation has a GPU switch, but also gives considerable speedup with only CPU nonetheless and makes MDS on large data
sets feasible.

Glimmer is a metric MDS and uses Euclidean distance in the high-dimensional space as the dissimilarity measure. 
This is **not** the classical MDS that has a linear projection solution.
Instead it solves the following optimization problem:

$$\underset{y_1,..,y_n}{\mathrm{argmin}} ~ \sum_{i=1}^n \sum_{j=i+1}^n \Big(\lVert x_i-x_j \rVert - \lVert y_i-y_j \rVert\Big) ^2 \quad \mathrm{where} x_i \in \mathbb{R}^D \mathrm{and} y_i \in \mathbb{R}^{d \ll D}$$


## Installation
PyGlimmerMDS is available on [PyPi](https://pypi.org/project/PyGlimmerMDS/) and can be installed through `pip`.
```
pip install PyGlimmerMDS
```
or if you want to install a specific commit use
```
pip install git+https://github.com/hageldave/PyGlimmerMDS@<commit_hash>
```

## How to use
### Very briefly
Performing Glimmer on a data set works like this:
```python
mds = Glimmer(decimation_factor=2, stress_ratio_tol=1-1e-5, rng=rng)
projection = mds.fit_transform(data) # alternative: projection, stress = execute_glimmer(data)
print(f"final stress={mds.stress}")
```

### Enable GPU acceleration
The GPU implementation is based on [CuPy](https://github.com/cupy/cupy), which is an optional dependency. 
The GPU implementation will only be available if a CuPy package is installed.
Which CuPy package to install depends on the available hardware and driver (i.e. `cupy-cuda12x`, `cupy-cuda13x`, or `cupy-rocm-7-0` as of writing). GPU acceleration can then be used as follows:
```python
mds = Glimmer(gpu=True, ....)
# or
projection, stress = execute_glimmer_gpu(data, ....)
```

### Complete example
Jittering the Iris data set to produce a data set of 38,400 points. Performing Glimmer on this data set.
```python
from pyglimmermds import Glimmer, execute_glimmer
from sklearn import preprocessing as prep
from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0xBA0BAB)

# get iris data
dataset = datasets.load_iris()
data = dataset.data
labels = dataset.target
# duplicate data with added noise
for _ in range(8):
  data = np.vstack((data,data+(rng.random((data.shape[0], data.shape[1]))*0.2-.1)))
  labels = np.append(labels,labels)
print(data.shape)
print(labels.shape)

# perform MDS
data = prep.StandardScaler().fit_transform(data)
mds = Glimmer(decimation_factor=2, stress_ratio_tol=1-1e-5, rng=rng)
projection = mds.fit_transform(data) # alternative: projection, stress = execute_glimmer(data)
print(f"final stress={mds.stress}")

# show scatter plot
fig, ax = plt.subplots()
scatter = ax.scatter(projection[:, 0], projection[:, 1], c=labels, s=0.02)
ax.axis('equal')
plt.show(fig)
```
![glimmer_iris](https://github.com/user-attachments/assets/67b10cec-2e76-4e5e-8d63-6ec9ab4765fa)


This video shows the layouting happening per level and iteration

https://github.com/user-attachments/assets/aa9f7a8c-1c03-46a3-8ee1-19b3d2d4033e


