Metadata-Version: 2.4
Name: sude
Version: 0.2.1
Summary: A scalable manifold learning (SUDE) method that can cope with large-scale and high-dimensional data in an efficient manner.
Project-URL: Homepage, https://github.com/ZPGuiGroupWhu/SUDE-pkg
Project-URL: Repository, https://github.com/ZPGuiGroupWhu/SUDE-pkg.git
Project-URL: Issues, https://github.com/ZPGuiGroupWhu/SUDE-pkg/issues
Author-email: pdh <pengdh@whu.edu.cn>
License: MIT
Keywords: dimension reduction,embedding,landmark sampling,manifold learning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Requires-Dist: numba>=0.57
Requires-Dist: numpy>=1.21
Requires-Dist: scikit-learn>=1.3.2
Requires-Dist: scipy>=1.8
Provides-Extra: test
Requires-Dist: pytest>=7; extra == 'test'
Requires-Dist: tomli>=1.1.0; (python_version < '3.11') and extra == 'test'
Description-Content-Type: text/markdown

# Sampling-enabled scalable manifold learning unveils the discriminative cluster structure of high-dimensional data (SUDE)

We propose a scalable manifold learning (SUDE) method that can cope with
large-scale and high-dimensional data in an efficient manner. It starts by
seeking a set of landmarks to construct the low-dimensional skeleton of the
entire dataset, and then incorporates the non-landmarks into this skeleton using
constrained locally linear embedding.
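The second step can be illustrated with the standard locally linear embedding (LLE) reconstruction. A non-landmark is written as a sum-to-one weighted combination of its nearest landmarks, and the same weights then place it in the low-dimensional skeleton. The sketch below is a generic NumPy illustration under that assumption. The function names, the choice of `k`, and the regularization constant are hypothetical, and this is not the constrained variant implemented in the package.

```python
import numpy as np


def lle_weights(x, neighbors, reg=1e-3):
    """Sum-to-one weights that best reconstruct x from the rows of `neighbors`."""
    Z = neighbors - x                    # center the neighbors on x
    C = Z @ Z.T                          # local Gram matrix, shape (k, k)
    trace = np.trace(C)
    # Tikhonov regularization keeps C invertible when k exceeds the dimension
    C = C + reg * (trace if trace > 0 else 1.0) * np.eye(len(C))
    w = np.linalg.solve(C, np.ones(len(C)))
    return w / w.sum()                   # enforce sum(w) == 1


def embed_non_landmark(x, landmarks_high, landmarks_low, k=5, reg=1e-3):
    """Place x in the landmark skeleton using its k nearest landmarks."""
    dist = np.linalg.norm(landmarks_high - x, axis=1)
    idx = np.argsort(dist)[:k]           # indices of the k nearest landmarks
    w = lle_weights(x, landmarks_high[idx], reg)
    return w @ landmarks_low[idx]        # same weights, low-dimensional coordinates


# Toy usage: 3-D landmarks whose 2-D embedding is an affine map of their first two coordinates
landmarks_high = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [5, 5, 5]], dtype=float)
landmarks_low = 2 * landmarks_high[:, :2] + 1
y = embed_non_landmark(np.array([0.5, 0.5, 0.0]), landmarks_high, landmarks_low, k=4)
print(y)  # close to [2, 2]
```

Because the weights sum to one, an affine layout of the landmarks is reproduced exactly for a point that its neighbors can reconstruct, which is why the toy point lands at the center of the four surrounding landmarks.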

This repository provides the Python version of SUDE. Version 0.2.1 keeps the
public API of the original `sude` package while improving the runtime of the
probability construction, gradient computation, and non-landmark embedding
steps. The MATLAB version can be found at https://github.com/ZPGuiGroupWhu/sude.
The related paper has been published in *Nature Machine Intelligence*:
https://www.nature.com/articles/s42256-025-01112-9.

![image](https://raw.githubusercontent.com/ZPGuiGroupWhu/SUDE-pkg/refs/heads/main/image/sude.jpg)

## Project layout

The project now follows the structure of the
`scikit-learn-contrib/project-template`:

```text
.
|-- .github/workflows/
|-- benchmarks/
|-- doc/
|-- examples/
|-- image/
|-- sude/
|   |-- __init__.py
|   |-- _learning_utils.py
|   |-- _numba_kernels.py
|   |-- _sude.py
|   |-- _version.py
|   `-- learning.py
|-- tests/
|-- pyproject.toml
`-- README.md
```

## Installation
SUDE supports Python 3.8 and above.

The package is published on [PyPI](https://pypi.org/project/sude/) and can be installed directly with:

```bash
pip install sude
```

Numba is a required dependency, so the Numba-accelerated kernels are installed
by default. SUDE enables them automatically when both the number of unique
input samples and the number of landmarks are large enough.

The default thresholds are:

```python
NUMBA_AUTO_MIN_SAMPLES = 3000
NUMBA_AUTO_MIN_LANDMARKS = 512
```

You can adjust them before fitting:

```python
import sude.learning as sude_learning

sude_learning.NUMBA_AUTO_MIN_SAMPLES = 5000
sude_learning.NUMBA_AUTO_MIN_LANDMARKS = 1024
```

Both values must be positive integers. If either value is invalid, SUDE ignores
the thresholds and uses Numba whenever it is installed.
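The automatic switch described above amounts to a simple rule. The sketch below restates it as a hypothetical helper, `should_use_numba`, which is not part of the `sude` API. It assumes the thresholds are inclusive (`>=`) and that "unique samples" means distinct rows of `X`; neither detail is confirmed by this README.

```python
import numpy as np

NUMBA_AUTO_MIN_SAMPLES = 3000
NUMBA_AUTO_MIN_LANDMARKS = 512


def _is_positive_int(value):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, (int, np.integer)) and not isinstance(value, bool) and value > 0


def should_use_numba(X, n_landmarks, numba_available=True,
                     min_samples=NUMBA_AUTO_MIN_SAMPLES,
                     min_landmarks=NUMBA_AUTO_MIN_LANDMARKS):
    """Hypothetical restatement of the documented auto-enable rule."""
    if not numba_available:
        return False
    if not (_is_positive_int(min_samples) and _is_positive_int(min_landmarks)):
        return True  # invalid thresholds: use numba whenever it is installed
    n_unique = np.unique(np.asarray(X), axis=0).shape[0]  # distinct rows of X
    return n_unique >= min_samples and n_landmarks >= min_landmarks
```

Counting unique rows rather than all rows means a dataset with many duplicated samples can stay on the pure NumPy path even if its raw size exceeds the threshold.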

### Manual installation

```bash
git clone https://github.com/ZPGuiGroupWhu/SUDE-pkg.git
cd SUDE-pkg
pip install -e .
```

## How to run

The package now exposes both a scikit-learn style estimator class and a
function wrapper with matching parameter names.

### Estimator interface

```python
import time

import matplotlib.pyplot as plt  # not a SUDE dependency; install it separately
import numpy as np

from sude import SUDE

# Input data (the path is relative to the repository root)
data = np.loadtxt("benchmarks/rice.csv", delimiter=",")

# The last column holds the true labels; the remaining columns are features
X = data[:, :-1]
ref = data[:, -1]

# Fit a scikit-learn style estimator and time it
start_time = time.time()
model = SUDE(
    n_components=2,
    n_neighbors=10,
    init="pca",
    max_iter=50,
)
Y = model.fit_transform(X)
end_time = time.time()
print("Elapsed time:", end_time - start_time, "s")

# Color the 2-D embedding by the true labels
plt.scatter(Y[:, 0], Y[:, 1], c=ref, cmap="tab10", s=4)
plt.show()
```

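The estimator provides the familiar `fit_transform`/`transform` API, so a fitted model can embed new samples:

```python
from sklearn.model_selection import train_test_split
from sude import SUDE

# X is the feature matrix loaded in the previous example
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

model = SUDE(n_components=2, n_neighbors=10, init="le")
Y_train = model.fit_transform(X_train)  # fit and embed the training data
Y_test = model.transform(X_test)        # embed new samples with the fitted model
```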

### Function interface

The function entry point uses the same sklearn-style parameter names as the
estimator:

```python
from sude import sude

Y = sude(X, n_components=2, n_neighbors=10, init="le", max_iter=50)
```

For readers comparing with the paper or the original function interface, the
parameter names map as follows:

| Current parameter | Original parameter |
|---|---|
| `n_components` | `no_dims` |
| `n_neighbors` | `k1` |
| `init` | `initialize` |
| `max_iter` | `T_epoch` |
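Code written against the original names can be migrated by renaming keyword arguments. The helper below, `translate_legacy_kwargs`, is hypothetical and not shipped with `sude`; it simply encodes the four correspondences listed above and passes any other keyword through unchanged.

```python
# Hypothetical migration helper (not part of the sude package)
LEGACY_TO_SKLEARN = {
    "no_dims": "n_components",
    "k1": "n_neighbors",
    "initialize": "init",
    "T_epoch": "max_iter",
}


def translate_legacy_kwargs(**legacy_kwargs):
    """Return a copy of the kwargs with legacy SUDE names renamed."""
    translated = {}
    for name, value in legacy_kwargs.items():
        new_name = LEGACY_TO_SKLEARN.get(name, name)  # unknown names pass through
        if new_name in translated:
            raise TypeError(f"parameter {new_name!r} given more than once")
        translated[new_name] = value
    return translated
```

For example, `sude(X, **translate_legacy_kwargs(no_dims=2, k1=10, initialize="le", T_epoch=50))` would be equivalent to the function call shown earlier. Raising on duplicates catches calls that mix an old name with its new counterpart.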

Run the packaged example with:

```bash
uv run python examples/plot_sude_embedding.py
```

Run the test suite with:

```bash
uv run python -m unittest discover -s tests
```

## Citation request
Peng, D., Gui, Z., Wei, W. et al. Sampling-enabled scalable manifold learning unveils the discriminative cluster structure of high-dimensional data. Nat. Mach. Intell. (2025). https://doi.org/10.1038/s42256-025-01112-9


## License
SUDE is released under the MIT License.
