Metadata-Version: 2.1
Name: parametric_umap
Version: 0.1.0
Summary: A streamlined and fast implementation of parametric UMAP using PyTorch and FAISS
Home-page: https://github.com/mr-fcharles/parametric_umap
Author: Francesco Carli
Author-email: francesco.carli94@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: torch
Requires-Dist: faiss-cpu
Requires-Dist: tqdm

# Parametric UMAP
A PyTorch implementation of Parametric UMAP (Uniform Manifold Approximation and Projection) for learning low-dimensional parametric embeddings of high-dimensional data.

## Install
A proper installation of [PyTorch](https://pytorch.org) (possibly with GPU accelaration) is suggested before installing this package. The package can then be installed with

```python
pip install parametric_umap
```

## Overview
Parametric UMAP ([original paper](https://arxiv.org/abs/2009.12981)) extends the original UMAP algorithm by learning a neural network that can map new data points to the lower-dimensional space without having to rerun the entire optimization. This (unofficial) implementation provides a flexible and efficient way to perform parametric dimensionality reduction leveraging PyTorch and FAISS.

## Features
- Neural network-based parametric mapping
- Efficient nearest neighbor computation using FAISS
- Sparse matrix operations for memory efficiency
- GPU acceleration support
- Model saving and loading capabilities
- Correlation loss term to preserve distance relationships

## Quick start
```python
from parametric_umap import ParametricUMAP
from sklearn.datasets import make_swiss_roll
import numpy as np

# Create sample data
n_samples = 1000
X, color = make_swiss_roll(n_samples=n_samples, random_state=42)

# Initialize and fit the model
pumap = ParametricUMAP(
    n_components=2,
    hidden_dim=128,
    n_layers=3,
    n_epochs=10
)

# Fit and transform the data
embeddings = pumap.fit_transform(X)

# Transform new data
X_new = np.random.rand(100, 3)
new_embeddings = pumap.transform(X_new)
```

## Key Parameters
Hyperparameters default values follow the [original UMAP implementation](https://umap-learn.readthedocs.io/en/latest/)

**UMAP parameters**
- `a`: parameter for scaling distances between embedded points
- `b`: parameter for controlling sharpness of the curve's transition between attraction and repulsion
- `n_neighbors`: number of neighbors to compute for UMAP knn graph (default: 15)

**Parametric model**
- `n_components`: Dimension of the output embedding (default: 2)
- `hidden_dim`: Dimension of hidden layers in the MLP (default: 1024)
- `n_layers`: Number of hidden layers (default: 3)
- `n_neighbors`: Number of nearest neighbors (default: 15)
- `correlation_weight`: Weight of the correlation loss term (default: 0.1)
- `learning_rate`: Learning rate for optimization (default: 1e-4)
- `n_epochs`: Number of training epochs (default: 10)
- `batch_size`: Training batch size (default: 32)
- `use_batchnorm`: Whether to use batch normalization in the embedding MLP (default: False)
- `use_dropout`: Whether to use dropout in the embedding MLP (default: False)
