Metadata-Version: 2.4
Name: gpu-tsne
Version: 0.1.0
Summary: GPU-accelerated t-SNE for CUDA, MPS, and CPU
Author-email: Jayant Parashar <parasharjayant10@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/Jayantparashar10/g-tsne
Project-URL: Repository, https://github.com/Jayantparashar10/g-tsne
Keywords: tsne,dimensionality-reduction,gpu,visualization,machine-learning
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0
Requires-Dist: numpy>=1.24
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Dynamic: license-file

# g-tsne

GPU-accelerated t-SNE for Python. Runs on CUDA, Apple MPS, and CPU — no code changes needed across backends. Drop-in sklearn-compatible API.

```python
from g_tsne import TSNE

embedding = TSNE(n_components=2, perplexity=30, device='cuda').fit_transform(X)
```

---

## Install

```bash
pip install -e .
```

**Dependencies:** PyTorch ≥ 2.0, NumPy. FAISS is optional (used for faster KNN when available).

---

## Quick Start

```python
import numpy as np
from g_tsne import TSNE

X = np.random.randn(5000, 128)

# Auto-detects best device (CUDA > MPS > CPU)
embedding = TSNE(perplexity=30).fit_transform(X)

# Explicit device
embedding = TSNE(perplexity=30, device='cuda').fit_transform(X)

# Cosine distance (useful for text/image embeddings)
embedding = TSNE(perplexity=30, metric='cosine').fit_transform(X)

# Online: embed new points into an existing layout
tsne = TSNE(perplexity=30).fit(X)
X_new = np.random.randn(20, 128)
new_embedding = tsne.transform(X_new)
```

---

## API

### `TSNE(...)`

| Parameter | Default | Description |
|---|---|---|
| `n_components` | `2` | Output dimensionality |
| `perplexity` | `30.0` | Effective neighbor count. Scalar or list for multiscale. Try 5–50. |
| `learning_rate` | `'auto'` | `'auto'` sets η = N / early_exaggeration (Belkina 2019). Or pass a float. |
| `early_exaggeration` | `12.0` | P scaling during early phase |
| `early_exaggeration_iter` | `250` | Iterations in early phase |
| `n_iter` | `750` | Total iterations (includes early exaggeration) |
| `initialization` | `'pca'` | `'pca'`, `'random'`, or an `(N, n_components)` numpy array |
| `metric` | `'euclidean'` | `'euclidean'` or `'cosine'` |
| `negative_gradient_method` | `'auto'` | `'auto'` (exact for N < 30K, fft for N ≥ 30K), `'exact'`, or `'fft'` |
| `exaggeration` | `None` | Extra P scaling during refinement phase |
| `dof` | `1.0` | Student-t degrees of freedom. 1.0 = standard t-SNE. |
| `max_step_norm` | `None` | Per-point velocity clip. `None` = disabled. |
| `knn_backend` | `'auto'` | `'auto'`, `'faiss'`, or `'exact'` |
| `knn_index` | `None` | Precomputed `(knn_indices, knn_dists)` to skip KNN search |
| `callbacks` | `None` | List of `fn(iter, kl, embedding) → bool`. Return `True` to stop early. |
| `callbacks_every_iters` | `50` | How often callbacks and verbose logging fire |
| `random_state` | `None` | Integer seed for reproducibility |
| `verbose` | `False` | Print iteration, KL divergence, and timing |
| `device` | `None` | `'cuda'`, `'mps'`, `'cpu'`, or `None` (auto-detect) |

### Methods

| Method | Returns | Description |
|---|---|---|
| `fit_transform(X)` | `np.ndarray` | Compute and return the embedding |
| `fit(X)` | `TSNE` | Compute embedding, store state for `transform` |
| `transform(X_new)` | `np.ndarray` | Embed new points into existing layout without re-running on all data |

`X` can be a numpy array or a PyTorch tensor.

---

## How It Works

Two gradient computation paths, selected automatically:

- **N < 30K** — exact O(N²) GPU matrix multiply. A single cuBLAS call is faster than FFT at this scale.
- **N ≥ 30K** — FFT-based O(N log N) repulsive gradient on a power-of-2 grid (256 or 512), using cuFFT / Metal FFT.

Per-point bandwidth (σ) is found via Newton's method rather than binary search — converges in 5–10 iterations instead of 50.

Online `transform` runs in O(m²) where m is the number of new points, instead of O((N+m)²) if the full embedding were re-run.

---

## Running Tests

```bash
pytest tests/
```

```bash
# Benchmark
python tests/benchmark.py
```
