Metadata-Version: 2.4
Name: rustpam
Version: 0.1.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: numpy>=1.20
Requires-Dist: scikit-learn>=1.0
Requires-Dist: pytest>=7.0 ; extra == 'dev'
Requires-Dist: pytest-cov>=4.0 ; extra == 'dev'
Requires-Dist: black>=23.0 ; extra == 'dev'
Requires-Dist: isort>=5.12 ; extra == 'dev'
Requires-Dist: mypy>=1.0 ; extra == 'dev'
Requires-Dist: maturin>=1.9,<2.0 ; extra == 'dev'
Provides-Extra: dev
License-File: LICENSE
Summary: High-performance PAM (k-medoids) clustering implemented in Rust with Python bindings
Keywords: clustering,k-medoids,PAM,machine-learning,rust,parallel
Author-email: RustPAM Contributors <your.email@example.com>
Maintainer-email: RustPAM Contributors <your.email@example.com>
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/yourusername/rustpam
Project-URL: Documentation, https://github.com/yourusername/rustpam#readme
Project-URL: Repository, https://github.com/yourusername/rustpam
Project-URL: Bug Tracker, https://github.com/yourusername/rustpam/issues
Project-URL: Changelog, https://github.com/yourusername/rustpam/releases

# RustPAM - High-Performance PAM Clustering in Rust

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

RustPAM is a Rust reimplementation of OneBatchPAM (k-medoids clustering) built with PyO3 and the Rayon parallelization framework. It aims to provide better performance and maintainability than the original Cython version.

## Features

- 🚀 **High Performance**: Core algorithm implemented in Rust with zero-cost abstractions
- ⚡ **Parallelization**: Data parallelism via Rayon to make use of all available CPU cores
- 🔧 **Engineering Excellence**: Built with PyO3 + maturin for straightforward integration with the Python ecosystem
- 📦 **User Friendly**: scikit-learn compatible API
- 🎯 **Memory Efficient**: Batch sampling avoids computing and storing the full pairwise distance matrix
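To give a sense of scale for the memory claim, the sketch below compares a dense n × n distance matrix with an n × m matrix of distances to a sampled batch. It assumes float32 entries (4 bytes each) and a hypothetical batch size of m = 1000; the batch size RustPAM actually chooses with `batch_size='auto'` may differ.

```python
def distance_matrix_bytes(n_rows: int, n_cols: int, bytes_per_entry: int = 4) -> int:
    """Bytes needed for a dense n_rows x n_cols matrix (float32 by default)."""
    return n_rows * n_cols * bytes_per_entry


n = 100_000  # number of samples
m = 1_000    # hypothetical batch size (assumption, not RustPAM's 'auto' value)

full = distance_matrix_bytes(n, n)   # full pairwise matrix: n x n
batch = distance_matrix_bytes(n, m)  # samples vs. sampled batch only: n x m

print(f"full:  {full / 1e9:.1f} GB")   # full:  40.0 GB
print(f"batch: {batch / 1e9:.1f} GB")  # batch: 0.4 GB
```

The saving grows linearly with n / m, which is why batch sampling matters most on large datasets.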

## Installation

### Install from Source

```bash
# Clone the repository
git clone <repository-url>
cd rustpam

# Build and install with maturin
pip install maturin
maturin develop --release
```

### Install with pip (once the package is published)

```bash
pip install rustpam
```

## Requirements

- Python >= 3.8
- NumPy >= 1.20
- scikit-learn >= 1.0
- Rust toolchain (only required when building from source)

## Quick Start

```python
import numpy as np
from rustpam import OneBatchPAM

# Generate sample data
X = np.random.randn(1000, 10).astype(np.float32)

# Create model
model = OneBatchPAM(
    n_medoids=5,
    distance='euclidean',
    max_iter=100,
    random_state=42,
    n_threads=4  # Use 4 threads
)

# Fit model
model.fit(X)

# Get cluster centers and labels
centers = model.cluster_centers_
labels = model.labels_

# Predict new data
X_new = np.random.randn(100, 10).astype(np.float32)
new_labels = model.predict(X_new)

print(f"Medoid indices: {model.medoid_indices_}")
print(f"Inertia: {model.inertia_:.4f}")
print(f"Iterations: {model.n_iter_}")
```

## API Documentation

### OneBatchPAM

**Parameters:**

- `n_medoids` (int, default=10): Number of medoids (i.e., clusters) to select
- `distance` (str, default='euclidean'): Distance metric; any metric supported by scikit-learn's pairwise distance computation
- `batch_size` ('auto' or int, default='auto'): Number of samples in the batch used to approximate the objective
- `weighting` (bool, default=True): Whether to apply cluster size weighting
- `max_iter` (int, default=100): Maximum number of iterations
- `tol` (float, default=1e-6): Convergence tolerance
- `n_jobs` (int or None, default=None): Number of parallel jobs for scikit-learn distance computation
- `random_state` (int or None, default=None): Random seed
- `n_threads` (int or None, default=None): Number of threads used by the Rust core

**Attributes:**

- `medoid_indices_`: Indices (into the training data) of the selected medoids
- `labels_`: Cluster label for each training sample
- `inertia_`: Value of the objective function
- `dist_to_nearest_medoid_`: Distance from each training sample to its nearest medoid
- `n_iter_`: Number of iterations actually run
- `cluster_centers_`: Feature vectors of the selected medoids

**Methods:**

- `fit(X)`: Fit the model
- `predict(X)`: Assign each sample in `X` to its nearest medoid and return the cluster labels
- `fit_predict(X)`: Fit and return medoid indices
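In k-medoids, predicting a label means finding the nearest medoid. The pure-Python sketch below illustrates that assignment for the Euclidean case. `assign_to_medoids` is a hypothetical helper, not part of the RustPAM API; the real computation runs in Rust and supports other metrics.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_medoids(
    X: Sequence[Sequence[float]], centers: Sequence[Sequence[float]]
) -> Tuple[List[int], List[float]]:
    """Return (label, distance to nearest medoid) for every row of X."""
    labels: List[int] = []
    dists: List[float] = []
    for x in X:
        # Distance from this sample to every medoid; keep the closest one.
        d = [euclidean(x, c) for c in centers]
        best = min(range(len(centers)), key=d.__getitem__)
        labels.append(best)
        dists.append(d[best])
    return labels, dists


centers = [[0.0, 0.0], [10.0, 10.0]]  # plays the role of model.cluster_centers_
X_new = [[1.0, 0.0], [9.0, 11.0], [4.0, 4.0]]
labels, dists = assign_to_medoids(X_new, centers)
print(labels)  # [0, 1, 0]
```

The returned distances correspond conceptually to `dist_to_nearest_medoid_`, computed for new data instead of the training set.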

## Comparison with the Cython Implementation

Compared to the original Cython implementation, RustPAM is designed to offer:

1. **Parallel Scalability**: Rayon's work-stealing scheduler balances uneven workloads across threads
2. **Memory Safety**: Rust's ownership and borrowing rules prevent data races and use-after-free errors in safe code
3. **Easier Maintenance**: The type system and modern toolchain improve code quality
4. **Cross-Platform Builds**: maturin builds wheels for Windows, macOS, and Linux

## Algorithm Description

OneBatchPAM is an optimized variant of PAM (Partitioning Around Medoids):

1. **Batch Sampling**: Approximates the clustering objective using distances to a randomly sampled batch of points instead of the full distance matrix
2. **Greedy Swap**: In each iteration, finds the best (medoid, non-medoid) swap pair and applies it if it improves the objective
3. **Parallelization**: Candidate swaps are evaluated independently, so they can be computed in parallel
4. **Weighting**: Optional cluster size weighting improves stability for small samples
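The batch-approximation and greedy-swap steps above can be sketched in pure Python as follows. This is a simplified illustration under stated assumptions, not RustPAM's Rust implementation. The batch is drawn uniformly at random, the objective is the unweighted sum of distances from batch points to their nearest medoid (cluster size weighting is omitted), and iteration stops once the best swap improves the objective by no more than `tol`. The function name `one_batch_pam_sketch` is hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_batch_pam_sketch(
    X: Sequence[Sequence[float]],
    n_medoids: int,
    batch_size: int,
    max_iter: int = 100,
    tol: float = 1e-6,
    seed: int = 0,
) -> List[int]:
    rng = random.Random(seed)
    n = len(X)
    # 1. Batch sampling: store only sample-to-batch distances (n x m), not n x n.
    batch = rng.sample(range(n), min(batch_size, n))
    D = [[euclidean(X[i], X[j]) for j in batch] for i in range(n)]

    def cost(medoids: List[int]) -> float:
        # Approximate objective: each batch point's distance to its nearest medoid.
        return sum(min(D[m][b] for m in medoids) for b in range(len(batch)))

    medoids = rng.sample(range(n), n_medoids)
    current = cost(medoids)
    for _ in range(max_iter):
        # 2. Greedy swap: try every (medoid position, non-medoid) pair, keep the best.
        # 3. Each trial cost is independent of the others, hence parallelizable.
        best_cost, best_swap = current, None  # type: float, Optional[Tuple[int, int]]
        for pos in range(n_medoids):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                c = cost(trial)
                if c < best_cost:
                    best_cost, best_swap = c, (pos, cand)
        # Converged: no swap improves the approximate objective by more than tol.
        if best_swap is None or current - best_cost <= tol:
            break
        pos, cand = best_swap
        medoids[pos] = cand
        current = best_cost
    return sorted(medoids)


# Two well-separated blobs: indices 0-29 around (0, 0), indices 30-59 around (8, 8).
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)] for _ in range(30)]
X += [[data_rng.gauss(8, 0.5), data_rng.gauss(8, 0.5)] for _ in range(30)]
medoids = one_batch_pam_sketch(X, n_medoids=2, batch_size=20, seed=42)
print(medoids)  # typically one index from each blob
```

This naive version recomputes the full cost for every trial swap; a production implementation would evaluate swaps incrementally and spread the independent trials across threads.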

## Development

```bash
# Install development dependencies
pip install maturin pytest numpy scikit-learn

# Development mode build
maturin develop

# Run tests
pytest tests/

# Release mode build
maturin build --release
```

## Project Structure

```
rustpam/
├── src/
│   └── lib.rs           # Rust core implementation
├── rustpam/
│   ├── __init__.py      # Python package initialization
│   └── onebatchpam.py   # Python wrapper layer
├── Cargo.toml           # Rust dependencies
├── pyproject.toml       # Python project configuration
└── README.md
```

## Tech Stack

- **Rust**: Core algorithm implementation
- **PyO3**: Python-Rust bindings
- **maturin**: Build system
- **ndarray**: Rust array library
- **rayon**: Data parallelism framework
- **numpy**: Python array interface

## License

This project is licensed under the MIT License. See the `LICENSE` file for details.

## Contributing

Contributions are welcome! Please open an issue or submit a pull request.

## Acknowledgments

This project is based on the original Cython implementation of OneBatchPAM, rewritten in Rust to provide better performance and maintainability.

## Contact

For questions or suggestions, please submit a GitHub Issue.

