Metadata-Version: 2.3
Name: ggml-ot
Version: 0.9.6
Summary: Global Ground Metric Learning
Author: Damin Kuehn
Author-email: kuehn@cs.rwth-aachen.de
Requires-Python: >=3.11
Classifier: Programming Language :: Python :: 3
Classifier: Development Status :: 5 - Production/Stable
Classifier: Framework :: Sphinx
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Healthcare Industry
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Requires-Dist: anndata (>=0.11.4,<0.12.0)
Requires-Dist: gprofiler-official (>=1.0.0,<2.0.0)
Requires-Dist: ipython (>=9.3.0,<10.0.0)
Requires-Dist: matplotlib (>=3.10.1,<4.0.0)
Requires-Dist: numpy (>=2.2.4,<3.0.0)
Requires-Dist: pandas (>=2.2.3,<3.0.0)
Requires-Dist: pot (>=0.9.5,<0.10.0)
Requires-Dist: pydiffmap (>=0.2.0.1,<0.3.0.0)
Requires-Dist: requests (>=2.32.4,<3.0.0)
Requires-Dist: scanpy (>=1.11.1,<2.0.0)
Requires-Dist: scikit-learn (>=1.1,<1.6)
Requires-Dist: scipy (>=1.15.2,<2.0.0)
Requires-Dist: seaborn (>=0.13.2,<0.14.0)
Requires-Dist: torch (>=2.6.0,<3.0.0)
Requires-Dist: tqdm (>=4.67.1,<5.0.0)
Requires-Dist: umap-learn (>=0.5.7,<0.6.0)
Project-URL: Homepage, https://github.com/DaminK/ggml-ot
Project-URL: Issues, https://github.com/DaminK/ggml-ot/issues
Description-Content-Type: text/markdown

# GGML-OT

<img src="https://github.com/DaminK/ggml-ot/blob/main/docs/source/images/icon_ggrouml.png?raw=True" width="300" />

## Abstract

Optimal transport (OT) provides a robust framework for comparing probability distributions.
Its effectiveness is significantly influenced by the choice of the underlying ground metric.
Traditionally, the ground metric has either been (i) predefined, e.g. as a Euclidean metric, or (ii) learned in a supervised way, by utilizing labeled data to learn a suitable ground metric for enhanced task-specific performance.
While predefined metrics often do not account for the inherent structure and varying significance of different features in the data, existing supervised ground metric learning methods often fail to generalize across multiple classes or are limited to distributions with shared supports.
To address this issue, this paper introduces a novel approach for learning metrics for arbitrary distributions over a shared metric space.
Our method differentiates elements like a global metric, but requires only distribution-level class labels for training, akin to a ground metric.
The resulting learned global ground metric enables more accurate OT distances, which can significantly improve clustering and classification tasks. It can create task-specific shared embeddings across elements of different distributions including unseen data.
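
To make the role of the ground metric concrete, the following is a minimal, self-contained sketch and not the `ggml-ot` API. It assumes a ground metric of the form `d(x, y) = ||W(x - y)||` for a linear map `W`, and it uses a brute-force solver that is exact only for tiny, equally sized point clouds with uniform weights. The function names, the toy data, and the hand-picked `W` are all hypothetical; in the method described above, the metric would be learned from distribution-level labels rather than set by hand.

```python
from itertools import permutations
from math import sqrt


def ground_distance(x, y, W):
    """Distance ||W(x - y)||_2 induced by a linear map W (given as a list of rows)."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    proj = [sum(w * d for w, d in zip(row, diff)) for row in W]
    return sqrt(sum(p * p for p in proj))


def ot_distance(xs, ys, W):
    """Exact OT cost between two uniform point clouds of equal size.

    With uniform weights and equal sizes, some optimal plan is a permutation,
    so enumerating all permutations is exact (feasible only for very small n).
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("this toy solver needs equally sized point clouds")
    return min(
        sum(ground_distance(xs[i], ys[p[i]], W) for i in range(n)) / n
        for p in permutations(range(n))
    )


# Toy distributions: feature 0 carries the class signal, feature 1 is noise.
a = [(0.0, 0.0), (0.0, 5.0)]  # class A
b = [(0.0, 2.0), (0.0, 7.0)]  # class A, shifted only along the noisy feature
c = [(1.0, 0.0), (1.0, 5.0)]  # class B, shifted along the informative feature

euclidean = [[1.0, 0.0], [0.0, 1.0]]  # predefined ground metric
projected = [[1.0, 0.0], [0.0, 0.0]]  # hypothetical "learned" map that ignores the noise

for name, W in [("euclidean", euclidean), ("projected", projected)]:
    same = ot_distance(a, b, W)
    other = ot_distance(a, c, W)
    print(f"{name}: same-class = {same:.1f}, different-class = {other:.1f}")
```

Under the Euclidean ground metric, the two same-class distributions end up farther apart (2.0) than the two different-class ones (1.0). Under the projected map, the same-class distance drops to 0.0 while the different-class distance stays at 1.0. Because this `W` is rank-deficient, it defines a pseudo-metric. For realistic problem sizes, a dedicated solver such as `ot.emd2` from POT (a dependency of this package) would replace the brute-force search.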

## Installation

### Via pip

```terminal
pip install ggml-ot
```

### Manual

```terminal
git clone https://github.com/DaminK/ggml-ot
cd ggml-ot
pip install poetry
poetry lock && poetry install
```

### Development installation

```terminal
git clone https://github.com/DaminK/ggml-ot
cd ggml-ot
pip install poetry
poetry lock && poetry install --with dev
pre-commit install
```

## Citation

If you use this code in your research, please cite the following paper:
> Global Ground Metric Learning with Applications to scRNA data
>
> Damin Kuehn and Michael T. Schaub, Department of Computer Science, RWTH Aachen
>
> Published at AISTATS 2025 (DOI will follow)

