Metadata-Version: 2.4
Name: Leakly
Version: 0.1.1
Summary: Leakage checks for machine-learning pipelines using permutation tests.
Author: DeMONLab-BioFINDER
License-Expression: MIT
Project-URL: Homepage, https://github.com/DeMONLab-BioFINDER/Leakly
Project-URL: Source, https://github.com/DeMONLab-BioFINDER/Leakly
Project-URL: Issues, https://github.com/DeMONLab-BioFINDER/Leakly/issues
Keywords: data-leakage,machine-learning,permutation-test,scikit-learn
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: joblib<2.0,>=1.3
Requires-Dist: matplotlib<4.0,>=3.7
Requires-Dist: numpy<2.4,>=1.23
Requires-Dist: pandas<3.0,>=1.5
Requires-Dist: PyYAML<7.0,>=6.0
Requires-Dist: scikit-learn<1.7,>=1.6
Requires-Dist: scipy<1.16,>=1.9
Requires-Dist: tqdm<5.0,>=4.64
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-cov>=5.0; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Requires-Dist: wheel>=0.41; extra == "dev"
Provides-Extra: notebook
Requires-Dist: ipykernel>=6.0; extra == "notebook"
Requires-Dist: ipywidgets>=8.0; extra == "notebook"
Requires-Dist: jupyter>=1.0; extra == "notebook"
Dynamic: license-file

<p align="center">
  <img src="https://raw.githubusercontent.com/DeMONLab-BioFINDER/Leakly/main/assets/leakly-logo.svg" alt="Leakly logo" width="520">
</p>

<p align="center">
  <a href="https://pypi.org/project/Leakly/"><img alt="PyPI" src="https://img.shields.io/pypi/v/leakly.svg"></a>
  <a href="https://github.com/DeMONLab-BioFINDER/Leakly/actions/workflows/ci.yml?query=branch%3Amain"><img alt="Build" src="https://github.com/DeMONLab-BioFINDER/Leakly/actions/workflows/ci.yml/badge.svg?branch=main"></a>
  <a href="https://github.com/DeMONLab-BioFINDER/Leakly/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/license-MIT-green.svg"></a>
</p>

<p align="center">
  <a href="https://colab.research.google.com/github/DeMONLab-BioFINDER/Leakly/blob/main/example.ipynb"><img alt="Open in Google Colab" src="https://colab.research.google.com/assets/colab-badge.svg" height="32"></a>
</p>

# Leakly: Leakage checks for any machine-learning pipeline

`Leakly` uses label permutation to test whether a pipeline still performs above
chance when no true signal is present.

If it does, the pipeline may be leaking test-set information
through preprocessing, feature selection, tuning, or another step.

## Principle

1. Permute labels to remove real signal.
2. Run the full pipeline exactly as a user would run it.
3. Compare the resulting score distribution with the chance level (for example, 0.5 for AUC).
4. Above-chance permuted performance suggests possible leakage.

Leakly includes example configurations for a leaky pipeline and a non-leaky
pipeline so users can see the effect immediately.

![Example permutation AUC summary](https://raw.githubusercontent.com/DeMONLab-BioFINDER/Leakly/main/assets/AUC.png)
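
Steps 3 and 4 can be sketched with NumPy alone. The helper below is illustrative and not part of Leakly's API: `summarize_permuted_scores` is a hypothetical name, and the scores are simulated rather than produced by a real pipeline.

```python
import numpy as np


def summarize_permuted_scores(scores, chance_level=0.5):
    """Summarize permuted-label scores relative to chance (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    lower, upper = np.percentile(scores, [2.5, 97.5])
    return {
        "mean": float(scores.mean()),
        "interval_95": (float(lower), float(upper)),
        # Share of permutations that scored above chance.
        "fraction_above_chance": float((scores > chance_level).mean()),
    }


# Simulated scores for illustration only: one set centred on chance,
# one set shifted upward the way a leaky pipeline's scores might be.
rng = np.random.default_rng(0)
clean_scores = rng.normal(loc=0.50, scale=0.05, size=100)
leaky_scores = rng.normal(loc=0.80, scale=0.05, size=100)

clean_summary = summarize_permuted_scores(clean_scores)
leaky_summary = summarize_permuted_scores(leaky_scores)
print(clean_summary)
print(leaky_summary)
```

A distribution centred on chance is what a valid pipeline should produce. An interval that sits entirely above chance is the warning sign described in step 4.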

## Install

```bash
pip install Leakly
```

For notebook environments that need the optional notebook dependencies:

```bash
pip install "Leakly[notebook]"
```

To install from a local clone of the GitHub repository in editable mode:

```bash
git clone https://github.com/DeMONLab-BioFINDER/Leakly.git
cd Leakly
pip install -e .
```

## Quick Start

### <a href="https://colab.research.google.com/github/DeMONLab-BioFINDER/Leakly/blob/main/example.ipynb"><img alt="Open example.ipynb in Colab" src="https://img.shields.io/badge/Open-example.ipynb-F9AB00?logo=googlecolab&logoColor=white" height="28"></a>

### Key Python snippet

```python
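from leakly import (
    MLPipeline,
    SummaryPlotter,
    load_example_leakage_config,
    permute_label,
)

# `data` is assumed to be an already-loaded dataset object that exposes
# X, y, and covariates; it is not defined in this snippet.
scores = []
for seed in range(100):
    # Shuffle the labels so that no real signal remains.
    permuted_y = permute_label(data.y, random_state=seed)
    # Replace MLPipeline with any pipeline that returns a test score.
    pipeline = MLPipeline(
        data.X,
        permuted_y,
        covariates=data.covariates,
        config=load_example_leakage_config(),
    )
    scores.append(pipeline.fit().evaluate())

# Plot the permuted-score distribution against the chance level.
SummaryPlotter(scores, chance_level=0.5).plot("assets/AUC.png")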
```

## FAQ

**Can Leakly check my own pipeline?**

Yes. Leakly can evaluate any pipeline that takes `X`, `y`, and optional `covariates` and returns a test score. The key is to run the full pipeline exactly as in the real analysis, including preprocessing, feature selection, tuning, and evaluation.
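
As a rough sketch of what such a pipeline could look like, the example below wraps a scikit-learn workflow in a function with that signature and scores it on shuffled labels. The names `my_pipeline` and `run_permutation_check` are hypothetical, NumPy's `permutation` stands in for `permute_label`, and the data are random noise. How Leakly itself accepts a custom pipeline is not shown here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def my_pipeline(X, y, covariates=None, random_state=0):
    """Hypothetical user pipeline: split first, then fit everything on train only."""
    if covariates is not None:
        # Simplest possible use of covariates: append them as extra features.
        X = np.hstack([X, covariates])
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )
    # The scaler is fitted on the training split only, inside the pipeline.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])


def run_permutation_check(pipeline_fn, X, y, covariates=None, n_permutations=20):
    """Score pipeline_fn on shuffled labels, returning one score per permutation."""
    scores = []
    for seed in range(n_permutations):
        rng = np.random.default_rng(seed)
        permuted_y = rng.permutation(y)  # stand-in for leakly.permute_label
        score = pipeline_fn(
            X, permuted_y, covariates=covariates, random_state=seed
        )
        scores.append(score)
    return np.array(scores)


# Random data for illustration only.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 2, size=200)
covariates = rng.normal(size=(200, 2))

scores = run_permutation_check(my_pipeline, X, y, covariates=covariates)
print(f"mean permuted AUC: {scores.mean():.3f}")
```

Because every fitted step sits after the train/test split, the permuted scores here should scatter around 0.5.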

**Why can a leaky pipeline score well on permuted labels?**

After label permutation, there should be no real biological, clinical, or statistical link between features and outcomes. A valid pipeline should therefore perform near chance.

A leaky pipeline may still score well if information from the full dataset enters the analysis before the train/test split or outside the cross-validation loop. Common sources include feature selection, scaling, imputation, covariate adjustment, dimensionality reduction, or hyperparameter tuning performed on all samples.

This is especially problematic in high-dimensional data, such as neuroimaging, omics, or biomarker studies, where random label-specific patterns can appear meaningful by chance. If the test set influences preprocessing or feature selection, the model may "remember" these random patterns and show inflated performance.
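
The effect can be reproduced with scikit-learn alone. The sketch below is not Leakly's bundled example. It uses pure-noise features and unrelated labels, then compares feature selection done on all samples with selection refit inside each cross-validation fold. The sample size, feature count, and `k=20` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))            # pure noise, far more features than samples
y = rng.permutation(np.repeat([0, 1], 50))  # balanced labels unrelated to X

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Leaky: features are selected using all samples, including future test folds.
X_selected = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_auc = cross_val_score(
    LogisticRegression(max_iter=1000), X_selected, y, cv=cv, scoring="roc_auc"
).mean()

# Non-leaky: selection is refit on the training part of each fold.
clean_model = make_pipeline(
    SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000)
)
clean_auc = cross_val_score(clean_model, X, y, cv=cv, scoring="roc_auc").mean()

print(f"leaky AUC: {leaky_auc:.2f}, non-leaky AUC: {clean_auc:.2f}")
```

The only difference between the two runs is where `SelectKBest` is fitted, yet the leaky variant should report an AUC well above chance on data that contain no signal.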

**How many permutations should I run?**

Use 100 for a quick check. Use 1,000 or more for publication-level evidence.
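
One way to see the trade-off is through the standard error of the mean permuted score, which shrinks with the square root of the number of permutations. The sketch below assumes the permuted scores behave like roughly independent draws with a standard deviation of 0.05. Both the assumption and the helper name `standard_error_of_mean` are illustrative, and this is not necessarily how Leakly summarizes its results.

```python
import math


def standard_error_of_mean(score_sd, n_permutations):
    """Standard error of the mean of n roughly independent permuted scores."""
    return score_sd / math.sqrt(n_permutations)


score_sd = 0.05  # assumed spread of permuted scores, for illustration only

for n_permutations in (100, 1000):
    se = standard_error_of_mean(score_sd, n_permutations)
    print(f"{n_permutations} permutations: standard error of the mean ~ {se:.4f}")
```

Under these assumptions, going from 100 to 1,000 permutations narrows the uncertainty on the mean by a factor of about 3.2, at ten times the compute.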

## License

MIT. See [LICENSE](LICENSE).
