Metadata-Version: 2.4
Name: nacvi
Version: 0.1.0
Summary: Cluster Validity Indices for Noise-Aware Clusterings (e.g. DBSCAN)
Home-page: https://github.com/leaeb/noise-aware-cvi
Author: Lea Eileen Brauner
Author-email: le.brauner@ostfalia.de
License: GPL-3.0-or-later
Project-URL: Bug Tracker, https://github.com/leaeb/noise-aware-cvi/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: contourpy==1.3.2
Requires-Dist: cycler==0.12.1
Requires-Dist: fonttools==4.58.5
Requires-Dist: joblib==1.5.1
Requires-Dist: kiwisolver==1.4.8
Requires-Dist: numpy==2.3.1
Requires-Dist: packaging==25.0
Requires-Dist: pandas==2.3.1
Requires-Dist: pillow==11.3.0
Requires-Dist: pyparsing==3.2.3
Requires-Dist: python-dateutil==2.9.0.post0
Requires-Dist: pytz==2025.2
Requires-Dist: scikit-learn==1.7.0
Requires-Dist: scipy==1.16.0
Requires-Dist: six==1.17.0
Requires-Dist: threadpoolctl==3.6.0
Requires-Dist: tzdata==2025.2
Dynamic: license-file

# Noise-Aware Cluster Validity Indices (NACVI)

 This repository contains a Python implementation of internal cluster validity indices specifically designed for **noise-aware Clusterings** (e.g. DBSCAN). The validity indices presented here explicitly consider **unassigned data points (noise)**, which makes them particularly suitable for realistic, unsupervised settings.

> This is based on the scientific publication:  
> **Lea Eileen Brauner, Frank Höppner, Frank Klawonn**  
> *Cluster Validity for Noise-Aware Clusterings*, Intelligent Data Analysis Journal, IOS Press (2025)

---

## Content

You can find the implementations of the following NACVIs:

- `sil+`: noise-aware Silhouette Coefficient
- `dbi+`: noise-aware Davies-Bouldin Index
- `gD33+`: noise-aware Dunn-Index-Variant
- `sf+`: noise-aware Score Function
- `grid+`: grid-based noise-validity index
- `nr+`: neighbourhood-based Noise-validity index

---

## Motivation

Conventional validity measures treat **all data points as belonging to a cluster**, even if noise is explicitly labelled in DBSCAN, for example. This leads to distorted evaluations.

This package:
- takes noise into account correctly,
- enables a separate evaluation of the **cluster structure** and the **noise delimitation**,
- offers an **integrated metric** for both with the `B+` score.

---

## Installation

```bash
pip install nacvi
```

## Usage

In examples/usage_miniexample.py you can find a minimal example for the usage with numpy arrays as inputs.

In examples/usage_example.ipynb you can find a comprehensive example with:
- data generation,
- execution of the DBSCAN clustering algorithm,
- visualisation,
- calculation of the NACVIs
