Metadata-Version: 2.4
Name: sledgehammer
Version: 1.0.0
Summary: SLEDgeHammer: Support, Length, Exclusivity and Difference Weighted for Group Evaluation
Home-page: https://aquinordg.github.io/sledgehammer
Author: R. Douglas G. de Aquino
Author-email: aquinordga@gmail.com
License: MIT
Project-URL: Bug Tracker, https://github.com/aquinordg/sledgehammer/issues
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Requires-Dist: numpy
Dynamic: license-file

![Project](https://img.shields.io/badge/Project-sledgehammer-blue)
![Author](https://img.shields.io/badge/Author-aquinordg-green)
![Python](https://img.shields.io/badge/Python-3.12+-blue)
![Version](https://img.shields.io/badge/Version-1.0.0-orange)
![License](https://img.shields.io/badge/License-MIT-lightgrey)

# SLEDgeHammer (SLEDgeH): Support, Length, Exclusivity and Difference Weighted for Group Evaluation

SLEDgeH is a Python library for evaluating clustering results using a semantic-based approach. Unlike traditional distance-based metrics, this method leverages the semantic relationship between significant frequent patterns identified among cluster items. This internal validation technique is particularly effective for data organized in **categorical form**.

---

## 🔥 Features

- **Semantic Descriptors**: Analyze feature support in clusters.
- **Particularization of Descriptors**: Refine cluster descriptors using customizable thresholds.
- **SLED Indicators**: Evaluate clusters based on Support (S), Length deviation (L), Exclusivity (E), and Descriptor support Difference (D).
- **Customizable Aggregation**: Choose from harmonic, geometric, or median aggregation for SLED indicators.
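
To give a feel for how the three aggregation choices differ, here is a minimal sketch using plain NumPy. It is not the library's implementation: the `aggregate` helper and the indicator values are hypothetical, and the `W` weights are ignored for simplicity.

```python
import numpy as np

def aggregate(indicators, method="harmonic"):
    """Collapse one cluster's S, L, E, D values into a single score (illustrative only)."""
    values = np.asarray(indicators, dtype=float)
    if method == "harmonic":
        # The harmonic mean is pulled toward the smallest value; any zero gives zero.
        if np.any(values == 0):
            return 0.0
        return float(len(values) / np.sum(1.0 / values))
    if method == "geometric":
        # n-th root of the product of the values.
        return float(np.prod(values) ** (1.0 / len(values)))
    if method == "median":
        # The median ignores how extreme the lowest and highest values are.
        return float(np.median(values))
    raise ValueError(f"unknown aggregation: {method}")

# Hypothetical indicator values for one cluster, with a weak Exclusivity (E).
sled = [0.9, 0.8, 0.2, 0.7]
for method in ("harmonic", "geometric", "median"):
    print(method, round(aggregate(sled, method), 3))
```

In this sketch a single weak indicator lowers the harmonic result the most and the median the least, which is the usual trade-off when picking between them.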

---

## 🛠 Installation

Install using *pip*:

```bash
pip install sledgehammer
```

---

## 🚀 Usage

### Importing the Library
```python
import numpy as np
from sklearn.cluster import KMeans  # requires: pip install scikit-learn
from sledgehammer import sledgehammer_score, sledgehammer_score_clusters, semantic_descriptors
```

### Example Workflow
```python
# Generate a random binary dataset
X = np.random.randint(0, 2, (100, 5))

# Specify the number of clusters
num_clusters = 3

# Perform K-Means clustering
kmeans = KMeans(n_clusters=num_clusters, random_state=42)
labels = kmeans.fit_predict(X)

# Calculate the SLEDgeH score
average_score = sledgehammer_score(X, labels, aggregation='median')
print(f"Average SLEDgeH Score: {average_score}\n")

# Generate semantic descriptors
report = semantic_descriptors(X, labels, particular_threshold=0.5, report_form=True)

# Print cluster descriptors
for i in range(num_clusters):
    print(f"Cluster {i}:\n{report[i]}\n")
```
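
The workflow above uses random binary data. Since the functions expect a binary feature matrix, categorical data would typically need to be encoded first. One possible approach, sketched here with pandas (the column names and values are made up for illustration), is one-hot encoding:

```python
import pandas as pd

# Hypothetical categorical records; names and values are illustrative only.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "M", "L"],
})

# Create one binary column per (feature, category) pair.
encoded = pd.get_dummies(df, dtype=int)

# Binary matrix of shape (n_samples, n_binary_features).
X = encoded.to_numpy()

print(encoded.columns.tolist())
print(X.shape)
```

The resulting `X` can then be clustered and scored in the same way as the random matrix in the workflow; keeping `encoded.columns` around makes it possible to map feature indices back to readable names.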

---

## 📜 Functions Overview

### `sledgehammer_score`
**Computes the average SLEDgeH score for all clusters.**

#### Parameters:
- **`X`**: Binary feature matrix of shape `(n_samples, n_features)`.
- **`labels`**: Cluster labels for each sample.
- **`W`**: Weighting factors for the SLED indicators (default `[0.3, 0.1, 0.5, 0.1]`).
- **`particular_threshold`**: Threshold for descriptor particularization (`None` for no particularization).
- **`aggregation`**: Aggregation method (`'harmonic'`, `'geometric'`, or `'median'`).

#### Returns:
- **`score`**: Average SLEDgeH score.

---

### `sledgehammer_score_clusters`
**Computes the SLEDgeH score for individual clusters.**

#### Parameters:
- Same as `sledgehammer_score`, except that `aggregation` may also be `None`:
  - **`aggregation=None`**: Skips aggregation and returns the score of each SLED indicator separately.

#### Returns:
- **`scores`**: Aggregated SLEDgeH scores for each cluster.
- **`score_matrix`**: Individual SLED indicator scores if `aggregation=None`.

---

### `semantic_descriptors`
**Computes semantic descriptors based on feature support in clusters.**

#### Parameters:
- **`X`**: Binary feature matrix of shape `(n_samples, n_features)`.
- **`labels`**: Cluster labels for each sample.
- **`particular_threshold`**: Threshold for descriptor particularization.
- **`report_form`**: If `True`, returns descriptors as a sorted dictionary for each cluster.

#### Returns:
- **`descriptors`**: Matrix with particularized feature support in clusters.
- **`report`**: Sorted dictionary of significant features in each cluster (if `report_form=True`).
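
As a rough illustration of what "feature support in clusters" can mean, the sketch below computes, for each cluster, the fraction of its samples in which each binary feature is present. This is an assumption about the general idea, not the library's actual code; `feature_support` and `support_report` are hypothetical helpers, and particularization is not applied.

```python
import numpy as np

def feature_support(X, labels):
    """Return cluster ids and a (n_clusters, n_features) matrix of per-cluster feature frequencies."""
    X = np.asarray(X)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # Mean of a binary column = share of samples in the cluster having that feature.
    support = np.vstack([X[labels == k].mean(axis=0) for k in clusters])
    return clusters, support

def support_report(clusters, support):
    """For each cluster, list features with non-zero support, highest first."""
    report = {}
    for k, row in zip(clusters, support):
        order = np.argsort(-row, kind="stable")
        report[int(k)] = {int(j): float(row[j]) for j in order if row[j] > 0}
    return report

X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 1, 1]])
labels = np.array([0, 0, 1, 1])

clusters, support = feature_support(X, labels)
print(support)
print(support_report(clusters, support))
```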

---

### `particularize_descriptors`
**Particularizes descriptors based on support thresholds.**

#### Parameters:
- **`descriptors`**: Feature support matrix of shape `(n_clusters, n_features)`.
- **`particular_threshold`**: Threshold for particularization (default `1.0`).

#### Returns:
- **`descriptors`**: Matrix with particularized support values.

---

## 📄 License

This project is licensed under the MIT License. See the `LICENSE` file for details.

---

## 🤝 Contributing

We welcome contributions to SLEDgeH! To contribute:
1. Fork this repository.
2. Create a new branch for your feature.
3. Submit a pull request with your changes.

For questions or information, feel free to reach out at: [aquinordga@gmail.com](mailto:aquinordga@gmail.com).

---

## 👨‍💻 Author

Developed by AQUINO, R. D. G. 
[![Lattes](https://github.com/aquinordg/custom_tools/blob/main/icons/icons8-plataforma-lattes-32.png)](http://lattes.cnpq.br/2373005809061037)
[![ORCID](https://github.com/aquinordg/custom_tools/blob/main/icons/icons8-orcid-32.png)](https://orcid.org/0000-0002-8486-8354)
[![Google Scholar](https://github.com/aquinordg/custom_tools/blob/main/icons/icons8-google-scholar-32.png)](https://scholar.google.com/citations?user=r5WsvKgAAAAJ&hl)

---

## 💬 Feedback

Feel free to open an issue or contact me for feedback or feature requests. Your input is highly appreciated!
