Metadata-Version: 2.4
Name: interpretable-embeddings
Version: 1.0.1
Summary: Package for creating rank-based interpretable and contextual embeddings.
Author-email: Thiago César Castilho Almeida <tc.almeida@unesp.br>, Lucas Pascotti Valem <lucaspascottivalem@gmail.com>, Daniel Carlos Guimarães Pedronette <pedronette@gmail.com>
License: GPL-2.0
Keywords: interpretability,representation learning,dimensionality reduction,graph embedding,ranking,unsupervised learning
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v2 (GPLv2)
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: tqdm
Dynamic: license-file

# interpretable-embeddings

[![PyPI version](https://img.shields.io/pypi/v/interpretable-embeddings.svg)](https://pypi.org/project/interpretable-embeddings/)
[![License](https://img.shields.io/pypi/l/interpretable-embeddings.svg)](https://github.com/thcastilho/interpretable-embeddings/blob/main/LICENSE)
[![GitHub Repo](https://img.shields.io/badge/GitHub-Official_Repo-blue?logo=github)](https://github.com/thcastilho/interpretable-embeddings)

**interpretable-embeddings** is the official implementation of GRaCE algorithm for generating rank-based interpretable and contextual embeddings from top-K similarity lists.

This package implements the **RaDE** and **GRaCE** algorithms, which use graph-based measures to create **interpretable**, **effective**, and **unsupervised** embeddings for retrieval, clustering, classification, and visualization.

---

## 🔍 Overview

Unlike traditional embedding techniques that require raw features or supervised training, this package builds representations **entirely from ranked similarity lists** (e.g., from a kNN graph or retrieval system). Each embedding dimension corresponds to a similarity measure with the "leaders" (reference node).

Key benefits:
- **Unsupervised**: No labels or ground truth needed.
- **Interpretable**: Embedding dimensions are human-understandable.
- **Versatile**: Works for text, images, graphs—any domain with top-K similarities.

---

## 📦 Installation

```bash
pip install interpretable-embeddings
```

**Dependencies:**
- `numpy`
- `tqdm`

Requires Python ≥ 3.7.

---

## 🧠 Algorithms

### RaDE (Rank-based Diffusion Embedding)
- Selects leaders by propagating rank-based affinities through a diffusion process.

### GRaCE (Graph and Rank-based Contextual Embeddings)
- Extends RaDE with *unsupervised effectiveness estimation* (e.g., Reciprocal Density, Accumulated JacMax) and *rank correlation measures* (e.g., Reciprocal Distance, JacMax).

---

## 🛠 Usage

### Input Format

Your input must be a `.txt` file with one ranked list per line (space-separated item IDs):

```
15 3 8 22 7 9 ...
3 2 11 5 6 ...
...
```

Each line is a query, and each number is a retrieved item.

---

### RaDE Example

```python
from interpretable_embeddings import RaDE

# Initialize
rade = RaDE(rks_path="data/ranked_lists.txt", rks_size_L=20)

# Compute internal structure
rade.fit(num_candidates=1000, num_leaders=128, t=2)

# Get embedding vectors
embeddings = rade.transform()

# Or do both in one call
embeddings = rade.fit_transform(num_candidates=1000, num_leaders=128, t=2)
```

---

### GRaCE Example

```python
from interpretable_embeddings import GRaCE

grace = GRaCE(
    rks_path="data/ranked_lists.txt",
    top_K=20,
    correlation_measure="jacmax",  # or "reciprocal"
    estimation_measure="reciprocal_density",  # or "accjacmax"
    alpha=0.95
)

# Compute internal structure
grace.fit(num_leaders=128)

# Get embedding vectors
embeddings = grace.transform()

# Or do both in one call
embeddings = grace.fit_transform(num_leaders=128)
```

---

## 🔬 Example Applications

### Retrieval

```python
from sklearn.metrics.pairwise import cosine_similarity

query_idx = 0
sims = cosine_similarity(embeddings)
top_k = sims[query_idx].argsort()[::-1][1:11]
print("Top-10 results for query:", top_k)
```

### Clustering

```python
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=5).fit(embeddings)
print(kmeans.labels_)
```

### Classification

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(embeddings, labels)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))
```

### Visualization

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

proj = TSNE(n_components=2).fit_transform(embeddings)
plt.scatter(proj[:, 0], proj[:, 1], c=labels, cmap="tab10")
plt.title("2D Visualization of RaDE Embeddings")
plt.show()
```

---

## 📁 Package Structure

```
interpretable-embeddings/
│
├── rade.py                 # RaDE implementation
├── grace.py                # GRaCE implementation
├── utils.py                # Ranked list reader
└── measures/
    ├── qpp.py              # Query performance prediction measures (AccJacMax, Reciprocal Density)
    └── correlation.py      # Rank correlation measures (JacMax, Reciprocal KNN)

```

---

## 📚 Citation

If you use this package in your research, please cite:

### GRaCE

> **Almeida, T. C. C., Letício, G. R., Valem, L. P., Freitas, A., Pedronette, D. C. G.**
> *Effective Graph and Rank-based Contextual Embeddings for Textual and Multimedia Data*
> 2025 International Joint Conference on Neural Networks (IJCNN), Rome – Italy.
> [![GRaCE](https://img.shields.io/badge/View%20Paper-GRaCE-blue)](https://doi.org/10.1109/IJCNN64981.2025.11229362)

---

### RaDE

> **De Fernando, F. A., Pedronette, D. C. G., De Sousa, G. J., Valem, L. P., Guilherme, I. R.**
> *RaDE: A Rank-based Graph Embedding Approach*
> 15th International Conference on Computer Vision Theory and Applications (VISAPP), 2020.
> [![RaDE](https://img.shields.io/badge/View%20Paper-RaDE-blue)](https://doi.org/10.5220/0008985901420152)

---

## 🤝 Contact

- Thiago César Castilho Almeida: `tc.almeida@unesp.br`
- Lucas Pascotti Valem: `lucaspascottivalem@gmail.com`
- Daniel Carlos Guimarães Pedronette: `pedronette@gmail.com`
