Metadata-Version: 2.4
Name: pearl-H
Version: 0.1.4
Summary: PEARL: Prototype-Enhanced Aligned Representation Learning
Author: Ruiyu Zhang, Lin Nie, Wai-Fung Lam, Qihao Wang, Xin Zhao
License: MIT
Project-URL: Homepage, https://github.com/yourusername/pearl
Project-URL: Documentation, https://pearl-ai.readthedocs.io
Project-URL: Repository, https://github.com/yourusername/pearl
Project-URL: Bug Tracker, https://github.com/yourusername/pearl/issues
Keywords: machine-learning,deep-learning,embedding,prototype-learning,text-classification,representation-learning,pytorch
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: torch>=1.12.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: openpyxl>=3.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=4.0.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"
Requires-Dist: sphinx>=4.0.0; extra == "dev"
Provides-Extra: examples
Requires-Dist: transformers>=4.20.0; extra == "examples"
Requires-Dist: datasets>=2.0.0; extra == "examples"
Requires-Dist: matplotlib>=3.5.0; extra == "examples"
Requires-Dist: seaborn>=0.11.0; extra == "examples"
Provides-Extra: all
Requires-Dist: pearl-H[dev,examples]; extra == "all"
Dynamic: license-file

# PEARL (`pearl-H`)

PEARL (**P**rototype-**E**nhanced **A**ligned **R**epresentation **L**earning) is a
lightweight, label-efficient post-processing method for **refining fixed embeddings** (e.g., sentence/document
embeddings) to improve **local neighborhood geometry** for similarity-driven systems such as kNN retrieval,
case-based routing, and embedding-based classifiers.
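One way to make "local neighborhood geometry" concrete is a simple proxy: the fraction of each embedding's k nearest neighbors (by cosine similarity) that share its label. The sketch below is an illustrative assumption and is not necessarily the metric used in the PEARL paper. `knn_label_purity` is a hypothetical helper and is not part of `pearl`.

```python
import numpy as np

def knn_label_purity(X: np.ndarray, y: np.ndarray, k: int = 5) -> float:
    """Fraction of each point's k nearest (cosine) neighbors that share its label.

    Hypothetical helper: a generic proxy for neighborhood quality, not PEARL's own metric.
    Requires k <= N - 1.
    """
    # L2-normalize rows so that dot products equal cosine similarities.
    norms = np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    Xn = X / norms
    sims = Xn @ Xn.T
    # Exclude each point from its own neighbor list.
    np.fill_diagonal(sims, -np.inf)
    # Indices of the k most similar points per row (order within the k does not matter).
    nbrs = np.argpartition(-sims, kth=k - 1, axis=1)[:, :k]
    return float((y[nbrs] == y[:, None]).mean())

# Usage: two well-separated synthetic clusters.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, (20, 8)) + 5.0 * np.eye(8)[0],
    rng.normal(0.0, 0.1, (20, 8)) + 5.0 * np.eye(8)[1],
])
y = np.array([0] * 20 + [1] * 20)
print(knn_label_purity(X, y, k=5))  # 1.0 for these well-separated clusters
```

Comparing this score on raw embeddings versus refined ones is one way to check whether a refinement actually tightened same-label neighborhoods on a given dataset.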

This package implements a practical PEARL workflow:

- **Signal extraction**: learns a small refinement network to separate class-discriminative signal from residual
  variation while preserving the original embedding dimensionality.
- **Prototype-augmented features (PAF)**: fits per-class prototypes (KMeans) and augments embeddings with
  prototype/centroid similarity features (useful for downstream lightweight models).
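You can approximate the PAF step in a few lines with scikit-learn, which is already a dependency. This is a minimal sketch. It assumes cosine similarity, k-means prototypes fit separately for each class, and two features per class: similarity to the nearest prototype, and similarity to the class centroid. The exact features `PAFAugmentor` computes may differ. `fit_class_prototypes` and `paf_features` are hypothetical names and are not part of the `pearl` API.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_class_prototypes(X, y, n_prototypes=3, seed=0):
    """Hypothetical: per class, fit KMeans centers and also store the class mean (centroid)."""
    protos = {}
    for c in np.unique(y):
        Xc = X[y == c]
        k = min(n_prototypes, len(Xc))  # guard against classes with very few samples
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Xc)
        protos[int(c)] = (km.cluster_centers_, Xc.mean(axis=0))
    return protos

def _unit(A):
    # Row-wise L2 normalization so dot products are cosine similarities.
    return A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-12)

def paf_features(X, protos):
    """Hypothetical: append nearest-prototype and centroid cosine similarity for every class."""
    Xn = _unit(X)
    cols = []
    for c in sorted(protos):
        centers, centroid = protos[c]
        cols.append((Xn @ _unit(centers).T).max(axis=1))   # similarity to nearest prototype
        cols.append(Xn @ _unit(centroid[None, :])[0])      # similarity to class centroid
    return np.hstack([X, np.stack(cols, axis=1)])

# Usage on synthetic data: 3 classes, 16-dim embeddings.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 16)).astype(np.float32)
y = np.repeat(np.arange(3), 20)
protos = fit_class_prototypes(X, y, n_prototypes=3)
X_aug = paf_features(X, protos)
print(X_aug.shape)  # (60, 22)
```

The output has `D + 2 * n_classes` columns, so it no longer matches the original embedding dimensionality. The extra similarity columns are compact inputs for a lightweight downstream model.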

## Installation

```bash
pip install pearl-H
```

## Quickstart (recommended)

PEARL assumes you already have embeddings `X` from a fixed encoder. You provide a small labeled subset
`(X_train, y_train)` to fit the refinement, then transform any embeddings for retrieval/classification.

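```python
import numpy as np
from pearl import PEARLPipeline

# Placeholder data so the snippet runs end to end; replace with embeddings from your own encoder.
# X_*: [N, D] float32 arrays; y_*: [N] integer labels in [0, n_classes)
rng = np.random.default_rng(0)
n_classes, dim = 10, 384
X_train = rng.standard_normal((500, dim)).astype(np.float32)
y_train = rng.integers(0, n_classes, size=500)
X_val = rng.standard_normal((100, dim)).astype(np.float32)
y_val = rng.integers(0, n_classes, size=100)
X_test = rng.standard_normal((200, dim)).astype(np.float32)

pipeline = PEARLPipeline(n_classes=n_classes, device="auto")

pipeline.fit(X_train, y_train, X_val=X_val, y_val=y_val, epochs=100, patience=20)

# Choose the output you want:
X_enhanced = pipeline.transform(X_test, mode="enhanced")  # same dim as input
X_paf = pipeline.transform(X_test, mode="paf")            # augmented with prototype features
```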

## Core API

- `PEARLPipeline`: end-to-end training + transformation (`fit`, `transform`, `fit_transform`).
- `SignalExtractorTrainer`: trains the refinement model; produces **same-dimensional** enhanced embeddings.
- `PAFAugmentor`: appends prototype/centroid similarity features to embeddings.
- `RAGClassifierWrapper`: retrieval-augmented classifier over embeddings (kNN retrieval + cross-attention).
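`SignalExtractorTrainer` produces enhanced embeddings with the same dimensionality as its input. The sketch below illustrates only that property. It uses a hypothetical residual MLP (`x + MLP(x)`, zero-initialized so training starts from the identity map), trained through an auxiliary linear classification head with cross-entropy. This is an assumption for illustration only. It does not model PEARL's separation of class-discriminative signal from residual variation. `ResidualRefiner` and `train_refiner` are not part of the `pearl` API.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualRefiner(nn.Module):
    """Hypothetical dimension-preserving refiner: x -> x + MLP(x)."""

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
        # Zero the last layer so the module is exactly the identity before training.
        nn.init.zeros_(self.mlp[-1].weight)
        nn.init.zeros_(self.mlp[-1].bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mlp(x)

def train_refiner(X, y, n_classes, epochs=50, lr=1e-3, seed=0):
    """Full-batch training; the linear head is auxiliary and discarded afterwards."""
    torch.manual_seed(seed)
    X = torch.as_tensor(X, dtype=torch.float32)
    y = torch.as_tensor(y, dtype=torch.long)
    refiner = ResidualRefiner(X.shape[1])
    head = nn.Linear(X.shape[1], n_classes)
    opt = torch.optim.Adam(list(refiner.parameters()) + list(head.parameters()), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(head(refiner(X)), y)
        loss.backward()
        opt.step()
    return refiner

# Usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 32)).astype(np.float32)
y = rng.integers(0, 4, size=100)
refiner = train_refiner(X, y, n_classes=4, epochs=20)
with torch.no_grad():
    X_enh = refiner(torch.from_numpy(X)).numpy()
print(X_enh.shape)  # (100, 32)
```

The residual form keeps refined embeddings close to the originals at initialization. This is a common way to avoid distorting a fixed encoder's geometry when only a few labels are available.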

### Input conventions

- Embeddings: `numpy.ndarray` of shape `[N, D]` (float32/float64).
- Labels: `numpy.ndarray` of shape `[N]` with integer class ids `0..n_classes-1`.
- Device: `"auto"`, `"cuda"`, `"mps"`, `"cpu"` (or a `torch.device`).
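You can enforce the conventions above with a small validator before fitting. This is a minimal sketch. It assumes `"auto"` prefers CUDA, then Apple MPS, then CPU; the package's actual resolution order is not documented here. `check_inputs` and `resolve_device` are hypothetical helpers and are not part of the `pearl` API.

```python
import numpy as np
import torch

def check_inputs(X, y, n_classes):
    """Hypothetical validator for the [N, D] float embeddings / [N] integer labels convention."""
    X = np.asarray(X)
    y = np.asarray(y)
    if X.ndim != 2:
        raise ValueError(f"X must be [N, D], got shape {X.shape}")
    if X.dtype not in (np.float32, np.float64):
        raise TypeError(f"X must be float32/float64, got {X.dtype}")
    if y.shape != (X.shape[0],):
        raise ValueError(f"y must be [N] with N={X.shape[0]}, got shape {y.shape}")
    if not np.issubdtype(y.dtype, np.integer):
        raise TypeError(f"y must hold integer class ids, got {y.dtype}")
    if y.size and (y.min() < 0 or y.max() >= n_classes):
        raise ValueError(f"labels must lie in 0..{n_classes - 1}")
    return X, y

def resolve_device(device="auto"):
    """Hypothetical: map "auto" to cuda, then mps, then cpu; pass explicit choices through."""
    if isinstance(device, torch.device):
        return device
    if device == "auto":
        if torch.cuda.is_available():
            return torch.device("cuda")
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return torch.device("mps")
        return torch.device("cpu")
    return torch.device(device)

# Usage
X = np.zeros((4, 3), dtype=np.float32)
y = np.array([0, 1, 2, 1])
check_inputs(X, y, n_classes=3)
print(resolve_device("cpu"))  # cpu
```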

## Paper & citation

If you use PEARL in academic work, please cite the paper:

```bibtex
@misc{zhang2026pearlprototypeenhancedalignmentlabelefficient,
      title={PEARL: Prototype-Enhanced Alignment for Label-Efficient Representation Learning with Deployment-Driven Insights from Digital Governance Communication Systems},
      author={Ruiyu Zhang and Lin Nie and Wai-Fung Lam and Qihao Wang and Xin Zhao},
      year={2026},
      eprint={2601.17495},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2601.17495},
}
```

## License

MIT License. See `LICENSE`.
