Metadata-Version: 2.4
Name: pearl-H
Version: 0.1.1
Summary: PEARL: Prototype-guided Embedding Refinement via Adaptive Representation Learning
Author-email: PEARL Contributors <pearl@example.com>
License: MIT
Project-URL: Homepage, https://github.com/yourusername/pearl
Project-URL: Documentation, https://pearl-ai.readthedocs.io
Project-URL: Repository, https://github.com/yourusername/pearl
Project-URL: Bug Tracker, https://github.com/yourusername/pearl/issues
Keywords: machine-learning,deep-learning,embedding,prototype-learning,text-classification,representation-learning,pytorch
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: torch>=1.12.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: openpyxl>=3.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=4.0.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"
Requires-Dist: sphinx>=4.0.0; extra == "dev"
Provides-Extra: examples
Requires-Dist: transformers>=4.20.0; extra == "examples"
Requires-Dist: datasets>=2.0.0; extra == "examples"
Requires-Dist: matplotlib>=3.5.0; extra == "examples"
Requires-Dist: seaborn>=0.11.0; extra == "examples"
Provides-Extra: all
Requires-Dist: pearl-H[dev,examples]; extra == "all"
Dynamic: license-file

Prototype-Enhanced Aligned Representation Learning

## Examples

### Text Classification with BERT

```python
from transformers import AutoTokenizer, AutoModel
from pearl import PEARLPipeline
import torch

# Extract BERT embeddings
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

def get_embeddings(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :].numpy()

# Get embeddings
X_train = get_embeddings(train_texts)
X_test = get_embeddings(test_texts)

# Apply PEARL
pearl = PEARLPipeline(n_classes=num_classes, device='cuda')
pearl.fit(X_train, y_train)

X_train_enhanced = pearl.transform(X_train, mode='paf')
X_test_enhanced = pearl.transform(X_test, mode='paf')

# Train classifier
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(X_train_enhanced, y_train)
accuracy = clf.score(X_test_enhanced, y_test)
```

See the [`examples/`](examples/) directory for complete working examples:
- [`basic_usage.py`](examples/basic_usage.py): Simple synthetic data example
- [`text_classification.py`](examples/text_classification.py): Real-world text classification with BERT

## Performance

PEARL consistently improves classification performance across multiple benchmarks:

| Dataset | Classifier | Raw F1 | PEARL F1 | Improvement |
|---------|-----------|--------|----------|-------------|
| AG News | Logistic  | 0.8542 | 0.8876   | +3.34%      |
| AG News | SVM       | 0.8621 | 0.8912   | +2.91%      |
| AG News | MLP       | 0.8698 | 0.9045   | +3.47%      |
| AG News | RAG       | 0.8823 | 0.9156   | +3.33%      |

Results show consistent improvements across different classifiers and datasets.

## API Reference

### PEARLPipeline

Main interface for PEARL.

**Methods:**
- `fit(X_train, y_train, X_val, y_val, **kwargs)`: Train the pipeline
- `transform(X, mode='paf')`: Transform embeddings
- `fit_transform(X, y, **kwargs)`: Fit and transform in one step
- `save(path)`: Save pipeline to disk
- `load(path, device)`: Load pipeline from disk (class method)

### SignalExtractor

Neural network for signal extraction.

**Methods:**
- `forward(x)`: Forward pass returning all outputs
- `get_enhanced_embedding(x)`: Extract enhanced embedding

### PrototypeFeatures

Prototype-based feature generator.

**Methods:**
- `fit(embeddings, labels)`: Learn prototypes from training data
- `transform(embeddings)`: Generate prototype features
- `get_augmented(embeddings)`: Get embeddings + features

### RAGClassifierWrapper

Retrieval-augmented classifier.

**Methods:**
- `fit(X_train, y_train, X_val, y_val, **kwargs)`: Train the model
- `predict(X)`: Predict class labels
- `predict_proba(X)`: Predict class probabilities

## Requirements

- Python >= 3.8
- PyTorch >= 1.12.0
- NumPy >= 1.20.0
- scikit-learn >= 1.0.0
- pandas >= 1.3.0
- openpyxl >= 3.0.0

## Citation

If you use PEARL in your research, please cite:

```bibtex
@software{pearl2024,
  title={PEARL: Prototype-guided Embedding Refinement via Adaptive Representation Learning},
  author={PEARL Contributors},
  year={2024},
  url={https://github.com/yourusername/pearl}
}
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

### Development Setup

```bash
git clone https://github.com/yourusername/pearl.git
cd pearl
pip install -e ".[dev]"

# Run tests
pytest tests/

# Format code
black pearl/

# Type checking
mypy pearl/
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

PEARL was developed to address the challenge of enhancing learned embeddings for downstream classification tasks. It builds on ideas from:
- Signal processing and denoising
- Prototype-based learning
- Multi-task learning
- Retrieval-augmented generation

## Support

- **Issues**: [GitHub Issues](https://github.com/yourusername/pearl/issues)
- **Discussions**: [GitHub Discussions](https://github.com/yourusername/pearl/discussions)
- **Email**: pearl@example.com

## Roadmap

- [ ] Support for additional embedding types (images, audio)
- [ ] Pre-trained models for common datasets
- [ ] Integration with popular frameworks (HuggingFace, PyTorch Lightning)
- [ ] Online/incremental learning support
- [ ] Multi-label classification support
- [ ] Comprehensive benchmarking suite

---

Made with ❤️ by the PEARL team
