Metadata-Version: 2.4
Name: hugging-mapper
Version: 1.0.1
Summary: A lightweight Python tool for effortless text similarity scoring using Hugging Face models
Author-email: "Angel L. P." <59593766+angelphanth@users.noreply.github.com>
Project-URL: Bug Tracker, https://github.com/angelphanth/hugger/issues
Project-URL: Homepage, https://github.com/angelphanth/hugger
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: appnope>=0.1.4
Requires-Dist: asttokens>=3.0.0
Requires-Dist: certifi>=2022.12.7
Requires-Dist: charset-normalizer>=3.0.0
Requires-Dist: comm>=0.2.2
Requires-Dist: debugpy>=1.8.0
Requires-Dist: decorator>=5.2.0
Requires-Dist: executing>=2.0.0
Requires-Dist: filelock>=3.8.0
Requires-Dist: fsspec>=2023.10.0
Requires-Dist: hf-xet>=1.1.5
Requires-Dist: huggingface-hub>=0.13.0
Requires-Dist: idna>=3.0
Requires-Dist: ipython<9.0,>=8.0
Requires-Dist: ipython_pygments_lexers>=1.1.1
Requires-Dist: jedi>=0.19.0
Requires-Dist: Jinja2>=3.0.0
Requires-Dist: joblib>=1.2.0
Requires-Dist: jupyter_client>=8.0.0
Requires-Dist: jupyter_core>=5.0.0
Requires-Dist: MarkupSafe>=2.0.0
Requires-Dist: matplotlib-inline>=0.1.6
Requires-Dist: mpmath>=1.2.0
Requires-Dist: nest-asyncio>=1.5.0
Requires-Dist: networkx>=3.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: packaging>=21.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: parso>=0.8.0
Requires-Dist: pexpect>=4.8.0
Requires-Dist: platformdirs>=3.0.0
Requires-Dist: prompt_toolkit>=3.0.0
Requires-Dist: psutil>=5.9.0
Requires-Dist: ptyprocess>=0.7.0
Requires-Dist: pure_eval>=0.2.0
Requires-Dist: Pygments>=2.10.0
Requires-Dist: python-dateutil>=2.8.0
Requires-Dist: pytz>=2022.0
Requires-Dist: PyYAML>=6.0.0
Requires-Dist: pyzmq>=25.0.0
Requires-Dist: regex>=2023.0.0
Requires-Dist: requests>=2.28.0
Requires-Dist: safetensors>=0.3.0
Requires-Dist: scikit-learn>=1.1.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: six>=1.16.0
Requires-Dist: stack-data>=0.6.0
Requires-Dist: sympy>=1.10.0
Requires-Dist: threadpoolctl>=3.1.0
Requires-Dist: tokenizers>=0.13.0
Requires-Dist: torch>=1.12.0
Requires-Dist: tornado>=6.2.0
Requires-Dist: tqdm>=4.64.0
Requires-Dist: traitlets>=5.3.0
Requires-Dist: transformers>=4.26.0
Requires-Dist: typing_extensions>=4.0.0
Requires-Dist: tzdata>=2022.0
Requires-Dist: urllib3>=1.26.0
Requires-Dist: wcwidth>=0.2.5
Provides-Extra: docs
Requires-Dist: sphinx; extra == "docs"
Requires-Dist: sphinx-book-theme; extra == "docs"
Requires-Dist: myst-nb; extra == "docs"
Requires-Dist: ipywidgets; extra == "docs"
Requires-Dist: sphinx-new-tab-link!=0.2.2; extra == "docs"
Requires-Dist: jupytext; extra == "docs"
Provides-Extra: dev
Requires-Dist: black; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: ipykernel; extra == "dev"
Requires-Dist: openpyxl; extra == "dev"
Dynamic: license-file

![Hugging-Mapper logo](https://raw.githubusercontent.com/angelphanth/hugging-mapper/refs/heads/main/docs/assets/hugger-logo-wide.svg)

<h1 align="center">Hugging-Mapper</h1>
<p align="center"><em>A lightweight Python tool for easy text similarity scoring using Hugging Face models</em></p>

<p align="center">
    <a href="https://pypi.org/project/hugging-mapper/">
        <img src="https://img.shields.io/pypi/v/hugging-mapper?label=PyPI" alt="PyPI">
    </a>
    <a href="https://github.com/angelphanth/hugging-mapper/actions/workflows/cicd.yml">
        <img src="https://github.com/angelphanth/hugging-mapper/actions/workflows/cicd.yml/badge.svg?branch=" alt="Python application">
    </a>
    <a href="https://hugging-mapper.readthedocs.io/en/latest/?badge=latest">
        <img src="https://readthedocs.org/projects/hugging-mapper/badge/?version=latest" alt="Read the Docs">
    </a>
    <img src="https://img.shields.io/pypi/pyversions/hugging-mapper" alt="PyPI - Python Version">
    <br>
    <br>
    <img src="https://img.shields.io/github/issues/angelphanth/hugging-mapper" alt="GitHub issues">
    <img src="https://img.shields.io/github/license/angelphanth/hugging-mapper" alt="GitHub license">
    <img src="https://img.shields.io/github/last-commit/angelphanth/hugging-mapper" alt="GitHub last commit">
    <img src="https://img.shields.io/github/stars/angelphanth/hugging-mapper?style=social" alt="GitHub stars">
</p>


## Table of Contents :bookmark_tabs:

- [Installation](#installation)
- [Features](#features)
- [Usage](#usage)
- [Documentation](#documentation)
- [License](#license)

## Installation 

```bash
pip install hugging-mapper
```

## Features
- Easily compare how similar two pieces of text are
- Customizable model selection at initialization
- Works with Hugging Face models that create sentence embeddings
- Batch scoring for lists of sentence pairs


## Usage

Embed text with a Hugging Face model:
```python
from hugger.mapper import HuggingMapper

# Initialize the mapper with a specific model.
# If model_name is omitted, the default is 'cambridgeltl/SapBERT-from-PubMedBERT-fulltext'.
mapper = HuggingMapper(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Generate an embedding for a single piece of text
embedding = mapper.embed_text("I hope you'll find this helpful.")
```
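
Once two texts are embedded, a similarity score is typically the cosine of the angle between their vectors. The snippet below is a minimal sketch of that calculation, not the library's own implementation. It assumes `embed_text` returns a one-dimensional sequence of floats, which this README does not confirm. The `cosine_similarity` helper and the stand-in vectors are hypothetical and only illustrate the arithmetic.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction, so treat its similarity as 0.0
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Stand-in vectors; in practice these might come from mapper.embed_text(...)
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 0.0, 1.0]
vec_c = [0.0, 1.0, 0.0]

same = cosine_similarity(vec_a, vec_b)        # identical direction, close to 1
orthogonal = cosine_similarity(vec_a, vec_c)  # no shared direction, 0
```

Scores near 1 indicate vectors pointing in nearly the same direction, while scores near 0 indicate unrelated vectors.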

Run a similarity search over your own data:
```python
from hugger.mapper import NodeMapper
import pandas as pd

# demo data
data = pd.DataFrame({
    "id": ["node1", "node2", "node3"], 
    "text": ["Disease", "Gene", "Drug"]
})

# Generate embeddings for the data using the default Hugging Face model
node_mapper = NodeMapper(data)

# Get the entries most similar to a term.
# A threshold of 0 returns all data, sorted by similarity to the given term.
most_similar = node_mapper.get_similar("protein", threshold=0)

# get matching node
node_id, metadata = node_mapper.get_match("genetics", threshold=0.7)
```
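
The effect of `threshold` can be pictured as a cutoff applied to a list of similarity scores before ranking. The sketch below works under that assumption and does not reproduce how `NodeMapper` is implemented. The function names `rank_by_similarity` and `best_match` and the score values are hypothetical and made up for illustration.

```python
def rank_by_similarity(scores, threshold=0.0):
    """Return (id, score) pairs with score >= threshold, highest score first."""
    kept = [(node_id, s) for node_id, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def best_match(scores, threshold=0.0):
    """Return the top (id, score) pair at or above the threshold, or None."""
    ranked = rank_by_similarity(scores, threshold)
    return ranked[0] if ranked else None


# Made-up similarity scores for three nodes
scores = {"node1": 0.41, "node2": 0.83, "node3": 0.27}

all_ranked = rank_by_similarity(scores, threshold=0)  # keeps every node
match = best_match(scores, threshold=0.7)             # only node2 qualifies
no_match = best_match(scores, threshold=0.9)          # nothing qualifies
```

Under this reading, a low threshold is useful for inspecting the full ranking, and a higher one keeps only confident matches.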

## Documentation 

Tutorials and documentation are available on [Read the Docs](https://hugging-mapper.readthedocs.io/) :notebook_with_decorative_cover::grinning:

## License

This project is licensed under the MIT License.
