Metadata-Version: 2.4
Name: SDGDetector
Version: 1.4.0
Summary: A library for classifying texts into one or more of the 17 SDGs using a pretrained transformer-based model or keyword extraction method, with an option to combine both approaches.
Home-page: https://gitlab.com/netmode/SDGDetector
Author: Ioanna Mandilara
Author-email: Ioanna Mandilara <ioanna_mandilara@yahoo.gr>
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License
        Copyright (c) 2025 androna-xm
        
        You are free to:
        Share — copy and redistribute the material in any medium or format
        Adapt — remix, transform, and build upon the material
        The licensor cannot revoke these freedoms as long as you follow the license terms.
        Under the following terms:
        Attribution — You must give appropriate credit , provide a link to the license, and indicate if changes were made . You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
        NonCommercial — You may not use the material for commercial purposes .
        No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Project-URL: Homepage, https://gitlab.com/netmode/sdg-detector
Project-URL: Documentation, https://gitlab.com/netmode/sdg-detector/-/wikis/home
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.26.4
Requires-Dist: nltk>=3.9.1
Requires-Dist: sentence-transformers>=3.4.1
Requires-Dist: sentencepiece>=0.2.0
Requires-Dist: keras_preprocessing>=1.1.2
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# SDGDetector

[![License](https://img.shields.io/badge/License-CC_BY_NC_4.0-blue.svg)](https://creativecommons.org/licenses/by-nc/4.0/)

<div align="center">
    <img src="https://gitlab.com/netmode/sdg-detector/-/raw/main/logo.jpg?ref_type=heads" alt="SDGDetector logo" width="300">
</div>

**SDGDetector** is an open-source Python library that classifies texts according to the Sustainable Development Goals (SDGs).
It offers three approaches: a pretrained, fine-tuned model that classifies the given texts into SDGs, keyword extraction that associates the texts with SDGs, or a combination of the two.

1. The first method takes a list of texts and a fine-tuned _PyTorch_ XLNet or RoBERTa model as input and returns the probabilities that the given texts are associated with the SDGs. Our fine-tuned XLNet and RoBERTa models can be found [here](https://gitlab.com/netmode/sdg-text2kg/-/tree/main/Data/Classification%20Task-Transfer%20Learning) under the folder 'Data/Classification Task-Transfer Learning'. The training/testing F1-score of the models is 0.90.
2. The second method takes a list of texts as input, finds the most relevant keywords in each text, and computes their cosine similarity with the keywords of each SDG. The SDG keywords follow the methodology explained [here](https://sustainability.utoronto.ca/inventories/sustainable-development-goals-sdgs-keywords/); for SDG 17, we add the keywords found [here](https://ap-unsdsn.org/regional-initiatives/universities-sdgs/).
3. The third option of this library combines the two methods above. The formula is:

<div align="center">
    <img src="https://gitlab.com/netmode/sdg-detector/-/raw/main/rSDG_formula.jpg?ref_type=heads" alt="rSDG formula" width="600">
</div>

The methodology used for this library is available in the [SustaiNLP GitLab repository](https://gitlab.com/netmode/sdg-text2kg).
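
As a rough illustration of the comparison step in the second method, the sketch below computes cosine similarities between keyword vectors of a text and keyword vectors of each SDG, then picks the SDG with the highest average. It is not the library's implementation: the toy 3-dimensional vectors, the SDG labels, and the helper names `cosine_similarity` and `average_similarity` are hypothetical, and averaging all pairwise similarities is an assumption made for the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_u * norm_v)

def average_similarity(text_keyword_vecs, sdg_keyword_vecs):
    """Mean of all pairwise cosine similarities between two sets of vectors."""
    scores = [
        cosine_similarity(t, s)
        for t in text_keyword_vecs
        for s in sdg_keyword_vecs
    ]
    return sum(scores) / len(scores)

# Toy "embeddings" (hypothetical values, for illustration only).
text_keywords = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
sdg_keywords = {
    "SDG 9": [[1.0, 0.0, 0.0], [0.7, 0.3, 0.0]],
    "SDG 14": [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]],
}

# Average similarity of the text keywords to each SDG's keywords.
avg_scores = {
    sdg: average_similarity(text_keywords, vecs)
    for sdg, vecs in sdg_keywords.items()
}
best_sdg = max(avg_scores, key=avg_scores.get)
print(best_sdg, round(avg_scores[best_sdg], 3))
```

In practice the vectors would come from a sentence-embedding model rather than being written by hand.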

## Features

* **SDG_classifier_using_model**: Classifies the given texts into SDGs using the fine-tuned *XLNet* or *RoBERTa* models.
* **SDG_classifier_using_keywords_extraction**: Classifies the given texts into SDGs using keyword extraction and sentence embeddings generated by one of the following models: *all-mpnet-base-v2*, *distilbert-base-nli-mean-tokens*, *all-MiniLM-L6-v2*. Representative keywords are identified with the Maximal Marginal Relevance (MMR) algorithm and compared to the SDG keywords using cosine similarity.
* **SDG_classifier**: Classifies the given texts into SDGs by combining the two methods above.
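
To make the role of the `diversity` parameter more concrete, here is a minimal sketch of one common formulation of MMR keyword selection. It is an illustration under stated assumptions, not the library's code: the function `mmr_select`, the toy vectors, and the candidate keywords are hypothetical. Each step picks the candidate that maximises `(1 - diversity) * relevance - diversity * redundancy`, where relevance is the similarity to the document and redundancy is the highest similarity to an already selected keyword.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mmr_select(doc_vec, candidates, top_n, diversity):
    """Select up to top_n keywords from candidates (keyword -> vector) using MMR."""
    relevance = {kw: cosine_similarity(vec, doc_vec) for kw, vec in candidates.items()}

    # Start with the candidate most similar to the document.
    selected = [max(relevance, key=relevance.get)]
    remaining = [kw for kw in candidates if kw != selected[0]]

    while remaining and len(selected) < top_n:
        def mmr_score(kw):
            # Penalise candidates that are close to an already selected keyword.
            redundancy = max(
                cosine_similarity(candidates[kw], candidates[s]) for s in selected
            )
            return (1 - diversity) * relevance[kw] - diversity * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy vectors (hypothetical): "industry" and "industrial" are near duplicates.
doc = [1.0, 0.5, 0.0]
candidates = {
    "industry": [1.0, 0.4, 0.0],
    "industrial": [1.0, 0.45, 0.0],
    "economy": [0.3, 1.0, 0.2],
}

low_diversity = mmr_select(doc, candidates, top_n=2, diversity=0.0)
high_diversity = mmr_select(doc, candidates, top_n=2, diversity=0.7)
print(low_diversity)
print(high_diversity)
```

With these toy vectors, `diversity=0.0` keeps the two near-duplicate keywords, while `diversity=0.7` replaces the second one with the less similar "economy".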

## ⚙️ Installation

To install the latest stable version from [PyPI](https://pypi.org/project/SDGDetector/#history), run:

```powershell
pip install SDGDetector
```

Alternatively, if you prefer to install the latest development version, you can install it directly from GitLab:

```powershell
pip install git+https://gitlab.com/netmode/sdg-detector.git
```

Or, you can manually clone the repository and run:

```powershell
git clone https://gitlab.com/netmode/sdg-detector.git
cd sdg-detector
pip install .
```

## 📦 Prerequisites

SDGDetector requires Python >= 3.11 and the following packages:

```
'numpy>=1.26.4',  
'nltk>=3.9.1',  
'sentence-transformers>=3.4.1', 
'sentencepiece>=0.2.0', 
'keras_preprocessing>=1.1.2'
```

## 📖 Documentation

For documentation, see the [Wiki](https://gitlab.com/netmode/SDGDetector/-/wikis/home).

## 💻 Example Usage

The file _Test.ipynb_ contains examples for the three classes of this library.

The fine-tuned XLNet and RoBERTa models used by the first class, *SDG_classifier_using_model*, can be found [here](https://gitlab.com/netmode/sdg-text2kg/-/tree/main/Data/Classification%20Task-Transfer%20Learning) under the folder 'Data/Classification Task-Transfer Learning'. The training/testing F1-score of the models is 0.90. XLNet performs better than RoBERTa. The library can also be used with your own fine-tuned model, which must meet the following requirements:

* It must be a 'RoBERTa' or 'XLNet' model.
* It must be implemented in PyTorch and saved with `torch.save(model.state_dict(), model_name)`.

### Importing the Library

```python
import os
from SDGDetector import SDGDetector

# Example input: a list containing one text to classify
text = [
    "Europe has always been the home of industry. For centuries, it has been a pioneer in industrial innovation and has helped "
    "improve the way people around the world produce, consume and do business. Based on a strong internal market, the European industry "
    "has long powered our economy, providing a stable living for millions and creating the social hubs around which our communities are built."
]
```

### SDG_classifier_using_model

```python
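# Path to the downloaded fine-tuned model (replace with your own path)
model_path = "<your path of downloaded model>"
model = SDGDetector.SDG_classifier_using_model(model_name='XLNet', model_path=model_path)

# Apply the classifier to the example input text
sdg, sdg_names, probs = model.predict(text, return_probs=True)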
```

### SDG_classifier_using_keywords_extraction

🔑: Some Sentence-Transformers models require authentication to access, especially models hosted on Hugging Face. To use these models, you need to create and set up a Hugging Face token.

```python
os.environ['HF_TOKEN'] = "<token>"  # replace with your Hugging Face token
hf_token = os.getenv('HF_TOKEN')
if hf_token:
    # Avoid printing the token itself so it does not end up in logs
    print("Hugging Face token is set.")
else:
    print("Hugging Face token is not set.")
```

```python
mpnet = SDGDetector.SDG_classifier_using_keywords_extraction(model_name='all-mpnet-base-v2')

# Extract the most representative keywords of the input text
keywords_mpnet = mpnet.find_top_keywords(text, top_keywords=5, diversity=0.3, n_gram_range=(1, 2))

# Classify the text and also return the cosine-similarity outputs
sdg, sdg_name, cosine_similarity, cosine_matrix = mpnet.predict(
    text, top_keywords=5, diversity=0.3, n_gram_range=(1, 2), return_cs_matrix_and_avg_cs=True
)
```

### SDG_classifier

```python
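pretrained_model_path = "<your path of downloaded model>"  # replace with your own path
combo = SDGDetector.SDG_classifier(pretrained_model_name='XLNet',
                                   pretrained_model_path=pretrained_model_path,
                                   sentence_model_name='all-mpnet-base-v2')

# Classify the text by combining the model and keyword-based methods
sdg, sdg_name, association = combo.predict(text, top_keywords=10, diversity=0.3, n_gram_range=(1, 2), return_association=True)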
```

## 🤝 Contributing

Contributions are welcome! Please submit a pull request or open an issue.

## 📚 Cite

To cite this work, please use:

[Knowledge Graph Data Enrichment based on a Software Library for Text Mapping to the Sustainable Development Goals
Ioanna Mandilara, Eleni Fotopoulou, Christina Maria Androna, Anastasios Zafeiropoulos, Symeon Papavassiliou](https://ceur-ws.org/Vol-3447/Text2KG_Paper_4.pdf)

Zenodo repository: [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7984984.svg)](https://doi.org/10.5281/zenodo.7984984)

## 📬 Contact

For detailed information or to express interest in participating in this initiative, you may contact:

* 📧 Ioanna Mandilara - ioannamand (at) netmode (dot) ntua (dot) gr
* 📧 Christina Maria Androna - andronaxm (at) netmode (dot) ntua (dot) gr
* 📧 Eleni Fotopoulou - efotopoulou (at) netmode (dot) ntua (dot) gr
* 📧 Anastasios Zafeiropoulos - tzafeir (at) cn (dot) ntua (dot) gr

## 📑 License

This project is licensed under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.
