Metadata-Version: 2.1
Name: safaa
Version: 0.0.2
Summary: Created as part of the 2023 Google Summer of Code project "Reducing Fossology's False Positive Copyrights". It predicts whether a given copyright output from the Fossology software is a false positive, and it can remove extra text from a copyright notice.
Home-page: https://github.com/fossology/safaa
Author: Abdelrahman Jamal
Author-email: abdelrahmanjamal5565@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU Lesser General Public License v2 (LGPLv2)
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: spacy==3.8.4
Requires-Dist: joblib==1.2.0
Requires-Dist: pandas==2.2.3
Requires-Dist: scikit-learn==1.6.1

<!--
SPDX-FileCopyrightText: © 2023 abdelrahmanjamal5565@gmail.com

SPDX-License-Identifier: LGPL-2.1-only
-->
# Safaa

Safaa is a Python package that detects false positives among copyright notices, such as those reported by Fossology.
It can also declutter copyright notices by removing unnecessary extra text.

## Features

- Load pre-trained models or train your own.
- scikit-learn integration for training and prediction.
- spaCy integration for named entity recognition and decluttering.
- Preprocessing tools to ensure data consistency and quality.
- Ability to handle local or default model directories.

## Installation

Install Safaa with pip:

```bash
pip install safaa
```

## Usage

### Initialization

```python
# Import the agent class explicitly instead of using a wildcard import
from safaa.Safaa import SafaaAgent

agent = SafaaAgent()
```

### Preprocessing Data

```python
data = ["Your raw data here"]
preprocessed_data = agent.preprocess_data(data)
```
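
The README does not describe what `preprocess_data` does internally. As a rough illustration of the kind of normalization copyright strings often need before classification, here is a minimal standard-library sketch. The helper name `normalize_notice` and its rules (unifying the copyright symbol, collapsing whitespace, lowercasing) are assumptions made for illustration and are not Safaa's actual preprocessing.

```python
import re

# Hypothetical helper: illustrates typical cleanup, not Safaa's real logic.
_COPYRIGHT_SYMBOL = re.compile(r"\(c\)|©", flags=re.IGNORECASE)
_WHITESPACE = re.compile(r"\s+")


def normalize_notice(text: str) -> str:
    """Return a lowercased notice with a unified symbol and single spaces."""
    text = _COPYRIGHT_SYMBOL.sub("(c)", text)  # "©" and "(C)" both become "(c)"
    text = _WHITESPACE.sub(" ", text)          # tabs, newlines, runs of spaces
    return text.strip().lower()


print(normalize_notice("  (C)  2023\tExample   Corp.\n"))
```

Applying a step like this consistently to both training and prediction inputs keeps the two aligned; whether it is needed on top of `agent.preprocess_data` depends on what that method already covers.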

### Predicting False Positives

```python
predictions = agent.predict(data)
```
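
To inspect the results, it can help to group the input notices by the label each one received. The sketch below assumes that `agent.predict` returns one hashable label per input, in input order; the README does not state the return format, so treat that as an assumption. The helper `group_by_prediction` and the label values `"f"` and `"t"` are hypothetical.

```python
from collections import defaultdict


def group_by_prediction(notices, predictions):
    """Map each predicted label to the notices that received it."""
    notices = list(notices)
    predictions = list(predictions)
    if len(notices) != len(predictions):
        raise ValueError("expected one prediction per notice")
    groups = defaultdict(list)
    for notice, label in zip(notices, predictions):
        groups[label].append(notice)
    return dict(groups)


# Stand-in values; in practice these would come from agent.predict(data).
example_notices = [
    "Copyright 2023 Example Corp.",
    "copyright notice must be retained",
]
example_predictions = ["f", "t"]  # hypothetical label values
print(group_by_prediction(example_notices, example_predictions))
```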

### Decluttering Copyright Notices

```python
decluttered_data = agent.declutter(data, predictions)
```

### Training Models

**To train the false positive detector:**

```python
training_data = ["Your training data here"]
labels = ["Your labels here"]
agent.train_false_positive_detector_model(training_data, labels)
```
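
The training call takes two parallel lists, one of texts and one of labels. If the labeled examples live in a CSV file, they could be loaded along these lines. This is a sketch under assumptions: the column names `copyright` and `label`, the label values `f` and `t`, and the helper `load_training_csv` are all hypothetical, and the README does not specify which label values the detector expects.

```python
import csv
import io


def load_training_csv(handle, text_column="copyright", label_column="label"):
    """Read parallel lists of texts and labels from a CSV file object."""
    texts, labels = [], []
    for row in csv.DictReader(handle):
        texts.append(row[text_column])
        labels.append(row[label_column])
    return texts, labels


# In-memory sample standing in for a file opened with open(path, newline="").
sample = io.StringIO(
    "copyright,label\n"
    '"Copyright 2023 Example Corp.",f\n'
    '"copyright notice must be retained",t\n'
)
training_data, labels = load_training_csv(sample)
print(training_data, labels)
```

The resulting `training_data` and `labels` lists have the same length and order, which is the shape passed to `agent.train_false_positive_detector_model` above.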

**To train the named entity recognition model:**

```python
train_path = "path/to/train.spacy"
dev_path = "path/to/dev.spacy"
agent.train_ner_model(train_path, dev_path)
```

### Saving Trained Models

```python
save_path = "path/to/save"
agent.save(save_path)
```

## Dependencies

* scikit-learn
* spaCy
* joblib
* pandas
* regex

The standard-library modules `os` and `shutil` are also used and need no separate installation.

## License

This project is licensed under the [GNU Lesser General Public License, Version 2.1, February 1999](LICENSE).

## Contact Information

- **Name**: Abdelrahman Jamal
- **Email**: [abdelrahmanjamal5565@gmail.com](mailto:abdelrahmanjamal5565@gmail.com)
- **LinkedIn**: [linkedin.com/in/abdelrahmanjamal](https://linkedin.com/in/abdelrahmanjamal)


