Metadata-Version: 2.1
Name: NLarge
Version: 0.2.3
Summary: Data augmentation for NLP
Author: Ng Tze Kean
Author-email: ngtzekean@gmail.com
Requires-Python: >=3.12,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: accelerate (>=1.0.1,<2.0.0)
Requires-Dist: datasets (>=3.0.1,<4.0.0)
Requires-Dist: gensim (>=4.3.3,<5.0.0)
Requires-Dist: iprogress (>=0.4,<0.5)
Requires-Dist: ipykernel (>=6.29.5,<7.0.0)
Requires-Dist: jupyter (>=1.1.1,<2.0.0)
Requires-Dist: matplotlib (>=3.9.2,<4.0.0)
Requires-Dist: nltk (>=3.9.1,<4.0.0)
Requires-Dist: numpy (>=1.18.5,<2.0.0)
Requires-Dist: pandas (>=2.2.3,<3.0.0)
Requires-Dist: sentencepiece (>=0.2.0,<0.3.0)
Requires-Dist: torch (==2.4.1)
Requires-Dist: transformers (>=4.45.2,<5.0.0)
Description-Content-Type: text/markdown

# SC4001 NLarge

## Purpose of Project

NLarge is a project focused on exploring and implementing various data augmentation techniques for Natural Language Processing (NLP) tasks. The primary goal is to enhance the diversity and robustness of training datasets, thereby improving the performance and generalization capabilities of NLP models. This project includes traditional data augmentation methods such as synonym replacement and random substitution, as well as advanced techniques using Large Language Models (LLMs).
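To make the two traditional methods concrete, here is a minimal sketch of synonym replacement and random substitution in plain Python. It is not the NLarge API: the `SYNONYMS` table, the function names `synonym_replace` and `random_substitute`, and the `prob` and `rng` parameters are all assumptions made for illustration. A real pipeline would likely draw synonyms from a lexical resource such as WordNet rather than a hand-written dictionary.

```python
import random

# Toy synonym table, purely illustrative (hypothetical, not part of NLarge).
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
    "movie": ["film"],
}


def synonym_replace(text, prob=0.5, rng=None):
    """Replace each word that has a known synonym with probability `prob`."""
    if rng is None:
        rng = random.Random()
    out = []
    for word in text.split():
        candidates = SYNONYMS.get(word.lower())
        if candidates and rng.random() < prob:
            out.append(rng.choice(candidates))
        else:
            out.append(word)
    return " ".join(out)


def random_substitute(text, vocab, prob=0.1, rng=None):
    """Replace each word with a random word from `vocab` with probability `prob`."""
    if rng is None:
        rng = random.Random()
    out = []
    for word in text.split():
        out.append(rng.choice(vocab) if rng.random() < prob else word)
    return " ".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same augmentation
    sentence = "a quick and happy movie"
    print(synonym_replace(sentence, prob=0.5, rng=rng))
    print(random_substitute(sentence, vocab=["good", "bad", "long"], prob=0.3, rng=rng))
```

Both functions keep the number of tokens unchanged, so labels attached to the original sentence can be reused for the augmented copy. Synonym replacement aims to preserve meaning, while random substitution deliberately injects noise.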

## Initializing Virtual Environment

We use Poetry in this project for dependency management. To get started, you will need to install Poetry.

```shell
pip install poetry
```

Afterwards, install the project's dependencies with Poetry using the command below:

```shell
poetry install
```

## Repository Contents

- `report.tex`: The LaTeX document containing the detailed report of the project, including methodology, experiments, results, and analysis.
- `example/`: Contains example scripts for data augmentation and model training.
- `NLarge/`: The main package containing the data augmentation and model implementation.

## Usage

To run the models and experiments, use the Python notebooks in the `example/` directory. The notebooks contain detailed explanations and code snippets for data augmentation and model training.

## Website

The project's PyPI page is available here: [PyPI page](https://pypi.org/project/NLarge/)

Our GitHub repository can be found here: [GitHub page](https://github.com/HiIAmTzeKean/SC4001)

## Contributing

Contributions to this project are welcome. If you have any suggestions or improvements, please create a pull request or open an issue.
