Metadata-Version: 2.4
Name: ml_pentest
Version: 0.0.2
Summary: Robustness evaluation framework for ML-Based Windows malware detectors
Home-page: https://github.com/gparrella12/ml_pentest
Author: gparrella
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python
Dynamic: summary

# ML - Pentest | Are malware detectors robust?

This is a software framework for evaluating the robustness of malware detection methods against adversarial attacks. It currently focuses on black-box adversarial attacks against Windows PE malware detectors.

The high-level architecture of the framework is shown in the following figure.

![arch](https://github.com/gparrella12/ml_pentest/assets/94001472/73a0350c-cdd6-419a-af8c-355ce99c3ac7)


## Attacks included

* **GAMMA**, formulated by [Demetrio et al.](https://arxiv.org/abs/2003.13526), with section-injection and API-injection manipulations. This implementation can be run against any target model: only a model-specific wrapper needs to be implemented. Some code is reused from the open-source repository [secml-malware](https://github.com/pralab/secml_malware).
* **GAMMA V2**, introduced in this framework: an extended version of GAMMA that also optimizes the position of the injected sections, their characteristics, and their names.
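Conceptually, GAMMA casts the attack as black-box optimization: a genetic algorithm searches for how much content taken from benign programs to inject into a malware sample, minimizing the detector's score plus a penalty proportional to the number of injected bytes. The sketch below illustrates only that optimization loop, under deliberately simplifying assumptions: the detector is any callable mapping bytes to a score in [0, 1], the benign sections are plain byte strings, and injected content is simply appended to the file (closer to a padding manipulation than to real PE section injection, which needs a PE parser such as LIEF to keep the program functional). All names (`inject`, `gamma_sketch`, `toy_score`) are hypothetical and are not part of the ml_pentest API.

```python
import random
from typing import Callable, List, Sequence, Tuple

Score = Callable[[bytes], float]


def inject(malware: bytes, benign_sections: Sequence[bytes], genes: Sequence[float]) -> bytes:
    """Append, for each benign section, the prefix selected by its gene in [0, 1].

    Simplification (assumption): real GAMMA adds new PE sections instead of
    appending raw bytes, so that the program keeps working.
    """
    payload = b"".join(sec[: int(g * len(sec))] for sec, g in zip(benign_sections, genes))
    return malware + payload


def gamma_sketch(score_fn: Score, malware: bytes, benign_sections: Sequence[bytes],
                 population_size: int = 10, generations: int = 20,
                 lam: float = 1e-6, mutation_rate: float = 0.3,
                 seed: int = 0) -> Tuple[bytes, List[float]]:
    """Toy genetic search minimizing score(x) + lam * injected_bytes."""
    if population_size < 4:
        raise ValueError("population_size must be at least 4")
    rng = random.Random(seed)
    k = len(benign_sections)

    def cost(genes: Sequence[float]) -> float:
        adv = inject(malware, benign_sections, genes)
        # Detector output plus a penalty on the amount of injected content
        return score_fn(adv) + lam * (len(adv) - len(malware))

    population = [[rng.random() for _ in range(k)] for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=cost)                      # lower cost is better
        parents = population[: population_size // 2]   # elitist selection
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(0, k)                    # one-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation, clipped to [0, 1]
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1)))
                     if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    best = min(population, key=cost)
    return inject(malware, benign_sections, best), best


def toy_score(data: bytes) -> float:
    """Hypothetical detector: fraction of 0x90 bytes in the file."""
    return data.count(0x90) / len(data) if data else 0.0


if __name__ == "__main__":
    malware = b"\x90" * 100
    benign = [b"\x00" * 1000, b"\x00" * 500]
    adv, genes = gamma_sketch(toy_score, malware, benign)
    print(f"score before: {toy_score(malware):.3f}, after: {toy_score(adv):.3f}, "
          f"injected bytes: {len(adv) - len(malware)}")
```

The penalty weight `lam` trades evasion against payload size: larger values favor smaller injections at the cost of a higher detector score.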

## Models included

Several well-known malware detectors from the literature are included in the library by default.

Specifically, the included models are:
* **MalConv** ([link](https://arxiv.org/abs/1710.09435)) in its original formulation. The PyTorch implementation of MalConv was taken from [this](https://github.com/Alexander-H-Liu/MalConv-Pytorch) open source repository.
* **MalConv2** ([link](https://arxiv.org/abs/2012.09390)), an improved version of MalConv that requires less memory and time to train. The PyTorch implementation of MalConv2 was taken from [this](https://github.com/NeuromorphicComputationResearchProgram/MalConv2) open source repository. A pre-trained version of the model is included in the library.
* **EMBER Gradient-Boosted Decision Tree (GBDT)** ([link](https://arxiv.org/abs/1804.04637)): a gradient-boosted decision tree that uses EMBER features for classification. A version of the model pre-trained on the EMBER dataset (taken from [this repository](https://github.com/elastic/ember)) is included in the library.
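Unlike the EMBER GBDT, which works on extracted features, MalConv and MalConv2 consume the raw bytes of the executable. A minimal sketch of how a file can be turned into a fixed-length integer sequence for such a byte-level model is shown below. It assumes one common convention, shifting every byte value by one so that 0 can be reserved for padding, and an assumed default maximum length of 2,000,000 bytes; the function name `encode_for_malconv` is hypothetical, and the bundled implementations may encode inputs differently.

```python
from typing import List

PAD_TOKEN = 0  # reserved for padding; real byte values are shifted to 1..256


def encode_for_malconv(data: bytes, max_len: int = 2_000_000) -> List[int]:
    """Truncate or pad raw file bytes to max_len, shifting byte values by one."""
    tokens = [b + 1 for b in data[:max_len]]  # iterating bytes yields ints 0..255
    tokens.extend([PAD_TOKEN] * (max_len - len(tokens)))
    return tokens


if __name__ == "__main__":
    # "MZ" is the DOS header magic found at the start of every PE file
    print(encode_for_malconv(b"MZ\x90\x00", max_len=8))  # [78, 91, 145, 1, 0, 0, 0, 0]
```

With PyTorch installed, the resulting list can be turned into a batch of one sample with `torch.tensor([tokens], dtype=torch.long)` before being passed to the network.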

**Any dependencies required to run a model must already be installed in the execution environment.** The library supports any type of model through a model-specific wrapper, and no model dependency is hard-coded in the library.

Therefore, [PyTorch](https://pytorch.org/) must be installed to use the MalConv and MalConv2 models, and [LightGBM](https://lightgbm.readthedocs.io/en/latest/Python-Intro.html) to use the GBDT model.

Any other model can be used once its wrapper has been implemented and its dependencies have been installed in the environment.
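The wrapper is what decouples attacks from models: an attack only needs to submit a file and read back a score. The sketch below illustrates that idea with a hypothetical abstract base class and a toy entropy-based detector; the names `ModelWrapper`, `predict_score`, `is_malicious`, `threshold`, and `EntropyWrapper` are illustrative assumptions, not the actual ml_pentest wrapper interface.

```python
import math
from abc import ABC, abstractmethod
from collections import Counter


class ModelWrapper(ABC):
    """Hypothetical wrapper: adapts any detector to a bytes -> score interface."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold

    @abstractmethod
    def predict_score(self, data: bytes) -> float:
        """Return a maliciousness score in [0, 1] for the raw file bytes."""

    def is_malicious(self, data: bytes) -> bool:
        # Attacks only rely on this score/label, never on model internals
        return self.predict_score(data) >= self.threshold


class EntropyWrapper(ModelWrapper):
    """Toy detector: normalized Shannon entropy of the byte distribution."""

    def predict_score(self, data: bytes) -> float:
        if not data:
            return 0.0
        n = len(data)
        entropy = sum((c / n) * math.log2(n / c) for c in Counter(data).values())
        return entropy / 8.0  # 8 bits is the maximum entropy for byte data


if __name__ == "__main__":
    detector = EntropyWrapper()
    print(detector.predict_score(b"\x00" * 64), detector.is_malicious(bytes(range(256))))
```

A wrapper for a real model would follow the same pattern, for example by encoding the bytes and running a PyTorch network inside `predict_score`, while the attack code stays unchanged.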


## Table of Contents

- [Installation](#installation)
- [License](#license)

## Installation
The library has been tested with **Python 3.8**. Later versions of Python can also be used, as long as they are compatible with [LIEF](https://github.com/lief-project/LIEF) version 0.12.0.

You can install the library with the following command:
```bash
pip install ml_pentest 
```
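Since model dependencies are not installed together with the library, a quick environment check before running experiments can save time. The script below is a minimal sketch that relies only on the standard library (`importlib.metadata` is available from Python 3.8): it reports the Python version, the installed LIEF version, and whether PyTorch and LightGBM can be imported. It assumes the usual PyPI distribution name `lief` and the module names `torch` and `lightgbm`; the helper names are hypothetical.

```python
import importlib.util
import sys
from importlib import metadata
from typing import Dict, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> Dict[str, object]:
    lief_version = installed_version("lief")
    return {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info >= (3, 8),
        "lief": lief_version,
        "lief_is_0.12.0": lief_version == "0.12.0",
        # Optional back-ends, only needed for the bundled models
        "torch_available": importlib.util.find_spec("torch") is not None,       # MalConv, MalConv2
        "lightgbm_available": importlib.util.find_spec("lightgbm") is not None, # EMBER GBDT
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```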

## License

This project is released under the GNU General Public License v3.0 (GPL-3.0), which keeps the software open source and free for everyone to use, modify, and distribute. Users may use the software for personal or commercial purposes, modify the code, and distribute modified versions, provided these are released under the same license. The GPL-3.0 ensures that the source code remains open and accessible, fostering a collaborative development environment.
