Metadata-Version: 2.4
Name: VulnTrain
Version: 2.1.0
Summary: Generate datasets and models based on vulnerability data from Vulnerability-Lookup.
License-Expression: GPL-3.0-or-later
License-File: AUTHORS
License-File: COPYING
Author: Cédric Bonhomme
Author-email: cedric.bonhomme@circl.lu
Requires-Python: >=3.11,<4.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Security
Requires-Dist: codecarbon (>=2.8.3,<3.0.0)
Requires-Dist: cvss (>=3.4,<4.0)
Requires-Dist: datasets (>=3.5.0)
Requires-Dist: evaluate (>=0.4.3,<0.5.0)
Requires-Dist: markdown-it-py (>=3.0.0,<4.0.0)
Requires-Dist: nltk (>=3.9.1)
Requires-Dist: pandas (>=2.2.3)
Requires-Dist: scikit-learn (>=1.6.1,<2.0.0)
Requires-Dist: torch (>=2.7.0)
Requires-Dist: transformers[torch] (>=4.49.0,<5.0.0)
Requires-Dist: valkey (>=6.1.0)
Project-URL: Changelog, https://github.com/vulnerability-lookup/VulnTrain/blob/main/CHANGELOG.md
Project-URL: Homepage, https://github.com/vulnerability-lookup/VulnTrain
Project-URL: Repository, https://github.com/vulnerability-lookup/VulnTrain
Description-Content-Type: text/markdown

# VulnTrain

[![Latest release](https://img.shields.io/github/release/vulnerability-lookup/VulnTrain.svg?style=flat-square)](https://github.com/vulnerability-lookup/VulnTrain/releases/latest)
[![License](https://img.shields.io/github/license/vulnerability-lookup/VulnTrain.svg?style=flat-square)](https://www.gnu.org/licenses/gpl-3.0.html)
[![PyPi version](https://img.shields.io/pypi/v/VulnTrain.svg?style=flat-square)](https://pypi.org/project/VulnTrain)


VulnTrain offers a suite of commands to generate diverse AI datasets and train models using
comprehensive vulnerability data from [Vulnerability-Lookup](https://github.com/vulnerability-lookup/vulnerability-lookup).
It harnesses over one million JSON records from all supported advisory sources to build high-quality, domain-specific models.
  
Additionally, data from the ``vulnerability-lookup:meta`` container, including enrichment sources such as vulnrichment and Fraunhofer FKIE,
is incorporated to enhance model quality.

Check out the datasets and models on Hugging Face: 

[![Model on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-xl-dark.svg)](https://huggingface.co/CIRCL)

For more information about the use of AI in Vulnerability-Lookup, please refer to the
[user manual](https://www.vulnerability-lookup.org/user-manual/ai/).


## Usage

Install VulnTrain:

```bash
$ pipx install VulnTrain
```

Three types of commands are available:

- **Dataset generation**: Create and prepare datasets.
- **Model training**: Train models using the prepared datasets.
  - Train a model to **classify** vulnerabilities by severity. [![Model on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-sm-dark.svg)](https://huggingface.co/CIRCL/vulnerability-severity-classification-roberta-base)
  - Train a model for **text generation** to assist in writing vulnerability descriptions. [![Model on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-sm-dark.svg)](https://huggingface.co/CIRCL/vulnerability-description-generation-gpt2#how-to-get-started-with-the-model)
- **Model validation**: Assess the performance of trained models (validations, benchmarks, etc.).


Check out the [documentation](docs/) for more information.
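
As an illustration of how a trained severity classifier might be consumed, here is a minimal sketch that is not part of VulnTrain itself. It assumes the model published at `CIRCL/vulnerability-severity-classification-roberta-base` can be loaded with the standard `transformers` text-classification pipeline; the helper names `load_classifier` and `classify_severity` are hypothetical and chosen only for this example.

```python
from typing import Callable, Iterable

# Any callable that maps a list of texts to a list of
# {"label": str, "score": float} dicts, which is the shape returned by a
# Hugging Face text-classification pipeline called on a list of strings.
Classifier = Callable[[list[str]], list[dict]]

# Model identifier taken from the badge link above.
MODEL_ID = "CIRCL/vulnerability-severity-classification-roberta-base"


def load_classifier(model_id: str = MODEL_ID) -> Classifier:
    """Build a text-classification pipeline (hypothetical helper).

    The import is done lazily because transformers is a heavy dependency
    and the model is downloaded from Hugging Face on first use.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify_severity(
    descriptions: Iterable[str], classifier: Classifier
) -> list[tuple[str, str, float]]:
    """Return (description, predicted label, score) for each description."""
    texts = list(descriptions)
    if not texts:
        # Nothing to classify, so skip the model call entirely.
        return []
    predictions = classifier(texts)
    return [
        (text, prediction["label"], float(prediction["score"]))
        for text, prediction in zip(texts, predictions)
    ]
```

Calling `classify_severity(["A buffer overflow allows remote code execution."], load_classifier())` would download the model and return one tuple per description. Passing the classifier as an argument keeps the function easy to exercise with a stub when the model is not available locally.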


## How to cite

Bonhomme, C., & Dulaunoy, A. (2025). VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification (Version 1.4.0) [Computer software]. https://doi.org/10.48550/arXiv.2507.03607

```bibtex
@misc{bonhomme2025vlai,
    title={VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification},
    author={Cédric Bonhomme and Alexandre Dulaunoy},
    year={2025},
    eprint={2507.03607},
    archivePrefix={arXiv},
    primaryClass={cs.CR}
}
```


## License

[VulnTrain](https://github.com/vulnerability-lookup/VulnTrain) is licensed under the
[GNU General Public License version 3](https://www.gnu.org/licenses/gpl-3.0.html).

~~~
Copyright (c) 2025 Computer Incident Response Center Luxembourg (CIRCL)
Copyright (C) 2025 Cédric Bonhomme - https://github.com/cedricbonhomme
Copyright (C) 2025 Léa Ulusan - https://github.com/3LS3-1F
~~~


