Metadata-Version: 2.4
Name: nn-dataset
Version: 1.2.5
Summary: Neural Network Dataset
Home-page: https://ABrain.one
Author: ABrain One and contributors
Author-email: AI@ABrain.one
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.6.0
Requires-Dist: torchvision>=0.21.0
Requires-Dist: tqdm
Requires-Dist: onnx
Requires-Dist: optuna
Requires-Dist: datasets>=3.2.0
Requires-Dist: pycocotools~=2.0.8
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: scipy
Requires-Dist: requests~=2.32.3
Requires-Dist: pillow
Requires-Dist: setuptools<80.0.0
Requires-Dist: nltk>=3.9.1
Requires-Dist: gdown
Requires-Dist: flake8~=7.1.1
Requires-Dist: flake8-json~=24.4.0
Requires-Dist: radon~=6.0.1
Requires-Dist: pylint~=3.3.3
Provides-Extra: stat
Requires-Dist: nn-stat; extra == "stat"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

## <img src='https://abrain.one/img/lemur-nn-icon-64x64.png' width='32px'/> Neural Network Dataset 
<sub><a href='https://pypi.python.org/pypi/nn-dataset'><img src='https://img.shields.io/pypi/v/nn-dataset.svg'/></a><br/>
short alias  <a href='https://pypi.python.org/pypi/lmur'>lmur</a></sub>
   
LEMUR - Learning, Evaluation, and Modeling for Unified Research

<img src='https://abrain.one/img/lemur-nn-whit.jpg' width='25%'/>

The original version of the <a href='https://github.com/ABrain-One/nn-dataset'>LEMUR dataset</a> was created by <strong>Arash Torabi Goodarzi, Roman Kochnev</strong> and <strong>Zofia Antonina Bentyn</strong> at the Computer Vision Laboratory, University of Würzburg, Germany.

<h3>Overview 📖</h3>
The NN Dataset project provides flexibility for dynamically combining various deep learning tasks, datasets, metrics, and neural network models. It is designed to facilitate the verification of neural network performance under various combinations of training hyperparameters and data transformation algorithms by automatically generating performance statistics. Developed to support the <a href='https://github.com/ABrain-One/nn-gpt'>NNGPT</a> project, this dataset contains neural network models modified or generated by NNGPT's large language models; their names feature multiple dash-separated alphanumeric postfixes (regex: ^.+(-[\w\d]{4,}){4,}$).
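
The naming rule above can be checked with a few lines of Python. This is a minimal sketch: the helper `is_nngpt_generated` and the sample names are hypothetical and serve only to exercise the pattern quoted in the overview.

```python
import re

# Pattern quoted in the overview: a base name followed by at least four
# dash-separated postfixes of four or more word characters each.
NNGPT_NAME = re.compile(r"^.+(-[\w\d]{4,}){4,}$")


def is_nngpt_generated(model_name: str) -> bool:
    """Return True if the name carries the NNGPT-style postfixes."""
    return NNGPT_NAME.match(model_name) is not None


# Hypothetical names, used only to exercise the pattern.
print(is_nngpt_generated("AlexNet"))                      # False
print(is_nngpt_generated("AlexNet-a1b2-c3d4-e5f6-0a9b"))  # True
```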

## Create and Activate a Virtual Environment (recommended)
For Linux/Mac:
   ```bash
   python3 -m venv .venv
   source .venv/bin/activate
   ```
For Windows:
   ```bash
   python -m venv .venv
   .venv\Scripts\activate
   ```

The commands below assume that CUDA 12.6 is installed. If you have a different version, replace 'cu126' in the package index URLs with the tag for your version.

## Environment for NN Dataset Contributors
### Pip package manager
Create a virtual environment, activate it, and run the following command to install all the project dependencies:
```bash
python -m pip install --upgrade pip
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu126
```

## Installation or Update of the NN Dataset
Remove an old version of the LEMUR Dataset and its database:
```bash
pip uninstall nn-dataset -y
rm -rf db
```
Installing the stable version:
```bash
pip install nn-dataset --upgrade --extra-index-url https://download.pytorch.org/whl/cu126
```
Installing from GitHub to get the most recent code and statistics updates:
```bash
pip install git+https://github.com/ABrain-One/nn-dataset --upgrade --force --extra-index-url https://download.pytorch.org/whl/cu126
```
Adding functionality to export data to Excel files and generate plots for <a href='https://github.com/ABrain-One/nn-stat'>analyzing neural network performance</a>:
```bash
pip install nn-stat --upgrade --extra-index-url https://download.pytorch.org/whl/cu126
```
and export/generate:
```bash
python -m ab.stat.export
```

## Usage

Standard use cases:
1. Add a new neural network model into the `ab/nn/nn` directory.
2. Run the automated training process for this model (e.g., a new ComplexNet training pipeline configuration):
```bash
python -m ab.nn.train -c img-classification_cifar-10_acc_ComplexNet
```
or, for all image segmentation models, using a fixed range of training parameters and a fixed transformer:
```bash
python run.py -c img-segmentation -f echo --min_learning_rate 1e-4 -l 1e-2 --min_momentum 0.8 -m 0.99 --min_batch_binary_power 2 -b 6
```
To reproduce a previous result, set the minimum and maximum of each parameter to the same desired value:
```bash
python run.py -c img-classification_cifar-10_acc_AlexNet --min_learning_rate 0.0061 -l 0.0061 --min_momentum 0.7549 -m 0.7549 --min_batch_binary_power 2 -b 2 -f norm_299
```
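
Why equal bounds reproduce a run can be illustrated with a small stand-in sampler. This is only a sketch under stated assumptions: it uses Python's `random` module rather than the project's actual search code, the function `sample_trial` is hypothetical, and it assumes (as the flag name `batch_binary_power` suggests) that the batch size is 2 raised to the sampled power.

```python
import random


def sample_trial(ranges, rng):
    """Draw one hyperparameter set; a stand-in for the project's real sampler."""
    lr = rng.uniform(*ranges["lr"])
    momentum = rng.uniform(*ranges["momentum"])
    power = rng.randint(*ranges["batch_binary_power"])
    # Assumption: the batch size is 2 raised to the sampled power.
    return {"lr": lr, "momentum": momentum, "batch": 2 ** power}


# Minimum equals maximum for every parameter, so each range holds one value.
ranges = {
    "lr": (0.0061, 0.0061),
    "momentum": (0.7549, 0.7549),
    "batch_binary_power": (2, 2),
}
print(sample_trial(ranges, random.Random()))
# {'lr': 0.0061, 'momentum': 0.7549, 'batch': 4}
```

With collapsed ranges, every trial receives the same hyperparameters regardless of the random seed.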
To view supported flags:
```bash
python run.py -h
```

### Docker
All versions of this project are compatible with <a href='https://hub.docker.com/r/abrainone/ai-linux' target='_blank'>AI Linux</a> and can be run inside a Docker image:
```bash
docker run -v "$(pwd)":/a/mm abrainone/ai-linux bash -c "PYTHONPATH=/a/mm python -m ab.nn.train"
```

Some recently added dependencies might be missing in <b>AI Linux</b>. In this case, you can create a container from the Docker image ```abrainone/ai-linux```, install the missing packages (preferably using ```pip install <package name>```), and then create a new image from the container using ```docker commit <container name> <new image name>```. You can use this new image locally or push it to the registry for deployment on a computer cluster.

## Contribution

To contribute a new neural network (NN) model to the NN Dataset, please ensure the following criteria are met:

1. The code for each model is provided in its own ".py" file within the <strong>/ab/nn/nn</strong> directory, and the file is named after the model's structure.
2. The main class for each model is named <strong>Net</strong>.
3. The constructor of the <strong>Net</strong> class takes the following parameters:
   - <strong>in_shape</strong> (tuple): The shape of the first tensor from the dataset iterator. For images it is structured as `(batch, channel, height, width)`.
   - <strong>out_shape</strong> (tuple): Provided by the dataset loader, it describes the shape of the output tensor. For a classification task, this could be `(number of classes,)`.
   - <strong>prm</strong> (dict): A dictionary of hyperparameters, e.g., `{'lr': 0.24, 'momentum': 0.93, 'dropout': 0.51}`.
   - <strong>device</strong> (torch.device): The PyTorch device used for model training.
4. All external information required to correctly build and train the NN model for a specific dataset/transformer, as well as the list of hyperparameters, is extracted from <strong>in_shape</strong>, <strong>out_shape</strong> or <strong>prm</strong>, e.g.: <br/>`batch = in_shape[0]` <br/>`channel_number = in_shape[1]` <br/>`image_size = in_shape[2]` <br/>`class_number = out_shape[0]` <br/>`learning_rate = prm['lr']` <br/>`momentum = prm['momentum']` <br/>`dropout = prm['dropout']`.
5. Every model script has a function returning the set of supported hyperparameters, e.g.: <br/>`def supported_hyperparameters(): return {'lr', 'momentum', 'dropout'}`<br/> The value of each hyperparameter lies within the range of 0.0 to 1.0.
6. Every <strong>Net</strong> class implements two functions: <br/>`train_setup(self, prm)`<br/> and <br/>`learn(self, train_data)`<br/> The first function initializes the `criteria` and `optimizer`, while the second implements the training pipeline. See a simple implementation in the <a href="https://github.com/ABrain-One/nn-dataset/blob/main/ab/nn/nn/AlexNet.py">AlexNet model</a>.
7. For each pull request involving a new NN model, please generate and submit training statistics for 100 Optuna trials (or at least 3 trials for very large models) in the <strong>ab/nn/stat</strong> directory. The trials should cover 5 epochs of training. Ensure that these statistics are included along with the model in your pull request. For example, the statistics for the ComplexNet model are stored in files <strong>&#x003C;epoch number&#x003E;.json</strong> inside the folder <strong>img-classification_cifar-10_acc_ComplexNet</strong>, and can be generated by:<br/>
```bash
python run.py -c img-classification_cifar-10_acc_ComplexNet -t 100 -e 5
```
<p>See more examples of models in <code>/ab/nn/nn</code> and generated statistics in <code>/ab/nn/stat</code>.</p>
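
The criteria above can be sketched as a single model file. This is a minimal illustration under stated assumptions, not the project's reference implementation: the tiny layer stack, the SGD optimizer, and the cross-entropy loss are choices made here for brevity, and `train_data` is assumed to yield `(inputs, labels)` batches. The linked AlexNet model remains the authoritative example.

```python
import torch
import torch.nn as nn


def supported_hyperparameters():
    # Hyperparameters this model reads from `prm`, each within 0.0 to 1.0.
    return {'lr', 'momentum', 'dropout'}


class Net(nn.Module):
    def __init__(self, in_shape: tuple, out_shape: tuple, prm: dict, device: torch.device) -> None:
        super().__init__()
        self.device = device
        channel_number = in_shape[1]  # in_shape is (batch, channel, height, width)
        class_number = out_shape[0]   # out_shape is (number of classes,)
        # Illustrative layers only; a real model would define its own structure.
        self.features = nn.Sequential(
            nn.Conv2d(channel_number, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(prm['dropout']),
            nn.Linear(8, class_number),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

    def train_setup(self, prm):
        # Initialize the loss (`criteria`) and the optimizer from `prm`.
        self.to(self.device)
        self.criteria = nn.CrossEntropyLoss().to(self.device)
        self.optimizer = torch.optim.SGD(
            self.parameters(), lr=prm['lr'], momentum=prm['momentum'])

    def learn(self, train_data):
        # One pass over the data; assumes batches of (inputs, labels).
        self.train()
        for inputs, labels in train_data:
            inputs, labels = inputs.to(self.device), labels.to(self.device)
            self.optimizer.zero_grad()
            loss = self.criteria(self(inputs), labels)
            loss.backward()
            self.optimizer.step()
```

Everything the model needs is derived from `in_shape`, `out_shape`, and `prm`, so the same file can be trained against different datasets and transformers without modification.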

### Available Modules

The `NN Dataset` includes the following key modules within the **<a href='https://github.com/ABrain-One/nn-dataset/tree/main/ab/nn'>ab.nn</a>** package:
- **<a href='https://github.com/ABrain-One/nn-dataset/tree/main/ab/nn/nn'>nn</a>**: Predefined neural network architectures, including models like `AlexNet`, `ResNet`, `VGG`, and more.
- **<a href='https://github.com/ABrain-One/nn-dataset/tree/main/ab/nn/loader'>loader</a>**: Data loading utilities for popular datasets such as CIFAR-10, COCO, and others.
- **<a href='https://github.com/ABrain-One/nn-dataset/tree/main/ab/nn/metric'>metric</a>**: Evaluation metrics supported for model assessment, such as accuracy, Intersection over Union (IoU), and others.
- **<a href='https://github.com/ABrain-One/nn-dataset/tree/main/ab/nn/transform'>transform</a>**: A collection of data transformation algorithms for dataset preprocessing and augmentation.
- **<a href='https://github.com/ABrain-One/nn-dataset/tree/main/ab/nn/stat'>stat</a>**: Statistics for different neural network model training pipelines.
- **<a href='https://github.com/ABrain-One/nn-dataset/tree/main/ab/nn/util'>util</a>**: Utility functions designed to assist with training, model evaluation, and statistical analysis.
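
As described in the Contribution section, statistics for a training pipeline are stored in files named `<epoch number>.json` inside a folder named after the pipeline. A minimal sketch for reading such a folder follows. The helper `load_epoch_stats` is hypothetical and makes no assumption about the JSON schema; it simply returns whatever each file contains.

```python
import json
from pathlib import Path


def load_epoch_stats(stat_dir):
    """Map epoch number -> parsed JSON for every '<epoch number>.json' file."""
    stats = {}
    for path in Path(stat_dir).glob("*.json"):
        # Skip files whose names are not plain epoch numbers.
        if path.stem.isdigit():
            stats[int(path.stem)] = json.loads(path.read_text())
    # Return the epochs in ascending order.
    return dict(sorted(stats.items()))


# Example call, assuming the repository layout described above:
# stats = load_epoch_stats("ab/nn/stat/img-classification_cifar-10_acc_ComplexNet")
```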

## Citation

If you find the LEMUR Neural Network Dataset to be useful for your research, please consider citing our <a target='_blank' href='https://arxiv.org/pdf/2504.10552'>article</a>:
```bibtex
@article{ABrain.NN-Dataset,
  title={LEMUR Neural Network Dataset: Towards Seamless AutoML},
  author={Goodarzi, Arash Torabi and Kochnev, Roman and Khalid, Waleed and Qin, Furui and Uzun, Tolgay Atinc and Dhameliya, Yashkumar Sanjaybhai and Kathiriya, Yash Kanubhai and Bentyn, Zofia Antonina and Ignatov, Dmitry and Timofte, Radu},
  journal={arXiv preprint arXiv:2504.10552},
  year={2025}
}
```

## Licenses

This project is distributed under the following licensing terms:
<ul><li>for neural network models adopted from other projects
  <ul>
    <li> Python code under the legacy <a href="https://github.com/ABrain-One/nn-dataset/blob/main/Doc/Licenses/LICENSE-MIT-NNs">MIT</a> or <a href="https://github.com/ABrain-One/nn-dataset/blob/main/Doc/Licenses/LICENSE-BSD-NNs">BSD 3-Clause</a> license</li>
    <li> models with pretrained weights under the legacy <a href="https://github.com/ABrain-One/nn-dataset/blob/main/Doc/Licenses/LICENSE-DEEPSEEK-LLM-V2">DeepSeek LLM V2</a> license</li>
  </ul></li>
<li> all neural network models and their weights not covered by the above licenses, as well as all other files and assets in this project, are subject to the <a href="https://github.com/ABrain-One/nn-dataset/blob/main/LICENSE">MIT license</a></li> 
</ul>

#### The idea and leadership of Dr. Ignatov
