11  Hyperparameter Tuning: PyTorch With fashionMNIST

In this tutorial, we will show how spotPython can be integrated into the PyTorch training workflow.

This document refers to the following software versions:

pip list | grep "spot[RiverPython]"
spotPython                 0.2.31
spotRiver                  0.0.93
Note: you may need to restart the kernel to use updated packages.

spotPython can be installed via pip. Alternatively, the source code can be downloaded from GitHub: https://github.com/sequential-parameter-optimization/spotPython.

!pip install spotPython
# import sys
# !{sys.executable} -m pip install --upgrade build
# !{sys.executable} -m pip install --upgrade --force-reinstall spotPython

11.1 Setup

Before we consider the detailed experimental setup, we select the parameters that affect the run time (MAX_TIME), the size of the initial design (INIT_SIZE), and the device that is used.

MAX_TIME = 1
INIT_SIZE = 5
DEVICE = "cpu" # "cuda:0"
from spotPython.utils.device import getDevice
DEVICE = getDevice(DEVICE)
print(DEVICE)
cpu
import os
import copy
import socket
from datetime import datetime
from dateutil.tz import tzlocal
start_time = datetime.now(tzlocal())
HOSTNAME = socket.gethostname().split(".")[0]
experiment_name = '11-torch' + "_" + HOSTNAME + "_" + str(MAX_TIME) + "min_" + str(INIT_SIZE) + "init_" + str(start_time).split(".", 1)[0].replace(' ', '_')
experiment_name = experiment_name.replace(':', '-')
print(experiment_name)
if not os.path.exists('./figures'):
    os.makedirs('./figures')
11-torch_p040025_1min_5init_2023-06-16_09-36-31

11.2 Step 1: Initialization of the Empty fun_control Dictionary

spotPython uses a Python dictionary for storing the information required for the hyperparameter tuning process, which was described in Section 13.2.

from spotPython.utils.init import fun_control_init
fun_control = fun_control_init(task="classification",
    tensorboard_path="runs/11_spot_hpt_torch_fashion_mnist",
    device=DEVICE)

11.3 PyTorch Data Loading

11.4 Step 2: Load fashionMNIST Data

from torchvision import datasets, transforms
from torchvision.transforms import ToTensor
def load_data(data_dir="./data"):
    # Download training data from open datasets.
    training_data = datasets.FashionMNIST(
        root=data_dir,
        train=True,
        download=True,
        transform=ToTensor(),
    )
    # Download test data from open datasets.
    test_data = datasets.FashionMNIST(
        root=data_dir,
        train=False,
        download=True,
        transform=ToTensor(),
    )
    return training_data, test_data
train, test = load_data()
train.data.shape, test.data.shape
(torch.Size([60000, 28, 28]), torch.Size([10000, 28, 28]))
n_samples = len(train)
# add the dataset to the fun_control
fun_control.update({"data": None,
               "train": train,
               "test": test,
               "n_samples": n_samples,
               "target_column": None})

11.5 The Model (Algorithm) to be Tuned

11.6 Step 3: Specification of the Preprocessing Model

After the training and test data are specified and added to the fun_control dictionary, spotPython allows the specification of a data preprocessing pipeline, e.g., for the scaling of the data or for the one-hot encoding of categorical variables, see Section 13.4.1. This feature is not used here, so we do not change the default value (which is None).
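
If preprocessing were desired, a minimal sketch (not executed here, and assuming that the fun_control key "prep_model" accepts a scikit-learn transformer) would look as follows:

# Minimal sketch (not used in this tutorial):
# from sklearn.preprocessing import StandardScaler
# fun_control.update({"prep_model": StandardScaler()})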

11.7 Step 4: Select algorithm and core_model_hyper_dict

spotPython implements a class which is similar to the class described in the PyTorch tutorial. The class is called Net_fashionMNIST and is implemented in the file netfashionMNIST.py; its definition is shown below.

from torch import nn
import spotPython.torch.netcore as netcore


class Net_fashionMNIST(netcore.Net_Core):
    def __init__(self, l1, l2, lr_mult, batch_size, epochs, k_folds, patience, optimizer, sgd_momentum):
        super(Net_fashionMNIST, self).__init__(
            lr_mult=lr_mult,
            batch_size=batch_size,
            epochs=epochs,
            k_folds=k_folds,
            patience=patience,
            optimizer=optimizer,
            sgd_momentum=sgd_momentum,
        )
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, l1),
            nn.ReLU(),
            nn.Linear(l1, l2),
            nn.ReLU(),
            nn.Linear(l2, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

This class inherits from the class Net_Core, which is implemented in the file netcore.py.
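
Net_Core itself contains no layers. A rough sketch of its assumed structure (shown for orientation only, not the verbatim source):

# Rough sketch (assumed structure): Net_Core stores the training-related
# hyperparameters as attributes, so that spotPython's training routines can
# read them from the model instance.
# class Net_Core(nn.Module):
#     def __init__(self, lr_mult, batch_size, epochs, k_folds, patience,
#                  optimizer, sgd_momentum):
#         super().__init__()
#         self.lr_mult = lr_mult
#         self.batch_size = batch_size
#         self.epochs = epochs
#         self.k_folds = k_folds
#         self.patience = patience
#         self.optimizer = optimizer
#         self.sgd_momentum = sgd_momentum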

from spotPython.data.torch_hyper_dict import TorchHyperDict
from spotPython.torch.netfashionMNIST import Net_fashionMNIST
from spotPython.hyperparameters.values import add_core_model_to_fun_control
fun_control = add_core_model_to_fun_control(core_model=Net_fashionMNIST,
                              fun_control=fun_control,
                              hyper_dict=TorchHyperDict,
                              filename=None)

11.8 The Search Space

11.8.1 Configuring the Search Space With spotPython

11.8.1.1 The hyper_dict Hyperparameters for the Selected Algorithm

spotPython uses JSON files for the specification of the hyperparameters; these files were described in Section 13.5.2.

The corresponding entries for the Net_fashionMNIST class are shown below.

 "Net_fashionMNIST":
    {
        "l1": {
            "type": "int",
            "default": 5,
            "transform": "transform_power_2_int",
            "lower": 2,
            "upper": 9},
        "l2": {
            "type": "int",
            "default": 5,
            "transform": "transform_power_2_int",
            "lower": 2,
            "upper": 9},
        "lr_mult": {
            "type": "float",
            "default": 1.0,
            "transform": "None",
            "lower": 0.1,
            "upper": 10.0},
        "batch_size": {
            "type": "int",
            "default": 4,
            "transform": "transform_power_2_int",
            "lower": 1,
            "upper": 4},
        "epochs": {
                "type": "int",
                "default": 3,
                "transform": "transform_power_2_int",
                "lower": 3,
                "upper": 4},
        "k_folds": {
            "type": "int",
            "default": 1,
            "transform": "None",
            "lower": 1,
            "upper": 1},
        "patience": {
            "type": "int",
            "default": 5,
            "transform": "None",
            "lower": 2,
            "upper": 10
        },
        "optimizer": {
            "levels": ["Adadelta",
                        "Adagrad",
                        "Adam",
                        "AdamW",
                        "SparseAdam",
                        "Adamax",
                        "ASGD",
                        "NAdam",
                        "RAdam",
                        "RMSprop",
                        "Rprop",
                        "SGD"],
            "type": "factor",
            "default": "SGD",
            "transform": "None",
            "core_model_parameter_type": "str",
            "lower": 0,
            "upper": 12},
        "sgd_momentum": {
            "type": "float",
            "default": 0.0,
            "transform": "None",
            "lower": 0.0,
            "upper": 1.0}
    },
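
The transform transform_power_2_int maps a hyperparameter value x to 2**x before it is passed to the model, so that, e.g., the default l1 = 5 corresponds to a layer width of 32. A small standalone re-implementation for illustration:

# Standalone re-implementation of the transform, for illustration only
def transform_power_2_int(x: int) -> int:
    return 2 ** x

print([transform_power_2_int(x) for x in [2, 5, 9]])  # [4, 32, 512]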

11.9 Step 5: Modify hyper_dict Hyperparameters for the Selected Algorithm aka core_model

spotPython provides functions for modifying the hyperparameters, their bounds and factors as well as for activating and de-activating hyperparameters without re-compilation of the Python source code. These functions were described in Section 13.5.3.

11.9.1 Modify Hyperparameters of Type numeric and integer (boolean)

from spotPython.hyperparameters.values import modify_hyper_parameter_bounds
fun_control = modify_hyper_parameter_bounds(fun_control, "k_folds", bounds=[0, 0])
fun_control = modify_hyper_parameter_bounds(fun_control, "patience", bounds=[2, 2])
fun_control = modify_hyper_parameter_bounds(fun_control, "epochs", bounds=[2, 3])

11.9.2 Modify Hyperparameters of Type factor

from spotPython.hyperparameters.values import modify_hyper_parameter_levels

11.9.3 Optimizers

Optimizers are described in Section 13.6.

11.10 Step 6: Selection of the Objective (Loss) Function

11.10.1 Evaluation

The evaluation procedure requires the specification of two elements:

  1. how the data is split into a train and a test set, and
  2. the loss function (and a metric).

These are described in Section 18.9.

The key "loss_function" specifies the loss function which is used during the optimization, see Section 13.8.

We will use CrossEntropyLoss for the multi-class classification task.

from torch.nn import CrossEntropyLoss
loss_function = CrossEntropyLoss()
fun_control.update({
        "loss_function": loss_function,
        "shuffle": True,
        "eval":  "train_hold_out"
        })
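
Note that CrossEntropyLoss combines a log-softmax with the negative log-likelihood, which is why the forward method of Net_fashionMNIST returns raw logits without a final softmax layer. A small standalone illustration:

import torch
# CrossEntropyLoss expects raw logits and integer class labels
logits = torch.randn(4, 10)            # batch of 4 samples, 10 classes
targets = torch.randint(0, 10, (4,))   # integer class labels
print(loss_function(logits, targets))  # scalar loss tensor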

11.10.2 Metric

from torchmetrics import Accuracy
metric_torch = Accuracy(task="multiclass", num_classes=10).to(fun_control["device"])
fun_control.update({"metric_torch": metric_torch})

11.11 Preparing the SPOT Call

The following code passes the information about the parameter ranges and bounds to spot.

# extract the variable types, names, and bounds
from spotPython.hyperparameters.values import (get_bound_values,
    get_var_name,
    get_var_type,)
var_type = get_var_type(fun_control)
var_name = get_var_name(fun_control)
fun_control.update({"var_type": var_type,
                    "var_name": var_name})
lower = get_bound_values(fun_control, "lower")
upper = get_bound_values(fun_control, "upper")
from spotPython.utils.eda import gen_design_table
print(gen_design_table(fun_control))
| name         | type   | default   |   lower |   upper | transform             |
|--------------|--------|-----------|---------|---------|-----------------------|
| l1           | int    | 5         |     2   |       9 | transform_power_2_int |
| l2           | int    | 5         |     2   |       9 | transform_power_2_int |
| lr_mult      | float  | 1.0       |     0.1 |      10 | None                  |
| batch_size   | int    | 4         |     1   |       4 | transform_power_2_int |
| epochs       | int    | 3         |     2   |       3 | transform_power_2_int |
| k_folds      | int    | 1         |     0   |       0 | None                  |
| patience     | int    | 5         |     2   |       2 | None                  |
| optimizer    | factor | SGD       |     0   |      12 | None                  |
| sgd_momentum | float  | 0.0       |     0   |       1 | None                  |

11.12 The Objective Function fun_torch

The objective function fun_torch is selected next. It implements an interface from PyTorch’s training, validation, and testing methods to spotPython.

from spotPython.fun.hypertorch import HyperTorch
fun = HyperTorch().fun_torch

11.13 Starting the Hyperparameter Tuning

import numpy as np
from spotPython.spot import spot
from math import inf
spot_tuner = spot.Spot(fun=fun,
                   lower = lower,
                   upper = upper,
                   fun_evals = inf,
                   fun_repeats = 1,
                   max_time = MAX_TIME,
                   noise = False,
                   tolerance_x = np.sqrt(np.spacing(1)),
                   var_type = var_type,
                   var_name = var_name,
                   infill_criterion = "y",
                   n_points = 1,
                   seed=123,
                   log_level = 50,
                   show_models= False,
                   show_progress= True,
                   fun_control = fun_control,
                   design_control={"init_size": INIT_SIZE,
                                   "repeats": 1},
                   surrogate_control={"noise": True,
                                      "cod_type": "norm",
                                      "min_theta": -4,
                                      "max_theta": 3,
                                      "n_theta": len(var_name),
                                      "model_fun_evals": 10_000,
                                      "log_level": 50
                                      })
# The default configuration is used as the starting point of the tuning run.
# It is assumed that get_default_hyperparameters_as_array returns the defaults
# from the hyper_dict in the encoding expected by Spot.
from spotPython.hyperparameters.values import get_default_hyperparameters_as_array
X_start = get_default_hyperparameters_as_array(fun_control)
spot_tuner.run(X_start=X_start)

config: {'l1': 16, 'l2': 32, 'lr_mult': 9.563687451910228, 'batch_size': 8, 'epochs': 8, 'k_folds': 0, 'patience': 2, 'optimizer': 'AdamW', 'sgd_momentum': 0.41533100039458876}
Epoch: 1
Loss on hold-out set: 0.8697875214486073
Accuracy on hold-out set: 0.6594166666666667
MulticlassAccuracy value on hold-out data: 0.659416675567627
Epoch: 2
Loss on hold-out set: 0.7461786333272854
Accuracy on hold-out set: 0.70825
MulticlassAccuracy value on hold-out data: 0.7082499861717224
Epoch: 3
Loss on hold-out set: 0.7399586153697844
Accuracy on hold-out set: 0.714
MulticlassAccuracy value on hold-out data: 0.7139999866485596
Epoch: 4
Loss on hold-out set: 0.7325205926403093
Accuracy on hold-out set: 0.7214166666666667
MulticlassAccuracy value on hold-out data: 0.7214166522026062
Epoch: 5
Loss on hold-out set: 0.7821155497288952
Accuracy on hold-out set: 0.6969166666666666
MulticlassAccuracy value on hold-out data: 0.6969166398048401
Epoch: 6
Loss on hold-out set: 0.7480082136467099
Accuracy on hold-out set: 0.7005833333333333
MulticlassAccuracy value on hold-out data: 0.7005833387374878
Early stopping at epoch 5
Returned to Spot: Validation loss: 0.7480082136467099
----------------------------------------------

config: {'l1': 128, 'l2': 32, 'lr_mult': 6.258012467639852, 'batch_size': 2, 'epochs': 4, 'k_folds': 0, 'patience': 2, 'optimizer': 'RAdam', 'sgd_momentum': 0.9572474073249809}
Epoch: 1
Loss on hold-out set: 0.5277602214035049
Accuracy on hold-out set: 0.8421666666666666
MulticlassAccuracy value on hold-out data: 0.8421666622161865
Epoch: 2
Loss on hold-out set: 0.5432566576737462
Accuracy on hold-out set: 0.853375
MulticlassAccuracy value on hold-out data: 0.8533750176429749
Epoch: 3
Loss on hold-out set: 0.5048360322263601
Accuracy on hold-out set: 0.8692083333333334
MulticlassAccuracy value on hold-out data: 0.8692083358764648
Epoch: 4
Loss on hold-out set: 0.5264402558312139
Accuracy on hold-out set: 0.86875
MulticlassAccuracy value on hold-out data: 0.8687499761581421
Returned to Spot: Validation loss: 0.5264402558312139
----------------------------------------------

config: {'l1': 256, 'l2': 256, 'lr_mult': 0.2437336281201693, 'batch_size': 16, 'epochs': 8, 'k_folds': 0, 'patience': 2, 'optimizer': 'Adagrad', 'sgd_momentum': 0.15368887503658651}
Epoch: 1
Loss on hold-out set: 0.4975364735821883
Accuracy on hold-out set: 0.8226666666666667
MulticlassAccuracy value on hold-out data: 0.8226666450500488
Epoch: 2
Loss on hold-out set: 0.4522183754021923
Accuracy on hold-out set: 0.84075
MulticlassAccuracy value on hold-out data: 0.840749979019165
Epoch: 3
Loss on hold-out set: 0.43958584532514217
Accuracy on hold-out set: 0.8469583333333334
MulticlassAccuracy value on hold-out data: 0.846958339214325
Epoch: 4
Loss on hold-out set: 0.4252316642763714
Accuracy on hold-out set: 0.851375
MulticlassAccuracy value on hold-out data: 0.8513749837875366
Epoch: 5
Loss on hold-out set: 0.41232428667570153
Accuracy on hold-out set: 0.85625
MulticlassAccuracy value on hold-out data: 0.856249988079071
Epoch: 6
Loss on hold-out set: 0.40429589938620725
Accuracy on hold-out set: 0.8582083333333334
MulticlassAccuracy value on hold-out data: 0.8582083582878113
Epoch: 7
Loss on hold-out set: 0.40291512925301987
Accuracy on hold-out set: 0.8591666666666666
MulticlassAccuracy value on hold-out data: 0.85916668176651
Epoch: 8
Loss on hold-out set: 0.39473768007506926
Accuracy on hold-out set: 0.8614583333333333
MulticlassAccuracy value on hold-out data: 0.8614583611488342
Returned to Spot: Validation loss: 0.39473768007506926
----------------------------------------------

config: {'l1': 64, 'l2': 8, 'lr_mult': 2.906205211581667, 'batch_size': 8, 'epochs': 4, 'k_folds': 0, 'patience': 2, 'optimizer': 'SGD', 'sgd_momentum': 0.25435133436334767}
Epoch: 1
Loss on hold-out set: 0.9618792747010787
Accuracy on hold-out set: 0.6515416666666667
MulticlassAccuracy value on hold-out data: 0.6515416502952576
Epoch: 2
Loss on hold-out set: 0.7929936404004693
Accuracy on hold-out set: 0.6933333333333334
MulticlassAccuracy value on hold-out data: 0.6933333277702332
Epoch: 3
Loss on hold-out set: 0.70789112474521
Accuracy on hold-out set: 0.7372916666666667
MulticlassAccuracy value on hold-out data: 0.737291693687439
Epoch: 4
Loss on hold-out set: 0.6498391977275412
Accuracy on hold-out set: 0.7598333333333334
MulticlassAccuracy value on hold-out data: 0.7598333358764648
Returned to Spot: Validation loss: 0.6498391977275412
----------------------------------------------

config: {'l1': 4, 'l2': 128, 'lr_mult': 4.224097306355747, 'batch_size': 4, 'epochs': 8, 'k_folds': 0, 'patience': 2, 'optimizer': 'Adamax', 'sgd_momentum': 0.6538496127257492}
Epoch: 1
Loss on hold-out set: 0.8638992863685901
Accuracy on hold-out set: 0.6985
MulticlassAccuracy value on hold-out data: 0.6984999775886536
Epoch: 2
Loss on hold-out set: 0.9494509066280443
Accuracy on hold-out set: 0.6774583333333334
MulticlassAccuracy value on hold-out data: 0.6774583458900452
Epoch: 3
Loss on hold-out set: 0.8908399186267439
Accuracy on hold-out set: 0.7044583333333333
MulticlassAccuracy value on hold-out data: 0.7044583559036255
Early stopping at epoch 2
Returned to Spot: Validation loss: 0.8908399186267439
----------------------------------------------

config: {'l1': 256, 'l2': 256, 'lr_mult': 0.7375404984302874, 'batch_size': 16, 'epochs': 8, 'k_folds': 0, 'patience': 2, 'optimizer': 'Adagrad', 'sgd_momentum': 0.20628131592582533}
Epoch: 1
Loss on hold-out set: 0.44405158352230983
Accuracy on hold-out set: 0.842875
MulticlassAccuracy value on hold-out data: 0.8428750038146973
Epoch: 2
Loss on hold-out set: 0.4016528637322287
Accuracy on hold-out set: 0.861
MulticlassAccuracy value on hold-out data: 0.8610000014305115
Epoch: 3
Loss on hold-out set: 0.3710196974252661
Accuracy on hold-out set: 0.8702916666666667
MulticlassAccuracy value on hold-out data: 0.8702916502952576
Epoch: 4
Loss on hold-out set: 0.3579240807853639
Accuracy on hold-out set: 0.87575
MulticlassAccuracy value on hold-out data: 0.8757500052452087
Epoch: 5
Loss on hold-out set: 0.3572794831348583
Accuracy on hold-out set: 0.87675
MulticlassAccuracy value on hold-out data: 0.8767499923706055
Epoch: 6
Loss on hold-out set: 0.3460310174692422
Accuracy on hold-out set: 0.8797083333333333
MulticlassAccuracy value on hold-out data: 0.8797083497047424
Epoch: 7
Loss on hold-out set: 0.348224284471944
Accuracy on hold-out set: 0.8785
MulticlassAccuracy value on hold-out data: 0.8784999847412109
Epoch: 8
Loss on hold-out set: 0.3341982651151096
Accuracy on hold-out set: 0.8819583333333333
MulticlassAccuracy value on hold-out data: 0.8819583058357239
Returned to Spot: Validation loss: 0.3341982651151096
----------------------------------------------
spotPython tuning: 0.3341982651151096 [#####-----] 51.66% 

config: {'l1': 256, 'l2': 256, 'lr_mult': 1.9109745230304602, 'batch_size': 16, 'epochs': 8, 'k_folds': 0, 'patience': 2, 'optimizer': 'Adagrad', 'sgd_momentum': 0.37600084984498416}
Epoch: 1
Loss on hold-out set: 0.39271209251632294
Accuracy on hold-out set: 0.8576666666666667
MulticlassAccuracy value on hold-out data: 0.8576666712760925
Epoch: 2
Loss on hold-out set: 0.36352159655156235
Accuracy on hold-out set: 0.8684166666666666
MulticlassAccuracy value on hold-out data: 0.8684166669845581
Epoch: 3
Loss on hold-out set: 0.3585649378380428
Accuracy on hold-out set: 0.871125
MulticlassAccuracy value on hold-out data: 0.8711249828338623
Epoch: 4
Loss on hold-out set: 0.3361179789993912
Accuracy on hold-out set: 0.8793333333333333
MulticlassAccuracy value on hold-out data: 0.8793333172798157
Epoch: 5
Loss on hold-out set: 0.32392818241224935
Accuracy on hold-out set: 0.883125
MulticlassAccuracy value on hold-out data: 0.8831250071525574
Epoch: 6
Loss on hold-out set: 0.3205822240756825
Accuracy on hold-out set: 0.8864583333333333
MulticlassAccuracy value on hold-out data: 0.8864583373069763
Epoch: 7
Loss on hold-out set: 0.31521114270109685
Accuracy on hold-out set: 0.8882083333333334
MulticlassAccuracy value on hold-out data: 0.8882083296775818
Epoch: 8
Loss on hold-out set: 0.31223045862217746
Accuracy on hold-out set: 0.8905416666666667
MulticlassAccuracy value on hold-out data: 0.890541672706604
Returned to Spot: Validation loss: 0.31223045862217746
----------------------------------------------
spotPython tuning: 0.31223045862217746 [##########] 100.00% Done...
<spotPython.spot.spot.Spot at 0x2af14b1f0>

12 TensorBoard

The textual output shown in the console (or code cell) can be visualized with TensorBoard as described in Section 13.13.
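
The TensorBoard server can be started from the command line, pointing it at the log directory passed to fun_control_init above:

tensorboard --logdir="runs/"

The logs can then be inspected in a browser at http://localhost:6006.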

12.0.1 Results

After the hyperparameter tuning run is finished, the results can be analyzed as described in Section 13.14.

import pickle
SAVE = False
LOAD = False

if SAVE:
    result_file_name = "res_" + experiment_name + ".pkl"
    with open(result_file_name, 'wb') as f:
        pickle.dump(spot_tuner, f)

if LOAD:
    result_file_name = "ADD THE NAME here, e.g.: res_ch10-friedman-hpt-0_maans03_60min_20init_1K_2023-04-14_10-11-19.pkl"
    with open(result_file_name, 'rb') as f:
        spot_tuner =  pickle.load(f)

After the hyperparameter tuning run is finished, its progress can be visualized. The following code generates the progress plot shown below.

spot_tuner.plot_progress(log_y=False,
    filename="./figures/" + experiment_name+"_progress.png")

Progress plot. Black dots denote results from the initial design. Red dots illustrate the improvement found by the surrogate-model-based optimization.

The results can be printed in tabular form:
print(gen_design_table(fun_control=fun_control,
    spot=spot_tuner))
| name         | type   | default   |   lower |   upper |               tuned | transform             |   importance | stars   |
|--------------|--------|-----------|---------|---------|---------------------|-----------------------|--------------|---------|
| l1           | int    | 5         |     2.0 |     9.0 |                 8.0 | transform_power_2_int |       100.00 | ***     |
| l2           | int    | 5         |     2.0 |     9.0 |                 8.0 | transform_power_2_int |         0.71 | .       |
| lr_mult      | float  | 1.0       |     0.1 |    10.0 |  1.9109745230304602 | None                  |        28.53 | *       |
| batch_size   | int    | 4         |     1.0 |     4.0 |                 4.0 | transform_power_2_int |         0.00 |         |
| epochs       | int    | 3         |     2.0 |     3.0 |                 3.0 | transform_power_2_int |         0.00 |         |
| k_folds      | int    | 1         |     0.0 |     0.0 |                 0.0 | None                  |         0.00 |         |
| patience     | int    | 5         |     2.0 |     2.0 |                 2.0 | None                  |         0.00 |         |
| optimizer    | factor | SGD       |     0.0 |    12.0 |                 1.0 | None                  |         0.01 |         |
| sgd_momentum | float  | 0.0       |     0.0 |     1.0 | 0.37600084984498416 | None                  |         0.00 |         |

12.1 Show Variable Importance

spot_tuner.plot_importance(threshold=0.025, filename="./figures/" + experiment_name+"_importance.png")

Variable importance plot, threshold 0.025.

12.2 Get the Tuned Architecture (SPOT Results)

The architecture of the model tuned by spotPython can be obtained with the following code:

from spotPython.hyperparameters.values import get_one_core_model_from_X
X = spot_tuner.to_all_dim(spot_tuner.min_X.reshape(1,-1))
model_spot = get_one_core_model_from_X(X, fun_control)
model_spot
Net_fashionMNIST(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=10, bias=True)
  )
)

12.3 Get Default Hyperparameters

# Attach the default hyperparameter dictionary to a copy of fun_control, so
# that the default configuration X_start (defined above) can be converted
# into a model instance. It is assumed here that TorchHyperDict().load()
# returns the dictionary read from the JSON file.
fc = copy.deepcopy(fun_control)
fc.update({"core_model_hyper_dict":
    TorchHyperDict().load()[fun_control["core_model"].__name__]})
model_default = get_one_core_model_from_X(X_start, fun_control=fc)
model_default
Net_fashionMNIST(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=32, bias=True)
    (1): ReLU()
    (2): Linear(in_features=32, out_features=32, bias=True)
    (3): ReLU()
    (4): Linear(in_features=32, out_features=10, bias=True)
  )
)

12.4 Evaluation of the Default and the Tuned Architectures

The function train_tuned takes a model architecture without trained weights and trains it on the training data. The training data is split into training and validation sets, where the validation data is used for early stopping. The trained model weights are saved as a state dictionary.

from spotPython.torch.traintest import train_tuned
train_tuned(net=model_default, train_dataset=train, shuffle=True,
        loss_function=fun_control["loss_function"],
        metric=fun_control["metric_torch"],
        device = fun_control["device"],
        show_batch_interval=1_000_000,
        path=None,
        task=fun_control["task"])
Epoch: 1
Loss on hold-out set: 2.029458522160848
Accuracy on hold-out set: 0.21083333333333334
MulticlassAccuracy value on hold-out data: 0.2108333259820938
Epoch: 2
Loss on hold-out set: 1.6233754312197368
Accuracy on hold-out set: 0.48033333333333333
MulticlassAccuracy value on hold-out data: 0.4803333282470703
Epoch: 3
Loss on hold-out set: 1.3776697399616242
Accuracy on hold-out set: 0.5720416666666667
MulticlassAccuracy value on hold-out data: 0.5720416903495789
Epoch: 4
Loss on hold-out set: 1.2028716133038202
Accuracy on hold-out set: 0.606
MulticlassAccuracy value on hold-out data: 0.6060000061988831
Epoch: 5
Loss on hold-out set: 1.0858700147469837
Accuracy on hold-out set: 0.632875
MulticlassAccuracy value on hold-out data: 0.6328750252723694
Epoch: 6
Loss on hold-out set: 1.0070785391728083
Accuracy on hold-out set: 0.6559583333333333
MulticlassAccuracy value on hold-out data: 0.655958354473114
Epoch: 7
Loss on hold-out set: 0.9488001957138379
Accuracy on hold-out set: 0.671625
MulticlassAccuracy value on hold-out data: 0.671625018119812
Epoch: 8
Loss on hold-out set: 0.9060477392673493
Accuracy on hold-out set: 0.676
MulticlassAccuracy value on hold-out data: 0.6759999990463257
Returned to Spot: Validation loss: 0.9060477392673493
----------------------------------------------
from spotPython.torch.traintest import test_tuned
test_tuned(net=model_default, test_dataset=test, 
        loss_function=fun_control["loss_function"],
        metric=fun_control["metric_torch"],
        shuffle=False, 
        device = fun_control["device"],
        task=fun_control["task"])
Loss on hold-out set: 0.916261920785904
Accuracy on hold-out set: 0.6674
MulticlassAccuracy value on hold-out data: 0.6674000024795532
Final evaluation: Validation loss: 0.916261920785904
Final evaluation: Validation metric: 0.6674000024795532
----------------------------------------------
(0.916261920785904, nan, tensor(0.6674))

The following code trains the model model_spot. If path is set to a filename, e.g., path = "model_spot_trained.pt", the weights of the trained model will be saved to this file.

train_tuned(net=model_spot, train_dataset=train,
        loss_function=fun_control["loss_function"],
        metric=fun_control["metric_torch"],
        shuffle=True,
        device = fun_control["device"],
        path=None,
        task=fun_control["task"])
Epoch: 1
Loss on hold-out set: 0.4239927289299667
Accuracy on hold-out set: 0.8472916666666667
MulticlassAccuracy value on hold-out data: 0.8472916483879089
Epoch: 2
Loss on hold-out set: 0.37399891556488973
Accuracy on hold-out set: 0.86825
MulticlassAccuracy value on hold-out data: 0.8682500123977661
Epoch: 3
Loss on hold-out set: 0.3522352550762395
Accuracy on hold-out set: 0.8755
MulticlassAccuracy value on hold-out data: 0.8755000233650208
Epoch: 4
Loss on hold-out set: 0.3493974453068028
Accuracy on hold-out set: 0.8772083333333334
MulticlassAccuracy value on hold-out data: 0.8772083520889282
Epoch: 5
Loss on hold-out set: 0.3426488108104095
Accuracy on hold-out set: 0.879
MulticlassAccuracy value on hold-out data: 0.8790000081062317
Epoch: 6
Loss on hold-out set: 0.3393002171457435
Accuracy on hold-out set: 0.8809583333333333
MulticlassAccuracy value on hold-out data: 0.8809583187103271
Epoch: 7
Loss on hold-out set: 0.34342135236179455
Accuracy on hold-out set: 0.881625
MulticlassAccuracy value on hold-out data: 0.8816249966621399
Epoch: 8
Loss on hold-out set: 0.3357768829856068
Accuracy on hold-out set: 0.884875
MulticlassAccuracy value on hold-out data: 0.8848749995231628
Returned to Spot: Validation loss: 0.3357768829856068
----------------------------------------------
test_tuned(net=model_spot, test_dataset=test,
            shuffle=False,
            loss_function=fun_control["loss_function"],
            metric=fun_control["metric_torch"],
            device = fun_control["device"],
            task=fun_control["task"])
Loss on hold-out set: 0.3683470522731543
Accuracy on hold-out set: 0.8754
MulticlassAccuracy value on hold-out data: 0.8754000067710876
Final evaluation: Validation loss: 0.3683470522731543
Final evaluation: Validation metric: 0.8754000067710876
----------------------------------------------
(0.3683470522731543, nan, tensor(0.8754))
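
If path had been set, the saved weights could later be restored with PyTorch's standard mechanism; a sketch, assuming the filename from above:

# Sketch: reload saved weights into a freshly constructed architecture
# (assumes train_tuned was called with path="model_spot_trained.pt")
# import torch
# model_spot.load_state_dict(torch.load("model_spot_trained.pt"))
# model_spot.eval()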

12.5 Detailed Hyperparameter Plots

filename = "./figures/" + experiment_name
spot_tuner.plot_important_hyperparameter_contour(filename=filename)
l1:  100.0
l2:  0.7131487144216024
lr_mult:  28.532825734599953

Contour plots.

12.6 Parallel Coordinates Plot

spot_tuner.parallel_plot()

Parallel coordinates plot.

12.7 Plot all Combinations of Hyperparameters

Warning: this may take a while.

PLOT_ALL = False
if PLOT_ALL:
    n = spot_tuner.k
    # Color scale derived from the observed objective values; it is assumed
    # that spot_tuner.y holds these values.
    min_z = min(spot_tuner.y)
    max_z = max(spot_tuner.y)
    for i in range(n-1):
        for j in range(i+1, n):
            spot_tuner.plot_contour(i=i, j=j, min_z=min_z, max_z=max_z)