11  HPT: PyTorch With fashionMNIST

In this tutorial, we will show how spotPython can be integrated into the PyTorch training workflow.

This document refers to the following software versions:

pip list | grep  "spot[RiverPython]"
spotPython               0.2.51
spotRiver                0.0.94
Note: you may need to restart the kernel to use updated packages.

spotPython can be installed via pip. Alternatively, the source code can be downloaded from GitHub: https://github.com/sequential-parameter-optimization/spotPython.

!pip install spotPython
# import sys
# !{sys.executable} -m pip install --upgrade build
# !{sys.executable} -m pip install --upgrade --force-reinstall spotPython

11.1 Step 1: Setup

Before we consider the detailed experimental setup, we select the parameters that affect run time, the initial design size, and the device that is used.

Caution: Run time and initial design size should be increased for real experiments
  • MAX_TIME is set to one minute for demonstration purposes. For real experiments, this should be increased to at least 1 hour.
  • INIT_SIZE is set to 5 for demonstration purposes. For real experiments, this should be increased to at least 10.
Note: Device selection
  • The device can be selected by setting the variable DEVICE.
  • Since we are using a simple neural net, the setting "cpu" is preferred (on Mac).
  • If you have a GPU, you can use "cuda:0" instead.
  • If DEVICE is set to None, spotPython will automatically select the device.
    • This might result in "mps" on Macs, which is not the best choice for simple neural nets.
MAX_TIME = 1
INIT_SIZE = 5
DEVICE = "cpu" # "cuda:0"
from spotPython.utils.device import getDevice
DEVICE = getDevice(DEVICE)
print(DEVICE)
cpu
import os
import copy
import socket
from datetime import datetime
from dateutil.tz import tzlocal
start_time = datetime.now(tzlocal())
HOSTNAME = socket.gethostname().split(".")[0]
experiment_name = '11-torch' + "_" + HOSTNAME + "_" + str(MAX_TIME) + "min_" + str(INIT_SIZE) + "init_" + str(start_time).split(".", 1)[0].replace(' ', '_')
experiment_name = experiment_name.replace(':', '-')
print(experiment_name)
if not os.path.exists('./figures'):
    os.makedirs('./figures')
11-torch_maans05_1min_5init_2023-06-28_14-33-15

11.2 Step 2: Initialization of the Empty fun_control Dictionary

spotPython uses a Python dictionary for storing the information required for the hyperparameter tuning process, which was described in Section 14.2.

Caution: Tensorboard does not work under Windows
  • Since tensorboard does not work under Windows, we recommend that Windows users set the parameter tensorboard_path to None.
from spotPython.utils.init import fun_control_init
fun_control = fun_control_init(task="classification",
    tensorboard_path="runs/11_spot_hpt_torch_fashion_mnist",
    device=DEVICE)

11.3 Step 3: PyTorch Data Loading

11.3.1 Load fashionMNIST Data

from torchvision import datasets, transforms
from torchvision.transforms import ToTensor
def load_data(data_dir="./data"):
    # Download training data from open datasets.
    training_data = datasets.FashionMNIST(
        root=data_dir,
        train=True,
        download=True,
        transform=ToTensor(),
    )
    # Download test data from open datasets.
    test_data = datasets.FashionMNIST(
        root=data_dir,
        train=False,
        download=True,
        transform=ToTensor(),
    )
    return training_data, test_data
train, test = load_data()
train.data.shape, test.data.shape
(torch.Size([60000, 28, 28]), torch.Size([10000, 28, 28]))
n_samples = len(train)
# add the dataset to the fun_control
fun_control.update({"data": None,
               "train": train,
               "test": test,
               "n_samples": n_samples,
               "target_column": None})

11.4 Step 4: Specification of the Preprocessing Model

After the training and test data are specified and added to the fun_control dictionary, spotPython allows the specification of a data preprocessing pipeline, e.g., for the scaling of the data or for the one-hot encoding of categorical variables, see Section 14.4. This feature is not used here, so we do not change the default value (which is None).
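If preprocessing were required, a preprocessing model could be attached to the fun_control dictionary. The following sketch is purely illustrative; the "prep_model" key and the scikit-learn scaler are assumptions borrowed from spotPython's tabular-data examples and are not part of this image workflow:

# Hypothetical sketch (not executed in this tutorial): attach a preprocessing
# model under the assumed "prep_model" key, as in the tabular-data examples.
from sklearn.preprocessing import StandardScaler
prep_model = StandardScaler()
fun_control.update({"prep_model": prep_model})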

11.5 Step 5: Select Model (algorithm) and core_model_hyper_dict

spotPython implements a class that is similar to the class described in the PyTorch tutorial. The class is called Net_fashionMNIST and is implemented in the file netfashionMNIST.py. It is imported here.

from torch import nn
import spotPython.torch.netcore as netcore


class Net_fashionMNIST(netcore.Net_Core):
    def __init__(self, l1, l2, lr_mult, batch_size, epochs, k_folds, patience, optimizer, sgd_momentum):
        super(Net_fashionMNIST, self).__init__(
            lr_mult=lr_mult,
            batch_size=batch_size,
            epochs=epochs,
            k_folds=k_folds,
            patience=patience,
            optimizer=optimizer,
            sgd_momentum=sgd_momentum,
        )
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, l1),
            nn.ReLU(),
            nn.Linear(l1, l2),
            nn.ReLU(),
            nn.Linear(l2, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

This class inherits from the class Net_Core which is implemented in the file netcore.py, see Section 14.5.1.
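As a quick sanity check, the net can be instantiated with the constructor arguments shown above and fed a dummy batch; the concrete argument values in this sketch are arbitrary:

import torch
net = Net_fashionMNIST(l1=32, l2=16, lr_mult=1.0, batch_size=16, epochs=8,
                       k_folds=0, patience=2, optimizer="SGD", sgd_momentum=0.9)
x = torch.rand(4, 1, 28, 28)  # dummy batch: four 28x28 grayscale images
logits = net(x)               # flatten, then the linear_relu_stack
print(logits.shape)           # torch.Size([4, 10]): one logit per class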

from spotPython.data.torch_hyper_dict import TorchHyperDict
from spotPython.torch.netfashionMNIST import Net_fashionMNIST
from spotPython.hyperparameters.values import add_core_model_to_fun_control
fun_control = add_core_model_to_fun_control(core_model=Net_fashionMNIST,
                              fun_control=fun_control,
                              hyper_dict=TorchHyperDict,
                              filename=None)

11.5.1 The Search Space

11.5.2 Configuring the Search Space With spotPython

11.5.2.1 The hyper_dict Hyperparameters for the Selected Algorithm

spotPython uses JSON files for the specification of the hyperparameters, which were described in Section 14.5.5.

The corresponding entries for the core_model class are shown below.

fun_control['core_model_hyper_dict']
{'l1': {'type': 'int',
  'default': 5,
  'transform': 'transform_power_2_int',
  'lower': 2,
  'upper': 9},
 'l2': {'type': 'int',
  'default': 5,
  'transform': 'transform_power_2_int',
  'lower': 2,
  'upper': 9},
 'lr_mult': {'type': 'float',
  'default': 1.0,
  'transform': 'None',
  'lower': 0.1,
  'upper': 10.0},
 'batch_size': {'type': 'int',
  'default': 4,
  'transform': 'transform_power_2_int',
  'lower': 1,
  'upper': 4},
 'epochs': {'type': 'int',
  'default': 3,
  'transform': 'transform_power_2_int',
  'lower': 3,
  'upper': 4},
 'k_folds': {'type': 'int',
  'default': 1,
  'transform': 'None',
  'lower': 1,
  'upper': 1},
 'patience': {'type': 'int',
  'default': 5,
  'transform': 'None',
  'lower': 2,
  'upper': 10},
 'optimizer': {'levels': ['Adadelta',
   'Adagrad',
   'Adam',
   'AdamW',
   'SparseAdam',
   'Adamax',
   'ASGD',
   'NAdam',
   'RAdam',
   'RMSprop',
   'Rprop',
   'SGD'],
  'type': 'factor',
  'default': 'SGD',
  'transform': 'None',
  'core_model_parameter_type': 'str',
  'lower': 0,
  'upper': 12},
 'sgd_momentum': {'type': 'float',
  'default': 0.0,
  'transform': 'None',
  'lower': 0.0,
  'upper': 1.0}}
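The transform entries are applied before a value is passed to the model. For example, transform_power_2_int maps the tuned integer to a power of two, so the default l1 = 5 corresponds to a layer width of 2**5 = 32 units (compare the Linear(784, 32) layer of the default model in Section 11.10.3). A one-line sketch of this transform:

# transform_power_2_int returns 2 raised to the tuned integer value.
def transform_power_2_int(x: int) -> int:
    return 2 ** x

print(transform_power_2_int(5))  # 32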

11.6 Step 6: Modify hyper_dict Hyperparameters for the Selected Algorithm aka core_model

spotPython provides functions for modifying the hyperparameters, their bounds and factors as well as for activating and de-activating hyperparameters without re-compilation of the Python source code. These functions were described in Section 14.6.

11.6.1 Modify hyperparameter of type numeric and integer (boolean)

The hyperparameter k_folds is not used; it is de-activated here by setting the lower and upper bounds to the same value.

Caution: Small net size, number of epochs, and patience for demonstration purposes
  • Net sizes l1 and l2 as well as epochs and patience are set to small values for demonstration purposes. These values are too small for a real application.
  • More reasonable values are, e.g.:
    • fun_control = modify_hyper_parameter_bounds(fun_control, "l1", bounds=[2, 7])
    • fun_control = modify_hyper_parameter_bounds(fun_control, "epochs", bounds=[7, 9]) and
    • fun_control = modify_hyper_parameter_bounds(fun_control, "patience", bounds=[2, 7])
from spotPython.hyperparameters.values import modify_hyper_parameter_bounds
fun_control = modify_hyper_parameter_bounds(fun_control, "k_folds", bounds=[0, 0])
fun_control = modify_hyper_parameter_bounds(fun_control, "patience", bounds=[2, 2])
fun_control = modify_hyper_parameter_bounds(fun_control, "epochs", bounds=[2, 3])
fun_control = modify_hyper_parameter_bounds(fun_control, "l1", bounds=[2, 5])
fun_control = modify_hyper_parameter_bounds(fun_control, "l2", bounds=[2, 5])

11.6.2 Modify hyperparameter of type factor

from spotPython.hyperparameters.values import modify_hyper_parameter_levels
fun_control = modify_hyper_parameter_levels(fun_control, "optimizer",["Adam", "AdamW", "Adamax", "NAdam"])

11.6.3 Optimizers

Optimizers are described in Section 14.6.1. Here, the learning-rate multiplier lr_mult and the momentum sgd_momentum are fixed to constant values by setting identical lower and upper bounds:

fun_control = modify_hyper_parameter_bounds(fun_control,
    "lr_mult", bounds=[1e-3, 1e-3])
fun_control = modify_hyper_parameter_bounds(fun_control,
    "sgd_momentum", bounds=[0.9, 0.9])

11.7 Step 7: Selection of the Objective (Loss) Function

11.7.1 Evaluation

The evaluation procedure requires the specification of two elements:

  1. how the data is split into a train and a test set, and
  2. the loss function (and a metric).

These are described in Section 14.7.1.

The key "loss_function" specifies the loss function which is used during the optimization, see Section 14.7.5.

We will use CrossEntropyLoss for the multiclass classification task.

from torch.nn import CrossEntropyLoss
loss_function = CrossEntropyLoss()
fun_control.update({
        "loss_function": loss_function,
        "shuffle": True,
        "eval":  "train_hold_out"
        })
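As a reminder of the interface, CrossEntropyLoss expects raw logits (the net's output) and integer class labels; a minimal check:

import torch
logits = torch.randn(4, 10)           # raw outputs for 4 samples, 10 classes
targets = torch.tensor([0, 3, 9, 1])  # integer class labels
print(loss_function(logits, targets)) # a scalar loss tensor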

11.7.2 Metric

from torchmetrics import Accuracy
metric_torch = Accuracy(task="multiclass", num_classes=10).to(fun_control["device"])
fun_control.update({"metric_torch": metric_torch})

11.8 Step 8: Calling the SPOT Function

11.8.1 Preparing the SPOT Call

The following code passes the information about the parameter ranges and bounds to spot.

# extract the variable types, names, and bounds
from spotPython.hyperparameters.values import (get_bound_values,
    get_var_name,
    get_var_type,)
var_type = get_var_type(fun_control)
var_name = get_var_name(fun_control)
fun_control.update({"var_type": var_type,
                    "var_name": var_name})
lower = get_bound_values(fun_control, "lower")
upper = get_bound_values(fun_control, "upper")
from spotPython.utils.eda import gen_design_table
print(gen_design_table(fun_control))
| name         | type   | default   |   lower |   upper | transform             |
|--------------|--------|-----------|---------|---------|-----------------------|
| l1           | int    | 5         |   2     |   5     | transform_power_2_int |
| l2           | int    | 5         |   2     |   5     | transform_power_2_int |
| lr_mult      | float  | 1.0       |   0.001 |   0.001 | None                  |
| batch_size   | int    | 4         |   1     |   4     | transform_power_2_int |
| epochs       | int    | 3         |   2     |   3     | transform_power_2_int |
| k_folds      | int    | 1         |   0     |   0     | None                  |
| patience     | int    | 5         |   2     |   2     | None                  |
| optimizer    | factor | SGD       |   0     |   3     | None                  |
| sgd_momentum | float  | 0.0       |   0.9   |   0.9   | None                  |

11.8.2 The Objective Function fun_torch

The objective function fun_torch is selected next. It implements an interface from PyTorch’s training, validation, and testing methods to spotPython.

from spotPython.fun.hypertorch import HyperTorch
fun = HyperTorch().fun_torch

11.8.3 Starting the Hyperparameter Tuning

import numpy as np
from spotPython.spot import spot
from math import inf
from spotPython.hyperparameters.values import get_default_hyperparameters_as_array
# Use the default hyperparameter configuration as the starting point for the tuner.
X_start = get_default_hyperparameters_as_array(fun_control)
spot_tuner = spot.Spot(fun=fun,
                   lower = lower,
                   upper = upper,
                   fun_evals = inf,
                   fun_repeats = 1,
                   max_time = MAX_TIME,
                   noise = False,
                   tolerance_x = np.sqrt(np.spacing(1)),
                   var_type = var_type,
                   var_name = var_name,
                   infill_criterion = "y",
                   n_points = 1,
                   seed=123,
                   log_level = 50,
                   show_models= False,
                   show_progress= True,
                   fun_control = fun_control,
                   design_control={"init_size": INIT_SIZE,
                                   "repeats": 1},
                   surrogate_control={"noise": True,
                                      "cod_type": "norm",
                                      "min_theta": -4,
                                      "max_theta": 3,
                                      "n_theta": len(var_name),
                                      "model_fun_evals": 10_000,
                                      "log_level": 50
                                      })
spot_tuner.run(X_start=X_start)

config: {'l1': 16, 'l2': 8, 'lr_mult': 0.001, 'batch_size': 16, 'epochs': 8, 'k_folds': 0, 'patience': 2, 'optimizer': 'AdamW', 'sgd_momentum': 0.9}
Epoch: 1 | 
MulticlassAccuracy: 0.1698333323001862 | Loss: 2.3015185308456423 | Acc: 0.1698333333333333.
Epoch: 2 | 
MulticlassAccuracy: 0.1747083365917206 | Loss: 2.2783363691965737 | Acc: 0.1747083333333333.
Epoch: 3 | 
MulticlassAccuracy: 0.1868333369493484 | Loss: 2.2568184957504274 | Acc: 0.1868333333333333.
Epoch: 4 | 
MulticlassAccuracy: 0.1969583332538605 | Loss: 2.2362602044741311 | Acc: 0.1969583333333333.
Epoch: 5 | 
MulticlassAccuracy: 0.2029999941587448 | Loss: 2.2159162033398947 | Acc: 0.2030000000000000.
Epoch: 6 | 
MulticlassAccuracy: 0.2064583301544189 | Loss: 2.1955996053218843 | Acc: 0.2064583333333333.
Epoch: 7 | 
MulticlassAccuracy: 0.2097916603088379 | Loss: 2.1751886695226035 | Acc: 0.2097916666666667.
Epoch: 8 | 
MulticlassAccuracy: 0.2160416692495346 | Loss: 2.1544364747206370 | Acc: 0.2160416666666667.
Returned to Spot: Validation loss: 2.154436474720637

config: {'l1': 8, 'l2': 8, 'lr_mult': 0.001, 'batch_size': 8, 'epochs': 4, 'k_folds': 0, 'patience': 2, 'optimizer': 'Adamax', 'sgd_momentum': 0.9}
Epoch: 1 | 
MulticlassAccuracy: 0.1239166632294655 | Loss: 2.2921640187104542 | Acc: 0.1239166666666667.
Epoch: 2 | 
MulticlassAccuracy: 0.1832499951124191 | Loss: 2.2631607745488487 | Acc: 0.1832500000000000.
Epoch: 3 | 
MulticlassAccuracy: 0.2182500064373016 | Loss: 2.2324760752518973 | Acc: 0.2182500000000000.
Epoch: 4 | 
MulticlassAccuracy: 0.2369583398103714 | Loss: 2.2046033837000527 | Acc: 0.2369583333333333.
Returned to Spot: Validation loss: 2.2046033837000527

config: {'l1': 32, 'l2': 16, 'lr_mult': 0.001, 'batch_size': 2, 'epochs': 8, 'k_folds': 0, 'patience': 2, 'optimizer': 'NAdam', 'sgd_momentum': 0.9}
Epoch: 1 | 
MulticlassAccuracy: 0.3432916700839996 | Loss: 2.0968230939805506 | Acc: 0.3432916666666667.
Epoch: 2 | 
MulticlassAccuracy: 0.3843333423137665 | Loss: 1.8433081769992907 | Acc: 0.3843333333333334.
Epoch: 3 | 
MulticlassAccuracy: 0.5312500000000000 | Loss: 1.6072639011790355 | Acc: 0.5312500000000000.
Epoch: 4 | 
MulticlassAccuracy: 0.5931249856948853 | Loss: 1.4182748231415947 | Acc: 0.5931250000000000.
Epoch: 5 | 
MulticlassAccuracy: 0.6115000247955322 | Loss: 1.2715940673351287 | Acc: 0.6115000000000000.
Epoch: 6 | 
MulticlassAccuracy: 0.6235416531562805 | Loss: 1.1561005362973860 | Acc: 0.6235416666666667.
Epoch: 7 | 
MulticlassAccuracy: 0.6331666707992554 | Loss: 1.0645056977315495 | Acc: 0.6331666666666667.
Epoch: 8 | 
MulticlassAccuracy: 0.6439583301544189 | Loss: 0.9931334485535820 | Acc: 0.6439583333333333.
Returned to Spot: Validation loss: 0.993133448553582

config: {'l1': 4, 'l2': 8, 'lr_mult': 0.001, 'batch_size': 4, 'epochs': 4, 'k_folds': 0, 'patience': 2, 'optimizer': 'AdamW', 'sgd_momentum': 0.9}
Epoch: 1 | 
MulticlassAccuracy: 0.0975833311676979 | Loss: 2.3035618282357850 | Acc: 0.0975833333333333.
Epoch: 2 | 
MulticlassAccuracy: 0.0975833311676979 | Loss: 2.2900932596524557 | Acc: 0.0975833333333333.
Epoch: 3 | 
MulticlassAccuracy: 0.0975833311676979 | Loss: 2.2780392925341926 | Acc: 0.0975833333333333.
Epoch: 4 | 
MulticlassAccuracy: 0.0975833311676979 | Loss: 2.2662698140144348 | Acc: 0.0975833333333333.
Returned to Spot: Validation loss: 2.266269814014435

config: {'l1': 16, 'l2': 32, 'lr_mult': 0.001, 'batch_size': 8, 'epochs': 8, 'k_folds': 0, 'patience': 2, 'optimizer': 'Adam', 'sgd_momentum': 0.9}
Epoch: 1 | 
MulticlassAccuracy: 0.1799583286046982 | Loss: 2.2826722462972007 | Acc: 0.1799583333333333.
Epoch: 2 | 
MulticlassAccuracy: 0.2345833331346512 | Loss: 2.2556005299091337 | Acc: 0.2345833333333333.
Epoch: 3 | 
MulticlassAccuracy: 0.2597916722297668 | Loss: 2.2186744021574656 | Acc: 0.2597916666666666.
Epoch: 4 | 
MulticlassAccuracy: 0.2779166698455811 | Loss: 2.1737045572598777 | Acc: 0.2779166666666666.
Epoch: 5 | 
MulticlassAccuracy: 0.2864166796207428 | Loss: 2.1249015250603356 | Acc: 0.2864166666666667.
Epoch: 6 | 
MulticlassAccuracy: 0.2893750071525574 | Loss: 2.0724829261302946 | Acc: 0.2893750000000000.
Epoch: 7 | 
MulticlassAccuracy: 0.2922083437442780 | Loss: 2.0170972356796266 | Acc: 0.2922083333333333.
Epoch: 8 | 
MulticlassAccuracy: 0.2997083365917206 | Loss: 1.9597443953752518 | Acc: 0.2997083333333334.
Returned to Spot: Validation loss: 1.9597443953752518

config: {'l1': 8, 'l2': 16, 'lr_mult': 0.001, 'batch_size': 8, 'epochs': 8, 'k_folds': 0, 'patience': 2, 'optimizer': 'NAdam', 'sgd_momentum': 0.9}
Epoch: 1 | 
MulticlassAccuracy: 0.1788749992847443 | Loss: 2.2956553886731466 | Acc: 0.1788750000000000.
Epoch: 2 | 
MulticlassAccuracy: 0.2063333392143250 | Loss: 2.2728778444131215 | Acc: 0.2063333333333333.
Epoch: 3 | 
MulticlassAccuracy: 0.2121666669845581 | Loss: 2.2405936568578086 | Acc: 0.2121666666666667.
Epoch: 4 | 
MulticlassAccuracy: 0.2259583324193954 | Loss: 2.1906005802551904 | Acc: 0.2259583333333333.
Epoch: 5 | 
MulticlassAccuracy: 0.2531666755676270 | Loss: 2.1032378111282983 | Acc: 0.2531666666666667.
Epoch: 6 | 
MulticlassAccuracy: 0.3013750016689301 | Loss: 2.0174429922898609 | Acc: 0.3013750000000000.
Epoch: 7 | 
MulticlassAccuracy: 0.3688333332538605 | Loss: 1.9310318603118262 | Acc: 0.3688333333333333.
Epoch: 8 | 
MulticlassAccuracy: 0.4222916662693024 | Loss: 1.8459605353275934 | Acc: 0.4222916666666667.
Returned to Spot: Validation loss: 1.8459605353275934
spotPython tuning: 0.993133448553582 [##########] 100.00% Done...
<spotPython.spot.spot.Spot at 0x143296a70>

11.9 Step 9: Tensorboard

The textual output shown in the console (or code cell) can be visualized with Tensorboard as described in Section 14.9; see also the Tensorboard section of the spotPython documentation.
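Assuming TensorBoard is installed, it can be started from the command line and pointed at the log directory configured in Step 2:

tensorboard --logdir="runs/11_spot_hpt_torch_fashion_mnist"

The dashboard is then available in the browser, by default at http://localhost:6006.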

11.10 Step 10: Results

After the hyperparameter tuning run is finished, the results can be analyzed as described in Section 14.10.

import pickle

SAVE = False
LOAD = False

if SAVE:
    result_file_name = "res_" + experiment_name + ".pkl"
    with open(result_file_name, 'wb') as f:
        pickle.dump(spot_tuner, f)

if LOAD:
    result_file_name = "ADD THE NAME here, e.g.: res_ch10-friedman-hpt-0_maans03_60min_20init_1K_2023-04-14_10-11-19.pkl"
    with open(result_file_name, 'rb') as f:
        spot_tuner =  pickle.load(f)

After the hyperparameter tuning run is finished, the progress of the hyperparameter tuning can be visualized. The following code generates the progress plot shown below.

spot_tuner.plot_progress(log_y=False,
    filename="./figures/" + experiment_name+"_progress.png")

Progress plot. Black dots denote results from the initial design. Red dots illustrate the improvement found by the surrogate model based optimization.
The results can also be printed in tabular form:
print(gen_design_table(fun_control=fun_control,
    spot=spot_tuner))
| name         | type   | default   |   lower |   upper |   tuned | transform             |   importance | stars   |
|--------------|--------|-----------|---------|---------|---------|-----------------------|--------------|---------|
| l1           | int    | 5         |     2.0 |     5.0 |     5.0 | transform_power_2_int |         0.00 |         |
| l2           | int    | 5         |     2.0 |     5.0 |     4.0 | transform_power_2_int |        10.90 | *       |
| lr_mult      | float  | 1.0       |   0.001 |   0.001 |   0.001 | None                  |         0.00 |         |
| batch_size   | int    | 4         |     1.0 |     4.0 |     1.0 | transform_power_2_int |         0.45 | .       |
| epochs       | int    | 3         |     2.0 |     3.0 |     3.0 | transform_power_2_int |         0.00 |         |
| k_folds      | int    | 1         |     0.0 |     0.0 |     0.0 | None                  |         0.00 |         |
| patience     | int    | 5         |     2.0 |     2.0 |     2.0 | None                  |         0.00 |         |
| optimizer    | factor | SGD       |     0.0 |     3.0 |     3.0 | None                  |       100.00 | ***     |
| sgd_momentum | float  | 0.0       |     0.9 |     0.9 |     0.9 | None                  |         0.00 |         |

11.10.1 Show variable importance

spot_tuner.plot_importance(threshold=0.025, filename="./figures/" + experiment_name+"_importance.png")

Variable importance plot, threshold 0.025.

11.10.2 Get the Tuned Architecture (SPOT Results)

The architecture of the spotPython model can be obtained by the following code:

from spotPython.hyperparameters.values import get_one_core_model_from_X
X = spot_tuner.to_all_dim(spot_tuner.min_X.reshape(1,-1))
model_spot = get_one_core_model_from_X(X, fun_control)
model_spot
Net_fashionMNIST(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=32, bias=True)
    (1): ReLU()
    (2): Linear(in_features=32, out_features=16, bias=True)
    (3): ReLU()
    (4): Linear(in_features=16, out_features=10, bias=True)
  )
)

11.10.3 Get Default Hyperparameters

fc = fun_control
# Load the default hyper_dict (JSON) for the core model.
hyper_dict = TorchHyperDict().load()
fc.update({"core_model_hyper_dict":
    hyper_dict[fun_control["core_model"].__name__]})
model_default = get_one_core_model_from_X(X_start, fun_control=fc)
model_default
Net_fashionMNIST(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=32, bias=True)
    (1): ReLU()
    (2): Linear(in_features=32, out_features=32, bias=True)
    (3): ReLU()
    (4): Linear(in_features=32, out_features=10, bias=True)
  )
)

11.10.4 Evaluation of the Default and the Tuned Architectures

The method train_tuned takes a model architecture with untrained weights and trains this model on the train data. The train data is split into train and validation data; the validation data is used for early stopping. The trained model weights are saved as a dictionary.

from spotPython.torch.traintest import train_tuned
train_tuned(net=model_default, train_dataset=train, shuffle=True,
        loss_function=fun_control["loss_function"],
        metric=fun_control["metric_torch"],
        device = fun_control["device"],
        show_batch_interval=1_000_000,
        path=None,
        task=fun_control["task"])
Epoch: 1 | 
MulticlassAccuracy: 0.2833749949932098 | Loss: 2.0104992804527284 | Acc: 0.2833750000000000.
Epoch: 2 | 
MulticlassAccuracy: 0.4392499923706055 | Loss: 1.5686774803797403 | Acc: 0.4392500000000000.
Epoch: 3 | 
MulticlassAccuracy: 0.5816249847412109 | Loss: 1.2633160647551218 | Acc: 0.5816249999999999.
Epoch: 4 | 
MulticlassAccuracy: 0.6169166564941406 | Loss: 1.0911320121685664 | Acc: 0.6169166666666667.
Epoch: 5 | 
MulticlassAccuracy: 0.6437916755676270 | Loss: 0.9824318277041117 | Acc: 0.6437916666666667.
Epoch: 6 | 
MulticlassAccuracy: 0.6607499718666077 | Loss: 0.9147949484189352 | Acc: 0.6607499999999999.
Epoch: 7 | 
MulticlassAccuracy: 0.6700833439826965 | Loss: 0.8671281810998916 | Acc: 0.6700833333333334.
Epoch: 8 | 
MulticlassAccuracy: 0.6788333058357239 | Loss: 0.8325125283002853 | Acc: 0.6788333333333333.
Returned to Spot: Validation loss: 0.8325125283002853
from spotPython.torch.traintest import test_tuned
test_tuned(net=model_default, test_dataset=test, 
        loss_function=fun_control["loss_function"],
        metric=fun_control["metric_torch"],
        shuffle=False, 
        device = fun_control["device"],
        task=fun_control["task"])
MulticlassAccuracy: 0.6643999814987183 | Loss: 0.8509405560970307 | Acc: 0.6644000000000000.
Final evaluation: Validation loss: 0.8509405560970307
Final evaluation: Validation metric: 0.6643999814987183
----------------------------------------------
(0.8509405560970307, nan, tensor(0.6644))

The following code trains the model model_spot. If path is set to a filename, e.g., path = "model_spot_trained.pt", the weights of the trained model will be saved to this file.

train_tuned(net=model_spot, train_dataset=train,
        loss_function=fun_control["loss_function"],
        metric=fun_control["metric_torch"],
        shuffle=True,
        device = fun_control["device"],
        path=None,
        task=fun_control["task"])
Epoch: 1 | 
Batch: 10000. Batch Size: 2. Training Loss (running): 2.250
MulticlassAccuracy: 0.3386666774749756 | Loss: 2.0659151181181272 | Acc: 0.3386666666666667.
Epoch: 2 | 
Batch: 10000. Batch Size: 2. Training Loss (running): 1.999
MulticlassAccuracy: 0.4562083184719086 | Loss: 1.8194029325842858 | Acc: 0.4562083333333333.
Epoch: 3 | 
Batch: 10000. Batch Size: 2. Training Loss (running): 1.755
MulticlassAccuracy: 0.5444166660308838 | Loss: 1.5996205939998229 | Acc: 0.5444166666666667.
Epoch: 4 | 
Batch: 10000. Batch Size: 2. Training Loss (running): 1.548
MulticlassAccuracy: 0.5881666541099548 | Loss: 1.4159008273656171 | Acc: 0.5881666666666666.
Epoch: 5 | 
Batch: 10000. Batch Size: 2. Training Loss (running): 1.369
MulticlassAccuracy: 0.6184583306312561 | Loss: 1.2646532986313104 | Acc: 0.6184583333333333.
Epoch: 6 | 
Batch: 10000. Batch Size: 2. Training Loss (running): 1.235
MulticlassAccuracy: 0.6379583477973938 | Loss: 1.1568649689729016 | Acc: 0.6379583333333333.
Epoch: 7 | 
Batch: 10000. Batch Size: 2. Training Loss (running): 1.136
MulticlassAccuracy: 0.6455833315849304 | Loss: 1.0763771688994019 | Acc: 0.6455833333333333.
Epoch: 8 | 
Batch: 10000. Batch Size: 2. Training Loss (running): 1.062
MulticlassAccuracy: 0.6491666436195374 | Loss: 1.0089076419733465 | Acc: 0.6491666666666667.
Returned to Spot: Validation loss: 1.0089076419733465
test_tuned(net=model_spot, test_dataset=test,
            shuffle=False,
            loss_function=fun_control["loss_function"],
            metric=fun_control["metric_torch"],
            device = fun_control["device"],
            task=fun_control["task"])
MulticlassAccuracy: 0.6513000130653381 | Loss: 1.0097611629925669 | Acc: 0.6513000000000000.
Final evaluation: Validation loss: 1.009761162992567
Final evaluation: Validation metric: 0.6513000130653381
----------------------------------------------
(1.009761162992567, nan, tensor(0.6513))
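If path had been set when calling train_tuned, the saved weights could later be restored with standard PyTorch. A sketch, assuming the example filename mentioned above:

import torch
# Recreate the tuned architecture, then load the saved weights (state dict).
model_restored = get_one_core_model_from_X(X, fun_control)
model_restored.load_state_dict(torch.load("model_spot_trained.pt"))
model_restored.eval()  # switch to evaluation mode for inference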

11.10.5 Detailed Hyperparameter Plots

filename = "./figures/" + experiment_name
spot_tuner.plot_important_hyperparameter_contour(filename=filename)
l2:  10.899412013416766
batch_size:  0.4472521244478626
optimizer:  100.0

Contour plots.

11.10.6 Parallel Coordinates Plot

spot_tuner.parallel_plot()

Parallel coordinates plots

11.10.7 Plot all Combinations of Hyperparameters

  • Warning: this may take a while.
PLOT_ALL = False
if PLOT_ALL:
    n = spot_tuner.k
    # Derive the color-scale limits from the observed objective values.
    min_z = min(spot_tuner.y)
    max_z = max(spot_tuner.y)
    for i in range(n-1):
        for j in range(i+1, n):
            spot_tuner.plot_contour(i=i, j=j, min_z=min_z, max_z=max_z)