# import sys
# !{sys.executable} -m pip install --upgrade build
# !{sys.executable} -m pip install --upgrade --force-reinstall spotPython
12 HPT: PyTorch With spotPython and Ray Tune on CIFAR10
In this tutorial, we show how spotPython can be integrated into the PyTorch training workflow. It is based on the tutorial “Hyperparameter Tuning with Ray Tune” from the PyTorch documentation (PyTorch 2023a), which is an extension of the tutorial “Training a Classifier” (PyTorch 2023b) for training a CIFAR10 image classifier.
spotPython can be installed via pip.
!pip install spotPython
- To (re-)install the latest version of spotPython from GitHub, uncomment the commented pip commands at the top of this chapter.
Results that refer to the Ray Tune package are taken from https://PyTorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html.
12.1 Step 1: Setup
Before we consider the detailed experimental setup, we select the parameters that affect run time, initial design size, and the device that is used.
- MAX_TIME is set to one minute for demonstration purposes. For real experiments, this should be increased to at least 1 hour.
- INIT_SIZE is set to 5 for demonstration purposes. For real experiments, this should be increased to at least 10.
- The device can be selected by setting the variable DEVICE.
  - Since we are using a simple neural net, the setting "cpu" is preferred (on Mac).
  - If you have a GPU, you can use "cuda:0" instead.
  - If DEVICE is set to "auto" or None, spotPython will automatically select the device.
    - This might result in "mps" on Macs, which is not the best choice for simple neural nets.
MAX_TIME = 1
INIT_SIZE = 5
DEVICE = "auto" # "cpu"
PREFIX = "14-torch"
from spotPython.utils.device import getDevice
DEVICE = getDevice(DEVICE)
print(DEVICE)
mps
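The device selection rules listed above can be sketched in a few lines. The helper select_device below is hypothetical and is not spotPython's getDevice; the priority order (CUDA before MPS before CPU) and the boolean availability flags are assumptions made for illustration. In a PyTorch session, such flags would typically come from torch.cuda.is_available() and torch.backends.mps.is_available().

```python
from typing import Optional


def select_device(requested: Optional[str],
                  cuda_available: bool,
                  mps_available: bool) -> str:
    """Hypothetical helper: resolve a device request to a device string."""
    # An explicit request such as "cpu" or "cuda:0" is returned unchanged.
    if requested not in (None, "auto"):
        return requested
    # Assumed priority for automatic selection: CUDA, then MPS, then CPU.
    if cuda_available:
        return "cuda:0"
    if mps_available:
        return "mps"
    return "cpu"


# On a Mac without CUDA, automatic selection yields "mps".
print(select_device("auto", cuda_available=False, mps_available=True))
# An explicit "cpu" request overrides the automatic choice.
print(select_device("cpu", cuda_available=False, mps_available=True))
```

Passing "cpu" explicitly is the way to avoid "mps" for small networks, as recommended above.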
import warnings
warnings.filterwarnings("ignore")
12.2 Step 2: Initialization of the fun_control Dictionary
spotPython uses a Python dictionary for storing the information required for the hyperparameter tuning process. This dictionary is called fun_control and is initialized with the function fun_control_init. The function fun_control_init returns a skeleton dictionary, which is then filled with the required information for the hyperparameter tuning process. It stores the hyperparameter tuning settings, e.g., the deep learning network architecture that should be tuned, the classification (or regression) problem, and the data that is used for the tuning. The dictionary is used as an input for the SPOT function.
from spotPython.utils.init import fun_control_init
from spotPython.utils.file import get_experiment_name, get_spot_tensorboard_path
from spotPython.utils.device import getDevice
experiment_name = get_experiment_name(prefix=PREFIX)
fun_control = fun_control_init(
    task="classification",
    spot_tensorboard_path=get_spot_tensorboard_path(experiment_name),
    device=DEVICE,)
12.3 Step 3: PyTorch Data Loading
The data loading process is implemented in the same manner as described in the Section “Data loaders” in PyTorch (2023a). The data loaders are wrapped into the function load_data_cifar10, which is identical to the function load_data in PyTorch (2023a). A global data directory is used, which allows sharing the data directory between different trials. The function load_data_cifar10 is part of the spotPython package and can be imported from spotPython.data.torchdata.
In the following step, the test and train data are added to the dictionary fun_control.
from spotPython.data.torchdata import load_data_cifar10
train, test = load_data_cifar10()
n_samples = len(train)
# add the dataset to the fun_control
fun_control.update({"train": train,
    "test": test,
    "n_samples": n_samples})
Files already downloaded and verified
Files already downloaded and verified
12.4 Step 4: Specification of the Preprocessing Model
After the training and test data are specified and added to the fun_control dictionary, spotPython allows the specification of a data preprocessing pipeline, e.g., for the scaling of the data or for the one-hot encoding of categorical variables. The preprocessing model is called prep_model (“preparation” or pre-processing) and includes steps that are not subject to the hyperparameter tuning process. The preprocessing model is specified in the fun_control dictionary. The preprocessing model can be implemented as a sklearn pipeline. The following code shows a typical preprocessing pipeline:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
# one-hot encode the categorical columns, scale all remaining columns
categorical_columns = ["cities", "colors"]
one_hot_encoder = OneHotEncoder(handle_unknown="ignore",
                                sparse_output=False)
prep_model = ColumnTransformer(
    transformers=[
        ("categorical", one_hot_encoder, categorical_columns),
    ],
    remainder=StandardScaler(),
)
Because the Ray Tune (ray[tune]) hyperparameter tuning as described in PyTorch (2023a) does not use a preprocessing model, the preprocessing model is set to None here.
prep_model = None
fun_control.update({"prep_model": prep_model})
12.5 Step 5: Select Model (algorithm) and core_model_hyper_dict
The same neural network model as implemented in the section “Configurable neural network” of the PyTorch tutorial (PyTorch 2023a) is used here. We will show the implementation from PyTorch (2023a) in Section 12.5.0.1 first, before the extended implementation with spotPython is shown in Section 12.5.0.2.
12.5.0.1 Implementing a Configurable Neural Network With Ray Tune
We use the same hyperparameters that are implemented as configurable in the PyTorch tutorial. We specify the layer sizes, namely l1 and l2, of the fully connected layers:
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self, l1=120, l2=84):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # l1 and l2 are the configurable sizes of the fully connected layers
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
The learning rate, i.e., lr, of the optimizer is made configurable, too:
optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)
12.5.0.2 Implementing a Configurable Neural Network With spotPython
spotPython implements a class which is similar to the class described in the PyTorch tutorial. The class is called Net_CIFAR10 and is implemented in the file netcifar10.py.
from torch import nn
import torch.nn.functional as F
import spotPython.torch.netcore as netcore
class Net_CIFAR10(netcore.Net_Core):
    def __init__(self, l1, l2, lr_mult, batch_size, epochs, k_folds, patience,
                 optimizer, sgd_momentum):
        super(Net_CIFAR10, self).__init__(
            lr_mult=lr_mult,
            batch_size=batch_size,
            epochs=epochs,
            k_folds=k_folds,
            patience=patience,
            optimizer=optimizer,
            sgd_momentum=sgd_momentum,
        )
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
12.5.1 The Net_Core class
Net_CIFAR10 inherits from the class Net_Core, which is implemented in the file netcore.py. Net_Core implements the additional attributes that are common to all neural network models. These are hyperparameters that are not used by the core_model itself, e.g.:
- optimizer (optimizer),
- learning rate multiplier (lr_mult),
- batch size (batch_size),
- epochs (epochs),
- k_folds (k_folds), and
- early stopping criterion “patience” (patience).
Users can add further attributes to the class. The class Net_Core is shown below.
from torch import nn
class Net_Core(nn.Module):
    def __init__(self, lr_mult, batch_size, epochs, k_folds, patience,
                 optimizer, sgd_momentum):
        super(Net_Core, self).__init__()
        self.lr_mult = lr_mult
        self.batch_size = batch_size
        self.epochs = epochs
        self.k_folds = k_folds
        self.patience = patience
        self.optimizer = optimizer
        self.sgd_momentum = sgd_momentum
12.5.2 Comparison of the Approach Described in the PyTorch Tutorial With spotPython
Comparing the class Net from the PyTorch tutorial and the class Net_CIFAR10 from spotPython, we see that the class Net_CIFAR10 has additional attributes and does not inherit from nn.Module directly. It inherits from an additional class, Net_Core, that takes care of the attributes that are common to all neural network models, e.g., the learning rate multiplier lr_mult or the batch size batch_size.
spotPython’s core_model implements an instance of the Net_CIFAR10 class. In addition to the basic neural network model, the core_model can use these additional attributes. spotPython provides methods for handling these additional attributes so that the model remains compatible with the PyTorch classes. The method add_core_model_to_fun_control adds the hyperparameters and additional attributes to the fun_control dictionary. Its call is shown below.
from spotPython.torch.netcifar10 import Net_CIFAR10
from spotPython.data.torch_hyper_dict import TorchHyperDict
from spotPython.hyperparameters.values import add_core_model_to_fun_control
core_model = Net_CIFAR10
add_core_model_to_fun_control(core_model=core_model,
    fun_control=fun_control,
    hyper_dict=TorchHyperDict,
    filename=None)
12.5.3 The Search Space: Hyperparameters
In Section 12.5.4, we first describe how to configure the search space with ray[tune] (as shown in PyTorch (2023a)) and then, in Section 12.5.5, how to configure the search space with spotPython.
12.5.4 Configuring the Search Space With Ray Tune
Ray Tune’s search space can be configured as follows (PyTorch 2023a):
config = {
    "l1": tune.sample_from(lambda _: 2**np.random.randint(2, 9)),
    "l2": tune.sample_from(lambda _: 2**np.random.randint(2, 9)),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([2, 4, 8, 16])
}
The tune.sample_from() function enables the user to define sample methods to obtain hyperparameters. In this example, the l1 and l2 parameters should be powers of 2 between 4 and 256, so either 4, 8, 16, 32, 64, 128, or 256. The lr (learning rate) is sampled log-uniformly between 0.0001 and 0.1, as specified by tune.loguniform. Lastly, the batch size is a choice between 2, 4, 8, and 16.
At each trial, ray[tune] randomly samples a combination of parameters from these search spaces. It then trains a number of models in parallel and finds the best performing one among these. ray[tune] uses the ASHAScheduler, which terminates poorly performing trials early.
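The sampling rules of this search space can be reproduced with the Python standard library alone. The sketch below does not use Ray Tune; the function name sample_config and the seed are assumptions for illustration. Note that random.randint includes its upper bound, whereas np.random.randint(2, 9) excludes it, so the exponent range 2 to 8 is written explicitly.

```python
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration from a search space like the one above."""
    return {
        # Powers of two between 2**2 = 4 and 2**8 = 256.
        "l1": 2 ** rng.randint(2, 8),
        "l2": 2 ** rng.randint(2, 8),
        # Log-uniform: the exponent is uniform, not the learning rate itself.
        "lr": 10 ** rng.uniform(-4, -1),
        # One of four batch sizes with equal probability.
        "batch_size": rng.choice([2, 4, 8, 16]),
    }


rng = random.Random(0)  # fixed seed so that the draws are repeatable
configs = [sample_config(rng) for _ in range(5)]
for config in configs:
    print(config)
```

Sampling the exponent uniformly spreads the draws evenly over the decades between 1e-4 and 1e-1, which is why a log-uniform distribution is the usual choice for learning rates.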
12.5.5 Configuring the Search Space With spotPython
12.5.5.1 The hyper_dict Hyperparameters for the Selected Algorithm
spotPython uses JSON files for the specification of the hyperparameters. Users can specify their individual JSON files, or they can use the JSON files provided by spotPython. The JSON file for the core_model is called torch_hyper_dict.json.
In contrast to ray[tune], spotPython can handle numerical, boolean, and categorical hyperparameters. Categorical hyperparameters (factors) can be specified in the JSON file in a similar way as the numerical hyperparameters, as shown below. Each entry in the JSON file represents one hyperparameter with the following structure: type, default, transform, lower, and upper. Factors additionally list their levels.
"factor_hyperparameter": {
    "levels": ["A", "B", "C"],
    "type": "factor",
    "default": "B",
    "transform": "None",
    "core_model_parameter_type": "str",
    "lower": 0,
    "upper": 2},
The corresponding entries for the core_model class are shown below.
fun_control['core_model_hyper_dict']
{'l1': {'type': 'int',
'default': 5,
'transform': 'transform_power_2_int',
'lower': 2,
'upper': 9},
'l2': {'type': 'int',
'default': 5,
'transform': 'transform_power_2_int',
'lower': 2,
'upper': 9},
'lr_mult': {'type': 'float',
'default': 1.0,
'transform': 'None',
'lower': 0.1,
'upper': 10.0},
'batch_size': {'type': 'int',
'default': 4,
'transform': 'transform_power_2_int',
'lower': 1,
'upper': 4},
'epochs': {'type': 'int',
'default': 3,
'transform': 'transform_power_2_int',
'lower': 3,
'upper': 4},
'k_folds': {'type': 'int',
'default': 1,
'transform': 'None',
'lower': 1,
'upper': 1},
'patience': {'type': 'int',
'default': 5,
'transform': 'None',
'lower': 2,
'upper': 10},
'optimizer': {'levels': ['Adadelta',
'Adagrad',
'Adam',
'AdamW',
'SparseAdam',
'Adamax',
'ASGD',
'NAdam',
'RAdam',
'RMSprop',
'Rprop',
'SGD'],
'type': 'factor',
'default': 'SGD',
'transform': 'None',
'class_name': 'torch.optim',
'core_model_parameter_type': 'str',
'lower': 0,
'upper': 12},
'sgd_momentum': {'type': 'float',
'default': 0.0,
'transform': 'None',
'lower': 0.0,
'upper': 1.0}}
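Factor entries store their levels as a list together with integer bounds. A minimal sketch of how such an entry could be encoded and decoded is given below, using the factor_hyperparameter example from above. It assumes that integer codes are positions in the levels list and that lower and upper are inclusive index bounds; the helper names encode_factor and decode_factor are hypothetical and not part of spotPython.

```python
import json

# The factor example from the text, wrapped in braces to form valid JSON.
entry_json = """
{"factor_hyperparameter": {
    "levels": ["A", "B", "C"],
    "type": "factor",
    "default": "B",
    "transform": "None",
    "core_model_parameter_type": "str",
    "lower": 0,
    "upper": 2}}
"""
entry = json.loads(entry_json)["factor_hyperparameter"]


def encode_factor(entry: dict, level: str) -> int:
    """Map a level name to its integer code (assumed: list position)."""
    return entry["levels"].index(level)


def decode_factor(entry: dict, code: int) -> str:
    """Map an integer code back to the level name, checking the bounds."""
    if not entry["lower"] <= code <= entry["upper"]:
        raise ValueError(f"code {code} is outside the bounds of the factor")
    return entry["levels"][code]


default_code = encode_factor(entry, entry["default"])
print(default_code, decode_factor(entry, default_code))
```

With this encoding, a tuner can treat a factor as an integer variable between lower and upper and translate the result back to a level name before the model is built.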
12.6 Step 6: Modify hyper_dict Hyperparameters for the Selected Algorithm aka core_model
Ray Tune (PyTorch 2023a) does not provide a way to change the specified hyperparameters without re-compilation. However, spotPython provides functions for modifying the hyperparameters, their bounds and factors as well as for activating and de-activating hyperparameters without re-compilation of the Python source code. These functions are described in the following.
12.6.0.1 Modify hyper_dict Hyperparameters for the Selected Algorithm aka core_model
After specifying the model, the corresponding hyperparameters, their types and bounds are loaded from the JSON file torch_hyper_dict.json. After loading, the user can modify the hyperparameters, e.g., the bounds. spotPython provides a simple rule for de-activating hyperparameters: If the lower and the upper bound are set to identical values, the hyperparameter is de-activated. This is useful for the hyperparameter tuning, because a hyperparameter can be specified in the JSON file and still be de-activated in the fun_control dictionary. This is done in the next step.
12.6.0.2 Modify Hyperparameters of Type numeric and integer (boolean)
Since the hyperparameter k_folds is not used in the PyTorch tutorial, it is de-activated here by setting the lower and upper bound to the same value. Note that k_folds is of type “integer”. In the same way, patience is fixed to 3, whereas the upper bound of batch_size is raised to 5.
from spotPython.hyperparameters.values import modify_hyper_parameter_bounds
modify_hyper_parameter_bounds(fun_control, "batch_size", bounds=[1, 5])
modify_hyper_parameter_bounds(fun_control, "k_folds", bounds=[0, 0])
modify_hyper_parameter_bounds(fun_control, "patience", bounds=[3, 3])
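The de-activation rule (identical lower and upper bounds) can be sketched as follows. The dictionary below is a hand-written stand-in that contains only the bounds set above, and split_active is a hypothetical helper, not a spotPython function.

```python
# Stand-in for the bounds stored in fun_control after the calls above.
bounds = {
    "batch_size": {"lower": 1, "upper": 5},
    "k_folds": {"lower": 0, "upper": 0},
    "patience": {"lower": 3, "upper": 3},
}


def split_active(bounds: dict) -> tuple:
    """Separate tunable hyperparameters from de-activated (fixed) ones."""
    # A hyperparameter is tunable only if its bounds differ.
    active = [name for name, b in bounds.items() if b["lower"] != b["upper"]]
    # Identical bounds fix the hyperparameter at that single value.
    fixed = {name: b["lower"] for name, b in bounds.items()
             if b["lower"] == b["upper"]}
    return active, fixed


active, fixed = split_active(bounds)
print("active:", active)
print("fixed:", fixed)
```

Only batch_size remains in the search space here; k_folds and patience keep their single admissible value.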
12.6.0.3 Modify Hyperparameter of Type factor
In a similar manner as for the numerical hyperparameters, the categorical hyperparameters can be modified. New configurations can be chosen by adding or deleting levels. For example, the hyperparameter optimizer can be re-configured as follows:
In the following setting, two optimizers ("SGD" and "Adam") will be compared during the spotPython hyperparameter tuning. The hyperparameter optimizer is active.
from spotPython.hyperparameters.values import modify_hyper_parameter_levels
modify_hyper_parameter_levels(fun_control,"optimizer", ["SGD", "Adam"])
The hyperparameter optimizer can be de-activated by choosing only one value (level), here: "SGD".
modify_hyper_parameter_levels(fun_control, "optimizer", ["SGD"])
As discussed in Section 12.6.1, there are some issues with the LBFGS optimizer. Therefore, the LBFGS optimizer is deactivated in spotPython by default. However, it can be activated by adding it to the list of optimizers. Rprop was removed, because it performs very poorly (as some pre-tests have shown). It can also be activated by adding it to the list of optimizers. Since SparseAdam does not support dense gradients, Adam is used instead. Therefore, there are 10 default optimizers:
modify_hyper_parameter_levels(fun_control, "optimizer",
    ["Adadelta", "Adagrad", "Adam", "AdamW", "Adamax", "ASGD",
     "NAdam", "RAdam", "RMSprop", "SGD"])
12.6.1 Optimizers
Table 12.1 shows some of the optimizers available in PyTorch:
\(a\) denotes (0.9,0.999), \(b\) (0.5,1.2), and \(c\) (1e-6, 50), respectively. \(R\) denotes required, but unspecified. “m” denotes momentum, “w_d” weight_decay, “d” dampening, “n” nesterov, “r” rho, “l_s” learning rate for scaling delta, “l_d” lr_decay, “b” betas, “l” lambd, “a” alpha, “m_d” momentum_decay, “e” etas, and “s_s” step_sizes.
Optimizer | lr | m | w_d | d | n | r | l_s | l_d | b | l | a | m_d | e | s_s |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Adadelta | - | - | 0. | - | - | 0.9 | 1. | - | - | - | - | - | - | - |
Adagrad | 1e-2 | - | 0. | - | - | - | - | 0. | - | - | - | - | - | - |
Adam | 1e-3 | - | 0. | - | - | - | - | - | \(a\) | - | - | - | - | - |
AdamW | 1e-3 | - | 1e-2 | - | - | - | - | - | \(a\) | - | - | - | - | - |
SparseAdam | 1e-3 | - | - | - | - | - | - | - | \(a\) | - | - | - | - | - |
Adamax | 2e-3 | - | 0. | - | - | - | - | - | \(a\) | - | - | - | - | - |
ASGD | 1e-2 | .9 | 0. | - | F | - | - | - | - | 1e-4 | .75 | - | - | - |
LBFGS | 1. | - | - | - | - | - | - | - | - | - | - | - | - | - |
NAdam | 2e-3 | - | 0. | - | - | - | - | - | \(a\) | - | - | 0 | - | - |
RAdam | 1e-3 | - | 0. | - | - | - | - | - | \(a\) | - | - | - | - | - |
RMSprop | 1e-2 | 0. | 0. | - | - | - | - | - | \(a\) | - | - | - | - | - |
Rprop | 1e-2 | - | - | - | - | - | - | - | - | - | \(b\) | \(c\) | - | - |
SGD | \(R\) | 0. | 0. | 0. | F | - | - | - | - | - | - | - | - | - |
spotPython implements an optimization handler that maps the optimizer names to the corresponding PyTorch optimizers.
We recommend deactivating PyTorch’s LBFGS optimizer, because it does not perform very well. The PyTorch documentation, see https://pytorch.org/docs/stable/generated/torch.optim.LBFGS.html#torch.optim.LBFGS, states:
This is a very memory intensive optimizer (it requires additional param_bytes * (history_size + 1) bytes). If it doesn’t fit in memory try reducing the history size, or use a different algorithm.
Furthermore, the LBFGS optimizer is not compatible with the PyTorch tutorial. The reason is that the LBFGS optimizer requires the closure function, which is not implemented in the PyTorch tutorial. Therefore, the LBFGS optimizer is not recommended here. Since there are ten optimizers in the portfolio, tuning hyperparameters that affect only a single optimizer is not recommended.
spotPython provides a multiplier for the default learning rates, lr_mult, because optimizers use different learning rates. Using a multiplier for the learning rates might enable a simultaneous tuning of the learning rates for all optimizers. However, this is not recommended, because the learning rates are not comparable across optimizers. Therefore, we recommend fixing the learning rate for all optimizers if multiple optimizers are used. This can be done by setting the lower and upper bounds of the learning rate multiplier to the same value as shown below.
Thus, the learning rate, which affects the SGD optimizer, will be set to a fixed value. We choose the default value of 1e-3 for the learning rate, because it is used in other PyTorch examples (it is also the default value used by spotPython as defined in the optimizer_handler() method). We recommend tuning the learning rate later, when a reduced set of optimizers is fixed. Here, we will demonstrate how to select, in a screening phase, the optimizers that should be used for the hyperparameter tuning.
For the same reason, we will fix the sgd_momentum to 0.9.
modify_hyper_parameter_bounds(fun_control,"lr_mult", bounds=[1.0, 1.0])
modify_hyper_parameter_bounds(fun_control,"sgd_momentum", bounds=[0.9, 0.9])
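To make the role of lr_mult concrete, the following sketch scales per-optimizer default learning rates by the multiplier. The defaults are copied from Table 12.1, with 1e-3 for SGD as chosen above. The dictionary DEFAULT_LR and the function effective_lr are illustrative names, and the rule that the handler simply multiplies the default by lr_mult is an assumption of this sketch, not a description of the optimizer_handler() code.

```python
# Default learning rates as listed in Table 12.1 (SGD: 1e-3 as chosen above).
DEFAULT_LR = {
    "Adagrad": 1e-2,
    "Adam": 1e-3,
    "AdamW": 1e-3,
    "Adamax": 2e-3,
    "ASGD": 1e-2,
    "NAdam": 2e-3,
    "RAdam": 1e-3,
    "RMSprop": 1e-2,
    "SGD": 1e-3,
}


def effective_lr(optimizer: str, lr_mult: float) -> float:
    """Assumed rule: learning rate = multiplier * optimizer default."""
    return lr_mult * DEFAULT_LR[optimizer]


# With lr_mult fixed at 1.0, every optimizer runs with its own default.
for name in ("SGD", "Adam", "Adamax", "RMSprop"):
    print(name, effective_lr(name, lr_mult=1.0))
```

Under this assumption, the same multiplier yields different absolute learning rates for different optimizers, which illustrates why a tuned lr_mult is hard to compare across optimizers.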
12.7 Step 7: Selection of the Objective (Loss) Function
12.7.1 Evaluation: Data Splitting
The evaluation procedure requires the specification of how the data is split into a train and a test set, as well as of the loss function (and a metric). As a default, spotPython provides a standard hold-out data split and cross validation.
12.7.2 Hold-out Data Split
If a hold-out data split is used, the data will be partitioned into a training, a validation, and a test data set. The split depends on the setting of the eval parameter. If eval is set to train_hold_out, one data set, usually the original training data set, is split into a new training and a validation data set. The training data set is used for training the model. The validation data set is used for the evaluation of the hyperparameter configuration and early stopping to prevent overfitting. In this case, the original test data set is not used.
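Patience-based early stopping on the validation data can be sketched as follows. This is a minimal illustration under the assumption that training stops once the validation loss has failed to improve for patience consecutive epochs; the function name early_stopping_epoch is hypothetical, and spotPython's implementation may differ in detail. The loss values are rounded from the first run in the tuning log shown later in this chapter, which used a patience of 3.

```python
from typing import Optional, Sequence


def early_stopping_epoch(val_losses: Sequence[float],
                         patience: int) -> Optional[int]:
    """Return the zero-based epoch at which training stops, or None."""
    best = float("inf")
    counter = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            counter = 0
        else:
            counter += 1
            if counter >= patience:
                return epoch
    return None  # all epochs were used


# Rounded validation losses of the first configuration in the log.
losses = [1.6404, 1.4817, 1.3768, 1.3446, 1.2768, 1.2234,
          1.2325, 1.1611, 1.1370, 1.1378, 1.1593, 1.1988]
print(early_stopping_epoch(losses, patience=3))
```

For these values the sketch stops at the zero-based epoch index 11, i.e., after the twelfth epoch, which agrees with the message "Early stopping at epoch 11" in that log.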
spotPython returns the hyperparameters of the machine learning and deep learning models, e.g., number of layers, learning rate, or optimizer, but not the model weights. Therefore, after the SPOT run is finished, the corresponding model with the optimized architecture has to be trained again with the best hyperparameter configuration. The training is performed on the training data set. The test data set is used for the final evaluation of the model.
Summarizing, the following steps are performed in the hold-out setting:
- Run spotPython with eval set to train_hold_out to determine the best hyperparameter configuration.
- Train the model with the best hyperparameter configuration (“architecture”) on the training data set: train_tuned(model_spot, train, "model_spot.pt").
- Test the model on the test data: test_tuned(model_spot, test, "model_spot.pt").
These steps will be exemplified in the following sections.
In addition to this hold-out setting, spotPython provides another hold-out setting, where explicit test data is specified by the user and used as the validation set. To choose this option, the eval parameter is set to test_hold_out. In this case, the training data set is used for the model training. Then, the explicitly defined test data set is used for the evaluation of the hyperparameter configuration (the validation).
12.7.3 Cross-Validation
The cross validation setting is used by setting the eval parameter to train_cv or test_cv. In both cases, the data set is split into \(k\) folds. The model is trained on \(k-1\) folds and evaluated on the remaining fold. This is repeated \(k\) times, so that each fold is used exactly once for evaluation. The final evaluation is performed on the test data set. The cross validation setting is useful for small data sets, because all data can be used for training and evaluation. However, it is computationally expensive, because the model has to be trained \(k\) times.
Combinations of the above settings are possible, e.g., cross validation can be used for training and hold-out for evaluation or vice versa. Also, cross validation can be used for training and testing. Because cross validation is not used in the PyTorch tutorial (PyTorch 2023a), it is not considered further here.
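Although cross validation is not used further, the fold construction itself is easy to sketch. The function k_fold_indices below is a hypothetical, standard-library stand-in for a k-fold splitter without shuffling; it is not the code used by spotPython.

```python
def k_fold_indices(n_samples: int, k: int) -> list:
    """Return k (train_indices, validation_indices) pairs."""
    # The first n_samples % k folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0)
             for i in range(k)]
    indices = list(range(n_samples))
    folds = []
    start = 0
    for size in sizes:
        validation = indices[start:start + size]
        # Everything outside the validation fold is used for training.
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds


folds = k_fold_indices(n_samples=10, k=3)
for train_idx, val_idx in folds:
    print("train:", train_idx, "validation:", val_idx)
```

Each sample index appears in exactly one validation fold, so the model is trained k times and every sample is used once for evaluation.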
12.7.4 Overview of the Evaluation Settings
12.7.4.1 Settings for the Hyperparameter Tuning
An overview of the training evaluations is shown in Table 12.2. "train_cv" and "test_cv" use sklearn.model_selection.KFold() internally. More details on the data splitting are provided in Section 18.14 (in the Appendix).
eval | train | test | function | comment |
---|---|---|---|---|
"train_hold_out" | \(\checkmark\) | | train_one_epoch(), validate_one_epoch() for early stopping | splits the train data set internally |
"test_hold_out" | \(\checkmark\) | \(\checkmark\) | train_one_epoch(), validate_one_epoch() for early stopping | uses the test data set for validate_one_epoch() |
"train_cv" | \(\checkmark\) | | evaluate_cv(net, train) | CV using the train data set |
"test_cv" | | \(\checkmark\) | evaluate_cv(net, test) | CV using the test data set. Identical to "train_cv", but uses only test data. |
12.7.4.2 Settings for the Final Evaluation of the Tuned Architecture
12.7.4.2.1 Training of the Tuned Architecture
train_tuned(model, train): train the model with the best hyperparameter configuration (or simply the default) on the training data set. It splits the train data into new train and validation sets using create_train_val_data_loaders(), which calls torch.utils.data.random_split() internally. Currently, 60% of the data is used for training and 40% for validation. The train data is used for training the model with train_hold_out(). The validation data is used for early stopping using validate_fold_or_hold_out() on the validation data set.
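The 60/40 split can be illustrated with a small index-based sketch. It uses Python's random module as a stand-in for torch.utils.data.random_split(); the function name split_indices and the seed are assumptions for illustration. The CIFAR10 training set contains 50,000 images, so the split yields 30,000 training and 20,000 validation samples.

```python
import random


def split_indices(n_samples: int, train_fraction: float = 0.6,
                  seed: int = 0) -> tuple:
    """Randomly partition sample indices into train and validation parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for repeatability
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_indices(50_000)
print(len(train_idx), len(val_idx))  # 30000 20000
```

The two index lists are disjoint and together cover the whole data set, so no sample is used for both training and early stopping.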
12.7.4.2.2 Testing of the Tuned Architecture
test_tuned(model, test): test the model on the test data set. No data splitting is performed. The (trained) model is evaluated using the validate_fold_or_hold_out() function. Note: During training, "shuffle" is set to True, whereas during testing, "shuffle" is set to False.
Section 18.14.1.4 describes the final evaluation of the tuned architecture.
fun_control.update({"eval": "train_hold_out",
"path": "torch_model.pt",
"shuffle": True})
12.7.5 Evaluation: Loss Functions and Metrics
The key "loss_function" specifies the loss function which is used during the optimization. There are several different loss functions under PyTorch’s nn package. For example, a simple loss is MSELoss, which computes the mean-squared error between the output and the target. In this tutorial we will use CrossEntropyLoss, because it is also used in the PyTorch tutorial.
from torch.nn import CrossEntropyLoss
loss_function = CrossEntropyLoss()
fun_control.update({"loss_function": loss_function})
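For a single sample, the cross-entropy loss is the negative logarithm of the softmax probability assigned to the true class. The following standard-library sketch computes this value; the function name cross_entropy is illustrative, and the sketch is a worked example of the formula, not a replacement for CrossEntropyLoss.

```python
import math


def cross_entropy(logits: list, target: int) -> float:
    """Negative log-softmax of the target class for one sample."""
    # Subtracting the maximum keeps the exponentials numerically stable.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Identical logits for all ten classes: the model is guessing.
uniform_loss = cross_entropy([0.0] * 10, target=3)
print(round(uniform_loss, 4))  # 2.3026

# A confident and correct prediction gives a loss close to zero.
confident_loss = cross_entropy([10.0] + [0.0] * 9, target=0)
print(confident_loss < 0.01)  # True
```

A network that outputs identical logits for all ten CIFAR10 classes has a loss of ln(10), approximately 2.3026. This is a useful reference value when reading training logs: a loss that stays near 2.30 indicates that the model has not learned anything beyond guessing.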
In addition to the loss functions, spotPython provides access to a large number of metrics.
- The key "metric_sklearn" is used for metrics that follow the scikit-learn conventions.
- The key "river_metric" is used for the river based evaluation (Montiel et al. 2021) via eval_oml_iter_progressive, and
- the key "metric_torch" is used for the metrics from TorchMetrics.
TorchMetrics is a collection of more than 90 PyTorch metrics, see https://torchmetrics.readthedocs.io/en/latest/. Because the PyTorch tutorial uses the accuracy as metric, we use the same metric here. Currently, accuracy is computed in the tutorial’s example code. We will use TorchMetrics instead, because it offers more flexibility, e.g., it can be used for regression and classification. Furthermore, TorchMetrics offers the following advantages:
* A standardized interface to increase reproducibility
* Reduces Boilerplate
* Distributed-training compatible
* Rigorously tested
* Automatic accumulation over batches
* Automatic synchronization between multiple devices
Therefore, we set
import torchmetrics
metric_torch = torchmetrics.Accuracy(task="multiclass", num_classes=10).to(fun_control["device"])
fun_control.update({"metric_torch": metric_torch})
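The "automatic accumulation over batches" listed above can be pictured with a small stand-in class. RunningAccuracy is a hypothetical, pure-Python illustration of the update/compute pattern; it is not the TorchMetrics implementation and works on plain lists of class labels.

```python
class RunningAccuracy:
    """Accumulate classification accuracy over several batches."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def update(self, predictions: list, targets: list) -> None:
        # Count matching labels in this batch and add them to the totals.
        self.correct += sum(int(p == t) for p, t in zip(predictions, targets))
        self.total += len(targets)

    def compute(self) -> float:
        # Accuracy over all batches seen so far.
        return self.correct / self.total if self.total else 0.0


metric = RunningAccuracy()
metric.update([0, 1, 2, 3], [0, 1, 1, 3])  # 3 of 4 correct
metric.update([4, 5], [4, 4])              # 1 of 2 correct
print(metric.compute())                    # 4 of 6 correct overall
```

Keeping counts instead of averaging per-batch accuracies gives the correct overall value even when batches differ in size.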
12.8 Step 8: Calling the SPOT Function
12.8.1 Preparing the SPOT Call
The following code passes the information about the parameter ranges and bounds to spot.
from spotPython.hyperparameters.values import (
    get_var_type,
    get_var_name,
    get_bound_values
    )
var_type = get_var_type(fun_control)
var_name = get_var_name(fun_control)
lower = get_bound_values(fun_control, "lower")
upper = get_bound_values(fun_control, "upper")
Now, the dictionary fun_control contains all information needed for the hyperparameter tuning. Before the hyperparameter tuning is started, it is recommended to take a look at the experimental design. The method gen_design_table generates a design table as follows:
from spotPython.utils.eda import gen_design_table
print(gen_design_table(fun_control))
| name | type | default | lower | upper | transform |
|--------------|--------|-----------|---------|---------|-----------------------|
| l1 | int | 5 | 2 | 9 | transform_power_2_int |
| l2 | int | 5 | 2 | 9 | transform_power_2_int |
| lr_mult | float | 1.0 | 1 | 1 | None |
| batch_size | int | 4 | 1 | 5 | transform_power_2_int |
| epochs | int | 3 | 3 | 4 | transform_power_2_int |
| k_folds | int | 1 | 0 | 0 | None |
| patience | int | 5 | 3 | 3 | None |
| optimizer | factor | SGD | 0 | 9 | None |
| sgd_momentum | float | 0.0 | 0.9 | 0.9 | None |
This allows checking whether all information is available and correct. The table above shows the experimental design for the hyperparameter tuning. It lists the hyperparameters, their types, default values, lower and upper bounds, and the transformation function. The transformation function is used to transform the hyperparameter values from the unit hypercube to the original domain. The transformation function is applied to the hyperparameter values before the evaluation of the objective function. Hyperparameter transformations are shown in the column “transform”, e.g., the l1 default is 5, which results in the value \(2^5 = 32\) for the network, because the transformation transform_power_2_int was selected in the JSON file. The default value of the batch_size is set to 4, which results in a batch size of \(2^4 = 16\).
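The effect of the power-of-two transformation on the default values can be checked with a short sketch. The function power_2_int is a stand-in written for this example under the assumption that the transformation maps an integer x to 2 to the power of x; the default values are taken from the design table.

```python
def power_2_int(x: int) -> int:
    """Stand-in for a power-of-two transformation: x -> 2**x."""
    return int(2 ** x)


# Defaults of the hyperparameters that use the power-of-two transformation.
defaults = {"l1": 5, "l2": 5, "batch_size": 4, "epochs": 3}
transformed = {name: power_2_int(value) for name, value in defaults.items()}
print(transformed)
# {'l1': 32, 'l2': 32, 'batch_size': 16, 'epochs': 8}
```

Tuning the exponent instead of the value itself restricts the search to powers of two, so the bounds 2 and 9 for l1 correspond to layer sizes between 4 and 512.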
12.8.2 The Objective Function fun_torch
The objective function fun_torch is selected next. It implements an interface from PyTorch’s training, validation, and testing methods to spotPython.
from spotPython.fun.hypertorch import HyperTorch
fun = HyperTorch().fun_torch
12.8.3 Using Default Hyperparameters or Results from Previous Runs
We add the default setting to the initial design:
from spotPython.hyperparameters.values import get_default_hyperparameters_as_array
X_start = get_default_hyperparameters_as_array(fun_control)
12.8.4 Starting the Hyperparameter Tuning
The spotPython hyperparameter tuning is started by calling the Spot function. The run time budget is specified by max_time, which is set to MAX_TIME here (one minute for demonstration purposes). Note: the initial design is always evaluated in the spotPython run. As a consequence, the run may take longer than specified by max_time, because the evaluation of the initial design (here: init_size is set to INIT_SIZE, i.e., 5 points) is performed independently of max_time. During the run, results from the training are shown. These results can be visualized with Tensorboard as will be shown in Section 12.9.
from spotPython.spot import spot
from math import inf
import numpy as np
spot_tuner = spot.Spot(fun=fun,
                   lower = lower,
                   upper = upper,
                   fun_evals = inf,
                   max_time = MAX_TIME,
                   tolerance_x = np.sqrt(np.spacing(1)),
                   var_type = var_type,
                   var_name = var_name,
                   show_progress= True,
                   fun_control = fun_control,
                   design_control={"init_size": INIT_SIZE},
                   surrogate_control={"noise": True,
                                      "cod_type": "norm",
                                      "min_theta": -4,
                                      "max_theta": 3,
                                      "n_theta": len(var_name),
                                      "model_fun_evals": 10_000
                                      })
spot_tuner.run(X_start=X_start)
config: {'l1': 128, 'l2': 8, 'lr_mult': 1.0, 'batch_size': 32, 'epochs': 16, 'k_folds': 0, 'patience': 3, 'optimizer': 'AdamW', 'sgd_momentum': 0.9}
Epoch: 1 |
MulticlassAccuracy: 0.3889499902725220 | Loss: 1.6403590366363525 | Acc: 0.3889500000000000.
Epoch: 2 |
MulticlassAccuracy: 0.4578999876976013 | Loss: 1.4816969134330749 | Acc: 0.4579000000000000.
Epoch: 3 |
MulticlassAccuracy: 0.4945999979972839 | Loss: 1.3767625138282775 | Acc: 0.4946000000000000.
Epoch: 4 |
MulticlassAccuracy: 0.5118499994277954 | Loss: 1.3446329971313478 | Acc: 0.5118500000000000.
Epoch: 5 |
MulticlassAccuracy: 0.5447499752044678 | Loss: 1.2767737101554870 | Acc: 0.5447500000000000.
Epoch: 6 |
MulticlassAccuracy: 0.5664499998092651 | Loss: 1.2234437763214112 | Acc: 0.5664500000000000.
Epoch: 7 |
MulticlassAccuracy: 0.5648499727249146 | Loss: 1.2325385323524476 | Acc: 0.5648500000000000.
Epoch: 8 |
MulticlassAccuracy: 0.5896499752998352 | Loss: 1.1611093239784240 | Acc: 0.5896500000000000.
Epoch: 9 |
MulticlassAccuracy: 0.6015999913215637 | Loss: 1.1370150957107543 | Acc: 0.6016000000000000.
Epoch: 10 |
MulticlassAccuracy: 0.6074000000953674 | Loss: 1.1378371593475343 | Acc: 0.6074000000000001.
Epoch: 11 |
MulticlassAccuracy: 0.6036999821662903 | Loss: 1.1592556796073914 | Acc: 0.6037000000000000.
Epoch: 12 |
MulticlassAccuracy: 0.5997499823570251 | Loss: 1.1987680685997009 | Acc: 0.5997500000000000.
Early stopping at epoch 11
Returned to Spot: Validation loss: 1.1987680685997009
config: {'l1': 16, 'l2': 16, 'lr_mult': 1.0, 'batch_size': 8, 'epochs': 8, 'k_folds': 0, 'patience': 3, 'optimizer': 'NAdam', 'sgd_momentum': 0.9}
Epoch: 1 |
MulticlassAccuracy: 0.3920499980449677 | Loss: 1.6102165319681168 | Acc: 0.3920500000000000.
Epoch: 2 |
MulticlassAccuracy: 0.4390000104904175 | Loss: 1.5077767979741097 | Acc: 0.4390000000000000.
Epoch: 3 |
MulticlassAccuracy: 0.4700999855995178 | Loss: 1.4581756867766380 | Acc: 0.4701000000000000.
Epoch: 4 |
MulticlassAccuracy: 0.4981499910354614 | Loss: 1.3969129746913911 | Acc: 0.4981500000000000.
Epoch: 5 |
MulticlassAccuracy: 0.5059000253677368 | Loss: 1.3693460956692696 | Acc: 0.5059000000000000.
Epoch: 6 |
MulticlassAccuracy: 0.5133500099182129 | Loss: 1.3540988440275192 | Acc: 0.5133500000000000.
Epoch: 7 |
MulticlassAccuracy: 0.5081499814987183 | Loss: 1.3817692994177342 | Acc: 0.5081500000000000.
Epoch: 8 |
MulticlassAccuracy: 0.5159500241279602 | Loss: 1.3653468480706215 | Acc: 0.5159500000000000.
Returned to Spot: Validation loss: 1.3653468480706215
config: {'l1': 256, 'l2': 128, 'lr_mult': 1.0, 'batch_size': 2, 'epochs': 16, 'k_folds': 0, 'patience': 3, 'optimizer': 'RMSprop', 'sgd_momentum': 0.9}
Epoch: 1 |
MulticlassAccuracy: 0.0958499982953072 | Loss: 2.3086834851264952 | Acc: 0.0958500000000000.
Epoch: 2 |
MulticlassAccuracy: 0.0987000018358231 | Loss: 2.3107500833988190 | Acc: 0.0987000000000000.
Epoch: 3 |
MulticlassAccuracy: 0.0958499982953072 | Loss: 2.3054559610605239 | Acc: 0.0958500000000000.
Epoch: 4 |
MulticlassAccuracy: 0.1013000011444092 | Loss: 2.3091404678583145 | Acc: 0.1013000000000000.
Epoch: 5 |
MulticlassAccuracy: 0.0958499982953072 | Loss: 2.3109533527135850 | Acc: 0.0958500000000000.
Epoch: 6 |
MulticlassAccuracy: 0.0987000018358231 | Loss: 2.3080133529186249 | Acc: 0.0987000000000000.
Early stopping at epoch 5
Returned to Spot: Validation loss: 2.308013352918625
config: {'l1': 8, 'l2': 32, 'lr_mult': 1.0, 'batch_size': 4, 'epochs': 8, 'k_folds': 0, 'patience': 3, 'optimizer': 'Adamax', 'sgd_momentum': 0.9}
Epoch: 1 |
MulticlassAccuracy: 0.3910000026226044 | Loss: 1.6194829273104667 | Acc: 0.3910000000000000.
Epoch: 2 |
MulticlassAccuracy: 0.4532499909400940 | Loss: 1.5181912495672703 | Acc: 0.4532500000000000.
Epoch: 3 |
MulticlassAccuracy: 0.5023999810218811 | Loss: 1.3594324642419815 | Acc: 0.5024000000000000.
Epoch: 4 |
MulticlassAccuracy: 0.5066999793052673 | Loss: 1.3639220094040037 | Acc: 0.5067000000000000.
Epoch: 5 |
MulticlassAccuracy: 0.5313000082969666 | Loss: 1.3084210138827563 | Acc: 0.5313000000000000.
Epoch: 6 |
MulticlassAccuracy: 0.5376499891281128 | Loss: 1.3020537653062492 | Acc: 0.5376500000000000.
Epoch: 7 |
MulticlassAccuracy: 0.5404999852180481 | Loss: 1.2979997927054763 | Acc: 0.5405000000000000.
Epoch: 8 |
MulticlassAccuracy: 0.5505999922752380 | Loss: 1.2794678398683668 | Acc: 0.5506000000000000.
Returned to Spot: Validation loss: 1.2794678398683668
config: {'l1': 64, 'l2': 512, 'lr_mult': 1.0, 'batch_size': 16, 'epochs': 16, 'k_folds': 0, 'patience': 3, 'optimizer': 'Adagrad', 'sgd_momentum': 0.9}
Error in Net_Core. Call to evaluate_hold_out() failed. err=TypeError("Adagrad.__init__() got an unexpected keyword argument 'differentiable'"), type(err)=<class 'TypeError'>
Returned to Spot: Validation loss: nan
config: {'l1': 512, 'l2': 256, 'lr_mult': 1.0, 'batch_size': 16, 'epochs': 8, 'k_folds': 0, 'patience': 3, 'optimizer': 'NAdam', 'sgd_momentum': 0.9}
Epoch: 1 |
MulticlassAccuracy: 0.5067499876022339 | Loss: 1.3663547940254210 | Acc: 0.5067500000000000.
Epoch: 2 |
MulticlassAccuracy: 0.5563499927520752 | Loss: 1.2667929081201554 | Acc: 0.5563500000000000.
Epoch: 3 |
MulticlassAccuracy: 0.5666499733924866 | Loss: 1.2227724067926407 | Acc: 0.5666500000000000.
Epoch: 4 |
MulticlassAccuracy: 0.6025000214576721 | Loss: 1.1719128470897675 | Acc: 0.6025000000000000.
Epoch: 5 |
MulticlassAccuracy: 0.5952500104904175 | Loss: 1.2412489697217941 | Acc: 0.5952499999999999.
Epoch: 6 |
MulticlassAccuracy: 0.5884500145912170 | Loss: 1.2785740818262101 | Acc: 0.5884500000000000.
Epoch: 7 |
MulticlassAccuracy: 0.6009500026702881 | Loss: 1.3023499223232269 | Acc: 0.6009500000000000.
Early stopping at epoch 6
Returned to Spot: Validation loss: 1.3023499223232269
spotPython tuning: 1.1987680685997009 [##########] 100.00% Done...
<spotPython.spot.spot.Spot at 0x2a37b3c10>
12.9 Step 9: Tensorboard
The textual output shown in the console (or code cell) can be visualized with Tensorboard.
12.9.1 Tensorboard: Start Tensorboard
Start TensorBoard from the command line to visualize the logged data. Specify the root log directory that was passed as tensorboard_path, e.g., in fun_control = fun_control_init(task="regression", tensorboard_path="runs/24_spot_torch_regression"). The argument logdir points to the directory where TensorBoard looks for event files that it can display. TensorBoard recursively walks the directory structure rooted at logdir, looking for .*tfevents.* files.
tensorboard --logdir=runs
Go to the URL it provides or to http://localhost:6006/. The following figures show some screenshots of Tensorboard.
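The recursive search for event files described above can be illustrated with a few lines of standard-library Python. This is only a sketch of the idea, not TensorBoard's own implementation; the helper find_event_files and the run directory name are hypothetical.

```python
import os
import tempfile


def find_event_files(logdir):
    """Recursively collect files under logdir whose names contain 'tfevents'."""
    matches = []
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            if "tfevents" in name:
                matches.append(os.path.join(root, name))
    return sorted(matches)


# Build a throwaway directory tree that mimics a "runs" folder.
with tempfile.TemporaryDirectory() as logdir:
    run_dir = os.path.join(logdir, "example_run")  # hypothetical run name
    os.makedirs(run_dir)
    for name in ("events.out.tfevents.12345.host", "notes.txt"):
        with open(os.path.join(run_dir, name), "w") as f:
            f.write("")
    # Store paths relative to logdir so the result does not depend on the temp dir.
    found = [os.path.relpath(p, logdir) for p in find_event_files(logdir)]

print(found)
```

Only the event file is reported; other files in the run directory are ignored. This is why pointing logdir at the common parent directory (here: runs) is sufficient to pick up all experiments below it.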
12.9.2 Saving the State of the Notebook
The state of the notebook can be saved and reloaded as follows:
import pickle
SAVE = False
LOAD = False

if SAVE:
    result_file_name = "res_" + experiment_name + ".pkl"
    with open(result_file_name, 'wb') as f:
        pickle.dump(spot_tuner, f)

if LOAD:
    result_file_name = "add_the_name_of_the_result_file_here.pkl"
    with open(result_file_name, 'rb') as f:
        spot_tuner = pickle.load(f)
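The save and load pattern can be tried out without a finished tuning run. The following sketch uses a plain dictionary as a stand-in for spot_tuner and a temporary directory; the file name res_demo.pkl is hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for spot_tuner: any picklable Python object behaves the same way.
result = {"experiment": "demo", "best_loss": 1.1987680685997009}

with tempfile.TemporaryDirectory() as tmp:
    result_file_name = os.path.join(tmp, "res_demo.pkl")  # hypothetical file name
    # Save the object ...
    with open(result_file_name, "wb") as f:
        pickle.dump(result, f)
    # ... and restore it from the same file.
    with open(result_file_name, "rb") as f:
        restored = pickle.load(f)

print(restored == result)
```

The restored object is an equal copy, not the same object. Note that pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.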
12.10 Step 10: Results
After the hyperparameter tuning run is finished, the progress of the hyperparameter tuning can be visualized. The following code generates the progress plot.
spot_tuner.plot_progress(log_y=False,
    filename="./figures/" + experiment_name+"_progress.png")
The progress plot shows a typical behaviour that can be observed in many hyperparameter studies (Bartz et al. 2022): the largest improvement is obtained during the evaluation of the initial design. The subsequent surrogate-model-based optimization refines the results. The plot also illustrates one major difference between ray[tune] as used in PyTorch (2023a) and spotPython: ray[tune] uses a random search and will generate results similar to the black dots, whereas spotPython uses a surrogate-model-based optimization and presents results represented by red dots. Surrogate-model-based optimization is considered to be more efficient than a random search, because the surrogate model guides the search towards promising regions in the hyperparameter space.
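A progress plot of this kind is usually read via the best value found so far. The following sketch computes this running minimum for the six validation losses that were returned to Spot in the run shown above. It is an illustration only and does not reproduce the internals of plot_progress; the helper best_so_far is hypothetical.

```python
import math
from itertools import accumulate

# Validation losses returned to Spot in the run shown above, in evaluation order.
# The fifth evaluation failed and returned nan.
losses = [
    1.1987680685997009,
    1.3653468480706215,
    2.308013352918625,
    1.2794678398683668,
    float("nan"),
    1.3023499223232269,
]
INIT_SIZE = 5  # size of the initial design used in this tutorial


def best_so_far(values):
    """Running minimum that treats nan entries as 'no result'."""
    cleaned = [math.inf if math.isnan(v) else v for v in values]
    return list(accumulate(cleaned, min))


progress = best_so_far(losses)
for i, (y, best) in enumerate(zip(losses, progress), start=1):
    phase = "initial design" if i <= INIT_SIZE else "surrogate"
    print(f"{i}: y={y:.4f} best={best:.4f} ({phase})")
```

In this short demonstration run, the best value (1.1988) was already found by the first point of the initial design, and the single surrogate-based evaluation that fit into the one-minute budget did not improve on it. With a larger MAX_TIME, more surrogate-based evaluations are performed.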
In addition to the improved (“optimized”) hyperparameter values, spotPython allows a statistical analysis, e.g., a sensitivity analysis, of the results. The results of the hyperparameter tuning can be printed as a table. The table shows the hyperparameters, their types, default values, lower and upper bounds, and the transformation function. The column “tuned” shows the tuned values, the column “importance” shows the importance of each hyperparameter, and the column “stars” shows the same importance as a star rating. The importance is computed by the SPOT software.
from spotPython.utils.eda import gen_design_table
print(gen_design_table(fun_control=fun_control, spot=spot_tuner))
| name | type | default | lower | upper | tuned | transform | importance | stars |
|--------------|--------|-----------|---------|---------|---------|-----------------------|--------------|---------|
| l1 | int | 5 | 2.0 | 9.0 | 7.0 | transform_power_2_int | 0.08 | |
| l2 | int | 5 | 2.0 | 9.0 | 3.0 | transform_power_2_int | 0.08 | |
| lr_mult | float | 1.0 | 1.0 | 1.0 | 1.0 | None | 0.00 | |
| batch_size | int | 4 | 1.0 | 5.0 | 5.0 | transform_power_2_int | 100.00 | *** |
| epochs | int | 3 | 3.0 | 4.0 | 4.0 | transform_power_2_int | 5.04 | * |
| k_folds | int | 1 | 0.0 | 0.0 | 0.0 | None | 0.00 | |
| patience | int | 5 | 3.0 | 3.0 | 3.0 | None | 0.00 | |
| optimizer | factor | SGD | 0.0 | 9.0 | 3.0 | None | 0.21 | . |
| sgd_momentum | float | 0.0 | 0.9 | 0.9 | 0.9 | None | 0.00 | |
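The values in the “tuned” column are given on the internal scale of the tuner. For hyperparameters with the transformation transform_power_2_int, the value is an exponent. The following sketch decodes the tuned values from the table; the function power_2_int is a local stand-in written for this example, not the spotPython function itself.

```python
# Tuned values from the "tuned" column of the table above (internal scale).
tuned = {"l1": 7.0, "l2": 3.0, "batch_size": 5.0, "epochs": 4.0}


def power_2_int(x):
    """Stand-in for transform_power_2_int: map an exponent x to the integer 2**x."""
    return int(2 ** int(x))


decoded = {name: power_2_int(value) for name, value in tuned.items()}
print(decoded)
# {'l1': 128, 'l2': 8, 'batch_size': 32, 'epochs': 16}
```

The decoded values agree with the first configuration in the tuning log above (l1 = 128, l2 = 8, batch_size = 32, epochs = 16), which returned the best validation loss.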
To visualize the most important hyperparameters, spotPython provides the function plot_importance. The following code generates the importance plot.
spot_tuner.plot_importance(threshold=0.025,
    filename="./figures/" + experiment_name+"_importance.png")
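The effect of the threshold argument can be sketched with the importance values from the results table. The sketch assumes that the threshold acts as a lower cut-off, i.e., that only hyperparameters with an importance above the threshold are shown. This is an assumption made for illustration and should be checked against the spotPython documentation.

```python
# Importance values as printed in the "importance" column of the results table.
importance = {
    "l1": 0.08, "l2": 0.08, "lr_mult": 0.00, "batch_size": 100.00,
    "epochs": 5.04, "k_folds": 0.00, "patience": 0.00,
    "optimizer": 0.21, "sgd_momentum": 0.00,
}
threshold = 0.025

# Assumption: hyperparameters with importance above the threshold are kept.
selected = {name: v for name, v in importance.items() if v > threshold}

# Rank the remaining hyperparameters from most to least important.
ranked = sorted(selected, key=selected.get, reverse=True)
print(ranked)
```

Under this assumption, the hyperparameters that were fixed during tuning (lr_mult, k_folds, patience, sgd_momentum) drop out, and batch_size dominates the ranking.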
12.10.1 Get the Tuned Architecture (SPOT Results)
The architecture of the spotPython model can be obtained as follows. First, the numerical representation of the hyperparameters is obtained, i.e., the numpy array X is generated. This array is then used to generate the model model_spot with the function get_one_core_model_from_X. The model model_spot has the following architecture:
from spotPython.hyperparameters.values import get_one_core_model_from_X
X = spot_tuner.to_all_dim(spot_tuner.min_X.reshape(1,-1))
model_spot = get_one_core_model_from_X(X, fun_control)
model_spot
Net_CIFAR10(
(conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
(pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=400, out_features=128, bias=True)
(fc2): Linear(in_features=128, out_features=8, bias=True)
(fc3): Linear(in_features=8, out_features=10, bias=True)
)
12.10.2 Get Default Hyperparameters
In a similar manner as in Section 12.10.1, the default hyperparameters can be obtained.
# fun_control was modified, we generate a new one with the original
# default hyperparameters
from spotPython.hyperparameters.values import get_one_core_model_from_X
from spotPython.hyperparameters.values import get_default_hyperparameters_as_array
X_start = get_default_hyperparameters_as_array(fun_control)
model_default = get_one_core_model_from_X(X_start, fun_control)
model_default
Net_CIFAR10(
(conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
(pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=400, out_features=32, bias=True)
(fc2): Linear(in_features=32, out_features=32, bias=True)
(fc3): Linear(in_features=32, out_features=10, bias=True)
)
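The two printed architectures differ only in the sizes of the fully connected layers. The following worked calculation shows where in_features=400 comes from and how many trainable parameters each network has. It assumes 32x32 CIFAR10 inputs and that the pooling layer is applied after each convolution, as in the PyTorch “Training a Classifier” network; the helper functions are written for this example only.

```python
def conv_out(size, kernel, stride=1):
    """Output side length of a convolution or pooling step without padding."""
    return (size - kernel) // stride + 1


# Assumption: 32x32 inputs, pooling applied after each of the two convolutions.
side = 32
side = conv_out(side, 5)      # conv1: 32 -> 28
side = conv_out(side, 2, 2)   # pool:  28 -> 14
side = conv_out(side, 5)      # conv2: 14 -> 10
side = conv_out(side, 2, 2)   # pool:  10 -> 5
flat_features = 16 * side * side  # 16 channels of size 5x5


def n_params(l1, l2):
    """Weights plus biases of all layers for hidden sizes l1 and l2."""
    conv1 = 3 * 6 * 5 * 5 + 6
    conv2 = 6 * 16 * 5 * 5 + 16
    fc1 = flat_features * l1 + l1
    fc2 = l1 * l2 + l2
    fc3 = l2 * 10 + 10
    return conv1 + conv2 + fc1 + fc2 + fc3


print(flat_features)      # 400
print(n_params(128, 8))   # tuned architecture: 55322
print(n_params(32, 32))   # default architecture: 17090
```

The tuned network has roughly three times as many parameters as the default one, almost all of them in the first fully connected layer.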
12.10.3 Evaluation of the Default Architecture
The method train_tuned takes a model architecture without trained weights and trains this model with the train data. The train data is split into train and validation data. The validation data is used for early stopping. The trained model weights are saved as a dictionary.
This evaluation is similar to the final evaluation in PyTorch (2023a).
from spotPython.torch.traintest import (
    train_tuned,
    test_tuned,
)
train_tuned(net=model_default, train_dataset=train, shuffle=True,
            loss_function=fun_control["loss_function"],
            metric=fun_control["metric_torch"],
            device=fun_control["device"], show_batch_interval=1_000_000,
            path=None,
            task=fun_control["task"],)

test_tuned(net=model_default, test_dataset=test,
           loss_function=fun_control["loss_function"],
           metric=fun_control["metric_torch"],
           shuffle=False,
           device=fun_control["device"],
           task=fun_control["task"],)
Epoch: 1 |
MulticlassAccuracy: 0.0975499972701073 | Loss: 2.3062030729293825 | Acc: 0.0975500000000000.
Epoch: 2 |
MulticlassAccuracy: 0.0975499972701073 | Loss: 2.3040160190582277 | Acc: 0.0975500000000000.
Epoch: 3 |
MulticlassAccuracy: 0.0987500026822090 | Loss: 2.3020986358642577 | Acc: 0.0987500000000000.
Epoch: 4 |
MulticlassAccuracy: 0.1266500055789948 | Loss: 2.2995942102432250 | Acc: 0.1266500000000000.
Epoch: 5 |
MulticlassAccuracy: 0.1498499959707260 | Loss: 2.2961405302047728 | Acc: 0.1498500000000000.
Epoch: 6 |
MulticlassAccuracy: 0.1425999999046326 | Loss: 2.2900444021224975 | Acc: 0.1426000000000000.
Epoch: 7 |
MulticlassAccuracy: 0.1688500046730042 | Loss: 2.2748941745758056 | Acc: 0.1688500000000000.
Epoch: 8 |
MulticlassAccuracy: 0.1881999969482422 | Loss: 2.2260843358993530 | Acc: 0.1882000000000000.
Returned to Spot: Validation loss: 2.226084335899353
MulticlassAccuracy: 0.1918999999761581 | Loss: 2.2214019302368162 | Acc: 0.1919000000000000.
Final evaluation: Validation loss: 2.221401930236816
Final evaluation: Validation metric: 0.19189999997615814
----------------------------------------------
(2.221401930236816, nan, tensor(0.1919, device='mps:0'))
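The early stopping on the validation loss can be sketched as a patience counter. The rule below is one plausible reading that is consistent with the tuning log shown earlier (patience = 3, epochs reported zero-based); it is not taken from the spotPython source, and the function early_stopping_epoch is hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 0-based epoch index at which training stops, or None."""
    best = float("inf")
    stale = 0  # consecutive epochs without improvement of the best loss
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


# Validation losses of the first configuration in the tuning log (rounded).
run_1 = [1.6404, 1.4817, 1.3768, 1.3446, 1.2768, 1.2234,
         1.2325, 1.1611, 1.1370, 1.1378, 1.1593, 1.1988]
# Validation losses of the second configuration (8 epochs, no early stopping).
run_2 = [1.6102, 1.5078, 1.4582, 1.3969, 1.3693, 1.3541, 1.3818, 1.3653]

stop_1 = early_stopping_epoch(run_1, patience=3)
stop_2 = early_stopping_epoch(run_2, patience=3)
print(stop_1, stop_2)
```

For the first configuration, the sketch stops at index 11, which matches the message “Early stopping at epoch 11” printed after the twelfth epoch. The second configuration runs through all eight epochs, because the loss never fails to improve three times in a row.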
12.10.4 Evaluation of the Tuned Architecture
The following code trains the model model_spot.
If path is set to a filename in the call to train_tuned, e.g., path = "model_spot_trained.pt", the weights of the trained model will be saved to this file.
If path is set to a filename in the call to test_tuned, e.g., path = "model_spot_trained.pt", the weights of the trained model will be loaded from this file.
train_tuned(net=model_spot, train_dataset=train,
            loss_function=fun_control["loss_function"],
            metric=fun_control["metric_torch"],
            shuffle=True,
            device=fun_control["device"],
            path=None,
            task=fun_control["task"],)
test_tuned(net=model_spot, test_dataset=test,
           shuffle=False,
           loss_function=fun_control["loss_function"],
           metric=fun_control["metric_torch"],
           device=fun_control["device"],
           task=fun_control["task"],)
Epoch: 1 |
MulticlassAccuracy: 0.3661000132560730 | Loss: 1.7093173021316528 | Acc: 0.3661000000000000.
Epoch: 2 |
MulticlassAccuracy: 0.4627499878406525 | Loss: 1.4642572961807252 | Acc: 0.4627500000000000.
Epoch: 3 |
MulticlassAccuracy: 0.4796499907970428 | Loss: 1.4531756073951720 | Acc: 0.4796500000000000.
Epoch: 4 |
MulticlassAccuracy: 0.5193499922752380 | Loss: 1.3521972542762757 | Acc: 0.5193500000000000.
Epoch: 5 |
MulticlassAccuracy: 0.5456500053405762 | Loss: 1.2886844976425171 | Acc: 0.5456500000000000.
Epoch: 6 |
MulticlassAccuracy: 0.5571500062942505 | Loss: 1.2521571839332581 | Acc: 0.5571500000000000.
Epoch: 7 |
MulticlassAccuracy: 0.5662500262260437 | Loss: 1.2315309381484985 | Acc: 0.5662500000000000.
Epoch: 8 |
MulticlassAccuracy: 0.5618000030517578 | Loss: 1.2532640023231507 | Acc: 0.5618000000000000.
Epoch: 9 |
MulticlassAccuracy: 0.5825999975204468 | Loss: 1.1913747765541076 | Acc: 0.5826000000000000.
Epoch: 10 |
MulticlassAccuracy: 0.5830000042915344 | Loss: 1.1833122503280640 | Acc: 0.5830000000000000.
Epoch: 11 |
MulticlassAccuracy: 0.5910000205039978 | Loss: 1.1831278825759888 | Acc: 0.5910000000000000.
Epoch: 12 |
MulticlassAccuracy: 0.5956000089645386 | Loss: 1.1784049792289735 | Acc: 0.5956000000000000.
Epoch: 13 |
MulticlassAccuracy: 0.5949500203132629 | Loss: 1.1580024556159974 | Acc: 0.5949500000000000.
Epoch: 14 |
MulticlassAccuracy: 0.5827500224113464 | Loss: 1.2288481868743897 | Acc: 0.5827500000000000.
Epoch: 15 |
MulticlassAccuracy: 0.5848000049591064 | Loss: 1.2282373707771301 | Acc: 0.5848000000000000.
Epoch: 16 |
MulticlassAccuracy: 0.5872499942779541 | Loss: 1.2612915629386903 | Acc: 0.5872500000000000.
Early stopping at epoch 15
Returned to Spot: Validation loss: 1.2612915629386903
MulticlassAccuracy: 0.5918999910354614 | Loss: 1.2511131338798962 | Acc: 0.5919000000000000.
Final evaluation: Validation loss: 1.2511131338798962
Final evaluation: Validation metric: 0.5918999910354614
----------------------------------------------
(1.2511131338798962, nan, tensor(0.5919, device='mps:0'))
12.10.5 Detailed Hyperparameter Plots
The contour plots in this section visualize the interactions of the three most important hyperparameters. Since some of these hyperparameters take factorial or integer values, sometimes step-like fitness landscapes (or response surfaces) are generated. SPOT draws the interactions of the main hyperparameters by default. It is also possible to visualize all interactions.
filename = "./figures/" + experiment_name
spot_tuner.plot_important_hyperparameter_contour(filename=filename)
l1: 0.08257501318668711
l2: 0.08257501318668711
batch_size: 100.0
epochs: 5.036050457287037
optimizer: 0.2060782987482385
The contour plots show the loss as a function of the hyperparameters. These plots are very helpful for benchmark studies and for understanding neural networks. spotPython provides additional tools for a visual inspection of the results and gives valuable insights into the hyperparameter tuning process. This is especially useful for model explainability, transparency, and trustworthiness. In addition to the contour plots, a parallel plot of the hyperparameters can be generated as follows.
spot_tuner.parallel_plot()
Parallel coordinates plots
12.11 Summary and Outlook
This tutorial presents the open-source hyperparameter tuning software spotPython for PyTorch. To show its basic features, a comparison with the “official” PyTorch hyperparameter tuning tutorial (PyTorch 2023a) is presented. Some of the advantages of spotPython are:
- Numerical and categorical hyperparameters.
- Powerful surrogate models.
- Flexible approach and easy to use.
- Simple JSON files for the specification of the hyperparameters.
- Extension of default and user specified network classes.
- Noise handling techniques.
- Interaction with tensorboard.
Currently, only rudimentary parallel and distributed neural network training is possible, but these capabilities will be extended in the future. The next version of spotPython will also include more detailed documentation and more examples.
Important: This tutorial does not present a complete benchmarking study (Bartz-Beielstein et al. 2020). The results are only preliminary and highly dependent on the local configuration (hard- and software). Our goal is to provide a first impression of the performance of the hyperparameter tuning package spotPython. To demonstrate its capabilities, a quick comparison with ray[tune] was performed. ray[tune] was chosen, because it is presented as “an industry standard tool for distributed hyperparameter tuning.” The results should be interpreted with care.
12.12 Appendix
12.12.1 Sample Output From Ray Tune’s Run
The output from ray[tune] could look like this (PyTorch 2023a):
Number of trials: 10 (10 TERMINATED)
+-----+-----+-------------+------------+---------+----------+--------------------+
|  l1 |  l2 |          lr | batch_size |    loss | accuracy | training_iteration |
+-----+-----+-------------+------------+---------+----------+--------------------+
|  64 |   4 |  0.00011629 |          2 | 1.87273 |    0.244 |                  2 |
|  32 |  64 | 0.000339763 |          8 | 1.23603 |    0.567 |                  8 |
|   8 |  16 |  0.00276249 |         16 |  1.1815 |   0.5836 |                 10 |
|   4 |  64 | 0.000648721 |          4 | 1.31131 |   0.5224 |                  8 |
|  32 |  16 | 0.000340753 |          8 | 1.26454 |   0.5444 |                  8 |
|   8 |   4 | 0.000699775 |          8 | 1.99594 |   0.1983 |                  2 |
| 256 |   8 |   0.0839654 |         16 |  2.3119 |   0.0993 |                  1 |
|  16 | 128 |   0.0758154 |         16 | 2.33575 |   0.1327 |                  1 |
|  16 |   8 |   0.0763312 |         16 | 2.31129 |   0.1042 |                  4 |
| 128 |  16 | 0.000124903 |          4 | 2.26917 |   0.1945 |                  1 |
+-----+-----+-------------+------------+---------+----------+--------------------+
Best trial config: {'l1': 8, 'l2': 16, 'lr': 0.00276249, 'batch_size': 16, 'data_dir': '...'}
Best trial final validation loss: 1.181501
Best trial final validation accuracy: 0.5836
Best trial test set accuracy: 0.5806
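The “best trial” reported above is the trial with the smallest final validation loss. The following sketch reproduces this selection from the rows of the table. The Trial tuple is defined for this example only and is not part of Ray Tune.

```python
from collections import namedtuple

# Column layout of the trial table above; the type is hypothetical.
Trial = namedtuple(
    "Trial", "l1 l2 lr batch_size loss accuracy training_iteration"
)

trials = [
    Trial(64, 4, 0.00011629, 2, 1.87273, 0.244, 2),
    Trial(32, 64, 0.000339763, 8, 1.23603, 0.567, 8),
    Trial(8, 16, 0.00276249, 16, 1.1815, 0.5836, 10),
    Trial(4, 64, 0.000648721, 4, 1.31131, 0.5224, 8),
    Trial(32, 16, 0.000340753, 8, 1.26454, 0.5444, 8),
    Trial(8, 4, 0.000699775, 8, 1.99594, 0.1983, 2),
    Trial(256, 8, 0.0839654, 16, 2.3119, 0.0993, 1),
    Trial(16, 128, 0.0758154, 16, 2.33575, 0.1327, 1),
    Trial(16, 8, 0.0763312, 16, 2.31129, 0.1042, 4),
    Trial(128, 16, 0.000124903, 4, 2.26917, 0.1945, 1),
]

# Select the trial with the smallest final validation loss.
best = min(trials, key=lambda t: t.loss)
print(best.l1, best.l2, best.lr, best.batch_size, best.loss, best.accuracy)
```

The selected trial (l1 = 8, l2 = 16, lr = 0.00276249, batch_size = 16) matches the “Best trial config” line of the output.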
Alternatively, the source code can be downloaded from GitHub: https://github.com/sequential-parameter-optimization/spotPython.
We were not able to install Ray Tune on our system. Therefore, we used the results from the PyTorch tutorial.