11  river Hyperparameter Tuning: HATR with Friedman Drift Data

11.1 Setup

Before we consider the detailed experimental setup, we select the parameters that affect the run time, the initial design size, and the size of the data set.

Caution: Run time and initial design size should be increased for real experiments
  • MAX_TIME is set to one minute for demonstration purposes. For real experiments, this should be increased to at least 1 hour.
  • INIT_SIZE is set to 5 for demonstration purposes. For real experiments, this should be increased to at least 10.
  • K is set to 0.1 for demonstration purposes. K scales the size of the Friedman drift data set (n_total = K * 100,000). For real experiments, this should be increased to at least 1.
MAX_TIME = 1
INIT_SIZE = 5
PREFIX="10-river"
K = 0.1

11.2 Initialization of the Empty fun_control Dictionary

spotPython supports the visualization of the hyperparameter tuning process with TensorBoard. The following example shows how to use TensorBoard with spotPython.

First, we define an “experiment name” to identify the hyperparameter tuning process. The experiment name is used to create a directory for the TensorBoard files.

from spotPython.utils.init import fun_control_init
from spotPython.utils.file import get_experiment_name, get_spot_tensorboard_path

experiment_name = get_experiment_name(prefix=PREFIX)
fun_control = fun_control_init(
    spot_tensorboard_path=get_spot_tensorboard_path(experiment_name),
    TENSORBOARD_CLEAN=True)

Since the spot_tensorboard_path is defined, spotPython will log the optimization process in the TensorBoard files. The TensorBoard files are stored in the directory spot_tensorboard_path. We can pass the TensorBoard information to the Spot method via the fun_control dictionary.
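Because fun_control is a plain Python dictionary, its entries can be inspected directly. A quick check (the key name is an assumption based on the fun_control_init() argument above):

# inspect where the TensorBoard logs will be written
print(fun_control["spot_tensorboard_path"])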

11.3 Load Data: The Friedman Drift Data

horizon = 7*24               # prediction horizon: 7 days x 24 hours
k = K                        # scaling factor for the data set size
n_total = int(k*100_000)     # total number of samples
n_samples = n_total
p_1 = int(k*25_000)          # position of the first drift point
p_2 = int(k*50_000)          # position of the second drift point
position = (p_1, p_2)
n_train = 1_000              # size of the training set
a = n_train + p_1 - 12       # start of a 12-sample plotting window around the first drift
b = a + 12                   # end of the plotting window
  • Since we also need a river version of the data below for plotting the model, the corresponding data set is generated here. Note: spotRiver uses the train and test data sets, while river consumes the data as (x, y) pairs; a minimal illustration of this difference follows after the fun_control update below.
from river.datasets import synth
import pandas as pd
dataset = synth.FriedmanDrift(
    drift_type='gra',
    position=position,
    seed=123
)
from spotRiver.utils.data_conversion import convert_to_df
target_column = "y"
df = convert_to_df(dataset, target_column=target_column, n_total=n_total)
# Name the first 10 columns x1 to x10 and the last column y
df.columns = [f"x{i}" for i in range(1, 11)] + ["y"]

train = df[:n_train]
test = df[n_train:]
#
fun_control.update({"data": None, # dataset,
               "train": train,
               "test": test,
               "n_samples": n_samples,
               "target_column": target_column})

11.4 Specification of the Preprocessing Model

from river import preprocessing
prep_model = preprocessing.StandardScaler()
fun_control.update({"prep_model": prep_model})

11.5 Select Model (algorithm) and core_model_hyper_dict

  • The river model (HATR) is selected.
  • Furthermore, the corresponding hyperparameters (including type information, names, and bounds) are selected; see https://riverml.xyz/0.15.0/api/tree/HoeffdingTreeRegressor/.
  • The corresponding hyperparameter dictionary is added to the fun_control dictionary.
  • Alternatively, you can load a local hyper_dict. Simply set river_hyper_dict.json as the filename. If filename is set to None, the hyper_dict is loaded from the spotRiver package.
from river.tree import HoeffdingAdaptiveTreeRegressor
from spotRiver.data.river_hyper_dict import RiverHyperDict
from spotPython.hyperparameters.values import add_core_model_to_fun_control
core_model = HoeffdingAdaptiveTreeRegressor
add_core_model_to_fun_control(core_model=core_model,
                              fun_control=fun_control,
                              hyper_dict=RiverHyperDict,
                              filename=None)

11.6 Modify hyper_dict Hyperparameters for the Selected Algorithm aka core_model

11.6.1 Modify hyperparameter of type factor

# from spotPython.hyperparameters.values import modify_hyper_parameter_levels
# modify_hyper_parameter_levels(fun_control, "leaf_model", ["LinearRegression"])
# fun_control["core_model_hyper_dict"]

11.6.2 Modify hyperparameter of type numeric and integer (boolean)

from spotPython.hyperparameters.values import modify_hyper_parameter_bounds
modify_hyper_parameter_bounds(fun_control, "delta", bounds=[1e-10, 1e-6])
# modify_hyper_parameter_bounds(fun_control, "min_samples_split", bounds=[3, 20])
modify_hyper_parameter_bounds(fun_control, "merit_preprune", [0, 0])

11.7 Selection of the Objective (Loss) Function

There are two metrics:

1. `metric` is used for the river based evaluation via `eval_oml_iter_progressive`.
2. `metric_sklearn` is used for the sklearn based evaluation via `eval_oml_horizon`.
import numpy as np
from river import metrics
from sklearn.metrics import mean_absolute_error

# objective weights; the assumed order (error metric, computation time, memory)
# follows spotRiver's fun_oml_horizon
weights = np.array([1, 1/1000, 1/1000])*10_000.0
horizon = 7*24           # evaluation horizon, see Section 11.3
oml_grace_period = 2     # initial samples used for training before evaluation starts
step = 100
weight_coeff = 1.0

fun_control.update({
               "horizon": horizon,
               "oml_grace_period": oml_grace_period,
               "weights": weights,
               "step": step,
               "log_level": 50,
               "weight_coeff": weight_coeff,
               "metric": metrics.MAE(),
               "metric_sklearn": mean_absolute_error
               })
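
The difference between the two metric types can be seen in a small, self-contained example: the river metric is updated one observation at a time, whereas the sklearn metric is computed on complete arrays.

from river import metrics
from sklearn.metrics import mean_absolute_error

y_true = [3.0, 5.0, 2.0]
y_pred = [2.5, 5.5, 2.0]

# river: incremental update, one observation at a time
mae_river = metrics.MAE()
for yt, yp in zip(y_true, y_pred):
    mae_river.update(yt, yp)
print(mae_river.get())                      # 0.333...

# sklearn: batch computation on the full arrays
print(mean_absolute_error(y_true, y_pred))  # 0.333...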

11.8 Calling the SPOT Function

11.8.1 Prepare the SPOT Parameters

  • Get types and variable names as well as lower and upper bounds for the hyperparameters.
from spotPython.hyperparameters.values import (
    get_var_type,
    get_var_name,
    get_bound_values,
)
var_type = get_var_type(fun_control)
var_name = get_var_name(fun_control)
lower = get_bound_values(fun_control, "lower")
upper = get_bound_values(fun_control, "upper")
from spotPython.utils.eda import gen_design_table
print(gen_design_table(fun_control))
| name                   | type   | default          |      lower |    upper | transform             |
|------------------------|--------|------------------|------------|----------|-----------------------|
| grace_period           | int    | 200              |     10     | 1000     | None                  |
| max_depth              | int    | 20               |      2     |   20     | transform_power_2_int |
| delta                  | float  | 1e-07            |      1e-10 |    1e-06 | None                  |
| tau                    | float  | 0.05             |      0.01  |    0.1   | None                  |
| leaf_prediction        | factor | mean             |      0     |    2     | None                  |
| leaf_model             | factor | LinearRegression |      0     |    2     | None                  |
| model_selector_decay   | float  | 0.95             |      0.9   |    0.99  | None                  |
| splitter               | factor | EBSTSplitter     |      0     |    2     | None                  |
| min_samples_split      | int    | 5                |      2     |   10     | None                  |
| bootstrap_sampling     | factor | 0                |      0     |    1     | None                  |
| drift_window_threshold | int    | 300              |    100     |  500     | None                  |
| switch_significance    | float  | 0.05             |      0.01  |    0.1   | None                  |
| binary_split           | factor | 0                |      0     |    1     | None                  |
| max_size               | float  | 500.0            |    100     | 1000     | None                  |
| memory_estimate_period | int    | 1000000          | 100000     |  1000000 | None                  |
| stop_mem_management    | factor | 0                |      0     |    1     | None                  |
| remove_poor_attrs      | factor | 0                |      0     |    1     | None                  |
| merit_preprune         | factor | 0                |      0     |    0     | None                  |
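
The transform column shows how an encoded value is mapped before it is passed to the model. For example, transform_power_2_int maps the tuned integer for max_depth to a power of two, which is why the default max_depth = 20 appears as 1048576 (= 2^20) in the model printout later in this chapter. A minimal sketch of this mapping (the actual implementation lives in spotPython):

def transform_power_2_int(x: int) -> int:
    # map the encoded integer to a power of two, e.g. 20 -> 1_048_576
    return 2 ** x

print(transform_power_2_int(20))  # 1048576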

11.8.2 The Objective Function

The objective function is selected next.

from spotRiver.fun.hyperriver import HyperRiver
fun = HyperRiver().fun_oml_horizon
from spotPython.hyperparameters.values import get_default_hyperparameters_as_array
X_start = get_default_hyperparameters_as_array(fun_control)
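
X_start holds the default hyperparameter values encoded as a numeric array; passing it to spot_tuner.run() in the next section ensures that the default configuration is evaluated in addition to the initial design. A quick sanity check (the expected number of entries follows from the design table above):

# expect one encoded value per hyperparameter (18 rows in the design table above)
print(X_start.shape)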

11.8.3 Run the Spot Optimizer

  • Run SPOT for approximately MAX_TIME minutes (here: one minute, as specified above).
  • Note: the run may take longer than max_time, because the evaluation time of the initial design (here: init_size = INIT_SIZE, as specified above) is not counted toward max_time.
from spotPython.spot import spot
from math import inf
import numpy as np
spot_tuner = spot.Spot(fun=fun,
                   lower = lower,
                   upper = upper,
                   fun_evals = inf,
                   infill_criterion = "y",
                   max_time = MAX_TIME,
                   tolerance_x = np.sqrt(np.spacing(1)),
                   var_type = var_type,
                   var_name = var_name,
                   show_progress= True,
                   fun_control = fun_control,
                   design_control={"init_size": INIT_SIZE},
                   surrogate_control={"noise": False,
                                      "cod_type": "norm",
                                      "min_theta": -4,
                                      "max_theta": 3,
                                      "n_theta": len(var_name),
                                      "model_fun_evals": 10_000})
spot_tuner.run(X_start=X_start)
spotPython tuning: 2.088280756722266 [##--------] 21.42% 
spotPython tuning: 2.088280756722266 [####------] 35.69% 
spotPython tuning: 2.088280756722266 [#####-----] 48.72% 
spotPython tuning: 2.088280756722266 [######----] 60.38% 
spotPython tuning: 2.088280756722266 [#########-] 87.39% 
spotPython tuning: 2.088280756722266 [##########] 100.00% Done...
<spotPython.spot.spot.Spot at 0x2c93d3dc0>

11.8.4 TensorBoard

Now we can start TensorBoard in the background with the following command:

tensorboard --logdir="./runs"

We can access the TensorBoard web server with the following URL:

http://localhost:6006/

The TensorBoard plot illustrates how spotPython can be used as a microscope for the internal mechanisms of the surrogate-based optimization process. Here, one important parameter, the learning rate \(\theta\) of the Kriging surrogate, is plotted against the number of optimization steps.
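The role of \(\theta\) can be made explicit with the correlation model commonly used in Kriging (shown here in a standard form; the exact kernel used by spotPython's Kriging implementation may differ in detail):

\[
\operatorname{corr}\bigl(x^{(i)}, x^{(j)}\bigr) = \exp\Bigl(-\sum_{k=1}^{n} \theta_k \bigl|x_k^{(i)} - x_k^{(j)}\bigr|^{p_k}\Bigr)
\]

A large \(\theta_k\) means that even small differences in the \(k\)-th input strongly reduce the correlation, i.e., the surrogate treats that hyperparameter as highly active.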

TensorBoard visualization of the spotPython optimization process and the surrogate model.

11.8.5 Results

from spotPython.utils.file import save_pickle
save_pickle(spot_tuner, experiment_name)
from spotPython.utils.file import load_pickle
spot_tuner = load_pickle(experiment_name)
  • Show the Progress of the hyperparameter tuning:

After the hyperparameter tuning run is finished, the progress of the hyperparameter tuning can be visualized.

spot_tuner.plot_progress(log_y=True, filename="./figures/" + experiment_name+"_progress.pdf")

  • Print the Results
print(gen_design_table(fun_control=fun_control, spot=spot_tuner))
| name                   | type   | default          |    lower |     upper |              tuned | transform             |   importance | stars   |
|------------------------|--------|------------------|----------|-----------|--------------------|-----------------------|--------------|---------|
| grace_period           | int    | 200              |     10.0 |    1000.0 |              179.0 | None                  |         0.00 |         |
| max_depth              | int    | 20               |      2.0 |      20.0 |               19.0 | transform_power_2_int |         0.00 |         |
| delta                  | float  | 1e-07            |    1e-10 |     1e-06 |              1e-10 | None                  |         0.00 |         |
| tau                    | float  | 0.05             |     0.01 |       0.1 |                0.1 | None                  |         0.00 |         |
| leaf_prediction        | factor | mean             |      0.0 |       2.0 |                2.0 | None                  |         8.83 | *       |
| leaf_model             | factor | LinearRegression |      0.0 |       2.0 |                0.0 | None                  |       100.00 | ***     |
| model_selector_decay   | float  | 0.95             |      0.9 |      0.99 |               0.99 | None                  |         0.00 |         |
| splitter               | factor | EBSTSplitter     |      0.0 |       2.0 |                2.0 | None                  |         0.00 |         |
| min_samples_split      | int    | 5                |      2.0 |      10.0 |                4.0 | None                  |         0.00 |         |
| bootstrap_sampling     | factor | 0                |      0.0 |       1.0 |                0.0 | None                  |         0.00 |         |
| drift_window_threshold | int    | 300              |    100.0 |     500.0 |              168.0 | None                  |         0.00 |         |
| switch_significance    | float  | 0.05             |     0.01 |       0.1 |               0.01 | None                  |         0.00 |         |
| binary_split           | factor | 0                |      0.0 |       1.0 |                0.0 | None                  |         0.00 |         |
| max_size               | float  | 500.0            |    100.0 |    1000.0 | 104.93419488206317 | None                  |         0.00 |         |
| memory_estimate_period | int    | 1000000          | 100000.0 | 1000000.0 |           975784.0 | None                  |         0.00 |         |
| stop_mem_management    | factor | 0                |      0.0 |       1.0 |                1.0 | None                  |         0.00 |         |
| remove_poor_attrs      | factor | 0                |      0.0 |       1.0 |                1.0 | None                  |         0.00 |         |
| merit_preprune         | factor | 0                |      0.0 |       0.0 |                0.0 | None                  |         0.00 |         |

11.9 Show variable importance

spot_tuner.plot_importance(threshold=0.0025, filename="./figures/" + experiment_name+"_importance.pdf")

11.10 Build and Evaluate HATR Model with Tuned Hyperparameters

m = test.shape[0]    # number of test samples
a = int(m/2) - 50    # start of a 50-sample plotting window in the middle of the test set
b = int(m/2)         # end of the plotting window

11.11 The Larger Data Set

Caution: Increased Friedman-Drift Data Set
  • The Friedman-Drift Data Set is increased by a factor of two to show the transferability of the hyperparameter tuning results.
  • Larger values of k lead to a longer run time.
horizon = 7*24
k = 0.2  # doubled compared to the tuning run (K = 0.1)
n_total = int(k*100_000)
n_samples = n_total
p_1 = int(k*25_000)
p_2 = int(k*50_000)
position=(p_1, p_2)
n_train = 1_000
a = n_train + p_1 - 12
b = a + 12
from river.datasets import synth
dataset = synth.FriedmanDrift(
    drift_type='gra',
    position=position,
    seed=123
)
from spotRiver.utils.data_conversion import convert_to_df
target_column = "y"
df = convert_to_df(dataset, target_column=target_column, n_total=n_total)
# Name the first 10 columns x1 to x10 and the last column y
df.columns = [f"x{i}" for i in range(1, 11)] + ["y"]
train = df[:n_train]
test = df[n_train:]
#
fun_control.update({"data": None, # dataset,
               "train": train,
               "test": test,
               "n_samples": n_samples,
               "target_column": target_column})

11.12 Get Default Hyperparameters

# fun_control was modified above; build a model with the original
# default hyperparameters for comparison
from spotPython.hyperparameters.values import get_one_core_model_from_X
from spotPython.hyperparameters.values import get_default_hyperparameters_as_array
X_start = get_default_hyperparameters_as_array(fun_control)
model_default = get_one_core_model_from_X(X_start, fun_control)
model_default
HoeffdingAdaptiveTreeRegressor (
  grace_period=200
  max_depth=1048576
  delta=1e-07
  tau=0.05
  leaf_prediction="mean"
  leaf_model=LinearRegression (
    optimizer=SGD (
      lr=Constant (
        learning_rate=0.01
      )
    )
    loss=Squared ()
    l2=0.
    l1=0.
    intercept_init=0.
    intercept_lr=Constant (
      learning_rate=0.01
    )
    clip_gradient=1e+12
    initializer=Zeros ()
  )
  model_selector_decay=0.95
  nominal_attributes=None
  splitter=EBSTSplitter ()
  min_samples_split=5
  bootstrap_sampling=0
  drift_window_threshold=300
  drift_detector=ADWIN (
    delta=0.002
    clock=32
    max_buckets=5
    min_window_length=5
    grace_period=10
  )
  switch_significance=0.05
  binary_split=0
  max_size=500.
  memory_estimate_period=1000000
  stop_mem_management=0
  remove_poor_attrs=0
  merit_preprune=0
  seed=None
)
from spotRiver.evaluation.eval_bml import eval_oml_horizon

df_eval_default, df_true_default = eval_oml_horizon(
                    model=model_default,
                    train=fun_control["train"],
                    test=fun_control["test"],
                    target_column=fun_control["target_column"],
                    horizon=fun_control["horizon"],
                    oml_grace_period=fun_control["oml_grace_period"],
                    metric=fun_control["metric_sklearn"],
                )
from spotRiver.evaluation.eval_bml import plot_bml_oml_horizon_metrics, plot_bml_oml_horizon_predictions
df_labels=["default"]
plot_bml_oml_horizon_metrics(df_eval = [df_eval_default], log_y=False, df_labels=df_labels, metric=fun_control["metric_sklearn"])
plot_bml_oml_horizon_predictions(df_true = [df_true_default[a:b]], target_column=target_column,  df_labels=df_labels)

11.13 Get SPOT Results

from spotPython.hyperparameters.values import get_one_core_model_from_X
X = spot_tuner.to_all_dim(spot_tuner.min_X.reshape(1,-1))  # expand the best point to the full hyperparameter encoding
model_spot = get_one_core_model_from_X(X, fun_control)
model_spot
HoeffdingAdaptiveTreeRegressor (
  grace_period=179
  max_depth=524288
  delta=1e-10
  tau=0.1
  leaf_prediction="adaptive"
  leaf_model=LinearRegression (
    optimizer=SGD (
      lr=Constant (
        learning_rate=0.01
      )
    )
    loss=Squared ()
    l2=0.
    l1=0.
    intercept_init=0.
    intercept_lr=Constant (
      learning_rate=0.01
    )
    clip_gradient=1e+12
    initializer=Zeros ()
  )
  model_selector_decay=0.99
  nominal_attributes=None
  splitter=QOSplitter (
    radius=0.25
    allow_multiway_splits=False
  )
  min_samples_split=4
  bootstrap_sampling=0
  drift_window_threshold=168
  drift_detector=ADWIN (
    delta=0.002
    clock=32
    max_buckets=5
    min_window_length=5
    grace_period=10
  )
  switch_significance=0.01
  binary_split=0
  max_size=104.934195
  memory_estimate_period=975784
  stop_mem_management=1
  remove_poor_attrs=1
  merit_preprune=0
  seed=None
)
df_eval_spot, df_true_spot = eval_oml_horizon(
                    model=model_spot,
                    train=fun_control["train"],
                    test=fun_control["test"],
                    target_column=fun_control["target_column"],
                    horizon=fun_control["horizon"],
                    oml_grace_period=fun_control["oml_grace_period"],
                    metric=fun_control["metric_sklearn"],
                )
df_labels=["default", "spot"]
plot_bml_oml_horizon_metrics(df_eval = [df_eval_default, df_eval_spot], log_y=False, df_labels=df_labels, metric=fun_control["metric_sklearn"], filename="./figures/" + experiment_name+"_metrics.pdf")

m = test.shape[0]   # recompute, since test was replaced by the larger data set
a = int(m/2)+20     # plotting window just past the middle of the test set
b = int(m/2)+50
plot_bml_oml_horizon_predictions(df_true = [df_true_default[a:b], df_true_spot[a:b]], target_column=target_column,  df_labels=df_labels, filename="./figures/" + experiment_name+"_predictions.pdf")

from spotPython.plot.validation import plot_actual_vs_predicted
plot_actual_vs_predicted(y_test=df_true_default["y"], y_pred=df_true_default["Prediction"], title="Default")
plot_actual_vs_predicted(y_test=df_true_spot["y"], y_pred=df_true_spot["Prediction"], title="SPOT")

11.14 Visualize Regression Trees

dataset_f = dataset.take(n_total)  # fresh stream of n_total samples from the generator
for x, y in dataset_f:
    model_default.learn_one(x, y)
Caution: Large Trees
  • Since the trees are large, the visualization is suppressed by default.
  • To visualize the trees, uncomment the following line.
# model_default.draw()
model_default.summary
{'n_nodes': 35,
 'n_branches': 17,
 'n_leaves': 18,
 'n_active_leaves': 96,
 'n_inactive_leaves': 0,
 'height': 6,
 'total_observed_weight': 39002.0,
 'n_alternate_trees': 21,
 'n_pruned_alternate_trees': 6,
 'n_switch_alternate_trees': 2}

11.14.1 Spot Model

dataset_f = dataset.take(n_total)
for x, y in dataset_f:
    model_spot.learn_one(x, y)
Caution: Large Trees
  • Since the trees are large, the visualization is suppressed by default.
  • To visualize the trees, uncomment the following line.
# model_spot.draw()
model_spot.summary
{'n_nodes': 51,
 'n_branches': 25,
 'n_leaves': 26,
 'n_active_leaves': 106,
 'n_inactive_leaves': 0,
 'height': 11,
 'total_observed_weight': 39002.0,
 'n_alternate_trees': 34,
 'n_pruned_alternate_trees': 12,
 'n_switch_alternate_trees': 0}
from spotPython.utils.eda import compare_two_tree_models
print(compare_two_tree_models(model_default, model_spot))
| Parameter                |   Default |   Spot |
|--------------------------|-----------|--------|
| n_nodes                  |        35 |     51 |
| n_branches               |        17 |     25 |
| n_leaves                 |        18 |     26 |
| n_active_leaves          |        96 |    106 |
| n_inactive_leaves        |         0 |      0 |
| height                   |         6 |     11 |
| total_observed_weight    |     39002 |  39002 |
| n_alternate_trees        |        21 |     34 |
| n_pruned_alternate_trees |         6 |     12 |
| n_switch_alternate_trees |         2 |      0 |

11.15 Detailed Hyperparameter Plots

filename = "./figures/" + experiment_name
spot_tuner.plot_important_hyperparameter_contour(filename=filename)
leaf_prediction:  8.829577109947413
leaf_model:  100.0

11.16 Parallel Coordinates Plots

spot_tuner.parallel_plot()

11.17 Plot all Combinations of Hyperparameters

  • Warning: this may take a while.
PLOT_ALL = False
if PLOT_ALL:
    n = spot_tuner.k
    for i in range(n-1):
        for j in range(i+1, n):
            # min_z and max_z default to None, i.e., automatic scaling
            spot_tuner.plot_contour(i=i, j=j, min_z=None, max_z=None)