import tensorflow as tf
tf.config.set_visible_devices([], 'GPU')
from d2l import tensorflow as d2l
import keras
import numpy as np
from scipy import stats

Hyperparameters are the knobs you tune outside gradient descent: learning rate, batch size, depth, dropout rate. There are usually 5 to 20 of them, and the validation loss as a function of them is non-convex, noisy, and expensive to evaluate: each setting costs one full training run.
Hyperparameter optimization (HPO) automates the tuning. Simplest variant: random search — sample configurations from a prior, evaluate, keep the best.
Train multiple models with different hyperparameters; pick the best.
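Random search is usually more efficient than grid search when only a few hyperparameters matter, and it is a strong baseline against hand-tuning. Smarter algorithms (Bayesian optimization, Hyperband) come next.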
Find $\mathbf{x}^* = \arg\min_{\mathbf{x} \in \mathcal{X}} f(\mathbf{x})$, where $f(\mathbf{x})$ is the validation error after training with hyperparameters $\mathbf{x}$, and $\mathcal{X}$ is the configuration space: a structured product of discrete and continuous ranges.
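To make the arg-min concrete without training a network, here is a minimal sketch that swaps in a cheap synthetic objective. The function `toy_objective`, its minimum near a learning rate of 1e-2, the bounds of the prior, and the trial count are all assumptions chosen for illustration; none of them come from the experiment in this section.

```python
import numpy as np
from scipy import stats


def toy_objective(config):
    """Hypothetical stand-in for f: a bowl in log10(learning_rate) with its minimum at 1e-2."""
    return (np.log10(config["learning_rate"]) + 2.0) ** 2


rng = np.random.default_rng(0)
# Assumed prior over the configuration space: log-uniform on [1e-4, 1].
prior = stats.loguniform(1e-4, 1.0)

trials = []
for _ in range(20):
    # Sample a configuration from the prior and evaluate it.
    config = {"learning_rate": float(prior.rvs(random_state=rng))}
    trials.append((toy_objective(config), config))

# Approximate the arg-min over X by the best of the sampled configurations.
best_error, best_config = min(trials, key=lambda t: t[0])
print(best_config, best_error)
```

Random search only ever approximates $\mathbf{x}^*$ by the best point it happened to sample, so the quality of the answer depends on the prior and on the number of trials.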
The “function” we’re optimizing is “train a model with this config, return validation error”. Wrap that into a clean callable:
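def hpo_objective_softmax_classification(config, max_epochs=8):
    """Train a softmax classifier with `config` and return its validation error."""
    learning_rate = config["learning_rate"]
    # A single dense layer on flattened images; it outputs raw logits.
    model = keras.Sequential([
        keras.layers.Flatten(),
        keras.layers.Dense(10),
    ])
    model.compile(
        optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
        # from_logits=True because the Dense layer applies no softmax.
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy'],
    )
    data = d2l.FashionMNIST(batch_size=16)
    train_ds = data.get_dataloader(True)
    val_ds = data.get_dataloader(False)
    history = model.fit(train_ds, epochs=max_epochs, validation_data=val_ds,
                        verbose=0)
    # Validation error after the final epoch.
    val_acc = history.history['val_accuracy'][-1]
    return 1 - val_acc

The configuration space is structured: log-uniform for the learning rate (it spans orders of magnitude), uniform integer for layer counts, categorical for activations. The search below draws the learning rate from `config_space["learning_rate"]` via its `rvs()` method.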
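A minimal sketch of such a space, assuming `scipy.stats` distributions. The learning-rate bounds (1e-4 to 1) are an assumption, as are the extra entries in `example_space` and the `activations` list, which are hypothetical and only illustrate the integer and categorical cases.

```python
import numpy as np
from scipy import stats

# Assumed bounds: the range [1e-4, 1] is illustrative, not taken from the text.
config_space = {
    # Log-uniform: equal probability mass per order of magnitude.
    "learning_rate": stats.loguniform(1e-4, 1),
}

# Hypothetical extra dimensions showing the other two kinds of range.
example_space = {
    # Uniform integer on {1, 2, 3, 4}; randint's upper bound is exclusive.
    "num_layers": stats.randint(1, 5),
}
activations = ["relu", "tanh", "sigmoid"]  # categorical choices

rng = np.random.default_rng(0)
sample = {
    "learning_rate": float(config_space["learning_rate"].rvs(random_state=rng)),
    "num_layers": int(example_space["num_layers"].rvs(random_state=rng)),
    # Categorical values are drawn uniformly with NumPy.
    "activation": str(rng.choice(activations)),
}
print(sample)
```

Only `config_space["learning_rate"]` is used by the objective in this section; the other entries would need a model that actually reads them.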
Iterate: draw random config, evaluate, log. Keep the best seen so far. Brutally simple, surprisingly effective:
errors, values = [], []
num_iterations = 5
for i in range(num_iterations):
    learning_rate = config_space["learning_rate"].rvs()
    print(f"Trial {i}: learning_rate = {learning_rate}")
    y = hpo_objective_softmax_classification({"learning_rate": learning_rate})
    print(f" validation_error = {y}")
    values.append(learning_rate)
    errors.append(y)

Trial 0: learning_rate = 0.224069440717892
validation_error = 0.8443000018596649
Trial 1: learning_rate = 0.0019335868822569966
validation_error = 0.8999999985098839
Trial 2: learning_rate = 0.13492200851658212
validation_error = 0.8985999971628189
Trial 3: learning_rate = 0.0007946191990455363
validation_error = 0.8999999985098839
Trial 4: learning_rate = 0.19835234956767864
validation_error = 0.8999999985098839
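The loop records every trial but never extracts the best one. A minimal sketch of that last step, using the learning rates and errors copied from the log above; the names `best_idx` and `incumbent` are introduced here for illustration.

```python
import numpy as np

# Learning rates and validation errors copied from the five logged trials.
values = [0.224069440717892, 0.0019335868822569966, 0.13492200851658212,
          0.0007946191990455363, 0.19835234956767864]
errors = [0.8443000018596649, 0.8999999985098839, 0.8985999971628189,
          0.8999999985098839, 0.8999999985098839]

# Index of the trial with the lowest validation error.
best_idx = int(np.argmin(errors))
best_learning_rate = values[best_idx]
best_error = errors[best_idx]

# Running best: the lowest error seen up to and including each trial.
incumbent = np.minimum.accumulate(np.asarray(errors))

print(f"best trial = {best_idx}, learning_rate = {best_learning_rate}, "
      f"validation_error = {best_error}")
```

For these logged values the first trial stays the incumbent throughout. Errors this close to 0.90 are roughly chance level for ten classes, so even the best logged trial appears to have learned very little.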