Hyperparameter Optimization API

The HPO API Pattern

HPO algorithms have a common structure. The next two decks will swap out pieces (parallel scheduling, multi-fidelity). This deck factors out the common skeleton:

Searcher — proposes the next configuration. Random, Bayesian, evolutionary, …
Scheduler — decides which trials to run, when to stop them, how to allocate compute. Async, Hyperband, ASHA, …
Tuner — runs the loop: ask searcher, ask scheduler, evaluate, log.

Many HPO libraries are organized around a similar decomposition (e.g. Optuna, Syne Tune, Vizier, Ray Tune), though each names the pieces differently.
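To see the division of labor without any d2l machinery, here is a minimal plain-Python sketch of the three roles wired together on a toy one-dimensional objective. The class names (`ToySearcher`, `ToyScheduler`, `ToyTuner`) and the quadratic objective are illustrative assumptions, not part of any library.

```python
import random


class ToySearcher:
    """Proposes configurations: here, uniform random x in [-5, 5]."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def sample_configuration(self):
        return {"x": self.rng.uniform(-5.0, 5.0)}

    def update(self, config, error):
        pass  # random search ignores feedback


class ToyScheduler:
    """FIFO scheduler: runs every suggested trial to completion."""

    def __init__(self, searcher):
        self.searcher = searcher

    def suggest(self):
        return self.searcher.sample_configuration()

    def update(self, config, error):
        self.searcher.update(config, error)


class ToyTuner:
    """Ask the scheduler, evaluate the objective, report back, log."""

    def __init__(self, scheduler, objective):
        self.scheduler = scheduler
        self.objective = objective
        self.records = []

    def run(self, number_of_trials):
        for _ in range(number_of_trials):
            config = self.scheduler.suggest()
            error = self.objective(**config)
            self.scheduler.update(config, error)
            self.records.append((config, error))
        # Return the (config, error) pair with the lowest error
        return min(self.records, key=lambda r: r[1])


def objective(x):
    return (x - 1.0) ** 2  # toy objective with its minimum at x = 1


tuner = ToyTuner(ToyScheduler(ToySearcher(seed=0)), objective)
best_config, best_error = tuner.run(number_of_trials=20)
```

Swapping any one piece (a smarter searcher, an early-stopping scheduler) leaves the other two untouched, which is the point of the split.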

import time
from d2l import torch as d2l
from scipy import stats

Searcher base class

class HPOSearcher(d2l.HyperParameters):
    def sample_configuration(self) -> dict:
        raise NotImplementedError

    def update(self, config: dict, error: float, additional_info=None):
        pass
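The `update` hook is where feedback enters. Random search ignores it, but a searcher that learns from results overrides it. As a hypothetical illustration (simple local search, not a method from these decks), the sketch below perturbs the best learning rate seen so far in log-space. It is plain Python rather than a subclass of `HPOSearcher`, so it runs without d2l; `LocalSearcher` and the toy objective are assumptions.

```python
import math
import random


class LocalSearcher:
    """Hypothetical searcher: perturb the best learning rate seen so far."""

    def __init__(self, initial_lr=0.1, step=0.5, seed=0):
        self.rng = random.Random(seed)
        self.step = step  # std. dev. of the Gaussian perturbation in log-space
        self.best_config = {"learning_rate": initial_lr}
        self.best_error = math.inf

    def sample_configuration(self):
        log_lr = math.log(self.best_config["learning_rate"])
        log_lr += self.rng.gauss(0.0, self.step)
        # Clip to the [1e-2, 1] range used for the CNN later in the deck
        lr = min(max(math.exp(log_lr), 1e-2), 1.0)
        return {"learning_rate": lr}

    def update(self, config, error, additional_info=None):
        # Using feedback is what distinguishes this from random search
        if error < self.best_error:
            self.best_config, self.best_error = config, error


searcher = LocalSearcher(seed=1)
for _ in range(30):
    config = searcher.sample_configuration()
    # Toy objective, best at lr = 0.1 (log10(lr) = -1)
    error = (math.log10(config["learning_rate"]) + 1.0) ** 2
    searcher.update(config, error)
```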

A concrete RandomSearcher:

class RandomSearcher(HPOSearcher):
    def __init__(self, config_space: dict, initial_config=None):
        self.save_hyperparameters()

    def sample_configuration(self) -> dict:
        if self.initial_config is not None:
            result = self.initial_config
            self.initial_config = None
        else:
            result = {
                name: domain.rvs()
                for name, domain in self.config_space.items()
            }
        return result
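The `initial_config` branch means the first trial always evaluates a user-supplied default before random sampling takes over. The sketch below reproduces that logic without d2l or SciPy; `UniformDomain` is a hypothetical stand-in that only mimics the `.rvs()` method of a frozen SciPy distribution.

```python
import random


class UniformDomain:
    """Hypothetical stand-in for a frozen scipy.stats distribution."""

    def __init__(self, low, high, seed=0):
        self.low, self.high = low, high
        self.rng = random.Random(seed)

    def rvs(self):
        return self.rng.uniform(self.low, self.high)


class PlainRandomSearcher:
    def __init__(self, config_space, initial_config=None):
        self.config_space = config_space
        self.initial_config = initial_config

    def sample_configuration(self):
        if self.initial_config is not None:
            # Hand out the user-supplied default exactly once
            result, self.initial_config = self.initial_config, None
        else:
            result = {name: domain.rvs()
                      for name, domain in self.config_space.items()}
        return result


searcher = PlainRandomSearcher({"learning_rate": UniformDomain(1e-2, 1.0)},
                               initial_config={"learning_rate": 0.1})
first = searcher.sample_configuration()   # the initial config
second = searcher.sample_configuration()  # a random draw
```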

Scheduler base class

class HPOScheduler(d2l.HyperParameters):
    def suggest(self) -> dict:
        raise NotImplementedError
    
    def update(self, config: dict, error: float, info=None):
        raise NotImplementedError

Concrete sequential / FIFO scheduler:

class BasicScheduler(HPOScheduler):
    def __init__(self, searcher: HPOSearcher):
        self.save_hyperparameters()

    def suggest(self) -> dict:
        return self.searcher.sample_configuration()

    def update(self, config: dict, error: float, info=None):
        self.searcher.update(config, error, additional_info=info)

Tuner

Combines searcher + scheduler + objective into a single loop:

class HPOTuner(d2l.HyperParameters):
    def __init__(self, scheduler: HPOScheduler, objective: callable):
        self.save_hyperparameters()
        # Bookkeeping results for plotting
        self.incumbent = None
        self.incumbent_error = None
        self.incumbent_trajectory = []
        self.cumulative_runtime = []
        self.current_runtime = 0
        self.records = []

    def run(self, number_of_trials):
        for i in range(number_of_trials):
            start_time = time.time()
            config = self.scheduler.suggest()
            print(f"Trial {i}: config = {config}")
            error = self.objective(**config)
            # The objective may return a (possibly GPU) tensor or a plain float
            if hasattr(error, 'cpu'):
                error = d2l.numpy(error.cpu())
            error = float(error)
            self.scheduler.update(config, error)
            runtime = time.time() - start_time
            self.bookkeeping(config, error, runtime)
            print(f"    error = {error}, runtime = {runtime}")
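Two details of `run` are easy to get wrong: the objective may return a tensor rather than a Python float, and per-trial runtime should come from a clock meant for intervals. The sketch below is a hedged alternative for one tuner step, not the deck's code. It converts any object exposing `.item()` (PyTorch tensors and NumPy scalars both do) and times with `time.perf_counter`, whereas `time.time` follows the system clock and can jump if that clock is adjusted. `as_float`, `run_one_trial` and `FakeScalar` are hypothetical names.

```python
import time


def as_float(value):
    """Convert a scalar tensor, NumPy scalar, or Python number to float."""
    if hasattr(value, "item"):
        value = value.item()
    return float(value)


def run_one_trial(suggest, objective, update):
    """Hypothetical single tuner step: ask, evaluate, tell, and time it."""
    start = time.perf_counter()
    config = suggest()
    error = as_float(objective(**config))
    update(config, error)
    runtime = time.perf_counter() - start
    return config, error, runtime


class FakeScalar:
    """Mimics a 0-d tensor: only exposes .item()."""

    def __init__(self, v):
        self.v = v

    def item(self):
        return self.v


seen = []
config, error, runtime = run_one_trial(
    suggest=lambda: {"learning_rate": 0.1},
    objective=lambda learning_rate: FakeScalar(learning_rate * 2),
    update=lambda c, e: seen.append((c, e)),
)
```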

Bookkeeping

Track wall-clock time and best-seen objective so we can plot any-time performance later:

@d2l.add_to_class(HPOTuner)
def bookkeeping(self, config: dict, error: float, runtime: float):
    self.records.append({"config": config, "error": error, "runtime": runtime})
    # Check if the last hyperparameter configuration performs better 
    # than the incumbent
    if self.incumbent is None or self.incumbent_error > error:
        self.incumbent = config
        self.incumbent_error = error
    # Add current best observed performance to the optimization trajectory
    self.incumbent_trajectory.append(self.incumbent_error)
    # Update runtime
    self.current_runtime += runtime
    self.cumulative_runtime.append(self.current_runtime)
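Stripped of the class, the two lists that `bookkeeping` maintains are a running minimum of the errors and a running sum of the runtimes. A stdlib sketch with made-up numbers (not results from the CNN run):

```python
from itertools import accumulate

errors = [0.52, 0.61, 0.34, 0.40, 0.29]    # illustrative, not measured
runtimes = [50.0, 48.5, 55.0, 51.0, 49.5]  # seconds, illustrative

# Best error seen so far after each trial (non-increasing)
incumbent_trajectory = list(accumulate(errors, min))
# Wall-clock time at which each trial finished
cumulative_runtime = list(accumulate(runtimes))
```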

Tuning a CNN

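Now apply the abstraction to a real model: LeNet, a small CNN, trained on Fashion-MNIST. The objective returns the validation error after training, and we search over learning rate and batch size, starting from a hand-picked initial configuration: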

def hpo_objective_lenet(learning_rate, batch_size, max_epochs=10):
    model = d2l.LeNet(lr=learning_rate, num_classes=10)
    trainer = d2l.HPOTrainer(max_epochs=max_epochs, num_gpus=1)
    data = d2l.FashionMNIST(batch_size=batch_size)
    model.apply_init([next(iter(data.get_dataloader(True)))[0]], d2l.init_cnn)
    trainer.fit(model=model, data=data)
    validation_error = trainer.validation_error()
    return validation_error

config_space = {
    "learning_rate": stats.loguniform(1e-2, 1),
    "batch_size": stats.randint(32, 256),
}
initial_config = {
    "learning_rate": 0.1,
    "batch_size": 128,
}
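`stats.loguniform(1e-2, 1)` samples uniformly in log-space, so each decade (0.01 to 0.1, 0.1 to 1) receives equal probability; a plain uniform distribution on the same interval would put about 90% of draws above 0.1. `stats.randint(32, 256)` excludes its upper bound, so batch sizes range over 32 to 255. A stdlib sketch of the same two distributions (the helper names are illustrative, not SciPy's):

```python
import math
import random

rng = random.Random(0)


def sample_loguniform(low, high):
    """Uniform in log-space: exp(U(log low, log high))."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_randint(low, high):
    """Integer in [low, high), matching scipy.stats.randint's half-open range."""
    return rng.randrange(low, high)


lrs = [sample_loguniform(1e-2, 1.0) for _ in range(10_000)]
batch_sizes = [sample_randint(32, 256) for _ in range(10_000)]

# Fraction of learning rates in the lower decade [0.01, 0.1): close to 0.5
frac_lower_decade = sum(lr < 0.1 for lr in lrs) / len(lrs)
```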

Run + results

The incumbent curve reports the best validation error found so far as the tuner spends more wall-clock time. Downward steps mean a new configuration beat the previous best; flat regions mean the search is still evaluating but has not improved the incumbent.
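The downward steps can be read off directly from the two bookkeeping lists. A hedged stdlib sketch: `improvement_points` is a hypothetical helper that returns the (time, error) pairs at which a new incumbent appeared; the numbers are illustrative.

```python
def improvement_points(cumulative_runtime, incumbent_trajectory):
    """Return (time, error) pairs at which the incumbent strictly improved."""
    points = []
    best = float("inf")
    for t, err in zip(cumulative_runtime, incumbent_trajectory):
        if err < best:
            points.append((t, err))
            best = err
    return points


times = [50.0, 98.5, 153.5, 204.5, 254.0]   # illustrative finish times (s)
trajectory = [0.52, 0.52, 0.34, 0.34, 0.29]  # running minimum of the error
steps = improvement_points(times, trajectory)
```

Every pair in `steps` is a downward step on the plot; the trials between them fall in the flat regions.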

searcher = RandomSearcher(config_space, initial_config=initial_config)
scheduler = BasicScheduler(searcher=searcher)
tuner = HPOTuner(scheduler=scheduler, objective=hpo_objective_lenet)
tuner.run(number_of_trials=5)

    error = 0.7420397996902466, runtime = 53.76104807853699

board = d2l.ProgressBoard(xlabel="time", ylabel="error")
for time_stamp, error in zip(
    tuner.cumulative_runtime, tuner.incumbent_trajectory
):
    board.draw(time_stamp, error, "random search", every_n=1)

Recap

HPO library skeleton: searcher + scheduler + tuner.
The next decks plug in:
- Async random search — parallel workers without waiting.
- Successive halving / ASHA — early-stopping bad trials based on partial training curves.
Compare algorithms on any-time performance plots (best error seen so far vs. wall-clock time), not just on final accuracy.
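Comparing on any-time performance means asking, for each time budget t, what the best error each method had found by t. A minimal sketch, assuming each run is summarized by its `cumulative_runtime` and `incumbent_trajectory` lists; `incumbent_at` and both runs are hypothetical, not results from these decks.

```python
import bisect


def incumbent_at(budget, cumulative_runtime, incumbent_trajectory):
    """Best error found by wall-clock time `budget` (None if no trial finished)."""
    # Number of trials that finished at or before the budget
    k = bisect.bisect_right(cumulative_runtime, budget)
    return incumbent_trajectory[k - 1] if k > 0 else None


# Two hypothetical runs: (finish times in s, running-minimum error)
run_a = ([50.0, 100.0, 150.0], [0.40, 0.30, 0.30])
run_b = ([30.0, 60.0, 90.0, 120.0, 150.0], [0.55, 0.45, 0.35, 0.33, 0.25])

for budget in (20.0, 75.0, 120.0, 150.0):
    a, b = incumbent_at(budget, *run_a), incumbent_at(budget, *run_b)
    print(f"t={budget:>6}: A={a}, B={b}")
```

In this made-up example the ranking flips with the budget: A is ahead at 75 s and 120 s, B at 150 s. A comparison of final results alone would hide that.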