Objective with simulated wall time

Asynchronous Random Search

Random search is embarrassingly parallel: each trial is independent of the others. With $K$ machines, you would hope for a $K\times$ speedup. But synchronous parallelism wastes time on stragglers: every batch of $K$ trials waits for its slowest member.

Asynchronous random search keeps every worker busy: as soon as a worker finishes a trial, it is given a new configuration. No worker waits at a batch barrier, so total wall-clock time scales much better with the number of workers.
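
The difference can be illustrated with a small back-of-the-envelope simulation. This is only a sketch: the trial runtimes are made up, and `sync_makespan` and `async_makespan` are hypothetical helper names, not part of any library.

```python
import heapq

def sync_makespan(durations, n_workers):
    """Wall-clock time when trials run in batches separated by a barrier."""
    total = 0.0
    for i in range(0, len(durations), n_workers):
        # Each batch takes as long as its slowest trial
        total += max(durations[i:i + n_workers])
    return total

def async_makespan(durations, n_workers):
    """Wall-clock time when a free worker immediately takes the next trial."""
    free_at = [0.0] * n_workers  # time at which each worker becomes free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + d)
    return max(free_at)

durations = [1, 5, 1, 5, 1, 5, 1, 5]  # made-up trial runtimes
print(sync_makespan(durations, 2))   # 20.0
print(async_makespan(durations, 2))  # 13.0
```

With two workers and alternating short and long trials, the synchronous schedule pays for the long trial in every batch, while the asynchronous schedule lets one worker run through the short trials in the meantime.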

Sync vs async parallel HPO: async avoids idle workers when trials finish at different times.

This deck wires up an async scheduler around the API abstraction from the previous deck.

Setup

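The objective trains LeNet on Fashion-MNIST and reports the validation error after every epoch. Its runtime depends on the hyperparameters (for example, the batch size changes the time per epoch), which exposes the straggler problem clearly: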

def hpo_objective_lenet_synetune(learning_rate, batch_size, max_epochs):
    # Imports live inside the function because the backend runs it
    # as a standalone script in a separate process.
    from d2l import torch as d2l
    from syne_tune import Reporter

    model = d2l.LeNet(lr=learning_rate, num_classes=10)
    trainer = d2l.HPOTrainer(max_epochs=1, num_gpus=1)
    data = d2l.FashionMNIST(batch_size=batch_size)
    model.apply_init([next(iter(data.get_dataloader(True)))[0]], d2l.init_cnn)
    report = Reporter()
    for epoch in range(1, max_epochs + 1):
        if epoch == 1:
            # Initialize the state of Trainer
            trainer.fit(model=model, data=data)
        else:
            trainer.fit_epoch()
        validation_error = d2l.numpy(trainer.validation_error().cpu())
        # Send the metric back to the tuner after every epoch
        report(epoch=epoch, validation_error=float(validation_error))

Async scheduler

Maintain a worker pool; on each tick, dispatch new trials to free workers; collect completed trial results asynchronously:

n_workers = 2  # Needs to be <= the number of available GPUs

max_wallclock_time = 2 * 60  # 2 minutes
mode = "min"
metric = "validation_error"
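
The worker-pool idea described on this slide can be sketched with the standard library alone. This is a minimal illustration, not how Syne Tune implements it: `toy_objective`, `sample_config`, and `async_random_search` are hypothetical names, and the objective merely sleeps to mimic a training run of variable length.

```python
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def toy_objective(config):
    """Hypothetical stand-in for a training run; sleeps to mimic runtime."""
    time.sleep(config["runtime"])
    return (config["x"] - 0.3) ** 2

def sample_config(rng):
    """Draw a random configuration (made-up search space)."""
    return {"x": rng.random(), "runtime": rng.uniform(0.01, 0.05)}

def async_random_search(n_workers, n_trials, seed=0):
    rng = random.Random(seed)
    results = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = {}  # future -> config of the trial it is running
        submitted = 0
        # Fill every worker once
        for _ in range(min(n_workers, n_trials)):
            cfg = sample_config(rng)
            pending[pool.submit(toy_objective, cfg)] = cfg
            submitted += 1
        while pending:
            # Wake up as soon as any single trial finishes
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                results.append((pending.pop(fut), fut.result()))
                # Refill the freed worker right away
                if submitted < n_trials:
                    cfg = sample_config(rng)
                    pending[pool.submit(toy_objective, cfg)] = cfg
                    submitted += 1
    return results

results = async_random_search(n_workers=2, n_trials=6)
best_config, best_value = min(results, key=lambda r: r[1])
print(best_config, best_value)
```

The key design choice is `return_when=FIRST_COMPLETED`: the loop never waits for a whole batch, only for the next free worker.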

Scheduler (cont.)

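from syne_tune.backend.python_backend import PythonBackend
from syne_tune.config_space import loguniform, randint

# Search space: learning rate on a log scale, integer batch size.
# max_epochs is a constant that is passed through to the objective.
config_space = {
    "learning_rate": loguniform(1e-2, 1),
    "batch_size": randint(32, 256),
    "max_epochs": 10,
}
# First configuration to evaluate before random sampling starts
initial_config = {
    "learning_rate": 0.1,
    "batch_size": 128,
}
# Runs the objective function in separate local Python processes
trial_backend = PythonBackend(
    tune_function=hpo_objective_lenet_synetune,
    config_space=config_space,
)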
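
To make the search space above concrete, here is a rough pure-Python sketch of what drawing from it amounts to. It does not use Syne Tune's sampler; `sample_loguniform` and `sample_randint` are hypothetical helpers, and the inclusive upper bound for the integer range is an assumption.

```python
import math
import random

def sample_loguniform(rng, lower, upper):
    """Draw a value whose logarithm is uniform on [log(lower), log(upper)]."""
    return math.exp(rng.uniform(math.log(lower), math.log(upper)))

def sample_randint(rng, lower, upper):
    """Draw an integer uniformly from lower..upper (assumed inclusive)."""
    return rng.randint(lower, upper)

rng = random.Random(0)
samples = [
    {
        "learning_rate": sample_loguniform(rng, 1e-2, 1),
        "batch_size": sample_randint(rng, 32, 256),
        "max_epochs": 10,  # constant, not sampled
    }
    for _ in range(5)
]
for s in samples:
    print(s)
```

Sampling the learning rate in log space gives the interval from 0.01 to 0.1 the same probability mass as the interval from 0.1 to 1, which is usually what one wants for a scale parameter.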

Wiring up the loop

from syne_tune import StoppingCriterion, Tuner
from syne_tune.optimizer.baselines import RandomSearch

scheduler = RandomSearch(
    config_space,
    metric=metric,
    mode=mode,
    points_to_evaluate=[initial_config],
)
# Stop the experiment once the wall-clock budget is used up
stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)

tuner = Tuner(
    trial_backend=trial_backend,
    scheduler=scheduler,
    stop_criterion=stop_criterion,
    n_workers=n_workers,
    print_update_interval=int(max_wallclock_time * 0.6),
)

Loop (cont.)

Run the tuner, then load the experiment results. The raw Syne Tune logs contain local paths and backend commands, so the slide keeps the plot-producing analysis cell instead of the console transcript.

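from syne_tune.experiments import load_experiment

tuner.run()  # blocks until the stopping criterion is met
d2l.set_figsize()
tuning_experiment = load_experiment(tuner.name)
tuning_experiment.plot()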

Wall-clock advantage

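Plot each trial's validation error against the tuner's wall-clock time. The learning curves of different trials start and end at different times: a worker picks up a new configuration as soon as its previous trial finishes, instead of sitting idle while waiting for stragglers: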

d2l.set_figsize([6, 2.5])
results = tuning_experiment.results

for trial_id in results.trial_id.unique():
    df = results[results["trial_id"] == trial_id]
    d2l.plt.plot(
        df["st_tuner_time"],
        df["validation_error"],
        marker="o",
    )

d2l.plt.xlabel("wall-clock time")
d2l.plt.ylabel("objective function")
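
A best-seen-so-far curve can be derived from the same two columns. The following is a minimal sketch in plain Python; `best_so_far` is a hypothetical helper, and the numbers are illustrative values rather than results from the experiment.

```python
from itertools import accumulate

def best_so_far(times, errors):
    """Return times in ascending order and the lowest error seen up to each."""
    ordered = sorted(zip(times, errors))
    sorted_times = [t for t, _ in ordered]
    # Running minimum over the errors, in time order
    incumbent = list(accumulate((e for _, e in ordered), min))
    return sorted_times, incumbent

# Illustrative values only, not measured results
times = [30.0, 10.0, 20.0, 40.0]
errors = [0.25, 0.40, 0.50, 0.30]
print(best_so_far(times, errors))
# ([10.0, 20.0, 30.0, 40.0], [0.4, 0.4, 0.25, 0.25])
```

Applied to the experiment, the inputs would presumably be `results["st_tuner_time"]` and `results["validation_error"]`, and the returned lists could be passed to `d2l.plt.plot`.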

Recap

  • Async random search draws configurations the same way as sync random search, so the two are statistically equivalent, but async uses wall-clock time far more efficiently.
  • The skeleton (worker pool, dispatch on availability) generalizes to any HPO algorithm, not just random search.
  • Production HPO libraries (Syne Tune, Ray Tune) make async the default.