Synchronous successive halving has a problem: at each rung, you wait for all surviving configs to finish before promoting any of them. With multiple workers, the workers whose configs finish first sit idle while they wait for stragglers.
Synchronous SH: workers idle while waiting for slow trials at the rung boundary.
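To make the cost of the rung barrier concrete, here is a minimal sketch that adds up worker idle time when every worker waits for the slowest trial. The durations are made-up illustrative numbers, `idle_time_at_barrier` is a hypothetical helper rather than part of any library, and one trial per worker is assumed.

```python
def idle_time_at_barrier(durations):
    """Total idle time when all workers wait for the slowest trial at a rung.

    durations: wall-clock time of the single trial each worker ran
    (assumption: one trial per worker, all started at the same moment).
    """
    barrier = max(durations)  # the rung closes when the straggler finishes
    return sum(barrier - d for d in durations)


# Four workers, one straggler (illustrative values, in seconds).
durations = [30, 35, 40, 90]
idle = idle_time_at_barrier(durations)
busy = sum(durations)
print(f"idle: {idle}s, busy: {busy}s, utilization: {busy / (busy + idle):.2f}")
```

In this toy setting, a single slow trial leaves the other three workers unused for most of the rung, which is the gap asynchronous promotion is meant to close.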
ASHA (Asynchronous Successive Halving) fixes this: promote configs to the next rung the moment they qualify, without waiting for the rest of the cohort. Workers always have something to do.
ASHA delivers near state-of-the-art HPO performance with a single mechanism that handles both early stopping and parallel dispatch.
ASHA Setup
Import Syne Tune’s ASHA scheduler and reuse the same objective family as the random-search deck. The only new idea is that max_epochs becomes the resource budget:
from d2l import torch as d2l
import logging
logging.basicConfig(level=logging.INFO)
import matplotlib.pyplot as plt
from syne_tune.config_space import loguniform, randint
from syne_tune.backend.python_backend.python_backend import PythonBackend
from syne_tune.optimizer.baselines import ASHA
from syne_tune import Tuner, StoppingCriterion
from syne_tune.experiments import load_experiment
pip install 'syne-tune[aws]'
or (for everything)
pip install 'syne-tune[extra]'
pip install 'syne-tune[raytune]'
or (for everything)
pip install 'syne-tune[extra]'
Objective with Epochs as Budget
The objective reports validation error after every epoch, so the scheduler can decide whether to stop, continue, or promote a trial without waiting for full training:
def hpo_objective_lenet_synetune(learning_rate, batch_size, max_epochs):
    from d2l import torch as d2l
    from syne_tune import Reporter

    model = d2l.LeNet(lr=learning_rate, num_classes=10)
    trainer = d2l.HPOTrainer(max_epochs=1, num_gpus=1)
    data = d2l.FashionMNIST(batch_size=batch_size)
    model.apply_init([next(iter(data.get_dataloader(True)))[0]], d2l.init_cnn)
    report = Reporter()
    for epoch in range(1, max_epochs + 1):
        if epoch == 1:
            # Initialize the state of Trainer
            trainer.fit(model=model, data=data)
        else:
            trainer.fit_epoch()
        validation_error = d2l.numpy(trainer.validation_error().cpu())
        report(epoch=epoch, validation_error=float(validation_error))
ASHA Configuration
Rung budgets grow geometrically:
r_i = r_{\min}\eta^i,\quad r_i \le r_{\max}.
With \eta=2, roughly half the trials advance at each rung and survivors receive twice as much training budget:
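As a quick check of the geometric schedule, the sketch below enumerates the rung budgets r_i = r_{\min}\eta^i that stay at or below r_{\max}. The helper `rung_levels` is hypothetical, and r_{\min}=1 is an assumption; r_{\max}=10 matches the maximum resource level used in this deck. A library scheduler may treat the final level differently (for example by adding r_{\max} itself as a last rung), so treat this as the formula only.

```python
def rung_levels(r_min, r_max, eta):
    """Rung budgets r_i = r_min * eta**i for all i with r_i <= r_max."""
    levels = []
    r = r_min
    while r <= r_max:
        levels.append(r)
        r *= eta
    return levels


# Assumed values: r_min = 1 epoch, r_max = 10 epochs, eta = 2.
print(rung_levels(r_min=1, r_max=10, eta=2))  # [1, 2, 4, 8]
```

With these values a surviving config is evaluated after 1, 2, 4, and 8 epochs, doubling its budget each time it is promoted.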
Per-rung “halving books”: each book holds the configs that have completed up to budget r_i. When a worker frees up, look across all rungs for any config that qualifies for promotion (top 1/\eta at its rung); if none qualifies, sample a fresh config.
Promotion rule: after at least \eta trials are observed at rung i, promote a config only if it is in the best \lfloor n_i/\eta \rfloor scores at that rung.
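The bookkeeping and promotion rule above can be sketched in plain Python. This is a minimal illustration of promotion-based ASHA, not Syne Tune's implementation: the class `AshaBooks` and its methods are hypothetical names, lower validation error is assumed to be better, and rungs are scanned from the highest promotable one downward.

```python
from collections import defaultdict


class AshaBooks:
    """Per-rung result books plus the top-1/eta promotion rule (sketch)."""

    def __init__(self, eta=2, num_rungs=4):
        self.eta = eta
        self.num_rungs = num_rungs
        self.scores = defaultdict(dict)   # rung -> {config_id: validation_error}
        self.promoted = defaultdict(set)  # rung -> ids already promoted from it

    def record(self, rung, config_id, error):
        """Store the score a config reached at the given rung."""
        self.scores[rung][config_id] = error

    def next_job(self):
        """Return (config_id, target_rung) for a promotion.

        Returns None when nothing qualifies, which tells the caller to
        sample a fresh config and start it at rung 0.
        """
        # The top rung is terminal, so start one below it.
        for rung in range(self.num_rungs - 2, -1, -1):
            results = self.scores[rung]
            k = len(results) // self.eta  # floor(n_i / eta); 0 while n_i < eta
            ranked = sorted(results, key=results.get)  # lowest error first
            for config_id in ranked[:k]:
                if config_id not in self.promoted[rung]:
                    self.promoted[rung].add(config_id)
                    return config_id, rung + 1
        return None


books = AshaBooks(eta=2, num_rungs=3)
books.record(0, "a", 0.40)
print(books.next_job())  # None: one result at rung 0 is not enough
books.record(0, "b", 0.25)
print(books.next_job())  # ('b', 1): best of two, promoted without waiting
```

Note that "b" is promoted as soon as two results exist at rung 0. A later, better config can still be promoted from the same rung once the book grows, which is how the asynchronous variant avoids a barrier at the cost of occasionally promoting a config that would not have survived a full cohort.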
n_workers = 2  # Needs to be <= the number of available GPUs
max_wallclock_time = 2 * 60  # 2 minutes
INFO:syne_tune.optimizer.schedulers.fifo:max_resource_level = 10, as inferred from config_space
INFO:syne_tune.optimizer.schedulers.fifo:Master random_seed = 3592025114
ASHA scheduler (cont.)
The tuner launch is intentionally omitted from the lecture slide: its console output is backend bookkeeping, not a conceptual step. The relevant result is the incumbent curve loaded from the completed experiment.