Asynchronous Successive Halving

Synchronous successive halving has a problem: at each rung, you wait for all surviving configs to finish before promoting any of them. With multiple workers, those that finish their trials early sit idle while waiting for stragglers.

Synchronous SH: workers idle while waiting for slow trials at the rung boundary.
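To put a number on that idle time, here is a minimal sketch that schedules one rung's trials onto workers and adds up how long each worker waits for the rung to close. The function name `rung_idle_time` and the trial durations are hypothetical, chosen only for illustration.

```python
import heapq


def rung_idle_time(durations, n_workers):
    """Total worker idle time at one synchronous rung.

    Each trial goes to whichever worker frees up first; the rung
    ends only when the slowest worker is done.
    """
    finish = [0.0] * n_workers  # time at which each worker becomes free
    heapq.heapify(finish)
    for d in durations:
        start = heapq.heappop(finish)  # earliest available worker
        heapq.heappush(finish, start + d)
    makespan = max(finish)  # the rung boundary
    return sum(makespan - t for t in finish)


# Hypothetical durations (minutes): three fast trials and one straggler
idle = rung_idle_time([1.0, 1.0, 1.0, 5.0], n_workers=2)
print(idle)  # 4.0
```

With two workers, one of them finishes its share after 2 minutes and then waits 4 minutes for the straggler. That waiting is the time ASHA hands back to the scheduler.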

ASHA (Asynchronous Successive Halving) fixes this: promote configs to the next rung the moment they qualify, without waiting for the rest of the cohort. Workers always have something to do.

ASHA: free worker, qualified config, instant promotion.

Result: state-of-the-art HPO performance from a single mechanism that handles both early stopping and parallel dispatch.

ASHA Setup

Import Syne Tune’s ASHA scheduler and reuse the same objective family as the random-search deck. The only new idea is that max_epochs becomes the resource budget:

from d2l import torch as d2l
import logging
logging.basicConfig(level=logging.INFO)
import matplotlib.pyplot as plt
from syne_tune.config_space import loguniform, randint
from syne_tune.backend.python_backend.python_backend import PythonBackend
from syne_tune.optimizer.baselines import ASHA
from syne_tune import Tuner, StoppingCriterion
from syne_tune.experiments import load_experiment

Objective with Epochs as Budget

The objective reports validation error after every epoch, so the scheduler can decide whether to stop, continue, or promote a trial without waiting for full training:

def hpo_objective_lenet_synetune(learning_rate, batch_size, max_epochs):
    from d2l import torch as d2l
    from syne_tune import Reporter

    model = d2l.LeNet(lr=learning_rate, num_classes=10)
    trainer = d2l.HPOTrainer(max_epochs=1, num_gpus=1)
    data = d2l.FashionMNIST(batch_size=batch_size)
    model.apply_init([next(iter(data.get_dataloader(True)))[0]], d2l.init_cnn)
    report = Reporter()
    for epoch in range(1, max_epochs + 1):
        if epoch == 1:
            # Initialize the state of Trainer
            trainer.fit(model=model, data=data)
        else:
            trainer.fit_epoch()
        validation_error = d2l.numpy(trainer.validation_error().cpu())
        report(epoch=epoch, validation_error=float(validation_error))

ASHA Configuration

Rung budgets grow geometrically:

r_i = r_{\min}\eta^i,\quad r_i \le r_{\max}.

With \eta=2, roughly half the trials advance at each rung and survivors receive twice as much training budget:

min_number_of_epochs = 2
max_number_of_epochs = 10
eta = 2

config_space = {
    "learning_rate": loguniform(1e-2, 1),
    "batch_size": randint(32, 256),
    "max_epochs": max_number_of_epochs,
}
initial_config = {
    "learning_rate": 0.1,
    "batch_size": 128,
}
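The rung budgets implied by these settings can be checked with a few lines of plain Python. This is a sketch of the formula r_i = r_min * eta^i only. The helper `rung_levels` is a hypothetical name, not part of Syne Tune.

```python
def rung_levels(r_min, r_max, eta):
    """Budgets r_i = r_min * eta**i that do not exceed r_max."""
    levels = []
    r = r_min
    while r <= r_max:
        levels.append(r)
        r *= eta
    return levels


levels = rung_levels(r_min=2, r_max=10, eta=2)
print(levels)  # [2, 4, 8]
```

So decisions are made after 2, 4, and 8 epochs, and no trial trains past 10 epochs. How a particular library handles the gap between the last rung (8) and r_max (10) is implementation-specific.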

ASHA scheduler

ASHA keeps per-rung “halving books”: the book for rung i holds the configs that have completed budget r_i, together with their scores. When a worker frees up, the scheduler looks across all rungs for a config that qualifies for promotion (top 1/\eta at its rung and not yet promoted); if there is none, it samples a fresh config.

Promotion rule: let n_i be the number of trials observed at rung i. Once n_i \ge \eta, promote a config only if its score is among the best \lfloor n_i/\eta \rfloor at that rung.
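The bookkeeping described above can be sketched in plain Python. This illustrates the promotion rule and is not Syne Tune's implementation. The names `promotable`, `next_job`, and `sample_config` are hypothetical, lower scores are assumed to be better, and scanning from the top rung down is one common choice.

```python
def promotable(rung_scores, already_promoted, eta):
    """Configs at one rung that qualify for promotion (lower score is better)."""
    n = len(rung_scores)
    if n < eta:  # too few observations to rank
        return []
    k = n // eta  # floor(n_i / eta)
    ranked = sorted(rung_scores, key=rung_scores.get)
    return [c for c in ranked[:k] if c not in already_promoted]


def next_job(rungs, promoted, eta, sample_config):
    """Choose work for a free worker: promote if possible, else start fresh.

    rungs[i] maps config -> score at rung i; promoted[i] is the set of
    configs already promoted out of rung i. Returns (config, target_rung).
    """
    for i in reversed(range(len(rungs) - 1)):  # top rung has nowhere to go
        candidates = promotable(rungs[i], promoted[i], eta)
        if candidates:
            promoted[i].add(candidates[0])
            return candidates[0], i + 1
    return sample_config(), 0


# Hypothetical state: four configs seen at rung 0, "b" already promoted
rungs = [{"a": 0.30, "b": 0.25, "c": 0.40, "d": 0.35}, {"b": 0.20}, {}]
promoted = [{"b"}, set(), set()]
first = next_job(rungs, promoted, eta=2, sample_config=lambda: "new")
second = next_job(rungs, promoted, eta=2, sample_config=lambda: "new")
print(first, second)  # ('a', 1) ('new', 0)
```

The first call promotes "a", the second-best config at rung 0, without waiting for any other trial. The second call finds nothing left to promote and falls back to a fresh config, so the worker is never left idle.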

n_workers = 2  # Needs to be <= the number of available GPUs
max_wallclock_time = 2 * 60  # 2 minutes
mode = "min"
metric = "validation_error"
resource_attr = "epoch"

scheduler = ASHA(
    config_space,
    metric=metric,
    mode=mode,
    points_to_evaluate=[initial_config],
    max_resource_attr="max_epochs",
    resource_attr=resource_attr,
    grace_period=min_number_of_epochs,
    reduction_factor=eta,
)
INFO:syne_tune.optimizer.schedulers.fifo:max_resource_level = 10, as inferred from config_space
INFO:syne_tune.optimizer.schedulers.fifo:Master random_seed = 3592025114

ASHA scheduler (cont.)

The tuner launch is intentionally omitted from the lecture slide: its console output is backend bookkeeping, not a conceptual step. The relevant result is the incumbent curve loaded from the completed experiment.

d2l.set_figsize()
e = load_experiment(tuner.name)
e.plot()
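For readers unfamiliar with the term, an incumbent curve tracks the best validation error found so far as a function of wall-clock time. The sketch below computes one from (time, error) pairs. The helper `incumbent_curve` and the sample observations are made up for illustration and are not Syne Tune's plotting code or output.

```python
def incumbent_curve(observations):
    """Best-so-far error over time from (time, error) pairs."""
    best = float("inf")
    curve = []
    for t, err in sorted(observations):  # process in time order
        best = min(best, err)
        curve.append((t, best))
    return curve


# Hypothetical reports from several trials, in arbitrary order
obs = [(30.0, 0.40), (10.0, 0.55), (20.0, 0.60), (40.0, 0.35)]
curve = incumbent_curve(obs)
print(curve)  # [(10.0, 0.55), (20.0, 0.55), (30.0, 0.40), (40.0, 0.35)]
```

The curve never increases: a worse report, such as the 0.60 at time 20, leaves the incumbent unchanged.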

Running ASHA

d2l.set_figsize([6, 2.5])
results = e.results
for trial_id in results.trial_id.unique():
    df = results[results["trial_id"] == trial_id]
    d2l.plt.plot(
        df["st_tuner_time"],
        df["validation_error"],
        marker="o"
    )
d2l.plt.xlabel("wall-clock time")
d2l.plt.ylabel("objective function")


Recap

  • ASHA = SH with no rung-boundary wait — promote qualified configs as soon as they qualify.
  • Workers stay busy, so throughput scales much closer to linearly with the number of workers than under synchronous SH.
  • Practical default for large-scale HPO. Syne Tune and Ray Tune both ship ASHA out of the box.