Asynchronous Successive Halving

Synchronous successive halving has a bottleneck: at each rung, it waits for every surviving config to finish before promoting any of them. With multiple workers, those that finish their trials early sit idle until the stragglers complete.

Synchronous SH: workers idle while waiting for slow trials at the rung boundary.
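
To make the idle time concrete, the sketch below estimates how long workers wait at a synchronous rung boundary. It assumes trials are handed greedily to the earliest-free worker and that nothing is promoted until the slowest trial ends; the function name and the example durations are made up for illustration.

```python
import heapq

def sync_rung_idle_time(durations, n_workers):
    """Total idle worker time before a synchronous rung barrier is reached."""
    # Min-heap holding the time at which each worker becomes free.
    finish = [0.0] * n_workers
    heapq.heapify(finish)
    for d in durations:
        t = heapq.heappop(finish)       # earliest-free worker takes the trial
        heapq.heappush(finish, t + d)
    barrier = max(finish)               # the rung ends when the slowest trial ends
    return sum(barrier - t for t in finish)

# Three fast trials and one straggler on four workers.
print(sync_rung_idle_time([1, 1, 1, 4], n_workers=4))  # 9.0
```

Three of the four workers sit idle for three time units each, which is the waste that asynchronous promotion removes.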

ASHA (Asynchronous Successive Halving) fixes this: promote configs to the next rung the moment they qualify, without waiting for the rest of the cohort. Workers always have something to do.

ASHA: free worker, qualified config, instant promotion.

The payoff: HPO performance competitive with the state of the art, from a single mechanism that handles both early stopping and parallel dispatch.

ASHA Setup

Import Syne Tune’s ASHA scheduler and reuse the same objective family as the random-search deck. The only new idea is that max_epochs becomes the resource budget:

from d2l import torch as d2l
import logging
# Use INFO level so the periodic Syne Tune tuning-status table appears,
# but use a clean format that drops the "INFO:syne_tune.tuner:" prefix.
logging.basicConfig(level=logging.INFO, format="%(message)s", force=True)
import matplotlib.pyplot as plt
# Silence Syne Tune's import-time chatter about optional AWS dependencies
# (sagemaker, s3fs) and Ray Tune. We use the local PythonBackend, so those
# are not needed. Suppress both print() and logging.info() during imports.
import contextlib, io
_root = logging.getLogger()
_prev_level = _root.level
_root.setLevel(logging.WARNING)
try:
    with contextlib.redirect_stdout(io.StringIO()):
        from syne_tune.config_space import loguniform, randint
        from syne_tune.backend.python_backend.python_backend import PythonBackend
        from syne_tune.optimizer.baselines import ASHA
        from syne_tune import Tuner, StoppingCriterion
        from syne_tune.experiments import load_experiment
finally:
    _root.setLevel(_prev_level)

# Silence the per-trial subprocess-command spam from local_backend and
# drop the per-trial scheduling / completion lines from the tuner logger.
# Keep the periodic "tuning status (last metric is reported)" updates so
# the reader can still see progress over time.
class _DropPerTrialNoise(logging.Filter):
    _DROP = (
        "results of trials will be saved",
        "scheduled ",
        "Trial trial_id ",
    )
    def filter(self, record):
        msg = record.getMessage()
        return not any(s in msg for s in self._DROP)

logging.getLogger("syne_tune.backend.local_backend").setLevel(logging.WARNING)
logging.getLogger("syne_tune.tuner").addFilter(_DropPerTrialNoise())

Objective with Epochs as Budget

The objective reports validation error after every epoch, so the scheduler can decide whether to stop, continue, or promote a trial without waiting for full training:

def hpo_objective_lenet_synetune(learning_rate, batch_size, max_epochs):
    from d2l import torch as d2l
    from syne_tune import Reporter

    model = d2l.LeNet(lr=learning_rate, num_classes=10)
    trainer = d2l.HPOTrainer(max_epochs=1, num_gpus=1)
    data = d2l.FashionMNIST(batch_size=batch_size)
    model.apply_init([next(iter(data.get_dataloader(True)))[0]], d2l.init_cnn)
    report = Reporter()
    for epoch in range(1, max_epochs + 1):
        if epoch == 1:
            # Initialize the state of Trainer
            trainer.fit(model=model, data=data)
        else:
            trainer.fit_epoch()
        validation_error = d2l.numpy(trainer.validation_error().cpu())
        report(epoch=epoch, validation_error=float(validation_error))

ASHA Configuration

Rung budgets grow geometrically:

r_i = r_{\min}\eta^i,\quad r_i \le r_{\max}.

With \eta=2, roughly half the trials advance at each rung and survivors receive twice as much training budget:

min_number_of_epochs = 2
max_number_of_epochs = 10
eta = 2

config_space = {
    "learning_rate": loguniform(1e-2, 1),
    "batch_size": randint(32, 256),
    "max_epochs": max_number_of_epochs,
}
initial_config = {
    "learning_rate": 0.1,
    "batch_size": 128,
}
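
The geometric rung schedule can be checked by hand for these settings. The sketch below applies the formula r_i = r_min * eta^i with r_i <= r_max literally; the helper name rung_levels is introduced here, and a real scheduler may treat the final level (r_max itself) differently.

```python
def rung_levels(r_min, eta, r_max):
    """Rung budgets r_min * eta**i that do not exceed r_max."""
    levels = []
    r = r_min
    while r <= r_max:
        levels.append(r)
        r *= eta
    return levels

levels = rung_levels(r_min=2, eta=2, r_max=10)
print(levels)  # [2, 4, 8]

# Idealized halving of a cohort of 8 trials: (epochs, trials reaching that rung).
n_trials, eta = 8, 2
print([(r, n_trials // eta**i) for i, r in enumerate(levels)])
# [(2, 8), (4, 4), (8, 2)]
```

With a minimum of 2 epochs and a maximum of 10, the formula gives rungs at 2, 4, and 8 epochs.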

ASHA scheduler

ASHA keeps one “halving book” per rung: each book holds the configs that have completed training up to budget r_i. When a worker frees up, the scheduler looks across all rungs for a config that qualifies for promotion (top 1/\eta at its rung); if none qualifies, it samples a fresh config and starts it at the lowest rung.

Promotion rule: once at least \eta trials have been observed at rung i, promote a config only if its score is among the best \lfloor n_i/\eta \rfloor at that rung, where n_i is the number of trials observed there so far.
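
The promotion rule can be sketched in a few lines of plain Python. This is an illustration of the rule as stated, not Syne Tune's implementation: the data layout (one dict of scores per rung, one set of already-promoted ids per rung) and the name find_promotable are assumptions made here, and lower scores are taken to be better.

```python
def find_promotable(rungs, promoted, eta):
    """Return (rung_index, config_id) for a config that may be promoted, else None.

    rungs[i]    : dict config_id -> score recorded at rung i (lower is better)
    promoted[i] : set of config_ids already promoted out of rung i
    """
    # The top rung has nowhere to go; search the remaining rungs from the top down.
    for i in range(len(rungs) - 2, -1, -1):
        scores = rungs[i]
        k = len(scores) // eta          # zero until eta results have arrived
        best = sorted(scores, key=scores.get)[:k]
        for cfg in best:
            if cfg not in promoted[i]:
                return i, cfg
    return None                         # caller samples a fresh config instead

rungs = [{"a": 0.30, "b": 0.25, "c": 0.40, "d": 0.20}, {"d": 0.15}, {}]
promoted = [{"d"}, set(), set()]
print(find_promotable(rungs, promoted, eta=2))  # (0, 'b')
```

Config d has already moved up, so the free worker picks up b, the next-best config in the top half of rung 0. No cohort barrier is involved: the decision uses whatever results exist at that moment.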

# Each LeNet trial fits in well under 7 GB of GPU memory, so we can pack
# multiple trials per device. `PythonBackend(rotate_gpus=True)` (the
# default) round-robins trials across detected GPUs and falls back to
# sharing when `n_workers > num_gpus`. Allocate 7 GB per slot — this
# yields 3 slots on a 24 GB card and 4 slots on a 32 GB card after
# driver overhead, e.g. 4×24 GB → 12 slots; 2×32 GB → 8.
import torch
_GB = 1024 ** 3
n_workers = sum(
    torch.cuda.get_device_properties(i).total_memory // (7 * _GB)
    for i in range(torch.cuda.device_count())
) or 1
max_wallclock_time = 15 * 60  # 15 minutes

mode = "min"
metric = "validation_error"
resource_attr = "epoch"

scheduler = ASHA(
    config_space,
    metric=metric,
    mode=mode,
    points_to_evaluate=[initial_config],
    max_resource_attr="max_epochs",
    resource_attr=resource_attr,
    grace_period=min_number_of_epochs,
    reduction_factor=eta,
)

max_resource_level = 10, as inferred from config_space
Master random_seed = 3937097272

ASHA scheduler (cont.)

The tuner launch is intentionally omitted from the lecture slide: its console output is backend bookkeeping, not a conceptual step. The code below assumes the resulting tuner object is still in scope; the relevant result is the incumbent curve loaded from the completed experiment.
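
Since the plotting code refers to a tuner object, a launch consistent with the objects defined earlier might look like the sketch below. It uses only the Syne Tune classes already imported in the setup (PythonBackend, StoppingCriterion, Tuner); the wrapper name launch_tuner is introduced here for illustration and is not part of the original deck or of Syne Tune.

```python
def launch_tuner(objective, config_space, scheduler, n_workers, max_wallclock_time):
    """Run a tuning job and return the finished Tuner (sketch, not the deck's code)."""
    # Imports live inside the function so the sketch can be defined
    # even on a machine where Syne Tune is not installed.
    from syne_tune import Tuner, StoppingCriterion
    from syne_tune.backend.python_backend.python_backend import PythonBackend

    # The backend runs the Python objective in separate worker processes.
    trial_backend = PythonBackend(
        tune_function=objective,
        config_space=config_space,
    )
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stopping_criterion=StoppingCriterion(max_wallclock_time=max_wallclock_time),
        n_workers=n_workers,
    )
    tuner.run()
    return tuner
```

With the names used in this deck, the call would be tuner = launch_tuner(hpo_objective_lenet_synetune, config_space, scheduler, n_workers, max_wallclock_time).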

d2l.set_figsize()
e = load_experiment(tuner.name)
e.plot()
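
The incumbent curve is the best validation error seen so far, as a function of wall-clock time. The sketch below computes it from raw (time, error) pairs; the function name and the sample records are invented for illustration and do not come from the experiment above.

```python
def incumbent_curve(records):
    """Running minimum of the error over time.

    records: iterable of (wall_clock_time, validation_error) pairs.
    Returns a list of (wall_clock_time, best_error_so_far), sorted by time.
    """
    curve = []
    best = float("inf")
    for t, err in sorted(records):
        best = min(best, err)
        curve.append((t, best))
    return curve

reports = [(30, 0.40), (10, 0.55), (20, 0.60), (40, 0.35)]
print(incumbent_curve(reports))
# [(10, 0.55), (20, 0.55), (30, 0.4), (40, 0.35)]
```

The curve never increases: a worse report, such as the one at time 20, leaves the incumbent unchanged.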

Running ASHA

d2l.set_figsize([6, 2.5])
results = e.results
for trial_id in results.trial_id.unique():
    df = results[results["trial_id"] == trial_id]
    d2l.plt.plot(
        df["st_tuner_time"],
        df["validation_error"],
        marker="o"
    )
d2l.plt.xlabel("wall-clock time")
d2l.plt.ylabel("objective function")


Recap

ASHA = SH with no rung-boundary wait: a config is promoted as soon as it qualifies.
Workers stay busy, so throughput scales much closer to linearly with the number of workers than in synchronous SH.
Practical default for large-scale HPO. Syne Tune and Ray Tune both ship ASHA out of the box.