import tensorflow as tf
from d2l import tensorflow as d2l

We’ve seen a sequence of hand-designed architectures (LeNet → AlexNet → VGG → GoogLeNet → ResNet → DenseNet), each a hypothesis about what makes networks work.
Can we design networks more systematically?
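RegNet (Radosavovic et al., 2020) answers by designing the design space rather than individual networks.
Start from AnyNet: one fixed template whose hyperparameters are left free.
Then constrain it with simple closed-form rules (“width grows linearly with stage”). The resulting networks outperform designs produced by years of expert tuning.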
The AnyNet design space.
Stem (low-level conv) → 4 stages of residual blocks → head (global pool + linear). Each stage’s depth, width, and group count are free parameters.
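To make the free parameters concrete, here is a minimal plain-Python sketch that samples one AnyNet configuration as a list of per-stage (depth, width, groups, bottleneck) tuples. `sample_anynet` and `design_space_size` are hypothetical helpers, and the value grids are illustrative assumptions, not the paper's exact ranges.

```python
import random

# Assumed per-stage choices (illustrative grids, not the paper's exact ones).
DEPTHS = range(1, 17)          # number of blocks in the stage
WIDTHS = range(8, 1025, 8)     # output channels, multiples of 8
GROUPS = range(1, 33)          # grouped-convolution setting
BOT_MULS = (1, 2, 4)           # bottleneck ratio

def sample_anynet(num_stages=4, seed=None):
    """Return one hypothetical AnyNet config: a (depth, width, groups, bot_mul) tuple per stage."""
    rng = random.Random(seed)
    return [(rng.choice(DEPTHS), rng.choice(WIDTHS),
             rng.choice(GROUPS), rng.choice(BOT_MULS))
            for _ in range(num_stages)]

def design_space_size(num_stages=4):
    """Count every configuration on the grids above (stages chosen independently)."""
    per_stage = len(DEPTHS) * len(WIDTHS) * len(GROUPS) * len(BOT_MULS)
    return per_stage ** num_stages

arch = sample_anynet(seed=0)
print(arch)
print(f"{design_space_size():.2e} configurations")
```

Even this coarse grid holds about 1.5 × 10^21 four-stage configurations, far too many to search exhaustively, which is why RegNet compares design spaces by sampling models from them rather than enumerating them.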
The stem is deliberately plain: one stride-2 3×3 convolution, BatchNorm, ReLU. Its job is to halve resolution and create the first feature channels before the repeated stages begin.
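The halving follows from the standard convolution output-size formula, out = ⌊(in + 2p − k)/s⌋ + 1. A quick plain-Python check, assuming padding p = 1 and a 96×96 input (the input size is an assumption for illustration):

```python
import math

def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

print(conv_out(96))  # 48: the stem halves a 96-pixel side

# For a 3x3 kernel, stride 2, padding 1 this equals ceil(size / 2),
# the output size Keras uses for padding='same' at stride 2.
for n in range(1, 200):
    assert conv_out(n) == math.ceil(n / 2)
```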
Each stage repeats the same ResNeXt block. The first block uses stride 2 and a 1×1 skip projection to change resolution and channel count; the rest preserve shape.
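A minimal plain-Python sketch of that rule, tracing one stage block by block. `stage_plan` is a hypothetical helper that tracks only shapes and whether the skip path needs a 1×1 projection; the projection is required whenever the block output's shape differs from its input, because the residual addition needs matching shapes.

```python
def stage_plan(in_shape, depth, channels):
    """Trace (H, W, C) through a stage of `depth` blocks.

    Block 0 uses stride 2 (and may change channels), so its skip path needs
    a 1x1 projection; later blocks keep the shape and use an identity skip.
    """
    h, w, c = in_shape
    plan = []
    for i in range(depth):
        stride = 2 if i == 0 else 1
        # Ceil division matches the output size of 'same' padding.
        out = (-(-h // stride), -(-w // stride), channels)
        plan.append({"block": i, "stride": stride,
                     "out": out, "projection": out != (h, w, c)})
        h, w, c = out
    return plan

for step in stage_plan((48, 48, 32), depth=4, channels=32):
    print(step)
```

Note that the first block needs a projection even when the channel count stays at 32, because the spatial size changes.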
The architecture tuple supplies (depth, channels, groups, bottleneck ratio) for each stage. The head is the now-standard global average pooling followed by a linear classifier.
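Each tuple is just four numbers, so it is easy to pass one that a grouped convolution cannot realize, since the channels must split evenly across groups. A small validation sketch, assuming `groups` is the number of groups in the block's 3×3 convolution and the bottleneck width is `round(channels * bot_mul)`; `StageSpec` and `validate` are hypothetical names, not part of d2l or Keras.

```python
from typing import NamedTuple

class StageSpec(NamedTuple):
    """Hypothetical container for one stage of the arch tuple."""
    depth: int
    channels: int
    groups: int      # assumed: number of groups in the 3x3 grouped conv
    bot_mul: float   # bottleneck ratio

    @property
    def bot_channels(self):
        return int(round(self.channels * self.bot_mul))

    def grouped_conv_params(self):
        """Weights in the 3x3 grouped conv (bias ignored)."""
        c = self.bot_channels
        return 3 * 3 * (c // self.groups) * c

def validate(arch):
    """Turn raw tuples into StageSpecs, rejecting uneven group splits."""
    specs = [StageSpec(*s) for s in arch]
    for s in specs:
        if s.bot_channels % s.groups:
            raise ValueError(
                f"{s.bot_channels} channels not divisible by {s.groups} groups")
    return specs

arch = ((4, 32, 2, 1), (6, 80, 2, 1))  # illustrative values, not from the text
for s in validate(arch):
    dense = 9 * s.bot_channels ** 2   # same conv with a single group
    print(s, s.grouped_conv_params(), "vs", dense)
```

With g groups, each filter sees only 1/g of the input channels, so the grouped 3×3 convolution has 1/g of the weights of its ungrouped counterpart.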
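@d2l.add_to_class(AnyNet)  # AnyNet and its stem()/stage() methods are assumed defined as described above
def __init__(self, arch, stem_channels, lr=0.1, num_classes=10):
    super(AnyNet, self).__init__()
    self.save_hyperparameters()
    # Stem: stride-2 3x3 conv + BatchNorm + ReLU, returned as a Sequential we can extend
    self.net = self.stem(stem_channels)
    # One stage per (depth, channels, groups, bottleneck) tuple in arch
    for i, s in enumerate(arch):
        self.net.add(self.stage(*s))
    # Head: global average pooling followed by a linear classifier
    self.net.add(tf.keras.models.Sequential([
        tf.keras.layers.GlobalAvgPool2D(),
        tf.keras.layers.Dense(units=num_classes)]))

Comparing error empirical distribution functions of design spaces.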
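The error EDF of a design space is the fraction of sampled models whose error falls below a threshold e: F(e) = (1/n) Σᵢ 1[eᵢ < e]. A higher curve means a larger share of good models. A plain-Python sketch; the error values are made-up placeholders, not results from the paper.

```python
def error_edf(errors, thresholds):
    """Fraction of models with error strictly below each threshold."""
    n = len(errors)
    return [sum(e < t for e in errors) / n for t in thresholds]

# Placeholder top-1 errors (%) for models sampled from two design spaces
space_a = [12.0, 9.5, 15.2, 11.1, 10.4]
space_b = [8.9, 9.1, 10.2, 9.7, 8.5]
thresholds = [9.0, 10.0, 11.0]

print(error_edf(space_a, thresholds))  # [0.0, 0.2, 0.4]
print(error_edf(space_b, thresholds))  # [0.4, 0.8, 1.0]
```

Space B's curve sits above space A's at every threshold, so a model drawn from B is more likely to be good; narrowing AnyNet into RegNet aims for exactly this kind of shift.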
RegNet narrows AnyNet with simple constraints: stage widths grow approximately linearly, bottleneck ratios stay fixed, and group widths are shared across stages. The result is a smaller search space in which a larger fraction of sampled models are good.
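The paper makes “widths grow linearly” precise by giving each block j a width u_j = w_0 + w_a·j and snapping it to the geometric grid w_0·w_m^k (k an integer), which turns the linear ramp into a few constant-width stages. A minimal sketch of that quantization; the example parameters are illustrative, and rounding to a multiple of q = 8 is an assumption here.

```python
import math

def regnet_widths(w0, wa, wm, depth, q=8):
    """Per-block widths from the linear rule, quantized (a sketch).

    u_j = w0 + wa * j is linear in the block index j; each u_j is snapped
    to the nearest w0 * wm**k (k integer, in log space) and rounded to a
    multiple of q.
    """
    widths = []
    for j in range(depth):
        u = w0 + wa * j
        k = round(math.log(u / w0) / math.log(wm))
        widths.append(int(round(w0 * wm ** k / q) * q))
    return widths

def group_stages(widths):
    """Collapse consecutive equal widths into (depth, width) stages."""
    stages = []
    for w in widths:
        if stages and stages[-1][1] == w:
            stages[-1][0] += 1
        else:
            stages.append([1, w])
    return [tuple(s) for s in stages]

widths = regnet_widths(w0=32, wa=24, wm=2, depth=8)
print(widths)                # [32, 64, 64, 128, 128, 128, 128, 256]
print(group_stages(widths))  # [(1, 32), (2, 64), (4, 128), (1, 256)]
```

The per-block widths form a staircase that tracks the linear ramp u_j, and each step of the staircase becomes one stage.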
The paper’s empirical findings boil down to: widths grow roughly linearly with stage, depths do not decrease from one stage to the next, and all stages share one bottleneck ratio and one group width (ResNeXt-style blocks). A scaled-down version for Fashion-MNIST:
Conv2D output shape: (1, 48, 48, 32)
BatchNormalization output shape: (1, 48, 48, 32)
Activation output shape: (1, 48, 48, 32)
Sequential output shape: (1, 24, 24, 32)
Sequential output shape: (1, 12, 12, 80)
Sequential output shape: (1, 10)
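The printed shapes follow from stride arithmetic alone: the stem and the first block of each stage each halve the resolution, and only the head discards the spatial axes. A plain-Python sketch that reproduces them, assuming a 96×96 input (consistent with, though not stated in, the printout) and reading the channel counts off the output above; `trace` is a hypothetical helper.

```python
def conv_same(size, stride):
    """Spatial size after a 'same'-padded conv: ceil(size / stride)."""
    return -(-size // stride)

def trace(input_hw, stem_channels, stage_channels, num_classes=10):
    """Batch-1 output shape after the stem, each stage, and the head."""
    h = w = input_hw
    h, w = conv_same(h, 2), conv_same(w, 2)          # stride-2 stem
    shapes = [("stem", (1, h, w, stem_channels))]
    for i, c in enumerate(stage_channels):
        # Only the first block of each stage has stride 2.
        h, w = conv_same(h, 2), conv_same(w, 2)
        shapes.append((f"stage{i + 1}", (1, h, w, c)))
    shapes.append(("head", (1, num_classes)))        # pool + linear
    return shapes

for name, shape in trace(96, 32, [32, 80]):
    print(name, shape)
```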
The resulting architecture is competitive with hand-designed ResNets at similar parameter counts, and the discovery process (sample models, train them, compare EDFs) scales with compute rather than with expert effort.