Designing Convolutional Network Architectures

From hand design to design spaces

We have seen a sequence of hand-designed architectures (LeNet → AlexNet → VGG → GoogLeNet → ResNet → DenseNet), each one a hypothesis about what makes networks work.

Can we design networks more systematically?

The AnyNet design space

Figure: The AnyNet design space.

Stem (low-level conv) → 4 stages of residual blocks → head (global pool + linear). Each stage's depth, width, group count, and bottleneck ratio are free parameters.

from d2l import jax as d2l
from flax import linen as nn

AnyNet stem

The stem is deliberately plain: one stride-2 3×3 convolution, BatchNorm, ReLU. Its job is to halve resolution and create the first feature channels before the repeated stages begin.

class AnyNet(d2l.Classifier):
    arch: tuple
    stem_channels: int
    lr: float = 0.1
    num_classes: int = 10
    training: bool = True

    def setup(self):
        self.net = self.create_net()

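    def stem(self, num_channels):
        # Stride-2 3×3 convolution halves height and width
        return nn.Sequential([
            nn.Conv(num_channels, kernel_size=(3, 3), strides=(2, 2),
                    padding=(1, 1)),
            # Use running statistics at inference, batch statistics in training
            nn.BatchNorm(use_running_average=not self.training),
            nn.relu
        ])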
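
The claim that the stem halves resolution can be checked by hand with the standard convolution output-size formula. The sketch below is plain Python; `conv_out_size` is a hypothetical helper (not part of d2l or Flax), and it assumes one pixel of zero padding on each side, as in the stem above.

```python
def conv_out_size(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution along one axis.

    Assumes `padding` zeros are added on both sides of the input.
    """
    return (size + 2 * padding - kernel) // stride + 1


# Illustrative input sizes; 96 is the resolution used later in this section.
for size in (96, 224, 28):
    print(size, "->", conv_out_size(size))
```

For even input sizes this gives exactly half the input, which is why a 96×96 image leaves the stem at 48×48.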

AnyNet stage

Each stage repeats the same ResNeXt block. The first block uses stride 2 and a 1×1 skip projection to change resolution and channel count; the rest preserve shape.

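@d2l.add_to_class(AnyNet)  # attach as a method of AnyNet
def stage(self, depth, num_channels, groups, bot_mul):
    blk = []
    for i in range(depth):
        if i == 0:
            # First block: stride 2 plus a 1×1 projection on the skip path
            blk.append(d2l.ResNeXtBlock(num_channels, groups, bot_mul,
                use_1x1conv=True, strides=(2, 2), training=self.training))
        else:
            # Remaining blocks keep resolution and channel count
            blk.append(d2l.ResNeXtBlock(num_channels, groups, bot_mul,
                                        training=self.training))
    return nn.Sequential(blk)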
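
ResNeXt blocks rely on a grouped 3×3 convolution, and a quick weight count shows why the group setting matters. The sketch below is a rough illustration in plain Python; `conv_weights` is a hypothetical helper, and the channel and group numbers are chosen for illustration only. How `d2l.ResNeXtBlock` maps its `groups` argument onto the convolution is not shown in this section, so no claim is made about that here.

```python
def conv_weights(c_in, c_out, kernel=3, groups=1):
    """Number of weights (no bias) in a kernel x kernel convolution.

    With groups > 1 the channels are split into independent slices,
    so each output channel only sees c_in / groups input channels.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return kernel * kernel * (c_in // groups) * c_out


dense = conv_weights(80, 80)              # ordinary 3x3 convolution
grouped = conv_weights(80, 80, groups=5)  # 5 groups of 16 channels each
print(dense, grouped, dense // grouped)
```

Splitting into g groups cuts the weight count of that layer by a factor of g, which is the cost saving that grouped convolutions trade against cross-channel mixing.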

AnyNet assembly

The architecture tuple supplies (depth, channels, groups, bottleneck) per stage. The head is the now-standard global average pool + linear classifier.

@d2l.add_to_class(AnyNet)  # attach as a method of AnyNet
def create_net(self):
    net = nn.Sequential([self.stem(self.stem_channels)])
    # One stage per (depth, channels, groups, bottleneck) tuple
    for s in self.arch:
        net.layers.extend([self.stage(*s)])
    # Head: global average pool, flatten, linear classifier
    net.layers.extend([nn.Sequential([
        lambda x: nn.avg_pool(x, window_shape=x.shape[1:3],
                            strides=x.shape[1:3], padding='valid'),
        lambda x: x.reshape((x.shape[0], -1)),
        nn.Dense(self.num_classes)])])
    return net
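
The pooling step in the head reduces each channel to a single number, so the linear layer sees one value per channel regardless of the spatial size. A minimal sketch of that reduction on nested Python lists follows; `global_avg_pool` is a hypothetical helper written for illustration, not the Flax `nn.avg_pool` call used above.

```python
def global_avg_pool(feature_map):
    """Average an H x W x C nested list over H and W, returning C means."""
    height, width = len(feature_map), len(feature_map[0])
    num_channels = len(feature_map[0][0])
    totals = [0.0] * num_channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value
    return [t / (height * width) for t in totals]


# A 2 x 2 feature map with 2 channels (made-up values).
small = [[[1.0, 10.0], [3.0, 10.0]],
         [[5.0, 10.0], [7.0, 10.0]]]
pooled = global_avg_pool(small)
print(pooled)  # one mean per channel
```

The output length equals the channel count, so the same classifier weights work for any input resolution that survives the downsampling stages.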

RegNet design-space evidence

Figure: Comparing error empirical distribution functions (EDFs) of design spaces.
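
An error empirical distribution function gives, for each error threshold, the fraction of sampled models whose error falls below it. The sketch below shows the computation in plain Python. The `error_edf` helper and both lists of errors are made up for illustration and are not results from the paper.

```python
def error_edf(errors, thresholds):
    """Fraction of models whose error is below each threshold."""
    n = len(errors)
    return [sum(e < t for e in errors) / n for t in thresholds]


# Made-up validation errors (percent) for models sampled from two
# hypothetical design spaces; NOT measurements from any experiment.
space_a = [38.0, 41.5, 44.0, 47.5, 52.0, 60.0]
space_b = [36.0, 37.5, 39.0, 41.0, 43.5, 49.0]
thresholds = [40.0, 45.0, 50.0]

edf_a = error_edf(space_a, thresholds)
edf_b = error_edf(space_b, thresholds)
for t, a, b in zip(thresholds, edf_a, edf_b):
    print(f"error < {t}: space A {a:.2f}, space B {b:.2f}")
```

A design space whose curve lies above another at every threshold contains a larger share of good models, which is the sense in which one space is judged better than another.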

RegNet narrows AnyNet with simple constraints: stage widths grow approximately linearly, bottleneck ratios stay fixed, and group widths are shared across stages. The result is a smaller design space in which a randomly sampled model is more likely to be good.
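
One way to turn "widths grow approximately linearly" into stage settings is to assign each block a linearly growing width, snap it to a small set of values, and merge consecutive blocks of equal width into a stage. The sketch below follows that idea in plain Python. The function name `regnet_stages`, the rounding to a multiple of 8, and the parameter values are all assumptions made for illustration, not settings taken from the paper or from d2l.

```python
import math
from itertools import groupby


def regnet_stages(depth, w_0, w_a, w_m, multiple=8):
    """Return [(num_blocks, width), ...] from a linear width rule.

    depth: total number of blocks; w_0: initial width; w_a: slope;
    w_m: multiplicative step used to quantize the widths.
    """
    widths = []
    for j in range(depth):
        u_j = w_0 + w_a * j                    # continuous linear width
        s_j = round(math.log(u_j / w_0, w_m))  # nearest power of w_m
        w_j = w_0 * w_m ** s_j                 # quantized width
        widths.append(int(round(w_j / multiple)) * multiple)
    # Consecutive blocks that share a width form one stage.
    return [(len(list(g)), w) for w, g in groupby(widths)]


# Arbitrary illustrative parameters (assumed, not from the paper).
stages = regnet_stages(depth=13, w_0=24, w_a=36.0, w_m=2.5)
for num_blocks, width in stages:
    print(num_blocks, width)
```

A handful of numbers then determines every stage's depth and width, which is what makes the constrained space so much smaller than choosing each stage freely.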

A RegNetX-3.2GF instance

The paper's empirical findings collapse to three rules: width grows linearly with stage, depth stays roughly constant, and blocks use ResNeXt-style groups. A scaled-down, two-stage version for Fashion-MNIST:

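class RegNetX32(AnyNet):
    lr: float = 0.1
    num_classes: int = 10
    stem_channels: int = 32
    # One (depth, channels, groups, bottleneck) tuple per stage
    arch: tuple = ((4, 32, 16, 1), (6, 80, 16, 1))

# Print the output shape of each top-level layer for a 96×96 grayscale input
RegNetX32(training=False).layer_summary((1, 96, 96, 1))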
Sequential output shape:     (1, 48, 48, 32)
Sequential output shape:     (1, 24, 24, 32)
Sequential output shape:     (1, 12, 12, 80)
Sequential output shape:     (1, 10)
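
The shapes in the summary above can be predicted by hand from the architecture tuple. The sketch below does that bookkeeping in plain Python; `trace_shapes` is a hypothetical helper, and it assumes that the stem and the first block of each stage halve the spatial size while every other block preserves it.

```python
def trace_shapes(input_hw, stem_channels, arch, num_classes):
    """Predict per-layer output shapes (batch size 1, NHWC layout)."""
    h, w = input_hw
    h, w = -(-h // 2), -(-w // 2)      # stem: stride 2 (ceiling division)
    shapes = [(1, h, w, stem_channels)]
    for depth, channels, groups, bot_mul in arch:
        h, w = -(-h // 2), -(-w // 2)  # first block of the stage: stride 2
        shapes.append((1, h, w, channels))
    shapes.append((1, num_classes))    # global pool + linear head
    return shapes


arch = ((4, 32, 16, 1), (6, 80, 16, 1))
shapes = trace_shapes((96, 96), 32, arch, 10)
for shape in shapes:
    print(shape)
```

Depth, group count, and bottleneck ratio do not appear in the shape arithmetic: they change how much computation happens inside a stage, not the size of what comes out of it.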

Training

model = RegNetX32(lr=0.05)
trainer = d2l.Trainer(max_epochs=10, num_gpus=1)
data = d2l.FashionMNIST(batch_size=128, resize=(96, 96))
trainer.fit(model, data)

The architecture is competitive with hand-designed ResNets at similar parameter counts, and the discovery process scales with compute: a larger budget means more sampled and trained candidates, not more manual design.

Recap

  • Modern architecture design = search over a parametric design space, not heroic engineering.
  • AnyNet specifies the template (stem / 4 stages / head); the empirical search picks widths, depths, and groups.
  • Resulting networks (RegNet) match or beat hand-designed rivals with simpler, more interpretable rules.
  • Sets the stage for NAS (neural architecture search) and the modern philosophy: pick the design space carefully, then let compute find the best instance.