from d2l import tensorflow as d2l
import tensorflow as tf

DenseNet (Huang et al., 2017) takes the residual idea one step further: instead of adding a layer's input to its output through the skip connection, it concatenates the two.
$$\mathbf{x}_\ell = f_\ell\bigl([\mathbf{x}_0, \mathbf{x}_1, \dots, \mathbf{x}_{\ell-1}]\bigr).$$
Every layer in a dense block sees the concatenation of all preceding outputs.
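The connectivity rule can be illustrated with a toy NumPy sketch. It is not the convolutional model built in this section: each layer `f_l` is assumed to be a random linear map followed by ReLU, and the names `dense_forward`, `c0`, and `growth` are hypothetical, chosen only for illustration.

```python
import numpy as np

def dense_forward(x0, weights):
    """Apply x_l = f_l([x_0, ..., x_{l-1}]) with toy layers f_l(z) = relu(z @ W_l)."""
    features = [x0]  # x_0, x_1, ... collected so far
    for W in weights:
        z = np.concatenate(features, axis=-1)     # [x_0, ..., x_{l-1}]
        features.append(np.maximum(z @ W, 0.0))   # x_l
    return np.concatenate(features, axis=-1)

rng = np.random.default_rng(0)
c0, growth, num_layers = 3, 10, 2
x0 = rng.normal(size=(4, c0))
# The i-th toy layer (0-indexed) reads c0 + i * growth channels and emits `growth` new ones.
weights = [rng.normal(size=(c0 + i * growth, growth)) for i in range(num_layers)]
out = dense_forward(x0, weights)
print(out.shape)  # (4, 23)
```

The input `x0` survives unchanged in the first `c0` channels of the output, which is the difference from a residual sum: nothing is overwritten, only appended.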
Dense blocks grow the channel count by concatenation; transition layers (1×1 conv + pooling) shrink the channels and the spatial size between blocks.
Pros: maximum feature reuse, fewer parameters than ResNet for similar accuracy. Cons: the number of channels grows linearly with depth within a block, which drives up memory; transition layers keep this in check.
A small conv block (BN → ReLU → 3×3 conv) is the unit; a DenseBlock will reuse it repeatedly.

class ConvBlock(tf.keras.layers.Layer):
    def __init__(self, num_channels):
        super(ConvBlock, self).__init__()
        self.bn = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.ReLU()
        self.conv = tf.keras.layers.Conv2D(
            filters=num_channels, kernel_size=(3, 3), padding='same')
        self.listLayers = [self.bn, self.relu, self.conv]

    def call(self, x):
        y = x
        for layer in self.listLayers:
            y = layer(y)
        # Concatenate the input and the new features along the channel axis
        y = tf.keras.layers.concatenate([x, y], axis=-1)
        return y

Now stack the conv blocks. Each block concatenates its new features onto its input, so later blocks see everything computed so far.
A DenseBlock(num_convs=2, num_channels=10) on a 3-channel input adds num_convs * num_channels = 20 channels, giving 3 + 20 = 23 output channels:
TensorShape([4, 8, 8, 23])
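The `DenseBlock` class that produces this shape does not appear in the text above. A minimal sketch consistent with the description might look as follows; the class body is an assumption, not necessarily the original definition, and `ConvBlock` is restated in compact form only so the snippet runs on its own.

```python
import tensorflow as tf

class ConvBlock(tf.keras.layers.Layer):
    """BN -> ReLU -> 3x3 conv, then concatenate input and output on channels."""
    def __init__(self, num_channels):
        super().__init__()
        self.bn = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.ReLU()
        self.conv = tf.keras.layers.Conv2D(
            filters=num_channels, kernel_size=(3, 3), padding='same')

    def call(self, x):
        y = self.conv(self.relu(self.bn(x)))
        return tf.concat([x, y], axis=-1)

class DenseBlock(tf.keras.layers.Layer):
    """Assumed sketch: chain num_convs ConvBlocks, each adding num_channels."""
    def __init__(self, num_convs, num_channels):
        super().__init__()
        self.blocks = [ConvBlock(num_channels) for _ in range(num_convs)]

    def call(self, x):
        for blk in self.blocks:
            x = blk(x)  # each ConvBlock already appends its features to x
        return x

blk = DenseBlock(num_convs=2, num_channels=10)
X = tf.random.uniform((4, 8, 8, 3))
Y = blk(X)
print(Y.shape)  # (4, 8, 8, 23)
```

Because the concatenation lives inside `ConvBlock.call`, the dense block only has to chain the conv blocks; the spatial size stays at 8×8 thanks to `padding='same'`.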
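A transition block stops the channel explosion between dense blocks: a 1×1 conv reduces the channels to num_channels (the model below passes half the incoming count), and a 2×2 avg-pool with stride 2 halves the spatial dims: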

class TransitionBlock(tf.keras.layers.Layer):
    def __init__(self, num_channels, **kwargs):
        super(TransitionBlock, self).__init__(**kwargs)
        self.batch_norm = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.ReLU()
        self.conv = tf.keras.layers.Conv2D(num_channels, kernel_size=1)
        self.avg_pool = tf.keras.layers.AvgPool2D(pool_size=2, strides=2)

    def call(self, x):
        x = self.batch_norm(x)
        x = self.relu(x)
        x = self.conv(x)
        return self.avg_pool(x)

A standard “stem → dense block → transition → dense block → transition → … → global avg-pool → linear” pipeline:

# Constructor of the DenseNet model; the class header and the stem b1() are not shown here
def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
             lr=0.1, num_classes=10):
    super(DenseNet, self).__init__()
    self.save_hyperparameters()
    self.net = self.b1()
    for i, num_convs in enumerate(arch):
        self.net.add(DenseBlock(num_convs, growth_rate))
        # The number of output channels in the previous dense block
        num_channels += num_convs * growth_rate
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(arch) - 1:
            num_channels //= 2
            self.net.add(TransitionBlock(num_channels))
    self.net.add(tf.keras.models.Sequential([
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.GlobalAvgPool2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes)]))

DenseNet reaches competitive ImageNet accuracy with far fewer parameters than comparable ResNets: reusing features through concatenation genuinely helps.
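The channel bookkeeping in the constructor loop can be traced without building any layers. The sketch below mirrors that arithmetic in plain Python, assuming the stem outputs `num_channels` channels; the helper `densenet_channels` and the stage labels are hypothetical names used only for illustration.

```python
def densenet_channels(num_channels=64, growth_rate=32, arch=(4, 4, 4, 4)):
    """Return (stage, channels) pairs following the constructor's loop."""
    trace = [("stem", num_channels)]
    for i, num_convs in enumerate(arch):
        # Each conv block in the dense block appends growth_rate channels
        num_channels += num_convs * growth_rate
        trace.append((f"dense{i + 1}", num_channels))
        # A transition halves the channels, except after the last dense block
        if i != len(arch) - 1:
            num_channels //= 2
            trace.append((f"transition{i + 1}", num_channels))
    return trace

trace = densenet_channels()
for name, channels in trace:
    print(f"{name:<13}{channels}")
```

With the default arguments the counts run 64 → 192 → 96 → 224 → 112 → 240 → 120 → 248, so the final linear layer sees 248 features after global average pooling.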