import tensorflow as tf
from d2l import tensorflow as d2l

A standard convolution + pooling stack reduces spatial resolution. For dense prediction (semantic segmentation, generative models, super-resolution) we need to go the other way: upsample features back to image resolution.
The standard tool is the transposed convolution, also called “deconvolution” (a misnomer, since it is not a true inverse). Each input element scales the whole kernel and adds it into a window of the output; where windows from neighboring elements overlap, their contributions are summed:
A $2 \times 2$ transposed convolution: each input element scatters its kernel into the output.
Output shape grows: with stride 1, kernel size $k$, and no padding, $n_{\text{out}} = n_{\text{in}} + k - 1$. With stride $s$ and no padding, this generalizes to $n_{\text{out}} = (n_{\text{in}} - 1)s + k$.
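The size rule can be checked with a small helper. This is only a sketch: the function name trans_conv_out_size is hypothetical, and the optional padding argument assumes the convention in which a padding of p trims p entries from each side of the output.

```python
def trans_conv_out_size(n_in, k, stride=1, padding=0):
    """Output length along one axis of a transposed convolution.

    Assumes `padding` entries are trimmed from each side of the output.
    """
    return (n_in - 1) * stride + k - 2 * padding


print(trans_conv_out_size(2, 2))                        # 3
print(trans_conv_out_size(2, 2, stride=2))              # 4
print(trans_conv_out_size(16, 4, stride=2, padding=1))  # 32
```

The last call shows a common upsampling configuration: kernel 4, stride 2, padding 1 doubles the spatial size.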
The hand-written implementation should match the framework operator. If the shape or values differ, the usual culprits are padding semantics or channel layout:
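A minimal hand-written version might look like the following. It is sketched in NumPy rather than TensorFlow so that it runs without the framework. The function name trans_conv and the $2 \times 2$ input and kernel are assumptions, chosen because they reproduce the values printed below.

```python
import numpy as np


def trans_conv(X, K):
    """Stride-1, no-padding 2D transposed convolution (single channel)."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1), dtype=X.dtype)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            # Scale the kernel by X[i, j] and add it into the output window
            # whose top-left corner is (i, j); overlaps accumulate.
            Y[i:i + h, j:j + w] += X[i, j] * K
    return Y


# Assumed example inputs (not shown in the original text).
X = np.array([[0.0, 1.0], [2.0, 3.0]], dtype=np.float32)
K = np.array([[0.0, 1.0], [2.0, 3.0]], dtype=np.float32)
Y = trans_conv(X, K)
print(Y)
```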
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[ 0., 0., 1.],
[ 0., 4., 6.],
[ 4., 12., 9.]], dtype=float32)>
The framework op gives the same result (tf.keras.layers.Conv2DTranspose or tf.nn.conv2d_transpose in TensorFlow, ConvTranspose2d in PyTorch):
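In TensorFlow, one way to sketch this is with tf.nn.conv2d_transpose, which takes the kernel as an argument. The $2 \times 2$ input and kernel are assumptions, chosen to reproduce the $3 \times 3$ tensor shown above. Note the NHWC input layout and the (height, width, out_channels, in_channels) filter layout; getting these wrong is a typical source of the channel-layout mismatches mentioned earlier.

```python
import tensorflow as tf

# Assumed example inputs (not shown in the original text).
X = tf.constant([[0.0, 1.0], [2.0, 3.0]])
K = tf.constant([[0.0, 1.0], [2.0, 3.0]])

# tf.nn.conv2d_transpose expects NHWC input and a filter of shape
# (height, width, out_channels, in_channels).
X4 = tf.reshape(X, (1, 2, 2, 1))
K4 = tf.reshape(K, (2, 2, 1, 1))

Y4 = tf.nn.conv2d_transpose(
    X4, K4, output_shape=[1, 3, 3, 1], strides=1, padding="VALID")

# Drop the batch and channel axes for comparison with the 2D result.
Y = tf.reshape(Y4, (3, 3)).numpy()
print(Y)
```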
Padding is applied to the output rather than the input: instead of adding rows and columns around the input, a padding of $p$ removes $p$ rows and columns from each side of the output.
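Under that reading, padding amounts to cropping the unpadded result. The NumPy sketch below assumes a symmetric integer padding and uses a hypothetical helper named trans_conv_padded. Framework padding modes such as "same" in Keras follow their own rules and are not modeled here.

```python
import numpy as np


def trans_conv_padded(X, K, padding=0):
    """Stride-1 transposed convolution, then crop `padding` from each side."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1), dtype=X.dtype)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Y[i:i + h, j:j + w] += X[i, j] * K
    p = padding
    # Padding acts on the output: drop p rows and p columns per side.
    return Y[p:Y.shape[0] - p, p:Y.shape[1] - p]


# Assumed example inputs.
X = np.array([[0.0, 1.0], [2.0, 3.0]])
K = np.array([[0.0, 1.0], [2.0, 3.0]])
Y_full = trans_conv_padded(X, K, padding=0)
Y_crop = trans_conv_padded(X, K, padding=1)
print(Y_full.shape, Y_crop.shape)  # (3, 3) (1, 1)
print(Y_crop)                      # [[4.]]
```

With padding 1, only the center element of the $3 \times 3$ output survives.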
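A stride of $s > 1$ spaces the kernel placements $s$ positions apart in the output, which is equivalent to inserting $s - 1$ zeros between input elements before a stride-1 scatter. This is how transposed convolution upsamples: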
Stride-2 transposed conv: each input element’s kernel is placed at twice-spaced positions, then summed.
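Both views of stride can be compared directly. The NumPy sketch below uses hypothetical helpers trans_conv_strided and insert_zeros and an assumed $2 \times 2$ input and kernel. It computes the stride-2 result by spacing out the kernel placements, then checks that inserting zeros into the input and scattering with stride 1 gives the same output.

```python
import numpy as np


def trans_conv_strided(X, K, stride=1):
    """No-padding 2D transposed convolution with a given stride."""
    h, w = K.shape
    out_h = (X.shape[0] - 1) * stride + h
    out_w = (X.shape[1] - 1) * stride + w
    Y = np.zeros((out_h, out_w), dtype=X.dtype)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            # The window for input element (i, j) starts at (i*s, j*s).
            r, c = i * stride, j * stride
            Y[r:r + h, c:c + w] += X[i, j] * K
    return Y


def insert_zeros(X, stride):
    """Place stride - 1 zeros between neighboring elements of X."""
    out = np.zeros(((X.shape[0] - 1) * stride + 1,
                    (X.shape[1] - 1) * stride + 1), dtype=X.dtype)
    out[::stride, ::stride] = X
    return out


# Assumed example inputs.
X = np.array([[0.0, 1.0], [2.0, 3.0]])
K = np.array([[0.0, 1.0], [2.0, 3.0]])

Y_strided = trans_conv_strided(X, K, stride=2)
Y_dilated = trans_conv_strided(insert_zeros(X, 2), K, stride=1)
print(Y_strided)
print(np.array_equal(Y_strided, Y_dilated))  # True
```

With a $2 \times 2$ kernel and stride 2 the windows do not overlap, so each $2 \times 2$ block of the $4 \times 4$ output is just one input element times the kernel.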
Multiple channels work as in a regular convolution: contributions are summed over input channels through the kernel, and each output channel is computed independently with its own kernel:
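A multi-channel version can be sketched in NumPy as follows. The helper name trans_conv_multi, the random inputs, and the kernel layout (in_channels, out_channels, height, width) are assumptions made for illustration. TensorFlow's own ops use a different filter layout, so this is not a drop-in replacement for them.

```python
import numpy as np


def trans_conv_multi(X, K):
    """Stride-1, no-padding multi-channel transposed convolution.

    X: (c_in, h, w); K: (c_in, c_out, kh, kw); returns (c_out, H, W).
    """
    c_in, h, w = X.shape
    _, c_out, kh, kw = K.shape
    Y = np.zeros((c_out, h + kh - 1, w + kw - 1))
    for i in range(h):
        for j in range(w):
            # Contract over input channels:
            # (c_in,) with (c_in, c_out, kh, kw) -> (c_out, kh, kw),
            # then add the result into the output window at (i, j).
            Y[:, i:i + kh, j:j + kw] += np.tensordot(
                X[:, i, j], K, axes=(0, 0))
    return Y


rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3, 3))     # 2 input channels, 3x3 spatial
K = rng.standard_normal((2, 4, 2, 2))  # 2 input, 4 output channels, 2x2
Y = trans_conv_multi(X, K)
print(Y.shape)  # (4, 4, 4)
```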
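A standard convolution can be written as a sparse matrix multiplication $\mathbf{y} = \mathbf{K}\mathbf{x}$, where $\mathbf{x}$ and $\mathbf{y}$ are the flattened input and output and $\mathbf{K}$ encodes the kernel, stride, and padding.
A transposed convolution multiplies by the transpose: $\mathbf{x}' = \mathbf{K}^\top \mathbf{y}$. That is where the name comes from. Note that $\mathbf{x}'$ has the shape of $\mathbf{x}$ but in general not its values.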
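The matrix view can be checked numerically. The NumPy sketch below uses a hypothetical helper kernel2matrix; the matrix is called W in the code to keep it distinct from the kernel array K. The $3 \times 3$ input and $2 \times 2$ kernel are assumptions, chosen because they reproduce the values printed below.

```python
import numpy as np


def kernel2matrix(K, in_shape):
    """Matrix W such that W @ x.ravel() is the flattened stride-1,
    no-padding cross-correlation of x (with shape in_shape) and K."""
    kh, kw = K.shape
    n_h, n_w = in_shape
    o_h, o_w = n_h - kh + 1, n_w - kw + 1
    W = np.zeros((o_h * o_w, n_h * n_w), dtype=K.dtype)
    for i in range(o_h):
        for j in range(o_w):
            for a in range(kh):
                for b in range(kw):
                    # Output (i, j) reads input (i + a, j + b) with weight K[a, b].
                    W[i * o_w + j, (i + a) * n_w + (j + b)] = K[a, b]
    return W


# Assumed example inputs.
X = np.arange(9.0).reshape(3, 3)
K = np.array([[1.0, 2.0], [3.0, 4.0]])
W = kernel2matrix(K, X.shape)

# Forward convolution as a matrix product.
Y = (W @ X.ravel()).reshape(2, 2)

# Transposed convolution of Y: multiply by W.T, or scatter the kernel.
Z_matrix = (W.T @ Y.ravel()).reshape(3, 3)
Z_scatter = np.zeros((3, 3))
for i in range(2):
    for j in range(2):
        Z_scatter[i:i + 2, j:j + 2] += Y[i, j] * K

print(Y)
print(W)
print(np.array_equal(Z_matrix, Z_scatter))  # True
```

The result of the transposed convolution has the same shape as X, but its entries differ from those of X, which illustrates why the operation is not a true inverse.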
<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[27., 37.],
[57., 67.]], dtype=float32)>
<tf.Tensor: shape=(4, 9), dtype=float32, numpy=
array([[1., 2., 0., 3., 4., 0., 0., 0., 0.],
[0., 1., 2., 0., 3., 4., 0., 0., 0.],
[0., 0., 0., 1., 2., 0., 3., 4., 0.],
[0., 0., 0., 0., 1., 2., 0., 3., 4.]], dtype=float32)>
<tf.Tensor: shape=(2, 2), dtype=bool, numpy=
array([[ True, True],
[ True, True]])>