Deep Convolutional Generative Adversarial Networks

DCGAN

DCGAN (Radford, Metz, Chintala 2015) is the recipe that made image GANs work in practice. Its architectural rules became standard:

All-convolutional generator (transposed convs to upsample, no fully connected layers).
All-convolutional discriminator (strided convs to downsample).
Batch normalization in both networks (except the generator’s output layer and the discriminator’s input layer).
ReLU in generator (Tanh on output), LeakyReLU in discriminator.
Adam optimizer, learning rate 0.0002, \beta_1 = 0.5.

These choices transformed GANs from “interesting but unstable” to a workable image-generation tool.
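To make the optimizer setting above concrete, here is a minimal sketch of one Adam update on a single scalar, in plain Python. The learning rate and \beta_1 are the DCGAN values from the list; \beta_2 = 0.999 and eps = 1e-8 are the usual Adam defaults and are assumed here, as are the function names.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.0002, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter at step t >= 1.

    lr and beta1 follow the DCGAN settings; beta2 and eps are assumed defaults.
    """
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

def recent_weight(beta1, n):
    """Share of the momentum average carried by the n most recent gradients."""
    return 1 - beta1 ** n

# First step from param = 1.0 with an illustrative gradient of 3.0.
param, m, v = adam_step(param=1.0, grad=3.0, m=0.0, v=0.0, t=1)

share_dcgan = recent_weight(0.5, 2)    # beta1 used by DCGAN
share_default = recent_weight(0.9, 2)  # Adam's default beta1
```

Under these assumptions the first step moves the parameter by almost exactly the learning rate, regardless of the gradient's scale. With \beta_1 = 0.5 the two most recent gradients carry 75% of the momentum average, against about 19% at 0.9, which is one way to read why a lower \beta_1 can suit an objective that keeps shifting.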

This deck trains a DCGAN to generate Pokémon sprites.

Pokémon dataset

A small image dataset, well sized for a teaching demo of DCGAN. Preprocessing puts real images in the same range as the generator’s output:

images are resized to 64 \times 64;
pixel values are scaled to [-1, 1];
the generator’s last activation is tanh, so real and generated images live in the same numeric range.
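The scaling step above can be sketched in a few lines of plain Python. This assumes 8-bit pixel values in [0, 255] and a shift-and-scale by 0.5; the function names are illustrative, not taken from the deck.

```python
def to_tanh_range(pixel):
    """Map an 8-bit pixel value in [0, 255] to [-1, 1]."""
    scaled = pixel / 255.0          # [0, 255] -> [0, 1]
    return (scaled - 0.5) / 0.5     # [0, 1]   -> [-1, 1]

def from_tanh_range(value):
    """Inverse map, useful for displaying generated images."""
    return round((value * 0.5 + 0.5) * 255.0)

# Black, mid-gray, and white end up at the ends and center of the tanh range.
low, mid, high = to_tanh_range(0), to_tanh_range(255 / 2), to_tanh_range(255)
```

The inverse map matters at display time: generator outputs in [-1, 1] have to be shifted back before they are shown as images.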

Inspecting samples

Use these real images to calibrate the task: low resolution, centered objects, and normalized colors. The generator only has to match this sprite distribution, not produce arbitrary photographs.

Generator block

TransposedConv → BatchNorm → ReLU. Stack five of these to upsample 1 \times 1 noise to 64 \times 64 pixels:

the first block maps 1 \times 1 to 4 \times 4, and each later block doubles spatial resolution;
BatchNorm stabilizes feature scales;
ReLU does not saturate for positive inputs, which helps keep gradients from vanishing;
the final block omits BatchNorm and uses tanh instead of ReLU.
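The spatial sizes produced by the stack can be checked with the standard output-size formula for a transposed convolution. The sketch below assumes kernel 4 throughout, with stride 1 and no padding for the first stage and stride 2 with padding 1 afterwards; these hyperparameters are a common DCGAN choice but are not stated in the deck.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output side length of a transposed convolution (no dilation, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

# (kernel, stride, padding) per stage; assumed values, not taken from the text.
g_stages = [(4, 1, 0)] + [(4, 2, 1)] * 4

g_sizes = [1]  # the latent code is treated as a 1 x 1 feature map
for kernel, stride, padding in g_stages:
    g_sizes.append(conv_transpose_out(g_sizes[-1], kernel, stride, padding))
```

With these settings the first stage expands 1 to 4 and the remaining four stages each double the side length, ending at 64.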

Generator architecture

Five generator blocks in total; the final one projects to 3 channels with tanh:

\mathbf{z}\in\mathbb{R}^{d} \rightarrow 4{\times}4 \rightarrow 8{\times}8 \rightarrow 16{\times}16 \rightarrow 32{\times}32 \rightarrow 64{\times}64{\times}3.

Pedagogically: the generator learns an upsampling map from a latent code to an image, not a pixel-by-pixel lookup table.

Discriminator block

Conv → BatchNorm → LeakyReLU. Mirrored architecture: five blocks downsampling 64 \times 64 to 1 \times 1:

strided convolutions replace pooling;
LeakyReLU keeps a small nonzero gradient for negative inputs, which avoids dead units;
BatchNorm is skipped on the input layer;
the output is a real/fake logit.
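A short sketch of the two ingredients above: the size arithmetic of strided convolutions, and LeakyReLU itself. The kernel, stride, and padding values are assumptions chosen to reproduce the five-step downsampling; the 0.2 slope is the value used in the DCGAN paper.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a strided convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

def leaky_relu(x, slope=0.2):
    """Identity for positive inputs, small linear leak for negative ones."""
    return x if x > 0 else slope * x

# (kernel, stride, padding) per stage; assumed values, not taken from the text.
d_stages = [(4, 2, 1)] * 4 + [(4, 1, 0)]

d_sizes = [64]
for kernel, stride, padding in d_stages:
    d_sizes.append(conv_out(d_sizes[-1], kernel, stride, padding))
```

Four stride-2 stages halve the side length each time, and a final 4 \times 4 convolution with no padding collapses the remaining map to a single logit.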

Discriminator architecture

64{\times}64{\times}3 \rightarrow 32{\times}32 \rightarrow 16{\times}16 \rightarrow 8{\times}8 \rightarrow 4{\times}4 \rightarrow 1.

This is a learned critic: high score for real images, low score for generated images. The generator trains against this signal.

Training

Same adversarial loss as the basic GAN. Each step alternates a D update and a G update, using the DCGAN-recommended Adam hyperparameters:

update D on real images labeled 1;
update D on generated images labeled 0;
update G so generated images make D predict 1.
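The three updates above correspond to three binary cross-entropy terms on the discriminator's logits. The plain-Python sketch below computes them for a handful of made-up logits; the function names, the example values, and the choice to sum the real and fake terms are assumptions for illustration.

```python
import math

def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy on a raw logit; label is 0.0 or 1.0."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def mean(values):
    return sum(values) / len(values)

def discriminator_loss(real_logits, fake_logits):
    real_term = mean([bce_with_logits(x, 1.0) for x in real_logits])  # real labeled 1
    fake_term = mean([bce_with_logits(x, 0.0) for x in fake_logits])  # fake labeled 0
    return real_term + fake_term

def generator_loss(fake_logits):
    # G is rewarded when D assigns label 1 to generated images.
    return mean([bce_with_logits(x, 1.0) for x in fake_logits])

# Hypothetical D outputs for two real and two generated images.
real_logits = [2.0, 3.0]
fake_logits = [-2.0, -1.0]
loss_d = discriminator_loss(real_logits, fake_logits)
loss_g = generator_loss(fake_logits)
```

In this made-up batch D separates real from fake with confidence, so its loss is small while the generator's loss is large. In an actual training loop each loss would be followed by a gradient step on the corresponding network only.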

This alternating game is why GAN training is unstable: each player’s objective shifts every time the other player improves.

Training trace

Read the loss curves together with the generated samples. In healthy training neither player wins outright; the more important sign is that generated sprites become sharper and more Pokémon-like over epochs.

loss_D 0.183, loss_G 3.750, 5845.4 examples/sec

Recap

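DCGAN = standard architectural recipe for image GANs: all-conv generator/discriminator, BatchNorm, ReLU/Tanh in the generator, LeakyReLU in the discriminator.
Adam with learning rate 0.0002 and \beta_1 = 0.5 (both lower than the defaults of 0.001 and 0.9) is the setting the DCGAN authors found stabilized training.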
Modern image generators (StyleGAN, BigGAN, diffusion models) supersede DCGAN; the architectural lessons (all-conv, normalization, careful activation choice) carry over.