Deep Convolutional Generative Adversarial Networks

DCGAN

DCGAN (Radford, Metz, Chintala 2015) is the recipe that made image GANs work in practice. Its architectural rules became standard:

  • All-convolutional generator (transposed convs to upsample, no fully connected layers).
  • All-convolutional discriminator (strided convs to downsample).
  • Batch normalization in both networks, except the generator output layer and the discriminator input layer.
  • ReLU in generator (Tanh on output), LeakyReLU in discriminator.
  • Adam optimizer, learning rate 0.0002, \beta_1 = 0.5.
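A minimal scalar sketch of the Adam update shows what lowering \beta_1 to 0.5 does. The first-moment estimate forgets old gradients faster, so the momentum reverses quickly when the gradient changes sign. The helpers `adam_step` and `run` are hypothetical. The values \beta_2 = 0.999 and \epsilon = 10^{-8} are assumed common defaults; this deck does not state them.

```python
import math

def adam_step(param, grad, m, v, t, lr=2e-4, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (t starts at 1).
    beta2 and eps are assumed defaults, not values from the deck."""
    m = beta1 * m + (1 - beta1) * grad          # first moment: EMA of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment: EMA of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for zero-initialized m
    v_hat = v / (1 - beta2 ** t)                # bias correction for zero-initialized v
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

def run(grads, beta1):
    """Apply Adam to a scalar starting at 0; return final parameter and first moment."""
    p, m, v = 0.0, 0.0, 0.0
    for t, g in enumerate(grads, start=1):
        p, m, v = adam_step(p, g, m, v, t, beta1=beta1)
    return p, m

grads = [1.0, 1.0, 1.0, -1.0]          # the gradient flips sign on the 4th step
_, m_dcgan = run(grads, beta1=0.5)     # -0.0625: momentum already points the new way
_, m_default = run(grads, beta1=0.9)   # about 0.144: still pushing the old way
```

With the common default \beta_1 = 0.9, the first moment is still positive after the flip. With 0.5, it has already turned negative.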

These choices transformed GANs from “interesting but unstable” to a workable image-generation tool.

This deck trains a DCGAN to generate Pokémon sprites.

Pokémon dataset

A small image dataset, a convenient size for a DCGAN teaching demo. Preprocessing:

  • images are resized to 64 \times 64;
  • pixel values are scaled to [-1, 1];
  • the generator’s last activation is tanh, so real and generated images live in the same numeric range.
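The pixel scaling can be sketched as a pair of functions. This sketch assumes 8-bit images with values 0 to 255; the deck does not say how the images are stored. The function names are hypothetical. In a PyTorch pipeline, the same effect usually comes from torchvision's ToTensor (to [0, 1]) followed by Normalize with mean and std 0.5 per channel.

```python
def to_model_range(pixel):
    """Map an 8-bit pixel value in [0, 255] to [-1, 1], the output range of tanh."""
    return pixel / 127.5 - 1.0

def to_display_range(value):
    """Invert the mapping: a model value in [-1, 1] back to [0, 255]."""
    return (value + 1.0) * 127.5

black, white = to_model_range(0), to_model_range(255)   # -1.0, 1.0
```

Apply the inverse to generated samples before displaying them next to the real sprites.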

Inspecting samples

Use these real images to calibrate the task: low resolution, centered objects, and normalized colors. The generator only has to match this sprite distribution, not produce arbitrary photographs.

Generator block

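TransposedConv → BatchNorm → ReLU. Five upsampling stages take 1 \times 1 noise to 64 \times 64 pixels:

  • the first block expands 1 \times 1 to 4 \times 4;
  • each later block doubles the spatial resolution;
  • BatchNorm stabilizes feature scales;
  • ReLU does not saturate for positive inputs, which keeps gradients healthy;
  • the final stage uses tanh instead of ReLU and, following the DCGAN paper, no BatchNorm.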
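A minimal 1-D version in plain Python shows why a stride-2 transposed convolution doubles resolution. Each input value scatters a scaled copy of the kernel into the output at stride-spaced offsets, and `padding` trims that many positions from each end. The configuration (kernel size 4, stride 2, padding 1 for doubling; stride 1, padding 0 for 1 to 4) is an assumption, a common DCGAN choice not stated in this deck. Real layers do the same in 2-D across many channels, with learned kernels.

```python
def conv_transpose_1d(x, w, stride, padding):
    """1-D transposed convolution: single channel, no bias, dilation 1."""
    k = len(w)
    full = [0.0] * ((len(x) - 1) * stride + k)      # output length before cropping
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            full[i * stride + j] += xi * wj         # scatter xi * kernel at offset i*stride
    return full[padding:len(full) - padding]        # crop `padding` entries from each end

w = [1.0, 1.0, 1.0, 1.0]                              # a fixed kernel, for illustration only
y = conv_transpose_1d([1.0, 2.0], w, stride=2, padding=1)   # [1.0, 3.0, 3.0, 2.0]: 2 -> 4
z = conv_transpose_1d([0.5], w, stride=1, padding=0)        # 1 -> 4, like the first block
```

Overlapping kernel copies are summed. This is how neighboring low-resolution values blend into the upsampled output.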

Generator architecture

Five upsampling stages; the last one projects to 3 (RGB) channels with tanh:

\mathbf{z}\in\mathbb{R}^{d} \rightarrow 4{\times}4 \rightarrow 8{\times}8 \rightarrow 16{\times}16 \rightarrow 32{\times}32 \rightarrow 64{\times}64{\times}3.

Conceptually, the generator learns an upsampling map from a latent code to an image, not a pixel-by-pixel lookup table.
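Tracing tensor shapes through the five stages makes the arrow diagram concrete. This minimal sketch uses the transposed-convolution size formula out = (in - 1) * stride - 2 * padding + kernel, with no output padding and dilation 1. Several values are assumptions (common DCGAN defaults) rather than values from this deck: the latent size d = 100, the base width 64, the channel halving per stage, and the kernel/stride/padding settings.

```python
def tconv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a transposed convolution (no output_padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel

def generator_shapes(d=100, width=64):
    """(channels, height, width) after each stage; d and width are assumed values."""
    shapes = [(d, 1, 1)]                               # z viewed as a d x 1 x 1 tensor
    # (out_channels, stride, padding) per stage; the first stage maps 1x1 -> 4x4
    stages = [(width * 8, 1, 0), (width * 4, 2, 1), (width * 2, 2, 1),
              (width, 2, 1), (3, 2, 1)]                # last stage: RGB output, then tanh
    size = 1
    for out_ch, stride, padding in stages:
        size = tconv_out(size, stride=stride, padding=padding)
        shapes.append((out_ch, size, size))
    return shapes

shapes = generator_shapes()
# [(100, 1, 1), (512, 4, 4), (256, 8, 8), (128, 16, 16), (64, 32, 32), (3, 64, 64)]
```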

Discriminator block

Conv → BatchNorm → LeakyReLU. The architecture mirrors the generator: five downsampling stages take 64 \times 64 to 1 \times 1:

  • strided convolutions replace pooling;
  • LeakyReLU (slope 0.2 in the paper) keeps a small gradient for negative inputs, so units do not die;
  • BatchNorm is skipped on the input layer;
  • the output is a real/fake logit.
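The two elementwise pieces named above can be sketched in plain Python. The first is LeakyReLU with the paper's slope of 0.2, next to ReLU's gradient for comparison. The second is the sigmoid that turns the discriminator's logit into a probability of "real". The function names are illustrative, not a library API.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def leaky_relu_grad(x, slope=0.2):
    return 1.0 if x > 0 else slope      # never zero, so a unit cannot go "dead"

def relu_grad(x):
    return 1.0 if x > 0 else 0.0        # zero for every negative input

def prob_real(logit):
    """Sigmoid of the discriminator's raw logit, computed without overflow."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)
```

Emitting a raw logit and folding the sigmoid into the loss is the numerically safer design. That is why the last layer has no activation.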

Discriminator architecture

64{\times}64{\times}3 \rightarrow 32{\times}32 \rightarrow 16{\times}16 \rightarrow 8{\times}8 \rightarrow 4{\times}4 \rightarrow 1.

This is a learned critic: high score for real images, low score for generated images. The generator trains against this signal.
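The mirrored shape trace for the discriminator is a minimal sketch using the convolution size formula out = floor((in + 2 * padding - kernel) / stride) + 1. The downsampling blocks are assumed to use kernel 4, stride 2, padding 1, followed by a final 4 \times 4 convolution with stride 1 and no padding. These are common in DCGAN code but are not stated in this deck.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 64
sizes = [size]
for _ in range(4):                       # four stride-2 blocks: 64 -> 32 -> 16 -> 8 -> 4
    size = conv_out(size)
    sizes.append(size)
sizes.append(conv_out(size, kernel=4, stride=1, padding=0))   # final conv: 4x4 -> 1x1 logit
# sizes == [64, 32, 16, 8, 4, 1]
```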

Training

Same loss as the basic GAN (with the non-saturating generator objective in the last bullet), alternating one D update and one G update per step with the DCGAN-recommended Adam hyperparameters:

  • update D on real images labeled 1;
  • update D on generated images labeled 0;
  • update G so generated images make D predict 1.
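The three bullets above correspond to binary cross-entropy on the discriminator's logits. Below is a minimal plain-Python sketch for a batch of logits. Averaging the real and fake terms for loss_D is one common convention and an assumption here. The generator term uses label 1 for generated images, which is the non-saturating objective.

```python
import math

def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def mean(values):
    return sum(values) / len(values)

def loss_D(real_logits, fake_logits):
    """Real images labeled 1, generated images labeled 0; average the two terms."""
    real_term = mean([bce_with_logits(x, 1.0) for x in real_logits])
    fake_term = mean([bce_with_logits(x, 0.0) for x in fake_logits])
    return (real_term + fake_term) / 2

def loss_G(fake_logits):
    """Generator wants D to label its samples 1: mean of -log sigmoid(logit)."""
    return mean([bce_with_logits(x, 1.0) for x in fake_logits])

undecided = loss_D([0.0, 0.0], [0.0, 0.0])   # log 2, about 0.693
```

An undecided discriminator (all logits 0) gives log 2 for both losses. A confident discriminator drives loss_D toward 0 and loss_G up.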

This alternating game is a major source of GAN instability. Each player's objective shifts every time the other player updates, so neither one optimizes against a fixed target.
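The bilinear game min_x max_y x * y is a standard toy illustration of this effect, not the DCGAN loss itself. Its equilibrium is (0, 0). Alternating gradient steps circle that equilibrium instead of converging to it. The step size and starting point below are arbitrary choices.

```python
import math

def alternating_gda(x, y, lr=0.1, steps=1000):
    """Alternate: x descends on f = x*y, then y ascends using the updated x."""
    radii = []
    for _ in range(steps):
        x = x - lr * y          # df/dx = y
        y = y + lr * x          # df/dy = x, evaluated at the new x
        radii.append(math.hypot(x, y))
    return x, y, radii

_, _, radii = alternating_gda(1.0, 0.0)
closest, farthest = min(radii), max(radii)   # both stay close to 1
```

The distance from the equilibrium stays near 1 for all 1000 steps. Each player's move changes the other's gradient, so the pair orbits instead of settling.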

Training trace

Read the loss curves together with the generated samples. In healthy training, neither player quickly wins for good. The more important sign is that the generated sprites become sharper and more Pokémon-like over epochs.

loss_D 0.076, loss_G 4.777, 7016.5 examples/sec
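The final numbers can be read with a rough worked example. Assume loss_G is the per-example mean of -log sigmoid(D(G(z))), the generator loss from the Training bullets; this deck does not say exactly how the trace was computed. Then exp(-loss_G) is the geometric mean of the probability the discriminator assigns to generated images being real.

```python
import math

loss_G = 4.777                        # value from the trace above
# mean(-log p_i) = -log(geometric mean of p_i), so exponentiating undoes it
implied_p_real = math.exp(-loss_G)    # about 0.0084
```

Together with the small loss_D, this suggests that at this point the discriminator separates real from generated sprites confidently. That is why the samples, not the losses alone, are the better progress signal.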

Recap

  • DCGAN = standard architectural recipe for image GANs: all-conv generator/discriminator, BatchNorm, ReLU/Tanh in the generator, LeakyReLU in the discriminator.
  • Adam with learning rate 0.0002 and \beta_1=0.5 is the learning-rate/momentum setting the DCGAN paper found to stabilize training in 2015.
  • Modern image generators (StyleGAN, BigGAN, diffusion models) have superseded DCGAN, but its architectural lessons (all-conv, normalization, careful activation choice) carry over.