Image Augmentation

Image augmentation enlarges the effective size of the training set at little extra cost: small, label-preserving perturbations (flips, crops, color jitter) are applied on the fly during training. Each example is seen many times, but almost never in exactly the same way.

Two effects:

More data = less overfitting.
The model learns invariance to whatever you apply.

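Modern pipelines push the same idea further with random erasing, RandAugment, and label-mixing methods such as mixup and CutMix. Mixup and CutMix also blend the labels of two examples, so they are not strictly label-preserving.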

Setup

Load a sample image and define a helper that displays a grid of augmented samples. The helper does three things:

start from one image;
sample a random transform several times;
visualize a grid so the transform distribution is visible, not just one lucky draw.

The important object is the augmentation distribution, not the helper function used to plot it.
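A minimal sketch of such a helper, assuming the augmentation is any callable that maps an image to a new random image. The name `sample_augmentations` and its defaults are hypothetical, and the plotting step of the original helper is omitted. The sketch only collects independent draws into a grid so the distribution can be inspected:

```python
import random
from typing import Any, Callable, List


def sample_augmentations(img: Any, aug: Callable[[Any], Any],
                         num_rows: int = 2, num_cols: int = 4) -> List[List[Any]]:
    """Draw num_rows * num_cols independent samples of aug(img), arranged as a grid.

    Hypothetical helper: plotting is left out; each cell is one random draw.
    """
    return [[aug(img) for _ in range(num_cols)] for _ in range(num_rows)]


# Toy "augmentation": add one random integer offset to a 1x3 grayscale image.
def random_shift(img):
    offset = random.randint(-2, 2)
    return [[p + offset for p in row] for row in img]


random.seed(0)
grid = sample_augmentations([[10, 20, 30]], random_shift)
for row in grid:
    print([cell[0] for cell in row])  # first row of each sampled image
```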

Reference image

All augmentation examples start from this single image. The label is assumed to stay valid after each transform; that label-preserving assumption is what makes augmentation useful.

Flips and crops

Random horizontal flip — the cheapest, most-used augmentation. It is safe when left/right orientation is not part of the label:

apply(img, torchvision.transforms.RandomHorizontalFlip())

Vertical flip — used selectively. It is valid for some textures or aerial imagery, but usually wrong for faces, street scenes, and text:

apply(img, torchvision.transforms.RandomVerticalFlip())
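Both flips reverse pixel order along one axis with some probability. Here is a minimal sketch on an image stored as a list of rows. The default p=0.5 matches torchvision's default for RandomHorizontalFlip and RandomVerticalFlip; the function names are hypothetical:

```python
import random


def random_hflip(img, p=0.5, rng=random):
    """Mirror left/right (reverse each row) with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in img]
    return [row[:] for row in img]


def random_vflip(img, p=0.5, rng=random):
    """Mirror top/bottom (reverse the order of the rows) with probability p."""
    if rng.random() < p:
        return [row[:] for row in img[::-1]]
    return [row[:] for row in img]


img = [[1, 2, 3],
       [4, 5, 6]]
print(random_hflip(img, p=1.0))  # [[3, 2, 1], [6, 5, 4]]
print(random_vflip(img, p=1.0))  # [[4, 5, 6], [1, 2, 3]]
```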

Random resized crop

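Crop a random rectangle, then resize it back to the target size. This is often the most effective single augmentation in vision, because it varies scale and position (and, through the ratio range, aspect ratio) in one transform. Here scale=(0.1, 1) samples a crop covering 10% to 100% of the original area, and ratio=(0.5, 2) samples its width-to-height ratio: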

shape_aug = torchvision.transforms.RandomResizedCrop(
    (200, 200), scale=(0.1, 1), ratio=(0.5, 2))
apply(img, shape_aug)
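The box sampling behind RandomResizedCrop works as follows. Pick a target area as a fraction of the original, drawn from `scale`. Pick an aspect ratio (width/height) log-uniformly from `ratio`. Retry until the box fits inside the image. This follows the parameter semantics in the torchvision documentation. The retry count of 10 and the whole-image fallback are simplifications assumed here (torchvision falls back to a center crop). `sample_crop_box` is a hypothetical name, and the resize back to (200, 200) is not shown:

```python
import math
import random


def sample_crop_box(height, width, scale=(0.1, 1.0), ratio=(0.5, 2.0),
                    attempts=10, rng=random):
    """Return (top, left, h, w) of a random crop, RandomResizedCrop-style.

    Area fraction ~ Uniform(scale); aspect ratio w/h ~ log-uniform over ratio.
    """
    area = height * width
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(attempts):
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        # w * h is approximately target_area, and w / h is approximately aspect
        w = round(math.sqrt(target_area * aspect))
        h = round(math.sqrt(target_area / aspect))
        if 0 < w <= width and 0 < h <= height:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            return top, left, h, w
    # Simplified fallback (assumption): keep the whole image.
    return 0, 0, height, width


rng = random.Random(42)
for _ in range(3):
    print(sample_crop_box(256, 320, rng=rng))
```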

Color jitter — brightness

Brightness jitter changes illumination without moving object geometry. The object should still be recognizable across the sample grid:

apply(img, torchvision.transforms.ColorJitter(
    brightness=0.5, contrast=0, saturation=0, hue=0))
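ColorJitter's brightness argument sets a range for a single multiplicative factor. According to the torchvision documentation, the factor is drawn uniformly from [max(0, 1 - brightness), 1 + brightness], so brightness=0.5 scales every pixel by a value between 0.5 and 1.5. The sketch below works on a float image in [0, 1] stored as a list of rows; its function names are hypothetical:

```python
import random


def brightness_range(brightness):
    """Factor range implied by a ColorJitter-style brightness setting."""
    return max(0.0, 1.0 - brightness), 1.0 + brightness


def jitter_brightness(img, brightness=0.5, rng=random):
    """Scale every pixel (floats in [0, 1]) by one shared random factor, then clip."""
    lo, hi = brightness_range(brightness)
    factor = rng.uniform(lo, hi)
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in img]


print(brightness_range(0.5))  # (0.5, 1.5)
```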

Color jitter — hue

Hue jitter changes color statistics. Use it only when color is not itself the label signal:

apply(img, torchvision.transforms.ColorJitter(
    brightness=0, contrast=0, saturation=0, hue=0.5))
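Per the torchvision documentation, hue must lie in [0, 0.5]. The shift is drawn uniformly from [-hue, hue] and is measured as a fraction of a full turn around the color wheel, so hue=0.5 allows any rotation. A minimal sketch using the standard-library colorsys module on RGB tuples in [0, 1]; the function names are hypothetical:

```python
import colorsys
import random


def shift_hue(pixel, hue_factor):
    """Rotate the hue of one RGB pixel (floats in [0, 1]) by hue_factor turns."""
    h, s, v = colorsys.rgb_to_hsv(*pixel)
    return colorsys.hsv_to_rgb((h + hue_factor) % 1.0, s, v)


def jitter_hue(img, hue=0.5, rng=random):
    """Apply one random hue shift in [-hue, hue] to every pixel of an RGB image."""
    if not 0.0 <= hue <= 0.5:
        raise ValueError("hue must lie in [0, 0.5]")
    factor = rng.uniform(-hue, hue)
    return [[shift_hue(p, factor) for p in row] for row in img]


red = (1.0, 0.0, 0.0)
print(shift_hue(red, 1 / 3))  # approximately (0.0, 1.0, 0.0): pure green
```

Gray pixels have zero saturation, so hue jitter leaves them unchanged and only moves chromatic regions.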

Combined color jitter

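Brightness, contrast, saturation, and hue can be jittered together. Keep the magnitudes modest, since large jitters destroy semantic content. Note that hue=0.5 below is already the largest value ColorJitter accepts: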

color_aug = torchvision.transforms.ColorJitter(
    brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5)
apply(img, color_aug)
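With all four arguments set, ColorJitter draws one factor per property on each call. According to the torchvision documentation for ColorJitter.get_params, the four adjustments are also applied in a random order. The sketch below covers only the parameter sampling; applying contrast and saturation is omitted, and `sample_jitter_params` is a hypothetical name:

```python
import random


def sample_jitter_params(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5,
                         rng=random):
    """Sample ColorJitter-style factors plus a random order for applying them."""
    def factor(x):
        return rng.uniform(max(0.0, 1.0 - x), 1.0 + x)

    params = {
        "brightness": factor(brightness),  # multiplicative, centered on 1
        "contrast": factor(contrast),      # multiplicative, centered on 1
        "saturation": factor(saturation),  # multiplicative, centered on 1
        "hue": rng.uniform(-hue, hue),     # additive shift, in turns of the color wheel
    }
    order = list(params)
    rng.shuffle(order)                     # a fresh order on every call
    return params, order


params, order = sample_jitter_params(rng=random.Random(7))
print(order)
print({k: round(v, 3) for k, v in params.items()})
```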

Composing augmentations

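torchvision.transforms.Compose chains transforms and applies them in list order. Here the flip, color, and crop augmentations defined above are combined. In a training pipeline, ToTensor() (and usually normalization) is appended at the end: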

augs = torchvision.transforms.Compose([
    torchvision.transforms.RandomHorizontalFlip(), color_aug, shape_aug])
apply(img, augs)
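Compose is just function chaining: each transform receives the previous one's output, so order matters. For example, ToTensor must come before Normalize, which does not accept PIL images. The minimal re-implementation below uses toy numeric transforms in place of image transforms; `compose`, `add_one`, and `double` are hypothetical names:

```python
from functools import reduce
from typing import Callable, Sequence


def compose(transforms: Sequence[Callable]) -> Callable:
    """Chain transforms left to right, in the spirit of torchvision.transforms.Compose."""
    def pipeline(x):
        return reduce(lambda acc, t: t(acc), transforms, x)
    return pipeline


def add_one(x):
    return x + 1


def double(x):
    return x * 2


print(compose([add_one, double])(3))  # 8: (3 + 1) * 2
print(compose([double, add_one])(3))  # 7: 3 * 2 + 1
```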

Training with augmentation

Train a ResNet-18 on CIFAR-10 with and without augmentation. The model and hyperparameters stay the same; augmentation only transforms what the data loader returns:

training loader: random crop + random horizontal flip;
test loader: deterministic normalization only;
model and optimizer stay unchanged.

This separation matters: evaluation should measure the trained classifier, not randomness in the augmentation pipeline.
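The sketch below shows the two pipelines on a single-channel, list-of-rows image. The training side pads by 4 pixels, takes a random 32x32 crop, and flips with probability 0.5. The evaluation side is deterministic, so the same input always yields the same output. A padding of 4 is a common CIFAR-10 choice but is an assumption here, as are the placeholder normalization constants and all function names:

```python
import random

SIZE, PAD = 32, 4        # CIFAR-10 resolution; the padding is an assumed choice
MEAN, STD = 0.5, 0.25    # placeholder normalization constants (assumed)


def pad(img, p):
    """Zero-pad a list-of-rows image by p pixels on every side."""
    width = len(img[0]) + 2 * p
    blank = [[0.0] * width for _ in range(p)]
    body = [[0.0] * p + row + [0.0] * p for row in img]
    return blank + body + [r[:] for r in blank]


def random_crop(img, size, rng):
    """Take a size x size window at a uniformly random position."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]


def normalize(img):
    return [[(v - MEAN) / STD for v in row] for row in img]


def train_transform(img, rng=random):
    """Random: pad + random crop + random horizontal flip, then normalize."""
    out = random_crop(pad(img, PAD), SIZE, rng)
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]
    return normalize(out)


def eval_transform(img):
    """Deterministic: normalization only."""
    return normalize(img)


img = [[(r * SIZE + c) / (SIZE * SIZE) for c in range(SIZE)] for r in range(SIZE)]
a = train_transform(img, random.Random(0))
print(len(a), len(a[0]))                           # 32 32
print(eval_transform(img) == eval_transform(img))  # True
```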

CIFAR-10 samples

The batch should still look like valid CIFAR-10 examples, just shifted, cropped, and flipped. If many objects are cropped out, the augmentation is too aggressive.
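One way to put a number on "too aggressive" is the worst-case fraction of the original image a crop keeps. Assume the pad-by-4, 32x32 random crop that is common for CIFAR-10 (the padding is not stated above). Shifting the window by 4 pixels along both axes keeps (28/32)^2 of the image, about 77%. By contrast, RandomResizedCrop with scale=(0.1, 1) from the earlier example can keep as little as 10% of the area. A small sketch of that arithmetic (`worst_case_kept_fraction` is a hypothetical name):

```python
def worst_case_kept_fraction(size, pad):
    """Smallest fraction of original pixels inside a pad-then-crop window.

    The crop window is size x size; shifting it by `pad` pixels along both
    axes leaves (size - pad) x (size - pad) original pixels in view.
    """
    kept = max(0, size - pad)
    return (kept * kept) / (size * size)


print(worst_case_kept_fraction(32, 4))  # 0.765625
```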

Training helper

The training helper has no augmentation-specific logic.

It receives already-transformed minibatches from the data loader, then performs the usual supervised update:

\mathbf{x}' \sim a(\mathbf{x}), \qquad \min_\theta \ell(f_\theta(\mathbf{x}'), y).

That is the clean abstraction: augment in the input pipeline, train in the optimization loop.
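The toy sketch below illustrates this separation. It assumes a 1-D binary classification problem solved by logistic regression with plain SGD; the data, the noise augmentation, and every name are hypothetical. The loader draws a fresh x' ~ a(x) each time it serves an example, and `train` never sees the augmentation function:

```python
import math
import random

# Toy data: the label is 1 for positive x. Since |x| >= 1, adding noise in
# [-0.5, 0.5] never changes the sign, so this augmentation preserves labels.
DATA = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]


def augment(x, rng):
    return x + rng.uniform(-0.5, 0.5)


def loader(data, rng, aug=None):
    """Yield (x', y) pairs in shuffled order; x' ~ aug(x) if aug is given."""
    order = data[:]
    rng.shuffle(order)
    for x, y in order:
        yield (aug(x, rng) if aug else x), y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(make_batches, epochs=20, lr=0.1):
    """Plain SGD on the logistic loss; it knows nothing about augmentation."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in make_batches():
            err = y - sigmoid(w * x + b)  # negative gradient of the loss w.r.t. the logit
            w += lr * err * x
            b += lr * err
    return w, b


rng = random.Random(0)
w, b = train(lambda: loader(DATA, rng, augment))
accuracy = sum((w * x + b > 0) == bool(y) for x, y in DATA) / len(DATA)
print(f"w={w:.2f} b={b:.2f} clean accuracy={accuracy}")
```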

Train it

Augmentation often slows the decrease in training loss at first, because the task is harder. The payoff is lower test error, since the model overfits less.

loss 0.218, train acc 0.925, test acc 0.825
7055.8 examples/sec on [device(type='cuda', index=0)]

Recap

Augmentation = label-preserving random perturbations resampled every epoch, which effectively enlarges the training set.
Standard recipe: random horizontal flip + random resized crop + light color jitter.
Modern, more aggressive variants (RandAugment, AutoAugment, mixup, CutMix) can push accuracy further with the same data.
Apply augmentation only at training time; evaluation uses deterministic preprocessing only (normalization, plus a resize and center crop in ImageNet-style pipelines), with no random jitter.