Image augmentation multiplies the dataset’s effective size essentially for free: apply small, label-preserving perturbations (flips, crops, color jitter) on the fly during training. Each example is seen many times, but never in exactly the same way.
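Two effects: the model overfits less, because it never sees exactly the same input twice, and it becomes more invariant to the perturbations applied, such as shifts and changes of scale.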
Modern pipelines add random erasing, mixup, CutMix, and RandAugment: the same idea, applied more aggressively.
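As one illustration of this more aggressive family, mixup blends two examples and their labels with a weight drawn from a Beta distribution. The following is a minimal NumPy sketch, not a reference implementation; the helper name `mixup_pair` and the value `alpha=0.2` are illustrative assumptions.

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two examples and their one-hot labels (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)        # mixing weight in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2     # convex combination of the inputs
    y = lam * y1 + (1.0 - lam) * y2     # the same combination of the labels
    return x, y, lam

rng = np.random.default_rng(0)
x1, x2 = np.zeros((4, 4, 3)), np.ones((4, 4, 3))       # toy "images"
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])    # one-hot labels
x, y, lam = mixup_pair(x1, y1, x2, y2, rng=rng)
```

Unlike flips and crops, mixup changes the label along with the input, so the mixed label `y` has to be used in the loss.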
Load a sample image and define a helper that displays a grid of augmented samples:
The important object is the augmentation distribution, not the helper function used to plot it.
All augmentation examples start from this single image. The label is assumed to stay valid after each transform; that label-preserving assumption is what makes augmentation useful.
Random horizontal flip, torchvision.transforms.RandomHorizontalFlip(), is the cheapest and most widely used augmentation. It is safe when left/right orientation is not part of the label:
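To make the operation concrete, here is a small NumPy sketch of what a random horizontal flip does to an image array. It assumes an H x W x C layout; the function name and the `rng` argument are illustrative, not part of torchvision.

```python
import numpy as np

def random_horizontal_flip(img, p=0.5, rng=None):
    """Flip an H x W x C image left-to-right with probability p."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        return img[:, ::-1, :]   # reverse the width axis only
    return img

img = np.arange(2 * 3 * 1).reshape(2, 3, 1)
always = random_horizontal_flip(img, p=1.0)   # flip is certain
never = random_horizontal_flip(img, p=0.0)    # flip never happens
```

Flipping twice recovers the original image, and the label is untouched, which is why this transform is so cheap to add.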
Crop a random rectangle, then resize it back to the input size. Often the single most effective augmentation in vision, because it trains for scale invariance and translation invariance in one step:
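The crop-then-resize step can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: the `scale` and `ratio` defaults are arbitrary, the aspect ratio is sampled uniformly, and resizing uses nearest-neighbor indexing rather than the interpolation a real library would apply.

```python
import numpy as np

def random_resized_crop(img, out_size, scale=(0.1, 1.0), ratio=(0.5, 2.0), rng=None):
    """Crop a random rectangle and resize it to out_size x out_size."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    area = rng.uniform(*scale) * h * w       # target crop area: scale variation
    aspect = rng.uniform(*ratio)             # target width / height
    ch = int(np.clip(np.sqrt(area / aspect), 1, h))
    cw = int(np.clip(np.sqrt(area * aspect), 1, w))
    top = rng.integers(0, h - ch + 1)        # random position: translation variation
    left = rng.integers(0, w - cw + 1)
    crop = img[top:top + ch, left:left + cw]
    # Nearest-neighbor resize: map each output pixel to a source row and column.
    rows = np.arange(out_size) * ch // out_size
    cols = np.arange(out_size) * cw // out_size
    return crop[rows][:, cols]

img = np.arange(32 * 32 * 3).reshape(32, 32, 3)
out = random_resized_crop(img, 16, rng=np.random.default_rng(0))
```

The random area supplies the scale variation and the random position supplies the translation variation, so one transform covers both.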
Brightness jitter changes illumination without moving object geometry. The object should still be recognizable across the sample grid:
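A brightness jitter can be sketched as multiplying all pixels by one random factor. The sketch below assumes a float image in [0, 1]; the helper name and the `max_delta=0.5` default are illustrative choices.

```python
import numpy as np

def random_brightness(img, max_delta=0.5, rng=None):
    """Scale all pixel intensities by one random factor; geometry is untouched."""
    rng = np.random.default_rng() if rng is None else rng
    factor = rng.uniform(1.0 - max_delta, 1.0 + max_delta)
    return np.clip(img * factor, 0.0, 1.0), factor

img = np.full((2, 2, 3), 0.4)
out, factor = random_brightness(img, rng=np.random.default_rng(0))
```

Because every pixel is scaled by the same factor, edges and shapes stay where they were; only the overall illumination changes.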
Hue jitter changes color statistics. Use it only when color is not itself the label signal:
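To see why hue jitter is risky when color carries the label, the sketch below rotates the hue of individual pixels using Python's standard `colorsys` module. The helper names are illustrative, and the image is represented as a plain list of RGB tuples in [0, 1] to keep the example dependency-free.

```python
import colorsys
import random

def shift_hue(rgb, delta):
    """Rotate the hue of one (r, g, b) pixel by delta, a fraction of a full turn."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb((h + delta) % 1.0, s, v)

def random_hue(pixels, max_delta=0.5, rng=random):
    """Apply one random hue shift to every pixel of the image."""
    delta = rng.uniform(-max_delta, max_delta)
    return [shift_hue(p, delta) for p in pixels], delta

red = (1.0, 0.0, 0.0)
shifted = shift_hue(red, 1.0 / 3.0)            # a third of a turn moves red toward green
gray = shift_hue((0.5, 0.5, 0.5), 0.25)        # gray has no hue, so it is unchanged
```

A large shift turns a red object green, which is harmless for recognizing shapes but destroys the signal in a task such as classifying fruit ripeness by color.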
Brightness, contrast, saturation, hue — all at once. Tame the magnitudes; large jitters destroy semantic content:
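One common way to express these adjustments is as blends between the image and a reference: black for brightness, the mean gray level for contrast, and the per-pixel grayscale image for saturation. The NumPy sketch below follows that formulation under stated assumptions: the helper names and the 0.2 magnitudes are illustrative, hue is left out to keep it short, and the order is fixed, whereas libraries may randomize it.

```python
import numpy as np

def to_gray(img):
    """Luma-weighted grayscale of an H x W x 3 float image, kept as H x W x 1."""
    weights = np.array([0.299, 0.587, 0.114])
    return (img @ weights)[..., None]

def blend(a, b, factor):
    """factor=1 returns a, factor=0 returns b; the result is clipped to [0, 1]."""
    return np.clip(factor * a + (1.0 - factor) * b, 0.0, 1.0)

def color_jitter(img, brightness=0.2, contrast=0.2, saturation=0.2, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    draw = lambda m: rng.uniform(1.0 - m, 1.0 + m)    # one random factor per property
    img = blend(img, np.zeros_like(img), draw(brightness))                     # against black
    img = blend(img, np.full_like(img, to_gray(img).mean()), draw(contrast))   # against mean gray
    img = blend(img, to_gray(img), draw(saturation))                           # against grayscale
    return img

rng = np.random.default_rng(0)
img = rng.random((4, 4, 3))
jittered = color_jitter(img, rng=rng)
unchanged = color_jitter(img, 0.0, 0.0, 0.0, rng=rng)   # zero magnitude: identity
```

With small magnitudes every factor stays near 1 and the image stays close to the original; pushing the magnitudes up moves the result toward the black, flat gray, or grayscale references, which is where semantic content starts to disappear.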
Compose([flip, crop, color, ToTensor]) — a pipeline of transforms applied in order. Standard recipe:
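The "applied in order" behavior can be illustrated with a minimal stand-in for a composition class. This is a sketch of the idea in plain Python, not torchvision's implementation, and the toy transforms operate on lists of numbers only so that the effect of ordering is easy to see.

```python
class Compose:
    """Apply a list of callables in order (minimal stand-in for a transform pipeline)."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)          # the output of one transform feeds the next
        return x

# Toy "transforms", chosen so that order visibly matters.
double = lambda xs: [2 * v for v in xs]
add_one = lambda xs: [v + 1 for v in xs]

a = Compose([double, add_one])([1, 2, 3])   # double first, then add one
b = Compose([add_one, double])([1, 2, 3])   # add one first, then double
```

The two pipelines give different results, which is why the recipe lists its steps in a fixed order, with the conversion to a tensor placed last.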
Train a ResNet-18 on CIFAR-10 with and without augmentation. The model and hyperparameters are identical; augmentation only changes what the data loader outputs:
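Random augmentation belongs on the training split only; the test split gets deterministic preprocessing. This separation matters: evaluation should measure the trained classifier, not randomness in the augmentation pipeline.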
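A small sketch makes the difference between the two pipelines visible. It assumes a random flip as the only training-time augmentation and an identity function for evaluation; the names `make_transforms`, `train_tf`, and `eval_tf` are hypothetical.

```python
import numpy as np

def make_transforms(seed=0):
    rng = np.random.default_rng(seed)

    def train_tf(img):
        # Random flip: repeated calls on the same image can differ.
        return img[:, ::-1] if rng.random() < 0.5 else img

    def eval_tf(img):
        # No randomness: the same input always yields the same output.
        return img

    return train_tf, eval_tf

train_tf, eval_tf = make_transforms()
img = np.arange(12).reshape(3, 4)
# Collect the distinct outputs seen over 50 calls on one image.
train_views = {train_tf(img).tobytes() for _ in range(50)}
eval_views = {eval_tf(img).tobytes() for _ in range(50)}
```

The training transform produces more than one view of the same image, while the evaluation transform produces exactly one, so a reported test accuracy does not change from run to run because of the pipeline.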
The batch should still look like valid CIFAR-10 examples, just shifted, cropped, and flipped. If many objects are cropped out, the augmentation is too aggressive.
The training helper has no augmentation-specific logic.
It receives already-transformed minibatches from the data loader, then performs the usual supervised update:
$$\mathbf{x}' \sim a(\mathbf{x}), \qquad \min_\theta \ell(f_\theta(\mathbf{x}'), y).$$
That is the clean abstraction: augment in the input pipeline, train in the optimization loop.
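The split can be sketched with a toy problem in NumPy. Everything here is an illustrative assumption rather than the CIFAR-10 setup: a linear model stands in for the network, squared error for the loss, and reversing a feature vector for the augmentation, with labels chosen so that the reversal is label-preserving.

```python
import numpy as np

def loader(X, y, augment, batch_size, rng):
    """Input pipeline: shuffle, batch, and draw x' ~ a(x) for every example."""
    order = rng.permutation(len(X))
    for i in range(0, len(X), batch_size):
        idx = order[i:i + batch_size]
        yield np.stack([augment(x, rng) for x in X[idx]]), y[idx]

def sgd_step(theta, xb, yb, lr=0.1):
    """Optimization step: squared-loss gradient update, unaware of augmentation."""
    grad = 2.0 * xb.T @ (xb @ theta - yb) / len(xb)
    return theta - lr * grad

def flip(x, rng):
    """Augmentation a(x): reverse the feature vector half of the time."""
    return x[::-1] if rng.random() < 0.5 else x

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
y = X.sum(axis=1)              # the label is invariant to reversing x
theta = np.zeros(4)
for epoch in range(50):
    for xb, yb in loader(X, y, flip, batch_size=16, rng=rng):
        theta = sgd_step(theta, xb, yb)
```

Swapping `flip` for another augmentation, or for the identity, requires no change to `sgd_step`, which is the point of keeping the two stages separate.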
Augmentation often makes the training loss fall more slowly at first, because the training problem is harder. The payoff is lower validation error from reduced overfitting.
loss 0.209, train acc 0.928, test acc 0.833
7039.8 examples/sec on [device(type='cuda', index=0)]