Image augmentation multiplies the dataset's effective size for free: apply small, label-preserving perturbations on the fly during training (flips, crops, color jitter). Each example is seen many times, but never in exactly the same way.
Two effects follow: the model sees far more distinct training inputs than the dataset contains, and it learns to ignore nuisance variation such as position, scale, and illumination, which reduces overfitting.
Modern pipelines (random erasing, mixup, CutMix, RandAugment) apply the same idea more aggressively.
Load a sample image and define a helper, apply(img, aug), that displays a grid of augmented samples.
The important object is the augmentation distribution, not the helper function used to plot it.
All augmentation examples start from this single image. The label is assumed to stay valid after each transform; that label-preserving assumption is what makes augmentation useful.
Random horizontal flip is the cheapest and most widely used augmentation. It is safe when left/right orientation is not part of the label:
import numpy as np
import tensorflow as tf
from PIL import Image

def RandomHorizontalFlip():
    # Returns an augmenter: PIL image in, PIL image out, flipped left-right with probability 0.5
    def aug(img):
        img_tf = tf.constant(np.array(img))
        img_tf = tf.image.random_flip_left_right(img_tf)
        return Image.fromarray(img_tf.numpy())
    return aug

apply(img, RandomHorizontalFlip())
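The apply helper called above is not defined in this section. A minimal sketch of what such a helper might look like, assuming images are HxWxC arrays (PIL images also work, since np.asarray converts them) and that matplotlib is available for display. The signature, the show flag, and the returned list of samples are assumptions, not the original helper. Each call to aug draws a fresh random transform, so the grid is a sample from the augmentation distribution.

```python
import numpy as np

def apply(img, aug, num_rows=2, num_cols=4, scale=1.5, show=True):
    """Hypothetical grid helper: draw num_rows * num_cols samples of aug(img)."""
    # Every call to aug samples a new random transform
    samples = [np.asarray(aug(img)) for _ in range(num_rows * num_cols)]
    if show:
        import matplotlib.pyplot as plt  # only needed when displaying
        fig, axes = plt.subplots(num_rows, num_cols,
                                 figsize=(num_cols * scale, num_rows * scale))
        for ax, sample in zip(np.ravel(axes), samples):
            ax.imshow(sample)
            ax.axis("off")
        plt.show()
    return samples
```

Returning the samples as well as plotting them makes the helper usable in tests or for computing statistics over the augmented distribution.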
Vertical flip is used selectively. It is valid for some textures or aerial imagery, but usually wrong for faces, street scenes, and text.
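Following the pattern of RandomHorizontalFlip, a TensorFlow version would call tf.image.random_flip_up_down instead of tf.image.random_flip_left_right. The sketch below does the same in plain numpy so the mechanism is explicit. It assumes an HxWxC array and returns an array rather than a PIL image; the p and rng parameters are illustrative additions.

```python
import numpy as np

def RandomVerticalFlip(p=0.5, rng=None):
    """Augmenter (numpy sketch): flip top-to-bottom with probability p."""
    rng = np.random.default_rng() if rng is None else rng
    def aug(img):
        arr = np.asarray(img)
        if rng.random() < p:
            return arr[::-1].copy()  # reverse the row (height) axis
        return arr
    return aug
```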
Random resized crop: cut out a random rectangle and resize it back to the input size. It is often the single most effective augmentation in vision, because it provides scale invariance and translation invariance in one step:
def RandomResizedCrop(size, scale=(0.1, 1), ratio=(0.5, 2)):
    # size = (height, width) of the output; scale bounds the crop's fraction
    # of the image area; ratio bounds the crop's width / height
    target_h, target_w = size
    def aug(img):
        img_tf = tf.constant(np.array(img))
        h, w = tf.shape(img_tf)[0], tf.shape(img_tf)[1]
        area = tf.cast(h * w, tf.float32)
        # Sample the aspect ratio log-uniformly so r and 1/r are equally likely
        log_ratio = (tf.math.log(float(ratio[0])), tf.math.log(float(ratio[1])))
        target_area = tf.random.uniform([], scale[0], scale[1]) * area
        aspect = tf.exp(tf.random.uniform([], log_ratio[0], log_ratio[1]))
        # Solve crop_h * crop_w = target_area with crop_w / crop_h = aspect
        crop_h = tf.cast(tf.round(tf.sqrt(target_area / aspect)), tf.int32)
        crop_w = tf.cast(tf.round(tf.sqrt(target_area * aspect)), tf.int32)
        # Keep the crop inside the image and at least one pixel on each side
        crop_h = tf.clip_by_value(crop_h, 1, h)
        crop_w = tf.clip_by_value(crop_w, 1, w)
        # Random top-left corner; maxval is exclusive, hence the + 1
        offset_h = tf.random.uniform([], 0, h - crop_h + 1, dtype=tf.int32)
        offset_w = tf.random.uniform([], 0, w - crop_w + 1, dtype=tf.int32)
        img_tf = tf.image.crop_to_bounding_box(img_tf, offset_h, offset_w,
                                               crop_h, crop_w)
        # Resize back to the target size (bilinear by default, float32 output)
        img_tf = tf.image.resize(tf.cast(img_tf, tf.float32), [target_h, target_w])
        img_tf = tf.cast(tf.round(img_tf), tf.uint8)
        return Image.fromarray(img_tf.numpy())
    return aug

shape_aug = RandomResizedCrop((200, 200), scale=(0.1, 1), ratio=(0.5, 2))
apply(img, shape_aug)
Brightness jitter changes illumination without moving object geometry. The object should still be recognizable across the sample grid:
def RandomBrightness(max_delta):
    # Adds one random offset in [-max_delta, max_delta] to every pixel (values scaled to [0, 1])
    def aug(img):
        img_tf = tf.cast(tf.constant(np.array(img)), tf.float32) / 255.0
        img_tf = tf.image.random_brightness(img_tf, max_delta)
        img_tf = tf.clip_by_value(img_tf, 0.0, 1.0)
        return Image.fromarray(np.round(img_tf.numpy() * 255).astype(np.uint8))
    return aug

apply(img, RandomBrightness(0.5))
Hue jitter changes color statistics. Use it only when color is not itself the label signal.
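The color-jitter code in this section calls tf.image.random_hue for this. As a sketch of the mechanism (not TensorFlow's implementation), hue jitter converts RGB to HSV, rotates the hue channel by a random delta in [-max_delta, max_delta], and converts back. Hue is measured in turns on [0, 1), so a delta of 0.5 is half a turn around the color wheel. The function names are hypothetical, and the input is assumed to be an HxWx3 uint8 array.

```python
import numpy as np

def rgb_to_hsv(rgb):
    """Vectorized RGB -> HSV for floats in [0, 1]; hue is returned in [0, 1)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    maxc = rgb.max(axis=-1)
    delta = maxc - rgb.min(axis=-1)
    safe = np.where(delta > 0, delta, 1.0)  # avoid dividing by zero on gray pixels
    h6 = np.where(maxc == r, ((g - b) / safe) % 6.0,
                  np.where(maxc == g, (b - r) / safe + 2.0, (r - g) / safe + 4.0))
    h = np.where(delta > 0, h6 / 6.0, 0.0)
    s = np.where(maxc > 0, delta / np.where(maxc > 0, maxc, 1.0), 0.0)
    return np.stack([h, s, maxc], axis=-1)

def hsv_to_rgb(hsv):
    """Inverse of rgb_to_hsv (standard six-sector formula)."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    h6 = h * 6.0
    i = np.floor(h6).astype(int) % 6  # which 60-degree sector of the hue wheel
    f = h6 - np.floor(h6)             # position within that sector
    p = v * (1.0 - s)
    q = v * (1.0 - s * f)
    t = v * (1.0 - s * (1.0 - f))
    r = np.choose(i, [v, q, p, p, t, v])
    g = np.choose(i, [t, v, v, q, p, p])
    b = np.choose(i, [p, p, t, v, v, q])
    return np.stack([r, g, b], axis=-1)

def adjust_hue(img, delta):
    """Rotate the hue of a uint8 RGB image by delta turns (wrapping at 1)."""
    rgb = np.asarray(img, dtype=np.float64) / 255.0
    hsv = rgb_to_hsv(rgb)
    hsv[..., 0] = (hsv[..., 0] + delta) % 1.0
    out = hsv_to_rgb(hsv)
    return np.clip(np.round(out * 255.0), 0, 255).astype(np.uint8)

def RandomHue(max_delta, rng=None):
    """Augmenter: hue shift drawn uniformly from [-max_delta, max_delta]."""
    rng = np.random.default_rng() if rng is None else rng
    def aug(img):
        return adjust_hue(img, rng.uniform(-max_delta, max_delta))
    return aug
```

Only the hue channel moves, so saturation (S) and value (V) are preserved: shapes and shading stay intact while the color statistics change. A shift of 1/3 turn, for example, maps pure red to pure green.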
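Color jitter perturbs brightness, contrast, saturation, and hue in a single augmenter. Keep the magnitudes modest; large jitters destroy semantic content:
def RandomColorJitter(brightness=0, contrast=0, saturation=0, hue=0):
    # Each argument is a jitter magnitude; 0 disables that component
    def aug(img):
        img_tf = tf.cast(tf.constant(np.array(img)), tf.float32) / 255.0
        if brightness > 0:
            # Additive offset in [-brightness, brightness]
            img_tf = tf.image.random_brightness(img_tf, brightness)
        if contrast > 0:
            # Contrast factor in [1 - contrast, 1 + contrast]; lower bound must be >= 0
            img_tf = tf.image.random_contrast(img_tf, 1 - contrast, 1 + contrast)
        if saturation > 0:
            # Saturation factor in [1 - saturation, 1 + saturation]
            img_tf = tf.image.random_saturation(img_tf, 1 - saturation,
                                                1 + saturation)
        if hue > 0:
            # Hue shift in [-hue, hue]; tf.image.random_hue requires hue <= 0.5
            img_tf = tf.image.random_hue(img_tf, hue)
        img_tf = tf.clip_by_value(img_tf, 0.0, 1.0)
        return Image.fromarray(np.round(img_tf.numpy() * 255).astype(np.uint8))
    return aug

# hue=0.5 is the largest value random_hue accepts and spans the whole color wheel
color_aug = RandomColorJitter(brightness=0.5, contrast=0.5, saturation=0.5,
                              hue=0.5)
apply(img, color_aug)
Compose([flip, crop, color, ToTensor]) chains transforms and applies them in order. A standard training recipe is exactly that chain: a random flip, a random resized crop, and color jitter, followed by conversion to a tensor.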
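The augmenters in this section are plain callables, so composing them needs very little code. A minimal sketch of a hypothetical Compose for them, mirroring the idea of torchvision.transforms.Compose (this is not that class):

```python
def Compose(augs):
    """Hypothetical helper: chain augmenters so each output feeds the next, left to right."""
    def aug(img):
        for a in augs:
            img = a(img)
        return img
    return aug
```

With the augmenters defined above, a training pipeline might read train_aug = Compose([RandomHorizontalFlip(), shape_aug, color_aug]). Order matters: a tensor-conversion step would go last, because each augmenter here expects a PIL image as input.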
Next, train ResNet-18 on CIFAR-10 with and without augmentation. The model and hyperparameters stay the same; augmentation only transforms what the data loader emits.
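Random augmentation is applied to training images only; test images go through a deterministic pipeline. This separation matters: evaluation should measure the trained classifier, not randomness in the augmentation pipeline.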
A batch of augmented training images should still look like valid CIFAR-10 examples, just shifted, cropped, and flipped. If many objects are cropped out, the augmentation is too aggressive.
import pickle
d = pickle.load(f, encoding="bytes")  # Python 3 pickle (cPickle is Python 2 only); f is a batch file opened in "rb" mode
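A fuller version of that loading line, assuming the "python version" CIFAR-10 batch files: each unpickles to a dict whose b"data" entry is an N x 3072 uint8 array (per row, 1024 red values, then 1024 green, then 1024 blue, each channel stored row-major) and whose b"labels" entry is a list of N class indices. The function name is hypothetical.

```python
import pickle
import numpy as np

def load_cifar_batch(path):
    """Load one CIFAR-10 python batch file into (N, 32, 32, 3) uint8 images and labels."""
    with open(path, "rb") as f:
        # The files were pickled under Python 2; encoding="bytes" keeps keys as bytes
        d = pickle.load(f, encoding="bytes")
    data = np.asarray(d[b"data"], dtype=np.uint8)  # (N, 3072), channel-major rows
    images = data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)  # -> (N, 32, 32, 3)
    labels = np.asarray(d[b"labels"], dtype=np.int64)
    return images, labels
```

The transpose puts channels last, the layout tf.image functions and PIL expect.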
The training helper has no augmentation-specific logic.
It receives already-transformed minibatches from the data loader, then performs the usual supervised update:
$$\mathbf{x}' \sim a(\mathbf{x}), \qquad \min_\theta \ell(f_\theta(\mathbf{x}'), y).$$
That is the clean abstraction: augment in input pipeline, train in optimization loop.
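To make the split concrete, here is a numpy sketch with a hypothetical stand-in model: logistic regression on flattened images replaces ResNet-18, and augmented_batches, sgd_step, and train_epoch are illustrative names, not the section's training helper. The generator draws x' ~ a(x) for each minibatch; the update step only ever sees (x', y).

```python
import numpy as np

def augmented_batches(X, y, batch_size, transform, rng):
    """Input pipeline: shuffle, slice, then draw x' ~ a(x) for each minibatch."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield transform(X[idx], rng), y[idx]

def sgd_step(w, xb, yb, lr):
    """One logistic-regression SGD step; it never sees the augmentation."""
    feats = xb.reshape(len(xb), -1).astype(np.float64)
    probs = 1.0 / (1.0 + np.exp(-(feats @ w)))
    grad = feats.T @ (probs - yb) / len(xb)  # gradient of the mean log loss
    return w - lr * grad

def train_epoch(w, X, y, transform, rng, batch_size=32, lr=0.1):
    """Optimization loop: identical whether transform augments or not."""
    for xb, yb in augmented_batches(X, y, batch_size, transform, rng):
        w = sgd_step(w, xb, yb, lr)
    return w
```

Training without augmentation is the same call with transform=lambda batch, rng: batch, which is what keeps the with/without comparison fair.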
Augmentation often slows the decrease in training loss at first, because the problem is harder. The payoff is lower held-out error, because the model overfits less.
loss 0.179, train acc 0.937, test acc 0.800
715.7 examples/sec