%matplotlib inline
import os
from d2l import tensorflow as d2l
import tensorflow as tf
import keras

You'll rarely train a vision model from scratch. Transfer learning is the default recipe: start from weights pretrained on a large dataset (ImageNet) and adapt them to your small one.
Fine-tuning combines a pretrained backbone with a new task-specific head.
A tiny binary classification dataset (hot dog / not hot dog) is too small to train a CNN from scratch, which makes it a good fit for transfer learning:
from PIL import Image as _PILImage
import pathlib

def _load_image_folder(path):
    """Load images from a directory with class subfolders, returning
    a list of (PIL.Image, class_index) tuples."""
    path = pathlib.Path(path)
    class_names = sorted([p.name for p in path.iterdir() if p.is_dir()])
    class_to_idx = {c: i for i, c in enumerate(class_names)}
    items = []
    for cls in class_names:
        for img_path in sorted((path / cls).iterdir()):
            try:
                img = _PILImage.open(str(img_path)).convert('RGB')
                items.append((img, class_to_idx[cls]))
            except Exception:
                # Skip files that PIL cannot read as images.
                continue
    return items
# `data_dir` is assumed to be the dataset root containing `train/` and `test/`.
train_imgs = _load_image_folder(os.path.join(data_dir, 'train'))
test_imgs = _load_image_folder(os.path.join(data_dir, 'test'))

This follows the usual ImageNet recipe: a random crop plus horizontal flip for training, and a center crop for evaluation. Match the preprocessing convention that the pretrained model expects:
# Plain tf.image / tf.data preprocessing for Keras ResNet50 (NHWC). Keras
# ResNet50 expects its own `preprocess_input` convention, not PyTorch-style
# RGB mean/std normalization.
IMG_SIZE = 224

def _normalize(x):
    return tf.keras.applications.resnet50.preprocess_input(
        tf.cast(x, tf.float32))

def train_augs(x, training=False):
    # Input is (256, 256, 3), already resized by image_dataset_from_directory.
    x = tf.image.random_crop(x, (IMG_SIZE, IMG_SIZE, 3))
    x = tf.image.random_flip_left_right(x)
    return _normalize(x)

def test_augs(x, training=False):
    # Input is (256, 256, 3), already resized by image_dataset_from_directory.
    # Center crop to IMG_SIZE x IMG_SIZE.
    off = (256 - IMG_SIZE) // 2
    x = x[off:off + IMG_SIZE, off:off + IMG_SIZE, :]
    return _normalize(x)

The source model was trained for 1000 ImageNet classes. Its convolutional body is reusable; the final classifier is task-specific and will be replaced:
<Dense name=predictions, built=True>

Create a target model with the same pretrained backbone and a randomly initialized 2-way classifier for hot dog vs. not hot dog.
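To make "randomly initialized" concrete: the new head maps ResNet50's 2048-dimensional pooled feature vector to 2 logits. The following is a minimal pure-Python sketch of Glorot-uniform initialization for such a head. The names `glorot_uniform`, `FEATURE_DIM`, and `head_logits` are illustrative assumptions, not Keras APIs; in practice Keras does this internally when the `Dense` layer is built.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng):
    """Sample a fan_in x fan_out matrix from U(-limit, limit),
    where limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    weights = [[rng.uniform(-limit, limit) for _ in range(fan_out)]
               for _ in range(fan_in)]
    return weights, limit


FEATURE_DIM = 2048  # width of ResNet50's globally average-pooled features
NUM_CLASSES = 2     # hot dog / not hot dog

rng = random.Random(0)  # fixed seed so the sketch is reproducible
head_weights, limit = glorot_uniform(FEATURE_DIM, NUM_CLASSES, rng)
head_bias = [0.0] * NUM_CLASSES  # biases start at zero


def head_logits(features):
    """Apply the linear head to one feature vector: logits = W^T f + b."""
    return [
        sum(f * row[k] for f, row in zip(features, head_weights)) + head_bias[k]
        for k in range(NUM_CLASSES)
    ]


# A zero feature vector yields exactly the (zero) bias as logits.
zero_logits = head_logits([0.0] * FEATURE_DIM)
```

Because the head starts from small random weights while the backbone starts from useful ImageNet features, the two parts of the network are at very different stages of training, which motivates treating their learning rates differently.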
Let $\theta_b$ be the pretrained backbone parameters and $\theta_h$ the new head. Use a small step on $\theta_b$ and a larger one on $\theta_h$:
$$\eta_b = \eta,\qquad \eta_h = 10\eta.$$
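The two-learning-rate rule can be illustrated with plain SGD on toy numbers. This is only a sketch of the arithmetic, not the training helper used in this section: `sgd_step`, the toy parameter lists, and the value chosen for the base rate are all hypothetical.

```python
def sgd_step(params, grads, lr):
    """One plain SGD update: p <- p - lr * g for each parameter."""
    return [p - lr * g for p, g in zip(params, grads)]


base_lr = 5e-5           # eta: illustrative value, not taken from the text
head_lr = 10 * base_lr   # eta_h = 10 * eta

backbone = [0.5, -0.2]   # toy stand-ins for theta_b (pretrained)
head = [0.0, 0.0]        # toy stand-ins for theta_h (freshly initialized)

# Use identical gradients so any difference in movement comes from the
# learning rates alone.
grads = [1.0, 1.0]

new_backbone = sgd_step(backbone, grads, base_lr)
new_head = sgd_step(head, grads, head_lr)

backbone_move = abs(new_backbone[0] - backbone[0])
head_move = abs(new_head[0] - head[0])
print(f"backbone moved {backbone_move:.2e}, head moved {head_move:.2e}")
```

With equal gradients the head moves ten times as far per step as the backbone, so the pretrained features are only nudged while the new classifier learns quickly.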
# Pretrained ResNet50 base (no top) + global average pool + fresh 2-class head
finetune_net = keras.Sequential([
    keras.applications.ResNet50(weights='imagenet', include_top=False,
                                pooling='avg',
                                input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    keras.layers.Dense(2, kernel_initializer='glorot_uniform',
                       name='classifier'),
])

The helper hides framework details: parameter groups, optimizer construction, metric logging, and the scratch/fine-tune switch. The four-step pattern is:
With matched ImageNet preprocessing and a small base learning rate, the pretrained model should reach useful accuracy quickly. The point is not just a better final score: fine-tuning also needs much less data and compute than training the same network from random initialization.
epoch 1, loss 0.879, train acc 0.516, test acc 0.564
epoch 2, loss 0.773, train acc 0.566, test acc 0.635
epoch 3, loss 0.710, train acc 0.610, test acc 0.688
epoch 4, loss 0.660, train acc 0.663, test acc 0.716
epoch 5, loss 0.603, train acc 0.692, test acc 0.738
The same architecture without pretraining does much worse on this small dataset (test accuracy 0.539 after five epochs, versus 0.738 with fine-tuning), which illustrates why transfer learning is the default:
# Train from scratch: same architecture but with random (no-pretrain) weights.
scratch_net = keras.Sequential([
    keras.applications.ResNet50(weights=None, include_top=False,
                                pooling='avg',
                                input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    keras.layers.Dense(2, kernel_initializer='glorot_uniform',
                       name='classifier'),
])
train_fine_tuning(scratch_net, 5e-4, param_group=False)

epoch 1, loss 0.678, train acc 0.575, test acc 0.478
epoch 2, loss 0.676, train acc 0.572, test acc 0.479
epoch 3, loss 0.667, train acc 0.592, test acc 0.520
epoch 4, loss 0.656, train acc 0.608, test acc 0.535
epoch 5, loss 0.645, train acc 0.631, test acc 0.539
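The two training logs above can be compared epoch by epoch. The sketch below parses the log lines exactly as printed in this section and reports the test-accuracy gap. The helper `parse_log` and the variable names are illustrative, and the regular expression assumes the log format shown here.

```python
import re

FINE_TUNED_LOG = """\
epoch 1, loss 0.879, train acc 0.516, test acc 0.564
epoch 2, loss 0.773, train acc 0.566, test acc 0.635
epoch 3, loss 0.710, train acc 0.610, test acc 0.688
epoch 4, loss 0.660, train acc 0.663, test acc 0.716
epoch 5, loss 0.603, train acc 0.692, test acc 0.738
"""

SCRATCH_LOG = """\
epoch 1, loss 0.678, train acc 0.575, test acc 0.478
epoch 2, loss 0.676, train acc 0.572, test acc 0.479
epoch 3, loss 0.667, train acc 0.592, test acc 0.520
epoch 4, loss 0.656, train acc 0.608, test acc 0.535
epoch 5, loss 0.645, train acc 0.631, test acc 0.539
"""

LINE_RE = re.compile(
    r"epoch (\d+), loss ([\d.]+), train acc ([\d.]+), test acc ([\d.]+)"
)


def parse_log(text):
    """Turn log text into a list of per-epoch metric dictionaries."""
    return [
        {
            "epoch": int(m.group(1)),
            "loss": float(m.group(2)),
            "train_acc": float(m.group(3)),
            "test_acc": float(m.group(4)),
        }
        for m in LINE_RE.finditer(text)
    ]


fine_tuned = parse_log(FINE_TUNED_LOG)
scratch = parse_log(SCRATCH_LOG)

# Test-accuracy advantage of fine-tuning over training from scratch.
test_acc_gap = [
    round(f["test_acc"] - s["test_acc"], 3)
    for f, s in zip(fine_tuned, scratch)
]

for f, gap in zip(fine_tuned, test_acc_gap):
    print(f"epoch {f['epoch']}: fine-tuning leads by {gap:.3f} test accuracy")
```

In these logs the gap grows every epoch. The scratch model stays close to chance on the test set, while the fine-tuned model keeps improving.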
The natural ablations are: freeze more or fewer layers, change the backbone/head learning-rate ratio, and compare against the source ImageNet “hotdog” class weights.
Match the input preprocessing (here, `preprocess_input`) to what the pretrained model expects.
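To see why the preprocessing convention matters, the sketch below contrasts two common ImageNet conventions on a single RGB pixel. It assumes the "caffe"-style behaviour that Keras documents for ResNet50's `preprocess_input` (RGB to BGR, per-channel mean subtraction, no scaling) and the mean/std normalization commonly used with PyTorch models. Both functions are hand-written illustrations, not library calls.

```python
# ImageNet channel means in BGR order, on the 0..255 scale
# (assumed "caffe"-style convention).
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)

# ImageNet mean/std in RGB order, on the 0..1 scale (PyTorch-style convention).
IMAGENET_MEAN_RGB = (0.485, 0.456, 0.406)
IMAGENET_STD_RGB = (0.229, 0.224, 0.225)


def caffe_style_preprocess(pixel_rgb):
    """Reorder RGB -> BGR, then subtract the per-channel mean (no scaling)."""
    r, g, b = pixel_rgb
    bgr = (b, g, r)
    return tuple(c - m for c, m in zip(bgr, IMAGENET_MEAN_BGR))


def torch_style_preprocess(pixel_rgb):
    """Scale to 0..1, then standardize each RGB channel by mean and std."""
    return tuple(
        (c / 255.0 - m) / s
        for c, m, s in zip(pixel_rgb, IMAGENET_MEAN_RGB, IMAGENET_STD_RGB)
    )


pure_red = (255.0, 0.0, 0.0)
caffe_out = caffe_style_preprocess(pure_red)
torch_out = torch_style_preprocess(pure_red)
print("caffe-style:", caffe_out)
print("torch-style:", torch_out)
```

The same pixel ends up with a different channel order and a very different numeric range under the two conventions, so feeding a pretrained backbone inputs prepared with the wrong one silently degrades its features.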