The hot-dog dataset

Fine-Tuning

You’ll rarely train a vision model from scratch. Transfer learning — start from weights pretrained on a big dataset (ImageNet) and adapt to your small one — is the default recipe.

Fine-tuning: pretrained backbone + new task-specific head.

The standard recipe

  1. Take a pretrained network (ResNet, ViT, etc.).
  2. Replace the output layer with a head for your task.
  3. Optionally freeze early layers; train the rest.
  4. Small LR on the pretrained part, larger LR on the new head.

Setup

%matplotlib inline
import os
from d2l import tensorflow as d2l
import tensorflow as tf
import keras

A tiny binary classification dataset (hot dog / not hot dog) — too small to train a CNN from scratch, perfect for transfer learning:

d2l.DATA_HUB['hotdog'] = (d2l.DATA_URL + 'hotdog.zip', 
                         'fba480ffa8aa7e0febbb511d181409f899b9baa5')

data_dir = d2l.download_extract('hotdog')
from PIL import Image as _PILImage
import pathlib

def _load_image_folder(path):
    """Load images from a directory with class subfolders, returning
    a list of (PIL.Image, class_index) tuples."""
    path = pathlib.Path(path)
    class_names = sorted([p.name for p in path.iterdir() if p.is_dir()])
    class_to_idx = {c: i for i, c in enumerate(class_names)}
    items = []
    for cls in class_names:
        for img_path in sorted((path / cls).iterdir()):
            try:
                img = _PILImage.open(str(img_path)).convert('RGB')
                items.append((img, class_to_idx[cls]))
            except Exception:
                continue
    return items

train_imgs = _load_image_folder(os.path.join(data_dir, 'train'))
test_imgs = _load_image_folder(os.path.join(data_dir, 'test'))
hotdogs = [train_imgs[i][0] for i in range(8)]
not_hotdogs = [train_imgs[-i - 1][0] for i in range(8)]
d2l.show_images(hotdogs + not_hotdogs, 2, 8, scale=1.4);
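
The loader assigns labels by sorting the class subfolder names, which is why the first and last training images above belong to different classes. A minimal sketch of that mapping, using only the standard library; the folder names `hotdog` and `not-hotdog` are assumptions for illustration, since the real names come from the extracted archive:

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as root:
    root = pathlib.Path(root)
    # Hypothetical class folders, created in reverse order on purpose.
    for name in ('not-hotdog', 'hotdog'):
        (root / name).mkdir()
    # Same logic as _load_image_folder: sort names, then enumerate.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {c: i for i, c in enumerate(class_names)}

print(class_to_idx)  # {'hotdog': 0, 'not-hotdog': 1}
```

Under these assumed names the hot dog class gets index 0, independent of the order in which the folders were created.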

Augmentation pipelines

ImageNet-style recipe: random crop plus horizontal flip for training, center crop for evaluation. Match the preprocessing convention that the pretrained model expects:

# Plain tf.image / tf.data preprocessing for Keras ResNet50 (NHWC). Keras
# ResNet50 expects its own `preprocess_input` convention, not PyTorch-style
# RGB mean/std normalization.
IMG_SIZE = 224

def _normalize(x):
    return tf.keras.applications.resnet50.preprocess_input(
        tf.cast(x, tf.float32))

def train_augs(x, training=False):
    # Input is expected to be (256, 256, 3), e.g. already resized by
    # image_dataset_from_directory; the PIL loader above does not resize.
    # The `training` argument is accepted but unused.
    x = tf.image.random_crop(x, (IMG_SIZE, IMG_SIZE, 3))
    x = tf.image.random_flip_left_right(x)
    return _normalize(x)

def test_augs(x, training=False):
    # Same (256, 256, 3) input assumption as train_augs.
    # Center crop to IMG_SIZE x IMG_SIZE.
    off = (256 - IMG_SIZE) // 2
    x = x[off:off + IMG_SIZE, off:off + IMG_SIZE, :]
    return _normalize(x)
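
To make the crop arithmetic concrete, here is a minimal NumPy sketch of the same two crops on a dummy array. It is an illustration under stated assumptions, not the pipeline used in this section: the 256x256 input size is assumed, and the helper names `center_crop` and `random_crop_flip` are hypothetical.

```python
import numpy as np

IMG_SIZE = 224   # crop size, as in this section
RESIZED = 256    # assumed size of the already-resized input

def center_crop(x, size=IMG_SIZE):
    """Crop the central size x size window from an (H, W, C) array."""
    h, w = x.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return x[top:top + size, left:left + size, :]

def random_crop_flip(x, rng, size=IMG_SIZE):
    """Random size x size crop followed by a horizontal flip half the time."""
    h, w = x.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    patch = x[top:top + size, left:left + size, :]
    if rng.random() < 0.5:
        patch = patch[:, ::-1, :]   # reverse the width axis
    return patch

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(RESIZED, RESIZED, 3), dtype=np.uint8)
eval_patch = center_crop(image)
train_patch = random_crop_flip(image, rng)
print(eval_patch.shape, train_patch.shape)  # (224, 224, 3) (224, 224, 3)
```

With a 256-pixel input the center crop starts at offset (256 - 224) // 2 = 16 on both axes, so evaluation always sees the same window while training sees a different one each time.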

Inspect the pretrained head

The source model was trained for 1000 ImageNet classes. Its convolutional body is reusable; the final classifier is task-specific and will be replaced:

# Load pretrained ResNet50 (full model with top) to inspect the output layer
pretrained_net = keras.applications.ResNet50(weights='imagenet')
pretrained_net.layers[-1]
<Dense name=predictions, built=True>

Replace the task head

Create a target model with the same pretrained backbone and a randomly initialized 2-way classifier for hot dog vs. not hot dog:

# Pretrained ResNet50 base (no top) + global average pool + fresh 2-class head
finetune_net = keras.Sequential([
    keras.applications.ResNet50(weights='imagenet', include_top=False,
                                pooling='avg',
                                input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    keras.layers.Dense(2, kernel_initializer='glorot_uniform',
                       name='classifier'),
])

Discriminative learning rates

Let \theta_b be pretrained backbone parameters and \theta_h the new head. Use a small step on \theta_b and a larger one on \theta_h:

\eta_b = \eta,\qquad \eta_h = 10\eta.
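
As a worked instance of the two step sizes in this section, assume plain SGD on a loss L and the base rate \eta = 5 \times 10^{-5} that is passed to the fine-tuning run. The optimizer inside the training helper is not shown, so plain SGD is an assumption made only for illustration:

```latex
\theta_b \leftarrow \theta_b - \eta_b \,\nabla_{\theta_b} L, \qquad
\theta_h \leftarrow \theta_h - \eta_h \,\nabla_{\theta_h} L, \qquad
\eta_b = 5 \times 10^{-5}, \quad \eta_h = 10\,\eta_b = 5 \times 10^{-4}.
```

Under this assumption the head steps at 5 \times 10^{-4}, the same value the from-scratch baseline passes as its single learning rate.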

Training helper

The helper hides framework details: parameter groups, optimizer construction, metric logging, and the scratch/fine-tune switch. The four-step pattern is:

  • build the pretrained backbone and new head;
  • assign a small learning rate to backbone parameters;
  • assign a larger learning rate to the randomly initialized head;
  • train and compare against a scratch baseline.
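
The helper itself is not shown here, so the following is only a toy NumPy stand-in for the pattern above, not the actual `train_fine_tuning`. The function `toy_fine_tune`, the linear "backbone" and "head", the identity matrix standing in for pretrained weights, and the synthetic data are all hypothetical; the 10x head multiplier follows the ratio stated in this section.

```python
import numpy as np

def softmax_xent(logits, y):
    """Mean softmax cross-entropy and its gradient w.r.t. the logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n = len(y)
    loss = -np.log(p[np.arange(n), y]).mean()
    p[np.arange(n), y] -= 1.0          # p - onehot(y)
    return loss, p / n

def toy_fine_tune(params, x, y, lr, param_group=True, head_mult=10, steps=50):
    """Full-batch SGD on a linear backbone followed by a linear head."""
    # One learning rate per parameter group; a single rate if param_group=False.
    lrs = {'backbone': lr, 'head': lr * head_mult if param_group else lr}
    losses = []
    for _ in range(steps):
        feats = x @ params['backbone']
        loss, dlogits = softmax_xent(feats @ params['head'], y)
        grads = {
            'head': feats.T @ dlogits,
            'backbone': x.T @ (dlogits @ params['head'].T),
        }
        for name in params:
            params[name] -= lrs[name] * grads[name]   # in-place update
        losses.append(loss)
    return losses

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))
y = (x[:, 0] > 0).astype(int)          # synthetic binary labels

# "Pretrained" backbone (identity, purely illustrative) plus a fresh head.
params = {'backbone': np.eye(8), 'head': 0.01 * rng.normal(size=(8, 2))}
start = {k: v.copy() for k, v in params.items()}
losses = toy_fine_tune(params, x, y, lr=0.01)

moved = {k: float(np.abs(params[k] - start[k]).max()) for k in params}
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}", moved)
```

Calling the same function with `param_group=False` and fresh random backbone weights gives the scratch-style baseline for comparison; the `moved` dictionary shows how far each group travelled from its starting point.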

Run fine-tuning

With matched ImageNet preprocessing and a small base LR, the pretrained model should reach useful accuracy quickly. The point is not just a better final score: fine-tuning needs much less data and compute than training the same network cold.

train_fine_tuning(finetune_net, 5e-5)
epoch 1, loss 0.879, train acc 0.516, test acc 0.564
epoch 2, loss 0.773, train acc 0.566, test acc 0.635
epoch 3, loss 0.710, train acc 0.610, test acc 0.688
epoch 4, loss 0.660, train acc 0.663, test acc 0.716
epoch 5, loss 0.603, train acc 0.692, test acc 0.738

From-scratch baseline

Same architecture, no pretraining. After five epochs it reaches only 0.539 test accuracy, against 0.738 for the fine-tuned model, which illustrates why transfer learning is the default on small datasets:

# Train from scratch: same architecture but with random (no-pretrain) weights.
scratch_net = keras.Sequential([
    keras.applications.ResNet50(weights=None, include_top=False,
                                pooling='avg',
                                input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    keras.layers.Dense(2, kernel_initializer='glorot_uniform',
                       name='classifier'),
])
train_fine_tuning(scratch_net, 5e-4, param_group=False)
epoch 1, loss 0.678, train acc 0.575, test acc 0.478
epoch 2, loss 0.676, train acc 0.572, test acc 0.479
epoch 3, loss 0.667, train acc 0.592, test acc 0.520
epoch 4, loss 0.656, train acc 0.608, test acc 0.535
epoch 5, loss 0.645, train acc 0.631, test acc 0.539

What to vary

The natural ablations are: freeze more or fewer layers, change the backbone/head learning-rate ratio, and check whether the source model's ImageNet “hotdog” class weights (output index 934) help, for example as an initialization for the new head.

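# Freeze the ResNet50 backbone (layer 0 of the Sequential); only the head trains.
# If the model was already compiled, compile it again so the change takes effect.
finetune_net.layers[0].trainable = False

# Kernel of the source model's 1000-way classifier; column 934 is the
# ImageNet "hotdog" class.
weight = pretrained_net.layers[-1].get_weights()[0]  # Shape: (2048, 1000)
hotdog_w = weight[:, 934]
hotdog_w.shape
(2048,)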
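
One way to use that weight vector is to copy it into the new head as an initialization. The sketch below shows only the array bookkeeping with NumPy. The random matrix is a stand-in for the real kernel (only the shape matches), and treating hot dog as class 0 in the target dataset is an assumption:

```python
import numpy as np

HOTDOG_IMAGENET_IDX = 934   # index used in this section
rng = np.random.default_rng(0)

# Stand-in for pretrained_net.layers[-1].get_weights()[0]; the values are
# random, only the (2048, 1000) shape matches the real kernel.
weight = rng.normal(scale=0.01, size=(2048, 1000)).astype(np.float32)
hotdog_w = weight[:, HOTDOG_IMAGENET_IDX]

# Kernel for a Dense(2) head: column 0 copies the source "hotdog" weights
# (assumes hot dog is class 0), column 1 keeps its random initialization.
head_kernel = rng.normal(scale=0.01, size=(2048, 2)).astype(np.float32)
head_kernel[:, 0] = hotdog_w
head_bias = np.zeros(2, dtype=np.float32)
print(head_kernel.shape, head_bias.shape)  # (2048, 2) (2,)
```

With the real model, arrays of these shapes could then be loaded into the head with `finetune_net.layers[-1].set_weights([head_kernel, head_bias])`; whether this helps accuracy is exactly what the ablation would measure.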

Recap

  • Transfer learning: pretrained backbone + new head; almost always beats from-scratch on small / medium datasets.
  • Use small LR on the backbone (10×–100× smaller than the head LR) — pretrained features need only nudges.
  • Match input preprocessing (mean/std normalization, input size, or model-specific preprocess_input) to what the pretrained model expects.
  • Modern variants: feature-extractor mode (freeze everything but head), full fine-tune (everything trains), parameter-efficient methods (LoRA, adapters).