The Image Classification Dataset

Dive into Deep Learning · §3.2

Fashion-MNIST, the dataset we will classify for the rest of this chapter.

MNIST is solved; Fashion-MNIST is not

On MNIST, even simple models exceed 95% accuracy and a linear classifier tops 90%, so the benchmark barely separates strong models from weak ones.
Fashion-MNIST is a drop-in replacement with the same shape and API but harder clothing classes: $28 \times 28$ grayscale images, 10 classes, 60,000 training and 10,000 test images.

Here a linear model tops out near 82% (see the softmax-from-scratch section), leaving headroom for the deeper models of later chapters.

Loading the Data

a reusable DataModule per framework

Wrap it once, reuse everywhere

A DataModule owns the framework-specific download, transforms, and train/val splits, so every model we build later simply asks for batches:

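import time
import tensorflow as tf
from d2l import jax as d2l
from jax import numpy as jnp

class FashionMNIST(d2l.DataModule):
    """The Fashion-MNIST dataset."""
    def __init__(self, batch_size=64, resize=(28, 28)):
        super().__init__()
        self.save_hyperparameters()
        # ((x_train, y_train), (x_test, y_test)) as NumPy arrays; test = val.
        self.train, self.val = tf.keras.datasets.fashion_mnist.load_data()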
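The division of labour can be sketched without any framework. DataModuleSketch and ToyData below are hypothetical stand-ins written for illustration, not d2l's classes: the base class routes train_dataloader and val_dataloader through a single get_dataloader method, and a subclass only supplies its splits and that one method.

```python
class DataModuleSketch:
    """Hypothetical base class: subclasses only implement get_dataloader."""

    def __init__(self, batch_size=64):
        self.batch_size = batch_size

    def get_dataloader(self, train):
        raise NotImplementedError  # each dataset supplies its own loader

    def train_dataloader(self):
        return self.get_dataloader(train=True)

    def val_dataloader(self):
        return self.get_dataloader(train=False)


class ToyData(DataModuleSketch):
    """A made-up dataset: 10 training and 4 validation (x, label) pairs."""

    def __init__(self, batch_size=4):
        super().__init__(batch_size)
        self.train = ([float(i) for i in range(10)], [i % 2 for i in range(10)])
        self.val = ([float(i) for i in range(4)], [i % 2 for i in range(4)])

    def get_dataloader(self, train):
        X, y = self.train if train else self.val
        # Serve consecutive slices of batch_size examples (no shuffling here).
        for start in range(0, len(y), self.batch_size):
            end = start + self.batch_size
            yield X[start:end], y[start:end]


data = ToyData()
sizes = [len(yb) for _, yb in data.train_dataloader()]
print(sizes)  # [4, 4, 2]
```

Model code written against train_dataloader and val_dataloader never needs to know which dataset sits behind them.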

60,000 train, 10,000 test

Instantiate it, resizing images to $32 \times 32$ to match the ConvNet inputs in later chapters:

data = FashionMNIST(resize=(32, 32))
len(data.train[0]), len(data.val[0])

(60000, 10000)

Ten classes with 6,000 training images each give 60,000; the test set has 1,000 per class.
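Class balance is worth checking rather than assuming. A minimal sketch, assuming only that labels are integers: class_counts is a hypothetical helper, and the label list below is synthetic, built to mirror the 6,000-per-class training split. On the real data one would pass data.train[1], the label array returned by load_data.

```python
from collections import Counter

def class_counts(labels):
    """Count how many examples carry each integer label, sorted by label."""
    return dict(sorted(Counter(int(l) for l in labels).items()))

# Synthetic stand-in for the training labels: 10 classes x 6,000 each.
fake_train_labels = [c for c in range(10) for _ in range(6000)]
counts = class_counts(fake_train_labels)
print(len(fake_train_labels), set(counts.values()))  # 60000 {6000}
```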

One image: channel-last

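TensorFlow and JAX store images channel-last, $h \times w \times c$, with the color axis at the end. Pulling one image from the training loader (get_dataloader, defined under Reading Minibatches below) shows the layout: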

X, y = next(iter(data.train_dataloader()))
X[0].shape  # channel-last: (height, width, channels)

(32, 32, 1)

One grayscale channel ($c = 1$) stored last, after height and width; PyTorch and MXNet would report (1, 32, 32) for the same image.
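The two layouts hold the same numbers, only indexed differently: entry [i][j][k] of an $h \times w \times c$ image becomes entry [k][i][j] in $c \times h \times w$ order. A framework-free sketch on nested lists follows; hwc_to_chw and shape are hypothetical helpers (with NumPy the same reordering is np.transpose(img, (2, 0, 1))).

```python
def hwc_to_chw(img):
    """Reorder a nested-list image from (height, width, channels) to
    (channels, height, width), the layout PyTorch/MXNet expect."""
    h, w, c = len(img), len(img[0]), len(img[0][0])
    return [[[img[i][j][k] for j in range(w)] for i in range(h)]
            for k in range(c)]

def shape(t):
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)

# A tiny 2x3 three-channel image, channel-last; each value encodes (i, j, k).
img = [[[100 * i + 10 * j + k for k in range(3)] for j in range(3)]
       for i in range(2)]
chw = hwc_to_chw(img)
print(shape(img), shape(chw))  # (2, 3, 3) (3, 2, 3)
```

For Fashion-MNIST $c = 1$, so the reordering only moves a length-1 axis from the back to the front.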

Labels as words, not integers

The dataset stores labels as integers 0–9. A tiny helper maps them to names so our spot-checks are readable:

@d2l.add_to_class(FashionMNIST)
def text_labels(self, indices):
    """Return text labels."""
    labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
              'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [labels[int(i)] for i in indices]
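Because text_labels only indexes a fixed list, it can be tried without loading any data. A standalone sketch follows; the module-level list FASHION_MNIST_LABELS and the inverse dictionary LABEL_TO_ID are hypothetical additions, handy when filtering a batch by class name.

```python
FASHION_MNIST_LABELS = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                        'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']

def text_labels(indices):
    """Map integer class ids (0-9) to their Fashion-MNIST names."""
    return [FASHION_MNIST_LABELS[int(i)] for i in indices]

# Hypothetical inverse helper: class name -> integer id.
LABEL_TO_ID = {name: i for i, name in enumerate(FASHION_MNIST_LABELS)}

print(text_labels([0, 9, 5]))  # ['t-shirt', 'ankle boot', 'sandal']
print(LABEL_TO_ID['sneaker'])  # 7
```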

Reading Minibatches

the iterator that feeds training

The data iterator

get_dataloader adds the trailing channel axis, scales pixels to [0, 1], resizes, shuffles the training split, and serves a batch_size-sized minibatch each step:

import tensorflow_datasets as tfds

@d2l.add_to_class(FashionMNIST)
def get_dataloader(self, train):
    data = self.train if train else self.val
    # Add a trailing channel axis, scale uint8 pixels to [0, 1], cast labels.
    process = lambda X, y: (tf.expand_dims(X, axis=3) / 255,
                            tf.cast(y, dtype='int32'))
    resize_fn = lambda X, y: (tf.image.resize_with_pad(X, *self.resize), y)
    shuffle_buf = len(data[0]) if train else 1
    # `drop_remainder=train` keeps every training minibatch the same
    # shape, so JAX does not retrace the `@jax.jit`'d step function for
    # a smaller last batch.
    dataset = (tf.data.Dataset.from_tensor_slices(process(*data)).shuffle(
        shuffle_buf).batch(self.batch_size, drop_remainder=train).map(
            resize_fn))
    # Yield plain NumPy arrays, which JAX consumes directly.
    return tfds.as_numpy(dataset)
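Stripped of tf.data, the pipeline's logic fits in one generator. This is a sketch under simplifying assumptions (images as flat lists of 0-255 integers, no resizing, a fixed seed for reproducibility); minibatches is a hypothetical name, not part of d2l or TensorFlow. It shuffles only when training and, like drop_remainder=train, skips a short final training batch.

```python
import random

def minibatches(X, y, batch_size, train, seed=0):
    """Yield (features, labels) minibatches: scale pixels to [0, 1],
    shuffle only when training, and drop a short final training batch."""
    indices = list(range(len(y)))
    if train:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        if train and len(batch) < batch_size:
            break  # mimic drop_remainder=True for the training split
        features = [[p / 255 for p in X[i]] for i in batch]
        labels = [int(y[i]) for i in batch]
        yield features, labels


X = [[i, 255] for i in range(10)]  # ten toy "images" of two pixels each
y = list(range(10))
train_sizes = [len(yb) for _, yb in minibatches(X, y, 4, train=True)]
val_sizes = [len(yb) for _, yb in minibatches(X, y, 4, train=False)]
print(train_sizes, val_sizes)  # [4, 4] [4, 4, 2]
```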

What one minibatch looks like

Pull one batch and read its shapes off directly:

X, y = next(iter(data.train_dataloader()))
print(X.shape, X.dtype, y.shape, y.dtype)

(64, 32, 32, 1) float32 (64,) int32

64 images of $32 \times 32$ pixels with one grayscale channel, in (batch, height, width, channel) order, plus a matching vector of 64 integer labels.
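The batch size also fixes how many steps make up an epoch. With drop_remainder=train, divmod(60000, 64) = (937, 32): 937 full training batches, with 32 images left out of each epoch. The validation loader keeps its remainder, so its 157th batch holds only 16 images. batch_counts is a hypothetical helper for this arithmetic.

```python
def batch_counts(n_examples, batch_size, drop_remainder):
    """Return (number of batches, size of the last batch)."""
    full, rest = divmod(n_examples, batch_size)
    if drop_remainder or rest == 0:
        return full, batch_size
    return full + 1, rest

print(batch_counts(60000, 64, drop_remainder=True))   # (937, 64)
print(batch_counts(10000, 64, drop_remainder=False))  # (157, 16)
```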

Loading is not the bottleneck: measure it

Time one full pass over all 60,000 training images:

tic = time.time()
for X, y in data.train_dataloader():
    continue
f'{time.time() - tic:.2f} sec'

'0.75 sec'

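Seconds, not minutes. For the ConvNets of later chapters, a forward and backward pass typically costs 10–100× the corresponding I/O, so a well-built loader keeps data off the critical path. If loading ever did become the bottleneck, prefetch batches (dataset.prefetch(tf.data.AUTOTUNE)) and parallelize the map step (num_parallel_calls=tf.data.AUTOTUNE); num_workers is the PyTorch DataLoader counterpart.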
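tf.data already provides read-ahead through Dataset.prefetch; the sketch below only illustrates the idea in plain Python. prefetch here is a hypothetical helper: a background thread reads up to buffer_size items ahead into a bounded queue.Queue, so producing the next batch overlaps with consuming the current one, and an exception raised while loading is re-raised in the consumer.

```python
import queue
import threading

def prefetch(iterable, buffer_size=2):
    """Yield items from `iterable` while a background thread reads ahead."""
    buf = queue.Queue(maxsize=buffer_size)

    def producer():
        try:
            for item in iterable:
                buf.put((True, item))  # blocks while the buffer is full
        except Exception as exc:
            buf.put((False, exc))  # forward loading errors to the consumer
            return
        buf.put((False, None))  # end-of-data marker

    # Daemon thread: it will not keep the process alive if we stop early.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        ok, item = buf.get()
        if not ok:
            if item is not None:
                raise item
            return
        yield item

print(list(prefetch(range(5))))  # [0, 1, 2, 3, 4]
```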

Looking at the Data

always eyeball what you train on

See the data before you model it

A visualize method tiles the first images of a batch (one row of eight by default), each captioned with its class name. Eyeballing data is a cheap, powerful sanity check:

@d2l.add_to_class(FashionMNIST)
def visualize(self, batch, nrows=1, ncols=8, labels=None):
    X, y = batch
    if not labels:
        labels = self.text_labels(y)
    d2l.show_images(jnp.squeeze(X), nrows, ncols, titles=labels)

batch = next(iter(data.val_dataloader()))
data.visualize(batch)
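With its defaults (nrows=1, ncols=8), visualize lays out a $1 \times 8$ grid, so only eight of the batch's 64 images appear. The tiling itself can be sketched on nested lists; tile is a hypothetical helper, whereas d2l.show_images places each image on its own Matplotlib subplot.

```python
def tile(images, nrows, ncols, pad=0):
    """Arrange the first nrows*ncols equally sized 2-D images (nested lists)
    into one 2-D grid, row by row. Extra images are ignored; empty slots
    are filled with `pad`."""
    h, w = len(images[0]), len(images[0][0])
    grid = [[pad] * (ncols * w) for _ in range(nrows * h)]
    for idx, img in enumerate(images[:nrows * ncols]):
        r, c = divmod(idx, ncols)  # grid cell for this image
        for i in range(h):
            for j in range(w):
                grid[r * h + i][c * w + j] = img[i][j]
    return grid

imgs = [[[k, k], [k, k]] for k in (1, 2, 3)]  # three constant 2x2 "images"
print(tile(imgs, nrows=1, ncols=2))  # [[1, 1, 2, 2], [1, 1, 2, 2]]
```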

Recap

Fashion-MNIST: 10 clothing classes, $28 \times 28$ grayscale, harder than MNIST but the same size and API.
A DataModule owns each framework’s download, transforms, and train/val loaders.

Channel axis differs: PyTorch/MXNet use $c \times h \times w$, TensorFlow/JAX use $h \times w \times c$ (each framework's loader produces its native layout).
Always look at your data; a full loading pass costs seconds, so training speed is set by the model, not I/O.
Next: a linear classifier on this data, and its ~82% ceiling (the softmax-from-scratch section).