Object-Oriented Design for Implementation

Dive into Deep Learning · §2.2

Write the training loop once; let every new model and dataset be a subclass.
Module · DataModule · Trainer

One loop, written once

Motivation

Almost every model in this book runs the same loop: load a batch, forward, compute loss, update, repeat.

If that loop is rewritten for every model, a single tweak (gradient clipping, an LR schedule) means touching every chapter. Instead, factor it into three collaborating classes:

Module is the model · DataModule is the data · Trainer owns the loop. New work = a new subclass.

Notebook-friendly utilities

three helpers that make classes teachable

Define a class, then grow it

Utilities

A notebook wants short cells, so declare the shell first and instantiate it…

class A:
    def __init__(self):
        self.b = 1

a = A()

…then attach a method later with @add_to_class, which uses setattr to store the function on the class, so the instance created earlier can call it and receives self as usual:

@add_to_class(A)
def do(self):
    print('Instance attribute "b" is', self.b)

a.do()

Instance attribute "b" is 1

`add_to_class`, in three lines

Utilities

The whole trick: a decorator that writes the function onto a class object. Python’s class namespace is mutable, so this works even on a class that already has instances.

def add_to_class(Class):
    """Register functions as methods in created class."""
    def wrapper(obj):
        setattr(Class, obj.__name__, obj)
        return obj
    return wrapper

We use it throughout the book to split one class across several cells, each next to the prose that explains it.
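A minimal, self-contained sketch of the same trick, showing that an instance created before the method existed still picks it up. The Counter class and increment method are hypothetical names used only for illustration.

```python
def add_to_class(Class):
    """Register the decorated function as a method of Class."""
    def wrapper(obj):
        setattr(Class, obj.__name__, obj)
        return obj
    return wrapper


class Counter:  # hypothetical example class
    def __init__(self):
        self.count = 0


c = Counter()  # created before `increment` exists


@add_to_class(Counter)
def increment(self, step=1):
    self.count += step
    return self.count


# Attribute lookup falls back from the instance to its class,
# so the older instance sees the newly attached method.
result = c.increment(step=2)

# The decorator returns the original function unchanged,
# so the class attribute and the module-level name are the same object.
same_function = Counter.increment is increment
```

Because the function is stored on the class rather than on one instance, every instance, old or new, shares it.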

Stop hand-copying constructor args

Utilities

Most __init__ methods are full of boilerplate such as self.lr = lr; self.n = n; .... The HyperParameters mixin captures the caller’s arguments and saves them as attributes automatically:

# Call the fully implemented HyperParameters class saved in d2l
class B(d2l.HyperParameters):
    def __init__(self, a, b, c):
        self.save_hyperparameters(ignore=['c'])
        print('self.a =', self.a, 'self.b =', self.b)
        print('There is no self.c =', not hasattr(self, 'c'))

b = B(a=1, b=2, c=3)

self.a = 1 self.b = 2
There is no self.c = True

One save_hyperparameters() call and self.a, self.b exist; an ignore= list opts arguments out. (Full implementation in the Utilities appendix.)
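One way such a mixin could work is to inspect the caller’s stack frame. The sketch below is an illustration under that assumption, not the d2l implementation; HyperParametersSketch and Config are hypothetical names.

```python
import inspect


class HyperParametersSketch:
    """Illustrative stand-in; not the class shipped in d2l."""

    def save_hyperparameters(self, ignore=()):
        # The caller is the subclass __init__, one frame up the stack.
        frame = inspect.currentframe().f_back
        _, _, _, local_vars = inspect.getargvalues(frame)
        skip = set(ignore) | {'self'}
        # Keep every local of the caller except ignored and private names.
        self.hparams = {k: v for k, v in local_vars.items()
                        if k not in skip and not k.startswith('_')}
        for k, v in self.hparams.items():
            setattr(self, k, v)


class Config(HyperParametersSketch):  # hypothetical subclass
    def __init__(self, lr, batch_size, debug=False):
        self.save_hyperparameters(ignore=['debug'])


cfg = Config(lr=0.1, batch_size=32, debug=True)
```

In this sketch every local variable of __init__ is captured, so the call belongs on the first line of the constructor, before any other locals are defined.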

`ProgressBoard`: the loss curve, animated

Utilities

draw(x, y, label) records a point and the curve grows as training runs; every_n thins a noisy series by plotting the average of the last n values:

import numpy as np

board = d2l.ProgressBoard('x')
for x in np.arange(0, 10, 0.1):
    board.draw(x, np.sin(x), 'sin', every_n=2)
    board.draw(x, np.cos(x), 'cos', every_n=10)
board.flush()  # wait for the queued points, then render the final figure

Why draw merely schedules the point, and why flush() must wait for the queue, is the subject of the next slide.
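The every_n thinning can be sketched without any plotting: buffer the points per label and emit one averaged point once n have arrived. EveryNAverager is a hypothetical helper, and averaging both x and y is an assumption about the behavior, not a statement of how d2l.ProgressBoard is written.

```python
from collections import defaultdict


class EveryNAverager:  # hypothetical helper, not part of d2l
    def __init__(self):
        self.pending = defaultdict(list)  # label -> buffered (x, y) pairs
        self.points = defaultdict(list)   # label -> averaged (x, y) pairs

    def draw(self, x, y, label, every_n=1):
        buf = self.pending[label]
        buf.append((x, y))
        if len(buf) < every_n:
            return  # not enough points yet; nothing is emitted
        mean_x = sum(p[0] for p in buf) / len(buf)
        mean_y = sum(p[1] for p in buf) / len(buf)
        self.points[label].append((mean_x, mean_y))
        buf.clear()


avg = EveryNAverager()
for i in range(6):
    avg.draw(i, 2 * i, 'line', every_n=3)
# avg.points['line'] now holds two averaged points: (1.0, 2.0) and (4.0, 8.0)
```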

Watching the loss live should be impossible

Utilities · compilation & async

Frameworks earn their speed by compiling the training step into a graph and letting the device run ahead of Python. That imposes two rules:

A compiled step must be pure: a print or plot inside it cannot be captured by the compiler, forcing a fallback to slower eager execution.
The instant Python asks for a concrete number, it must block until the device catches up, stalling the very pipeline we built.

So every naïve “plot the loss each batch” either breaks the compiled graph or drains the device pipeline. Real-time monitoring and efficiency seem to be at war.

Resolution: queue now, render elsewhere

Utilities · compilation & async

ProgressBoard decouples the two: draw hands the value to a queue and returns at once. A background thread does the device-to-host copy and the slow matplotlib rendering at its own pace, dropping points if it falls behind; a live curve needs only a few updates per second.

The training loop stays compiled, the device stays busy, and the loss still falls before your eyes.

The pattern to remember, book-wide: keep the hot path pure and compiled; push logging, plotting, and checkpointing off to the side.
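The queue-and-worker pattern can be sketched with the standard library alone. AsyncRecorder is a hypothetical stand-in, not d2l.ProgressBoard: it appends to a list where a real board would render, and it drops the newest point when the queue is full, which is one possible dropping policy among several.

```python
import queue
import threading


class AsyncRecorder:  # hypothetical stand-in for the idea
    def __init__(self, maxsize=1000):
        self.q = queue.Queue(maxsize=maxsize)
        self.rendered = []  # what a real board would hand to the plotting library
        self.dropped = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def draw(self, x, y, label):
        """Hot path: enqueue and return immediately, never block."""
        try:
            self.q.put_nowait((x, y, label))
        except queue.Full:
            self.dropped += 1  # consumer is behind: drop rather than stall

    def _run(self):
        while True:
            item = self.q.get()
            self.rendered.append(item)  # slow rendering would happen here
            self.q.task_done()

    def flush(self):
        """Block until every queued point has been consumed."""
        self.q.join()


rec = AsyncRecorder()
for step in range(100):
    rec.draw(step, 1.0 / (step + 1), 'loss')
rec.flush()
```

Only flush() ever waits; draw() costs one non-blocking queue insertion no matter how slow the consumer is.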

The three base classes

Module · DataModule · Trainer

`Module`: the model, its loss, its optimizer

Base classes

Every model subclasses Module and supplies three things:

forward / loss: the prediction and how wrong it is.
training_step: the loss on one batch, plotted automatically.
configure_optimizers: the optimizer to use.

Module extends the framework’s own neural-network base class, so an instance is callable: model(X) runs forward.
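The contract can be sketched without a deep learning framework. ModuleSketch below is a plain-Python stand-in that makes instances callable via __call__, which the real base class inherits from the framework; ScalarLinear and the placeholder return value of configure_optimizers are assumptions made for illustration.

```python
class ModuleSketch:
    """Framework-free stand-in for the Module contract."""

    def __call__(self, *inputs):
        return self.forward(*inputs)  # model(X) runs forward

    def forward(self, X):
        raise NotImplementedError

    def loss(self, y_hat, y):
        raise NotImplementedError

    def training_step(self, batch):
        # Convention assumed here: the last element of a batch is the target.
        *inputs, y = batch
        return self.loss(self(*inputs), y)

    def configure_optimizers(self):
        raise NotImplementedError


class ScalarLinear(ModuleSketch):  # hypothetical model: y_hat = w * x + b
    def __init__(self, w=0.0, b=0.0, lr=0.1):
        self.w, self.b, self.lr = w, b, lr

    def forward(self, X):
        return [self.w * x + self.b for x in X]

    def loss(self, y_hat, y):
        return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / len(y)

    def configure_optimizers(self):
        return {'lr': self.lr}  # placeholder for a real optimizer object


model = ScalarLinear(w=2.0, b=1.0)
preds = model([0.0, 1.0, 2.0])
batch_loss = model.training_step(([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]))
```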

`DataModule`: where batches come from

Base classes

A DataModule serves a train and a validation loader, both through one get_dataloader(train) hook that subclasses override. This is the entire base class:

class DataModule(d2l.HyperParameters):
    """The base class of data."""
    def __init__(self, root='../data', num_workers=4):
        self.save_hyperparameters()

    def get_dataloader(self, train):
        raise NotImplementedError

    def train_dataloader(self):
        return self.get_dataloader(train=True)

    def val_dataloader(self):
        return self.get_dataloader(train=False)

A loader is an iterable that yields one batch at a time; each batch is fed straight into Module.training_step.
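A subclass only has to override the get_dataloader hook. The sketch below redefines a dependency-free stand-in for the base class so that it runs on its own; ToyRegressionData and its synthetic y = 2x + 1 targets are hypothetical.

```python
class DataModuleSketch:
    """Stand-in for the base class, without the d2l dependency."""

    def get_dataloader(self, train):
        raise NotImplementedError

    def train_dataloader(self):
        return self.get_dataloader(train=True)

    def val_dataloader(self):
        return self.get_dataloader(train=False)


class ToyRegressionData(DataModuleSketch):  # hypothetical dataset
    def __init__(self, num_train=8, num_val=4, batch_size=4):
        self.num_train, self.batch_size = num_train, batch_size
        self.X = [float(i) for i in range(num_train + num_val)]
        self.y = [2.0 * x + 1.0 for x in self.X]

    def get_dataloader(self, train):
        # The first num_train examples are for training, the rest for validation.
        lo, hi = (0, self.num_train) if train else (self.num_train, len(self.X))
        for start in range(lo, hi, self.batch_size):
            stop = min(start + self.batch_size, hi)
            yield self.X[start:stop], self.y[start:stop]


data = ToyRegressionData()
train_batches = list(data.train_dataloader())
val_batches = list(data.val_dataloader())
```

Each batch is an (X, y) pair, so it can be passed to a training_step that treats the last element as the target.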

`Trainer`: it owns the loop

Base classes

fit(model, data) wires the two together: prepare the loaders, hand the optimizer over, then run fit_epoch for max_epochs. The body is short:

def fit(self, model, data):
    self.prepare_data(data)
    self.prepare_model(model)
    self.optim = model.configure_optimizers()
    for self.epoch in range(self.max_epochs):
        self.fit_epoch()

fit_epoch stays abstract here; we enrich Trainer for GPUs and parallel training in later chapters.
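To see the three roles cooperate, here is a framework-free sketch with one possible fit_epoch. Everything except the body of fit is an assumption: TinyModel, TinyData, the hand-computed gradient, and the function-style optimizer are illustrative stand-ins, not the classes defined later in the book.

```python
class TinyModel:  # stand-in for a Module: y_hat = w * x
    def __init__(self, lr=0.05):
        self.w, self.lr = 0.0, lr

    def training_step(self, batch):
        """Return the mean squared error and its gradient w.r.t. w."""
        X, y = batch
        errs = [self.w * x - t for x, t in zip(X, y)]
        loss = sum(e * e for e in errs) / len(errs)
        grad = sum(2 * e * x for e, x in zip(errs, X)) / len(errs)
        return loss, grad

    def configure_optimizers(self):
        # The "optimizer" is a plain function applying one SGD update.
        def sgd_step(grad):
            self.w -= self.lr * grad
        return sgd_step


class TinyData:  # stand-in for a DataModule: targets follow y = 3x
    def train_dataloader(self):
        X = [1.0, 2.0, 3.0, 4.0]
        for i in range(0, len(X), 2):
            yield X[i:i + 2], [3.0 * x for x in X[i:i + 2]]


class TrainerSketch:
    def __init__(self, max_epochs):
        self.max_epochs = max_epochs

    def prepare_data(self, data):
        self.data = data

    def prepare_model(self, model):
        self.model = model

    def fit(self, model, data):
        self.prepare_data(data)
        self.prepare_model(model)
        self.optim = model.configure_optimizers()
        for self.epoch in range(self.max_epochs):
            self.fit_epoch()

    def fit_epoch(self):  # assumed body, for illustration only
        for batch in self.data.train_dataloader():
            loss, grad = self.model.training_step(batch)
            self.optim(grad)
            self.last_loss = loss


model = TinyModel()
trainer = TrainerSketch(max_epochs=50)
trainer.fit(model, TinyData())
```

Swapping in a different model or dataset leaves TrainerSketch untouched, which is the point of the design.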

Recap

Wrap-up

Three classes scaffold every model: Module (the model), DataModule (the data), Trainer (the loop).
New model or dataset = a subclass; the loop is written once.
add_to_class splits a class across notebook cells; HyperParameters kills __init__ boilerplate.

ProgressBoard plots the loss live yet never blocks: keep the hot path pure and compiled, and push logging off to the side, a theme that recurs throughout the book.
