%matplotlib inline
from d2l import torch as d2l
import torch

End-to-end linear regression with nothing but tensor ops:

- Module with w and b parameters and a forward.
- Trainer’s fit_epoch, also from scratch.

The next chapter does the same with nn.LazyLinear + MSELoss + SGD in two lines. This one shows what those two lines hide.
Initialize w randomly (small Gaussian), b at zero:
class LinearRegressionScratch(d2l.Module):
    """The linear regression model implemented from scratch."""
    def __init__(self, num_inputs, lr, sigma=0.01):
        super().__init__()
        self.save_hyperparameters()
        self.w = d2l.normal(0, sigma, (num_inputs, 1), requires_grad=True)
        self.b = d2l.zeros(1, requires_grad=True)

Both parameters are created with requires_grad=True (or the framework equivalent) so autograd tracks them.
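As a rough illustration of what "autograd tracks them" means, the sketch below builds the same two parameters with plain torch calls (no d2l dependency) and checks that a backward pass fills in their gradients. The seed, the all-ones input, and the sum used as a stand-in objective are assumptions made only for this example.

```python
import torch

torch.manual_seed(0)  # hypothetical seed, only for repeatability

num_inputs, sigma = 2, 0.01
# Same initialization scheme as above, written with plain torch calls.
w = torch.normal(0, sigma, (num_inputs, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Any scalar built from w and b is recorded in the autograd graph.
X = torch.ones(3, num_inputs)   # hypothetical input: three rows of ones
out = (X @ w + b).sum()         # stand-in objective, not the real loss
out.backward()

# Both leaf tensors now carry a gradient of the stand-in objective.
print(w.grad.shape, b.grad)
```

Without requires_grad=True, w.grad and b.grad would stay None and there would be nothing for the optimizer to use.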
The model is one matrix-vector product plus a bias, \hat{\mathbf{y}} = \mathbf{X}\mathbf{w} + b:
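A minimal standalone sketch of this forward computation, assuming plain torch tensors rather than the d2l Module; the function name forward and the toy values for X, w, and b are chosen here for illustration only.

```python
import torch

def forward(X, w, b):
    """Return X w + b; the scalar bias b is broadcast across the rows of X."""
    return torch.matmul(X, w) + b

X = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # two examples, two features
w = torch.tensor([[0.5], [-1.0]])           # hypothetical weights, shape (2, 1)
b = torch.tensor([0.1])                     # hypothetical bias

y_hat = forward(X, w, b)
print(y_hat.shape)  # one prediction per example, shape (2, 1)
```

In the class-based version, the same expression would live in the model's forward method and read self.w and self.b instead of taking them as arguments.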
Squared error per example, averaged across the batch:
\ell(\hat{y}, y) = \tfrac{1}{2}(\hat{y} - y)^2.
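The loss could be written as follows; this is a sketch with plain torch tensors, and the name squared_loss and the toy predictions are assumptions made for the example.

```python
import torch

def squared_loss(y_hat, y):
    """Half squared error per example, averaged over the minibatch."""
    per_example = (y_hat - y) ** 2 / 2
    return per_example.mean()

y_hat = torch.tensor([[1.0], [2.0], [4.0]])  # hypothetical predictions
y = torch.tensor([[1.0], [0.0], [3.0]])      # hypothetical targets, same shape

loss = squared_loss(y_hat, y)
print(loss)  # scalar tensor: mean of the per-example losses 0, 2, 0.5
```

The sketch assumes y has the same shape as y_hat. Subtracting a tensor of shape (n,) from one of shape (n, 1) would broadcast to (n, n) and silently average the wrong quantity.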
The update rule \theta \leftarrow \theta - \eta \nabla_\theta L written out by hand:
class SGD(d2l.HyperParameters):
    """Minibatch stochastic gradient descent."""
    def __init__(self, params, lr):
        self.save_hyperparameters()

    def step(self):
        for param in self.params:
            param -= self.lr * param.grad

    def zero_grad(self):
        for param in self.params:
            if param.grad is not None:
                param.grad.zero_()

What happens once per minibatch, in order: forward, loss, backward, step.
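The per-minibatch sequence can be sketched without any d2l classes. The example below is a hedged illustration: the helper minibatch_step, the toy data, and the target weights are all hypothetical, and the update is written inline rather than through the SGD class.

```python
import torch

torch.manual_seed(0)  # hypothetical seed, only for repeatability

# One toy minibatch with noiseless targets (values chosen for illustration).
X = torch.randn(8, 2)
y = X @ torch.tensor([[2.0], [-3.4]]) + 4.2

w = torch.normal(0, 0.01, (2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.03

def minibatch_step(X, y):
    """Run forward, loss, backward, step once; return the pre-update loss."""
    y_hat = X @ w + b                      # forward
    loss = ((y_hat - y) ** 2 / 2).mean()   # loss
    for p in (w, b):                       # clear gradients from the last step
        if p.grad is not None:
            p.grad.zero_()
    loss.backward()                        # backward
    with torch.no_grad():                  # step: in-place update, untracked
        for p in (w, b):
            p -= lr * p.grad
    return loss.item()

before = minibatch_step(X, y)
after = minibatch_step(X, y)  # same batch, evaluated after one update
print(before, after)
```

Running the step twice on the same batch is only a sanity check that the update moves the parameters downhill; real training would draw a fresh minibatch each time.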
The Trainer walks the train and val loaders once per epoch, calling the steps:
@d2l.add_to_class(d2l.Trainer)
def fit_epoch(self):
    self.model.train()
    for batch in self.train_dataloader:
        loss = self.model.training_step(self.prepare_batch(batch))
        self.optim.zero_grad()
        loss.backward()
        if self.gradient_clip_val > 0:  # To be discussed later
            self.clip_gradients(self.gradient_clip_val, self.model)
        # The `no_grad` only needs to wrap the parameter update; the
        # scratch `SGD.step` does an in-place `param -= lr * grad`,
        # which would otherwise be flagged as a leaf-tensor mutation.
        with torch.no_grad():
            self.optim.step()
        self.train_batch_idx += 1
    if self.val_dataloader is None:
        return
    self.model.eval()
    for batch in self.val_dataloader:
        with torch.no_grad():
            self.model.validation_step(self.prepare_batch(batch))
        self.val_batch_idx += 1

We know the true w and b, so compare them with the learned values:
error in estimating w: tensor([ 0.0006, -0.0003])
error in estimating b: tensor([-0.0005])
The small residual errors come from the finite training set and the noise in the labels; a tighter estimate requires either more data or a better optimizer.
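For readers who want to try such a comparison without the d2l data and trainer classes, here is a self-contained sketch. Everything specific in it is an assumption made for illustration: the true parameters, noise level, learning rate, batch size, and epoch count are hypothetical and are not the settings that produced the numbers above, so the printed errors will differ.

```python
import torch

torch.manual_seed(0)  # hypothetical seed, only for repeatability

# Hypothetical ground truth and synthetic data: y = X w + b + noise.
true_w, true_b = torch.tensor([2.0, -3.4]), 4.2
n, noise = 1000, 0.01
X = torch.randn(n, 2)
y = (X @ true_w + true_b + noise * torch.randn(n)).reshape(-1, 1)

w = torch.normal(0, 0.01, (2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr, batch_size, epochs = 0.1, 32, 10  # assumed hyperparameters

for _ in range(epochs):
    perm = torch.randperm(n)  # reshuffle the examples every epoch
    for i in range(0, n, batch_size):
        idx = perm[i:i + batch_size]
        loss = ((X[idx] @ w + b - y[idx]) ** 2 / 2).mean()
        for p in (w, b):
            if p.grad is not None:
                p.grad.zero_()
        loss.backward()
        with torch.no_grad():  # in-place update must not be tracked
            for p in (w, b):
                p -= lr * p.grad

# Compare the learned parameters with the known ground truth.
with torch.no_grad():
    w_err = true_w - w.reshape(true_w.shape)
    b_err = true_b - b
print(f'error in estimating w: {w_err}')
print(f'error in estimating b: {b_err}')
```

Raising n or lowering noise in this sketch is one way to see how the residual error depends on the amount and quality of data.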
A Module for linear regression boils down to __init__, forward, loss, and configure_optimizers.
The Trainer.fit_epoch glue is what the training APIs of PyTorch, TensorFlow, JAX, and MXNet hide.