%matplotlib inline
from d2l import tensorflow as d2l
import tensorflow as tf

End-to-end linear regression with nothing but tensor ops:

Module with w and b parameters and a forward.
Trainer’s fit_epoch, also from scratch.

The next chapter does the same with nn.LazyLinear + MSELoss + SGD in two lines. This one shows what those two lines hide.
Initialize w randomly (small Gaussian), b at zero:
class LinearRegressionScratch(d2l.Module):
    """The linear regression model implemented from scratch."""
    def __init__(self, num_inputs, lr, sigma=0.01):
        super().__init__()
        self.save_hyperparameters()
        # Draw w from a zero-mean Gaussian with standard deviation sigma.
        w = tf.random.normal((num_inputs, 1), mean=0, stddev=sigma)
        b = tf.zeros(1)
        # trainable=True lets tf.GradientTape watch both variables automatically.
        self.w = tf.Variable(w, trainable=True)
        self.b = tf.Variable(b, trainable=True)

Both parameters are created with trainable=True (the TensorFlow equivalent of requires_grad=True) so autograd tracks them.
The model is one matrix-vector product plus a bias — \hat{\mathbf{y}} = \mathbf{X}\mathbf{w} + b:
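A minimal sketch of that forward pass with plain tensor ops. The free function `forward` and the toy values are assumptions made for illustration; in the class above the same expression would presumably live in a `forward` method that reads `self.w` and `self.b`.

```python
import tensorflow as tf

def forward(X, w, b):
    """Return X @ w + b; the bias of shape (1,) broadcasts over the batch."""
    return tf.matmul(X, w) + b

# Toy values chosen only for illustration.
X = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # 2 examples, 2 features
w = tf.constant([[2.0], [-1.0]])           # shape (num_inputs, 1)
b = tf.constant([0.5])                     # shape (1,)
y_hat = forward(X, w, b)                   # shape (2, 1)
```

Keeping w as a column of shape (num_inputs, 1) means the prediction comes out with shape (batch_size, 1), one row per example.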
Squared error per example, averaged across the batch:
\ell(\hat{y}, y) = \tfrac{1}{2}(\hat{y} - y)^2.
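A sketch of this loss as a standalone function, assuming predictions of shape (batch_size, 1). The name `squared_loss` and the sample numbers are illustrative, not taken from the text.

```python
import tensorflow as tf

def squared_loss(y_hat, y):
    """Mean over the batch of (y_hat - y)^2 / 2."""
    # Reshape y to match y_hat: subtracting a (n,) tensor from a (n, 1)
    # tensor would otherwise broadcast to an (n, n) matrix.
    l = (y_hat - tf.reshape(y, y_hat.shape)) ** 2 / 2
    return tf.reduce_mean(l)

y_hat = tf.constant([[1.0], [3.0]])
y = tf.constant([2.0, 1.0])
loss = squared_loss(y_hat, y)  # per-example losses 0.5 and 2.0, mean 1.25
```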
The update rule \theta \leftarrow \theta - \eta \nabla_\theta L written out by hand:
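One way to write that rule by hand is a tiny optimizer class. This is a sketch under assumptions: the class name `SGD` is chosen here, and its `apply_gradients` method simply mirrors the call `optim.apply_gradients(zip(grads, params))` used by the Trainer code in this section.

```python
import tensorflow as tf

class SGD:
    """Minibatch SGD: param <- param - lr * grad, applied in place."""
    def __init__(self, lr):
        self.lr = lr

    def apply_gradients(self, grads_and_vars):
        for grad, param in grads_and_vars:
            param.assign_sub(self.lr * grad)

# Toy objective L(theta) = sum(theta^2), whose gradient is 2 * theta.
theta = tf.Variable([1.0, -2.0])
with tf.GradientTape() as tape:
    L = tf.reduce_sum(theta ** 2)
grad = tape.gradient(L, theta)
SGD(lr=0.1).apply_gradients([(grad, theta)])
# theta is now approximately [0.8, -1.6]
```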
What happens once per minibatch — forward, loss, backward, step:
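Stripped of the d2l classes, one such step might look like the sketch below. The minibatch, the generating coefficients (2, -3.4, 4.2), and the learning rate are hypothetical values picked for the example.

```python
import tensorflow as tf

tf.random.set_seed(0)
w = tf.Variable(tf.random.normal((2, 1), stddev=0.01))
b = tf.Variable(tf.zeros(1))
lr = 0.03

# Hypothetical minibatch of 4 examples with noise-free labels.
X = tf.random.normal((4, 2))
y = tf.matmul(X, tf.constant([[2.0], [-3.4]])) + 4.2

def loss_fn():
    y_hat = tf.matmul(X, w) + b                  # forward
    return tf.reduce_mean((y_hat - y) ** 2 / 2)  # loss

with tf.GradientTape() as tape:
    loss_before = loss_fn()
grads = tape.gradient(loss_before, [w, b])       # backward
for param, grad in zip([w, b], grads):           # step
    param.assign_sub(lr * grad)
loss_after = loss_fn()  # loss on the same batch after one update
```

With a small learning rate, re-evaluating the loss on the same batch after the update should give a lower value, which is a quick sanity check that the gradient and the sign of the step are right.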
The Trainer walks the train and val loaders once per epoch, calling the steps:
@d2l.add_to_class(d2l.Trainer)
def _compile_steps(self):
    model, optim = self.model, self.optim
    grad_clip = self.gradient_clip_val
    # Run one forward pass so any lazily created variables exist before tracing.
    for batch in self.train_dataloader:
        model(*self.prepare_batch(batch)[:-1], training=True)
        break

    def train_step(batch):
        # Forward pass and loss are recorded on the tape.
        with tf.GradientTape() as tape:
            loss = model.loss(model(*batch[:-1], training=True),
                              batch[-1])
        params = model.trainable_variables
        if not params:
            # Fall back to whatever variables the tape watched.
            params = list(tape.watched_variables())
        # Backward pass, optional clipping, then the parameter update.
        grads = tape.gradient(loss, params)
        if grad_clip > 0:
            grads = self.clip_gradients(grad_clip, grads)
        optim.apply_gradients(zip(grads, params))
        return loss

    def val_step(batch):
        return model(*batch[:-1], training=False)

    # Compile both steps into graphs.
    train_step = tf.function(train_step, reduce_retracing=True)
    val_step = tf.function(val_step, reduce_retracing=True)
    self._train_step = train_step
    self._val_step = val_step
@d2l.add_to_class(d2l.Trainer)
def fit_epoch(self):
    # One pass over the training data, one update per minibatch.
    self.model.training = True
    for batch in self.train_dataloader:
        loss = self._train_step(self.prepare_batch(batch))
        self.model._report_train(loss)
        self.train_batch_idx += 1
    if self.val_dataloader is None:
        return
    # One pass over the validation data, without updates.
    self.model.training = False
    for batch in self.val_dataloader:
        b = self.prepare_batch(batch)
        y_hat = self._val_step(b)
        self.model._report_val(y_hat, b)
        self.val_batch_idx += 1

We know the true w and b, so we can compare them with the learned values:
error in estimating w: [0.00043726 0.00010633]
error in estimating b: [0.00016832]
The small errors come from the finite, noisy training set; shrinking them further requires either more data or a better optimizer.
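The comparison can be reproduced in spirit with a self-contained run that does not depend on d2l. Everything here is an assumption made for the sketch: the ground-truth parameters, the synthetic data, the noise level, and the hyperparameters. The printed errors will therefore differ from the numbers shown above.

```python
import tensorflow as tf

tf.random.set_seed(0)

# Hypothetical ground truth and synthetic data (not taken from the text).
true_w = tf.constant([[2.0], [-3.4]])
true_b = tf.constant([4.2])
X = tf.random.normal((1000, 2))
y = tf.matmul(X, true_w) + true_b + tf.random.normal((1000, 1), stddev=0.01)

w = tf.Variable(tf.random.normal((2, 1), stddev=0.01))
b = tf.Variable(tf.zeros(1))
lr, batch_size, num_epochs = 0.03, 32, 10

dataset = tf.data.Dataset.from_tensor_slices((X, y)).shuffle(1000).batch(batch_size)
for epoch in range(num_epochs):
    for X_batch, y_batch in dataset:
        with tf.GradientTape() as tape:
            y_hat = tf.matmul(X_batch, w) + b
            loss = tf.reduce_mean((y_hat - y_batch) ** 2 / 2)
        grad_w, grad_b = tape.gradient(loss, [w, b])
        w.assign_sub(lr * grad_w)
        b.assign_sub(lr * grad_b)

# Difference between the known and the learned parameters.
w_err = tf.reshape(true_w - w, (-1,))
b_err = true_b - b
print(f'error in estimating w: {w_err.numpy()}')
print(f'error in estimating b: {b_err.numpy()}')
```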
A Module for linear regression boils down to __init__, forward, loss, and configure_optimizers.
The Trainer.fit_epoch glue is what the training APIs of PyTorch, TensorFlow, JAX, and MXNet hide.