from d2l import torch as d2l
import numpy as np
import torch
from torch import nn

The same model, same data, and same training, now using the framework’s high-level layers and built-in losses:
- LazyLinear (or equivalent) instead of hand-rolled w, b.
- MSELoss (no factor of ½).
- SGD.

End result: ~5 lines of model code instead of 30. Same convergence on synthetic data.
Wrap a single linear layer with the right output dimension. The “lazy” variant defers inferring the input dimension until the first forward pass:
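A minimal sketch of the lazy behaviour in plain PyTorch. The batch size (4) and feature count (2) are illustrative assumptions, not values taken from the text:

```python
import torch
from torch import nn

# Only the output dimension (1) is given; the input dimension is unknown for now.
net = nn.LazyLinear(1)

# Hypothetical batch: 4 examples with 2 features each.
X = torch.randn(4, 2)

# The first forward pass infers in_features=2 and materializes weight and bias.
y_hat = net(X)

weight_shape = tuple(net.weight.shape)  # (out_features, in_features)
output_shape = tuple(y_hat.shape)       # (batch, out_features)
print(weight_shape, output_shape)
```

After the first call the layer behaves like an ordinary `nn.Linear(2, 1)`.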
Hook the layer into our Module interface (forward, configure_optimizers):
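One way this could look, sketched with a plain `nn.Module` standing in for the Module base class (an assumption made so the example runs without the rest of the scaffold). The learning rate 0.03 and the input shape are likewise illustrative:

```python
import torch
from torch import nn

class LinearRegression(nn.Module):
    """Thin wrapper around a built-in linear layer (stand-in base class)."""

    def __init__(self, lr):
        super().__init__()
        self.lr = lr
        self.net = nn.LazyLinear(1)  # one output; input dim inferred later

    def forward(self, X):
        # Delegate the whole computation to the built-in layer.
        return self.net(X)

    def configure_optimizers(self):
        # Built-in SGD over all parameters of the wrapped layer.
        return torch.optim.SGD(self.parameters(), lr=self.lr)

model = LinearRegression(lr=0.03)   # hypothetical learning rate
out = model(torch.randn(4, 2))      # dry run materializes the lazy parameters
optimizer = model.configure_optimizers()
```

The dry run before `configure_optimizers` is a cautious choice here: it guarantees the lazy parameters have concrete shapes by the time the optimizer sees them.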
Built-in MSE — note it omits the 1/2 factor we used by hand:
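To make the difference concrete, the sketch below compares `nn.MSELoss` (mean of squared errors) with a hand-written loss that keeps the 1/2 factor. The two predictions and targets are made-up numbers chosen so the arithmetic is easy to check:

```python
import torch
from torch import nn

# Made-up predictions and targets: errors are 1 and 2.
y_hat = torch.tensor([[1.0], [3.0]])
y = torch.tensor([[0.0], [1.0]])

# Built-in: mean((y_hat - y)^2) = (1 + 4) / 2
builtin = nn.MSELoss()(y_hat, y).item()

# Hand-written with the 1/2 factor: mean((y_hat - y)^2 / 2)
by_hand = ((y_hat - y) ** 2 / 2).mean().item()

print(builtin, by_hand)  # 2.5 1.25
```

Because the built-in loss is exactly twice the hand-written one, its gradients are twice as large too. Under plain SGD, the same learning rate therefore takes steps twice as big.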
Identical loop — the Trainer doesn’t care that the model is now a thin wrapper around a built-in layer:
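The Trainer is not reproduced here. As a rough equivalent, this sketch writes the loop out in plain PyTorch. The ground-truth parameters, noise level, batch size, learning rate, and epoch count are all illustrative assumptions:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical synthetic data: y = Xw + b + small Gaussian noise.
true_w, true_b = torch.tensor([2.0, -3.4]), 4.2
X = torch.randn(1000, 2)
y = (X @ true_w + true_b).reshape(-1, 1) + 0.01 * torch.randn(1000, 1)

net = nn.LazyLinear(1)
loss_fn = nn.MSELoss()
net(X[:1])  # dry run so the lazy parameters exist before the optimizer is built
optimizer = torch.optim.SGD(net.parameters(), lr=0.03)

batch_size = 32
for epoch in range(10):
    for i in range(0, len(X), batch_size):
        X_batch, y_batch = X[i:i + batch_size], y[i:i + batch_size]
        loss = loss_fn(net(X_batch), y_batch)
        optimizer.zero_grad()  # clear gradients from the previous step
        loss.backward()        # backpropagate through the built-in layer
        optimizer.step()       # SGD update

with torch.no_grad():
    final_loss = loss_fn(net(X), y).item()
print(f"final loss: {final_loss:.6f}")
```

Nothing in the loop refers to how the model is implemented, which is why swapping in a built-in layer leaves it unchanged.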
Pull weights and bias back out of the layer:
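A small sketch of the extraction step. To keep it self-contained, the layer's parameters are set by hand to made-up "learned" values instead of being trained, and the true parameters are hypothetical as well:

```python
import torch
from torch import nn

net = nn.Linear(2, 1)
with torch.no_grad():
    # Made-up values standing in for the result of training.
    net.weight.copy_(torch.tensor([[1.9, -3.3]]))
    net.bias.fill_(4.1)

# The learned parameters live on the layer itself.
w, b = net.weight.data, net.bias.data

# Hypothetical ground truth to compare against.
true_w, true_b = torch.tensor([2.0, -3.4]), 4.2
err_w = true_w - w.reshape(true_w.shape)  # weight is (1, 2); flatten to (2,)
err_b = true_b - b

print(f"error in estimating w: {err_w}")
print(f"error in estimating b: {err_b}")
```

Note the reshape: the layer stores its weight as `(out_features, in_features)`, so it needs flattening before comparison with a 1-D true weight vector.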
Module / Trainer scaffold.