from d2l import mxnet as d2l
from mxnet import autograd, gluon, init, np, npx
from mxnet.gluon import nn
npx.set_np()

The same model, same data, and same training, now using the framework’s high-level layers and built-in losses:
- LazyLinear (or equivalent) instead of hand-rolled w, b.
- MSELoss (no factor of ½).
- SGD.
End result: ~5 lines of model code instead of 30. Same convergence on synthetic data.
Wrap a single linear layer with the right output dimension. The “lazy” variant defers inferring the input dimension until the first forward pass:
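To make the deferred-shape idea concrete, here is a minimal plain-Python sketch. `LazyLinearSketch` is a hypothetical stand-in written for illustration, not the framework's layer. It assumes the weights are drawn from a small Gaussian and are allocated only once the first input reveals the feature count.

```python
import random


class LazyLinearSketch:
    """Hypothetical stand-in for a lazily initialized linear layer."""

    def __init__(self, out_features, sigma=0.01):
        self.out_features = out_features
        self.sigma = sigma  # assumed init scale, chosen for illustration
        self.weight = None  # shape unknown until the first forward pass
        self.bias = [0.0] * out_features

    def forward(self, X):
        if self.weight is None:
            in_features = len(X[0])  # infer the input dimension from the data
            self.weight = [
                [random.gauss(0.0, self.sigma) for _ in range(in_features)]
                for _ in range(self.out_features)
            ]
        # out[i][j] = sum_k X[i][k] * weight[j][k] + bias[j]
        return [
            [sum(x_k * w_k for x_k, w_k in zip(row, w_row)) + b
             for w_row, b in zip(self.weight, self.bias)]
            for row in X
        ]


layer = LazyLinearSketch(out_features=1)
print(layer.weight)  # no input seen yet, so no weights exist
out = layer.forward([[1.0, 2.0], [3.0, 4.0]])
print(len(layer.weight), len(layer.weight[0]))  # rows = outputs, cols = inputs
```

The practical upshot is that only the output dimension has to be stated when the model is defined; the input dimension follows from whatever data arrives first.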
Hook the layer into our Module interface (forward, configure_optimizers):
Built-in MSE. Note that it omits the 1/2 factor we used by hand, so for the same predictions its value and its gradients are twice as large:
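The effect of the missing 1/2 can be checked with a few lines of arithmetic. The sketch below uses two hypothetical helper functions and made-up numbers; it is not the framework's loss class. Whether a particular library's built-in loss includes the factor is worth confirming in its documentation.

```python
def half_squared_loss(y_hat, y):
    """Mean of 0.5 * (prediction - target)^2, as in the from-scratch version."""
    return sum(0.5 * (p - t) ** 2 for p, t in zip(y_hat, y)) / len(y)


def mse_loss(y_hat, y):
    """Mean of (prediction - target)^2, with no 1/2 factor."""
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / len(y)


# Made-up predictions and targets, for illustration only.
y_hat = [2.5, 0.0, 2.0]
y = [3.0, -0.5, 2.0]

half = half_squared_loss(y_hat, y)
full = mse_loss(y_hat, y)
print(half, full, full / half)
```

Because the gradient scales with the loss, running SGD on the plain MSE with learning rate lr takes the same steps as running it on the halved loss with learning rate 2 * lr.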
Identical loop — the Trainer doesn’t care that the model is now a thin wrapper around a built-in layer:
Pull weights and bias back out of the layer:
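The point of reading the parameters back is to compare them with the values that generated the data. The following plain-Python sketch shows that check end to end without any framework. The ground-truth values, noise level, learning rate, batch size, and epoch count are all assumptions chosen for illustration, and the hand-written loop stands in for what the Trainer and the built-in optimizer do.

```python
import random

random.seed(0)

# Assumed ground truth for the synthetic data (illustrative values).
true_w, true_b = [2.0, -3.4], 4.2
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(1000)]
y = [sum(w_j * x_j for w_j, x_j in zip(true_w, row)) + true_b
     + random.gauss(0, 0.01) for row in X]

w, b = [0.0, 0.0], 0.0
lr, batch_size, max_epochs = 0.03, 32, 10  # assumed hyperparameters

for epoch in range(max_epochs):
    order = list(range(len(X)))
    random.shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        grad_w, grad_b = [0.0, 0.0], 0.0
        for i in batch:
            err = sum(w_j * x_j for w_j, x_j in zip(w, X[i])) + b - y[i]
            # Gradient of the mean squared error (no 1/2 factor): 2 * err * x
            for j in range(len(w)):
                grad_w[j] += 2 * err * X[i][j] / len(batch)
            grad_b += 2 * err / len(batch)
        # Plain minibatch SGD update.
        w = [w_j - lr * g_j for w_j, g_j in zip(w, grad_w)]
        b -= lr * grad_b

# Compare the learned parameters with the generating ones.
print("error in estimating w:", [t - e for t, e in zip(true_w, w)])
print("error in estimating b:", true_b - b)
```

With these settings the estimation errors should be small relative to the parameter values, which is the same sanity check the extracted weights and bias are used for here.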
Module / Trainer scaffold.