from d2l import mxnet as d2l
from mxnet import autograd, np, npx, gluon
npx.set_np()

The same recipe as linear regression, with two new pieces:
Wired into the same Module / Trainer scaffold from the regression chapter — Classifier adds accuracy reporting and we inherit the rest.
Quick reminder before defining softmax — sum along chosen axes:
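A minimal sketch of that reminder. It uses plain NumPy as a stand-in for the mxnet `np` interface (an assumption made so the snippet runs on its own), and the matrix values are illustrative:

```python
import numpy as np  # assumption: plain NumPy stands in for mxnet's np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# keepdims=True keeps the summed axis as size 1, so the result
# still broadcasts against X in a later division.
col_sums = X.sum(axis=0, keepdims=True)  # shape (1, 3), values [[5, 7, 9]]
row_sums = X.sum(axis=1, keepdims=True)  # shape (2, 1), values [[6], [15]]
```

Summing over axis 1 with `keepdims=True` is the form softmax needs: one normalizer per row, shaped so it divides each row elementwise.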
\mathrm{softmax}(\mathbf{X})_{ij} = \frac{\exp(\mathbf{X}_{ij})}{\sum_k \exp(\mathbf{X}_{ik})}.
Three steps: exponentiate, sum across the class axis, divide.
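The three steps can be sketched as follows. This is written in plain NumPy rather than mxnet's `np` (an assumption, to keep it self-contained), and the input values are illustrative:

```python
import numpy as np  # assumption: plain NumPy stands in for mxnet's np

def softmax(X):
    """Row-wise softmax of a (batch, classes) array."""
    X_exp = np.exp(X)                             # 1. exponentiate
    partition = X_exp.sum(axis=1, keepdims=True)  # 2. sum across the class axis
    return X_exp / partition                      # 3. divide, broadcasting per row

X = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 0.0]])
probs = softmax(X)  # every row is positive and sums to 1
```

This direct form can overflow when logits are large; subtracting each row's maximum before exponentiating is the usual remedy and leaves the result unchanged.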
Flatten each 28×28 image into a 784-vector, then apply one linear layer that outputs 10 logits, one per class:
class SoftmaxRegressionScratch(d2l.Classifier):
    def __init__(self, num_inputs, num_outputs, lr, sigma=0.01):
        super().__init__()
        self.save_hyperparameters()
        self.W = np.random.normal(0, sigma, (num_inputs, num_outputs))
        self.b = np.zeros(num_outputs)
        self.W.attach_grad()
        self.b.attach_grad()

    def collect_params(self):
        return [self.W, self.b]

For label y (an integer class), the loss on one example is just
\ell = -\log \hat{y}_{y}
— the negative log of the predicted probability of the correct class. Here are two examples with 3 classes:
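A sketch of two such examples, in plain NumPy rather than mxnet's `np` (an assumption). The probability values are made up for illustration:

```python
import numpy as np  # assumption: plain NumPy stands in for mxnet's np

y = np.array([0, 2])                 # true classes of the two examples
y_hat = np.array([[0.1, 0.3, 0.6],   # illustrative predicted probabilities
                  [0.3, 0.2, 0.5]])

# Pair row indices [0, 1] with column indices y:
# this selects y_hat[0, 0] and y_hat[1, 2].
picked = y_hat[[0, 1], y]            # [0.1, 0.5]
losses = -np.log(picked)             # per-example loss, about [2.303, 0.693]
```

The first example puts only 0.1 on its true class and pays a large loss; the second puts 0.5 on its true class and pays much less.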
One line — fancy indexing pulls out y_hat[i, y[i]] for each example, then negative log:
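That one line might look like the sketch below. It uses plain NumPy as a stand-in for mxnet's `np`, and averaging over the batch is an assumption here, since the text does not say how per-example losses are combined:

```python
import numpy as np  # assumption: plain NumPy stands in for mxnet's np

def cross_entropy(y_hat, y):
    """Mean negative log-probability assigned to the correct class."""
    rows = np.arange(len(y_hat))           # 0, 1, ..., batch_size - 1
    return -np.log(y_hat[rows, y]).mean()  # pick y_hat[i, y[i]], then average

y = np.array([0, 2])
y_hat = np.array([[0.1, 0.3, 0.6],         # illustrative probabilities
                  [0.3, 0.2, 0.5]])
loss = cross_entropy(y_hat, y)             # about 1.498
```

A prediction that puts probability 1 on every correct class gives a loss of 0, the minimum.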
10 epochs on Fashion-MNIST. The base Classifier already handles the validation loop and accuracy reporting:
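The reported accuracy is presumably the fraction of examples whose highest-scoring class matches the label. A minimal sketch in plain NumPy, where `accuracy` is a hypothetical standalone helper and not necessarily how Classifier computes it:

```python
import numpy as np  # assumption: plain NumPy stands in for mxnet's np

def accuracy(y_hat, y):
    """Fraction of rows whose arg-max class equals the label."""
    preds = y_hat.argmax(axis=1)        # most probable class per example
    return float((preds == y).mean())   # share of correct predictions

y_hat = np.array([[0.1, 0.3, 0.6],      # illustrative model outputs
                  [0.3, 0.2, 0.5],
                  [0.7, 0.2, 0.1]])
y = np.array([0, 2, 0])
acc = accuracy(y_hat, y)                # preds are [2, 2, 0]: 2 of 3 correct
```

Accuracy is not differentiable, so it is only monitored; cross-entropy remains the quantity being minimized.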
Pull a fresh validation batch and look at predicted vs. true classes:
Tile the misclassified images, captioned with predicted / true:
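One way to select the mistakes and build the captions, sketched in plain NumPy. The `preds` and `y` arrays are hypothetical values rather than real model output, and the plotting call is left out:

```python
import numpy as np  # assumption: plain NumPy stands in for mxnet's np

# Fashion-MNIST class names in label order 0..9
LABELS = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
          'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']

preds = np.array([0, 2, 6, 9, 4, 1])  # hypothetical predicted classes
y = np.array([0, 6, 6, 9, 2, 1])      # hypothetical true classes

wrong = preds != y                    # boolean mask of misclassified examples
captions = [f'{LABELS[p]} / {LABELS[t]}'   # "predicted / true"
            for p, t in zip(preds[wrong], y[wrong])]
```

The same mask can index the image batch (for example `X[wrong]`) so that only the misclassified images are tiled.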
Linear models cap out at roughly 83% accuracy on Fashion-MNIST: easy classes come out right, while ambiguous pairs such as shirt vs. pullover go wrong.