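```python
from d2l import torch as d2l
import torch
```

Softmax regression from scratch follows the same recipe as linear regression, with two new pieces: the softmax operation, which turns each example's scores into class probabilities, and the cross-entropy loss.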
We wire the model into the same Module / Trainer scaffold from the regression chapter: d2l.Classifier adds accuracy reporting, and we inherit everything else.
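Accuracy here means the fraction of examples whose highest-probability class matches the label. A minimal sketch of that metric follows; the helper name accuracy is our own, and this is an illustration, not d2l's source.

```python
import torch

def accuracy(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Fraction of rows whose highest-scoring class equals the label (illustrative sketch)."""
    preds = y_hat.argmax(dim=1)      # predicted class index for each example
    correct = (preds == y).float()   # 1.0 where the prediction is right, else 0.0
    return correct.mean()

# Two examples, three classes
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = torch.tensor([0, 2])
print(accuracy(y_hat, y))  # tensor(0.5000): first row predicts class 2 (wrong), second is right
```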
A quick reminder before defining softmax: summing along a chosen axis, keeping the reduced axis so the result still broadcasts against the input:
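A sketch of code that produces the output below, assuming X is the 2×3 matrix holding 1 through 6 (the input itself is an assumption; only the sums appear in the original):

```python
import torch

# Assumed input: two rows (examples) with three columns each
X = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
# keepdim=True keeps the summed axis, giving shapes (1, 3) and (2, 1)
col_sums = X.sum(0, keepdim=True)  # sum down each column
row_sums = X.sum(1, keepdim=True)  # sum across each row
print((col_sums, row_sums))
```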
(tensor([[5., 7., 9.]]),
tensor([[ 6.],
[15.]]))
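Softmax maps each row of X (one example's scores over the classes) to a probability distribution:

$$\mathrm{softmax}(\mathbf{X})_{ij} = \frac{\exp(\mathbf{X}_{ij})}{\sum_k \exp(\mathbf{X}_{ik})}.$$

Computing it takes three steps: exponentiate every entry, sum each row across the class axis, and divide each row by its sum.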
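The three steps translate directly into a short function. A minimal sketch, assuming the rows of X are examples and the columns are classes:

```python
import torch

def softmax(X: torch.Tensor) -> torch.Tensor:
    """Row-wise softmax: each row of X holds one example's scores over the classes."""
    X_exp = torch.exp(X)                    # step 1: exponentiate every entry
    partition = X_exp.sum(1, keepdim=True)  # step 2: sum each row, shape (n, 1)
    return X_exp / partition                # step 3: divide; broadcasting spreads each row's sum

torch.manual_seed(0)
X = torch.rand((2, 5))
X_prob = softmax(X)
print(X_prob, X_prob.sum(1))  # every row sums to 1
```

This direct version can overflow when scores are large, because exp grows quickly; subtracting each row's maximum before exponentiating gives the same result without that risk.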
Each 28×28 Fashion-MNIST image is flattened into a 784-dimensional vector and passed through a single linear layer that produces 10 logits, one per class; softmax turns those logits into probabilities:
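```python
class SoftmaxRegressionScratch(d2l.Classifier):
    def __init__(self, num_inputs, num_outputs, lr, sigma=0.01):
        super().__init__()
        self.save_hyperparameters()
        self.W = torch.normal(0, sigma, size=(num_inputs, num_outputs),
                              requires_grad=True)
        self.b = torch.zeros(num_outputs, requires_grad=True)

    def parameters(self):
        return [self.W, self.b]

    def forward(self, X):
        X = X.reshape((-1, self.W.shape[0]))  # flatten each image to one row
        X_exp = torch.exp(torch.matmul(X, self.W) + self.b)  # exp of the logits
        return X_exp / X_exp.sum(1, keepdim=True)  # row-wise softmax

    def loss(self, y_hat, y):  # cross-entropy on the predicted probabilities
        return -torch.log(y_hat[torch.arange(len(y)), y]).mean()
```

For an example whose label y is an integer class index, the cross-entropy loss is

$$\ell = -\log \hat{y}_{y},$$

the negative log of the predicted probability of the correct class. Here are two examples with 3 classes: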
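A sketch that yields the output below, assuming these particular probability vectors and the labels 0 and 2:

```python
import torch

# Two examples, three classes: each row is a predicted probability vector
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = torch.tensor([0, 2])  # true class index for each example

# Fancy indexing: for row i, take column y[i]
picked = y_hat[[0, 1], y]
print(picked)
```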
tensor([0.1000, 0.5000])
In code it is one line: fancy indexing pulls out y_hat[i, y[i]] for every example, and we take the negative log and average over the batch:
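A minimal sketch of that one-liner, assuming the same two examples as above; the mean over the batch is what produces the value below, (-log 0.1 - log 0.5) / 2 ≈ 1.4979:

```python
import torch

def cross_entropy(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Mean over the batch of -log(probability assigned to the true class)."""
    picked = y_hat[list(range(len(y_hat))), y]  # y_hat[i, y[i]] for every i
    return -torch.log(picked).mean()

y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = torch.tensor([0, 2])
print(cross_entropy(y_hat, y))
```

torch.nn.functional.cross_entropy computes the same quantity but expects raw logits and applies log-softmax internally, so it should not be fed probabilities like these.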
tensor(1.4979)
Train for 10 epochs on Fashion-MNIST. The base Classifier already handles the validation loop and accuracy reporting:
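The trainer hides the per-minibatch loop. A minimal sketch of what that loop does for this model, written in plain PyTorch on random synthetic data so it runs without downloading Fashion-MNIST. The synthetic data and the hidden labeling rule are assumptions; batch size 256, learning rate 0.1, and 10 epochs mirror the settings typically used for this model but are not taken from this text.

```python
import torch

torch.manual_seed(0)
num_inputs, num_outputs, lr, batch_size = 784, 10, 0.1, 256

# Synthetic stand-in for Fashion-MNIST (assumption): labels from a hidden linear rule
X = torch.randn(1024, num_inputs)
y = (X @ torch.randn(num_inputs, num_outputs)).argmax(dim=1)

W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
b = torch.zeros(num_outputs, requires_grad=True)

def loss_fn(X, y):
    X_exp = torch.exp(X @ W + b)
    y_hat = X_exp / X_exp.sum(1, keepdim=True)                # softmax
    return -torch.log(y_hat[torch.arange(len(y)), y]).mean()  # cross-entropy

initial = loss_fn(X, y).item()
for epoch in range(10):
    for i in range(0, len(X), batch_size):       # one minibatch at a time
        loss = loss_fn(X[i:i + batch_size], y[i:i + batch_size])
        loss.backward()                          # fills W.grad and b.grad
        with torch.no_grad():
            for p in (W, b):
                p -= lr * p.grad                 # plain SGD step
                p.grad.zero_()
final = loss_fn(X, y).item()
print(f"loss before: {initial:.4f}, after 10 epochs: {final:.4f}")
```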
Pull a fresh validation batch and look at predicted vs. true classes:
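Each prediction is the argmax over the class axis, so a batch of 256 examples yields 256 class indices. A sketch in which random tensors stand in for the model's output and the validation labels (an assumption made so the snippet is self-contained):

```python
import torch

torch.manual_seed(0)
batch_size, num_classes = 256, 10
# Stand-in for the model's output on one validation batch: rows of probabilities
y_hat = torch.softmax(torch.randn(batch_size, num_classes), dim=1)
y = torch.randint(0, num_classes, (batch_size,))  # stand-in true labels

preds = y_hat.argmax(dim=1)  # most probable class for each example
print(preds.shape)           # torch.Size([256])
print((preds == y).sum().item(), "of", batch_size, "correct")
```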
torch.Size([256])
Tile the misclassified images, captioned with predicted / true:
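Picking out the mistakes is a boolean mask over the batch; the images at those positions are the ones that get tiled. A sketch of the mask and the predicted / true captions, using the ten Fashion-MNIST class names in label order and a hypothetical helper misclassified_captions (the plotting itself is left out):

```python
import torch

# Fashion-MNIST's ten classes, indexed by label
text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
               'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']

def misclassified_captions(preds: torch.Tensor, y: torch.Tensor):
    """Hypothetical helper: indices of wrong predictions plus 'predicted / true' captions."""
    wrong = preds != y                     # boolean mask over the batch
    idx = wrong.nonzero(as_tuple=True)[0]  # positions of the mistakes
    captions = [f"{text_labels[int(p)]} / {text_labels[int(t)]}"
                for p, t in zip(preds[wrong], y[wrong])]
    return idx, captions

preds = torch.tensor([6, 1, 2, 9])
y = torch.tensor([2, 1, 6, 9])
idx, captions = misclassified_captions(preds, y)
print(idx.tolist(), captions)  # [0, 2] ['shirt / pullover', 'pullover / shirt']
```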
Linear models cap out at around 83% accuracy on Fashion-MNIST: they get the easy classes right and confuse ambiguous ones such as shirt vs. pullover.