from d2l import tensorflow as d2l
import tensorflow as tf

The same recipe as linear regression, with two new pieces: the softmax function and the cross-entropy loss.
Everything plugs into the same Module / Trainer scaffold from the regression chapter: Classifier adds accuracy reporting and we inherit the rest.
Quick reminder before defining softmax: summing along a chosen axis, with keepdims=True so the reduced axis stays in place with size 1:
(<tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[5., 7., 9.]], dtype=float32)>,
 <tf.Tensor: shape=(2, 1), dtype=float32, numpy=
 array([[ 6.],
        [15.]], dtype=float32)>)
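The output above is consistent with summing a 2×3 matrix along each axis. A minimal sketch that reproduces it, assuming the input is [[1, 2, 3], [4, 5, 6]] (inferred from the sums, not shown in the source):

```python
import tensorflow as tf

# Assumed input: a 2x3 matrix whose column and row sums match the output above.
X = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# axis=0 collapses the rows (one sum per column);
# axis=1 collapses the columns (one sum per row).
# keepdims=True keeps the reduced axis with size 1, which makes later broadcasting easy.
col_sums = tf.reduce_sum(X, axis=0, keepdims=True)  # shape (1, 3)
row_sums = tf.reduce_sum(X, axis=1, keepdims=True)  # shape (2, 1)
print(col_sums, row_sums)
```

Keeping the reduced axis matters for softmax: a (batch, 1) column of row sums divides a (batch, classes) matrix row by row without any reshaping.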
\mathrm{softmax}(\mathbf{X})_{ij} = \frac{\exp(\mathbf{X}_{ij})}{\sum_k \exp(\mathbf{X}_{ik})}.
Three steps: exponentiate, sum across the class axis, divide.
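A minimal sketch of those three steps, assuming the input is a (batch, classes) matrix with classes along axis 1. The random input is a hypothetical stand-in, and the function is a direct transcription of the formula, so it is not hardened against overflow for large logits:

```python
import tensorflow as tf

def softmax(X):
    """Row-wise softmax of a (batch, classes) tensor."""
    X_exp = tf.exp(X)                                        # 1. exponentiate
    partition = tf.reduce_sum(X_exp, axis=1, keepdims=True)  # 2. sum across the class axis
    return X_exp / partition                                 # 3. divide (broadcasts per row)

# Hypothetical input: 2 examples, 5 classes.
X = tf.random.uniform((2, 5))
X_prob = softmax(X)
row_totals = tf.reduce_sum(X_prob, axis=1)
print(X_prob, row_totals)
```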
Result: every row is non-negative and sums to 1 — a valid probability distribution over classes:
(<tf.Tensor: shape=(2, 5), dtype=float32, numpy=
 array([[0.18055958, 0.17420632, 0.24367459, 0.2057537 , 0.19580577],
        [0.16664788, 0.12496491, 0.14281595, 0.24873221, 0.31683904]],
       dtype=float32)>,
 <tf.Tensor: shape=(2,), dtype=float32, numpy=array([1., 1.], dtype=float32)>)
Flatten each 28×28 image into a 784-vector and pass it through one linear layer that outputs 10 logits, one per class:
class SoftmaxRegressionScratch(d2l.Classifier):
    def __init__(self, num_inputs, num_outputs, lr, sigma=0.01):
        super().__init__()
        self.save_hyperparameters()
        # Weights start as small Gaussian noise, biases as zeros.
        self.W = tf.random.normal((num_inputs, num_outputs), 0, sigma)
        self.b = tf.zeros(num_outputs)
        # Wrap both in tf.Variable so they can be updated during training.
        self.W = tf.Variable(self.W)
        self.b = tf.Variable(self.b)

For label y (an integer class), the loss on one example is just
\ell = -\log \hat{y}_{y}
that is, the negative log of the predicted probability of the correct class. For two examples over 3 classes, the predicted probabilities of the true classes are:
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([0.1, 0.5], dtype=float32)>
One line: fancy indexing pulls out y_hat[i, y[i]] for each example, then we take the negative log and average over the batch:
<tf.Tensor: shape=(), dtype=float32, numpy=1.497866153717041>
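A sketch that reproduces both outputs above. The values of y_hat and y are assumptions chosen so that the picked probabilities come out as 0.1 and 0.5, and tf.gather with batch_dims=1 is one of several ways to express the y_hat[i, y[i]] lookup in TensorFlow:

```python
import tensorflow as tf

# Assumed inputs: predicted probabilities for 2 examples over 3 classes,
# and integer labels (class 0 for the first example, class 2 for the second).
y_hat = tf.constant([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = tf.constant([0, 2])

def cross_entropy(y_hat, y):
    """Mean of -log(y_hat[i, y[i]]) over the batch."""
    # batch_dims=1 pairs row i of y_hat with label y[i].
    picked = tf.gather(y_hat, y, batch_dims=1)
    return -tf.reduce_mean(tf.math.log(picked))

picked = tf.gather(y_hat, y, batch_dims=1)  # probabilities of the true classes
loss = cross_entropy(y_hat, y)
print(picked, loss)
```

The scalar is the batch average: (-log 0.1 - log 0.5) / 2 ≈ 1.4979.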
10 epochs on Fashion-MNIST. The base Classifier already handles the validation loop and accuracy reporting:
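The Trainer hides the update loop. As a rough sketch of what happens underneath, assuming plain gradient descent and small synthetic data in place of Fashion-MNIST (the sizes, seed, and names such as loss_fn are hypothetical; each "epoch" here is a single full-batch step):

```python
import tensorflow as tf

tf.random.set_seed(0)

# Synthetic stand-in data (an assumption, so the sketch runs offline):
# 64 examples, 20 features, 10 classes.
num_inputs, num_outputs, lr = 20, 10, 0.1
X = tf.random.normal((64, num_inputs))
y = tf.random.uniform((64,), maxval=num_outputs, dtype=tf.int32)

# Same initialization scheme as the class above: small Gaussian weights, zero bias.
W = tf.Variable(tf.random.normal((num_inputs, num_outputs), 0, 0.01))
b = tf.Variable(tf.zeros(num_outputs))

def loss_fn(X, y):
    # Forward pass: linear layer, softmax, then mean negative log-likelihood.
    logits = tf.matmul(X, W) + b
    exp = tf.exp(logits)
    probs = exp / tf.reduce_sum(exp, axis=1, keepdims=True)
    return -tf.reduce_mean(tf.math.log(tf.gather(probs, y, batch_dims=1)))

losses = []
for epoch in range(10):
    with tf.GradientTape() as tape:
        loss = loss_fn(X, y)
    grad_W, grad_b = tape.gradient(loss, [W, b])
    # Gradient descent update: move each parameter against its gradient.
    W.assign_sub(lr * grad_W)
    b.assign_sub(lr * grad_b)
    losses.append(float(loss))

print(losses[0], losses[-1])
```

With near-zero initial weights the first loss sits close to log(10) ≈ 2.30, the loss of a uniform guess over 10 classes, and it drops from there.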
Pull a fresh validation batch and look at predicted vs. true classes:
TensorShape([256])
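That shape is what you get from one predicted class index per example in a batch of 256. A minimal sketch, assuming the model returns a (256, 10) matrix of class probabilities (random scores stand in for model(X) here):

```python
import tensorflow as tf

# Hypothetical stand-in for model(X): scores for 256 examples over 10 classes.
y_hat = tf.random.uniform((256, 10))

# The predicted class is the index of the largest score in each row.
preds = tf.argmax(y_hat, axis=1)
print(preds.shape)
```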
Tile the misclassified images, captioned with predicted / true:
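A sketch of the filtering and captioning step, with plotting left out. The toy images, labels, and predictions are assumptions for illustration; the class names are the standard Fashion-MNIST categories in label order:

```python
import tensorflow as tf

# Fashion-MNIST class names, indexed by integer label.
LABELS = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
          'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']

# Toy stand-ins (assumptions) for a validation batch and its predictions.
images = tf.reshape(tf.range(4 * 2 * 2, dtype=tf.float32), (4, 2, 2))
y = tf.constant([6, 1, 2, 9], dtype=tf.int64)      # true classes
preds = tf.constant([2, 1, 6, 9], dtype=tf.int64)  # predicted classes

# Boolean mask of the examples the model got wrong.
wrong = tf.not_equal(preds, y)
wrong_images = tf.boolean_mask(images, wrong)
wrong_y = tf.boolean_mask(y, wrong)
wrong_preds = tf.boolean_mask(preds, wrong)

# Caption each misclassified image as "predicted / true".
captions = [f"{LABELS[p]} / {LABELS[t]}"
            for p, t in zip(wrong_preds.numpy(), wrong_y.numpy())]
print(captions)
```

The filtered images and captions can then be handed to whatever grid-plotting helper is available.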
Linear models cap out at roughly 83% accuracy on Fashion-MNIST: the easy classes come out right, while ambiguous pairs such as shirt vs. pullover go wrong.