Same model, same data, now using the framework’s built-in primitives:

- A built-in Dense layer replaces the hand-written W and b.
- A built-in cross_entropy fuses softmax + log + NLL with numerical-stability tricks (the LogSumExp trick).
- Same Trainer, same convergence, much less code.

Imports + a one-line linear layer wrapped in our Classifier scaffold:

from d2l import tensorflow as d2l
import tensorflow as tf
class SoftmaxRegression(d2l.Classifier):
    """The softmax regression model."""
    def __init__(self, num_outputs, lr):
        super().__init__()
        self.save_hyperparameters()
        self.net = tf.keras.models.Sequential()
        # Flatten each image into a vector
        self.net.add(tf.keras.layers.Flatten())
        # One dense layer holds W and b and outputs raw logits (no softmax)
        self.net.add(tf.keras.layers.Dense(num_outputs))

    def forward(self, X):
        return self.net(X)

Computing softmax, then log, then NLL as separate steps blows up numerically when logits are large (exp(100) already overflows in float32). The framework’s cross_entropy takes raw logits and computes the loss directly via the LogSumExp trick, which is the same math with stable arithmetic:
\log \sum_j e^{o_j} = m + \log \sum_j e^{o_j-m}, \quad m=\max_j o_j.
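To see the failure and the fix side by side, here is a minimal NumPy sketch (NumPy rather than TensorFlow, purely for illustration). The function names `naive_cross_entropy` and `stable_cross_entropy` are hypothetical and not part of d2l or Keras, and the sketch assumes a single example with float32 logits:

```python
import numpy as np

def naive_cross_entropy(logits, label):
    """softmax -> log -> NLL, computed as three separate steps."""
    exp = np.exp(logits)          # overflows to inf for large float32 logits
    probs = exp / exp.sum()       # inf / inf becomes nan
    return -np.log(probs[label])

def stable_cross_entropy(logits, label):
    """Cross-entropy from raw logits via the LogSumExp identity above."""
    m = logits.max()                                    # m = max_j o_j
    log_sum_exp = m + np.log(np.exp(logits - m).sum())  # exponents are all <= 0
    return log_sum_exp - logits[label]                  # -log softmax(o)[label]

# Illustrative logits, chosen so that exp(100) exceeds the float32 range.
logits = np.array([100.0, 0.0, -100.0], dtype=np.float32)

with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
    naive = naive_cross_entropy(logits, label=0)
stable = stable_cross_entropy(logits, label=0)

print("naive: ", naive)
print("stable:", stable)
```

With these logits the naive version returns nan, while the stable version returns 0, which is the expected loss when the correct class dominates. For small logits the two functions agree up to rounding.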
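@d2l.add_to_class(d2l.Classifier)
def loss(self, Y_hat, Y, averaged=True):
    # Collapse leading dimensions: logits become (N, num_classes), labels (N,)
    Y_hat = d2l.reshape(Y_hat, (-1, Y_hat.shape[-1]))
    Y = d2l.reshape(Y, (-1,))
    # Mean over the batch when averaged, otherwise one loss per example
    reduction = (tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE
                 if averaged else tf.keras.losses.Reduction.NONE)
    # from_logits=True: the loss applies softmax and log itself
    fn = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=reduction)
    return fn(Y, Y_hat)

The model outputs raw logits and skips the explicit softmax; the loss handles both the softmax and the log.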
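What the `averaged` flag and the integer labels mean can be mirrored in a few lines of NumPy. This is a rough stand-in under stated assumptions, not the Keras implementation: the helper name `sparse_cross_entropy_from_logits` and the toy logits and labels are made up for illustration.

```python
import numpy as np

def sparse_cross_entropy_from_logits(logits, labels, averaged=True):
    """Hypothetical NumPy mirror of a from-logits loss with integer labels."""
    logits = logits.reshape(-1, logits.shape[-1])   # (N, num_classes)
    labels = labels.reshape(-1)                     # (N,)
    # Stable log-sum-exp per row: subtract the row maximum before exponentiating.
    m = logits.max(axis=1, keepdims=True)
    log_sum_exp = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    # Pick the logit of the true class in each row.
    true_logit = logits[np.arange(len(labels)), labels]
    per_example = log_sum_exp - true_logit          # -log softmax(o)[y], one per row
    return per_example.mean() if averaged else per_example

# Toy batch: two examples, three classes (values are illustrative only).
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]], dtype=np.float32)
labels = np.array([0, 2])

per_example = sparse_cross_entropy_from_logits(logits, labels, averaged=False)
mean_loss = sparse_cross_entropy_from_logits(logits, labels, averaged=True)
print(per_example, mean_loss)
```

The second row has uniform logits, so its loss is log(3) regardless of the label; with `averaged=True` the function returns the mean of the per-example losses.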
Same Fashion-MNIST data, same 10 epochs, same Trainer.
The accuracy curve matches the from-scratch version. Built-in loss = cleaner code + better numerics.
cross_entropy(logits, y) ≡ softmax → log → NLL with the LogSumExp stability trick baked in.