The Classifier class

The Base Classification Model

The shared classifier base

Classifier is a small base class that every classification model in the book inherits from. It plays the same role that d2l.Module played for regression, but with classification-specific defaults:

  • A validation step that reports loss and accuracy.
  • An accuracy helper that compares the argmax of the predicted scores to the true labels.

Subclasses just supply forward (and a custom loss if not plain cross-entropy).

Scores, probabilities, decisions

Classifiers usually produce a vector of scores \mathbf{o}\in\mathbb{R}^q, one score per class for q classes. The training loss may turn scores into probabilities, but the deployed decision is often just

\hat{y}=\arg\max_j o_j.

Keep the roles separate:

  • scores/logits: differentiable quantities the model outputs;
  • loss: smooth training signal, e.g. cross-entropy;
  • accuracy: discrete evaluation metric after taking argmax.

Accuracy is what many benchmarks report, but it does not provide a useful gradient: a tiny change in the scores usually leaves the argmax unchanged, so accuracy is flat almost everywhere as a function of the parameters.
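
To make the separation of roles concrete, here is a minimal sketch in plain Python (standard library only, not the d2l or TensorFlow API). The helper names softmax, cross_entropy and predict, and the example scores, are illustrative assumptions. It nudges one score slightly and checks that the loss moves while the hard decision does not.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, label):
    """Negative log-probability that softmax assigns to the true class."""
    return -math.log(softmax(scores)[label])

def predict(scores):
    """Hard decision: index of the largest score (the argmax)."""
    return max(range(len(scores)), key=lambda j: scores[j])

# Hypothetical scores for one example with q = 3 classes; the true class is 0.
scores = [2.0, 1.0, 0.1]
nudged = [2.0 + 1e-3, 1.0, 0.1]  # tiny increase of the true-class score
label = 0

same_decision = predict(scores) == predict(nudged)  # argmax is unchanged
loss_before = cross_entropy(scores, label)
loss_after = cross_entropy(nudged, label)           # slightly smaller loss
```

The loss responds to the nudge, which is what gives the optimizer a direction to move in; the argmax, and therefore accuracy, stays put.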

Base classifier imports

from d2l import tensorflow as d2l
import tensorflow as tf


class Classifier(d2l.Module):
    """The base class of classification models."""
    def validation_step(self, batch):
        # The last element of the batch is the label; the rest are inputs.
        Y_hat = self(*batch[:-1])
        self._report_val(Y_hat, batch)

    def _report_val(self, y_hat, batch):
        # Plot validation loss and accuracy for the given predictions.
        self.plot('loss', self.loss(y_hat, batch[-1]), train=False)
        self.plot('acc', self.accuracy(y_hat, batch[-1]), train=False)

Default optimizer hook

We attach a default configure_optimizers to d2l.Module so that subclasses don’t have to write it. It returns plain SGD with the model's learning rate:

@d2l.add_to_class(d2l.Module)
def configure_optimizers(self):
    return tf.keras.optimizers.SGD(float(self.lr))

Accuracy

Take the argmax along the class axis, compare with the true label element-wise, and average. The result is the fraction of correctly classified examples in the batch; with averaged=False, the per-example 0/1 results are returned instead:

@d2l.add_to_class(Classifier)
def accuracy(self, Y_hat, Y, averaged=True):
    """Compute the fraction of correct predictions (per-example if not averaged)."""
    Y_hat = d2l.reshape(Y_hat, (-1, Y_hat.shape[-1]))
    preds = d2l.astype(d2l.argmax(Y_hat, axis=1), Y.dtype)
    compare = d2l.astype(preds == d2l.reshape(Y, (-1,)), d2l.float32)
    return d2l.reduce_mean(compare) if averaged else compare
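
As a quick sanity check of the argmax, compare, average recipe, the following framework-free sketch repeats the same three steps on nested Python lists. The function name accuracy is reused here only for illustration, and the scores and labels are made-up values, not data from the book.

```python
def accuracy(score_rows, labels, averaged=True):
    """Fraction of rows whose argmax equals the label (or the 0/1 list)."""
    # Step 1: argmax along the class axis, one prediction per row.
    preds = [max(range(len(row)), key=lambda j: row[j]) for row in score_rows]
    # Step 2: element-wise comparison with the true labels.
    compare = [1.0 if p == y else 0.0 for p, y in zip(preds, labels)]
    # Step 3: average over the batch, unless per-example results are wanted.
    return sum(compare) / len(compare) if averaged else compare

# Hypothetical batch of four examples with three classes each.
Y_hat = [[0.1, 0.7, 0.2],
         [0.8, 0.1, 0.1],
         [0.2, 0.3, 0.5],
         [0.6, 0.3, 0.1]]
Y = [1, 0, 1, 0]

batch_acc = accuracy(Y_hat, Y)                    # 3 of 4 correct
per_example = accuracy(Y_hat, Y, averaged=False)  # 0/1 flag per example
```

Only the third example is misclassified (predicted class 2, true class 1), so the averaged result is 0.75.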

The validation step then reports both the loss (lower is better) and accuracy (higher is better) every epoch.

Why report both loss and accuracy?

Two models can have the same accuracy but different confidence. Cross-entropy still notices whether the correct class received probability 0.51 or 0.99.

Use both during training:

  • loss reflects optimization progress and how confident the model is in the correct class;
  • accuracy tracks the hard decision quality students and benchmarks usually care about;
  • disagreement between them is diagnostic, not a bug.
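
The 0.51 versus 0.99 contrast can be checked numerically. The sketch below is plain Python with made-up probabilities for two hypothetical models, hesitant and confident; the helper names are illustrative, not part of d2l. Both models classify every example correctly, yet their mean cross-entropy differs by a wide margin.

```python
import math

def mean_cross_entropy(prob_rows, labels):
    """Average negative log-probability of the true class."""
    return -sum(math.log(row[y]) for row, y in zip(prob_rows, labels)) / len(labels)

def accuracy(prob_rows, labels):
    """Fraction of examples whose most probable class is the true class."""
    hits = sum(
        1 for row, y in zip(prob_rows, labels)
        if max(range(len(row)), key=lambda j: row[j]) == y
    )
    return hits / len(labels)

labels = [0, 1]
# Hypothetical predicted probabilities from two models on the same two examples.
hesitant = [[0.51, 0.49], [0.49, 0.51]]
confident = [[0.99, 0.01], [0.01, 0.99]]

acc_hesitant = accuracy(hesitant, labels)
acc_confident = accuracy(confident, labels)
loss_hesitant = mean_cross_entropy(hesitant, labels)    # -log(0.51)
loss_confident = mean_cross_entropy(confident, labels)  # -log(0.99)
```

Accuracy cannot tell these two models apart, while the loss is roughly 0.67 for the hesitant model and roughly 0.01 for the confident one.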

Recap

  • Classifier(d2l.Module) adds accuracy reporting to the base scaffold from the regression chapter.
  • One line for accuracy: argmax → ==y → mean.
  • The same training loop now drives every classification model we’ll build through the rest of the book.