%matplotlib inline
from d2l import mxnet as d2l
import math
from mxnet import gluon, np, npx
npx.set_np()
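d2l.use_svg_display()
Naive Bayes is one of the simplest probabilistic classifiers. Apply Bayes’ rule, $P(y \mid \mathbf{x}) \propto P(y)\, P(\mathbf{x} \mid y)$, and factor the likelihood over the individual features:
$$P(y \mid \mathbf{x}) \propto P(y) \prod_i P(x_i \mid y).$$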
The “naive” part is the assumption that features are conditionally independent given the class. The assumption is wrong in general (neighboring pixels of an image are strongly correlated), but the model is fast, needs little data, and makes a useful baseline.
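A toy computation makes the proportionality concrete. The sketch below uses plain NumPy and made-up probabilities for two classes and three binary features (all numbers are hypothetical): multiply each class prior by the per-feature likelihoods, then divide by the sum over classes to recover $P(y \mid \mathbf{x})$.

```python
import numpy as np

# Hypothetical toy problem: 2 classes, 3 binary features.
prior = np.array([0.6, 0.4])            # P(y) for y = 0, 1
p_on = np.array([[0.9, 0.2, 0.5],       # P(x_i = 1 | y = 0)
                 [0.3, 0.7, 0.5]])      # P(x_i = 1 | y = 1)
x = np.array([1, 0, 1])                 # observed feature vector

# Naive Bayes likelihood: product over features, with P(x_i = 0 | y) = 1 - P(x_i = 1 | y).
likelihood = np.prod(np.where(x == 1, p_on, 1 - p_on), axis=1)
unnormalized = prior * likelihood       # proportional to P(y | x)
posterior = unnormalized / unnormalized.sum()
print(posterior)  # 12/13 and 1/13, roughly [0.923, 0.077]
```

The normalizing sum is the same for every class, which is why the rule can be written with $\propto$ and why picking the most likely class only needs the argmax of the unnormalized scores.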
This deck applies it to MNIST digit classification with binarized pixels.
Binarize pixels so each pixel can be modeled as a Bernoulli random variable conditioned on the digit class.
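The thresholding rule used by `transform` is `floor(value / 128)` on 8-bit intensities. A quick check in plain NumPy (whose `floor` and `astype` behave the same way here as the `mxnet.np` calls; the sample intensities are arbitrary):

```python
import numpy as np

# floor(value / 128) maps uint8 intensities 0..127 to 0 and 128..255 to 1.
pixels = np.array([0, 64, 127, 128, 200, 255], dtype=np.uint8)
binary = np.floor(pixels.astype('float32') / 128)
print(binary)  # [0. 0. 0. 1. 1. 1.]
```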
Inspect the binarized digits before fitting: the class templates are recognizable, but neighboring pixels are clearly dependent.
def transform(data, label):
    # Binarize: intensities 0..127 -> 0, 128..255 -> 1; drop the trailing channel axis.
    return np.floor(data.astype('float32') / 128).squeeze(axis=-1), label
# `Dataset.transform(fn)` applies `fn` to every (data, label) sample (lazily by
# default); Gluon documents the `transform=` constructor argument as deprecated.
mnist_train = gluon.data.vision.MNIST(train=True).transform(transform)
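mnist_test = gluon.data.vision.MNIST(train=False).transform(transform)
For each class $y$ and pixel $i$, estimate $P(x_i = 1 \mid y)$ from the training set. Use Laplace (add-one) smoothing so that no estimate is exactly 0 or 1; otherwise a pixel never seen on (or never seen off) in class $y$ during training would force that class's probability to zero for any test image that disagrees on that pixel:
$$P(x_i = 1 \mid y) \approx \frac{n_{iy} + 1}{n_y + 2},$$
where $n_y$ is the number of training images with label $y$ and $n_{iy}$ is the number of those images in which pixel $i$ is 1.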
Training is counting, not gradient descent: the class priors $P(y)$ and the per-pixel likelihoods $P(x_i = 1 \mid y)$ are estimated directly from the labeled examples, and they are all the model stores. Prediction combines these terms as a product over pixels, which in practice is computed as a sum of logs.
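Counting can be sketched in a few lines of plain NumPy. This is an illustration under stated assumptions, not the notebook's own training cell: `fit_bernoulli_nb` is a hypothetical helper, pixels are flattened to shape `(num_classes, d)` instead of `(10, 28, 28)`, and the add-one variant for a two-valued feature, $(n_{iy} + 1)/(n_y + 2)$, is assumed. The names `P_y` and `P_xy` mirror the ones the prediction code reads.

```python
import numpy as np

def fit_bernoulli_nb(X, y, num_classes):
    """Hypothetical helper: estimate priors and Bernoulli pixel parameters by counting.

    X: (n, d) array of 0/1 features; y: (n,) integer labels in [0, num_classes).
    Uses add-one smoothing: (count_on + 1) / (count_class + 2).
    """
    n_y = np.bincount(y, minlength=num_classes).astype(float)  # examples per class
    P_y = n_y / n_y.sum()                                       # class priors
    n_x = np.zeros((num_classes, X.shape[1]))
    for c in range(num_classes):
        n_x[c] = X[y == c].sum(axis=0)                          # "on" counts per pixel
    P_xy = (n_x + 1) / (n_y[:, None] + 2)                       # smoothed P(x_i = 1 | y)
    return P_y, P_xy

# Hypothetical data: 4 "images" with 3 binary pixels each.
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]])
y = np.array([0, 0, 1, 1])
P_y, P_xy = fit_bernoulli_nb(X, y, num_classes=2)
print(P_y)   # [0.5 0.5]
print(P_xy)  # [[0.75 0.5 0.25], [0.25 0.5 0.75]]
```

On MNIST, `X` would be the 60,000 training images flattened to 784 binary features. Note that pixel 2 of class 0 is never on in the toy data, yet its estimate is 0.25 rather than 0, which is exactly what the smoothing is for.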
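Summing logs instead of multiplying probabilities avoids underflow: the product of 784 per-pixel terms, each below 1, can easily fall below the smallest representable floating-point number and round to zero, leaving every class with the same score: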
log_P_xy = np.log(P_xy)
log_P_xy_neg = np.log(1 - P_xy)
log_P_y = np.log(P_y)
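def bayes_pred_stable(x):
    x = np.expand_dims(x, axis=0)  # (28, 28) -> (1, 28, 28), broadcasts over the 10 classes
    p_xy = log_P_xy * x + log_P_xy_neg * (1 - x)  # per-pixel log P(x_i | y)
    p_xy = p_xy.reshape(10, -1).sum(axis=1)  # log P(x | y), one value per class
    return p_xy + log_P_y  # unnormalized log P(y | x)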
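The underflow that the log-space version sidesteps is easy to reproduce. A plain-NumPy sketch with hypothetical likelihood terms: 784 pixels, every term equal to 0.1 for one class and 0.2 for another.

```python
import numpy as np

# Hypothetical: 784 = 28 * 28 per-pixel likelihood terms for two classes.
terms_a = np.full(784, 0.1)
terms_b = np.full(784, 0.2)

# Direct products underflow: 0.1**784 and 0.2**784 are far below the float64 range.
direct = [np.prod(terms_a), np.prod(terms_b)]
print(direct)  # both 0.0, so the two classes look tied

# Sums of logs stay finite and keep class B ahead, as they should.
logs = [np.sum(np.log(terms_a)), np.sum(np.log(terms_b))]
print(logs)    # about [-1805.2, -1261.8]
```

With direct products, the argmax over classes is meaningless once every score is 0.0; in log space the ranking survives.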
py = bayes_pred_stable(image)
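py  # 10 unnormalized log-posterior scores; the predicted digit is their argmax
Test accuracy is mostly a sanity check: on images, the conditional-independence assumption costs noticeable accuracy compared with models that can capture correlations between pixels.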
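`bayes_pred_stable` returns unnormalized log-posteriors. The prediction is their argmax; turning them into actual probabilities by exponentiating directly would underflow, so a common fix is the log-sum-exp trick: subtract the largest score before exponentiating. A plain-NumPy sketch (the helper name and all scores are hypothetical), followed by accuracy over a small batch:

```python
import numpy as np

def log_scores_to_posterior(log_scores):
    """Hypothetical helper: normalize log scores with the log-sum-exp trick."""
    shifted = log_scores - np.max(log_scores)  # largest score becomes 0
    weights = np.exp(shifted)                  # largest weight is exp(0) = 1, so not all zero
    return weights / weights.sum()

# Hypothetical log scores for three classes, of the magnitude a 784-pixel model produces.
log_scores = np.array([-1805.2, -1803.1, -1830.0])

naive = np.exp(log_scores)              # every entry underflows to 0.0
posterior = log_scores_to_posterior(log_scores)
pred = int(np.argmax(log_scores))       # the argmax needs no normalization
print(naive.sum())  # 0.0
print(pred)         # 1
print(posterior)    # roughly [0.109, 0.891, 2e-12]

# Accuracy over a batch: one row of log scores per example (hypothetical values).
batch_scores = np.array([[-10.0, -12.0],
                         [-9.0, -7.5],
                         [-4.0, -3.0]])
labels = np.array([0, 1, 0])
accuracy = float((batch_scores.argmax(axis=1) == labels).mean())
print(accuracy)     # 2 of 3 correct
```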