The geometric intuitions behind the linear algebra used throughout the book. In code, a vector is just a list of numbers:
v = [1, 7, 0, 1]
There are two viewpoints on a vector \mathbf{v}: the same array can name a point in space or a displacement (a direction with a length).
Deep learning mostly uses the second view. From it we get dot products (similarity), angles, projections, hyperplanes (decision boundaries), and determinants (volume changes).
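To make the displacement view concrete, here is a minimal NumPy sketch. The points `p` and `q` are illustrative values chosen for this example, not taken from the text; the displacement between them has a length and a unit direction.

```python
import numpy as np

# Two points in the plane (illustrative values, not from the text).
p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])

# The displacement that carries p to q: the same kind of array, read as an arrow.
d = q - p

# Length (Euclidean norm) and unit direction of the displacement.
length = np.linalg.norm(d)
direction = d / length

print(d)          # [3. 4.]
print(length)     # 5.0
print(direction)  # [0.6 0.8]
```

Adding `d` to any other point moves it by the same arrow, which is why the displacement view does not depend on where the arrow starts.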
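The dot product has a geometric form: \mathbf{u}^\top \mathbf{v} = \|\mathbf{u}\| \|\mathbf{v}\| \cos\theta, where \theta is the angle between \mathbf{u} and \mathbf{v}. Cosine similarity is the normalized dot product, \cos\theta = \mathbf{u}^\top \mathbf{v} / (\|\mathbf{u}\| \|\mathbf{v}\|). Dot-product similarity of this kind underlies kernel methods, attention, and contrastive learning.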
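The formula above can be sketched in a few lines of NumPy. The helper names `cosine_similarity` and `angle` are chosen for this example and assume nonzero input vectors.

```python
import numpy as np

def cosine_similarity(u, v):
    """Normalized dot product: cos(theta) for nonzero vectors u and v."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def angle(u, v):
    """Angle in radians between u and v; clip guards against rounding drift."""
    return np.arccos(np.clip(cosine_similarity(u, v), -1.0, 1.0))

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])
w = np.array([0.0, 2.0])

print(cosine_similarity(u, w))  # 0.0, the vectors are orthogonal
print(angle(u, v))              # approximately 0.785, i.e. pi / 4
```

Because both norms are divided out, rescaling either vector leaves the result unchanged: `cosine_similarity(u, 10 * v)` matches `cosine_similarity(u, v)` up to rounding.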
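A hyperplane is the set \{\mathbf{x} : \mathbf{w}^\top \mathbf{x} = b\}. A linear classifier splits space with one: the sign of \mathbf{w}^\top \mathbf{x} - b gives the prediction. Much of deep learning amounts to learning features good enough that a hyperplane works. As a starting point, load two Fashion-MNIST classes (labels 0 and 1) and compute their mean images: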
from mxnet import gluon, np, npx
npx.set_np()

# Load in the dataset
train = gluon.data.vision.FashionMNIST(train=True)
test = gluon.data.vision.FashionMNIST(train=False)
# In MXNet 2.0 reductions over `float` (== float64) inputs stay float64, but
# many fused kernels still emit float32 — pin everything to float32 up front so
# downstream dot products see matching dtypes.
X_train_0 = np.stack([x[0] for x in train if x[1] == 0]).astype('float32')
X_train_1 = np.stack([x[0] for x in train if x[1] == 1]).astype('float32')
X_test = np.stack(
    [x[0] for x in test if x[1] == 0 or x[1] == 1]).astype('float32')
y_test = np.stack(
    [x[1] for x in test if x[1] == 0 or x[1] == 1]).astype('float32')
# Compute averages
ave_0 = np.mean(X_train_0, axis=0)
ave_1 = np.mean(X_train_1, axis=0)
Changing \mathbf{w} rotates the boundary; changing b shifts it. The signed distance from a point \mathbf{x} to the boundary, (\mathbf{w}^\top \mathbf{x} - b) / \|\mathbf{w}\|, is its margin.
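One way to turn two class means into a hyperplane is to take their difference as the normal vector. The sketch below does this in plain NumPy on synthetic 2-D Gaussian blobs standing in for the two image classes, so that it runs without downloading the dataset. The blob parameters, and the choice of placing the boundary through the midpoint of the means, are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for two image classes: two Gaussian blobs in 2-D.
X0 = rng.normal(loc=[-2.0, 0.0], scale=1.0, size=(200, 2))
X1 = rng.normal(loc=[2.0, 0.0], scale=1.0, size=(200, 2))

# Class means play the role of ave_0 and ave_1.
ave_0 = X0.mean(axis=0)
ave_1 = X1.mean(axis=0)

# The normal vector points from class 0 toward class 1. The offset b puts the
# hyperplane through the midpoint of the two means (an assumed choice).
w = ave_1 - ave_0
b = w @ (ave_0 + ave_1) / 2

def predict(X):
    """Return 1 where w^T x > b, else 0."""
    return (X @ w > b).astype(int)

def signed_distance(X):
    """Signed distance of each row of X to the hyperplane {x : w^T x = b}."""
    return (X @ w - b) / np.linalg.norm(w)

X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])
accuracy = (predict(X) == y).mean()
print(f"accuracy: {accuracy:.3f}")
```

Scaling `w` and `b` by the same positive constant leaves both the predictions and the signed distances unchanged, which is why the margin is defined with the division by \|\mathbf{w}\|.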
A square matrix is invertible iff it does not collapse volume to zero, that is, iff its determinant is nonzero. The determinant is the signed factor by which the matrix scales volume.
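A small NumPy sketch of the volume interpretation in two dimensions, where "volume" means area. The matrices `A` and `B` are illustrative values chosen for this example.

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [2.0, 3.0]])

# The columns of A are the images of the standard basis vectors, so the unit
# square maps to a parallelogram whose signed area is det(A) = 1*3 - (-1)*2.
det_A = np.linalg.det(A)
print(det_A)  # approximately 5.0

# Swapping the two columns flips orientation, so the sign flips.
det_A_swapped = np.linalg.det(A[:, ::-1])
print(det_A_swapped)  # approximately -5.0

# The second column of B is twice the first, so B flattens the plane onto a
# line: the area scale factor is zero and B has no inverse.
B = np.array([[2.0, 4.0],
              [-1.0, -2.0]])
det_B = np.linalg.det(B)
print(det_B)  # approximately 0.0
```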
Translate all of this into NumPy / PyTorch:
These final snippets connect the geometric ideas to the actual linear-algebra APIs for norms, determinants, and inverses.
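As a minimal sketch of those APIs, the NumPy calls below cover norms, determinants, and inverses (PyTorch exposes similarly named functions under `torch.linalg`). The vector `v`, matrix `M`, and right-hand side `y` are illustrative values, not taken from the text.

```python
import numpy as np

v = np.array([3.0, 4.0])
M = np.array([[1.0, 2.0],
              [1.0, 4.0]])

# Vector length (L2 norm).
print(np.linalg.norm(v))  # 5.0

# Determinant: 1*4 - 2*1 = 2, nonzero, so M is invertible.
det_M = np.linalg.det(M)

# Inverse, checked against the identity up to floating-point error.
M_inv = np.linalg.inv(M)
print(np.allclose(M @ M_inv, np.eye(2)))  # True

# To solve M x = y, np.linalg.solve avoids forming the inverse explicitly.
y = np.array([1.0, 2.0])
x = np.linalg.solve(M, y)
print(np.allclose(M @ x, y))  # True
```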