The geometric intuitions behind the linear algebra used throughout the book. There are two viewpoints on a vector \mathbf{v}, such as v = [1, 7, 0, 1]: a point in space, or a displacement (a direction together with a length). The same array can name either one.
Most of deep learning works in the second view, reasoning about directions, lengths, and angles. From it we get dot products (similarity), angles, projections, hyperplanes (decision boundaries), and determinants (volume changes).
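\mathbf{u}^\top \mathbf{v} = \|\mathbf{u}\| \|\mathbf{v}\| \cos\theta, where \theta is the angle between \mathbf{u} and \mathbf{v}. Cosine similarity is the normalized dot product, \cos\theta = \mathbf{u}^\top \mathbf{v} / (\|\mathbf{u}\| \|\mathbf{v}\|). It is a similarity measure used across kernel methods, attention, and contrastive learning: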
Array(0.41899002, dtype=float32)
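The formula above can be sketched in JAX as follows. The helper names `angle` and `cosine_similarity` and the two example vectors are illustrative assumptions, not taken from the text; with these particular vectors the angle should come out at roughly 0.419 radians.

```python
import jax.numpy as jnp

def cosine_similarity(u, v):
    """Normalized dot product: cos(theta) between u and v."""
    return jnp.dot(u, v) / (jnp.linalg.norm(u) * jnp.linalg.norm(v))

def angle(u, v):
    """Angle in radians between u and v."""
    # Clip to guard against rounding slightly outside [-1, 1].
    return jnp.arccos(jnp.clip(cosine_similarity(u, v), -1.0, 1.0))

# Hypothetical example vectors.
u = jnp.array([0.0, 1.0, 2.0])
v = jnp.array([2.0, 3.0, 4.0])

cos_sim = cosine_similarity(u, v)
theta = angle(u, v)
print(cos_sim, theta)
```

Cosine similarity ignores vector length, so rescaling `u` or `v` by a positive constant leaves both values unchanged.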
A hyperplane is the set \{\mathbf{x} : \mathbf{w}^\top \mathbf{x} = b\}. Linear classifiers split space with one: the sign of \mathbf{w}^\top \mathbf{x} - b gives the prediction. Much of deep learning amounts to “learn good features so a hyperplane works”:
# Load in the dataset
import jax.numpy as jnp
import tensorflow as tf

((train_images, train_labels),
 (test_images, test_labels)) = tf.keras.datasets.fashion_mnist.load_data()

# Keep only classes 0 and 1
X_train_0 = jnp.array(train_images[train_labels == 0], dtype=jnp.float32) * 256
X_train_1 = jnp.array(train_images[train_labels == 1], dtype=jnp.float32) * 256
test_mask = (test_labels == 0) | (test_labels == 1)
X_test = jnp.array(test_images[test_mask], dtype=jnp.float32) * 256
y_test = jnp.array(test_labels[test_mask], dtype=jnp.float32)

# Compute the average image of each class
ave_0 = jnp.mean(X_train_0, axis=0)
ave_1 = jnp.mean(X_train_1, axis=0)

Changing \mathbf{w} rotates the boundary; changing b shifts it. The signed distance from a point \mathbf{x} to the boundary, (\mathbf{w}^\top \mathbf{x} - b) / \|\mathbf{w}\|, is its margin.
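To make the hyperplane idea concrete without downloading a dataset, here is a minimal sketch on synthetic 2-D data. It assumes a difference-of-means direction for \mathbf{w} and places the threshold b at the midpoint between the two class means; both choices, along with the names `predict` and `signed_distance`, are illustrative assumptions rather than the text's exact procedure.

```python
import jax
import jax.numpy as jnp

# Two synthetic Gaussian clusters standing in for two classes (assumed data).
key0, key1 = jax.random.split(jax.random.PRNGKey(0))
X0 = jax.random.normal(key0, (200, 2)) + jnp.array([-3.0, 0.0])
X1 = jax.random.normal(key1, (200, 2)) + jnp.array([3.0, 0.0])

mean_0 = jnp.mean(X0, axis=0)
mean_1 = jnp.mean(X1, axis=0)

# Normal vector points from class 0 toward class 1.
w = mean_1 - mean_0
# Threshold chosen so the hyperplane passes through the midpoint of the means.
b = jnp.dot(w, (mean_0 + mean_1) / 2)

def predict(X):
    """Class 1 if the point lies on the positive side of w^T x = b."""
    return (X @ w > b).astype(jnp.int32)

def signed_distance(X):
    """Signed distance to the hyperplane (the margin of each point)."""
    return (X @ w - b) / jnp.linalg.norm(w)

X = jnp.concatenate([X0, X1])
y = jnp.concatenate([jnp.zeros(200, dtype=jnp.int32),
                     jnp.ones(200, dtype=jnp.int32)])
accuracy = jnp.mean(predict(X) == y)
print(accuracy, signed_distance(jnp.stack([mean_0, mean_1])))
```

Scaling `w` and `b` by the same positive constant leaves the predictions unchanged, which is why the margin divides by \|\mathbf{w}\|.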
A square matrix is invertible iff it does not collapse volume to zero, that is, iff its determinant is nonzero. The determinant measures the signed volume scale factor:
Array([[1., 0.],
[0., 1.]], dtype=float32)
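A short sketch of the determinant and inverse in JAX. The matrices `M` and `S` are assumed examples chosen for illustration: `M` scales area by a factor of 2, while `S` has linearly dependent rows and collapses the plane onto a line.

```python
import jax.numpy as jnp

# Invertible example: det = 1*4 - 2*1 = 2, so areas are doubled.
M = jnp.array([[1.0, 2.0],
               [1.0, 4.0]])
det_M = jnp.linalg.det(M)
M_inv = jnp.linalg.inv(M)
identity_check = M_inv @ M  # should be close to the 2x2 identity

# Singular example: the second row is twice the first, so det = 0.
S = jnp.array([[1.0, 2.0],
               [2.0, 4.0]])
det_S = jnp.linalg.det(S)

print(det_M, det_S)
print(identity_check)
```

In floating point, comparing a determinant to exactly zero is fragile; checking it against a small tolerance is the safer habit.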
Translate all of this into array code (the outputs shown here are JAX arrays):
((2, 2), (2, 2, 3), (2,))
These final snippets connect the geometric ideas to the actual linear-algebra APIs for norms, determinants, and inverses.
Array([[ 90, 126],
[102, 144],
[114, 162]], dtype=int32)
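The shapes and the 3x2 integer array shown above are consistent with a tensor contraction. The sketch below is one way to produce them with `jnp.einsum`; the arrays `A`, `B`, `v` and the index strings are assumptions chosen to match those outputs, not code recovered from the text.

```python
import jax.numpy as jnp

# Assumed example arrays: a matrix, a 3-axis tensor, and a vector.
A = jnp.array([[1, 2], [3, 4]])
B = jnp.array([[[1, 2, 3], [4, 5, 6]],
               [[7, 8, 9], [10, 11, 12]]])
v = jnp.array([1, 2])

shapes = (A.shape, B.shape, v.shape)

# Matrix-vector product written as a contraction over the shared index j.
Av = jnp.einsum("ij,j->i", A, v)

# C[k, l] = sum over i and j of B[i, j, k] * A[i, l] * v[j].
C = jnp.einsum("ijk,il,j->kl", B, A, v)

print(shapes)
print(Av)
print(C)
```

Repeated indices that do not appear after `->` are summed over, so the index string serves as a compact specification of which axes are contracted.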