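This section covers the geometric intuitions behind the linear algebra used throughout the book. There are two viewpoints on a vector \mathbf{v}, such as \mathbf{v} = [1, 7, 0, 1]: it can be read as a point in space, or as a direction (a displacement) in space.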
Most of deep learning works in the second view. From it we get dot products (similarity), angles, projections, hyperplanes (decision boundaries), and determinants (volume changes).
The same array can name a point or a displacement. Deep learning mostly uses the displacement view: directions, lengths, and angles.
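The dot product satisfies \mathbf{u}^\top \mathbf{v} = \|\mathbf{u}\| \|\mathbf{v}\| \cos\theta, where \theta is the angle between \mathbf{u} and \mathbf{v}. Cosine similarity is the normalized dot product, \cos\theta = \mathbf{u}^\top \mathbf{v} / (\|\mathbf{u}\| \|\mathbf{v}\|). It is the similarity measure behind kernel methods, attention, and contrastive learning.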
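The dot-product identity above can be turned into a small angle computation. The following is a minimal sketch in NumPy; the function names `cosine_similarity` and `angle` and the two example vectors are illustrative assumptions, not taken from the text.

```python
import numpy as np

def cosine_similarity(u, v):
    """Normalized dot product: cos(theta) between u and v."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def angle(u, v):
    """Angle in radians between u and v, from the arccosine of the cosine similarity."""
    # Clipping guards against rounding that pushes the cosine slightly outside [-1, 1].
    return float(np.arccos(np.clip(cosine_similarity(u, v), -1.0, 1.0)))

# Illustrative vectors (assumption, chosen only for the example).
u = np.array([0.0, 1.0, 2.0])
v = np.array([2.0, 3.0, 4.0])

print(cosine_similarity(u, v))
print(angle(u, v))
```

For these two vectors the angle is about 0.419 radians. Both functions assume nonzero inputs, since a zero vector has no direction and the normalization would divide by zero.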
A hyperplane is the set \{\mathbf{x} : \mathbf{w}^\top \mathbf{x} = b\}. A linear classifier splits space with one: the sign of \mathbf{w}^\top \mathbf{x} - b gives the prediction. Much of deep learning amounts to “learn good features so a hyperplane works”:
import tensorflow as tf

# Load the Fashion-MNIST dataset
(train_images, train_labels), (test_images, test_labels) = (
    tf.keras.datasets.fashion_mnist.load_data())

# Keep only class 0 (t-shirt) and class 1 (trouser); the labels are NumPy
# arrays, so boolean masks select the matching images directly
X_train_0 = tf.cast(train_images[train_labels == 0], dtype=tf.float32) * 256
X_train_1 = tf.cast(train_images[train_labels == 1], dtype=tf.float32) * 256
test_mask = (test_labels == 0) | (test_labels == 1)
X_test = tf.cast(test_images[test_mask], dtype=tf.float32) * 256
y_test = tf.cast(test_labels[test_mask], dtype=tf.float32)

# Compute the per-pixel average image of each class
ave_0 = tf.reduce_mean(X_train_0, axis=0)
ave_1 = tf.reduce_mean(X_train_1, axis=0)

Changing \mathbf{w} rotates the boundary; changing b shifts it. The normalized distance from a point to the boundary, (\mathbf{w}^\top \mathbf{x} - b) / \|\mathbf{w}\|, is its margin.
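One way to turn two class averages into a hyperplane is to use their difference as the normal vector and place the boundary at their midpoint. The sketch below illustrates this in NumPy on synthetic 2-D data. The Gaussian blobs, the midpoint threshold, and the helper names `predict` and `signed_distance` are assumptions made for the example, not the setup used in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for two classes (assumption: two Gaussian blobs in 2-D,
# not the Fashion-MNIST images).
X0 = rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(100, 2))
X1 = rng.normal(loc=[2.0, 0.0], scale=0.5, size=(100, 2))

# Class averages, the analogue of ave_0 and ave_1.
ave_0 = X0.mean(axis=0)
ave_1 = X1.mean(axis=0)

# The normal vector points from class 0 toward class 1; the threshold b
# places the hyperplane through the midpoint of the two averages.
w = ave_1 - ave_0
b = w @ (ave_0 + ave_1) / 2

def predict(X):
    """Return 1 where w^T x > b, else 0."""
    return (X @ w > b).astype(int)

def signed_distance(X):
    """Signed distance to the hyperplane, (w^T x - b) / ||w||."""
    return (X @ w - b) / np.linalg.norm(w)

X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])
accuracy = (predict(X) == y).mean()
print(accuracy)
```

Scaling `w` and `b` by the same positive constant leaves the predictions unchanged, which is why the distance is divided by the norm of `w` before being read as a margin.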
A square matrix is invertible if and only if it does not collapse volume, that is, if and only if its determinant is nonzero. The determinant is the signed factor by which the matrix scales volumes.
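The volume interpretation can be checked numerically in two dimensions by mapping the unit square through a matrix and measuring the area of the result. The sketch below does this in NumPy; the three matrices and the helper `polygon_area` (the shoelace formula) are illustrative assumptions.

```python
import numpy as np

def polygon_area(points):
    """Signed area of a polygon from its vertices in order (shoelace formula)."""
    x, y = points[:, 0], points[:, 1]
    return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)

# Corners of the unit square, listed counterclockwise.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])   # illustrative invertible matrix
F = np.array([[1.0, 2.0],
              [3.0, 0.0]])   # columns of A swapped: orientation flips
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second column is twice the first: collapses the plane

for name, M in [("A", A), ("F", F), ("S", S)]:
    mapped = square @ M.T    # apply M to every corner
    print(name, np.linalg.det(M), polygon_area(mapped))
```

In each case the signed area of the mapped square agrees with the determinant: 6 for `A`, -6 for `F`, and 0 for `S`, whose image is a line segment and which therefore has no inverse.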
All of these operations translate directly into array code, whether in NumPy, PyTorch, or the TensorFlow used here.
These final snippets connect the geometric ideas to the actual linear-algebra APIs for norms, determinants, and inverses.
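As a rough map from the geometry to the API, the sketch below uses NumPy's `np.linalg.norm`, `np.linalg.det`, and `np.linalg.inv`. The vector and matrix are illustrative assumptions, and NumPy stands in for whichever framework is in use.

```python
import numpy as np

v = np.array([3.0, 4.0])          # illustrative vector
M = np.array([[1.0, 2.0],
              [1.0, 4.0]])        # illustrative invertible matrix

# Norms: length of a vector.
length = np.linalg.norm(v)        # Euclidean (L2) norm
l1 = np.linalg.norm(v, ord=1)     # L1 norm: sum of absolute values

# Determinant: signed volume scale factor of M.
det_M = np.linalg.det(M)

# Inverse: undoes M, so M_inv @ M should be the identity up to rounding.
M_inv = np.linalg.inv(M)
identity = M_inv @ M

print(length, l1, det_M)
print(identity)
```

Because the inverse undoes the volume scaling of `M`, the determinant of `M_inv` is the reciprocal of `det_M`. When only the solution of a linear system is needed, `np.linalg.solve` is usually preferable to forming the inverse explicitly.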