This section covers the geometric intuitions behind the linear algebra used throughout the book. A vector \mathbf{v}, for example v = [1, 7, 0, 1], can be read in two ways: as a point in space, or as a direction (a displacement). The same array can name either one.
Most of deep learning works in the second view, which is about directions, lengths, and angles. From it we get dot products (similarity), angles, projections, hyperplanes (decision boundaries), and determinants (volume changes).
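\mathbf{u}^\top \mathbf{v} = \|\mathbf{u}\| \|\mathbf{v}\| \cos\theta, where \theta is the angle between \mathbf{u} and \mathbf{v}. Cosine similarity is the normalized dot product, \cos\theta = \mathbf{u}^\top \mathbf{v} / (\|\mathbf{u}\| \|\mathbf{v}\|). It is the metric behind kernel methods, attention, and contrastive learning: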
%matplotlib inline
from d2l import torch as d2l
from IPython import display
import torch
from torchvision import transforms
import torchvision
def angle(v, w):
    return torch.acos(v.dot(w) / (torch.norm(v) * torch.norm(w)))

angle(torch.tensor([0, 1, 2], dtype=torch.float32), torch.tensor([2.0, 3, 4]))
tensor(0.4190)
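As a complement to the angle computation, the following is a minimal sketch of cosine similarity and of projecting one vector onto another. The helper name `cosine_similarity` and the variable names are chosen here for illustration. The two vectors are the same ones passed to `angle` above.

```python
import torch
import torch.nn.functional as F


def cosine_similarity(v, w):
    """Dot product of v and w divided by the product of their lengths."""
    return v.dot(w) / (torch.linalg.norm(v) * torch.linalg.norm(w))


v = torch.tensor([0.0, 1.0, 2.0])
w = torch.tensor([2.0, 3.0, 4.0])

cos_vw = cosine_similarity(v, w)                 # approx. 0.9135
# Built-in equivalent; dim=0 because the inputs are 1-D tensors.
cos_builtin = F.cosine_similarity(v, w, dim=0)
# Rescaling a vector changes its length but not its direction,
# so the cosine similarity is unchanged.
cos_scaled = cosine_similarity(10 * v, w)
# Projection of w onto the direction of v: (w.v / v.v) * v
proj = (w.dot(v) / v.dot(v)) * v                 # [0.0, 2.2, 4.4]

print(cos_vw, cos_builtin, cos_scaled, proj)
```

Taking `torch.acos(cos_vw)` recovers the angle of about 0.4190 radians reported above.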
A hyperplane is the set \{\mathbf{x} : \mathbf{w}^\top \mathbf{x} = b\}. Linear classifiers split space with one: the sign of \mathbf{w}^\top \mathbf{x} - b gives the prediction. Most of deep learning is “learn good features so a hyperplane works”:
# Load in the dataset
trans = []
trans.append(transforms.ToTensor())
trans = transforms.Compose(trans)
train = torchvision.datasets.FashionMNIST(root="../data", transform=trans,
                                          train=True, download=True)
test = torchvision.datasets.FashionMNIST(root="../data", transform=trans,
                                         train=False, download=True)
X_train_0 = torch.stack(
    [x[0] * 256 for x in train if x[1] == 0]).type(torch.float32)
X_train_1 = torch.stack(
    [x[0] * 256 for x in train if x[1] == 1]).type(torch.float32)
X_test = torch.stack(
    [x[0] * 256 for x in test if x[1] == 0 or x[1] == 1]).type(torch.float32)
y_test = torch.stack([torch.tensor(x[1]) for x in test
                      if x[1] == 0 or x[1] == 1]).type(torch.float32)
# Compute averages
ave_0 = torch.mean(X_train_0, axis=0)
ave_1 = torch.mean(X_train_1, axis=0)
Changing \mathbf{w} rotates the boundary; changing b shifts it. A point's distance to the boundary, normalized by \|\mathbf{w}\|, is its margin.
tensor(0.8730, dtype=torch.float64)
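To make the hyperplane picture concrete, here is a small sketch on toy 2-D data. The helper `signed_distance` and the values of `w`, `b`, and `X` are assumptions made for illustration, not taken from the Fashion-MNIST code above.

```python
import torch


def signed_distance(x, w, b):
    """Signed distance from x to the hyperplane {x : w^T x = b}."""
    return (x @ w - b) / torch.linalg.norm(w)


w = torch.tensor([3.0, 4.0])      # normal vector of the boundary (toy value)
b = 5.0                           # offset (toy value)
X = torch.tensor([[3.0, 4.0],     # positive side: 9 + 16 - 5 > 0
                  [0.0, 0.0],     # negative side: 0 - 5 < 0
                  [3.0, -1.0]])   # on the boundary: 9 - 4 - 5 = 0

dist = signed_distance(X, w, b)       # [4.0, -1.0, 0.0]
pred = torch.sign(dist)               # [1.0, -1.0, 0.0]

# Changing b shifts the boundary: with b = 0 it passes through the origin.
dist_shifted = signed_distance(X, w, 0.0)      # [5.0, 0.0, 1.0]
# Scaling w and b together describes the same hyperplane,
# so the normalized distances do not change.
dist_rescaled = signed_distance(X, 2 * w, 2 * b)

print(dist, pred, dist_shifted, dist_rescaled)
```

Dividing by the norm of `w` is what makes the distance independent of how the hyperplane is parameterized.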
A square matrix is invertible iff it doesn’t collapse volume to zero, that is, iff its determinant is nonzero. The determinant measures the signed volume scale factor:
tensor([[1., 0.],
[0., 1.]])
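The link between determinants and invertibility can be checked directly with `torch.linalg`. This is a sketch using example matrices chosen here (`A`, `S`, and `P` are not from the text): one invertible matrix, one that collapses the plane onto a line, and one that flips orientation.

```python
import torch

A = torch.tensor([[2.0, 0.0],
                  [1.0, 3.0]])    # invertible
S = torch.tensor([[2.0, 4.0],
                  [1.0, 2.0]])    # second column is twice the first: collapses 2-D onto a line
P = torch.tensor([[0.0, 1.0],
                  [1.0, 0.0]])    # swaps the two axes

det_A = torch.linalg.det(A)       # 6: areas are scaled by a factor of 6
det_S = torch.linalg.det(S)       # 0: areas collapse, so S has no inverse
det_P = torch.linalg.det(P)       # -1: area preserved, orientation flipped

# A nonzero determinant means the inverse exists and undoes the map.
A_inv = torch.linalg.inv(A)
identity = A @ A_inv

print(det_A, det_S, det_P)
print(identity)
```

Calling `torch.linalg.inv(S)` would raise an error because `S` is singular, which is why the sketch only inverts `A`.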
Translate all of this into NumPy / PyTorch:
(torch.Size([2, 2]), torch.Size([2, 2, 3]), torch.Size([2]))
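As an illustration, the sketch below builds three tensors with exactly these shapes and computes a matrix-vector product two ways. The values of `A`, `B`, and `v` are assumptions picked for the example; only the shapes come from the output above.

```python
import torch

A = torch.tensor([[1, 2], [3, 4]])          # shape (2, 2)
B = torch.arange(1, 13).reshape(2, 2, 3)    # shape (2, 2, 3), entries 1..12
v = torch.tensor([1, 2])                    # shape (2,)

shapes = (A.shape, B.shape, v.shape)

# Matrix-vector product with the @ operator ...
Av_matmul = A @ v                               # [5, 11]
# ... and the same sum over the shared index j in Einstein notation.
Av_einsum = torch.einsum("ij, j -> i", A, v)    # [5, 11]

print(shapes)
print(Av_matmul, Av_einsum)
```

Each entry of the result is the dot product of one row of `A` with `v`, which ties the API call back to the geometric view of dot products.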
These final snippets connect the geometric ideas to the actual linear-algebra APIs for norms, determinants, and inverses.
tensor([[ 90, 126],
[102, 144],
[114, 162]])
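A 3×2 result like the one above is what a single contraction over tensors of the shapes listed earlier can produce. The sketch below is one such contraction, assuming the values of `A`, `B`, and `v` shown in the code; those values are chosen here and are not stated in the text.

```python
import torch

A = torch.tensor([[1, 2], [3, 4]])          # shape (2, 2), assumed values
B = torch.arange(1, 13).reshape(2, 2, 3)    # shape (2, 2, 3), assumed values
v = torch.tensor([1, 2])                    # shape (2,), assumed values

# y[k, l] = sum over i and j of B[i, j, k] * A[i, l] * v[j]
y = torch.einsum("ijk, il, j -> kl", B, A, v)

# The same contraction in two explicit steps, as a cross-check:
# first sum out j against v, then sum out i against A.
Bv = (B * v.view(1, 2, 1)).sum(dim=1)       # shape (2, 3), indexed [i, k]
y_manual = Bv.T @ A                         # shape (3, 2), indexed [k, l]

print(y)
# tensor([[ 90, 126],
#         [102, 144],
#         [114, 162]])
```

Writing the index pattern out in `einsum` keeps track of which axes are summed, which is easy to get wrong with chained transposes and matrix products.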