This section covers the minimum linear-algebra vocabulary that every chapter that follows assumes. Each piece comes with a one-liner of code so you can see the API.
Scalars are rank-0 tensors: single numbers that support all the usual arithmetic operators.
```python
import torch

x = torch.tensor(3.0)
y = torch.tensor(2.0)
x + y, x * y, x / y, x**y
```
(tensor(5.), tensor(6.), tensor(1.5000), tensor(9.))
A vector is a 1-D array of scalars:
tensor([0, 1, 2])
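A minimal sketch that reproduces this output, assuming the vector is built with `torch.arange`. Individual elements are read with zero-based indexing:

```python
import torch

# A length-3 vector holding 0, 1, 2 (arange yields an integer dtype here)
x = torch.arange(3)
print(x)     # tensor([0, 1, 2])
# Indexing is zero-based: x[2] is the third element, itself a rank-0 tensor
print(x[2])  # tensor(2)
```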
The length of a vector is its number of elements:
3
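Assuming `x = torch.arange(3)`, the length can be read in three equivalent ways. `len()` only looks at the first axis, while `.shape` lists the size of every axis:

```python
import torch

x = torch.arange(3)
# len() reports the size of the first axis only
print(len(x))     # 3
# .shape lists every axis; a vector has exactly one
print(x.shape)    # torch.Size([3])
# .numel() counts all elements, which equals the length for a vector
print(x.numel())  # 3
```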
A matrix is a rank-2 tensor — m rows × n columns:
tensor([[0, 1],
[2, 3],
[4, 5]])
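A minimal sketch, assuming the 3 × 2 matrix above comes from reshaping `torch.arange(6)`. Element a_{ij} is read as `A[i, j]`, and `.T` returns the transpose, swapping rows and columns:

```python
import torch

# Six integers laid out row by row into 3 rows and 2 columns
A = torch.arange(6).reshape(3, 2)
print(A[2, 1])    # tensor(5): row 2, column 1
# .T swaps the two axes: rows become columns, giving [[0, 2, 4], [1, 3, 5]]
print(A.T)
print(A.T.shape)  # torch.Size([2, 3])
```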
A matrix is symmetric when it equals its own transpose:
\mathbf{A} = \mathbf{A}^\top.
tensor([[True, True, True],
[True, True, True],
[True, True, True]])
Symmetric matrices come up constantly: covariance matrices and Gram matrices such as \mathbf{X}^\top \mathbf{X} are symmetric by construction.
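A sketch of the symmetry check, assuming a hand-picked 3 × 3 matrix `B` (any matrix with b_{ij} = b_{ji} works) and a hypothetical random data matrix `X` for the Gram matrix. `torch.equal` compares whole tensors at once, and `torch.allclose` tolerates floating-point rounding:

```python
import torch

# Entry (i, j) equals entry (j, i)
B = torch.tensor([[1, 2, 3],
                  [2, 0, 4],
                  [3, 4, 5]])
print(B == B.T)             # element-wise comparison, all True
print(torch.equal(B, B.T))  # True

# Gram matrix of a hypothetical random 5 x 3 data matrix
X = torch.randn(5, 3)
G = X.T @ X
# Summation order can differ slightly, so compare with a tolerance
print(torch.allclose(G, G.T))  # True
```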
The naming generalizes: a rank-n tensor has n axes. A 3-D tensor is a stack of matrices. A single RGB image is one example (height × width × channels in TensorFlow, channels × height × width in PyTorch); a batch of images adds a leading batch axis, making it 4-D:
tensor([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
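A minimal sketch, assuming the 2 × 3 × 4 tensor above comes from reshaping `torch.arange(24)`. Each axis adds one index, and a hypothetical batch of images needs a fourth:

```python
import torch

# 24 integers arranged as 2 matrices, each with 3 rows and 4 columns
X = torch.arange(24).reshape(2, 3, 4)
print(X.shape)     # torch.Size([2, 3, 4])
print(X.ndim)      # 3 axes
# Second matrix, third row, fourth column
print(X[1, 2, 3])  # tensor(23)

# Hypothetical batch of 8 RGB images, 32 x 32 pixels,
# in PyTorch's batch x channels x height x width layout
images = torch.zeros(8, 3, 32, 32)
print(images.ndim)  # 4
```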
Two tensors of the same shape combine element-wise:
(tensor([[0., 1., 2.],
[3., 4., 5.]]),
tensor([[ 0., 2., 4.],
[ 6., 8., 10.]]))
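A minimal sketch that reproduces the output above, assuming `A` is a float 2 × 3 matrix and `B` is a copy of it. The element-wise product `A * B` (the Hadamard product, \mathbf{A} \odot \mathbf{B}) follows the same rule as addition:

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
# clone() allocates new memory, so later changes to B leave A untouched
B = A.clone()
print(A + B)  # each entry doubled: [[0, 2, 4], [6, 8, 10]]
# Hadamard product: entry (i, j) is A[i, j] * B[i, j]
print(A * B)  # squares: [[0, 1, 4], [9, 16, 25]]
```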
A scalar broadcasts to every element of a tensor:
(tensor([[[ 2, 3, 4, 5],
[ 6, 7, 8, 9],
[10, 11, 12, 13]],
[[14, 15, 16, 17],
[18, 19, 20, 21],
[22, 23, 24, 25]]]),
torch.Size([2, 3, 4]))
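A minimal sketch of the output above, assuming the scalar is the Python number 2 and `X` is the 2 × 3 × 4 tensor shown earlier. Adding or multiplying by a scalar touches every element and leaves the shape unchanged:

```python
import torch

a = 2
X = torch.arange(24).reshape(2, 3, 4)
# The scalar is broadcast: it is combined with every element of X
print(a + X)          # entries 2 through 25
print((a * X).shape)  # torch.Size([2, 3, 4]): shape unchanged
```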
The sum \sum_i x_i collapses every element into one scalar:
(tensor([0., 1., 2.]), tensor(3.))
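A minimal sketch, assuming a float vector built with `torch.arange`. `sum()` behaves the same way on a tensor of any rank and always returns a rank-0 tensor:

```python
import torch

x = torch.arange(3, dtype=torch.float32)
print(x.sum())  # tensor(3.)

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
# Summing a matrix still collapses every element into a single scalar
print(A.sum())  # tensor(15.)
```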
To sum along only some axes, pass axis=. Summing a 2 × 3 matrix along axis 0 collapses its rows and leaves a length-3 vector:
(torch.Size([2, 3]), torch.Size([3]))
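A sketch assuming `A` is the 2 × 3 float matrix `[[0, 1, 2], [3, 4, 5]]` from the element-wise example. PyTorch's native keyword is `dim`; `axis=` is accepted as a NumPy-style alias. Reducing along axis 0 removes the rows, and along axis 1 removes the columns:

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
# dim=0 sums down each column, leaving one value per column
print(A.sum(dim=0))  # tensor([3., 5., 7.])
# dim=1 sums across each row, leaving one value per row
print(A.sum(dim=1))  # tensor([ 3., 12.])
print(A.sum(dim=0).shape, A.sum(dim=1).shape)  # torch.Size([3]) torch.Size([2])
```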
A list of axes reduces over each:
tensor(True)
axis=[0,1] is identical to the default sum() for a rank-2 tensor.
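A sketch assuming the same 2 × 3 matrix. Passing both axes reduces over each, so the result equals the full sum; reducing one axis at a time reaches the same value:

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
both = A.sum(dim=(0, 1))
print(both)             # tensor(15.)
print(both == A.sum())  # tensor(True)
# Reduce axis 0 first, then the remaining axis
print(A.sum(dim=0).sum(dim=0))  # tensor(15.)
```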
The mean is \bar x = \frac{1}{n} \sum_i x_i. Compute it with the built-in mean() or as sum() / numel():
(tensor(2.5000), tensor(2.5000))
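A sketch assuming the same 2 × 3 matrix. `mean()` needs a floating-point tensor (hence `dtype=torch.float32`), and it also accepts `dim=` to average along one axis, matching a per-axis sum divided by that axis's size:

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
print(A.mean(), A.sum() / A.numel())  # tensor(2.5000) tensor(2.5000)
# Column means: average along axis 0
print(A.mean(dim=0))                  # tensor([1.5000, 2.5000, 3.5000])
print(A.sum(dim=0) / A.shape[0])      # same values
```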
Set keepdims=True to keep the reduced axis with size 1, so the result still broadcasts against the original tensor:
(tensor([[ 3.],
[12.]]),
torch.Size([2, 1]))
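A sketch assuming the same 2 × 3 matrix; `keepdim` is PyTorch's native spelling of `keepdims`. Because the row sums keep shape (2, 1), dividing `A` by them broadcasts across each row, so every row is normalized to sum to 1:

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
sum_A = A.sum(dim=1, keepdim=True)
print(sum_A.shape)  # torch.Size([2, 1])
# (2, 3) / (2, 1): each row is divided by its own total
normalized = A / sum_A
print(normalized.sum(dim=1))  # each row now sums to 1
# Without keepdim the sums have shape (2,), which would be aligned
# with the last axis (size 3) and raise a RuntimeError here
```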
cumsum(axis=k) keeps every axis but replaces each entry with the running total along axis k, which is useful for time series and prefix sums:
tensor([[0., 1., 2.],
[3., 5., 7.]])
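A sketch assuming the same 2 × 3 matrix, plus a hypothetical vector of daily counts for the 1-D case. The last row of a cumulative sum along axis 0 equals the ordinary sum along that axis:

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
# Running total down each column; the shape stays (2, 3)
C = A.cumsum(dim=0)
print(C)  # [[0., 1., 2.], [3., 5., 7.]]
# The final row holds the full column sums
print(torch.equal(C[-1], A.sum(dim=0)))  # True

# 1-D prefix sum over hypothetical daily counts
counts = torch.tensor([2, 0, 3, 1])
print(counts.cumsum(dim=0))  # tensor([2, 2, 5, 6])
```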
The dot product \mathbf{x}^\top \mathbf{y} = \sum_i x_i y_i multiplies element-wise, then sums:
(tensor([0., 1., 2.]), tensor([1., 1., 1.]), tensor(3.))
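A minimal sketch reproducing the output above, assuming `x = [0, 1, 2]` and `y` a vector of ones. `torch.dot` takes two 1-D tensors of the same dtype, and the explicit multiply-then-sum gives the same value:

```python
import torch

x = torch.arange(3, dtype=torch.float32)
y = torch.ones(3, dtype=torch.float32)
print(torch.dot(x, y))   # tensor(3.)
# Same computation spelled out: element-wise product, then sum
print(torch.sum(x * y))  # tensor(3.)
```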
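The matrix-vector product \mathbf{A}\mathbf{x}, with \mathbf{A} \in \mathbb{R}^{m \times n} and \mathbf{x} \in \mathbb{R}^n, is a length-m vector: one dot product per row of \mathbf{A}. It is among the most common operations in deep learning, forming the core of a fully connected layer's forward pass: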
(torch.Size([2, 3]), torch.Size([3]), tensor([ 5., 14.]), tensor([ 5., 14.]))
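A sketch assuming `A` is the 2 × 3 float matrix and `x = [0, 1, 2]`, which reproduces the output above. The inner dimensions must agree (3 here). `torch.mm` and `@` extend the same idea to matrix-matrix products, with one dot product per (row, column) pair; `B` below is a hypothetical 3 × 4 matrix of ones:

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
x = torch.arange(3, dtype=torch.float32)
print(torch.mv(A, x), A @ x)  # tensor([ 5., 14.]) tensor([ 5., 14.])
# Row-by-row check: entry i is the dot product of row i with x
print(torch.stack([torch.dot(row, x) for row in A]))  # tensor([ 5., 14.])

# Matrix-matrix product: (2, 3) @ (3, 4) -> (2, 4)
B = torch.ones(3, 4)
print(torch.mm(A, B).shape)                   # torch.Size([2, 4])
print(torch.allclose(torch.mm(A, B), A @ B))  # True
```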
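The \ell_2 norm is the Euclidean length of a vector and the workhorse of optimization. The \ell_1 norm sums absolute values, and the Frobenius norm applies the \ell_2 formula to all entries of a matrix: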
\|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}.\qquad \|\mathbf{x}\|_1 = \sum_i |x_i|.\qquad \|\mathbf{X}\|_\text{F} = \sqrt{\sum_{i,j} x_{ij}^2}.
tensor(5.)
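A minimal sketch, assuming `u = [3, -4]` (which gives the \ell_2 output above) and a hypothetical 4 × 9 matrix of ones for the Frobenius norm. With its default arguments, `torch.norm` computes the \ell_2 norm of a vector and the Frobenius norm of a matrix; the \ell_1 norm is written out from its definition:

```python
import torch

u = torch.tensor([3.0, -4.0])
print(torch.norm(u))       # tensor(5.): sqrt(9 + 16)
# l1 norm: sum of absolute values
print(torch.abs(u).sum())  # tensor(7.)
# Frobenius norm of a 4 x 9 matrix of ones: sqrt(36) = 6
print(torch.norm(torch.ones((4, 9))))  # tensor(6.)
```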
Most deep-learning math compiles down to this short list: element-wise arithmetic (+, -, *, /, **); reductions such as sum and mean, with axis= and keepdims=; and products via dot, mv, and mm / @.