x = jnp.array(3.0)
y = jnp.array(2.0)
x + y, x * y, x / y, x**yThe minimum linear-algebra vocabulary every chapter that follows assumes:
Each piece comes with a one-liner of code so you can see the API.
Scalars are rank-0 tensors — a single number with all the usual arithmetic operators:
(Array(5., dtype=float32, weak_type=True),
Array(6., dtype=float32, weak_type=True),
Array(1.5, dtype=float32, weak_type=True),
Array(9., dtype=float32, weak_type=True))
A vector is a 1-D array of scalars:
Array([0, 1, 2], dtype=int32)
The length of a vector is its number of elements:
3
A matrix is a rank-2 tensor — m rows × n columns:
Array([[0, 1],
[2, 3],
[4, 5]], dtype=int32)
A matrix is symmetric when it equals its own transpose:
\mathbf{A} = \mathbf{A}^\top.
Array([[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
Useful: the input to many losses (covariance, Gram matrix) is symmetric.
The naming generalizes — a rank-n tensor has n axes. A 3-D tensor is the shape of a stack of matrices (think batched RGB images: batch × height × width × channels in TF, batch × channels × height × width in PyTorch):
Array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]], dtype=int32)
Two tensors of the same shape combine element-wise:
(Array([[0., 1., 2.],
[3., 4., 5.]], dtype=float32),
Array([[ 0., 2., 4.],
[ 6., 8., 10.]], dtype=float32))
A scalar broadcasts to every element of a tensor:
(Array([[[ 2, 3, 4, 5],
[ 6, 7, 8, 9],
[10, 11, 12, 13]],
[[14, 15, 16, 17],
[18, 19, 20, 21],
[22, 23, 24, 25]]], dtype=int32),
(2, 3, 4))
The sum \sum_i x_i collapses every element into one scalar:
(Array([0., 1., 2.], dtype=float32), Array(3., dtype=float32))
To collapse only one or some axes, pass axis=:
((2, 3), (3,))
A list of axes reduces over each:
Array(True, dtype=bool)
axis=[0,1] is identical to the default sum() for a rank-2 tensor.
\bar x = \frac{1}{n} \sum_i x_i. Either built-in mean() or sum() / numel():
(Array(2.5, dtype=float32), Array(2.5, dtype=float32))
keepdims)Set keepdims=True to preserve the reduced axis (size 1) so broadcasting still works:
(Array([[ 3.],
[12.]], dtype=float32),
(2, 1))
cumsum(axis=k) keeps the axis but reports a running total — useful for time-series and prefix sums:
Array([[0., 1., 2.],
[3., 5., 7.]], dtype=float32)
\mathbf{x}^\top \mathbf{y} = \sum_i x_i y_i — element-wise multiply, then sum:
(Array([0., 1., 2.], dtype=float32),
Array([1., 1., 1.], dtype=float32),
Array(3., dtype=float32))
\mathbf{A}\mathbf{x} is a length-m vector — one dot product per row of A. The most ubiquitous operation in deep learning: a fully-connected layer’s forward pass.
((2, 3), (3,), Array([ 5., 14.], dtype=float32))
The \ell_2 norm — Euclidean length, the workhorse of optimization:
\|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}.\qquad \|\mathbf{x}\|_1 = \sum_i |x_i|.\qquad \|\mathbf{X}\|_\text{F} = \sqrt{\sum_{i,j} x_{ij}^2}.
Array(5., dtype=float32)
\ell_1 is less sensitive to outliers and promotes sparsity:
Array(7., dtype=float32)
*).sum, mean, with axis= and keepdims=.dot, mv, mm / @.Most deep-learning math compiles down to this short list.