This section covers the minimum linear-algebra vocabulary that every later chapter assumes.
Each piece comes with a one-liner of code so you can see the API.
A scalar is a rank-0 tensor, a single number that supports all the usual arithmetic operators:

```python
import numpy as np

x = np.array(3.0)
y = np.array(2.0)
# Sum, product, quotient, and power of two scalars.
x + y, x * y, x / y, x ** y
```
A vector is a 1-D array of scalars:
The length of a vector is its number of elements:
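As a minimal sketch of a vector and its length, assuming NumPy as the array library (the values are illustrative, not from the text):

```python
import numpy as np

# Illustrative values; any 1-D array behaves the same way.
x = np.arange(3.0)   # array([0., 1., 2.])
n = len(x)           # number of elements: 3
shape = x.shape      # (3,): a single axis of length 3
```

Note that `len` reports the size of the first axis only, so for a vector it coincides with the element count.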
A matrix is a rank-2 tensor — m rows × n columns:
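A small NumPy sketch of a matrix with m = 3 rows and n = 2 columns; the sizes are chosen only for illustration:

```python
import numpy as np

A = np.arange(6).reshape(3, 2)   # m = 3 rows, n = 2 columns
m, n = A.shape                   # (3, 2)
At = A.T                         # the transpose swaps the axes: shape (2, 3)
```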
A matrix is symmetric when it equals its own transpose:
\mathbf{A} = \mathbf{A}^\top.
This matters in practice: covariance and Gram matrices, which feed into many losses, are symmetric.
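One way to check symmetry in NumPy is to compare a matrix with its transpose. The sketch below uses made-up values and also builds a Gram matrix from a hypothetical data matrix `X` to show that it comes out symmetric:

```python
import numpy as np

S = np.array([[1, 2, 3],
              [2, 0, 4],
              [3, 4, 5]])
is_symmetric = bool(np.array_equal(S, S.T))   # S equals its own transpose

# A Gram matrix X^T X is symmetric by construction.
X = np.arange(6.0).reshape(3, 2)              # hypothetical data matrix
G = X.T @ X
gram_symmetric = bool(np.allclose(G, G.T))
```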
The naming generalizes: a rank-n tensor has n axes. A rank-3 tensor is a stack of matrices, and a batch of RGB images adds one more axis to make rank 4 (batch × height × width × channels in TF, batch × channels × height × width in PyTorch):
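A NumPy sketch of higher-rank tensors. The image batch below (8 images of 32 × 32 pixels) is a hypothetical example, and the transpose only illustrates how the two axis orderings relate:

```python
import numpy as np

T = np.arange(24).reshape(2, 3, 4)   # a stack of two 3 x 4 matrices
rank = T.ndim                        # 3 axes

# Hypothetical batch of 8 RGB images, 32 x 32 pixels each.
batch_nhwc = np.zeros((8, 32, 32, 3))            # batch, height, width, channels
batch_nchw = batch_nhwc.transpose(0, 3, 1, 2)    # batch, channels, height, width
```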
Two tensors of the same shape combine element-wise:
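A short NumPy sketch of element-wise combination, with illustrative values:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
B = A.copy()                       # same shape as A
total = A + B                      # element-wise sum
hadamard = A * B                   # element-wise (Hadamard) product, not a matrix product
```

The `*` operator multiplies matching entries; the matrix product is a different operation.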
A scalar broadcasts to every element of a tensor:
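A minimal sketch of scalar broadcasting in NumPy; the scalar `a` and the tensor `X` are illustrative:

```python
import numpy as np

a = 2
X = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
shifted = a + X                  # adds 2 to every element
scaled = a * X                   # doubles every element; the shape stays (2, 3)
```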
The sum \sum_i x_i collapses every element into one scalar:
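A sketch of the full reduction in NumPy, assuming small illustrative inputs; the same call works for a tensor of any rank:

```python
import numpy as np

x = np.arange(4.0)                 # [0., 1., 2., 3.]
s = x.sum()                        # 0 + 1 + 2 + 3

A = np.arange(6.0).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
total = A.sum()                    # sums all six entries
```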
To collapse only one or some axes, pass axis=:
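A NumPy sketch of reducing along a single axis; the reduced axis disappears from the result's shape:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
col_sums = A.sum(axis=0)           # collapses the rows: one sum per column
row_sums = A.sum(axis=1)           # collapses the columns: one sum per row
```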
A list of axes reduces over each:
axis=[0,1] is identical to the default sum() for a rank-2 tensor.
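A sketch of the multi-axis reduction, assuming NumPy. One library-specific detail: NumPy expects several axes as a tuple, `axis=(0, 1)`, whereas some other frameworks accept the list form written above.

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)
both = A.sum(axis=(0, 1))      # reduce over both axes at once
same = bool(both == A.sum())   # matches the default full reduction
```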
The mean is \bar x = \frac{1}{n} \sum_i x_i. Compute it with the built-in mean() or as sum() / numel():
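A sketch of both routes in NumPy. Here the element count comes from the `.size` attribute, which plays the role that `numel()` plays in PyTorch:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
mean_builtin = A.mean()            # built-in mean over all elements
mean_manual = A.sum() / A.size     # sum divided by the number of elements
col_means = A.mean(axis=0)         # mean also accepts axis=, like sum
```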
Set keepdims=True to preserve the reduced axis (with size 1) so broadcasting still works:
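A sketch of why the kept axis matters, assuming NumPy: dividing each row by its own sum only broadcasts correctly when the row sums have shape (2, 1) rather than (2,).

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)          # [[0, 1, 2], [3, 4, 5]]
row_sums = A.sum(axis=1, keepdims=True)   # shape (2, 1) instead of (2,)
normalized = A / row_sums                 # broadcasts across columns; each row sums to 1
```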
cumsum(axis=k) keeps the axis but reports a running total — useful for time-series and prefix sums:
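A short NumPy sketch of the running total along each axis, with illustrative values:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
down_rows = A.cumsum(axis=0)       # running total down each column
along_cols = A.cumsum(axis=1)      # running total across each row
```

Unlike `sum`, the result keeps the input's shape; the last entry along the chosen axis equals the plain sum over that axis.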
The dot product \mathbf{x}^\top \mathbf{y} = \sum_i x_i y_i multiplies element-wise, then sums:
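A sketch showing that the library call and the definition agree, assuming NumPy and illustrative vectors:

```python
import numpy as np

x = np.arange(3.0)           # [0., 1., 2.]
y = np.ones(3)               # [1., 1., 1.]
d_builtin = np.dot(x, y)     # library dot product
d_manual = (x * y).sum()     # element-wise multiply, then sum
```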
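For an m × n matrix A and a length-n vector x, the matrix-vector product \mathbf{A}\mathbf{x} is a length-m vector, with one dot product per row of A. It is the core of a fully-connected layer's forward pass, which makes it one of the most common operations in deep learning.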
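A sketch of the matrix-vector product in NumPy using the `@` operator, checked against an explicit dot product per row. The sizes (m = 2, n = 3) are illustrative:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)   # m = 2 rows, n = 3 columns
x = np.arange(3.0)                 # length n = 3
Ax = A @ x                         # length m = 2

# Same result, spelled out as one dot product per row of A.
rowwise = np.array([np.dot(row, x) for row in A])
```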
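The \ell_2 norm is the Euclidean length, the workhorse of optimization. The \ell_1 norm sums absolute values, and the Frobenius norm applies the \ell_2 formula to every entry of a matrix: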
\|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}.\qquad \|\mathbf{x}\|_1 = \sum_i |x_i|.\qquad \|\mathbf{X}\|_\text{F} = \sqrt{\sum_{i,j} x_{ij}^2}.
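A sketch of the three norms in NumPy with illustrative inputs. `np.linalg.norm` defaults to the \ell_2 norm for a vector and the Frobenius norm for a matrix; the \ell_1 norm is written out directly from its definition.

```python
import numpy as np

u = np.array([3.0, -4.0])
l2 = np.linalg.norm(u)     # sqrt(3^2 + (-4)^2)
l1 = np.abs(u).sum()       # |3| + |-4|

M = np.ones((4, 9))        # 36 entries, all equal to 1
fro = np.linalg.norm(M)    # Frobenius norm: sqrt(36)
```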
In summary: element-wise operations (+, *); reductions such as sum and mean, with axis= and keepdims=; and products via dot, mv, mm / @. Most deep-learning math compiles down to this short list.