The minimum linear-algebra vocabulary that every following chapter assumes.
Each piece comes with a one-liner of code so you can see the API.
A scalar is a rank-0 tensor, a single number that supports all the usual arithmetic operators:
import tensorflow as tf
x = tf.constant(3.0)
y = tf.constant(2.0)
x + y, x * y, x / y, x**y
(<tf.Tensor: shape=(), dtype=float32, numpy=5.0>,
<tf.Tensor: shape=(), dtype=float32, numpy=6.0>,
<tf.Tensor: shape=(), dtype=float32, numpy=1.5>,
<tf.Tensor: shape=(), dtype=float32, numpy=9.0>)
A vector is a 1-D array of scalars:
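A minimal sketch of a call that would produce the output below, assuming TensorFlow 2.x in eager mode; the name `x` is illustrative.

```python
import tensorflow as tf

# tf.range(3) builds a rank-1 int32 tensor holding 0, 1, 2
x = tf.range(3)
x
```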
<tf.Tensor: shape=(3,), dtype=int32, numpy=array([0, 1, 2], dtype=int32)>
The length of a vector is its number of elements:
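One way to read off the length, sketched under the same TensorFlow 2.x eager-mode assumption; `n` is an illustrative name.

```python
import tensorflow as tf

x = tf.range(3)
# len() reports the size of the first (here the only) axis
n = len(x)
n
```

`x.shape[0]` gives the same number without calling `len`.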
3
A matrix is a rank-2 tensor — m rows × n columns:
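A sketch that would yield the 3 × 2 matrix shown below, assuming it was built by reshaping a range (the original cell is not shown, so this is a guess at its form).

```python
import tensorflow as tf

# Six consecutive integers laid out as m=3 rows by n=2 columns
A = tf.reshape(tf.range(6), (3, 2))
A
```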
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[0, 1],
[2, 3],
[4, 5]], dtype=int32)>
A matrix is symmetric when it equals its own transpose:
\mathbf{A} = \mathbf{A}^\top.
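The check can be sketched as an element-wise comparison with the transpose. The matrix values here are hypothetical, chosen only so that the result is all `True` as in the output below.

```python
import tensorflow as tf

# A hypothetical symmetric 3x3 matrix: entry (i, j) equals entry (j, i)
A = tf.constant([[1, 2, 3],
                 [2, 0, 4],
                 [3, 4, 5]])

# In TensorFlow 2.x, == compares element-wise and returns a bool tensor
is_symmetric = A == tf.transpose(A)
is_symmetric
```

Wrapping the comparison in `tf.reduce_all(...)` collapses it to a single boolean.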
<tf.Tensor: shape=(3, 3), dtype=bool, numpy=
array([[ True, True, True],
[ True, True, True],
[ True, True, True]])>
This is useful in practice: covariance and Gram matrices, which many losses take as input, are symmetric.
The naming generalizes: a rank-n tensor has n axes. A rank-3 tensor is a stack of matrices, for example a single RGB image (height × width × channels). A batch of images adds a fourth axis: batch × height × width × channels in TF, batch × channels × height × width in PyTorch.
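A sketch that would produce the rank-3 tensor below, assuming a reshaped range; `X` is an illustrative name.

```python
import tensorflow as tf

# 24 consecutive integers arranged as 2 matrices of 3 rows x 4 columns
X = tf.reshape(tf.range(24), (2, 3, 4))
X
```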
<tf.Tensor: shape=(2, 3, 4), dtype=int32, numpy=
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]], dtype=int32)>
Two tensors of the same shape combine element-wise:
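A sketch consistent with the output below, assuming `B` simply reuses the values of `A`; both names are illustrative.

```python
import tensorflow as tf

A = tf.reshape(tf.range(6, dtype=tf.float32), (2, 3))
B = A  # TensorFlow tensors are immutable, so sharing the reference is safe

# Addition pairs up entries at the same position
total = A + B
A, total
```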
(<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[0., 1., 2.],
[3., 4., 5.]], dtype=float32)>,
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 0., 2., 4.],
[ 6., 8., 10.]], dtype=float32)>)
A scalar broadcasts to every element of a tensor:
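A sketch of scalar broadcasting that matches the output below, assuming the scalar is 2 and the tensor is a reshaped range; `a`, `X`, `shifted`, and `scaled` are illustrative names.

```python
import tensorflow as tf

a = 2
X = tf.reshape(tf.range(24), (2, 3, 4))

shifted = a + X  # 2 is added to every element
scaled = a * X   # every element is doubled; the shape is unchanged
shifted, scaled.shape
```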
(<tf.Tensor: shape=(2, 3, 4), dtype=int32, numpy=
array([[[ 2, 3, 4, 5],
[ 6, 7, 8, 9],
[10, 11, 12, 13]],
[[14, 15, 16, 17],
[18, 19, 20, 21],
[22, 23, 24, 25]]], dtype=int32)>,
TensorShape([2, 3, 4]))
The sum \sum_i x_i collapses every element into one scalar:
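In TensorFlow the full reduction is spelled `tf.reduce_sum`. A minimal sketch, with illustrative names:

```python
import tensorflow as tf

x = tf.range(3, dtype=tf.float32)
# With no axis argument, every element is summed into one rank-0 tensor
total = tf.reduce_sum(x)
x, total
```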
(<tf.Tensor: shape=(3,), dtype=float32, numpy=array([0., 1., 2.], dtype=float32)>,
<tf.Tensor: shape=(), dtype=float32, numpy=3.0>)
To collapse only one or some axes, pass axis=:
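A sketch of a single-axis reduction, assuming the 2 × 3 float matrix used in the surrounding outputs; `col_sums` is an illustrative name.

```python
import tensorflow as tf

A = tf.reshape(tf.range(6, dtype=tf.float32), (2, 3))

# axis=0 sums down the rows, so the row axis disappears: (2, 3) -> (3,)
col_sums = tf.reduce_sum(A, axis=0)
A.shape, col_sums.shape
```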
(TensorShape([2, 3]), TensorShape([3]))
A list of axes reduces over each:
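A sketch comparing a two-axis reduction with the default full reduction, under the same assumed matrix:

```python
import tensorflow as tf

A = tf.reshape(tf.range(6, dtype=tf.float32), (2, 3))

over_both_axes = tf.reduce_sum(A, axis=[0, 1])  # reduce rows, then columns
over_everything = tf.reduce_sum(A)              # default: reduce all axes
over_both_axes, over_everything
```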
(<tf.Tensor: shape=(), dtype=float32, numpy=15.0>,
<tf.Tensor: shape=(), dtype=float32, numpy=15.0>)
axis=[0,1] is identical to the default sum() for a rank-2 tensor.
The mean is \bar x = \frac{1}{n} \sum_i x_i. Compute it with the built-in mean, or divide the sum by the number of elements:
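Both routes can be sketched in TensorFlow as follows. The cast is needed because `tf.size` returns an int32 tensor and TensorFlow does not mix dtypes in a division; the variable names are illustrative.

```python
import tensorflow as tf

A = tf.reshape(tf.range(6, dtype=tf.float32), (2, 3))

mean_builtin = tf.reduce_mean(A)
# Manual version: total divided by element count, cast to A's dtype
mean_manual = tf.reduce_sum(A) / tf.cast(tf.size(A), A.dtype)
mean_builtin, mean_manual
```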
(<tf.Tensor: shape=(), dtype=float32, numpy=2.5>,
<tf.Tensor: shape=(), dtype=float32, numpy=2.5>)
Set keepdims=True to preserve the reduced axis (with size 1) so broadcasting still works:
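A sketch of a row sum that keeps its axis, followed by one assumed use of the result (normalizing each row) to show why the retained axis matters:

```python
import tensorflow as tf

A = tf.reshape(tf.range(6, dtype=tf.float32), (2, 3))

# Shape (2, 1) instead of (2,): the column axis survives with size 1
row_sums = tf.reduce_sum(A, axis=1, keepdims=True)

# (2, 3) / (2, 1) broadcasts, dividing each row by its own sum
row_normalized = A / row_sums
row_sums, row_sums.shape
```

Without `keepdims=True` the sum would have shape `(2,)`, which does not broadcast against `(2, 3)`.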
(<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[ 3.],
[12.]], dtype=float32)>,
TensorShape([2, 1]))
cumsum(axis=k) keeps the axis but reports a running total — useful for time-series and prefix sums:
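A sketch with `tf.cumsum` along axis 0, which matches the output below for the assumed 2 × 3 matrix:

```python
import tensorflow as tf

A = tf.reshape(tf.range(6, dtype=tf.float32), (2, 3))

# Running total down each column; the shape stays (2, 3)
running = tf.cumsum(A, axis=0)
running
```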
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[0., 1., 2.],
[3., 5., 7.]], dtype=float32)>
The dot product \mathbf{x}^\top \mathbf{y} = \sum_i x_i y_i multiplies element-wise, then sums:
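A sketch of two equivalent spellings, using `tf.tensordot` with `axes=1` as one way to contract two vectors; the names are illustrative.

```python
import tensorflow as tf

x = tf.range(3, dtype=tf.float32)
y = tf.ones(3, dtype=tf.float32)

dot = tf.tensordot(x, y, axes=1)      # contracts the single shared axis
dot_by_hand = tf.reduce_sum(x * y)    # element-wise multiply, then sum
x, y, dot
```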
(<tf.Tensor: shape=(3,), dtype=float32, numpy=array([0., 1., 2.], dtype=float32)>,
<tf.Tensor: shape=(3,), dtype=float32, numpy=array([1., 1., 1.], dtype=float32)>,
<tf.Tensor: shape=(), dtype=float32, numpy=3.0>)
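For an m × n matrix \mathbf{A} and a length-n vector \mathbf{x}, the product \mathbf{A}\mathbf{x} is a length-m vector with one dot product per row of \mathbf{A}. It is among the most common operations in deep learning: the forward pass of a fully-connected layer is built on it.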
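A sketch using `tf.linalg.matvec`, which reproduces the values below for the assumed inputs; `Ax` is an illustrative name.

```python
import tensorflow as tf

A = tf.reshape(tf.range(6, dtype=tf.float32), (2, 3))
x = tf.range(3, dtype=tf.float32)

# Row 0: 0*0 + 1*1 + 2*2 = 5;  row 1: 3*0 + 4*1 + 5*2 = 14
Ax = tf.linalg.matvec(A, x)
A.shape, x.shape, Ax
```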
(TensorShape([2, 3]),
TensorShape([3]),
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([ 5., 14.], dtype=float32)>)
The \ell_2 norm — Euclidean length, the workhorse of optimization:
\|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}.\qquad \|\mathbf{x}\|_1 = \sum_i |x_i|.\qquad \|\mathbf{X}\|_\text{F} = \sqrt{\sum_{i,j} x_{ij}^2}.
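A sketch of the \ell_2 norm with `tf.norm`, whose default order is Euclidean. The vector (3, -4) is an assumption chosen to match the output 5.0 below.

```python
import tensorflow as tf

u = tf.constant([3.0, -4.0])
# sqrt(3^2 + (-4)^2) = sqrt(25) = 5
l2 = tf.norm(u)
l2
```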
<tf.Tensor: shape=(), dtype=float32, numpy=5.0>
\ell_1 is less sensitive to outliers and promotes sparsity:
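The \ell_1 norm can be sketched directly from its definition, on the same assumed vector:

```python
import tensorflow as tf

u = tf.constant([3.0, -4.0])
# |3| + |-4| = 7
l1 = tf.reduce_sum(tf.abs(u))
l1
```

`tf.norm(u, ord=1)` should give the same value.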
<tf.Tensor: shape=(), dtype=float32, numpy=7.0>
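The Frobenius norm from the formula above behaves like an \ell_2 norm of the flattened matrix. A sketch on a hypothetical 4 × 9 matrix of ones, chosen so the result is easy to verify by hand:

```python
import tensorflow as tf

M = tf.ones((4, 9))
# 36 entries, each squared to 1: sqrt(36) = 6
frobenius = tf.norm(M)
frobenius
```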
Element-wise operations (+, *).
Reductions: sum, mean, with axis= and keepdims=.
Products: dot, mv, mm / @.
Most deep-learning math compiles down to this short list.