Linear Algebra Toolkit

The minimum linear-algebra vocabulary every chapter that follows assumes:

  • Scalars / vectors / matrices / tensors — the four ranks.
  • Arithmetic — element-wise, with broadcasting.
  • Reductions — sum, mean, along chosen axes.
  • Products — dot, matrix-vector, matrix-matrix.
  • Norms: \ell_1, \ell_2, Frobenius.

Each piece comes with a one-line NumPy snippet so you can see the API; every snippet assumes import numpy as np.

Scalars

Scalars are rank-0 tensors — a single number with all the usual arithmetic operators:

import numpy as np

x = np.array(3.0)
y = np.array(2.0)

x + y, x * y, x / y, x ** y
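To make "rank-0" concrete, the sketch below inspects a 0-d NumPy array. Its shape is the empty tuple, and the standard ndarray method .item() turns it back into a plain Python float.

```python
import numpy as np

x = np.array(3.0)

# Rank 0: no axes, so the shape is the empty tuple.
print(x.shape, x.ndim)               # () 0

# .item() converts the 0-d array into a plain Python scalar.
value = x.item()
print(type(value).__name__, value)   # float 3.0
```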

Vectors

A vector is a rank-1 tensor, a 1-D array of scalars:

x = np.arange(3)
x

Element access uses standard indexing:

x[2]
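Indexing goes beyond single positions. A short sketch of two standard NumPy forms, negative indices and slices:

```python
import numpy as np

x = np.arange(3)        # array([0, 1, 2])

last = x[-1]            # negative indices count from the end
first_two = x[:2]       # a slice: elements at positions 0 and 1

print(last, first_two)  # 2 [0 1]
```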

Length and shape

The length of a vector is its number of elements:

len(x)

For higher-rank tensors len() is just shape[0]. Use .shape when you need every axis:

x.shape
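On a matrix the difference between len() and .shape becomes visible. This sketch uses an arbitrary 4 × 5 array of zeros and also shows .ndim (the rank) and .size (the total element count):

```python
import numpy as np

M = np.zeros((4, 5))

print(len(M))    # 4: only the first axis
print(M.shape)   # (4, 5): every axis
print(M.ndim)    # 2: the rank
print(M.size)    # 20: total number of elements
```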

Matrices

A matrix is a rank-2 tensor — m rows × n columns:

A = np.arange(6).reshape(3, 2)
A

The transpose flips rows and columns; the same data, axes swapped:

A.T
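A quick check of what "same data, axes swapped" means: entry (i, j) of A appears at (j, i) of A.T, and in NumPy .T returns a view rather than a copy, which np.shares_memory confirms.

```python
import numpy as np

A = np.arange(6).reshape(3, 2)
At = A.T

print(A.shape, At.shape)   # (3, 2) (2, 3)

# Entry (i, j) of A lands at (j, i) in A.T.
assert all(A[i, j] == At[j, i] for i in range(3) for j in range(2))

# NumPy's .T returns a view: the same buffer, no copy.
print(np.shares_memory(A, At))   # True
```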

Symmetric matrices

A matrix is symmetric when it equals its own transpose:

\mathbf{A} = \mathbf{A}^\top.

A = np.array([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
A == A.T

Symmetric matrices come up often: covariance matrices and Gram matrices \mathbf{X}^\top\mathbf{X} are always symmetric.
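The comparison A == A.T returns a matrix of booleans, one per entry; .all() collapses it to a single answer. The sketch below also builds a Gram matrix \mathbf{X}^\top\mathbf{X} from an arbitrary example X and confirms it is symmetric.

```python
import numpy as np

A = np.array([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
# A == A.T is element-wise; .all() reduces it to one boolean.
print((A == A.T).all())          # True

# A Gram matrix X^T X is symmetric for any X (arbitrary example data).
X = np.arange(6).reshape(3, 2)
G = X.T @ X
print(G.tolist())                # [[20, 26], [26, 35]]
print(np.array_equal(G, G.T))    # True
```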

Higher-rank tensors

The naming generalizes: a rank-n tensor has n axes. A rank-3 tensor is a stack of matrices. A batch of RGB images adds one more axis and is rank-4 (batch × height × width × channels in TensorFlow, batch × channels × height × width in PyTorch). Here is a stack of two 3 × 4 matrices:

np.arange(24).reshape(2, 3, 4)
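The two image layouts differ only in axis order. Here is a sketch with a hypothetical batch of 8 RGB images of 32 × 32 pixels, where consecutive integers stand in for pixel data. np.moveaxis moves the channel axis from last to second place, which is the same permutation as transpose(0, 3, 1, 2).

```python
import numpy as np

# Hypothetical batch: 8 images, 32x32 pixels, 3 channels (TensorFlow-style layout).
nhwc = np.arange(8 * 32 * 32 * 3).reshape(8, 32, 32, 3)
print(nhwc.ndim)          # 4: a batch of images is a rank-4 tensor

# Move the channel axis (last) to position 1 (PyTorch-style layout).
nchw = np.moveaxis(nhwc, -1, 1)
print(nchw.shape)         # (8, 3, 32, 32)

# The same permutation written as an explicit transpose.
assert np.array_equal(nchw, nhwc.transpose(0, 3, 1, 2))
```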

Element-wise arithmetic

Two tensors of the same shape combine element-wise:

A = np.arange(6).reshape(2, 3)
B = A.copy()  # Assign a copy of A to B by allocating new memory
A, A + B

The element-wise product of matrices is the Hadamard product \mathbf{A} \odot \mathbf{B}:

A * B
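To pin down the definition, this sketch rebuilds A * B with explicit loops and contrasts it with the matrix product @. The Hadamard product needs identical shapes. The matrix product instead needs matching inner dimensions, hence B.T here.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
B = A.copy()

H = A * B                      # Hadamard: H[i, j] = A[i, j] * B[i, j]

# The same thing spelled out with explicit loops.
loop = np.empty_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        loop[i, j] = A[i, j] * B[i, j]

# Not the matrix product, which combines rows with columns.
P = A @ B.T                    # (2, 3) @ (3, 2) -> (2, 2)

print(H.tolist())              # [[0, 1, 4], [9, 16, 25]]
print(P.shape)                 # (2, 2)
```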

Scalar–tensor arithmetic

A scalar broadcasts to every element of a tensor:

a = 2
X = np.arange(24).reshape(2, 3, 4)
a + X, (a * X).shape
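A scalar is the simplest case of NumPy broadcasting. The same rule stretches a length-3 row across every row of a 2 × 3 matrix. Shapes whose trailing axes disagree (3 vs 2 below) raise a ValueError. The values are arbitrary examples.

```python
import numpy as np

X = np.arange(6).reshape(2, 3)
row = np.array([10, 20, 30])     # shape (3,)

# (2, 3) + (3,): the row is reused for each of the 2 rows of X.
Y = X + row
print(Y.tolist())                # [[10, 21, 32], [13, 24, 35]]

# (2, 3) + (2,): trailing axes 3 and 2 do not match, so broadcasting fails.
try:
    X + np.array([1, 2])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)          # True
```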

Reductions: sum

The sum \sum_i x_i collapses every element into one scalar:

x = np.arange(3)
x, x.sum()

Same call works for any rank — it folds across all axes by default:

A.shape, A.sum()

Reducing along an axis

To collapse only some of the axes, pass them via the axis argument:

A.shape, A.sum(axis=0).shape
A.shape, A.sum(axis=1).shape

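axis=0 collapses the row axis, so the (2, 3) matrix A becomes a length-3 vector of column sums. axis=1 collapses the column axis and gives a length-2 vector of row sums. Either way the rank drops by one.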
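Here are the actual numbers for the 2 × 3 matrix A = [[0, 1, 2], [3, 4, 5]], with a check that axis=0 is just the two rows added together:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

col_sums = A.sum(axis=0)         # collapse axis 0: add the two rows together
row_sums = A.sum(axis=1)         # collapse axis 1: add across each row

print(col_sums.tolist(), col_sums.shape)  # [3, 5, 7] (3,)
print(row_sums.tolist(), row_sums.shape)  # [3, 12] (2,)

# axis=0 is the same as adding the rows by hand.
assert np.array_equal(col_sums, A[0] + A[1])
```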

Reducing all axes

A tuple of axes reduces over each of them:

A.sum(axis=(0, 1)) == A.sum()

For a rank-2 tensor, axis=(0, 1) is identical to the default sum().
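Multi-axis reduction gets more interesting on a rank-3 tensor, where you can collapse some axes and keep others. NumPy documents a tuple of ints as the form for multiple axes. The sketch keeps only the middle axis:

```python
import numpy as np

X = np.arange(24).reshape(2, 3, 4)

# Collapse axes 0 and 2, keep axis 1 (length 3).
S = X.sum(axis=(0, 2))
print(S.shape, S.tolist())       # (3,) [60, 92, 124]

# Reducing over every axis matches the default full sum.
print(X.sum(axis=(0, 1, 2)), X.sum())   # 276 276
```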

Mean

\bar x = \frac{1}{n} \sum_i x_i. Use either the built-in mean() or sum() / size (NumPy's total element count):

A.mean(), A.sum() / A.size

And along a single axis:

A.mean(axis=0), A.sum(axis=0) / A.shape[0]
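A per-axis mean combines naturally with broadcasting. One common use, shown here on made-up data, is to subtract each column's mean so every column averages to zero. This is the centering step before computing a covariance matrix.

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])

col_means = X.mean(axis=0)       # one mean per column
X_centered = X - col_means       # (3, 2) - (2,) broadcasts over rows

print(col_means.tolist())                # [3.0, 6.0]
print(X_centered.mean(axis=0).tolist())  # [0.0, 0.0]
```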

Non-reducing sum (keepdims)

Set keepdims=True to preserve the reduced axis (size 1) so broadcasting still works:

sum_A = A.sum(axis=1, keepdims=True)
sum_A, sum_A.shape

Now A / sum_A divides every row by its sum — common normalization:

A / sum_A
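Why keepdims matters: with it, the row sums have shape (2, 1) and broadcast across columns, so every row sums to 1. Without it, the reduced axis disappears. The resulting shape (2,) then fails to broadcast against (2, 3).

```python
import numpy as np

A = np.arange(6).reshape(2, 3)

sum_A = A.sum(axis=1, keepdims=True)   # shape (2, 1)
P = A / sum_A                           # each row divided by its own sum
print(P.sum(axis=1))                    # each row sums to 1 (up to rounding)

# Without keepdims the reduced axis disappears: shape (2,).
flat = A.sum(axis=1)
try:
    A / flat                            # (2, 3) / (2,): trailing 3 vs 2 mismatch
    failed = False
except ValueError:
    failed = True
print(failed)                           # True
```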

Cumulative sum

cumsum(axis=k) keeps the axis but reports a running total — useful for time-series and prefix sums:

A.cumsum(axis=0)
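The prefix-sum use can be sketched as follows: after one cumsum pass, the sum of any slice x[i:j] costs one subtraction. The helper range_sum and the data are hypothetical illustrations, not a library API.

```python
import numpy as np

x = np.array([3, 1, 4, 1, 5, 9])

# prefix[k] holds sum(x[:k]); the leading 0 covers the empty prefix.
prefix = np.concatenate(([0], np.cumsum(x)))

def range_sum(i, j):
    """Hypothetical helper: sum of x[i:j] in O(1) after the O(n) cumsum."""
    return prefix[j] - prefix[i]

print(prefix.tolist())   # [0, 3, 4, 8, 9, 14, 23]
print(range_sum(2, 5))   # 10  (4 + 1 + 5)
```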

Dot product

\mathbf{x}^\top \mathbf{y} = \sum_i x_i y_i — element-wise multiply, then sum:

y = np.ones(3)
x, y, np.dot(x, y)

Equivalently, multiply element-wise and then sum:

np.sum(x * y)
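The definition \sum_i x_i y_i written as an explicit loop, checked against np.dot and the @ operator. NumPy also accepts @ for 1-D arrays.

```python
import numpy as np

x = np.arange(3)   # [0, 1, 2]
y = np.ones(3)     # [1., 1., 1.]

# The definition: multiply matching entries and accumulate.
total = 0.0
for xi, yi in zip(x, y):
    total += xi * yi

print(total, np.dot(x, y), x @ y)   # 3.0 3.0 3.0
```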

Matrix products

\mathbf{A}\mathbf{x} is a length-m vector: one dot product per row of A. It is the most ubiquitous operation in deep learning and the core of a fully-connected layer’s forward pass.

A.shape, x.shape, np.dot(A, x)
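The sketch below first confirms that A x is one dot product per row. It then writes a dense-layer forward pass as W x + b. The weights W and bias b are made-up illustrative values, not trained parameters.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
x = np.arange(3)

# A @ x equals one dot product per row of A.
rows = np.array([np.dot(A[i], x) for i in range(A.shape[0])])
assert np.array_equal(A @ x, rows)
print((A @ x).tolist())          # [5, 14]

# A dense layer with 3 inputs and 2 outputs (illustrative weights).
W = np.array([[0.5, -1.0, 2.0],
              [1.0,  0.0, -0.5]])
b = np.array([0.1, -0.2])
out = W @ x + b
print(out)                       # approximately [3.1, -1.2]
```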

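For \mathbf{A} \in \mathbb{R}^{m \times n} and \mathbf{B} \in \mathbb{R}^{n \times k}, \mathbf{AB} is an m \times k matrix built from k matrix-vector products \mathbf{A}\mathbf{b}_j, one per column \mathbf{b}_j of B (equivalently, m \cdot k row-by-column dot products):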

B = np.ones(shape=(3, 4))
np.dot(A, B)
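The column view of the matrix product can be checked directly: stacking A @ B[:, j] for each column j of B reproduces A @ B.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)    # m x n = 2 x 3
B = np.ones((3, 4))                # n x k = 3 x 4

# Column j of A @ B is the matrix-vector product A @ B[:, j].
cols = [A @ B[:, j] for j in range(B.shape[1])]
C = np.column_stack(cols)

assert np.array_equal(C, A @ B)
print(C.shape)                     # (2, 4)
print(C.tolist())                  # [[3.0, 3.0, 3.0, 3.0], [12.0, 12.0, 12.0, 12.0]]
```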

Norms

Three norms recur throughout: the \ell_2 norm (Euclidean length, the workhorse of optimization), the \ell_1 norm, and the Frobenius norm for matrices:

\|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}.\qquad \|\mathbf{x}\|_1 = \sum_i |x_i|.\qquad \|\mathbf{X}\|_\text{F} = \sqrt{\sum_{i,j} x_{ij}^2}.

u = np.array([3, -4])
np.linalg.norm(u)

\ell_1 is less sensitive to outliers and promotes sparsity:

np.abs(u).sum()
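Both vector norms can be computed straight from their formulas and compared with np.linalg.norm. Its ord parameter selects the norm, and the default for a vector is \ell_2.

```python
import numpy as np

u = np.array([3, -4])

l2 = np.sqrt(np.sum(u ** 2))   # sqrt(9 + 16)
l1 = np.sum(np.abs(u))         # |3| + |-4|

print(l2, np.linalg.norm(u))          # 5.0 5.0
print(l1, np.linalg.norm(u, ord=1))   # 7 7.0
```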

For matrices, the Frobenius norm is the \ell_2 norm of the flattened matrix:

np.linalg.norm(np.ones((4, 9)))
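Three routes to the same Frobenius norm agree: requesting ord='fro' explicitly, taking the \ell_2 norm of the flattened matrix, and evaluating \sqrt{\sum_{i,j} x_{ij}^2} term by term. For a 4 × 9 matrix of ones that is \sqrt{36} = 6.

```python
import numpy as np

M = np.ones((4, 9))

fro = np.linalg.norm(M, 'fro')            # explicit Frobenius norm
flat = np.linalg.norm(M.ravel())          # l2 norm of the flattened matrix
manual = np.sqrt(np.sum(M ** 2))          # the formula, term by term

print(fro, flat, manual)                  # 6.0 6.0 6.0
```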

Recap

  • Scalars / vectors / matrices / tensors are ranks 0 / 1 / 2 / n.
  • Element-wise ops, scalar broadcasting, Hadamard product (*).
  • Reductions: sum, mean, with axis= and keepdims=.
  • Products: dot, matrix-vector, matrix-matrix (np.dot, np.matmul / @).
  • Norms: \ell_1, \ell_2, Frobenius.

Most deep-learning math compiles down to this short list.