
Eigendecompositions

Eigenvectors and Dynamics

A square matrix \mathbf{A} has eigenvalue \lambda with eigenvector \mathbf{v} \neq \mathbf{0} when

\mathbf{A}\mathbf{v} = \lambda \mathbf{v}.

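Geometrically, \mathbf{A} scales \mathbf{v} by \lambda (reversing it if \lambda < 0) without moving it off the line it spans. If \mathbf{A} is diagonalizable, \mathbf{A} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1}, where the columns of \mathbf{V} are eigenvectors and \mathbf{\Lambda} is the diagonal matrix of eigenvalues. This is a change of basis in which \mathbf{A} acts purely by stretching along each axis.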
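
A quick numerical check of the factorization, as a minimal sketch assuming only NumPy: np.linalg.eig returns the eigenvalues and a matrix whose columns are eigenvectors, so rebuilding \mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1} should recover \mathbf{A} up to rounding. The 2×2 matrix is an arbitrary diagonalizable example chosen for illustration.

```python
import numpy as np

# An arbitrary diagonalizable example matrix (distinct eigenvalues 1 and 4).
A = np.array([[2.0, 1.0],
              [2.0, 3.0]])

# eigvals holds the eigenvalues; the columns of V are the matching eigenvectors.
eigvals, V = np.linalg.eig(A)
Lam = np.diag(eigvals)

# Rebuild A = V Λ V^{-1} and compare with the original.
A_rebuilt = V @ Lam @ np.linalg.inv(V)
print(np.allclose(A, A_rebuilt))  # True
```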

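Why we care: for diagonalizable \mathbf{A}, \mathbf{A}^t = \mathbf{V}\mathbf{\Lambda}^t\mathbf{V}^{-1}, so matrix powers are governed by the powers \lambda_i^t. If one eigenvalue is strictly largest in magnitude, repeated application of \mathbf{A} aligns almost any input (any with a nonzero component along the dominant eigenvector) with that eigenvector. This mechanism is at the heart of vanishing/exploding gradients in RNNs, of PageRank, and of many iterative solvers.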
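
The claim about powers can be checked directly. This sketch (same illustrative 2×2 matrix, NumPy only) raises only the diagonal of \mathbf{\Lambda} to the power t, compares the result with np.linalg.matrix_power, and then checks that \mathbf{A}^t applied to an arbitrary start vector points along the dominant eigenvector. The start vector (1, 0) is an assumption; it has a nonzero component along the dominant eigenvector.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [2.0, 3.0]])
eigvals, V = np.linalg.eig(A)
V_inv = np.linalg.inv(V)

t = 10
# Powers through the eigendecomposition: only the eigenvalues are raised to t.
A_t_eig = V @ np.diag(eigvals ** t) @ V_inv
A_t_direct = np.linalg.matrix_power(A, t)
print(np.allclose(A_t_eig, A_t_direct))  # True

# The dominant eigenvalue (4) swamps the other (1): 4**10 versus 1**10.
x = np.array([1.0, 0.0])
y = A_t_direct @ x
y_dir = y / np.linalg.norm(y)
v_dom = V[:, np.argmax(np.abs(eigvals))]  # unit-norm dominant eigenvector
print(abs(y_dir @ v_dom))  # close to 1: y is (almost) parallel to v_dom
```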

A concrete example

Take a small 2×2 matrix. np.linalg.eig returns its eigenvalues and a matrix whose columns are the matching unit-norm eigenvectors; applying \mathbf{A} to either eigenvector changes its scale but not its direction.

%matplotlib inline
from d2l import mxnet as d2l
from IPython import display
import numpy as np

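# Returns (eigenvalues, eigenvectors); the eigenvectors are the unit-norm columns.
# For this matrix the eigenvalues are 1 and 4 (the order may vary).
np.linalg.eig(np.array([[2, 1], [2, 3]]))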
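
To see the "scale but not direction" claim numerically, this sketch (NumPy only, same matrix) checks \mathbf{A}\mathbf{v} = \lambda\mathbf{v} for each returned pair, and contrasts it with a generic vector, which does get turned.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [2.0, 3.0]])
eigvals, eigvecs = np.linalg.eig(A)

# Each column of eigvecs is an eigenvector; A only rescales it by its eigenvalue.
for lam, v in zip(eigvals, eigvecs.T):
    Av = A @ v
    print(f"lambda={lam:.1f}, Av={Av}, lambda*v={lam * v}")

# A generic vector, by contrast, changes direction: (1, 0) maps to (2, 2).
x = np.array([1.0, 0.0])
print(A @ x)
```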

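The Gershgorin circle theorem gives cheap eigenvalue bounds without computing any eigenvalues: every eigenvalue of \mathbf{A} lies in the union of the disks in the complex plane centered at a_{ii} with radius \sum_{j \ne i} |a_{ij}|. This is useful for stability arguments. In the matrix below the off-diagonal entries are small, so each eigenvalue should sit close to a diagonal entry (1, 3, 5, or 9):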

A = np.array([[1.0, 0.1, 0.1, 0.1],
              [0.1, 3.0, 0.2, 0.3],
              [0.1, 0.2, 5.0, 0.5],
              [0.1, 0.3, 0.5, 9.0]])

# A is symmetric, so its eigenvalues are real; each lies near a diagonal entry
eigvals, _ = np.linalg.eig(A)
eigvals
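
The disks themselves are easy to compute. This sketch (NumPy only) builds each center a_{ii} and radius \sum_{j \ne i} |a_{ij}| for the same matrix and confirms that every eigenvalue falls inside at least one disk.

```python
import numpy as np

A = np.array([[1.0, 0.1, 0.1, 0.1],
              [0.1, 3.0, 0.2, 0.3],
              [0.1, 0.2, 5.0, 0.5],
              [0.1, 0.3, 0.5, 9.0]])

# Disk i: center a_ii, radius = sum of |a_ij| over the off-diagonal entries of row i.
centers = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centers)
for c, r in zip(centers, radii):
    print(f"disk centered at {c:.1f} with radius {r:.1f}")

# Every eigenvalue must lie in at least one disk.
eigvals = np.linalg.eigvals(A)
in_union = [bool(np.any(np.abs(lam - centers) <= radii)) for lam in eigvals]
print(in_union)
```

Here the four disks ([0.7, 1.3], [2.4, 3.6], [4.2, 5.8], [8.1, 9.9] on the real axis) are pairwise disjoint, so each contains exactly one eigenvalue.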

Eigenvectors govern long-run behavior

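Power iteration: keep multiplying by \mathbf{A}. When a single eigenvalue \lambda_1 is strictly largest in magnitude, the direction of the iterate converges to the corresponding eigenvector (for almost every starting vector) and its norm grows roughly like |\lambda_1|^t. The experiment below tracks the norm, and the ratio of successive norms, for a random 5×5 matrix: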

np.random.seed(8675309)

k = 5
A = np.random.randn(k, k)
print(A)  # inspect the random matrix
# Calculate the sequence of norms after repeatedly applying `A`
v_in = np.random.randn(k, 1)

norm_list = [np.linalg.norm(v_in)]
for i in range(1, 100):
    v_in = A.dot(v_in)
    norm_list.append(np.linalg.norm(v_in))

d2l.plot(np.arange(0, 100), norm_list, 'Iteration', 'Value')
# Compute the scaling factor of the norms
norm_ratio_list = []
for i in range(1, 100):
    norm_ratio_list.append(norm_list[i]/norm_list[i - 1])

d2l.plot(np.arange(1, 100), norm_ratio_list, 'Iteration', 'Ratio')
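
The experiment above only tracks norms. A common variant, sketched here as an illustration rather than this section's own code, renormalizes the iterate at every step so it never overflows, then estimates the eigenvalue with the Rayleigh quotient \mathbf{v}^\top\mathbf{A}\mathbf{v}. The helper name power_iteration is hypothetical, and the sketch assumes a symmetric matrix with a unique largest-magnitude eigenvalue (here, the Gershgorin example matrix).

```python
import numpy as np

def power_iteration(A, num_iters=100, seed=0):
    """Hypothetical helper: estimate the dominant eigenpair of a symmetric A."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = A @ v
        v = w / np.linalg.norm(w)  # renormalize so the iterate neither explodes nor vanishes
    lam = v @ A @ v  # Rayleigh quotient; v has unit norm
    return lam, v

A = np.array([[1.0, 0.1, 0.1, 0.1],
              [0.1, 3.0, 0.2, 0.3],
              [0.1, 0.2, 5.0, 0.5],
              [0.1, 0.3, 0.5, 9.0]])
lam, v = power_iteration(A)
print(lam, np.max(np.linalg.eigvalsh(A)))  # the two estimates should agree
```

Convergence speed depends on |\lambda_2| / |\lambda_1|; for this matrix that ratio is roughly 5/9, so 100 iterations are far more than enough.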

Relating back

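The ratio of successive norms settles near |\lambda_1|, the largest eigenvalue magnitude. To check, compute the eigenvalues, divide \mathbf{A} by |\lambda_1|, and rerun the experiment: the norms should no longer explode or vanish, and the ratio should approach 1. (Normalizing the vector after each multiplication instead would read off the direction of the dominant eigenvector.)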

# Compute the eigenvalues
eigs = np.linalg.eigvals(A).tolist()
norm_eigs = [np.absolute(x) for x in eigs]
norm_eigs.sort()
print(f'norms of eigenvalues: {norm_eigs}')
# Rescale the matrix `A`
A /= norm_eigs[-1]

# Do the same experiment again
v_in = np.random.randn(k, 1)

norm_list = [np.linalg.norm(v_in)]
for i in range(1, 100):
    v_in = A.dot(v_in)
    norm_list.append(np.linalg.norm(v_in))

d2l.plot(np.arange(0, 100), norm_list, 'Iteration', 'Value')
# Also plot the ratio
norm_ratio_list = []
for i in range(1, 100):
    norm_ratio_list.append(norm_list[i]/norm_list[i-1])

d2l.plot(np.arange(1, 100), norm_ratio_list, 'Iteration', 'Ratio')
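
The same rescaling idea shows how the largest eigenvalue magnitude (the spectral radius) decides between vanishing, stable, and exploding iterates. This sketch is an illustration under assumptions: it builds a random symmetric matrix so the eigenvalues are real, rescales it to spectral radius 0.9, 1.0, and 1.1, and runs 50 steps from the same start vector.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
S = (B + B.T) / 2  # symmetric, so its eigenvalues are real
rho = np.max(np.abs(np.linalg.eigvalsh(S)))  # spectral radius of S

x0 = rng.standard_normal(5)
final_norms = {}
for target in (0.9, 1.0, 1.1):
    M = S * (target / rho)  # rescale so the spectral radius equals target
    x = x0.copy()
    for _ in range(50):
        x = M @ x
    final_norms[target] = np.linalg.norm(x)
    print(f"spectral radius {target}: norm after 50 steps = {final_norms[target]:.3g}")
```

With radius 0.9 the norm is at most 0.9^{50} \approx 0.005 times its starting value; with 1.1 the component along the dominant eigenvector is multiplied by 1.1^{50} \approx 117.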

Recap

  • \mathbf{A}\mathbf{v} = \lambda \mathbf{v}: \mathbf{A} acts as scaling along the eigenvector axes.
  • Largest |\lambda| controls long-run iterated dynamics.
  • Real symmetric matrices have real eigenvalues and an orthonormal basis of eigenvectors, which is the basis for PCA.
  • Vanishing/exploding RNN gradients come from an iterated map whose spectral radius (largest |\lambda|) is below or above 1.
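
For the symmetric case, NumPy's np.linalg.eigh (which assumes a symmetric/Hermitian input) returns real eigenvalues in ascending order and an orthonormal eigenvector matrix \mathbf{Q}, so \mathbf{A} = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^\top with no matrix inverse. A minimal check, using as an assumed example the sample covariance matrix of random data (the matrix PCA decomposes):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))  # illustrative data: 100 samples, 4 features
S = np.cov(X, rowvar=False)        # 4x4 sample covariance, symmetric

eigvals, Q = np.linalg.eigh(S)
print(eigvals)  # real, in ascending order
print(np.allclose(Q.T @ Q, np.eye(4)))             # True: orthonormal columns
print(np.allclose(Q @ np.diag(eigvals) @ Q.T, S))  # True: S = Q Λ Q^T
```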