Advanced Linear Algebra: Beyond the Basics

Advanced linear algebra extends matrix computations to the deep structure of vector spaces. Spectral theory, SVD, Jordan form, tensor products, and exterior algebra form the mathematical backbone of modern data science, quantum mechanics, and geometry.

1. Inner Product Spaces

An inner product space generalizes the dot product to abstract vector spaces, introducing notions of angle, orthogonality, and length in an algebraic setting. The theory drives Fourier analysis, quantum mechanics, and least-squares fitting.

Definition of an Inner Product

Let V be a vector space over the real or complex numbers. An inner product is a map from V times V to the scalars, written angle-bracket u, v angle-bracket, satisfying:

  • Conjugate symmetry: angle-bracket u, v angle-bracket = conjugate of angle-bracket v, u angle-bracket
  • Linearity in first argument: angle-bracket au + bw, v angle-bracket = a angle-bracket u, v angle-bracket + b angle-bracket w, v angle-bracket
  • Positive definiteness: angle-bracket v, v angle-bracket is greater than 0 for all v not equal to 0 (and angle-bracket 0, 0 angle-bracket = 0, which follows from linearity)

The induced norm is defined as the square root of angle-bracket v, v angle-bracket, written as the norm of v. Two vectors u and v are orthogonal if angle-bracket u, v angle-bracket equals zero.

The Cauchy-Schwarz Inequality

In any inner product space, the absolute value of the inner product of u and v is at most the product of their norms:

|angle-bracket u, v angle-bracket| is at most (norm of u)(norm of v)

Equality holds exactly when u and v are scalar multiples of each other. This inequality is the foundation for the triangle inequality and gives meaning to the concept of angle between vectors via cosine-theta equals angle-bracket u,v angle-bracket divided by (norm of u)(norm of v).

Gram-Schmidt Orthogonalization

Given any linearly independent set of vectors v1, v2, ..., vk, Gram-Schmidt produces an orthonormal set e1, e2, ..., ek spanning the same subspace. The algorithm proceeds inductively:

  • Step 1: u1 = v1; e1 = u1 divided by norm(u1)
  • Step 2: u2 = v2 minus (angle-bracket v2, e1 angle-bracket) times e1; e2 = u2 divided by norm(u2)
  • Step k: uk = vk minus the sum over j from 1 to k-1 of (angle-bracket vk, ej angle-bracket) times ej; ek = uk divided by norm(uk)

Each projection angle-bracket vk, ej angle-bracket times ej removes the component of vk in the direction ej, ensuring uk is perpendicular to all previous basis vectors. Numerically, modified Gram-Schmidt (updating the residual after each projection) is more stable than classical Gram-Schmidt.
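
The recursion above can be sketched in a few lines of NumPy; the helper name modified_gram_schmidt is ours, and the input columns are assumed linearly independent:

```python
import numpy as np

def modified_gram_schmidt(V):
    """Orthonormalize the columns of V (assumed linearly independent).

    Updates the residual after each projection, which is numerically
    more stable than classical Gram-Schmidt.
    """
    V = np.array(V, dtype=float)
    n, k = V.shape
    Q = np.zeros((n, k))
    for j in range(k):
        u = V[:, j].copy()
        for i in range(j):                 # subtract components one at a time
            u -= (Q[:, i] @ u) * Q[:, i]   # project out direction e_i
        Q[:, j] = u / np.linalg.norm(u)
    return Q

# Orthonormalize three independent vectors in R^3 (columns are v1, v2, v3)
V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]]).T
Q = modified_gram_schmidt(V)
```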

Orthogonal Projections

Given a subspace W with orthonormal basis e1, ..., em, the orthogonal projection of a vector v onto W is:

proj_W(v) = sum over j of angle-bracket v, ej angle-bracket times ej

The projection matrix P = Q Q-transpose (where Q has columns ej) satisfies P squared = P (idempotent) and P-transpose = P (symmetric). The error v minus proj_W(v) is perpendicular to W, making proj_W(v) the closest point in W to v. This is the geometric heart of least-squares.
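
A minimal NumPy check of these projection facts; the two-dimensional subspace W below is an arbitrary example of ours:

```python
import numpy as np

# Orthonormal basis for a 2-D subspace W of R^3, via thin QR
W_basis = np.array([[1.0, 0.0],
                    [1.0, 1.0],
                    [0.0, 1.0]])
Q = np.linalg.qr(W_basis)[0]   # columns are orthonormal and span W
P = Q @ Q.T                    # projection matrix onto W

v = np.array([3.0, -1.0, 2.0])
proj = P @ v                   # closest point in W to v
```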

Complete Orthonormal Systems

In infinite-dimensional inner product spaces (Hilbert spaces), a complete orthonormal system (basis) allows every vector to be expressed as a convergent series. For the space L-squared of square-integrable functions on the interval from 0 to 2-pi, the functions (1 over sqrt(2-pi)) times e to the inx for integer n form a complete orthonormal system — the Fourier basis.

Parseval's Identity

For a complete orthonormal basis, the sum of squared inner product coefficients equals the squared norm of the vector: sum over j of |angle-bracket v, ej angle-bracket| squared equals the squared norm of v. This is the abstract version of the Pythagorean theorem and ensures no energy is lost in the representation.

2. The Spectral Theorem

The spectral theorem is one of the most powerful results in linear algebra, characterizing when a linear operator can be diagonalized by an orthonormal basis and establishing that self-adjoint operators have real spectra.

Symmetric and Hermitian Matrices

A real matrix A is symmetric if A equals A-transpose. A complex matrix A is Hermitian if A equals its conjugate transpose A-star (also written A-dagger). These are the natural self-adjoint operators in finite dimensions. Key properties:

  • All eigenvalues of a real symmetric or complex Hermitian matrix are real.
  • Eigenvectors for distinct eigenvalues are orthogonal.
  • The matrix is orthogonally (or unitarily) diagonalizable.

The Spectral Theorem (Real Case)

Theorem

Every real symmetric matrix A has a spectral decomposition: A = Q D Q-transpose, where Q is an orthogonal matrix (Q-transpose Q = I) whose columns are orthonormal eigenvectors, and D is a diagonal matrix with the corresponding real eigenvalues on the diagonal. The eigenvalues are repeated according to multiplicity.

For a complex Hermitian matrix, the decomposition is A = U D U-star, where U is unitary (U-star U = I) and D has real diagonal entries.

Spectral Decomposition as a Sum of Rank-1 Projections

Writing the columns of Q as q1, q2, ..., qn with eigenvalues lambda-1, ..., lambda-n, the spectral theorem gives:

A = lambda-1 times q1 q1-transpose + lambda-2 times q2 q2-transpose + ... + lambda-n times qn qn-transpose

Each term qi qi-transpose is the orthogonal projection onto the eigenspace for lambda-i. This representation enables efficient computation of matrix functions: f(A) = sum of f(lambda-i) times qi qi-transpose for any function f.
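
The rank-1 expansion translates directly into NumPy; the 2-by-2 matrix below is a toy example of ours, with f taken to be the square root:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # symmetric, eigenvalues 1 and 3
lam, Q = np.linalg.eigh(A)            # eigh: real eigenvalues, orthonormal Q

# Reconstruct A as a sum of rank-1 projections lambda_i q_i q_i^T
A_rebuilt = sum(l * np.outer(q, q) for l, q in zip(lam, Q.T))

# Matrix function via the spectral theorem: sqrt(A) = sum sqrt(lambda_i) q_i q_i^T
sqrtA = sum(np.sqrt(l) * np.outer(q, q) for l, q in zip(lam, Q.T))
```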

Normal Matrices

A complex matrix A is normal if A A-star equals A-star A. The spectral theorem extends to normal matrices: every normal matrix is unitarily diagonalizable. This class includes Hermitian, skew-Hermitian, and unitary matrices. The Schur decomposition states that every square complex matrix is unitarily similar to an upper triangular matrix.

Min-Max (Courant-Fischer) Theorem

For a real symmetric matrix A with eigenvalues ordered lambda-1 at least lambda-2 at least ... at least lambda-n, the k-th eigenvalue has a variational characterization:

lambda-k = max over all k-dimensional subspaces S of (min over nonzero v in S of (v-transpose A v) divided by (v-transpose v))

This min-max characterization is used to prove eigenvalue interlacing theorems and provides bounds on eigenvalues after rank-1 perturbations (Weyl's inequality).

3. Singular Value Decomposition (SVD)

SVD is arguably the most important matrix factorization in applied mathematics. Unlike eigendecomposition, SVD applies to every matrix — rectangular, rank-deficient, or complex — and reveals the complete geometric action of a linear map.

Existence and Statement

Theorem (SVD)

Every real m-by-n matrix A can be factored as A = U Sigma V-transpose, where U is m-by-m orthogonal, V is n-by-n orthogonal, and Sigma is m-by-n with non-negative entries sigma-1 at least sigma-2 at least ... at least sigma-r greater than 0 on the main diagonal and zeros elsewhere. The sigma-i are the singular values of A. The number r is the rank of A.

The singular values are the positive square roots of the non-zero eigenvalues of A-transpose A (equivalently, of A A-transpose). The columns of V are the right singular vectors (eigenvectors of A-transpose A), and the columns of U are the left singular vectors (eigenvectors of A A-transpose).
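
These relations are easy to verify numerically (random test matrix of our choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A)            # full SVD: U is 4x4, Vt is 3x3, s descending

# Singular values squared are the eigenvalues of A^T A
eigs = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]

# Rebuild A from the factors
S = np.zeros((4, 3))
S[:3, :3] = np.diag(s)
A_rebuilt = U @ S @ Vt
```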

Geometric Interpretation

SVD decomposes A = U Sigma V-transpose into three operations:

V-transpose

Rotate/reflect the input space. V-transpose maps the standard basis to the right singular vectors.

Sigma

Scale along coordinate axes by singular values. Stretches each axis independently.

U

Rotate/reflect the output space. U maps coordinate axes to the left singular vectors.

The unit ball is mapped to an ellipsoid with semi-axes of lengths sigma-1, sigma-2, ..., sigma-r aligned with the columns of U.

Fundamental Subspaces via SVD

Column Space of A

Spanned by the first r columns of U (left singular vectors for non-zero singular values).

Null Space of A

Spanned by columns r+1 through n of V (right singular vectors for zero singular values).

Row Space of A

Spanned by the first r columns of V.

Left Null Space of A

Spanned by columns r+1 through m of U.

Low-Rank Approximation and the Eckart-Young Theorem

The truncated SVD A-k = U-k Sigma-k V-k-transpose (keeping only the top k singular values and vectors) is the best rank-k approximation to A in both the Frobenius norm and the spectral norm:

min over rank-k matrices B of norm_2(A minus B) = sigma-(k+1) (spectral norm)

For the Frobenius norm, the error equals the square root of the sum of squared singular values from k+1 to r. This theorem justifies dimensionality reduction: keeping only the top k directions of variation captures the most information.
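
A short NumPy sketch of truncated SVD and the Eckart-Young error formulas (random matrix; k = 2 chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation

spec_err = np.linalg.norm(A - A_k, 2)          # equals sigma_{k+1}
fro_err = np.linalg.norm(A - A_k, 'fro')       # equals sqrt(sum_{i>k} sigma_i^2)
```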

The Moore-Penrose Pseudoinverse

For any matrix A = U Sigma V-transpose, the pseudoinverse is A-plus = V Sigma-plus U-transpose, where Sigma-plus replaces each non-zero sigma-i by 1 divided by sigma-i and transposes the matrix. The pseudoinverse gives the minimum-norm least-squares solution to Ax = b:

x-star = A-plus b = V Sigma-plus U-transpose b

This is the solution with smallest Euclidean norm among all vectors minimizing the residual norm of Ax minus b. When A is square and invertible, A-plus = A-inverse. When A has full column rank, A-plus = (A-transpose A) inverse A-transpose, the normal equations solution.
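
All three routes to the least-squares solution agree for a full-column-rank example (random data of our choosing):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))       # full column rank with probability 1
b = rng.standard_normal(8)

x_pinv = np.linalg.pinv(A) @ b                # pseudoinverse solution
x_ne = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations (full column rank)
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
```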

4. Matrix Norms

Matrix norms extend the concept of vector norms to matrices, enabling precise measurement of matrix size and perturbation sensitivity. Different norms capture different geometric and algebraic properties.

Frobenius Norm

The Frobenius norm of an m-by-n matrix A is the square root of the sum of squares of all entries:

norm_F(A) = sqrt(sum over i,j of |a_ij| squared) = sqrt(trace(A-transpose A)) = sqrt(sum of sigma-i squared)

The Frobenius norm is induced by the inner product angle-bracket A, B angle-bracket = trace(A-transpose B) on the space of matrices. It is orthogonally invariant: norm_F(U A V-transpose) = norm_F(A) for orthogonal U, V.

Spectral Norm (Operator 2-Norm)

The spectral norm (or operator norm induced by the Euclidean vector norm) measures the maximum stretching factor of A:

norm_2(A) = max over nonzero x of (norm(Ax) / norm(x)) = sigma-1 (largest singular value)

The spectral norm satisfies submultiplicativity: norm_2(AB) is at most norm_2(A) times norm_2(B), and is also orthogonally invariant. It equals 1 for orthogonal matrices.

Nuclear Norm

The nuclear norm (trace norm or Schatten 1-norm) is the sum of all singular values:

norm_star(A) = sum of sigma-i = trace(sqrt(A-transpose A))

The nuclear norm is the convex relaxation of matrix rank, playing a role in matrix completion problems (like Netflix recommendation) analogous to the L1 norm in compressed sensing. Minimizing the nuclear norm promotes low-rank solutions.

Submultiplicativity and Consistency

A matrix norm is submultiplicative if norm(AB) is at most norm(A) times norm(B). Consistent norms satisfy norm(Ax) at most norm(A) times norm(x) for compatible vector norms. Key facts:

  • The spectral norm and Frobenius norm are both submultiplicative.
  • For any square matrix: norm_2(A) is at most norm_F(A) is at most sqrt(n) times norm_2(A).
  • The spectral radius rho(A) = max of |eigenvalues| satisfies rho(A) at most norm(A) for any submultiplicative matrix norm.

Condition Number

The condition number of an invertible matrix A with respect to a norm measures the sensitivity of the linear system Ax = b to perturbations:

kappa(A) = norm(A) times norm(A-inverse); in the spectral norm, kappa(A) = sigma-1 divided by sigma-n

A large condition number means the system is ill-conditioned: small changes in b cause large changes in the solution x. A condition number near 1 indicates a well-conditioned problem. Singular matrices have infinite condition number. In double-precision arithmetic with machine epsilon near 10 to the -16, systems with condition number near 10 to the 16 lose all significant digits.
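
A quick illustration with a nearly singular 2-by-2 matrix (our own example):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])          # nearly singular, hence ill-conditioned
s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]                   # ratio of largest to smallest singular value
```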

5. Jordan Normal Form

Not every matrix is diagonalizable. When an eigenvalue has fewer independent eigenvectors than its algebraic multiplicity, the Jordan normal form provides the canonical structure, revealing the complete behavior of matrix powers and matrix exponentials.

Generalized Eigenvectors

For an eigenvalue lambda of A, the generalized eigenspace is the null space of (A minus lambda I) to the power k for sufficiently large k. A generalized eigenvector of rank k satisfies (A minus lambda I) to the k times v = 0 but (A minus lambda I) to the k-1 times v is not zero. The chain v, (A minus lambda I)v, ..., (A minus lambda I) to the k-1 v forms a Jordan chain.

Jordan Blocks and Jordan Normal Form

Jordan Block J(lambda, k)

A k-by-k matrix with lambda on the diagonal and 1 on the superdiagonal:

[lambda      1      0   ]
[  0      lambda    1   ]
[  0         0   lambda ]

The Jordan normal form theorem states: every complex square matrix A is similar to a block diagonal matrix J = diag(J1, J2, ..., Jt) where each Ji is a Jordan block. The form is unique up to reordering of blocks. If all blocks are 1-by-1, A is diagonalizable.

Minimal Polynomial

The minimal polynomial of A is the monic polynomial m(x) of smallest degree such that m(A) = 0. Key facts:

  • The minimal polynomial divides the characteristic polynomial.
  • They have the same roots (eigenvalues), but possibly different multiplicities.
  • A is diagonalizable if and only if its minimal polynomial has no repeated roots.
  • The Cayley-Hamilton theorem: every matrix satisfies its own characteristic polynomial.

Applications: Matrix Exponential

The matrix exponential e to the At, defined by the series sum of (At) to the k divided by k!, solves the ODE system dx/dt = Ax with x(0) = x0. Using Jordan form J = P inverse A P:

e to the At = P times (e to the Jt) times P-inverse

For a Jordan block J(lambda, k), the block exponential has entries involving t to the j times e to the lambda-t divided by j-factorial. This is why systems with defective eigenvalues have solutions with polynomial growth multiplied by exponentials — a hallmark of resonance phenomena in differential equations.

6. Positive Definite Matrices

Positive definite matrices are the matrix analogue of positive numbers. They arise naturally as covariance matrices in statistics, Hessians at local minima, Gram matrices of inner products, and fundamental forms in differential geometry.

Equivalent Definitions

A real symmetric n-by-n matrix A is positive definite (written A greater than 0) if and only if any of these equivalent conditions hold:

Quadratic Form

x-transpose A x is greater than 0 for all nonzero x in R-n.

Eigenvalues

All eigenvalues of A are strictly positive.

Sylvester's Criterion

All leading principal minors (determinants of k-by-k upper-left submatrices) are positive.

Cholesky

A = L L-transpose for a unique lower triangular L with positive diagonal entries.

Cholesky Decomposition

The Cholesky decomposition A = L L-transpose is the computational workhorse for positive definite systems. It requires roughly half the operations of LU decomposition and is numerically stable without pivoting. The algorithm proceeds column by column:

  • l_jj = sqrt(a_jj minus sum over k from 1 to j-1 of l_jk squared)
  • l_ij = (a_ij minus sum over k from 1 to j-1 of l_ik times l_jk) divided by l_jj, for i greater than j

If at any step the value under the square root is non-positive, A is not positive definite. This provides a fast positive definiteness test.
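
The two recurrences above, sketched directly in NumPy (the helper name cholesky_lower is ours):

```python
import numpy as np

def cholesky_lower(A):
    """Column-by-column Cholesky A = L L^T; raises if A is not positive definite."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L = np.zeros((n, n))
    for j in range(n):
        d = A[j, j] - L[j, :j] @ L[j, :j]      # value under the square root
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        L[j, j] = np.sqrt(d)
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])                # positive definite test matrix
L = cholesky_lower(A)
```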

Loewner Partial Order

On symmetric matrices, we write A greater than or equal to B (in the Loewner sense) if A minus B is positive semidefinite. This defines a partial order with useful properties:

  • If A is at least B and B is at least C, then A is at least C.
  • If A is at least B, then C A C-transpose is at least C B C-transpose for any matrix C (congruence preserves the order).
  • The set of positive semidefinite matrices forms a convex cone.

Applications in Optimization and Statistics

Second-Order Conditions

A twice-differentiable function f has a strict local minimum at point x-star if the gradient is zero and the Hessian H(x-star) is positive definite. In statistics, the covariance matrix of any random vector is positive semidefinite. Maximum likelihood estimators in curved exponential families satisfy second-order conditions involving positive definite Fisher information matrices.

7. Tensor Products

Tensor products extend linear algebra to multilinear settings, enabling the systematic study of multi-index objects. They underpin quantum mechanics, differential geometry, and modern machine learning architectures.

Bilinear Maps and Universal Property

A bilinear map from V times W to a vector space Z is a function f(v, w) that is linear in each argument separately. The tensor product V tensor W is defined (up to isomorphism) by the universal property: there exists a canonical bilinear map tau from V times W to V tensor W such that every bilinear map f from V times W to Z factors uniquely through tau.

Concretely, if V has basis e-1, ..., e-m and W has basis f-1, ..., f-n, then V tensor W has dimension mn with basis e-i tensor f-j. General elements are finite sums of pure tensors: sum of a_ij times (e-i tensor f-j). The pure tensors v tensor w span V tensor W, but unless one factor is one-dimensional, not every element is itself a pure tensor.

Tensor Products in Quantum Computing

A single qubit lives in C-squared. A system of n qubits lives in C-2 tensor C-2 tensor ... tensor C-2 (n times), which has dimension 2 to the n. The tensor product structure captures the independence of subsystems. The state |00 angle-bracket corresponds to the basis vector e-1 tensor e-1. Entangled states like (|00 angle-bracket + |11 angle-bracket) divided by sqrt(2) are not pure tensors — they cannot be written as a product of individual qubit states.

Kronecker Products

For matrices, the tensor product corresponds to the Kronecker product. For A (m-by-n) and B (p-by-q), the Kronecker product A tensor B is the mp-by-nq block matrix with block (i,j) equal to a_ij times B.

Key Properties

  • (A tensor B)(C tensor D) = (AC) tensor (BD) when dimensions are compatible
  • (A tensor B)-transpose = A-transpose tensor B-transpose
  • eigenvalues of A tensor B are products lambda-i times mu-j of eigenvalues of A and B
  • vec(AXB) = (B-transpose tensor A) vec(X), useful for matrix equations
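
np.kron implements the Kronecker product, so the transpose and vec identities can be checked directly (random shapes of our choosing; vec stacks columns, hence order='F'):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 2))
X = rng.standard_normal((3, 4))

def vec(M):
    return M.flatten(order='F')        # stack columns into one long vector

lhs = vec(A @ X @ B)                   # vec(AXB)
rhs = np.kron(B.T, A) @ vec(X)         # (B^T kron A) vec(X)
```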

Multilinear Maps and Tensors

A (p, q)-tensor on a vector space V is a multilinear map from V-star times ... times V-star (p copies) times V times ... times V (q copies) to the scalars, where V-star is the dual space. Covariant tensors (type (0, q)) generalize bilinear forms. Contravariant tensors (type (p, 0)) generalize vectors. Mixed tensors arise in Riemannian geometry, where the metric tensor is of type (0, 2).

8. Exterior Algebra and Differential Forms

Exterior algebra provides the algebraic framework for signed volume, determinants, and the calculus of differential forms. The wedge product captures oriented area and volume in arbitrary dimensions.

The Wedge Product

For a vector space V, the exterior algebra (also called the Grassmann algebra) is built from the wedge product, a skew-symmetric bilinear operation satisfying:

  • v wedge v = 0 for all v (implies v wedge w = minus w wedge v)
  • (av + bw) wedge u = a(v wedge u) + b(w wedge u) (linearity)
  • (u wedge v) wedge w = u wedge (v wedge w) (associativity)

The space of k-vectors (k-th exterior power, written Lambda-k(V)) has dimension n choose k when V has dimension n. If e-1, ..., e-n is a basis for V, the basis of Lambda-k(V) consists of all wedge products e-i1 wedge ... wedge e-ik with i1 less than ... less than ik.

Determinant as Top Exterior Power

The top exterior power Lambda-n(V) for an n-dimensional space V is one-dimensional. A linear map T: V to V induces a map on Lambda-n(V), which must be multiplication by a scalar. That scalar is precisely the determinant of T:

T(v1) wedge T(v2) wedge ... wedge T(vn) = det(T) times (v1 wedge v2 wedge ... wedge vn)

This gives a coordinate-free definition of the determinant. Geometrically, |det(T)| measures the factor by which T scales n-dimensional volume, and the sign records orientation reversal.

Differential Forms

A differential k-form on a smooth manifold M assigns to each point p a skew-symmetric k-linear functional on the tangent space. Locally, a k-form omega can be written as:

omega = sum over i1 less than ... less than ik of f_(i1...ik)(x) dx-i1 wedge ... wedge dx-ik

The exterior derivative d sends k-forms to (k+1)-forms satisfying d squared = 0. Stokes' theorem unifies Green's, divergence, and classical Stokes' theorems: integral over M of d-omega = integral over boundary of M of omega.

Hodge Star and de Rham Cohomology

On a Riemannian manifold, the Hodge star operator maps k-forms to (n-k)-forms, providing a duality between forms of complementary degree. De Rham cohomology H-k(M) = ker(d on k-forms) divided by image(d on (k-1)-forms) measures the topological holes in M that prevent closed forms from being exact.

9. Lie Algebras

Lie algebras are the infinitesimal version of Lie groups — smooth manifolds that are also groups. They linearize the study of continuous symmetry and arise throughout physics, geometry, and representation theory.

Definition and Lie Bracket

A Lie algebra g is a vector space equipped with a bilinear operation [X, Y] called the Lie bracket satisfying:

  • Anti-symmetry: [X, Y] = minus [Y, X]
  • Jacobi identity: [X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0
  • Bilinearity: [aX + bY, Z] = a[X, Z] + b[Y, Z]

Classical Matrix Lie Algebras

gl(n, R)

All n-by-n real matrices. The Lie bracket is the commutator [A, B] = AB minus BA. Dimension is n squared.

sl(n, R)

Traceless n-by-n matrices. Dimension is n squared minus 1. Lie algebra of SL(n), the special linear group.

so(n)

Skew-symmetric matrices (A-transpose = minus A). Dimension is n(n-1)/2. Lie algebra of the rotation group SO(n).

sp(2n, R)

Matrices satisfying A-transpose J + J A = 0 where J is the standard symplectic form. Dimension is n(2n+1). Appears in Hamiltonian mechanics.

Adjoint Representation

Each element X in a Lie algebra g defines a linear map ad(X): g to g by ad(X)(Y) = [X, Y]. This is the adjoint representation of g on itself. The Jacobi identity ensures ad([X, Y]) = ad(X) ad(Y) minus ad(Y) ad(X), making ad: g to gl(g) a Lie algebra homomorphism.

Killing Form and Semisimplicity

The Killing form B(X, Y) = trace(ad(X) composed with ad(Y)) is a symmetric bilinear form on g. Cartan's criterion states: g is semisimple if and only if the Killing form is non-degenerate. For sl(2, R), the Killing form is B(X, Y) = 4 trace(XY). Semisimple Lie algebras over the complex numbers are completely classified by Dynkin diagrams: A-n, B-n, C-n, D-n, and the five exceptional algebras.

Exponential Map

The matrix exponential exp: gl(n) to GL(n) maps the Lie algebra to the Lie group. For skew-symmetric matrices X in so(3), exp(X) is a rotation matrix. For X in sl(2), exp(X) is in SL(2). The Baker-Campbell-Hausdorff formula expresses log(exp(X) exp(Y)) as a series in X, Y and their nested commutators, connecting multiplication in the group to the Lie bracket in the algebra.

10. Numerical Linear Algebra

Numerical linear algebra studies how to compute matrix factorizations and solve linear systems accurately and efficiently in finite-precision arithmetic. Understanding floating-point errors and algorithmic stability is essential for reliable computation.

LU Decomposition with Partial Pivoting

Every invertible matrix A can be factored as PA = LU, where P is a permutation matrix, L is unit lower triangular (ones on diagonal), and U is upper triangular. Partial pivoting swaps rows at each step to put the largest entry in the current column on the diagonal, controlling element growth and ensuring numerical stability.

Cost and Stability

LU with partial pivoting costs approximately (2/3) n cubed flops for an n-by-n matrix. For solving Ax = b: forward substitution on Ly = Pb (O(n squared) flops), then backward substitution on Ux = y (O(n squared) flops). For symmetric positive definite A, Cholesky (no pivoting needed) is faster and more stable.
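
A compact (unoptimized) sketch of LU with partial pivoting; the helper name plu is ours:

```python
import numpy as np

def plu(A):
    """LU with partial pivoting: returns P, L, U with P @ A = L @ U."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    U = A.copy()
    L = np.eye(n)
    P = np.eye(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(U[k:, k]))    # row with largest pivot candidate
        if p != k:                             # swap rows in U, P, finished part of L
            U[[k, p], k:] = U[[p, k], k:]
            P[[k, p]] = P[[p, k]]
            L[[k, p], :k] = L[[p, k], :k]
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]        # multiplier
            U[i, k:] -= L[i, k] * U[k, k:]     # eliminate below the pivot
    return P, L, U

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
P, L, U = plu(A)
```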

QR Decomposition

Every m-by-n matrix A (m at least n) factors as A = QR where Q is m-by-m orthogonal and R is m-by-n upper triangular. The thin QR gives A = Q1 R1 where Q1 is m-by-n and R1 is n-by-n. QR decomposition is computed via Householder reflections or Givens rotations, and is the basis for the QR algorithm for eigenvalues.

Householder Reflections

A Householder reflector H = I minus 2vv-transpose divided by (v-transpose v) is orthogonal and symmetric. Choosing v appropriately zeros out entries below the diagonal in each column, reducing A to upper triangular form after n-1 reflections for a square n-by-n matrix (n reflections when m is greater than n). This approach is more stable than Gram-Schmidt for QR because it maintains orthogonality to machine precision.
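
A sketch of Householder QR in NumPy (the helper name is ours; the sign choice in v avoids cancellation):

```python
import numpy as np

def householder_qr(A):
    """QR via Householder reflections; returns Q (m x m) and R (m x n)."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    R = A.copy()
    Q = np.eye(m)
    for k in range(min(m - 1, n)):
        x = R[k:, k]
        v = x.copy()
        v[0] += np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
        if np.linalg.norm(v) == 0:
            continue                                    # column already zeroed
        v /= np.linalg.norm(v)
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])   # apply H = I - 2vv^T
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)     # accumulate Q = H_1 H_2 ...
    return Q, R

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
Q, R = householder_qr(A)
```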

Floating-Point Arithmetic and Backward Stability

An algorithm is backward stable if the computed result is the exact result for a slightly perturbed input. For solving Ax = b via LU with partial pivoting, the computed solution satisfies (A + delta-A) x-hat = b, where the norm of delta-A is at most machine-epsilon times the norm of A (up to a small growth factor). Forward error analysis then bounds the norm of x-hat minus x using the condition number:

relative error of x at most kappa(A) times (machine epsilon times growth factor)

This explains why ill-conditioned systems lose precision: the condition number amplifies any small perturbation.

Iterative Methods for Large Systems

For large sparse systems where direct factorization is impractical, iterative methods are used:

  • Conjugate Gradient (CG): For symmetric positive definite A, converges in at most n steps in exact arithmetic, and far fewer in practice. Minimizes the A-norm of the error over Krylov subspaces.
  • GMRES: For general (possibly non-symmetric) A, minimizes the residual norm over the Krylov subspace K-m = span(b, Ab, ..., A to the m-1 b).
  • Preconditioning: Replace Ax = b with P-inverse Ax = P-inverse b where P approximates A, reducing the condition number and accelerating convergence.
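
A plain (unpreconditioned) conjugate gradient sketch, with a random SPD test matrix of our choosing:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """CG for symmetric positive definite A (plain, unpreconditioned)."""
    n = len(b)
    max_iter = max_iter or n
    x = np.zeros(n)
    r = b - A @ x                       # initial residual
    p = r.copy()                        # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)           # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p       # A-conjugate direction update
        rs = rs_new
    return x

rng = np.random.default_rng(4)
M = rng.standard_normal((20, 20))
A = M @ M.T + 20 * np.eye(20)           # symmetric positive definite
b = rng.standard_normal(20)
x = conjugate_gradient(A, b)
```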

11. Randomized Linear Algebra

Randomized algorithms have transformed large-scale linear algebra by trading a small amount of accuracy for dramatic computational savings. These methods are provably near-optimal and have become standard practice in machine learning and data science.

Johnson-Lindenstrauss Lemma

A cornerstone result: given m points in R-d and epsilon greater than 0, there exists a linear map f: R-d to R-k with k at most O(log(m) / epsilon squared) such that for every pair of points u, v:

(1 minus epsilon) times norm(u minus v) squared at most norm(f(u) minus f(v)) squared at most (1 plus epsilon) times norm(u minus v) squared

A Gaussian random matrix scaled by 1 over sqrt(k) achieves this. The remarkable fact: the target dimension k depends only on the number of points m and the error tolerance, not on the original dimension d.

Randomized SVD

The Halko-Martinsson-Tropp algorithm computes a near-optimal rank-k approximation to a large matrix A:

  1. Step 1: Draw a Gaussian random matrix Omega (n-by-(k+p)) for small oversampling parameter p (typically 5-10).
  2. Step 2: Form the sketch Y = A Omega (m-by-(k+p)), possibly with power iteration: Y = (A A-transpose) to the q power times A Omega.
  3. Step 3: Compute QR decomposition of Y to get orthonormal Q (m-by-(k+p)).
  4. Step 4: Form B = Q-transpose A (small (k+p)-by-n matrix).
  5. Step 5: Compute SVD of B = U-hat Sigma V-transpose; set U = Q U-hat.

Total cost: O(mn(k+p)) for a dense matrix, much less for sparse. Power iteration with q steps improves accuracy by sharpening the gap between retained and discarded singular values.
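
The five steps translate almost line-for-line into NumPy; the function name and the exactly-rank-8 test matrix are our choices:

```python
import numpy as np

def randomized_svd(A, k, p=5, q=1, seed=0):
    """Randomized rank-k SVD following the Halko-Martinsson-Tropp recipe."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))               # Step 1: Gaussian test matrix
    Y = A @ Omega                                         # Step 2: sketch the range of A
    for _ in range(q):                                    # optional power iteration
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                                # Step 3: orthonormal basis
    B = Q.T @ A                                           # Step 4: small (k+p)-by-n matrix
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)  # Step 5: SVD of B
    return (Q @ U_hat)[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(5)
A = rng.standard_normal((100, 8)) @ rng.standard_normal((8, 60))  # exact rank 8
U, s, Vt = randomized_svd(A, k=8)
s_true = np.linalg.svd(A, compute_uv=False)[:8]
```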

Sketching and Randomized Least Squares

For overdetermined systems Ax = b with A of size m-by-n (m much larger than n), randomized sketching forms a compressed system SA x equals Sb where S is a k-by-m sketching matrix (k of order n log n), then solves the small system. The sketch-and-solve approach gives a (1 plus epsilon) approximation to the least-squares objective. Alternatively, sketching can be used for preconditioning, forming a preconditioner from a sketch and solving the original system iteratively.

Random Projections for Compressed Sensing

Compressed sensing (Candes, Romberg, Tao; Donoho) shows that sparse signals can be recovered from far fewer measurements than the Shannon-Nyquist theorem suggests. If x in R-n has at most s non-zero entries and A is an m-by-n Gaussian random matrix with m at least O(s log(n/s)), then x can be exactly recovered from b = Ax by solving the convex L1-minimization problem: minimize the L1 norm of z subject to Az = b.

Restricted Isometry Property (RIP)

A matrix A satisfies the RIP of order s if for all s-sparse vectors x, (1 minus delta) times norm(x) squared at most norm(Ax) squared at most (1 plus delta) times norm(x) squared. Random matrices satisfy RIP with high probability, guaranteeing stable sparse recovery via L1 minimization.

12. Applications

Advanced linear algebra is the engine behind modern machine learning, quantum physics, and control engineering. These applications transform theoretical results into practical computational tools.

Principal Component Analysis (PCA)

PCA finds the directions of greatest variance in a dataset. Given data matrix X (n samples, d features, centered), the covariance matrix is S = X-transpose X divided by (n minus 1). The spectral theorem guarantees S = Q D Q-transpose. The principal components are the columns of Q, ordered by decreasing eigenvalue, and the projection onto the top k components is the best rank-k approximation to the data.

PCA via SVD

Computing SVD of X = U Sigma V-transpose directly gives the principal directions as columns of V and principal scores as columns of U Sigma. This is numerically preferable to computing S = X-transpose X explicitly, as squaring the matrix squares the condition number.
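
A small PCA-via-SVD sketch on synthetic data (the column scales and seed are our choices):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((200, 3)) * np.array([5.0, 1.0, 0.2])  # anisotropic cloud
Xc = X - X.mean(axis=0)                    # center the data

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                            # rows are principal directions
scores = U * s                             # principal scores, i.e. U Sigma
explained_var = s**2 / (len(X) - 1)        # eigenvalues of the covariance matrix

# Agrees with the eigendecomposition of S = X^T X / (n - 1)
S = Xc.T @ Xc / (len(X) - 1)
eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]
```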

Least Squares and the Normal Equations

The overdetermined system Ax = b (m equations, n unknowns, m at least n, A full column rank) has least-squares solution:

x-star = (A-transpose A) inverse A-transpose b

Numerically, solving via QR decomposition (A = Q1 R1, then R1 x = Q1-transpose b) is more stable than forming the normal equations A-transpose Ax = A-transpose b, because QR avoids squaring the condition number. With Tikhonov regularization, minimize norm(Ax minus b) squared plus lambda times norm(x) squared, giving (A-transpose A plus lambda I) x = A-transpose b.

Quantum Mechanics

In quantum mechanics, the state of a physical system is a unit vector in a Hilbert space (often L-squared or C-n for finite systems). Observables are Hermitian operators, and the spectral theorem guarantees real measurement outcomes (eigenvalues). Measuring an observable A on state |psi angle-bracket gives eigenvalue lambda-k with probability |angle-bracket phi-k, psi angle-bracket| squared, where phi-k are the orthonormal eigenvectors.

  • The Schrodinger equation: i hbar d|psi angle-bracket/dt = H|psi angle-bracket, where H is the Hermitian Hamiltonian.
  • The time evolution operator e to the (-iHt/hbar) is unitary by the spectral theorem.
  • Entanglement corresponds to non-separability in the tensor product structure of multi-particle Hilbert spaces.

Control Theory: Stability via Eigenvalues

A linear time-invariant system dx/dt = Ax is stable (all solutions decay to zero) if and only if all eigenvalues of A have strictly negative real parts. Lyapunov's stability theorem reformulates this: the system is stable if and only if for every positive definite Q, there exists a unique positive definite P satisfying the Lyapunov equation A-transpose P + PA = minus Q. The function V(x) = x-transpose Px is a Lyapunov function (an energy-like function that decreases along trajectories).
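
For small systems, the Lyapunov equation can be solved directly with the vec/Kronecker identity vec(AXB) = (B-transpose tensor A) vec(X) from the tensor products section; the helper name and test matrix are ours:

```python
import numpy as np

def solve_lyapunov(A, Q):
    """Solve A^T P + P A = -Q via vec/Kronecker (dense, small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    # vec(A^T P) = (I kron A^T) vec(P); vec(P A) = (A^T kron I) vec(P)
    K = np.kron(I, A.T) + np.kron(A.T, I)
    vecP = np.linalg.solve(K, -Q.flatten(order='F'))
    return vecP.reshape((n, n), order='F')

A = np.array([[-1.0, 2.0],
              [0.0, -3.0]])              # eigenvalues -1 and -3: stable
Q = np.eye(2)
P = solve_lyapunov(A, Q)
```

Since A is stable and Q is positive definite, the resulting P is symmetric positive definite, confirming stability.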

Recommender Systems and Matrix Completion

The Netflix problem: given a partially observed matrix of user-movie ratings, complete the missing entries by finding the minimum nuclear norm matrix consistent with the observations. Under standard incoherence conditions on the true low-rank matrix and sufficiently many uniformly sampled entries, nuclear norm minimization exactly recovers the rating matrix with high probability. Alternating minimization and stochastic gradient descent on factored representations A approx U V-transpose are practical alternatives.
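A minimal alternating-minimization sketch for the factored formulation; the sizes, the rank r, the damping lam, and the 50-iteration budget are illustrative assumptions, not tuned values. Each pass solves small ridge regressions over the observed entries only:

```python
import numpy as np

rng = np.random.default_rng(2)
# Ground-truth rank-2 "ratings" matrix, with roughly half the entries observed
M = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
mask = rng.random(M.shape) < 0.5

r, lam = 2, 1e-3                      # target rank and damping (illustrative)
U = rng.normal(size=(30, r))
V = rng.normal(size=(20, r))

for _ in range(50):                   # alternating ridge regressions
    for i in range(M.shape[0]):       # fix V, solve for each row of U
        obs = mask[i]
        G = V[obs]
        U[i] = np.linalg.solve(G.T @ G + lam * np.eye(r), G.T @ M[i, obs])
    for j in range(M.shape[1]):       # fix U, solve for each row of V
        obs = mask[:, j]
        G = U[obs]
        V[j] = np.linalg.solve(G.T @ G + lam * np.eye(r), G.T @ M[obs, j])

rel_err = np.linalg.norm(U @ V.T - M) / np.linalg.norm(M)
```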

13. Practice Problems with Solutions

Work through these problems to test your understanding of advanced linear algebra.

Problem 1: Gram-Schmidt Orthogonalization

Apply Gram-Schmidt to the vectors v1 = (1, 1, 0), v2 = (1, 0, 1), v3 = (0, 1, 1) in R-3 with the standard dot product. Find an orthonormal basis for R-3.

Solution

  • Step 1: u1 = v1 = (1,1,0). norm(u1) = sqrt(2). e1 = (1/sqrt(2), 1/sqrt(2), 0).
  • Step 2: Projection of v2 onto e1: angle-bracket v2, e1 angle-bracket = 1/sqrt(2). u2 = v2 minus (1/sqrt(2)) times e1 = (1,0,1) minus (1/2, 1/2, 0) = (1/2, -1/2, 1). norm(u2) = sqrt(1/4 + 1/4 + 1) = sqrt(3/2). e2 = (1/sqrt(6), -1/sqrt(6), 2/sqrt(6)).
  • Step 3: Subtract projections of v3 onto e1 and e2. angle-bracket v3, e1 angle-bracket = 1/sqrt(2). angle-bracket v3, e2 angle-bracket = (-1/sqrt(6) + 2/sqrt(6)) = 1/sqrt(6). u3 = (0,1,1) minus (1/2, 1/2, 0) minus (1/6, -1/6, 2/6) = (-2/3, 2/3, 2/3). After normalizing: e3 = (-1/sqrt(3), 1/sqrt(3), 1/sqrt(3)).
  • Verification: Check that each pair of e-i has zero dot product and each e-i has unit norm.
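The worked steps can be reproduced in a few lines of numpy; the loop below is classical Gram-Schmidt applied to the rows of V:

```python
import numpy as np

V = np.array([[1.0, 1.0, 0.0],     # v1, v2, v3 as rows
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

E = np.zeros_like(V)
for k in range(3):
    # subtract projections onto the previously computed e_j, then normalize
    u = V[k] - sum(np.dot(V[k], E[j]) * E[j] for j in range(k))
    E[k] = u / np.linalg.norm(u)
```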
Problem 2: Spectral Decomposition

Find the spectral decomposition A = Q D Q-transpose for the symmetric matrix A with entries: A(1,1) = 3, A(1,2) = A(2,1) = 1, A(2,2) = 3. Compute e to the A using the spectral decomposition.

Solution

  • Eigenvalues: det(A minus lambda I) = (3 minus lambda) squared minus 1 = 0. So (3 minus lambda) = plus or minus 1, giving lambda-1 = 2, lambda-2 = 4.
  • Eigenvectors: For lambda = 2: (A minus 2I)v = 0 gives v = (1,-1)/sqrt(2). For lambda = 4: v = (1,1)/sqrt(2).
  • Decomposition: A = 2 times q1 q1-transpose + 4 times q2 q2-transpose, where q1 = (1,-1)/sqrt(2) and q2 = (1,1)/sqrt(2).
  • Matrix exponential: e to the A = e squared times q1 q1-transpose + e to the 4 times q2 q2-transpose. In matrix form: e to the A has entry (1,1) = (e squared + e to the 4)/2, entry (1,2) = entry (2,1) = (e to the 4 minus e squared)/2, entry (2,2) = (e squared + e to the 4)/2.
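A numerical cross-check of the spectral route against scipy's general-purpose matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
w, Q = np.linalg.eigh(A)               # w = [2, 4], Q orthogonal

expA = Q @ np.diag(np.exp(w)) @ Q.T    # e^A via the spectral decomposition
```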
Problem 3: Computing SVD and Pseudoinverse

Find the SVD of the 2-by-3 matrix A with rows (1, 0, 0) and (0, 1, 1). Then compute the pseudoinverse A-plus and verify A A-plus A = A.

Solution

  • A-transpose A: 3-by-3 matrix with entries (1,0,0),(0,1,1),(0,1,1). Eigenvalues are 1, 2, 0. Corresponding eigenvectors: (1,0,0), (0,1,1)/sqrt(2), (0,1,-1)/sqrt(2).
  • Singular values: sigma-1 = sqrt(2), sigma-2 = 1. Rank is 2.
  • Right singular vectors (V): v1 = (0,1,1)/sqrt(2), v2 = (1,0,0), v3 = (0,1,-1)/sqrt(2).
  • Left singular vectors (U): u1 = Av1/sigma-1 = (0,1), u2 = Av2/sigma-2 = (1,0).
  • Pseudoinverse: A-plus = V Sigma-plus U-transpose, where Sigma-plus has entries 1/sigma-1 and 1/sigma-2 in the first two diagonal positions. Carrying out the product, A-plus is the 3-by-2 matrix with rows (1, 0), (0, 1/2), (0, 1/2). Verification: A A-plus equals the 2-by-2 identity, so A A-plus A = A.
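numpy computes the pseudoinverse via the SVD internally, so np.linalg.pinv gives an independent check of the hand computation:

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])

A_pinv = np.linalg.pinv(A)     # computed internally from the SVD of A
s = np.linalg.svd(A, compute_uv=False)   # singular values
```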
Problem 4: Jordan Normal Form

Find the Jordan normal form of the 3-by-3 matrix A with characteristic polynomial (lambda minus 2) squared times (lambda minus 3). Suppose A minus 2I has rank 1. Determine the Jordan block structure.

Solution

  • Eigenvalues: lambda = 2 (algebraic multiplicity 2) and lambda = 3 (algebraic multiplicity 1).
  • Geometric multiplicity at lambda = 2: geometric multiplicity = nullity(A minus 2I) = 3 minus rank(A minus 2I) = 3 minus 1 = 2, which equals the algebraic multiplicity. This means lambda = 2 contributes two 1-by-1 Jordan blocks (it is semisimple).
  • Jordan form: J = diag(2, 2, 3). The matrix is actually diagonalizable despite the repeated eigenvalue, because the geometric and algebraic multiplicities both equal 2.
  • Contrast: If rank(A minus 2I) = 2, geometric multiplicity = 1, and we would have one 2-by-2 Jordan block for lambda = 2.
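sympy can confirm the block structure exactly. The matrix below is one concrete example (chosen for illustration, not the problem's unknown A) with characteristic polynomial (lambda minus 2) squared times (lambda minus 3) and rank(A minus 2I) = 1:

```python
import sympy as sp

# Build a concrete A similar to diag(2, 2, 3) via an invertible P
P = sp.Matrix([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
A = P * sp.diag(2, 2, 3) * P.inv()

rank = (A - 2 * sp.eye(3)).rank()      # expected: 1
Pj, J = A.jordan_form()                # J is the Jordan normal form
```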
Problem 5: Positive Definiteness and Cholesky

Determine whether A with rows (4, 2) and (2, 3) is positive definite. If so, compute the Cholesky decomposition A = L L-transpose.

Solution

  • Sylvester criterion: First leading principal minor: det(4) = 4 greater than 0. Second: det(A) = 4 times 3 minus 2 times 2 = 12 minus 4 = 8 greater than 0. Both positive, so A is positive definite.
  • Cholesky computation: l_11 = sqrt(a_11) = sqrt(4) = 2. l_21 = a_21 / l_11 = 2/2 = 1. l_22 = sqrt(a_22 minus l_21 squared) = sqrt(3 minus 1) = sqrt(2).
  • Result: L has rows (2, 0) and (1, sqrt(2)). Verify: L times L-transpose = (2,0;1,sqrt(2)) times (2,1;0,sqrt(2)) = (4,2;2,1+2) = (4,2;2,3) = A. Confirmed.
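np.linalg.cholesky reproduces the hand computation (numpy returns the lower-triangular factor):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

L = np.linalg.cholesky(A)      # lower-triangular L with A = L L^T
```

If A were not positive definite, np.linalg.cholesky would raise LinAlgError, so the factorization doubles as a positive definiteness test.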
Problem 6: Tensor Product and Kronecker Product

Compute the Kronecker product of A = (1, 2; 3, 4) (2-by-2) and B = I (the 2-by-2 identity). Verify the eigenvalue formula: eigenvalues of A tensor I are products of eigenvalues of A and eigenvalues of I.

Solution

  • Kronecker product: A tensor I replaces each entry a_ij of A by the 2-by-2 block a_ij times I, giving the 4-by-4 matrix [1*I, 2*I; 3*I, 4*I] = [(1,0,2,0),(0,1,0,2),(3,0,4,0),(0,3,0,4)]. (By contrast, I tensor A would be the block diagonal matrix diag(A, A).)
  • Eigenvalues of A: Characteristic polynomial is (lambda minus 1)(lambda minus 4) minus 6 = lambda squared minus 5 lambda minus 2 = 0. lambda = (5 plus or minus sqrt(33))/2.
  • Eigenvalues of I: Both equal 1.
  • Eigenvalues of A tensor I: Products lambda-i times 1 = lambda-i. Each eigenvalue of A appears twice (with multiplicity 2). So the 4 eigenvalues are (5 plus sqrt(33))/2 (twice) and (5 minus sqrt(33))/2 (twice). This matches the characteristic polynomial of the 4-by-4 matrix.
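np.kron makes both the block layout and the eigenvalue multiplication rule easy to confirm:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
I = np.eye(2)

K = np.kron(A, I)              # 4x4: each a_ij replaced by a_ij * I

eig_A = np.linalg.eigvals(A)   # (5 +/- sqrt(33)) / 2
eig_K = np.linalg.eigvals(K)   # each eigenvalue of A, twice
```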
Problem 7: Condition Number and Numerical Sensitivity

The Hilbert matrix H of order n has entries H_ij = 1/(i+j-1). For n = 3, H has entries H_11 = 1, H_12 = 1/2, H_13 = 1/3, H_22 = 1/3, etc. Explain why solving Hx = b is numerically challenging and what strategies mitigate the issue.

Solution

  • Condition number: The 3-by-3 Hilbert matrix has condition number approximately 524. For n = 10, the condition number exceeds 10 to the 13, meaning nearly all significant digits are lost in double precision.
  • Why Hilbert matrices are ill-conditioned: The rows become nearly linearly dependent because 1/(i+j-1) varies slowly. The smallest singular value decays exponentially with n.
  • Mitigation strategies: (1) Use extended precision arithmetic (quad precision or arbitrary precision libraries). (2) Tikhonov regularization: replace H by H plus lambda I for small lambda, trading bias for stability. (3) If the right-hand side b is structured, exploit it analytically. (4) Preconditioning with diagonal scalings can modestly help but cannot fix fundamental ill-conditioning. (5) Reformulate the problem: if H arises from polynomial fitting, switch to an orthogonal polynomial basis (Legendre, Chebyshev) which gives well-conditioned Gram matrices.
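scipy ships the Hilbert matrix, so the condition numbers quoted above are easy to reproduce:

```python
import numpy as np
from scipy.linalg import hilbert

H3 = hilbert(3)                  # entries 1/(i+j-1)
cond3 = np.linalg.cond(H3)       # about 524

H10 = hilbert(10)
cond10 = np.linalg.cond(H10)     # exceeds 1e13: double precision is exhausted
```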

Quick Reference: Key Theorems and Formulas

Spectral Theorem

  • Real symmetric A: A = Q D Q-transpose, eigenvalues real
  • Complex Hermitian A: A = U D U-star, eigenvalues real
  • Normal A: unitarily diagonalizable

SVD

  • A = U Sigma V-transpose for any A
  • Pseudoinverse: A-plus = V Sigma-plus U-transpose
  • Best rank-k approx: Eckart-Young theorem

Positive Definite Tests

  • All eigenvalues positive
  • Sylvester: all leading minors positive
  • Cholesky decomposition exists
  • x-transpose A x greater than 0 for all nonzero x

Matrix Norms

  • Frobenius: sqrt(trace(A-transpose A)) = sqrt(sum sigma-i squared)
  • Spectral: sigma-1 (max singular value)
  • Nuclear: sum of sigma-i
  • Condition number: sigma-1 / sigma-r

Jordan Form

  • Every complex matrix similar to Jordan form
  • Diagonalizable iff minimal poly has no repeated roots
  • Cayley-Hamilton: p(A) = 0 for characteristic poly p

Gram-Schmidt

  • uk = vk minus sum of projections onto previous e-j
  • ek = uk / norm(uk)
  • Numerically: use modified Gram-Schmidt
  • Equivalent to QR decomposition of column matrix

Frequently Asked Questions

When is eigendecomposition possible versus when must you use Jordan form?

Eigendecomposition A = P D P-inverse (with D diagonal) is possible exactly when A has n linearly independent eigenvectors. This happens when all eigenvalues are distinct, or more generally when for each eigenvalue the algebraic and geometric multiplicities agree (equivalently, the minimal polynomial has no repeated roots). When an eigenvalue has geometric multiplicity strictly less than its algebraic multiplicity (a defective eigenvalue), the matrix is not diagonalizable and Jordan form is the canonical substitute.

Why is SVD preferred over eigendecomposition for general matrices?

Eigendecomposition requires a square matrix and may involve complex eigenvalues even for real matrices. SVD applies to any m-by-n matrix, always produces real non-negative singular values, and uses orthogonal (numerically stable) transformations. SVD is backward stable and well-suited to ill-conditioned matrices. The singular values are more numerically meaningful than eigenvalues for non-symmetric matrices: for instance, the eigenvalues of a matrix with large off-diagonal entries can be wildly sensitive to perturbations while singular values are always stable.

What is the difference between a tensor and a matrix?

A matrix is a 2-index array representing a bilinear form or linear map. A tensor generalizes this to any number of indices: a (p,q)-tensor has p contravariant and q covariant indices. Crucially, the tensor concept is coordinate-independent and transforms according to specific rules when changing bases. A matrix is a tensor of type (1,1) when it represents a linear map, but as a bilinear form it is type (0,2). In machine learning, the term "tensor" often just means a multi-dimensional array, which is a coordinate-dependent notion without the transformation rules.

How does the exterior algebra relate to the determinant?

The determinant is the unique alternating multilinear form on the columns of a matrix that equals 1 on the identity. The top exterior power Lambda-n(V) is one-dimensional, so any linear map V to V induces multiplication by a scalar on Lambda-n(V) — that scalar is the determinant. This gives a coordinate-free definition and explains why det(AB) = det(A)det(B): composing linear maps composes the induced scalar multiplications. It also explains why the determinant changes sign when two rows are swapped (transpositions in the symmetric group act by the sign character on the top exterior power).

Related Topics