Measure Theory: Complete Study Guide

Measure theory provides the rigorous mathematical foundation for integration, probability, and functional analysis. This guide covers sigma-algebras, the Lebesgue integral, convergence theorems, L^p spaces, and the measure-theoretic foundations of probability — from first definitions through graduate-level theorems.

Table of Contents

  1. Sigma-Algebras and Measurable Spaces
  2. Measures: Definitions and Basic Properties
  3. Lebesgue Measure on R^n
  4. Measurable Functions
  5. The Lebesgue Integral
  6. Convergence Theorems
  7. L^p Spaces
  8. Product Measures and Fubini-Tonelli
  9. Signed Measures and Radon-Nikodym
  10. Modes of Convergence
  11. Egorov's Theorem and Lusin's Theorem
  12. Probability as Measure Theory
  13. Frequently Asked Questions

1. Sigma-Algebras and Measurable Spaces

The starting point of measure theory is understanding which sets can be assigned a size. The naive hope that every subset of R can be consistently measured turns out to be false — the Axiom of Choice allows the construction of so-called non-measurable sets (such as the Vitali set). The solution is to restrict attention to a structured collection of sets called a sigma-algebra.

Definition: Sigma-Algebra

A sigma-algebra (also written as sigma-field) on a set X is a collection M of subsets of X satisfying three axioms:

Axioms of a Sigma-Algebra M on X

  • (SA1) Empty set: the empty set is in M
  • (SA2) Closure under complements: if E is in M, then X minus E is in M
  • (SA3) Closure under countable unions: if E_1, E_2, ... are in M, then their union is in M

From these axioms, one derives that X itself is in M (as the complement of the empty set), and that M is closed under countable intersections (by De Morgan's laws), set differences, and all finite unions and intersections. The pair (X, M) is called a measurable space, and sets in M are called measurable sets.

Examples of Sigma-Algebras

The two extreme examples are the trivial sigma-algebra (containing only the empty set and X itself) and the power set 2^X (containing every subset of X). Between these extremes lie the sigma-algebras of greatest practical importance.

The Borel Sigma-Algebra on R

The Borel sigma-algebra B(R) is the smallest sigma-algebra on R that contains every open set. Equivalently, it is generated by any of the following families: all open intervals (a, b), all closed intervals [a, b], all half-open intervals (a, b], or all half-lines (-infinity, a]. Borel sets include all open sets, all closed sets, all countable unions of closed sets (F-sigma sets), and all countable intersections of open sets (G-delta sets). Essentially every set encountered in elementary analysis is a Borel set.

The Lebesgue Sigma-Algebra on R

The Lebesgue sigma-algebra L is strictly larger than B(R). It is the completion of B(R) with respect to Lebesgue measure: every subset of a Borel set of measure zero is declared measurable (with measure zero). Completeness matters because many analytic arguments modify a function or a set on a null set and need the result to remain measurable. The inclusion is strict: there exist Lebesgue measurable sets that are not Borel sets, although exhibiting one takes some work. A cardinality argument shows they must exist: every subset of the Cantor set is Lebesgue measurable, so there are more Lebesgue measurable sets than Borel sets.

Generated Sigma-Algebras

Given any collection E of subsets of X, the sigma-algebra generated by E — written sigma(E) — is the smallest sigma-algebra containing E. It exists because the intersection of any family of sigma-algebras is again a sigma-algebra, so one can intersect all sigma-algebras containing E. This construction is used to define B(R) as sigma(open sets) and is essential for defining product sigma-algebras and sigma-algebras generated by random variables.
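On a finite set the "smallest sigma-algebra containing E" can be computed directly, which may make the definition more concrete. The sketch below is only an illustration under that finiteness assumption (countable unions reduce to finite ones); the function name generate_sigma_algebra is hypothetical.

```python
from itertools import combinations

def generate_sigma_algebra(X, generators):
    """Smallest family containing `generators` that is closed under
    complement and union; on a finite X this equals sigma(generators)."""
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(g) for g in generators}
    changed = True
    while changed:                      # repeat until nothing new appears
        changed = False
        current = list(family)
        for A in current:               # close under complements
            comp = X - A
            if comp not in family:
                family.add(comp)
                changed = True
        for A, B in combinations(current, 2):   # close under pairwise unions
            union = A | B
            if union not in family:
                family.add(union)
                changed = True
    return family

# Generators {1} and {1, 2} split {1, 2, 3, 4} into the atoms {1}, {2}, {3, 4}.
sigma = generate_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
```

With three atoms the generated sigma-algebra has 2^3 = 8 members; the set {3} is not among them because the generators never separate 3 from 4.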

Non-Measurable Sets (Vitali Construction)

Assuming the Axiom of Choice, one constructs a Vitali set V as follows: define an equivalence relation on [0,1] by x ~ y iff x minus y is rational. By AC, choose one representative from each equivalence class to form V. Suppose V were measurable with measure m. The translates V + q for rational q in [-1,1] are pairwise disjoint (two points of V that differ by a rational must coincide), and their countable union contains [0,1] and is contained in [-1,2]. By translation invariance each translate has measure m, so countable additivity and monotonicity give 1 at most the sum of countably infinitely many copies of m at most 3. If m = 0 the sum is 0, which is less than 1; if m is greater than 0 the sum is infinite, which exceeds 3. Either way we reach a contradiction. Hence V is not measurable.

2. Measures: Definitions and Basic Properties

A measure assigns a non-negative extended real number to each measurable set, generalizing notions of length, area, volume, counting, and probability.

Definition: Measure

Measure Space Axioms

A measure on a measurable space (X, M) is a function mu: M to [0, +infinity] satisfying:

  • (M1) Non-negativity: mu(E) is at least 0 for all E in M
  • (M2) Null empty set: mu(empty set) = 0
  • (M3) Countable additivity: if E_1, E_2, ... are pairwise disjoint sets in M, then mu(union of E_n) = sum of mu(E_n)

The triple (X, M, mu) is a measure space. The countable additivity axiom (also called sigma-additivity) is far stronger than finite additivity and is what makes the theory powerful. Note that mu can take the value +infinity.

Fundamental Properties of Measures

Monotonicity

If E is a subset of F, then mu(E) is at most mu(F). This follows because F = E union (F minus E) and the two sets are disjoint, so mu(F) = mu(E) + mu(F minus E) is at least mu(E).

Subadditivity (Countable)

For any sequence of measurable sets E_1, E_2, ...: mu(union of E_n) is at most the sum of mu(E_n). This holds even when the sets overlap, since one can write the union as a disjoint union of sets each contained in some E_n.

Continuity from Below

If E_1 is a subset of E_2 is a subset of ... is an increasing sequence of measurable sets, then mu(union of E_n) = limit of mu(E_n). This is the measure-theoretic analog of limits of increasing sequences.

Continuity from Above

If E_1 contains E_2 contains ... is a decreasing sequence of measurable sets and mu(E_1) is finite, then mu(intersection of E_n) = limit of mu(E_n). The finiteness hypothesis is essential: the decreasing sequence (n, infinity) in R has Lebesgue measure infinity at each step and empty intersection.

Null Sets and Completeness

A null set (or set of measure zero) is a measurable set E with mu(E) = 0. Null sets can be large in cardinality: the Cantor set is uncountable yet has Lebesgue measure zero. A property is said to hold almost everywhere (a.e.) if the set where it fails is a null set.

A measure space (X, M, mu) is complete if every subset of a null set is measurable (and hence also a null set). The Lebesgue measure space is complete; Lebesgue measure restricted to the Borel sigma-algebra is not. Completing a measure space means enlarging the sigma-algebra to include all subsets of null sets. Applied to Lebesgue measure on the Borel sigma-algebra, this yields the Lebesgue sigma-algebra.

Examples of Measures

Counting Measure

On any set X with M = 2^X, define mu(E) = the cardinality of E. Integration with respect to counting measure on the natural numbers recovers series summation.

Dirac Measure

Fix a point x_0 in X. The Dirac measure delta_(x_0) assigns 1 to every set containing x_0 and 0 to every set not containing x_0. Integration against delta_(x_0) is evaluation at x_0: integral f d(delta_(x_0)) = f(x_0).

Probability Measures

A probability measure P satisfies P(X) = 1. Every probability distribution (Gaussian, Poisson, Binomial) defines a probability measure on (R, B(R)).

Lebesgue Measure

Lebesgue measure is the unique translation-invariant Borel measure on R^n assigning measure 1 to the unit cube [0,1]^n. It is the formal basis for length (n=1), area (n=2), and volume (n=3).

3. Lebesgue Measure on R^n

The Lebesgue measure is the canonical measure on Euclidean space. Its construction is non-trivial: one must prove the existence of a measure on the Borel (or Lebesgue) sigma-algebra that assigns to each rectangle its classical volume.

Construction via Outer Measure

The Lebesgue outer measure m^* of any subset E of R is defined by:

m^*(E) = inf (sum of lengths of intervals I_n)

where the infimum is over all countable covers of E by open intervals I_n

The outer measure is defined on all subsets of R (not just measurable ones), but it is only countably subadditive, not countably additive. Caratheodory's criterion identifies which sets can be measured consistently: a set E is Caratheodory-measurable if for every set A:

m^*(A) = m^*(A intersect E) + m^*(A intersect E^c)

Intuitively: E cuts every set A into two pieces whose outer measures add up correctly

Caratheodory's theorem states that the Caratheodory-measurable sets form a sigma-algebra, and the restriction of m^* to this sigma-algebra is a complete measure. This sigma-algebra contains all Borel sets, and the resulting measure is Lebesgue measure m.

Key Properties of Lebesgue Measure

Translation Invariance

For any measurable set E and any vector v in R^n: m(E + v) = m(E). Lebesgue measure does not depend on where in space a set is located.

Scaling

For c greater than 0: m(cE) = c^n times m(E) where n is the dimension. Scaling by c in each direction multiplies n-dimensional volume by c^n.

Regularity

Lebesgue measure is both outer regular (m(E) = inf of m(U) over open sets U containing E) and inner regular (m(E) = sup of m(K) over compact sets K contained in E). This allows approximation of measurable sets by open or compact sets.

Countable Sets Have Measure Zero

Every countable subset of R has Lebesgue measure zero. In particular, the rationals Q have measure zero in R, even though they are dense. This motivates the concept of "almost everywhere."
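The standard argument covers the k-th point of an enumeration by an interval of length epsilon / 2^(k+1), so the whole cover has total length below epsilon. The sketch below carries this out for a finite initial segment of the rationals in [0, 1]; the helper names and the particular enumeration are illustrative choices, not part of the guide.

```python
from fractions import Fraction

def rationals_in_unit_interval(max_denominator):
    """List the distinct rationals p/q in [0, 1] with q <= max_denominator."""
    seen = set()
    ordered = []
    for q in range(1, max_denominator + 1):
        for p in range(0, q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                ordered.append(r)
    return ordered

def cover_total_length(points, eps):
    """Cover the k-th point by an open interval of length eps / 2**(k+1)
    and return the total length of the cover."""
    return sum(eps / 2 ** (k + 1) for k in range(len(points)))

eps = Fraction(1, 100)
points = rationals_in_unit_interval(12)
total = cover_total_length(points, eps)   # always strictly less than eps
```

However many points are enumerated, the total stays below eps, and since eps is arbitrary the outer measure of the full countable set is zero.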

The Cantor Set: An Instructive Example

The Cantor set C is constructed by removing the middle third (1/3, 2/3) from [0,1], then the middle thirds of the remaining two intervals, and so on, countably many times. The total length removed is:

1/3 + 2/9 + 4/27 + ... = (1/3) times (1 / (1 - 2/3)) = 1

So m(C) = 1 - 1 = 0: the Cantor set has Lebesgue measure zero. Yet C is uncountable (its points are exactly those with a ternary expansion using only the digits 0 and 2, and replacing each 2 by 1 and reading the result in binary maps C onto [0,1]), perfect (closed and every point is a limit point), and nowhere dense (its interior is empty). The Cantor set demonstrates that "small in measure" and "small in cardinality" are entirely different notions.
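The geometric series for the removed length can be checked with exact rational arithmetic. This is a small sketch (function names are illustrative) showing that the length remaining after n stages is (2/3)^n, which tends to 0.

```python
from fractions import Fraction

def removed_length(steps):
    """Total length removed from [0, 1] after `steps` stages:
    stage k removes 2**(k-1) open intervals, each of length 3**(-k)."""
    return sum(Fraction(2 ** (k - 1), 3 ** k) for k in range(1, steps + 1))

def remaining_length(steps):
    """Length of the closed set left after `steps` stages."""
    return 1 - removed_length(steps)

lengths = [remaining_length(n) for n in range(0, 8)]
```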

4. Measurable Functions

Just as continuity is the right notion of "nice" function for topology, measurability is the right notion for measure theory. Measurable functions are precisely those whose preimages of measurable sets are measurable.

Definition: Measurable Function

Measurability Condition

A function f: (X, M) to (Y, N) between measurable spaces is measurable if for every set E in N, the preimage f^(-1)(E) = (x in X : f(x) in E) is in M.

For real-valued functions (Y = R with the Borel sigma-algebra), it suffices to check that f^(-1)((a, +infinity)) is in M for every real number a. This is the most commonly used criterion.

Stability Properties

The class of measurable functions is closed under all operations one might want to perform:

  • Sums, differences, products, quotients (when the denominator is non-zero a.e.) of measurable functions are measurable.
  • If f_n are measurable, then sup f_n, inf f_n, limsup f_n, and liminf f_n are measurable. In particular, pointwise limits of measurable functions are measurable.
  • Every continuous function from R to R is Borel measurable (measurable with respect to the Borel sigma-algebras).
  • Compositions: if f is measurable and g is continuous (or more generally Borel measurable), then g composed with f is measurable.

Simple Functions

A simple function is a measurable function taking only finitely many values. If the values are c_1, ..., c_n and the preimages are E_1, ..., E_n (which partition X), then the standard representation is:

phi(x) = sum_(i=1)^n c_i times 1_(E_i)(x)

where 1_(E_i) is the indicator function of E_i

Simple functions are the building blocks of Lebesgue integration. The key approximation theorem states: every non-negative measurable function f is the pointwise limit of an increasing sequence of non-negative simple functions. Moreover, if f is bounded, the convergence is uniform.
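One common way to build the approximating sequence is the dyadic truncation phi_n(x) = min(floor(2^n f(x)) / 2^n, n). The sketch below assumes this particular construction (the guide does not fix one) and evaluates it for an illustrative function f(x) = x^2 at a few sample points.

```python
import math

def simple_approx(f, n):
    """Return the n-th dyadic simple function below a non-negative f:
    phi_n(x) = min(floor(2**n * f(x)) / 2**n, n)."""
    def phi(x):
        return min(math.floor(2 ** n * f(x)) / 2 ** n, n)
    return phi

f = lambda x: x * x                  # illustrative non-negative function
xs = [0.0, 0.3, 1.0, 1.7, 2.5]       # sample points
# levels[i][n-1] is phi_n evaluated at xs[i], for n = 1, ..., 11
levels = [[simple_approx(f, n)(x) for n in range(1, 12)] for x in xs]
```

Each row is non-decreasing in n, and once n exceeds f(x) the error is below 2^(-n), which is the source of uniform convergence for bounded f.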

5. The Lebesgue Integral

The Lebesgue integral is constructed in three stages: first for non-negative simple functions, then for non-negative measurable functions via approximation, and finally for general measurable functions by splitting into positive and negative parts.

Stage 1: Integration of Simple Functions

For a non-negative simple function phi with standard representation phi = sum c_i times 1_(E_i), the integral is defined as the weighted sum of measures:

integral phi d(mu) = sum_(i=1)^n c_i times mu(E_i)

using the convention 0 times infinity = 0

One checks this is well-defined (independent of the representation of phi) and that integration of simple functions is linear and monotone.
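As a small illustration, a simple function can be represented by pairs (c_i, mu(E_i)) and integrated by the weighted sum above. The representation and the function name integrate_simple are assumptions made for this sketch; note that the convention 0 times infinity = 0 has to be coded explicitly, since floating-point arithmetic returns nan for that product.

```python
import math

def integrate_simple(terms):
    """Integral of sum c_i * 1_{E_i}, given as pairs (c_i, mu(E_i)),
    with the convention 0 * infinity = 0."""
    total = 0.0
    for c, measure in terms:
        if c == 0 or measure == 0:
            continue                 # skip: 0 * inf counts as 0 here
        total += c * measure
    return total

# phi = 2 on a set of measure 0.5, 3 on a set of measure 1.5,
# and 0 on a set of infinite measure.
value = integrate_simple([(2, 0.5), (3, 1.5), (0, math.inf)])

# A second representation of the same phi (first set split in two pieces).
value_split = integrate_simple([(2, 0.2), (2, 0.3), (3, 1.5), (0, math.inf)])
```

Both representations give the same number, which is the well-definedness claim in miniature.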

Stage 2: Integration of Non-Negative Functions

For a non-negative measurable function f, define:

integral f d(mu) = sup (integral phi d(mu))

where the supremum is over all simple functions phi with 0 at most phi at most f

This value in [0, +infinity] is always well-defined. The Monotone Convergence Theorem (proved at this stage) is crucial: if phi_n increases to f, then the integral of phi_n approaches the integral of f. This justifies the approximation procedure.

Stage 3: Integration of General Functions

For a general measurable function f, decompose it into positive and negative parts:

f^+(x) = max(f(x), 0), f^-(x) = max(-f(x), 0)

f = f^+ minus f^-, |f| = f^+ + f^-

Then f is called integrable (or in L^1) if both integrals of f^+ and f^- are finite, and one defines:

integral f d(mu) = integral f^+ d(mu) minus integral f^- d(mu)

Comparison with the Riemann Integral

If f is Riemann integrable on [a, b], then f is Lebesgue integrable and the two integrals agree. The converse fails: the Dirichlet function (1 on rationals, 0 on irrationals) has Lebesgue integral 0 but is not Riemann integrable. A bounded function on [a, b] is Riemann integrable if and only if it is continuous almost everywhere — this is the Lebesgue characterization of Riemann integrability.

6. The Three Great Convergence Theorems

The power of the Lebesgue integral over the Riemann integral lies in its convergence theorems, which justify interchanging limits and integrals under much weaker hypotheses.

Monotone Convergence Theorem (MCT)

Theorem (MCT)

Let (f_n) be a sequence of non-negative measurable functions on (X, M, mu) with f_n(x) increasing to f(x) for almost every x. Then:

lim_(n to infinity) integral f_n d(mu) = integral f d(mu)

Proof sketch: Since the sequence is increasing, its integrals form an increasing sequence of extended reals, converging to some limit L. Since f_n is at most f, we have L at most integral f d(mu). For the reverse inequality, fix epsilon in (0, 1) and a simple function phi with 0 at most phi at most f. Let A_n = (x : f_n(x) is at least epsilon times phi(x)). Then A_n increases to X, and by continuity from below: integral f_n d(mu) is at least integral over A_n of f_n d(mu) is at least epsilon times integral over A_n of phi d(mu), which approaches epsilon times integral phi d(mu). Since epsilon and phi were arbitrary, L is at least integral f d(mu).
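A concrete instance may help: take f(x) = x^(-1/2) on (0, 1] and the truncations f_n = min(f, n), which increase to f. The sketch below uses the closed form integral f_n = 2 - 1/n (derived in the docstring) and a crude midpoint rule as a sanity check; the example and names are illustrative and not taken from the guide.

```python
from fractions import Fraction

def truncated_integral(n):
    """Exact integral over (0, 1] of f_n = min(x**-0.5, n) for n >= 1.
    f_n equals n on (0, 1/n**2] and x**-0.5 on (1/n**2, 1], so the
    integral is n * (1/n**2) + (2 - 2/n) = 2 - 1/n."""
    return 2 - Fraction(1, n)

def midpoint_check(n, cells=20_000):
    """Midpoint-rule approximation of the same integral."""
    h = 1.0 / cells
    return sum(min(((i + 0.5) * h) ** -0.5, n) for i in range(cells)) * h

values = [truncated_integral(n) for n in range(1, 50)]
```

The integrals increase to 2, which is the integral of the unbounded limit f, as the MCT predicts.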

Fatou's Lemma

Theorem (Fatou's Lemma)

For any sequence of non-negative measurable functions (f_n):

integral (liminf_(n) f_n) d(mu) is at most liminf_(n) integral f_n d(mu)

Proof sketch: Apply MCT to g_n = inf_(k at least n) f_k, which increases to liminf f_n. Since g_n is at most f_k for all k at least n, we have integral g_n d(mu) is at most inf_(k at least n) integral f_k d(mu). Taking the limit and applying MCT gives the result.

Fatou's lemma shows the integral of the limit infimum is at most the limit infimum of the integrals. The inequality can be strict: consider f_n = n times 1_(0, 1/n) on [0, 1]. Each f_n has integral 1, but f_n converges to 0 pointwise, and the integral of 0 is 0.
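The strict-inequality example can be checked exactly. In this sketch (helper names are illustrative), every f_n has integral 1, while at any fixed x the values f_n(x) are eventually 0.

```python
from fractions import Fraction

def f(n, x):
    """f_n = n * 1_(0, 1/n) evaluated at a point x of [0, 1]."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_f(n):
    """Exact Lebesgue integral of f_n: height n times length 1/n."""
    return n * Fraction(1, n)

def pointwise_values(x, start, stop):
    """The values f_n(x) for start <= n < stop."""
    return [f(n, x) for n in range(start, stop)]

x = Fraction(1, 50)
tail = pointwise_values(x, 50, 500)   # all zero: x >= 1/n once n >= 50
```

So the left side of Fatou's inequality is 0 and the right side is 1.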

Dominated Convergence Theorem (DCT)

Theorem (DCT — Lebesgue)

Let (f_n) be measurable functions converging a.e. to f. Suppose there exists an integrable function g (the dominating function) such that |f_n| is at most g a.e. for all n. Then f is integrable and:

lim_(n to infinity) integral f_n d(mu) = integral f d(mu)

Moreover: lim_(n) integral |f_n minus f| d(mu) = 0

Proof sketch: Apply Fatou's lemma to g + f_n (non-negative) and g minus f_n (non-negative) separately. Adding the resulting inequalities and using that integral g is finite yields the conclusion.

The DCT is perhaps the most-used tool in analysis. Typical applications include: differentiating under the integral sign (verify the derivative is bounded by an integrable function), justifying power series integration term by term, and proving continuity of parameter-dependent integrals.

Differentiating Under the Integral Sign (Leibniz Rule)

If F(t) = integral f(x, t) d(mu)(x), one wants F'(t) = integral (partial f / partial t)(x, t) d(mu)(x). This is justified by the DCT when the partial derivative is bounded by an integrable function: |partial f / partial t| at most g(x) for all t in a neighborhood of t_0, with integral g d(mu) finite.
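A numerical spot check of the rule, for the illustrative choice f(x, t) = exp(-t x) on [0, 1] (not an example from the guide): here |partial f / partial t| = x exp(-t x) is at most 1 for t at least 0, so the constant function 1 serves as the dominating g. The midpoint rule and step sizes are arbitrary choices for this sketch.

```python
import math

def midpoint(g, a, b, cells=2_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / cells
    return sum(g(a + (i + 0.5) * h) for i in range(cells)) * h

def F(t):
    """F(t) = integral over [0, 1] of exp(-t * x) dx."""
    return midpoint(lambda x: math.exp(-t * x), 0.0, 1.0)

def F_prime_under_integral(t):
    """Integral of the partial derivative d/dt exp(-t*x) = -x * exp(-t*x)."""
    return midpoint(lambda x: -x * math.exp(-t * x), 0.0, 1.0)

def F_prime_finite_difference(t, h=1e-5):
    """Central difference quotient of F at t."""
    return (F(t + h) - F(t - h)) / (2 * h)

t0 = 1.0
lhs = F_prime_finite_difference(t0)     # derivative of the integral
rhs = F_prime_under_integral(t0)        # integral of the derivative
```

Both values agree with the closed form F'(1) = 2/e - 1 obtained from F(t) = (1 - exp(-t)) / t.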

7. L^p Spaces

L^p spaces are the natural function spaces arising in measure theory, providing the framework for harmonic analysis, PDEs, and functional analysis. They are Banach spaces — complete normed vector spaces — with rich structure.

Definitions

For a measure space (X, M, mu) and 1 at most p less than infinity, define:

L^p(mu) = (f measurable : integral |f|^p d(mu) less than infinity)

||f||_p = (integral |f|^p d(mu))^(1/p)

L^infinity(mu) = (f measurable : ess sup |f| less than infinity)

||f||_infinity = ess sup |f| = inf (M : |f| at most M a.e.)

Technically, L^p consists of equivalence classes of functions that agree almost everywhere, so that ||f||_p = 0 implies f = 0 (the positive definiteness of the norm requires this identification). When we write "f in L^p," we always mean an equivalence class.

Holder's Inequality

Theorem (Holder)

Let 1 at most p at most infinity with conjugate exponent q defined by 1/p + 1/q = 1 (so q = p/(p-1), with q = infinity when p = 1). If f is in L^p and g is in L^q, then fg is in L^1 and:

|integral fg d(mu)| at most ||f||_p times ||g||_q

Proof sketch: Without loss of generality normalize so ||f||_p = ||g||_q = 1. Apply Young's inequality (ab at most a^p/p + b^q/q for non-negative a, b) to |f(x)| and |g(x)|, then integrate both sides.

The special case p = q = 2 is the Cauchy-Schwarz inequality for integrals. Holder's inequality is used to prove Minkowski's inequality, to establish duality of L^p spaces, and throughout PDEs to estimate norms.

Minkowski's Inequality (Triangle Inequality for L^p)

Theorem (Minkowski)

For 1 at most p at most infinity and f, g in L^p:

||f + g||_p at most ||f||_p + ||g||_p

This is precisely the triangle inequality, confirming that ||.||_p is a genuine norm. Proof: write |f + g|^p at most |f| times |f + g|^(p-1) + |g| times |f + g|^(p-1) and apply Holder to each term with exponents p and q.
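For counting measure on a finite set, L^p norms are ordinary p-norms of vectors, so Holder's and Minkowski's inequalities can be spot-checked on random data. The sketch below is only a numerical sanity check under that assumption, not a proof; the function names are illustrative.

```python
import random

def p_norm(v, p):
    """l^p norm of a finite sequence (L^p norm for counting measure)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def holder_gap(f, g, p):
    """||f||_p * ||g||_q - |sum f*g| with q = p / (p - 1), for p > 1."""
    q = p / (p - 1)
    return p_norm(f, p) * p_norm(g, q) - abs(sum(a * b for a, b in zip(f, g)))

def minkowski_gap(f, g, p):
    """||f||_p + ||g||_p - ||f + g||_p."""
    total = [a + b for a, b in zip(f, g)]
    return p_norm(f, p) + p_norm(g, p) - p_norm(total, p)

rng = random.Random(0)               # fixed seed so the run is reproducible
samples = [([rng.uniform(-1, 1) for _ in range(20)],
            [rng.uniform(-1, 1) for _ in range(20)]) for _ in range(200)]
```

Both gaps should come out non-negative (up to rounding) for every sample and every exponent p greater than 1.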

Completeness: The Riesz-Fischer Theorem

The most important structural theorem about L^p spaces:

Theorem (Riesz-Fischer)

For 1 at most p at most infinity, L^p(mu) is a Banach space (a complete normed vector space). That is, every Cauchy sequence in L^p converges to an element of L^p.

Proof sketch for p less than infinity: Given a Cauchy sequence (f_n), extract a subsequence (f_(n_k)) with ||f_(n_(k+1)) minus f_(n_k)||_p less than 2^(-k). Let g = sum |f_(n_(k+1)) minus f_(n_k)|. By Minkowski and MCT, ||g||_p is finite, so g is finite a.e. The telescoping sum converges absolutely a.e. to some f in L^p, and one shows f_n converges to f in L^p norm.

L^2 as a Hilbert Space

The space L^2(mu) carries an inner product:

<f, g> = integral f times conjugate(g) d(mu)

inducing the norm ||f||_2 = sqrt(<f, f>)

L^2 is a Hilbert space. This additional structure (orthogonality, projections, orthonormal bases) makes L^2 the central object in Fourier analysis, quantum mechanics, and the spectral theory of operators. The Fourier series of every square-integrable function on the circle converges to it in L^2 norm, by the completeness of the trigonometric system.

Inclusions and Relations Between L^p Spaces

On a finite measure space with mu(X) less than infinity: L^q is contained in L^p whenever p is at most q (larger exponents impose more integrability). On infinite measure spaces, no such inclusion holds in general. The dual space of L^p (for 1 less than p less than infinity) is L^q where 1/p + 1/q = 1. On a sigma-finite measure space the dual of L^1 is L^infinity; the dual of L^infinity is in general strictly larger than L^1.

8. Product Measures and the Fubini-Tonelli Theorem

Product measures formalize the notion of multi-dimensional integration and justify switching the order of iterated integrals.

Construction of the Product Measure

Given two sigma-finite measure spaces (X, M, mu) and (Y, N, nu), the product sigma-algebra M ⊗ N is generated by measurable rectangles A times B (with A in M, B in N). The product measure mu times nu is the unique measure on M ⊗ N satisfying:

(mu times nu)(A times B) = mu(A) times nu(B)

for all A in M, B in N

Existence and uniqueness follow from the Caratheodory extension theorem applied to the premeasure defined on rectangles. The sigma-finiteness hypothesis is necessary for uniqueness.

Tonelli's Theorem (Non-Negative Functions)

Theorem (Tonelli)

Let (X, M, mu) and (Y, N, nu) be sigma-finite measure spaces, and let f: X times Y to [0, +infinity] be measurable. Then:

  • For every x: the section y to f(x, y) is measurable
  • The function x to integral f(x, y) d(nu)(y) is measurable
  • The iterated integrals equal the double integral: integral integral f d(nu) d(mu) = integral f d(mu times nu) = integral integral f d(mu) d(nu)

Fubini's Theorem (Integrable Functions)

Theorem (Fubini)

Under the same setup, if f is in L^1(mu times nu), then: for a.e. x the section y to f(x, y) is nu-integrable; the function x to integral f(x, y) d(nu)(y) is mu-integrable; and both iterated integrals equal the double integral.

The standard workflow is: apply Tonelli to |f| to verify integrability (if one iterated integral of |f| is finite, then f is in L^1), then apply Fubini to f itself to switch the order of integration freely.

Warning: Fubini Can Fail Without Integrability

The classic counterexample: define f(x, y) = (x^2 minus y^2) / (x^2 + y^2)^2 on [0,1] times [0,1]. The two iterated integrals give pi/4 and -pi/4 — different values. This function is not in L^1, so Fubini does not apply. Always check integrability via Tonelli before switching order.
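The two values can be reproduced numerically. The sketch below relies on the antiderivative y / (x^2 + y^2) of f in the y variable, so the inner integral over [0, 1] equals 1 / (1 + x^2) for x greater than 0; the outer integral is then approximated by a midpoint rule. The other order follows from the antisymmetry f(x, y) = -f(y, x). Function names and the quadrature are choices made for this illustration.

```python
import math

def f(x, y):
    """The counterexample integrand (x^2 - y^2) / (x^2 + y^2)^2."""
    return (x * x - y * y) / (x * x + y * y) ** 2

def antiderivative_in_y(x, y):
    """y / (x^2 + y^2); its partial derivative in y is f(x, y)."""
    return y / (x * x + y * y)

def inner_integral_dy(x):
    """Integral of f(x, y) over y in [0, 1] for x > 0: 1 / (1 + x^2)."""
    return antiderivative_in_y(x, 1.0) - antiderivative_in_y(x, 0.0)

def midpoint(g, cells=20_000):
    """Midpoint rule on [0, 1]; never evaluates g at 0."""
    h = 1.0 / cells
    return sum(g((i + 0.5) * h) for i in range(cells)) * h

dy_then_dx = midpoint(inner_integral_dy)   # integrate in y first, then x
dx_then_dy = -dy_then_dx                   # by f(x, y) = -f(y, x)
```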

9. Signed Measures and the Radon-Nikodym Theorem

Signed measures generalize measures by allowing negative values. They arise naturally as differences of measures and as indefinite integrals of integrable functions that change sign.

Signed Measures

A signed measure nu on (X, M) is a function nu: M to [-infinity, +infinity] that takes at most one of the values +infinity and -infinity, satisfies nu(empty set) = 0, and is countably additive on pairwise disjoint sequences (with the series sum of nu(E_n) converging absolutely whenever nu(union of E_n) is finite).

Hahn Decomposition Theorem

Theorem (Hahn Decomposition)

For every signed measure nu on (X, M), there exists a partition of X into a positive set P and a negative set N (with P union N = X, P intersect N = empty set) such that: for every measurable E contained in P, nu(E) is at least 0; and for every measurable E contained in N, nu(E) is at most 0. The decomposition is essentially unique (unique up to null sets).

From the Hahn decomposition, one obtains the Jordan decomposition nu = nu^+ minus nu^- where nu^+(E) = nu(E intersect P) and nu^-(E) = -nu(E intersect N) are both positive measures. The total variation measure is |nu| = nu^+ + nu^-.
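On a finite set a signed measure is determined by (possibly negative) point masses, and the Hahn and Jordan decompositions can be read off directly. The following is a minimal sketch under that assumption; the dictionary representation and function names are illustrative.

```python
def hahn_decomposition(weights):
    """For a signed measure on a finite set given by point masses,
    return (P, N): P holds the points of non-negative mass, N the rest."""
    P = {x for x, w in weights.items() if w >= 0}
    N = set(weights) - P
    return P, N

def nu(weights, E):
    """nu(E) = sum of the point masses in E."""
    return sum(weights[x] for x in E)

def jordan_parts(weights, E):
    """Return (nu_plus(E), nu_minus(E)) = (nu(E & P), -nu(E & N))."""
    P, N = hahn_decomposition(weights)
    E = set(E)
    return nu(weights, E & P), -nu(weights, E & N)

weights = {"a": 2, "b": -3, "c": 0, "d": 5, "e": -1}   # illustrative masses
E = {"a", "b", "e"}
plus, minus = jordan_parts(weights, E)
total_variation = plus + minus
```

The point "c" has mass 0 and could be placed in either P or N, which is the finite version of uniqueness only up to null sets.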

Absolute Continuity and Singularity

Two measures mu and nu on the same space can be related in two extreme ways:

Absolute Continuity: nu is much less than mu

nu is absolutely continuous with respect to mu if mu(E) = 0 implies nu(E) = 0 for every measurable E. Informally: nu cannot "see" anything that mu assigns measure zero. Every measure of the form nu(E) = integral over E of f d(mu) for a non-negative measurable f is absolutely continuous with respect to mu.

Singular Measures: nu is perpendicular to mu

nu and mu are mutually singular if there exists a partition X = A union B such that nu is concentrated on A (nu(B) = 0) and mu is concentrated on B (mu(A) = 0). They "live on disjoint sets." Example: Lebesgue measure and the Dirac delta are mutually singular.

Lebesgue-Radon-Nikodym Theorem

Theorem (Lebesgue Decomposition + Radon-Nikodym)

Let mu and nu be sigma-finite measures on (X, M). Then there exists a unique decomposition nu = nu_ac + nu_s where nu_ac is absolutely continuous with respect to mu and nu_s is singular with respect to mu. Moreover, there exists a non-negative measurable function f, unique up to equality mu-almost everywhere (the Radon-Nikodym derivative, written dnu/dmu when nu itself is absolutely continuous with respect to mu), such that:

nu_ac(E) = integral over E of f d(mu) for all E in M

The Radon-Nikodym theorem is the measure-theoretic analog of the Fundamental Theorem of Calculus. In probability theory, if P and Q are probability measures with P absolutely continuous with respect to Q, then dP/dQ is called the likelihood ratio or density, central to Bayesian inference and change-of-measure arguments (the Girsanov theorem in stochastic calculus).
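For measures on a finite set the theorem becomes elementary: the density is the ratio of point masses wherever mu is positive, and the singular part of nu lives on the points that mu ignores. The sketch below assumes this finite setting with integer point masses; the function names are hypothetical.

```python
from fractions import Fraction

def lebesgue_radon_nikodym(mu, nu):
    """Given point masses mu[x], nu[x] >= 0 on a finite set, return
    (density, nu_s): density[x] = nu[x] / mu[x] where mu[x] > 0, and
    nu_s is the part of nu carried by the mu-null points."""
    density = {x: Fraction(nu[x], mu[x]) for x in mu if mu[x] > 0}
    nu_s = {x: (nu[x] if mu[x] == 0 else 0) for x in mu}
    return density, nu_s

def nu_ac(mu, density, E):
    """nu_ac(E) = integral over E of the density with respect to mu."""
    return sum(density[x] * mu[x] for x in E if x in density)

mu = {"a": 1, "b": 2, "c": 0, "d": 4}     # illustrative point masses
nu = {"a": 3, "b": 0, "c": 5, "d": 2}
density, nu_s = lebesgue_radon_nikodym(mu, nu)

E = {"a", "c", "d"}
ac_part = nu_ac(mu, density, E)           # absolutely continuous part of nu(E)
s_part = sum(nu_s[x] for x in E)          # singular part of nu(E)
```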

10. Modes of Convergence

In analysis, "convergence" can mean many different things. Understanding the relationships between modes of convergence is essential for applying the right theorem.

The Four Main Modes

Pointwise Convergence (everywhere)

f_n to f everywhere if for every x in X: lim_(n) f_n(x) = f(x). The convergence rate can vary wildly from point to point.

Convergence Almost Everywhere (a.e.)

f_n to f a.e. if mu((x : f_n(x) does not converge to f(x))) = 0. The convergence can fail on a null set. Most theorems in Lebesgue theory work with a.e. convergence rather than everywhere convergence.

Convergence in Measure

f_n to f in measure if for every epsilon greater than 0:

lim_(n) mu((x : |f_n(x) minus f(x)| greater than epsilon)) = 0

The measure of the set where f_n and f differ by more than epsilon tends to zero, but these exceptional sets may move around, so f_n(x) need not converge at any particular point x.

Convergence in L^p Norm

f_n to f in L^p if:

lim_(n) ||f_n minus f||_p = lim_(n) (integral |f_n minus f|^p d(mu))^(1/p) = 0

Implications and Counterexamples

The relationships between modes of convergence, and the failure of converses, are captured in the following implications and counterexamples:

  • L^p to in measure: Convergence in L^p implies convergence in measure (by Markov's inequality applied to |f_n minus f|^p).
  • a.e. to in measure (finite measure): On a finite measure space, a.e. convergence implies convergence in measure (by Egorov's theorem or directly).
  • In measure to a.e. (subsequence): Convergence in measure implies there is a subsequence converging a.e.
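  • In measure does NOT imply a.e.: Typewriter sequence on [0,1]: f_n = 1_([j/2^k, (j+1)/2^k]) for n = 2^k + j with 0 at most j less than 2^k. It converges to 0 in measure (and in L^p for p less than infinity) but at no point of [0,1], since every x lies in at least one interval of each generation k.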
  • a.e. does NOT imply L^p without domination: f_n = n times 1_(0, 1/n) converges to 0 a.e. but ||f_n||_1 = 1 for all n, so no L^1 convergence.
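The typewriter sequence from the list above can be explored with exact arithmetic. In this sketch (function names are illustrative), index n is written as 2^k + j, the support of f_n has length 2^(-k), and a fixed point such as x = 1/3 is hit once in every generation k, so f_n(x) equals 1 infinitely often.

```python
from fractions import Fraction

def typewriter_interval(n):
    """For n = 2**k + j with 0 <= j < 2**k, return the support
    [j / 2**k, (j + 1) / 2**k] of f_n as a pair of exact fractions."""
    k = n.bit_length() - 1
    j = n - 2 ** k
    return Fraction(j, 2 ** k), Fraction(j + 1, 2 ** k)

def f(n, x):
    """Indicator of the n-th typewriter interval, evaluated at x."""
    a, b = typewriter_interval(n)
    return 1 if a <= x <= b else 0

def support_length(n):
    """Lebesgue measure of the set where f_n is non-zero."""
    a, b = typewriter_interval(n)
    return b - a

x = Fraction(1, 3)
# Indices below 2**10 cover generations k = 0, ..., 9.
hits = [n for n in range(1, 2 ** 10) if f(n, x) == 1]
```

The support lengths shrink to 0 (convergence in measure), yet the list of hits keeps growing with the range of n, so f_n(1/3) does not converge.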

11. Egorov's Theorem and Lusin's Theorem

Two classical theorems connect measure-theoretic and topological notions of "almost uniform" behavior of measurable functions.

Egorov's Theorem

Theorem (Egorov)

Let (X, M, mu) be a measure space with mu(X) less than infinity (finite!). If f_n to f almost everywhere, then for every epsilon greater than 0 there exists a measurable set E with mu(E^c) less than epsilon such that f_n to f uniformly on E.

In words: a.e. convergence on a finite measure space is "almost uniform" — you can excise a set of arbitrarily small measure and get uniform convergence on the rest.

Proof sketch: Define A_(n, k) = (x : |f_m(x) minus f(x)| at least 1/k for some m at least n). For fixed k, the sets A_(n, k) decrease as n increases, and their intersection is contained in the null set (x : f_m(x) does not converge to f(x)), so by continuity from above (which uses mu(X) finite) mu(A_(n, k)) to 0. Choose n_k large enough that mu(A_(n_k, k)) less than epsilon/2^k. Let F = union A_(n_k, k). Then mu(F) less than epsilon and convergence on F^c is uniform.

The finiteness hypothesis is necessary: on R with Lebesgue measure, the sequence f_n = 1_(n, n+1) converges to 0 everywhere but not almost uniformly. Uniform convergence to 0 on a set E forces E to miss (n, n+1) for all large n, so the excised set E^c contains infinitely many intervals of length 1 and has infinite measure.

Lusin's Theorem

Theorem (Lusin)

Let f: R to R be a Lebesgue measurable function. For every epsilon greater than 0 and every measurable set E of finite measure, there exists a compact set K contained in E with m(E minus K) less than epsilon such that the restriction of f to K is continuous.

This is sometimes paraphrased as: "every measurable function is nearly continuous" — continuous on all but an arbitrarily small set. Note that f is NOT being claimed to be continuous; only its restriction to K is continuous.

Combined with the density of simple functions, Lusin's theorem implies that every function in L^p (for 1 at most p less than infinity) can be approximated in L^p norm by continuous functions of compact support, which is used to prove density of smooth functions in L^p spaces.

12. Probability as Measure Theory

Modern probability theory is built entirely on measure theory. The measure-theoretic framework unifies discrete, continuous, and mixed probability distributions and provides the tools to prove limit theorems rigorously.

The Kolmogorov Probability Space

A probability space is a measure space (Omega, F, P) where:

  • Omega — the sample space: the set of all possible outcomes
  • F — the event sigma-algebra: a sigma-algebra of subsets of Omega representing "events" to which probabilities can be assigned
  • P — the probability measure: P(Omega) = 1, P(E) is at least 0, and P is countably additive on F

This framework, introduced by Kolmogorov in 1933, resolved longstanding foundational questions and unified all of probability theory under a single mathematical structure.

Random Variables as Measurable Functions

A random variable X is a measurable function X: (Omega, F) to (R, B(R)). The law (or distribution) of X is the pushforward measure P_X = P composed with X^(-1) on B(R):

P_X(B) = P(X^(-1)(B)) = P(omega : X(omega) in B)

Two random variables with the same law are identically distributed even if they live on entirely different probability spaces. This abstraction allows one to work with distributions without specifying the underlying probability space.

Expectation as Lebesgue Integral

The expectation of a random variable X is its Lebesgue integral with respect to P:

E[X] = integral_(Omega) X(omega) dP(omega)

or equivalently integral_(R) x dP_X(x) (change of variables)

For discrete X: E[X] = sum x_i P(X = x_i) (the familiar formula). For absolutely continuous X with density f_X: E[X] = integral x f_X(x) dx. Both are special cases of the Lebesgue integral. The DCT becomes the Dominated Convergence Theorem for expectations: if X_n to X a.s. and |X_n| is at most Y with E[Y] finite, then E[X_n] to E[X].
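On a finite sample space both the pushforward measure and the change-of-variables formula for expectation reduce to finite sums. The sketch below uses two fair dice as an illustrative probability space (an assumption of this example) and computes E[X] both as an integral over Omega and as an integral over R against the law P_X.

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

# Sample space: ordered outcomes of two fair dice, with the uniform measure P.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    """Random variable: the sum of the two dice."""
    return w[0] + w[1]

def law(X, P):
    """Pushforward measure: P_X({x}) = P(X = x) for each value x of X."""
    dist = defaultdict(Fraction)
    for w, p in P.items():
        dist[X(w)] += p
    return dict(dist)

P_X = law(X, P)
expect_on_omega = sum(X(w) * p for w, p in P.items())   # integral of X dP
expect_on_range = sum(x * p for x, p in P_X.items())    # integral of x dP_X
```

The two sums agree because the second merely groups the terms of the first by the value of X.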

Independence via Product Measures

Events A and B are independent if P(A intersect B) = P(A) times P(B). Random variables X and Y are independent if the sigma-algebras they generate are independent, which is equivalent to: the joint law P_(X,Y) equals the product measure P_X times P_Y on B(R^2). This product measure perspective immediately gives: E[XY] = E[X] times E[Y] for independent integrable X, Y (by Fubini).
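The product-measure description of independence can be made concrete on a finite space. In the sketch below (two fair dice, chosen for illustration), the joint measure is built as a product of the marginals, and the factorization of E[XY] is checked exactly; a dependent pair is included for contrast.

```python
from fractions import Fraction
from itertools import product

# Product measure on pairs (a, b): P({(a, b)}) = P1({a}) * P2({b}).
P1 = {a: Fraction(1, 6) for a in range(1, 7)}
P2 = {b: Fraction(1, 6) for b in range(1, 7)}
P = {(a, b): P1[a] * P2[b] for a, b in product(P1, P2)}

def expectation(g, measure):
    """Integral of g against a finitely supported measure."""
    return sum(g(w) * p for w, p in measure.items())

E_X = expectation(lambda w: w[0], P)           # X = first coordinate
E_Y = expectation(lambda w: w[1], P)           # Y = second coordinate
E_XY = expectation(lambda w: w[0] * w[1], P)   # factorizes as E_X * E_Y

# A dependent pair for contrast: taking Y = X gives E[X * X], not E[X]**2.
E_XX = expectation(lambda w: w[0] * w[0], P)
```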

Strong Law of Large Numbers

Theorem (Strong LLN — Kolmogorov)

Let X_1, X_2, ... be i.i.d. random variables with E[|X_1|] finite and E[X_1] = mu. Then:

(X_1 + X_2 + ... + X_n) / n to mu almost surely

The "almost surely" is precisely the measure-theoretic a.e.: the set of outcomes omega for which the averages do not converge to mu has probability zero. The proof uses:

  1. Borel-Cantelli lemma (a measure-theoretic result about limsups of events): if sum P(A_n) is finite, then P(limsup A_n) = 0.
  2. Kolmogorov's maximal inequality and truncation arguments to reduce to the case of bounded variables.
  3. The MCT or DCT to handle the truncation error.
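The statement of the theorem (not its proof) can be illustrated numerically. The sketch below follows a single simulated sample path of i.i.d. Uniform(0, 1) variables, for which mu = 0.5, and records the running average at a few checkpoints. The function name, the seed, and the checkpoints are arbitrary assumptions, and a simulation of one path is only suggestive of almost-sure convergence.

```python
import random

def sample_path_averages(checkpoints, seed=0):
    """Running averages (X_1 + ... + X_n) / n along one simulated path.

    The X_i are i.i.d. Uniform(0, 1), so E[X_1] = 0.5.
    """
    rng = random.Random(seed)  # fixed seed: the path is reproducible
    wanted = set(checkpoints)
    total, averages = 0.0, {}
    for n in range(1, max(wanted) + 1):
        total += rng.random()
        if n in wanted:
            averages[n] = total / n
    return averages

averages = sample_path_averages([10, 1_000, 100_000])
for n, avg in sorted(averages.items()):
    print(n, avg)  # the averages should settle near 0.5 as n grows
```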

Conditional Expectation via Radon-Nikodym

For an integrable random variable X and a sub-sigma-algebra G of F, the conditional expectation E[X | G] is defined as the G-measurable, integrable random variable Y, unique up to almost-sure equality, satisfying:

integral over A of Y dP = integral over A of X dP for all A in G

Existence follows from the Radon-Nikodym theorem: the map A to integral over A of X dP defines a finite signed measure on (Omega, G) (a measure when X is non-negative) that is absolutely continuous with respect to P restricted to G, and Y is its Radon-Nikodym derivative. This abstract definition unifies conditioning on events, conditioning on discrete random variables, and conditioning on continuous random variables.
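When G is generated by a finite partition into atoms of positive probability, the Radon-Nikodym derivative reduces to an average over each atom. The sketch below assumes a fair die with G generated by the partition into odd and even outcomes; cond_exp and sigma_algebra are hypothetical helper names. It verifies the defining property on every set in G.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical setup: fair die, X(w) = w, G generated by {odd}, {even}.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}
atoms = [frozenset({1, 3, 5}), frozenset({2, 4, 6})]

def X(w):
    return w

def cond_exp(rv, prob, atoms):
    """E[rv | G] as a function on Omega: constant on each atom,
    equal to the P-weighted average of rv there (atoms must have P > 0)."""
    Y = {}
    for A in atoms:
        p_A = sum(prob[w] for w in A)
        avg = sum(rv(w) * prob[w] for w in A) / p_A
        for w in A:
            Y[w] = avg
    return Y

def sigma_algebra(atoms):
    """All unions of atoms: the sigma-algebra generated by the partition."""
    return [frozenset().union(*combo)
            for r in range(len(atoms) + 1)
            for combo in combinations(atoms, r)]

Y = cond_exp(X, P, atoms)

# Defining property: integral over A of Y dP = integral over A of X dP.
property_holds = all(
    sum(Y[w] * P[w] for w in A) == sum(X(w) * P[w] for w in A)
    for A in sigma_algebra(atoms)
)
print(Y[1], Y[2], property_holds)
```

Here Y is constant on each atom, which is what G-measurability means for a partition-generated sigma-algebra.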

13. Frequently Asked Questions

What is a sigma-algebra and why is it needed in measure theory?

A sigma-algebra on a set X is a collection of subsets containing the empty set, closed under complements, and closed under countable unions. Sigma-algebras are needed because not every subset of R can be assigned a consistent notion of length — the Vitali set is a classic non-measurable set constructed using the Axiom of Choice. By restricting to a sigma-algebra, we obtain a collection of sets that can be measured consistently without contradictions.

What is the difference between the Riemann integral and the Lebesgue integral?

The Riemann integral partitions the domain and sums vertical rectangles. The Lebesgue integral partitions the range and measures the set of x-values where the function has a given height. The Lebesgue approach handles many more functions (such as the Dirichlet function, which is 1 on rationals and 0 on irrationals — Lebesgue integrable with integral 0, but not Riemann integrable), and has far superior convergence theorems allowing interchange of limits and integrals under mild conditions.

What does the Monotone Convergence Theorem state?

If (f_n) is an increasing sequence of non-negative measurable functions converging pointwise to f, then the Lebesgue integral of f equals the limit of the integrals of f_n. This justifies the construction of the Lebesgue integral via approximation by simple functions and is one of the two main tools for interchanging limits and integrals.

What is the Dominated Convergence Theorem and when can you apply it?

If f_n converges to f almost everywhere, and there exists an integrable function g with |f_n| at most g almost everywhere for all n, then f is integrable and the integral of f_n converges to the integral of f. The key requirement is a single dominating integrable function g that works for every n. Typical applications include differentiating under the integral sign, showing parameter-dependent integrals are continuous, and justifying term-by-term integration of series.
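Two standard examples on [0,1] with Lebesgue measure show why the dominating function matters. They are offered as a worked illustration, not as part of the theorem's statement: in the first, domination by a constant holds and the limit passes through the integral; in the second, no integrable dominating function exists and the conclusion fails.

```latex
% Domination holds: |f_n| \le g \equiv 1, and g is integrable on [0,1].
\[
  f_n(x) = x^n \to 0 \ \text{for a.e. } x \in [0,1],
  \qquad
  \int_0^1 x^n \, dx = \frac{1}{n+1} \to 0 = \int_0^1 0 \, dx .
\]

% Domination fails: the integrals stay at 1 while the pointwise limit is 0.
\[
  f_n(x) = n\,\mathbf{1}_{(0,1/n)}(x) \to 0 \ \text{for every } x \in [0,1],
  \qquad
  \int_0^1 f_n \, dx = 1 \not\to 0 .
\]

% Any dominating g must satisfy g \ge \sup_n f_n, which is not integrable.
\[
  \sup_{n} f_n(x) = \lceil 1/x \rceil - 1 \ \ge\ \frac{1}{x} - 1
  \quad (0 < x < 1),
  \qquad
  \int_0^1 \Bigl( \frac{1}{x} - 1 \Bigr) dx = \infty .
\]
```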

What is the Radon-Nikodym theorem and what is a key application?

If mu and nu are sigma-finite measures on the same measurable space and nu is absolutely continuous with respect to mu, then there exists a non-negative measurable function f, the Radon-Nikodym derivative dnu/dmu, such that nu(E) = integral over E of f d(mu) for every measurable set E; f is unique up to mu-almost-everywhere equality. In probability, this gives the probability density function: if a probability measure P is absolutely continuous with respect to Lebesgue measure m, the Radon-Nikodym derivative dP/dm is the density function. Conditional expectation is also defined via Radon-Nikodym.

What are L^p spaces and why are they important?

For 1 <= p < infinity, L^p consists of measurable functions f for which |f|^p is integrable; L^infinity consists of essentially bounded measurable functions. In both cases, functions that agree almost everywhere are identified. For 1 <= p <= infinity these are Banach spaces (complete normed vector spaces), fundamental to functional analysis, harmonic analysis, and PDEs. The key inequalities are Holder's (bounding the integral of a product by the product of the L^p and L^q norms, where 1/p + 1/q = 1) and Minkowski's (the triangle inequality for the L^p norm). L^2 is a Hilbert space, which makes it especially important in Fourier analysis and quantum mechanics.
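Both inequalities can be sanity-checked numerically in the simplest setting, counting measure on a three-point set, where integrals are finite sums. The vectors f and g and the exponent p = 3 below are arbitrary illustrative choices, and a numerical check on one pair of functions is of course not a proof.

```python
def p_norm(f, p):
    """L^p norm with respect to counting measure on a finite set."""
    return sum(abs(x) ** p for x in f) ** (1.0 / p)

f = [1.0, -2.0, 3.0]
g = [4.0, 0.5, -1.0]
p = 3.0
q = p / (p - 1.0)  # conjugate exponent: 1/p + 1/q = 1

# Holder: integral of |f g| <= ||f||_p * ||g||_q
holder_lhs = sum(abs(a * b) for a, b in zip(f, g))
holder_rhs = p_norm(f, p) * p_norm(g, q)

# Minkowski: ||f + g||_p <= ||f||_p + ||g||_p
minkowski_lhs = p_norm([a + b for a, b in zip(f, g)], p)
minkowski_rhs = p_norm(f, p) + p_norm(g, p)

print(holder_lhs <= holder_rhs, minkowski_lhs <= minkowski_rhs)
```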

What is the Fubini-Tonelli theorem and when do you use it?

Both theorems assume the two factor measure spaces are sigma-finite. Tonelli's theorem: for non-negative measurable functions on the product space, all three quantities (double integral and both iterated integrals) are equal, possibly all infinite, with no integrability hypothesis needed. Fubini's theorem: for integrable functions (in L^1 of the product measure), the iterated integrals exist, are equal, and can be computed in either order. The practical workflow is: first apply Tonelli to |f| to verify f is in L^1 of the product measure; then Fubini applies and you can switch the order of integration freely.

How does measure theory unify probability theory?

A probability space (Omega, F, P) is a measure space with total measure 1. Random variables are measurable functions. Expectation is the Lebesgue integral. Independence is expressed via product measures. The Strong Law of Large Numbers becomes an almost-sure convergence result proved using Borel-Cantelli and convergence theorems. Conditional expectation is defined as a Radon-Nikodym derivative. This unification, due to Kolmogorov (1933), gives probability a rigorous foundation and access to all of measure theory's tools.

Related Topics