
Ergodic Theory: Measure-Preserving Dynamics and Long-Run Behavior

Ergodic theory studies the statistical and long-run behavior of dynamical systems. It bridges measure theory, probability, and analysis to answer a fundamental question: do time averages equal space averages? The answer illuminates everything from gas molecules to the distribution of prime numbers.

Learning Objectives

After working through this page, you will be able to:

  • Define measure-preserving transformations and verify the condition for standard examples including rotations, the doubling map, and shift maps.
  • State and prove the Poincare Recurrence Theorem and explain its physical and mathematical significance.
  • Define ergodicity, state its equivalent characterizations, and distinguish ergodic from non-ergodic systems with examples.
  • State and apply Birkhoff's Pointwise Ergodic Theorem and von Neumann's Mean Ergodic Theorem, understanding what each guarantees.
  • Define weak mixing, strong mixing, K-systems, and Bernoulli shifts, placing them in the mixing hierarchy.
  • Compute the Kolmogorov-Sinai entropy for standard examples and state the variational principle.
  • Describe the spectral theory of measure-preserving systems and identify spectral invariants.
  • Explain the Krylov-Bogoliubov theorem, unique ergodicity, and the Weyl Equidistribution Theorem.
  • Describe applications to statistical mechanics, number theory (Furstenberg's proof of Szemeredi's theorem), and information theory.

1. Measure-Preserving Transformations

Ergodic theory begins with a probability space (X, B, mu) and a transformation T: X to X that preserves the measure. The space X is the state space of the system (for example, a torus or a sequence space), B is a sigma-algebra of measurable subsets, and mu is a probability measure (mu(X) = 1).

Definition: Measure-Preserving Transformation

Formal Definition

T: X to X is measure-preserving if T is measurable and for all A in B:

mu(T⁻¹(A)) = mu(A)

Here T⁻¹(A) = (x in X : T(x) in A) is the preimage of A under T. Note that T need not be invertible; what matters is that the preimage of every set has the same measure as the set itself. When T is invertible and both T and T⁻¹ are measure-preserving, T is called a measure-preserving automorphism.

Example 1: Irrational Rotation of the Circle

Let X = R/Z (the circle, identified with [0,1) with addition mod 1), B the Borel sigma-algebra, and mu = Lebesgue measure. Fix alpha in (0,1) irrational and define:

T(x) = x + alpha mod 1

This is a rotation by angle 2*pi*alpha. Since Lebesgue measure is translation-invariant, mu(T⁻¹(A)) = mu(A - alpha mod 1) = mu(A) for every measurable A. Thus T is measure-preserving. The irrationality of alpha is not needed for measure preservation but is crucial for ergodicity, which we discuss in Section 3.

Example 2: The Doubling Map

Let X = [0,1), mu = Lebesgue measure, and define the doubling map:

T(x) = 2x mod 1

For any interval [a,b) in [0,1), its preimage under T consists of two intervals: [a/2, b/2) and [(a+1)/2, (b+1)/2), each of length (b-a)/2. The total preimage has measure (b-a)/2 + (b-a)/2 = b-a = mu([a,b)). By a standard extension argument, T preserves Lebesgue measure. The doubling map is related to the binary expansion of x: if x = 0.b₁b₂b₃... in binary, then T(x) = 0.b₂b₃... (the left shift on binary digits).
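
The two facts above can be checked with a minimal sketch in Python. It uses exact rational arithmetic from the standard `fractions` module, and the helper names (`doubling`, `binary_digits`, `preimage_length`) are illustrative, not from any library. The last lines show why exactness matters: every double-precision number in [0,1) is a dyadic rational, so floating-point iteration of 2x mod 1 discards one binary digit per step and eventually reaches 0.

```python
from fractions import Fraction


def doubling(x):
    """Doubling map T(x) = 2x mod 1, exact for a Fraction x in [0, 1)."""
    return (2 * x) % 1


def binary_digits(x, n):
    """First n binary digits b1, ..., bn of x in [0, 1)."""
    digits = []
    for _ in range(n):
        x *= 2
        bit = 1 if x >= 1 else 0
        digits.append(bit)
        x -= bit
    return digits


def preimage_length(a, b):
    """Lebesgue measure of T^{-1}([a, b)) = [a/2, b/2) union [(a+1)/2, (b+1)/2)."""
    pieces = [(a / 2, b / 2), ((a + 1) / 2, (b + 1) / 2)]
    return sum(right - left for left, right in pieces)


x = Fraction(5, 7)                      # 5/7 = 0.101101101... in binary
print(binary_digits(x, 6))              # [1, 0, 1, 1, 0, 1]
print(binary_digits(doubling(x), 5))    # [0, 1, 1, 0, 1]: the same digits, shifted left

a, b = Fraction(1, 5), Fraction(2, 3)
print(preimage_length(a, b) == b - a)   # True

# Floating point: 0.1 is stored as an odd integer over 2**55, so 55 doublings reach 0.
y, steps = 0.1, 0
while y != 0.0:
    y = (2 * y) % 1.0
    steps += 1
print(steps)                            # 55
```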

Example 3: The Shift Map on Sequence Spaces

Let A = (0, 1, ..., k-1) be a finite alphabet and X = A^Z (bi-infinite sequences with values in A). Equip X with the product sigma-algebra and a product measure mu = p^Z where p = (p₀, ..., pₖ₋₁) is a probability vector with pᵢ greater than 0 for all i. The shift map is:

(Tx)ₙ = xₙ₊₁ for all n in Z

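T shifts the entire sequence one step to the left. A cylinder set C = (x : xᵢ = aᵢ for i = m, ..., n) has measure equal to the product of the probabilities p_{aᵢ} for i = m, ..., n. The preimage T⁻¹(C) = (x : xᵢ₊₁ = aᵢ for i = m, ..., n) is a cylinder with the same symbols at the positions m+1, ..., n+1, so it has the same measure; since cylinders generate the product sigma-algebra, the standard extension argument shows that T is measure-preserving. This is the Bernoulli shift with distribution p, and it is one of the most important examples in ergodic theory.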
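
A minimal sketch of the cylinder computation, assuming a cylinder is stored as a Python dict mapping positions to required symbols (a representation chosen here for illustration). Pulling back under the left shift only moves each constraint one position to the right, so the product measure cannot change.

```python
from math import prod


def cylinder_measure(constraints, p):
    """mu(C) for C = {x : x_i = a_i for each (i, a_i)} under the product measure p^Z."""
    return prod(p[a] for a in constraints.values())


def shift_preimage(constraints):
    """T^{-1}(C) for the left shift (Tx)_n = x_{n+1}: the condition x_i = a_i becomes x_{i+1} = a_i."""
    return {i + 1: a for i, a in constraints.items()}


p = [0.5, 0.3, 0.2]              # illustrative probability vector on the alphabet {0, 1, 2}
C = {0: 1, 1: 0, 2: 2}           # the cylinder {x : x_0 = 1, x_1 = 0, x_2 = 2}

pre = shift_preimage(C)
print(pre)                                                # {1: 1, 2: 0, 3: 2}
print(cylinder_measure(C, p), cylinder_measure(pre, p))   # equal: 0.3 * 0.5 * 0.2 each
```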

Example 4: Toral Automorphisms (Cat Map)

Let X = R²/Z² be the 2-torus, mu = Lebesgue measure, and A a 2x2 integer matrix with det(A) = plus or minus 1 (so A is invertible over Z). The corresponding toral automorphism T(x) = Ax mod Z² preserves Lebesgue measure because det(A) = plus or minus 1. Arnold's cat map uses A with entries (2,1,1,1) (top row then bottom row) and is a hyperbolic toral automorphism, meaning that none of its eigenvalues lies on the unit circle; here they are (3 plus sqrt(5))/2 and (3 minus sqrt(5))/2. Such maps exhibit chaotic behavior and are strongly mixing.
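
A minimal sketch, assuming exact `Fraction` coordinates on the torus, that checks three facts about the cat map: det(A) = 1, the integer inverse matrix undoes T, and a point with rational coordinates is periodic. The last fact holds for every toral automorphism, since A permutes the finitely many points whose coordinates have a given denominator. The function names are illustrative.

```python
from fractions import Fraction

A = ((2, 1), (1, 1))          # Arnold's cat map matrix
A_INV = ((1, -1), (-1, 2))    # integer inverse, valid because det(A) = 1


def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def toral_map(m, point):
    """Apply x -> m x mod Z^2 to a point of the 2-torus with exact Fraction coordinates."""
    x, y = point
    return ((m[0][0] * x + m[0][1] * y) % 1, (m[1][0] * x + m[1][1] * y) % 1)


def period(m, point, max_steps=10_000):
    """Smallest n >= 1 with m^n(point) == point, or None if none is found within max_steps."""
    q = toral_map(m, point)
    for n in range(1, max_steps + 1):
        if q == point:
            return n
        q = toral_map(m, q)
    return None


p0 = (Fraction(1, 5), Fraction(0))
print(det(A))                                    # 1
print(toral_map(A_INV, toral_map(A, p0)) == p0)  # True
print(period(A, p0))                             # 10
```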

Key Insight: The Measure is the Invariant Object

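In ergodic theory, the measure mu is the fundamental invariant, not the topology. Two transformations can look very different topologically yet be isomorphic as measure-preserving systems. For example, binary expansion identifies the doubling map on [0,1) with the one-sided Bernoulli shift B(1/2, 1/2), up to sets of measure zero, even though one acts on an interval and the other on a Cantor set of sequences. In the other direction, the Kolmogorov-Sinai entropy (Section 6) and spectral invariants (Section 7) are tools for proving that two systems are not isomorphic.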

2. Poincare Recurrence Theorem

One of the earliest and most remarkable results in ergodic theory is the Poincare Recurrence Theorem (1890). Poincare proved this in the context of celestial mechanics, but it holds in complete generality for measure-preserving systems on finite measure spaces.

Statement of the Theorem

Poincare Recurrence Theorem

Let T be a measure-preserving transformation on a finite measure space (X, B, mu) and let A in B with mu(A) greater than 0. Then for mu-almost every x in A, there exist infinitely many positive integers n such that Tⁿ(x) is in A.

Proof Sketch

We prove the weaker statement first: mu-almost every x in A returns to A at least once. Define:

B = (x in A : Tⁿ(x) is not in A for all n = 1, 2, 3, ...)

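B is the set of points in A that never return. We claim mu(B) = 0. Note that B, T⁻¹(B), T⁻²(B), ... are pairwise disjoint: if x is in both T⁻ⁿ(B) and T⁻ᵐ(B) with n less than m, then Tⁿ(x) is in B, so Tⁿ(x) never returns to A; but Tᵐ(x) = Tᵐ⁻ⁿ(Tⁿ(x)) is in B, which is contained in A, a contradiction. Since T is measure-preserving, each T⁻ⁿ(B) has measure mu(B), so for every N the disjoint union of B, T⁻¹(B), ..., T⁻ᴺ(B) has measure (N+1) mu(B), which is at most mu(X), a finite number. Letting N tend to infinity forces mu(B) = 0. For infinite recurrence, apply the same argument to Tᵏ, which is also measure-preserving: almost every x in A returns to A at some time that is a positive multiple of k, hence at some time at least k. Intersecting these full-measure subsets of A over k = 1, 2, 3, ... shows that almost every x in A returns to A infinitely often.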

Physical Interpretation and Zermelo's Recurrence Paradox

In statistical mechanics, Poincare recurrence implies that almost every state of a classical mechanical system confined to a bounded region of phase space will eventually return arbitrarily close to its initial state, given enough time. For a box of gas that starts with all its molecules in the left half, this means that (for almost every such initial state) the gas will at some future time again have all its molecules in the left half, seemingly reversing the increase of entropy.

The Resolution of the Paradox

The recurrence time is astronomically large — for a mole of gas, the expected recurrence time vastly exceeds the age of the universe. Thermodynamic irreversibility operates on human timescales where recurrence is practically impossible. This does not contradict the second law of thermodynamics, which is statistical in nature.

Quantitative Recurrence: Kac's Lemma

Kac's Lemma gives the expected return time. If T is ergodic and A has positive measure, let r_A(x) = min(n greater than or equal to 1 : Tⁿ(x) in A) be the first return time. Then:

(1/mu(A)) times integral over A of r_A d mu = 1/mu(A)

In other words, for a starting point drawn from mu conditioned on A, the expected return time to A is exactly 1/mu(A). This beautiful result says: the rarer the event (the smaller mu(A)), the longer you wait on average for recurrence.
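
Kac's lemma can be checked numerically with a minimal sketch, assuming the irrational rotation from Example 1 with the illustrative choice alpha = sqrt(2) - 1 and A = [0, 0.1), so the predicted mean return time is 1/mu(A) = 10. Drawing starting points uniformly from A is exactly sampling from mu conditioned on A, and each observed return is also an instance of Poincare recurrence.

```python
import math
import random

ALPHA = math.sqrt(2) - 1     # illustrative irrational rotation number
A_LEFT, A_RIGHT = 0.0, 0.1   # the set A = [0, 0.1), so mu(A) = 0.1


def in_A(x):
    return A_LEFT <= x < A_RIGHT


def first_return_time(x, alpha=ALPHA, max_steps=10_000):
    """r_A(x) = min{n >= 1 : T^n(x) in A} for the rotation T(x) = x + alpha mod 1."""
    y = x
    for n in range(1, max_steps + 1):
        y = (y + alpha) % 1.0
        if in_A(y):
            return n
    raise RuntimeError("no return within max_steps")


def mean_return_time(samples=20_000, seed=0):
    """Average of r_A over points drawn uniformly from A, i.e. from mu conditioned on A."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        total += first_return_time(rng.uniform(A_LEFT, A_RIGHT))
    return total / samples


print(mean_return_time())    # close to 1 / mu(A) = 10
```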

3. Ergodicity: Definition and Characterizations

Ergodicity is the central property in ergodic theory. An ergodic system is one that cannot be decomposed into two or more invariant subsystems of positive measure — it is, in a precise sense, dynamically irreducible.

Definition of Ergodicity

Ergodic Transformation

A measure-preserving transformation T on (X, B, mu) is ergodic if the only T-invariant measurable sets A (meaning T⁻¹(A) = A modulo sets of measure zero) are sets with mu(A) = 0 or mu(A) = 1.

Equivalently, T is ergodic if there are no non-trivial T-invariant sets. An invariant set A with 0 less than mu(A) less than 1 would allow us to decompose the system into two parts — the dynamics on A and the dynamics on X minus A — each T-invariant. Ergodicity rules this out.

Equivalent Characterizations

The following are equivalent for a measure-preserving T on a probability space (X, B, mu):

(i) Invariant Sets

T is ergodic (the invariant set definition above).

(ii) Invariant Functions

Every T-invariant measurable function f (satisfying f composed with T = f almost everywhere) is constant almost everywhere.

(iii) Time Average Equals Space Average

For every f in L¹(mu), the time average (1/n) sum from k=0 to n-1 of f(Tᵏ(x)) converges to the integral of f d mu, for mu-almost every x. (This follows from Birkhoff's theorem combined with ergodicity.)

(iv) Ergodic Criterion via Products

For all A, B in B with positive measure: (1/n) sum from k=0 to n-1 of mu(A intersect T⁻ᵏ(B)) converges to mu(A) times mu(B). This says A and B are asymptotically independent on average along orbits.

Ergodic Examples

Ergodic Systems

  • Irrational rotation: T(x) = x + alpha mod 1 for irrational alpha. The orbit of every point is dense in the circle.
  • Doubling map: T(x) = 2x mod 1 on [0,1). Ergodic with respect to Lebesgue measure.
  • Bernoulli shift: The shift on (A^Z, p^Z) for any strictly positive probability vector p.
  • Hyperbolic toral automorphisms: Arnold's cat map and other Anosov diffeomorphisms of tori.

Non-Ergodic Systems

  • Rational rotation: T(x) = x + p/q mod 1 for rational p/q. All orbits are finite, and many invariant sets of intermediate measure exist.
  • Identity map: T = id. Every measurable set is invariant, so the system is as far from ergodic as possible.
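  • Direct product of two systems: T x S on (X x Y, mu x nu) where at least one factor is non-ergodic. If A is a non-trivial T-invariant set, then A x Y is a non-trivial invariant set for T x S. Even a product of two ergodic systems can fail to be ergodic: for an irrational rotation R, the function (x - y) mod 1 is invariant under R x R.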

Ergodic Decomposition Theorem

Every measure-preserving system on a standard Borel space can be decomposed into ergodic components. Formally, every T-invariant probability measure mu can be written as an integral over ergodic measures: mu = integral of mu_y d nu(y), where nu is a probability measure on an index space Y and each mu_y is an ergodic T-invariant measure. This is the ergodic decomposition, and it reduces many questions to the ergodic case.

4. The Ergodic Theorems

The ergodic theorems are the fundamental limit theorems of ergodic theory, establishing that time averages of observables converge. There are two main theorems: Birkhoff's pointwise theorem (convergence almost everywhere) and von Neumann's mean theorem (convergence in L²).

Birkhoff's Pointwise Ergodic Theorem (1931)

Birkhoff Ergodic Theorem

Let T be a measure-preserving transformation on a probability space (X, B, mu) and f in L¹(mu). Then the time averages converge almost everywhere:

Aₙ(f)(x) = (1/n) sum from k=0 to n-1 of f(Tᵏ(x)) converges to f*(x) a.e.

The limit f* is a T-invariant L¹ function satisfying the integral of f* d mu equals the integral of f d mu. If T is ergodic, then f* is constant a.e. and equals the integral of f d mu. In other words, for ergodic systems, the time average of any L¹ observable equals its space average.

What Birkhoff's Theorem Really Says

For a concrete example: let T be the doubling map on [0,1) (ergodic) and f(x) = x. Then (1/n) sum from k=0 to n-1 of Tᵏ(x) converges almost everywhere to the integral of x dx from 0 to 1, which equals 1/2. This says: for Lebesgue-almost every starting point x, the long-run time average of the orbit converges to 1/2, the uniform average over [0,1).

More generally, for f = indicator of [a,b]: (1/n) times the number of k in (0,...,n-1) such that Tᵏ(x) is in [a,b] converges to b-a a.e. This says orbits spend time in intervals proportional to their length: equidistribution.
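
A minimal sketch of the two statements above, assuming that a Lebesgue-random x is modeled by independent fair binary digits (which is what Lebesgue measure on [0,1) is, via binary expansion). Because T acts as the left shift on digits, Tᵏ(x) can be read off a sliding window of digits, which sidesteps floating-point iteration of 2x mod 1 (that degenerates to 0 after finitely many steps). The 53-bit window and the interval [0.2, 0.5) are illustrative choices.

```python
import random

BITS = 53                      # width of the digit window (illustrative choice)
SCALE = 1 << BITS
MASK = SCALE - 1


def birkhoff_doubling(f, n, seed=0):
    """(1/n) sum_{k<n} f(T^k x) for the doubling map T(x) = 2x mod 1 and a random x.

    x is modeled by i.i.d. fair binary digits b1 b2 b3 ... (Lebesgue measure in
    binary). Since T^k x = 0.b_{k+1} b_{k+2} ..., we approximate it by the window
    of digits b_{k+1} ... b_{k+BITS}, generating new digits as they are needed.
    """
    rng = random.Random(seed)
    window = rng.getrandbits(BITS)               # digits b1 ... b_BITS of x
    total = 0.0
    for _ in range(n):
        total += f(window / SCALE)               # approximately T^k(x)
        # Apply T: drop the leading digit and append a fresh random digit.
        window = ((window << 1) & MASK) | rng.getrandbits(1)
    return total / n


avg_x = birkhoff_doubling(lambda x: x, 200_000)
avg_hits = birkhoff_doubling(lambda x: 1.0 if 0.2 <= x < 0.5 else 0.0, 200_000)
print(avg_x)      # close to the space average 1/2
print(avg_hits)   # close to the interval length 0.5 - 0.2 = 0.3
```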

Von Neumann's Mean Ergodic Theorem (1932)

Mean Ergodic Theorem

Let T be a measure-preserving transformation on (X, B, mu) and U_T the induced Koopman operator on L²(mu), defined by (U_T f)(x) = f(T(x)); it is an isometry, and it is unitary when T is invertible. Then for every f in L²(mu):

(1/n) sum from k=0 to n-1 of (U_T)ᵏ(f) converges in L² norm to P(f)

where P is the orthogonal projection onto the closed subspace of T-invariant functions in L²(mu). If T is ergodic, P(f) = integral of f d mu (the constant function equal to the mean of f). This is convergence in L², not pointwise.

Comparison: Birkhoff vs. Von Neumann

Property | Birkhoff (Pointwise) | Von Neumann (Mean)
Function class | f in L¹ | f in L²
Mode of convergence | Almost everywhere | L² norm
Limit | T-invariant f* | Projection P(f)
Ergodic case | Limit = integral of f | Limit = integral of f (constant)
Proved by | Birkhoff, 1931 | Von Neumann, 1932

Maximal Ergodic Lemma

A key tool in proving Birkhoff's theorem is the Maximal Ergodic Lemma and the maximal inequality it implies. Define the maximal ergodic function (written Mf here, to distinguish it from the limit f* in Birkhoff's theorem):

Mf(x) = sup over n greater than or equal to 1 of (1/n) sum from k=0 to n-1 of f(Tᵏ(x))

The maximal inequality states: for any lambda greater than 0 and any non-negative f in L¹, mu( (x : Mf(x) greater than lambda) ) is at most (1/lambda) times the integral of f d mu. This is the ergodic analogue of the Hardy-Littlewood maximal inequality. It is the core technical ingredient that upgrades almost-everywhere convergence on a dense subclass of L¹ (invariant functions plus coboundaries g - g composed with T with g bounded, where convergence is easy) to almost-everywhere convergence for all of L¹.

5. Mixing: Weak, Strong, K-Systems, and Bernoulli Shifts

Mixing properties describe, in successively stronger senses, how a dynamical system loses memory of its initial state. They form a hierarchy: every Bernoulli shift is a K-system, every K-system is strongly mixing, every strongly mixing system is weakly mixing, and every weakly mixing system is ergodic. None of these implications reverses.

Weak Mixing

Definition

T is weakly mixing if for all A, B in B:

(1/n) sum from k=0 to n-1 of |mu(A intersect T⁻ᵏ(B)) - mu(A)mu(B)| converges to 0

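Equivalently, mu(A intersect T⁻ᵏ(B)) converges to mu(A)mu(B) as k tends to infinity outside a set of times of density zero; this is stronger than the plain Cesaro convergence in characterization (iv) of ergodicity. Weak mixing is equivalent to T x T being ergodic on (X x X, mu x mu). It is also equivalent to the absence of non-constant eigenfunctions: if f composed with T = lambda f a.e. for some f in L²(mu) and some constant lambda, then f is constant a.e.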

Strong Mixing

Definition

T is (strongly) mixing if for all A, B in B:

mu(A intersect T⁻ⁿ(B)) converges to mu(A)mu(B) as n tends to infinity

This is genuine convergence (not just Cesaro). Strong mixing says sets A and T⁻ⁿ(B) become asymptotically independent. Equivalently, for all f, g in L²(mu): the inner product of f composed with Tⁿ with g converges to (integral of f)(integral of g) as n tends to infinity. Strong mixing implies weak mixing (by a standard argument via Cesaro means), but the converse fails.
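
The irrational rotation is the standard example of a system that is ergodic but not mixing. A minimal sketch, assuming alpha = sqrt(2) - 1 and A = B = [0, 1/2): here T⁻ⁿ(A) is again a half-circle arc, so mu(A intersect T⁻ⁿ(A)) has a closed form. Its Cesaro averages approach mu(A)² = 1/4, as ergodicity requires, but the sequence itself keeps swinging back toward 1/2 and 0, so strong mixing fails.

```python
import math

ALPHA = math.sqrt(2) - 1   # illustrative irrational rotation number


def correlation(n, alpha=ALPHA):
    """mu(A intersect T^{-n}(A)) for A = [0, 1/2) and T(x) = x + alpha mod 1.

    T^{-n}(A) is the half-circle arc starting at c = -n*alpha mod 1, and two
    half-circle arcs starting at 0 and at c overlap in an arc of length |c - 1/2|.
    """
    c = (-n * alpha) % 1.0
    return abs(c - 0.5)


N = 100_000
cesaro = sum(correlation(n) for n in range(N)) / N
worst = max(abs(correlation(n) - 0.25) for n in range(1_000, 2_000))
print(cesaro)   # close to mu(A)^2 = 0.25, as ergodicity requires
print(worst)    # close to 0.25: the correlations keep oscillating, so T is not mixing
```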

K-Systems (Kolmogorov Systems)

K-systems are defined via sigma-algebras rather than directly via mixing. A measure-preserving transformation T on (X, B, mu) is a K-automorphism (Kolmogorov automorphism) if there exists a sub-sigma-algebra K of B such that:

  • (i) K is contained in T(K), equivalently T⁻¹(K) is contained in K
  • (ii) The join of Tⁿ(K) over all n = 0, 1, 2, ... generates B (modulo null sets)
  • (iii) The intersection of T⁻ⁿ(K) over all n = 0, 1, 2, ... is the trivial sigma-algebra (consisting only of sets of measure 0 or 1)

K-systems have very strong chaotic properties. They have completely positive entropy (every non-trivial factor has positive entropy), are strongly mixing (in fact, mixing of all orders), and have countable Lebesgue spectrum on the orthogonal complement of the constants. Every Bernoulli shift is a K-system, but not every K-system is a Bernoulli shift; this was a major open question, settled by Ornstein's construction of K-systems that are not Bernoulli.

Bernoulli Shifts and Ornstein's Theorem

Bernoulli shifts are the paradigm of randomness in ergodic theory. The one-sided Bernoulli shift B(p₀, ..., pₖ₋₁) on A^N with product measure p^N is the shift map (sigma x)ₙ = xₙ₊₁. The two-sided version is on A^Z.

Ornstein's Theorem (1970)

Two-sided Bernoulli shifts are isomorphic (as measure-preserving systems) if and only if they have the same entropy. That is, B(p₀, ..., pₖ₋₁) is isomorphic to B(q₀, ..., qₗ₋₁) if and only if minus sum of pᵢ log pᵢ equals minus sum of qⱼ log qⱼ.

This is a remarkable theorem: entropy is a complete isomorphism invariant for Bernoulli shifts. The proof introduced the notion of finitely determined processes and very weak Bernoulli processes, which became central tools in the theory.
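
Ornstein's theorem reduces the isomorphism question for two-sided Bernoulli shifts to a one-line computation. A minimal sketch using natural logarithms (the base does not affect equality of entropies): B(1/4, 1/4, 1/4, 1/4) and B(1/2, 1/8, 1/8, 1/8, 1/8) both have entropy log 4, so as two-sided shifts they are isomorphic even though their alphabets have different sizes, while B(1/2, 1/2) has entropy log 2 and is isomorphic to neither.

```python
import math


def bernoulli_entropy(p):
    """Entropy -sum p_i log p_i (natural log) of the Bernoulli shift B(p)."""
    if any(pi <= 0 for pi in p) or not math.isclose(sum(p), 1.0):
        raise ValueError("p must be a strictly positive probability vector")
    return -sum(pi * math.log(pi) for pi in p)


h1 = bernoulli_entropy([1/4, 1/4, 1/4, 1/4])
h2 = bernoulli_entropy([1/2, 1/8, 1/8, 1/8, 1/8])
h3 = bernoulli_entropy([1/2, 1/2])
print(math.isclose(h1, h2))   # True: same entropy, so isomorphic by Ornstein's theorem
print(math.isclose(h1, h3))   # False: log 4 != log 2, so not isomorphic
```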

The Mixing Hierarchy (from weakest to strongest):

Ergodic ⇐ Weakly Mixing ⇐ Strongly Mixing ⇐ K-System ⇐ Bernoulli

Each arrow means "implies." None of the implications reverse.

6. Entropy: Kolmogorov-Sinai Entropy and the Variational Principle

Entropy in ergodic theory measures the average information content generated per unit time by a dynamical system. Introduced by Kolmogorov (1958) and refined by Sinai, Kolmogorov-Sinai (metric) entropy h(T) is an isomorphism invariant that distinguishes many non-isomorphic systems.

Shannon Entropy of a Partition

Let P = (A₁, ..., Aₙ) be a finite measurable partition of X (the Aᵢ are pairwise disjoint, cover X, and have positive measure). The Shannon entropy of P is:

H(P) = minus sum from i=1 to n of mu(Aᵢ) log mu(Aᵢ)

H(P) measures the average uncertainty about which atom of P contains a randomly chosen point. It attains its maximum value, log n, exactly when all n atoms have equal measure 1/n. When one atom has measure 1 (the trivial partition), H(P) = 0.

Conditional Entropy and the Join of Partitions

The join P join Q of two partitions consists of all intersections Aᵢ intersect Bⱼ of positive measure, where Aᵢ is in P and Bⱼ is in Q. The conditional entropy of P given Q is:

H(P|Q) = H(P join Q) minus H(Q)

This measures the additional uncertainty about P given that we know Q. The time-n join P join T⁻¹P join ... join T⁻⁽ⁿ⁻¹⁾P is the partition of X into sets where points agree on their P-labels for times 0, 1, ..., n-1. The entropy rate of T with respect to P is:

h(T, P) = lim as n tends to infinity of (1/n) H(P join T⁻¹P join ... join T⁻⁽ⁿ⁻¹⁾P)
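
For a Bernoulli shift with its time-zero partition P (by the symbol x₀), the atoms of the n-fold join are the cylinders indexed by words of length n, so the block entropy can be computed by brute-force enumeration. A minimal sketch, assuming an illustrative vector p: independence of the coordinates makes (1/n) H(P join ... join T⁻⁽ⁿ⁻¹⁾P) equal to H(P) for every n, whereas for a general system these averages only converge to h(T, P) (from above).

```python
import itertools
import math


def shannon_entropy(probs):
    """H = -sum q log q over atoms of positive measure (natural log)."""
    return -sum(q * math.log(q) for q in probs if q > 0)


def block_entropy_bernoulli(p, n):
    """H(P join T^{-1}P join ... join T^{-(n-1)}P) for the Bernoulli shift B(p).

    P is the time-zero partition by x_0; the atoms of the n-fold join are the cylinders
    fixed by a word (x_0, ..., x_{n-1}), each of measure p[x_0] * ... * p[x_{n-1}].
    """
    atom_measures = (
        math.prod(p[s] for s in word)
        for word in itertools.product(range(len(p)), repeat=n)
    )
    return shannon_entropy(atom_measures)


p = [0.5, 0.3, 0.2]
for n in range(1, 7):
    print(n, block_entropy_bernoulli(p, n) / n)   # the same value H(P) for every n
```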

Kolmogorov-Sinai Entropy

Definition

The Kolmogorov-Sinai (metric) entropy of T is:

h(T) = sup over all finite partitions P of h(T, P)

By Sinai's theorem, the supremum is attained by any generating partition P: a partition such that the join of T⁻ⁿ(P) generates B modulo null sets, where n ranges over Z when T is invertible and over n greater than or equal to 0 for non-invertible T such as the doubling map. This means h(T) = h(T, P) for any generating partition, enormously simplifying computation.

Entropy of Standard Examples

Bernoulli Shift B(p₀,...,pₘ₋¹)

h(T) = minus sum of pᵢ log pᵢ

This is the Shannon entropy of the generating partition given by the 0th coordinate. For the fair coin flip B(1/2, 1/2): h = log 2, that is, 1 bit per step when logarithms are taken to base 2.

Doubling Map T(x) = 2x mod 1

h(T) = log 2

The partition P = ([0,1/2), [1/2,1)) is generating. Each application of T produces exactly 1 bit of information about the binary expansion of x.

Irrational Rotation T(x) = x + alpha mod 1

h(T) = 0

Irrational rotations are isometries — they create no information. Zero entropy systems are deterministic in the ergodic-theoretic sense.

Hyperbolic Toral Automorphism

h(T) = sum of log|lambdaᵢ| for |lambdaᵢ| greater than 1

The entropy equals the sum of positive Lyapunov exponents (Pesin's formula). For Arnold's cat map with eigenvalues (3 plus sqrt(5))/2 and (3 minus sqrt(5))/2: h = log((3+sqrt(5))/2).
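
A minimal sketch of the eigenvalue formula for 2x2 toral automorphisms, computing eigenvalues from the trace and determinant with the standard `cmath` module. The second matrix, a rotation by a quarter turn, is an illustrative non-hyperbolic example whose eigenvalues lie on the unit circle, so the formula gives entropy 0.

```python
import cmath
import math


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def toral_entropy(m):
    """Sum of log|lambda| over the eigenvalues with |lambda| > 1."""
    return sum(math.log(abs(lam)) for lam in eigenvalues_2x2(m) if abs(lam) > 1)


cat = ((2, 1), (1, 1))
quarter_turn = ((0, -1), (1, 0))
print(eigenvalues_2x2(cat))         # approximately 2.618 and 0.382 (as complex numbers)
print(toral_entropy(cat))           # approximately 0.9624 = log((3 + sqrt(5)) / 2)
print(toral_entropy(quarter_turn))  # 0: both eigenvalues have modulus 1
```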

The Variational Principle

Variational Principle (Goodwyn-Goodman-Dinaburg, 1969-1971)

For a continuous map T on a compact metric space X, the topological entropy h_top(T) satisfies:

h_top(T) = sup over all T-invariant Borel probability measures mu of h_mu(T)

where h_mu(T) is the KS entropy with respect to mu. The measures achieving the supremum are called measures of maximal entropy. This bridges topological and measure-theoretic entropy, and is fundamental in thermodynamic formalism.

7. Spectral Theory of Ergodic Systems

Every measure-preserving transformation T induces a linear isometry U_T on the Hilbert space L²(X, mu), defined by U_T(f) = f composed with T. The spectral theory of U_T is the spectral theory of the dynamical system T, and spectral invariants provide powerful tools for distinguishing non-isomorphic systems.

The Koopman Operator

The operator U_T: L²(mu) to L²(mu) defined by U_T(f)(x) = f(T(x)) is called the Koopman operator. It is a linear isometry: the L² norm of U_T(f) equals the L² norm of f. When T is invertible and measure-preserving, U_T is unitary, and its inverse is the Koopman operator of T⁻¹.

The von Neumann Mean Ergodic Theorem (Section 4) is precisely the statement that the Cesaro averages (1/n) sum from k=0 to n-1 of (U_T)ᵏ converge in the strong operator topology to the projection P onto the eigenspace of U_T for the eigenvalue 1 (the T-invariant functions). Thus the spectral theory of U_T directly governs the ergodic averages.

Eigenvalues and Spectral Measures

A measurable function f, not zero almost everywhere, is an eigenfunction of T with eigenvalue lambda in C if f composed with T = lambda f almost everywhere. Since U_T is an isometry, every eigenvalue of an L² eigenfunction lies on the unit circle: |lambda| = 1. When T is ergodic, eigenfunctions have constant modulus and the product of two eigenfunctions is again an eigenfunction, so the eigenvalues form a subgroup of the circle group (under multiplication), called the point spectrum or eigenvalue group.

Spectral Measures

For each f in L²(mu), the spectral measure sigma_f is the unique Borel measure on the unit circle S¹ such that for all integers n greater than or equal to 0 (and for all n in Z when T is invertible):

inner product of (U_T)ⁿ(f) with f = integral of zⁿ d sigma_f(z)

The maximal spectral type of U_T is the measure class of the sum over n of 2⁻ⁿ times sigma_{fₙ}, where (fₙ) is an orthonormal basis of L²(mu); every spectral measure sigma_f is absolutely continuous with respect to it. The maximal spectral type is an isomorphism invariant.

Spectral Characterizations of Mixing Properties

Ergodicity

T is ergodic if and only if 1 is a simple eigenvalue of U_T (i.e., the eigenspace for eigenvalue 1 consists only of constants). Equivalently, the constant functions are the only T-invariant L² functions.

Weak Mixing

T is weakly mixing if and only if U_T, restricted to the orthogonal complement of the constants, has continuous spectral type (no atoms). Equivalently, the only eigenvalue of U_T is 1 and it is simple, so every eigenfunction is constant.

Strong Mixing

T is strongly mixing if and only if for all f, g in L² orthogonal to the constants, the inner product of (U_T)ⁿ(f) with g converges to 0 as n tends to infinity; equivalently, the Fourier coefficients of sigma_f tend to 0 for every f orthogonal to the constants. By the Riemann-Lebesgue lemma, this holds in particular when the spectral type on the orthogonal complement of the constants is absolutely continuous (for example, Lebesgue spectrum). Continuity (no atoms) alone is not enough: it characterizes weak mixing, not strong mixing.

Discrete Spectrum Systems

If L²(mu) has an orthonormal basis of eigenfunctions of U_T, the system has discrete spectrum. The Halmos-von Neumann theorem states that ergodic systems with discrete spectrum are classified (up to isomorphism) by their eigenvalue group, and that every countable subgroup of the circle arises in this way. Irrational rotations have discrete spectrum with eigenvalue group (exp(2*pi*i*n*alpha) : n in Z), which is dense in the circle.
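
The discrete spectrum of an irrational rotation comes from the characters eₙ(x) = exp(2*pi*i*n*x), which form an orthonormal basis of L² of the circle. A minimal sketch, assuming the illustrative value alpha = sqrt(2) - 1, checks the eigenfunction relation eₙ(T(x)) = exp(2*pi*i*n*alpha) eₙ(x) at a few sample points; it holds exactly because reducing x + alpha mod 1 changes n*(x + alpha) only by an integer.

```python
import cmath
import math

ALPHA = math.sqrt(2) - 1   # illustrative irrational rotation number


def character(n, x):
    """e_n(x) = exp(2 pi i n x), a function on the circle R/Z."""
    return cmath.exp(2j * math.pi * n * x)


def rotation(x, alpha=ALPHA):
    """T(x) = x + alpha mod 1."""
    return (x + alpha) % 1.0


def eigen_defect(n, x, alpha=ALPHA):
    """|e_n(T x) - lambda_n e_n(x)| with lambda_n = exp(2 pi i n alpha); zero up to rounding."""
    lam = cmath.exp(2j * math.pi * n * alpha)
    return abs(character(n, rotation(x, alpha)) - lam * character(n, x))


worst = max(eigen_defect(n, x) for n in (-2, 1, 3, 10) for x in (0.0, 0.137, 0.5, 0.91))
print(worst)   # of the order of floating-point rounding error
```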

8. Invariant Measures: Existence and Unique Ergodicity

A central question is: for a given map T, which probability measures does T preserve? The Krylov-Bogoliubov theorem guarantees existence in the topological setting, while unique ergodicity (exactly one invariant measure) gives the strongest equidistribution results.

Krylov-Bogoliubov Theorem

Krylov-Bogoliubov Existence Theorem

Let X be a compact metrizable topological space and T: X to X a continuous map. Then T admits at least one Borel probability measure mu that is T-invariant (mu(T⁻¹(A)) = mu(A) for all Borel sets A).

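Proof idea: Pick any point x₀ in X and consider the sequence of probability measures muₙ = (1/n) sum from k=0 to n-1 of the Dirac measure at Tᵏ(x₀). Since the space of Borel probability measures on a compact metric space is compact in the weak* topology (Prokhorov's theorem), some subsequence of (muₙ) converges weak* to a probability measure mu. For every continuous f, the sums defining muₙ(f composed with T) and muₙ(f) telescope, so |muₙ(f composed with T) - muₙ(f)| = |f(Tⁿ(x₀)) - f(x₀)|/n, which is at most 2 max|f| / n. Since T is continuous, f composed with T is continuous, so we may pass to the limit along the subsequence and obtain mu(f composed with T) = mu(f) for every continuous f; hence mu is T-invariant.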
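
The invariance step rests on a telescoping bound for the empirical measures muₙ along one orbit: |muₙ(f composed with T) - muₙ(f)| = |f(Tⁿ(x₀)) - f(x₀)|/n, which is at most 2 max|f| / n. A minimal sketch checks this numerically, assuming an illustrative continuous map (a circle rotation) and test function f(x) = cos(2*pi*x); the same bound holds for any continuous T and f.

```python
import math

ALPHA = math.sqrt(2) - 1


def T(x):
    """Illustrative continuous map of the circle R/Z: rotation by ALPHA."""
    return (x + ALPHA) % 1.0


def f(x):
    """Continuous test function on the circle with max |f| = 1."""
    return math.cos(2 * math.pi * x)


def empirical_average(g, x0, n):
    """mu_n(g) = (1/n) sum_{k<n} g(T^k(x0)), integrating g against the empirical measure."""
    total, x = 0.0, x0
    for _ in range(n):
        total += g(x)
        x = T(x)
    return total / n


def invariance_gap(n, x0=0.3):
    """|mu_n(f o T) - mu_n(f)|, which telescopes to |f(T^n x0) - f(x0)| / n."""
    return abs(empirical_average(lambda x: f(T(x)), x0, n) - empirical_average(f, x0, n))


for n in (10, 100, 1000):
    print(n, invariance_gap(n), 2 / n)   # the gap never exceeds 2 max|f| / n
```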

Unique Ergodicity

Definition

A continuous map T: X to X on a compact metric space is uniquely ergodic if there is exactly one T-invariant Borel probability measure mu.

Unique ergodicity implies strong equidistribution: for every continuous function f on X and every point x in X:

(1/n) sum from k=0 to n-1 of f(Tᵏ(x)) converges to integral of f d mu

Note: this convergence is for ALL x (not just mu-almost every x), which is much stronger than what Birkhoff's theorem guarantees. This is what makes unique ergodicity particularly powerful.

Weyl's Equidistribution Theorem

Weyl's Equidistribution Theorem (1916) is the most celebrated instance of unique ergodicity. It predates the general theory but is now understood as a consequence of it.

Weyl Equidistribution Theorem

If alpha is irrational, the sequence n*alpha mod 1 (n = 1, 2, 3, ...) is equidistributed in [0,1). Equivalently, for every Riemann integrable function f on [0,1):

(1/N) sum from n=1 to N of f(n*alpha mod 1) converges to integral from 0 to 1 of f(x) dx

In ergodic terms: when alpha is irrational, the rotation T(x) = x + alpha mod 1 is uniquely ergodic, its only invariant probability measure being Lebesgue measure mu. Weyl's original proof uses character estimates: it suffices to verify equidistribution for f(x) = exp(2*pi*i*k*x) for each non-zero integer k. For such f the sum is a geometric series whose absolute value is at most 1/|sin(pi*k*alpha)| for every N, so the average converges to 0, which is the integral of f.
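
Weyl's criterion can be checked numerically. Summing the geometric series gives |(1/N) sum from n=1 to N of exp(2*pi*i*k*n*alpha)| at most 1/(N |sin(pi*k*alpha)|) for every non-zero k. A minimal sketch, assuming the illustrative value alpha = sqrt(2) - 1, compares the computed averages with this bound.

```python
import cmath
import math

ALPHA = math.sqrt(2) - 1   # illustrative irrational alpha


def weyl_average(k, N, alpha=ALPHA):
    """(1/N) sum_{n=1}^{N} exp(2 pi i k n alpha), the Weyl average for the character e_k."""
    return sum(cmath.exp(2j * math.pi * k * n * alpha) for n in range(1, N + 1)) / N


def geometric_bound(k, N, alpha=ALPHA):
    """Bound 1 / (N |sin(pi k alpha)|) obtained from the closed form of the geometric series."""
    return 1.0 / (N * abs(math.sin(math.pi * k * alpha)))


for k in (1, 2, 5):
    for N in (100, 10_000):
        print(k, N, abs(weyl_average(k, N)), geometric_bound(k, N))   # average <= bound
```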

Polynomial Sequences and Weyl's Theorem

Weyl also proved equidistribution for polynomial sequences. If p(n) is a polynomial with at least one irrational coefficient (other than the constant term), then p(n) mod 1 is equidistributed in [0,1). For example, n²*alpha mod 1 for irrational alpha is equidistributed. This is proved by induction on degree using the van der Corput differencing lemma.

9. Topological Dynamics: Minimality, Proximality, and Ellis Semigroups

Topological dynamics studies continuous maps on compact metric (or topological) spaces without necessarily fixing a measure. The analogues of ergodic-theoretic concepts take a topological form. Many of the deepest results in ergodic theory draw on the interplay between the topological and measure-theoretic perspectives.

Minimality

Definition

A topological dynamical system (X, T) is minimal if every orbit (Tⁿ(x) : n in N) is dense in X. Equivalently, X has no non-empty proper closed T-invariant subsets.

Minimality is the topological analogue of ergodicity. Every compact dynamical system contains a minimal subsystem (by Zorn's lemma). The irrational rotation T(x) = x + alpha mod 1 on the circle is minimal when alpha is irrational: every orbit is dense. A rational rotation T(x) = x + p/q is not minimal: every orbit is a finite set.

Proximal Pairs and Distal Systems

Two points x, y in X are proximal if their orbits come arbitrarily close: inf over n in N of d(Tⁿ(x), Tⁿ(y)) = 0. They are distal if inf over n of d(Tⁿ(x), Tⁿ(y)) greater than 0. A system is distal if every pair of distinct points is distal.

Distal systems have particularly rich structure. Furstenberg's structure theorem (1963) states that every minimal distal system can be built from the trivial one-point system by a (possibly transfinite) tower of isometric extensions, with inverse limits taken at limit stages. Isometric rotations and nilrotations (dynamics on nilmanifolds) are examples of distal systems. This structure theorem is a forerunner of the more general Host-Kra structure theorem for characteristic factors.

The Ellis Semigroup

For a compact dynamical system (X, T), the Ellis semigroup E(X, T) is the closure of the set of maps (Tⁿ : n in N) in X^X equipped with the product topology. It is a compact semigroup under composition.

The algebraic structure of E(X, T) encodes deep dynamical properties. For example, the system is distal if and only if E(X, T) is a group. The minimal ideals of E(X, T) and their idempotents are used in topological approaches to multiple recurrence and have connections to combinatorial number theory (IP sets, central sets).

Nilsystems and the Host-Kra Structure Theorem

Nilsystems are dynamical systems of the form (G/Gamma, T) where G is a nilpotent Lie group, Gamma is a cocompact lattice, and T is translation by a fixed element. They generalize rotations (which correspond to G = R) and play the role of the characteristic factors for the Gowers uniformity norms.

Host-Kra Structure Theorem

For an ergodic system (X, T) and k greater than or equal to 1, the characteristic factor for the multiple ergodic averages (1/N) sum from n=1 to N of f₁(Tⁿ(x)) times f₂(T²ⁿ(x)) times ... times fₖ(Tᵏⁿ(x)) is an inverse limit of (k-1)-step nilsystems. This structural result was proved by Host and Kra (2005) and independently by Ziegler (2007), and it yields the L² (norm) convergence of these averages. Tao (2008) later proved norm convergence of multiple ergodic averages for several commuting transformations by different methods.

10. Applications of Ergodic Theory

Ergodic theory's reach extends far beyond abstract dynamical systems. Its methods have transformed statistical mechanics, number theory, and information theory.

Statistical Mechanics: The Ergodic Hypothesis

In classical statistical mechanics, a system of N particles occupies a point in a 6N-dimensional phase space. The energy constraint E = const. defines a hypersurface (energy shell). Boltzmann's ergodic hypothesis (1871) asserted that a single gas trajectory eventually passes through every point on the energy shell — that the time average of any observable equals its microcanonical (phase space) average.

Mathematical Formulation

The Hamiltonian flow preserves the Liouville measure on phase space (by Liouville's theorem). If this flow is ergodic with respect to the microcanonical measure on the energy surface, then Birkhoff's theorem gives: for almost every initial condition, the time average of any L¹ observable f equals its microcanonical average. This is the modern mathematical content of the ergodic hypothesis.

In practice, proving ergodicity for realistic Hamiltonian systems is extraordinarily difficult. Sinai (1970) proved ergodicity for dispersing billiards, a class that includes two hard disks on a torus; extending this to systems of many hard balls (the Boltzmann-Sinai ergodic hypothesis) took decades of further work by Chernov, Haskell, Simanyi, Szasz, and others. The KAM theorem (Kolmogorov-Arnold-Moser) shows that near-integrable systems are emphatically NOT ergodic: they have a positive measure set of invariant tori.

Number Theory: Furstenberg's Proof of Szemeredi's Theorem

Szemeredi's Theorem (1975)

Every subset A of the integers with positive upper density (lim sup of |A intersect (1,...,N)| / N greater than 0) contains arbitrarily long arithmetic progressions.

Furstenberg's 1977 proof proceeds in two steps. First, the Furstenberg Correspondence Principle:

Furstenberg Correspondence Principle

Given A with positive upper density d*(A) greater than 0, there exists a measure-preserving system (X, B, mu, T) and a set E in B with mu(E) = d*(A) such that, for every k and all integers n₁, ..., nₖ, the upper density of A intersect (A - n₁) intersect ... intersect (A - nₖ) is at least mu(E intersect T^(-n₁)(E) intersect ... intersect T^(-nₖ)(E)).

The second step is the Furstenberg Multiple Recurrence Theorem:

Furstenberg Multiple Recurrence Theorem

For any measure-preserving system (X, B, mu, T), any E in B with mu(E) greater than 0, and any positive integer r, there exists n greater than 0 such that:

mu(E intersect T⁻ⁿ(E) intersect T⁻²ⁿ(E) intersect ... intersect T⁻⁽ʳ⁻¹⁾ⁿ(E)) greater than 0

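This is a generalization of Poincare recurrence to multiple times. Combined with the correspondence principle (applied with the shifts n, 2n, ..., (r-1)n), it shows that A intersect (A - n) intersect ... intersect (A - (r-1)n) has positive upper density, so it contains some a, and then a, a+n, ..., a+(r-1)n all lie in A: this is Szemeredi's theorem. The proof of multiple recurrence uses the structure theory of ergodic systems (factors, compact extensions, and weak mixing extensions) in an intricate induction up a tower of factors.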

Information Theory and the Shannon-McMillan-Breiman Theorem

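The Shannon-McMillan-Breiman theorem extends Shannon's asymptotic equipartition property, the engine behind his source coding theorem, from independent sources to stationary ergodic processes, with almost-everywhere convergence. It shows that the information content per symbol of a typical long output converges to the entropy rate.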

Shannon-McMillan-Breiman Theorem

Let T be an ergodic measure-preserving transformation and P a finite measurable partition (if P is generating, then h(T, P) = h(T)). Let Pⁿ(x) denote the atom of the partition P join T⁻¹P join ... join T⁻⁽ⁿ⁻¹⁾P containing x. Then:

minus (1/n) log mu(Pⁿ(x)) converges to h(T, P) almost everywhere

This says: the typical atom of the n-fold join partition has measure approximately exp(minus n * h(T,P)). In information theory language: a source with entropy h has roughly exp(n*h) typical sequences of length n, each with probability roughly exp(minus n*h). This is the Asymptotic Equipartition Property, fundamental to data compression.
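
A minimal sketch of the theorem for a Bernoulli shift, assuming an illustrative vector p and the time-zero partition P. Here Pⁿ(x) is the cylinder fixed by x₀, ..., xₙ₋₁, so -(1/n) log mu(Pⁿ(x)) is an average of -log p[xₖ]. The code sums logarithms instead of multiplying probabilities, because the product itself underflows to 0.0 for long words.

```python
import math
import random


def entropy(p):
    """H(p) = -sum p_i log p_i (natural log), which equals h(T, P) for B(p)."""
    return -sum(q * math.log(q) for q in p if q > 0)


def aep_estimate(p, n, seed=0):
    """-(1/n) log mu(P^n(x)) for a random point x of the Bernoulli shift B(p).

    mu(P^n(x)) = p[x_0] * ... * p[x_{n-1}], so its log is a sum of log p[x_k].
    """
    rng = random.Random(seed)
    word = rng.choices(range(len(p)), weights=p, k=n)
    return -sum(math.log(p[s]) for s in word) / n


p = [0.5, 0.3, 0.2]
print(entropy(p))                 # about 1.0297
for n in (100, 10_000, 200_000):
    print(n, aep_estimate(p, n))  # approaches entropy(p) as n grows
```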

Continued Fractions and the Gauss Map

The Gauss map T: (0,1) to [0,1) defined by T(x) = (1/x) mod 1 (i.e., the fractional part of 1/x) is the ergodic-theoretic engine behind continued fractions. The unique T-invariant absolutely continuous probability measure is the Gauss measure:

d mu_G(x) = (1 / (log 2)) times (1 / (1 + x)) dx

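The Gauss map is ergodic with respect to mu_G. If x = [a₁, a₂, ...] is the continued fraction expansion of x, then a₁(x) = floor(1/x) and aₙ₊₁(x) = a₁(Tⁿ(x)). So Birkhoff's theorem applied to f(x) = log a₁(x) = log floor(1/x) shows that (1/n) sum from k=1 to n of log aₖ(x) converges mu_G-almost everywhere to the integral of f d mu_G. Since mu_G and Lebesgue measure have the same null sets, the geometric mean (a₁ a₂ ... aₙ)^(1/n) of the partial quotients converges for Lebesgue-almost every x to Khinchin's constant K = product over k=1 to infinity of (1 + 1/(k*(k+2)))^(log k / log 2), approximately 2.6854. This is Khinchin's theorem, a beautiful application of ergodic theory to number theory.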
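Here is a minimal numerical check of Khinchin's theorem under stated assumptions. Iterating the Gauss map in floating point loses precision quickly, so this sketch runs it in exact integer arithmetic (which is just the Euclidean algorithm). It uses a pseudo-random rational x = p/q with a 2000-digit denominator as a stand-in for a Lebesgue-typical point. Its few thousand partial quotients should have a geometric mean near K, though not equal to it. The truncated product for K is included for comparison.

```python
import math
import random

def partial_quotients(p, q):
    """Continued fraction digits a_1, a_2, ... of x = p/q with 0 < p < q.

    Each loop step is one application of the Gauss map in exact integer arithmetic:
    1/x = q/p = a + r/p with a = floor(1/x), and T(x) = r/p (the Euclidean algorithm)."""
    digits = []
    while p:
        a, r = divmod(q, p)
        digits.append(a)
        p, q = r, p
    return digits

def khinchin_truncated(terms=10**6):
    """Product over 1 <= k < terms of (1 + 1/(k(k+2)))^(log k / log 2), summed as logs.
    The k = 1 factor equals 1 because log 1 = 0, so the sum starts at k = 2."""
    log_k = math.fsum(math.log2(k) * math.log1p(1.0 / (k * (k + 2))) for k in range(2, terms))
    return math.exp(log_k)

rng = random.Random(1)
q = 10**2000
p = rng.randrange(1, q)   # x = p/q: a hypothetical stand-in for a Lebesgue-typical point
a = partial_quotients(p, q)
geometric_mean = math.exp(math.fsum(math.log(d) for d in a) / len(a))
print(len(a), geometric_mean, khinchin_truncated())
```

The agreement is only statistical. A single expansion of a few thousand terms typically lands within a few percent of K.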

Practice Problems with Solutions

Problem 1

Show that the doubling map T(x) = 2x mod 1 on ([0,1), Lebesgue) is measure-preserving. Then determine whether T is ergodic, and find h(T).


Measure preservation: For any interval [a,b) in [0,1) of length L = b-a, we have T⁻¹([a,b)) = [a/2, b/2) union [(a+1)/2, (b+1)/2). These two pieces are disjoint and each has Lebesgue measure L/2, so the preimage has measure L = mu([a,b)). The intervals [a,b) form a pi-system generating the Borel sigma-algebra, and the sets A with mu(T⁻¹(A)) = mu(A) form a lambda-system, so by the pi-lambda theorem T is measure-preserving.

Ergodicity: Suppose A is a T-invariant set with mu(A) greater than 0. Using the Fourier characterization: A is invariant means 1_A composed with T = 1_A a.e., so the Fourier coefficients satisfy hat(1_A)(k) = hat(1_A)(2k) for all k. For k not equal to 0, iterating gives hat(1_A)(k) = hat(1_A)(2ⁿk) for all n. Since the Fourier coefficients of an L² function tend to 0, we get hat(1_A)(k) = 0 for all k not equal to 0. Hence 1_A is constant a.e., so mu(A) is 0 or 1. This proves T is ergodic.
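Birkhoff's theorem combined with this ergodicity result says that for Lebesgue-almost every x, the fraction of binary digits of x equal to 1 tends to 1/2. This works because f = 1_[1/2,1) reads off the first binary digit and T shifts the digits. The sketch below uses pseudo-random bits as a stand-in for a typical x. It contrasts this with the exceptional periodic point 1/7 = 0.001001... (binary), whose average is 1/3. It also shows why the doubling map should not be iterated naively in floating point: in CPython, random.random() returns multiples of 2^-53, so every such orbit collapses to the fixed point 0 within 53 steps.

```python
import random

def digit_frequency_average(bits):
    """Birkhoff average (1/N) sum_{k<N} f(T^k x) for the doubling map T(x) = 2x mod 1 and
    f = indicator of [1/2, 1), where bits = [b1, b2, ..., bN] are the binary digits of x.
    T shifts binary digits, T^k(x) = 0.b(k+1) b(k+2) ..., so f(T^k x) = b(k+1)."""
    return sum(bits) / len(bits)

def float_collapse_steps(x):
    """Iterate x -> 2x mod 1 in double precision; count the steps until x becomes 0.0."""
    steps = 0
    while x != 0.0:
        x = (2.0 * x) % 1.0
        steps += 1
    return steps

rng = random.Random(0)
typical_bits = [rng.getrandbits(1) for _ in range(100_000)]  # stand-in for a typical x
print(digit_frequency_average(typical_bits))       # near mu([1/2, 1)) = 1/2
print(digit_frequency_average([0, 0, 1] * 1000))   # x = 1/7: average 1/3, an exceptional point
print(float_collapse_steps(rng.random()))          # at most 53 in CPython
```

Working directly with the digit sequence avoids the floating-point collapse entirely, because the doubling map is exactly the shift on binary digits.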

Entropy: The partition P = ([0,1/2), [1/2,1)) is a generating partition: the sequence of P-atoms visited by x, T(x), T²(x), ... is exactly the binary expansion of x. By Sinai's theorem, h(T) = h(T, P). The atoms of the n-fold join P join T⁻¹P join ... join T^(-(n-1))P are the 2ⁿ dyadic intervals of length 2⁻ⁿ, each with measure 2⁻ⁿ, so the entropy of the n-fold join is minus 2ⁿ times (2⁻ⁿ log 2⁻ⁿ) = n log 2. Thus h(T, P) = lim (1/n) times n log 2 = log 2, and so h(T) = log 2.
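The n-fold join can also be built mechanically from the preimage formula T⁻¹([a,b)) = [a/2, b/2) union [(a+1)/2, (b+1)/2). The sketch below does this in exact rational arithmetic and confirms that the join has 2ⁿ atoms and entropy n log 2. The helper names are hypothetical. Refining by interval pieces rather than whole atoms is a shortcut that is valid for this example only, because each atom of the join is a dyadic interval lying inside a single piece.

```python
import math
from fractions import Fraction

def preimage(interval):
    """T^-1([a, b)) for the doubling map: [a/2, b/2) and [(a+1)/2, (b+1)/2)."""
    a, b = interval
    return [(a / 2, b / 2), ((a + 1) / 2, (b + 1) / 2)]

def refine(left, right):
    """Nonempty pairwise intersections of two lists of half-open intervals."""
    out = []
    for a1, b1 in left:
        for a2, b2 in right:
            lo, hi = max(a1, a2), min(b1, b2)
            if lo < hi:
                out.append((lo, hi))
    return out

def join_atoms(n):
    """Atoms of P v T^-1 P v ... v T^-(n-1) P for P = {[0, 1/2), [1/2, 1)}.

    Each atom of T^-k P is kept as its list of interval pieces; refining by pieces
    is valid here only because every join atom lies inside a single piece."""
    P = [(Fraction(0), Fraction(1, 2)), (Fraction(1, 2), Fraction(1))]
    atoms, pieces = P, P
    for _ in range(n - 1):
        pieces = [piece for iv in pieces for piece in preimage(iv)]  # pieces of the next T^-k P
        atoms = refine(atoms, pieces)
    return atoms

def entropy_of(atoms):
    """H = -sum mu(atom) log mu(atom) for Lebesgue measure, in nats."""
    return -sum(float(b - a) * math.log(float(b - a)) for a, b in atoms)

for n in range(1, 7):
    atoms = join_atoms(n)
    print(n, len(atoms), entropy_of(atoms) / n, math.log(2))
```

The ratio H/n is already exactly log 2 for every n here, so the limit defining h(T, P) is reached immediately. For a general partition, H/n typically only approaches its limit from above.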

Problem 2

Let alpha be irrational and T(x) = x + alpha mod 1. Prove that T is ergodic using the Fourier characterization. Deduce that (n*alpha mod 1) is equidistributed.


Ergodicity via Fourier: Suppose f in L²([0,1)) satisfies f composed with T = f a.e. Expand f(x) = sum over k in Z of cₖ exp(2*pi*i*k*x). Then (f composed with T)(x) = f(x + alpha) = sum over k of cₖ exp(2*pi*i*k*alpha) exp(2*pi*i*k*x). The condition f composed with T = f gives cₖ exp(2*pi*i*k*alpha) = cₖ for all k. For k not equal to 0, since alpha is irrational, exp(2*pi*i*k*alpha) is not equal to 1, so cₖ = 0. Hence f(x) = c₀ is constant a.e., proving ergodicity.

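Equidistribution: The same Fourier argument, applied to measures, shows that T is uniquely ergodic. Let nu be a T-invariant Borel probability measure with Fourier coefficients hat(nu)(k) = integral of exp(minus 2*pi*i*k*x) d nu(x). Invariance gives hat(nu)(k) = exp(minus 2*pi*i*k*alpha) hat(nu)(k). So hat(nu)(k) = 0 for k not equal to 0, and nu is Lebesgue measure. By unique ergodicity, for every continuous f on the circle, (1/N) sum from n=0 to N-1 of f(Tⁿ(x)) converges to integral of f dx for all x, not just almost all x. Take x = 0, so that Tⁿ(0) = n*alpha mod 1. For any epsilon greater than 0, choose continuous g and h with g(x) less than or equal to 1_[a,b](x) less than or equal to h(x) for all x, and integral of (h - g) less than epsilon. Applying the limit to g and h squeezes (1/N) times #(n in (0,...,N-1) : n*alpha mod 1 in [a,b]) to within epsilon of b-a for all large N. Hence it converges to b-a. This is equidistribution.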
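Here is a quick numerical look at equidistribution, as a sketch. It counts how often n*alpha mod 1 lands in [a,b] = [0.2, 0.5] (an illustrative choice) and compares the result with b - a. One caveat is built in: a double such as sqrt(2) is really a rational number whose denominator divides 2^52, so the experiment only imitates an irrational rotation for N much smaller than that. A rational alpha is included to show what goes wrong without irrationality.

```python
import math

def visit_frequency(alpha, a, b, N):
    """Fraction of n in {0, ..., N-1} with n*alpha mod 1 in [a, b]."""
    return sum(1 for n in range(N) if a <= (n * alpha) % 1.0 <= b) / N

# sqrt(2) stored as a double is technically rational, but for N far below its
# denominator the orbit behaves like that of an irrational rotation.
for N in (100, 10_000, 1_000_000):
    print(N, visit_frequency(math.sqrt(2), 0.2, 0.5, N))   # tends to b - a = 0.3

# A rational alpha is not equidistributed: n/4 mod 1 only takes the values 0, 1/4, 1/2, 3/4.
print(visit_frequency(0.25, 0.2, 0.5, 1000))   # 0.5, not 0.3
```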

Problem 3

Prove the Poincare Recurrence Theorem. That is: if T is measure-preserving on a finite measure space and mu(A) greater than 0, then almost every x in A returns to A infinitely often.


Step 1 (return at least once): Let B = (x in A : Tⁿ(x) not in A for all n greater than or equal to 1). Equivalently, B = A minus the union over n greater than or equal to 1 of T⁻ⁿ(A), so B is measurable. We show mu(B) = 0. The sets B, T⁻¹(B), T⁻²(B), ... are pairwise disjoint. Indeed, suppose x is in T⁻ⁿ(B) intersect T⁻ᵐ(B) with n less than m. Then Tⁿ(x) and Tᵐ(x) are both in B. But Tᵐ(x) = T^(m-n)(Tⁿ(x)), and since Tⁿ(x) is in B, it never returns to A, so T^(m-n)(Tⁿ(x)) is not in A. This contradicts Tᵐ(x) being in B, which is a subset of A. So the sets are pairwise disjoint, and each has measure mu(B) (since T is measure-preserving). Infinitely many disjoint sets of equal measure can fit in a space of finite measure only if that common measure is 0, so mu(B) = 0.

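Step 2 (infinitely often): Suppose x in A visits A only finitely often, and let m greater than or equal to 0 be the time of its last visit. Then T^m(x) is in A but Tⁿ(T^m(x)) is not in A for all n greater than or equal to 1, so T^m(x) is in B, that is, x is in T^(-m)(B). Hence the set of points of A that return to A only finitely often is contained in the union over m greater than or equal to 0 of T^(-m)(B). Each T^(-m)(B) has measure mu(B) = 0, so this countable union has measure 0.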
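Poincare recurrence says almost every point of A returns to A. Kac's Lemma (listed in the summary table below) adds that for ergodic T the mean first-return time over A is 1/mu(A). The sketch below checks both for the rotation by alpha = sqrt(2) and A = [0, 0.1), which are illustrative choices. It averages the first-return time over a fine grid of starting points in A. Here only three distinct return times occur, and the average comes out at 1/mu(A) = 10 up to grid error.

```python
import math

def first_return_time(x, alpha, eps, max_steps=10_000):
    """Smallest n >= 1 with T^n(x) in A = [0, eps) for the rotation T(x) = x + alpha mod 1."""
    for n in range(1, max_steps + 1):
        if (x + n * alpha) % 1.0 < eps:
            return n
    return None  # no return observed within max_steps

alpha, eps, M = math.sqrt(2), 0.1, 20_000
grid = [eps * (j + 0.5) / M for j in range(M)]   # midpoints of a uniform grid on A
times = [first_return_time(x, alpha, eps) for x in grid]
print(sorted(set(times)))        # [5, 12, 17]: every grid point returns
print(sum(times) / M, 1 / eps)   # Kac's lemma: mean return time over A is 1/mu(A) = 10
```

Points near the start of A come back quickly (time 5), most others take 12 steps, and a thin strip takes 17. Averaged with the right weights, these give exactly 10.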

Problem 4

Let X be a separable metric space with its Borel sigma-algebra B, and let (X, B, mu, T) be an ergodic measure-preserving system. Use Birkhoff's theorem to show that for mu-almost every x, the orbit (Tⁿ(x) : n = 0, 1, 2, ...) is dense in the support of mu.


Let U be any open set in X with mu(U) greater than 0. By Birkhoff's theorem applied to f = 1_U (the indicator of U), together with ergodicity: for mu-almost every x, (1/n) sum from k=0 to n-1 of 1_U(Tᵏ(x)) converges to integral of 1_U d mu = mu(U) greater than 0.

Since this limit is positive, infinitely many of the points Tᵏ(x) must lie in U; finitely many visits would force the averages to tend to 0. So for each fixed open set U of positive measure, the orbit of mu-almost every x visits U.

Now take a countable base (Uᵢ) for the topology of X (a separable metric space has a countable base), and keep those Uᵢ that meet the support of mu. Each of these has mu(Uᵢ) greater than 0 by the definition of the support. For each such Uᵢ, the set of x whose orbit visits Uᵢ has full measure. So does the set of x whose whole orbit stays in the support, since mu(T⁻ᵏ(supp mu)) = 1 for every k. The intersection of these countably many full-measure sets still has full measure. For x in this intersection, the orbit lies in the support of mu and meets every basic open set that meets the support, so it is dense in the support of mu.

Problem 5

Compute the KS entropy of the Bernoulli shift B(1/3, 1/3, 1/3) on a 3-letter alphabet, and of B(1/2, 1/4, 1/4). Are these systems isomorphic?


B(1/3, 1/3, 1/3): The entropy is H(1/3, 1/3, 1/3) = minus 3 times (1/3) times log(1/3) = minus 3 times (1/3) times (minus log 3) = log 3 (using natural log) or log₂(3) bits (using log base 2).

B(1/2, 1/4, 1/4): The entropy is H(1/2, 1/4, 1/4) = minus (1/2) log(1/2) minus 2 times (1/4) log(1/4) = (1/2) log 2 + (1/2) log 4 = (1/2) log 2 + log 2 = (3/2) log 2. In bits: 1/2 + 1 = 3/2 bits.

Are they isomorphic? By Ornstein's theorem, two Bernoulli shifts are isomorphic iff they have the same entropy. We need to check if log 3 = (3/2) log 2, i.e., if 3 = 2^(3/2) = 2*sqrt(2), i.e., if 3 = 2.828..., which is false. So h(B(1/3,1/3,1/3)) = log 3 is not equal to (3/2) log 2 = h(B(1/2,1/4,1/4)), and the two systems are not isomorphic.
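The entropy arithmetic above is easy to check numerically. This sketch assumes only Python's math module. The second pair of distributions, both of entropy log 4, is an added illustration of the other direction of Ornstein's theorem (equal entropy implies isomorphic).

```python
import math

def bernoulli_entropy(p, base=math.e):
    """KS entropy of the Bernoulli shift B(p): h = H(p) = -sum p_i log p_i.
    (The time-zero partition is generating, so Sinai's theorem reduces h to H(p).)"""
    if not math.isclose(sum(p), 1.0):
        raise ValueError("p must be a probability vector")
    return -sum(q * math.log(q, base) for q in p if q > 0)

h1 = bernoulli_entropy([1/3, 1/3, 1/3])
h2 = bernoulli_entropy([1/2, 1/4, 1/4])
print(h1, math.log(3))                              # log 3 nats
print(h2, 1.5 * math.log(2))                        # (3/2) log 2 nats
print(bernoulli_entropy([1/2, 1/4, 1/4], base=2))   # 3/2 bits
print(math.isclose(h1, h2))   # False: different entropies, so not isomorphic

# Equal entropies: B(1/4, 1/4, 1/4, 1/4) and B(1/2, 1/8, 1/8, 1/8, 1/8) both have
# entropy log 4, so by Ornstein's theorem they are isomorphic.
h3 = bernoulli_entropy([1/4] * 4)
h4 = bernoulli_entropy([1/2] + [1/8] * 4)
print(math.isclose(h3, h4))   # True
```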

Problem 6

Let T be a measure-preserving transformation. Show that T is strongly mixing implies T is ergodic. Give an example showing ergodicity does not imply strong mixing.


Mixing implies ergodicity: Suppose T is strongly mixing and A is a T-invariant set (T⁻¹(A) = A a.e.). Then for all n: mu(A intersect T⁻ⁿ(A)) = mu(A intersect A) = mu(A). By strong mixing, mu(A intersect T⁻ⁿ(A)) converges to mu(A)² as n tends to infinity. So mu(A) = mu(A)², which gives mu(A) = 0 or mu(A) = 1. Thus T is ergodic.

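Ergodic but not strongly mixing: The irrational rotation T(x) = x + alpha mod 1 is ergodic (Problem 2). It is not strongly mixing: take A = B = [0, 1/2). Then T⁻ⁿ(B) is the arc [0,1/2) shifted by minus n*alpha mod 1. Two half-circle arcs whose starting points differ by t overlap in length 1/2 minus ||t||, where ||t|| is the distance from t to the nearest integer. So mu(A intersect T⁻ⁿ(B)) = 1/2 minus ||n*alpha||. Since (n*alpha mod 1) is dense, this takes values arbitrarily close to 1/2 (when n*alpha is near an integer) and arbitrarily close to 0 (when n*alpha is near a half-integer). It therefore oscillates and cannot converge to mu(A)mu(B) = 1/4. In fact for any rotation, mu(A intersect T⁻ⁿ(A)) is an almost periodic function of n (its Fourier expansion is a sum of terms exp(2*pi*i*k*n*alpha)), so it does not converge unless A has measure 0 or 1.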
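The contrast between mixing and mere ergodicity is visible numerically. The sketch below evaluates mu(A intersect T⁻ⁿ(A)) for A = [0, 1/2) in two cases. For the rotation by alpha = sqrt(2) (an illustrative choice), T⁻ⁿ(A) is A shifted by n*alpha, and two half-circle arcs offset by t overlap in length 1/2 minus ||t||. For the doubling map, which is strongly mixing, the value is computed exactly by counting on a dyadic grid and equals mu(A)² = 1/4 for every n greater than or equal to 1.

```python
import math

def rotation_correlation(alpha, n):
    """mu(A ∩ T^-n A) for A = [0, 1/2) and T(x) = x + alpha mod 1, via 1/2 - ||n*alpha||."""
    t = n * alpha
    return 0.5 - abs(t - round(t))

def doubling_correlation(n, m=16):
    """mu(A ∩ T^-n A) for A = [0, 1/2) and T(x) = 2x mod 1, for 1 <= n < m.

    Uses the grid x = j / 2^m, where T^n(x) = (j * 2^n mod 2^m) / 2^m exactly. The set
    A ∩ T^-n A is a union of dyadic intervals of length 2^-(n+1) >= 2^-m, so counting
    left endpoints of grid cells gives its Lebesgue measure exactly."""
    size = 1 << m
    # j < size // 2 means x is in A; the condition checks that T^n(x) is in A.
    hits = sum(1 for j in range(size // 2) if (j << n) % size < size // 2)
    return hits / size

alpha = math.sqrt(2)
rot = [rotation_correlation(alpha, n) for n in range(1, 201)]
print(min(rot), max(rot))                                # spread between 0 and 1/2: no limit
print([doubling_correlation(n) for n in range(1, 11)])   # all 0.25 = mu(A)^2
```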

Exam Tips and Common Pitfalls

Distinguish a.e. convergence from convergence for all x

Birkhoff's theorem gives convergence for almost every x. Unique ergodicity (for a continuous map of a compact metric space) gives convergence for ALL x, for every continuous function f. The distinction matters enormously in applications. Irrational rotations are uniquely ergodic, giving all-x equidistribution. The doubling map is ergodic but not uniquely ergodic: the Dirac mass at the fixed point 0, and more generally the uniform measure on any periodic orbit, is also invariant.

Measure-preservation uses preimages, not forward images

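The condition is mu(T⁻¹(A)) = mu(A), not mu(T(A)) = mu(A). For the doubling map, T([0,1/2)) = [0,1) has measure 1, while T⁻¹([0,1/2)) = [0,1/4) union [1/2,3/4) has measure 1/2, as required. For non-invertible maps, T(A) is not even guaranteed to be measurable. When T is invertible with measurable inverse, both conditions are equivalent, but the preimage formulation is primary.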

Entropy requires a generating partition for easy computation

By Sinai's theorem, h(T) = h(T, P) for any generating partition P. Always look for the natural generating partition for a given system (binary digits for the doubling map, cylinder sets for shifts). Computing h(T) as a supremum over all partitions directly is rarely practical.

Ergodicity is about invariant sets, not orbit density

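Ergodicity means there are no non-trivial T-invariant measurable sets. Orbit density is a topological notion (topological transitivity when some orbit is dense, minimality when every orbit is dense), and neither property determines the other. A dense orbit is a countable set, of measure zero when mu has no atoms, so its existence says nothing about ergodicity. For example, the doubling map has dense orbits, yet it is not ergodic for the invariant measure (1/2)(Lebesgue measure + Dirac mass at 0): the set of dyadic rationals in [0,1) is T-invariant and has measure 1/2. Conversely, an ergodic system need not have any orbit dense in X: the identity map on [0,1] is ergodic for the Dirac mass at 0, and every orbit is a single point. What ergodicity does give, on a separable metric space, is that almost every orbit is dense in the support of mu (Problem 4).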

The mixing hierarchy is strict

Know the hierarchy: Bernoulli implies K implies strong mixing implies weak mixing implies ergodic. Know examples separating each level: irrational rotation (ergodic, not weakly mixing), Chacon system (weakly mixing, not strongly mixing), ... Whether every K-system is Bernoulli was a long-standing open problem, resolved negatively by Ornstein and Shields in 1973.

Poincare recurrence requires finite measure

The Poincare Recurrence Theorem fails for infinite measure-preserving systems. The translation T(x) = x + 1 on the real line with Lebesgue measure is measure-preserving, but the set A = [0,1) satisfies Tⁿ(A) = [n, n+1), which is disjoint from A for every n greater than or equal to 1. No point of A ever returns to A. Always check that mu(X) is finite before applying the theorem.


Summary: Key Results at a Glance

Theorem/Concept | Statement (informal) | Due to
Poincare Recurrence | Almost every point returns to any positive-measure set infinitely often | Poincare, 1890
Kac's Lemma | Mean return time to A equals 1/mu(A) for ergodic T | Kac, 1947
Mean Ergodic Theorem | L² Cesaro averages converge to the projection onto invariant functions | Von Neumann, 1932
Pointwise Ergodic Theorem | Time averages converge a.e. for L¹ functions | Birkhoff, 1931
Weyl Equidistribution | (n*alpha mod 1) is equidistributed for irrational alpha | Weyl, 1916
Ornstein's Theorem | Bernoulli shifts classified up to isomorphism by entropy | Ornstein, 1970
Variational Principle | Topological entropy = supremum of metric entropies over invariant measures | Goodwyn et al., 1969-71
Shannon-McMillan-Breiman | Typical atoms have measure exp(minus n * h) for ergodic systems | McMillan/Breiman, 1953/1957
Furstenberg Multiple Recurrence | Positive-measure sets recur along arithmetic progressions; implies Szemeredi | Furstenberg, 1977
Krylov-Bogoliubov | Every continuous map on a compact space has at least one invariant measure | Krylov-Bogoliubov, 1937