Harmonic Analysis: Fourier Series, Transforms, and Singular Integrals

Harmonic analysis is the branch of mathematics that studies the representation of functions through superposition of basic waves. From classical Fourier series on the circle to modern Calderon-Zygmund theory and wavelets, harmonic analysis sits at the crossroads of analysis, PDE theory, and signal processing.

Learning Objectives

After working through this guide you will be able to:

  • State and apply the L^2 theory of Fourier series, Parseval's identity, and the Dirichlet and Fejer convergence theorems
  • Compute Fourier transforms and apply the inversion formula, Plancherel theorem, convolution theorem, and uncertainty principle
  • Work with Schwartz space and tempered distributions, including the distributional Fourier transform and the delta function
  • Prove the weak-type (1,1) bound for the Hardy-Littlewood maximal function and derive strong-type bounds via interpolation
  • Analyze the Hilbert transform and Riesz transforms as prototypical Calderon-Zygmund singular integral operators
  • Apply Littlewood-Paley theory and dyadic decomposition to prove multiplier theorems and characterize Besov and Triebel-Lizorkin spaces
  • Construct wavelets via multiresolution analysis and understand applications to data compression and signal processing
  • State Pontryagin duality for locally compact abelian groups and place classical Fourier analysis in its abstract setting

1. Fourier Series on the Circle: L^2 Theory

The circle group T is identified with the interval (-pi, pi] with endpoints identified, or equivalently with the unit circle in the complex plane. Every function in L^2(T) has a Fourier series expansion in terms of the orthonormal basis of complex exponentials.

Fourier Coefficients and Series

For f in L^2(T), the n-th Fourier coefficient is defined by the inner product of f with the character e^(inx). The resulting series is the Fourier series of f.

Fourier coefficient: c_n(f) = (1/2pi) integral[-pi, pi] f(x) e^(-inx) dx

Fourier series: f(x) ~ sum(n in Z) c_n e^(inx)

Parseval's identity: sum(n in Z) |c_n|^2 = (1/2pi) ||f||_2^2

Parseval's Identity and L^2 Completeness

The system (e^(inx)) forms a complete orthonormal basis for L^2(T) with respect to the normalized inner product <f, g> = (1/2pi) integral[-pi, pi] f(x) g-bar(x) dx. Completeness means that the partial sums S_N(f) = sum from n = -N to N of c_n e^(inx) converge to f in the L^2 norm. Parseval's identity, which says that the normalized L^2 norm of f equals the l^2 norm of its coefficient sequence, is equivalent to this completeness statement. The map f to (c_n) is an isometric isomorphism from L^2(T), with the normalized norm, onto l^2(Z).

Key Insight: Orthonormality

The inner product of e^(imx) and e^(inx) in L^2(T) equals 1 when m = n and 0 otherwise. This orthogonality is what makes the coefficient formula work: c_n is extracted by integrating f against e^(-inx), killing all other frequency components.

Dirichlet Kernel and Pointwise Convergence

The N-th partial sum S_N(f)(x) equals the convolution of f with the Dirichlet kernel D_N(x) = sum from n = -N to N of e^(inx), where convolution on T is taken with respect to the normalized measure dx/(2pi). By summing the geometric series, D_N(x) = sin((N + 1/2)x) divided by sin(x/2). The Dirichlet kernel has unit integral with respect to dx/(2pi), but its L^1 norm grows like log(N), which is why pointwise convergence can fail even for continuous functions. The Riemann-Lebesgue lemma says c_n(f) tends to 0 as |n| tends to infinity for any f in L^1(T).
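The closed form and the logarithmic growth can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the midpoint-rule quadrature are illustrative choices, not part of the theory.

```python
import math

def dirichlet_sum(N, x):
    """D_N(x) as the real sum 1 + 2 * sum_{n=1}^N cos(n x)."""
    return 1.0 + 2.0 * sum(math.cos(n * x) for n in range(1, N + 1))

def dirichlet_closed(N, x):
    """Closed form sin((N + 1/2) x) / sin(x / 2), valid when sin(x/2) != 0."""
    return math.sin((N + 0.5) * x) / math.sin(0.5 * x)

def dirichlet_l1_norm(N, samples=20000):
    """Midpoint-rule estimate of (1/2pi) * integral over [-pi, pi] of |D_N|."""
    h = 2.0 * math.pi / samples
    total = 0.0
    for j in range(samples):
        x = -math.pi + (j + 0.5) * h   # midpoints never hit x = 0
        total += abs(dirichlet_closed(N, x))
    return total * h / (2.0 * math.pi)

# Agreement of the two formulas at a few sample points.
closed_form_error = max(
    abs(dirichlet_sum(7, x) - dirichlet_closed(7, x)) for x in (0.3, 1.1, 2.9)
)

# L^1 norms for N = 4, 16, 64, 256: each quadrupling of N adds roughly
# the same amount, which is the signature of logarithmic growth.
l1_norms = {N: dirichlet_l1_norm(N) for N in (4, 16, 64, 256)}
```

Printing `l1_norms` should show increments that are nearly constant from one quadrupling to the next, consistent with growth proportional to log(N).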

Fejer's Theorem and Cesaro Summability

While partial sums may diverge pointwise, the Cesaro means sigma_N(f) = (1/(N+1)) times sum from k = 0 to N of S_k(f) behave much better. These averages equal the convolution of f with the Fejer kernel F_N(x) = (1/(N+1)) times sum from k = 0 to N of D_k(x) = (1/(N+1)) times (sin((N+1)x/2) / sin(x/2))^2, which is non-negative and has unit integral. Fejer's theorem states:

Fejer's Theorem

  • If f is in L^1(T), then sigma_N(f) converges to f in L^1 norm.
  • If f is continuous, then sigma_N(f) converges to f uniformly.
  • At any point x where f has a left and right limit, sigma_N(f)(x) converges to the average (f(x+) + f(x-))/2.

Fejer's theorem has a remarkable corollary: since each sigma_N(f) is a trigonometric polynomial, the trigonometric polynomials are dense in C(T) in the uniform norm, and hence dense in L^p(T) for all 1 less than or equal to p less than infinity. This is the periodic analog of the Weierstrass approximation theorem.
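The difference between partial sums and Cesaro means is easy to see on a jump discontinuity. The sketch below uses the square wave sign(x) on (-pi, pi), whose Fourier series is (4/pi) times the sum over odd k of sin(kx)/k; the choice of test function, N, and grid are assumptions made for illustration.

```python
import math

def partial_sum(N, x):
    """S_N of the square wave sign(x): (4/pi) * sum over odd k <= N of sin(kx)/k."""
    return (4.0 / math.pi) * sum(math.sin(k * x) / k for k in range(1, N + 1, 2))

def cesaro_mean(N, x):
    """sigma_N = average of S_0..S_N, so mode k is weighted by 1 - k/(N+1)."""
    return (4.0 / math.pi) * sum(
        (1.0 - k / (N + 1)) * math.sin(k * x) / k for k in range(1, N + 1, 2)
    )

N = 49
grid = [math.pi * j / 2000 for j in range(1, 2000)]   # points in (0, pi)
max_partial = max(partial_sum(N, x) for x in grid)    # overshoots 1 (Gibbs)
max_cesaro = max(cesaro_mean(N, x) for x in grid)     # stays at or below 1
```

The partial sums overshoot the value 1 near the jump (the Gibbs phenomenon), while the Cesaro means never exceed sup |f| = 1, because they are averages of f against a non-negative kernel of unit integral.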

Pointwise Convergence: Dini and Jordan Conditions

The Fourier series of f converges to f(x) pointwise at x under various local regularity conditions. The Dini condition requires that the integral from 0 to delta of |f(x+t) + f(x-t) - 2f(x)| / t dt be finite. The Jordan condition applies when f is of bounded variation in a neighborhood of x: the partial sums converge to (f(x+) + f(x-))/2. However, Kolmogorov constructed an L^1 function whose Fourier series diverges everywhere — pointwise convergence for L^1 functions is not automatic.

Deep Result: Carleson's Theorem (1966)

For any f in L^2(T), the Fourier series of f converges to f(x) for almost every x. This was a landmark result, later extended by Hunt (1968) to L^p for any p greater than 1. The proof establishes a weak-type bound for the maximal partial-sum operator sup over N of |S_N(f)(x)| (the Carleson operator) through a delicate decomposition in both space and frequency.

2. Fourier Transform on R

The Fourier transform extends the idea of frequency decomposition from periodic functions to functions on the entire real line. It converts a function of time (or position) into a function of frequency, and underpins virtually all of modern signal processing and PDE theory.

Definition and Basic Properties

Fourier transform: f-hat(xi) = integral[-inf, inf] f(x) e^(-2pi i x xi) dx

Inversion formula: f(x) = integral[-inf, inf] f-hat(xi) e^(2pi i x xi) d-xi

The Fourier transform is well defined for f in L^1(R) and the result f-hat is continuous and vanishes at infinity (Riemann-Lebesgue lemma). The inversion formula holds when both f and f-hat are in L^1. Key algebraic properties: the Fourier transform converts differentiation to multiplication by 2pi i xi, and converts convolution to pointwise multiplication.

Convolution Theorem

The convolution of two functions f and g is defined as the integral of f(y) times g(x-y) over all y. The convolution theorem states that the Fourier transform of a convolution is the pointwise product of the transforms:

(f * g)-hat(xi) = f-hat(xi) times g-hat(xi)

Equivalently: (fg)-hat = f-hat * g-hat

The convolution theorem is fundamental in signal processing: passing a signal through a linear time-invariant filter corresponds to multiplying in the frequency domain. The transfer function of the filter is the Fourier transform of its impulse response.
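A finite analog makes the theorem concrete. The sketch below works on the cyclic group Z/NZ, where the discrete Fourier transform plays the role of f-hat and cyclic convolution plays the role of f * g; the sample signal and the filter taps are made-up values used only for illustration.

```python
import cmath

def dft(a):
    """Discrete Fourier transform on Z/NZ: A[k] = sum_n a[n] e^{-2 pi i k n / N}."""
    N = len(a)
    return [
        sum(a[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

def cyclic_convolution(a, b):
    """(a * b)[n] = sum_m a[m] b[(n - m) mod N]."""
    N = len(a)
    return [sum(a[m] * b[(n - m) % N] for m in range(N)) for n in range(N)]

signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 0.0, 3.0]
impulse_response = [0.5, 0.25, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical filter

# Transform of the convolution versus product of the transforms.
lhs = dft(cyclic_convolution(signal, impulse_response))
rhs = [s * h for s, h in zip(dft(signal), dft(impulse_response))]
max_gap = max(abs(x - y) for x, y in zip(lhs, rhs))
```

Here `dft(impulse_response)` is the discrete transfer function of the filter, and `max_gap` should vanish up to rounding error.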

Plancherel Theorem

For f in L^1 intersect L^2, the Fourier transform satisfies the isometry property: the L^2 norm of f-hat equals the L^2 norm of f. The Plancherel theorem asserts that the Fourier transform extends uniquely to a unitary automorphism of L^2(R).

Plancherel's Theorem

||f-hat||_2 = ||f||_2 for all f in L^2(R)

More generally: integral f-hat(xi) g-hat-bar(xi) d-xi = integral f(x) g-bar(x) dx

Heisenberg Uncertainty Principle

A function and its Fourier transform cannot both be highly concentrated. The mathematical uncertainty principle quantifies this trade-off precisely.

Uncertainty Principle (Heisenberg-Weyl Inequality)

||x f(x)||_2 times ||xi f-hat(xi)||_2 is greater than or equal to (1/4pi) ||f||_2^2

Equality holds if and only if f is a Gaussian: f(x) = C e^(-ax^2) for some constant C and positive constant a. In quantum mechanics, this inequality corresponds to Heisenberg's uncertainty principle for position and momentum.

The proof uses integration by parts and the Cauchy-Schwarz inequality. The key step is that x times d/dx of |f|^2 integrates to -||f||_2^2, which follows from the divergence theorem (or integration by parts).
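The inequality can be tested numerically without computing any Fourier transform, by using ||xi f-hat||_2 = ||f'||_2 / (2 pi), a consequence of Plancherel and the differentiation rule. The sketch below is a rough check under stated assumptions: real-valued, rapidly decaying f, a midpoint rule on [-8, 8], and a central-difference derivative. All function names are illustrative.

```python
import math

def integrate(g, L=8.0, n=4000):
    """Midpoint rule for the integral of g over [-L, L]."""
    h = 2.0 * L / n
    return h * sum(g(-L + (j + 0.5) * h) for j in range(n))

def uncertainty_ratio(f, eps=1e-5):
    """(||x f||_2 * ||xi f-hat||_2) / ((1/4pi) ||f||_2^2) for real-valued f.

    Uses ||xi f-hat||_2 = ||f'||_2 / (2 pi), which follows from Plancherel
    and the rule that differentiation becomes multiplication by 2 pi i xi.
    """
    df = lambda x: (f(x + eps) - f(x - eps)) / (2.0 * eps)   # central difference
    norm_f_sq = integrate(lambda x: f(x) ** 2)
    norm_xf = math.sqrt(integrate(lambda x: (x * f(x)) ** 2))
    norm_xi_fhat = math.sqrt(integrate(lambda x: df(x) ** 2)) / (2.0 * math.pi)
    return norm_xf * norm_xi_fhat / (norm_f_sq / (4.0 * math.pi))

gaussian = lambda x: math.exp(-math.pi * x * x)
odd_bump = lambda x: x * math.exp(-math.pi * x * x)   # non-Gaussian test function

ratio_gaussian = uncertainty_ratio(gaussian)   # equality case: ratio close to 1
ratio_odd_bump = uncertainty_ratio(odd_bump)   # strict inequality: ratio above 1
```

A ratio of 1 corresponds to equality in the Heisenberg-Weyl inequality; any ratio above 1 means the inequality is strict for that function.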

3. Tempered Distributions and Schwartz Space

To extend the Fourier transform to objects like the Dirac delta function or to functions that grow at infinity, we work in the framework of distributions introduced by Laurent Schwartz.

The Schwartz Space S(R^n)

The Schwartz space consists of smooth functions that decay rapidly together with all their derivatives. Formally, phi is in S(R^n) if phi is infinitely differentiable and for every pair of multi-indices alpha and beta, the seminorm sup over x of |x^alpha times D^beta phi(x)| is finite. Informally, Schwartz functions decrease faster than any polynomial, as do all their derivatives.

Examples of Schwartz Functions

  • e^(-pi |x|^2): the Gaussian (its own Fourier transform under the convention of Section 2)
  • All compactly supported smooth functions (bump functions)
  • Products of polynomials with Gaussians
  • Not in S(R): e^(-|x|) (not differentiable at 0), 1/(1+x^2) (smooth, but decays only like 1/x^2, not faster than every polynomial)

The Fourier transform is an automorphism of S(R^n): if phi is in S then phi-hat is in S. This is the key advantage of Schwartz space for Fourier analysis. The inversion formula holds unconditionally in S.

Tempered Distributions S'(R^n)

A tempered distribution is a continuous linear functional on S(R^n). The space of tempered distributions is denoted S'(R^n) and is the topological dual of S. Every function of polynomial growth defines a tempered distribution by integration. More singular objects like the Dirac delta are also tempered distributions.

The Dirac Delta Distribution

delta(phi) = phi(0) for all phi in S(R^n)

delta-hat(phi) = delta(phi-hat) = phi-hat(0) = integral phi(x) dx = 1(phi), where 1(phi) denotes the pairing of the constant function 1 with phi

So the Fourier transform of delta is the constant function 1, and the Fourier transform of 1 is delta. This makes rigorous the physicists' formula delta(x) = (1/2pi) integral e^(ixt) dt.

Distributional Fourier Transform

For u in S'(R^n), the Fourier transform u-hat is defined by duality: u-hat(phi) = u(phi-hat) for all phi in S. This extends the Fourier transform to all tempered distributions. Derivatives of distributions are also defined by duality: (D^alpha u)(phi) = (-1)^|alpha| u(D^alpha phi). Key examples include: the derivative of delta acts by delta'(phi) = -phi'(0), and the principal value distribution p.v.(1/x) (the distributional kernel of the Hilbert transform, up to the factor 1/pi) has Fourier transform -i pi sign(xi).
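The duality rule with u = delta and one derivative says that the derivative of delta sends phi to -phi'(0). One way to see this concretely is to approximate delta by narrow unit-mass Gaussians and pair their derivatives with a test function. The sketch below does this; the mollifier g_eps, the test function, and the quadrature parameters are all illustrative assumptions.

```python
import math

def mollifier_derivative(x, eps):
    """Derivative of g_eps(x) = (1/eps) exp(-pi x^2 / eps^2), which has unit mass."""
    g = math.exp(-math.pi * (x / eps) ** 2) / eps
    return -2.0 * math.pi * x / eps ** 2 * g

def pair_with_derivative(phi, eps, L=1.0, n=20000):
    """Midpoint-rule value of the integral of g_eps'(x) * phi(x) over [-L, L]."""
    h = 2.0 * L / n
    return h * sum(
        mollifier_derivative(-L + (j + 0.5) * h, eps) * phi(-L + (j + 0.5) * h)
        for j in range(n)
    )

phi = lambda x: math.sin(x) * math.exp(-x * x)   # test function with phi'(0) = 1

# As eps shrinks, the pairings should approach -phi'(0) = -1.
pairings = [pair_with_derivative(phi, eps) for eps in (0.2, 0.05, 0.0125)]
```

The minus sign appears for the same reason as in the duality definition: integrating by parts moves the derivative from g_eps onto phi.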

Exam Tip: Distribution vs. Function

Distributions do not have pointwise values — they are only defined by what they do to test functions. When a problem asks you to compute in the distributional sense, always move derivatives off the distribution and onto the test function using integration by parts (and the duality definition).

4. Hardy-Littlewood Maximal Function

The Hardy-Littlewood maximal function is a fundamental tool for controlling pointwise behavior of functions through averages. It appears in the proof of Lebesgue's differentiation theorem and in the theory of singular integrals.

Definition

Hardy-Littlewood Maximal Operator

Mf(x) = sup over r greater than 0 of (1/|B(x,r)|) integral[B(x,r)] |f(y)| dy

Here B(x,r) is the ball of radius r centered at x, and |B(x,r)| is its Lebesgue measure. Mf(x) is the supremum of average values of |f| over all balls centered at x. Note Mf(x) is always greater than or equal to |f(x)| at Lebesgue points.

Weak-Type (1,1) Bound

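The maximal operator is not bounded on L^1: if f is the indicator function of [0,1], then Mf(x) decays only like 1/(2|x|) as |x| tends to infinity, so Mf is not integrable. In fact Mf fails to be in L^1 for every nonzero f in L^1. However, M satisfies a weaker substitute: the weak-type (1,1) inequality.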

Hardy-Littlewood Maximal Theorem

Weak-type (1,1): |{x : Mf(x) greater than lambda}| is less than or equal to (C_n / lambda) ||f||_1 for every lambda greater than 0, where |E| denotes the Lebesgue measure of E

Strong-type (p,p) for p greater than 1: ||Mf||_p is less than or equal to C_(n,p) ||f||_p

The proof of the weak-type bound uses the Vitali covering lemma: from any collection of balls, extract a subcollection of disjoint balls whose 3-fold dilations cover the original collection. This geometric lemma is fundamental to all of real-variable harmonic analysis.
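The weak-type inequality can be explored by brute force in one dimension. The sketch below approximates the centered maximal function of the indicator of [0, 1] by scanning a finite list of radii, then measures the level sets on a grid. The radius grid, the spatial grid, and the use of the Vitali constant C = 3 are assumptions for illustration; a finite scan can only underestimate the true supremum.

```python
def maximal_function(x, radii):
    """Centered maximal function of the indicator of [0, 1], sup taken over `radii`."""
    best = 0.0
    for r in radii:
        overlap = max(0.0, min(1.0, x + r) - max(0.0, x - r))  # |B(x,r) ∩ [0,1]|
        best = max(best, overlap / (2.0 * r))
    return best

radii = [0.02 * k for k in range(1, 1501)]        # r = 0.02, 0.04, ..., 30
step = 0.1
grid = [-25.0 + step * j for j in range(501)]     # x in [-25, 25]
values = [maximal_function(x, radii) for x in grid]

norm_f = 1.0    # ||f||_1 for the indicator of [0, 1]
C = 3.0         # constant 3^n from the Vitali covering argument, with n = 1

# Approximate measure of {Mf > lam} next to the bound C * ||f||_1 / lam.
level_sets = {
    lam: (step * sum(1 for v in values if v > lam), C * norm_f / lam)
    for lam in (0.05, 0.1, 0.25, 0.5)
}
```

Each entry of `level_sets` pairs the measured size of the level set with the bound it must not exceed. The level sets grow like 1/lambda as lambda decreases, which is exactly the behavior that rules out an L^1 bound.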

Vitali Covering Lemma

Let (B_1, ..., B_N) be a finite collection of balls in R^n. Then there exists a subcollection (B_(i_1), ..., B_(i_k)) of disjoint balls such that B_1 union ... union B_N is contained in the union of 3 B_(i_1), ..., 3 B_(i_k), where 3 B denotes the ball with the same center as B but three times the radius. The measure of the union is at most 3^n times the sum of the measures of the selected balls.

Marcinkiewicz Interpolation and Strong-Type Bounds

The Marcinkiewicz interpolation theorem provides a soft method to upgrade weak-type estimates to strong-type estimates. Given a sublinear operator T that is weak-type (1,1) and bounded on L^infinity, Marcinkiewicz interpolation yields that T is bounded on L^p for all 1 less than p less than infinity. Applied to the maximal function (which is trivially L^infinity bounded by the L^infinity norm of f), this gives the strong-type (p,p) bound for p greater than 1.

Application: Lebesgue Differentiation Theorem

Using the maximal function, one can prove that for any locally integrable f, the average of f over B(x,r) converges to f(x) as r tends to 0, for almost every x. This is the Lebesgue differentiation theorem, and it is one of the key applications of the weak-type (1,1) bound.

5. Hilbert Transform and Riesz Transforms

The Hilbert transform is the prototypical example of a singular integral operator: its kernel 1/(pi x) is not integrable, yet the operator is bounded on L^2 and L^p. It plays a central role in complex analysis, signal processing, and the theory of Calderon-Zygmund operators.

Definition of the Hilbert Transform

Hilbert Transform (principal value integral)

Hf(x) = (1/pi) p.v. integral[-inf, inf] f(y)/(x - y) dy

= (1/pi) lim(epsilon to 0+) integral[|x-y| greater than epsilon] f(y)/(x-y) dy

Fourier multiplier: (Hf)-hat(xi) = -i sign(xi) f-hat(xi)

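The Fourier multiplier formula shows immediately that H is an isometry on L^2 (since |-i sign(xi)| = 1 for almost every xi), so its operator norm is 1. Applying the multiplier twice gives (-i sign(xi))^2 = -1, so H^2 = -I on L^2(R): the inverse of H is -H. (On the circle the analogous identity holds only on functions with zero mean, because the conjugate-function operator annihilates constants.)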
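The multiplier description translates directly into a computation. The sketch below works with the periodic analog (the conjugate function on the circle) rather than the Hilbert transform on R, because a discrete Fourier transform on equally spaced samples is then exact for trigonometric polynomials. The function name and the convention of sending the zero and Nyquist modes to 0 are assumptions of this sketch.

```python
import cmath
import math

def hilbert_periodic(samples):
    """Apply the multiplier -i*sign(n) to the DFT of equally spaced samples."""
    N = len(samples)
    coeffs = [
        sum(samples[j] * cmath.exp(-2j * cmath.pi * n * j / N) for j in range(N)) / N
        for n in range(N)
    ]

    def sign(n):
        # DFT index n represents frequency n for n < N/2 and n - N for n > N/2.
        if n == 0 or 2 * n == N:
            return 0
        return 1 if 2 * n < N else -1

    out = []
    for j in range(N):
        value = sum(
            -1j * sign(n) * coeffs[n] * cmath.exp(2j * cmath.pi * n * j / N)
            for n in range(N)
        )
        out.append(value.real)
    return out

N = 64
xs = [2.0 * math.pi * j / N for j in range(N)]
cos3 = [math.cos(3 * x) for x in xs]
H_cos3 = hilbert_periodic(cos3)       # expected: sin(3x)
HH_cos3 = hilbert_periodic(H_cos3)    # expected: -cos(3x)

err_sin = max(abs(h - math.sin(3 * x)) for h, x in zip(H_cos3, xs))
err_minus_identity = max(abs(hh + c) for hh, c in zip(HH_cos3, cos3))
```

The pair (cos, sin) is the simplest conjugate pair: cos(3x) + i sin(3x) = e^(3ix) extends analytically to the unit disc. Applying the operator twice returns the negative of the input, as the multiplier computation predicts for mean-zero data.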

L^p Boundedness and the M. Riesz Theorem

Marcel Riesz proved that the Hilbert transform extends to a bounded operator on L^p(R) for all 1 less than p less than infinity. The norm ||H||_(L^p to L^p) grows like p as p tends to infinity and like 1/(p-1) as p tends to 1. The Hilbert transform does not extend to a bounded operator on L^1 or L^infinity, but it satisfies a weak-type (1,1) bound.

M. Riesz Theorem

For 1 less than p less than infinity: ||Hf||_p is less than or equal to A_p ||f||_p

Weak-type (1,1): |{x : |Hf(x)| greater than lambda}| is less than or equal to (C/lambda) ||f||_1 for every lambda greater than 0

Riesz Transforms in Higher Dimensions

The Riesz transforms R_j, for j = 1, ..., n, are the natural n-dimensional generalizations of the Hilbert transform. They are defined by the Fourier multiplier -i xi_j / |xi|.

Riesz Transforms

(R_j f)-hat(xi) = (-i xi_j / |xi|) f-hat(xi), j = 1, ..., n

The Riesz transforms satisfy R_j* = -R_j (skew-adjoint), and the identity R_1^2 + ... + R_n^2 = -I holds on L^2. They are bounded on L^p for 1 less than p less than infinity and weak-type (1,1). The Riesz transforms can be used to recover second derivatives from the Laplacian: partial_j partial_k u = -R_j R_k (Delta u).

Connection to Complex Analysis

The Hilbert transform on R arises naturally from the Cauchy integral formula. If F = u + iv is an analytic function in the upper half-plane with boundary values u on R, then v = Hu. The pair (u, v) is a conjugate pair satisfying the Cauchy-Riemann equations. This connection explains why the Hilbert transform is bounded on L^p: it follows from the L^p theory of harmonic conjugates.

6. Calderon-Zygmund Theory

Calderon-Zygmund theory provides a unified framework for studying singular integral operators far beyond the Hilbert and Riesz transforms. The theory originated in the 1950s with Calderon and Zygmund's study of second-order elliptic PDEs, and was vastly generalized by subsequent work including the T(1) theorem.

Calderon-Zygmund Kernels

A Calderon-Zygmund (CZ) kernel K on R^n is a function defined away from the origin satisfying size and smoothness conditions:

CZ Kernel Conditions

  • Size: |K(x)| is less than or equal to C/|x|^n
  • Gradient: |grad K(x)| is less than or equal to C/|x|^(n+1)
  • Cancellation: integral[r less than |x| less than R] K(x) dx = 0 for all 0 less than r less than R

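The model case is a kernel K that is homogeneous of degree -n, smooth away from the origin, and has mean zero on the unit sphere; such a kernel satisfies all three conditions. Examples: 1/(pi x) in dimension 1 (Hilbert transform kernel), and x_j x_k / |x|^(n+2) for j not equal to k (up to a constant, the kernel of the second-order Riesz transform R_j R_k).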

The Calderon-Zygmund Decomposition

The fundamental tool in the theory is the Calderon-Zygmund decomposition: given f in L^1 and a height lambda, decompose f = g + b where g (the "good" part) satisfies ||g||_1 is less than or equal to ||f||_1 and ||g||_infinity is less than or equal to C lambda, and b (the "bad" part) is supported on a union of disjoint cubes Q_j with total measure at most C||f||_1/lambda, with each b_j = b restricted to Q_j having mean zero.

Why the Decomposition Works

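The cubes Q_j are the maximal dyadic cubes on which the average of |f| exceeds lambda; their union is the set where the dyadic maximal function of f exceeds lambda. On each Q_j, f is replaced by its average, which is bounded by 2^n lambda because the parent cube was not selected; off the cubes, |f| is at most lambda almost everywhere by the Lebesgue differentiation theorem. This gives the good part. The mean-zero property of each b_j is crucial: it lets one subtract the value of the kernel at the center of Q_j, so the kernel smoothness condition makes the contribution of b_j integrable away from a fixed dilate of Q_j.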
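The stopping-time construction behind the decomposition can be written out for a step function on [0, 1). The sketch below is a one-dimensional dyadic version under simplifying assumptions: f is given by its values on 2^k equal cells, and the average of |f| over the whole interval is at most lambda so that the top interval is never selected. The function name and the example data are illustrative.

```python
def cz_decompose(f, lam):
    """Dyadic Calderon-Zygmund decomposition of samples f on [0, 1) at height lam.

    f lists the values of a step function on 2^k equal cells. Returns (g, bad, cubes),
    where cubes holds index ranges (start, stop) of the selected dyadic intervals.
    """
    n = len(f)
    assert n & (n - 1) == 0, "length must be a power of two"
    assert sum(abs(v) for v in f) / n <= lam, "need average of |f| at most lam"
    g = list(f)
    cubes = []

    def visit(start, stop):
        # Invariant: the average of |f| over [start, stop) is at most lam.
        if stop - start == 1:
            return
        mid = (start + stop) // 2
        for a, b in ((start, mid), (mid, stop)):
            average_abs = sum(abs(v) for v in f[a:b]) / (b - a)
            if average_abs > lam:
                cubes.append((a, b))            # stop: this interval is "bad"
                mean = sum(f[a:b]) / (b - a)
                for i in range(a, b):
                    g[i] = mean                 # good part = average on the interval
            else:
                visit(a, b)                     # keep subdividing

    visit(0, n)
    bad = [fi - gi for fi, gi in zip(f, g)]
    return g, bad, cubes

f = [0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 1.0, 0.0]
lam = 2.0
g, bad, cubes = cz_decompose(f, lam)
```

For this input the only selected interval is cells 2 to 3, where the spike of height 8 is replaced by its average 4 = 2 lambda. The bad part is (-4, 4) on that interval, which has mean zero, and the selected set has measure 1/4, below ||f||_1 / lambda = 9/16.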

BMO Space and the Sharp Maximal Function

The space BMO (functions of Bounded Mean Oscillation) consists of locally integrable functions f such that the "sharp maximal function" is bounded. The John-Nirenberg theorem gives precise exponential integrability for BMO functions.

BMO Norm

||f||_BMO = sup over cubes Q of (1/|Q|) integral[Q] |f - f_Q| dx

where f_Q = (1/|Q|) integral[Q] f is the average of f over Q.

BMO is the natural endpoint for Calderon-Zygmund operators: they do not map L^infinity to L^infinity in general, but they do map L^infinity to BMO. Dually, they do not map L^1 to L^1, but they map the Hardy space H^1 into L^1, and H^1 is the predual of BMO (Fefferman's duality theorem).

T(1) Theorem of David-Journe

The T(1) theorem gives necessary and sufficient conditions for a singular integral operator T with a standard Calderon-Zygmund kernel to be bounded on L^2. The conditions are: (1) T(1) is in BMO, (2) T*(1) is in BMO, and (3) T satisfies the weak boundedness property (a testing condition on pairs of normalized bump functions adapted to the same ball). This theorem unified many results in the theory and led to the T(b) theorem and broader developments.

7. Littlewood-Paley Theory

Littlewood-Paley theory provides a powerful method for decomposing functions into frequency bands and measuring their size. It is the key tool for proving Fourier multiplier theorems and for defining modern function spaces like Sobolev, Besov, and Triebel-Lizorkin spaces.

Dyadic Decomposition and the Partition of Unity

Choose a smooth function psi on R^n supported in the annulus (1/2) less than |xi| less than 2, with sum over j in Z of psi(2^(-j) xi) = 1 for all xi not equal to 0. Define the j-th Littlewood-Paley piece of f by P_j f, whose Fourier transform is psi(2^(-j) xi) times f-hat(xi). This localizes f to frequencies of size approximately 2^j.
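One standard way to build such a psi is to start from a smooth radial cutoff chi, equal to 1 for |xi| at most 1 and 0 for |xi| at least 2, and set psi(xi) = chi(xi) - chi(2 xi); the dyadic sum then telescopes. The sketch below implements this as a function of r = |xi|. The specific cutoff built from exp(-1/t) is one possible choice, assumed here for illustration.

```python
import math

def smooth_step(t):
    """C-infinity function equal to 0 for t <= 0 and 1 for t >= 1."""
    def h(s):
        return math.exp(-1.0 / s) if s > 0 else 0.0
    return h(t) / (h(t) + h(1.0 - t))

def chi(r):
    """Radial cutoff: 1 for r <= 1, 0 for r >= 2, smooth in between."""
    return smooth_step(2.0 - r)

def psi(r):
    """Annular bump chi(r) - chi(2r), nonzero only for 1/2 < r < 2."""
    return chi(r) - chi(2.0 * r)

def dyadic_sum(r, J=40):
    """Sum of psi(2^(-j) r) over -J <= j <= J (telescopes to chi at the two ends)."""
    return sum(psi(r / 2.0 ** j) for j in range(-J, J + 1))

# The sum should equal 1 at every nonzero frequency within the covered range.
sums = [dyadic_sum(r) for r in (0.003, 0.7, 1.0, 5.5, 1234.0)]
```

At any fixed r at most three terms of the sum are nonzero, which is why the pieces P_j f interact only with their immediate neighbors in frequency.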

Littlewood-Paley Square Function

S(f)(x) = (sum over j in Z of |P_j f(x)|^2)^(1/2)

The Littlewood-Paley theorem: For 1 less than p less than infinity, the L^p norm of f is equivalent to the L^p norm of S(f). More precisely, c_p ||f||_p is less than or equal to ||S(f)||_p is less than or equal to C_p ||f||_p.

Multiplier Theorems: Mikhlin and Hormander

A key application of Littlewood-Paley theory is proving that certain Fourier multiplier operators are bounded on L^p. A Fourier multiplier operator T_m is defined by (T_m f)-hat(xi) = m(xi) f-hat(xi).

Mikhlin Multiplier Theorem

If m is a smooth function on R^n minus the origin satisfying |D^alpha m(xi)| is less than or equal to C_alpha |xi|^(-|alpha|) for all multi-indices alpha with |alpha| is less than or equal to floor(n/2) + 1, then T_m is bounded on L^p for all 1 less than p less than infinity. The Riesz transforms (multiplier -i xi_j / |xi|) and the operators partial_j partial_k Delta^(-1) (multiplier xi_j xi_k / |xi|^2) are examples; the inverse Laplacian itself is not, since its multiplier -1/(4 pi^2 |xi|^2) is unbounded near the origin. The Littlewood-Paley decomposition reduces the estimate to individual dyadic pieces, on each of which the rescaled multiplier is smooth with uniformly bounded derivatives.

Besov and Triebel-Lizorkin Spaces

The Littlewood-Paley pieces P_j f measure the frequency content of f at scale 2^j. By controlling these pieces in different ways, one obtains different function spaces.

Besov Space B^(s,p,q)

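Norm: (sum over j of (2^(js) ||P_j f||_p)^q)^(1/q). Measures smoothness s by how fast the L^p norms of dyadic pieces decay. Special cases include the L^2-based Sobolev spaces H^s (p = q = 2) and the Holder-Zygmund spaces (p = q = infinity, s greater than 0). The Hardy space H^1 is not a Besov space; it belongs to the Triebel-Lizorkin scale (s = 0, p = 1, q = 2).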

Triebel-Lizorkin Space F^(s,p,q)

Norm: ||(sum over j of (2^(js) |P_j f|)^q)^(1/q)||_p. The L^p and l^q summations are interchanged compared to Besov spaces. Sobolev spaces W^(k,p) are special cases of Triebel-Lizorkin spaces.

8. Wavelets and Multiresolution Analysis

Wavelets combine the time-frequency localization of Fourier analysis with a recursive structure that makes them especially suited for analyzing signals at multiple scales. The theory was developed in the 1980s by Daubechies, Mallat, Meyer, and others, and immediately found applications in image compression, numerical analysis, and signal processing.

The Haar Wavelet

The simplest wavelet, introduced by Haar in 1910, is defined as: psi(x) = 1 on [0, 1/2), -1 on [1/2, 1), and 0 otherwise. The Haar system consists of all dilates and translates psi_(j,k)(x) = 2^(j/2) psi(2^j x - k) for j, k in Z. These form an orthonormal basis for L^2(R). The Haar wavelet is discontinuous; constructing smooth wavelets requires the framework of multiresolution analysis.

Haar Wavelet Coefficients

c_(j,k) = integral[-inf, inf] f(x) psi_(j,k)(x) dx

f = sum over j, k of c_(j,k) psi_(j,k) (L^2 convergence)

The coefficient c_(j,k) measures how much f oscillates at scale 2^(-j) near location k times 2^(-j). Smooth functions have rapidly decaying wavelet coefficients; discontinuities are detected by large coefficients at fine scales.
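A discrete version of the Haar expansion shows both the Parseval identity and the sparsity of coefficients for piecewise-constant data. The sketch below is one common way to organize the orthonormal discrete Haar transform (repeated pairwise averages and differences, each scaled by 1/sqrt(2)); the function names and the sample signal are illustrative.

```python
import math

def haar_transform(values):
    """Orthonormal discrete Haar transform of a list whose length is a power of two."""
    coeffs = list(values)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:n] = averages + details   # coarse part first, then detail coefficients
        n = half
    return coeffs

def inverse_haar_transform(coeffs):
    """Invert haar_transform by undoing the averaging steps from coarse to fine."""
    values = list(coeffs)
    n = 1
    while n < len(values):
        averages, details = values[:n], values[n:2 * n]
        merged = []
        for a, d in zip(averages, details):
            merged += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        values[:2 * n] = merged
        n *= 2
    return values

signal = [4.0, 4.0, 4.0, 4.0, 1.0, 1.0, 9.0, 9.0]   # piecewise constant with jumps
coeffs = haar_transform(signal)
energy_signal = sum(v * v for v in signal)
energy_coeffs = sum(c * c for c in coeffs)
reconstructed = inverse_haar_transform(coeffs)
```

For this signal only three of the eight coefficients are nonzero: the overall average and two detail coefficients at coarse scales that straddle the jumps. All finest-scale details vanish because the signal is constant on each pair of adjacent cells.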

Multiresolution Analysis (MRA)

A multiresolution analysis is a sequence of closed subspaces V_j of L^2(R) satisfying:

MRA Axioms

  • (1) Nesting: ... V_(-1) contained in V_0 contained in V_1 contained in ...
  • (2) Density: the union of all V_j is dense in L^2(R)
  • (3) Triviality: the intersection of all V_j is (0)
  • (4) Scaling: f(x) in V_j if and only if f(2x) in V_(j+1)
  • (5) Shift-invariance: f(x) in V_0 implies f(x-k) in V_0 for all k in Z
  • (6) Basis: there exists phi in V_0 (scaling function) such that (phi(x-k)) form an orthonormal basis for V_0

Given an MRA, the wavelet psi is constructed as an element of W_0 = V_1 ominus V_0 (the orthogonal complement of V_0 in V_1). Since V_1 is generated by phi(2x - k), and the scaling function satisfies phi(x) = sum_k h_k phi(2x - k) for some filter coefficients h_k (the refinement equation), the wavelet is psi(x) = sum_k (-1)^k h_(1-k) phi(2x - k).

Daubechies Wavelets and Regularity

Ingrid Daubechies constructed a family of wavelets with compact support and increasing regularity. The Daubechies-N wavelet has N vanishing moments (integral of x^k psi(x) dx = 0 for k = 0, ..., N-1), which gives approximation order N and implies that smooth function coefficients decay rapidly. The Daubechies-1 wavelet is the Haar wavelet; Daubechies-2 and higher are smoother and better suited for compression applications.
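The algebraic conditions behind the construction can be verified directly on the four-tap Daubechies filter with two vanishing moments. The sketch below assumes the normalization in which the filter coefficients sum to sqrt(2) and builds the wavelet filter by the alternating flip g_k = (-1)^k h_(3-k); this differs from the refinement equation written above by a factor of sqrt(2) and an index shift, so treat the conventions as assumptions.

```python
import math

s3 = math.sqrt(3.0)
# Daubechies filter with two vanishing moments, normalized so that sum(h) = sqrt(2).
h = [(1 + s3) / (4 * math.sqrt(2)), (3 + s3) / (4 * math.sqrt(2)),
     (3 - s3) / (4 * math.sqrt(2)), (1 - s3) / (4 * math.sqrt(2))]
# Wavelet filter by the alternating flip g_k = (-1)^k h_(3-k).
g = [(-1) ** k * h[3 - k] for k in range(4)]

# Each entry should be zero up to rounding error.
checks = {
    "sum h = sqrt(2)": sum(h) - math.sqrt(2),
    "sum h^2 = 1": sum(c * c for c in h) - 1.0,
    "h orthogonal to its shift by 2": h[0] * h[2] + h[1] * h[3],
    "g orthogonal to h": sum(a * b for a, b in zip(g, h)),
    "moment 0 of g": sum(g),
    "moment 1 of g": sum(k * c for k, c in enumerate(g)),
}
```

The first three entries express orthonormality of the integer translates of the scaling function at the filter level, and the last two are the discrete counterparts of the vanishing moments of psi for k = 0 and k = 1.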

Applications to Compression and Signal Processing

Image Compression (JPEG 2000)

JPEG 2000 uses the 2D discrete wavelet transform (DWT). An image is decomposed into wavelet coefficients at multiple scales; most coefficients are small and can be quantized to zero with little perceptual loss. The sparsity of the wavelet representation for natural images (which tend to be smooth except at edges) makes compression highly effective.

Denoising

The Donoho-Johnstone wavelet shrinkage method removes noise by thresholding wavelet coefficients: large coefficients (signal) are kept while small coefficients (noise) are set to zero. This exploits the fact that a signal's wavelet coefficients are concentrated while Gaussian noise is spread equally across all coefficients.

9. Pontryagin Duality and Abstract Fourier Analysis

Classical Fourier analysis on R and on the circle T can both be understood as special cases of a general theory of harmonic analysis on locally compact abelian (LCA) groups. Pontryagin duality places these examples in a unified framework and reveals the deep symmetry between a group and its dual.

Characters and the Dual Group

Let G be a locally compact abelian group. A character of G is a continuous group homomorphism from G to the circle group T = (z in C : |z| = 1). The set of all characters of G, denoted G-hat, forms a group under pointwise multiplication and carries a natural topology (the compact-open topology) that makes it locally compact abelian as well. G-hat is called the Pontryagin dual of G.

Examples of Dual Groups

  • G = R: characters are x to e^(2pi i x xi) for xi in R, so R-hat = R (self-dual)
  • G = T (circle): characters are z to z^n for n in Z, so T-hat = Z
  • G = Z (integers): characters are n to e^(2pi i n theta) for theta in T, so Z-hat = T
  • G = Z/nZ: characters are k to e^(2pi i j k/n), so the dual is also Z/nZ (self-dual)
  • G = R^n: self-dual (characters are x to e^(2pi i x dot xi))
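The finite cyclic case can be checked by direct computation. The sketch below tabulates the characters of Z/nZ and verifies that they are homomorphisms into the circle and that they are orthonormal with respect to normalized counting measure; the choice n = 6 and the helper names are illustrative.

```python
import cmath

def character(j, n):
    """The character k -> exp(2 pi i j k / n) of Z/nZ, tabulated on k = 0..n-1."""
    return [cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)]

def inner(chi_a, chi_b):
    """Inner product with respect to normalized counting (Haar) measure."""
    n = len(chi_a)
    return sum(a * b.conjugate() for a, b in zip(chi_a, chi_b)) / n

n = 6
# Gram matrix of the n characters: should be the identity matrix.
gram = [[inner(character(a, n), character(b, n)) for b in range(n)] for a in range(n)]

# Homomorphism property: chi(k + l mod n) = chi(k) * chi(l).
chi = character(2, n)
homomorphism_gap = max(
    abs(chi[(k + l) % n] - chi[k] * chi[l]) for k in range(n) for l in range(n)
)
```

The index j labels the character, and multiplying characters adds their indices mod n, which is the isomorphism between Z/nZ and its dual.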

Pontryagin Duality Theorem

The Pontryagin duality theorem asserts that the canonical map from G to its double dual G-hat-hat is an isomorphism of topological groups. This is the exact analog of the finite-dimensional statement that a vector space is canonically isomorphic to its double dual, but it holds in full generality for all LCA groups.

Abstract Fourier Transform

For f in L^1(G): f-hat(chi) = integral[G] f(x) chi-bar(x) d-mu(x)

Here mu is the Haar measure on G (the unique translation-invariant regular Borel measure, normalized appropriately). The abstract Fourier transform takes functions on G to functions on G-hat. The inversion formula, Plancherel theorem, and convolution theorem all generalize to this setting.

Haar Measure

Every locally compact Hausdorff group G admits a nonzero left-invariant regular Borel measure, unique up to positive scalar multiple, called the Haar measure. For R^n, Haar measure is Lebesgue measure. For the circle T, it is arc length measure, usually normalized to have total mass 1. For a finite group, it is counting measure, usually divided by the order of the group. The existence of Haar measure (proved by Haar in 1933, with uniqueness by von Neumann) is what makes integration and Fourier analysis possible on general LCA groups.

10. Applications: PDEs, Signal Processing, and Number Theory

Harmonic analysis is not merely an abstract theory — it provides the computational and conceptual machinery for solving PDEs, compressing and transmitting data, and even proving results in analytic number theory.

Heat Equation via Fourier Transform

Consider the heat equation on R^n: partial_t u = Delta u, with initial condition u(x, 0) = f(x). Taking the Fourier transform in the space variable x converts the PDE to an ODE in t for each frequency xi:

Heat Equation Solution

partial_t u-hat(xi, t) = -4 pi^2 |xi|^2 u-hat(xi, t)

u-hat(xi, t) = f-hat(xi) e^(-4 pi^2 |xi|^2 t)

u(x, t) = f * H_t(x), where H_t(x) = (1/(4 pi t)^(n/2)) e^(-|x|^2 / (4t))

H_t is the heat kernel (Gaussian). High-frequency components (large |xi|) are damped exponentially fast, giving the smoothing property of the heat equation.
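The convolution formula can be tested in one dimension on Gaussian initial data, where the solution is available in closed form: multiplying e^(-pi xi^2) by e^(-4 pi^2 xi^2 t) and inverting gives u(x, t) = (1 + 4 pi t)^(-1/2) e^(-pi x^2 / (1 + 4 pi t)). The sketch below compares this with a numerical convolution; the truncation length, grid size, and function names are assumptions.

```python
import math

def heat_kernel(x, t):
    """One-dimensional heat kernel H_t(x) = (4 pi t)^(-1/2) exp(-x^2 / (4t))."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def solve_heat(f, x, t, L=10.0, n=4000):
    """u(x, t) = (f * H_t)(x), with the convolution integral cut off to [-L, L]."""
    h = 2.0 * L / n
    return h * sum(
        f(y) * heat_kernel(x - y, t)
        for y in (-L + (j + 0.5) * h for j in range(n))
    )

f = lambda x: math.exp(-math.pi * x * x)   # initial data

def exact(x, t):
    """Closed form for this f, obtained by multiplying Gaussians on the Fourier side."""
    s = 1.0 + 4.0 * math.pi * t
    return math.exp(-math.pi * x * x / s) / math.sqrt(s)

t = 0.1
errors = [abs(solve_heat(f, x, t) - exact(x, t)) for x in (0.0, 0.5, 1.5)]
```

The solution stays Gaussian while its width grows and its peak drops, and the total integral is preserved because H_t has unit mass.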

Wave Equation and Huygens' Principle

For the wave equation partial_tt u = Delta u on R^n, the Fourier transform gives u-hat(xi, t) = f-hat(xi) cos(2 pi |xi| t) + g-hat(xi) sin(2 pi |xi| t) / (2 pi |xi|), where f and g are initial position and velocity. In odd dimensions at least 3, the solution at time t depends only on data on the sphere of radius t (Huygens' principle); in even dimensions, it depends on data in the entire ball.

Signal Processing: Sampling and Reconstruction

The Shannon-Nyquist sampling theorem, a fundamental result of signal processing, is a direct application of Fourier analysis. A band-limited signal — one whose Fourier transform vanishes for |xi| greater than B — is completely determined by its values at the sample points n/(2B) for n in Z. Reconstruction is given by the Whittaker-Shannon interpolation formula using sinc functions.

Shannon-Nyquist Theorem

If f is band-limited with bandwidth B (f-hat supported in [-B, B]), then f is determined by its samples (f(n/(2B)))_n in Z and can be reconstructed exactly. The minimum sampling rate 2B (two samples per period of the highest frequency) is the Nyquist rate. Sampling below the Nyquist rate causes aliasing.
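The interpolation formula f(x) = sum over n of f(n/(2B)) sinc(2Bx - n), with sinc(x) = sin(pi x)/(pi x), can be tried on a concrete band-limited signal. The sketch below uses f(x) = sinc(x)^2, whose Fourier transform is a triangle supported in [-1, 1], so B = 1 and the samples are taken at half-integers. The truncation of the infinite sum and the evaluation points are assumptions of this sketch.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def signal(x):
    """sinc(x)^2, whose Fourier transform is a triangle supported in [-1, 1]."""
    return sinc(x) ** 2

def reconstruct(x, B=1.0, terms=400):
    """Whittaker-Shannon sum over the samples f(n / (2B)), truncated to |n| <= terms."""
    return sum(
        signal(n / (2.0 * B)) * sinc(2.0 * B * x - n)
        for n in range(-terms, terms + 1)
    )

points = (0.3, 1.25, -2.7)   # none of these is a sample point
errors = [abs(reconstruct(x) - signal(x)) for x in points]
```

The remaining error comes only from truncating the sum. Sampling the same signal at the integers instead (below the Nyquist rate for B = 1) would return the samples 1, 0, 0, ..., which cannot be told apart from the samples of sinc(x) itself; this is aliasing.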

Number Theory: Hardy-Ramanujan and the Circle Method

Fourier analysis on the circle (equivalently, the theory of exponential sums) is a central tool in analytic number theory. The Hardy-Ramanujan circle method, developed in 1918, uses Fourier series on the circle to extract asymptotic formulas for arithmetic quantities from generating functions.

Partition Function Asymptotics

The generating function for the number of partitions p(n) is the product over k of 1/(1-z^k). By Cauchy's integral formula, p(n) = (1/(2 pi i)) integral[|z|=r] z^(-n-1) product(1/(1-z^k)) dz for any 0 less than r less than 1. Hardy and Ramanujan used Fourier analysis on the integration circle, exploiting the near-singularities at roots of unity (the "major arcs"), to derive the asymptotic p(n) ~ (1/(4n sqrt(3))) times e^(pi sqrt(2n/3)).
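The quality of the asymptotic formula can be examined by computing p(n) exactly from the generating function. The sketch below expands the product one factor at a time with exact integer arithmetic and compares the result with the leading Hardy-Ramanujan term; the function names and the choice of sample values of n are illustrative.

```python
import math

def partition_numbers(n_max):
    """p(0..n_max) by expanding the product over k of 1/(1 - z^k) one factor at a time."""
    p = [1] + [0] * n_max
    for k in range(1, n_max + 1):          # multiply by 1/(1 - z^k)
        for m in range(k, n_max + 1):
            p[m] += p[m - k]
    return p

def hardy_ramanujan(n):
    """Leading asymptotic (1 / (4 n sqrt(3))) * exp(pi * sqrt(2n / 3))."""
    return math.exp(math.pi * math.sqrt(2.0 * n / 3.0)) / (4.0 * n * math.sqrt(3.0))

p = partition_numbers(200)
# Ratio of the asymptotic formula to the exact value; it should drift toward 1.
ratios = {n: hardy_ramanujan(n) / p[n] for n in (10, 50, 100, 200)}
```

The ratios approach 1 slowly, which reflects the fact that the formula above is only the first term of the Hardy-Ramanujan expansion.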

The circle method was later applied by Hardy, Littlewood, and Vinogradov to Goldbach-type problems (Vinogradov proved that every sufficiently large odd integer is a sum of three primes; the binary conjecture, that every even integer greater than 2 is a sum of two primes, remains open) and to Waring's problem (every positive integer is a sum of at most g(k) perfect k-th powers). The key analytical tool is the estimation of exponential sums sum of e^(2 pi i n alpha) over primes or other arithmetic sequences, which requires deep estimates from both harmonic analysis and analytic number theory.

Practice Problems with Solutions

Problem 1: Fourier Coefficients

Compute the Fourier coefficients of f(x) = x on (-pi, pi]. Use Parseval's identity to find the value of the series sum from n = 1 to infinity of 1/n^2.

Solution

c_0 = (1/2pi) integral[-pi, pi] x dx = 0 by odd symmetry.

For n not equal to 0: c_n = (1/2pi) integral[-pi, pi] x e^(-inx) dx.

Integrating by parts: c_n = (1/2pi) [x e^(-inx)/(-in)] evaluated from -pi to pi minus (1/2pi) integral[-pi, pi] e^(-inx)/(-in) dx.

The last integral vanishes because e^(-inx) integrates to zero over a full period, and the boundary term gives c_n = (1/2pi) times (pi e^(-in pi) + pi e^(in pi)) / (-in) = (1/2pi) times (2 pi cos(n pi)) / (-in) = i(-1)^n / n.

So c_n = i(-1)^n / n for n not equal to 0, c_0 = 0.

By Parseval's identity: sum over n not equal to 0 of |c_n|^2 = (1/2pi) integral[-pi,pi] x^2 dx.

Left side: 2 sum from n=1 to infinity of 1/n^2. Right side: (1/2pi) times (2 pi^3/3) = pi^2/3.

Therefore: sum from n=1 to infinity of 1/n^2 = pi^2/6.
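Both steps of the solution can be sanity-checked numerically. The sketch below approximates c_n by a midpoint rule and compares a long partial sum of 1/n^2 with pi^2/6; the quadrature size and the cutoff of the series are arbitrary choices made for illustration.

```python
import cmath
import math

def fourier_coefficient(f, n, samples=20000):
    """Midpoint-rule value of (1/2pi) * integral over [-pi, pi] of f(x) e^{-inx}."""
    h = 2.0 * math.pi / samples
    total = 0.0
    for j in range(samples):
        x = -math.pi + (j + 0.5) * h
        total += f(x) * cmath.exp(-1j * n * x)
    return total * h / (2.0 * math.pi)

# Compare with the closed form c_n = i (-1)^n / n for a few values of n.
coefficient_errors = [
    abs(fourier_coefficient(lambda x: x, n) - 1j * (-1) ** n / n) for n in (1, 2, 5)
]

basel_partial = sum(1.0 / n ** 2 for n in range(1, 100001))
basel_gap = math.pi ** 2 / 6 - basel_partial   # tail of the series, about 1/100000
```

The gap `basel_gap` is positive and of size about 1/N for a partial sum with N terms, which matches the integral estimate for the tail of the series.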

Problem 2: Fourier Transform of Gaussian

Compute the Fourier transform of the Gaussian f(x) = e^(-pi x^2). Verify that f is its own Fourier transform.

Solution

f-hat(xi) = integral[-inf, inf] e^(-pi x^2) e^(-2 pi i x xi) dx.

Complete the square in the exponent: -pi x^2 - 2 pi i x xi = -pi(x + i xi)^2 - pi xi^2.

So f-hat(xi) = e^(-pi xi^2) times integral[-inf, inf] e^(-pi(x + i xi)^2) dx.

Shift the contour of integration (justified by Cauchy's theorem since the integrand is entire): integral[-inf, inf] e^(-pi u^2) du = 1 (standard Gaussian integral).

Therefore f-hat(xi) = e^(-pi xi^2) = f(xi). The Gaussian is its own Fourier transform, that is, an eigenfunction of the Fourier transform with eigenvalue 1. It is not the only one: the Hermite functions of degree 4k are also eigenfunctions with eigenvalue 1.

Problem 3: Maximal Function and Weak-Type Bound

Let f(x) = 1[0,1](x) (the indicator function of [0,1]). Compute Mf(x) for all x in R, and verify the weak-type (1,1) bound with C = 1.

Solution

Here Mf is the centered maximal function, Mf(x) = sup over r greater than 0 of (1/(2r)) times the length of [0,1] intersected with B(x, r) = (x - r, x + r).

For x in (0,1): every sufficiently small ball B(x, r) lies inside [0,1], so the average is 1, and no average can exceed 1. Thus Mf(x) = 1 for x in (0,1). At the endpoints x = 0 and x = 1 only half of each small ball meets [0,1], so Mf(0) = Mf(1) = 1/2.

For x greater than 1: the ball B(x, r) misses [0,1] when r is at most x - 1. For r between x - 1 and x the intersection is (x - r, 1], of length r - (x - 1), and the average (r - (x - 1))/(2r) = 1/2 - (x - 1)/(2r) increases with r. For r greater than x the ball contains all of [0,1] and the average 1/(2r) decreases. The supremum is therefore attained at r = x, the smallest radius for which the ball covers [0,1], and Mf(x) = 1/(2x). By the symmetry x to 1 - x, Mf(x) = 1/(2(1 - x)) for x less than 0. Both cases combine to Mf(x) = 1/(1 + 2|x - 1/2|) for x outside (0,1).

Weak-type bound: here ||f||_1 = 1. For lambda at least 1 the set (x : Mf(x) greater than lambda) is empty. For lambda in [1/2, 1) it is (0,1), of measure 1, which is at most 1/lambda. For lambda less than 1/2 it is the interval (1 - 1/(2lambda), 1/(2lambda)), of measure 1/lambda - 1, which is less than 1/lambda = ||f||_1 / lambda. This confirms the bound with C = 1 for this f.
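A brute-force scan over radii gives an independent check on the centered averages of the indicator of [0,1]. For x greater than 1 the scan should peak at r = x with value 1/(2x), and by symmetry at r = 1 - x with value 1/(2(1 - x)) for x less than 0. This is only a sketch: the radius grid, the window used to measure level sets, and all function names are assumptions chosen for illustration.

```python
def centered_average(x, r):
    """Average of the indicator of [0, 1] over the ball (x - r, x + r)."""
    overlap = min(1.0, x + r) - max(0.0, x - r)
    return max(0.0, overlap) / (2.0 * r)


def maximal_function(x, r_max=20.0, steps=20000):
    """Brute-force sup over the radius grid r = r_max * k / steps, k = 1..steps."""
    return max(centered_average(x, r_max * k / steps) for k in range(1, steps + 1))


def closed_form(x):
    """Candidate closed form: 1 inside (0, 1), 1 / (2 max(x, 1 - x)) elsewhere."""
    if 0.0 < x < 1.0:
        return 1.0
    return 1.0 / (2.0 * max(x, 1.0 - x))


def level_set_measure(lam, lo=-60.0, hi=61.0, cells=121000):
    """Riemann estimate of the measure of {x : Mf(x) > lam}, using closed_form
    on the window [lo, hi]. Only meaningful when the level set fits inside
    the window (roughly lam > 0.01 for the defaults)."""
    h = (hi - lo) / cells
    return h * sum(1 for k in range(cells) if closed_form(lo + (k + 0.5) * h) > lam)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 0.5, 1.0, 3.0):
        print(x, maximal_function(x), closed_form(x))
    for lam in (0.1, 0.25, 0.75):
        print(lam, level_set_measure(lam), 1.0 / lam)
```

In each printed row the measured level set should stay below ||f||_1 / lambda = 1/lambda.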

Problem 4: Hilbert Transform Computation

Use the Fourier multiplier characterization to compute H(f) where f(x) = 1/(1+x^2). State whether H(f) is in L^2.

Solution

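First compute f-hat. The Fourier transform of 1/(1+x^2) is pi e^(-2pi |xi|), computed by contour integration: for xi less than 0 close the contour in the upper half-plane (residue at x = i), and for xi greater than 0 close it in the lower half-plane (residue at x = -i).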

The Hilbert transform in frequency space multiplies by -i sign(xi): (Hf)-hat(xi) = -i sign(xi) times pi e^(-2pi |xi|).

Inverting: Hf(x) = inverse Fourier transform of (-i sign(xi) pi e^(-2pi |xi|)).

The inverse Fourier transform of sign(xi) e^(-2pi |xi|) is integral[0, inf] e^(-2pi xi (1 - ix)) d-xi minus integral[-inf, 0] e^(2pi xi (1 + ix)) d-xi = (1/2pi) times (1/(1 - ix) - 1/(1 + ix)) = i x / (pi (1 + x^2)). Multiplying by -i pi gives Hf(x) = x/(1+x^2).

Since the multiplier -i sign(xi) has modulus 1, Plancherel gives ||Hf||_2 = ||f||_2. As f = 1/(1+x^2) is in L^2(R) (the integral of 1/(1+x^2)^2 over R is pi/2), H(f) is in L^2, with ||Hf||_2^2 = pi/2.
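The answer Hf(x) = x/(1+x^2) can also be checked directly from the principal value integral. Folding the integral over t and -t gives Hf(x) = (1/pi) integral[0, inf] (f(x - t) - f(x + t))/t dt, whose integrand is bounded near t = 0. The sketch below evaluates this with a midpoint rule; the cutoff t_max, the sample count, and the function names are assumptions.

```python
import math


def f(x):
    """f(x) = 1 / (1 + x^2)."""
    return 1.0 / (1.0 + x * x)


def hilbert_transform(g, x, t_max=100.0, samples=50000):
    """Approximate (1/pi) p.v. integral g(x - t)/t dt by folding it to
    (1/pi) integral[0, t_max] (g(x - t) - g(x + t))/t dt (midpoint rule).
    The tail beyond t_max is dropped, which is fine for fast-decaying g."""
    h = t_max / samples
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) * h
        total += (g(x - t) - g(x + t)) / t
    return total * h / math.pi


if __name__ == "__main__":
    for x in (-1.0, 0.0, 0.5, 1.0, 2.0):
        print(x, hilbert_transform(f, x), x / (1.0 + x * x))
```

The folding trick is what the symmetric cutoff in the principal value buys: the singular parts of the kernel at t and -t cancel exactly.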

Problem 5: Uncertainty Principle Application

Let f(x) = e^(-pi a x^2) for a greater than 0. Compute ||x f||_2, ||xi f-hat||_2, and ||f||_2, and verify the Heisenberg-Weyl inequality becomes an equality.

Solution

f-hat(xi) = (1/sqrt(a)) e^(-pi xi^2 / a) (computed as in Problem 2).

||f||_2^2 = integral e^(-2pi a x^2) dx = 1/sqrt(2a). So ||f||_2 = (2a)^(-1/4).

||x f||_2^2 = integral x^2 e^(-2pi a x^2) dx. Using the formula integral[-inf, inf] x^2 e^(-cx^2) dx = sqrt(pi)/(2c^(3/2)) with c = 2pi a: ||x f||_2^2 = sqrt(pi) / (2 (2pi a)^(3/2)) = 1/(4 pi a sqrt(2a)).

For the frequency side, |f-hat(xi)|^2 = (1/a) e^(-2pi xi^2 / a), so the same formula with c = 2pi/a, together with the prefactor 1/a, gives ||xi f-hat||_2^2 = (1/a) times sqrt(pi) / (2 (2pi/a)^(3/2)) = a^(1/2) sqrt(pi) / (2 (2pi)^(3/2)) = sqrt(a) / (4 pi sqrt(2)).

Multiplying: ||x f||_2^2 times ||xi f-hat||_2^2 = (1/(4 pi a sqrt(2a))) times (sqrt(a)/(4 pi sqrt(2))) = 1/(32 pi^2 a).

So ||x f||_2 times ||xi f-hat||_2 = 1/(4 pi sqrt(2a)), and ||f||_2^2 = (2a)^(-1/2) = 1/sqrt(2a), giving the ratio ||x f||_2 times ||xi f-hat||_2 / ||f||_2^2 = 1/(4pi), confirming equality in the Heisenberg-Weyl inequality for all Gaussians.
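The ratio 1/(4pi) can be confirmed numerically for several values of a. The sketch below takes the closed form f-hat(xi) = (1/sqrt(a)) e^(-pi xi^2 / a) as given and integrates with a midpoint rule; the window widths and helper names are assumptions.

```python
import math


def integrate(g, half_width, samples=20000):
    """Midpoint rule for the integral of g over [-half_width, half_width]."""
    h = 2.0 * half_width / samples
    return h * sum(g(-half_width + (k + 0.5) * h) for k in range(samples))


def heisenberg_ratio(a):
    """Return ||x f||_2 * ||xi f-hat||_2 / ||f||_2^2 for f(x) = e^(-pi a x^2)."""
    f = lambda x: math.exp(-math.pi * a * x * x)
    f_hat = lambda xi: math.exp(-math.pi * xi * xi / a) / math.sqrt(a)
    wx = 8.0 / math.sqrt(a)    # window wide enough that f is negligible outside
    wxi = 8.0 * math.sqrt(a)   # same for f-hat
    norm_sq = integrate(lambda x: f(x) ** 2, wx)
    x_sq = integrate(lambda x: (x * f(x)) ** 2, wx)
    xi_sq = integrate(lambda xi: (xi * f_hat(xi)) ** 2, wxi)
    return math.sqrt(x_sq * xi_sq) / norm_sq


if __name__ == "__main__":
    for a in (0.25, 1.0, 4.0):
        print(a, heisenberg_ratio(a), 1.0 / (4.0 * math.pi))
```

The individual norms depend on a, but the ratio does not, which is the scale invariance of the uncertainty product.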

Problem 6: Fejer's Theorem in Practice

Let f be the function on (-pi, pi] defined by f(x) = 0 for x less than 0 and f(x) = 1 for x greater than or equal to 0. To what value does the Fourier series of f converge at x = 0? What does Fejer's theorem predict for the Cesaro means at x = 0?

Solution

At x = 0, f has a jump discontinuity. The left limit is f(0-) = 0 and f(0) = f(0+) = 1.

By the Dirichlet-Jordan theorem (f is of bounded variation), the partial sums S_N(f)(0) converge to (f(0+) + f(0-))/2 = 1/2.

Fejer's theorem states that the Cesaro means sigma_N(f)(0) also converge to (f(0+) + f(0-))/2 = 1/2, since f has left and right limits at 0.

Both the partial sums and the Cesaro means converge to 1/2 at x = 0. Note the Gibbs phenomenon: near the discontinuity the partial sums S_N(f) overshoot by approximately 9% of the jump height, and this overshoot does not shrink as N grows. The Cesaro means do not overshoot: the Fejer kernel is non-negative with integral 1, so sigma_N(f) stays between 0 and 1.
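The behaviour at the jump can be seen numerically. For this step function c_0 = 1/2 and c_n = (1 - (-1)^n)/(2 pi i n), so S_N(f)(x) = 1/2 + (2/pi) times the sum over odd k up to N of sin(kx)/k, and the Cesaro means carry the extra weights 1 - k/(N+1). The sketch below is illustrative only; the function names and the evaluation grid are assumptions.

```python
import math


def partial_sum(x, N):
    """S_N f(x) for the step function: 1/2 + (2/pi) * sum over odd k <= N of sin(kx)/k."""
    return 0.5 + (2.0 / math.pi) * sum(math.sin(k * x) / k for k in range(1, N + 1, 2))


def fejer_mean(x, N):
    """Cesaro mean sigma_N f(x): the same series with weights 1 - k/(N+1)."""
    return 0.5 + (2.0 / math.pi) * sum(
        (1.0 - k / (N + 1)) * math.sin(k * x) / k for k in range(1, N + 1, 2)
    )


def max_on_grid(g, N, points=2000):
    """Maximum of g(x, N) over the grid x = pi * j / points, 0 < j < points."""
    return max(g(math.pi * j / points, N) for j in range(1, points))


if __name__ == "__main__":
    print(partial_sum(0.0, 199), fejer_mean(0.0, 199))   # both equal 1/2 at the jump
    print(max_on_grid(partial_sum, 199))                 # Gibbs overshoot above 1
    print(max_on_grid(fejer_mean, 199))                  # stays at or below 1
```

Increasing N moves the peak of the partial sums closer to x = 0 without lowering it.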

Exam Tips and Common Mistakes

Mistake: Forgetting the Principal Value

The Hilbert transform kernel 1/(pi x) is not integrable near 0. The integral must be taken as a principal value: the symmetric limit as epsilon to 0+ of the integral over (|x| greater than epsilon). Omitting the principal value makes the integral undefined. Also recall that the convolution of f with a principal value distribution is defined via duality or the Fourier multiplier.
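A small experiment shows why the cutoff must be symmetric. Take phi(x) = 1 + x on [-1, 1] (a hypothetical stand-in for a smooth function with phi(0) = 1) and integrate phi(x)/x away from the origin. With the symmetric cutoff (epsilon on both sides) the result tends to 2; with the lopsided cutoff (epsilon on the left, 2 epsilon on the right) it tends to 2 - log 2. The sketch below is a minimal illustration with assumed names and a midpoint rule.

```python
import math


def integrate(g, a, b, samples=20000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / samples
    return h * sum(g(a + (k + 0.5) * h) for k in range(samples))


def phi(x):
    """Hypothetical smooth function with phi(0) = 1, used only on [-1, 1]."""
    return 1.0 + x


def kernel_integral(eps_left, eps_right):
    """Integral of phi(x)/x over (-1, -eps_left) union (eps_right, 1)."""
    g = lambda x: phi(x) / x
    return integrate(g, -1.0, -eps_left) + integrate(g, eps_right, 1.0)


if __name__ == "__main__":
    for eps in (0.1, 0.01):
        symmetric = kernel_integral(eps, eps)         # exact value: 2 - 2 eps
        lopsided = kernel_integral(eps, 2.0 * eps)    # exact value: 2 - log 2 - 3 eps
        print(eps, symmetric, lopsided)
```

The two limits differ by phi(0) log 2, so without a fixed rule for the cutoff the integral has no well-defined value.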

Mistake: Confusing Weak-Type and Strong-Type Bounds

Weak-type (1,1) means that the measure of the super-level set (x : Mf(x) greater than lambda) is at most C||f||_1/lambda for every lambda greater than 0. This is NOT the same as saying Mf is in L^1: in fact Mf is never in L^1(R^n) unless f = 0 almost everywhere, because Mf(x) decays no faster than c/|x|^n at infinity. The L^p boundedness of M for p greater than 1 is the strong-type bound, obtained from the weak-type bound via Marcinkiewicz interpolation, not directly from the weak-type definition.

Mistake: Applying Fourier Inversion Outside Its Domain

The Fourier inversion formula f(x) = integral f-hat(xi) e^(2pi i x xi) d-xi requires both f and f-hat to be in L^1 for pointwise recovery. For general f in L^2, inversion holds in the L^2 sense (limits of partial integrals). For distributions, inversion must be interpreted in the distributional (duality) sense.

Tip: Fourier Transform Table for Common Functions

f(x)                         | f-hat(xi)
e^(-pi x^2)                  | e^(-pi xi^2)
e^(-a|x|) (a greater than 0) | 2a / (a^2 + 4 pi^2 xi^2)
1[(-1/2, 1/2)](x) (box)      | sinc(xi) = sin(pi xi)/(pi xi)
delta(x)                     | 1
1 (constant)                 | delta(xi)
p.v.(1/(pi x))               | -i sign(xi)

Tip: Duality Strategy for Distribution Problems

When working with distributions, the standard strategy is to move operations (derivatives, Fourier transforms) from the distribution to the test function using the duality definition, then compute classically. For example, to find the distributional derivative of |x|, compute (|x|)'(phi) = -(|x|)(phi') = -integral |x| phi'(x) dx, then integrate by parts to recognize the result as sign(x) acting on phi.
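The duality computation for |x| can be checked against a concrete test function. The sketch below uses phi(x) = e^(-(x - 1/2)^2), a hypothetical choice that is smooth and rapidly decaying, and compares -integral |x| phi'(x) dx with integral sign(x) phi(x) dx. Names, window, and quadrature are assumptions.

```python
import math


def integrate(g, a, b, samples=40000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / samples
    return h * sum(g(a + (k + 0.5) * h) for k in range(samples))


def phi(x):
    """Hypothetical test function: a Gaussian bump centered at 1/2."""
    return math.exp(-((x - 0.5) ** 2))


def dphi(x):
    """Classical derivative of phi."""
    return -2.0 * (x - 0.5) * phi(x)


def sign(x):
    return (x > 0) - (x < 0)


def derivative_pairing():
    """(|x|)'(phi) computed from the duality definition: -integral |x| phi'(x) dx."""
    return -integrate(lambda x: abs(x) * dphi(x), -10.0, 10.0)


def sign_pairing():
    """sign(x) acting on phi: integral sign(x) phi(x) dx."""
    return integrate(lambda x: sign(x) * phi(x), -10.0, 10.0)


if __name__ == "__main__":
    print(derivative_pairing(), sign_pairing())
```

For this phi both pairings equal sqrt(pi) times erf(1/2), which gives a closed form to compare against.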

Tip: Normalizations Matter

There are multiple conventions for the Fourier transform: the normalizing factor can be 1/(2pi) or 1/sqrt(2pi) in front of an integral, or it can be absorbed into the exponent by using e^(-2pi i x xi) instead of e^(-i x xi). On exams, state your convention once and be consistent. The Plancherel theorem, convolution theorem, and inversion formula all hold in any convention, but the precise constants differ. The convention f-hat(xi) = integral f(x) e^(-2pi i x xi) dx (as used above) gives the cleanest form of Plancherel.
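For reference, the constants in the three most common conventions can be laid side by side. The labels A, B, C and the symbols \mathcal{F}_B, \mathcal{F}_C are introduced here only for illustration; convention A is the one used in this guide.

```latex
% Convention A: frequency variable \xi, factor 2\pi in the exponent
\hat f(\xi) = \int_{\mathbb{R}} f(x)\, e^{-2\pi i x \xi}\, dx, \qquad
f(x) = \int_{\mathbb{R}} \hat f(\xi)\, e^{2\pi i x \xi}\, d\xi, \qquad
\|\hat f\|_2 = \|f\|_2, \qquad
\widehat{f * g} = \hat f\, \hat g .

% Convention B: angular frequency \omega, no prefactor on the forward transform
\mathcal{F}_B f(\omega) = \int_{\mathbb{R}} f(x)\, e^{-i x \omega}\, dx, \qquad
f(x) = \frac{1}{2\pi} \int_{\mathbb{R}} \mathcal{F}_B f(\omega)\, e^{i x \omega}\, d\omega, \qquad
\|\mathcal{F}_B f\|_2^2 = 2\pi\, \|f\|_2^2, \qquad
\mathcal{F}_B (f * g) = \mathcal{F}_B f \cdot \mathcal{F}_B g .

% Convention C: angular frequency \omega, unitary normalization
\mathcal{F}_C f(\omega) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x)\, e^{-i x \omega}\, dx, \qquad
f(x) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} \mathcal{F}_C f(\omega)\, e^{i x \omega}\, d\omega, \qquad
\|\mathcal{F}_C f\|_2 = \|f\|_2, \qquad
\mathcal{F}_C (f * g) = \sqrt{2\pi}\, \mathcal{F}_C f \cdot \mathcal{F}_C g .

% Conversions between the three
\mathcal{F}_B f(\omega) = \hat f\!\left(\frac{\omega}{2\pi}\right), \qquad
\mathcal{F}_C f(\omega) = \frac{1}{\sqrt{2\pi}}\, \mathcal{F}_B f(\omega) .
```

Only convention A makes both Plancherel and the convolution theorem free of constants, which is why it is convenient for the computations in this guide.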

Key Theorems at a Glance

Theorem          | Statement (informal)                                               | Key Condition
Parseval         | L^2 norm is preserved: sum |c_n|^2 = ||f||_2^2                     | f in L^2(T)
Fejer            | Cesaro means converge to (f(x+)+f(x-))/2                           | f has left/right limits at x
Riemann-Lebesgue | Fourier coefficients/transform vanish at infinity                  | f in L^1
Plancherel       | Fourier transform is L^2 isometry                                  | f in L^2(R)
Heisenberg-Weyl  | Position spread times frequency spread is at least ||f||_2^2/(4pi) | f in L^2(R)
HL Maximal       | Mf is weak-(1,1) and strong (p,p) for p greater than 1             | f in L^1 or L^p
M. Riesz (H)     | Hilbert transform bounded on L^p                                   | 1 less than p less than infinity
Marcinkiewicz    | Weak-(1,1) plus L^inf bound implies L^p bound                      | T sublinear, 1 less than p less than infinity
T(1)             | CZ operator is L^2 bounded iff T(1), T*(1) in BMO                  | Plus weak boundedness property
Mikhlin          | Multiplier operator is bounded on L^p                              | 1 less than p less than infinity; |D^alpha m(xi)| at most C |xi|^(-|alpha|) for |alpha| at most floor(n/2) + 1
Pontryagin       | G is canonically isomorphic to its double dual G-hat-hat           | G locally compact abelian
Shannon-Nyquist  | Band-limited signal recovered from samples at Nyquist rate         | f-hat supported in [-B, B]; sampling rate at least 2B

Further Reading

Stein and Weiss — Introduction to Fourier Analysis on Euclidean Spaces

The classic reference for L^2 theory, Riesz transforms, and the structure of harmonic analysis on R^n. Rigorous and comprehensive.

Stein — Singular Integrals and Differentiability Properties of Functions

The definitive text on Calderon-Zygmund theory, maximal functions, and Littlewood-Paley theory. Essential for anyone working in analysis.

Daubechies — Ten Lectures on Wavelets

The foundational text on wavelet theory, covering multiresolution analysis, filter banks, and regularity theory for wavelets.

Folland — A Course in Abstract Harmonic Analysis

Complete treatment of harmonic analysis on locally compact groups, Pontryagin duality, and representation theory.

Grafakos — Classical and Modern Fourier Analysis (2 vols.)

Modern comprehensive treatment including Calderon-Zygmund theory, BMO, Hardy spaces, and multilinear operators. The standard graduate reference.