What is a Fourier series and when does it converge?

A Fourier series represents a periodic function as an infinite sum of sines and cosines. For a function f with period 2*pi, the series is the sum over n of (a_n * cos(n*x) + b_n * sin(n*x)). The Dirichlet conditions guarantee pointwise convergence: f must have finitely many discontinuities and finitely many extrema in each period. At jump discontinuities, the series converges to the average of the left and right limits.

What is the Nyquist-Shannon sampling theorem?

The Nyquist-Shannon theorem states that a bandlimited signal with maximum frequency B Hz can be perfectly reconstructed from samples taken at a rate of at least 2B samples per second (the Nyquist rate). Sampling below 2B causes aliasing, where high-frequency components fold back and contaminate lower frequencies, making reconstruction impossible without additional information.

How does the Fast Fourier Transform achieve O(n log n) complexity?

The Cooley-Tukey FFT algorithm recursively splits a DFT of length n into two DFTs of length n/2 using the even-odd decomposition. The DFT of even-indexed inputs and the DFT of odd-indexed inputs combine via twiddle factors (roots of unity) to produce the full DFT. This divide-and-conquer strategy reduces the naive O(n^2) computation to O(n log n), which for n = 1,000,000 is roughly 20 million operations instead of 1 trillion.

What is the convolution theorem?

The convolution theorem states that the Fourier transform of the convolution of two functions equals the pointwise product of their transforms: F(f * g) = F(f) * F(g). Equivalently, pointwise multiplication in the time domain corresponds to convolution in the frequency domain. This transforms the computationally expensive convolution integral into a simple multiplication, which is why filtering in the frequency domain is so efficient.

What is the Plancherel theorem?

The Plancherel theorem states that the Fourier transform preserves the L2 norm: the integral of |f(x)|^2 dx equals the integral of |F(xi)|^2 d*xi. This means the total energy of a signal equals the total energy of its spectrum (Parseval identity in L2). It also implies the Fourier transform extends to a unitary isometry on the Hilbert space L2(R), not just on L1 functions.

What are wavelets and how do they improve on the Fourier transform?

Wavelets are oscillatory functions localized in both time and frequency, used to analyze signals at multiple scales simultaneously. Unlike the Fourier transform which provides global frequency information but no time localization, wavelets reveal which frequencies are present and when they occur. Multiresolution analysis decomposes signals into approximation (low-frequency) and detail (high-frequency) components at each scale. Applications include JPEG 2000 compression, edge detection, and denoising.

Advanced Mathematics

Fourier Analysis: Decomposing Signals into Frequencies

Fourier analysis is the mathematical framework for decomposing functions and signals into sinusoidal components. It underpins modern signal processing, image compression, quantum mechanics, and the solution of partial differential equations.

1. Fourier Series

A Fourier series represents a periodic function as a superposition of sinusoidal waves. The fundamental idea — that any periodic signal can be built from harmonics — was revolutionary when Fourier proposed it in 1807 and remains central to analysis today.

Definition and Coefficients

Let f be a real-valued function with period 2*pi (or period L with appropriate scaling). The Fourier series of f is:

f(x) = a_0/2 + sum over n from 1 to infinity of (a_n * cos(n*x) + b_n * sin(n*x))

a_0 = (1/pi) * integral from -pi to pi of f(x) dx

a_n = (1/pi) * integral from -pi to pi of f(x)*cos(n*x) dx

b_n = (1/pi) * integral from -pi to pi of f(x)*sin(n*x) dx

The coefficient a_0/2 is the mean value of f over one period. The coefficients a_n and b_n measure how much of the n-th harmonic cos(n*x) and sin(n*x) is present in f. Using complex exponentials simplifies the formulas: the complex Fourier coefficient is c_n = (1/2*pi) * integral from -pi to pi of f(x)*exp(-i*n*x) dx, and f(x) = sum over n from -infinity to +infinity of c_n*exp(i*n*x).

Orthogonality

The trigonometric system is orthogonal on [-pi, pi]. This means:

integral of cos(m*x)*cos(n*x) dx = 0 if m is not equal to n, pi if m = n
integral of sin(m*x)*sin(n*x) dx = 0 if m is not equal to n, pi if m = n
integral of cos(m*x)*sin(n*x) dx = 0 for all m, n

Orthogonality is why projecting f onto each basis function (integrating against cos(n*x) or sin(n*x)) isolates exactly the n-th coefficient. The trigonometric system forms a complete orthonormal basis for L^2([-pi, pi]) after normalization.

Parseval's Identity

Parseval's identity is the Fourier series analogue of the Pythagorean theorem. For f in L^2([-pi, pi]):

(1/pi) * integral from -pi to pi of |f(x)|^2 dx = (a_0^2)/2 + sum over n from 1 to infinity of (a_n^2 + b_n^2)

Equivalently: ||f||^2 = sum over n of |c_n|^2

This says the total energy of f equals the sum of energies of its Fourier coefficients. Parseval's identity also implies that if all Fourier coefficients of f are zero, then f = 0 in L^2, confirming completeness of the trigonometric system.

Convergence: Dirichlet Conditions

The Dirichlet theorem gives sufficient conditions for pointwise convergence of a Fourier series:

→f is periodic with period 2*pi
→f has finitely many jump discontinuities in [-pi, pi]
→f has finitely many local maxima and minima in [-pi, pi]

Under these conditions, the Fourier series converges to f(x) at every continuity point, and to (f(x-) + f(x+))/2 — the average of left and right limits — at each jump discontinuity.

Gibbs Phenomenon

Important Anomaly Near Jump Discontinuities

When a Fourier series is truncated at N terms near a jump discontinuity, the partial sums overshoot the function by approximately 9% of the jump height, regardless of how many terms are used. This overshoot does not vanish as N grows to infinity; it merely narrows and moves closer to the discontinuity. The Gibbs phenomenon explains why naive truncation of Fourier series produces oscillatory artifacts near sharp edges — a critical concern in signal processing and image analysis. Remedies include using Cesaro summation (Fejer kernel), which averages partial sums and converges without overshoot, or applying a sigma factor to damp high-frequency coefficients.

2. The Fourier Transform

The Fourier transform extends Fourier series to non-periodic functions on the real line. Instead of discrete frequencies n = 0, 1, 2, ..., it produces a continuous spectrum over all real frequencies xi.

Definition

For f in L^1(R), the Fourier transform is:

F(f)(xi) = f-hat(xi) = integral from -infinity to +infinity of f(x) * exp(-2*pi*i*xi*x) dx

Inverse: f(x) = integral from -infinity to +infinity of f-hat(xi) * exp(2*pi*i*xi*x) d*xi

Alternative conventions place 2*pi in the exponent differently or include a factor of 1/sqrt(2*pi). The convention above is most common in engineering and probability. The angular frequency convention uses omega = 2*pi*xi and writes exp(-i*omega*x).

Key Properties

Property	Time Domain	Frequency Domain
Linearity	af + bg	aF(f) + bF(g)
Time shift	f(x - x_0)	exp(-2piixix_0) * f-hat(xi)
Frequency shift	exp(2piixi_0x) * f(x)	f-hat(xi - xi_0)
Scaling	f(a*x)	(1/\|a\|) * f-hat(xi/a)
Differentiation	f'(x)	2piixi f-hat(xi)
Convolution	(f * g)(x)	f-hat(xi) * g-hat(xi)
Conjugation	f-bar(x) (complex conjugate)	f-hat-bar(-xi)

The differentiation property is especially powerful: it converts differential equations into algebraic equations in the frequency domain, because differentiation in time becomes multiplication by the frequency variable.

Inversion Formula

The Fourier inversion theorem states: if f is in L^1(R) and its transform f-hat is also in L^1(R), then for almost every x,

f(x) = integral from -infinity to +infinity of f-hat(xi) * exp(2*pi*i*xi*x) d*xi

At points where f is continuous, the formula holds pointwise. At jump discontinuities (if f is of bounded variation), it gives the average of left and right limits — mirroring the Dirichlet theorem for Fourier series.

3. Fourier Transform on L^1 and L^2

The rigorous theory of the Fourier transform requires careful attention to which function spaces it acts on. The behavior differs significantly between L^1 and L^2.

Riemann-Lebesgue Lemma

For any f in L^1(R), its Fourier transform satisfies:

lim as |xi| approaches infinity of f-hat(xi) = 0

That is, f-hat vanishes at infinity. This says rapidly oscillating exponentials integrate against an L^1 function to near zero — the positive and negative contributions cancel. The Riemann-Lebesgue lemma also implies f-hat is bounded (by the L^1 norm of f) and continuous. However, f-hat need not be in L^1 itself, which complicates the inversion formula.

Plancherel Theorem (Extension to L^2)

The Plancherel theorem is the cornerstone of L^2 Fourier theory:

||f-hat||_L2 = ||f||_L2

Equivalently: integral |f-hat(xi)|^2 d*xi = integral |f(x)|^2 dx

Because the Fourier transform preserves the L^2 norm, it extends uniquely from L^1 intersection L^2 to all of L^2(R) as a unitary (norm-preserving) isometry. The image of the Fourier transform on L^2 is all of L^2 — it is surjective. This makes the L^2 Fourier transform a unitary operator on the Hilbert space L^2(R), and its inverse equals its adjoint.

Schwartz Space and Tempered Distributions

The Schwartz space S(R) consists of all infinitely differentiable functions that decay rapidly at infinity:

S(R) = functions f such that sup over x of |x^alpha * d^beta f/dx^beta(x)| is finite for all non-negative integers alpha, beta

Examples: exp(-x^2) (Gaussian), exp(-x^2)*polynomial. Every Gaussian is in S(R), and the Fourier transform of a Gaussian is again a Gaussian.

The Fourier transform maps S(R) to S(R) bijectively — it is a topological automorphism of Schwartz space. This is the ideal setting: the transform and its inverse are both defined and map nice functions to nice functions.

Tempered distributions are continuous linear functionals on S(R). The Fourier transform extends to tempered distributions by duality: for a tempered distribution T and a Schwartz function phi, define F(T)(phi) = T(F(phi)). This allows the Fourier transform of the Dirac delta and other singular objects to be defined rigorously.

Dirac Delta and Weak Derivatives

The Dirac delta delta(x) is not a function but a distribution: delta(phi) = phi(0) for all phi in S(R). Its Fourier transform is:

F(delta)(xi) = 1 for all xi

F(1)(xi) = delta(xi) (constant function transforms to delta)

F(exp(2*pi*i*xi_0*x))(xi) = delta(xi - xi_0)

These identities make rigorous the informal idea that a pure sinusoid is concentrated at a single frequency. The weak derivative of a distribution T is defined by D(T)(phi) = -T(D(phi)), extending differentiation to functions like the Heaviside step function whose classical derivative does not exist. The weak derivative of the Heaviside function is the Dirac delta.

4. Discrete Fourier Transform (DFT)

The DFT operates on finite sequences of complex numbers rather than continuous functions. It is the version implemented in computers and the foundation of all practical spectral analysis.

Definition

Given a sequence x_0, x_1, ..., x(N-1) of N complex numbers, the DFT produces N frequency-domain values X_0, X_1, ..., X(N-1):

X_k = sum over n from 0 to N-1 of x_n * exp(-2*pi*i*k*n/N)

for k = 0, 1, ..., N-1

Inverse DFT: x_n = (1/N) * sum over k from 0 to N-1 of X_k * exp(2*pi*i*k*n/N)

Matrix Form

The DFT is a linear transformation, so it can be written as matrix multiplication X = W * x, where W is the N x N DFT matrix with entries W(k,n) = omega^(k*n) and omega = exp(-2*pi*i/N) (the primitive N-th root of unity):

W = [[1, 1, 1, ..., 1],

[1, omega, omega^2, ..., omega^(N-1)],

[1, omega^2, omega^4, ..., omega^(2*(N-1))],

... ,

[1, omega^(N-1), omega^(2*(N-1)), ..., omega^((N-1)^2)]]

The DFT matrix W is symmetric and unitary up to a factor of sqrt(N). The inverse DFT matrix is (1/N) * W-conjugate-transpose = (1/N) * W-bar.

Cyclic Convolution

The DFT converts cyclic (circular) convolution into pointwise multiplication. Given two sequences x and h of length N, their cyclic convolution y is:

y_n = sum over m from 0 to N-1 of x_m * h((n-m) mod N)

DFT gives: Y_k = X_k * H_k (pointwise product of DFTs)

To compute linear convolution (not cyclic) using the DFT, zero-pad both sequences to length at least M + N - 1 (where M, N are the original lengths), compute the DFT of each, multiply pointwise, then take the inverse DFT. This exploits the FFT's O(N log N) speed rather than direct O(N^2) convolution.

DFT Properties and Applications

→Periodicity: X_k = X(k+N), so the spectrum is periodic with period N
→Parseval: sum of |x_n|^2 = (1/N) * sum of |X_k|^2 (energy conservation)
→For real inputs, X(N-k) = X_k-bar (conjugate symmetry), so only the first N/2 + 1 values are independent
→Applications: spectral analysis, digital filtering, solving difference equations, data compression, MRI reconstruction

5. Fast Fourier Transform (FFT)

The Fast Fourier Transform is an algorithm — not a different mathematical transform — that computes the DFT in O(N log N) time instead of O(N^2). The Cooley-Tukey algorithm (1965) is the most widely used FFT, though related ideas appeared in Gauss's work around 1805.

Cooley-Tukey Radix-2 Algorithm

When N = 2^m (a power of 2), the DFT can be split into two DFTs of length N/2 using the even-odd decomposition. Let omega_N = exp(-2*pi*i/N):

X_k = sum over n even of x_n * omega_N^(k*n) + sum over n odd of x_n * omega_N^(k*n)

Let x_even = [x_0, x_2, x_4, ..., x(N-2)]

Let x_odd = [x_1, x_3, x_5, ..., x(N-1)]

E_k = DFT of x_even at frequency k (length N/2)

O_k = DFT of x_odd at frequency k (length N/2)

X_k = E_k + omega_N^k * O_k

X(k+N/2) = E_k - omega_N^k * O_k

The twiddle factor omega_N^k rotates the odd DFT result before combining. The key insight is that E(k+N/2) = E_k and O(k+N/2) = O_k (periodicity of the sub-DFTs), so we only need to compute N/2 values of each sub-DFT to recover all N values of the full DFT. This halving recurses: each N/2 DFT splits into two N/4 DFTs, and so on.

Complexity Analysis

Recurrence Relation

T(N) = 2 * T(N/2) + O(N)

By the Master Theorem: T(N) = O(N * log_2(N))

For N = 10^6:

Naive DFT: about 10^12 operations

FFT: about 2 * 10^7 operations (50,000x faster)

For non-power-of-2 sizes, the mixed-radix FFT factors N into smaller primes and applies analogous splits. The Bluestein algorithm handles arbitrary N by embedding the DFT into a convolution computable by a power-of-2 FFT.

Bit Reversal Permutation

The recursive even-odd splitting rearranges the input in a specific permutation: the index of each element in the final rearrangement is the bit-reversal of its original index. For N = 8:

Original index: 0 (000), 1 (001), 2 (010), 3 (011), 4 (100), 5 (101), 6 (110), 7 (111)

Bit-reversed: 0 (000), 4 (100), 2 (010), 6 (110), 1 (001), 5 (101), 3 (011), 7 (111)

In-place iterative FFT implementations apply this bit-reversal permutation first, then perform log_2(N) butterfly passes. Each butterfly is the elementary operation that computes (a + w*b, a - w*b) from two values (a, b) and a twiddle factor w.

6. Convolution Theorem

The convolution theorem is perhaps the single most useful property of the Fourier transform in applications, converting a computationally expensive integral operation into simple multiplication.

Convolution in Time and Frequency

The convolution of two functions f and g is:

(f * g)(x) = integral from -infinity to +infinity of f(t) * g(x - t) dt

Convolution Theorem: F(f * g)(xi) = F(f)(xi) * F(g)(xi)

Dual (multiplication theorem): F(f * g)(xi) = (F(f) convolved with F(g))(xi)

The theorem requires f and g to be in L^1(R) for the convolution theorem to hold in L^1. For L^2 functions the theorem holds with the Fourier transform interpreted in the L^2 sense. In the DFT setting, convolution becomes cyclic convolution (see Section 4).

Applications of the Convolution Theorem

Linear Filtering

A linear time-invariant (LTI) system with impulse response h acts on input f by convolution: output = f * h. In the frequency domain this is multiplication by the transfer function H = F(h), making filter design straightforward: choose H to pass or reject desired frequencies.

Probability (Sums of RVs)

The probability density of the sum X + Y of independent random variables equals the convolution of their individual densities. Characteristic functions (Fourier transforms of densities) multiply: phi(X+Y)(t) = phi_X(t) * phi_Y(t). This is why the normal distribution is stable under convolution.

Fast Polynomial Multiplication

Multiplying two degree-N polynomials via coefficient convolution takes O(N^2). Using FFT: evaluate both polynomials at 2N+1 roots of unity (O(N log N) via FFT), multiply pointwise (O(N)), then interpolate back (O(N log N) via inverse FFT). Total: O(N log N).

Image Blurring and Sharpening

Convolving an image with a Gaussian kernel blurs it by attenuating high-frequency components. Multiplying by 1/G(xi) (the inverse filter) theoretically sharpens it. In practice, Wiener filtering (regularized deconvolution) avoids noise amplification.

7. Sampling Theory: Nyquist-Shannon Theorem

Sampling theory addresses the fundamental question: how densely must we sample a continuous signal to capture all its information, and how can we perfectly reconstruct it from those samples?

Bandlimited Signals

A function f is bandlimited to B Hz if its Fourier transform f-hat(xi) = 0 for all |xi| greater than B. This means f contains no frequency components above B. Examples include audio signals processed through a low-pass filter.

Nyquist-Shannon Sampling Theorem

Theorem (Shannon, 1949)

A function f(t) bandlimited to B Hz is completely determined by its samples f(n/T) taken at rate T samples per second, provided T greater than or equal to 2B (the Nyquist rate). The reconstruction formula is:

f(t) = sum over n from -infinity to +infinity of f(n/T) * sinc(T*t - n)

where sinc(x) = sin(pi*x) / (pi*x)

The sinc function is the ideal interpolation kernel. Each sample f(n/T) contributes a shifted sinc, and the sum reconstructs f exactly. In practice, sinc has infinite support so perfect reconstruction requires infinitely many samples — practical systems use truncated or windowed sinc approximations.

Aliasing

If f is sampled at rate T less than 2B (below Nyquist), frequency components above T/2 fold back (alias) into the baseband [0, T/2]. The spectrum of the sampled signal is the sum of shifted copies of f-hat, spaced T apart; when T is too small these copies overlap, corrupting the spectrum irreversibly.

Real-World Consequence

Audio CDs sample at 44,100 Hz, safely above twice the 20 kHz human hearing limit. A low-pass anti-aliasing filter removes frequencies above 22,050 Hz before sampling. Without this filter, inaudible high frequencies would fold back into the audible range as distortion artifacts.

Oversampling and Interpolation

Sampling at a rate far above Nyquist (oversampling) provides immunity to aliasing and simplifies anti-aliasing filter design. Modern audio codecs typically oversample by factors of 64-128x, apply digital filtering, then downsample. Interpolation between samples uses polynomial methods (linear, cubic spline, Lanczos) as practical approximations to sinc interpolation.

8. Short-Time Fourier Transform and Time-Frequency Analysis

The global Fourier transform reveals which frequencies are present in a signal but not when they occur. The short-time Fourier transform (STFT) provides local frequency information by analyzing the signal through a sliding window.

Definition of the STFT

Given a window function g (typically a Gaussian or Hann window), the STFT of f is:

STFT(f)(t, xi) = integral from -infinity to +infinity of f(x) * g(x - t) * exp(-2*pi*i*xi*x) dx

The window g(x - t) localizes the analysis near time t. The result STFT(f)(t, xi) tells us roughly how much of frequency xi is present near time t. The magnitude squared |STFT(f)(t, xi)|^2 is the spectrogram, used in speech processing, music analysis, and audio recognition.

The Heisenberg-Gabor Uncertainty Principle

A fundamental limitation: no signal can be perfectly localized in both time and frequency simultaneously. The time-bandwidth product is bounded below:

Delta_t * Delta_xi is greater than or equal to 1/(4*pi)

where Delta_t = RMS time duration and Delta_xi = RMS bandwidth. Equality holds if and only if f is a Gaussian function. A narrow window gives good time resolution but poor frequency resolution; a wide window gives good frequency resolution but poor time resolution. This trade-off is inherent and cannot be circumvented.

Gabor atoms are Gaussians modulated by sinusoids: g(t) = exp(-pi*t^2) * exp(2*pi*i*xi_0*t). They achieve the uncertainty bound and form the basis of Gabor analysis. The Wigner-Ville distribution is an alternative time-frequency representation with better theoretical properties but cross-term interference artifacts.

9. Wavelets and Multiresolution Analysis

Wavelets overcome the fixed time-frequency trade-off of the STFT by using short windows at high frequencies and long windows at low frequencies — adapting resolution to the signal scale.

Continuous Wavelet Transform (CWT)

The CWT analyzes f at scale a (dilation) and position b (translation) using a mother wavelet psi:

CWT(f)(a, b) = (1/sqrt(|a|)) * integral f(t) * psi-bar((t-b)/a) dt

The factor 1/sqrt(|a|) normalizes energy across scales. At small a (fine scale), psi is compressed and detects high-frequency features (edges, transients). At large a (coarse scale), psi is stretched and detects low-frequency structure (trends, slow variations).

The admissibility condition for psi requires that the integral of psi over all t is zero (psi has zero mean) and that integral of |psi-hat(xi)|^2 / |xi| d*xi is finite. Under admissibility, the reconstruction formula holds:

f(t) = (1/C_psi) * integral integral CWT(f)(a, b) * (1/sqrt(|a|)) * psi((t-b)/a) * da*db / a^2

Multiresolution Analysis (MRA)

MRA provides the algebraic framework for the discrete wavelet transform. An MRA consists of a nested sequence of closed subspaces of L^2(R):

... subset V(-2) subset V(-1) subset V(0) subset V(1) subset V(2) subset ...

such that: union of all V_j is dense in L^2(R);

intersection of all V_j = 0 (zero function only); f(x) in V_j iff f(2x) in V(j+1);

there exists a scaling function phi such that integer translates of phi form an orthonormal basis of V_0

At each level j, V_j approximates L^2 functions at resolution 2^j. The detail space W_j is the orthogonal complement of V_j in V(j+1): V(j+1) = V_j plus W_j. Wavelets psi(j,k) = 2^(j/2) * psi(2^j*x - k) form an orthonormal basis for W_j. Together, all wavelet spaces give a complete orthonormal basis for L^2(R).

Haar Wavelet

The simplest wavelet, proposed by Haar in 1910:

psi(t) = +1 if 0 is less than or equal to t is less than 1/2

psi(t) = -1 if 1/2 is less than or equal to t is less than 1

psi(t) = 0 otherwise

The corresponding scaling function phi(t) = 1 on [0, 1), 0 elsewhere (the box function). Haar wavelets have compact support (which is good for localization) but are discontinuous (which limits regularity in applications). The Haar DWT of a sequence proceeds by averaging and differencing adjacent pairs.

Daubechies Wavelets

Daubechies (1988) constructed compactly supported orthogonal wavelets with higher regularity. The Daubechies-N (DbN) wavelet has N vanishing moments (integral of t^k * psi(t) dt = 0 for k = 0, ..., N-1) and support of length 2N - 1. Key examples:

→Db1 = Haar wavelet (1 vanishing moment, support length 1)
→Db2 (1 vanishing moment beyond Haar, support length 3, continuous)
→Db4 (4 vanishing moments, support length 7, used in JPEG 2000)
→More vanishing moments = better polynomial approximation = better compression, but longer filters = more computation

Discrete Wavelet Transform (DWT)

The DWT decomposes a signal x into approximation coefficients (low frequency) and detail coefficients (high frequency) at each scale using a pair of filters: low-pass h and high-pass g derived from the scaling function and wavelet. One level of the DWT:

a_j[k] = sum over n of h[n - 2k] * x[n] (approximation, downsample by 2)

d_j[k] = sum over n of g[n - 2k] * x[n] (detail, downsample by 2)

This filter bank structure makes the DWT efficient: one level of DWT costs O(N). The approximation a_j is recursively decomposed, giving a logarithmic tree of coefficients. The full multi-level DWT costs O(N) total. Perfect reconstruction is guaranteed by the quadrature mirror filter (QMF) conditions on h and g.

10. Applications to Partial Differential Equations

The Fourier transform is one of the most powerful tools for solving PDEs, converting differential equations in x into algebraic equations in the frequency variable xi, which can be solved explicitly.

Heat Equation on the Real Line

The heat equation describes temperature evolution u(x, t):

du/dt = k * d^2u/dx^2, x in R, t greater than 0

Initial condition: u(x, 0) = f(x)

Solution method:

Take Fourier transform in x: d(u-hat)/dt = -k * (2*pi*xi)^2 * u-hat

This is an ODE in t: u-hat(xi, t) = f-hat(xi) * exp(-4*pi^2*k*xi^2*t)

Invert: u(x, t) = (f * G_t)(x)

where G_t(x) = (1/sqrt(4*pi*k*t)) * exp(-x^2 / (4*k*t)) (the heat kernel = Gaussian with variance 2*k*t)

The solution is the convolution of the initial data f with the heat kernel G_t. Differentiating in the frequency domain became multiplication, turning the PDE into an ODE that is trivial to solve.

Wave Equation

The 1D wave equation u_tt = c^2 * u_xx with initial conditions u(x, 0) = f(x) and u_t(x, 0) = g(x) transforms to:

d^2(u-hat)/dt^2 = -(2*pi*c*xi)^2 * u-hat

Solution: u-hat(xi, t) = f-hat(xi)*cos(2*pi*c*xi*t) + g-hat(xi)*sin(2*pi*c*xi*t) / (2*pi*c*xi)

Inverting gives d'Alembert's formula:

u(x,t) = (f(x+ct) + f(x-ct))/2 + (1/(2c)) * integral from x-ct to x+ct of g(s) ds

Poisson Equation

The Poisson equation in R^n, minus-Laplacian u = f, transforms to:

4*pi^2 * |xi|^2 * u-hat(xi) = f-hat(xi)

So u-hat(xi) = f-hat(xi) / (4*pi^2 * |xi|^2)

u = f * G, where G is the fundamental solution (Green's function) whose Fourier transform is 1 / (4*pi^2 * |xi|^2)

The Fourier transform also reveals the symbol of a differential operator: the Laplacian has symbol -4*pi^2*|xi|^2. Pseudodifferential operators generalize this: instead of a polynomial in xi, the symbol can be any suitable function of xi, allowing fractional powers of the Laplacian and other non-local operators.

11. Applications in Science and Engineering

Fourier analysis is embedded in virtually every area of modern science, engineering, and data processing.

Signal Processing

Digital filters (low-pass, high-pass, band-pass) are designed in the frequency domain and implemented via convolution. The FFT enables real-time spectrum analysis, OFDM modulation in WiFi and LTE, radar signal processing, and audio equalization. Speech recognition systems extract MFCC (mel-frequency cepstral) features using DFT.

Image Compression (JPEG)

JPEG uses the Discrete Cosine Transform (DCT), a close relative of the DFT using only cosines. The image is divided into 8x8 pixel blocks; each block's DCT is computed; high-frequency coefficients are quantized (rounded aggressively) or discarded; and the remaining coefficients are entropy-coded. JPEG 2000 uses the DWT (Daubechies 9/7 wavelet), which avoids blocking artifacts and supports progressive decoding.

Magnetic Resonance Imaging (MRI)

MRI scanners directly measure the 2D Fourier transform of the imaged cross-section (called k-space data). Applying the inverse 2D FFT reconstructs the image. Compressed sensing MRI exploits sparsity of medical images in wavelet bases to reconstruct from far fewer measurements than the Nyquist rate, dramatically reducing scan time.

Quantum Mechanics

In quantum mechanics, the Fourier transform connects position space and momentum space representations. If psi(x) is the position-space wave function, then its Fourier transform phi(p) = F(psi)(p/h-bar) is the momentum-space wave function. The Heisenberg uncertainty principle delta_x * delta_p is greater than or equal to h-bar/2 is a direct consequence of the Fourier uncertainty relation.

Numerical PDE Solvers

Spectral methods solve PDEs by expanding the solution in Fourier (or Chebyshev) modes and evolving the coefficients. They achieve exponential convergence for smooth solutions, far outperforming finite difference methods. Pseudospectral methods compute nonlinear terms in physical space (via FFT) and derivatives in frequency space — combining the best of both.

Audio and Music

MP3 encoding uses the Modified Discrete Cosine Transform (MDCT) with overlapping windows to avoid blocking artifacts. The human auditory system perceives pitch on a logarithmic scale (the mel scale); mel filterbanks derived from DFT spectra model this. Pitch detection, chord recognition, and music transcription all rely on spectral analysis.

12. Practice Problems with Solutions

Work through these problems to master Fourier analysis. Click each problem to reveal the solution.

Compute the Fourier series of the square wave f(x) = 1 for 0 less than x less than pi and f(x) = -1 for -pi less than x less than 0, with period 2*pi.▼

The square wave is odd, so all cosine coefficients a_n = 0. For b_n: b_n = (1/pi) * integral from -pi to pi of f(x)*sin(n*x) dx = (2/pi) * integral from 0 to pi of sin(n*x) dx = (2/pi) * [-cos(n*x)/n] from 0 to pi = (2/(n*pi)) * (1 - cos(n*pi)). Since cos(n*pi) = (-1)^n, we get b_n = 0 when n is even, and b_n = 4/(n*pi) when n is odd. The Fourier series is: f(x) = (4/pi) * sum over odd n of sin(n*x)/n = (4/pi)(sin(x) + sin(3x)/3 + sin(5x)/5 + ...). At x = pi/2, this gives the Leibniz formula: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...

Compute the Fourier transform of f(x) = exp(-a*|x|) for a greater than 0.▼

Split into two pieces: integral from -infinity to 0 of exp(a*x)*exp(-2*pi*i*xi*x) dx + integral from 0 to +infinity of exp(-a*x)*exp(-2*pi*i*xi*x) dx. First piece: integral from -infinity to 0 of exp((a - 2*pi*i*xi)*x) dx = 1/(a - 2*pi*i*xi). Second piece: integral from 0 to +infinity of exp(-(a + 2*pi*i*xi)*x) dx = 1/(a + 2*pi*i*xi). Sum: 1/(a - 2*pi*i*xi) + 1/(a + 2*pi*i*xi) = 2a / (a^2 + (2*pi*xi)^2) = 2a / (a^2 + 4*pi^2*xi^2). This is a Lorentzian (Cauchy distribution in probability). The Fourier transform of a decaying exponential is a Lorentzian: sharper decay (larger a) corresponds to broader spectrum.

Prove Parseval's identity for Fourier series: if f has Fourier coefficients c_n, then the integral from -pi to pi of |f(x)|^2 dx = 2*pi * sum over all n of |c_n|^2.▼

Write the L^2 inner product of f with itself, expanding f by its Fourier series. Compute the integral from -pi to pi of |f(x)|^2 dx = integral of f(x) * (sum_n c_n * exp(i*n*x))-bar dx. By linearity and the orthogonality integral of exp(i*m*x)*exp(-i*n*x) dx = 2*pi if m = n and 0 otherwise, we get 2*pi * sum_n |c_n|^2. Formally, this uses the Bessel inequality (which holds for any orthonormal set) and completeness (equality). In L^2, the partial sums converge in L^2 norm by the completeness of the trigonometric system, justifying the interchange of sum and integral. The identity expresses that the map f to (c_n) is an isometry from L^2([-pi,pi]) to l^2(Z), the space of square-summable sequences.

A signal f has Fourier transform f-hat(xi) = 1 for |xi| less than B and 0 otherwise (an ideal low-pass filter). What is f(t)?▼

By the inverse Fourier transform: f(t) = integral from -B to B of exp(2*pi*i*xi*t) d*xi = [exp(2*pi*i*xi*t) / (2*pi*i*t)] from xi = -B to B = exp(2*pi*i*B*t)/(2*pi*i*t) - exp(-2*pi*i*B*t)/(2*pi*i*t) = (2*sin(2*pi*B*t)) / (2*pi*t) = 2B * sinc(2*B*t) where sinc(x) = sin(pi*x)/(pi*x). The ideal low-pass filter in frequency is a sinc function in time. This sinc has infinite duration (it is not causal), which is why ideal low-pass filters cannot be realized in practice — physical causal filters must approximate this behavior.

The DFT of x = [1, 1, 1, 1] (N=4). Compute X_k for k = 0, 1, 2, 3.▼

For N=4, omega = exp(-2*pi*i/4) = exp(-pi*i/2) = -i. X_k = sum over n=0 to 3 of x_n * omega^{k*n} = sum of omega^{k*n} for n=0,1,2,3 (since all x_n = 1). For k=0: X_0 = 1 + 1 + 1 + 1 = 4. For k=1: X_1 = 1 + (-i) + (-i)^2 + (-i)^3 = 1 + (-i) + (-1) + (i) = 0. For k=2: X_2 = 1 + (-i)^2 + (-i)^4 + (-i)^6 = 1 + (-1) + 1 + (-1) = 0. For k=3: X_3 = 1 + (-i)^3 + (-i)^6 + (-i)^9 = 1 + i + (-1) + (-i) = 0. So X = [4, 0, 0, 0]. A constant sequence (DC only) transforms to a spike at k=0 with amplitude N=4 and zeros elsewhere. This makes sense: a constant has energy only at zero frequency.

Solve the heat equation u_t = u_{xx} on R with initial data u(x, 0) = delta(x) (Dirac delta). What is u(x, t) and what does it represent physically?▼

Taking the Fourier transform in x: u-hat_t(xi, t) = -(2*pi*xi)^2 * u-hat(xi, t). The initial condition transforms to u-hat(xi, 0) = F(delta)(xi) = 1. The ODE gives u-hat(xi, t) = exp(-(2*pi*xi)^2 * t). Inverting the Fourier transform: the Gaussian exp(-4*pi^2*xi^2*t) transforms back to (1/sqrt(4*pi*t)) * exp(-x^2/(4t)). So u(x, t) = (1/sqrt(4*pi*t)) * exp(-x^2/(4t)). This is the fundamental solution (heat kernel) of the heat equation. Physically: if heat is concentrated at one point at t=0, it spreads into a Gaussian that broadens and flattens as time increases, while conserving total heat (the integral is always 1). The width grows as sqrt(t), consistent with the diffusive scaling.

State and prove the Riemann-Lebesgue lemma: for f in L^1(R), lim as |xi| approaches infinity of the integral of f(x)*exp(-2*pi*i*xi*x) dx = 0.▼

Step 1 (step functions): For f = 1_{[a,b]}, f-hat(xi) = integral from a to b of exp(-2*pi*i*xi*x) dx = [exp(-2*pi*i*xi*x)/(-2*pi*i*xi)] from a to b = (exp(-2*pi*i*xi*a) - exp(-2*pi*i*xi*b)) / (2*pi*i*xi). Taking absolute value: |f-hat(xi)| is less than or equal to 1 / (pi*|xi|), which goes to 0 as |xi| approaches infinity. Step 2 (step function combinations): By linearity, the lemma holds for finite linear combinations of indicator functions of intervals (step functions). Step 3 (general L^1): For any f in L^1(R) and epsilon greater than 0, approximate f by a step function s with ||f - s||_1 less than epsilon/2 (step functions are dense in L^1). Then |f-hat(xi)| is less than or equal to |(f-s)-hat(xi)| + |s-hat(xi)|. The first term is at most ||f - s||_1 less than epsilon/2 by the basic bound |f-hat(xi)| is less than or equal to ||f||_1. The second term is less than epsilon/2 for |xi| sufficiently large by Step 2. Thus |f-hat(xi)| less than epsilon for all sufficiently large |xi|. QED.

Essential Theorems at a Glance

Theorem	Hypothesis	Conclusion
Dirichlet (Fourier Series)	Piecewise smooth, period 2*pi	Series converges pointwise to (f(x+) + f(x-))/2
Parseval (Series)	f in L^2([-pi, pi])	\|\|f\|\|^2 = 2pi sum of \|c_n\|^2
Riemann-Lebesgue	f in L^1(R)	f-hat(xi) approaches 0 as \|xi\| approaches infinity
Fourier Inversion	f, f-hat in L^1(R)	f(x) = integral of f-hat(xi)exp(2piixix) dxi a.e.
Plancherel	f in L^2(R)	\|\|f-hat\|\|_L2 = \|\|f\|\|_L2
Convolution Theorem	f, g in L^1(R)	F(f * g) = F(f) * F(g)
Nyquist-Shannon	f bandlimited to B Hz	Perfect reconstruction from samples at rate greater than or equal to 2B
Uncertainty Principle	f in L^2(R)	Delta_t * Delta_xi is greater than or equal to 1/(4*pi)
Cooley-Tukey FFT	N = 2^m inputs	DFT computed in O(N log N) operations

Ready to Test Your Fourier Analysis Skills?

Practice Fourier series computations, transform problems, and FFT algorithm questions with our adaptive math tool. Build confidence for graduate analysis and signal processing exams.

Start Practicing Now