
Stochastic Processes: Modeling Random Systems Over Time

Stochastic processes are mathematical models for systems that evolve randomly over time. From stock prices to queuing networks to the motion of molecules, these tools provide the rigorous language for reasoning about uncertainty in dynamic settings.

Learning Objectives

After working through this guide, you will be able to:

  • Define probability spaces, sigma-algebras, and random variables with full measure-theoretic rigor
  • Analyze Markov chains using transition matrices, compute stationary distributions, and verify ergodicity
  • Work with Poisson processes and derive key properties including superposition and thinning
  • State the definition and properties of standard Brownian motion and compute quadratic variation
  • Apply the optional stopping theorem and Doob's martingale convergence theorem
  • Use Ito's lemma to differentiate functions of Brownian motion and solve the Black-Scholes equation
  • Analyze M/M/1 and M/M/c queues and apply Little's law
  • State and apply the renewal reward theorem and its consequences

1. Probability Space Fundamentals

A stochastic process lives on a probability space. Before studying processes, we must understand the measure-theoretic foundations that give probability its mathematical precision.

The Probability Triple

A probability space is a triple (Omega, F, P) where:

  • Omega — the sample space: the set of all possible outcomes of the random experiment
  • F — the sigma-algebra: a collection of subsets of Omega called events, closed under complementation and countable unions
  • P — the probability measure: a function from F to [0,1] satisfying P(Omega) = 1 and countable additivity

Sigma-Algebras in Detail

A sigma-algebra F on Omega is a family of subsets satisfying three axioms: (1) Omega is in F, (2) if A is in F then its complement is in F, and (3) if A1, A2, ... are in F then their countable union is in F. The pair (Omega, F) is called a measurable space.

The Borel sigma-algebra on the real line is generated by all open intervals. It contains all open sets, closed sets, and most sets you encounter in practice. Stochastic processes on R use Borel sets as their natural sigma-algebra.

Filtrations and Information

A filtration is an increasing family of sigma-algebras F(0) contained in F(1) contained in ... contained in F. We interpret F(t) as the information available up to time t. A process X(t) is adapted to the filtration if X(t) is measurable with respect to F(t) for each t — meaning we can observe X(t) given the information at time t.

Kolmogorov's Extension Theorem

A fundamental existence result: given a consistent family of finite-dimensional distributions, Kolmogorov's theorem guarantees the existence of a stochastic process realizing those distributions. This is what allows us to speak of Brownian motion and Gaussian processes as well-defined objects without explicit construction.

2. Random Variables: Distributions and Expectations

A random variable X is a measurable function from (Omega, F) to (R, B(R)), where B(R) is the Borel sigma-algebra. The measurability condition ensures that events such as X less than or equal to x are in F and hence have a well-defined probability.

Cumulative Distribution Function

The CDF of X is F(x) = P(X less than or equal to x). Every CDF satisfies: (1) it is non-decreasing, (2) it is right-continuous, (3) it converges to 0 as x approaches negative infinity, and (4) it converges to 1 as x approaches positive infinity. These four properties characterize CDFs completely.

Expectation and Key Distributions

The expectation of X is defined as the Lebesgue integral of X with respect to P. For a discrete random variable, E[X] = sum of x * P(X = x). For a continuous random variable with density f, E[X] = integral of x * f(x) dx.

Exponential Distribution

X ~ Exp(lambda): f(x) = lambda * exp(-lambda * x), x >= 0

Mean = 1/lambda. Variance = 1/lambda^2. The exponential distribution is the unique continuous memoryless distribution, making it the inter-arrival distribution of Poisson processes.

Normal Distribution

X ~ N(mu, sigma-squared)

Mean = mu. Variance = sigma-squared. The Gaussian distribution is stable under addition and appears as the limit distribution in the Central Limit Theorem. Brownian motion increments are Gaussian.

Poisson Distribution

X ~ Poisson(lambda): P(X=k) = lambda^k * e^(-lambda) / k!

Mean = lambda. Variance = lambda. Models the number of events in a fixed time interval when events occur at constant rate and independently.

Geometric Distribution

X ~ Geom(p): P(X=k) = (1-p)^(k-1) * p, k = 1,2,...

Mean = 1/p. Variance = (1-p)/p^2. The discrete memoryless distribution. Models the number of trials until first success.

Variance and Higher Moments

The variance is Var(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2. The n-th moment is E[X^n]. The moment generating function is M(t) = E[exp(tX)] when it exists in a neighborhood of zero. The MGF uniquely determines the distribution and satisfies the n-th derivative of M evaluated at zero equals E[X^n], making it a powerful tool for computing moments.

Key MGF Results

  • Normal N(mu, sigma^2): M(t) = exp(mu*t + sigma^2*t^2/2)
  • Poisson(lambda): M(t) = exp(lambda*(e^t - 1))
  • Exponential Exp(lambda): M(t) = lambda/(lambda - t) for t less than lambda
  • If X and Y are independent: M of X+Y equals M_X times M_Y

Concentration Inequalities

Markov's inequality: for non-negative X and a greater than 0, P(X greater than or equal to a) is at most E[X]/a. Chebyshev's inequality: P(|X - mu| greater than or equal to k*sigma) is at most 1/k^2. These hold without assuming any specific distribution, making them widely applicable.
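
Both inequalities are simple to check empirically. A minimal Python sketch (standard library only) sampling from Exp(1), whose mean and standard deviation are both 1; the thresholds a = 3 and k = 2 are illustrative:

```python
import random
import math

random.seed(0)
n = 100_000
xs = [random.expovariate(1.0) for _ in range(n)]  # Exp(1): E[X] = 1, Var(X) = 1

a = 3.0
p_markov = sum(x >= a for x in xs) / n
assert p_markov <= 1.0 / a  # Markov: P(X >= a) <= E[X]/a with E[X] = 1

k = 2.0
p_cheb = sum(abs(x - 1.0) >= k for x in xs) / n  # mu = sigma = 1
assert p_cheb <= 1.0 / k ** 2  # Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2
```

For Exp(1) the exact tail is P(X >= 3) = exp(-3), about 0.05 — far below both bounds, illustrating that these inequalities are safe but often loose.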

3. Markov Chains

A discrete-time Markov chain is a sequence of random variables X(0), X(1), X(2), ... taking values in a countable state space S, satisfying the Markov property: the future depends on the present only, not the past. Markov chains are among the most important stochastic processes, appearing throughout queuing, genetics, PageRank, and machine learning.

Transition Matrix

The transition probabilities p(i,j) = P(X(n+1) = j | X(n) = i) form the transition matrix P, where P is row-stochastic: all entries are non-negative and each row sums to one. The n-step transition probabilities are the (i,j) entry of the matrix P^n.

Example: Two-State Chain

States: (0, 1). Transition matrix:

P = [ 1-alpha   alpha  ]
    [  beta    1-beta ]

From state 0, the chain moves to state 1 with probability alpha and stays in state 0 with probability 1-alpha. The stationary distribution is pi(0) = beta/(alpha+beta) and pi(1) = alpha/(alpha+beta).
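Power iteration recovers the stationary distribution numerically. A minimal Python sketch for the illustrative values alpha = 0.3 and beta = 0.1, compared against the closed form pi = (beta/(alpha+beta), alpha/(alpha+beta)):

```python
alpha, beta = 0.3, 0.1  # illustrative transition parameters
P = [[1 - alpha, alpha],
     [beta, 1 - beta]]

pi = [0.5, 0.5]
for _ in range(200):  # repeated left-multiplication: pi <- pi * P
    pi = [pi[0] * P[0][0] + pi[1] * P[1][0],
          pi[0] * P[0][1] + pi[1] * P[1][1]]

# Closed form: pi(0) = beta/(alpha+beta) = 0.25, pi(1) = alpha/(alpha+beta) = 0.75
exact = [beta / (alpha + beta), alpha / (alpha + beta)]
assert all(abs(p - e) < 1e-9 for p, e in zip(pi, exact))
```

Convergence is geometric at rate |1 - alpha - beta| = 0.6, the second eigenvalue of P, so 200 iterations are far more than enough.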

Classification of States

Recurrent vs. Transient

State i is recurrent if the chain returns to i with probability 1. State i is transient if there is a positive probability of never returning. For finite irreducible chains, all states are recurrent.

Period of a State

The period of state i is the greatest common divisor of all n such that P^n(i,i) greater than 0. If the period is 1, the state is aperiodic. All states in an irreducible chain have the same period.

Stationary Distributions

A probability vector pi is a stationary distribution if pi*P = pi. Equivalently, pi(j) = sum over i of pi(i)*p(i,j) for all j. The stationary distribution represents the long-run proportion of time the chain spends in each state.

Detailed Balance (Reversibility)

A distribution pi satisfies detailed balance with P if pi(i)*p(i,j) = pi(j)*p(j,i) for all i, j. Detailed balance implies stationarity but not vice versa. Chains satisfying detailed balance are called reversible and are easier to analyze. Random walks on undirected graphs are reversible.

Ergodicity and Convergence

An irreducible aperiodic chain is ergodic: P^n(i,j) converges to pi(j) as n approaches infinity for all states i and j. The convergence is geometric: there exist C and rho less than 1 such that |P^n(i,j) - pi(j)| is at most C * rho^n. The rate rho is related to the second-largest eigenvalue of P.

The ergodic theorem for Markov chains states: if the chain is irreducible and positive recurrent, then with probability 1, the time average (1/n) * sum from k=1 to n of f(X(k)) converges to the ensemble average sum over j of f(j)*pi(j).

Hitting Times

The hitting time T(A) = min of n greater than or equal to 0 with X(n) in A is the first time the chain enters set A. The expected hitting time h(i) = E[T(A) | X(0) = i] satisfies a system of linear equations: h(i) = 0 for i in A, and h(i) = 1 + sum over j not in A of p(i,j)*h(j) for i not in A. Solving this system gives the expected absorption times.

Gambler's Ruin

A gambler starts with k dollars and bets one dollar per round, winning with probability p and losing with probability q = 1-p. Play stops at 0 (ruin) or N (goal). The probability of reaching N before ruin is: (1-(q/p)^k) / (1-(q/p)^N) when p is not equal to q, and k/N when p = q = 1/2. This classic result is solved using hitting time methods on a Markov chain with states 0 through N.
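The ruin formula can be checked by simulation. A minimal Python sketch comparing the closed form against a Monte Carlo estimate, for the illustrative values p = 0.4, k = 3, N = 10:

```python
import random

random.seed(1)
p, k, N = 0.4, 3, 10  # illustrative win probability, start, and goal
q = 1 - p
exact = (1 - (q / p) ** k) / (1 - (q / p) ** N)  # valid since p != q

trials, wins = 20_000, 0
for _ in range(trials):
    x = k
    while 0 < x < N:  # play until absorption at 0 or N
        x += 1 if random.random() < p else -1
    wins += (x == N)

assert abs(wins / trials - exact) < 0.02  # empirical vs. closed form
```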

4. Poisson Processes

The Poisson process is the canonical model for events arriving randomly over time — phone calls, network packets, radioactive decays, and insurance claims. It is the simplest continuous-time counting process and serves as the foundation for more general point processes.

Definition and Characterization

A counting process N(t) is a Poisson process with rate lambda if: (1) N(0) = 0, (2) the process has independent increments, (3) the process has stationary increments, and (4) N(t+h) - N(t) is Poisson(lambda*h) for any h greater than 0. Equivalently, a Poisson process is a renewal process with exponential inter-arrival times with rate lambda.

Key Properties

  • E[N(t)] = lambda*t
  • Var[N(t)] = lambda*t
  • Inter-arrival times T1, T2, ... are i.i.d. Exp(lambda)
  • P(N(t) = k) = (lambda*t)^k * exp(-lambda*t) / k!
  • N(t) - lambda*t is a martingale (the centered Poisson process)

Superposition of Poisson Processes

If N1(t) and N2(t) are independent Poisson processes with rates lambda1 and lambda2, then N1(t) + N2(t) is a Poisson process with rate lambda1 + lambda2. This superposition property extends to any finite or countable number of independent Poisson processes.

Practical interpretation: if customers arrive at two counters independently at rates lambda1 and lambda2, the combined arrival stream is also Poisson. This makes the Poisson process the natural aggregate model for many independent sparse-event streams.

Thinning of Poisson Processes

If each event of a Poisson process with rate lambda is independently classified as type 1 with probability p and type 2 with probability 1-p, then the type-1 events form a Poisson process with rate lambda*p and the type-2 events form a Poisson process with rate lambda*(1-p), and the two are independent.
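Thinning is easy to demonstrate by simulation: generate arrivals with Exp(lambda) gaps, flip an independent coin for each, and compare the empirical rates of the two sub-streams. A minimal Python sketch with illustrative values lambda = 10 and p = 0.3:

```python
import random

random.seed(2)
lam, p, T = 10.0, 0.3, 1000.0  # illustrative rate, thinning probability, horizon

t, type1, type2 = 0.0, 0, 0
while True:
    t += random.expovariate(lam)  # Exp(lambda) inter-arrival time
    if t > T:
        break
    if random.random() < p:  # independent classification of each event
        type1 += 1
    else:
        type2 += 1

# Thinned streams have rates lambda*p = 3 and lambda*(1-p) = 7
assert abs(type1 / T - lam * p) < 0.5
assert abs(type2 / T - lam * (1 - p)) < 0.5
```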

Poisson Arrival Paradox

Suppose buses arrive as a Poisson process with rate lambda. A passenger arriving at a random time expects to wait 1/lambda for the next bus. But by the memoryless property, the expected time since the last bus is also 1/lambda. Thus the expected inter-arrival interval containing the passenger's arrival has mean 2/lambda — twice the average inter-arrival. This inspection paradox arises whenever we sample a length-biased interval.

Conditional Distribution of Arrivals

Given N(t) = n, the n arrival times are distributed as the order statistics of n independent Uniform(0, t) random variables. This result connects the Poisson process to uniform sampling and is used to compute conditional expectations efficiently.

5. Brownian Motion

Standard Brownian motion (the Wiener process) is the fundamental continuous-time stochastic process. It is the scaling limit of symmetric random walks and the building block for stochastic calculus, financial mathematics, and the theory of partial differential equations.

Definition

A stochastic process W(t) for t greater than or equal to 0 is a standard Brownian motion if: (1) W(0) = 0 almost surely, (2) W has independent increments, (3) W has stationary increments, and (4) W(t) - W(s) ~ N(0, t-s) for all 0 less than or equal to s less than t. Existence follows from Kolmogorov's theorem; the Gaussian structure follows from conditions (2), (3), and (4).

Key Properties of Brownian Motion

  • Continuity: W has continuous sample paths almost surely (Kolmogorov continuity criterion)
  • Nowhere differentiable: W is continuous but not differentiable at any point almost surely — its paths are wildly irregular
  • Scaling invariance: For any c greater than 0, c^(-1/2) * W(c*t) is again a standard Brownian motion
  • Time inversion: t * W(1/t) for t greater than 0, with value 0 at t = 0, is a Brownian motion
  • Reflection principle: P(max over [0,t] of W(s) greater than a) = 2 * P(W(t) greater than a) for a greater than 0

Quadratic Variation

The quadratic variation of Brownian motion over [0, t] equals t. Formally, for any sequence of partitions of [0, t] with mesh going to zero, the sum of squared increments converges to t in probability (and in L2). This is written informally as (dW)^2 = dt.

The quadratic variation result is the source of the extra term in Ito's lemma. In ordinary calculus, (dx)^2 is negligible. For Brownian motion, (dW)^2 = dt is not negligible, fundamentally changing the chain rule for stochastic calculus.
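The quadratic variation claim is easy to verify numerically: on a fine partition, the sum of squared Brownian increments concentrates near t. A minimal Python sketch, standard library only:

```python
import random

random.seed(3)
t, n = 1.0, 100_000  # horizon and number of partition points
dt = t / n

qv = 0.0
for _ in range(n):
    dW = random.gauss(0.0, dt ** 0.5)  # Brownian increment ~ N(0, dt)
    qv += dW * dW

assert abs(qv - t) < 0.05  # sum of squared increments ≈ t
```

The fluctuation of the sum around t has standard deviation sqrt(2*t^2/n), roughly 0.004 here, so the check passes with ample margin.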

Geometric Brownian Motion

Geometric Brownian motion satisfies dS = mu*S*dt + sigma*S*dW, which has the explicit solution:

S(t) = S(0) * exp((mu - sigma^2/2)*t + sigma*W(t))

GBM is the standard model for stock prices under the Black-Scholes framework. The drift is mu - sigma^2/2 rather than mu because of the Ito correction from quadratic variation.
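The explicit solution can be sampled directly, and a Monte Carlo estimate recovers the Ito-corrected drift of log S. A minimal Python sketch (the parameter values are illustrative):

```python
import random
import math

random.seed(4)
S0, mu, sigma, t = 100.0, 0.08, 0.2, 1.0  # illustrative parameters
n = 200_000

sum_log = 0.0
for _ in range(n):
    W = random.gauss(0.0, math.sqrt(t))                       # W(t) ~ N(0, t)
    S = S0 * math.exp((mu - sigma ** 2 / 2) * t + sigma * W)  # exact GBM solution
    sum_log += math.log(S / S0)

mean_log = sum_log / n
# Mean log-return is (mu - sigma^2/2)*t = 0.06, not mu*t: the Ito correction
assert abs(mean_log - (mu - sigma ** 2 / 2) * t) < 0.005
```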

6. Martingales

A martingale is a stochastic process that models a fair game: the best prediction of a future value given current information is the current value. Martingales are central to mathematical finance, where the absence of arbitrage is equivalent to the existence of an equivalent martingale measure.

Definition

A process M(t) adapted to filtration (F(t)) is a martingale if E[|M(t)|] is finite for all t and E[M(t) | F(s)] = M(s) for all s less than or equal to t. If the equality is replaced by greater than or equal to, the process is a submartingale; if replaced by less than or equal to, a supermartingale.

Key Martingale Examples

  • W(t) — standard Brownian motion is a martingale
  • W(t)^2 - t — the quadratic variation compensator; this is a martingale, not W(t)^2 alone
  • exp(theta*W(t) - theta^2*t/2) — the exponential martingale, used in Girsanov's theorem to change probability measure
  • N(t) - lambda*t — the centered Poisson process is a martingale
  • E[Z | F(t)] — the conditional expectation of any integrable Z is a martingale in t

Optional Stopping Theorem

Let M(t) be a martingale and T be a stopping time. Under regularity conditions, E[M(T)] = E[M(0)]. The three standard sufficient conditions are: (1) T is bounded almost surely, (2) E[T] is finite and the increments of M are bounded, or (3) the stopped process M(t stopped at T) is dominated by an integrable random variable.

Optional Stopping in Practice

For a symmetric random walk starting at k with absorbing barriers at 0 and N: since the walk itself is a martingale and T (the absorption time) satisfies regularity conditions, E[X(T)] = k. But X(T) is 0 or N, so N * P(reach N) = k, giving P(reach N) = k/N. This is the gambler's ruin formula derived in one line using optional stopping.

Doob's Martingale Convergence Theorem

If M(t) is a non-negative supermartingale (or more generally, if the supremum over t of E[|M(t)|] is finite), then M(t) converges almost surely as t approaches infinity. This is a deep result — it says that a non-negative supermartingale cannot oscillate forever; it must settle down.

Doob's inequality: for a non-negative submartingale M and any lambda greater than 0, P(max over 0 to t of M(s) greater than or equal to lambda) is at most E[M(t)] / lambda. This is the martingale analogue of Markov's inequality, but applied to the running maximum.

Doob's L2 Inequality

For a martingale M in L2, E[max over 0 to t of M(s)^2] is at most 4 * E[M(t)^2]. The constant 4 is sharp. This estimate is used to prove the existence and properties of stochastic integrals.

7. Stochastic Differential Equations and Ito's Lemma

Stochastic differential equations (SDEs) describe the evolution of processes driven by Brownian motion. They generalize ordinary differential equations to a random-noise setting and are the language of mathematical finance, stochastic control, and diffusion processes.

The Ito Stochastic Integral

The stochastic integral of an adapted process H(t) against Brownian motion is defined as the L2 limit of simple integrals. The key property is the Ito isometry:

E[(integral from 0 to T of H(t) dW(t))^2] = E[integral from 0 to T of H(t)^2 dt]

This isometry is the foundation for the L2 theory of stochastic integration. When H is adapted and suitably integrable, the stochastic integral is a martingale with mean zero; the isometry then identifies its variance.

Ito's Lemma (Stochastic Chain Rule)

Let X(t) satisfy dX = mu(t) dt + sigma(t) dW and let f(t, x) be a smooth function (twice differentiable in x, once in t). Then:

df = (df/dt + mu * df/dx + (1/2) * sigma^2 * d^2f/dx^2) dt + sigma * df/dx * dW

The extra term (1/2) * sigma^2 * d^2f/dx^2 is the Ito correction. It arises because (dW)^2 = dt is non-negligible. In ordinary calculus, second-order terms vanish; in stochastic calculus they do not.

Example: applying Ito's lemma to f(W) = W(t)^2 gives d(W^2) = 2W*dW + dt, or equivalently W(t)^2 = 2 times the integral from 0 to t of W(s) dW(s), plus t. The extra t comes from the quadratic variation correction.
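This identity can be checked numerically by discretizing the integral with left-endpoint sums (the Ito convention). A minimal Python sketch, standard library only:

```python
import random

random.seed(5)
t, n = 1.0, 100_000  # horizon and number of time steps
dt = t / n

W, ito_integral = 0.0, 0.0
for _ in range(n):
    dW = random.gauss(0.0, dt ** 0.5)  # Brownian increment ~ N(0, dt)
    ito_integral += W * dW             # left-endpoint evaluation (Ito convention)
    W += dW

# Ito's lemma: W(t)^2 = 2 * integral of W dW + t
assert abs(W * W - (2.0 * ito_integral + t)) < 0.1
```

The discrepancy between W(t)^2 and 2 times the integral is exactly the discretized quadratic variation, which converges to t — the ordinary chain rule would predict no such term.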

Ito vs. Stratonovich Calculus

The Ito integral evaluates the integrand at the left endpoint of each subinterval. The Stratonovich integral uses the midpoint. Stratonovich satisfies the ordinary chain rule (no correction term), making it natural for physics and differential geometry. Ito is natural for finance and is the standard in mathematics. The two are related: Ito integral = Stratonovich integral minus the (1/2) correction term.

Ito SDE Advantages

  • Stochastic integrals are martingales
  • Ito isometry holds cleanly
  • Natural for finance (no-arbitrage)
  • Adapted (non-anticipating) integrand

Stratonovich Advantages

  • Ordinary chain rule applies
  • Natural for physics and geometry
  • Approximated by smooth noise limits
  • Wong-Zakai theorem applies

The Black-Scholes Equation

Under the Black-Scholes model, the stock price satisfies dS = r*S*dt + sigma*S*dW under the risk-neutral measure Q. The price V(t, S) of a European derivative satisfies the Black-Scholes PDE:

dV/dt + r*S*(dV/dS) + (1/2)*sigma^2*S^2*(d^2V/dS^2) - r*V = 0

For a European call with strike K and expiry T, the solution is:

V = S*N(d1) - K*exp(-r*(T-t))*N(d2)

d1 = [ln(S/K) + (r + sigma^2/2)*(T-t)] / [sigma*sqrt(T-t)]

d2 = d1 - sigma*sqrt(T-t)

The Black-Scholes formula is derived by applying Ito's lemma to V(t, S), constructing a delta-hedged portfolio that eliminates randomness, and invoking the no-arbitrage principle to equate the portfolio return with the risk-free rate.
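The closed-form price translates directly into code. A minimal Python sketch using the standard library's erf for the normal CDF; the parameter values in the usage line are illustrative:

```python
import math

def norm_cdf(x: float) -> float:
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S: float, K: float, r: float, sigma: float, tau: float) -> float:
    # European call price; tau = T - t is the time to expiry
    d1 = (math.log(S / K) + (r + sigma ** 2 / 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

# At-the-money call: S = K = 100, r = 5%, sigma = 20%, one year to expiry
price = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)  # approximately 10.45
```

As a sanity check, the price always exceeds the discounted intrinsic value S - K*exp(-r*tau), consistent with no-arbitrage.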

Existence and Uniqueness for SDEs

The SDE dX = mu(t, X) dt + sigma(t, X) dW with initial condition X(0) = x0 has a unique strong solution if mu and sigma satisfy Lipschitz and linear growth conditions: |mu(t,x) - mu(t,y)| + |sigma(t,x) - sigma(t,y)| is at most K*|x-y|, and |mu(t,x)|^2 + |sigma(t,x)|^2 is at most K^2*(1 + |x|^2). These are analogues of the Picard-Lindelof conditions for ODEs.

8. Queuing Theory

Queuing theory analyzes systems where customers arrive, wait for service, and depart. Applications span telecommunications, traffic engineering, hospital operations, and computer systems. The Kendall notation A/B/c describes queues by arrival process, service distribution, and number of servers.

The M/M/1 Queue

The M/M/1 queue has Poisson arrivals at rate lambda, exponential service times with rate mu, and a single server. The traffic intensity is rho = lambda/mu. The queue is stable when rho is less than 1. In steady state, the number of customers N has the geometric distribution:

  • P(N = n) = (1 - rho) * rho^n for n = 0, 1, 2, ...
  • E[N] = rho / (1 - rho)
  • E[W] = 1 / (mu - lambda) — mean time in system
  • E[Wq] = rho / (mu - lambda) — mean waiting time in queue
  • E[Nq] = rho^2 / (1 - rho) — mean queue length

The M/M/1 queue is analyzed using the balance equations for the Markov chain on states 0, 1, 2, ... The global balance equation at state n equates the rate of leaving state n to the rate of entering state n: (lambda + mu)*pi(n) = lambda*pi(n-1) + mu*pi(n+1) for n greater than or equal to 1.

The M/M/c Queue

With c servers, the system is stable when rho = lambda/(c*mu) is less than 1. The Erlang-C formula gives C(c, a) — the probability that an arriving customer must wait — where a = lambda/mu is the offered load:

C(c, a) = (a^c / c!) * (1/(1-rho)) / [sum(k=0 to c-1, a^k/k!) + (a^c/c!)*(1/(1-rho))]

Mean waiting time in queue: E[Wq] = C(c,a) / (c*mu - lambda)
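
The Erlang-C formula translates directly into code. A minimal Python sketch; as a sanity check, with c = 1 the waiting probability reduces to rho, matching the M/M/1 fact that an arrival waits exactly when the server is busy (the rates below are illustrative):

```python
import math

def erlang_c(c: int, a: float) -> float:
    # P(arriving customer must wait) in an M/M/c queue,
    # with offered load a = lambda/mu and utilization rho = a/c < 1
    rho = a / c
    top = (a ** c / math.factorial(c)) / (1.0 - rho)
    return top / (sum(a ** k / math.factorial(k) for k in range(c)) + top)

lam, mu, c = 4.0, 6.0, 1   # illustrative rates, single server
C = erlang_c(c, lam / mu)  # equals rho = 2/3 when c = 1
Wq = C / (c * mu - lam)    # mean wait in queue: 1/3 hour
```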

Little's Law

Little's law is one of the most powerful results in queuing theory: L = lambda * W, where L is the mean number of customers in the system, lambda is the mean arrival rate (throughput), and W is the mean time each customer spends in the system. The result holds for any stable queuing system regardless of the arrival process, service distribution, or queuing discipline.

Little's Law Applications

  • If a factory produces 100 units per day and each unit spends 5 days in work-in-progress, then the WIP inventory is 500 units on average
  • For the M/M/1 queue: L = lambda/(mu - lambda), and W = L/lambda = 1/(mu - lambda), consistent with the direct formula
  • Applied separately to the queue and to the server, Little's law gives both E[Nq] = lambda * E[Wq] and rho = lambda/mu

PASTA Property

PASTA stands for Poisson Arrivals See Time Averages. For a system with Poisson arrivals, the fraction of arriving customers who find the system in any particular state equals the fraction of time the system spends in that state. This is not true for non-Poisson arrivals. PASTA justifies using the steady-state distribution to compute what an arriving customer sees.

9. Renewal Theory

Renewal theory studies processes where events (renewals) occur repeatedly, with the gaps between events being i.i.d. positive random variables. It generalizes the Poisson process (which has exponential inter-renewal times) and provides tools for long-run averages in applied probability.

Basic Setup

Let X1, X2, ... be i.i.d. positive random variables with mean mu = E[X1]. Define S(n) = X1 + ... + Xn as the time of the n-th renewal. The renewal counting process is N(t) = max of n such that S(n) is at most t. The elementary renewal theorem states E[N(t)] / t approaches 1/mu as t approaches infinity.

Renewal Function

m(t) = E[N(t)] satisfies the renewal equation: m(t) = F(t) + integral from 0 to t of m(t-s) dF(s), where F is the CDF of the inter-renewal distribution. This convolution equation has the solution m(t) = sum from n=1 to infinity of F^(n)(t), where F^(n) is the n-fold convolution of F. For Poisson processes, m(t) = lambda*t exactly.

Renewal Reward Theorem

Suppose a reward R(n) is earned at the n-th renewal, with E[R(n)] = r finite. The long-run average reward rate is:

lim (t to infinity) of [total reward up to t] / t = E[R] / E[X] = r / mu

This is the renewal reward theorem (or regenerative process theorem). It says the long-run rate equals the expected reward per cycle divided by the expected cycle length. It is the primary tool for long-run average cost analysis.
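A quick simulation illustrates the theorem. A minimal Python sketch (standard library only) of a system alternating between exponential up-times and exponential down-times, earning reward only while up; the numbers (mean up-time 8 hours, mean down-time 2 hours, $100/hour) are illustrative:

```python
import random

random.seed(6)
up_mean, down_mean, rate = 8.0, 2.0, 100.0  # illustrative parameters

total_time, total_reward = 0.0, 0.0
for _ in range(50_000):  # one cycle = one working period + one repair period
    up = random.expovariate(1.0 / up_mean)
    down = random.expovariate(1.0 / down_mean)
    total_reward += rate * up
    total_time += up + down

avg_rate = total_reward / total_time
# Renewal reward: E[reward per cycle] / E[cycle length] = 800/10 = 80
assert abs(avg_rate - rate * up_mean / (up_mean + down_mean)) < 1.0
```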

Key Renewal Theory Results

Blackwell's Theorem

For a non-lattice renewal process, the expected number of renewals in (t, t+h] converges to h/mu as t approaches infinity, for any fixed h greater than 0.

Key Renewal Theorem

For a directly Riemann integrable function g, the convolution integral of g with the renewal measure converges to (1/mu) * integral from 0 to infinity of g(t) dt as time goes to infinity.

Age and Excess Life

At time t, the age A(t) is the time since the last renewal, and the excess (residual) life E(t) is the time until the next renewal. In steady state, the distribution of excess life is related to the size-biased distribution of the inter-renewal times. This is related to the inspection paradox: the interval containing t tends to be longer than a typical interval.

10. Applications Across Disciplines

Stochastic processes provide the mathematical framework for modeling uncertainty in time across a wide range of fields.

Mathematical Finance

  • Option pricing: Black-Scholes-Merton model uses geometric Brownian motion for stock prices and Ito's lemma to derive the replicating portfolio
  • Risk-neutral pricing: Under the equivalent martingale measure Q, discounted asset prices are martingales, and option prices are expectations under Q
  • Interest rate models: Vasicek, Cox-Ingersoll-Ross, and Hull-White models use SDEs to model the evolution of the short rate
  • Credit risk: Structural models treat default as the first passage time of a firm-value process below a barrier

Biology and Epidemiology

  • Population genetics: Wright-Fisher model uses Markov chains to model allele frequency drift; the diffusion limit is an SDE
  • Epidemics: Stochastic SIR models use continuous-time Markov chains; the basic reproduction number R0 determines extinction vs. outbreak
  • Neuroscience: Leaky integrate-and-fire neurons are modeled by SDEs; spike times are hitting times of diffusion processes
  • Protein folding: Conformational changes modeled as Markov chains on metastable states; transition rates from the Arrhenius formula

Communications and Networks

  • Network traffic: Poisson arrival models for packet arrivals; queuing analysis for router buffers and congestion
  • Wireless channels: Rayleigh fading modeled as a complex Gaussian process; outage probability as the CDF of channel gain
  • Spread spectrum: CDMA interference modeled as sums of random processes; capacity analysis uses large-deviations theory
  • PageRank: The Google PageRank algorithm is the stationary distribution of a Markov chain on the web graph

Operations Research

  • Inventory control: (s, S) policies analyzed using renewal theory; optimal reorder points minimize long-run average cost
  • Reliability: System lifetime modeled as a stopping time; mean time to failure computed via renewal equations
  • Scheduling: Multi-class queuing networks optimized using stochastic dynamic programming (Markov decision processes)
  • Monte Carlo: Markov chain Monte Carlo (MCMC) uses ergodic Markov chains to sample from complex probability distributions

Practice Problems with Solutions

Problem 1 — Markov Chain Stationary Distribution

A Markov chain has three states (1, 2, 3) with transition matrix: P(1,2) = 1, P(2,1) = 1/2, P(2,3) = 1/2, P(3,2) = 1. All other transitions are zero. Find the stationary distribution.

Solution

The balance equations are pi = pi * P. Write out: pi(1) = (1/2)*pi(2), pi(2) = pi(1) + pi(3), pi(3) = (1/2)*pi(2), with pi(1)+pi(2)+pi(3) = 1.

From the first and third equations: pi(1) = pi(3) = (1/2)*pi(2). Substituting: (1/2)*pi(2) + pi(2) + (1/2)*pi(2) = 1, so 2*pi(2) = 1, giving pi(2) = 1/2. Therefore pi(1) = 1/4, pi(2) = 1/2, pi(3) = 1/4.

Problem 2 — Poisson Process Thinning

Customers arrive at a store according to a Poisson process with rate 10 per hour. Each customer independently makes a purchase with probability 0.3. (a) What is the distribution of purchasing customers per hour? (b) What is the probability of no purchases in a 30-minute period?

Solution

(a) By the thinning property, purchasing customers form a Poisson process with rate lambda*p = 10*0.3 = 3 per hour. So the number of purchases per hour is Poisson(3).

(b) Over 30 minutes, purchases arrive as Poisson(3 * 0.5) = Poisson(1.5). P(N = 0) = exp(-1.5) * 1.5^0 / 0! = exp(-1.5) which is approximately 0.223, or about 22.3%.

Problem 3 — Ito's Lemma

Let W(t) be standard Brownian motion. Use Ito's lemma to find the SDE satisfied by X(t) = exp(W(t)).

Solution

Apply Ito's lemma with f(w) = exp(w) to the process W(t) itself, which has drift mu = 0 and diffusion coefficient sigma = 1:

df/dw = exp(w), d^2f/dw^2 = exp(w)

dX = (0 + 0*X + (1/2)*1^2*X) dt + 1*X dW = (1/2)*X dt + X dW

So X(t) = exp(W(t)) satisfies dX = (1/2)*X dt + X*dW. This is geometric Brownian motion with mu = 1/2 and sigma = 1. The explicit solution is X(t) = exp((1/2 - 1/2)*t + W(t)) = exp(W(t)), which is self-consistent.

Problem 4 — M/M/1 Queue

An M/M/1 queue has arrival rate lambda = 4 per hour and service rate mu = 6 per hour. Compute: (a) the traffic intensity rho, (b) the mean number in the system, (c) the mean time in the system.

Solution

(a) rho = lambda/mu = 4/6 = 2/3.

(b) E[N] = rho/(1-rho) = (2/3)/(1/3) = 2 customers.

(c) E[W] = 1/(mu-lambda) = 1/(6-4) = 1/2 hour = 30 minutes.

Verification via Little's law: L = lambda * W = 4 * (1/2) = 2. Consistent with (b).

Problem 5 — Optional Stopping Theorem

A symmetric random walk starts at position 3. It is absorbed at 0 or 7. Use the optional stopping theorem to find (a) the probability of being absorbed at 7, and (b) the expected absorption time.

Solution

(a) X(n) is a martingale. By optional stopping, E[X(T)] = X(0) = 3. Since X(T) is 0 or 7, we get 7*P(reach 7) + 0*P(reach 0) = 3, so P(reach 7) = 3/7.

(b) X(n)^2 - n is a martingale for a symmetric walk. By optional stopping: E[X(T)^2 - T] = X(0)^2 - 0 = 9. So E[T] = E[X(T)^2] - 9. Now E[X(T)^2] = 49*(3/7) + 0*(4/7) = 21. Therefore E[T] = 21 - 9 = 12.
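A simulation confirms both answers. A minimal Python sketch of the symmetric walk started at 3 with absorbing barriers at 0 and 7:

```python
import random

random.seed(7)
start, N = 3, 7
trials = 40_000

hits, total_T = 0, 0
for _ in range(trials):
    x, steps = start, 0
    while 0 < x < N:  # run until absorption
        x += 1 if random.random() < 0.5 else -1
        steps += 1
    hits += (x == N)
    total_T += steps

assert abs(hits / trials - 3.0 / 7.0) < 0.015  # P(reach 7) = 3/7
assert abs(total_T / trials - 12.0) < 0.3      # E[T] = k*(N-k) = 3*4 = 12
```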

Problem 6 — Renewal Reward Theorem

A machine alternates between working (exponential with mean 8 hours) and repair (exponential with mean 2 hours). Revenue accrues at rate $100 per working hour. What is the long-run average revenue rate?

Solution

Each cycle consists of one working period and one repair period. The mean cycle length is 8 + 2 = 10 hours. The expected revenue per cycle is $100 * 8 = $800 (earned only during working time). By the renewal reward theorem:

Long-run rate = E[reward per cycle] / E[cycle length] = 800 / 10 = $80 per hour

Alternatively, the machine is working a fraction 8/10 = 80% of the time (by the ergodic theorem for alternating renewal processes), so the average rate is 100 * 0.8 = $80/hour.

Exam Tips and Common Mistakes

Forgetting the Ito Correction

The most common error in SDE problems is applying the ordinary chain rule instead of Ito's lemma. Always add the (1/2) sigma^2 * d^2f/dx^2 term when differentiating a function of a diffusion process. The rule of thumb: write out (dX)^2 using (dt)^2 = 0, dW*dt = 0, (dW)^2 = dt, then keep all terms through order dt.

Misidentifying Stopping Time Conditions

The optional stopping theorem requires regularity conditions. Do not simply write E[M(T)] = E[M(0)] without checking that T is a.s. finite and that the martingale is uniformly integrable (or dominated). A common exam trap: the stopping time has positive probability of being infinite, so the result fails.

Confusing Stationary with Limiting Distribution

A stationary distribution pi satisfies pi*P = pi. A limiting (steady-state) distribution exists and equals pi when the chain is ergodic (irreducible and aperiodic). A periodic irreducible chain has a unique stationary distribution but no limiting distribution — the time-average converges to pi but the distribution at time n does not.

M/M/1 Stability Condition

The M/M/1 queue has a steady state only when rho = lambda/mu is strictly less than 1. When rho is greater than or equal to 1, the queue grows without bound. Do not compute E[N] = rho/(1-rho) when rho is greater than or equal to 1; the formula is only valid in steady state.

Tip: Use Martingales to Compute Expectations

When computing expected hitting times or absorption probabilities, look for a martingale involving the quantity you want. For random walks: X(n) is a martingale (use for probabilities), X(n)^2 - n is a martingale (use for expected times). For Poisson processes: N(t) - lambda*t is a martingale. Then apply optional stopping.

Tip: Verify Stationarity by Row Sums

After solving for a stationary distribution pi, verify by checking (pi * P)(j) = pi(j) for each state. Also confirm that pi sums to 1. If detailed balance holds (pi(i)*p(i,j) = pi(j)*p(j,i) for all i, j), then pi is stationary — this shortcut works for reversible chains and is worth checking first.

Tip: Thinning and Superposition Shortcuts

When a problem involves classifying or merging Poisson streams, apply thinning and superposition before computing anything else. These operations keep you in the Poisson family and often simplify problems dramatically. The result that independent Poisson processes remain Poisson after merging or splitting is one of the most useful facts in applied probability.

Summary: The Big Picture

Stochastic processes provide a unified mathematical framework for modeling systems that evolve randomly over time. The hierarchy of key ideas:

  1. A probability space (Omega, F, P) provides the foundation. Sigma-algebras encode information; filtrations model information flow over time.
  2. Markov chains (discrete time, discrete space) satisfy the memoryless property. Ergodic chains have unique stationary distributions and time averages converge.
  3. The Poisson process (continuous time, discrete space) is the canonical model for rare events. Thinning and superposition are the key operations.
  4. Brownian motion (continuous time, continuous space) has Gaussian increments and quadratic variation equal to t. It is the noise source for stochastic calculus.
  5. Martingales model fair games. The optional stopping theorem and Doob's inequality are the primary tools for extracting expectations.
  6. Ito's lemma gives the stochastic chain rule. The extra (1/2)*sigma^2 correction drives the Black-Scholes formula and the theory of SDEs.
  7. Queuing theory and renewal theory provide the tools for steady-state and long-run average analysis. Little's law (L = lambda*W) is universally applicable.