Chapter 25
25 min read
Section 217 of 353

Separation of Variables Method

Introduction to PDEs

Learning Objectives

By the end of this section, you will be able to:

  1. Explain the key assumption behind separation of variables: that the solution is a product u(x,t)=X(x)T(t)u(x,t) = X(x) \cdot T(t)
  2. Execute the method step by step — substitute, separate, set equal to a constant, and solve two ODEs
  3. Solve eigenvalue problems arising from boundary conditions and identify the eigenfunctions
  4. Apply superposition to construct general solutions from individual modes
  5. Compute Fourier coefficients using orthogonality of eigenfunctions
  6. Contrast how the method produces exponential decay for the heat equation versus oscillation for the wave equation
  7. Connect separation of variables to spectral methods and Fourier Neural Operators in modern machine learning

The Big Picture: From One PDE to Many ODEs

"The art of doing mathematics is finding that special case which contains all the germs of generality." — David Hilbert

In the previous sections, we learned what partial differential equations are, how they are classified, and what boundary and initial conditions mean. Now comes the central question: how do we actually solve a PDE?

Partial differential equations are fundamentally harder than ordinary differential equations. An ODE involves derivatives with respect to one variable; a PDE involves derivatives with respect to multiple variables simultaneously. The solution is not a curve but an entire surface (or higher-dimensional object). We cannot simply "integrate both sides."

The method of separation of variables, developed by Joseph Fourier in the early 1800s while studying heat conduction, offers a profound insight: we can break a complicated PDE into a collection of simpler ODEs. The trick is to assume that the solution has a special multiplicative structure and then show that this assumption leads to a complete family of solutions.

The Central Strategy

Separation of variables converts a PDE in multiple independent variables into a set of ODEs, each in a single variable. Instead of solving one hard problem, we solve many easy ones.

PDE in (x,t)hard    ODE in xeasy  +  ODE in teasy\underbrace{\text{PDE in } (x, t)}_{\text{hard}} \;\longrightarrow\; \underbrace{\text{ODE in } x}_{\text{easy}} \;+\; \underbrace{\text{ODE in } t}_{\text{easy}}

Historical Context

Joseph Fourier's 1822 masterwork Théorie Analytique de la Chaleur (Analytical Theory of Heat) introduced both the heat equation and the method of separation of variables. His key realization was that any reasonable function could be represented as an infinite sum of sines and cosines — what we now call a Fourier series. This was initially controversial (even Lagrange and Laplace were skeptical), but it revolutionized both mathematics and physics.

The method was later applied to the wave equation by Daniel Bernoulli and to Laplace's equation by many others. Today, it remains one of the most important analytical tools in mathematical physics and engineering, and its ideas are the foundation of spectral methods in computational science and Fourier Neural Operators in machine learning.


The Core Idea: Products Become Sums

The entire method rests on a single, inspired guess:

The Separation Ansatz

Assume the solution u(x,t)u(x, t) can be written as a product of a function of xx alone and a function of tt alone:

u(x,t)=X(x)T(t)u(x, t) = X(x) \cdot T(t)

where X(x)X(x) captures the spatial shape and T(t)T(t) captures the temporal behavior.

Why would this work? Not every function of two variables is a product of two functions of one variable. The function u(x,t)=x+tu(x,t) = x + t cannot be written as X(x)T(t)X(x) \cdot T(t). But the key insight is that we don't need every solution to have this form — we only need enough of them that we can build the general solution by adding them together (superposition).

Why Products?

The product assumption works because of how derivatives interact with products: when you differentiate X(x)T(t)X(x) \cdot T(t) with respect to xx, the T(t)T(t) factor passes through unchanged. This is precisely what allows us to "separate" the variables onto different sides of the equation.


The Method Step by Step

Let us develop the method using the one-dimensional heat equation as our prototype. The same steps apply (with minor modifications) to the wave equation, Laplace's equation, and many other PDEs.

Step 1: State the Problem

We want to solve the heat equation on a rod of length LL with both ends held at zero temperature:

PDE:ut=α2ux2,0<x<L,  t>0\frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}, \quad 0 < x < L, \; t > 0
BCs:u(0,t)=0,u(L,t)=0u(0, t) = 0, \quad u(L, t) = 0
IC:u(x,0)=f(x)u(x, 0) = f(x)

Step 2: Assume a Product Solution

Substitute u(x,t)=X(x)T(t)u(x, t) = X(x) \cdot T(t) into the PDE:

X(x)T(t)=αX(x)T(t)X(x) \cdot T'(t) = \alpha \cdot X''(x) \cdot T(t)

Here T(t)T'(t) means dT/dtdT/dt and X(x)X''(x) means d2X/dx2d^2X/dx^2. Note how the product structure allows partial derivatives to act on only one factor.

Step 3: Divide to Separate

Divide both sides by X(x)T(t)X(x) \cdot T(t) (assuming neither is zero):

1αT(t)T(t)=X(x)X(x)\frac{1}{\alpha} \cdot \frac{T'(t)}{T(t)} = \frac{X''(x)}{X(x)}

Now comes the critical observation. The left side depends only on tt. The right side depends only on xx. Yet they are equal for all values of both xx and tt. The only way this can happen is if both sides equal the same constant.

The Separation Argument

If f(t)=g(x)f(t) = g(x) for all xx and tt, then both f(t)f(t) and g(x)g(x) must be constant. Why? If you vary tt while holding xx fixed, the right side doesn't change, so the left side can't change either. Therefore f(t)=constf(t) = \text{const}. The same argument applies to g(x)g(x).

Step 4: Introduce the Separation Constant

We call this constant λ-\lambda (the minus sign and the symbol λ\lambda — lambda — are chosen by convention for convenience):

Spatial ODE:
X(x)+λX(x)=0X''(x) + \lambda X(x) = 0
Temporal ODE:
T(t)+αλT(t)=0T'(t) + \alpha \lambda T(t) = 0

We have converted one PDE into two ODEs! Each is a standard ODE that we know how to solve. The parameter λ\lambda connects them: it is the separation constant (which will turn out to be an eigenvalue).

Why -\u03BB and not +\u03BB?

We use λ-\lambda with λ>0\lambda > 0 because for the heat equation, the solution must decay over time (physically, heat dissipates). With T(t)=αλT(t)T'(t) = -\alpha\lambda T(t) and λ>0\lambda > 0, we get T(t)=eαλtT(t) = e^{-\alpha\lambda t}, which decays. If λ<0\lambda < 0, the solution would grow exponentially — unphysical for heat.

Step 5: Apply Boundary Conditions

The boundary conditions u(0,t)=0u(0, t) = 0 and u(L,t)=0u(L, t) = 0 translate directly to the spatial part (since T(t)T(t) is not identically zero):

X(0)=0,X(L)=0X(0) = 0, \quad X(L) = 0

This transforms the spatial ODE into an eigenvalue problem: find the values of λ\lambda for which the ODE X+λX=0X'' + \lambda X = 0 has a nonzero solution satisfying both boundary conditions.


Example: The Heat Equation on a Finite Rod

Let us carry out the full solution for the heat equation with Dirichlet boundary conditions (both ends at zero temperature).

Solving the Spatial ODE

The spatial ODE is:

X+λX=0,X(0)=0,X(L)=0X'' + \lambda X = 0, \quad X(0) = 0, \quad X(L) = 0

The general solution depends on the sign of λ\lambda. We consider three cases:

Case 1: λ<0\lambda < 0 (rejected)

Writing λ=μ2\lambda = -\mu^2, the ODE becomes Xμ2X=0X'' - \mu^2 X = 0 with solution X=Aeμx+BeμxX = Ae^{\mu x} + Be^{-\mu x}. Applying X(0)=0X(0) = 0 and X(L)=0X(L) = 0 forces A=B=0A = B = 0. Only the trivial solution exists — rejected.

Case 2: λ=0\lambda = 0 (rejected)

The ODE becomes X=0X'' = 0, so X=Ax+BX = Ax + B. Applying X(0)=0X(0) = 0 gives B=0B = 0, and X(L)=0X(L) = 0 gives A=0A = 0. Again trivial — rejected.

Case 3: λ>0\lambda > 0 (accepted!)

Writing λ=μ2\lambda = \mu^2, the ODE becomes X+μ2X=0X'' + \mu^2 X = 0 with solution X=Acos(μx)+Bsin(μx)X = A\cos(\mu x) + B\sin(\mu x).

Applying X(0)=0X(0) = 0: since cos(0)=1\cos(0) = 1 and sin(0)=0\sin(0) = 0, we get A=0A = 0.

So X=Bsin(μx)X = B\sin(\mu x). Applying X(L)=0X(L) = 0: we need sin(μL)=0\sin(\mu L) = 0, which means μL=nπ\mu L = n\pi for n=1,2,3,n = 1, 2, 3, \ldots

Therefore, the eigenvalues and eigenfunctions are:

Eigenvalues:λn=(nπL)2,n=1,2,3,\lambda_n = \left(\frac{n\pi}{L}\right)^2, \quad n = 1, 2, 3, \ldots
Eigenfunctions:Xn(x)=sin(nπxL),n=1,2,3,X_n(x) = \sin\left(\frac{n\pi x}{L}\right), \quad n = 1, 2, 3, \ldots

Solving the Temporal ODE

For each eigenvalue λn\lambda_n, the temporal ODE is:

Tn(t)+αλnTn(t)=0T_n'(t) + \alpha \lambda_n T_n(t) = 0

This is a first-order linear ODE with solution:

Tn(t)=eαλnt=eα(nπ/L)2tT_n(t) = e^{-\alpha \lambda_n t} = e^{-\alpha (n\pi/L)^2 t}

Each mode decays exponentially, with higher modes (larger nn) decaying much faster because the decay rate is proportional to n2n^2.

Individual Mode Solutions

Combining spatial and temporal parts, each mode (or normal mode) of the heat equation is:

un(x,t)=Xn(x)Tn(t)=sin(nπxL)eα(nπ/L)2tu_n(x, t) = X_n(x) \cdot T_n(t) = \sin\left(\frac{n\pi x}{L}\right) e^{-\alpha (n\pi/L)^2 t}
🎨The Separation Concept: u(x,t) = X(x) · T(t)

The key idea of separation of variables: assume the solution is a product of a spatial function X(x) and a temporal function T(t). Watch how the spatial shape remains fixed while the amplitude changes over time.

Spatial Part X(x)
Shape determined by boundary conditions
Temporal Part T(t)
Exponential decay (heat dissipates)
Product u(x,t) = X · T
Shape preserved, amplitude shrinks

Key Insight

For the heat equation, higher modes (larger n) decay faster because the decay rate is proportional to n². Mode n=1 decays with rate α(1π/L)² = 0.493. This is why sharp features (which need high modes) smooth out first.


Eigenvalue Problems: The Heart of the Method

The spatial ODE together with its boundary conditions forms an eigenvalue problem. This is one of the most important concepts in mathematical physics and has deep connections to linear algebra.

Sturm-Liouville Eigenvalue Problem

The spatial equation X+λX=0X'' + \lambda X = 0 with boundary conditions is a special case of a Sturm-Liouville problem. The key properties are:

  1. Discrete eigenvalues: Only specific values of λ\lambda yield nonzero solutions
  2. Orthogonality: Eigenfunctions corresponding to different eigenvalues are orthogonal: 0LXm(x)Xn(x)dx=0\int_0^L X_m(x) X_n(x) \, dx = 0 when mnm \neq n
  3. Completeness: Any "reasonable" function satisfying the boundary conditions can be expanded as a series of eigenfunctions

The analogy with linear algebra is powerful. In matrix algebra, we have Av=λvAv = \lambda v where AA is a matrix and vv is an eigenvector. Here, the differential operator d2/dx2-d^2/dx^2 plays the role of the matrix, and the eigenfunctions sin(nπx/L)\sin(n\pi x/L) play the role of eigenvectors.

Linear AlgebraPDE Eigenvalue Problem
Matrix AOperator -d²/dx²
Eigenvector vEigenfunction Xₙ(x) = sin(nπx/L)
Eigenvalue λEigenvalue λₙ = (nπ/L)²
Finite-dimensional (N eigenvectors)Infinite-dimensional (countably many)
vᵀₘ vₙ = 0 (orthogonality)∫ Xₘ Xₙ dx = 0 (orthogonality)
Any vector = ∑ cₙvₙAny function = ∑ bₙXₙ(x)
🎶Eigenfunctions: The Building Blocks

Boundary conditions force the spatial solutions into specific shapes called eigenfunctions. These form a complete orthogonal set — any function satisfying the boundary conditions can be expressed as a sum of these modes.

Mode nEigenfunctionEigenvalue λₙNodes (zeros)
1sin(1πx/L)(1π/L)²2
2sin(2πx/L)(2π/L)²3
3sin(3πx/L)(3π/L)²4
4sin(4πx/L)(4π/L)²5

Different boundary conditions produce different eigenfunctions:

Boundary ConditionsEigenfunctionsEigenvalues
u(0)=0, u(L)=0 (Dirichlet)sin(nπx/L)(nπ/L)²
u′(0)=0, u′(L)=0 (Neumann)cos(nπx/L)(nπ/L)²
u(0)=0, u′(L)=0 (Mixed)sin((2n-1)πx/(2L))((2n-1)π/(2L))²

Superposition and Fourier Series

Each individual mode un(x,t)=sin(nπx/L)eα(nπ/L)2tu_n(x,t) = \sin(n\pi x/L) \, e^{-\alpha(n\pi/L)^2 t} solves the PDE and satisfies the boundary conditions. But there are infinitely many such solutions (one for each nn). Which one is the actual solution?

The answer uses superposition: because the heat equation is linear, any linear combination of solutions is also a solution. Therefore, the general solution is:

u(x,t)=n=1bnsin(nπxL)eα(nπ/L)2tu(x, t) = \sum_{n=1}^{\infty} b_n \sin\left(\frac{n\pi x}{L}\right) e^{-\alpha(n\pi/L)^2 t}

The coefficients bnb_n are determined by the initial condition u(x,0)=f(x)u(x, 0) = f(x). At t=0t = 0, all the exponentials equal 1, so:

f(x)=n=1bnsin(nπxL)f(x) = \sum_{n=1}^{\infty} b_n \sin\left(\frac{n\pi x}{L}\right)

This is a Fourier sine series! The coefficients are computed using orthogonality: multiply both sides by sin(mπx/L)\sin(m\pi x/L) and integrate from 0 to LL. Because of orthogonality, all terms vanish except n=mn = m:

Fourier Coefficients via Orthogonality

bn=2L0Lf(x)sin(nπxL)dxb_n = \frac{2}{L} \int_0^L f(x) \sin\left(\frac{n\pi x}{L}\right) dx

Each coefficient bnb_n measures how much of mode nn is present in the initial condition f(x)f(x). This is the projection of ff onto the eigenfunction XnX_n.

Why Orthogonality Matters

Without orthogonality, finding the coefficients bnb_n would require solving an infinite system of coupled equations — essentially impossible. Orthogonality allows us to compute each coefficient independently, using a single integral. This is analogous to how orthogonal coordinate axes let us read off each component of a vector independently.

Convergence and the Complete Solution

The full solution to the heat equation is then:

u(x,t)=n=1[2L0Lf(ξ)sin(nπξL)dξ]sin(nπxL)eα(nπ/L)2tu(x, t) = \sum_{n=1}^{\infty} \left[\frac{2}{L} \int_0^L f(\xi) \sin\left(\frac{n\pi \xi}{L}\right) d\xi\right] \sin\left(\frac{n\pi x}{L}\right) e^{-\alpha(n\pi/L)^2 t}

This formula tells a beautiful physical story: the initial temperature distribution f(x)f(x) is decomposed into sinusoidal modes, each of which decays independently at a rate proportional to n2n^2. High-frequency modes (sharp features) die out first; the solution progressively smooths.

🎧Superposition: Building Solutions from Modes

The general solution is a superposition (sum) of all the separated solutions. By choosing the right coefficients, we can match any initial condition. Watch how adding more modes improves the approximation, and how the solution evolves in time.

Solution u(x,t) — dashed line: initial condition f(x), faded: individual modes

Space-time heatmap (white line: current t)

bₙ=1
0.900
bₙ=2
0.000
bₙ=3
-0.300
bₙ=4
-0.000
bₙ=5
-0.180

Gibbs Phenomenon

Notice the overshoot near the jump discontinuity in the "Step" initial condition. This is the Gibbs phenomenon: no matter how many modes you add, the Fourier series overshoots by about 9% at a jump. This is not a bug — it's a fundamental property of Fourier series at discontinuities.

Physical Meaning of Mode Decay

The fact that higher modes decay faster explains why a hot metal rod with a complex initial temperature distribution eventually approaches a simple sinusoidal shape (the first mode) before cooling to zero. Sharp temperature gradients are "expensive" in terms of energy flow and smooth out first.


Example: The Wave Equation on a String

The same method applies to the wave equation, but with a fundamentally different temporal behavior. Consider a vibrating string fixed at both ends:

PDE:2ut2=c22ux2,0<x<L,  t>0\frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}, \quad 0 < x < L, \; t > 0
BCs:u(0,t)=0,u(L,t)=0u(0, t) = 0, \quad u(L, t) = 0
ICs:u(x,0)=f(x),ut(x,0)=g(x)u(x, 0) = f(x), \quad \frac{\partial u}{\partial t}(x, 0) = g(x)

Separation Leads to Different Temporal Behavior

Substituting u(x,t)=X(x)T(t)u(x,t) = X(x) \cdot T(t) and separating, the spatial equation is exactly the same as before:

X+λX=0Xn(x)=sin(nπxL)X'' + \lambda X = 0 \quad \Rightarrow \quad X_n(x) = \sin\left(\frac{n\pi x}{L}\right)

But the temporal equation is second-order:

T+c2λT=0Tn(t)=Ancos(ωnt)+Bnsin(ωnt)T'' + c^2 \lambda T = 0 \quad \Rightarrow \quad T_n(t) = A_n \cos(\omega_n t) + B_n \sin(\omega_n t)

where ωn=cλn=cnπ/L\omega_n = c\sqrt{\lambda_n} = cn\pi/L is the natural frequency of mode nn.

Heat Equation (Parabolic)

Tn(t)=eαλntT_n(t) = e^{-\alpha\lambda_n t}
  • First-order temporal ODE
  • Exponential decay — solution smooths
  • Higher modes vanish faster
  • Irreversible (entropy increases)
  • One initial condition: u(x,0)=f(x)u(x,0) = f(x)

Wave Equation (Hyperbolic)

Tn(t)=Ancos(ωnt)+Bnsin(ωnt)T_n(t) = A_n\cos(\omega_n t) + B_n\sin(\omega_n t)
  • Second-order temporal ODE
  • Oscillation — modes vibrate forever
  • All modes maintain their amplitude
  • Reversible (energy conserved)
  • Two initial conditions: u(x,0)u(x,0) and ut(x,0)u_t(x,0)

The General Wave Solution

The general solution to the wave equation with zero initial velocity (g(x)=0g(x) = 0) is:

u(x,t)=n=1bnsin(nπxL)cos(cnπtL)u(x, t) = \sum_{n=1}^{\infty} b_n \sin\left(\frac{n\pi x}{L}\right) \cos\left(\frac{cn\pi t}{L}\right)

with the same Fourier coefficients bn=2L0Lf(x)sin(nπx/L)dxb_n = \frac{2}{L}\int_0^L f(x) \sin(n\pi x/L)\,dx. If the initial velocity is nonzero, we also get sine terms in time with a separate set of coefficients determined by g(x)g(x).

Musical Harmonics

The natural frequencies ωn=cnπ/L\omega_n = cn\pi/L are integer multiples of the fundamental frequency ω1=cπ/L\omega_1 = c\pi/L. These are the harmonics of a vibrating string! The fundamental frequency determines the pitch, while the relative amplitudes of the harmonics determine the timbre (tone quality). This is why a guitar string and a piano string playing the same note sound different — they excite different harmonic mixtures.


Machine Learning Connections

The ideas behind separation of variables are remarkably relevant to modern machine learning. The decomposition of functions into orthogonal modes is the mathematical foundation for several important ML techniques.

1. Fourier Neural Operators (FNO)

The Fourier Neural Operator (Li et al., 2021) directly extends the spectral approach of separation of variables. Instead of solving in physical space, FNO learns solution operators in Fourier (spectral) space:

vl+1(x)=σ(Wlvl(x)+F1(RlF(vl))(x))v_{l+1}(x) = \sigma\left(W_l v_l(x) + \mathcal{F}^{-1}\left(R_l \cdot \mathcal{F}(v_l)\right)(x)\right)

where F\mathcal{F} is the Fourier transform and RlR_l is a learned weight matrix that acts on the Fourier modes. This is exactly the insight from separation of variables: many PDEs are diagonal in Fourier space, meaning each mode can be processed independently.

2. Spectral Normalization in GANs

Spectral normalization constrains the Lipschitz constant of neural network layers by normalizing their spectral norm (largest singular value). This is an eigenvalue computation, directly analogous to finding eigenvalues in separation of variables. It stabilizes GAN training by controlling how much each "mode" of the discriminator can amplify signals.

3. Principal Component Analysis (PCA)

PCA decomposes data into orthogonal principal components, just as separation of variables decomposes solutions into orthogonal eigenfunctions. The covariance matrix plays the role of the differential operator, eigenvectors play the role of eigenfunctions, and eigenvalues measure the importance of each mode. Keeping only the top modes is like truncating the Fourier series to the most important terms.

4. Diffusion Models

Score-based diffusion models for image generation are built on the same mathematics as the heat equation. The forward diffusion process adds noise progressively — mathematically, this is heat diffusion in pixel space. The Fourier decomposition tells us that high-frequency features (fine details) are destroyed first and low-frequency features (overall structure) last. The reverse (denoising) process reconstructs these features in the opposite order.

PDE ConceptML Application
Eigenfunction expansionPCA / Spectral decomposition
Fourier modesFourier Neural Operator layers
Mode truncationDimensionality reduction
OrthogonalityFeature independence
Eigenvalue decayExplained variance ratio
Heat equation diffusionForward process in diffusion models
Spectral methodsSpectral normalization

Python Implementation

Solving the Heat Equation by Separation of Variables

The following implementation computes the analytical solution using Fourier coefficients and mode superposition — the exact method we derived above.

Analytical Solution of the Heat Equation
🐍separation_of_variables_heat.py
3Analytical Solution Setup

Unlike numerical methods (finite differences), separation of variables gives us the EXACT analytical solution as an infinite series. We truncate to N modes for computation.

19Initial Condition

The step function is a challenging test case because it has a discontinuity. The Fourier series will exhibit the Gibbs phenomenon at the jump.

22Fourier Coefficient Computation

Each coefficient b_n is computed by the inner product of f(x) with sin(nπx/L). This uses the orthogonality of eigenfunctions: only the n-th mode picks up the n-th coefficient.

35Mode Superposition

The solution is a sum of modes, each decaying at its own rate. Mode n decays as exp(-α(nπ/L)²t). Higher modes (higher frequency) decay MUCH faster because the rate scales as n².

39Time Evolution

At t=0, we see the (truncated) Fourier series of f(x). As time increases, high-frequency modes die out first, smoothing the solution. Eventually only the first mode remains.

53 lines without explanation
1import numpy as np
2import matplotlib.pyplot as plt
3
4def separation_of_variables_heat(L=1.0, alpha=0.01, N_modes=20,
5                                  Nx=200, t_values=[0, 0.5, 1, 2, 5]):
6    """
7    Solve the 1D heat equation using separation of variables.
8
9    PDE: du/dt = alpha * d²u/dx²
10    BCs: u(0,t) = u(L,t) = 0
11    IC:  u(x,0) = f(x) (step function)
12
13    Solution: u(x,t) = sum_{n=1}^{N} b_n sin(n*pi*x/L) exp(-alpha*(n*pi/L)^2*t)
14    """
15    x = np.linspace(0, L, Nx)
16
17    # Initial condition: step function
18    def f(x_val):
19        return 1.0 if 0.25 < x_val < 0.75 else 0.0
20
21    # Compute Fourier sine coefficients: b_n = (2/L) * integral f(x) sin(n*pi*x/L) dx
22    def compute_bn(n):
23        integrand = np.array([f(xi) * np.sin(n * np.pi * xi / L) for xi in x])
24        return 2 / L * np.trapezoid(integrand, x)
25
26    # Pre-compute all coefficients
27    coefficients = [compute_bn(n) for n in range(1, N_modes + 1)]
28    print("First 5 Fourier coefficients:")
29    for n, bn in enumerate(coefficients[:5], 1):
30        print(f"  b_{n} = {bn:.6f}")
31
32    # Compute solution at each time
33    fig, axes = plt.subplots(1, len(t_values), figsize=(16, 4))
34
35    for idx, t in enumerate(t_values):
36        u = np.zeros_like(x)
37        for n_idx, bn in enumerate(coefficients):
38            n = n_idx + 1
39            lambda_n = (n * np.pi / L) ** 2
40            # Each mode: b_n * sin(n*pi*x/L) * exp(-alpha*lambda_n*t)
41            u += bn * np.sin(n * np.pi * x / L) * np.exp(-alpha * lambda_n * t)
42
43        axes[idx].fill_between(x, u, alpha=0.3, color='steelblue')
44        axes[idx].plot(x, u, 'b-', linewidth=2)
45        axes[idx].set_title(f't = {t}')
46        axes[idx].set_ylim(-0.3, 1.3)
47        axes[idx].set_xlabel('x')
48        axes[idx].grid(True, alpha=0.3)
49
50    axes[0].set_ylabel('u(x, t)')
51    fig.suptitle(f'Heat Equation Solution ({N_modes} modes)', fontsize=14)
52    plt.tight_layout()
53    plt.show()
54
55    return x, coefficients
56
57# Run the solver
58x, coeffs = separation_of_variables_heat()

Solving the Wave Equation by Separation of Variables

Analytical Solution of the Wave Equation
🐍separation_of_variables_wave.py
3Wave Equation Setup

The wave equation is second-order in time, so the temporal equation T'' = -ω²T gives oscillatory solutions cos(ωt) and sin(ωt) instead of the exponential decay of the heat equation.

13Zero Initial Velocity

With du/dt(x,0) = 0, we only get cos terms in time (no sin terms). If we had nonzero initial velocity, we would need both cos and sin, requiring a second set of Fourier coefficients.

34Natural Frequencies

Each mode n oscillates at its natural frequency ωₙ = cnπ/L. These are the harmonics of the string! The fundamental frequency is ω₁ = cπ/L, and overtones are integer multiples.

53Energy Conservation

Unlike the heat equation where energy dissipates, the wave equation conserves total energy. The L² norm of the solution remains constant over time — energy just redistributes between kinetic and potential forms.

69 lines without explanation
1import numpy as np
2import matplotlib.pyplot as plt
3
4def separation_of_variables_wave(L=1.0, c=1.0, N_modes=30,
5                                  Nx=200, t_values=[0, 0.3, 0.5, 0.8, 1.0]):
6    """
7    Solve the 1D wave equation using separation of variables.
8
9    PDE: d²u/dt² = c² * d²u/dx²
10    BCs: u(0,t) = u(L,t) = 0
11    IC:  u(x,0) = f(x) (triangle), du/dt(x,0) = 0
12
13    Solution: u(x,t) = sum b_n sin(n*pi*x/L) cos(c*n*pi*t/L)
14    """
15    x = np.linspace(0, L, Nx)
16
17    # Initial condition: triangle wave
18    def f(x_val):
19        if x_val < L / 2:
20            return 2 * x_val / L
21        return 2 * (1 - x_val / L)
22
23    # Fourier coefficients
24    def compute_bn(n):
25        integrand = np.array([f(xi) * np.sin(n * np.pi * xi / L) for xi in x])
26        return 2 / L * np.trapezoid(integrand, x)
27
28    coefficients = [compute_bn(n) for n in range(1, N_modes + 1)]
29
30    # Key difference from heat: temporal part is cos, not exp!
31    fig, axes = plt.subplots(1, len(t_values), figsize=(16, 4))
32
33    for idx, t in enumerate(t_values):
34        u = np.zeros_like(x)
35        for n_idx, bn in enumerate(coefficients):
36            n = n_idx + 1
37            omega_n = c * n * np.pi / L  # natural frequency of mode n
38            # Each mode: b_n * sin(n*pi*x/L) * cos(omega_n * t)
39            u += bn * np.sin(n * np.pi * x / L) * np.cos(omega_n * t)
40
41        axes[idx].plot(x, u, 'r-', linewidth=2)
42        axes[idx].fill_between(x, u, alpha=0.2, color='red')
43        axes[idx].set_title(f't = {t}')
44        axes[idx].set_ylim(-1.3, 1.3)
45        axes[idx].set_xlabel('x')
46        axes[idx].grid(True, alpha=0.3)
47
48    axes[0].set_ylabel('u(x, t)')
49    fig.suptitle('Wave Equation: Vibrating String', fontsize=14)
50    plt.tight_layout()
51    plt.show()
52
53    # Show that energy is conserved (unlike heat equation)
54    t_span = np.linspace(0, 3, 300)
55    energy = []
56    for t in t_span:
57        u_vals = np.zeros_like(x)
58        for n_idx, bn in enumerate(coefficients):
59            n = n_idx + 1
60            omega_n = c * n * np.pi / L
61            u_vals += bn * np.sin(n * np.pi * x / L) * np.cos(omega_n * t)
62        # Approximate total energy ~ integral of u^2
63        energy.append(np.trapezoid(u_vals**2, x))
64
65    plt.figure(figsize=(8, 3))
66    plt.plot(t_span, energy, 'g-', linewidth=2)
67    plt.xlabel('Time')
68    plt.ylabel('Energy (L2 norm squared)')
69    plt.title('Wave Equation: Energy is Conserved!')
70    plt.grid(True, alpha=0.3)
71    plt.show()
72
73separation_of_variables_wave()

From Separation of Variables to Spectral Methods

The connection to modern computational methods and ML becomes clear when we implement the spectral approach using FFT:

Spectral Methods: The Bridge to Machine Learning
🐍spectral_method_ml.py
3Spectral Methods: The Generalization

Spectral methods extend separation of variables to general domains using FFT. Instead of computing Fourier coefficients analytically, we use the Fast Fourier Transform.

14Connection to FNO

Fourier Neural Operators (FNO) learn in spectral space just like spectral PDE solvers. The key insight: many PDEs are DIAGONAL in Fourier space, meaning each mode evolves independently.

30Exact Spectral Evolution

In spectral space, the heat equation becomes a set of independent ODEs. Each wavenumber k has an exact solution: u_hat_k(t) = u_hat_k(0) * exp(-αk²t). No spatial discretization error!

35FFT-Based Solver

The FFT transforms between physical and spectral space in O(N log N) time. This makes spectral methods extremely efficient for problems with smooth solutions on periodic or simple domains.

82 lines without explanation
1import numpy as np
2import matplotlib.pyplot as plt
3
4def spectral_method_pde(N_basis=32, L=2*np.pi, alpha=0.01, dt=0.01, T_final=2.0):
5    """
6    Spectral method for PDEs: the ML-relevant generalization
7    of separation of variables.
8
9    Instead of solving in physical space, we:
10    1. Transform to frequency space (Fourier transform)
11    2. Solve ODEs for each mode independently
12    3. Transform back
13
14    This is the foundation of:
15    - Fourier Neural Operators (FNO)
16    - Spectral normalization in GANs
17    - Frequency-domain learning
18    """
19    x = np.linspace(0, L, N_basis, endpoint=False)
20
21    # Initial condition: sum of two Gaussians
22    u = np.exp(-10 * (x - L/3)**2) + 0.5 * np.exp(-20 * (x - 2*L/3)**2)
23
24    # In spectral space, the heat equation becomes:
25    # d(u_hat_k)/dt = -alpha * k^2 * u_hat_k
26    # This is just an ODE for each mode!
27
28    k = np.fft.fftfreq(N_basis, d=L/N_basis) * 2 * np.pi  # wavenumbers
29
30    # Time evolution in spectral space
31    Nt = int(T_final / dt)
32    history = [u.copy()]
33    times = [0]
34
35    for step in range(Nt):
36        t = (step + 1) * dt
37
38        # Forward FFT: physical -> spectral
39        u_hat = np.fft.fft(u)
40
41        # Exact solution in spectral space (no discretization error!)
42        # Each mode evolves independently: u_hat_k(t+dt) = u_hat_k(t) * exp(-alpha*k^2*dt)
43        u_hat *= np.exp(-alpha * k**2 * dt)
44
45        # Inverse FFT: spectral -> physical
46        u = np.real(np.fft.ifft(u_hat))
47
48        if step % (Nt // 10) == 0:
49            history.append(u.copy())
50            times.append(t)
51
52    # Visualize
53    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
54
55    for i, (u_snap, t) in enumerate(zip(history, times)):
56        alpha_val = 0.3 + 0.7 * (1 - i / len(history))
57        ax1.plot(x, u_snap, alpha=alpha_val, label=f't={t:.1f}')
58    ax1.set_xlabel('x')
59    ax1.set_ylabel('u(x,t)')
60    ax1.set_title('Spectral Method: Heat Equation Solution')
61    ax1.legend()
62    ax1.grid(True, alpha=0.3)
63
64    # Show spectral decay
65    u_hat_init = np.fft.fft(history[0])
66    u_hat_final = np.fft.fft(history[-1])
67    freqs = np.fft.fftfreq(N_basis, d=L/N_basis)
68    ax2.semilogy(np.abs(freqs[:N_basis//2]),
69                 np.abs(u_hat_init[:N_basis//2]), 'b-o', label='Initial', markersize=3)
70    ax2.semilogy(np.abs(freqs[:N_basis//2]),
71                 np.abs(u_hat_final[:N_basis//2]) + 1e-16, 'r-o', label='Final', markersize=3)
72    ax2.set_xlabel('Frequency |k|')
73    ax2.set_ylabel('|u_hat(k)|')
74    ax2.set_title('Spectral Coefficients: High Frequencies Decay')
75    ax2.legend()
76    ax2.grid(True, alpha=0.3)
77
78    plt.tight_layout()
79    plt.show()
80
81    print("Key insight for ML:")
82    print("  Fourier Neural Operators (FNO) learn solution operators")
83    print("  by parameterizing transformations in spectral space,")
84    print("  just as separation of variables solves PDEs mode-by-mode.")
85
86spectral_method_pde()

Common Pitfalls

Forgetting to Check All Cases for \u03BB

When solving the eigenvalue problem, you must check all three cases: λ<0\lambda < 0, λ=0\lambda = 0, and λ>0\lambda > 0. It is tempting to jump directly to the oscillatory case, but you must show rigorously that the other cases yield only the trivial solution.

Applying Separation of Variables to Nonlinear PDEs

Separation of variables relies on linearity of the PDE. It works because the sum of solutions is also a solution (superposition). For nonlinear PDEs like the Navier-Stokes equations or reaction-diffusion systems with nonlinear reaction terms, this method generally fails. You need other techniques (numerical methods, perturbation theory, symmetry methods).

Confusing Boundary Conditions and Initial Conditions

Boundary conditions determine the eigenfunctions (the spatial shapes). Initial conditions determine the coefficients (how much of each shape to include). Do not mix them up! The method processes them at different stages: BCs in steps 3-5, IC in the final superposition step.

Truncating Too Few Modes

For smooth initial conditions (like a parabola), just a few modes give an excellent approximation. But for discontinuous initial conditions (like a step function), you may need many modes, and the Gibbs phenomenon causes persistent overshoot near jumps. The number of modes needed depends on the smoothness of f(x)f(x).

When Does Separation of Variables Work?

The method works when:

  1. The PDE is linear and homogeneous
  2. The domain has a simple geometry (rectangle, disk, sphere)
  3. The boundary conditions are homogeneous (zero at the boundary)
  4. The coefficients of the PDE are constant or depend on only one variable

For non-homogeneous BCs, you can often subtract a steady-state solution to make them homogeneous. For non-homogeneous PDEs, use eigenfunction expansion.


Test Your Understanding

Test Your Understanding1 / 8

What is the key assumption in the separation of variables method?


Summary

The method of separation of variables is one of the most powerful analytical techniques for solving partial differential equations. By assuming a product form for the solution, we reduce a PDE to a collection of ODEs, each solvable by standard methods.

The Algorithm in Five Steps

  1. Assume a product solution: u(x,t)=X(x)T(t)u(x,t) = X(x) \cdot T(t)
  2. Substitute into the PDE and separate variables to get two sides depending on different variables
  3. Set both sides equal to a constant λ-\lambda, obtaining two ODEs
  4. Solve the spatial eigenvalue problem using boundary conditions to find eigenvalues λn\lambda_n and eigenfunctions Xn(x)X_n(x)
  5. Superpose all modes and determine coefficients from the initial condition using orthogonality

Key Concepts

ConceptDescription
Separation ansatzAssume u(x,t) = X(x)·T(t), converting PDE to ODEs
Separation constantλ connects the spatial and temporal ODEs
Eigenvalue problemSpatial ODE + BCs determine allowed λ values
Eigenfunctionssin(nπx/L) for Dirichlet BCs — the building blocks
Orthogonality∫ Xₘ Xₙ dx = 0 for m≠n — enables independent coefficient computation
SuperpositionGeneral solution = ∑ bₙ Xₙ(x) Tₙ(t)
Fourier coefficientsbₙ = (2/L)∫ f(x) Xₙ(x) dx — projection onto modes
Mode decay (heat)e^(-αλₙt) — higher modes decay faster as n²
Mode oscillation (wave)cos(ωₙt) — all modes vibrate forever

Key Takeaways

  1. Separation of variables reduces a PDE to two or more ODEs, each in a single variable
  2. The boundary conditions determine the spatial eigenfunctions, while the initial condition determines the expansion coefficients
  3. Orthogonality of eigenfunctions is the key property that makes the method practical — it allows independent computation of each Fourier coefficient
  4. The heat equation produces exponential decay of modes (diffusion smooths), while the wave equation produces oscillation (waves vibrate)
  5. The same eigenvalue/eigenfunction framework underlies PCA, spectral methods, Fourier Neural Operators, and diffusion models in machine learning
  6. The method works for linear PDEs with homogeneous BCs on simple geometries — it does not apply to nonlinear PDEs or irregular domains
The Essence of Separation of Variables:
"A PDE is hard because everything depends on everything. Separation of variables breaks this entanglement: space and time evolve independently, and the solution is a symphony of orthogonal modes, each marching to its own beat."
Coming Next: In the next chapter, we'll dive deeper into the heat equation, exploring its derivation from physical principles, the heat kernel, and its connections to diffusion in machine learning.
Loading comments...