Learning Objectives
By the end of this section, you will be able to:
- Explain the key assumption behind separation of variables: that the solution is a product
- Execute the method step by step — substitute, separate, set equal to a constant, and solve two ODEs
- Solve eigenvalue problems arising from boundary conditions and identify the eigenfunctions
- Apply superposition to construct general solutions from individual modes
- Compute Fourier coefficients using orthogonality of eigenfunctions
- Contrast how the method produces exponential decay for the heat equation versus oscillation for the wave equation
- Connect separation of variables to spectral methods and Fourier Neural Operators in modern machine learning
The Big Picture: From One PDE to Many ODEs
"The art of doing mathematics is finding that special case which contains all the germs of generality." — David Hilbert
In the previous sections, we learned what partial differential equations are, how they are classified, and what boundary and initial conditions mean. Now comes the central question: how do we actually solve a PDE?
Partial differential equations are fundamentally harder than ordinary differential equations. An ODE involves derivatives with respect to one variable; a PDE involves derivatives with respect to multiple variables simultaneously. The solution is not a curve but an entire surface (or higher-dimensional object). We cannot simply "integrate both sides."
The method of separation of variables, developed by Joseph Fourier in the early 1800s while studying heat conduction, offers a profound insight: we can break a complicated PDE into a collection of simpler ODEs. The trick is to assume that the solution has a special multiplicative structure and then show that this assumption leads to a complete family of solutions.
The Central Strategy
Separation of variables converts a PDE in multiple independent variables into a set of ODEs, each in a single variable. Instead of solving one hard problem, we solve many easy ones.
Historical Context
Joseph Fourier's 1822 masterwork Théorie Analytique de la Chaleur (Analytical Theory of Heat) introduced both the heat equation and the method of separation of variables. His key realization was that any reasonable function could be represented as an infinite sum of sines and cosines — what we now call a Fourier series. This was initially controversial (even Lagrange and Laplace were skeptical), but it revolutionized both mathematics and physics.
The method was later applied to the wave equation by Daniel Bernoulli and to Laplace's equation by many others. Today, it remains one of the most important analytical tools in mathematical physics and engineering, and its ideas are the foundation of spectral methods in computational science and Fourier Neural Operators in machine learning.
The Core Idea: Products Become Sums
The entire method rests on a single, inspired guess:
The Separation Ansatz
Assume the solution can be written as a product of a function of alone and a function of alone:
where captures the spatial shape and captures the temporal behavior.
Why would this work? Not every function of two variables is a product of two functions of one variable. The function cannot be written as . But the key insight is that we don't need every solution to have this form — we only need enough of them that we can build the general solution by adding them together (superposition).
Why Products?
The product assumption works because of how derivatives interact with products: when you differentiate with respect to , the factor passes through unchanged. This is precisely what allows us to "separate" the variables onto different sides of the equation.
The Method Step by Step
Let us develop the method using the one-dimensional heat equation as our prototype. The same steps apply (with minor modifications) to the wave equation, Laplace's equation, and many other PDEs.
Step 1: State the Problem
We want to solve the heat equation on a rod of length with both ends held at zero temperature:
Step 2: Assume a Product Solution
Substitute into the PDE:
Here means and means . Note how the product structure allows partial derivatives to act on only one factor.
Step 3: Divide to Separate
Divide both sides by (assuming neither is zero):
Now comes the critical observation. The left side depends only on . The right side depends only on . Yet they are equal for all values of both and . The only way this can happen is if both sides equal the same constant.
The Separation Argument
If for all and , then both and must be constant. Why? If you vary while holding fixed, the right side doesn't change, so the left side can't change either. Therefore . The same argument applies to .
Step 4: Introduce the Separation Constant
We call this constant (the minus sign and the symbol — lambda — are chosen by convention for convenience):
We have converted one PDE into two ODEs! Each is a standard ODE that we know how to solve. The parameter connects them: it is the separation constant (which will turn out to be an eigenvalue).
Why -\u03BB and not +\u03BB?
We use with because for the heat equation, the solution must decay over time (physically, heat dissipates). With and , we get , which decays. If , the solution would grow exponentially — unphysical for heat.
Step 5: Apply Boundary Conditions
The boundary conditions and translate directly to the spatial part (since is not identically zero):
This transforms the spatial ODE into an eigenvalue problem: find the values of for which the ODE has a nonzero solution satisfying both boundary conditions.
Example: The Heat Equation on a Finite Rod
Let us carry out the full solution for the heat equation with Dirichlet boundary conditions (both ends at zero temperature).
Solving the Spatial ODE
The spatial ODE is:
The general solution depends on the sign of . We consider three cases:
Case 1: (rejected)
Writing , the ODE becomes with solution . Applying and forces . Only the trivial solution exists — rejected.
Case 2: (rejected)
The ODE becomes , so . Applying gives , and gives . Again trivial — rejected.
Case 3: (accepted!)
Writing , the ODE becomes with solution .
Applying : since and , we get .
So . Applying : we need , which means for
Therefore, the eigenvalues and eigenfunctions are:
Solving the Temporal ODE
For each eigenvalue , the temporal ODE is:
This is a first-order linear ODE with solution:
Each mode decays exponentially, with higher modes (larger ) decaying much faster because the decay rate is proportional to .
Individual Mode Solutions
Combining spatial and temporal parts, each mode (or normal mode) of the heat equation is:
The key idea of separation of variables: assume the solution is a product of a spatial function X(x) and a temporal function T(t). Watch how the spatial shape remains fixed while the amplitude changes over time.
Shape determined by boundary conditions
Exponential decay (heat dissipates)
Shape preserved, amplitude shrinks
Key Insight
For the heat equation, higher modes (larger n) decay faster because the decay rate is proportional to n². Mode n=1 decays with rate α(1π/L)² = 0.493. This is why sharp features (which need high modes) smooth out first.
Eigenvalue Problems: The Heart of the Method
The spatial ODE together with its boundary conditions forms an eigenvalue problem. This is one of the most important concepts in mathematical physics and has deep connections to linear algebra.
Sturm-Liouville Eigenvalue Problem
The spatial equation with boundary conditions is a special case of a Sturm-Liouville problem. The key properties are:
- Discrete eigenvalues: Only specific values of yield nonzero solutions
- Orthogonality: Eigenfunctions corresponding to different eigenvalues are orthogonal: when
- Completeness: Any "reasonable" function satisfying the boundary conditions can be expanded as a series of eigenfunctions
The analogy with linear algebra is powerful. In matrix algebra, we have where is a matrix and is an eigenvector. Here, the differential operator plays the role of the matrix, and the eigenfunctions play the role of eigenvectors.
| Linear Algebra | PDE Eigenvalue Problem |
|---|---|
| Matrix A | Operator -d²/dx² |
| Eigenvector v | Eigenfunction Xₙ(x) = sin(nπx/L) |
| Eigenvalue λ | Eigenvalue λₙ = (nπ/L)² |
| Finite-dimensional (N eigenvectors) | Infinite-dimensional (countably many) |
| vᵀₘ vₙ = 0 (orthogonality) | ∫ Xₘ Xₙ dx = 0 (orthogonality) |
| Any vector = ∑ cₙvₙ | Any function = ∑ bₙXₙ(x) |
Boundary conditions force the spatial solutions into specific shapes called eigenfunctions. These form a complete orthogonal set — any function satisfying the boundary conditions can be expressed as a sum of these modes.
| Mode n | Eigenfunction | Eigenvalue λₙ | Nodes (zeros) |
|---|---|---|---|
| 1 | sin(1πx/L) | (1π/L)² | 2 |
| 2 | sin(2πx/L) | (2π/L)² | 3 |
| 3 | sin(3πx/L) | (3π/L)² | 4 |
| 4 | sin(4πx/L) | (4π/L)² | 5 |
Different boundary conditions produce different eigenfunctions:
| Boundary Conditions | Eigenfunctions | Eigenvalues |
|---|---|---|
| u(0)=0, u(L)=0 (Dirichlet) | sin(nπx/L) | (nπ/L)² |
| u′(0)=0, u′(L)=0 (Neumann) | cos(nπx/L) | (nπ/L)² |
| u(0)=0, u′(L)=0 (Mixed) | sin((2n-1)πx/(2L)) | ((2n-1)π/(2L))² |
Superposition and Fourier Series
Each individual mode solves the PDE and satisfies the boundary conditions. But there are infinitely many such solutions (one for each ). Which one is the actual solution?
The answer uses superposition: because the heat equation is linear, any linear combination of solutions is also a solution. Therefore, the general solution is:
The coefficients are determined by the initial condition . At , all the exponentials equal 1, so:
This is a Fourier sine series! The coefficients are computed using orthogonality: multiply both sides by and integrate from 0 to . Because of orthogonality, all terms vanish except :
Fourier Coefficients via Orthogonality
Each coefficient measures how much of mode is present in the initial condition . This is the projection of onto the eigenfunction .
Why Orthogonality Matters
Without orthogonality, finding the coefficients would require solving an infinite system of coupled equations — essentially impossible. Orthogonality allows us to compute each coefficient independently, using a single integral. This is analogous to how orthogonal coordinate axes let us read off each component of a vector independently.
Convergence and the Complete Solution
The full solution to the heat equation is then:
This formula tells a beautiful physical story: the initial temperature distribution is decomposed into sinusoidal modes, each of which decays independently at a rate proportional to . High-frequency modes (sharp features) die out first; the solution progressively smooths.
The general solution is a superposition (sum) of all the separated solutions. By choosing the right coefficients, we can match any initial condition. Watch how adding more modes improves the approximation, and how the solution evolves in time.
Solution u(x,t) — dashed line: initial condition f(x), faded: individual modes
Space-time heatmap (white line: current t)
Gibbs Phenomenon
Notice the overshoot near the jump discontinuity in the "Step" initial condition. This is the Gibbs phenomenon: no matter how many modes you add, the Fourier series overshoots by about 9% at a jump. This is not a bug — it's a fundamental property of Fourier series at discontinuities.
Physical Meaning of Mode Decay
The fact that higher modes decay faster explains why a hot metal rod with a complex initial temperature distribution eventually approaches a simple sinusoidal shape (the first mode) before cooling to zero. Sharp temperature gradients are "expensive" in terms of energy flow and smooth out first.
Example: The Wave Equation on a String
The same method applies to the wave equation, but with a fundamentally different temporal behavior. Consider a vibrating string fixed at both ends:
Separation Leads to Different Temporal Behavior
Substituting and separating, the spatial equation is exactly the same as before:
But the temporal equation is second-order:
where is the natural frequency of mode .
Heat Equation (Parabolic)
- First-order temporal ODE
- Exponential decay — solution smooths
- Higher modes vanish faster
- Irreversible (entropy increases)
- One initial condition:
Wave Equation (Hyperbolic)
- Second-order temporal ODE
- Oscillation — modes vibrate forever
- All modes maintain their amplitude
- Reversible (energy conserved)
- Two initial conditions: and
The General Wave Solution
The general solution to the wave equation with zero initial velocity () is:
with the same Fourier coefficients . If the initial velocity is nonzero, we also get sine terms in time with a separate set of coefficients determined by .
Musical Harmonics
The natural frequencies are integer multiples of the fundamental frequency . These are the harmonics of a vibrating string! The fundamental frequency determines the pitch, while the relative amplitudes of the harmonics determine the timbre (tone quality). This is why a guitar string and a piano string playing the same note sound different — they excite different harmonic mixtures.
Machine Learning Connections
The ideas behind separation of variables are remarkably relevant to modern machine learning. The decomposition of functions into orthogonal modes is the mathematical foundation for several important ML techniques.
1. Fourier Neural Operators (FNO)
The Fourier Neural Operator (Li et al., 2021) directly extends the spectral approach of separation of variables. Instead of solving in physical space, FNO learns solution operators in Fourier (spectral) space:
where is the Fourier transform and is a learned weight matrix that acts on the Fourier modes. This is exactly the insight from separation of variables: many PDEs are diagonal in Fourier space, meaning each mode can be processed independently.
2. Spectral Normalization in GANs
Spectral normalization constrains the Lipschitz constant of neural network layers by normalizing their spectral norm (largest singular value). This is an eigenvalue computation, directly analogous to finding eigenvalues in separation of variables. It stabilizes GAN training by controlling how much each "mode" of the discriminator can amplify signals.
3. Principal Component Analysis (PCA)
PCA decomposes data into orthogonal principal components, just as separation of variables decomposes solutions into orthogonal eigenfunctions. The covariance matrix plays the role of the differential operator, eigenvectors play the role of eigenfunctions, and eigenvalues measure the importance of each mode. Keeping only the top modes is like truncating the Fourier series to the most important terms.
4. Diffusion Models
Score-based diffusion models for image generation are built on the same mathematics as the heat equation. The forward diffusion process adds noise progressively — mathematically, this is heat diffusion in pixel space. The Fourier decomposition tells us that high-frequency features (fine details) are destroyed first and low-frequency features (overall structure) last. The reverse (denoising) process reconstructs these features in the opposite order.
| PDE Concept | ML Application |
|---|---|
| Eigenfunction expansion | PCA / Spectral decomposition |
| Fourier modes | Fourier Neural Operator layers |
| Mode truncation | Dimensionality reduction |
| Orthogonality | Feature independence |
| Eigenvalue decay | Explained variance ratio |
| Heat equation diffusion | Forward process in diffusion models |
| Spectral methods | Spectral normalization |
Python Implementation
Solving the Heat Equation by Separation of Variables
The following implementation computes the analytical solution using Fourier coefficients and mode superposition — the exact method we derived above.
Solving the Wave Equation by Separation of Variables
From Separation of Variables to Spectral Methods
The connection to modern computational methods and ML becomes clear when we implement the spectral approach using FFT:
Common Pitfalls
Forgetting to Check All Cases for \u03BB
When solving the eigenvalue problem, you must check all three cases: , , and . It is tempting to jump directly to the oscillatory case, but you must show rigorously that the other cases yield only the trivial solution.
Applying Separation of Variables to Nonlinear PDEs
Separation of variables relies on linearity of the PDE. It works because the sum of solutions is also a solution (superposition). For nonlinear PDEs like the Navier-Stokes equations or reaction-diffusion systems with nonlinear reaction terms, this method generally fails. You need other techniques (numerical methods, perturbation theory, symmetry methods).
Confusing Boundary Conditions and Initial Conditions
Boundary conditions determine the eigenfunctions (the spatial shapes). Initial conditions determine the coefficients (how much of each shape to include). Do not mix them up! The method processes them at different stages: BCs in steps 3-5, IC in the final superposition step.
Truncating Too Few Modes
For smooth initial conditions (like a parabola), just a few modes give an excellent approximation. But for discontinuous initial conditions (like a step function), you may need many modes, and the Gibbs phenomenon causes persistent overshoot near jumps. The number of modes needed depends on the smoothness of .
When Does Separation of Variables Work?
The method works when:
- The PDE is linear and homogeneous
- The domain has a simple geometry (rectangle, disk, sphere)
- The boundary conditions are homogeneous (zero at the boundary)
- The coefficients of the PDE are constant or depend on only one variable
For non-homogeneous BCs, you can often subtract a steady-state solution to make them homogeneous. For non-homogeneous PDEs, use eigenfunction expansion.
Test Your Understanding
What is the key assumption in the separation of variables method?
Summary
The method of separation of variables is one of the most powerful analytical techniques for solving partial differential equations. By assuming a product form for the solution, we reduce a PDE to a collection of ODEs, each solvable by standard methods.
The Algorithm in Five Steps
- Assume a product solution:
- Substitute into the PDE and separate variables to get two sides depending on different variables
- Set both sides equal to a constant , obtaining two ODEs
- Solve the spatial eigenvalue problem using boundary conditions to find eigenvalues and eigenfunctions
- Superpose all modes and determine coefficients from the initial condition using orthogonality
Key Concepts
| Concept | Description |
|---|---|
| Separation ansatz | Assume u(x,t) = X(x)·T(t), converting PDE to ODEs |
| Separation constant | λ connects the spatial and temporal ODEs |
| Eigenvalue problem | Spatial ODE + BCs determine allowed λ values |
| Eigenfunctions | sin(nπx/L) for Dirichlet BCs — the building blocks |
| Orthogonality | ∫ Xₘ Xₙ dx = 0 for m≠n — enables independent coefficient computation |
| Superposition | General solution = ∑ bₙ Xₙ(x) Tₙ(t) |
| Fourier coefficients | bₙ = (2/L)∫ f(x) Xₙ(x) dx — projection onto modes |
| Mode decay (heat) | e^(-αλₙt) — higher modes decay faster as n² |
| Mode oscillation (wave) | cos(ωₙt) — all modes vibrate forever |
Key Takeaways
- Separation of variables reduces a PDE to two or more ODEs, each in a single variable
- The boundary conditions determine the spatial eigenfunctions, while the initial condition determines the expansion coefficients
- Orthogonality of eigenfunctions is the key property that makes the method practical — it allows independent computation of each Fourier coefficient
- The heat equation produces exponential decay of modes (diffusion smooths), while the wave equation produces oscillation (waves vibrate)
- The same eigenvalue/eigenfunction framework underlies PCA, spectral methods, Fourier Neural Operators, and diffusion models in machine learning
- The method works for linear PDEs with homogeneous BCs on simple geometries — it does not apply to nonlinear PDEs or irregular domains
Coming Next: In the next chapter, we'll dive deeper into the heat equation, exploring its derivation from physical principles, the heat kernel, and its connections to diffusion in machine learning.