Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Explain the key assumption behind separation of variables: that the solution is a product $u(x,t) = X(x) \cdot T(t)$
Execute the method step by step — substitute, separate, set equal to a constant, and solve two ODEs
Solve eigenvalue problems arising from boundary conditions and identify the eigenfunctions
Apply superposition to construct general solutions from individual modes
Compute Fourier coefficients using orthogonality of eigenfunctions
Contrast how the method produces exponential decay for the heat equation versus oscillation for the wave equation
Connect separation of variables to spectral methods and Fourier Neural Operators in modern machine learning

The Big Picture: From One PDE to Many ODEs

"The art of doing mathematics is finding that special case which contains all the germs of generality." — David Hilbert

In the previous sections, we learned what partial differential equations are, how they are classified, and what boundary and initial conditions mean. Now comes the central question: how do we actually solve a PDE?

Partial differential equations are fundamentally harder than ordinary differential equations. An ODE involves derivatives with respect to one variable; a PDE involves derivatives with respect to multiple variables simultaneously. The solution is not a curve but an entire surface (or higher-dimensional object). We cannot simply "integrate both sides."

The method of separation of variables, developed by Joseph Fourier in the early 1800s while studying heat conduction, offers a profound insight: we can break a complicated PDE into a collection of simpler ODEs. The trick is to assume that the solution has a special multiplicative structure and then show that this assumption leads to a complete family of solutions.

The Central Strategy

Separation of variables converts a PDE in multiple independent variables into a set of ODEs, each in a single variable. Instead of solving one hard problem, we solve many easy ones.

\underbrace{\text{PDE in } (x, t)}_{\text{hard}} \;\longrightarrow\; \underbrace{\text{ODE in } x}_{\text{easy}} \;+\; \underbrace{\text{ODE in } t}_{\text{easy}}

Historical Context

Joseph Fourier's 1822 masterwork Théorie Analytique de la Chaleur (Analytical Theory of Heat) introduced both the heat equation and the method of separation of variables. His key realization was that any reasonable function could be represented as an infinite sum of sines and cosines — what we now call a Fourier series. This was initially controversial (even Lagrange and Laplace were skeptical), but it revolutionized both mathematics and physics.

The method was later applied to the wave equation by Daniel Bernoulli and to Laplace's equation by many others. Today, it remains one of the most important analytical tools in mathematical physics and engineering, and its ideas are the foundation of spectral methods in computational science and Fourier Neural Operators in machine learning.

The Core Idea: Products Become Sums

The entire method rests on a single, inspired guess:

The Separation Ansatz

Assume the solution $u(x, t)$ can be written as a product of a function of $x$ alone and a function of $t$ alone:

u(x, t) = X(x) \cdot T(t)

where $X(x)$ captures the spatial shape and $T(t)$ captures the temporal behavior.

Why would this work? Not every function of two variables is a product of two functions of one variable. The function $u(x,t) = x + t$ cannot be written as $X(x) \cdot T(t)$ . But the key insight is that we don't need every solution to have this form — we only need enough of them that we can build the general solution by adding them together (superposition).

Why Products?

The product assumption works because of how derivatives interact with products: when you differentiate $X(x) \cdot T(t)$ with respect to $x$ , the $T(t)$ factor passes through unchanged. This is precisely what allows us to "separate" the variables onto different sides of the equation.

The Method Step by Step

Let us develop the method using the one-dimensional heat equation as our prototype. The same steps apply (with minor modifications) to the wave equation, Laplace's equation, and many other PDEs.

Step 1: State the Problem

We want to solve the heat equation on a rod of length $L$ with both ends held at zero temperature:

PDE:

\frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}, \quad 0 < x < L, \; t > 0

BCs:

u(0, t) = 0, \quad u(L, t) = 0

IC:

u(x, 0) = f(x)

Step 2: Assume a Product Solution

Substitute $u(x, t) = X(x) \cdot T(t)$ into the PDE:

X(x) \cdot T'(t) = \alpha \cdot X''(x) \cdot T(t)

Here $T'(t)$ means $dT/dt$ and $X''(x)$ means $d^2X/dx^2$ . Note how the product structure allows partial derivatives to act on only one factor.

Step 3: Divide to Separate

Divide both sides by $X(x) \cdot T(t)$ (assuming neither is zero):

\frac{1}{\alpha} \cdot \frac{T'(t)}{T(t)} = \frac{X''(x)}{X(x)}

Now comes the critical observation. The left side depends only on $t$ . The right side depends only on $x$ . Yet they are equal for all values of both $x$ and $t$ . The only way this can happen is if both sides equal the same constant.

The Separation Argument

If $f(t) = g(x)$ for all $x$ and $t$ , then both $f(t)$ and $g(x)$ must be constant. Why? If you vary $t$ while holding $x$ fixed, the right side doesn't change, so the left side can't change either. Therefore $f(t) = \text{const}$ . The same argument applies to $g(x)$ .

Step 4: Introduce the Separation Constant

We call this constant $-\lambda$ (the minus sign and the symbol $\lambda$ — lambda — are chosen by convention for convenience):

Spatial ODE:

X''(x) + \lambda X(x) = 0

Temporal ODE:

T'(t) + \alpha \lambda T(t) = 0

We have converted one PDE into two ODEs! Each is a standard ODE that we know how to solve. The parameter $\lambda$ connects them: it is the separation constant (which will turn out to be an eigenvalue).

Why -\u03BB and not +\u03BB?

We use $-\lambda$ with $\lambda > 0$ because for the heat equation, the solution must decay over time (physically, heat dissipates). With $T'(t) = -\alpha\lambda T(t)$ and $\lambda > 0$ , we get $T(t) = e^{-\alpha\lambda t}$ , which decays. If $\lambda < 0$ , the solution would grow exponentially — unphysical for heat.

Step 5: Apply Boundary Conditions

The boundary conditions $u(0, t) = 0$ and $u(L, t) = 0$ translate directly to the spatial part (since $T(t)$ is not identically zero):

X(0) = 0, \quad X(L) = 0

This transforms the spatial ODE into an eigenvalue problem: find the values of $\lambda$ for which the ODE $X'' + \lambda X = 0$ has a nonzero solution satisfying both boundary conditions.

Example: The Heat Equation on a Finite Rod

Let us carry out the full solution for the heat equation with Dirichlet boundary conditions (both ends at zero temperature).

Solving the Spatial ODE

The spatial ODE is:

X'' + \lambda X = 0, \quad X(0) = 0, \quad X(L) = 0

The general solution depends on the sign of $\lambda$ . We consider three cases:

Case 1: $\lambda < 0$ (rejected)

Writing $\lambda = -\mu^2$ , the ODE becomes $X'' - \mu^2 X = 0$ with solution $X = Ae^{\mu x} + Be^{-\mu x}$ . Applying $X(0) = 0$ and $X(L) = 0$ forces $A = B = 0$ . Only the trivial solution exists — rejected.

Case 2: $\lambda = 0$ (rejected)

The ODE becomes $X'' = 0$ , so $X = Ax + B$ . Applying $X(0) = 0$ gives $B = 0$ , and $X(L) = 0$ gives $A = 0$ . Again trivial — rejected.

Case 3: $\lambda > 0$ (accepted!)

Writing $\lambda = \mu^2$ , the ODE becomes $X'' + \mu^2 X = 0$ with solution $X = A\cos(\mu x) + B\sin(\mu x)$ .

Applying $X(0) = 0$ : since $\cos(0) = 1$ and $\sin(0) = 0$ , we get $A = 0$ .

So $X = B\sin(\mu x)$ . Applying $X(L) = 0$ : we need $\sin(\mu L) = 0$ , which means $\mu L = n\pi$ for $n = 1, 2, 3, \ldots$

Therefore, the eigenvalues and eigenfunctions are:

Eigenvalues:

\lambda_n = \left(\frac{n\pi}{L}\right)^2, \quad n = 1, 2, 3, \ldots

Eigenfunctions:

X_n(x) = \sin\left(\frac{n\pi x}{L}\right), \quad n = 1, 2, 3, \ldots

Solving the Temporal ODE

For each eigenvalue $\lambda_n$ , the temporal ODE is:

T_n'(t) + \alpha \lambda_n T_n(t) = 0

This is a first-order linear ODE with solution:

T_n(t) = e^{-\alpha \lambda_n t} = e^{-\alpha (n\pi/L)^2 t}

Each mode decays exponentially, with higher modes (larger $n$ ) decaying much faster because the decay rate is proportional to $n^2$ .

Individual Mode Solutions

Combining spatial and temporal parts, each mode (or normal mode) of the heat equation is:

u_n(x, t) = X_n(x) \cdot T_n(t) = \sin\left(\frac{n\pi x}{L}\right) e^{-\alpha (n\pi/L)^2 t}

🎨The Separation Concept: u(x,t) = X(x) · T(t)

The key idea of separation of variables: assume the solution is a product of a spatial function X(x) and a temporal function T(t). Watch how the spatial shape remains fixed while the amplitude changes over time.

Mode number n = 1

Time t = 0.00

Diffusivity α = 0.050

Spatial Part X(x)
Shape determined by boundary conditions

Temporal Part T(t)
Exponential decay (heat dissipates)

Product u(x,t) = X · T
Shape preserved, amplitude shrinks

Key Insight

For the heat equation, higher modes (larger n) decay faster because the decay rate is proportional to n². Mode n=1 decays with rate α(1π/L)² = 0.493. This is why sharp features (which need high modes) smooth out first.

Eigenvalue Problems: The Heart of the Method

The spatial ODE together with its boundary conditions forms an eigenvalue problem. This is one of the most important concepts in mathematical physics and has deep connections to linear algebra.

Sturm-Liouville Eigenvalue Problem

The spatial equation $X'' + \lambda X = 0$ with boundary conditions is a special case of a Sturm-Liouville problem. The key properties are:

Discrete eigenvalues: Only specific values of $\lambda$ yield nonzero solutions
Orthogonality: Eigenfunctions corresponding to different eigenvalues are orthogonal: $\int_0^L X_m(x) X_n(x) \, dx = 0$ when $m \neq n$
Completeness: Any "reasonable" function satisfying the boundary conditions can be expanded as a series of eigenfunctions

The analogy with linear algebra is powerful. In matrix algebra, we have $Av = \lambda v$ where $A$ is a matrix and $v$ is an eigenvector. Here, the differential operator $-d^2/dx^2$ plays the role of the matrix, and the eigenfunctions $\sin(n\pi x/L)$ play the role of eigenvectors.

Linear Algebra	PDE Eigenvalue Problem
Matrix A	Operator -d²/dx²
Eigenvector v	Eigenfunction Xₙ(x) = sin(nπx/L)
Eigenvalue λ	Eigenvalue λₙ = (nπ/L)²
Finite-dimensional (N eigenvectors)	Infinite-dimensional (countably many)
vᵀₘ vₙ = 0 (orthogonality)	∫ Xₘ Xₙ dx = 0 (orthogonality)
Any vector = ∑ cₙvₙ	Any function = ∑ bₙXₙ(x)

🎶Eigenfunctions: The Building Blocks

Boundary conditions force the spatial solutions into specific shapes called eigenfunctions. These form a complete orthogonal set — any function satisfying the boundary conditions can be expressed as a sum of these modes.

Number of modes shown: 4

Highlight mode n = 1

Mode n	Eigenfunction	Eigenvalue λₙ	Nodes (zeros)
1	sin(1πx/L)	(1π/L)²	2
2	sin(2πx/L)	(2π/L)²	3
3	sin(3πx/L)	(3π/L)²	4
4	sin(4πx/L)	(4π/L)²	5

Different boundary conditions produce different eigenfunctions:

Boundary Conditions	Eigenfunctions	Eigenvalues
u(0)=0, u(L)=0 (Dirichlet)	sin(nπx/L)	(nπ/L)²
u′(0)=0, u′(L)=0 (Neumann)	cos(nπx/L)	(nπ/L)²
u(0)=0, u′(L)=0 (Mixed)	sin((2n-1)πx/(2L))	((2n-1)π/(2L))²

Superposition and Fourier Series

Each individual mode $u_n(x,t) = \sin(n\pi x/L) \, e^{-\alpha(n\pi/L)^2 t}$ solves the PDE and satisfies the boundary conditions. But there are infinitely many such solutions (one for each $n$ ). Which one is the actual solution?

The answer uses superposition: because the heat equation is linear, any linear combination of solutions is also a solution. Therefore, the general solution is:

u(x, t) = \sum_{n=1}^{\infty} b_n \sin\left(\frac{n\pi x}{L}\right) e^{-\alpha(n\pi/L)^2 t}

The coefficients $b_n$ are determined by the initial condition $u(x, 0) = f(x)$ . At $t = 0$ , all the exponentials equal 1, so:

f(x) = \sum_{n=1}^{\infty} b_n \sin\left(\frac{n\pi x}{L}\right)

This is a Fourier sine series! The coefficients are computed using orthogonality: multiply both sides by $\sin(m\pi x/L)$ and integrate from 0 to $L$ . Because of orthogonality, all terms vanish except $n = m$ :

Fourier Coefficients via Orthogonality

b_n = \frac{2}{L} \int_0^L f(x) \sin\left(\frac{n\pi x}{L}\right) dx

Each coefficient $b_n$ measures how much of mode $n$ is present in the initial condition $f(x)$ . This is the projection of $f$ onto the eigenfunction $X_n$ .

Why Orthogonality Matters

Without orthogonality, finding the coefficients $b_n$ would require solving an infinite system of coupled equations — essentially impossible. Orthogonality allows us to compute each coefficient independently, using a single integral. This is analogous to how orthogonal coordinate axes let us read off each component of a vector independently.

Convergence and the Complete Solution

The full solution to the heat equation is then:

u(x, t) = \sum_{n=1}^{\infty} \left[\frac{2}{L} \int_0^L f(\xi) \sin\left(\frac{n\pi \xi}{L}\right) d\xi\right] \sin\left(\frac{n\pi x}{L}\right) e^{-\alpha(n\pi/L)^2 t}

This formula tells a beautiful physical story: the initial temperature distribution $f(x)$ is decomposed into sinusoidal modes, each of which decays independently at a rate proportional to $n^2$ . High-frequency modes (sharp features) die out first; the solution progressively smooths.

🎧Superposition: Building Solutions from Modes

The general solution is a superposition (sum) of all the separated solutions. By choosing the right coefficients, we can match any initial condition. Watch how adding more modes improves the approximation, and how the solution evolves in time.

Number of modes N = 5

Time t = 0.00

Solution u(x,t) — dashed line: initial condition f(x), faded: individual modes

Space-time heatmap (white line: current t)

bₙ=1

0.900

bₙ=2

0.000

bₙ=3

-0.300

bₙ=4

-0.000

bₙ=5

-0.180

Gibbs Phenomenon

Notice the overshoot near the jump discontinuity in the "Step" initial condition. This is the Gibbs phenomenon: no matter how many modes you add, the Fourier series overshoots by about 9% at a jump. This is not a bug — it's a fundamental property of Fourier series at discontinuities.

Physical Meaning of Mode Decay

The fact that higher modes decay faster explains why a hot metal rod with a complex initial temperature distribution eventually approaches a simple sinusoidal shape (the first mode) before cooling to zero. Sharp temperature gradients are "expensive" in terms of energy flow and smooth out first.

Example: The Wave Equation on a String

The same method applies to the wave equation, but with a fundamentally different temporal behavior. Consider a vibrating string fixed at both ends:

PDE:

\frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}, \quad 0 < x < L, \; t > 0

BCs:

u(0, t) = 0, \quad u(L, t) = 0

ICs:

u(x, 0) = f(x), \quad \frac{\partial u}{\partial t}(x, 0) = g(x)

Separation Leads to Different Temporal Behavior

Substituting $u(x,t) = X(x) \cdot T(t)$ and separating, the spatial equation is exactly the same as before:

X'' + \lambda X = 0 \quad \Rightarrow \quad X_n(x) = \sin\left(\frac{n\pi x}{L}\right)

But the temporal equation is second-order:

T'' + c^2 \lambda T = 0 \quad \Rightarrow \quad T_n(t) = A_n \cos(\omega_n t) + B_n \sin(\omega_n t)

where $\omega_n = c\sqrt{\lambda_n} = cn\pi/L$ is the natural frequency of mode $n$ .

Heat Equation (Parabolic)

T_n(t) = e^{-\alpha\lambda_n t}

First-order temporal ODE
Exponential decay — solution smooths
Higher modes vanish faster
Irreversible (entropy increases)
One initial condition: $u(x,0) = f(x)$

Wave Equation (Hyperbolic)

T_n(t) = A_n\cos(\omega_n t) + B_n\sin(\omega_n t)

Second-order temporal ODE
Oscillation — modes vibrate forever
All modes maintain their amplitude
Reversible (energy conserved)
Two initial conditions: $u(x,0)$ and $u_t(x,0)$

The General Wave Solution

The general solution to the wave equation with zero initial velocity ( $g(x) = 0$ ) is:

u(x, t) = \sum_{n=1}^{\infty} b_n \sin\left(\frac{n\pi x}{L}\right) \cos\left(\frac{cn\pi t}{L}\right)

with the same Fourier coefficients $b_n = \frac{2}{L}\int_0^L f(x) \sin(n\pi x/L)\,dx$ . If the initial velocity is nonzero, we also get sine terms in time with a separate set of coefficients determined by $g(x)$ .

Musical Harmonics

The natural frequencies $\omega_n = cn\pi/L$ are integer multiples of the fundamental frequency $\omega_1 = c\pi/L$ . These are the harmonics of a vibrating string! The fundamental frequency determines the pitch, while the relative amplitudes of the harmonics determine the timbre (tone quality). This is why a guitar string and a piano string playing the same note sound different — they excite different harmonic mixtures.

Machine Learning Connections

The ideas behind separation of variables are remarkably relevant to modern machine learning. The decomposition of functions into orthogonal modes is the mathematical foundation for several important ML techniques.

1. Fourier Neural Operators (FNO)

The Fourier Neural Operator (Li et al., 2021) directly extends the spectral approach of separation of variables. Instead of solving in physical space, FNO learns solution operators in Fourier (spectral) space:

v_{l+1}(x) = \sigma\left(W_l v_l(x) + \mathcal{F}^{-1}\left(R_l \cdot \mathcal{F}(v_l)\right)(x)\right)

where $\mathcal{F}$ is the Fourier transform and $R_l$ is a learned weight matrix that acts on the Fourier modes. This is exactly the insight from separation of variables: many PDEs are diagonal in Fourier space, meaning each mode can be processed independently.

2. Spectral Normalization in GANs

Spectral normalization constrains the Lipschitz constant of neural network layers by normalizing their spectral norm (largest singular value). This is an eigenvalue computation, directly analogous to finding eigenvalues in separation of variables. It stabilizes GAN training by controlling how much each "mode" of the discriminator can amplify signals.

3. Principal Component Analysis (PCA)

PCA decomposes data into orthogonal principal components, just as separation of variables decomposes solutions into orthogonal eigenfunctions. The covariance matrix plays the role of the differential operator, eigenvectors play the role of eigenfunctions, and eigenvalues measure the importance of each mode. Keeping only the top modes is like truncating the Fourier series to the most important terms.

4. Diffusion Models

Score-based diffusion models for image generation are built on the same mathematics as the heat equation. The forward diffusion process adds noise progressively — mathematically, this is heat diffusion in pixel space. The Fourier decomposition tells us that high-frequency features (fine details) are destroyed first and low-frequency features (overall structure) last. The reverse (denoising) process reconstructs these features in the opposite order.

PDE Concept	ML Application
Eigenfunction expansion	PCA / Spectral decomposition
Fourier modes	Fourier Neural Operator layers
Mode truncation	Dimensionality reduction
Orthogonality	Feature independence
Eigenvalue decay	Explained variance ratio
Heat equation diffusion	Forward process in diffusion models
Spectral methods	Spectral normalization

Python Implementation

Solving the Heat Equation by Separation of Variables

The following implementation computes the analytical solution using Fourier coefficients and mode superposition — the exact method we derived above.

Analytical Solution of the Heat Equation

🐍separation_of_variables_heat.py

Explanation(5)

Code(58)

3Analytical Solution Setup

Unlike numerical methods (finite differences), separation of variables gives us the EXACT analytical solution as an infinite series. We truncate to N modes for computation.

19Initial Condition

The step function is a challenging test case because it has a discontinuity. The Fourier series will exhibit the Gibbs phenomenon at the jump.

22Fourier Coefficient Computation

Each coefficient b_n is computed by the inner product of f(x) with sin(nπx/L). This uses the orthogonality of eigenfunctions: only the n-th mode picks up the n-th coefficient.

35Mode Superposition

The solution is a sum of modes, each decaying at its own rate. Mode n decays as exp(-α(nπ/L)²t). Higher modes (higher frequency) decay MUCH faster because the rate scales as n².

39Time Evolution

At t=0, we see the (truncated) Fourier series of f(x). As time increases, high-frequency modes die out first, smoothing the solution. Eventually only the first mode remains.

53 lines without explanation

1import numpy as np
2import matplotlib.pyplot as plt
3
4def separation_of_variables_heat(L=1.0, alpha=0.01, N_modes=20,
5                                  Nx=200, t_values=[0, 0.5, 1, 2, 5]):
6    """
7    Solve the 1D heat equation using separation of variables.
8
9    PDE: du/dt = alpha * d²u/dx²
10    BCs: u(0,t) = u(L,t) = 0
11    IC:  u(x,0) = f(x) (step function)
12
13    Solution: u(x,t) = sum_{n=1}^{N} b_n sin(n*pi*x/L) exp(-alpha*(n*pi/L)^2*t)
14    """
15    x = np.linspace(0, L, Nx)
16
17    # Initial condition: step function
18    def f(x_val):
19        return 1.0 if 0.25 < x_val < 0.75 else 0.0
20
21    # Compute Fourier sine coefficients: b_n = (2/L) * integral f(x) sin(n*pi*x/L) dx
22    def compute_bn(n):
23        integrand = np.array([f(xi) * np.sin(n * np.pi * xi / L) for xi in x])
24        return 2 / L * np.trapezoid(integrand, x)
25
26    # Pre-compute all coefficients
27    coefficients = [compute_bn(n) for n in range(1, N_modes + 1)]
28    print("First 5 Fourier coefficients:")
29    for n, bn in enumerate(coefficients[:5], 1):
30        print(f"  b_{n} = {bn:.6f}")
31
32    # Compute solution at each time
33    fig, axes = plt.subplots(1, len(t_values), figsize=(16, 4))
34
35    for idx, t in enumerate(t_values):
36        u = np.zeros_like(x)
37        for n_idx, bn in enumerate(coefficients):
38            n = n_idx + 1
39            lambda_n = (n * np.pi / L) ** 2
40            # Each mode: b_n * sin(n*pi*x/L) * exp(-alpha*lambda_n*t)
41            u += bn * np.sin(n * np.pi * x / L) * np.exp(-alpha * lambda_n * t)
42
43        axes[idx].fill_between(x, u, alpha=0.3, color='steelblue')
44        axes[idx].plot(x, u, 'b-', linewidth=2)
45        axes[idx].set_title(f't = {t}')
46        axes[idx].set_ylim(-0.3, 1.3)
47        axes[idx].set_xlabel('x')
48        axes[idx].grid(True, alpha=0.3)
49
50    axes[0].set_ylabel('u(x, t)')
51    fig.suptitle(f'Heat Equation Solution ({N_modes} modes)', fontsize=14)
52    plt.tight_layout()
53    plt.show()
54
55    return x, coefficients
56
57# Run the solver
58x, coeffs = separation_of_variables_heat()

Solving the Wave Equation by Separation of Variables

Analytical Solution of the Wave Equation

🐍separation_of_variables_wave.py

Explanation(4)

Code(73)

3Wave Equation Setup

The wave equation is second-order in time, so the temporal equation T'' = -ω²T gives oscillatory solutions cos(ωt) and sin(ωt) instead of the exponential decay of the heat equation.

13Zero Initial Velocity

With du/dt(x,0) = 0, we only get cos terms in time (no sin terms). If we had nonzero initial velocity, we would need both cos and sin, requiring a second set of Fourier coefficients.

34Natural Frequencies

Each mode n oscillates at its natural frequency ωₙ = cnπ/L. These are the harmonics of the string! The fundamental frequency is ω₁ = cπ/L, and overtones are integer multiples.

53Energy Conservation

Unlike the heat equation where energy dissipates, the wave equation conserves total energy. The L² norm of the solution remains constant over time — energy just redistributes between kinetic and potential forms.

69 lines without explanation

1import numpy as np
2import matplotlib.pyplot as plt
3
4def separation_of_variables_wave(L=1.0, c=1.0, N_modes=30,
5                                  Nx=200, t_values=[0, 0.3, 0.5, 0.8, 1.0]):
6    """
7    Solve the 1D wave equation using separation of variables.
8
9    PDE: d²u/dt² = c² * d²u/dx²
10    BCs: u(0,t) = u(L,t) = 0
11    IC:  u(x,0) = f(x) (triangle), du/dt(x,0) = 0
12
13    Solution: u(x,t) = sum b_n sin(n*pi*x/L) cos(c*n*pi*t/L)
14    """
15    x = np.linspace(0, L, Nx)
16
17    # Initial condition: triangle wave
18    def f(x_val):
19        if x_val < L / 2:
20            return 2 * x_val / L
21        return 2 * (1 - x_val / L)
22
23    # Fourier coefficients
24    def compute_bn(n):
25        integrand = np.array([f(xi) * np.sin(n * np.pi * xi / L) for xi in x])
26        return 2 / L * np.trapezoid(integrand, x)
27
28    coefficients = [compute_bn(n) for n in range(1, N_modes + 1)]
29
30    # Key difference from heat: temporal part is cos, not exp!
31    fig, axes = plt.subplots(1, len(t_values), figsize=(16, 4))
32
33    for idx, t in enumerate(t_values):
34        u = np.zeros_like(x)
35        for n_idx, bn in enumerate(coefficients):
36            n = n_idx + 1
37            omega_n = c * n * np.pi / L  # natural frequency of mode n
38            # Each mode: b_n * sin(n*pi*x/L) * cos(omega_n * t)
39            u += bn * np.sin(n * np.pi * x / L) * np.cos(omega_n * t)
40
41        axes[idx].plot(x, u, 'r-', linewidth=2)
42        axes[idx].fill_between(x, u, alpha=0.2, color='red')
43        axes[idx].set_title(f't = {t}')
44        axes[idx].set_ylim(-1.3, 1.3)
45        axes[idx].set_xlabel('x')
46        axes[idx].grid(True, alpha=0.3)
47
48    axes[0].set_ylabel('u(x, t)')
49    fig.suptitle('Wave Equation: Vibrating String', fontsize=14)
50    plt.tight_layout()
51    plt.show()
52
53    # Show that energy is conserved (unlike heat equation)
54    t_span = np.linspace(0, 3, 300)
55    energy = []
56    for t in t_span:
57        u_vals = np.zeros_like(x)
58        for n_idx, bn in enumerate(coefficients):
59            n = n_idx + 1
60            omega_n = c * n * np.pi / L
61            u_vals += bn * np.sin(n * np.pi * x / L) * np.cos(omega_n * t)
62        # Approximate total energy ~ integral of u^2
63        energy.append(np.trapezoid(u_vals**2, x))
64
65    plt.figure(figsize=(8, 3))
66    plt.plot(t_span, energy, 'g-', linewidth=2)
67    plt.xlabel('Time')
68    plt.ylabel('Energy (L2 norm squared)')
69    plt.title('Wave Equation: Energy is Conserved!')
70    plt.grid(True, alpha=0.3)
71    plt.show()
72
73separation_of_variables_wave()

From Separation of Variables to Spectral Methods

The connection to modern computational methods and ML becomes clear when we implement the spectral approach using FFT:

Spectral Methods: The Bridge to Machine Learning

🐍spectral_method_ml.py

Explanation(4)

Code(86)

3Spectral Methods: The Generalization

Spectral methods extend separation of variables to general domains using FFT. Instead of computing Fourier coefficients analytically, we use the Fast Fourier Transform.

14Connection to FNO

Fourier Neural Operators (FNO) learn in spectral space just like spectral PDE solvers. The key insight: many PDEs are DIAGONAL in Fourier space, meaning each mode evolves independently.

30Exact Spectral Evolution

In spectral space, the heat equation becomes a set of independent ODEs. Each wavenumber k has an exact solution: u_hat_k(t) = u_hat_k(0) * exp(-αk²t). No spatial discretization error!

35FFT-Based Solver

The FFT transforms between physical and spectral space in O(N log N) time. This makes spectral methods extremely efficient for problems with smooth solutions on periodic or simple domains.

82 lines without explanation

1import numpy as np
2import matplotlib.pyplot as plt
3
4def spectral_method_pde(N_basis=32, L=2*np.pi, alpha=0.01, dt=0.01, T_final=2.0):
5    """
6    Spectral method for PDEs: the ML-relevant generalization
7    of separation of variables.
8
9    Instead of solving in physical space, we:
10    1. Transform to frequency space (Fourier transform)
11    2. Solve ODEs for each mode independently
12    3. Transform back
13
14    This is the foundation of:
15    - Fourier Neural Operators (FNO)
16    - Spectral normalization in GANs
17    - Frequency-domain learning
18    """
19    x = np.linspace(0, L, N_basis, endpoint=False)
20
21    # Initial condition: sum of two Gaussians
22    u = np.exp(-10 * (x - L/3)**2) + 0.5 * np.exp(-20 * (x - 2*L/3)**2)
23
24    # In spectral space, the heat equation becomes:
25    # d(u_hat_k)/dt = -alpha * k^2 * u_hat_k
26    # This is just an ODE for each mode!
27
28    k = np.fft.fftfreq(N_basis, d=L/N_basis) * 2 * np.pi  # wavenumbers
29
30    # Time evolution in spectral space
31    Nt = int(T_final / dt)
32    history = [u.copy()]
33    times = [0]
34
35    for step in range(Nt):
36        t = (step + 1) * dt
37
38        # Forward FFT: physical -> spectral
39        u_hat = np.fft.fft(u)
40
41        # Exact solution in spectral space (no discretization error!)
42        # Each mode evolves independently: u_hat_k(t+dt) = u_hat_k(t) * exp(-alpha*k^2*dt)
43        u_hat *= np.exp(-alpha * k**2 * dt)
44
45        # Inverse FFT: spectral -> physical
46        u = np.real(np.fft.ifft(u_hat))
47
48        if step % (Nt // 10) == 0:
49            history.append(u.copy())
50            times.append(t)
51
52    # Visualize
53    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
54
55    for i, (u_snap, t) in enumerate(zip(history, times)):
56        alpha_val = 0.3 + 0.7 * (1 - i / len(history))
57        ax1.plot(x, u_snap, alpha=alpha_val, label=f't={t:.1f}')
58    ax1.set_xlabel('x')
59    ax1.set_ylabel('u(x,t)')
60    ax1.set_title('Spectral Method: Heat Equation Solution')
61    ax1.legend()
62    ax1.grid(True, alpha=0.3)
63
64    # Show spectral decay
65    u_hat_init = np.fft.fft(history[0])
66    u_hat_final = np.fft.fft(history[-1])
67    freqs = np.fft.fftfreq(N_basis, d=L/N_basis)
68    ax2.semilogy(np.abs(freqs[:N_basis//2]),
69                 np.abs(u_hat_init[:N_basis//2]), 'b-o', label='Initial', markersize=3)
70    ax2.semilogy(np.abs(freqs[:N_basis//2]),
71                 np.abs(u_hat_final[:N_basis//2]) + 1e-16, 'r-o', label='Final', markersize=3)
72    ax2.set_xlabel('Frequency |k|')
73    ax2.set_ylabel('|u_hat(k)|')
74    ax2.set_title('Spectral Coefficients: High Frequencies Decay')
75    ax2.legend()
76    ax2.grid(True, alpha=0.3)
77
78    plt.tight_layout()
79    plt.show()
80
81    print("Key insight for ML:")
82    print("  Fourier Neural Operators (FNO) learn solution operators")
83    print("  by parameterizing transformations in spectral space,")
84    print("  just as separation of variables solves PDEs mode-by-mode.")
85
86spectral_method_pde()

Common Pitfalls

Forgetting to Check All Cases for \u03BB

When solving the eigenvalue problem, you must check all three cases: $\lambda < 0$ , $\lambda = 0$ , and $\lambda > 0$ . It is tempting to jump directly to the oscillatory case, but you must show rigorously that the other cases yield only the trivial solution.

Applying Separation of Variables to Nonlinear PDEs

Separation of variables relies on linearity of the PDE. It works because the sum of solutions is also a solution (superposition). For nonlinear PDEs like the Navier-Stokes equations or reaction-diffusion systems with nonlinear reaction terms, this method generally fails. You need other techniques (numerical methods, perturbation theory, symmetry methods).

Confusing Boundary Conditions and Initial Conditions

Boundary conditions determine the eigenfunctions (the spatial shapes). Initial conditions determine the coefficients (how much of each shape to include). Do not mix them up! The method processes them at different stages: BCs in steps 3-5, IC in the final superposition step.

Truncating Too Few Modes

For smooth initial conditions (like a parabola), just a few modes give an excellent approximation. But for discontinuous initial conditions (like a step function), you may need many modes, and the Gibbs phenomenon causes persistent overshoot near jumps. The number of modes needed depends on the smoothness of $f(x)$ .

When Does Separation of Variables Work?

The method works when:

The PDE is linear and homogeneous
The domain has a simple geometry (rectangle, disk, sphere)
The boundary conditions are homogeneous (zero at the boundary)
The coefficients of the PDE are constant or depend on only one variable

For non-homogeneous BCs, you can often subtract a steady-state solution to make them homogeneous. For non-homogeneous PDEs, use eigenfunction expansion.

Test Your Understanding

Test Your Understanding1 / 8

What is the key assumption in the separation of variables method?

Summary

The method of separation of variables is one of the most powerful analytical techniques for solving partial differential equations. By assuming a product form for the solution, we reduce a PDE to a collection of ODEs, each solvable by standard methods.

The Algorithm in Five Steps

Assume a product solution: $u(x,t) = X(x) \cdot T(t)$
Substitute into the PDE and separate variables to get two sides depending on different variables
Set both sides equal to a constant $-\lambda$ , obtaining two ODEs
Solve the spatial eigenvalue problem using boundary conditions to find eigenvalues $\lambda_n$ and eigenfunctions $X_n(x)$
Superpose all modes and determine coefficients from the initial condition using orthogonality

Key Concepts

Concept	Description
Separation ansatz	Assume u(x,t) = X(x)·T(t), converting PDE to ODEs
Separation constant	λ connects the spatial and temporal ODEs
Eigenvalue problem	Spatial ODE + BCs determine allowed λ values
Eigenfunctions	sin(nπx/L) for Dirichlet BCs — the building blocks
Orthogonality	∫ Xₘ Xₙ dx = 0 for m≠n — enables independent coefficient computation
Superposition	General solution = ∑ bₙ Xₙ(x) Tₙ(t)
Fourier coefficients	bₙ = (2/L)∫ f(x) Xₙ(x) dx — projection onto modes
Mode decay (heat)	e^(-αλₙt) — higher modes decay faster as n²
Mode oscillation (wave)	cos(ωₙt) — all modes vibrate forever

Key Takeaways

Separation of variables reduces a PDE to two or more ODEs, each in a single variable
The boundary conditions determine the spatial eigenfunctions, while the initial condition determines the expansion coefficients
Orthogonality of eigenfunctions is the key property that makes the method practical — it allows independent computation of each Fourier coefficient
The heat equation produces exponential decay of modes (diffusion smooths), while the wave equation produces oscillation (waves vibrate)
The same eigenvalue/eigenfunction framework underlies PCA, spectral methods, Fourier Neural Operators, and diffusion models in machine learning
The method works for linear PDEs with homogeneous BCs on simple geometries — it does not apply to nonlinear PDEs or irregular domains

The Essence of Separation of Variables:

"A PDE is hard because everything depends on everything. Separation of variables breaks this entanglement: space and time evolve independently, and the solution is a symphony of orthogonal modes, each marching to its own beat."

Coming Next: In the next chapter, we'll dive deeper into the heat equation, exploring its derivation from physical principles, the heat kernel, and its connections to diffusion in machine learning.