Chapter 26

Derivation of the Heat Equation

The Heat Equation

Learning Objectives

By the end of this section, you will be able to:

  1. Derive the heat equation from first principles using conservation of energy and Fourier's law
  2. Understand each term in the heat equation and its physical meaning
  3. Explain the role of thermal diffusivity and how it affects heat propagation
  4. Visualize the heat kernel (fundamental solution) and its Gaussian shape
  5. Connect the heat equation to diffusion models in modern machine learning
  6. Identify why sharp temperature features smooth out over time
  7. Apply the convolution solution formula using the heat kernel

The Big Picture: Why the Heat Equation Matters

"Heat, like gravity, penetrates every substance of the universe." — Joseph Fourier, 1822

The heat equation is arguably the most important PDE in applied mathematics. It describes not just thermal diffusion, but any process where a quantity spreads from high to low concentration:

🔥 Thermal Diffusion

Heat spreading through materials, from CPU cooling to climate models

🧪 Chemical Diffusion

Molecules spreading through fluids, drug delivery, pollution dispersal

💰 Financial Diffusion

Option pricing: the Black-Scholes equation can be transformed into the heat equation

📚 Probability Diffusion

Random walks, Brownian motion, the Fokker-Planck equation

📷 Image Processing

Gaussian blur, noise removal, scale-space theory in computer vision

🤖 Generative AI

DALL-E, Stable Diffusion, and other diffusion models for image generation

The Central Equation

$$\frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}$$

In words: The rate of change of temperature equals the diffusivity times the curvature of the temperature profile.


Historical Context: Fourier's Revolution

In 1822, Jean-Baptiste Joseph Fourier published "Théorie Analytique de la Chaleur" (The Analytical Theory of Heat), introducing both the heat equation and Fourier series. This work was revolutionary for several reasons:

1. A PDE Derived from Physical Principles

Fourier derived the heat equation from physical principles — not abstract mathematics. He showed that calculus could describe the continuous flow of heat through matter.

2. Introduction of Fourier Series

To solve the heat equation, Fourier decomposed arbitrary functions into sums of sines and cosines. This was initially controversial but became one of the most powerful tools in mathematics and engineering.

3. Dimensional Analysis

Fourier pioneered the systematic use of physical dimensions, showing that equations must be dimensionally consistent. This idea underpins all of modern physics and engineering.

Fourier's Legacy: The techniques he developed for heat conduction — separation of variables, Fourier series, and the convolution integral — are now fundamental tools across all of science, from signal processing to quantum mechanics.

Physical Setup: Heat in a Rod

Consider a thin rod of length $L$ made of some material. We want to describe how temperature varies along the rod and changes over time.

Key Assumptions

  1. One-dimensional flow: Heat only flows along the rod (not through its sides)
  2. Homogeneous material: The rod has uniform properties throughout
  3. No internal heat sources: Heat is neither created nor destroyed inside the rod
  4. Temperature varies continuously: We can use calculus to describe the temperature field

The Variables

| Symbol | Name | Description | Units |
| --- | --- | --- | --- |
| u(x,t) | Temperature | Temperature at position x and time t | K or °C |
| x | Position | Location along the rod | m |
| t | Time | Time since initial conditions | s |
| q(x,t) | Heat flux | Rate of heat flow per unit area | W/m² |
| k | Thermal conductivity | How easily heat flows | W/(m·K) |
| ρ | Density | Mass per unit volume | kg/m³ |
| cₚ | Specific heat | Energy to raise 1 kg by 1 K | J/(kg·K) |

Conservation of Energy

The foundation of the heat equation is the First Law of Thermodynamics: energy cannot be created or destroyed, only transferred. For a small segment of the rod from $x$ to $x + dx$:

Energy Balance for a Control Volume

$$\frac{dE}{dt} = \underbrace{q(x)A}_{\text{heat in}} - \underbrace{q(x+dx)A}_{\text{heat out}}$$

Rate of energy change = Net heat flux through boundaries
[Interactive figure: Energy Conservation in Heat Conduction. Visualizes how energy flows in and out of a control volume: the rate of change of energy in the control volume equals the net heat flow through its boundaries, dE/dt = (q_in - q_out) A.]

Mathematical Expression

The thermal energy in our small segment is:

$$E = \rho c_p u(x,t) \cdot A \cdot dx$$

where $A$ is the cross-sectional area. Taking the time derivative:

$$\frac{dE}{dt} = \rho c_p \frac{\partial u}{\partial t} \cdot A \cdot dx$$

Why Partial Derivative?

We use $\partial u/\partial t$ because temperature $u$ depends on both $x$ and $t$. The partial derivative means "rate of change with time, holding position fixed."


Fourier's Law of Heat Conduction

Energy conservation tells us that temperature changes due to heat flux, but we need another equation to relate the heat flux to temperature. This is Fourier's Law, the "constitutive relation" for heat conduction:

Fourier's Law

$$q = -k \frac{\partial u}{\partial x}$$

Heat flux = -Conductivity × Temperature gradient
[Interactive figure: Fourier's Law of Heat Conduction. Heat flux is proportional to the negative temperature gradient, q = -k · (dT/dx). Example reading: conductivity 0.50, gradient -80.0, heat flux 40.0.]
Key Insight: The negative sign in Fourier's law ensures heat flows from high to low temperature (down the temperature gradient). This is the constitutive relation that, combined with energy conservation, gives us the heat equation.

The Meaning of the Negative Sign

The negative sign is crucial! It encodes the Second Law of Thermodynamics:

  • If temperature increases to the right ($\partial u/\partial x > 0$), heat flows to the left ($q < 0$)
  • If temperature decreases to the right ($\partial u/\partial x < 0$), heat flows to the right ($q > 0$)
  • Heat always flows from hot to cold — down the temperature gradient

Conductivity k

The thermal conductivity $k$ measures how easily heat flows through a material. Metals have high $k$ (good conductors); insulators like wood or plastic have low $k$.
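
As a quick numerical check of the sign convention, the sketch below evaluates Fourier's law on a sampled temperature profile. The function name `heat_flux`, the linear profile, and the value k = 0.5 are illustrative assumptions, not values taken from a specific material.

```python
import numpy as np

def heat_flux(u, x, k):
    """Approximate q = -k * du/dx on a grid using finite differences."""
    return -k * np.gradient(u, x)

# Hypothetical profile: temperature falls linearly from 80 to 0 over 1 m,
# so du/dx = -80 K/m everywhere.
x = np.linspace(0.0, 1.0, 11)
u = 80.0 * (1.0 - x)

q = heat_flux(u, x, k=0.5)
print(q[0])  # positive value: heat flows to the right, from hot to cold
```

Because the temperature decreases to the right, the computed flux is positive, in line with the sign rules listed above.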


The Derivation: Putting It Together

Now we combine energy conservation with Fourier's law to derive the heat equation. Follow the step-by-step walkthrough below:

Start with Energy Conservation

Rate of energy change = Net heat flux in

Consider a small segment of a rod from x to x + dx. The thermal energy inside this segment can only change if heat flows in or out through the boundaries.

💡 Insight: This is the First Law of Thermodynamics applied to a control volume!
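
The intermediate steps can be condensed into a few lines. The following is a sketch of the standard argument, using only the energy balance and Fourier's law stated earlier and assuming that $k$, $\rho$, and $c_p$ are constant along the rod:

```latex
\begin{align*}
\rho c_p \frac{\partial u}{\partial t}\, A\, dx &= q(x)A - q(x+dx)A
  && \text{(energy balance)} \\
\rho c_p \frac{\partial u}{\partial t} &= -\frac{q(x+dx) - q(x)}{dx}
  \;\xrightarrow{\,dx \to 0\,}\; -\frac{\partial q}{\partial x}
  && \text{(divide by } A\,dx\text{)} \\
\rho c_p \frac{\partial u}{\partial t} &= -\frac{\partial}{\partial x}\left(-k \frac{\partial u}{\partial x}\right)
  = k \frac{\partial^2 u}{\partial x^2}
  && \text{(Fourier's law, constant } k\text{)} \\
\frac{\partial u}{\partial t} &= \frac{k}{\rho c_p} \frac{\partial^2 u}{\partial x^2}
  && \text{(divide by } \rho c_p\text{)}
\end{align*}
```

The two minus signs (one from "heat out", one from Fourier's law) cancel, which is why the final equation has a positive coefficient.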

The Final Result

Combining everything, we arrive at the heat equation:

$$\frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}$$

where $\alpha = \frac{k}{\rho c_p}$ is the thermal diffusivity

Physical Interpretation

The heat equation says: Temperature at a point changes based on how different it is from its neighbors.

| Situation | Curvature ∂²u/∂x² | Result |
| --- | --- | --- |
| Point is HOTTER than neighbors | < 0 (concave down) | Temperature decreases ↓ |
| Point is COLDER than neighbors | > 0 (concave up) | Temperature increases ↑ |
| Point equals neighbor average | = 0 (flat) | No change |
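
The three rows of the table can be reproduced with a central-difference estimate of the second derivative. This is a minimal sketch; the helper `second_derivative` and the three test profiles are hypothetical choices made for illustration.

```python
import numpy as np

def second_derivative(u, dx):
    """Central-difference approximation of d²u/dx² at interior grid points."""
    return (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2

x = np.linspace(-1.0, 1.0, 201)
dx = x[1] - x[0]
mid = len(x) // 2 - 1  # index of x = 0 within the interior-point array

hot_spot = np.exp(-x**2 / 0.1)    # hotter than its neighbors at x = 0
cold_spot = -np.exp(-x**2 / 0.1)  # colder than its neighbors at x = 0
linear = 2.0 * x + 1.0            # every point equals the average of its neighbors

for name, u in [("hot spot", hot_spot), ("cold spot", cold_spot), ("linear", linear)]:
    curvature = second_derivative(u, dx)[mid]
    print(f"{name:>9}: u_xx(0) = {curvature:+.2f}")
```

The stencil u[i+1] - 2u[i] + u[i-1] is twice the difference between the neighbor average and the point itself, which is the discrete form of "how different a point is from its neighbors."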

Thermal Diffusivity: The Speed of Heat

The parameter $\alpha = k/(\rho c_p)$ is called thermal diffusivity. It determines how fast heat spreads through a material.

Understanding Diffusivity

Diffusivity formula: $\alpha = \frac{k}{\rho c_p}$
Units of diffusivity: $[\alpha] = \frac{\text{m}^2}{\text{s}}$
Time to diffuse distance $L$: $t \sim \frac{L^2}{\alpha}$

Intuition for Diffusivity

  • High k (good conductor): Heat flows easily through the material → faster diffusion
  • High ρcₚ (large thermal mass): Lots of energy needed to change temperature → slower diffusion
  • Metals: High k and moderate ρcₚ → high diffusivity (copper: α ≈ 111 mm²/s)
  • Insulators: Low k → low diffusivity (wood: α ≈ 0.08 mm²/s)

Material Comparison

Comparing Thermal Diffusivity Across Materials

🐍 thermal_diffusivity.py

Material Properties: Three properties determine heat conduction: thermal conductivity k, density ρ, and specific heat capacity c_p. These combine to give thermal diffusivity.

Thermal Diffusivity Formula: α = k/(ρc_p) has units [m²/s]. High conductivity or low volumetric heat capacity means fast diffusion. Metals have high α because k is large. Air has a tiny k but also a tiny ρc_p, so with the values below its diffusivity (about 2 × 10⁻⁵ m²/s) is actually higher than steel's; air insulates because its conductivity is low, not because its diffusivity is low.

Characteristic Diffusion Time: The time for heat to spread a distance L scales as t ~ L²/α. This is why insulation works: doubling thickness increases the time for heat to penetrate by 4x!

```python
# Material properties: k in W/(m·K), rho in kg/m³, cp in J/(kg·K)
materials = {
    "Copper": {"k": 401, "rho": 8960, "cp": 385},
    "Aluminum": {"k": 237, "rho": 2700, "cp": 897},
    "Steel": {"k": 50, "rho": 7800, "cp": 500},
    "Glass": {"k": 1.0, "rho": 2500, "cp": 840},
    "Wood": {"k": 0.12, "rho": 600, "cp": 2400},
    "Water": {"k": 0.6, "rho": 1000, "cp": 4186},
    "Air": {"k": 0.025, "rho": 1.2, "cp": 1005},
}

L = 0.01  # characteristic length: 1 cm in meters

print("Thermal Diffusivity Comparison")
print("=" * 54)
print(f"{'Material':<12} {'k (W/mK)':<12} {'α (m²/s)':<15} {'1cm in...':>12}")
print("-" * 54)

for name, props in materials.items():
    k, rho, cp = props["k"], props["rho"], props["cp"]

    # Thermal diffusivity: α = k / (ρ * c_p)
    alpha = k / (rho * cp)

    # Characteristic time for heat to diffuse a distance L: t ~ L²/α
    t_diffuse = L**2 / alpha

    # Pick a readable unit for the diffusion time
    if t_diffuse < 1:
        time_str = f"{t_diffuse*1000:.1f} ms"
    elif t_diffuse < 60:
        time_str = f"{t_diffuse:.1f} s"
    else:
        time_str = f"{t_diffuse/60:.1f} min"

    # .3g keeps small conductivities such as 0.025 from rounding to 0.0
    print(f"{name:<12} {k:<12.3g} {alpha:<15.2e} {time_str:>12}")

# Key insight: Heat spreads as sqrt(t), not linearly!
print("\n" + "=" * 54)
print("Key insight: Distance ~ sqrt(diffusivity × time)")
print("Double the distance takes 4x the time!")
```

The √t Scaling Law

Heat spreads a distance $L$ in time $t \sim L^2/\alpha$. This means:

  • Double the distance → 4× the time (heat spreads sub-linearly)
  • This is why insulation works! Doubling insulation thickness quadruples the protection time
  • Same reason Brownian motion scales as √t

The Heat Kernel: The Fundamental Solution

What happens if we start with all heat concentrated at a single point? The answer is the heat kernel (also called the fundamental solution or Green's function):

The Heat Kernel

$$G(x,t) = \frac{1}{\sqrt{4\pi\alpha t}} \exp\left(-\frac{x^2}{4\alpha t}\right)$$
A Gaussian with standard deviation σ = √(2αt) that spreads with time
[Interactive figure: The Heat Kernel (Fundamental Solution), G(x,t) = (1/√(4παt)) · exp(-x²/(4αt)). As t increases, the Gaussian spreads.]
Connection to ML: In diffusion models (DALL-E, Stable Diffusion), the forward process progressively adds Gaussian noise to images. In the simplest formulation, the noise variance grows with time in the same way the heat kernel spreads, so the heat equation is a useful model for the forward process of these generative models.

Properties of the Heat Kernel

  1. It's a Gaussian: The bell curve shape is characteristic of diffusion processes
  2. Total integral = 1: $\int_{-\infty}^{\infty} G(x,t)\,dx = 1$ (energy is conserved)
  3. Width grows as √t: The standard deviation is $\sigma = \sqrt{2\alpha t}$
  4. Height decreases as 1/√t: The peak flattens to maintain constant area
  5. As t→0, becomes a delta function: Returns to a point source
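
Properties 2 to 4 are easy to confirm numerically. The sketch below integrates the kernel on a wide grid; the grid extent, resolution, and the sample times are arbitrary assumptions chosen so that the Gaussian tails are negligible at the grid edges.

```python
import numpy as np

def heat_kernel(x, t, alpha=1.0):
    """G(x,t) = exp(-x^2 / (4 alpha t)) / sqrt(4 pi alpha t), for t > 0."""
    return np.exp(-x**2 / (4.0 * alpha * t)) / np.sqrt(4.0 * np.pi * alpha * t)

alpha = 1.0
x = np.linspace(-40.0, 40.0, 8001)
dx = x[1] - x[0]

results = {}
for t in [0.1, 1.0, 4.0]:
    G = heat_kernel(x, t, alpha)
    area = np.sum(G) * dx                    # property 2: should be close to 1
    sigma = np.sqrt(np.sum(x**2 * G) * dx)   # property 3: close to sqrt(2 alpha t)
    peak = G.max()                           # property 4: 1 / sqrt(4 pi alpha t)
    results[t] = (area, sigma, peak)
    print(f"t={t}: area={area:.4f}, sigma={sigma:.4f}, "
          f"sqrt(2*alpha*t)={np.sqrt(2 * alpha * t):.4f}, peak={peak:.4f}")
```

Quadrupling t from 1.0 to 4.0 should double the measured width and halve the peak, which is the √t scaling in both directions.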

The Convolution Solution

The heat kernel gives us a beautiful formula for solving any initial value problem. If the initial temperature is $u(x,0) = f(x)$, then:

$$u(x,t) = \int_{-\infty}^{\infty} G(x-y,t) f(y)\,dy = G * f$$

The solution is the convolution of the initial condition with the heat kernel. Each point of the initial distribution spreads according to the heat kernel, and we sum all these contributions.


Key Properties of the Heat Equation

1. Smoothing Property

The heat equation smooths out discontinuities immediately. Even if the initial condition has jumps or corners, for any t > 0 the solution is infinitely differentiable.

Instant Smoothing

This is actually controversial physically: it implies that heat "knows" instantly about distant changes (infinite propagation speed). Real heat has finite speed due to molecular interactions. But for most applications, the approximation is excellent.

2. Maximum Principle

The maximum (and minimum) temperature in a domain can only occur:

  • At the initial time (t = 0)
  • On the boundary of the domain

In other words, new extremes cannot form inside the domain. Temperature naturally tends toward the average of its surroundings.

3. Energy Conservation

With appropriate boundary conditions (like insulated ends), the total thermal energy is conserved:

$$\frac{d}{dt}\int u(x,t)\,dx = 0$$
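
The maximum principle and energy conservation can both be observed in a small simulation. The following is a minimal sketch, assuming an explicit finite-volume update on a rod with insulated (zero-flux) ends; the function `step_insulated`, the grid size, and the initial hot block are hypothetical choices.

```python
import numpy as np

def step_insulated(u, r):
    """One explicit finite-volume step with zero-flux (insulated) ends.

    r = alpha * dt / dx**2 must satisfy r <= 0.5 for stability.
    """
    diff = np.diff(u)        # u[i+1] - u[i] across each interior cell face
    du = np.zeros_like(u)
    du[:-1] += diff          # exchange of cell i with its right neighbor
    du[1:] -= diff           # equal and opposite change in cell i+1
    return u + r * du

# Hypothetical setup: unit rod, 100 cells, hot block in the middle.
n, alpha = 100, 1.0
dx = 1.0 / n
r = 0.4
dt = r * dx**2 / alpha
u = np.zeros(n)
u[40:60] = 1.0

total0, max0, min0 = u.sum() * dx, u.max(), u.min()
for _ in range(2000):
    u = step_insulated(u, r)

print(f"total heat: {total0:.6f} -> {u.sum() * dx:.6f}")
print(f"max: {max0:.3f} -> {u.max():.3f}, min: {min0:.3f} -> {u.min():.3f}")
```

Every exchange between two cells adds to one exactly what it removes from the other, so the total is preserved; and for r ≤ 0.5 each new value is a weighted average of old values, so no new maximum or minimum can appear.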

4. Irreversibility

The heat equation is not time-reversible. If you run time backward (t → -t), the equation becomes unstable. This is a manifestation of the Second Law of Thermodynamics: heat diffusion increases entropy.
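
A single Fourier mode makes the instability concrete. As a short worked check, the function below solves the heat equation for any wavenumber $k$:

```latex
u_k(x,t) = e^{-\alpha k^2 t}\,\sin(kx)
\quad\Longrightarrow\quad
\frac{\partial u_k}{\partial t} = -\alpha k^2\, u_k
= \alpha \frac{\partial^2 u_k}{\partial x^2}
```

Forward in time, each mode decays at rate $\alpha k^2$, so fine-scale features vanish fastest. Replacing $t$ by $-t$ turns the decay into growth $e^{+\alpha k^2 t}$, and because the growth rate is unbounded in $k$, an arbitrarily small high-frequency perturbation is amplified without limit.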


Connection to Machine Learning: Diffusion Models

One of the most exciting developments in AI is diffusion models, which power image generators like DALL-E, Stable Diffusion, and Midjourney. These models are directly connected to the heat equation!

🤖 Heat Equation ↔ Diffusion Models

See how the heat equation's forward process connects to generative AI:

  • Forward (Image → Noise): heat spreads
  • Reverse (Noise → Image): learned denoising

The Connection:

The forward process is closely related to the heat equation. In its simplest form, where noise is added without rescaling the image, the noise variance grows like σ² = 2αt, matching the heat kernel's spreading. AI models learn to reverse this diffusion.

How Diffusion Models Work

  1. Forward Process (Adding Noise): Starting from a clean image, progressively add Gaussian noise. The probability distribution over images then evolves by a diffusion equation closely related to the heat equation: the data distribution "diffuses" into noise.
  2. Train a Denoiser: A neural network learns to predict and remove the noise at each step.
  3. Reverse Process (Generation): Start from pure noise and iteratively denoise. The network guides the reverse diffusion, creating realistic images.

The Mathematical Connection

The forward diffusion is described by a stochastic differential equation:

$$dx = -\frac{\beta(t)}{2}x\,dt + \sqrt{\beta(t)}\,dW$$

The probability density $p(x,t)$ of the noised images satisfies:

$$\frac{\partial p}{\partial t} = \frac{\beta(t)}{2}\left(\nabla \cdot (xp) + \nabla^2 p\right)$$

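This is a Fokker-Planck equation: a heat equation in the space of images (the $\nabla^2 p$ term) plus a drift term $\nabla \cdot (xp)$ that pulls samples toward the origin. Without the drift, the noise variance would grow like $\sigma^2 = \int_0^t \beta(s)\,ds$, exactly as the heat kernel spreads. With the drift, the variance given a fixed starting image is $1 - \exp\left(-\int_0^t \beta(s)\,ds\right)$, which matches $\int_0^t \beta(s)\,ds$ at early times and saturates at 1.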
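
For the SDE above with a fixed starting point x(0), the conditional mean and variance have closed forms: the mean is x(0)·exp(-B/2) and the variance is 1 - exp(-B), where B is the integral of β from 0 to t. The sketch below tabulates them for an assumed linear schedule; the function name `vp_marginal` and the values `beta_min=0.1`, `beta_max=20.0` are illustrative assumptions, not parameters given in this section.

```python
import numpy as np

def vp_marginal(t, beta_min=0.1, beta_max=20.0):
    """Mean scale and variance of x(t) given x(0) for the SDE
    dx = -beta(t)/2 * x dt + sqrt(beta(t)) dW,
    with an assumed linear schedule beta(t) = beta_min + t*(beta_max - beta_min).
    """
    t = np.asarray(t, dtype=float)
    B = beta_min * t + 0.5 * (beta_max - beta_min) * t**2  # integral of beta
    mean_scale = np.exp(-0.5 * B)   # x(t) has mean mean_scale * x(0)
    variance = 1.0 - np.exp(-B)     # variance of the accumulated noise
    return mean_scale, variance, B

t = np.array([0.001, 0.01, 0.1, 0.5, 1.0])
mean_scale, variance, B = vp_marginal(t)
for ti, m, v, b in zip(t, mean_scale, variance, B):
    print(f"t={ti:<6} mean_scale={m:.4f} variance={v:.4f} integral_beta={b:.4f}")
```

At early times the variance tracks the integral of β, which is the heat-kernel-like regime; at later times the drift term caps it at 1 while the signal scale decays toward 0.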

Why This Matters for ML

Understanding the heat equation gives you deep insight into:

  • Why diffusion models work (the forward process has a known solution)
  • How to choose the noise schedule (β(t))
  • Why score matching is the right training objective
  • Connections to denoising autoencoders and energy-based models

Python Implementation

Solving with the Heat Kernel

Solving the Heat Equation via Convolution

🐍 heat_equation_solution.py

The Heat Kernel: The fundamental solution is a Gaussian that spreads with time. Its width is proportional to sqrt(t), which is characteristic of all diffusion processes.

Convolution Solution: The general solution is the convolution of the initial condition with the heat kernel. This is a powerful result: we can solve any initial value problem on the infinite line by integration!

Green's Function Approach: The heat kernel is the Green's function for the heat equation. It represents the response to a point source of heat at x=0, t=0.

Step Function Initial Condition: A step function represents an abrupt temperature change. The heat equation smooths this discontinuity into a smooth transition governed by the error function.

Analytical Solution: For a step function initial condition, the exact solution involves the error function: u(x,t) = (1/2)(1 - erf(x/(2*sqrt(alpha*t)))). This is a classic result!

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import erf

def heat_kernel(x, t, alpha=1.0):
    """
    The fundamental solution (heat kernel) of the 1D heat equation.

    G(x,t) = 1/sqrt(4*pi*alpha*t) * exp(-x^2 / (4*alpha*t))

    This is the response to a delta function initial condition.
    """
    x = np.asarray(x, dtype=float)
    if t <= 0:
        # Formal t -> 0 limit: a delta function (infinite spike at x = 0)
        return np.where(x == 0, np.inf, 0.0)

    prefactor = 1.0 / np.sqrt(4 * np.pi * alpha * t)
    exponent = -x**2 / (4 * alpha * t)
    return prefactor * np.exp(exponent)

def solve_heat_equation(initial_condition, x, t_values, alpha=1.0):
    """
    Solve the heat equation by convolution with the heat kernel.

    u(x,t) = integral G(x-y, t) * u(y, 0) dy

    Because the equation is linear with constant coefficients, the
    solution is the initial condition convolved with the fundamental
    solution (Green's function). The integral is approximated by a
    Riemann sum on the uniform grid x.
    """
    dx = x[1] - x[0]
    offsets = x[:, None] - x[None, :]  # matrix of x_i - y_j
    solutions = []

    for t in t_values:
        if t == 0:
            solutions.append(initial_condition.copy())
        else:
            # Convolution with heat kernel, evaluated for all x_i at once
            kernel = heat_kernel(offsets, t, alpha)
            solutions.append(kernel @ initial_condition * dx)

    return solutions

# Setup: the grid is wider than the plotted window [-5, 5] so that
# truncating the infinite integral at the grid edges does not distort
# the region being compared with the analytical solution
x = np.linspace(-20, 20, 2001)
alpha = 1.0  # Thermal diffusivity
t_values = [0, 0.1, 0.5, 1.0, 2.0]

# Initial condition: step function (hot rod on one side)
u0 = np.where(x < 0, 1.0, 0.0)

# Solve
solutions = solve_heat_equation(u0, x, t_values, alpha)

# Plot evolution
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
colors = plt.cm.viridis(np.linspace(0, 1, len(t_values)))
for u, t, c in zip(solutions, t_values, colors):
    plt.plot(x, u, color=c, linewidth=2, label=f't = {t}')
plt.xlim(-5, 5)
plt.xlabel('x')
plt.ylabel('u(x,t)')
plt.title('Heat Equation: Step Function Diffusing')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
# Compare with analytical solution (error function)
for t in [0.1, 0.5, 1.0]:
    numerical = solutions[t_values.index(t)]
    analytical = 0.5 * (1 - erf(x / (2 * np.sqrt(alpha * t))))
    plt.plot(x, numerical, 'b-', linewidth=2, alpha=0.7)
    plt.plot(x, analytical, 'r--', linewidth=1)
plt.xlim(-5, 5)
plt.title('Numerical (blue) vs Analytical (red dashed)')
plt.xlabel('x')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
```

Common Pitfalls

Confusing Flux Direction

Remember: q = -k(∂u/∂x). The negative sign means heat flows opposite to the temperature gradient, from hot to cold. Forgetting this sign leads to equations that predict heat flowing "uphill", from cold to hot!

Infinite vs. Finite Domains

The heat kernel formula applies to the infinite line. For finite domains (like a rod), you need boundary conditions and the solution involves Fourier series, not just convolution.

Dimensional Consistency

Always check units! The heat equation requires:

  • Heat equation: [∂u/∂t] = [α][∂²u/∂x²] → K/s = (m²/s)(K/m²) ✓
  • Fourier's Law: [q] = [k][∂u/∂x] → W/m² = (W/(m·K))(K/m) ✓

Numerical Stability

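When solving the heat equation numerically with an explicit finite-difference scheme (forward differences in time, central differences in space), the time step must satisfy the stability condition $\alpha \Delta t / \Delta x^2 \leq 0.5$, often loosely called a CFL condition. Violating it causes the numerical solution to blow up! We'll cover this in the section on finite difference methods.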
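
The stability limit can be seen directly with a minimal explicit scheme. In the sketch below, `r` stands for αΔt/Δx²; the function `ftcs_max_amplitude`, the grid size, the step count, and the fixed zero end values are assumptions made for the demonstration.

```python
import numpy as np

def ftcs_max_amplitude(r, n=51, steps=200):
    """Run the explicit (FTCS) scheme with fixed zero ends; return max |u|."""
    u = np.zeros(n)
    u[n // 2] = 1.0  # initial spike in the middle
    for _ in range(steps):
        # The right-hand side is fully evaluated before assignment,
        # so every interior point is updated from the old values.
        u[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return np.abs(u).max()

stable = ftcs_max_amplitude(r=0.4)
unstable = ftcs_max_amplitude(r=0.6)
print(f"r = 0.4: max |u| = {stable:.3e}")
print(f"r = 0.6: max |u| = {unstable:.3e}")
```

With r = 0.4 the spike decays as expected, while with r = 0.6 the shortest-wavelength mode is multiplied by a factor larger than 1 in magnitude at every step and the values grow by many orders of magnitude.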


Test Your Understanding

What physical principle is the foundation for deriving the heat equation?


Summary

We have derived the heat equation from first principles by combining energy conservation with Fourier's law. This parabolic PDE is the prototype for all diffusion phenomena.

Key Equations

| Equation | Name | Meaning |
| --- | --- | --- |
| ∂u/∂t = α ∂²u/∂x² | Heat Equation | Temperature change = Diffusivity × Curvature |
| q = -k ∂u/∂x | Fourier's Law | Heat flows down the temperature gradient |
| α = k/(ρcₚ) | Thermal Diffusivity | How fast heat spreads (m²/s) |
| G(x,t) = 1/√(4παt) exp(-x²/(4αt)) | Heat Kernel | Fundamental solution (Gaussian) |
| u = G * f | Solution Formula | Convolution of initial condition with kernel |

Key Takeaways

  1. The heat equation comes from energy conservation (no heat created or destroyed) plus Fourier's law (heat flows from hot to cold)
  2. The thermal diffusivity α = k/(ρcₚ) determines how fast heat spreads; higher α means faster diffusion
  3. The second spatial derivative ∂²u/∂x² measures curvature: points hotter than their neighbors cool down, and vice versa
  4. The heat kernel is a Gaussian with width growing as √t — the characteristic signature of diffusion
  5. General solutions are convolutions: each point of the initial condition spreads according to the heat kernel
  6. Diffusion models in AI are built on the same mathematics: the forward process is governed by a diffusion (Fokker-Planck) equation closely related to the heat equation, posed in the space of images
  7. The heat equation smooths out sharp features and is irreversible (entropy increases)
The Heat Equation in One Sentence:
"Temperature at each point evolves toward the average of its surroundings, at a rate proportional to how different it is."
Coming Next: In the next section, we'll solve the heat equation on a finite rod with boundary conditions. You'll see how Fourier series provide beautiful solutions that separate space and time dependencies.