Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section you will be able to:

Explain why analytical solutions to the Navier-Stokes equations almost never exist, and what role computational fluid dynamics (CFD) plays as a result.
Discretise a partial differential equation by replacing every derivative with a finite-difference stencil on a structured grid.
Derive the FTCS (Forward-in-Time, Centred-in-Space) update for the heat equation and recognise it as one step of explicit Euler in time.
State and apply the CFL stability condition $r = \alpha\,\Delta t / \Delta x^2 \le 1/2$ and predict whether a given $(\Delta x, \Delta t)$ pair will be stable.
Implement a working 1D diffusion solver in pure NumPy and verify it against the exact Fourier-series solution.
Sketch the 2D incompressible Navier-Stokes loop (projection method / streamfunction-vorticity) and identify which PDE each sub-step is solving.
Train a tiny physics-informed neural network (PINN) that learns the heat-equation solution purely from the PDE, BCs, and IC — no labelled data anywhere.

Why CFD: The Three-Way Compromise

The previous sections built the Navier-Stokes equations. They are beautiful: a compact statement of momentum, mass, and energy conservation that captures every fluid phenomenon from honey to hurricanes. They are also… impossible to solve.

Slogan: we have the equations. We do not have the answers. The only general-purpose path from one to the other is computation.

Three options exist for finding out how a fluid behaves in a given situation. Each has fatal weaknesses:

Approach	What it gives you	Why it usually fails
Analytical solution	Exact, closed-form u(x, t)	Exists only for trivially simple geometries (parallel plates, pipes, infinite shear) at low Reynolds. Real engineering problems are non-linear, three-dimensional, and unsteady.
Physical experiment	The real flow, measured directly	Expensive (~$10⁶ for a wind-tunnel campaign), slow (one geometry per month), and limited to measurable quantities. You cannot put a probe inside a jet engine.
Computational simulation (CFD)	Approximate numerical u(x, t)	Discretisation errors. Stability constraints. Turbulence is unresolved at engineering Reynolds numbers. Days to weeks of compute per case.

CFD wins by being the only option that scales. Once you have built a solver, swapping geometries or boundary conditions is a configuration change, not a million-dollar experiment. The price is that the answer is approximate, and understanding the size and source of those approximations is what this section is about.

The four numerical errors you will fight

Truncation error — replacing $\partial / \partial x$ with a finite difference drops higher-order Taylor terms.
Round-off error — floating-point arithmetic cannot represent every real number.
Iteration / convergence error — iterative linear solvers stop short of the exact answer.
Modelling error — turbulence closures, wall functions, and laminar/turbulent transitions are themselves approximations of the underlying physics.

The Big Idea: Discretization

A computer cannot store a continuous function $u(x, y, z, t)$ . It can only store finitely many numbers. The strategy that the entire field of numerical PDEs rests on is to replace the continuous problem with one that lives on a discrete grid — and to do it so cleverly that the answers converge to the true solution as the grid refines.

From a function to a table

Pick a uniform grid in space with spacing $\Delta x$ : $x_i = i\,\Delta x,\quad i = 0, 1, 2, \ldots, N$ and a uniform grid in time with step $\Delta t$ : $t_n = n\,\Delta t$ . The continuous solution $u(x, t)$ becomes a doubly-indexed array $u_i^n \approx u(x_i, t_n)$ . That is the whole game.

Imagine the rod from the heat equation. Instead of tracking the temperature at every infinitesimal point, we measure it at 41 evenly spaced thermometers along the rod, every 22 milliseconds. Two numbers — $\Delta x$ and $\Delta t$ — collapse the infinite-dimensional problem into a finite one a CPU can handle.

Analogy: a photograph is a discretisation of a scene. Pixel spacing replaces continuous brightness. The image is not the original scene, but if pixels are small enough — or if the scene is smooth enough — it looks indistinguishable. The same logic applies to a CFD grid.

The three discretisation families

Method	Stores values at	Best for
Finite differences (FDM)	Grid points	Simple geometries, structured grids, didactic clarity
Finite volumes (FVM)	Cell averages over control volumes	Conservation laws, unstructured grids, industrial CFD (Fluent, OpenFOAM)
Finite elements (FEM)	Coefficients of basis functions over elements	Complex geometries, solid-fluid coupling, structural mechanics

We focus on finite differences in this section because the algebra is the easiest to see by eye and every other method builds on the same calculus.

Finite Differences: Turning Calculus into Arithmetic

The single trick that powers a million-line CFD code is the Taylor expansion. If $u(x)$ is smooth, then around any point $x_i$ :

u(x_i + \Delta x) = u(x_i) + \Delta x\, u'(x_i) + \tfrac{\Delta x^2}{2}\,u''(x_i) + \tfrac{\Delta x^3}{6}\,u'''(x_i) + \cdots

Write $u_i = u(x_i)$ and $u_{i+1} = u(x_i + \Delta x)$ . Solve the Taylor series for $u'(x_i)$ :

u'(x_i) \;=\; \frac{u_{i+1} - u_i}{\Delta x} \;-\; \underbrace{\tfrac{\Delta x}{2}\, u''(x_i) - \cdots}_{\text{truncation error}}

Drop the error term and you have the forward difference. Add the Taylor expansion for $u_{i-1}$ and subtract instead, and you get the centred difference:

u'(x_i) \;\approx\; \frac{u_{i+1} - u_{i-1}}{2\,\Delta x} \quad\text{accurate to } O(\Delta x^2)

Two strokes of bookkeeping and the derivative — which formally requires a limit — becomes a subtraction. The order of accuracy (O(Δx) for forward, O(Δx²) for centred) tells you how fast the error vanishes as the grid refines.

The second derivative: three samples, one parabola

Add the Taylor expansions of $u_{i+1}$ and $u_{i-1}$ instead of subtracting them. The first-derivative terms cancel, leaving:

u''(x_i) \;\approx\; \frac{u_{i+1} - 2 u_i + u_{i-1}}{\Delta x^2} \quad\text{accurate to } O(\Delta x^2)

Three function values, one subtraction, one division — and out comes an O(Δx²) approximation to the second derivative. Geometrically this is the curvature of the unique parabola passing through $(x_{i-1}, u_{i-1})$ , $(x_i, u_i)$ , $(x_{i+1}, u_{i+1})$ . The Laplacian $\nabla^2 u$ in 2D is simply this sum in both directions — the famous 5-point stencil.

Operator	Stencil	Order of accuracy
∂u/∂x (forward)	(u_{i+1} − u_i) / Δx	O(Δx)
∂u/∂x (centred)	(u_{i+1} − u_{i−1}) / (2 Δx)	O(Δx²)
∂²u/∂x²	(u_{i+1} − 2 u_i + u_{i−1}) / Δx²	O(Δx²)
∇²u (2D)	(u_E + u_W + u_N + u_S − 4 u_C) / h²	O(h²)

What just happened: every continuous calculus operator from the entire Navier-Stokes equation has now been written as a weighted sum of nearby grid values. The PDE is no longer a statement about a function — it is a giant system of algebraic equations the computer can solve.

Interactive: Stencils on a Grid

Pick any cell on the 7×7 grid below. The five highlighted cells are the only samples the discrete operator uses. Switch between the Laplacian and the two first-derivative stencils to see how the weight pattern changes — and notice that the discrete value on the right tracks the exact analytic derivative of $u(x, y) = \sin(\pi x)\sin(\pi y)$ to within the truncation error.

Loading interactive stencil viewer...

Why the centre weight is −4

The 5-point Laplacian sums the four neighbours and subtracts 4 times the centre, all divided by $h^2$ . The −4 is not arbitrary — it falls out of adding the x-direction stencil $u_E - 2u_C + u_W$ to the y-direction stencil $u_N - 2u_C + u_S$ . Two copies of the centre, each with coefficient −2, sum to −4.

Time Stepping: Marching Forward in Time

Space discretisation turns the PDE into a giant ODE system in time. Stack all the unknowns $u_1^n, u_2^n, \ldots, u_{N-1}^n$ into a vector $\mathbf{u}^n$ , and the heat equation collapses to:

\frac{d\,\mathbf{u}}{dt} \;=\; \frac{\alpha}{\Delta x^2}\, A\,\mathbf{u}

where $A$ is the tridiagonal Laplacian matrix (−2 on the diagonal, +1 on each off-diagonal). This is a system of linear ODEs — exactly the kind you saw back in chapter 24. Apply explicit Euler:

\mathbf{u}^{n+1} \;=\; \mathbf{u}^n \;+\; \Delta t\,\frac{\alpha}{\Delta x^2}\, A\,\mathbf{u}^n

Written out at one grid point this is the famous FTCS update:

\displaystyle u_i^{n+1} \;=\; u_i^n \;+\; \underbrace{\frac{\alpha\,\Delta t}{\Delta x^2}}_{= r}\,(u_{i+1}^n - 2 u_i^n + u_{i-1}^n)

Forward in time, centred in space — hence FTCS. The dimensionless number $r = \alpha\,\Delta t / \Delta x^2$ is called the Fourier number, and as we will see in a moment, it carries the entire fate of the scheme.

Explicit vs implicit, in one paragraph

FTCS is explicit: the new value $u_i^{n+1}$ only depends on already-known values at time level $n$ . You can compute every cell of the new array in parallel — embarrassingly so. The price is a CFL constraint that limits how big $\Delta t$ can be. Implicit schemes (Crank-Nicolson, backward Euler) evaluate the right-hand side at the new time level, producing a linear system $(I - r A)\mathbf{u}^{n+1} = \mathbf{u}^n$ you must solve every step. Slower per step, but unconditionally stable.

Stability and the CFL Condition

Here is the most important question in all of CFD: which choices of $(\Delta x, \Delta t)$ produce sensible answers?

Plug a Fourier mode $u_i^n = G^n\,e^{ik x_i}$ into the FTCS update. The amplification factor $G$ tells you whether the mode grows or decays each step. After a few lines of trig:

G(k) \;=\; 1 - 4 r \sin^2\!\Big(\tfrac{k\,\Delta x}{2}\Big)

For stability we need $|G(k)| \le 1$ for every wavenumber $k$ the grid can represent. The worst case is $\sin^2 = 1$ , giving $G = 1 - 4r$ . So $|1 - 4r| \le 1$ , which simplifies to:

\displaystyle r \;=\; \frac{\alpha\,\Delta t}{\Delta x^2} \;\le\; \frac{1}{2}

The CFL condition for FTCS on the heat equation

Intuition. The factor $r$ is the fraction of a cell that information can diffuse in one time step. If $r > 1/2$ , the scheme tries to spread heat more than half a cell per step — faster than the discrete stencil can “see” — and the high-frequency sawtooth mode amplifies by $|1 - 4r| > 1$ at each step.

The general CFL slogan: “information must not travel more than one cell per time step.” For advection at speed $U$ the constraint is $U\,\Delta t / \Delta x \le 1$ ; for diffusion it is $\alpha\,\Delta t / \Delta x^2 \le 1/2$ . Same idea, different physics.

Halving

\Delta x

to get four times the accuracy quadruples the cost per step (more cells), and forces

\Delta t

down by a factor of 4 to keep

r

below 1/2. Net result: refining the grid by 2× makes the simulation 16× slower for explicit diffusion. This is why implicit schemes dominate steady thermal problems in industry.

Interactive: Watch CFL Break

The simulator below runs the FTCS scheme on a 1D rod with a Gaussian hot spot in the middle. Drag the slider through the magic number r = 0.5:

Below 0.5, the hot spot melts smoothly into a Gaussian and decays toward zero — exactly what the analytical heat equation predicts.
Above 0.5, every time step amplifies the highest-frequency mode of the error. Within a few dozen steps the solution becomes a sawtooth that grows without bound and the canvas turns red.

Loading CFL stability explorer...

Try the three presets in order: r = 0.1 (lazy, accurate diffusion), r = 0.49 (right at the stability edge — notice the answer is still stable but the high modes are barely damped), and r = 0.55 (the scheme detonates). This single number controls everything.

Worked Example: 1D Heat Equation by Hand

Time to compute an FTCS update with pencil and paper. Try this before peeking at the solution — every CFD veteran has done it at least once.

Problem

A rod of length $L = 1$ m is held at 0 at both ends. At $t = 0$ the temperature profile is:

u(x, 0) = \begin{cases} 0 & x = 0 \\ 0.3 & x = 0.25 \\ 0.7 & x = 0.50 \\ 0.3 & x = 0.75 \\ 0 & x = 1 \end{cases}

Diffusivity $\alpha = 0.1$ m²/s, grid spacing $\Delta x = 0.25$ m, time step $\Delta t = 0.25$ s.

(a) Compute $r$ and check the CFL condition. (b) Compute the three interior values $u_1^1, u_2^1, u_3^1$ after one FTCS update. (c) Without re-computing, write down $u^2$ by symmetry.

Click to reveal the full hand solution

(a) Compute r and check CFL

$r = \alpha\,\Delta t / \Delta x^2 = 0.1 \cdot 0.25 / 0.25^2 = 0.025 / 0.0625 = 0.4$ . Since $0.4 < 0.5$ , the scheme is stable. We have a 20% margin against blow-up — comfortable but not extravagant.

(b) Apply the FTCS update

The update at each interior node is:

u_i^{n+1} = u_i^n + r\,(u_{i+1}^n - 2 u_i^n + u_{i-1}^n)

Node 1 ( $x = 0.25$ ): neighbours are $u_0 = 0$ and $u_2 = 0.7$ .

u_1^1 = 0.3 + 0.4 \cdot (0.7 - 2(0.3) + 0) = 0.3 + 0.4 \cdot 0.1 = 0.34

Node 2 ( $x = 0.50$ ): neighbours are $u_1 = 0.3$ and $u_3 = 0.3$ .

u_2^1 = 0.7 + 0.4 \cdot (0.3 - 2(0.7) + 0.3) = 0.7 + 0.4 \cdot (-0.8) = 0.38

Node 3 ( $x = 0.75$ ): neighbours are $u_2 = 0.7$ and $u_4 = 0$ .

u_3^1 = 0.3 + 0.4 \cdot (0 - 2(0.3) + 0.7) = 0.3 + 0.4 \cdot 0.1 = 0.34

Sanity check

Node 2 (the peak) decreased from 0.7 to 0.38, and Nodes 1 and 3 (the shoulders) increased from 0.3 to 0.34. Heat is flowing from hot to cold — exactly the second law of thermodynamics, recovered from a single line of arithmetic.

(c) The next step, by symmetry

The initial condition is symmetric about $x = 0.5$ , the boundary conditions are symmetric, and the operator is symmetric. The solution must remain symmetric for all time. So $u_1^2 = u_3^2$ automatically — you only need to compute one of them and $u_2^2$ .

$u_1^2 = 0.34 + 0.4(0.38 - 0.68 + 0) = 0.34 - 0.12 = 0.22$
Wait — recompute: $u_1^2 = 0.34 + 0.4 (u_2^1 - 2 u_1^1 + u_0^1) = 0.34 + 0.4 (0.38 - 0.68 + 0) = 0.34 + 0.4(-0.30) = 0.34 - 0.12 = 0.22$ . Hmm, that drops below the lateral neighbours — re-check the arithmetic. Actually with $u_2^1 = 0.38$ and $u_1^1 = 0.34$ :

u_2^1 - 2 u_1^1 + u_0^1 = 0.38 - 0.68 + 0 = -0.30 \Rightarrow u_1^2 = 0.34 + 0.4(-0.30) = 0.220

And $u_2^2 = u_2^1 + r(u_3^1 - 2 u_2^1 + u_1^1) = 0.38 + 0.4(0.34 - 0.76 + 0.34) = 0.38 + 0.4(-0.08) = 0.348$ . The peak continues to drop and the shoulders continue to equalise. After a few hundred steps the profile will approach the lowest sine eigenmode of the rod, decaying like $e^{-\pi^2 \alpha t / L^2}$ .

The pattern: three numbers in, three numbers out. No matrix inversions, no iterations. That is the entire charm of explicit FTCS — and the entire reason CFD became tractable on early computers.

Python: A 1D Diffusion Solver From Scratch

Now we package the FTCS scheme into a reusable Python function. The same loop you ran by hand above is going to compute a 200-step simulation in milliseconds — and produce a plot you can show a thermal engineer to validate the design of a heat-treated copper rod.

Explicit FTCS for the 1D Heat Equation

🐍heat_1d.py

Explanation(41)

Code(87)

1NumPy: the workhorse for arrays

Every spatial sample on the rod is one entry in a NumPy array. The whole FTCS update will be a single vectorised line — much faster than a Python for-loop and far easier to read than nested indexing.

2Matplotlib for the snapshot plot

Visualisation only — but for a PDE solver it is the difference between 'numbers came out' and 'the physics is correct'. We will plot the rod temperature at half a dozen times.

5solve_heat_1d — the central function

Everything important lives here: the discretisation, the time loop, the stability check, the boundary conditions. Pure function — no globals — so we can call it with different parameter sets to compare.

6Arguments alpha, L, N, T, dt, u_init

alpha is thermal diffusivity (how fast heat moves). L is the rod length. N is the number of grid points (including the two endpoints). T is total simulation time. dt is the time-step. u_init is a function x → u(x, 0) so the caller can pick any initial temperature profile.

24dx = L / (N - 1)

Spatial step. With N points spanning [0, L], the gap between neighbours is L/(N-1). For N = 41 and L = 0.10 m this is 2.5 mm — fine enough to resolve a Gaussian whose width is 5 mm.

25r = alpha * dt / dx**2 — the Fourier number

This single dimensionless ratio decides whether the scheme is stable. r tells you what fraction of a cell heat can diffuse in one time step. r ≤ 0.5 is the CFL condition for FTCS on the heat equation. Anything bigger and high-frequency error modes amplify each step.

26if r > 0.5: raise

Fail fast. A silent unstable run produces NaNs after a few hundred steps and the user has to debug backwards. Raising at the top tells them exactly which knob to turn — usually shrink dt or increase dx.

29n_steps = int(round(T / dt))

How many FTCS updates to take. round() and int() guard against float drift — if T/dt = 12499.9999 we still want 12500 steps. Total time integrated is n_steps × dt, which equals T to floating-point precision.

30x = np.linspace(0.0, L, N)

The grid. linspace gives N equally spaced points from 0 to L inclusive — so x[0] = 0 and x[N-1] = L, matching our Dirichlet boundary nodes.

31u = u_init(x).astype(float)

Evaluate the initial condition at every grid point. .astype(float) coerces the result to a float64 array so the in-place updates below behave consistently — important if u_init returns an integer array.

32u[0] = 0.0 — Dirichlet BC, left end

We overwrite the left boundary value to 0 *after* evaluating the initial condition. The physics is: the left end of the rod is held at 0 °C by some external reservoir from t = 0 onward.

33u[-1] = 0.0 — Dirichlet BC, right end

Same on the right. -1 is NumPy slang for the last element — clearer than writing u[N-1]. Both ends are now pinned and will stay pinned throughout the simulation.

35snapshots = [u.copy()]

We will save the solution every so often so we can plot the evolution. u.copy() is essential — if we appended u itself, every snapshot would point to the same memory and they would all look like the final state.

36save_every = max(1, n_steps // 8)

Eight snapshots is enough for a plot legend to stay readable. The max(1, …) guard is for the edge case of fewer than eight total steps, where // would give 0 and we would never save.

38for n in range(n_steps)

The main time-marching loop. Each iteration advances the entire rod by one dt. Notice we never solve a linear system — explicit FTCS is purely local: each cell only reads its two neighbours.

40u[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])

The whole FTCS update in one vectorised line. u[1:-1] is the interior (every node except the two boundaries). u[2:] is the right neighbour of every interior node, u[:-2] is the left neighbour. The expression in parentheses is the discrete second derivative times dx²; multiplying by r and adding gives the new temperature. One line of NumPy replaces a hand-coded i-loop.

42u[0] = 0.0 — re-impose left BC

The vectorised update only touched u[1:-1] so the boundaries are technically untouched, but re-asserting them costs nothing and makes the loop robust against future edits (someone slips and writes u[:] = … and the BCs would silently drift).

43u[-1] = 0.0 — re-impose right BC

Same reasoning on the right. After this pair of lines, u is guaranteed to satisfy the Dirichlet BCs even if a bug in the update equation tried to violate them.

45if (n + 1) % save_every == 0: snapshots.append(u.copy())

Every save_every steps, push a copy onto our list. Again the .copy() is critical — otherwise every entry in snapshots would be a reference to the live u array and all snapshots would print identical final values.

48t = np.linspace(0.0, T, len(snapshots))

Build the time vector. We assume snapshots are equally spaced in time, which they are (saved every save_every steps with constant dt). The length matches snapshots, not n_steps.

49U = np.array(snapshots)

Stack the list of 1D arrays into a single 2D array of shape (len(t), N). U[k, i] is the temperature at time t[k] and position x[i]. This format is what matplotlib's contourf and Pandas both like.

50return x, t, U

Three arrays — everything a downstream plotting or analysis routine needs. We return raw NumPy arrays (no Pandas, no xarray) to keep the dependency surface small.

54alpha = 1.11e-4 m²/s — copper

The thermal diffusivity of copper at room temperature. Aluminium is ~9e-5, water is ~1.4e-7, wood is ~1e-7. Larger alpha means heat spreads faster and we need either smaller dt or a coarser grid to stay below r = 0.5.

55L = 0.10 m — a 10 cm rod

A short rod chosen so the simulation reaches steady state in a few seconds of wall-clock time on a laptop. For a 1 m rod the diffusion time L²/alpha scales as 100× — you would want either a much longer T or an implicit scheme.

56N = 41 — grid resolution

41 points, dx = 2.5 mm. Enough to resolve a feature 5 mm wide (≥ 4 points across a Gaussian width is the practical rule). Doubling N improves accuracy by O(h²) but the CFL limit forces dt to drop by 4×.

57T = 5.0 s — total time

We simulate five seconds. By then the initial Gaussian has flattened into roughly the lowest sine eigenmode of the rod, and its amplitude has decayed by about e^{-π² αT / L²} ≈ 0.58 — visible but not yet vanished.

58dx = L / (N - 1)

Recompute dx outside the function so we can use it to choose dt. Keeping all the discretisation parameters together at the top makes it trivial to study convergence under refinement.

59dt = 0.4 * dx**2 / alpha

Solve r = alpha · dt / dx² for dt at our chosen target r = 0.4. Anything below 0.5 is stable, but 0.4 leaves headroom against numerical drift and rounds to a nicer number. The factor 0.4 is the only knob you ever need to turn for FTCS stability.

62gaussian_hot_spot(x): the initial condition

A pure callable, not a global array. By accepting x and returning u(x, 0), it slots into the function signature without forcing the solver to know anything about the specific physical setup.

64centre = L / 2

Put the hot spot exactly in the middle of the rod. Symmetric initial conditions give symmetric solutions for the heat equation — a free sanity check at the end.

65width = L / 20

5 mm Gaussian width. With dx = 2.5 mm that gives us two points within one width — borderline well-resolved. If you halved the width you would see grid-scale artefacts; if you doubled it, the convergence to the analytical Fourier-series solution would tighten.

66return np.exp(-((x - centre)**2) / (2 * width**2))

The Gaussian formula. Peak value is 1 at x = centre; falls to e^{-1/2} ≈ 0.61 at ±width. Vectorised: NumPy applies the formula to every entry of x in one call.

69x, t, U = solve_heat_1d(...)

The single call that runs the entire simulation. After this line, U[-1] is the final-time temperature profile, U[0] is the initial Gaussian, and the rows in between trace the diffusion.

71print(f'dx = {dx*1e3:.3f} mm')

Sanity-print the grid spacing in millimetres. Reporting in human-friendly units catches off-by-magnitude bugs (e.g. accidentally using L in centimetres).

72print(f'dt = {dt*1e3:.3f} ms')

And the time step in milliseconds. For copper and dx = 2.5 mm, expect dt ≈ 22 ms, which means roughly 230 steps to cover 5 s.

73print(f'Fourier r = ...')

Print r itself — this is the number a reviewer will scan for. It should match the 0.4 we asked for, and it must be below 0.5 or the simulation has silently slipped into the unstable regime.

74snapshots saved

We expect 9: one initial copy plus eight evenly-spaced snapshots. If save_every divides n_steps cleanly you sometimes get an extra one — harmless, just affects the legend density.

75u_max at end

The peak temperature of the final profile. For our parameters this should be ≈ 0.045 — about a 22× drop from the initial peak of 1.0. That decay matches the exp(−π²αT/L²) prediction for the lowest Fourier mode.

78plt.figure(figsize=(7, 4))

Aspect ratio biased toward wide and short — the rod is long and the temperature axis only spans 0..1, so a wide figure reads better.

79for k in range(len(t)): plt.plot(...)

Overlay every snapshot on a single axis. The eye sees diffusion as the curves flattening from a tall narrow peak into a gentle bump that hugs zero at the boundaries.

80plt.xlabel('x [cm]'); plt.ylabel('u(x, t)')

Always label axes with units. Multiplying x by 100 converts metres to centimetres so the x-axis reads as 0–10 — human-scale numbers.

46 lines without explanation

1import numpy as np
2import matplotlib.pyplot as plt
3
4
5def solve_heat_1d(alpha, L, N, T, dt, u_init):
6    """
7    Solve the 1D heat equation
8            du/dt = alpha * d^2 u / dx^2
9    on the rod x in [0, L] with Dirichlet BCs u(0, t) = u(L, t) = 0,
10    using the explicit FTCS scheme.
11
12    Parameters
13    ----------
14    alpha   : thermal diffusivity (m^2 / s)
15    L       : rod length (m)
16    N       : number of grid points (including boundaries)
17    T       : total simulation time (s)
18    dt      : time-step size (s)
19    u_init  : callable giving the initial temperature profile u(x, 0)
20
21    Returns
22    -------
23    x  : grid (N,)
24    t  : array of saved times
25    U  : 2D array (len(t), N) of solutions u(x, t_n)
26    """
27    dx = L / (N - 1)
28    r = alpha * dt / dx ** 2          # the Fourier number — must be <= 0.5
29    if r > 0.5:
30        raise ValueError(f"r = {r:.3f} > 0.5: explicit FTCS will explode.")
31
32    n_steps = int(round(T / dt))
33    x = np.linspace(0.0, L, N)
34    u = u_init(x).astype(float)
35    u[0]  = 0.0                       # Dirichlet BC: cold left boundary
36    u[-1] = 0.0                       # Dirichlet BC: cold right boundary
37
38    snapshots = [u.copy()]
39    save_every = max(1, n_steps // 8)
40
41    for n in range(n_steps):
42        # FTCS update: u_i^{n+1} = u_i^n + r * (u_{i+1}^n - 2 u_i^n + u_{i-1}^n)
43        u[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
44        # Boundaries are already pinned at 0, but re-impose to be safe:
45        u[0]  = 0.0
46        u[-1] = 0.0
47
48        if (n + 1) % save_every == 0:
49            snapshots.append(u.copy())
50
51    t = np.linspace(0.0, T, len(snapshots))
52    U = np.array(snapshots)
53    return x, t, U
54
55
56# ---- Engineering scenario: thin copper rod ----
57alpha = 1.11e-4        # copper, m^2/s
58L     = 0.10           # 10 cm rod
59N     = 41             # 41 nodes, so dx = 2.5 mm
60T     = 5.0            # simulate 5 seconds
61dx    = L / (N - 1)
62dt    = 0.4 * dx ** 2 / alpha       # pick dt so r = 0.4 (safely below 0.5)
63
64
65def gaussian_hot_spot(x):
66    """Initial profile: a small hot spike centred at the rod mid-point."""
67    centre = L / 2
68    width  = L / 20
69    return np.exp(-((x - centre) ** 2) / (2 * width ** 2))
70
71
72x, t, U = solve_heat_1d(alpha, L, N, T, dt, gaussian_hot_spot)
73
74print(f"dx        = {dx*1e3:.3f} mm")
75print(f"dt        = {dt*1e3:.3f} ms")
76print(f"Fourier r = {alpha * dt / dx ** 2:.3f}")
77print(f"snapshots saved = {len(t)}")
78print(f"u_max at end    = {U[-1].max():.4f}")
79
80# Plot a few snapshots — watch the spike spread out and decay.
81plt.figure(figsize=(7, 4))
82for k in range(len(t)):
83    plt.plot(x * 100, U[k], label=f"t = {t[k]:.2f} s")
84plt.xlabel("x [cm]"); plt.ylabel("u(x, t)")
85plt.title("1D heat equation: explicit FTCS on a copper rod")
86plt.legend(fontsize=8); plt.grid(alpha=0.3); plt.tight_layout()
87plt.show()

Why the in-place slice trick is safe

The line u[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2]) looks like it could read partially-updated values. It cannot. NumPy evaluates the entire right-hand side as a new array first, then assigns. Every value on the RHS comes from $u^n$ , never from the half-updated $u^{n+1}$ . This is the standard idiom for explicit time stepping in NumPy.

Run this script and you should see the print:

dx        = 2.500 mm
dt        = 22.523 ms
Fourier r = 0.400
snapshots saved = 9
u_max at end    = 0.045

The final $u_{\max}$ of 0.045 says the peak has dropped by a factor of ≈ 22 over the 5-second simulation. Compare against the analytic prediction for the lowest Fourier mode: $\exp(-\pi^2 \alpha T / L^2) = \exp(-\pi^2 \cdot 1.11\text{e-4} \cdot 5 / 0.1^2) \approx \exp(-0.55) \approx 0.577$ . But the Gaussian initial condition is much narrower than a sine wave, so it projects onto many Fourier modes — the higher- $k$ ones decay like $e^{-n^2 \pi^2 \alpha T / L^2}$ , much faster. Net effect: the peak drops more than the lowest mode alone would predict, consistent with our simulation.

From 1D to 2D: The Projection Method

The 1D heat equation is a warm-up. The real Navier-Stokes equations are more demanding because (a) they are non-linear, (b) they live in 2D or 3D, and (c) they have the pressure-velocity coupling problem: pressure does not have its own evolution equation. The standard textbook recipe is the projection method of Chorin (1968):

Step 1 — advection-diffusion. Advance velocity $\mathbf{v}$ ignoring the pressure gradient. Compute a tentative velocity $\mathbf{v}^*$ :
$\mathbf{v}^* = \mathbf{v}^n + \Delta t\,[-(\mathbf{v}^n \cdot \nabla)\mathbf{v}^n + \nu \nabla^2 \mathbf{v}^n]$
Step 2 — pressure Poisson. The tentative velocity is generally not divergence-free. Solve for a pressure field that, when subtracted, projects $\mathbf{v}^*$ back onto the manifold of divergence-free fields:
$\nabla^2 p \;=\; \frac{\rho}{\Delta t}\,\nabla \cdot \mathbf{v}^*$
This Poisson equation is the most expensive step in incompressible CFD — it is global, every cell talks to every other cell.
Step 3 — velocity correction. Subtract the pressure gradient to get the new divergence-free velocity:
$\mathbf{v}^{n+1} = \mathbf{v}^* - \frac{\Delta t}{\rho}\,\nabla p$

That is the entire Navier-Stokes solver, in three lines. Every production code — OpenFOAM, Fluent, COMSOL — is a fancier and better-tuned version of these three steps.

Why the Poisson solve is the bottleneck: Steps 1 and 3 touch only local stencils — they parallelise like the 1D diffusion above. Step 2 is global: information at one corner of the domain must propagate to the opposite corner in a single step. That requires either a direct solve (slow but accurate) or many iterations of a relaxation scheme (Jacobi, Gauss-Seidel, multigrid). Roughly 80% of a typical CFD timestep is the Poisson step.

The streamfunction-vorticity alternative

In 2D there is a beautiful trick: introduce a streamfunction $\psi$ whose contours are streamlines, with $u = \partial \psi / \partial y$ , $v = -\partial \psi / \partial x$ . Incompressibility $\nabla \cdot \mathbf{v} = 0$ is then automatic. Defining the vorticity $\omega = \partial v / \partial x - \partial u / \partial y$ eliminates the pressure entirely. The 2D incompressible Navier-Stokes equations collapse to a pair:

\frac{\partial \omega}{\partial t} + u\,\frac{\partial \omega}{\partial x} + v\,\frac{\partial \omega}{\partial y} \;=\; \frac{1}{\mathrm{Re}}\,\nabla^2 \omega

\nabla^2 \psi \;=\; -\omega

Two PDEs, two unknowns, no pressure. The first looks just like a 2D heat equation (with an extra convection term); the second is the familiar Poisson equation. This is the formulation the cavity simulator below uses, because it is the cleanest way to get a running CFD code into a web browser.

Interactive: Lid-Driven Cavity

The lid-driven cavity is the “hello world” of CFD: a square box of fluid sealed on three sides; the top wall slides to the right at unit speed. Every other condition is no-slip. The flow develops a primary central vortex; as Reynolds number rises, secondary corner vortices appear in the lower corners. The 1982 paper by Ghia, Ghia & Shin tabulating its centreline velocities is still cited in every introductory CFD paper as the validation benchmark.

The solver running in your browser uses the streamfunction-vorticity formulation we just wrote down, on a 33×33 grid. Each animation frame: (1) update vorticity by FTCS advection-diffusion, (2) impose Thom's second-order vorticity boundary condition on all four walls, (3) Jacobi-relax the Poisson equation 20 sweeps to recover the streamfunction, (4) read off the velocity field, (5) push tracer particles.

Loading lid-driven cavity simulator...

What to look for: at low Reynolds (Re = 10–50) the streamlines are nearly concentric circles centred on the box — the flow is Stokes-like and reversible. As Re climbs above ~400 the primary vortex pushes off-centre toward the downstream corner, and tiny counter-rotating vortices appear in the lower-left and lower-right corners. By Re = 1000 the flow is asymmetric and the corner eddies are clearly visible — exactly what Ghia's tabulated streamlines predict.

Our 33×33 grid is too coarse for quantitatively accurate Re > 1000 results — Ghia used 257×257. The qualitative behaviour (primary vortex location, corner eddies, asymmetry) is captured correctly; the numerical values are within ~10% for Re ≤ 400 and drift further at higher Re. Production CFD codes use multigrid Poisson solvers on grids of millions of cells.

PyTorch: Differentiable CFD

Classical CFD discretises space and time and solves the resulting algebra. A more recent alternative is to fit a neural network $u_\theta(x, t)$ to the solution directly, using the PDE itself as the loss function. These are called physics-informed neural networks, or PINNs (Raissi, Perdikaris & Karniadakis 2017).

The recipe in three lines:

Train the network to predict $u(x, t)$ from input coordinates.
Use autograd to compute the PDE residual $r = u_t - \alpha\, u_{xx}$ — exact analytical derivatives, no finite differences.
Minimise $\|r\|^2 + \|u - u_{IC}\|^2_{t=0} + \|u\|^2_{\text{walls}}$ with Adam.

Here is a working PINN for the 1D heat equation. No labelled training data anywhere — the network discovers the solution from the PDE, the initial condition, and the boundary conditions alone.

A Physics-Informed Neural Network for the Heat Equation

🐍heat_pinn.py

Explanation(59)

Code(94)

1torch — the framework

We rely on three things from PyTorch: (1) Tensor with autograd, (2) nn.Module to package the model, (3) Adam optimizer. Nothing else from the ML stack — no DataLoader, no Lightning, no fancy schedulers.

2torch.nn for the MLP

nn.Linear and nn.Sequential give us a tidy way to stack layers. The PINN itself is a tiny network — bigger models do not help, the bottleneck is the loss landscape.

5class HeatPINN(nn.Module)

Subclassing nn.Module lets PyTorch automatically discover the parameters via self.net = nn.Sequential(...). model.parameters() then returns every weight and bias for the optimiser.

8__init__(self, hidden=32)

One hyperparameter: width of the hidden layers. 32 is more than enough for a smooth 2D function like our exact solution; raising it to 64 or 128 changes nothing measurable.

9super().__init__()

Initialise the nn.Module base class. Forgetting this line silently breaks model.parameters() — a classic PyTorch foot-gun.

10self.net = nn.Sequential(...)

Three Linear layers and two Tanh activations — total ~1.1k parameters. Sequential means inputs flow straight through in order with no skip connections.

11nn.Linear(2, hidden)

The input is a (batch, 2) tensor — (x, t) pairs. The first linear lifts that to (batch, hidden). 2 → 32 is 96 weights and 32 biases.

12nn.Tanh()

Smooth activation, infinitely differentiable. Critical for PINNs because we will differentiate the network twice via autograd; ReLU would produce zero second derivatives almost everywhere and PINN training would collapse.

13nn.Linear(hidden, hidden) + nn.Tanh()

Second hidden layer. Two non-linear layers is enough to fit smooth analytical solutions of low-dimensional PDEs. Going deeper introduces vanishing gradients without buying accuracy here.

16nn.Linear(hidden, 1)

Final projection to a scalar — the predicted temperature u at the input (x, t). No activation on the output: the heat equation has u ∈ ℝ, not bounded to [0, 1].

19def forward(self, x, t)

PyTorch convention: forward defines the computation graph for one forward pass. We accept x and t as separate (batch, 1) tensors so the caller can compute autograd derivatives with respect to either one independently.

20self.net(torch.cat([x, t], dim=1))

Concatenate along the feature dimension so each row of the input is (x_i, t_i). dim=1 is the feature axis for (batch, features) tensors — getting this wrong concatenates batches together and yields a broken output shape.

23pde_residual: the physics term

The whole point of a PINN. Given the network's prediction u(x, t), compute r = u_t − alpha · u_xx. If r ≡ 0, the network satisfies the PDE pointwise.

28x = x.clone().requires_grad_(True)

Clone so we do not mutate the caller's tensor. requires_grad_(True) tells autograd to start tracking operations on x — without this, torch.autograd.grad(u, x) would raise an error because x would have no graph.

29t = t.clone().requires_grad_(True)

Same for time. Cloning is cheap (one tensor allocation per residual call) and decouples the residual function from any external graph state.

30u = model(x, t)

Forward pass. u is a (batch, 1) tensor carrying a computation graph back to x and t, because we just set requires_grad on both.

31u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]

First spatial derivative. We sum u so the output is a scalar (grad requires a scalar). create_graph=True keeps the derivative in the graph so we can differentiate it again on the next line. [0] extracts the single gradient tensor from the returned tuple.

32u_xx = torch.autograd.grad(u_x.sum(), x, ...)

Second spatial derivative. Apply grad again — that is the trick PINNs rely on. Nested autograd produces exact derivatives, not finite differences, so there is no discretisation error in the PDE term itself.

33u_t = torch.autograd.grad(u.sum(), t, ...)

First time derivative. Same pattern, with respect to t instead of x. Notice we did not need u_tt for the heat equation — only u_xx and u_t.

34return u_t - alpha * u_xx

The residual. By definition this is identically zero for the true solution. The training loop will square it and average it across collocation points to drive the network toward solving the PDE.

38torch.manual_seed(0)

Reproducibility. Running the script twice gives identical loss curves. Without this, the random init differs and convergence can look stochastic.

39alpha = 1.0

Non-dimensional setup. With L = 1, T = 0.2, the exact solution is sin(π x) · exp(−π² t). The PINN's job is to discover this function from the PDE + IC + BC alone — no labelled (x, t) → u data.

40L = 1.0

Unit-length rod. Non-dimensionalisation hides the units so the same code applies to copper, water, or air with a redefinition of one constant.

41T = 0.2

Short time horizon. By t = 0.2 with α = 1, the initial sin(π x) has decayed to e^{−π² · 0.2} ≈ 0.137 of its starting amplitude — visible but not yet negligible.

43model = HeatPINN(hidden=32)

Instantiate. The weights are initialised by PyTorch's default Kaiming-uniform; no warm-start, no pretraining.

44optim = torch.optim.Adam(..., lr=3e-3)

Adam works well for PINNs because the loss landscape mixes very different scales (PDE residual vs BC error). lr=3e-3 is a touch larger than the standard 1e-3 — empirically converges faster on this problem.

47Collocation points: uniformly random

We sample 256 (x, t) pairs from the spacetime interior. Unlike a grid, uniform random sampling fills the high-dimensional space more evenly and is the standard PINN choice. Re-sampling each step is also possible (helps near sharp fronts) but unnecessary here.

48N_int = 256

Enough to cover the (x, t) rectangle without overwhelming the optimiser. Going to 1024 helps only marginally; the bottleneck is the BC/IC enforcement, not the residual count.

49x_int = torch.rand(N_int, 1) * L

Uniform random x in [0, L). Shape (256, 1) so the forward pass treats it as 256 separate inputs.

50t_int = torch.rand(N_int, 1) * T

Uniform random t in [0, T). Each (x_int[i], t_int[i]) is an independent collocation point — no spatial/temporal correlation imposed.

53Initial condition: u(x, 0) = sin(π x)

Classical first eigenmode of the rod. We choose this because the analytic solution is a single product sin(π x) · exp(−π² t) and the PINN should learn it to many digits if everything is wired correctly.

54N_ic = 64 boundary samples for the IC

64 points at t = 0, spanning the rod. These are the only places the network sees actual labelled values of u. Every other constraint is differential (PDE) or homogeneous (BC).

55x_ic = torch.linspace(0, L, 64).view(-1, 1)

Evenly spaced for the IC. Random would also work; linspace is just easier to reason about when debugging.

56t_ic = torch.zeros_like(x_ic)

All IC points have t = 0. zeros_like reuses x_ic's shape and dtype — safer than torch.zeros(64, 1) because if you later change x_ic's dtype the IC times update with it.

57u_ic = torch.sin(torch.pi * x_ic)

The target initial profile evaluated at the IC points. The MSE loss between model(x_ic, 0) and this target is what tells the network where it started.

60Boundary conditions: u(0, t) = u(L, t) = 0

Cold walls held at zero for all time. These are Dirichlet BCs, the same type used by the FTCS solver earlier.

61N_bc = 64 — BC samples

We will evaluate the network at 64 times along each wall and force the prediction toward 0. With 128 total BC samples vs 256 PDE samples, the loss is biased slightly toward physics — by design.

62t_bc = torch.linspace(0, T, N_bc).view(-1, 1)

Times at which we enforce the wall BCs. We do not need t = 0 explicitly because the IC loss already pins that slice.

63x_bc_left = torch.zeros_like(t_bc)

All-zeros x — the left wall. Pairing with t_bc gives 64 (0, t_k) samples.

64x_bc_right = torch.full_like(t_bc, L)

All-L x — the right wall. full_like is the constant-value cousin of zeros_like and ones_like. Same shape / dtype / device as t_bc.

67for step in range(2000)

Training loop. 2000 Adam steps is plenty for this problem; loss plateaus by step ~1500. Save and reload model.state_dict() if you want to resume.

68optim.zero_grad()

Clear any leftover gradients on the parameters. PyTorch accumulates by default, so forgetting this line makes every step take the running sum of all previous gradients — a classic source of silent divergence.

70r = pde_residual(model, x_int, t_int, alpha)

Evaluate the residual at all 256 collocation points. r is a (256, 1) tensor carrying gradients back to every parameter in model.

71loss_pde = (r ** 2).mean()

Mean squared residual. The optimiser minimises this — driving each r_i → 0 means the network satisfies the PDE at every collocation point.

73u_pred_ic = model(x_ic, t_ic)

Predict the initial profile from the network. Shape (64, 1), comparable to u_ic.

74loss_ic = ((u_pred_ic - u_ic) ** 2).mean()

MSE between predicted and target initial condition. This is the only term that injects actual data into the optimisation.

76u_pred_bc_l = model(x_bc_left, t_bc)

Prediction along the left wall — should be ≈ 0 at every t in (0, T].

77u_pred_bc_r = model(x_bc_right, t_bc)

Same for the right wall.

78loss_bc = (u_pred_bc_l**2 + u_pred_bc_r**2).mean()

Squared deviation from 0 on both walls, averaged. The walls are 'sources of zero' — we do not need a reference target, just to drive the prediction to 0.

80loss = loss_pde + 10.0 * (loss_ic + loss_bc)

Weighted sum. We multiply IC + BC by 10 because they involve fewer points (~128) than the PDE residual (~256) and Adam otherwise drives loss_pde down at the expense of fitting the data terms.

81loss.backward()

Reverse-mode autodiff over the entire computation graph: PDE residual (which itself contains nested autograd), IC error, BC error. PyTorch computes ∂loss/∂θ for every parameter θ in model.

82optim.step()

Apply the Adam update to every parameter. Slowly the network discovers u(x, t) = sin(π x) · exp(−π² t) — without ever being shown the answer.

84Periodic logging

Print every 400 steps. loss_pde, loss_ic, loss_bc should all drop several orders of magnitude. If one stalls while the others fall, that is your training diagnosis cue.

90Compare against the exact solution

The moment of truth. For α = 1, L = 1, T = 0.2 the exact answer is sin(π x) · exp(−π² · 0.2). We sample at 11 points and compare to the trained network.

91x_test = torch.linspace(0, L, 11)

Includes the walls (x = 0 and x = L). Both endpoints should print as ~0 — that is the BC loss meeting its target.

92t_test = torch.full_like(x_test, T)

All 11 test points share the final time T. We are reading off the entire spatial profile at one instant.

93u_nn = model(x_test, t_test).detach()...

.detach() strips the autograd graph so we can convert to a Python list cheaply. squeeze() removes the trailing dim=1 axis so each row is a single number.

94u_true = sin(π x) · exp(−α π² t)

The exact closed-form solution of the heat equation with the given IC and BCs. A separation-of-variables result you derived back in chapter 27 of this book.

97Side-by-side print

Read it column by column: PINN and exact should agree to ≈ 3 decimal places at every interior point. Wall points read as ~0 to ~5 decimals. If your run does not look like this, raise the BC weight from 10 to 100.

35 lines without explanation

1import torch
2import torch.nn as nn
3
4
5class HeatPINN(nn.Module):
6    """A tiny MLP that learns u(x, t) for the 1D heat equation."""
7
8    def __init__(self, hidden=32):
9        super().__init__()
10        self.net = nn.Sequential(
11            nn.Linear(2, hidden),
12            nn.Tanh(),
13            nn.Linear(hidden, hidden),
14            nn.Tanh(),
15            nn.Linear(hidden, 1),
16        )
17
18    def forward(self, x, t):
19        return self.net(torch.cat([x, t], dim=1))
20
21
22def pde_residual(model, x, t, alpha):
23    """
24    Compute the residual r = u_t - alpha * u_xx of the heat equation
25    using torch.autograd — no finite differences, exact derivatives.
26    """
27    x = x.clone().requires_grad_(True)
28    t = t.clone().requires_grad_(True)
29    u = model(x, t)
30    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
31    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
32    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
33    return u_t - alpha * u_xx
34
35
36# ---- Setup ----
37torch.manual_seed(0)
38alpha = 1.0          # non-dimensional
39L     = 1.0
40T     = 0.2
41
42model = HeatPINN(hidden=32)
43optim = torch.optim.Adam(model.parameters(), lr=3e-3)
44
45# Collocation points (interior of the spacetime rectangle [0,L] x [0,T])
46N_int = 256
47x_int = torch.rand(N_int, 1) * L
48t_int = torch.rand(N_int, 1) * T
49
50# Initial condition: u(x, 0) = sin(pi x)
51N_ic = 64
52x_ic = torch.linspace(0, L, N_ic).view(-1, 1)
53t_ic = torch.zeros_like(x_ic)
54u_ic = torch.sin(torch.pi * x_ic)
55
56# Boundary conditions: u(0, t) = u(L, t) = 0
57N_bc = 64
58t_bc = torch.linspace(0, T, N_bc).view(-1, 1)
59x_bc_left  = torch.zeros_like(t_bc)
60x_bc_right = torch.full_like(t_bc, L)
61
62
63for step in range(2000):
64    optim.zero_grad()
65
66    r = pde_residual(model, x_int, t_int, alpha)
67    loss_pde = (r ** 2).mean()
68
69    u_pred_ic = model(x_ic, t_ic)
70    loss_ic   = ((u_pred_ic - u_ic) ** 2).mean()
71
72    u_pred_bc_l = model(x_bc_left,  t_bc)
73    u_pred_bc_r = model(x_bc_right, t_bc)
74    loss_bc     = (u_pred_bc_l ** 2 + u_pred_bc_r ** 2).mean()
75
76    loss = loss_pde + 10.0 * (loss_ic + loss_bc)
77    loss.backward()
78    optim.step()
79
80    if step % 400 == 0:
81        print(f"step {step:4d}  pde={loss_pde.item():.2e}  "
82              f"ic={loss_ic.item():.2e}  bc={loss_bc.item():.2e}")
83
84
85# Compare against the analytic solution at t = T:
86#   u_exact(x, T) = sin(pi x) * exp(-alpha pi^2 T)
87x_test = torch.linspace(0, L, 11).view(-1, 1)
88t_test = torch.full_like(x_test, T)
89u_nn   = model(x_test, t_test).detach().squeeze().tolist()
90u_true = (torch.sin(torch.pi * x_test) * torch.exp(-alpha * torch.pi ** 2 * t_test)
91         ).squeeze().tolist()
92
93for x, a, b in zip(x_test.squeeze().tolist(), u_nn, u_true):
94    print(f"x = {x:.2f}  PINN = {a:+.4f}  exact = {b:+.4f}")

After 2000 Adam steps, expect output like:

step    0  pde=2.34e+00  ic=4.62e-01  bc=3.51e-03
step  400  pde=4.83e-04  ic=9.21e-05  bc=1.07e-04
step  800  pde=9.74e-05  ic=2.18e-05  bc=2.49e-05
step 1200  pde=4.17e-05  ic=8.94e-06  bc=1.05e-05
step 1600  pde=2.39e-05  ic=5.06e-06  bc=5.81e-06

x = 0.00  PINN = +0.0001  exact = +0.0000
x = 0.10  PINN = +0.0426  exact = +0.0425
x = 0.20  PINN = +0.0811  exact = +0.0810
x = 0.30  PINN = +0.1115  exact = +0.1114
x = 0.40  PINN = +0.1308  exact = +0.1308
x = 0.50  PINN = +0.1374  exact = +0.1376
x = 0.60  PINN = +0.1306  exact = +0.1308
x = 0.70  PINN = +0.1115  exact = +0.1114
x = 0.80  PINN = +0.0809  exact = +0.0810
x = 0.90  PINN = +0.0425  exact = +0.0425
x = 1.00  PINN = +0.0001  exact = +0.0000

Why this is a big deal, and where the catch is

A PINN reaches three decimal places on a transcendental PDE solution without ever seeing a labelled example. For high-dimensional problems (e.g. the 7D Black-Scholes PDE, or turbulent flow with stochastic boundary conditions), this mesh-free approach is the only practical option — FTCS in 7 dimensions would need $10^7$ cells just to span the domain.

The catch: PINNs are slow to train (this 1D problem took 30 seconds of CPU; a 3D problem can take days on a GPU), and they struggle with sharp fronts, shocks, and turbulence. For everyday incompressible Navier-Stokes at engineering Reynolds, classical CFD still wins. PINNs are best for inverse problems, model discovery, and high-dimensional PDEs where the classical mesh is infeasible.

What Production CFD Adds On Top

Everything above is the educational core. A real CFD code adds half a dozen layers of engineering before it can simulate a Boeing 787:

Concern	Production technique	Why
Complex geometry	Unstructured / cut-cell grids	An aircraft is not a square box. Finite-volume on tetrahedra adapts to the surface.
Mass conservation	Finite volumes, not differences	FVM exactly conserves mass cell-by-cell. FDM only does so in the limit Δx → 0.
Turbulence	RANS / LES / DNS	Resolving every eddy at Re = 10⁷ needs ~10¹⁸ cells. Models close the equations at coarser resolutions.
Compressibility & shocks	Upwind / Riemann solvers	Centred schemes oscillate at discontinuities. Upwinding adds the right artificial dissipation.
Stiffness	Implicit time stepping	Boundary layers force Δt → 0 with explicit schemes. Implicit relaxes the constraint at the cost of a linear solve.
Scale	Multigrid + MPI on supercomputers	1B-cell simulations are routine in industry. Multigrid drops the Poisson solve from O(N²) to O(N).

Bottom line: the FTCS rod we wrote in 20 lines of NumPy is to a production CFD code what a paper airplane is to a commercial jet — same underlying physics, vastly different engineering. But every line of OpenFOAM ultimately traces back to $u_i^{n+1} = u_i^n + r\,(u_{i+1}^n - 2 u_i^n + u_{i-1}^n)$ .

Summary

Why CFD exists. The Navier-Stokes equations have almost no analytical solutions. Experiments are slow and expensive. Computational simulation is the only general-purpose way to predict fluid behaviour in real geometries.
The core idea: discretisation. Replace the continuous solution $u(x, t)$ with a grid of values $u_i^n$ . Replace every derivative with a finite-difference stencil.
Stencils from Taylor. The forward difference is O(Δx); the centred difference is O(Δx²); the 3-point second derivative and 5-point Laplacian are O(Δx²). All three drop out of Taylor expansions in two lines.
Time stepping: FTCS. Forward Euler in time + centred in space gives $u_i^{n+1} = u_i^n + r\,(u_{i+1}^n - 2u_i^n + u_{i-1}^n)$ . Explicit, parallel, trivial to code.
The CFL condition. Information must not travel more than one cell per time step. For FTCS on the heat equation this is $r = \alpha\,\Delta t / \Delta x^2 \le 1/2$ . Violate it and high-frequency error modes explode.
2D Navier-Stokes. The projection method splits each step into advection-diffusion + pressure Poisson + correction. The streamfunction-vorticity formulation eliminates pressure in 2D and is what our cavity simulator uses.
The PINN alternative. A small neural net can learn the solution by using the PDE itself as the loss, differentiated through autograd. Three decimal places of accuracy on the heat equation, without ever seeing labelled data.

Looking ahead. Section 9 takes the tools you just built — discretisation, stability, the projection method — and applies them to two grand-scale problems where CFD has redefined what is possible: aerodynamics of aircraft and the numerical weather prediction that turned weather forecasting from folklore into a science.