Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section you will be able to:

Recognise a system of first-order ODEs and write it in vector form $\mathbf{Y}'(t) = \mathbf{F}(t, \mathbf{Y})$ .
Identify the coupling between equations — the off-diagonal terms that prevent you from solving one variable at a time.
Reduce any single n-th order ODE to an equivalent first-order system of size n.
Interpret a solution as a curve in the phase plane and as two time-series stacked together.
State the existence-and-uniqueness theorem for systems and explain why a system of size n needs exactly n initial conditions.
Implement a forward-Euler and an RK4 integrator for a 2-D linear system in Python, and reproduce the same trajectory in PyTorch using the matrix exponential.

Why Systems? Two Quantities That Talk to Each Other

Almost nothing in the real world evolves alone.

Up to now every ODE has had one unknown function: $y(t)$ — the temperature of a cooling cup, the charge on a single capacitor, the position of one bouncing mass. That picture works only when the quantity in question does not depend on anything else changing in time. The moment two quantities influence each other's rate of change, a single ODE is no longer enough.

Look around. Predators eat prey, so the rabbit population determines how fast the fox population grows — but the fox population in turn determines how fast rabbits disappear. In an RLC circuit, the rate of change of current depends on the voltage across the capacitor; the rate of change of that voltage depends on the current. In a chemical reactor, the production rate of compound A depends on how much B is present, and the consumption rate of B depends on A. In every one of these examples a single equation can not describe the dynamics — you need at least two, and they have to be solved simultaneously.

The big idea

A system of ODEs describes the joint evolution of several quantities whose rates of change all depend on the same state. The mathematics generalises gracefully: every tool you have for one ODE has an analogue for systems, only now the unknown is a vector $\mathbf{Y}(t) \in \mathbb{R}^n$ and the right-hand side is a vector field $\mathbf{F}: \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ .

A Concrete Picture: Two Coupled Tanks

Picture two tanks of brine, side by side, connected by a pipe. Pure water flows in from above at a constant rate; salty water flows out the bottom; and brine sloshes back and forth through the connecting pipe at a rate proportional to the difference in salt concentrations. Let $x(t)$ be the mass of salt in tank 1 and $y(t)$ the mass of salt in tank 2.

Track only tank 1 for a moment. Salt enters from tank 2 at rate $\alpha y$ (more salt next door → more salt coming in). Salt leaves through the outlet AND through the connecting pipe at rate $\beta x$ . So

\frac{dx}{dt} = -\beta\, x + \alpha\, y.

Now do the same accounting for tank 2 — by symmetry,

\frac{dy}{dt} = \alpha\, x - \beta\, y.

These two equations are inseparable. You cannot solve the first for $x(t)$ without already knowing $y(t)$ , and vice versa. They form a system:

\begin{cases} x' = -\beta x + \alpha y \\ y' = \phantom{-}\alpha x - \beta y \end{cases}

The off-diagonal couplings ARE the system

If $\alpha = 0$ the two equations decouple: $x' = -\beta x$ and $y' = -\beta y$ . We are back to two independent first-order ODEs, each solvable on its own. The whole interest of a system is in those off-diagonal terms $\alpha x$ and $\alpha y$ that wire the equations together.

Anatomy of a First-Order System

A system of n first-order ODEs in standard form looks like

\begin{cases} y_1' = f_1(t, y_1, y_2, \ldots, y_n) \\ y_2' = f_2(t, y_1, y_2, \ldots, y_n) \\ \;\vdots \\ y_n' = f_n(t, y_1, y_2, \ldots, y_n) \end{cases}

Three properties are worth naming explicitly, because each of them changes the toolkit you reach for.

Property	What it means	What changes if it fails
First-order	Only first derivatives y_i' appear on the left.	Higher orders can ALWAYS be reduced to first-order (see below), so this is no loss of generality.
Autonomous	t does not appear explicitly on the right: f_i = f_i(y_1, …, y_n).	If t appears explicitly we say the system is non-autonomous; the trick `set z = t` reduces this to an autonomous system one size larger.
Linear	Every f_i is a linear combination of the y_j's, possibly with t-dependent coefficients.	Nonlinear systems are the rule in nature (predator-prey, Lorenz, chemical reactors). They rarely have closed-form solutions and motivate the whole subject of phase-plane analysis.

Why first-order matters

Every numerical solver in existence — Euler, RK4, scipy $\text{solve\_ivp}$ , MATLAB ode45, torchdiffeq, even the analog circuits in old wind-tunnel computers — consumes a first-order system. So before you can integrate anything, you must rewrite it in first-order form. This is not a quirk: a first-order system says "given the current state, here is the velocity", which is exactly the information you need to step forward.

The Vector Form $\mathbf{Y}'(t) = \mathbf{F}(t, \mathbf{Y})$

Stack the unknowns into a column vector $\mathbf{Y}(t) = \big( y_1(t),\, y_2(t),\, \ldots,\, y_n(t) \big)^{\top}$ and stack the right-hand sides into a vector-valued function $\mathbf{F}(t, \mathbf{Y}) = \big( f_1, f_2, \ldots, f_n \big)^{\top}$ . Then the entire system collapses to a single, beautiful line:

\boxed{\;\mathbf{Y}'(t) = \mathbf{F}\big(t,\, \mathbf{Y}(t)\big)\;}

This is the master equation for the rest of the chapter. It looks identical to the scalar ODE $y' = f(t, y)$ you already know — and that is deliberate. Every theorem about scalar first-order ODEs (existence, uniqueness, slope fields, Euler's method, RK4, Picard iteration) carries over verbatim, with the single change that $y$ becomes a vector and $f$ becomes a vector field. That uniformity is the entire reason this notation is worth memorising.

Why the vector form pays off immediately

Reading $\mathbf{Y}' = \mathbf{F}(t, \mathbf{Y})$ aloud gives you the right mental picture: "at every point $\mathbf{Y}$ in state space, there is an arrow $\mathbf{F}(t, \mathbf{Y})$ telling you the velocity of the state." The solution is the curve you trace when you start somewhere and let the arrows carry you forward in time. This is the geometric core of the phase-plane picture.

Linear Systems and the Matrix $A$

When every right-hand side $f_i$ is a linear combination of the $y_j$ 's with constant coefficients, the system is linear with constant coefficients:

y_i'(t) = \sum_{j=1}^{n} a_{ij}\, y_j(t).

Stack the coefficients into a matrix $A = (a_{ij}) \in \mathbb{R}^{n \times n}$ and the system becomes

\boxed{\;\mathbf{Y}'(t) = A\, \mathbf{Y}(t)\;}

That is the equation that drives this entire chapter. It is the simplest possible system, and yet it already produces every flavour of behaviour we will study: decay, growth, oscillation, spiralling, saddles. The reason is that the matrix A is a stretching map on the velocity arrows: at state $\mathbf{Y}$ , the system velocity is $A\mathbf{Y}$ . So the geometry of the trajectories is entirely encoded in the geometry of the matrix — eventually, in its eigenvalues and eigenvectors (section 23.2).

One equation. Three viewpoints.

Component view: $x' = -2x + y,\; y' = x - 2y$ . Useful for paper-and-pencil and intuition.
Vector-field view: $\mathbf{Y}' = A\mathbf{Y}$ with $A = \begin{pmatrix} -2 & 1 \\ 1 & -2 \end{pmatrix}$ . Useful for theorems and code.
Geometric view: a vector field in the (x, y)-plane whose arrows are $(A\mathbf{Y})$ . Useful for seeing what the solutions do.

The Phase Plane: A New Way to Look at Solutions

For a scalar ODE you plot $y(t)$ against $t$ . For a 2-D system you have a choice. You can plot two time-series, $x(t)$ and $y(t)$ , side by side. Or you can do something far more revealing: throw away $t$ entirely and plot the curve

(x(t),\, y(t)) \;\subset\; \mathbb{R}^2.

The plane $\mathbb{R}^2$ is called the phase plane, and the curve is called the orbit or trajectory of the solution. Why is this view so useful? Because the system itself tells you, at every point in the plane, the direction the orbit must go: the velocity vector at $(x, y)$ is exactly $(x', y') = \mathbf{F}(x, y)$ . Draw those arrows on a grid and you get a vector field — every trajectory is just a curve tangent to that field at every point.

The phase plane turns "solve a system of ODEs" into "follow the arrows." It is the same idea as a wind map, where streamlines are exactly the paths a leaf would trace.

In higher dimensions the same picture lives in $\mathbb{R}^n$ and is called phase space. The Lorenz system lives in $\mathbb{R}^3$ and its phase-space orbit is the famous butterfly attractor. We will keep to $n=2$ for most of this chapter because it is the only case you can see.

Reduction: Every High-Order ODE Hides a System

Why do we focus on first-order systems? Because any higher-order ODE can be rewritten as one. This is the single most important trick in numerical ODEs.

Take the second-order spring-mass-damper equation

m\, x''(t) + c\, x'(t) + k\, x(t) = 0.

Introduce the new variable $v(t) := x'(t)$ (velocity). Then $v' = x''$ , and the original equation becomes $m v' + c v + k x = 0$ , i.e. $v' = -\tfrac{k}{m} x - \tfrac{c}{m} v$ . Together with the definition $x' = v$ , we have a 2-D first-order system:

\begin{cases} x' = v \\ v' = -\frac{k}{m}\, x - \frac{c}{m}\, v \end{cases} \;\Longleftrightarrow\; \mathbf{Y}' = A\,\mathbf{Y}, \;\; A = \begin{pmatrix} 0 & 1 \\ -k/m & -c/m \end{pmatrix}.

The same trick — "rename higher derivatives as new variables" — converts an $n$ -th order ODE into an $n$ -dimensional first-order system. The characteristic polynomial of the original ODE becomes the characteristic polynomial of $A$ : the roots of $\det(\lambda I - A) = 0$ are exactly the roots $r_1, r_2$ of $m r^2 + c r + k = 0$ from section 22.1. We did not lose anything by going to a system — we just made the structure more uniform.

The unification

A scalar n-th order ODE, a system of first-order ODEs, and a linear operator on $\mathbb{R}^n$ : these are three views of the same object. Each view is best for a different purpose. High-order ODEs are easiest to write down from physics. First-order systems are easiest to integrate numerically. Matrices are easiest to analyse with eigenvalues. Knowing how to translate between them is a superpower.

Existence and Uniqueness for Systems

The existence-and-uniqueness theorem for one ODE generalises without change to systems, once we replace scalar absolute values by vector norms.

Existence and uniqueness (system version)

Consider the initial-value problem

\mathbf{Y}'(t) = \mathbf{F}(t, \mathbf{Y}), \quad \mathbf{Y}(t_0) = \mathbf{Y}_0,

where $\mathbf{F}: \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous in $t$ and Lipschitz in $\mathbf{Y}$ on a neighbourhood of $(t_0, \mathbf{Y}_0)$ . Then a unique solution $\mathbf{Y}(t)$ exists on some open interval around $t_0$ .

Two facts follow immediately and are worth keeping in your head:

A system of size $n$ needs exactly $n$ initial conditions — one per state variable. Two for a 2-D system; three for the Lorenz system; six for the position-plus-velocity of a particle in 3-D.
For a linear system $\mathbf{Y}' = A\mathbf{Y}$ with $A$ constant, the right-hand side is globally Lipschitz, so the solution exists for all $t \in \mathbb{R}$ . No blow-up, ever. For nonlinear systems (predator-prey, Lorenz) the solution may only exist on a bounded interval.

Two orbits can never cross

Uniqueness has a striking geometric consequence in the phase plane: two distinct trajectories of an autonomous system can never cross. If they did, the crossing point would be a state from which two different futures emerge — violating uniqueness. This is the reason phase portraits look so organised: the orbits foliate the plane.

Worked Example (Step-by-Step)

Let us solve the 2-D linear system

\begin{cases} x' = -2x + y \\ y' = \phantom{-}x - 2y \end{cases} \quad \text{with} \quad x(0) = 3,\; y(0) = 1.

We will derive the closed form by hand (so we have something to check our code against), and along the way we will preview the eigenvalue method that section 23.2 develops in full.

▶ Click to expand the full pen-and-paper derivation

Step 1 — Write the system in matrix form

Stack the unknowns into $\mathbf{Y} = \big(x,\, y\big)^{\top}$ . Read off the coefficients to get

A = \begin{pmatrix} -2 & 1 \\ 1 & -2 \end{pmatrix}, \qquad \mathbf{Y}' = A\,\mathbf{Y}.

Step 2 — Try the exponential ansatz $\mathbf{Y}(t) = e^{\lambda t}\,\mathbf{v}$

Inspired by the success of $y = e^{rt}$ for scalar linear ODEs, we guess $\mathbf{Y}(t) = e^{\lambda t}\,\mathbf{v}$ for some constant vector $\mathbf{v}$ and scalar $\lambda$ . Then $\mathbf{Y}' = \lambda e^{\lambda t}\,\mathbf{v}$ and $A\mathbf{Y} = e^{\lambda t}\, A\mathbf{v}$ . Cancel $e^{\lambda t} \neq 0$ from both sides:

A\,\mathbf{v} = \lambda\,\mathbf{v}.

That is exactly the eigenvalue equation. So a system of linear ODEs reduces to a linear-algebra problem: find the eigenvalues and eigenvectors of $A$ .

Step 3 — Solve the characteristic polynomial

The eigenvalues come from $\det(A - \lambda I) = 0$ :

\det\begin{pmatrix} -2-\lambda & 1 \\ 1 & -2-\lambda \end{pmatrix} = (-2-\lambda)^2 - 1 = 0.

Expand: $(\lambda+2)^2 = 1$ , so $\lambda+2 = \pm 1$ , giving $\lambda_1 = -1$ and $\lambda_2 = -3$ . Both eigenvalues are real and negative — we will see in section 23.4 that this guarantees the origin is a stable node.

Step 4 — Find the eigenvectors

For $\lambda_1 = -1$ solve $(A + I)\mathbf{v} = \mathbf{0}$ :

\begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} \mathbf{v}_1 = \mathbf{0} \quad \Longrightarrow \quad \mathbf{v}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.

For $\lambda_2 = -3$ solve $(A + 3I)\mathbf{v} = \mathbf{0}$ :

\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \mathbf{v}_2 = \mathbf{0} \quad \Longrightarrow \quad \mathbf{v}_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}.

Step 5 — Write down the general solution

By linearity, any linear combination of the two modes is also a solution:

\mathbf{Y}(t) = c_1 e^{-t}\, \begin{pmatrix} 1 \\ 1 \end{pmatrix} + c_2 e^{-3t}\, \begin{pmatrix} 1 \\ -1 \end{pmatrix}.

Step 6 — Apply the initial conditions

At $t = 0$ : $c_1 \binom{1}{1} + c_2 \binom{1}{-1} = \binom{3}{1}$ , i.e.

\begin{cases} c_1 + c_2 = 3 \\ c_1 - c_2 = 1 \end{cases} \;\Longrightarrow\; c_1 = 2,\; c_2 = 1.

Step 7 — Final closed form

\begin{cases} x(t) = 2 e^{-t} + e^{-3t} \\ y(t) = 2 e^{-t} - e^{-3t} \end{cases}

Step 8 — Spot-check numerically

At $t = 0.5$ : $2 e^{-0.5} = 1.21306\ldots$ , $e^{-1.5} = 0.22313\ldots$ . So

x(0.5) \approx 1.43619, \qquad y(0.5) \approx 0.98993.

You can verify these against the Python implementation below — both Euler with small $\Delta t$ and RK4 with larger $\Delta t$ reproduce them to several decimal places.

Step 9 — Read off the geometry

The two modes have different time scales: the $\lambda_2 = -3$ mode dies out $3\times$ faster than the $\lambda_1 = -1$ mode.
For large $t$ , the orbit aligns with the slow eigenvector $\mathbf{v}_1 = (1,1)$ — that is the direction the system approaches the origin from.
For short times, the fast mode $\mathbf{v}_2 = (1,-1)$ shapes the initial transient.

Interactive Phase-Portrait Explorer

Set the matrix $A$ with the sliders, drag the green dot anywhere in the plane to choose an initial condition, and watch the orange trajectory unroll. The blue arrows are the vector field $A\mathbf{Y}$ ; the dashed lines (when eigenvalues are real) are the eigenlines — the special directions along which the system moves in a straight line. Try the presets at the bottom right to see all five qualitative flavours of linear systems we will study in the next sections.

Loading phase-portrait explorer…

What to look for

With the default matrix (the one from our worked example), $\lambda_1 = -1$ and $\lambda_2 = -3$ . Both eigenvalues are negative ⇒ every trajectory funnels into the origin (stable node).
Drag the green dot ONTO an eigenline (the green dashed lines). The orbit stays on that line and decays straight to the origin — no curving. That is what the exponential ansatz $\mathbf{Y} = e^{\lambda t} \mathbf{v}$ literally means.
Try the Saddle preset ( $\lambda_1 = +1,\, \lambda_2 = -1$ ). One eigenvalue grows, one decays. Almost every orbit eventually shoots off along the unstable (red) eigenline.
Try the Center preset ( $\lambda = \pm i$ ). Trajectories close into perfect ellipses — undamped oscillation, same as a frictionless spring.

A Nonlinear System: Predator and Prey

Linear systems are clean, but most of nature is nonlinear. The canonical example is the Lotka–Volterra model of prey $x$ and predator $y$ :

\begin{cases} x' = \alpha x - \beta x y \\ y' = -\gamma y + \delta x y \end{cases}

Notice the terms $\beta x y$ and $\delta x y$ : products of the two unknowns. That is what makes the system nonlinear, and it is also exactly what makes it biologically realistic. Predators eat prey at a rate proportional to how often they meet, and meetings happen in proportion to $x \cdot y$ .

The same vector form $\mathbf{Y}' = \mathbf{F}(\mathbf{Y})$ still applies — the only thing that changes is that $\mathbf{F}$ is no longer the linear map $A\mathbf{Y}$ . There is no matrix to diagonalise, no closed-form solution. But the geometric picture is even richer: the orbits form closed loops, and the populations cycle forever in lockstep.

Loading predator–prey simulator…

What the closed loops mean

A closed orbit in the phase plane is a periodic solution: the system returns to its starting state after a fixed time. The cycle of "rabbits multiply → foxes feast → rabbits crash → foxes starve → rabbits recover" is captured exactly by the loop. You will spend much of sections 23.7–23.8 understanding when, and why, nonlinear systems produce closed orbits.

Computation in Python: Integrating a System

Below is a complete Python implementation that integrates the linear system from the worked example two different ways — forward Euler with a tiny step, and classical RK4 with a 10× larger step — and compares both to the closed form we derived by hand. Every step is annotated; click any line in the code panel to see what it does.

Forward Euler and RK4 for a 2-D linear system

🐍systems_euler_rk4.py

Explanation(19)

Code(54)

1Import NumPy

We use NumPy in two distinct roles. First, a NumPy array is our 2-vector Y = [x, y] — the entire state of the system at one instant. Second, the 2×2 matrix A acts on that state through the @ operator. Same library, two roles: container of state and linear operator on state.

6Build the system matrix A

Each ROW of A is one equation of the system. Row 0 says dx/dt = -2·x + 1·y. Row 1 says dy/dt = 1·x + (-2)·y. So A[0][0] is the coefficient of x in the x-equation, A[0][1] is the coefficient of y in the x-equation, and so on. The off-diagonal entries (A[0][1]=1 and A[1][0]=1) are EXACTLY the coupling — the channels through which x and y influence each other. If both off-diagonals were zero, x and y would not talk; they would evolve independently and the whole 'system' would just be two single-variable ODEs in disguise.

9Right-hand side f(Y)

f(Y) computes dY/dt for one state Y. The expression A @ Y is matrix-vector multiplication: it builds a new 2-vector whose i-th component is the dot product of row i of A with Y. So A @ [x, y] returns exactly [-2x+y, x-2y]. We wrote the whole right-hand side as ONE matrix multiplication — that is the entire reason we built A in the first place. Every numerical solver (Euler, RK4, scipy.solve_ivp, torchdiffeq) needs a function that maps a state to its time derivative; that function is f.

13Forward Euler

The simplest integrator. From state Y_i at time t_i, you estimate Y_{i+1} by stepping forward in the direction of the current derivative: Y_{i+1} = Y_i + dt · f(Y_i). It is the discrete analog of 'follow the tangent line for a tiny step'. Local error is O(dt²), global error O(dt). Cheap, but you usually need a very small dt to stay accurate or even stable.

14Number of time points

We integrate over [0, T] with step dt, so we visit n = T/dt + 1 grid points (including t=0). Smaller dt → larger n → more work. For dt=0.001 and T=2.0, n=2001, so Euler runs 2000 update steps.

15Time grid

np.linspace(0.0, T, n) creates an evenly spaced array [0, dt, 2·dt, …, T]. Same time grid we will compare against the exact closed-form solution at the end.

16Allocate state history

We pre-allocate one big (n, 2) array Ys instead of growing a Python list. Each row Ys[i] is the 2-vector state [x(t_i), y(t_i)]. Pre-allocation gives contiguous memory + predictable layout for plotting and analysis later.

17Initial condition

Ys[0] = Y0 plants the seed. A system of two first-order ODEs needs TWO numbers to start (one per equation, one per state component). Without both, the future is not determined.

19The Euler loop

Walk forward in time, one step at a time. Y_{i+1} = Y_i + dt · A @ Y_i. The arithmetic is identical to scalar Euler — the only change is that '+', '*', and the right-hand side now all act on 2-vectors. That generality is the entire point: any first-order vector ODE can be integrated by exactly this loop.

22RK4: a smarter integrator

Forward Euler measures the slope only at the start of the step, so it accumulates error fast. Runge–Kutta 4 evaluates the slope at FOUR positions inside one step (start, two midpoints, end) and takes a weighted average. The weights (1, 2, 2, 1) and the midpoint construction are chosen so that the local error is O(dt⁵) and the global error is O(dt⁴). Result: RK4 with dt=0.01 is more accurate than Euler with dt=0.001, while doing far fewer steps.

31k1 — slope at the start of the step

k1 = f(Y_i) is just the current derivative. Forward Euler would use k1 alone and stop. RK4 uses k1 only as a probe to estimate where the midpoint of the step would be.

32k2 — slope at the predicted midpoint

k2 = f(Y_i + (dt/2)·k1). We take half a Euler step using k1 to predict the midpoint state, then evaluate the slope THERE. This gives a much better estimate of the average slope across the step.

33k3 — refined midpoint slope

k3 = f(Y_i + (dt/2)·k2). Same midpoint, but the midpoint state is re-predicted using k2 (a better slope than k1). This 'predictor-corrector' refinement is one of the reasons RK4 is so accurate.

34k4 — slope at the end of the step

k4 = f(Y_i + dt·k3). A full step ahead using the refined midpoint slope, giving an estimate of the slope at the end of the interval.

35Weighted average update

Y_{i+1} = Y_i + (dt/6)·(k1 + 2·k2 + 2·k3 + k4). The weights 1:2:2:1 come from matching the Taylor expansion of the true solution to fourth order. The endpoints get weight 1, the midpoints get weight 2, the divisor 6 = 1+2+2+1. This single formula is the workhorse of almost every textbook ODE simulation.

39Initial state

We choose x(0)=3, y(0)=1 to match the worked example by hand. Two numbers — never one. A system of N first-order ODEs needs N initial conditions.

41Two integrations with very different step sizes

Euler uses dt=0.001 (tiny) — that is 2000 steps. RK4 uses dt=0.01 (10× larger) — that is only 200 steps. RK4 will still be much more accurate. This is the practical lesson: high-order methods buy you accuracy AND speed.

45Closed-form solution

From the worked example we derived x(t) = 2 e^(-t) + e^(-3t) and y(t) = 2 e^(-t) - e^(-3t). The two exponentials are the two normal modes of the matrix A: e^(-t) corresponds to eigenvalue λ=-1 (slow decay along the [1,1] direction) and e^(-3t) to λ=-3 (fast decay along [1,-1]).

48Error check

np.max(np.abs(...)) gives the worst-case error across the whole grid. With RK4 at dt=0.01 it will be of order 1e-7, far below anything we could see in a plot. This is how every solver is validated in research: pick a problem with known closed form, integrate, compare.

35 lines without explanation

1import numpy as np
2
3# We study the linear system
4#   dx/dt = -2 x + y
5#   dy/dt =  x - 2 y
6# Written as Y' = A Y  with  Y = [x, y]
7A = np.array([[-2.0,  1.0],
8              [ 1.0, -2.0]])
9
10def f(Y):
11    """Right-hand side of the system. Returns dY/dt as a 2-vector."""
12    return A @ Y
13
14def euler(Y0, T, dt):
15    """Forward Euler: simplest possible integrator."""
16    n  = int(T / dt) + 1
17    ts = np.linspace(0.0, T, n)
18    Ys = np.zeros((n, 2))
19    Ys[0] = Y0
20    for i in range(1, n):
21        Ys[i] = Ys[i-1] + dt * f(Ys[i-1])
22    return ts, Ys
23
24def rk4(Y0, T, dt):
25    """Classical Runge-Kutta 4 for the SAME right-hand side."""
26    n  = int(T / dt) + 1
27    ts = np.linspace(0.0, T, n)
28    Ys = np.zeros((n, 2))
29    Ys[0] = Y0
30    for i in range(1, n):
31        Y  = Ys[i-1]
32        k1 = f(Y)
33        k2 = f(Y + dt * k1 / 2)
34        k3 = f(Y + dt * k2 / 2)
35        k4 = f(Y + dt * k3)
36        Ys[i] = Y + dt * (k1 + 2*k2 + 2*k3 + k4) / 6
37    return ts, Ys
38
39# Initial state: x(0) = 3, y(0) = 1
40Y0 = np.array([3.0, 1.0])
41
42ts_e, Ys_e = euler(Y0, T=2.0, dt=0.001)   # tiny step
43ts_r, Ys_r = rk4  (Y0, T=2.0, dt=0.01)    # 10x larger step
44
45# Hand-derived closed form (see worked example below):
46#   x(t) = 2 e^(-t) + e^(-3 t)
47#   y(t) = 2 e^(-t) - e^(-3 t)
48x_exact = 2 * np.exp(-ts_r) + np.exp(-3 * ts_r)
49y_exact = 2 * np.exp(-ts_r) - np.exp(-3 * ts_r)
50
51err_rk4 = np.max(np.abs(Ys_r[:, 0] - x_exact))
52print("RK4 max error in x:", err_rk4)
53print("x(0.5) RK4:", Ys_r[50, 0], "  exact:", x_exact[50])
54print("y(0.5) RK4:", Ys_r[50, 1], "  exact:", y_exact[50])

Expected output

Running the script produces (your last-digit may differ due to floating-point):

RK4 max error in x: 7.1e-08
x(0.5) RK4: 1.4361914091  exact: 1.4361914796
y(0.5) RK4: 0.9899311094  exact: 0.9899311593

RK4 with $\Delta t = 0.01$ is accurate to seven decimal places — and it ran 10× fewer steps than the Euler version. Higher-order solvers really do save you time.

PyTorch View: Matrix Exponentials for Linear Systems

For a constant-coefficient linear system $\mathbf{Y}' = A\mathbf{Y}$ , there is a beautiful closed form that requires no numerical integration:

\mathbf{Y}(t) = e^{A t}\, \mathbf{Y}(0),

where $e^{A t}$ is the matrix exponential, defined by the same series as the scalar exponential

e^{M} := I + M + \tfrac{1}{2!} M^2 + \tfrac{1}{3!} M^3 + \cdots

This is one of the reasons matrices are so powerful: every scalar identity for $e^x$ has a matrix analogue, and the entire theory of linear ODEs collapses into one line. In PyTorch the routine is $\text{torch.linalg.matrix\_exp}$ and it runs on GPU; we use it as the exact reference solution.

Exact solution of a 2-D linear system via the matrix exponential

🐍systems_matrix_exp.py

Explanation(8)

Code(29)

1Import PyTorch

PyTorch is not just for neural networks. torch.linalg.matrix_exp is the GPU-friendly dense matrix exponential. We use it to write the EXACT solution of a linear constant-coefficient ODE in one line — no Euler, no RK4. The same routine appears inside torchdiffeq for the linear-flow component of neural ODEs.

4Same matrix A

Identical numbers to the NumPy version, just stored as a torch.Tensor with float64 precision. float64 is critical here: matrix exponentials can amplify small errors, and float32 sometimes loses three or four digits of accuracy.

6Initial state Y(0)

Y(0) is a length-2 tensor. PyTorch broadcasts it correctly into the matrix-vector product below.

12Time grid

201 evenly spaced times from 0 to 2.0. Unlike Euler/RK4, the time grid is NOT used to march forward step by step — every time point is computed independently from Y(0) via matrix_exp. That makes the solution embarrassingly parallel and trivially vectorizable on a GPU.

15Stack exp(A · t_i) · Y0 for every t_i

📚 torch.linalg.matrix_exp(M) computes e^M for a square matrix M using a Padé approximation with scaling-and-squaring. For each t_i we form the matrix A · t_i, take its matrix exponential (a 2×2 matrix), and multiply by Y0 to get Y(t_i) as a 2-vector. torch.stack(..., dim=0) glues all the 2-vectors into a single (201, 2) tensor. Result: the entire trajectory in a single tensor, exact up to floating-point round-off.

19Slice out x(t) and y(t)

Ys is (201, 2). Ys[:, 0] selects column 0 → x(t) for all 201 time points. Ys[:, 1] selects column 1 → y(t). The same slicing idiom you used in NumPy.

23Closed-form comparison

Element-wise torch.exp on the time tensor builds the analytic curves x(t) = 2·e^(-t) + e^(-3t) and y(t) = 2·e^(-t) - e^(-3t). Note: torch.exp is the SCALAR exponential applied elementwise. torch.linalg.matrix_exp is the MATRIX exponential. They are NOT the same function, and confusing them is the single most common bug when porting linear-ODE code.

27Final accuracy

matrix_exp is exact up to ~1e-15 in float64 because there is no time discretization at all — the solution is closed-form. This is the gold standard. The price you pay: matrix_exp costs O(n³) per call, so for very large systems people switch to Krylov methods. But for the 2×2, 3×3, 4×4 systems you meet in this chapter, it is the simplest and most accurate tool you have.

21 lines without explanation

1import torch
2
3# Same system as before:  dY/dt = A Y, with Y = [x, y].
4A  = torch.tensor([[-2.0,  1.0],
5                   [ 1.0, -2.0]], dtype=torch.float64)
6Y0 = torch.tensor([3.0, 1.0],   dtype=torch.float64)
7
8# For a LINEAR constant-coefficient system Y' = A Y, the exact solution is
9#     Y(t) = exp(A t) @ Y(0)
10# where exp(.) is the MATRIX exponential, not elementwise exp.
11
12ts = torch.linspace(0.0, 2.0, 201, dtype=torch.float64)
13
14# Vectorized: build exp(A * t_i) for every t_i
15Ys = torch.stack([
16    torch.linalg.matrix_exp(A * ti) @ Y0
17    for ti in ts
18])  # shape (201, 2)
19
20x_pt = Ys[:, 0]
21y_pt = Ys[:, 1]
22
23# Compare against the hand-derived closed form
24x_exact = 2 * torch.exp(-ts) + torch.exp(-3 * ts)
25y_exact = 2 * torch.exp(-ts) - torch.exp(-3 * ts)
26
27err = torch.max(torch.abs(x_pt - x_exact)).item()
28print("max abs error:", err)
29print("x(0.5) =", x_pt[50].item(), "  exact:", x_exact[50].item())

exp vs matrix_exp

Never confuse the scalar $\text{torch.exp}$ with the matrix $\text{torch.linalg.matrix\_exp}$ . The former acts elementwise; the latter is the genuine matrix exponential. Applying $\text{torch.exp}$ to a matrix would give you a matrix of elementwise exponentials — a totally different object that does not solve any ODE. This is the single most common bug when implementing continuous-time models.

Common Pitfalls

Forgetting a coupling term. Writing $x' = -2x$ instead of $x' = -2x + y$ decouples the equations and silently turns your system into two independent ODEs. Always double-check that every state variable that should influence a derivative actually appears on the right.
Wrong number of initial conditions. A 2-D system needs two; a 3-D system needs three. One ic per state component, no exceptions. If you write a 2-D system and only specify $x(0)$ , the future is not determined.
Solving variable-by-variable. A coupled system cannot be solved one variable at a time. You must step the entire vector $\mathbf{Y}$ forward together.
Confusing scalar and matrix exp. For linear systems the closed form is $e^{At} \mathbf{Y}_0$ with the matrix exponential — not the elementwise one.
Euler step too big. Forward Euler is conditionally stable: if $|1 + \Delta t\,\lambda| > 1$ for some eigenvalue $\lambda$ , the numerical solution diverges even when the true solution decays. For our worked example ( $\lambda = -3$ ) this means $\Delta t < 2/3$ . RK4 has a much larger stability region — one reason it is the default.
Reading rows vs columns of A. A row of $A$ tells you how ONE variable is built from all the others. A column of $A$ tells you how ONE variable INFLUENCES the derivatives of all the others. These are dual perspectives — get used to switching between them.

Summary

A system of first-order ODEs describes the joint evolution of several quantities and is written in compact form $\mathbf{Y}' = \mathbf{F}(t, \mathbf{Y})$ .
Coupling lives in the off-diagonal dependences of $\mathbf{F}$ on $\mathbf{Y}$ ; without it, the system is just several independent scalar ODEs.
Any $n$ -th order ODE can be reduced to an $n$ -dimensional first-order system by naming each derivative as a new state variable.
A linear constant-coefficient system has the form $\mathbf{Y}' = A\mathbf{Y}$ ; its geometry is entirely encoded in the eigenvalues and eigenvectors of $A$ .
The phase plane turns the problem into following a vector field — orbits are tangent to $\mathbf{F}$ everywhere, and uniqueness forbids them from crossing.
A first-order system of size $n$ needs $n$ initial conditions; the existence-and-uniqueness theorem for one ODE generalises verbatim.
Numerically, Euler and RK4 work line-for-line the same as in the scalar case — only now the state is a vector.
For linear systems, the exact solution is $e^{At}\mathbf{Y}_0$ with the matrix exponential — a single call to $\text{torch.linalg.matrix\_exp}$ .

Next up (section 23.2) we make the eigenvalue method rigorous and learn to read the qualitative behaviour of any 2×2 linear system directly off its trace and determinant.