Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section you will be able to:

Diagnose whether an initial value problem has a unique solution by inspecting the right-hand side $f(x, y)$ .
State and apply the Picard–Lindelöf theorem and the weaker Peano existence theorem, and tell which one you need.
Recognize Lipschitz continuity geometrically, as a finite slope cone bounding $f$ in the $y$ direction.
Run Picard's iteration by hand on a small problem and watch it converge to the true solution.
Predict how a small perturbation of the initial condition spreads through time, using the Grönwall-style bound $|\Delta y(x)| \le |\Delta y_0|\,e^{L|x - x_0|}$ .
Identify the maximal interval of existence and recognize when a solution blows up in finite time.

The Big Picture: Why We Need Two Theorems

Up to this section we have happily written equations like $\tfrac{dy}{dx} = f(x, y),\; y(x_0) = y_0$ and gone looking for a solution. A natural and slightly alarming question is:

Does a solution exist at all? And if it does, is there only one?

These two questions are not academic. If a numerical integrator can legally land on two completely different solution curves passing through the same initial condition, then every prediction it makes is suspect. If no solution exists at all, the equation is not a model — it is a nonsense statement.

The two milestone theorems

Peano (1886): if $f$ is continuous near the initial point, at least one solution exists locally.

Picard–Lindelöf (1893): if $f$ is also Lipschitz in y, then the solution is unique on some interval.

The gap between "continuous" and "Lipschitz in $y$ " is small but consequential — it is exactly the gap between "solutions exist" and "the world is deterministic." The rest of this section is built around making that gap visible.

The Motivating Disaster: Many Solutions From One Point

Consider the perfectly innocent-looking initial value problem:

\frac{dy}{dx} = 3\,y^{2/3}, \quad y(0) = 0.

The right-hand side $f(x, y) = 3 y^{2/3}$ is continuous everywhere — there are no poles, no logs, no divisions. We can separate variables:

y^{-2/3}\,dy = 3\,dx \;\Longrightarrow\; 3\,y^{1/3} = 3x + C \;\Longrightarrow\; y = (x + C/3)^3.

Plug in $y(0) = 0$ and we get $C = 0$ , so $y(x) = x^3$ is one solution. Great — done?

Not quite. Watch:

$y(x) \equiv 0$ is also a solution. It satisfies $y'(x) = 0 = 3\cdot 0^{2/3}$ everywhere and passes through $(0, 0)$ .
Worse, for any $a > 0$ the piecewise function $y_a(x) = 0$ for $x \le a$ and $y_a(x) = (x - a)^3$ for $x > a$ is also a perfectly valid solution.
That gives us an entire uncountable family of solutions through the single point $(0, 0)$ : the trivial zero solution sits on the x-axis as long as it likes, then leaps off into a cubic at any moment.

Determinism just died

For this IVP the future is not determined by the present. The system can choose, at any time, whether to keep sitting at zero or to leap off. No physical theory built on this equation could ever be reproducible.

The root cause is local: at $y = 0$ the partial derivative $\partial f / \partial y = 2\,y^{-1/3}$ blows up to $+\infty$ . The right-hand side is continuous, but it is far too steep in the $y$ direction near the x-axis. The interactive visualiser below lets you feel this failure with your own mouse.

Interactive: When Does Uniqueness Fail?

Switch between equations using the buttons above the canvas. For the cube-root and square-root cases, watch the dashed purple family of "branching" solutions passing through the highlighted yellow point. Click anywhere to launch your own numerical solution — for the Lipschitz case ( $dy/dx = y$ ) every click follows the same exponential law; for the non-Lipschitz cases nearby clicks can split apart dramatically.

Loading existence & uniqueness explorer…

What to look for

On $dy/dx = y$ , two clicks at the same point trace the same curve up to numerical noise.
On $dy/dx = 3 y^{2/3}$ , every dashed line is a perfectly legal solution through $(0, 0)$ . Determinism is locally broken on an entire interval.
The slope field looks tame everywhere — the catastrophe is hidden in how fast slopes change with y, not in their absolute size.

Lipschitz Continuity: The Fix

The mathematical patch that pins down a unique solution is a mild but crucial smoothness assumption on $f$ in the $y$ direction.

Definition (Lipschitz in y)

The function $f(x, y)$ is Lipschitz in y on a region $R$ if there exists a constant $L \ge 0$ such that for every two points $(x, y_1), (x, y_2) \in R$ :

|f(x, y_1) - f(x, y_2)| \;\le\; L\,|y_1 - y_2|.

The smallest such $L$ is called the Lipschitz constant.

What Lipschitz Means Geometrically

Lipschitz means that the secant slope between two points stacked vertically can never be larger than $L$ . Equivalently — if $f$ is differentiable in $y$ — it just means $|\partial f/\partial y| \le L$ .

Right-hand side	∂f/∂y	Lipschitz in y on ℝ²?
f(x, y) = y	1	Yes (L = 1)
f(x, y) = sin(xy)	x cos(xy) → bounded only when x is bounded	Locally yes; globally no
f(x, y) = y²	2y → unbounded	Locally yes (any bounded box); globally no
f(x, y) = 3 y^(2/3)	2 y^(−1/3) → ∞ at y = 0	No — fails at the x-axis

Lipschitz vs continuously differentiable

Every function with bounded partial derivative is Lipschitz, but the converse fails: $f(y) = |y|$ is Lipschitz with $L = 1$ even though it is not differentiable at zero. Lipschitz is the right hypothesis for uniqueness; it allows kinks.

Interactive: The Lipschitz Cone

For each $f$ in the picker, the visualiser below draws the steepest pair of lines that $f$ could possibly produce on a yellow disk around the clicked point. If those cyan lines stay finite as you slide the radius, $f$ is Lipschitz there. If the local $L$ shoots to infinity as the disk crosses a bad point, you have just witnessed uniqueness failing.

Loading Lipschitz visualization…

The Picard–Lindelöf Theorem

Theorem (Picard–Lindelöf)

Suppose $f(x, y)$ is continuous on a rectangle $R = \{|x - x_0| \le a,\;|y - y_0| \le b\}$ and is Lipschitz in y on R with constant $L$ . Let $M = \max_R |f|$ and $h = \min(a,\;b/M)$ . Then the IVP

\frac{dy}{dx} = f(x, y),\quad y(x_0) = y_0

has a unique solution $y(x)$ defined on $[x_0 - h,\; x_0 + h]$ .

Three pieces of information come bundled in the theorem:

Existence. The solution exists.
Uniqueness. Two solutions of the same IVP must agree on the entire interval $[x_0 - h, x_0 + h]$ .
Length estimate. The interval has length $2h$ , where $h \le b/M$ . Steep right-hand sides (large $M$ ) shrink the guaranteed interval.

A useful sufficient condition

If $f$ and $\partial f/\partial y$ are continuous on a rectangle around $(x_0, y_0)$ , then the Picard–Lindelöf hypothesis is automatically satisfied. This is what you usually check in practice.

Picard Iteration: Solving by Successive Approximation

Picard's proof is constructive — it tells you how to build the unique solution, not just that it exists. The trick is to convert the differential equation into an integral equation:

y(x) \;=\; y_0 + \int_{x_0}^{x} f(t,\, y(t))\,dt.

This identity holds for any candidate solution. It motivates the recursion:

y_{k+1}(x) = y_0 + \int_{x_0}^{x} f\bigl(t,\, y_k(t)\bigr)\,dt, \qquad y_0(x) \equiv y_0.

On the interval where Picard–Lindelöf applies, this sequence converges uniformly to the (unique) solution. The mechanism is a contraction: the operator that maps $y_k \mapsto y_{k+1}$ shrinks the gap between any two candidates by a factor of $Lh$ per step, and $h$ is chosen small enough that $Lh < 1$ .

Interactive: Watching Picard Converge

For the canonical example $dy/dx = y,\; y(0) = 1$ , the recursion produces $y_n(x) = \sum_{k=0}^{n} x^k/k!$ — the partial sums of the Taylor series for $e^x$ . Slide the iteration index to watch the cyan curve climb to the dashed white truth line.

Loading Picard visualizer…

Worked Example by Hand

Apply Picard's recursion to $dy/dx = y,\; y(0) = 1$ step by step. The Lipschitz constant on any rectangle is $L = 1$ , so the Picard–Lindelöf hypothesis holds — uniqueness is guaranteed, and our successive iterates must all converge to the same limit.

▶ Solution (click to expand and work it by hand)

Step 0 — initial guess. Take $y_0(x) \equiv 1$ . This already satisfies $y_0(0) = 1$ ; it just does not solve the ODE yet.

Step 1. $y_1(x) = 1 + \int_0^x y_0(t)\,dt = 1 + \int_0^x 1\,dt = 1 + x.$

Step 2. $y_2(x) = 1 + \int_0^x (1 + t)\,dt = 1 + x + \tfrac{x^2}{2}.$

Step 3. $y_3(x) = 1 + \int_0^x \bigl(1 + t + \tfrac{t^2}{2}\bigr)\,dt = 1 + x + \tfrac{x^2}{2} + \tfrac{x^3}{6}.$

Pattern. By induction $y_n(x) = \sum_{k=0}^{n} \tfrac{x^k}{k!}$ — the partial Taylor sum of $e^x$ . Letting $n \to \infty$ gives $y(x) = e^x$ , the unique solution.

Numerical check at x = 1.

n	y_n(1)	e ≈ 2.718282	\|error\|
0	1.000000	2.718282	1.7183
1	2.000000	2.718282	0.7183
2	2.500000	2.718282	0.2183
3	2.666667	2.718282	0.0516
4	2.708333	2.718282	0.0099
5	2.716667	2.718282	1.6 × 10⁻³
6	2.718056	2.718282	2.3 × 10⁻⁴

Sanity check. The error from step $n$ is the Taylor remainder, which is at most $e\cdot x^{n+1}/(n+1)!$ . For $x = 1, n = 5$ this gives $e/720 \approx 0.0038$ ; our observed error $0.0016$ sits comfortably below the bound.

Now apply uniqueness. Because $f(x, y) = y$ is Lipschitz in $y$ (constant $L = 1$ ), Picard–Lindelöf says $e^x$ is the only solution. So if you ever spot a paper that proposes another solution to this IVP — it is wrong.

Peano's Theorem: Existence Without Uniqueness

Theorem (Peano)

If $f$ is continuous on a rectangle around $(x_0, y_0)$ , then the IVP $y' = f(x, y),\;y(x_0) = y_0$ has at least one solution defined on some interval $[x_0 - h, x_0 + h]$ .

Peano's theorem is what saves the cube-root example from being nonsense. There, $f(x, y) = 3 y^{2/3}$ is continuous everywhere, so a solution exists through every initial point. Peano simply does not promise it is unique.

Hypothesis on f	Existence?	Uniqueness?	Theorem
Continuous	Yes	Maybe not	Peano
Continuous + Lipschitz in y	Yes	Yes	Picard–Lindelöf
Discontinuous	Not guaranteed	—	—

Choose the right theorem

When modelling, always reach for Picard–Lindelöf first — uniqueness is what makes a model predictive. Fall back to Peano only when your $f$ is genuinely non-Lipschitz, and then investigate why: physical models almost never want non-uniqueness, so a failure of Lipschitz is usually a flag that the model is missing something.

Continuous Dependence on Initial Conditions

Uniqueness is great in theory, but in practice we never know $y_0$ exactly. Measurement noise, rounding, and missing decimals all wiggle it. The same proof that gives uniqueness also tells us how much our solution wiggles back:

\bigl|y(x;\, y_0)\, -\, y(x;\, \tilde y_0)\bigr| \;\le\; |y_0 - \tilde y_0|\,e^{L|x - x_0|}.

This is sometimes called the Grönwall estimate. The Lipschitz constant $L$ controls how fast nearby trajectories can diverge. For $dy/dx = y$ with $L = 1$ , two trajectories starting $10^{-6}$ apart can be a factor of $e^{10} \approx 22\,000$ apart after only ten time units. This is the inequality behind every numerical error bound you ever encounter in scientific computing.

Lyapunov exponents and chaos

When the local Lipschitz constant $L$ is large, neighbouring trajectories spread quickly. The long-time average of $\log L$ along an orbit is the Lyapunov exponent. A positive exponent is the mathematical definition of deterministic chaos.

The Maximal Interval of Existence

Picard–Lindelöf is a local theorem. It guarantees a unique solution on a possibly tiny interval. The natural question is: how far can we extend it?

\frac{dy}{dx} = y^2, \quad y(0) = 1.

Solving by separation: $y(x) = 1/(1 - x)$ . The right-hand side is smooth — but the solution blows up at x = 1. We say the solution escapes in finite time.

The maximal interval of existence $(\alpha, \omega)$ is the largest open interval on which the unique solution lives. Picard–Lindelöf can be iterated repeatedly to extend the local interval as long as $y(x)$ stays in the region where $f$ is Lipschitz. The solution can only stop existing for two reasons:

Blow-up: $|y(x)| \to \infty$ as $x \to \omega^{-}$ .
Boundary escape: the trajectory reaches the boundary of the domain where $f$ is defined.

Reading the maximal interval off a solution

The interval depends on the initial condition. For $y' = y^2$ with $y(0) = c > 0$ the solution is $y(x) = c/(1 - c x)$ and the blow-up time is $1/c$ . Bigger initial condition → faster escape.

Python: Picard Iteration From Scratch

Let's implement Picard's recursion in plain Python so you can see every quadrature node and every iterate. We'll apply it to $dy/dx = y, \; y(0) = 1$ — the same example we worked by hand — and print the convergence table.

Picard Iteration in Pure Python

🐍picard_iteration.py

Explanation(16)

Code(53)

1Imports

`math` gives us the true reference value `math.exp(1.0) = e`. `typing` lets us annotate `f` as a callable taking two floats and returning a float — this is the signature of every first-order ODE right-hand side.

3Function signature

We accept the right-hand side `f`, the initial point `(x0, y0)`, the target `x_target` where we want to know the solution, the number of Picard iterations `n_iters`, and `n_quad` quadrature nodes used to approximate the integral.

EXAMPLE

picard_iterate(f=λ x,y: y, x0=0, y0=1, x_target=1, n_iters=6)

18Set up the quadrature grid

We split the interval [x0, x_target] into n_quad equal pieces. `ts[i]` is the i-th node and `h` is the step size. We will use the trapezoidal rule on every pair of adjacent nodes.

EXECUTION STATE

ts[0] = 0.0

ts[100] = 0.5

ts[200] = 1.0

h = 0.005

22Initial guess y_0

Picard's recursion starts from any continuous function passing through (x0, y0). The flat constant `y_0(t) ≡ y0` is the cheapest. We store it as a list of values at every quadrature node.

EXECUTION STATE

y_prev = [1.0, 1.0, …, 1.0] (length 201)

23Result collector

`y_values_at_target` will store y_k(x_target) for every k. The first entry is y_0(x_target) which is just y0 because the initial guess is constant.

25Outer loop — one Picard step per iteration

Each pass through this loop turns y_k into y_{k+1}. The recursion is y_{k+1}(x) = y0 + integral from x0 to x of f(t, y_k(t)) dt.

LOOP TRACE · 3 iterations

k = 0 (build y_1 from y_0 ≡ 1)

integrand f(t, y_0) = y_0(t) = 1 for all t

y_1(t) = 1 + ∫_0^t 1 du = 1 + t

y_1(1) = 2.0

k = 1 (build y_2 from y_1 = 1 + t)

integrand f(t, y_1) = 1 + t

y_2(t) = 1 + t + t²/2

y_2(1) = 2.5

k = 2 (build y_3 from y_2 = 1 + t + t²/2)

integrand f(t, y_2) = 1 + t + t²/2

y_3(t) = 1 + t + t²/2 + t³/6

y_3(1) = ≈ 2.6667

26Anchor at the initial condition

Every Picard iterate must satisfy y_{k+1}(x0) = y0 — that's the definition. We seed the new list with y0 and grow it forward.

27Running integral starts at zero

We accumulate the integral one trapezoidal piece at a time. The running total at the i-th node equals ∫ from x0 to ts[i+1] of f(t, y_k(t)) dt — exactly the displacement of the next iterate from y0.

28Inner loop — trapezoidal rule

We do not have a closed form for the integral, so we approximate it numerically. Trapezoidal rule averages the left and right values of the integrand and multiplies by the width. Over enough nodes it is more than accurate enough.

30Left and right values of f

We evaluate the right-hand side `f` at the two endpoints of the current quadrature interval, using the previous iterate y_k for the second argument.

EXAMPLE

fl = f(0.000, y_prev[0]) = 1.000   ;   fr = f(0.005, y_prev[1]) ≈ 1.005

32Accumulate the trapezoid area

Each small trapezoid contributes (h/2)·(fl + fr) to the integral. Summing them is just Simpson's less-accurate cousin — but for a single Picard step this is more than enough.

33Store the next iterate

y_{k+1}(ts[i+1]) = y0 + (running integral so far). We append it to the new list as we sweep right.

35Promote y_next to y_prev for the next outer loop

Before the next Picard step we forget the older iterate. Memory stays bounded.

36Record the new value at x_target

For the demo we only care about how y_k(x_target) evolves with k. The list `y_values_at_target` will let us print a convergence table.

41Demo: the canonical exponential ODE

f(x, y) = y is the simplest non-trivial example. Its closed-form solution is y(x) = e^x, so we have a ground truth to compare against. The function ignores `_x` because the right-hand side does not depend on x.

45Call the routine and compare

We iterate six times. The printed table makes the convergence rate visible: each Picard step roughly improves the error by a factor of x/k+1, which is the next Taylor remainder term.

EXAMPLE

y_0(1) = 1.000000  |y_0(1) - e| = 1.7183e+00
y_1(1) = 2.000000  |y_1(1) - e| = 7.1828e-01
y_2(1) = 2.500000  |y_2(1) - e| = 2.1828e-01
y_3(1) = 2.666667  |y_3(1) - e| = 5.1615e-02
y_4(1) = 2.708333  |y_4(1) - e| = 9.9485e-03
y_5(1) = 2.716667  |y_5(1) - e| = 1.6152e-03
y_6(1) = 2.718056  |y_6(1) - e| = 2.2607e-04
e        = 2.718282

37 lines without explanation

1import math
2from typing import Callable, List
3
4def picard_iterate(
5    f: Callable[[float, float], float],
6    x0: float,
7    y0: float,
8    x_target: float,
9    n_iters: int,
10    n_quad: int = 200,
11) -> List[float]:
12    """
13    Compute Picard's successive approximations to the IVP
14        dy/dx = f(x, y),    y(x0) = y0.
15
16    We build y_{k+1}(x) = y0 + ∫_{x0}^{x} f(t, y_k(t)) dt
17    by storing each y_k as a dense table of (t, value) samples.
18    """
19    # Discretise the interval [x0, x_target] with n_quad subintervals.
20    ts = [x0 + (x_target - x0) * i / n_quad for i in range(n_quad + 1)]
21    h = (x_target - x0) / n_quad
22
23    # y_0(t) ≡ y0 — the constant initial guess.
24    y_prev = [y0 for _ in ts]
25    y_values_at_target: List[float] = [y_prev[-1]]
26
27    for k in range(n_iters):
28        y_next = [y0]                       # y_{k+1}(x0) = y0 always
29        running_integral = 0.0
30        for i in range(n_quad):
31            # Trapezoidal rule on [t_i, t_{i+1}]
32            fl = f(ts[i], y_prev[i])
33            fr = f(ts[i + 1], y_prev[i + 1])
34            running_integral += 0.5 * h * (fl + fr)
35            y_next.append(y0 + running_integral)
36
37        y_prev = y_next
38        y_values_at_target.append(y_prev[-1])
39
40    return y_values_at_target
41
42
43# Demo: dy/dx = y, y(0) = 1.  True solution is y = e^x.
44def f(_x: float, y: float) -> float:
45    return y
46
47approx = picard_iterate(f, x0=0.0, y0=1.0, x_target=1.0, n_iters=6)
48truth = math.exp(1.0)
49
50for k, val in enumerate(approx):
51    err = abs(val - truth)
52    print(f"y_{k}(1) = {val:.6f}    |y_{k}(1) - e| = {err:.4e}")
53print(f"e        = {truth:.6f}")

Run this and you reproduce the table from the worked example to six decimal places. The convergence is roughly factorial: each step kills about one more decimal because the next dropped Taylor term is $x^{n+1}/(n+1)!$ .

PyTorch: Sensitivity via Autograd

The Grönwall estimate $|\Delta y(x)| \le |\Delta y_0|\, e^{L|x - x_0|}$ is an inequality. For $dy/dx = y$ , equality is achieved — and we can verify it experimentally by asking PyTorch to differentiate the numerical solution with respect to the initial condition.

Sensitivity of y(T) to y₀ with PyTorch Autograd

🐍sensitivity_autograd.py

Explanation(11)

Code(36)

1Import PyTorch

We use PyTorch only as an autograd engine, not for any neural network. Every tensor operation below is recorded so that we can ask for derivatives with `torch.autograd.grad`.

7rk4_solve — a differentiable ODE integrator

Runge–Kutta-4 is a fourth-order accurate explicit integrator. Crucially every line of it is just arithmetic on tensors, so the whole step is differentiable end-to-end.

10Step size h

We divide [0, T] into `n_steps` equal pieces. Smaller h means higher accuracy, at the cost of a deeper autograd graph.

11Initial state

`x` is the independent variable (a plain tensor, no gradient). `y` starts at `y0`, which DOES require gradient. PyTorch will trace every operation on `y0` from here on.

13Inner RK4 stage

k1, k2, k3, k4 are the standard four slopes. Each call to `f` reuses the same right-hand side; autograd records each one. After the loop, `y` is a function of `y0` going through 800 multiplies and 200 additions.

LOOP TRACE · 3 iterations

step 0 (x = 0, y = 1)

k1 = f(0, 1) = 1.0

k2 = f(0.0025, 1.0025) ≈ 1.0025

k3 = ≈ 1.002503

k4 = ≈ 1.005013

Δy = h · weighted_avg ≈ 0.005012

step 1 (x = 0.005, y ≈ 1.005012)

Δy = h · weighted_avg ≈ 0.005037

new y = ≈ 1.010050

after all 200 steps (x = 1.0)

y = ≈ 2.718282 (matches e)

19Return y at the final time

y(T) is a tensor whose computational graph traces back through every RK4 stage to the leaf `y0`. That graph is exactly what autograd needs.

22Define the right-hand side

For dy/dx = y the right-hand side is just y itself. Note the `_x` underscore — we accept the argument so the integrator signature stays generic but we never use it.

26Differentiable initial condition

`requires_grad=True` is the only line that turns this script from a numerical demo into a sensitivity analysis. Without it autograd has nothing to differentiate with respect to.

EXAMPLE

y0 = tensor(1.0, requires_grad=True)

29Run the integrator forward

200 RK4 steps from x = 0 to x = 1. The output `yT` is a scalar tensor close to e ≈ 2.718282.

EXECUTION STATE

yT = tensor(2.718282, grad_fn=<AddBackward0>)

32Ask autograd for dy(T)/dy0

`torch.autograd.grad(output, input)` returns ∂output/∂input by reverse-mode differentiation through the entire 200-step graph. We index `[0]` because autograd returns a tuple — one element per input.

34Compare with the Picard–Lindelöf prediction

The theorem says |y(T; y0) − y(T; y0+δ)| ≤ δ · e^(L·T) where L = 1 for this ODE. Here L·T = 1, so the bound is exactly e. The empirical sensitivity dy(T)/dy0 must equal e^T = e. PyTorch confirms it to floating-point precision.

EXAMPLE

y(T)              = 2.718279   (truth e^T = 2.718282)
dy(T)/dy0         = 2.718279   (Picard–Lindelöf prediction e^T)
Lipschitz bound L = 1   ⇒   |y(T)-y(T;y0+δ)| ≤ δ · e^(L·T) = δ · e ≈ 2.7183·δ

25 lines without explanation

1import torch
2
3# Continuous dependence: how much does y(T) move if we wiggle y0?
4# We solve dy/dx = y from x = 0 to x = T with RK4 and let PyTorch's
5# autograd compute  dy(T)/dy0  directly.
6
7def rk4_solve(f, y0: torch.Tensor, T: float, n_steps: int) -> torch.Tensor:
8    """Forward Euler? No — that's noisy. RK4 is differentiable too."""
9    h = T / n_steps
10    x = torch.tensor(0.0)
11    y = y0
12    for _ in range(n_steps):
13        k1 = f(x, y)
14        k2 = f(x + h / 2, y + h * k1 / 2)
15        k3 = f(x + h / 2, y + h * k2 / 2)
16        k4 = f(x + h, y + h * k3)
17        y = y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
18        x = x + h
19    return y
20
21
22def f(_x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
23    return y                                # dy/dx = y
24
25
26y0 = torch.tensor(1.0, requires_grad=True)  # ← variable we differentiate w.r.t.
27T = 1.0
28
29yT = rk4_solve(f, y0, T, n_steps=200)       # y(T) ≈ e^T
30
31# PyTorch tracks every op above and now computes dy(T)/dy0 with one call.
32sensitivity = torch.autograd.grad(yT, y0)[0]
33
34print(f"y(T)              = {yT.item():.6f}   (truth e^T = {2.718282:.6f})")
35print(f"dy(T)/dy0         = {sensitivity.item():.6f}   (Picard–Lindelöf prediction e^T)")
36print(f"Lipschitz bound L = 1   ⇒   |y(T)-y(T;y0+δ)| ≤ δ · e^(L·T) = δ · e ≈ {2.718282:.4f}·δ")

The autograd answer $dy(T)/dy_0 \approx e^T$ reproduces the Grönwall bound exactly. This is precisely how neural-ODE training propagates gradients — the only difference is that $f$ is a neural network instead of an elementary function.

Why This Matters in Practice

Field	What existence/uniqueness buys you
Physics simulation	Knowing the trajectory is uniquely determined means a higher-order integrator is safe to use; otherwise it can fork between numerically equivalent branches.
Control theory	The state feedback law u = K(x) plus dynamics x' = f(x, u) yields a well-posed closed-loop system only if the combined right-hand side is Lipschitz.
Neural ODEs	Backpropagation through an ODE solver requires the right-hand side neural network to be Lipschitz. The Lipschitz constant of f directly controls how stable training is.
Optimization (gradient flow)	Gradient descent θ' = -∇L(θ) is well-posed when ∇L is Lipschitz — i.e. when the loss is L-smooth. This is the assumption behind every convergence theorem.
Population biology	Logistic growth y' = ry(1-y/K) has bounded ∂f/∂y on any compact set, so populations have well-defined trajectories. Models that violate Lipschitz are usually missing a regularising effect.

Common Pitfalls

Continuous ≠ Lipschitz

A function can be perfectly continuous and still fail Lipschitz at a single point — $y^{2/3}$ and $\sqrt{|y|}$ are the canonical examples. Always check the $y$ -direction smoothness separately.

Local vs global

Picard–Lindelöf gives existence on $[x_0 - h, x_0 + h]$ only. Even when $f$ is smooth everywhere, the solution may blow up at finite time (as in $y' = y^2$ ). Do not extrapolate uniqueness past the maximal interval.

Forgetting initial conditions

Without $y(x_0) = y_0$ there is nothing for either theorem to act on — they pin down a unique solution to the IVP, not a unique solution to the bare ODE.

When uniqueness fails, ask why

A non-unique IVP is almost always a sign that the model is missing something. Adding a small viscosity term, a tiny noise, or a higher derivative usually restores Lipschitz — and physical reality.

Summary

Concept	What it says
Peano's theorem	Continuous f ⇒ at least one local solution
Picard–Lindelöf	Continuous + Lipschitz in y ⇒ unique local solution
Lipschitz constant L	Bounds \|f(x, y₁) − f(x, y₂)\| by L\|y₁ − y₂\|
Picard iteration	Constructive scheme that converges to the unique solution
Grönwall estimate	\|y(x; y₀) − y(x; ỹ₀)\| ≤ \|y₀ − ỹ₀\| e^{L\|x − x₀\|}
Maximal interval	Largest open interval on which the unique solution lives
Blow-up time	Finite time at which \|y(x)\| → ∞ — endpoint of maximal interval

Key Takeaways

Existence and uniqueness are separate guarantees. Peano delivers the first; Picard–Lindelöf delivers both — and the second is what makes a differential equation a predictive model.
The decisive hypothesis is Lipschitz in y, which really means "the partial derivative $\partial f/\partial y$ is locally bounded." Smoothness in $x$ is not enough.
Picard's constructive proof is also a numerical algorithm. For $dy/dx = y, y(0) = 1$ the iterates are exactly the Taylor partial sums of $e^x$ .
The same Lipschitz constant $L$ that guarantees uniqueness also bounds how fast nearby trajectories can spread. This is the Grönwall estimate and it is the root of every stability bound in numerical ODE solving and chaos theory.
Smooth right-hand sides can still produce solutions with finite maximal intervals. Uniqueness is local; always check whether your solution survives to the time you care about.

The Essence of Existence and Uniqueness:

"An initial value problem is a deterministic law of nature precisely when the right-hand side is Lipschitz in the state. The Lipschitz constant is the speed limit on how quickly the future can forget where it started."

Coming Next: Chapter 21 begins our toolbox of explicit solution techniques. We'll start with Linear First-Order Equations, where the integrating factor method turns every such ODE into a single antiderivative — cleanly, and with uniqueness automatically guaranteed.