Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section you will be able to:

Explain why the method of undetermined coefficients fails for forcings like $\sec t$ , $\tan t$ , or $1/t$ , and why a fundamentally different method is needed.
State and motivate the central guess of variation of parameters: replace the constants $C_1, C_2$ in the homogeneous solution by functions $u_1(t), u_2(t)$ .
Derive the imposed constraint $u_1' y_1 + u_2' y_2 = 0$ from the desire to avoid second derivatives of $u_1, u_2$ .
Write down the Wronskian formulas $u_1' = -y_2 g / (aW)$ and $u_2' = y_1 g / (aW)$ from Cramer's rule, and integrate them to obtain the particular solution.
Solve the canonical example $y'' + y = \sec t$ from scratch.
Implement a general numerical variation-of-parameters solver in Python, and verify any closed-form candidate with PyTorch autograd.

The Big Picture: Why a New Method?

In the previous section we learned the method of undetermined coefficients: if the right-hand side $g(t)$ is a polynomial, an exponential, a sine, a cosine, or a finite product of these, then we can guess a particular solution of the same form with unknown coefficients and solve for them.

But what happens when $g(t)$ is something like $\sec t$ , $\tan t$ , $\ln t$ , or $1/(1+e^t)$ ? There is no finite combination of polynomials/exponentials/trig functions that closes under differentiation in the way $\sin t$ closes onto $\cos t$ . You cannot "guess and adjust." The method simply has nothing to feed on.

The motivating problem

Solve $y'' + y = \sec t$ on the interval $(-\pi/2,\,\pi/2)$ .

Method of undetermined coefficients: fails. There is no template to try.

Variation of parameters: produces the exact closed-form answer $y_p = \cos t \,\ln|\cos t| + t\sin t$ in half a page.

Variation of parameters is the universal method for finding particular solutions of linear ODEs. It works for any continuous $g(t)$ , no guessing required, provided we know the homogeneous solutions $y_1, y_2$ . Its price is one integral per unknown — which sometimes evaluates to elementary functions and sometimes leaves you with a clean integral representation. Either way, the method always terminates with an answer.

The Conceptual Leap: Let the Constants Vary

We start from the homogeneous solution

y_h(t) = C_1\,y_1(t) + C_2\,y_2(t)

with $C_1, C_2$ constant. This solves the homogeneous equation but not the forced one. So Lagrange's beautiful idea (1774) was: what if the constants weren't constant? Try

\boxed{\,y_p(t) = u_1(t)\,y_1(t) + u_2(t)\,y_2(t)\,}

with $u_1, u_2$ now unknown functions of $t$ . That is the entire leap. The name "variation of parameters" means literally "allowing the parameters $C_1, C_2$ to vary with $t$ ."

Why this is more than a syntactic trick

One unknown function (a single particular solution) is being replaced by two unknown functions. That extra freedom is what buys us the power to handle any forcing — but it also means the problem looks under-determined. We will fix that by imposing one extra equation of our own choosing. The clever part is the choice.

Deriving the 2×2 System

Differentiate $y_p = u_1 y_1 + u_2 y_2$ by the product rule:

y_p' = (u_1' y_1 + u_2' y_2) + (u_1 y_1' + u_2 y_2').

Look at the term in the first parenthesis. If we were to take $y_p''$ next, that parenthesis would differentiate into terms involving $u_1'', u_2''$ — making the problem genuinely worse than where we started. So we make a strategic choice and declare it equal to zero:

\textbf{Constraint:}\quad u_1' y_1 + u_2' y_2 = 0.

This is the missing equation that pins down the two unknown functions. With it, $y_p' = u_1 y_1' + u_2 y_2'$ — exactly the same shape as $y_p$ , just with primed basis functions.

Differentiate once more:

y_p'' = u_1' y_1' + u_2' y_2' + u_1 y_1'' + u_2 y_2''.

Now substitute $y_p, y_p', y_p''$ into $a y'' + b y' + c y = g$ :

a(u_1' y_1' + u_2' y_2') + u_1\underbrace{(a y_1'' + b y_1' + c y_1)}_{=0} + u_2\underbrace{(a y_2'' + b y_2' + c y_2)}_{=0} = g.

The two underbraced quantities are zero because $y_1, y_2$ solve the homogeneous equation. That collapses the whole expression to the clean second equation:

\textbf{From the ODE:}\quad u_1' y_1' + u_2' y_2' = \dfrac{g}{a}.

Combine the two boxed equations into a linear system in the unknowns $u_1', u_2'$ :

\begin{bmatrix} y_1 & y_2 \\[2pt] y_1' & y_2' \end{bmatrix}\begin{bmatrix} u_1' \\[2pt] u_2' \end{bmatrix} = \begin{bmatrix} 0 \\[2pt] g/a \end{bmatrix}.

The coefficient matrix is exactly the Wronskian matrix we met in section 22.01.

The Wronskian Formulas

The determinant of that 2×2 matrix is the Wronskian $W = y_1 y_2' - y_2 y_1'$ . Since $y_1, y_2$ are a fundamental set of homogeneous solutions, $W(t) \neq 0$ everywhere on the interval (Abel's theorem). So Cramer's rule gives:

u_1'(t) = \frac{\begin{vmatrix} 0 & y_2 \\ g/a & y_2' \end{vmatrix}}{W(t)} = -\frac{y_2(t)\,g(t)}{a(t)\,W(t)},

u_2'(t) = \frac{\begin{vmatrix} y_1 & 0 \\ y_1' & g/a \end{vmatrix}}{W(t)} = +\frac{y_1(t)\,g(t)}{a(t)\,W(t)}.

Integrate both:

u_1(t) = -\int \frac{y_2(t)\,g(t)}{a(t)\,W(t)}\,dt,\qquad u_2(t) = +\int \frac{y_1(t)\,g(t)}{a(t)\,W(t)}\,dt.

And the particular solution is

\boxed{\;y_p(t) = u_1(t)\,y_1(t) + u_2(t)\,y_2(t).\;}

The minus sign is not optional

The formula for $u_1'$ carries a minus sign; $u_2'$ is plus. Forgetting the minus produces a wrong $y_p$ that fails to satisfy the ODE. The sign comes from the cofactor expansion of the determinant in Cramer's rule — there is no way to massage it away.

The Recipe in Five Steps

Given a linear second-order ODE $a y'' + b y' + c y = g(t)$ :

Solve the homogeneous equation first. Obtain $y_1, y_2$ by any method — characteristic equation, reduction of order, table lookup. Without these, variation of parameters cannot even start.
Compute the Wronskian $W = y_1 y_2' - y_2 y_1'$ . Often a clean closed form (e.g. $W = 1$ for $y_1=\cos t, y_2=\sin t$ ).
Form $u_1' = -y_2 g / (aW)$ and $u_2' = +y_1 g / (aW)$ .
Integrate. Constants of integration can be set to $0$ — they will be absorbed by $C_1, C_2$ when we add the homogeneous part.
Assemble $y(t) = C_1 y_1 + C_2 y_2 + u_1 y_1 + u_2 y_2$ and apply the two initial conditions to pin down $C_1, C_2$ .

Worked Example: $y'' + y = \sec t$

Solve $y'' + y = \sec t$ on $(-\pi/2,\,\pi/2)$ with initial conditions $y(0) = 0,\; y'(0) = 0$ .

Click to expand the full hand calculation

Step 1 — Homogeneous solution.

Characteristic equation: $r^2 + 1 = 0 \Rightarrow r = \pm i$ . Complex conjugates with $\alpha = 0, \beta = 1$ , so

y_1(t) = \cos t,\qquad y_2(t) = \sin t.

Step 2 — Wronskian.

W = y_1 y_2' - y_2 y_1' = \cos t \cdot \cos t - \sin t \cdot (-\sin t) = \cos^2 t + \sin^2 t = 1.

The cleanest possible Wronskian. That happy fact is unique to the $(\cos, \sin)$ basis; in other problems $W$ can be an exponential or a polynomial.

Step 3 — Form u₁′ and u₂′.

Here $a = 1$ and $g(t) = \sec t = 1/\cos t$ . So

u_1'(t) = -\frac{y_2 g}{aW} = -\frac{\sin t \cdot \sec t}{1 \cdot 1} = -\tan t,

u_2'(t) = +\frac{y_1 g}{aW} = +\frac{\cos t \cdot \sec t}{1} = 1.

Step 4 — Integrate.

$u_1(t) = -\int \tan t\,dt = \ln|\cos t| + K_1$ . Set $K_1 = 0$ .

$u_2(t) = \int 1\,dt = t + K_2$ . Set $K_2 = 0$ .

The constants of integration are dropped because any additive constant in $u_1$ simply contributes a multiple of $y_1 = \cos t$ to the answer — which is absorbed by $C_1$ when we add the homogeneous part. Same for $K_2$ .

Step 5 — Particular solution.

y_p(t) = u_1 y_1 + u_2 y_2 = \cos t \cdot \ln|\cos t| + t \sin t.

Step 6 — Verify by direct substitution.

First derivative: $y_p' = -\sin t \ln|\cos t| + \cos t \cdot \tfrac{-\sin t}{\cos t} + \sin t + t \cos t$ $= -\sin t \ln|\cos t| + t\cos t$ (the two $\sin t$ terms cancel).

Second derivative: $y_p'' = -\cos t \ln|\cos t| + \tfrac{\sin^2 t}{\cos t} + \cos t - t \sin t.$

Sum: $y_p'' + y_p$ $= \tfrac{\sin^2 t}{\cos t} + \cos t$ $= \tfrac{\sin^2 t + \cos^2 t}{\cos t}$ $= \tfrac{1}{\cos t} = \sec t. \;\checkmark$

Step 7 — Apply initial conditions.

General solution: $y(t) = C_1 \cos t + C_2 \sin t + \cos t \ln|\cos t| + t\sin t$ .

At $t=0$ : $y_p(0) = 1\cdot\ln 1 + 0 = 0$ and $y_p'(0) = 0 + 0 = 0$ . So $y(0) = C_1 = 0$ and $y'(0) = C_2 = 0$ .

Step 8 — Final answer.

\boxed{\;y(t) = \cos t \,\ln|\cos t| + t \sin t.\;}

Sanity: at $t = 0.5$ radians, $\cos 0.5 \approx 0.8776$ , $\ln 0.8776 \approx -0.1306$ , $\sin 0.5 \approx 0.4794$ . So $y(0.5) \approx 0.8776 \times (-0.1306) + 0.5 \times 0.4794 \approx -0.1146 + 0.2397 = 0.1251.$ Cross-check this against the interactive explorer below by setting $C_1 = C_2 = 0$ and reading $y_p$ at $t = 0.5$ .

Interactive Variation-of-Parameters Explorer

The explorer below fixes the homogeneous family at $y_1 = \cos t,\, y_2 = \sin t$ (so $W = 1$ ) and lets you pick the forcing $g(t)$ . Watch how the running integrals $u_1(t) = -\int_0^t \sin s\, g(s)\,ds$ and $u_2(t) = +\int_0^t \cos s\, g(s)\,ds$ grow, and how their combination $y_p = u_1 \cos t + u_2 \sin t$ gives the bend in the full solution.

Loading variation-of-parameters explorer…

What to look for

Start with $g(t) = \sec t$ and $C_1 = C_2 = 0$ . The blue curve is exactly the $y_p = \cos t \ln|\cos t| + t\sin t$ from the worked example. Verify by eye that $y_p(0) = 0$ .
Slide $C_1$ up — a cosine wave is layered on top with NO interaction. Linearity in action.
Switch to $g(t) = e^{-t^2}$ . There is no closed-form particular solution, but $y_p$ appears as a perfectly smooth curve, computed by numerical integration of $u_1', u_2'$ .
Switch to the $g(t) = \tan t$ preset and watch $y_p$ develop the same kind of logarithmic feature near $t = \pm \pi/2$ as the sec problem. Both forcings inherit their singularities from $1/\cos t$ .

Why the Constraint $u_1'y_1 + u_2'y_2 = 0$ ?

It might feel arbitrary that we imposed $u_1' y_1 + u_2' y_2 = 0$ out of thin air. Let's see what would have happened without it.

Without the constraint, plugging $y_p = u_1 y_1 + u_2 y_2$ directly into the ODE generates terms with $u_1'', u_2''$ (from differentiating $y_p'$ a second time). That gives one equation in two unknowns of higher order — the problem becomes harder, not easier.

The constraint $u_1' y_1 + u_2' y_2 = 0$ is chosen so that $y_p'$ looks just like $y_p$ with primed basis functions — no $u_1', u_2'$ appear in it. Then $y_p''$ picks up exactly one first-derivative term in $u_1', u_2'$ and the whole substitution remains a linear, first-order system in the unknowns. That keeps the math tractable. In any derivation that "just works out," the constraint is the engineer's choice that makes it so.

It generalizes

For an $n$ -th order linear ODE, variation of parameters introduces $n$ functions $u_1, \dots, u_n$ and $n-1$ imposed constraints (the obvious generalization of the one above), plus one equation from the ODE itself. The system $W \vec{u}' = \vec{b}$ is solved with Cramer's rule via the generalized Wronskian. Every formula in this section extends almost verbatim.

Computation in Python

Below is a general-purpose numerical variation-of-parameters solver. It accepts callable $y_1, y_2$ , their derivatives, the coefficient $a(t)$ , and the forcing $g(t)$ ; performs cumulative trapezoidal integration of $u_1', u_2'$ ; and stitches on the homogeneous part to honor the initial conditions.

Numerical solver via cumulative integration

A general numerical implementation of variation of parameters

🐍variation_of_parameters.py

Explanation(12)

Code(72)

1Imports

NumPy gives us vectorised math on the time grid. From scipy we import cumulative_trapezoid — the running trapezoidal integral. We need a RUNNING integral (not a single number) because the answer u₁(t) and u₂(t) are functions of t, not constants.

4Function header — what does this routine consume?

Variation of parameters needs FIVE inputs that other methods don't. (1) the two homogeneous basis functions y₁, y₂, (2) their derivatives y₁′, y₂′, (3) the coefficient a(t) that multiplies y″ — usually 1, (4) the forcing g(t), and (5) the time grid plus two initial conditions y(0), y′(0). All function inputs are CALLABLES that accept a NumPy array — this lets us vectorise the whole computation.

22Evaluate every homogeneous building block on the grid

Before we can compute u₁′ and u₂′ we need y₁(t), y₂(t), their derivatives, the coefficient a(t), and the forcing g(t) — all sampled at every point of the time grid. After this line we have six 1-D NumPy arrays of equal length. Everything downstream is elementwise array arithmetic.

28Wronskian on the grid

The Wronskian W(t) = y₁(t) y₂′(t) − y₂(t) y₁′(t) is the determinant of the 2×2 coefficient matrix in the linear system we're about to solve. Abel's theorem guarantees W(t) is either NEVER zero on an interval or IDENTICALLY zero. So if W is non-zero at a single point, dividing by it is safe everywhere on the interval. Here y₁=cos, y₂=sin gives W=cos²+sin²=1 — the cleanest possible Wronskian.

32The two derivatives u₁′(t) and u₂′(t)

The whole method collapses to these two formulas: u₁′ = −y₂g/(aW) and u₂′ = y₁g/(aW). Note the sign asymmetry (minus on u₁′, plus on u₂′) — this comes directly from Cramer's rule applied to the 2×2 system [[y₁,y₂],[y₁′,y₂′]] · [u₁′,u₂′]ᵀ = [0, g/a]ᵀ. Forgetting that minus sign is the single most common error in this whole method.

38Cumulative trapezoidal integration

cumulative_trapezoid(f, t) returns, at index i, the trapezoidal estimate of ∫₀^{t_i} f(s) ds. It outputs an array of length len(t)−1, so we prepend 0.0 to restore the original length and to encode u₁(0)=u₂(0)=0 — the convention that lets us derive the homogeneous constants cleanly later. If t doesn't start at 0, you'd instead pin u₁ and u₂ to specific anchor values; we keep things simple by assuming t contains 0.

43Build y_p(t)

Now the particular solution falls out as one line: y_p = u₁(t)·y₁(t) + u₂(t)·y₂(t). This is the SAME formal shape as the homogeneous solution C₁y₁ + C₂y₂ — we just replaced the constants with the running integrals. That replacement is what 'variation of parameters' literally means.

44Derivative of y_p — note the cancellation

By the product rule, y_p′ = (u₁′y₁ + u₂′y₂) + (u₁y₁′ + u₂y₂′). But we designed u₁′y₁ + u₂′y₂ = 0 (the imposed constraint), so y_p′ simplifies to u₁y₁′ + u₂y₂′ — the same shape as y_p but with the derivatives of the basis. This is exactly why the constraint was chosen: it removes ugly second-derivative terms in u₁″, u₂″ that would otherwise appear when we differentiate twice.

48Solve a 2×2 system for the homogeneous constants

Because we integrated u₁, u₂ from 0, in general y_p(0) ≠ y(0) and y_p′(0) ≠ y′(0). We absorb the leftover into the homogeneous part: C₁y₁(0)+C₂y₂(0) = y(0)−y_p(0), and the same for derivatives. The matrix M is the Wronskian matrix at t=0, so it's invertible (otherwise y₁, y₂ would not be a fundamental set in the first place).

53Assemble the full answer

y(t) = C₁y₁(t) + C₂y₂(t) + y_p(t). This single formula combines two ingredients: the SYSTEM'S OWN free behavior (the homogeneous part) and the RESPONSE TO FORCING (the particular part). The principle of superposition for linear ODEs says these add cleanly with no interaction.

58Test on the canonical sec(t) problem

Why sec(t)? Because it's the simplest function that defeats the method of undetermined coefficients: sec(t) is not a finite combination of polynomials, exponentials, or sines/cosines, so you cannot 'guess and adjust'. Variation of parameters handles it without flinching. The closed form y_p = cos(t)ln|cos(t)| + t sin(t) was derived by hand in the worked example above — comparing against it tests our numerics.

66Compare numerical vs closed-form

On 401 grid points spanning roughly [−1.3, 1.3] (avoiding the asymptotes at ±π/2) the trapezoidal integral gets us within ~10⁻⁵ of the exact solution. Pushing N higher trades runtime for accuracy at rate O(N⁻²). For real production use scipy's solve_ivp on the equivalent first-order system, but for understanding the method itself, this direct implementation makes every step transparent.

60 lines without explanation

1import numpy as np
2from scipy.integrate import cumulative_trapezoid
3
4def variation_of_parameters(y1, y2, y1p, y2p, a_coef, g, t, y0, yp0):
5    """
6    Solve  a(t) * y'' + b(t) * y' + c(t) * y = g(t)  on the grid 't'
7    given two linearly independent homogeneous solutions y1, y2 and
8    their derivatives y1p, y2p (all callable on a numpy array).
9
10    Parameters
11    ----------
12    y1, y2     : callables, the homogeneous basis
13    y1p, y2p   : callables, their first derivatives
14    a_coef     : callable, the coefficient on y''  (often just 1.0)
15    g          : callable, the forcing term
16    t          : np.ndarray, time grid (must contain 0.0 for clean IC)
17    y0, yp0    : floats, initial conditions y(0), y'(0)
18
19    Returns
20    -------
21    y : np.ndarray, the full solution evaluated on t
22    info : dict with the homogeneous constants and the building blocks
23    """
24    Y1, Y2 = y1(t), y2(t)
25    Y1p, Y2p = y1p(t), y2p(t)
26    A = a_coef(t)
27    G = g(t)
28
29    # Wronskian W(t) = y1*y2' - y2*y1'
30    W = Y1 * Y2p - Y2 * Y1p
31
32    # u1'(t) = - y2(t) g(t) / (a(t) W(t))
33    # u2'(t) =   y1(t) g(t) / (a(t) W(t))
34    u1p = -Y2 * G / (A * W)
35    u2p =  Y1 * G / (A * W)
36
37    # Integrate from t=0 outward.  cumulative_trapezoid returns an array
38    # one shorter than t, so prepend 0 to recover the same length.
39    u1 = np.concatenate(([0.0], cumulative_trapezoid(u1p, t)))
40    u2 = np.concatenate(([0.0], cumulative_trapezoid(u2p, t)))
41
42    # Particular solution
43    y_p  = u1 * Y1 + u2 * Y2
44    y_pp = u1 * Y1p + u2 * Y2p     # y_p'  (the cross terms cancel by design)
45
46    # Now stitch the homogeneous part to match the initial conditions.
47    # y(0) = C1 y1(0) + C2 y2(0) + y_p(0)
48    # y'(0)= C1 y1'(0)+ C2 y2'(0)+ y_p'(0)
49    M = np.array([[y1(0.0), y2(0.0)],
50                  [y1p(0.0), y2p(0.0)]])
51    rhs = np.array([y0 - y_p[0],  yp0 - y_pp[0]])
52    C1, C2 = np.linalg.solve(M, rhs)
53
54    y = C1 * Y1 + C2 * Y2 + y_p
55    return y, {"C1": C1, "C2": C2, "u1": u1, "u2": u2, "y_p": y_p}
56
57
58# ---- Worked example:  y'' + y = sec(t),  y(0) = 0,  y'(0) = 0  ----
59t = np.linspace(-1.3, 1.3, 401)   # stay clear of t = ±π/2 singularities
60y, info = variation_of_parameters(
61    y1  = np.cos,
62    y2  = np.sin,
63    y1p = lambda s: -np.sin(s),
64    y2p = np.cos,
65    a_coef = lambda s: np.ones_like(s),
66    g      = lambda s: 1.0 / np.cos(s),
67    t = t, y0 = 0.0, yp0 = 0.0,
68)
69
70# Hand-derived closed form: y_p = cos(t)*ln|cos(t)| + t*sin(t)
71y_p_exact = np.cos(t) * np.log(np.abs(np.cos(t))) + t * np.sin(t)
72print("max |y - y_p_exact| =", np.max(np.abs(y - y_p_exact)))

PyTorch View: Verifying the Solution with Autograd

We derived $y_p = \cos t\,\ln|\cos t| + t\sin t$ by hand and called the integral "done." But did we actually get the right answer? The most rigorous check is to plug $y_p$ back into the ODE and confirm that $y_p'' + y_p - \sec t = 0$ at every grid point. Computing those derivatives by hand is tedious. PyTorch's autograd does it for free.

Autograd-based residual check for any candidate solution

🐍vop_residual.py

Explanation(12)

Code(45)

1Import PyTorch

We only need torch — no torch.nn or torch.optim. The whole demo uses raw autograd, the same machinery that powers backpropagation in neural networks. We're going to use it for a completely different purpose: VERIFYING a candidate ODE solution.

3Function header — what is a residual?

Given any candidate function y(t) for the ODE y″ + y = g(t), the RESIDUAL is r(t) = y″(t) + y(t) − g(t). If y is the true solution, r(t) is exactly 0 for all t. If y is a guess that's slightly wrong, r(t) measures HOW wrong it is, pointwise. Plotting r(t) is a fast visual test for any closed-form candidate.

12Re-anchor t with requires_grad=True

Autograd only tracks tensors that have requires_grad=True. We detach+clone the incoming t so the caller's tensor is left unchanged, then mark our copy as requiring gradients. From now on, every operation involving this t is recorded into a computation graph so we can differentiate through it.

13Evaluate y at every grid point

y_fn is the candidate solution as a plain Python callable. Calling it on the (T,)-shaped tensor t produces a (T,)-shaped tensor of y values, with the computation graph attached. We have NOT yet computed any derivative — only the forward pass.

16First derivative via torch.autograd.grad

torch.autograd.grad(outputs=y, inputs=t, grad_outputs=ones_like(y)) is the differentiation primitive. Reading right-to-left: we ask 'what is dy_i/dt_i for every grid point i?'. Because y is a vector, autograd actually computes a vector-Jacobian product — multiplying the Jacobian by grad_outputs (all-ones) gives us dy/dt elementwise, which is exactly the derivative we want when each y_i depends only on t_i (which is true here).

19create_graph=True — why this matters

By default, autograd FREES the intermediate tensors after differentiation to save memory. But we want to take a SECOND derivative next, so the graph that computed yp must persist. Setting create_graph=True keeps everything alive. The cost is roughly 2x memory; the alternative would be to compute y″ from finite differences, which is far less accurate.

23Second derivative — same call, on yp

We just repeat the same trick on yp. Because the graph is still alive, autograd can trace yp back to t and differentiate again, giving y″ on the entire grid. This is a TRUE second derivative computed symbolically through the graph — not a numerical approximation. create_graph=False here because we don't need a third derivative.

29Return the residual

Now we just plug into the ODE: y″ + y − g(t). A true solution makes this identically 0. A near-solution returns small numbers. A bogus candidate returns big numbers.

34Candidate solution from variation of parameters

This is the closed form we derived by hand in the worked example: y_p(t) = cos(t)·ln|cos(t)| + t·sin(t). It is the particular solution that came out of integrating u₁′ = −tan(t) and u₂′ = 1. If our derivation was correct, the residual must vanish.

37Forcing g(t) = sec(t)

Use torch.cos to keep g(t) inside the same autograd-friendly type system as y(t). Avoid mixing math.cos (Python scalar) with torch — that will silently break the graph.

40Run the verification

We sample 401 grid points in [−1.3, 1.3] (avoiding ±π/2 where sec blows up) at float64 precision. With float64 the residual stays below ~10⁻¹², limited only by floating-point round-off. With float32 it would be ~10⁻⁵ — still a clear pass. This is the gold-standard sanity check for any closed-form ODE candidate: if autograd thinks it satisfies the equation, you can publish it.

45What this technique generalizes to

The same pattern verifies solutions to ANY ODE, not just y″+y=sec(t). Replace 'ypp + y - g(t)' with 'a(t)·ypp + b(t)·yp + c(t)·y - g(t)' and you have a universal residual checker. This is how modern physics-informed neural networks (PINNs) train — they minimize the residual of a PDE evaluated via autograd. The trick you just saw is the engine inside PINNs.

33 lines without explanation

1import torch
2
3def vop_residual(t, y_fn, g_fn):
4    """
5    Given a candidate solution y_fn(t) and a forcing g_fn(t) for the ODE
6        y'' + y = g(t),
7    return the pointwise residual  y'' + y - g(t).
8    A correct solution makes this ~0 everywhere.
9
10    We compute y' and y'' with PyTorch autograd — no finite differences,
11    no closed-form differentiation by hand.
12    """
13    t = t.detach().clone().requires_grad_(True)
14    y = y_fn(t)
15
16    # First derivative dy/dt
17    yp = torch.autograd.grad(
18        outputs=y, inputs=t,
19        grad_outputs=torch.ones_like(y),
20        create_graph=True,            # keep graph alive for the 2nd derivative
21    )[0]
22
23    # Second derivative d2y/dt2
24    ypp = torch.autograd.grad(
25        outputs=yp, inputs=t,
26        grad_outputs=torch.ones_like(yp),
27        create_graph=False,
28    )[0]
29
30    return ypp + y - g_fn(t)
31
32
33# Candidate solution from variation of parameters:
34#   y_p(t) = cos(t) * ln|cos(t)| + t * sin(t)
35def y_candidate(t):
36    return torch.cos(t) * torch.log(torch.abs(torch.cos(t))) + t * torch.sin(t)
37
38def g(t):
39    return 1.0 / torch.cos(t)
40
41t = torch.linspace(-1.3, 1.3, 401, dtype=torch.float64)
42res = vop_residual(t, y_candidate, g)
43
44print("max |residual| =", torch.max(torch.abs(res)).item())
45print("mean |residual| =", torch.mean(torch.abs(res)).item())

Why this connects to modern ML

The same idea — "differentiate a candidate solution and demand that the ODE/PDE residual be small" — is the entire training loss of a physics-informed neural network (PINN). A neural network plays the role of $y_p$ ; autograd computes $y_p', y_p''$ ; the residual is minimized with gradient descent. Variation of parameters gives a closed-form answer when the problem is linear; PINNs give a neural answer when it isn't. The autograd plumbing is identical.

Variation of Parameters vs Undetermined Coefficients

Aspect	Undetermined Coefficients	Variation of Parameters
Works for which g(t)?	Only polynomials, exponentials, sin/cos, and products of these	Any continuous g(t), including sec, tan, ln, 1/t, piecewise
Coefficients a, b, c	Must be constants	Can depend on t (provided y₁, y₂ are still known)
Setup cost	Guess a template; solve a tiny algebraic system	Compute Wronskian; integrate two expressions
Hard part	Choosing the right template, handling resonance	Evaluating the two integrals (often the bottleneck)
Always succeeds?	No — fails if g(t) is not in the template family	Yes — always produces an integral representation
When closed form?	Always closed form when applicable	Closed form iff the integrals evaluate elementarily

In practice: try undetermined coefficients first when $g(t)$ looks "nice." Reach for variation of parameters when it doesn't, or when the coefficients are non-constant.

Common Pitfalls

Forgetting to divide by a(t)

The formulas are $u_i' = \pm y_j g / (\mathbf{a}\,W)$ , with the coefficient of $y''$ in the denominator. If your ODE is $2y'' + 8y' + 6y = g(t)$ , divide $g$ by 2 before plugging in, OR keep the factor of $a = 2$ in the denominator. Either is fine; dropping it is not.

Mixing up the signs

$u_1'$ has a minus; $u_2'$ has a plus. The quickest way to remember: the function paired with the OTHER homogeneous solution gets the sign. So $u_1$ uses $y_2$ (and gets the minus), and $u_2$ uses $y_1$ (and gets the plus).

Including the integration constants twice

Any $+K_1$ in $u_1$ contributes $+K_1 y_1$ to $y_p$ — a multiple of the homogeneous solution. So it is absorbed by $C_1$ in the general solution. Setting both $K_1 = K_2 = 0$ is standard and prevents double-counting.

Trying to skip the homogeneous step

You CANNOT do variation of parameters without first knowing $y_1, y_2$ . If the homogeneous equation has variable coefficients and no closed-form basis (e.g. $t^2 y'' + t y' + (t^2 - 1) y = g(t)$ ), you need a different tool entirely (power series, special functions, numerical integration of the homogeneous equation first).

Singularities in g(t)

Variation of parameters happily integrates across smooth regions of $g(t)$ , but the integral diverges where $g$ blows up (e.g. $\sec t$ at $\pm \pi/2$ ). The solution is only valid on intervals where the integrals exist — that's why the worked example restricts to $(-\pi/2,\, \pi/2)$ .

Summary

Variation of parameters is the universal method for finding a particular solution to a linear ODE when undetermined coefficients fails. The recipe:

Find $y_1, y_2$ for the homogeneous problem.
Compute the Wronskian $W = y_1 y_2' - y_2 y_1'$ .
Set $u_1' = -y_2 g / (aW)$ and $u_2' = +y_1 g / (aW)$ .
Integrate to get $u_1(t), u_2(t)$ .
Form $y = C_1 y_1 + C_2 y_2 + u_1 y_1 + u_2 y_2$ and apply initial conditions.

Master formula

y_p(t) = -y_1(t)\int \frac{y_2(t) g(t)}{a(t) W(t)}\,dt + y_2(t)\int \frac{y_1(t) g(t)}{a(t) W(t)}\,dt.

The slogan

"Replace the constants C₁, C₂ with functions u₁(t), u₂(t); impose one extra equation; solve the rest with Cramer's rule."

Coming next: Section 22.06 returns to the physical meaning of these solutions — mechanical vibrations. Mass, spring, dashpot, and external force. The variation-of-parameters machinery you just learned is exactly what produces the response of a driven oscillator to an arbitrary forcing — the bridge from pure math to engineering reality.