Explain why the method of undetermined coefficients fails for forcings like sect, tant, or 1/t, and why a fundamentally different method is needed.
State and motivate the central guess of variation of parameters: replace the constants C1,C2 in the homogeneous solution by functionsu1(t),u2(t).
Derive the imposed constraint u1′y1+u2′y2=0 from the desire to avoid second derivatives of u1,u2.
Write down the Wronskian formulas u1′=−y2g/(aW) and u2′=y1g/(aW) from Cramer's rule, and integrate them to obtain the particular solution.
Solve the canonical example y′′+y=sect from scratch.
Implement a general numerical variation-of-parameters solver in Python, and verify any closed-form candidate with PyTorch autograd.
The Big Picture: Why a New Method?
In the previous section we learned the method of undetermined coefficients: if the right-hand side g(t) is a polynomial, an exponential, a sine, a cosine, or a finite product of these, then we can guess a particular solution of the same form with unknown coefficients and solve for them.
But what happens when g(t) is something like sect, tant, lnt, or 1/(1+et)? There is no finite combination of polynomials/exponentials/trig functions that closes under differentiation in the way sint closes onto cost. You cannot "guess and adjust." The method simply has nothing to feed on.
The motivating problem
Solve y′′+y=sect on the interval (−π/2,π/2).
Method of undetermined coefficients: fails. There is no template to try.
Variation of parameters: produces the exact closed-form answer yp=costln∣cost∣+tsint in half a page.
Variation of parameters is the universal method for finding particular solutions of linear ODEs. It works for any continuous g(t), no guessing required, provided we know the homogeneous solutions y1,y2. Its price is one integral per unknown — which sometimes evaluates to elementary functions and sometimes leaves you with a clean integral representation. Either way, the method always terminates with an answer.
The Conceptual Leap: Let the Constants Vary
We start from the homogeneous solution
yh(t)=C1y1(t)+C2y2(t)
with C1,C2 constant. This solves the homogeneous equation but not the forced one. So Lagrange's beautiful idea (1774) was: what if the constants weren't constant? Try
yp(t)=u1(t)y1(t)+u2(t)y2(t)
with u1,u2 now unknown functions of t. That is the entire leap. The name "variation of parameters" means literally "allowing the parameters C1,C2 to vary with t."
Why this is more than a syntactic trick
One unknown function (a single particular solution) is being replaced by two unknown functions. That extra freedom is what buys us the power to handle any forcing — but it also means the problem looks under-determined. We will fix that by imposing one extra equation of our own choosing. The clever part is the choice.
Deriving the 2×2 System
Differentiate yp=u1y1+u2y2 by the product rule:
yp′=(u1′y1+u2′y2)+(u1y1′+u2y2′).
Look at the term in the first parenthesis. If we were to take yp′′ next, that parenthesis would differentiate into terms involving u1′′,u2′′ — making the problem genuinely worse than where we started. So we make a strategic choice and declare it equal to zero:
Constraint:u1′y1+u2′y2=0.
This is the missing equation that pins down the two unknown functions. With it, yp′=u1y1′+u2y2′ — exactly the same shape as yp, just with primed basis functions.
The two underbraced quantities are zero because y1,y2 solve the homogeneous equation. That collapses the whole expression to the clean second equation:
From the ODE:u1′y1′+u2′y2′=ag.
Combine the two boxed equations into a linear system in the unknowns u1′,u2′:
[y1y1′y2y2′][u1′u2′]=[0g/a].
The coefficient matrix is exactly the Wronskian matrix we met in section 22.01.
The Wronskian Formulas
The determinant of that 2×2 matrix is the Wronskian W=y1y2′−y2y1′. Since y1,y2 are a fundamental set of homogeneous solutions, W(t)=0 everywhere on the interval (Abel's theorem). So Cramer's rule gives:
The formula for u1′ carries a minus sign; u2′ is plus. Forgetting the minus produces a wrong yp that fails to satisfy the ODE. The sign comes from the cofactor expansion of the determinant in Cramer's rule — there is no way to massage it away.
The Recipe in Five Steps
Given a linear second-order ODE ay′′+by′+cy=g(t):
Solve the homogeneous equation first. Obtain y1,y2 by any method — characteristic equation, reduction of order, table lookup. Without these, variation of parameters cannot even start.
Compute the WronskianW=y1y2′−y2y1′. Often a clean closed form (e.g. W=1 for y1=cost,y2=sint).
Formu1′=−y2g/(aW) and u2′=+y1g/(aW).
Integrate. Constants of integration can be set to 0 — they will be absorbed by C1,C2 when we add the homogeneous part.
Assembley(t)=C1y1+C2y2+u1y1+u2y2 and apply the two initial conditions to pin down C1,C2.
Worked Example: y′′+y=sect
Solve y′′+y=sect on (−π/2,π/2) with initial conditions y(0)=0,y′(0)=0.
Click to expand the full hand calculation
Step 1 — Homogeneous solution.
Characteristic equation: r2+1=0⇒r=±i. Complex conjugates with α=0,β=1, so
The cleanest possible Wronskian. That happy fact is unique to the (cos,sin) basis; in other problems W can be an exponential or a polynomial.
Step 3 — Form u₁′ and u₂′.
Here a=1 and g(t)=sect=1/cost. So
u1′(t)=−aWy2g=−1⋅1sint⋅sect=−tant,
u2′(t)=+aWy1g=+1cost⋅sect=1.
Step 4 — Integrate.
u1(t)=−∫tantdt=ln∣cost∣+K1. Set K1=0.
u2(t)=∫1dt=t+K2. Set K2=0.
The constants of integration are dropped because any additive constant in u1 simply contributes a multiple of y1=cost to the answer — which is absorbed by C1 when we add the homogeneous part. Same for K2.
Step 5 — Particular solution.
yp(t)=u1y1+u2y2=cost⋅ln∣cost∣+tsint.
Step 6 — Verify by direct substitution.
First derivative: yp′=−sintln∣cost∣+cost⋅cost−sint+sint+tcost=−sintln∣cost∣+tcost (the two sint terms cancel).
Second derivative: yp′′=−costln∣cost∣+costsin2t+cost−tsint.
General solution: y(t)=C1cost+C2sint+costln∣cost∣+tsint.
At t=0: yp(0)=1⋅ln1+0=0 and yp′(0)=0+0=0. So y(0)=C1=0 and y′(0)=C2=0.
Step 8 — Final answer.
y(t)=costln∣cost∣+tsint.
Sanity: at t=0.5 radians, cos0.5≈0.8776, ln0.8776≈−0.1306, sin0.5≈0.4794. So y(0.5)≈0.8776×(−0.1306)+0.5×0.4794≈−0.1146+0.2397=0.1251. Cross-check this against the interactive explorer below by setting C1=C2=0 and reading yp at t=0.5.
Interactive Variation-of-Parameters Explorer
The explorer below fixes the homogeneous family at y1=cost,y2=sint (so W=1) and lets you pick the forcing g(t). Watch how the running integrals u1(t)=−∫0tsinsg(s)ds and u2(t)=+∫0tcossg(s)ds grow, and how their combination yp=u1cost+u2sint gives the bend in the full solution.
Loading variation-of-parameters explorer…
What to look for
Start with g(t)=sect and C1=C2=0. The blue curve is exactly the yp=costln∣cost∣+tsint from the worked example. Verify by eye that yp(0)=0.
Slide C1 up — a cosine wave is layered on top with NO interaction. Linearity in action.
Switch to g(t)=e−t2. There is no closed-form particular solution, but yp appears as a perfectly smooth curve, computed by numerical integration of u1′,u2′.
Switch to the g(t)=tant preset and watch yp develop the same kind of logarithmic feature near t=±π/2 as the sec problem. Both forcings inherit their singularities from 1/cost.
Why the Constraint u1′y1+u2′y2=0?
It might feel arbitrary that we imposed u1′y1+u2′y2=0 out of thin air. Let's see what would have happened without it.
Without the constraint, plugging yp=u1y1+u2y2 directly into the ODE generates terms with u1′′,u2′′ (from differentiating yp′ a second time). That gives one equation in two unknowns of higher order — the problem becomes harder, not easier.
The constraint u1′y1+u2′y2=0 is chosen so that yp′ looks just like yp with primed basis functions — no u1′,u2′ appear in it. Then yp′′ picks up exactly one first-derivative term in u1′,u2′ and the whole substitution remains a linear, first-order system in the unknowns. That keeps the math tractable. In any derivation that "just works out," the constraint is the engineer's choice that makes it so.
It generalizes
For an n-th order linear ODE, variation of parameters introduces n functions u1,…,un and n−1 imposed constraints (the obvious generalization of the one above), plus one equation from the ODE itself. The system Wu′=b is solved with Cramer's rule via the generalized Wronskian. Every formula in this section extends almost verbatim.
Computation in Python
Below is a general-purpose numerical variation-of-parameters solver. It accepts callable y1,y2, their derivatives, the coefficient a(t), and the forcing g(t); performs cumulative trapezoidal integration of u1′,u2′; and stitches on the homogeneous part to honor the initial conditions.
Numerical solver via cumulative integration
A general numerical implementation of variation of parameters
🐍variation_of_parameters.py
Explanation(12)
Code(72)
1Imports
NumPy gives us vectorised math on the time grid. From scipy we import cumulative_trapezoid — the running trapezoidal integral. We need a RUNNING integral (not a single number) because the answer u₁(t) and u₂(t) are functions of t, not constants.
4Function header — what does this routine consume?
Variation of parameters needs FIVE inputs that other methods don't. (1) the two homogeneous basis functions y₁, y₂, (2) their derivatives y₁′, y₂′, (3) the coefficient a(t) that multiplies y″ — usually 1, (4) the forcing g(t), and (5) the time grid plus two initial conditions y(0), y′(0). All function inputs are CALLABLES that accept a NumPy array — this lets us vectorise the whole computation.
22Evaluate every homogeneous building block on the grid
Before we can compute u₁′ and u₂′ we need y₁(t), y₂(t), their derivatives, the coefficient a(t), and the forcing g(t) — all sampled at every point of the time grid. After this line we have six 1-D NumPy arrays of equal length. Everything downstream is elementwise array arithmetic.
28Wronskian on the grid
The Wronskian W(t) = y₁(t) y₂′(t) − y₂(t) y₁′(t) is the determinant of the 2×2 coefficient matrix in the linear system we're about to solve. Abel's theorem guarantees W(t) is either NEVER zero on an interval or IDENTICALLY zero. So if W is non-zero at a single point, dividing by it is safe everywhere on the interval. Here y₁=cos, y₂=sin gives W=cos²+sin²=1 — the cleanest possible Wronskian.
32The two derivatives u₁′(t) and u₂′(t)
The whole method collapses to these two formulas: u₁′ = −y₂g/(aW) and u₂′ = y₁g/(aW). Note the sign asymmetry (minus on u₁′, plus on u₂′) — this comes directly from Cramer's rule applied to the 2×2 system [[y₁,y₂],[y₁′,y₂′]] · [u₁′,u₂′]ᵀ = [0, g/a]ᵀ. Forgetting that minus sign is the single most common error in this whole method.
38Cumulative trapezoidal integration
cumulative_trapezoid(f, t) returns, at index i, the trapezoidal estimate of ∫₀^{t_i} f(s) ds. It outputs an array of length len(t)−1, so we prepend 0.0 to restore the original length and to encode u₁(0)=u₂(0)=0 — the convention that lets us derive the homogeneous constants cleanly later. If t doesn't start at 0, you'd instead pin u₁ and u₂ to specific anchor values; we keep things simple by assuming t contains 0.
43Build y_p(t)
Now the particular solution falls out as one line: y_p = u₁(t)·y₁(t) + u₂(t)·y₂(t). This is the SAME formal shape as the homogeneous solution C₁y₁ + C₂y₂ — we just replaced the constants with the running integrals. That replacement is what 'variation of parameters' literally means.
44Derivative of y_p — note the cancellation
By the product rule, y_p′ = (u₁′y₁ + u₂′y₂) + (u₁y₁′ + u₂y₂′). But we designed u₁′y₁ + u₂′y₂ = 0 (the imposed constraint), so y_p′ simplifies to u₁y₁′ + u₂y₂′ — the same shape as y_p but with the derivatives of the basis. This is exactly why the constraint was chosen: it removes ugly second-derivative terms in u₁″, u₂″ that would otherwise appear when we differentiate twice.
48Solve a 2×2 system for the homogeneous constants
Because we integrated u₁, u₂ from 0, in general y_p(0) ≠ y(0) and y_p′(0) ≠ y′(0). We absorb the leftover into the homogeneous part: C₁y₁(0)+C₂y₂(0) = y(0)−y_p(0), and the same for derivatives. The matrix M is the Wronskian matrix at t=0, so it's invertible (otherwise y₁, y₂ would not be a fundamental set in the first place).
53Assemble the full answer
y(t) = C₁y₁(t) + C₂y₂(t) + y_p(t). This single formula combines two ingredients: the SYSTEM'S OWN free behavior (the homogeneous part) and the RESPONSE TO FORCING (the particular part). The principle of superposition for linear ODEs says these add cleanly with no interaction.
58Test on the canonical sec(t) problem
Why sec(t)? Because it's the simplest function that defeats the method of undetermined coefficients: sec(t) is not a finite combination of polynomials, exponentials, or sines/cosines, so you cannot 'guess and adjust'. Variation of parameters handles it without flinching. The closed form y_p = cos(t)ln|cos(t)| + t sin(t) was derived by hand in the worked example above — comparing against it tests our numerics.
66Compare numerical vs closed-form
On 401 grid points spanning roughly [−1.3, 1.3] (avoiding the asymptotes at ±π/2) the trapezoidal integral gets us within ~10⁻⁵ of the exact solution. Pushing N higher trades runtime for accuracy at rate O(N⁻²). For real production use scipy's solve_ivp on the equivalent first-order system, but for understanding the method itself, this direct implementation makes every step transparent.
60 lines without explanation
1import numpy as np
2from scipy.integrate import cumulative_trapezoid
34defvariation_of_parameters(y1, y2, y1p, y2p, a_coef, g, t, y0, yp0):5"""
6 Solve a(t) * y'' + b(t) * y' + c(t) * y = g(t) on the grid 't'
7 given two linearly independent homogeneous solutions y1, y2 and
8 their derivatives y1p, y2p (all callable on a numpy array).
910 Parameters
11 ----------
12 y1, y2 : callables, the homogeneous basis
13 y1p, y2p : callables, their first derivatives
14 a_coef : callable, the coefficient on y'' (often just 1.0)
15 g : callable, the forcing term
16 t : np.ndarray, time grid (must contain 0.0 for clean IC)
17 y0, yp0 : floats, initial conditions y(0), y'(0)
1819 Returns
20 -------
21 y : np.ndarray, the full solution evaluated on t
22 info : dict with the homogeneous constants and the building blocks
23 """24 Y1, Y2 = y1(t), y2(t)25 Y1p, Y2p = y1p(t), y2p(t)26 A = a_coef(t)27 G = g(t)2829# Wronskian W(t) = y1*y2' - y2*y1'30 W = Y1 * Y2p - Y2 * Y1p
3132# u1'(t) = - y2(t) g(t) / (a(t) W(t))33# u2'(t) = y1(t) g(t) / (a(t) W(t))34 u1p =-Y2 * G /(A * W)35 u2p = Y1 * G /(A * W)3637# Integrate from t=0 outward. cumulative_trapezoid returns an array38# one shorter than t, so prepend 0 to recover the same length.39 u1 = np.concatenate(([0.0], cumulative_trapezoid(u1p, t)))40 u2 = np.concatenate(([0.0], cumulative_trapezoid(u2p, t)))4142# Particular solution43 y_p = u1 * Y1 + u2 * Y2
44 y_pp = u1 * Y1p + u2 * Y2p # y_p' (the cross terms cancel by design)4546# Now stitch the homogeneous part to match the initial conditions.47# y(0) = C1 y1(0) + C2 y2(0) + y_p(0)48# y'(0)= C1 y1'(0)+ C2 y2'(0)+ y_p'(0)49 M = np.array([[y1(0.0), y2(0.0)],50[y1p(0.0), y2p(0.0)]])51 rhs = np.array([y0 - y_p[0], yp0 - y_pp[0]])52 C1, C2 = np.linalg.solve(M, rhs)5354 y = C1 * Y1 + C2 * Y2 + y_p
55return y,{"C1": C1,"C2": C2,"u1": u1,"u2": u2,"y_p": y_p}565758# ---- Worked example: y'' + y = sec(t), y(0) = 0, y'(0) = 0 ----59t = np.linspace(-1.3,1.3,401)# stay clear of t = ±π/2 singularities60y, info = variation_of_parameters(61 y1 = np.cos,62 y2 = np.sin,63 y1p =lambda s:-np.sin(s),64 y2p = np.cos,65 a_coef =lambda s: np.ones_like(s),66 g =lambda s:1.0/ np.cos(s),67 t = t, y0 =0.0, yp0 =0.0,68)6970# Hand-derived closed form: y_p = cos(t)*ln|cos(t)| + t*sin(t)71y_p_exact = np.cos(t)* np.log(np.abs(np.cos(t)))+ t * np.sin(t)72print("max |y - y_p_exact| =", np.max(np.abs(y - y_p_exact)))
PyTorch View: Verifying the Solution with Autograd
We derived yp=costln∣cost∣+tsint by hand and called the integral "done." But did we actually get the right answer? The most rigorous check is to plug yp back into the ODE and confirm that yp′′+yp−sect=0 at every grid point. Computing those derivatives by hand is tedious. PyTorch's autograd does it for free.
Autograd-based residual check for any candidate solution
🐍vop_residual.py
Explanation(12)
Code(45)
1Import PyTorch
We only need torch — no torch.nn or torch.optim. The whole demo uses raw autograd, the same machinery that powers backpropagation in neural networks. We're going to use it for a completely different purpose: VERIFYING a candidate ODE solution.
3Function header — what is a residual?
Given any candidate function y(t) for the ODE y″ + y = g(t), the RESIDUAL is r(t) = y″(t) + y(t) − g(t). If y is the true solution, r(t) is exactly 0 for all t. If y is a guess that's slightly wrong, r(t) measures HOW wrong it is, pointwise. Plotting r(t) is a fast visual test for any closed-form candidate.
12Re-anchor t with requires_grad=True
Autograd only tracks tensors that have requires_grad=True. We detach+clone the incoming t so the caller's tensor is left unchanged, then mark our copy as requiring gradients. From now on, every operation involving this t is recorded into a computation graph so we can differentiate through it.
13Evaluate y at every grid point
y_fn is the candidate solution as a plain Python callable. Calling it on the (T,)-shaped tensor t produces a (T,)-shaped tensor of y values, with the computation graph attached. We have NOT yet computed any derivative — only the forward pass.
16First derivative via torch.autograd.grad
torch.autograd.grad(outputs=y, inputs=t, grad_outputs=ones_like(y)) is the differentiation primitive. Reading right-to-left: we ask 'what is dy_i/dt_i for every grid point i?'. Because y is a vector, autograd actually computes a vector-Jacobian product — multiplying the Jacobian by grad_outputs (all-ones) gives us dy/dt elementwise, which is exactly the derivative we want when each y_i depends only on t_i (which is true here).
19create_graph=True — why this matters
By default, autograd FREES the intermediate tensors after differentiation to save memory. But we want to take a SECOND derivative next, so the graph that computed yp must persist. Setting create_graph=True keeps everything alive. The cost is roughly 2x memory; the alternative would be to compute y″ from finite differences, which is far less accurate.
23Second derivative — same call, on yp
We just repeat the same trick on yp. Because the graph is still alive, autograd can trace yp back to t and differentiate again, giving y″ on the entire grid. This is a TRUE second derivative computed symbolically through the graph — not a numerical approximation. create_graph=False here because we don't need a third derivative.
29Return the residual
Now we just plug into the ODE: y″ + y − g(t). A true solution makes this identically 0. A near-solution returns small numbers. A bogus candidate returns big numbers.
34Candidate solution from variation of parameters
This is the closed form we derived by hand in the worked example: y_p(t) = cos(t)·ln|cos(t)| + t·sin(t). It is the particular solution that came out of integrating u₁′ = −tan(t) and u₂′ = 1. If our derivation was correct, the residual must vanish.
37Forcing g(t) = sec(t)
Use torch.cos to keep g(t) inside the same autograd-friendly type system as y(t). Avoid mixing math.cos (Python scalar) with torch — that will silently break the graph.
40Run the verification
We sample 401 grid points in [−1.3, 1.3] (avoiding ±π/2 where sec blows up) at float64 precision. With float64 the residual stays below ~10⁻¹², limited only by floating-point round-off. With float32 it would be ~10⁻⁵ — still a clear pass. This is the gold-standard sanity check for any closed-form ODE candidate: if autograd thinks it satisfies the equation, you can publish it.
45What this technique generalizes to
The same pattern verifies solutions to ANY ODE, not just y″+y=sec(t). Replace 'ypp + y - g(t)' with 'a(t)·ypp + b(t)·yp + c(t)·y - g(t)' and you have a universal residual checker. This is how modern physics-informed neural networks (PINNs) train — they minimize the residual of a PDE evaluated via autograd. The trick you just saw is the engine inside PINNs.
33 lines without explanation
1import torch
23defvop_residual(t, y_fn, g_fn):4"""
5 Given a candidate solution y_fn(t) and a forcing g_fn(t) for the ODE
6 y'' + y = g(t),
7 return the pointwise residual y'' + y - g(t).
8 A correct solution makes this ~0 everywhere.
910 We compute y' and y'' with PyTorch autograd — no finite differences,
11 no closed-form differentiation by hand.
12 """13 t = t.detach().clone().requires_grad_(True)14 y = y_fn(t)1516# First derivative dy/dt17 yp = torch.autograd.grad(18 outputs=y, inputs=t,19 grad_outputs=torch.ones_like(y),20 create_graph=True,# keep graph alive for the 2nd derivative21)[0]2223# Second derivative d2y/dt224 ypp = torch.autograd.grad(25 outputs=yp, inputs=t,26 grad_outputs=torch.ones_like(yp),27 create_graph=False,28)[0]2930return ypp + y - g_fn(t)313233# Candidate solution from variation of parameters:34# y_p(t) = cos(t) * ln|cos(t)| + t * sin(t)35defy_candidate(t):36return torch.cos(t)* torch.log(torch.abs(torch.cos(t)))+ t * torch.sin(t)3738defg(t):39return1.0/ torch.cos(t)4041t = torch.linspace(-1.3,1.3,401, dtype=torch.float64)42res = vop_residual(t, y_candidate, g)4344print("max |residual| =", torch.max(torch.abs(res)).item())45print("mean |residual| =", torch.mean(torch.abs(res)).item())
Why this connects to modern ML
The same idea — "differentiate a candidate solution and demand that the ODE/PDE residual be small" — is the entire training loss of a physics-informed neural network (PINN). A neural network plays the role of yp; autograd computes yp′,yp′′; the residual is minimized with gradient descent. Variation of parameters gives a closed-form answer when the problem is linear; PINNs give a neural answer when it isn't. The autograd plumbing is identical.
Variation of Parameters vs Undetermined Coefficients
Aspect
Undetermined Coefficients
Variation of Parameters
Works for which g(t)?
Only polynomials, exponentials, sin/cos, and products of these
Any continuous g(t), including sec, tan, ln, 1/t, piecewise
Coefficients a, b, c
Must be constants
Can depend on t (provided y₁, y₂ are still known)
Setup cost
Guess a template; solve a tiny algebraic system
Compute Wronskian; integrate two expressions
Hard part
Choosing the right template, handling resonance
Evaluating the two integrals (often the bottleneck)
Always succeeds?
No — fails if g(t) is not in the template family
Yes — always produces an integral representation
When closed form?
Always closed form when applicable
Closed form iff the integrals evaluate elementarily
In practice: try undetermined coefficients first when g(t) looks "nice." Reach for variation of parameters when it doesn't, or when the coefficients are non-constant.
Common Pitfalls
Forgetting to divide by a(t)
The formulas are ui′=±yjg/(aW), with the coefficient of y′′ in the denominator. If your ODE is 2y′′+8y′+6y=g(t), divide g by 2 before plugging in, OR keep the factor of a=2 in the denominator. Either is fine; dropping it is not.
Mixing up the signs
u1′ has a minus; u2′ has a plus. The quickest way to remember: the function paired with the OTHER homogeneous solution gets the sign. So u1 uses y2 (and gets the minus), and u2 uses y1 (and gets the plus).
Including the integration constants twice
Any +K1 in u1 contributes +K1y1 to yp — a multiple of the homogeneous solution. So it is absorbed by C1 in the general solution. Setting both K1=K2=0 is standard and prevents double-counting.
Trying to skip the homogeneous step
You CANNOT do variation of parameters without first knowing y1,y2. If the homogeneous equation has variable coefficients and no closed-form basis (e.g. t2y′′+ty′+(t2−1)y=g(t)), you need a different tool entirely (power series, special functions, numerical integration of the homogeneous equation first).
Singularities in g(t)
Variation of parameters happily integrates across smooth regions of g(t), but the integral diverges where g blows up (e.g. sect at ±π/2). The solution is only valid on intervals where the integrals exist — that's why the worked example restricts to (−π/2,π/2).
Summary
Variation of parameters is the universal method for finding a particular solution to a linear ODE when undetermined coefficients fails. The recipe:
Find y1,y2 for the homogeneous problem.
Compute the Wronskian W=y1y2′−y2y1′.
Set u1′=−y2g/(aW) and u2′=+y1g/(aW).
Integrate to get u1(t),u2(t).
Form y=C1y1+C2y2+u1y1+u2y2 and apply initial conditions.
"Replace the constants C₁, C₂ with functions u₁(t), u₂(t); impose one extra equation; solve the rest with Cramer's rule."
Coming next: Section 22.06 returns to the physical meaning of these solutions — mechanical vibrations. Mass, spring, dashpot, and external force. The variation-of-parameters machinery you just learned is exactly what produces the response of a driven oscillator to an arbitrary forcing — the bridge from pure math to engineering reality.