Chapter 2

L'Hôpital's Rule Preview

Limits — Approaching the Infinite

Learning Objectives

By the end of this section, you will be able to:

  1. Recognise the seven indeterminate forms and explain why they defy direct substitution.
  2. State L'Hôpital's Rule and describe the hypotheses under which it is legal to apply.
  3. Interpret the rule geometrically — two curves kissing the axis with slopes whose ratio is the limit.
  4. Compute limits of the forms 0/0 and ∞/∞ using the rule, and convert 0 · ∞, ∞ − ∞, 0⁰, 1^∞, ∞⁰ into one of those two forms.
  5. Verify the rule numerically in Python and automate it with PyTorch's autograd.

The Problem: When Algebra Runs Out

"What is 0 divided by 0?" The honest answer is: it depends on how both parts got to zero.

Up to this point, computing a limit at a point $x = c$ has been a two-step choreography: plug in $c$, and if the result is a finite number, that is the limit. The trouble is that a quotient like

$$\lim_{x \to 0} \frac{\sin(3x)}{\sin(5x)}$$

hands back $0/0$ the moment you substitute. The numerator and denominator both vanish, and the ratio could be anything: $0/0$ carries no information. Yet the limit clearly exists, because if you plot the curve near $x = 0$ you see it settle comfortably at $3/5$.

The question that launched L'Hôpital's Rule

Given two functions that both go to zero at the same point, which one gets there faster? The answer — their rates, i.e. their derivatives — turns out to be exactly the limit of the ratio.

In 1696, Guillaume de L'Hôpital published the first calculus textbook, and inside it was a rule (learned from his tutor, Johann Bernoulli) that lets us substitute the derivatives of the numerator and denominator when the originals both go to zero. The rule is astonishingly cheap to state, astonishingly powerful in practice, and leans on every piece of machinery we have built so far: limits, continuity, and the derivative's meaning as a local rate.


What Indeterminate Forms Actually Mean

Not every suspicious-looking expression is indeterminate. A limit that produces $0/5$ is simply $0$. A limit that produces $5/0$ with the denominator approaching 0 from the positive side is $+\infty$. Indeterminate forms are the ones where different underlying rates produce different answers.

| Form | Example | Possible limits | Strategy |
| --- | --- | --- | --- |
| 0 / 0 | sin(x)/x as x → 0 | Any real number, ∞, or DNE | L'Hôpital directly |
| ∞ / ∞ | x / eˣ as x → ∞ | Any real number or ∞ | L'Hôpital directly |
| 0 · ∞ | x ln(x) as x → 0⁺ | Any real number or ∞ | Rewrite as 0/0 or ∞/∞ |
| ∞ − ∞ | csc(x) − 1/x as x → 0 | Any real number or ∞ | Combine into a single fraction |
| 0⁰ | xˣ as x → 0⁺ | Any value in [0, ∞] | Take ln; becomes 0 · ∞ |
| 1^∞ | (1 + 1/n)ⁿ as n → ∞ | Any value in [0, ∞] | Take ln; becomes ∞ · 0 |
| ∞⁰ | x^(1/x) as x → ∞ | Any value in [0, ∞] | Take ln; becomes 0 · ∞ |

Why seven, and why these?

Each indeterminate form is a place where two competing behaviours meet — one pushing toward 0, the other toward infinity — and the final answer depends on which wins and by how much. L'Hôpital's Rule quantifies the fight by comparing rates.
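To make "different rates produce different answers" concrete, here is a minimal sketch that samples four quotients, all of which read 0/0 at x = 0. The dictionary `quotients` and the helper `sample` are illustrative names chosen here, not part of any library.

```python
import math

# Four quotients that all read 0/0 at x = 0, yet behave differently nearby.
quotients = {
    "x / x": lambda x: x / x,
    "x**2 / x": lambda x: x**2 / x,
    "x / x**2": lambda x: x / x**2,
    "sin(3x) / sin(5x)": lambda x: math.sin(3 * x) / math.sin(5 * x),
}

def sample(q, xs=(1e-1, 1e-2, 1e-3)):
    """Evaluate the quotient q at a few points approaching 0 from the right."""
    return [q(x) for x in xs]

for name, q in quotients.items():
    values = ", ".join(f"{v:.6g}" for v in sample(q))
    print(f"{name:>20}: {values}")
```

The four rows head toward 1, 0, a blow-up, and 0.6 respectively, even though substitution gives the same symbol 0/0 for each.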


Intuition: Two Cars Racing to Zero

Think of $f(x)$ and $g(x)$ as positions of two cars on the number line. At time $x = c$ they both reach the finish line $y = 0$. The ratio $f(x)/g(x)$ is the distance ratio between the two cars just before they arrive.

🏎 Naive view (breaks down at c)

Directly read positions at the finish line: both are 0, so the ratio is $0/0$. No information: you cannot compare distances when both cars are exactly at the line.

⚡ L'Hôpital's view (uses rates)

Compare the cars' speeds as they cross the line. If $f$ is travelling at 3 m/s and $g$ at 5 m/s right at the instant of arrival, the ratio of remaining distances an instant earlier was $3/5$. That is the limit.

This is exactly the content of the rule: the indeterminate ratio of positions is determined by the ratio of the rates at which they reach zero. The derivative $f'(c)$ literally means "how fast f is changing at c", which for a function vanishing at c is the same as "how fast f is approaching zero at c".

Linear approximation in disguise

Near $c$ we have $f(x) \approx f'(c)(x - c)$ and $g(x) \approx g'(c)(x - c)$. Dividing, the $(x - c)$ factors cancel and you are left with $f'(c)/g'(c)$. L'Hôpital is just this cancellation made rigorous.
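As a quick worked instance of this cancellation, take the section's opening example with $c = 0$, where $f(x) = \sin(3x)$ has slope 3 at the origin and $g(x) = \sin(5x)$ has slope 5:

```latex
\sin(3x) \approx 3x, \qquad \sin(5x) \approx 5x
\quad\Longrightarrow\quad
\frac{\sin(3x)}{\sin(5x)} \approx \frac{3x}{5x} = \frac{3}{5}
\qquad (x \to 0).
```

The factor $x$ that made both pieces vanish drops out, and only the ratio of slopes survives.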


The Rule, Stated Precisely

L'Hôpital's Rule

Suppose $f$ and $g$ are differentiable on an open interval containing $c$ (except possibly at $c$ itself), that $g'(x) \neq 0$ on that interval (again except possibly at $c$), and that either

$$\lim_{x \to c} f(x) = \lim_{x \to c} g(x) = 0$$

or

$$\lim_{x \to c} |f(x)| = \lim_{x \to c} |g(x)| = \infty.$$

Then, provided the limit on the right-hand side exists (or is $\pm\infty$),

$$\lim_{x \to c} \frac{f(x)}{g(x)} = \lim_{x \to c} \frac{f'(x)}{g'(x)}.$$

The same conclusion holds as $x \to \pm\infty$ and for one-sided limits.

All four conditions must hold

  • Indeterminate form: direct substitution must produce $0/0$ or $\infty/\infty$.
  • Differentiability near c: both $f$ and $g$ must have derivatives on a punctured neighbourhood of $c$.
  • $g'(x) \neq 0$ near $c$: you cannot divide by zero, even after differentiating.
  • The new limit $\lim f'/g'$ must exist (or be $\pm\infty$). If it does not, the rule simply tells you nothing.

Why It Works — A Linear Approximation Proof

The clearest way to see the rule is to Taylor-expand to first order. Assume for this sketch that $f$ and $g$ are differentiable at $c$ itself, with $f(c) = g(c) = 0$ and $g'(c) \neq 0$ (slightly more than the rule's hypotheses require). Then for $x$ near $c$:

$$f(x) = f'(c)(x - c) + R_f(x), \quad g(x) = g'(c)(x - c) + R_g(x)$$

where the remainders $R_f, R_g$ are $o(x - c)$: they vanish faster than the linear term. Dividing,

$$\frac{f(x)}{g(x)} = \frac{f'(c)(x-c) + R_f(x)}{g'(c)(x-c) + R_g(x)} = \frac{f'(c) + R_f(x)/(x-c)}{g'(c) + R_g(x)/(x-c)}.$$

Because $R_f(x)/(x-c) \to 0$ and $R_g(x)/(x-c) \to 0$ as $x \to c$, the last expression tends to $f'(c)/g'(c)$. That is the rule in this special case. A more careful proof uses the Cauchy Mean Value Theorem (the main-course version of the idea, which we'll cover in a later chapter), but the linear-approximation picture is the honest intuition.
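The claim that the remainders vanish faster than the linear term can be checked numerically. The sketch below uses the section's running pair sin(3x) and sin(5x) at c = 0; the helper name `remainder_ratio` is an illustrative choice, and the slopes 3 and 5 are supplied by hand.

```python
import math

def remainder_ratio(func, slope_at_c, c, x):
    """Return R(x)/(x - c), where R(x) = func(x) - slope_at_c * (x - c).

    Assumes func(c) == 0, as in the argument above.
    """
    return (func(x) - slope_at_c * (x - c)) / (x - c)

f = lambda x: math.sin(3 * x)   # f'(0) = 3
g = lambda x: math.sin(5 * x)   # g'(0) = 5

for x in [1e-1, 1e-2, 1e-3]:
    rf = remainder_ratio(f, 3.0, 0.0, x)
    rg = remainder_ratio(g, 5.0, 0.0, x)
    print(f"x = {x:.0e}   R_f/(x-c) = {rf:+.3e}   R_g/(x-c) = {rg:+.3e}")
```

Both columns shrink by roughly a factor of 100 per row, which is what lets the correction terms drop out of the quotient.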

What if f'(c)/g'(c) is itself 0/0?

Apply the rule again! For $(1 - \cos x)/x^2$, the first application yields $\sin(x)/(2x)$, which is still 0/0. A second application gives $\cos(x)/2 \to 1/2$. Each pass peels off another layer of vanishing behaviour.
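The repeat-until-determinate loop can be sketched in a few lines of plain Python. The helper `lhopital_at` is hypothetical, and the derivatives of (1 − cos x)/x² are typed in by hand; the sketch only covers the 0/0 case with derivatives that are continuous at c.

```python
import math

def lhopital_at(derivative_pairs, c, tol=1e-12):
    """Walk a list of (f^(k), g^(k)) callables, k = 0, 1, 2, ..., evaluated at c.

    Returns (limit, passes) for the first pair that is not 0/0 at c.
    Assumes every derivative supplied is continuous at c, so that evaluating
    at c gives the limit of that ratio.
    """
    for passes, (num, den) in enumerate(derivative_pairs):
        n, d = num(c), den(c)
        if abs(n) < tol and abs(d) < tol:
            continue                      # still 0/0: differentiate again
        if abs(d) < tol:
            raise ValueError("nonzero / 0: the ratio is unbounded at c")
        return n / d, passes
    raise ValueError("still 0/0 after all supplied derivatives")

# (1 - cos x) / x^2 and its hand-computed derivatives.
pairs = [
    (lambda x: 1 - math.cos(x), lambda x: x**2),   # original: 0/0 at 0
    (lambda x: math.sin(x),     lambda x: 2 * x),  # first pass: still 0/0
    (lambda x: math.cos(x),     lambda x: 2.0),    # second pass: 1/2
]

limit, passes = lhopital_at(pairs, 0.0)
print(f"limit = {limit}, found after {passes} passes")
```

The loop stops at the first pair that substitution can evaluate, mirroring the two passes described above.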


Interactive: Compare f/g and f'/g'

Below you can step through six indeterminate-form limits side by side. The red plot is the naive ratio $f(x)/g(x)$; the green plot is the ratio of derivatives $f'(x)/g'(x)$. Drag the zoom slider to shrink the window around the trouble point. Watch how the red curve has to crawl toward the answer while the green curve sits on it from the start.


Try this experimental loop:

  1. Start with sin(x)/x at x → 0. Zoom all the way in: the red curve wobbles numerically near 0, but the green cos(x)/1 is essentially flat at 1.
  2. Switch to (1 − cos x)/x². Zoom in. The red crawl is slow because the numerator is quadratic; but $\sin(x)/(2x)$ is still 0/0, so the green curve itself is an indeterminate ratio until you apply the rule a second time (mentally) to land on 1/2.
  3. Jump to x/eˣ at x → ∞. Now you zoom on a very large window. The red ratio x/eˣ falls toward 0 quickly; the green curve $1/e^x$ is the ratio of derivatives and heads to the same answer.
  4. Pick x · ln(x) at x → 0⁺. This is a 0·∞ form. The rewrite $x/(1/\ln x)$ converts it to 0/0 and shows both ratios climbing to 0.

Worked Example: lim sin(3x)/sin(5x)

We will compute $\lim_{x \to 0} \frac{\sin(3x)}{\sin(5x)}$ three ways: as a direct substitution (fails), via L'Hôpital, and via a small Taylor argument. Do each step on paper, because it builds the reflex for recognising when the rule applies and when it does not.

📝 Step-by-step numerical walkthrough — try it yourself first

Step 1: Attempt direct substitution. Plug in $x = 0$:

$$\frac{\sin(3 \cdot 0)}{\sin(5 \cdot 0)} = \frac{0}{0}.$$

Useless. Both parts vanish at the same point. This flags the limit as a candidate for L'Hôpital.

Step 2: Check hypotheses. Both $\sin(3x)$ and $\sin(5x)$ are differentiable everywhere, and $g'(x) = 5\cos(5x)$ is nonzero in a neighbourhood of 0 (it equals 5 at $x = 0$). The rule applies.

Step 3: Differentiate top and bottom separately. Important: we do not use the quotient rule; L'Hôpital differentiates the numerator and denominator independently.

$$\frac{d}{dx}\sin(3x) = 3\cos(3x), \qquad \frac{d}{dx}\sin(5x) = 5\cos(5x).$$

Step 4: Evaluate the new ratio at x = 0.

$$\lim_{x \to 0} \frac{3\cos(3x)}{5\cos(5x)} = \frac{3\cos(0)}{5\cos(0)} = \frac{3 \cdot 1}{5 \cdot 1} = \frac{3}{5}.$$

Step 5: Sanity-check numerically. At $x = 0.1$:

sin(0.3) = 0.2955202067
sin(0.5) = 0.4794255386
ratio = 0.2955202067 / 0.4794255386 = 0.6164048071 (≈ 0.6 ✓)

At $x = 0.001$ the ratio is 0.6000016000; at $x = 10^{-4}$ it is 0.6000000160. The error shrinks like $x^2$, exactly what Taylor predicts.

Step 6: Cross-check with Taylor. Using $\sin(u) = u - u^3/6 + \cdots$:

$$\frac{\sin(3x)}{\sin(5x)} = \frac{3x - (3x)^3/6 + \cdots}{5x - (5x)^3/6 + \cdots} = \frac{3x(1 - 3x^2/2 + \cdots)}{5x(1 - 25x^2/6 + \cdots)} = \frac{3}{5}\left(1 + O(x^2)\right).$$

The leading behaviour is $3/5$ exactly; the correction is quadratic in $x$, with leading term $\tfrac{3}{5} \cdot \tfrac{8}{3}x^2 = 1.6x^2$. This explains the empirical pattern: the error at $x = 10^{-k}$ is about $1.6 \times 10^{-2k}$.

The pattern works for every linear-in-argument pair

For $\lim_{x \to 0} \sin(ax)/\sin(bx)$ with $b \neq 0$, the answer is always $a/b$. The same holds for $\tan(ax)/\tan(bx)$, $\sin(ax)/\tan(bx)$, etc. In each case one L'Hôpital pass pulls the chain-rule factors $a$ and $b$ out front, and the remaining cosine or secant terms equal 1 at $x = 0$.
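A quick numerical spot-check of this pattern, as a minimal sketch: the coefficients a = 7 and b = 2 are arbitrary illustrative values, `ratio_near_zero` is a hypothetical helper, and evaluating at a single small x is only a stand-in for the limit.

```python
import math

def ratio_near_zero(num, den, a, b, x=1e-4):
    """Evaluate num(a*x) / den(b*x) at a small x as a numerical stand-in for the limit."""
    return num(a * x) / den(b * x)

a, b = 7.0, 2.0   # arbitrary illustrative coefficients
for label, num, den in [
    ("sin/sin", math.sin, math.sin),
    ("tan/tan", math.tan, math.tan),
    ("sin/tan", math.sin, math.tan),
]:
    print(f"{label}: {ratio_near_zero(num, den, a, b):.8f}  (a/b = {a / b})")
```

All three rows sit within rounding distance of a/b = 3.5.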


Handling ∞/∞ and Other Forms

∞ / ∞ directly: x / eˣ as x → ∞

Both $x$ and $e^x$ blow up, so direct evaluation gives $\infty/\infty$. Differentiate:

$$\lim_{x \to \infty} \frac{x}{e^x} = \lim_{x \to \infty} \frac{1}{e^x} = 0.$$

One L'Hôpital call collapsed a hard question into an easy one. This also proves the classical fact that $e^x$ dominates every polynomial: apply the rule $n$ times to $x^n/e^x$ and you end up with $n!/e^x \to 0$.
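To see the domination numerically, the sketch below tabulates x^n/e^x next to n!/e^x for n = 5 (an arbitrary illustrative choice). The helper `power_over_exp` is a hypothetical name; it works in log space purely to avoid overflow for larger n or x.

```python
import math

def power_over_exp(n, x):
    """Return x**n / e**x, computed in log space to avoid overflow."""
    return math.exp(n * math.log(x) - x)

n = 5
for x in [10.0, 50.0, 100.0, 200.0]:
    print(f"x = {x:>5}: x^{n}/e^x = {power_over_exp(n, x):.3e}   "
          f"n!/e^x = {math.factorial(n) / math.exp(x):.3e}")
```

Both columns collapse toward zero as x grows; the second is what remains after n applications of the rule.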

0 · ∞: x ln(x) as x → 0⁺

A product of 0 and $-\infty$ is ambiguous. Rewrite it as a quotient to force a 0/0 or ∞/∞ form, whichever is easier to differentiate:

$$x \ln(x) = \frac{\ln(x)}{1/x} \quad \text{(now } \tfrac{-\infty}{\infty}\text{)}.$$

Apply L'Hôpital:

$$\lim_{x \to 0^+} \frac{\ln(x)}{1/x} = \lim_{x \to 0^+} \frac{1/x}{-1/x^2} = \lim_{x \to 0^+} (-x) = 0.$$

Notice the freedom in the rewrite: we could also have used $\tfrac{x}{1/\ln x}$ (a 0/0 form), but the resulting derivative is uglier. Part of the skill is choosing the rewrite that leads to the cleanest derivative.
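The same computation can be watched numerically. In this minimal sketch, `product_form` and `derivative_ratio` are illustrative names; the second function encodes the hand-computed ratio (ln x)' / (1/x)' from the derivation above.

```python
import math

def product_form(x):
    return x * math.log(x)          # 0 * (-inf) form as x -> 0+

def derivative_ratio(x):
    return (1 / x) / (-1 / x**2)    # (ln x)' / (1/x)', which simplifies to -x

for x in [1e-1, 1e-3, 1e-6, 1e-9]:
    print(f"x = {x:.0e}   x ln x = {product_form(x):+.3e}   f'/g' = {derivative_ratio(x):+.3e}")
```

Both columns are negative and shrink toward 0 from below, with the product trailing the derivative ratio by the slowly growing factor |ln x|.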

1^∞ and ∞⁰: take the logarithm first

For powers whose base and exponent both misbehave, take $\ln$ to convert the power into a product of a logarithm and another factor, then use the techniques above. The classical example is

$$\lim_{n \to \infty}\left(1 + \frac{1}{n}\right)^{n} = e.$$

Let $y = (1 + 1/n)^n$ so $\ln y = n \ln(1 + 1/n)$, an $\infty \cdot 0$ form. Rewrite:

$$\ln y = \frac{\ln(1 + 1/n)}{1/n} \quad \longrightarrow \quad \text{apply L'Hôpital in the variable } h = 1/n \to 0^+.$$

In terms of $h$ the quotient is $\ln(1 + h)/h$, and one pass gives $\frac{1/(1+h)}{1} \to 1$. So $\ln y \to 1$ and $y \to e^1 = e$. The same trick handles 0⁰, 1^∞, and ∞⁰ uniformly.
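A short numerical check of both halves of this argument, as a sketch: `log_ratio` is an illustrative helper for the quotient ln(1 + h)/h with h = 1/n, and `math.log1p` is used because it stays accurate for tiny h.

```python
import math

def log_ratio(h):
    """ln(1 + h) / h, the 0/0 quotient behind ln y with h = 1/n."""
    return math.log1p(h) / h

for n in [10, 1_000, 100_000, 10_000_000]:
    h = 1 / n
    print(f"n = {n:>10}   ln(1+h)/h = {log_ratio(h):.10f}   (1+1/n)^n = {(1 + h) ** n:.10f}")

print(f"{'e':>14} = {math.e:.10f}")
```

The middle column climbs to 1 while the last column climbs to e, matching the logarithm-then-exponentiate route.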


Python: Verifying L'Hôpital Numerically

The rule is a theorem, but a computer can illustrate it beautifully. Below we compute $\sin(3x)/\sin(5x)$ at shrinking $x$ values and watch it converge to the L'Hôpital answer $3/5$. We then compute the answer directly by plugging $x = 0$ into $f'/g'$, with zero shrinking required.

Pure Python: naive shrinking vs one-shot L'Hôpital
🐍lhopital_rule.py
Line 1: import math

We need math.sin and math.cos. Using the stdlib keeps the example portable and highlights that L'Hôpital is a calculus idea, not a library trick.

EXECUTION STATE
math = Python standard library module for scalar math: sin, cos, exp, log, etc. No third-party dependencies.
Line 3: def f(x), the numerator sin(3x)

Defines the numerator we want to take the limit of. sin(3x) vanishes at 0 because sin(0) = 0, which is exactly what makes the quotient 0/0 and forces us to reach for L'Hôpital.

EXECUTION STATE
⬇ input: x = Any real number. We will evaluate it at shrinking positive values approaching 0.
⬆ returns = sin(3x) — a float. Example: f(0.1) = sin(0.3) ≈ 0.2955.
→ why 3x inside sin? = The coefficient 3 sets the derivative at 0: d/dx sin(3x) = 3 cos(3x), which is 3 at x = 0. That 3 becomes the numerator of the final answer 3/5.
Line 7: def g(x), the denominator sin(5x)

The denominator. sin(5x) also vanishes at 0. Because both numerator and denominator go to zero, the quotient is 0/0 — an indeterminate form.

EXECUTION STATE
⬇ input: x = Same sweep as for f — the same x values feed both functions so we can divide them.
⬆ returns = sin(5x). Example: g(0.1) = sin(0.5) ≈ 0.4794.
→ why 5x? = The coefficient 5 sets g'(0) = 5. The final limit is f'(0)/g'(0) = 3/5 — no guessing, L'Hôpital reads it straight off.
Line 11: def f_prime(x), analytic derivative

The chain rule derivative of sin(3x) is 3 cos(3x). We type it out explicitly so the code reads like the math. A real project might use autograd instead (see the PyTorch code below).

EXECUTION STATE
⬆ returns = 3 * cos(3x). At x = 0: 3 * cos(0) = 3 * 1 = 3.
📚 math.cos(x) = Python stdlib cosine in radians. math.cos(0) = 1, math.cos(π/2) ≈ 0.
Line 15: def g_prime(x), analytic derivative

Derivative of sin(5x) is 5 cos(5x). At x = 0 this evaluates to 5. The ratio 3/5 will be our limit.

EXECUTION STATE
⬆ returns = 5 * cos(5x). At x = 0: 5 * 1 = 5.
Line 20: print("Naive limit ...") header

Announce the first experiment: shrink x and watch f(x)/g(x). This is the brute-force approach a computer can always try; we use it to build intuition before applying the rule.

Line 21: print column headers

Format specifiers right-align the columns so the table is readable. `:>10` means right-align in width 10; `:>14` leaves room for long decimals.

EXECUTION STATE
:>10 = Right-align text in a width-10 field. Example: 'x'.rjust(10) → ' x'.
Line 22: print("-" * 54)

Repeats a dash 54 times as a horizontal separator. Python's `str * int` is the idiomatic way to build separator strings.

Line 23: for x in [0.5, 0.1, 0.01, 0.001, 1e-4]

Walk x down toward 0, by a factor of 10 per row after the first step. Each row squeezes x closer to 0 so we can watch the ratio stabilise. If the limit exists, the ratio should converge; if not, it will diverge or oscillate.

LOOP TRACE · 5 iterations
x = 0.5
sin(3·0.5) = sin(1.5) = 0.997495
sin(5·0.5) = sin(2.5) = 0.598472
ratio = 1.66673586
x = 0.1
sin(0.3) = 0.295520
sin(0.5) = 0.479426
ratio = 0.61640481
x = 0.01
sin(0.03) = 0.029996
sin(0.05) = 0.049979
ratio = 0.60016004
x = 0.001
ratio = 0.60000160
x = 1e-4
ratio = 0.60000002
→ target = 3/5 = 0.6
Line 24: fx = f(x), evaluate the numerator

Pull the numerator value out once. Reusing `fx` avoids computing sin(3x) twice on the same line (once for display, once in the division).

EXECUTION STATE
fx = A single float equal to sin(3x). Example: fx at x = 0.01 is 0.029996.
Line 25: gx = g(x), evaluate the denominator

Same idea for the denominator. Pulling both values into named variables makes the table-print on the next line readable and avoids evaluating each sine twice.

EXECUTION STATE
gx = A single float equal to sin(5x). Example: at x = 0.01, gx = 0.049979.
Line 26: print formatted row

Print one row of the convergence table. `:.6f` gives 6 fixed-point decimals; `:.8f` for the ratio shows more digits so we can watch the last few digits stabilise at 0.60000000…

EXECUTION STATE
:.6f = Fixed-point, 6 decimal places. 0.029996 renders as '0.029996'.
:.8f = Fixed-point, 8 decimals. We use extra digits for the ratio so convergence is visible at small x.
Line 29: print(), blank separator

Empty print inserts a newline. Keeps the naive-vs-L'Hôpital outputs visually separated.

Line 30: print("L'Hopital: ...") header

Announce the shortcut: instead of shrinking x, evaluate f'(0)/g'(0) at the trouble spot itself. This only works because both f and g are differentiable at 0 and g'(0) ≠ 0.

Line 31: limit = f_prime(0) / g_prime(0)

Read the limit off directly: f'(0) = 3, g'(0) = 5, so the limit is 3/5 = 0.6. No shrinking, no table — one arithmetic step.

EXECUTION STATE
f_prime(0) = 3 · cos(0) = 3.0
g_prime(0) = 5 · cos(0) = 5.0
limit = 3.0 / 5.0 = 0.6
→ why this is legal = L'Hôpital applies when (1) f and g → 0 at the same point, (2) f and g are differentiable near that point, (3) g'(x) ≠ 0 there. All three are true here: cos(0) = 1, so g'(0) = 5 ≠ 0.
Line 32: print the computed limit

Echo the arithmetic so the reader sees 3/5 = 0.6 explicitly. `.6f` formats the answer to six decimal places for a clean-looking 0.600000.

Line 36: print("Absolute error ...") header

Start the third experiment: measure how fast the naive ratio approaches 0.6. If L'Hôpital is correct, the error should shrink smoothly to zero as x → 0.

Line 37: for x in [0.1, 0.01, 0.001, 1e-4]

Sweep x over four values spanning three orders of magnitude. Watching |f(x)/g(x) − 0.6| drop by a factor of about 100 per row is a strong numerical witness that our analytic answer is correct.

LOOP TRACE · 4 iterations
x = 0.1
|f/g − 3/5| = 1.640e-02
x = 0.01
|f/g − 3/5| = 1.600e-04
x = 0.001
|f/g − 3/5| = 1.600e-06
x = 1e-4
|f/g − 3/5| = 1.600e-08
→ pattern = Error shrinks roughly as x². This matches the Taylor remainder: sin(3x)/sin(5x) = 3/5 + O(x²).
Line 38: err = abs(f(x) / g(x) - limit)

Measure the distance between the numerical ratio and the L'Hôpital answer 3/5. `abs()` returns the absolute value so we do not need to worry about sign.

EXECUTION STATE
abs(a) = Built-in absolute value. abs(-0.01) = 0.01.
err = Non-negative float. Shrinks roughly like x² because the next-order Taylor term in sin(3x)/sin(5x) − 3/5 is O(x²).
Line 39: print formatted error row

`:>7.0e` right-aligns x in width-7 scientific notation with no fractional digits. `:.3e` formats the error with 3 significant digits.

EXECUTION STATE
:>7.0e = Example: 1e-4 → ' 1e-04'.
:.3e = Example: 0.00016 → '1.600e-04'.
```python
import math

def f(x):
    """Numerator: vanishes at x = 0."""
    return math.sin(3 * x)

def g(x):
    """Denominator: also vanishes at x = 0."""
    return math.sin(5 * x)

def f_prime(x):
    """Analytic derivative of sin(3x) is 3 cos(3x)."""
    return 3 * math.cos(3 * x)

def g_prime(x):
    """Analytic derivative of sin(5x) is 5 cos(5x)."""
    return 5 * math.cos(5 * x)

# --- 1) Naive evaluation: shrink x toward 0 and watch f/g. -----------------
print("Naive limit of f(x)/g(x) as x -> 0")
print(f"{'x':>10}  {'f(x)':>12}  {'g(x)':>12}  {'f(x)/g(x)':>14}")
print("-" * 54)
for x in [0.5, 0.1, 0.01, 0.001, 1e-4]:
    fx = f(x)
    gx = g(x)
    print(f"{x:>10.4g}  {fx:>12.6f}  {gx:>12.6f}  {fx / gx:>14.8f}")

# --- 2) L'Hopital shortcut: evaluate f'/g' at x = 0 directly. --------------
print()
print("L'Hopital: evaluate ratio of derivatives AT x = 0")
limit = f_prime(0) / g_prime(0)        # (3*cos(0)) / (5*cos(0)) = 3/5
print(f"f'(0) / g'(0) = {f_prime(0)}/{g_prime(0)} = {limit:.6f}")

# --- 3) Sanity check: the two approaches agree in the limit. ---------------
print()
print("Absolute error between naive ratio and 3/5")
for x in [0.1, 0.01, 0.001, 1e-4]:
    err = abs(f(x) / g(x) - limit)
    print(f"x = {x:>7.0e}   |f/g - 3/5| = {err:.3e}")
```

What to notice in the output

The naive ratio needs $x$ shrunk all the way down to $10^{-4}$ before it agrees with 0.6 to about seven decimal places. The L'Hôpital call produces the exact value from a single division. The error column also shows the quadratic convergence predicted by Taylor, a quiet confirmation that $f'(0)/g'(0)$ really is the limit.


PyTorch: Using Autograd as an L'Hôpital Engine

Hand-differentiating $\sin(3x)$ is easy. Hand-differentiating a 100-term loss function in a neural net is not. Autograd does it for us, which means we can build a fully automatic L'Hôpital evaluator: declare $f$ and $g$, call .backward() twice, divide, done.

PyTorch: autograd reads f'(0) and g'(0) in one call each
🐍lhopital_autograd.py
Line 1: import torch

PyTorch gives us torch.sin, torch.cos, and — crucially — automatic differentiation. We will use autograd to compute f'(0) and g'(0) without hand-coding the chain rule.

EXECUTION STATE
torch = Tensor library with autograd. Here we use only 0-dim (scalar) tensors.
Line 3: def f(x): return torch.sin(3 * x)

Exactly the same numerator, but with torch.sin so the operation is recorded in the autograd graph. Multiplying a tensor by 3 is also a tracked op, so the chain rule flows automatically.

EXECUTION STATE
📚 torch.sin(t) = Elementwise sine of a tensor. Has an autograd implementation: derivative is torch.cos(t).
⬇ input: x (tensor, requires_grad=True) = A 0-dim tracked tensor. Each op on it appends a node to the autograd graph.
⬆ returns = A scalar tensor whose grad_fn is SinBackward, wrapping a MulBackward for 3*x.
Line 6: def g(x): return torch.sin(5 * x)

Denominator as a tensor function. Autograd records this as SinBackward ∘ MulBackward just like f.

EXECUTION STATE
⬆ returns = sin(5x) as a scalar tensor. At x = 0, value is 0.0.
Lines 9-11: comment, probe at x = 0

Explains the reasoning for the next line: we want to evaluate the derivatives at the singular point itself. Autograd is happy to do this because sin is differentiable everywhere, even where it equals zero.

Line 12: x = torch.tensor(0.0, requires_grad=True)

Create a tracked scalar tensor at exactly x = 0. With requires_grad=True, PyTorch will accumulate gradients into x.grad on every subsequent backward() call.

EXECUTION STATE
📚 torch.tensor(data, requires_grad) = Constructor: builds a leaf tensor. requires_grad=True registers it for gradient tracking.
⬇ arg 1: 0.0 = The value at which we want the derivative. Both f and g vanish here — exactly the place naive evaluation produces 0/0.
⬇ arg 2: requires_grad=True = Enables autograd. Without it, calling .backward() on a downstream tensor would error because there is nothing to differentiate with respect to.
→ x.grad = Starts as None. Populated (or added to) by every subsequent .backward() call.
Lines 14-15: comment, evaluate f and g once

Signals what the next two lines do: two forward passes, each building its own autograd subgraph that shares the leaf x.

Line 16: y_f = f(x)

Forward pass through f. y_f is a scalar tensor with value sin(0) = 0.0 and a grad_fn that chains SinBackward → MulBackward → x.

EXECUTION STATE
y_f = tensor(0.0, grad_fn=<SinBackward0>)
→ y_f.item() = 0.0 — this is why the naive ratio is 0/0 and needs the rule.
Line 17: y_g = g(x)

Forward pass through g. Same story: value 0, a separate autograd subgraph ending at x.

EXECUTION STATE
y_g = tensor(0.0, grad_fn=<SinBackward0>)
Line 21: y_f.backward(retain_graph=True)

Backprop through f's graph to deposit df/dx into x.grad. retain_graph=True tells PyTorch to keep the buffers saved for f's graph instead of freeing them after this call. It is not strictly required here, because g's graph is separate (the two share only the leaf x), but it is harmless.

EXECUTION STATE
📚 .backward(retain_graph=False) = Walks the autograd graph from this scalar tensor, applying the chain rule. Accumulates into .grad of every leaf with requires_grad=True.
⬇ arg: retain_graph=True = Keep f's saved intermediates after this backward pass. The later y_g.backward() would also work without the flag, since y_g has its own graph.
→ side effect = x.grad becomes tensor(3.0) because df/dx = 3 cos(3x) and cos(0) = 1.
Line 22: f_prime = x.grad.item()

Pull the Python float out of the 0-dim gradient tensor. This number is f'(0) = 3.0 — precisely the coefficient we hard-coded earlier, now obtained automatically.

EXECUTION STATE
📚 .item() = Returns the Python scalar for a 0-dim tensor. Raises if the tensor has more than one element.
f_prime = 3.0
Line 23: x.grad = None  # clear accumulator

PyTorch *adds* gradients to .grad on each backward call. If we did not clear it, the next backward would return 3 + g'(0) = 3 + 5 = 8, not 5. Setting x.grad to None is the cleanest reset.

EXECUTION STATE
why not .zero_()? = Both work. Setting to None is slightly cheaper (no memory to zero) and also makes the next backward allocate a fresh tensor.
Line 25: y_g.backward()

Backprop through g to load dg/dx into x.grad. No retain_graph needed because we do not need g's graph again.

EXECUTION STATE
→ side effect = x.grad = tensor(5.0) because g'(x) = 5 cos(5x) and cos(0) = 1.
Line 26: g_prime = x.grad.item()

Extract g'(0) as a Python float. This is 5.0 — the other number L'Hôpital needs.

EXECUTION STATE
g_prime = 5.0
Line 28: print("f(0) = ...")

Echo the forward values so the reader sees 0/0 concretely before the rule saves the day.

EXECUTION STATE
y_f.item() = 0.0
:.6f = Fixed-point, 6 decimals. 0.0 → '0.000000'.
Line 29: print("g(0) = ...")

Same sanity-print for g. Two zeros on top of each other — the indeterminate form.

Line 30: print("f'(0) = ...")

Show the autograd-computed numerator-of-L'Hôpital.

EXECUTION STATE
f_prime = 3.0 — matches 3·cos(0) = 3.
Line 31: print("g'(0) = ...")

And the denominator-of-L'Hôpital.

EXECUTION STATE
g_prime = 5.0 — matches 5·cos(0) = 5.
Line 32: print L'Hopital limit

One division and the limit drops out: 3.0 / 5.0 = 0.6. The whole detour through 0/0 was unnecessary once we had autograd.

EXECUTION STATE
f_prime / g_prime = 0.6 — the exact answer, matching the naive table's convergence target.
Line 35: print(), blank line

Cosmetic newline between the L'Hôpital result and the sanity-check table.

Line 36: print("Autograd ... vs numerical ratio")

Header for the sanity check. We will compare the naive ratio f(x)/g(x) against the autograd-computed L'Hôpital answer across shrinking x.

Line 37: with torch.no_grad():

Disable autograd inside the block. We are only printing numbers, not backpropagating. Skipping graph construction is faster and guarantees nothing inside the block can touch x.grad.

EXECUTION STATE
📚 torch.no_grad() = Context manager that disables gradient tracking: results computed inside it have requires_grad=False. Standard idiom for evaluation/inference.
→ why here? = We are done with gradients. Every forward pass under no_grad is cheaper and cannot alter x.grad.
Line 38: for xv in [0.1, 0.01, 0.001, 1e-4]

Iterate over the same shrinking x values we used in pure Python. Re-running under torch keeps the two implementations comparable.

LOOP TRACE · 4 iterations
xv = 0.1
naive = 0.61640481
err = 1.64e-02
xv = 0.01
naive = 0.60016004
err = 1.60e-04
xv = 0.001
naive = 0.60000160
err = 1.60e-06
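xv = 1e-4
naive = 0.60000002
err = 1.60e-08
→ note = The digits in this trace assume float64 arithmetic. torch.tensor(xv) defaults to float32 (about 7 significant digits), so the printed values can differ in the last digits, most visibly in the 1e-4 row.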
Line 39: xt = torch.tensor(xv)

Wrap each Python float in a zero-dim tensor so we can feed it through the same f, g that expect tensors. No requires_grad needed — we are inside no_grad and do not need gradients here.

EXECUTION STATE
xt = A scalar tensor with the current xv value. Example: torch.tensor(0.01) → tensor(0.0100).
Line 40: naive = (f(xt) / g(xt)).item()

Compute the naive ratio as a Python float. Division of two scalar tensors is itself a scalar tensor; `.item()` unwraps it.

EXECUTION STATE
f(xt) / g(xt) = Elementwise tensor division. For xv = 0.01 the result is tensor(0.6002). Inside no_grad it builds no backward graph.
naive = Python float version of the ratio. Example at xv = 0.01: 0.60016004.
Line 41: err = abs(naive - f_prime / g_prime)

Distance between the numerical ratio and the L'Hôpital answer. If the rule gave the correct limit, this should shrink toward zero as xv → 0, down to the rounding floor of the tensor's dtype.

EXECUTION STATE
err = Non-negative float. Decreases quadratically (O(x²)) because the next Taylor term of sin(3x)/sin(5x) is quadratic.
Line 42: print formatted row

Print xv (scientific, 0 fractional digits), the naive ratio (8 decimals to show convergence), and the absolute error (3-sig-fig scientific).

EXECUTION STATE
:>7.0e = Right-align width 7, scientific, 0 digits: 1e-4 → ' 1e-04'.
:.8f = Fixed-point with 8 decimals: 0.6 → '0.60000000'.
:.2e = Scientific with 2 decimals: 1.6e-4 → '1.60e-04'.
```python
import torch

def f(x):
    return torch.sin(3 * x)

def g(x):
    return torch.sin(5 * x)

# 1) A point where BOTH functions vanish. We evaluate exactly at x = 0:
#    autograd only needs a point at which to evaluate, and sin is
#    differentiable there, so no "just off zero" probe is needed.
x = torch.tensor(0.0, requires_grad=True)

# 2) Evaluate f and g once. This builds two autograd graphs that
#    share x as the leaf.
y_f = f(x)
y_g = g(x)

# 3) Backprop through f separately from g. We clear x.grad in between
#    because PyTorch accumulates gradients by default.
y_f.backward(retain_graph=True)
f_prime = x.grad.item()
x.grad = None                      # clear accumulator

y_g.backward()
g_prime = x.grad.item()

print(f"f(0)  = {y_f.item():.6f}   (should be 0)")
print(f"g(0)  = {y_g.item():.6f}   (should be 0)")
print(f"f'(0) = {f_prime:.6f}")
print(f"g'(0) = {g_prime:.6f}")
print(f"L'Hopital limit = f'(0) / g'(0) = {f_prime / g_prime:.6f}")

# 4) Sanity check: sweep x and compare f(x)/g(x) with the L'Hopital answer.
print()
print("Autograd L'Hopital vs numerical ratio")
with torch.no_grad():
    for xv in [0.1, 0.01, 0.001, 1e-4]:
        xt = torch.tensor(xv)
        naive = (f(xt) / g(xt)).item()
        err = abs(naive - f_prime / g_prime)
        print(f"x = {xv:>7.0e}  naive = {naive:.8f}  err = {err:.2e}")
```

The deep connection to machine learning

Every training step of a neural network computes the gradient of the loss with respect to the weights. That computation is, in effect, asking "how does the output respond to a tiny nudge?", and autograd answers it with the same derivative machinery that gave us $f'(c)/g'(c)$ here. The analogy is loose, since training does not evaluate a 0/0 limit, but the local-rate idea that 17th-century mathematicians used for limits is the one that 21st-century optimisers rely on to train billion-parameter models.


Where L'Hôpital Unlocks Real Problems

📐 Growth comparisons

Which grows faster: $x^{100}$ or $1.01^x$? L'Hôpital applied 100 times proves the exponential wins, the kind of comparison that underlies polynomial-versus-exponential complexity arguments in computer science.

🌊 Small-angle physics

$\sin(\theta) \approx \theta$ and $1 - \cos(\theta) \approx \theta^2/2$ correspond to the limits $\sin(\theta)/\theta \to 1$ and $(1 - \cos\theta)/\theta^2 \to 1/2$, both of which L'Hôpital confirms. They underpin the pendulum equation, optics, and many small-perturbation expansions in physics.

📈 Continuous compounding

The limit $(1 + r/n)^n \to e^r$ is the reason continuous compounding exists. Without L'Hôpital (or Taylor), the conversion from discrete to continuous interest is mysterious.

🤖 Model selection

Ratios of log-likelihoods, Bayes factors, softmax temperatures — many statistical criteria reduce to ratios that are 0/0 or ∞/∞ at critical limits. L'Hôpital is the tool that makes them meaningful.


Common Pitfalls

Pitfall 1 — Applying the rule to non-indeterminate forms

$\lim_{x \to 1} \frac{x^2 + 1}{x + 1} = \frac{2}{2} = 1$ by direct substitution. Blindly differentiating gives $2x/1 = 2$ at $x = 1$, a wrong answer. Always verify the form is 0/0 or ∞/∞ before differentiating.
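One way to build this check into code is a guard that refuses to differentiate unless both values vanish. The function `lhopital_or_substitute` below is a hypothetical helper written for this pitfall; it covers only the 0/0 case and assumes the supplied derivatives are continuous at c with a nonzero denominator derivative.

```python
def lhopital_or_substitute(f, g, f_prime, g_prime, c, tol=1e-12):
    """Use f'(c)/g'(c) only when f(c) and g(c) are both (numerically) zero.

    Otherwise fall back to direct substitution f(c)/g(c). This sketch handles
    only the 0/0 case and assumes f', g' are continuous at c with g'(c) != 0.
    """
    fc, gc = f(c), g(c)
    if abs(fc) < tol and abs(gc) < tol:
        return f_prime(c) / g_prime(c), "L'Hopital"
    return fc / gc, "direct substitution"

# Pitfall 1: (x^2 + 1)/(x + 1) at x = 1 is 2/2, not 0/0.
value, method = lhopital_or_substitute(
    lambda x: x**2 + 1, lambda x: x + 1,
    lambda x: 2 * x,    lambda x: 1.0,
    c=1.0,
)
print(value, method)
```

Here the guard takes the substitution branch and returns 1, never touching the derivatives that would have produced the wrong value 2.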

Pitfall 2 — Using the quotient rule

L'Hôpital replaces $f(x)/g(x)$ with $f'(x)/g'(x)$, not with $(f/g)'$. Do not apply the quotient rule; it gives a different (usually much uglier) expression.

Pitfall 3 — Stopping too early when the form persists

After one application, the new ratio may still be indeterminate. Keep applying the rule (or switch to Taylor) until you land on a form that can be evaluated directly. Trying $(1 - \cos x)/x^2$ with only one pass produces $\sin(x)/(2x)$, which is a new expression but still 0/0. A second pass gives the true answer 1/2.

Pitfall 4 — When the derivative limit does not exist

If $\lim f'/g'$ fails to exist, the rule is silent: it does not say the original limit fails. For instance, with $f(x) = x^2\sin(1/x)$ and $g(x) = x$ as $x \to 0$, the derivative ratio $2x\sin(1/x) - \cos(1/x)$ oscillates forever, yet $f(x)/g(x) = x\sin(1/x) \to 0$. You may need a different technique (squeeze theorem, Taylor remainder bounds, etc.).
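A standard illustration at infinity, chosen here for the sketch rather than taken from the text, is (x + sin x)/x as x → ∞: the derivative ratio 1 + cos x never settles, while the original ratio tends to 1. The function names below are illustrative.

```python
import math

def original_ratio(x):
    return (x + math.sin(x)) / x      # inf/inf form as x -> inf

def derivative_ratio(x):
    return (1 + math.cos(x)) / 1.0    # f'/g': keeps oscillating between 0 and 2

# Sample at multiples of pi so the oscillation of f'/g' is easy to see.
for k in [1000, 1001, 1002, 1003]:
    x = k * math.pi
    print(f"x = {x:10.2f}   f/g = {original_ratio(x):.6f}   f'/g' = {derivative_ratio(x):.6f}")
```

The f/g column stays pinned near 1 while f'/g' flips between 2 and 0, so a non-existent derivative limit says nothing about the original one.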

Pitfall 5 — Circular reasoning for sin(x)/x

Using L'Hôpital to "prove" $\lim_{x \to 0} \sin(x)/x = 1$ is logically circular, because the derivative of $\sin$ is usually derived from that limit. This is a book-keeping issue, not a rule issue: once derivatives of trig are established by a geometric squeeze argument, L'Hôpital handles every other trig limit cleanly.


Summary

L'Hôpital's Rule is a bridge between two pillars of calculus: the limit and the derivative. When direct substitution produces an indeterminate form, the rule says: the answer is controlled by the rate at which numerator and denominator approach their limiting values. Formally,

$$\lim_{x \to c} \frac{f(x)}{g(x)} = \lim_{x \to c} \frac{f'(x)}{g'(x)}$$

whenever the hypotheses (indeterminate form, differentiability, $g' \neq 0$, and existence of the limit on the right-hand side) hold.

| Indeterminate form | Conversion | Then apply |
| --- | --- | --- |
| 0 / 0 | None needed | L'Hôpital directly |
| ∞ / ∞ | None needed | L'Hôpital directly |
| 0 · ∞ | Rewrite as 0/(1/∞) or ∞/(1/0) | L'Hôpital on the quotient |
| ∞ − ∞ | Combine into one fraction via common denominator | L'Hôpital |
| 0⁰, 1^∞, ∞⁰ | Take ln; becomes 0 · ∞ | Convert and apply L'Hôpital |
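The ∞ − ∞ row is the only conversion not worked through earlier, so here is a short derivation using the table's own example, $\csc x - 1/x$ as $x \to 0$. It is a sketch of the common-denominator route, with two passes of the rule:

```latex
\csc x - \frac{1}{x} = \frac{1}{\sin x} - \frac{1}{x} = \frac{x - \sin x}{x \sin x}
\quad \left(\tfrac{0}{0}\right)

\lim_{x \to 0} \frac{x - \sin x}{x \sin x}
= \lim_{x \to 0} \frac{1 - \cos x}{\sin x + x \cos x} \quad \left(\tfrac{0}{0} \text{ again}\right)
= \lim_{x \to 0} \frac{\sin x}{2\cos x - x \sin x}
= \frac{0}{2} = 0.
```

The two infinities cancel exactly in this case; other ∞ − ∞ differences can land on any value, which is why the combined fraction has to be examined each time.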

Key Takeaways

  1. L'Hôpital is a statement about rates: the limit of the ratio $f/g$ at the trouble point equals the limit of the ratio of derivatives $f'/g'$ there.
  2. The rule applies only to 0/0 and ∞/∞ forms directly; every other indeterminate form must be rewritten into one of these first.
  3. If the new ratio is still indeterminate, apply the rule again, or switch to a Taylor expansion for clarity.
  4. The intuition behind the proof is a linear approximation: $f(x) \approx f'(c)(x-c)$ and $g(x) \approx g'(c)(x-c)$, then cancel $(x-c)$.
  5. In practice, PyTorch's autograd turns the derivative step of L'Hôpital into a few lines of code, using the same differentiation machinery that deep-learning optimisers run at scale.
The L'Hôpital promise:
"When values collide at 0/0 or ∞/∞, the answer is hidden in their rates. Differentiate once, and the collision unfolds."
Coming Next: Chapter 3 turns the ε–δ language into a property of functions themselves: continuity. We will see why continuous functions are the sandbox on which every theorem about derivatives and integrals is built — and how the smallest break in a curve unravels entire theorems.