Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Recognise the seven indeterminate forms and explain why they defy direct substitution.
State L'Hôpital's Rule and describe the hypotheses under which it is legal to apply.
Interpret the rule geometrically — two curves kissing the axis with slopes whose ratio is the limit.
Compute limits of the forms 0/0 and ∞/∞ using the rule, and convert 0 · ∞, ∞ − ∞, 0⁰, 1^∞, ∞⁰ into one of those two forms.
Verify the rule numerically in Python and automate it with PyTorch's autograd.

The Problem: When Algebra Runs Out

"What is 0 divided by 0?" The honest answer is: it depends on how both parts got to zero.

Up to this point, computing a limit at a point $x = c$ has been a two-step choreography: plug in $c$ , and if the result is a finite number, that is the limit. The trouble is that a quotient like

\displaystyle \lim_{x \to 0} \frac{\sin(3x)}{\sin(5x)}

hands back $0/0$ the moment you substitute. The numerator and denominator both vanish, and the ratio is whatever you want it to be — $0/0$ carries no information. Yet the limit clearly exists, because if you plot the curve near $x = 0$ you see it settle comfortably at $3/5$ .

The question that launched L'Hôpital's Rule

Given two functions that both go to zero at the same point, which one gets there faster? The ratio of their rates, that is, of their derivatives, turns out to be exactly the limit of the ratio.

In 1696, Guillaume de L'Hôpital published the first calculus textbook, and inside it was a rule (learned from his tutor, Johann Bernoulli) that lets us substitute the derivatives of the numerator and denominator when the originals both go to zero. The rule is astonishingly cheap to state, astonishingly powerful in practice, and leans on every piece of machinery we have built so far: limits, continuity, and the derivative's meaning as a local rate.

What Indeterminate Forms Actually Mean

Not every suspicious-looking expression is indeterminate. A limit that produces $\;0/5$ is simply $0$ . A limit that produces $5/0$ with the denominator approaching 0 from the positive side is $+\infty$ . Indeterminate forms are the ones where different underlying rates produce different answers.

Form	Example	Possible limits	Strategy
0 / 0	sin(x)/x as x → 0	Any real number, ∞, or DNE	L'Hôpital directly
∞ / ∞	x / eˣ as x → ∞	Any real number or ∞	L'Hôpital directly
0 · ∞	x ln(x) as x → 0⁺	Any real number or ∞	Rewrite as 0/0 or ∞/∞
∞ − ∞	csc(x) − 1/x as x → 0	Any real number or ∞	Combine into a single fraction
0⁰	xˣ as x → 0⁺	Any value in [0, ∞]	Take ln; becomes 0 · ∞
1^∞	(1 + 1/n)ⁿ as n → ∞	Any value in [0, ∞]	Take ln; becomes ∞ · 0
∞⁰	x^(1/x) as x → ∞	Any value in [0, ∞]	Take ln; becomes 0 · ∞

Why seven, and why these?

Each indeterminate form is a place where two competing behaviours meet — one pushing toward 0, the other toward infinity — and the final answer depends on which wins and by how much. L'Hôpital's Rule quantifies the fight by comparing rates.
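To see concretely that 0/0 carries no information on its own, the short sketch below evaluates three quotients whose numerator and denominator both vanish at 0. The three pairs are illustrative choices made for this example, not taken from the table, and the printed numbers are a numerical hint rather than a proof.

```python
import math

# Three quotients that all produce 0/0 at x = 0 (illustrative choices).
quotients = {
    "sin(x) / x": lambda x: math.sin(x) / x,                            # tends to 1
    "sin(3x) / sin(5x)": lambda x: math.sin(3 * x) / math.sin(5 * x),   # tends to 3/5
    "x**2 / x": lambda x: x**2 / x,                                     # tends to 0
}

# Evaluate each quotient at x = 1e-1, 1e-3, 1e-5 and print the values side by side.
for name, q in quotients.items():
    values = [q(10.0 ** -k) for k in (1, 3, 5)]
    print(f"{name:>18}: " + "  ".join(f"{v:.6f}" for v in values))

# Keep the values closest to 0 for inspection.
near_zero = {name: q(1e-5) for name, q in quotients.items()}
```

Same form, three different destinations: the form alone cannot tell you the answer, only the rates can.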

Intuition: Two Cars Racing to Zero

Think of $f(x)$ and $g(x)$ as positions of two cars on the number line. At time $x = c$ they both reach the finish line $y = 0$ . The ratio $f(x)/g(x)$ is the distance ratio between the two cars just before they arrive.

🏎 Naive view (breaks down at c)

Directly read positions at the finish line: both are 0, so the ratio is $0/0$ . No information — you cannot compare distances when both cars are exactly at the line.

⚡ L'Hôpital's view (uses rates)

Compare the cars' speeds as they cross the line. If $f$ is travelling at 3 m/s and $g$ at 5 m/s right at the instant of arrival, the ratio of remaining distances an instant earlier was $3/5$ . That is the limit.

This is exactly the content of the rule: the indeterminate ratio of positions is determined by the ratio of the rates at which they reach zero. The derivative $f'(c)$ literally means "how fast f is changing at c" — which for a function vanishing at c is the same as "how fast f is approaching zero at c".

Linear approximation in disguise

Near $c$ we have $f(x) \approx f'(c)(x - c)$ and $g(x) \approx g'(c)(x - c)$ . Dividing, the $(x - c)$ factors cancel and you are left with $f'(c)/g'(c)$ . L'Hôpital is just this cancellation made rigorous.
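The cancellation can be checked numerically. The sketch below assumes the running example $f(x) = \sin(3x)$ , $g(x) = \sin(5x)$ at $c = 0$ , with the slopes 3 and 5 entered by hand; it compares the true ratio with the ratio of the two linear approximations.

```python
import math

c = 0.0
f = lambda x: math.sin(3 * x)
g = lambda x: math.sin(5 * x)
f_slope, g_slope = 3.0, 5.0   # f'(0) and g'(0), worked out by hand (assumed example)

for dx in (1e-1, 1e-2, 1e-3):
    x = c + dx
    exact = f(x) / g(x)
    linearised = (f_slope * dx) / (g_slope * dx)   # the (x - c) factors cancel
    print(f"dx={dx:.0e}  f/g={exact:.8f}  linearised={linearised:.8f}")

# Gap between the true ratio and the slope ratio at dx = 1e-3.
gap = abs(f(c + 1e-3) / g(c + 1e-3) - f_slope / g_slope)
```

The linearised column never moves, while the exact column closes in on it as dx shrinks.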

The Rule, Stated Precisely

L'Hôpital's Rule

Suppose $f$ and $g$ are differentiable on an open interval containing $c$ (except possibly at $c$ itself), that $g'(x) \neq 0$ on that interval, and that either

\lim_{x \to c} f(x) = \lim_{x \to c} g(x) = 0

or

\lim_{x \to c} |f(x)| = \lim_{x \to c} |g(x)| = \infty.

Then, provided the limit on the right-hand side below exists (or is $\pm\infty$ ),

\displaystyle \lim_{x \to c} \frac{f(x)}{g(x)} = \lim_{x \to c} \frac{f'(x)}{g'(x)}.

The same conclusion holds at $c = \pm\infty$ and for one-sided limits.

All four conditions must hold

Indeterminate form: direct substitution must produce $0/0$ or $\infty/\infty$ .
Differentiability near c: both $f$ and $g$ must have derivatives on a punctured neighbourhood of $c$ .
$g'(x) \neq 0$ near $c$ — you cannot divide by zero, even after differentiating.
The new limit $\lim f'/g'$ must exist (or be $\pm\infty$ ). If it does not, the rule simply tells you nothing.

Why It Works — A Linear Approximation Proof

The clearest way to see the rule is to Taylor-expand. Assume for simplicity that $f$ and $g$ are differentiable at $c$ itself, with $f(c) = g(c) = 0$ and $g'(c) \neq 0$ . Then for $x$ near $c$ :

f(x) = f'(c)(x - c) + R_f(x), \quad g(x) = g'(c)(x - c) + R_g(x)

where the remainders $R_f, R_g$ are $o(x - c)$ — they vanish faster than the linear term. Dividing,

\displaystyle \frac{f(x)}{g(x)} = \frac{f'(c)(x-c) + R_f(x)}{g'(c)(x-c) + R_g(x)} = \frac{f'(c) + R_f(x)/(x-c)}{g'(c) + R_g(x)/(x-c)}.

Because $R_f(x)/(x-c) \to 0$ and $R_g(x)/(x-c) \to 0$ as $x \to c$ , the last expression tends to $f'(c)/g'(c)$ , provided $g'(c) \neq 0$ . That is the rule in its simplest case. A more careful proof uses the Cauchy Mean Value Theorem, the main-course version of the idea that we'll cover in a later chapter, but the linear-approximation picture is the honest intuition.

What if f'(c)/g'(c) is itself 0/0?

Apply the rule again! For $(1 - \cos x)/x^2$ , the first application yields $\sin(x)/(2x)$ — still 0/0. A second application gives $\cos(x)/2 \to 1/2$ . Each pass peels off another layer of vanishing behaviour.
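A quick numerical look at the three stages of this example, as a sketch: the original ratio, the ratio after one pass, and the ratio after two passes are all evaluated at a small x. The choice x = 1e-3 is an assumption made to keep floating-point cancellation in 1 − cos(x) under control.

```python
import math

x = 1e-3   # small, but not so small that 1 - cos(x) loses its digits to cancellation

original = (1 - math.cos(x)) / x**2      # 0/0 at x = 0
first_pass = math.sin(x) / (2 * x)       # still 0/0 at x = 0
second_pass = math.cos(x) / 2            # can be evaluated at x = 0 directly

print(f"original    = {original:.8f}")
print(f"first pass  = {first_pass:.8f}")
print(f"second pass = {second_pass:.8f}")
print(f"second pass at x = 0: {math.cos(0.0) / 2}")
```

All three stages sit next to 1/2 near x = 0, but only the last one can be evaluated at x = 0 itself.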

Interactive: Compare f/g and f'/g'

Below you can step through six indeterminate-form limits side by side. The red plot is the naive ratio $f(x)/g(x)$ ; the green plot is the ratio of derivatives $f'(x)/g'(x)$ . Drag the zoom slider to shrink the window around the trouble point. Watch how the red curve has to crawl toward the answer while the green curve sits on it from the start.

Try this experimental loop:

Start with sin(x)/x at x → 0. Zoom all the way in — the red curve wobbles numerically near 0, but the green cos(x)/1 is perfectly flat at 1.
Switch to (1 − cos x)/x². Zoom in. The red crawl is slow because the numerator is quadratic; but $\sin(x)/(2x)$ is still 0/0, so the green curve itself is an indeterminate ratio until you apply the rule a second time (mentally) to land on 1/2.
Jump to x/eˣ at x → ∞. Now you zoom out to a very large window. The red ratio $x/e^x$ falls toward 0; the green curve $1/e^x$ , the ratio of derivatives, falls toward 0 even faster.
Pick x · ln(x) at x → 0⁺. This is a 0·∞ form. The rewrite $\;x/(1/\ln x)$ converts it to 0/0 and shows both ratios climbing to 0.

Worked Example: lim sin(3x)/sin(5x)

We will compute $\displaystyle \lim_{x \to 0} \frac{\sin(3x)}{\sin(5x)}$ three ways: as a direct substitution (fails), via L'Hôpital, and via a small Taylor argument. Do each step on paper — it builds the reflex for recognising when the rule applies and when it does not.

📝 Step-by-step numerical walkthrough — try it yourself first

Step 1 — Attempt direct substitution. Plug in $x = 0$ :

\frac{\sin(3 \cdot 0)}{\sin(5 \cdot 0)} = \frac{0}{0}.

Useless. Both parts vanish at the same point. This flags the limit as a candidate for L'Hôpital.

Step 2 — Check hypotheses. Both $\sin(3x)$ and $\sin(5x)$ are differentiable everywhere, and $g'(x) = 5\cos(5x)$ is nonzero in a neighbourhood of 0 (it equals 5 there). The rule applies.

Step 3 — Differentiate top and bottom separately. Important: we do not use the quotient rule; L'Hôpital differentiates the numerator and denominator independently.

\frac{d}{dx}\sin(3x) = 3\cos(3x), \qquad \frac{d}{dx}\sin(5x) = 5\cos(5x).

Step 4 — Evaluate the new ratio at x = 0.

\lim_{x \to 0} \frac{3\cos(3x)}{5\cos(5x)} = \frac{3\cos(0)}{5\cos(0)} = \frac{3 \cdot 1}{5 \cdot 1} = \frac{3}{5}.

Step 5 — Sanity-check numerically. At $x = 0.1$ :

sin(0.3) = 0.2955202067
sin(0.5) = 0.4794255386
ratio = 0.2955202067 / 0.4794255386 = 0.6164048071 (≈ 0.6 ✓)

At $x = 0.001$ the ratio is 0.6000016000; at $x = 10^{-4}$ it is 0.6000000160. Convergence is quadratic — exactly what Taylor predicts.

Step 6 — Cross-check with Taylor. Using $\sin(u) = u - u^3/6 + \cdots$ :

\frac{\sin(3x)}{\sin(5x)} = \frac{3x - (3x)^3/6 + \cdots}{5x - (5x)^3/6 + \cdots} = \frac{3x(1 - 3x^2/2 + \cdots)}{5x(1 - 25x^2/6 + \cdots)} = \frac{3}{5}\left(1 + O(x^2)\right).

The leading behaviour is $3/5$ exactly; the correction is quadratic in $x$ . This explains the empirical pattern: the error at $x = 10^{-k}$ is about $10^{-2k}$ .
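The quadratic pattern can be made visible by dividing the error by $x^2$ . Carrying the Taylor expansion above one step further gives a leading error term of $\tfrac{3}{5}\left(\tfrac{25}{6} - \tfrac{3}{2}\right)x^2 = 1.6\,x^2$ , so the scaled error should level off near 1.6. A minimal sketch:

```python
import math

for k in range(1, 5):
    x = 10.0 ** -k
    err = math.sin(3 * x) / math.sin(5 * x) - 3 / 5
    # If the error is O(x^2), err / x^2 should approach a constant.
    print(f"x=1e-{k}  err={err:.3e}  err/x^2={err / x**2:.4f}")

# Scaled error at x = 1e-3, kept for inspection.
scaled = (math.sin(0.003) / math.sin(0.005) - 0.6) / 0.001**2
```

A constant in the last column is the numerical signature of quadratic convergence.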

The pattern works for every linear-in-argument pair

For $\lim_{x \to 0} \sin(ax)/\sin(bx)$ with $b \neq 0$ , the answer is always $a/b$ . Same for $\tan(ax)/\tan(bx)$ , $\sin(ax)/\tan(bx)$ , etc. In each case the chain rule brings the inner coefficients $a$ and $b$ out front, and the remaining cosine or secant factors tend to 1.
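As a spot check of the $a/b$ pattern, the sketch below evaluates the three quotient types at a small x for a few coefficient pairs. The pairs and the sample point x = 1e-4 are arbitrary choices for illustration.

```python
import math

pairs = [(3, 5), (2, 7), (-4, 1)]   # arbitrary (a, b) pairs with b != 0
x = 1e-4                            # sample point close to 0

forms = {
    "sin(ax)/sin(bx)": lambda a, b: math.sin(a * x) / math.sin(b * x),
    "tan(ax)/tan(bx)": lambda a, b: math.tan(a * x) / math.tan(b * x),
    "sin(ax)/tan(bx)": lambda a, b: math.sin(a * x) / math.tan(b * x),
}

worst = 0.0   # largest deviation from a/b seen across all cases
for a, b in pairs:
    for name, ratio in forms.items():
        value = ratio(a, b)
        worst = max(worst, abs(value - a / b))
        print(f"a={a:>2} b={b}  {name}: {value:.8f}  (a/b = {a / b:.8f})")

print(f"largest deviation from a/b: {worst:.2e}")
```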

Handling ∞/∞ and Other Forms

∞ / ∞ directly: x / eˣ as x → ∞

Both $x$ and $e^x$ blow up, so direct evaluation gives $\infty/\infty$ . Differentiate:

\displaystyle \lim_{x \to \infty} \frac{x}{e^x} = \lim_{x \to \infty} \frac{1}{e^x} = 0.

One L'Hôpital call collapsed a hard question into an easy one. This also proves the classical fact that $e^x$ dominates every polynomial — apply the rule $n$ times to $x^n/e^x$ and you end up with $n!/e^x \to 0$ .
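A rough numerical illustration of that dominance, assuming $n = 5$ as an arbitrary example: the sketch prints the original ratio next to the expression left after $n$ passes. The two columns are not equal at any finite x; the point is only that both head to 0.

```python
import math

n = 5   # arbitrary polynomial degree for the illustration
for x in (10.0, 50.0, 100.0):
    original = x**n / math.exp(x)                      # infinity/infinity form
    after_n_passes = math.factorial(n) / math.exp(x)   # what n L'Hopital passes leave behind
    print(f"x={x:>5.0f}  x^{n}/e^x={original:.3e}  {n}!/e^x={after_n_passes:.3e}")

# Value of the original ratio far out, kept for inspection.
tail = 100.0**n / math.exp(100.0)
```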

0 · ∞: x ln(x) as x → 0⁺

A product of 0 and $-\infty$ is ambiguous. Rewrite it as a quotient to force a 0/0 or ∞/∞ form, whichever is easier to differentiate:

x \ln(x) = \frac{\ln(x)}{1/x} \quad \text{(now }\tfrac{-\infty}{\infty}\text{)}.

Apply L'Hôpital:

\lim_{x \to 0^+} \frac{\ln(x)}{1/x} = \lim_{x \to 0^+} \frac{1/x}{-1/x^2} = \lim_{x \to 0^+} (-x) = 0.

Notice the freedom in the rewrite: we could also have used $\tfrac{x}{1/\ln x}$ (a 0/0 form), but the resulting derivative is uglier. Part of the skill is choosing the rewrite that leads to the cleanest derivative.
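The result can be sanity-checked by evaluating the product directly at shrinking x, next to the simplified derivative ratio $-x$ from the computation above. A minimal sketch:

```python
import math

for k in (1, 2, 4, 8):
    x = 10.0 ** -k
    product = x * math.log(x)    # the 0 * (-infinity) form
    derivative_ratio = -x        # (1/x) / (-1/x**2), simplified
    print(f"x=1e-{k}  x*ln(x)={product:.3e}  f'/g'={derivative_ratio:.3e}")

# Value of the product at the smallest sample point.
smallest = 1e-8 * math.log(1e-8)
```

Both columns are negative and shrink toward 0, so the product approaches 0 from below.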

1^∞ and ∞⁰: take the logarithm first

For powers whose base and exponent both misbehave, take $\ln$ to convert the power into a product of a logarithm and another factor (a 0 · ∞ form), then use the techniques above. The classical example is

\displaystyle \lim_{n \to \infty}\left(1 + \frac{1}{n}\right)^{n} = e.

Let $y = (1 + 1/n)^n$ so $\ln y = n \ln(1 + 1/n)$ , an $\infty \cdot 0$ form. Rewrite:

\ln y = \frac{\ln(1 + 1/n)}{1/n} \quad \longrightarrow \quad \text{apply L'Hôpital in the variable } h = 1/n \to 0^+.

The resulting limit is $1$ , so $\ln y \to 1$ and $y \to e^1 = e$ . The same trick handles 0⁰, 1^∞, and ∞⁰ uniformly.
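The logarithm trick translates directly into code. The sketch below follows the same steps: form $\ln y = \ln(1+h)/h$ with $h = 1/n$ , watch it approach 1, then exponentiate. Using math.log1p instead of math.log(1 + h) is a choice made here for floating-point accuracy at small h.

```python
import math

for n in (10, 1_000, 1_000_000):
    h = 1.0 / n
    log_y = math.log1p(h) / h    # ln(1 + h) / h, the 0/0 form in the variable h
    y = math.exp(log_y)          # undo the logarithm to recover (1 + 1/n)**n
    print(f"n={n:>9,}  ln y={log_y:.8f}  y={y:.8f}  (e={math.e:.8f})")

# ln y at n = 1,000,000, kept for inspection.
log_y_large = math.log1p(1e-6) / 1e-6
```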

Python: Verifying L'Hôpital Numerically

The rule is a theorem, but a computer can illustrate it beautifully. Below we compute $\sin(3x)/\sin(5x)$ at shrinking $x$ values and watch it converge to the L'Hôpital answer $3/5$ . We then compute the answer directly by plugging $x = 0$ into $f'/g'$ — zero shrinking required.

Pure Python: naive shrinking vs one-shot L'Hôpital

🐍lhopital_rule.py

import math

We need math.sin and math.cos. Using the stdlib keeps the example portable and highlights that L'Hôpital is a calculus idea, not a library trick.

EXECUTION STATE

math = Python standard library for scalar math — sin, cos, exp, log, etc. Pure-Python, no dependencies.

def f(x): the numerator sin(3x)

Defines the numerator we want to take the limit of. sin(3x) vanishes at 0 because sin(0) = 0, which is exactly what makes the quotient 0/0 and forces us to reach for L'Hôpital.

EXECUTION STATE

⬇ input: x = Any real number. We will evaluate it at shrinking positive values approaching 0.

⬆ returns = sin(3x) — a float. Example: f(0.1) = sin(0.3) ≈ 0.2955.

→ why 3x inside sin? = The coefficient 3 sets the derivative at 0: d/dx sin(3x) = 3 cos(3x), which is 3 at x = 0. That 3 becomes half of the final answer.

def g(x): the denominator sin(5x)

The denominator. sin(5x) also vanishes at 0. Because both numerator and denominator go to zero, the quotient is 0/0 — an indeterminate form.

EXECUTION STATE

⬇ input: x = Same sweep as for f — the same x values feed both functions so we can divide them.

⬆ returns = sin(5x). Example: g(0.1) = sin(0.5) ≈ 0.4794.

→ why 5x? = The coefficient 5 sets g'(0) = 5. The final limit is f'(0)/g'(0) = 3/5 — no guessing, L'Hôpital reads it straight off.

def f_prime(x): analytic derivative

The chain rule derivative of sin(3x) is 3 cos(3x). We type it out explicitly so the code reads like the math. A real project might use autograd instead (see the PyTorch code below).

EXECUTION STATE

⬆ returns = 3 * cos(3x). At x = 0: 3 * cos(0) = 3 * 1 = 3.

📚 math.cos(x) = Python stdlib cosine in radians. math.cos(0) = 1, math.cos(π/2) ≈ 0.

def g_prime(x): analytic derivative

Derivative of sin(5x) is 5 cos(5x). At x = 0 this evaluates to 5. The ratio 3/5 will be our limit.

EXECUTION STATE

⬆ returns = 5 * cos(5x). At x = 0: 5 * 1 = 5.

print("Naive limit ...") header

Announce the first experiment: shrink x and watch f(x)/g(x). This is the brute-force approach a computer can always try; we use it to build intuition before applying the rule.

print column headers

Format specifiers right-align the columns so the table is readable. `:>10` means right-align in width 10; `:>14` leaves room for long decimals.

EXECUTION STATE

:>10 = Right-align text in a width-10 field. Example: 'x'.rjust(10) gives nine spaces followed by 'x'.

print("-" * 54)

Repeats a dash 54 times as a horizontal separator. Python's `str * int` is the idiomatic way to build separator strings.

for x in [0.5, 0.1, 0.01, 0.001, 1e-4]

Walk x down by factors of 10. Each row squeezes x closer to 0 so we can watch the ratio stabilise. If the limit is real, the ratio should converge; if not, it will diverge or oscillate.

LOOP TRACE · 5 iterations

x = 0.5

sin(3·0.5) = sin(1.5) = 0.997495

sin(5·0.5) = sin(2.5) = 0.598472

ratio = 1.66673586

x = 0.1

sin(0.3) = 0.295520

sin(0.5) = 0.479426

ratio = 0.61640481

x = 0.01

sin(0.03) = 0.029996

sin(0.05) = 0.049979

ratio = 0.60016004

x = 0.001

ratio = 0.60000160

x = 1e-4

ratio = 0.60000002

→ target = 3/5 = 0.6

fx = f(x): evaluate the numerator

Pull the numerator value out once. Reusing `fx` avoids computing sin(3x) twice on the same line (once for display, once in the division).

EXECUTION STATE

fx = A single float equal to sin(3x). Example: fx at x = 0.01 is 0.029996.

gx = g(x): evaluate the denominator

Same idea for the denominator. Pulling them into named variables makes the table-print on the next line readable and avoids evaluating each function twice.

EXECUTION STATE

gx = A single float equal to sin(5x). Example: at x = 0.01, gx = 0.049979.

print formatted row

Print one row of the convergence table. `:.6f` gives 6 fixed-point decimals; `:.8f` for the ratio shows more digits so we can watch the last few digits stabilise at 0.60000000…

EXECUTION STATE

:.6f = Fixed-point, 6 decimal places. 0.029996 renders as '0.029996'.

:.8f = Fixed-point, 8 decimals. We use extra digits for the ratio so convergence is visible at small x.

print(): blank separator

Empty print inserts a newline. Keeps the naive-vs-L'Hôpital outputs visually separated.

print("L'Hopital: ...") header

Announce the shortcut: instead of shrinking x, evaluate f'(0)/g'(0) at the trouble spot itself. This only works because both f and g are differentiable at 0 and g'(0) ≠ 0.

limit = f_prime(0) / g_prime(0)

Read the limit off directly: f'(0) = 3, g'(0) = 5, so the limit is 3/5 = 0.6. No shrinking, no table — one arithmetic step.

EXECUTION STATE

f_prime(0) = 3 · cos(0) = 3.0

g_prime(0) = 5 · cos(0) = 5.0

limit = 3.0 / 5.0 = 0.6

→ why this is legal = L'Hôpital applies when (1) f and g → 0 at the same point, (2) f and g are differentiable near that point, (3) g'(x) ≠ 0 there. All three are true here: cos(0) = 1, so g'(0) = 5 ≠ 0.

print the computed limit

Echo the arithmetic so the reader sees 3/5 = 0.6 explicitly. `.6f` formats the answer to six decimal places for a clean-looking 0.600000.

print("Absolute error ...") header

Start the third experiment: measure how fast the naive ratio approaches 0.6. If L'Hôpital is correct, the error should shrink smoothly to zero as x → 0.

for x in [0.1, 0.01, 0.001, 1e-4]

Sweep x over four values spanning three orders of magnitude. Watching |f(x)/g(x) − 0.6| shrink by a factor of about 100 each time x shrinks by 10 is a strong numerical witness that our analytic answer is correct.

LOOP TRACE · 4 iterations

x = 0.1

|f/g − 3/5| = 1.640e-02

x = 0.01

|f/g − 3/5| = 1.600e-04

x = 0.001

|f/g − 3/5| = 1.600e-06

x = 1e-4

|f/g − 3/5| = 1.600e-08

→ pattern = Error shrinks roughly as x². This matches the Taylor remainder: sin(3x)/sin(5x) = 3/5 + O(x²).

err = abs(f(x) / g(x) - limit)

Measure the distance between the numerical ratio and the L'Hôpital answer 3/5. `abs()` returns the absolute value so we do not need to worry about sign.

EXECUTION STATE

abs(a) = Built-in absolute value. abs(-0.01) = 0.01.

err = Non-negative float. Shrinks roughly like x² because the next-order Taylor term in sin(3x)/sin(5x) − 3/5 is O(x²).

print formatted error row

`:>7.0e` right-aligns x in width-7 scientific notation with no fractional digits. `:.3e` formats the error with 3 digits after the decimal point (4 significant digits).

EXECUTION STATE

:>7.0e = Example: 1e-4 → '1e-04', padded on the left with two spaces to width 7.

:.3e = Example: 0.00016 → '1.600e-04'.

import math

def f(x):
    """Numerator: vanishes at x = 0."""
    return math.sin(3 * x)

def g(x):
    """Denominator: also vanishes at x = 0."""
    return math.sin(5 * x)

def f_prime(x):
    """Analytic derivative of sin(3x) is 3 cos(3x)."""
    return 3 * math.cos(3 * x)

def g_prime(x):
    """Analytic derivative of sin(5x) is 5 cos(5x)."""
    return 5 * math.cos(5 * x)

# --- 1) Naive evaluation: shrink x toward 0 and watch f/g. -----------------
print("Naive limit of f(x)/g(x) as x -> 0")
print(f"{'x':>10}  {'f(x)':>12}  {'g(x)':>12}  {'f(x)/g(x)':>14}")
print("-" * 54)
for x in [0.5, 0.1, 0.01, 0.001, 1e-4]:
    fx = f(x)
    gx = g(x)
    print(f"{x:>10.4g}  {fx:>12.6f}  {gx:>12.6f}  {fx / gx:>14.8f}")

# --- 2) L'Hopital shortcut: evaluate f'/g' at x = 0 directly. --------------
print()
print("L'Hopital: evaluate ratio of derivatives AT x = 0")
limit = f_prime(0) / g_prime(0)        # (3*cos(0)) / (5*cos(0)) = 3/5
print(f"f'(0) / g'(0) = {f_prime(0)}/{g_prime(0)} = {limit:.6f}")

# --- 3) Sanity check: the two approaches agree in the limit. ---------------
print()
print("Absolute error between naive ratio and 3/5")
for x in [0.1, 0.01, 0.001, 1e-4]:
    err = abs(f(x) / g(x) - limit)
    print(f"x = {x:>7.0e}   |f/g - 3/5| = {err:.3e}")

What to notice in the output

The naive ratio needs x shrunk to $10^{-4}$ before it matches 0.6 to seven decimal places. The L'Hôpital call produces the exact value from a single division. The error column also shows the quadratic convergence predicted by Taylor, a quiet confirmation that $f'(0)/g'(0)$ really is the limit.

PyTorch: Using Autograd as an L'Hôpital Engine

Hand-differentiating $\sin(3x)$ is easy. Hand-differentiating a 100-term loss function in a neural net is not. Autograd does it for us, which means we can build a fully automatic L'Hôpital evaluator: declare $f$ and $g$ , call .backward() twice, divide, done.

PyTorch: autograd reads f'(0) and g'(0) in one call each

🐍lhopital_autograd.py

import torch

PyTorch gives us torch.sin, torch.cos, and — crucially — automatic differentiation. We will use autograd to compute f'(0) and g'(0) without hand-coding the chain rule.

EXECUTION STATE

torch = Tensor library with autograd. Here we use only 0-dim (scalar) tensors.

def f(x): return torch.sin(3 * x)

Exactly the same numerator, but with torch.sin so the operation is recorded in the autograd graph. Multiplying a tensor by 3 is also a tracked op, so the chain rule flows automatically.

EXECUTION STATE

📚 torch.sin(t) = Elementwise sine of a tensor. Has an autograd implementation: derivative is torch.cos(t).

⬇ input: x (tensor, requires_grad=True) = A 0-dim tracked tensor. Each op on it appends a node to the autograd graph.

⬆ returns = A scalar tensor whose grad_fn is SinBackward, wrapping a MulBackward for 3*x.

def g(x): return torch.sin(5 * x)

Denominator as a tensor function. Autograd records this as SinBackward ∘ MulBackward just like f.

EXECUTION STATE

⬆ returns = sin(5x) as a scalar tensor. At x = 0, value is 0.0.

Comment: probe at x = 0

Explains the reasoning for the next line: we want to evaluate the derivatives at the singular point itself. Autograd is happy to do this because sin is differentiable everywhere, even where it equals zero.

x = torch.tensor(0.0, requires_grad=True)

Create a tracked scalar tensor at exactly x = 0. With requires_grad=True, PyTorch will accumulate gradients into x.grad on every subsequent backward() call.

EXECUTION STATE

📚 torch.tensor(data, requires_grad) = Constructor: builds a leaf tensor. requires_grad=True registers it for gradient tracking.

⬇ arg 1: 0.0 = The value at which we want the derivative. Both f and g vanish here — exactly the place naive evaluation produces 0/0.

⬇ arg 2: requires_grad=True = Enables autograd. Without it, calling .backward() on a downstream tensor would error because there is nothing to differentiate with respect to.

→ x.grad = Starts as None. Populated (or added to) by every subsequent .backward() call.

Comment: evaluate f and g once

Signals what the next two lines do: two forward passes, each building its own autograd subgraph that shares the leaf x.

y_f = f(x)

Forward pass through f. y_f is a scalar tensor with value sin(0) = 0.0 and a grad_fn that chains SinBackward → MulBackward → x.

EXECUTION STATE

y_f = tensor(0.0, grad_fn=<SinBackward0>)

→ y_f.item() = 0.0 — this is why the naive ratio is 0/0 and needs the rule.

y_g = g(x)

Forward pass through g. Same story: value 0, a separate autograd subgraph ending at x.

EXECUTION STATE

y_g = tensor(0.0, grad_fn=<SinBackward0>)

y_f.backward(retain_graph=True)

Backprop through f's graph to deposit df/dx into x.grad. retain_graph=True tells PyTorch to keep the buffers saved for f's graph after this call. It is not strictly required here, because y_g has its own graph and shares only the leaf x with y_f, but it is harmless.

EXECUTION STATE

📚 .backward(retain_graph=False) = Walks the autograd graph from this scalar tensor, applying the chain rule. Accumulates into .grad of every leaf with requires_grad=True.

⬇ arg: retain_graph=True = Keep the saved intermediates of y_f's graph. This would only be required if we backpropagated through y_f's graph a second time; y_g's graph is separate and is not freed by this call.

→ side effect = x.grad becomes tensor(3.0) because df/dx = 3 cos(3x) and cos(0) = 1.

f_prime = x.grad.item()

Pull the Python float out of the 0-dim gradient tensor. This number is f'(0) = 3.0 — precisely the coefficient we hard-coded earlier, now obtained automatically.

EXECUTION STATE

📚 .item() = Returns the Python scalar for a 0-dim tensor. Raises if the tensor has more than one element.

f_prime = 3.0

x.grad = None # clear accumulator

PyTorch *adds* gradients to .grad on each backward call. If we did not clear it, the next backward would return 3 + g'(0) = 3 + 5 = 8, not 5. Setting x.grad to None is the cleanest reset.

EXECUTION STATE

why not .zero_()? = Both work. Setting to None is slightly cheaper (no memory to zero) and also makes the next backward allocate a fresh tensor.

y_g.backward()

Backprop through g to load dg/dx into x.grad. No retain_graph needed because we do not need g's graph again.

EXECUTION STATE

→ side effect = x.grad = tensor(5.0) because g'(x) = 5 cos(5x) and cos(0) = 1.

g_prime = x.grad.item()

Extract g'(0) as a Python float. This is 5.0 — the other number L'Hôpital needs.

EXECUTION STATE

g_prime = 5.0

print("f(0) = ...")

Echo the forward values so the reader sees 0/0 concretely before the rule saves the day.

EXECUTION STATE

y_f.item() = 0.0

:.6f = Fixed-point, 6 decimals. 0.0 → '0.000000'.

print("g(0) = ...")

Same sanity-print for g. Two zeros on top of each other — the indeterminate form.

print("f'(0) = ...")

Show the autograd-computed numerator-of-L'Hôpital.

EXECUTION STATE

f_prime = 3.0 — matches 3·cos(0) = 3.

print("g'(0) = ...")

And the denominator-of-L'Hôpital.

EXECUTION STATE

g_prime = 5.0 — matches 5·cos(0) = 5.

print L'Hopital limit

One division and the limit drops out: 3.0 / 5.0 = 0.6. The whole detour through 0/0 was unnecessary once we had autograd.

EXECUTION STATE

f_prime / g_prime = 0.6 — the exact answer, matching the naive table's convergence target.

print(): blank line

Cosmetic newline between the L'Hôpital result and the sanity-check table.

print("Autograd ... vs numerical ratio")

Header for the sanity check. We will compare the naive ratio f(x)/g(x) against the autograd-computed L'Hôpital answer across shrinking x.

with torch.no_grad():

Disable autograd inside the block. We are only printing numbers, not backproping. Skipping graph construction is faster and avoids accidentally polluting x.grad from the earlier backward calls.

EXECUTION STATE

📚 torch.no_grad() = Context manager that sets requires_grad=False for all ops inside. Standard idiom for evaluation/inference.

→ why here? = We are done with gradients. Every forward pass under no_grad is cheaper and cannot alter x.grad.

for xv in [0.1, 0.01, 0.001, 1e-4]

Iterate over the same shrinking x values we used in pure Python. Re-running under torch keeps the two implementations comparable.

LOOP TRACE · 4 iterations

xv = 0.1

naive = 0.61640481

err = 1.64e-02

xv = 0.01

naive = 0.60016004

err = 1.60e-04

xv = 0.001

naive = 0.60000160

err = 1.60e-06

xv = 1e-4

naive = 0.60000002

err = 1.60e-08

xt = torch.tensor(xv)

Wrap each Python float in a zero-dim tensor so we can feed it through the same f, g that expect tensors. No requires_grad needed — we are inside no_grad and do not need gradients here.

EXECUTION STATE

xt = A scalar tensor with the current xv value. Example: torch.tensor(0.01) → tensor(0.0100).

naive = (f(xt) / g(xt)).item()

Compute the naive ratio as a Python float. Division of two scalar tensors is itself a scalar tensor; `.item()` unwraps it.

EXECUTION STATE

f(xt) / g(xt) = Elementwise tensor division. For xv = 0.01 the result is tensor(0.6002). Inside no_grad it builds no backward graph.

naive = Python float version of the ratio. Example at xv = 0.01: 0.60016004.

err = abs(naive - f_prime / g_prime)

Distance between the numerical ratio and the L'Hôpital answer. If the rule gave the correct limit, this should keep shrinking toward zero as xv → 0.

EXECUTION STATE

err = Non-negative float. Decreases quadratically (O(x²)) because the next Taylor term of sin(3x)/sin(5x) is quadratic.

print formatted row

Print xv (scientific, 0 fractional digits), the naive ratio (8 decimals to show convergence), and the absolute error (3-sig-fig scientific).

EXECUTION STATE

:>7.0e = Right-align width 7, scientific, 0 digits: 1e-4 → '1e-04', padded on the left with two spaces.

:.8f = Fixed-point with 8 decimals: 0.6 → '0.60000000'.

:.2e = Scientific with 2 decimals: 1.6e-4 → '1.60e-04'.

import torch

def f(x):
    return torch.sin(3 * x)

def g(x):
    return torch.sin(5 * x)

# 1) A point where BOTH functions vanish. We evaluate exactly at x = 0:
#    autograd only needs a point at which to evaluate, and sin is
#    differentiable there.
x = torch.tensor(0.0, requires_grad=True)

# 2) Evaluate f and g once. This builds two autograd graphs that
#    share x as the leaf.
y_f = f(x)
y_g = g(x)

# 3) Backprop through f separately from g. We reset x.grad in between
#    because PyTorch accumulates gradients by default.
y_f.backward(retain_graph=True)    # flag is harmless here; the graphs share only the leaf x
f_prime = x.grad.item()
x.grad = None                      # clear accumulator

y_g.backward()
g_prime = x.grad.item()

print(f"f(0)  = {y_f.item():.6f}   (should be 0)")
print(f"g(0)  = {y_g.item():.6f}   (should be 0)")
print(f"f'(0) = {f_prime:.6f}")
print(f"g'(0) = {g_prime:.6f}")
print(f"L'Hopital limit = f'(0) / g'(0) = {f_prime / g_prime:.6f}")

# 4) Sanity check: sweep x and compare f(x)/g(x) with the L'Hopital answer.
print()
print("Autograd L'Hopital vs numerical ratio")
with torch.no_grad():
    for xv in [0.1, 0.01, 0.001, 1e-4]:
        # float64 so that errors near 1e-8 can be resolved (float32 cannot)
        xt = torch.tensor(xv, dtype=torch.float64)
        naive = (f(xt) / g(xt)).item()
        err = abs(naive - f_prime / g_prime)
        print(f"x = {xv:>7.0e}  naive = {naive:.8f}  err = {err:.2e}")
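An alternative sketch that sidesteps gradient accumulation altogether: torch.autograd.grad returns the derivative as a value instead of adding it to x.grad, so there is nothing to reset between the two calls. The helper name lhopital_ratio is hypothetical, introduced only for this illustration, and it assumes the simple case where $f'(c)/g'(c)$ can be evaluated at $c$ directly.

```python
import torch

def lhopital_ratio(f, g, c):
    """Return f'(c) / g'(c) via torch.autograd.grad (hypothetical helper).

    Assumes f and g are differentiable at c and g'(c) != 0.
    """
    x = torch.tensor(float(c), dtype=torch.float64, requires_grad=True)
    (df,) = torch.autograd.grad(f(x), x)   # derivative of f at c, returned directly
    (dg,) = torch.autograd.grad(g(x), x)   # derivative of g at c, no accumulation involved
    return (df / dg).item()

ratio = lhopital_ratio(lambda x: torch.sin(3 * x),
                       lambda x: torch.sin(5 * x),
                       0.0)
print(f"f'(0) / g'(0) = {ratio:.6f}")
```

The trade-off is mostly one of style: .backward() mirrors what a training loop does, while torch.autograd.grad keeps the two derivatives as ordinary return values.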

The deep connection to machine learning

Every training step of a neural network computes the gradient of the loss with respect to the weights. That computation is, in effect, asking "how does the output respond to a tiny nudge?", and autograd answers it with the same local rates that L'Hôpital reads off as $f'(c)/g'(c)$ . Optimisers do not apply L'Hôpital's Rule itself; what they share with it is the idea that behaviour near a point is governed by the derivatives there, an idea already at work in the 17th-century rule and still at work when training billion-parameter models.

Where L'Hôpital Unlocks Real Problems

📐 Growth comparisons

Which grows faster: $x^{100}$ or $1.01^x$ ? L'Hôpital applied 100 times to $x^{100}/1.01^x$ proves the exponential wins, which is the fact behind polynomial-versus-exponential comparisons in complexity analysis.
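The crossover is far enough out that a direct evaluation overflows, so the sketch below compares logarithms instead. It is a numerical illustration of the claim, not a substitute for the L'Hôpital argument; the sample points are arbitrary.

```python
import math

def log_ratio(x):
    """ln(x**100 / 1.01**x), computed with logs so nothing overflows."""
    return 100 * math.log(x) - x * math.log(1.01)

# Positive means the polynomial is still ahead; negative means the exponential has won.
for x in (1e2, 1e4, 1e5, 1e6):
    print(f"x={x:.0e}  ln(x^100 / 1.01^x) = {log_ratio(x):+.1f}")
```

The polynomial leads for a surprisingly long time, but once the sign flips the gap only widens.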

🌊 Small-angle physics

$\sin(\theta) \approx \theta$ and $1 - \cos(\theta) \approx \theta^2/2$ are L'Hôpital results. They underpin the pendulum equation, optics, and every small-perturbation expansion in physics.

📈 Continuous compounding

The limit $(1 + r/n)^n \to e^r$ is the reason continuous compounding exists. Without L'Hôpital (or Taylor), the conversion from discrete to continuous interest is mysterious.

🤖 Model selection

Ratios of log-likelihoods, Bayes factors, softmax temperatures — many statistical criteria reduce to ratios that are 0/0 or ∞/∞ at critical limits. L'Hôpital is the tool that makes them meaningful.

Common Pitfalls

Pitfall 1 — Applying the rule to non-indeterminate forms

$\displaystyle \lim_{x \to 1} \frac{x^2 + 1}{x + 1} = \frac{2}{2} = 1$ by direct substitution. Blindly differentiating gives $2x/1 = 2$ at $x = 1$ — a wrong answer. Always verify the form is 0/0 or ∞/∞ before differentiating.
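The same mistake in code, as a minimal sketch: checking the values at $x = 1$ first shows that the form is not 0/0, so the derivative ratio has no reason to match the limit.

```python
def f(x):
    return x**2 + 1

def g(x):
    return x + 1

def f_prime(x):
    return 2 * x

def g_prime(x):
    return 1.0

c = 1.0
print("f(c), g(c) =", f(c), g(c))     # neither is 0, so this is not an indeterminate form
direct = f(c) / g(c)                  # correct: direct substitution
blind = f_prime(c) / g_prime(c)       # incorrect: rule applied without checking the form
print("direct substitution:", direct)
print("blind L'Hopital:    ", blind)
```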

Pitfall 2 — Using the quotient rule

L'Hôpital replaces $f(x)/g(x)$ with $f'(x)/g'(x)$ — not with $(f/g)'$ . Do not apply the quotient rule; it gives a different (usually much uglier) expression.

Pitfall 3 — Stopping too early when the form persists

After one application, the new ratio may still be indeterminate. Keep applying the rule (or switch to Taylor) until you land on a form that can be evaluated directly. Trying $(1 - \cos x)/x^2$ with only one pass produces $\sin(x)/(2x)$ — an answer, but still 0/0. A second pass gives the true answer 1/2.

Pitfall 4 — When the derivative limit does not exist

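If $\lim f'/g'$ fails to exist, the rule is silent; it does not say the original limit fails. For instance, with $f(x) = x^2\sin(1/x)$ and $g(x) = x$ as $x \to 0$ , the derivative ratio $2x\sin(1/x) - \cos(1/x)$ oscillates forever, even though $f/g = x\sin(1/x) \to 0$ . You may need a different technique (squeeze theorem, Taylor remainder bounds, etc.).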
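One standard illustration, chosen here as an assumption for the sketch, is $f(x) = x^2\sin(1/x)$ and $g(x) = x$ near 0. Sampling at the points $x = 1/(k\pi)$ makes the oscillation of the derivative ratio easy to see, while the original ratio quietly shrinks.

```python
import math

def ratio(x):
    """f/g with f = x**2 * sin(1/x) and g = x; simplifies to x * sin(1/x)."""
    return (x**2 * math.sin(1 / x)) / x

def derivative_ratio(x):
    """f'/g' = 2x sin(1/x) - cos(1/x), since g' = 1."""
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

# At x = 1/(k*pi), cos(1/x) alternates between +1 and -1 as k increases.
for k in (1000, 1001, 1002, 1003):
    x = 1 / (k * math.pi)
    print(f"x={x:.3e}  f/g={ratio(x):+.2e}  f'/g'={derivative_ratio(x):+.4f}")

samples = [derivative_ratio(1 / (k * math.pi)) for k in (1000, 1001)]
```

The derivative ratio keeps flipping between values near −1 and +1, so it has no limit, yet the original ratio is squeezed to 0.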

Pitfall 5 — Circular reasoning for sin(x)/x

Using L'Hôpital to "prove" $\lim_{x \to 0} \sin(x)/x = 1$ is logically circular, because the derivative of $\sin$ is usually derived from that limit. This is a book-keeping issue, not a rule issue: once derivatives of trig are established by a geometric squeeze argument, L'Hôpital handles every other trig limit cleanly.

Summary

L'Hôpital's Rule is a bridge between two pillars of calculus: the limit and the derivative. When direct substitution produces an indeterminate form, the rule says: the answer is controlled by the rate at which numerator and denominator approach their limiting values. Formally,

\displaystyle \lim_{x \to c} \frac{f(x)}{g(x)} = \lim_{x \to c} \frac{f'(x)}{g'(x)}

whenever the hypotheses (indeterminate form, differentiability, $g' \neq 0$ , and existence of the limit of $f'/g'$ ) hold.

Indeterminate form	Conversion	Then apply
0 / 0	—	L'Hôpital directly
∞ / ∞	—	L'Hôpital directly
0 · ∞	Rewrite as 0/(1/∞) or ∞/(1/0)	L'Hôpital on the quotient
∞ − ∞	Combine into one fraction via common denominator	L'Hôpital
0⁰, 1^∞, ∞⁰	Take ln; becomes 0 · ∞	Convert and apply L'Hôpital
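The ∞ − ∞ row can be illustrated with $\csc(x) - 1/x$ as $x \to 0^+$ . The sketch below works under these steps: combine into $(x - \sin x)/(x \sin x)$ , a 0/0 form; one pass gives $(1 - \cos x)/(\sin x + x\cos x)$ , still 0/0; a second pass gives $\sin x/(2\cos x - x\sin x)$ , which evaluates to $0/2 = 0$ at $x = 0$ .

```python
import math

def difference(x):
    """csc(x) - 1/x: the raw infinity-minus-infinity form."""
    return 1 / math.sin(x) - 1 / x

def combined(x):
    """Same quantity over a common denominator: (x - sin x) / (x sin x)."""
    return (x - math.sin(x)) / (x * math.sin(x))

def after_two_passes(x):
    """Result of two L'Hopital passes: sin x / (2 cos x - x sin x)."""
    return math.sin(x) / (2 * math.cos(x) - x * math.sin(x))

for x in (0.1, 0.01, 0.001):
    print(f"x={x}  diff={difference(x):.6f}  combined={combined(x):.6f}  "
          f"two passes={after_two_passes(x):.6f}")

print("two-pass expression at x = 0:", after_two_passes(0.0))
```

The first two columns are the same quantity written two ways; the third is a different function that shares their limit, 0.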

Key Takeaways

L'Hôpital is a statement about rates: the limit of a ratio at the trouble point equals the limit of the ratio of derivatives there.
The rule applies only to 0/0 and ∞/∞ forms directly; every other indeterminate form must be rewritten into one of these first.
If the new ratio is still indeterminate, apply the rule again — or switch to a Taylor expansion for clarity.
The proof is just a linear approximation: $f(x) \approx f'(c)(x-c)$ and $g(x) \approx g'(c)(x-c)$ , then cancel $(x-c)$ .
In practice, PyTorch's autograd supplies $f'(c)$ and $g'(c)$ automatically, so the L'Hôpital evaluation reduces to a couple of backward calls and a division, using the same derivative machinery that deep-learning optimisers rely on at scale.

The L'Hôpital promise:

"When values collide at 0/0 or ∞/∞, the answer is hidden in their rates. Differentiate once, and the collision unfolds."

Coming Next: Chapter 3 turns the ε–δ language into a property of functions themselves: continuity. We will see why continuous functions are the sandbox on which every theorem about derivatives and integrals is built — and how the smallest break in a curve unravels entire theorems.