Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

State and intuitively justify the two fundamental limits $\displaystyle\lim_{x\to 0}\frac{\sin x}{x}=1$ and $\displaystyle\lim_{n\to\infty}\left(1+\tfrac{1}{n}\right)^{n}=e$ .
Derive the squeeze inequality $\cos x \le \sin x / x \le 1$ from the unit circle and apply the Squeeze Theorem to prove the first limit.
Use the sinc limit to evaluate many related limits — including $(1-\cos x)/x^{2}$ , $\tan x / x$ , and $\sin(kx)/x$ — without invoking L'Hôpital.
Connect the compound-interest sequence $(1+1/n)^{n}$ to the continuous limit $\lim_{h\to 0}(1+h)^{1/h}$ and to $(e^{h}-1)/h \to 1$ .
Verify both limits numerically in plain Python and recognize them as the limits that define $(\sin x)' = \cos x$ and $(e^{x})' = e^{x}$ in PyTorch's autograd.

Why These Two Limits Are Special

Every standard limit rule we have met so far — the sum, product, quotient, and composition laws — says if you already know the pieces, here is how to combine them. But somewhere the chain has to bottom out. Somewhere a few non-trivial limits must be computed from first principles, and all the other calculus rules will then ride on top of them.

Two limits play exactly this role throughout all of analysis. Everything you know about the derivative of sine, cosine, tangent, e^x, ln, and every decay, oscillation, or compound-growth process in physics, economics, and machine learning can be traced back to these two sentences:

The trigonometric seed

\displaystyle\lim_{x\to 0}\frac{\sin x}{x}=1

It says "for small angles the sine is almost equal to the angle itself". This single sentence unlocks the derivative of sine, the small-angle pendulum formula, Fourier analysis, and diffraction optics.

The exponential seed

\displaystyle\lim_{n\to\infty}\left(1+\tfrac{1}{n}\right)^{n}=e

It says "compounding gains infinitely often gives a finite, universal number". This constant e ≈ 2.71828 then drives every differential equation of growth, decay, interest, and probability.

Why they cannot be avoided

The first limit has the indeterminate form $0/0$ and the second the form $1^{\infty}$ . Algebra can't resolve them directly: we need a geometric sandwich argument for the first and a delicate monotone bounded-sequence argument for the second. That is exactly why they have to be proved once, and every other identity then reuses them.

sin(x)/x — Geometric Intuition

Draw a unit circle centered at O and pick a small positive angle $x$ (in radians). Three natural lengths appear in the picture:

The vertical drop from the circle point to the horizontal axis has length $\sin x$ .
The arc on the circle from the horizontal axis to that point has length exactly $x$ — that is the geometric definition of a radian.
The tangent segment dropped from the end of the arc to the horizontal axis (outside the circle) has length $\tan x$ .

Comparing areas of the three shaded regions in the interactive below:

\underbrace{\tfrac{1}{2}\sin x\cos x}_{\text{green triangle}}\;\le\;\underbrace{\tfrac{1}{2}x}_{\text{violet sector}}\;\le\;\underbrace{\tfrac{1}{2}\tan x}_{\text{red triangle}}.

Divide the whole chain by $\tfrac{1}{2}\sin x > 0$ to get

\cos x\;\le\;\dfrac{x}{\sin x}\;\le\;\dfrac{1}{\cos x},

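and invert (all three quantities are positive for $0<x<\pi/2$ , so taking reciprocals reverses the inequalities):

\cos x\;\le\;\dfrac{\sin x}{x}\;\le\;\dfrac{1}{\cos x}.

The upper bound can be tightened to 1. The triangle with vertices O, $(1,0)$ , and the circle point has area $\tfrac{1}{2}\sin x$ and sits inside the sector of area $\tfrac{1}{2}x$ , so $\sin x\le x$ . Hence

\cos x\;\le\;\dfrac{\sin x}{x}\;\le\;1.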

As $x\to 0$ , $\cos x\to 1$ (cosine is continuous at 0), so the ratio is trapped between two things that both converge to 1. The Squeeze Theorem forces $\sin x / x$ to converge to 1 as well.
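The squeeze can also be checked numerically. The following is a minimal sketch, not part of the original derivation; the helper name `squeeze_bounds` is made up for illustration. It evaluates the three quantities in the chain at shrinking angles and confirms that the middle one stays trapped.

```python
import math

def squeeze_bounds(x):
    """Return (cos x, sin x / x, 1) for a nonzero angle x in radians."""
    return math.cos(x), math.sin(x) / x, 1.0

for x in [1.0, 0.5, 0.1, 0.01, 0.001]:
    lower, middle, upper = squeeze_bounds(x)
    # The chain cos x <= sin x / x <= 1 should hold on (0, pi/2).
    assert lower <= middle <= upper
    print(f"x={x:<6} cos x={lower:.9f}  sin x/x={middle:.9f}  gap={upper - lower:.2e}")
```

The printed gap between the two bounds is the room the ratio has left; it closes as the angle shrinks.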

Interactive: Squeeze sin(x)/x onto 1

Drag the slider (or press Play) to shrink the angle $x$ . Watch the green, violet, and red regions flatten onto each other in the unit circle and — at the same time — the purple curve $\sin x / x$ get pinched onto the horizontal line $y=1$ .


Read the picture like this

The violet arc is $x$ . The green height is $\sin x$ . For small angles these two shrink at the same rate — their ratio approaches 1. For a 0.01 radian slice (about half a degree), the sine and the arc agree to six decimal places: the small-angle approximation $\sin x\approx x$ is that good.

The Sandwich Proof

Theorem (First Special Limit).

\displaystyle\lim_{x\to 0}\dfrac{\sin x}{x}=1.

Proof. Fix $0<x<\pi/2$ . From the unit-circle areas we derived

\cos x\le\dfrac{\sin x}{x}\le 1.

Cosine is continuous at 0 with $\cos 0=1$ , so the left bound tends to 1. The right bound is already 1. By the Squeeze Theorem, the middle term is forced to tend to 1. The case $-\pi/2<x<0$ is identical because both $\sin x$ and $x$ flip sign, so the ratio is even. ∎

The one-line corollary

\sin x = x + o(x)\quad\text{as }x\to 0.

In words: sine is the identity function plus a small error. Physicists call this the small-angle approximation; it is what lets them replace $\sin\theta$ with $\theta$ in the pendulum equation and recover simple harmonic motion.

Worked Example: Why sin(5x)/x → 5

Compute $\displaystyle\lim_{x\to 0}\dfrac{\sin(5x)}{x}$ without L'Hôpital, Taylor series, or any machinery beyond the first special limit.

Work it by hand, step by step

Step 1. The numerator is $\sin(5x)$ , not $\sin x$ , so the first limit does not apply directly. Multiply the numerator and denominator by a compensating $5$ :

\dfrac{\sin(5x)}{x} = 5\cdot\dfrac{\sin(5x)}{5x}.

Step 2. Substitute $u=5x$ . Then as $x\to 0$ , also $u\to 0$ (because 5 is just a scale factor).

5\cdot\dfrac{\sin(5x)}{5x} = 5\cdot\dfrac{\sin u}{u}.

Step 3. Apply the first special limit to the right-hand factor:

\dfrac{\sin u}{u}\xrightarrow{u\to 0}1.

Step 4. Combine:

\lim_{x\to 0}\dfrac{\sin(5x)}{x} = 5\cdot 1 = 5.

Check numerically. At $x=0.01$ : $\sin(0.05)/0.01 \approx 4.99791\ldots$ , and at $x=0.001$ : $\sin(0.005)/0.001 \approx 4.99997916\ldots$ , both approaching 5. ✔

The general rule. The exact same substitution shows $\lim_{x\to 0}\sin(kx)/x=k$ for any nonzero constant $k$ (for $k=0$ the ratio is identically 0, which again equals $k$ ). The factor $k$ inside the sine controls the slope at zero; everything else cancels.
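As a quick illustration of the general rule, the sketch below probes $\sin(kx)/x$ for a few constants. The function name `scaled_sine_ratio` and the chosen values of $k$ are assumptions made for this example only.

```python
import math

def scaled_sine_ratio(k, x):
    """sin(k*x) / x for nonzero x; the rule above says this tends to k."""
    return math.sin(k * x) / x

for k in [0.5, 2.0, 5.0, -3.0]:
    values = [scaled_sine_ratio(k, x) for x in (1e-1, 1e-3, 1e-5)]
    print(f"k={k:+.1f}", [f"{v:.8f}" for v in values])
```

Each row should drift toward its own $k$ , including the negative one.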

Playground: Limits Built from sin(x)/x

Pick a formula and shrink the window. Each curve has a removable hole at 0; the green line marks the value the hole would fill in. Notice how every curve flattens onto its green line as $x\to 0$ , and how the error in the numerical probe table shrinks by about two decimal digits per row, a signature of an algebraic (power-law) rate of convergence.


The trick behind (1 − cos x)/x² = 1/2

Multiply numerator and denominator by $1+\cos x$ :

\dfrac{1-\cos x}{x^{2}} = \dfrac{1-\cos^{2}x}{x^{2}(1+\cos x)} = \left(\dfrac{\sin x}{x}\right)^{2}\dfrac{1}{1+\cos x}.

Both factors approach easy values: $(\sin x/x)^{2}\to 1$ and $1/(1+\cos x)\to 1/2$ . Product → $1/2$ .
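The same rewrite is also useful numerically. The sketch below (function names `naive` and `rewritten` are illustrative, not from the text) evaluates both sides of the identity. For very small $x$ the direct form suffers from cancellation in $1-\cos x$ : at $x=10^{-9}$ the cosine rounds to 1.0 in double precision and the direct value collapses to 0, while the rewritten form stays near $1/2$ .

```python
import math

def naive(x):
    """(1 - cos x) / x^2 evaluated directly."""
    return (1.0 - math.cos(x)) / (x * x)

def rewritten(x):
    """The same quantity written as (sin x / x)^2 / (1 + cos x)."""
    s = math.sin(x) / x
    return s * s / (1.0 + math.cos(x))

for x in [1e-1, 1e-3, 1e-5, 1e-9]:
    print(f"x={x:.0e}  naive={naive(x):.12f}  rewritten={rewritten(x):.12f}")
```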

(1 + 1/n)^n — The Compound-Interest Story

In 1683 Jacob Bernoulli asked a practical question that secretly founded modern analysis: if a bank pays 100% interest once a year, you end up with $2 on each $1. What if they compound twice a year — 50% every six months?

\left(1+\tfrac{1}{2}\right)^{2}=\$2.25.

Four times a year (25% per quarter) gives $(1+\tfrac{1}{4})^{4}=\$2.44\ldots$ . Monthly gives $(1+\tfrac{1}{12})^{12}=\$2.613\ldots$ . The more often you compound, the more you earn — but the payoff of adding another compounding step keeps shrinking. Is there a ceiling? Yes: the sequence $a_n=(1+1/n)^{n}$ climbs forever but never crosses

e = 2.71828182845904\ldots

This number e is the one Euler later placed at the center of analysis. Its definition is literally the limit of the compound-interest sequence.

In short, e is what compound growth becomes in the limit of instantaneous compounding. Every appearance of $e^{t}$ in physics (radioactive decay, RC circuit voltage, population growth, Planck radiation) is the continuous limit of a discrete multiplicative process compounded ever more finely.

Interactive: Watch the Balance Climb to e

The left panel shows the running balance of $1 compounded $n$ times a year. The right panel plots $a_n=(1+1/n)^{n}$ on a logarithmic axis. Drag the slider to change $n$ by orders of magnitude.


What to notice

The sequence is monotonically increasing — every additional compounding step strictly raises the final balance.
The sequence is bounded above by $e$ . No finite $n$ ever reaches $e$ .
Convergence is slow: the gap $e-a_n\approx e/(2n)$ , so doubling $n$ only halves the error. Each extra correct digit of $e$ costs a tenfold increase in $n$ ; ten correct digits require $n\sim 10^{10}$ .
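The three observations above can be spot-checked in a few lines. This is only a sketch over a finite range of $n$ (it cannot prove anything for all $n$ ); the helper name `a` mirrors the notation $a_n$ and is otherwise an assumption of this example.

```python
import math

def a(n):
    """n-th term of the compound-interest sequence (1 + 1/n)^n."""
    return (1 + 1 / n) ** n

terms = [a(n) for n in range(1, 201)]
increasing = all(later > earlier for earlier, later in zip(terms, terms[1:]))
bounded = all(t < math.e for t in terms)
print("strictly increasing:", increasing, "| below e:", bounded)

# If the gap behaves like e/(2n), doubling n should roughly halve it.
for n in [100, 1_000, 10_000]:
    ratio = (math.e - a(n)) / (math.e - a(2 * n))
    print(f"n={n:<6} gap(n)/gap(2n) = {ratio:.4f}")
```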

Why Does It Converge?

Unlike the sinc limit, there is no simple geometric picture — the proof is about numbers on a line, not shapes in a plane. The standard argument is that the sequence is monotonically increasing and bounded above. A classical theorem then forces it to converge.

Step 1 — Monotonicity via the binomial theorem

Expand by the binomial formula:

\left(1+\tfrac{1}{n}\right)^{n}=\sum_{k=0}^{n}\binom{n}{k}\dfrac{1}{n^{k}}=\sum_{k=0}^{n}\dfrac{1}{k!}\prod_{j=0}^{k-1}\left(1-\dfrac{j}{n}\right).

Each factor $(1-j/n)$ with $j\ge 1$ strictly increases with $n$ , and the sum gains one extra positive term when $n$ grows by 1. No summand shrinks and the terms with $k\ge 2$ strictly grow, so the whole sum grows: $a_{n+1}>a_{n}$ .

Step 2 — Boundedness via factorial comparison

Each product $\prod(1-j/n)\le 1$ , so

a_{n}\le\sum_{k=0}^{n}\dfrac{1}{k!}\le 1+1+\dfrac{1}{2}+\dfrac{1}{2^{2}}+\cdots=3.

(We used $k!\ge 2^{k-1}$ for $k\ge 1$ .) So $a_n<3$ for all $n$ . A bounded increasing sequence must converge. Call the limit $e$ . The series on the right also converges (to the same number) — that is the other standard formula $e=\sum_{k\ge 0}1/k!$ .
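To see the comparison $a_n\le\sum_{k\le n}1/k!<3$ in numbers, here is a small sketch. The helper names `a` and `partial_sum` are chosen for this example; the partial sum is built term by term so that no large factorial is ever formed.

```python
import math

def a(n):
    """(1 + 1/n)^n."""
    return (1 + 1 / n) ** n

def partial_sum(n):
    """s_n = sum_{k=0}^{n} 1/k!, accumulated term by term."""
    total, term = 0.0, 1.0
    for k in range(n + 1):
        total += term      # add 1/k!
        term /= (k + 1)    # turn 1/k! into 1/(k+1)!
    return total

for n in [1, 2, 5, 10, 20]:
    print(f"n={n:>2}  a_n={a(n):.12f}  s_n={partial_sum(n):.12f}")
print(f"math.e        = {math.e:.12f}")
```

The series column settles on $e$ after a handful of terms, while the sequence column is still visibly short of it.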

From Discrete n to Continuous x

The same number $e$ appears in four equivalent limits: the discrete sequence we just studied and three continuous-variable versions that come up constantly in calculus:

Form	Limit	Where you meet it
Integer sequence	(1 + 1/n)^n → e as n → ∞	Compound interest, discrete probability
Continuous base	(1 + h)^(1/h) → e as h → 0	General growth, differential equations
Exponential derivative	(e^h − 1)/h → 1 as h → 0	(e^x)' = e^x, neural network gradients
Logarithmic derivative	ln(1 + h)/h → 1 as h → 0	(ln x)' = 1/x at x = 1, information theory

The bridge from the discrete to the continuous form is simply the substitution $n=1/h$ : as $n\to\infty$ , $h\to 0^{+}$ , and

\left(1+\tfrac{1}{n}\right)^{n}=(1+h)^{1/h}.

The continuous version $(1+h)^{1/h}\to e$ works for $h\to 0$ from either side, not just through integer $n$ .

How (e^h − 1)/h → 1 follows

Take the log of both sides of $(1+h)^{1/h}\to e$ :

\dfrac{\ln(1+h)}{h}\to 1.

Substitute $u=\ln(1+h)$ , so $h=e^{u}-1$ , and invert:

\dfrac{e^{u}-1}{u}\to 1.

These three statements are algebraically interchangeable; all three are worth knowing under their own names.
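A short numerical check of the three continuous forms, approaching 0 from both sides, is sketched below. The function names are illustrative; `math.log1p` and `math.expm1` are standard-library functions that compute $\ln(1+h)$ and $e^{h}-1$ accurately for small $h$ .

```python
import math

def base_form(h):
    """(1 + h)^(1/h), valid for h > -1 and h != 0."""
    return (1 + h) ** (1 / h)

def log_form(h):
    """ln(1 + h) / h, using log1p for accuracy at small h."""
    return math.log1p(h) / h

def exp_form(h):
    """(e^h - 1) / h, using expm1 for accuracy at small h."""
    return math.expm1(h) / h

for h in [1e-2, -1e-2, 1e-4, -1e-4, 1e-6, -1e-6]:
    print(f"h={h:+.0e}  {base_form(h):.8f}  {log_form(h):.8f}  {exp_form(h):.8f}")
```

The first column should approach $e$ and the other two should approach 1, whichever sign $h$ has.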

Worked Example: Continuous Compounding on $1000

A bank offers 5% APR on a $1000 deposit for one year. They will compound at any frequency you choose. How much money do you have at the end?

Walk through the hand calculation

Step 1. If interest is compounded $n$ times per year at rate $r=0.05$ , the ending balance is

A_n=1000\left(1+\tfrac{r}{n}\right)^{n}.

Step 2. Plug in the standard compounding schedules:

Frequency	n	Ending balance
Annually	1	$1050.00
Monthly	12	$1051.1619
Daily	365	$1051.2675
Hourly	8760	$1051.2709

Step 3. Let $n\to\infty$ . Substitute $m=n/r$ so $n=rm$ and $m\to\infty$ too:

\left(1+\tfrac{r}{n}\right)^{n}=\left[\left(1+\tfrac{1}{m}\right)^{m}\right]^{r}\xrightarrow{m\to\infty}e^{r}.

So continuous compounding gives

A_\infty=1000\,e^{0.05}=\$1051.2711\ldots

Step 4 — the punchline. Hourly compounding was already within a fraction of a cent of the continuous limit. The marginal benefit of finer compounding collapses very quickly once you are past daily. That is the practical shadow of $a_n\to e$ : returns are finite even when the number of compounding steps is infinite.
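The table and the continuous limit can be reproduced with a few lines of Python. This is a sketch of the worked example only; the function name `balance` and the variable names are assumptions of this illustration.

```python
import math

def balance(principal, rate, n):
    """Balance after one year with n compounding periods."""
    return principal * (1 + rate / n) ** n

principal, rate = 1000.0, 0.05
for label, n in [("Annually", 1), ("Monthly", 12), ("Daily", 365), ("Hourly", 8760)]:
    print(f"{label:<10} {n:>5}  ${balance(principal, rate, n):.4f}")

continuous = principal * math.exp(rate)
print(f"{'Continuous':<10} {'inf':>5}  ${continuous:.4f}")
```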

The moral for modeling. Whenever a real process multiplies itself by $(1+\text{small})$ many times — population growth, inflation, viral spread, RC discharge — the right idealization is the continuous limit $e^{rt}$ . The special limit is not just a curiosity; it is the bridge between discrete repeated multiplication and continuous exponential growth.

Python: Watching Both Limits Converge

Before jumping to PyTorch, let us see the two limits converge in the simplest possible Python. No libraries beyond math. The annotations below walk through the script line by line; the full listing follows them.

Plain Python — both limits, decimal by decimal

🐍special_limits.py


`import math`

Python's math module gives us sin, exp, and the constant math.e. Pure standard library, no NumPy needed — we only need scalar floats for this numerical study.

EXECUTION STATE

math = Standard library for elementary math. Provides math.sin, math.cos, math.exp, math.log, and math.e ≈ 2.718281828459045.

→ math.e = Euler's number as a Python float (double precision). About 2.718281828459045. We import it to measure how close (1 + 1/n)^n gets to its limit.

`def sinc_ratio(x)` → float

The first special limit as a plain Python function. We'll probe it at shrinking x to see sin(x)/x pinch onto 1. The function is undefined at x = 0 (division by zero) — the whole point of the limit is to tell us what value f would naturally take there.

EXECUTION STATE

⬇ input: x = Any nonzero real number. We will call it with x = 1, 0.1, 0.01, … to see the ratio slide toward 1.

⬆ returns = math.sin(x) / x — a Python float. For x close to 0 this is ≈ 1 but never exactly 1 (except by rounding).

→ example = sinc_ratio(0.1) = sin(0.1) / 0.1 ≈ 0.0998334 / 0.1 ≈ 0.9983342

→ domain hole = x = 0 raises ZeroDivisionError. That's fine — the limit fills in the missing value by approaching, never evaluating.

`return math.sin(x) / x`

math.sin computes the sine (radians) of x using the standard C library under the hood. We divide it by x to form the ratio. Both operations are O(1) floats.

EXECUTION STATE

📚 math.sin(x) = The sine function from C's math.h, wrapped by Python. Accepts radians. Accurate to roughly 15 decimal digits. Example: math.sin(0.1) = 0.09983341664682815.

→ why radians = The limit sin(x)/x → 1 only holds in RADIANS. In degrees sin(x)/x → π/180 ≈ 0.01745. The derivative (sin x)' = cos x also requires radians.

⬆ return: sin(x) / x = A scalar float. Example: for x = 1.0, returns 0.8414709848.

`def euler_sequence(n)` → float

The second special limit as a Python function. We'll call it with n = 1, 10, 100, … to see a_n climb monotonically toward e.

EXECUTION STATE

⬇ input: n = A positive integer, the number of compounding periods. Larger n means finer splitting of the year; the limit n → ∞ is continuous compounding.

⬆ returns = (1 + 1/n) ** n — a Python float that approaches e as n grows.

→ sanity check = euler_sequence(1) = 2.0, euler_sequence(2) = 2.25, euler_sequence(4) = 2.44140625. Already climbing toward 2.718.

`return (1 + 1 / n) ** n`

Two floating-point operations: first form the base 1 + 1/n (slightly bigger than 1 for positive n), then raise it to the integer power n. ** is Python's exponentiation operator.

EXECUTION STATE

1 + 1 / n = The per-period growth factor. For n = 12 (monthly), this is 1 + 1/12 ≈ 1.083333. Money is multiplied by this each month.

** n = Python's power operator. ( )**n multiplies the base by itself n times. Example: 1.01**100 ≈ 2.7048.

→ why 1/n SHRINKS but n GROWS = The base 1 + 1/n → 1 (would give 1) but raising to the n-th power → ∞ (would give ∞). The form 1^∞ is indeterminate, and the exact compromise is e.

⬆ return: a_n = Example: n = 1 → 2.0, n = 1000 → 2.7169239, n = 10⁶ → 2.7182804. Never quite reaches e.

`# --- sin(x)/x as x -> 0 ---`

Comment marker opening the first of two numerical experiments. We're about to print a tiny table showing the ratio at x = 1, 0.1, 0.01, 0.001, 0.0001.

`print("sin(x) / x ->  1")`

Header row for the first table. Tells the reader what limit we're verifying. Prints to stdout.

EXECUTION STATE

📚 print() = Python built-in. Converts each argument to string and writes to standard output followed by newline. No return value worth tracking.

`print(f"{'x':>10}  {'f(x)':>18}  {'|f(x) - 1|':>14}")`

Prints column headings using f-string formatting with right alignment (>) and fixed widths. This guarantees our columns line up even though the header text and numeric values have different widths.

EXECUTION STATE

📚 f-string '{expr:>N}' = Format spec: pad the value to at least N characters, right-aligned. Example: f'{'x':>10}' → 9 spaces + 'x'. The colon separates the expression from the format spec.

`print("-" * 46)`

String multiplication: "-" repeated 46 times produces a visual separator under the header. Python strings support * to repeat and + to concatenate.

EXECUTION STATE

"-" * 46 = A 46-character dash line: ----------------------------------------------

`for x in [1.0, 0.1, 0.01, 0.001, 0.0001]:`

Loop over a geometric sequence of shrinking probes. Each iteration picks one x, evaluates sinc_ratio, and prints the row. We use 5 widely spaced values so we can see the error shrink by about two decimal digits per row.

LOOP TRACE · 5 iterations

x = 1.0

f = sin(1.0) / 1.0 = 0.8414709848

|f - 1| = 1.59e-01 — still 16% off

x = 0.1

f = sin(0.1) / 0.1 = 0.9983341665

|f - 1| = 1.67e-03 — two digits closer

x = 0.01

f = sin(0.01) / 0.01 = 0.9999833334

|f - 1| = 1.67e-05 (another factor of 100 smaller)

x = 0.001

f = sin(0.001) / 0.001 = 0.9999998333

|f - 1| = 1.67e-07

x = 0.0001

f = sin(0.0001) / 0.0001 = 0.9999999983

|f - 1| = 1.67e-09 — essentially 1

`f = sinc_ratio(x)`

Evaluate the sinc ratio at the current probe x. The call returns a float. We save it once so we can print it and compute |f - 1| without recomputing.

EXECUTION STATE

f = The numerical value of sin(x)/x at the current x. Example: at x = 0.1, f = 0.9983341665.

`print(f"{x:>10.4e}  {f:>18.12f}  {abs(f - 1):>14.2e}")`

Formatted print of one row. `{x:>10.4e}` means scientific notation with 4 fractional digits in a 10-wide field; `{f:>18.12f}` is fixed-point with 12 decimals; `{abs(f-1):>14.2e}` is scientific with 2 fractional digits. We use abs() to report the distance from the candidate limit 1.

EXECUTION STATE

📚 abs(v) = Python built-in. Returns |v| for numbers — turns -0.001 into 0.001.

→ format :>10.4e = Right-align in 10-char field, scientific, 4 digits after decimal. Example: 1.0000e-03.

→ format :>18.12f = Right-align in 18-char field, fixed-point, 12 digits after decimal. Example: 0.999999833333.

→ format :>14.2e = Scientific with 2 digits, useful because the error shrinks by orders of magnitude per row.

`print()` (emits a blank line)

print() with no arguments just emits a newline. This visually separates the sinc table from the (1 + 1/n)^n table that follows.

`print("(1 + 1/n)^n ->  e")`

Header for the second experiment — the discrete sequence a_n = (1 + 1/n)^n.

`print(f"e = {math.e:.15f}")`

Print the constant e so the reader has a direct comparison number. `.15f` asks for 15 decimal digits — the full precision of a Python float.

EXECUTION STATE

math.e = 2.718281828459045 — Euler's number to double precision.

→ .15f spec = Fixed-point, 15 decimals. Any more digits would be noise because a double holds ~15–17 significant decimal digits total.

`for n in [1, 10, 100, 10_000, 1_000_000]:`

Loop over n on a log scale. The first two jumps are ×10 and the last two are ×100, so the error e - a_n, which scales like 1/n, should shrink by about the same factors. Python 3.6+ allows underscores in numeric literals for readability (10_000 means 10000).

LOOP TRACE · 5 iterations

n = 1

a_n = (1 + 1/1)^1 = 2.0

e - a_n = 7.18e-01 — off by 26%

n = 10

a_n = (1 + 0.1)^10 = 2.5937424601

e - a_n = 1.25e-01

n = 100

a_n = (1.01)^100 = 2.7048138294

e - a_n = 1.35e-02

n = 10 000

a_n = 2.7181459268

e - a_n = 1.36e-04

n = 1 000 000

a_n = 2.7182804691

e - a_n = 1.36e-06 — still 6 digits to go

`a = euler_sequence(n)`

Evaluate a_n = (1 + 1/n)^n at the current n. We save it in `a` so we can reuse it below without recomputing the expensive power.

EXECUTION STATE

a = Current term of the sequence. Monotonically increasing in n; strictly less than e for every finite n.

`print(f"{n:>12,d}  {a:>18.12f}  {math.e - a:>14.2e}")`

Same formatting pattern as before. The `>12,d` format includes a thousands separator, so 1_000_000 prints as `1,000,000` — much easier to read than `1000000`.

EXECUTION STATE

→ format :>12,d = Integer, width 12, with thousands-comma separator. Example: 1,000,000.

math.e - a = The gap between the true limit and our current approximation. For n = 10⁶ this is ≈ 1.36e-06 — our sequence converges at rate O(1/n), painfully slow compared to most analytic limits.

→ convergence rate = Doubling n roughly halves the gap. To get one more decimal digit we need about 10× more n. This is why numerical analysts normally use the Taylor series for e, not this limit.


import math

def sinc_ratio(x):
    """f(x) = sin(x) / x — defined for x != 0, limit at 0 is 1."""
    return math.sin(x) / x

def euler_sequence(n):
    """a_n = (1 + 1/n)^n — the compound-interest sequence converging to e."""
    return (1 + 1 / n) ** n

# --- sin(x)/x as x -> 0 ---
print("sin(x) / x ->  1")
print(f"{'x':>10}  {'f(x)':>18}  {'|f(x) - 1|':>14}")
print("-" * 46)
for x in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    f = sinc_ratio(x)
    print(f"{x:>10.4e}  {f:>18.12f}  {abs(f - 1):>14.2e}")

# --- (1 + 1/n)^n as n -> infinity ---
print()
print("(1 + 1/n)^n ->  e")
print(f"e = {math.e:.15f}")
print(f"{'n':>12}  {'a_n':>18}  {'e - a_n':>14}")
print("-" * 48)
for n in [1, 10, 100, 10_000, 1_000_000]:
    a = euler_sequence(n)
    print(f"{n:>12,d}  {a:>18.12f}  {math.e - a:>14.2e}")

What the output looks like

sin(x) / x ->  1
         x                f(x)      |f(x) - 1|
----------------------------------------------
1.0000e+00      0.841470984808        1.59e-01
1.0000e-01      0.998334166468        1.67e-03
1.0000e-02      0.999983333417        1.67e-05
1.0000e-03      0.999999833333        1.67e-07
1.0000e-04      0.999999998333        1.67e-09

(1 + 1/n)^n ->  e
e = 2.718281828459045
           n                 a_n         e - a_n
------------------------------------------------
           1      2.000000000000        7.18e-01
          10      2.593742460100        1.25e-01
         100      2.704813829422        1.35e-02
      10,000      2.718145926825        1.36e-04
   1,000,000      2.718280469096        1.36e-06

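The sinc error drops by a factor of 100 each time $x$ shrinks by a factor of 10: the error is $O(x^{2})$ , because $\sin x=x-x^3/6+\ldots$ . The Euler error is only $O(1/n)$ : it drops by a factor of 10 per tenfold increase in $n$ (the last two rows increase $n$ a hundredfold, so the error drops a hundredfold there). Two different rates of "approach", both valid limits.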
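One way to make the two rates concrete is to divide each error by its predicted scale and watch the quotient settle on a constant. The sketch below does that; the helper names `sinc_error` and `euler_error` are assumptions of this example, and the target constants $1/6$ and $e/2$ come from the leading terms $x^{2}/6$ and $e/(2n)$ quoted in this section.

```python
import math

def sinc_error(x):
    """|sin(x)/x - 1| for nonzero x."""
    return abs(math.sin(x) / x - 1)

def euler_error(n):
    """e - (1 + 1/n)^n for a positive integer n."""
    return math.e - (1 + 1 / n) ** n

# Error proportional to x^2: error / x^2 should hover near 1/6.
for x in [1e-1, 1e-2, 1e-3]:
    print(f"x={x:.0e}  err/x^2 = {sinc_error(x) / x**2:.6f}   (1/6 = {1/6:.6f})")

# Error proportional to 1/n: n * error should hover near e/2.
for n in [10, 1_000, 100_000]:
    print(f"n={n:<7} n*err   = {n * euler_error(n):.6f}   (e/2 = {math.e/2:.6f})")
```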

PyTorch: Using These Limits as Derivatives

The classical proof that $(\sin x)' = \cos x$ uses $\sin h/h\to 1$ . The proof that $(e^{x})'=e^{x}$ uses $(e^{h}-1)/h\to 1$ . PyTorch's autograd implements those derivative rules, so we can verify numerically that the two "special" limits are exactly the numbers autograd produces.

PyTorch — the limits that autograd bakes in

🐍autograd_special_limits.py


`import torch`

PyTorch's core module. Provides torch.tensor (a differentiable N-D array) and an autograd engine that computes derivatives by reverse-mode differentiation.

EXECUTION STATE

torch = The PyTorch library. Tensors, autograd, neural network layers, and CUDA/MPS acceleration all live here. We only need tensors and autograd on CPU for this demo.

→ why autograd here = Autograd never evaluates a limit at run time. It applies stored closed-form rules like (sin x)' = cos x, and each of those rules was originally proved with a limit. We will verify numerically that the rules match the limits we just derived.

`# ---- sin'(0) = 1 follows from sin(x)/x -> 1 ----`

Section marker. sin'(0) is the textbook shorthand for d/dx sin(x) evaluated at x = 0. The classical proof uses exactly the limit sin(h)/h → 1 that we just verified.

`x = torch.tensor(0.0, requires_grad=True)`

Create a 0-dimensional (scalar) tensor with value 0.0 and tell PyTorch to track operations on it. We need requires_grad=True so the autograd tape will remember that x is a leaf with a gradient slot.

EXECUTION STATE

📚 torch.tensor(data, requires_grad) = Factory function that builds a torch.Tensor from Python data. Returns a multi-dimensional array object that supports autograd when requires_grad=True.

⬇ arg: 0.0 = The numeric payload. A Python float becomes a 0-dim float32 tensor. We pick 0 because that is exactly where we want to test the derivative.

⬇ arg: requires_grad=True = Marks the tensor as a leaf variable whose gradient should be collected when we call backward(). Without this, y.backward() would complain that no parameter needs gradients.

⬆ result: x = tensor(0., requires_grad=True) — shape (), dtype torch.float32.

`y = torch.sin(x)`

Apply the sine function to x. Because x has requires_grad=True, PyTorch records this operation on the autograd tape along with its local derivative (cos x). y becomes a non-leaf tensor that remembers how to propagate gradients back to x.

EXECUTION STATE

📚 torch.sin(input) = Element-wise sine. Accepts radians and returns a tensor of the same shape. Its local derivative w.r.t. input is cos(input). This local rule is exactly what the limit sin(h)/h → 1 lets us prove.

⬇ arg: x = A scalar tensor with value 0.0 and tracking enabled.

⬆ result: y = tensor(0., grad_fn=<SinBackward0>) — value sin(0) = 0.

→ grad_fn = Points to the backward callback PyTorch will invoke. Stores enough context (here just x) to compute dy/dx = cos(x) when needed.

`y.backward()`

Run reverse-mode autodiff starting from the scalar y. PyTorch walks the tape backward, applying the chain rule, and deposits the derivative into x.grad. Because y is scalar, no explicit initial gradient is needed (PyTorch assumes dL/dy = 1).

EXECUTION STATE

📚 tensor.backward() = Runs the backward pass on the graph ending at this tensor. For a scalar output, it is equivalent to computing dy/dparam for every parameter with requires_grad=True.

→ effect = Populates x.grad with cos(0) = 1. This is the same 1 that the limit sin(h)/h gives us — autograd has no magic, just the limit baked into the rule.

`print(f"d/dx sin(x) at x=0 : {x.grad.item():.8f}")`

Print the computed derivative. `.item()` extracts the Python float out of the 0-dim tensor. We expect 1.0 to 8 decimals — exactly what the special limit predicted.

EXECUTION STATE

📚 tensor.item() = Converts a 1-element tensor into a Python scalar. Raises if the tensor has more than one element. Cheap — no copy.

x.grad = tensor(1.) — the derivative slot filled in by backward(). Matches cos(0) = 1.

→ printed value = d/dx sin(x) at x=0 : 1.00000000

`# ---- (eˣ)'(0) = 1 follows from (eʰ - 1)/h -> 1 ----`

Section header for the second derivative. The classical proof of (eˣ)'(0) = 1 is exactly the limit (e^h − 1)/h → 1, which is the continuous cousin of (1 + 1/n)^n → e.

`x2 = torch.tensor(0.0, requires_grad=True)`

A fresh scalar tensor for the exp experiment. We can't reuse x because its gradient has already been accumulated — autograd would otherwise sum our new gradient into the old one.

EXECUTION STATE

x2 = tensor(0., requires_grad=True) — independent of x.

→ why a new tensor? = PyTorch accumulates gradients. Calling backward a second time on the same leaf would add to its existing .grad rather than overwriting it. Fresh tensor = fresh gradient slot.

`y2 = torch.exp(x2)`

Apply the exponential. torch.exp records its own local derivative (which equals the output itself, since d/dx eˣ = eˣ). At x = 0 both value and derivative equal 1.

EXECUTION STATE

📚 torch.exp(input) = Element-wise eˣ. Its local derivative w.r.t. input is exp(input) itself — the most economical gradient rule in all of analysis.

⬆ result: y2 = tensor(1., grad_fn=<ExpBackward0>) — value exp(0) = 1.

`y2.backward()`

Trigger autodiff for the exp graph. PyTorch multiplies the upstream gradient (1 by default) by the local derivative exp(0) = 1 and stores 1.0 in x2.grad.

`print(f"d/dx exp(x) at x=0 : {x2.grad.item():.8f}")`

Same pattern as before — print the scalar derivative. Expected output is 1.00000000, matching the limit (eʰ − 1)/h → 1.

EXECUTION STATE

x2.grad = tensor(1.) — equals exp(0).

→ printed value = d/dx exp(x) at x=0 : 1.00000000

`# ---- Verify the discrete limit via a finite difference ----`

Sanity check: we now compute sin(h)/h and (e^h − 1)/h directly at a small h and confirm both are close to 1. This checks autograd's answers against the limit definition itself.

`h = torch.tensor(1e-4)`

A small but not tiny h. At h = 10⁻⁴ the exact ratios differ from 1 by at most about h/2 = 5·10⁻⁵, and float32 still retains a few significant digits of eʰ − 1.

EXECUTION STATE

h = tensor(0.0001) — a 0-dim float32 tensor.

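→ why not 1e-12? = Too small. With float32 precision (~7 digits), eʰ rounds to exactly 1 and the difference eʰ − 1 becomes 0. Even at 1e-4 we subtract two numbers that agree in their first four significant digits, so only about three digits survive. 1e-4 is a compromise between truncation error and rounding error.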

`fd_sin = torch.sin(h) / h`

Numerical version of sin(h)/h. At h = 1e-4 this should equal approximately 1 − h²/6 ≈ 0.99999999833…. We'll see 1.00000000 because float32 can't represent the 10⁻⁹ deviation.

EXECUTION STATE

torch.sin(h) ≈ 1.0000e-04, indistinguishable from h itself. The sine expands as h − h³/6 + O(h⁵), and at h = 1e-4 the cubic term is 1.67e-13, invisible at float32.

fd_sin = tensor(1.0000) — the numerical confirmation of sin(x)/x → 1.

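`fd_exp = (torch.exp(h) - 1) / h`

Numerical version of (eʰ − 1)/h. Note the parentheses: without them we'd compute exp(h) − 1/h and get nonsense. Mathematically the ratio is ≈ 1 + h/2 ≈ 1.00005 at h = 1e-4, still extremely close to 1. In float32 the subtraction exp(h) − 1 keeps only about three significant digits, so the computed ratio is accurate to roughly three decimals.

EXECUTION STATE

torch.exp(h) - 1 = mathematically h(1 + h/2 + …) ≈ 1.00005e-04, since eʰ expands as 1 + h + h²/2 + O(h³). In float32 the stored value of exp(h) is only accurate to about 6e-8, so the trailing digits of this difference are rounding noise.

(exp(h) - 1) / h = ≈ 1 + h/2 in exact arithmetic. Dividing by h removes the factor of h and exposes 1 + h/2, which → 1 as h → 0. The float32 result agrees with this only to about three decimal places.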

→ parenthesis trap = Without the outer (...), Python evaluates torch.exp(h) - 1 / h as torch.exp(h) - (1/h) because / has higher precedence. Always wrap the numerator of a finite difference.

`print(f"sin(h)/h        at h=1e-4 : {fd_sin.item():.8f}")`

Print fd_sin. Expected output: 1.00000000. That is the entire meaning of sin(x)/x → 1 compressed to one row.

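`print(f"(exp(h)-1)/h    at h=1e-4 : {fd_exp.item():.8f}")`

Print fd_exp. In exact arithmetic the value is 1 + h/2 + h²/6 ≈ 1.00005000. In float32, expect agreement only to about three decimals (something near 1.0002 is typical), because the rounding error in exp(h) − 1 is divided by the small h. Either way the limit is clearly 1.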


import torch

# ---- sin'(0) = 1 follows from sin(x)/x -> 1 ----
x = torch.tensor(0.0, requires_grad=True)
y = torch.sin(x)
y.backward()
print(f"d/dx sin(x) at x=0 : {x.grad.item():.8f}")  # expect 1.0

# ---- (eˣ)'(0) = 1 follows from (eʰ - 1)/h -> 1 ----
x2 = torch.tensor(0.0, requires_grad=True)
y2 = torch.exp(x2)
y2.backward()
print(f"d/dx exp(x) at x=0 : {x2.grad.item():.8f}")  # expect 1.0

# ---- Verify the discrete limit via a finite difference ----
h = torch.tensor(1e-4)
fd_sin = torch.sin(h) / h                         # -> 1
fd_exp = (torch.exp(h) - 1) / h                   # -> 1
print(f"sin(h)/h        at h=1e-4 : {fd_sin.item():.8f}")
print(f"(exp(h)-1)/h    at h=1e-4 : {fd_exp.item():.8f}")

The deeper story

Every differentiable function in PyTorch has a local linear approximation baked into its backward rule. That local slope is the limit $(f(x+h)-f(x))/h$ as $h\to 0$ . For $\sin$ and $\exp$ , those limits are exactly the two special limits of this section. Every gradient-descent step you will ever run is built on top of them.
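The claim that the local slope is the limit of a difference quotient can be illustrated away from $x=0$ as well. The sketch below deliberately uses plain Python and the standard `math` module rather than autograd; the helper `forward_difference` and the step size `h=1e-6` are assumptions chosen for illustration, not anything PyTorch does internally.

```python
import math

def forward_difference(f, x, h=1e-6):
    """Difference quotient (f(x + h) - f(x)) / h for a small fixed step h."""
    return (f(x + h) - f(x)) / h

for x in [0.0, 0.5, 1.0, 2.0]:
    d_sin = forward_difference(math.sin, x)
    d_exp = forward_difference(math.exp, x)
    print(f"x={x:.1f}  sin: {d_sin:+.6f} vs cos x = {math.cos(x):+.6f}   "
          f"exp: {d_exp:.6f} vs e^x = {math.exp(x):.6f}")
```

The quotients only approximate the closed-form rules, since a fixed `h` leaves a truncation error of order `h`; the rules themselves are what the limit delivers exactly.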

Where Both Limits Show Up

sin(x)/x shows up in…

Physics: small-angle pendulum $\ddot\theta+\tfrac{g}{L}\sin\theta=0$ becomes simple harmonic motion under $\sin\theta\approx\theta$ .
Optics: single-slit diffraction amplitude is literally the sinc function $\sin(x)/x$ .
Signal processing: the ideal low-pass filter's impulse response is $\sin(\pi t)/(\pi t)$ , the kernel of the Whittaker–Shannon interpolation formula.
Geometry: the derivative $(\sin x)'=\cos x$ and the arc-length element $ds=\sqrt{1+(y')^{2}}\,dx$ both rely on the small-angle linearization.

(1 + 1/n)^n shows up in…

Finance: continuous compounding $A=Pe^{rt}$ is the n → ∞ limit of discrete compounding.
Probability: the Poisson limit of a Binomial $\text{Bin}(n,\lambda/n)\to\text{Poi}(\lambda)$ rides on $(1-\lambda/n)^{n}\to e^{-\lambda}$ .
Physics: radioactive decay $N(t)=N_{0}e^{-\lambda t}$ and RC-circuit discharge are the continuous-time realizations of the sequence limit.
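Machine learning: softmax normalization relies on $e^{z}$ , whose derivative rule is exactly $(e^{h}-1)/h\to 1$ . Adam's exponential moving averages weight a gradient from $t$ steps back in proportion to $\beta^{t}=e^{t\ln\beta}\approx e^{-(1-\beta)t}$ , a discrete first-order analog of exponential decay.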
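The Poisson item in this list is easy to check numerically. As a minimal sketch (the function name `binomial_zero_prob` and the choice $\lambda=2$ are assumptions of this example), the probability of zero successes in $\text{Bin}(n,\lambda/n)$ is exactly $(1-\lambda/n)^{n}$ , which should approach the Poisson value $e^{-\lambda}$ .

```python
import math

def binomial_zero_prob(n, lam):
    """P(X = 0) for X ~ Bin(n, lam / n), i.e. (1 - lam/n)^n."""
    return (1 - lam / n) ** n

lam = 2.0
target = math.exp(-lam)  # P(X = 0) for Poisson(lam)
for n in [10, 100, 10_000, 1_000_000]:
    p0 = binomial_zero_prob(n, lam)
    print(f"n={n:<9} (1 - lam/n)^n = {p0:.8f}   e^-lam = {target:.8f}")
```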

Common Pitfalls

Pitfall 1 — Using degrees instead of radians

$\sin x/x\to 1$ only in radians. In degrees $\sin x/x\to \pi/180\approx 0.01745$ . Every calculus rule for trig functions silently assumes radians — that is why it is the natural unit of angle.
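The degree version of the limit can be seen directly. In this sketch the helper name `sine_ratio_degrees` is made up for illustration; `math.radians` is the standard-library conversion from degrees to radians.

```python
import math

def sine_ratio_degrees(x_deg):
    """sin(x) / x when the angle x is measured in degrees."""
    return math.sin(math.radians(x_deg)) / x_deg

for x_deg in [10.0, 1.0, 0.1, 0.001]:
    print(f"x={x_deg:>6} deg  ratio={sine_ratio_degrees(x_deg):.10f}")
print(f"pi/180            = {math.pi / 180:.10f}")
```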

Pitfall 2 — Evaluating (1 + 1/n)^n numerically for huge n

For $n\gtrsim 10^{16}$ , the expression $1+1/n$ rounds to exactly 1 in double precision, and the computer reports the limit as $1^{n}=1$ . This is a rounding (absorption) error, not an underflow: $1/n$ itself is perfectly representable, but it vanishes when added to 1. Use the series $e=\sum 1/k!$ or the library math.exp(1) for actual high-precision work.
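The failure is easy to reproduce, and one common workaround (offered here as a suggestion, not something the text prescribes) is to avoid forming $1+1/n$ at all by computing $\exp(n\cdot\ln(1+1/n))$ with `math.log1p`. The function names below are illustrative.

```python
import math

def naive_power(n):
    """(1 + 1/n)^n computed directly; breaks once 1/n is absorbed by the 1."""
    return (1 + 1 / n) ** n

def log1p_power(n):
    """Same quantity via exp(n * log1p(1/n)), which never forms 1 + 1/n."""
    return math.exp(n * math.log1p(1 / n))

for n in [10**6, 10**12, 10**17]:
    print(f"n={n:.0e}  naive={naive_power(n):.15f}  log1p={log1p_power(n):.15f}")
```

At intermediate sizes of $n$ the direct form does not fail cleanly; it returns values that are visibly noisy before it finally collapses to 1.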

Pitfall 3 — Confusing sin(5x)/x with sin(5x)/(5x)

The second ratio obviously has limit 1 — it is the sinc limit with $u=5x$ . The first has limit 5. Missing the compensating factor of 5 is the most common algebra error students make with this limit family.

Pitfall 4 — Claiming the e-limit is "obvious"

The form $1^{\infty}$ is indeterminate: $(1)^{\infty}=1$ , but $(1+1/\sqrt n)^{n}\to\infty$ and $(1+1/n^{2})^{n}\to 1$ . The precise pairing of $1/n$ in the base with $n$ in the exponent is what makes the limit equal a finite number — and the number happens to be e.
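The three behaviours named above can be compared side by side. This is a small sketch; the helper `power_form` and the dictionary `forms` are names invented for this illustration.

```python
def power_form(base_gap, n):
    """(1 + base_gap(n))^n for a function base_gap that tends to 0."""
    return (1 + base_gap(n)) ** n

forms = {
    "1/sqrt(n)": lambda n: n ** -0.5,   # base shrinks too slowly: blows up
    "1/n":       lambda n: 1 / n,       # the balanced case: tends to e
    "1/n^2":     lambda n: 1 / n ** 2,  # base shrinks too fast: tends to 1
}

for n in [10, 100, 10_000]:
    row = "  ".join(f"{name}: {power_form(gap, n):.6g}" for name, gap in forms.items())
    print(f"n={n:<6} {row}")
```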

Summary

Two limits — one trigonometric, one exponential — act as the base cases for almost every derivative rule in single-variable calculus. Everything else is built from them by algebraic manipulation.

Limit	Value	Proof tool	Derivative it unlocks
sin(x) / x as x → 0	1	Unit-circle areas + Squeeze Theorem	(sin x)' = cos x
(1 − cos x) / x² as x → 0	1/2	Algebra + first limit	(cos x)' = − sin x
tan(x) / x as x → 0	1	Product of two easy limits	(tan x)' = sec²(x)
(1 + 1/n)^n as n → ∞	e	Monotone + bounded sequence	Definition of e
(e^h − 1) / h as h → 0	1	Substitution u = ln(1 + h) in the log limit	(e^x)' = e^x
ln(1 + h) / h as h → 0	1	Log of the e-limit	(ln x)' = 1/x at x = 1

Key Takeaways

sin x / x → 1 is the geometric fact that the sine of a tiny angle is the angle (measured in radians). It is proved with one squeeze and ends up under every derivative of every trig function.
(1 + 1/n)^n → e is the analytic fact that compounding converges. Equivalent to $(e^{h}-1)/h\to 1$ and $\ln(1+h)/h\to 1$ , so it underlies every derivative involving e and ln.
Both limits are of indeterminate form. Neither follows from arithmetic alone — one needs geometry, the other needs a monotone-bounded argument.
Numerically, the sinc error shrinks like $x^{2}$ , while the e-sequence error shrinks only like $1/n$ . "Same limit" in analysis does not mean "same speed" in computation.
Every gradient PyTorch computes for $\sin, \cos, \exp, \ln$ is these two limits in disguise. Autograd doesn't take a new limit each time — it uses the closed-form rule that these limits proved once and for all.

The two pillars of single-variable calculus:

"For small angles, sine is the angle. For large compounding, growth is e."

Coming Next: With these two special limits in our toolbox, we can finally formalize the shortcut every student wants for $0/0$ and $\infty/\infty$ forms — L'Hôpital's rule. The next section shows how to use it, when it fails, and why it's a corollary (not a replacement) of the work we did here.