Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

State the Squeeze Theorem and explain why it works from the definition of a limit.
Recognise when a function is a candidate for the squeeze — oscillations, signed chaos, or products of bounded and vanishing pieces.
Construct two bounding functions g(x) and h(x) that trap a wild function and both converge to the same value.
Prove $\lim_{x \to 0} x^{2} \sin(1/x) = 0$ and $\lim_{x \to 0} \frac{\sin x}{x} = 1$ — the two flagship examples.
Verify the squeeze numerically in Python and use PyTorch to see what does — and does not — transfer to the derivative.

The Problem: A Function Too Wild to Evaluate

"Some functions are so chaotic near a point that you cannot compute their limit directly. The squeeze theorem is the detective tool — you never touch the wild function itself; you trap it."

Consider $f(x) = x^{2}\, \sin\!\big(\tfrac{1}{x}\big)$. As $x \to 0$ the inner quantity $1/x$ explodes, and $\sin(1/x)$ oscillates between $-1$ and $+1$ infinitely many times in any neighbourhood of 0. Plugging $x = 0$ in is illegal (division by zero). All of our earlier tricks (factoring, limit laws, substitution) fail, because $\sin(1/x)$ has no limit at 0.

The key observation

We don't actually need to know what $\sin(1/x)$ is doing. We only know it stays between $-1$ and $+1$. Multiply by $x^{2}$ and we get a trap:

$$-x^{2} \;\le\; x^{2} \sin(1/x) \;\le\; x^{2}.$$

Both bounds go to 0. The function is pinned between them. It has no choice.

Intuition: Sandwiched Between Two Bounds

Imagine three people walking along a number line — Low, Middle, and High — with these rules:

Low and High are always moving so that $\text{Low}(x) \le \text{Middle}(x) \le \text{High}(x)$.
As $x \to c$, Low and High both arrive at the same destination L.

Middle has absolutely no freedom: it is penned in between two walls that are converging on the same point. By the end it must also be standing at L. It does not matter how wildly Middle zigzagged along the way — the only escape would be to cross one of the walls, and the rules forbid that.

Why this works from ε–δ

Given any $\varepsilon > 0$, pick δ so small that, for every x within δ of c (other than c itself), the ordering Low ≤ Middle ≤ High holds and both Low and High are within $\varepsilon$ of L. Any x inside this δ-window forces Middle into the interval $(L - \varepsilon,\, L + \varepsilon)$, which is exactly what the ε–δ definition of a limit demands. The squeeze theorem is not magic; it is a direct, tiny corollary of the ε–δ definition from Section 2.5.
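Spelled out with the theorem's names (g for Low, f for Middle, h for High), the argument is a single chain of inequalities once δ is taken as the smallest of three radii. The labels δ₀, δ_g and δ_h are introduced only for this sketch:

$$
\begin{aligned}
0 < |x - c| < \delta_0 &\implies g(x) \le f(x) \le h(x), \\
0 < |x - c| < \delta_g &\implies |g(x) - L| < \varepsilon, \\
0 < |x - c| < \delta_h &\implies |h(x) - L| < \varepsilon, \\
\delta = \min(\delta_0, \delta_g, \delta_h),\; 0 < |x - c| < \delta &\implies L - \varepsilon < g(x) \le f(x) \le h(x) < L + \varepsilon \\
&\implies |f(x) - L| < \varepsilon.
\end{aligned}
$$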

The Squeeze Theorem

Squeeze Theorem

Suppose $g(x) \le f(x) \le h(x)$ for every x in some open interval containing $c$ (possibly except at $c$ itself). If

$$\lim_{x \to c} g(x) = \lim_{x \to c} h(x) = L,$$

then $\lim_{x \to c} f(x)$ exists and equals $L$.

Ingredient	What it means	Where we get it
g(x) ≤ f(x) ≤ h(x)	Two bounds trap f on some interval near c.	Produced by hand, usually from known inequalities like |sin| ≤ 1.
limit of g = limit of h = L	Both walls march to the same destination.	Proved with earlier limit laws or direct computation.
limit of f = L	f is forced to land at L too.	The conclusion — no independent calculation needed.

You never evaluate the wild function

This is the whole point. The theorem's power is that $f$ can be arbitrarily badly behaved — oscillate, jitter, switch sign infinitely often — and we still nail its limit just by finding two polite functions that sandwich it.

Interactive: Watch the Squeeze Collapse

The picture is worth a thousand proofs. Pick a preset, then drag the Zoom window slider toward 0. The rose and emerald curves represent the lower and upper bounds. Watch how the blue trapped curve has no choice but to collapse onto the meeting point.

Try the following experiment loop:

Start with x² sin(1/x) at wide zoom. Notice the blue curve oscillates so fast it looks like a filled region.
Shrink the zoom window. The rose and emerald envelopes pinch toward (0, 0). The blue curve's amplitude has to shrink with them.
Switch to x² cos(50/x). Even this far more violent oscillator is helpless — the envelope wins.
Switch to x · sin(1/x). The bounds are now ±|x|, linear instead of quadratic. The squeeze still works, but the convergence is slower.
Switch to Smooth squeeze at x = 1. The bounds here are smooth polynomials, and the meeting point is (1, 2). Same theorem, different shape.
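Without the interactive widget, a rough text-only stand-in is to measure how tall each trapped curve can get inside a shrinking window [−w, w]. The sketch below does this by dense sampling for three of the presets (the "Smooth squeeze" preset is left out because its formulas are not given here), and prints each measured height next to the envelope height: w² for the two x² envelopes, w for x · sin(1/x). The sample count of 20,000 points per side is an arbitrary choice.

```python
import math

# Three presets, each paired with its envelope height on the window [-w, w].
presets = {
    "x^2 sin(1/x)":  (lambda x: x * x * math.sin(1 / x),  lambda w: w * w),
    "x^2 cos(50/x)": (lambda x: x * x * math.cos(50 / x), lambda w: w * w),
    "x sin(1/x)":    (lambda x: x * math.sin(1 / x),      lambda w: w),
}

def max_abs_in_window(f, w, n=20_000):
    """Largest |f(x)| over n evenly spaced points on each side of 0 in [-w, w]."""
    best = 0.0
    for k in range(1, n + 1):
        x = w * k / n                      # x runs over (0, w]
        best = max(best, abs(f(x)), abs(f(-x)))
    return best

windows = [1.0, 0.1, 0.01, 0.001]
results = {}
for name, (f, envelope) in presets.items():
    results[name] = [max_abs_in_window(f, w) for w in windows]
    cells = "  ".join(f"{m:.2e} (<= {envelope(w):.0e})"
                      for m, w in zip(results[name], windows))
    print(f"{name:>14}: {cells}")
```

Shrinking w by a factor of 10 should shrink the two quadratic rows by roughly 100 but the x · sin(1/x) row by only roughly 10, which is the slower convergence noted in the last experiment.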

The Strategy for Using It

Almost every squeeze-theorem argument follows the same three moves. When you see a problem, recognise these patterns and you are done:

Identify the bounded piece. Usually something like $\sin(\text{anything})$, $\cos(\text{anything})$, or a fractional part: a quantity you know lives in a fixed interval like $[-1, 1]$ or $[0, 1]$.
Multiply by the vanishing piece. The bounded piece B is multiplied by a factor V that goes to 0. If $|B(x)| \le M$, then $-M|V(x)| \le B(x)\,V(x) \le M|V(x)|$, so the product is trapped by an envelope that vanishes along with V.
Write the inequality and take limits. Both envelope limits are tame; compute them with the limit laws. The trapped function is done for.

Pattern	Bounded piece	Vanishing piece	Envelope
x² sin(1/x)	sin(1/x) ∈ [−1, 1]	x² → 0	[−x², x²]
x cos(1/x³)	cos(1/x³) ∈ [−1, 1]	x → 0	[−|x|, |x|]
(sin x) / x	sin x (but normalised)	— (special)	[cos x, 1] near 0
e^{-1/x²} · sin(1/x)	sin(1/x)	e^{-1/x²} → 0 fast	[−e^{-1/x²}, +e^{-1/x²}]
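The envelopes in the bounded × vanishing rows of the table can be spot-checked the same way on both sides of 0. The sketch below covers those three rows only (the sin x / x row comes from geometry rather than this pattern); the grid of 1000 points per side and the 1e-15 rounding tolerance are arbitrary choices, and a zero count is evidence, not a proof.

```python
import math

# Each entry: (trapped function f, lower envelope g, upper envelope h).
patterns = {
    "x^2 sin(1/x)": (
        lambda x: x * x * math.sin(1 / x),
        lambda x: -x * x,
        lambda x: x * x,
    ),
    "x cos(1/x^3)": (
        lambda x: x * math.cos(1 / x**3),
        lambda x: -abs(x),
        lambda x: abs(x),
    ),
    "e^(-1/x^2) sin(1/x)": (
        lambda x: math.exp(-1 / x**2) * math.sin(1 / x),
        lambda x: -math.exp(-1 / x**2),
        lambda x: math.exp(-1 / x**2),
    ),
}

def count_violations(f, g, h, radius=1.0, n=1000, tol=1e-15):
    """Count samples 0 < |x| <= radius (both sides of 0) where g <= f <= h fails."""
    bad = 0
    for k in range(1, n + 1):
        for x in (radius * k / n, -radius * k / n):
            if not (g(x) - tol <= f(x) <= h(x) + tol):
                bad += 1
    return bad

violations = {name: count_violations(*fgh) for name, fgh in patterns.items()}
for name, bad in violations.items():
    print(f"{name:>20}: {bad} violations in 2000 samples")
```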

Worked Example: lim x² sin(1/x) = 0

Let's prove $\displaystyle\lim_{x \to 0} x^{2} \sin(1/x) = 0$ step by step, the way you would on an exam.

📝 Step-by-step numerical walkthrough — try it yourself first

Step 1 — Find a bound you know cold. The sine function never escapes $[-1, 1]$, whatever its argument. So for every $x \neq 0$,

$$-1 \;\le\; \sin(1/x) \;\le\; 1.$$

Step 2 — Multiply by a non-negative vanishing factor. $x^{2}$ is always ≥ 0, so multiplying preserves the direction of the inequalities:

$$-x^{2} \;\le\; x^{2}\sin(1/x) \;\le\; x^{2}.$$

Why x² (and not x)?

If we had multiplied by $x$ instead, we would have to split into cases $x > 0$ and $x < 0$ because multiplying an inequality by a negative number flips it. $x^{2}$ is non-negative on both sides of 0 — life is easier.
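For comparison, here is the case split that multiplying by $x$ forces, and how $|x|$ folds both cases into one inequality (this is the $\pm|x|$ envelope used for $x \sin(1/x)$):

$$
\begin{aligned}
x > 0:&\quad -x \le x\sin(1/x) \le x, \\
x < 0:&\quad x \le x\sin(1/x) \le -x \quad (\text{multiplying by } x < 0 \text{ flips both inequalities}), \\
\text{both cases:}&\quad -|x| \le x\sin(1/x) \le |x|.
\end{aligned}
$$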

Step 3 — Take limits of the bounds. $\lim_{x \to 0}(-x^{2}) = 0$ and $\lim_{x \to 0}(x^{2}) = 0$. Both envelopes march to the same destination.

Step 4 — Invoke the theorem. Since $-x^{2} \le x^{2}\sin(1/x) \le x^{2}$ and both outer expressions have limit 0,

$$\lim_{x \to 0} x^{2}\sin(1/x) = 0. \quad \blacksquare$$

Step 5 — Plug in a number to feel it. At $x = 0.01$:

$-x^{2} = -10^{-4} = -0.0001$
$x^{2}\sin(1/x) = 10^{-4} \cdot \sin(100) \approx -5.06 \times 10^{-5}$
$+x^{2} = +10^{-4} = +0.0001$

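The middle value lives comfortably between the two envelopes, and both envelopes are only $10^{-4}$ away from 0. Shrink x by another factor of 10 and the envelopes shrink by a factor of 100; the trapped value has no choice but to fit inside them (at $x = 0.001$ it is about $8.27 \times 10^{-7}$). The squeeze is a converging vice.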

The Most Famous Squeeze: sin x / x → 1

No limit in calculus is more pivotal than

$$\lim_{x \to 0} \frac{\sin x}{x} = 1.$$

It is the reason $\frac{d}{dx}\sin x = \cos x$, and it shows up whenever you linearise a rotation, compute a Fourier integral, or take a derivative of any trigonometric function. Plugging in $x = 0$ gives the indeterminate form $0/0$, so we need a trick. The squeeze theorem, combined with a geometric insight, seals the deal.
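A sketch of that link, using the sine addition formula and the companion limit $(\cos h - 1)/h \to 0$, which itself follows from $\sin h / h \to 1$ after multiplying numerator and denominator by $1 + \cos h$:

$$
\begin{aligned}
\frac{\sin(x+h) - \sin x}{h}
  &= \sin x \cdot \frac{\cos h - 1}{h} + \cos x \cdot \frac{\sin h}{h}, \\
\frac{\cos h - 1}{h}
  &= -\frac{\sin h}{h} \cdot \frac{\sin h}{1 + \cos h}
  \;\longrightarrow\; -1 \cdot \frac{0}{2} = 0, \\
\frac{d}{dx}\sin x
  &= \sin x \cdot 0 + \cos x \cdot 1 = \cos x.
\end{aligned}
$$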

Why sin x ≤ x ≤ tan x — The Geometric Argument

Draw a unit circle. Pick an angle $\theta \in (0, \pi/2)$. Look at three regions:

The rose triangle with vertices O, $P = (\cos\theta, \sin\theta)$, Q = (1, 0). Its area is $\tfrac{1}{2} \sin\theta$.
The violet circular sector from Q to P, with area $\tfrac{1}{2}\theta$ (the unit-circle sector formula).
The emerald triangle with vertices O, Q, $T = (1, \tan\theta)$. Its area is $\tfrac{1}{2} \tan\theta$.

The rose triangle is inside the violet sector, and the violet sector is inside the emerald triangle. Comparing areas gives

$$\tfrac{1}{2}\sin\theta \;\le\; \tfrac{1}{2}\theta \;\le\; \tfrac{1}{2}\tan\theta \quad\Longrightarrow\quad \sin\theta \;\le\; \theta \;\le\; \tan\theta.$$

Divide by $\sin\theta > 0$ to get $1 \le \theta / \sin\theta \le 1 / \cos\theta$, then take reciprocals. All three quantities are positive, so taking reciprocals reverses both inequalities:

$$\cos\theta \;\le\; \frac{\sin\theta}{\theta} \;\le\; 1.$$

Both outer quantities head to 1 as $\theta \to 0^{+}$ (cosine is continuous with $\cos 0 = 1$), and $\sin\theta / \theta$ is trapped between them, so it too goes to 1. For $\theta \to 0^{-}$, use the fact that $\sin(x)/x$ is an even function: $\sin(-x)/(-x) = \sin(x)/x$, so the left-hand limit equals the right-hand one. Both one-sided limits are 1, so the two-sided limit is 1.
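The geometric chain and the two facts drawn from it (the reciprocal squeeze and the evenness of $\sin x / x$) can be spot-checked on a grid. This is evidence on finitely many points, not a proof; the grid size of 10,000 is an arbitrary choice, and the endpoints 0 and π/2 are excluded because $\sin\theta/\theta$ and $\tan\theta$ are undefined there.

```python
import math

n = 10_000
# theta strictly inside (0, pi/2)
thetas = [(math.pi / 2) * k / n for k in range(1, n)]

# Area comparison: sin t <= t <= tan t
chain_ok = all(math.sin(t) <= t <= math.tan(t) for t in thetas)

# Reciprocal squeeze cos t <= sin t / t <= 1, checked on both sides of 0
squeeze_ok = all(
    math.cos(s) <= math.sin(s) / s <= 1.0
    for t in thetas
    for s in (t, -t)
)

# Evenness: sin(-t)/(-t) agrees with sin(t)/t (up to rounding)
even_ok = all(
    math.isclose(math.sin(-t) / (-t), math.sin(t) / t, rel_tol=1e-12)
    for t in thetas
)

print("sin t <= t <= tan t:                  ", chain_ok)
print("cos t <= sin t / t <= 1 (both sides): ", squeeze_ok)
print("sin(x)/x even at every sample:        ", even_ok)
```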

Drag the θ slider. As θ shrinks, all three areas collapse, but the ratio of the rose area to the emerald area (which is $\cos\theta$) sprints to 1. The violet sector, with area equal to $\tfrac{1}{2}\theta$, is squeezed to match. The geometry is the proof.

Seeing the Squeeze as a Plot

Here are the three curves, $\cos x$ (lower), $\sin x / x$ (trapped) and $1$ (upper), plotted simultaneously across $x \in [-1.5, 1.5]$. Slide the probe toward $x = 0$ and watch the numerical readouts converge.

A useful numerical fingerprint

At $x = 0.1$: $\cos(0.1) \approx 0.99500$, $\sin(0.1)/0.1 \approx 0.99833$, and $1 = 1$. The band from cos to 1 has width $1 - \cos(0.1) \approx 0.005$, and the trapped value sits comfortably inside. Halve x and the band width drops by a factor of roughly four (by the Taylor series $1 - \cos x \approx x^{2}/2$).
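The factor-of-four claim is easy to spot-check. The sketch below (the starting point x = 0.1 and the five halvings are arbitrary choices) prints the band width, its Taylor estimate x²/2, and the ratio between the widths at x and x/2:

```python
import math

def band_width(x):
    """Width 1 - cos(x) of the squeeze band [cos x, 1] at x."""
    return 1.0 - math.cos(x)

ratios = []
x = 0.1
for _ in range(5):
    w, w_half = band_width(x), band_width(x / 2)
    ratios.append(w / w_half)
    print(f"x = {x:.5f}   1 - cos x = {w:.4e}   x^2/2 = {x * x / 2:.4e}   "
          f"width(x) / width(x/2) = {w / w_half:.4f}")
    x /= 2
```

The ratio creeps up toward 4 without reaching it, because the exact ratio is $4\cos^{2}(x/4)$. For much smaller x, computing $1 - \cos x$ directly loses precision to cancellation; the identity $1 - \cos x = 2\sin^{2}(x/2)$ is the numerically safer form.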

Python: Verifying a Squeeze Numerically

A computer cannot prove an ∀x statement, but it can flood the interval with samples and verify the bound holds at every single one. Below we check that $-x^{2} \le x^{2}\sin(1/x) \le x^{2}$ holds across 1001 sample points and then print the shrinking envelope as x marches to 0.

Pure Python — verify the squeeze and watch the envelope collapse

🐍squeeze_theorem.py

import math

Python's math module provides math.sin, which we need for the inner sin(1/x) — the wild oscillator at the heart of this example. We prefer math over numpy here because our sample is scalar and we want to see pure Python arithmetic without any array abstraction.

EXECUTION STATE

math = Python's standard numerical library. Provides math.sin, math.cos, math.sqrt, math.pi, etc. Single-value (scalar) math — not vectorised.

def f(x): the wild function

Defines the function we want to squeeze. f(x) = x² · sin(1/x) has two ingredients: the envelope x² (which vanishes at 0) and the oscillator sin(1/x) (whose argument blows up as x → 0, creating infinitely many oscillations packed into any neighbourhood of 0). Evaluating the limit by plugging in x = 0 fails because sin(1/0) is undefined — the squeeze theorem is the tool that saves us.

EXECUTION STATE

⬇ input: x = Any real number. We will mostly sample x near 0 but the function is well-defined for every x ≠ 0.

⬆ returns = 0.0 when x = 0 (our choice — redefining removes the singularity). Otherwise x * x * sin(1/x). A single Python float.

→ example = f(0.1) = 0.01 · sin(10) ≈ 0.01 · (−0.5440) = −0.005440

→ why define f(0) = 0? = Not required by the squeeze theorem, but making f continuous at 0 matches the limiting behaviour. Any value here is fine — the limit is about x near 0, not at 0.

if x == 0: return 0.0

Guard against division by zero inside sin(1/x). Without this guard Python would raise ZeroDivisionError the moment the outer sample lands exactly on zero. This is purely a programming detail — the mathematics of the squeeze theorem does not require f to be defined at 0.

EXECUTION STATE

x == 0 = Exact equality with zero. For floating point this is safe because we construct x = k/500 and k = 0 gives exact 0.0.

⬆ early return = 0.0 — a defensive value. The squeeze will prove the limit is 0, so this is self-consistent.

return x * x * math.sin(1 / x)

The actual formula. Compute 1/x first (a large number when x is tiny), feed it to math.sin (whose output is bounded in [−1, 1]), and finally multiply by x² (which kills whatever oscillation was there). The multiplication order is irrelevant — this ordering matches how the squeeze inequality is written.

EXECUTION STATE

📚 math.sin(y) = Computes the sine of y radians. Output is always in [−1, 1]. Example: math.sin(0) = 0, math.sin(math.pi/2) = 1.

→ 1 / x = Reciprocal. For x = 0.1 this is 10. For x = 0.01 this is 100 — the smaller x gets, the faster sin(1/x) oscillates.

x * x = The envelope. For x = 0.1 this is 0.01 — a tiny number that crushes whatever sin(1/x) returns.

⬆ return = For x = 0.1: 0.01 · sin(10) = 0.01 · (−0.5440) = −0.005440. For x = 0.01: 0.0001 · sin(100) = 0.0001 · (−0.5064) = −5.064e-5.

def lower(x): -x²

The lower bound of the squeeze. Since sin(1/x) ≥ −1 always, x² · sin(1/x) ≥ x² · (−1) = −x². This is the smallest value our oscillator can ever reach at x.

EXECUTION STATE

⬇ input: x = Same x we will feed to f.

⬆ returns = −x². Always ≤ 0. Vanishes as x → 0.

→ example = lower(0.1) = −0.01.

def upper(x): +x²

The upper bound of the squeeze. Since sin(1/x) ≤ 1 always, x² · sin(1/x) ≤ x² · 1 = x². This is the largest value our oscillator can ever reach at x.

EXECUTION STATE

⬇ input: x = Same x fed to f.

⬆ returns = +x². Always ≥ 0. Vanishes as x → 0.

→ example = upper(0.1) = +0.01.

Header print for the step 1 table

Sets up a five-column table (x, g(x), f(x), h(x), inside?) that we will NOT actually populate row by row (it would be 1001 lines long). Instead we do the inequality check silently and report a single summary number. The header documents what each row of that check would contain.

EXECUTION STATE

f-string = Python 3.6+ string interpolation. {'x':>10} means 'insert the string "x" right-aligned in a 10-wide field'.

:>12 = Right-align in a 12-character field. Chosen so that scientific-notation numbers like ' 1.23e-04' fit cleanly.

print("-" * 62)

String multiplication repeats '-' sixty-two times to form a horizontal rule underneath the header. Quick, readable separator.

EXECUTION STATE

"-" * 62 = String of 62 dashes.

violations = 0

Counter for the number of samples where the inequality g(x) ≤ f(x) ≤ h(x) fails. We expect zero violations — that is the programmatic confirmation that the squeeze really holds.

EXECUTION STATE

violations = Integer counter. Starts at 0, increments only if the inequality is ever broken.

for k in range(-500, 501)

Loop the integer k from −500 to 500 inclusive (1001 values). On the next line we will map k to x = k/500, giving 1001 samples evenly spaced across [−1, 1]. Sampling on both sides of 0 is critical because the squeeze must hold from both directions.

EXECUTION STATE

📚 range(start, stop) = Built-in Python sequence type; iterating over it yields start, start+1, …, stop−1. Note: stop is exclusive, so range(-500, 501) runs through 500.

⬇ arg 1: start = -500 = Lower inclusive bound. We start left of 0 to catch negative-x behaviour.

⬇ arg 2: stop = 501 = Exclusive upper bound. Using 501 (not 500) ensures k = 500 is included in the iteration, giving us x = 1.0 as the last sample.

count = 500 − (−500) + 1 = 1001 samples total.

x = k / 500.0

Convert the integer k into a floating-point x in [−1, 1]. In Python 3, k / 500 would already be float division; writing 500.0 simply makes the intent explicit (and would also keep the code correct under Python 2). For k = 0 we get x = 0.0, which is where the guard in f() kicks in.

EXECUTION STATE

x = A float from [−1, 1] on an evenly spaced grid with step 0.002. For k = 0: x = 0.0. For k = 250: x = 0.5. For k = −50: x = −0.1.

g, fx, h = lower(x), f(x), upper(x)

Tuple-unpack the three values of the squeeze inequality at this x. Three function calls on one line — Python evaluates left-to-right and assigns in parallel, so g, fx, h are always consistent with a single x.

EXECUTION STATE

g = −x² — the lower bound at this x.

fx = x² sin(1/x) — the trapped function.

h = +x² — the upper bound.

→ example (x = 0.1) = g = −0.01, fx = −0.005440, h = +0.01.

if not (g - 1e-12 <= fx <= h + 1e-12): violations += 1

Programmatic check of the squeeze inequality with a tiny tolerance (10⁻¹²) to absorb floating-point rounding. If the trapped value ever escapes the band, increment the violations counter. A final value of 0 is empirical evidence (not a proof) that the bound holds at every sample.

EXECUTION STATE

g - 1e-12 <= fx <= h + 1e-12 = Chained comparison. Reads: g (minus rounding slop) is ≤ fx AND fx is ≤ h (plus rounding slop). Both must be true.

1e-12 tolerance = A small safety margin for floating-point rounding. In this script it should never be needed, because fx and h share the factor x * x and |math.sin| ≤ 1, so rounding cannot push |fx| past x * x; the habit pays off when the bounds are computed by a different formula than f.

not (…) = Inverts the comparison. Execution enters the if-body only if the inequality FAILS.

violations += 1 = Compound assignment shorthand for violations = violations + 1. We expect this line never to execute.

print violation summary

After the loop, print the running count. A value of 0 is the empirical confirmation that g ≤ f ≤ h for every one of the 1001 samples. It's not a mathematical proof — only dense evidence — but it's a reassuring sanity check.

EXECUTION STATE

expected output = 'violations over 1001 samples: 0'

print(): blank line

An empty print() emits just a newline. It visually separates step 1's one-line summary from step 2's upcoming table.

Second-table header line

Headers for the squeeze table we are about to fill row-by-row. This time we really do print values because only six rows are needed to see the collapse.

print("-" * 52)

52-character horizontal rule matching the new four-column width (:>10 + :>12 + :>12 + :>12 + separators).

for x in [0.5, 0.1, 0.05, 0.01, 0.001, 1e-5]

A sweep of x values marching toward 0. We skip x = 0 itself (the limit is about behaviour near 0). The ratios between neighbouring values range from 2 to 100, so six rows cover almost five decades of x (and more than nine decades of x²) without a wall of text.

LOOP TRACE · 6 iterations

x = 0.5

g = −x² = −0.25

f = x² sin(1/x) = +0.2273

h = +x² = +0.25

x = 0.1

g = −x² = −0.01

f = x² sin(1/x) = −0.005440

h = +x² = +0.01

x = 0.05

g = −x² = −0.0025

f = x² sin(1/x) = +0.002282

h = +x² = +0.0025

x = 0.01

g = −x² = −1.0e-4

f = x² sin(1/x) = −5.064e-5

h = +x² = +1.0e-4

x = 0.001

g = −x² = −1.0e-6

f = x² sin(1/x) = +8.27e-7

h = +x² = +1.0e-6

x = 1e-5

g = −x² = −1.0e-10

f = x² sin(1/x) = ≈ +3.57e-12

h = +x² = +1.0e-10

print formatted squeeze row

Print x, lower, f(x), upper side by side. The :.2e format keeps the scientific notation short (two fractional digits), so every row is visually uniform even as the magnitudes shrink toward 10⁻¹⁰.

EXECUTION STATE

:.5f = Fixed-point, 5 fractional digits. 0.0001 → '0.00010'.

:.2e = Scientific notation, 2 fractional digits. 1.0e-5 → '1.00e-05'.

print(): spacer

Another blank line to separate the table from the closing narration.

print closing remark

A plain-English restatement of the squeeze theorem using the concrete values from the table. This is the bridge between the experimental data and the mathematical claim lim_{x→0} x² sin(1/x) = 0.

EXECUTION STATE

why this is not a proof = The print statement only declares the result — it does not derive it. The actual proof comes from the theorem: since lower and upper are squeezing f, and both go to 0, so must f.

```python
import math

def f(x):
    """The wild function: oscillates faster and faster near 0."""
    if x == 0:
        return 0.0              # value at the point itself doesn't matter
    return x * x * math.sin(1 / x)

def lower(x):
    """Lower bound g(x) = -x^2."""
    return -(x * x)

def upper(x):
    """Upper bound h(x) = +x^2."""
    return x * x

# Step 1 — verify the inequality g(x) <= f(x) <= h(x) on a dense sample.
print(f"{'x':>10}  {'g(x)':>12}  {'f(x)':>12}  {'h(x)':>12}  {'inside?':>8}")
print("-" * 62)
violations = 0
for k in range(-500, 501):
    x = k / 500.0                       # 1001 samples across [-1, 1]
    g, fx, h = lower(x), f(x), upper(x)
    if not (g - 1e-12 <= fx <= h + 1e-12):
        violations += 1

print(f"violations over 1001 samples: {violations}")
print()

# Step 2 — watch the three values pinch together as x -> 0.
print(f"{'x':>10}  {'g(x)':>12}  {'f(x)':>12}  {'h(x)':>12}")
print("-" * 52)
for x in [0.5, 0.1, 0.05, 0.01, 0.001, 1e-5]:
    print(f"{x:>10.5f}  {lower(x):>12.2e}  {f(x):>12.2e}  {upper(x):>12.2e}")

# Step 3 — confirm the limit value by squeezing.
print()
print("As x -> 0:  g(x) -> 0  and  h(x) -> 0,  so f(x) -> 0 by the squeeze theorem.")
```

What you should see in the output

A 0-violation summary confirms the inequality holds on every sample. The follow-up table then shows the envelope ±x² shrinking from 0.25 to 10⁻¹⁰ (more than nine orders of magnitude) as x goes from 0.5 to 10⁻⁵, and the trapped function has to follow. Notice it does so while still flipping sign: pinned, not smooth.

PyTorch: Autograd Through a Pinched Function

The squeeze theorem tells us $f(x) \to 0$. A tempting but false corollary is that the derivative also goes to 0. It does not, and PyTorch's autograd makes this concrete in a few lines: we evaluate the trapped function and its pointwise derivative on the same x samples and print both.

PyTorch — autograd sees the squeeze, but not for the derivative

🐍squeeze_theorem_torch.py

import torch

PyTorch: tensors with automatic differentiation. Using torch here lets us evaluate f, the bounds, and the derivative of f simultaneously on the whole sample tensor, and shows that a function squeezed to a limit can still be differentiable in the vicinity.

EXECUTION STATE

torch = PyTorch library. Provides torch.tensor, torch.sin, .backward(), etc.

def f(x): tensor version of x² sin(1/x)

Same mathematical function as the Python version, but accepting a tensor and using torch.sin so autograd can track gradients. Every operation (multiply, divide, sin) is overloaded on tensors to build an autograd computation graph.

EXECUTION STATE

⬇ input: x (tensor) = A tensor of any shape. Element-wise operations apply to every entry.

→ x * x = Element-wise square. Shape preserved.

→ 1.0 / x = Element-wise reciprocal. For xs = [0.5, 0.1, 0.01]: 1/xs = [2.0, 10.0, 100.0].

→ torch.sin(1.0/x) = Element-wise sine. Always in [−1, 1]. For xs = [0.5, 0.1, 0.01]: ≈ [0.9093, −0.5440, −0.5064].

⬆ returns = A tensor of the same shape as x, holding x² sin(1/x) element-wise. Autograd-tracked if x requires_grad.

Comment: march toward 0

A narrative comment introducing the next line. We intentionally do not include x = 0 because the squeeze theorem is a statement about behaviour near 0, not at 0.

xs = torch.tensor([0.5, 0.1, 0.01, 1e-3, 1e-4], requires_grad=True)

Build a 1-D tensor of five decreasing sample points. requires_grad=True registers xs as a leaf in autograd so we can later compute df/dx at every sample in one backward call.

EXECUTION STATE

📚 torch.tensor(data, requires_grad) = Constructor. Copies `data` into a new tensor with the specified gradient-tracking flag. Default dtype is float32.

⬇ arg 1: [0.5, 0.1, 0.01, 1e-3, 1e-4] = Python list of five floats. After the first step (0.5 → 0.1), each value is one order of magnitude closer to 0 than the previous, giving a logarithmic view of the squeeze.

⬇ arg 2: requires_grad = True = Tells autograd to remember every op touching xs. Without it, calling .backward() later would error with 'element 0 of tensors does not require grad'.

xs.shape = torch.Size([5]) — a 1-D tensor with 5 elements.

Comment: evaluate f and bounds on the whole tensor

Vectorised evaluation. Instead of looping over the five x values we evaluate f, upper, lower element-wise in three tensor expressions. Autograd builds one graph for all five.

fx = f(xs)

Run f on the sample tensor. Returns a 1-D tensor with five entries: fx[i] = xs[i]² · sin(1/xs[i]). Because xs has requires_grad, fx inherits a grad_fn that links back to xs through the torch.sin, multiply, and reciprocal nodes.

EXECUTION STATE

fx = tensor([ 2.2732e-01, −5.4402e-03, −5.0637e-05, 8.27e-7, −3.06e-9]) (approx).

→ fx.grad_fn = <MulBackward0> — the top of the graph that backward() will walk.

upper = xs * xs

Element-wise square. PyTorch overloads * on tensors — no Python loop, no autograd detour needed. `upper` is autograd-tracked because xs is.

EXECUTION STATE

upper = tensor([2.50e-1, 1.00e-2, 1.00e-4, 1.00e-6, 1.00e-8]).

lower = -xs * xs

Unary minus binds more tightly than *, so Python computes (-xs) * xs: negate each entry, then multiply element-wise by xs. The result equals -(xs * xs), the element-wise −x².

EXECUTION STATE

lower = tensor([−2.50e-1, −1.00e-2, −1.00e-4, −1.00e-6, −1.00e-8]).

Comment: element-wise inequality check

Before printing values we confirm that the squeeze actually holds for every sample. This is the tensor analogue of the Python violation counter.

squeezed = (lower.detach() <= fx.detach()) & (fx.detach() <= upper.detach())

Element-wise double inequality. We call .detach() on each tensor because comparison operations should not be part of the autograd graph — they return bool tensors and cannot be differentiated. & is the bitwise AND; on bool tensors it acts element-wise logical-AND.

EXECUTION STATE

📚 .detach() = Tensor method: returns a view that shares data but is cut from the autograd graph. Safe for comparisons, printing, indexing.

→ why detach here? = Comparison ops don't build autograd history. Detaching is a clarity hint — we don't need gradients through a boolean check.

<= on tensors = Element-wise. Returns a BoolTensor of the same shape.

& (bitwise AND) = On bool tensors, element-wise logical AND. For integer tensors it would do bitwise AND. Example: tensor([True, False]) & tensor([True, True]) → tensor([True, False]).

squeezed = tensor([True, True, True, True, True]).

print(..., squeezed.all().item())

Reduce the bool tensor with .all() (True iff every entry is True), then unwrap to a Python bool with .item(). If any sample had escaped the band, this would print False and we'd know the bound was wrong.

EXECUTION STATE

📚 .all() = Tensor method: returns a 0-dim bool tensor, True iff every element is truthy.

📚 .item() = Convert a 0-dim tensor to the underlying Python scalar (float, int, or bool).

expected output = 'Squeeze holds for every sample: True'

print(): blank line

Cosmetic spacer between the confirmation line and the upcoming value table.

Comment: print the three bounds side-by-side

Introduce the table that makes the squeeze visible. The headers will show x, −x², f(x), +x².

Header for the four-column table

Same format spec pattern as the Python version: right-aligned numeric columns so scientific notation lines up.

print("-" * 58)

Horizontal rule 58 characters wide to match the four columns.

for i in range(len(xs))

Loop over tensor indices. We deliberately use index-based access (xs[i]) so each row prints one sample clearly. Vectorised printing is possible but less readable for pedagogy.

LOOP TRACE · 5 iterations

i = 0, x = 0.5

lower = −2.500e-01

fx = +2.273e-01

upper = +2.500e-01

i = 1, x = 0.1

lower = −1.000e-02

fx = −5.440e-03

upper = +1.000e-02

i = 2, x = 0.01

lower = −1.000e-04

fx = −5.064e-05

upper = +1.000e-04

i = 3, x = 0.001

lower = −1.000e-06

fx = +8.267e-07

upper = +1.000e-06

i = 4, x = 0.0001

lower = −1.000e-08

fx = ≈ −3.056e-09

upper = +1.000e-08

print f-string row: x, lower, fx, upper

Print one row of the squeeze table. `.item()` converts the 0-dim slice back to a Python float so f-strings can format it.

EXECUTION STATE

:.5f = Fixed-point, 5 decimals. 0.00001 → '0.00001'.

:.3e = Scientific, 3 decimals. 1e-4 → '1.000e-04'.

fx.sum().backward()

Scalar backward pass. fx is a length-5 tensor, and .backward() without arguments requires a scalar, so we sum first. Summing before backward is the standard trick: because each fᵢ depends only on xᵢ, ∂(Σᵢ fᵢ)/∂xⱼ = ∂fⱼ/∂xⱼ, giving us every sample's derivative in one shot.

EXECUTION STATE

📚 .sum() = Reduce a tensor to a 0-dim tensor by adding every element. Differentiable: gradient of sum wrt input is a tensor of ones.

📚 .backward() = Walk the autograd graph, filling each leaf tensor's .grad. For fx.sum(): xs.grad[i] = ∂fx[i]/∂xs[i].

→ math = df/dx = 2x sin(1/x) + x² · cos(1/x) · (−1/x²) = 2x sin(1/x) − cos(1/x). Note the second term stays of order 1 — no squeeze.

→ why sum first? = backward() on a non-scalar output needs an explicit `gradient` argument (a tensor of the same shape as the output). Summing is the clearest way to get per-sample gradients without that extra argument.

print(): blank line

Gap between the value table and the derivative table.

Header for derivative table

Two columns this time: x and df/dx. Shows what autograd computed.

print("-" * 36)

Shorter rule for the two-column derivative table.

for i in range(len(xs)) (derivative table)

Iterate through the gradient values. Unlike f(x) itself, df/dx does NOT tend to 0 as x → 0: it keeps oscillating (because −cos(1/x) stays of order 1), so lim_{x→0} f'(x) does not exist, even though f'(0) = 0 once f is extended by f(0) = 0.

LOOP TRACE · 5 iterations

i = 0, x = 0.5

df/dx ≈ 2x sin(1/x) − cos(1/x) = ≈ 0.9093 − (−0.4161) = 1.3254

i = 1, x = 0.1

df/dx = ≈ 2(0.1)(−0.5440) − (−0.8391) = 0.7303

i = 2, x = 0.01

df/dx = ≈ 2(0.01)(−0.5064) − 0.8623 = −0.8724 (still order 1: no squeeze!)

i = 3, x = 0.001

df/dx = ≈ 2(0.001)(0.8269) − 0.5624 ≈ −0.561: still order 1, the oscillator escapes the squeeze on the derivative

i = 4, x = 0.0001

df/dx = ≈ 2(0.0001)(−0.3056) − (−0.9522) ≈ 0.952: the order-1 oscillation continues

print row: x, xs.grad[i]

Print each sample's gradient from xs.grad. Crucially, the numbers here do NOT shrink to 0 — revealing a subtle lesson: f(x) → 0 does not imply f'(x) → 0. The squeeze works on the function but not on its derivative.

EXECUTION STATE

xs.grad = Populated by .backward(). Stores ∂(fx.sum())/∂xs — a tensor the same shape as xs.

:.4e = Scientific, 4 decimals — enough precision to see the oscillation clearly.

```python
import torch

def f(x: torch.Tensor) -> torch.Tensor:
    """Same trapped function, written so autograd can track it."""
    return x * x * torch.sin(1.0 / x)

# 1) Sample x-values marching toward 0 (avoiding x = 0 itself).
xs = torch.tensor([0.5, 0.1, 0.01, 1e-3, 1e-4], requires_grad=True)

# 2) Evaluate f and its bounds on the whole tensor at once.
fx    = f(xs)
upper = xs * xs
lower = -xs * xs

# 3) Check the squeeze inequality element-wise.
squeezed = (lower.detach() <= fx.detach()) & (fx.detach() <= upper.detach())
print("Squeeze holds for every sample:", squeezed.all().item())
print()

# 4) Print the three bounds side-by-side.
print(f"{'x':>10}  {'-x^2':>14}  {'f(x)':>14}  {'+x^2':>14}")
print("-" * 58)
for i in range(len(xs)):
    print(f"{xs[i].item():>10.5f}  "
          f"{lower[i].item():>14.3e}  "
          f"{fx[i].item():>14.3e}  "
          f"{upper[i].item():>14.3e}")

# 5) Backprop through the sum of f(xs) to get each df/dx.
fx.sum().backward()
print()
print(f"{'x':>10}  {'df/dx from autograd':>22}")
print("-" * 36)
for i in range(len(xs)):
    print(f"{xs[i].item():>10.5f}  {xs.grad[i].item():>22.4e}")
```

The lesson the derivative table teaches

$f(x) \to 0$ as $x \to 0$, but for $x \neq 0$ its derivative $f'(x) = 2x \sin(1/x) - \cos(1/x)$ keeps oscillating between roughly $-1$ and $+1$. The $\cos(1/x)$ term has no vanishing factor to squeeze it. So although the function, extended by $f(0) = 0$, is continuous and even differentiable at 0 (the limit definition gives $f'(0) = \lim_{h \to 0} h\sin(1/h) = 0$, itself a squeeze), the derivative is discontinuous there. A function can be squeezed to a limit without its derivative following suit.
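Both halves of that lesson can be checked in plain Python. The difference quotient at 0 equals $h\sin(1/h)$, which is itself trapped by $\pm|h|$; meanwhile two sequences tending to 0 drive $f'$ to $-1$ and $+1$, so $f'$ has no limit at 0. The witness sequences $x_n = 1/(2\pi n)$ and $y_n = 1/((2n+1)\pi)$ are our own choice for this sketch.

```python
import math

def f(x):
    """x^2 sin(1/x), extended by f(0) = 0."""
    return 0.0 if x == 0 else x * x * math.sin(1 / x)

def fprime(x):
    """Derivative for x != 0: 2x sin(1/x) - cos(1/x)."""
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

# 1) Difference quotient at 0: (f(h) - f(0)) / h = h sin(1/h), trapped by +-|h|.
quotients = []
for h in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]:
    dq = (f(h) - f(0)) / h
    quotients.append((h, dq))
    print(f"h = {h:.0e}   difference quotient = {dq:+.3e}   bound |h| = {h:.0e}")

# 2) f'(x) along two sequences that both tend to 0:
#    x_n = 1/(2*pi*n):    cos(1/x_n) = 1,  so f'(x_n) = -1 (up to rounding)
#    y_n = 1/((2n+1)*pi): cos(1/y_n) = -1, so f'(y_n) = +1 (up to rounding)
sequences = []
for n in [10, 100, 1000]:
    xn = 1 / (2 * math.pi * n)
    yn = 1 / ((2 * n + 1) * math.pi)
    sequences.append((n, fprime(xn), fprime(yn)))
    print(f"n = {n:>4}   f'(x_n) = {fprime(xn):+.6f}   f'(y_n) = {fprime(yn):+.6f}")
```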

Common Pitfalls

Pitfall 1 — Using unequal bound limits

The squeeze theorem requires $\lim g = \lim h$. If the bounds go to different values, the theorem says nothing — the trapped function could end up anywhere in between, or might not have a limit at all.
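A quick illustration of the "might not have a limit at all" case: $\sin(1/x)$ is trapped between the constants $-1$ and $1$, whose limits disagree, and two sequences heading to 0 pull it to different values. The sequences below are our own choice of witnesses.

```python
import math

def f(x):
    """Trapped between the constants -1 and 1, whose limits disagree."""
    return math.sin(1 / x)

values_a, values_b = [], []
for n in [1, 10, 100, 1000]:
    a_n = 1 / (math.pi / 2 + 2 * math.pi * n)       # sin(1/a_n) = +1 (up to rounding)
    b_n = 1 / (3 * math.pi / 2 + 2 * math.pi * n)   # sin(1/b_n) = -1 (up to rounding)
    assert -1.0 <= f(a_n) <= 1.0 and -1.0 <= f(b_n) <= 1.0   # the trap holds
    values_a.append(f(a_n))
    values_b.append(f(b_n))
    print(f"n = {n:>4}   a_n = {a_n:.2e}  f(a_n) = {f(a_n):+.6f}   "
          f"b_n = {b_n:.2e}  f(b_n) = {f(b_n):+.6f}")
```

Both a_n and b_n tend to 0, yet f stays at +1 along one and −1 along the other, so no single limit exists.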

Pitfall 2 — Bounds that only hold on one side

The inequalities $g \le f \le h$ must hold on some interval around c (except maybe at c itself), on both sides. If your bound is only valid for x > 0, you can conclude the right-hand limit only, and you must give an independent argument for the left side.

Pitfall 3 — Multiplying by a variable-sign factor

From $-1 \le \sin(1/x) \le 1$, multiplying by $x$ is dangerous because x changes sign around 0. Multiplying an inequality by a negative number flips it. Use $|x|$ or $x^{2}$ to stay safe.
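To see the sign flip bite, the sketch below tests the naive bound $-x \le x\sin(1/x) \le x$ (what you get by multiplying by $x$ without a case split) alongside the safe version with $|x|$. The sample grid of 2000 nonzero points in $[-1, 1]$ is an arbitrary choice.

```python
import math

def f(x):
    return x * math.sin(1 / x)

xs = [k / 1000 for k in range(-1000, 1001) if k != 0]   # both sides of 0

naive_fail = [x for x in xs if not (-x <= f(x) <= x)]
safe_fail = [x for x in xs if not (-abs(x) <= f(x) <= abs(x))]

print("naive bound fails at", len(naive_fail), "points; all negative:",
      all(x < 0 for x in naive_fail))
print("|x| bound fails at", len(safe_fail), "points")
```

For every negative x the naive "lower" bound −x is positive and the "upper" bound x is negative, so the inequality cannot hold at all; the |x| version holds everywhere.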

Pitfall 4 — Assuming the derivative is also squeezed

A squeeze on $f(x)$ does not squeeze $f'(x)$. The PyTorch demo above showed the derivative keeps oscillating. If you need the derivative's limit, you need a separate squeeze on the derivative itself.

Why the Squeeze Theorem Matters

📐 Derivatives of trig functions

The entire chain ($\tfrac{d}{dx}\sin x = \cos x$, $\tfrac{d}{dx}\cos x = -\sin x$) rests on $\sin x / x \to 1$, which rests on the squeeze.

🌊 Fourier analysis

The sinc function $\sin x / x$ appears throughout Fourier analysis, for example in integrals of the form $\int \sin(kx)/x \, dx$. Such integrands are well behaved at $x = 0$ precisely because $\sin x / x \to 1$.

🧠 Convergence proofs in ML

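Convergence proofs for stochastic gradient descent often trap an error measure between 0 and a bound that vanishes as the iteration count grows (for example $0 \le \mathbb{E}\,\|x_k - x^{*}\|^{2} \le b_k$ with $b_k \to 0$); the squeeze then forces the expected squared error to 0.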

🔬 Numerical analysis error bounds

Truncation errors are bounded by vanishing envelopes. Saying "the error is squeezed between two expressions that go to 0" is often easier than computing the error exactly.
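As a concrete instance (our own example, not taken from the text), the truncation error of the approximation $\sin x \approx x$ satisfies $0 \le |\sin x - x| \le |x|^{3}/6$ by the Lagrange form of the Taylor remainder. Both bounds go to 0, so the error is squeezed to 0 without ever being computed exactly. The sketch only spot-checks the envelope at a few arbitrarily chosen points:

```python
import math

# Truncation error of sin(x) ~ x next to the Taylor-remainder envelope |x|**3 / 6.
samples = [0.5, 0.1, 0.01, 0.001, -0.001, -0.1, -0.5]
rows = []
for x in samples:
    err = abs(math.sin(x) - x)       # actual truncation error
    bound = abs(x) ** 3 / 6          # vanishing envelope
    rows.append((x, err, bound))
    print(f"x = {x:+.3f}   |sin x - x| = {err:.3e}   |x|^3/6 = {bound:.3e}")
```

The sample points stop at |x| = 0.001 on purpose: for much smaller |x| the gap between the error and the bound (about |x|⁵/120) falls below floating-point rounding, and a numerical check would no longer be meaningful.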

Summary

The squeeze theorem is the detective tool of limits. You corner the wild function between two tame ones, prove the tame ones converge to a common point, and collect the conclusion for free.

Move	What to look for	What you produce
Spot the bounded piece	sin(…), cos(…), fractional part, anything in [a, b]	An inequality −1 ≤ bounded ≤ 1 (or similar).
Identify the vanishing factor	x, x², e^{-1/x²}, anything that tends to 0	Multiply bounded × vanishing to produce the envelope.
Check bound validity	Are both sides of c covered? Is the multiplier non-negative?	A legitimate two-sided sandwich g ≤ f ≤ h.
Compute envelope limits	Use limit laws on g and h	Both go to the same L.
Invoke the theorem	All three ingredients present	lim f = L, no direct evaluation of f required.

Key Takeaways

The squeeze theorem converts wild limits into tame ones by finding two bounds that share a limit.
The flagship examples are $\lim_{x \to 0} x^{2}\sin(1/x) = 0$ and $\lim_{x \to 0} \sin x / x = 1$.
The sine limit is the reason derivatives of trigonometric functions have the form they do — a central link between geometry and calculus.
The theorem is a direct corollary of the ε–δ definition: squeeze both bounds inside an ε-window, and the trapped function is stuck with them.
A squeeze on $f$ does not squeeze $f'$. Derivatives need their own argument.

The squeeze promise:

"If I trap a function between two walls heading to the same place, the function must go there too — no matter how much it screams on the way."

Coming Next: Section 2.8 takes the limit we just squeezed, $\sin x / x \to 1$, together with the companion limit $(1 + 1/n)^{n} \to e$, and turns them into the foundation for derivatives of trigonometric and exponential functions.