Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

After this section you will be able to:

Explain what problem limits solve — why plugging in a number is sometimes forbidden but an answer still exists.
Read the notation $\displaystyle \lim_{x \to a} f(x) = L$ in plain English.
Approximate a limit from a numerical table and from an interactive graph.
Detect when a limit fails to exist (jump, oscillation, blow-up).
Connect limits to the derivative by watching a secant slope converge.
Compute a simple limit in Python and in PyTorch.

The Problem That Birthed Calculus

How fast are you going right now? Not between two moments — right now, at a single instant.

Speed is distance divided by time. That is the definition from primary school: $\text{avg speed} = \dfrac{\Delta x}{\Delta t}$ . But what happens when $\Delta t$ is zero? Distance travelled in zero time is zero. We are asked to compute $0/0$ . Our arithmetic cannot do this.

And yet a speedometer reads something — a definite number — at every instant. Cars and planets move with an instantaneous speed even though the formula $\Delta x / \Delta t$ breaks down. Physics insists the number exists. Algebra insists the computation is illegal. Who is right?

The Calculus Question

Can we give a rigorous meaning to “the value a quantity is heading toward” even when plugging in the target itself is forbidden?

The answer calculus invented is called a limit. It is a new kind of “value” — not the output of a formula at a point, but the target the output approaches as we slide toward that point. Every derivative, every integral, every differential equation in this book rests on this one idea. Let us build it carefully.

The Hole in the Graph

To make the idea concrete, throw away the motion story for a moment and just stare at a formula:

$$f(x) = \frac{x^{2} - 1}{x - 1}$$

Plug in any number you like: $x = 2$ gives $3/1 = 3$ , $x = 0$ gives $-1/-1 = 1$ , $x = 1.1$ gives $0.21 / 0.1 = 2.1$ . Every value works — except $x = 1$ . There both the numerator and denominator vanish, and we are handed $0/0$ — a meaningless expression.

The graph of this function looks exactly like the straight line $y = x + 1$, except for a single-point puncture at $(1, 2)$ . The line is whole everywhere else; only this one point is missing.

Why x + 1?

Factor the numerator: $x^{2} - 1 = (x - 1)(x + 1)$ . Then $\dfrac{(x - 1)(x + 1)}{x - 1} = x + 1$ whenever we are allowed to cancel — i.e., whenever $x \neq 1$ . At $x = 1$ the cancellation is illegal (it would be dividing zero by itself), so the function stays undefined there. The hole is real; it is not an artifact of bad algebra.
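The cancellation can be checked numerically. The following is a minimal sketch, not part of the original lesson code: the helper names `f` and `g` and the sample grid are illustrative assumptions, and exact rational arithmetic from the standard-library `fractions` module is used so that rounding cannot blur the comparison.

```python
from fractions import Fraction

def f(x):
    """Original formula; undefined at x = 1."""
    return (x**2 - 1) / (x - 1)

def g(x):
    """Simplified form after cancelling (x - 1)."""
    return x + 1

# Exact rational sample points from -2 to 4 in steps of 1/10, skipping x = 1.
samples = [Fraction(n, 10) for n in range(-20, 41) if n != 10]

# f and g agree exactly at every sampled x != 1.
agree = all(f(x) == g(x) for x in samples)
print("f(x) == x + 1 at all samples:", agree)

# At x = 1 the simplified form is defined, the original is not.
print("g(1) =", g(Fraction(1)))
try:
    f(Fraction(1))
except ZeroDivisionError:
    print("f(1) is undefined (division by zero)")
```

Agreement on a finite grid is only evidence, of course; the factoring argument above is what actually proves the identity for every $x \neq 1$ .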

So here is the puzzle. The graph is a straight line with one missing point. Every neighbour of $x = 1$ gives an output you can almost predict — values just below 1 produce outputs just below 2, values just above 1 produce outputs just above 2. The function is pointing at the value 2 from both sides. It just never arrives.

The limit is our formal name for “the value the function is pointing at.” Let us see it live.

Zoom Into the Hole — Interactive

Drag the slider. The window around $x = 1$ narrows by a factor of ten at each zoom step. The function is plotted in indigo; the orange and teal dots are probes that move with the zoom. The white circle at $(1, 2)$ marks the hole: it is genuinely empty, no matter how far you zoom in.


What the zoom is telling you

No matter how much you magnify, the two probes sit symmetrically on either side of the hole at $(1, 2)$ . In absolute terms, each probe reading lies within the current window half-width $\delta$ of the target value $y = 2$ , because $|f(x) - 2| = |x - 1|$ for $x \neq 1$ . As $\delta \to 0$ , those readings become indistinguishable from 2.

The Numerical Table Approach

Before the formal definition, Leibniz and the Bernoullis convinced themselves that limits made sense by computing. Given a function and a target input, they built a table of outputs at nearby inputs and watched for convergence. We do the same here:

x (approach from left)	f(x)	x (approach from right)	f(x)
0.9	1.9	1.1	2.1
0.99	1.99	1.01	2.01
0.999	1.999	1.001	2.001
0.9999	1.9999	1.0001	2.0001
0.99999	1.99999	1.00001	2.00001
↓	↓	↓	↓
1 (forbidden)	undefined	1 (forbidden)	undefined

Left column: outputs are climbing toward 2 from below. Right column: outputs are descending toward 2 from above. The function does not attain the value 2 (because the inputs never attain 1). But both columns leave no doubt about the target.

Calculus captures this observation with a single piece of notation:

$$\lim_{x \to 1} \frac{x^{2} - 1}{x - 1} = 2$$

Read aloud: “the limit, as $x$ approaches 1, of $(x^{2} - 1)/(x - 1)$ , equals 2.” The arrow is the important symbol — it means approaching but never reaching. The equation is not claiming $f(1) = 2$ (that is false — $f(1)$ is undefined). It is claiming something subtler and stronger: that the outputs line up arbitrarily close to 2 whenever the inputs line up arbitrarily close to 1.

The Intuitive Definition of a Limit

Working Definition (intuitive)

We write $\displaystyle \lim_{x \to a} f(x) = L$ to mean: we can make $f(x)$ as close to $L$ as we please — within any tolerance you demand — by choosing $x$ close enough to $a$ (but not equal to $a$ ).

Three pieces of this sentence deserve emphasis.

“As close as we please”. No tolerance is too small. Name any positive number $\varepsilon > 0$ — a billionth, a trillionth, $10^{-100}$ — and we must be able to place $f(x)$ within $\varepsilon$ of $L$ .
“By choosing x close enough”. For every such tolerance, there is a distance $\delta > 0$ around $a$ that does the job. Small $\varepsilon$ typically demands small $\delta$ .
“But not equal to a”. The value of $f$ at $a$ is irrelevant. It might be undefined, or defined but wrong (think of a graph with a single filled dot floating above the line). The limit only cares about neighbours of $a$ .

This is the language that will become the formal $\varepsilon$ - $\delta$ definition in §2.5. For now, hold onto the metaphor: you (the challenger) pick a tolerance around $L$ ; the function (the responder) must guarantee a matching window around $a$ . If it can always respond, the limit equals $L$ .
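The challenger/responder game can be played in code. This is a minimal sketch under stated assumptions: the names `respond` and `challenge` are hypothetical, the rule `delta = eps` is specific to this one function (it relies on $|f(x) - 2| = |x - 1|$ for $x \neq 1$ ), and checking a handful of probe points is evidence rather than proof.

```python
def f(x):
    """Example function from this section; undefined at x = 1."""
    return (x**2 - 1) / (x - 1)

A, L = 1.0, 2.0  # approach point a and claimed limit L

def respond(eps):
    """Responder's move: return a delta that works for tolerance eps.

    For this particular f, |f(x) - 2| = |x - 1| when x != 1,
    so delta = eps is enough. Other functions need their own rule.
    """
    return eps

def challenge(eps):
    """Challenger's check: probe points with 0 < |x - A| < delta."""
    delta = respond(eps)
    for k in range(1, 10):
        d = delta * k / 10            # 0 < d < delta
        for x in (A - d, A + d):      # both sides, never x = A itself
            if abs(f(x) - L) >= eps:
                return False
    return True

# Tolerances are kept well above float rounding error on purpose.
for eps in (1e-1, 1e-3, 1e-6):
    print(f"eps = {eps:g}: responder wins = {challenge(eps)}")
```

The tolerances stop at `1e-6` deliberately: for much smaller values, floating-point rounding in the subtraction starts to compete with the tolerance itself.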

Worked Example — Factoring the 0/0

Before we trust any software, let us compute the same limit by hand. The trick is to break the spell of $0/0$ by factoring.

Step-by-step calculation of $\displaystyle \lim_{x \to 1} \dfrac{x^{2}-1}{x-1}$

Step 1 — Try direct substitution (just to confirm it fails).

$$\dfrac{1^{2}-1}{1-1} = \dfrac{0}{0} \quad \text{(indeterminate; we cannot conclude from this)}$$

The symbol $0/0$ is called an indeterminate form. It does not mean the limit doesn't exist — it just means direct substitution is not enough to find it.

Step 2 — Factor the numerator.

$$x^{2} - 1 = (x - 1)(x + 1) \quad \text{(difference of squares)}$$

Step 3 — Substitute back and cancel.

$$\dfrac{x^{2}-1}{x-1} = \dfrac{(x-1)(x+1)}{x-1} = x + 1 \quad (\text{valid for } x \neq 1)$$

We do not cancel at $x = 1$ itself — division by zero is still illegal there. But the limit only cares about $x$ near 1, not $x = 1$ , so the cancellation is legitimate inside the limit.

Step 4 — Take the limit of the simplified form.

$$\lim_{x \to 1} (x + 1) = 1 + 1 = 2$$

The function $x + 1$ is a polynomial; it is defined at 1 (value 2) and substitution IS valid for polynomials. So the simplified limit is exactly 2.

Step 5 — Conclude.

$$\lim_{x \to 1} \dfrac{x^{2}-1}{x-1} = 2$$

This matches the numerical table, the zoom visualization, and common sense. The function has a hole at $(1, 2)$ , but it wants to be 2 there — and the limit gives us the formal right to say so.

When the Limit Does Not Exist

The limit of our factoring example worked because both one-sided “targets” agreed. What if they don't? Four things can go wrong, and it is worth recognising each one.

Jump. The two sides approach different numbers. For the sign function, $\lim_{x \to 1^{-}} \mathrm{sgn}(x-1) = -1$ while $\lim_{x \to 1^{+}} \mathrm{sgn}(x-1) = +1$ . No single $L$ can satisfy the definition.
Blow-up. The values grow without bound near $a$ . For $f(x) = 1/(x-1)^{2}$ , both one-sided outputs head off toward $+\infty$ . No finite $L$ exists, though we write $\lim f = +\infty$ as shorthand.
Wild oscillation. The function bounces arbitrarily fast near $a$ . The canonical example is $f(x) = \sin\!\left(\tfrac{1}{x-1}\right)$ ; every neighbourhood of 1 contains values arbitrarily close to $+1$ and $-1$ . There is no single target.
Undefined on one side. e.g., $f(x) = \sqrt{x}$ near $x = 0$ has no left-sided limit (no real square root for negative inputs).

The rule of the two sides

The two-sided limit exists and equals $L$ if and only if both one-sided limits exist and equal $L$ . If either side misbehaves, the two-sided limit fails. We will revisit one-sided limits formally in §2.2.
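The four failure modes listed above can be probed numerically from each side. This is a rough sketch, not a detection algorithm: the helper `one_sided_probes` and the `cases` table are illustrative names, and a short list of probe values can only hint at the behaviour.

```python
import math

def one_sided_probes(f, a, side, hs=(1e-1, 1e-2, 1e-3, 1e-4)):
    """Evaluate f at a + h (side='+') or a - h (side='-') for shrinking h.

    Returns a list of outputs; None marks a point where f is not
    defined over the reals.
    """
    out = []
    for h in hs:
        x = a + h if side == "+" else a - h
        try:
            out.append(f(x))
        except (ValueError, ZeroDivisionError):
            out.append(None)
    return out

def sgn(t):
    """Sign function: -1, 0 or +1."""
    return (t > 0) - (t < 0)

# One example per failure mode, each paired with its approach point a.
cases = {
    "jump: sgn(x - 1) at 1": (lambda x: sgn(x - 1), 1.0),
    "blow-up: 1/(x - 1)^2 at 1": (lambda x: 1 / (x - 1) ** 2, 1.0),
    "oscillation: sin(1/(x - 1)) at 1": (lambda x: math.sin(1 / (x - 1)), 1.0),
    "one-sided domain: sqrt(x) at 0": (math.sqrt, 0.0),
}

for name, (f, a) in cases.items():
    print(name)
    print("  left :", one_sided_probes(f, a, "-"))
    print("  right:", one_sided_probes(f, a, "+"))
```

In the output, the jump shows two different constants, the blow-up shows values growing without bound, the oscillation shows no visible pattern, and the square root shows `None` on the left because `math.sqrt` rejects negative inputs.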

Left vs. Right — Interactive

Four scenarios, one probe on each side. Use the buttons to switch and the slider to shrink the probe distance $h$ . Watch when the two sides agree and when they do not.


Python: Computing the Limit by Brute Force

The table we stared at earlier was computed by hand. Let us do the same thing in Python, so you can paste the code into a REPL or notebook and watch the convergence live. The line-by-line annotations that follow trace the exact values flowing through each line.

Plain Python — brute-force limit probe

🐍limit_by_brute_force.py


Line 1: def f(x) → float

Defines the Python function that implements our mystery formula. This is the exact function we want to take the limit of. It is defined for every real x EXCEPT x = 1, because at x = 1 the denominator (x − 1) equals zero and Python will raise a ZeroDivisionError.

EXECUTION STATE

⬇ input: x = A real number. Can be any float in Python (positive, negative, 0, 1.5, 0.99999, etc.) EXCEPT 1.0 — that value is forbidden because it would divide by zero.

⬆ returns = float — the value of (x² − 1) ÷ (x − 1). Algebraically this simplifies to x + 1 whenever x ≠ 1, but Python doesn't know that; it evaluates the literal formula.

Line 2: docstring (what f is)

Plain text documentation. Mentions the form of f and calls out its singularity at x = 1. Good docstrings remind future readers of pre-conditions — here, that x ≠ 1.

Line 3: return (x**2 - 1) / (x - 1)

Computes the raw formula. x**2 uses Python's exponent operator ** (not ^, which is bitwise XOR). The expression is literally x squared minus 1, divided by x minus 1; no simplification is done at all.

EXECUTION STATE

x**2 = Python's exponentiation: x raised to the power 2 — i.e., x × x. Example: 1.1**2 = 1.21.

x**2 - 1 = Numerator. At x = 1.1 this is 1.21 − 1 = 0.21. At x = 1.0 this would be 0, causing 0/0.

x - 1 = Denominator. At x = 1.1 this is 0.1. At x = 1.0 this is 0 → ZeroDivisionError.

⬆ return: f(1.1) = 0.21 / 0.1 = 2.1

Line 5: comment (the plan)

Announces the loop strategy: we will pick ever-smaller distances h from x = 1, evaluate f at x = 1 + h (right side) and x = 1 − h (left side), and watch what number the outputs approach.

Line 6: print header row

Prints a column header using an f-string. The {'x':>10} means 'place the character x right-aligned in a 10-character field'. This just makes the output table readable.

EXECUTION STATE

f-string = Python's formatted string literal (PEP 498). Anything inside { } is evaluated. {'x':>10} right-aligns the literal 'x' in a field of width 10, padding with spaces on the left.

⬆ printed = '         x          f(x)  side'

Line 7: print("-" * 32)

Python allows multiplying a string by an integer to repeat it. '-' * 32 produces a 32-character dashed separator line. Purely cosmetic.

EXECUTION STATE

⬆ printed = --------------------------------

Line 9: for h in [0.1, 0.01, 0.001, 0.0001, 1e-5]:

Loops over five distances h that shrink by a factor of 10 each time. 1e-5 is Python's scientific-notation literal for 0.00001. At each iteration we probe x = 1 ± h and record f.

LOOP TRACE · 5 iterations

iteration	h
1	0.1
2	0.01
3	0.001
4	0.0001
5	1e-5 (0.00001)

Line 10: x_r = 1 + h  # approach from the right

The right probe: a point slightly greater than 1. As h shrinks, x_r slides leftward toward 1 without ever reaching it.

LOOP TRACE · 5 iterations

h	x_r
0.1	1.1
0.01	1.01
0.001	1.001
0.0001	1.0001
1e-5	1.00001

Line 11: x_l = 1 - h  # approach from the left

The left probe: a point slightly less than 1. As h shrinks, x_l slides rightward toward 1 without ever reaching it.

LOOP TRACE · 5 iterations

h	x_l
0.1	0.9
0.01	0.99
0.001	0.999
0.0001	0.9999
1e-5	0.99999

Line 12: print(...) for right side

Prints the right-side row. f(x_r) calls our function: it computes ((1+h)² − 1) / h, which algebraically equals 2 + h. So the printed value is always slightly ABOVE 2 and shrinks toward 2.

EXECUTION STATE

📚 f-string formatting spec :>12.6f = > = right-align. 12 = total field width. .6f = fixed-point, 6 decimal places. Example: 2.1 becomes '    2.100000' (padded on the left to 12 chars).

LOOP TRACE · 5 iterations

h	f(x_r)
0.1	2.100000
0.01	2.010000
0.001	2.001000
0.0001	2.000100
1e-5	2.000010

Line 13: print(...) for left side

Prints the left-side row. f(x_l) = ((1−h)² − 1) / (−h) = 2 − h. So the value is slightly BELOW 2 and rises toward 2. The two sides squeeze the target value from above and below.

LOOP TRACE · 5 iterations

h	f(x_l)
0.1	1.900000
0.01	1.990000
0.001	1.999000
0.0001	1.999900
1e-5	1.999990

Line 15: comment (now the forbidden point)

Signals the pedagogical pivot: we have watched f approach 2 from both sides; now we confront the fact that plugging in x = 1 directly is illegal.

Line 16: try: (guarded evaluation)

Starts a Python exception handler. Any error raised inside the try block will be caught by the matching except clause instead of crashing the program.

EXECUTION STATE

📚 try / except = Python's error-handling construct. Example: try: 1/0 except ZeroDivisionError as e: print(e) prints 'division by zero' instead of aborting.

Line 17: f(1)

Calls f with the forbidden value. Inside f this evaluates (1**2 − 1)/(1 − 1) = 0 / 0. Python raises ZeroDivisionError because division by zero is undefined for integers and floats alike; here the argument 1 is an int, so both operands are the integer 0.

EXECUTION STATE

⬇ arg: x = 1 = The singular input. Numerator = 0, denominator = 0.

⬆ result = Exception — no value is returned.

Line 18: except ZeroDivisionError as e:

Catches the specific error raised on line 17 and binds it to variable e. 'ZeroDivisionError' is a built-in Python exception class; 'as e' gives us a handle on the error message for printing.

EXECUTION STATE

e (ZeroDivisionError) = The exception instance. For the integer call f(1) its string form is 'division by zero'; calling f(1.0) instead typically gives 'float division by zero', and the exact wording can vary between Python versions.

Line 19: print error message

Prints a readable version of the error. This confirms computationally what we already suspected: f(1) cannot be evaluated, even though both one-sided probes said the answer 'should' be 2. The limit is NOT the same as the value of the function.

EXECUTION STATE

⬆ printed = At x = 1: ZeroDivisionError -- division by zero


def f(x):
    """f(x) = (x^2 - 1) / (x - 1). Undefined at x = 1 (0/0)."""
    return (x**2 - 1) / (x - 1)

# Approach x = 1 from both sides. Shrink |h| each step.
print(f"{'x':>10}  {'f(x)':>12}  side")
print("-" * 32)

for h in [0.1, 0.01, 0.001, 0.0001, 1e-5]:
    x_r = 1 + h                                   # approach from the right
    x_l = 1 - h                                   # approach from the left
    print(f"{x_r:>10.5f}  {f(x_r):>12.6f}  right")
    print(f"{x_l:>10.5f}  {f(x_l):>12.6f}  left")

# Now try x = 1 exactly -- the forbidden point.
try:
    f(1)
except ZeroDivisionError as e:
    print(f"\nAt x = 1: ZeroDivisionError -- {e}")

The output is a table that squeezes onto 2 from both sides, and its final line confirms that $f(1)$ really does raise ZeroDivisionError. The limit exists; the function value does not. These are two different things, and Python makes the distinction painfully obvious.
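One caveat about brute force is worth seeing once: $h$ cannot be shrunk indefinitely on a computer. The sketch below is an illustrative addition (the helper `probe` is a hypothetical name) and assumes standard 64-bit floats, where any $h$ far below about $10^{-16}$ is lost when added to 1.

```python
def f(x):
    """f(x) = (x^2 - 1) / (x - 1). Undefined at x = 1 (0/0)."""
    return (x**2 - 1) / (x - 1)

def probe(h):
    """Return f(1 + h), or None if the evaluation divides by zero."""
    try:
        return f(1 + h)
    except ZeroDivisionError:
        return None

# For the smallest h, 1 + h rounds to exactly 1.0 and the probe
# lands on the forbidden point without our asking it to.
for h in (1e-3, 1e-8, 1e-17):
    print(f"h = {h:g}: (1 + h == 1) is {1 + h == 1}, f(1 + h) = {probe(h)}")
```

So a numerical table is trustworthy only over a range of $h$ that is small but not too small, which is one more reason the algebraic argument matters.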

PyTorch: Limits as Vectorised Evaluation

Pure Python required a for-loop. PyTorch lets us evaluate $f$ at many probe points in a single tensor operation. More importantly, we get a first glimpse of autograd, PyTorch's automatic-differentiation engine. Autograd does not take numerical limits at run time; it applies derivative rules, each of which is itself derived from a limit, and chains them together.

PyTorch — vectorised limit + autograd preview

🐍limit_with_pytorch.py


Line 1: import torch

Loads PyTorch, the tensor library used throughout deep learning. A PyTorch tensor is like a NumPy array, but with two superpowers: (1) it can run on a GPU, (2) it can track its own derivatives via autograd. This script stays on the CPU; it relies on tensors for vectorised limit probing and on autograd for the derivative preview.

EXECUTION STATE

torch = PyTorch library. Provides torch.tensor (n-dimensional array), element-wise math, auto-differentiation, neural-network layers.

Line 3: comment (vectorised strategy)

Announces the plan: instead of a Python for-loop that computes f(x) one value at a time, we will put all six probe points into a single tensor and evaluate f on all of them simultaneously. This is a real speed-up on large problems and, pedagogically, lets us see the limit converge from both sides in one glance.

Line 4: comment (how broadcasting works)

Reminds the reader that arithmetic on tensors is element-wise. x**2 squares each element; (x - 1) subtracts 1 from each element; the division pairs up corresponding elements. No explicit loop.

Line 5: x = torch.tensor([0.9, 0.99, 0.999, 1.001, 1.01, 1.1])

Creates a 1D tensor containing six probe x-values: three approaching 1 from below and three approaching from above. We carefully avoided x = 1.0 itself — that would produce a 0/0 on the next line.

EXECUTION STATE

📚 torch.tensor(data) = Factory function that builds a tensor from a Python list / nested list / numpy array. The dtype is inferred (here float32).

⬆ x = tensor([0.9000, 0.9900, 0.9990, 1.0010, 1.0100, 1.1000])

x.shape = torch.Size([6]) — a 1D tensor of length 6

Line 6: y = (x**2 - 1) / (x - 1)

Applies the entire formula in one line. Every operation is element-wise: x**2 squares each entry, − 1 subtracts from each entry, / divides pair-wise. No iteration, no division by zero, no Python-level loop. This is the essence of 'vectorised' computation.

EXECUTION STATE

x**2 = tensor([0.8100, 0.9801, 0.9980, 1.0020, 1.0201, 1.2100])

x**2 - 1 = tensor([-0.1900, -0.0199, -0.0020, 0.0020, 0.0201, 0.2100])

x - 1 = tensor([-0.1000, -0.0100, -0.0010, 0.0010, 0.0100, 0.1000])

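⬆ y = (x²-1)/(x-1) = approximately tensor([1.9000, 1.9900, 1.9990, 2.0010, 2.0100, 2.1000]); float32 rounding in the subtraction can perturb the last displayed digit.

→ pattern = The three leftmost values climb: 1.900 → 1.990 → 1.999. Read from the outside in, the three rightmost values descend: 2.100 → 2.010 → 2.001. Both sides squeeze toward 2.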

Line 8: print header row

Same f-string trick as before. Right-aligns the literals 'x' and 'f(x)' in 8-character fields.

Line 9: for xi, yi in zip(x.tolist(), y.tolist()):

Iterates over both tensors in lockstep. .tolist() converts a tensor back to a Python list (of Python floats) so zip can pair them.

EXECUTION STATE

📚 zip(a, b) = Pairs the i-th element of a with the i-th element of b. zip([1,2],[3,4]) yields (1,3) then (2,4).

📚 .tolist() = Tensor method: returns a nested Python list with the same shape. Detaches from autograd. Needed here because we want plain Python floats for printing.

LOOP TRACE · 6 iterations

i	xi	yi
0	0.900	1.9000
1	0.990	1.9900
2	0.999	1.9990
3	1.001	2.0010
4	1.010	2.0100
5	1.100	2.1000

Line 10: print formatted row

Prints one row of the converging table. The format spec :>8.3f (used for x) means right-aligned, width 8, three decimals; :>8.4f (used for f(x)) is the same with four decimals. Reading the output, you can visually verify both sides collapsing toward 2.0000.

Line 12: comment (the autograd bridge)

Announces the conceptual leap: the derivative is defined as the limit of a difference quotient, f'(x) = limₕ→0 [f(x+h)−f(x)]/h. Autograd does not evaluate that limit numerically; it applies the differentiation rules that follow from it, so you get the limiting value without choosing any h.

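Line 13: comment (the simplified form)

Reminds the reader that (x² − 1)/(x − 1) factors as (x − 1)(x + 1)/(x − 1) = x + 1 for x ≠ 1. Differentiating x + 1 therefore matches differentiating the original f at every point where f is defined, and the derivative is 1 there. At x = 1 itself the original f is undefined, so the value computed on the following lines is the derivative of the simplified form x + 1 (the function with the hole filled in).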

Line 14: x1 = torch.tensor(1.0, requires_grad=True)

Creates a scalar tensor at x = 1 and tells PyTorch to track any operation involving this tensor in an autograd graph. Without requires_grad=True, backward() would have no effect.

EXECUTION STATE

⬇ arg 1: 1.0 = The initial value. We pick exactly 1.0 because the original f was undefined there — autograd will still give a meaningful answer through the simplified form.

⬇ arg 2: requires_grad=True = Flag telling PyTorch to build a computation graph behind every operation on x1. Without this flag, x1.grad stays None forever. Default is False for literal tensors; True for learnable parameters.

⬆ x1 = tensor(1., requires_grad=True)

Line 15: (x1 + 1).backward()

Two things happen here. First (x1 + 1) evaluates to 2.0 and PyTorch records that this tensor was built by adding 1 to x1. Then .backward() walks that tiny graph in reverse and accumulates d(output)/d(x1) into x1.grad.

EXECUTION STATE

x1 + 1 = tensor(2., grad_fn=<AddBackward0>) — a scalar tensor whose .grad_fn records the op that built it (an add).

📚 .backward() = Tensor method: computes d(self)/d(leaf tensors). For a scalar output, no arguments are needed. For a non-scalar output you'd pass a gradient tensor. It MUTATES leaf tensors' .grad attributes — does not return anything.

→ what it actually does = Applies the chain rule along the recorded graph. Here the graph is trivial: output = x1 + 1, so d(output)/d(x1) = 1. That 1.0 is added to x1.grad.

Line 16: print the gradient

Confirms numerically what the math predicted. x1.grad.item() pulls the single scalar out of the grad tensor as a Python float. The printed value 1.0 is the answer to 'limit as h → 0 of [(1+h+1) − (1+1)] / h'. PyTorch obtained it from the derivative rule for addition, with no shrinking-h loop required.

EXECUTION STATE

📚 .item() = Tensor method: for a tensor containing exactly ONE element, returns that element as a native Python scalar (float, int, bool). Errors on multi-element tensors.

x1.grad = tensor(1.)

x1.grad.item() = 1.0

⬆ printed = Autograd: d/dx (x + 1) at x = 1 = 1.0


import torch

# Vectorised limit: evaluate f at many x-values close to 1 all at once.
# Tensor broadcasting runs the arithmetic on all elements in parallel.
x = torch.tensor([0.9, 0.99, 0.999, 1.001, 1.01, 1.1])
y = (x**2 - 1) / (x - 1)

print(f"{'x':>8}   {'f(x)':>8}")
for xi, yi in zip(x.tolist(), y.tolist()):
    print(f"{xi:>8.3f}   {yi:>8.4f}")

# Autograd: the derivative is defined as a limit; autograd gets it from derivative rules.
# For x != 1, (x^2 - 1)/(x - 1) equals x + 1, whose derivative at x = 1 is 1.
x1 = torch.tensor(1.0, requires_grad=True)
(x1 + 1).backward()
print(f"\nAutograd: d/dx (x + 1) at x = 1 = {x1.grad.item()}")

Why this matters later

Every time you call loss.backward() in a neural-network training loop, PyTorch computes a derivative for every parameter, potentially millions of them. Each of those derivatives is defined by the same kind of limit you just saw; autograd assembles them with the chain rule. Nothing in that backward pass can be understood without the idea developed in this section.

Seeding the Derivative — Interactive

We started with a physics question we could not answer: what is the instantaneous speed of a moving object? Let us close the loop and answer it — at least for one specific curve.

Consider the parabola $y = x^{2}$ at $x = 1$ . Its slope there is well-defined in principle but there is no “rise-over-run” formula that works at a single point — the run would be zero. The trick is the same trick we just invented: approach the point with a secant line and take the limit of its slope.

The slope of the secant through $(1, 1)$ and $(1+h, (1+h)^{2})$ is, after a tiny algebra check,

$$\frac{(1+h)^{2} - 1}{(1+h) - 1} = \frac{2h + h^{2}}{h} = 2 + h \quad (h \neq 0)$$

and so $\displaystyle \lim_{h \to 0} \frac{(1+h)^{2} - 1}{h} = \lim_{h \to 0}(2 + h) = 2$ . The number 2 is the tangent slope at $(1, 1)$ — our first derivative, computed with nothing more than the limit idea.
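The same convergence can be watched numerically. This is a small sketch in plain Python; `secant_slope` and `square` are illustrative helper names, not part of the lesson's code.

```python
def secant_slope(g, a, h):
    """Slope of the line through (a, g(a)) and (a + h, g(a + h)); h != 0."""
    return (g(a + h) - g(a)) / h

def square(x):
    return x * x

# Shrink h and watch the secant slope at x = 1 approach the tangent slope.
for h in (1.0, 0.5, 0.1, 0.01, 0.001):
    print(f"h = {h:<6} secant slope = {secant_slope(square, 1.0, h):.6f}")

# A negative h approaches from the left; the slope is then just below 2.
print(f"h = -0.001 secant slope = {secant_slope(square, 1.0, -0.001):.6f}")
```

Each printed slope matches $2 + h$ up to rounding, so positive $h$ approaches 2 from above and negative $h$ from below.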


Drag $h$ all the way down. The secant (in orange) rotates onto the tangent (the dashed grey line) and the displayed slope converges to 2 from above. This is the picture behind every derivative in Chapter 4 — and, by extension, every gradient in gradient descent.

Common Pitfalls

Confusing $\displaystyle \lim_{x \to a} f(x)$ with $f(a)$ . They are independent. The limit can exist when $f(a)$ does not; they can also disagree (think of a graph with a single misplaced dot). The limit is about trend, not evaluation.
Treating 0/0 as 0 or as 1. $0/0$ is an indeterminate form. It is a flag that says “do more work” — factor, rationalise, apply L'Hôpital — not a shortcut to an answer.
Using a finite sample as proof. Our numerical table suggests $L = 2$ but does not prove it. A finite sample can mislead: tables for $\sin(1/x)$ near 0 can look like they converge to any value in $[-1, 1]$ , depending on where you sample. The formal definition (next sections) is what forbids that trap.
Forgetting that both sides must agree. A left-only approach is not a two-sided limit. When a function has a natural boundary (a square root at zero, a logarithm at zero), the two-sided limit simply does not exist there.
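The sampling trap for $\sin(1/x)$ is easy to reproduce. In this sketch (the function name `g` and both sample sequences are illustrative choices), two sets of inputs shrink toward 0, yet one table appears to "converge" to 0 and the other to 1.

```python
import math

def g(x):
    return math.sin(1 / x)

ns = (1, 10, 100, 1000)

# Sample A: x_n = 1/(n*pi), where sin(1/x) = sin(n*pi) = 0.
xs_zero = [1 / (n * math.pi) for n in ns]
sample_zero = [g(x) for x in xs_zero]

# Sample B: x_n = 1/(2*n*pi + pi/2), where sin(1/x) = 1.
xs_one = [1 / (2 * n * math.pi + math.pi / 2) for n in ns]
sample_one = [g(x) for x in xs_one]

print("x -> 0 along sample A, g(x) =", [round(v, 9) for v in sample_zero])
print("x -> 0 along sample B, g(x) =", [round(v, 9) for v in sample_one])
```

Neither table is wrong; each is simply incomplete. Because two valid approaches to 0 produce different apparent targets, no single limit can exist.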

Summary

The limit formalises the intuition of “the value a function is heading toward,” even when the function is not defined at the target.
$\displaystyle \lim_{x \to a} f(x) = L$ means: for any tolerance $\varepsilon > 0$ there is a neighbourhood of $a$ on which, excluding $a$ itself, $f$ stays within $\varepsilon$ of $L$ .
The two-sided limit exists iff the left and right limits exist and agree.
Limits fail to exist through jumps, blow-up, wild oscillation, or one-sided domains.
The derivative is a limit (of a secant slope). Autograd delivers its value by applying derivative rules rather than by shrinking $h$ numerically; the underlying mathematics is exactly what we practiced here.

What's next

§2.2 formalises one-sided limits and jump discontinuities. §2.5 returns with the full $\varepsilon$ - $\delta$ definition and proves the table intuition rigorous.