Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

State the formal definition of the derivative as a limit of the difference quotient
Explain why we need limits to define instantaneous rate of change (the 0/0 problem)
Interpret the difference quotient geometrically as the slope of a secant line
Visualize how secant lines approach the tangent line as $h \to 0$
Compute derivatives from the definition for polynomial and simple functions
Apply the limit definition to real-world rate-of-change problems
Connect this definition to numerical differentiation and automatic differentiation in machine learning

The Big Picture: Capturing Instantaneous Change

"The derivative is the mathematical answer to an ancient question: How fast is something changing at this very instant?"

Imagine a ball thrown into the air. At any given instant, the ball has a definite velocity — it's moving at some specific speed in some direction. But here's the paradox that puzzled mathematicians for centuries:

The Paradox of Instantaneous Velocity

Velocity is defined as distance traveled divided by time:

\text{velocity} = \frac{\text{distance}}{\text{time}} = \frac{\Delta s}{\Delta t}

But at a single instant, no time passes (Δt = 0) and no distance is traveled (Δs = 0). So instantaneous velocity would be:

\frac{0}{0} = \text{undefined???}

Yet we know that at each moment, the ball has a definite speed! How can we make mathematical sense of this?

The resolution to this paradox is one of the most profound ideas in mathematics: we don't compute the ratio at the instant, but rather we take the limit of ratios over shorter and shorter time intervals approaching the instant.
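To see this resolution numerically, here is a minimal sketch. It assumes a hypothetical ball whose height is s(t) = 20t - 4.9t² metres (the launch speed of 20 m/s and g = 9.8 m/s² are illustrative values, not taken from the text above) and computes average velocities over shrinking intervals that start at t = 1.

```python
def position(t):
    """Height in metres of a hypothetical ball (assumed: v0 = 20 m/s, g = 9.8 m/s^2)."""
    return 20.0 * t - 4.9 * t**2


def average_velocity(s, t, dt):
    """Average velocity of s over the time interval [t, t + dt]."""
    return (s(t + dt) - s(t)) / dt


t0 = 1.0
# For this particular s, algebra gives average velocity = 10.2 - 4.9*dt,
# so the values should settle toward 10.2 m/s as dt shrinks.
intervals = [1.0, 0.1, 0.01, 0.001, 0.0001]
averages = [average_velocity(position, t0, dt) for dt in intervals]

for dt, v in zip(intervals, averages):
    print(f"dt = {dt:<7} average velocity = {v:.5f} m/s")
```

No interval of length zero is ever used; the instantaneous velocity is the value the averages settle toward.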

Historical Context: Newton, Leibniz, and the Birth of Calculus

The formal definition of the derivative emerged in the 17th century through the independent work of Isaac Newton (England) and Gottfried Wilhelm Leibniz (Germany). Both were trying to solve the same fundamental problems:

Newton's Motivation (1666)

Newton was studying motion and gravity. He needed to find instantaneous velocities and accelerations of moving objects. He called his method "fluxions" — quantities that "flow" or change over time.

Leibniz's Motivation (1684)

Leibniz was focused on tangent lines to curves. He wanted to find the slope of any curve at any point. His notation (dy/dx) emphasized the ratio of infinitely small changes.

Both approaches captured the same mathematical concept, but it took another 150 years for mathematicians (notably Augustin-Louis Cauchy in the 1820s) to put the idea on rigorous footing using limits.

Why "Limit" Instead of Infinitesimals?

Early calculus used the notion of "infinitely small quantities" — numbers that are smaller than any positive number but not zero. This was philosophically troubling. The limit definition avoids infinitesimals entirely: we never actually divide by zero, we only examine what happens as we get arbitrarily close to zero.

Building Intuition: From Average to Instantaneous

Let's trace the logical path from average rate of change to the derivative. Consider a function $f(x)$ and two points on its graph:

Point P:

(x, f(x))

Point Q:

(x + h, f(x + h))

The secant line through P and Q has slope:

\text{slope of secant} = \frac{\Delta y}{\Delta x} = \frac{f(x+h) - f(x)}{(x+h) - x} = \frac{f(x+h) - f(x)}{h}

This expression $\frac{f(x+h) - f(x)}{h}$ is called the difference quotient. It measures the average rate of change of $f$ between $x$ and $x+h$ (the interval $[x, x+h]$ when $h > 0$).

The Key Insight: Let h Approach Zero

Now imagine what happens as we make $h$ smaller and smaller:

Point Q slides along the curve toward point P
The secant line rotates, getting closer to the tangent line at P
The difference quotient approaches the slope of the tangent line

If we take the limit as $h \to 0$, we get the instantaneous rate of change, which is the derivative:

f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}

The Formal Definition: Limit of the Difference Quotient

Definition: The Derivative

The derivative of a function $f$ at a point $x$ is defined as:

f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}

provided this limit exists. The process of finding the derivative is called differentiation.

Dissecting the Definition

Let's understand each component of this definition:

Symbol	Name	Meaning
f(x)	Function value at x	The y-coordinate at the starting point
f(x+h)	Function value at x+h	The y-coordinate at a nearby point
h	Increment	The horizontal distance between the two points
f(x+h) - f(x)	Rise (Δy)	The vertical change between the two points
[f(x+h)-f(x)]/h	Difference quotient	The slope of the secant line
lim_{h→0}	Limit	What the expression approaches as h gets arbitrarily small

Equivalent Forms of the Definition

The definition can be written in several equivalent ways. Each illuminates a different aspect:

Standard Form (h-notation)

f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}

Best for: Computing derivatives from the definition

Point Form (at a specific point a)

f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}

Best for: Understanding the limit geometrically

Leibniz Notation

\frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}

Best for: Chain rule and implicit differentiation
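The standard form and the point form are the same limit under a change of variable. A short derivation, writing the moving point as $x = a + h$:

```latex
\begin{aligned}
f'(a) &= \lim_{x \to a} \frac{f(x) - f(a)}{x - a}
  && \text{(point form)} \\
&= \lim_{h \to 0} \frac{f(a + h) - f(a)}{(a + h) - a}
  && \text{(substitute } x = a + h\text{, so } x \to a \iff h \to 0\text{)} \\
&= \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}
  && \text{(standard form evaluated at } a\text{)}
\end{aligned}
```

The Leibniz form is the standard form again with $h$ renamed to $\Delta x$ and $f(x + \Delta x) - f(x)$ renamed to $\Delta y$.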

Interactive Visualization: Watching the Limit

The visualization below shows the difference quotient in action. As you decrease $h$, watch how:

The secant line (blue) rotates toward the tangent line (green dashed)
The difference quotient converges to the exact derivative value
The approximation error shrinks toward zero

[Interactive visualization: The Limit of the Difference Quotient. The reader selects a function, a point x, and an increment h, and the widget shows the secant line (blue) approaching the tangent line (green) as h → 0, together with the secant slope [f(x+h) - f(x)] / h, the true derivative, and the approximation error.]

Example state: for f(x) = x² at x = 1 with h = 1, the difference quotient is (4 - 1) / 1 = 3, the true derivative is f'(1) = 2, and |error| = 1.

As h → 0, the error → 0, and the difference quotient converges to the true derivative.

Convergence Analysis

h	[f(x+h) - f(x)] / h	Error
1	3.00000000	1.00000000
0.1	2.10000000	0.10000000
0.01	2.01000000	0.01000000
0.001	2.00100000	0.00100000
0.0001	2.00010000	0.00010000
h → 0	2.00000000	0

Understanding the Limit Process

The limit $\lim_{h \to 0}$ doesn't mean we substitute $h = 0$. That would give $\frac{0}{0}$, which is undefined. Instead, the limit asks:

"What value does the expression approach as h gets closer and closer to 0?"

The Epsilon-Delta Intuition

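Formally, the limit exists and equals $L$ if, for every tolerance $\varepsilon > 0$, there is a $\delta > 0$ such that the difference quotient is within $\varepsilon$ of $L$ whenever $0 < |h| < \delta$. In words: we can make the difference quotient arbitrarily close to $L$ by making $h$ sufficiently small (but not zero).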

Think of it this way:

Challenge: "I want the difference quotient within 0.001 of the derivative"
Response: "Use any h with |h| < 0.0001 and you'll get there"
Challenge: "I want it within 0.0000001 of the derivative"
Response: "Use any h with |h| < 0.00000001 and you'll get there"

No matter how close you demand, there's always a small enough h that works. That's what it means for the limit to exist.
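The challenge-and-response game can be played in code. This is a sketch for one specific case, f(x) = x² at x = 1, where the difference quotient simplifies to 2 + h and so δ = ε is a valid response; the function names are illustrative, and other functions would need their own choice of δ.

```python
def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def square(x):
    return x * x


def respond(epsilon):
    """Return a delta that meets the challenge epsilon for x**2 at x = 1.

    For this f the quotient equals 2 + h, so the error is |h| and
    delta = epsilon is enough. This choice is specific to the example.
    """
    return epsilon


challenges = [1e-1, 1e-3, 1e-6]
results = []
for eps in challenges:
    delta = respond(eps)
    # Sample nonzero h on both sides of 0 with 0 < |h| < delta.
    samples = [sign * delta * frac for sign in (-1, 1) for frac in (0.9, 0.5, 0.1)]
    worst = max(abs(difference_quotient(square, 1.0, h) - 2.0) for h in samples)
    results.append((eps, delta, worst))
    print(f"epsilon = {eps:g}: delta = {delta:g}, worst sampled error = {worst:.3g}")
```

Sampling a few h values is only a spot check, not a proof; the proof is the algebra showing the error equals |h|.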

When Does the Limit Fail to Exist?

The derivative may not exist at points where:

Case	Example	What Happens
Corner/Cusp	f(x) = |x| at x = 0	Left and right limits differ: lim_{h→0⁻} = -1 but lim_{h→0⁺} = +1
Vertical tangent	f(x) = ∛x at x = 0	Difference quotient grows without bound (→ +∞)
Discontinuity	Step function at jump	Function not continuous, so not differentiable
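The three failure cases in the table can be observed numerically. The sketch below uses |x|, the real cube root, and a unit step as stand-ins; the helper names and the value of the step at 0 are assumptions made for illustration.

```python
import math


def difference_quotient(f, x, h):
    return (f(x + h) - f(x)) / h


def cube_root(x):
    """Real cube root, valid for negative inputs too."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def step(x):
    """Unit step with a jump at 0 (the value 1 at x = 0 is an arbitrary choice)."""
    return 1.0 if x >= 0 else 0.0


for h in [1e-1, 1e-3, 1e-6]:
    corner_right = difference_quotient(abs, 0.0, h)    # stays at +1
    corner_left = difference_quotient(abs, 0.0, -h)    # stays at -1
    vertical = difference_quotient(cube_root, 0.0, h)  # grows like h**(-2/3)
    jump_left = difference_quotient(step, 0.0, -h)     # grows like 1/h
    print(f"h = {h:g}: |x| right {corner_right:+.1f}, left {corner_left:+.1f}; "
          f"cbrt {vertical:.3g}; step (from left) {jump_left:.3g}")
```

In the first case the one-sided values disagree; in the other two the quotient never settles on a finite number.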

Computing Derivatives from the Definition

Let's work through several examples to see how the limit process eliminates the 0/0 problem and yields the derivative.

Example 1: f(x) = x² (The Parabola)

Step 1: Write the difference quotient

\frac{f(x+h) - f(x)}{h} = \frac{(x+h)^2 - x^2}{h}

Step 2: Expand the numerator

= \frac{x^2 + 2xh + h^2 - x^2}{h} = \frac{2xh + h^2}{h}

Step 3: Factor out h (this is the key step!)

= \frac{h(2x + h)}{h} = 2x + h

Step 4: Take the limit

f'(x) = \lim_{h \to 0} (2x + h) = 2x

Result:

\frac{d}{dx}[x^2] = 2x

Example 2: f(x) = 1/x (The Hyperbola)

Step 1: Write the difference quotient

\frac{f(x+h) - f(x)}{h} = \frac{\frac{1}{x+h} - \frac{1}{x}}{h}

Step 2: Find common denominator in numerator

= \frac{\frac{x - (x+h)}{x(x+h)}}{h} = \frac{-h}{h \cdot x(x+h)}

Step 3: Cancel h

= \frac{-1}{x(x+h)}

Step 4: Take the limit

f'(x) = \lim_{h \to 0} \frac{-1}{x(x+h)} = \frac{-1}{x^2}

Result:

\frac{d}{dx}\left[\frac{1}{x}\right] = -\frac{1}{x^2}

Example 3: f(x) = sin(x) at x = 0

Setup: The difference quotient at x = 0

f'(0) = \lim_{h \to 0} \frac{\sin(0+h) - \sin(0)}{h} = \lim_{h \to 0} \frac{\sin(h)}{h}

This is the famous limit $\lim_{h \to 0} \frac{\sin h}{h} = 1$

Result:

\frac{d}{dx}[\sin x]\bigg|_{x=0} = \cos(0) = 1
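Example 3 quotes the limit of sin(h)/h without proving it. The usual justification is a squeeze argument based on the inequality cos(h) < sin(h)/h < 1 for 0 < h < π/2. The sketch below only spot-checks that inequality at a few points; it is a numerical illustration, not a proof.

```python
import math

rows = []
for h in [1.0, 0.1, 0.01, 0.001]:
    ratio = math.sin(h) / h
    lower = math.cos(h)  # lower bound in the squeeze inequality
    rows.append((h, lower, ratio))
    print(f"h = {h:<6} cos(h) = {lower:.8f}  sin(h)/h = {ratio:.8f}  upper bound = 1")
```

Since cos(h) → 1 as h → 0, the ratio is trapped between two quantities that both approach 1.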

The Pattern

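The key algebraic step in Examples 1 and 2 is to manipulate the numerator so that a factor of h appears and cancels with the h in the denominator. This removes the 0/0 indeterminate form and reveals the derivative. When no such cancellation is available, as with sin(x) in Example 3, the limit has to be evaluated by other means, such as the known limit $\lim_{h \to 0} \frac{\sin h}{h} = 1$.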
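The cancellations in Examples 1 and 2 can be checked with exact rational arithmetic, so floating-point round-off plays no part. This is a sketch using Python's standard `fractions` module; the sample point x = 3 and the h values are arbitrary choices.

```python
from fractions import Fraction


def difference_quotient(f, x, h):
    return (f(x + h) - f(x)) / h


x = Fraction(3)
for h in [Fraction(1, 10), Fraction(-1, 1000), Fraction(1, 10**9)]:
    # Example 1: the quotient for x**2 equals 2x + h exactly.
    assert difference_quotient(lambda t: t**2, x, h) == 2 * x + h
    # Example 2: the quotient for 1/x equals -1 / (x (x + h)) exactly.
    assert difference_quotient(lambda t: 1 / t, x, h) == -1 / (x * (x + h))

print("Simplified forms match the raw difference quotient for every sampled h.")
```

Once the factor of h is gone, setting h to 0 in the simplified expressions is harmless, which is why the limits 2x and -1/x² can be read off directly.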

Real-World Applications

The limit definition of the derivative appears throughout science and engineering whenever we need to understand instantaneous rates of change.

Physics: Instantaneous Velocity and Acceleration

If $s(t)$ is the position of an object at time $t$:

Velocity (instantaneous rate of position change):

v(t) = \lim_{\Delta t \to 0} \frac{s(t + \Delta t) - s(t)}{\Delta t} = s'(t)

Acceleration (instantaneous rate of velocity change):

a(t) = \lim_{\Delta t \to 0} \frac{v(t + \Delta t) - v(t)}{\Delta t} = v'(t) = s''(t)
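As a sketch of these two formulas, the code below estimates velocity and acceleration with a small but nonzero Δt. The position function s(t) = 20t - 4.9t² is a hypothetical example (not from the text), for which the exact answers are v(t) = 20 - 9.8t and a(t) = -9.8.

```python
def position(t):
    """Hypothetical height in metres: s(t) = 20 t - 4.9 t^2 (assumed for illustration)."""
    return 20.0 * t - 4.9 * t**2


def velocity(t, dt=1e-4):
    """Difference-quotient estimate of s'(t)."""
    return (position(t + dt) - position(t)) / dt


def acceleration(t, dt=1e-4):
    """Difference-quotient estimate of v'(t), built on the velocity estimate."""
    return (velocity(t + dt) - velocity(t)) / dt


v2 = velocity(2.0)      # exact value: 20 - 9.8 * 2 = 0.4
a2 = acceleration(2.0)  # exact value: -9.8

print(f"v(2) is approximately {v2:.4f} m/s")
print(f"a(2) is approximately {a2:.4f} m/s^2")
```

The acceleration estimate applies the difference quotient twice, mirroring a(t) = v'(t) = s''(t).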

Economics: Marginal Analysis

If $C(q)$ is the total cost of producing $q$ units:

Marginal Cost = cost of producing one more unit:

MC(q) = \lim_{\Delta q \to 0} \frac{C(q + \Delta q) - C(q)}{\Delta q} = C'(q)

The marginal cost at $q = 100$ tells you approximately how much it costs to produce the 101st unit.
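A small sketch makes the "approximately" concrete. The cost function C(q) = 500 + 10q + 0.02q² is a made-up example, chosen so that C'(q) = 10 + 0.04q is easy to verify by hand.

```python
def cost(q):
    """Hypothetical total cost in dollars (assumed for illustration only)."""
    return 500.0 + 10.0 * q + 0.02 * q**2


def marginal_cost(q, dq=1e-6):
    """Difference-quotient estimate of C'(q) with a small but nonzero dq."""
    return (cost(q + dq) - cost(q)) / dq


mc_100 = marginal_cost(100.0)              # close to C'(100) = 10 + 0.04 * 100 = 14
cost_of_101st = cost(101.0) - cost(100.0)  # exact extra cost of one more unit

print(f"marginal cost at q = 100: {mc_100:.4f}")
print(f"cost of the 101st unit:   {cost_of_101st:.4f}")
```

The two numbers differ slightly (14 versus 14.02 here) because the cost of the 101st unit is a difference quotient with Δq = 1, while the marginal cost is the limit as Δq → 0.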

Biology: Population Growth Rate

If $P(t)$ is the population at time $t$:

Instantaneous growth rate:

\frac{dP}{dt} = \lim_{\Delta t \to 0} \frac{P(t + \Delta t) - P(t)}{\Delta t}

This tells us how fast the population is growing at each moment, not just the average growth over some period.
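The contrast between instantaneous and average growth can be sketched with a hypothetical exponential model P(t) = 1000·e^{0.03t} (the initial size and the rate are assumed values), for which the exact rate is dP/dt = 0.03·P(t).

```python
import math


def population(t):
    """Hypothetical population model: 1000 individuals growing at 3% per unit time."""
    return 1000.0 * math.exp(0.03 * t)


def growth_rate(P, t, dt=1e-6):
    """Difference-quotient estimate of dP/dt at time t."""
    return (P(t + dt) - P(t)) / dt


instantaneous = growth_rate(population, 10.0)
average = (population(10.0) - population(0.0)) / 10.0

print(f"instantaneous rate at t = 10:    {instantaneous:.3f} per unit time")
print(f"average rate over [0, 10]:       {average:.3f} per unit time")
print(f"exact rate 0.03 * P(10):         {0.03 * population(10.0):.3f}")
```

Because growth accelerates in this model, the rate at t = 10 is larger than the average over the preceding interval.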

Machine Learning Connection

The limit definition of the derivative is fundamental to how machine learning works. Every time a neural network "learns," it's using derivatives.

Gradient Descent: The Heart of Deep Learning

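Training a neural network means finding parameters $\theta$ that minimize a loss function $L(\theta)$. Gradient descent does this by repeatedly applying the update:

\theta_{\text{new}} = \theta_{\text{old}} - \eta \cdot \frac{\partial L}{\partial \theta}

where $\eta > 0$ is the learning rate (step size). The partial derivative $\frac{\partial L}{\partial \theta}$ is defined by the same limit. For a single scalar parameter:

\frac{\partial L}{\partial \theta} = \lim_{h \to 0} \frac{L(\theta + h) - L(\theta)}{h}

With many parameters, the same limit is taken in one parameter at a time while the others are held fixed.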

Modern deep learning frameworks (PyTorch, TensorFlow) use automatic differentiation to compute these derivatives efficiently, but the mathematical foundation is exactly the limit definition.
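To connect the update rule to the limit definition, here is a toy sketch that runs gradient descent on a one-parameter loss, estimating the derivative with a difference quotient. The loss L(θ) = (θ - 3)², the learning rate, and the step count are all assumptions for illustration; real frameworks obtain the gradient by automatic differentiation rather than this way.

```python
def loss(theta):
    """Hypothetical one-parameter loss with its minimum at theta = 3."""
    return (theta - 3.0) ** 2


def numerical_gradient(f, theta, h=1e-6):
    """Difference-quotient estimate of dL/dtheta with a small but nonzero h."""
    return (f(theta + h) - f(theta)) / h


theta = 0.0  # starting guess
eta = 0.1    # learning rate

for _ in range(100):
    # theta_new = theta_old - eta * dL/dtheta
    theta = theta - eta * numerical_gradient(loss, theta)

print(f"theta after 100 steps: {theta:.6f}")
```

The result lands very close to the minimizer 3; the small remaining offset comes from using a finite h instead of the limit.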

Numerical Gradients for Debugging

In practice, we often apply the limit definition with a small but finite $h$ to verify that automatic differentiation is working correctly. This is called a gradient check:

Numerical gradient (a symmetric variant of the definition):

\frac{\partial L}{\partial \theta} \approx \frac{L(\theta + h) - L(\theta - h)}{2h}

This "centered difference" formula is more accurate than the one-sided version, with error $O(h^2)$ instead of $O(h)$.
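A gradient check can be sketched in a few lines. In this example a hand-derived gradient stands in for the output of automatic differentiation, and the two-parameter loss, the helper names, and the choice of relative-error measure are assumptions made for illustration.

```python
def loss(theta):
    """Hypothetical loss of two parameters (chosen only for illustration)."""
    a, b = theta
    return (a - 1.0) ** 2 + 3.0 * a * b + b ** 4


def analytic_gradient(theta):
    """Hand-derived gradient of loss, standing in for an autodiff result."""
    a, b = theta
    return [2.0 * (a - 1.0) + 3.0 * b, 3.0 * a + 4.0 * b ** 3]


def numerical_gradient(f, theta, h=1e-5):
    """Centered-difference estimate, perturbing one parameter at a time."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += h
        minus[i] -= h
        grad.append((f(plus) - f(minus)) / (2.0 * h))
    return grad


def relative_error(g1, g2):
    """Norm of the difference divided by the sum of the norms."""
    num = sum((x - y) ** 2 for x, y in zip(g1, g2)) ** 0.5
    den = sum(x * x for x in g1) ** 0.5 + sum(y * y for y in g2) ** 0.5
    return num / den


theta = [0.5, -1.2]
err = relative_error(analytic_gradient(theta), numerical_gradient(loss, theta))
print(f"relative error: {err:.2e}")
```

A very small relative error suggests the two gradients agree; a large one usually points to a bug in the analytic (or autodiff) side.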

Python Implementation

Investigating the Limit Numerically

The following code demonstrates how the difference quotient converges to the derivative as $h \to 0$:

File: limit_investigation.py

Notes on the code:

The Difference Quotient: difference_quotient is the exact formula from the derivative definition, [f(x+h) - f(x)] / h. It computes the slope of the secant line between (x, f(x)) and (x+h, f(x+h)).

Investigating the Limit: investigate_limit computes the difference quotient for progressively smaller values of h to observe how it converges to the true derivative. This is what "taking the limit" means computationally.

Sequence of h Values: Each h in h_values is smaller than the previous one. As h → 0, we expect the difference quotient to approach f'(x). The error column shows how close we get.

Observing Convergence: For each h, we compute the approximation and its error. Notice how the error shrinks by roughly a factor of 10 each time h shrinks by a factor of 10. This is first-order convergence.

Testing on x²: For f(x) = x², the derivative is f'(x) = 2x. At x = 3, f'(3) = 6. The difference quotient should converge to 6 as h → 0.

The Famous sin(h)/h Limit: For f(x) = sin(x) at x = 0, the difference quotient becomes sin(h)/h. This famous limit equals 1, confirming that f'(0) = cos(0) = 1.

import numpy as np

def difference_quotient(f, x, h):
    """
    Computes the difference quotient: [f(x+h) - f(x)] / h
    This approximates the derivative as h → 0.
    """
    return (f(x + h) - f(x)) / h

def investigate_limit(f, x, df_exact):
    """
    Shows how the difference quotient converges to the derivative
    as h approaches 0 from the right (every h used here is positive).
    """
    h_values = [1, 0.5, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001]

    print(f"Investigating f'({x}) using limit definition")
    print(f"True derivative: {df_exact}")
    print("-" * 55)
    print(f"{'h':>12} | {'[f(x+h)-f(x)]/h':>18} | {'Error':>15}")
    print("-" * 55)

    for h in h_values:
        approx = difference_quotient(f, x, h)
        error = abs(approx - df_exact)
        print(f"{h:>12.6f} | {approx:>18.10f} | {error:>15.2e}")

    print("-" * 55)
    print(f"{'h → 0':>12} | {df_exact:>18.10f} | {'0':>15}")

# Example 1: f(x) = x² at x = 3
# Derivative: f'(x) = 2x, so f'(3) = 6
print("=" * 60)
print("Example 1: f(x) = x² at x = 3")
print("=" * 60)
investigate_limit(lambda x: x**2, x=3, df_exact=6)

# Example 2: f(x) = sin(x) at x = 0
# Derivative: f'(x) = cos(x), so f'(0) = 1
print("\n" + "=" * 60)
print("Example 2: f(x) = sin(x) at x = 0")
print("=" * 60)
investigate_limit(np.sin, x=0, df_exact=1)

# Example 3: f(x) = e^x at x = 1
# Derivative: f'(x) = e^x, so f'(1) = e
print("\n" + "=" * 60)
print("Example 3: f(x) = eˣ at x = 1")
print("=" * 60)
investigate_limit(np.exp, x=1, df_exact=np.e)

Visualizing the Geometric Meaning

This code creates a visual demonstration of how secant lines approach the tangent line:

File: secant_to_tangent.py

Notes on the code:

Geometric Visualization: visualize_secant_to_tangent creates two plots: one showing secant lines approaching the tangent geometrically, and another showing the numerical convergence of the difference quotient.

Multiple Secant Lines: We draw secant lines for different h values, colored from light to dark blue. As h decreases, the secant lines rotate toward the tangent line (green dashed).

The Tangent Line: The tangent line uses the exact derivative f'(x₀) as its slope. This is what the secant lines converge to, and it is the visual meaning of the limit.

Logarithmic Scale: Using a log scale for h shows convergence behavior across many orders of magnitude. As h → 0 (left on the plot), the difference quotient approaches the derivative value.

import numpy as np
import matplotlib.pyplot as plt

def visualize_secant_to_tangent(f, df, x0, title=""):
    """
    Visualizes how secant lines approach the tangent line
    as h → 0. This is the geometric meaning of the derivative.
    """
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))

    # Left plot: Multiple secant lines approaching tangent
    ax1 = axes[0]
    x = np.linspace(x0 - 2, x0 + 2, 200)
    y = f(x)

    ax1.plot(x, y, 'k-', linewidth=2.5, label='f(x)')

    # Plot secant lines for different h values (light to dark blue)
    h_values = [1.5, 1.0, 0.5, 0.2, 0.05]
    colors = plt.cm.Blues(np.linspace(0.3, 0.9, len(h_values)))

    for h, color in zip(h_values, colors):
        # Secant line through (x0, f(x0)) and (x0+h, f(x0+h))
        slope = (f(x0 + h) - f(x0)) / h
        y_secant = f(x0) + slope * (x - x0)
        ax1.plot(x, y_secant, '-', color=color, linewidth=1.5,
                 alpha=0.7, label=f'h = {h}')
        ax1.plot(x0 + h, f(x0 + h), 'o', color=color, markersize=6)

    # Plot tangent line (the limit), using the exact derivative as slope.
    # Color and line style are passed as keywords only, so they do not
    # conflict with a format string.
    slope_tangent = df(x0)
    y_tangent = f(x0) + slope_tangent * (x - x0)
    ax1.plot(x, y_tangent, color='g', linestyle='--', linewidth=2.5,
             label="Tangent (h→0)")

    # Mark the point
    ax1.plot(x0, f(x0), 'ro', markersize=10, zorder=5)
    ax1.annotate(f'P = ({x0}, {f(x0):.2f})', (x0, f(x0)),
                 xytext=(x0 - 0.5, f(x0) + 0.5),
                 fontsize=10, ha='right')

    ax1.set_xlim(x0 - 2, x0 + 2)
    ax1.set_ylim(y.min() - 1, y.max() + 1)
    ax1.set_xlabel('x', fontsize=12)
    ax1.set_ylabel('y', fontsize=12)
    ax1.set_title('Secant Lines → Tangent Line', fontsize=14)
    ax1.legend(loc='upper left', fontsize=9)
    ax1.grid(True, alpha=0.3)
    ax1.axhline(y=0, color='gray', linewidth=0.5)
    ax1.axvline(x=0, color='gray', linewidth=0.5)

    # Right plot: Convergence of difference quotient (log scale in h)
    ax2 = axes[1]
    h_vals = np.logspace(-6, 1, 50)
    dq_vals = [(f(x0 + h) - f(x0)) / h for h in h_vals]

    ax2.semilogx(h_vals, dq_vals, 'b-', linewidth=2,
                 label='Difference quotient')
    ax2.axhline(y=df(x0), color='g', linestyle='--', linewidth=2,
                label=f"f'({x0}) = {df(x0):.4f}")

    ax2.set_xlabel('h (log scale)', fontsize=12)
    ax2.set_ylabel('[f(x+h) - f(x)] / h', fontsize=12)
    ax2.set_title('Convergence to Derivative', fontsize=14)
    ax2.legend(loc='upper right')
    ax2.grid(True, alpha=0.3)
    ax2.set_xlim(1e-6, 10)

    plt.suptitle(title, fontsize=14, y=1.02)
    plt.tight_layout()
    plt.show()

# Visualize for f(x) = x² at x = 2
visualize_secant_to_tangent(
    f=lambda x: x**2,
    df=lambda x: 2*x,
    x0=2,
    title="The Derivative as a Limit: f(x) = x²"
)

Numerical Differentiation: Approximating Derivatives

When we can't find derivatives symbolically, we use numerical approximations based directly on the limit definition.

Three Common Formulas

Method	Formula	Error Order
Forward difference	f'(x) ≈ [f(x+h) - f(x)] / h	O(h)
Backward difference	f'(x) ≈ [f(x) - f(x-h)] / h	O(h)
Central difference	f'(x) ≈ [f(x+h) - f(x-h)] / (2h)	O(h²)

The central difference is more accurate because errors from the left and right partially cancel. This is why gradient checking in neural networks uses the central difference formula.
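The cancellation can be made explicit with Taylor expansions. The following derivation is a sketch that assumes $f$ is smooth enough near $x$ (at least four continuous derivatives):

```latex
\begin{aligned}
f(x+h) &= f(x) + h f'(x) + \tfrac{h^2}{2} f''(x) + \tfrac{h^3}{6} f'''(x) + O(h^4) \\
f(x-h) &= f(x) - h f'(x) + \tfrac{h^2}{2} f''(x) - \tfrac{h^3}{6} f'''(x) + O(h^4) \\[4pt]
\frac{f(x+h) - f(x)}{h} &= f'(x) + \tfrac{h}{2} f''(x) + O(h^2)
  && \text{(forward: error } O(h)\text{)} \\
\frac{f(x+h) - f(x-h)}{2h} &= f'(x) + \tfrac{h^2}{6} f'''(x) + O(h^3)
  && \text{(central: error } O(h^2)\text{)}
\end{aligned}
```

Subtracting the two expansions removes the $f(x)$ and $f''(x)$ terms, so the leading error of the central difference is proportional to $h^2$ rather than $h$.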

Choosing h: The Trade-off

h too large

Poor approximation — the secant line is far from the tangent. Truncation error dominates.

h too small

Numerical instability — subtracting nearly equal numbers causes round-off error to dominate.

Practical Guidance

For 64-bit floating point, $h \approx 10^{-8}$ (forward difference) or $h \approx 10^{-5}$ (central difference) often works well. When in doubt, try different h values and watch for convergence.
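The trade-off is easy to observe by sweeping h over many orders of magnitude. The sketch below uses the forward difference on f(x) = eˣ at x = 1 (an arbitrary test case whose exact derivative is known) and records the absolute error for each h.

```python
import math


def forward_error(h, x=1.0):
    """Absolute error of the forward difference for exp at x (true derivative: exp(x))."""
    approx = (math.exp(x + h) - math.exp(x)) / h
    return abs(approx - math.exp(x))


errors = {}
for k in range(1, 16):
    h = 10.0 ** (-k)
    errors[k] = forward_error(h)
    print(f"h = 1e-{k:02d}  error = {errors[k]:.3e}")
```

The error first falls as h shrinks (truncation error decreasing) and then rises again (round-off error growing), with the smallest values typically appearing near h ≈ 1e-8 on 64-bit floats.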

Common Mistakes to Avoid

Mistake 1: Setting h = 0 directly

Wrong: "Let h = 0, so we get [f(x) - f(x)] / 0 = 0/0... undefined."

Correct: We never substitute h = 0. We take the limit as h approaches 0, which is a fundamentally different operation.

Mistake 2: Forgetting to simplify before taking the limit

Wrong: "The numerator (x+h)² - x² goes to 0 as h → 0, so the derivative is 0."

Correct: The numerator and the denominator both go to 0, so the quotient has to be simplified first. Cancel the common factor of h from the numerator and denominator, then take the limit of what remains.

Mistake 3: Confusing average and instantaneous rate

The difference quotient [f(x+h) - f(x)] / h is an average rate of change over interval [x, x+h].

The derivative f'(x) is the instantaneous rate of change at the single point x.

Mistake 4: Assuming the limit always exists

Not every function has a derivative at every point. Corners (like $|x|$ at 0), cusps, vertical tangents, and discontinuities are places where the derivative doesn't exist.

Test Your Understanding


What does the difference quotient [f(x+h) - f(x)] / h represent geometrically?


Summary

The derivative is defined as the limit of the difference quotient, which captures the idea of instantaneous rate of change without ever dividing by zero.

The Central Equation

f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}

Key Concepts

Concept	Description
Difference Quotient	The ratio [f(x+h) - f(x)] / h — slope of the secant line
Secant Line	Line through two points on a curve
Tangent Line	Line through (x, f(x)) with slope f'(x); the limiting position of the secant lines
The Limit	What the difference quotient approaches as h → 0
Derivative f'(x)	The limit of the difference quotient; instantaneous rate of change

Key Takeaways

The derivative captures instantaneous rate of change by taking a limit of average rates of change
Geometrically, the derivative is the slope of the tangent line, which is the limit of secant line slopes
The limit process avoids the 0/0 problem by examining what happens as we approach h = 0, not at h = 0
Computing derivatives from the definition requires algebraic simplification to cancel the h in the denominator
Numerical differentiation approximates derivatives using small but nonzero h values
This definition is the foundation of machine learning — every gradient computation ultimately rests on this limit

The Philosophical Core:

"We cannot capture 'the instant' by stopping time. Instead, we capture it by examining what happens in the limit of ever-smaller intervals — approaching but never reaching the moment itself."

Coming Next: In the next section, we'll explore differentiability — when does the derivative exist, and what can go wrong at corners, cusps, and discontinuities?