Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section you will be able to:

State what makes a function linear and recognise the form $y = m\,x + b$.
Compute slope from any two points using $m = \dfrac{\Delta y}{\Delta x}$ — and explain why the answer never depends on which two points you pick.
Translate between three forms of a line: slope–intercept, point–slope, and two-point.
Generalise the slope formula to the average rate of change of any function: the secant slope $\dfrac{f(b)-f(a)}{b-a}$.
See how the secant rotates onto the tangent as the two points collapse — the first taste of the derivative.
Implement all of the above in plain Python and then in PyTorch, and confirm the limit numerically against autograd.

The Big Picture: Constant Rate of Change

A linear function is a function whose rate of change never changes. Pick any starting point, walk any distance to the right, and the function will always step up (or down) by the same multiple of how far you walked.

Imagine you are filling a bathtub from a tap that pours at exactly the same speed every second. After one second the water is up by some amount; after two seconds it is up by twice that amount; after three seconds, three times. The water level is a linear function of time. The story is boring in the best possible way — every second is exactly like the last.

The intuition in one sentence

A linear function is what you get when nothing about the rate ever changes. Everything else we will study in this book — curves, growth, decay, motion — is what happens when the rate does change. Linear is the baseline.

Why is this the right place to start a calculus book? Because all of differential calculus is built on a single dream: zoom in close enough to any smooth curve and it looks like a straight line. The slope of that local straight line is what we will eventually call the derivative. So before we earn the right to study curves, we must master the lines that approximate them.

What is a Linear Function?

A linear function is a function of the form

$$f(x) = m\,x + b$$

where $m$ and $b$ are fixed real numbers. The graph is a straight line in the $xy$-plane.

Symbol	Name	Geometric meaning
x	Input	Horizontal coordinate of any point on the line.
f(x), or y	Output	Vertical coordinate. Always equal to m·x + b.
m	Slope	How much y changes when x changes by 1. Steepness.
b	y-intercept	The value of y when x = 0. Where the line hits the y-axis.

What is NOT a linear function (in this book)

You will sometimes hear that $y = 3x^2 + 7$ or $y = \sin(x)$ are "non-linear". They are. In calculus, the term linear function is reserved strictly for the form $y = m\,x + b$: first power of $x$, no squares, no logs, no sines. (In linear algebra the word is stricter still: only $y = m\,x$ counts as "linear", because a non-zero $b$ breaks additivity, and $y = m\,x + b$ is called affine. The calculus convention is the one above.)
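The additivity remark can be checked numerically. The sketch below is only an illustration: the values of `m`, `b`, `u` and `v` and the function names `affine` and `homogeneous` are made up for the demonstration and do not come from the text.

```python
# Hypothetical example values; any non-zero b shows the same effect.
m, b = 3.0, 7.0

def affine(x):
    """Calculus-style 'linear' function y = m*x + b."""
    return m * x + b

def homogeneous(x):
    """Linear in the algebraic sense: y = m*x, no constant term."""
    return m * x

u, v = 2.0, 5.0

# Additivity asks whether f(u + v) equals f(u) + f(v).
print(homogeneous(u + v), homogeneous(u) + homogeneous(v))  # 21.0 21.0
print(affine(u + v), affine(u) + affine(v))                  # 28.0 35.0
```

The second pair differs by exactly `b`, because the constant is counted once on the left and twice on the right.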

Slope: The Soul of a Linear Function

Take any two points on a line, call them $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ . The slope of the line is

$$m = \dfrac{y_2 - y_1}{x_2 - x_1} = \dfrac{\Delta y}{\Delta x}$$

The symbol $\Delta$ (Greek capital delta) is the standard mathematical notation for "the change in". $\Delta y$ means "the change in y", $\Delta x$ means "the change in x". The slope is rise over run.

The single most important property

For a linear function the ratio $\Delta y / \Delta x$ gives the same number no matter which two points you pick. That is what "constant rate of change" means in symbols.

Why is the slope constant? A short proof you can do in your head

Take any two points $(x_1, y_1)$ and $(x_2, y_2)$ on the line $y = m\,x + b$, with $x_1 \neq x_2$. Then by definition $y_1 = m\,x_1 + b$ and $y_2 = m\,x_2 + b$. Subtract:

$$y_2 - y_1 = (m\,x_2 + b) - (m\,x_1 + b) = m\,(x_2 - x_1)$$

The two $b$'s cancel. Divide both sides by $x_2 - x_1$ (non-zero, because the points are distinct) and you get $(y_2 - y_1)/(x_2 - x_1) = m$. The slope formula always returns $m$, independent of which two points you picked. This is the algebraic shadow of the geometric fact that the line is perfectly straight.

Interactive Explorer: y = m·x + b

Drag the two coloured points along the line. The dashed amber leg is $\Delta x$ and the dashed red leg is $\Delta y$. Watch the readout in the side panel: $\Delta y / \Delta x$ never moves from $m$, no matter where you drag.

[Interactive: linear function explorer]

Make $\Delta x$ huge, then tiny — the ratio does not budge.
Drag $P_1$ to the same place as $P_2$: the formula goes to $0/0$ (undefined). To talk about slope you must always pick two different points.
Slide $m$ to $0$: the line is horizontal ("y never changes when x changes").
Set $m$ negative: the line slopes down ("y decreases as x increases").

Worked Example — Pizza Delivery by Hand

Let's leave the abstract and ground everything in a realistic scenario. You deliver pizzas. Your pay model is simple: a flat $\$15$ daily stipend plus $\$3$ per delivery.

Let $x$ be the number of deliveries on a given day and $y$ the dollars you take home. Try to derive the linear law, slope, intercept, and predict your earnings — all by hand — before opening the answer below.

Step-by-step solution

Step 1 — Tabulate the first few values. With $x = 0$ deliveries you still get the stipend, so $y = 15$ . With $x = 1$ : $y = 15 + 3 = 18$ . With $x = 2$ : $y = 15 + 6 = 21$ . With $x = 10$ : $y = 15 + 30 = 45$ .

Step 2 — Find the slope from any two points. Pick $(0, 15)$ and $(10, 45)$ :

$$m = \dfrac{45 - 15}{10 - 0} = \dfrac{30}{10} = 3$$

Pick $(1, 18)$ and $(2, 21)$ instead:

$$m = \dfrac{21 - 18}{2 - 1} = \dfrac{3}{1} = 3$$

Same answer. Pick $(2, 21)$ and $(10, 45)$ :

$$m = \dfrac{45 - 21}{10 - 2} = \dfrac{24}{8} = 3$$

Still $3$. The slope is $\$3$ per delivery, which matches the wage rule we started with.

Step 3 — Find the y-intercept. From the table at $x = 0$ : $b = 15$ . So the linear law is

$$y = 3\,x + 15$$

Step 4 — Predict. If you do $x = 23$ deliveries:

$$y = 3(23) + 15 = 69 + 15 = 84$$

Step 5 — Reverse the question. If you want to take home $y = 99$ dollars, how many deliveries? Solve $99 = 3x + 15 \Rightarrow x = 28$ .

Step 6 — Sanity check the units. $m$ has units of dollars per delivery; $b$ has units of dollars. Adding them only makes sense because $m \cdot x$ first multiplies dollars/delivery by deliveries, leaving dollars. Always think in units — it catches half of all algebra bugs.
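The hand calculation above can be replayed in a few lines of Python. This is a minimal sketch: the names `STIPEND`, `PER_DELIVERY`, `pay` and `deliveries_needed` are chosen here for illustration, while the numbers are the ones from the worked example.

```python
STIPEND = 15       # dollars per day: the intercept b
PER_DELIVERY = 3   # dollars per delivery: the slope m

def pay(deliveries):
    """Dollars taken home for a given number of deliveries: y = m*x + b."""
    return PER_DELIVERY * deliveries + STIPEND

def deliveries_needed(target_pay):
    """Invert the linear law: x = (y - b) / m."""
    return (target_pay - STIPEND) / PER_DELIVERY

print(pay(0))                  # 15
print(pay(23))                 # 84
print(deliveries_needed(99))   # 28.0
```

Note how the units argument from Step 6 shows up in the inverse: subtracting the stipend leaves dollars, and dividing by dollars per delivery leaves deliveries.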

The y-Intercept b: Where the Line Starts

The constant $b$ in $y = m\,x + b$ answers one simple question: what is y when x is zero? Plug it in: $y(0) = m \cdot 0 + b = b$ .

Geometrically, $b$ is the height at which the line crosses the vertical axis. The interactive explorer above marks it with the green dot. Slide the $b$ slider and the entire line moves up or down without tilting — slope does not depend on intercept.

Real-world meaning of b

In the pizza example $b = 15$ is the base stipend — what you earn for showing up before delivering anything. In physics it is often a starting position. In economics it is fixed cost. Whenever you see $y = m\,x + b$ , ask: what does "zero input" mean here, and is $b$ reasonable at that moment?

Three Ways to Write the Same Line

The same straight line can be described in different algebraic outfits. Knowing all three lets you start from whatever information you happen to have.

Form	Formula	Best used when you know…
Slope–intercept	y = m·x + b	the slope m and the y-intercept b
Point–slope	y − y₁ = m·(x − x₁)	the slope m and one point (x₁, y₁) on the line
Two-point	y − y₁ = ( (y₂ − y₁)/(x₂ − x₁) )·(x − x₁)	two points (x₁, y₁), (x₂, y₂) on the line

All three are algebraically equivalent. The point–slope form is especially important in calculus because it is the form of the tangent line: once you know the slope at a point (the derivative) and the point itself, point–slope is how you write the approximation.

$$\underbrace{y - y_1}_{\Delta y} = \underbrace{m}_{\text{slope}} \cdot \underbrace{(x - x_1)}_{\Delta x}$$

Compare this with the slope formula $m = \Delta y / \Delta x$ : point–slope is literally the same equation, just multiplied out. The two are rearrangements of one another. Never memorise both — memorise $m = \Delta y / \Delta x$ and derive the rest.
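To see that the three forms describe the same line, one can reduce the point–slope and two-point data to slope–intercept form. The helper names `from_point_slope` and `from_two_points` below are hypothetical, and the sample points are taken from the pizza example.

```python
def from_point_slope(m, x1, y1):
    """Return (m, b) for the line with slope m through (x1, y1)."""
    b = y1 - m * x1          # rearranged from y1 = m*x1 + b
    return m, b

def from_two_points(x1, y1, x2, y2):
    """Return (m, b) for the line through two points with different x."""
    if x1 == x2:
        raise ValueError("points with the same x do not define a function")
    m = (y2 - y1) / (x2 - x1)
    return from_point_slope(m, x1, y1)

print(from_point_slope(3, 10, 45))      # (3, 15)
print(from_two_points(2, 21, 10, 45))   # (3.0, 15.0)
```

Both calls recover $m = 3$ and $b = 15$, which is the sense in which the forms are rearrangements of one another.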

The Staircase: Watching Slope Step by Step

Here is a way to feel the slope in your body. Imagine walking along the line one unit at a time on the x-axis. Each step rises by exactly $m$ on the y-axis. Press Play and watch the climber zigzag right and up, right and up, always the same horizontal foot and the same vertical rise.

[Interactive: rate-of-change staircase]

If you flip the slope slider to $m = 0$ the rises disappear — the climber walks on flat ground. Flip to $m = 2$ and each step jumps up two. Flip to $m = -1$ and the climber descends one unit per step. The staircase is the slope made physical.
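The staircase can also be tabulated without the animation. In this small sketch the slope `m = 2` and intercept `b = 1` are arbitrary illustration values; the point is only that every unit step to the right produces the same rise.

```python
def line(x, m=2, b=1):
    """A sample line; m and b are arbitrary illustration values."""
    return m * x + b

# Heights at x = 0, 1, 2, 3, 4, 5 and the rise between neighbouring steps.
heights = [line(x) for x in range(6)]
rises = [later - earlier for earlier, later in zip(heights, heights[1:])]

print(heights)  # [1, 3, 5, 7, 9, 11]
print(rises)    # [2, 2, 2, 2, 2]
```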

Average Rate of Change for Any Function

We have been very strict so far: the slope formula belongs to straight lines. But the formula is so good that we use it on curves too — we just rename it.

For any function $f$ and any two distinct inputs $a < b$ in its domain (here $b$ is an interval endpoint, not the y-intercept from earlier), the average rate of change of $f$ on the interval $[a, b]$ is

$$\overline{R}_{[a,b]} = \dfrac{f(b) - f(a)}{b - a}$$

Geometrically this is the slope of the straight line — called a secant — drawn through the two graph points $(a, f(a))$ and $(b, f(b))$ .

The deep observation

For a linear function this average is the same on every interval — we just call it the slope $m$ . For a curve, the average changes depending on where you measure. Calculus is fundamentally the study of how that average behaves as you let the two endpoints crash into each other.
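A short experiment makes the contrast concrete. The sketch below assumes a helper named `average_rate` (a name introduced here) and compares the pizza line $y = 3x + 15$ with the curve $y = x^2$ on three arbitrary intervals.

```python
def average_rate(f, a, b):
    """Secant slope of f between the inputs a and b."""
    if a == b:
        raise ValueError("need two distinct inputs")
    return (f(b) - f(a)) / (b - a)

def line(x):
    return 3 * x + 15

def curve(x):
    return x * x

for a, b in [(0, 1), (1, 3), (3, 10)]:
    print(a, b, average_rate(line, a, b), average_rate(curve, a, b))
# 0 1 3.0 1.0
# 1 3 3.0 4.0
# 3 10 3.0 13.0
```

The line reports 3.0 on every interval; the curve reports a different number each time.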

An analogy you can drive home

Average rate of change is exactly the same idea as your average speed on a road trip. If you drive 240 km in 4 hours, your average speed is $240/4 = 60$ km/h. But your speedometer almost certainly was not pinned at 60 km/h the whole time — at any instant your instantaneous speed was probably higher or lower. The speedometer reads what calculus will eventually call the derivative; the average from trip start to trip end is the secant slope.

Secant → Tangent: A Preview of the Derivative

Below is the central animation of the entire book. It shows a curve $f$, an anchor point $P = (a, f(a))$, and a movable second point $Q = (a+h, f(a+h))$. The orange line through them is the secant; the dashed green line is the tangent at $P$.

Press Shrink h → 0. Watch the orange secant rotate until it nearly overlaps the green tangent. Read the side panel: the average rate $(f(a+h)-f(a))/h$ approaches $f'(a)$ .

[Interactive: secant → tangent preview]

This is the only idea in differential calculus, stripped to its skeleton:

$$f'(a) = \lim_{h \to 0} \dfrac{f(a + h) - f(a)}{h}$$

Read it slowly. $f'(a)$ is the instantaneous rate of change of $f$ at $a$ . It equals the limit of the average rate of change, as the interval shrinks to a single point. We will spend the whole of Chapter 2 explaining what $\lim$ means rigorously, but you already have the picture: rotate the secant until both endpoints merge.

Why does linear come first?

Because the tangent line itself is a linear function. The tangent at $a$ has slope $m = f'(a)$ and passes through $(a, f(a))$ , so its equation is

$$y = f(a) + f'(a)\,(x - a)$$

That is point–slope form. Every derivative you ever compute will produce a number that is the slope of a particular line. Learning lines deeply now pays back forever.
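As a rough illustration, here is the tangent line to $f(x) = x^2$ at $a = 1$ written as an ordinary linear function. The slope value 2 is taken as given (it is the number the Python and PyTorch sections of this page arrive at), and the helper name `tangent_at` is an assumption made for this sketch.

```python
def f(x):
    return x * x

def tangent_at(a, slope):
    """Point-slope form as a function: y = f(a) + slope * (x - a)."""
    fa = f(a)
    return lambda x: fa + slope * (x - a)

# Slope 2 at a = 1 is assumed here, not computed.
tangent = tangent_at(1.0, 2.0)

for x in [1.5, 1.1, 1.01]:
    gap = f(x) - tangent(x)
    print(f"x = {x}: f = {f(x):.4f}, tangent = {tangent(x):.4f}, gap = {gap:.4f}")
```

The gap between curve and tangent shrinks much faster than the distance from $x$ to $a$, which is what "the line hugs the curve" means near the anchor point.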

Python: Computing Rates of Change

Let's pin down everything we just said in code. First we verify that for a linear function the slope is the same for any two points (so our formula really is well-defined). Then we look at the famously non-linear function $f(x) = x^2$ at $a = 1$ and watch the discrete difference quotient converge to $2$ .

From slope formula to numerical derivative

🐍 rate_of_change.py

def slope(x1, y1, x2, y2): the only formula you really need

This is the canonical slope formula: pick any two points on a line and the slope is the change in y divided by the change in x. It is the discrete cousin of the derivative: same idea, no limit yet. The function is a pure Python def, so it works on plain floats; later in this section we vectorise the same idea with PyTorch.

EXECUTION STATE

⬇ x1, y1 = Coordinates of point P₁. Example: x1 = 0, y1 = 10.

⬇ x2, y2 = Coordinates of point P₂. Example: x2 = 1, y2 = 35.

⬆ returns = A single float: (y2 − y1) / (x2 − x1). For y1 = 10, y2 = 35 it returns 25.0 — that is the slope.

return (y2 - y1) / (x2 - x1)

Computes Δy ÷ Δx. Notice the order: top is the second y minus the first y, bottom is the second x minus the first x. Reversing both gives the same answer (signs cancel), but mixing the order — e.g. (y2 − y1) / (x1 − x2) — flips the sign and is the most common student bug.

EXECUTION STATE

y2 - y1 = Vertical change Δy. With y1 = 10 and y2 = 35 this is 25.

x2 - x1 = Horizontal change Δx. With x1 = 0 and x2 = 1 this is 1. If this is 0 the line is vertical and slope is undefined — Python will raise ZeroDivisionError.

⬆ return = 25 ÷ 1 = 25.0

def earnings(hours): a concrete linear law

A wage example: you make $\$25$ for every hour, plus a $\$10$ sign-on bonus. This is exactly y = m·x + b with m = 25 and b = 10. We will sample it at three unrelated hour pairs and observe that the slope between every pair is the same 25, the defining property of a linear function.

return 25 * hours + 10

Literal evaluation of the linear formula. Every additional hour adds exactly $\$25$, regardless of how many hours you already worked. That is what 'constant rate of change' means in plain English.

EXECUTION STATE

hours = Input. Real number ≥ 0 in practice but any float works.

⬆ return = earnings(0) = 10, earnings(2) = 60, earnings(3.5) = 97.5. Each pair has the same Δy/Δx.

pairs = [(0, 1), (2, 7), (3.5, 10.25)]

Three arbitrary hour intervals. They have nothing in common — different starting hour, different width. If we get the same slope from all three pairs, that confirms the law is linear.

EXECUTION STATE

(h1, h2) = First interval (0 → 1), second (2 → 7), third (3.5 → 10.25). Widths 1, 5, 6.75.

for (h1, h2) in pairs

Iterates through each interval. On each pass we compute the earnings at the two endpoints, then apply our slope() function.

EXECUTION STATE

iteration 1 = h1 = 0, h2 = 1

iteration 2 = h1 = 2, h2 = 7

iteration 3 = h1 = 3.5, h2 = 10.25

y1 = earnings(h1)

Compute earnings at the left endpoint of the interval.

EXECUTION STATE

iter 1 = earnings(0) = 25·0 + 10 = 10

iter 2 = earnings(2) = 25·2 + 10 = 60

iter 3 = earnings(3.5) = 25·3.5 + 10 = 97.5

y2 = earnings(h2)

Earnings at the right endpoint.

EXECUTION STATE

iter 1 = earnings(1) = 25·1 + 10 = 35

iter 2 = earnings(7) = 25·7 + 10 = 185

iter 3 = earnings(10.25) = 25·10.25 + 10 = 266.25

m = slope(h1, y1, h2, y2)

Compute Δy/Δx for the current interval. The whole point of this exercise: this number is the same every time — exactly 25 — even though Δy and Δx are wildly different on each iteration.

EXECUTION STATE

iter 1 = (35 − 10) / (1 − 0) = 25 / 1 = 25.0

iter 2 = (185 − 60) / (7 − 2) = 125 / 5 = 25.0

iter 3 = (266.25 − 97.5) / (10.25 − 3.5) = 168.75 / 6.75 = 25.0

def f(x): return x * x (a deliberately non-linear function)

Now we switch to f(x) = x². This is the simplest curve that is NOT linear. The Δy/Δx ratio will change depending on which two points we pick. Watching how it changes as the two points get closer is the entire idea of the derivative.

a = 1

The anchor point. We will fix one endpoint at x = 1 and let the other endpoint be x = 1 + h, where h is the gap. As h shrinks, the secant line through (1, f(1)) and (1 + h, f(1 + h)) rotates onto the tangent line at x = 1.

for h in [1.0, 0.5, 0.1, 0.01, 0.001, 1e-5]

Six progressively smaller gaps, shrinking from 1 down to 10⁻⁵. This lets us watch a numerical limit form before our eyes.

avg = (f(a + h) - f(a)) / h

The discrete difference quotient: the same Δy/Δx formula as before, but now applied to the curve. For f(x) = x² at a = 1, algebra gives ((1 + h)² − 1) / h = (1 + 2h + h² − 1) / h = 2 + h. So the answer should be exactly 2 + h.

EXECUTION STATE

h = 1.0 = avg = (2² − 1²) / 1 = 3.000000 — quite far from 2

h = 0.5 = avg = (2.25 − 1) / 0.5 = 2.500000

h = 0.1 = avg = (1.21 − 1) / 0.1 = 2.100000

h = 0.01 = avg = (1.0201 − 1) / 0.01 = 2.010000

h = 0.001 = avg = 2.001000

h = 1e-5 = avg = 2.000010

print(...): watch the "distance to 2" column

Because avg = 2 + h, the distance to 2 is h itself, so it shrinks in lockstep with h. The average rate of change is silently telling us that the instantaneous rate at x = 1 is exactly 2. We have not done any calculus yet; we have just stared hard at one ratio. This is, in spirit, the computation behind the calculus of Newton and Leibniz.

def slope(x1, y1, x2, y2):
    """Slope of the straight line through (x1, y1) and (x2, y2)."""
    return (y2 - y1) / (x2 - x1)

# A linear law: hourly wage. Earnings y = 25*x + 10
# (the 10 is a flat sign-on bonus you get even at x = 0 hours).
def earnings(hours):
    return 25 * hours + 10

# Sample THREE arbitrary pairs of hours. The slope must be the same.
pairs = [(0, 1), (2, 7), (3.5, 10.25)]
for (h1, h2) in pairs:
    y1 = earnings(h1)
    y2 = earnings(h2)
    m  = slope(h1, y1, h2, y2)
    print(f"hours {h1:>5} -> {h2:<5}   "
          f"Δy/Δx = ({y2:>6.2f} - {y1:>6.2f}) / ({h2} - {h1}) = {m:.4f}")

# Now do the same for a NON-linear function f(x) = x^2 at a = 1.
def f(x):
    return x * x

a = 1
# "\n" prints the blank line that separates the two tables.
print("\nAverage rate of f(x) = x^2 on [1, 1 + h]:")
print(f"{'h':>8}  {'avg rate':>10}  {'distance to 2':>14}")
for h in [1.0, 0.5, 0.1, 0.01, 0.001, 1e-5]:
    avg = (f(a + h) - f(a)) / h
    print(f"{h:>8}  {avg:>10.6f}  {abs(avg - 2):>14.6f}")

Expected output

hours     0 -> 1       Δy/Δx = ( 35.00 -  10.00) / (1 - 0) = 25.0000
hours     2 -> 7       Δy/Δx = (185.00 -  60.00) / (7 - 2) = 25.0000
hours   3.5 -> 10.25   Δy/Δx = (266.25 -  97.50) / (10.25 - 3.5) = 25.0000

Average rate of f(x) = x^2 on [1, 1 + h]:
       h    avg rate   distance to 2
     1.0    3.000000        1.000000
     0.5    2.500000        0.500000
     0.1    2.100000        0.100000
    0.01    2.010000        0.010000
   0.001    2.001000        0.001000
   1e-05    2.000010        0.000010

Notice the structure. The wage example has a flat slope of 25 on every interval. The curve example has a secant slope of $2 + h$, which you can see converging to 2 in the middle column, while the right-hand column shows the distance to 2 shrinking in step with $h$: halve $h$ and the distance halves with it. That is what "the limit equals 2" means in plain Python.

PyTorch: Vectorising the Rate of Change

Python for-loops are fine for an interactive demonstration but slow once we want to apply the same formula to millions of points or differentiate a neural network. PyTorch does two things for us:

Vectorisation: compute the difference quotient at many $h$ values in parallel on the CPU or GPU.
Autograd: hand us the derivative $f'(a)$ without any step size $h$, so there is no truncation error (only ordinary floating-point rounding). That makes it ideal for cross-checking that the limit really is what we claim.

Vectorised difference quotient and an exact derivative

🐍 rate_of_change_torch.py

import torch

PyTorch is a tensor library with automatic differentiation. We use it here for two reasons: (1) we can evaluate the difference quotient at every h at once, and (2) we can ask PyTorch for the derivative via autograd to compare against our numerical estimate.

def f(x): return x ** 2

Same f(x) = x² as before, but now x can be a tensor of any shape. PyTorch's ** operator is elementwise: for a tensor of 6 values, it produces a tensor of 6 squared values.

a = torch.tensor(1.0, dtype=torch.float64)

The anchor point as a scalar tensor. Using a tensor (not a plain Python float) lets it participate in broadcast arithmetic in the avg_rate line. We ask for float64 explicitly: PyTorch's default float32 carries only about seven significant digits, so for h = 1e-5 the subtraction f(a + h) − f(a) would lose most of them and the quotient would no longer print as 2 + h.

EXECUTION STATE

a = tensor(1.0) shape () dtype float64

h = torch.tensor([1.0, 0.5, 0.1, 0.01, 0.001, 1e-5], dtype=torch.float64)

A 1-D tensor of six step sizes. In one line we will compute six difference quotients at once. For six values the speed-up over a Python for-loop is negligible, but the same expression scales unchanged to millions of values and can run on a GPU.

EXECUTION STATE

h = tensor([1.0000, 0.5000, 0.1000, 0.0100, 0.0010, 0.00001]) shape (6,) dtype float64

avg_rate = (f(a + h) - f(a)) / h

The single most important line. PyTorch broadcasts the scalar a across the 6-element h, so a + h is a 6-element tensor. f(·) squares elementwise. f(a) is also a scalar that broadcasts against the result. Division is elementwise. Output is shape (6,).

EXECUTION STATE

a + h = tensor([2.0000, 1.5000, 1.1000, 1.0100, 1.0010, 1.00001]): each h added to the anchor

f(a + h) = tensor([4.0000, 2.2500, 1.2100, 1.0201, 1.002001, 1.0000200001]): those values squared

f(a) = tensor(1.0): broadcast scalar

⬆ avg_rate = tensor([3.0000, 2.5000, 2.1000, 2.0100, 2.0010, 2.00001]): converges to 2

x = torch.tensor(1.0, requires_grad=True)

We create a NEW scalar tensor x with requires_grad = True. This switches on the autograd tape: every operation on x will be recorded so PyTorch can compute the derivative of any scalar that depends on x.

EXECUTION STATE

x = tensor(1.0, requires_grad=True)

y = f(x)

Compute y = x² = 1 at x = 1, but with the graph attached. PyTorch silently builds a backward graph that records that y was produced via pow(x, 2).

EXECUTION STATE

y = tensor(1.0, grad_fn=<PowBackward0>): same value as before, but with provenance

y.backward()

Triggers reverse-mode differentiation. PyTorch walks the recorded graph and applies the chain rule. For y = x² we get dy/dx = 2x, which at x = 1 is 2.

x.grad.item(): the reference value

After backward(), x.grad holds dy/dx evaluated at the current x. .item() pulls the scalar out as a Python float. The printed value is exactly 2.0, the same number our numerical avg_rate column was converging to. The derivative is the limit of the difference quotient, full stop.

EXECUTION STATE

x.grad = tensor(2.0)

.item() = 2.0 (Python float)

import torch

def f(x):
    return x ** 2

# float64 keeps the small-h quotients accurate; the default float32 would not.
a = torch.tensor(1.0, dtype=torch.float64)
h = torch.tensor([1.0, 0.5, 0.1, 0.01, 0.001, 1e-5], dtype=torch.float64)

# Vectorised: a + h broadcasts across the whole h tensor.
avg_rate = (f(a + h) - f(a)) / h
print("h        :", h.tolist())
print("avg rate :", [round(v, 6) for v in avg_rate.tolist()])

# Reference value: autograd gives f'(1) without any step size h.
x = torch.tensor(1.0, requires_grad=True)
y = f(x)
y.backward()
print("f'(1) by autograd =", x.grad.item())

Expected output

h        : [1.0, 0.5, 0.1, 0.01, 0.001, 1e-05]
avg rate : [3.0, 2.5, 2.1, 2.01, 2.001, 2.00001]
f'(1) by autograd = 2.0

The two answers agree: the numerical limit and the autograd derivative land on the same number. This is the recipe we will reuse in every later chapter: derive a formula on paper, then sanity-check it with a short PyTorch experiment.

Why This Matters — Applications

🚗 Physics — uniform motion

For an object moving at constant velocity $v$ , position is $x(t) = v\,t + x_0$ . The slope $v$ is the velocity; the y-intercept $x_0$ is the starting position.

💰 Economics — fixed and marginal cost

Total cost $C(q) = c\,q + F$ for $q$ units. The slope $c$ is the marginal cost per unit; the intercept $F$ is fixed overhead.

🌡️ Engineering — calibration curves

A thermocouple's voltage $V$ is related to temperature $T$ (to a good approximation over a limited range) via $V = \alpha\,T + V_0$. A linear fit gives the sensor's sensitivity $\alpha$ and offset $V_0$.

🤖 Machine learning — linear regression

The simplest predictive model is $\hat{y} = w\,x + b$. Training fits the slope (weight) $w$ and intercept (bias) $b$ to data. Every neuron in a deep network computes a multi-input version of this, followed by a non-linear activation.
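To make "training fits the slope and intercept" concrete, here is a minimal sketch that fits a line to the four rows of the pizza table. It uses the ordinary least-squares closed form, which is an assumption of this example: the text does not say which fitting method is meant, and `fit_line` is a name introduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = w*x + b to paired data."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / sxx              # slope: covariance over variance
    b = y_mean - w * x_mean    # intercept: line passes through the means
    return w, b

# Rows from Step 1 of the pizza example: (deliveries, dollars).
xs = [0, 1, 2, 10]
ys = [15, 18, 21, 45]

w, b = fit_line(xs, ys)
print(w, b)  # 3.0 15.0
```

Because the data lie exactly on a line, the fit recovers the wage rule. With noisy data the same code would return the best-fitting slope and intercept instead.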

The big arc

Linear models are everywhere in applied science, and even when the real-world law is non-linear we typically linearise it by approximating the curve with its tangent. That single move — replace a hard curve locally with an easy line — is the unifying technique of physics, engineering and modern machine learning. The derivative will hand us those tangent lines, but the structure they live in is exactly the $y = m\,x + b$ we are studying today.

Common Pitfalls

Sign errors from inconsistent subtraction order

The slope formula is $(y_2 - y_1)/(x_2 - x_1)$ . If you compute the numerator as $y_2 - y_1$ but the denominator as $x_1 - x_2$ , you flip the sign. Always keep the same order top and bottom.
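The sign flip is easy to reproduce. The variable names below are chosen for this sketch, and the two points come from the pizza example.

```python
# Two points from the pizza example: (0, 15) and (10, 45).
x1, y1 = 0, 15
x2, y2 = 10, 45

consistent = (y2 - y1) / (x2 - x1)      # second minus first, top and bottom
both_reversed = (y1 - y2) / (x1 - x2)   # first minus second, top and bottom
mixed = (y2 - y1) / (x1 - x2)           # orders disagree: the sign flips

print(consistent, both_reversed, mixed)  # 3.0 3.0 -3.0
```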

Vertical lines have no slope

A vertical line like $x = 3$ is not a function (one input cannot map to many outputs), and its slope is undefined, not infinite. The denominator $x_2 - x_1 = 0$ kills the formula.
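In code, the vanishing denominator shows up as an exception. One possible way to handle it is sketched below; the wrapper name `safe_slope` and the choice to return `None` are assumptions of this example, not something the text prescribes.

```python
def slope(x1, y1, x2, y2):
    """Slope of the straight line through (x1, y1) and (x2, y2)."""
    return (y2 - y1) / (x2 - x1)

def safe_slope(x1, y1, x2, y2):
    """Return the slope, or None when the two points share an x-coordinate."""
    try:
        return slope(x1, y1, x2, y2)
    except ZeroDivisionError:
        return None   # vertical line: the slope is undefined, not infinite

print(safe_slope(0, 15, 10, 45))  # 3.0
print(safe_slope(3, 1, 3, 8))     # None
```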

Confusing slope with the function value

The slope $m$ and the output $y$ are different beasts. The slope measures how fast y changes, not how big y is. A line that sits high but is flat has small slope; a line near the origin that dives steeply has big slope.

Reading the difference quotient out loud

When you see $(f(a+h)-f(a))/h$ , train yourself to read it as "the change in $f$ , divided by the change in $x$ , on a tiny interval starting at $a$ ". The Greek $\Delta$ and the Roman $h$ are the same idea.

Summary

Linear functions are the functions whose rate of change is constant. This single property forces the graph to be a straight line, makes the slope formula return the same answer for every pair of points, and seeds every later idea in calculus.

Concept	Formula	One-sentence meaning
Linear function	f(x) = m·x + b	First-power formula whose graph is a straight line.
Slope	m = (y₂ − y₁)/(x₂ − x₁)	Constant rise per unit run.
y-intercept	b = f(0)	Where the line crosses the y-axis.
Point–slope form	y − y₁ = m·(x − x₁)	Line written from one point and a slope.
Average rate of change	(f(b) − f(a))/(b − a)	Slope of the secant through two points of any graph.
Derivative (preview)	f'(a) = lim_{h→0} (f(a+h) − f(a))/h	Instantaneous rate, the limit of the average rate as the interval shrinks.

Key Takeaways

A function is linear iff its rate of change is the same on every interval.
The slope formula $m = \Delta y / \Delta x$ works on any two points of a line and returns the same answer.
For a curve, that same formula gives an average rate of change — the secant slope.
As the two endpoints collapse to one, the secant rotates onto the tangent line. Its slope is the derivative.
The tangent itself is a linear function — point–slope form $y = f(a) + f'(a)\,(x - a)$ .
Both plain Python and PyTorch make this concrete: tabulate the difference quotient, watch it converge, double-check with autograd.

The essence in one line:

"Lines are exactly the functions whose rate of change is constant, and a derivative is the slope of the line that hugs a curve at a single point."

Coming next: Section 1.3 zooms out from straight lines to polynomial functions, where the rate-of-change idea finally has something to chew on — the slope of $x^2$ , $x^3$ , and their combinations will be different at every point, foreshadowing Chapter 4's derivative rules.