Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section you will be able to:

Describe continuity at a point in everyday language — "no breaks, no jumps, no holes" — and translate it into a picture of a pencil tracing a graph.
Recognise continuous curves and spot the four canonical failures: jumps, removable holes, infinite poles, and essential oscillations.
State the three demands a function must satisfy at a point to be continuous there, and connect them to the limit machinery from Chapter 2.
Diagnose continuity numerically by comparing $\displaystyle\lim_{x\to a^{-}}f(x)$, $f(a)$, and $\displaystyle\lim_{x\to a^{+}}f(x)$.
Connect continuity to real phenomena — from a ball in flight to the ReLU inside a neural network — and see why it is the gateway assumption for every theorem in the rest of the book.

The Big Picture

Limits tell us where a function wants to go. Continuity is the statement that a function actually arrives. When the limit and the function agree at every point — when the target a curve is aiming at is the same point it reaches — the curve is seamless. No jumps, no holes, no leaps into infinity.

Core idea

A function is continuous at $x=a$ when you can plug $a$ straight into the formula and get the same answer that the graph is clearly heading toward from both sides. In symbols,

$\displaystyle\lim_{x\to a}f(x)=f(a).$

A function is continuous on an interval when every point of that interval satisfies the equation above.

Continuity is not a bonus feature — it is the gateway assumption for most of calculus. The Intermediate Value Theorem, the Extreme Value Theorem, the Fundamental Theorem of Calculus, every convergence result for numerical methods: all of them begin with the words "let f be continuous on [a, b]". Before we meet those giants, we need a rock-solid mental model of what continuity actually feels like.

Intuition: The Pencil Test

Here is the image every calculus student should carry in their head. Put the tip of a pencil on a curve, then drag it along the graph from left to right. Ask one simple question:

Did I have to lift the pencil? If the answer is no on an interval — the tip stayed on the paper the whole way — the function is continuous on that interval. Every lift is a discontinuity, and every kind of lift corresponds to a different failure mode.

The mental test is surprisingly accurate. A smooth parabola, a sine wave, a growing exponential — none of them ever force you to lift the pencil. A staircase, a signum function, a graph with a tiny hole drilled at one point — all of them do.

Four ways to be forced to lift the pencil

Jump. The curve suddenly leaps to a new height. Example: $\text{sign}(x)$ at 0 or the floor function $\lfloor x\rfloor$ at any integer.
Removable hole. The curve looks continuous except for a single missing point. Example: $\displaystyle\frac{x^{2}-9}{x-3}$ at $x=3$.
Infinite discontinuity. The graph shoots off to $\pm\infty$. Example: $1/x$ at 0 (recall §2.4).
Essential (oscillatory). The curve wiggles infinitely fast near the bad point. Example: $\sin(1/x)$ as $x\to 0$.
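The four failure modes can also be seen in numbers. The following is a minimal sketch, not a rigorous test: it probes each example function at a few shrinking distances from the bad point. The helper names (`probes`, `pole`, `wiggle`) and the step sizes are illustrative assumptions.

```python
import math

def sign(x):
    return (x > 0) - (x < 0)            # -1, 0, or +1

def removable(x):
    return (x * x - 9) / (x - 3)        # hole at x = 3

def pole(x):
    return 1 / x                        # blows up at x = 0

def wiggle(x):
    return math.sin(1 / x)              # oscillates near x = 0

def probes(f, a, steps=(1e-1, 1e-3, 1e-5)):
    """Return (h, f(a - h), f(a + h)) for each shrinking step h."""
    return [(h, f(a - h), f(a + h)) for h in steps]

cases = [("jump", sign, 0.0), ("removable", removable, 3.0),
         ("infinite", pole, 0.0), ("oscillatory", wiggle, 0.0)]

for label, f, a in cases:
    print(label)
    for h, left, right in probes(f, a):
        print(f"  h={h:g}  left={left:+.6g}  right={right:+.6g}")
```

Reading the output row by row: the jump keeps a fixed gap between left and right, the removable hole closes in on 6 from both sides, the pole grows without bound, and the oscillatory example stays between −1 and +1 without settling.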

Gallery: Four Kinds of Curves

Press play and watch a virtual pencil trace each graph below. The red dot is the pencil tip. Count how many times the tip has to leave the paper:

[Interactive pencil gallery: a virtual pencil traces four kinds of curves]

The polynomial needs zero lifts — it is continuous everywhere. Each of the other three requires exactly one lift, and the kind of lift tells you which family of discontinuity you are dealing with. A calculus textbook can feel abstract, but the pencil test keeps the whole chapter grounded in a physical action you could perform with a crayon on graph paper.

“Continuous” means continuous at every point

When we say f is continuous without a qualifier, we mean continuous at every point of its domain — no lifts anywhere. A single bad point is enough to disqualify the function from the label "continuous" on an interval that contains that point. That is why mathematicians are careful to say $f$ is continuous on $(a,b)$ — the interval matters.

The Three Demands of Continuity

The pencil test is great for intuition, but mathematics needs a check that works even when there is no picture in front of us. Turning the picture into a checklist gives the classical three-condition definition: a function $f$ is continuous at $x=a$ iff all three of the following hold:

Demand 1: $f(a)$ is defined

You can actually plug $a$ into the formula. If $a$ makes a denominator zero or sits outside the domain, this demand fails before we even start.

Demand 2: $\displaystyle\lim_{x\to a}f(x)$ exists

Both one-sided limits must exist and agree on a single real number. No jumps, no infinite escape, no oscillatory tantrum.

Demand 3: $\displaystyle\lim_{x\to a}f(x)=f(a)$

The value the function is heading to equals the value it actually takes. This demand fails on its own when $f(a)$ is defined and the limit exists but the two numbers differ: a removable hole that has been plugged at the wrong height.

Each of the four failure modes in the gallery corresponds to a specific demand collapsing. The table below shows which one breaks where:

Behaviour at a	Demand 1: f(a) defined?	Demand 2: limit exists?	Demand 3: limit = f(a)?
Continuous	Yes	Yes	Yes
Jump	Usually yes	No (the two sides disagree)	n/a
Removable hole	No, or defined at the wrong height	Yes	Cannot compare, or No
Infinite pole	No	No (limit is ±∞, not a real number)	n/a
Oscillatory	Sometimes	No (no single limit)	n/a
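The table can be turned into a rough decision procedure. The sketch below is a heuristic under stated assumptions, not a proof of anything: the function name `classify`, the step sizes, and the thresholds `tol` and `big` are all illustrative choices, and a finite number of probes can always be fooled.

```python
import math

def classify(f, a, steps=(1e-2, 1e-4, 1e-6), tol=1e-3, big=1e3):
    """Heuristically label the behaviour of f at a using the three demands."""
    try:
        fa = f(a)                       # Demand 1: is f(a) defined?
    except (ZeroDivisionError, ValueError):
        fa = None
    lefts = [f(a - h) for h in steps]
    rights = [f(a + h) for h in steps]
    if max(abs(lefts[-1]), abs(rights[-1])) > big:
        return "infinite"               # Demand 2 fails: values blow up
    settled = (abs(lefts[-1] - lefts[-2]) < tol and
               abs(rights[-1] - rights[-2]) < tol)
    if not settled:
        return "oscillatory"            # Demand 2 fails: probes never settle
    if abs(lefts[-1] - rights[-1]) >= tol:
        return "jump"                   # Demand 2 fails: the sides disagree
    limit = (lefts[-1] + rights[-1]) / 2
    if fa is None or abs(limit - fa) >= tol:
        return "removable"              # Demand 1 or Demand 3 fails
    return "continuous"

examples = {
    "x^2 - 3x + 5 at 2": (lambda x: x * x - 3 * x + 5, 2.0),
    "sign(x) at 0": (lambda x: (x > 0) - (x < 0), 0.0),
    "(x^2 - 9)/(x - 3) at 3": (lambda x: (x * x - 9) / (x - 3), 3.0),
    "1/x at 0": (lambda x: 1 / x, 0.0),
    "sin(1/x) at 0": (lambda x: math.sin(1 / x), 0.0),
}

for label, (f, a) in examples.items():
    print(f"{label:<24} {classify(f, a)}")
```

With these particular thresholds the five examples come out as continuous, jump, removable, infinite, and oscillatory, in that order. Different thresholds or steeper functions can change the labels, which is why the hand check with the three demands remains the reference.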

Equivalent one-line slogan

The three demands compress into a single equation that you should memorise for life:

$\displaystyle\lim_{x\to a}f(x)=f(a).$

Every time this equality holds, the pencil never left the paper at $x=a$. Every time it fails, you can classify the failure by examining which of the three demands broke.

Interactive: The Continuity Microscope

Pick a function below. A green dot probes from the left, an orange dot from the right, and the red dot marks $f(a)$ itself. Shrink $h$ and watch the three values race toward (or refuse to race toward) the same height:

[Interactive continuity microscope: left probe, right probe, and $f(a)$ as $h$ shrinks]

Things to try

Start with the polynomial and shrink $h$. All three numbers converge on 3, so every demand is satisfied.
Switch to $\text{sign}(x)$. The green dot sits at −1, the orange at +1, and the red at 0. No matter how small you make $h$, the gap never closes: Demand 2 fails.
Try $\frac{x^{2}-9}{x-3}$. Green and orange both converge to 6, but the red "f(a)" badge reads undefined, so Demand 1 fails. Plugging the hole with $f(3)=6$ would fix it.
Compare the floor function with sign(x). Floor is right-continuous at every integer but not two-sided continuous: its left-hand limit sits one unit below the function value, so the two one-sided limits disagree, just as they do for sign(x) at 0.

Worked Example — Three Functions at a Glance

Let us apply the three demands by hand to three functions at three different points. Try it yourself before reading the walkthrough.

$\displaystyle f(x)=x^{2}-3x+5 \;\; \text{at}\;\; a=2$

$\displaystyle g(x)=\text{sign}(x) \;\; \text{at}\;\; a=0$

$\displaystyle h(x)=\frac{x^{2}-9}{x-3} \;\; \text{at}\;\; a=3$


Function f — the smooth polynomial.

Demand 1. $f(2)=2^{2}-3\cdot 2+5=4-6+5=3$. Defined ✅.
Demand 2. The limit laws (sum, product, constant multiple) give $\lim_{x\to 2}(x^{2}-3x+5)=3$. Exists ✅.
Demand 3. $\lim_{x\to 2}f(x)=3=f(2)$. Matches ✅.

Verdict: continuous at $a=2$. In fact, every polynomial is continuous on all of $\mathbb{R}$.

Function g — the sign function.

Demand 1. $g(0)=0$. Defined ✅.
Demand 2. $\lim_{x\to 0^{-}}g(x)=-1$, $\lim_{x\to 0^{+}}g(x)=+1$. The two sides disagree, so the two-sided limit does not exist. Demand 2 fails ❌.
Demand 3. Irrelevant once Demand 2 is broken.

Verdict: jump discontinuity at $a=0$. No choice of value for $g(0)$ can repair it, because the two sides never meet.

Function h — the removable hole. First simplify: $(x-3)(x+3)/(x-3)=x+3$ whenever $x\neq 3$ .

Demand 1. $h(3)$ requires 0/0 — undefined. Demand 1 fails ❌.
Demand 2. $\lim_{x\to 3}(x+3)=6$. Exists ✅.
Demand 3. There is no $h(3)$ to compare against.

Verdict: removable discontinuity at $a=3$. Define a patched function $\tilde h$ that agrees with $h$ for every $x\neq 3$ and sets $\tilde h(3)=6$; then $\tilde h$ is continuous everywhere.
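Patching the hole is easy to try out. In the sketch below, the name `h_patched` stands in for $\tilde h$ and the step sizes are arbitrary; it shows the gap between the patched value and nearby values shrinking on both sides.

```python
def h(x):
    return (x * x - 9) / (x - 3)        # raises ZeroDivisionError at x = 3

def h_patched(x):
    """Agree with h for x != 3 and take the limit value 6 at x = 3."""
    return 6.0 if x == 3 else h(x)

for step in (1e-2, 1e-4, 1e-6):
    gap_left = abs(h_patched(3 - step) - h_patched(3))
    gap_right = abs(h_patched(3 + step) - h_patched(3))
    print(f"step={step:g}  gap_left={gap_left:.2e}  gap_right={gap_right:.2e}")
```

Both gaps shrink roughly in proportion to the step, which is what Demand 3 looks like numerically once the hole has been filled at the right height.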

Sanity check. Using a calculator with step size $h=0.0001$ (the probe distance, not the function $h$ above):

Function	f(a−h)	f(a)	f(a+h)
x² − 3x + 5 at a=2	+2.9999	3	+3.0001
sign(x) at a=0	−1	0	+1
(x² − 9)/(x − 3) at a=3	+5.9999	undefined	+6.0001

The numbers match our three verdicts exactly.

Python: A Hand-Made Continuity Tester

The three demands translate into a tiny Python script. Instead of reaching for SymPy or NumPy, we'll build the tester ourselves so every line is transparent. The line-by-line notes below show what the variables hold on each line across all three experiments.

Plain Python — three demands as executable code

🐍continuity_tester.py


Comment: the goal

Declares the experiment. Instead of relying on a graph, we will build the definition of continuity as executable code: compare three numbers — left limit, f(a), right limit — and flip a verdict flag when they all agree.

Comment: the idea in three numbers

Restates the mathematical definition in a comment. Continuity at x=a is the statement that approaching a from the left, plugging in a, and approaching a from the right all land on the same real number.

Comment: equation form

The equation the next function will check numerically. All three quantities must be equal (and finite). If any disagrees, the function has a discontinuity at a.

Comment: approximation strategy

We cannot compute true limits in Python, but we can approximate them by evaluating f at a - h and a + h for a small positive h (like 1e-6). For well-behaved functions those values are very close to the true one-sided limits; for rapidly oscillating or very steep functions a single h can mislead.

def continuity_report(f, a, name, h=1e-6):

Defines a reusable tester. It receives a function to probe, the point a where we suspect something might happen, a label for printing, and a default step size h of 10⁻⁶ (small enough to behave like a limit for most well-behaved functions).

EXECUTION STATE

⬇ input: f = A Python function (or lambda) taking one real argument. Examples: lambda x: x*x, math.sin, our custom sign(x).

⬇ input: a = The point we suspect might be a discontinuity. Example: a=2 for a polynomial, a=0 for sign(x), a=3 for (x²−9)/(x−3).

⬇ input: name = A human-readable label just for printing, e.g. 'x^2 - 3x + 5'. Keeps the report legible when we probe multiple functions.

⬇ input: h=1e-6 = Default probe distance. Written 1e-6 = 0.000001. Small enough to mimic a limit but large enough that floating-point round-off does not swamp the result.

⬆ returns = None — the function only prints. A production version would return a dict, but for teaching we just emit readable text.

Docstring: one-line summary

Triple-quoted string that documents what the function does. Shows up with help(continuity_report). Professional courtesy for future readers.

Comment: step 1 preview

Section divider. Marks the part of the function that computes f(a) itself. This is the middle of the three numbers the definition demands.

try: (guard f(a) against ZeroDivisionError)

Starts a try/except block. Some discontinuities are undefined at a (like (x²−9)/(x−3) at x=3, which is 0/0). Without the guard, calling f(a) would crash the whole program.

EXECUTION STATE

📚 try / except = Python's error-handling construct. Code in the try runs optimistically; if an exception fires, control jumps to the matching except block instead of crashing the program.

fa = f(a)

Calls the user-provided function at the suspicious point a. This is the number that Demand 1 asks about: the function value at a. If this fails with a ZeroDivisionError, Demand 1 is violated and the function is already disqualified.

EXECUTION STATE

fa = The direct function value. For poly at a=2: fa = 2² − 3·2 + 5 = 4 − 6 + 5 = 3. For sign at a=0: fa = 0. For removable at a=3: throws ZeroDivisionError.

except ZeroDivisionError: fa = None

Catches the specific 'divide by zero' crash. Instead of propagating the error, we record that f(a) is undefined by setting fa = None. None is Python's stand-in for 'no value' and is easy to check later.

EXECUTION STATE

📚 None = Python's singleton for 'nothing here'. We use it as a flag meaning 'f is undefined at a'. Later we check 'fa is not None' to confirm Demand 1 holds before testing Demand 3.

Comment: step 2 preview

Section header for the numerical one-sided limits. The true limits are out of reach in code, but evaluating at a ± h for tiny h is a very good stand-in.

left = f(a - h) (approach from below)

Evaluates f just to the left of a. With a=2 and h=1e-6, that is f(1.999999). For a well-behaved function the result differs from the true left-hand limit by roughly h times the local slope.

EXECUTION STATE

a - h = a minus the probe step. Examples: 2.0 − 1e-6 = 1.999999; 0.0 − 1e-6 = −1e-6; 3.0 − 1e-6 = 2.999999.

⬆ left (poly, a=2) = f(1.999999) = 1.999999² − 3·1.999999 + 5 = 2.999999000001 ≈ +2.999999

⬆ left (sign, a=0) = f(−1e-6) = −1 (the sign of any negative number is −1)

⬆ left (removable, a=3) = f(2.999999) = (2.999999² − 9) / (2.999999 − 3) ≈ +5.999999

right = f(a + h) (approach from above)

Mirror image: evaluates f just to the right of a. Combined with 'left', we can tell whether the two sides agree — the core of whether a two-sided limit exists.

EXECUTION STATE

a + h = a plus the probe step. 2.000001, 1e-6, 3.000001 for the three examples.

⬆ right (poly, a=2) = f(2.000001) ≈ +3.000001

⬆ right (sign, a=0) = f(+1e-6) = +1

⬆ right (removable, a=3) = f(3.000001) ≈ +6.000001

Comment: step 3 preview

Section header for the three Boolean checks that implement the official definition of continuity.

limit_exists = abs(left - right) < 1e-3

Do the two one-sided limits agree? If their difference is tiny (below a tolerance of 0.001), we accept them as the same number. This is the second demand of continuity: the two-sided limit must exist.

EXECUTION STATE

📚 abs() = Python built-in: returns the absolute value. |−5| = 5, |0.0012| = 0.0012. Used so that 'left − right' and 'right − left' give the same verdict.

→ poly (a=2) = |2.999999 − 3.000001| = 0.000002 < 0.001 → True

→ sign (a=0) = |−1 − 1| = 2 > 0.001 → False (two sides disagree)

→ removable (a=3) = |5.999999 − 6.000001| ≈ 0 → True (limit exists, it's 6)

matches_fa = (fa is not None) and abs((left+right)/2 - fa) < 1e-3

Does the limit equal f(a)? First confirm f(a) is defined at all, then check that the midpoint of the two probes (our best estimate of the limit) matches f(a) within tolerance. This is the third demand of continuity.

EXECUTION STATE

(left + right) / 2 = Midpoint of the two one-sided probes. For a smooth function the first-order errors of the two probes cancel, so the average is a better estimate of the limit than either probe alone. Example for poly: (2.999999 + 3.000001)/2 ≈ 3.000000.

and = Python's short-circuit logical AND. The right side is only evaluated if fa is not None. Guards us against crashing on 'None − number' arithmetic.

→ poly (a=2) = (fa is not None) = True; |3.000000 − 3| ≈ 0 < 0.001 → True

→ sign (a=0) = (fa=0 is not None) = True; |0 − 0| = 0 < 0.001 → True — but limit_exists is False, so overall continuous stays False.

→ removable (a=3) = fa is None → short-circuits → False. f is undefined at 3, so it cannot match.

continuous = limit_exists and matches_fa

The verdict. Both booleans must be True to call the function continuous at a. This is the executable form of 'the limit exists AND equals f(a)'.

EXECUTION STATE

⬆ continuous (poly, a=2) = True AND True = True → Continuous ✅

⬆ continuous (sign, a=0) = False AND True = False → Discontinuous (jump) ❌

⬆ continuous (removable, a=3) = True AND False = False → Discontinuous (removable) ❌

print(f"--- {name} at x = {a} ---")

Prints a header for each report. f"…" is a Python f-string: any expression inside {curly braces} is evaluated and inserted into the string. Here {name} and {a} are substituted with their runtime values.

EXECUTION STATE

📚 f"…" = Formatted string literal. Example: name='x^2'; a=2 → f"{name} at {a}" becomes 'x^2 at 2'.

print(f"  left  limit approx: {left:+.6f}")

Emits the numerical left-limit. The :+.6f format specifier means 'always show a sign (+ or −), 6 digits after the decimal point, float'. Produces aligned columns across reports.

EXECUTION STATE

+.6f = Format specifier: '+' forces a sign character; '.6' means 6 fractional digits; 'f' means fixed-point float. Example: 3.0 → '+3.000000'.

print(f"  right limit approx: {right:+.6f}")

Same format, right-limit side. Reading left and right side by side is what makes discontinuities pop out visually.

print(f"  f({a})             : {fa}")

Prints the direct function value. Notice we do not use :+.6f here because fa might be None, which has no numeric format. Default formatting prints 3.0, 0, or None as appropriate.

print(f"  limit exists?       {limit_exists}")

Prints the first Boolean. Python converts True/False to the strings 'True'/'False' automatically inside an f-string.

print(f"  matches f(a)?       {matches_fa}")

Prints the second Boolean. Together with the previous line, readers can see which of the three demands of continuity failed for a discontinuous function.

print(f"  continuous?         {continuous}\n")

Final verdict followed by a blank line. The \n escape adds a newline on top of the one print already emits, which separates one report from the next in the terminal output.

EXECUTION STATE

\n = The newline escape sequence. Inside a Python string literal (including an f-string), \n produces a single newline character.

Comment: the candidates

Header comment marking the start of the experiments. We will now define three lambdas, one for each kind of behavior.

poly = lambda x: x*x - 3*x + 5

Defines the polynomial as a one-line anonymous function. Lambdas are perfect for throw-away probes: they accept x and return a value, nothing more.

EXECUTION STATE

📚 lambda = Python's anonymous function keyword. lambda x: expr is shorthand for def f(x): return expr. We use it to keep each function on a single line.

→ poly(2) = 2·2 − 3·2 + 5 = 4 − 6 + 5 = 3

sign = lambda x: (-1 if x < 0 else (1 if x > 0 else 0))

Defines sign(x) with a nested ternary: returns −1 if x < 0, else +1 if x > 0, else 0. This creates a jump discontinuity at x=0 because the two halves can never meet.

EXECUTION STATE

→ sign(−0.5) = -1

→ sign(0.0) = 0 — the value at the jump is the middle option

→ sign(+0.5) = +1

removable = lambda x: (x*x - 9) / (x - 3)

Defines the textbook removable-hole example. Algebraically (x² − 9)/(x − 3) = x + 3 for every x ≠ 3, but Python evaluates the expression literally, so at x = 3 it tries 0/0 and throws ZeroDivisionError.

EXECUTION STATE

→ removable(2.999999) = (8.999994... − 9) / (−0.000001) ≈ +5.999999

→ removable(3.000001) = (9.000006... − 9) / (+0.000001) ≈ +6.000001

→ removable(3.0) = ZeroDivisionError — hole at the point itself

continuity_report(poly, 2.0, "x^2 - 3x + 5")

Runs the tester on the polynomial. Expect all three numbers (left≈3, f(2)=3, right≈3) to match → CONTINUOUS.

EXECUTION STATE

→ output =

left  limit approx: +2.999999
right limit approx: +3.000001
f(2.0)             : 3.0
limit exists?       True
matches f(a)?       True
continuous?         True

continuity_report(sign, 0.0, "sign(x)")

Runs the tester on sign(x). Expect left = −1, right = +1 → limit_exists fails → NOT CONTINUOUS.

EXECUTION STATE

→ output =

left  limit approx: -1.000000
right limit approx: +1.000000
f(0.0)             : 0
limit exists?       False
matches f(a)?       True
continuous?         False

continuity_report(removable, 3.0, "(x^2 - 9)/(x - 3)")

Runs the tester on the removable-hole function. Expect the two limits to agree at ≈6, but f(3) to be undefined → matches_fa fails → NOT CONTINUOUS.

EXECUTION STATE

→ output =

left  limit approx: +5.999999
right limit approx: +6.000001
f(3.0)             : None
limit exists?       True
matches f(a)?       False
continuous?         False


# A hand-made continuity tester.
# Idea: at x = a, a function is continuous iff
#   lim_{x -> a-} f(x)  ==  f(a)  ==  lim_{x -> a+} f(x)
# We approximate the two one-sided limits by probing a tiny step away.

def continuity_report(f, a, name, h=1e-6):
    """Print f(a), left-limit approx, right-limit approx, and the verdict."""
    # Step 1: function value at the point itself
    try:
        fa = f(a)
    except ZeroDivisionError:
        fa = None                          # f(a) undefined

    # Step 2: numerical one-sided limits (tiny probe on each side)
    left  = f(a - h)                       # approach from below
    right = f(a + h)                       # approach from above

    # Step 3: compare the three numbers
    limit_exists = abs(left - right) < 1e-3
    matches_fa   = (fa is not None) and abs((left + right) / 2 - fa) < 1e-3
    continuous   = limit_exists and matches_fa

    print(f"--- {name} at x = {a} ---")
    print(f"  left  limit approx: {left:+.6f}")
    print(f"  right limit approx: {right:+.6f}")
    print(f"  f({a})             : {fa}")
    print(f"  limit exists?       {limit_exists}")
    print(f"  matches f(a)?       {matches_fa}")
    print(f"  continuous?         {continuous}\n")

# Three candidate functions
poly      = lambda x: x*x - 3*x + 5                 # smooth polynomial
sign      = lambda x: (-1 if x < 0 else (1 if x > 0 else 0))
removable = lambda x: (x*x - 9) / (x - 3)           # hole at x=3

continuity_report(poly,      2.0, "x^2 - 3x + 5")
continuity_report(sign,      0.0, "sign(x)")
continuity_report(removable, 3.0, "(x^2 - 9)/(x - 3)")

Running the script produces three reports. The polynomial report prints continuous? True. The sign report prints limit exists? False and continuous? False. The removable report prints limit exists? True but matches f(a)? False — and therefore continuous? False. Each failure mode lights up a different part of the checklist.
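One caveat is worth seeing in numbers. A tester that compares a single probe at h = 1e-6 against a fixed tolerance of 1e-3 can misreport a steep but continuous function. The sketch below uses illustrative names (`gap`, `steep`) that are assumptions for this example; it contrasts how the left-right gap behaves as h shrinks for a continuous function versus a jump.

```python
def gap(f, a, h):
    """Distance between the right probe and the left probe at step h."""
    return abs(f(a + h) - f(a - h))

def steep(x):
    return 1e4 * x                      # continuous everywhere, slope 10,000

def sign(x):
    return (x > 0) - (x < 0)            # jump at x = 0

for h in (1e-6, 1e-8, 1e-10):
    print(f"h={h:g}  steep gap={gap(steep, 0.0, h):.2e}  sign gap={gap(sign, 0.0, h):.2e}")
```

At h = 1e-6 the steep line already shows a gap of about 0.02, above the 1e-3 tolerance, yet the gap keeps shrinking with h. The jump's gap stays at 2 no matter how small h gets. Watching the trend over several values of h is a more reliable signal than a single comparison.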

PyTorch: Why Networks Love Continuous Activations

Continuity is not a mathematical nicety; in practice it is a requirement for gradient-based learning. Every time a neural network trains, the chain rule multiplies derivatives through the computation graph. If one link in that chain is a jump, its derivative is zero on the flat pieces and undefined at the jump itself, so no useful gradient flows through it and learning stalls.

Compare two activations side by side. $\text{ReLU}(x)=\max(0,x)$ is continuous everywhere (the two halves meet at the origin, with a corner but no gap). $\text{sign}(x)$ is the textbook jump: it leaps from $-1$ to $+1$ at $x=0$. Let autograd reveal the difference.

PyTorch — continuous vs discontinuous activations

🐍continuity_activations.py


import torch

Loads the PyTorch tensor library. We need tensors (for batched elementwise evaluation) and autograd (to see how gradients behave through a continuous function vs a discontinuous one).

EXECUTION STATE

📚 torch = PyTorch library: provides the Tensor type (n-dimensional array with autograd support), many math functions, and the nn module for neural networks.

import torch.nn.functional as F

Loads PyTorch's stateless function module aliased as F. This is where F.relu, F.softmax, F.cross_entropy live — pure functions without trainable parameters.

EXECUTION STATE

📚 F = Alias for torch.nn.functional. Examples: F.relu(x) = elementwise max(0, x); F.softmax(x, dim=-1) = row-wise softmax.

Comment: probe design

Describes the choice of x-values. Straddling zero is deliberate: both activations behave identically far from the origin, but their continuity disagreement lives right at x=0.

x = torch.tensor([...], requires_grad=True)

Builds a 1-D tensor of 5 probe points and turns on gradient tracking. Every op we do on x is recorded so PyTorch can compute df/dx via the chain rule.

EXECUTION STATE

📚 torch.tensor(data, requires_grad=...) = Factory that copies a Python list into a new tensor. If requires_grad=True, autograd tracks the tensor; .grad is populated after .backward().

⬇ arg 1: data = [-0.10, -0.01, 0.0, 0.01, 0.10] — five x-values straddling 0.

⬇ arg 2: requires_grad=True = Turns on gradient tracking. Default is False; without this flag x.grad stays None after backward().

⬆ result: x (shape [5], dtype float32) = [-0.1000, -0.0100, 0.0000, 0.0100, 0.1000]

Comment: ReLU is continuous

Reminder that ReLU(x) = max(0, x) is continuous everywhere: the two halves meet at (0, 0) without a jump. Differentiable everywhere except exactly at 0, where it has a corner.

y_relu = F.relu(x)

Applies ReLU elementwise to every entry of x. Below 0 the output is 0; above 0 the output is x itself. At 0 both formulas meet at 0, so the graph has no jump.

EXECUTION STATE

📚 F.relu(x) = Elementwise max(0, x). f(x) = 0 for x ≤ 0; f(x) = x for x > 0. Continuous at 0 because both branches agree there.

⬇ arg: x = [-0.10, -0.01, 0.00, 0.01, 0.10]

⬆ result: y_relu = [0.00, 0.00, 0.00, 0.01, 0.10]

→ continuity note = Compare adjacent outputs 0.00 → 0.00 → 0.00 → 0.01 → 0.10. No jumps; the sequence passes through zero without a gap.

Comment: sign is discontinuous

Reminder that torch.sign steps from −1 to +1 at x=0 with a single value (0) exactly at the jump. This is the textbook jump discontinuity.

x2 = torch.tensor([...], requires_grad=True)

Creates a fresh copy of the probe points for the second experiment. We need a separate tensor because x already has autograd state attached to y_relu; reusing x would accumulate gradients from both graphs into the same .grad buffer.

EXECUTION STATE

→ why a fresh tensor? = Autograd accumulates: calling backward() on both graphs with the same leaf would leave x.grad holding relu'(x) + sign'(x). Using x2 keeps the two experiments independent.

y_sign = torch.sign(x2)

Applies sign elementwise. Returns −1 for negatives, +1 for positives, 0 for zero. The jump from −1 to +1 across x=0 is the canonical discontinuity.

EXECUTION STATE

📚 torch.sign(x) = Elementwise sign. −1 for x<0, 0 for x=0, +1 for x>0. Discontinuous at x=0: the two one-sided limits are −1 and +1.

⬇ arg: x2 = [-0.10, -0.01, 0.00, 0.01, 0.10]

⬆ result: y_sign = [-1.00, -1.00, 0.00, 1.00, 1.00]

→ discontinuity note = Neighbouring outputs go −1 → −1 → 0 → +1 → +1. The middle row jumps by a whole unit across h=0.01.

Comment: backward-pass preview

Marks the boundary between forward and backward. The next two lines ask autograd to compute df/dx for each activation, so we can compare how learning propagates through a continuous vs discontinuous function.

y_relu.sum().backward()

.sum() reduces the 5-element vector to a scalar so .backward() has a single number to differentiate. Then .backward() walks the graph backwards, accumulating df/dx_i into x.grad — which equals the elementwise derivative of ReLU.

EXECUTION STATE

📚 .sum() = Reduces a tensor to a scalar by addition. .backward() requires a scalar target; summing preserves per-element gradients.

📚 .backward() = Reverse-mode automatic differentiation. Walks the computation graph from the output scalar back to every leaf tensor with requires_grad=True, accumulating into .grad.

→ expected relu'(x) = 0 where x ≤ 0, 1 where x > 0. PyTorch picks 0 at exactly x = 0.

⬆ side effect: x.grad = [0.0, 0.0, 0.0, 1.0, 1.0]

→ training signal = Positive inputs receive a non-zero gradient, so the network can learn from them. Negative inputs get 0; a unit whose input is negative for every training example stops updating altogether (the 'dying ReLU' problem).

y_sign.sum().backward()

Same operation for sign(x). Because sign is flat (constant) everywhere except the jump, its derivative is 0 away from 0 and undefined at 0. PyTorch's backward pass for torch.sign returns 0 at every input, including the jump, so the gradients are zero everywhere.

EXECUTION STATE

→ expected sign'(x) = 0 for every x ≠ 0 (flat pieces), undefined at x = 0 (a jump has no slope). PyTorch returns 0.

⬆ side effect: x2.grad = [0.0, 0.0, 0.0, 0.0, 0.0]

→ why this kills training = Every parameter update uses the chain rule. If one link in the chain is sign, its gradient is 0 → the whole chain is 0 → no learning. This is why modern networks almost never use hard threshold activations.

print("x          :", x.tolist())

Prints the probe tensor as a Python list. .tolist() copies tensor data into a nested list of floats (detaching from autograd — fine here, we just want to look).

EXECUTION STATE

📚 .tolist() = Tensor method: copies data into a (nested) list of Python floats. Breaks the autograd link; use for display only.

print("relu(x)    :", y_relu.tolist())

Emits the ReLU output vector. Reading it shows the continuous pass-through above 0 and hard-zero below.

EXECUTION STATE

→ output = [0.0, 0.0, 0.0, 0.009999999776482582, 0.10000000149011612]

print("sign(x)    :", y_sign.tolist())

Emits the sign output vector. The jump from −1 to +1 across the middle entry is the visual signature of a jump discontinuity.

EXECUTION STATE

→ output = [-1.0, -1.0, 0.0, 1.0, 1.0]

print("relu'(x)   :", x.grad.tolist())

Prints ReLU's per-element derivative. Three entries are 0 (the dead zone, x ≤ 0) and two are 1 (the active zone, x > 0).

EXECUTION STATE

→ output = [0.0, 0.0, 0.0, 1.0, 1.0]

print("sign'(x)   :", x2.grad.tolist())

Prints sign's gradient. All zeros: the flat pieces on either side of the jump carry no learning signal. A neural network that relies on this activation cannot train by plain gradient descent.

EXECUTION STATE

→ output = [0.0, 0.0, 0.0, 0.0, 0.0]

→ lesson = Continuity is not aesthetics — it is a hard requirement for gradient-based learning. Engineers smooth discontinuous ideas (sigmoid instead of threshold, soft-argmax instead of argmax) precisely to restore a usable gradient.


import torch
import torch.nn.functional as F

# Probe points that straddle x = 0, the junction where activations differ the most.
x = torch.tensor([-0.10, -0.01, 0.0, 0.01, 0.10], requires_grad=True)

# ReLU: continuous, piecewise linear. f(x) = max(0, x).
y_relu = F.relu(x)

# Sign: discontinuous, jumps from -1 to +1 at the origin.
x2 = torch.tensor([-0.10, -0.01, 0.0, 0.01, 0.10], requires_grad=True)
y_sign = torch.sign(x2)

# Backward pass on each: the gradient reveals how training would respond.
y_relu.sum().backward()
y_sign.sum().backward()

print("x          :", x.tolist())
print("relu(x)    :", y_relu.tolist())
print("sign(x)    :", y_sign.tolist())
print("relu'(x)   :", x.grad.tolist())
print("sign'(x)   :", x2.grad.tolist())

The lesson the gradient is teaching

ReLU's gradient is $[0,0,0,1,1]$: a mix of "dead" and "alive" units in which the live half can still update. Sign's gradient is $[0,0,0,0,0]$, completely silent. Gradient descent cannot move on a flat landscape, so a network built from hard-threshold activations cannot learn by ordinary backpropagation. This is why the field replaced step functions with sigmoids in the 1980s, and largely replaced sigmoids with ReLUs in the 2010s: each step gave the network a more usable gradient.
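The effect of smoothing a threshold can be sketched without PyTorch. The example below is an illustration under assumptions: it uses $\tanh(kx)$ as one possible smooth stand-in for sign (the name `soft_sign` and the sharpness `k=10` are hypothetical choices), and it estimates slopes with a central difference rather than autograd. The point x = 0 is left out because a finite difference across the jump is meaningless there.

```python
import math

def sign(x):
    return (x > 0) - (x < 0)

def soft_sign(x, k=10.0):
    """Smooth stand-in for sign(x); k is an assumed sharpness parameter."""
    return math.tanh(k * x)

def slope(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-0.10, -0.01, 0.01, 0.10):
    print(f"x={x:+.2f}  sign slope={slope(sign, x):.3f}  "
          f"soft_sign slope={slope(soft_sign, x):.3f}")
```

The slope of sign is zero at every probe, while the smooth version has a clearly non-zero slope near the origin, which is the signal a gradient-based optimiser needs.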

Where Continuity Shows Up in the Real World

🚀 Physics — trajectories

Newton's laws forbid teleportation. Position and velocity are continuous in time for any mass moving under bounded forces; acceleration can still jump when a force switches on or off. Any model that predicts a jump in position is a bug, usually a sign of a missing impulse or a misplaced boundary condition.

📈 Finance — price vs cash flow

Stock prices are often modelled as continuous during trading hours (ignoring tick size and the bid-ask spread). A dividend payment, however, creates a genuine jump: on the ex-dividend date the stock typically opens lower by roughly the dividend amount. Knowing where continuity breaks tells quant models where to apply the no-arbitrage "jump correction".

⚡ Engineering — signals and circuits

Real voltages rise continuously; an ideal step function is an engineering fiction. The rise time of a signal, the short window in which it climbs from 10% to 90% of its final value, measures how closely the signal approximates a step. Shorter rise time = closer to a discontinuity = harder EMI problems.

🤖 Machine learning — the chain rule

Every differentiable activation is continuous, but not every continuous activation is differentiable (ReLU has a corner at 0). Training relies on both properties: continuity so that small changes in the input produce small changes in the output, and differentiability (almost everywhere) so that the backward pass has a gradient to propagate.

Common Pitfalls

“The graph has no jumps” is not enough

A function can fail Demand 1 (undefined) while the pencil-test picture looks seamless on either side. The removable hole at $x=3$ above is exactly that case: the left and right pieces line up perfectly at height 6, but $f(3)$ does not exist, so the function is not continuous at 3. Always check that $a$ is in the domain.

Continuous vs differentiable

Continuity is a weaker condition than differentiability. ReLU is continuous everywhere but not differentiable at 0. The absolute value $|x|$ is continuous on all of $\mathbb{R}$ but has a corner at 0. Weierstrass' nowhere-differentiable function is continuous everywhere and differentiable nowhere. Never equate the two.
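The difference shows up in one-sided difference quotients. In this small sketch (the helper name `one_sided_slopes` is illustrative), both $|x|$ and ReLU pass the continuity check at 0, yet their left and right slopes never agree.

```python
def one_sided_slopes(f, a, h):
    """Return the left and right difference quotients of f at a."""
    left = (f(a) - f(a - h)) / h
    right = (f(a + h) - f(a)) / h
    return left, right

def relu(x):
    return max(0.0, x)

for h in (1e-1, 1e-3, 1e-6):
    abs_left, abs_right = one_sided_slopes(abs, 0.0, h)
    relu_left, relu_right = one_sided_slopes(relu, 0.0, h)
    print(f"h={h:g}  |x|: {abs_left:+.1f}, {abs_right:+.1f}   "
          f"relu: {relu_left:+.1f}, {relu_right:+.1f}")
```

The function values on either side of 0 close in on $f(0)=0$ as h shrinks (continuity), but the two slopes stay apart at every scale (no derivative at the corner).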

One-sided continuity is a thing

A function can be right-continuous at $a$ — meaning $\lim_{x\to a^{+}}f(x)=f(a)$ — without being left-continuous. The floor function $\lfloor x\rfloor$ is right-continuous at every integer but not left-continuous. Two-sided continuity requires both sides to agree.
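A quick numerical check makes the asymmetry concrete. The helper below (`one_sided_check`, with an assumed probe distance and tolerance) compares each one-sided probe with $f(a)$ separately; it is a single-probe sketch, not a proof.

```python
import math

def one_sided_check(f, a, h=1e-9, tol=1e-6):
    """Return (left_continuous, right_continuous) from one probe per side."""
    fa = f(a)
    return abs(f(a - h) - fa) < tol, abs(f(a + h) - fa) < tol

print("floor at 2:", one_sided_check(math.floor, 2.0))   # (False, True)
print("ceil  at 2:", one_sided_check(math.ceil, 2.0))    # (True, False)
```

The ceiling function is included as the mirror image: it is left-continuous at every integer but not right-continuous.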

Summary

Idea	Formula / Description
Pencil test	Draw the graph without lifting the pencil on the interval
Equation form	lim_{x→a} f(x) = f(a)
Demand 1	f(a) is defined
Demand 2	lim_{x→a} f(x) exists
Demand 3	The limit equals f(a)
Jump	Demand 2 fails — two one-sided limits disagree
Removable	Demand 1 fails (limit exists but f(a) is undefined), or Demand 3 fails (f(a) is defined at the wrong height)
Infinite	Demand 2 fails — limit is ±∞
Essential / oscillatory	Demand 2 fails — no single limit

Key Takeaways

A function is continuous at $a$ iff $\lim_{x\to a}f(x)=f(a)$ . Everything else in this chapter is an unpacking of that single equation.
The pencil test is the gold standard for intuition: "did I have to lift the pencil?" The answer immediately tells you whether a point is continuous or which kind of failure you are looking at.
Four canonical failures — jump, removable, infinite, essential — each break a specific demand of the three-condition definition.
Numerical testers, pencil traces, and graphs are all views of the same object. Switching between them is the single most useful habit in this chapter.
Continuity is the admission ticket for the Intermediate Value Theorem, the Extreme Value Theorem, and every major calculus result. Without it, the machinery ahead cannot start.

The Essence:

“A function is continuous when the number it is heading to is the same number it arrives at — no teleportation, no cliffs, no missing pixels.”

Coming next: §3.2 turns the pencil test into the rigorous three-condition definition and shows how to prove continuity using limit laws instead of pictures. You will see that every result from Chapter 2 was secretly preparing you to check Demand 2.