Chapter 3

Properties of Continuous Functions

Continuity — No Breaks, No Jumps

Learning Objectives

By the end of this section you will be able to:

  1. State and justify the algebra of continuity — sums, differences, products, quotients, and compositions of continuous functions are continuous.
  2. Certify that a function is continuous by decomposing it into known-continuous building blocks, instead of rerunning the three-condition test from §3.2 from scratch.
  3. Predict the only way an algebraic combination can fail: a denominator vanishing, or a composition landing on a point outside the inner function's domain.
  4. Understand the Boundedness and Extreme Value properties of continuous functions on closed intervals, and explain what breaks if the interval is open or the function has a singularity.
  5. Recognise why every well-behaved neural network unit is a stack of continuous pieces — and why that is what makes gradient descent possible.

Why Bother With Properties?

So far in Chapter 3 we have treated continuity as a three-part test performed at one point at a time: $f(c)$ is defined, $\lim_{x \to c} f(x)$ exists, and the two agree. That is fine for diagnosing a single suspect point. But nothing you will actually meet in applied mathematics is supplied as a raw point-by-point definition. Real functions are built out of pieces: polynomials multiplied together, trigonometric terms added to exponentials, neural-network layers composed one on top of the next.

It would be absurd to rerun the three-condition test at every real number for every new formula. What we need are closure rules: if $f$ and $g$ are continuous at $c$, what can we say about $f + g$, $f \cdot g$, $f / g$, $f \circ g$? The answer, derived below from the limit laws of §2.6, is as clean as it gets: continuity is preserved under every reasonable algebraic operation. That is why we almost never compute limits from scratch: we just point at the building blocks.

The big idea. Continuity propagates. Sums, products, quotients (where the denominator survives), and compositions of continuous functions are continuous. Once you have a small library of building blocks that you know are continuous, you can certify enormous classes of functions for free.

The Algebra of Continuity

Suppose $f$ and $g$ are continuous at $c$. Then each of the following combinations inherits continuity at $c$:

Combination | Where continuous | Why
--- | --- | ---
$(f + g)(x) = f(x) + g(x)$ | Everywhere $f$ and $g$ are continuous. | Limit laws: lim (f + g) = lim f + lim g = f(c) + g(c).
$(f - g)(x) = f(x) - g(x)$ | Everywhere $f$ and $g$ are continuous. | Limit laws again; special case of sum with coefficient −1 on g.
$(k \cdot f)(x) = k \cdot f(x)$ for a constant $k$ | Everywhere $f$ is continuous. | Scalar multiple is a product with the constant function g(x) ≡ k, which is trivially continuous.
$(f \cdot g)(x) = f(x)\, g(x)$ | Everywhere $f$ and $g$ are continuous. | Product law: lim (f·g) = (lim f)(lim g).
$(f / g)(x) = f(x) / g(x)$ | At any $c$ where $f$ and $g$ are continuous and $g(c) \ne 0$. | Quotient law, but only when the denominator is nonzero.

Every one of these is a direct consequence of the limit laws of §2.6: the limit of a sum is the sum of the limits, and so on. Because continuity at $c$ is the statement $\lim_{x\to c} f(x) = f(c)$, anything the limit laws preserve, continuity preserves as well.

The one thing that can fail: a vanishing denominator

The quotient rule has an escape clause. If $g(c) = 0$ then $f/g$ is not even defined at $c$, so continuity there is moot, and one of two things happens just off $c$:

  • If $f(c) \ne 0$ the quotient blows up, producing an infinite discontinuity (§3.3, Type 3).
  • If $f(c) = 0$ as well, we get the indeterminate form $0/0$. The limit might still exist (a removable hole), but we can only know by further analysis, usually factoring or L'Hôpital's rule.

Example. $R(x) = \dfrac{x^{2}-1}{x-1}$ has an apparent denominator problem at $x = 1$. Because the numerator also vanishes and factors as $(x-1)(x+1)$, the hole is removable: $\lim_{x\to 1} R(x) = 2$. The algebra of continuity tells us that $R$ is continuous everywhere except $x = 1$; §3.3 tells us the discontinuity there is removable.

Consequence. Every polynomial is continuous on ℝ. Every rational function is continuous on its domain (i.e. everywhere the denominator is nonzero). That one sentence already covers a large share of the functions you write down by hand.
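
The removable hole in the example can also be seen numerically. The following is a minimal sketch, assuming plain Python floats; the function name `R` mirrors the example, and the step sizes are arbitrary illustrative choices.

```python
def R(x):
    """R(x) = (x^2 - 1) / (x - 1); the formula is undefined at x = 1."""
    return (x * x - 1.0) / (x - 1.0)

# Approach x = 1 from both sides; the values should close in on 2.
for h in (1e-1, 1e-3, 1e-6):
    print(f"h={h:g}:  R(1-h)={R(1 - h):.6f}   R(1+h)={R(1 + h):.6f}")

# At x = 1 itself the formula divides 0 by 0, so R has no value there.
try:
    R(1.0)
    defined_at_one = True
except ZeroDivisionError:
    defined_at_one = False
print("R defined at x = 1?", defined_at_one)
```

The two-sided samples agree with the limit 2 even though the function has no value at the point itself, which is what "removable" means here.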

Composition — Continuity Survives Nesting

The composition rule is arguably the most powerful of all: it is what lets us build towers of continuous functions such as $\sin(e^{x^{2}} + 1)$ or a deep neural network.

Composition Rule. If $g$ is continuous at $c$ and $f$ is continuous at $g(c)$, then the composition $(f \circ g)(x) = f(g(x))$ is continuous at $c$.

The proof is a direct chase of sequences. If $x_n \to c$, then by continuity of $g$ at $c$ we get $g(x_n) \to g(c)$. By continuity of $f$ at $g(c)$ applied to this new sequence, $f(g(x_n)) \to f(g(c))$. That is exactly the definition of $f \circ g$ being continuous at $c$.
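
The sequence chase can be watched numerically. This is only an illustrative sketch: the choices $g(x) = x^2$, $f(u) = \sin(u)$, $c = 1.5$ and the sequence $x_n = c + 1/n$ are assumptions made for the demo, not part of the proof.

```python
import math

def g(x):
    """Inner function, continuous on all of R."""
    return x * x

def f(u):
    """Outer function, continuous on all of R."""
    return math.sin(u)

c = 1.5
target = f(g(c))  # f(g(c)) = sin(2.25)

inner_gaps, outer_gaps = [], []
for n in (10, 100, 1000, 10000):
    x_n = c + 1.0 / n                           # one sequence converging to c
    inner_gaps.append(abs(g(x_n) - g(c)))       # |g(x_n) - g(c)| shrinks first
    outer_gaps.append(abs(f(g(x_n)) - target))  # then |f(g(x_n)) - f(g(c))| follows
    print(f"n={n:>5}  inner gap={inner_gaps[-1]:.2e}  outer gap={outer_gaps[-1]:.2e}")
```

Both columns shrink together: the inner gap goes to zero because $g$ is continuous at $c$, and the outer gap follows because $f$ is continuous at $g(c)$.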

What this buys you. To prove that $\sin(x^{2})$ is continuous on ℝ, you no longer compute a limit. You observe: $x \mapsto x^{2}$ is a polynomial, continuous on ℝ; $\sin$ is continuous on ℝ; composition of continuous functions is continuous. Done. The chain can be of any length: $\cos(e^{\sin(\ln(1 + x^{2}))})$ is continuous on all of ℝ by the same argument, applied layer by layer.

Sharp condition: the inner function must land where the outer is continuous. If $f(u) = 1/u$ and $g(x) = x - 1$, then $(f \circ g)(x) = 1/(x - 1)$ fails the composition rule at $x = 1$, because the outer function $f$ is not continuous at the inner output $g(1) = 0$. The composition rule hands you continuity only when the outer function is continuous at the point where the inner function actually lands.
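
A small sketch of this failure mode, using the same $f(u) = 1/u$ and $g(x) = x - 1$. The helper name `compose` and the probe offset `1e-6` are illustrative assumptions.

```python
def g(x):
    """Inner function: continuous everywhere."""
    return x - 1.0

def f(u):
    """Outer function: continuous only where u != 0."""
    return 1.0 / u

def compose(x):
    """(f o g)(x) = 1 / (x - 1)."""
    return f(g(x))

# Safe point: g(3) = 2, and f is continuous at 2, so a tiny step barely moves the output.
safe_gap = abs(compose(3.0 + 1e-6) - compose(3.0))

# Problem point: g(1) = 0, where f is not defined; the two sides run off in opposite directions.
left = compose(1.0 - 1e-6)
right = compose(1.0 + 1e-6)

print(f"gap near x = 3: {safe_gap:.2e}")
print(f"just left of x = 1: {left:.3e}   just right of x = 1: {right:.3e}")
```

The inner function is perfectly well behaved at both points; what differs is whether its output lands in the region where the outer function is continuous.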

Interactive: Algebra-of-Continuity Explorer

Pick two building-block functions $f$ and $g$, choose an operation, and slide the probe to any $x$. The violet curve is the combined function. For division, dashed red verticals mark the problem points $\{x : g(x) = 0\}$; everywhere else, continuity is inherited for free.

Try $f(x) = 1$ (constant) divided by $g(x) = \sin(x)$. The combined curve is $\csc(x)$, which is continuous on its domain but blows up at every integer multiple of $\pi$. The red dashes land exactly on those points: a visual illustration of the "$g(c) \ne 0$" caveat in the quotient rule.

Inverse and Monotone Functions

One more closure property is worth pinning down because it powers a huge amount of calculus to come.

Inverse of a continuous function. If $f : [a, b] \to \mathbb{R}$ is continuous and strictly monotone (either strictly increasing or strictly decreasing), then the inverse $f^{-1}$ is also continuous on the image $f([a,b])$.

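Why it matters. This one line guarantees the continuity of all of the following by just pairing them with the continuity of their generators. (The theorem is stated for $[a, b]$, but the same conclusion holds on any interval, which is the form used in the examples below.)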

  • $\sqrt{x}$ is continuous on $[0, \infty)$: it is the inverse of $x^{2}$ on $[0, \infty)$.
  • $\ln(x)$ is continuous on $(0, \infty)$: inverse of $e^{x}$.
  • $\arctan(x)$ is continuous on ℝ: inverse of $\tan(x)$ on $(-\pi/2, \pi/2)$.

Monotonicity is essential. Continuity alone is not enough. $f(x) = x^{2}$ is continuous on ℝ but not monotone there; it has no global inverse. The monotonicity hypothesis is precisely what lets you "undo" the function without ambiguity.
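
One way to make the inverse rule concrete is to invert a strictly increasing function numerically. The sketch below uses bisection, which is a swapped-in technique chosen for illustration rather than anything this section prescribes; the helper name `inverse_by_bisection` and the interval $[0, 2]$ are assumptions.

```python
import math

def f(x):
    """f(x) = x^2, continuous and strictly increasing on [0, 2]."""
    return x * x

def inverse_by_bisection(func, y, lo, hi, iters=60):
    """Solve func(x) = y on [lo, hi], assuming func is continuous and strictly increasing there."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) < y:
            lo = mid   # the solution lies to the right of mid
        else:
            hi = mid   # the solution lies at or to the left of mid
    return 0.5 * (lo + hi)

for y in (0.25, 2.0, 2.00001, 3.9):
    x = inverse_by_bisection(f, y, 0.0, 2.0)
    print(f"y={y:<8}  inverse={x:.8f}   sqrt(y)={math.sqrt(y):.8f}")
```

Nudging $y$ from 2.0 to 2.00001 moves the recovered $x$ only slightly, which is the continuity of the inverse in action. Bisection relies on monotonicity: without it, the test `func(mid) < y` would not tell you which half contains the solution.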

Boundedness on a Closed Interval

Up to now every rule has been local: something is continuous at a single point $c$. The next two theorems make a global statement about what continuous functions do on a full interval.

Boundedness Theorem. If $f : [a, b] \to \mathbb{R}$ is continuous on the closed bounded interval $[a, b]$, then $f$ is bounded: there exist finite numbers $m, M$ such that $m \le f(x) \le M$ for every $x \in [a, b]$.

Intuition. A continuous graph drawn without lifting the pen, over a finite horizontal window that includes both endpoints, simply cannot escape to $\pm\infty$. The pen has to stay on the page. If it tried to shoot up to $+\infty$ along the way, there would be a point where the function is not defined, or where the limit is infinite; either would contradict continuity.

All three hypotheses (closed interval, bounded interval, continuity) are necessary.

Condition dropped | Counter-example | What goes wrong
--- | --- | ---
Not closed (half-open interval) | $f(x) = 1/x$ on $(0, 1]$ | Blows up as x → 0⁺. The continuous function is unbounded because it can chase a limit towards the missing endpoint.
Not bounded (infinite interval) | $f(x) = x$ on $[0, \infty)$ | Continuous and runs to +∞. Closedness alone doesn't help if the interval itself is unbounded.
Not continuous | $f(x) = 1/x$ patched with $f(0) = 0$ on $[0, 1]$ | Now defined on the closed interval, but discontinuous at 0, and unbounded.
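
The first counter-example can be probed numerically. In this rough sketch the helper `sampled_max` and the shrinking left endpoints `delta` are illustrative assumptions; a finite sample can only suggest the behaviour, not prove it.

```python
def f(x):
    """f(x) = 1/x, continuous on (0, 1] but not defined at 0."""
    return 1.0 / x

def sampled_max(func, a, b, n=10_000):
    """Largest value of func over n + 1 evenly spaced points of [a, b]."""
    return max(func(a + (b - a) * k / n) for k in range(n + 1))

# On each closed interval [delta, 1] the function is bounded (by 1/delta).
# Letting delta shrink towards 0 mimics the half-open interval (0, 1]: the bound escapes.
maxima = {}
for delta in (1e-1, 1e-2, 1e-4, 1e-8):
    maxima[delta] = sampled_max(f, delta, 1.0)
    print(f"max of 1/x on [{delta:g}, 1] is about {maxima[delta]:.6g}")
```

Every closed sub-interval gives a finite bound, as the theorem promises, but no single bound works for all of them at once.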

The Extreme Value Theorem (Preview)

Boundedness tells us that finite bounds $m, M$ exist. The Extreme Value Theorem says something sharper: those bounds are actually reached.

Extreme Value Theorem (EVT). If $f : [a, b] \to \mathbb{R}$ is continuous on the closed bounded interval $[a, b]$, then there exist points $x_{\min}, x_{\max} \in [a, b]$ such that $f(x_{\min}) \le f(x) \le f(x_{\max})$ for every $x \in [a, b]$.

In other words, $f$ attains its maximum $M = f(x_{\max})$ and its minimum $m = f(x_{\min})$ on the interval, rather than merely approaching them.

Why "attained" matters. On an open interval the supremum may not be reached. Take $f(x) = x$ on $(0, 1)$: the supremum is 1 but never attained, because 1 is not in the domain. EVT's closed-interval hypothesis plugs exactly this hole.

EVT is the workhorse that later guarantees: a continuous loss function defined on a closed, bounded parameter region has an actual minimum; the global maximum of a continuous utility function over a compact feasible set is attained; a classic calculus optimisation problem "find the max of $f(x)$ on $[a, b]$" has an answer you can actually write down. We will prove EVT formally later in §3.6; for now the picture is enough.
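
The difference between "attained" and "approached" can be sketched with a grid search on $f(x) = x$. The helper name `grid` and the grid sizes are illustrative assumptions; dropping the endpoints is used here as a stand-in for the open interval $(0, 1)$.

```python
def f(x):
    return x

def grid(a, b, n, include_endpoints):
    """n + 1 evenly spaced points of [a, b]; optionally drop a and b themselves."""
    pts = [a + (b - a) * k / n for k in range(n + 1)]
    return pts if include_endpoints else pts[1:-1]

closed_maxima, open_maxima = [], []
for n in (10, 1_000, 100_000):
    closed_maxima.append(max(f(x) for x in grid(0.0, 1.0, n, True)))
    open_maxima.append(max(f(x) for x in grid(0.0, 1.0, n, False)))
    print(f"n={n:>6}  max on [0, 1] = {closed_maxima[-1]}   max strictly inside = {open_maxima[-1]}")
```

With the endpoints included, the maximum is hit at $x = 1$ on every grid. Without them, the best sampled value keeps creeping towards 1 as the grid refines but never reaches it.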


Interactive: Bounded & Extreme Value Explorer

Slide the endpoints $a$ and $b$, pick a function, and toggle the interval type. Green dashed line = max $M$, red dashed line = min $m$. The coloured dots mark where the extremes are attained. Switch to the $1/(x-1)$ preset and straddle $x = 1$ to watch EVT fail: the function is no longer continuous on the interval.


Worked Example — Continuity of a Neural Network Unit

Before running the Python, work through the certification by hand. Try it on your own paper first; unfold only when you've tried a step.

Certify that $N(x) = \sigma\bigl(w_2\,\text{ReLU}(w_1 x + b_1) + b_2\bigr)$ is continuous on ℝ

Step 1: Identify the building blocks.

The four atomic pieces are:

  • $u_1(x) = w_1 x + b_1$: affine (polynomial of degree 1).
  • $r(u) = \max(0, u) = \text{ReLU}(u)$: the ReLU activation.
  • $u_2(h) = w_2 h + b_2$: another affine map.
  • $\sigma(y) = 1/(1 + e^{-y})$: the sigmoid.

Step 2: Certify each piece on its own.

  • $u_1$, $u_2$: polynomials, continuous on ℝ by the algebra rules.
  • $r$ (ReLU): continuous everywhere. The only suspect point is $u = 0$, where $\lim_{u\to 0^-} \max(0,u) = 0$, $\lim_{u\to 0^+} \max(0,u) = 0$, and $\max(0, 0) = 0$. All three agree.
  • $\sigma$: built from $e^{-y}$ (the exponential composed with $y \mapsto -y$, continuous on ℝ), addition (continuous), and division (continuous wherever the denominator is nonzero; here $1 + e^{-y} > 1$, so that is always). So $\sigma$ is continuous on ℝ.

Step 3: Apply the composition rule, one layer at a time.

  • $u_1$ is continuous at every $c$. Its output lands somewhere in ℝ.
  • $r$ is continuous at every real number, so it is continuous at $u_1(c)$. Hence $r \circ u_1$ is continuous at $c$.
  • Another affine map and another composition: $u_2 \circ r \circ u_1$ is continuous.
  • Finally $\sigma$ is continuous everywhere, so $N = \sigma \circ u_2 \circ r \circ u_1$ is continuous on ℝ.

Step 4: Evaluate at one point as a sanity check.

Take $w_1 = 0.8$, $b_1 = 0.1$, $w_2 = 0.5$, $b_2 = 0.05$, $x = 1$.

  1. $u_1 = 0.8 \cdot 1 + 0.1 = 0.9$.
  2. $r(0.9) = \max(0, 0.9) = 0.9$.
  3. $u_2 = 0.5 \cdot 0.9 + 0.05 = 0.5$.
  4. $\sigma(0.5) = 1/(1 + e^{-0.5}) \approx 0.6225$.

Perturb $x$ by $10^{-5}$ and the output changes by about $10^{-6}$. Continuous.
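
The Step 4 arithmetic can be replayed in a few lines. This is a minimal sketch in plain Python with the parameter values from the worked example as defaults; the helper names `relu`, `sigmoid` and `N` are illustrative.

```python
import math

def relu(u):
    return max(0.0, u)

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def N(x, w1=0.8, b1=0.1, w2=0.5, b2=0.05):
    """N(x) = sigmoid(w2 * ReLU(w1 * x + b1) + b2), built layer by layer."""
    u1 = w1 * x + b1    # affine
    h = relu(u1)        # ReLU
    u2 = w2 * h + b2    # affine
    return sigmoid(u2)  # sigmoid

base = N(1.0)
gap = abs(N(1.0 + 1e-5) - base)
print(f"N(1) = {base:.4f}")
print(f"|N(1 + 1e-5) - N(1)| = {gap:.2e}")
```

The gap comes out just under $10^{-6}$, in line with the hand calculation: a small input change produces a proportionally small output change.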


Python: Verifying Algebraic Continuity Numerically

We now turn the statements of the algebra-of-continuity theorem into a tiny numerical verifier. Given any function $h$ constructed out of our two building blocks, the code evaluates it at a candidate point $c$, at $c - \varepsilon$, and at $c + \varepsilon$, then reports whether all three values agree to within tolerance. If they do, the function passes the numerical check at $c$. A three-point sample is evidence rather than proof: it can be fooled by a very steep continuous function or by a jump smaller than the tolerance.

Algebra-of-Continuity Verifier — Interactive
🐍algebra_of_continuity.py
Line 1: import math

Pulls in Python's scalar math library, which gives us math.sin so we can define g(x) = sin(x). Only scalar functions are needed here; NumPy is not required for this single-point verification.

EXECUTION STATE
math = Standard-library module for scalar math. Provides sin, cos, exp, log, sqrt, pi, e. It ships with Python, so no pip install is needed.
Line 3: # Two continuous building blocks (comment)

Narrative marker: we begin by declaring two functions that are known continuous on all of ℝ. Every new function we assemble from them will inherit continuity automatically — that's the theorem the code is about to verify.

Line 4: def f(x): → x²

Defines f(x) = x². Polynomials are continuous everywhere because they are built out of (a) the identity x, which is continuous, and (b) products and sums, which preserve continuity.

EXECUTION STATE
⬇ input: x = A real number. Example: x = 1.5. The function is defined for every real input with no exceptions.
⬆ returns = x * x, a float. Example: f(1.5) = 1.5 × 1.5 = 2.25.
→ why continuous? = x² = x · x. Both factors are the identity, which is trivially continuous. The product rule for continuity then gives continuity of x².
Line 5: Docstring (f is a polynomial)

Human-readable note: this is a polynomial, continuous on all of ℝ. Picked up by help(f) and IDE tool-tips so the intent is visible without re-reading the body.

Line 6: return x * x

For scalar x the * operator is plain float multiplication. This single line IS the polynomial.

EXECUTION STATE
* (float multiply) = Python operator: multiplies two numbers (floats or ints).
Example = x = 1.5 → x * x = 2.25
Line 8: def g(x): → sin(x)

Defines g(x) = sin(x). Trigonometric functions are continuous everywhere on ℝ. (Proof comes from bounding |sin(a) − sin(b)| ≤ |a − b|, which forces continuity at every point.)

EXECUTION STATE
⬇ input: x = A real number — any angle in radians.
⬆ returns = math.sin(x), a float in [−1, 1]. Example: g(1.5) = sin(1.5) ≈ 0.997495.
📚 math.sin = Standard-library scalar sine. Accepts radians (not degrees). sin(0) = 0, sin(π/2) = 1, sin(π) ≈ 1.22e-16 (tiny float noise).
Line 9: Docstring (g is trigonometric)

Records that g is continuous on all of ℝ — another building block.

Line 10: return math.sin(x)

Delegate to the standard library. math.sin is implemented in C, fast and accurate to about 15 decimal digits.

Line 12: # Numerical continuity probe (comment)

Introduces the helper that actually tests continuity numerically. Continuity at c means h(c ± ε) → h(c) as ε → 0; we pick a small ε and check both sides.

Line 13: def is_continuous_at(h_fn, c, eps=1e-5, tol=1e-3)

A general-purpose continuity probe. Given any function h_fn and a point c, it evaluates h at c, c − ε, c + ε and decides whether all three are within tol of each other.

EXECUTION STATE
⬇ input: h_fn = The function under test. Must accept a float and return a float. In our run we will pass lambda x: f(x) + g(x), etc.
⬇ input: c = The candidate point where continuity is being checked. Example: c = 1.5.
⬇ input: eps = 1e-5 = Step size away from c on each side. Small enough to resolve fine behavior; large enough to avoid 1e-15 floating-point cancellation. Default 1e-5 is the usual sweet spot.
⬇ input: tol = 1e-3 = Tolerance on the absolute gap between h(c) and h(c ± ε). Anything below this is counted as 'equal'. Default 1e-3 works for smooth functions with moderate slope.
⬆ returns = (bool, v_c, v_left, v_right) — continuous-flag plus the three sample values so the caller can report them.
Line 14: Docstring (the continuity test)

Tells the reader: we compare h(c) with h(c ± ε) and pass if both gaps are below tol.

Line 15: v_c = h_fn(c)

Evaluate h at the candidate point itself. For h = f + g at c = 1.5 we get 1.5² + sin(1.5) ≈ 2.25 + 0.997495 = 3.247495.

EXECUTION STATE
v_c = The value of h at c. Example (sum): 3.247495.
Line 16: v_left = h_fn(c - eps)

Evaluate just below c. For h = f + g at c = 1.5, eps = 1e-5: h(1.49999) ≈ 1.49999² + sin(1.49999) ≈ 3.247464.

EXECUTION STATE
c - eps = 1.5 − 1e-5 = 1.49999.
v_left (example for sum) = 3.247464
Line 17: v_right = h_fn(c + eps)

Mirror of the previous line — sample just above c. Example: h(1.50001) ≈ 3.247526.

EXECUTION STATE
v_right (example for sum) = 3.247526
Line 18: gap_L = abs(v_c - v_left)

Absolute distance between the centre and the left probe. For the sum example: |3.247495 − 3.247464| = 3.1e-5 — comfortably below tol = 1e-3.

EXECUTION STATE
📚 abs() = Python built-in: returns the non-negative magnitude. abs(-0.003) = 0.003.
gap_L (example) = ≈ 3.1e-5
Line 19: gap_R = abs(v_c - v_right)

Right-side absolute distance. Same interpretation. For the sum example: ≈ 3.1e-5.

Line 20: return gap_L < tol and gap_R < tol, v_c, v_left, v_right

Compare both gaps against the tolerance, AND them together, and bundle the whole result into a tuple. Python packs comma-separated returns into a tuple automatically.

EXECUTION STATE
→ example return = (True, 3.247495, 3.247464, 3.247526)
Line 22: # Build new functions (comment)

Transition to the interesting part: assembling new functions out of f and g using the algebra-of-continuity theorems (sum, product, composition).

Line 23: sum_fn = lambda x: f(x) + g(x)

One-line anonymous function. Evaluates f at x, evaluates g at x, adds. By the sum rule, sum_fn is continuous wherever both f and g are continuous — i.e. everywhere on ℝ.

EXECUTION STATE
📚 lambda = Python's anonymous-function syntax. 'lambda params: expr' creates a callable without giving it a def block. Useful for one-liners.
Example = sum_fn(1.5) = 2.25 + 0.997495 = 3.247495
Line 24: product_fn = lambda x: f(x) * g(x)

Product of the two building blocks. Continuous wherever both factors are, by the product rule.

EXECUTION STATE
Example = product_fn(1.5) = 2.25 × 0.997495 = 2.244364
Line 25: compose_fn = lambda x: g(f(x))

Composition (g ∘ f)(x) = sin(x²). Continuous because f is continuous at x and g is continuous at f(x). The outer function lands on a continuous argument, so the chain stays continuous.

EXECUTION STATE
Example = compose_fn(1.5) = sin(1.5²) = sin(2.25) ≈ 0.778073
→ why the composition rule? = If x_n → c then f(x_n) → f(c) (f continuous at c), and then g(f(x_n)) → g(f(c)) (g continuous at f(c)). The sequence definition of continuity composes cleanly.
Line 27: # Probe at x = 1.5 (comment)

We pick c = 1.5 because sin(1.5) ≠ 0, so the quotient f/g would also be fine here — although we do not test the quotient to keep the demo short.

Line 28: c = 1.5

The test point.

EXECUTION STATE
c = 1.5
Lines 29-31: for name, h_fn in [...]: (loop over the three new functions)

Python-idiomatic for loop that iterates through three (name, function) tuples. Each pass binds name and h_fn to one of the constructed functions.

LOOP TRACE · 3 iterations
name='f + g', h_fn=sum_fn
h_fn(c) = 3.247495
h_fn(c − ε) = 3.247464
h_fn(c + ε) = 3.247526
verdict = continuous ✓
name='f * g', h_fn=product_fn
h_fn(c) = 2.244364
h_fn(c − ε) = 2.244332
h_fn(c + ε) = 2.244395
verdict = continuous ✓
name='g o f', h_fn=compose_fn
h_fn(c) = 0.778073
h_fn(c − ε) = 0.778092
h_fn(c + ε) = 0.778054
verdict = continuous ✓
Line 32: ok, v, L, R = is_continuous_at(h_fn, c)

Call the probe and immediately unpack the returned tuple into four named variables. Python's tuple unpacking lets us read the result as 'ok, value, left, right' without writing result[0], result[1], …

EXECUTION STATE
ok = Boolean — True when both gaps are below tol.
v, L, R = h(c), h(c − ε), h(c + ε) — the raw numerical samples.
Line 33: verdict = "continuous" if ok else "DISCONTINUOUS"

Python's conditional expression. The word 'continuous' is chosen if ok is True, otherwise 'DISCONTINUOUS' (uppercase for visual contrast).

EXECUTION STATE
📚 ternary if = Expression-level conditional: 'A if cond else B'. Evaluates to A when cond is truthy, B otherwise. Distinct from the statement-level if/else.
Line 34: print(f"{name:<8} at c={c}: …")

Formatted output. :<8 left-aligns the name in a field 8 characters wide (padding goes on the right); :.6f renders each float with six decimal places.

EXECUTION STATE
Final printed output =
f + g    at c=1.5: h(c)=3.247495  L=3.247464  R=3.247526  -> continuous
f * g    at c=1.5: h(c)=2.244364  L=2.244332  R=2.244395  -> continuous
g o f    at c=1.5: h(c)=0.778073  L=0.778092  R=0.778054  -> continuous
Takeaway = Sums, products, and compositions of continuous building blocks are all continuous — the theorem 'holds' numerically to 4-5 decimal places of agreement.
import math

# Two continuous building blocks
def f(x):
    """f(x) = x^2: polynomial, continuous everywhere on R."""
    return x * x

def g(x):
    """g(x) = sin(x): trigonometric, continuous everywhere on R."""
    return math.sin(x)

# Numerical continuity probe: compare value at c with values on either side
def is_continuous_at(h_fn, c, eps=1e-5, tol=1e-3):
    """Return (ok, h(c), h(c-eps), h(c+eps)); ok is True if |h(c+-eps) - h(c)| < tol on both sides."""
    v_c     = h_fn(c)
    v_left  = h_fn(c - eps)
    v_right = h_fn(c + eps)
    gap_L = abs(v_c - v_left)
    gap_R = abs(v_c - v_right)
    return gap_L < tol and gap_R < tol, v_c, v_left, v_right

# Build new functions from f and g using the algebra of continuity
sum_fn      = lambda x: f(x) + g(x)            # f + g
product_fn  = lambda x: f(x) * g(x)            # f * g
compose_fn  = lambda x: g(f(x))                # (g o f)(x) = sin(x^2)

# Probe at x = 1.5, a "safe" point where sin(x) != 0
c = 1.5
for name, h_fn in [("f + g", sum_fn),
                   ("f * g", product_fn),
                   ("g o f", compose_fn)]:
    ok, v, L, R = is_continuous_at(h_fn, c)
    verdict = "continuous" if ok else "DISCONTINUOUS"
    print(f"{name:<8} at c={c}: h(c)={v:.6f}  L={L:.6f}  R={R:.6f}  -> {verdict}")
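
The listing leaves the quotient untested, so here is a hedged follow-up sketch. It re-declares the same building blocks with a slimmed-down probe that returns only the flag. The names `quotient_fn` and `steep_fn` are illustrative assumptions, and the second function is included to show where a three-point probe can mislead.

```python
import math

def f(x):
    return x * x

def g(x):
    return math.sin(x)

def is_continuous_at(h_fn, c, eps=1e-5, tol=1e-3):
    """Three-point probe: True if h(c - eps) and h(c + eps) are both within tol of h(c)."""
    v_c = h_fn(c)
    return abs(v_c - h_fn(c - eps)) < tol and abs(v_c - h_fn(c + eps)) < tol

quotient_fn = lambda x: f(x) / g(x)   # f / g, a problem wherever sin(x) = 0
steep_fn = lambda x: 1000.0 * x       # continuous everywhere, but with slope 1000

print("f/g at 1.5     :", is_continuous_at(quotient_fn, 1.5))
print("f/g at pi      :", is_continuous_at(quotient_fn, math.pi))
print("1000x at 0     :", is_continuous_at(steep_fn, 0.0))
print("1000x, tol=0.1 :", is_continuous_at(steep_fn, 0.0, tol=0.1))
```

The quotient passes at the safe point and is flagged at $\pi$, where the denominator is zero up to float noise. The steep line is flagged with the default tolerance even though it is continuous, and passes once `tol` is loosened. That is a reminder that the probe's verdict depends on how `eps` and `tol` compare with the local slope.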

PyTorch: Why Continuous Building Blocks Matter

The punchline of this section in applied terms: a neural network is a long composition of continuous functions. Every affine layer is a polynomial (continuous); every activation we ever use in practice — ReLU, GELU, tanh, sigmoid — is continuous by construction; every loss function we care about (MSE, cross-entropy) is continuous in its inputs. The algebra of continuity is the reason gradient descent has anything to follow.

The snippet below builds the smallest possible two-layer MLP and then verifies continuity numerically by perturbing the input.

A Neural Unit is a Stack of Continuous Maps — Interactive
🐍continuous_neural_unit.py
Line 1: import torch

PyTorch is a tensor library with automatic differentiation. We use it here purely for tensor arithmetic; in a real training loop the same code would also record a computation graph for backprop.

EXECUTION STATE
torch = Deep-learning tensor library. Provides torch.Tensor (multidimensional array with autograd), torch.nn (layer modules), torch.optim (SGD, Adam, …), torch.nn.functional (stateless ops like relu, sigmoid).
Line 2: import torch.nn.functional as F

Imports the stateless functional API under the alias F. Gives us F.relu, F.sigmoid, F.softmax, etc. — pure functions with no learnable parameters of their own.

EXECUTION STATE
torch.nn.functional = A namespace of plain functions (no nn.Module wrappers). F.relu(x) = max(0, x) element-wise; F.sigmoid(x) = 1 / (1 + exp(-x)).
Line 4: # Tiny MLP neuron (comment)

Narrative anchor: we will build the smallest possible two-layer MLP — two input features, two hidden units, one output — and observe that the whole thing is a composition of continuous maps.

Line 7: W1 = torch.tensor([[0.8, -0.3], [0.2, 0.6]])

First-layer weight matrix, shape (2, 2). Multiplying W1 by a length-2 input vector produces a length-2 hidden pre-activation. Values are fixed (not trainable) so the example stays deterministic.

EXECUTION STATE
📚 torch.tensor() = Factory that builds a tensor from nested Python lists. Here we pass a 2×2 list → tensor of shape (2, 2), default dtype float32.
⬇ arg: [[0.8, -0.3], [0.2, 0.6]] = The numeric entries. Row 0 = [0.8, -0.3] controls the first hidden unit, row 1 = [0.2, 0.6] controls the second.
⬆ result: W1 (2×2) =
     col0   col1
row0 0.80  -0.30
row1 0.20   0.60
Line 9: b1 = torch.tensor([0.1, -0.2])

First-layer bias, shape (2,). After W1 @ x we add this bias component-wise. Its purpose is to shift the pre-activation so the ReLU threshold can fall in different places for different units.

EXECUTION STATE
⬆ result: b1 = [0.1, -0.2] — a 1-D tensor of length 2.
Line 10: W2 = torch.tensor([[0.5, -0.4]])

Second-layer weight matrix, shape (1, 2). It takes the 2-D hidden vector down to a 1-D output.

EXECUTION STATE
⬆ result: W2 (1×2) =
     col0   col1
row0 0.50  -0.40
Line 11: b2 = torch.tensor([0.05])

Output bias, shape (1,).

EXECUTION STATE
⬆ result: b2 = [0.05]
Line 13: def neuron(x):

Defines the forward pass. Each line inside is one continuous operation; stacked together, they form a single continuous function ℝ² → (0, 1).

EXECUTION STATE
⬇ input: x = A 1-D tensor of shape (2,). Example: torch.tensor([1.0, -0.5]).
⬆ returns = A 1-D tensor of shape (1,) with a value in (0, 1) — the neuron's probability estimate.
Line 14: Docstring (hidden layer + output)

States the architecture: one ReLU hidden layer followed by a sigmoid output head.

Line 15: z = W1 @ x + b1 (affine pre-activation)

Matrix-vector multiply plus bias. This is an affine map, which is a linear map plus a translation; both are continuous (and in fact infinitely smooth).

EXECUTION STATE
📚 @ (matmul) = PyTorch's matrix-multiplication operator. For W1 shape (2, 2) and x shape (2,) the result has shape (2,).
W1 @ x = [0.8×1.0 + (-0.3)×(-0.5), 0.2×1.0 + 0.6×(-0.5)] = [0.8 + 0.15, 0.2 - 0.3] = [0.95, -0.10]
⬆ result: z = W1 @ x + b1 = [0.95 + 0.1, -0.10 + (-0.2)] = [1.05, -0.30]
Line 16: h = F.relu(z) (ReLU activation)

Applies ReLU element-wise: max(0, z_i). ReLU is continuous on all of ℝ: at z = 0 both one-sided limits equal 0 and the value is 0. It has a corner (not differentiable), but the function itself is perfectly continuous.

EXECUTION STATE
📚 F.relu() = Element-wise ReLU: zeroes out negatives, passes positives through unchanged. Equivalent to torch.clamp(x, min=0.0).
⬇ arg: z = [1.05, -0.30]
⬆ result: h = [1.05, 0.00] — the negative entry gets clamped to zero.
→ continuous? yes = L = lim_{z→0-} max(0, z) = 0. R = lim_{z→0+} max(0, z) = 0. Value at 0 is 0. All three agree → continuous at the corner.
Line 17: y = W2 @ h + b2 (second affine)

Another affine layer, compressing the 2-D hidden vector into a single output scalar.

EXECUTION STATE
W2 @ h = 0.5×1.05 + (-0.4)×0.0 = 0.525
⬆ result: y = W2 @ h + b2 = [0.525 + 0.05] = [0.575]
Line 18: return torch.sigmoid(y)

Final squish. σ(y) = 1 / (1 + e^-y) is C^∞ (smooth of every order), hence continuous everywhere. Output lands safely in (0, 1) so it can be interpreted as a probability.

EXECUTION STATE
📚 torch.sigmoid() = σ(y) = 1 / (1 + exp(-y)). Monotone increasing from 0 (at y = -∞) to 1 (at y = +∞). σ(0) = 0.5. σ is infinitely differentiable.
⬇ arg: y = [0.575]
⬆ result = σ(0.575) = 1 / (1 + e^{-0.575}) = 0.639916
Line 20: # Forward pass (comment)

Transitions from definition to execution. Time to call the function.

Line 21: x = torch.tensor([1.0, -0.5])

Input feature vector — two numbers standing in for two real-world measurements.

EXECUTION STATE
x = tensor([1.0, -0.5]) — shape (2,)
Line 22: out = neuron(x)

Runs the forward pass. Each of the inner ops is continuous, so the composed function is continuous — the output depends continuously on x.

EXECUTION STATE
⬆ out = tensor([0.639916])
Line 23: print(f"neuron(x) = {out.item():.6f}")

Reports the scalar output. .item() pulls the Python float out of the 1-element tensor.

EXECUTION STATE
Printed = neuron(x) = 0.639916
Line 25: # Numerical continuity check (comment)

Now the moment of truth: perturb x by ±ε in its first coordinate and measure how much the output moves. A continuous function must move only a tiny amount when x moves a tiny amount.

Line 26: eps = 1e-5

Step size for the perturbation.

EXECUTION STATE
eps = 1e-5 = 0.00001
Line 27: x_left = x + torch.tensor([-eps, 0.0])

Shifts the first coordinate of x down by ε, leaving the second coordinate untouched.

EXECUTION STATE
x_left = tensor([0.99999, -0.5])
Line 28: x_right = x + torch.tensor([eps, 0.0])

Mirror shift in the positive direction.

EXECUTION STATE
x_right = tensor([1.00001, -0.5])
Lines 29-31: print(f"|neuron(x+-eps) - neuron(x)| = ...")

Prints the left and right response gaps. For a continuous map with finite slope we expect both gaps to be about ε times the local slope. Here the slope with respect to the first input is roughly 0.09, so both gaps should be close to 9e-7.

EXECUTION STATE
Printed output = |neuron(x+-eps) - neuron(x)| = two values near 9e-07 (the trailing digits depend on float32 rounding)
Takeaway = A 1e-5 perturbation in x produces a perturbation of about 1e-6 in the output. The neuron is numerically continuous, which is a prerequisite for gradient-based optimization to work.
import torch
import torch.nn.functional as F

# A tiny MLP "neuron": y = sigmoid( W2 @ ReLU(W1 @ x + b1) + b2 )
# Every layer is a composition of continuous maps, so the whole thing
# is continuous -> gradients flow -> the network can be trained.
W1 = torch.tensor([[0.8, -0.3],
                   [0.2,  0.6]])
b1 = torch.tensor([0.1, -0.2])
W2 = torch.tensor([[0.5, -0.4]])
b2 = torch.tensor([0.05])

def neuron(x):
    """One hidden layer (ReLU) + output (sigmoid)."""
    z = W1 @ x + b1              # affine: continuous (polynomial)
    h = F.relu(z)                # ReLU: continuous everywhere
    y = W2 @ h + b2              # affine: continuous
    return torch.sigmoid(y)      # sigmoid: C^infty, squashes to (0,1)

# Forward pass at a single input
x  = torch.tensor([1.0, -0.5])
out = neuron(x)
print(f"neuron(x) = {out.item():.6f}")

# Numerical continuity check on the composed map
eps = 1e-5
x_left  = x + torch.tensor([-eps, 0.0])
x_right = x + torch.tensor([ eps, 0.0])
print(f"|neuron(x+-eps) - neuron(x)| = "
      f"{abs(neuron(x_left).item() - out.item()):.2e},  "
      f"{abs(neuron(x_right).item() - out.item()):.2e}")
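
As a cross-check that does not need PyTorch, the same forward pass can be re-implemented in double precision. This is a sketch under assumptions: `neuron64` is a hypothetical pure-Python stand-in for `neuron`, and the slope formula applies only at this input, where hidden unit 0 is active and hidden unit 1 is clamped to zero.

```python
import math

W1 = [[0.8, -0.3], [0.2, 0.6]]
b1 = [0.1, -0.2]
W2 = [0.5, -0.4]
b2 = 0.05

def neuron64(x):
    """Float64, pure-Python version of the same forward pass."""
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [max(0.0, zi) for zi in z]                    # ReLU
    y = sum(w * hi for w, hi in zip(W2, h)) + b2
    return 1.0 / (1.0 + math.exp(-y))                 # sigmoid

x = [1.0, -0.5]
out = neuron64(x)

# Chain rule along the active path: sigmoid'(y) * W2[0] * 1 * W1[0][0].
slope = out * (1.0 - out) * W2[0] * W1[0][0]

eps = 1e-5
gap = abs(neuron64([x[0] + eps, x[1]]) - out)
print(f"neuron64(x) = {out:.6f}")
print(f"local slope in x[0] is about {slope:.4f}")
print(f"gap = {gap:.3e}   eps * slope = {eps * slope:.3e}")
```

The measured gap matches `eps * slope` closely, which explains the size of the response: the output moves by the perturbation times the local slope. A float32 run can differ in the trailing digits because of rounding.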

Where These Properties Show Up

Domain | Property used | Concrete example
--- | --- | ---
Physics (conservation laws) | Algebra + composition | Kinetic energy ½ m v² composed with a continuous velocity profile v(t) is continuous wherever the profile is continuous.
Optimisation / economics | Boundedness + EVT | A continuous cost function on a compact (closed & bounded) feasible set is guaranteed to achieve a global minimum, so the optimisation problem has a solution.
Numerical analysis | Composition + EVT | Error bounds for polynomial and spline approximation rely on f being continuous on [a, b] so its extreme values exist as worst-case bounds.
Signal processing | Sum, product, composition | Filters are rational transfer functions composed with the input signal; continuous inputs yield continuous outputs away from pole singularities.
Machine learning | All of them | Deep networks are compositions of continuous layers. Gradient descent only works because the loss is continuous (and almost everywhere differentiable) in the parameters.
Control theory | EVT on a closed horizon | The maximum control effort over a finite time window [0, T] is attained, so there is no infinite-effort pathology if the dynamics are continuous.

Common Pitfalls

  • Forgetting the quotient's fine print. $f/g$ is continuous wherever $f$ and $g$ are continuous and $g \ne 0$. Drop the second clause and you can incorrectly certify $\tan(x) = \sin(x)/\cos(x)$ as continuous on ℝ. It is not, because $\cos(x)$ vanishes at $\pi/2 + k\pi$ for every integer $k$.
  • Composition hypothesis at the inner output, not the inner input. The composition rule asks that the outer function be continuous at $g(c)$, not at $c$. A common mistake is to prove $g$ is continuous at some point and then forget to check where $g$ actually lands.
  • Applying EVT to an open interval. "Continuous on $(a, b)$" is not enough. The classic counter-example is $f(x) = x$ on $(0, 1)$: the supremum 1 is not attained. Closedness is the hypothesis that prevents the escape.
  • Confusing continuity with differentiability. ReLU is continuous but not differentiable at 0. The algebra-of-continuity rules do not imply anything about smoothness; they only say the combined function has no jumps, holes, or blowups.
  • Assuming monotonicity is free. The inverse-continuity theorem requires strict monotonicity on top of continuity. Forget it and you cannot invert $x^{2}$ globally: you have to pick a branch first (positive or negative), each of which is monotone.

Summary

  1. Algebra of continuity. Sums, differences, scalar multiples, and products of continuous functions are continuous. Quotients are continuous except where the denominator vanishes.
  2. Composition rule. If $g$ is continuous at $c$ and $f$ is continuous at $g(c)$, then $f \circ g$ is continuous at $c$.
  3. Inverse rule. A continuous and strictly monotone function on $[a,b]$ has a continuous inverse on its image, giving us $\sqrt{x}$, $\ln x$, $\arctan x$ for free.
  4. Boundedness. Continuous on $[a, b]$ ⇒ bounded on $[a, b]$. Both interval hypotheses (closed and bounded) are needed, along with continuity itself.
  5. Extreme Value Theorem. On a closed bounded interval, a continuous function attains its maximum and minimum at actual points of the interval. This underwrites the existence of solutions to all the optimisation problems you will meet.
  6. Applied payoff. A neural network is a composition of affine maps and continuous activations. The algebra rules certify continuity of the whole in one sentence — and that is the condition that lets gradient descent work.

Next, §3.5 turns continuity into a tool: the Intermediate Value Theorem, which says a continuous function can't jump over a value it needs to reach — the conceptual backbone of root-finding algorithms.
