Learning Objectives
By the end of this section you will be able to:
- State and justify the algebra of continuity — sums, differences, products, quotients, and compositions of continuous functions are continuous.
- Certify that a function is continuous by decomposing it into known-continuous building blocks, instead of rerunning the three-condition test from §3.2 from scratch.
- Predict the only ways an algebraic combination can fail: a denominator vanishing, or a composition whose inner output lands at a point outside the outer function's domain.
- Understand the Boundedness and Extreme Value properties of continuous functions on closed intervals, and explain what breaks if the interval is open or the function has a singularity.
- Recognise why every well-behaved neural network unit is a stack of continuous pieces — and why that is what makes gradient descent possible.
Why Bother With Properties?
So far in Chapter 3 we have treated continuity as a three-part test performed at one point at a time: $f(c)$ is defined, $\lim_{x \to c} f(x)$ exists, and the two agree. That is fine for diagnosing a single suspect point. But nothing you will actually meet in applied mathematics is supplied as a raw point-by-point definition. Real functions are built out of pieces: polynomials multiplied together, trigonometric terms added to exponentials, neural-network layers composed one on top of the next.
It would be absurd to rerun the three-condition test at every real number for every new formula. What we need are closure rules: if $f$ and $g$ are continuous at $c$, what can we say about $f + g$, $f - g$, $f \cdot g$, $f / g$? The answer, derived below from the limit laws of §2.6, is as clean as it gets: continuity is preserved under every reasonable algebraic operation, with one caveat for quotients. That is why we almost never compute limits from scratch: we just point at the building blocks.
The big idea. Continuity propagates. Sums, products, quotients (where the denominator survives), and compositions of continuous functions are continuous. Once you have a small library of building blocks that you know are continuous, you can certify enormous classes of functions for free.
The Algebra of Continuity
Suppose $f$ and $g$ are continuous at $c$. Then each of the following combinations inherits continuity at $c$:
| Combination | Where continuous | Why |
|---|---|---|
| f + g | Everywhere f and g are continuous. | Limit laws: lim (f + g) = lim f + lim g = f(c) + g(c). |
| f − g | Everywhere f and g are continuous. | Limit laws again; special case of sum with coefficient −1 on g. |
| k·f (constant k) | Everywhere f is continuous. | Scalar multiple is a product with the constant function g(x) ≡ k, which is trivially continuous. |
| f · g | Everywhere f and g are continuous. | Product law: lim (f·g) = (lim f)(lim g). |
| f / g | At any c where f and g are continuous and g(c) ≠ 0. | Quotient law, but only when the denominator is nonzero. |
Every one of these is a direct consequence of the limit laws of §2.6: the limit of a sum is the sum of the limits, and so on. Because continuity at $c$ is the statement $\lim_{x \to c} f(x) = f(c)$, anything the limit laws preserve, continuity preserves as well.
The one thing that can fail: a vanishing denominator
The quotient rule has an escape clause. If $g(c) = 0$ then $f/g$ is not even defined at $c$, so continuity there is moot, and one of two things happens just off $c$:
- If $f(c) \neq 0$, the quotient blows up, producing an infinite discontinuity (§3.3, Type 3).
- If $f(c) = 0$ as well, we get the indeterminate form $0/0$. The limit might still exist (a removable hole), but we can only know by further analysis, usually factoring or L'Hôpital's rule.
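The two cases can be seen numerically. The sketch below uses two hypothetical examples chosen for illustration, both with a denominator that vanishes at $x = 1$: in `removable` the numerator vanishes too, in `blow_up` it does not.

```python
def removable(x):
    # (x^2 - 1) / (x - 1): numerator and denominator both vanish at x = 1 (0/0 form)
    return (x**2 - 1) / (x - 1)

def blow_up(x):
    # 1 / (x - 1): only the denominator vanishes at x = 1
    return 1 / (x - 1)

# Probe closer and closer to x = 1 from both sides.
for h in (1e-1, 1e-3, 1e-5):
    print(f"h={h:g}  removable: {removable(1 - h):.6f}, {removable(1 + h):.6f}  "
          f"blow_up: {blow_up(1 - h):.1f}, {blow_up(1 + h):.1f}")
```

The `removable` values settle near 2 from both sides (factoring gives $x + 1$ away from $x = 1$), while the `blow_up` values grow without bound and with opposite signs.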
Composition — Continuity Survives Nesting
The composition rule is arguably the most powerful of all: it is what lets us build towers of continuous functions, such as nested elementary functions or a deep neural network.
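Statement. If $g$ is continuous at $c$ and $f$ is continuous at $g(c)$, then $f \circ g$ is continuous at $c$.
The proof is a direct chase of sequences. If $x_n \to c$, then by continuity of $g$ at $c$ we get $g(x_n) \to g(c)$. By continuity of $f$ at $g(c)$ applied to this new sequence, $f(g(x_n)) \to f(g(c))$. That is exactly the sequential definition of $f \circ g$ being continuous at $c$.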
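What this buys you. To prove that a function built by feeding a polynomial into an outer function that is continuous on ℝ is itself continuous on ℝ, you no longer compute a limit. You observe: the inner function is a polynomial, continuous on ℝ; the outer function is continuous on ℝ; composition of continuous functions is continuous. Done. The chain can be of any length: a tower of three or more such layers is continuous on all of ℝ by the same argument, applied layer by layer.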
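The sequence chase in the proof can be mimicked numerically. The sketch below is illustrative only: the helper `compose` and the three-layer tower $e^{\sin(x^2 + 1)}$ are hypothetical choices, not examples taken from the text above.

```python
import math

def compose(*fns):
    """Return the composition fns[0](fns[1](...fns[-1](x)...))."""
    def composed(x):
        for fn in reversed(fns):   # apply the innermost function first
            x = fn(x)
        return x
    return composed

inner = lambda x: x**2 + 1                  # polynomial, continuous on R
tower = compose(math.exp, math.sin, inner)  # exp(sin(x^2 + 1))

c = 0.5
# x_n = c + 1/n tends to c; the gap |tower(x_n) - tower(c)| should shrink with it.
gaps = [abs(tower(c + 1 / n) - tower(c)) for n in (10, 100, 1000, 10000)]
print(gaps)
```

Shrinking gaps are consistent with continuity at $c$; they are numerical evidence rather than a proof, which is what the composition rule supplies.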
Interactive: Algebra-of-Continuity Explorer
Pick two building-block functions $f$ and $g$, choose an operation, and slide the probe to any $c$. The violet curve is the combined function. For division, dashed red verticals mark the problem points; everywhere else, continuity is inherited for free.
Inverse and Monotone Functions
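One more closure property is worth pinning down because it powers a huge amount of calculus to come: if $f$ is continuous and strictly monotone on an interval $I$, then its inverse $f^{-1}$ is continuous on the image $f(I)$.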
Why it matters. This one line guarantees the continuity of all of the following by just pairing them with the continuity of their generators:
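- $\sqrt{x}$ is continuous on $[0, \infty)$: it's the inverse of $x^2$ on $[0, \infty)$.
- $\ln x$ is continuous on $(0, \infty)$: inverse of $e^x$.
- $\arctan x$ is continuous on ℝ: inverse of $\tan x$ on $(-\pi/2, \pi/2)$.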
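One way to watch inverse continuity at work is to invert a strictly increasing function by bisection and nudge the target value. This is a minimal sketch under stated assumptions: `inverse_by_bisection` is a hypothetical helper, and it assumes the function is continuous and strictly increasing on the bracket `[lo, hi]`.

```python
def inverse_by_bisection(f, y, lo, hi, tol=1e-12):
    """Solve f(x) = y on [lo, hi], assuming f is continuous and strictly increasing there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid      # the solution lies to the right of mid
        else:
            hi = mid      # the solution lies at or to the left of mid
    return (lo + hi) / 2

square = lambda x: x * x  # strictly increasing on [0, 10]

root_2 = inverse_by_bisection(square, 2.0, 0.0, 10.0)
root_2_nudged = inverse_by_bisection(square, 2.0 + 1e-6, 0.0, 10.0)
print(root_2, root_2_nudged - root_2)
```

A tiny change in the target $y$ produces a tiny change in the recovered $x$, which is what continuity of the inverse predicts. On an interval where `square` is not monotone, such as $[-10, 10]$, the bisection logic would no longer be valid.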
Boundedness on a Closed Interval
Up to now every rule has been local: something is continuous at a single point $c$. The next two theorems make a global statement about what continuous functions do on a full interval.
Boundedness Theorem. If $f$ is continuous on a closed, bounded interval $[a, b]$, then $f$ is bounded there: some constant $M$ satisfies $|f(x)| \le M$ for every $x \in [a, b]$.
Intuition. A continuous graph drawn without lifting the pen, over a finite horizontal window that includes both endpoints, simply cannot escape to $\pm\infty$. The pen has to stay on the page. If it tried to shoot up to $+\infty$ along the way, there would be a point where the function is not defined, or where the limit is infinite, and either would contradict continuity.
Every hypothesis is necessary: the interval must be closed and bounded, and the function must be continuous on it.
| Condition dropped | Counter-example | What goes wrong |
|---|---|---|
| Not closed (open interval) | f(x) = 1/x on (0, 1) | Blows up as x → 0⁺. The continuous function is unbounded because it can chase a limit towards the missing endpoint. |
| Not bounded (infinite interval) | f(x) = x on [0, ∞) | Continuous and runs to +∞. Closedness alone doesn't help if the interval itself is unbounded. |
| Not continuous | f(x) = 1/x patched with f(0) = 0 on [0, 1] | Now defined on the closed interval, but discontinuous at 0, and unbounded. |
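The first counter-example can be probed by sampling. The sketch below uses a hypothetical helper `sampled_max` that takes the largest $|f(x)|$ over an evenly spaced grid, either including or skipping the endpoints; the intervals $[0.1, 1]$ and $(0, 1)$ are illustrative choices.

```python
def sampled_max(f, a, b, n=1000, include_endpoints=True):
    """Largest |f(x)| over an evenly spaced grid on [a, b], or on (a, b) if endpoints are skipped."""
    idx = range(0, n + 1) if include_endpoints else range(1, n)
    return max(abs(f(a + (b - a) * i / n)) for i in idx)

reciprocal = lambda x: 1 / x

# Closed interval [0.1, 1]: the sampled maximum stabilises as the grid is refined.
closed = [sampled_max(reciprocal, 0.1, 1.0, n) for n in (10, 100, 1000)]
# Open interval (0, 1): finer grids reach closer to 0 and the maximum keeps growing.
open_samples = [sampled_max(reciprocal, 0.0, 1.0, n, include_endpoints=False)
                for n in (10, 100, 1000)]
print(closed, open_samples)
```

On the closed interval the sampled maximum does not move as the grid is refined; on the open interval it grows with every refinement, the numerical signature of an unbounded function.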
The Extreme Value Theorem (Preview)
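Boundedness tells us that finite bounds exist. The Extreme Value Theorem says something sharper: those bounds are actually reached. If $f$ is continuous on $[a, b]$, there are points $x_{\min}, x_{\max} \in [a, b]$ with $f(x_{\min}) \le f(x) \le f(x_{\max})$ for every $x \in [a, b]$.
In other words, $f$ attains its maximum and its minimum on the interval, rather than merely approaching them.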
EVT is the workhorse that later guarantees: a continuous loss function defined on a closed, bounded parameter region has an actual minimum; the global maximum of a continuous utility function over a compact feasible set is attained; a classic calculus optimisation problem "find the max of $f$ on $[a, b]$" has an answer you can actually write down. We will prove EVT formally later in §3.6; for now the picture is enough.
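A grid search gives a rough feel for what EVT promises. The sketch below is an approximation, not an implementation of the theorem: `grid_extremes` is a hypothetical helper, and the parabola $x^2 - 2x$ on $[0, 3]$ is an illustrative choice.

```python
def grid_extremes(f, a, b, n=3000):
    """Return ((x_min, f_min), (x_max, f_max)) over n+1 evenly spaced samples of [a, b]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    x_min = min(xs, key=f)   # sample where f is smallest
    x_max = max(xs, key=f)   # sample where f is largest
    return (x_min, f(x_min)), (x_max, f(x_max))

parabola = lambda x: x * x - 2 * x   # continuous on [0, 3]

(lo_x, lo_y), (hi_x, hi_y) = grid_extremes(parabola, 0.0, 3.0)
print(lo_x, lo_y, hi_x, hi_y)
```

The minimum sits at the interior point $x = 1$ and the maximum at the endpoint $x = 3$. EVT is what guarantees that such points exist at all; the grid only locates them approximately.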
Interactive: Bounded & Extreme Value Explorer
Slide the endpoints $a$ and $b$, pick a function, and toggle the interval type. Green dashed line = $\max f$, red dashed line = $\min f$. The coloured dots mark where the extremes are attained. Switch to the $1/x$ preset and straddle $x = 0$ to watch EVT fail: the function is no longer continuous on the interval.
Worked Example — Continuity of a Neural Network Unit
Before running the Python, work through the certification by hand. Try it on your own paper first; unfold only when you've tried a step.
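Certify that $y = \sigma\big(w_2\,\mathrm{ReLU}(w_1 x + b_1) + b_2\big)$ is continuous on ℝ
Start by naming the atomic pieces. There are four:
- $z_1 = w_1 x + b_1$: affine (polynomial of degree 1).
- $a_1 = \max(0, z_1)$: the ReLU activation.
- $z_2 = w_2 a_1 + b_2$: another affine map.
- $y = \sigma(z_2) = 1 / (1 + e^{-z_2})$: the sigmoid.
Next, certify each piece on its own:
- $z_1$, $z_2$: polynomials, continuous on ℝ by the algebra rules.
- $\max(0, z)$ (ReLU): continuous everywhere. The only suspect point is $z = 0$, where $\lim_{z \to 0^-} \max(0, z) = 0$, $\lim_{z \to 0^+} \max(0, z) = 0$, and $\max(0, 0) = 0$. All three agree.
- $\sigma(z) = 1 / (1 + e^{-z})$: built from $e^{-z}$ (continuous on ℝ), addition (continuous), and division (continuous wherever the denominator is nonzero, which is always, since $1 + e^{-z} > 1$). So $\sigma$ is continuous on ℝ.
Finally, chain the compositions from the inside out:
- $z_1 = w_1 x + b_1$ is continuous at every $x$. Its output lands somewhere in ℝ.
- ReLU is continuous at every real number, so it is continuous at $z_1(x)$. Hence $a_1 = \mathrm{ReLU}(z_1(x))$ is continuous at $x$.
- Another affine map and another composition: $z_2 = w_2 a_1 + b_2$ is continuous.
- Finally $\sigma$ is continuous everywhere, so $y = \sigma(z_2)$ is continuous on ℝ.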
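As a numeric check, take concrete values for $w_1$, $b_1$, $w_2$, $b_2$ and an input $x$, then evaluate $z_1$, $a_1$, $z_2$, and $y$ in turn.
Perturb $x$ by a small amount and the output changes by a correspondingly small amount. Continuous.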
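The forward pass and the perturbation check can be sketched in a few lines. The parameter values below are hypothetical, picked only to make the arithmetic easy to follow, and `unit` is an illustrative name for the function $y = \sigma(w_2\,\mathrm{ReLU}(w_1 x + b_1) + b_2)$.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def unit(x, w1, b1, w2, b2):
    """Forward pass y = sigmoid(w2 * relu(w1 * x + b1) + b2), returning every intermediate."""
    z1 = w1 * x + b1      # first affine map
    a1 = relu(z1)         # activation
    z2 = w2 * a1 + b2     # second affine map
    y = sigmoid(z2)       # squashing to (0, 1)
    return z1, a1, z2, y

params = dict(w1=2.0, b1=-1.0, w2=1.5, b2=0.5)   # hypothetical values
x = 1.0
z1, a1, z2, y = unit(x, **params)
_, _, _, y_nudged = unit(x + 1e-3, **params)
print(z1, a1, z2, y, y_nudged - y)
```

With these values $z_1 = 1$, $a_1 = 1$, $z_2 = 2$, and $y = \sigma(2)$; nudging $x$ by $10^{-3}$ moves the output by less than $10^{-3}$.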
Python: Verifying Algebraic Continuity Numerically
We now turn the statements of the algebra-of-continuity theorem into a tiny numerical verifier. Given any function constructed out of our two building blocks, the code evaluates it at a candidate point $c$, at a point just to its left, and at a point just to its right, then reports whether all three values agree to within tolerance. If they do, continuity holds numerically at $c$. This is evidence rather than proof: a jump smaller than the tolerance would slip through.
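A minimal sketch of such a verifier follows. Everything in it is an assumption made for illustration: the helper name `agrees_near`, the step `h`, the tolerance `tol`, and the two building blocks $\sin x$ and $x - 1$.

```python
import math

def agrees_near(fn, c, h=1e-6, tol=1e-4):
    """Heuristic check: fn(c - h), fn(c), fn(c + h) all lie within tol of one another."""
    try:
        left, mid, right = fn(c - h), fn(c), fn(c + h)
    except (ZeroDivisionError, ValueError, OverflowError):
        return False   # fn is not even defined at one of the probes
    return max(left, mid, right) - min(left, mid, right) <= tol

f = math.sin                 # building block 1
g = lambda x: x - 1          # building block 2, vanishes at x = 1

combos = {
    "f + g": lambda x: f(x) + g(x),
    "f * g": lambda x: f(x) * g(x),
    "f / g": lambda x: f(x) / g(x),
}

# Probe each combination at a harmless point (0.5) and at the zero of g (1.0).
report = {name: (agrees_near(fn, 0.5), agrees_near(fn, 1.0)) for name, fn in combos.items()}
print(report)
```

The sum and product pass at both probes; the quotient passes at $0.5$ and fails at $1.0$, where the denominator vanishes, matching the escape clause of the quotient rule.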
PyTorch: Why Continuous Building Blocks Matter
The punchline of this section in applied terms: a neural network is a long composition of continuous functions. Every affine layer is a polynomial (continuous); the standard activations (ReLU, GELU, tanh, sigmoid) are continuous by construction; the usual loss functions (MSE, cross-entropy) are continuous on their domains. Continuity of the whole network is the precondition for gradient descent: small parameter changes produce small loss changes, and where the pieces are also differentiable there is a gradient to follow.
The snippet below builds the smallest possible two-layer MLP and then verifies continuity numerically by perturbing the input.
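As a rough stand-in, here is a sketch of that perturbation check in plain Python rather than PyTorch, so it runs without dependencies. The weights, the sigmoid output, and the name `mlp` are all hypothetical choices made for illustration.

```python
import math

W1 = [[0.5, -1.0], [1.5, 0.25]]   # hidden layer: 2 inputs -> 2 units (hypothetical weights)
B1 = [0.1, -0.2]
W2 = [1.0, -2.0]                  # output layer: 2 hidden units -> 1 output
B2 = 0.3

def mlp(x):
    """Two-layer MLP: affine -> ReLU -> affine -> sigmoid, for a 2-component input x."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, B1)]
    logit = sum(w * h for w, h in zip(W2, hidden)) + B2
    return 1.0 / (1.0 + math.exp(-logit))

x0 = [1.0, 0.5]
base = mlp(x0)
# Shift both input components by eps and record how far the output moves.
shifts = [abs(mlp([x0[0] + eps, x0[1] + eps]) - base) for eps in (1e-1, 1e-2, 1e-3, 1e-4)]
print(base, shifts)
```

Each tenfold reduction in the input perturbation shrinks the output shift by roughly the same factor, as expected for a composition of continuous layers.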
Where These Properties Show Up
| Domain | Property used | Concrete example |
|---|---|---|
| Physics: energy calculations | Algebra + composition | Kinetic energy ½ m v² composed with a continuous velocity profile v(t) → continuous kinetic energy everywhere the profile is continuous. |
| Optimisation / economics | Boundedness + EVT | A continuous cost function on a compact (closed & bounded) feasible set is guaranteed to achieve a global minimum — so the optimisation problem is well-posed. |
| Numerical analysis | Composition + EVT | Error bounds for polynomial and spline approximation rely on f being continuous on [a, b] so its extreme values exist as worst-case bounds. |
| Signal processing | Sum, product, quotient | A rational transfer function is a quotient of polynomials, so it is continuous everywhere except at its poles, where the denominator vanishes. |
| Machine learning | All of them | Deep networks are compositions of continuous layers. Gradient descent only works because the loss is continuous (and almost everywhere differentiable) in the parameters. |
| Control theory | EVT on a closed horizon | The maximum control effort over a finite time window [0, T] is attained — no infinite-effort pathology if dynamics are continuous. |
Common Pitfalls
- Forgetting the quotient's fine print. $f/g$ is continuous wherever f and g are continuous and $g \neq 0$. Drop the second clause and you can incorrectly certify a quotient as continuous on ℝ. For instance, $\tan x = \sin x / \cos x$ is not, because $\cos x$ vanishes at $x = \pi/2 + k\pi$.
- Composition hypothesis at the inner output, not the inner input. The composition rule asks that the outer function $f$ be continuous at $g(c)$, not at $c$. A common mistake is to prove $f$ is continuous at some point and then forget to check where $g(c)$ actually lands.
- Applying EVT to an open interval. "Continuous on $(a, b)$" is not enough. The classic counter-example is $f(x) = x$ on $(0, 1)$: the supremum 1 is not attained. Closedness is the hypothesis that prevents the escape.
- Confusing continuity with differentiability. ReLU is continuous but not differentiable at 0. The algebra-of-continuity rules do not imply anything about smoothness — they only say the combined function has no jumps, holes, or blowups.
- Assuming monotonicity is free. The inverse-continuity theorem requires strict monotonicity on top of continuity. Without it, a function such as $x^2$ cannot be inverted globally: you have to pick a branch first ($x \ge 0$ or $x \le 0$), each of which is monotone.
Summary
- Algebra of continuity. Sums, differences, scalar multiples, and products of continuous functions are continuous. Quotients are continuous except where the denominator vanishes.
- Composition rule. If $g$ is continuous at $c$ and $f$ is continuous at $g(c)$, then $f \circ g$ is continuous at $c$.
- Inverse rule. A continuous and strictly monotone function on an interval has a continuous inverse on its image, giving us $\sqrt{x}$, $\ln x$, $\arctan x$ for free.
- Boundedness. Continuous on $[a, b]$ ⇒ bounded on $[a, b]$. Both hypotheses on the interval, closed and bounded, are needed.
- Extreme Value Theorem. On a closed bounded interval, a continuous function attains its maximum and minimum at actual points of the interval. This underwrites the existence of solutions to optimisation problems posed over closed, bounded domains.
- Applied payoff. A neural network is a composition of affine maps and continuous activations. The algebra rules certify continuity of the whole in one sentence — and that is the condition that lets gradient descent work.
Next, §3.5 turns continuity into a tool: the Intermediate Value Theorem, which says a continuous function can't jump over a value it needs to reach — the conceptual backbone of root-finding algorithms.