Learning Objectives
By the end of this section, you will be able to:
- Read and write piecewise function notation fluently — translating between a list of rules and the graph they produce.
- Decide whether a piecewise function is continuous by checking left and right limits at every breakpoint.
- Re-express $|x|$ and its variants as explicit two-piece linear functions, and solve equations like $|2x - 3| = 5$ by case analysis.
- Visualize the V-shape transformations $y = a|x - h| + k$ and read off the vertex, slopes, and reflections at a glance.
- Recognize the Heaviside step, sign function, ReLU, and tax brackets as piecewise functions sharing one structural idea.
- Anticipate the consequences of a corner — non-differentiability — and connect this to the dying-ReLU phenomenon in deep learning.
The Big Picture: One Function, Many Rules
"Reality is piecewise. Math gets neat when we admit it."
Most functions in school appear as a single formula that applies to every input. But many natural rules are not like that. A tax authority applies one rate up to one threshold, a different rate above it. A thermostat is OFF below a setpoint and ON above. A neuron in a deep network is silent for negative inputs and linear for positive ones. A car's brakes do nothing until you press them.
These rules all share one shape: different formulas on different parts of the input domain. The function that captures this idea is called a piecewise function. The simplest and most famous instance is the absolute value, $|x|$, which is $-x$ on one side and $x$ on the other.
The core insight
A piecewise function is not a new kind of object. It is an old kind of object — a function — that we describe with a switchboard instead of a single algebraic expression. The switchboard says: "If x lives in region 1, apply rule 1; if x lives in region 2, apply rule 2; …" Every input still produces exactly one output.
An everyday analogy
Think about your phone bill. The first 100 minutes are free; the next 500 minutes are billed at one per-minute rate; everything beyond that is billed at another. The cost as a function of minutes is not one formula: it is three formulas stitched together. That stitch is what piecewise notation makes precise.
Mathematical Definition
A piecewise function is a function whose rule changes depending on which subset of the domain its input lies in. The general form with $n$ pieces is
$$f(x) = \begin{cases} f_1(x) & \text{if } x \in D_1 \\ f_2(x) & \text{if } x \in D_2 \\ \;\vdots \\ f_n(x) & \text{if } x \in D_n \end{cases}$$
with two non-negotiable rules:
- Coverage. The domains $D_1, \dots, D_n$ together cover every input the function is supposed to accept.
- No conflict. The domains do not overlap. (Or, if they do overlap, the rules agree on the overlap so there is no ambiguity.)
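The switchboard idea and its two rules can be sketched in a few lines of Python. This is only an illustration: the helper name `make_piecewise` and the (condition, rule) representation are assumptions of this sketch, not notation used elsewhere in the section.

```python
from typing import Callable, List, Tuple

# A piece is a (condition, rule) pair: "if condition(x) holds, apply rule(x)".
Piece = Tuple[Callable[[float], bool], Callable[[float], float]]

def make_piecewise(pieces: List[Piece]) -> Callable[[float], float]:
    """Build f from (condition, rule) pairs, enforcing coverage and no conflict."""
    def f(x: float) -> float:
        matches = [rule for cond, rule in pieces if cond(x)]
        if not matches:
            raise ValueError(f"coverage violated: no piece accepts x = {x}")
        values = {rule(x) for rule in matches}
        if len(values) > 1:
            raise ValueError(f"conflict: overlapping pieces disagree at x = {x}")
        return values.pop()
    return f

# |x| written as a two-piece switchboard.
abs_pw = make_piecewise([
    (lambda x: x < 0, lambda x: -x),
    (lambda x: x >= 0, lambda x: x),
])

print(abs_pw(-3.0), abs_pw(0.0), abs_pw(2.5))
```

Overlapping pieces are tolerated only when they return the same value, which mirrors the parenthetical in the no-conflict rule.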
The breakpoints
The boundary $x$-values where one piece ends and the next begins are called breakpoints (or knots). Whether the breakpoint belongs to the left piece, the right piece, or both is part of the function's definition, and we mark it visually with a closed dot (included) or an open dot (approached but not taken).
The two questions to ask at every breakpoint
(1) Which side's rule wins at the breakpoint itself? (2) Do the two side-rules produce the same y-value there? If the answer to (2) is "yes," the function is continuous at the breakpoint. If "no," there is a jump discontinuity.
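Question (2) is easy to mechanize. The sketch below assumes each side-rule is itself a continuous formula, so evaluating it at the breakpoint gives the value that side approaches; the function name `classify_breakpoint` is made up for this illustration.

```python
import math

def classify_breakpoint(left_rule, right_rule, c, tol=1e-9):
    """Compare the two side-rules at the breakpoint c."""
    left_value = left_rule(c)    # value the left piece heads toward
    right_value = right_rule(c)  # value the right piece heads toward
    if math.isclose(left_value, right_value, abs_tol=tol):
        return "continuous"
    return f"jump of size {abs(right_value - left_value):g}"

# |x|: the rules -x and x agree at 0.
print(classify_breakpoint(lambda x: -x, lambda x: x, 0.0))
# A unit step: the rules 0 and 1 disagree at 0.
print(classify_breakpoint(lambda x: 0.0, lambda x: 1.0, 0.0))
```

The check says nothing about question (1): which piece owns the breakpoint is still decided by the inequalities in the definition.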
Interactive Piecewise Playground
Switch between the presets in the dropdown, then try Custom 2-piece and drag the sliders. Watch the green/red banner at the bottom: it tells you when the two pieces agree at the breakpoint (continuous) and when they disagree (jump). Pay special attention to filled vs hollow dots — they are how we visually encode which side "owns" the breakpoint value.
[Interactive figure: piecewise function playground. Preset shown: two rays meeting at the origin with slopes $-1$ and $+1$; the corner is at $x = 0$.]
Tip: filled dot = value included at that x; hollow dot = value approached but not taken. Continuity fails exactly when the filled-dot and hollow-dot y-values disagree.
What to notice as you play
- Continuity is a meeting condition, not a smoothness condition. Two pieces can meet at the breakpoint (continuous) and still arrive with completely different slopes (corner).
- |x| and ReLU are cousins. Both are two-piece linear functions whose pieces meet at the origin. ReLU clips the negative side to zero; |x| reflects it upward. We will exploit this kinship in the ML section.
- Heaviside and sign have jumps on purpose. They model switches — events that have to flip discontinuously.
- Tax brackets are designed to be continuous. The piecewise intercepts are chosen specifically so that crossing into the next bracket never produces a sudden jump in tax owed. Without continuity, taxpayers would have a perverse incentive to earn slightly less.
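The continuity of marginal-rate tax schedules can be checked numerically. The thresholds and rates below are invented for illustration and do not correspond to any real tax code.

```python
# Hypothetical brackets: (lower threshold, marginal rate). Not real tax law.
BRACKETS = [(0.0, 0.10), (10_000.0, 0.20), (40_000.0, 0.30)]

def tax_owed(income: float) -> float:
    """Marginal-rate tax: each rate applies only to income inside its bracket."""
    total = 0.0
    for i, (lower, rate) in enumerate(BRACKETS):
        upper = BRACKETS[i + 1][0] if i + 1 < len(BRACKETS) else float("inf")
        if income > lower:
            total += rate * (min(income, upper) - lower)
    return total

# Compare the tax owed one cent below and one cent above each threshold.
for threshold in (10_000.0, 40_000.0):
    below = tax_owed(threshold - 0.01)
    above = tax_owed(threshold + 0.01)
    print(threshold, round(below, 2), round(above, 2))
```

Crossing a threshold changes the slope of the tax curve (the marginal rate), not its value, which is the corner-without-jump behaviour described in the first bullet.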
Absolute Value as a Piecewise Function
The absolute value of a real number $x$, written $|x|$, is its distance from zero on the number line. Distance is always non-negative, so $|x| \ge 0$ for every $x$, with equality only at $x = 0$.
The two-piece definition
The formal definition spells out the two cases:
$$|x| = \begin{cases} -x & \text{if } x < 0 \\ x & \text{if } x \ge 0 \end{cases}$$
Read it like English: "if x is negative, flip its sign to make it positive; if x is non-negative, leave it alone." Both cases produce a non-negative output, which is why the absolute-value graph lies entirely on or above the x-axis.
Geometric meaning: a V at the origin
The graph of $y = |x|$ is a perfect V. The left ray has slope $-1$; the right ray has slope $+1$. They meet at the origin in a corner. The corner is the geometric signature of every absolute-value function: it is the price you pay for forcing the output to be non-negative.
A second face: distance between two numbers
Replacing $x$ with $x - a$ gives $|x - a|$, which is the distance from $x$ to $a$. This is the most useful reading of absolute value in the rest of calculus and statistics:
- $|x - 3| < \delta$ means "x is within $\delta$ of 3."
- $|f(x) - L| < \varepsilon$ means "the function value is within $\varepsilon$ of L." This is the language of limits, which we will meet in Chapter 2.
- The median of a data set minimizes $\sum_i |x_i - m|$ over $m$. (Compare the mean, which minimizes $\sum_i (x_i - m)^2$.)
Two readings, one symbol
When you see $|x - a|$, it always means the distance from $x$ to $a$. When you see $|x|$ alone, it means the distance from $x$ to 0. They are the same function shifted.
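The claim that the median minimizes the total absolute distance while the mean minimizes the total squared distance can be checked by brute force. The data set below is made up for the illustration, and the grid search is a deliberately naive stand-in for a proper optimizer.

```python
import statistics

data = [1.0, 2.0, 4.0, 7.0, 20.0]   # made-up sample with one large value

def abs_loss(m):
    """Total distance from m to the data points: sum of |x_i - m|."""
    return sum(abs(x - m) for x in data)

def sq_loss(m):
    """Total squared distance from m to the data points."""
    return sum((x - m) ** 2 for x in data)

# Brute-force search over candidate centres 0.00, 0.01, ..., 20.00.
grid = [i / 100 for i in range(0, 2001)]
best_abs = min(grid, key=abs_loss)
best_sq = min(grid, key=sq_loss)

print("abs-loss minimizer:", best_abs, " median:", statistics.median(data))
print("sq-loss minimizer: ", best_sq, " mean:  ", statistics.mean(data))
```

The outlier at 20 drags the mean upward but leaves the median where it is, which is one practical consequence of measuring distance with absolute values instead of squares.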
Transformations of |x|: Building Vs Anywhere
Just as the parabola family $y = a(x - h)^2 + k$ places a U anywhere on the plane, the absolute-value family
$$y = a\,|x - h| + k$$
places a V anywhere on the plane. The three parameters do exactly what you think:
| Parameter | Geometric effect |
|---|---|
| h | Horizontal shift of the vertex. Positive h moves the V right. |
| k | Vertical shift of the vertex. Positive k moves the V up. |
| a | Steepness and orientation. $\lvert a \rvert > 1$ makes the V steeper; $0 < \lvert a \rvert < 1$ makes it shallower; $a < 0$ flips the V upside down. |
Play with the sliders below. The dashed grey curve is the reference $y = |x|$; the amber curve is your transformed V. The amber dot marks the vertex, which sits at $(h, k)$.
[Interactive figure: absolute-value transformations. Initial state: unchanged from $|x|$.]
Reading slopes off the formula
For $y = a|x - h| + k$:
- To the right of the vertex ($x > h$): slope is $a$.
- To the left of the vertex ($x < h$): slope is $-a$.
- At the vertex itself: no single slope exists — the function has a corner.
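A quick numerical check of these slope rules, as a sketch. The parameter values are an arbitrary illustrative choice, and `v` and `slope` are helper names introduced only here.

```python
def v(x, a, h, k):
    """Transformed absolute value y = a|x - h| + k."""
    return a * abs(x - h) + k

def slope(f, x0, x1):
    """Slope of the straight segment of f between x0 and x1."""
    return (f(x1) - f(x0)) / (x1 - x0)

a, h, k = -2.0, 3.0, 1.0          # illustrative choice: an upside-down V
f = lambda x: v(x, a, h, k)

print("vertex:", (h, f(h)))
print("slope left of vertex: ", slope(f, h - 2, h - 1))   # expected -a
print("slope right of vertex:", slope(f, h + 1, h + 2))   # expected  a
```

Because each ray is a straight line, any two points on the same side of the vertex give the same slope; only a pair straddling $x = h$ would give a misleading answer.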
Worked Example by Hand
We will work two examples in detail. The first solves an absolute-value equation by case analysis. The second graphs a sum of two absolute-value functions and discovers a surprising flat region. Try each step yourself before peeking.
▶ Example 1 — Solve |2x − 3| = 5
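Step 1. Write the defining rule of absolute value: for $c \ge 0$,
$$|u| = c \iff u = c \ \text{ or } \ u = -c.$$
Step 2. Identify $u = 2x - 3$ and $c = 5$. Substituting gives two linear equations:
$$2x - 3 = 5 \quad\text{or}\quad 2x - 3 = -5.$$
Step 3. Solve each:
$$2x = 8 \ \Rightarrow\ x = 4, \qquad 2x = -2 \ \Rightarrow\ x = -1.$$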
Step 4. Verify both by substitution:
| Candidate x | 2x − 3 | $\lvert 2x - 3 \rvert$ | Match 5? |
|---|---|---|---|
| x = 4 | 5 | 5 | ✓ |
| x = −1 | −5 | 5 | ✓ |
Answer. $x = 4$ or $x = -1$. The equation has two solutions, symmetric about $x = \tfrac{3}{2}$, which is the value where the inside equals zero, i.e. the corner of $y = |2x - 3|$.
Why exactly two solutions, geometrically
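Graphing $y = |2x - 3|$ gives a V with vertex at $\left(\tfrac{3}{2}, 0\right)$. The horizontal line $y = 5$ intersects this V at two points, equidistant from the vertex. Their horizontal distance from $x = \tfrac{3}{2}$ is $\tfrac{5}{2}$ (because the V has slopes $\pm 2$, so we travel $\tfrac{5}{2}$ in x to rise 5 in y). Thus the solutions are $\tfrac{3}{2} + \tfrac{5}{2} = 4$ and $\tfrac{3}{2} - \tfrac{5}{2} = -1$.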
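The four steps of Example 1 generalize to any equation of the form $|mx + b| = c$. The function below is a minimal sketch of that case analysis; the name `solve_abs_linear` and its argument order are choices made for this illustration.

```python
def solve_abs_linear(m, b, c):
    """Solve |m*x + b| = c for real x by splitting into the two linear cases."""
    if m == 0:
        raise ValueError("m must be non-zero for a linear inside")
    if c < 0:
        return []                                  # |u| is never negative
    candidates = {(c - b) / m, (-c - b) / m}       # cases u = c and u = -c
    # Verify each candidate by substitution, as in Step 4.
    return sorted(x for x in candidates if abs(abs(m * x + b) - c) < 1e-12)

print(solve_abs_linear(2, -3, 5))    # Example 1: |2x - 3| = 5
print(solve_abs_linear(2, -3, 0))    # the two cases coincide at the corner
print(solve_abs_linear(2, -3, -1))   # no solutions
```

Collecting the candidates in a set handles the $c = 0$ case, where both branches give the same root at the corner.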
▶ Example 2 — Graph g(x) = |x − 1| + |x + 2| and find its minimum
Step 1. Locate the corners. Each absolute-value term has a corner where its inside is zero, so:
- $|x - 1|$ has a corner at $x = 1$.
- $|x + 2|$ has a corner at $x = -2$.
The corners split the real line into three regions: $x < -2$, $-2 \le x \le 1$, and $x > 1$.
Step 2. Drop the bars in each region by choosing the right sign for the inside.
| Region | $\lvert x - 1 \rvert$ becomes | $\lvert x + 2 \rvert$ becomes | g(x) |
|---|---|---|---|
| x < −2 | 1 − x | −x − 2 | −2x − 1 |
| −2 ≤ x ≤ 1 | 1 − x | x + 2 | 3 |
| x > 1 | x − 1 | x + 2 | 2x + 1 |
Step 3. The middle row is the punchline. On the entire interval $[-2, 1]$, the function is identically 3 because the x-terms cancel. Outside the interval, the function rises linearly, with slope $-2$ on the left and $+2$ on the right.
Step 4. Spot-check four values in the flat region, including both endpoints.
| x | $\lvert x - 1 \rvert$ | $\lvert x + 2 \rvert$ | g(x) |
|---|---|---|---|
| −2 | 3 | 0 | 3 |
| 0 | 1 | 2 | 3 |
| 0.7 | 0.3 | 2.7 | 3 |
| 1 | 0 | 3 | 3 |
Step 5. The minimum value of g is $3$, achieved on the entire interval $[-2, 1]$. This is the distance between the two corners, and it is a general fact: for $g(x) = |x - a| + |x - b|$ with $a < b$, the minimum value is $b - a$, achieved on the entire interval $[a, b]$.
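The general fact, that $|x - a| + |x - b|$ with $a < b$ has minimum value $b - a$ on all of $[a, b]$, follows from the triangle inequality $|u| + |v| \ge |u + v|$. A short derivation, taking $u = x - a$ and $v = b - x$:

```latex
\begin{aligned}
|x - a| + |x - b| &= |x - a| + |b - x| \\
                  &\ge |(x - a) + (b - x)| \\
                  &= |b - a| = b - a .
\end{aligned}
```

Equality holds exactly when $x - a$ and $b - x$ are both non-negative, that is, when $a \le x \le b$. With $a = -2$ and $b = 1$ this gives the minimum value $3$ on $[-2, 1]$, matching the table.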
Why this matters for ML
The flat minimum comes from the corners of the absolute-value terms, and the same corner geometry is behind a well-known effect in ML: L1-regularized regression tends to produce sparse coefficients, because the penalty $|w|$ has a corner at $w = 0$ where the optimizer can "land" and stay. We'll return to this when we discuss the lasso in the optimization chapter.
The Sign Function and the Heaviside Step
Two more piecewise functions deserve names because they appear everywhere from control engineering to deep learning to distribution theory.
The sign function
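The sign function reports which side of zero a number lives on, ignoring its magnitude:
$$\operatorname{sgn}(x) = \begin{cases} -1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \\ +1 & \text{if } x > 0 \end{cases}$$
A useful identity ties it to absolute value:
$$x = \operatorname{sgn}(x)\,|x|, \qquad\text{equivalently}\qquad \operatorname{sgn}(x) = \frac{x}{|x|} \quad (x \ne 0).$$
That second form says: "to find the sign of x, divide x by its magnitude." The result is $\pm 1$.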
The Heaviside step
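The Heaviside step $H(x)$ equals $0$ for $x < 0$ and $1$ for $x > 0$; the value assigned at $x = 0$ is a matter of convention ($0$, $\tfrac{1}{2}$, and $1$ are all in use). Oliver Heaviside introduced this step around 1880 to write down equations for electrical circuits that switch on at $t = 0$. It is the cleanest possible discontinuous function: zero on one side, one on the other, with a single jump of size 1.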
The Heaviside-step trick
Many piecewise functions can be expressed as linear combinations of shifted Heaviside steps. For instance, $3H(x - a) - 2H(x - b)$ with $a < b$ describes a function that jumps up by 3 at $x = a$, then down by 2 at $x = b$. This trick is a workhorse of signal processing.
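A combination of two shifted steps, one of height 3 and one of height −2, can be sketched directly. The breakpoints $a = 1$ and $b = 4$ and the convention $H(0) = 1$ are assumptions made only for this example.

```python
def heaviside(x):
    """Unit step, using the convention H(0) = 1 (an assumption of this sketch)."""
    return 1.0 if x >= 0 else 0.0

def pulse(x, a=1.0, b=4.0):
    """3*H(x - a) - 2*H(x - b): up by 3 at x = a, then down by 2 at x = b."""
    return 3 * heaviside(x - a) - 2 * heaviside(x - b)

for x in (0.0, 1.0, 2.5, 4.0, 6.0):
    print(x, pulse(x))
```

Each step contributes nothing to the left of its own breakpoint, so the function's value at any $x$ is the sum of the jump sizes of all breakpoints at or to the left of $x$.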
How they relate to ReLU and |x|
All four functions are first cousins on the same family tree:
| Function | Pieces | Continuous? | Slope on the left | Slope on the right |
|---|---|---|---|---|
| sgn(x) | −1, 0, +1 | No (jump of 2) | 0 | 0 |
| H(x) | 0, 1 | No (jump of 1) | 0 | 0 |
| $\lvert x \rvert$ | −x, x | Yes (corner) | −1 | +1 |
| ReLU(x) | 0, x | Yes (corner) | 0 | +1 |
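Differentiation links the bottom two rows to the top two: away from $x = 0$, $\frac{d}{dx}|x| = \operatorname{sgn}(x)$ and $\frac{d}{dx}\mathrm{ReLU}(x) = H(x)$. So sgn is the derivative of $|x|$ and $H$ is the derivative of ReLU; each of the first two rows is the derivative of the row two below it. The corners become jumps; the jumps become "deltas" (which we'll meet in distribution theory, beyond this chapter).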
Corners, Continuity, and Derivatives
Piecewise functions are the first place a student meets the distinction between continuity and differentiability. They are not the same thing.
Continuity at a breakpoint
Let $x = c$ be a breakpoint between two pieces. The function $f$ is continuous at $c$ if and only if
$$\lim_{x \to c^-} f(x) = \lim_{x \to c^+} f(x) = f(c).$$
All three numbers must exist and be equal. We will make these limit symbols precise in Chapter 2; for now, "left limit equals right limit equals value" is the working definition.
Differentiability at a breakpoint
Differentiability is a stricter requirement than continuity. Even when the pieces meet (no jump), if they arrive with different slopes, the function has a corner, and the derivative at that point does not exist. $|x|$ at $x = 0$ is the canonical example:
$$\lim_{h \to 0^-} \frac{|0 + h| - |0|}{h} = -1, \qquad \lim_{h \to 0^+} \frac{|0 + h| - |0|}{h} = +1.$$
Two different one-sided slopes ⟹ no single derivative. We will derive this carefully in Chapter 4, but the geometry already tells the story: the V has two different tangent directions at the vertex.
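The two one-sided slopes can be estimated numerically with difference quotients. This is a rough sketch rather than a proof; the step size `h` and the helper name `one_sided_slopes` are arbitrary choices.

```python
def one_sided_slopes(f, c, h=1e-6):
    """Return (left, right) difference quotients of f at c with step h."""
    left = (f(c) - f(c - h)) / h
    right = (f(c + h) - f(c)) / h
    return left, right

relu = lambda x: max(0.0, x)

print("abs at 0: ", one_sided_slopes(abs, 0.0))    # the two sides disagree
print("relu at 0:", one_sided_slopes(relu, 0.0))   # the two sides disagree
print("abs at 2: ", one_sided_slopes(abs, 2.0))    # the two sides agree
```

Away from the corner both quotients approach the same number, so an ordinary derivative exists there; at the corner they stay apart no matter how small `h` becomes.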
Continuity ⇏ Differentiability
A piecewise function can be perfectly continuous and still fail to be differentiable. Corners (|x|, ReLU) and cusps (such as $\sqrt{|x|}$ at 0) are continuous everywhere yet non-differentiable at one point. The reverse implication holds: differentiable always implies continuous.
The hierarchy of smoothness
We can stack functions by how much regularity they enjoy at their breakpoints:
- Discontinuous (Heaviside, sign): a jump in the function value.
- Continuous but not differentiable (|x|, ReLU): the pieces meet, but slopes differ.
- Differentiable but not twice differentiable: slopes match, but curvatures differ. The piecewise quadratic/linear example from the Python section sits here: same value and same slope at the breakpoint, but the second derivative jumps from 2 to 0.
- Smooth: every derivative exists everywhere. Polynomials are the standard single-formula example.
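The third rung of the hierarchy can be probed with finite differences. The function below, $x^2$ to the left of $x = 1$ and its tangent line $2x - 1$ from there on, is a hypothetical example chosen for this sketch; it may differ from the one plotted in the Python section.

```python
def f(x):
    # Hypothetical example: a quadratic on the left, its tangent line on the right.
    return x * x if x < 1 else 2 * x - 1

def first_derivative(g, x, h=1e-5):
    """Central-difference estimate of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

def second_derivative(g, x, h=1e-4):
    """Central-difference estimate of g''(x)."""
    return (g(x + h) - 2 * g(x) + g(x - h)) / (h * h)

eps = 1e-3   # step away from the breakpoint so each stencil stays on one piece
print("values:", f(1 - 1e-9), f(1))
print("slopes:", first_derivative(f, 1 - eps), first_derivative(f, 1 + eps))
print("second:", second_derivative(f, 1 - eps), second_derivative(f, 1 + eps))
```

The values and slopes on the two sides approach each other as `eps` shrinks, while the second-derivative estimates stay near 2 on the left and near 0 on the right.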
ReLU: Piecewise in Modern Machine Learning
The Rectified Linear Unit,
$$\mathrm{ReLU}(x) = \max(0, x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } x \ge 0 \end{cases}$$
is one of the most widely used non-linear functions in modern neural networks. Transformer layers, CNN feature maps, and MLP hidden layers typically pass their values through a ReLU (or a close relative). Understanding piecewise functions is therefore not optional for ML: it is the central activation pattern.
Why ReLU works so well in practice
- Cheap. One comparison, one selection. No exponentials, no divisions.
- Gradient survives. On the active side, the derivative is exactly 1, so the activation passes gradients through unchanged, however deep the stack. Sigmoid's derivative is at most 0.25, so the activations of a 10-layer sigmoid stack shrink the gradient by a factor of at least $4^{10} \approx 10^6$ (ignoring the weights). This is the "vanishing gradient" problem.
- Sparse activations. About half the inputs are negative for a typical random initialization, so about half of every hidden layer is exactly zero. Sparsity is cheap to compute and often improves generalization.
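The gradient comparison between sigmoid and ReLU is simple arithmetic. The sketch below looks only at the activation derivatives along a chain of scalar units and ignores the weights, which is a simplifying assumption.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z)), largest at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU away from 0; this sketch returns 0 at z = 0."""
    return 1.0 if z > 0 else 0.0

depth = 10
print("max sigmoid slope:", sigmoid_grad(0.0))
print("best-case product over", depth, "sigmoid layers:", sigmoid_grad(0.0) ** depth)
print("product over", depth, "active ReLU layers:", relu_grad(1.0) ** depth)
```

Even in the sigmoid's best case, with every pre-activation exactly at zero, the product is about one millionth, while a chain of active ReLUs leaves the gradient at 1.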
The dark side: the dying-ReLU problem
The very piecewise structure that makes ReLU efficient also creates a failure mode. If a neuron's pre-activation is always negative (across the entire training set), then its output is always zero, its gradient is always zero, and the optimizer cannot move its weights. The neuron is dead, frozen at initialization, contributing nothing to the network forever.
We will demonstrate this in the PyTorch Implementation section below: initialize a neuron with a very negative bias, run SGD, and watch the weights refuse to move. In practice, three remedies help:
- Smart initialization (He initialization, also known as Kaiming initialization) so pre-activations span both sides of zero from the start.
- Leaky ReLU: $\mathrm{LeakyReLU}(x) = \max(\alpha x, x)$ for small $\alpha > 0$ (e.g. 0.01). The flat side becomes a gentle slope, so gradients survive.
- Skip connections (ResNets): provide an alternate pathway for gradient even when a unit is dead.
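The dead-neuron failure and the Leaky ReLU remedy can be seen on a single neuron with hand-written gradients. Everything in this sketch (the toy data, the starting weights, the learning rate, and the function name `train`) is an assumption made for illustration; no framework is needed.

```python
def train(slope_neg, steps=50, lr=0.01):
    """SGD on one neuron act(w*x + b) with squared loss; returns the final (w, b).

    slope_neg is the activation's slope for negative inputs:
    0.0 gives ReLU, a small positive value gives Leaky ReLU.
    """
    xs = [0.5, 1.0, 1.5, 2.0]
    ys = [1.0, 2.0, 3.0, 4.0]          # toy targets, y = 2x
    w, b = 1.0, -10.0                  # bias so negative that w*x + b < 0 for all xs
    for _ in range(steps):
        for x, y in zip(xs, ys):
            z = w * x + b
            out = z if z > 0 else slope_neg * z
            dact = 1.0 if z > 0 else slope_neg   # derivative of the activation
            grad_out = 2.0 * (out - y)           # d(loss)/d(out)
            w -= lr * grad_out * dact * x
            b -= lr * grad_out * dact
    return w, b

print("ReLU      :", train(slope_neg=0.0))
print("Leaky ReLU:", train(slope_neg=0.01))
```

With ReLU the factor `dact` is zero on every sample, so each update is zero and the parameters never leave their starting values; with a negative-side slope of 0.01 the updates are small but non-zero.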
The takeaway
The most important non-linearity in modern AI is a piecewise function with two linear pieces. Its strengths and its failure modes are direct consequences of its piecewise structure. Every other thing we'll do with derivatives, gradients, and optimization is downstream of understanding this single shape.
Python Implementation
Two NumPy idioms cover almost every piecewise function you'll ever write: np.piecewise for the general n-piece case, and np.where for the common 2-piece case. The code below uses both side by side and plots a piecewise polynomial, a ReLU, and an absolute value on a single figure.
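A minimal sketch of the two idioms follows. Plotting is omitted, and the piecewise polynomial used here ($x^2$ left of 1, the line $2x - 1$ from 1 on) is an assumed example that may differ from the one in the original figure.

```python
import numpy as np

x = np.linspace(-2.0, 3.0, 11)

# General n-piece case: a list of boolean masks and a matching list of rules.
# Illustrative pieces (an assumption): x**2 left of 1, the line 2x - 1 from 1 on.
poly = np.piecewise(x, [x < 1, x >= 1], [lambda t: t**2, lambda t: 2 * t - 1])

# Common 2-piece case: np.where(condition, value_if_true, value_if_false).
relu = np.where(x > 0, x, 0.0)
absx = np.where(x < 0, -x, x)

print(np.allclose(relu, np.maximum(x, 0.0)))   # agrees with the built-in form
print(np.allclose(absx, np.abs(x)))            # agrees with np.abs
print(poly[:3])                                # first three values, from the x**2 piece
```

One design difference is worth knowing: `np.where` evaluates both branches on the whole array before selecting, whereas `np.piecewise` calls each rule only on the elements its mask selects. That matters when a rule is undefined outside its own region.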
From hand-solved to computer-verified
Below we revisit the two worked examples, solving $|2x - 3| = 5$ and analyzing $g(x) = |x - 1| + |x + 2|$, but now with NumPy doing the heavy lifting. Notice the flat-bottom plateau in the plot: it is the set of minimizers, not a single point.
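One way to check both hand computations numerically, sketched without the plot. The grid range and resolution are arbitrary choices for this illustration.

```python
import numpy as np

# Example 1: |2x - 3| = 5, checked at the two hand-derived roots.
roots = np.array([-1.0, 4.0])
print(np.abs(2 * roots - 3))

# Example 2: g(x) = |x - 1| + |x + 2| evaluated on a grid.
x = np.linspace(-5.0, 5.0, 1001)
g = np.abs(x - 1) + np.abs(x + 2)
g_min = g.min()

# Every grid point whose value matches the minimum (up to rounding).
minimizers = x[np.isclose(g, g_min)]
print(g_min, minimizers.min(), minimizers.max())
```

The set of minimizers found on the grid stretches from about −2 to about 1, which is the plateau predicted in Step 5 rather than a single lowest point.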
PyTorch Implementation
Now we shift from descriptive plotting to autograd: ask PyTorch what the derivative of ReLU is at five sample points, watch it return the Heaviside step exactly. Then we reproduce the dying-ReLU pathology by initializing a neuron with a fatally negative bias and showing that SGD cannot rescue it.
What autograd is really doing at the corner
PyTorch does not compute the derivative of ReLU symbolically. It dispatches to a hand-written backward kernel that returns 1 where the forward input was positive and 0 everywhere else, including at exactly zero. The Heaviside step (with the convention $H(0) = 0$) is therefore literally what autograd uses, not just a mathematical analogy.
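Both experiments described in this section can be sketched in a few lines. The sample points, the toy data, the learning rate, and the starting bias of −10 are assumptions chosen for illustration.

```python
import torch

# Part 1: ask autograd for d/dx ReLU at five sample points.
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)          # 0 for the non-positive inputs, 1 for the positive ones

# Part 2: a single neuron with a fatally negative bias.
torch.manual_seed(0)
inputs = torch.rand(64, 1)                    # inputs in [0, 1)
targets = 2.0 * inputs                        # toy regression target
w = torch.tensor([[1.0]], requires_grad=True)
b = torch.tensor([-10.0], requires_grad=True)
opt = torch.optim.SGD([w, b], lr=0.1)

for _ in range(100):
    opt.zero_grad()
    pred = torch.relu(inputs @ w + b)         # pre-activation is always negative
    loss = ((pred - targets) ** 2).mean()
    loss.backward()                           # gradients of w and b are exactly zero
    opt.step()

print(w.item(), b.item())                     # unchanged from initialization
```

Because every pre-activation lies on the flat side, the backward pass multiplies the loss gradient by zero for every sample, so 100 SGD steps leave `w` and `b` exactly where they started.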
Common Pitfalls
| Pitfall | Why it bites | Fix |
|---|---|---|
| Overlapping piece domains | If two pieces both include a breakpoint, the function is ambiguous there. | Use strict and non-strict inequalities consistently. Convention: write each piece as a half-open interval $a \le x < b$, so it owns its left endpoint (≤) and leaves its right endpoint (<) to the next piece. |
| Gap in the piece domains | Some inputs hit no piece at all — the function is undefined there. | Check that the union of all domains covers the intended input set. |
| Confusing continuity with differentiability | Drawing a continuous V and concluding it's differentiable at the vertex. | Check slopes from each side. If they disagree, no derivative exists at that point — even though the function is continuous. |
| Writing $\lvert x \rvert^2 = x$ (false) | Squaring makes the bars redundant because $x^2 \ge 0$ already; it does not remove the square. | Memorize $\lvert x \rvert^2 = x^2$ for all real x. The bars vanish under squaring; the variable stays squared. |
| Solving $\lvert u \rvert = c$ without considering c < 0 | $\lvert u \rvert$ can never be negative, so $\lvert u \rvert = -1$ has no solution. | Always check the sign of c first. If c < 0, the equation has no real solutions. |
| Forgetting one root of $\lvert 2x - 3 \rvert = 5$ | Two cases (2x − 3 = 5 and 2x − 3 = −5) produce two roots. Students often write only one. | Whenever you "drop" the absolute value bars, generate the two cases on a fresh line before any algebra. |
| Initializing a ReLU layer with a very negative bias | Pre-activations sit on the flat side forever; the gradient is zero. | Use He (Kaiming) initialization, or switch to Leaky ReLU / GELU. |
Summary
A piecewise function is a function described by different rules on different parts of its domain. The breakpoints between regions are where the action lives: they are the only places the function can misbehave — by jumping (discontinuity), by bending sharply (corner), or by changing curvature (non-smooth-but-differentiable). Reading a piecewise function is the same skill in every form it takes: tax brackets, thermostats, ReLU, Heaviside, sign, |x|.
- Piecewise notation says "in region $D_i$, use rule $f_i$." The domains must cover the input set without conflict.
- Continuity at a breakpoint = left limit, right limit, and value all agree. Otherwise there is a jump.
- |x| is two-piece linear: −x for negatives, x for non-negatives. The V-shape has slopes $\pm 1$ meeting in a corner at the origin.
- |x − a| is the distance from x to a — the most useful reading of absolute value in calculus and statistics.
- Equations $|u| = c$ (with $c \ge 0$) split into two linear cases, $u = c$ and $u = -c$. Always check both, always verify by substitution.
- Sums of absolute values can be minimized on an entire interval. The corners responsible for this are closely related to L1-induced sparsity in machine learning.
- Sign, Heaviside, |x|, ReLU are one family. Away from zero, sign is the derivative of |x| and Heaviside is the derivative of ReLU.
- Corners ⇒ no derivative. Continuity does not imply differentiability. This distinction is the foundation of everything in Chapter 4.
- ReLU is the most important piecewise function alive. Its strengths (cheap, non-vanishing gradient, sparse) and its weaknesses (dying neurons) are both direct consequences of its two-piece structure.
- NumPy: np.piecewise and np.where. PyTorch: F.relu + autograd reproduces the Heaviside step exactly.
What's next. In Section 1.12 we will study transformations of arbitrary functions (shift, scale, reflect) using exactly the same shifting language that we previewed here for $|x|$. In Chapter 2 we will give precise meaning to the left/right limits that defined continuity at a breakpoint.