Chapter 2

One-Sided Limits and Jump Discontinuities

Limits — Approaching the Infinite

Learning Objectives

By the end of this section you will be able to:

  1. Describe what a one-sided limit is, in both the approach-from-the-left form $\lim_{x \to a^-} f(x)$ and the approach-from-the-right form $\lim_{x \to a^+} f(x)$.
  2. State the agreement theorem: $\lim_{x \to a} f(x)$ exists if and only if the two one-sided limits exist AND agree.
  3. Identify a jump discontinuity by computing both one-sided limits.
  4. Compute one-sided limits numerically by probing closer and closer values.
  5. Connect this idea to machine learning, where ReLU creates a jump in the derivative at the origin.

Why One-Sided? A Physical Story

In the previous section we watched a function sneak up on a value from both sides at once. Real phenomena are not always so well-behaved. Flip a light switch at $t = 0$. An instant before, the current is $0$. An instant after, the current is $I_0$. If you stood at the exact moment the switch closed and asked “what is the current?”, you would get a different answer depending on which direction in time you came from.

The motivating problem: how do we talk about a limit when the function disagrees with itself at a point?

The fix is almost embarrassingly simple: split the definition. We do not talk about ONE limit; we talk about TWO, one for each direction of approach. That single move unlocks the language we need to describe switches, step functions, digital signals, price ladders, the floor function, the slope of the ReLU non-linearity, and every other real-world process that jumps.


The Two-Doors Intuition

Picture the point $x = a$ as a room with two doors. The LEFT door is the only entrance from values $x < a$. The RIGHT door is the only entrance from values $x > a$. If you walk toward the room through the left door you will see one value of $f$ getting nearer and nearer. Walk in through the right door and you may see something completely different.

The key mental picture

A limit is about the approach, not the destination. The question is not “what is $f(a)$?”; that is a separate question. The limit asks: “where is $f(x)$ heading as $x$ closes in on $a$?”

One-sided limits force us to answer that question for each direction independently. When they agree, we have a regular two-sided limit. When they disagree, we have something richer — a jump.


Formal Definition

Let $f$ be defined on an open interval containing $a$ (except possibly at $a$ itself).

Left-Hand Limit

We write $\lim_{x \to a^-} f(x) = L^-$, read “the limit of $f(x)$ as $x$ approaches $a$ from the left is $L^-$”, to mean that $f(x)$ can be made arbitrarily close to $L^-$ by choosing $x$ sufficiently close to $a$, with the restriction $x < a$.

Right-Hand Limit

Similarly, $\lim_{x \to a^+} f(x) = L^+$ means $f(x)$ can be made arbitrarily close to $L^+$ by choosing $x$ sufficiently close to $a$, with the restriction $x > a$.

Read the superscripts out loud. The “minus” in $a^-$ is not subtraction; it is a direction marker, a little arrow that points toward $a$ from the side of smaller values. Likewise $a^+$ means “coming down from above”.
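
For readers who want the phrase “arbitrarily close” pinned down, the two definitions above are usually formalized along the following lines. The symbols $\varepsilon$ (how close $f(x)$ must get) and $\delta$ (how close $x$ must be) do not appear elsewhere in this section and are introduced here only as a sketch:

```latex
\begin{aligned}
\lim_{x \to a^-} f(x) = L^- &\iff \forall \varepsilon > 0 \;\; \exists \delta > 0 : \quad a - \delta < x < a \implies |f(x) - L^-| < \varepsilon \\
\lim_{x \to a^+} f(x) = L^+ &\iff \forall \varepsilon > 0 \;\; \exists \delta > 0 : \quad a < x < a + \delta \implies |f(x) - L^+| < \varepsilon
\end{aligned}
```

The only difference between the two lines is the window that $x$ is allowed to live in: $(a - \delta, a)$ for the left door, $(a, a + \delta)$ for the right door.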


Interactive: Approaching From Each Side

Pick a piecewise function below. Drag the blue slider and watch the blue dot crawl along the LEFT branch toward the vertical purple line. Drag the orange slider for the RIGHT branch. The two one-sided limits are the heights the dots are heading to.

One-Sided Limit Explorer

Drag x toward a = 2 from each side. Watch f(x) approach two different values.

A textbook jump discontinuity. Left branch ends at 5, right branch starts at 1. The gap is 4 units tall.

[Interactive plot: x-axis from 0 to 4, y-axis from −1 to 7, vertical line at x = 2, jump = −4.00]
Approach from the LEFT (x → 2⁻): x = 1.4000, f(1.4000) = 2.9600, approaching L⁻ = 5
Approach from the RIGHT (x → 2⁺): x = 2.6000, f(2.6000) = 1.6000, approaching L⁺ = 1
L⁻ = 5 ≠ L⁺ = 1  ✗  Two-sided limit does NOT exist

The small open circle means the branch is approaching but does not own the point at $x = a$. The filled circle is the branch whose definition actually evaluates at $x = a$. The limit question ignores both: it only cares about the journey, not the landing.


The Agreement Theorem

Here is the statement that ties one-sided limits back to the ordinary two-sided limit:

Two-Sided = Both One-Sided Agreeing

$$\lim_{x \to a} f(x) = L \iff \lim_{x \to a^-} f(x) = L \text{ and } \lim_{x \to a^+} f(x) = L$$

In plain language: the two-sided limit exists and equals $L$ exactly when both one-sided limits exist AND both equal the same number $L$.

This gives us a clean failure taxonomy for limits:

| Situation | Two-Sided Limit? |
| --- | --- |
| Both one-sided limits exist and agree | Exists (and equals their common value). |
| Both one-sided limits exist but DISAGREE | Does not exist; this is a jump discontinuity. |
| At least one one-sided limit fails to exist (e.g., oscillates or blows up) | Does not exist; could be infinite or oscillatory. |
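
The three rows of this taxonomy can be sketched as a small decision function. This is only an illustration: the function name `classify_two_sided`, the use of `None` to mean "this one-sided limit does not exist", and the tolerance `tol` are all assumptions made for the example, not part of the section's material.

```python
from typing import Optional


def classify_two_sided(left: Optional[float], right: Optional[float],
                       tol: float = 1e-9) -> str:
    """Apply the agreement theorem to two already-known one-sided limits.

    None stands for "this one-sided limit does not exist" (for example,
    it oscillates or is unbounded). tol is an assumed tolerance for
    comparing floating-point values.
    """
    if left is None or right is None:
        return "does not exist (a one-sided limit fails)"
    if abs(left - right) <= tol:
        return f"exists and equals {left}"
    return f"does not exist (jump of {right - left})"


print(classify_two_sided(3.0, 3.0))   # both sides agree
print(classify_two_sided(5.0, 1.0))   # both sides exist but disagree
print(classify_two_sided(None, 0.0))  # the left side has no limit
```

Note that the function does not compute any limits itself; it only encodes the logic of the table once the two one-sided values are known.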

What Is a Jump Discontinuity?

A jump discontinuity at $x = a$ is the specific failure mode where:

  1. The left-hand limit $L^-$ exists (is a finite real number);
  2. The right-hand limit $L^+$ exists (also finite);
  3. But $L^- \neq L^+$.

The number $L^+ - L^-$ is called the jump, or sometimes the saltus (Latin for “leap”). It measures, in one signed number, exactly how much the function lies to itself at the discontinuity.

Do not confuse a jump discontinuity with a removable one (where both limits agree but $f(a)$ is either missing or wrong) or an infinite one (where at least one side blows up to $\pm\infty$). We will meet those two cases in sections 2.4 and 3.3 respectively.


Four famous jumpy functions. Click through them and drag the point of interest to see how the jump size stays constant (for floor, ceiling, square wave) or is fixed by the definition (sign).

Jump Discontinuity Gallery

Click a function, drag the point of interest, and watch the jump size change.

A staircase: floor(x) jumps up by 1 at every integer. At x = n: left limit is n−1, right limit is n.

[Interactive plot of floor(x): both axes from −3 to 4, point of interest x = 2. Left limit L⁻ = 1.00, right limit L⁺ = 2.00, jump size = 1.00.]
Function value at the point: f(2) = 2 — note this is independent of the two one-sided limits.

The floor function $\lfloor x \rfloor$ shows up whenever you round DOWN: discrete pay scales, pixel coordinates, page numbers from a character count, etc. At every integer $n$: $\lim_{x \to n^-} \lfloor x \rfloor = n - 1$ and $\lim_{x \to n^+} \lfloor x \rfloor = n$. One-sided limits give you the mathematical machinery to describe every one of those corners precisely.
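
The two floor limits can be checked numerically with Python's standard `math.floor`. The helper name `floor_one_sided` and the offset `h` are assumptions made for this sketch; because the floor function is constant between integers, any offset with 0 < h < 1 would give the same values.

```python
import math
from typing import Tuple


def floor_one_sided(n: int, h: float = 1e-6) -> Tuple[int, int]:
    """Evaluate floor just to the left and just to the right of the integer n."""
    left = math.floor(n - h)    # a point in the interval [n - 1, n)
    right = math.floor(n + h)   # a point in the interval [n, n + 1)
    return left, right


for n in (-1, 0, 2):
    left, right = floor_one_sided(n)
    print(f"n = {n:>2}: left = {left:>2}, right = {right:>2}, jump = {right - left}")
```

At each of the three integers tried, the left value is n − 1, the right value is n, and the jump is 1, in line with the staircase description above.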


Worked Example (Collapsible)

Try the problem yourself before opening the walkthrough. Given

$$f(x) = \begin{cases} x^2 + 1 & x < 2 \\ x - 1 & x \geq 2 \end{cases}$$

Compute $\lim_{x \to 2^-} f(x)$, $\lim_{x \to 2^+} f(x)$, $f(2)$, and decide whether $\lim_{x \to 2} f(x)$ exists.

Step 1 — Pick the correct branch for each side.

To the LEFT of 2 we use $f(x) = x^2 + 1$. To the RIGHT of 2 we use $f(x) = x - 1$. At exactly $x = 2$ the definition says $x \geq 2$, so the second branch wins: $f(2) = 2 - 1 = 1$.

Step 2 — Left-hand limit.

Because the branch $x^2 + 1$ is a polynomial (continuous everywhere), we can just plug in:

$$\lim_{x \to 2^-} (x^2 + 1) = 2^2 + 1 = 5$$

Quick sanity check: probe with $x = 1.999$: $1.999^2 + 1 \approx 4.996$, and with $x = 1.9999$: $\approx 4.9996$. The values are marching to 5.

Step 3 — Right-hand limit.

The branch $x - 1$ is also a polynomial, so:

$$\lim_{x \to 2^+} (x - 1) = 2 - 1 = 1$$

Sanity check: $f(2.001) = 1.001$, $f(2.0001) = 1.0001$, marching to 1.

Step 4 — Compare.

$L^- = 5$ and $L^+ = 1$. Because $5 \neq 1$, the agreement theorem says the two-sided limit $\lim_{x \to 2} f(x)$ does NOT exist.

Step 5 — Measure the jump.

Jump = $L^+ - L^- = 1 - 5 = -4$. The function drops by 4 units as we cross $x = 2$ going left-to-right. The sign matters: a negative jump means a cliff down; a positive jump means a step up.

Step 6 — Where does f(2) land?

We found $f(2) = 1$. That is exactly where the RIGHT branch starts. On a graph this shows up as a closed dot on the right piece at height 1, and an OPEN dot on the left piece at height 5.

Summary.

Everything the problem asked for: $L^- = 5$, $L^+ = 1$, $f(2) = 1$, $\lim_{x \to 2} f(x)$ does not exist, jump size $-4$.
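
The numerical sanity checks in Steps 2 and 3 can also be made exact. As a sketch, write each probe as a small positive offset $h$ from 2 (the symbol $h$ is introduced here for illustration only) and expand:

```latex
\begin{aligned}
f(2 - h) &= (2 - h)^2 + 1 = 5 - 4h + h^2 \;\longrightarrow\; 5 && \text{as } h \to 0^+ \\
f(2 + h) &= (2 + h) - 1 = 1 + h \;\longrightarrow\; 1 && \text{as } h \to 0^+
\end{aligned}
```

With $h = 0.001$ the first line gives $5 - 0.004 + 0.000001 = 4.996001$ and the second gives $1.001$, matching the probes above and showing why the gap to each limit shrinks roughly in proportion to $h$.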


Python: Watch the Limit Form

The intuition we just walked through is something you can see the computer do. The script below is the same worked example, but automated: it evaluates $f$ at a few points crowding toward $2$ from each side and prints the results. The line-by-line notes that follow show the values that pass through each line; the complete listing comes after them.

Numerical One-Sided-Limit Probe
🐍numerical_limit_probe.py
Line 1: File header comment

Marks this script as a numerical probe — we will not prove a limit, we will watch one happen by plugging in values that crowd toward x = 2 from both sides.

Line 3: def f(x: float) -> float:

Defines the piecewise function f. The colon-annotations are type hints: x is a float and the function returns a float. They are comments to humans and tools — Python does not enforce them at runtime.

EXECUTION STATE
⬇ input: x (float) = Any real number. We will feed in 1.9, 1.99, 1.999, ... and also 2.1, 2.01, 2.001, ... to probe from both sides.
⬆ returns = float — either x² + 1 (if x < 2) or x − 1 (if x ≥ 2). The value at x = 2 is 2 − 1 = 1 because the else-branch owns the point.
Line 4: Docstring

Triple-quoted string right after the def. Python stores it as f.__doc__. It documents intent: this function is deliberately discontinuous at x = 2.

Line 5: if x < 2:

Selects the LEFT branch. The boolean comparison x < 2 is True whenever x is strictly less than 2. At exactly x = 2 this is False, so the else-branch runs.

EXECUTION STATE
x < 2 = Returns a bool: True for 1.9999, False for 2.0 and 2.0001.
Line 6: return x**2 + 1

Left branch — squares x and adds 1. As x → 2⁻, x² → 4 and so x² + 1 → 5. This is the left limit.

EXECUTION STATE
** = Python exponentiation operator. x**2 computes x × x. Example: 1.999 ** 2 = 3.996001.
→ f(1.9) = 1.9² + 1 = 3.61 + 1 = 4.610000
→ f(1.99) = 1.99² + 1 = 3.9601 + 1 = 4.960100
→ f(1.999) = 1.999² + 1 = 3.996001 + 1 = 4.996001
→ f(1.9999) = 1.9999² + 1 ≈ 4.999600
Line 7: else:

Any x ≥ 2 falls into this branch. Notice the point x = 2 itself is owned by this side — in graphs this is drawn as a filled dot on the right piece.

Line 8: return x - 1

Right branch — a straight line. As x → 2⁺, x − 1 → 1. This is the right limit.

EXECUTION STATE
→ f(2.0) = 2.0 − 1 = 1.000000
→ f(2.0001) = 2.0001 − 1 = 1.000100
→ f(2.001) = 2.001 − 1 = 1.001000
→ f(2.01) = 2.01 − 1 = 1.010000
Line 11: left_xs = [1.5, 1.9, 1.99, 1.999, 1.9999]

A list of increasingly close approximations to 2 from the LEFT. After the first step (distance 0.5 to distance 0.1), each successive element is 10× closer to 2 than the previous one. This is the shape of a limit: not a single value, but a sequence that crowds toward 2 without ever reaching it.

EXECUTION STATE
left_xs = [1.5, 1.9, 1.99, 1.999, 1.9999]
→ distances to 2 = [0.5, 0.1, 0.01, 0.001, 0.0001] — shrinking toward 0
Line 12: print("Approaching 2 from the LEFT (x < 2)")

Just a header for the output so the two sides are visually separate in the terminal.

Line 13: for x in left_xs:

Iterate over every probe value on the left side. Each pass through the loop evaluates f at one increasingly-close value.

LOOP TRACE · 5 iterations
x = 1.5
f(1.5) = 1.5² + 1 = 3.250000
x = 1.9
f(1.9) = 1.9² + 1 = 4.610000
x = 1.99
f(1.99) = 1.99² + 1 = 4.960100
x = 1.999
f(1.999) = 1.999² + 1 = 4.996001
x = 1.9999
f(1.9999) = 1.9999² + 1 ≈ 4.999600 (closing in on 5)
Line 14: print(f"  f({x:<6}) = {f(x):.6f}")

f-string formatted print. Each format spec controls exactly what ends up in the terminal.

EXECUTION STATE
📚 f-strings = PEP 498. Any expression inside {...} inside an f-prefixed string is evaluated and inserted. f'{a+b}' evaluates a+b first.
⬇ spec: {x:<6} = Format x in its default text form, left-aligned (<), padded to width 6 with spaces. Keeps the values lined up vertically: '1.9   ', '1.99  ', '1.999 '.
⬇ spec: {f(x):.6f} = Format f(x) as a fixed-point decimal with 6 digits after the point. 4.6101 becomes '4.610100'. This makes the convergence visible at many decimal places.
Line 17: right_xs = [2.5, 2.1, 2.01, 2.001, 2.0001]

Now the RIGHT side: values that crowd down to 2 from above. Same shrinking-distance idea, opposite direction.

EXECUTION STATE
right_xs = [2.5, 2.1, 2.01, 2.001, 2.0001]
→ distances to 2 = [0.5, 0.1, 0.01, 0.001, 0.0001]
Line 18: print("\nApproaching 2 from the RIGHT (x >= 2)")

The leading \n is a newline escape — prints a blank line before the header so the two probes are visually separated.

EXECUTION STATE
\n = Escape sequence for a newline character. In source you type two characters (backslash, n); the printed output is a single line break.
Line 19: for x in right_xs:

Same loop structure, different values. Watch f(x) fall toward 1 as x falls toward 2.

LOOP TRACE · 5 iterations
x = 2.5
f(2.5) = 2.5 − 1 = 1.500000
x = 2.1
f(2.1) = 2.1 − 1 = 1.100000
x = 2.01
f(2.01) = 2.01 − 1 = 1.010000
x = 2.001
f(2.001) = 2.001 − 1 = 1.001000
x = 2.0001
f(2.0001) = 2.0001 − 1 = 1.000100 (closing in on 1)
Line 20: print(f"  f({x:<6}) = {f(x):.6f}")

Same formatted print — we reuse it so the two sides line up visually in the terminal, inviting the eye to compare 4.999600 with 1.000100.

Line 22: print("\nConclusion:")

Prints a section header before the verdict. The point of the whole script is this verdict.

Line 23: print("  L^- = 5.0   (left limit)")

Reports the left-hand limit. All five left-side values were crawling up toward 5.0, and any closer value would land even nearer. That is the operational meaning of lim x→2⁻ f(x) = 5.

Line 24: print("  L^+ = 1.0   (right limit)")

Reports the right-hand limit. All five right-side values were drifting down to 1.0. So lim x→2⁺ f(x) = 1.

Line 25: print("  L^- != L^+  ->  the two-sided limit does NOT exist")

The punchline. Because the two sides disagree (5 ≠ 1), the ordinary two-sided limit lim x→2 f(x) is undefined. This is exactly the condition that defines a JUMP DISCONTINUITY.

EXECUTION STATE
verdict = No two-sided limit at x = 2. Jump size = L⁺ − L⁻ = 1 − 5 = −4.
# numerical_limit_probe.py: watch a one-sided limit form itself

def f(x: float) -> float:
    """Piecewise function with a jump at x = 2."""
    if x < 2:
        return x**2 + 1        # left branch
    else:
        return x - 1           # right branch

# Probe from the LEFT: x -> 2^-
left_xs = [1.5, 1.9, 1.99, 1.999, 1.9999]
print("Approaching 2 from the LEFT (x < 2)")
for x in left_xs:
    print(f"  f({x:<6}) = {f(x):.6f}")

# Probe from the RIGHT: x -> 2^+
right_xs = [2.5, 2.1, 2.01, 2.001, 2.0001]
print("\nApproaching 2 from the RIGHT (x >= 2)")
for x in right_xs:
    print(f"  f({x:<6}) = {f(x):.6f}")

print("\nConclusion:")
print("  L^- = 5.0   (left limit)")
print("  L^+ = 1.0   (right limit)")
print("  L^- != L^+  ->  the two-sided limit does NOT exist")

Plain Python first: no NumPy, no external libraries. The entire mechanism of probing a one-sided limit can be expressed in one for-loop and one if-statement. That is the whole idea: the limit describes where the process of approach is heading, not the value at any single point.
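
The hard-coded probe lists can be folded into a small reusable helper. The sketch below is one possible way to do it; the name `one_sided_probes`, the `side` strings, and the choice of offsets 10⁻¹, 10⁻², … are assumptions made for illustration.

```python
from typing import Callable, List


def one_sided_probes(f: Callable[[float], float], a: float,
                     side: str, steps: int = 6) -> List[float]:
    """Return f evaluated at a - 10**-k (left) or a + 10**-k (right), k = 1..steps.

    The point x = a itself is never evaluated.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    sign = -1.0 if side == "left" else 1.0
    return [f(a + sign * 10.0 ** (-k)) for k in range(1, steps + 1)]


def f(x: float) -> float:
    """The section's piecewise example with a jump at x = 2."""
    return x**2 + 1 if x < 2 else x - 1


left_vals = one_sided_probes(f, 2.0, "left")
right_vals = one_sided_probes(f, 2.0, "right")
print("left :", [round(v, 6) for v in left_vals])
print("right:", [round(v, 6) for v in right_vals])
```

The printed lists suggest the two one-sided limits but do not prove them; the helper only automates the "plug in closer values" step.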


PyTorch: ReLU's Jumpy Derivative

One-sided limits are not just textbook curiosities; they sit at the mathematical heart of modern deep learning. The ReLU activation $\operatorname{ReLU}(x) = \max(0, x)$ is continuous (its two one-sided limits agree at $x = 0$, both equal to 0, which is also the value $\operatorname{ReLU}(0)$), but its derivative has a JUMP at $x = 0$:

$$\lim_{x \to 0^-} \frac{d}{dx}\operatorname{ReLU}(x) = 0 \quad \text{but} \quad \lim_{x \to 0^+} \frac{d}{dx}\operatorname{ReLU}(x) = 1.$$

Every backward pass through a ReLU neuron uses this two-valued derivative. PyTorch has to pick some value at exactly $x = 0$; let's probe what choice it makes.

ReLU Derivative — Probing the Jump with PyTorch Autograd
🐍relu_one_sided.py
Line 1: File header comment

Marks this as the ML application: the ReLU activation — the most common nonlinearity in modern neural networks — has a DERIVATIVE that jumps at x = 0. The very idea of a one-sided limit shows up in every backward pass through a ReLU.

Line 2: import torch

PyTorch's top-level module. Provides torch.tensor (the factory function for tensors, the core n-dimensional array type that autograd tracks) and torch.autograd (the reverse-mode differentiation engine). Without autograd, we would have to compute ReLU's derivative by hand.

EXECUTION STATE
torch = Deep-learning framework — tensors + autograd + GPU. Imported once per script.
Line 3: import torch.nn.functional as F

A namespace of stateless layer functions. F.relu(x) is the plain mathematical ReLU: max(0, x). The alias 'F' is the community-standard short name; you will see it in most PyTorch codebases.

EXECUTION STATE
📚 F.relu = Function form of ReLU. Takes a tensor, returns a tensor of the same shape with every negative entry replaced by 0. Example: F.relu(tensor([-1, 2])) → tensor([0, 2]).
Line 5: def relu_grad_at(x_value: float) -> float:

A tiny helper that asks autograd: 'what is the derivative of ReLU at this specific input?'. Calling it four times from the left and four times from the right is exactly the one-sided-limit probe — but for a DERIVATIVE.

EXECUTION STATE
⬇ input: x_value = A plain Python float — the point at which we want the derivative. We will pass in tiny negatives, tiny positives, and exactly 0.
⬆ returns = A Python float — the numerical value of dReLU/dx at x_value.
Line 6: Docstring

Documents the helper. 'autograd' = PyTorch's automatic differentiation. It records the operations y = F.relu(x), then replays them backwards to compute dy/dx.

Line 7: x = torch.tensor(x_value, requires_grad=True)

Wraps the plain float as a PyTorch tensor that autograd can track.

EXECUTION STATE
📚 torch.tensor() = Factory: builds a new tensor from Python data. Example: torch.tensor(3.14) → tensor(3.1400).
⬇ arg 1: x_value = The numeric payload. Becomes the tensor's single element. Example: if x_value = -0.001, the tensor is tensor(-0.0010).
⬇ arg 2: requires_grad=True = Tells autograd to WATCH this tensor. Every op touching x will be recorded on the computation graph, so we can later ask for dy/dx. Without this flag, backward() would have nothing to fill in.
⬆ result: x = tensor(x_value, requires_grad=True) — a leaf tensor on the autograd graph.
Line 8: y = F.relu(x)

Applies ReLU: y = max(0, x). For x < 0 the result is 0; for x > 0 the result is x. Autograd also records the rule it will use on the way back.

EXECUTION STATE
📚 F.relu(x) = Piecewise linear: y = x if x ≥ 0 else 0. In the backward pass the local derivative is 1 for x > 0 and 0 for x < 0 — which is precisely where one-sided limits come in.
→ if x = -0.001 = y = max(0, -0.001) = 0.0
→ if x = +0.001 = y = max(0, +0.001) = 0.001
Line 9: y.backward()

The autograd trigger. It walks from y backwards through every recorded op, multiplying local derivatives, and writes the result into x.grad.

EXECUTION STATE
📚 .backward() = Tensor method that starts reverse-mode auto-differentiation. For scalar y, it is equivalent to computing dy/d(leaf) for every leaf tensor on the graph.
→ side effect = Populates x.grad with dy/dx. Before this line, x.grad is None.
Line 10: return x.grad.item()

Unwraps the 0-dim tensor x.grad back to a plain Python float. .item() only works when the tensor contains exactly one element — which is our case.

EXECUTION STATE
📚 .item() = Converts a single-element tensor to a Python scalar. Useful when you want to hand the value to non-PyTorch code (print, lists, numpy, etc.).
→ for x = -1e-3 = x.grad = tensor(0.) → .item() = 0.0 (ReLU is locally flat on the left)
→ for x = +1e-3 = x.grad = tensor(1.) → .item() = 1.0 (ReLU has slope 1 on the right)
Line 13: left_probes = [-0.1, -1e-3, -1e-6, -1e-9]

Probes on the LEFT of 0. Written in scientific notation: 1e-3 = 0.001, 1e-9 = 0.000000001. The list crowds toward 0 from below.

EXECUTION STATE
1e-9 notation = Python literal syntax for 10⁻⁹. 'e' means 'times 10 to the power of'. 1e-9 = 0.000000001.
Line 14: right_probes = [+1e-9, +1e-6, +1e-3, +0.1]

Mirror image: values above 0 shrinking toward it. The explicit + signs are cosmetic — they make the left/right symmetry leap out to the reader.

Line 16: print("ReLU derivative from the LEFT (x -> 0^-):")

Banner before the left-side probe. Notice the script prints a derivative, not a function value. This is the key conceptual upgrade: one-sided limits apply to any quantity — including slopes.

Line 17: for x in left_probes:

Loop through the four left-side probe points and print the derivative at each.

LOOP TRACE · 4 iterations
x = -0.1
relu_grad_at(-0.1) = 0.0 (ReLU is flat for x < 0)
x = -1e-3
relu_grad_at(-0.001) = 0.0
x = -1e-6
relu_grad_at(-1e-6) = 0.0
x = -1e-9
relu_grad_at(-1e-9) = 0.0 (still exactly 0 — this is the LEFT limit of dReLU/dx)
Line 18: print(f"  dReLU/dx at x={x:+.1e}  =  {relu_grad_at(x):.1f}")

Formatted output, one line per probe. The format specs are deliberately chosen to make the pattern obvious.

EXECUTION STATE
⬇ spec: {x:+.1e} = + forces an explicit sign (− or +). .1e means scientific notation with 1 digit after the decimal. Example: -0.001 → '-1.0e-03'.
⬇ spec: {relu_grad_at(x):.1f} = Fixed-point float with 1 digit. 0.0 and 1.0 — that is ALL the derivative ever prints. No decimals drift, because ReLU is piecewise linear.
Line 20: print("\nReLU derivative from the RIGHT (x -> 0^+):")

Banner for the right-side probe. \n puts a blank line before the header.

Line 21: for x in right_probes:

Same loop, opposite side. Every derivative here comes out to 1.0 — the slope of y = x.

LOOP TRACE · 4 iterations
x = +1e-9
relu_grad_at(+1e-9) = 1.0 (LEFT-most on the right side — the RIGHT limit of dReLU/dx)
x = +1e-6
relu_grad_at(+1e-6) = 1.0
x = +1e-3
relu_grad_at(+0.001) = 1.0
x = +0.1
relu_grad_at(+0.1) = 1.0
Line 22: print(f"  dReLU/dx at x={x:+.1e}  =  {relu_grad_at(x):.1f}")

Same format as the left side, so the reader can compare two columns of output and see the jump from 0.0 to 1.0 with their own eyes.

Line 25: print(f"\nPyTorch's convention at x = 0: {relu_grad_at(0.0):.1f}")

The last probe: exactly x = 0. Mathematically the derivative is UNDEFINED here (left = 0, right = 1 disagree). A framework MUST pick some value to keep training going — PyTorch picks 0.

EXECUTION STATE
verdict = Prints '0.0'. This is a library convention, not a mathematical truth — it is one element of the SUBGRADIENT set {any value in [0, 1]}.
Line 26: # -> 0.0 (PyTorch picks the left-side subgradient)

A closing comment explaining the result. The fact that neural networks survive this discontinuity — that training still converges — is because SGD rarely lands on x = 0 exactly, and when it does, any value in [0, 1] is a valid subgradient.

# relu_one_sided.py: a real ML function with a jumpy derivative
import torch
import torch.nn.functional as F

def relu_grad_at(x_value: float) -> float:
    """Compute d/dx of ReLU(x) at a single point using autograd."""
    x = torch.tensor(x_value, requires_grad=True)
    y = F.relu(x)
    y.backward()
    return x.grad.item()

# Probe the derivative near x = 0 from both sides
left_probes  = [-0.1, -1e-3, -1e-6, -1e-9]
right_probes = [+1e-9, +1e-6, +1e-3, +0.1]

print("ReLU derivative from the LEFT (x -> 0^-):")
for x in left_probes:
    print(f"  dReLU/dx at x={x:+.1e}  =  {relu_grad_at(x):.1f}")

print("\nReLU derivative from the RIGHT (x -> 0^+):")
for x in right_probes:
    print(f"  dReLU/dx at x={x:+.1e}  =  {relu_grad_at(x):.1f}")

# And at the non-differentiable point itself:
print(f"\nPyTorch's convention at x = 0: {relu_grad_at(0.0):.1f}")
# -> 0.0 (PyTorch picks the left-side subgradient)
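
The same jump can be seen without autograd by comparing one-sided difference quotients at 0. The sketch below uses only plain Python; the names `relu` and `one_sided_slopes` are helpers invented for this illustration and are not PyTorch functions.

```python
def relu(x: float) -> float:
    """Plain-Python ReLU: max(0, x)."""
    return max(0.0, x)


def one_sided_slopes(h: float):
    """Left and right difference quotients of ReLU at 0 for a step h > 0."""
    left = (relu(0.0) - relu(-h)) / h    # uses a point with x < 0
    right = (relu(h) - relu(0.0)) / h    # uses a point with x > 0
    return left, right


for h in (1e-1, 1e-3, 1e-6, 1e-9):
    left, right = one_sided_slopes(h)
    print(f"h = {h:.0e}: left slope = {left:.1f}, right slope = {right:.1f}")
```

Because ReLU is linear on each side of 0, the two quotients do not depend on h at all: the left one is always 0 and the right one is always 1, so no choice of step size can make them agree.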

Why this matters for deep learning

A feed-forward network built from $n$ ReLU units is piecewise linear, and its gradient can jump wherever one of those $n$ units switches between its flat side and its sloped side. Every time SGD steps across one of these boundaries, the gradient changes abruptly. The fact that training still works, and works well, is partly because the set of points where the one-sided derivative limits disagree typically has measure zero, so in practice the optimizer almost never lands there exactly.
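
To make the "gradient jumps at each boundary" picture concrete, here is a toy scalar-input network with three hidden ReLU units. All weights below are hypothetical values chosen for the illustration, and the names `net` and `slopes_at` are invented for this sketch; the slopes are estimated with one-sided difference quotients rather than autograd.

```python
# Hypothetical weights for net(x) = sum_i V[i] * ReLU(W[i] * x + B[i]).
W = [1.0, 1.0, -1.0]    # input-to-hidden weights
B = [1.0, -1.0, -2.0]   # hidden biases
V = [1.0, -2.0, 0.5]    # hidden-to-output weights


def net(x: float) -> float:
    return sum(v * max(0.0, w * x + b) for w, b, v in zip(W, B, V))


def slopes_at(a: float, h: float = 1e-3):
    """One-sided difference quotients of net at a, rounded to 6 decimals."""
    left = (net(a) - net(a - h)) / h
    right = (net(a + h) - net(a)) / h
    return round(left, 6), round(right, 6)


# Each unit switches on or off where its pre-activation W*x + B is zero.
kinks = sorted(-b / w for w, b in zip(W, B))

for a in kinks:
    left, right = slopes_at(a)
    print(f"x = {a:+.1f}: slope from left = {left:+.1f}, from right = {right:+.1f}")
```

With these particular weights the network has three kinks, one per unit, and the slope takes a different value on each side of every kink. Between kinks the function is a straight line, which is why a small fixed step h is enough here.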


Common Pitfalls

  • Assuming $f(a)$ equals one of the one-sided limits. Not required. The sign function has $L^- = -1$, $L^+ = 1$, but $\operatorname{sgn}(0) = 0$, a value that neither side approaches.
  • Forgetting that “exists” has a precise meaning. The two-sided limit EXISTS only when both one-sided limits exist AND equal each other. If either one-sided limit blows up or oscillates, the two-sided limit does not exist.
  • Reading $a^-$ as “a minus something”. The little dash is a direction marker, not subtraction. It says “approach $a$ through values smaller than $a$”.
  • Picking too few probe points. Three values is usually not enough to be confident the limit exists. Use a decreasing sequence of distances like $0.1, 0.01, 0.001, 0.0001$ so each probe is ten times closer than the one before. Even then, numerical probing is evidence, not proof.
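
A caution on the last pitfall: even a tidy power-of-ten sequence can mislead when the function oscillates. The sketch below uses a hypothetical function g(x) = sin(π/x), which is not one of this section's examples, and probes it toward 0 from the right along two different sequences.

```python
import math


def g(x: float) -> float:
    """Hypothetical example: oscillates faster and faster as x -> 0+."""
    return math.sin(math.pi / x)


# Powers of ten: pi/x is a whole multiple of pi, so g is (nearly) 0 each time.
decimal_probes = [10.0 ** (-k) for k in range(1, 5)]

# x = 2/(4k+1): pi/x equals (4k+1)*pi/2, so g is (nearly) 1 each time.
offset_probes = [2.0 / (4 * k + 1) for k in (5, 50, 500, 5000)]

print("powers of ten:", [round(g(x), 6) for x in decimal_probes])
print("offset probes:", [round(g(x), 6) for x in offset_probes])
```

The first sequence suggests a right-hand limit of 0 and the second suggests 1. Since both sequences approach 0 from the right, the right-hand limit of this g at 0 cannot exist, which a single probe sequence would never reveal.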

Summary

  • The left-hand limit $\lim_{x \to a^-} f(x)$ watches $f$ as $x$ approaches $a$ through values $x < a$.
  • The right-hand limit $\lim_{x \to a^+} f(x)$ is the mirror image: $x > a$.
  • Agreement theorem: the two-sided limit exists iff both one-sided limits exist AND are equal.
  • A jump discontinuity is the specific failure where both one-sided limits exist but differ. The jump is $L^+ - L^-$.
  • Real-world switching, rounding, and step phenomena, from digital logic to the slope of a ReLU activation, are modeled as jump discontinuities, and one-sided limits are the language we use to talk about them.

Next section we flip direction: what if $x$ itself is running away, $x \to \infty$? Limits at infinity, and the horizontal asymptotes they reveal.