Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Define what it means for a function to be differentiable at a point
Explain the relationship between continuity and differentiability
Identify corners, cusps, vertical tangents, and discontinuities as points where derivatives fail to exist
Compute one-sided (left-hand and right-hand) derivatives
Analyze piecewise functions for differentiability
Connect non-differentiable points to challenges in machine learning optimization

The Big Picture: When Derivatives Break Down

"The derivative exists at a point only when the function has a single, well-defined tangent line there — not a corner, not a cusp, not a vertical line."

In the previous sections, we defined the derivative as a limit and computed it for smooth functions like polynomials. But not every function has a derivative at every point. This section explores the boundary conditions of calculus — where and why the derivative fails to exist.

The Central Question

Given a function $f(x)$ , at which points does $f'(x)$ exist, and what can go wrong at points where it doesn't?

Understanding differentiability is crucial because:

Theoretical foundation: Many calculus theorems (Mean Value Theorem, Taylor series) require differentiability
Optimization: Gradient-based optimization assumes derivatives exist
Machine learning: Non-differentiable activation functions (like ReLU) require special handling
Physics: Sharp boundaries and phase transitions create non-differentiable points

Historical Context: Weierstrass and the Monsters

In the 19th century, mathematicians assumed that any continuous function could be differentiated except perhaps at a few isolated points. Then in 1872, Karl Weierstrass shocked the mathematical world by constructing a function that is continuous everywhere but differentiable nowhere.

The Weierstrass Function

Weierstrass showed that the function $f(x) = \sum_{n=0}^{\infty} a^n \cos(b^n \pi x)$ (with $0 < a < 1$ and $b$ a positive odd integer satisfying $ab > 1 + \frac{3}{2}\pi$) is continuous everywhere but differentiable nowhere. Its graph stays jagged at every scale: no matter how far you zoom in, it never flattens into a tangent line. This "mathematical monster" forced mathematicians to be much more careful about the distinction between continuity and differentiability.

This discovery revealed that continuity and differentiability are fundamentally different properties. Today, we encounter similar issues in practical applications — neural network activation functions like ReLU are continuous but not differentiable at certain points.
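To get a feel for this behavior, here is a minimal sketch that evaluates a finite partial sum of the series and its difference quotients at $x = 0$. The parameter choices $a = 0.5$ and $b = 13$ (an odd integer with $ab = 6.5 > 1 + \frac{3}{2}\pi$), the 30-term truncation, and the helper names are illustrative assumptions. A finite partial sum is itself differentiable, so this only illustrates the trend over a range of step sizes; it is not a proof.

```python
import math

def weierstrass_partial(x, a=0.5, b=13, terms=30):
    """Partial sum of sum_{n>=0} a^n * cos(b^n * pi * x) (truncation is an assumption)."""
    return sum(a**n * math.cos(b**n * math.pi * x) for n in range(terms))

def quotient_at_zero(h):
    """Difference quotient [f(0 + h) - f(0)] / h for the partial sum."""
    return (weierstrass_partial(h) - weierstrass_partial(0.0)) / h

# Shrink h along h = 13^(-m); for a differentiable function these would settle.
quotients = [quotient_at_zero(13.0 ** -m) for m in range(1, 6)]
for m, q in enumerate(quotients, start=1):
    print(f"h = 13^-{m}: difference quotient = {q:.3e}")
```

Along this sequence the quotients grow in magnitude by roughly a factor of $ab = 6.5$ per step instead of approaching a limit.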

What Does Differentiable Mean?

Let's state precisely what it means for a function to be differentiable:

Definition: Differentiability at a Point

A function $f$ is differentiable at $x = a$ if the following limit exists and is finite:

$$f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h}$$

This means the limit must:

Exist (not oscillate or diverge)
Be the same whether h approaches 0 from the left or right
Be a finite real number (not ±∞)
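The three requirements can be seen numerically in a smooth case. The sketch below assumes the illustrative function $f(x) = x^2$ and the point $a = 3$ (neither is prescribed by the definition) and evaluates the difference quotient for shrinking $h$ on both sides.

```python
def difference_quotient(f, a, h):
    """[f(a + h) - f(a)] / h; h may be positive or negative, but not zero."""
    return (f(a + h) - f(a)) / h

def square(x):
    return x ** 2

a = 3.0
for h in (0.1, 0.01, 0.001):
    from_right = difference_quotient(square, a, h)    # h > 0
    from_left = difference_quotient(square, a, -h)    # h < 0
    print(f"h = {h:<6} left = {from_left:.4f}  right = {from_right:.4f}")
```

For this function the quotients are $6 - h$ from the left and $6 + h$ from the right, so both sides settle on the same finite value, $f'(3) = 6$.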

What Can Go Wrong?

The derivative at $x = a$ fails to exist when:

Problem	Description	Example
Corner	One-sided limits of the difference quotient are finite but differ	f(x) = |x| at x = 0
Cusp	One-sided limits are infinite with opposite signs	f(x) = x^(2/3) at x = 0
Vertical Tangent	Limit is infinite (+∞ or -∞)	f(x) = ∛x at x = 0
Discontinuity	Function not continuous at the point	Step functions
Oscillation	Limit doesn't exist (difference quotient oscillates)	f(x) = x·sin(1/x) with f(0) = 0, at x = 0

Continuity vs Differentiability: A Crucial Distinction

One of the most important relationships in calculus is between continuity and differentiability:

✓ Differentiable ⟹ Continuous

If $f$ is differentiable at $x = a$ , then $f$ must be continuous at $x = a$ .

Differentiability is a "stronger" condition.

✗ Continuous ⟹ Differentiable

If $f$ is continuous at $x = a$ , it does NOT necessarily mean $f$ is differentiable there.

Counter-example: f(x) = |x| at x = 0.

Proof: Differentiability Implies Continuity

Goal: Show that if $f'(a)$ exists, then $\lim_{x \to a} f(x) = f(a)$ .

Proof: Write $f(x) - f(a) = \frac{f(x) - f(a)}{x - a} \cdot (x - a)$

Taking the limit as $x \to a$ :

$\lim_{x \to a} [f(x) - f(a)] = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \cdot \lim_{x \to a}(x - a)$

$= f'(a) \cdot 0 = 0$

Therefore $\lim_{x \to a} f(x) = f(a)$ , which means $f$ is continuous at $a$ . ∎

Key Insight

Continuity is necessary but not sufficient for differentiability. Think of it as: you must first pass the continuity test before you can even attempt the differentiability test.

Interactive Exploration

[Interactive figure: Differentiability Explorer. It shows the secant line through $(x_1, f(x_1))$ and $(x_1 + h, f(x_1 + h))$ as $h \to 0$ from the left or from the right. A function is differentiable at a point only if both one-sided limits exist and are equal.]

Sample reading for $f(x) = |x|$ at $x_1 = 0$, approaching from the left with $h = -1$: $x_2 = x_1 + h = -1$, $f(x_1) = 0$, $f(x_2) = 1$, so the secant slope is $(1 - 0)/(-1) = -1$. Approaching from the right gives slope $+1$ instead. The absolute value function has a sharp corner at $x = 0$: the left derivative ($-1$) does not equal the right derivative ($+1$).

Key Insight

At a corner, the left-hand derivative exists (here it's -1) and the right-hand derivative exists (here it's +1), but they are not equal. Since the two-sided limit doesn't exist, the function is not differentiable at x = 0.

Corners and Cusps: Sharp Turns in the Graph

Corners: The Classic Example

A corner occurs where the function changes direction abruptly. The most famous example is the absolute value function:

$$f(x) = |x| = \begin{cases} -x & \text{if } x < 0 \\ x & \text{if } x \geq 0 \end{cases}$$

At $x = 0$ , the graph forms a "V" shape. Let's check differentiability:

Approaching from the left (h < 0)

$$\lim_{h \to 0^-} \frac{|0+h| - |0|}{h} = \lim_{h \to 0^-} \frac{|h|}{h} = \lim_{h \to 0^-} \frac{-h}{h} = -1 \quad (\text{since } |h| = -h \text{ for } h < 0)$$

Approaching from the right (h > 0)

$$\lim_{h \to 0^+} \frac{|0+h| - |0|}{h} = \lim_{h \to 0^+} \frac{|h|}{h} = \lim_{h \to 0^+} \frac{h}{h} = +1 \quad (\text{since } |h| = h \text{ for } h > 0)$$

Since $-1 \neq +1$ , the two-sided limit doesn't exist. The derivative $f'(0)$ is undefined.
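The same computation can be checked numerically. This is a small sketch; the helper names `left_quotient` and `right_quotient` are assumptions introduced here for illustration.

```python
def left_quotient(f, a, h):
    """Difference quotient with step -h (h > 0), i.e. approaching a from the left."""
    return (f(a - h) - f(a)) / (-h)

def right_quotient(f, a, h):
    """Difference quotient with step +h (h > 0), i.e. approaching a from the right."""
    return (f(a + h) - f(a)) / h

for h in (0.1, 0.01, 0.001):
    print(f"h = {h:<6} left = {left_quotient(abs, 0.0, h):+.1f}  "
          f"right = {right_quotient(abs, 0.0, h):+.1f}")
```

For $|x|$ the one-sided quotients do not depend on $h$ at all: they are exactly $-1$ and $+1$, so shrinking $h$ can never bring them together.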

Cusps: Even Sharper

A cusp is an extreme corner: the one-sided slopes become infinite with opposite signs, so the graph comes to a sharp point with a vertical tangent. Consider $f(x) = x^{2/3}$:

For $x \neq 0$, the derivative is $f'(x) = \frac{2}{3}x^{-1/3}$, which tends to $-\infty$ as $x \to 0^-$ and to $+\infty$ as $x \to 0^+$.
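A quick numerical sketch of the cusp, assuming NumPy is available. It computes $x^{2/3}$ as `np.cbrt(x) ** 2` so that negative inputs are handled; the helper name `cusp_quotient_at_zero` is an assumption.

```python
import numpy as np

def cusp(x):
    """f(x) = x^(2/3), written as (cube root of x) squared so x < 0 works."""
    return np.cbrt(x) ** 2

def cusp_quotient_at_zero(h):
    """Difference quotient [f(0 + h) - f(0)] / h, which equals h^(-1/3)."""
    return (cusp(h) - cusp(0.0)) / h

for h in (1e-3, 1e-6, 1e-9):
    print(f"h = {h:.0e}: left = {cusp_quotient_at_zero(-h):12.1f}  "
          f"right = {cusp_quotient_at_zero(h):12.1f}")
```

The quotient is negative and unbounded from the left, positive and unbounded from the right, which is the signature of a cusp.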

One-Sided Derivatives

To analyze corners and other problematic points, we introduce one-sided derivatives:

Left-Hand Derivative

$$f'_-(a) = \lim_{h \to 0^-} \frac{f(a+h) - f(a)}{h}$$

The slope as we approach from the left (h is negative)

Right-Hand Derivative

$$f'_+(a) = \lim_{h \to 0^+} \frac{f(a+h) - f(a)}{h}$$

The slope as we approach from the right (h is positive)

Differentiability Criterion

A function $f$ is differentiable at $x = a$ if and only if:

The left-hand derivative $f'_-(a)$ exists as a finite number
The right-hand derivative $f'_+(a)$ exists as a finite number
They are equal: $f'_-(a) = f'_+(a)$

[Interactive figure: One-Sided Derivatives, Left vs Right. It draws both secant lines for $f(x) = |x|$ at $x = 0$ as $h$ approaches 0 from each side.]

Sample reading: with $h = -0.5$ the left secant slope is $-1$, and with $h = 0.5$ the right secant slope is $+1$. Both values stay fixed as $h \to 0$:

$$f'_-(0) = \lim_{h \to 0^-} \frac{|h| - 0}{h} = \lim_{h \to 0^-} \frac{-h}{h} = -1, \qquad f'_+(0) = \lim_{h \to 0^+} \frac{|h| - 0}{h} = \lim_{h \to 0^+} \frac{h}{h} = +1$$

Conclusion: $f'_-(0) = -1 \neq +1 = f'_+(0)$, so $f'(0)$ does not exist.

Vertical Tangents: Infinite Slopes

Some functions are continuous and have no corners, yet still fail to be differentiable. Consider the cube root function:

$$f(x) = \sqrt[3]{x} = x^{1/3}$$

At $x = 0$ , the function is continuous and the graph is smooth-looking. But the tangent line becomes vertical:

$f'(x) = \frac{1}{3}x^{-2/3} = \frac{1}{3\sqrt[3]{x^2}}$

As $x \to 0$ : $f'(x) \to +\infty$

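The formula is undefined at $x = 0$, and the difference quotient there, $\frac{\sqrt[3]{h} - 0}{h} = h^{-2/3}$, grows without bound as $h \to 0$ from either side. The limit is not a finite real number, so the derivative does not exist at $x = 0$. The tangent line is vertical and has undefined slope.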

Vertical vs. Infinite

We don't say the derivative "equals infinity." Infinity is not a real number. Instead, we say the derivative does not exist because the limit is not finite.
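As a rough numerical check, the sketch below evaluates the difference quotient of the cube root at 0, assuming NumPy's `np.cbrt`; the helper name `cbrt_quotient_at_zero` is an assumption. Unlike a cusp, both sides blow up with the same sign.

```python
import numpy as np

def cbrt_quotient_at_zero(h):
    """Difference quotient [cbrt(0 + h) - cbrt(0)] / h, which equals |h|^(-2/3)."""
    return (np.cbrt(h) - np.cbrt(0.0)) / h

for h in (1e-3, 1e-6, 1e-9):
    print(f"h = {h:.0e}: left = {cbrt_quotient_at_zero(-h):14.1f}  "
          f"right = {cbrt_quotient_at_zero(h):14.1f}")
```

The two sides agree with each other at every step, yet neither approaches a finite number, so agreement between left and right quotients alone is not enough.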

Discontinuities: The Prerequisite Fails

Since differentiability requires continuity, any discontinuity automatically prevents differentiability:

Jump Discontinuity

Function "jumps" from one value to another. Example: step function.

✗ Not continuous → Not differentiable

Removable Discontinuity

Function undefined at a point but has a limit. Example: f(x) = x²/x at x = 0.

✗ Not defined → Not differentiable

Infinite Discontinuity

Function approaches ±∞. Example: f(x) = 1/x at x = 0.

✗ Not continuous → Not differentiable
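For a jump discontinuity the failure shows up directly in the difference quotient. A minimal sketch, assuming the sign function (with $\text{sgn}(0) = 0$) as the step function and a hypothetical helper `sgn_quotient_at_zero`:

```python
def sgn(x):
    """Sign function: -1 for x < 0, 0 at x = 0, +1 for x > 0."""
    return (x > 0) - (x < 0)

def sgn_quotient_at_zero(h):
    """Difference quotient [sgn(0 + h) - sgn(0)] / h, which equals 1 / |h|."""
    return (sgn(h) - sgn(0)) / h

for h in (0.1, 0.01, 0.001):
    print(f"h = {h:<6} left = {sgn_quotient_at_zero(-h):8.1f}  "
          f"right = {sgn_quotient_at_zero(h):8.1f}")
```

The numerator stays at size 1 because of the jump while the denominator shrinks, so the quotient diverges.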

Gallery of Non-Differentiable Points

Six common ways a function can fail to be differentiable at a point. Each represents a different breakdown of the limit definition.

📐

Corner

f(x) = |x|

✗ Left and right derivatives differ

The function changes direction abruptly. The left slope is -1, the right slope is +1.

⚡

Cusp

f(x) = x^(2/3)

✗ One-sided derivatives are infinite with opposite signs

Like a corner, but more extreme. The slope tends to -∞ from the left and +∞ from the right, so the graph comes to a sharp point.

📏

Vertical Tangent

f(x) = ∛x

✗ Derivative approaches infinity

The function is smooth and continuous, but the tangent line becomes vertical.

⬆️

Jump Discontinuity

f(x) = sgn(x)

✗ Function is not continuous

Differentiability requires continuity. A jump makes the derivative undefined.

〰️

Wild Oscillation

f(x) = x·sin(1/x), with f(0) = 0

✗ Limit does not exist (oscillates)

The function oscillates infinitely fast near 0. The difference quotient at 0 equals sin(1/h), which keeps swinging between -1 and +1 as h → 0 and never settles.

🕳️

Removable Discontinuity

f(x) = x²/x

✗ Function undefined at point

The function equals x everywhere except at 0, where it's undefined (hole in graph).

The Common Thread

In every case, the limit lim_h→0 [f(x+h) - f(x)] / h fails to exist as a single, finite real number. The reason varies — the limit might not exist at all, might be infinite, or might differ depending on the direction of approach — but the result is the same: no derivative at that point.
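The oscillation entry is the least intuitive of the six, so here is a small sketch of it. It assumes the function is extended by $f(0) = 0$, and it samples the step sizes $h_k = 1 / ((k + \tfrac{1}{2})\pi)$, a choice made here purely for illustration.

```python
import math

def wiggle(x):
    """f(x) = x * sin(1/x), extended by f(0) = 0 (an assumption for this sketch)."""
    return x * math.sin(1.0 / x) if x != 0 else 0.0

def wiggle_quotient_at_zero(h):
    """Difference quotient [f(0 + h) - f(0)] / h, which simplifies to sin(1/h)."""
    return (wiggle(h) - wiggle(0.0)) / h

for k in range(1, 7):
    h = 1.0 / ((k + 0.5) * math.pi)   # 1/h lands on a peak or trough of the sine
    print(f"h = {h:.5f}: difference quotient = {wiggle_quotient_at_zero(h):+.6f}")
```

Along this sequence $h \to 0$ but the quotient alternates between $-1$ and $+1$, so no single limiting value exists.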

Analyzing Piecewise Functions

Piecewise functions are especially important to analyze at their "transition points." Here's a systematic approach:

Check continuity first: Evaluate the left and right limits and the function value. They must all be equal. If they are not, stop: the function is not differentiable there.
Check left-hand derivative: Compute the derivative of the left piece at the transition point.
Check right-hand derivative: Compute the derivative of the right piece at the transition point.
Compare: If the function is continuous and the left and right derivatives are equal, the function is differentiable at the transition point.

Example: Analyzing a Piecewise Function

Consider $f(x) = \begin{cases} x^2 & \text{if } x \leq 1 \\ 2x - 1 & \text{if } x > 1 \end{cases}$

Step 1: Check continuity at x = 1

Left limit: $\lim_{x \to 1^-} x^2 = 1$
Right limit: $\lim_{x \to 1^+} (2x-1) = 1$
Function value: $f(1) = 1^2 = 1$

✓ All equal, so f is continuous at x = 1.

Step 2: Check derivatives

Left derivative: $\frac{d}{dx}(x^2)\big|_{x=1} = 2(1) = 2$
Right derivative: $\frac{d}{dx}(2x-1)\big|_{x=1} = 2$

✓ Both equal 2, so f is differentiable at x = 1 with f'(1) = 2.

Why This Works

Both pieces share the same tangent line at the transition point. The function "smoothly" transitions from one formula to the other.
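The worked example can be double-checked numerically. The sketch below follows the same steps; the helper names `continuity_gap` and `one_sided_slopes` and the step size are assumptions introduced for illustration.

```python
def f(x):
    """The example above: x^2 for x <= 1, and 2x - 1 for x > 1."""
    return x ** 2 if x <= 1 else 2 * x - 1

def continuity_gap(func, a, h=1e-8):
    """Largest difference between func(a) and the values just left and right of a."""
    return max(abs(func(a - h) - func(a)), abs(func(a + h) - func(a)))

def one_sided_slopes(func, a, h=1e-4):
    """Left and right difference quotients at a, using steps -h and +h."""
    left = (func(a - h) - func(a)) / (-h)
    right = (func(a + h) - func(a)) / h
    return left, right

print("continuity gap at x = 1:", continuity_gap(f, 1.0))
left, right = one_sided_slopes(f, 1.0)
print(f"left slope = {left:.4f}, right slope = {right:.4f}")
```

Both slopes come out close to 2 (the left one is $2 - h$, the right one is 2 up to rounding), matching $f'(1) = 2$.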

Machine Learning Applications

Non-differentiable points appear frequently in machine learning, creating both challenges and opportunities.

The ReLU Challenge

The Rectified Linear Unit (ReLU) activation function is the most widely used in deep learning:

\text{ReLU}(x) = \max(0, x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } x \geq 0 \end{cases}

This can be written as $\text{ReLU}(x) = \frac{x + |x|}{2}$, so it inherits the corner of $|x|$ at $x = 0$.

The Problem

At x = 0, ReLU has left derivative 0 and right derivative 1. The derivative is undefined at exactly the point where neurons "activate."

The Solution

In practice, we use a subgradient: we assign ReLU'(0) a fixed value, usually 0 or 1 (any value in $[0, 1]$ is a valid subgradient of ReLU at 0). With real-valued inputs, landing exactly on x = 0 is rare, so this choice has little effect on training.
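To make the word "subgradient" concrete: a number $g$ is a subgradient of a convex function $f$ at 0 if $f(y) \geq f(0) + g \cdot y$ for every $y$. The sketch below checks that inequality for ReLU on a grid of test points. The grid, the candidate values of $g$, and the helper name `is_subgradient_at_zero` are assumptions for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def is_subgradient_at_zero(g, ys):
    """True if relu(y) >= relu(0) + g * y holds at every test point y."""
    return bool(np.all(relu(ys) >= relu(0.0) + g * ys))

ys = np.linspace(-2.0, 2.0, 401)   # test points on both sides of 0
for g in (-0.5, 0.0, 0.5, 1.0, 1.5):
    print(f"g = {g:+.1f}: valid subgradient at 0? {is_subgradient_at_zero(g, ys)}")
```

Values of $g$ between the left slope 0 and the right slope 1 pass the check; values outside that interval fail.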

Smooth Alternatives

To avoid differentiability issues, researchers have developed smooth approximations:

Activation	Formula	Differentiable?
ReLU	max(0, x)	No (corner at 0)
Leaky ReLU	max(αx, x), with 0 < α < 1	No (corner at 0)
Softplus	log(1 + eˣ)	Yes (smooth)
GELU	x · Φ(x)	Yes (smooth)
Swish	x · σ(x)	Yes (smooth)
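The last column of the table can be spot-checked at $x = 0$ by comparing one-sided difference quotients. This is a sketch using only the standard library; $\Phi$ is taken to be the standard normal CDF and $\sigma$ the logistic sigmoid, and the helper name `one_sided_at_zero` is an assumption.

```python
import math

def relu(x):
    return max(0.0, x)

def softplus(x):
    return math.log1p(math.exp(x))

def gelu(x):
    # x * Phi(x), with Phi the standard normal CDF written via erf
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def swish(x):
    # x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def one_sided_at_zero(f, h=1e-6):
    """Left and right difference quotients of f at 0."""
    left = (f(-h) - f(0.0)) / (-h)
    right = (f(h) - f(0.0)) / h
    return left, right

for name, f in [("ReLU", relu), ("Softplus", softplus), ("GELU", gelu), ("Swish", swish)]:
    left, right = one_sided_at_zero(f)
    print(f"{name:9s} left = {left:.6f}  right = {right:.6f}")
```

The three smooth activations give matching one-sided slopes of about 0.5 at the origin, while ReLU gives 0 on the left and 1 on the right.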

Why ReLU Still Wins

Despite being non-differentiable at 0, ReLU remains a popular default and often matches or beats smooth alternatives in practice because:

It's computationally cheap (just a max operation)
It doesn't suffer from vanishing gradients for positive inputs
The non-differentiable point at 0 is rarely an issue numerically

This is a great example of how practical engineering sometimes trumps mathematical elegance!

Python Implementation

Testing Differentiability Numerically

Let's write code to numerically check if a function is differentiable by comparing left and right derivatives:

check_differentiability.py

Differentiability Checker

This function numerically tests whether f is differentiable at x0 by computing and comparing the left-hand and right-hand difference quotients.

Left-Hand Derivative

We approach from the left with [f(x0) - f(x0 - h)] / h for h > 0, which is the same as [f(x0 + k) - f(x0)] / k with k = -h < 0.

Right-Hand Derivative

The standard difference quotient approaching from the right. If this agrees with the left-hand value, the function passes the check.

Tolerance Check

One-sided difference quotients carry a truncation error of order h plus floating-point rounding error, so we check whether the gap is smaller than a tolerance rather than exactly zero.

Testing |x| at 0

The absolute value function has left derivative -1 and right derivative +1 at x = 0, so the difference is 2: clearly not differentiable.

Testing x² at 0

A parabola is smooth. Its one-sided difference quotients at the origin are -h and +h, so both shrink to 0 and f'(0) = 0. The gap between them is 2h, so the tolerance has to be larger than that for the check to pass.

import numpy as np

def check_differentiability(f, x0, h_values=None, tolerance=1e-3):
    """
    Check if a function f is differentiable at x0 by comparing
    left-hand and right-hand difference quotients.

    One-sided quotients carry an error of order h, so the default
    tolerance is kept well above the smallest step size. This simple
    check does not detect vertical tangents, where both quotients
    diverge together.

    Returns: (is_differentiable, left_derivative, right_derivative)
    """
    if h_values is None:
        h_values = [0.1, 0.01, 0.001, 0.0001, 0.00001]

    left_derivatives = []
    right_derivatives = []

    for h in h_values:
        # Left-hand quotient: step back by h > 0, i.e. approach x0 from the left
        left_dq = (f(x0) - f(x0 - h)) / h
        left_derivatives.append(left_dq)

        # Right-hand quotient: step forward by h > 0, i.e. approach from the right
        right_dq = (f(x0 + h) - f(x0)) / h
        right_derivatives.append(right_dq)

    # Take the last (smallest h) as our best estimate
    left_deriv = left_derivatives[-1]
    right_deriv = right_derivatives[-1]

    # Check if they're approximately equal
    is_differentiable = abs(left_deriv - right_deriv) < tolerance

    return is_differentiable, left_deriv, right_deriv

# Example 1: f(x) = |x| at x = 0 (corner)
def absolute_value(x):
    return np.abs(x)

result = check_differentiability(absolute_value, 0)
print("f(x) = |x| at x = 0:")
print(f"  Differentiable: {result[0]}")
print(f"  Left derivative:  {result[1]:.6f}")
print(f"  Right derivative: {result[2]:.6f}")
print(f"  Difference: {abs(result[1] - result[2]):.6f}")
print()

# Example 2: f(x) = x^2 at x = 0 (smooth)
def parabola(x):
    return x ** 2

result = check_differentiability(parabola, 0)
print("f(x) = x^2 at x = 0:")
print(f"  Differentiable: {result[0]}")
print(f"  Left derivative:  {result[1]:.6f}")
print(f"  Right derivative: {result[2]:.6f}")
print(f"  Difference: {abs(result[1] - result[2]):.6f}")
print()

# Example 3: Piecewise function (continuous at 0, but slopes 0 and 1 differ)
def piecewise(x):
    return np.where(x <= 0, x**2, x)

result = check_differentiability(piecewise, 0)
print("Piecewise: x^2 for x <= 0, x for x > 0, at x = 0:")
print(f"  Differentiable: {result[0]}")
print(f"  Left derivative:  {result[1]:.6f}")
print(f"  Right derivative: {result[2]:.6f}")

Activation Functions in Deep Learning

Here's how non-differentiability appears in neural network activation functions:

ReLU and Smooth Alternatives

activation_functions.py

ReLU Definition

ReLU(x) = max(0, x) equals |x|/2 + x/2, so it has the same corner as |x| and is not differentiable at x = 0.

ReLU Derivative Problem

The derivative is 0 for x < 0 and 1 for x > 0. At x = 0 it is undefined. In practice we pick a value, usually 0 or 1; any value in [0, 1] is a subgradient of ReLU at 0.

Leaky ReLU

By allowing a small slope (alpha) for negative inputs, gradients can flow even for negative values. It still has a corner at 0, but the gradient is never exactly zero.

Softplus: The Smooth Alternative

Softplus approximates ReLU but is infinitely differentiable. The derivative is the sigmoid function, which smoothly transitions from 0 to 1.

Sigmoid as Derivative

The sigmoid σ(x) = 1/(1+e^(-x)) is the derivative of softplus. It's smooth, bounded between 0 and 1, and well-defined everywhere.

import numpy as np

# ReLU: The most common activation in deep learning
def relu(x):
    """Rectified Linear Unit: max(0, x)"""
    return np.maximum(0, x)

def relu_derivative(x):
    """
    Derivative of ReLU.

    At x > 0: derivative = 1
    At x < 0: derivative = 0
    At x = 0: undefined! (corner)

    In practice, we use a "subgradient"; this version returns 0 at x = 0.
    """
    return np.where(x > 0, 1.0, 0.0)

# Leaky ReLU: keeps a small slope for negative inputs (still a corner at 0)
def leaky_relu(x, alpha=0.01):
    """
    Leaky ReLU: x if x > 0, else alpha * x

    Still has a corner at 0, but the slope never drops to exactly zero.
    """
    return np.where(x > 0, x, alpha * x)

def leaky_relu_derivative(x, alpha=0.01):
    """Derivative of Leaky ReLU (returns alpha at x = 0 by convention)"""
    return np.where(x > 0, 1.0, alpha)

# Smooth alternative: Softplus
def softplus(x):
    """
    Softplus: log(1 + exp(x))

    This is a SMOOTH approximation to ReLU.
    It's differentiable everywhere!
    """
    # logaddexp(0, x) = log(exp(0) + exp(x)), computed without overflow
    return np.logaddexp(0.0, x)

def softplus_derivative(x):
    """
    Derivative of softplus: 1 / (1 + exp(-x)) = sigmoid(x)

    This is the sigmoid function, smooth and well-defined everywhere.
    """
    return 1 / (1 + np.exp(-x))

# Compare at and near x = 0
x_test = np.array([-0.1, 0, 0.1])

print("Function values near x = 0:")
print(f"  ReLU:     {relu(x_test)}")
print(f"  Softplus: {softplus(x_test)}")
print()

print("Derivative values near x = 0:")
print(f"  ReLU:       {relu_derivative(x_test)}")
print(f"  Softplus:   {softplus_derivative(x_test)}")
print()

print("Key insight:")
print("  ReLU has a corner at 0 (derivative jumps from 0 to 1)")
print("  Softplus is smooth (derivative transitions gradually)")

Common Mistakes to Avoid

Mistake 1: Assuming continuity implies differentiability

Wrong: "The function is continuous at x = 0, so it must be differentiable there."

Correct: Continuity is necessary but not sufficient. You must also check that left and right derivatives exist and are equal. Example: |x| is continuous at 0 but not differentiable.

Mistake 2: Ignoring the direction of approach

Wrong: Computing only a one-sided derivative and concluding that the derivative exists.

Correct: Always check both the left and right limits. The derivative exists only if both are finite and equal.

Mistake 3: Saying derivative equals infinity

Wrong: "The derivative at x = 0 is infinity."

Correct: The derivative does not exist because the limit is infinite. Infinity is not a real number.

Mistake 4: Only checking the formula

Wrong: Taking the derivative formula and evaluating at a problem point without verifying the limit exists.

Correct: For piecewise functions and functions with potential problems, always verify differentiability using the limit definition.
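Mistake 4 is easy to fall into when the derivative formulas of the two pieces happen to agree. The sketch below uses a hypothetical function, $g(x) = x^2$ for $x \leq 1$ and $g(x) = 2x$ for $x > 1$, chosen here for illustration: both formulas give slope 2 at $x = 1$, yet $g$ jumps from 1 to 2 there, so it is not differentiable.

```python
def g(x):
    """Hypothetical piecewise function with a jump at x = 1."""
    return x ** 2 if x <= 1 else 2 * x

def left_quotient(func, a, h):
    """Difference quotient with step -h (h > 0)."""
    return (func(a - h) - func(a)) / (-h)

def right_quotient(func, a, h):
    """Difference quotient with step +h (h > 0)."""
    return (func(a + h) - func(a)) / h

print("formula slopes at x = 1: left piece 2*1 = 2, right piece 2")
for h in (1e-2, 1e-4, 1e-6):
    print(f"h = {h:.0e}: left = {left_quotient(g, 1.0, h):.4f}  "
          f"right = {right_quotient(g, 1.0, h):.1f}")
```

The left quotient tends to 2, but the right quotient equals $2 + 1/h$ and diverges, which the formulas alone would never reveal.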

Summary

Differentiability is a stronger condition than continuity. A continuous function can still fail to be differentiable at corners, cusps, and vertical tangents, and at a discontinuity differentiability fails automatically.

Key Relationships

Differentiable ⟹ Continuous ⟹ Defined

But the reverse implications do NOT hold!

Key Concepts

Concept	Description
Differentiable at a	lim_{h→0} [f(a+h) - f(a)] / h exists and is finite
One-sided derivatives	f'₋(a) from the left, f'₊(a) from the right
Criterion for differentiability	f'₋(a) = f'₊(a) and both are finite
Corner/Cusp	Corner: one-sided derivatives are finite but differ. Cusp: they are infinite with opposite signs
Vertical tangent	Derivative limit is ±∞
Continuity requirement	Must be continuous to be differentiable

Key Takeaways

Differentiability implies continuity — but a continuous function need not be differentiable
At a corner (like |x| at 0), left and right derivatives differ, so the function is not differentiable
At a vertical tangent (like ∛x at 0), the limit of the difference quotient is infinite, so the derivative doesn't exist as a real number
Discontinuities prevent differentiability because continuity is a prerequisite
For piecewise functions, check both continuity and matching derivatives at transition points
In machine learning, non-differentiable points (like in ReLU) are handled with subgradients or smooth approximations

The Boundary of Calculus:

"Differentiability tells us where calculus can work its magic — and where we must tread carefully because the smooth machinery breaks down."

Coming Next: In the next section, we'll learn the Power Rule — a shortcut that lets us compute derivatives of polynomial functions without using the limit definition. We'll also see the Sum and Constant rules that make differentiation practical.