Chapter 4

Higher-Order Derivatives

The Derivative - Instantaneous Rate of Change

Learning Objectives

By the end of this section, you will be able to:

  1. Compute second, third, and higher-order derivatives of functions using differentiation rules
  2. Interpret the physical meaning of higher derivatives in terms of motion (velocity, acceleration, jerk)
  3. Use multiple notation systems (Leibniz, Lagrange, Newton) for higher-order derivatives
  4. Connect the second derivative to concavity and curvature of curves
  5. Understand how higher-order derivatives appear in Taylor series approximations
  6. Apply second-order derivatives (the Hessian) in machine learning optimization
  7. Recognize patterns in derivatives of common functions (polynomials, exponentials, trig)

The Big Picture: Derivatives of Derivatives

"The first derivative tells us how a quantity changes. The second tells us how that change is changing. And so on, layer by layer, into the infinite depths of change."

We've learned that the derivative $f'(x)$ measures the instantaneous rate of change of a function. But $f'(x)$ is itself a function, so we can ask: how fast is the rate of change changing?

This leads us to the second derivative $f''(x)$, which is simply the derivative of the derivative. And we can continue: the third derivative $f'''(x)$, the fourth derivative $f^{(4)}(x)$, and so on.

The Chain of Derivatives

f(x) (original) → f'(x) (1st derivative) → f''(x) (2nd derivative) → f'''(x) (3rd derivative) → ...

Why Higher-Order Derivatives Matter

Higher-order derivatives appear throughout mathematics and its applications:

  • Physics: Acceleration is the second derivative of position; jerk (important for passenger comfort) is the third
  • Curve Analysis: The second derivative tells us about concavity and inflection points
  • Taylor Series: Higher derivatives determine how well polynomials approximate functions
  • Differential Equations: Many physical laws involve second derivatives (F = ma, wave equation, heat equation)
  • Machine Learning: The Hessian (matrix of second derivatives) is crucial for optimization algorithms like Newton's method

Historical Context

The concept of higher-order derivatives emerged naturally from the work of Newton and Leibniz in the 17th century. Newton, in his work on mechanics, recognized that acceleration (what we now call the second derivative of position) was the key quantity in his laws of motion.

Leibniz's notation $\frac{d^2y}{dx^2}$ made the concept of repeated differentiation intuitive: it suggests "differentiating twice with respect to x." This notation proved especially powerful for working with differential equations.

Brook Taylor (1685-1731) showed how, for sufficiently well-behaved (analytic) functions, the derivatives at a single point encode complete information about the function nearby. This became the famous Taylor series. The discovery revealed that higher-order derivatives are not just an abstract concept but carry deep geometric and analytical meaning.


The Second Derivative

The second derivative of $f$ is the derivative of $f'$:

$$f''(x) = \frac{d}{dx}[f'(x)] = \frac{d}{dx}\left[\frac{df}{dx}\right] = \frac{d^2f}{dx^2}$$

Example 1: Polynomial

Find all derivatives of $f(x) = x^4$:

  • $f(x) = x^4$
  • $f'(x) = 4x^3$
  • $f''(x) = 12x^2$
  • $f'''(x) = 24x$
  • $f^{(4)}(x) = 24$
  • $f^{(5)}(x) = 0$ (and all higher derivatives are 0)

For a polynomial of degree n, the (n+1)th derivative and beyond are all zero.
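The termination pattern can be checked mechanically. The sketch below is one possible way to do it: it stores a polynomial as a coefficient list (lowest degree first) and differentiates until nothing is left. The helper names `poly_derivative` and `all_derivatives` are illustrative, not taken from any library.

```python
def poly_derivative(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...] (lowest degree first)."""
    # d/dx of c_k * x**k is k * c_k * x**(k-1); the constant term drops out.
    return [k * c for k, c in enumerate(coeffs)][1:]


def all_derivatives(coeffs):
    """Return successive derivatives until the zero polynomial ([]) is reached."""
    result = []
    current = list(coeffs)
    while current:  # an empty list stands for the zero polynomial
        current = poly_derivative(current)
        result.append(current)
    return result


# f(x) = x**4 has coefficients [0, 0, 0, 0, 1]
derivs = all_derivatives([0, 0, 0, 0, 1])
for order, d in enumerate(derivs, start=1):
    print(f"order {order}: {d}")
```

For x⁴ the list has five entries: the fourth is the constant [24] and the fifth is the empty list, i.e. the zero polynomial.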

Example 2: Trigonometric Function

Find the first four derivatives of $f(x) = \sin(x)$:

  • $f(x) = \sin(x)$
  • $f'(x) = \cos(x)$
  • $f''(x) = -\sin(x)$
  • $f'''(x) = -\cos(x)$
  • $f^{(4)}(x) = \sin(x)$

The derivatives of sin(x) cycle with period 4!


Third and Higher Derivatives

We can continue differentiating indefinitely. The nth derivative is written:

$$f^{(n)}(x) \quad \text{or} \quad \frac{d^n f}{dx^n}$$

For most common functions, there are patterns in their higher-order derivatives:

| Function | Pattern of Derivatives |
| --- | --- |
| xⁿ (polynomial) | Decreases degree by 1 each time; becomes 0 after n+1 derivatives |
| eˣ | Every derivative equals eˣ (the function is its own derivative!) |
| sin(x), cos(x) | Cycle with period 4: sin → cos → -sin → -cos → sin |
| ln(x) | f^(n)(x) = (-1)^(n+1) · (n-1)! / xⁿ for n ≥ 1 |
| e^(ax) | f^(n)(x) = aⁿ · e^(ax) |
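The ln(x) pattern is the least obvious of these, so here is a small sanity check. It assumes only the power rule: starting from 1/x, each differentiation of c · x^(-k) multiplies the coefficient by -k. The function name `ln_derivative_coefficient` is made up for this sketch.

```python
from math import factorial


def ln_derivative_coefficient(n):
    """Coefficient c_n such that the nth derivative of ln(x) is c_n / x**n (n >= 1).

    Built by applying the power rule repeatedly:
    d/dx [c * x**(-k)] = -k * c * x**(-k-1).
    """
    c = 1  # the first derivative of ln(x) is 1 / x
    for k in range(1, n):
        c = -k * c
    return c


for n in range(1, 7):
    closed_form = (-1) ** (n + 1) * factorial(n - 1)
    print(n, ln_derivative_coefficient(n), closed_form)
```

The two columns agree for every n, which is what the closed form (-1)^(n+1) · (n-1)! predicts.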

The Special Property of eˣ

The function $f(x) = e^x$ is remarkable: every derivative is eˣ. This is why eˣ appears so frequently in differential equations: up to a constant factor, it is the only function that equals its own derivative!


Interactive Explorer

Use this interactive visualization to explore how the original function and its first four derivatives relate to each other. Notice how:

  • When f is increasing, f' is positive
  • When f is concave up, f'' is positive
  • Inflection points of f occur where f'' changes sign (typically where f'' = 0)
  • Each derivative captures finer details about the function's shape

[Interactive figure: Higher-Order Derivative Explorer, plotting f(x), f'(x), f''(x), f'''(x), and f''''(x). Sample readout for the polynomial x⁴ at x = 1.00: f(x) = 1.0000, f'(x) = 4.0000, f''(x) = 12.0000, with f'(x) = 4x³, f''(x) = 12x², f'''(x) = 24x, f''''(x) = 24.]


Notation Systems for Higher-Order Derivatives

There are several common notation systems for higher-order derivatives, each with its own advantages:

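| Name | 1st | 2nd | 3rd | nth | Best For |
| --- | --- | --- | --- | --- | --- |
| Lagrange (Prime) | f'(x) | f''(x) | f'''(x) | f^(n)(x) | Quick calculations |
| Leibniz | dy/dx | d²y/dx² | d³y/dx³ | dⁿy/dxⁿ | Chain rule, physics |
| Newton (Dot) | ẋ | ẍ | x⃛ | | Physics (time derivatives) |
| D-operator | Df | D²f | D³f | Dⁿf | Differential equations |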

Choosing Notation

Use prime notation (f', f'') for quick calculations. Use Leibniz notation (dy/dx) when you need to be explicit about variables, especially in the chain rule. Use dot notation (ẋ, ẍ) for derivatives with respect to time in physics.


Physics: Motion in Detail

In physics, higher-order derivatives of position have specific names and physical meanings:

| Derivative | Name | Symbol | Physical Meaning | Units (SI) |
| --- | --- | --- | --- | --- |
| 0th (position) | Position | s | Where the object is | meters (m) |
| 1st | Velocity | v = ds/dt | How fast position changes | m/s |
| 2nd | Acceleration | a = dv/dt | How fast velocity changes | m/s² |
| 3rd | Jerk | j = da/dt | How fast acceleration changes | m/s³ |
| 4th | Snap (Jounce) | | How fast jerk changes | m/s⁴ |
| 5th | Crackle | | How fast snap changes | m/s⁵ |
| 6th | Pop | | How fast crackle changes | m/s⁶ |

Newton's Second Law $F = ma$ relates force to the second derivative of position. This is why so many physical laws involve second-order differential equations!

[Interactive figure: Physics of Motion: Higher-Order Derivatives in Action. Sample readout: position s(t) = 10.00, velocity v(t) = s'(t) = 5.00, acceleration a(t) = s''(t) = -9.80, jerk j(t) = s'''(t) = 0.00.]

What's Happening?

In physics, each derivative tells us something important about motion:

  • Position s(t): Where the object is
  • Velocity v(t) = s'(t): How fast position is changing (speed and direction)
  • Acceleration a(t) = s''(t): How fast velocity is changing (force/mass by Newton's 2nd law)
  • Jerk j(t) = s'''(t): How fast acceleration is changing (affects passenger comfort!)
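To make the chain position → velocity → acceleration → jerk concrete, the sketch below differentiates an assumed trajectory s(t) = 10 + 5t - 4.9t² (an object thrown upward under gravity; the numbers are chosen for illustration). The helpers `differentiate` and `evaluate` are hypothetical names for this example.

```python
def differentiate(coeffs):
    """Derivative of a polynomial in t given as [c0, c1, c2, ...] (lowest degree first)."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def evaluate(coeffs, t):
    """Evaluate the polynomial at time t (an empty list is the zero polynomial)."""
    return sum(c * t**k for k, c in enumerate(coeffs))


# Assumed example: s(t) = 10 + 5 t - 4.9 t**2  (metres, seconds)
position = [10.0, 5.0, -4.9]
velocity = differentiate(position)       # s'(t)   = 5 - 9.8 t
acceleration = differentiate(velocity)   # s''(t)  = -9.8
jerk = differentiate(acceleration)       # s'''(t) = 0

t = 0.0
for name, poly in [("position", position), ("velocity", velocity),
                   ("acceleration", acceleration), ("jerk", jerk)]:
    print(f"{name:>12}: {evaluate(poly, t):6.2f}")
```

Because gravity is constant in this model, the acceleration is the same at every t and the jerk vanishes; a trajectory with a cubic term would give nonzero jerk.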

Why Jerk Matters

Engineers designing elevators, trains, and roller coasters pay close attention to jerk (the third derivative of position). High jerk can cause discomfort and even injury. A comfortable ride requires more than steady velocity: the acceleration itself must change gradually!


Concavity and Curvature

The second derivative reveals crucial information about a curve's shape:

f''(x) > 0: Concave Up

The curve bends upward like a bowl that holds water. The tangent line lies below the curve. The slope f'(x) is increasing.

f''(x) < 0: Concave Down

The curve bends downward like an upside-down bowl. The tangent line lies above the curve. The slope f'(x) is decreasing.

An inflection point occurs where concavity changes, typically where $f''(x) = 0$ (though we must verify the sign actually changes).
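The caveat about verifying the sign change matters in practice. As a rough illustration, the sketch below estimates f'' numerically on either side of a candidate point and compares signs; the step size `h`, the offset `delta`, and both function names are assumptions made for this example.

```python
def second_derivative(f, x, h=1e-3):
    """Central-difference estimate of f''(x); h is an assumed step size."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def concavity_changes_at(f, x, delta=0.1):
    """True if f'' has opposite signs just left and just right of x."""
    left = second_derivative(f, x - delta)
    right = second_derivative(f, x + delta)
    return left * right < 0


def cubic(x):
    return x**3      # f''(x) = 6x changes sign at 0


def quartic(x):
    return x**4      # f''(x) = 12x**2 is zero at 0 but never negative


print(concavity_changes_at(cubic, 0.0))    # inflection point at x = 0
print(concavity_changes_at(quartic, 0.0))  # f''(0) = 0, yet no inflection point
```

Both functions satisfy f''(0) = 0, but only the cubic actually switches from concave down to concave up there.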

Curvature: Quantifying Bending

The curvature $\kappa$ at a point measures how sharply the curve bends. It's defined as:

$$\kappa = \frac{|f''(x)|}{(1 + [f'(x)]^2)^{3/2}}$$

The denominator corrects for the slope: it converts bending per unit of x into bending per unit of arc length, so the same arc has the same curvature no matter how steeply it is tilted.

The osculating circle at a point is the circle that best approximates the curve there. Its radius is $R = 1/\kappa$, the reciprocal of curvature.
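A quick numerical instance of the curvature formula may help. The example below assumes f(x) = x² evaluated at x = 0.5, where f'(x) = 1 and f''(x) = 2; the function name `curvature` is illustrative.

```python
def curvature(f_prime, f_double_prime):
    """Curvature of y = f(x) from the values f'(x) and f''(x)."""
    return abs(f_double_prime) / (1.0 + f_prime**2) ** 1.5


# Assumed example: f(x) = x**2 at x = 0.5, so f'(x) = 2x = 1 and f''(x) = 2.
kappa = curvature(1.0, 2.0)
radius = 1.0 / kappa

print(f"kappa = {kappa:.4f}")   # 2 / 2**1.5 = 1 / sqrt(2)
print(f"R     = {radius:.4f}")  # sqrt(2)
```

At the vertex x = 0 the slope is 0, so the same function gives κ = 2 and R = 0.5: the parabola bends most tightly at its vertex.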

[Interactive figure: Curvature and the Second Derivative. Sample readout: slope f'(x) = 1.0000, concavity f''(x) = 2.0000 (concave up), curvature κ = 0.7071, radius R = 1.41.]

Notice how:

  • Where f''(x) = 0 (for example, at inflection points), curvature is 0 and the radius is infinite (the curve is locally straight)
  • For a given slope, larger |f''(x)| means tighter bending (smaller radius)
  • The sign of f''(x) determines on which side of the curve the center lies

Taylor Series: Higher Derivatives Build Approximations

One of the most profound uses of higher-order derivatives is in Taylor series. The Taylor series of f centered at a is:

$$f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \cdots$$

In summation form: $f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n$

Each term uses a higher-order derivative to capture more detail about the function's behavior near the point a:

  • 0th derivative (f(a)): The value at a (constant term)
  • 1st derivative (f'(a)): The slope at a (linear approximation)
  • 2nd derivative (f''(a)): The curvature at a (quadratic approximation)
  • Higher derivatives: Finer and finer details of the shape
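The effect of adding terms can be seen numerically. The sketch below assumes f(x) = eˣ expanded about a = 0, a convenient choice because every derivative at a equals exp(a); the function name `taylor_exp` is made up for this example.

```python
from math import exp, factorial


def taylor_exp(x, order, a=0.0):
    """Taylor polynomial of e**x about a, truncated after the given order.

    Every derivative of e**x is e**x, so f^(n)(a) = exp(a) for all n.
    """
    return sum(exp(a) / factorial(n) * (x - a) ** n for n in range(order + 1))


x = 0.5
errors = []
for order in range(5):
    approx = taylor_exp(x, order)
    errors.append(abs(exp(x) - approx))
    print(f"order {order}: T(x) = {approx:.6f}, error = {errors[-1]:.2e}")
```

Each additional derivative shrinks the error at x = 0.5; the order-2 polynomial here is 1 + x + x²/2.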

[Interactive figure: Taylor Series: Higher-Order Derivatives Build Approximations. Sample readout: Taylor polynomial of order 2, T₂(x) = 1 + x + x²/2.]

Each term uses a higher-order derivative: the n-th term is f^(n)(a) / n! · (x-a)^n

Key Insight

Higher-order derivatives capture more and more information about the function's behavior:

  • 0th derivative (value): Where the function is
  • 1st derivative (slope): Which direction it's going
  • 2nd derivative (curvature): How it's bending
  • Higher: Finer details of the shape

ML Connection

Taylor expansions are fundamental to machine learning:

  • Gradient descent: Uses 1st-order (gradient)
  • Newton's method: Uses 2nd-order (Hessian)
  • Natural gradient: Uses Fisher information
  • Approximating losses: Taylor around optimum

Patterns in Derivatives of Common Functions

Recognizing patterns in higher-order derivatives saves time and provides insight:

Exponential Functions

For $f(x) = e^{ax}$:

$$f^{(n)}(x) = a^n e^{ax}$$

The special case a = 1 gives $f^{(n)}(x) = e^x$ for all n.

Sine and Cosine

The derivatives cycle with period 4:

sin(x)

sin → cos → -sin → -cos → sin

cos(x)

cos → -sin → -cos → sin → cos

General formula: $\frac{d^n}{dx^n}[\sin(x)] = \sin(x + n\pi/2)$
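The shift formula and the four-step cycle describe the same thing, which the short check below illustrates at an arbitrarily chosen point x = 0.3. The names `CYCLE` and `sin_nth_derivative` are introduced only for this sketch.

```python
import math

# The four functions in the cycle, indexed by n mod 4.
CYCLE = [
    math.sin,
    math.cos,
    lambda x: -math.sin(x),
    lambda x: -math.cos(x),
]


def sin_nth_derivative(x, n):
    """nth derivative of sin at x, using the shift formula sin(x + n*pi/2)."""
    return math.sin(x + n * math.pi / 2)


x = 0.3
for n in range(8):
    from_cycle = CYCLE[n % 4](x)
    from_formula = sin_nth_derivative(x, n)
    print(f"n={n}: cycle={from_cycle:+.6f}  formula={from_formula:+.6f}")
```

Each differentiation shifts the sine wave left by a quarter period, so four shifts bring it back to where it started.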

Polynomials

For $f(x) = x^n$:

$$f^{(k)}(x) = \frac{n!}{(n-k)!} x^{n-k} = n(n-1)(n-2)\cdots(n-k+1)x^{n-k}$$

At the nth derivative: $f^{(n)}(x) = n!$ (a constant)
Beyond that: $f^{(n+1)}(x) = 0$
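The coefficient n!/(n-k)! is exactly what Python's standard-library `math.perm(n, k)` computes (available from Python 3.8), so the formula can be tabulated directly. The wrapper `power_derivative` is a hypothetical helper for this illustration.

```python
from math import perm


def power_derivative(n, k):
    """Return (coefficient, exponent) of the kth derivative of x**n.

    math.perm(n, k) equals n! / (n - k)! and is 0 when k > n,
    which matches the derivative vanishing past order n.
    """
    coefficient = perm(n, k)
    exponent = max(n - k, 0)
    return coefficient, exponent


for k in range(6):
    coefficient, exponent = power_derivative(4, k)
    print(f"k={k}: {coefficient} * x**{exponent}")
```

For n = 4 this reproduces 4x³, 12x², 24x, 24, and then 0 from the fifth derivative on.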


Machine Learning Applications

Second-order derivatives are crucial in machine learning optimization:

The Hessian Matrix

For a function of multiple variables $f(x_1, x_2, \ldots, x_n)$, the Hessian matrix contains all second-order partial derivatives:

$$H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$$

The Hessian is symmetric ($H_{ij} = H_{ji}$) for smooth functions.

Newton's Method for Optimization

While gradient descent uses only first-order information (the gradient), Newton's method uses the Hessian to take smarter steps:

$$x_{n+1} = x_n - H^{-1} \nabla f$$

Here $H$ and $\nabla f$ are evaluated at the current iterate $x_n$. Near a minimum this often converges faster (quadratically rather than linearly), but each step requires computing and inverting the Hessian.

Gradient Descent

  • Uses 1st derivative (gradient)
  • Step: -α∇f
  • Linear convergence
  • Cheap per iteration

Newton's Method

  • Uses 2nd derivative (Hessian)
  • Step: -H⁻¹∇f
  • Quadratic convergence
  • Expensive per iteration
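The trade-off is easiest to see in one dimension, where the Hessian is just f''(x). The sketch below compares the two update rules on an assumed convex test function f(x) = eˣ - 2x, whose minimizer is ln 2. The learning rate, tolerance, and function names are all choices made for this illustration rather than recommended settings.

```python
import math


def grad(x):
    """f'(x) for the assumed test function f(x) = exp(x) - 2x."""
    return math.exp(x) - 2.0


def hess(x):
    """f''(x) for the same function; positive everywhere, so f is convex."""
    return math.exp(x)


def gradient_descent(x, lr=0.1, tol=1e-10, max_iter=10_000):
    """Iterate x <- x - lr * f'(x) until |f'(x)| < tol; return (x, iterations)."""
    for i in range(max_iter):
        if abs(grad(x)) < tol:
            return x, i
        x -= lr * grad(x)
    return x, max_iter


def newton(x, tol=1e-10, max_iter=100):
    """Iterate x <- x - f'(x) / f''(x) until |f'(x)| < tol; return (x, iterations)."""
    for i in range(max_iter):
        if abs(grad(x)) < tol:
            return x, i
        x -= grad(x) / hess(x)
    return x, max_iter


x_gd, steps_gd = gradient_descent(0.0)
x_nt, steps_nt = newton(0.0)
print(f"gradient descent: x = {x_gd:.8f} after {steps_gd} steps")
print(f"Newton's method:  x = {x_nt:.8f} after {steps_nt} steps")
print(f"exact minimizer:  x = {math.log(2):.8f}")
```

Both methods reach the same point, but Newton's method needs far fewer iterations because it rescales each step by the local curvature. In many dimensions, that rescaling is what makes each Newton iteration expensive.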

Curvature Information for Better Training

The Hessian provides crucial information:

  • Eigenvalues: Tell us about the curvature in different directions
  • Condition number: Large = ill-conditioned loss surface = harder to optimize
  • Saddle points: Detected by mixed positive/negative eigenvalues
  • Local minimum: Confirmed by all positive eigenvalues
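These eigenvalue rules translate into a short classifier. The sketch below uses NumPy's `np.linalg.eigvalsh` (for symmetric matrices) and `np.linalg.cond`; the function `classify_critical_point`, its tolerance, and the two example Hessians are assumptions for illustration, and the result is only meaningful at a point where the gradient is zero.

```python
import numpy as np


def classify_critical_point(H, tol=1e-12):
    """Classify a critical point from its (symmetric) Hessian H."""
    eigenvalues = np.linalg.eigvalsh(H)  # ascending order, real for symmetric H
    if np.all(eigenvalues > tol):
        return "local minimum"
    if np.all(eigenvalues < -tol):
        return "local maximum"
    if eigenvalues[0] < -tol and eigenvalues[-1] > tol:
        return "saddle point"
    return "inconclusive"  # some eigenvalue is (numerically) zero


bowl = np.array([[2.0, 1.0], [1.0, 4.0]])     # Hessian of x^2 + 2y^2 + xy
saddle = np.array([[2.0, 0.0], [0.0, -2.0]])  # Hessian of x^2 - y^2

for name, H in [("bowl", bowl), ("saddle", saddle)]:
    print(name, classify_critical_point(H), "cond =", np.linalg.cond(H))
```

The "inconclusive" branch mirrors the one-variable case f''(x) = 0, where the second derivative test gives no answer.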

Practical Approaches

Computing the full Hessian is expensive (O(n²) storage, O(n³) to invert). Practical methods include:

  • Diagonal approximations: Only keep diagonal elements
  • L-BFGS: Approximate inverse Hessian from gradient history
  • Natural gradient: Use Fisher information instead
  • Adaptive methods: Adam, AdaGrad adapt to curvature implicitly
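As a rough illustration of the first idea in the list, the sketch below compares plain gradient descent with a step scaled by the Hessian's diagonal on an assumed ill-conditioned quadratic f(x) = ½ xᵀAx. The matrix `A`, the learning rate, and the `minimize` helper are all invented for this example; real optimizers estimate the diagonal rather than reading it off a known matrix.

```python
import numpy as np

# Assumed ill-conditioned quadratic: f(x) = 0.5 * x^T A x, so grad f = A x and H = A.
A = np.array([[1.0, 0.2],
              [0.2, 100.0]])


def minimize(step_fn, x0, tol=1e-8, max_iter=100_000):
    """Apply x <- x + step_fn(grad) until the gradient norm drops below tol."""
    x = np.array(x0, dtype=float)
    for i in range(max_iter):
        g = A @ x
        if np.linalg.norm(g) < tol:
            return x, i
        x = x + step_fn(g)
    return x, max_iter


lr = 0.015            # must stay below 2 / lambda_max (about 0.02 here)
h_diag = np.diag(A)   # keep only the Hessian's diagonal: O(n) storage

_, steps_plain = minimize(lambda g: -lr * g, [1.0, 1.0])
_, steps_diag = minimize(lambda g: -g / h_diag, [1.0, 1.0])

print(f"plain gradient descent:   {steps_plain} steps")
print(f"diagonal-Hessian scaling: {steps_diag} steps")
```

Dividing each gradient component by its own curvature lets every coordinate move at an appropriate scale, which is the intuition behind per-parameter step sizes. The approximation works well here because the off-diagonal entries of A are small.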

Python Implementation

Computing Higher-Order Derivatives

Here's how to compute higher-order derivatives numerically and recognize their patterns:

Computing nth Derivatives (higher_order_derivatives.py)

Notes on the code:

  • Recursive nth derivative: We compute higher-order derivatives recursively: the nth derivative is the derivative of the (n-1)th derivative. This mirrors the mathematical definition.
  • Central difference: The first derivative uses the central difference formula f'(x) ≈ [f(x+h) - f(x-h)]/(2h), whose error is of order h², compared with order h for a forward or backward difference.
  • Recursive case: For n > 1, we define a function that computes the (n-1)th derivative, then differentiate that. This chains the differentiation process.
  • Step size: Round-off error grows roughly like ε/hⁿ, where ε is machine precision, so a very small h (such as 1e-5) gives meaningless results beyond the second derivative. A moderate step is used instead, and the printed values are approximations.
  • Polynomial termination: For x^4, the 5th derivative and beyond are 0. Each derivative reduces the degree by 1, and after degree + 1 derivatives, we get zero.
  • Trigonometric cycling: sin(x) derivatives cycle with period 4: sin → cos → -sin → -cos → sin. This is because d/dx(sin) = cos and d/dx(cos) = -sin.

```python
import numpy as np


def nth_derivative(f, x, n, h=1e-2):
    """
    Compute the nth derivative of f at x numerically.

    Uses the central difference formula recursively. Round-off error grows
    roughly like eps / h**n, so h must not be too small when n is large.
    """
    if n == 0:
        return f(x)
    elif n == 1:
        return (f(x + h) - f(x - h)) / (2 * h)
    else:
        # Recursive: the nth derivative is the derivative of the (n-1)th
        def f_prev(t):
            return nth_derivative(f, t, n - 1, h)
        return (f_prev(x + h) - f_prev(x - h)) / (2 * h)


# Example: analyze f(x) = x^4
f = lambda x: x**4

print("Derivatives of f(x) = x^4 at x = 2:")
for n in range(6):
    deriv = nth_derivative(f, 2.0, n)
    print(f"  f^({n})(2) = {deriv:.2f}")

# Exact values at x = 2 (the numerical results approximate these):
# f(2)     = 16
# f'(2)    = 4*2^3 = 32
# f''(2)   = 12*2^2 = 48
# f'''(2)  = 24*2 = 48
# f''''(2) = 24
# f^(5)(2) = 0

print()

# Example: sin cycles every 4 derivatives
g = lambda x: np.sin(x)
x0 = np.pi / 4

print("Derivatives of sin(x) at x = π/4:")
for n in range(8):
    # A somewhat larger step keeps round-off under control up to n = 7
    deriv = nth_derivative(g, x0, n, h=3e-2)
    print(f"  sin^({n})(π/4) = {deriv:.3f}")

# Pattern: sin -> cos -> -sin -> -cos -> sin (period 4)
```

The Hessian in Machine Learning

Computing the Hessian matrix for optimization:

Hessian Matrix for Optimization (hessian_optimization.py)

Notes on the code:

  • The Hessian matrix: The Hessian is the matrix of all second-order partial derivatives. For a function of n variables, it's an n×n symmetric matrix.
  • Mixed partials: We compute ∂²f/∂x_i∂x_j using a formula with four function evaluations. By Schwarz's theorem, mixed partials are equal: ∂²f/∂x∂y = ∂²f/∂y∂x.
  • Finite difference formula: This formula approximates the second mixed partial derivative using four strategically placed points, similar to how we approximate first derivatives.
  • Eigenvalue test: If all eigenvalues of the Hessian are positive, the function is locally convex (bowl-shaped upward). At a critical point, this is the second derivative test for multiple variables!
  • Newton's method: Newton's method uses the Hessian to choose its step: Δx = -H^(-1)∇f. For a quadratic function a single step lands on the minimizer, which is why second-order information can converge faster than gradient descent.

```python
import numpy as np


def compute_hessian(f, x, h=1e-4):
    """
    Compute the Hessian matrix of f at point x.

    The Hessian contains all second-order partial derivatives:
    H[i,j] = ∂²f / ∂x_i ∂x_j

    Used in Newton's method for optimization.
    """
    x = np.asarray(x, dtype=float)  # ensure float so that += h works
    n = len(x)
    H = np.zeros((n, n))

    for i in range(n):
        for j in range(n):
            # Four shifted copies of x for the central-difference stencil
            x_pp = x.copy()
            x_pm = x.copy()
            x_mp = x.copy()
            x_mm = x.copy()

            x_pp[i] += h; x_pp[j] += h
            x_pm[i] += h; x_pm[j] -= h
            x_mp[i] -= h; x_mp[j] += h
            x_mm[i] -= h; x_mm[j] -= h

            H[i, j] = (f(x_pp) - f(x_pm) - f(x_mp) + f(x_mm)) / (4 * h * h)

    return H


# Example: Quadratic function (bowl shape)
# f(x, y) = x^2 + 2y^2 + xy
# Hessian should be [[2, 1], [1, 4]]
def f(x):
    return x[0]**2 + 2*x[1]**2 + x[0]*x[1]


x0 = np.array([1.0, 1.0])
H = compute_hessian(f, x0)

print("Function: f(x,y) = x² + 2y² + xy")
print("Hessian at (1, 1):")
print(H)
print()

# Check eigenvalues for convexity
eigenvalues = np.linalg.eigvalsh(H)
is_convex = bool(np.all(eigenvalues > 0))
print(f"Eigenvalues: {eigenvalues}")
print(f"All positive? {is_convex} → "
      + ("locally convex" if is_convex else "not locally convex"))

# Newton's step: Δx = -H^(-1) * gradient
grad = np.array([2*x0[0] + x0[1], 4*x0[1] + x0[0]])  # analytic gradient at x0
delta = -np.linalg.solve(H, grad)
print(f"Newton step from (1,1): {delta}")
print(f"New point: {x0 + delta}")  # approximately (0, 0), the minimizer
```


Summary

Higher-order derivatives extend the power of calculus, revealing progressively finer details about how functions behave.

Key Formulas

| Concept | Formula |
| --- | --- |
| Second derivative | f''(x) = d/dx[f'(x)] = d²f/dx² |
| nth derivative | f^(n)(x) = dⁿf/dxⁿ |
| Curvature | κ = \|f''\| / (1 + f'²)^(3/2) |
| Taylor series term | f^(n)(a)/n! · (x-a)^n |
| Newton optimization | x_{n+1} = x_n - H⁻¹∇f |

Key Concepts

  1. The second derivative measures the rate of change of the rate of change — it tells us about concavity and curvature
  2. In physics, derivatives of position give velocity (1st), acceleration (2nd), and jerk (3rd)
  3. Polynomials terminate: after degree + 1 derivatives, all higher derivatives are 0
  4. Exponentials are special: eˣ equals all its own derivatives
  5. Trig functions cycle with period 4: sin → cos → -sin → -cos → sin
  6. Taylor series use all derivatives to build polynomial approximations
  7. The Hessian (matrix of second partials) is crucial for optimization in ML
The Takeaway:
Higher-order derivatives peel back layers of change — each one reveals more about how functions evolve, from their slope to their curvature to the finest details of their shape.
Coming Next: In the next chapter, we'll explore Derivatives of Transcendental Functions — how to differentiate exponentials, logarithms, and trigonometric functions.