Chapter 17

Directional Derivatives and the Gradient

Partial Derivatives

Learning Objectives

By the end of this section, you will be able to:

  1. Compute directional derivatives to find the rate of change of a multivariable function in any direction, not just along coordinate axes
  2. Understand and calculate the gradient vector $\nabla f$, which captures all partial derivative information and points toward steepest ascent
  3. Apply the gradient-directional derivative formula $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$ to efficiently compute rates of change in arbitrary directions
  4. Visualize gradient fields and understand their relationship to level curves and surface geometry
  5. Connect these concepts to gradient descent, the fundamental optimization algorithm powering modern machine learning

Why This Matters: The gradient is arguably the most important concept in multivariable calculus for applications. It answers the question "which way is up?" at any point on a surface. This simple idea underlies neural network training (minimizing loss functions), physical simulations (heat flow, fluid dynamics), and countless optimization problems across science and engineering.

The Big Picture

In single-variable calculus, the derivative $f'(x)$ tells us the rate of change of $f$ as we move along the number line. There is only one line to move along, so the only choice is left or right.

But for a function of two variables $f(x, y)$, we have infinitely many directions to explore from any point. We could move along the x-axis, the y-axis, or at any angle in between. Each direction gives a potentially different rate of change.

The Central Questions:
  1. How fast does $f$ change if we move in a specific direction?
  2. Which direction gives the maximum rate of increase?
  3. How can we find this direction efficiently?

The directional derivative answers the first question, and the gradient vector elegantly answers the second and third. Together, they provide a complete picture of how a function changes in all directions at once.


Historical Context

The concept of the gradient emerged from the work of several 19th-century mathematicians who sought to generalize calculus to functions of multiple variables.

  • Augustin-Louis Cauchy (1789-1857) developed rigorous foundations for multivariable calculus and used gradient-like concepts in optimization problems
  • William Rowan Hamilton (1805-1865) introduced the nabla operator $\nabla$ (named after a Hebrew harp) while developing his theory of quaternions
  • Peter Guthrie Tait (1831-1901) and James Clerk Maxwell (1831-1879) extensively used the gradient in their formulation of electromagnetism, where it describes how potential energy varies in space

In the physical picture that Maxwell and his contemporaries worked with, the gradient of a scalar field gives both the magnitude and the direction of its most rapid increase. This physical intuition (imagining a scalar field like temperature or pressure, and asking which way it increases fastest) remains the best way to understand the gradient.


The Directional Derivative

Definition and Formula

Given a function $f(x, y)$ and a unit vector $\mathbf{u} = (u_1, u_2)$ with $\|\mathbf{u}\| = 1$, the directional derivative of $f$ at the point $(a, b)$ in the direction $\mathbf{u}$ is:

$$D_{\mathbf{u}} f(a, b) = \lim_{h \to 0} \frac{f(a + hu_1, b + hu_2) - f(a, b)}{h}$$

This measures how fast $f$ changes as we move from $(a, b)$ in the direction $\mathbf{u}$.

Special Cases: When $\mathbf{u} = \mathbf{i} = (1, 0)$, the directional derivative equals the partial derivative $\frac{\partial f}{\partial x}$. Similarly, $\mathbf{u} = \mathbf{j} = (0, 1)$ gives $\frac{\partial f}{\partial y}$. Partial derivatives are just directional derivatives along coordinate axes!
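
The limit definition and its special cases can be checked numerically. The sketch below is only an illustration: the function `f`, the helper `difference_quotient`, and the chosen point and step sizes are assumptions made for this example, not part of the definition.

```python
import math

def f(x, y):
    # Illustrative function chosen for this sketch: f(x, y) = x^2 + 2y^2
    return x**2 + 2 * y**2

def difference_quotient(f, a, b, u, h):
    """Difference quotient from the limit definition, for one fixed small h."""
    u1, u2 = u
    return (f(a + h * u1, b + h * u2) - f(a, b)) / h

a, b = 1.0, 1.0
diagonal_u = (1 / math.sqrt(2), 1 / math.sqrt(2))  # unit vector at 45 degrees

for h in (1e-1, 1e-3, 1e-5):
    along_x = difference_quotient(f, a, b, (1.0, 0.0), h)  # tends to f_x(1, 1) = 2
    along_y = difference_quotient(f, a, b, (0.0, 1.0), h)  # tends to f_y(1, 1) = 4
    diagonal = difference_quotient(f, a, b, diagonal_u, h)
    print(f"h={h:g}: x-direction {along_x:.5f}, "
          f"y-direction {along_y:.5f}, 45-degree direction {diagonal:.5f}")
```

As `h` shrinks, the quotients along $\mathbf{i}$ and $\mathbf{j}$ settle at the partial derivatives, while the 45-degree direction settles at a different value, which is the point of the definition.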

The remarkable fact is that if $f$ is differentiable, we can compute $D_{\mathbf{u}} f$ for any direction using just the partial derivatives:

$$D_{\mathbf{u}} f(a, b) = f_x(a, b) \cdot u_1 + f_y(a, b) \cdot u_2 = \nabla f \cdot \mathbf{u}$$
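
One way to see where this formula comes from is the chain rule. The following is a sketch that assumes $f$ is differentiable at $(a, b)$; the helper function $g$ is introduced here only for the derivation.

```latex
% Restrict f to the line through (a, b) in the direction u:
g(h) = f(a + h u_1,\; b + h u_2)

% By the limit definition, the directional derivative is g'(0):
D_{\mathbf{u}} f(a, b) = \lim_{h \to 0} \frac{g(h) - g(0)}{h} = g'(0)

% The chain rule evaluates g'(0) in terms of the partial derivatives:
g'(0) = f_x(a, b)\,\frac{d}{dh}(a + h u_1) + f_y(a, b)\,\frac{d}{dh}(b + h u_2)
      = f_x(a, b)\, u_1 + f_y(a, b)\, u_2
```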

Interactive: Directional Derivative Explorer

Explore how the directional derivative changes as you vary the direction. Notice how it reaches its maximum when your direction aligns with the gradient (orange arrow) and is zero when your direction is perpendicular to it:

The Gradient Vector

Definition of the Gradient

The gradient of a function $f(x, y)$ is the vector of its partial derivatives:

$$\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) = f_x \mathbf{i} + f_y \mathbf{j}$$

The symbol $\nabla$ (nabla or del) is a vector operator. For functions of three variables:

$$\nabla f(x, y, z) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right)$$
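
As a quick worked instance of the definition, here is the gradient of a function chosen purely for illustration (it does not appear elsewhere in this section):

```latex
f(x, y) = x^2 \sin y

\frac{\partial f}{\partial x} = 2x \sin y, \qquad
\frac{\partial f}{\partial y} = x^2 \cos y

\nabla f(x, y) = \left( 2x \sin y,\; x^2 \cos y \right), \qquad
\nabla f\!\left(1, \tfrac{\pi}{2}\right) = (2, 0)
```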

Key Properties of the Gradient

The gradient has several remarkable properties that make it central to calculus:

| Property | Statement | Meaning |
| --- | --- | --- |
| Direction of Steepest Ascent | ∇f points in the direction of maximum increase | Follow ∇f to climb the surface fastest |
| Maximum Rate of Change | ||∇f|| equals the maximum directional derivative | The gradient magnitude tells you the steepest slope |
| Perpendicular to Level Curves | ∇f ⟂ level curves f(x,y) = c | Level curves are always crossed at right angles by ∇f |
| Directional Derivative Formula | D_u f = ∇f · u | Project the gradient onto any direction to get that rate of change |

The Gradient Points Toward Steepest Ascent

Among all possible directions, moving in the direction of $\nabla f$ increases $f$ at the maximum possible rate. The rate of increase in this direction is $\|\nabla f\|$, the magnitude of the gradient.

Conversely, moving in the direction of $-\nabla f$ decreases $f$ at the maximum rate. This is the basis of gradient descent.
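
The reason follows from the dot-product form of the directional derivative. A short sketch, where $\theta$ (a symbol introduced here) is the angle between $\nabla f$ and the unit vector $\mathbf{u}$, and $\nabla f \neq \mathbf{0}$ is assumed:

```latex
D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}
                 = \|\nabla f\| \, \|\mathbf{u}\| \cos\theta
                 = \|\nabla f\| \cos\theta

% Since -1 \le \cos\theta \le 1:
\theta = 0:              \quad D_{\mathbf{u}} f = \|\nabla f\|   \quad \text{(steepest ascent, } \mathbf{u} \text{ along } \nabla f)
\theta = \pi:            \quad D_{\mathbf{u}} f = -\|\nabla f\|  \quad \text{(steepest descent, } \mathbf{u} \text{ along } -\nabla f)
\theta = \tfrac{\pi}{2}: \quad D_{\mathbf{u}} f = 0              \quad \text{(no first-order change)}
```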


Interactive: Gradient Field Visualizer

Explore the gradient vector field on different surfaces. The arrows show the direction of steepest ascent at each point. Notice how their color and length indicate the magnitude of the gradient (rate of steepest climb):

What to Observe:
  • On the paraboloid, all gradients point outward from the minimum at the origin
  • On the saddle, gradients point in different directions depending on location
  • Gradient arrows are longer where the surface is steeper (higher rate of change)
  • Near flat regions (minima, maxima, or saddle points), arrows become very short

The Gradient and Level Curves

One of the most beautiful facts about the gradient is its relationship to level curves (contour lines). A level curve of $f(x, y)$ is the set of points where $f(x, y) = c$ for some constant $c$.

Theorem: The gradient $\nabla f$ is always perpendicular to level curves of $f$.

Why? If you move along a level curve, $f$ doesn't change, so the directional derivative along the level curve is zero. Since $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u} = 0$, the direction $\mathbf{u}$ tangent to the level curve must be perpendicular to $\nabla f$.
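
The perpendicularity can be checked on a concrete level curve. This sketch assumes the illustrative function $f(x, y) = x^2 + 2y^2$, whose level curves are ellipses; the parametrization and the helper names are choices made for this example.

```python
import math

def grad_f(x, y):
    # Gradient of the illustrative function f(x, y) = x^2 + 2y^2
    return (2 * x, 4 * y)

def level_curve_point_and_tangent(c, t):
    """Point and tangent vector on the level curve x^2 + 2y^2 = c at parameter t."""
    x = math.sqrt(c) * math.cos(t)
    y = math.sqrt(c / 2) * math.sin(t)
    # Derivative of (x(t), y(t)) with respect to t
    tangent = (-math.sqrt(c) * math.sin(t), math.sqrt(c / 2) * math.cos(t))
    return (x, y), tangent

c = 3.0
dot_products = []
for t in (0.3, 1.2, 2.5, 4.0):
    (x, y), (tx, ty) = level_curve_point_and_tangent(c, t)
    gx, gy = grad_f(x, y)
    dot_products.append(gx * tx + gy * ty)
    print(f"t={t}: gradient . tangent = {dot_products[-1]:.2e}")
```

Each dot product is zero up to floating-point rounding, which is the statement that the gradient crosses the level curve at a right angle.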

Hiking Analogy: Imagine hiking on a mountain where level curves represent constant elevation (like contour lines on a topographic map). The gradient at any point is the direction you'd walk to climb most steeply uphill. This direction is always perpendicular to the elevation contour you're standing on.

Computing Directional Derivatives

Example 1: Find the directional derivative of $f(x, y) = x^2 y + y^3$ at the point $(2, 1)$ in the direction toward the origin.

Solution:

Step 1: Compute the partial derivatives and evaluate them at $(2, 1)$: $f_x = 2xy = 2(2)(1) = 4$ and $f_y = x^2 + 3y^2 = 4 + 3 = 7$.

Step 2: The gradient at $(2, 1)$ is $\nabla f(2, 1) = (4, 7)$.

Step 3: The direction from $(2, 1)$ to the origin is $(0, 0) - (2, 1) = (-2, -1)$. The unit vector is $\mathbf{u} = \frac{(-2, -1)}{\sqrt{5}} = \left(-\frac{2}{\sqrt{5}}, -\frac{1}{\sqrt{5}}\right)$.

Step 4: Compute the directional derivative:

$$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u} = (4, 7) \cdot \left(-\frac{2}{\sqrt{5}}, -\frac{1}{\sqrt{5}}\right) = \frac{-8 - 7}{\sqrt{5}} = -\frac{15}{\sqrt{5}} \approx -6.71$$

The negative value means $f$ is decreasing as we move toward the origin.
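
The result of Example 1 can be cross-checked in a few lines. This is a minimal sketch: the variable names and the step size `h` are choices made here, and the difference quotient is only an approximation of the limit.

```python
import math

def f(x, y):
    return x**2 * y + y**3

def grad_f(x, y):
    # Analytical partial derivatives from Step 1
    return (2 * x * y, x**2 + 3 * y**2)

point = (2.0, 1.0)
to_origin = (-point[0], -point[1])        # vector from the point to the origin
norm = math.hypot(*to_origin)
u = (to_origin[0] / norm, to_origin[1] / norm)

# Gradient formula: D_u f = grad f . u
gx, gy = grad_f(*point)
d_formula = gx * u[0] + gy * u[1]

# Central difference quotient along u as an independent check
h = 1e-6
d_numeric = (f(point[0] + h * u[0], point[1] + h * u[1])
             - f(point[0] - h * u[0], point[1] - h * u[1])) / (2 * h)

print(f"gradient formula:    {d_formula:.4f}")
print(f"difference quotient: {d_numeric:.4f}")
```

Both values agree with $-15/\sqrt{5} \approx -6.71$ from Step 4.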

Example 2: At what rate does $f(x, y, z) = x^2 + 2y^2 + 3z^2$ increase most rapidly at $(1, 1, 1)$? In what direction?

Solution: The gradient is $\nabla f = (2x, 4y, 6z)$. At $(1, 1, 1)$: $\nabla f(1, 1, 1) = (2, 4, 6)$.

The maximum rate of increase is $\|\nabla f\| = \sqrt{4 + 16 + 36} = \sqrt{56} \approx 7.48$, occurring in the direction $\frac{(2, 4, 6)}{\sqrt{56}} = \left(\frac{1}{\sqrt{14}}, \frac{2}{\sqrt{14}}, \frac{3}{\sqrt{14}}\right)$.
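
To make the "maximum" in Example 2 concrete, the sketch below samples random unit directions and compares their rates with $\|\nabla f\|$. The sampling scheme, seed, and sample count are assumptions made for this illustration only.

```python
import math
import random

grad = (2.0, 4.0, 6.0)  # gradient of f at (1, 1, 1) from Example 2
max_rate = math.sqrt(sum(g * g for g in grad))
steepest = tuple(g / max_rate for g in grad)

random.seed(0)  # fixed seed so the sketch is reproducible
best_sampled = float("-inf")
for _ in range(10_000):
    v = [random.gauss(0.0, 1.0) for _ in range(3)]
    length = math.sqrt(sum(c * c for c in v))
    u = [c / length for c in v]                    # random unit direction
    rate = sum(g * c for g, c in zip(grad, u))     # D_u f = grad f . u
    best_sampled = max(best_sampled, rate)

print(f"maximum rate ||grad f|| = {max_rate:.4f}")
print(f"steepest direction      = {steepest}")
print(f"best sampled rate       = {best_sampled:.4f}")
```

No sampled direction exceeds $\sqrt{56}$, and the best samples approach it only when they nearly align with $(1, 2, 3)/\sqrt{14}$.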


Real-World Applications

Physics: Temperature and Electric Fields

Temperature Gradient: If $T(x, y, z)$ represents temperature at position $(x, y, z)$, then $\nabla T$ points in the direction of fastest temperature increase. Heat flows in the opposite direction, from hot to cold, following $-\nabla T$. This is Fourier's Law of heat conduction.
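
Written as an equation, Fourier's law is usually stated as below. The symbols $\mathbf{q}$ (heat flux) and $k$ (thermal conductivity) are introduced here for illustration, and the form shown assumes an isotropic material with a single scalar conductivity.

```latex
\mathbf{q} = -k \, \nabla T, \qquad k > 0
```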

Electric Field: If $V(x, y, z)$ is the electric potential (voltage), the electric field is $\mathbf{E} = -\nabla V$. Positive charges "roll downhill" in the potential landscape.

Engineering: Optimization and Design

Engineers use gradients to optimize designs. If $C(x_1, \ldots, x_n)$ represents cost as a function of design parameters, then $-\nabla C$ points in the direction of greatest local cost reduction. Iteratively following this direction leads toward a (locally) minimum-cost design.


Machine Learning: Gradient Descent

Perhaps the most important modern application of gradients is in machine learning. Models ranging from simple linear regression to large language models like GPT are typically trained with gradient descent or one of its variants.

The idea is beautifully simple: we have a loss function $L(\mathbf{w})$ that measures how wrong our model's predictions are, where $\mathbf{w}$ represents all the model's parameters (weights). We want to find the weights that minimize the loss.

The Gradient Descent Update Rule:

$$\mathbf{w}_{\text{new}} = \mathbf{w}_{\text{old}} - \alpha \nabla L(\mathbf{w}_{\text{old}})$$

where $\alpha$ is the learning rate (step size). We move opposite to the gradient because we want to descend (minimize), not ascend.
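
A single update step, worked by hand, shows the rule in action. The loss $L(x, y) = x^2 + 2y^2$, the starting point $(3, 3)$, and $\alpha = 0.1$ are values assumed here for illustration.

```latex
\nabla L(x, y) = (2x,\, 4y), \qquad \nabla L(3, 3) = (6,\, 12)

\mathbf{w}_{\text{new}} = (3, 3) - 0.1 \cdot (6, 12) = (2.4,\, 1.8)

L(3, 3) = 9 + 18 = 27, \qquad L(2.4, 1.8) = 5.76 + 6.48 = 12.24
```

One step already lowers the loss from 27 to 12.24; repeating the step keeps moving the point toward the minimum at the origin.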

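The Learning Rate Matters!
  • Too small: Learning is slow, and progress can stall on plateaus or in shallow local minima
  • Too large: May overshoot the minimum and diverge
  • Just right: Converges efficiently to a good minimum

Modern optimizers build on this rule: Adam adapts the step size for each parameter, while SGD with momentum keeps a single learning rate but smooths the update direction using past gradients.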
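
The three regimes can be seen on the simplest possible loss. The sketch below assumes $f(x) = x^2$ with gradient $2x$; the function name `run_gradient_descent` and the three learning rates are choices made for this illustration.

```python
def run_gradient_descent(alpha, x0=3.0, steps=20):
    """Minimize f(x) = x^2 (gradient 2x) using a fixed learning rate alpha."""
    x = x0
    for _ in range(steps):
        x = x - alpha * 2 * x  # update rule: x_new = x_old - alpha * f'(x_old)
    return x

for alpha in (0.01, 0.4, 1.1):
    print(f"alpha={alpha}: x after 20 steps = {run_gradient_descent(alpha):.6g}")
```

With `alpha=0.01` the iterate has barely moved from 3 toward the minimum at 0, with `alpha=0.4` it is essentially at 0, and with `alpha=1.1` each step overshoots by more than it corrects, so the magnitude of `x` grows without bound.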

Interactive: Gradient Descent Demo

Watch gradient descent navigate different loss landscapes. Try the Rosenbrock function to see how challenging optimization can be, or the saddle function to see a case where gradient descent can get stuck:

Python Implementation

Here's a complete Python implementation showing numerical gradient computation, directional derivatives, and gradient descent optimization:

Gradient Computation and Gradient Descent
🐍python
  • Imports: Import NumPy for numerical operations.
  • Numerical Gradient Function: Computes the gradient numerically using central differences: ∂f/∂xᵢ ≈ (f(x + h eᵢ) - f(x - h eᵢ)) / (2h). This is finite-difference (numerical) differentiation, which is useful for checking gradients; automatic differentiation frameworks instead apply the chain rule to the operations of the program.
  • Directional Derivative Function: Computes the directional derivative D_u f = ∇f · u and cross-checks it with a difference quotient along u. The formula shows that the directional derivative is the component of the gradient in the direction u.
  • Gradient Descent Algorithm: Implements gradient descent: x_new = x_old - α∇f. By moving in the negative gradient direction, we descend toward the minimum of the loss landscape.
  • Example Function Definition: Example function f(x,y) = x² + 2y² with its minimum at the origin. The analytical gradient ∇f = (2x, 4y) allows us to verify our numerical computation.
  • Gradient Computation Demo: Demonstrates computing the gradient both numerically and analytically, then computing the directional derivative in a 45° direction.
  • Gradient Descent Demo: Runs gradient descent from (3, 3) to find the minimum at (0, 0). The algorithm follows the direction of steepest descent at each step.

import numpy as np


def numerical_gradient(f, x, h=1e-5):
    """
    Approximate the gradient of f at point x with central differences.

    Note: this is numerical (finite-difference) differentiation. Automatic
    differentiation systems work differently: they apply the chain rule to
    the operations of the program instead of perturbing the inputs.

    Args:
        f: Function f(x) where x is a numpy array
        x: Point at which to compute the gradient
        h: Small step for the finite difference (values near 1e-5 balance
           truncation and rounding error in double precision)

    Returns:
        grad: Gradient vector (same shape as x)
    """
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)

    for i in range(len(x)):
        # Step of length h along the i-th coordinate axis
        step = np.zeros_like(x)
        step[i] = h

        # Central difference formula for the i-th partial derivative
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)

    return grad


def directional_derivative(f, x, u, h=1e-5):
    """
    Compute the directional derivative of f at x in direction u.

    D_u f = ∇f · u (gradient dot direction)

    Args:
        f: Scalar function
        x: Point of evaluation
        u: Direction vector (will be normalized)
        h: Step size for numerical differentiation

    Returns:
        D_u_f_formula: Rate of change from the gradient formula ∇f · u
        D_u_f_limit: Rate of change from a difference quotient along u
    """
    x = np.asarray(x, dtype=float)

    # Ensure u is a unit vector
    u_normalized = np.asarray(u, dtype=float) / np.linalg.norm(u)

    # Method 1: Use gradient formula
    grad = numerical_gradient(f, x, h)
    D_u_f_formula = np.dot(grad, u_normalized)

    # Method 2: Central difference along u (approximates the limit definition)
    D_u_f_limit = (f(x + h * u_normalized) - f(x - h * u_normalized)) / (2 * h)

    return D_u_f_formula, D_u_f_limit


def gradient_descent(f, x0, learning_rate=0.1, max_iters=100, tol=1e-6):
    """
    Minimize f using gradient descent.

    The gradient points toward steepest ascent,
    so we move in the NEGATIVE gradient direction.

    Args:
        f: Function to minimize
        x0: Starting point
        learning_rate: Step size (alpha)
        max_iters: Maximum iterations
        tol: Convergence tolerance on the gradient norm

    Returns:
        x: Final point (approximate minimum)
        history: List of all points visited, including the starting point
    """
    x = np.array(x0, dtype=float)
    history = [x.copy()]

    for i in range(max_iters):
        grad = numerical_gradient(f, x)
        grad_norm = np.linalg.norm(grad)

        # Check convergence
        if grad_norm < tol:
            print(f"Converged at iteration {i}")
            break

        # Update rule: move opposite to gradient
        x = x - learning_rate * grad
        history.append(x.copy())

    return x, history


# Example: f(x,y) = x² + 2y² (elliptical paraboloid)
def f(p):
    x, y = p
    return x**2 + 2*y**2


# Analytical gradient: ∇f = (2x, 4y)
def grad_f_analytical(p):
    x, y = p
    return np.array([2*x, 4*y])


# Test at point (1, 1)
point = np.array([1.0, 1.0])
direction = np.array([1.0, 1.0])  # 45° direction

# Compute gradient
grad_numerical = numerical_gradient(f, point)
grad_analytical = grad_f_analytical(point)

print(f"Point: {point}")
print(f"Numerical gradient: {grad_numerical}")
print(f"Analytical gradient: {grad_analytical}")
print(f"Gradient magnitude: {np.linalg.norm(grad_numerical):.4f}")

# Compute directional derivative with both methods
D_u_f, D_u_f_limit = directional_derivative(f, point, direction)
print(f"\nDirection: {direction / np.linalg.norm(direction)}")
print(f"Directional derivative (gradient formula): {D_u_f:.4f}")
print(f"Directional derivative (difference quotient): {D_u_f_limit:.4f}")

# Run gradient descent from (3, 3)
x0 = np.array([3.0, 3.0])
x_min, history = gradient_descent(f, x0, learning_rate=0.1)
print("\nGradient Descent:")
print(f"Starting point: {x0}")
print(f"Final point: {x_min}")
print(f"Function value at minimum: {f(x_min):.6f}")
# history includes the starting point, so the step count is one less
print(f"Iterations: {len(history) - 1}")

Summary

In this section, we explored two fundamental concepts that bridge single-variable calculus to multivariable optimization:

  1. Directional Derivative $D_{\mathbf{u}} f$: The rate of change of $f$ in direction $\mathbf{u}$. Computed as $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$.
  2. Gradient Vector $\nabla f$: The vector of partial derivatives that points in the direction of steepest ascent with magnitude equal to the maximum rate of change.
  3. Key Relationship: The gradient is perpendicular to level curves, and its magnitude gives the maximum directional derivative.
  4. Gradient Descent: Moving in the direction $-\nabla L$ minimizes loss functions, forming the backbone of machine learning optimization.

Looking Ahead: In the next section, we'll use the gradient to find maximum and minimum values of multivariable functions, extending the critical point analysis from single-variable calculus. We'll see that points where $\nabla f = \mathbf{0}$ are candidates for extrema, and we'll develop the second derivative test using the Hessian matrix to classify them.