Chapter 16

Derivatives of Vector Functions

Vector-Valued Functions

Learning Objectives

By the end of this section, you will be able to:

  1. Define the derivative of a vector-valued function using limits
  2. Compute derivatives by differentiating each component separately
  3. Apply differentiation rules (sum, product, and chain rules) to vector functions
  4. Interpret the derivative geometrically as the tangent vector to a curve
  5. Calculate velocity, speed, and acceleration for motion problems
  6. Find unit tangent vectors and understand their significance
  7. Connect vector derivatives to gradients in machine learning

The Big Picture: Why Differentiate Vectors?

"The derivative tells us how things change — and in the vector world, change means not just 'how fast' but also 'in which direction.'"

In single-variable calculus, the derivative $f'(x)$ tells us the instantaneous rate of change of a function: how fast and in what sense (increasing or decreasing) the output changes as we nudge the input. With vector-valued functions, we face a richer question: when a point moves along a curve in space, how does its position vector change?

The answer is the derivative of a vector function, which gives us a new vector — the tangent vector to the curve. This tangent vector captures:

  • Direction: Which way is the point moving at this instant?
  • Speed: How fast is it moving (the magnitude of the tangent)?
  • Velocity: The complete picture — direction and speed together

The Central Idea

For a vector function $\mathbf{r}(t) = \langle f(t), g(t), h(t) \rangle$, the derivative is computed by differentiating each component:

$$\mathbf{r}'(t) = \langle f'(t), g'(t), h'(t) \rangle$$

This elegant result lets us apply all our single-variable differentiation techniques component by component!

Where Vector Derivatives Appear

Physics

  • Velocity = derivative of position
  • Acceleration = derivative of velocity
  • Jerk, snap, and higher derivatives
  • Electric and magnetic field variations

Engineering

  • Robot arm kinematics
  • Flight path analysis
  • Structural deformation rates
  • Control system dynamics

Computer Graphics

  • Curve tangents for shading
  • Motion interpolation
  • Camera path smoothing
  • Particle system dynamics

Machine Learning

  • Gradients for optimization
  • Backpropagation (chain rule)
  • Neural network training
  • Optimization trajectories

Historical Origins

The calculus of vector functions developed alongside classical mechanics in the 17th-19th centuries, as mathematicians sought precise ways to describe motion in space.

Newton and Leibniz: The Foundations

Isaac Newton (1643–1727) developed calculus in large part to solve problems in physics. His "method of fluxions" treated velocity as the rate of change of position, which is exactly our modern concept of the derivative of a position vector. Written in modern notation, his laws of motion require computing derivatives of vector quantities.

Gottfried Leibniz (1646–1716) developed the $d/dt$ notation we still use today, so the symbol $\frac{d\mathbf{r}}{dt}$ traces back to his systematic approach to infinitesimal calculus. The prime notation $\mathbf{r}'(t)$ came later, from Joseph-Louis Lagrange.

The 19th Century Formalization

The formal treatment of vector derivatives emerged with the work of William Rowan Hamilton, Josiah Willard Gibbs, and Oliver Heaviside in the 1800s. They established that:

  • Vector functions can be differentiated component-wise
  • The derivative rules (product, chain) extend naturally to vectors
  • The geometric meaning is the tangent vector to the curve

From Physics to Machine Learning

The same mathematical framework Newton used to describe planetary motion now powers machine learning. When we compute gradients in neural networks, we're using vector calculus — differentiating a scalar loss function with respect to a vector of weights produces a gradient vector, just as differentiating position with respect to time produces a velocity vector.


The Definition: Derivative of a Vector Function

Definition: Derivative of a Vector Function

Let $\mathbf{r}(t)$ be a vector-valued function. The derivative of $\mathbf{r}$ at $t$ is:

$$\mathbf{r}'(t) = \lim_{\Delta t \to 0} \frac{\mathbf{r}(t + \Delta t) - \mathbf{r}(t)}{\Delta t}$$

provided this limit exists. The derivative is also denoted $\frac{d\mathbf{r}}{dt}$.

This definition is identical in form to the scalar derivative — we're taking the limit of a difference quotient. The key insight is that subtracting vectors and dividing by a scalar still yields a vector.

Component-Wise Differentiation

The remarkable practical consequence is that we can differentiate component by component:

Theorem: Component-Wise Differentiation

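If $\mathbf{r}(t) = \langle f(t), g(t), h(t) \rangle$, then:

$$\mathbf{r}'(t) = \langle f'(t), g'(t), h'(t) \rangle$$

provided $f'(t)$, $g'(t)$, and $h'(t)$ all exist.

This follows from how limits of vectors work: a vector-valued limit exists exactly when each component's limit exists, and it equals the vector of those component limits. The components of the difference quotient $\frac{\mathbf{r}(t + \Delta t) - \mathbf{r}(t)}{\Delta t}$ are the scalar difference quotients of $f$, $g$, and $h$, so its limit is $\langle f'(t), g'(t), h'(t) \rangle$.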

Example: Circular Motion

Consider a particle moving on a unit circle: $\mathbf{r}(t) = \langle \cos(t), \sin(t) \rangle$

Derivative: $\mathbf{r}'(t) = \langle -\sin(t), \cos(t) \rangle$

At $t = 0$: the position is $(1, 0)$ and the velocity is $(0, 1)$, pointing straight up!

The velocity vector is tangent to the circle and perpendicular to the position vector.

Why Perpendicular?

For any curve on a sphere centered at the origin (including a circle centered at the origin), the velocity is perpendicular to the position vector. This is because $|\mathbf{r}(t)|^2 = \mathbf{r} \cdot \mathbf{r}$ is constant, so differentiating both sides with the dot product rule gives $2\mathbf{r} \cdot \mathbf{r}' = 0$, which means $\mathbf{r} \perp \mathbf{r}'$ (whenever $\mathbf{r}' \neq \mathbf{0}$).
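To make the perpendicularity argument concrete, here is a minimal NumPy sketch that checks both claims for the unit-circle example: the component-wise derivative gives velocity $(0, 1)$ at $t = 0$, and $\mathbf{r}(t) \cdot \mathbf{r}'(t)$ vanishes (up to rounding) at every sampled time. The nine sample times are an arbitrary choice.

```python
import numpy as np

def r(t):
    """Unit-circle position r(t) = <cos t, sin t>."""
    return np.array([np.cos(t), np.sin(t)])

def r_prime(t):
    """Component-wise derivative r'(t) = <-sin t, cos t>."""
    return np.array([-np.sin(t), np.cos(t)])

# Sample nine times around the circle and compute r(t) . r'(t) at each one
ts = np.linspace(0.0, 2 * np.pi, 9)
dots = np.array([r(t) @ r_prime(t) for t in ts])

print("r(0) =", r(0.0), " r'(0) =", r_prime(0.0))
print("largest |r . r'| over the samples:", np.max(np.abs(dots)))
```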


Visualizing the Limit: Secant to Tangent

Just as the derivative in single-variable calculus arises from the limit of secant lines approaching a tangent line, the vector derivative arises from secant vectors approaching the tangent vector.

The secant vector from $\mathbf{r}(t_0)$ to $\mathbf{r}(t_0 + \Delta t)$, scaled by $1/\Delta t$, is:

$$\frac{\mathbf{r}(t_0 + \Delta t) - \mathbf{r}(t_0)}{\Delta t}$$

As $\Delta t \to 0$, this scaled secant vector rotates and changes length until it becomes the tangent vector $\mathbf{r}'(t_0)$.

Interactive figure (The Limit Definition: Secant → Tangent): a curve with the points r(t₀) and r(t₀ + Δt), the displacement between them, the secant approximation [r(t₀ + Δt) − r(t₀)]/Δt, and the true tangent r'(t₀), with sliders for t₀ and Δt. Decreasing Δt moves the secant approximation toward the true tangent vector.

Understanding the Limit

The secant vector connects two points on the curve and approximates the direction of motion. As Δt → 0, the displacement r(t₀ + Δt) − r(t₀) shrinks toward zero, but dividing by Δt keeps the quotient finite: it rotates and settles on the true tangent vector, the instantaneous rate of change of the position vector. This is exactly how we defined the scalar derivative, now applied to vectors!
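The limit process can also be checked numerically. The figure's curve is not specified in the text, so this sketch assumes the unit circle with $t_0 = 0.3$ and prints the scaled secant vector for shrinking $\Delta t$; its distance from the exact tangent $\mathbf{r}'(t_0)$ should shrink roughly in proportion to $\Delta t$.

```python
import numpy as np

def r(t):
    """Assumed example curve: the unit circle r(t) = <cos t, sin t>."""
    return np.array([np.cos(t), np.sin(t)])

def r_prime(t):
    """Exact tangent vector r'(t) = <-sin t, cos t>."""
    return np.array([-np.sin(t), np.cos(t)])

def secant_quotient(r_func, t0, dt):
    """Scaled secant vector [r(t0 + dt) - r(t0)] / dt."""
    return (r_func(t0 + dt) - r_func(t0)) / dt

t0 = 0.3
errors = []
for dt in [0.3, 0.03, 0.003, 0.0003]:
    approx = secant_quotient(r, t0, dt)
    err = np.linalg.norm(approx - r_prime(t0))
    errors.append(err)
    print(f"dt = {dt:<7} secant quotient = {approx}  error = {err:.2e}")
```

This one-sided quotient is first-order accurate: cutting Δt by a factor of 10 cuts the error by roughly a factor of 10.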


Differentiation Rules for Vector Functions

All the familiar differentiation rules extend to vector functions. Here are the key rules:

Basic Rules

Rule                     Formula                          Notes
Sum Rule                 d/dt[u + v] = u' + v'            Add component derivatives
Constant Multiple        d/dt[c · u] = c · u'             c is a scalar constant
Scalar Function Product  d/dt[f(t)u] = f'(t)u + f(t)u'    Product rule with a scalar function

Product Rules

There are three important product rules for vectors:

Dot Product Rule

$$\frac{d}{dt}[\mathbf{u} \cdot \mathbf{v}] = \mathbf{u}' \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{v}'$$

Note: $\mathbf{u} \cdot \mathbf{v}$ is a scalar function, so its derivative is also a scalar.

Cross Product Rule

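$$\frac{d}{dt}[\mathbf{u} \times \mathbf{v}] = \mathbf{u}' \times \mathbf{v} + \mathbf{u} \times \mathbf{v}'$$

Note: Order matters! The cross product is anticommutative ($\mathbf{u} \times \mathbf{v} = -\mathbf{v} \times \mathbf{u}$), so each term must keep $\mathbf{u}$ (or $\mathbf{u}'$) on the left.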

Scalar-Vector Product Rule

$$\frac{d}{dt}[f(t)\mathbf{u}(t)] = f'(t)\mathbf{u}(t) + f(t)\mathbf{u}'(t)$$

This is the standard product rule with a scalar function.
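A quick way to convince yourself that the dot and cross product rules hold, and that the order in the cross product rule matters, is to compare each rule against a central-difference derivative. The functions $\mathbf{u}(t) = \langle t, t^2, 1 \rangle$ and $\mathbf{v}(t) = \langle \sin t, \cos t, t \rangle$ and the time $t_0 = 0.7$ are arbitrary choices for this sketch.

```python
import numpy as np

# Hypothetical example vector functions and their hand-computed derivatives
def u(t):
    return np.array([t, t ** 2, 1.0])

def du(t):
    return np.array([1.0, 2 * t, 0.0])

def v(t):
    return np.array([np.sin(t), np.cos(t), t])

def dv(t):
    return np.array([np.cos(t), -np.sin(t), 1.0])

def central_diff(f, t, h=1e-6):
    """Central-difference approximation of f'(t); f may be scalar- or vector-valued."""
    return (f(t + h) - f(t - h)) / (2 * h)

t0 = 0.7

# Dot product rule: (u . v)' = u' . v + u . v'
lhs_dot = central_diff(lambda t: u(t) @ v(t), t0)
rhs_dot = du(t0) @ v(t0) + u(t0) @ dv(t0)

# Cross product rule: (u x v)' = u' x v + u x v'  (u-factors stay on the left)
lhs_cross = central_diff(lambda t: np.cross(u(t), v(t)), t0)
rhs_cross = np.cross(du(t0), v(t0)) + np.cross(u(t0), dv(t0))

# Swapping the factors in each term flips the sign of the result
swapped = np.cross(v(t0), du(t0)) + np.cross(dv(t0), u(t0))

print(lhs_dot, rhs_dot)
print(lhs_cross, rhs_cross, swapped)
```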

Chain Rule

If $\mathbf{r}(t)$ is a vector function and $t = g(s)$ is a scalar function, then:

$$\frac{d\mathbf{r}}{ds} = \frac{d\mathbf{r}}{dt}\,\frac{dt}{ds} = g'(s)\,\mathbf{r}'(g(s))$$

Here $g'(s)$ is a scalar multiplying the vector $\mathbf{r}'(g(s))$; this is not a dot product.
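As a minimal check of the chain rule, this sketch differentiates the unit circle $\mathbf{r}(t) = \langle \cos t, \sin t \rangle$ under the hypothetical reparameterization $t = g(s) = s^2$, once with the formula $g'(s)\,\mathbf{r}'(g(s))$ and once by applying a central difference directly to $\mathbf{r}(g(s))$.

```python
import numpy as np

def r(t):
    return np.array([np.cos(t), np.sin(t)])

def r_prime(t):
    return np.array([-np.sin(t), np.cos(t)])

def g(s):
    """Hypothetical reparameterization t = g(s) = s^2."""
    return s ** 2

def g_prime(s):
    return 2 * s

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

s0 = 1.2
chain_rule = g_prime(s0) * r_prime(g(s0))      # scalar g'(s) times vector r'(g(s))
numeric = central_diff(lambda s: r(g(s)), s0)  # differentiate r(g(s)) directly
print(chain_rule, numeric)
```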

The Chain Rule is Fundamental to ML

Backpropagation in neural networks relies on the multivariable form of this chain rule, in which Jacobian matrices take the place of the scalar factor $g'(s)$. When computing gradients, we chain derivatives through multiple layers; each application of the chain rule propagates the gradient one layer further backward through the network.


Geometric Interpretation: The Tangent Vector

The derivative $\mathbf{r}'(t)$ has a beautiful geometric meaning: it is the tangent vector to the curve at the point $\mathbf{r}(t)$.

Geometric Meaning of r'(t)

  1. Direction: $\mathbf{r}'(t)$ points in the direction of motion along the curve at time $t$ (when $\mathbf{r}'(t) \neq \mathbf{0}$)
  2. Magnitude: $|\mathbf{r}'(t)|$ equals the speed, i.e., how fast the point moves along the curve
  3. Tangent Line: The line through $\mathbf{r}(t_0)$ with direction $\mathbf{r}'(t_0)$ is the tangent line to the curve

The Tangent Line

The parametric equation of the tangent line at $t = t_0$ is:

$$\mathbf{L}(s) = \mathbf{r}(t_0) + s\,\mathbf{r}'(t_0)$$

Here, $s$ is a parameter ranging over all real numbers. When $s = 0$, we're at the point of tangency.
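To see in what sense the tangent line hugs the curve, this sketch uses the parabola $\mathbf{r}(t) = \langle t, t^2 \rangle$ at $t_0 = 1$ (an illustrative choice) and measures the gap between the curve point $\mathbf{r}(t_0 + s)$ and the tangent-line point $\mathbf{L}(s)$. For this parabola the gap works out to exactly $s^2$, so it shrinks faster than $s$ itself.

```python
import numpy as np

def r(t):
    return np.array([t, t ** 2])

def r_prime(t):
    return np.array([1.0, 2 * t])

def tangent_line(t0, s):
    """L(s) = r(t0) + s * r'(t0)."""
    return r(t0) + s * r_prime(t0)

t0 = 1.0
steps = [0.1, 0.01, 0.001]
gaps = [np.linalg.norm(r(t0 + s) - tangent_line(t0, s)) for s in steps]
for s, gap in zip(steps, gaps):
    print(f"s = {s:<6} gap = {gap:.2e}  gap/s = {gap / s:.2e}")
```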

Interactive Exploration

Explore how the tangent vector changes as you move along different curves. Notice how the tangent always points in the direction of motion.

Interactive figure (Vector Function Derivative Visualizer): shows the position r(t) and the tangent vector r'(t) for a selectable curve, with a slider for t. Sample readout for r(t) = ⟨cos(t), sin(t)⟩ at t = π ≈ 3.142: position r(t) = (−1, 0), derivative r'(t) = (0, −1), speed |r'(t)| = 1, unit tangent T(t) = (0, −1).

Key Insight

The tangent vector r'(t) always points in the direction of motion along the curve. Its magnitude represents the speed — how fast the point moves. The unit tangent T(t) has magnitude 1, capturing only the direction without speed information.


Velocity and Speed: The Physical Interpretation

When $\mathbf{r}(t)$ represents the position of a particle at time $t$, the derivative has a direct physical meaning:

Quantity      Definition              Type
Velocity      v(t) = r'(t)            Vector (direction + magnitude)
Speed         |v(t)| = |r'(t)|        Scalar (magnitude only)
Acceleration  a(t) = v'(t) = r''(t)   Vector

Key Distinction: Velocity vs. Speed

Velocity v(t)

  • A vector
  • Has direction and magnitude
  • Components can be negative; direction can reverse
  • $\mathbf{v}(t) = \mathbf{r}'(t)$

Speed |v(t)|

  • A scalar
  • Magnitude only (no direction)
  • Always non-negative
  • $|\mathbf{v}(t)| = |\mathbf{r}'(t)|$

This distinction is crucial: velocity tells you how fast and in what direction something is moving, while speed tells you only how fast.

Interactive figure (Velocity (Vector) vs. Speed (Scalar)): a particle moves around an ellipse, with arrows for the velocity v(t) and the unit tangent direction T(t), a slider for t, and a plot of speed over time. Sample readout at t = 0: position r(t) = (3, 0), velocity v(t) = (0, 1), speed |v(t)| = 1. The velocity vector changes direction around the ellipse while the speed (its magnitude) varies.

Physical Interpretation

Watch how the velocity vector changes as the particle moves along the ellipse. At the ends of the major axis (left and right), where the path bends most sharply, the speed is slowest. At the top and bottom, the speed is fastest. The velocity vector always points tangent to the path, in the direction of instantaneous motion.
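The speed pattern can be checked numerically. The section does not give the ellipse's formula; this sketch assumes $\mathbf{r}(t) = \langle 3\cos t, \sin t \rangle$, which matches the sample readout at $t = 0$ (position $(3, 0)$, velocity $(0, 1)$). Its speed $\sqrt{9\sin^2 t + \cos^2 t}$ should be smallest (1) at the ends of the major axis and largest (3) at the top and bottom.

```python
import numpy as np

def velocity(t, a=3.0, b=1.0):
    """Velocity of the assumed ellipse r(t) = <a cos t, b sin t>."""
    return np.array([-a * np.sin(t), b * np.cos(t)])

# Half-degree grid over one full lap
ts = np.linspace(0.0, 2 * np.pi, 721)
speeds = np.array([np.linalg.norm(velocity(t)) for t in ts])

t_min = ts[np.argmin(speeds)]
t_max = ts[np.argmax(speeds)]
print(f"min speed {speeds.min():.3f} at t = {t_min:.3f}")
print(f"max speed {speeds.max():.3f} at t = {t_max:.3f}")
```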


Unit Tangent Vector

Sometimes we want just the direction of motion, without the speed information. This is captured by the unit tangent vector:

Definition: Unit Tangent Vector

The unit tangent vector at $t$ is:

$$\mathbf{T}(t) = \frac{\mathbf{r}'(t)}{|\mathbf{r}'(t)|}$$

provided $\mathbf{r}'(t) \neq \mathbf{0}$. By construction, $|\mathbf{T}(t)| = 1$.

The unit tangent vector $\mathbf{T}(t)$ points in the direction of motion with a standardized length of 1. This is useful for:

  • Describing the direction of a curve independently of parameterization (among parameterizations that trace it in the same direction)
  • Computing curvature (how fast the direction changes)
  • Building the Frenet-Serret frame (T, N, B) for curve analysis
  • Normalizing directions in computer graphics

Computing T(t)

For $\mathbf{r}(t) = \langle t^2, t^3 \rangle$:

1. Find $\mathbf{r}'(t) = \langle 2t, 3t^2 \rangle$

2. Find the magnitude: $|\mathbf{r}'(t)| = \sqrt{4t^2 + 9t^4} = |t|\sqrt{4 + 9t^2}$

3. Divide: $\mathbf{T}(t) = \frac{1}{|t|\sqrt{4 + 9t^2}} \langle 2t, 3t^2 \rangle$ for $t \neq 0$ (at $t = 0$, $\mathbf{r}'(0) = \mathbf{0}$ and $\mathbf{T}$ is undefined)
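The three steps translate directly into code. This minimal sketch evaluates $\mathbf{T}(t)$ at $t = 1$ and $t = -1$, where $\mathbf{r}'(\pm 1) = \langle \pm 2, 3 \rangle$ and $|\mathbf{r}'(\pm 1)| = \sqrt{13}$, and refuses $t = 0$, where the tangent vector vanishes.

```python
import numpy as np

def r_prime(t):
    """Step 1: r'(t) = <2t, 3t^2> for r(t) = <t^2, t^3>."""
    return np.array([2 * t, 3 * t ** 2])

def speed_closed_form(t):
    """Step 2: |r'(t)| = |t| sqrt(4 + 9t^2)."""
    return abs(t) * np.sqrt(4 + 9 * t ** 2)

def unit_tangent(t):
    """Step 3: T(t) = r'(t) / |r'(t)|, defined only for t != 0."""
    if t == 0:
        raise ValueError("r'(0) = 0, so T(0) is undefined")
    return r_prime(t) / speed_closed_form(t)

T1 = unit_tangent(1.0)
print("T(1)  =", T1, " |T(1)| =", np.linalg.norm(T1))
print("T(-1) =", unit_tangent(-1.0))
```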


Higher-Order Derivatives

Just as with scalar functions, we can take multiple derivatives of vector functions:

Derivative      Physical meaning   Describes
r(t)            Position           Where the particle is
r'(t) = v(t)    Velocity           How position changes
r''(t) = a(t)   Acceleration       How velocity changes
r'''(t) = j(t)  Jerk               How acceleration changes

Each level of derivative tells us about the rate of change of the previous quantity.

Example (Helix): For $\mathbf{r}(t) = \langle \cos(t), \sin(t), t \rangle$

• Velocity: $\mathbf{r}'(t) = \langle -\sin(t), \cos(t), 1 \rangle$

• Acceleration: $\mathbf{r}''(t) = \langle -\cos(t), -\sin(t), 0 \rangle$

• Jerk: $\mathbf{r}'''(t) = \langle \sin(t), -\cos(t), 0 \rangle$

Acceleration Points Inward

For this helix (and for any uniform circular motion), the acceleration vector $\mathbf{r}''(t) = \langle -\cos(t), -\sin(t), 0 \rangle$ points horizontally toward the $z$-axis, the center of the circle traced by the $x$- and $y$-components. This is the centripetal acceleration that keeps the particle curving instead of flying off in a straight line.
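A small numerical check of this claim for the helix: at each sampled time, the acceleration should equal the horizontal vector $\langle -x, -y, 0 \rangle$ that points from $\mathbf{r}(t) = \langle x, y, t \rangle$ straight to the $z$-axis. The sample times are arbitrary.

```python
import numpy as np

def r(t):
    return np.array([np.cos(t), np.sin(t), t])

def r_double_prime(t):
    return np.array([-np.cos(t), -np.sin(t), 0.0])

matches = []
for t in [0.0, 1.0, 2.5, 4.0]:
    x, y, _ = r(t)
    toward_axis = np.array([-x, -y, 0.0])   # horizontal vector from r(t) to the z-axis
    acc = r_double_prime(t)
    matches.append(np.allclose(acc, toward_axis))
    print(f"t = {t}: a = {acc}, points at the z-axis: {matches[-1]}")
```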


Applications in Science and Engineering

1. Projectile Motion

A projectile launched with initial velocity $\mathbf{v}_0$ from position $\mathbf{r}_0$ under constant gravitational acceleration $\mathbf{g}$ (ignoring air resistance) follows:

$$\mathbf{r}(t) = \mathbf{r}_0 + \mathbf{v}_0 t + \frac{1}{2}\mathbf{g}t^2$$

Taking derivatives:

  • Velocity: $\mathbf{v}(t) = \mathbf{v}_0 + \mathbf{g}t$
  • Acceleration: $\mathbf{a}(t) = \mathbf{g}$ (constant)
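As a sketch with hypothetical numbers ($\mathbf{r}_0 = \mathbf{0}$, $\mathbf{v}_0 = \langle 10, 20 \rangle$ m/s, $\mathbf{g} = \langle 0, -9.8 \rangle$ m/s²), the formulas for $\mathbf{v}(t)$ and $\mathbf{a}(t)$ can be compared with first and second central differences of $\mathbf{r}(t)$. At $t = 1$ s the position is $\langle 10, 15.1 \rangle$ m and the velocity is $\langle 10, 10.2 \rangle$ m/s.

```python
import numpy as np

# Hypothetical launch: r0 = origin, v0 = (10, 20) m/s, g = (0, -9.8) m/s^2
r0 = np.array([0.0, 0.0])
v0 = np.array([10.0, 20.0])
g = np.array([0.0, -9.8])

def position(t):
    return r0 + v0 * t + 0.5 * g * t ** 2

def velocity(t):
    return v0 + g * t          # first derivative of position

def acceleration(t):
    return g                   # second derivative: constant

t, h = 1.0, 1e-3
v_numeric = (position(t + h) - position(t - h)) / (2 * h)
a_numeric = (position(t + h) - 2 * position(t) + position(t - h)) / h ** 2
print(position(t), velocity(t), v_numeric, a_numeric)
```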

2. Circular Motion

For uniform circular motion with radius $R$ and angular velocity $\omega > 0$:

Position: $\mathbf{r}(t) = R\langle \cos(\omega t), \sin(\omega t) \rangle$

Velocity: $\mathbf{v}(t) = R\omega\langle -\sin(\omega t), \cos(\omega t) \rangle$

Speed: $|\mathbf{v}| = R\omega$ (constant)

Acceleration: $\mathbf{a}(t) = -R\omega^2\langle \cos(\omega t), \sin(\omega t) \rangle = -\omega^2 \mathbf{r}(t)$

The acceleration points toward the center (centripetal) with magnitude $R\omega^2$.
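A minimal numerical check, assuming the hypothetical values $R = 2$ and $\omega = 3$: the speed should equal $R\omega = 6$, the acceleration should equal $-\omega^2 \mathbf{r}(t)$ with magnitude $R\omega^2 = 18$, and a second central difference of $\mathbf{r}(t)$ should agree with both.

```python
import numpy as np

R, omega = 2.0, 3.0   # hypothetical radius and angular velocity

def r(t):
    return R * np.array([np.cos(omega * t), np.sin(omega * t)])

def v(t):
    return R * omega * np.array([-np.sin(omega * t), np.cos(omega * t)])

def a(t):
    return -R * omega ** 2 * np.array([np.cos(omega * t), np.sin(omega * t)])

t, h = 0.4, 1e-4
a_numeric = (r(t + h) - 2 * r(t) + r(t - h)) / h ** 2   # second central difference
print("speed:", np.linalg.norm(v(t)), " expected R*omega =", R * omega)
print("a(t) =", a(t), " -omega^2 r(t) =", -omega ** 2 * r(t), " numeric:", a_numeric)
```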

3. Robotics: End Effector Velocity

In robotics, the Jacobian relates joint velocities to end effector velocity. If the joint angles are $\mathbf{q}(t)$ and the end effector position is $\mathbf{r}(\mathbf{q})$, then:

$$\mathbf{v}_{\text{end}} = \frac{d\mathbf{r}}{dt} = \frac{\partial \mathbf{r}}{\partial \mathbf{q}}\,\frac{d\mathbf{q}}{dt} = \mathbf{J}(\mathbf{q})\,\dot{\mathbf{q}}$$

This is the chain rule for vectors: the Jacobian matrix $\mathbf{J}(\mathbf{q}) = \partial \mathbf{r} / \partial \mathbf{q}$ maps the joint velocities $\dot{\mathbf{q}}$ to the workspace velocity $\mathbf{v}_{\text{end}}$.
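To make the Jacobian relation concrete, here is a sketch for a planar two-link arm, a common textbook example that this section does not itself specify. The link lengths $l_1 = 1$, $l_2 = 0.5$ and the joint trajectory $\mathbf{q}(t)$ are hypothetical; the check is that $\mathbf{J}(\mathbf{q})\,\dot{\mathbf{q}}$ matches a central-difference derivative of the end effector position $\mathbf{r}(\mathbf{q}(t))$.

```python
import numpy as np

l1, l2 = 1.0, 0.5   # hypothetical link lengths of a planar two-link arm

def end_effector(q):
    """Forward kinematics r(q) for joint angles q = (q1, q2)."""
    q1, q2 = q
    return np.array([l1 * np.cos(q1) + l2 * np.cos(q1 + q2),
                     l1 * np.sin(q1) + l2 * np.sin(q1 + q2)])

def jacobian(q):
    """J(q) = dr/dq, the 2x2 matrix of partial derivatives."""
    q1, q2 = q
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def q_of_t(t):
    """Hypothetical joint trajectory with constant joint velocities."""
    return np.array([0.3 + 0.5 * t, -0.2 + 1.0 * t])

q_dot = np.array([0.5, 1.0])   # dq/dt for the trajectory above
t0, h = 0.4, 1e-6

v_jacobian = jacobian(q_of_t(t0)) @ q_dot
v_numeric = (end_effector(q_of_t(t0 + h)) - end_effector(q_of_t(t0 - h))) / (2 * h)
print(v_jacobian, v_numeric)
```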


Machine Learning Applications

Vector derivatives are the heart of machine learning optimization. Every time you train a neural network, you're computing vector derivatives.

The Gradient: Derivative of a Scalar Function

Given a loss function $L(\mathbf{w})$ that depends on a weight vector $\mathbf{w} = (w_1, w_2, \ldots, w_n)$, the gradient is:

$$\nabla L = \frac{\partial L}{\partial \mathbf{w}} = \left\langle \frac{\partial L}{\partial w_1}, \frac{\partial L}{\partial w_2}, \ldots, \frac{\partial L}{\partial w_n} \right\rangle$$

This gradient vector points in the direction of steepest increase of $L$. To minimize the loss, we move in the opposite direction:

$$\mathbf{w}_{\text{new}} = \mathbf{w}_{\text{old}} - \eta \nabla L(\mathbf{w}_{\text{old}})$$

where $\eta$ is the learning rate.
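A minimal sketch of the gradient and one update step, using the hypothetical loss $L(\mathbf{w}) = (w_1 - 1)^2 + 3w_2^2$ with gradient $\nabla L = \langle 2(w_1 - 1), 6w_2 \rangle$. It compares the analytic gradient with a coordinate-by-coordinate central difference (a common "gradient check") and confirms that one step with $\eta = 0.1$ from $\mathbf{w} = (3, 2)$ lowers the loss from 16 to 4.48.

```python
import numpy as np

def loss(w):
    """Hypothetical loss L(w) = (w1 - 1)^2 + 3 w2^2."""
    return (w[0] - 1.0) ** 2 + 3.0 * w[1] ** 2

def grad(w):
    """Analytic gradient <2(w1 - 1), 6 w2>."""
    return np.array([2.0 * (w[0] - 1.0), 6.0 * w[1]])

def numerical_grad(f, w, h=1e-6):
    """Central difference along each coordinate direction."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (f(w + e) - f(w - e)) / (2 * h)
    return g

w = np.array([3.0, 2.0])
eta = 0.1
print("analytic:", grad(w), " numerical:", numerical_grad(loss, w))

w_new = w - eta * grad(w)   # one gradient descent step
print("loss before:", loss(w), " loss after:", loss(w_new))
```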

Backpropagation: Chain Rule in Action

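Neural networks are compositions of functions. Write each layer as $\mathbf{z}_k = f_k(\mathbf{z}_{k-1}; \mathbf{w}_k)$ for $k = 1, \ldots, n$, with input $\mathbf{z}_0 = \mathbf{x}$ and scalar loss $L = \ell(\mathbf{z}_n)$. The chain rule gives:

$$\frac{\partial L}{\partial \mathbf{w}_1} = \frac{\partial L}{\partial \mathbf{z}_n}\,\frac{\partial \mathbf{z}_n}{\partial \mathbf{z}_{n-1}} \cdots \frac{\partial \mathbf{z}_2}{\partial \mathbf{z}_1}\,\frac{\partial \mathbf{z}_1}{\partial \mathbf{w}_1}$$

This is the vector chain rule applied repeatedly! Each factor is a Jacobian matrix, and backpropagation evaluates the product efficiently from the loss end: it starts with the gradient $\partial L / \partial \mathbf{z}_n$ and multiplies by one Jacobian at a time, so it only ever computes vector-Jacobian products.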
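This sketch applies that chain to a hypothetical two-layer network, $\mathbf{z}_1 = \tanh(W_1\mathbf{x})$, $\mathbf{z}_2 = W_2\mathbf{z}_1$, $L = \frac{1}{2}|\mathbf{z}_2 - \mathbf{y}|^2$, with random data. Here the first-layer weights form a matrix $W_1$, so the gradient is a matrix of the same shape. It starts from $\partial L / \partial \mathbf{z}_2$, multiplies backward by one Jacobian at a time, and checks the result against finite differences. The network shape and the use of plain NumPy instead of an autodiff library are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)            # input
y = rng.normal(size=2)            # target
W1 = rng.normal(size=(4, 3))      # first-layer weights (hypothetical shape)
W2 = rng.normal(size=(2, 4))      # second-layer weights

def loss(W):
    """Forward pass as a function of the first-layer weights."""
    z1 = np.tanh(W @ x)
    z2 = W2 @ z1
    return 0.5 * np.sum((z2 - y) ** 2)

# Backward pass: start at the loss and apply one vector-Jacobian product per layer
z1 = np.tanh(W1 @ x)
z2 = W2 @ z1
g_z2 = z2 - y                     # dL/dz2
g_z1 = W2.T @ g_z2                # dL/dz1, using the Jacobian dz2/dz1 = W2
g_a1 = g_z1 * (1 - z1 ** 2)       # tanh'(a) = 1 - tanh(a)^2 (diagonal Jacobian)
grad_W1 = np.outer(g_a1, x)       # dL/dW1[i, j] = g_a1[i] * x[j]

# Finite-difference check, one weight at a time
h = 1e-6
numeric = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        E = np.zeros_like(W1)
        E[i, j] = h
        numeric[i, j] = (loss(W1 + E) - loss(W1 - E)) / (2 * h)

print("max difference:", np.max(np.abs(grad_W1 - numeric)))
```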

The Deep Connection

When you trace the optimization path of gradient descent in weight space, you get a curve, just like the space curves we've been studying! Each step displaces the weights by $-\eta \nabla L$; in the continuous-time limit (gradient flow), the velocity along the path is $\frac{d\mathbf{w}}{dt} = -\nabla L(\mathbf{w})$, and optimization is the process of following this curve downhill toward a minimum.


Python Implementation

Vector Derivatives in NumPy

Here's how to work with vector function derivatives in Python:

vector_derivatives.py

  • Position vector function: we define a helix r(t) = ⟨cos(t), sin(t), t/2⟩. The x and y components trace a circle while z increases linearly, creating a spiral staircase shape.
  • Velocity (first derivative): r'(t) = ⟨−sin(t), cos(t), 0.5⟩ is computed by differentiating each component separately.
  • Acceleration (second derivative): r''(t) = ⟨−cos(t), −sin(t), 0⟩. The z-component is 0 because the velocity in z is constant.
  • Speed: the magnitude of velocity, |r'(t)| = √(sin²t + cos²t + 0.25) = √1.25. The speed is constant for this helix!
  • Unit tangent vector: T(t) = r'(t)/|r'(t)| gives the direction of motion with magnitude 1, essential for studying curve geometry.
  • Numerical differentiation: central differences approximate the derivative as (r(t+h) − r(t−h))/(2h). Finite differences are a common way to check analytic or autodiff derivatives; autodiff systems themselves propagate exact derivatives through the chain rule rather than using finite differences.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# ============================================
# DERIVATIVES OF VECTOR FUNCTIONS
# ============================================

def r(t):
    """Position vector: r(t) = ⟨cos(t), sin(t), t/2⟩ (helix)"""
    return np.array([np.cos(t), np.sin(t), t/2])

def r_prime(t):
    """Velocity vector (derivative): r'(t) = ⟨-sin(t), cos(t), 0.5⟩"""
    return np.array([-np.sin(t), np.cos(t), 0.5])

def r_double_prime(t):
    """Acceleration vector (second derivative): r''(t) = ⟨-cos(t), -sin(t), 0⟩"""
    return np.array([-np.cos(t), -np.sin(t), 0])

# Evaluate at a specific time
t0 = np.pi / 4
position = r(t0)
velocity = r_prime(t0)
acceleration = r_double_prime(t0)

print(f"At t = π/4:")
print(f"  Position r(t):     {position}")
print(f"  Velocity r'(t):    {velocity}")
print(f"  Acceleration r''(t): {acceleration}")

# Speed (magnitude of velocity)
speed = np.linalg.norm(velocity)
print(f"  Speed |r'(t)|:     {speed:.4f}")

# Unit tangent vector
T = velocity / speed
print(f"  Unit tangent T(t): {T}")

# ============================================
# NUMERICAL DIFFERENTIATION
# ============================================

def numerical_derivative(r_func, t, h=1e-5):
    """Approximate derivative using central differences."""
    return (r_func(t + h) - r_func(t - h)) / (2 * h)

# Compare numerical vs. analytical
numerical_vel = numerical_derivative(r, t0)
analytical_vel = r_prime(t0)

print(f"\n--- Numerical vs Analytical ---")
print(f"Numerical r'(t):  {numerical_vel}")
print(f"Analytical r'(t): {analytical_vel}")
print(f"Error: {np.linalg.norm(numerical_vel - analytical_vel):.2e}")

# ============================================
# VISUALIZATION
# ============================================

fig = plt.figure(figsize=(14, 5))

# 3D helix with tangent and acceleration vectors
ax1 = fig.add_subplot(131, projection='3d')

# Draw the helix
t_vals = np.linspace(0, 4*np.pi, 200)
x = np.cos(t_vals)
y = np.sin(t_vals)
z = t_vals / 2

ax1.plot(x, y, z, 'b-', linewidth=2, label='r(t) helix')

# Draw vectors at t0
pos = r(t0)
vel = r_prime(t0) * 0.5  # Scale for visibility
acc = r_double_prime(t0) * 0.5

ax1.quiver(*pos, *vel, color='orange', linewidth=2,
           label="r'(t) velocity", arrow_length_ratio=0.2)
ax1.quiver(*pos, *acc, color='red', linewidth=2,
           label="r''(t) acceleration", arrow_length_ratio=0.2)
ax1.scatter(*pos, color='green', s=100, zorder=5)

ax1.set_xlabel('X')
ax1.set_ylabel('Y')
ax1.set_zlabel('Z')
ax1.legend(loc='upper left')
ax1.set_title('Helix with Velocity and Acceleration')

# Speed over time
ax2 = fig.add_subplot(132)
speeds = [np.linalg.norm(r_prime(t)) for t in t_vals]
ax2.plot(t_vals, speeds, 'g-', linewidth=2)
ax2.axhline(y=np.sqrt(1.25), color='r', linestyle='--',
            label=f'Constant speed = √1.25')
ax2.set_xlabel('Time t')
ax2.set_ylabel('Speed |r\'(t)|')
ax2.set_title('Speed vs Time')
ax2.legend()

# Tangent vector components
ax3 = fig.add_subplot(133)
T_x = [-np.sin(t)/np.sqrt(1.25) for t in t_vals]
T_y = [np.cos(t)/np.sqrt(1.25) for t in t_vals]
T_z = [0.5/np.sqrt(1.25) for t in t_vals]

ax3.plot(t_vals, T_x, 'r-', label='T_x')
ax3.plot(t_vals, T_y, 'g-', label='T_y')
ax3.plot(t_vals, T_z, 'b-', label='T_z')
ax3.set_xlabel('Time t')
ax3.set_ylabel('Component')
ax3.set_title('Unit Tangent Vector Components')
ax3.legend()

plt.tight_layout()
plt.show()

Gradients in Machine Learning

Here's how vector derivatives appear in ML optimization:

gradient_descent.py

  • Loss function: the loss L(w) measures prediction error. It is a scalar function of the weight vector w, and we want to find the w that minimizes L.
  • The gradient vector: ∂L/∂w = ⟨∂L/∂w₁, ∂L/∂w₂, ∂L/∂w₃⟩ points in the direction of steepest increase. Each component is a partial derivative.
  • Gradient descent: we move in the negative gradient direction to decrease the loss, like following the downhill slope of a surface. Example: w_new = w_old − learning_rate × ∂L/∂w.
  • The update step: w = w − η∇L is the core update. The gradient tells us which direction increases L; we go the opposite way to decrease it.
  • Optimization as a curve: the sequence of weight vectors forms a path through weight space, a discrete version of a vector-valued function w(t)!
import numpy as np

# ============================================
# VECTOR DERIVATIVES IN MACHINE LEARNING
# ============================================

# In ML, we differentiate loss functions with respect to weight vectors
# The gradient is a vector of partial derivatives

def loss_function(w, X, y):
    """Mean squared error loss: L = (1/n) * ||Xw - y||²"""
    predictions = X @ w
    residuals = predictions - y
    return np.mean(residuals ** 2)

def gradient(w, X, y):
    """Gradient: ∂L/∂w = (2/n) * X.T @ (Xw - y)"""
    n = len(y)
    predictions = X @ w
    residuals = predictions - y
    return (2/n) * X.T @ residuals

# Example data: 3 features, 5 samples
np.random.seed(42)
X = np.random.randn(5, 3)  # 5 samples, 3 features
y = np.random.randn(5)      # 5 target values
w = np.array([1.0, -0.5, 0.2])  # Initial weights

print("--- Gradient Computation ---")
print(f"Weight vector w: {w}")
print(f"Loss L(w): {loss_function(w, X, y):.4f}")
print(f"Gradient ∂L/∂w: {gradient(w, X, y)}")

# ============================================
# GRADIENT DESCENT: Following the Negative Gradient
# ============================================

def gradient_descent(X, y, learning_rate=0.1, iterations=100):
    """Minimize loss by moving opposite to the gradient."""
    w = np.zeros(X.shape[1])  # Initialize at origin
    history = []

    for i in range(iterations):
        loss = loss_function(w, X, y)
        grad = gradient(w, X, y)
        history.append({'iter': i, 'loss': loss, 'w': w.copy(), 'grad': grad.copy()})

        # Key step: update in NEGATIVE gradient direction
        w = w - learning_rate * grad

        if i < 5:
            print(f"Iter {i}: loss = {loss:.4f}, |grad| = {np.linalg.norm(grad):.4f}")

    return w, history

print("\n--- Gradient Descent ---")
w_optimal, history = gradient_descent(X, y)
print(f"\nFinal weights: {w_optimal}")
print(f"Final loss: {loss_function(w_optimal, X, y):.6f}")

# ============================================
# PATH AS A VECTOR-VALUED FUNCTION
# ============================================

# The optimization path w(t) is like a parametric curve!
# Each iteration t gives a new weight vector w(t)

print("\n--- Optimization Path as Vector Function ---")
print("Think of w(iteration) as a vector-valued function:")
print("  w(0) → w(1) → w(2) → ... → w_optimal")
print("\nEach step is -learning_rate * grad, a discrete 'velocity' along the path")
print("The path curves through weight space toward the minimum!")

# Path length (total distance traveled), including the final update step:
# history stores the weights *before* each update, so append the final weights
path = [record['w'] for record in history] + [w_optimal]
total_distance = 0.0
for i in range(1, len(path)):
    step = np.linalg.norm(path[i] - path[i-1])
    total_distance += step
print(f"\nTotal path length: {total_distance:.4f}")

Common Pitfalls

Pitfall 1: Confusing Speed and Velocity

Speed $|\mathbf{r}'(t)|$ is a scalar (always ≥ 0). Velocity $\mathbf{r}'(t)$ is a vector (can point in any direction). They're related but not the same!

Pitfall 2: Forgetting Order in Cross Products

When differentiating $\mathbf{u} \times \mathbf{v}$, the order matters: $\frac{d}{dt}[\mathbf{u} \times \mathbf{v}] = \mathbf{u}' \times \mathbf{v} + \mathbf{u} \times \mathbf{v}'$. Swapping the order in either term changes its sign!

Pitfall 3: Division by Zero in Unit Tangent

The unit tangent $\mathbf{T}(t) = \mathbf{r}'(t)/|\mathbf{r}'(t)|$ is undefined when $\mathbf{r}'(t) = \mathbf{0}$. This happens where the particle momentarily stops; such points can produce cusps, as $\langle t^2, t^3 \rangle$ does at $t = 0$.
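In numerical code, this pitfall shows up as a division by zero or a NaN. One defensive pattern, sketched here with a hypothetical `unit_tangent` helper and an arbitrary tolerance, is to check the speed before normalizing and report when the tangent direction is undefined, as at the cusp of $\langle t^2, t^3 \rangle$ at $t = 0$.

```python
import numpy as np

def unit_tangent(r_prime_value, tol=1e-12):
    """Hypothetical helper: return r'/|r'|, or None when r' is (numerically) zero."""
    speed = np.linalg.norm(r_prime_value)
    if speed < tol:
        return None
    return r_prime_value / speed

def r_prime(t):
    """Derivative of r(t) = <t^2, t^3>."""
    return np.array([2 * t, 3 * t ** 2])

print(unit_tangent(r_prime(0.0)))   # None: r'(0) = 0 at the cusp
print(unit_tangent(r_prime(0.5)))   # r'(0.5) = (1, 0.75), speed 1.25
```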

Pitfall 4: Assuming Constant Speed

Just because an object moves along a curve doesn't mean its speed is constant. For $\mathbf{r}(t) = \langle t^2, t^3 \rangle$, the speed $|\mathbf{r}'(t)| = |t|\sqrt{4 + 9t^2}$ varies with $t$.

Pitfall 5: Reparameterization Changes r'(t)

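The same curve with different parameterizations has different velocity vectors. If $\mathbf{r}_1(t)$ and $\mathbf{r}_2(s)$ trace the same curve, then $\mathbf{r}_1'(t) \neq \mathbf{r}_2'(s)$ in general. The unit tangent $\mathbf{T}$, however, is the same at each point, provided both parameterizations trace the curve in the same direction; reversing the direction flips the sign of $\mathbf{T}$.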
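A minimal sketch of this pitfall, using the unit circle $\mathbf{r}_1(t) = \langle \cos t, \sin t \rangle$ and two hypothetical reparameterizations: $\mathbf{r}_2(s) = \mathbf{r}_1(2s)$ traverses the circle twice as fast, and $\mathbf{r}_3(u) = \mathbf{r}_1(-u)$ traverses it in the opposite direction. At the same point, the velocities differ, $\mathbf{T}$ agrees for $\mathbf{r}_1$ and $\mathbf{r}_2$, and $\mathbf{T}$ flips sign for $\mathbf{r}_3$.

```python
import numpy as np

def r1_prime(t):
    """Velocity of r1(t) = <cos t, sin t>."""
    return np.array([-np.sin(t), np.cos(t)])

def r2_prime(s):
    """Hypothetical r2(s) = r1(2s): the chain rule gives r2'(s) = 2 r1'(2s)."""
    return 2 * r1_prime(2 * s)

def r3_prime(u):
    """Hypothetical r3(u) = r1(-u): the chain rule gives r3'(u) = -r1'(-u)."""
    return -r1_prime(-u)

def unit(v):
    return v / np.linalg.norm(v)

t = 0.8   # the point r1(0.8); r2 reaches it at s = 0.4 and r3 at u = -0.8
v1, v2, v3 = r1_prime(t), r2_prime(t / 2), r3_prime(-t)
print("velocities:   ", v1, v2, v3)
print("unit tangents:", unit(v1), unit(v2), unit(v3))
```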


Test Your Understanding


If r(t) = ⟨t², 3t, cos(t)⟩, what is r'(t)?


Summary

The derivative of a vector function extends the fundamental concept of instantaneous rate of change to curves in space. By differentiating component-wise, we obtain the tangent vector — a powerful tool for analyzing motion, geometry, and optimization.

Key Concepts

Concept         Description
Definition      r'(t) = lim_{Δt→0} [r(t + Δt) − r(t)]/Δt
Component form  r'(t) = ⟨f'(t), g'(t), h'(t)⟩
Velocity        v(t) = r'(t), a vector describing motion
Speed           |v(t)| = |r'(t)|, the scalar magnitude of velocity
Unit tangent    T(t) = r'(t)/|r'(t)|, direction only, with |T| = 1
Acceleration    a(t) = r''(t) = v'(t)
Gradient        ∇L = ⟨∂L/∂w₁, ..., ∂L/∂wₙ⟩, used for ML optimization

Key Takeaways

  1. The derivative of a vector function is computed component by component
  2. $\mathbf{r}'(t)$ is the tangent vector to the curve; it points in the direction of motion
  3. Velocity is a vector (direction + speed); speed is a scalar (just magnitude)
  4. All differentiation rules (sum, product, chain) extend naturally to vectors
  5. The unit tangent vector $\mathbf{T}(t)$ captures direction without speed
  6. In machine learning, gradients are vector derivatives used for optimization
  7. Backpropagation is the multivariable chain rule applied through neural network layers
The Essence of Vector Derivatives:
"The derivative of position is velocity. The derivative of velocity is acceleration. The derivative of a loss function is the gradient. Each tells us how to move forward — literally or figuratively — toward our goal."
Coming Next: In the next section, we'll explore Arc Length and Curvature — using the tangent vector to measure how long a curve is and how sharply it bends. These concepts complete our toolkit for analyzing space curves.