Chapter 4

The Product Rule

The Derivative - Instantaneous Rate of Change

Learning Objectives

By the end of this section, you will be able to:

  1. State and apply the product rule for differentiating products of functions
  2. Understand geometrically why the product rule has its particular form using the "growing rectangle" analogy
  3. Prove the product rule from the limit definition of the derivative
  4. Extend the product rule to three or more factors
  5. Connect the product rule to gradient computation in neural networks (backpropagation)
  6. Avoid common mistakes when applying the rule

The Big Picture: Differentiating Products

"When two quantities that both change are multiplied together, the rate of change of their product involves contributions from both of them."

In the previous sections, we learned the power rule for differentiating $x^n$ and the constant multiple rule. But what if we need to differentiate a product of two functions, like $f(x) \cdot g(x)$?

A natural first guess might be that $(fg)' = f' \cdot g'$, that is, just differentiate each factor. But this is wrong! Let's see why with a simple example:

Counter-Example: Why $(fg)' \neq f' \cdot g'$

Let $f(x) = x$ and $g(x) = x$. Then $f(x) \cdot g(x) = x^2$.

Wrong answer: $f'(x) \cdot g'(x) = 1 \cdot 1 = 1$

Correct answer: $(x^2)' = 2x$

Since $1 \neq 2x$ everywhere except at the single point $x = 1/2$, the naive guess fails!
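The same comparison can be checked numerically. The sketch below is only an illustration: the helper `central_diff` and the sample point `x0 = 3.0` are choices made here, not part of the original example. It estimates the slope of the product $x \cdot x$ and compares it with the naive product of the two slopes.

```python
def central_diff(func, x, h=1e-5):
    """Estimate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

f = lambda x: x
g = lambda x: x
product = lambda x: f(x) * g(x)

x0 = 3.0  # arbitrary sample point (assumption for illustration)

naive_guess = central_diff(f, x0) * central_diff(g, x0)  # f'(x0) * g'(x0)
true_slope = central_diff(product, x0)                   # (f*g)'(x0)

print(f"naive f'*g' : {naive_guess:.4f}")
print(f"actual (fg)': {true_slope:.4f}")
```

At this sample point the naive guess stays near 1, while the actual slope is near $2 \cdot 3 = 6$.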

The correct formula is the product rule:

The Product Rule

$$\frac{d}{dx}[f(x) \cdot g(x)] = f'(x) \cdot g(x) + f(x) \cdot g'(x)$$

In Leibniz notation: $\frac{d(uv)}{dx} = \frac{du}{dx}v + u\frac{dv}{dx}$

Memory Aid

"The derivative of the first times the second, plus the first times the derivative of the second."

Each factor gets its turn to be differentiated while the other stays fixed.


Historical Context: Leibniz and Newton

The product rule was discovered independently by Isaac Newton and Gottfried Wilhelm Leibniz in the late 17th century as they developed calculus. It's one of the foundational differentiation rules that make calculus a practical tool.

Leibniz, in particular, saw the product rule as arising naturally from his notation. He wrote the differential of a product as $d(uv)$ and observed that when both $u$ and $v$ change by small amounts $du$ and $dv$:

$$(u + du)(v + dv) = uv + u \cdot dv + v \cdot du + du \cdot dv$$

The change in the product is: $d(uv) = u \cdot dv + v \cdot du + du \cdot dv$

Since $du \cdot dv$ is infinitesimally small compared to the other terms, we get:

$$d(uv) = u \cdot dv + v \cdot du$$


Intuitive Understanding: The Growing Rectangle

The most elegant way to understand the product rule is to picture the area of a growing rectangle.

Imagine a rectangle with width $f(t)$ and height $g(t)$, both changing with time. The area is $A(t) = f(t) \cdot g(t)$. How fast is the area changing?

When time increases by a small amount $\Delta t$:

  • The width changes from $f$ to $f + \Delta f$ where $\Delta f \approx f' \cdot \Delta t$
  • The height changes from $g$ to $g + \Delta g$ where $\Delta g \approx g' \cdot \Delta t$

The new area has three additional pieces beyond the original rectangle:

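| Region | Area | Contribution to $dA/dt$ |
|---|---|---|
| Right strip (green) | $\Delta f \cdot g \approx f'\Delta t \cdot g$ | $f'(t) \cdot g(t)$ |
| Top strip (blue) | $f \cdot \Delta g \approx f \cdot g'\Delta t$ | $f(t) \cdot g'(t)$ |
| Corner (purple) | $\Delta f \cdot \Delta g \approx f'g'(\Delta t)^2$ | Vanishes as $\Delta t \to 0$ |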

[Interactive figure: The Geometry of the Product Rule: Rectangle Area. A rectangle with width f(t) and height g(t), showing the original area A = f · g, the right strip f'Δt · g, the top strip f · g'Δt, and the corner f'g'(Δt)².]

Sample readout (f(t) = 1.50):

  • Original area: 1.950
  • f' · g · Δt = 0.3250
  • f · g' · Δt = 0.2250
  • f' · g' · Δt² = 0.0375
  • ΔA = 0.5875
  • ΔA / Δt = 1.1750
  • True derivative f'g + fg' = 1.1000
  • Error from corner term: 0.075000

The Key Insight

The corner term $\Delta f \cdot \Delta g$ is proportional to $(\Delta t)^2$, so when we divide by $\Delta t$ and take the limit, it vanishes. Only the two strips contribute to the derivative.
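A short numerical experiment makes the vanishing corner concrete. This is a sketch under stated assumptions: the linear width `f` and height `g` below are hypothetical functions chosen here so that $f = 1.5$, $g = 1.3$, $f' = 0.5$ and $g' = 0.3$ at $t = 0$. It prints the difference quotient $\Delta A / \Delta t$ for shrinking steps and how far it sits from $f'g + fg'$.

```python
def f(t):
    return 1.5 + 0.5 * t   # width: f(0) = 1.5, f'(t) = 0.5 (assumed)

def g(t):
    return 1.3 + 0.3 * t   # height: g(0) = 1.3, g'(t) = 0.3 (assumed)

def area(t):
    return f(t) * g(t)

t0 = 0.0
exact = 0.5 * g(t0) + f(t0) * 0.3   # f'g + fg' at t0

rows = []
for dt in (0.5, 0.1, 0.01, 0.001):
    quotient = (area(t0 + dt) - area(t0)) / dt   # Delta A / Delta t
    corner_error = quotient - exact              # leftover from the corner
    rows.append((dt, quotient, corner_error))
    print(f"dt={dt:<6} dA/dt~{quotient:.6f}  corner error={corner_error:.6f}")
```

For these linear functions the leftover equals $f'g'\,\Delta t = 0.15\,\Delta t$, so it shrinks in proportion to the step size.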


Geometric Proof

Let's formalize the rectangle argument:

Setup: Let $A(t) = f(t) \cdot g(t)$ represent the area of a rectangle.

Step 1: Compute the change in area:

$$A(t + \Delta t) - A(t) = f(t + \Delta t) \cdot g(t + \Delta t) - f(t) \cdot g(t)$$

Step 2: Add and subtract $f(t) \cdot g(t + \Delta t)$:

$$= f(t + \Delta t) \cdot g(t + \Delta t) - f(t) \cdot g(t + \Delta t) + f(t) \cdot g(t + \Delta t) - f(t) \cdot g(t)$$

Step 3: Factor:

$$= [f(t + \Delta t) - f(t)] \cdot g(t + \Delta t) + f(t) \cdot [g(t + \Delta t) - g(t)]$$

Step 4: Divide by $\Delta t$ and take the limit:

$$\frac{dA}{dt} = \lim_{\Delta t \to 0} \frac{f(t + \Delta t) - f(t)}{\Delta t} \cdot g(t + \Delta t) + f(t) \cdot \lim_{\Delta t \to 0} \frac{g(t + \Delta t) - g(t)}{\Delta t}$$

Step 5: Since $g(t + \Delta t) \to g(t)$ as $\Delta t \to 0$ (by continuity):

$$\frac{dA}{dt} = f'(t) \cdot g(t) + f(t) \cdot g'(t)$$


Formal Proof from the Limit Definition

Here's the rigorous proof using the limit definition of the derivative:

Theorem: If $f$ and $g$ are differentiable at $x$, then so is $h(x) = f(x) \cdot g(x)$, and:

$$h'(x) = f'(x) \cdot g(x) + f(x) \cdot g'(x)$$

Proof: We write $\Delta x$ for the increment so that it is not confused with the function $h$.

$$h'(x) = \lim_{\Delta x \to 0} \frac{f(x+\Delta x)g(x+\Delta x) - f(x)g(x)}{\Delta x}$$

Add and subtract $f(x)g(x+\Delta x)$:

$$= \lim_{\Delta x \to 0} \frac{f(x+\Delta x)g(x+\Delta x) - f(x)g(x+\Delta x) + f(x)g(x+\Delta x) - f(x)g(x)}{\Delta x}$$

Split into two fractions:

$$= \lim_{\Delta x \to 0} \left[ \frac{f(x+\Delta x) - f(x)}{\Delta x} \cdot g(x+\Delta x) + f(x) \cdot \frac{g(x+\Delta x) - g(x)}{\Delta x} \right]$$

Apply limit laws:

$$= \lim_{\Delta x \to 0} \frac{f(x+\Delta x) - f(x)}{\Delta x} \cdot \lim_{\Delta x \to 0} g(x+\Delta x) + f(x) \cdot \lim_{\Delta x \to 0} \frac{g(x+\Delta x) - g(x)}{\Delta x}$$

Since $g$ is differentiable, it's continuous, so $\lim_{\Delta x \to 0} g(x+\Delta x) = g(x)$:

$$= f'(x) \cdot g(x) + f(x) \cdot g'(x)$$
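The add-and-subtract step of the proof can be watched numerically. In the sketch below, the helper `split_quotient` and the sample functions $\sin$ and $e^x$ are assumptions made for illustration. It evaluates the two pieces of the split difference quotient for shrinking increments and compares their sum with the unsplit quotient.

```python
import math

def split_quotient(f, g, x, step):
    """Return the two pieces obtained from the add-and-subtract trick."""
    first = (f(x + step) - f(x)) / step * g(x + step)   # tends to f'(x) * g(x)
    second = f(x) * (g(x + step) - g(x)) / step         # tends to f(x) * g'(x)
    return first, second

f, g = math.sin, math.exp   # sample functions (assumption for illustration)
x = 1.0

for step in (1e-1, 1e-3, 1e-5):
    first, second = split_quotient(f, g, x, step)
    full = (f(x + step) * g(x + step) - f(x) * g(x)) / step
    print(f"step={step:g}: first={first:.6f} second={second:.6f} "
          f"sum={first + second:.6f} full quotient={full:.6f}")

# Limits predicted by the theorem at x = 1
limit_first = math.cos(x) * math.exp(x)    # f'(x) g(x)
limit_second = math.sin(x) * math.exp(x)   # f(x) g'(x)
print(f"predicted limits: {limit_first:.6f} and {limit_second:.6f}")
```

The sum of the two pieces matches the full quotient for every increment (up to rounding), because the split is an algebraic identity; only the limit step relies on the continuity of $g$.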


Interactive Exploration

Use the visualizer below to see the product rule in action. Select different function pairs and watch how the derivative of their product relates to the individual derivatives.

[Interactive figure: Interactive Product Rule Visualizer. Plots f(x) = x, g(x) = x², and the product f(x) · g(x) against x, illustrating (fg)' = f'g + fg'.]

Sample readout at x = 1.50:

| Quantity | Value |
|---|---|
| f(x) | 1.5000 |
| g(x) | 2.2500 |
| f(x) · g(x) | 3.3750 |
| f'(x) | 1.0000 |
| g'(x) | 3.0000 |
| f'(x) · g(x) | 2.2500 |
| f(x) · g'(x) | 4.5000 |
| (fg)'(x) = f'g + fg' | 6.7500 |

The slope of the product curve at x = 1.50 is the sum of two terms: the derivative of the first function times the second, plus the first function times the derivative of the second.


Worked Examples

Example 1: Polynomial times Polynomial

Find $\frac{d}{dx}[(x^2 + 1)(x^3 - 2x)]$

Solution: Let $f(x) = x^2 + 1$ and $g(x) = x^3 - 2x$

  • $f'(x) = 2x$
  • $g'(x) = 3x^2 - 2$

Applying the product rule:

$$\frac{d}{dx}[f \cdot g] = (2x)(x^3 - 2x) + (x^2 + 1)(3x^2 - 2)$$

Expanding:

$$= 2x^4 - 4x^2 + 3x^4 - 2x^2 + 3x^2 - 2$$

$$= 5x^4 - 3x^2 - 2$$

Example 2: Exponential times Polynomial

Find $\frac{d}{dx}[e^x \cdot x^2]$

Solution: Let $f(x) = e^x$ and $g(x) = x^2$

  • $f'(x) = e^x$
  • $g'(x) = 2x$

Applying the product rule:

$$\frac{d}{dx}[e^x \cdot x^2] = e^x \cdot x^2 + e^x \cdot 2x$$

$$= e^x(x^2 + 2x) = e^x \cdot x(x + 2)$$

Example 3: Trigonometric times Polynomial

Find $\frac{d}{dx}[x \sin(x)]$

Solution: Let $f(x) = x$ and $g(x) = \sin(x)$

  • $f'(x) = 1$
  • $g'(x) = \cos(x)$

Applying the product rule:

$$\frac{d}{dx}[x \sin(x)] = 1 \cdot \sin(x) + x \cdot \cos(x) = \sin(x) + x\cos(x)$$
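The three results above can be sanity-checked against a numerical derivative. The sketch below is a spot check at a single point, not a proof; the helper `central_diff` and the test point `x0 = 1.3` are assumptions chosen for illustration.

```python
import math

def central_diff(func, x, h=1e-5):
    """Symmetric difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

# (label, original product, derivative obtained with the product rule)
examples = [
    ("(x^2+1)(x^3-2x)", lambda x: (x**2 + 1) * (x**3 - 2 * x),
                         lambda x: 5 * x**4 - 3 * x**2 - 2),
    ("e^x * x^2",       lambda x: math.exp(x) * x**2,
                         lambda x: math.exp(x) * x * (x + 2)),
    ("x * sin(x)",      lambda x: x * math.sin(x),
                         lambda x: math.sin(x) + x * math.cos(x)),
]

x0 = 1.3  # arbitrary test point (assumption)
gaps = []
for label, product, claimed in examples:
    gap = abs(central_diff(product, x0) - claimed(x0))
    gaps.append(gap)
    print(f"{label:<18} |numeric - formula| = {gap:.2e}")
```

A tiny gap for each example suggests that the algebra is right; a large one would point to a slip in the expansion.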


Extensions and Generalizations

Product of Three Functions

For three functions $u(x)$, $v(x)$, and $w(x)$:

$$(uvw)' = u'vw + uv'w + uvw'$$

Proof idea: Apply the two-function product rule twice:

$$(uvw)' = ((uv) \cdot w)' = (uv)'w + (uv)w'$$

$$= (u'v + uv')w + uvw'$$

$$= u'vw + uv'w + uvw'$$

Pattern

For $n$ functions, the derivative has $n$ terms. In each term, exactly one function is differentiated while all others remain unchanged.

General Product Rule

For $n$ differentiable functions:

$$\frac{d}{dx}[f_1 f_2 \cdots f_n] = \sum_{i=1}^{n} f_1 \cdots f_{i-1} \cdot f_i' \cdot f_{i+1} \cdots f_n$$

This can be proven by induction using the two-function product rule as the base case.
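The sum in the general rule translates directly into a loop. The following is a minimal sketch, assuming the caller supplies each factor together with its derivative; the function name `product_derivative` and the sample factors $x$, $\sin x$, $e^x$ are hypothetical choices for illustration.

```python
import math

def product_derivative(funcs, derivs, x):
    """Derivative of f_1 * f_2 * ... * f_n at x via the general product rule.

    funcs[i] and derivs[i] are callables for f_i and f_i' (caller-supplied).
    """
    total = 0.0
    for i in range(len(funcs)):
        term = derivs[i](x)            # differentiate exactly one factor
        for j, fj in enumerate(funcs):
            if j != i:
                term *= fj(x)          # leave every other factor unchanged
        total += term
    return total

# Sample factors u = x, v = sin(x), w = e^x (assumption for illustration)
funcs = [lambda x: x, math.sin, math.exp]
derivs = [lambda x: 1.0, math.cos, math.exp]

x0 = 0.7
by_rule = product_derivative(funcs, derivs, x0)

# Independent check with a symmetric difference quotient
h = 1e-5
uvw = lambda x: x * math.sin(x) * math.exp(x)
by_difference = (uvw(x0 + h) - uvw(x0 - h)) / (2 * h)

print(f"general product rule: {by_rule:.6f}")
print(f"difference quotient : {by_difference:.6f}")
```

With two factors the loop reduces to the familiar $f'g + fg'$; with three it reproduces $u'vw + uv'w + uvw'$.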


Machine Learning Applications

The product rule is fundamental to backpropagation, the algorithm used to train neural networks.

Gradient Flow Through Multiplication

In a neural network, many operations involve multiplying quantities together:

  • Weighted inputs: $w \cdot x$ (weight times input)
  • Attention: $\text{softmax}(QK^T) \cdot V$ (attention weights times values)
  • Gating mechanisms: $\sigma(z) \cdot \tanh(c)$ in LSTMs

When computing gradients during backpropagation, the product rule tells us how the gradient flows through these multiplication operations:

Forward: $z = x \cdot y$

Multiply inputs $x$ and $y$ to get output $z$

Backward: Gradients

$$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z} \cdot y$$

$$\frac{\partial L}{\partial y} = \frac{\partial L}{\partial z} \cdot x$$

The Product Rule in Action

When backpropagating through $z = x \cdot y$, the gradient with respect to $x$ is the upstream gradient times $y$ (the "other" factor), and vice versa. This is exactly the product rule: $\partial(xy)/\partial x = y$ and $\partial(xy)/\partial y = x$.
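The forward and backward passes of a single multiplication fit in a few lines. The sketch below uses a hypothetical loss $L = z^2$ (so the upstream gradient is $2z$) purely to have something to backpropagate; the function names `mul_forward` and `mul_backward` are illustrative and do not come from any particular library.

```python
def mul_forward(x, y):
    """Forward pass of z = x * y."""
    return x * y

def mul_backward(x, y, upstream):
    """Given dL/dz (upstream), return (dL/dx, dL/dy) for z = x * y."""
    return upstream * y, upstream * x

# Hypothetical loss L = z**2, so dL/dz = 2 * z
x, y = 2.0, 5.0
z = mul_forward(x, y)
dL_dz = 2 * z
dL_dx, dL_dy = mul_backward(x, y, dL_dz)

# Finite-difference check of dL/dx
eps = 1e-6
loss = lambda a, b: mul_forward(a, b) ** 2
dL_dx_numeric = (loss(x + eps, y) - loss(x - eps, y)) / (2 * eps)

print(f"dL/dx analytic = {dL_dx}, numeric ~ {dL_dx_numeric:.4f}")
print(f"dL/dy analytic = {dL_dy}")
```

Each input receives the upstream gradient scaled by the other input, which is the pattern described above.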

Example: Attention Mechanism

In transformer attention, the output is computed as:

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right) \cdot V$$

This involves multiple matrix multiplications. Backpropagating gradients through this expression requires the product rule at each multiplication step.


Python Implementation

Numerical Verification

Let's verify the product rule numerically by comparing the direct derivative with the formula:

Verifying the Product Rule Numerically
product_rule_verify.py

  • Product rule verification: The function product_rule_numerical checks the product rule by computing h'(x) two ways: directly from a difference quotient of the product, and using the formula f'g + fg'.
  • Numerical derivative: We use the central difference formula for better accuracy: f'(x) ≈ [f(x+h) - f(x-h)] / (2h). Its truncation error is O(h²), compared with O(h) for the forward difference.
  • Direct numerical derivative of the product: We compute the derivative of h(x) = f(x)g(x) directly from a difference quotient, without applying the product rule.
  • Product rule formula: The product rule gives us h'(x) = f'(x)g(x) + f(x)g'(x). This should match the direct numerical derivative up to rounding and truncation error.
  • Testing x² sin(x): For h(x) = x²sin(x), we have f'(x) = 2x and g'(x) = cos(x), so h'(x) = 2x·sin(x) + x²·cos(x).
import numpy as np

def product_rule_numerical(f, g, x, h=1e-7):
    """
    Verify the product rule numerically.

    For h(x) = f(x) * g(x), the product rule states:
    h'(x) = f'(x) * g(x) + f(x) * g'(x)

    We'll compute both sides and compare.
    """
    # Numerical derivatives
    f_prime = (f(x + h) - f(x - h)) / (2 * h)
    g_prime = (g(x + h) - g(x - h)) / (2 * h)

    # Product function
    h_func = lambda t: f(t) * g(t)
    h_prime_numerical = (h_func(x + h) - h_func(x - h)) / (2 * h)

    # Product rule formula
    h_prime_formula = f_prime * g(x) + f(x) * g_prime

    return {
        "f(x)": f(x),
        "g(x)": g(x),
        "f'(x)": f_prime,
        "g'(x)": g_prime,
        "h'(x) numerical": h_prime_numerical,
        "h'(x) formula": h_prime_formula,
        "difference": abs(h_prime_numerical - h_prime_formula)
    }

# Example 1: f(x) = x^2, g(x) = sin(x)
f1 = lambda x: x**2
g1 = lambda x: np.sin(x)

result1 = product_rule_numerical(f1, g1, x=np.pi/4)
print("Example: h(x) = x^2 * sin(x) at x = pi/4")
for key, value in result1.items():
    print(f"  {key}: {value:.6f}")
print()

# Example 2: f(x) = e^x, g(x) = x
f2 = lambda x: np.exp(x)
g2 = lambda x: x

result2 = product_rule_numerical(f2, g2, x=1.0)
print("Example: h(x) = e^x * x at x = 1")
for key, value in result2.items():
    print(f"  {key}: {value:.6f}")
print()

# The product rule: h'(x) = e^x * x + e^x * 1 = e^x(x + 1)
# At x = 1: h'(1) = e^1 * (1 + 1) = 2e
print(f"Exact answer: 2*e = {2 * np.e:.6f}")

Product Rule in Backpropagation

Here's how the product rule appears in automatic differentiation:

Product Rule in Automatic Differentiation
backprop_product_rule.py

  • Computation graph node: Each node stores its value, gradient, children (operands), and operation type. This is the foundation of automatic differentiation.
  • Product rule in backprop: The product rule appears directly: when backpropagating through a multiplication a*b, the gradient flows to 'a' multiplied by the value of 'b', and vice versa.
  • Gradient flow to the first factor: ∂(ab)/∂a = b, so the upstream gradient is multiplied by b.value before propagating to node a. This is the product rule in action!
  • Gradient flow to the second factor: ∂(ab)/∂b = a, so the upstream gradient is multiplied by a.value before propagating to node b.
  • Building the computation graph: We build f = x*y + y*z step by step. Notice y appears in two products, so it will receive gradients from both paths.
  • Gradient accumulation: Variable y appears in both x*y and y*z. The product rule gives us ∂(xy)/∂y = x and ∂(yz)/∂y = z, so df/dy = x + z = 6.
class ComputationNode:
    """
    A node in a computation graph that demonstrates
    how the product rule appears in backpropagation.
    """
    def __init__(self, value, children=None, op=None):
        self.value = value
        self.grad = 0.0
        self.children = children or []
        self.op = op

    def backward(self, upstream_grad=1.0):
        self.grad += upstream_grad

        if self.op == 'mul':
            # Product rule! d(ab)/da = b, d(ab)/db = a
            a, b = self.children
            a.backward(upstream_grad * b.value)  # df/da = b
            b.backward(upstream_grad * a.value)  # df/db = a

        elif self.op == 'add':
            # Sum rule: d(a+b)/da = 1, d(a+b)/db = 1
            for child in self.children:
                child.backward(upstream_grad)

def multiply(a, b):
    """Create a multiplication node."""
    return ComputationNode(
        a.value * b.value,
        children=[a, b],
        op='mul'
    )

def add(a, b):
    """Create an addition node."""
    return ComputationNode(
        a.value + b.value,
        children=[a, b],
        op='add'
    )

# Example: Compute gradients for f = x * y + y * z
x = ComputationNode(2.0)
y = ComputationNode(3.0)
z = ComputationNode(4.0)

# Forward pass: f = x*y + y*z = 2*3 + 3*4 = 6 + 12 = 18
xy = multiply(x, y)  # xy = 6
yz = multiply(y, z)  # yz = 12
f = add(xy, yz)      # f = 18

print("Forward pass:")
print(f"  f = x*y + y*z = {x.value}*{y.value} + {y.value}*{z.value} = {f.value}")
print()

# Backward pass: compute df/dx, df/dy, df/dz
f.backward()

print("Backward pass (using product rule):")
print(f"  df/dx = y = {x.grad}")  # Should be 3
print(f"  df/dy = x + z = {y.grad}")  # Should be 2 + 4 = 6
print(f"  df/dz = y = {z.grad}")  # Should be 3

# Verify analytically:
# f = xy + yz
# df/dx = y = 3 ✓
# df/dy = x + z = 2 + 4 = 6 ✓  (product rule applied twice!)
# df/dz = y = 3 ✓

Common Mistakes to Avoid

Mistake 1: Multiplying derivatives

Wrong: $(fg)' = f' \cdot g'$

Correct: $(fg)' = f'g + fg'$

The derivative of a product is NOT the product of the derivatives!

Mistake 2: Forgetting the second term

Wrong: $(x^2 \sin x)' = 2x \sin x$

Correct: $(x^2 \sin x)' = 2x \sin x + x^2 \cos x$

Both factors contribute to the rate of change.

Mistake 3: Using product rule when unnecessary

For $f(x) = 3x^4$, just use the power rule directly: $f'(x) = 12x^3$

Constants multiplied by functions use the constant multiple rule, not the product rule. The product rule still gives the right answer, because the derivative of a constant is 0, but it is extra work.

Mistake 4: Swapping the order in non-commutative products

For matrix products $AB$, the product rule gives $(AB)' = A'B + AB'$, but you must preserve the order since matrix multiplication is not commutative.
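The effect of swapping the order can be seen with small matrices. The sketch below uses plain Python lists and hypothetical 2×2 matrix-valued functions `A(t)` and `B(t)`, chosen here only for illustration. It compares a numerical derivative of $A(t)B(t)$ with the correctly ordered formula and with a version whose factors are swapped.

```python
def matmul(A, B):
    """2x2 matrix product using plain lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matadd(A, B):
    """Entrywise sum of two 2x2 matrices."""
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

# Hypothetical matrix-valued functions of t and their entrywise derivatives
def A(t):  return [[t, 1.0], [0.0, t * t]]
def dA(t): return [[1.0, 0.0], [0.0, 2 * t]]
def B(t):  return [[1.0, t], [t, 2.0]]
def dB(t): return [[0.0, 1.0], [1.0, 0.0]]

t0, h = 1.5, 1e-6

# Numerical derivative of the product A(t) B(t), entry by entry
P_plus = matmul(A(t0 + h), B(t0 + h))
P_minus = matmul(A(t0 - h), B(t0 - h))
numeric = [[(P_plus[i][j] - P_minus[i][j]) / (2 * h) for j in range(2)]
           for i in range(2)]

ordered = matadd(matmul(dA(t0), B(t0)), matmul(A(t0), dB(t0)))  # A'B + AB'
swapped = matadd(matmul(B(t0), dA(t0)), matmul(dB(t0), A(t0)))  # BA' + B'A (wrong order)

def max_gap(X, Y):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(X[i][j] - Y[i][j]) for i in range(2) for j in range(2))

print("gap with order preserved:", max_gap(numeric, ordered))
print("gap with order swapped  :", max_gap(numeric, swapped))
```

With the order preserved the gap is at the level of rounding error, while the swapped version is off by a visibly large amount for these matrices.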


Test Your Understanding

1. If h(x) = x² · sin(x), what is h'(x) using the product rule?

2. Which of the following is the correct statement of the product rule?

3. Find the derivative of f(x) = e^x · x

4. In the geometric interpretation using a rectangle, what does the 'purple corner' represent?

5. If f(x) = (x + 1)(x - 1), what is f'(x)?

6. For three functions u, v, w, what is (uvw)'?

7. In machine learning, the product rule is essential for computing gradients. Why?

8. What is the derivative of f(x) = x · ln(x)?


Summary

The product rule is a fundamental differentiation technique that tells us how to find the derivative of a product of two functions.

Key Formula

$$\frac{d}{dx}[f(x) \cdot g(x)] = f'(x) \cdot g(x) + f(x) \cdot g'(x)$$

Key Concepts

| Concept | Description |
|---|---|
| Geometric intuition | Rate of change of rectangle area = right strip + top strip |
| Formula | (fg)' = f'g + fg': each factor takes a turn being differentiated |
| Extension to n factors | n terms, each with exactly one differentiated factor |
| In backpropagation | ∂(xy)/∂x = y and ∂(xy)/∂y = x |
| Common mistake | (fg)' ≠ f'g': never multiply the derivatives! |

Key Takeaways

  1. The product rule accounts for the fact that both factors contribute to the rate of change of their product
  2. Geometrically, it comes from the area of a growing rectangle: two strips grow, the corner term vanishes
  3. For three or more functions, each factor takes its turn being differentiated while the others stay fixed
  4. The product rule is essential in backpropagation for computing gradients through multiplication operations
  5. Never confuse $(fg)'$ with $f' \cdot g'$!

The Product Rule in One Sentence:

"Each factor takes a turn being differentiated while the other stays fixed, and the results are added together."

Coming Next: In the next section, we'll learn the Quotient Rule for differentiating ratios of functions. Spoiler: it's closely related to the product rule!