Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

State and apply the product rule for differentiating products of functions
Understand geometrically why the product rule has its particular form using the "growing rectangle" analogy
Prove the product rule from the limit definition of the derivative
Extend the product rule to three or more factors
Connect the product rule to gradient computation in neural networks (backpropagation)
Avoid common mistakes when applying the rule

The Big Picture: Differentiating Products

"When two quantities that both change are multiplied together, the rate of change of their product involves contributions from both of them."

In the previous sections, we learned the power rule for differentiating $x^n$ and the constant multiple rule. But what if we need to differentiate a product of two functions, like $f(x) \cdot g(x)$?

A natural first guess might be that $(fg)' = f' \cdot g'$, that is, just differentiate each factor. But this is wrong! Let's see why with a simple example:

Counter-Example: Why (fg)' ≠ f' · g'

Let $f(x) = x$ and $g(x) = x$. Then $f(x) \cdot g(x) = x^2$.

Wrong answer: $f'(x) \cdot g'(x) = 1 \cdot 1 = 1$

Correct answer: $(x^2)' = 2x$

The functions $1$ and $2x$ are not the same (they agree only at $x = 1/2$), so the naive guess fails!

The correct formula is the product rule:

The Product Rule

$\frac{d}{dx}[f(x) \cdot g(x)] = f'(x) \cdot g(x) + f(x) \cdot g'(x)$

In Leibniz notation: $\frac{d(uv)}{dx} = \frac{du}{dx}v + u\frac{dv}{dx}$

Memory Aid

"The derivative of the first times the second, plus the first times the derivative of the second."

Each factor gets its turn to be differentiated while the other stays fixed.

Historical Context: Leibniz and Newton

The product rule was discovered independently by Isaac Newton and Gottfried Wilhelm Leibniz in the late 17th century as they developed calculus. It's one of the foundational differentiation rules that make calculus a practical tool.

Leibniz, in particular, saw the product rule as arising naturally from his notation. He wrote differentials as $d(uv)$ and observed that when both $u$ and $v$ change by small amounts $du$ and $dv$:

$(u + du)(v + dv) = uv + u \cdot dv + v \cdot du + du \cdot dv$

The change in the product is: $d(uv) = u \cdot dv + v \cdot du + du \cdot dv$

Since $du \cdot dv$ is infinitesimally small compared to the other terms, we get:

$d(uv) = u \cdot dv + v \cdot du$
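To get a feel for how small the $du \cdot dv$ term is, here is a minimal sketch using exact rational arithmetic. The values u = 3, v = 5 and the helper name `product_change` are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction

def product_change(u, v, du, dv):
    """Split the exact change in u*v into its linear and second-order parts."""
    exact = (u + du) * (v + dv) - u * v
    linear = u * dv + v * du      # the part Leibniz keeps
    cross = du * dv               # the part Leibniz discards
    return exact, linear, cross

u, v = Fraction(3), Fraction(5)   # illustrative values only
for k in (1, 2, 3, 4):
    eps = Fraction(1, 10**k)
    exact, linear, cross = product_change(u, v, eps, eps)
    assert exact == linear + cross          # the expansion is an exact identity
    print(f"du = dv = {float(eps):g}: cross/linear = {float(cross / linear):.2e}")
```

With these values the ratio of the discarded term to the kept terms is eps/8, so it shrinks in proportion to the size of the increments.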

Intuitive Understanding: The Growing Rectangle

An elegant way to understand the product rule is to think of the area of a growing rectangle.

Imagine a rectangle with width $f(t)$ and height $g(t)$, both changing with time. The area is $A(t) = f(t) \cdot g(t)$. How fast is the area changing?

When time increases by a small amount $\Delta t$ :

The width changes from $f$ to $f + \Delta f$ where $\Delta f \approx f' \cdot \Delta t$
The height changes from $g$ to $g + \Delta g$ where $\Delta g \approx g' \cdot \Delta t$

The new area has three additional pieces beyond the original rectangle:

Region	Area	Contribution to dA/dt
Right strip (green)	Δf · g = f'Δt · g	f'(t) · g(t)
Top strip (blue)	f · Δg = f · g'Δt	f(t) · g'(t)
Corner (purple)	Δf · Δg = f'g'(Δt)²	Vanishes as Δt → 0

The Geometry of the Product Rule: Rectangle Area (interactive figure)

Snapshot of the figure at time t = 1.0 with time step Δt = 0.50:

Quantity	Value
Original area f · g	1.950
Right strip f' · g · Δt	0.3250
Top strip f · g' · Δt	0.2250
Corner f' · g' · Δt²	0.0375
Total change ΔA	0.5875
ΔA / Δt	1.1750
True derivative f'g + fg'	1.1000
Error from corner term	0.0750

The Key Insight

The corner term $\Delta f \cdot \Delta g$ is proportional to $(\Delta t)^2$, so when we divide by $\Delta t$ and take the limit, it vanishes. Only the two strips contribute to the derivative.
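The shrinking corner term can be checked numerically. The sketch below assumes, purely for illustration, linear sides f(t) = 1 + 0.5t and g(t) = 1 + 0.3t. These were chosen because at t = 1 they reproduce the area 1.950 and the strip values quoted in the breakdown above; the source does not state which functions the figure actually uses.

```python
def f(t):
    return 1.0 + 0.5 * t   # assumed width, so f'(t) = 0.5

def g(t):
    return 1.0 + 0.3 * t   # assumed height, so g'(t) = 0.3

F_PRIME, G_PRIME = 0.5, 0.3

def area_breakdown(t, dt):
    """Return (right strip, top strip, corner, total change in area)."""
    right = F_PRIME * dt * g(t)          # Δf · g
    top = f(t) * G_PRIME * dt            # f · Δg
    corner = F_PRIME * G_PRIME * dt**2   # Δf · Δg
    total = f(t + dt) * g(t + dt) - f(t) * g(t)
    return right, top, corner, total

t = 1.0
true_rate = F_PRIME * g(t) + f(t) * G_PRIME   # f'g + fg'
for dt in (0.5, 0.1, 0.01, 0.001):
    right, top, corner, total = area_breakdown(t, dt)
    print(f"dt={dt:<6} dA/dt={total / dt:.6f}  error={total / dt - true_rate:.6f}")
```

Because the assumed sides are linear, the error in ΔA/Δt is exactly f'g'Δt, so it falls in proportion to Δt.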

Geometric Proof

Let's formalize the rectangle argument:

Setup: Let $A(t) = f(t) \cdot g(t)$ represent the area of a rectangle.

Step 1: Compute the change in area:

$A(t + \Delta t) - A(t) = f(t + \Delta t) \cdot g(t + \Delta t) - f(t) \cdot g(t)$

Step 2: Add and subtract $f(t) \cdot g(t + \Delta t)$:

$= f(t + \Delta t) \cdot g(t + \Delta t) - f(t) \cdot g(t + \Delta t) + f(t) \cdot g(t + \Delta t) - f(t) \cdot g(t)$

Step 3: Factor:

$= [f(t + \Delta t) - f(t)] \cdot g(t + \Delta t) + f(t) \cdot [g(t + \Delta t) - g(t)]$

Step 4: Divide by $\Delta t$ and take the limit:

$\frac{dA}{dt} = \lim_{\Delta t \to 0} \left[ \frac{f(t + \Delta t) - f(t)}{\Delta t} \cdot g(t + \Delta t) \right] + f(t) \cdot \lim_{\Delta t \to 0} \frac{g(t + \Delta t) - g(t)}{\Delta t}$

Step 5: Since $g$ is differentiable at $t$, it is continuous there, so $g(t + \Delta t) \to g(t)$ as $\Delta t \to 0$:

$\frac{dA}{dt} = f'(t) \cdot g(t) + f(t) \cdot g'(t)$ ∎

Formal Proof from the Limit Definition

Here's the rigorous proof using the limit definition of the derivative:

Theorem: If $f$ and $g$ are differentiable at $x$, then so is $p(x) = f(x) \cdot g(x)$, and:

$p'(x) = f'(x) \cdot g(x) + f(x) \cdot g'(x)$

Proof: (Here $h$ denotes the increment in $x$.)

$p'(x) = \lim_{h \to 0} \frac{f(x+h)g(x+h) - f(x)g(x)}{h}$

Add and subtract $f(x)g(x+h)$:

$= \lim_{h \to 0} \frac{f(x+h)g(x+h) - f(x)g(x+h) + f(x)g(x+h) - f(x)g(x)}{h}$

Split into two fractions:

$= \lim_{h \to 0} \left[ \frac{f(x+h) - f(x)}{h} \cdot g(x+h) + f(x) \cdot \frac{g(x+h) - g(x)}{h} \right]$

Apply limit laws:

$= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \cdot \lim_{h \to 0} g(x+h) + f(x) \cdot \lim_{h \to 0} \frac{g(x+h) - g(x)}{h}$

Since $g$ is differentiable, it's continuous, so $\lim_{h \to 0} g(x+h) = g(x)$:

$= f'(x) \cdot g(x) + f(x) \cdot g'(x)$ ∎
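The add-and-subtract step can also be watched numerically. The sketch below, with $f = \sin$ and $g = \exp$ chosen only as an illustration, computes the two fractions from the proof separately and compares their sum with the plain difference quotient and with $f'g + fg'$. The helper name `split_quotient` is hypothetical.

```python
import math

def split_quotient(f, g, x, h):
    """Return the two fractions from the add-and-subtract step of the proof."""
    first = (f(x + h) - f(x)) / h * g(x + h)    # tends to f'(x) g(x)
    second = f(x) * (g(x + h) - g(x)) / h       # tends to f(x) g'(x)
    return first, second

f, g = math.sin, math.exp          # illustrative choice of functions
x = 0.7
exact = math.cos(x) * math.exp(x) + math.sin(x) * math.exp(x)   # f'g + fg'

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    first, second = split_quotient(f, g, x, h)
    whole = (f(x + h) * g(x + h) - f(x) * g(x)) / h
    print(f"h={h:g}  quotient={whole:.6f}  split sum={first + second:.6f}  "
          f"error={abs(whole - exact):.2e}")
```

The split sum should match the plain quotient up to rounding for every h, since the split is an algebraic identity; only the limit step introduces the derivatives.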

Interactive Exploration

Use the visualizer below to see the product rule in action. Select different function pairs and watch how the derivative of their product relates to the individual derivatives.

Interactive Product Rule Visualizer (interactive figure)

The visualizer shows how the derivative of a product f(x) · g(x) relates to the derivatives of the individual functions, with optional tangent lines. Snapshot at x = 1.50 for the selected function pair:

Quantity	Value
f(x)	1.5000
g(x)	2.2500
f(x) · g(x)	3.3750
f'(x)	1.0000
g'(x)	3.0000
f'(x) · g(x)	2.2500
f(x) · g'(x)	4.5000
(fg)' = f'g + fg'	6.7500

The slope of the product curve at x = 1.50 is the sum of two terms: the derivative of the first function times the second, plus the first function times the derivative of the second.

Worked Examples

Example 1: Polynomial times Polynomial

Find $\frac{d}{dx}[(x^2 + 1)(x^3 - 2x)]$

Solution: Let $f(x) = x^2 + 1$ and $g(x) = x^3 - 2x$

$f'(x) = 2x$
$g'(x) = 3x^2 - 2$

Applying the product rule:

$\frac{d}{dx}[f \cdot g] = (2x)(x^3 - 2x) + (x^2 + 1)(3x^2 - 2)$

Expanding:

$= 2x^4 - 4x^2 + 3x^4 - 2x^2 + 3x^2 - 2$

$= 5x^4 - 3x^2 - 2$
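As a cross-check of Example 1, the sketch below represents polynomials as coefficient lists (lowest degree first) and compares "expand, then differentiate" with the product rule. The helpers `poly_mul`, `poly_deriv`, and `poly_add` are small illustrative functions written for this check, not library calls.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_deriv(p):
    """Differentiate term by term with the power rule."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def poly_add(p, q):
    """Add polynomials, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

f = [1, 0, 1]        # x^2 + 1
g = [0, -2, 0, 1]    # x^3 - 2x

direct = poly_deriv(poly_mul(f, g))               # expand, then differentiate
by_rule = poly_add(poly_mul(poly_deriv(f), g),    # f'g
                   poly_mul(f, poly_deriv(g)))    # + fg'
print(direct)    # [-2, 0, -3, 0, 5], i.e. 5x^4 - 3x^2 - 2
print(by_rule)   # [-2, 0, -3, 0, 5]
```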

Example 2: Exponential times Polynomial

Find $\frac{d}{dx}[e^x \cdot x^2]$

Solution: Let $f(x) = e^x$ and $g(x) = x^2$

$f'(x) = e^x$
$g'(x) = 2x$

Applying the product rule:

$\frac{d}{dx}[e^x \cdot x^2] = e^x \cdot x^2 + e^x \cdot 2x$

$= e^x(x^2 + 2x) = e^x \cdot x(x + 2)$

Example 3: Trigonometric times Polynomial

Find $\frac{d}{dx}[x \sin(x)]$

Solution: Let $f(x) = x$ and $g(x) = \sin(x)$

$f'(x) = 1$
$g'(x) = \cos(x)$

Applying the product rule:

$\frac{d}{dx}[x \sin(x)] = 1 \cdot \sin(x) + x \cdot \cos(x) = \sin(x) + x\cos(x)$
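The answers to Examples 2 and 3 can be sanity-checked against a central-difference estimate. This is only a sketch: the test point 1.3 and the helper name `central_diff` are arbitrary choices.

```python
import math

def central_diff(func, x, h=1e-5):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Each entry pairs a product with the derivative obtained from the product rule.
checks = {
    "e^x * x^2": (lambda x: math.exp(x) * x**2,
                  lambda x: math.exp(x) * x * (x + 2)),
    "x * sin(x)": (lambda x: x * math.sin(x),
                   lambda x: math.sin(x) + x * math.cos(x)),
}

x0 = 1.3   # arbitrary test point
for name, (func, claimed) in checks.items():
    numeric = central_diff(func, x0)
    print(f"{name}: numeric={numeric:.6f}, formula={claimed(x0):.6f}")
```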

Extensions and Generalizations

Product of Three Functions

For three functions $u(x)$, $v(x)$, and $w(x)$:

$(uvw)' = u'vw + uv'w + uvw'$

Proof idea: Apply the two-function product rule twice:

$(uvw)' = ((uv) \cdot w)' = (uv)'w + (uv)w'$

$= (u'v + uv')w + uvw'$

$= u'vw + uv'w + uvw'$

Pattern

For $n$ functions, the derivative has $n$ terms. In each term, exactly one function is differentiated while all others remain unchanged.

General Product Rule

For $n$ differentiable functions:

$\frac{d}{dx}[f_1 f_2 \cdots f_n] = \sum_{i=1}^{n} f_1 \cdots f_{i-1} \cdot f_i' \cdot f_{i+1} \cdots f_n$

This can be proven by induction on $n$: the two-function product rule is the base case, and it also drives the inductive step when applied to $(f_1 \cdots f_{n-1}) \cdot f_n$.
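The general formula translates directly into a loop over the factors. The sketch below assumes the caller supplies each factor's value and derivative at the same point; `product_derivative` is a hypothetical helper name.

```python
import math

def product_derivative(values, derivs):
    """Derivative of f_1 * ... * f_n at a point.

    values[i] is f_i(x) and derivs[i] is f_i'(x), both evaluated at the same x.
    """
    total = 0.0
    for i, d in enumerate(derivs):
        others = values[:i] + values[i + 1:]   # every factor except the i-th
        total += d * math.prod(others)
    return total

x = 2.0
values = [x, x**2, x**3]          # u = x, v = x^2, w = x^3, so uvw = x^6
derivs = [1.0, 2 * x, 3 * x**2]
print(product_derivative(values, derivs))   # 192.0, matching 6 * x**5
```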

Machine Learning Applications

The product rule, together with the chain rule, is fundamental to backpropagation, the algorithm used to compute gradients when training neural networks.

Gradient Flow Through Multiplication

In a neural network, many operations involve multiplying quantities together:

Weighted inputs: $w \cdot x$ (weight times input)
Attention output: $\text{softmax}(QK^T) \cdot V$
Gating mechanisms: $\sigma(z) \cdot \tanh(c)$ in LSTMs

When computing gradients during backpropagation, the product rule tells us how the gradient flows through these multiplication operations:

Forward: z = x · y

Multiply inputs x and y to get output z

Backward: Gradients

$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z} \cdot y$

$\frac{\partial L}{\partial y} = \frac{\partial L}{\partial z} \cdot x$

The Product Rule in Action

When backpropagating through $z = x \cdot y$, the gradient with respect to $x$ is the upstream gradient times $y$ (the "other" factor), and vice versa. This is the product rule with one factor held constant: $\partial(xy)/\partial x = y$ and $\partial(xy)/\partial y = x$.
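A minimal sketch of this forward/backward pair for scalars follows. The function names `mul_forward` and `mul_backward`, and the loss L = z², are assumptions made for the example.

```python
def mul_forward(x, y):
    """Forward pass of z = x * y; also returns what the backward pass needs."""
    return x * y, (x, y)

def mul_backward(cache, dL_dz):
    """Backward pass: scale the upstream gradient by the other factor."""
    x, y = cache
    return dL_dz * y, dL_dz * x    # (dL/dx, dL/dy)

# Illustrative loss L = z**2, so the upstream gradient is dL/dz = 2z.
x, y = 3.0, -2.0
z, cache = mul_forward(x, y)
dL_dx, dL_dy = mul_backward(cache, 2 * z)
print(dL_dx, dL_dy)   # 24.0 -36.0
```

These agree with differentiating L = x²y² by hand: 2xy² = 24 and 2x²y = -36.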

Example: Attention Mechanism

In transformer attention, the output is computed as:

$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right) \cdot V$

This involves multiple matrix multiplications. Backpropagating gradients through this expression requires the product rule at each multiplication step.
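For a single matrix product Z = A @ B, the same "upstream gradient times the other factor" pattern holds, with transposes and a fixed order. The sketch below covers only that one product, not the softmax or the scaling in the attention formula; the loss L = sum(Z) and the name `matmul_backward` are assumptions for illustration.

```python
import numpy as np

def matmul_backward(A, B, dL_dZ):
    """Gradients of a scalar loss through Z = A @ B."""
    dL_dA = dL_dZ @ B.T    # upstream gradient times the other factor
    dL_dB = A.T @ dL_dZ    # order matters: the transpose sits on the left here
    return dL_dA, dL_dB

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))

# Illustrative loss L = sum(Z), so dL/dZ is a matrix of ones.
dL_dA, dL_dB = matmul_backward(A, B, np.ones((2, 4)))

# Finite-difference check on one entry of A.
eps = 1e-6
A_shift = A.copy()
A_shift[0, 1] += eps
numeric = ((A_shift @ B).sum() - (A @ B).sum()) / eps
print(np.isclose(numeric, dL_dA[0, 1], atol=1e-4))
```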

Python Implementation

Numerical Verification

Let's verify the product rule numerically by comparing the direct derivative with the formula:

Verifying the Product Rule Numerically (product_rule_verify.py)

Notes on the code:

Product rule verification: The function product_rule_numerical computes h'(x) two ways: by applying a central difference directly to h(x) = f(x)g(x), and by using the formula f'g + fg'.
Numerical derivative: We use the central difference formula for better accuracy: f'(x) ≈ [f(x+h) - f(x-h)] / (2h). Its error is O(h²), compared with O(h) for the forward difference. In this formula, and as a function argument, h is the step size rather than the product function.
Direct numerical derivative of the product: We differentiate h(x) = f(x)g(x) numerically, without applying the product rule.
Product rule formula: The product rule gives us h'(x) = f'(x)g(x) + f(x)g'(x). This should match the direct numerical derivative up to rounding error.
Testing x² sin(x): For h(x) = x²sin(x), we have f'(x) = 2x and g'(x) = cos(x), so h'(x) = 2x·sin(x) + x²·cos(x).

import numpy as np

def product_rule_numerical(f, g, x, h=1e-7):
    """
    Verify the product rule numerically.

    For h(x) = f(x) * g(x), the product rule states:
    h'(x) = f'(x) * g(x) + f(x) * g'(x)

    We'll compute both sides and compare.
    """
    # Numerical derivatives
    f_prime = (f(x + h) - f(x - h)) / (2 * h)
    g_prime = (g(x + h) - g(x - h)) / (2 * h)

    # Product function
    h_func = lambda t: f(t) * g(t)
    h_prime_numerical = (h_func(x + h) - h_func(x - h)) / (2 * h)

    # Product rule formula
    h_prime_formula = f_prime * g(x) + f(x) * g_prime

    return {
        "f(x)": f(x),
        "g(x)": g(x),
        "f'(x)": f_prime,
        "g'(x)": g_prime,
        "h'(x) numerical": h_prime_numerical,
        "h'(x) formula": h_prime_formula,
        "difference": abs(h_prime_numerical - h_prime_formula)
    }

# Example 1: f(x) = x^2, g(x) = sin(x)
f1 = lambda x: x**2
g1 = lambda x: np.sin(x)

result1 = product_rule_numerical(f1, g1, x=np.pi/4)
print("Example: h(x) = x^2 * sin(x) at x = pi/4")
for key, value in result1.items():
    print(f"  {key}: {value:.6f}")
print()

# Example 2: f(x) = e^x, g(x) = x
f2 = lambda x: np.exp(x)
g2 = lambda x: x

result2 = product_rule_numerical(f2, g2, x=1.0)
print("Example: h(x) = e^x * x at x = 1")
for key, value in result2.items():
    print(f"  {key}: {value:.6f}")
print()

# The product rule: h'(x) = e^x * x + e^x * 1 = e^x(x + 1)
# At x = 1: h'(1) = e^1 * (1 + 1) = 2e
print(f"Exact answer: 2*e = {2 * np.e:.6f}")

Product Rule in Backpropagation

Here's how the product rule appears in automatic differentiation:

Product Rule in Automatic Differentiation (backprop_product_rule.py)

Notes on the code:

Computation graph node: Each node stores its value, gradient, children (operands), and operation type. This is the foundation of automatic differentiation.
Product rule in backprop: The product rule appears directly: when backpropagating through a multiplication a*b, the gradient flows to 'a' multiplied by the value of 'b', and vice versa.
Gradient flow to first factor: ∂(ab)/∂a = b, so the upstream gradient is multiplied by b.value before propagating to node a. This is the product rule in action!
Gradient flow to second factor: ∂(ab)/∂b = a, so the upstream gradient is multiplied by a.value before propagating to node b.
Building the computation graph: We build f = x*y + y*z step by step. Notice y appears in two products, so it will receive gradients from both paths.
Gradient accumulation: Variable y appears in both x*y and y*z. The product rule gives us ∂(xy)/∂y = x and ∂(yz)/∂y = z, so df/dy = x + z = 2 + 4 = 6.

class ComputationNode:
    """
    A node in a computation graph that demonstrates
    how the product rule appears in backpropagation.
    """
    def __init__(self, value, children=None, op=None):
        self.value = value
        self.grad = 0.0
        self.children = children or []
        self.op = op

    def backward(self, upstream_grad=1.0):
        self.grad += upstream_grad

        if self.op == 'mul':
            # Product rule! d(ab)/da = b, d(ab)/db = a
            a, b = self.children
            a.backward(upstream_grad * b.value)  # df/da = b
            b.backward(upstream_grad * a.value)  # df/db = a

        elif self.op == 'add':
            # Sum rule: d(a+b)/da = 1, d(a+b)/db = 1
            for child in self.children:
                child.backward(upstream_grad)

def multiply(a, b):
    """Create a multiplication node."""
    return ComputationNode(
        a.value * b.value,
        children=[a, b],
        op='mul'
    )

def add(a, b):
    """Create an addition node."""
    return ComputationNode(
        a.value + b.value,
        children=[a, b],
        op='add'
    )

# Example: Compute gradients for f = x * y + y * z
x = ComputationNode(2.0)
y = ComputationNode(3.0)
z = ComputationNode(4.0)

# Forward pass: f = x*y + y*z = 2*3 + 3*4 = 6 + 12 = 18
xy = multiply(x, y)  # xy = 6
yz = multiply(y, z)  # yz = 12
f = add(xy, yz)      # f = 18

print("Forward pass:")
print(f"  f = x*y + y*z = {x.value}*{y.value} + {y.value}*{z.value} = {f.value}")
print()

# Backward pass: compute df/dx, df/dy, df/dz
f.backward()

print("Backward pass (using product rule):")
print(f"  df/dx = y = {x.grad}")  # Should be 3
print(f"  df/dy = x + z = {y.grad}")  # Should be 2 + 4 = 6
print(f"  df/dz = y = {z.grad}")  # Should be 3

# Verify analytically:
# f = xy + yz
# df/dx = y = 3 ✓
# df/dy = x + z = 2 + 4 = 6 ✓  (product rule applied twice!)
# df/dz = y = 3 ✓
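A common way to gain confidence in hand-written backward passes is a finite-difference gradient check. The sketch below is independent of the ComputationNode class: it re-evaluates f = x*y + y*z as a plain function (named `f_scalar` here, a hypothetical name) and estimates each partial derivative numerically.

```python
def f_scalar(x, y, z):
    return x * y + y * z

def numerical_grad(func, args, h=1e-6):
    """Central-difference estimate of each partial derivative of func."""
    grads = []
    for i in range(len(args)):
        up = list(args)
        down = list(args)
        up[i] += h       # nudge only the i-th argument
        down[i] -= h
        grads.append((func(*up) - func(*down)) / (2 * h))
    return grads

grads = numerical_grad(f_scalar, (2.0, 3.0, 4.0))
print([round(gi, 6) for gi in grads])   # close to [3.0, 6.0, 3.0]
```

The estimates should agree with the analytic values df/dx = 3, df/dy = 6, df/dz = 3 up to rounding error.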

Common Mistakes to Avoid

Mistake 1: Multiplying derivatives

Wrong: $(fg)' = f' \cdot g'$

Correct: $(fg)' = f'g + fg'$

The derivative of a product is NOT the product of the derivatives!

Mistake 2: Forgetting the second term

Wrong: $(x^2 \sin x)' = 2x \sin x$

Correct: $(x^2 \sin x)' = 2x \sin x + x^2 \cos x$

Both factors contribute to the rate of change.

Mistake 3: Using product rule when unnecessary

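For $f(x) = 3x^4$, the constant multiple rule and the power rule give $f'(x) = 12x^3$ directly.

When one factor is a constant, use the constant multiple rule. The product rule still gives the right answer (the constant's derivative is zero), but it is unnecessary work.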

Mistake 4: Swapping the order in non-commutative products

For matrix products $AB$, the product rule gives $(AB)' = A'B + AB'$, and the factors in each term must stay in their original order because matrix multiplication is not commutative.
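The effect of swapping the order can be seen numerically. In the sketch below, the matrix-valued functions A(t) and B(t) and their derivatives are made-up examples; the check compares a central-difference derivative of A(t)B(t) with the correctly ordered formula and with a version whose first term is swapped.

```python
import numpy as np

def A(t):
    return np.array([[t, 1.0], [0.0, t**2]])

def dA(t):
    return np.array([[1.0, 0.0], [0.0, 2 * t]])

def B(t):
    return np.array([[1.0, t], [t, 2.0]])

def dB(t):
    return np.array([[0.0, 1.0], [1.0, 0.0]])

t, h = 0.8, 1e-6
numeric = (A(t + h) @ B(t + h) - A(t - h) @ B(t - h)) / (2 * h)
correct = dA(t) @ B(t) + A(t) @ dB(t)    # order preserved
swapped = B(t) @ dA(t) + A(t) @ dB(t)    # first term's factors swapped

print(np.allclose(numeric, correct, atol=1e-6))   # True
print(np.allclose(numeric, swapped, atol=1e-6))   # False
```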

Test Your Understanding


1. If h(x) = x² · sin(x), what is h'(x) using the product rule?

2. Which of the following is the correct statement of the product rule?

3. Find the derivative of f(x) = e^x · x

4. In the geometric interpretation using a rectangle, what does the 'purple corner' represent?

5. If f(x) = (x + 1)(x - 1), what is f'(x)?

6. For three functions u, v, w, what is the derivative (uvw)'?

7. In machine learning, the product rule is essential for computing gradients. Why?

8. What is the derivative of f(x) = x · ln(x)?


Summary

The product rule is a fundamental differentiation technique that tells us how to find the derivative of a product of two functions.

Key Formula

$\frac{d}{dx}[f(x) \cdot g(x)] = f'(x) \cdot g(x) + f(x) \cdot g'(x)$

Key Concepts

Concept	Description
Geometric intuition	Rate of change of rectangle area = right strip + top strip
Formula	(fg)' = f'g + fg' — each factor takes a turn being differentiated
Extension to n factors	n terms, each with exactly one differentiated factor
In backpropagation	∂(xy)/∂x = y and ∂(xy)/∂y = x
Common mistake	(fg)' ≠ f'g' — never multiply the derivatives!

Key Takeaways

The product rule accounts for the fact that both factors contribute to the rate of change of their product
Geometrically, it comes from the area of a growing rectangle: two strips grow, the corner term vanishes
For three or more functions, each factor takes its turn being differentiated while the others stay fixed
The product rule is essential in backpropagation for computing gradients through multiplication operations
Never confuse $(fg)'$ with $f' \cdot g'$!

The Product Rule in One Sentence:

"Each factor takes a turn being differentiated while the other stays fixed, and the results are added together."

Coming Next: In the next section, we'll learn the Quotient Rule for differentiating ratios of functions. Spoiler: it's closely related to the product rule!