Learning Objectives
By the end of this section, you will be able to:
- State and apply the product rule for differentiating products of functions
- Understand geometrically why the product rule has its particular form using the "growing rectangle" analogy
- Prove the product rule from the limit definition of the derivative
- Extend the product rule to three or more factors
- Connect the product rule to gradient computation in neural networks (backpropagation)
- Avoid common mistakes when applying the rule
The Big Picture: Differentiating Products
"When two quantities that both change are multiplied together, the rate of change of their product involves contributions from both of them."
In the previous sections, we learned the power rule and the constant multiple rule for differentiating. But what if we need to differentiate a product of two functions, like $h(x) = f(x) \cdot g(x)$?
A natural first guess might be that $(fg)' = f' \cdot g'$: just differentiate each factor and multiply the results. But this is wrong! Let's see why with a simple example:
Counter-Example: Why (fg)' ≠ f' · g'
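Let $f(x) = x$ and $g(x) = x$. Then $f(x)\,g(x) = x^2$.
Wrong answer: $f'(x) \cdot g'(x) = 1 \cdot 1 = 1$
Correct answer: $\frac{d}{dx}\left[x^2\right] = 2x$
Clearly $1 \neq 2x$ in general, so the naive guess fails!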
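A quick numerical check makes the failure concrete. The sketch below assumes $f(x) = x$ and $g(x) = x$ (an illustrative choice) and compares a central-difference estimate of the product's slope with the naive guess $f'g'$ and with the product rule $f'g + fg'$. The helper names are our own.

```python
def numerical_derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Illustrative choice: f(x) = x and g(x) = x, so f(x) * g(x) = x**2.
def f(x):
    return x


def g(x):
    return x


def df(x):
    return 1.0  # f'(x)


def dg(x):
    return 1.0  # g'(x)


def naive_guess(x):
    """The incorrect rule (fg)' = f' * g'."""
    return df(x) * dg(x)


def product_rule(x):
    """The correct rule (fg)' = f'g + fg'."""
    return df(x) * g(x) + f(x) * dg(x)


for x in [0.5, 1.0, 2.0, 3.0]:
    slope = numerical_derivative(lambda t: f(t) * g(t), x)
    print(f"x = {x}: slope ~ {slope:.4f}, naive = {naive_guess(x):.1f}, "
          f"product rule = {product_rule(x):.1f}")
```

The naive column stays at 1 for every $x$, while the numerical slope tracks $2x$, matching the product-rule column.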
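The correct formula is the product rule:
The Product Rule
$$(fg)'(x) = f'(x)\,g(x) + f(x)\,g'(x)$$
In Leibniz notation:
$$\frac{d}{dx}\big[f(x)\,g(x)\big] = \frac{df}{dx}\,g(x) + f(x)\,\frac{dg}{dx}$$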
Memory Aid
"The derivative of the first times the second, plus the first times the derivative of the second."
Each factor gets its turn to be differentiated while the other stays fixed.
Historical Context: Leibniz and Newton
The product rule was discovered independently by Isaac Newton and Gottfried Wilhelm Leibniz in the late 17th century as they developed calculus. It's one of the foundational differentiation rules that make calculus a practical tool.
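Leibniz, in particular, saw the product rule as arising naturally from his notation. He wrote differentials with a $d$ (as in $du$ and $dv$) and observed that when both $u$ and $v$ change by small amounts $du$ and $dv$:
The change in the product is:
$$d(uv) = (u + du)(v + dv) - uv = u\,dv + v\,du + du\,dv$$
Since $du\,dv$ is infinitesimally small compared to the other terms, we get:
$$d(uv) = u\,dv + v\,du$$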
Intuitive Understanding: The Growing Rectangle
The most elegant way to understand the product rule is through the area of a rectangle analogy.
Imagine a rectangle with width $f(t)$ and height $g(t)$, both changing with time. The area is $A(t) = f(t)\,g(t)$. How fast is the area changing?
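When time increases by a small amount $\Delta t$:
- The width changes from $f(t)$ to $f(t) + \Delta f$, where $\Delta f \approx f'(t)\,\Delta t$
- The height changes from $g(t)$ to $g(t) + \Delta g$, where $\Delta g \approx g'(t)\,\Delta t$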
The new area has three additional pieces beyond the original rectangle:
| Region | Area | Contribution to dA/dt |
|---|---|---|
| Right strip (green) | Δf · g ≈ f'Δt · g | f'(t) · g(t) |
| Top strip (blue) | f · Δg ≈ f · g'Δt | f(t) · g'(t) |
| Corner (purple) | Δf · Δg ≈ f'g'(Δt)² | Vanishes as Δt → 0 |
[Figure: The Geometry of the Product Rule: Rectangle Area]
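The Key Insight
The green and blue strips are proportional to $\Delta t$, while the purple corner is proportional to $(\Delta t)^2$. When we divide by $\Delta t$ and take the limit $\Delta t \to 0$, the corner term vanishes, so only the two strips contribute to the derivative.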
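The strip-and-corner bookkeeping can be checked numerically. This sketch assumes the illustrative sides $f(t) = 2 + t$ and $g(t) = 1 + t^2$ at $t = 1$, where $f'g + fg' = 1 \cdot 2 + 3 \cdot 2 = 8$. It confirms that the three pieces add up to the exact change in area, and that the corner divided by $\Delta t$ shrinks toward zero. The function names are our own.

```python
# Illustrative sides (our choice): width f(t) = 2 + t, height g(t) = 1 + t**2.
def f(t):
    return 2 + t


def g(t):
    return 1 + t ** 2


def decompose(t, dt):
    """Split the change in area into right strip, top strip, and corner."""
    delta_f = f(t + dt) - f(t)          # change in width
    delta_g = g(t + dt) - g(t)          # change in height
    right_strip = delta_f * g(t)        # green
    top_strip = f(t) * delta_g          # blue
    corner = delta_f * delta_g          # purple
    delta_area = f(t + dt) * g(t + dt) - f(t) * g(t)
    return delta_area, right_strip, top_strip, corner


t = 1.0
exact = 1 * g(t) + f(t) * (2 * t)       # f'(t) g(t) + f(t) g'(t) = 8.0

for dt in [0.1, 0.01, 0.001, 0.0001]:
    delta_area, right, top, corner = decompose(t, dt)
    pieces_match = abs(delta_area - (right + top + corner)) < 1e-12
    print(f"dt = {dt:<7} dA/dt = {delta_area / dt:.5f}  "
          f"corner/dt = {corner / dt:.5f}  pieces sum to dA: {pieces_match}")
print("exact derivative:", exact)
```

As `dt` shrinks, `dA/dt` approaches 8 while `corner/dt` falls roughly in proportion to `dt`, which is exactly why the corner drops out of the derivative.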
Geometric Proof
Let's formalize the rectangle argument:
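Setup: Let $A(t) = f(t)\,g(t)$ represent the area of a rectangle.
Step 1: Compute the change in area:
$$\Delta A = f(t+\Delta t)\,g(t+\Delta t) - f(t)\,g(t)$$
Step 2: Add and subtract $f(t+\Delta t)\,g(t)$:
$$\Delta A = f(t+\Delta t)\,g(t+\Delta t) - f(t+\Delta t)\,g(t) + f(t+\Delta t)\,g(t) - f(t)\,g(t)$$
Step 3: Factor:
$$\Delta A = f(t+\Delta t)\,\big[g(t+\Delta t) - g(t)\big] + \big[f(t+\Delta t) - f(t)\big]\,g(t)$$
Step 4: Divide by $\Delta t$ and take the limit:
$$\frac{dA}{dt} = \lim_{\Delta t \to 0}\left[f(t+\Delta t)\,\frac{g(t+\Delta t) - g(t)}{\Delta t} + \frac{f(t+\Delta t) - f(t)}{\Delta t}\,g(t)\right]$$
Step 5: Since $f(t+\Delta t) \to f(t)$ as $\Delta t \to 0$ (by continuity):
$$\frac{dA}{dt} = f(t)\,g'(t) + f'(t)\,g(t)$$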
∎
Formal Proof from the Limit Definition
Here's the rigorous proof using the limit definition of the derivative:
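Theorem: If $f$ and $g$ are differentiable at $x$, then so is $fg$, and:
$$(fg)'(x) = f'(x)\,g(x) + f(x)\,g'(x)$$
Proof:
$$(fg)'(x) = \lim_{h \to 0} \frac{f(x+h)\,g(x+h) - f(x)\,g(x)}{h}$$
Add and subtract $f(x+h)\,g(x)$ in the numerator:
$$= \lim_{h \to 0} \frac{f(x+h)\,g(x+h) - f(x+h)\,g(x) + f(x+h)\,g(x) - f(x)\,g(x)}{h}$$
Split into two fractions:
$$= \lim_{h \to 0} \left[ f(x+h)\,\frac{g(x+h) - g(x)}{h} + g(x)\,\frac{f(x+h) - f(x)}{h} \right]$$
Apply limit laws:
$$= \lim_{h \to 0} f(x+h) \cdot \lim_{h \to 0} \frac{g(x+h) - g(x)}{h} + g(x) \cdot \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
Since $f$ is differentiable at $x$, it is continuous there, so $\lim_{h \to 0} f(x+h) = f(x)$:
$$= f(x)\,g'(x) + g(x)\,f'(x)$$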
∎
Interactive Exploration
[Figure: Interactive Product Rule Visualizer. For a selected pair of functions, it compares the slope of f(x) · g(x) at a point with the sum of the two product-rule terms, f'g and fg'.]
Worked Examples
Example 1: Polynomial times Polynomial
Find
Solution: Let and
Applying the product rule:
Expanding:
Example 2: Exponential times Polynomial
Find
Solution: Let and
Applying the product rule:
Example 3: Trigonometric times Polynomial
Find
Solution: Let and
Applying the product rule:
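For polynomial products like Example 1, one way to double-check the product rule against the "expand first, then differentiate" route is to store polynomials as coefficient lists. The sketch below uses its own illustrative factors, $f(x) = 1 + 3x^2$ and $g(x) = -2 + x + x^3$; the helper functions are hypothetical names written for this example.

```python
def poly_add(p, q):
    """Add polynomials stored as coefficient lists (index i = coefficient of x**i)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def poly_mul(p, q):
    """Multiply two coefficient lists (convolution of the coefficients)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_deriv(p):
    """Differentiate term by term with the power rule: d/dx a*x**i = i*a*x**(i-1)."""
    return [i * p[i] for i in range(1, len(p))] or [0]


def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


# Illustrative factors (our own choice): f(x) = 1 + 3x^2, g(x) = -2 + x + x^3
f = [1, 0, 3]
g = [-2, 1, 0, 1]

expand_first = trim(poly_deriv(poly_mul(f, g)))
product_rule = trim(poly_add(poly_mul(poly_deriv(f), g), poly_mul(f, poly_deriv(g))))

print(expand_first)   # [1, -12, 12, 0, 15], i.e. 1 - 12x + 12x^2 + 15x^4
print(product_rule)   # same coefficients
```

Integer coefficients make the comparison exact, so any slip in applying $f'g + fg'$ would show up as a mismatch.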
Extensions and Generalizations
Product of Three Functions
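For three functions $u$, $v$, and $w$:
$$(uvw)' = u'vw + uv'w + uvw'$$
Proof idea: Apply the two-function product rule twice, treating $uv$ as a single factor:
$$(uvw)' = \big[(uv)\,w\big]' = (uv)'\,w + (uv)\,w' = (u'v + uv')\,w + uvw' = u'vw + uv'w + uvw'$$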
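A numerical spot check of the three-factor formula, assuming the illustrative factors $u = x^2$, $v = e^x$, and $w = \sin x$ (our choice). Each term of `three_factor_rule` differentiates exactly one factor.

```python
import math


def numerical_derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Illustrative factors (our choice) paired with their derivatives.
def u(x):
    return x ** 2


def du(x):
    return 2 * x


v, dv = math.exp, math.exp    # (e^x)' = e^x
w, dw = math.sin, math.cos    # (sin x)' = cos x


def three_factor_rule(x):
    """(uvw)' = u'vw + uv'w + uvw': one differentiated factor per term."""
    return (du(x) * v(x) * w(x)
            + u(x) * dv(x) * w(x)
            + u(x) * v(x) * dw(x))


def uvw(x):
    return u(x) * v(x) * w(x)


for x in [0.3, 1.0, 1.7]:
    print(f"x = {x}: rule = {three_factor_rule(x):.6f}, "
          f"finite difference = {numerical_derivative(uvw, x):.6f}")
```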
Pattern
For $n$ functions, the derivative has $n$ terms. In each term, exactly one function is differentiated while all others remain unchanged.
General Product Rule
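For $n$ differentiable functions $f_1, f_2, \ldots, f_n$:
$$\frac{d}{dx}\left[\prod_{i=1}^{n} f_i(x)\right] = \sum_{i=1}^{n} f_i'(x) \prod_{j \neq i} f_j(x)$$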
This can be proven by induction using the two-function product rule as the base case.
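The induction argument can be mirrored directly in code: fold the two-function rule across the factors one at a time, and compare the result with the explicit $n$-term sum. This is a sketch at a single point; the function names and the five-factor example ($x \cdot x^2 \cdot e^x \cdot \sin x \cdot \cos x$) are our own choices.

```python
import math


def explicit_sum(values, derivs):
    """General rule: sum over i of f_i' times the product of all other f_j."""
    total = 0.0
    for i, d in enumerate(derivs):
        others = math.prod(v for j, v in enumerate(values) if j != i)
        total += d * others
    return total


def fold_two_factor_rule(values, derivs):
    """Induction step: P_k = P_{k-1} * f_k, so P_k' = P_{k-1}' * f_k + P_{k-1} * f_k'."""
    p, dp = values[0], derivs[0]
    for fk, dfk in zip(values[1:], derivs[1:]):
        p, dp = p * fk, dp * fk + p * dfk
    return dp


# Illustrative five-factor product (our choice): x * x^2 * e^x * sin x * cos x
funcs = [lambda t: t, lambda t: t ** 2, math.exp, math.sin, math.cos]
dfuncs = [lambda t: 1.0, lambda t: 2 * t, math.exp, math.cos, lambda t: -math.sin(t)]

x, h = 0.8, 1e-6
values = [fn(x) for fn in funcs]
derivs = [dfn(x) for dfn in dfuncs]


def product_of_all(t):
    return math.prod(fn(t) for fn in funcs)


numeric = (product_of_all(x + h) - product_of_all(x - h)) / (2 * h)
print(explicit_sum(values, derivs), fold_two_factor_rule(values, derivs), numeric)
```

Folding needs only $n - 1$ applications of the two-factor rule, whereas the explicit sum forms $n$ products of $n - 1$ factors each; both give the same value.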
Machine Learning Applications
The product rule is fundamental to backpropagation, the algorithm used to compute gradients when training neural networks.
Gradient Flow Through Multiplication
In a neural network, many operations involve multiplying quantities together:
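- Weighted inputs: $w \cdot x$ (weight times input)
- Attention scores: $QK^\top$ (queries times keys)
- Gating mechanisms: $f_t \odot c_{t-1}$ in LSTMs (forget gate times previous cell state)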
When computing gradients during backpropagation, the product rule tells us how the gradient flows through these multiplication operations:
Forward: $z = x \cdot y$
Multiply inputs $x$ and $y$ to get output $z$
Backward: $\frac{\partial z}{\partial x} = y$ and $\frac{\partial z}{\partial y} = x$
The Product Rule in Action
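When backpropagating through $z = x \cdot y$, the gradient with respect to $x$ is the upstream gradient times $y$ (the "other" factor), and vice versa. This is the product rule combined with the chain rule: $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z} \cdot y$ and $\frac{\partial L}{\partial y} = \frac{\partial L}{\partial z} \cdot x$.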
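A minimal sketch of a single multiply node, assuming a hypothetical downstream loss $L = z^2$ so that the upstream gradient is $2z$. The function names are illustrative, not from any framework; a finite difference confirms the hand-coded backward rule.

```python
def mul_forward(x, y):
    """Forward pass of a multiply node."""
    return x * y


def mul_backward(x, y, upstream):
    """Backward pass: each input gets the upstream gradient times the *other* input."""
    grad_x = upstream * y     # dL/dx = dL/dz * dz/dx = dL/dz * y
    grad_y = upstream * x     # dL/dy = dL/dz * dz/dy = dL/dz * x
    return grad_x, grad_y


def loss(z):
    """Hypothetical downstream loss L = z**2, so dL/dz = 2z."""
    return z ** 2


x, y = 3.0, -2.0
z = mul_forward(x, y)                               # -6.0
grad_x, grad_y = mul_backward(x, y, upstream=2 * z)
print(grad_x, grad_y)                               # 24.0 -36.0

# Cross-check dL/dx against a finite difference of L(x * y).
h = 1e-6
finite_diff_x = (loss((x + h) * y) - loss((x - h) * y)) / (2 * h)
print(finite_diff_x)                                # close to 24
```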
Example: Attention Mechanism
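In transformer attention, scores are computed as $QK^\top / \sqrt{d_k}$ and then combined with the values:
$$\text{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right) V$$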
This involves multiple matrix multiplications. Backpropagating gradients through this expression requires the product rule at each multiplication step.
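The matrix version of the product rule can be checked on the score computation alone. This sketch ignores the $1/\sqrt{d_k}$ scaling and the softmax to isolate $S = QK^\top$, and assumes a hypothetical upstream gradient $G = \partial L / \partial S$. The rule then gives $\partial L/\partial Q = GK$ and $\partial L/\partial K = G^\top Q$, each factor receiving $G$ times the *other* factor; finite differences confirm both. Plain nested lists keep it dependency-free, and all data are illustrative.

```python
def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


# Illustrative data (our choice): 2 queries and 3 keys, each of dimension 2.
Q = [[1.0, 2.0], [0.5, -1.0]]
K = [[0.3, -0.7], [2.0, 1.0], [-1.0, 0.5]]
G = [[1.0, 0.0, -2.0], [0.5, 1.5, 1.0]]   # hypothetical upstream gradient dL/dS

S = matmul(Q, transpose(K))    # scores, shape 2x3 (scaling and softmax omitted)
dQ = matmul(G, K)              # dL/dQ = G K,    shape 2x2
dK = matmul(transpose(G), Q)   # dL/dK = G^T Q,  shape 3x2


def loss(Qm, Km):
    """Linear loss L = sum_ij G_ij * S_ij, chosen so that dL/dS = G exactly."""
    Sm = matmul(Qm, transpose(Km))
    return sum(G[i][j] * Sm[i][j] for i in range(len(Sm)) for j in range(len(Sm[0])))


def numeric_grad(M, loss_of_M, h=1e-6):
    """Central differences of loss_of_M with respect to every entry of M."""
    grad = []
    for i in range(len(M)):
        row = []
        for j in range(len(M[0])):
            plus = [r[:] for r in M]
            minus = [r[:] for r in M]
            plus[i][j] += h
            minus[i][j] -= h
            row.append((loss_of_M(plus) - loss_of_M(minus)) / (2 * h))
        grad.append(row)
    return grad


num_dQ = numeric_grad(Q, lambda Qm: loss(Qm, K))
num_dK = numeric_grad(K, lambda Km: loss(Q, Km))
print(dQ, num_dQ)
print(dK, num_dK)
```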
Python Implementation
Numerical Verification
Let's verify the product rule numerically by comparing the direct derivative with the formula:
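One possible version of that check, written as a reusable helper. The name `check_product_rule` and the three test pairs are our own choices; for each pair it reports the largest gap between a central-difference derivative of $f \cdot g$ and the formula $f'g + fg'$ over a grid of sample points.

```python
import math


def numerical_derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def check_product_rule(f, df, g, dg, xs):
    """Largest |direct derivative of f*g - (f'g + fg')| over the points xs."""
    worst = 0.0
    for x in xs:
        direct = numerical_derivative(lambda t: f(t) * g(t), x)
        formula = df(x) * g(x) + f(x) * dg(x)
        worst = max(worst, abs(direct - formula))
    return worst


xs = [0.2 + 0.3 * k for k in range(10)]   # sample points from 0.2 to 2.9

# Each entry: (f, f', g, g'), illustrative pairs chosen for this check.
pairs = {
    "x^2 * e^x": (lambda x: x ** 2, lambda x: 2 * x, math.exp, math.exp),
    "sin * cos": (math.sin, math.cos, math.cos, lambda x: -math.sin(x)),
    "sqrt * ln": (math.sqrt, lambda x: 0.5 / math.sqrt(x), math.log, lambda x: 1 / x),
}

for name, (f, df, g, dg) in pairs.items():
    print(f"{name:10s} max gap = {check_product_rule(f, df, g, dg, xs):.2e}")
```

Small nonzero gaps come from finite-difference rounding and truncation, not from the rule itself.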
Product Rule in Backpropagation
Here's how the product rule appears in automatic differentiation:
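A minimal sketch of how a reverse-mode automatic differentiation engine might encode this, assuming scalar values only. The `Value` class and its method names are hypothetical, written for this illustration rather than taken from any particular library; the multiplication node's backward step is where the product rule appears.

```python
class Value:
    """A scalar that remembers how it was computed (hypothetical mini-autodiff)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None   # pushes self.grad to the parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def backward():
            # Product rule: d(ab)/da = b and d(ab)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = backward
        return out

    def backward(self):
        """Visit nodes in reverse topological order, applying each local rule."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x, y = Value(3.0), Value(4.0)
z = x * y + x * x                 # z = xy + x^2
z.backward()
print(z.data, x.grad, y.grad)     # 21.0 10.0 3.0  (dz/dx = y + 2x, dz/dy = x)
```

Because `x` appears in both terms (and twice in `x * x`), its gradient accumulates the contributions 4 + 3 + 3 = 10, which matches $\partial z/\partial x = y + 2x$. Accumulating with `+=` rather than assigning is what makes repeated use of a variable come out right.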
Common Mistakes to Avoid
Mistake 1: Multiplying derivatives
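Wrong: $\frac{d}{dx}\big[f(x)\,g(x)\big] = f'(x)\,g'(x)$
Correct: $\frac{d}{dx}\big[f(x)\,g(x)\big] = f'(x)\,g(x) + f(x)\,g'(x)$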
The derivative of a product is NOT the product of the derivatives!
Mistake 2: Forgetting the second term
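Wrong: $\frac{d}{dx}\big[f(x)\,g(x)\big] = f'(x)\,g(x)$
Correct: $\frac{d}{dx}\big[f(x)\,g(x)\big] = f'(x)\,g(x) + f(x)\,g'(x)$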
Both factors contribute to the rate of change.
Mistake 3: Using product rule when unnecessary
For a constant times a power, such as $c \cdot x^n$, just use the power rule directly: $\frac{d}{dx}\big[c\,x^n\big] = c\,n\,x^{n-1}$
Constants multiplied by functions use the constant multiple rule, not the product rule.
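Mistake 4: Swapping the order of factors in non-commutative products
For matrix products $A(t)\,B(t)$, the product rule gives $\frac{d}{dt}\big[AB\big] = A'B + AB'$, but you must preserve the order of the factors in each term, since matrix multiplication is not commutative (in general $AB' \neq B'A$).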
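To see why the order matters, the sketch below differentiates an illustrative product of 2 × 2 matrix functions, $A(t) = \begin{pmatrix} t & 1 \\ 0 & t^2 \end{pmatrix}$ and $B(t) = \begin{pmatrix} 1 & t \\ t & 0 \end{pmatrix}$ (our own choice), at $t = 2$. It compares a finite-difference derivative of $A(t)B(t)$ with the correctly ordered $A'B + AB'$ and with the order-swapped $A'B + B'A$.

```python
def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


# Illustrative matrix-valued functions of t and their derivatives (our choice).
def A(t):
    return [[t, 1.0], [0.0, t ** 2]]


def dA(t):
    return [[1.0, 0.0], [0.0, 2 * t]]


def B(t):
    return [[1.0, t], [t, 0.0]]


def dB(t):
    return [[0.0, 1.0], [1.0, 0.0]]


def numeric_derivative_of_product(t, h=1e-6):
    """Entrywise central difference of A(t) B(t)."""
    plus, minus = matmul(A(t + h), B(t + h)), matmul(A(t - h), B(t - h))
    return [[(p - m) / (2 * h) for p, m in zip(rp, rm)] for rp, rm in zip(plus, minus)]


t = 2.0
numeric = numeric_derivative_of_product(t)
correct = matadd(matmul(dA(t), B(t)), matmul(A(t), dB(t)))   # A'B + AB'
wrong = matadd(matmul(dA(t), B(t)), matmul(dB(t), A(t)))     # A'B + B'A (swapped)
print(numeric)   # approximately [[2, 4], [12, 0]]
print(correct)   # [[2.0, 4.0], [12.0, 0.0]]
print(wrong)     # [[1.0, 6.0], [10.0, 1.0]]
```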
Test Your Understanding
1. If h(x) = x² · sin(x), what is h'(x) using the product rule?
2. Which of the following is the correct statement of the product rule?
3. Find the derivative of f(x) = e^x · x
4. In the geometric interpretation using a rectangle, what does the 'purple corner' represent?
5. If f(x) = (x + 1)(x - 1), what is f'(x)?
6. For three functions u, v, w, what is the derivative of (uvw)'?
7. In machine learning, the product rule is essential for computing gradients. Why?
8. What is the derivative of f(x) = x · ln(x)?
Summary
The product rule is a fundamental differentiation technique that tells us how to find the derivative of a product of two functions.
Key Formula
$$(fg)' = f'g + fg'$$
Key Concepts
| Concept | Description |
|---|---|
| Geometric intuition | Rate of change of rectangle area = right strip + top strip |
| Formula | (fg)' = f'g + fg' — each factor takes a turn being differentiated |
| Extension to n factors | n terms, each with exactly one differentiated factor |
| In backpropagation | ∂(xy)/∂x = y and ∂(xy)/∂y = x |
| Common mistake | (fg)' ≠ f'g' — never multiply the derivatives! |
Key Takeaways
- The product rule accounts for the fact that both factors contribute to the rate of change of their product
- Geometrically, it comes from the area of a growing rectangle: two strips grow, the corner term vanishes
- For three or more functions, each factor takes its turn being differentiated while the others stay fixed
- The product rule is essential in backpropagation for computing gradients through multiplication operations
- Never confuse $(fg)'$ with $f' \cdot g'$!
Coming Next: In the next section, we'll learn the Quotient Rule for differentiating ratios of functions. Spoiler: it's closely related to the product rule!