Chapter 4

The Quotient Rule

The Derivative - Instantaneous Rate of Change

Learning Objectives

By the end of this section, you will be able to:

  1. State and apply the quotient rule for differentiating quotients of functions
  2. Derive the quotient rule from the product rule and chain rule
  3. Prove the quotient rule from the limit definition of the derivative
  4. Recognize when to use the quotient rule versus simpler alternatives
  5. Connect the quotient rule to normalization operations in neural networks (softmax, layer normalization)
  6. Avoid common mistakes involving sign errors and order of terms

The Big Picture: Differentiating Ratios

"When one changing quantity is divided by another, the rate of change of their ratio depends on both the top and bottom — and the bottom 'fights back' when it changes."

In the previous section, we learned the product rule for differentiating $f(x) \cdot g(x)$. But what if we need to differentiate a quotient of two functions, like $\frac{f(x)}{g(x)}$?

Just as with the product rule, a natural first guess might be that the derivative of a quotient is the quotient of derivatives: $\left(\frac{f}{g}\right)' = \frac{f'}{g'}$. But this is wrong!

Counter-Example: Why (f/g)' ≠ f'/g'

Let $f(x) = x$ and $g(x) = x$. Then $\frac{f(x)}{g(x)} = \frac{x}{x} = 1$ (for $x \neq 0$).

Wrong answer: $\frac{f'(x)}{g'(x)} = \frac{1}{1} = 1$

Correct answer: The derivative of the constant 1 is $0$

Clearly $1 \neq 0$, so the naive guess fails!
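
The counter-example can also be checked numerically. The sketch below is only an illustration: the helper `central_diff` and the sample point `x0 = 3.0` are assumptions introduced here, not part of the lesson's own code.

```python
def central_diff(func, x, step=1e-6):
    """Approximate func'(x) with a central difference."""
    return (func(x + step) - func(x - step)) / (2 * step)

f = lambda x: x
g = lambda x: x
quotient = lambda x: f(x) / g(x)  # equals 1 wherever x != 0

x0 = 3.0  # arbitrary nonzero sample point (assumption for illustration)
naive_guess = central_diff(f, x0) / central_diff(g, x0)  # f'/g'
actual = central_diff(quotient, x0)                      # (f/g)'

print(f"naive f'/g'   = {naive_guess:.6f}")
print(f"actual (f/g)' = {actual:.6f}")
```

The naive guess comes out at 1 while the true derivative of the constant quotient comes out at 0, matching the argument above.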

The correct formula is the quotient rule:

The Quotient Rule

$$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{f'(x) \cdot g(x) - f(x) \cdot g'(x)}{[g(x)]^2}$$

In Leibniz notation: $\frac{d}{dx}\left(\frac{u}{v}\right) = \frac{v\frac{du}{dx} - u\frac{dv}{dx}}{v^2}$

Memory Aid: "Low d-High minus High d-Low"

"Low d-High minus High d-Low, over Low squared."

Low = denominator (g), High = numerator (f), d-High = derivative of numerator, d-Low = derivative of denominator.
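
As a minimal sketch, the formula translates into code almost word for word. The function name `quotient_rule` and its argument names are hypothetical, chosen here for illustration; the caller supplies the four values $f$, $f'$, $g$, $g'$ at a single point.

```python
def quotient_rule(f_val, df_val, g_val, dg_val):
    """Return (f/g)' given f, f', g, g' evaluated at the same point."""
    if g_val == 0:
        raise ZeroDivisionError("quotient rule requires g(x) != 0")
    # "Low d-High minus High d-Low, over Low squared"
    return (g_val * df_val - f_val * dg_val) / g_val**2

# Illustrative choice: f(x) = x^2 and g(x) = x + 1, evaluated at x = 1
x = 1.0
result = quotient_rule(f_val=x**2, df_val=2 * x, g_val=x + 1, dg_val=1.0)
print(result)  # 0.75
```

Here Low = 2, d-High = 2, High = 1, d-Low = 1, so the result is (2·2 - 1·1) / 2² = 0.75.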


Historical Context

The quotient rule, like the product rule, was developed by Isaac Newton and Gottfried Wilhelm Leibniz in the late 17th century. Leibniz's differential notation made the relationship between the product and quotient rules particularly clear.

Leibniz observed that division is just multiplication by a reciprocal:

$$\frac{u}{v} = u \cdot v^{-1}$$

This insight means the quotient rule can be derived from the product rule and chain rule — we'll do this shortly!


Intuitive Understanding: The Denominator "Fights Back"

To understand the quotient rule intuitively, imagine you're computing a ratio like miles per hour (speed).

Let $f(t)$ = miles traveled and $g(t)$ = hours elapsed. Your average speed is $v(t) = f(t)/g(t)$.

How does your speed change over time?

  • When you travel more miles ($f$ increases), your speed goes up
  • When more time passes ($g$ increases), your speed goes down: the denominator "dilutes" the numerator

The quotient rule captures both effects. Notice the minus sign: when the denominator increases, it decreases the ratio. This is why we have $f' \cdot g - f \cdot g'$ in the numerator, not $f' \cdot g + f \cdot g'$.
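
To make the two competing effects concrete, the sketch below splits the quotient rule into the numerator's contribution $f'g/g^2$ and the denominator's contribution $-fg'/g^2$. The trip $f(t) = 30t^2$ miles over $g(t) = t$ hours is a made-up example, and `ratio_rate_parts` is a hypothetical helper name.

```python
def ratio_rate_parts(f_val, df_val, g_val, dg_val):
    """Split (f/g)' into the numerator's push and the denominator's pull."""
    numerator_push = df_val / g_val                # f'g / g^2
    denominator_pull = -f_val * dg_val / g_val**2  # -fg' / g^2
    return numerator_push, denominator_pull

# Hypothetical trip: f(t) = 30 t^2 miles after g(t) = t hours
t = 2.0
push, pull = ratio_rate_parts(f_val=30 * t**2, df_val=60 * t, g_val=t, dg_val=1.0)
total = push + pull
print(push, pull, total)  # 60.0 -30.0 30.0
```

The average speed here is $30t$, whose derivative is 30: the numerator pushes the ratio up by 60 while the growing denominator pulls it down by 30.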

Why the Denominator is Squared

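The $g^2$ in the denominator appears because the denominator enters the change in the quotient twice. Subtracting $\frac{f(x)}{g(x)}$ from $\frac{f(x+k)}{g(x+k)}$ requires the common denominator $g(x+k) \cdot g(x)$:

  • One factor, $g(x)$, comes from the original quotient
  • The other factor, $g(x+k)$, comes from the quotient after the denominator has changed, and it approaches $g(x)$ as $k \to 0$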
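
One way to make this concrete is to look at how the reciprocal $1/g$ responds to a small change in $g$. The symbol $\Delta g$ is introduced here only for illustration:

```latex
\frac{1}{g + \Delta g} - \frac{1}{g}
  = \frac{g - (g + \Delta g)}{g\,(g + \Delta g)}
  = -\frac{\Delta g}{g\,(g + \Delta g)}
  \approx -\frac{\Delta g}{g^2}
  \quad \text{for small } \Delta g
```

The change in $1/g$ is negative when $g$ grows, and it is scaled by $1/g^2$, which is where both the minus sign and the squared denominator come from.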

Deriving the Quotient Rule from the Product Rule

One of the most elegant aspects of the quotient rule is that we can derive it from the product rule. The key insight is:

$$\frac{f}{g} = f \cdot g^{-1}$$

Deriving the Quotient Rule from the Product Rule (interactive demonstration, 7 steps)

Stages: Starting Point, Rewrite as Product, Product Rule, Chain Rule, Algebra, Final Result.

Starting point: $\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right]$. We want to find the derivative of a quotient of two functions.

The Chain Rule Connection

This derivation requires the chain rule to differentiate $g^{-1}$. If you haven't learned the chain rule yet (it's in the next section), don't worry: we'll also prove the quotient rule directly from the limit definition below.
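
Written out in full, the derivation can be sketched as follows. The product rule is applied in the second line, and the chain rule (with the power rule) supplies $\frac{d}{dx}\left[g(x)^{-1}\right] = -g(x)^{-2} \cdot g'(x)$ in the third:

```latex
\begin{align*}
\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right]
  &= \frac{d}{dx}\left[f(x) \cdot g(x)^{-1}\right] \\
  &= f'(x) \cdot g(x)^{-1} + f(x) \cdot \frac{d}{dx}\left[g(x)^{-1}\right] \\
  &= f'(x) \cdot g(x)^{-1} + f(x) \cdot \left(-g(x)^{-2} \cdot g'(x)\right) \\
  &= \frac{f'(x)}{g(x)} - \frac{f(x) \cdot g'(x)}{[g(x)]^2} \\
  &= \frac{f'(x) \cdot g(x) - f(x) \cdot g'(x)}{[g(x)]^2}
\end{align*}
```

The last step multiplies the first fraction by $g(x)/g(x)$ to put both terms over the common denominator $[g(x)]^2$.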


Formal Proof from the Limit Definition

Here's the rigorous proof using the limit definition of the derivative, which doesn't require the chain rule:

Theorem: If $f$ and $g$ are differentiable at $x$ and $g(x) \neq 0$, then $h(x) = \frac{f(x)}{g(x)}$ is differentiable at $x$, and:

$$h'(x) = \frac{f'(x) \cdot g(x) - f(x) \cdot g'(x)}{[g(x)]^2}$$

Proof:

$$h'(x) = \lim_{k \to 0} \frac{\frac{f(x+k)}{g(x+k)} - \frac{f(x)}{g(x)}}{k}$$

Combine fractions in the numerator:

$$= \lim_{k \to 0} \frac{1}{k} \cdot \frac{f(x+k) \cdot g(x) - f(x) \cdot g(x+k)}{g(x+k) \cdot g(x)}$$

Add and subtract $f(x) \cdot g(x)$:

$$= \lim_{k \to 0} \frac{f(x+k)g(x) - f(x)g(x) - f(x)g(x+k) + f(x)g(x)}{k \cdot g(x+k) \cdot g(x)}$$

Factor:

$$= \lim_{k \to 0} \frac{[f(x+k) - f(x)]g(x) - f(x)[g(x+k) - g(x)]}{k \cdot g(x+k) \cdot g(x)}$$

Since $g$ is differentiable at $x$, it is continuous there, so $\lim_{k \to 0} g(x+k) = g(x)$ and the factor $g(x+k) \cdot g(x)$ in the denominator tends to $[g(x)]^2$. Separate the limits:

$$= \frac{1}{[g(x)]^2} \left[ g(x) \lim_{k \to 0} \frac{f(x+k) - f(x)}{k} - f(x) \lim_{k \to 0} \frac{g(x+k) - g(x)}{k} \right]$$

The two remaining limits are $f'(x)$ and $g'(x)$ by definition:

$$= \frac{f'(x) \cdot g(x) - f(x) \cdot g'(x)}{[g(x)]^2}$$
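
The limit in the proof can be watched numerically. The sketch below evaluates the difference quotient for shrinking $k$ and compares it with the quotient rule's value. The pair $f(x) = x^2$, $g(x) = x + 1$ and the point $x = 1$ are illustrative choices, and `difference_quotient` is a hypothetical helper name.

```python
def difference_quotient(h, x, k):
    """The expression inside the limit: [h(x + k) - h(x)] / k."""
    return (h(x + k) - h(x)) / k

f = lambda x: x**2
g = lambda x: x + 1
h = lambda x: f(x) / g(x)

x0 = 1.0
# Quotient rule with f'(x) = 2x and g'(x) = 1
exact = (2 * x0 * g(x0) - f(x0) * 1) / g(x0) ** 2

errors = []
for k in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = difference_quotient(h, x0, k)
    errors.append(abs(approx - exact))
    print(f"k={k:g}: difference quotient={approx:.6f}, error={errors[-1]:.2e}")
```

The error shrinks roughly in proportion to $k$, which is the numerical counterpart of the limit converging to the quotient rule's value of 0.75.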


Interactive Exploration

Interactive Quotient Rule Explorer

The explorer shows how the derivative of a quotient relates to the individual functions and their derivatives. A sample state, with $f(x) = x^2$ and $g(x) = x + 1$:

$$\frac{d}{dx}\left[\frac{x^2}{x+1}\right] = \frac{2x \cdot (x+1) - x^2 \cdot 1}{(x+1)^2}$$

Values at x = 1.00:

  • f(x) = x² = 1.0000
  • g(x) = x + 1 = 2.0000
  • f'(x) = 2x = 2.0000
  • g'(x) = 1 = 1.0000

Quotient Rule Calculation:

  • h(x) = f/g = 0.5000
  • f'g = 4.0000
  • fg' = 1.0000
  • g² = 4.0000
  • h'(x) = (4.00 - 1.00) / 4.00 = 0.7500


Worked Examples

Example 1: Simple Rational Function

Find $\frac{d}{dx}\left(\frac{x}{x+1}\right)$

Solution: Let $f(x) = x$ and $g(x) = x + 1$

  • $f'(x) = 1$
  • $g'(x) = 1$

Applying the quotient rule:

$$\frac{d}{dx}\left(\frac{x}{x+1}\right) = \frac{(1)(x+1) - (x)(1)}{(x+1)^2}$$

Simplifying:

$$= \frac{x + 1 - x}{(x+1)^2} = \frac{1}{(x+1)^2}$$

Example 2: The Derivative of tan(x)

Find $\frac{d}{dx}[\tan(x)]$

Solution: Recall that $\tan(x) = \frac{\sin(x)}{\cos(x)}$

Let $f(x) = \sin(x)$ and $g(x) = \cos(x)$

  • $f'(x) = \cos(x)$
  • $g'(x) = -\sin(x)$

Applying the quotient rule:

$$\frac{d}{dx}[\tan(x)] = \frac{\cos(x) \cdot \cos(x) - \sin(x) \cdot (-\sin(x))}{\cos^2(x)}$$

Simplifying:

$$= \frac{\cos^2(x) + \sin^2(x)}{\cos^2(x)} = \frac{1}{\cos^2(x)} = \sec^2(x)$$

Example 3: Complex Rational Function

Find $\frac{d}{dx}\left(\frac{x^2 + 1}{x^2 - 1}\right)$

Solution: Let $f(x) = x^2 + 1$ and $g(x) = x^2 - 1$

  • $f'(x) = 2x$
  • $g'(x) = 2x$

Applying the quotient rule:

$$\frac{d}{dx}\left(\frac{x^2 + 1}{x^2 - 1}\right) = \frac{(2x)(x^2 - 1) - (x^2 + 1)(2x)}{(x^2 - 1)^2}$$

Expanding:

$$= \frac{2x^3 - 2x - 2x^3 - 2x}{(x^2 - 1)^2} = \frac{-4x}{(x^2 - 1)^2}$$

Example 4: With Exponential Function

Find $\frac{d}{dx}\left(\frac{e^x}{x}\right)$

Solution: Let $f(x) = e^x$ and $g(x) = x$

  • $f'(x) = e^x$
  • $g'(x) = 1$

Applying the quotient rule:

$$\frac{d}{dx}\left(\frac{e^x}{x}\right) = \frac{e^x \cdot x - e^x \cdot 1}{x^2} = \frac{e^x(x - 1)}{x^2}$$
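
The four closed forms above can be spot-checked against a numerical derivative. This is a sketch using only the standard library; the helper `central_diff`, the `examples` dictionary, and the sample points are assumptions made here for illustration.

```python
import math

def central_diff(func, x, step=1e-6):
    """Approximate func'(x) with a central difference."""
    return (func(x + step) - func(x - step)) / (2 * step)

# name -> (quotient, closed-form derivative from the worked example, sample point)
examples = {
    "x/(x+1)": (lambda x: x / (x + 1),
                lambda x: 1 / (x + 1) ** 2, 2.0),
    "tan(x)": (lambda x: math.sin(x) / math.cos(x),
               lambda x: 1 / math.cos(x) ** 2, 0.5),
    "(x^2+1)/(x^2-1)": (lambda x: (x**2 + 1) / (x**2 - 1),
                        lambda x: -4 * x / (x**2 - 1) ** 2, 2.0),
    "e^x/x": (lambda x: math.exp(x) / x,
              lambda x: math.exp(x) * (x - 1) / x**2, 2.0),
}

def max_error():
    """Largest gap between the numerical and closed-form derivatives."""
    return max(abs(central_diff(h, x0) - dh(x0)) for h, dh, x0 in examples.values())

for name, (h, dh, x0) in examples.items():
    print(f"{name}: numerical={central_diff(h, x0):.6f}, closed form={dh(x0):.6f}")
```

Agreement at a single sample point is not a proof, but a mismatch would immediately flag an algebra slip such as a dropped sign.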


Special Cases and Shortcuts

When the Numerator is Constant

For $h(x) = \frac{c}{g(x)}$ where $c$ is a constant:

$$\frac{d}{dx}\left(\frac{c}{g(x)}\right) = -\frac{c \cdot g'(x)}{[g(x)]^2}$$

Alternative: Rewrite as $c \cdot g(x)^{-1}$ and use the power rule with the chain rule:

$$\frac{d}{dx}\left(\frac{c}{g(x)}\right) = c \cdot (-1) \cdot g(x)^{-2} \cdot g'(x) = -\frac{c \cdot g'(x)}{[g(x)]^2}$$
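
As a concrete instance of the constant-numerator shortcut, take $c = 3$ and $g(x) = x^2 + 1$ (values chosen here purely for illustration), so that $g'(x) = 2x$:

```latex
\frac{d}{dx}\left(\frac{3}{x^2 + 1}\right)
  = -\frac{3 \cdot 2x}{(x^2 + 1)^2}
  = -\frac{6x}{(x^2 + 1)^2}
```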

When to Avoid the Quotient Rule

Sometimes rewriting the expression makes differentiation easier:

| Instead of | Rewrite as | Then use |
| --- | --- | --- |
| 5/x³ | 5x⁻³ | Power Rule |
| 1/xⁿ | x⁻ⁿ | Power Rule |
| x²/2 | (1/2)x² | Constant Multiple Rule |
| (x+1)/x | 1 + 1/x = 1 + x⁻¹ | Sum + Power Rule |
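
As a check on the last row of the table, both routes lead to the same derivative. The first line rewrites and uses the sum and power rules; the second applies the quotient rule directly with $f(x) = x + 1$ and $g(x) = x$:

```latex
\begin{align*}
\frac{d}{dx}\left(1 + x^{-1}\right) &= 0 + (-1)\,x^{-2} = -\frac{1}{x^2} \\
\frac{d}{dx}\left(\frac{x+1}{x}\right) &= \frac{(1)(x) - (x+1)(1)}{x^2} = \frac{x - x - 1}{x^2} = -\frac{1}{x^2}
\end{align*}
```

The rewritten form reaches the answer with less algebra, which is the reason to prefer it when it is available.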

When to Use the Quotient Rule

Use the quotient rule when both the numerator and denominator are non-constant functions that can't be simplified. If only the denominator contains $x$, consider rewriting with negative exponents.


Machine Learning Applications

The quotient rule appears frequently in machine learning whenever we work with normalized or ratio-based quantities.

Softmax Function

The softmax function converts logits into probabilities:

$$\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}$$

This is a quotient! Writing $p_i = \text{softmax}(z)_i$, the quotient rule gives the gradients needed for backpropagation:

Diagonal: $\frac{\partial p_i}{\partial z_i} = p_i(1 - p_i)$

Off-diagonal ($j \neq i$): $\frac{\partial p_i}{\partial z_j} = -p_i \cdot p_j$
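
Both results follow from a direct application of the quotient rule. In the sketch below, $S = \sum_{k=1}^{n} e^{z_k}$ is shorthand for the denominator (a symbol introduced here for brevity) and $p_i = e^{z_i}/S$ is the $i$-th softmax output. Note that $\partial S / \partial z_j = e^{z_j}$ for every $j$:

```latex
\begin{align*}
\frac{\partial p_i}{\partial z_i}
  &= \frac{e^{z_i} \cdot S - e^{z_i} \cdot e^{z_i}}{S^2}
   = \frac{e^{z_i}}{S} - \left(\frac{e^{z_i}}{S}\right)^2
   = p_i (1 - p_i) \\
\frac{\partial p_i}{\partial z_j}
  &= \frac{0 \cdot S - e^{z_i} \cdot e^{z_j}}{S^2}
   = -\frac{e^{z_i}}{S} \cdot \frac{e^{z_j}}{S}
   = -p_i \cdot p_j \qquad (j \neq i)
\end{align*}
```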

Attention Mechanism

In transformer attention, the attention weights are computed as:

$$\alpha_{ij} = \frac{\exp(s_{ij})}{\sum_k \exp(s_{ik})}$$

This is exactly the softmax function! Backpropagating through attention requires the quotient rule to compute gradients with respect to the scores $s_{ij}$.

Layer Normalization

Layer normalization involves dividing by the standard deviation:

$$\hat{x}_i = \frac{x_i - \mu}{\sigma}$$

When $\sigma$ depends on $x_i$ (as it does in practice), computing the gradient requires the quotient rule.
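
A rough sketch of what this looks like in practice is given below. It makes simplifying assumptions that are not in the text above: $\sigma$ is the population standard deviation of the inputs, and there is no stabilizing epsilon or learned scale and shift. All function names are hypothetical. Under those assumptions, $\partial(x_i - \mu)/\partial x_j = \delta_{ij} - 1/n$ and $\partial\sigma/\partial x_j = (x_j - \mu)/(n\sigma)$, and the quotient rule combines them.

```python
import math

def layer_norm(x):
    """Normalize x to zero mean and unit (population) standard deviation."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [(v - mu) / sigma for v in x]

def layer_norm_jacobian(x):
    """Analytic d(x_hat_i)/d(x_j) obtained from the quotient rule."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    jac = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d_num = (1.0 if i == j else 0.0) - 1.0 / n  # d(x_i - mu)/dx_j
            d_sigma = (x[j] - mu) / (n * sigma)          # d(sigma)/dx_j
            # Quotient rule: (num' * sigma - num * sigma') / sigma^2
            jac[i][j] = (d_num * sigma - (x[i] - mu) * d_sigma) / sigma**2
    return jac

def numerical_jacobian(x, step=1e-6):
    """Central-difference Jacobian of layer_norm, for comparison."""
    n = len(x)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up = list(x)
        down = list(x)
        up[j] += step
        down[j] -= step
        y_up, y_down = layer_norm(up), layer_norm(down)
        for i in range(n):
            jac[i][j] = (y_up[i] - y_down[i]) / (2 * step)
    return jac

x = [1.0, 2.0, 4.0]  # arbitrary sample input
analytic = layer_norm_jacobian(x)
numeric = numerical_jacobian(x)
max_gap = max(abs(analytic[i][j] - numeric[i][j]) for i in range(3) for j in range(3))
print(f"largest gap between analytic and numerical entries: {max_gap:.2e}")
```

Real layer normalization implementations differ in the details, so this is meant only to show where the quotient rule enters, not to mirror any framework's backward pass.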

Why This Matters for ML

Deep learning frameworks like PyTorch and TensorFlow automatically apply the quotient rule through automatic differentiation. But understanding the rule helps you:

  • Debug gradient issues in custom layers
  • Understand numerical stability concerns
  • Implement efficient backward passes
  • Reason about gradient flow through normalization layers

Python Implementation

Numerical Verification

Let's verify the quotient rule numerically:

Verifying the Quotient Rule Numerically (quotient_rule_verify.py)

Notes on the code:

  • Quotient Rule Verification: The function numerically verifies the quotient rule by computing h'(x) two ways: by differentiating h directly, and by using the formula (f'g - fg') / g².
  • Numerical Derivatives: We use the central difference formula for better accuracy: f'(x) ≈ [f(x+h) - f(x-h)] / (2h). This is O(h²) accurate.
  • Direct Numerical Derivative: We compute the derivative of h(x) = f(x)/g(x) directly with the same central difference, without applying the quotient rule.
  • Quotient Rule Formula: The quotient rule gives us h'(x) = (f'g - fg') / g². This should match the numerical derivative if the rule is correct.
  • Testing x²/(x+1): For h(x) = x²/(x+1), we have f'(x) = 2x and g'(x) = 1, so h'(x) = [2x(x+1) - x²(1)] / (x+1)² = (x² + 2x) / (x+1)².
  • Derivative of tan(x): This verifies that d/dx[tan(x)] = sec²(x), a classic result derived using the quotient rule on sin(x)/cos(x).

```python
import numpy as np

def quotient_rule_numerical(f, g, x, h=1e-7):
    """
    Verify the quotient rule numerically.

    For h(x) = f(x) / g(x), the quotient rule states:
    h'(x) = (f'(x) * g(x) - f(x) * g'(x)) / [g(x)]^2

    We'll compute both sides and compare.
    Note: the parameter h here is the finite-difference step size.
    """
    # Numerical derivatives using central difference
    f_prime = (f(x + h) - f(x - h)) / (2 * h)
    g_prime = (g(x + h) - g(x - h)) / (2 * h)

    # Quotient function
    h_func = lambda t: f(t) / g(t)
    h_prime_numerical = (h_func(x + h) - h_func(x - h)) / (2 * h)

    # Quotient rule formula
    g_x = g(x)
    f_x = f(x)
    h_prime_formula = (f_prime * g_x - f_x * g_prime) / (g_x ** 2)

    return {
        "f(x)": f_x,
        "g(x)": g_x,
        "f'(x)": f_prime,
        "g'(x)": g_prime,
        "h'(x) numerical": h_prime_numerical,
        "h'(x) formula": h_prime_formula,
        "difference": abs(h_prime_numerical - h_prime_formula)
    }

# Example 1: f(x) = x^2, g(x) = x + 1 => h(x) = x^2 / (x + 1)
f1 = lambda x: x**2
g1 = lambda x: x + 1

result1 = quotient_rule_numerical(f1, g1, x=2.0)
print("Example: h(x) = x^2 / (x + 1) at x = 2")
for key, value in result1.items():
    print(f"  {key}: {value:.6f}")
print()

# Example 2: tan(x) = sin(x) / cos(x)
f2 = lambda x: np.sin(x)
g2 = lambda x: np.cos(x)

result2 = quotient_rule_numerical(f2, g2, x=np.pi/4)
print("Example: h(x) = sin(x) / cos(x) = tan(x) at x = pi/4")
for key, value in result2.items():
    print(f"  {key}: {value:.6f}")

# At x = pi/4: tan'(x) = sec^2(x) = 1/cos^2(x) = 2
print(f"Expected sec^2(pi/4) = {1 / np.cos(np.pi/4)**2:.6f}")
```

Softmax Jacobian: Quotient Rule in Action

Here's how the quotient rule appears when computing the Jacobian of the softmax function:

Softmax Jacobian via Quotient Rule (softmax_jacobian.py)

Notes on the code:

  • Softmax as a Quotient: Each softmax output is a quotient: exp(z_i) divided by the sum of all exp(z_j). Differentiating this requires the quotient rule.
  • Softmax Jacobian: The Jacobian captures how each output p_i changes with respect to each input z_j. This is where the quotient rule becomes essential.
  • Diagonal Elements (i = j): When differentiating p_i w.r.t. z_i, the quotient rule gives: (e^{z_i} · S - e^{z_i} · e^{z_i}) / S² = p_i(1 - p_i), where S is the sum of all exp(z_j).
  • Off-Diagonal Elements (i ≠ j): When differentiating p_i w.r.t. z_j (where j ≠ i), the numerator derivative is 0, giving: -e^{z_i} · e^{z_j} / S² = -p_i · p_j.
  • Row Sums are Zero: The Jacobian rows sum to zero because probabilities must sum to 1. If one probability goes up, others must go down!

```python
import numpy as np

def softmax(z):
    """
    Softmax function: transforms logits into probabilities.
    softmax(z)_i = exp(z_i) / sum(exp(z_j))

    This is a quotient! The quotient rule appears in its derivative.
    """
    exp_z = np.exp(z - np.max(z))  # Numerical stability
    return exp_z / np.sum(exp_z)

def softmax_jacobian(z):
    """
    Compute the Jacobian of softmax: d(softmax_i)/d(z_j)

    Using the quotient rule:
    - When i = j: d(p_i)/d(z_i) = p_i(1 - p_i)
    - When i != j: d(p_i)/d(z_j) = -p_i * p_j

    This gives: Jacobian = diag(p) - outer(p, p)
    """
    p = softmax(z)
    n = len(z)
    jacobian = np.zeros((n, n))

    for i in range(n):
        for j in range(n):
            if i == j:
                # Quotient rule: (e^z_i * S - e^z_i * e^z_i) / S^2
                # = p_i - p_i^2 = p_i(1 - p_i)
                jacobian[i, j] = p[i] * (1 - p[i])
            else:
                # Quotient rule: (0 * S - e^z_i * e^z_j) / S^2
                # = -p_i * p_j
                jacobian[i, j] = -p[i] * p[j]

    return jacobian

# Example with 3 classes
z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
J = softmax_jacobian(z)

print("Logits z:", z)
print("Softmax p:", p.round(4))
print("Sum of probabilities:", p.sum())  # Should be 1
print()
print("Jacobian matrix:")
print(J.round(4))
print()
print("Row sums (should be 0):", J.sum(axis=1).round(10))

# The quotient rule tells us how changing one logit
# affects all the probabilities!
```
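
One way to gain confidence in the closed-form Jacobian is to compare it against finite differences. The sketch below is a separate, standard-library-only check rather than part of the lesson's code; `softmax_list`, `analytic_jacobian`, and `finite_difference_jacobian` are hypothetical names, and the step size is an assumption.

```python
import math

def softmax_list(z):
    """Softmax over a plain Python list, shifted by max(z) for stability."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def analytic_jacobian(z):
    """J[i][j] = p_i * (delta_ij - p_j), the quotient-rule result."""
    p = softmax_list(z)
    n = len(z)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
            for i in range(n)]

def finite_difference_jacobian(z, step=1e-6):
    """Central-difference estimate of d(softmax_i)/d(z_j)."""
    n = len(z)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up = list(z)
        down = list(z)
        up[j] += step
        down[j] -= step
        p_up, p_down = softmax_list(up), softmax_list(down)
        for i in range(n):
            jac[i][j] = (p_up[i] - p_down[i]) / (2 * step)
    return jac

z = [2.0, 1.0, 0.1]  # same logits as the example above
J_exact = analytic_jacobian(z)
J_approx = finite_difference_jacobian(z)
worst = max(abs(J_exact[i][j] - J_approx[i][j]) for i in range(3) for j in range(3))
print(f"largest entry-wise gap: {worst:.2e}")
```

The same comparison is a common way to debug a hand-written backward pass: if the analytic and numerical Jacobians disagree by more than rounding error, the derivative is likely wrong.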

Common Mistakes to Avoid

Mistake 1: Wrong order in the numerator

Wrong: $\left(\frac{f}{g}\right)' = \frac{f \cdot g' - f' \cdot g}{g^2}$

Correct: $\left(\frac{f}{g}\right)' = \frac{f' \cdot g - f \cdot g'}{g^2}$

Remember: "Low d-High minus High d-Low" — derivative of the top comes first!

Mistake 2: Dividing derivatives

Wrong: $\left(\frac{f}{g}\right)' = \frac{f'}{g'}$

Correct: $\left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2}$

The derivative of a quotient is NOT the quotient of derivatives!

Mistake 3: Forgetting to square the denominator

Wrong: $\left(\frac{x^2}{x+1}\right)' = \frac{2x(x+1) - x^2}{x+1}$

Correct: $\left(\frac{x^2}{x+1}\right)' = \frac{2x(x+1) - x^2}{(x+1)^2}$

The denominator must be squared!

Mistake 4: Using quotient rule when unnecessary

For $\frac{5}{x^3}$, rewrite as $5x^{-3}$ and use the power rule, so the derivative is $-15x^{-4}$.

When the numerator is constant, consider rewriting with negative exponents for simpler computation.

Mistake 5: Sign errors with negative derivatives

When $g'(x) = -\sin(x)$, remember that subtracting $f \cdot g'$ means subtracting a negative, which adds:

$$f' \cdot g - f \cdot (-\sin(x)) = f'g + f\sin(x)$$

Be careful with double negatives!


Test Your Understanding

What is the derivative of $f(x) = \frac{x}{x+1}$?

$$\frac{d}{dx}\left[\frac{x}{x+1}\right] = \frac{1 \cdot (x+1) - x \cdot 1}{(x+1)^2} = \frac{1}{(x+1)^2}$$

Summary

The quotient rule tells us how to differentiate ratios of functions. Unlike the naive guess $f'/g'$, the correct formula accounts for both the changing numerator and the "fighting back" of the denominator.

Key Formula

$$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{f'(x) \cdot g(x) - f(x) \cdot g'(x)}{[g(x)]^2}$$

Key Concepts

| Concept | Description |
| --- | --- |
| Memory aid | "Low d-High minus High d-Low, over Low squared" |
| Derivation | Can be derived from product rule using f/g = f · g⁻¹ |
| tan(x) derivative | d/dx[tan(x)] = sec²(x), proven via quotient rule |
| ML connection | Softmax, attention weights, layer norm all involve quotients |
| Avoid when | Numerator is constant → use power rule with negative exponent |
| Common error | Wrong order (f'g - fg', not fg' - f'g) and forgetting g² |

Key Takeaways

  1. The quotient rule is not symmetric; the order matters: $f'g - fg'$, not $fg' - f'g$
  2. It can be derived from the product rule by writing $f/g = f \cdot g^{-1}$
  3. The squared denominator comes from combining the two fractions over the common denominator $g(x+k) \cdot g(x)$, which tends to $[g(x)]^2$ as $k \to 0$
  4. The quotient rule is essential for computing gradients through softmax and other normalization operations in ML
  5. When the numerator is constant, rewrite with negative exponents for easier differentiation
The Quotient Rule in One Sentence:
"Low d-High minus High d-Low, over the square of what's below."
Coming Next: In the next section, we'll learn the Chain Rule — how to differentiate compositions of functions. This is perhaps the most powerful differentiation rule and is the foundation of backpropagation in neural networks!