Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

State and apply the quotient rule for differentiating quotients of functions
Derive the quotient rule from the product rule and chain rule
Prove the quotient rule from the limit definition of the derivative
Recognize when to use the quotient rule versus simpler alternatives
Connect the quotient rule to normalization operations in neural networks (softmax, layer normalization)
Avoid common mistakes involving sign errors and order of terms

The Big Picture: Differentiating Ratios

"When one changing quantity is divided by another, the rate of change of their ratio depends on both the top and bottom — and the bottom 'fights back' when it changes."

In the previous section, we learned the product rule for differentiating $f(x) \cdot g(x)$. But what if we need to differentiate a quotient of two functions, like $\frac{f(x)}{g(x)}$?

Just as with the product rule, a natural first guess might be that the derivative of a quotient is the quotient of derivatives: $\left(\frac{f}{g}\right)' = \frac{f'}{g'}$. But this is wrong!

Counter-Example: Why (f/g)' ≠ f'/g'

Let $f(x) = x$ and $g(x) = x$. Then $\frac{f(x)}{g(x)} = \frac{x}{x} = 1$ (for $x \neq 0$).

Wrong answer: $\frac{f'(x)}{g'(x)} = \frac{1}{1} = 1$

Correct answer: The derivative of the constant 1 is $0$

Clearly $1 \neq 0$, so the naive guess fails!

The correct formula is the quotient rule:

The Quotient Rule

$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{f'(x) \cdot g(x) - f(x) \cdot g'(x)}{[g(x)]^2}$

In Leibniz notation: $\frac{d}{dx}\left(\frac{u}{v}\right) = \frac{v\frac{du}{dx} - u\frac{dv}{dx}}{v^2}$
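
As a quick sanity check, the rule can be applied to the counter-example $f(x) = g(x) = x$ from above. The two terms in the numerator cancel, which recovers the correct derivative of the constant $1$ (for $x \neq 0$):

```latex
\frac{d}{dx}\left[\frac{x}{x}\right]
  = \frac{(1)(x) - (x)(1)}{x^2}
  = \frac{0}{x^2}
  = 0
```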

Memory Aid: "Low d-High minus High d-Low"

"Low d-High minus High d-Low, over Low squared."

Low = denominator (g), High = numerator (f), d-High = derivative of numerator, d-Low = derivative of denominator.

Historical Context

The quotient rule, like the product rule, was developed by Isaac Newton and Gottfried Wilhelm Leibniz in the late 17th century. Leibniz's differential notation made the relationship between the product and quotient rules particularly clear.

Leibniz observed that division is just multiplication by a reciprocal:

$\frac{u}{v} = u \cdot v^{-1}$

This insight means the quotient rule can be derived from the product rule and chain rule — we'll do this shortly!

Intuitive Understanding: The Denominator "Fights Back"

To understand the quotient rule intuitively, imagine you're computing a ratio like miles per hour (speed).

Let $f(t)$ = miles traveled and $g(t)$ = hours elapsed. Your average speed is $v(t) = f(t)/g(t)$.

How does your average speed change over time?

When you travel more miles ($f$ increases) in the same time, your average speed goes up
When more time passes ($g$ increases) for the same miles, your average speed goes down: the denominator "dilutes" the numerator

The quotient rule captures both effects. Notice the minus sign: when the denominator increases, it decreases the ratio (assuming $f$ and $g$ are positive, as they are here). This is why we have $f' \cdot g - f \cdot g'$ in the numerator, not $f' \cdot g + f \cdot g'$.
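
To make the speed example concrete, here is a short worked equation. It assumes the clock starts at zero, so that $g(t) = t$ and $g'(t) = 1$ for $t > 0$; that choice is an illustrative assumption rather than something fixed by the text above.

```latex
v'(t) = \frac{f'(t) \cdot t - f(t) \cdot 1}{t^2}
      = \frac{f'(t) - \frac{f(t)}{t}}{t}
      = \frac{f'(t) - v(t)}{t}
```

Read this way, the average speed rises exactly when the instantaneous speed $f'(t)$ exceeds the current average $v(t)$, and falls when it is lower.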

Why the Denominator is Squared

The $g^2$ in the denominator appears once both effects are placed over a common denominator:

The numerator's contribution is $f'/g = f'g/g^2$
The denominator's contribution is $f \cdot (1/g)' = -fg'/g^2$, because the reciprocal $1/g$ changes at the rate $-g'/g^2$

Deriving the Quotient Rule from the Product Rule

One of the most elegant aspects of the quotient rule is that we can derive it from the product rule. The key insight is:

$\frac{f}{g} = f \cdot g^{-1}$

Starting from $\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right]$, the derivation proceeds in stages: rewrite the quotient as a product, apply the product rule, apply the chain rule to $g^{-1}$, simplify the algebra, and read off the final result.
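
The derivation can be written out in full as the following sketch. It uses the product rule from the previous section and the chain rule (applied to $g^{-1}$), and it assumes $g(x) \neq 0$:

```latex
\frac{d}{dx}\left[\frac{f}{g}\right]
  = \frac{d}{dx}\left[f \cdot g^{-1}\right]                      % rewrite as a product
  = f' \cdot g^{-1} + f \cdot \frac{d}{dx}\left[g^{-1}\right]    % product rule
  = f' \cdot g^{-1} + f \cdot \left(-g^{-2} \cdot g'\right)      % chain rule on g^{-1}
  = \frac{f'}{g} - \frac{f \cdot g'}{g^2}                        % algebra
  = \frac{f' \cdot g - f \cdot g'}{g^2}                          % common denominator
```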

The Chain Rule Connection

This derivation requires the chain rule to differentiate $g^{-1}$. If you haven't learned the chain rule yet (it's in the next section), don't worry: we'll also prove the quotient rule directly from the limit definition below.

Formal Proof from the Limit Definition

Here's the rigorous proof using the limit definition of the derivative, which doesn't require the chain rule:

Theorem: If $f$ and $g$ are differentiable at $x$ and $g(x) \neq 0$, then $h(x) = \frac{f(x)}{g(x)}$ is differentiable at $x$, and:

$h'(x) = \frac{f'(x) \cdot g(x) - f(x) \cdot g'(x)}{[g(x)]^2}$

Proof:

$h'(x) = \lim_{k \to 0} \frac{\frac{f(x+k)}{g(x+k)} - \frac{f(x)}{g(x)}}{k}$

Combine fractions in the numerator:

$= \lim_{k \to 0} \frac{1}{k} \cdot \frac{f(x+k) \cdot g(x) - f(x) \cdot g(x+k)}{g(x+k) \cdot g(x)}$

Add and subtract $f(x) \cdot g(x)$:

$= \lim_{k \to 0} \frac{f(x+k)g(x) - f(x)g(x) - f(x)g(x+k) + f(x)g(x)}{k \cdot g(x+k) \cdot g(x)}$

Factor:

$= \lim_{k \to 0} \frac{[f(x+k) - f(x)]g(x) - f(x)[g(x+k) - g(x)]}{k \cdot g(x+k) \cdot g(x)}$

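Separate the limits. Since $g$ is differentiable at $x$, it is continuous there, so $\lim_{k \to 0} g(x+k) = g(x)$ and the factor $\frac{1}{g(x+k) \cdot g(x)}$ tends to $\frac{1}{[g(x)]^2}$:

$= \frac{1}{[g(x)]^2} \left[ g(x) \lim_{k \to 0} \frac{f(x+k) - f(x)}{k} - f(x) \lim_{k \to 0} \frac{g(x+k) - g(x)}{k} \right]$

The two remaining limits are the definitions of $f'(x)$ and $g'(x)$: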

$= \frac{f'(x) \cdot g(x) - f(x) \cdot g'(x)}{[g(x)]^2}$ ∎
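
The limit in the proof can also be watched numerically. The sketch below picks an arbitrary pair, $f(x) = \sin(x)$ and $g(x) = x^2 + 1$ (illustrative choices, not taken from the text), and compares the difference quotient for shrinking $k$ against the quotient rule value. The function names are hypothetical helpers.

```python
import math

def f(x):
    return math.sin(x)

def g(x):
    return x * x + 1.0  # never zero, so f/g is defined everywhere

def quotient_rule(x):
    """h'(x) from the quotient rule, using hand-computed f' and g'."""
    f_prime = math.cos(x)
    g_prime = 2.0 * x
    return (f_prime * g(x) - f(x) * g_prime) / g(x) ** 2

def difference_quotient(x, k):
    """The expression inside the limit: [h(x+k) - h(x)] / k."""
    return (f(x + k) / g(x + k) - f(x) / g(x)) / k

x0 = 1.0
exact = quotient_rule(x0)
errors = []
for k in (1e-1, 1e-2, 1e-3, 1e-4):
    err = abs(difference_quotient(x0, k) - exact)
    errors.append(err)
    print(f"k = {k:.0e}   |difference quotient - formula| = {err:.2e}")
```

The gap should shrink roughly in proportion to $k$, which is the behavior the limit definition predicts for a one-sided difference quotient.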

Interactive Exploration

The snapshot below evaluates the quotient rule for one function pair, $f(x) = x^2$ and $g(x) = x + 1$, at $x = 1$. It shows how the derivative of the quotient is assembled from the individual functions and their derivatives.

Quotient Rule:

$\frac{d}{dx}\left[\frac{x^2}{x+1}\right] = \frac{2x \cdot (x+1) - x^2 \cdot 1}{(x+1)^2}$

Values at x = 1.00:

f(x) = x² = 1.0000
g(x) = x + 1 = 2.0000
f'(x) = 2x = 2.0000
g'(x) = 1 = 1.0000

Quotient Rule Calculation:

h(x) = f/g = 0.5000
f'g = 4.0000
fg' = 1.0000
g² = 4.0000
h'(x) = (4.00 - 1.00) / 4.00 = 0.7500

Worked Examples

Example 1: Simple Rational Function

Find $\frac{d}{dx}\left(\frac{x}{x+1}\right)$

Solution: Let $f(x) = x$ and $g(x) = x + 1$

$f'(x) = 1$
$g'(x) = 1$

Applying the quotient rule:

$\frac{d}{dx}\left(\frac{x}{x+1}\right) = \frac{(1)(x+1) - (x)(1)}{(x+1)^2}$

Simplifying:

$= \frac{x + 1 - x}{(x+1)^2} = \frac{1}{(x+1)^2}$

Example 2: The Derivative of tan(x)

Find $\frac{d}{dx}[\tan(x)]$

Solution: Recall that $\tan(x) = \frac{\sin(x)}{\cos(x)}$

Let $f(x) = \sin(x)$ and $g(x) = \cos(x)$

$f'(x) = \cos(x)$
$g'(x) = -\sin(x)$

Applying the quotient rule:

$\frac{d}{dx}[\tan(x)] = \frac{\cos(x) \cdot \cos(x) - \sin(x) \cdot (-\sin(x))}{\cos^2(x)}$

Simplifying:

$= \frac{\cos^2(x) + \sin^2(x)}{\cos^2(x)} = \frac{1}{\cos^2(x)}$

$= \sec^2(x)$

Example 3: Complex Rational Function

Find $\frac{d}{dx}\left(\frac{x^2 + 1}{x^2 - 1}\right)$

Solution: Let $f(x) = x^2 + 1$ and $g(x) = x^2 - 1$

$f'(x) = 2x$
$g'(x) = 2x$

Applying the quotient rule:

$= \frac{(2x)(x^2 - 1) - (x^2 + 1)(2x)}{(x^2 - 1)^2}$

Expanding:

$= \frac{2x^3 - 2x - 2x^3 - 2x}{(x^2 - 1)^2}$

$= \frac{-4x}{(x^2 - 1)^2}$

Example 4: With Exponential Function

Find $\frac{d}{dx}\left(\frac{e^x}{x}\right)$

Solution: Let $f(x) = e^x$ and $g(x) = x$

$f'(x) = e^x$
$g'(x) = 1$

Applying the quotient rule:

$= \frac{e^x \cdot x - e^x \cdot 1}{x^2}$

$= \frac{e^x(x - 1)}{x^2}$
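
The four results above can be spot-checked numerically. The following is a minimal sketch using only the standard library; the helper `central_diff`, the step size, and the test points are illustrative assumptions, chosen so that no denominator is zero.

```python
import math

def central_diff(func, x, step=1e-5):
    """Approximate func'(x) with a central difference."""
    return (func(x + step) - func(x - step)) / (2 * step)

# (label, original function, derivative from the worked example, test point)
examples = [
    ("x/(x+1)", lambda x: x / (x + 1),
     lambda x: 1 / (x + 1) ** 2, 2.0),
    ("tan(x)", math.tan,
     lambda x: 1 / math.cos(x) ** 2, 0.7),
    ("(x^2+1)/(x^2-1)", lambda x: (x**2 + 1) / (x**2 - 1),
     lambda x: -4 * x / (x**2 - 1) ** 2, 2.0),
    ("e^x/x", lambda x: math.exp(x) / x,
     lambda x: math.exp(x) * (x - 1) / x**2, 2.0),
]

max_gap = 0.0
for label, func, claimed, x0 in examples:
    gap = abs(central_diff(func, x0) - claimed(x0))
    max_gap = max(max_gap, gap)
    print(f"{label:>18}: |numerical - formula| = {gap:.2e}")
```

A small gap at one point does not prove a formula, but a large one is a reliable sign of an algebra slip.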

Special Cases and Shortcuts

When the Numerator is Constant

For $h(x) = \frac{c}{g(x)}$ where $c$ is a constant:

$\frac{d}{dx}\left(\frac{c}{g(x)}\right) = -\frac{c \cdot g'(x)}{[g(x)]^2}$

Alternative: Rewrite as $c \cdot g(x)^{-1}$ and use the power rule with the chain rule:

$\frac{d}{dx}\left(\frac{c}{g(x)}\right) = c \cdot (-1) \cdot g(x)^{-2} \cdot g'(x) = -\frac{c \cdot g'(x)}{[g(x)]^2}$
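
For instance, taking $c = 3$ and $g(x) = x^2 + 1$ (values chosen here purely for illustration), the shortcut gives the derivative in one step:

```latex
\frac{d}{dx}\left(\frac{3}{x^2 + 1}\right)
  = -\frac{3 \cdot 2x}{(x^2 + 1)^2}
  = -\frac{6x}{(x^2 + 1)^2}
```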

When to Avoid the Quotient Rule

Sometimes rewriting the expression makes differentiation easier:

Instead of	Rewrite as	Then use
5/x³	5x⁻³	Power Rule
1/xⁿ	x⁻ⁿ	Power Rule
x²/2	(1/2)x²	Constant Multiple Rule
(x+1)/x	1 + 1/x = 1 + x⁻¹	Sum + Power Rule
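
To see that rewriting changes the effort but not the answer, the last row of the table can be worked both ways (for $x \neq 0$):

```latex
% Quotient rule
\frac{d}{dx}\left(\frac{x+1}{x}\right)
  = \frac{(1)(x) - (x+1)(1)}{x^2}
  = \frac{-1}{x^2}

% Rewrite first, then sum and power rules
\frac{d}{dx}\left(1 + x^{-1}\right)
  = 0 + (-1)\,x^{-2}
  = -\frac{1}{x^2}
```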

When to Use the Quotient Rule

Use the quotient rule when both the numerator and denominator are non-constant functions that can't be simplified. If only the denominator contains $x$, consider rewriting with negative exponents.

Machine Learning Applications

The quotient rule appears frequently in machine learning whenever we work with normalized or ratio-based quantities.

Softmax Function

The softmax function converts logits into probabilities:

$\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}$

This is a quotient! Writing $p_i = \text{softmax}(z)_i$, the quotient rule gives the gradients needed for backpropagation:

Diagonal ($j = i$): $\frac{\partial p_i}{\partial z_i} = p_i(1 - p_i)$

Off-diagonal ($j \neq i$): $\frac{\partial p_i}{\partial z_j} = -p_i \cdot p_j$
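
Both results follow from a direct application of the quotient rule. In the sketch below, $p_i$ denotes $\text{softmax}(z)_i$ and $S$ is shorthand (introduced here) for the sum in the denominator:

```latex
S = \sum_{k=1}^{n} e^{z_k}, \qquad
p_i = \frac{e^{z_i}}{S}, \qquad
\frac{\partial S}{\partial z_j} = e^{z_j}

% Diagonal: numerator and denominator both depend on z_i
\frac{\partial p_i}{\partial z_i}
  = \frac{e^{z_i} \cdot S - e^{z_i} \cdot e^{z_i}}{S^2}
  = p_i - p_i^2
  = p_i (1 - p_i)

% Off-diagonal (j \neq i): the numerator e^{z_i} does not depend on z_j
\frac{\partial p_i}{\partial z_j}
  = \frac{0 \cdot S - e^{z_i} \cdot e^{z_j}}{S^2}
  = -p_i \, p_j
```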

Attention Mechanism

In transformer attention, the attention weights are computed as:

$\alpha_{ij} = \frac{\exp(s_{ij})}{\sum_k \exp(s_{ik})}$

This is exactly the softmax function! Backpropagating through attention requires the quotient rule to compute gradients with respect to the scores $s_{ij}$.

Layer Normalization

Layer normalization involves dividing by the standard deviation:

$\hat{x}_i = \frac{x_i - \mu}{\sigma}$

Both $\mu$ and $\sigma$ are computed from the same inputs, so the numerator and the denominator each depend on every $x_j$. Computing the gradient of $\hat{x}_i$ therefore requires the quotient rule.
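
As a rough illustration, the sketch below works through the simplified normalization $\hat{x}_i = (x_i - \mu)/\sigma$ only. It assumes $\sigma$ is the population standard deviation and deliberately omits the small stabilizing constant and the learnable scale and shift that practical layer normalization adds. Under those assumptions the quotient rule gives $\partial \hat{x}_i / \partial x_j = \frac{1}{\sigma}\left(\delta_{ij} - \frac{1}{n} - \frac{\hat{x}_i \hat{x}_j}{n}\right)$, which the code compares against finite differences. All function names are hypothetical.

```python
import math

def normalize(x):
    """x_hat_i = (x_i - mu) / sigma, with sigma the population std."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [(v - mu) / sigma for v in x]

def analytic_jacobian(x):
    """Quotient-rule result: (delta_ij - 1/n - x_hat_i * x_hat_j / n) / sigma."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    x_hat = [(v - mu) / sigma for v in x]
    return [
        [((1.0 if i == j else 0.0) - 1.0 / n - x_hat[i] * x_hat[j] / n) / sigma
         for j in range(n)]
        for i in range(n)
    ]

def numerical_jacobian(x, step=1e-6):
    """Central differences; entry [i][j] estimates d x_hat_i / d x_j."""
    n = len(x)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up = list(x)
        up[j] += step
        down = list(x)
        down[j] -= step
        hi, lo = normalize(up), normalize(down)
        for i in range(n):
            jac[i][j] = (hi[i] - lo[i]) / (2 * step)
    return jac

x = [0.5, -1.2, 3.0, 0.7]
analytic = analytic_jacobian(x)
numeric = numerical_jacobian(x)
n = len(x)
max_gap = max(abs(analytic[i][j] - numeric[i][j])
              for i in range(n) for j in range(n))
print(f"largest analytic-vs-numerical gap: {max_gap:.2e}")
```

The $-\hat{x}_i \hat{x}_j / n$ term is the part contributed by the changing denominator; dropping it is equivalent to treating $\sigma$ as a constant.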

Why This Matters for ML

Deep learning frameworks like PyTorch and TensorFlow automatically apply the quotient rule through automatic differentiation. But understanding the rule helps you:

Debug gradient issues in custom layers
Understand numerical stability concerns
Implement efficient backward passes
Reason about gradient flow through normalization layers

Python Implementation

Numerical Verification

Let's verify the quotient rule numerically:

Verifying the Quotient Rule Numerically

File: quotient_rule_verify.py

Notes on the code:

Quotient Rule Verification: This function numerically verifies the quotient rule by computing h'(x) two ways: directly from the definition, and using the formula (f'g - fg') / g².

Numerical Derivatives: We use the central difference formula for better accuracy: f'(x) ≈ [f(x+h) - f(x-h)] / (2h). This is O(h²) accurate.

Direct Numerical Derivative: We compute the derivative of h(x) = f(x)/g(x) directly using the definition, without applying the quotient rule.

Quotient Rule Formula: The quotient rule gives us h'(x) = (f'g - fg') / g². This should match the numerical derivative if the rule is correct.

Testing x²/(x+1): For h(x) = x²/(x+1), we have f'(x) = 2x and g'(x) = 1, so h'(x) = [2x(x+1) - x²(1)] / (x+1)² = (x² + 2x) / (x+1)².

Derivative of tan(x): This verifies that d/dx[tan(x)] = sec²(x), a classic result derived using the quotient rule on sin(x)/cos(x).

```python
import numpy as np

def quotient_rule_numerical(f, g, x, h=1e-7):
    """
    Verify the quotient rule numerically.

    For h(x) = f(x) / g(x), the quotient rule states:
    h'(x) = (f'(x) * g(x) - f(x) * g'(x)) / [g(x)]^2

    We'll compute both sides and compare.
    Note: the parameter h is the finite-difference step size,
    not the quotient function h(x).
    """
    # Numerical derivatives using central difference
    f_prime = (f(x + h) - f(x - h)) / (2 * h)
    g_prime = (g(x + h) - g(x - h)) / (2 * h)

    # Quotient function
    h_func = lambda t: f(t) / g(t)
    h_prime_numerical = (h_func(x + h) - h_func(x - h)) / (2 * h)

    # Quotient rule formula
    g_x = g(x)
    f_x = f(x)
    h_prime_formula = (f_prime * g_x - f_x * g_prime) / (g_x ** 2)

    return {
        "f(x)": f_x,
        "g(x)": g_x,
        "f'(x)": f_prime,
        "g'(x)": g_prime,
        "h'(x) numerical": h_prime_numerical,
        "h'(x) formula": h_prime_formula,
        "difference": abs(h_prime_numerical - h_prime_formula)
    }

# Example 1: f(x) = x^2, g(x) = x + 1 => h(x) = x^2 / (x + 1)
f1 = lambda x: x**2
g1 = lambda x: x + 1

result1 = quotient_rule_numerical(f1, g1, x=2.0)
print("Example: h(x) = x^2 / (x + 1) at x = 2")
for key, value in result1.items():
    print(f"  {key}: {value:.6f}")
print()

# Example 2: tan(x) = sin(x) / cos(x)
f2 = lambda x: np.sin(x)
g2 = lambda x: np.cos(x)

result2 = quotient_rule_numerical(f2, g2, x=np.pi/4)
print("Example: h(x) = sin(x) / cos(x) = tan(x) at x = pi/4")
for key, value in result2.items():
    print(f"  {key}: {value:.6f}")

# At x = pi/4: tan'(x) = sec^2(x) = 1/cos^2(x) = 2
print(f"Expected sec^2(pi/4) = {1 / np.cos(np.pi/4)**2:.6f}")
```

Softmax Jacobian: Quotient Rule in Action

Here's how the quotient rule appears when computing the Jacobian of the softmax function:

Softmax Jacobian via Quotient Rule

File: softmax_jacobian.py

Notes on the code (S denotes the sum of all exp(z_j)):

Softmax as a Quotient: Each softmax output is a quotient: exp(z_i) divided by the sum of all exp(z_j). Differentiating this requires the quotient rule.

Softmax Jacobian: The Jacobian captures how each output p_i changes with respect to each input z_j. This is where the quotient rule becomes essential.

Diagonal Elements (i = j): When differentiating p_i w.r.t. z_i, the quotient rule gives: (e^{z_i} · S - e^{z_i} · e^{z_i}) / S² = p_i(1 - p_i).

Off-Diagonal Elements (i ≠ j): When differentiating p_i w.r.t. z_j (where j ≠ i), the numerator derivative is 0, giving: -e^{z_i} · e^{z_j} / S² = -p_i · p_j.

Row Sums are Zero: Each column of the Jacobian sums to zero because the probabilities must sum to 1: if one probability goes up, others must go down. The Jacobian is symmetric, so each row sums to zero as well. Equivalently, adding the same constant to every logit leaves the probabilities unchanged.

```python
import numpy as np

def softmax(z):
    """
    Softmax function: transforms logits into probabilities.
    softmax(z)_i = exp(z_i) / sum(exp(z_j))

    This is a quotient! The quotient rule appears in its derivative.
    """
    exp_z = np.exp(z - np.max(z))  # Numerical stability
    return exp_z / np.sum(exp_z)

def softmax_jacobian(z):
    """
    Compute the Jacobian of softmax: d(softmax_i)/d(z_j)

    Using the quotient rule:
    - When i = j: d(p_i)/d(z_i) = p_i(1 - p_i)
    - When i != j: d(p_i)/d(z_j) = -p_i * p_j

    This gives: Jacobian = diag(p) - outer(p, p)
    """
    p = softmax(z)
    n = len(z)
    jacobian = np.zeros((n, n))

    for i in range(n):
        for j in range(n):
            if i == j:
                # Quotient rule: (e^z_i * S - e^z_i * e^z_i) / S^2
                # = p_i - p_i^2 = p_i(1 - p_i)
                jacobian[i, j] = p[i] * (1 - p[i])
            else:
                # Quotient rule: (0 * S - e^z_i * e^z_j) / S^2
                # = -p_i * p_j
                jacobian[i, j] = -p[i] * p[j]

    return jacobian

# Example with 3 classes
z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
J = softmax_jacobian(z)

print("Logits z:", z)
print("Softmax p:", p.round(4))
print("Sum of probabilities:", p.sum())  # Should be 1
print()
print("Jacobian matrix:")
print(J.round(4))
print()
print("Row sums (should be 0):", J.sum(axis=1).round(10))

# The quotient rule tells us how changing one logit
# affects all the probabilities!
```
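
The double loop is easy to read but can be collapsed into a single expression. The sketch below is one possible vectorized form, `diag(p) - outer(p, p)`, checked against central differences. The function names and the step size are illustrative choices, and `softmax` is redefined so the block runs on its own.

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z))  # shift for numerical stability
    return exp_z / np.sum(exp_z)

def softmax_jacobian_vectorized(z):
    """J[i, j] = p_i * (delta_ij - p_j), written as diag(p) - outer(p, p)."""
    p = softmax(z)
    return np.diag(p) - np.outer(p, p)

def softmax_jacobian_numerical(z, step=1e-6):
    """Central-difference estimate; column j holds d(softmax)/d(z_j)."""
    n = len(z)
    jac = np.zeros((n, n))
    for j in range(n):
        bump = np.zeros(n)
        bump[j] = step
        jac[:, j] = (softmax(z + bump) - softmax(z - bump)) / (2 * step)
    return jac

z = np.array([2.0, 1.0, 0.1])
J_vec = softmax_jacobian_vectorized(z)
J_num = softmax_jacobian_numerical(z)
max_gap = np.max(np.abs(J_vec - J_num))
print("largest analytic-vs-numerical gap:", max_gap)
print("row sums:", J_vec.sum(axis=1))
print("column sums:", J_vec.sum(axis=0))
```

Comparing an analytic Jacobian against finite differences like this is a common way to catch sign or index errors in a hand-written backward pass.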

Common Mistakes to Avoid

Mistake 1: Wrong order in the numerator

Wrong: $\left(\frac{f}{g}\right)' = \frac{f \cdot g' - f' \cdot g}{g^2}$

Correct: $\left(\frac{f}{g}\right)' = \frac{f' \cdot g - f \cdot g'}{g^2}$

Remember: "Low d-High minus High d-Low" — derivative of the top comes first!

Mistake 2: Dividing derivatives

Wrong: $\left(\frac{f}{g}\right)' = \frac{f'}{g'}$

Correct: $\left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2}$

The derivative of a quotient is NOT the quotient of derivatives!

Mistake 3: Forgetting to square the denominator

Wrong: $\left(\frac{x^2}{x+1}\right)' = \frac{2x(x+1) - x^2}{x+1}$

Correct: $\left(\frac{x^2}{x+1}\right)' = \frac{2x(x+1) - x^2}{(x+1)^2}$

The denominator must be squared!

Mistake 4: Using quotient rule when unnecessary

For $\frac{5}{x^3}$, just use the power rule: $5x^{-3}$, so the derivative is $-15x^{-4}$.

When the numerator is constant, consider rewriting with negative exponents for simpler computation.

Mistake 5: Sign errors with negative derivatives

When $g'(x) = -\sin(x)$, remember that subtracting $f \cdot g'$ means subtracting a negative, which adds:

$f' \cdot g - f \cdot (-\sin(x)) = f'g + f\sin(x)$

Be careful with double negatives!

Test Your Understanding

What is the derivative of f(x) = x / (x + 1)?

d/dx[x/(x+1)] = [1·(x+1) - x·1] / (x+1)²

Summary

The quotient rule tells us how to differentiate ratios of functions. Unlike the naive guess $f'/g'$, the correct formula accounts for both the changing numerator and the "fighting back" of the denominator.

Key Formula

$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{f'(x) \cdot g(x) - f(x) \cdot g'(x)}{[g(x)]^2}$

Key Concepts

Concept	Description
Memory aid	"Low d-High minus High d-Low, over Low squared"
Derivation	Can be derived from product rule using f/g = f · g⁻¹
tan(x) derivative	d/dx[tan(x)] = sec²(x), proven via quotient rule
ML connection	Softmax, attention weights, layer norm all involve quotients
Avoid when	Numerator is constant → use power rule with negative exponent
Common error	Wrong order (f'g - fg', not fg' - f'g) and forgetting g²

Key Takeaways

The quotient rule is not symmetric, so the order matters: $f'g - fg'$, not $fg' - f'g$
It can be derived from the product rule by writing $f/g = f \cdot g^{-1}$
The squared denominator appears because the reciprocal $1/g$ changes at the rate $-g'/g^2$
The quotient rule is essential for computing gradients through softmax and other normalization operations in ML
When the numerator is constant, rewrite with negative exponents for easier differentiation

The Quotient Rule in One Sentence:

"Low d-High minus High d-Low, over the square of what's below."

Coming Next: In the next section, we'll learn the Chain Rule — how to differentiate compositions of functions. This is perhaps the most powerful differentiation rule and is the foundation of backpropagation in neural networks!
