Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Apply the Power Rule to differentiate any function of the form $x^n$
Use the Constant Rule to recognize that derivatives of constants are zero
Apply the Constant Multiple Rule to pull constants out of derivatives
Combine the Sum and Difference Rules to differentiate polynomial functions
Extend the Power Rule to negative and fractional exponents
Connect these rules to gradient computation in machine learning

The Big Picture: From Definition to Efficiency

"The derivative rules are shortcuts — they replace tedious limit calculations with simple algebraic operations."

In the previous sections, we learned that the derivative is defined as a limit: $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$ . While this definition is fundamental, computing derivatives using limits every time would be incredibly tedious. Imagine having to expand $(x+h)^{10}$ just to find the derivative of $x^{10}$ !

The derivative rules we learn in this section are powerful shortcuts derived from the limit definition. Once proven, they allow us to differentiate most functions instantly, without ever writing a limit.
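To see why a shortcut is welcome, the following minimal sketch evaluates the difference quotient for $f(x) = x^{10}$ at $x = 1$ with shrinking $h$ and compares it with the value $10x^9 = 10$ that the Power Rule (introduced below) predicts. The helper name difference_quotient and the chosen step sizes are illustrative assumptions, not part of the original lesson.

```python
def difference_quotient(f, x, h):
    """Forward difference quotient (f(x + h) - f(x)) / h from the limit definition."""
    return (f(x + h) - f(x)) / h


def f(x):
    return x ** 10


x0 = 1.0
exact = 10 * x0 ** 9  # value predicted by the Power Rule

# The quotient approaches the exact value as h shrinks.
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = difference_quotient(f, x0, h)
    print(f"h = {h:<7} quotient = {approx:.6f}  error = {abs(approx - exact):.6f}")
```

The error shrinks roughly in proportion to $h$, which is the numerical counterpart of taking the limit.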

Why These Rules Matter

These three rules — Power, Sum, and Constant — are the building blocks for differentiating all polynomial functions. Combined with rules for products, quotients, and compositions (which we'll learn later), they let us differentiate virtually any function we encounter.

The rules we'll learn:

Constant Rule: $\frac{d}{dx}(c) = 0$
Power Rule: $\frac{d}{dx}(x^n) = nx^{n-1}$
Sum Rule: $\frac{d}{dx}(f+g) = f' + g'$

Historical Context: Newton and Leibniz

Both Isaac Newton (1643–1727) and Gottfried Wilhelm Leibniz (1646–1716) independently discovered calculus in the late 17th century. They both recognized that certain patterns emerged when differentiating polynomial functions:

The derivative of $x^2$ is $2x$
The derivative of $x^3$ is $3x^2$
The derivative of $x^4$ is $4x^3$

The pattern was clear: the exponent comes down as a coefficient, and the exponent decreases by one. This became the Power Rule, one of the most frequently used rules in all of calculus.

The Notation We Use

Leibniz introduced the notation $\frac{d}{dx}$ for derivatives. This notation emphasizes that differentiation is an operation we perform with respect to a variable. Newton used a dot notation (still used in physics for time derivatives). Both notations survive today, each with its advantages.

The Constant Rule

The simplest derivative rule states that the derivative of any constant is zero:

The Constant Rule

$\frac{d}{dx}(c) = 0$

where $c$ is any constant

Why Does This Make Sense?

The derivative measures the rate of change. A constant, by definition, doesn't change. If $f(x) = 5$ , then no matter what $x$ is, the output is always 5. The function is perfectly flat — its slope is zero everywhere.

Proof Using the Limit Definition

Let $f(x) = c$ where $c$ is a constant.

$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$

$= \lim_{h \to 0} \frac{c - c}{h}$

$= \lim_{h \to 0} \frac{0}{h}$

$= \lim_{h \to 0} 0 = 0$

Function	Derivative	Explanation
f(x) = 7	f'(x) = 0	The number 7 never changes
f(x) = π	f'(x) = 0	π is a constant (≈ 3.14159...)
f(x) = -100	f'(x) = 0	Negative constants also don't change

The Power Rule

The Power Rule is perhaps the most frequently used differentiation formula. It tells us how to differentiate any power of $x$ :

The Power Rule

$\frac{d}{dx}(x^n) = n \cdot x^{n-1}$

for any real number $n$

In words: bring down the exponent as a coefficient, then reduce the exponent by 1.

Deriving the Power Rule

Let's prove the Power Rule for positive integers using the limit definition and the Binomial Theorem:

Goal: Show that $\frac{d}{dx}(x^n) = nx^{n-1}$

Proof: Let $f(x) = x^n$ .

$f'(x) = \lim_{h \to 0} \frac{(x+h)^n - x^n}{h}$

By the Binomial Theorem: $(x+h)^n = x^n + nx^{n-1}h + \binom{n}{2}x^{n-2}h^2 + \ldots + h^n$

Substituting:

$= \lim_{h \to 0} \frac{x^n + nx^{n-1}h + \binom{n}{2}x^{n-2}h^2 + \ldots + h^n - x^n}{h}$

$= \lim_{h \to 0} \frac{nx^{n-1}h + \binom{n}{2}x^{n-2}h^2 + \ldots + h^n}{h}$

$= \lim_{h \to 0} \left[ nx^{n-1} + \binom{n}{2}x^{n-2}h + \ldots + h^{n-1} \right]$

$= nx^{n-1}$

∎

After dividing by $h$, every term except the first still contains a factor of $h$, so those terms vanish as $h \to 0$, leaving only $nx^{n-1}$.
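The binomial expansion step can be checked numerically. This sketch uses only math.comb from the Python standard library to list the coefficients of $h^k$ in $(x+h)^n$ and to confirm that the coefficient of $h^1$ equals $nx^{n-1}$. The function name binomial_terms and the sample values are illustrative assumptions.

```python
from math import comb


def binomial_terms(x, n):
    """Return the coefficients of h^0, h^1, ..., h^n in the expansion of (x + h)^n."""
    return [comb(n, k) * x ** (n - k) for k in range(n + 1)]


x, n, h = 2.0, 5, 0.01
terms = binomial_terms(x, n)

# Summing coefficient * h^k rebuilds (x + h)^n.
rebuilt = sum(c * h ** k for k, c in enumerate(terms))
print(abs(rebuilt - (x + h) ** n) < 1e-9)

# The h^1 coefficient is the only part of the difference quotient
# that survives as h -> 0; it matches n * x^(n-1).
print(terms[1], n * x ** (n - 1))
```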

f(x)	f'(x)	Pattern
x¹	1	1·x⁰ = 1
x²	2x	2·x¹ = 2x
x³	3x²	3·x² = 3x²
x⁴	4x³	4·x³ = 4x³
x¹⁰	10x⁹	10·x⁹ = 10x⁹

Interactive Power Rule Explorer

[Interactive visualizer: adjust the exponent $n$ and the point $x$ to see how the function $x^n$ and its derivative change. Example state: $n = 2$, $x = 1.00$, where $f(x) = 1.0000$ and $f'(x) = 2.0000$ (the slope of the tangent line), following $\frac{d}{dx}(x^2) = 2x^1$.]

The Constant Multiple Rule

Constants can be "pulled out" of derivatives:

The Constant Multiple Rule

$\frac{d}{dx}[c \cdot f(x)] = c \cdot \frac{d}{dx}[f(x)] = c \cdot f'(x)$

Proof

$\frac{d}{dx}[cf(x)] = \lim_{h \to 0} \frac{cf(x+h) - cf(x)}{h}$

$= \lim_{h \to 0} c \cdot \frac{f(x+h) - f(x)}{h}$

$= c \cdot \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$

$= c \cdot f'(x)$

∎

Example: Find the derivative of $f(x) = 5x^3$ .

$f'(x) = 5 \cdot \frac{d}{dx}(x^3)$ (Constant Multiple Rule)

$= 5 \cdot 3x^2$ (Power Rule)

$= 15x^2$

Sum and Difference Rules

The derivative of a sum is the sum of the derivatives:

Sum Rule

$\frac{d}{dx}[f(x) + g(x)] = f'(x) + g'(x)$

Difference Rule

$\frac{d}{dx}[f(x) - g(x)] = f'(x) - g'(x)$

Proof of Sum Rule

$\frac{d}{dx}[f(x) + g(x)] = \lim_{h \to 0} \frac{[f(x+h) + g(x+h)] - [f(x) + g(x)]}{h}$

$= \lim_{h \to 0} \frac{[f(x+h) - f(x)] + [g(x+h) - g(x)]}{h}$

$= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} + \lim_{h \to 0} \frac{g(x+h) - g(x)}{h}$

$= f'(x) + g'(x)$

∎

Key Insight

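Together with the Constant Multiple Rule, the Sum Rule means differentiation is a linear operator: $\frac{d}{dx}[a f(x) + b g(x)] = a f'(x) + b g'(x)$ for any constants $a$ and $b$. This property is fundamental in advanced mathematics and allows us to differentiate complex expressions term by term.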
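Linearity can be illustrated with a numerical derivative. In this sketch, numerical_derivative is a hypothetical helper based on a central difference, and the functions and constants are arbitrary choices made for illustration. It checks that the derivative of $a f + b g$ agrees with $a f' + b g'$ at one point.

```python
def numerical_derivative(func, x, h=1e-5):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def f(x):
    return x ** 2


def g(x):
    return 0.5 * x ** 3


a, b, x0 = 3.0, -2.0, 1.5  # arbitrary constants and evaluation point


def combo(x):
    """The linear combination a*f + b*g."""
    return a * f(x) + b * g(x)


lhs = numerical_derivative(combo, x0)
rhs = a * numerical_derivative(f, x0) + b * numerical_derivative(g, x0)
print(f"(a*f + b*g)'({x0}) = {lhs:.6f}")
print(f"a*f'({x0}) + b*g'({x0}) = {rhs:.6f}")
```

By the rules in this section the exact value is $3(2 \cdot 1.5) - 2(1.5 \cdot 1.5^2) = 2.25$, and both printed numbers should agree with it up to rounding.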

Interactive Sum Rule Demo

[Interactive demo: the derivative of a sum equals the sum of the derivatives. With $f(x) = x^2$ (coefficient 1) and $g(x) = 0.5x^3$ (coefficient 0.5), $f'(x) = 2x$ and $g'(x) = 1.5x^2$. At $x = 1.0$: $f'(1.0) = 2.000$, $g'(1.0) = 1.500$, and $(f + g)'(1.0) = 2.000 + 1.500 = 3.500$.]

Combining the Rules: Polynomial Differentiation

With the Power Rule, Constant Multiple Rule, and Sum Rule, we can differentiate any polynomial:

Example 1: Find $\frac{d}{dx}(3x^4 - 5x^2 + 7x - 2)$

$\frac{d}{dx}(3x^4 - 5x^2 + 7x - 2)$

Apply the Sum and Difference Rules (term by term):

$= \frac{d}{dx}(3x^4) - \frac{d}{dx}(5x^2) + \frac{d}{dx}(7x) - \frac{d}{dx}(2)$

Apply the Constant Multiple Rule (and the Constant Rule to the last term):

$= 3\frac{d}{dx}(x^4) - 5\frac{d}{dx}(x^2) + 7\frac{d}{dx}(x) - 0$

Apply the Power Rule:

$= 3(4x^3) - 5(2x) + 7(1)$

$= 12x^3 - 10x + 7$

Example 2: Find $\frac{d}{dx}(x^6 + 2x^5 - x^3 + 4x - 9)$

$= 6x^5 + 2(5x^4) - 3x^2 + 4(1) - 0$

$= 6x^5 + 10x^4 - 3x^2 + 4$
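The term-by-term procedure can be sketched in a few lines. Here a polynomial is stored as a list in which index $i$ holds the coefficient of $x^i$ (a representation chosen for this illustration), and differentiate is a hypothetical helper that applies the Power and Constant Multiple Rules to each term.

```python
def differentiate(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...] for c0 + c1*x + c2*x^2 + ...

    Each term c_i * x^i becomes (i * c_i) * x^(i-1); the constant term drops out.
    """
    return [i * c for i, c in enumerate(coeffs)][1:]


# Example 1: 3x^4 - 5x^2 + 7x - 2
example_1 = [-2, 7, -5, 0, 3]
print(differentiate(example_1))  # [7, -10, 0, 12], i.e. 12x^3 - 10x + 7

# Example 2: x^6 + 2x^5 - x^3 + 4x - 9
example_2 = [-9, 4, 0, -1, 0, 2, 1]
print(differentiate(example_2))  # [4, 0, -3, 0, 10, 6], i.e. 6x^5 + 10x^4 - 3x^2 + 4
```

Both outputs match the results of the two worked examples when read from the lowest power to the highest.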

Negative and Fractional Exponents

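The Power Rule works for all real exponents, not just positive integers, at every $x$ where $x^n$ and $x^{n-1}$ are defined (for example, $x \neq 0$ for negative exponents and $x > 0$ for the derivative of $\sqrt{x}$). This greatly extends its usefulness.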

Negative Exponents

Recall that $x^{-n} = \frac{1}{x^n}$ . The Power Rule still applies:

Function	Rewrite	Derivative
1/x	x⁻¹	-1·x⁻² = -1/x²
1/x²	x⁻²	-2·x⁻³ = -2/x³
1/x³	x⁻³	-3·x⁻⁴ = -3/x⁴

Example: Find $\frac{d}{dx}\left(\frac{3}{x^2}\right)$

$\frac{d}{dx}\left(\frac{3}{x^2}\right) = \frac{d}{dx}(3x^{-2})$

$= 3 \cdot (-2)x^{-3}$

$= -\frac{6}{x^3}$

Fractional Exponents (Roots)

Roots can be written as fractional exponents: $\sqrt{x} = x^{1/2}$ , $\sqrt[3]{x} = x^{1/3}$ , etc.

Function	Rewrite	Derivative
√x	x^(1/2)	(1/2)x^(-1/2) = 1/(2√x)
∛x	x^(1/3)	(1/3)x^(-2/3)
x^(3/2)	x^(3/2)	(3/2)x^(1/2) = (3/2)√x

Example: Find $\frac{d}{dx}(\sqrt{x})$

$\frac{d}{dx}(\sqrt{x}) = \frac{d}{dx}(x^{1/2})$

$= \frac{1}{2}x^{1/2 - 1}$

$= \frac{1}{2}x^{-1/2}$

$= \frac{1}{2\sqrt{x}}$
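The results for negative and fractional exponents can be sanity-checked against a numerical derivative. This is a minimal sketch: numerical_derivative is a hypothetical central-difference helper, and the evaluation point $x = 4$ is an arbitrary choice.

```python
import math


def numerical_derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


x0 = 4.0

# Square root: the Power Rule with n = 1/2 gives 1 / (2 * sqrt(x)).
sqrt_rule = 1 / (2 * math.sqrt(x0))
sqrt_numeric = numerical_derivative(math.sqrt, x0)

# 3 / x^2: the Power Rule with n = -2 gives -6 / x^3.
recip_rule = -6 / x0 ** 3
recip_numeric = numerical_derivative(lambda x: 3 / x ** 2, x0)

print(f"sqrt:  rule = {sqrt_rule:.6f}, numeric = {sqrt_numeric:.6f}")
print(f"3/x^2: rule = {recip_rule:.6f}, numeric = {recip_numeric:.6f}")
```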

Real-World Applications

Physics: Motion

If position is given by a polynomial function of time, velocity and acceleration are found by differentiation:

Problem: A ball is thrown vertically. Its height (in meters) after $t$ seconds is $h(t) = -5t^2 + 20t + 2$ .

Find the velocity and acceleration functions.

Solution:

Velocity: $v(t) = h'(t) = -10t + 20$ m/s
Acceleration: $a(t) = v'(t) = -10$ m/s² (constant, due to gravity)

At $t = 2$ seconds: $v(2) = -10(2) + 20 = 0$ . This is when the ball reaches its maximum height.
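The motion example can be reproduced directly. In this sketch the function names height and velocity are illustrative; the formulas are the ones from the problem and its solution.

```python
def height(t):
    """Height in meters from the problem statement: h(t) = -5t^2 + 20t + 2."""
    return -5 * t ** 2 + 20 * t + 2


def velocity(t):
    """v(t) = h'(t) = -10t + 20, obtained by differentiating term by term."""
    return -10 * t + 20


# Velocity is zero at the peak: solve -10t + 20 = 0.
t_peak = 20 / 10
print(f"v({t_peak}) = {velocity(t_peak)} m/s")
print(f"maximum height = {height(t_peak)} m")

# Heights just before and just after the peak are both lower.
print(height(t_peak - 0.1), height(t_peak + 0.1))
```

Evaluating the height at $t = 2$ gives $h(2) = -20 + 40 + 2 = 22$ meters.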

Economics: Marginal Analysis

In economics, the derivative represents "marginal" quantities — the rate of change of one quantity with respect to another:

Problem: A company's cost to produce $x$ units is $C(x) = 0.01x^3 - 0.9x^2 + 35x + 500$ dollars.

Find the marginal cost when producing 30 units.

Solution:

$C'(x) = 0.03x^2 - 1.8x + 35$

$C'(30) = 0.03(900) - 1.8(30) + 35 = 27 - 54 + 35 = 8$

At a production level of 30 units, the marginal cost is 8 dollars per unit: producing the 31st unit adds approximately 8 dollars to the total cost.
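The marginal cost can be compared with the exact cost of one additional unit, $C(31) - C(30)$. This minimal sketch uses the cost function from the problem; the function names are illustrative.

```python
def cost(x):
    """Total cost in dollars: C(x) = 0.01x^3 - 0.9x^2 + 35x + 500."""
    return 0.01 * x ** 3 - 0.9 * x ** 2 + 35 * x + 500


def marginal_cost(x):
    """C'(x) = 0.03x^2 - 1.8x + 35, differentiated term by term."""
    return 0.03 * x ** 2 - 1.8 * x + 35


approx = marginal_cost(30)      # derivative at 30 units
actual = cost(31) - cost(30)    # exact extra cost of the 31st unit

print(f"marginal cost C'(30)    = {approx:.2f}")
print(f"actual cost of unit 31  = {actual:.2f}")
```

The two values differ by about one cent, which is why the derivative is a useful estimate of the cost of the next unit.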

Biology: Population Growth

Population models often involve derivatives to understand growth rates:

Problem: A bacterial colony population after $t$ hours is modeled by $P(t) = 100 + 50t + 2t^2$ .

Find the growth rate at t = 5 hours.

Solution:

Growth rate: $P'(t) = 50 + 4t$

At $t = 5$: $P'(5) = 50 + 20 = 70$ bacteria per hour

Machine Learning Connection

The derivative rules are the foundation of gradient-based optimization, which powers virtually all modern machine learning.

Polynomial Regression

In polynomial regression, we fit a model of the form: $y = w_0 + w_1 x + w_2 x^2 + \ldots + w_n x^n$

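To minimize the loss function, we need its derivative with respect to each weight $w_i$ . Because the model is a sum of terms $w_i x^i$ , the Sum and Constant Multiple Rules give $\frac{\partial y}{\partial w_i} = x^i$ , so each polynomial feature $x^i$ scales the gradient of its own weight.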

Gradient Descent

The update rule for gradient descent is: $w_{new} = w_{old} - \alpha \cdot \frac{\partial L}{\partial w}$

Computing $\frac{\partial L}{\partial w}$ requires the derivative rules. For polynomial models:

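Power Rule: Differentiates the squared error (together with the Chain Rule, covered later)
Sum Rule: Lets us differentiate the model one term $w_i x^i$ at a time
Constant Rule: Terms that do not contain the weight being differentiated drop out; for the bias $w_0$ the feature is $x^0 = 1$ , which makes its gradient the simplest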
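The update rule can be tried on a one-parameter problem. This is a minimal sketch under stated assumptions: the loss $L(w) = w^2 - 6w + 9$, the starting point, the learning rate, and the number of steps are all arbitrary choices for illustration.

```python
def loss(w):
    """Illustrative quadratic loss L(w) = w^2 - 6w + 9, minimized at w = 3."""
    return w ** 2 - 6 * w + 9


def loss_gradient(w):
    """dL/dw = 2w - 6, by the Power, Constant Multiple, Sum, and Constant Rules."""
    return 2 * w - 6


w = 0.0        # starting guess
alpha = 0.1    # learning rate

for step in range(50):
    w = w - alpha * loss_gradient(w)  # w_new = w_old - alpha * dL/dw

print(f"w after 50 steps: {w:.4f}, loss: {loss(w):.6f}")
```

Each step shrinks the distance to the minimizer $w = 3$ by a factor of $1 - 2\alpha = 0.8$, so the iterate ends up very close to 3.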

Automatic Differentiation

Modern deep learning frameworks like PyTorch and TensorFlow use automatic differentiation to compute gradients. Under the hood, they record the operations of a computation in a computational graph and apply the same rules we're learning (Power, Sum, Product, Chain) to each operation to compute derivatives efficiently.
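The idea of applying one rule per operation can be illustrated with a toy forward-mode scheme based on dual numbers. This is only a sketch of the principle, not how PyTorch or TensorFlow are implemented; the class name Dual and its small set of supported operations are assumptions made for this example.

```python
class Dual:
    """A value paired with its derivative (toy forward-mode automatic differentiation)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _wrap(other):
        # Plain numbers are constants, so their derivative is 0 (Constant Rule).
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._wrap(other)
        # Sum Rule: (f + g)' = f' + g'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __sub__(self, other):
        other = Dual._wrap(other)
        # Difference Rule: (f - g)' = f' - g'
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rmul__(self, constant):
        # Constant Multiple Rule: (c * f)' = c * f'
        return Dual(constant * self.value, constant * self.deriv)

    def __pow__(self, n):
        # Power Rule (with the Chain Rule): (f^n)' = n * f^(n-1) * f'
        return Dual(self.value ** n, n * self.value ** (n - 1) * self.deriv)


def p(x):
    """The polynomial 3x^4 - 5x^2 + 7x - 2 from the worked example."""
    return 3 * x ** 4 - 5 * x ** 2 + 7 * x - 2


# Seed the input with derivative 1, since dx/dx = 1.
result = p(Dual(2.0, 1.0))
print(result.value)  # p(2) = 40.0
print(result.deriv)  # p'(2) = 12*8 - 10*2 + 7 = 83.0
```

Only constant-times-value multiplication is supported here, which is enough for polynomials written with coefficients on the left; products of two varying quantities would need the Product Rule.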

Python Implementation

Computing Polynomial Derivatives

Let's implement the derivative rules in Python:

Implementing Derivative Rules

polynomial_derivatives.py

Power Rule Function: The power_rule function implements the power rule with a coefficient: d/dx(cx^n) = cnx^(n-1). It takes the coefficient, exponent, and point x as inputs.

Constant Derivative: When the exponent is 0, we have a constant term (c * x^0 = c). The derivative of a constant is always 0.

Applying the Rule: The new coefficient is the original coefficient times the exponent. The new exponent is one less than the original.

Polynomial Derivative: Using the Sum Rule, polynomial_derivative differentiates each term separately and adds the results. A program can differentiate any polynomial this way, one term at a time.

def power_rule(coefficient, exponent, x):
    """
    Apply the power rule to find derivative of c * x^n at point x.

    Power Rule: d/dx(c * x^n) = c * n * x^(n-1)

    Args:
        coefficient: The constant multiplier c
        exponent: The power n
        x: The point at which to evaluate

    Returns:
        The derivative value at x
    """
    if exponent == 0:
        return 0  # Derivative of constant is 0

    new_coefficient = coefficient * exponent
    new_exponent = exponent - 1
    return new_coefficient * (x ** new_exponent)

# Examples
print("Power Rule Examples:")
print(f"d/dx(x^2) at x=3: {power_rule(1, 2, 3)}")        # 2*3 = 6
print(f"d/dx(5x^3) at x=2: {power_rule(5, 3, 2)}")      # 15*4 = 60
print(f"d/dx(x^(-1)) at x=2: {power_rule(1, -1, 2)}")   # -1/4 = -0.25

def polynomial_derivative(coefficients, x):
    """
    Compute derivative of a polynomial at point x.

    coefficients[i] is the coefficient of x^i
    Uses Sum Rule: d/dx(f + g) = f' + g'

    Example: [3, 2, 1] represents 3 + 2x + x^2
    """
    derivative = 0
    for power, coef in enumerate(coefficients):
        derivative += power_rule(coef, power, x)
    return derivative

# Example: f(x) = 3 + 2x + x^2, f'(x) = 2 + 2x
coeffs = [3, 2, 1]  # 3 + 2x + x^2
print(f"\nPolynomial 3 + 2x + x^2:")
print(f"  f'(1) = {polynomial_derivative(coeffs, 1)}")  # 2 + 2 = 4
print(f"  f'(2) = {polynomial_derivative(coeffs, 2)}")  # 2 + 4 = 6

Application to Machine Learning

Here's how these rules appear in gradient descent for polynomial regression:

Derivative Rules in ML

polynomial_regression.py

Polynomial Features and Prediction: The model is a sum of terms w_i * x^i, so the partial derivative of the prediction with respect to w_i is the feature x^i. The function compute_polynomial_features_gradient returns the features and the prediction that the gradient needs.

Feature Construction: The array [1, x, x^2, ..., x^d] holds the powers of x. These are exactly the factors x^i that appear in dL/dw_i.

Gradient Computation: The gradient combines the chain rule with our basic rules. The power rule differentiates the squared error, and the sum rule lets us treat the model one term at a time.

Weight Update: Gradient descent uses derivatives at every step. The derivative rules make computing these gradients tractable for any polynomial model.

import numpy as np

# In machine learning, we often need gradients of a loss with respect to
# the weights of a polynomial model.

def compute_polynomial_features_gradient(x, weights, degree):
    """
    Build the polynomial features and the prediction needed for the gradient.

    Model: y_pred = w_0 + w_1*x + w_2*x^2 + ... + w_d*x^d
    Loss: L = (y_true - y_pred)^2

    For SGD, we need dL/dw_i for each weight.

    Using the chain rule, with the power rule applied to the square:
    dL/dw_i = -2 * (y_true - y_pred) * x^i

    Returns the feature vector [1, x, ..., x^d] and y_pred; the caller
    combines them with the error to form the gradient.
    """
    # Create polynomial features: [1, x, x^2, ..., x^d]
    features = np.array([x ** i for i in range(degree + 1)])

    # Prediction
    y_pred = np.dot(weights, features)

    return features, y_pred

# Simple gradient descent step
def gradient_step(x, y_true, weights, learning_rate, degree):
    """
    One step of gradient descent for polynomial regression.

    The derivative rules appear here:
    - Power rule (with the chain rule): d/de(e^2) = 2e gives the factor 2 * error
    - Sum rule: y_pred is a sum of terms w_i * x^i, differentiated term by term
    - Constant multiple rule: d/dw_i(w_i * x^i) = x^i
    - Constant rule: terms without w_i contribute 0 to dL/dw_i
    """
    features, y_pred = compute_polynomial_features_gradient(x, weights, degree)
    error = y_true - y_pred

    # Gradient for each weight: dL/dw_i = -2 * error * x^i
    gradient = -2 * error * features

    # Update weights
    new_weights = weights - learning_rate * gradient

    return new_weights, error ** 2

# Example: Fit y = x^2 with polynomial regression
np.random.seed(42)
weights = np.random.randn(3) * 0.1  # degree 2: [w_0, w_1, w_2]
learning_rate = 0.01

print("Training polynomial regression:")
for step in range(5):
    x = np.random.uniform(-2, 2)
    y_true = x ** 2  # True function
    weights, loss = gradient_step(x, y_true, weights, learning_rate, 2)
    print(f"Step {step+1}: weights = {weights.round(3)}, loss = {loss:.4f}")

# Five steps only start the fit; many more are needed to approach [0, 0, 1].
print(f"\nWeights after 5 steps (target is [0, 0, 1]): {weights.round(3)}")
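A common way to gain confidence in a hand-derived gradient is to compare it with finite differences. The sketch below checks the formula $\frac{\partial L}{\partial w_i} = -2(y_{true} - y_{pred})x^i$ for a single sample. It is written in plain Python rather than NumPy to stay self-contained, and the function names, weights, and sample values are illustrative assumptions.

```python
def predict(weights, x):
    """Polynomial model: w_0 + w_1*x + w_2*x^2 + ..."""
    return sum(w * x ** i for i, w in enumerate(weights))


def squared_loss(weights, x, y_true):
    """L = (y_true - y_pred)^2 for one sample."""
    return (y_true - predict(weights, x)) ** 2


def analytic_gradient(weights, x, y_true):
    """dL/dw_i = -2 * (y_true - y_pred) * x^i for each weight."""
    error = y_true - predict(weights, x)
    return [-2 * error * x ** i for i in range(len(weights))]


def numeric_gradient(weights, x, y_true, h=1e-6):
    """Central difference in each weight, one coordinate at a time."""
    grad = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += h
        down[i] -= h
        diff = squared_loss(up, x, y_true) - squared_loss(down, x, y_true)
        grad.append(diff / (2 * h))
    return grad


weights = [0.1, -0.2, 0.3]   # arbitrary illustrative weights
x, y_true = 1.5, 2.25        # one sample from y = x^2

analytic = analytic_gradient(weights, x, y_true)
numeric = numeric_gradient(weights, x, y_true)
for i, (a, n) in enumerate(zip(analytic, numeric)):
    print(f"dL/dw_{i}: analytic = {a:.6f}, numeric = {n:.6f}")
```

If the two columns disagree, the derivation (or its implementation) contains a mistake, which makes this check a useful habit before training.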

Common Mistakes to Avoid

Mistake 1: Forgetting to reduce the exponent

Wrong: $\frac{d}{dx}(x^5) = 5x^5$

Correct: $\frac{d}{dx}(x^5) = 5x^4$

The Power Rule requires reducing the exponent by 1.

Mistake 2: Treating x like a constant

Wrong: $\frac{d}{dx}(x) = 0$

Correct: $\frac{d}{dx}(x) = \frac{d}{dx}(x^1) = 1 \cdot x^0 = 1$

The variable $x$ is not a constant — it's $x^1$ .

Mistake 3: Confusing coefficient and exponent

Wrong: $\frac{d}{dx}(3x^2) = 6x^2$

Correct: $\frac{d}{dx}(3x^2) = 3 \cdot 2x^1 = 6x$

The 3 is a coefficient (stays), the 2 comes down as multiplier, and the exponent reduces.

Mistake 4: Forgetting the constant of a linear term

Wrong: $\frac{d}{dx}(5x) = 0$

Correct: $\frac{d}{dx}(5x) = 5 \cdot 1 = 5$

$5x = 5x^1$ , so the derivative is $5 \cdot 1 \cdot x^0 = 5$ .
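Mistakes like these are easy to catch by testing a proposed derivative against a numerical estimate. In this sketch, matches_derivative is a hypothetical helper; the test points, step size, and tolerance are arbitrary choices. Several points are used because a wrong formula can agree with the right one at a single point (for example, $5x^5$ and $5x^4$ both equal 5 at $x = 1$).

```python
def matches_derivative(f, candidate, points=(0.5, 1.0, 2.0), h=1e-6, tol=1e-4):
    """Return True if candidate(x) agrees with a central-difference estimate of f'(x)."""
    for x in points:
        estimate = (f(x + h) - f(x - h)) / (2 * h)
        if abs(estimate - candidate(x)) > tol:
            return False
    return True


# Mistake 1: forgetting to reduce the exponent.
print(matches_derivative(lambda x: x ** 5, lambda x: 5 * x ** 5))  # False
print(matches_derivative(lambda x: x ** 5, lambda x: 5 * x ** 4))  # True

# Mistake 3: confusing coefficient and exponent.
print(matches_derivative(lambda x: 3 * x ** 2, lambda x: 6 * x ** 2))  # False
print(matches_derivative(lambda x: 3 * x ** 2, lambda x: 6 * x))       # True
```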

Test Your Understanding

Question: What is the derivative of f(x) = x⁵?

Summary

The derivative rules provide powerful shortcuts for computing derivatives without using the limit definition every time.

The Core Rules

Rule	Formula	Example
Constant Rule	d/dx(c) = 0	d/dx(7) = 0
Power Rule	d/dx(xⁿ) = nxⁿ⁻¹	d/dx(x³) = 3x²
Constant Multiple	d/dx[cf(x)] = cf'(x)	d/dx(5x²) = 10x
Sum Rule	d/dx[f + g] = f' + g'	d/dx(x² + x) = 2x + 1
Difference Rule	d/dx[f - g] = f' - g'	d/dx(x³ - x) = 3x² - 1

Key Takeaways

The Power Rule is the workhorse: $\frac{d}{dx}(x^n) = nx^{n-1}$ for any real $n$
Constants have zero derivative because they don't change
The Sum Rule allows term-by-term differentiation of polynomials
The Power Rule works for negative and fractional exponents too
These rules are building blocks for the Product, Quotient, and Chain Rules
Machine learning uses these rules constantly in gradient computation

The Power of Shortcuts:

"With these three rules, we can differentiate any polynomial instantly — a task that would take pages using limits."

Coming Next: In the next section, we'll learn the Product Rule — how to differentiate products of functions, which is essential when our functions can't be written as simple polynomials.