Chapter 4

Basic Derivative Rules: Power, Sum, Constant

The Derivative - Instantaneous Rate of Change

Learning Objectives

By the end of this section, you will be able to:

  1. Apply the Power Rule to differentiate any function of the form $x^n$
  2. Use the Constant Rule to recognize that derivatives of constants are zero
  3. Apply the Constant Multiple Rule to pull constants out of derivatives
  4. Combine the Sum and Difference Rules to differentiate polynomial functions
  5. Extend the Power Rule to negative and fractional exponents
  6. Connect these rules to gradient computation in machine learning

The Big Picture: From Definition to Efficiency

"The derivative rules are shortcuts — they replace tedious limit calculations with simple algebraic operations."

In the previous sections, we learned that the derivative is defined as a limit: $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$. While this definition is fundamental, computing derivatives using limits every time would be incredibly tedious. Imagine having to expand $(x+h)^{10}$ just to find the derivative of $x^{10}$!

The derivative rules we learn in this section are powerful shortcuts derived from the limit definition. Once proven, they allow us to differentiate most functions instantly, without ever writing a limit.
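To make the contrast concrete, here is a minimal sketch that evaluates the limit definition numerically for $x^{10}$ and compares it with the value $10x^9$ that the Power Rule (stated later in this section) predicts. The helper name `difference_quotient` and the evaluation point `x0 = 1.0` are illustrative choices, not part of the original text.

```python
def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def f(x):
    return x ** 10


x0 = 1.0              # illustrative evaluation point
exact = 10 * x0 ** 9  # value predicted by the Power Rule

# Shrinking h moves the secant slope toward the Power Rule value.
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = difference_quotient(f, x0, h)
    print(f"h = {h:<7g} quotient = {approx:.6f}  error = {abs(approx - exact):.2e}")

print(f"Power Rule value: {exact:.6f}")
```

The error shrinks roughly in proportion to `h`, which is the limit at work; the rule gives the same answer in a single step.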

Why These Rules Matter

The Constant, Power, and Sum Rules, together with the Constant Multiple and Difference Rules covered below, are the building blocks for differentiating every polynomial. Combined with rules for products, quotients, and compositions (which we'll learn later), they let us differentiate virtually any function we encounter.

The rules we'll learn:

Constant Rule

$\frac{d}{dx}(c) = 0$

Power Rule

$\frac{d}{dx}(x^n) = nx^{n-1}$

Sum Rule

$\frac{d}{dx}(f+g) = f' + g'$

Historical Context: Newton and Leibniz

Both Isaac Newton (1643–1727) and Gottfried Wilhelm Leibniz (1646–1716) independently discovered calculus in the late 17th century. They both recognized that certain patterns emerged when differentiating polynomial functions:

  • The derivative of $x^2$ is $2x$
  • The derivative of $x^3$ is $3x^2$
  • The derivative of $x^4$ is $4x^3$

The pattern was clear: the exponent comes down as a coefficient, and the exponent decreases by one. This became the Power Rule, one of the most frequently used rules in all of calculus.

The Notation We Use

Leibniz introduced the notation $\frac{d}{dx}$ for derivatives. This notation emphasizes that differentiation is an operation we perform with respect to a variable. Newton used a dot notation (still used in physics for time derivatives). Both notations survive today, each with its advantages.


The Constant Rule

The simplest derivative rule states that the derivative of any constant is zero:

The Constant Rule

$\frac{d}{dx}(c) = 0$

where $c$ is any constant

Why Does This Make Sense?

The derivative measures the rate of change. A constant, by definition, doesn't change. If $f(x) = 5$, then no matter what $x$ is, the output is always 5. The function is perfectly flat, so its slope is zero everywhere.

Proof Using the Limit Definition

Let $f(x) = c$ where $c$ is a constant.

$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$
$= \lim_{h \to 0} \frac{c - c}{h}$
$= \lim_{h \to 0} \frac{0}{h}$
$= \lim_{h \to 0} 0 = 0$

| Function | Derivative | Explanation |
| --- | --- | --- |
| f(x) = 7 | f'(x) = 0 | The number 7 never changes |
| f(x) = π | f'(x) = 0 | π is a constant (≈ 3.14159...) |
| f(x) = -100 | f'(x) = 0 | Negative constants also don't change |

The Power Rule

The Power Rule is perhaps the most frequently used differentiation formula. It tells us how to differentiate any power of $x$:

The Power Rule

$\frac{d}{dx}(x^n) = n \cdot x^{n-1}$

for any real number $n$ (wherever $x^{n-1}$ is defined)

In words: bring down the exponent as a coefficient, then reduce the exponent by 1.

Deriving the Power Rule

Let's prove the Power Rule for positive integers using the limit definition and the Binomial Theorem:

Goal: Show that $\frac{d}{dx}(x^n) = nx^{n-1}$ for every positive integer $n$.

Proof: Let $f(x) = x^n$.

$f'(x) = \lim_{h \to 0} \frac{(x+h)^n - x^n}{h}$

By the Binomial Theorem: $(x+h)^n = x^n + nx^{n-1}h + \binom{n}{2}x^{n-2}h^2 + \ldots + h^n$

Substituting:

$= \lim_{h \to 0} \frac{x^n + nx^{n-1}h + \binom{n}{2}x^{n-2}h^2 + \ldots - x^n}{h}$
$= \lim_{h \to 0} \frac{nx^{n-1}h + \binom{n}{2}x^{n-2}h^2 + \ldots}{h}$
$= \lim_{h \to 0} \left[ nx^{n-1} + \binom{n}{2}x^{n-2}h + \ldots \right]$
$= nx^{n-1}$

All terms that still contain $h$ vanish as $h \to 0$, leaving only $nx^{n-1}$.
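The binomial argument can also be checked numerically. The sketch below lists the coefficients of the difference quotient as a polynomial in $h$; the function names `quotient_terms` and `quotient_value` and the sample values `n = 5`, `x = 2` are assumptions made for illustration.

```python
from math import comb


def quotient_terms(n, x):
    """Coefficients of h^0, h^1, ..., h^(n-1) in ((x + h)^n - x^n) / h.

    By the Binomial Theorem, (x + h)^n = sum over k of C(n, k) x^(n-k) h^k.
    Subtracting x^n removes the k = 0 term, and dividing by h lowers
    every remaining power of h by one.
    """
    return [comb(n, k) * x ** (n - k) for k in range(1, n + 1)]


def quotient_value(n, x, h):
    """Evaluate the difference quotient from its expanded coefficients."""
    return sum(c * h ** j for j, c in enumerate(quotient_terms(n, x)))


n, x = 5, 2
terms = quotient_terms(n, x)
print(terms)               # [80, 80, 40, 10, 1]
print(n * x ** (n - 1))    # 80, the only term without a factor of h

# As h shrinks, every term except the first fades away.
for h in (0.1, 0.01, 0.001):
    print(h, quotient_value(n, x, h))
```

Only the first coefficient survives the limit, and it equals $nx^{n-1}$, matching the last line of the proof.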

| f(x) | f'(x) | Pattern |
| --- | --- | --- |
| x | 1 | 1·x⁰ = 1 |
| x² | 2x | 2·x¹ = 2x |
| x³ | 3x² | 3·x² = 3x² |
| x⁴ | 4x³ | 4·x³ = 4x³ |
| x¹⁰ | 10x⁹ | 10·x⁹ = 10x⁹ |

Power Rule Visualizer

[Interactive figure: graph of f(x) = x^n together with its derivative f'(x) = n·x^(n-1) and the tangent line at a chosen point; the exponent n is adjustable.]

For example, with n = 2 at x = 1: f(x) = 1 and f'(x) = 2, so the slope of the tangent line at that point is 2.

The Constant Multiple Rule

Constants can be "pulled out" of derivatives:

The Constant Multiple Rule

$\frac{d}{dx}[c \cdot f(x)] = c \cdot \frac{d}{dx}[f(x)] = c \cdot f'(x)$

Proof

$\frac{d}{dx}[cf(x)] = \lim_{h \to 0} \frac{cf(x+h) - cf(x)}{h}$
$= \lim_{h \to 0} c \cdot \frac{f(x+h) - f(x)}{h}$
$= c \cdot \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$
$= c \cdot f'(x)$

Example: Find the derivative of $f(x) = 5x^3$.

$f'(x) = 5 \cdot \frac{d}{dx}(x^3)$ (Constant Multiple Rule)
$= 5 \cdot 3x^2$ (Power Rule)
$= 15x^2$

Sum and Difference Rules

The derivative of a sum is the sum of the derivatives:

Sum Rule

$\frac{d}{dx}[f(x) + g(x)] = f'(x) + g'(x)$

Difference Rule

$\frac{d}{dx}[f(x) - g(x)] = f'(x) - g'(x)$

Proof of Sum Rule

$\frac{d}{dx}[f(x) + g(x)] = \lim_{h \to 0} \frac{[f(x+h) + g(x+h)] - [f(x) + g(x)]}{h}$
$= \lim_{h \to 0} \frac{[f(x+h) - f(x)] + [g(x+h) - g(x)]}{h}$
$= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} + \lim_{h \to 0} \frac{g(x+h) - g(x)}{h}$
$= f'(x) + g'(x)$

Key Insight

Together, the Sum Rule and the Constant Multiple Rule mean that differentiation is a linear operator. This property is fundamental in advanced mathematics and allows us to differentiate complex expressions term by term.
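Linearity can be spot-checked numerically. The sketch below differentiates a combination $a f + b g$ with a central difference and compares it with $a f' + b g'$; the functions `f` and `g`, the constants `a` and `b`, and the helper `central_difference` are all illustrative assumptions.

```python
def central_difference(f, x, h=1e-5):
    """Symmetric numerical estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def f(x):
    return x ** 2          # f'(x) = 2x


def g(x):
    return 0.5 * x ** 3    # g'(x) = 1.5x^2


a, b = 3.0, -2.0           # arbitrary constants chosen for illustration


def combo(x):
    return a * f(x) + b * g(x)


x0 = 1.0
lhs = central_difference(combo, x0)
rhs = a * central_difference(f, x0) + b * central_difference(g, x0)
print(f"d/dx[a f + b g] at x0 = {lhs:.6f}")
print(f"a f'(x0) + b g'(x0)   = {rhs:.6f}")
```

Both lines should agree with the exact value $3 \cdot 2 + (-2) \cdot 1.5 = 3$ up to rounding error.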

Sum Rule Demo

[Interactive figure: graphs of two functions f(x) and g(x) and of their sum f(x) + g(x), illustrating that the derivative of a sum equals the sum of the derivatives.]

Sum Rule in action, with derivatives $f'(x) = 2x$ and $g'(x) = 1.5x^2$:

At $x = 1$: $f'(1) = 2$ and $g'(1) = 1.5$, so $(f + g)'(1) = 2 + 1.5 = 3.5$.

Combining the Rules: Polynomial Differentiation

With the Power Rule, Constant Multiple Rule, and Sum Rule, we can differentiate any polynomial:

Example 1: Find $\frac{d}{dx}(3x^4 - 5x^2 + 7x - 2)$

$\frac{d}{dx}(3x^4 - 5x^2 + 7x - 2)$
Apply the Sum and Difference Rules (term by term):
$= \frac{d}{dx}(3x^4) - \frac{d}{dx}(5x^2) + \frac{d}{dx}(7x) - \frac{d}{dx}(2)$
Apply the Constant Multiple Rule (and the Constant Rule to the last term):
$= 3\frac{d}{dx}(x^4) - 5\frac{d}{dx}(x^2) + 7\frac{d}{dx}(x) - 0$
Apply the Power Rule:
$= 3(4x^3) - 5(2x) + 7(1)$
$= 12x^3 - 10x + 7$

Example 2: Find $\frac{d}{dx}(x^6 + 2x^5 - x^3 + 4x - 9)$

$= 6x^5 + 2(5x^4) - 3x^2 + 4(1) - 0$
$= 6x^5 + 10x^4 - 3x^2 + 4$
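The term-by-term procedure used in both examples can be sketched as a transformation on coefficient lists. The representation (index `i` holds the coefficient of $x^i$) and the function name `differentiate` are assumptions made for this illustration.

```python
def differentiate(coefficients):
    """Return the coefficient list of the derivative polynomial.

    coefficients[i] is the coefficient of x^i, so [-2, 7, -5, 0, 3]
    represents 3x^4 - 5x^2 + 7x - 2.
    """
    # Power Rule per term: c * x^i becomes (c * i) * x^(i - 1).
    # The i = 0 entry is the constant term, whose derivative is 0, so it is dropped.
    return [c * i for i, c in enumerate(coefficients)][1:]


example_1 = [-2, 7, -5, 0, 3]         # 3x^4 - 5x^2 + 7x - 2
example_2 = [-9, 4, 0, -1, 0, 2, 1]   # x^6 + 2x^5 - x^3 + 4x - 9

print(differentiate(example_1))  # [7, -10, 0, 12], i.e. 12x^3 - 10x + 7
print(differentiate(example_2))  # [4, 0, -3, 0, 10, 6], i.e. 6x^5 + 10x^4 - 3x^2 + 4
```

Reading each output list from the highest index down reproduces the answers to Examples 1 and 2.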

Negative and Fractional Exponents

The Power Rule works for all real exponents, not just positive integers. This greatly extends its usefulness.

Negative Exponents

Recall that $x^{-n} = \frac{1}{x^n}$. The Power Rule still applies:

| Function | Rewrite | Derivative |
| --- | --- | --- |
| 1/x | x⁻¹ | -1·x⁻² = -1/x² |
| 1/x² | x⁻² | -2·x⁻³ = -2/x³ |
| 1/x³ | x⁻³ | -3·x⁻⁴ = -3/x⁴ |

Example: Find $\frac{d}{dx}\left(\frac{3}{x^2}\right)$

$\frac{d}{dx}\left(\frac{3}{x^2}\right) = \frac{d}{dx}(3x^{-2})$
$= 3 \cdot (-2)x^{-3}$
$= -\frac{6}{x^3}$

Fractional Exponents (Roots)

Roots can be written as fractional exponents: $\sqrt{x} = x^{1/2}$, $\sqrt[3]{x} = x^{1/3}$, etc.

| Function | Rewrite | Derivative |
| --- | --- | --- |
| √x | x^(1/2) | (1/2)x^(-1/2) = 1/(2√x) |
| ∛x | x^(1/3) | (1/3)x^(-2/3) |
| x^(3/2) | x^(3/2) | (3/2)x^(1/2) = (3/2)√x |

Example: Find $\frac{d}{dx}(\sqrt{x})$

$\frac{d}{dx}(\sqrt{x}) = \frac{d}{dx}(x^{1/2})$
$= \frac{1}{2}x^{1/2 - 1}$
$= \frac{1}{2}x^{-1/2}$
$= \frac{1}{2\sqrt{x}}$
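This section proves the Power Rule only for positive integers, so a numerical spot check for other exponents can be reassuring. The sketch below compares $n x^{n-1}$ with a central-difference estimate at one positive point; the helper names and the point `x0 = 4.0` are illustrative assumptions, and a check at one point is evidence rather than a proof.

```python
def power_rule(n, x):
    """Derivative of x^n at x, as given by the Power Rule."""
    return n * x ** (n - 1)


def central_difference(f, x, h=1e-6):
    """Symmetric numerical estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x0 = 4.0  # a positive point, so every power below is defined
for n in (-2, -1, 0.5, 1 / 3, 1.5):
    numeric = central_difference(lambda x: x ** n, x0)
    print(f"n = {n:<8.4g} rule = {power_rule(n, x0):+.6f}  numeric = {numeric:+.6f}")
```

For $n = 0.5$ the rule gives $\frac{1}{2\sqrt{4}} = 0.25$, matching the worked example for $\sqrt{x}$.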

Real-World Applications

Physics: Motion

If position is given by a polynomial function of time, velocity and acceleration are found by differentiation:

Problem: A ball is thrown vertically. Its height (in meters) after $t$ seconds is $h(t) = -5t^2 + 20t + 2$.

Find the velocity and acceleration functions.

Solution:

  • Velocity: $v(t) = h'(t) = -10t + 20$ m/s
  • Acceleration: $a(t) = v'(t) = -10$ m/s² (constant; the model rounds gravitational acceleration to 10 m/s²)

At $t = 2$ seconds: $v(2) = -10(2) + 20 = 0$. This is when the ball reaches its maximum height, $h(2) = 22$ meters.
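The motion example translates directly into code. The sketch below writes out the three functions from the solution and locates the peak by solving $v(t) = 0$; the function names are illustrative.

```python
def height(t):
    return -5 * t ** 2 + 20 * t + 2   # h(t), in meters


def velocity(t):
    return -10 * t + 20               # h'(t), in m/s


def acceleration(t):
    return -10                        # v'(t), in m/s^2


# The ball peaks where v(t) = 0, that is, where -10t + 20 = 0.
t_peak = 20 / 10
print(f"v({t_peak}) = {velocity(t_peak)} m/s")
print(f"height at the peak = {height(t_peak)} m")

# Heights just before and just after the peak are lower.
print(height(1.9), height(2.1))
```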

Economics: Marginal Analysis

In economics, the derivative represents "marginal" quantities — the rate of change of one quantity with respect to another:

Problem: A company's cost to produce $x$ units is $C(x) = 0.01x^3 - 0.9x^2 + 35x + 500$ dollars.

Find the marginal cost when producing 30 units.

Solution:

$C'(x) = 0.03x^2 - 1.8x + 35$
$C'(30) = 0.03(900) - 1.8(30) + 35 = 27 - 54 + 35 = 8$

At a production level of 30 units, cost is rising at about 8 dollars per unit, so producing the 31st unit adds approximately 8 dollars to the total cost.
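How good is the marginal-cost approximation? A short sketch, using the cost function from the problem, compares the derivative at 30 units with the actual cost of one more unit; the function and variable names are illustrative.

```python
def cost(x):
    return 0.01 * x ** 3 - 0.9 * x ** 2 + 35 * x + 500


def marginal_cost(x):
    return 0.03 * x ** 2 - 1.8 * x + 35   # C'(x), term by term


estimate = marginal_cost(30)      # derivative at 30 units
actual = cost(31) - cost(30)      # exact extra cost of the 31st unit

print(f"C'(30)        = {estimate:.2f}")
print(f"C(31) - C(30) = {actual:.2f}")
```

The two values differ by about one cent, which is why the derivative is used as a stand-in for the cost of the next unit.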

Biology: Population Growth

Population models often involve derivatives to understand growth rates:

Problem: A bacterial colony population after $t$ hours is modeled by $P(t) = 100 + 50t + 2t^2$.

Find the growth rate at $t = 5$ hours.

Solution:

Growth rate: $P'(t) = 50 + 4t$
At $t = 5$: $P'(5) = 50 + 20 = 70$ bacteria per hour

Machine Learning Connection

The derivative rules are the foundation of gradient-based optimization, which powers virtually all modern machine learning.

Polynomial Regression

In polynomial regression, we fit a model of the form: $y = w_0 + w_1 x + w_2 x^2 + \ldots + w_n x^n$

To minimize the loss function, we need derivatives with respect to each weight $w_i$. Because the model is linear in each weight, the Constant Multiple Rule gives $\frac{\partial y}{\partial w_i} = x^i$, so the polynomial feature $x^i$ is exactly what weight $w_i$ contributes to the gradient.

Gradient Descent

The update rule for gradient descent is: $w_{new} = w_{old} - \alpha \cdot \frac{\partial L}{\partial w}$

Computing $\frac{\partial L}{\partial w}$ requires the derivative rules. For polynomial models:

  • Power Rule: Differentiates powers, such as the square in a squared-error loss
  • Sum Rule: Combines the gradient contributions of the individual terms
  • Constant Rule: Terms that do not depend on a given weight drop out of its gradient; for the bias $w_0$, $\frac{\partial y}{\partial w_0} = 1$
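The update rule is easiest to see on a one-parameter problem. The sketch below minimizes a hypothetical loss $L(w) = (w - 3)^2$, chosen only for illustration, using a gradient obtained with the rules from this section; the starting point, learning rate, and step count are likewise assumptions.

```python
def loss(w):
    return (w - 3) ** 2      # hypothetical loss with its minimum at w = 3


def loss_gradient(w):
    # Expand (w - 3)^2 = w^2 - 6w + 9, then differentiate term by term:
    # the Power Rule gives 2w, the Constant Multiple Rule gives -6,
    # and the Constant Rule gives 0 for the 9.
    return 2 * w - 6


w = 0.0        # starting guess
alpha = 0.1    # learning rate
for step in range(50):
    w = w - alpha * loss_gradient(w)   # w_new = w_old - alpha * dL/dw

print(f"w after 50 steps = {w:.5f}, loss = {loss(w):.2e}")
```

Each step shrinks the distance to the minimum by a factor of $1 - 2\alpha = 0.8$, so `w` moves steadily toward 3.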

Automatic Differentiation

Modern deep learning frameworks like PyTorch and TensorFlow use automatic differentiation to compute gradients. Under the hood, they record the operations of a computation as a graph and apply the same kind of rules we're learning (Power, Sum, Product, Chain) to each operation to compute derivatives efficiently.
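To give a feel for the idea, here is a toy forward-mode sketch that carries a value and its derivative together and applies one rule per operation. It is not how PyTorch or TensorFlow are implemented (they mainly use reverse-mode differentiation over a recorded graph); the `Dual` class and its methods are hypothetical names invented for this illustration.

```python
class Dual:
    """A value paired with its derivative with respect to x."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(other):
        # Constant Rule: a plain number carries derivative 0.
        return other if isinstance(other, Dual) else Dual(other, 0.0)

    def __add__(self, other):
        other = Dual.lift(other)
        # Sum Rule: (f + g)' = f' + g'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        # Difference Rule: (f - g)' = f' - g'
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rmul__(self, c):
        # Constant Multiple Rule: (c f)' = c f', for a plain number c
        return Dual(c * self.value, c * self.deriv)

    def __pow__(self, n):
        # Power Rule combined with the Chain Rule: (f^n)' = n f^(n-1) f'
        return Dual(self.value ** n, n * self.value ** (n - 1) * self.deriv)


def p(x):
    return 3 * x ** 4 - 5 * x ** 2 + 7 * x - 2


x = Dual(2.0, 1.0)   # seed: dx/dx = 1
result = p(x)
print(result.value, result.deriv)   # 40.0 83.0
```

The derivative 83 agrees with $12x^3 - 10x + 7$ evaluated at $x = 2$, the hand-computed derivative of the same polynomial.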


Python Implementation

Computing Polynomial Derivatives

Let's implement the derivative rules in Python:

Implementing Derivative Rules
polynomial_derivatives.py

Power Rule Function (power_rule): This function implements the power rule: d/dx(cx^n) = cnx^(n-1). It takes the coefficient, exponent, and point x as inputs.

Constant Derivative (the exponent == 0 check): When the exponent is 0, we have a constant term (c * x^0 = c). The derivative of a constant is always 0.

Applying the Rule (new_coefficient, new_exponent): The new coefficient is the original coefficient times the exponent. The new exponent is one less than the original.

Polynomial Derivative (polynomial_derivative): Using the Sum Rule, we can differentiate each term separately and add the results. This is one straightforward way to compute polynomial derivatives in code.

def power_rule(coefficient, exponent, x):
    """
    Apply the power rule to find derivative of c * x^n at point x.

    Power Rule: d/dx(c * x^n) = c * n * x^(n-1)

    Args:
        coefficient: The constant multiplier c
        exponent: The power n
        x: The point at which to evaluate

    Returns:
        The derivative value at x
    """
    if exponent == 0:
        return 0  # Derivative of constant is 0

    new_coefficient = coefficient * exponent
    new_exponent = exponent - 1
    return new_coefficient * (x ** new_exponent)

# Examples
print("Power Rule Examples:")
print(f"d/dx(x^2) at x=3: {power_rule(1, 2, 3)}")        # 2*3 = 6
print(f"d/dx(5x^3) at x=2: {power_rule(5, 3, 2)}")      # 15*4 = 60
print(f"d/dx(x^(-1)) at x=2: {power_rule(1, -1, 2)}")   # -1/4 = -0.25

def polynomial_derivative(coefficients, x):
    """
    Compute derivative of a polynomial at point x.

    coefficients[i] is the coefficient of x^i
    Uses Sum Rule: d/dx(f + g) = f' + g'

    Example: [3, 2, 1] represents 3 + 2x + x^2
    """
    derivative = 0
    for power, coef in enumerate(coefficients):
        # Differentiate one term at a time and accumulate (Sum Rule)
        derivative += power_rule(coef, power, x)
    return derivative

# Example: f(x) = 3 + 2x + x^2, f'(x) = 2 + 2x
coeffs = [3, 2, 1]  # 3 + 2x + x^2
print(f"\nPolynomial 3 + 2x + x^2:")
print(f"  f'(1) = {polynomial_derivative(coeffs, 1)}")  # 2 + 2 = 4
print(f"  f'(2) = {polynomial_derivative(coeffs, 2)}")  # 2 + 4 = 6

Application to Machine Learning

Here's how these rules appear in gradient descent for polynomial regression:

Derivative Rules in ML
polynomial_regression.py

Polynomial Feature Gradient (compute_polynomial_features_gradient): Despite its name, this helper only builds the features and the prediction; the gradient itself is formed in gradient_step. Since the model is linear in the weights, the derivative of the prediction with respect to w_i is the feature x^i.

Feature Construction (the features array): Creating [1, x, x^2, ...] computes the powers of x. No differentiation happens here; these powers are reused as the per-weight derivatives of the prediction.

Gradient Computation (gradient_step): The gradient combines the chain rule with our basic rules. The power rule applies to the squared error, the sum rule to the terms of the prediction.

Weight Update (new_weights): Gradient descent uses derivatives at every step. The derivative rules make computing these gradients tractable for any polynomial model.

import numpy as np

# In machine learning, we often need gradients of a loss with respect to model weights

def compute_polynomial_features_gradient(x, weights, degree):
    """
    Build the polynomial features and the prediction needed for the gradient.

    Model: y_pred = w_0 + w_1*x + w_2*x^2 + ... + w_d*x^d
    Loss: L = (y_true - y_pred)^2

    For SGD, we need dL/dw_i for each weight.

    Using the chain rule and power rule on the square, with
    d(y_pred)/dw_i = x^i:
    dL/dw_i = -2 * (y_true - y_pred) * x^i

    Returns the feature vector [1, x, ..., x^d] and the prediction y_pred.
    """
    # Create polynomial features: [1, x, x^2, ..., x^d]
    features = np.array([x ** i for i in range(degree + 1)])

    # Prediction
    y_pred = np.dot(weights, features)

    return features, y_pred

# Simple gradient descent step
def gradient_step(x, y_true, weights, learning_rate, degree):
    """
    One step of gradient descent for polynomial regression.

    The derivative rules appear here:
    - Power rule (with the chain rule): d/du(u^2) = 2u for the squared error
    - Sum rule: y_pred is a sum of terms, differentiated one term at a time
    - Constant multiple rule: d/dw_i(w_i * x^i) = x^i, so the bias (i = 0)
      gradient is just -2 * error
    """
    features, y_pred = compute_polynomial_features_gradient(x, weights, degree)
    error = y_true - y_pred

    # Gradient for each weight: dL/dw_i = -2 * error * x^i
    gradient = -2 * error * features

    # Update weights
    new_weights = weights - learning_rate * gradient

    return new_weights, error ** 2

# Example: Fit y = x^2 with polynomial regression
np.random.seed(42)
weights = np.random.randn(3) * 0.1  # degree 2: [w_0, w_1, w_2]
learning_rate = 0.01

print("Training polynomial regression:")
for step in range(5):
    x = np.random.uniform(-2, 2)
    y_true = x ** 2  # True function
    weights, loss = gradient_step(x, y_true, weights, learning_rate, 2)
    print(f"Step {step+1}: weights = {weights.round(3)}, loss = {loss:.4f}")

# Five steps are only a demonstration; many more are needed to get close to [0, 0, 1]
print(f"\nWeights after 5 steps (target is [0, 0, 1]): {weights.round(3)}")
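A common way to gain confidence in a hand-derived gradient is to compare it with finite differences. The sketch below does this for the formula $\frac{\partial L}{\partial w_i} = -2\,(y_{true} - y_{pred})\,x^i$ in plain Python; the helper names and the sample weights and data point are illustrative assumptions.

```python
def predict(weights, x):
    """Polynomial model: w_0 + w_1*x + w_2*x^2 + ..."""
    return sum(w * x ** i for i, w in enumerate(weights))


def loss(weights, x, y_true):
    return (y_true - predict(weights, x)) ** 2


def analytic_gradient(weights, x, y_true):
    """Gradient from the derivative rules: dL/dw_i = -2 * error * x^i."""
    error = y_true - predict(weights, x)
    return [-2 * error * x ** i for i in range(len(weights))]


def numerical_gradient(weights, x, y_true, h=1e-6):
    """Central-difference estimate of dL/dw_i, one weight at a time."""
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += h
        down[i] -= h
        grads.append((loss(up, x, y_true) - loss(down, x, y_true)) / (2 * h))
    return grads


weights = [0.5, -1.0, 2.0]   # arbitrary illustrative values
x, y_true = 1.5, 2.25

print("analytic: ", analytic_gradient(weights, x, y_true))
print("numerical:", [round(g, 6) for g in numerical_gradient(weights, x, y_true)])
```

If the two lists disagreed beyond rounding error, that would point to a mistake in the derivation or in the code.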

Common Mistakes to Avoid

Mistake 1: Forgetting to reduce the exponent

Wrong: $\frac{d}{dx}(x^5) = 5x^5$

Correct: $\frac{d}{dx}(x^5) = 5x^4$

The Power Rule requires reducing the exponent by 1.

Mistake 2: Treating x like a constant

Wrong: $\frac{d}{dx}(x) = 0$

Correct: $\frac{d}{dx}(x) = \frac{d}{dx}(x^1) = 1 \cdot x^0 = 1$

The variable $x$ is not a constant; it is $x^1$.

Mistake 3: Confusing coefficient and exponent

Wrong: $\frac{d}{dx}(3x^2) = 6x^2$

Correct: $\frac{d}{dx}(3x^2) = 3 \cdot 2x^1 = 6x$

The 3 is a coefficient (it stays), the 2 comes down as a multiplier, and the exponent is reduced by 1.

Mistake 4: Treating a linear term as a constant

Wrong: $\frac{d}{dx}(5x) = 0$

Correct: $\frac{d}{dx}(5x) = 5 \cdot 1 = 5$

$5x = 5x^1$, so the derivative is $5 \cdot 1 \cdot x^0 = 5$.
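A quick numerical check can catch all four mistakes. The sketch below compares each wrong and correct answer with a central-difference estimate at a single point; the helper `central_difference` and the test point `x0 = 2.0` are illustrative assumptions.

```python
def central_difference(f, x, h=1e-6):
    """Symmetric numerical estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x0 = 2.0
# (label, function, wrong answer at x0, correct answer at x0)
cases = [
    ("d/dx(x^5)",  lambda x: x ** 5,     5 * x0 ** 5, 5 * x0 ** 4),
    ("d/dx(x)",    lambda x: x,          0.0,         1.0),
    ("d/dx(3x^2)", lambda x: 3 * x ** 2, 6 * x0 ** 2, 6 * x0),
    ("d/dx(5x)",   lambda x: 5 * x,      0.0,         5.0),
]

for label, f, wrong, correct in cases:
    numeric = central_difference(f, x0)
    print(f"{label:<11} numeric = {numeric:8.4f}  correct = {correct:8.4f}  wrong = {wrong:8.4f}")
```

In every row the numerical estimate sits next to the correct value and far from the wrong one.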


Test Your Understanding


What is the derivative of f(x) = x⁵?


Summary

The derivative rules provide powerful shortcuts for computing derivatives without using the limit definition every time.

The Core Rules

| Rule | Formula | Example |
| --- | --- | --- |
| Constant Rule | d/dx(c) = 0 | d/dx(7) = 0 |
| Power Rule | d/dx(xⁿ) = nxⁿ⁻¹ | d/dx(x³) = 3x² |
| Constant Multiple | d/dx[cf(x)] = cf'(x) | d/dx(5x²) = 10x |
| Sum Rule | d/dx[f + g] = f' + g' | d/dx(x² + x) = 2x + 1 |
| Difference Rule | d/dx[f - g] = f' - g' | d/dx(x³ - x) = 3x² - 1 |

Key Takeaways

  1. The Power Rule is the workhorse: $\frac{d}{dx}(x^n) = nx^{n-1}$ for any real $n$
  2. Constants have zero derivative because they don't change
  3. The Sum Rule allows term-by-term differentiation of polynomials
  4. The Power Rule works for negative and fractional exponents too
  5. These rules are building blocks for the Product, Quotient, and Chain Rules
  6. Machine learning uses these rules constantly in gradient computation
The Power of Shortcuts:
"With these three rules, we can differentiate any polynomial instantly — a task that would take pages using limits."
Coming Next: In the next section, we'll learn the Product Rule — how to differentiate products of functions, which is essential when our functions can't be written as simple polynomials.