Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Define partial derivatives as rates of change with respect to one variable while holding others constant
Compute partial derivatives using standard differentiation rules
Interpret partial derivatives geometrically as slopes of tangent lines on surfaces
Calculate higher-order partial derivatives and understand Clairaut's theorem
Apply partial derivatives in physics, engineering, and machine learning contexts
Connect partial derivatives to gradients and optimization algorithms

The Big Picture: Rates of Change in Multiple Directions

"The partial derivative answers: How fast is this quantity changing if I only adjust one input while freezing all the others?"

In single-variable calculus, the derivative $f'(x)$ tells us how fast $f$ changes as $x$ changes. But what happens when a function depends on multiple variables?

Consider the temperature $T(x, y, t)$ at position $(x, y)$ and time $t$. You might ask:

How does temperature change as I move east (increasing x)?
How does temperature change as I move north (increasing y)?
How does temperature change as time passes (increasing t)?

Each question isolates one variable while holding the others fixed. The answers are partial derivatives.

Where Partial Derivatives Appear

Physics

Heat equation: temperature diffusion
Wave equation: vibrating strings
Maxwell's equations: electromagnetism
Fluid dynamics: Navier-Stokes equations

Machine Learning

Gradient descent: optimize loss functions
Backpropagation: train neural networks
Sensitivity analysis: feature importance
Jacobians: coordinate transformations

Economics

Marginal cost and revenue
Utility maximization
Production functions
Price elasticity

Engineering

Stress-strain analysis
Control systems
Signal processing
Thermodynamics

Historical Context: The Birth of Multivariable Calculus

Partial derivatives emerged in the 18th century as mathematicians tackled problems involving multiple changing quantities. Leonhard Euler (1707–1783) was among the first to systematically use partial derivatives, particularly in his work on fluid dynamics and the calculus of variations.

Joseph-Louis Lagrange (1736–1813) extended these ideas to mechanics, using partial derivatives to reformulate Newton's laws in terms of energy. His Mécanique Analytique (1788) laid the groundwork for modern physics.

Alexis Clairaut (1713–1765) proved the famous theorem bearing his name: that mixed partial derivatives are equal (under continuity conditions). This result simplifies countless calculations in applied mathematics.

The Notation

The "∂" symbol (a rounded "d") was introduced by Adrien-Marie Legendre in 1786 to distinguish partial derivatives from ordinary derivatives. This notation, refined by Carl Jacobi, became standard by the mid-19th century.

Definition of Partial Derivatives

Definition: Partial Derivative

Let $f(x, y)$ be a function of two variables. The partial derivative of f with respect to x at $(a, b)$ is:

$$\frac{\partial f}{\partial x}(a, b) = \lim_{h \to 0} \frac{f(a + h, b) - f(a, b)}{h}$$

Similarly, the partial derivative of f with respect to y at $(a, b)$ is:

$$\frac{\partial f}{\partial y}(a, b) = \lim_{h \to 0} \frac{f(a, b + h) - f(a, b)}{h}$$

Key insight: When computing $\partial f/\partial x$, we treat $y$ as a constant. When computing $\partial f/\partial y$, we treat $x$ as a constant.

This is exactly like taking an ordinary derivative, but with some variables "frozen" at fixed values.
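To make the limit concrete, here is a minimal sketch that evaluates the difference quotient from the definition for shrinking values of h. The function $f(x, y) = x^2 y$ and the point $(3, 2)$ are illustrative choices, not taken from the lesson.

```python
def f(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x**2 * y


def difference_quotient_x(f, a, b, h):
    """Quotient from the limit definition: only x moves, y stays frozen at b."""
    return (f(a + h, b) - f(a, b)) / h


a, b = 3.0, 2.0
exact = 2 * a * b  # the partial of x^2 * y with respect to x is 2xy

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = difference_quotient_x(f, a, b, h)
    print(f"h = {h:<7g} quotient = {approx:.6f}  error = {abs(approx - exact):.2e}")
```

The error shrinks roughly in proportion to h, which is what the limit in the definition predicts.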

Notation for Partial Derivatives

There are several common notations for partial derivatives. All of the following mean "the partial derivative of f with respect to x":

Notation	Name	Common Usage
∂f/∂x	Leibniz notation	General, emphasizes the variable
fₓ	Subscript notation	Quick, common in applied math
f₁	Numeric subscript	When variables are numbered (x₁, x₂, ...)
∂ₓf	Operator notation	Emphasizes partial differentiation as an operator
Dₓf	D notation	Alternative operator notation

For evaluated partial derivatives:

$\frac{\partial f}{\partial x}\bigg|_{(a,b)}$ or $\frac{\partial f}{\partial x}(a,b)$ — at the point (a, b)
$f_x(a, b)$ — subscript notation at a point

Choosing Notation

Use $\partial f/\partial x$ when you want to emphasize which variable is changing. Use $f_x$ for compactness in longer expressions. Be consistent within a single problem.

Geometric Interpretation

The partial derivative $\partial f/\partial x$ at a point has a beautiful geometric meaning. Imagine the surface $z = f(x, y)$:

Fix y = b: This creates a vertical plane parallel to the xz-plane.
Intersect with surface: The plane cuts the surface, creating a curve.
The partial derivative: $\partial f/\partial x$ is the slope of this curve at the point.

Similarly, $\partial f/\partial y$ is the slope of the curve obtained by slicing the surface with a plane parallel to the yz-plane (fixing x).

Geometric Summary

$\frac{\partial f}{\partial x}$: Slope in the x-direction (walking east on the surface)
$\frac{\partial f}{\partial y}$: Slope in the y-direction (walking north on the surface)
Together, these determine the tangent plane to the surface
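As a rough illustration of how the two slopes pin down a plane, the sketch below builds the plane through a point of the paraboloid $f(x, y) = x^2 + y^2$ using its x-slope and y-slope, then compares plane and surface nearby. The helper name `tangent_plane` and the chosen point are assumptions made for this example.

```python
def f(x, y):
    return x**2 + y**2


def f_x(x, y):
    return 2 * x  # slope in the x-direction


def f_y(x, y):
    return 2 * y  # slope in the y-direction


def tangent_plane(a, b):
    """Return a function giving the height of the plane touching the surface at (a, b)."""
    z0, slope_x, slope_y = f(a, b), f_x(a, b), f_y(a, b)
    return lambda x, y: z0 + slope_x * (x - a) + slope_y * (y - b)


plane = tangent_plane(0.5, 0.5)
for offset in (0.1, 0.01):
    x, y = 0.5 + offset, 0.5 + offset
    print(f"offset {offset}: surface = {f(x, y):.6f}, plane = {plane(x, y):.6f}")
```

The gap between surface and plane shrinks much faster than the offset itself, which is the sense in which the plane approximates the surface near the point.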

Interactive 3D Visualizer

Explore partial derivatives on various surfaces. The red line shows the tangent in the x-direction (slope = $\partial f/\partial x$), and the green line shows the tangent in the y-direction (slope = $\partial f/\partial y$).

Sample readout for the surface $f(x, y) = x^2 + y^2$ at the point $(0.50, 0.50)$: $f = 0.5000$, $\partial f/\partial x = 1.0000$, $\partial f/\partial y = 1.0000$.

∂f/∂x (Red Line)

Rate of change when moving in the x-direction, keeping y constant. The slope of the red curve at the point.

∂f/∂y (Green Line)

Rate of change when moving in the y-direction, keeping x constant. The slope of the green curve at the point.

Key Insight

Partial derivatives give the slope along coordinate axes. Together, they define the tangent plane to the surface.

What to Explore

Move the point around — watch how partial derivatives change
Try the saddle surface — partials have opposite signs!
Find points where both partials are zero — these are critical points
Toggle the slices — see the 1D curves whose slopes give the partials

Computing Partial Derivatives

To compute a partial derivative, simply apply standard differentiation rules while treating other variables as constants.

The Key Rule

When computing $\partial f/\partial x$, treat y (and all other variables) as constants.

Example 1: Polynomial

Find the partial derivatives of $f(x, y) = x^3 + 2x^2y - 3y^2$.

∂f/∂x: Treat y as constant.

$\frac{\partial f}{\partial x} = 3x^2 + 4xy - 0 = 3x^2 + 4xy$

∂f/∂y: Treat x as constant.

$\frac{\partial f}{\partial y} = 0 + 2x^2 - 6y = 2x^2 - 6y$

Example 2: Exponential

Find the partial derivatives of $f(x, y) = e^{xy}$.

∂f/∂x: Use chain rule, with y as a constant coefficient.

$\frac{\partial f}{\partial x} = e^{xy} \cdot y = ye^{xy}$

∂f/∂y: Use chain rule, with x as a constant coefficient.

$\frac{\partial f}{\partial y} = e^{xy} \cdot x = xe^{xy}$

Example 3: Trigonometric

Find the partial derivatives of $f(x, y) = \sin(x^2 y)$.

∂f/∂x: Chain rule with inner function $u = x^2 y$.

$\frac{\partial f}{\partial x} = \cos(x^2 y) \cdot 2xy = 2xy\cos(x^2 y)$

∂f/∂y: Chain rule with inner function.

$\frac{\partial f}{\partial y} = \cos(x^2 y) \cdot x^2 = x^2\cos(x^2 y)$
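One way to gain confidence in the three hand computations above is to compare them with central-difference estimates. The sketch below does that at an arbitrarily chosen test point; the helper names and the point are assumptions for illustration only.

```python
import math


def central_diff(g, t, h=1e-6):
    """Central-difference estimate of the ordinary derivative g'(t)."""
    return (g(t + h) - g(t - h)) / (2 * h)


def partial_x(f, x, y):
    return central_diff(lambda t: f(t, y), x)  # y is frozen


def partial_y(f, x, y):
    return central_diff(lambda t: f(x, t), y)  # x is frozen


# Each entry: (label, f, hand-computed f_x, hand-computed f_y) from Examples 1 to 3.
examples = [
    ("x^3 + 2x^2 y - 3y^2",
     lambda x, y: x**3 + 2 * x**2 * y - 3 * y**2,
     lambda x, y: 3 * x**2 + 4 * x * y,
     lambda x, y: 2 * x**2 - 6 * y),
    ("exp(xy)",
     lambda x, y: math.exp(x * y),
     lambda x, y: y * math.exp(x * y),
     lambda x, y: x * math.exp(x * y)),
    ("sin(x^2 y)",
     lambda x, y: math.sin(x**2 * y),
     lambda x, y: 2 * x * y * math.cos(x**2 * y),
     lambda x, y: x**2 * math.cos(x**2 * y)),
]

x0, y0 = 0.7, -0.4  # arbitrary test point
max_error = 0.0
for label, f, fx, fy in examples:
    err_x = abs(partial_x(f, x0, y0) - fx(x0, y0))
    err_y = abs(partial_y(f, x0, y0) - fy(x0, y0))
    max_error = max(max_error, err_x, err_y)
    print(f"{label:22s} error in f_x: {err_x:.1e}   error in f_y: {err_y:.1e}")
```

If a hand-derived formula were wrong, its error would stand out by many orders of magnitude.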

Step-by-Step Calculator

Work through several examples with guided step-by-step solutions. Click through each step to see how to compute partial derivatives for different types of functions.

Worked example: $f(x, y) = x^3 + 2xy^2 - 3y$

Computing $\partial f/\partial x$ (derivative with respect to x)

Treat y as a constant: hold $y$ fixed.

Differentiate $x^3$: $\frac{\partial}{\partial x}(x^3) = 3x^2$

Differentiate $2xy^2$ ($y^2$ is constant): $\frac{\partial}{\partial x}(2xy^2) = 2y^2$

Differentiate $-3y$ (constant): $\frac{\partial}{\partial x}(-3y) = 0$

Combine all terms: $\frac{\partial f}{\partial x} = 3x^2 + 2y^2$

Key Rule

When computing ∂f/∂x, treat all other variables as constants. Apply standard differentiation rules (power rule, chain rule, product rule, etc.) just as you would for a function of one variable.

Higher-Order Partial Derivatives

Just as we can take second derivatives in single-variable calculus, we can take second partial derivatives — and there are more options!

Second Partial Derivatives

For $f(x, y)$, there are four second partial derivatives:

Notation	Meaning
∂²f/∂x² or f_xx	Differentiate with respect to x twice
∂²f/∂y² or f_yy	Differentiate with respect to y twice
∂²f/∂x∂y or f_yx	Differentiate with respect to y, then x
∂²f/∂y∂x or f_xy	Differentiate with respect to x, then y

Example

Find all second partial derivatives of $f(x, y) = x^2 y^3$.

First partials:

$f_x = 2xy^3, \quad f_y = 3x^2 y^2$

Second partials:

$f_{xx} = \frac{\partial}{\partial x}(2xy^3) = 2y^3$

$f_{yy} = \frac{\partial}{\partial y}(3x^2 y^2) = 6x^2 y$

$f_{xy} = \frac{\partial}{\partial y}(2xy^3) = 6xy^2$

$f_{yx} = \frac{\partial}{\partial x}(3x^2 y^2) = 6xy^2$

Notice: $f_{xy} = f_{yx} = 6xy^2$!
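The equality can also be checked numerically. This sketch starts from the two first partials computed by hand above, differentiates each once more in the other variable with a central difference, and compares both results with $6xy^2$. The test point is an arbitrary choice.

```python
def f_x(x, y):
    return 2 * x * y**3  # first partial with respect to x, from the example


def f_y(x, y):
    return 3 * x**2 * y**2  # first partial with respect to y, from the example


def central_diff(g, t, h=1e-5):
    return (g(t + h) - g(t - h)) / (2 * h)


x0, y0 = 1.5, 2.0
f_xy = central_diff(lambda t: f_x(x0, t), y0)  # x first, then y
f_yx = central_diff(lambda t: f_y(t, y0), x0)  # y first, then x
exact = 6 * x0 * y0**2

print(f"f_xy ~ {f_xy:.6f}")
print(f"f_yx ~ {f_yx:.6f}")
print(f"6xy^2 = {exact:.6f}")
```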

Clairaut's Theorem: Equality of Mixed Partials

Clairaut's Theorem

If $f$ is a function defined on a disk $D$ containing $(a, b)$, and if $f_{xy}$ and $f_{yx}$ are both continuous on $D$, then:

$$\frac{\partial^2 f}{\partial x \partial y}(a, b) = \frac{\partial^2 f}{\partial y \partial x}(a, b)$$

In other words: The order of differentiation does not matter (provided the mixed partials are continuous).

Why This Matters

Clairaut's theorem dramatically simplifies computations. Instead of computing both $f_{xy}$ and $f_{yx}$, we can compute whichever is easier and know they're equal. For a function of $n$ variables, this reduces the number of independent second partials from $n^2$ to $n(n+1)/2$.

Partial Derivatives, Contours, and the Gradient

The partial derivatives combine to form the gradient vector:

$$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$$

The gradient has two remarkable properties:

Direction of steepest ascent: The gradient points in the direction where $f$ increases most rapidly.
Perpendicular to contours: The gradient is always perpendicular to level curves (contour lines).
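Both properties can be checked on the paraboloid $f(x, y) = x^2 + y^2$, whose contours are circles around the origin. The sketch below is only an illustration under that assumption. It uses the standard fact that the rate of change in a unit direction $u$ is $\nabla f \cdot u$, and scans whole-degree directions to find the steepest one.

```python
import math


def grad_f(x, y):
    """Gradient of f(x, y) = x^2 + y^2."""
    return (2 * x, 2 * y)


x0, y0 = 0.5, 0.5
gx, gy = grad_f(x0, y0)

# The contour through (x0, y0) is a circle, so (-y0, x0) is tangent to it there.
tx, ty = -y0, x0
dot = gx * tx + gy * ty
print(f"gradient . contour tangent = {dot}")


def directional_rate(degrees):
    """Rate of change of f when stepping in the unit direction at this angle."""
    theta = math.radians(degrees)
    return gx * math.cos(theta) + gy * math.sin(theta)


best_angle = max(range(360), key=directional_rate)
grad_angle = math.degrees(math.atan2(gy, gx))
print(f"steepest direction found by scanning: {best_angle} degrees")
print(f"direction of the gradient: {grad_angle:.1f} degrees")
```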

Explore this relationship in the interactive contour plot below:

Contour Plot with Gradient and Partial Derivatives

Sample readout at $(0.50, 0.50)$: $f = 0.5000$, $\partial f/\partial x = 1.0000$, $\partial f/\partial y = 1.0000$, $\nabla f = (1.00, 1.00)$, $|\nabla f| = 1.4142$.

Tip: Click and drag to move the point. The gradient (purple) always points in the direction of steepest ascent, perpendicular to contour lines.

∂f/∂x (Red)

Rate of change in x-direction. Points right when increasing, left when decreasing.

∂f/∂y (Green)

Rate of change in y-direction. Points up when increasing, down when decreasing.

∇f (Purple)

The gradient vector combines both partials. It always points perpendicular to contour lines.

Applications in Physics

The Heat Equation

Temperature distribution $u(x, t)$ in a rod satisfies:

$$\frac{\partial u}{\partial t} = k \frac{\partial^2 u}{\partial x^2}$$

This says the rate of temperature change at a point is proportional to the "curvature" of the temperature profile (how much hotter or colder it is than its neighbors).
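The "hotter or colder than its neighbors" reading becomes literal once the rod is discretized. The sketch below is one simple explicit finite-difference scheme, not the only way to solve the equation, and the rod, grid spacing, time step, and initial hot spot are all hypothetical values chosen for illustration.

```python
def heat_step(u, k, dx, dt):
    """One explicit time step of u_t = k * u_xx with both ends held fixed."""
    new_u = u[:]  # endpoints are copied unchanged (fixed-temperature boundary)
    for i in range(1, len(u) - 1):
        # Discrete curvature: compares u[i] with the average of its two neighbours.
        curvature = (u[i + 1] - 2 * u[i] + u[i - 1]) / dx**2
        new_u[i] = u[i] + dt * k * curvature
    return new_u


# Hypothetical setup: a cold rod with a single hot spot in the middle.
k, dx, dt = 1.0, 0.1, 0.0025  # this scheme needs dt <= dx^2 / (2k) to stay stable
u = [0.0] * 11
u[5] = 100.0

for step in range(3):
    u = heat_step(u, k, dx, dt)
    print(f"step {step + 1}: centre = {u[5]:.2f}, neighbours = {u[4]:.2f}, {u[6]:.2f}")
```

The hot spot has strongly negative curvature, so it cools, while its colder neighbors have positive curvature and warm up.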

The Wave Equation

Vibrations of a string $u(x, t)$ satisfy:

$$\frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}$$

This relates acceleration ($\partial^2 u/\partial t^2$) to curvature ($\partial^2 u/\partial x^2$).
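As a quick check of this relation, the sketch below takes a travelling wave $u(x, t) = \sin(x - ct)$, a standard solution of the wave equation, and estimates both second partials with finite differences. The wave speed and sample point are hypothetical values.

```python
import math

c = 2.0  # hypothetical wave speed


def u(x, t):
    """A travelling wave moving to the right at speed c."""
    return math.sin(x - c * t)


def second_diff(g, s, h=1e-4):
    """Central-difference estimate of the second derivative g''(s)."""
    return (g(s + h) - 2 * g(s) + g(s - h)) / h**2


x0, t0 = 0.8, 0.3
u_tt = second_diff(lambda t: u(x0, t), t0)  # acceleration: x frozen
u_xx = second_diff(lambda x: u(x, t0), x0)  # curvature: t frozen
residual = u_tt - c**2 * u_xx

print(f"u_tt       = {u_tt:.6f}")
print(f"c^2 * u_xx = {c**2 * u_xx:.6f}")
print(f"residual   = {residual:.2e}")
```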

Electromagnetism

Maxwell's equations involve partial derivatives of electric and magnetic fields:

$$\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}$$

and

$$\nabla \times \mathbf{B} = \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}$$

Applications in Machine Learning

Partial derivatives are the foundation of machine learning optimization. Every training step of a neural network computes one partial derivative of the loss for each trainable weight.

Loss Functions and Gradients

A neural network has a loss function $L(w_1, w_2, \ldots, w_n)$ depending on all weights. To minimize the loss, we compute the gradient:

$$\nabla L = \left( \frac{\partial L}{\partial w_1}, \frac{\partial L}{\partial w_2}, \ldots, \frac{\partial L}{\partial w_n} \right)$$

Gradient Descent

To minimize $L$, we update weights in the opposite direction of the gradient:

$$w_i \leftarrow w_i - \alpha \frac{\partial L}{\partial w_i}$$

where $\alpha$ is the learning rate.

Backpropagation = Chain Rule for Partial Derivatives

The famous backpropagation algorithm is simply the chain rule applied to compute all partial derivatives efficiently. For a weight $w$ feeding a neuron with weighted input $z$ and activation $a$:

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w}$$

Why "Back" Propagation?

We propagate derivatives backward through the network: starting from the loss, we compute how the loss changes with respect to each layer's output, then with respect to each weight.
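The three-factor chain rule can be traced by hand for a single neuron. The sketch below assumes a sigmoid activation and a squared-error loss, which are illustrative choices rather than anything fixed by this section, and compares the chain-rule result with a central-difference estimate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Squared error of one sigmoid neuron: z = w*x + b, a = sigmoid(z)."""
    a = sigmoid(w * x + b)
    return (a - y) ** 2


def dloss_dw(w, b, x, y):
    """Chain rule, working backward from the loss to the weight."""
    z = w * x + b
    a = sigmoid(z)
    dL_da = 2 * (a - y)   # how the loss reacts to the activation
    da_dz = a * (1 - a)   # derivative of the sigmoid
    dz_dw = x             # z = w*x + b is linear in w
    return dL_da * da_dz * dz_dw


# Hypothetical weight, bias, input, and target.
w, b, x, y = 0.5, -0.2, 1.5, 1.0
h = 1e-6
numeric = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2 * h)
analytic = dloss_dw(w, b, x, y)

print(f"chain rule:         {analytic:.8f}")
print(f"central difference: {numeric:.8f}")
```

In a multi-layer network the same product simply gains more factors, one per layer between the weight and the loss.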

Python Implementation

Computing Partial Derivatives Numerically

partial_derivatives.py

Numerical Partial Derivatives

We compute partial derivatives numerically using the central difference formula. This approximates the derivative by looking at function values on either side of the point.

Example: ∂f/∂x ≈ [f(x+h, y) - f(x-h, y)] / (2h)

Central Difference Method

The central difference formula is more accurate than forward or backward differences. It has O(h²) error rather than O(h) error.

Example Function

We test with f(x,y) = x²y + sin(y). This has mixed polynomial and trigonometric terms to demonstrate how partial derivatives treat other variables as constants.

Analytical Derivatives

For comparison, we compute exact partial derivatives: ∂f/∂x = 2xy (treating y as constant) and ∂f/∂y = x² + cos(y) (treating x as constant).

import numpy as np

def partial_derivative_numeric(f, point, var_index, h=1e-5):
    """
    Compute partial derivative numerically using central difference.

    Parameters:
    - f: function taking a tuple/array of variables
    - point: tuple/array of values (x, y, z, ...)
    - var_index: which variable to differentiate (0=x, 1=y, etc.)
    - h: step size for numerical differentiation

    Returns: approximate partial derivative at the point
    """
    point = np.array(point, dtype=float)

    # Create points offset in the direction of var_index
    point_plus = point.copy()
    point_minus = point.copy()
    point_plus[var_index] += h
    point_minus[var_index] -= h

    # Central difference formula
    return (f(point_plus) - f(point_minus)) / (2 * h)

# Example: f(x, y) = x^2 * y + sin(y)
def f(p):
    x, y = p
    return x**2 * y + np.sin(y)

# Analytical partial derivatives for comparison
def f_x(p):
    x, y = p
    return 2 * x * y  # ∂f/∂x

def f_y(p):
    x, y = p
    return x**2 + np.cos(y)  # ∂f/∂y

# Test at point (1, π/4)
point = (1.0, np.pi/4)

# Numerical partial derivatives
fx_num = partial_derivative_numeric(f, point, 0)
fy_num = partial_derivative_numeric(f, point, 1)

# Analytical partial derivatives
fx_exact = f_x(point)
fy_exact = f_y(point)

print("Function: f(x,y) = x²y + sin(y)")
print(f"At point ({point[0]}, π/4):")
print()
print("Partial with respect to x:")
print(f"  Numerical: {fx_num:.6f}")
print(f"  Analytical: {fx_exact:.6f}")
print(f"  Error: {abs(fx_num - fx_exact):.2e}")
print()
print("Partial with respect to y:")
print(f"  Numerical: {fy_num:.6f}")
print(f"  Analytical: {fy_exact:.6f}")
print(f"  Error: {abs(fy_num - fy_exact):.2e}")

The Gradient and Gradient Descent

Gradient Computation and Optimization

gradient_descent.py

The Gradient Vector

The gradient ∇f is a vector containing all partial derivatives. For f(x,y), ∇f = (∂f/∂x, ∂f/∂y). It points in the direction of steepest ascent.

Computing Each Component

We loop through each variable and compute its partial derivative using the central difference formula. Each becomes one component of the gradient vector.

Gradient Direction

The gradient always points toward the direction of steepest increase. Its magnitude tells you how steep that direction is. At (1,0), ∇f = (2,0) points purely in the x-direction.

Gradient Descent

To minimize a function, we move opposite to the gradient direction: x_new = x_old - α·∇f. The learning rate α controls step size. This is the foundation of neural network training!

import numpy as np

def compute_gradient(f, point, h=1e-5):
    """
    Compute the gradient vector ∇f at a point.

    The gradient is the vector of all partial derivatives:
    ∇f = (∂f/∂x, ∂f/∂y, ∂f/∂z, ...)
    """
    point = np.array(point, dtype=float)
    n = len(point)
    gradient = np.zeros(n)

    # One central difference per coordinate, with all other coordinates frozen
    for i in range(n):
        point_plus = point.copy()
        point_minus = point.copy()
        point_plus[i] += h
        point_minus[i] -= h
        gradient[i] = (f(point_plus) - f(point_minus)) / (2 * h)

    return gradient

# Example: f(x, y) = x² + y² (paraboloid)
def paraboloid(p):
    return p[0]**2 + p[1]**2

# Test points
points = [
    (1.0, 0.0),
    (0.0, 1.0),
    (1.0, 1.0),
    (0.5, 0.5),
]

print("Gradient of f(x,y) = x² + y²")
print("=" * 40)
print()

for pt in points:
    grad = compute_gradient(paraboloid, pt)
    mag = np.linalg.norm(grad)
    print(f"At ({pt[0]:.1f}, {pt[1]:.1f}):")
    print(f"  ∇f = ({grad[0]:.2f}, {grad[1]:.2f})")
    print(f"  |∇f| = {mag:.2f}")
    print(f"  Direction: {np.degrees(np.arctan2(grad[1], grad[0])):.1f}°")
    print()

# Machine Learning Connection: Gradient Descent
print("=" * 40)
print("Gradient Descent for Minimization")
print("=" * 40)
print()

def gradient_descent(f, start, learning_rate=0.1, n_iterations=20):
    """
    Simple gradient descent to find minimum.
    Updates: x_new = x_old - α * ∇f
    """
    x = np.array(start, dtype=float)
    history = [x.copy()]

    for i in range(n_iterations):
        grad = compute_gradient(f, x)
        x = x - learning_rate * grad
        history.append(x.copy())
        # Print only the first five and last two steps to keep the output short
        if i < 5 or i >= n_iterations - 2:
            print(f"Step {i+1}: x = ({x[0]:.4f}, {x[1]:.4f}), f(x) = {f(x):.6f}")
        elif i == 5:
            print("...")

    return x, history

start = (2.0, 2.0)
minimum, path = gradient_descent(paraboloid, start)
print(f"\nFound minimum at ({minimum[0]:.6f}, {minimum[1]:.6f})")
print(f"Function value at minimum: {paraboloid(minimum):.6f}")

Higher-Order Partial Derivatives

Second-Order and Mixed Partials

higher_order_partials.py

Second Derivatives

∂²f/∂x² means differentiate twice with respect to x. This measures how the slope changes: the concavity of the function in the x-direction.

Mixed Partials

∂²f/∂x∂y means first differentiate with respect to y, then with respect to x. This measures how the y-slope changes as x changes.

Clairaut's Theorem

For functions with continuous second partial derivatives, ∂²f/∂x∂y = ∂²f/∂y∂x. The order of differentiation does not matter!

Hessian Matrix

The four second partials form the Hessian matrix H = [[∂²f/∂x², ∂²f/∂x∂y], [∂²f/∂y∂x, ∂²f/∂y²]]. This is crucial for optimization and determines if a critical point is a min, max, or saddle.

import numpy as np

def second_partial_xx(f, point, h=1e-4):
    """Second partial derivative ∂²f/∂x²"""
    x, y = point
    return (f((x+h, y)) - 2*f((x, y)) + f((x-h, y))) / h**2

def second_partial_yy(f, point, h=1e-4):
    """Second partial derivative ∂²f/∂y²"""
    x, y = point
    return (f((x, y+h)) - 2*f((x, y)) + f((x, y-h))) / h**2

def mixed_partial_xy(f, point, h=1e-4):
    """Mixed partial derivative ∂²f/∂x∂y"""
    x, y = point
    return (f((x+h, y+h)) - f((x+h, y-h))
            - f((x-h, y+h)) + f((x-h, y-h))) / (4 * h**2)

def mixed_partial_yx(f, point, h=1e-4):
    """Mixed partial derivative ∂²f/∂y∂x"""
    # The symmetric four-point stencil is the same for either order of
    # differentiation, so we reuse it (justified by Clairaut's theorem).
    return mixed_partial_xy(f, point, h)

# Example: f(x, y) = x³y² + sin(xy)
def f(p):
    x, y = p[0], p[1]
    return x**3 * y**2 + np.sin(x * y)

# Analytical second derivatives for verification
def f_xx(p):
    x, y = p
    return 6*x*y**2 - y**2 * np.sin(x*y)

def f_yy(p):
    x, y = p
    return 2*x**3 - x**2 * np.sin(x*y)

def f_xy(p):
    x, y = p
    return 6*x**2*y + np.cos(x*y) - x*y*np.sin(x*y)

# Test at (1, π/2)
point = (1.0, np.pi/2)

print("Second-Order Partial Derivatives")
print("Function: f(x,y) = x³y² + sin(xy)")
print("At point (1, π/2):")
print("=" * 40)

# Compute all second partials
fxx_num = second_partial_xx(f, point)
fyy_num = second_partial_yy(f, point)
fxy_num = mixed_partial_xy(f, point)
fyx_num = mixed_partial_yx(f, point)

print(f"∂²f/∂x² (numerical):  {fxx_num:.4f}")
print(f"∂²f/∂x² (analytical): {f_xx(point):.4f}")
print()
print(f"∂²f/∂y² (numerical):  {fyy_num:.4f}")
print(f"∂²f/∂y² (analytical): {f_yy(point):.4f}")
print()
print(f"∂²f/∂x∂y (numerical): {fxy_num:.4f}")
print(f"∂²f/∂y∂x (numerical): {fyx_num:.4f}")
print(f"∂²f/∂x∂y (analytical): {f_xy(point):.4f}")
print()
print("Clairaut's Theorem Verification:")
# fxy_num and fyx_num share one stencil, so the first check is true by construction;
# the comparison with the analytical value is the informative one.
print(f"  ∂²f/∂x∂y = ∂²f/∂y∂x? {np.isclose(fxy_num, fyx_num)}")
print(f"  Numerical matches analytical? {np.isclose(fxy_num, f_xy(point))}")
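The note on the Hessian says it decides whether a critical point is a minimum, maximum, or saddle. A minimal sketch of the standard two-variable second-derivative test follows. It assumes the point is already known to be critical (both first partials are zero), and the function name `classify_critical_point` is made up for this example.

```python
def classify_critical_point(f_xx, f_yy, f_xy):
    """Second-derivative test using the Hessian determinant D = f_xx * f_yy - f_xy^2.

    Assumes the point is a critical point, i.e. both first partials vanish there.
    """
    det = f_xx * f_yy - f_xy**2
    if det > 0:
        return "local minimum" if f_xx > 0 else "local maximum"
    if det < 0:
        return "saddle point"
    return "inconclusive"


# Second partials at the origin for three simple surfaces.
print(classify_critical_point(2.0, 2.0, 0.0))    # x^2 + y^2    -> local minimum
print(classify_critical_point(-2.0, -2.0, 0.0))  # -(x^2 + y^2) -> local maximum
print(classify_critical_point(2.0, -2.0, 0.0))   # x^2 - y^2    -> saddle point
```

When the determinant is zero the test gives no verdict, and higher-order information is needed.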

Common Pitfalls

Pitfall 1: Forgetting to Hold Variables Constant

When computing $\partial f/\partial x$, every $y$ is a constant, not a variable! You don't need the product rule on terms like $xy$: here $y$ is just a constant coefficient, so $\partial(xy)/\partial x = y$.

Pitfall 2: Confusing Notation Order

In $\partial^2 f/\partial x \partial y$, we differentiate with respect to $y$ first, then $x$. The variable closest to $f$ is applied first. (Mnemonic: read from right to left, or "inside-out".) Subscript notation runs the other way: $f_{xy} = (f_x)_y$ differentiates with respect to $x$ first.

Pitfall 3: Assuming All Functions Have Equal Mixed Partials

Clairaut's theorem requires continuity of the mixed partials. Counterexamples exist when the mixed partials are discontinuous, even if the function itself is continuous. Always verify continuity if in doubt.
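A standard textbook counterexample, not taken from this lesson, is $f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)$ with $f(0, 0) = 0$. Its mixed partials exist at the origin but differ there. The sketch below estimates both with nested central differences, deliberately using an inner step much smaller than the outer step so the first partials are resolved before they are differentiated again.

```python
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0  # value assigned at the origin
    return x * y * (x**2 - y**2) / (x**2 + y**2)


def f_x(x, y, h=1e-6):
    """Numerical first partial with respect to x (inner step h)."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def f_y(x, y, h=1e-6):
    """Numerical first partial with respect to y (inner step h)."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


k = 1e-3  # outer step, much larger than the inner step h

# x first, then y: differentiate f_x along the y-axis at the origin.
f_xy_origin = (f_x(0.0, k) - f_x(0.0, -k)) / (2 * k)
# y first, then x: differentiate f_y along the x-axis at the origin.
f_yx_origin = (f_y(k, 0.0) - f_y(-k, 0.0)) / (2 * k)

print(f"f_xy(0, 0) ~ {f_xy_origin:.4f}")
print(f"f_yx(0, 0) ~ {f_yx_origin:.4f}")
```

The two estimates land near $-1$ and $+1$, so the order of differentiation matters for this function at the origin.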

Pro Tip: Checking Your Work

After computing partial derivatives, verify by: (1) substituting simple values like $x = 1, y = 0$ and comparing with a numerical difference quotient; (2) checking units/dimensions match; (3) confirming $f_{xy} = f_{yx}$ for smooth functions.

Test Your Understanding

Practice question: Find ∂f/∂x for f(x,y) = x²y + 3xy²

Summary

Partial derivatives extend the concept of rate of change to functions of multiple variables. They are computed by treating all other variables as constants.

Key Formulas

Concept	Formula	Meaning
Definition	∂f/∂x = lim_{h→0} [f(x+h, y) - f(x, y)]/h	Rate of change in x-direction
Gradient	∇f = (∂f/∂x, ∂f/∂y)	Vector of all partial derivatives
Clairaut	∂²f/∂x∂y = ∂²f/∂y∂x	Mixed partials are equal (if continuous)
Chain Rule	df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt)	Rate of change along a path (x(t), y(t))
Gradient Descent	x ← x - α∇f	Optimization update
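The chain-rule row of the table is the one formula not worked through earlier, so here is a small sketch of it. The function $f(x, y) = x^2 y$ and the circular path $x = \cos t$, $y = \sin t$ are illustrative assumptions. The code compares the chain-rule value with a direct numerical derivative of $f$ along the path.

```python
import math


def f(x, y):
    return x**2 * y


def path(t):
    """Hypothetical path: the unit circle."""
    return math.cos(t), math.sin(t)


def chain_rule(t):
    x, y = path(t)
    df_dx, df_dy = 2 * x * y, x**2           # partials of f
    dx_dt, dy_dt = -math.sin(t), math.cos(t)  # velocity along the path
    return df_dx * dx_dt + df_dy * dy_dt


def direct(t, h=1e-6):
    """Central difference of the composite t -> f(x(t), y(t))."""
    return (f(*path(t + h)) - f(*path(t - h))) / (2 * h)


t0 = 0.7
print(f"chain rule: {chain_rule(t0):.8f}")
print(f"direct:     {direct(t0):.8f}")
```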

Key Takeaways

Partial derivatives measure rate of change with respect to one variable while others stay fixed
Geometrically, they give slopes of tangent lines in coordinate directions
The gradient $\nabla f$ points toward steepest ascent and is perpendicular to contours
Clairaut's theorem lets us interchange the order of differentiation for smooth functions
In machine learning, partial derivatives power gradient descent and backpropagation
PDEs (heat, wave, Maxwell's equations) are expressed using partial derivatives

The Essence of Partial Derivatives:

"Change one thing at a time, hold everything else fixed — and measure what happens."

Coming Next: In the next section, we'll explore Tangent Planes and Linear Approximations. Just as the derivative gives a linear approximation in 1D, partial derivatives define a tangent plane that best approximates a surface near a point.