Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Derive the equation of the tangent plane to a surface $z = f(x, y)$ at any point using partial derivatives
Construct linear approximations to functions of two variables and estimate values near a point of tangency
Calculate total differentials and understand their relationship to linear approximation
Distinguish between continuity and differentiability in multiple dimensions
Analyze the error in linear approximations and understand when the approximation is valid
Apply these concepts to problems in optimization, sensitivity analysis, and machine learning

The Big Picture: Flattening the World

"Every smooth surface, when viewed closely enough, looks flat. The tangent plane is that flat approximation."

When you stand on the Earth, you perceive it as flat even though it is a sphere. This is because at any point, the Earth's surface can be well-approximated by a flat plane — the tangent plane at that point. This fundamental idea extends to any smooth surface in mathematics.

In single-variable calculus, we learned that a differentiable function $y = f(x)$ can be approximated near a point $x = a$ by its tangent line:

f(x) \approx f(a) + f'(a)(x - a)

Now we extend this powerful idea to functions of two variables. Instead of a tangent line, we have a tangent plane. Instead of one derivative, we use both partial derivatives.

Single Variable

Curve $y = f(x)$
Tangent line at $x = a$
Uses one derivative: $f'(a)$
Linear approximation in 1D

Two Variables

Surface $z = f(x, y)$
Tangent plane at $(a, b)$
Uses two partials: $f_x, f_y$
Linear approximation in 2D

Why Tangent Planes Matter

Tangent planes are the foundation of differential calculus in higher dimensions. They enable:

Optimization: Gradient descent uses linear approximation to find minima
Sensitivity Analysis: How do small input changes affect outputs?
Error Propagation: How do measurement errors compound?
Numerical Methods: Newton's method in multiple dimensions

From Tangent Lines to Tangent Planes

Consider a surface $z = f(x, y)$ and a point $P = (a, b, f(a, b))$ on this surface. We want to find the plane that best approximates the surface near this point.

The Two Tangent Lines

At point $P$ , we can draw two special curves on the surface:

The curve where $y = b$ (constant): This gives us $z = f(x, b)$ , a function of $x$ alone. Its tangent line at $x = a$ has slope $f_x(a, b)$ .
The curve where $x = a$ (constant): This gives us $z = f(a, y)$ , a function of $y$ alone. Its tangent line at $y = b$ has slope $f_y(a, b)$ .

These two tangent lines lie in the tangent plane, and together they determine the tangent plane uniquely.

Key Geometric Insight

The tangent plane contains:

The tangent to the x-slice curve: direction $\langle 1, 0, f_x(a, b) \rangle$
The tangent to the y-slice curve: direction $\langle 0, 1, f_y(a, b) \rangle$

The plane through $P$ that contains both of these direction vectors is the tangent plane.

The Tangent Plane Equation

Let $f(x, y)$ be a function with continuous partial derivatives near $(a, b)$ . The tangent plane to the surface $z = f(x, y)$ at the point $(a, b, f(a, b))$ is:

Tangent Plane Equation

z - f(a, b) = f_x(a, b)(x - a) + f_y(a, b)(y - b)

Or equivalently:

z = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)

Deriving the Normal Vector

We can also express the tangent plane using its normal vector. Rewrite the surface as $F(x, y, z) = f(x, y) - z = 0$ . The gradient of $F$ is:

\nabla F = \langle f_x, f_y, -1 \rangle

This vector is normal (perpendicular) to the tangent plane. The tangent plane can then be written as:

f_x(a, b)(x - a) + f_y(a, b)(y - b) - (z - f(a, b)) = 0
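
As a quick numerical check of the normal vector, the sketch below takes the cross product of the two slice-tangent directions $\langle 1, 0, f_x \rangle$ and $\langle 0, 1, f_y \rangle$ and compares it with $\nabla F$. The helper name `tangent_normal` and the sample values (the paraboloid $z = x^2 + y^2$ at $(1, 2)$) are assumptions chosen for illustration.

```python
import numpy as np

def tangent_normal(fx_val, fy_val):
    """Cross product of the two slice-tangent directions (hypothetical helper).

    u = <1, 0, f_x> and v = <0, 1, f_y> span the tangent plane,
    so u x v is perpendicular to it.
    """
    u = np.array([1.0, 0.0, fx_val])
    v = np.array([0.0, 1.0, fy_val])
    return np.cross(u, v), u, v

# Assumed sample: z = x^2 + y^2 at (1, 2), where f_x = 2 and f_y = 4
fx_val, fy_val = 2.0, 4.0
n, u, v = tangent_normal(fx_val, fy_val)
grad_F = np.array([fx_val, fy_val, -1.0])  # gradient of F = f(x, y) - z

print("u x v  =", n)        # equals <-f_x, -f_y, 1>
print("grad F =", grad_F)   # points the opposite way, same normal line
print("n . u  =", np.dot(n, u), " n . v =", np.dot(n, v))
```

The cross product comes out as $\langle -f_x, -f_y, 1 \rangle$, which is $-\nabla F$. Both vectors describe the same normal direction, so either can be used in the plane equation.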

Example: Paraboloid

Find the tangent plane to $z = x^2 + y^2$ at the point $(1, 2, 5)$ .

Solution:

First, compute the partial derivatives:

$f_x = 2x$ , so $f_x(1, 2) = 2$
$f_y = 2y$ , so $f_y(1, 2) = 4$

The tangent plane equation is:

z - 5 = 2(x - 1) + 4(y - 2)

Simplifying: $z = 2x + 4y - 5$

Linear Approximation

The tangent plane provides a linear approximation to the function near the point of tangency. This is also called the linearization of $f$ at $(a, b)$ .

Linear Approximation (Linearization)

L(x, y) = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)

For points $(x, y)$ near $(a, b)$ :

f(x, y) \approx L(x, y)

Why "Linear"?

The approximating function $L(x, y)$ is linear in the displacements $(x - a)$ and $(y - b)$ . If we set $\Delta x = x - a$ and $\Delta y = y - b$ :

f(a + \Delta x, b + \Delta y) \approx f(a, b) + f_x(a, b) \Delta x + f_y(a, b) \Delta y

This is a first-degree polynomial in $\Delta x$ and $\Delta y$ .

Example: Estimating a Square Root

Estimate $\sqrt{(3.02)^2 + (3.97)^2}$ using linear approximation.

Solution:

Let $f(x, y) = \sqrt{x^2 + y^2}$ . We use the base point $(a, b) = (3, 4)$ .

$f(3, 4) = \sqrt{9 + 16} = 5$
$f_x = \frac{x}{\sqrt{x^2 + y^2}}$ , so $f_x(3, 4) = \frac{3}{5}$
$f_y = \frac{y}{\sqrt{x^2 + y^2}}$ , so $f_y(3, 4) = \frac{4}{5}$

The linearization is:

L(x, y) = 5 + \frac{3}{5}(x - 3) + \frac{4}{5}(y - 4)

At $(3.02, 3.97)$ :

L(3.02, 3.97) = 5 + 0.6(0.02) + 0.8(-0.03) = 5 + 0.012 - 0.024 = 4.988

The actual value is $\sqrt{3.02^2 + 3.97^2} \approx 4.98812$ , so the approximation is off by only about $0.0001$ .
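
The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch; the function name `linearize_at_3_4` is made up for this illustration and hard-codes the base point $(3, 4)$ used above.

```python
import math

def f(x, y):
    """f(x, y) = sqrt(x^2 + y^2)."""
    return math.hypot(x, y)

def linearize_at_3_4(x, y):
    """L(x, y) = 5 + (3/5)(x - 3) + (4/5)(y - 4), the linearization at (3, 4)."""
    return 5.0 + 0.6 * (x - 3.0) + 0.8 * (y - 4.0)

approx = linearize_at_3_4(3.02, 3.97)
actual = f(3.02, 3.97)

print(f"linear estimate = {approx:.6f}")
print(f"actual value    = {actual:.6f}")
print(f"absolute error  = {abs(actual - approx):.2e}")
```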

Choosing the Base Point

Choose $(a, b)$ close to the point of interest and where $f$ , $f_x$ , and $f_y$ are easy to compute. Usually this means integer values or special values like multiples of $\pi$ .

The Total Differential

The total differential provides an alternate notation for linear approximation that is especially useful in physics and engineering.

Total Differential

For $z = f(x, y)$ , the total differential is:

dz = f_x \, dx + f_y \, dy

Or, in Leibniz notation:

dz = \frac{\partial z}{\partial x} dx + \frac{\partial z}{\partial y} dy

What Does This Mean?

The differentials $dx$ and $dy$ represent small changes in $x$ and $y$ . The differential $dz$ approximates the resulting change in $z$ :

\Delta z \approx dz = f_x \, dx + f_y \, dy

Here $\Delta z = f(x + dx, y + dy) - f(x, y)$ is the actual change, while $dz$ is the linear approximation of this change.

Example: Error Propagation

The volume of a cylinder is $V = \pi r^2 h$ . If the radius is measured as $r = 5 \pm 0.1$ cm and height as $h = 10 \pm 0.2$ cm, estimate the error in the volume.

Solution:

The total differential is:

dV = \frac{\partial V}{\partial r} dr + \frac{\partial V}{\partial h} dh = 2\pi rh \, dr + \pi r^2 \, dh

At $r = 5, h = 10$ with $|dr| = 0.1, |dh| = 0.2$ :

|dV| \leq 2\pi(5)(10)(0.1) + \pi(25)(0.2) = 10\pi + 5\pi = 15\pi \approx 47.1 \text{ cm}^3

The nominal volume is $\pi(25)(10) = 250\pi \approx 785$ cm³, so the relative error is about 6%.
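
The sketch below repeats this calculation and compares the differential bound with the actual volume change when both measurements are at the top of their ranges. The function names are hypothetical; the numbers are the ones from the example.

```python
import math

def volume(r, h):
    """Cylinder volume V = pi r^2 h."""
    return math.pi * r**2 * h

def volume_error_bound(r, h, dr, dh):
    """First-order bound |dV| <= |V_r| |dr| + |V_h| |dh|."""
    dV_dr = 2.0 * math.pi * r * h   # partial derivative with respect to r
    dV_dh = math.pi * r**2          # partial derivative with respect to h
    return abs(dV_dr) * abs(dr) + abs(dV_dh) * abs(dh)

r, h, dr, dh = 5.0, 10.0, 0.1, 0.2
bound = volume_error_bound(r, h, dr, dh)
worst_actual = volume(r + dr, h + dh) - volume(r, h)

print(f"differential bound    = {bound:.2f} cm^3")
print(f"actual worst increase = {worst_actual:.2f} cm^3")
print(f"relative error        = {bound / volume(r, h):.1%}")
```

The actual worst-case increase is slightly larger than the differential estimate because the differential drops the second-order terms in $dr$ and $dh$.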

Differentiability in Multiple Variables

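In single-variable calculus, the existence of the derivative at a point implies continuity there. In multiple variables, the situation is more subtle: both partial derivatives can exist at a point where the function is not even continuous.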

Important Distinction

Having both partial derivatives $f_x$ and $f_y$ exist at a point does not guarantee that the function is differentiable there!

Definition of Differentiability

A function $f(x, y)$ is differentiable at $(a, b)$ if:

\lim_{(\Delta x, \Delta y) \to (0, 0)} \frac{f(a + \Delta x, b + \Delta y) - f(a, b) - f_x(a, b)\Delta x - f_y(a, b)\Delta y}{\sqrt{(\Delta x)^2 + (\Delta y)^2}} = 0

In words: the linear approximation error goes to zero faster than the distance to the point of tangency.

Sufficient Condition for Differentiability

Sufficient Condition

If the partial derivatives $f_x$ and $f_y$ exist and are continuous in a neighborhood of $(a, b)$ , then $f$ is differentiable at $(a, b)$ .

This is why we often require continuity of partial derivatives — it guarantees that tangent planes exist and the linear approximation is valid.

A Classic Counterexample

Consider the function:

f(x, y) = \begin{cases} \frac{xy}{x^2 + y^2} & (x, y) \neq (0, 0) \\ 0 & (x, y) = (0, 0) \end{cases}

At the origin:

$f_x(0, 0) = 0$ (computed using the definition)
$f_y(0, 0) = 0$ (computed using the definition)
But $f$ is not continuous at $(0, 0)$ — approaching along $y = x$ gives $f = 1/2$

Since $f$ is not even continuous, it cannot be differentiable. The tangent plane formula gives $z = 0$ , but this is a terrible approximation for points not on the axes.
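
To see the failure numerically, the sketch below evaluates the counterexample along an axis and along the diagonal $y = x$, then computes the quotient from the definition of differentiability (with $f(0, 0) = f_x(0, 0) = f_y(0, 0) = 0$). The name `diff_quotient` is an assumption made for this illustration.

```python
import math

def f(x, y):
    """The counterexample: xy / (x^2 + y^2) away from the origin, 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def diff_quotient(dx, dy):
    """Differentiability quotient at the origin.

    Since f(0, 0) = f_x(0, 0) = f_y(0, 0) = 0, the numerator reduces to f(dx, dy).
    """
    return f(dx, dy) / math.hypot(dx, dy)

ts = (0.1, 0.01, 0.001)
along_axis = [f(t, 0.0) for t in ts]        # stays at 0
along_diagonal = [f(t, t) for t in ts]      # stays at 1/2
quotients = [diff_quotient(t, t) for t in ts]

for t, q in zip(ts, quotients):
    print(f"t = {t:<6} f(t, 0) = {f(t, 0.0):.1f}  f(t, t) = {f(t, t):.1f}  quotient = {q:.1f}")
```

Along the diagonal the quotient grows like $1 / (2\sqrt{2}\, t)$ instead of tending to zero, so the limit in the definition does not exist.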

Error Analysis

How good is the linear approximation? The answer depends on how far we are from the point of tangency and how curved the surface is.

The Error Formula

If $f$ has continuous second partial derivatives, the error in the linear approximation is:

|f(x, y) - L(x, y)| = O\left((x - a)^2 + (y - b)^2\right)

This means the error is bounded by a constant times the square of the distance from the point of tangency, where the constant depends on the size of the second partial derivatives near $(a, b)$ .

Distance from (a, b)	Error Scale (distance², up to a constant factor)
0.1	~0.01
0.01	~0.0001
0.001	~0.000001

This quadratic behavior is why linear approximation works so well for small displacements — the error shrinks much faster than the distance.
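
The quadratic scaling can be observed directly. The sketch below uses an assumed test function, $f(x, y) = \sin(x)\, e^{y}$ linearized at the origin (where $f = 0$, $f_x = 1$, $f_y = 0$), and halves the displacement twice. If the error is quadratic, each halving should cut it by roughly a factor of four.

```python
import math

def f(x, y):
    """Assumed test function for this illustration."""
    return math.sin(x) * math.exp(y)

def linearization(x, y):
    # At (0, 0): f = 0, f_x = cos(0) * e^0 = 1, f_y = sin(0) * e^0 = 0
    return x

steps = [0.1, 0.05, 0.025]
errors = []
for h in steps:
    err = abs(f(h, h) - linearization(h, h))   # move along the diagonal (h, h)
    dist_sq = 2.0 * h * h                      # squared distance from the origin
    errors.append(err)
    print(f"h = {h:<6} error = {err:.3e}  error / distance^2 = {err / dist_sq:.4f}")
```

The ratio of error to squared distance stays close to a constant (about 0.5 for this function), which is what the $O(\text{distance}^2)$ bound predicts.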

Real-World Applications

1. Engineering: Stress and Strain

In structural engineering, the stress $\sigma$ in a material often depends on multiple loading conditions. Linear approximation lets engineers estimate how small changes in loads affect stress:

d\sigma = \frac{\partial \sigma}{\partial F_1} dF_1 + \frac{\partial \sigma}{\partial F_2} dF_2

2. Physics: Thermodynamics

For an ideal gas, pressure $P$ depends on volume $V$ and temperature $T$ :

P = \frac{nRT}{V}

The total differential gives:

dP = \frac{nR}{V} dT - \frac{nRT}{V^2} dV

This tells us how pressure changes with small variations in temperature and volume.
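
As a rough illustration, the sketch below compares $dP$ with the actual pressure change for one mole of gas. The state values ($T = 300$ K, $V = 0.025$ m³) and the perturbations are assumptions chosen only to make the numbers concrete.

```python
R_GAS = 8.314  # gas constant in J/(mol K), rounded

def pressure(n, T, V):
    """Ideal gas law P = nRT / V."""
    return n * R_GAS * T / V

def pressure_differential(n, T, V, dT, dV):
    """dP = (nR / V) dT - (nRT / V^2) dV."""
    return (n * R_GAS / V) * dT - (n * R_GAS * T / V**2) * dV

# Assumed state and small perturbations
n, T, V = 1.0, 300.0, 0.025
dT, dV = 1.0, 1e-4

dP = pressure_differential(n, T, V, dT, dV)
delta_P = pressure(n, T + dT, V + dV) - pressure(n, T, V)

print(f"dP (linear estimate) = {dP:.2f} Pa")
print(f"actual change        = {delta_P:.2f} Pa")
```

Here the warming raises the pressure while the expansion lowers it, and the differential captures the net effect to within a fraction of a pascal.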

3. Economics: Marginal Analysis

If profit $P(q_1, q_2)$ depends on quantities of two products, the total differential:

dP = \frac{\partial P}{\partial q_1} dq_1 + \frac{\partial P}{\partial q_2} dq_2

estimates how profit changes when the production of each product is adjusted by a small amount. The partial derivatives are the marginal profits.

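4. Navigation: Small-Angle Approximations

GPS systems use linear approximation to convert small latitude/longitude changes into distances. For small changes near a point $(\phi_0, \lambda_0)$ , with angles in radians and $R$ the Earth's radius, the eastward and northward displacements are: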

\Delta x \approx R \cos(\phi_0) \, \Delta\lambda, \quad \Delta y \approx R \, \Delta\phi
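
A minimal sketch of this conversion follows. The function name `local_offsets`, the mean Earth radius value, and the sample coordinates are assumptions for illustration; the formula is only reasonable for small displacements away from the poles.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (assumed value)

def local_offsets(lat0_deg, lon0_deg, lat_deg, lon_deg):
    """Approximate east/north offsets in meters from (lat0, lon0) to (lat, lon)."""
    phi0 = math.radians(lat0_deg)
    d_phi = math.radians(lat_deg - lat0_deg)      # latitude change in radians
    d_lambda = math.radians(lon_deg - lon0_deg)   # longitude change in radians
    dx = EARTH_RADIUS_M * math.cos(phi0) * d_lambda  # eastward
    dy = EARTH_RADIUS_M * d_phi                      # northward
    return dx, dy

# Assumed sample: move 0.001 degrees north and east from latitude 60
dx, dy = local_offsets(60.0, 10.0, 60.001, 10.001)
print(f"east offset  = {dx:.2f} m")
print(f"north offset = {dy:.2f} m")
```

At latitude 60 degrees the factor $\cos(\phi_0) = 0.5$ halves the east-west distance covered by a given change in longitude.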

Machine Learning Connections

Linear approximation is the mathematical foundation of several key concepts in machine learning.

Gradient Descent

When training a neural network, we minimize a loss function $L(\theta)$ where $\theta$ represents all the weights. Gradient descent uses the linear approximation:

L(\theta + \Delta\theta) \approx L(\theta) + \nabla L \cdot \Delta\theta

To decrease the loss, we choose $\Delta\theta = -\eta \nabla L$ (negative gradient direction), giving the update rule:

\theta_{new} = \theta_{old} - \eta \nabla L

Why This Works

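Substituting $\Delta\theta = -\eta \nabla L$ into the linear approximation gives $L(\theta + \Delta\theta) \approx L(\theta) - \eta \|\nabla L\|^2$ , so the loss is predicted to decrease by approximately $\eta \|\nabla L\|^2$ per step. As long as the gradient is nonzero and the learning rate $\eta$ is small enough that the linear approximation is accurate, each step reduces the loss.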
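
The sketch below compares the predicted decrease $\eta \|\nabla L\|^2$ with the actual decrease for a small and a large learning rate. The quadratic loss $L(\theta) = \theta_1^2 + 3\theta_2^2$ and the helper `step_report` are assumptions for this illustration.

```python
import numpy as np

def loss(theta):
    """Assumed toy loss: theta_1^2 + 3 theta_2^2."""
    return theta[0] ** 2 + 3.0 * theta[1] ** 2

def grad(theta):
    return np.array([2.0 * theta[0], 6.0 * theta[1]])

def step_report(theta, eta):
    """Return (predicted decrease, actual decrease) for one gradient step."""
    g = grad(theta)
    predicted = eta * np.dot(g, g)               # eta * ||grad||^2
    actual = loss(theta) - loss(theta - eta * g)
    return predicted, actual

theta0 = np.array([1.0, 1.0])
for eta in (0.01, 0.5):
    predicted, actual = step_report(theta0, eta)
    print(f"eta = {eta:<5} predicted decrease = {predicted:.4f}  actual decrease = {actual:.4f}")
```

With $\eta = 0.01$ the prediction (0.4) is close to the actual decrease (about 0.389). With $\eta = 0.5$ the step leaves the region where the linear approximation holds and the loss increases instead.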

Taylor Expansion for Second-Order Methods

Newton's method, and quasi-Newton methods such as L-BFGS that approximate the Hessian instead of computing it, are built on the second-order Taylor expansion:

L(\theta + \Delta\theta) \approx L(\theta) + \nabla L \cdot \Delta\theta + \frac{1}{2} \Delta\theta^T H \Delta\theta

where $H$ is the Hessian matrix of second derivatives. This gives faster convergence near the minimum.
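
Minimizing the quadratic model over $\Delta\theta$ gives the Newton step $H \Delta\theta = -\nabla L$. The sketch below applies one such step to an assumed toy loss, $L(\theta) = \theta_1^2 + 3\theta_2^2$. Because this loss is exactly quadratic, a single step lands on the minimum; general losses need repeated steps.

```python
import numpy as np

def grad(theta):
    """Gradient of the assumed toy loss theta_1^2 + 3 theta_2^2."""
    return np.array([2.0 * theta[0], 6.0 * theta[1]])

def hessian(theta):
    """Hessian of the toy loss (constant because the loss is quadratic)."""
    return np.array([[2.0, 0.0],
                     [0.0, 6.0]])

def newton_step(theta):
    # Solve H @ delta = -grad rather than forming the inverse explicitly
    delta = np.linalg.solve(hessian(theta), -grad(theta))
    return theta + delta

theta1 = newton_step(np.array([1.0, 1.0]))
print("after one Newton step:", theta1)
```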

Sensitivity and Feature Importance

The gradient $\nabla f$ tells us how sensitive the output is to each input feature:

Large $|\partial f / \partial x_i|$ : Feature $x_i$ strongly affects the output
Small $|\partial f / \partial x_i|$ : Feature $x_i$ has little effect on the output

This is the basis of gradient-based feature importance and saliency maps in deep learning interpretability.

Jacobian for Vector-Valued Functions

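For neural networks with multiple outputs, the gradient generalizes to the Jacobian matrix, which defines the linear approximation of a vector-valued function $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m$ :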

J = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}

The linear approximation becomes $\mathbf{f}(\mathbf{x} + \Delta\mathbf{x}) \approx \mathbf{f}(\mathbf{x}) + J \Delta\mathbf{x}$ .
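
The following sketch checks this approximation for a small assumed example, $\mathbf{f}(x_1, x_2) = (x_1 x_2,\ \sin x_1 + x_2^2)$, with its Jacobian written out by hand. The function names and the sample point are illustrative choices.

```python
import numpy as np

def f_vec(x):
    """Assumed example map from R^2 to R^2."""
    return np.array([x[0] * x[1], np.sin(x[0]) + x[1] ** 2])

def jacobian(x):
    """Analytic Jacobian: row i holds the partial derivatives of output i."""
    return np.array([[x[1],         x[0]],
                     [np.cos(x[0]), 2.0 * x[1]]])

x0 = np.array([1.0, 2.0])
dx = np.array([0.01, -0.02])

linear = f_vec(x0) + jacobian(x0) @ dx   # f(x) + J dx
actual = f_vec(x0 + dx)
err = np.linalg.norm(actual - linear)

print("linear estimate:", linear)
print("actual value:   ", actual)
print(f"error norm = {err:.2e}")
```

The error norm is on the order of $\|\Delta\mathbf{x}\|^2$, consistent with the two-variable error analysis above.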

Python Implementation

Computing Tangent Planes

tangent_plane.py

Tangent Plane Formula

The tangent plane z = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b) is the 2D generalization of the tangent line. It uses both partial derivatives to capture how the surface changes in both the x and y directions.

Linear Approximation

L(x, y) approximates f(x, y) near (a, b). The approximation is 'linear' because the formula is first-degree in (x - a) and (y - b). Higher-order terms are dropped.

Partial Derivatives

For f(x, y) = x² + y², we have f_x = 2x (derivative treating y as constant) and f_y = 2y (derivative treating x as constant). These give the slopes of the tangent plane.

import numpy as np

def compute_tangent_plane(f, fx, fy, point, grid_range=1.0, grid_size=20):
    """
    Compute tangent plane to surface z = f(x, y) at a given point.

    The tangent plane at (a, b, f(a, b)) is:
    z = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)

    This is the 2D analog of the tangent line equation!

    Parameters:
        f: Function f(x, y)
        fx: Partial derivative with respect to x
        fy: Partial derivative with respect to y
        point: Tuple (a, b) - point of tangency
        grid_range: Half-width of the square grid around (a, b)
        grid_size: Number of grid points along each axis

    Returns:
        X, Y, Z_surface, Z_plane grids and the tuple (z0, fx_val, fy_val)
    """
    a, b = point
    z0 = f(a, b)
    fx_val = fx(a, b)
    fy_val = fy(a, b)

    # Create grid for visualization
    x = np.linspace(a - grid_range, a + grid_range, grid_size)
    y = np.linspace(b - grid_range, b + grid_range, grid_size)
    X, Y = np.meshgrid(x, y)

    # Surface values
    Z_surface = f(X, Y)

    # Tangent plane: z = z0 + fx*(x - a) + fy*(y - b)
    Z_plane = z0 + fx_val * (X - a) + fy_val * (Y - b)

    return X, Y, Z_surface, Z_plane, (z0, fx_val, fy_val)

def linear_approximation(f, fx, fy, base_point, eval_point):
    """
    Use tangent plane to approximate f near a point.

    L(x, y) = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)

    This is LINEAR approximation because the approximating
    function is a linear function of (x - a) and (y - b).

    Returns the approximation, the actual value, and the absolute error.
    """
    a, b = base_point
    x, y = eval_point

    z0 = f(a, b)
    approx = z0 + fx(a, b) * (x - a) + fy(a, b) * (y - b)
    actual = f(x, y)
    error = abs(actual - approx)

    return approx, actual, error

# Example: Paraboloid f(x, y) = x² + y²
def f(x, y):
    return x**2 + y**2

def fx(x, y):  # ∂f/∂x = 2x
    return 2 * x

def fy(x, y):  # ∂f/∂y = 2y
    return 2 * y

# Compute tangent plane at (1, 1)
point = (1, 1)
X, Y, Z_surface, Z_plane, params = compute_tangent_plane(f, fx, fy, point)
z0, fx_val, fy_val = params

print("Tangent Plane Analysis")
print("=" * 50)
print("Surface: z = x² + y²")
print(f"Point of tangency: ({point[0]}, {point[1]}, {z0})")
print(f"∂f/∂x at point: {fx_val}")
print(f"∂f/∂y at point: {fy_val}")
print("\nTangent plane equation:")
print(f"z = {z0} + {fx_val}(x - {point[0]}) + {fy_val}(y - {point[1]})")
print(f"z = {z0} + {fx_val}x - {fx_val*point[0]} + {fy_val}y - {fy_val*point[1]}")
print(f"z = {fx_val}x + {fy_val}y + {z0 - fx_val*point[0] - fy_val*point[1]}")

# Test linear approximation
test_point = (1.1, 1.05)
approx, actual, error = linear_approximation(f, fx, fy, point, test_point)
print("\nLinear Approximation Test:")
print(f"Test point: {test_point}")
print(f"Actual f(x, y) = {actual:.6f}")
print(f"Linear approx  = {approx:.6f}")
print(f"Error          = {error:.6f}")
print(f"Relative error = {100*error/actual:.4f}%")

Machine Learning Applications

linear_approx_ml.py

Gradient Descent Connection

Gradient descent uses the linear approximation L(θ + Δθ) ≈ L(θ) + ∇L · Δθ to predict how the loss will change. Moving in the negative gradient direction minimizes this linear approximation.

Tangent Hyperplane

The formula L(x) = f(x₀) + ∇f(x₀)ᵀ(x - x₀) is the equation of the tangent hyperplane at x₀. This generalizes the 2D tangent plane to any number of dimensions.

Sensitivity Analysis

The gradient tells us how sensitive the output is to each input feature. Large gradient components indicate features that strongly affect the output - useful for feature importance and interpretability.

import numpy as np
from typing import Callable, Tuple

class LinearizationForML:
    """
    Linear approximation concepts are fundamental to machine learning:

    1. GRADIENT DESCENT: Uses linear approximation to predict
       loss change: L(θ + Δθ) ≈ L(θ) + ∇L · Δθ

    2. TAYLOR EXPANSION: Second-order methods (Newton, BFGS)
       use quadratic approximation beyond linear

    3. JACOBIAN: For vector-valued functions, the tangent plane
       generalizes to the Jacobian matrix

    4. SENSITIVITY ANALYSIS: How do small input changes
       affect outputs? Linear approximation quantifies this.
    """

    def __init__(self, f: Callable, grad_f: Callable):
        """
        f: Scalar function of vector input
        grad_f: Gradient function returning vector
        """
        self.f = f
        self.grad_f = grad_f

    def linear_model(self, x0: np.ndarray) -> Tuple[Callable, float, np.ndarray]:
        """
        Build linear approximation (tangent hyperplane) at x0.

        L(x) = f(x0) + ∇f(x0)ᵀ(x - x0)

        This is exactly what gradient descent uses to decide
        which direction to step!
        """
        f0 = self.f(x0)
        grad0 = self.grad_f(x0)

        def L(x):
            return f0 + np.dot(grad0, x - x0)

        return L, f0, grad0

    def gradient_descent_step(self, x0: np.ndarray, learning_rate: float) -> np.ndarray:
        """
        Gradient descent moves in the direction that the
        LINEAR APPROXIMATION predicts will decrease f most.

        The gradient ∇f points in the direction of steepest
        ascent of the tangent plane, so we go the opposite way.
        """
        _, _, grad0 = self.linear_model(x0)
        return x0 - learning_rate * grad0

    def sensitivity_analysis(self, x0: np.ndarray, perturbation: np.ndarray) -> dict:
        """
        How much does f change when we perturb the input?

        Linear approximation: Δf ≈ ∇f · Δx

        This tells us which input features most affect the output -
        crucial for interpretability and feature importance!
        """
        _, f0, grad0 = self.linear_model(x0)

        # Predicted change using linear approx
        predicted_change = np.dot(grad0, perturbation)

        # Actual change
        x_new = x0 + perturbation
        actual_change = self.f(x_new) - f0

        # Feature contributions
        feature_contributions = grad0 * perturbation

        return {
            'predicted_change': predicted_change,
            'actual_change': actual_change,
            'approximation_error': abs(actual_change - predicted_change),
            'gradient': grad0,
            'feature_contributions': feature_contributions,
            'relative_importance': np.abs(feature_contributions) / (np.sum(np.abs(feature_contributions)) + 1e-10)
        }

# Example: Logistic loss function (common in ML)
def logistic_loss(w):
    """
    Simplified logistic loss for demonstration.
    In practice: L(w) = -Σ[y log(σ(wx)) + (1-y) log(1 - σ(wx))]
    """
    x = np.array([1.0, 2.0, -1.0])  # Fixed data point
    y = 1  # Label
    z = np.dot(w, x)
    sigmoid = 1 / (1 + np.exp(-z))
    return -y * np.log(sigmoid + 1e-10) - (1 - y) * np.log(1 - sigmoid + 1e-10)

def logistic_loss_gradient(w):
    """Gradient of logistic loss."""
    x = np.array([1.0, 2.0, -1.0])
    y = 1
    z = np.dot(w, x)
    sigmoid = 1 / (1 + np.exp(-z))
    return (sigmoid - y) * x

# Demonstrate linear approximation in ML context
print("Linear Approximation in Machine Learning")
print("=" * 60)

# Initialize model
model = LinearizationForML(logistic_loss, logistic_loss_gradient)

# Current weights
w0 = np.array([0.5, -0.3, 0.8])
print(f"Current weights: {w0}")
print(f"Current loss: {logistic_loss(w0):.6f}")

# Build linear model
L, f0, grad0 = model.linear_model(w0)
print(f"\nGradient at current point: {grad0}")
print(f"Gradient magnitude: {np.linalg.norm(grad0):.6f}")

# Sensitivity analysis
perturbation = np.array([0.1, -0.05, 0.02])
analysis = model.sensitivity_analysis(w0, perturbation)

print("\nSensitivity Analysis:")
print(f"Perturbation: {perturbation}")
print(f"Predicted loss change (linear): {analysis['predicted_change']:.6f}")
print(f"Actual loss change: {analysis['actual_change']:.6f}")
print(f"Approximation error: {analysis['approximation_error']:.6f}")
print("\nFeature contributions to predicted change:")
for i, contrib in enumerate(analysis['feature_contributions']):
    print(f"  Feature {i}: {contrib:+.6f} ({100*analysis['relative_importance'][i]:.1f}%)")

# Gradient descent step
w1 = model.gradient_descent_step(w0, learning_rate=0.1)
print("\nGradient Descent Step (lr=0.1):")
print(f"New weights: {w1}")
print(f"New loss: {logistic_loss(w1):.6f}")
print(f"Loss decreased by: {logistic_loss(w0) - logistic_loss(w1):.6f}")

Summary

Tangent planes and linear approximations extend the powerful ideas of single-variable calculus to functions of multiple variables. They provide a way to locally "flatten" a curved surface, making analysis and computation tractable.

Key Formulas

Concept	Formula
Tangent Plane	z = f(a,b) + f_x(a,b)(x - a) + f_y(a,b)(y - b)
Linear Approximation	L(x,y) = f(a,b) + f_x(a,b)(x - a) + f_y(a,b)(y - b)
Total Differential	dz = f_x dx + f_y dy
Normal Vector	n = ⟨-f_x, -f_y, 1⟩ (or ⟨f_x, f_y, -1⟩)
Approximation Error	|f - L| = O(distance²)

Key Concepts

The tangent plane at a point is determined by the two partial derivatives at that point
Linear approximation uses the tangent plane equation to estimate function values near the point of tangency
The total differential $dz = f_x dx + f_y dy$ estimates the change in $z$ from small changes in $x$ and $y$
Differentiability in multiple variables is stronger than just having partial derivatives — it requires the linear approximation to be good
The approximation error is quadratic in distance, making linear approximation very accurate for small displacements
Gradient descent in machine learning is a direct application of linear approximation to optimization

The Essence of Differential Calculus:

"Locally, every smooth surface is flat. The tangent plane captures this local flatness, and linear approximation exploits it."

Coming Next: In the next section, we'll explore The Chain Rule for Multivariable Functions — how to compute derivatives when variables depend on other variables. This is the foundation of backpropagation in neural networks.
