Chapter 17

Tangent Planes and Linear Approximations

Partial Derivatives

Learning Objectives

By the end of this section, you will be able to:

  1. Derive the equation of the tangent plane to a surface $z = f(x, y)$ at any point using partial derivatives
  2. Construct linear approximations to functions of two variables and estimate values near a point of tangency
  3. Calculate total differentials and understand their relationship to linear approximation
  4. Distinguish between continuity and differentiability in multiple dimensions
  5. Analyze the error in linear approximations and understand when the approximation is valid
  6. Apply these concepts to problems in optimization, sensitivity analysis, and machine learning

The Big Picture: Flattening the World

"Every smooth surface, when viewed closely enough, looks flat. The tangent plane is that flat approximation."

When you stand on the Earth, you perceive it as flat even though it is a sphere. This is because at any point, the Earth's surface can be well-approximated by a flat plane — the tangent plane at that point. This fundamental idea extends to any smooth surface in mathematics.

In single-variable calculus, we learned that a differentiable function $y = f(x)$ can be approximated near a point $x = a$ by its tangent line:

$$f(x) \approx f(a) + f'(a)(x - a)$$

Now we extend this powerful idea to functions of two variables. Instead of a tangent line, we have a tangent plane. Instead of one derivative, we use both partial derivatives.

Single Variable

  • Curve $y = f(x)$
  • Tangent line at $x = a$
  • Uses one derivative: $f'(a)$
  • Linear approximation in 1D

Two Variables

  • Surface $z = f(x, y)$
  • Tangent plane at $(a, b)$
  • Uses two partials: $f_x, f_y$
  • Linear approximation in 2D

Why Tangent Planes Matter

Tangent planes are the foundation of differential calculus in higher dimensions. They enable:

  • Optimization: Gradient descent uses linear approximation to find minima
  • Sensitivity Analysis: How do small input changes affect outputs?
  • Error Propagation: How do measurement errors compound?
  • Numerical Methods: Newton's method in multiple dimensions

From Tangent Lines to Tangent Planes

Consider a surface $z = f(x, y)$ and a point $P = (a, b, f(a, b))$ on this surface. We want to find the plane that best approximates the surface near this point.

The Two Tangent Lines

At point $P$, we can draw two special curves on the surface:

  1. The curve where $y = b$ (constant): This gives us $z = f(x, b)$, a function of $x$ alone. Its tangent line at $x = a$ has slope $f_x(a, b)$.
  2. The curve where $x = a$ (constant): This gives us $z = f(a, y)$, a function of $y$ alone. Its tangent line at $y = b$ has slope $f_y(a, b)$.

These two tangent lines lie in the tangent plane, and together they determine the tangent plane uniquely.

Key Geometric Insight

The tangent plane contains:

  • The tangent to the x-slice curve: direction $\langle 1, 0, f_x \rangle$
  • The tangent to the y-slice curve: direction $\langle 0, 1, f_y \rangle$

The tangent plane is the unique plane through $P$ that contains both of these direction vectors.
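
The two direction vectors also give a quick way to find a normal to the plane: take their cross product. The sketch below is only an illustration; the helper name `tangent_plane_normal` and the slope values 2 and 4 are assumptions chosen for the demonstration, not part of the original text.

```python
import numpy as np

def tangent_plane_normal(fx_val, fy_val):
    """Normal to the plane spanned by <1, 0, f_x> and <0, 1, f_y>."""
    u = np.array([1.0, 0.0, fx_val])  # tangent to the x-slice curve
    v = np.array([0.0, 1.0, fy_val])  # tangent to the y-slice curve
    return np.cross(u, v)             # equals <-f_x, -f_y, 1>

# Hypothetical slopes f_x = 2, f_y = 4, used only for illustration
n = tangent_plane_normal(2.0, 4.0)
print("normal vector:", n)
print("n . u =", np.dot(n, [1.0, 0.0, 2.0]))  # perpendicular to the x-slice tangent
print("n . v =", np.dot(n, [0.0, 1.0, 4.0]))  # perpendicular to the y-slice tangent
```

The cross product comes out as $\langle -f_x, -f_y, 1 \rangle$, which is perpendicular to both tangent directions.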


The Tangent Plane Equation

Let $f(x, y)$ be a function with continuous partial derivatives near $(a, b)$. The tangent plane to the surface $z = f(x, y)$ at the point $(a, b, f(a, b))$ is:

Tangent Plane Equation

$$z - f(a, b) = f_x(a, b)(x - a) + f_y(a, b)(y - b)$$

Or equivalently:

$$z = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)$$

Deriving the Normal Vector

We can also express the tangent plane using its normal vector. Rewrite the surface as $F(x, y, z) = f(x, y) - z = 0$. The gradient of $F$ is:

$$\nabla F = \langle f_x, f_y, -1 \rangle$$

This vector is normal (perpendicular) to the tangent plane. The tangent plane can then be written as:

$$f_x(a, b)(x - a) + f_y(a, b)(y - b) - (z - f(a, b)) = 0$$

Example: Paraboloid

Find the tangent plane to $z = x^2 + y^2$ at the point $(1, 2, 5)$.

Solution:

First, compute the partial derivatives:

  • $f_x = 2x$, so $f_x(1, 2) = 2$
  • $f_y = 2y$, so $f_y(1, 2) = 4$

The tangent plane equation is:

$$z - 5 = 2(x - 1) + 4(y - 2)$$

Simplifying: $z = 2x + 4y - 5$
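
As a sanity check, the coefficients of this plane can be recovered numerically. The following is a minimal sketch that estimates the partial derivatives with central differences; the helper name `tangent_plane` and the step size `h` are illustrative assumptions.

```python
def tangent_plane(f, a, b, h=1e-6):
    """Return (z0, fx, fy), with the partials estimated by central differences."""
    z0 = f(a, b)
    fx = (f(a + h, b) - f(a - h, b)) / (2 * h)
    fy = (f(a, b + h) - f(a, b - h)) / (2 * h)
    return z0, fx, fy

def paraboloid(x, y):
    return x**2 + y**2

z0, fx_val, fy_val = tangent_plane(paraboloid, 1.0, 2.0)

# Expand z = z0 + fx*(x - 1) + fy*(y - 2) into z = fx*x + fy*y + c
c = z0 - fx_val * 1.0 - fy_val * 2.0
print(f"z = {fx_val:.3f}x + {fy_val:.3f}y + ({c:.3f})")
```

Up to rounding, the printed coefficients match the hand computation $z = 2x + 4y - 5$.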

Linear Approximation

The tangent plane provides a linear approximation to the function near the point of tangency. This is also called the linearization of $f$ at $(a, b)$.

Linear Approximation (Linearization)

$$L(x, y) = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)$$

For points $(x, y)$ near $(a, b)$:

$$f(x, y) \approx L(x, y)$$

Why "Linear"?

The approximating function $L(x, y)$ is linear in the displacements $(x - a)$ and $(y - b)$. If we set $\Delta x = x - a$ and $\Delta y = y - b$:

$$f(a + \Delta x, b + \Delta y) \approx f(a, b) + f_x(a, b) \Delta x + f_y(a, b) \Delta y$$

This is a first-degree polynomial in $\Delta x$ and $\Delta y$.

Example: Estimating a Square Root

Estimate $\sqrt{(3.02)^2 + (3.97)^2}$ using linear approximation.

Solution:

Let $f(x, y) = \sqrt{x^2 + y^2}$. We use the base point $(a, b) = (3, 4)$.

  • $f(3, 4) = \sqrt{9 + 16} = 5$
  • $f_x = \frac{x}{\sqrt{x^2 + y^2}}$, so $f_x(3, 4) = \frac{3}{5}$
  • $f_y = \frac{y}{\sqrt{x^2 + y^2}}$, so $f_y(3, 4) = \frac{4}{5}$

The linearization is:

$$L(x, y) = 5 + \frac{3}{5}(x - 3) + \frac{4}{5}(y - 4)$$

At $(3.02, 3.97)$:

$$L(3.02, 3.97) = 5 + 0.6(0.02) + 0.8(-0.03) = 5 + 0.012 - 0.024 = 4.988$$

The actual value is $\sqrt{3.02^2 + 3.97^2} = \sqrt{24.8813} \approx 4.98812$, so the approximation is off by only about $1.2 \times 10^{-4}$.
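
The same estimate can be reproduced in a few lines of code. This is a small sketch using only the standard library; the function names `f` and `linearization` are chosen here for illustration.

```python
import math

def f(x, y):
    return math.sqrt(x**2 + y**2)

def linearization(x, y, a=3.0, b=4.0):
    """L(x, y) for f = sqrt(x^2 + y^2) at the base point (a, b)."""
    r = math.hypot(a, b)  # f(a, b)
    return r + (a / r) * (x - a) + (b / r) * (y - b)

approx = linearization(3.02, 3.97)
actual = f(3.02, 3.97)
print(f"linear estimate: {approx:.5f}")
print(f"direct value:    {actual:.5f}")
print(f"absolute error:  {abs(actual - approx):.1e}")
```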

Choosing the Base Point

Choose $(a, b)$ to be a point where you can easily compute $f$, $f_x$, and $f_y$. Usually this means integer values or special values like multiples of $\pi$.


The Total Differential

The total differential provides an alternate notation for linear approximation that is especially useful in physics and engineering.

Total Differential

For $z = f(x, y)$, the total differential is:

$$dz = f_x \, dx + f_y \, dy$$

Or, in Leibniz notation:

$$dz = \frac{\partial z}{\partial x} dx + \frac{\partial z}{\partial y} dy$$

What Does This Mean?

The differentials $dx$ and $dy$ represent small changes in $x$ and $y$. The differential $dz$ approximates the resulting change in $z$:

$$\Delta z \approx dz = f_x \, dx + f_y \, dy$$

Here $\Delta z = f(x + dx, y + dy) - f(x, y)$ is the actual change, while $dz$ is the linear approximation of this change.

Example: Error Propagation

The volume of a cylinder is $V = \pi r^2 h$. If the radius is measured as $r = 5 \pm 0.1$ cm and the height as $h = 10 \pm 0.2$ cm, estimate the error in the volume.

Solution:

The total differential is:

$$dV = \frac{\partial V}{\partial r} dr + \frac{\partial V}{\partial h} dh = 2\pi r h \, dr + \pi r^2 \, dh$$

At $r = 5$, $h = 10$ with $|dr| = 0.1$, $|dh| = 0.2$:

$$|dV| \leq 2\pi(5)(10)(0.1) + \pi(25)(0.2) = 10\pi + 5\pi = 15\pi \approx 47.1 \text{ cm}^3$$

The nominal volume is $\pi(25)(10) = 250\pi \approx 785$ cm$^3$, so the relative error is about 6%.
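
To see how close the differential bound is to the true worst case, we can evaluate the volume at the four extreme combinations of the measurements. This is a minimal sketch; checking only the corners is enough here because $V$ increases with both $r$ and $h$.

```python
import math

def volume(r, h):
    return math.pi * r**2 * h

r, h = 5.0, 10.0
dr, dh = 0.1, 0.2

# Differential bound: |dV| <= |V_r| |dr| + |V_h| |dh|
dV_bound = 2 * math.pi * r * h * dr + math.pi * r**2 * dh

# Largest actual deviation over the four corner cases (r +/- dr, h +/- dh)
worst = max(
    abs(volume(r + sr * dr, h + sh * dh) - volume(r, h))
    for sr in (-1, 1)
    for sh in (-1, 1)
)

print(f"differential bound:  {dV_bound:.2f} cm^3")
print(f"worst actual change: {worst:.2f} cm^3")
print(f"relative error:      {dV_bound / volume(r, h):.1%}")
```

The true worst case is slightly larger than the differential estimate, because the differential drops the second-order terms.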


Differentiability in Multiple Variables

In single-variable calculus, differentiability at a point implies continuity. In multiple variables, the situation is more subtle.

Important Distinction

Having both partial derivatives $f_x$ and $f_y$ exist at a point does not guarantee that the function is differentiable there!

Definition of Differentiability

A function $f(x, y)$ is differentiable at $(a, b)$ if:

$$\lim_{(\Delta x, \Delta y) \to (0, 0)} \frac{f(a + \Delta x, b + \Delta y) - f(a, b) - f_x(a, b)\Delta x - f_y(a, b)\Delta y}{\sqrt{(\Delta x)^2 + (\Delta y)^2}} = 0$$

In words: the linear approximation error goes to zero faster than the distance to the point of tangency.

Sufficient Condition for Differentiability

Sufficient Condition

If the partial derivatives $f_x$ and $f_y$ exist and are continuous in a neighborhood of $(a, b)$, then $f$ is differentiable at $(a, b)$.

This is why we often require continuity of partial derivatives — it guarantees that tangent planes exist and the linear approximation is valid.

A Classic Counterexample

Consider the function:

$$f(x, y) = \begin{cases} \frac{xy}{x^2 + y^2} & (x, y) \neq (0, 0) \\ 0 & (x, y) = (0, 0) \end{cases}$$

At the origin:

  • $f_x(0, 0) = 0$ (computed using the definition)
  • $f_y(0, 0) = 0$ (computed using the definition)
  • But $f$ is not continuous at $(0, 0)$: approaching along $y = x$ gives $f = 1/2$

Since $f$ is not even continuous, it cannot be differentiable. The tangent plane formula gives $z = 0$, but this is a terrible approximation for points not on the axes.
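
The counterexample above can be checked numerically. The sketch below evaluates the difference quotients that define the partial derivatives at the origin, the value of $f$ along $y = x$, and the ratio that appears in the definition of differentiability (with the candidate plane $z = 0$). The helper name `remainder_quotient` is an assumption made for this illustration.

```python
import math

def f(x, y):
    """f = xy / (x^2 + y^2) away from the origin, and 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x**2 + y**2)

def remainder_quotient(dx, dy):
    """Error of the candidate plane z = 0, divided by the distance to the origin."""
    return abs(f(dx, dy) - 0.0) / math.hypot(dx, dy)

for h in (1e-1, 1e-3, 1e-6):
    fx_quotient = (f(h, 0.0) - f(0.0, 0.0)) / h  # difference quotient for f_x(0, 0)
    fy_quotient = (f(0.0, h) - f(0.0, 0.0)) / h  # difference quotient for f_y(0, 0)
    print(f"h = {h:g}: fx ~ {fx_quotient}, fy ~ {fy_quotient}, "
          f"f(h, h) = {f(h, h)}, remainder/distance = {remainder_quotient(h, h):.3g}")
```

Both difference quotients are exactly zero, yet $f(h, h)$ stays at $1/2$ and the remainder ratio grows without bound as $h \to 0$, so the limit in the definition cannot be zero.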


Error Analysis

How good is the linear approximation? The answer depends on how far we are from the point of tangency and how curved the surface is.

The Error Formula

If $f$ has continuous second partial derivatives, the error in the linear approximation is:

$$f(x, y) - L(x, y) = O\left((x - a)^2 + (y - b)^2\right)$$

This means the error is bounded by a constant multiple of the square of the distance from the point of tangency.

| Distance from $(a, b)$ | Relative error scale |
| --- | --- |
| 0.1 | ~0.01 (1%) |
| 0.01 | ~0.0001 (0.01%) |
| 0.001 | ~0.000001 (0.0001%) |

This quadratic behavior is why linear approximation works so well for small displacements — the error shrinks much faster than the distance.
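
The quadratic scaling is easy to observe numerically. The sketch below reuses $f(x, y) = \sqrt{x^2 + y^2}$ with base point $(3, 4)$ and steps a distance $d$ along one fixed direction; the direction $(4/5, -3/5)$ is an arbitrary choice made for this illustration.

```python
import math

def f(x, y):
    return math.sqrt(x**2 + y**2)

def L(x, y):
    # Linearization of f at (3, 4): f = 5, f_x = 3/5, f_y = 4/5
    return 5.0 + 0.6 * (x - 3.0) + 0.8 * (y - 4.0)

ratios = []
for d in (0.1, 0.01, 0.001):
    # Move a distance d from (3, 4) along the unit vector (4/5, -3/5)
    x, y = 3.0 + 0.8 * d, 4.0 - 0.6 * d
    err = abs(f(x, y) - L(x, y))
    ratios.append(err / d**2)
    print(f"d = {d:<6} error = {err:.3e}  error / d^2 = {err / d**2:.4f}")
```

Each tenfold reduction in distance shrinks the error by roughly a factor of one hundred, so the ratio error / d² stays close to a constant.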

Real-World Applications

1. Engineering: Stress and Strain

In structural engineering, the stress $\sigma$ in a material often depends on multiple loading conditions. Linear approximation lets engineers estimate how small changes in loads affect stress:

$$d\sigma = \frac{\partial \sigma}{\partial F_1} dF_1 + \frac{\partial \sigma}{\partial F_2} dF_2$$

2. Physics: Thermodynamics

For an ideal gas, pressure $P$ depends on volume $V$ and temperature $T$:

$$P = \frac{nRT}{V}$$

The total differential gives:

$$dP = \frac{nR}{V} dT - \frac{nRT}{V^2} dV$$

This tells us how pressure changes with small variations in temperature and volume.
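
As a quick illustration, the differential can be compared with the exact pressure change for a small perturbation. The state used below (1 mol at 300 K in 0.025 m³) and the perturbation sizes are hypothetical values chosen for the example; $R \approx 8.314$ J/(mol·K) is the molar gas constant.

```python
R = 8.314  # J/(mol*K), molar gas constant (rounded)

def pressure(n, T, V):
    """Ideal gas law P = nRT / V, in pascals for SI inputs."""
    return n * R * T / V

# Hypothetical state and small changes, used only for illustration
n, T, V = 1.0, 300.0, 0.025   # mol, K, m^3
dT, dV = 1.0, 0.0001          # K, m^3

# Total differential: dP = (nR/V) dT - (nRT/V^2) dV
dP = (n * R / V) * dT - (n * R * T / V**2) * dV

# Exact change for comparison
delta_P = pressure(n, T + dT, V + dV) - pressure(n, T, V)

print(f"dP (linear estimate): {dP:.3f} Pa")
print(f"actual change:        {delta_P:.3f} Pa")
```

Here the warming raises the pressure while the expansion lowers it, and the differential captures the net effect to within a fraction of a pascal.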

3. Economics: Marginal Analysis

If profit $P(q_1, q_2)$ depends on the quantities of two products, the total differential:

$$dP = \frac{\partial P}{\partial q_1} dq_1 + \frac{\partial P}{\partial q_2} dq_2$$

tells us how to adjust production for maximum profit increase. The partial derivatives are the marginal profits.

4. Navigation: Small-Angle Approximations

GPS systems use linear approximation to convert latitude/longitude changes to distances. For small changes near a point $(\phi_0, \lambda_0)$, with angles in radians and $R$ the radius of the Earth:

$$\Delta x \approx R \cos(\phi_0) \, \Delta\lambda, \quad \Delta y \approx R \, \Delta\phi$$
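
A minimal sketch of this conversion follows. It assumes a spherical Earth with mean radius 6,371 km; the function name `local_offsets` and the sample coordinates are hypothetical and serve only to illustrate the formulas.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean radius of a spherical Earth, in metres

def local_offsets(lat0_deg, dlat_deg, dlon_deg):
    """East/north displacement in metres for small angle changes near lat0."""
    phi0 = math.radians(lat0_deg)
    dx = EARTH_RADIUS_M * math.cos(phi0) * math.radians(dlon_deg)  # east
    dy = EARTH_RADIUS_M * math.radians(dlat_deg)                   # north
    return dx, dy

# A change of 0.001 degrees in both latitude and longitude at 60 degrees north
dx, dy = local_offsets(60.0, 0.001, 0.001)
print(f"east offset:  {dx:.2f} m")
print(f"north offset: {dy:.2f} m")
```

At 60° latitude the factor $\cos(\phi_0) = 1/2$ halves the east-west distance relative to the north-south distance for the same angular change.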

Machine Learning Connections

Linear approximation is the mathematical foundation of several key concepts in machine learning.

Gradient Descent

When training a neural network, we minimize a loss function $L(\theta)$ where $\theta$ represents all the weights. Gradient descent uses the linear approximation:

$$L(\theta + \Delta\theta) \approx L(\theta) + \nabla L \cdot \Delta\theta$$

To decrease the loss, we choose $\Delta\theta = -\eta \nabla L$ (negative gradient direction), giving the update rule:

$$\theta_{new} = \theta_{old} - \eta \nabla L$$

Why This Works

Substituting $\Delta\theta = -\eta \nabla L$ into the linear approximation predicts that the loss will decrease by approximately $\eta \|\nabla L\|^2$ per step. As long as the gradient is nonzero and the learning rate $\eta$ is small enough that the linear approximation is accurate, each step reduces the loss.
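
The following sketch compares the predicted decrease $\eta \|\nabla L\|^2$ with the actual decrease for three learning rates. The quadratic loss and the starting point are hypothetical, chosen only so the numbers are easy to follow.

```python
import numpy as np

def loss(theta):
    # Hypothetical quadratic loss, used only for illustration
    return 0.5 * theta[0]**2 + 2.0 * theta[1]**2

def grad(theta):
    return np.array([theta[0], 4.0 * theta[1]])

theta = np.array([2.0, 1.0])
g = grad(theta)

results = {}
for eta in (0.01, 0.1, 0.6):
    predicted = eta * np.dot(g, g)                # first-order estimate of the decrease
    actual = loss(theta) - loss(theta - eta * g)  # true decrease after one step
    results[eta] = (predicted, actual)
    print(f"eta = {eta}: predicted decrease {predicted:.4f}, actual decrease {actual:.4f}")
```

For the small learning rate the prediction is nearly exact. For the largest one the actual "decrease" is negative: the step overshoots and the loss goes up, which is what happens once the linear approximation is no longer accurate.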

Taylor Expansion for Second-Order Methods

Newton's method uses the second-order Taylor expansion, and quasi-Newton methods such as L-BFGS work with an approximation to it:

$$L(\theta + \Delta\theta) \approx L(\theta) + \nabla L \cdot \Delta\theta + \frac{1}{2} \Delta\theta^T H \Delta\theta$$

where $H$ is the Hessian matrix of second derivatives (L-BFGS builds an approximation to it from recent gradients). This typically gives faster convergence near the minimum.
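
Minimizing the quadratic model over $\Delta\theta$ gives the Newton step $\Delta\theta = -H^{-1} \nabla L$. The sketch below applies one such step to a hypothetical quadratic loss, for which the model is exact; the matrix `A`, the vector `b`, and the starting point are assumptions made for the illustration.

```python
import numpy as np

# Hypothetical quadratic loss L(theta) = 0.5 * theta^T A theta - b^T theta
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])   # Hessian H (symmetric positive definite)
b = np.array([1.0, 1.0])

def grad(theta):
    return A @ theta - b

theta = np.array([5.0, -3.0])

# Newton step: solve H * step = -grad instead of forming the inverse explicitly
newton_step = -np.linalg.solve(A, grad(theta))
theta_new = theta + newton_step

print("after one Newton step:", theta_new)
print("gradient there:       ", grad(theta_new))
```

Because the loss is exactly quadratic, a single step lands on the minimizer, where the gradient vanishes. For a general loss the step is only as good as the quadratic approximation.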

Sensitivity and Feature Importance

The gradient $\nabla f$ tells us how sensitive the output is to each input feature:

  • Large $|\partial f / \partial x_i|$: Feature $x_i$ strongly affects the output
  • Small $|\partial f / \partial x_i|$: Feature $x_i$ has little effect on the output

This is the basis of gradient-based feature importance and saliency maps in deep learning interpretability.

Jacobian for Vector-Valued Functions

For neural networks with multiple outputs, the tangent plane generalizes to the Jacobian matrix:

$$J = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}$$

The linear approximation becomes $\mathbf{f}(\mathbf{x} + \Delta\mathbf{x}) \approx \mathbf{f}(\mathbf{x}) + J \Delta\mathbf{x}$.
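
Here is a small sketch of the Jacobian-based approximation for a map from $\mathbb{R}^2$ to $\mathbb{R}^2$. The function `F`, its hand-computed Jacobian, and the chosen point and perturbation are all hypothetical examples.

```python
import numpy as np

def F(x):
    # Hypothetical map R^2 -> R^2, used only for illustration
    return np.array([x[0] * x[1], np.sin(x[0]) + x[1]**2])

def jacobian(x):
    # Row i holds the partial derivatives of output i with respect to each input
    return np.array([[x[1],         x[0]],
                     [np.cos(x[0]), 2.0 * x[1]]])

x0 = np.array([1.0, 2.0])
dx = np.array([0.01, -0.02])

linear = F(x0) + jacobian(x0) @ dx   # f(x) + J * dx
exact = F(x0 + dx)

print("linear approximation:", linear)
print("exact value:         ", exact)
print("error norm:          ", np.linalg.norm(exact - linear))
```

The error norm is on the order of $\|\Delta\mathbf{x}\|^2$, mirroring the quadratic error behavior of the tangent plane.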


Python Implementation

Computing Tangent Planes

Tangent Plane Computation
tangent_plane.py

Tangent Plane Formula

The tangent plane z = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b) is the 2D generalization of the tangent line. It uses both partial derivatives to capture how the surface changes in both the x and y directions.

Linear Approximation

L(x, y) approximates f(x, y) near (a, b). The approximation is 'linear' because the formula is first-degree in (x - a) and (y - b). Higher-order terms are dropped.

Partial Derivatives

For f(x, y) = x² + y², we have f_x = 2x (derivative treating y as constant) and f_y = 2y (derivative treating x as constant). These give the slopes of the tangent plane.

import numpy as np

def compute_tangent_plane(f, fx, fy, point, grid_range=1.0, grid_size=20):
    """
    Compute tangent plane to surface z = f(x, y) at a given point.

    The tangent plane at (a, b, f(a, b)) is:
    z = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)

    This is the 2D analog of the tangent line equation!

    Parameters:
        f: Function f(x, y)
        fx: Partial derivative with respect to x
        fy: Partial derivative with respect to y
        point: Tuple (a, b) - point of tangency
        grid_range: Half-width of the square grid around (a, b)
        grid_size: Number of grid samples along each axis

    Returns:
        X, Y, Z_surface, Z_plane, (z0, fx_val, fy_val)
    """
    a, b = point
    z0 = f(a, b)
    fx_val = fx(a, b)
    fy_val = fy(a, b)

    # Create grid for visualization
    x = np.linspace(a - grid_range, a + grid_range, grid_size)
    y = np.linspace(b - grid_range, b + grid_range, grid_size)
    X, Y = np.meshgrid(x, y)

    # Surface values
    Z_surface = f(X, Y)

    # Tangent plane: z = z0 + fx*(x - a) + fy*(y - b)
    Z_plane = z0 + fx_val * (X - a) + fy_val * (Y - b)

    return X, Y, Z_surface, Z_plane, (z0, fx_val, fy_val)

def linear_approximation(f, fx, fy, base_point, eval_point):
    """
    Use tangent plane to approximate f near a point.

    f(x, y) ≈ L(x, y) = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)

    This is LINEAR approximation because the approximating
    function is a linear function of (x - a) and (y - b).

    Returns the tuple (approx, actual, error).
    """
    a, b = base_point
    x, y = eval_point

    z0 = f(a, b)
    approx = z0 + fx(a, b) * (x - a) + fy(a, b) * (y - b)
    actual = f(x, y)
    error = abs(actual - approx)

    return approx, actual, error

# Example: Paraboloid f(x, y) = x² + y²
def f(x, y):
    return x**2 + y**2

def fx(x, y):  # ∂f/∂x = 2x
    return 2 * x

def fy(x, y):  # ∂f/∂y = 2y
    return 2 * y

# Compute tangent plane at (1, 1)
point = (1, 1)
X, Y, Z_surface, Z_plane, params = compute_tangent_plane(f, fx, fy, point)
z0, fx_val, fy_val = params

print("Tangent Plane Analysis")
print("=" * 50)
print("Surface: z = x² + y²")
print(f"Point of tangency: ({point[0]}, {point[1]}, {z0})")
print(f"∂f/∂x at point: {fx_val}")
print(f"∂f/∂y at point: {fy_val}")
print("\nTangent plane equation:")
print(f"z = {z0} + {fx_val}(x - {point[0]}) + {fy_val}(y - {point[1]})")
print(f"z = {z0} + {fx_val}x - {fx_val*point[0]} + {fy_val}y - {fy_val*point[1]}")
print(f"z = {fx_val}x + {fy_val}y + {z0 - fx_val*point[0] - fy_val*point[1]}")

# Test linear approximation
test_point = (1.1, 1.05)
approx, actual, error = linear_approximation(f, fx, fy, point, test_point)
print("\nLinear Approximation Test:")
print(f"Test point: {test_point}")
print(f"Actual f(x, y) = {actual:.6f}")
print(f"Linear approx  = {approx:.6f}")
print(f"Error          = {error:.6f}")
print(f"Relative error = {100*error/actual:.4f}%")

Machine Learning Applications

Linear Approximation in ML
linear_approx_ml.py

Gradient Descent Connection

Gradient descent uses the linear approximation L(θ + Δθ) ≈ L(θ) + ∇L · Δθ to predict how the loss will change. Moving in the negative gradient direction minimizes this linear approximation.

Tangent Hyperplane

The formula L(x) = f(x₀) + ∇f(x₀)ᵀ(x - x₀) is the equation of the tangent hyperplane at x₀. This generalizes the 2D tangent plane to any number of dimensions.

Sensitivity Analysis

The gradient tells us how sensitive the output is to each input feature. Large gradient components indicate features that strongly affect the output - useful for feature importance and interpretability.

import numpy as np
from typing import Callable, Tuple

class LinearizationForML:
    """
    Linear approximation concepts are fundamental to machine learning:

    1. GRADIENT DESCENT: Uses linear approximation to predict
       loss change: L(θ + Δθ) ≈ L(θ) + ∇L · Δθ

    2. TAYLOR EXPANSION: Second-order methods (Newton, BFGS)
       use quadratic approximation beyond linear

    3. JACOBIAN: For vector-valued functions, the tangent plane
       generalizes to the Jacobian matrix

    4. SENSITIVITY ANALYSIS: How do small input changes
       affect outputs? Linear approximation quantifies this.
    """

    def __init__(self, f: Callable, grad_f: Callable):
        """
        f: Scalar function of vector input
        grad_f: Gradient function returning vector
        """
        self.f = f
        self.grad_f = grad_f

    def linear_model(self, x0: np.ndarray) -> Tuple[Callable, float, np.ndarray]:
        """
        Build linear approximation (tangent hyperplane) at x0.

        L(x) = f(x0) + ∇f(x0)ᵀ(x - x0)

        This is exactly what gradient descent uses to decide
        which direction to step!
        """
        f0 = self.f(x0)
        grad0 = self.grad_f(x0)

        def L(x):
            return f0 + np.dot(grad0, x - x0)

        return L, f0, grad0

    def gradient_descent_step(self, x0: np.ndarray, learning_rate: float) -> np.ndarray:
        """
        Gradient descent moves in the direction that the
        LINEAR APPROXIMATION predicts will decrease f most.

        The gradient ∇f points in the direction of steepest
        ascent of the tangent plane, so we go the opposite way.
        """
        _, _, grad0 = self.linear_model(x0)
        return x0 - learning_rate * grad0

    def sensitivity_analysis(self, x0: np.ndarray, perturbation: np.ndarray) -> dict:
        """
        How much does f change when we perturb the input?

        Linear approximation: Δf ≈ ∇f · Δx

        This tells us which input features most affect the output -
        crucial for interpretability and feature importance!
        """
        L, f0, grad0 = self.linear_model(x0)

        # Predicted change using linear approx
        predicted_change = np.dot(grad0, perturbation)

        # Actual change
        x_new = x0 + perturbation
        actual_change = self.f(x_new) - f0

        # Feature contributions
        feature_contributions = grad0 * perturbation

        return {
            'predicted_change': predicted_change,
            'actual_change': actual_change,
            'approximation_error': abs(actual_change - predicted_change),
            'gradient': grad0,
            'feature_contributions': feature_contributions,
            'relative_importance': np.abs(feature_contributions) / (np.sum(np.abs(feature_contributions)) + 1e-10)
        }

# Example: Logistic loss function (common in ML)
def logistic_loss(w):
    """
    Simplified logistic loss for demonstration.
    In practice: L(w) = -Σ[y log(σ(wx)) + (1-y) log(1 - σ(wx))]
    """
    x = np.array([1.0, 2.0, -1.0])  # Fixed data point
    y = 1  # Label
    z = np.dot(w, x)
    sigmoid = 1 / (1 + np.exp(-z))
    return -y * np.log(sigmoid + 1e-10) - (1 - y) * np.log(1 - sigmoid + 1e-10)

def logistic_loss_gradient(w):
    """Gradient of logistic loss."""
    x = np.array([1.0, 2.0, -1.0])
    y = 1
    z = np.dot(w, x)
    sigmoid = 1 / (1 + np.exp(-z))
    return (sigmoid - y) * x

# Demonstrate linear approximation in ML context
print("Linear Approximation in Machine Learning")
print("=" * 60)

# Initialize model
model = LinearizationForML(logistic_loss, logistic_loss_gradient)

# Current weights
w0 = np.array([0.5, -0.3, 0.8])
print(f"Current weights: {w0}")
print(f"Current loss: {logistic_loss(w0):.6f}")

# Build linear model
L, f0, grad0 = model.linear_model(w0)
print(f"\nGradient at current point: {grad0}")
print(f"Gradient magnitude: {np.linalg.norm(grad0):.6f}")

# Sensitivity analysis
perturbation = np.array([0.1, -0.05, 0.02])
analysis = model.sensitivity_analysis(w0, perturbation)

print(f"\nSensitivity Analysis:")
print(f"Perturbation: {perturbation}")
print(f"Predicted loss change (linear): {analysis['predicted_change']:.6f}")
print(f"Actual loss change: {analysis['actual_change']:.6f}")
print(f"Approximation error: {analysis['approximation_error']:.6f}")
print(f"\nFeature contributions to predicted change:")
for i, contrib in enumerate(analysis['feature_contributions']):
    print(f"  Feature {i}: {contrib:+.6f} ({100*analysis['relative_importance'][i]:.1f}%)")

# Gradient descent step
w1 = model.gradient_descent_step(w0, learning_rate=0.1)
print(f"\nGradient Descent Step (lr=0.1):")
print(f"New weights: {w1}")
print(f"New loss: {logistic_loss(w1):.6f}")
print(f"Loss decreased by: {logistic_loss(w0) - logistic_loss(w1):.6f}")


Summary

Tangent planes and linear approximations extend the powerful ideas of single-variable calculus to functions of multiple variables. They provide a way to locally "flatten" a curved surface, making analysis and computation tractable.

Key Formulas

| Concept | Formula |
| --- | --- |
| Tangent Plane | $z = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)$ |
| Linear Approximation | $L(x, y) = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)$ |
| Total Differential | $dz = f_x \, dx + f_y \, dy$ |
| Normal Vector | $\mathbf{n} = \langle -f_x, -f_y, 1 \rangle$ (or $\langle f_x, f_y, -1 \rangle$) |
| Approximation Error | $|f - L| = O(\text{distance}^2)$ |

Key Concepts

  1. The tangent plane at a point is determined by the two partial derivatives at that point
  2. Linear approximation uses the tangent plane equation to estimate function values near the point of tangency
  3. The total differential $dz = f_x \, dx + f_y \, dy$ estimates the change in $z$ from small changes in $x$ and $y$
  4. Differentiability in multiple variables is stronger than just having partial derivatives — it requires the linear approximation to be good
  5. The approximation error is quadratic in distance, making linear approximation very accurate for small displacements
  6. Gradient descent in machine learning is a direct application of linear approximation to optimization
The Essence of Differential Calculus:
"Locally, every smooth surface is flat. The tangent plane captures this local flatness, and linear approximation exploits it."
Coming Next: In the next section, we'll explore The Chain Rule for Multivariable Functions — how to compute derivatives when variables depend on other variables. This is the foundation of backpropagation in neural networks.