Derive the equation of the tangent plane to a surface z=f(x,y) at any point using partial derivatives
Construct linear approximations to functions of two variables and estimate values near a point of tangency
Calculate total differentials and understand their relationship to linear approximation
Distinguish between continuity and differentiability in multiple dimensions
Analyze the error in linear approximations and understand when the approximation is valid
Apply these concepts to problems in optimization, sensitivity analysis, and machine learning
The Big Picture: Flattening the World
"Every smooth surface, when viewed closely enough, looks flat. The tangent plane is that flat approximation."
When you stand on the Earth, you perceive it as flat even though it is a sphere. This is because at any point, the Earth's surface can be well-approximated by a flat plane — the tangent plane at that point. This fundamental idea extends to any smooth surface in mathematics.
In single-variable calculus, we learned that a differentiable function y=f(x) can be approximated near a point x=a by its tangent line:
f(x)≈f(a)+f′(a)(x−a)
Now we extend this powerful idea to functions of two variables. Instead of a tangent line, we have a tangent plane. Instead of one derivative, we use both partial derivatives.
| Single Variable | Two Variables |
| --- | --- |
| Curve y=f(x) | Surface z=f(x,y) |
| Tangent line at x=a | Tangent plane at (a,b) |
| Uses one derivative: f′(a) | Uses two partials: fx, fy |
| Linear approximation in 1D | Linear approximation in 2D |
Why Tangent Planes Matter
Tangent planes are the foundation of differential calculus in higher dimensions. They enable:
Optimization: Gradient descent uses linear approximation to find minima
Sensitivity Analysis: How do small input changes affect outputs?
Error Propagation: How do measurement errors compound?
Numerical Methods: Newton's method in multiple dimensions
From Tangent Lines to Tangent Planes
Consider a surface z=f(x,y) and a point P=(a,b,f(a,b)) on this surface. We want to find the plane that best approximates the surface near this point.
The Two Tangent Lines
At point P, we can draw two special curves on the surface:
The curve where y=b (constant): This gives us z=f(x,b), a function of x alone. Its tangent line at x=a has slope fx(a,b).
The curve where x=a (constant): This gives us z=f(a,y), a function of y alone. Its tangent line at y=b has slope fy(a,b).
These two tangent lines lie in the tangent plane, and together they determine the tangent plane uniquely.
Key Geometric Insight
The tangent plane contains:
The tangent to the x-slice curve (red): direction ⟨1,0,fx⟩
The tangent to the y-slice curve (blue): direction ⟨0,1,fy⟩
The plane through P that contains both of these direction vectors is the tangent plane.
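The two direction vectors also determine a normal to the plane through their cross product. The following is a minimal sketch of that computation; the slope values fx = 2 and fy = 4 are illustrative assumptions (they match f = x² + y² at (1, 2)), and the helper names `cross` and `dot` are hypothetical.

```python
def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Assumed slopes for illustration: f = x^2 + y^2 at (1, 2)
fx, fy = 2.0, 4.0

t_x = (1.0, 0.0, fx)  # tangent direction of the x-slice curve
t_y = (0.0, 1.0, fy)  # tangent direction of the y-slice curve

# The cross product is perpendicular to both tangent directions,
# so it is a normal vector of the tangent plane: <-fx, -fy, 1>.
normal = cross(t_x, t_y)
print("normal:", normal)
print("normal . t_x =", dot(normal, t_x))
print("normal . t_y =", dot(normal, t_y))
```

The result is ⟨−fx, −fy, 1⟩, which is the negative of ⟨fx, fy, −1⟩; both describe the same plane.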
The Tangent Plane Equation
Let f(x,y) be a function with continuous partial derivatives near (a,b). The tangent plane to the surface z=f(x,y) at the point (a,b,f(a,b)) is:
Tangent Plane Equation
z−f(a,b)=fx(a,b)(x−a)+fy(a,b)(y−b)
Or equivalently:
z=f(a,b)+fx(a,b)(x−a)+fy(a,b)(y−b)
Deriving the Normal Vector
We can also express the tangent plane using its normal vector. Rewrite the surface as F(x,y,z)=f(x,y)−z=0. The gradient of F is:
∇F=⟨fx,fy,−1⟩
This vector is normal (perpendicular) to the tangent plane. The tangent plane can then be written as:
fx(a,b)(x−a)+fy(a,b)(y−b)−(z−f(a,b))=0
Example: Paraboloid
Find the tangent plane to z=x²+y² at the point (1,2,5).
Solution:
First, compute the partial derivatives:
fx=2x, so fx(1,2)=2
fy=2y, so fy(1,2)=4
The tangent plane equation is:
z−5=2(x−1)+4(y−2)
Simplifying: z=2x+4y−5
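The result can be checked numerically. This sketch estimates the partial derivatives with central differences (a choice made here for illustration, not part of the worked solution) and compares the tangent plane with the simplified form z = 2x + 4y − 5. The helper name `partial` and the step size `h` are assumptions.

```python
def f(x, y):
    return x**2 + y**2

def partial(g, x, y, wrt, h=1e-6):
    """Central-difference estimate of a partial derivative of g at (x, y)."""
    if wrt == "x":
        return (g(x + h, y) - g(x - h, y)) / (2 * h)
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

a, b = 1.0, 2.0
z0 = f(a, b)                   # height of the surface at the point of tangency
fx_ab = partial(f, a, b, "x")  # should be close to 2
fy_ab = partial(f, a, b, "y")  # should be close to 4

def tangent_plane(x, y):
    """z = f(a,b) + fx(a,b)(x - a) + fy(a,b)(y - b)."""
    return z0 + fx_ab * (x - a) + fy_ab * (y - b)

def simplified(x, y):
    """The simplified form obtained in the worked example."""
    return 2 * x + 4 * y - 5

print(f"fx(1,2) ~ {fx_ab:.6f}, fy(1,2) ~ {fy_ab:.6f}")
print(f"plane at (1.3, 1.8): {tangent_plane(1.3, 1.8):.6f}")
print(f"2x + 4y - 5 at (1.3, 1.8): {simplified(1.3, 1.8):.6f}")
```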
Linear Approximation
The tangent plane provides a linear approximation to the function near the point of tangency. This is also called the linearization of f at (a,b).
Linear Approximation (Linearization)
L(x,y)=f(a,b)+fx(a,b)(x−a)+fy(a,b)(y−b)
For points (x,y) near (a,b):
f(x,y)≈L(x,y)
Why "Linear"?
The approximating function L(x,y) is linear in the displacements (x−a) and (y−b). If we set Δx=x−a and Δy=y−b:
f(a+Δx,b+Δy)≈f(a,b)+fx(a,b)Δx+fy(a,b)Δy
This is a first-degree polynomial in Δx and Δy.
Example: Estimating a Square Root
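Estimate √((3.02)²+(3.97)²) using linear approximation.
Solution:
Let f(x,y)=√(x²+y²). We use the base point (a,b)=(3,4), where f(3,4)=5.
The partial derivatives are fx=x/√(x²+y²) and fy=y/√(x²+y²), so fx(3,4)=3/5 and fy(3,4)=4/5.
With Δx=0.02 and Δy=−0.03, the linearization gives L(3.02,3.97)=5+(0.6)(0.02)+(0.8)(−0.03)=4.988. The true value is approximately 4.98812, so the estimate is off by about 10⁻⁴.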
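A short numerical check of this estimate, assuming the intended function is f(x,y) = √(x² + y²), as the "square root" heading suggests. The variable names are illustrative.

```python
import math

def f(x, y):
    return math.sqrt(x**2 + y**2)

a, b = 3.0, 4.0
f_ab = f(a, b)    # 5.0
fx_ab = a / f_ab  # f_x = x / sqrt(x^2 + y^2) evaluated at (3, 4)
fy_ab = b / f_ab  # f_y = y / sqrt(x^2 + y^2) evaluated at (3, 4)

x, y = 3.02, 3.97
estimate = f_ab + fx_ab * (x - a) + fy_ab * (y - b)  # linearization L(x, y)
actual = f(x, y)

print(f"linear estimate = {estimate:.5f}")
print(f"actual value    = {actual:.5f}")
print(f"absolute error  = {abs(actual - estimate):.2e}")
```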
In words: f is differentiable at (a,b) when the linear approximation error f(x,y)−L(x,y) goes to zero faster than the distance from (x,y) to the point of tangency.
Sufficient Condition for Differentiability
Sufficient Condition
If the partial derivatives fx and fy exist and are continuous in a neighborhood of (a,b), then f is differentiable at (a,b).
This is why we often require continuity of partial derivatives — it guarantees that tangent planes exist and the linear approximation is valid.
A Classic Counterexample
Consider the function:
$$f(x,y) = \begin{cases} \dfrac{xy}{x^2+y^2} & (x,y) \neq (0,0) \\ 0 & (x,y) = (0,0) \end{cases}$$
At the origin:
fx(0,0)=0 (computed using the definition)
fy(0,0)=0 (computed using the definition)
But f is not continuous at (0,0) — approaching along y=x gives f=1/2
Since f is not even continuous, it cannot be differentiable. The tangent plane formula gives z=0, but this is a terrible approximation for points not on the axes.
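The behavior at the origin can be seen numerically. This sketch evaluates the difference quotients that define fx(0,0) and fy(0,0), and the value of f along the line y = x, for a few shrinking step sizes; the step sizes are arbitrary choices.

```python
def f(x, y):
    """xy / (x^2 + y^2) away from the origin, 0 at the origin."""
    if x == 0 and y == 0:
        return 0.0
    return x * y / (x * x + y * y)

quotients_x, quotients_y, diagonal = [], [], []
for h in (1e-1, 1e-3, 1e-6):
    # Difference quotients from the definition of the partials at (0, 0)
    quotients_x.append((f(h, 0.0) - f(0.0, 0.0)) / h)
    quotients_y.append((f(0.0, h) - f(0.0, 0.0)) / h)
    # Value of f on the line y = x, at distance h * sqrt(2) from the origin
    diagonal.append(f(h, h))

print("fx quotients:", quotients_x)  # all zero: f vanishes on the x-axis
print("fy quotients:", quotients_y)  # all zero: f vanishes on the y-axis
print("f(h, h):     ", diagonal)     # stays near 1/2, not near f(0,0) = 0
```

Both partial derivatives exist and equal zero, yet f does not approach f(0,0) along the diagonal, so the plane z = 0 does not approximate the surface there.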
Error Analysis
How good is the linear approximation? The answer depends on how far we are from the point of tangency and how curved the surface is.
The Error Formula
If f has continuous second partial derivatives, the error in the linear approximation is:
$$f(x,y) - L(x,y) = O\big((x-a)^2 + (y-b)^2\big)$$
This means the error is bounded by a constant multiple of the square of the distance from the point of tangency.
| Distance from (a, b) | Relative Error Scale |
| --- | --- |
| 0.1 | ~0.01 (1%) |
| 0.01 | ~0.0001 (0.01%) |
| 0.001 | ~0.000001 (0.0001%) |
This quadratic behavior is why linear approximation works so well for small displacements — the error shrinks much faster than the distance.
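The quadratic scaling can be observed directly. A minimal sketch, assuming the paraboloid f(x,y) = x² + y² with base point (1, 2) as the test function and moving only in the x-direction; for this particular function the error equals the squared distance exactly, so the ratio error/d² stays at 1.

```python
def f(x, y):
    return x**2 + y**2

a, b = 1.0, 2.0
f_ab, fx_ab, fy_ab = f(a, b), 2 * a, 2 * b  # value and exact partials at (a, b)

def L(x, y):
    """Linearization of f at (a, b)."""
    return f_ab + fx_ab * (x - a) + fy_ab * (y - b)

ratios = []
for d in (0.1, 0.01, 0.001):
    error = abs(f(a + d, b) - L(a + d, b))  # step of length d along x
    ratios.append(error / d**2)
    print(f"distance = {d:<6} error = {error:.3e}  error / d^2 = {error / d**2:.6f}")
```

Shrinking the distance by a factor of 10 shrinks the error by a factor of about 100.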
Real-World Applications
1. Engineering: Stress and Strain
In structural engineering, the stress σ in a material often depends on multiple loading conditions. Linear approximation lets engineers estimate how small changes in loads affect stress:
$$d\sigma = \frac{\partial \sigma}{\partial F_1}\,dF_1 + \frac{\partial \sigma}{\partial F_2}\,dF_2$$
2. Physics: Thermodynamics
For an ideal gas, pressure P depends on volume V and temperature T:
$$P = \frac{nRT}{V}$$
The total differential gives:
$$dP = \frac{nR}{V}\,dT - \frac{nRT}{V^2}\,dV$$
This tells us how pressure changes with small variations in temperature and volume.
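As an illustration, the total differential can be compared with the exact pressure change for a small perturbation. The numbers below (1 mol of gas at 300 K in 0.025 m³, perturbed by 1 K and 10⁻⁴ m³) are assumed values chosen for the example.

```python
R = 8.314  # gas constant in J/(mol K)
n = 1.0    # amount of gas in mol (assumed)

def pressure(V, T):
    """Ideal gas law P = nRT / V, in pascals."""
    return n * R * T / V

V0, T0 = 0.025, 300.0  # assumed state: volume in m^3, temperature in K
dV, dT = 1e-4, 1.0     # assumed small changes

# Total differential: dP = (nR/V) dT - (nRT/V^2) dV
dP = (n * R / V0) * dT - (n * R * T0 / V0**2) * dV

# Exact change, for comparison
delta_P = pressure(V0 + dV, T0 + dT) - pressure(V0, T0)

print(f"dP (linear estimate) = {dP:.3f} Pa")
print(f"exact change         = {delta_P:.3f} Pa")
```

Here the volume increase outweighs the temperature increase, so the pressure drops, and the linear estimate differs from the exact change by well under 1 Pa.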
3. Economics: Marginal Analysis
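If profit P(q1,q2) depends on the quantities of two products, the total differential
$$dP = \frac{\partial P}{\partial q_1}\,dq_1 + \frac{\partial P}{\partial q_2}\,dq_2$$
estimates how profit changes when production is adjusted by small amounts dq1 and dq2. The partial derivatives are the marginal profits, so the product with the larger marginal profit yields the greater profit increase per additional unit.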
4. Navigation: Small-Angle Approximations
GPS systems use linear approximation to convert latitude/longitude changes to distances. For small changes near a point (ϕ0,λ0):
Δx ≈ R cos(ϕ0) Δλ, Δy ≈ R Δϕ
Here R is the Earth's radius and the angle changes Δλ and Δϕ are measured in radians.
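A rough sketch of this conversion follows. The mean Earth radius of 6,371,000 m, the reference latitude of 40°, and the function name `latlon_delta_to_meters` are assumptions made for illustration; the angle changes must be converted to radians before applying the formulas.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in meters

def latlon_delta_to_meters(lat0_deg, dlat_deg, dlon_deg):
    """Small-displacement approximation near latitude lat0.

    Returns (dx, dy): east-west and north-south distances in meters.
    """
    lat0 = math.radians(lat0_deg)
    dlat = math.radians(dlat_deg)
    dlon = math.radians(dlon_deg)
    dx = EARTH_RADIUS_M * math.cos(lat0) * dlon  # east-west
    dy = EARTH_RADIUS_M * dlat                   # north-south
    return dx, dy

# A change of 0.001 degrees in both latitude and longitude near 40 degrees north
dx, dy = latlon_delta_to_meters(40.0, 0.001, 0.001)
print(f"dx ~ {dx:.2f} m, dy ~ {dy:.2f} m")
```

The east-west distance is shorter than the north-south distance by the factor cos(ϕ0), because circles of latitude shrink toward the poles.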
Machine Learning Connections
Linear approximation is the mathematical foundation of several key concepts in machine learning.
Gradient Descent
When training a neural network, we minimize a loss function L(θ) where θ represents all the weights. Gradient descent uses the linear approximation:
L(θ+Δθ)≈L(θ)+∇L⋅Δθ
To decrease the loss, we choose Δθ=−η∇L (negative gradient direction), giving the update rule:
θ_new = θ_old − η∇L
Why This Works
The linear approximation predicts that each step decreases the loss by approximately η‖∇L‖². When the learning rate η is small enough for the linear approximation to be accurate (and ∇L≠0), the actual loss decreases as well. If η is too large, the neglected curvature terms can dominate and the loss may increase.
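The effect of the learning rate can be illustrated on a small example. This sketch uses a hypothetical quadratic loss L(θ) = θ₁² + 2θ₂² (not taken from the text) and compares the decrease predicted by the linear approximation with the actual decrease for a small and a large step size.

```python
def loss(theta):
    """Hypothetical quadratic loss used only for illustration."""
    t1, t2 = theta
    return t1**2 + 2 * t2**2

def grad(theta):
    """Exact gradient of the loss above."""
    t1, t2 = theta
    return (2 * t1, 4 * t2)

def gd_step(theta, eta):
    """One gradient descent update: theta - eta * grad."""
    g = grad(theta)
    return tuple(t - eta * gi for t, gi in zip(theta, g))

theta = (1.0, 1.0)
grad_norm_sq = sum(gi**2 for gi in grad(theta))

results = {}
for eta in (0.01, 1.0):
    predicted = eta * grad_norm_sq                    # decrease from the linear model
    actual = loss(theta) - loss(gd_step(theta, eta))  # true decrease
    results[eta] = (predicted, actual)
    print(f"eta = {eta}: predicted decrease = {predicted:.4f}, actual = {actual:.4f}")
```

With η = 0.01 the prediction and the actual decrease nearly agree; with η = 1.0 the step overshoots and the loss goes up, so the "actual decrease" is negative.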
Taylor Expansion for Second-Order Methods
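Second-order methods are based on the second-order Taylor expansion. Newton's method uses the exact Hessian, while quasi-Newton methods such as L-BFGS build an approximation to it: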
$$L(\theta + \Delta\theta) \approx L(\theta) + \nabla L \cdot \Delta\theta + \tfrac{1}{2}\,\Delta\theta^{T} H\, \Delta\theta$$
where H is the Hessian matrix of second derivatives. This gives faster convergence near the minimum.
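Minimizing the quadratic model over Δθ gives the Newton step, which solves H Δθ = −∇L. The sketch below applies one such step to a hypothetical quadratic loss L(θ) = θ₁² + θ₁θ₂ + 2θ₂² (an assumption for illustration); because the loss is exactly quadratic, a single step lands on the minimizer. The helper `solve_2x2` is a hypothetical name and uses Cramer's rule.

```python
def grad(theta):
    """Gradient of the illustrative loss t1^2 + t1*t2 + 2*t2^2."""
    t1, t2 = theta
    return (2 * t1 + t2, t1 + 4 * t2)

# Hessian of the illustrative loss (constant, since the loss is quadratic)
H = ((2.0, 1.0),
     (1.0, 4.0))

def solve_2x2(A, b):
    """Solve A x = b for a 2x2 matrix A using Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    x1 = (A[0][0] * b[1] - A[1][0] * b[0]) / det
    return (x0, x1)

theta = (1.0, 1.0)
g = grad(theta)

# Newton step: solve H * delta = -grad
delta = solve_2x2(H, (-g[0], -g[1]))
theta_new = (theta[0] + delta[0], theta[1] + delta[1])

print("Newton step:", delta)
print("New point:  ", theta_new)
print("Gradient at new point:", grad(theta_new))
```

For a non-quadratic loss the step would only be approximate, and the process is repeated.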
Sensitivity and Feature Importance
The gradient ∇f tells us how sensitive the output is to each input feature:
Large |∂f/∂x_i|: Feature x_i strongly affects the output
Small |∂f/∂x_i|: Feature x_i has little effect on the output
This is the basis of gradient-based feature importance and saliency maps in deep learning interpretability.
Jacobian for Vector-Valued Functions
For neural networks with multiple outputs, the tangent plane generalizes to the Jacobian matrix:
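$$J = \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{pmatrix}$$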
The linear approximation becomes f(x+Δx)≈f(x)+JΔx.
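A small sketch of this vector-valued linearization follows. The function F(x₁, x₂) = (x₁x₂, x₁ + x₂²) is a hypothetical example, and the Jacobian is estimated with central differences rather than derived by hand; `numerical_jacobian` is an illustrative helper, not a library function.

```python
def F(x):
    """Hypothetical map R^2 -> R^2, chosen only for illustration."""
    x1, x2 = x
    return [x1 * x2, x1 + x2**2]

def numerical_jacobian(func, x, h=1e-6):
    """Central-difference Jacobian: J[i][j] approximates d func_i / d x_j."""
    m, n = len(func(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

x0 = [1.0, 2.0]
dx = [0.01, -0.02]

J = numerical_jacobian(F, x0)  # analytically [[x2, x1], [1, 2*x2]] = [[2, 1], [1, 4]]
F0 = F(x0)

# Linear approximation: F(x0 + dx) ~ F(x0) + J dx
approx = [F0[i] + sum(J[i][j] * dx[j] for j in range(len(dx))) for i in range(len(F0))]
actual = F([x0[0] + dx[0], x0[1] + dx[1]])

print("Jacobian:", J)
print("linear approximation:", approx)
print("actual value:        ", actual)
```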
Python Implementation
Computing Tangent Planes
Tangent Plane Computation (tangent_plane.py)
Tangent Plane Formula: The tangent plane z = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b) is the 2D generalization of the tangent line. It uses both partial derivatives to capture how the surface changes in both the x and y directions.
Linear Approximation: L(x, y) approximates f(x, y) near (a, b). The approximation is 'linear' because the formula is first-degree in (x - a) and (y - b). Higher-order terms are dropped.
Partial Derivatives: For f(x, y) = x² + y², we have f_x = 2x (derivative treating y as constant) and f_y = 2y (derivative treating x as constant). These give the slopes of the tangent plane.
```python
import numpy as np
import matplotlib.pyplot as plt  # imported for plotting; not used in this listing
from mpl_toolkits.mplot3d import Axes3D  # imported for 3D plotting; not used in this listing

def compute_tangent_plane(f, fx, fy, point, grid_range=1.0, grid_size=20):
    """
    Compute tangent plane to surface z = f(x, y) at a given point.

    The tangent plane at (a, b, f(a, b)) is:
        z = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)

    This is the 2D analog of the tangent line equation!

    Parameters:
        f: Function f(x, y)
        fx: Partial derivative with respect to x
        fy: Partial derivative with respect to y
        point: Tuple (a, b) - point of tangency
    """
    a, b = point
    z0 = f(a, b)
    fx_val = fx(a, b)
    fy_val = fy(a, b)

    # Create grid for visualization
    x = np.linspace(a - grid_range, a + grid_range, grid_size)
    y = np.linspace(b - grid_range, b + grid_range, grid_size)
    X, Y = np.meshgrid(x, y)

    # Surface values
    Z_surface = f(X, Y)

    # Tangent plane: z = z0 + fx*(x - a) + fy*(y - b)
    Z_plane = z0 + fx_val * (X - a) + fy_val * (Y - b)

    return X, Y, Z_surface, Z_plane, (z0, fx_val, fy_val)

def linear_approximation(f, fx, fy, base_point, eval_point):
    """
    Use tangent plane to approximate f near a point.

    L(x, y) = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)

    This is LINEAR approximation because the approximating
    function is a linear function of (x - a) and (y - b).
    """
    a, b = base_point
    x, y = eval_point

    z0 = f(a, b)
    approx = z0 + fx(a, b) * (x - a) + fy(a, b) * (y - b)
    actual = f(x, y)
    error = abs(actual - approx)

    return approx, actual, error

# Example: Paraboloid f(x, y) = x² + y²
def f(x, y):
    return x**2 + y**2

def fx(x, y):  # ∂f/∂x = 2x
    return 2 * x

def fy(x, y):  # ∂f/∂y = 2y
    return 2 * y

# Compute tangent plane at (1, 1)
point = (1, 1)
X, Y, Z_surface, Z_plane, params = compute_tangent_plane(f, fx, fy, point)
z0, fx_val, fy_val = params

print("Tangent Plane Analysis")
print("=" * 50)
print("Surface: z = x² + y²")
print(f"Point of tangency: ({point[0]}, {point[1]}, {z0})")
print(f"∂f/∂x at point: {fx_val}")
print(f"∂f/∂y at point: {fy_val}")
print("\nTangent plane equation:")
print(f"z = {z0} + {fx_val}(x - {point[0]}) + {fy_val}(y - {point[1]})")
print(f"z = {z0} + {fx_val}x - {fx_val*point[0]} + {fy_val}y - {fy_val*point[1]}")
# Format the constant with an explicit sign so a negative value prints cleanly
const = z0 - fx_val * point[0] - fy_val * point[1]
print(f"z = {fx_val}x + {fy_val}y {const:+}")

# Test linear approximation
test_point = (1.1, 1.05)
approx, actual, error = linear_approximation(f, fx, fy, point, test_point)
print("\nLinear Approximation Test:")
print(f"Test point: {test_point}")
print(f"Actual f(x, y) = {actual:.6f}")
print(f"Linear approx = {approx:.6f}")
print(f"Error = {error:.6f}")
print(f"Relative error = {100*error/actual:.4f}%")
```
Machine Learning Applications
Linear Approximation in ML (linear_approx_ml.py)
Gradient Descent Connection: Gradient descent uses the linear approximation L(θ + Δθ) ≈ L(θ) + ∇L · Δθ to predict how the loss will change. Moving in the negative gradient direction minimizes this linear approximation for a given step length.
Tangent Hyperplane: The formula L(x) = f(x₀) + ∇f(x₀)ᵀ(x - x₀) is the equation of the tangent hyperplane at x₀. This generalizes the 2D tangent plane to any number of dimensions.
Sensitivity Analysis: The gradient tells us how sensitive the output is to each input feature. Large gradient components indicate features that strongly affect the output, which is useful for feature importance and interpretability.
```python
import numpy as np
from typing import Callable, Tuple

class LinearizationForML:
    """
    Linear approximation concepts are fundamental to machine learning:

    1. GRADIENT DESCENT: Uses linear approximation to predict
       loss change: L(θ + Δθ) ≈ L(θ) + ∇L · Δθ

    2. TAYLOR EXPANSION: Second-order methods (Newton, BFGS)
       use quadratic approximation beyond linear

    3. JACOBIAN: For vector-valued functions, the tangent plane
       generalizes to the Jacobian matrix

    4. SENSITIVITY ANALYSIS: How do small input changes
       affect outputs? Linear approximation quantifies this.
    """

    def __init__(self, f: Callable, grad_f: Callable):
        """
        f: Scalar function of vector input
        grad_f: Gradient function returning vector
        """
        self.f = f
        self.grad_f = grad_f

    def linear_model(self, x0: np.ndarray) -> Tuple[Callable, float, np.ndarray]:
        """
        Build linear approximation (tangent hyperplane) at x0.

        L(x) = f(x0) + ∇f(x0)ᵀ(x - x0)

        This is exactly what gradient descent uses to decide
        which direction to step!
        """
        f0 = self.f(x0)
        grad0 = self.grad_f(x0)

        def L(x):
            return f0 + np.dot(grad0, x - x0)

        return L, f0, grad0

    def gradient_descent_step(self, x0: np.ndarray, learning_rate: float) -> np.ndarray:
        """
        Gradient descent moves in the direction that the
        LINEAR APPROXIMATION predicts will decrease f most.

        The gradient ∇f points in the direction of steepest
        ascent of the tangent plane, so we go the opposite way.
        """
        _, _, grad0 = self.linear_model(x0)
        return x0 - learning_rate * grad0

    def sensitivity_analysis(self, x0: np.ndarray, perturbation: np.ndarray) -> dict:
        """
        How much does f change when we perturb the input?

        Linear approximation: Δf ≈ ∇f · Δx

        This tells us which input features most affect the output -
        crucial for interpretability and feature importance!
        """
        L, f0, grad0 = self.linear_model(x0)

        # Predicted change using linear approx
        predicted_change = np.dot(grad0, perturbation)

        # Actual change
        x_new = x0 + perturbation
        actual_change = self.f(x_new) - f0

        # Feature contributions
        feature_contributions = grad0 * perturbation

        return {
            'predicted_change': predicted_change,
            'actual_change': actual_change,
            'approximation_error': abs(actual_change - predicted_change),
            'gradient': grad0,
            'feature_contributions': feature_contributions,
            'relative_importance': np.abs(feature_contributions) / (np.sum(np.abs(feature_contributions)) + 1e-10)
        }

# Example: Logistic loss function (common in ML)
def logistic_loss(w):
    """
    Simplified logistic loss for demonstration.
    In practice: L(w) = -Σ[y log(σ(wx)) + (1-y) log(1 - σ(wx))]
    """
    x = np.array([1.0, 2.0, -1.0])  # Fixed data point
    y = 1  # Label
    z = np.dot(w, x)
    sigmoid = 1 / (1 + np.exp(-z))
    return -y * np.log(sigmoid + 1e-10) - (1 - y) * np.log(1 - sigmoid + 1e-10)

def logistic_loss_gradient(w):
    """Gradient of logistic loss."""
    x = np.array([1.0, 2.0, -1.0])
    y = 1
    z = np.dot(w, x)
    sigmoid = 1 / (1 + np.exp(-z))
    return (sigmoid - y) * x

# Demonstrate linear approximation in ML context
print("Linear Approximation in Machine Learning")
print("=" * 60)

# Initialize model
model = LinearizationForML(logistic_loss, logistic_loss_gradient)

# Current weights
w0 = np.array([0.5, -0.3, 0.8])
print(f"Current weights: {w0}")
print(f"Current loss: {logistic_loss(w0):.6f}")

# Build linear model
L, f0, grad0 = model.linear_model(w0)
print(f"\nGradient at current point: {grad0}")
print(f"Gradient magnitude: {np.linalg.norm(grad0):.6f}")

# Sensitivity analysis
perturbation = np.array([0.1, -0.05, 0.02])
analysis = model.sensitivity_analysis(w0, perturbation)

print(f"\nSensitivity Analysis:")
print(f"Perturbation: {perturbation}")
print(f"Predicted loss change (linear): {analysis['predicted_change']:.6f}")
print(f"Actual loss change: {analysis['actual_change']:.6f}")
print(f"Approximation error: {analysis['approximation_error']:.6f}")
print(f"\nFeature contributions to predicted change:")
for i, contrib in enumerate(analysis['feature_contributions']):
    print(f"  Feature {i}: {contrib:+.6f} ({100*analysis['relative_importance'][i]:.1f}%)")

# Gradient descent step
w1 = model.gradient_descent_step(w0, learning_rate=0.1)
print(f"\nGradient Descent Step (lr=0.1):")
print(f"New weights: {w1}")
print(f"New loss: {logistic_loss(w1):.6f}")
print(f"Loss decreased by: {logistic_loss(w0) - logistic_loss(w1):.6f}")
```
Summary
Tangent planes and linear approximations extend the powerful ideas of single-variable calculus to functions of multiple variables. They provide a way to locally "flatten" a curved surface, making analysis and computation tractable.
Key Formulas
| Concept | Formula |
| --- | --- |
| Tangent Plane | z = f(a,b) + f_x(a,b)(x - a) + f_y(a,b)(y - b) |
| Linear Approximation | L(x,y) = f(a,b) + f_x(a,b)(x - a) + f_y(a,b)(y - b) |
| Total Differential | dz = f_x dx + f_y dy |
| Normal Vector | n = ⟨-f_x, -f_y, 1⟩ (or ⟨f_x, f_y, -1⟩) |
| Approximation Error | \|f - L\| = O(distance²) |
Key Concepts
The tangent plane at a point is determined by the two partial derivatives at that point
Linear approximation uses the tangent plane equation to estimate function values near the point of tangency
The total differential dz = f_x dx + f_y dy estimates the change in z from small changes in x and y
Differentiability in multiple variables is stronger than just having partial derivatives — it requires the linear approximation to be good
The approximation error is quadratic in distance, making linear approximation very accurate for small displacements
Gradient descent in machine learning is a direct application of linear approximation to optimization
The Essence of Differential Calculus:
"Locally, every smooth surface is flat. The tangent plane captures this local flatness, and linear approximation exploits it."
Coming Next: In the next section, we'll explore The Chain Rule for Multivariable Functions — how to compute derivatives when variables depend on other variables. This is the foundation of backpropagation in neural networks.