Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Understand the geometric meaning of constrained optimization and why the gradients of f and g must be parallel at optimal points
Apply the method of Lagrange multipliers to solve optimization problems with equality constraints
Construct and analyze the Lagrangian function for single and multiple constraints
Interpret the Lagrange multiplier λ as the shadow price or marginal value of relaxing the constraint
Extend the method to problems with multiple equality constraints
Connect Lagrange multipliers to machine learning applications including SVMs and constrained optimization

The Big Picture: Optimization Under Constraints

"The art of constrained optimization is finding the best you can do while respecting the rules of the game."

In the real world, optimization rarely happens in a vacuum. A company wants to maximize profit, but has limited resources. A physicist seeks the minimum energy configuration, but certain quantities are conserved. A machine learning algorithm minimizes loss, but model parameters must satisfy regularization constraints.

Lagrange multipliers provide an elegant method to handle such constrained optimization problems. Instead of searching the entire domain of a function, we search only along a constraint surface—finding where the objective function is optimized while staying on that surface.

Why This Matters

Lagrange multipliers appear throughout science and engineering:

Economics: Maximize utility subject to budget constraints
Physics: Find equilibrium states with conservation laws
Engineering: Optimize designs with material and geometric constraints
Machine Learning: Train SVMs and constrained neural networks, and apply Lagrangian relaxation
Statistics: Maximum entropy distributions and exponential families

Historical Context

The method was developed by Joseph-Louis Lagrange (1736-1813), one of the greatest mathematicians of the 18th century. Working in Turin, then Berlin, and later Paris, Lagrange made fundamental contributions to analysis, number theory, and mechanics.

Lagrange introduced his multiplier technique in the context of mechanics, where he sought to find equilibrium positions of systems subject to constraints. The method appears in his monumental work Mécanique Analytique (1788), which reformulated Newtonian mechanics in a unified variational framework.

Lagrange's Insight

Lagrange realized that instead of trying to eliminate constraints by substitution, you can "incorporate" them into the problem using auxiliary variables, the multipliers. This turns a constrained problem into the search for stationary points of a single unconstrained function (the Lagrangian) in a higher-dimensional space, with one extra dimension per multiplier.

The Constrained Optimization Problem

We consider the following problem:

Standard Constrained Optimization Problem

Maximize (or minimize) $f(x, y)$

subject to $g(x, y) = 0$

Here, $f$ is the objective function we want to optimize, and $g(x, y) = 0$ is the constraint that restricts our domain to a curve (or surface in higher dimensions).

Key Question: How do we find the point on the constraint curve where $f$ achieves its maximum or minimum value?

Geometric Intuition

Consider the level curves $f(x, y) = c$ for various values of $c$. As $c$ changes, these curves sweep across the plane.

Now imagine walking along the constraint curve $g(x, y) = 0$ and watching the value of $f$. At most points, the level curves of $f$ cross the constraint transversally (non-tangentially). This means you can move along the constraint to increase or decrease $f$.

At the optimum, something special happens: the level curve of $f$ is tangent to the constraint curve. You cannot improve $f$ while staying on the constraint!

The Tangency Condition

If the level curve of $f$ and the constraint curve $g = 0$ are tangent at a point, then their normal vectors are parallel. The gradient is normal to level curves, and the constraint curve is itself a level curve of $g$. So, provided $\nabla g \neq 0$ at that point:

∇f = λ∇g for some scalar λ
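A quick numerical check of the tangency condition on the running example, $f(x, y) = x + y$ on the unit circle, can make this concrete. The 2D cross product of $\nabla f$ and $\nabla g$ is zero exactly when the two vectors are parallel; here it equals $2(y - x)$, so it vanishes only at $\theta = \pi/4$ and $5\pi/4$, the points $\pm(1/\sqrt{2}, 1/\sqrt{2})$ where $f = \pm\sqrt{2}$. This is a minimal sketch; the helper names are illustrative, not from any library.

```python
import numpy as np

# Running example: f(x, y) = x + y, g(x, y) = x^2 + y^2 - 1
def grad_f(x, y):
    return np.array([1.0, 1.0])

def grad_g(x, y):
    return np.array([2.0 * x, 2.0 * y])

def cross_2d(u, v):
    # z-component of the 2D cross product; zero exactly when u and v are parallel
    return u[0] * v[1] - u[1] * v[0]

# Walk around the unit circle in steps of pi/4
thetas = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
crosses = []
for theta in thetas:
    x, y = np.cos(theta), np.sin(theta)
    c = cross_2d(grad_f(x, y), grad_g(x, y))
    crosses.append(c)
    print(f"theta = {theta:.4f}  f = {x + y:+.4f}  cross(grad f, grad g) = {c:+.4f}")
```

The cross product is (numerically) zero only at the maximum and the minimum; everywhere else $\nabla f$ has a component along the circle.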

Interactive: Geometry Visualizer

An interactive widget (Lagrange Multipliers: Geometric Visualization) lets you move a point along the unit circle x² + y² = 1 while maximizing or minimizing f(x, y) = x + y, showing ∇f and ∇g at each position. At the optimal points, the gradients become parallel.

Key Insight

At the optimal point, the gradient of f (orange) is parallel to the gradient of g (green). This means ∇f = λ∇g for some scalar λ (the Lagrange multiplier).

The Lagrange Condition

Our geometric insight leads to the fundamental condition for constrained optimization:

The Lagrange Condition

\nabla f(x, y) = \lambda \nabla g(x, y)

At a constrained optimum, the gradient of f is parallel to the gradient of g

Writing this out in components for a function of two variables:

\frac{\partial f}{\partial x} = \lambda \frac{\partial g}{\partial x}

\frac{\partial f}{\partial y} = \lambda \frac{\partial g}{\partial y}

g(x, y) = 0 \quad \text{(the constraint)}

This gives us three equations in three unknowns: $x$, $y$, and $\lambda$.

The Lagrangian Function

We can unify these conditions elegantly by defining the Lagrangian:

The Lagrangian

\mathcal{L}(x, y, \lambda) = f(x, y) - \lambda \cdot g(x, y)

The Lagrange conditions are simply the requirement that all partial derivatives of $\mathcal{L}$ vanish:

$\frac{\partial \mathcal{L}}{\partial x} = 0$ gives $\frac{\partial f}{\partial x} = \lambda \frac{\partial g}{\partial x}$
$\frac{\partial \mathcal{L}}{\partial y} = 0$ gives $\frac{\partial f}{\partial y} = \lambda \frac{\partial g}{\partial y}$
$\frac{\partial \mathcal{L}}{\partial \lambda} = 0$ gives $g(x, y) = 0$ (the constraint!)

Sign Convention

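Some texts write $\mathcal{L} = f + \lambda g$ with a plus sign. This flips the sign of λ but gives the same critical points $(x, y)$. We use the minus sign so that $\partial \mathcal{L}/\partial x = 0$ and $\partial \mathcal{L}/\partial y = 0$ read exactly as $\nabla f = \lambda \nabla g$.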
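SymPy can confirm the sign-convention claim on the running example ($f = x + y$, $g = x^2 + y^2 - 1$): both Lagrangians have stationary points at the same $(x, y)$, with λ negated. A minimal sketch; `stationary_points` is an illustrative helper, while `sp.diff` and `sp.solve(..., dict=True)` are standard SymPy calls.

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x + y
g = x**2 + y**2 - 1

def stationary_points(L):
    # Set every partial derivative of the Lagrangian (including d/d lambda) to zero
    return sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)

minus = stationary_points(f - lam * g)   # convention used in this section
plus = stationary_points(f + lam * g)    # alternative convention

for s in minus:
    print("L = f - lambda*g:", s)
for s in plus:
    print("L = f + lambda*g:", s)
```

Each convention yields the two points $\pm(\sqrt{2}/2, \sqrt{2}/2)$; only the sign attached to λ differs.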

Interactive: The Lagrange Condition

Explore the mathematical derivation and see how the condition $\nabla f = \lambda \nabla g$ emerges. The interactive example maximizes x + y on x² + y² = 1 and lets you vary λ. Since ∇f = (1, 1) and ∇g = (2x, 2y), a chosen λ forces the point (x, y) = (1/(2λ), 1/(2λ)), which lies on the constraint only when λ = 1/√2 ≈ 0.7071. For instance, λ = 0.500 gives the point (1, 1), where λ∇g = (1, 1) matches ∇f but g(1, 1) = 1 ≠ 0, so the constraint is violated.

What is λ?

The Lagrange multiplier λ measures how much the optimal value of f changes per unit change in the constraint level: if the constraint is g(x, y) = c, then λ = df*/dc. It is the shadow price of the constraint.

Why Parallel?

If ∇f and ∇g aren't parallel, you can move along the constraint in a direction that increases f. At the optimum, no such direction exists.

Geometric Meaning

∇g points perpendicular to the constraint. At the optimum, ∇f also points perpendicular to it: with no tangential component, no small move along the constraint can improve f.

The Method: Step by Step

To solve a constrained optimization problem using Lagrange multipliers:

Identify the objective function $f(x, y)$ and constraint $g(x, y) = 0$
Compute the gradients $\nabla f$ and $\nabla g$
Set up the equations $\nabla f = \lambda \nabla g$ and $g = 0$
Solve the system for $x$, $y$, and $\lambda$
Evaluate $f$ at each critical point to find the maximum and minimum
Interpret $\lambda$ as the marginal value of the constraint

Critical Point ≠ Optimum

Lagrange multipliers find critical points, which are candidates for maxima and minima. You must check whether each is a max, min, or saddle point, and compare values to find the global optimum.
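Here is a minimal SymPy sketch of steps 3 to 5 on an illustrative problem (not one from the text): optimize $f = xy$ on the circle $x^2 + y^2 = 2$. The solver returns four critical points, two maxima and two minima, and only comparing values sorts them out. Because the circle is closed and bounded, the largest and smallest values are the global maximum and minimum.

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x * y                  # illustrative objective
g = x**2 + y**2 - 2        # constraint: circle of radius sqrt(2)

# Steps 3-4: gradient conditions of L = f - lam*g, plus the constraint
L = f - lam * g
equations = [sp.diff(L, x), sp.diff(L, y), g]
candidates = sp.solve(equations, [x, y, lam], dict=True)

# Step 5: evaluate f at every candidate, then compare
values = [f.subs(c) for c in candidates]
f_max, f_min = max(values), min(values)
for c, v in zip(candidates, values):
    label = "max" if v == f_max else "min" if v == f_min else "neither"
    print(f"(x, y) = ({c[x]}, {c[y]}), lambda = {c[lam]}, f = {v} -> {label}")
```

By hand: $y = 2\lambda x$ and $x = 2\lambda y$ force $\lambda = \pm 1/2$, giving $(1, 1)$ and $(-1, -1)$ with $f = 1$, and $(1, -1)$ and $(-1, 1)$ with $f = -1$.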

Worked Examples

Interactive Solver

Work through detailed examples step by step. See how the gradient condition leads to the solution:

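Step-by-Step Lagrange Multiplier Solutions

Problem: maximize f(x, y) = x + y subject to g(x, y) = x² + y² - 1 = 0

Step 1: Identify the Functions

Objective: f(x, y) = x + y

Constraint: g(x, y) = x² + y² - 1 = 0

Step 2: Gradient calculation

∇f = (1, 1) and ∇g = (2x, 2y).

Step 3: Solving ∇f = λ∇g

The components give 1 = 2λx and 1 = 2λy, so λ ≠ 0 and x = y = 1/(2λ).

Step 4: Applying constraint

Substituting into x² + y² = 1 gives 1/(2λ²) = 1, so λ = ±1/√2 and (x, y) = ±(1/√2, 1/√2).

Step 5: Final result

The maximum is f(1/√2, 1/√2) = √2 with λ = 1/√2. The other critical point gives the minimum, f(-1/√2, -1/√2) = -√2 with λ = -1/√2.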

3D Visualization

See the geometry in 3D: the objective function as a surface, the constraint as a cylinder, and how the optimal level curve is tangent to the constraint:

3D Visualization: Constraint Surface and Level Curves

Maximize f(x, y) = x + y on x² + y² = r² (shown with radius r = 1.5)

Blue Surface

The objective function f(x, y) = x + y. Level curves are parallel diagonal lines.

Green Cylinder

The constraint x² + y² = r². We must stay on this circle.

Yellow Line & Point

The optimal level curve tangent to the constraint. For r = 1.5, the maximum is f = 1.5√2 ≈ 2.121.

Geometric Insight: The optimal point occurs where a level curve of f is tangent to the constraint curve. At this point, ∇f (orange arrow, perpendicular to level curve) is parallel to ∇g (green arrow, perpendicular to constraint).

Multiple Constraints

The method extends naturally to multiple equality constraints. For $k$ constraints $g_1 = 0, g_2 = 0, \ldots, g_k = 0$:

Multiple Constraint Lagrangian

\mathcal{L} = f - \lambda_1 g_1 - \lambda_2 g_2 - \cdots - \lambda_k g_k

\nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2 + \cdots + \lambda_k \nabla g_k

Geometrically, $\nabla f$ must lie in the span of the constraint gradients. When those gradients are linearly independent, they span the normal space of the constraint set. At an optimum, $\nabla f$ has no component along the constraint set's tangent space, so it lies in that normal space.
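When the objective is quadratic and the constraints are linear, the Lagrange conditions form a linear system that NumPy can solve directly. A minimal sketch on a hypothetical example: minimize $x^2 + y^2 + z^2$ subject to $g_1 = x + y + z - 1 = 0$ and $g_2 = x + 2y = 0$. The unknowns are stacked as $[x, y, z, \lambda_1, \lambda_2]$.

```python
import numpy as np

# Rows 1-3: grad f = lam1 grad g1 + lam2 grad g2, with grad f = (2x, 2y, 2z),
#           grad g1 = (1, 1, 1), grad g2 = (1, 2, 0)
# Rows 4-5: the constraints g1 = 0 and g2 = 0
A = np.array([
    [2, 0, 0, -1, -1],   # 2x - lam1 - lam2   = 0
    [0, 2, 0, -1, -2],   # 2y - lam1 - 2 lam2 = 0
    [0, 0, 2, -1,  0],   # 2z - lam1          = 0
    [1, 1, 1,  0,  0],   # x + y + z = 1
    [1, 2, 0,  0,  0],   # x + 2y    = 0
], dtype=float)
rhs = np.array([0, 0, 0, 1, 0], dtype=float)

x, y, z, lam1, lam2 = np.linalg.solve(A, rhs)
print(f"point = ({x:.4f}, {y:.4f}, {z:.4f}), lam1 = {lam1:.4f}, lam2 = {lam2:.4f}")

# grad f should be a linear combination of the two constraint gradients
grad_f = 2 * np.array([x, y, z])
combo = lam1 * np.array([1, 1, 1]) + lam2 * np.array([1, 2, 0])
print("grad f - (lam1 grad g1 + lam2 grad g2) =", grad_f - combo)
```

The exact solution is $x = 1/3$, $y = -1/6$, $z = 5/6$ with $\lambda_1 = 5/3$ and $\lambda_2 = -1$, and the residual line confirms that $\nabla f$ lies in the span of $\nabla g_1$ and $\nabla g_2$.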

Interactive: Multiple Constraints

Multiple Constraints: Extending the Method

General Form

For k constraints g₁(x) = 0, g₂(x) = 0, ..., gₖ(x) = 0:

L(x, λ₁, ..., λₖ) = f(x) - λ₁g₁(x) - λ₂g₂(x) - ... - λₖgₖ(x)

∇f = λ₁∇g₁ + λ₂∇g₂ + ... + λₖ∇gₖ

Problem

A consumer maximizes utility u(x, y) = xy subject to a budget constraint.

Maximize u(x, y) = xy

Subject to:

2x + 3y = 12 (budget constraint)

Lagrangian

L = xy - λ(2x + 3y - 12)

Necessary Conditions

∂L/∂x = y - 2λ = 0 → y = 2λ

∂L/∂y = x - 3λ = 0 → x = 3λ

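∂L/∂λ = -(2x + 3y - 12) = 0 → 2x + 3y = 12

Solution: substituting x = 3λ and y = 2λ into the budget gives 6λ + 6λ = 12, so λ = 1, x = 3, y = 2, and u* = 6.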

Key Points for Multiple Constraints

• Each constraint adds one Lagrange multiplier λᵢ
• The gradient condition becomes ∇f = Σλᵢ∇gᵢ (linear combination)
• Each multiplier has an economic interpretation: the marginal value of relaxing that constraint
• The constraint gradients ∇gᵢ must be linearly independent at the point (the constraint qualification)

Second Derivative Test: Bordered Hessian

To determine whether a critical point is a maximum, minimum, or saddle, we can use the bordered Hessian:

\bar{H} = \begin{pmatrix} 0 & g_x & g_y \\ g_x & \mathcal{L}_{xx} & \mathcal{L}_{xy} \\ g_y & \mathcal{L}_{xy} & \mathcal{L}_{yy} \end{pmatrix}

For a constrained problem with one constraint in two variables:

If $\det(\bar{H}) > 0$ : local maximum
If $\det(\bar{H}) < 0$ : local minimum
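A minimal SymPy sketch of the bordered Hessian test on the running example ($f = x + y$ on the unit circle), using the same $\mathcal{L} = f - \lambda g$ convention and the critical points found earlier. The matrix is built entry by entry so that it matches $\bar{H}$ above exactly; `classify` is an illustrative helper.

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x + y                      # running example
g = x**2 + y**2 - 1            # unit-circle constraint
L = f - lam * g                # same sign convention as the text

# Bordered Hessian: constraint gradient in the border, Hessian of L inside
H_bar = sp.Matrix([
    [0,             sp.diff(g, x),    sp.diff(g, y)],
    [sp.diff(g, x), sp.diff(L, x, 2), sp.diff(L, x, y)],
    [sp.diff(g, y), sp.diff(L, x, y), sp.diff(L, y, 2)],
])

def classify(point):
    d = sp.simplify(H_bar.subs(point).det())
    if d > 0:
        return d, "local maximum"
    if d < 0:
        return d, "local minimum"
    return d, "test inconclusive"

r = 1 / sp.sqrt(2)
candidates = [{x: r, y: r, lam: r}, {x: -r, y: -r, lam: -r}]
for pt in candidates:
    d, kind = classify(pt)
    print(f"(x, y) = ({pt[x]}, {pt[y]}), lambda = {pt[lam]}: det = {d} -> {kind}")
```

At $(1/\sqrt{2}, 1/\sqrt{2})$ the determinant is $4\sqrt{2} > 0$ (maximum), and at $(-1/\sqrt{2}, -1/\sqrt{2})$ it is $-4\sqrt{2} < 0$ (minimum), matching the worked solution.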

Practical Approach

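In practice, when the constraint set is closed and bounded (such as a circle or a sphere), a continuous $f$ is guaranteed to attain a global maximum and minimum on it. Provided $\nabla g \neq 0$ everywhere on the constraint, evaluating $f$ at all critical points and comparing the values is then sufficient.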

Applications

Economics: Utility Maximization

A consumer has a utility function $U(x, y)$ representing satisfaction from consuming quantities $x$ and $y$ of two goods. With prices $p_x, p_y$ and budget $M$:

Maximize $U(x, y)$

subject to $p_x x + p_y y = M$

The Lagrange multiplier $\lambda$ represents the marginal utility of income—how much additional utility the consumer gains from one more dollar of budget.

Concept	Mathematical Expression	Interpretation
Budget constraint	pₓx + pᵧy = M	Total spending equals income
Optimal condition	MUₓ/pₓ = MUᵧ/pᵧ = λ	Marginal utility per dollar is equal across goods
Shadow price	λ = dU*/dM	Marginal value of relaxing budget
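The shadow-price row can be checked numerically. This minimal sketch assumes the specific utility $U = xy$ (the same one as the budget example with prices 2 and 3 and budget 12). In that case the conditions $MU_x = y = \lambda p_x$ and $MU_y = x = \lambda p_y$ give closed forms, and a central difference of $U^*(M)$ should reproduce λ. The function names are illustrative.

```python
def optimal_bundle(px, py, M):
    # For U(x, y) = x*y: MU_x = y = lam*px and MU_y = x = lam*py.
    # Substituting into px*x + py*y = M gives lam = M / (2*px*py).
    lam = M / (2 * px * py)
    return lam * py, lam * px, lam      # x*, y*, lam

def U_star(px, py, M):
    # Optimal utility as a function of the budget
    x, y, _ = optimal_bundle(px, py, M)
    return x * y

px, py, M = 2.0, 3.0, 12.0              # prices and budget from the budget example
x, y, lam = optimal_bundle(px, py, M)

h = 1e-3
dU_dM = (U_star(px, py, M + h) - U_star(px, py, M - h)) / (2 * h)   # central difference
print(f"x* = {x}, y* = {y}, lambda = {lam}, dU*/dM = {dU_dM:.6f}")
```

Here $U^*(M) = M^2/24$, so $dU^*/dM = M/12 = 1 = \lambda$ at $M = 12$: one extra dollar of budget buys about one extra unit of utility.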

Physics: Equilibrium Problems

Many physics problems involve finding equilibrium subject to constraints:

Minimum energy: Find the shape of a hanging chain (catenary) minimizing potential energy with fixed length
Maximum entropy: Find probability distributions maximizing entropy subject to moment constraints
Quantum mechanics: Minimize energy of electrons subject to normalization and orthogonality
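The maximum-entropy item can be worked through with the same machinery. Maximizing $H = -\sum_i p_i \ln p_i$ subject to $\sum_i p_i = 1$ and $\sum_i i\, p_i = \mu$ with multipliers $\lambda_0, \lambda_1$ gives $-\ln p_i - 1 - \lambda_0 - \lambda_1 i = 0$, so $p_i \propto e^{-\lambda_1 i}$. The sketch below assumes a six-sided die with a target mean of 4.5, chosen purely for illustration, and finds $\lambda_1$ with SciPy's `brentq`.

```python
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)   # outcomes of a six-sided die
target_mean = 4.5         # assumed moment constraint, chosen for illustration

def maxent_probs(lam1):
    # Stationarity: -log p_i - 1 - lam0 - lam1 * i = 0  =>  p_i proportional to exp(-lam1 * i).
    # The normalization multiplier lam0 is absorbed by dividing by the sum.
    w = np.exp(-lam1 * faces)
    return w / w.sum()

def mean_gap(lam1):
    # Residual of the moment constraint sum_i i * p_i = target_mean
    return maxent_probs(lam1) @ faces - target_mean

# The mean decreases monotonically in lam1, so a bracketing root finder suffices
lam1 = brentq(mean_gap, -5.0, 5.0)
p = maxent_probs(lam1)
print("lambda_1 =", round(lam1, 4))
print("p =", np.round(p, 4))
print("sum =", p.sum(), " mean =", p @ faces)
```

A target mean above 3.5 yields $\lambda_1 < 0$, so the probabilities increase with the face value; with only the normalization constraint, the same derivation gives the uniform distribution.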

Machine Learning Connection

Lagrange multipliers are fundamental to machine learning, appearing in both theory and algorithms:

Support Vector Machines (SVM)

The SVM seeks to find a maximum-margin hyperplane separating two classes. The primal problem is:

Minimize $\frac{1}{2}\|w\|^2$

subject to $y_i(w \cdot x_i + b) \geq 1$ for all $i$

Using Lagrange multipliers $\alpha_i \geq 0$ for each constraint, we form:

\mathcal{L} = \frac{1}{2}\|w\|^2 - \sum_{i=1}^n \alpha_i [y_i(w \cdot x_i + b) - 1]

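Setting $\partial \mathcal{L}/\partial w = 0$ gives $w = \sum_i \alpha_i y_i x_i$, and $\partial \mathcal{L}/\partial b = 0$ gives $\sum_i \alpha_i y_i = 0$. Substituting these back eliminates $w$ and $b$ and yields the dual problem, which depends on the data only through inner products $x_i \cdot x_j$:

\max_{\alpha} \sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{subject to } \alpha_i \geq 0, \ \sum_{i=1}^n \alpha_i y_i = 0

Replacing $x_i \cdot x_j$ with a kernel $K(x_i, x_j)$ gives the famous kernel SVM formulation, where we optimize only over the $\alpha_i$.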

Support Vectors

The name "support vector" comes from the Lagrange multipliers: only training points with $\alpha_i > 0$ (active constraints) matter for defining the decision boundary.
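A minimal sketch of the hard-margin SVM dual on a made-up, linearly separable toy dataset, solved with SciPy's SLSQP (a general-purpose solver, not a dedicated SVM library). It recovers $w$ from the stationarity condition $w = \sum_i \alpha_i y_i x_i$, reads off the support vectors as the points with $\alpha_i > 0$, and computes $b$ from the active constraints. The data and the threshold `1e-4` are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Toy, linearly separable data (made up for illustration)
X = np.array([[1.0, 0.0], [-1.0, 0.0], [2.0, 1.0], [-2.0, -1.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])

# Q_ij = y_i y_j (x_i . x_j); swapping x_i . x_j for a kernel gives the kernel SVM
Z = y[:, None] * X
Q = Z @ Z.T

def neg_dual(alpha):
    # Negative of the dual objective: maximize sum(alpha) - 1/2 alpha^T Q alpha
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

def neg_dual_grad(alpha):
    return Q @ alpha - 1.0

res = minimize(
    neg_dual,
    x0=np.zeros(len(y)),
    jac=neg_dual_grad,
    method='SLSQP',
    bounds=[(0.0, None)] * len(y),                                   # alpha_i >= 0
    constraints=[{'type': 'eq', 'fun': lambda a: a @ y, 'jac': lambda a: y}],  # sum alpha_i y_i = 0
    options={'ftol': 1e-10, 'maxiter': 200},
)

alpha = res.x
w = (alpha * y) @ X                 # stationarity in w: w = sum alpha_i y_i x_i
sv = alpha > 1e-4                   # support vectors: alpha_i > 0 (active constraints)
b = np.mean(y[sv] - X[sv] @ w)      # active constraints: y_i (w . x_i + b) = 1
print("alpha =", np.round(alpha, 4))
print("w =", np.round(w, 4), " b =", round(float(b), 4))
print("support vectors:", X[sv].tolist())
```

For this data the maximum-margin separator is $w = (1, 0)$, $b = 0$. Only $(1, 0)$ and $(-1, 0)$ receive nonzero multipliers ($\alpha = 1/2$ each), while the two farther points have $\alpha_i = 0$ and do not affect the boundary.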

KKT Conditions for Inequality Constraints

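For problems with inequality constraints $g(x) \leq 0$, Lagrange multipliers generalize to the Karush-Kuhn-Tucker (KKT) conditions. They are written here for maximizing $f$; for minimization, stationarity becomes $\nabla f = -\lambda \nabla g$: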

1. Stationarity:

\nabla f = \lambda \nabla g

2. Primal feasibility:

g(x) \leq 0

3. Dual feasibility:

\lambda \geq 0

4. Complementary slackness:

\lambda \cdot g(x) = 0

Complementary slackness means that either the constraint is active ($g = 0$) or the multiplier is zero ($\lambda = 0$), or both. Inactive constraints don't affect the solution.
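The four conditions can be checked mechanically at a candidate point. This minimal sketch uses the maximization form above with the unit disk $g(x, y) = x^2 + y^2 - 1 \leq 0$ and two illustrative objectives. One, $x + y$, has its optimum on the boundary (active constraint, $\lambda = 1/\sqrt{2}$). The other, $-(x - 0.2)^2 - (y - 0.1)^2$, peaks inside the disk (inactive constraint, $\lambda = 0$). The `kkt_residuals` helper is hypothetical.

```python
import numpy as np

def kkt_residuals(grad_f, g, grad_g, x, lam):
    """Check KKT conditions for: maximize f(x) subject to g(x) <= 0."""
    x = np.asarray(x, dtype=float)
    return {
        'stationarity': float(np.linalg.norm(grad_f(x) - lam * grad_g(x))),
        'primal_feasible': bool(g(x) <= 1e-12),
        'dual_feasible': bool(lam >= 0),
        'complementary_slackness': float(abs(lam * g(x))),
    }

g = lambda v: v @ v - 1          # unit disk: x^2 + y^2 - 1 <= 0
grad_g = lambda v: 2 * v

# Case 1: maximize x + y -> optimum on the boundary, constraint active
grad_f1 = lambda v: np.array([1.0, 1.0])
s = 1 / np.sqrt(2)
case1 = kkt_residuals(grad_f1, g, grad_g, [s, s], lam=s)
print("active:  ", case1)

# Case 2: maximize -(x-0.2)^2 - (y-0.1)^2 -> interior optimum, constraint inactive
c = np.array([0.2, 0.1])
grad_f2 = lambda v: -2 * (v - c)
case2 = kkt_residuals(grad_f2, g, grad_g, c, lam=0.0)
print("inactive:", case2)
```

In the second case $g(0.2, 0.1) = -0.95 < 0$, so complementary slackness forces $\lambda = 0$ and stationarity reduces to the unconstrained condition $\nabla f = 0$.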

ML Application	Role of Lagrange Multipliers
SVM	Dual variables αᵢ identify support vectors
Lasso regularization	The constrained form (‖β‖₁ ≤ t) and the penalized form (loss + λ‖β‖₁) are linked through the multiplier λ
Neural network constraints	KKT for constrained architectures
Fairness constraints	Ensure model predictions satisfy equity requirements
Maximum entropy models	Exponential family parameters from moment constraints

Python Implementation

Let's implement Lagrange multiplier methods in Python:

Basic Numerical Solver

Solving Lagrange Equations Numerically

🐍lagrange_basic.py

Lagrange Solver Function

This function takes the objective f, the constraint g, their gradients, and initial guesses, and solves the Lagrange conditions for a critical point near those guesses.

The Lagrange Equations

We solve three equations: ∂f/∂x = λ∂g/∂x, ∂f/∂y = λ∂g/∂y, and g(x, y) = 0. These are the necessary conditions for a constrained optimum.

Numerical Solution

We use SciPy's fsolve to find a root of the system. The solution gives x, y (a critical point; which one depends on the initial guess) and λ (the Lagrange multiplier).

Example Problem

Maximize f(x, y) = x + y subject to x² + y² = 1 (unit circle). The theoretical maximum is √2 at (1/√2, 1/√2) with λ = 1/√2.

from scipy.optimize import fsolve


def lagrange_solver_2d(f, g, grad_f, grad_g, x0, lambda0=1.0):
    """
    Solve constrained optimization using Lagrange multipliers.

    f: objective function f(x, y)
    g: constraint function, with the constraint written as g(x, y) = 0
    grad_f: gradient of f, returns [df/dx, df/dy]
    grad_g: gradient of g, returns [dg/dx, dg/dy]
    x0: initial guess [x, y]
    lambda0: initial guess for lambda

    Returns one critical point of the Lagrange system; which one
    (max, min, or other) depends on the initial guess.
    """
    def equations(unknowns):
        x, y, lam = unknowns
        gf = grad_f(x, y)
        gg = grad_g(x, y)
        # ∇f = λ∇g and g = 0
        return [
            gf[0] - lam * gg[0],  # ∂f/∂x = λ ∂g/∂x
            gf[1] - lam * gg[1],  # ∂f/∂y = λ ∂g/∂y
            g(x, y),              # g(x, y) = 0
        ]

    initial_guess = [x0[0], x0[1], lambda0]
    solution, _info, ier, msg = fsolve(equations, initial_guess, full_output=True)
    if ier != 1:
        raise RuntimeError(f"fsolve did not converge: {msg}")
    x_opt, y_opt, lambda_opt = solution

    return {
        'x': x_opt,
        'y': y_opt,
        'lambda': lambda_opt,
        'f_optimal': f(x_opt, y_opt),
        'constraint_value': g(x_opt, y_opt)
    }


# Example: Maximize x + y on unit circle
f = lambda x, y: x + y
g = lambda x, y: x**2 + y**2 - 1
grad_f = lambda x, y: [1, 1]
grad_g = lambda x, y: [2*x, 2*y]

result = lagrange_solver_2d(f, g, grad_f, grad_g, [0.5, 0.5])
print(f"Optimal point: ({result['x']:.4f}, {result['y']:.4f})")
print(f"Optimal value: {result['f_optimal']:.4f}")
print(f"Lagrange multiplier λ = {result['lambda']:.4f}")

# A guess in the third quadrant should converge to the other critical point (the minimum)
result_min = lagrange_solver_2d(f, g, grad_f, grad_g, [-0.5, -0.5], lambda0=-1.0)
print(f"Minimum point: ({result_min['x']:.4f}, {result_min['y']:.4f}), value {result_min['f_optimal']:.4f}")

Using SciPy for General Constraints

Constrained Optimization with SciPy

🐍scipy_optimization.py

General Constrained Optimizer

This wrapper uses SciPy's SLSQP method, which handles both equality and inequality constraints. SciPy's 'ineq' type means fun(x) ≥ 0, so a ≤ constraint must be rewritten in that form before it is passed in.

Equality Constraints

Equality constraints like h(x) = 0 are specified with type 'eq'. Each one carries its own Lagrange multiplier in the optimality conditions.

Inequality Constraints

Inequality constraints g(x) ≥ 0 use type 'ineq'. These lead to KKT conditions where the multiplier is non-negative and complementary slackness holds.

Cobb-Douglas Production

A classic economics example: maximize output Q = √(xy) subject to a budget constraint 2x + 3y ≤ 12 with x, y ≥ 0. The budget is passed to SciPy as 12 - 2x - 3y ≥ 0.

import numpy as np
from scipy.optimize import minimize


def constrained_optimization_scipy(
    objective, x0,
    eq_constraints=None,
    ineq_constraints=None,
    bounds=None
):
    """
    General constrained optimization using scipy.optimize.minimize.

    Uses the SLSQP (Sequential Least Squares Programming) method,
    which handles both equality and inequality constraints.
    SciPy's 'ineq' convention is fun(x) >= 0.
    """
    constraints = []

    # Equality constraints: h(x) = 0
    for con in eq_constraints or []:
        constraints.append({'type': 'eq', 'fun': con})

    # Inequality constraints: c(x) >= 0
    for con in ineq_constraints or []:
        constraints.append({'type': 'ineq', 'fun': con})

    result = minimize(
        objective,
        x0,
        method='SLSQP',
        constraints=constraints,
        bounds=bounds,
        options={'disp': True}
    )

    return result


# Example: Production optimization
# Maximize output Q = x^0.5 * y^0.5 (Cobb-Douglas)
# Subject to budget: 2x + 3y <= 12
# x, y >= 0
# Analytic optimum: x = 3, y = 2, Q = sqrt(6) ≈ 2.4495

def objective(xy):
    x, y = xy
    # Minimize the negative of output (to maximize output).
    # Clipping at 0 keeps sqrt real if a numerical step dips slightly below 0.
    return -np.sqrt(max(x, 0.0) * max(y, 0.0))

def budget_constraint(xy):
    x, y = xy
    return 12 - 2*x - 3*y  # >= 0 means 2x + 3y <= 12

result = constrained_optimization_scipy(
    objective,
    x0=[1.0, 1.0],
    ineq_constraints=[budget_constraint],
    bounds=[(0, None), (0, None)]  # x >= 0, y >= 0
)

if not result.success:
    print("Warning:", result.message)
print(f"Optimal allocation: x = {result.x[0]:.4f}, y = {result.x[1]:.4f}")
print(f"Maximum output: {-result.fun:.4f}")

Symbolic Solution with SymPy

Exact Symbolic Solutions

🐍symbolic_lagrange.py

Symbolic Lagrange Solver

SymPy allows us to solve the Lagrange equations analytically, giving exact symbolic solutions rather than numerical approximations.

Constructing the Lagrangian

The Lagrangian L = f - λg combines the objective and constraint. Taking ∂L/∂λ = 0 recovers the constraint g = 0.

Gradient Conditions

We take partial derivatives ∂L/∂x = 0 and ∂L/∂y = 0. These give us ∂f/∂x = λ∂g/∂x and ∂f/∂y = λ∂g/∂y.

Solving the System

SymPy's solve function attempts to find every solution of the system, including the Lagrange multiplier values; for small polynomial systems like this one it typically returns them all.

import sympy as sp


def symbolic_lagrange_solve(f_expr, g_expr, variables):
    """
    Solve Lagrange multiplier problem symbolically using SymPy.

    f_expr: objective function expression
    g_expr: constraint expression (= 0)
    variables: list of symbols [x, y, ...]

    Returns (solutions, L, lam): solutions is a list of dicts mapping
    each symbol (including lam) to its value at a critical point.
    """
    # Create Lagrange multiplier symbol
    lam = sp.Symbol('lambda', real=True)

    # Construct Lagrangian: L = f - λg
    L = f_expr - lam * g_expr

    # Compute gradient conditions: ∂L/∂xᵢ = 0
    equations = []
    for var in variables:
        eq = sp.diff(L, var)
        equations.append(sp.Eq(eq, 0))

    # Add constraint: g = 0
    equations.append(sp.Eq(g_expr, 0))

    # Solve the system; dict=True always returns a list of dicts,
    # even when there is only one solution
    all_vars = list(variables) + [lam]
    solutions = sp.solve(equations, all_vars, dict=True)

    return solutions, L, lam


# Example: Minimize x² + y² on line x + y = 4
x, y = sp.symbols('x y', real=True)

f = x**2 + y**2      # Distance squared from origin
g = x + y - 4        # Constraint: x + y = 4

solutions, L, lam = symbolic_lagrange_solve(f, g, [x, y])

print("Lagrangian:", L)
print("\nCritical points:")
for sol in solutions:
    print(f"  x = {sol[x]}, y = {sol[y]}, λ = {sol[lam]}")
    f_val = f.subs(sol)
    print(f"  f(x,y) = {f_val}")

Test Your Understanding


What is the geometric interpretation of the Lagrange condition ∇f = λ∇g?

Summary

The Core Idea

At a constrained optimum, the level curve of the objective function is tangent to the constraint curve. This happens when their gradients are parallel: $\nabla f = \lambda \nabla g$.

Key Equations

Concept	Formula
Lagrangian	L = f - λg
Gradient condition	∇f = λ∇g
Multiple constraints	∇f = λ₁∇g₁ + λ₂∇g₂ + ...
KKT (inequality, maximizing f)	∇f = λ∇g, g ≤ 0, λ ≥ 0, λg = 0

Interpretation of λ

The Lagrange multiplier λ is the shadow price of the constraint.

It measures how much the optimal value of $f$ would change if we relaxed the constraint slightly: $\lambda = \frac{df^*}{dc}$ where $g = c$ is the constraint level.

Key Takeaways

Geometric insight: Optimal points occur where objective level curves are tangent to the constraint
Algebraic method: Solve $\nabla f = \lambda \nabla g$ together with $g = 0$
Multiple constraints: Add one multiplier per constraint; $\nabla f$ is a linear combination of constraint gradients
Economic interpretation: λ is the marginal value of relaxing the constraint
ML applications: SVMs, KKT conditions, and constrained optimization throughout machine learning

The Power of Lagrange Multipliers:

"Transform constrained optimization into finding where gradients align—a beautiful geometric condition with profound practical applications."

Completing Chapter 17: You've mastered Lagrange multipliers, the final and arguably most important topic in partial derivatives! This technique connects calculus to optimization, economics, physics, and machine learning. Next, we'll move to Multiple Integrals, extending integration to higher dimensions.