Introduction
Calculus provides the mathematical machinery for working with continuous probability distributions. We need derivatives to find modes and understand rates of change, integrals to compute probabilities and expected values, and optimization techniques for maximum likelihood estimation.
Why This Matters for ML: Gradient descent, backpropagation, maximum likelihood estimation, and computing expected values all rely heavily on calculus. Understanding these fundamentals is essential for deriving and implementing ML algorithms.
Derivatives
The derivative measures the instantaneous rate of change of a function. For a function f(x), the derivative is:
f'(x) = lim(h→0) [f(x + h) − f(x)] / h
Essential Derivative Rules
| Rule | Formula |
|---|---|
| Power Rule | d/dx[xⁿ] = nxⁿ⁻¹ |
| Constant Multiple | d/dx[cf(x)] = c·f'(x) |
| Sum Rule | d/dx[f(x) + g(x)] = f'(x) + g'(x) |
| Product Rule | d/dx[f(x)g(x)] = f'(x)g(x) + f(x)g'(x) |
| Quotient Rule | d/dx[f(x)/g(x)] = [f'(x)g(x) - f(x)g'(x)]/g(x)² |
| Chain Rule | d/dx[f(g(x))] = f'(g(x))·g'(x) |
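As a quick check, sympy can verify a chain-rule computation symbolically. This is a minimal sketch; the function log(1 + x²) is an illustrative choice, not from the text:

```python
import sympy as sp

x = sp.Symbol('x')
# Chain rule: outer function log(u), inner function u = 1 + x**2
f = sp.log(1 + x**2)
df = sp.diff(f, x)
print(df)  # 2*x/(x**2 + 1)
```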
Important Derivatives for Statistics
| Function | Derivative |
|---|---|
| eˣ | eˣ |
| ln(x) | 1/x |
| aˣ | aˣ ln(a) |
| sin(x) | cos(x) |
| cos(x) | -sin(x) |
| logₐ(x) | 1/(x ln(a)) |
Log-Derivative Trick
For probability distributions, we often work with log-likelihoods. The derivative of log f(x) is:
d/dx log f(x) = f'(x) / f(x)
This is called the score function in statistics.
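A small sympy sketch confirms that differentiating the log gives f'(x)/f(x). The unnormalized standard normal density used here is an illustrative choice:

```python
import sympy as sp

x = sp.Symbol('x')
f = sp.exp(-x**2 / 2)  # unnormalized standard normal density (illustrative)
score = sp.simplify(sp.diff(sp.log(f), x))
print(score)                           # -x
print(sp.simplify(sp.diff(f, x) / f))  # same result: f'(x)/f(x)
```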
Integrals
The integral represents the area under a curve. For continuous probability distributions, integrals compute probabilities and expected values.
Definite Integrals
∫ₐᵇ f(x) dx = F(b) − F(a)
Where F(x) is the antiderivative (also called primitive) of f(x), meaning F'(x) = f(x).
Essential Integration Rules
| Rule | Formula |
|---|---|
| Power Rule | ∫xⁿ dx = xⁿ⁺¹/(n+1) + C (n ≠ -1) |
| Exponential | ∫eˣ dx = eˣ + C |
| Natural Log | ∫(1/x) dx = ln|x| + C |
| Substitution | ∫f(g(x))g'(x) dx = ∫f(u) du |
| Parts | ∫u dv = uv - ∫v du |
Integration by Parts
For products of functions, use integration by parts:
∫ u dv = uv − ∫ v du
LIATE rule for choosing u (in order of preference): Logarithmic, Inverse trig, Algebraic, Trigonometric, Exponential.
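For instance, for ∫x·eˣ dx, LIATE suggests u = x (algebraic) over dv = eˣ dx (exponential). A sympy sketch reproduces the by-parts result:

```python
import sympy as sp

x = sp.Symbol('x')
# By parts with u = x, dv = e^x dx: uv - ∫v du = x*e^x - e^x
result = sp.integrate(x * sp.exp(x), x)
print(result)  # equals x*exp(x) - exp(x), up to the constant C
```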
Probability Application
For a continuous random variable X with probability density f(x):

P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
Multivariable Calculus
When working with multiple random variables or parameters, we need partial derivatives and multiple integrals.
Partial Derivatives
A partial derivative measures the rate of change with respect to one variable while holding others constant:
∂f/∂x = lim(h→0) [f(x + h, y) − f(x, y)] / h
Gradient
The gradient is a vector of all partial derivatives:
∇f = (∂f/∂x₁, ∂f/∂x₂, …, ∂f/∂xₙ)
The gradient points in the direction of steepest ascent.
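This steepest-ascent property is what gradient ascent exploits. A minimal numerical sketch; the quadratic objective and the step size 0.1 are illustrative choices, not from the text:

```python
import numpy as np

# Gradient of f(x, y) = -(x - 1)^2 - (y + 2)^2, which has its maximum at (1, -2)
def grad(p):
    x, y = p
    return np.array([-2*(x - 1), -2*(y + 2)])

p = np.zeros(2)
for _ in range(200):
    p = p + 0.1 * grad(p)  # step in the direction of steepest ascent
print(np.round(p, 4))  # converges to approximately [1, -2]
```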
Double Integrals
For joint probability distributions of two variables:
P((X, Y) ∈ A) = ∬_A f(x, y) dx dy
The Jacobian
When changing variables in multiple integrals, we need the Jacobian determinant. For a transformation x = x(u, v), y = y(u, v):

J = ∂(x, y)/∂(u, v) = (∂x/∂u)(∂y/∂v) − (∂x/∂v)(∂y/∂u)

and the area element transforms as dx dy = |J| du dv.
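A classic use is the polar change of variables x = r·cosθ, y = r·sinθ, whose Jacobian determinant is r. A sympy sketch of the squared Gaussian integral in polar coordinates:

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
# ∬ e^(-(x² + y²)) dx dy becomes ∬ e^(-r²) · r dr dθ  (the extra r is the Jacobian)
integrand = sp.exp(-r**2) * r
result = sp.integrate(integrand, (r, 0, sp.oo), (theta, 0, 2*sp.pi))
print(result)  # pi, so the one-dimensional Gaussian integral is sqrt(pi)
```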
Transform of Variables
For transforming probability distributions, if Y = g(X) with g monotonic:

f_Y(y) = f_X(g⁻¹(y)) · |d g⁻¹(y)/dy|
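A small sympy check of the change-of-variables formula, using an illustrative choice of X ~ Exponential(1) and the monotonic map Y = 2X:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f_X = sp.exp(-x)      # density of X ~ Exponential(1) on [0, ∞)
g_inv = y / 2         # inverse of g(x) = 2x
f_Y = f_X.subs(x, g_inv) * sp.Abs(sp.diff(g_inv, y))
print(f_Y)            # (1/2) * e^(-y/2), an Exponential(1/2) density
total = sp.integrate(f_Y, (y, 0, sp.oo))
print(total)          # 1, so f_Y is a valid density
```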
Optimization
Finding maxima and minima is central to statistical estimation methods like Maximum Likelihood Estimation (MLE).
Finding Critical Points
- Set the first derivative (or gradient) equal to zero: f'(x) = 0
- Solve for x to find critical points
- Use the second derivative test to classify: f''(x) > 0 → minimum, f''(x) < 0 → maximum
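These steps can be carried out symbolically. A sketch with the illustrative cubic f(x) = x³ − 3x:

```python
import sympy as sp

x = sp.Symbol('x')
f = x**3 - 3*x
critical = sp.solve(sp.diff(f, x), x)  # solve f'(x) = 3x² - 3 = 0
print(critical)                        # [-1, 1]
for c in critical:
    curvature = sp.diff(f, x, 2).subs(x, c)  # f''(x) = 6x
    print(c, "minimum" if curvature > 0 else "maximum")
```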
Hessian Matrix
For multivariate functions, the Hessian is the matrix of second partial derivatives:
Hᵢⱼ = ∂²f/∂xᵢ∂xⱼ
- H positive definite → local minimum
- H negative definite → local maximum
- H indefinite → saddle point
Lagrange Multipliers
For constrained optimization (maximize f(x,y) subject to g(x,y) = c):
Form the Lagrangian L(x, y, λ) = f(x, y) − λ(g(x, y) − c) and set its partial derivatives to zero, which gives ∇f = λ∇g together with the constraint g(x, y) = c.
This is used extensively in deriving maximum entropy distributions.
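A minimal sympy sketch of the method; the objective xy and the constraint x + y = 10 are illustrative choices, not from the text:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda')
f = x * y               # objective (illustrative: maximize a product)
g = x + y - 10          # constraint x + y = 10, written as g = 0
L = f - lam * g         # the Lagrangian
eqs = [sp.diff(L, v) for v in (x, y, lam)]
sol = sp.solve(eqs, (x, y, lam), dict=True)
print(sol)  # x = 5, y = 5, λ = 5
```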
Common Integrals in Statistics
These integrals appear frequently when working with probability distributions:
Gaussian Integral
∫₋∞^∞ e^(−x²) dx = √π
This is fundamental for normalizing the normal distribution.
Gamma Function
Γ(z) = ∫₀^∞ t^(z−1) e^(−t) dt,  z > 0
Key properties:
- Γ(z + 1) = z·Γ(z)
- Γ(n) = (n − 1)! for positive integers n
- Γ(1/2) = √π
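These properties are easy to spot-check numerically with Python's standard-library gamma:

```python
from math import gamma, factorial, pi, isclose

print(gamma(5), factorial(4))  # Γ(5) = 4! = 24
print(gamma(0.5)**2, pi)       # Γ(1/2)² = π
```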
Beta Function

B(a, b) = ∫₀¹ t^(a−1) (1 − t)^(b−1) dt = Γ(a)Γ(b) / Γ(a + b)
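A quick numerical check of the Gamma/Beta identity with scipy.special:

```python
from scipy.special import beta, gamma

a, b = 2.0, 3.0
lhs = beta(a, b)
rhs = gamma(a) * gamma(b) / gamma(a + b)
print(lhs, rhs)  # both equal 1/12
```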
Python Implementation
Python provides powerful tools for symbolic and numerical calculus:
```python
import sympy as sp

# Define symbolic variable
x = sp.Symbol('x')

# Derivatives
f = x**3 + 2*x**2 - 5*x + 1
f_prime = sp.diff(f, x)
print(f"f(x) = {f}")
print(f"f'(x) = {f_prime}")  # 3*x**2 + 4*x - 5

# Second derivative
f_double_prime = sp.diff(f, x, 2)
print(f"f''(x) = {f_double_prime}")  # 6*x + 4

# Indefinite integral
integral = sp.integrate(f, x)
print(f"∫f(x)dx = {integral}")  # x**4/4 + 2*x**3/3 - 5*x**2/2 + x

# Definite integral
definite = sp.integrate(f, (x, 0, 2))
print(f"∫₀²f(x)dx = {definite}")  # 14/3
```

Numerical Integration
```python
import numpy as np
from scipy import integrate

# Define a function
def f(x):
    return np.exp(-x**2)

# Gaussian integral approximation
result, error = integrate.quad(f, -np.inf, np.inf)
print(f"∫e^(-x²)dx = {result:.6f}")  # ~1.772454 = √π

# Verify
print(f"√π = {np.sqrt(np.pi):.6f}")

# Double integral for joint density
def joint_pdf(y, x):
    """Standard bivariate normal PDF with independent components."""
    return (1/(2*np.pi)) * np.exp(-0.5*(x**2 + y**2))

# Integrate over a region
prob, error = integrate.dblquad(
    joint_pdf,
    -1, 1,                     # x limits
    lambda x: -1, lambda x: 1  # y limits (as functions of x)
)
print(f"P(-1<X<1, -1<Y<1) = {prob:.4f}")
```

Gradient and Optimization
```python
import numpy as np
from scipy.optimize import minimize

# Define a function and its gradient
def f(params):
    x, y = params
    return (x - 2)**2 + (y - 3)**2 + x*y

def gradient(params):
    x, y = params
    df_dx = 2*(x - 2) + y
    df_dy = 2*(y - 3) + x
    return np.array([df_dx, df_dy])

# Find minimum
x0 = np.array([0.0, 0.0])  # Initial guess
result = minimize(f, x0, jac=gradient, method='BFGS')

print(f"Minimum at: {result.x}")
print(f"Minimum value: {result.fun}")

# Hessian (constant for this quadratic objective)
def hessian(params):
    return np.array([[2, 1],
                     [1, 2]])

# Check if positive definite (confirming a minimum)
H = hessian(result.x)
eigenvalues = np.linalg.eigvals(H)
print(f"Hessian eigenvalues: {eigenvalues}")
print(f"Is minimum: {np.all(eigenvalues > 0)}")
```

Summary
This section covered the calculus fundamentals essential for probability and statistics:
- Derivatives measure rates of change and are used for finding modes and in MLE
- Integrals compute probabilities and expected values for continuous distributions
- Partial derivatives and gradients extend these concepts to multiple variables
- Optimization techniques find parameter estimates in statistical models
- Special functions like Gamma and Beta appear throughout statistics
In the next section, we'll review linear algebra concepts that are essential for multivariate statistics and machine learning.