Chapter 0

Calculus Refresher

Prerequisites & Mathematical Foundations

Introduction

Calculus provides the mathematical machinery for working with continuous probability distributions. We need derivatives to find modes and understand rate of change, integrals to compute probabilities and expected values, and optimization techniques for maximum likelihood estimation.

Why This Matters for ML: Gradient descent, backpropagation, maximum likelihood estimation, and computing expected values all rely heavily on calculus. Understanding these fundamentals is essential for deriving and implementing ML algorithms.

Derivatives

The derivative measures the instantaneous rate of change of a function. For a function f(x), the derivative is:

f'(x) = \frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}

Essential Derivative Rules

  • Power Rule: d/dx[xⁿ] = nxⁿ⁻¹
  • Constant Multiple: d/dx[c·f(x)] = c·f'(x)
  • Sum Rule: d/dx[f(x) + g(x)] = f'(x) + g'(x)
  • Product Rule: d/dx[f(x)g(x)] = f'(x)g(x) + f(x)g'(x)
  • Quotient Rule: d/dx[f(x)/g(x)] = [f'(x)g(x) - f(x)g'(x)]/g(x)²
  • Chain Rule: d/dx[f(g(x))] = f'(g(x))·g'(x)
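These rules are easy to sanity-check symbolically. A minimal sketch using sympy (the function choices x², sin(x), and sin(x²) are illustrative):

```python
import sympy as sp

x = sp.Symbol('x')

# Product rule: d/dx[f(x)g(x)] = f'(x)g(x) + f(x)g'(x)
f, g = x**2, sp.sin(x)
lhs = sp.diff(f * g, x)
rhs = sp.diff(f, x) * g + f * sp.diff(g, x)
print(sp.simplify(lhs - rhs))  # 0

# Chain rule: d/dx[sin(x**2)] = cos(x**2) * 2x
chain = sp.diff(sp.sin(x**2), x)
print(chain)  # 2*x*cos(x**2)
```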

Important Derivatives for Statistics

  • ln(x): 1/x
  • aˣ: aˣ ln(a)
  • sin(x): cos(x)
  • cos(x): -sin(x)
  • logₐ(x): 1/(x ln(a))

Log-Derivative Trick

For probability distributions, we often work with log-likelihoods. The derivative of log f(x) is:

\frac{d}{dx}\ln f(x) = \frac{f'(x)}{f(x)}

When f is a likelihood and the derivative is taken with respect to the model parameters, this quantity is called the score function in statistics.
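As a quick check of the identity, here is a sympy sketch computing the log-derivative of the standard normal density, where it comes out to -x:

```python
import sympy as sp

x = sp.Symbol('x')
# Standard normal density
f = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)

# d/dx ln f(x) = f'(x)/f(x)
score = sp.simplify(sp.diff(sp.log(f), x))
print(score)  # -x
```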


Integrals

The integral represents the area under a curve. For continuous probability distributions, integrals compute probabilities and expected values.

Definite Integrals

\int_a^b f(x) \, dx = F(b) - F(a)

Where F(x) is the antiderivative (also called primitive) of f(x), meaning F'(x) = f(x).

Essential Integration Rules

  • Power Rule: ∫xⁿ dx = xⁿ⁺¹/(n+1) + C (n ≠ -1)
  • Exponential: ∫eˣ dx = eˣ + C
  • Natural Log: ∫(1/x) dx = ln|x| + C
  • Substitution: ∫f(g(x))g'(x) dx = ∫f(u) du
  • Parts: ∫u dv = uv - ∫v du

Integration by Parts

For products of functions, use integration by parts:

\int u \, dv = uv - \int v \, du

LIATE rule for choosing u (in order of preference): Logarithmic, Inverse trig, Algebraic, Trigonometric, Exponential.
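For instance, for ∫x eˣ dx, LIATE picks u = x (algebraic) over dv = eˣ dx (exponential), giving x eˣ - ∫eˣ dx = (x - 1)eˣ + C. A sympy check of this worked example:

```python
import sympy as sp

x = sp.Symbol('x')

# ∫ x e^x dx via integration by parts equals (x - 1) e^x + C
result = sp.integrate(x * sp.exp(x), x)
print(result)
```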

Probability Application

Computing expected values often requires integration by parts:
E[X] = \int_{-\infty}^{\infty} x \cdot f(x) \, dx
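A numerical sketch of this formula, using an Exponential(λ) density with λ = 2 chosen purely for illustration; the integral should recover the known mean 1/λ:

```python
import numpy as np
from scipy import integrate

lam = 2.0  # rate of an Exponential(λ) distribution (illustrative choice)

def pdf(x):
    return lam * np.exp(-lam * x)

# E[X] = ∫ x f(x) dx over the support [0, ∞); should equal 1/λ = 0.5
mean, _ = integrate.quad(lambda x: x * pdf(x), 0, np.inf)
print(f"E[X] = {mean:.4f}")
```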

Multivariable Calculus

When working with multiple random variables or parameters, we need partial derivatives and multiple integrals.

Partial Derivatives

A partial derivative measures the rate of change with respect to one variable while holding others constant:

\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x+h, y) - f(x, y)}{h}

Gradient

The gradient is a vector of all partial derivatives:

\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)

The gradient points in the direction of steepest ascent.
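This property is what gradient ascent (and, with the sign flipped, gradient descent) exploits. A minimal sketch on a hand-picked concave function whose maximum is known to be at (1, -2):

```python
import numpy as np

# Gradient ascent on f(x, y) = -(x - 1)**2 - (y + 2)**2,
# whose unique maximum is at (1, -2)
def grad(p):
    x, y = p
    return np.array([-2 * (x - 1), -2 * (y + 2)])

p = np.array([0.0, 0.0])
step = 0.1
for _ in range(200):
    p = p + step * grad(p)  # move along the direction of steepest ascent

print(p)  # converges toward [1, -2]
```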

Double Integrals

For joint probability distributions of two variables:

P(a \leq X \leq b, \, c \leq Y \leq d) = \int_c^d \int_a^b f(x, y) \, dx \, dy

The Jacobian

When changing variables in multiple integrals, we need the Jacobian determinant:

J = \begin{vmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{vmatrix}
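The classic example is the change to polar coordinates, x = r cos(θ), y = r sin(θ), whose Jacobian determinant is r (this is the substitution behind evaluating the Gaussian integral below). A sympy sketch:

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)

# Polar coordinates: x = r cos(θ), y = r sin(θ)
x = r * sp.cos(theta)
y = r * sp.sin(theta)

# Jacobian matrix of (x, y) with respect to (r, θ) and its determinant
J = sp.Matrix([x, y]).jacobian([r, theta])
detJ = sp.simplify(J.det())
print(detJ)  # r
```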

Transform of Variables

For transforming probability distributions, if Y = g(X) with g monotonic (so that g⁻¹ exists and is differentiable):

f_Y(y) = f_X(g^{-1}(y)) \cdot \left| \frac{d}{dy} g^{-1}(y) \right|
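As a worked instance: if X ~ Uniform(0, 1) and Y = g(X) = -ln(X), then g⁻¹(y) = e⁻ʸ, |d/dy g⁻¹(y)| = e⁻ʸ, and f_X(e⁻ʸ) = 1, so f_Y(y) = e⁻ʸ, the Exponential(1) density. A quick Monte Carlo sanity check (sample size and seed are arbitrary):

```python
import numpy as np

# Transform uniform draws through g(x) = -ln(x); the result should
# behave like Exponential(1), whose mean is 1
rng = np.random.default_rng(0)
y = -np.log(rng.uniform(size=100_000))
print(f"sample mean = {y.mean():.3f}")
```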

Optimization

Finding maxima and minima is central to statistical estimation methods like Maximum Likelihood Estimation (MLE).

Finding Critical Points

  1. Set the first derivative (or gradient) equal to zero: f'(x) = 0
  2. Solve for x to find critical points
  3. Use the second derivative test to classify: f''(x) > 0 → minimum, f''(x) < 0 → maximum, f''(x) = 0 → inconclusive (higher-order tests needed)
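The three steps above, run symbolically on an illustrative cubic with two critical points:

```python
import sympy as sp

x = sp.Symbol('x')
f = x**3 - 3*x  # illustrative function; f'(x) = 3x**2 - 3

critical = sp.solve(sp.diff(f, x), x)  # steps 1 and 2
print(critical)  # [-1, 1]

for c in critical:  # step 3: second derivative test
    second = sp.diff(f, x, 2).subs(x, c)
    print(c, "maximum" if second < 0 else "minimum")
```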

Hessian Matrix

For multivariate functions, the Hessian is the matrix of second partial derivatives:

H = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix}
  • H positive definite → local minimum
  • H negative definite → local maximum
  • H indefinite → saddle point

Lagrange Multipliers

For constrained optimization (maximize f(x,y) subject to g(x,y) = c):

\nabla f = \lambda \nabla g

This is used extensively in deriving maximum entropy distributions.
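A small worked example, not a maximum entropy problem but the same mechanics: maximize f(x, y) = xy subject to x + y = 10, solved by setting ∇f = λ∇g alongside the constraint (the answer is x = y = 5):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam')
f = x * y        # objective (illustrative)
g = x + y - 10   # constraint written as g(x, y) = 0

# Stationarity ∇f = λ∇g, together with the constraint itself
eqs = [sp.diff(f, x) - lam * sp.diff(g, x),
       sp.diff(f, y) - lam * sp.diff(g, y),
       g]
sol = sp.solve(eqs, [x, y, lam], dict=True)
print(sol)  # [{x: 5, y: 5, lam: 5}]
```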


Common Integrals in Statistics

These integrals appear frequently when working with probability distributions:

Gaussian Integral

\int_{-\infty}^{\infty} e^{-x^2} \, dx = \sqrt{\pi}

This is fundamental for normalizing the normal distribution.

Gamma Function

\Gamma(n) = \int_0^{\infty} x^{n-1} e^{-x} \, dx

Key properties:

  • \Gamma(n) = (n-1)! for positive integers n
  • \Gamma(n+1) = n \cdot \Gamma(n)
  • \Gamma(1/2) = \sqrt{\pi}

Beta Function

B(\alpha, \beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1} \, dx = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}
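Both special functions are available in scipy, which makes the identities above easy to verify numerically (α = 2.5, β = 3.5 below are arbitrary test values):

```python
import numpy as np
from scipy import special

# Γ(n) = (n-1)! for positive integers
print(special.gamma(5))  # 24.0 (= 4!)

# Γ(1/2) = √π
print(np.isclose(special.gamma(0.5), np.sqrt(np.pi)))  # True

# B(α, β) = Γ(α)Γ(β)/Γ(α+β) for an arbitrary choice of α, β
a, b = 2.5, 3.5
lhs = special.beta(a, b)
rhs = special.gamma(a) * special.gamma(b) / special.gamma(a + b)
print(np.isclose(lhs, rhs))  # True
```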

Python Implementation

Python provides powerful tools for symbolic and numerical calculus:

calculus_basics.py

```python
import sympy as sp

# Define symbolic variable
x = sp.Symbol('x')

# Derivatives
f = x**3 + 2*x**2 - 5*x + 1
f_prime = sp.diff(f, x)
print(f"f(x) = {f}")
print(f"f'(x) = {f_prime}")  # 3*x**2 + 4*x - 5

# Second derivative
f_double_prime = sp.diff(f, x, 2)
print(f"f''(x) = {f_double_prime}")  # 6*x + 4

# Indefinite integral (sympy omits the constant of integration)
integral = sp.integrate(f, x)
print(f"∫f(x)dx = {integral}")  # x**4/4 + 2*x**3/3 - 5*x**2/2 + x

# Definite integral
definite = sp.integrate(f, (x, 0, 2))
print(f"∫₀²f(x)dx = {definite}")  # 4/3
```

Numerical Integration

numerical_integration.py

```python
import numpy as np
from scipy import integrate

# Define a function
def f(x):
    return np.exp(-x**2)

# Gaussian integral approximation
result, error = integrate.quad(f, -np.inf, np.inf)
print(f"∫e^(-x²)dx = {result:.6f}")  # ~1.772454 = √π

# Verify
print(f"√π = {np.sqrt(np.pi):.6f}")

# Double integral for a joint density: standard bivariate normal
# with independent components
def joint_pdf(y, x):
    return (1/(2*np.pi)) * np.exp(-0.5*(x**2 + y**2))

# Integrate over the square [-1, 1] × [-1, 1]
prob, error = integrate.dblquad(
    joint_pdf,
    -1, 1,                      # x limits
    lambda x: -1, lambda x: 1,  # y limits (as functions of x)
)
print(f"P(-1<X<1, -1<Y<1) = {prob:.4f}")
```

Gradient and Optimization

optimization.py

```python
import numpy as np
from scipy.optimize import minimize

# Define a function and its gradient
def f(params):
    x, y = params
    return (x - 2)**2 + (y - 3)**2 + x*y

def gradient(params):
    x, y = params
    df_dx = 2*(x - 2) + y
    df_dy = 2*(y - 3) + x
    return np.array([df_dx, df_dy])

# Find minimum
x0 = np.array([0.0, 0.0])  # Initial guess
result = minimize(f, x0, jac=gradient, method='BFGS')

print(f"Minimum at: {result.x}")
print(f"Minimum value: {result.fun}")

# The Hessian of this quadratic is constant
def hessian(params):
    return np.array([[2, 1],
                     [1, 2]])

# Check if positive definite (confirms a minimum)
H = hessian(result.x)
eigenvalues = np.linalg.eigvals(H)
print(f"Hessian eigenvalues: {eigenvalues}")
print(f"Is minimum: {np.all(eigenvalues > 0)}")
```

Summary

This section covered the calculus fundamentals essential for probability and statistics:

  • Derivatives measure rates of change and are used for finding modes and in MLE
  • Integrals compute probabilities and expected values for continuous distributions
  • Partial derivatives and gradients extend these concepts to multiple variables
  • Optimization techniques find parameter estimates in statistical models
  • Special functions like Gamma and Beta appear throughout statistics

In the next section, we'll review linear algebra concepts that are essential for multivariate statistics and machine learning.