Introduction
Linear algebra is the mathematics of vectors and matrices. It provides the language for describing multivariate data, covariance structures, and transformations. Nearly every ML algorithm—from linear regression to deep learning—relies heavily on linear algebra.
Why This Matters for ML: Datasets are matrices, model parameters are vectors, and operations like PCA, SVD, and neural network computations are all linear algebra. Understanding these concepts is essential for implementing and optimizing ML algorithms.
Vectors
A vector is an ordered list of numbers. In probability and statistics, vectors represent data points, parameters, or random variable realizations.
Vector Operations
| Operation | Definition | Result |
|---|---|---|
| Addition | x + y | (x₁+y₁, x₂+y₂, ..., xₙ+yₙ) |
| Scalar Multiplication | cx | (cx₁, cx₂, ..., cxₙ) |
| Dot Product | x · y | x₁y₁ + x₂y₂ + ... + xₙyₙ |
| Norm (length) | ‖x‖ | √(x₁² + x₂² + ... + xₙ²) |
Dot Product Properties
The dot product (inner product) is fundamental:
- Geometric interpretation: x · y = ‖x‖‖y‖ cos θ, where θ is the angle between x and y
- Orthogonal vectors: x · y = 0 when x ⊥ y
- Norm from dot product: ‖x‖ = √(x · x)
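These properties are easy to check numerically. The following sketch recovers the angle between two vectors from the dot product identity x · y = ‖x‖‖y‖ cos θ (the vectors here are arbitrary examples):

```python
import numpy as np

# Angle between two vectors: theta = arccos(x . y / (||x|| ||y||))
x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.degrees(np.arccos(cos_theta))
print(f"angle = {theta:.1f} degrees")  # 45.0

# Orthogonality: perpendicular vectors have zero dot product
z = np.array([0.0, 1.0])
print(np.dot(x, z))  # 0.0

# Norm from the dot product: ||y|| = sqrt(y . y)
print(np.isclose(np.linalg.norm(y), np.sqrt(np.dot(y, y))))  # True
```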
Matrices
A matrix is a rectangular array of numbers. An m × n matrix A = [aᵢⱼ] has m rows and n columns, where aᵢⱼ denotes the entry in row i, column j.
Matrix Terminology
| Term | Definition | Example |
|---|---|---|
| Square | m = n | 3×3 matrix |
| Diagonal | aᵢⱼ = 0 for i ≠ j | Only main diagonal non-zero |
| Identity (I) | δᵢⱼ (1 on diagonal) | AI = IA = A |
| Zero Matrix (0) | All elements are 0 | A + 0 = A |
| Symmetric | A = Aᵀ | Covariance matrices |
Matrix Operations
Matrix Multiplication
For matrices A (m × n) and B (n × p), the product C = AB is m × p with entries:
cᵢⱼ = Σₖ aᵢₖbₖⱼ
Each entry is the dot product of row i of A with column j of B.
Dimension Rule
A(m×n) × B(n×p) = C(m×p). The inner dimensions must match! Matrix multiplication is NOT commutative: AB ≠ BA in general.
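A quick check of both points, using small example matrices (chosen here for illustration):

```python
import numpy as np

# Matrix multiplication is not commutative: AB and BA generally differ.
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

print(A @ B)  # [[2 1] [4 3]] -- B on the right swaps the columns of A
print(B @ A)  # [[3 4] [1 2]] -- B on the left swaps the rows of A
print(np.array_equal(A @ B, B @ A))  # False

# Dimension rule: (m x n) @ (n x p) -> (m x p)
C = np.ones((2, 3)) @ np.ones((3, 4))
print(C.shape)  # (2, 4)
```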
Transpose
The transpose Aᵀ switches rows and columns: (Aᵀ)ᵢⱼ = aⱼᵢ.
Properties:
- (Aᵀ)ᵀ = A
- (A + B)ᵀ = Aᵀ + Bᵀ
- (AB)ᵀ = BᵀAᵀ (note the reversed order)
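The reversal rule (AB)ᵀ = BᵀAᵀ can be verified numerically on random matrices:

```python
import numpy as np

# Verify the transpose reversal rule (AB)^T = B^T A^T
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

lhs = (A @ B).T
rhs = B.T @ A.T
print(np.allclose(lhs, rhs))  # True

# Transposing twice returns the original matrix
print(np.array_equal(A.T.T, A))  # True
```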
Matrix Inverse
For a square matrix A, the inverse A⁻¹ satisfies:
AA⁻¹ = A⁻¹A = I
Not all matrices are invertible. A matrix is singular (non-invertible) if and only if its determinant is zero.
Determinant
The determinant is a scalar value that characterizes a square matrix. For a 2×2 matrix with rows (a, b) and (c, d):
det(A) = ad − bc
Key properties:
- det(A) = 0 ⟺ A is singular (not invertible)
- det(AB) = det(A) · det(B)
- det(A⁻¹) = 1/det(A)
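The product and inverse rules, plus the singularity criterion, checked on small example matrices:

```python
import numpy as np

# Check det(AB) = det(A) det(B) and det(A^-1) = 1/det(A)
A = np.array([[2.0, 1.0], [1.0, 3.0]])   # det = 5
B = np.array([[0.0, 1.0], [2.0, 5.0]])   # det = -2

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A)))      # True

# A singular matrix (second row is twice the first) has det = 0
S = np.array([[1.0, 2.0], [2.0, 4.0]])
print(np.linalg.det(S))  # 0.0 (up to floating-point rounding)
```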
Probability Application
The determinant appears throughout probability: the multivariate normal density has normalizing constant 1/√(det(2πΣ)), and the change-of-variables formula for transformed random vectors involves |det(J)|, the absolute determinant of the Jacobian.
Special Matrices
Covariance Matrix
For a random vector X with mean vector μ = E[X], the covariance matrix is:
Σ = Cov(X) = E[(X − μ)(X − μ)ᵀ]
Properties:
- Symmetric: Σ = Σᵀ
- Positive semi-definite: xᵀΣx ≥ 0 for all x
- Diagonal elements: Variances of individual variables
- Off-diagonal elements: Covariances between variables
Positive Definite Matrices
A symmetric matrix A is positive definite if xᵀAx > 0 for all non-zero x. Key facts:
- All eigenvalues are positive (an equivalent characterization)
- Positive definite matrices are always invertible
- Covariance matrices are always positive semi-definite; they are positive definite when no variable is an exact linear combination of the others
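Two practical ways to test positive definiteness of a symmetric matrix: check that all eigenvalues are positive, or attempt a Cholesky factorization (which succeeds only for positive definite input):

```python
import numpy as np

# Symmetric example matrix (eigenvalues (7 ± sqrt(17))/2, both positive)
A = np.array([[4.0, 2.0], [2.0, 3.0]])

eigvals = np.linalg.eigvalsh(A)  # eigenvalues of a symmetric matrix
print(eigvals)
print(np.all(eigvals > 0))       # True

try:
    np.linalg.cholesky(A)        # raises LinAlgError if not positive definite
    print("positive definite")
except np.linalg.LinAlgError:
    print("not positive definite")
```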
Orthogonal Matrices
A square matrix Q is orthogonal if:
QᵀQ = QQᵀ = I, i.e., Q⁻¹ = Qᵀ
Orthogonal matrices preserve lengths and angles. Their columns (and rows) are orthonormal vectors.
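A 2D rotation matrix is the standard example of an orthogonal matrix; the sketch below confirms QᵀQ = I and that rotation preserves vector length:

```python
import numpy as np

# 2D rotation by 45 degrees: an orthogonal matrix
theta = np.pi / 4
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))  # True: Q^T is the inverse of Q

# Lengths are preserved under multiplication by Q
x = np.array([3.0, 4.0])
print(np.linalg.norm(x), np.linalg.norm(Q @ x))  # both 5.0
```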
Eigenvalues and Eigenvectors
For a square matrix A, if there exists a scalar λ and a non-zero vector v such that:
Av = λv
then λ is an eigenvalue of A and v is a corresponding eigenvector.
Computing Eigenvalues
Eigenvalues are found by solving the characteristic equation:
det(A − λI) = 0
Properties
- Sum of eigenvalues = trace(A) = Σᵢ aᵢᵢ
- Product of eigenvalues = det(A)
- Symmetric matrices have real eigenvalues and orthogonal eigenvectors
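These properties can be confirmed numerically on the symmetric example matrix used later in this section:

```python
import numpy as np

# Eigenvalues sum to the trace and multiply to the determinant
A = np.array([[4.0, 2.0], [2.0, 3.0]])
eigvals = np.linalg.eigvals(A)

print(np.isclose(eigvals.sum(), np.trace(A)))        # True (trace = 7)
print(np.isclose(eigvals.prod(), np.linalg.det(A)))  # True (det = 8)

# Symmetric matrices: real eigenvalues, orthogonal eigenvectors
w, V = np.linalg.eigh(A)
print(np.allclose(V.T @ V, np.eye(2)))  # True: eigenvector matrix is orthogonal
```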
PCA Connection
Principal Component Analysis (PCA) finds the eigenvectors of the covariance matrix. The eigenvalues represent the variance explained by each principal component.
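A minimal PCA sketch along these lines: center the data, eigendecompose the sample covariance, and project onto the top eigenvector. The data-generating matrix and variable names here are illustrative, not part of any particular library API:

```python
import numpy as np

# Generate correlated 2-D data (the mixing matrix is arbitrary)
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])

Xc = X - X.mean(axis=0)       # center the data
cov = np.cov(Xc.T)
w, V = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues

top = V[:, np.argmax(w)]      # direction of maximum variance
scores = Xc @ top             # 1-D projection of each sample
print(f"variance explained by PC1: {w.max() / w.sum():.1%}")
```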
Matrix Decompositions
Eigendecomposition
A symmetric matrix can be decomposed as:
A = QΛQᵀ
where Q is an orthogonal matrix whose columns are the eigenvectors and Λ is diagonal with the eigenvalues.
Singular Value Decomposition (SVD)
Any m × n matrix A can be decomposed as A = UΣVᵀ, where:
- U: m × m orthogonal (left singular vectors)
- Σ: m × n diagonal (singular values)
- V: n × n orthogonal (right singular vectors)
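A common statistical use of the SVD is low-rank approximation: truncating to the k largest singular values gives the best rank-k approximation in the least-squares sense. A small sketch on a random matrix:

```python
import numpy as np

# Rank-2 approximation of a random 5x4 matrix via truncated SVD
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))

U, S, Vt = np.linalg.svd(A)
k = 2
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]  # keep top-k singular triples

print(A_k.shape)                   # (5, 4)
print(np.linalg.matrix_rank(A_k))  # 2
```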
Cholesky Decomposition
For a positive definite matrix A:
A = LLᵀ
where L is lower triangular with positive diagonal entries. This is used for sampling from multivariate normal distributions.
Python Implementation
```python
import numpy as np

# Vectors
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Vector operations
print(f"x + y = {x + y}")              # [5 7 9]
print(f"2 * x = {2 * x}")              # [2 4 6]
print(f"x · y = {np.dot(x, y)}")       # 32
print(f"||x|| = {np.linalg.norm(x)}")  # 3.74...

# Matrix creation
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix operations
print(f"A + B = \n{A + B}")
print(f"A @ B = \n{A @ B}")  # Matrix multiplication
print(f"A.T = \n{A.T}")      # Transpose

# Inverse and determinant
print(f"det(A) = {np.linalg.det(A)}")
print(f"A^(-1) = \n{np.linalg.inv(A)}")
```
Eigenvalues and Decompositions
```python
import numpy as np

# Symmetric positive definite matrix
A = np.array([[4, 2], [2, 3]])

# Eigendecomposition
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")

# Verify: A @ v = lambda * v for each eigenpair
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lam = eigenvalues[i]
    print(f"A @ v{i} = {A @ v}")
    print(f"λ{i} * v{i} = {lam * v}")

# SVD
U, S, Vt = np.linalg.svd(A)
print("\nSVD:")
print(f"U = \n{U}")
print(f"S = {S}")
print(f"V^T = \n{Vt}")

# Cholesky decomposition
L = np.linalg.cholesky(A)
print(f"\nCholesky L:\n{L}")
print(f"L @ L.T = \n{L @ L.T}")  # Should equal A
```
Covariance Matrix Example
```python
import numpy as np

# Generate correlated data
np.random.seed(42)
n_samples = 1000

# True covariance structure
true_cov = np.array([[1.0, 0.8],
                     [0.8, 1.0]])

# Generate samples using Cholesky
L = np.linalg.cholesky(true_cov)
z = np.random.randn(n_samples, 2)
X = z @ L.T  # Transform standard normal to desired covariance

# Estimate covariance from data
estimated_cov = np.cov(X.T)
print(f"True covariance:\n{true_cov}")
print(f"\nEstimated covariance:\n{estimated_cov}")

# Eigendecomposition of covariance
eigenvalues, eigenvectors = np.linalg.eig(estimated_cov)
print(f"\nVariances (eigenvalues): {eigenvalues}")
print(f"Principal directions:\n{eigenvectors}")
```
Summary
This section covered the linear algebra essentials for statistics:
- Vectors represent data points and enable geometric interpretations
- Matrices store data and linear transformations
- Matrix operations (transpose, inverse, determinant) are fundamental building blocks
- Covariance matrices capture relationships between variables
- Eigendecomposition reveals principal directions and is central to PCA
- SVD and Cholesky decompositions have important statistical applications
In the next section, we'll set up our Python environment with NumPy, SciPy, and other tools we'll use throughout this book.