Learning Objectives
By the end of this section, you will be able to:
📚 Core Knowledge
- • Define eigenvalues and eigenvectors geometrically and algebraically
- • Explain the Spectral Theorem and why it matters for statistics
- • Describe the relationship between trace, determinant, and eigenvalues
- • Understand why covariance matrices have non-negative eigenvalues
🔧 Practical Skills
- • Compute eigenvalues and eigenvectors using NumPy
- • Interpret eigenanalysis results for data analysis
- • Implement the power iteration algorithm from scratch
- • Apply eigendecomposition to real datasets
🧠 Deep Learning Connections
- Principal Component Analysis (PCA) — Eigendecomposition of covariance matrices is the foundation of PCA
- Weight initialization — Xavier/He initialization uses eigenvalue analysis to preserve gradient flow
- Spectral normalization — Controls the largest singular value of weight matrices in GANs
- Graph neural networks — Spectral convolutions use eigenvalues of the graph Laplacian
Where You'll Apply This: Dimensionality reduction, feature extraction, data whitening, spectral clustering, recommender systems, image compression, and understanding the conditioning of optimization problems.
The Big Picture
Every linear transformation can be understood through its eigenvalues and eigenvectors. These special values reveal the intrinsic structure of a transformation — the directions along which it simply stretches or compresses, without rotating.
The Core Insight
When you apply a matrix to most vectors, they change both direction and magnitude. But eigenvectors are special — they only change magnitude. The matrix acts like a simple scaling in those directions.
Eigenvector: Direction unchanged by transformation
Eigenvalue: Scaling factor along that direction
Together: Complete description of the transformation
Historical Context
The eigenvalue problem has a rich history spanning centuries of mathematical development.
Euler & Lagrange (1700s)
First encountered eigenvalues studying the rotation of rigid bodies. The "principal axes" of rotation are eigenvectors! The word "eigen" comes from German meaning "own" or "characteristic."
Karl Pearson (1901)
Invented Principal Component Analysis (PCA) — the most direct statistical application of eigenvalue decomposition. He sought to find the "lines of closest fit" to multidimensional data.
Modern Era
Today, eigendecomposition underlies Google's PageRank, Netflix's recommender system, spectral clustering, and countless ML algorithms. It's the mathematical bridge between linear algebra and statistics.
Eigenvalue Definition
Let's formally define eigenvalues and eigenvectors, then build intuition for what they mean.
The Eigenvalue Equation
where A is an n×n matrix, is a non-zero vector (eigenvector), and is a scalar (eigenvalue)
This equation says: when we apply matrix A to eigenvector , the result is simply scaled by . The direction is preserved (or flipped if ).
| Symbol | Name | Meaning |
|---|---|---|
| A | Matrix | The linear transformation we're analyzing |
| v | Eigenvector | Direction that only gets scaled (not rotated) |
| λ | Eigenvalue | The scaling factor for that direction |
| n | Dimension | Size of the matrix (n×n) |
Geometric Interpretation
Think of a matrix as a transformation that stretches, rotates, and shears space. Most vectors get "scrambled" — they change both direction and length. But eigenvectors are invariant directions: they only stretch or shrink.
If
The transformation stretches space in the eigenvector direction.
If
The transformation compresses space in the eigenvector direction.
If
The transformation flips and scales — direction is reversed.
If
The eigenvector is in the null space — collapsed to zero.
Interactive: 2D Eigenvalue Explorer
Explore how different 2×2 matrices transform the unit circle. The eigenvectors show the directions that only get scaled, and the eigenvalues tell you by how much.
Matrix A
Eigenvalue Analysis
Key Insight
The eigenvectors show directions that are only scaled (not rotated) by the matrix transformation. The eigenvalues tell us the scaling factor in each direction. Notice how the unit circle becomes an ellipse whose axes align with the eigenvectors!
Eigendecomposition
If a matrix has n linearly independent eigenvectors, we can write it in a special factorized form called the eigendecomposition.
Eigendecomposition
Matrix of eigenvectors (columns)
Diagonal matrix of eigenvalues
Inverse of eigenvector matrix
This decomposition is powerful because it lets us understand any power of A easily:
And is trivial — just raise each diagonal element to the power k!
The Spectral Theorem
For symmetric matrices (where ), something beautiful happens. The Spectral Theorem guarantees:
- All eigenvalues are real — no complex numbers.
- Eigenvectors are orthogonal — they form a perpendicular coordinate system.
- Decomposition simplifies to where Q is orthogonal ().
Trace and Determinant Relationships
Two fundamental matrix properties are directly linked to eigenvalues:
Trace = Sum of Eigenvalues
The trace (sum of diagonal elements) equals the sum of all eigenvalues. For a covariance matrix, this is the total variance.
Determinant = Product of Eigenvalues
The determinant equals the product of all eigenvalues. If any eigenvalue is zero, the matrix is singular (non-invertible).
Covariance Matrix Eigenanalysis
The covariance matrix is perhaps the most important matrix in statistics. Its eigendecomposition reveals the fundamental structure of multivariate data.
Covariance Matrix Definition
Or in matrix form: (for centered data)
Key Properties of Covariance Matrices:
- Symmetric: (by construction)
- Positive Semi-Definite: All eigenvalues
- Diagonal = Variances:
- Off-diagonal = Covariances:
Interactive: Covariance Eigendecomposition
Generate correlated 2D data and watch how the eigendecomposition reveals the principal directions of variance. This is PCA in action!
Data Generation
Sample Covariance Matrix Σ
Eigendecomposition (PCA)
The PCA Connection
This is PCA in action! The eigenvectors of the covariance matrix are the principal components, and the eigenvalues tell us how much variance each component explains. PC1 (the eigenvector with the largest eigenvalue) points in the direction of maximum variance in the data. Adjusting correlation rotates the ellipse, while adjusting variances changes its shape.
The PCA Connection
Principal Component Analysis is nothing more than eigendecomposition of the covariance matrix! Each principal component is an eigenvector, and the variance it explains is the corresponding eigenvalue.
PCA Algorithm
- Center the data (subtract the mean from each feature)
- Compute the covariance matrix
- Find eigenvalues and eigenvectors of
- Sort eigenvectors by eigenvalue (descending)
- Project data onto top k eigenvectors for k-dimensional reduction
Computing Eigenvalues: Power Iteration
How do we actually compute eigenvalues? For large matrices, we can't solve the characteristic polynomial directly. The power iteration method is an elegant, iterative approach.
Power Iteration Algorithm
Initialize: = random unit vector
Iterate:
Converges to: Eigenvector with largest |λ|
Why does this work? Express the initial vector in the eigenbasis:
After k iterations:
If , the first term dominates as k → ∞.
Interactive: Power Iteration
Watch the power iteration algorithm converge to the dominant eigenvector. Notice how convergence speed depends on the ratio of eigenvalues.
Matrix A
Algorithm Progress
True Dominant Eigenpair
Algorithm: Power Iteration
1. Start with random vector v&sub0;
2. Multiply: w&subk; = A · v&subk;
3. Normalize: v&subk;+&sub1; = w&subk; / ||w&subk;||
4. Estimate eigenvalue: λ ≈ v&supT;&subk; A v&subk;
5. Repeat until convergence
Convergence rate: O(|λ&sub2;/λ&sub1;|&supk;) — faster when eigenvalues are well-separated
- Inverse iteration: Find the smallest eigenvalue using
- Shifted iteration: Find eigenvalues near a target value
- QR algorithm: Compute all eigenvalues simultaneously
Real-World Applications
AI/ML Connections
Eigenvalue decomposition appears throughout deep learning, often in subtle but crucial ways.
🎲 Weight Initialization
Xavier and He initialization analyze eigenvalues of random matrices to ensure gradients don't explode or vanish. The key insight: random weight matrices should preserve the variance of activations across layers.
📈 Spectral Normalization
In GANs, spectral normalization constrains the largest singular value (related to eigenvalues via SVD) of each weight matrix. This stabilizes training by controlling the Lipschitz constant.
🕸️ Graph Neural Networks
Spectral GNNs define convolutions using eigenvectors of the graph Laplacian. ChebNet and GCN can be understood as polynomial filters in the spectral domain.
🔄 Condition Number
The condition number determines how hard an optimization problem is. High condition numbers mean slow convergence for gradient descent.
- Use randomized algorithms for approximate solutions
- Compute only the top k eigenvalues when that's all you need
- Leverage sparse matrix structure when available
Python Implementation
Let's implement eigenvalue decomposition in Python. We'll cover both the NumPy approach and a from-scratch power iteration.
Here's a complete example with PCA and power iteration:
1import numpy as np
2from numpy.linalg import eigh, norm
3
4# ============================================
5# Example 1: Eigendecomposition of Covariance
6# ============================================
7np.random.seed(42)
8
9# Generate correlated 2D data
10n = 200
11mu = np.array([0, 0])
12cov_true = np.array([[2.0, 1.5],
13 [1.5, 1.0]])
14
15# Sample from multivariate normal
16data = np.random.multivariate_normal(mu, cov_true, n)
17
18# Compute sample covariance
19cov_sample = np.cov(data.T)
20print("Sample Covariance Matrix:")
21print(cov_sample)
22
23# Eigendecomposition
24eigenvalues, eigenvectors = eigh(cov_sample)
25# Sort descending
26idx = np.argsort(eigenvalues)[::-1]
27eigenvalues = eigenvalues[idx]
28eigenvectors = eigenvectors[:, idx]
29
30print(f"\nEigenvalues: {eigenvalues}")
31print(f"Variance explained: {eigenvalues / eigenvalues.sum() * 100}%")
32print(f"\nPC1: {eigenvectors[:, 0]}")
33print(f"PC2: {eigenvectors[:, 1]}")
34
35# ============================================
36# Example 2: Power Iteration from Scratch
37# ============================================
38def power_iteration(A, num_iterations=100, tol=1e-10):
39 """
40 Compute dominant eigenvalue and eigenvector.
41
42 Parameters:
43 A: Square matrix
44 num_iterations: Maximum iterations
45 tol: Convergence tolerance
46
47 Returns:
48 eigenvalue, eigenvector
49 """
50 n = A.shape[0]
51 v = np.random.randn(n)
52 v = v / norm(v)
53
54 eigenvalue_old = 0
55
56 for i in range(num_iterations):
57 # Apply matrix
58 Av = A @ v
59
60 # Rayleigh quotient for eigenvalue estimate
61 eigenvalue = v @ Av
62
63 # Normalize
64 v = Av / norm(Av)
65
66 # Check convergence
67 if abs(eigenvalue - eigenvalue_old) < tol:
68 print(f"Converged after {i+1} iterations")
69 break
70
71 eigenvalue_old = eigenvalue
72
73 return eigenvalue, v
74
75# Test on our covariance matrix
76eigenvalue_power, eigenvector_power = power_iteration(cov_sample)
77print(f"\n=== Power Iteration Results ===")
78print(f"Dominant eigenvalue: {eigenvalue_power:.6f}")
79print(f"True eigenvalue: {eigenvalues[0]:.6f}")
80print(f"Eigenvector: {eigenvector_power}")
81
82# ============================================
83# Example 3: PCA from Scratch
84# ============================================
85def pca(X, n_components=None):
86 """
87 Principal Component Analysis.
88
89 Parameters:
90 X: Data matrix (n_samples, n_features)
91 n_components: Number of components to keep
92
93 Returns:
94 projected_data, components, explained_variance_ratio
95 """
96 # Center the data
97 X_centered = X - X.mean(axis=0)
98
99 # Compute covariance matrix
100 cov = np.cov(X_centered.T)
101
102 # Eigendecomposition
103 eigenvalues, eigenvectors = eigh(cov)
104 idx = np.argsort(eigenvalues)[::-1]
105 eigenvalues = eigenvalues[idx]
106 eigenvectors = eigenvectors[:, idx]
107
108 # Select components
109 if n_components is None:
110 n_components = len(eigenvalues)
111
112 components = eigenvectors[:, :n_components]
113 explained_variance = eigenvalues[:n_components]
114 explained_variance_ratio = explained_variance / eigenvalues.sum()
115
116 # Project data
117 projected = X_centered @ components
118
119 return projected, components, explained_variance_ratio
120
121# Apply PCA
122projected, components, var_ratio = pca(data, n_components=2)
123print(f"\n=== PCA Results ===")
124print(f"Explained variance ratio: {var_ratio}")
125print(f"Total variance explained: {var_ratio.sum():.2%}")Knowledge Check
Test your understanding of eigenvalue decomposition with this interactive quiz.
What is an eigenvalue of a matrix A?
Summary
Key Takeaways
- Eigenvalues and eigenvectors reveal the intrinsic structure of linear transformations — directions that only get scaled, not rotated.
- The eigenvalue equation Av = λv defines eigenvectors as directions preserved by the matrix, with eigenvalues as their scaling factors.
- Symmetric matrices (including covariance matrices) have real eigenvalues and orthogonal eigenvectors — the Spectral Theorem.
- PCA is eigendecomposition of the covariance matrix. Principal components are eigenvectors; explained variance equals eigenvalues.
- Trace = sum of eigenvalues, Determinant = product of eigenvalues. These relationships are fundamental to matrix analysis.
- Power iteration computes the dominant eigenvalue iteratively, forming the basis for algorithms like PageRank and spectral methods.
Looking Ahead: In the next section, we'll explore Singular Value Decomposition (SVD), which generalizes eigendecomposition to rectangular matrices and is even more widely used in machine learning applications.