Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

📚 Core Knowledge

• Define eigenvalues and eigenvectors geometrically and algebraically
• Explain the Spectral Theorem and why it matters for statistics
• Describe the relationship between trace, determinant, and eigenvalues
• Understand why covariance matrices have non-negative eigenvalues

🔧 Practical Skills

• Compute eigenvalues and eigenvectors using NumPy
• Interpret eigenanalysis results for data analysis
• Implement the power iteration algorithm from scratch
• Apply eigendecomposition to real datasets

🧠 Deep Learning Connections

Principal Component Analysis (PCA): Eigendecomposition of covariance matrices is the foundation of PCA
Weight initialization: Xavier/He initialization scales the weight variance so that activation and gradient magnitudes are preserved across layers
Spectral normalization: Controls the largest singular value of weight matrices in GANs
Graph neural networks: Spectral convolutions use the eigendecomposition of the graph Laplacian

Where You'll Apply This: Dimensionality reduction, feature extraction, data whitening, spectral clustering, recommender systems, image compression, and understanding the conditioning of optimization problems.

The Big Picture

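Many linear transformations can be understood through their eigenvalues and eigenvectors. These special values reveal the intrinsic structure of a transformation: the directions along which it simply stretches or compresses, without rotating. Not every matrix has a full set of real eigenvectors (a pure 2D rotation has none), which is one reason the symmetric matrices discussed later in this section are so convenient.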

The Core Insight

When you apply a matrix to most vectors, they change both direction and magnitude. But eigenvectors are special — they only change magnitude. The matrix acts like a simple scaling in those directions.

🎯 Eigenvector: Direction unchanged by transformation

📏 Eigenvalue: Scaling factor along that direction

🔮 Together: Complete description of the transformation

Historical Context

The eigenvalue problem has a rich history spanning centuries of mathematical development.

📜 Euler & Lagrange (1700s)

First encountered eigenvalues studying the rotation of rigid bodies. The "principal axes" of rotation are eigenvectors! The word "eigen" comes from German meaning "own" or "characteristic."

🧮 Karl Pearson (1901)

Invented Principal Component Analysis (PCA) — the most direct statistical application of eigenvalue decomposition. He sought to find the "lines of closest fit" to multidimensional data.

💻 Modern Era

Today, eigendecomposition underlies Google's PageRank, Netflix's recommender system, spectral clustering, and countless ML algorithms. It's the mathematical bridge between linear algebra and statistics.

Eigenvalue Definition

Let's formally define eigenvalues and eigenvectors, then build intuition for what they mean.

The Eigenvalue Equation

A\mathbf{v} = \lambda \mathbf{v}

where A is an n×n matrix, $\mathbf{v}$ is a non-zero vector (eigenvector), and $\lambda$ is a scalar (eigenvalue)

This equation says: when we apply matrix A to eigenvector $\mathbf{v}$, the result is simply $\mathbf{v}$ scaled by $\lambda$. The direction is preserved (or flipped if $\lambda < 0$).

Symbol	Name	Meaning
A	Matrix	The linear transformation we're analyzing
v	Eigenvector	Direction that only gets scaled (not rotated)
λ	Eigenvalue	The scaling factor for that direction
n	Dimension	Size of the matrix (n×n)
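
As a quick numerical check of the definition, the sketch below takes the symmetric 2×2 matrix used in this section's examples and confirms that each eigenvector is only scaled by A. The variable names are illustrative choices, not part of any required API.

```python
import numpy as np

# Example matrix from this section (symmetric, so eigh applies)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eigh returns eigenvalues in ascending order; the columns of V are eigenvectors
eigenvalues, V = np.linalg.eigh(A)

for lam, v in zip(eigenvalues, V.T):
    # A @ v should equal lam * v up to floating-point error
    residual = np.linalg.norm(A @ v - lam * v)
    print(f"lambda = {lam:.3f}, v = {np.round(v, 3)}, ||Av - lambda v|| = {residual:.1e}")
```

The signs of the returned eigenvectors are arbitrary: if $\mathbf{v}$ satisfies the equation, so does $-\mathbf{v}$.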

Geometric Interpretation

Think of a matrix as a transformation that stretches, rotates, and shears space. Most vectors get "scrambled" — they change both direction and length. But eigenvectors are invariant directions: they only stretch or shrink.

If $\lambda > 1$

The transformation stretches space in the eigenvector direction.

If $0 < \lambda < 1$

The transformation compresses space in the eigenvector direction.

If $\lambda < 0$

The transformation flips and scales — direction is reversed.

If $\lambda = 0$

The eigenvector is in the null space — collapsed to zero.

Interactive: 2D Eigenvalue Explorer

Explore how different 2×2 matrices transform the unit circle. The eigenvectors show the directions that only get scaled, and the eigenvalues tell you by how much.

🔷 Eigenvalue Decomposition Visualizer (interactive figure; default settings shown)

Matrix A

A = \begin{bmatrix} 2.00 & 1.00 \\ 1.00 & 3.00 \end{bmatrix}

Eigenvalue Analysis

λ₁ (larger) = 3.618, v₁ = [0.526, 0.851]
λ₂ (smaller) = 1.382, v₂ = [0.851, -0.526]

Trace: 5.000 = λ₁ + λ₂
Determinant: 5.000 = λ₁ × λ₂

Legend: unit circle, transformed circle, v₁ (λ₁), v₂ (λ₂)

Key Insight

The eigenvectors show directions that are only scaled (not rotated) by the matrix transformation. The eigenvalues tell us the scaling factor in each direction. Notice how the unit circle becomes an ellipse. Because this matrix is symmetric, the axes of the ellipse align with the eigenvectors!

Eigendecomposition

If a matrix has n linearly independent eigenvectors, we can write it in a special factorized form called the eigendecomposition.

Eigendecomposition

A = V \Lambda V^{-1}

$V$: Matrix of eigenvectors (columns)
$\Lambda$: Diagonal matrix of eigenvalues
$V^{-1}$: Inverse of eigenvector matrix

This decomposition is powerful because it lets us understand any power of A easily:

A^k = V \Lambda^k V^{-1}

And $\Lambda^k$ is trivial — just raise each diagonal element to the power k!
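
To illustrate the power formula, here is a minimal sketch that computes $A^5$ through the eigendecomposition and compares it with repeated multiplication. The matrix and the exponent are arbitrary illustrative choices.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
k = 5

# General eigendecomposition A = V diag(lam) V^{-1}
lam, V = np.linalg.eig(A)

# A^k = V diag(lam^k) V^{-1}: only the diagonal entries are raised to the power k
A_k_eig = V @ np.diag(lam ** k) @ np.linalg.inv(V)

# Reference result from repeated matrix multiplication
A_k_ref = np.linalg.matrix_power(A, k)

print(np.round(A_k_eig, 6))
print("matches matrix_power:", np.allclose(A_k_eig, A_k_ref))
```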

The Spectral Theorem

For symmetric matrices (where $A = A^T$), something beautiful happens. The Spectral Theorem guarantees:

All eigenvalues are real: no complex numbers.
Eigenvectors can be chosen orthonormal: eigenvectors belonging to distinct eigenvalues are automatically perpendicular, so together they form a perpendicular coordinate system.
Decomposition simplifies to $A = Q \Lambda Q^T$, where Q is orthogonal ($Q^{-1} = Q^T$).

Why This Matters for Statistics: Covariance matrices are always symmetric! This means PCA always works — we're guaranteed real eigenvalues and orthogonal principal components.
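
The three guarantees can be checked numerically. The sketch below symmetrizes an arbitrary random matrix (an illustrative test case, not data from this section) and verifies each property with `np.linalg.eigh`.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                  # symmetrize an arbitrary matrix

lam, Q = np.linalg.eigh(A)         # eigh assumes a symmetric input

is_real = np.isrealobj(lam)                          # eigenvalues are real
is_orthogonal = np.allclose(Q.T @ Q, np.eye(4))      # Q^T Q = I
reconstructs = np.allclose(Q @ np.diag(lam) @ Q.T, A)  # A = Q Lambda Q^T

print(is_real, is_orthogonal, reconstructs)
```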

Trace and Determinant Relationships

Two fundamental matrix properties are directly linked to eigenvalues:

Trace = Sum of Eigenvalues

\text{tr}(A) = \sum_{i=1}^{n} \lambda_i

The trace (sum of diagonal elements) equals the sum of all eigenvalues. For a covariance matrix, this is the total variance.

Determinant = Product of Eigenvalues

\det(A) = \prod_{i=1}^{n} \lambda_i

The determinant equals the product of all eigenvalues. If any eigenvalue is zero, the matrix is singular (non-invertible).
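
Both identities are easy to confirm on small examples. This sketch uses the 2×2 matrix from the interactive example, plus a second, deliberately singular matrix chosen here for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam = np.linalg.eigvalsh(A)        # eigenvalues of a symmetric matrix

trace_A = np.trace(A)
det_A = np.linalg.det(A)
print(f"trace = {trace_A:.3f}, sum of eigenvalues     = {lam.sum():.3f}")
print(f"det   = {det_A:.3f}, product of eigenvalues = {lam.prod():.3f}")

# A singular matrix (second row is twice the first) has a zero eigenvalue
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
lam_S = np.linalg.eigvalsh(S)
print("eigenvalues of S:", np.round(lam_S, 6))
```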

Covariance Matrix Eigenanalysis

The covariance matrix is perhaps the most important matrix in statistics. Its eigendecomposition reveals the fundamental structure of multivariate data.

Covariance Matrix Definition

\Sigma = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T

Or in matrix form: $\Sigma = \frac{1}{n-1} X^T X$, where the rows of $X$ are the centered observations $(\mathbf{x}_i - \bar{\mathbf{x}})^T$.

Key Properties of Covariance Matrices:

Symmetric: $\Sigma = \Sigma^T$ (by construction)
Positive Semi-Definite: All eigenvalues $\lambda_i \geq 0$
Diagonal = Variances: $\Sigma_{ii} = \text{Var}(X_i)$
Off-diagonal = Covariances: $\Sigma_{ij} = \text{Cov}(X_i, X_j)$
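
Why are the eigenvalues non-negative? For any direction $\mathbf{w}$, the quadratic form $\mathbf{w}^T \Sigma \mathbf{w} = \frac{1}{n-1}\|X\mathbf{w}\|^2$ is the sample variance of the data projected onto $\mathbf{w}$, and a variance cannot be negative. The following sketch checks these properties on synthetic data; the data-generating recipe is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
# Mix independent columns to get correlated features (synthetic data)
X = rng.standard_normal((n, d)) @ rng.standard_normal((d, d))
Xc = X - X.mean(axis=0)            # center each feature

# Matrix form of the sample covariance
Sigma = Xc.T @ Xc / (n - 1)

matches_numpy = np.allclose(Sigma, np.cov(X, rowvar=False))
is_symmetric = np.allclose(Sigma, Sigma.T)
eigenvalues = np.linalg.eigvalsh(Sigma)

# The quadratic form equals the variance of the data projected onto w
w = rng.standard_normal(d)
quad_form = w @ Sigma @ w
projected_var = np.var(Xc @ w, ddof=1)

print("eigenvalues:", np.round(eigenvalues, 4))
print("w^T Sigma w =", round(quad_form, 6), " projected variance =", round(projected_var, 6))
```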

Interactive: Covariance Eigendecomposition

Generate correlated 2D data and watch how the eigendecomposition reveals the principal directions of variance. This is PCA in action!

📊 Covariance Matrix Eigendecomposition (interactive figure; default settings shown)

Data Generation

Variance X (σ₁²): 2.00
Variance Y (σ₂²): 0.50
Correlation (ρ): 0.70
Number of Points: 200

Sample Covariance Matrix Σ

\Sigma = \begin{bmatrix} 1.934 & 0.743 \\ 0.743 & 0.560 \end{bmatrix}

Note: Covariance matrices are always symmetric and positive semi-definite

Eigendecomposition (PCA)

PC1 (First Principal Component): λ₁ = 2.258, direction [0.916, 0.401], 90.6% of variance
PC2 (Second Principal Component): λ₂ = 0.235, direction [-0.401, 0.916], 9.4% of variance

Legend: data points, PC1, PC2

Key Insight

The eigenvectors of the covariance matrix are the principal components, and the eigenvalues tell us how much variance each component explains. PC1 (the eigenvector with the largest eigenvalue) points in the direction of maximum variance in the data. In the interactive figure, adjusting the correlation rotates the ellipse, while adjusting the variances changes its shape.

The PCA Connection

Principal Component Analysis is nothing more than eigendecomposition of the covariance matrix! Each principal component is an eigenvector, and the variance it explains is the corresponding eigenvalue.

PCA Algorithm

1. Center the data (subtract the mean from each feature)
2. Compute the covariance matrix $\Sigma$
3. Find eigenvalues and eigenvectors of $\Sigma$
4. Sort eigenvectors by eigenvalue (descending)
5. Project data onto top k eigenvectors for k-dimensional reduction

Variance Explained: The proportion of variance explained by the i-th PC is:

\frac{\lambda_i}{\sum_j \lambda_j} = \frac{\lambda_i}{\text{tr}(\Sigma)}
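
As a worked instance of this formula, the sketch below plugs in the two eigenvalues reported in the covariance example above (2.258 and 0.235) and recovers the stated percentages. The cumulative sum is included because it is a common way to choose k, which is a usage suggestion rather than part of the formula.

```python
import numpy as np

# Eigenvalues from the covariance example, sorted in descending order
eigenvalues = np.array([2.258, 0.235])

# Proportion of variance explained by each principal component
explained_ratio = eigenvalues / eigenvalues.sum()

# Running total: how much variance the first k components capture together
cumulative = np.cumsum(explained_ratio)

print("explained (%):", np.round(explained_ratio * 100, 1))   # about 90.6 and 9.4
print("cumulative (%):", np.round(cumulative * 100, 1))
```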

Computing Eigenvalues: Power Iteration

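How do we actually compute eigenvalues? For large matrices, solving the characteristic polynomial $\det(A - \lambda I) = 0$ directly is impractical: polynomials of degree five or higher have no general closed-form roots, and root-finding on the expanded polynomial is numerically unstable. The power iteration method is an elegant, iterative alternative.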

Power Iteration Algorithm

Initialize: $\mathbf{v}_0$ = random unit vector

Iterate: $\mathbf{v}_{k+1} = \frac{A\mathbf{v}_k}{\|A\mathbf{v}_k\|}$

Converges to: Eigenvector with largest |λ|

Why does this work? Express the initial vector in the eigenbasis:

\mathbf{v}_0 = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \ldots + c_n \mathbf{v}_n

After k iterations:

A^k \mathbf{v}_0 = c_1 \lambda_1^k \mathbf{v}_1 + c_2 \lambda_2^k \mathbf{v}_2 + \ldots

If $|\lambda_1| > |\lambda_2|$ and $c_1 \neq 0$, the first term dominates as k → ∞: relative to it, every other term shrinks like $(\lambda_i/\lambda_1)^k$.
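
The argument can be observed directly. In this sketch (using the same 2×2 matrix and the starting vector [1, 0] as illustrative choices), the component along the non-dominant eigenvector shrinks relative to the dominant one by a factor of $|\lambda_2/\lambda_1|$ per multiplication, as the expansion predicts.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, V = np.linalg.eigh(A)            # ascending order: lam[1] is dominant
v_dom, v_sub = V[:, 1], V[:, 0]
ratio = lam[0] / lam[1]               # |lambda_2 / lambda_1|, roughly 0.382

v0 = np.array([1.0, 0.0])
# Coordinates of v0 in the orthonormal eigenbasis (c_1 and c_2 in the text)
c_dom, c_sub = v_dom @ v0, v_sub @ v0

relative_sizes = []
x = v0.copy()
for k in range(1, 6):
    x = A @ x                         # x is now A^k v0
    # Size of the unwanted component relative to the dominant one
    rel = abs(v_sub @ x) / abs(v_dom @ x)
    relative_sizes.append(rel)
    predicted = abs(c_sub / c_dom) * ratio ** k
    print(f"k={k}: measured {rel:.5f}, predicted {predicted:.5f}")
```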

Interactive: Power Iteration

Watch the power iteration algorithm converge to the dominant eigenvector. Notice how convergence speed depends on the ratio of eigenvalues.

🔄 Power Iteration Method (interactive figure; initial state shown)

Matrix A

A = \begin{bmatrix} 2.0 & 1.0 \\ 1.0 & 3.0 \end{bmatrix}

Algorithm Progress

Iteration: 0
Current Vector: [1.0000, 0.0000]
λ Estimate: none yet
Angle Error: 58.28°
Eigenvalue Error: 3.6180

True Dominant Eigenpair

λ₁ = 3.6180
v₁ = [0.5257, 0.8507]

Legend: true v₁, current estimate, history

Algorithm: Power Iteration

1. Start with random vector $\mathbf{v}_0$
2. Multiply: $\mathbf{w}_k = A \mathbf{v}_k$
3. Normalize: $\mathbf{v}_{k+1} = \mathbf{w}_k / \|\mathbf{w}_k\|$
4. Estimate eigenvalue: $\lambda \approx \mathbf{v}_k^T A \mathbf{v}_k$
5. Repeat until convergence

Convergence rate: $O(|\lambda_2/\lambda_1|^k)$, which is faster when eigenvalues are well-separated

Practical Variants: Real implementations use more sophisticated algorithms:

Inverse iteration: Find the eigenvalue of smallest magnitude by applying power iteration to $A^{-1}$
Shifted iteration: Find eigenvalues near a target value
QR algorithm: Compute all eigenvalues simultaneously
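
The first two variants can be combined into one short routine. The sketch below is a minimal illustration, not a production solver: `shifted_inverse_iteration` is a hypothetical helper name, and it solves a linear system each step instead of forming the inverse. With shift 0 it reduces to plain inverse iteration.

```python
import numpy as np

def shifted_inverse_iteration(A, shift, num_iterations=50):
    """Hypothetical helper: eigenpair of A whose eigenvalue is closest to `shift`."""
    n = A.shape[0]
    M = A - shift * np.eye(n)
    v = np.ones(n) / np.sqrt(n)            # simple deterministic starting vector
    for _ in range(num_iterations):
        # Power iteration on (A - shift I)^{-1}, done by solving a linear system
        w = np.linalg.solve(M, v)
        v = w / np.linalg.norm(w)
    return v @ A @ v, v                    # Rayleigh quotient and unit eigenvector

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

smallest, _ = shifted_inverse_iteration(A, shift=0.0)    # plain inverse iteration
near_three, _ = shifted_inverse_iteration(A, shift=3.0)  # eigenvalue closest to 3

print(f"smallest eigenvalue:  {smallest:.4f}")
print(f"eigenvalue nearest 3: {near_three:.4f}")
```

If the shift coincides exactly with an eigenvalue, the shifted matrix is singular and the solve fails, so in practice the shift is chosen near, not on, the target.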

Real-World Applications

AI/ML Connections

Eigenvalue decomposition appears throughout deep learning, often in subtle but crucial ways.

🎲 Weight Initialization

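Xavier and He initialization choose the variance of random weight matrices so that signals and gradients neither explode nor vanish. The key insight: random weight matrices should preserve the variance of activations across layers, which is closely related to keeping the typical singular values of each layer's Jacobian near 1.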

📈 Spectral Normalization

In GANs, spectral normalization constrains the largest singular value of each weight matrix $W$, which is the square root of the largest eigenvalue of $W^T W$. This stabilizes training by controlling the Lipschitz constant.
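
A minimal sketch of the idea, assuming a plain NumPy weight matrix rather than any particular deep learning framework: estimate the largest singular value by power iteration on $W^T W$, then divide the weights by it. The helper name `spectral_norm_estimate` and the random matrix are illustrative assumptions.

```python
import numpy as np

def spectral_norm_estimate(W, num_iterations=200, seed=0):
    """Hypothetical helper: largest singular value of W via power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(num_iterations):
        # One power-iteration step on W^T W, done as two matrix-vector products
        u = W @ v
        v = W.T @ u
        v /= np.linalg.norm(v)
    return np.linalg.norm(W @ v)

rng = np.random.default_rng(1)
W = rng.standard_normal((6, 4))            # stand-in for a layer's weight matrix

sigma_est = spectral_norm_estimate(W)
sigma_ref = np.linalg.svd(W, compute_uv=False)[0]
# Singular values are the square roots of the eigenvalues of W^T W
sigma_from_eig = np.sqrt(np.linalg.eigvalsh(W.T @ W)[-1])

W_sn = W / sigma_est                       # normalized weights
print(f"estimate {sigma_est:.6f}, SVD {sigma_ref:.6f}, from eigenvalues {sigma_from_eig:.6f}")
print("largest singular value after normalization:",
      round(np.linalg.svd(W_sn, compute_uv=False)[0], 6))
```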

🕸️ Graph Neural Networks

Spectral GNNs define convolutions using eigenvectors of the graph Laplacian. ChebNet and GCN can be understood as polynomial filters in the spectral domain.
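
To make the spectral view concrete, the sketch below builds the unnormalized Laplacian of a small path graph (an illustrative choice) and applies a first-degree polynomial filter two ways: in the spectral domain through the eigenvectors, and directly as a polynomial in the Laplacian. The filter coefficients are arbitrary, and this is not the exact ChebNet or GCN formulation.

```python
import numpy as np

# Adjacency matrix of a 4-node path graph 0 - 1 - 2 - 3
Adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
D = np.diag(Adj.sum(axis=1))       # degree matrix
L = D - Adj                        # unnormalized graph Laplacian (symmetric PSD)

lam, U = np.linalg.eigh(L)         # graph "frequencies" and Fourier basis

x = np.array([1.0, 0.0, 0.0, 0.0]) # a signal on the nodes

# Polynomial filter p(lambda) = 1 - 0.25 * lambda applied in the spectral domain
g = 1.0 - 0.25 * lam
x_spectral = U @ (g * (U.T @ x))

# The same filter applied without any eigendecomposition: p(L) x
x_spatial = x - 0.25 * (L @ x)

print("Laplacian eigenvalues:", np.round(lam, 4))
print("spectral:", np.round(x_spectral, 4))
print("spatial: ", np.round(x_spatial, 4))
```

The agreement between the two results is why polynomial filters are attractive: they can be evaluated with sparse matrix-vector products alone.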

🔄 Condition Number

The condition number $\kappa = \lambda_{max}/\lambda_{min}$ of the Hessian (assumed symmetric positive definite) determines how hard an optimization problem is. High condition numbers mean slow convergence for gradient descent.
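
The effect is visible even on a two-dimensional quadratic. This sketch counts gradient descent steps on $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T H \mathbf{x}$ for a well-conditioned and an ill-conditioned diagonal $H$. The step size $1/\lambda_{max}$, the tolerance, and the helper name `gd_iterations` are assumptions made for the illustration.

```python
import numpy as np

def gd_iterations(eigs, tol=1e-6, max_iter=100_000):
    """Hypothetical helper: steps of gradient descent on 0.5 * x^T H x, H = diag(eigs)."""
    H = np.diag(eigs)
    step = 1.0 / max(eigs)             # a standard safe step size, 1 / lambda_max
    x = np.ones(len(eigs))
    for i in range(max_iter):
        if np.linalg.norm(x) < tol:    # the minimizer is x = 0
            return i
        x = x - step * (H @ x)         # the gradient of f is H x
    return max_iter

well = gd_iterations([1.0, 2.0])       # kappa = 2
ill = gd_iterations([1.0, 100.0])      # kappa = 100

print(f"kappa = 2:   {well} iterations")
print(f"kappa = 100: {ill} iterations")
```

With this step size, the error along the smallest-eigenvalue direction shrinks by a factor of $1 - 1/\kappa$ per step, so the iteration count grows roughly in proportion to $\kappa$.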

Computational Complexity: Full eigendecomposition is O(n³) — expensive for large matrices! In practice:

Use randomized algorithms for approximate solutions
Compute only the top k eigenvalues when that's all you need
Leverage sparse matrix structure when available
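
One way to combine the first two suggestions is randomized subspace iteration, a block version of power iteration. The sketch below is a simplified illustration for symmetric positive semi-definite matrices; `top_k_eigenpairs`, its parameters, and the synthetic test matrix are all assumptions, and library routines should be preferred in practice.

```python
import numpy as np

def top_k_eigenpairs(A, k, oversample=5, num_iterations=10, seed=0):
    """Hypothetical helper: approximate the k largest eigenpairs of a symmetric PSD matrix."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Start from a random subspace slightly larger than k for robustness
    Q = rng.standard_normal((n, k + oversample))
    for _ in range(num_iterations):
        # Multiply by A, then re-orthonormalize the block of vectors
        Q, _ = np.linalg.qr(A @ Q)
    # Solve the small (k + oversample)-dimensional eigenproblem exactly
    small = Q.T @ A @ Q
    lam, S = np.linalg.eigh(small)
    order = np.argsort(lam)[::-1][:k]
    return lam[order], Q @ S[:, order]

# Synthetic symmetric PSD matrix with three dominant eigenvalues (made-up data)
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((50, 50)))
spectrum = np.concatenate([[50.0, 30.0, 20.0], np.linspace(1.0, 0.1, 47)])
A = U @ np.diag(spectrum) @ U.T

approx_vals, approx_vecs = top_k_eigenpairs(A, k=3)
exact_vals = np.linalg.eigvalsh(A)[::-1][:3]

print("approximate:", np.round(approx_vals, 6))
print("exact:      ", np.round(exact_vals, 6))
```

Each iteration costs on the order of n² times the block width, instead of the n³ of a full decomposition, and accuracy depends on how quickly the spectrum decays past the k-th eigenvalue.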

Python Implementation

Let's implement eigenvalue decomposition in Python. We'll cover both the NumPy approach and a from-scratch power iteration.

Eigenvalue Decomposition with NumPy

🐍eigendecomposition.py


Import NumPy: NumPy provides the linalg module with efficient, numerically stable implementations of eigenvalue algorithms.

Function Definition: We create a wrapper that handles both symmetric and general matrices, returning eigenvalues in descending order.

Convert to Array: Ensure the input is a NumPy array with float64 precision for numerical stability.

Symmetry Check: Check if the matrix is symmetric (A = Aᵀ). Symmetric matrices have special properties: real eigenvalues and orthogonal eigenvectors.

Use eigh for Symmetric: np.linalg.eigh is optimized for symmetric matrices. It's faster and more numerically stable than the general eig function.

Example: Covariance matrices are always symmetric, so always use eigh for PCA!

Use eig for General: np.linalg.eig handles general (possibly non-symmetric) matrices. Eigenvalues may be complex.

Sort by Eigenvalue: Sort in descending order so the dominant eigenvalue (and its eigenvector) comes first. This is the convention used in PCA.

import numpy as np

# Robust eigenvalue decomposition
def eigendecompose(matrix, descending=True):
    """
    Compute eigenvalues and eigenvectors.

    For symmetric matrices, uses eigh (more stable).
    For general matrices, uses eig.

    Returns eigenvalues (sorted in descending order by default)
    and the matching eigenvectors as columns.
    """
    # Convert to a float64 NumPy array
    A = np.array(matrix, dtype=np.float64)

    # Check if symmetric (up to floating-point tolerance)
    if np.allclose(A, A.T):
        eigenvalues, eigenvectors = np.linalg.eigh(A)
    else:
        # General case: eigenvalues and eigenvectors may be complex
        eigenvalues, eigenvectors = np.linalg.eig(A)

    # Sort by eigenvalue (descending for PCA convention)
    idx = np.argsort(eigenvalues)[::-1] if descending else np.argsort(eigenvalues)
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    return eigenvalues, eigenvectors

Here's a complete example with PCA and power iteration:

🐍python

import numpy as np
from numpy.linalg import eigh, norm

# ============================================
# Example 1: Eigendecomposition of Covariance
# ============================================
np.random.seed(42)

# Generate correlated 2D data
n = 200
mu = np.array([0, 0])
# Variances 2.0 and 0.5 with correlation 0.7 (covariance 0.7 * sqrt(2.0 * 0.5) = 0.7).
# The matrix must be positive semi-definite to be a valid covariance.
cov_true = np.array([[2.0, 0.7],
                     [0.7, 0.5]])

# Sample from multivariate normal
data = np.random.multivariate_normal(mu, cov_true, n)

# Compute sample covariance (np.cov expects variables in rows)
cov_sample = np.cov(data.T)
print("Sample Covariance Matrix:")
print(cov_sample)

# Eigendecomposition (eigh returns eigenvalues in ascending order)
eigenvalues, eigenvectors = eigh(cov_sample)
# Sort descending
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

print(f"\nEigenvalues: {eigenvalues}")
print(f"Variance explained: {eigenvalues / eigenvalues.sum() * 100}%")
print(f"\nPC1: {eigenvectors[:, 0]}")
print(f"PC2: {eigenvectors[:, 1]}")

# ============================================
# Example 2: Power Iteration from Scratch
# ============================================
def power_iteration(A, num_iterations=100, tol=1e-10):
    """
    Compute dominant eigenvalue and eigenvector.

    Parameters:
        A: Square matrix
        num_iterations: Maximum iterations
        tol: Convergence tolerance

    Returns:
        eigenvalue, eigenvector
    """
    n = A.shape[0]
    v = np.random.randn(n)
    v = v / norm(v)

    # Start at infinity so the convergence test cannot pass on the first step
    eigenvalue_old = np.inf

    for i in range(num_iterations):
        # Apply matrix
        Av = A @ v

        # Rayleigh quotient for eigenvalue estimate (v has unit length)
        eigenvalue = v @ Av

        # Normalize
        v = Av / norm(Av)

        # Check convergence
        if abs(eigenvalue - eigenvalue_old) < tol:
            print(f"Converged after {i+1} iterations")
            break

        eigenvalue_old = eigenvalue

    return eigenvalue, v

# Test on our covariance matrix
eigenvalue_power, eigenvector_power = power_iteration(cov_sample)
print("\n=== Power Iteration Results ===")
print(f"Dominant eigenvalue: {eigenvalue_power:.6f}")
print(f"True eigenvalue:     {eigenvalues[0]:.6f}")
# The sign of the eigenvector is arbitrary and may differ from eigh's result
print(f"Eigenvector: {eigenvector_power}")

# ============================================
# Example 3: PCA from Scratch
# ============================================
def pca(X, n_components=None):
    """
    Principal Component Analysis.

    Parameters:
        X: Data matrix (n_samples, n_features)
        n_components: Number of components to keep

    Returns:
        projected_data, components, explained_variance_ratio
    """
    # Center the data
    X_centered = X - X.mean(axis=0)

    # Compute covariance matrix
    cov = np.cov(X_centered.T)

    # Eigendecomposition, sorted by descending eigenvalue
    eigenvalues, eigenvectors = eigh(cov)
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    # Select components
    if n_components is None:
        n_components = len(eigenvalues)

    components = eigenvectors[:, :n_components]
    explained_variance = eigenvalues[:n_components]
    explained_variance_ratio = explained_variance / eigenvalues.sum()

    # Project data
    projected = X_centered @ components

    return projected, components, explained_variance_ratio

# Apply PCA
projected, components, var_ratio = pca(data, n_components=2)
print("\n=== PCA Results ===")
print(f"Explained variance ratio: {var_ratio}")
print(f"Total variance explained: {var_ratio.sum():.2%}")


Summary

Key Takeaways

Eigenvalues and eigenvectors reveal the intrinsic structure of linear transformations — directions that only get scaled, not rotated.
The eigenvalue equation Av = λv defines eigenvectors as directions preserved by the matrix, with eigenvalues as their scaling factors.
Symmetric matrices (including covariance matrices) have real eigenvalues and orthogonal eigenvectors — the Spectral Theorem.
PCA is eigendecomposition of the covariance matrix. Principal components are eigenvectors; explained variance equals eigenvalues.
Trace = sum of eigenvalues, Determinant = product of eigenvalues. These relationships are fundamental to matrix analysis.
Power iteration computes the dominant eigenvalue iteratively, forming the basis for algorithms like PageRank and spectral methods.

Looking Ahead: In the next section, we'll explore Singular Value Decomposition (SVD), which generalizes eigendecomposition to rectangular matrices and is even more widely used in machine learning applications.
