Chapter 1

What Linear Algebra Really Is

The Geometric Universe

Learning Objectives

By the end of this section, you will:

  • Understand what linear algebra is at its core: the mathematics of vectors, transformations, and spaces
  • See the historical problems that motivated its development, from solving systems of equations to understanding quantum mechanics
  • Recognize the three fundamental pillars: vectors (structured data), linear transformations (structure-preserving maps), and vector spaces (the worlds where vectors live)
  • Grasp the golden rule of linearity and why it makes linear algebra uniquely powerful for computation
  • Survey the vast landscape of applications, from physics and engineering to computer graphics and machine learning
  • Interactively explore how matrices transform space, building geometric intuition that will guide you through the entire book

Why This Matters

Linear algebra is not just a prerequisite course to check off a list. It is the computational backbone of virtually all modern technology: Google's search algorithm, Netflix's recommendation engine, self-driving cars, medical imaging, computer graphics, and the large language models behind ChatGPT and Claude all rest on the mathematics you are about to learn.

The Big Picture: A Language for Structure

Linear algebra is the mathematics of organized information. If calculus is the mathematics of change, then linear algebra is the mathematics of structure: how data is arranged, how it can be transformed, and what remains invariant through those transformations.

At its heart, linear algebra asks a deceptively simple question: what happens when we systematically process structured collections of numbers? A list of numbers can represent a position in space, a color, a sound wave, or the weights of a neural network. A grid of numbers (a matrix) can represent a transformation, a system of equations, a graph of connections, or an image. Linear algebra gives us a unified language to reason about all of these.

The word "linear" refers to a specific and profound constraint: the operations we study must respect addition and scaling. This may sound restrictive, but it is precisely this constraint that makes linear algebra so powerful. It means that complex problems can be broken into simpler pieces, solved independently, and reassembled. This is the essence of tractability in mathematics and the reason linear algebra sits at the foundation of modern computation.

A Brief History

The ideas behind linear algebra emerged long before the formal theory. Around 200 BCE, Chinese mathematicians described methods for solving systems of linear equations in the Jiuzhang Suanshu (Nine Chapters on the Mathematical Art), using techniques remarkably similar to what we now call Gaussian elimination.

The modern theory crystallized in the 19th century. In 1844, Hermann Grassmann published his visionary Ausdehnungslehre (Theory of Extension), introducing the concept of abstract vector spaces decades before the world was ready for it. In 1858, Arthur Cayley formalized matrix algebra, giving us the notation and operations we use today. These tools converged when quantum mechanics demanded infinite-dimensional vector spaces in the early 20th century, and the field matured rapidly.

Today, linear algebra has become the most computationally important branch of mathematics. The rise of digital computers transformed it from an abstract theory into a practical engine: every GPU on the planet is essentially a machine for performing matrix multiplications at staggering speed. When you train a neural network, you are doing linear algebra billions of times per second.


The Three Pillars of Linear Algebra

Linear algebra rests on three fundamental concepts that interlock like gears in a machine. Understanding their relationships is the key to mastering the entire subject.

Vectors: Information with Structure

A vector is an object that carries structured information. In the geometric picture, a vector is an arrow with a direction and a magnitude. In the algebraic picture, it is an ordered list of numbers. Both views describe the same mathematical object, and the power of linear algebra comes from moving fluently between them.

Consider a vector in two dimensions: \mathbf{v} = (3, 2). Geometrically, this is an arrow pointing 3 units right and 2 units up from the origin. Algebraically, it is a pair of numbers that could represent anything: a position on a map, a velocity, a force, or the amounts of two ingredients in a recipe.

Two fundamental operations define what we can do with vectors:

  • Addition: Given two vectors \mathbf{u} and \mathbf{v}, their sum \mathbf{u} + \mathbf{v} is found by placing them tip-to-tail. This is the parallelogram law: the sum is the diagonal of the parallelogram formed by the two vectors.
  • Scalar multiplication: Multiplying a vector \mathbf{v} by a number c stretches it by a factor of c. When c > 1, the vector grows; when 0 < c < 1, it shrinks; when c < 0, it reverses direction.
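Both operations are one-liners in NumPy; here is a minimal sketch with arbitrary example values:

```python
import numpy as np

u = np.array([3.0, 2.0])    # an arrow 3 units right, 2 units up
v = np.array([-1.0, 4.0])

# Addition: tip-to-tail, i.e. the parallelogram law
s = u + v
print(s)          # [2. 6.]

# Scalar multiplication: stretch, shrink, or reverse
print(2.0 * u)    # grows:    [6. 4.]
print(0.5 * u)    # shrinks:  [1.5 1. ]
print(-1.0 * u)   # reverses: [-3. -2.]
```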

Interactive: Vector Operations


The green arrow is the sum v₁ + v₂, reached by placing vectors tip-to-tail. The dashed lines show the parallelogram law.

Transformations: Machines That Reshape Space

A linear transformation is a function that takes every vector to a new vector while preserving the structure of addition and scaling. Think of it as a machine: you feed in a vector, and it outputs a transformed vector. The entire space warps, but in a very disciplined way.

In two dimensions, a linear transformation can rotate the plane, stretch it, shear it, reflect it, or project it down to a line. The remarkable fact is that every such transformation can be encoded as a 2 \times 2 matrix. If the transformation sends the basis vector \mathbf{e}_1 = (1, 0) to (a, c) and \mathbf{e}_2 = (0, 1) to (b, d), then the matrix is:

\displaystyle A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}

This is a deep insight: a matrix is just a compact way to record where the basis vectors go. Once you know where \mathbf{e}_1 and \mathbf{e}_2 land, you know where every vector in the plane will land, because every vector is a linear combination of the basis vectors.

To transform a specific vector \mathbf{v} = (x, y), we compute the matrix-vector product:

\displaystyle A\mathbf{v} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = x \begin{pmatrix} a \\ c \end{pmatrix} + y \begin{pmatrix} b \\ d \end{pmatrix}

Read that formula carefully: the output is x times the first column plus y times the second column. The columns of a matrix are the transformed basis vectors. This single idea unlocks the entire subject.
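The column-combination reading of the product can be checked directly; the matrix entries here are arbitrary illustrations:

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  0.5]])   # columns = where e1 and e2 land
x, y = 2.0, 3.0
v = np.array([x, y])

# The built-in matrix-vector product ...
direct = A @ v

# ... equals x times the first column plus y times the second
by_columns = x * A[:, 0] + y * A[:, 1]

print(direct)       # [-4.   7.5]
print(by_columns)   # [-4.   7.5]
```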

Spaces: The Universes Vectors Inhabit

A vector space is a complete universe of vectors, closed under addition and scalar multiplication. The familiar 2D plane \mathbb{R}^2 is a vector space. So is 3D space \mathbb{R}^3. But vector spaces can have any number of dimensions, including hundreds, thousands, or even infinitely many.

The dimension of a vector space tells you how many independent directions exist within it. A line is 1-dimensional, a plane is 2-dimensional, and the space of all RGB colors is 3-dimensional. In machine learning, a word embedding space might have 768 dimensions, where each dimension captures some subtle aspect of meaning.

Subspaces are "slices" of a larger space that are themselves vector spaces. A line through the origin in \mathbb{R}^2 is a subspace. The column space and null space of a matrix are subspaces that tell us fundamental things about what the corresponding transformation can and cannot do. Understanding subspaces is the key to understanding the structure of linear systems.
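As a small numerical illustration (the matrix is chosen for the example), a rank-1 matrix collapses the plane onto a line, its column space, while an entire line of inputs is sent to zero, its null space:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])      # second column = 2 * first column

# Column space: every output is a multiple of (1, 2)
print(np.linalg.matrix_rank(A))  # 1 -- outputs fill a line, not the plane

# Null space: the line spanned by (-2, 1) is sent to the zero vector
n = np.array([-2.0, 1.0])
print(A @ n)                     # [0. 0.]
print(A @ (5.0 * n))             # still [0. 0.] -- closed under scaling
```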


The Golden Rule of Linearity

The entire edifice of linear algebra rests on one principle: linearity. A function f is linear if and only if it satisfies the superposition principle:

\displaystyle f(\alpha\mathbf{u} + \beta\mathbf{v}) = \alpha\, f(\mathbf{u}) + \beta\, f(\mathbf{v})

for all vectors \mathbf{u}, \mathbf{v} and all scalars \alpha, \beta. In plain language: you can decompose, process the pieces, and recombine. The result is the same as if you had processed the whole thing at once.

This principle is the reason linear algebra is so computationally tractable:

  • Decomposability: Any complex input can be broken into simple basis components.
  • Predictability: Knowing what f does to the basis vectors tells you what it does to every vector.
  • Composability: Applying one transformation after another corresponds to matrix multiplication.
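The composability point can be verified numerically: applying B and then A to a vector matches multiplying by the single matrix AB. The matrices below are arbitrary examples:

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])   # rotate 90 degrees counterclockwise
B = np.array([[2.0,  0.0],
              [0.0,  1.0]])   # stretch x by a factor of 2
v = np.array([1.0, 1.0])

step_by_step = A @ (B @ v)    # transform with B, then with A
composed     = (A @ B) @ v    # one matrix for the whole pipeline

print(step_by_step)   # [-1.  2.]
print(composed)       # [-1.  2.]
```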

The Power of Linearity

Nonlinear problems are often intractable. The standard approach in science and engineering is to approximate them with linear ones (think Taylor series, linearization, tangent planes). Linear algebra provides the toolkit for solving these approximations. This is why it appears everywhere: it is the universal first-order approximation tool.
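A one-line illustration of the "universal first-order approximation" idea: near x = 0, the nonlinear function sin(x) is well approximated by the linear function x, its tangent line at the origin:

```python
import numpy as np

x = 0.1
print(np.sin(x))            # 0.09983...
print(x)                    # tangent-line approximation: 0.1
print(abs(np.sin(x) - x))   # the error is tiny, of order x**3 / 6
```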

Where Linear Algebra Lives

Linear algebra is not confined to mathematics classrooms. It is the operational language of dozens of fields. Here is a sampling of how the concepts you will learn in this book appear in the real world:

| Field | Application | Key LA Concept |
| --- | --- | --- |
| Physics | Quantum states and measurements | Vector spaces, eigenvalues, Hermitian operators |
| Engineering | Circuit analysis (Kirchhoff's laws) | Systems of linear equations |
| Computer Graphics | 3D rotations, camera projections | Matrix transformations, homogeneous coordinates |
| Economics | Leontief input–output models | Matrix inverses, systems of equations |
| Statistics | Principal Component Analysis (PCA) | Eigenvalues, covariance matrices |
| Biology | Population dynamics (Leslie matrices) | Matrix powers, eigenvalue decomposition |
| Signal Processing | Fourier transforms, filtering | Orthogonal bases, inner products |
| Robotics | Kinematics and control systems | Matrix exponentials, state-space models |
| Machine Learning | Neural networks, embeddings, attention | Matrix multiplication, SVD, optimization |
| Structural Eng. | Finite element analysis | Sparse matrices, linear systems |

Linear Algebra Powers Modern AI

Modern artificial intelligence is, at its core, a sequence of linear algebra operations punctuated by simple nonlinearities. Understanding this connection deeply will transform how you see both the mathematics and the technology.

Neural Networks Are Matrix Machines

Every layer of a neural network performs the same fundamental operation: multiply a vector of inputs \mathbf{x} by a weight matrix W, add a bias vector \mathbf{b}, and apply a nonlinear activation function \sigma:

\displaystyle \mathbf{h} = \sigma(W\mathbf{x} + \mathbf{b})

The weight matrix W is a linear transformation that projects the input into a new representation space. The activation function \sigma adds the nonlinearity needed to learn complex patterns. Training a neural network means finding the right matrices W through gradient descent, which itself requires computing derivatives of matrix expressions (the Jacobian).
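A single layer of this kind is only a few lines of NumPy. The sizes and weight values below are arbitrary illustrations, with ReLU standing in as a typical activation:

```python
import numpy as np

def relu(z):
    # A common choice of nonlinearity: zero out negative components
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # maps 3-dim inputs to 4-dim representations
b = np.zeros(4)                   # bias vector
x = np.array([1.0, -2.0, 0.5])    # an example input

h = relu(W @ x + b)               # one layer: linear map, shift, nonlinearity
print(h.shape)                    # (4,)
```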

Attention Is Linear Algebra

The attention mechanism that powers modern large language models (GPT, Claude, Gemini) is pure linear algebra. Each token in a sequence is first mapped to three vectors: a query \mathbf{q}, a key \mathbf{k}, and a value \mathbf{v}, via matrix multiplications. Then attention scores are computed as dot products between queries and keys, and the output is a weighted sum of values:

\displaystyle \text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V

Every operation in this formula, from the matrix product QK^T and the final multiplication by V to the division by \sqrt{d_k}, is an operation you will learn to understand deeply in this book.
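The formula translates almost directly into code. This is a minimal single-head sketch with toy shapes and random inputs (no learned projection matrices), not a production implementation:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = (Q @ K.T) / np.sqrt(d_k)    # query-key dot products, scaled
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted sum of value vectors

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 8))   # 5 tokens, d_k = 8 (toy sizes)
K = rng.standard_normal((5, 8))
V = rng.standard_normal((5, 8))

out = attention(Q, K, V)
print(out.shape)                  # (5, 8): one output vector per token
```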

Embeddings Are Vectors

When a language model processes text, each word (or token) is represented as a vector in a high-dimensional space, typically with 768 to 12,288 dimensions. Similar words end up as nearby vectors. The famous example: the vector arithmetic \text{king} - \text{man} + \text{woman} \approx \text{queen} works because the embedding space has learned a linear structure for analogies. This is vector addition and subtraction in a high-dimensional vector space.
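The analogy arithmetic can be sketched with invented toy vectors. These 4-dimensional "embeddings" are made up purely for illustration and carry no real semantic content; real models learn hundreds of dimensions from data:

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means same direction, 0.0 means perpendicular
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical toy embeddings, constructed so the analogy holds exactly
king  = np.array([0.9, 0.8, 0.1, 0.2])
man   = np.array([0.1, 0.8, 0.1, 0.1])
woman = np.array([0.1, 0.1, 0.8, 0.1])
queen = np.array([0.9, 0.1, 0.8, 0.2])

analogy = king - man + woman
print(cosine_similarity(analogy, queen))   # 1.0 by construction
```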

Data Compression Is Eigenvalue Decomposition

Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) are eigenvalue-based techniques that find the most important directions in data. They are used to compress images, reduce the dimensionality of datasets, denoise signals, and build recommendation systems. Netflix's recommendation algorithm, Spotify's music suggestions, and image compression standards all rely on these tools.
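A minimal sketch of SVD-based compression: keep only the strongest singular values of a matrix and measure how much of it survives. A random matrix stands in here for an image or a data table:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 40))          # stand-in for an image / dataset

U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 10                                     # keep the 10 strongest directions
M_k = (U[:, :k] * s[:k]) @ Vt[:k]          # best rank-k approximation

# Storage drops from 50*40 = 2000 numbers to k*(50 + 40 + 1) = 910
err = np.linalg.norm(M - M_k) / np.linalg.norm(M)
print(f"relative error at rank {k}: {err:.3f}")
```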

Why Linear Algebra Is THE Language of AI

Linear algebra is not merely "used" in AI. It is the natural language of these systems because data is naturally organized as vectors and matrices, transformations between representations are naturally linear maps, and the optimization algorithms that train models operate on matrix derivatives. Learning linear algebra is learning the language that modern AI thinks in.

Your First Transformation

Let's make this concrete. The interactive visualization below lets you directly manipulate a 2 \times 2 matrix and see how it transforms the entire plane. Every point in the grid is moved according to the rule:

\displaystyle \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}

The red arrow shows where the basis vector \mathbf{e}_1 = (1, 0) lands after the transformation, and the blue arrow shows where \mathbf{e}_2 = (0, 1) lands. Together, these two arrows completely determine the transformation.

Interactive: 2D Linear Transformation


The red arrow shows where e₁ = (1, 0) lands, and the blue arrow shows where e₂ = (0, 1) lands. Together, they completely determine the transformation. The grid shows how the entire plane deforms.

Try the following experiments:

  1. Rotation: Click the "Rotation 45°" preset. Notice how the grid stays perfectly rigid — every angle and length is preserved. The determinant stays at 1.
  2. Shear: Click "Shear." The grid tilts but does not stretch. Horizontal lines stay horizontal, but vertical lines lean over. Areas are preserved (determinant = 1).
  3. Projection: Click "Projection." The entire 2D plane collapses onto the x-axis. The determinant drops to 0, reflecting the loss of a dimension.
  4. Reflection: Click "Reflection." The grid flips. Notice the determinant becomes negative, signaling that the orientation of space has been reversed.
  5. Free exploration: Drag the sliders freely and watch how the grid responds. Can you make the grid collapse to a single point? What matrix does that correspond to?
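The presets presumably correspond to matrices like the following (the widget's exact values may differ); their determinants match the behavior described in the experiments above:

```python
import numpy as np

theta = np.pi / 4   # 45 degrees
rotation   = np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta),  np.cos(theta)]])
shear      = np.array([[1.0, 1.0], [0.0, 1.0]])
projection = np.array([[1.0, 0.0], [0.0, 0.0]])   # onto the x-axis
reflection = np.array([[1.0, 0.0], [0.0, -1.0]])  # across the x-axis
collapse   = np.zeros((2, 2))                     # everything to the origin

for name, A in [("rotation", rotation), ("shear", shear),
                ("projection", projection), ("reflection", reflection),
                ("collapse", collapse)]:
    print(f"{name:10s} det = {np.linalg.det(A):+.3f}")
```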

The Determinant as Area

The determinant of the matrix tells you how areas change under the transformation. A determinant of 2 means areas double. A determinant of 0.5 means areas halve. A determinant of 0 means the transformation collapses space to a lower dimension. A negative determinant means the transformation flips the orientation of space, like looking in a mirror.
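To see the area rule concretely, transform the unit square and compare its area (computed with the shoelace formula) against |det A|. The matrix is an arbitrary example:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.5]])

# Corners of the unit square, as columns, in counterclockwise order
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])
image = A @ square                     # the transformed parallelogram

def shoelace_area(pts):
    # Signed-area formula for a polygon given as a 2 x n array of corners
    x, y = pts
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

print(shoelace_area(square))           # 1.0
print(shoelace_area(image))            # 3.0
print(abs(np.linalg.det(A)))           # 3.0 -- areas scale by |det A|
```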

The Computational View

Linear algebra is not just theoretical — it is deeply computational. Here is a simple Python program using NumPy that demonstrates the core operations and verifies the principle of linearity:

The Core of Linear Algebra in Code
linearity_demo.py

Creating a Vector

A vector is created as a NumPy array. This 3D vector could represent a position, velocity, force, or any structured quantity with three components.

Defining a Transformation

This 3×3 matrix encodes a 90° rotation around the z-axis. The first column [0, 1, 0] tells us where the x-basis vector (1,0,0) lands. The second column [-1, 0, 0] tells us where the y-basis vector (0,1,0) lands. The third column [0, 0, 1] means the z-axis is unchanged.

Matrix-Vector Product

The @ operator performs matrix multiplication. This single operation applies the transformation to our vector, rotating it 90° around the z-axis. The point (3, 1, 4) becomes (-1, 3, 4).

Testing Additivity

We verify that transforming vectors separately and adding the results gives the same answer as adding first and then transforming. This is the additivity property: f(u + w) = f(u) + f(w).

Testing Homogeneity

We verify that scaling before or after the transformation gives the same result. This is the homogeneity property: f(cv) = c·f(v). Together with additivity, these two properties define what "linear" means.
import numpy as np

# A vector: a point in 3D space
v = np.array([3.0, 1.0, 4.0])

# A matrix: a 90-degree rotation around the z-axis
A = np.array([
    [0.0, -1.0,  0.0],
    [1.0,  0.0,  0.0],
    [0.0,  0.0,  1.0]
])

# Apply the transformation: matrix times vector
v_new = A @ v
print(f"Original:    {v}")
print(f"Transformed: {v_new}")

# Verify linearity: f(u + w) = f(u) + f(w)
u = np.array([1.0, 0.0, 2.0])
w = np.array([0.0, 1.0, -1.0])

transform_then_add = A @ u + A @ w
add_then_transform = A @ (u + w)

print(f"A@u + A@w = {transform_then_add}")
print(f"A@(u + w) = {add_then_transform}")
print(f"Equal? {np.allclose(transform_then_add, add_then_transform)}")

# Verify scalar property: f(c*v) = c*f(v)
c = 3.5
print(f"A@(c*u) = {A @ (c * u)}")
print(f"c*(A@u) = {c * (A @ u)}")
print(f"Equal? {np.allclose(A @ (c * u), c * (A @ u))}")

Every line in this program corresponds to a concept you will master in this book: vectors as arrays, matrices as transformations, the @ operator as matrix multiplication, and the linearity check as the defining property that ties it all together.


Summary and Road Ahead

Let's recap what we have established in this opening section:

  1. Linear algebra is the mathematics of structure: vectors carry structured information, matrices encode transformations, and vector spaces provide the stage on which the drama plays out.
  2. Three pillars: vectors (data with structure), linear transformations (structure-preserving maps), and vector spaces (closed universes of vectors).
  3. The golden rule of linearity: superposition means you can decompose, transform the pieces, and reassemble. This is what makes linear algebra computationally tractable and universally applicable.
  4. Applications are everywhere: from solving circuit equations to training neural networks, from compressing images to simulating quantum systems.
  5. Modern AI is linear algebra: neural networks are sequences of matrix multiplications, attention is a dot-product computation, and embeddings are vectors in high-dimensional spaces.

In the next section, we will dive deeper into the first pillar: vectors. We will explore the duality between the geometric arrow picture and the algebraic list-of-numbers picture, and discover why both perspectives are essential. The journey from here will take us through matrices, systems of equations, determinants, eigenvalues, SVD, and ultimately to the frontiers of applied linear algebra in modern computing.

The road ahead: This book is designed to build your understanding layer by layer, always connecting formal mathematics to geometric intuition, computational practice, and real-world applications. Every concept will be visualized, every formula will be explained in plain language, and every abstraction will be grounded in concrete examples. Welcome to linear algebra.