Chapter 2
15 min read
Section 6 of 134

Vectors in 2D and 3D

Vectors - The Building Blocks

By the end of this section, you will own a mental picture so concrete that whenever you read "vector," you see an arrow with a tail at the origin, a tip at(v1,v2)\,(v_1, v_2)\, or(v1,v2,v3)\,(v_1, v_2, v_3)\,, a specific length, and a specific direction. Everything else in linear algebra is built on top of that one picture.


The Story: Why Vectors Exist

In the 1840s, physics was stuck. Newton's laws describe forces and velocities — quantities that have both size and direction — but there was no clean algebra for them. If you wanted to add two forces, you wrote out three equations for the three coordinate directions and did the bookkeeping by hand. Every rotation of the axes rewrote every formula. Every composition of motions was a thicket.

In Dublin in 1843, William Rowan Hamilton was hunting for an algebra of rotations in three dimensions. After years of failure, he discovered the quaternions while walking along the Royal Canal and famously carved the defining relation into Brougham Bridge. On the continent, Hermann Grassmann published his Ausdehnungslehre in 1844 — a sweeping algebra of oriented quantities in any number of dimensions, so far ahead of its time that almost no one read it for thirty years. What we now call vector analysis — the streamlined arrow-with-components notation we still use today — was distilled out of those two threads by J. Willard Gibbs at Yale and Oliver Heaviside in England in the 1880s, independently, because they needed it to write electromagnetism without going insane.

What they gave us is the object this section is about: a thing with magnitude and direction, represented by an ordered list of numbers, obeying a simple algebra. Once you have it, force diagrams, velocity fields, image pixels, word meanings, neural network activations, and quantum states all become the same kind of creature.

Before vectors, every physical law was a committee of scalar equations. After vectors, the law is one line — and the committee is what a computer does for you.

The Mental Picture First: 2D

A vector in the plane is an arrow. Its tail sits at the origin, its tip sits at some point, and the two numbers that locate the tip are its components. Drag the tip below. Watch the components, the length, and the angle update live. Feel how the three readouts are not three separate pieces of information — they are three views of the same arrow.

Loading 2D vector

A few things to notice while you play. Dragging along the x-axis changes onlyv1\,v_1\,. Dragging vertically changes onlyv2\,v_2\,. Dragging diagonally changes both — and the lengthv\,|\vec v|\, is never the sum of the components; it is the hypotenuse. The angle θ is measured from the positive x-axis and can be negative when the arrow points below. The arrow itself does not care about coordinates — the coordinates are labels we stick on it after choosing axes. The arrow is primary. The numbers are a description.

Vectors are not points. A point is a location; a vector has a direction. If you took the drawing above and slid the entire arrow — tail and tip together — five units to the right, it would still be the same vector. We happen to draw every vector starting from the origin because it makes the components easy to read, but that is a convenience, not a rule.

Stepping Into 3D

Add a third axis and the story barely changes. An arrow in space needs three numbers to describe its tip:u=(u1,u2,u3)\,\vec u = (u_1, u_2, u_3)\,. Move the sliders below and rotate the scene. Notice the three dashed drops — first along x, then up in y, then out in z — that build the arrow as an L-shaped path. This is the 3D Pythagorean theorem made visible: the length of the diagonal is built from the three perpendicular legs.

Loading 3D vector

The default vector u=(1,2,2)\,\vec u = (1, 2, 2)\, is chosen so the legs — of length 1, 2, and 2 — produce a clean diagonal. Compute it with me: one squared is 1, two squared is 4, two squared is 4, sum is 9, square root is 3. The arrow is exactly three units long. Change the sliders and watch the number change with you. This is the whole content of the formulau=u12+u22+u32\,|\vec u| = \sqrt{u_1^2 + u_2^2 + u_3^2}\,.


The Formal Definition, Earned

Now that you have the picture, the notation has something to stick to. A 2D vector is an ordered pair of real numbers v=(v1,v2)R2\,\vec v = (v_1, v_2) \in \mathbb{R}^2\,. A 3D vector is an ordered triple v=(v1,v2,v3)R3\,\vec v = (v_1, v_2, v_3) \in \mathbb{R}^3\,. The symbol Rn\,\mathbb{R}^n\, names the space of all such lists of length n\,n\,. Three sets of notation are all equivalent and you will see them everywhere:

Row formColumn formPair/triple form
v = [ v₁ v₂ ]v = [ v₁ ; v₂ ](v₁, v₂)

In this book we use the pair/triple form in prose and the column form in equations where we need to multiply by matrices. The choice is cosmetic; nothing about the underlying arrow changes when you rotate the numbers ninety degrees on the page.

Magnitude: How Long Is an Arrow?

The magnitude (also called length, norm, or Euclidean norm) of vRn\,\vec v \in \mathbb{R}^n\, is

    v  =  v12+v22++vn2.    \;\; |\vec v| \;=\; \sqrt{\, v_1^2 + v_2^2 + \cdots + v_n^2 \,}. \;\;

Every symbol earns its place.vi\,v_i\, is the i\,i\,-th component — the signed length of the projection of the arrow onto the i\,i\,-th axis. Squaring it throws away the sign (a leg of length −3 and a leg of length 3 contribute the same amount to the hypotenuse). Summing the squares is the multi-dimensional Pythagorean theorem: perpendicular legs compose into a diagonal whose squared length is the sum of their squared lengths. The square root returns from area back to length. What the equation tells us is that length in higher dimensions is not a new mystery — it is Pythagoras applied one axis at a time.

Unit Vectors: Direction Alone

Any non-zero vector v\,\vec v\, can be divided by its own magnitude to produce a vector of length exactly 1 that points the same way:

    v^  =  vv.    \;\; \hat v \;=\; \frac{\vec v}{|\vec v|}. \;\;

The hat is the standard notation for "unit version of." Unit vectors strip off the magnitude and leave pure direction — a useful move whenever you want to compare directions without size getting in the way (we will do exactly this when we meet the dot product and cosine similarity in later sections).


Worked Example: By Hand

Take v=(3,4)\,\vec v = (3, 4)\,. This is the famous 3-4-5 right triangle in disguise. Compute the magnitude step by step, writing every arithmetic move:

Square the first component: 32=9\,3^2 = 9\,. Square the second: 42=16\,4^2 = 16\,. Sum them: 9+16=25\,9 + 16 = 25\,. Take the square root: 25=5\,\sqrt{25} = 5\,. So v=5\,|\vec v| = 5\,, exactly. The angle from the positive x-axis is θ=arctan(4/3)53.13\,\theta = \arctan(4/3) \approx 53.13^\circ\,. The unit vector is v^=(3/5,  4/5)=(0.6,  0.8)\,\hat v = (3/5,\; 4/5) = (0.6,\; 0.8)\,. Drag the arrow in the 2D widget above to(3,4)\,(3, 4)\, and read off 5, 53.1° — the numbers in the side panel should match digit for digit.

Now u=(1,2,2)\,\vec u = (1, 2, 2)\, in 3D. Square each: 1,4,4\,1, 4, 4\,. Sum: 9\,9\,. Square root: 3. So u=3\,|\vec u| = 3\,. The unit vector is u^=(1/3,  2/3,  2/3)\,\hat u = (1/3,\; 2/3,\; 2/3)\,, which rounds to (0.333,  0.667,  0.667)\,(0.333,\; 0.667,\; 0.667)\,. Sliders in the 3D widget above are already at these values; the readout on the right should say 3.000.

The three-way promise. The hand calculation above, the on-screen widget, the Python-from-scratch code in the next subsection, and the PyTorch code after that all use the same two vectors v=(3,4)\,\vec v = (3, 4)\, and u=(1,2,2)\,\vec u = (1, 2, 2)\,. If any of them ever disagrees on the output, one of them has a bug. This is how you learn to trust numerical computing — by watching it match arithmetic you can do on paper.

Python From Scratch

Before we pick up any library, let us build magnitude, components, and unit vectors out of bare loops. The point is not that this is how you would ship code — it is that the formula v=vi2\,|\vec v| = \sqrt{\sum v_i^2}\, is literally three operations: multiply, add, and take the square root. Once you have felt those three operations pass through your fingers, PyTorch's one-liner stops being magic.

🐍vectors_from_scratch.py
1Define magnitude(v)

Function that accepts a list of numbers representing a vector in any dimension and returns its Euclidean length. No imports, no NumPy, just Python.

EXAMPLE
⬇ input: v = [3.0, 4.0]  (any length accepted)
2total = 0.0 (running sum of squares)

Accumulator that will hold v₁² + v₂² + … + vₙ². We start it at 0.0 (float, not int) so division and sqrt behave correctly later.

EXAMPLE
before loop: total = 0.0
3for x in v — walk the components

Iterate through each component of the vector exactly once. For v = [3.0, 4.0], this loop runs twice: iteration 1 has x = 3.0, iteration 2 has x = 4.0.

EXAMPLE
iter 1 → x = 3.0
iter 2 → x = 4.0
4total += x * x — accumulate a squared leg

Square the current component and add it to the running total. The square throws away the sign (−3 and +3 contribute the same), which is exactly what Pythagoras asks for.

EXAMPLE
iter 1: total = 0.0 + 3.0*3.0 = 9.0
iter 2: total = 9.0 + 4.0*4.0 = 25.0
5return total ** 0.5 — square root

total now holds the sum of squared components (25.0 for v). Raising to the 0.5 power is the square root. We return a single non-negative float: the length of the arrow.

EXAMPLE
⬆ return: 25.0 ** 0.5 = 5.0
8Define unit_vector(v)

Function that rescales a vector so its length becomes exactly 1 while preserving its direction. Requires v ≠ 0 — we will discuss that pitfall later.

EXAMPLE
⬇ input: v = [3.0, 4.0]
9m = magnitude(v) — one division factor

Compute the length once and store it. If we recomputed magnitude inside the comprehension on each iteration we would do O(n²) work for no reason.

EXAMPLE
m = magnitude([3.0, 4.0]) = 5.0
10return [x / m for x in v]

📚 List comprehension. Builds a new list by dividing each component by m. Each new entry is the old component scaled so the resulting arrow has length exactly 1 — direction preserved, size normalized.

EXAMPLE
[3.0/5.0, 4.0/5.0] = [0.6, 0.8]
new length = √(0.36 + 0.64) = 1.0
13v = [3.0, 4.0] — our 2D example

The 3-4-5 right-triangle vector used in the hand calculation. Components v₁ = 3.0, v₂ = 4.0.

EXAMPLE
v[0] = 3.0, v[1] = 4.0
14u = [1.0, 2.0, 2.0] — our 3D example

A vector in ℝ³ chosen so 1² + 2² + 2² = 9 (a perfect square). Components u₁ = 1.0, u₂ = 2.0, u₃ = 2.0.

EXAMPLE
u[0] = 1.0, u[1] = 2.0, u[2] = 2.0
16print("v =", v)

Echo the input vector so the reader can align the printed output with this line. Python prints lists inline, space-comma separated.

EXAMPLE
stdout: v        = [3.0, 4.0]
17magnitude(v) call

Apply our hand-rolled magnitude. Internally loops over [3.0, 4.0], accumulates 9.0 + 16.0 = 25.0, returns √25 = 5.0.

EXAMPLE
stdout: |v|      = 5.0
18unit_vector(v) call

Applies magnitude first (5.0) then divides component-wise. Result is the direction of v stripped of its size.

EXAMPLE
stdout: v_hat    = [0.6, 0.8]
20print("u =", u)

Echoes the 3D input vector u = [1.0, 2.0, 2.0].

EXAMPLE
stdout: u        = [1.0, 2.0, 2.0]
21magnitude(u) in 3D

Same function — the loop just runs three times instead of two. 1² + 2² + 2² = 9, √9 = 3. Notice how no code changed to handle 3D: the function is dimension-agnostic.

EXAMPLE
stdout: |u|      = 3.0
22unit_vector(u) in 3D

Each component divided by 3.0. The printed triple is the direction of u as a unit arrow.

EXAMPLE
stdout: u_hat    = [0.3333333333, 0.6666666667, 0.6666666667]
6 lines without explanation
1def magnitude(v):
2    total = 0.0
3    for x in v:
4        total += x * x
5    return total ** 0.5
6
7
8def unit_vector(v):
9    m = magnitude(v)
10    return [x / m for x in v]
11
12
13v = [3.0, 4.0]
14u = [1.0, 2.0, 2.0]
15
16print("v        =", v)
17print("|v|      =", magnitude(v))
18print("v_hat    =", unit_vector(v))
19
20print("u        =", u)
21print("|u|      =", magnitude(u))
22print("u_hat    =", unit_vector(u))

The PyTorch Way

PyTorch stores a vector as a one-dimensional tensor. The same three operations — multiply, sum, square root — happen, but now they run as a single vectorized kernel on CPU or GPU, and the whole expression participates in autograd if we ask it to. Same two vectors, same two numbers out the other end.

🐍vectors_pytorch.py
1import torch

📚 Bring in PyTorch. Gives us tensors, GPU execution, and automatic differentiation. For this section we only use the tensor and norm functionality — autograd shows up in later chapters.

EXAMPLE
after this line: torch.__version__ is accessible
3v = torch.tensor([3.0, 4.0])

📚 torch.tensor(data) builds a tensor from a Python list. Argument data is any nested sequence of numbers; PyTorch infers the shape and a default dtype of torch.float32 from the 3.0/4.0 literals.

EXAMPLE
⬆ return: tensor([3., 4.])  (shape=(2,), dtype=torch.float32)
4u = torch.tensor([1.0, 2.0, 2.0])

Same construction, three components. Shape becomes (3,). Whether a tensor lives in 2D or 3D is just its length along axis 0; nothing about the API changes.

EXAMPLE
⬆ return: tensor([1., 2., 2.])  (shape=(3,))
6mag_v = torch.linalg.norm(v)

📚 torch.linalg.norm(input, ord=None, dim=None) computes a norm. With a 1-D input and defaults it computes the Euclidean (L2) norm: √(Σ vᵢ²). Returned as a 0-D tensor (a scalar tensor) — not a Python float yet.

EXAMPLE
⬆ return: tensor(5.)  (sqrt(3² + 4²) = sqrt(25) = 5)
7mag_u = torch.linalg.norm(u)

Same kernel, three-element input. Internally computes 1² + 2² + 2² = 9, then √9.

EXAMPLE
⬆ return: tensor(3.)
9v_hat = v / mag_v

📚 Tensor division broadcasts a scalar tensor over a vector tensor: every entry of v is divided by the single value inside mag_v. The result is a new tensor of the same shape as v.

EXAMPLE
⬆ return: tensor([0.6000, 0.8000])
10u_hat = u / mag_u

Same pattern in 3D. Each of u's three components is divided by mag_u (= 3.0).

EXAMPLE
⬆ return: tensor([0.3333, 0.6667, 0.6667])
12print("v =", v)

Prints the tensor repr. PyTorch shows trailing zeros as a single dot (3. means 3.0) and uses the tensor(...) wrapper so you can tell it from a plain list.

EXAMPLE
stdout: v       = tensor([3., 4.])
13mag_v.item()

📚 .item() extracts a Python float from a 0-D tensor. We call it here only so the printed value reads 5.0 instead of tensor(5.). Do not call .item() on multi-element tensors — it raises an error.

EXAMPLE
⬆ return: 5.0  (stdout: |v|     = 5.0)
14print("v_hat =", v_hat)

Printing the unit tensor directly. No .item() because v_hat has two entries.

EXAMPLE
stdout: v_hat   = tensor([0.6000, 0.8000])
16print("u =", u)

Echoes the 3D input tensor.

EXAMPLE
stdout: u       = tensor([1., 2., 2.])
17mag_u.item()

Extract the scalar length of u as a Python float. Matches the hand calculation exactly.

EXAMPLE
stdout: |u|     = 3.0
18print("u_hat =", u_hat)

Print the 3D unit tensor. 1/3 is not exactly representable in float32, so PyTorch prints rounded values — which is why we see 0.3333 and 0.6667, not the infinite decimal expansion.

EXAMPLE
stdout: u_hat   = tensor([0.3333, 0.6667, 0.6667])
5 lines without explanation
1import torch
2
3v = torch.tensor([3.0, 4.0])
4u = torch.tensor([1.0, 2.0, 2.0])
5
6mag_v = torch.linalg.norm(v)
7mag_u = torch.linalg.norm(u)
8
9v_hat = v / mag_v
10u_hat = u / mag_u
11
12print("v       =", v)
13print("|v|     =", mag_v.item())
14print("v_hat   =", v_hat)
15
16print("u       =", u)
17print("|u|     =", mag_u.item())
18print("u_hat   =", u_hat)

What did PyTorch hide? The loop in magnitude became a fused C++/CUDA kernel. The division became a broadcast over whatever device the tensor lives on — your CPU, your GPU, your TPU. At this scale the speed difference is invisible, but for an embedding tensor of shape (50,000,768)\,(50{,}000, 768)\, the kernel does in milliseconds what a Python loop would take a minute on. That is the point of the library: the meaning of magnitude is still three operations; only the delivery mechanism has changed.


Where Engineers Actually Use This: Word Embeddings

Here is a place where the arrow picture pays rent immediately. Modern natural language processing represents every word — and in fact every sub-word token — as a vector in Rn\,\mathbb{R}^n\, for some n\,n\, between 100 and a few thousand. The breakthrough was Mikolov et al.'s word2vec in 2013, which showed that the vectors learned by predicting neighboring words in a corpus place semantically related words near each other. "Cat" and "dog" end up as short arrows apart. "Cat" and "laptop" end up far.

Measuring "near" is just the magnitude we already know, applied to the difference of two vectors:

    d(a,b)  =  ab  =  i=1n(aibi)2.    \;\; d(\vec a, \vec b) \;=\; |\vec a - \vec b| \;=\; \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}. \;\;

Nothing new — Pythagoras one more time, on a vector whose components are the axis-wise gaps between a\,\vec a\, and b\,\vec b\,. Click any two words below to compute the distance between their toy 3D embeddings. Watch the numbers tell the semantic story.

Loading embedding space

Click cat then dog: the readout shows distance ≈ 0.49. Click cat then laptop: ≈ 3.97 — almost eight times farther. Click apple then laptop: ≈ 4.46 — farther still. Without the program "knowing" anything about biology or consumer electronics, purely because training nudged related words into adjacent positions in space, the geometric distance we are computing becomes a measurement of meaning.

Real systems push the dimension way up (word2vec used 300, modern transformer embeddings use 768–12288) and often prefer cosine similarity over raw distance — but both are thin wrappers around the same magnitude formula. Semantic search, query expansion, retrieval-augmented generation, duplicate detection, and recommendation all boil down to: embed the thing as a vector, then ask how close it is to other vectors. Every one of those pipelines calls torch.linalg.norm or its Python-from-scratch twin millions of times a second.

The vectors in the widget above are a toy in R3\,\mathbb{R}^3\, so we can see them. Nothing about the arithmetic changes when n=768\,n = 768\,; the loop in magnitude just runs 768 times instead of 3. This is why linear algebra is the language of AI: the same three operations scale to any dimension the hardware can afford.

Pitfalls & What to Watch For

Confusing points and vectors. The coordinates (3,4)\,(3, 4)\, can label either a point in the plane or a vector. They are not the same object: a point has only location, a vector has length and direction and can live anywhere you draw it. A good habit is to reserve parentheses for points and angle brackets or column notation for vectors when the distinction matters, though in practice context does most of the disambiguating.

Unit vectors of the zero vector. The formulav^=v/v\,\hat v = \vec v / |\vec v|\, divides by zero if v=0\,\vec v = \vec 0\,. The zero vector has no direction, so there is no unit vector for it. Guard for this in code; NumPy and PyTorch will happily return nan or inf and let those poison whatever you compute next.

Numerical overflow in magnitude. For very long vectors with large components, squaring them before summing can overflow floating point even when the answer fits comfortably. The classic fix is to factor out the largest absolute component first (this is what math.hypot and torch.linalg.norm do internally). You will rarely hit this with embeddings, often hit it with poorly scaled physical simulations.

Row versus column conventions. Books are split roughly 50/50 on whether vectors are row shaped or column shaped. The math is identical; only the matrix layout you pair them with changes. We use column vectors in matrix contexts to match mainstream ML notation. Do not let the orientation on the page convince you two books disagree when they do not.

The most common student mistake. Reading v\,|\vec v|\, as "absolute value of v." The vertical bars are the same symbol historically, but for a vector they mean length of the arrow, which is always a non-negative scalar regardless of the signs of the components. A vector with all-negative components does not have a negative magnitude.


What This Unlocks Next

You can now describe, measure, and visualize arrows in two and three dimensions. The next two sections build directly on this picture: vector addition turns two arrows into a third via the parallelogram rule, and scalar multiplication stretches and flips arrows without changing their direction. Together, addition and scaling produce linear combinations — and once you have those, every structure in the rest of the book (span, basis, subspaces, transformations, matrices) is one or two small steps away.


Loading comments...