By the end of this section, you will own a mental picture so concrete that whenever you read "vector," you see an arrow with a tail at the origin, a tip at $(v_1, v_2)$ or $(v_1, v_2, v_3)$, a specific length, and a specific direction. Everything else in linear algebra is built on top of that one picture.
The Story: Why Vectors Exist
In the 1840s, physics was stuck. Newton's laws describe forces and velocities — quantities that have both size and direction — but there was no clean algebra for them. If you wanted to add two forces, you wrote out three equations for the three coordinate directions and did the bookkeeping by hand. Every rotation of the axes rewrote every formula. Every composition of motions was a thicket.
In Dublin in 1843, William Rowan Hamilton was hunting for an algebra of rotations in three dimensions. After years of failure, he discovered the quaternions while walking along the Royal Canal and famously carved the defining relation into Brougham Bridge. On the continent, Hermann Grassmann published his Ausdehnungslehre in 1844 — a sweeping algebra of oriented quantities in any number of dimensions, so far ahead of its time that almost no one read it for thirty years. What we now call vector analysis — the streamlined arrow-with-components notation we still use today — was distilled out of those two threads by J. Willard Gibbs at Yale and Oliver Heaviside in England in the 1880s, independently, because they needed it to write electromagnetism without going insane.
What they gave us is the object this section is about: a thing with magnitude and direction, represented by an ordered list of numbers, obeying a simple algebra. Once you have it, force diagrams, velocity fields, image pixels, word meanings, neural network activations, and quantum states all become the same kind of creature.
Before vectors, every physical law was a committee of scalar equations. After vectors, the law is one line — and the committee is what a computer does for you.
The Mental Picture First: 2D
A vector in the plane is an arrow. Its tail sits at the origin, its tip sits at some point, and the two numbers that locate the tip are its components. Drag the tip below. Watch the components, the length, and the angle update live. Feel how the three readouts are not three separate pieces of information — they are three views of the same arrow.
A few things to notice while you play. Dragging along the x-axis changes only $v_1$. Dragging vertically changes only $v_2$. Dragging diagonally changes both, and for a diagonal arrow the length is not the sum of the components; it is the hypotenuse. The angle θ is measured from the positive x-axis and can be negative when the arrow points below that axis. The arrow itself does not care about coordinates: the coordinates are labels we stick on it after choosing axes. The arrow is primary. The numbers are a description.
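The two derived readouts, length and angle, can be recomputed from the components in a few lines. This is a minimal sketch, not the widget's actual code: the helper name `readouts_2d` and the sample tips are illustrative assumptions.

```python
import math

def readouts_2d(v1, v2):
    """Return (length, angle in degrees) for the arrow whose tip is (v1, v2)."""
    length = math.hypot(v1, v2)                  # hypotenuse of the two legs
    angle = math.degrees(math.atan2(v2, v1))     # signed angle from the +x axis
    return length, angle

# Horizontal, vertical, diagonal, and below-the-axis arrows.
for tip in [(3.0, 0.0), (0.0, 2.0), (3.0, 4.0), (1.0, -1.0)]:
    length, angle = readouts_2d(*tip)
    print(tip, f"length={length:.3f}", f"angle={angle:.1f} deg")
```

Using `math.atan2` rather than a plain arctangent of the ratio keeps the sign of the angle correct in every quadrant and avoids dividing by zero for vertical arrows.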
Stepping Into 3D
Add a third axis and the story barely changes. An arrow in space needs three numbers to describe its tip: $(v_1, v_2, v_3)$. Move the sliders below and rotate the scene. Notice the three dashed drops (first along x, then up in y, then out in z) that build the arrow as an L-shaped path. This is the 3D Pythagorean theorem made visible: the length of the diagonal is built from the three perpendicular legs.
The default vector is chosen so the legs — of length 1, 2, and 2 — produce a clean diagonal. Compute it with me: one squared is 1, two squared is 4, two squared is 4, sum is 9, square root is 3. The arrow is exactly three units long. Change the sliders and watch the number change with you. This is the whole content of the formula.
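The same arithmetic can be checked by applying the 2D Pythagorean theorem twice: once across the floor of the box, then once more with the remaining leg. The sketch below assumes two hypothetical helper names and uses the legs 1, 2, 2 from the text.

```python
import math

def length_3d_two_steps(x, y, z):
    """Pythagoras twice: first in the xy-plane, then with the z leg."""
    floor_diagonal = math.sqrt(x * x + y * y)         # diagonal across the floor
    return math.sqrt(floor_diagonal ** 2 + z * z)     # diagonal through space

def length_3d_direct(x, y, z):
    """The one-line version: square all three legs, add, take the root."""
    return math.sqrt(x * x + y * y + z * z)

print(f"{length_3d_two_steps(1.0, 2.0, 2.0):.3f}")
print(f"{length_3d_direct(1.0, 2.0, 2.0):.3f}")
```

Both routes agree up to floating-point rounding, which is why the formula can simply sum all three squares under one root.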
The Formal Definition, Earned
Now that you have the picture, the notation has something to stick to. A 2D vector is an ordered pair of real numbers $(v_1, v_2)$. A 3D vector is an ordered triple $(v_1, v_2, v_3)$. The symbol $\mathbb{R}^n$ names the space of all such lists of length $n$. Three equivalent notations appear everywhere:
| Row form | Column form | Pair/triple form |
|---|---|---|
| v = [ v₁ v₂ ] | v = [ v₁ ; v₂ ] | (v₁, v₂) |
In this book we use the pair/triple form in prose and the column form in equations where we need to multiply by matrices. The choice is cosmetic; nothing about the underlying arrow changes when you rotate the numbers ninety degrees on the page.
Magnitude: How Long Is an Arrow?
The magnitude (also called length, norm, or Euclidean norm) of $\mathbf{v} \in \mathbb{R}^n$ is
$$\|\mathbf{v}\| = \sqrt{\sum_{i=1}^{n} v_i^2} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$$
Every symbol earns its place. $v_i$ is the $i$-th component: the signed length of the projection of the arrow onto the $i$-th axis. Squaring it throws away the sign (a leg of length −3 and a leg of length 3 contribute the same amount to the hypotenuse). Summing the squares is the multi-dimensional Pythagorean theorem: perpendicular legs compose into a diagonal whose squared length is the sum of their squared lengths. The square root returns from area back to length. What the equation tells us is that length in higher dimensions is not a new mystery; it is Pythagoras applied one axis at a time.
Unit Vectors: Direction Alone
Any non-zero vector can be divided by its own magnitude to produce a vector of length exactly 1 that points the same way:
$$\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$$
The hat is the standard notation for "unit version of." Unit vectors strip off the magnitude and leave pure direction — a useful move whenever you want to compare directions without size getting in the way (we will do exactly this when we meet the dot product and cosine similarity in later sections).
Worked Example: By Hand
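Take $\mathbf{v} = (3, 4)$. This is the famous 3-4-5 right triangle in disguise. Compute the magnitude step by step, writing every arithmetic move:
Square the first component: $3^2 = 9$. Square the second: $4^2 = 16$. Sum them: $9 + 16 = 25$. Take the square root: $\sqrt{25} = 5$. So $\|\mathbf{v}\| = 5$, exactly. The angle from the positive x-axis is $\theta = \arctan(4/3) \approx 53.1^\circ$. The unit vector is $\hat{\mathbf{v}} = (3/5, 4/5) = (0.6, 0.8)$. Drag the arrow in the 2D widget above to $(3, 4)$ and read off 5, 53.1°; the numbers in the side panel should match digit for digit.
Now take $\mathbf{v} = (1, 2, 2)$ in 3D. Square each: $1^2 = 1$, $2^2 = 4$, $2^2 = 4$. Sum: $1 + 4 + 4 = 9$. Square root: 3. So $\|\mathbf{v}\| = 3$. The unit vector is $\hat{\mathbf{v}} = (1/3, 2/3, 2/3)$, which rounds to $(0.333, 0.667, 0.667)$. Sliders in the 3D widget above are already at these values; the readout on the right should say 3.000.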
Python From Scratch
Before we pick up any library, let us build magnitude, components, and unit vectors out of bare loops. The point is not that this is how you would ship code — it is that the formula is literally three operations: multiply, add, and take the square root. Once you have felt those three operations pass through your fingers, PyTorch's one-liner stops being magic.
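A minimal from-scratch sketch of those three operations follows. The function name `magnitude` is the one the text uses. The name `unit` and the choice of the two worked-example vectors are assumptions made for illustration.

```python
import math

def magnitude(v):
    """Euclidean length: square each component, add them up, take the root."""
    total = 0.0
    for component in v:
        total += component * component    # multiply, then add
    return math.sqrt(total)               # square root

def unit(v):
    """Same direction as v, length 1. Assumes v is not the zero vector;
    for a zero vector the division below raises ZeroDivisionError."""
    length = magnitude(v)
    return [component / length for component in v]

v2 = [3.0, 4.0]          # the 3-4-5 example
v3 = [1.0, 2.0, 2.0]     # the 3D example

print(v2[0], v2[1])                  # components are just list entries
print(magnitude(v2), unit(v2))       # 5.0 [0.6, 0.8]
print(magnitude(v3), unit(v3))
```

Nothing here depends on the dimension: the loop in `magnitude` runs once per component, whether there are two of them or two thousand.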
The PyTorch Way
PyTorch stores a vector as a one-dimensional tensor. The same three operations — multiply, sum, square root — happen, but now they run as a single vectorized kernel on CPU or GPU, and the whole expression participates in autograd if we ask it to. Same two vectors, same two numbers out the other end.
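A sketch of the same computation in PyTorch, assuming the two worked-example vectors. `torch.tensor`, `torch.sum`, `torch.sqrt`, and `torch.linalg.norm` are existing PyTorch calls; the helper name `magnitude` mirrors the from-scratch version and is not a library function.

```python
import torch

v2 = torch.tensor([3.0, 4.0])
v3 = torch.tensor([1.0, 2.0, 2.0])

def magnitude(v):
    # Multiply, sum, square root, written out with tensor operations.
    return torch.sqrt(torch.sum(v * v))

for v in (v2, v3):
    length = magnitude(v)             # zero-dimensional tensor
    same = torch.linalg.norm(v)       # library one-liner for the same number
    unit = v / length                 # the scalar broadcasts over every component
    print(length.item(), same.item(), unit.tolist())
```

The explicit `magnitude` is there only to show that the one-liner computes the same quantity; in practice the library call is the natural choice.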
What did PyTorch hide? The loop in magnitude became a fused C++/CUDA kernel. The division became a broadcast on whatever device the tensor lives on: your CPU, your GPU, your TPU. At this scale the speed difference is invisible, but for a large embedding tensor the kernel does in milliseconds what a Python loop would need about a minute for. That is the point of the library: the meaning of magnitude is still three operations; only the delivery mechanism has changed.
Where Engineers Actually Use This: Word Embeddings
Here is a place where the arrow picture pays rent immediately. Modern natural language processing represents every word, and in fact every sub-word token, as a vector in $\mathbb{R}^d$ for some $d$ between 100 and a few thousand. The breakthrough was Mikolov et al.'s word2vec in 2013, which showed that vectors learned by predicting neighboring words in a corpus place semantically related words near each other. "Cat" and "dog" end up a short distance apart. "Cat" and "laptop" end up far apart.
Measuring "near" is just the magnitude we already know, applied to the difference of two vectors:
$$\|\mathbf{a} - \mathbf{b}\| = \sqrt{\sum_{i} (a_i - b_i)^2}$$
Nothing new: Pythagoras one more time, on a vector whose components are the axis-wise gaps between $\mathbf{a}$ and $\mathbf{b}$. Click any two words below to compute the distance between their toy 3D embeddings. Watch the numbers tell the semantic story.
Click cat then dog: the readout shows distance ≈ 0.49. Click cat then laptop: ≈ 3.97, about eight times farther. Click apple then laptop: ≈ 4.46, farther still. Without the program "knowing" anything about biology or consumer electronics, purely because training nudged related words into adjacent positions in space, the geometric distance we are computing becomes a measurement of meaning.
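The distance computation itself is short enough to write out. In the sketch below the embedding values are invented for illustration and are not the widget's data, so the printed distances will not match the readouts quoted above; the names `toy_embeddings` and `distance` are likewise assumptions.

```python
import math

# Hypothetical 3D embeddings, made up for this sketch only.
toy_embeddings = {
    "cat":    [1.0, 2.0, 0.5],
    "dog":    [1.2, 2.3, 0.4],
    "laptop": [4.0, -0.5, 2.0],
}

def distance(a, b):
    """Euclidean distance: the magnitude of the difference vector a - b."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

for first, second in [("cat", "dog"), ("cat", "laptop")]:
    d = distance(toy_embeddings[first], toy_embeddings[second])
    print(f"{first} -> {second}: {d:.2f}")
```

On Python 3.8 and later, the standard library's `math.dist` computes the same quantity and can replace the hand-written helper.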
Real systems push the dimension way up (word2vec used 300, modern transformer embeddings use 768–12288) and often prefer cosine similarity over raw distance, but both are built on the same magnitude formula. Semantic search, query expansion, retrieval-augmented generation, duplicate detection, and recommendation all boil down to: embed the thing as a vector, then ask how close it is to other vectors. Each of those pipelines ends up computing norms, through torch.linalg.norm or an equivalent of the from-scratch version, at very large scale.
For a 768-dimensional embedding, the loop in magnitude just runs 768 times instead of 3. This is why linear algebra is the language of AI: the same three operations scale to any dimension the hardware can afford.
Pitfalls & What to Watch For
Confusing points and vectors. The coordinates $(v_1, v_2)$ can label either a point in the plane or a vector. They are not the same object: a point has only location, a vector has length and direction and can live anywhere you draw it. A good habit is to reserve parentheses for points and angle brackets or column notation for vectors when the distinction matters, though in practice context does most of the disambiguating.
Unit vectors of the zero vector. The formula $\hat{\mathbf{v}} = \mathbf{v} / \|\mathbf{v}\|$ divides by zero if $\mathbf{v} = \mathbf{0}$. The zero vector has no direction, so there is no unit vector for it. Guard for this in code; NumPy and PyTorch will happily return nan or inf and let those poison whatever you compute next.
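One possible guard, sketched in plain Python. The function name `unit_or_none`, the choice to return `None`, and the threshold `eps` are all illustrative assumptions; raising an exception would be an equally reasonable design.

```python
import math

def unit_or_none(v, eps=1e-12):
    """Return v scaled to length 1, or None when v is numerically the zero vector.

    eps is an illustrative threshold, not a universal constant.
    """
    length = math.sqrt(sum(c * c for c in v))
    if length <= eps:
        return None                       # no direction to report
    return [c / length for c in v]

print(unit_or_none([3.0, 4.0]))           # [0.6, 0.8]
print(unit_or_none([0.0, 0.0]))           # None
```

Returning a sentinel forces the caller to decide what a missing direction means, instead of letting a nan travel silently through later computations.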
Numerical overflow in magnitude. When components are very large, squaring them before summing can overflow floating point even when the final answer fits comfortably. The classic fix is to factor out the largest absolute component first, which is the idea behind scaled routines such as math.hypot. Whether a given library norm rescales internally depends on the implementation and the dtype, so check before relying on it. You will rarely hit this with embeddings, but often with poorly scaled physical simulations.
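The overflow and the rescaling fix can both be seen in a small sketch. The function names and the test vector are assumptions chosen for illustration, and the scaled version assumes a non-empty input.

```python
import math

def naive_magnitude(v):
    # c * c overflows to inf once |c| exceeds roughly 1.3e154 in float64.
    return math.sqrt(sum(c * c for c in v))

def scaled_magnitude(v):
    """Factor out the largest absolute component before squaring."""
    largest = max(abs(c) for c in v)      # assumes v is non-empty
    if largest == 0.0:
        return 0.0
    # Every ratio lies in [-1, 1], so the squares cannot overflow.
    return largest * math.sqrt(sum((c / largest) ** 2 for c in v))

big = [3e200, 4e200]                      # true length is 5e200, well within range
print(naive_magnitude(big))               # inf
print(scaled_magnitude(big))
print(math.hypot(*big))
```

The rescaled version costs one extra pass over the data to find the largest component, which is usually a fair price for a finite answer.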
Row versus column conventions. Books are split roughly 50/50 on whether vectors are row shaped or column shaped. The math is identical; only the matrix layout you pair them with changes. We use column vectors in matrix contexts to match mainstream ML notation. Do not let the orientation on the page convince you two books disagree when they do not.
The most common student mistake. Reading $\|\mathbf{v}\|$ as "absolute value of v." The vertical bars are the same symbol historically, but for a vector they mean length of the arrow, which is always a non-negative scalar regardless of the signs of the components. A vector with all-negative components does not have a negative magnitude.
What This Unlocks Next
You can now describe, measure, and visualize arrows in two and three dimensions. The next two sections build directly on this picture: vector addition turns two arrows into a third via the parallelogram rule, and scalar multiplication stretches, shrinks, and flips arrows without moving them off the line they lie on. Together, addition and scaling produce linear combinations, and once you have those, every structure in the rest of the book (span, basis, subspaces, transformations, matrices) is one or two small steps away.