A Working Definition
An algorithm is a finite, unambiguous, and effective procedure that, when started on a valid input, halts after a finite number of mechanical steps and produces the correct output. Stripped of prose, this is the contract:
Three words in that sentence carry almost all of the weight. Finite means the procedure must end — no infinite loops on valid inputs. Unambiguous means the next step is always determined: there is no "and then guess." Effective means each step can actually be executed by a person with pencil and paper, or, equivalently, by a machine with no intelligence of its own.
Intuition. A recipe is the everyday word for algorithm. "Beat eggs until stiff" would not survive a computer-science definition (when does "stiff" happen?). But "beat eggs for 90 seconds at speed 8" would: it has an input (the eggs), a clear stopping rule, and steps that can be executed without judgment.
The point of insisting on this contract is that it lets us reason about a procedure mathematically: we can prove it terminates, prove it is correct, and quantify how long it takes, because the rules of the game are pinned down before the analysis starts. Everything else in this book — sorting, searching, graphs, dynamic programming, NP-hardness — is built on top of these three words.
From al-Khwarizmi to Turing
The word algorithm is a Latinized rendering of al-Khwarizmi, the 9th-century Persian scholar Muhammad ibn Musa al-Khwarizmi. Around 825 CE, working in Baghdad's House of Wisdom, he wrote a treatise on arithmetic with Hindu-Arabic numerals. When that treatise was translated into Latin in the 12th century as Algoritmi de numero Indorum("al-Khwarizmi on the Indian numbers"), Europeans began calling any step-by-step calculation an algorism, and the modern word algorithm grew out of that.
But algorithms predate the word by more than a millennium. Around 300 BCE, Euclid wrote down a procedure for computing the greatest common divisor of two integers. It is still the standard algorithm taught today, more than 2,300 years later. We will trace it line by line in a moment.
| Year | Person | Why it matters |
|---|---|---|
| ~300 BCE | Euclid | First non-trivial algorithm in writing: GCD by repeated subtraction. |
| ~825 CE | al-Khwarizmi | Algebra and decimal arithmetic procedures; gave us the word algorithm. |
| 1843 | Ada Lovelace | Wrote the first algorithm intended for a machine (Bernoulli numbers on Babbage's Analytical Engine). |
| 1936 | Alonzo Church | Lambda calculus: defined what 'computable function' means. |
| 1936 | Alan Turing | Turing machines: a precise mathematical model of an algorithm. |
| 1965 | Donald Knuth | Pinned down the modern definition with five concrete criteria (below). |
| 1971 | Stephen Cook | Showed that some algorithmic problems may be inherently hard (NP-completeness). |
Turing's 1936 paper On Computable Numbers, with an Application to the Entscheidungsproblem is the moment the informal idea of "mechanical procedure" became a mathematical object. Turing described an idealized machine with a tape, a finite state register, and a transition rule. He then showed that any procedure we would intuitively call an algorithm could be simulated by such a machine. This is the Church-Turing thesis: the informal notion of algorithm and the formal notion of Turing-machine computation coincide.
Why this history matters in 2026
for loop, you are using a notation that bakes in 90 years of mathematical work on what "step" means. The reason your code can be analyzed, compiled, optimized, and reasoned about at all is that the underlying notion of algorithm is mathematically precise — not folkloric.Knuth's Five Criteria
In The Art of Computer Programming (1968), Donald Knuth sharpened the definition into five conditions that an algorithm must satisfy. They have been the textbook definition ever since.
- Finiteness. The algorithm must terminate after a finite number of steps on every valid input. Formally: there exists a function such that the procedure halts after at most steps on input .
- Definiteness. Each step is precisely and unambiguously specified. Two different machines (or two different students) executing the algorithm on the same input must perform exactly the same sequence of operations.
- Input. The algorithm has zero or more inputs drawn from a specified set. The set itself is part of the specification: GCD on two non-negative integers is a different algorithm from GCD on two Gaussian integers.
- Output. The algorithm produces one or more outputs that bear a specified relation to the inputs. "Specified relation" is the correctness condition — e.g. .
- Effectiveness. Every operation must be basic enough that it could, in principle, be done exactly and in finite time by a human with pencil and paper. "Find the smallest counterexample to the Riemann hypothesis" is not effective.
Definiteness vs. determinism
Quick Check
Which of these procedures is NOT an algorithm by Knuth's criteria?
Algorithm vs. Program vs. Process
These three words are often used loosely. The distinctions matter because the rest of the book lives at all three layers.
| Layer | What it is | Lives where | Example |
|---|---|---|---|
| Algorithm | A mathematical object: a finite sequence of well-defined steps. | On paper, in your head, in a textbook. | "Repeatedly replace (a, b) with (b, a mod b) until b = 0." |
| Program | A concrete encoding of an algorithm in a programming language. | In a .py / .ts / .c file. | def gcd(a, b): while b: a, b = b, a % b; return a |
| Process | A specific execution of a program by a CPU on real input. | In RAM, on a particular machine, at a particular time. | PID 4218 running gcd(252, 105) on your laptop right now. |
One algorithm corresponds to many programs (Python, Rust, hand-written x86 assembly — all implementing Euclid). One program corresponds to many processes (every time anyone runs it, a new process is born). When we say algorithm X is correct, we are making a claim about the algorithm, independent of any program. When we say this implementation is fast, we are usually making a claim about the program (and its compiler, OS, and hardware). Confusing these layers is a frequent source of bugs and bad benchmarks.
The Oldest Non-Trivial Algorithm: Euclid's GCD
Let's anchor the abstract definition with a concrete algorithm that is older than calculus, older than algebra, older than printed books. The greatest common divisor of two integers and , written , is the largest integer that divides both. For example, .
Euclid's key observation, in the language of modular arithmetic, is this identity: for any positive integers ,
with the boundary case .
The intuition is that any common divisor of and must also divide their difference, and more generally, must divide for any integer . So the set of common divisors is invariant under the replacement , and so its maximum — the GCD — is also invariant. We get to keep replacing until the second number is 0, at which point the first number is the GCD.
This is a tiny piece of mathematics that produces a very fast algorithm. The number of steps is — in fact bounded by by Lamé's theorem (1844), one of the first running-time analyses ever written. Every step at least halves the smaller number on average, so even on 500-digit cryptographic numbers the algorithm finishes in a couple of thousand steps.
Interactive: Stepping Through Euclid
Use the controls below to watch Euclid's algorithm reduce a pair of integers to a GCD. The blue and amber bars show the current values of and , and the trace at the bottom shows every step. Try the preset 1071, 462 (which gives 21) or type your own numbers. Notice how rapidly shrinks — that is in action.
| # | a | b | q | r | rule |
|---|---|---|---|---|---|
| 1 | 252 | 105 | 2 | 42 | gcd(252, 105) = gcd(105, 42) |
| 2 | 105 | 42 | 2 | 21 | gcd(105, 42) = gcd(42, 21) |
| 3 | 42 | 21 | 2 | 0 | gcd(42, 21) = gcd(21, 0) |
| 4 | 21 | 0 | — | — | gcd(21, 0) = 21 ← base case |
A few things to look for. First, after a single step the larger of the two arguments is gone forever — we always replace the pair with smaller numbers. Second, the algorithm cannot loop: each iteration strictly decreases the second argument (because ), and the second argument is a non-negative integer, so it must eventually reach 0. That is a complete termination proof in two sentences, the kind of argument we will use constantly in this book.
The Same Algorithm in Two Languages
We said earlier that one algorithm can be encoded as many programs. Here is Euclid's algorithm in two different programming languages, in two different styles. The algorithm is the same; the programs are different.
Pure Python: a recursive transcription of the math
TypeScript: an iterative version that is closer to what a CPU actually does
Recursion is mathematically natural but every recursive call costs a stack frame. The same algorithm can be written iteratively with two variables and a loop. The output is identical; the runtime characteristics are slightly different (no stack growth, friendlier to tail-call-less compilers).
Algorithm vs. program, made concrete
Binary Search: Algorithm as a Decision
A second canonical example. Given a sorted array and a target , decide whether is in the array, and if so where. The naive algorithm is to scan from left to right — comparisons. Binary search uses the sorted-ness to do dramatically better:
- Look at the middle element.
- If it equals , return its index.
- If it is less than , recurse on the right half.
- If it is greater than , recurse on the left half.
1def binary_search(A: list[int], t: int) -> int:
2 """Return the index of t in sorted list A, or -1 if absent."""
3 lo, hi = 0, len(A) - 1
4 while lo <= hi:
5 mid = (lo + hi) // 2 # integer midpoint, no overflow in Python
6 if A[mid] == t:
7 return mid
8 elif A[mid] < t:
9 lo = mid + 1 # discard left half (incl. mid)
10 else:
11 hi = mid - 1 # discard right half (incl. mid)
12 return -1Each iteration halves the live region of the array. After iterations the live region has size at most , so the loop exits after at most iterations. For , that is 30 comparisons total. The difference between and is the difference between seconds and nanoseconds at production scale — and it comes from a single observation about sorted data.
The role of the precondition
Correctness: Invariants and Termination
How do we know an algorithm is correct? Not by testing it on a few examples (testing can prove the presence of bugs but never their absence, as Dijkstra famously said). We prove it.
A typical correctness proof has two halves:
- Termination. Show the algorithm halts on every valid input. Usually done by exhibiting a variant: a quantity that strictly decreases and is bounded below.
- Partial correctness. Show that if the algorithm halts, it halts with the correct answer. Usually done by identifying a loop invariant: a property preserved by every iteration that, combined with the loop's termination condition, implies the post-condition.
Apply this to Euclid's algorithm. Variant: — a non-negative integer that strictly decreases each iteration (because ). It is bounded below by 0, so the loop must terminate. Invariant: , where are the original inputs. The recurrence guarantees the invariant is preserved by each step. When the loop exits with , the invariant says , which is what we return. QED.
Invariants as a thinking tool
Efficiency: A First Glimpse
Correctness is necessary but not sufficient. A bubble sort and a merge sort both correctly sort an array; one is usable, the other is not, on modern data sizes. The next chapter is devoted to asymptotic analysis — the formal language for comparing algorithms. For now, an informal preview.
We measure an algorithm's cost as a function of the input size . We care about how that function grows, not its constants. The standard scale:
| Growth | Name | n = 10 | n = 1,000,000 | Example |
|---|---|---|---|---|
| 1 | constant | 1 | 1 | Hash table lookup |
| log n | logarithmic | ~3 | ~20 | Binary search, GCD |
| n | linear | 10 | 1,000,000 | Linear scan, summing an array |
| n log n | linearithmic | ~33 | ~20,000,000 | Merge sort, heap sort |
| n^2 | quadratic | 100 | 10^12 | Bubble sort, naive matrix * vector |
| 2^n | exponential | 1,024 | ∞ (cosmic) | Brute-force subset enumeration |
| n! | factorial | 3,628,800 | ∞ | Naive traveling salesman |
At , an algorithm finishes in well under a second on a laptop. An algorithm takes hours. An algorithm will not finish before the heat death of the universe. Constants matter, but they cannot rescue you from the wrong asymptotic class. This is why the rest of the book obsesses over picking the right algorithm before worrying about micro-optimization.
Quick Check
An input has n = 1,000,000 items. You have two algorithms: A is O(n^2) with a tiny constant; B is O(n log n) with a large constant (say 100x). Which is faster?
Edge Cases and Pathological Inputs
Part of specifying an algorithm is specifying its domain. The boundaries are where bugs live. For Euclid's GCD:
- : base case, returns . Defined.
- : one recursive call swaps the arguments; returns . Defined.
- : mathematically undefined (every integer divides 0, so there is no greatest common divisor). By convention, computer-science libraries return 0 here so that algebraic identities like stay consistent.
- Negative inputs: handled by taking first; some libraries instead define by symmetry.
- Huge inputs: in Python
intis arbitrary precision so it just works; in C/Java withint32_t, GCD on two 32-bit integers is safe but the product inlcmoverflows.
Edge cases are part of the specification, not afterthoughts
Where Algorithms Live in the Real World
It is easy to think of algorithms as classroom toys. They are not. They are the load-bearing structure of the systems you use every second of every day.
| Domain | System | Algorithm at the core |
|---|---|---|
| Search engines | Google web search | PageRank (eigenvector iteration), inverted-index merge |
| Operating systems | Linux CFS scheduler | Red-black tree of runnable tasks; weighted fair queueing |
| Databases | PostgreSQL, MySQL | B+ tree lookups, hash join, sort-merge join, query planning |
| Compilers | LLVM, GCC | Recursive-descent / LR parsing, dataflow analysis, register allocation by graph coloring |
| Networking | Internet routing | Dijkstra (OSPF), Bellman-Ford (RIP), BGP path-vector |
| Maps | Google Maps, Waze | Contraction Hierarchies, A* search on road graphs |
| Social networks | Facebook, LinkedIn | BFS for 'people you may know', PageRank-style ranking |
| AI | Neural net training | Backpropagation (chain rule on a DAG), Adam optimizer |
| Recommendations | Netflix, Spotify | Matrix factorization (SVD, ALS), nearest neighbors |
| Scheduling | Airline crew, kubernetes | Bipartite matching, network flow, simplex / interior-point |
| Memory | malloc, garbage collection | Buddy allocator, mark-and-sweep, generational GC |
| Security | TLS handshake | RSA / ECC key exchange — built directly on Euclid's GCD! |
| Robotics | Self-driving cars | Kalman filter, RRT path planning, SLAM |
| Finance | Order matching | Limit-order books backed by skip lists / balanced BSTs |
| Biology | Genome assembly | De Bruijn graph traversal, Smith-Waterman alignment |
| Everyday | Spell-check | Edit distance (dynamic programming), trie lookup |
Look at the last row of the security entry. The RSA cryptosystem, which protects nearly every TLS connection on the internet, computes a modular inverse using the extended Euclidean algorithm — a 30-line elaboration of the procedure Euclid wrote down in 300 BCE. A 2,300-year-old algorithm is, right now, the reason your connection to your bank is private. Algorithms compound across millennia.
What an Algorithm Is Not
Sharpening the boundary further. None of the following are algorithms, and recognizing why builds intuition for what an algorithm is.
| Procedure | Why it fails | Failed criterion |
|---|---|---|
| "Print every digit of pi." | Pi has infinitely many digits; the procedure cannot halt. | Finiteness |
| "Sort the array somehow." | Not specified what 'somehow' does on each step. | Definiteness |
| "Given a Turing machine M, decide whether it halts." | Provably impossible (halting problem, Turing 1936). | Effectiveness — no procedure exists |
| "Find the next prime greater than n by intuition." | Intuition is not a mechanical operation. | Effectiveness |
| A trained neural net at inference time on novel input. | It produces an output, but each weight matters and there is no human-readable specification of the input-output relation. | Borderline — definite (the forward pass is deterministic) but the 'specified relation' is fuzzy. We usually call it a model, not an algorithm. |
| A heuristic that works most of the time. | Output is not specified to bear a fixed relation to input on every valid input. | Output (in the strict sense) — though heuristics are still studied as 'approximation algorithms' with a guaranteed error bound. |
Quick Check
Which is the SHARPEST distinction between an algorithm and a heuristic?
We have only sketched the landscape. Across the next 14 chapters we will build the standard algorithmic toolkit — arrays, lists, stacks, queues, trees, hash tables, heaps, graphs, recursion, dynamic programming, greedy methods, divide and conquer — with the same level of mathematical care we just applied to Euclid. Every data structure we meet will have an algorithmic personality (what it makes fast, what it makes slow), and every algorithm we meet will sit on top of a data structure. The two halves of this book's title are the two halves of the same craft.