Chapter 1
12 min read
Section 1 of 261

What is an Algorithm?

Introduction to Algorithmic Thinking

A Working Definition

An algorithm is a finite, unambiguous, and effective procedure that, when started on a valid input, halts after a finite number of mechanical steps and produces the correct output. Stripped of prose, this is the contract:

Algorithm:Input    Output,in finitely many steps, by following a fixed rule.\text{Algorithm} : \text{Input} \;\longrightarrow\; \text{Output}, \quad \text{in finitely many steps, by following a fixed rule.}

Three words in that sentence carry almost all of the weight. Finite means the procedure must end — no infinite loops on valid inputs. Unambiguous means the next step is always determined: there is no "and then guess." Effective means each step can actually be executed by a person with pencil and paper, or, equivalently, by a machine with no intelligence of its own.

Intuition. A recipe is the everyday word for algorithm. "Beat eggs until stiff" would not survive a computer-science definition (when does "stiff" happen?). But "beat eggs for 90 seconds at speed 8" would: it has an input (the eggs), a clear stopping rule, and steps that can be executed without judgment.

The point of insisting on this contract is that it lets us reason about a procedure mathematically: we can prove it terminates, prove it is correct, and quantify how long it takes, because the rules of the game are pinned down before the analysis starts. Everything else in this book — sorting, searching, graphs, dynamic programming, NP-hardness — is built on top of these three words.


From al-Khwarizmi to Turing

The word algorithm is a Latinized rendering of al-Khwarizmi, the 9th-century Persian scholar Muhammad ibn Musa al-Khwarizmi. Around 825 CE, working in Baghdad's House of Wisdom, he wrote a treatise on arithmetic with Hindu-Arabic numerals. When that treatise was translated into Latin in the 12th century as Algoritmi de numero Indorum("al-Khwarizmi on the Indian numbers"), Europeans began calling any step-by-step calculation an algorism, and the modern word algorithm grew out of that.

But algorithms predate the word by more than a millennium. Around 300 BCE, Euclid wrote down a procedure for computing the greatest common divisor of two integers. It is still the standard algorithm taught today, more than 2,300 years later. We will trace it line by line in a moment.

YearPersonWhy it matters
~300 BCEEuclidFirst non-trivial algorithm in writing: GCD by repeated subtraction.
~825 CEal-KhwarizmiAlgebra and decimal arithmetic procedures; gave us the word algorithm.
1843Ada LovelaceWrote the first algorithm intended for a machine (Bernoulli numbers on Babbage's Analytical Engine).
1936Alonzo ChurchLambda calculus: defined what 'computable function' means.
1936Alan TuringTuring machines: a precise mathematical model of an algorithm.
1965Donald KnuthPinned down the modern definition with five concrete criteria (below).
1971Stephen CookShowed that some algorithmic problems may be inherently hard (NP-completeness).

Turing's 1936 paper On Computable Numbers, with an Application to the Entscheidungsproblem is the moment the informal idea of "mechanical procedure" became a mathematical object. Turing described an idealized machine with a tape, a finite state register, and a transition rule. He then showed that any procedure we would intuitively call an algorithm could be simulated by such a machine. This is the Church-Turing thesis: the informal notion of algorithm and the formal notion of Turing-machine computation coincide.

Why this history matters in 2026

Every time you write a Python for loop, you are using a notation that bakes in 90 years of mathematical work on what "step" means. The reason your code can be analyzed, compiled, optimized, and reasoned about at all is that the underlying notion of algorithm is mathematically precise — not folkloric.

Knuth's Five Criteria

In The Art of Computer Programming (1968), Donald Knuth sharpened the definition into five conditions that an algorithm must satisfy. They have been the textbook definition ever since.

  1. Finiteness. The algorithm must terminate after a finite number of steps on every valid input. Formally: there exists a function T:InputNT : \text{Input} \to \mathbb{N} such that the procedure halts after at most T(x)T(x) steps on input xx.
  2. Definiteness. Each step is precisely and unambiguously specified. Two different machines (or two different students) executing the algorithm on the same input must perform exactly the same sequence of operations.
  3. Input. The algorithm has zero or more inputs drawn from a specified set. The set itself is part of the specification: GCD on two non-negative integers is a different algorithm from GCD on two Gaussian integers.
  4. Output. The algorithm produces one or more outputs that bear a specified relation to the inputs. "Specified relation" is the correctness condition — e.g. gcd(a,b)a    gcd(a,b)b    d:(dadb)dgcd(a,b)\gcd(a, b) \mid a \;\wedge\; \gcd(a, b) \mid b \;\wedge\; \forall d : (d \mid a \wedge d \mid b) \Rightarrow d \mid \gcd(a, b).
  5. Effectiveness. Every operation must be basic enough that it could, in principle, be done exactly and in finite time by a human with pencil and paper. "Find the smallest counterexample to the Riemann hypothesis" is not effective.

Definiteness vs. determinism

Definiteness does not mean an algorithm cannot use randomness. A randomized algorithm is still definite: at each step the rule is "flip a fair coin and branch on the outcome." The rule itself has no ambiguity. What changes is that the output is now a random variable, and correctness becomes a probabilistic statement.

Quick Check

Which of these procedures is NOT an algorithm by Knuth's criteria?


Algorithm vs. Program vs. Process

These three words are often used loosely. The distinctions matter because the rest of the book lives at all three layers.

LayerWhat it isLives whereExample
AlgorithmA mathematical object: a finite sequence of well-defined steps.On paper, in your head, in a textbook."Repeatedly replace (a, b) with (b, a mod b) until b = 0."
ProgramA concrete encoding of an algorithm in a programming language.In a .py / .ts / .c file.def gcd(a, b): while b: a, b = b, a % b; return a
ProcessA specific execution of a program by a CPU on real input.In RAM, on a particular machine, at a particular time.PID 4218 running gcd(252, 105) on your laptop right now.

One algorithm corresponds to many programs (Python, Rust, hand-written x86 assembly — all implementing Euclid). One program corresponds to many processes (every time anyone runs it, a new process is born). When we say algorithm X is correct, we are making a claim about the algorithm, independent of any program. When we say this implementation is fast, we are usually making a claim about the program (and its compiler, OS, and hardware). Confusing these layers is a frequent source of bugs and bad benchmarks.


The Oldest Non-Trivial Algorithm: Euclid's GCD

Let's anchor the abstract definition with a concrete algorithm that is older than calculus, older than algebra, older than printed books. The greatest common divisor of two integers aa and bb, written gcd(a,b)\gcd(a, b), is the largest integer that divides both. For example, gcd(252,105)=21\gcd(252, 105) = 21.

Euclid's key observation, in the language of modular arithmetic, is this identity: for any positive integers ab>0a \geq b > 0,

gcd(a,b)=gcd(b,amodb),\gcd(a, b) = \gcd(b, a \bmod b), with the boundary case gcd(a,0)=a\gcd(a, 0) = a.

The intuition is that any common divisor of aa and bb must also divide their difference, and more generally, must divide aqb=amodba - q b = a \bmod b for any integer qq. So the set of common divisors is invariant under the replacement (a,b)(b,amodb)(a, b) \to (b, a \bmod b), and so its maximum — the GCD — is also invariant. We get to keep replacing until the second number is 0, at which point the first number is the GCD.

This is a tiny piece of mathematics that produces a very fast algorithm. The number of steps is O(logmin(a,b))O(\log \min(a, b)) — in fact bounded by 5log10min(a,b)5 \log_{10} \min(a, b) by Lamé's theorem (1844), one of the first running-time analyses ever written. Every step at least halves the smaller number on average, so even on 500-digit cryptographic numbers the algorithm finishes in a couple of thousand steps.


Interactive: Stepping Through Euclid

Use the controls below to watch Euclid's algorithm reduce a pair of integers to a GCD. The blue and amber bars show the current values of aa and bb, and the trace at the bottom shows every step. Try the preset 1071, 462 (which gives 21) or type your own numbers. Notice how rapidly bb shrinks — that is O(log)O(\log) in action.

step 1 / 4
a
252
b
105
divide: 252 = 105 × 2 + 42
replace: (a, b) ← (105, 42)
gcd(252, 105) = gcd(105, 42)
#abqrrule
1252105242gcd(252, 105) = gcd(105, 42)
210542221gcd(105, 42) = gcd(42, 21)
3422120gcd(42, 21) = gcd(21, 0)
4210gcd(21, 0) = 21 ← base case

A few things to look for. First, after a single step the larger of the two arguments is gone forever — we always replace the pair with smaller numbers. Second, the algorithm cannot loop: each iteration strictly decreases the second argument (because amodb<ba \bmod b < b), and the second argument is a non-negative integer, so it must eventually reach 0. That is a complete termination proof in two sentences, the kind of argument we will use constantly in this book.


The Same Algorithm in Two Languages

We said earlier that one algorithm can be encoded as many programs. Here is Euclid's algorithm in two different programming languages, in two different styles. The algorithm is the same; the programs are different.

Pure Python: a recursive transcription of the math

Euclid's algorithm, recursive Python
🐍euclid_recursive.py
1Function signature with type hints

Two integer inputs, one integer output. The type hints (`int` -> `int`) are not enforced at runtime in Python; they are documentation that matches the algorithm's specified input/output sets from Knuth's criteria 3 and 4.

3Normalize sign

GCD is defined for non-negative integers; mathematically gcd(-12, 18) = gcd(12, 18) = 6. Calling abs() on both inputs makes the algorithm robust to sign without changing the answer.

EXAMPLE
abs(-252) = 252, abs(105) = 105 -> we proceed with (252, 105).
4Base case: b == 0

This is the boundary identity gcd(a, 0) = a. Without a base case the recursion would never end; with it, every recursive descent eventually terminates.

EXAMPLE
gcd(21, 0): b == 0, return 21. Done.
5Return a (the answer at the base case)

When b is 0, a IS the GCD by definition. No further work.

6Recursive step: gcd(b, a mod b)

The mathematical identity gcd(a, b) = gcd(b, a mod b). Each call shrinks the second argument strictly (because a mod b is in [0, b)), guaranteeing finiteness — Knuth criterion 1.

EXAMPLE
gcd(252, 105): 252 mod 105 = 42, recurse on gcd(105, 42). Then gcd(42, 21). Then gcd(21, 0) = 21.
11First call with a worked example

We picked 252 and 105 deliberately: their GCD is 21 = 3 * 7, which is small enough to verify by hand. Trace: (252,105) -> (105,42) -> (42,21) -> (21,0). Four calls.

12A larger example

1071 = 3 * 7 * 51, 462 = 2 * 3 * 7 * 11. Common factor 3 * 7 = 21. The trace has the same length class — O(log min(a,b)) — even though the numbers are bigger.

13Boundary case

gcd(17, 0) hits the base case immediately and returns 17. This is the identity gcd(a, 0) = a, with no recursive step at all.

14The pathological pair (0, 0)

By the abs/base-case path we return 0. Mathematically this is a convention: every integer divides 0, so there is no 'greatest' common divisor. Most languages and number-theory libraries return 0 here for compatibility with identities like lcm(a, b) * gcd(a, b) = |a * b|.

4 lines without explanation
1def gcd(a: int, b: int) -> int:
2    """Greatest common divisor of two non-negative integers (Euclid's algorithm)."""
3    a, b = abs(a), abs(b)
4    if b == 0:
5        return a
6    return gcd(b, a % b)
7
8
9# Worked example
10print(gcd(252, 105))   # -> 21
11print(gcd(1071, 462))  # -> 21
12print(gcd(17, 0))      # -> 17  (boundary case)
13print(gcd(0, 0))       # -> 0   (conventional, see "Edge cases" below)

TypeScript: an iterative version that is closer to what a CPU actually does

Recursion is mathematically natural but every recursive call costs a stack frame. The same algorithm can be written iteratively with two variables and a loop. The output is identical; the runtime characteristics are slightly different (no stack growth, friendlier to tail-call-less compilers).

Euclid's algorithm, iterative TypeScript
📘euclid_iterative.ts
1TypeScript signature

Two number inputs, one number output. JS/TS use IEEE 754 doubles for `number`, so for huge integers (more than 2^53) you would switch to `bigint`. For our examples, plain numbers are fine.

3Local copies x, y

We don't mutate the parameters directly; that's stylistic but it makes the loop invariant easier to state ('x and y are the current pair').

EXAMPLE
gcd(252, 105) -> x = 252, y = 105.
4abs again

Same reason as the Python version: gcd(-12, 18) should equal gcd(12, 18). This is the contract.

9while y !== 0

The loop condition IS the negation of the base case. As long as y is non-zero, we have more work to do.

EXAMPLE
Iteration 1: x=252, y=105, y != 0, enter loop.
10Compute remainder

r = x mod y. This is the heart of the algorithm — the same step as `a % b` in the Python version.

EXAMPLE
Iteration 1: r = 252 % 105 = 42. Iteration 2: r = 105 % 42 = 21. Iteration 3: r = 42 % 21 = 0.
11x <- y

Promote the previous y to be the new x. After this, x holds the smaller of the previous pair.

EXAMPLE
Iteration 1: x becomes 105. Iteration 2: x becomes 42. Iteration 3: x becomes 21.
12y <- r

The new y is the remainder. Because 0 <= r < (old y), y strictly decreases — this is what guarantees termination.

EXAMPLE
Iteration 1: y becomes 42. Iteration 2: y becomes 21. Iteration 3: y becomes 0 -> loop exits.
14Return x

When the loop exits, y is 0 and x holds the GCD. Same answer as the recursive version, computed with constant stack space.

17Same worked example as Python

We deliberately use the same inputs to emphasize: this is the SAME ALGORITHM, expressed differently. Output: 21.

10 lines without explanation
1function gcd(a: number, b: number): number {
2  // Normalize: GCD is defined for non-negative integers.
3  let x = Math.abs(a);
4  let y = Math.abs(b);
5
6  // Loop invariant: gcd(x, y) == gcd(original |a|, original |b|).
7  // Each iteration replaces (x, y) with (y, x mod y).
8  // Since y strictly decreases and stays >= 0, the loop terminates.
9  while (y !== 0) {
10    const r = x % y;
11    x = y;
12    y = r;
13  }
14  return x;
15}
16
17console.log(gcd(252, 105));   // -> 21
18console.log(gcd(1071, 462));  // -> 21
19console.log(gcd(17, 0));      // -> 17

Algorithm vs. program, made concrete

The Python and TypeScript code look different. They use different keywords, different stack behavior, different number types. But they are encodings of the same algorithm: the sequence of intermediate (a, b) pairs is identical for any input. When we analyze Euclid's running time as O(logmin(a,b))O(\log \min(a, b)), the analysis applies to both, automatically.

A second canonical example. Given a sorted array A[0..n1]A[0..n-1] and a target tt, decide whether tt is in the array, and if so where. The naive algorithm is to scan from left to right — O(n)O(n) comparisons. Binary search uses the sorted-ness to do dramatically better:

  1. Look at the middle element.
  2. If it equals tt, return its index.
  3. If it is less than tt, recurse on the right half.
  4. If it is greater than tt, recurse on the left half.
🐍python
1def binary_search(A: list[int], t: int) -> int:
2    """Return the index of t in sorted list A, or -1 if absent."""
3    lo, hi = 0, len(A) - 1
4    while lo <= hi:
5        mid = (lo + hi) // 2     # integer midpoint, no overflow in Python
6        if A[mid] == t:
7            return mid
8        elif A[mid] < t:
9            lo = mid + 1         # discard left half (incl. mid)
10        else:
11            hi = mid - 1         # discard right half (incl. mid)
12    return -1

Each iteration halves the live region of the array. After kk iterations the live region has size at most n/2kn / 2^k, so the loop exits after at most log2n\lceil \log_2 n \rceil iterations. For n=109n = 10^9, that is 30 comparisons total. The difference between O(n)O(n) and O(logn)O(\log n) is the difference between seconds and nanoseconds at production scale — and it comes from a single observation about sorted data.

The role of the precondition

Binary search is correct ONLY if A is sorted. The sortedness is a precondition of the algorithm. This is part of the input specification: GCD takes integers, binary search takes a sorted array. Calling binary search on an unsorted array is not a bug in the algorithm — it is a violation of its contract.

Correctness: Invariants and Termination

How do we know an algorithm is correct? Not by testing it on a few examples (testing can prove the presence of bugs but never their absence, as Dijkstra famously said). We prove it.

A typical correctness proof has two halves:

  1. Termination. Show the algorithm halts on every valid input. Usually done by exhibiting a variant: a quantity that strictly decreases and is bounded below.
  2. Partial correctness. Show that if the algorithm halts, it halts with the correct answer. Usually done by identifying a loop invariant: a property preserved by every iteration that, combined with the loop's termination condition, implies the post-condition.

Apply this to Euclid's algorithm. Variant: bb — a non-negative integer that strictly decreases each iteration (because 0amodb<b0 \leq a \bmod b < b). It is bounded below by 0, so the loop must terminate. Invariant: gcd(a,b)=gcd(a0,b0)\gcd(a, b) = \gcd(a_0, b_0), where (a0,b0)(a_0, b_0) are the original inputs. The recurrence gcd(a,b)=gcd(b,amodb)\gcd(a, b) = \gcd(b, a \bmod b) guarantees the invariant is preserved by each step. When the loop exits with b=0b = 0, the invariant says gcd(a0,b0)=gcd(a,0)=a\gcd(a_0, b_0) = \gcd(a, 0) = a, which is what we return. QED.

Invariants as a thinking tool

Once you start writing invariants for your loops, you stop writing off-by-one bugs. The discipline of asking "what is true at the top of every iteration?" surfaces edge cases before they reach production.

Efficiency: A First Glimpse

Correctness is necessary but not sufficient. A bubble sort and a merge sort both correctly sort an array; one is usable, the other is not, on modern data sizes. The next chapter is devoted to asymptotic analysis — the formal language for comparing algorithms. For now, an informal preview.

We measure an algorithm's cost as a function of the input size nn. We care about how that function grows, not its constants. The standard scale:

GrowthNamen = 10n = 1,000,000Example
1constant11Hash table lookup
log nlogarithmic~3~20Binary search, GCD
nlinear101,000,000Linear scan, summing an array
n log nlinearithmic~33~20,000,000Merge sort, heap sort
n^2quadratic10010^12Bubble sort, naive matrix * vector
2^nexponential1,024&infin; (cosmic)Brute-force subset enumeration
n!factorial3,628,800&infin;Naive traveling salesman

At n=106n = 10^6, an O(nlogn)O(n \log n) algorithm finishes in well under a second on a laptop. An O(n2)O(n^2) algorithm takes hours. An O(2n)O(2^n) algorithm will not finish before the heat death of the universe. Constants matter, but they cannot rescue you from the wrong asymptotic class. This is why the rest of the book obsesses over picking the right algorithm before worrying about micro-optimization.

Quick Check

An input has n = 1,000,000 items. You have two algorithms: A is O(n^2) with a tiny constant; B is O(n log n) with a large constant (say 100x). Which is faster?


Edge Cases and Pathological Inputs

Part of specifying an algorithm is specifying its domain. The boundaries are where bugs live. For Euclid's GCD:

  • gcd(a,0)\gcd(a, 0): base case, returns aa. Defined.
  • gcd(0,b)\gcd(0, b): one recursive call swaps the arguments; returns bb. Defined.
  • gcd(0,0)\gcd(0, 0): mathematically undefined (every integer divides 0, so there is no greatest common divisor). By convention, computer-science libraries return 0 here so that algebraic identities like gcd(a,b)lcm(a,b)=ab\gcd(a, b) \cdot \operatorname{lcm}(a, b) = |a \cdot b| stay consistent.
  • Negative inputs: handled by taking a,b|a|, |b| first; some libraries instead define gcd(12,18)=6\gcd(-12, 18) = 6 by symmetry.
  • Huge inputs: in Python int is arbitrary precision so it just works; in C/Java with int32_t, GCD on two 32-bit integers is safe but the product in lcm overflows.

Edge cases are part of the specification, not afterthoughts

A correct algorithm must specify, in writing, what happens for every valid input — including the awkward ones. Saying "gcd(0, 0) returns 0 by convention" is a design choice. Saying nothing and letting it return whatever the code happens to do is a bug waiting to be discovered in production.

Where Algorithms Live in the Real World

It is easy to think of algorithms as classroom toys. They are not. They are the load-bearing structure of the systems you use every second of every day.

DomainSystemAlgorithm at the core
Search enginesGoogle web searchPageRank (eigenvector iteration), inverted-index merge
Operating systemsLinux CFS schedulerRed-black tree of runnable tasks; weighted fair queueing
DatabasesPostgreSQL, MySQLB+ tree lookups, hash join, sort-merge join, query planning
CompilersLLVM, GCCRecursive-descent / LR parsing, dataflow analysis, register allocation by graph coloring
NetworkingInternet routingDijkstra (OSPF), Bellman-Ford (RIP), BGP path-vector
MapsGoogle Maps, WazeContraction Hierarchies, A* search on road graphs
Social networksFacebook, LinkedInBFS for 'people you may know', PageRank-style ranking
AINeural net trainingBackpropagation (chain rule on a DAG), Adam optimizer
RecommendationsNetflix, SpotifyMatrix factorization (SVD, ALS), nearest neighbors
SchedulingAirline crew, kubernetesBipartite matching, network flow, simplex / interior-point
Memorymalloc, garbage collectionBuddy allocator, mark-and-sweep, generational GC
SecurityTLS handshakeRSA / ECC key exchange — built directly on Euclid's GCD!
RoboticsSelf-driving carsKalman filter, RRT path planning, SLAM
FinanceOrder matchingLimit-order books backed by skip lists / balanced BSTs
BiologyGenome assemblyDe Bruijn graph traversal, Smith-Waterman alignment
EverydaySpell-checkEdit distance (dynamic programming), trie lookup

Look at the last row of the security entry. The RSA cryptosystem, which protects nearly every TLS connection on the internet, computes a modular inverse using the extended Euclidean algorithm — a 30-line elaboration of the procedure Euclid wrote down in 300 BCE. A 2,300-year-old algorithm is, right now, the reason your connection to your bank is private. Algorithms compound across millennia.


What an Algorithm Is Not

Sharpening the boundary further. None of the following are algorithms, and recognizing why builds intuition for what an algorithm is.

ProcedureWhy it failsFailed criterion
&quot;Print every digit of pi.&quot;Pi has infinitely many digits; the procedure cannot halt.Finiteness
&quot;Sort the array somehow.&quot;Not specified what 'somehow' does on each step.Definiteness
&quot;Given a Turing machine M, decide whether it halts.&quot;Provably impossible (halting problem, Turing 1936).Effectiveness — no procedure exists
&quot;Find the next prime greater than n by intuition.&quot;Intuition is not a mechanical operation.Effectiveness
A trained neural net at inference time on novel input.It produces an output, but each weight matters and there is no human-readable specification of the input-output relation.Borderline — definite (the forward pass is deterministic) but the 'specified relation' is fuzzy. We usually call it a model, not an algorithm.
A heuristic that works most of the time.Output is not specified to bear a fixed relation to input on every valid input.Output (in the strict sense) — though heuristics are still studied as 'approximation algorithms' with a guaranteed error bound.

Quick Check

Which is the SHARPEST distinction between an algorithm and a heuristic?


We have only sketched the landscape. Across the next 14 chapters we will build the standard algorithmic toolkit — arrays, lists, stacks, queues, trees, hash tables, heaps, graphs, recursion, dynamic programming, greedy methods, divide and conquer — with the same level of mathematical care we just applied to Euclid. Every data structure we meet will have an algorithmic personality (what it makes fast, what it makes slow), and every algorithm we meet will sit on top of a data structure. The two halves of this book's title are the two halves of the same craft.

Loading comments...