Why Engineers Prove Things
A proof is a chain of reasoning that takes you from things you already accept to a conclusion that, by the rules of logic, must therefore hold. It looks like a math exercise. It functions, for working software engineers, like a debugger that runs over every possible input at once. When Donald Knuth quipped, "beware of bugs in the above code; I have only proved it correct, not tried it," he wasn't belittling proofs — he was pointing out that the only thing better than testing on a million inputs is reasoning that closes the door on the other infinity.
The skills in this section are not optional ornaments. They show up directly as code:
| Proof technique | Engineering artifact it produces |
|---|---|
| Direct proof | An algorithm derivation: "here is why this formula computes the answer." |
| Induction | A correctness argument for any function with a recursion or loop. |
| Loop invariants | Hoare-style assertions you can paste into the code as `assert` statements. |
| Termination proof | A guarantee that a service won't hang — the variant is your liveness witness. |
| Pigeonhole | Hash-collision arguments, cycle-detection, cache-eviction lower bounds. |
| Adversary argument | "No matter how clever you are, sorting needs Ω(n log n) comparisons." |
The bridge: a loop invariant is mathematical induction in disguise. A termination argument is well-founded recursion in disguise. A type checker is a tiny automated theorem prover. Whenever you write code that must be correct on inputs you have not seen, you are doing proof work whether or not you call it that.
Direct Proof
A direct proof of assumes and uses definitions and previously established facts to derive . There is no rhetorical trick — you simply walk forward.
Worked example: if is odd, then is odd
Definition. An integer is odd iff there exists an integer with .
Proof. Assume is odd. Then for some integer . Squaring, . Let ; is an integer because integers are closed under addition and multiplication. Therefore , which is the definition of odd.
The shape of every direct proof
Proof by Contrapositive
The contrapositive of is . The two are logically equivalent — proving one proves the other. Sometimes the contrapositive is dramatically easier because gives you a concrete object to manipulate.
Worked example: if is even, then is even
Trying to prove this directly is awkward (square roots are not friendly over the integers). Contrapositive: if is odd, then is odd — which is exactly what we just proved. Done.
When to flip
Proof by Contradiction
Assume the conclusion is false, derive a contradiction, conclude the conclusion is true. The technique is so old it has a Latin name: reductio ad absurdum.
Worked example 1: is irrational
Suppose, for contradiction, that with integers having no common factor (we may always reduce to lowest terms). Then , so is even, so is even (by contrapositive above), so for some integer . Substituting, , so , so is also even. But then and share the factor 2 — contradicting our choice of lowest terms.
Worked example 2: there are infinitely many primes (Euclid)
Suppose, for contradiction, that the primes are exactly . Form . is greater than every , so it is not in our list. Yet every integer has a prime factor. That factor divides , but each divides and hence leaves remainder 1 when dividing . So has a prime factor not on the list — contradiction.
Sketch: the halting problem is undecidable
Suppose, for contradiction, a program exists that returns true iff program halts on input . Build the diagonal program D(P): if H(P, P) returns true, loop forever; else halt. Now ask: does D(D) halt? If yes, said yes, so D loops — contradiction. If no, said no, so D halts — contradiction. The assumption was wrong; cannot exist.
Direct vs. contradiction
Mathematical Induction
Induction is the workhorse of algorithm correctness. To prove for every integer :
- Base case. Show directly.
- Inductive hypothesis. Assume for an arbitrary .
- Inductive step. Using only the hypothesis and definitions, prove .
Picture an infinite line of dominoes: the base case knocks the first one over; the inductive step says each domino topples its neighbor; together every domino falls.
Worked example 1: Gauss's sum,
Base. For , the LHS is 1 and the RHS is . Match.
Hypothesis. Assume .
Step. Then , which is the case.
Worked example 2: for all
Base. . Holds.
Hypothesis. Assume .
Step. , where the last inequality uses . So .
The base case is not optional
Quick Check
In the Gauss-sum induction, which line uses the inductive hypothesis?
Strong Induction
Sometimes "" is too narrow a step — you need to lean on multiple smaller cases. Strong induction lets you assume for every when proving . Logically it is equivalent to ordinary induction; tactically it is much more powerful.
Worked example: every integer has a prime factorization
Base. is prime; the "factorization" is just .
Hypothesis. Assume every integer with has a prime factorization.
Step. Consider . If is prime, we are done. Otherwise with . By the strong hypothesis, both and have prime factorizations; concatenating them gives one for .
Why ordinary induction fails here
Structural Induction
When the objects you reason about are not numbers but recursively-defined structures — lists, trees, expressions, programs — ordinary induction on size is awkward. Structural induction matches the structure exactly:
| Structure | Base case(s) | Inductive case |
|---|---|---|
| Lists | P(empty list) | P(xs) ⇒ P(cons x xs) |
| Binary trees | P(empty) and/or P(leaf) | P(L) and P(R) ⇒ P(node v L R) |
| Arithmetic expressions | P(literal), P(variable) | P(e1), P(e2) ⇒ P(e1 op e2) |
| Natural numbers (Peano) | P(0) | P(n) ⇒ P(succ n) |
Worked example: a complete binary tree of nodes has height
Let denote the height of (single-node tree has height 0). We prove the equivalent claim: a complete binary tree of height has between and nodes; therefore .
Base. A single-node tree has height 0 and 1 node; . Holds.
Step. Suppose the claim holds for all complete binary trees of height . A complete tree of height has root plus two subtrees, each of height (one possibly missing its last level). By the hypothesis, each subtree has between and nodes, so the whole tree has between ... wait — the lower bound is achieved when the last level has just one node, giving total. The upper bound matches. Taking logs of gives .
Why this matters for DS&A
Loop Invariants: Induction in Code
A loop invariant is a statement about the program state that is true every time control reaches the loop test. Tony Hoare turned this idea into a method for proving programs correct. Three obligations — an exact mirror of induction:
- Initialization (base). The invariant holds the first time we reach the loop test.
- Maintenance (step). If the invariant holds before an iteration, it still holds after.
- Termination (closing the deal). When the loop exits, the invariant plus the exit condition gives the post-condition we wanted.
Worked example: insertion sort
Invariant: at the start of each iteration with index , the prefix is a sorted permutation of the original first elements.
Initialization. Before the first iteration ; is a single element, trivially sorted, and trivially a permutation of itself.
Maintenance. The body shifts elements of larger than one position right and inserts in the gap. The result is a sorted permutation of the original first elements — the invariant for the next iteration with .
Termination. The loop exits at . The invariant says is a sorted permutation of the original array — precisely the post-condition.
Worked example: binary search invariant
Invariant: if the target exists anywhere in the original array, then it lies in the slice .
Initialization. covers the whole array; trivially the target, if present, is in the slice.
Maintenance. We compute and compare. If we're done. If , the array is sorted, so every index has value ; the target cannot be there, so it must lie in — updating preserves the invariant. Symmetric argument for .
Termination. Each iteration shrinks by at least 1 (since either jumps past or falls below ). When , the slice is empty; combined with the invariant, the target is not in the array.
An invariant is the ghost of an induction inside your loop
assert statements — they're executable specifications.Interactive: Invariant Inspector
Step through binary search on a sorted array. At every step the cyan band shows the surviving range . The invariant promises that if the target is in the array at all, it is somewhere in cyan. Watch the band shrink; the green panel reaffirms why the invariant still holds.
- lo = 0, hi = 15, mid = 7
- A[mid] = 22, target = 23
- Action: A[mid] < target, set lo = mid + 1
Try a target that isn't in the array (drag the target slider to a value between two cells). The invariant still holds at every step — right up to the final step where the range becomes empty. That empty range is exactly the proof: the contrapositive form of the invariant says if the target isn't in , then it isn't in the original array. The algorithm correctly returns "not found."
Termination Proofs
Correctness has two parts: partial correctness (if the algorithm halts, the answer is right) and total correctness (it actually halts). Loop invariants buy you the first; you prove the second with a variant: a quantity that
- takes values in a well-founded set (one with no infinite descending chain — e.g. ),
- strictly decreases on every iteration / recursive call.
Because the values are bounded below and strictly drop, the process must reach the floor — the loop exits, the recursion bottoms out.
Worked example: Euclid's GCD terminates
Recall the recurrence with base . Take the variant . By definition of , , so the second argument strictly decreases. It is a non-negative integer. The recursion must reach in at most steps.
Tighter bound
Programs that "obviously" terminate but don't
Disproof by Counterexample
Universal claims ("for all ...") are disproved by exhibiting a single for which the claim fails. Existential claims ("there exists ...") cannot be disproved this way.
| Claim | Counterexample | Why it kills the claim |
|---|---|---|
| Every odd number is prime. | 9 = 3 · 3 | 9 is odd but composite. |
| n² + n + 41 is prime for every n ≥ 0. | n = 40 gives 41². | Euler's polynomial eventually fails. |
| Every continuous function on ℝ is differentiable. | f(x) = |x| at 0 | Continuous, not differentiable. |
| Greedy interval scheduling by shortest interval is optimal. | Three intervals: [1, 4], [3, 5], [4, 7] | Greedy picks [3, 5]; optimal picks [1, 4] and [4, 7]. |
| Hashing always gives O(1) lookup. | Adversarial keys colliding in one bucket | Linear scan on collision chain. |
In algorithm design, counterexamples are how you reject a plausible-looking approach before you waste a week implementing it. Whenever your gut says "greedy should work here," try to break it on small inputs first. A counterexample on an array of length 4 saves you a debugging session at scale.
Pigeonhole Principle
Pigeonhole. If objects go into boxes, some box gets at least two.
Generalized. If objects go into boxes, some box gets at least .
It looks trivial; it is devastatingly powerful. A few algorithmic uses:
- Hash collisions are inevitable. Any hash function from an unbounded key space into a finite table has, by pigeonhole, infinitely many collisions. There is no "perfect" general-purpose hash.
- Floyd's cycle detection. A function on a finite set, iterated, must eventually revisit a value — the iterates are pigeons, the set is the boxes. This is why the "tortoise and hare" algorithm works.
- Finite-state machines repeat. A DFA with states reading a string of length must visit some state twice — the foundation of the pumping lemma.
- Birthday bound. Among 23 random people, two probably share a birthday. The "-collision" bound for hashing is the same calculation.
Mini-proof: any 5 points in a unit square include two within distance
Partition the unit square into four sub-squares. Five points into four sub-squares — by pigeonhole, some sub-square holds at least two. Any two points in a sub-square are within its diagonal .
Adversary Arguments
How do you prove that no algorithm can do better than X? You play the adversary: you imagine an opponent who answers each query the algorithm makes in the worst possible (but consistent) way. If the adversary can force the algorithm into work, then is a lower bound for the problem.
Sketch: comparison-based sorting requires comparisons
Any deterministic comparison sort can be drawn as a decision tree: each internal node is a comparison , each leaf is one of the possible permutations. To distinguish all permutations, the tree must have at least leaves. A binary tree with leaves has height at least . Therefore the worst-case number of comparisons is at least by Stirling's approximation.
Why this matters
Probabilistic Arguments
The probabilistic method proves that an object with property exists by showing that a randomly drawn object has property with positive probability. Strange — you never construct the object. But if it would exist with probability , then by the definition of probability there must be at least one in the sample space.
For algorithm analysis, the friendlier cousin is expected-value reasoning. Randomized quicksort, with a uniformly random pivot, runs in expected time even though its worst case is . The proof shows that any pair of elements is compared at most once and computes the expected number of comparisons by linearity of expectation:
.
Randomization defeats adversaries
Case Study: Verifying Binary Search
We now write the algorithm out and annotate every meaningful line with its role in the proof. Python first; TypeScript second with explicit invariant assertions.
The same algorithm in TypeScript, with the invariants written as runtime assertions you can keep in development builds:
Quick Check
Suppose a colleague rewrites the recursive call as `lo = mid` instead of `lo = mid + 1`. What breaks?
Case Study: Euclid's GCD
Two proof obligations: the algorithm computes the right thing (partial correctness via the recurrence ), and it terminates (well-founded variant ).
The TypeScript version is essentially the same; the proof obligations transfer verbatim. We use BigInt here because the algorithm is most useful for cryptographic-sized integers, where ordinary 64-bit arithmetic would overflow.
Proofs in Industry
These techniques are not academic decoration. They underpin real systems whose failure would be catastrophic.
| System | Proof technology | Why it matters |
|---|---|---|
| Amazon Web Services (S3, DynamoDB, EBS) | TLA+ specifications + model checking | Engineers caught subtle distributed-protocol bugs (replica divergence, lost writes) that no test suite would have found before launch. |
| seL4 microkernel | Isabelle/HOL machine-checked proof | First general-purpose OS kernel with a complete formal proof of functional correctness (~ 200,000 lines of proof for ~9,000 lines of C). |
| CompCert C compiler | Coq proof of semantic preservation | Proves the compiled assembly behaves the same as the source C — eliminating a category of safety-critical compiler bugs (used in Airbus avionics). |
| TLS 1.3 (Project Everest) | F* proof of cryptographic security | Verified TLS implementation now shipping in Firefox, Wireguard, and Azure. |
| Rust borrow checker | Linear-types proof system at the type-checker level | "If it compiles, it has no data races" — a static proof discharged automatically every build. |
| TypeScript / Flow type systems | Curry-Howard correspondence (lightweight) | Every type error is a tiny rejected proof attempt; every successful compile is a small theorem about your data flow. |
| Financial-trading risk systems | Loop-invariant audits, model checking | Race conditions in matching engines = market-moving losses. Invariant proofs are how desks sleep at night. |
| Blockchain smart contracts | Symbolic execution (Manticore, Mythril) + Z3 | Reentrancy, integer-overflow, and access-control bugs cost over $3B in 2022 alone; formal methods are the strongest defence. |
The throughline: where the cost of failure is high enough, machine-checked proof is cheaper than testing. Even when full proof is overkill, the local techniques — an invariant comment, a termination argument, a sketch of why your greedy is optimal — are the difference between code that almost works and code you can ship.
Common Pitfalls
1. The base case is wrong (or trivially picked)
The horse-color paradox above is the canonical example: a base case that doesn't actually let the inductive step take its first non-trivial bite. Always test the step at manually, not just at .
2. Confusing weak and strong induction
If your inductive step needs facts about cases much smaller than (recursing on halves, factors, sub-trees), you need strong induction. Trying to force the argument into ordinary induction will leave a hole.
3. "Proof" by example
Verifying an identity for proves nothing about . Polya's prime-counting conjecture held for the first integers and turned out to be false. Examples build intuition; only a proof closes the gap.
4. Off-by-one in the inductive step
The most common bug: assuming and accidentally proving again, not . Annotate which index you are reasoning about at every line.
5. Forgetting termination
Every loop and every recursive call needs some argument for why it ends. "The variable seems to get smaller" isn't enough — identify the variant explicitly. Real-world example: the 2003 Northeast blackout was triggered partly by an alarm system stuck in an infinite race-condition loop.
6. Confusing necessary and sufficient
is not the same as . "If it's a square, it's a rectangle" doesn't mean "rectangles are squares." Half the bad correctness arguments collapse on this point.
7. Universal vs. existential mix-up
To disprove "" one counterexample suffices. To disprove "" you must prove the universal negation — one counterexample is not enough.
Quick Check
A student writes: "I tested my new sorting algorithm on 10,000 random arrays and it always sorted correctly. Therefore it's correct." What proof technique would have actually established correctness, and what is the precise problem with the student's argument?
Exercises
Warm-up
- Prove by induction that .
- Prove by contradiction that is irrational.
- Disprove: "every quadratic with integer coefficients is reducible over the integers." (Give a counterexample.)
- Identify the variant for the loop
while n > 1: n = n // 2. Why does it terminate?
Loop invariants
- Linear search for an element in an unsorted array. State the invariant. Prove initialization, maintenance, and termination.
- Selection sort. State an invariant strong enough to prove the post-condition ("A is sorted"). Prove all three obligations.
- The two-pointer algorithm for "does a sorted array contain two elements summing to ?" State and prove an invariant relating the pointers to the search space.
Termination
- Find a variant for the recursion
f(n) = f(n // 2) + f(n // 2)with basef(0) = 1. Why does the well-founded relation work even though we make two recursive calls? - The recursion
ack(m, n)for the Ackermann function calls itself with smaller pairs in the lexicographic order on . State the variant precisely and explain why with this order is well-founded.
Structural induction
- Prove by structural induction on binary trees that for every tree, (assuming every internal node has exactly two children).
- Prove that the in-order traversal of a binary search tree visits keys in non-decreasing order. (Hint: induction on tree structure; carefully state what you assume about the left and right subtrees.)
Pigeonhole and adversary
- Prove: any sequence of 11 distinct integers contains either an increasing subsequence of length 4 or a decreasing subsequence of length 4. (Generalization of Erdős–Szekeres.)
- Adversary lower bound: prove that finding the maximum of elements requires at least comparisons. Sketch the adversary's strategy.
Challenge
- State and prove the loop invariant for the in-place partition step of quicksort (the Lomuto scheme). Use it to prove that, after partition, the pivot ends up in its final sorted position.
- Prove that any reachable state of the readers-writers concurrent algorithm (with at most one writer or any number of readers) satisfies the safety invariant. (You will need to enumerate the program states and show every transition preserves the invariant.)
How to attack a proof exercise
With proof techniques in hand, every algorithm we meet from now on comes with a built-in question: what is the invariant, and why does it terminate? Answer those two and you have understood the algorithm at the level a senior engineer is expected to.