Chapter 3
20 min read
Section 14 of 261

Proof Techniques

Mathematical Foundations

Why Engineers Prove Things

A proof is a chain of reasoning that takes you from things you already accept to a conclusion that, by the rules of logic, must therefore hold. It looks like a math exercise. It functions, for working software engineers, like a debugger that runs over every possible input at once. When Donald Knuth quipped, "beware of bugs in the above code; I have only proved it correct, not tried it," he wasn't belittling proofs — he was pointing out that the only thing better than testing on a million inputs is reasoning that closes the door on the other infinity.

The skills in this section are not optional ornaments. They show up directly as code:

Proof techniqueEngineering artifact it produces
Direct proofAn algorithm derivation: "here is why this formula computes the answer."
InductionA correctness argument for any function with a recursion or loop.
Loop invariantsHoare-style assertions you can paste into the code as `assert` statements.
Termination proofA guarantee that a service won't hang — the variant is your liveness witness.
PigeonholeHash-collision arguments, cycle-detection, cache-eviction lower bounds.
Adversary argument"No matter how clever you are, sorting needs Ω(n log n) comparisons."
The bridge: a loop invariant is mathematical induction in disguise. A termination argument is well-founded recursion in disguise. A type checker is a tiny automated theorem prover. Whenever you write code that must be correct on inputs you have not seen, you are doing proof work whether or not you call it that.

Direct Proof

A direct proof of PQP \Rightarrow Q assumes PP and uses definitions and previously established facts to derive QQ. There is no rhetorical trick — you simply walk forward.

Worked example: if nn is odd, then n2n^2 is odd

Definition. An integer nn is odd iff there exists an integer kk with n=2k+1n = 2k + 1.

Proof. Assume nn is odd. Then n=2k+1n = 2k + 1 for some integer kk. Squaring, n2=(2k+1)2=4k2+4k+1=2(2k2+2k)+1n^2 = (2k+1)^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1. Let m=2k2+2km = 2k^2 + 2k; mm is an integer because integers are closed under addition and multiplication. Therefore n2=2m+1n^2 = 2m + 1, which is the definition of odd. \blacksquare

The shape of every direct proof

Unfold the hypothesis using its definition. Re-fold the conclusion using its definition. Algebra in the middle bridges the two.

Proof by Contrapositive

The contrapositive of PQP \Rightarrow Q is ¬Q¬P\neg Q \Rightarrow \neg P. The two are logically equivalent — proving one proves the other. Sometimes the contrapositive is dramatically easier because ¬Q\neg Q gives you a concrete object to manipulate.

Worked example: if n2n^2 is even, then nn is even

Trying to prove this directly is awkward (square roots are not friendly over the integers). Contrapositive: if nn is odd, then n2n^2 is odd — which is exactly what we just proved. Done. \blacksquare

When to flip

If the hypothesis says "n2n^2 has property X" or "ff is not injective" — some negative or composite condition — flipping often turns it into a constructive statement you can compute with.

Proof by Contradiction

Assume the conclusion is false, derive a contradiction, conclude the conclusion is true. The technique is so old it has a Latin name: reductio ad absurdum.

Worked example 1: 2\sqrt{2} is irrational

Suppose, for contradiction, that 2=p/q\sqrt{2} = p/q with p,qp, q integers having no common factor (we may always reduce to lowest terms). Then 2q2=p22q^2 = p^2, so p2p^2 is even, so pp is even (by contrapositive above), so p=2rp = 2r for some integer rr. Substituting, 2q2=4r22q^2 = 4r^2, so q2=2r2q^2 = 2r^2, so qq is also even. But then pp and qq share the factor 2 — contradicting our choice of lowest terms. \blacksquare

Worked example 2: there are infinitely many primes (Euclid)

Suppose, for contradiction, that the primes are exactly p1,p2,,pnp_1, p_2, \ldots, p_n. Form N=p1p2pn+1N = p_1 p_2 \cdots p_n + 1. NN is greater than every pip_i, so it is not in our list. Yet every integer >1> 1 has a prime factor. That factor divides NN, but each pip_i divides p1pnp_1 \cdots p_n and hence leaves remainder 1 when dividing NN. So NN has a prime factor not on the list — contradiction. \blacksquare

Sketch: the halting problem is undecidable

Suppose, for contradiction, a program H(P,x)H(P, x) exists that returns true iff program PP halts on input xx. Build the diagonal program D(P): if H(P, P) returns true, loop forever; else halt. Now ask: does D(D) halt? If yes, H(D,D)H(D, D) said yes, so D loops — contradiction. If no, H(D,D)H(D, D) said no, so D halts — contradiction. The assumption was wrong; HH cannot exist. \blacksquare

Direct vs. contradiction

A direct proof tells you why the statement holds. A contradiction proof only tells you that it holds. When both are available, prefer the direct one — it is more constructive and usually easier to convert into code.

Mathematical Induction

Induction is the workhorse of algorithm correctness. To prove P(n)P(n) for every integer nn0n \ge n_0:

  1. Base case. Show P(n0)P(n_0) directly.
  2. Inductive hypothesis. Assume P(k)P(k) for an arbitrary kn0k \ge n_0.
  3. Inductive step. Using only the hypothesis and definitions, prove P(k+1)P(k+1).

Picture an infinite line of dominoes: the base case knocks the first one over; the inductive step says each domino topples its neighbor; together every domino falls.

Worked example 1: Gauss's sum, i=1ni=n(n+1)2\sum_{i=1}^{n} i = \frac{n(n+1)}{2}

Base. For n=1n = 1, the LHS is 1 and the RHS is 122=1\frac{1 \cdot 2}{2} = 1. Match.

Hypothesis. Assume i=1ki=k(k+1)2\sum_{i=1}^{k} i = \frac{k(k+1)}{2}.

Step. Then i=1k+1i=i=1ki+(k+1)=k(k+1)2+(k+1)=k(k+1)+2(k+1)2=(k+1)(k+2)2\sum_{i=1}^{k+1} i = \sum_{i=1}^{k} i + (k+1) = \frac{k(k+1)}{2} + (k+1) = \frac{k(k+1) + 2(k+1)}{2} = \frac{(k+1)(k+2)}{2}, which is the n=k+1n = k+1 case. \blacksquare

Worked example 2: 2n>n2^n > n for all n1n \ge 1

Base. 21=2>12^1 = 2 > 1. Holds.

Hypothesis. Assume 2k>k2^k > k.

Step. 2k+1=22k>2k=k+kk+12^{k+1} = 2 \cdot 2^k > 2k = k + k \ge k + 1, where the last inequality uses k1k \ge 1. So 2k+1>k+12^{k+1} > k+1. \blacksquare

The base case is not optional

Without it, you can "prove" absurd things. Try this fake induction: all horses are the same color. Hypothesis: any kk horses are the same color. Step: in any group of k+1k+1 horses, remove one — the remaining kk match; put it back and remove a different one — those kk also match; therefore all k+1k+1 match. The step looks fine. The base case k=1k = 1 trivially holds. The case k=2k = 2 is where the argument fails — you can't form two overlapping groups of size 1 to splice. The lesson: verify the step actually closes the gap from your base.

Quick Check

In the Gauss-sum induction, which line uses the inductive hypothesis?


Strong Induction

Sometimes "P(k)P(k+1)P(k) \Rightarrow P(k+1)" is too narrow a step — you need to lean on multiple smaller cases. Strong induction lets you assume P(j)P(j) for every n0jkn_0 \le j \le k when proving P(k+1)P(k+1). Logically it is equivalent to ordinary induction; tactically it is much more powerful.

Worked example: every integer n2n \ge 2 has a prime factorization

Base. n=2n = 2 is prime; the "factorization" is just 22.

Hypothesis. Assume every integer jj with 2jk2 \le j \le k has a prime factorization.

Step. Consider k+1k+1. If k+1k+1 is prime, we are done. Otherwise k+1=abk+1 = a \cdot b with 2a,bk2 \le a, b \le k. By the strong hypothesis, both aa and bb have prime factorizations; concatenating them gives one for k+1k+1. \blacksquare

Why ordinary induction fails here

We didn't step from kk to k+1k+1; we stepped from aa and bb, both possibly far below kk. That is exactly the leverage strong induction provides.

Structural Induction

When the objects you reason about are not numbers but recursively-defined structures — lists, trees, expressions, programs — ordinary induction on size is awkward. Structural induction matches the structure exactly:

StructureBase case(s)Inductive case
ListsP(empty list)P(xs) ⇒ P(cons x xs)
Binary treesP(empty) and/or P(leaf)P(L) and P(R) ⇒ P(node v L R)
Arithmetic expressionsP(literal), P(variable)P(e1), P(e2) ⇒ P(e1 op e2)
Natural numbers (Peano)P(0)P(n) ⇒ P(succ n)

Worked example: a complete binary tree of nn nodes has height log2n\lfloor \log_2 n \rfloor

Let h(T)h(T) denote the height of TT (single-node tree has height 0). We prove the equivalent claim: a complete binary tree of height hh has between 2h2^h and 2h+112^{h+1} - 1 nodes; therefore h=log2nh = \lfloor \log_2 n \rfloor.

Base. A single-node tree has height 0 and 1 node; 20=11211=12^0 = 1 \le 1 \le 2^1 - 1 = 1. Holds.

Step. Suppose the claim holds for all complete binary trees of height <h< h. A complete tree of height hh has root plus two subtrees, each of height h1h-1 (one possibly missing its last level). By the hypothesis, each subtree has between 2h12^{h-1} and 2h12^h - 1 nodes, so the whole tree has between 1+22h1=2h+11 + 2 \cdot 2^{h-1} = 2^h + 1... wait — the lower bound is achieved when the last level has just one node, giving 2h2^h total. The upper bound 1+2(2h1)=2h+111 + 2(2^h - 1) = 2^{h+1} - 1 matches. Taking logs of 2hn2h+112^h \le n \le 2^{h+1} - 1 gives h=log2nh = \lfloor \log_2 n \rfloor. \blacksquare

Why this matters for DS&A

Every "the height of a balanced tree is logarithmic" argument — AVL trees, red-black trees, B-trees, segment trees — is structural induction on the recursive definition of the tree. Master this template once and you've unlocked an entire chapter's worth of correctness arguments.

Loop Invariants: Induction in Code

A loop invariant is a statement about the program state that is true every time control reaches the loop test. Tony Hoare turned this idea into a method for proving programs correct. Three obligations — an exact mirror of induction:

  1. Initialization (base). The invariant holds the first time we reach the loop test.
  2. Maintenance (step). If the invariant holds before an iteration, it still holds after.
  3. Termination (closing the deal). When the loop exits, the invariant plus the exit condition gives the post-condition we wanted.

Worked example: insertion sort

Invariant: at the start of each iteration with index ii, the prefix A[1..i1]A[1..i-1] is a sorted permutation of the original first i1i-1 elements.

Initialization. Before the first iteration i=2i = 2; A[1..1]A[1..1] is a single element, trivially sorted, and trivially a permutation of itself.

Maintenance. The body shifts elements of A[1..i1]A[1..i-1] larger than A[i]A[i] one position right and inserts A[i]A[i] in the gap. The result is a sorted permutation of the original first ii elements — the invariant for the next iteration with i+1i+1.

Termination. The loop exits at i=n+1i = n+1. The invariant says A[1..n]A[1..n] is a sorted permutation of the original array — precisely the post-condition. \blacksquare

Worked example: binary search invariant

Invariant: if the target exists anywhere in the original array, then it lies in the slice A[lo..hi]A[\text{lo}..\text{hi}].

Initialization. lo=0,hi=n1\text{lo} = 0, \text{hi} = n - 1 covers the whole array; trivially the target, if present, is in the slice.

Maintenance. We compute m=(lo+hi)/2m = \lfloor (\text{lo} + \text{hi})/2 \rfloor and compare. If A[m]=tA[m] = t we're done. If A[m]<tA[m] < t, the array is sorted, so every index m\le m has value A[m]<t\le A[m] < t; the target cannot be there, so it must lie in A[m+1..hi]A[m+1..\text{hi}] — updating lom+1\text{lo} \leftarrow m+1 preserves the invariant. Symmetric argument for A[m]>tA[m] > t.

Termination. Each iteration shrinks hilo\text{hi} - \text{lo} by at least 1 (since either lo\text{lo} jumps past mm or hi\text{hi} falls below mm). When lo>hi\text{lo} > \text{hi}, the slice is empty; combined with the invariant, the target is not in the array. \blacksquare

An invariant is the ghost of an induction inside your loop

If you can write the invariant down clearly, you have already done 80% of the correctness proof. Many senior engineers add invariants directly as assert statements — they're executable specifications.

Interactive: Invariant Inspector

Step through binary search on a sorted array. At every step the cyan band shows the surviving range A[lo..hi]A[\text{lo}..\text{hi}]. The invariant promises that if the target is in the array at all, it is somewhere in cyan. Watch the band shrink; the green panel reaffirms why the invariant still holds.

Invariant Inspector — Binary SearchStep 1 of 5
10lo4172103134165196227mid258289311034113712401343144615hi
Current state
  • lo = 0, hi = 15, mid = 7
  • A[mid] = 22, target = 23
  • Action: A[mid] < target, set lo = mid + 1
Invariant holds
If the target exists in the original array, then it lies inside A[lo..hi]. The cyan cells show that surviving range. The amber cell is the only one we just compared — shrinking lo or hi past it is sound because the array is sorted.

Try a target that isn't in the array (drag the target slider to a value between two cells). The invariant still holds at every step — right up to the final step where the range becomes empty. That empty range is exactly the proof: the contrapositive form of the invariant says if the target isn't in A[lo..hi]A[\text{lo}..\text{hi}], then it isn't in the original array. The algorithm correctly returns "not found."


Termination Proofs

Correctness has two parts: partial correctness (if the algorithm halts, the answer is right) and total correctness (it actually halts). Loop invariants buy you the first; you prove the second with a variant: a quantity that

  1. takes values in a well-founded set (one with no infinite descending chain — e.g. N\mathbb{N}),
  2. strictly decreases on every iteration / recursive call.

Because the values are bounded below and strictly drop, the process must reach the floor — the loop exits, the recursion bottoms out.

Worked example: Euclid's GCD terminates

Recall the recurrence gcd(a,b)=gcd(b,amodb)\gcd(a, b) = \gcd(b, a \bmod b) with base gcd(a,0)=a\gcd(a, 0) = a. Take the variant V(a,b)=bV(a, b) = b. By definition of mod\bmod, 0amodb<b0 \le a \bmod b < b, so the second argument strictly decreases. It is a non-negative integer. The recursion must reach b=0b = 0 in at most b0b_0 steps. \blacksquare

Tighter bound

The variant V=bV = b proves termination but vastly overestimates speed. A finer analysis (Lamé's theorem) shows that two consecutive Euclid steps cut bb by at least a factor of 12\frac{1}{2}, giving O(logmin(a,b))O(\log \min(a, b)) iterations. We'll meet this analysis in the modular-arithmetic section.

Programs that "obviously" terminate but don't

The Collatz function — if nn is even, n/2n/2; else 3n+13n+1 — is conjectured to terminate for every positive integer, but no one has proved it. There is no known well-founded variant. If even something this simple resists a termination proof, do not skip the variant for your real systems.

Disproof by Counterexample

Universal claims ("for all nn...") are disproved by exhibiting a single nn for which the claim fails. Existential claims ("there exists nn...") cannot be disproved this way.

ClaimCounterexampleWhy it kills the claim
Every odd number is prime.9 = 3 · 39 is odd but composite.
n² + n + 41 is prime for every n ≥ 0.n = 40 gives 41².Euler's polynomial eventually fails.
Every continuous function on ℝ is differentiable.f(x) = |x| at 0Continuous, not differentiable.
Greedy interval scheduling by shortest interval is optimal.Three intervals: [1, 4], [3, 5], [4, 7]Greedy picks [3, 5]; optimal picks [1, 4] and [4, 7].
Hashing always gives O(1) lookup.Adversarial keys colliding in one bucketLinear scan on collision chain.

In algorithm design, counterexamples are how you reject a plausible-looking approach before you waste a week implementing it. Whenever your gut says "greedy should work here," try to break it on small inputs first. A counterexample on an array of length 4 saves you a debugging session at scale.


Pigeonhole Principle

Pigeonhole. If n+1n+1 objects go into nn boxes, some box gets at least two.

Generalized. If mm objects go into nn boxes, some box gets at least m/n\lceil m/n \rceil.

It looks trivial; it is devastatingly powerful. A few algorithmic uses:

  • Hash collisions are inevitable. Any hash function from an unbounded key space into a finite table has, by pigeonhole, infinitely many collisions. There is no "perfect" general-purpose hash.
  • Floyd's cycle detection. A function ff on a finite set, iterated, must eventually revisit a value — the iterates are pigeons, the set is the boxes. This is why the "tortoise and hare" algorithm works.
  • Finite-state machines repeat. A DFA with kk states reading a string of length >k> k must visit some state twice — the foundation of the pumping lemma.
  • Birthday bound. Among 23 random people, two probably share a birthday. The "n\sqrt{n}-collision" bound for hashing is the same calculation.

Mini-proof: any 5 points in a unit square include two within distance 22\frac{\sqrt{2}}{2}

Partition the unit square into four 12×12\frac{1}{2} \times \frac{1}{2} sub-squares. Five points into four sub-squares — by pigeonhole, some sub-square holds at least two. Any two points in a sub-square are within its diagonal 22\frac{\sqrt{2}}{2}. \blacksquare


Adversary Arguments

How do you prove that no algorithm can do better than X? You play the adversary: you imagine an opponent who answers each query the algorithm makes in the worst possible (but consistent) way. If the adversary can force the algorithm into f(n)f(n) work, then f(n)f(n) is a lower bound for the problem.

Sketch: comparison-based sorting requires Ω(nlogn)\Omega(n \log n) comparisons

Any deterministic comparison sort can be drawn as a decision tree: each internal node is a comparison aiaja_i \lessgtr a_j, each leaf is one of the n!n! possible permutations. To distinguish all permutations, the tree must have at least n!n! leaves. A binary tree with LL leaves has height at least log2L\log_2 L. Therefore the worst-case number of comparisons is at least log2(n!)=k=1nlog2k=Θ(nlogn)\log_2(n!) = \sum_{k=1}^{n} \log_2 k = \Theta(n \log n) by Stirling's approximation. \blacksquare

Why this matters

Mergesort, heapsort, and randomized quicksort all hit O(nlogn)O(n \log n). The decision-tree argument tells us we cannot beat that with comparisons — if you want O(n)O(n) sorting, you must look outside the comparison model (radix sort, counting sort, bucket sort).

Probabilistic Arguments

The probabilistic method proves that an object with property XX exists by showing that a randomly drawn object has property XX with positive probability. Strange — you never construct the object. But if it would exist with probability >0> 0, then by the definition of probability there must be at least one in the sample space.

For algorithm analysis, the friendlier cousin is expected-value reasoning. Randomized quicksort, with a uniformly random pivot, runs in expected time E[T(n)]=O(nlogn)E[T(n)] = O(n \log n) even though its worst case is O(n2)O(n^2). The proof shows that any pair of elements is compared at most once and computes the expected number of comparisons by linearity of expectation:

E[comparisons]=i<jPr[ai and aj are compared]=i<j2ji+1=O(nlogn)E[\text{comparisons}] = \sum_{i < j} \Pr[a_i \text{ and } a_j \text{ are compared}] = \sum_{i < j} \frac{2}{j - i + 1} = O(n \log n).

Randomization defeats adversaries

A deterministic algorithm has a fixed worst-case input: the adversary picks it. A randomized algorithm makes its choices after the input is fixed, so the adversary cannot tailor the input to the algorithm's coin flips. This is exactly why random pivots, random hash seeds, and randomized routing buy practical robustness.

We now write the algorithm out and annotate every meaningful line with its role in the proof. Python first; TypeScript second with explicit invariant assertions.

Binary search with the invariant proof inline
🐍binary_search.py
1Function signature

Inputs are a sorted array A and a target value. The docstring records the pre- and post-conditions; without the pre-condition (sorted), the invariant cannot be maintained.

6Initialize the surviving range

lo and hi span the full array. This is the *initialization* step of the loop invariant: any target value, if present, lies somewhere in A[0..n-1].

EXAMPLE
For A = [1, 4, 7, 9, 12], lo = 0 and hi = 4.
8Invariant comment

Always write the invariant down. It is the loop's contract with the rest of the program. Many engineers turn this into `assert` statements that run during testing.

9Loop test (well-founded variant)

We continue while the range is non-empty. The variant is hi - lo: it is a non-negative integer that strictly decreases on every iteration (proved on lines 13 and 15). Therefore the loop terminates.

10Compute mid (overflow-safe in Python)

Python integers are arbitrary precision, so (lo + hi) // 2 is fine. In C/Java/Rust use lo + (hi - lo) // 2 to avoid integer overflow when both are large.

EXAMPLE
lo = 0, hi = 4 → mid = 2.
11Found case

The post-condition holds trivially: A[mid] == target, so target is in A and we return its index. No further reasoning needed.

13Maintenance: target must be to the right

Because A is sorted, every index ≤ mid has value ≤ A[mid] < target. The target cannot be in A[lo..mid]. So if it is anywhere in A[lo..hi], it is in A[mid+1..hi]. Setting lo = mid + 1 preserves the invariant.

EXAMPLE
A[2] = 7, target = 9 → narrow to A[3..4].
15Maintenance: target must be to the left

Symmetric: every index ≥ mid has value ≥ A[mid] > target. Setting hi = mid - 1 preserves the invariant.

16Variant decreases

Either lo jumped past mid (so lo strictly increased) or hi fell below mid (so hi strictly decreased). Either way, hi - lo dropped by at least 1. Termination guaranteed.

18Termination + invariant ⇒ post-condition

We exit when lo > hi, i.e. the range A[lo..hi] is empty. Combined with the invariant 'if target is in A, it is in A[lo..hi]', target is not in A. Returning -1 satisfies the post-condition.

9 lines without explanation
1def binary_search(A, target):
2    """Return an index i with A[i] == target, or -1 if not found.
3       Pre:  A is sorted in non-decreasing order.
4       Post: Returns -1 iff target not in A.
5    """
6    lo, hi = 0, len(A) - 1
7
8    # INVARIANT: if target is in A, then it is in A[lo..hi].
9    while lo <= hi:
10        mid = (lo + hi) // 2          # midpoint of surviving range
11        if A[mid] == target:
12            return mid                # post-condition trivially holds
13        elif A[mid] < target:
14            lo = mid + 1              # invariant preserved (sorted ⇒ left half ruled out)
15        else:
16            hi = mid - 1              # invariant preserved (sorted ⇒ right half ruled out)
17        # VARIANT (hi - lo) strictly decreased; well-founded in N.
18
19    return -1                         # range empty + invariant ⇒ target absent

The same algorithm in TypeScript, with the invariants written as runtime assertions you can keep in development builds:

Same algorithm in TypeScript with executable invariants
📘binarySearch.ts
1Readonly input enforces &quot;sorted&quot; can&apos;t be broken mid-call

Marking A as `readonly number[]` is a tiny lightweight proof: the type system guarantees nobody mutates A while we execute, so the sorted property survives.

3Pre-condition assertion

We verify the array is actually sorted before relying on that fact. In production builds, console.assert is a no-op; in development it screams immediately if a caller hands us garbage.

EXAMPLE
Catches bugs like binarySearch([3, 1, 4]) early.
11Inline invariant assertion

Express the invariant as an executable predicate: either the surviving slice contains the target, or the target wasn't in A to begin with. This is *only* for instrumentation — the algorithm itself doesn't depend on it.

16Overflow-safe midpoint

lo + ((hi - lo) >> 1) avoids the famous bug in Joshua Bloch's binary search post (Java's mid = (lo + hi) / 2 overflowed for large arrays). The shift is integer division by 2.

17Found

Same as the Python version. The proof obligation is discharged immediately.

18Maintenance step (right half)

lo = mid + 1 preserves the invariant. The variant hi - lo dropped by mid - lo + 1 ≥ 1.

19Maintenance step (left half)

hi = mid - 1 preserves the invariant. The variant dropped by hi - mid + 1 ≥ 1.

21Empty range ⇒ not found

Termination + invariant gives the post-condition. The proof is complete.

14 lines without explanation
1function binarySearch(A: readonly number[], target: number): number {
2  // Pre-condition: A is sorted non-decreasing.
3  for (let i = 1; i < A.length; i++) {
4    console.assert(A[i - 1] <= A[i], "pre: A must be sorted");
5  }
6
7  let lo = 0;
8  let hi = A.length - 1;
9
10  while (lo <= hi) {
11    // Invariant: target ∈ A ⇒ target ∈ A[lo..hi].
12    const containsTarget =
13      A.slice(lo, hi + 1).includes(target) || !A.includes(target);
14    console.assert(containsTarget, "invariant violated");
15
16    const mid = lo + ((hi - lo) >> 1);   // overflow-safe midpoint
17    if (A[mid] === target) return mid;
18    if (A[mid] < target) lo = mid + 1;
19    else hi = mid - 1;
20  }
21  return -1;
22}

Quick Check

Suppose a colleague rewrites the recursive call as `lo = mid` instead of `lo = mid + 1`. What breaks?


Case Study: Euclid's GCD

Two proof obligations: the algorithm computes the right thing (partial correctness via the recurrence gcd(a,b)=gcd(b,amodb)\gcd(a, b) = \gcd(b, a \bmod b)), and it terminates (well-founded variant bb).

Euclidean GCD with both proofs annotated
🐍gcd.py
1Signature with explicit integer types

Type hints document the precondition the proof needs: integers (the modular operation must be defined) and non-negative (so b lives in the well-founded set ℕ).

EXAMPLE
gcd(48, 18) → 6.
5Pre-condition

Both inputs non-negative; both can't be zero (gcd(0, 0) is undefined). We could check this with assertions in production code.

6Loop test on the variant

We continue while b > 0. The exit condition b == 0 is exactly the base case of the mathematical recurrence, which is no coincidence.

8The single mathematical step

By definition of `mod`, 0 ≤ a % b < b. So the *new* value of b is strictly less than the *old* value of b — the variant strictly decreases. Since b lives in ℕ, the loop must reach 0.

EXAMPLE
(48, 18) → (18, 12) → (12, 6) → (6, 0).
9Return value justified by the recurrence

When b = 0, the recurrence gives gcd(a, 0) = a directly. The return value is the answer.

4 lines without explanation
1def gcd(a: int, b: int) -> int:
2    """Greatest common divisor of a, b ≥ 0, not both zero.
3       Recurrence: gcd(a, 0) = a; gcd(a, b) = gcd(b, a % b).
4    """
5    # Pre: a ≥ 0, b ≥ 0, (a, b) ≠ (0, 0).
6    while b != 0:
7        # Variant: b is a non-negative integer; we prove it strictly decreases.
8        a, b = b, a % b              # 0 ≤ a % b < b ⇒ new b strictly less than old b
9    return a                         # b == 0 ⇒ result is a (base case of the recurrence)

The TypeScript version is essentially the same; the proof obligations transfer verbatim. We use BigInt here because the algorithm is most useful for cryptographic-sized integers, where ordinary 64-bit arithmetic would overflow.

GCD on arbitrary-precision integers
📘gcd.ts
1BigInt for cryptographic sizes

RSA key generation, ECC arithmetic, and Diffie-Hellman all rely on gcd over very large integers. Using BigInt closes off a whole class of overflow bugs.

2Reject negative inputs

Our termination proof relied on b ∈ ℕ. Negative b would still terminate in JavaScript (because of how % is defined for negatives) but the *proof* doesn't cover that case, so we refuse the input.

3Reject (0, 0)

gcd(0, 0) has no answer — every integer divides both. We make the partial-function status explicit.

6Compute the new remainder

By the modulo definition for non-negative integers, 0 ≤ r < b. The variant b drops to r — strict decrease, so the loop terminates after at most O(log min(a, b)) iterations.

EXAMPLE
gcd(48n, 18n) iterations: (48, 18) → (18, 12) → (12, 6) → (6, 0).
8Update both variables

TypeScript does not have Python's tuple-swap shorthand, so we use a temporary `r`. The algorithmic content is identical.

6 lines without explanation
1function gcd(a: bigint, b: bigint): bigint {
2  if (a < 0n || b < 0n) throw new Error("gcd requires non-negative inputs");
3  if (a === 0n && b === 0n) throw new Error("gcd(0, 0) is undefined");
4
5  while (b !== 0n) {
6    const r = a % b;        // 0 ≤ r < b — this is the variant decrease
7    a = b;
8    b = r;                  // new b strictly less than old b
9  }
10  return a;
11}

Proofs in Industry

These techniques are not academic decoration. They underpin real systems whose failure would be catastrophic.

SystemProof technologyWhy it matters
Amazon Web Services (S3, DynamoDB, EBS)TLA+ specifications + model checkingEngineers caught subtle distributed-protocol bugs (replica divergence, lost writes) that no test suite would have found before launch.
seL4 microkernelIsabelle/HOL machine-checked proofFirst general-purpose OS kernel with a complete formal proof of functional correctness (~ 200,000 lines of proof for ~9,000 lines of C).
CompCert C compilerCoq proof of semantic preservationProves the compiled assembly behaves the same as the source C — eliminating a category of safety-critical compiler bugs (used in Airbus avionics).
TLS 1.3 (Project Everest)F* proof of cryptographic securityVerified TLS implementation now shipping in Firefox, Wireguard, and Azure.
Rust borrow checkerLinear-types proof system at the type-checker level&quot;If it compiles, it has no data races&quot; — a static proof discharged automatically every build.
TypeScript / Flow type systemsCurry-Howard correspondence (lightweight)Every type error is a tiny rejected proof attempt; every successful compile is a small theorem about your data flow.
Financial-trading risk systemsLoop-invariant audits, model checkingRace conditions in matching engines = market-moving losses. Invariant proofs are how desks sleep at night.
Blockchain smart contractsSymbolic execution (Manticore, Mythril) + Z3Reentrancy, integer-overflow, and access-control bugs cost over $3B in 2022 alone; formal methods are the strongest defence.

The throughline: where the cost of failure is high enough, machine-checked proof is cheaper than testing. Even when full proof is overkill, the local techniques — an invariant comment, a termination argument, a sketch of why your greedy is optimal — are the difference between code that almost works and code you can ship.


Common Pitfalls

1. The base case is wrong (or trivially picked)

The horse-color paradox above is the canonical example: a base case that doesn't actually let the inductive step take its first non-trivial bite. Always test the step at n=n0+1n = n_0 + 1 manually, not just at n0n_0.

2. Confusing weak and strong induction

If your inductive step needs facts about cases much smaller than kk (recursing on halves, factors, sub-trees), you need strong induction. Trying to force the argument into ordinary induction will leave a hole.

3. "Proof" by example

Verifying an identity for n=1,2,3,4,5n = 1, 2, 3, 4, 5 proves nothing about n=6n = 6. Polya's prime-counting conjecture L(n)0L(n) \le 0 held for the first 101410^{14} integers and turned out to be false. Examples build intuition; only a proof closes the gap.

4. Off-by-one in the inductive step

The most common bug: assuming P(k)P(k) and accidentally proving P(k)P(k) again, not P(k+1)P(k+1). Annotate which index you are reasoning about at every line.

5. Forgetting termination

Every loop and every recursive call needs some argument for why it ends. "The variable seems to get smaller" isn't enough — identify the variant explicitly. Real-world example: the 2003 Northeast blackout was triggered partly by an alarm system stuck in an infinite race-condition loop.

6. Confusing necessary and sufficient

PQP \Rightarrow Q is not the same as QPQ \Rightarrow P. "If it's a square, it's a rectangle" doesn't mean "rectangles are squares." Half the bad correctness arguments collapse on this point.

7. Universal vs. existential mix-up

To disprove "n,P(n)\forall n, P(n)" one counterexample suffices. To disprove "n,P(n)\exists n, P(n)" you must prove the universal negation — one counterexample is not enough.

Quick Check

A student writes: "I tested my new sorting algorithm on 10,000 random arrays and it always sorted correctly. Therefore it's correct." What proof technique would have actually established correctness, and what is the precise problem with the student's argument?


Exercises

Warm-up

  1. Prove by induction that i=1n(2i1)=n2\sum_{i=1}^{n} (2i - 1) = n^2.
  2. Prove by contradiction that 3\sqrt{3} is irrational.
  3. Disprove: "every quadratic ax2+bx+cax^2 + bx + c with integer coefficients is reducible over the integers." (Give a counterexample.)
  4. Identify the variant for the loop while n > 1: n = n // 2. Why does it terminate?

Loop invariants

  1. Linear search for an element in an unsorted array. State the invariant. Prove initialization, maintenance, and termination.
  2. Selection sort. State an invariant strong enough to prove the post-condition ("A is sorted"). Prove all three obligations.
  3. The two-pointer algorithm for "does a sorted array AA contain two elements summing to ss?" State and prove an invariant relating the pointers i,ji, j to the search space.

Termination

  1. Find a variant for the recursion f(n) = f(n // 2) + f(n // 2) with base f(0) = 1. Why does the well-founded relation work even though we make two recursive calls?
  2. The recursion ack(m, n) for the Ackermann function calls itself with smaller pairs in the lexicographic order on N2\mathbb{N}^2. State the variant precisely and explain why N2\mathbb{N}^2 with this order is well-founded.

Structural induction

  1. Prove by structural induction on binary trees that for every tree, #leaves=#internal nodes+1\#\text{leaves} = \#\text{internal nodes} + 1 (assuming every internal node has exactly two children).
  2. Prove that the in-order traversal of a binary search tree visits keys in non-decreasing order. (Hint: induction on tree structure; carefully state what you assume about the left and right subtrees.)

Pigeonhole and adversary

  1. Prove: any sequence of 11 distinct integers contains either an increasing subsequence of length 4 or a decreasing subsequence of length 4. (Generalization of Erdős–Szekeres.)
  2. Adversary lower bound: prove that finding the maximum of nn elements requires at least n1n - 1 comparisons. Sketch the adversary's strategy.

Challenge

  1. State and prove the loop invariant for the in-place partition step of quicksort (the Lomuto scheme). Use it to prove that, after partition, the pivot ends up in its final sorted position.
  2. Prove that any reachable state of the readers-writers concurrent algorithm (with at most one writer or any number of readers) satisfies the safety invariant. (You will need to enumerate the program states and show every transition preserves the invariant.)

How to attack a proof exercise

Write down the precondition, the postcondition, and (for loops) the candidate invariant before you write any algebra. Most failures are vague specifications, not failed manipulations.

With proof techniques in hand, every algorithm we meet from now on comes with a built-in question: what is the invariant, and why does it terminate? Answer those two and you have understood the algorithm at the level a senior engineer is expected to.

Loading comments...