Chapter 1
15 min read
Section 2 of 261

Problem Decomposition Strategies

Introduction to Algorithmic Thinking

The Core Idea: Reduce, Then Solve

Algorithmic thinking is, at its heart, the art of not solving the problem you were given. Instead, you transform that problem — usually too large or too tangled to attack head-on — into one or more strictly easier instances whose answers can be assembled into the answer you actually want. Decomposition is the lever; recursion, induction, and the asymptotics of T(n)T(n) are the machinery that makes the lever safe.

Formally, given a problem instance PP of size nn, we seek a function decompose(P){P1,P2,,Pa}\text{decompose}(P) \to \{P_1, P_2, \dots, P_a\} producing a0a \geq 0 subproblems each of size strictly smaller than nn, together with a combining function combine(P,ans1,,ansa)ans\text{combine}(P, \text{ans}_1, \dots, \text{ans}_a) \to \text{ans}. When a=0a = 0, PP is a base case whose answer is known directly.

Strategic question. “What is the smallest move I can make on this problem that turns it into a problem I either already know how to solve, or that has the same shape but is strictly easier?”

Almost every classical algorithm — merge sort, binary search, FFT, Dijkstra, Strassen's matrix multiplication, dynamic programming over a DAG — is the answer to that question for some specific PP. The strategies below are the recurring shapes the answer takes.


The Five Decomposition Strategies

Decomposition strategies are not disjoint — many real algorithms are hybrids — but each captures a distinct structural insight. The characteristic differences are: how many subproblems are produced, how much smaller they are, whether they overlap, and whether the input itself must first be transformed.

StrategySubproblemsRecurrence shapeCanonical example
Divide & Conquer≥ 2, independent, disjointT(n) = a·T(n/b) + f(n)Merge sort, FFT, Strassen
Decrease & ConquerExactly 1, slightly smallerT(n) = T(n−c) + f(n) or T(n/c) + f(n)Binary search, Euclid GCD, insertion sort
Transform & ConquerSame problem, easier representationT(n) = transform(n) + solve(n')Heapsort (build-heap), Gaussian elimination, hashing
Brute Force / ExhaustiveAll candidate solutionsT(n) = |candidates| · check(n)Try-all-subsets, generate-and-test
Dynamic ProgrammingMany, overlappingT(n) = #states · transition costFibonacci, LCS, edit distance, Bellman-Ford

1. Divide and Conquer

Split the input into a2a \geq 2 independent pieces, recursively solve each, then merge. The hallmark is independence: subproblem answers do not depend on one another, so the recursive calls can in principle run in parallel. Merge sort splits an array of size nn into halves of size n/2n/2 and merges them in linear time, yielding T(n)=2T(n/2)+Θ(n)=Θ(nlogn)T(n) = 2T(n/2) + \Theta(n) = \Theta(n \log n). The Fast Fourier Transform splits a length-nn signal into even-indexed and odd-indexed halves, recurses, and combines with twiddle factors in Θ(n)\Theta(n). Strassen multiplies two n×nn \times n matrices using 7 recursive multiplications of half-sized matrices instead of the naive 8, giving Θ(nlog27)Θ(n2.807)\Theta(n^{\log_2 7}) \approx \Theta(n^{2.807}).

2. Decrease and Conquer

Make the problem one step smaller — reduce by a constant (usually 1), or by a constant factor. There is exactly one recursive call. Binary search shrinks the candidate range by half each step (decrease by factor); the Euclidean algorithm replaces gcd(a,b)\gcd(a, b) with gcd(b,amodb)\gcd(b, a \bmod b), decreasing by a variable but provably O(logmin(a,b))O(\log \min(a,b)) factor; insertion sort sorts A[0..n1]A[0..n-1] by sorting A[0..n2]A[0..n-2] and inserting A[n1]A[n-1] in place (decrease by 1).

3. Transform and Conquer

Don't shrink the input — change its representation so that the original problem becomes easy. Three sub-flavors:

  1. Instance simplification — presort, then run a simpler algorithm. Finding the median of a sorted array is O(1)O(1).
  2. Representation change — heapsort first builds a binary heap from the input array (O(n)O(n)), then repeatedly extracts the max (O(logn)O(\log n) each). The tree-shaped heap turns “find max” from a linear scan into a constant-time peek.
  3. Problem reduction — map your problem to one with a known solver. Counting inversions reduces to a tweaked merge sort; shortest-path on a DAG reduces to topological order plus relaxation; maximum-flow reduces to linear programming.

Enumerate every candidate solution and test each. Almost always suboptimal, but invaluable as: a baseline for measuring speedups; a correctness oracle for randomized testing of optimized algorithms; the only known approach for many NP-hard problems at small nn (e.g., the traveling salesman with n20n \leq 20). The decomposition is trivial — “try this candidate, then try the rest” — but the candidate space has size 2n2^n, n!n!, or worse.

5. Dynamic Programming / Memoization

Use this when subproblems overlap — that is, when a naive recursion would solve the same subproblem an exponential number of times. The naive recursive Fibonacci F(n)=F(n1)+F(n2)F(n) = F(n-1) + F(n-2) recomputes F(n2)F(n-2) twice, F(n3)F(n-3) three times, and F(k)F(k) roughly ϕnk\phi^{n-k} times where ϕ1.618\phi \approx 1.618. Caching each subproblem the first time it's solved collapses the exponential tree into a DAG of size O(n)O(n). The same trick rescues longest common subsequence (O(nm)O(nm)), edit distance, knapsack, matrix-chain multiplication, and the Bellman equation in reinforcement learning.

Definition: Overlapping Subproblems

A recursive decomposition has overlapping subproblems if the recursion tree contains two distinct nodes that represent the same subproblem instance. Equivalently, the set of distinct subproblems is polynomial in nn while the recursion tree is exponential. This gap is what memoization closes.

The Decomposition Tree

Every recursive decomposition induces a tree: the root is the original problem, the children of any node are the subproblems produced by decompose\text{decompose}, and the leaves are base cases. The shape of this tree is the geometry of the algorithm.

StrategyTree shapeDepthLeaves
Divide & Conquer (merge sort)Balanced binarylog₂ nn
Decrease & Conquer (linear)A long stickn1
Decrease & Conquer (halving)A long sticklog₂ n1
Brute force (subsets)Balanced binary, fulln2ⁿ
DP after memoizationDAG, not a treedependsO(states)

The total cost of an algorithm is nodeswork-at-that-node\sum_{\text{nodes}} \text{work-at-that-node}. Equivalently, depth×work-per-level\text{depth} \times \text{work-per-level} when work is uniform across a level. Merge sort has depth log2n\log_2 n and Θ(n)\Theta(n) work per level → Θ(nlogn)\Theta(n \log n). A naive recursive Fibonacci has depth nn and a roughly doubling number of nodes per level → Θ(ϕn)\Theta(\phi^n).


Recurrences: The Algebra of Decomposition

A recurrence is an equation (or inequality) that expresses the running time of a recursive algorithm on input of size nn in terms of its running time on smaller inputs. It is the algebraic form of the decomposition tree.

The most-cited general form is the divide-and-conquer recurrence:

T(n)  =  aT(n/b)  +  f(n),a1,    b>1.\quad T(n) \;=\; a \cdot T(n/b) \;+\; f(n), \qquad a \geq 1, \;\; b > 1.

Read this as: “solving a problem of size nn costs aa recursive solves on subproblems of size n/bn/b, plus f(n)f(n) non-recursive work to split and combine.” Specific instances:

Algorithmabf(n)Closed form for T(n)
Binary search12Θ(1)Θ(log n)
Merge sort22Θ(n)Θ(n log n)
Karatsuba mult.32Θ(n)Θ(n^{log₂ 3}) ≈ Θ(n^{1.585})
Strassen mult.72Θ(n²)Θ(n^{log₂ 7}) ≈ Θ(n^{2.807})
Naive matrix mult.82Θ(n²)Θ(n³)

The Master Theorem reads off T(n)T(n) directly from (a,b,f)(a, b, f) by comparing f(n)f(n) with nlogban^{\log_b a}. We'll prove and apply it in Chapter 2; for now the take-away is that decomposition shape determines asymptotic running time, not constants and not implementation tricks.

Why this matters

Two algorithms with the same big-O can hide a factor-of-100 constant gap, but two algorithms with different decomposition shapes (say Θ(n2)\Theta(n^2) brute force vs. Θ(nlogn)\Theta(n \log n) divide-and-conquer) diverge by factors that grow without bound. Pick the right shape first; then worry about constants.

Quick Check

An algorithm solves a size-n problem by making 4 recursive calls on subproblems of size n/2, plus Θ(n²) combining work. Which closed form is correct?


Worked Example 1: Sum of an Array

Decrease-and-conquer in its purest form. To sum A[0..n1]A[0..n-1], recursively sum A[1..n1]A[1..n-1] and add A[0]A[0]. Base case: an empty slice sums to 0.

Decrease-and-conquer sum (one recursive call, problem shrinks by 1)
🐍array_sum.py
1Function signature

Takes the array a and an index i marking the start of the suffix we still need to sum. Default i=0 means 'sum the whole array'.

EXAMPLE
Call: array_sum([3, 1, 4, 1, 5]) → i defaults to 0
3Base case test

If i has walked past the last valid index, the suffix A[i..n-1] is empty. The sum of an empty sequence is 0 — return immediately, no recursion.

EXAMPLE
For a = [3, 1, 4, 1, 5] (length 5), this triggers when i = 5
4Return zero — anchor of the induction

This is what makes the whole tower of recursive calls eventually terminate and start unwinding.

7The recursive call: trust it works

Compute the sum of the strictly smaller suffix starting at i+1. By the leap of faith, this returns the correct sum of A[i+1..n-1].

EXAMPLE
If i = 0 and a = [3, 1, 4, 1, 5], rest = array_sum(a, 1) = 1 + 4 + 1 + 5 = 11
8Combine: add the element we held back

We held a[i] out of the recursive call, so we add it now. This is the 'combine' step of the universal template.

EXAMPLE
Returns a[0] + rest = 3 + 11 = 14
3 lines without explanation
1def array_sum(a, i=0):
2    # Base case: index past the last element -> empty suffix
3    if i == len(a):
4        return 0
5
6    # Recursive case: A[i] + sum of A[i+1..n-1]
7    rest = array_sum(a, i + 1)
8    return a[i] + rest

The recurrence is T(n)=T(n1)+Θ(1)T(n) = T(n-1) + \Theta(1), which solves to T(n)=Θ(n)T(n) = \Theta(n). Stack depth is also Θ(n)\Theta(n) — for n=106n = 10^6 on CPython this will hit the recursion limit, a textbook reason to prefer iteration when the decomposition is linear.


Worked Example 2: Maximum Subarray (Three Decompositions)

Given an array A[0..n1]A[0..n-1] of integers (possibly negative), find the maximum sum achievable by any contiguous subarray. Example: for A=[2,1,3,4,1,2,1,5,4]A = [-2, 1, -3, 4, -1, 2, 1, -5, 4] the answer is 6, attained by the subarray [4,1,2,1][4, -1, 2, 1]. This problem is famous because it admits three textbook decompositions of strikingly different cost, making it ideal for studying how decomposition shape drives running time.

Decomposition A: Brute Force — Θ(n3)\Theta(n^3)

Enumerate every pair (i,j)(i, j) with 0ij<n0 \leq i \leq j < n, sum A[i..j]A[i..j] from scratch, take the max. Three nested loops: (n2)\binom{n}{2} pairs and a sum of length up to nn for each.

🐍python
1def max_subarray_brute(a):
2    n = len(a)
3    best = a[0]
4    for i in range(n):
5        for j in range(i, n):
6            s = sum(a[i:j+1])     # O(j - i + 1) work
7            if s > best:
8                best = s
9    return best

Decomposition B: Divide and Conquer — Θ(nlogn)\Theta(n \log n)

Split AA at the midpoint mm. The optimal subarray either (i) lies entirely in the left half, (ii) lies entirely in the right half, or (iii) crosses mm. Cases (i) and (ii) are recursive calls; case (iii) is the only one we have to handle directly — sweep outward from mm in both directions tracking the best suffix sum to the left and best prefix sum to the right.

🐍python
1def max_subarray_dc(a, lo, hi):
2    if lo == hi:                           # base case: single element
3        return a[lo]
4    m = (lo + hi) // 2
5
6    # case (i) and (ii): entirely on one side
7    best_left  = max_subarray_dc(a, lo, m)
8    best_right = max_subarray_dc(a, m + 1, hi)
9
10    # case (iii): crosses m — best suffix of A[lo..m] + best prefix of A[m+1..hi]
11    s = 0; left_max = float("-inf")
12    for k in range(m, lo - 1, -1):
13        s += a[k]
14        left_max = max(left_max, s)
15    s = 0; right_max = float("-inf")
16    for k in range(m + 1, hi + 1):
17        s += a[k]
18        right_max = max(right_max, s)
19
20    return max(best_left, best_right, left_max + right_max)

The cross-case sweep is Θ(n)\Theta(n), giving the recurrence T(n)=2T(n/2)+Θ(n)T(n) = 2T(n/2) + \Theta(n) Θ(nlogn)\Theta(n \log n).

Decomposition C: Kadane (DP) — Θ(n)\Theta(n)

Define B[i]B[i] as the maximum sum of a contiguous subarray ending exactly at index ii. Then either that subarray is the single element A[i]A[i] alone, or it extends the best subarray ending at i1i-1:

B[i]  =  max(A[i],  A[i]+B[i1]).\quad B[i] \;=\; \max\bigl(\,A[i],\ \ A[i] + B[i-1]\,\bigr).

The answer is maxiB[i]\max_i B[i]. Because each B[i]B[i] depends only on B[i1]B[i-1] we keep one rolling scalar. This is Kadane's algorithm, and it's the textbook example of how transforming a brute-force search into a DP with overlapping-subproblems structure collapses Θ(n3)\Theta(n^3) down to Θ(n)\Theta(n).

Lesson from this example

Same problem, three decompositions, three asymptotic regimes. The gap between brute force (n=104n = 10^4 takes ~1 s) and Kadane (n=109n = 10^9 takes ~1 s) is five orders of magnitude. Decomposition is not stylistic; it is the single largest knob you control.

Kadane in Two Languages

First, pure Python from scratch — no libraries, no tricks — so the algorithm is naked.

Kadane: O(n) max subarray sum
🐍kadane.py
1Function signature

Input is a list of numbers (any sign). Returns the maximum sum achievable by any contiguous, non-empty subarray.

EXAMPLE
kadane([-2, 1, -3, 4, -1, 2, 1, -5, 4]) → 6
3Initialize the rolling DP value

cur_end plays the role of B[i] in the recurrence — the best subarray sum ending exactly at position i. We seed it with B[0] = A[0] (a single-element subarray is the only option ending at index 0).

EXAMPLE
For a = [-2, 1, -3, 4, ...], cur_end starts at -2
4Initialize the global answer

best holds max over all i of B[i] seen so far. Must also start at a[0] — handles arrays where every element is negative (then the answer is the largest element).

EXAMPLE
For a = [-2, 1, -3, 4, ...], best starts at -2
6Sweep i from 1 to n-1

We process each index in order. Because B[i] only needs B[i-1], one left-to-right pass is enough — no array required.

8The DP recurrence

Two choices for the best subarray ending at i: (a) start fresh with just A[i], or (b) extend the best one ending at i-1 by appending A[i]. We pick whichever sum is larger.

EXAMPLE
i=1, a[1]=1, cur_end was -2 → max(1, -2+1) = max(1, -1) = 1. cur_end becomes 1.
10Update the running global best

Compare the new best-ending-at-i with the global best; keep the larger.

EXAMPLE
i=3, a[3]=4, cur_end was -3 → cur_end = max(4, -3+4) = 4. best = max(1, 4) = 4.
12Return the global best

After the loop, best = max over all i of B[i]. For our example, the running cur_end values are [-2, 1, -2, 4, 3, 5, 6, 1, 5] and best = 6.

EXAMPLE
Returns 6 for the input [-2, 1, -3, 4, -1, 2, 1, -5, 4]
5 lines without explanation
1def kadane(a):
2    # B[i] = max subarray sum ENDING at index i (rolling scalar)
3    cur_end = a[0]
4    best    = a[0]
5
6    for i in range(1, len(a)):
7        # Extend the previous best-ending-here, OR restart at A[i] alone
8        cur_end = max(a[i], cur_end + a[i])
9        # Track the global best across all i
10        best    = max(best, cur_end)
11
12    return best

Now a vectorized PyTorch version — useful when you have a batch of sequences (e.g., per-instrument returns in a finance pipeline, or per-channel anomaly scores in time-series models) and want to fuse the sweep into a single GPU kernel via cummax\text{cummax} on prefix sums.

Kadane via prefix sums + cummin (vectorized, batched)
🐍kadane_torch.py
3Function signature

Takes a 2-D tensor of shape (B, n) — B sequences each of length n. Returns one scalar per sequence: the maximum subarray sum.

EXAMPLE
Input shape (4, 1000) → output shape (4,)
6Prefix sum

prefix[j] = a[0] + a[1] + ... + a[j-1] (after we prepend zero in line 8). The classic identity is sum(A[i..j]) = prefix[j+1] - prefix[i].

EXAMPLE
a = [3, -2, 5] → prefix = [3, 1, 6]
7Build a leading-zero tensor

We need prefix[0] = 0 so that 'subarray starting at index 0' fits the prefix-difference identity uniformly.

8Concatenate the zero in front

Now prefix has length n+1: prefix[0] = 0, prefix[k] = a[0] + ... + a[k-1]. This makes the identity sum(A[i..j]) = prefix[j+1] - prefix[i] hold for ALL valid i, j including i = 0.

EXAMPLE
a = [3, -2, 5] → prefix = [0, 3, 1, 6]
11Running minimum of prefix

run_min[k] = min(prefix[0], prefix[1], ..., prefix[k]). Why? sum(A[i..j]) is maximized when we subtract the smallest prefix[i] seen up to position j+1. cummin gives that minimum cheaply in one fused kernel.

EXAMPLE
prefix = [0, 3, 1, 6] → run_min = [0, 0, 0, 0]
14Per-position best subarray ending here

diffs[j] = prefix[j+1] - run_min[j] = max sum of a subarray that ENDS at index j. This is exactly Kadane's B[j], computed for the whole batch in parallel.

EXAMPLE
prefix[1..]=[3,1,6], run_min[..-1]=[0,0,0,0] sliced to [0,0,0] → diffs = [3, 1, 6]
15Reduce to one scalar per sequence

Take the max along the time axis to recover the global maximum subarray sum per batch element. .values strips the (values, indices) tuple that torch.max returns when given a dim.

EXAMPLE
diffs = [3, 1, 6] → returns tensor(6)
8 lines without explanation
1import torch
2
3def kadane_torch(a: torch.Tensor) -> torch.Tensor:
4    # a: (B, n) batch of sequences. Returns (B,) of max subarray sums.
5    # Identity: max_{i<=j} sum(A[i..j]) = max_j ( prefix[j] - min_{i<=j} prefix[i-1] )
6    prefix = torch.cumsum(a, dim=-1)                          # (B, n)
7    zero   = torch.zeros_like(prefix[..., :1])
8    prefix = torch.cat([zero, prefix], dim=-1)                # (B, n+1)
9
10    # Running min of prefix[0..j], i.e., smallest "starting" prefix
11    run_min = torch.cummin(prefix, dim=-1).values             # (B, n+1)
12
13    # max over j of ( prefix[j+1] - run_min[j] )
14    diffs = prefix[..., 1:] - run_min[..., :-1]               # (B, n)
15    return diffs.max(dim=-1).values                           # (B,)

Both implementations compute the same answer, but the second one leverages a transform-and-conquer insight: rewriting the problem in terms of prefix sums turns it into a difference query over running minima, which fuses cleanly onto vector hardware. Same problem — better representation.


Interactive: Decomposing Merge Sort

Decomposition trees are easier to feel than to describe. The widget below walks merge sort through both phases on an 8-element array. Press Step to either split a node or merge a pair.

Step 0 / 14 · phase: split
[38, 27, 43, 3, 9, 82, 10, 1]
actively splittingmerged (sorted)unsorted leaf / pending

How to read this. Step forward to watch mergeSort([38,27,43,3,9,82,10,1])\text{mergeSort}([38,27,43,3,9,82,10,1]) split itself into halves until each piece has length 1 (the base case). Then watch sorted pieces merge back upward in post-order. The split phase makes log2n\log_2 n levels; the merge phase does O(n)O(n) work per level — total O(nlogn)O(n \log n).

Things to notice as you step through:

  • The split phase is a pre-order walk — we always handle a parent before its children.
  • The merge phase is the mirror image: a post-order walk — a parent can only merge after both its children have already merged. This is induction in motion.
  • The tree has exactly log28=3\log_2 8 = 3 levels (excluding the leaves) and 88 leaves — matching the asymptotics for n=8n = 8.
  • Each level processes Θ(n)\Theta(n) elements in total during merge, and there are Θ(logn)\Theta(\log n) levels → the famous Θ(nlogn)\Theta(n \log n).

Decomposition in Real Systems

The five strategies aren't academic exercises — they are the skeletons of the systems you use every minute.

SystemStrategyHow decomposition shows up
Google MapReduce / SparkDivide & ConquerMap: split a 100 TB log file into independent 64 MB shards. Reduce: combine per-shard answers. T(n) = a · T(n/a) + Θ(n) at each stage.
PostgreSQL query plannerTransform & ConquerRewrites SQL into a relational-algebra DAG, then picks join orders via DP over connected subsets — bitmasked Bellman-Held-Karp on subgraphs of size ≤ 12.
GCC / LLVMDivide & Conquer + DPRecursive descent splits source by grammar rules; instruction selection on the resulting tree is DP (tree pattern matching, Aho-Johnson).
Operating system schedulers (CFS)Transform & ConquerRepresent the process set as a red-black tree keyed by virtual runtime → 'pick next process' becomes O(log n) tree-min instead of O(n) scan.
TCP/IP routing (BGP, OSPF)Divide & Conquer (hierarchy)The Internet is recursively partitioned into autonomous systems → areas → subnets, so route lookup is bounded by tree depth, not the global table size.
MP3, JPEG, H.264, DSPDivide & ConquerFFT in Θ(n log n) (Cooley-Tukey) splits a length-n DFT into two length-n/2 DFTs. Without it, modern audio/video codecs would not exist.
Google Maps routingDP / Decrease & ConquerDijkstra (DP over the relaxation Bellman equation) plus contraction-hierarchies (transform-and-conquer: precompute shortcuts) gives sub-millisecond cross-continent routes.
Recommender systemsDP / matrix factorizationCast 'predict user-item rating' as low-rank matrix completion; solve via alternating least squares — each subproblem (one user fixed) is a small linear system.
Distributed gradient descentDivide & ConquerAll-reduce splits an N-GPU gradient sum into log N pairwise reductions of half the data each — same recursive shape as merge sort over a ring.
3D rendering (BVH / scene graphs)Divide & ConquerBounding Volume Hierarchies recursively partition space → ray-scene intersection drops from Θ(#triangles) to Θ(log #triangles).
Genome assemblyTransform & ConquerRead overlaps reduce to an Eulerian path on a de Bruijn graph; problem-reduction lets us reuse 80 years of graph theory.
Black-Scholes option pricing (Monte Carlo)Brute Force + parallelismGenerate 10⁸ candidate price paths, average the payoffs. Embarrassingly parallel — exhaustive search becomes feasible because the candidates are independent.

Notice how often transform-and-conquer is the hidden engine: red-black trees in the Linux kernel, prefix sums on GPUs, de Bruijn graphs in bioinformatics. The work is in choosing a representation in which the problem you care about is trivial.


Recognizing Which Strategy Fits

Given a fresh problem, run this checklist. The first “yes” usually wins.

  1. Can I split the input into ~aa independent pieces of size n/bn/b? If the pieces don't need to talk to each other, you have divide and conquer. (Sorting halves, FFT halves, BVH octants.)
  2. Can I solve a slightly smaller instance and finish in O(1)O(1) or O(n)O(n)? Then it's decrease and conquer. (Insertion sort, GCD, binary search.)
  3. Is there a representation in which my problem is easy? Transform and conquer. Sorting first, building a hash table, mapping to a graph, switching from time domain to frequency domain.
  4. Do my recursive subproblems repeat? Then memoize and you have dynamic programming. The diagnostic: draw two levels of the recursion tree and look for nodes labeled with the same subproblem.
  5. Is the candidate space small (or am I just sanity-checking my fast algorithm)? Brute force is fine and is often the right baseline.
  6. Is the input itself a tree, graph, or recursively-defined object? Then the natural decomposition follows the structure of the input. (Tree algorithms, JSON/AST traversals, file-system walks.)

Heuristic

If two strategies look applicable, pick the one with the smallest depth in its decomposition tree, all else equal. Depth bounds both stack usage and the parallelism ceiling.

Quick Check

You're asked to compute the number of inversions (pairs i < j with A[i] > A[j]) in an array of length 10⁶. Which decomposition strategy gives the best asymptotic running time?


Correctness and Efficiency Intuition

Correctness via structural induction

A recursive algorithm is correct iff: (1) it returns the right answer on every base case, and (2) assuming the recursive calls return the right answer on their (strictly smaller) inputs, the combine step produces the right answer on the current input. This is the induction hypothesis in computational form. For kadane the base case is the singleton subarray A[0]A[0] with answer A[0]A[0]; the inductive step is the recurrence B[i]=max(A[i], A[i]+B[i1])B[i] = \max(A[i],\ A[i] + B[i-1]) — if we believe B[i1]B[i-1] is correct, the case analysis (start fresh vs. extend) is exhaustive, so B[i]B[i] is correct.

Termination is part of correctness

Correctness requires termination — otherwise “assume the recursive calls return” is vacuous. Termination follows from a well-founded order on the inputs: each recursive call's argument is strictly smaller in some measure (length, depth, value of a+ba + b, etc.) that has no infinite descending chain. For arrays the usual measure is length; for Euclid's algorithm it is min(a,b)\min(a, b).

Efficiency = depth × work-per-level

Once you have a decomposition tree, asymptotic cost falls out by accounting:

  1. Depth — how many levels deep does the tree go? Halving gives O(logn)O(\log n); decreasing by 1 gives O(n)O(n).
  2. Branching — how many children per node? Constant aa children with shrinkage factor bb means level kk has aka^k nodes of size n/bkn/b^k.
  3. Per-node work — the non-recursive cost of decompose + combine at a node of size mm.
  4. Sum it up: total work = nodesf(size of node)\sum_{\text{nodes}} f(\text{size of node}). For balanced trees this collapses to a clean closed form via the Master Theorem.

Two algorithms with the same big-O often differ in which term dominates — e.g., merge sort vs. heapsort are both Θ(nlogn)\Theta(n \log n), but heapsort's constant is larger because heap operations have worse cache locality. The shape of the decomposition predicts not just asymptotic order but also memory access patterns, parallelism, and even numerical stability (Strassen is famously less numerically stable than the naive algorithm despite being asymptotically faster).

The takeaway from Chapter 1, Section 2

Decomposition is the single most powerful design tool in algorithmics. Pick the right shape — then code it. The five strategies in this section are the structural vocabulary you will use to read, invent, and analyze every algorithm in the rest of this book.
Loading comments...