Learning Objectives
By the end of this section, you will be able to:
- Understand amortized analysis and how it differs from worst-case and average-case analysis
- Apply the aggregate method to calculate average cost over a sequence of operations
- Master the accounting method by assigning charges that pay for expensive operations
- Use the potential method to analyze data structures using a potential function
- Prove that dynamic array push is O(1) amortized using all three methods
- Recognize when amortized analysis applies in real-world data structures
Why This Matters: Amortized analysis is essential for understanding the true performance of data structures like dynamic arrays (std::vector, ArrayList), hash tables, and self-balancing trees. Without it, you would incorrectly conclude that dynamic arrays have O(n) push operations—a claim that contradicts everyday programming experience.
The Problem: Why Worst-Case Lies
Consider the humble dynamic array—the ArrayList in Java, std::vector in C++, or list in Python. You've used it countless times:
```python
arr = []
for i in range(1000000):
    arr.append(i)  # What's the cost of this?
```
If you do a naive worst-case analysis, you might reason:
- When the array is full, append must copy all n elements to a new, larger array
- Therefore, the worst-case cost of append is O(n)
- So n appends would cost O(n²)!
But this analysis is misleading. In practice, those million appends complete nearly instantly. The worst-case analysis gives a true but useless bound—it tells us each operation might be expensive, but ignores that expensive operations are rare.
What is Amortized Analysis?
Amortized analysis determines the average performance of an operation in a sequence, guaranteeing the average cost over that sequence even in the worst case.
Key Characteristics
| Analysis Type | What It Measures | Guarantee |
|---|---|---|
| Worst-Case | Single operation's maximum cost | Every operation costs at most this |
| Average-Case | Expected cost with probabilistic assumptions | Average over random inputs |
| Amortized | Average cost over any sequence | Total cost / number of operations |
The Key Insight
Unlike average-case analysis, amortized analysis does not involve probability. It provides a guaranteed average per operation over any sequence of operations. The idea is:
If we can prove that n operations cost at most c·n in total (for some constant c), then the amortized cost per operation is c = O(1).
Intuition: Amortized analysis "spreads out" the cost of expensive operations over many cheap ones. It's like a subscription service—you pay a steady monthly fee rather than paying per-use with occasional huge bills.
Running Example: Dynamic Arrays
Throughout this section, we'll analyze a dynamic array with a doubling strategy:
The Data Structure
- Start with an array of capacity 1
- When pushing to a full array, allocate a new array of twice the capacity
- Copy all existing elements to the new array, then insert the new element
Cost Model
- Inserting an element into an available slot costs 1
- Copying an element during resize costs 1
- So resizing n elements and inserting costs n + 1
Cost Sequence Example
Let's trace the cost of pushing elements 1, 2, 3, ..., 8:
| Push # | Size Before | Capacity | Resize? | Cost | Cumulative |
|---|---|---|---|---|---|
| 1 | 0 | 1 | No | 1 | 1 |
| 2 | 1 | 1 | Yes → 2 | 1+1=2 | 3 |
| 3 | 2 | 2 | Yes → 4 | 2+1=3 | 6 |
| 4 | 3 | 4 | No | 1 | 7 |
| 5 | 4 | 4 | Yes → 8 | 4+1=5 | 12 |
| 6 | 5 | 8 | No | 1 | 13 |
| 7 | 6 | 8 | No | 1 | 14 |
| 8 | 7 | 8 | No | 1 | 15 |
After 8 pushes, total cost = 15. Amortized cost = 15/8 = 1.875 per push!
Pattern to Notice
Resizes occur only when pushing into a full array, at pushes 2, 3, 5, 9, 17, ... (push number 2^k + 1). Each resize doubles the capacity, so the copy costs 1, 2, 4, 8, ... form a geometric series that never sums to more than twice the number of pushes.
Key Insight
Although individual push operations can cost up to O(n) when resizing occurs, the amortized cost per operation is O(1). Expensive resize operations are rare enough that their cost "spreads out" over all operations.
Quick Check
After 16 push operations on a dynamic array that doubles, what is the total cost?
The Aggregate Method
The aggregate method is the most straightforward approach: calculate the total cost of n operations, then divide by n.
Approach
- Compute an upper bound on the total cost of n operations
- Divide by n to get the amortized cost per operation
- All operations get the same amortized cost
Analysis for Dynamic Array
For n push operations on a dynamic array that doubles:
Step 1: Count the Total Cost
Each push costs 1 for insertion. Additionally, resize operations cost:
- Push 2: copy 1 element
- Push 3: copy 2 elements
- Push 5: copy 4 elements
- Push 9: copy 8 elements
- In general: copy 2^k elements at push number 2^k + 1
Total copy cost for n pushes:
1 + 2 + 4 + ... + 2^⌊log₂(n-1)⌋ < 2n
Step 2: Sum Total Costs
Total cost ≤ n (insertions) + 2n (copies) = 3n
Step 3: Compute Amortized Cost
Amortized cost = 3n / n = 3 = O(1)
Result
Every push is O(1) amortized; in fact, the amortized cost is at most 3.
The Accounting Method (Banker's Method)
The accounting method assigns different "charges" to different operations. Some operations are overcharged; the extra credit pays for future expensive operations.
The Key Rule
The total amortized cost must always be at least the total actual cost. We can never "go into debt":
Σᵢ ĉᵢ ≥ Σᵢ cᵢ
where ĉᵢ is the amortized cost and cᵢ is the actual cost of the i-th operation.
Analysis for Dynamic Array
We assign an amortized cost of 3 to each push operation:
- 1 credit pays for the actual insertion
- 2 credits are stored with the element for future copying
Why This Works
When a resize happens from capacity c to 2c:
- We need to copy c elements (cost = c)
- The last c/2 elements were added since the previous resize
- Each of those c/2 elements has 2 stored credits
- Total stored: (c/2) × 2 = c credits, exactly enough!
Visualization
Think of each element carrying 2 "coins":
- 1 coin to pay for copying itself in the next resize
- 1 coin to "sponsor" an older element that has already spent its coins
How It Works
1. We charge 3 credits for each push operation (this is our amortized cost)
2. The actual insert costs 1 credit
3. We store 2 credits with each element for future copying
4. When a resize happens, stored credits pay for copying elements
Key Insight
By charging 3 credits per push and storing extras, we always have enough "savings" to pay for expensive resize operations when they occur. The accounting method proves the amortized cost is O(1).
Quick Check
In the accounting method for dynamic arrays, why do we charge 3 credits instead of 2?
The Potential Method (Physicist's Method)
The potential method is the most powerful and general technique. It defines a potential function that maps the data structure state to a non-negative real number, representing "stored energy."
The Framework
Define a potential function Φ that maps each state Dᵢ of the data structure to a real number, such that:
- Φ(D₀) = 0 (the initial state has zero potential)
- Φ(Dᵢ) ≥ 0 for all states (potential is never negative)
The amortized cost of the i-th operation is:
ĉᵢ = cᵢ + Φ(Dᵢ) - Φ(Dᵢ₋₁)
That is: amortized cost = actual cost + change in potential.
Why This Works
Summing over n operations (the potential terms telescope):
Σᵢ₌₁ⁿ ĉᵢ = Σᵢ₌₁ⁿ cᵢ + Φ(Dₙ) - Φ(D₀)
Since Φ(Dₙ) ≥ 0 and Φ(D₀) = 0:
Σᵢ₌₁ⁿ ĉᵢ ≥ Σᵢ₌₁ⁿ cᵢ
The sum of amortized costs is an upper bound on the actual total cost!
Analysis for Dynamic Array
Define the potential function:
Φ(D) = 2n - c
where n = number of elements, c = capacity.
Case 1: No Resize (n < c before push)
- Actual cost: 1
- Potential change: ΔΦ = (2(n+1) - c) - (2n - c) = 2
- Amortized: ĉ = 1 + 2 = 3
Case 2: Resize Needed (n = c before push)
- Actual cost: n + 1 (copy n elements + insert)
- New state: n+1 elements, capacity 2c = 2n
- Potential change: ΔΦ = (2(n+1) - 2n) - (2n - n) = 2 - n
- Amortized: ĉ = (n + 1) + (2 - n) = 3
Beautiful Result
In both cases the amortized cost is exactly 3 = O(1), no matter when resizes occur.
Potential Function
Φ(D) = 2n - c
where n = size, c = capacity
Amortized cost: ĉᵢ = cᵢ + Φ(Dᵢ) - Φ(Dᵢ₋₁)
Key Insight
When a resize occurs, the potential drops significantly (ΔΦ is very negative), which cancels out the high actual cost. This keeps the amortized cost constant at 3 for every push operation!
Quick Check
Why does the potential drop during a resize?
Comparing the Three Methods
| Method | Approach | Best For | Difficulty |
|---|---|---|---|
| Aggregate | Calculate total cost, divide by n | Simple data structures, same amortized cost for all ops | Easy |
| Accounting | Assign charges, maintain credit balance | When different operations should have different costs | Medium |
| Potential | Define potential function on states | Complex data structures, proving tight bounds | Hard (but powerful) |
When to Use Each
- Aggregate: Start here. If you can sum up total cost easily, this is the simplest approach.
- Accounting: Use when you need to "prepay" for future operations. Good for intuitive explanations.
- Potential: Use for complex analyses, especially when operations interact in complicated ways. The potential function often reveals deep structure.
Implementation in Code
Let's implement a dynamic array from scratch, tracking costs to verify our analysis.
C Implementation
Real-World Applications
Amortized analysis is crucial for understanding the performance of many common data structures:
1. Hash Tables
Hash tables resize when the load factor exceeds a threshold. As with dynamic arrays, this gives O(1) amortized insert/lookup despite occasional O(n) rehashing.
2. Splay Trees
Splay trees have O(log n) amortized operations, even though individual operations can be O(n). The potential method is essential for analyzing them.
3. Union-Find (Disjoint Set)
With path compression and union by rank, Union-Find achieves O(α(n)) amortized time per operation, where α is the inverse Ackermann function (effectively constant).
4. Fibonacci Heaps
Fibonacci heaps have O(1) amortized insert, find-min, and decrease-key operations, enabling efficient implementations of Dijkstra's and Prim's algorithms.
| Data Structure | Worst-Case | Amortized | Key Insight |
|---|---|---|---|
| Dynamic Array (push) | O(n) | O(1) | Doubling spreads resize cost |
| Hash Table (insert) | O(n) | O(1) | Rehashing is infrequent |
| Splay Tree (search) | O(n) | O(log n) | Recently accessed items move to root |
| Union-Find | O(log n) | O(α(n)) | Path compression flattens tree |
| Fibonacci Heap (decrease-key) | O(n) | O(1) | Cascading cuts are prepaid by earlier marks |
Test Your Understanding
What is the key insight behind amortized analysis?
Summary
Amortized analysis reveals the true cost of data structure operations by considering sequences rather than individual worst cases.
Key Concepts
| Concept | Key Insight | Formula/Idea |
|---|---|---|
| Amortized Cost | Average over sequence, not probability | Total cost / number of ops |
| Aggregate Method | Sum all costs, divide by n | Simple and direct |
| Accounting Method | Charge extra, save for expensive ops | Credits ≥ actual costs always |
| Potential Method | Define energy function on states | ĉᵢ = cᵢ + Φ(Dᵢ) - Φ(Dᵢ₋₁) |
| Dynamic Array Push | O(1) amortized despite O(n) worst case | Φ = 2n - c |
Key Takeaways
- Worst-case per-operation can be misleading when expensive operations are rare
- Amortized analysis provides guaranteed average cost without probabilistic assumptions
- Three methods: Aggregate (easiest), Accounting (intuitive), Potential (most powerful)
- Dynamic array push is O(1) amortized, which is why vectors/ArrayLists are so efficient
- The potential method can prove tight bounds for complex data structures
Exercises
Conceptual Questions
- Tripling Strategy: Suppose a dynamic array triples its capacity instead of doubling. Prove that push is still O(1) amortized. What is the constant factor compared to doubling?
- Incrementing by 1: If a dynamic array increases capacity by 1 (instead of doubling), what is the amortized cost of push? Prove your answer.
- Multipop Stack: Consider a stack with push (cost 1) and multipop(k) that removes k items (cost min(k, size)). Using the potential function Φ = number of items on the stack, prove that both operations are O(1) amortized.
- Binary Counter: A binary counter starts at 0. The INCREMENT operation flips bits from the rightmost until finding a 0, then flips it to 1. Prove INCREMENT is O(1) amortized using the potential Φ = number of 1 bits in the counter.
Coding Exercises
- Implement and Verify: Implement a dynamic array that tracks costs per operation. Run 1 million pushes and verify that the amortized cost approaches 3.
- Shrinking Array: Extend the dynamic array to shrink (halve capacity) when size drops below 1/4 capacity. Analyze the amortized cost of pop operations.
- Visualize Potential: Create a visualization that shows the potential function value after each operation for both push and pop on a dynamic array.
Challenge Problems
- Two-Stack Queue: A queue can be implemented using two stacks. Analyze the amortized cost of enqueue and dequeue operations.
- Splay Tree Analysis: Research and explain the Access Lemma used in splay tree amortized analysis. What potential function is used?
- Hash Table Load Factor: If a hash table resizes when the load factor exceeds 0.75 and doubles capacity, prove that insert is O(1) amortized. What changes if we resize at 0.5?
Solution Hints
- Tripling: Copy costs form the geometric series 1 + 3 + 9 + ... = (3^k - 1)/2. The amortized copy cost is at most about 1.5 per push (roughly 2.5 total including the insertion), versus 2 per push (3 total) for doubling.
- Increment by 1: This gives O(n) amortized cost! Total cost is 1 + 2 + 3 + ... + n = n(n+1)/2 = O(n^2) for n pushes.
- Multipop: Push: actual cost 1, ΔΦ = +1, amortized = 2. Multipop(k): actual cost k, ΔΦ = -k, amortized = 0.
- Binary Counter: If we flip t bits to 0 and one bit to 1, actual cost = t+1, ΔΦ = 1-t, amortized = 2.
In the next section, we'll explore the Master Theorem—a powerful tool for solving recurrence relations that arise from divide-and-conquer algorithms.