Chapter 5
12 min read
Section 26 of 261

Array Operations and Complexity

Arrays

The Operation Zoo

Every array data structure exposes the same handful of primitive operations — read by index, write by index, insert, delete, search, traverse, slice. Their asymptotic costs are deceptively simple to write down. Their real costs on a real machine, fighting cache lines and branch predictors, can differ by two orders of magnitude from what big-O suggests. A serious engineer owns both views.

This section walks every canonical operation, derives its tight bound, and then anchors the bound with hardware-level intuition. We end with the patterns that working engineers actually use to pick the right operation for the right job.

OperationWorst-caseWhy
Random access A[i]O(1)address arithmetic
Update A[i] = vO(1)single store
Linear search (unsorted)O(n)scan all
Binary search (sorted)O(log n)halve the range
Insert at index kO(n − k)shift right
Delete at index kO(n − k)shift left
Append (dynamic array)O(1) amortized / O(n) worstoccasional resize
PrependO(n)shift everything
Concatenate (n + m)O(n + m)copy both
Reverse in placeO(n)n / 2 swaps
Rotate by kO(n)three reverses
Min / max / sum / meanO(n)must touch every element
Equality compare two arraysO(n)element-wise
Slice A[i:j] (Python list)O(j − i)copy into new list
Slice (NumPy basic)O(1)view, no copy
Membership (unsorted)O(n)linear scan
Membership (sorted)O(log n)binary search
Sort (comparison-based)O(n log n) avgcovered in Ch 26

Read this table left-to-right, then right-to-left

Left-to-right tells you the cost of the operation you intend to run. Right-to-left tells you the operation that fits a budget. If you can only afford O(logn)O(\log n), this table tells you the array must be sorted — which constrains your insert / delete story, which constrains your container choice. Datastructure design is constraint propagation.

Random Access and Update: The O(1) Miracle

Random access — reading A[i]A[i] for any valid ii in constant time — is the defining superpower of arrays. The reason is pure arithmetic:

addr(A[i])  =  base+is\mathrm{addr}(A[i]) \;=\; \mathrm{base} + i \cdot s

where base\mathrm{base} is the start of the backing buffer and ss is the stride (size of one element in bytes). One multiply, one add, one load — three machine instructions, zero traversal. No matter how big the array is, the address calculation is the same width. That is where theO(1)O(1) comes from. Update is symmetric: same address calculation, plus a single store.

What "O(1)" really means here

Big-O hides the constant. On a modern CPU the address calculation costs about 1 nanosecond if the cache line is hot. If the cache line is cold, the load stalls for 50–200 ns waiting on RAM. Both are constant in nn — both areO(1)O(1) — but the second is 100× slower. We come back to this in the cache-locality section.

A linear search examines elements one at a time. Best case the target is the first element (O(1)O(1)), worst case it is not there at all (O(n)O(n)), expected case for a uniformly random target position isn/2\approx n / 2 probes — stillO(n)O(n) because constants disappear in asymptotic notation.

🐍python
1def linear_search(arr, target):
2    for i, x in enumerate(arr):
3        if x == target:
4            return i        # found
5    return -1               # not found

If the array is sorted, binary search lets us discard half the remaining range with each comparison:T(n)=T(n/2)+O(1)T(n) = T(n / 2) + O(1), which solves toT(n)=O(logn)T(n) = O(\log n). We covered the invariant carefully in Chapter 3 §3 (Recurrences). The key trade is up front: you paid O(nlogn)O(n \log n) to sort once, and you now reap O(logn)O(\log n) on every query thereafter. For qq queries the total isO(nlogn+qlogn)O(n \log n + q \log n) — far better than O(qn)O(qn) as soon asqlognq \gg \log n.


Insertion and Deletion: Where the Cost Hides

The expensive operations on an array are the ones that change the array's shape. Insertion at index kkrequires shifting every element from kk ton1n - 1 one slot to the right to make room. That is nkn - k writes. Worst casek=0k = 0 (insert at the front) givesO(n)O(n); best case k=nk = n(insert at the end) is O(1)O(1).

Deletion is the mirror: nk1n - k - 1 elements slide left to close the hole, plus one drop at the tail.

Insert, delete, and rotate-left in pure Python
🐍array_ops.py
1Function signature

Takes the live list arr, an index k, and value v. We mutate in place — the conventional Python / C++ pattern for array edits.

2Docstring

Document side effects. Returning None and mutating is a real contract; callers must know.

3Capture current length

n is the length BEFORE growth. We need it because the loop bound depends on the pre-append size.

EXAMPLE
arr=[7,3,9,4]  →  n=4
4Grow the underlying buffer by 1

Python's list grows in amortized O(1). On most appends it costs nothing; occasionally the buffer is full and the runtime allocates a larger chunk and copies — that copy is the worst-case cost we'll dissect later in this section.

EXAMPLE
arr=[7,3,9,4]  →  arr=[7,3,9,4,None]
5Walk the right-shift loop

We move from the NEW tail (index n, the freshly added slot) down to k+1 inclusive. range(n, k, -1) visits indices n, n-1, …, k+1. Going right-to-left is critical: if we walked left-to-right we would overwrite each value before reading it.

EXAMPLE
for k=1, n=4: i = 4, 3, 2
6Shift one cell right

Each iteration reads one cell, writes the adjacent cell. This is the dominant cost: n − k assignments. Worst case k=0 ⇒ n shifts ⇒ O(n).

EXAMPLE
i=4: arr[4]=arr[3]=4 → [7,3,9,4,4]
i=3: arr[3]=arr[2]=9 → [7,3,9,9,4]
i=2: arr[2]=arr[1]=3 → [7,3,3,9,4]
7Place the new value

After all shifts, slot k is a duplicate of its old contents and is safe to overwrite. v lands here.

EXAMPLE
arr[1]=99 → [7,99,3,9,4]
9Symmetric delete signature

Returns the removed value (Python's list.pop semantics). Same in-place mutation contract.

11Capture removed value first

Read arr[k] BEFORE any shifting, otherwise we lose it.

EXAMPLE
arr=[7,99,3,9,4], k=1  →  removed=99
12Capture length

We need n to know where to stop the left-shift loop.

13Walk the left-shift loop

Iterate i from k up to n-2 (inclusive). Now we MUST go left-to-right: each cell is overwritten by its right neighbor, and the neighbor is still its original value because we haven't visited it yet.

14Shift one cell left

Cost is n − 1 − k assignments. Worst case k=0 ⇒ n−1 shifts ⇒ O(n).

EXAMPLE
i=1: arr[1]=arr[2]=3 → [7,3,3,9,4]
i=2: arr[2]=arr[3]=9 → [7,3,9,9,4]
i=3: arr[3]=arr[4]=4 → [7,3,9,4,4]
15Drop the duplicated tail

After the loop, the last cell is a duplicate of its old neighbor. pop() removes it in O(1). Net effect: array is one shorter, hole is gone.

EXAMPLE
arr=[7,3,9,4,4] → arr=[7,3,9,4]
16Return the value the caller wanted

The caller often needs to inspect / log / re-route the deleted element.

18rotate_left signature

Rotate-left by k means: every element moves left by k, with the first k elements wrapping around to the back. Naive approach copies into a buffer (O(n) time AND O(n) extra space). The three-reverse trick achieves O(n) time with O(1) extra space.

20Length capture

Used for both the modulo and the reverse bounds.

21Empty guard

Reversing an empty range is a no-op, but k %= 0 would raise ZeroDivisionError without this guard.

22Empty short-circuit

Return immediately for an empty array — nothing to rotate.

23Normalize k

Rotating by n is the identity, so reduce mod n. Handles k > n cleanly.

EXAMPLE
n=8, k=11  →  k=3
24Reverse the prefix [0, k-1]

First k elements are reversed in place.

EXAMPLE
[1,2,3,4,5,6,7,8], k=3 → [3,2,1,4,5,6,7,8]
25Reverse the suffix [k, n-1]

Remaining n − k elements reversed.

EXAMPLE
[3,2,1,4,5,6,7,8] → [3,2,1,8,7,6,5,4]
26Reverse the whole array

Reversing a sequence whose two halves were each pre-reversed yields the original two halves swapped — exactly a left-rotation by k.

EXAMPLE
[3,2,1,8,7,6,5,4] → [4,5,6,7,8,1,2,3] ✓
28Helper: in-place reverse

Standard two-pointer reverse. lo and hi walk toward each other, swapping. Total swaps = ⌊(hi − lo + 1) / 2⌋. Across the three calls, total swaps is exactly n / 2 + n / 2 + n / 2 ≈ 3n / 2 — still O(n).

29Loop guard

Stop when pointers meet or cross. For odd-length ranges the middle element doesn't need to move.

30Tuple swap

Python's tuple-assignment swap; two reads, two writes, no temporary variable in source code.

31Advance left pointer

Move toward the middle.

32Retreat right pointer

Move toward the middle.

5 lines without explanation
1def insert_at(arr, k, v):
2    """Insert value v at index k. Returns nothing; mutates arr."""
3    n = len(arr)
4    arr.append(None)                # grow by 1 slot
5    for i in range(n, k, -1):       # walk from new tail down to k+1
6        arr[i] = arr[i - 1]         # shift each element right by 1
7    arr[k] = v                      # place new value
8
9def delete_at(arr, k):
10    """Delete element at index k. Returns the removed value."""
11    removed = arr[k]
12    n = len(arr)
13    for i in range(k, n - 1):       # walk from k up to second-last
14        arr[i] = arr[i + 1]         # shift each element left by 1
15    arr.pop()                       # drop the duplicated tail
16    return removed
17
18def rotate_left(arr, k):
19    """Rotate arr left by k positions, in place, in O(n) using 3 reverses."""
20    n = len(arr)
21    if n == 0:
22        return
23    k %= n
24    _reverse(arr, 0, k - 1)
25    _reverse(arr, k, n - 1)
26    _reverse(arr, 0, n - 1)
27
28def _reverse(arr, lo, hi):
29    while lo < hi:
30        arr[lo], arr[hi] = arr[hi], arr[lo]
31        lo += 1
32        hi -= 1
Direction of the shift loop matters. Insertion walks right-to-left. Deletion walks left-to-right. Reverse the direction and you overwrite values before reading them — one of the most common bugs in hand-rolled array code.
The swap-and-pop trick. If you do not need to preserve order, deleting at index kk can beO(1)O(1): swap A[k]A[k]with A[n1]A[n-1], then pop the last element. Game engines, ECS frameworks, and entity/particle pools live on this trick.

Amortized Append and the Resize Drama

Dynamic arrays advertise amortizedO(1)O(1) append, but each individual append can cost O(n)O(n) when the underlying buffer is full and must be reallocated and copied to a larger one. The standard analysis (covered in detail in Ch 7) uses doubling: when the buffer fills, allocate 2n2n slots and copy.

The bookkeeping argument: across nn appends the total work for resizes is1+2+4++n/2+n<2n1 + 2 + 4 + \cdots + n / 2 + n < 2n. Sonn appends costO(n)O(n) total &Rightarrow; per append isO(1)O(1) averaged out, even though one unfortunate append paid Θ(n)\Theta(n) for the copy that day.

Realtime systems beware

"Amortized O(1)" is not the same as "always cheap." If you append once per millisecond in a soft-realtime audio thread, the one append that triggers a multi-megabyte realloc will blow your deadline budget. Mitigation:reserve(n)\texttt{reserve}(n) up front, or use a ring buffer with fixed capacity.

Bulk Operations: Concat, Reverse, Rotate, Compare

Operations that must touch every element are inevitablyΘ(n)\Theta(n):

  • Concatenate arrays of sizenn and mm:O(n+m)O(n + m). Allocate one buffer of sizen+mn + m, copy both. Repeated concatenation inside a loop becomes O(n2)O(n^2) — the classic Python "build a string with +=" performance trap. Use ''.join(parts)\texttt{''.join(parts)} instead.
  • Reverse in place:n/2n / 2 swaps via two pointers walking inward. O(n)O(n), no extra space.
  • Rotate left by k: the three-reverses trick achieves O(n)O(n) time andO(1)O(1) space — reverse[0,k1][0, k-1], reverse[k,n1][k, n-1], reverse the whole thing. We implement it in the code panel above.
  • Equality compare: in the worst case (arrays differ only at the last index) you must read all2n2n values — O(n)O(n).
  • Min / max / sum / mean / variance: a single linear pass — O(n)O(n) — and trivially parallelizable.

Slicing: Copy or View?

The semantics of A[i:j]A[i:j] change drastically between languages, and this single decision dominates real-world performance.

Language / TypeSlice costSlice semantics
Python listO(j − i)Eager copy. Independent storage.
NumPy basic slice (no fancy index)O(1)View. Shares memory; mutating the slice mutates the parent.
NumPy fancy slice (boolean / index array)O(j − i)Copy.
Go sliceO(1)View over a backing array; capacity tracked.
Rust &[T] slice referenceO(1)Borrow; no allocation.
C++ std::span (C++20)O(1)Non-owning view.
Why does this matter? A function that processes 10,000 items and builds a slice on every call isO(n)O(n) per call in Python andO(1)O(1) per call in NumPy — a 10,000× speed difference at largenn. The asymptotic wall is invisible until the dataset crosses a threshold and your nightly job goes from 30 minutes to 30 hours.

The same operation in C++ for comparison

C++: insert / delete with std::memmove + a benchmark harness
📄array_ops.cpp
1vector header

std::vector is the C++ analog of a Python list / Java ArrayList: contiguous, dynamically resizable.

2cstring

Brings in std::memmove — a hardware-accelerated bulk copy that handles overlapping ranges correctly. It's roughly an order of magnitude faster than a hand-rolled element-by-element loop.

3chrono

High-resolution clock for benchmarking. Reader can compile and run to see real wall time.

4iostream

For std::cout to print the timing result.

6insert_at signature

Reference parameter (&) so we mutate in place. size_t for the index — unsigned, matches the vector's index type.

7Grow the buffer

push_back is amortized O(1); occasionally triggers a realloc + copy when capacity is exceeded.

8memmove: bulk right-shift

Shifts (size − k − 1) integers from position k to position k + 1 in one call. The CPU uses SIMD / wide loads, so the constant factor is tiny (~ 1 ns per int on modern x86). Compare to the Python loop above which does pure interpreter-overhead per element — a 50–100× speed gap is normal.

EXAMPLE
a = [1,2,3,4,5], k=1
memmove copies 4 ints (3,4,5 + the 0 we appended)
a becomes [1, 1, 2, 3, 4]
then a[k]=v overwrites the slot
9Byte count argument

memmove takes BYTES, not element count. (size − k − 1) elements × sizeof(int) (typically 4 bytes).

10Place the value

After memmove, slot k is safe to overwrite.

13delete_at signature

Mirrors insert_at. Same mutation-in-place contract.

14memmove: bulk left-shift

Shift (size − k − 1) ints from k+1 to k. memmove is safe with overlap; memcpy would be undefined behavior here because the source and destination ranges overlap.

15Byte count

Same byte-arithmetic as before.

16Drop the duplicate tail

pop_back is O(1) — never reallocates downward by default.

19Million-element vector

1'000'000 with C++14 digit separators, all initialized to 0.

20Start clock

Capture wall-clock instant immediately before the operation.

21Worst-case insert

Insert at index 0 forces a shift of all 1,000,000 elements. On a typical laptop this prints ≈ 800–1500 microseconds — about 1 ns per element. That tiny constant is why arrays remain practical even for these brutally non-asymptotic-friendly operations.

22Stop clock

Capture after; subtraction yields elapsed duration.

23Print microseconds

duration_cast picks the unit; the integer count is the actual measurement. Try changing to chapter §5.3 sizes (1e3, 1e4, 1e5, 1e6) and plot — you'll see the canonical linear curve.

9 lines without explanation
1#include <vector>
2#include <cstring>
3#include <chrono>
4#include <iostream>
5
6void insert_at(std::vector<int>& a, size_t k, int v) {
7    a.push_back(0);                                  // grow by 1
8    std::memmove(&a[k + 1], &a[k],                   // bulk shift right
9                 (a.size() - k - 1) * sizeof(int));  // (n - k) * 4 bytes
10    a[k] = v;
11}
12
13void delete_at(std::vector<int>& a, size_t k) {
14    std::memmove(&a[k], &a[k + 1],                   // bulk shift left
15                 (a.size() - k - 1) * sizeof(int));
16    a.pop_back();
17}
18
19int main() {
20    std::vector<int> a(1'000'000, 0);
21    auto t0 = std::chrono::high_resolution_clock::now();
22    insert_at(a, 0, 42);                             // worst case: shift 1M ints
23    auto t1 = std::chrono::high_resolution_clock::now();
24    std::cout << "front insert on n=1e6: "
25              << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count()
26              << " us\n";
27}

Interactive: Array Operation Animator

Pick an operation, drag the index kk slider and value vv slider, and click Run step. Amber cells indicate values that changed. Blue cells indicate cells that participated in a shift (read once, written once). The cells touched counter is the literal number of read / write events the operation just performed — the constant factor big-O hides. Compare an "Insert at front" against a "Swap-pop" on the same array and watch cumulative touches diverge.

Array Operation Animator

703192k4314852657
Operation
Get A[k]
cost: O(1)
Cells touched (last step)
0
Cumulative touches
0

Amber cells changed value. Blue cells participated in a shift. The "cells touched" counter is the real, observed cost of the operation you just ran — what big-O hides inside its constant factor.

Try this experiment: starting from the default array of length 8, run Prepend three times. The cumulative-touches counter grows as9+10+11=309 + 10 + 11 = 30. Now reset and run Append three times instead —1+1+1=31 + 1 + 1 = 3. Same final length, ten times less work. This is the gap betweenO(n)O(n) prepend andO(1)O(1) append in a single experiment.

Quick Check

An array of size n receives k front-insertions, one after another. What is the total work performed?

Quick Check

You have arr = [10, 20, 30, 40, 50]. You run swap-and-pop deletion at index 1, then at index 1 again, then at index 0. What is the final array?


Constants Matter: Cache Locality vs Big-O

Two operations that are both O(1)O(1) can differ in real wall-clock cost by 50× or more. The reason is the memory hierarchy:

OperationBig-OTypical real timeWhy
Array index A[i] (cache hot)O(1)≈ 1 nsL1 cache, 3–4 cycle load
Array index A[i] (cache cold)O(1)≈ 80 nsDRAM round trip
Hash map lookup (Python dict)O(1) avg≈ 60 nshash + probe + pointer chase
Linked list traversal stepO(1)≈ 50 nspointer chase, no prefetch
Sequential array sweep (N=1M)O(n)≈ 0.3–1 msfully prefetcher-friendly
Random array access (N=1M)O(n)≈ 50–80 mscache thrashes on every access

The bottom two rows are the same algorithm asymptotically, on the same size of input, with a 50–100× gap. Big-O is unbothered. Your latency budget is not.

The lesson: arrays beat "theoretically equivalent" structures (hash maps for smallnn, linked lists almost always) on real hardware because contiguous memory is what CPUs are physically optimized to chew through. When in doubt, prefer the array.

Real-World Patterns

  1. Game engines — entity arrays. Append-heavy, delete with swap-pop. Order does not matter; cache-friendly iteration is everything.
  2. Audio processing — ring buffers. A pure dynamic array would prepend on every input sample (O(n)O(n)); a ring buffer makes itO(1)O(1) by reusing slots.
  3. Trading systems — sorted order books. Sorted array with binary search for price lookup, occasional inserts paid for once on every quote. At HFT scale these arrays are even replaced with cache-line-aligned, branchless variants.
  4. Time-series databases (Druid, InfluxDB, ClickHouse).Append-only column arrays. Never delete or insert in the middle; all writes are tail-appends. Reads are sequential range scans. This pattern is the only reason these systems can ingest millions of points per second.
  5. Compiler symbol tables. Small arrays of structs with linear search beat hash maps forn<16n < 16 — cache locality wins over asymptotics at small sizes.
  6. GPU vertex / index buffers. Append-only on the CPU side, then frozen and shipped to the GPU. No mid-array edits ever.
  7. DOM child lists in browsers. When you callnode.appendChild(c)\texttt{node.appendChild(c)} 10000 times in a tight loop, you pay the array shift cost plus a layout reflow per call. Hence theDocumentFragment\texttt{DocumentFragment} pattern: build off-tree, attach once.
  8. Genomics — sequence arrays. A human genome is31093 \cdot 10^9 bytes as an array of nucleotides. Random access by index is the only way tools like BWA / minimap2 are tractable. Insertions are forbidden; mutations are recorded as a separate diff structure.
  9. UI virtualization. A list of 100,000 items cannot render every row. Materialize only the visible window into a small array, recycle DOM nodes — turn anO(n)O(n) render intoO(visible)O(\text{visible}).
  10. Database B-tree pages. Each page is a small sorted array of keys. Insert paysO(page size)O(\text{page size}) shifts — tolerable because page size is bounded (typically 4–16 KB &Rightarrow; a few hundred keys).

Decision Rules: Which Operation, Which Structure?

Memorize these. They cover ~90% of array-related design decisions.

Access patternUseWhy
Random access + appends onlyDynamic array (Python list, Go slice, std::vector)O(1) read, amortized O(1) append
Random access + occasional middle inserts (small n)Dynamic arrayConstants beat asymptotics for n < ~10⁴
Random access + heavy middle inserts (large n)Balanced BST or skip listO(log n) insert AND access
Inserts/deletes at both endsDeque (Ch 8)O(1) at both ends
Order doesn't matter, fast deleteArray + swap-popO(1) delete by index
Streaming append + bounded windowRing bufferO(1) push/pop, fixed memory
Sorted, query-heavySorted array + binary searchO(log n) query, paid for at sort time
Sorted, mixed query/updateBalanced BST or B-treeO(log n) for everything
Tail-append, range scanAppend-only column arrayFits SSD/columnar storage
Heuristic: if the operation you run most often isO(1)O(1) on an array, use an array. If the operation you run most often is O(n)O(n) on an array and nn exceeds a few thousand, change structures — you are paying a tax that compounds.

The next two sections (Two Pointer and Sliding Window) are entirely about turning an apparent O(n2)O(n^2) algorithm on an array into O(n)O(n) — using nothing but the operations we just catalogued.

Loading comments...