Chapter 6
12 min read
Section 36 of 261

When to Use Linked Lists vs Arrays

Linked Lists

The Big-O Illusion

Open any introductory data-structures course and you will be told the same story. Inserting at the head of a singly linked list is O(1)O(1); inserting at the head of a dynamic array is O(n)O(n) because every existing element must shift one slot. Therefore, the story concludes, when you need many head-insertions, the linked list wins. Every freshman accepts this. Every senior engineer knows it is wrong in practice for almost every realistic workload.

Big-O describes how the cost scales as nn grows. It does not describe what one operation actually costs. It hides the constant factor — and on modern hardware, that constant factor is dominated by the memory hierarchy, not by the arithmetic. A linked-list head-insert is O(1)O(1), but the "1" includes a heap allocation (~30-100 ns) and a future cache miss the next time anyone walks the list (~100-300 cycles). An array shift moves nn elements, but each move is a sequential memory copy at roughly 11 cycle each, fully prefetched, vectorized to 32-64 bytes per cycle by SIMD.

Big-O compares asymptotic work. Two algorithms with the same Big-O can differ by 100x in wall-clock time because of constant factors driven by cache behaviour, allocator behaviour, and branch prediction. Asymptotic analysis is the start of the conversation, not the end.

Stroustrup's Benchmark: Vector Beats List

In a now-famous 2012 GoingNative talk, Bjarne Stroustrup, the creator of C++, ran the following experiment. Given nn random integers, insert them one at a time into a sorted sequence — that is, for each new value find the right position and put it there. Repeat for two containers: std::vector<int> and std::list<int>. Both are O(n2)O(n^2) algorithms. On paper they should scale identically.

The vector loses on the algorithmic accounting: each insertion does an O(logn)O(\log n) binary search to find the position, then shifts up to nn elements. The linked list looks better: O(n)O(n) linear search to find the position, then O(1)O(1) splice. Beginners predict the linked list wins.

Stroustrup's measurement said the opposite. From n=100n = 100 all the way past n=500,000n = 500{,}000, the vector won by an enormous margin — often 10x to 50x faster. The gap widened, not narrowed, as nn grew. The linked-list version was so much slower that it stopped being practical before nn hit a million.

The takeaway: the vector pays for shifting, but the shift is cache-friendly contiguous memmove. The linked list pays for pointer chasing on every comparison during the search, and every pointer chase is a potential cache miss. The vector's expensive operation is fast; the linked list's cheap operation is slow.

Why Cache Locality Wins

Modern CPUs are roughly 100×100\times faster than main memory. To hide that gap, every CPU has a cache hierarchy:

LevelTypical sizeLatencyHolds
Registersfew hundred bytes0 cyclescurrent operands
L1 cache32-64 KB / core~4 cycles (~1 ns)hot data
L2 cache256 KB - 1 MB / core~12 cycles (~3 ns)recent data
L3 cache8-64 MB shared~40 cycles (~12 ns)shared working set
DRAMGBs~200-300 cycles (~70-100 ns)everything else

Memory is moved between these levels in cache lines of 64 bytes each. When you read one int, you actually pull 16 ints. Sequential array traversal therefore amortizes one DRAM round-trip across 16 elements; the next 15 are essentially free. Linked-list traversal jumps to a fresh address every node, defeating the cache line and the hardware prefetcher simultaneously.

There is one more layer: the prefetcher. Modern CPUs detect striding access patterns and pull cache lines before they are requested. This works beautifully for arrays — the prefetcher quickly identifies a +4 stride and stays a few lines ahead — and not at all for linked lists, because the prefetcher cannot guess where the next pointer leads until the current load returns.

Mental model

A modern CPU is a sports car that hates pulling over to ask for directions. An array is a straight highway with green lights every block. A linked list is a scavenger hunt where every clue is in a different city.

The Decision Framework

With the constant-factor reality established, here is the decision rule a senior engineer actually uses. Choose a linked list only if all four of these are true.

  1. You will frequently insert or delete at known positions — that is, you already hold a node pointer. (If you only have an index, the linked list must walk to find the position, and then you have lost.)
  2. You will rarely need indexed random access (list[k]).
  3. The list is long enough that the algorithmic O(1)O(1) splice actually pays back the per-node cache penalty — empirically, this means many thousands of nodes and the splices outnumber the traversals by a wide margin.
  4. You don't need iteration over the whole list to be fast.

Choose a dynamic array when:

  • You need indexed access.
  • Sequential traversal is on the hot path.
  • The set of elements is bounded or growth is rare.
  • Cache misses matter — i.e., always.

Choose a deque (a doubly linked list of fixed-size blocks) when:

  • You need fast push and pop on both ends.
  • You don't want the pointer-invalidation hazards of a vector.
  • You want most of the array's cache friendliness within each block.

Choose a hash map plus doubly linked list (the LRU pattern) when:

  • You need O(1)O(1) random access by key.
  • You also need O(1)O(1) reordering — move-to-front, evict-tail.

Choose a balanced BST or skip list when:

  • You need O(logn)O(\log n) insert, delete, search, and predecessor/successor. An array can search in O(logn)O(\log n) but cannot insert in O(logn)O(\log n); a linked list can splice but cannot search faster than O(n)O(n). Only ordered tree-like structures unify all four.

Choosing the Right Container

Here is the decision flow, in the order an experienced engineer applies it.

  1. Do you only need a sequence and indexed access? Use a dynamic array. Stop here. This covers ~70% of all real cases.
  2. Do you need O(1) push/pop on both ends? Use a block deque (Python collections.deque, Java ArrayDeque, C++ std::deque). Cache-friendly within blocks, no end-shifting.
  3. Do you need O(1) lookup by key? Use a hash map. Plus a DLL if you also need O(1) ordered eviction (LRU).
  4. Do you need ordered logarithmic ops (predecessor, range, k-th smallest)? Use a balanced BST or skip list.
  5. Do you need a persistent / immutable structure? A linked list shines here, because tail-sharing is free — but you are now in a functional-programming context, not a performance one.
  6. None of the above and you genuinely have node pointers everywhere? Now consider a doubly linked list.
Notice the linked list lives at step 6, not step 1. That is the inverted ordering most introductory textbooks accidentally teach.

Memory and Locality, Quantified

Concrete numbers for one million 32-bit integers, on a typical x86-64 CPU with 64-byte cache lines:

MetricArray of 1M intsSingly linked list of 1M ints
Element payload4 MB4 MB
Per-node overhead0 bytes8 B next pointer + 16 B allocator header
Total memory~4 MB~28-40 MB
Memory layout1 contiguous block~1M scattered 24-byte allocations
Fits in L2 cache?Often (<= 1 MB) or partially streamedNo (28x larger)
Sequential read latency~1 ns / element~30-100 ns / element
Random index accessO(1)O(n)
Insert at known nodeO(n) shiftO(1) splice

The verdict is stark. The linked list trades a 30 to 100x per-element constant factor for the asymptotic O(1)O(1) splice. That trade pays only when splices dominate traversals, which is rarer than students think.

These numbers are for a hand-rolled C-style linked list. Java's LinkedList<Integer> is even worse: each node is a separate Java object with a 16-byte header, plus the boxed Integer is also a separate object with another 16-byte header. For 1M boxed ints you spend roughly 56 MB on a structure that an ArrayDeque<Integer> would store in ~12 MB. Java LinkedList is almost always the wrong answer.

The Intrusive Linked List Pattern

When linked lists do appear in production code — particularly in operating systems, schedulers, and high-performance networking — they are almost always intrusive. Instead of a wrapper node holding a pointer to the payload, the link pointers are embedded inside the payload struct.

📄c
1/* Linux-kernel-style intrusive doubly linked list. */
2struct list_head {
3    struct list_head *prev, *next;
4};
5
6/* The payload owns its links. */
7struct task {
8    int pid;
9    char name[32];
10    struct list_head runqueue;   /* link in scheduler's run queue */
11    struct list_head children;   /* link in parent's child list */
12};

Two payoffs. First, no separate allocation per insertion — the link cell is already inside the task struct, allocated when the task was created. Second, the same object can sit on multiple lists by embedding multiple list_head fields. The Linux kernel uses this pattern thousands of times: the struct task for a process simultaneously lives on the run queue, its parent's children list, the global pid hash bucket, and more — all without a single auxiliary allocation.

Even with intrusive lists, the kernel pairs them with a hash table for fast lookup by id. The list is the auxiliary thread used for ordered iteration and O(1) splicing; the hash table is the primary index. This is the lesson from real systems: linked lists are rarely the primary container; they are almost always a secondary thread woven through a primary hash or array.


Engineering Case Studies

LRU cache: hash + DLL

An LRU cache stores at most KK items keyed by string, evicting the least recently used when a new item arrives. Two operations must be O(1)O(1): get(key) and put(key, value). The textbook solution combines a hash map with a doubly linked list. The hash map maps keys to DLL nodes; the DLL orders nodes by recency. get hashes the key (O(1)), splices the node to the front (O(1)). put hashes (O(1)) and, if full, evicts the tail (O(1)). Neither an array nor a DLL alone can do this; the combination is essential. This is the design of Python's functools.lru_cache and Redis's eviction policy.

Queue: circular array, not linked list

For a FIFO queue, the textbook answer is "use a linked list because both ends are O(1)." The production answer is "use a circular array." Python's collections.deque is a doubly linked list of 64-element blocks; Java's ArrayDeque is a power-of-two circular buffer. Both are 5 to 10x faster than a hand-rolled linked list for the same operations because most pushes and pops hit cached memory inside the current block.

Undo stack: dynamic array

An editor maintains an undo stack of the last NN commands. Operations are push and pop at one end. A list in Python or Vec in Rust does this perfectly with O(1)O(1) amortized push and O(1)O(1) pop. Reaching for LinkedList here is overkill; you pay the per-node allocation tax for no asymptotic benefit.

Browser tab list: array of tab objects

Tabs are reordered rarely (drag-and-drop) and traversed often (rendering, keyboard navigation). The reorder is human-paced, so a linear shift of 30 tabs is invisible — sub-microsecond. The traversal happens every paint frame, so cache locality matters. Array wins; linked list loses.

OS process scheduler: depends on the era

Classical Unix used a circular doubly linked list per priority level. Modern Linux's Completely Fair Scheduler (CFS) instead uses a red-black tree keyed by accumulated CPU time, because it needs O(logn)O(\log n) "pick the task with minimum runtime" queries that a flat list cannot serve efficiently. Same problem, different data structure as the requirements grew.

Database B-tree leaves: linked together

InnoDB, Postgres, SQLite, and most other B-tree-based engines keep their leaf pages connected by sibling pointers — a singly or doubly linked list at the leaf level. Why? Because range queries (WHERE x BETWEEN a AND b) descend to the leaf containing a once, then walk leaves sequentially. The linked list is the auxiliary thread that turns range queries from O(klogn)O(k \log n) into O(logn+k)O(\log n + k). This is the canonical secondary-thread pattern from earlier — the B-tree is the primary index, the list is the fast iterator.


A Two-Language Benchmark

Talk is cheap. Run this on your machine and watch the constant factor crush Big-O.

bench.py — array vs linked list, 1,000,000 ints
🐍python
1Import perf_counter

time.perf_counter() returns a high-resolution monotonic clock in seconds. It is the standard tool for micro-benchmarks in Python; time.time() can run backwards on NTP adjustments and is wall-clock, not monotonic.

2Use a dataclass for the Node

@dataclass auto-generates __init__, __repr__, and __eq__ from type annotations. Cleaner than writing the boilerplate; identical runtime cost to a hand-written class.

4Define a singly linked list Node

Each Node carries an int and a forward pointer. The 'Node | None' string is a forward reference because Node is mid-definition. Every Node is a separate Python object on the heap with its own header (~56 bytes on CPython 3.12), value box (~28 bytes), and __dict__ — total ~80-100 bytes per int. A list[int] stores a pointer per element (~8 bytes) plus shared int objects.

8Define the array-append benchmark

Takes n, the number of elements to append. We will invoke this with n = 1,000,000 to expose constant-factor differences.

EXECUTION STATE
n = 1,000,000
9Empty Python list

arr starts with capacity 0. CPython's list grows by roughly 1.125x with rounding, doubling enough times that the total work to append n elements is O(n) amortized. Each grow allocates a new contiguous PyObject* buffer and memcpys the old one — but this happens O(log n) times, not n times.

10Start the timer

We exclude list creation from the timing because the cost we care about is per-append work.

11The hot loop

range(n) yields integers 0..n-1 lazily; no million-element list is materialized. The interpreter overhead per iteration is the same for both benchmarks, so any difference we see is structural.

12Append: amortized O(1)

list.append is a C-level operation: bump the size, write into the buffer, possibly realloc. Writes happen into already-cached memory most of the time because the buffer is contiguous and the CPU prefetcher learns the access pattern within a few iterations.

13Return elapsed seconds

On a 2024 laptop this prints something like 0.05s for n=1M. About 50 nanoseconds per append — most of which is Python interpreter overhead, not the append itself.

EXECUTION STATE
elapsed = ~0.05s
15Linked-list append benchmark

We maintain head and tail so each append is O(1) — otherwise we'd be timing O(n^2) and the comparison would be meaningless.

16Initialize head as None

Empty list. The 'Node | None' annotation forces us to handle the empty case explicitly, which is exactly the kind of corner case that breaks naive linked-list code.

EXECUTION STATE
head = None
17Tail pointer for O(1) append

Without a tail pointer, appending requires walking from head every time — O(n^2) total. The tail pointer is the textbook fix.

EXECUTION STATE
tail = None
18Start the timer

Exclude the head/tail init from the timed region.

19Same n, same loop

Identical loop structure to the array case so any speed difference is structural, not loop overhead.

20Allocate a fresh Node

This is where the linked list pays dearly. Node(i) calls Python's object allocator, which on n=1,000,000 means a million heap allocations. Each one walks a free list, may trigger a small-block-pool refill, and increments the gc generation counter. Amortized O(1), but the constant factor is ~200-500 ns each — orders of magnitude more than writing into a pre-grown array slot.

EXECUTION STATE
i = current index
node = fresh heap object
21First-node case

If the list is empty, the new node becomes both head and tail.

22Set head and tail together

Python tuple-assignment is one bytecode (STORE_FAST twice). No performance trick here, just cleaner code.

EXECUTION STATE
head = node (id 0)
tail = node (id 0)
23General case

Splice the new node onto the existing tail in O(1). This is the textbook insertion that linked lists are supposed to be fast at.

24Wire the old tail forward

The old tail's next pointer now references the new node. This is a single attribute write — fast in isolation, but Python attribute writes go through the dict/slots machinery (~50-100 ns) versus a buffer write of ~1 ns.

25Advance the tail pointer

The new node is now the tail. Total work per append: 1 allocation + 2 attribute writes ~ 300-600 ns. The list.append equivalent is ~50 ns. Linked-list append is ~10x slower per element, before we even start measuring traversal.

EXECUTION STATE
tail = new node
26Return elapsed

On the same machine: ~0.4 to 0.6 seconds for n=1M. Compared to ~0.05s for the array. 8-12x slower in real wall-clock time — and we haven't even traversed it yet.

EXECUTION STATE
elapsed = ~0.5s
28Sum benchmark for the array

Now we measure the cost of traversing each container. This is where cache locality really decides the fight.

29Time the traversal

Pure read pass — no allocations, no writes.

30Initialize accumulator

An int that will be incremented n times. CPython boxes small ints, so total += x reuses the small-int cache for values < 256 and allocates fresh PyLong objects above that. Doesn't affect the relative comparison.

EXECUTION STATE
total = 0
31Iterate the array

for x in arr uses the list iterator, which walks contiguous PyObject* slots. The CPU prefetcher detects the linear pattern and pulls cache lines (64 bytes = 16 pointers) ahead of demand, so misses are rare.

32Add into accumulator

Trivial work compared to the memory access. On modern hardware, the integer add takes ~1 cycle; a cache hit takes ~4 cycles; a cache miss takes ~200-300 cycles. Sequential array reads are mostly hits.

33Return total + elapsed

Typically ~0.03s for n=1M. Roughly 30 ns per element, dominated by interpreter overhead — the actual memory access is essentially free thanks to prefetching.

EXECUTION STATE
elapsed = ~0.03s
35Linked-list sum

Same total operation, but now the iterator chases pointers between heap objects with no spatial locality.

36Time the traversal

Same timing harness.

37Accumulator

Same as before.

EXECUTION STATE
total = 0
38Start at head

cur is our walker. Walking by pointer instead of index is the linked-list signature pattern.

EXECUTION STATE
cur = head
39Loop until None

Standard linked-list traversal. Iteration count is exactly n.

40Add the value

Reads cur.value — an attribute lookup. The Node object lives somewhere on the heap chosen by the allocator at construction time.

41The cache-miss line

cur = cur.next jumps to wherever that node was allocated. Because the million Nodes were allocated interleaved with other Python objects (gc bookkeeping, int boxes, etc.), they are scattered across the heap. Each .next dereference is likely a cache miss — ~100-300 cycles versus ~4 for the array case. This is the line where linked lists die.

EXECUTION STATE
cur = next node (cache miss likely)
42Return total + elapsed

Typically ~0.15-0.40s for n=1M — 5-15x slower than the array traversal, despite both being O(n) with the same number of iterations. The asymptotic complexity is identical; the constant factor is decided entirely by memory layout.

EXECUTION STATE
elapsed = ~0.25s
44Driver

Standard 'if main' guard so this file works as both a module and a script.

45Set N to a million

Underscores are a numeric literal separator — purely for readability. Works in Python 3.6+, Java 7+, Rust, JS (ES2021+).

EXECUTION STATE
N = 1,000,000
46Print array timing

f-string with format spec :.3f prints to 3 decimal places. The :, format adds thousands separators.

47Print linked-list timing

Same format. Run this file. The linked-list line will be conspicuously larger — typically 8-15x — purely due to allocation and pointer chasing. Big-O told us they were equal. Reality disagrees.

9 lines without explanation
1import time
2from dataclasses import dataclass
3
4@dataclass
5class Node:
6    value: int
7    next: "Node | None" = None
8
9def append_array(n: int) -> float:
10    arr = []
11    t0 = time.perf_counter()
12    for i in range(n):
13        arr.append(i)
14    return time.perf_counter() - t0
15
16def append_linked(n: int) -> float:
17    head: Node | None = None
18    tail: Node | None = None
19    t0 = time.perf_counter()
20    for i in range(n):
21        node = Node(i)
22        if tail is None:
23            head = tail = node
24        else:
25            tail.next = node
26            tail = node
27    return time.perf_counter() - t0
28
29def sum_array(arr: list[int]) -> tuple[int, float]:
30    t0 = time.perf_counter()
31    total = 0
32    for x in arr:
33        total += x
34    return total, time.perf_counter() - t0
35
36def sum_linked(head: "Node | None") -> tuple[int, float]:
37    t0 = time.perf_counter()
38    total = 0
39    cur = head
40    while cur is not None:
41        total += cur.value
42        cur = cur.next
43    return total, time.perf_counter() - t0
44
45if __name__ == "__main__":
46    N = 1_000_000
47    print(f"append array  ({N:,}): {append_array(N):.3f}s")
48    print(f"append linked ({N:,}): {append_linked(N):.3f}s")

Same experiment in C++, where the difference is even larger because the absolute cost of memory access is a bigger fraction of the total work.

bench.cpp — std::vector vs std::forward_list, -O2
📄cpp
1Compile with -O2

Without optimization the comparison is meaningless: -O0 hides hardware costs behind interpreter-like overhead. -O2 is what production C++ ships with. -std=c++20 enables modern range-for and templated lambdas.

2Headers

<chrono> for high-resolution timing, <forward_list> for the singly linked list (std::list is doubly linked and even slower per node), <vector> for the contiguous-array competitor.

6Type alias for the clock

high_resolution_clock is typically a typedef for steady_clock on glibc — monotonic and nanosecond-resolution. Aliasing keeps the code legible.

8Generic timing helper

A function template that accepts any callable. seconds([&] { ... }) returns the elapsed seconds. The forwarding reference F&& accepts lvalues, rvalues, lambdas, function pointers — anything callable.

9Snapshot t0

auto deduces the chrono::time_point type. We capture it before the call.

10Run the benchmarked code

f() is invoked exactly once. The lambda captures by reference (&), so it can mutate v and l from the enclosing scope.

11Compute and return seconds

duration<double> converts the elapsed nanoseconds to a double in seconds. .count() pulls out the numeric value.

15constexpr N

constexpr lets the compiler embed N as an immediate constant. The single-quote separator (1'000'000) is C++14+ digit grouping, equivalent to Python's underscore.

EXECUTION STATE
N = 1,000,000
17Empty vector

std::vector<int> manages a contiguous heap-allocated buffer. Default capacity is 0; capacity grows by ~2x per push when exhausted.

18Reserve up front

v.reserve(N) preallocates capacity so push_back never triggers a realloc. This isolates 'append cost' from 'realloc cost' and is a fair comparison to the linked list, which also doesn't realloc. Without reserve the vector still wins, just by less.

19Time the vector build

Capture by reference into the lambda; invoke through seconds(). The lambda body is the only code timed.

20Push N ints into the vector

push_back into reserved capacity is just two instructions: write the int, increment the size. The CPU writes into already-cached memory thanks to spatial locality and prefetching.

23Empty forward_list

std::forward_list is the singly linked list — the smallest, fastest linked list std offers. Every node is a separate heap allocation containing { int value; node* next; }.

24Time the list build

Same harness, different container.

25push_front: O(1) per node

Each push_front calls operator new, runs the system allocator (typically tcmalloc/jemalloc/glibc malloc), threads the new node onto head, and updates head. The allocation is the cost — ~30-80 ns each on a modern x86 — versus ~1 ns for a vector push_back into reserved capacity.

28Init traversal accumulator

long long avoids overflow on N*(N-1)/2 ~ 5e11.

EXECUTION STATE
sum_v = 0
29Time the vector traversal

Read-only pass over all N elements.

30Range-for over contiguous memory

Compiles to a tight loop with a pointer increment. The CPU's hardware prefetcher recognizes the streaming pattern and pulls cache lines (64 bytes = 16 ints) ahead of demand. Effective memory latency: 0-1 ns/element.

33Init list accumulator

Same.

EXECUTION STATE
sum_l = 0
34Time the list traversal

Same number of iterations as the vector. Asymptotically O(n) for both.

35Pointer-chase the list

Each iteration reads node->value (one cache line) and node->next (same line, since they're packed in the node struct). The next iteration jumps to wherever node->next points — likely a different, uncached page. The prefetcher cannot help: it doesn't know where to prefetch until the current load resolves. Effective memory latency: 50-150 ns/element.

38Print results

On a 2024 laptop with -O2: vector traverse ~0.5 ms, fwdlist traverse ~10-30 ms — a 30-60x ratio for the same Big-O. The vector also builds ~5x faster. This is the experiment Bjarne Stroustrup demonstrates in his 'Why you should avoid linked lists' talk.

EXECUTION STATE
vector traverse = ~0.5 ms
fwdlist traverse = ~20 ms
19 lines without explanation
1// g++ -O2 -std=c++20 bench.cpp && ./a.out
2#include <chrono>
3#include <forward_list>
4#include <iostream>
5#include <vector>
6using clock_hr = std::chrono::high_resolution_clock;
7
8template <class F>
9double seconds(F&& f) {
10    auto t0 = clock_hr::now();
11    f();
12    return std::chrono::duration<double>(clock_hr::now() - t0).count();
13}
14
15int main() {
16    constexpr int N = 1'000'000;
17
18    std::vector<int> v;
19    v.reserve(N);
20    double t_v = seconds([&] {
21        for (int i = 0; i < N; ++i) v.push_back(i);
22    });
23
24    std::forward_list<int> l;
25    double t_l = seconds([&] {
26        for (int i = 0; i < N; ++i) l.push_front(i);
27    });
28
29    long long sum_v = 0;
30    double s_v = seconds([&] {
31        for (int x : v) sum_v += x;
32    });
33
34    long long sum_l = 0;
35    double s_l = seconds([&] {
36        for (int x : l) sum_l += x;
37    });
38
39    std::cout << "vector  build " << t_v*1000 << " ms, traverse " << s_v*1000 << " ms\n";
40    std::cout << "fwdlist build " << t_l*1000 << " ms, traverse " << s_l*1000 << " ms\n";
41}

On a 2024 laptop, typical results: vector traverse 0.5\approx 0.5 ms, forward_list traverse 20\approx 20 ms — a 40x ratio for two algorithms with identical O(n)O(n) complexity. The difference is the memory hierarchy.


Interactive: Pick the Right Structure

Five real-world workloads. For each one, pick the container that an experienced engineer would actually choose, then read why.

Scenario 1 of 5

Most-recently-used (MRU) cache

You need to maintain at most K items keyed by string. On every access you must (a) look up by key in O(1), (b) mark the item as most-recent, and (c) evict the least-recent in O(1) when the cache fills up.


Containers in the Wild

How standard libraries actually expose these trade-offs:

LanguageDefault sequence (array)Linked listBlock dequeOrdered map
Pythonlist (dynamic array)— (none in stdlib)collections.dequesortedcontainers / dict (insertion order)
C++std::vectorstd::list (DLL), std::forward_list (SLL)std::dequestd::map (rb-tree)
JavaArrayListLinkedList (DLL — avoid)ArrayDequeTreeMap (rb-tree)
RustVec<T>std::collections::LinkedListVecDequeBTreeMap
Goslice ([]T)container/list— (use slice + indices)— (sorted slice or 3rd-party)
JS / TSArray— (none)— (use Array or 3rd-party)Map (insertion-ordered)

Two patterns jump out. First, every modern language ships a dynamic array as the default sequence, because that is the right answer ~70% of the time. Second, several languages either omit the linked list entirely (Python, JS/TS) or warn against it (Rust's docs literally say "you almost always want Vec or VecDeque"). The ecosystem has spoken: linked lists are a niche tool.


Pitfalls in Choosing

  • Reflexively reaching for LinkedList from intro classes. The course said it was O(1)O(1) at the head, so it must be fast. It isn't. Always benchmark before committing.
  • Believing Big-O without measuring. Two O(n)O(n) algorithms can differ by 100x on real hardware. Big-O ranks scaling, not speed.
  • Premature pointer-itis. "I might need to splice in the middle someday" is the most expensive sentence in performance engineering. Profile real workloads, not imagined ones.
  • Java LinkedList. It implements both List and Deque, which makes it look like a universal answer. It is almost never the right answer. Use ArrayList or ArrayDeque.
  • Forgetting allocator pressure. Every new linked-list node is a malloc/new — slow on its own and fatal under contention from many threads. Arena and slab allocators help, but they push the complexity onto you.
  • Ignoring the auxiliary index. If you need both fast lookup and fast splicing, you need both a hash map and a list — not one or the other. The LRU cache is the archetype; copy it.
  • Using a linked list where a deque belongs. If you only need both-ends operations, a block deque is faster, simpler, and more cache-friendly. A linked list is overkill for queues and stacks.

Quick Checks

Quick Check

You will store 1,000,000 sensor readings, then iterate them sequentially 100 times to compute statistics. No insertions in the middle, no deletions. Which container?

Quick Check

You are designing an LRU cache for HTTP responses, keyed by URL, that must serve get and put in O(1) and evict the least recently used entry when full. Which combination?

Quick Check

Bjarne Stroustrup demonstrated that std::vector beats std::list for sorted-insertion of integers up to n = 500,000, despite both being O(n^2). What is the dominant reason?

Loading comments...