Chapter 7
18 min read
Section 38 of 261

Array vs Linked List Implementation

Stacks

The previous section described the Stack ADT as a black box: push, pop, peek, size, empty, all running in O(1)O(1). That contract says nothing about how memory is laid out. Two implementations dominate every stack you will ever touch: a dynamic array and a singly linked list. Both deliver O(1)O(1) per operation. They are not interchangeable.

This section dissects both, traces the cost of every line of code, and shows when each one is the right call. By the end you should be able to read std::stack\texttt{std::stack}, ArrayDeque\texttt{ArrayDeque}, the JVM operand stack, and the lock-free Treiber stack in java.util.concurrent\texttt{java.util.concurrent} — and explain why each one chose the implementation it did.


One ADT, Two Skeletons

The Stack ADT has only one privileged end: the top. Everything we need from an implementation is the ability to add and remove at that one end in constant time. Two memory layouts give us that:

ImplementationWhere is the top?pushpop
Dynamic arrayLast element of the arrayappend (write to slot size)pop_back (clear slot size−1)
Singly linked listThe head nodeprepend (allocate, point new node at head)advance head (free old head)

Both choices avoid the costly end of their underlying structure: the array never shifts elements, the linked list never walks to the tail. That is the whole trick — everything else is engineering.

Why not push at index 0 of an array?

Because that would force every existing element to slide right by one slot — O(n)O(n) per push. The top of an array stack is always the high-index end. Equivalently, the top of a linked-list stack is always the head, never the tail (a singly linked tail-push would need to walk the whole list, which is also O(n)O(n)).

The Array-Backed Stack

A dynamic array stack stores elements in a contiguous buffer of capacity CC, with a counter nCn \le C tracking the current size. The top of the stack lives at index n1n - 1.

Memory Layout

📝text
1capacity = 8
2       ┌───┬───┬───┬───┬───┬───┬───┬───┐
3data:  │ 3 │ 7 │ 1 │ 4 │ 9 │ · │ · │ · │
4       └───┴───┴───┴───┴───┴───┴───┴───┘
5         0   1   2   3   4   5   6   7
67                      top = 4 (size = 5)

Push writes to slot nn and increments nn. Pop decrements nn and returns the slot (optionally clearing it for the GC). When n=Cn = C we cannot push without first resizing the buffer.

Capacity Doubling and Amortized O(1)O(1)

The standard growth policy doubles capacity on overflow. A single resize is expensive — it allocates a new buffer of size 2C2C and copies all CC existing elements — but it is rare. Over a sequence of nn pushes starting from capacity 1, the total copy cost is

1+2+4++2log2n2n1 + 2 + 4 + \cdots + 2^{\lfloor \log_2 n \rfloor} \le 2n,

so the average cost per push is bounded by a constant. This is the classic amortized O(1)O(1) argument we will formalize in Chapter 5. Worst-case push, however, is O(n)O(n) — the one push that triggers a resize copies the whole buffer. That distinction matters in real-time systems, where you cannot afford an unbounded latency spike.

Amortized vs worst-case

Most stacks you encounter on a server, in a compiler, or in a game engine are array-backed because amortized O(1)O(1) is genuinely fast. Avionics, audio DSP, and hard-real-time control systems often cannot tolerate a single O(n)O(n) push and prefer a linked-list stack or a fixed-capacity bounded stack.

Python Implementation

Python's built-in list is itself a dynamic array, so a Pythonic stack is one line: stack = []; push is stack.append(x); pop is stack.pop(). That is exactly the right thing in production. To understand what append does, here is the same machinery written explicitly:

ArrayStack: dynamic-array implementation
🐍array_stack.py
2Pre-allocate a buffer

We start with a small fixed-size backing list. The buffer is sized by capacity, NOT by number of elements stored — there is a real distinction between capacity (4) and size (0).

EXAMPLE
_data = [None, None, None, None], _size = 0, _capacity = 4
3Track logical size separately

_size counts how many slots are live. Slots beyond _size are reserved (capacity) but unused.

4Track buffer capacity

_capacity is len(_data). We could call len(self._data) but caching it makes resize logic clearer and saves a function call in the hot path.

13Capacity check before write

If _size == _capacity, the buffer is full. We must grow before writing to slot _size.

EXAMPLE
After 4 pushes from initial_capacity=4: _size == _capacity == 4. Next push triggers _resize(8).
14Double the capacity

Doubling gives amortized O(1) push. Growth factor of 2 is canonical; CPython uses a milder growth (≈ 1.125) to save memory; C++ std::vector uses 2 on libc++ and 1.5 on libstdc++.

15Write at the top

Slot _size is the next free slot. After writing, this becomes the new top.

EXAMPLE
Before: _data = [3, 7, 1, 4, None, None, None, None], _size = 4. push(9): _data[4] = 9.
16Increment size

Now _size = 5 and _data[4] = 9 is the top. Order matters: write THEN increment, so an exception during the write does not leave _size pointing at uninitialized memory.

19Empty check on pop

Popping an empty stack is a programming error, not a recoverable one. Raise rather than return None — None is a perfectly valid stack element.

21Decrement first

Decrement _size first so the slot at the OLD top index is now logically out of bounds. This is symmetric with push (write-then-increment, decrement-then-read).

EXAMPLE
Before pop: _size = 5, top at index 4. After: _size = 4, value at index 4 is being returned.
22Read the value

Read from the slot we just logically released. The slot still physically holds the value until the next line.

23Clear the slot for GC

Without this line, _data still holds a reference to the popped object. The object cannot be garbage-collected even though it is no longer reachable through the stack API. Setting to None drops that reference.

EXAMPLE
If you push a 1GB tensor and pop it, you want it freed — not pinned in _data.
28Peek = read top without removing

Top lives at index _size - 1. Peek does not modify state.

31Resize: allocate new buffer

new_capacity is double the old capacity. We allocate a fresh list of that size — this is the expensive step, O(new_capacity).

33Copy elements

Copy all _size live elements into the new buffer. Slots from _size to new_capacity stay None. This is the O(n) cost that gives push its amortized analysis.

EXAMPLE
Old _data = [3, 7, 1, 4], new_data starts as [None]*8, after loop = [3, 7, 1, 4, None, None, None, None].
35Swap buffers

_data now points at the new, larger buffer. The old buffer becomes garbage and is freed by Python's GC.

22 lines without explanation
1class ArrayStack:
2    def __init__(self, initial_capacity: int = 4) -> None:
3        self._data: list[object] = [None] * initial_capacity
4        self._size: int = 0
5        self._capacity: int = initial_capacity
6
7    def __len__(self) -> int:
8        return self._size
9
10    def is_empty(self) -> bool:
11        return self._size == 0
12
13    def push(self, value: object) -> None:
14        if self._size == self._capacity:
15            self._resize(self._capacity * 2)
16        self._data[self._size] = value
17        self._size += 1
18
19    def pop(self) -> object:
20        if self._size == 0:
21            raise IndexError("pop from empty stack")
22        self._size -= 1
23        value = self._data[self._size]
24        self._data[self._size] = None  # release reference for GC
25        return value
26
27    def peek(self) -> object:
28        if self._size == 0:
29            raise IndexError("peek on empty stack")
30        return self._data[self._size - 1]
31
32    def _resize(self, new_capacity: int) -> None:
33        new_data: list[object] = [None] * new_capacity
34        for i in range(self._size):
35            new_data[i] = self._data[i]
36        self._data = new_data
37        self._capacity = new_capacity

Pointer invalidation in C/C++

In Python every element is a reference, so resize is safe. In C++, std::vector resize invalidates pointers, iterators, and references into the buffer. If you held &stack.top() across a push, your pointer now dangles. This is the single most common bug when migrating from a linked-list stack to an array stack — we will see how std::stack sidesteps it below.

The Linked-List Stack

A linked-list stack stores each element in its own heap-allocated node. Each node holds a value and a pointer to the next node. The stack itself is a single pointer, head, pointing at the top node.

Memory Layout

📝text
1head ──► [9 | ●] ──► [4 | ●] ──► [1 | ●] ──► [7 | ●] ──► [3 | ∅]
2          top                                              bottom
3
4  push(v): new node N with N.next = head; head = N
5  pop():    h = head; head = head.next; free h; return h.value

Both push and pop touch only the head pointer and at most one node. There is no “buffer full” case — growth is one allocation per push. There is also no resize spike: every push pays exactly the same constant cost, so worst-case push is O(1)O(1), not just amortized.

Python Implementation

LinkedStack: singly-linked-list implementation
🐍linked_stack.py
1dataclass for the node

@dataclass auto-generates __init__, __repr__, __eq__. Cleaner than writing a class by hand for a pure data carrier.

5Forward reference in the type

Optional["_Node"] uses a string forward reference because _Node is being defined right now. At runtime Python evaluates this lazily.

6Two fields, that is all

A node is value + next pointer. No prev pointer (this is a singly linked list, not doubly). For a stack, prev is unnecessary — we never need to walk backwards.

EXAMPLE
Memory per node in CPython: ~64 bytes overhead for the object + pointer fields. Compare to ~8 bytes per slot in ArrayStack.
11head starts at None

An empty stack has no top node. head = None is the canonical empty state.

21Push: allocate, prepend, retarget head

Three things in one expression: (1) allocate a new node, (2) point its next at the current head, (3) make head point at the new node. THIS IS A SINGLE LINE because it must be atomic from the data structure's perspective — concurrent code needs all three to happen together.

EXAMPLE
Before: head → [4]→[1]→∅. push(9): allocate N(9, next=head). head = N. After: head → [9]→[4]→[1]→∅.
26Empty check on pop

head is None means the stack is empty. Same convention as ArrayStack.

28Save the old head

We need it for two reasons: to read its value (line 32), and to break its next pointer (line 30) so the popped node does not pin the rest of the list in memory.

29Advance head

head now points at what WAS the second node. The old head is now unreachable from the stack — but it still holds a next pointer into the live list.

30Sever the popped node

Set node.next = None so the popped node does not transitively reference the rest of the list. Without this, if the caller holds onto the returned value AND somehow the popped node, the WHOLE LIST would be retained. Defensive hygiene.

32Return the value, not the node

The node is an internal implementation detail. Callers see only the user value — they should never know nodes exist.

26 lines without explanation
1from dataclasses import dataclass
2from typing import Optional
3
4@dataclass
5class _Node:
6    value: object
7    next: Optional["_Node"] = None
8
9class LinkedStack:
10    def __init__(self) -> None:
11        self._head: Optional[_Node] = None
12        self._size: int = 0
13
14    def __len__(self) -> int:
15        return self._size
16
17    def is_empty(self) -> bool:
18        return self._head is None
19
20    def push(self, value: object) -> None:
21        self._head = _Node(value=value, next=self._head)
22        self._size += 1
23
24    def pop(self) -> object:
25        if self._head is None:
26            raise IndexError("pop from empty stack")
27        node = self._head
28        self._head = node.next
29        node.next = None  # break reference, help GC
30        self._size -= 1
31        return node.value
32
33    def peek(self) -> object:
34        if self._head is None:
35            raise IndexError("peek on empty stack")
36        return self._head.value

Why a dataclass is fine here

For a stack, we never need to walk backwards or splice in the middle, so a single-field next pointer suffices. If you find yourself needing a doubly linked list inside a stack implementation, stop — you almost certainly want a deque (Chapter 8) instead.

Interactive: Two-Stacks Race

The two implementations look almost identical from the outside. The difference is what they cost per push. Drive both at once below: every click pushes (or pops) the same value on both stacks. The chart records the per-operation cost in arbitrary work units — one unit per slot write or pointer assignment, plus a copy-cost spike whenever the array stack resizes.

Loading interactive race…
  • Push 4 elements. The array fills its initial buffer; cost is flat (1 unit each). The linked list pays 3 units per push (allocate + 2 writes).
  • Push the 5th element. The array stack resizes: its bar shoots up to 5\approx 5 units (copy 4 + write 1). Linked list bar stays flat at 3.
  • Push the 9th, 17th, 33rd elements: each triggers another resize, with bars 9,17,33\approx 9, 17, 33. The spikes get bigger but rarer — this is amortized O(1)O(1) in pictures.
  • Watch the average cost. Even with the spikes, the array stack's average creeps toward a small constant (around 2 units). Linked stack average is locked at 3.

What the chart is and is not

These are didactic work units, not benchmark cycles. In real hardware, an array push is much cheaper than a linked-list push not because of operation count but because of cache locality: the array slot you are writing is in L1; the linked-list node is a fresh allocation that may miss every cache. We will see this measured in the next subsection.

Side-by-Side Comparison

PropertyArray stackLinked-list stack
push (average)O(1) amortizedO(1) worst-case
push (worst-case)O(n) on resizeO(1)
popO(1)O(1)
peekO(1)O(1)
Memory per element1 word (the value itself)≥ 2 words (value + next), plus per-allocation overhead
Allocator pressureRare, large allocations on resizeOne allocation per push
Cache localityExcellent (contiguous)Poor (pointer-chased nodes)
Pointer to top stable across push?No (resize invalidates)Yes
Trivial to bound sizeYes (cap capacity)Same, but less natural
Lock-free implementationHard (must coordinate index + buffer)Standard pattern (Treiber stack)
Works with arena allocatorAwkward (resize fights the arena)Natural

Why Cache Locality Dominates in Practice

Modern CPUs run roughly two orders of magnitude faster than main memory. An L1 hit takes ~1 ns; a main-memory miss takes ~100 ns. The array stack's top is almost always in L1 because we just touched the adjacent slots. The linked-list stack's top is a freshly allocated node, often on a cache line that has never been touched — every push misses.

For typical short-lived stacks of small values, an array-backed stack is between 3× and 10× faster than a linked-list stack on commodity hardware, despite both being “O(1)O(1)”. This is why std::stack\texttt{std::stack}, Java's ArrayDeque\texttt{ArrayDeque}, Python's list, and the JVM operand stack are all array-backed.

Quick Check

You push 9 elements onto an array stack with initial capacity 4 and growth factor 2 (so capacities go 4, 8, 16, …). Counting 1 unit per slot write and 1 per element copied during resize, the total work is closest to:


The Bounded Stack Variant

Sometimes you want the array stack without the resize. A bounded stack has a fixed capacity decided at construction; pushing past capacity is an error rather than a trigger to grow.

🐍python
1class BoundedStack:
2    def __init__(self, capacity: int) -> None:
3        if capacity <= 0:
4            raise ValueError("capacity must be positive")
5        self._data: list[object] = [None] * capacity
6        self._size = 0
7        self._capacity = capacity
8
9    def push(self, value: object) -> None:
10        if self._size == self._capacity:
11            raise OverflowError("stack overflow")
12        self._data[self._size] = value
13        self._size += 1
14
15    def pop(self) -> object:
16        if self._size == 0:
17            raise IndexError("pop from empty stack")
18        self._size -= 1
19        value, self._data[self._size] = self._data[self._size], None
20        return value

Bounded stacks dominate in:

  • Embedded firmware — no malloc, every byte budgeted at compile time.
  • Hardware call stacks — an x86 thread stack is a fixed 1–8 MB; overflow is a hard fault.
  • Fixed-depth recursion guards — explicit stacks for DFS where a depth limit is part of the spec.
  • Interpreters with predictable working set — the JVM operand stack has a per-method max_stack attribute computed at compile time so the interpreter pre-allocates exactly the right amount.

Always-bounded contract

A bounded stack is the only implementation that gives you a hard worst-case memory bound. The unbounded array and linked stacks both grow on demand; if your input is adversarial, that growth is unbounded too.

Concurrency: The Treiber Stack

Both implementations above are single-threaded. If two threads call push simultaneously on the same stack, you can lose updates, corrupt the head pointer, or skip a slot. The standard fix is a mutex around every operation. The elegant fix — available only on the linked-list version — is the Treiber stack (Treiber 1986), a lock-free stack that uses a single atomic compare-and-swap (CAS) on the head pointer.

🐍python
1# Pseudocode — real Treiber stack needs language-level atomics.
2# In Python you would use threading.Lock; in Java, AtomicReference;
3# in C++, std::atomic<Node*> with memory ordering.
4
5def push(stack, value):
6    new_node = Node(value=value)
7    while True:
8        old_head = stack.head            # atomic load
9        new_node.next = old_head
10        if cas(stack.head, old_head, new_node):  # atomic compare-and-swap
11            return                       # success; otherwise retry
12
13def pop(stack):
14    while True:
15        old_head = stack.head            # atomic load
16        if old_head is None:
17            return None                  # or raise
18        new_head = old_head.next
19        if cas(stack.head, old_head, new_head):
20            return old_head.value

The CAS atomically does this: if stack.head still equals old_head, replace it with the new value and return success; otherwise return failure and leave it alone. The loop retries on contention. This is lock-free — some thread always makes progress — but not wait-free: a single thread could in principle spin forever if it is unlucky.

The ABA problem

Treiber's algorithm has a famous bug. Suppose thread T1 reads head = A and is about to CAS. Meanwhile thread T2 pops A, pops B, and pushes A again (perhaps reusing the same memory). When T1 wakes up, the CAS sees head == A and succeeds — but A's next pointer now points at deallocated memory. The fix is a tagged pointer (head + 64-bit version counter, atomically incremented on every change) or hazard pointers. In Java this is hidden by the GC: A is not reclaimed while T1 still holds a reference. In C++ you must implement it yourself.

An array-backed lock-free stack is much harder because you must atomically coordinate two things: the size index and the buffer pointer (which can move on resize). This is why every lock-free stack you will encounter — ConcurrentLinkedDeque\texttt{ConcurrentLinkedDeque} internals, the GC's mark stack, kernel work-stealing queues — uses a linked-list layout.

Quick Check

Why is the Treiber stack called 'lock-free' but not 'wait-free'?


C++ Implementations

C++ exposes the array-vs-linked tension in the standard library itself. std::stack is a container adapter: it wraps an underlying container and forwards push/pop to it. The default underlying container is std::deque, not std::vector, for one specific reason — we will see why.

Array-Backed Stack in C++

ArrayStack in C++ over std::vector
📄array_stack.hpp
6Generic over element type

Templated on T so a single class works for ArrayStack<int>, ArrayStack<std::string>, ArrayStack<MyType>, etc.

9push: forward to vector

vector::push_back amortizes reallocation just like our Python ArrayStack._resize. Capacity doubles (libc++) or grows by ×1.5 (libstdc++) on overflow.

EXAMPLE
If T = std::string, this copies the string into the vector slot.
10push: rvalue overload for moves

When the caller passes a temporary or std::move(x), this overload moves rather than copies — critical for non-trivially-copyable T like std::vector or std::unique_ptr.

EXAMPLE
stack.push(std::move(big_string)) avoids copying the string contents.
13Defensive empty check

Standard library's std::stack does NOT do this — popping an empty std::stack is undefined behavior. We add the check to make our class safer at small cost.

14pop_back is O(1)

vector::pop_back destroys the last element and decrements size; capacity is unchanged. Critically, no resize is ever triggered by pop — only push grows.

18top returns a REFERENCE

We return T& so the caller can read or modify in place. But this reference is invalidated by the next push that resizes! See the warning below.

19back() = the high-index end

vector::back() returns *(end() - 1). For a stack, this is the top.

20 lines without explanation
1#include <cstddef>
2#include <stdexcept>
3#include <utility>
4#include <vector>
5
6template <typename T>
7class ArrayStack {
8public:
9    void push(const T& value)  { data_.push_back(value); }
10    void push(T&& value)       { data_.push_back(std::move(value)); }
11
12    void pop() {
13        if (data_.empty()) throw std::out_of_range("pop from empty stack");
14        data_.pop_back();
15    }
16
17    T& top() {
18        if (data_.empty()) throw std::out_of_range("top of empty stack");
19        return data_.back();
20    }
21
22    bool empty() const noexcept { return data_.empty(); }
23    std::size_t size() const noexcept { return data_.size(); }
24
25private:
26    std::vector<T> data_;
27};

The pointer-stability footgun

top() returns T&. After a push that triggers a vector reallocation, that reference is dangling. Code like
📄cpp
1auto& x = stack.top();
2stack.push(other);   // may reallocate
3x = something;       // UNDEFINED BEHAVIOR — x dangles
is the most common bug when migrating from a linked-list stack (where node addresses are stable) to a vector-backed stack. This is why std::stack defaults to std::deque: deque uses a chunked layout so element addresses are stable across push (though not insert/erase). For most users the deque default is fine; for the lowest possible overhead specify std::stack<T, std::vector<T>> and never hold a reference across a push.

Linked-List Stack in C++

📄cpp
1#include <memory>
2#include <stdexcept>
3#include <utility>
4
5template <typename T>
6class LinkedStack {
7    struct Node {
8        T value;
9        std::unique_ptr<Node> next;
10        Node(T v, std::unique_ptr<Node> n)
11            : value(std::move(v)), next(std::move(n)) {}
12    };
13
14    std::unique_ptr<Node> head_;
15    std::size_t size_ = 0;
16
17public:
18    void push(T value) {
19        head_ = std::make_unique<Node>(std::move(value), std::move(head_));
20        ++size_;
21    }
22
23    T pop() {
24        if (!head_) throw std::out_of_range("pop from empty stack");
25        T value = std::move(head_->value);
26        head_ = std::move(head_->next);  // old head freed here
27        --size_;
28        return value;
29    }
30
31    T& top() {
32        if (!head_) throw std::out_of_range("top of empty stack");
33        return head_->value;
34    }
35
36    bool empty() const noexcept { return !head_; }
37    std::size_t size() const noexcept { return size_; }
38};

The unique_ptr chain handles all the memory management: when head_ is reassigned in pop, the old head is destroyed, its next moves out first so the destruction does not recursively destroy the entire list (which would blow the call stack on a deep stack). top() is now reference-stable across push, the trade-off being two heap allocations per push and a near-guaranteed cache miss.

std::stack's actual default

The standard says std::stack<T> is std::stack<T, std::deque<T>>. std::deque is neither pure array nor pure list — it is an array of fixed-size chunks. Push amortized O(1)O(1), references stable, no whole-buffer copy. It is the “safe default” that gives up some cache locality for pointer stability.

Real-World Stacks

StackImplementationWhy this choice
x86 hardware call stackArray (memory grows downward, %rsp register)Single allocation at thread creation; no malloc per call frame; cache-hot.
JVM operand stack (per method)Array, max depth precomputed by javacBytecode declares max_stack at compile time → static allocation, no resize ever.
Python list as a stackArray (CPython listobject.c)Idiomatic stack idiom in Python; same dynamic-array machinery.
Java java.util.Stack (legacy)Array (extends Vector)Predates ArrayDeque; synchronized; effectively deprecated.
Java ArrayDeque (preferred)Circular arrayFaster than Stack and LinkedList for stack workloads; cache-friendly.
Treiber stack in JUCLinked list with AtomicReference headLock-free push/pop via CAS; basis of ConcurrentLinkedQueue and many GC mark stacks.
Compiler activation recordsHardware stack (array)Function call/return is the hot path; cache locality dominates everything.
PostScript interpreter operand stackArray, fixed default depthPredictable footprint for printer firmware; overflow is a defined error.
RPN calculator (HP-style)Array, fixed (often depth 4)Hardware-defined fixed depth; older models actually used 4 named registers.
Browser back-button historyArray per tabBounded by user navigation length; trivial to bound and prune.
DFS on huge graphExplicit linked-list or array stackReplaces system call stack to avoid stack-overflow on deep graphs.
GC mark stack (JVM, V8, .NET)Linked array of fixed-size chunks (a chunked stack)Hybrid: cache locality of array + worst-case O(1) push of linked list.

Notice the chunked GC mark stack: it is essentially a linked list of arrays. Each chunk is contiguous (cache-friendly); chunks are linked together (no resize spike, no whole-buffer copy). When you find yourself wanting both properties, this hybrid is the canonical answer — it is what std::deque is, what the V8 GC's mark stack is, and what every production JVM uses internally for marking.


Pitfalls

1. Conflating amortized with worst-case in a real-time loop

A push that usually takes 5 ns and occasionally takes 5 ms because of a resize will be invisible in average-case microbenchmarks and catastrophic in an audio callback. If you are inside a 1\le 1 ms latency budget, prefer a linked-list stack or pre-reserve the array's capacity (vec.reserve(N)).

2. Holding a reference into the array stack across a push

Already shown in C++ above. Same trap exists in Rust's Vec (the borrow checker catches it at compile time, mercifully) and in Go slices that share a backing array (the runtime does not catch it). Whenever the underlying buffer can move, internal pointers can dangle.

3. Per-allocation overhead in tight loops

A linked-list stack with 10710^7 pushes calls malloc ten million times. On glibc that is ~30 ns per call — 3×1083 \times 10^8 ns = 0.3 s of pure allocator overhead. The same workload on an array stack pays log210724\log_2 10^7 \approx 24 resize allocations total. Use a node pool / arena allocator if you genuinely need a linked stack in a tight loop.

4. ABA in lock-free linked stacks

Already shown above. The fix is tagged pointers (std::atomic<TaggedPointer> with a 64-bit version counter) or hazard pointers, or a managed runtime with GC, or epoch-based reclamation. Never roll your own production lock-free stack without addressing this.

5. Forgetting to clear popped slots in array stack

If T owns resources (file handles, GPU buffers, big strings), leaving the popped slot pointing at the old object means the resource is held until the slot is overwritten or the stack itself is destroyed. The self._data[i] = None line in our Python implementation and the implicit destruction in C++'s vector::pop_back exist precisely to avoid this leak.


Choosing an Implementation

A short decision tree:

  1. Default: array stack. If you do not have a specific reason to choose otherwise, an array-backed stack (list in Python, std::stack over std::vector in C++, ArrayDeque in Java) wins on cache locality, allocator overhead, and code simplicity.
  2. Real-time / bounded latency. Use a bounded array stack with capacity reserved up front, or a linked-list stack with a node pool. Never let the resize spike happen inside the latency budget.
  3. Concurrent push/pop from many threads. Linked list with CAS on the head (Treiber stack). Wrap with hazard pointers or rely on a GC.
  4. Need stable references to elements (you keep a pointer to top\text{top} across mutations). Linked list, or std::deque as a chunked compromise.
  5. Strict memory budget (embedded, kernel). Bounded array stack. Allocation is statically determined.
  6. Both cache locality and worst-case O(1)O(1) push. Chunked stack (linked list of fixed-size arrays). This is what GC mark stacks and std::deque are.
Engineering takeaway:O(1)O(1) per operation” is necessary but rarely sufficient. The constants — cache lines, allocator calls, pointer stability, lock-freedom — are what determine which implementation actually ships. Knowing both implementations lets you pick the right one and explain the choice.

The next section pulls back from data-structure mechanics to look at the most consequential stack in computing: the system call stack, the array stack that every running program uses for every function call.

Loading comments...