Chapter 6
15 min read
Section 30 of 261

Nodes, Pointers, and Memory

Linked Lists

The Pointer Idea

A pointer is a value whose meaning is “the location of something else in memory.” Concretely, on a 64-bit machine, a pointer is an 8-byte unsigned integer that names a byte address in your process's virtual address space. Conceptually—and this is the picture you should carry into every diagram in this chapter—a pointer is an arrow from one object to another.

Languages disagree about how loud they are about pointers. In C, you write int* p and you can print the address with %p; the address arithmetic is yours to do. In Java and Python, every variable that names a non-primitive object is implicitly a pointer—assigning b = a copies the arrow, not the object—but the language forbids you from inspecting or arithmetically manipulating the address.

Why this matters: The data structures in this chapter (and most of the rest of the book) are built by chaining objects together with pointers. Every lookup, insert, delete, splice, and reversal is ultimately a few rewrites of pointer fields. If you can fluently picture an arrow being unhooked from one box and rehooked onto another, the rest of linked structures collapses into careful bookkeeping.

What Is a Node?

A node is the smallest unit of a linked structure: a small struct that bundles a value with one or more pointer fields aimed at other nodes. The simplest node has exactly one pointer:

Node  =  (value,next)\text{Node} \;=\; (\text{value}, \text{next})

That second component, next, is the entire reason linked lists exist. With it, a node knows about the node that comes after it, and we can chain nodes into arbitrarily long sequences without needing them to live next to each other in memory.

Variants you'll meet in later sections of this chapter:

  • Doubly linked node: (value,prev,next)(\text{value}, \text{prev}, \text{next}) — two pointers, walk both directions.
  • Tree node: (value,left,right)(\text{value}, \text{left}, \text{right}) — two pointers, branch instead of chain (Chapter 9).
  • Graph node: (value,[neighbors])(\text{value}, [\text{neighbors}]) — many pointers, no shape constraint (Chapter 16).

Addresses vs References

Two flavors of the same idea cause endless confusion. It pays to name the difference precisely.

ConceptC / C++Java / Python / JS
What a variable holdsEither a value or an explicit pointer (int vs int*)Primitives hold values; everything else holds an implicit reference
Take the address&x produces a pointer to xNot exposed; the language hides it
Follow the pointer*p or p->fieldImplicit on every field access (a.next)
Pointer arithmeticp + 1 advances by sizeof(*p)Forbidden
Null sentinelNULL or nullptrNone (Python), null (Java/JS)

Reference is a pointer in disguise

When the Java spec says “reference” or the Python docs say “name binding,” the runtime is using pointers under the hood. The difference is permission, not mechanism: managed languages refuse to let you read the integer address, do arithmetic on it, or forge a pointer out of thin air. That removes whole classes of bugs (dangling pointers, buffer overruns) at the cost of needing a garbage collector.

Stack vs Heap Allocation

Nodes live on the heap. Local variables, function arguments, and return addresses live on the stack. The distinction governs lifetime and cost.

PropertyStackHeap
Allocation cost~1 ns (move stack pointer)~50–100 ns (allocator bookkeeping)
DeallocationAutomatic at function returnManual (C) or via GC (Java/Python)
LifetimeTied to enclosing function frameIndependent — survives any function
LayoutContiguous, predictableScattered wherever the allocator finds room
Typical useLocals, arguments, return addressesLong-lived objects, dynamic structures

A linked-list node must live on the heap because it has to outlive whatever function created it. That's why a Python Node() call or a C malloc(sizeof(Node)) is irreducible: there's a real cost, in the tens of nanoseconds, every time you grow the list.

Big-O hides this constant. An array push is O(1)O(1) amortized with a constant of ~1 ns; a linked-list prepend is also O(1)O(1) but with a constant of ~50 ns. They have the same asymptotic class and wildly different real-world throughput. Always measure when the constant matters.

Memory Layout: Array vs Linked List

Picture an array of three ints and a linked list of three ints. The bytes look completely different.

📝text
1Array (contiguous):
2
3  0x1000:  [ 7 ][ 3 ][ 9 ]    one cache line, three values, no pointers
4            ^^^   ^^^   ^^^
5            4B    4B    4B   = 12 bytes total payload
6
7Linked list (scattered):
8
9  0x1000:  [ 7 | 0x2080 ]    16 bytes (4 value + 4 pad + 8 ptr)
10  0x2080:  [ 3 | 0x40C0 ]    16 bytes
11  0x40C0:  [ 9 | NULL    ]   16 bytes
12
13Total = 48 bytes for the same 3 ints, plus pointer-chase across 3 cache lines.

Two consequences fall out of this picture:

  1. Cache friendliness. Walking an array is one cache line per ~16 ints (typical 64-byte line). Walking a linked list is one cache miss per node in the worst case, because nodes can sit anywhere on the heap. On modern CPUs a cache miss is ~100× slower than an L1 hit—this is the dominant performance gap between arrays and lists.
  2. Random access. Arrays support a[k]a[k] in O(1)O(1) by computing base+kstride\text{base} + k \cdot \text{stride}. Linked lists have no such formula: to reach the kk-th node you must hop kk times from the head. That's O(k)O(k).

So why ever pay for a linked list? Two operations that arrays make slow:

  • Insert/delete in the middle given a pointer to the spot: O(1)O(1) for a list (rewrite two pointers); O(n)O(n) for an array (shift every element after the gap).
  • Splice/concatenate two lists in O(1)O(1) by re-pointing one tail. With arrays you must allocate-and-copy.
The right question is never “array or linked list?” in the abstract. It's “does my workload do mostly random access (array wins) or mostly local edits given a cursor (list wins)?”

Anatomy of a Node in Python

Python is the easiest place to start because the runtime hides every detail that doesn't matter for the algorithm. Here is a full minimal node, three nodes built by hand, and a complete traversal.

A node, a list, a traversal — pure Python
🐍python
1Define the Node class

A Node is just a Python object bundling two attributes: the payload (value) and a reference to another Node (next). Defining a class allocates no nodes yet — only the blueprint.

2Constructor signature

__init__ runs every time we say Node(...). value is the payload (any Python object). next defaults to None — the conventional terminator that means 'no successor'.

3Store the value

self.value = value attaches the payload to this object. Internally, Python stores a reference to the value object — even ints are objects in CPython, though small ints are cached.

4Store the next pointer

self.next holds a reference to another Node — this IS the pointer. In CPython terms, it's a PyObject* under the hood; from Python it looks like an attribute, but it's the arrow you draw in diagrams.

7Allocate the tail node

Node(9) calls __init__ with value=9 and next=None (the default). CPython's allocator hands out a fresh chunk of heap memory big enough for the object header + two attribute slots. n3 holds the address.

EXAMPLE
n3 -> [value=9 | next=None]
8Allocate the middle node

Node(3, n3) allocates another node. The second argument n3 is bound to self.next, so this node's pointer aims at the node we just made. n3 itself is a name (a local reference) — passing it copies the reference, not the node.

EXAMPLE
n2 -> [value=3 | next=*] -> n3
9Allocate the head node

Node(7, n2) creates the front of the list. We name the variable 'head' by convention — it's the only handle the program needs to reach every other node. Lose head and the whole chain becomes garbage.

EXAMPLE
head -> [7|.] -> [3|.] -> [9|None]
12Cursor variable p

p is a separate name pointing at the same object head points at. Walking the list means moving p forward without touching head — never overwrite head unless you mean to.

13Loop until terminator

'is not None' uses identity comparison, which is the right test for the sentinel. Don't write '!= None' (PEP 8) — None is a singleton, so identity is faster and intentional.

14Visit the value

p.value reads the payload of the current node. This is the only place the loop interacts with data; everything else is plumbing.

EXAMPLE
iter 1: prints 7
iter 2: prints 3
iter 3: prints 9
15Hop one step (the canonical move)

p = p.next rebinds p to the successor. This single line is the heart of every linked-list traversal — memorize it. After three hops, p becomes None and the loop terminates.

LOOP TRACE · 4 iterations
before iter 1
p = node@7
after iter 1
p = node@3
after iter 2
p = node@9
after iter 3
p = None
4 lines without explanation
1class Node:
2    def __init__(self, value, next=None):
3        self.value = value
4        self.next = next
5
6# Build  head -> 7 -> 3 -> 9 -> None  by hand
7n3 = Node(9)
8n2 = Node(3, n3)
9head = Node(7, n2)
10
11# Walk the list and print each value
12p = head
13while p is not None:
14    print(p.value)
15    p = p.next

After all three constructor calls, the object graph in CPython's heap looks exactly like the diagram in the playground below: three independent PyObjects, each carrying an integer reference and a next reference; the names head, n2, n3 are local variables that go out of scope when main returns—but the nodes themselves stay alive as long as head reaches them.


Anatomy of a Node in C

C drops the disguise. Every malloc and every dereference is explicit. The same algorithm becomes longer but more honest.

The same list — explicit malloc / free in C
📄c
4Declare the Node struct

struct Node holds an int payload and a pointer to another Node. The 'struct Node*' inside Node is a self-referential pointer — legal because pointers have a fixed size (8 bytes on 64-bit), so the compiler doesn't need the full Node layout to size the field.

5The value field

Plain int — 4 bytes on most platforms. The compiler may insert 4 bytes of padding before the next pointer to satisfy 8-byte alignment, so a Node typically occupies 16 bytes, not 12.

6The next pointer

8 bytes holding the address of another Node (or NULL). This is the literal pointer — there's no Pythonic illusion here: you can print it as a hex number with %p.

9Helper to allocate a node

Returns Node* — a pointer to a freshly heap-allocated node. We need the heap, not the stack, because the node must outlive this function's frame.

10malloc requests heap memory

malloc(sizeof(Node)) asks the allocator for 16 bytes. The cast (Node*) tells the compiler we'll use those bytes as a Node. malloc may return NULL if the system is out of memory; production code must check.

EXAMPLE
p = 0x55a3b40   // some heap address
11Write the value through the pointer

p->value is shorthand for (*p).value. Dereferences p, then writes v into the value field of the Node object at that address.

12Wire the next pointer

Stores the argument n into p->next. Until we do this, p->next contains uninitialized garbage from whatever malloc handed us — that bug, dereferencing uninitialized memory, is the #1 cause of crashes in C linked-list code.

17Build the list inside-out

make(9, NULL) is the tail; make(3, ...) wraps it; make(7, ...) becomes the head. Three malloc calls, three nodes, one chain. Note the nested calls evaluate inner-to-outer.

EXAMPLE
head -> [7|.] -> [3|.] -> [9|NULL]
19Idiomatic for-loop traversal

The C idiom for walking a linked list. Initialize p to head, continue while p is non-NULL, advance p = p->next at each step. Identical semantics to the Python while-loop.

20Print through the pointer

p->value reads the int through the pointer. printf adds a newline because of \n. No allocation happens here — we're only reading.

23Begin teardown

Unlike Python, nothing frees these nodes automatically. If we 'return 0' here without freeing, every node leaks — the OS reclaims them on exit, but in a long-running program that's a memory leak.

24Save the doomed node

We MUST capture head into 'doomed' before we move head forward. If we did 'free(head); head = head->next;' we'd dereference freed memory — undefined behavior, often a crash.

25Advance head

head = head->next reads doomed->next while doomed is still valid (we haven't freed it yet). Order matters: read first, then free.

26Release the memory

free(doomed) returns those 16 bytes to the allocator. After this call, 'doomed' is a dangling pointer — its bits still hold an address, but reading or writing through it is undefined behavior.

15 lines without explanation
1#include <stdio.h>
2#include <stdlib.h>
3
4typedef struct Node {
5    int value;
6    struct Node* next;
7} Node;
8
9Node* make(int v, Node* n) {
10    Node* p = (Node*) malloc(sizeof(Node));
11    p->value = v;
12    p->next  = n;
13    return p;
14}
15
16int main(void) {
17    Node* head = make(7, make(3, make(9, NULL)));
18
19    for (Node* p = head; p != NULL; p = p->next)
20        printf("%d\n", p->value);
21
22    /* free in reverse: must save next before freeing current */
23    while (head != NULL) {
24        Node* doomed = head;
25        head = head->next;
26        free(doomed);
27    }
28    return 0;
29}
The teardown loop's order is non-negotiable. Reading head->next after free(head) is undefined behavior. It might seem to work on your laptop, then segfault on the production server because the allocator reused the freed bytes. This is the archetypal use-after-free bug.

Pointer Manipulation Idioms

Almost every linked-list algorithm in the rest of this chapter is built from four primitive moves. Memorize their shapes; you'll see them constantly.

1. The Hop

🐍python
1p = p.next        # advance one node

Cost: one pointer read, one assignment. O(1)O(1). The atom of every traversal.

2. Insert After p

🐍python
1new_node = Node(v)
2new_node.next = p.next     # 1) new_node aims at the rest of the list
3p.next = new_node          # 2) p hops over to new_node

Order matters. If you do step 2 first, you overwrite p.next before you've saved it—the tail of the list becomes unreachable, and the garbage collector will gleefully free everything from p.next onward. Always wire the new node's outgoing pointer before redirecting any existing pointer at it.

3. Delete the Node After p

🐍python
1p.next = p.next.next       # bypass one node; GC reclaims the orphan

In Python the orphan node becomes garbage automatically. In C you must save the doomed node first and call free on it.

4. The Two-Pointer Swap

🐍python
1# Reverse the link between a and b
2a.next, b.next = b.next, a   # tuple-assignment evaluates the RHS first

Python's tuple assignment makes this a one-liner; in C you'd use a temporary. The trick of evaluating the right-hand side fully before any binding is what makes the line safe.


Null Terminators and Sentinel Heads

Every linked structure needs a way to say “there's nothing here.” The standard answer is a designated null pointer:

  • Python: None
  • Java / JavaScript: null
  • C: NULL (a macro for ((void*)0))
  • C++ (modern): nullptr

Treating the empty list as a null head and the end-of-list as a null next is conventional, but it has a cost: every algorithm sprouts special cases “if the list is empty” and “if I'm at the head.” A common cleanup trick is the sentinel head (also called a dummy or header node):

🐍python
1# Without sentinel: 3 cases (empty, at-head, in-middle)
2# With sentinel:   1 case
3sentinel = Node(value=None)        # dummy node, never holds real data
4sentinel.next = first_real_node    # the "real" head is sentinel.next
5
6# Now insert and delete look identical everywhere:
7def delete_after(p):
8    if p.next is not None:
9        p.next = p.next.next       # works at the head, in the middle, anywhere

A sentinel costs one node (16 bytes) but eliminates the “is this the head?” check from every operation. Production-grade linked lists in the Linux kernel (struct list_head) and the C++ STL (std::list) all use sentinels for this reason.


The True Memory Cost

Account for every byte. For a node holding a 4-byte int on a 64-bit system:

4int+4padding+8next ptr=16 B per node\underbrace{4}_{\text{int}} + \underbrace{4}_{\text{padding}} + \underbrace{8}_{\text{next ptr}} = 16 \text{ B per node}

Plus per-allocation overhead from the heap allocator—typically 8 to 32 bytes of bookkeeping per malloc chunk. So the practical cost of a linked-list node holding one int is more like 24 – 4824 \text{ – } 48 bytes. An array slot is 4. The list pays a 6×–12× memory premium for the privilege of O(1)O(1) local edits.

For a Python list of ints, the picture is worse: each Node is a full PyObject (~56 bytes object header + dict pointer + two attribute references), each attribute is itself a boxed integer (~28 bytes). One node easily costs ~120 bytes. Use __slots__ to chop the per-object overhead in half:

🐍python
1class Node:
2    __slots__ = ("value", "next")     # ~56 -> ~32 bytes per node
3    def __init__(self, value, next=None):
4        self.value = value
5        self.next  = next
If you find yourself reaching for a Python Node class for millions of records, the right answer is almost always “use a NumPy array, an array.array, or a Pandas column.” Linked lists exist for their algorithmic shape, not for storing bulk numeric data.

Interactive: Pointer Playground

The playground below is a live linked list. Each rectangle is one node: the left half holds the value, the right half is the next pointer (drawn as a dot with an arrow). The made-up addresses on top emphasize that nodes do not sit consecutively in memory—each one is wherever the heap allocator put it.

Try this sequence to feel the asymmetry:

  1. Click Insert at head three times. Watch how each insert is one rewrite — green node, head pointer flips.
  2. Click Insert at tail. Notice the operation is the same shape, but conceptually we had to walk to the end first — that walk is the O(n)O(n) cost.
  3. Drag the Delete index slider to the middle and click delete. The node turns red and disappears; the predecessor's next pointer now skips it.
0
head70x104030x108090x10c0None
> head -> [7|.] -> [3|.] -> [9|None]

Each rectangle is one heap-allocated node: the left half stores the value, the right half stores the next pointer (drawn as a dot). Addresses on top are made-up but illustrate that nodes can live anywhere on the heap. Notice how insert at head is O(1) butdelete at index k costs O(k) because we must walk there first.


Where Linked Lists Hide in Real Systems

Linked lists are sometimes accused of being “a textbook structure with no real use.” That is wrong—they're hiding in plain sight all over real systems software. The pattern repeats: whenever an algorithm needs O(1)O(1) insertion/removal at known points and indexed access is unimportant, a linked list shows up.

SystemWhere the linked list livesWhy
Linux kernelstruct list_head — intrusive doubly linked lists everywhereTasks, file descriptors, inodes; need O(1) splice/unlink under locks
Memory allocatorsFree lists chaining released chunks of the same size classAllocate / free in O(1) by popping/pushing the head
Garbage collectorsMark queue and reference graph traversalWorkload is pure traversal; no random access ever needed
Hash tables (chaining)Each bucket is a linked list of key–value pairsCollisions are rare; simple O(1) prepend keeps insert fast
LRU cachesDoubly linked list ordered by recency, plus a hash mapMove-to-front and evict-tail are both O(1) given a node pointer
FilesystemsInode block lists, directory entriesVariable-size, append-friendly, no need for random index
Browsers / editorsHistory, undo/redo stacks of unbounded lengthConstant-time push at the head; no indexed access
Graph algorithmsAdjacency lists (Chapter 16)Each vertex stores a list of neighbors; degrees vary widely
CompilersSSA def–use chains, basic-block predecessor/successor listsEdges are added/removed during optimization passes

Pitfalls and How to Avoid Them

  • Memory leaks (C/C++). Every malloc needs a matching free. Use a teardown function that walks the list and frees in head-to-tail order, saving next before each free.
  • Use-after-free / dangling pointers. After free(p), the bits of p still look like a valid address. Set p = NULL immediately if you might touch it again.
  • Accidental cycles. Writing tail.next = head turns the list into a ring, and any unsuspecting traversal becomes an infinite loop. Section 6 (Floyd's tortoise and hare) shows the O(n) detection algorithm.
  • Iterator invalidation. If you delete the node a cursor is currently pointing at, the cursor is dangling on the next hop. Standard fix: keep a previous pointer too, so deletion is “prev.next = cur.next; cur = prev.next.”
  • Identity vs equality (Python). a is b asks “same object?” a == b asks “same value?” For the null check you want p is None; for “is this the same node I started from?” you also want is, not ==.
  • Losing the head. Overwriting head in the middle of an algorithm makes every node before your cursor unreachable—in Python they get garbage collected; in C they leak. Save head at the top of any function that might rebind it.

Quick Checks

Quick Check

Starting from `head -> [7|.] -> [3|.] -> [9|None]`, you run: p = head.next p.next = Node(5, p.next) What does the list look like afterward?

Quick Check

In C, you want to delete the head of a non-empty list AND avoid a memory leak. Which sequence is correct?


With the pointer mental model in place, every linked-list algorithm in the rest of this chapter reduces to careful arrow choreography. Section 6.2 builds a full singly linked list class on top of these primitives; section 6.3 adds the back-pointer for the doubly linked variant; sections 6.4–6.7 turn the primitive moves into named algorithms (reverse, merge, detect-cycle, partition).

Loading comments...