Chapter 19

Introduction to Graphs

Graphs and Graph Learning

Learning Objectives

By the end of this section, you will be able to:

  1. Define graphs mathematically as structures G = (V, E) with nodes, edges, and their properties
  2. Distinguish between graph types including directed, undirected, weighted, bipartite, and acyclic graphs
  3. Represent graphs computationally using adjacency matrices, adjacency lists, and edge lists
  4. Calculate key graph properties such as degree, density, and path length
  5. Identify graph-structured data in real-world applications from social networks to molecules
  6. Understand why graphs matter for ML and the limitations of traditional neural networks on graph data

Why This Matters: Graphs are everywhere—from social networks and molecules to knowledge bases and road networks. Unlike images (grids) or text (sequences), graph data has irregular structure that traditional neural networks cannot handle. Understanding graph fundamentals is the foundation for Graph Neural Networks (GNNs), which have revolutionized applications in drug discovery, recommendation systems, fraud detection, and more.

Why Graphs? The Big Picture

Consider the data structures you've encountered in deep learning so far:

| Data Type | Structure | Neural Architecture |
| --- | --- | --- |
| Images | 2D grid of pixels | Convolutional Neural Networks (CNNs) |
| Text/Audio | 1D sequence of tokens | RNNs, LSTMs, Transformers |
| Tabular | Fixed feature vector | Fully Connected Networks |

These architectures exploit the regular structure of their input data:

  • CNNs leverage the 2D grid structure of images with local receptive fields and weight sharing
  • RNNs/Transformers process sequences by modeling temporal or positional dependencies
  • MLPs treat input as a fixed-length feature vector with no inherent structure

But what about data with irregular structure? Consider:

  • A social network where people have varying numbers of friends
  • A molecule with atoms connected in arbitrary configurations
  • A 3D mesh representing a surface with variable connectivity
  • A knowledge graph linking entities through different relationship types

The Key Insight

Graphs provide a universal language for describing relationships and interactions. A graph captures what entities exist (nodes) and how they relate (edges)—nothing more, nothing less. This flexibility is both their power and their challenge for machine learning.

What is a Graph?

At its core, a graph is a mathematical structure that models pairwise relationships between objects. The objects are called nodes (or vertices), and the relationships are called edges (or links).

Intuitive Examples

| Domain | Nodes | Edges |
| --- | --- | --- |
| Social Network | People | Friendships |
| Web | Web pages | Hyperlinks |
| Molecule | Atoms | Chemical bonds |
| Road Network | Intersections | Roads |
| Neural Network | Neurons/Layers | Connections |
| Citation Network | Papers | Citations |

The beauty of graph representation is its universality: once data is represented as a graph, we can apply the same algorithms and neural architectures regardless of the original domain.


Formal Mathematical Definition

A graph is defined as an ordered pair:

$$G = (V, E)$$

where:

  • $V$ is a set of vertices (nodes): $V = \{v_1, v_2, \ldots, v_n\}$
  • $E$ is a set of edges: $E \subseteq V \times V$

We typically denote:

  • $|V| = n$ as the number of nodes
  • $|E| = e$ (or $m$) as the number of edges

Node and Edge Attributes

In practical applications, nodes and edges often carry additional information:

$$\mathbf{x}_v \in \mathbb{R}^{d} \quad \text{(node features for node } v\text{)}$$
$$\mathbf{e}_{uv} \in \mathbb{R}^{d_e} \quad \text{(edge features for edge } (u,v)\text{)}$$

For example, in a molecular graph:

  • Node features $\mathbf{x}_v$ might encode atom type, charge, and hybridization
  • Edge features $\mathbf{e}_{uv}$ might encode bond type (single, double, triple) and bond length

Notation Convention

Throughout this chapter, we use:
  • $u, v, w$ to denote individual nodes
  • $(u, v)$ to denote an edge between nodes $u$ and $v$
  • $\mathcal{N}(v)$ to denote the neighborhood of node $v$—the set of nodes connected to $v$

The Neighborhood Function

One of the most important concepts in graph neural networks is the neighborhood:

$$\mathcal{N}(v) = \{u \in V : (u, v) \in E \text{ or } (v, u) \in E\}$$

This is the set of all nodes directly connected to $v$. The degree of a node is the size of its neighborhood:

$$\text{deg}(v) = |\mathcal{N}(v)|$$
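These two definitions translate directly into code. A minimal sketch, using a small hypothetical edge set chosen only for illustration:

```python
def neighborhood(edges, v):
    """Return N(v): every node sharing an edge with v (undirected)."""
    return {u for (a, b) in edges if v in (a, b) for u in (a, b) if u != v}

# Hypothetical example graph: a triangle 0-1-2 plus a pendant node 3
edges = {(0, 1), (1, 2), (2, 0), (2, 3)}

print(neighborhood(edges, 2))       # N(2) = {0, 1, 3}
print(len(neighborhood(edges, 2)))  # deg(2) = 3
```

Scanning the whole edge set per query is O(e); the adjacency-list representation later in this section makes the same lookup O(1).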

Quick Check

In a social network graph where nodes are people and edges are friendships, what does N(Alice) represent?


Interactive: Graph Explorer

Explore the fundamental structure of graphs with this interactive visualization. You can add nodes, connect them with edges, and observe how graph properties change in real time. The example graph shown has $|V| = 6$ nodes and $|E| = 7$ edges, giving an average degree of 2.33 and a density of 0.467 (undirected).

What to Explore

  1. Add nodes: Click "Add Node" then click on the canvas to create new vertices
  2. Connect nodes: Click "Add Edge" then click two nodes to connect them
  3. Observe degrees: Select a node to see its degree and neighbors
  4. Toggle directed: Enable directed mode to see how direction affects the graph
  5. Run BFS: Select a node and run breadth-first search to see graph traversal

Types of Graphs

Graphs come in many varieties, each suited to different types of relationships. The choice of graph type significantly impacts how we design algorithms and neural networks.

Undirected Graph

Edges have no direction: if node A is connected to node B, then B is also connected to A. (Example shown: $|V| = 5$, $|E| = 6$.)

Directed vs. Undirected Graphs

The most fundamental distinction is whether edges have direction:

| Type | Notation | Meaning | Example |
| --- | --- | --- | --- |
| Undirected | (u, v) = (v, u) | Symmetric relationship | Facebook friendships |
| Directed | (u, v) ≠ (v, u) | Asymmetric relationship | Twitter follows |

In undirected graphs, if Alice is connected to Bob, then Bob is also connected to Alice. The edge $(A, B)$ is the same as $(B, A)$.

In directed graphs (digraphs), edges have a direction. Alice following Bob on Twitter does not mean Bob follows Alice.

Weighted Graphs

When edges carry numerical values, we have a weighted graph:

$$w: E \to \mathbb{R} \quad \text{(weight function)}$$

Weights can represent distances, costs, similarities, or interaction strengths. For instance:

  • Road network: Edge weight = travel time or distance
  • Social network: Edge weight = interaction frequency
  • Protein network: Edge weight = interaction confidence score
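One lightweight way to sketch a weight function $w: E \to \mathbb{R}$ is a dictionary keyed by edge. The travel times below are made-up values for illustration:

```python
# Hypothetical road network: edge weights are travel times in minutes
weights = {
    (0, 1): 5.0,
    (1, 2): 3.5,
    (0, 2): 12.0,
}

def w(u, v):
    """Weight function w: E -> R; undirected, so try both orderings."""
    return weights.get((u, v), weights.get((v, u)))

print(w(2, 1))  # 3.5 -- same edge as (1, 2)
```

In matrix form, the same information becomes a weighted adjacency matrix with $A_{ij} = w(i, j)$ instead of 0/1 entries.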

Special Graph Types

| Type | Definition | Key Property |
| --- | --- | --- |
| Complete Graph Kₙ | Every pair of nodes is connected | \|E\| = n(n−1)/2 |
| Bipartite Graph | Nodes split into two sets; edges only between sets | Two-colorable |
| Tree | Connected graph with no cycles | \|E\| = n − 1 |
| DAG | Directed graph with no directed cycles | Has topological ordering |
| Sparse Graph | \|E\| = O(n) | Most real-world graphs |
| Dense Graph | \|E\| = O(n²) | Near-complete connectivity |

Why Sparsity Matters

Most real-world graphs are sparse—the number of edges grows linearly with nodes, not quadratically. A social network with 1 billion users doesn't have 10¹⁸ friendships! This sparsity is crucial for the computational efficiency of GNNs.

Graph Representations in Code

To work with graphs computationally, we need efficient data structures. The choice of representation affects both memory usage and algorithm efficiency.

Graph G = (V, E)

As a running example, take a graph with $|V| = 5$ nodes (A–E) and $|E| = 6$ edges.

Adjacency Matrix

|   | A | B | C | D | E |
| --- | --- | --- | --- | --- | --- |
| A | 0 | 1 | 1 | 0 | 0 |
| B | 1 | 0 | 1 | 1 | 0 |
| C | 1 | 1 | 0 | 0 | 1 |
| D | 0 | 1 | 0 | 0 | 1 |
| E | 0 | 0 | 1 | 1 | 0 |

Space: O(n²) = 25 cells. Best for dense graphs and fast O(1) edge lookup.

Representation Comparison

| Operation | Adjacency Matrix | Adjacency List | Edge List |
| --- | --- | --- | --- |
| Space | O(n²) | O(n + e) | O(e) |
| Check edge (u, v) | O(1) | O(degree) | O(e) |
| Get neighbors | O(n) | O(degree) | O(e) |
| Add edge | O(1) | O(1) | O(1) |
| Iterate all edges | O(n²) | O(n + e) | O(e) |

Adjacency Matrix

The adjacency matrix $\mathbf{A} \in \{0, 1\}^{n \times n}$ encodes edges as matrix entries:

$$A_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases}$$

Properties of the adjacency matrix:

  • Undirected graphs: $\mathbf{A}$ is symmetric ($\mathbf{A} = \mathbf{A}^T$)
  • Self-loops: Diagonal entries $A_{ii}$ indicate self-connections
  • Degree: Row sum gives degree: $\text{deg}(i) = \sum_j A_{ij}$
  • Paths: $(\mathbf{A}^k)_{ij}$ counts paths of length $k$ from $i$ to $j$
Adjacency Matrix Representation

`adjacency_matrix.py` — key points in the implementation below:

  • Edge list input: Edges are provided as (source, target) pairs—a common input format for graph data.
  • Zero initialization: Start with all zeros; no edges exist until we add them.
  • Symmetric for undirected: For undirected graphs, set both A[i][j] and A[j][i] to ensure symmetry.
  • Degree calculation: The degree of node i equals the row sum A[i].sum(). For directed graphs, this is the out-degree.
  • Matrix power for paths: A remarkable property—A^k[i,j] gives the number of paths of length k from i to j.
```python
import torch

def create_adjacency_matrix(edges: list, num_nodes: int) -> torch.Tensor:
    """
    Create an adjacency matrix from an edge list.

    Args:
        edges: List of (source, target) tuples
        num_nodes: Total number of nodes

    Returns:
        Adjacency matrix of shape (num_nodes, num_nodes)
    """
    # Initialize with zeros
    A = torch.zeros(num_nodes, num_nodes)

    for src, tgt in edges:
        A[src, tgt] = 1
        A[tgt, src] = 1  # For undirected graph

    return A


# Example: Simple triangle graph (A-B-C-A)
edges = [(0, 1), (1, 2), (2, 0)]
num_nodes = 3

A = create_adjacency_matrix(edges, num_nodes)
print("Adjacency Matrix:")
print(A)
# tensor([[0., 1., 1.],
#         [1., 0., 1.],
#         [1., 1., 0.]])

# Calculate degree of each node
degrees = A.sum(dim=1)
print(f"\nDegrees: {degrees}")  # tensor([2., 2., 2.])

# Count paths of length 2
A2 = A @ A
print("\nPaths of length 2 (A²):")
print(A2)
# A²[i,j] = number of 2-hop paths from i to j
```

Adjacency List

For sparse graphs, an adjacency list is more memory-efficient:

Adjacency List Representation

`adjacency_list.py` — key points in the implementation below:

  • Sets for fast lookup: Sets instead of lists enable O(1) average-case edge existence checks.
  • Undirected edge handling: For undirected graphs, add both (u, v) and (v, u) to maintain symmetry.
  • Neighborhood access: Accessing N(v) is O(1)—just return the stored set. Critical for GNN message passing!
  • Edge existence check: With sets, checking whether an edge exists is O(1) average; with lists it would be O(degree).
```python
from collections import defaultdict
from typing import Dict, Set

class Graph:
    """
    Graph representation using adjacency lists.
    Efficient for sparse graphs where |E| << |V|².
    """

    def __init__(self, directed: bool = False):
        self.adj_list: Dict[int, Set[int]] = defaultdict(set)
        self.directed = directed
        self.num_edges = 0

    def add_edge(self, u: int, v: int):
        """Add an edge between nodes u and v."""
        self.adj_list[u].add(v)
        if not self.directed:
            self.adj_list[v].add(u)
        self.num_edges += 1

    def neighbors(self, v: int) -> Set[int]:
        """Return the neighborhood N(v) of node v."""
        return self.adj_list[v]

    def degree(self, v: int) -> int:
        """Return the degree of node v."""
        return len(self.adj_list[v])

    def has_edge(self, u: int, v: int) -> bool:
        """Check if edge (u, v) exists. O(1) average with a set."""
        return v in self.adj_list[u]

    @property
    def num_nodes(self) -> int:
        return len(self.adj_list)


# Example usage
g = Graph(directed=False)

# Add edges
for edge in [(0, 1), (1, 2), (2, 0), (2, 3)]:
    g.add_edge(*edge)

print(f"Number of nodes: {g.num_nodes}")
print(f"Number of edges: {g.num_edges}")
print(f"Neighbors of node 2: {g.neighbors(2)}")  # {0, 1, 3}
print(f"Degree of node 2: {g.degree(2)}")  # 3
print(f"Edge (1, 2) exists: {g.has_edge(1, 2)}")  # True
```

Edge List (COO Format)

The edge list format stores edges as pairs of indices. This is the standard format in PyTorch Geometric:

`edge_index.py` — key points in the implementation below:

  • COO format shape: edge_index has shape (2, num_edges); row 0 holds source nodes, row 1 holds target nodes.
  • Long tensor required: PyTorch Geometric requires edge indices to be torch.long (int64) tensors.
  • Undirected conversion: For undirected graphs, each edge (u, v) must appear as both (u, v) and (v, u) in edge_index.
```python
import torch

def create_edge_index(edges: list) -> torch.Tensor:
    """
    Create edge_index tensor in COO format.
    This is the standard format used in PyTorch Geometric.

    Args:
        edges: List of (source, target) tuples

    Returns:
        edge_index: Tensor of shape (2, num_edges)
                    Row 0: source nodes
                    Row 1: target nodes
    """
    sources = [e[0] for e in edges]
    targets = [e[1] for e in edges]

    edge_index = torch.tensor([sources, targets], dtype=torch.long)
    return edge_index


# Example
edges = [(0, 1), (1, 2), (2, 0), (0, 3)]
edge_index = create_edge_index(edges)

print("Edge index (COO format):")
print(edge_index)
# tensor([[0, 1, 2, 0],
#         [1, 2, 0, 3]])

# For undirected graphs, include reverse edges
def to_undirected(edge_index: torch.Tensor) -> torch.Tensor:
    """Convert to undirected by adding reverse edges."""
    row, col = edge_index
    edge_index_undirected = torch.stack([
        torch.cat([row, col]),
        torch.cat([col, row])
    ], dim=0)
    return edge_index_undirected

edge_index_undirected = to_undirected(edge_index)
print("\nUndirected edge index:")
print(edge_index_undirected)
# tensor([[0, 1, 2, 0, 1, 2, 0, 3],
#         [1, 2, 0, 3, 0, 1, 2, 0]])
```

Representation Trade-offs

Choose your representation based on the operations you need:
  • Adjacency Matrix: Fast edge lookup O(1), but O(n²) space—only for small/dense graphs
  • Adjacency List: O(n + e) space, fast neighbor iteration—best for most algorithms
  • Edge Index (COO): O(e) space, GPU-friendly—standard for GNNs in PyTorch Geometric

Key Graph Properties

Understanding graph properties is essential for both algorithm design and understanding GNN behavior.

Degree Distribution

The degree distribution $P(k)$ gives the probability that a randomly selected node has degree $k$:

$$P(k) = \frac{|\{v \in V : \text{deg}(v) = k\}|}{|V|}$$

Many real-world graphs follow a power-law degree distribution:

$$P(k) \propto k^{-\gamma} \quad \text{where } \gamma \approx 2\text{–}3$$

This means most nodes have few connections, but a few "hub" nodes have many. Such graphs are called scale-free.
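The empirical version of $P(k)$ is just a normalized histogram of degrees. A minimal sketch, using a made-up hub-and-spoke degree sequence:

```python
from collections import Counter

def degree_distribution(degrees):
    """Empirical P(k): the fraction of nodes with degree k."""
    n = len(degrees)
    return {k: c / n for k, c in sorted(Counter(degrees).items())}

# Hypothetical star graph: one hub of degree 4, four leaves of degree 1
print(degree_distribution([4, 1, 1, 1, 1]))  # {1: 0.8, 4: 0.2}
```

Plotting such a distribution on log-log axes is the usual way to eyeball whether a graph is scale-free: a power law appears as a roughly straight line.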

Graph Density

Density measures how connected a graph is relative to its maximum possible edges:

$$\rho = \frac{|E|}{|E|_{\text{max}}} = \frac{|E|}{\binom{n}{2}} = \frac{2|E|}{n(n-1)}$$

| Graph Type | Density Range | Examples |
| --- | --- | --- |
| Sparse | ρ ≈ O(1/n) | Social networks, road networks |
| Dense | ρ ≈ O(1) | Complete graphs, cliques |
| Very sparse | \|E\| ≈ O(n) | Trees, planar graphs |
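The density formula translates into a one-line helper for undirected graphs:

```python
def density(num_nodes, num_edges):
    """rho = 2|E| / (n(n-1)) for an undirected graph."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))

print(density(5, 6))  # 0.6
print(density(3, 3))  # 1.0 -- a triangle is a complete graph
```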

Paths and Connectivity

A path from node $u$ to $v$ is a sequence of edges connecting them:

$$\text{path}(u, v) = (u, w_1), (w_1, w_2), \ldots, (w_{k-1}, v)$$

Key path-related concepts:

| Concept | Definition | Significance |
| --- | --- | --- |
| Shortest Path | Minimum number of edges between two nodes | Used for distance metrics |
| Diameter | Maximum shortest path in the graph | Indicates how "spread out" the graph is |
| Connected Component | Maximal set of mutually reachable nodes | Isolated subgraphs |
| Average Path Length | Mean shortest path over all node pairs | Measures global connectivity |
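The path-counting property of $\mathbf{A}^k$ from earlier gives one (brute-force) way to compute shortest path lengths: the smallest $k$ with $(\mathbf{A}^k)_{ij} > 0$ is the shortest-path length from $i$ to $j$. A sketch—fine for small graphs, though BFS is the practical choice at scale:

```python
import torch

def shortest_path_length(A, i, j, max_len=None):
    """Smallest k with (A^k)[i, j] > 0; inf if j is unreachable."""
    n = A.shape[0]
    max_len = max_len or n
    Ak = torch.eye(n)
    for k in range(1, max_len + 1):
        Ak = Ak @ A
        if Ak[i, j] > 0:
            return k
    return float('inf')

# Path graph 0-1-2-3 plus an isolated node 4
A = torch.zeros(5, 5)
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

print(shortest_path_length(A, 0, 3))  # 3
print(shortest_path_length(A, 0, 4))  # inf (different components)
```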

The Degree Matrix and Graph Laplacian

Two matrices are fundamental to spectral graph theory and many GNN architectures:

The degree matrix $\mathbf{D}$ is diagonal with degrees on the diagonal:

$$D_{ii} = \text{deg}(i) = \sum_j A_{ij}$$

The graph Laplacian captures the difference between a node and its neighbors:

$$\mathbf{L} = \mathbf{D} - \mathbf{A}$$

The normalized Laplacian (used in many GNNs):

$$\tilde{\mathbf{L}} = \mathbf{I} - \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$$

Why the Laplacian Matters

The graph Laplacian has remarkable properties:
  • Its eigenvalues encode graph connectivity (number of zeros = number of connected components)
  • Its eigenvectors form a "Fourier basis" for signals on the graph
  • Spectral GNNs (like ChebNet) use Laplacian eigenvalues for filtering
  • The normalized form appears directly in GCN's propagation rule
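The connectivity property is easy to verify numerically. A sketch, assuming a small example graph of a triangle plus a disjoint edge (two connected components):

```python
import torch

# Triangle 0-1-2 plus a separate edge 3-4: two connected components
A = torch.zeros(5, 5)
for u, v in [(0, 1), (1, 2), (2, 0), (3, 4)]:
    A[u, v] = A[v, u] = 1.0

D = torch.diag(A.sum(dim=1))
L = D - A  # combinatorial Laplacian

# Normalized Laplacian (all degrees >= 1 here, so the inverse is safe)
D_inv_sqrt = torch.diag(A.sum(dim=1).pow(-0.5))
L_norm = torch.eye(5) - D_inv_sqrt @ A @ D_inv_sqrt

# Number of (near-)zero eigenvalues = number of connected components
eigvals = torch.linalg.eigvalsh(L)  # L is symmetric PSD
num_components = int((eigvals.abs() < 1e-5).sum())
print(num_components)  # 2
```

Note also that every row of $\mathbf{L}$ sums to zero—exactly the $\mathbf{L}\mathbf{1} = \mathbf{0}$ property explored in the exercises.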

Quick Check

A graph has 100 nodes and 4950 edges. What is its density?


Real-World Graph Data

Graph-structured data appears across virtually every domain. Understanding these applications motivates why we need specialized neural network architectures.

Social Networks


Platforms like Facebook, Twitter, and LinkedIn represent users as nodes and relationships (friendships, follows, connections) as edges.

  • Nodes: Users/accounts
  • Edges: Friendships, follows, interactions
  • Scale: ~3B users (Facebook)
  • Node features: Profile info, activity patterns, demographics, embeddings
  • Edge features: Interaction frequency, relationship type, timestamp
  • ML tasks: Friend recommendation, community detection, influence prediction, fake account detection

Common Patterns in Real-World Graphs

Despite their diversity, most real-world graphs share common structural properties:

  1. Power-law degree distribution: A few hub nodes have many connections; most nodes have few
  2. Small-world property: Short average path lengths despite sparse connections ("six degrees of separation")
  3. Community structure: Nodes cluster into densely connected groups with sparse inter-group connections
  4. Homophily: Connected nodes tend to be similar (birds of a feather flock together)
  5. Heterophily: In some graphs, connected nodes are dissimilar (buyers and sellers)
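Homophily (pattern 4) can be quantified with a simple edge homophily ratio—the fraction of edges whose endpoints share a label. The labels and edges below are made up for illustration:

```python
# Hypothetical node labels and friendship edges
labels = {0: 'a', 1: 'a', 2: 'b', 3: 'b', 4: 'a'}
edges = [(0, 1), (1, 4), (2, 3), (0, 2)]

homophily = sum(labels[u] == labels[v] for u, v in edges) / len(edges)
print(homophily)  # 0.75 -- most edges connect same-label nodes
```

A ratio near 1 indicates homophily; a ratio near 0 indicates heterophily, as in buyer-seller graphs.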

Why Graphs in Machine Learning?

Now that we understand graph structure, why can't we just use standard neural networks on graph data?

The Challenge of Irregular Structure

Traditional neural networks assume fixed input structure:

  • MLPs: Fixed-length feature vector
  • CNNs: Fixed grid (224×224 pixels)
  • RNNs: Sequential ordering

Graphs violate all these assumptions:

| Challenge | Description | Why It's Hard |
| --- | --- | --- |
| Variable size | Different graphs have different numbers of nodes | Can't use fixed input dimensions |
| No ordering | Nodes have no canonical order like pixels or words | Permutation must not change output |
| Variable neighborhoods | Each node has a different number of neighbors | Can't apply fixed-size filters |
| Discrete structure | No continuous spatial coordinates | Can't define convolution in the usual sense |

The Permutation Invariance Requirement

A fundamental property required for any graph learning model is permutation invariance (for graph-level tasks) or permutation equivariance (for node-level tasks):

$$f(\mathbf{P}\mathbf{X}, \mathbf{P}\mathbf{A}\mathbf{P}^T) = f(\mathbf{X}, \mathbf{A})$$

where $\mathbf{P}$ is any permutation matrix. This means the output should not depend on how we arbitrarily numbered the nodes.

Why Permutation Invariance Matters

Consider a social network: labeling users as 1,2,3,... vs 3,1,2,... is arbitrary—it's the same network! A graph neural network must recognize this equivalence and produce the same predictions regardless of node ordering.
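This equivalence is easy to check numerically for a simple sum-pooled readout $f(\mathbf{X}, \mathbf{A}) = \sum_v (\mathbf{A}\mathbf{X})_v$—a toy invariant function, not a full GNN:

```python
import torch

torch.manual_seed(0)
n, d = 4, 3
X = torch.randn(n, d)                    # node features
A = torch.tensor([[0., 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]])         # a 4-cycle

P = torch.eye(n)[[2, 0, 3, 1]]           # an arbitrary permutation matrix

# f(X, A): one aggregation step followed by sum pooling over nodes
out = (A @ X).sum(dim=0)
out_perm = ((P @ A @ P.T) @ (P @ X)).sum(dim=0)

print(torch.allclose(out, out_perm))  # True: relabeling changes nothing
```

The algebra behind the check: $\mathbf{P}^T\mathbf{P} = \mathbf{I}$, so $(\mathbf{P}\mathbf{A}\mathbf{P}^T)(\mathbf{P}\mathbf{X}) = \mathbf{P}\mathbf{A}\mathbf{X}$, and summing over rows erases the row permutation.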

What Graph Neural Networks Provide

Graph Neural Networks (GNNs) address these challenges through:

  1. Local aggregation: Each node aggregates information from its neighbors, handling variable neighborhoods
  2. Permutation equivariance: Operations are defined on edges and neighborhoods, not absolute positions
  3. Weight sharing: The same transformation is applied at every node, enabling arbitrary graph sizes
  4. Hierarchical representations: Stacking layers captures increasingly distant information
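Principles 1–3 can be sketched in a few lines: mean-over-neighbors aggregation followed by one shared linear map. This is an illustrative toy layer, not any particular published GNN:

```python
import torch

def message_passing_layer(X, A, W):
    """Mean-over-neighbors aggregation + one shared linear map + ReLU."""
    deg = A.sum(dim=1, keepdim=True).clamp(min=1)  # guard isolated nodes
    neighbor_mean = (A @ X) / deg                  # aggregate over N(v)
    return torch.relu(neighbor_mean @ W)           # same W at every node

torch.manual_seed(0)
X = torch.randn(4, 8)   # 4 nodes, 8 input features
A = torch.tensor([[0., 1, 1, 0],
                  [1, 0, 0, 1],
                  [1, 0, 0, 0],
                  [0, 1, 0, 0]])
W = torch.randn(8, 16)  # shared weights: works for any number of nodes

H = message_passing_layer(X, A, W)
print(H.shape)  # torch.Size([4, 16])
```

Because W acts per node, the same layer applies unchanged to a graph with 4 nodes or 4 million; stacking such layers (principle 4) lets information travel one hop further per layer.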

In the next sections, we'll see how these principles lead to specific GNN architectures like GCN, GraphSAGE, and GAT.


Summary

This section introduced the fundamental concepts of graphs as a data structure for machine learning. Here are the key takeaways:

Core Concepts

| Concept | Definition | ML Relevance |
| --- | --- | --- |
| Graph G = (V, E) | Nodes V and edges E | Universal structure for relational data |
| Neighborhood N(v) | Set of nodes adjacent to v | Foundation of GNN message passing |
| Degree deg(v) | \|N(v)\| | Node importance measure |
| Adjacency Matrix A | A[i,j] = 1 if edge (i,j) exists | Matrix form for graph operations |
| Graph Laplacian L | D − A | Spectral analysis of graphs |

Key Equations

  1. Graph definition: $G = (V, E)$
  2. Neighborhood: $\mathcal{N}(v) = \{u : (u, v) \in E\}$
  3. Degree: $\text{deg}(v) = \sum_j A_{vj}$
  4. Density: $\rho = \frac{2|E|}{n(n-1)}$
  5. Laplacian: $\mathbf{L} = \mathbf{D} - \mathbf{A}$

Looking Forward

In the next section, we'll explore why we need specialized neural networks for graphs—understanding the limitations of traditional architectures and the key ideas that enable graph neural networks.


Knowledge Check

Test your understanding of graph fundamentals with this quiz:

Question 1 of 10

In graph terminology, what does the degree of a node represent?


Exercises

Conceptual Questions

  1. Explain why social networks typically exhibit a power-law degree distribution. What real-world phenomenon does this reflect?
  2. Given a bipartite graph representing users and movies (edges = watched), what would edges in the user-user projection graph represent?
  3. Why is the graph Laplacian positive semi-definite? What does this imply about its eigenvalues?
  4. A tree with $n$ nodes has $n-1$ edges. Use this to explain why adding any edge to a tree creates exactly one cycle.

Mathematical Exercises

  1. Handshaking Lemma: Prove that $\sum_{v \in V} \text{deg}(v) = 2|E|$ for any undirected graph.
  2. Adjacency Matrix Powers: Show that $(\mathbf{A}^2)_{ij}$ equals the number of common neighbors of nodes $i$ and $j$.
  3. Bipartite Characterization: Prove that a graph is bipartite if and only if it contains no odd-length cycles.
  4. Laplacian Properties: For the graph Laplacian $\mathbf{L}$, show that $\mathbf{L}\mathbf{1} = \mathbf{0}$ where $\mathbf{1}$ is the all-ones vector.

Coding Exercises

  1. Graph Class Implementation: Implement a Python class that supports both adjacency matrix and adjacency list representations, with methods to convert between them.
  2. Degree Distribution: Load a real-world graph dataset (e.g., Karate Club from PyTorch Geometric) and plot its degree distribution. Does it follow a power law?
  3. BFS Implementation: Implement breadth-first search to compute shortest paths from a source node to all other nodes.
  4. Connected Components: Implement an algorithm to find all connected components in an undirected graph.

Solution Hints

  • Exercise 1: Each edge contributes exactly 2 to the sum (one for each endpoint).
  • Exercise 2: $(\mathbf{A}^2)_{ij} = \sum_k A_{ik} A_{kj}$—think about when this sum is non-zero.
  • Coding Exercise 2: Use matplotlib for plotting; consider log-log scale for power-law visualization.

Now that you understand graph fundamentals, you're ready to explore why we need Graph Neural Networks—understanding the unique challenges graphs pose for traditional neural networks and the key principles that enable learning on graph-structured data.