Chapter 19

Introduction to Graphs

Graphs and Graph Learning

Learning Objectives

By the end of this section, you will be able to:

  1. Define graphs mathematically as structures G = (V, E) with nodes, edges, and their properties
  2. Distinguish between graph types including directed, undirected, weighted, bipartite, and acyclic graphs
  3. Represent graphs computationally using adjacency matrices, adjacency lists, and edge lists
  4. Calculate key graph properties such as degree, density, and path length
  5. Identify graph-structured data in real-world applications from social networks to molecules
  6. Understand why graphs matter for ML and the limitations of traditional neural networks on graph data

Why This Matters: Graphs are everywhere—from social networks and molecules to knowledge bases and road networks. Unlike images (grids) or text (sequences), graph data has irregular structure that traditional neural networks cannot handle. Understanding graph fundamentals is the foundation for Graph Neural Networks (GNNs), which have revolutionized applications in drug discovery, recommendation systems, fraud detection, and more.

Why Graphs? The Big Picture

Consider the data structures you've encountered in deep learning so far:

| Data Type | Structure | Neural Architecture |
| --- | --- | --- |
| Images | 2D grid of pixels | Convolutional Neural Networks (CNNs) |
| Text/Audio | 1D sequence of tokens | RNNs, LSTMs, Transformers |
| Tabular | Fixed feature vector | Fully Connected Networks |

These architectures exploit the regular structure of their input data:

  • CNNs leverage the 2D grid structure of images with local receptive fields and weight sharing
  • RNNs/Transformers process sequences by modeling temporal or positional dependencies
  • MLPs treat input as a fixed-length feature vector with no inherent structure

But what about data with irregular structure? Consider:

  • A social network where people have varying numbers of friends
  • A molecule with atoms connected in arbitrary configurations
  • A 3D mesh representing a surface with variable connectivity
  • A knowledge graph linking entities through different relationship types

The Key Insight

Graphs provide a universal language for describing relationships and interactions. A graph captures what entities exist (nodes) and how they relate (edges)—nothing more, nothing less. This flexibility is both their power and their challenge for machine learning.

What is a Graph?

At its core, a graph is a mathematical structure that models pairwise relationships between objects. The objects are called nodes (or vertices), and the relationships are called edges (or links).

Intuitive Examples

| Domain | Nodes | Edges |
| --- | --- | --- |
| Social Network | People | Friendships |
| Web | Web pages | Hyperlinks |
| Molecule | Atoms | Chemical bonds |
| Road Network | Intersections | Roads |
| Neural Network | Neurons/Layers | Connections |
| Citation Network | Papers | Citations |

The beauty of graph representation is its universality: once data is represented as a graph, we can apply the same algorithms and neural architectures regardless of the original domain.


Formal Mathematical Definition

A graph is defined as an ordered pair:

$$G = (V, E)$$

where:

  • $V$ is a set of vertices (nodes): $V = \{v_1, v_2, \ldots, v_n\}$
  • $E$ is a set of edges: $E \subseteq V \times V$

We typically denote:

  • $|V| = n$ as the number of nodes
  • $|E| = e$ (or $m$) as the number of edges

Node and Edge Attributes

In practical applications, nodes and edges often carry additional information:

$$\mathbf{x}_v \in \mathbb{R}^{d} \quad \text{(node features for node } v\text{)}$$
$$\mathbf{e}_{uv} \in \mathbb{R}^{d_e} \quad \text{(edge features for edge } (u,v)\text{)}$$

For example, in a molecular graph:

  • Node features $\mathbf{x}_v$ might encode atom type, charge, and hybridization
  • Edge features $\mathbf{e}_{uv}$ might encode bond type (single, double, triple) and bond length

Notation Convention

Throughout this chapter, we use:
  • $u, v, w$ to denote individual nodes
  • $(u, v)$ to denote an edge between nodes $u$ and $v$
  • $\mathcal{N}(v)$ to denote the neighborhood of node $v$—the set of nodes connected to $v$

The Neighborhood Function

One of the most important concepts in graph neural networks is the neighborhood:

$$\mathcal{N}(v) = \{u \in V : (u, v) \in E \text{ or } (v, u) \in E\}$$

This is the set of all nodes directly connected to $v$. The degree of a node is the size of its neighborhood:

$$\text{deg}(v) = |\mathcal{N}(v)|$$
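These two definitions translate directly into code. A minimal sketch, using a small hypothetical edge set chosen only for illustration:

```python
def neighborhood(edges, v):
    """Return N(v): every node sharing an edge with v (undirected)."""
    return {u for (a, b) in edges if v in (a, b) for u in (a, b) if u != v}

# Hypothetical example graph: a triangle 0-1-2 plus a pendant node 3
edges = {(0, 1), (1, 2), (2, 0), (2, 3)}

print(neighborhood(edges, 2))       # N(2) = {0, 1, 3}
print(len(neighborhood(edges, 2)))  # deg(2) = 3
```

Scanning the whole edge set per query is O(e); the adjacency-list representation later in this section makes the same lookup O(1).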

Quick Check

In a social network graph where nodes are people and edges are friendships, what does N(Alice) represent?


Interactive: Graph Explorer

Explore the fundamental structure of graphs with this interactive visualization. You can add nodes, connect them with edges, and observe how graph properties change in real time. The example graph shown has $|V| = 6$ nodes and $|E| = 7$ edges, giving an average degree of 2.33 and a density of 0.467 (undirected).

What to Explore

  1. Add nodes: Click "Add Node" then click on the canvas to create new vertices
  2. Connect nodes: Click "Add Edge" then click two nodes to connect them
  3. Observe degrees: Select a node to see its degree and neighbors
  4. Toggle directed: Enable directed mode to see how direction affects the graph
  5. Run BFS: Select a node and run breadth-first search to see graph traversal

Types of Graphs

Graphs come in many varieties, each suited to different types of relationships. The choice of graph type significantly impacts how we design algorithms and neural networks.

Undirected Graph

Edges have no direction: if node A is connected to node B, then B is also connected to A. (Example shown: $|V| = 5$, $|E| = 6$.)

Directed vs. Undirected Graphs

The most fundamental distinction is whether edges have direction:

| Type | Notation | Meaning | Example |
| --- | --- | --- | --- |
| Undirected | (u, v) = (v, u) | Symmetric relationship | Facebook friendships |
| Directed | (u, v) ≠ (v, u) | Asymmetric relationship | Twitter follows |

In undirected graphs, if Alice is connected to Bob, then Bob is also connected to Alice. The edge $(A, B)$ is the same as $(B, A)$.

In directed graphs (digraphs), edges have a direction. Alice following Bob on Twitter does not mean Bob follows Alice.

Weighted Graphs

When edges carry numerical values, we have a weighted graph:

$$w: E \to \mathbb{R} \quad \text{(weight function)}$$

Weights can represent distances, costs, similarities, or interaction strengths. For instance:

  • Road network: Edge weight = travel time or distance
  • Social network: Edge weight = interaction frequency
  • Protein network: Edge weight = interaction confidence score
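One lightweight way to sketch a weight function $w: E \to \mathbb{R}$ is a dictionary keyed by edge. The travel times below are made-up values for illustration:

```python
# Hypothetical road network: edge weights are travel times in minutes
weights = {
    (0, 1): 5.0,
    (1, 2): 3.5,
    (0, 2): 12.0,
}

def w(u, v):
    """Weight function w: E -> R; undirected, so try both orderings."""
    return weights.get((u, v), weights.get((v, u)))

print(w(2, 1))  # 3.5 -- same edge as (1, 2)
```

In matrix form, the same information becomes a weighted adjacency matrix with $A_{ij} = w(i, j)$ instead of 0/1 entries.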

Special Graph Types

| Type | Definition | Key Property |
| --- | --- | --- |
| Complete Graph Kₙ | Every pair of nodes is connected | \|E\| = n(n−1)/2 |
| Bipartite Graph | Nodes split into two sets; edges only between sets | Two-colorable |
| Tree | Connected graph with no cycles | \|E\| = n − 1 |
| DAG | Directed graph with no directed cycles | Has topological ordering |
| Sparse Graph | \|E\| = O(n) | Most real-world graphs |
| Dense Graph | \|E\| = O(n²) | Near-complete connectivity |

Why Sparsity Matters

Most real-world graphs are sparse—the number of edges grows linearly with nodes, not quadratically. A social network with 1 billion users doesn't have 10¹⁸ friendships! This sparsity is crucial for the computational efficiency of GNNs.

Graph Representations in Code

To work with graphs computationally, we need efficient data structures. The choice of representation affects both memory usage and algorithm efficiency.

Graph G = (V, E)

As a running example, take a graph with $|V| = 5$ nodes (A–E) and $|E| = 6$ edges.

Adjacency Matrix

|   | A | B | C | D | E |
| --- | --- | --- | --- | --- | --- |
| A | 0 | 1 | 1 | 0 | 0 |
| B | 1 | 0 | 1 | 1 | 0 |
| C | 1 | 1 | 0 | 0 | 1 |
| D | 0 | 1 | 0 | 0 | 1 |
| E | 0 | 0 | 1 | 1 | 0 |

Space: O(n²) = 25 cells. Best for dense graphs and fast O(1) edge lookup.

Representation Comparison

| Operation | Adjacency Matrix | Adjacency List | Edge List |
| --- | --- | --- | --- |
| Space | O(n²) | O(n + e) | O(e) |
| Check edge (u, v) | O(1) | O(degree) | O(e) |
| Get neighbors | O(n) | O(degree) | O(e) |
| Add edge | O(1) | O(1) | O(1) |
| Iterate all edges | O(n²) | O(n + e) | O(e) |

Adjacency Matrix

The adjacency matrix $\mathbf{A} \in \{0, 1\}^{n \times n}$ encodes edges as matrix entries:

$$A_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases}$$

Properties of the adjacency matrix:

  • Undirected graphs: $\mathbf{A}$ is symmetric ($\mathbf{A} = \mathbf{A}^T$)
  • Self-loops: Diagonal entries $A_{ii}$ indicate self-connections
  • Degree: Row sum gives degree: $\text{deg}(i) = \sum_j A_{ij}$
  • Paths: $(\mathbf{A}^k)_{ij}$ counts paths of length $k$ from $i$ to $j$
Adjacency Matrix Representation

`adjacency_matrix.py` — key points in the implementation below:

  • Edge list input: Edges are provided as (source, target) pairs—a common input format for graph data.
  • Zero initialization: Start with all zeros; no edges exist until we add them.
  • Symmetric for undirected: For undirected graphs, set both A[i][j] and A[j][i] to ensure symmetry.
  • Degree calculation: The degree of node i equals the row sum A[i].sum(). For directed graphs, this is the out-degree.
  • Matrix power for paths: A remarkable property—A^k[i,j] gives the number of paths of length k from i to j.
```python
import torch

def create_adjacency_matrix(edges: list, num_nodes: int) -> torch.Tensor:
    """
    Create an adjacency matrix from an edge list.

    Args:
        edges: List of (source, target) tuples
        num_nodes: Total number of nodes

    Returns:
        Adjacency matrix of shape (num_nodes, num_nodes)
    """
    # Initialize with zeros
    A = torch.zeros(num_nodes, num_nodes)

    for src, tgt in edges:
        A[src, tgt] = 1
        A[tgt, src] = 1  # For undirected graph

    return A


# Example: Simple triangle graph (A-B-C-A)
edges = [(0, 1), (1, 2), (2, 0)]
num_nodes = 3

A = create_adjacency_matrix(edges, num_nodes)
print("Adjacency Matrix:")
print(A)
# tensor([[0., 1., 1.],
#         [1., 0., 1.],
#         [1., 1., 0.]])

# Calculate degree of each node
degrees = A.sum(dim=1)
print(f"\nDegrees: {degrees}")  # tensor([2., 2., 2.])

# Count paths of length 2
A2 = A @ A
print("\nPaths of length 2 (A²):")
print(A2)
# A²[i,j] = number of 2-hop paths from i to j
```

Adjacency List

For sparse graphs, an adjacency list is more memory-efficient:

Adjacency List Representation

`adjacency_list.py` — key points in the implementation below:

  • Sets for fast lookup: Sets instead of lists enable O(1) average-case edge existence checks.
  • Undirected edge handling: For undirected graphs, add both (u, v) and (v, u) to maintain symmetry.
  • Neighborhood access: Accessing N(v) is O(1)—just return the stored set. Critical for GNN message passing!
  • Edge existence check: With sets, checking whether an edge exists is O(1) average; with lists it would be O(degree).
```python
from collections import defaultdict
from typing import Dict, Set

class Graph:
    """
    Graph representation using adjacency lists.
    Efficient for sparse graphs where |E| << |V|².
    """

    def __init__(self, directed: bool = False):
        self.adj_list: Dict[int, Set[int]] = defaultdict(set)
        self.directed = directed
        self.num_edges = 0

    def add_edge(self, u: int, v: int):
        """Add an edge between nodes u and v."""
        self.adj_list[u].add(v)
        if not self.directed:
            self.adj_list[v].add(u)
        self.num_edges += 1

    def neighbors(self, v: int) -> Set[int]:
        """Return the neighborhood N(v) of node v."""
        return self.adj_list[v]

    def degree(self, v: int) -> int:
        """Return the degree of node v."""
        return len(self.adj_list[v])

    def has_edge(self, u: int, v: int) -> bool:
        """Check if edge (u, v) exists. O(1) average with a set."""
        return v in self.adj_list[u]

    @property
    def num_nodes(self) -> int:
        return len(self.adj_list)


# Example usage
g = Graph(directed=False)

# Add edges
for edge in [(0, 1), (1, 2), (2, 0), (2, 3)]:
    g.add_edge(*edge)

print(f"Number of nodes: {g.num_nodes}")
print(f"Number of edges: {g.num_edges}")
print(f"Neighbors of node 2: {g.neighbors(2)}")  # {0, 1, 3}
print(f"Degree of node 2: {g.degree(2)}")  # 3
print(f"Edge (1, 2) exists: {g.has_edge(1, 2)}")  # True
```

Edge List (COO Format)

The edge list format stores edges as pairs of indices. This is the standard format in PyTorch Geometric:

`edge_index.py` — key points in the implementation below:

  • COO format shape: edge_index has shape (2, num_edges); row 0 holds source nodes, row 1 holds target nodes.
  • Long tensor required: PyTorch Geometric requires edge indices to be torch.long (int64) tensors.
  • Undirected conversion: For undirected graphs, each edge (u, v) must appear as both (u, v) and (v, u) in edge_index.
```python
import torch

def create_edge_index(edges: list) -> torch.Tensor:
    """
    Create edge_index tensor in COO format.
    This is the standard format used in PyTorch Geometric.

    Args:
        edges: List of (source, target) tuples

    Returns:
        edge_index: Tensor of shape (2, num_edges)
                    Row 0: source nodes
                    Row 1: target nodes
    """
    sources = [e[0] for e in edges]
    targets = [e[1] for e in edges]

    edge_index = torch.tensor([sources, targets], dtype=torch.long)
    return edge_index


# Example
edges = [(0, 1), (1, 2), (2, 0), (0, 3)]
edge_index = create_edge_index(edges)

print("Edge index (COO format):")
print(edge_index)
# tensor([[0, 1, 2, 0],
#         [1, 2, 0, 3]])

# For undirected graphs, include reverse edges
def to_undirected(edge_index: torch.Tensor) -> torch.Tensor:
    """Convert to undirected by adding reverse edges."""
    row, col = edge_index
    edge_index_undirected = torch.stack([
        torch.cat([row, col]),
        torch.cat([col, row])
    ], dim=0)
    return edge_index_undirected

edge_index_undirected = to_undirected(edge_index)
print("\nUndirected edge index:")
print(edge_index_undirected)
# tensor([[0, 1, 2, 0, 1, 2, 0, 3],
#         [1, 2, 0, 3, 0, 1, 2, 0]])
```

Representation Trade-offs

Choose your representation based on the operations you need:
  • Adjacency Matrix: Fast edge lookup O(1), but O(n²) space—only for small/dense graphs
  • Adjacency List: O(n + e) space, fast neighbor iteration—best for most algorithms
  • Edge Index (COO): O(e) space, GPU-friendly—standard for GNNs in PyTorch Geometric

Key Graph Properties

Understanding graph properties is essential for both algorithm design and understanding GNN behavior.

Degree Distribution

The degree distribution $P(k)$ gives the probability that a randomly selected node has degree $k$:

$$P(k) = \frac{|\{v \in V : \text{deg}(v) = k\}|}{|V|}$$

Many real-world graphs follow a power-law degree distribution:

$$P(k) \propto k^{-\gamma} \quad \text{where } \gamma \approx 2\text{–}3$$

This means most nodes have few connections, but a few "hub" nodes have many. Such graphs are called scale-free.
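The empirical version of $P(k)$ is just a normalized histogram of degrees. A minimal sketch, using a made-up hub-and-spoke degree sequence:

```python
from collections import Counter

def degree_distribution(degrees):
    """Empirical P(k): the fraction of nodes with degree k."""
    n = len(degrees)
    return {k: c / n for k, c in sorted(Counter(degrees).items())}

# Hypothetical star graph: one hub of degree 4, four leaves of degree 1
print(degree_distribution([4, 1, 1, 1, 1]))  # {1: 0.8, 4: 0.2}
```

Plotting such a distribution on log-log axes is the usual way to eyeball whether a graph is scale-free: a power law appears as a roughly straight line.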

Graph Density

Density measures how connected a graph is relative to its maximum possible edges:

$$\rho = \frac{|E|}{|E|_{\text{max}}} = \frac{|E|}{\binom{n}{2}} = \frac{2|E|}{n(n-1)}$$

| Graph Type | Density Range | Examples |
| --- | --- | --- |
| Sparse | ρ ≈ O(1/n) | Social networks, road networks |
| Dense | ρ ≈ O(1) | Complete graphs, cliques |
| Very sparse | \|E\| ≈ O(n) | Trees, planar graphs |
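The density formula translates into a one-line helper for undirected graphs:

```python
def density(num_nodes, num_edges):
    """rho = 2|E| / (n(n-1)) for an undirected graph."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))

print(density(5, 6))  # 0.6
print(density(3, 3))  # 1.0 -- a triangle is a complete graph
```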

Paths and Connectivity

A path from node $u$ to $v$ is a sequence of edges connecting them:

$$\text{path}(u, v) = (u, w_1), (w_1, w_2), \ldots, (w_{k-1}, v)$$

Key path-related concepts:

| Concept | Definition | Significance |
| --- | --- | --- |
| Shortest Path | Minimum number of edges between two nodes | Used for distance metrics |
| Diameter | Maximum shortest path in the graph | Indicates how "spread out" the graph is |
| Connected Component | Maximal set of mutually reachable nodes | Isolated subgraphs |
| Average Path Length | Mean shortest path over all node pairs | Measures global connectivity |
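The path-counting property of $\mathbf{A}^k$ from earlier gives one (brute-force) way to compute shortest path lengths: the smallest $k$ with $(\mathbf{A}^k)_{ij} > 0$ is the shortest-path length from $i$ to $j$. A sketch—fine for small graphs, though BFS is the practical choice at scale:

```python
import torch

def shortest_path_length(A, i, j, max_len=None):
    """Smallest k with (A^k)[i, j] > 0; inf if j is unreachable."""
    n = A.shape[0]
    max_len = max_len or n
    Ak = torch.eye(n)
    for k in range(1, max_len + 1):
        Ak = Ak @ A
        if Ak[i, j] > 0:
            return k
    return float('inf')

# Path graph 0-1-2-3 plus an isolated node 4
A = torch.zeros(5, 5)
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

print(shortest_path_length(A, 0, 3))  # 3
print(shortest_path_length(A, 0, 4))  # inf (different components)
```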

The Degree Matrix and Graph Laplacian

Two matrices are fundamental to spectral graph theory and many GNN architectures:

The degree matrix $\mathbf{D}$ is diagonal with degrees on the diagonal:

$$D_{ii} = \text{deg}(i) = \sum_j A_{ij}$$

The graph Laplacian captures the difference between a node and its neighbors:

$$\mathbf{L} = \mathbf{D} - \mathbf{A}$$

The normalized Laplacian (used in many GNNs):

$$\tilde{\mathbf{L}} = \mathbf{I} - \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$$

Why the Laplacian Matters

The graph Laplacian has remarkable properties:
  • Its eigenvalues encode graph connectivity (number of zeros = number of connected components)
  • Its eigenvectors form a "Fourier basis" for signals on the graph
  • Spectral GNNs (like ChebNet) use Laplacian eigenvalues for filtering
  • The normalized form appears directly in GCN's propagation rule
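The connectivity property is easy to verify numerically. A sketch, assuming a small example graph of a triangle plus a disjoint edge (two connected components):

```python
import torch

# Triangle 0-1-2 plus a separate edge 3-4: two connected components
A = torch.zeros(5, 5)
for u, v in [(0, 1), (1, 2), (2, 0), (3, 4)]:
    A[u, v] = A[v, u] = 1.0

D = torch.diag(A.sum(dim=1))
L = D - A  # combinatorial Laplacian

# Normalized Laplacian (all degrees >= 1 here, so the inverse is safe)
D_inv_sqrt = torch.diag(A.sum(dim=1).pow(-0.5))
L_norm = torch.eye(5) - D_inv_sqrt @ A @ D_inv_sqrt

# Number of (near-)zero eigenvalues = number of connected components
eigvals = torch.linalg.eigvalsh(L)  # L is symmetric PSD
num_components = int((eigvals.abs() < 1e-5).sum())
print(num_components)  # 2
```

Note also that every row of $\mathbf{L}$ sums to zero—exactly the $\mathbf{L}\mathbf{1} = \mathbf{0}$ property explored in the exercises.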

Quick Check

A graph has 100 nodes and 4950 edges. What is its density?


Real-World Graph Data

Graph-structured data appears across virtually every domain. Understanding these applications motivates why we need specialized neural network architectures.

Social Networks


Platforms like Facebook, Twitter, and LinkedIn represent users as nodes and relationships (friendships, follows, connections) as edges.

  • Nodes: Users/accounts
  • Edges: Friendships, follows, interactions
  • Scale: ~3B users (Facebook)
  • Node features: Profile info, activity patterns, demographics, embeddings
  • Edge features: Interaction frequency, relationship type, timestamp
  • ML tasks: Friend recommendation, community detection, influence prediction, fake account detection

Common Patterns in Real-World Graphs

Despite their diversity, most real-world graphs share common structural properties:

  1. Power-law degree distribution: A few hub nodes have many connections; most nodes have few
  2. Small-world property: Short average path lengths despite sparse connections ("six degrees of separation")
  3. Community structure: Nodes cluster into densely connected groups with sparse inter-group connections
  4. Homophily: Connected nodes tend to be similar (birds of a feather flock together)
  5. Heterophily: In some graphs, connected nodes are dissimilar (buyers and sellers)
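Homophily (pattern 4) can be quantified with a simple edge homophily ratio—the fraction of edges whose endpoints share a label. The labels and edges below are made up for illustration:

```python
# Hypothetical node labels and friendship edges
labels = {0: 'a', 1: 'a', 2: 'b', 3: 'b', 4: 'a'}
edges = [(0, 1), (1, 4), (2, 3), (0, 2)]

homophily = sum(labels[u] == labels[v] for u, v in edges) / len(edges)
print(homophily)  # 0.75 -- most edges connect same-label nodes
```

A ratio near 1 indicates homophily; a ratio near 0 indicates heterophily, as in buyer-seller graphs.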

Why Graphs in Machine Learning?

Now that we understand graph structure, why can't we just use standard neural networks on graph data?

The Challenge of Irregular Structure

Traditional neural networks assume fixed input structure:

  • MLPs: Fixed-length feature vector
  • CNNs: Fixed grid (224×224 pixels)
  • RNNs: Sequential ordering

Graphs violate all these assumptions:

| Challenge | Description | Why It's Hard |
| --- | --- | --- |
| Variable size | Different graphs have different numbers of nodes | Can't use fixed input dimensions |
| No ordering | Nodes have no canonical order like pixels or words | Permutation must not change output |
| Variable neighborhoods | Each node has a different number of neighbors | Can't apply fixed-size filters |
| Discrete structure | No continuous spatial coordinates | Can't define convolution in the usual sense |

The Permutation Invariance Requirement

A fundamental property required for any graph learning model is permutation invariance (for graph-level tasks) or permutation equivariance (for node-level tasks):

$$f(\mathbf{P}\mathbf{X}, \mathbf{P}\mathbf{A}\mathbf{P}^T) = f(\mathbf{X}, \mathbf{A})$$

where $\mathbf{P}$ is any permutation matrix. This means the output should not depend on how we arbitrarily numbered the nodes.

Why Permutation Invariance Matters

Consider a social network: labeling users as 1,2,3,... vs 3,1,2,... is arbitrary—it's the same network! A graph neural network must recognize this equivalence and produce the same predictions regardless of node ordering.
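This equivalence is easy to check numerically for a simple sum-pooled readout $f(\mathbf{X}, \mathbf{A}) = \sum_v (\mathbf{A}\mathbf{X})_v$—a toy invariant function, not a full GNN:

```python
import torch

torch.manual_seed(0)
n, d = 4, 3
X = torch.randn(n, d)                    # node features
A = torch.tensor([[0., 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]])         # a 4-cycle

P = torch.eye(n)[[2, 0, 3, 1]]           # an arbitrary permutation matrix

# f(X, A): one aggregation step followed by sum pooling over nodes
out = (A @ X).sum(dim=0)
out_perm = ((P @ A @ P.T) @ (P @ X)).sum(dim=0)

print(torch.allclose(out, out_perm))  # True: relabeling changes nothing
```

The algebra behind the check: $\mathbf{P}^T\mathbf{P} = \mathbf{I}$, so $(\mathbf{P}\mathbf{A}\mathbf{P}^T)(\mathbf{P}\mathbf{X}) = \mathbf{P}\mathbf{A}\mathbf{X}$, and summing over rows erases the row permutation.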

What Graph Neural Networks Provide

Graph Neural Networks (GNNs) address these challenges through:

  1. Local aggregation: Each node aggregates information from its neighbors, handling variable neighborhoods
  2. Permutation equivariance: Operations are defined on edges and neighborhoods, not absolute positions
  3. Weight sharing: The same transformation is applied at every node, enabling arbitrary graph sizes
  4. Hierarchical representations: Stacking layers captures increasingly distant information
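Principles 1–3 can be sketched in a few lines: mean-over-neighbors aggregation followed by one shared linear map. This is an illustrative toy layer, not any particular published GNN:

```python
import torch

def message_passing_layer(X, A, W):
    """Mean-over-neighbors aggregation + one shared linear map + ReLU."""
    deg = A.sum(dim=1, keepdim=True).clamp(min=1)  # guard isolated nodes
    neighbor_mean = (A @ X) / deg                  # aggregate over N(v)
    return torch.relu(neighbor_mean @ W)           # same W at every node

torch.manual_seed(0)
X = torch.randn(4, 8)   # 4 nodes, 8 input features
A = torch.tensor([[0., 1, 1, 0],
                  [1, 0, 0, 1],
                  [1, 0, 0, 0],
                  [0, 1, 0, 0]])
W = torch.randn(8, 16)  # shared weights: works for any number of nodes

H = message_passing_layer(X, A, W)
print(H.shape)  # torch.Size([4, 16])
```

Because W acts per node, the same layer applies unchanged to a graph with 4 nodes or 4 million; stacking such layers (principle 4) lets information travel one hop further per layer.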

In the next sections, we'll see how these principles lead to specific GNN architectures like GCN, GraphSAGE, and GAT.


Summary

This section introduced the fundamental concepts of graphs as a data structure for machine learning. Here are the key takeaways:

Core Concepts

| Concept | Definition | ML Relevance |
| --- | --- | --- |
| Graph G = (V, E) | Nodes V and edges E | Universal structure for relational data |
| Neighborhood N(v) | Set of nodes adjacent to v | Foundation of GNN message passing |
| Degree deg(v) | \|N(v)\| | Node importance measure |
| Adjacency Matrix A | A[i,j] = 1 if edge (i,j) exists | Matrix form for graph operations |
| Graph Laplacian L | D − A | Spectral analysis of graphs |

Key Equations

  1. Graph definition: $G = (V, E)$
  2. Neighborhood: $\mathcal{N}(v) = \{u : (u, v) \in E\}$
  3. Degree: $\text{deg}(v) = \sum_j A_{vj}$
  4. Density: $\rho = \frac{2|E|}{n(n-1)}$
  5. Laplacian: $\mathbf{L} = \mathbf{D} - \mathbf{A}$

Looking Forward

In the next section, we'll explore why we need specialized neural networks for graphs—understanding the limitations of traditional architectures and the key ideas that enable graph neural networks.


Knowledge Check

Test your understanding of graph fundamentals with this quiz:

Question 1 of 10

In graph terminology, what does the degree of a node represent?


Exercises

Conceptual Questions

  1. Explain why social networks typically exhibit a power-law degree distribution. What real-world phenomenon does this reflect?
  2. Given a bipartite graph representing users and movies (edges = watched), what would edges in the user-user projection graph represent?
  3. Why is the graph Laplacian positive semi-definite? What does this imply about its eigenvalues?
  4. A tree with $n$ nodes has $n-1$ edges. Use this to explain why adding any edge to a tree creates exactly one cycle.

Mathematical Exercises

  1. Handshaking Lemma: Prove that $\sum_{v \in V} \text{deg}(v) = 2|E|$ for any undirected graph.
  2. Adjacency Matrix Powers: Show that $(\mathbf{A}^2)_{ij}$ equals the number of common neighbors of nodes $i$ and $j$.
  3. Bipartite Characterization: Prove that a graph is bipartite if and only if it contains no odd-length cycles.
  4. Laplacian Properties: For the graph Laplacian $\mathbf{L}$, show that $\mathbf{L}\mathbf{1} = \mathbf{0}$ where $\mathbf{1}$ is the all-ones vector.

Coding Exercises

  1. Graph Class Implementation: Implement a Python class that supports both adjacency matrix and adjacency list representations, with methods to convert between them.
  2. Degree Distribution: Load a real-world graph dataset (e.g., Karate Club from PyTorch Geometric) and plot its degree distribution. Does it follow a power law?
  3. BFS Implementation: Implement breadth-first search to compute shortest paths from a source node to all other nodes.
  4. Connected Components: Implement an algorithm to find all connected components in an undirected graph.

Solution Hints

  • Exercise 1: Each edge contributes exactly 2 to the sum (one for each endpoint).
  • Exercise 2: $(\mathbf{A}^2)_{ij} = \sum_k A_{ik} A_{kj}$—think about when this sum is non-zero.
  • Coding Exercise 2: Use matplotlib for plotting; consider log-log scale for power-law visualization.

Now that you understand graph fundamentals, you're ready to explore why we need Graph Neural Networks—understanding the unique challenges graphs pose for traditional neural networks and the key principles that enable learning on graph-structured data.