Learning Objectives
By the end of this section, you will be able to:
- Define graphs mathematically as structures G = (V, E) with nodes, edges, and their properties
- Distinguish between graph types including directed, undirected, weighted, bipartite, and acyclic graphs
- Represent graphs computationally using adjacency matrices, adjacency lists, and edge lists
- Calculate key graph properties such as degree, density, and path length
- Identify graph-structured data in real-world applications from social networks to molecules
- Understand why graphs matter for ML and the limitations of traditional neural networks on graph data
Why This Matters: Graphs are everywhere—from social networks and molecules to knowledge bases and road networks. Unlike images (grids) or text (sequences), graph data has irregular structure that traditional neural networks cannot handle. Understanding graph fundamentals is the foundation for Graph Neural Networks (GNNs), which have revolutionized applications in drug discovery, recommendation systems, fraud detection, and more.
Why Graphs? The Big Picture
Consider the data structures you've encountered in deep learning so far:
| Data Type | Structure | Neural Architecture |
|---|---|---|
| Images | 2D grid of pixels | Convolutional Neural Networks (CNNs) |
| Text/Audio | 1D sequence of tokens | RNNs, LSTMs, Transformers |
| Tabular | Fixed feature vector | Fully Connected Networks |
These architectures exploit the regular structure of their input data:
- CNNs leverage the 2D grid structure of images with local receptive fields and weight sharing
- RNNs/Transformers process sequences by modeling temporal or positional dependencies
- MLPs treat input as a fixed-length feature vector with no inherent structure
But what about data with irregular structure? Consider:
- A social network where people have varying numbers of friends
- A molecule with atoms connected in arbitrary configurations
- A 3D mesh representing a surface with variable connectivity
- A knowledge graph linking entities through different relationship types
The Key Insight
All of these examples share one abstraction: entities connected by relationships. A graph captures exactly this structure, and representing data as a graph lets us apply the same learning machinery across all of these domains.
What is a Graph?
At its core, a graph is a mathematical structure that models pairwise relationships between objects. The objects are called nodes (or vertices), and the relationships are called edges (or links).
Intuitive Examples
| Domain | Nodes | Edges |
|---|---|---|
| Social Network | People | Friendships |
| Web | Web pages | Hyperlinks |
| Molecule | Atoms | Chemical bonds |
| Road Network | Intersections | Roads |
| Neural Network | Neurons/Layers | Connections |
| Citation Network | Papers | Citations |
The beauty of graph representation is its universality: once data is represented as a graph, we can apply the same algorithms and neural architectures regardless of the original domain.
Formal Mathematical Definition
A graph is defined as an ordered pair:

$$G = (V, E)$$

where:
- $V = \{v_1, v_2, \ldots, v_n\}$ is a set of vertices (nodes)
- $E \subseteq V \times V$ is a set of edges: pairs $(u, v)$ connecting nodes $u, v \in V$
We typically denote:
- $n = |V|$ as the number of nodes
- $m = |E|$ (or $e$) as the number of edges
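As a minimal sketch of this definition, we can model $V$ and $E$ directly with Python sets (the node labels and edges here are made up for illustration):

```python
# G = (V, E) as plain Python sets. Nodes are labeled 0..n-1; edges are
# frozensets because this example graph is undirected (unordered pairs).
V = {0, 1, 2, 3}
E = {frozenset({0, 1}), frozenset({0, 2}), frozenset({1, 2}), frozenset({2, 3})}

n = len(V)  # number of nodes, |V|
m = len(E)  # number of edges, |E|
print(n, m)  # 4 4
```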
Node and Edge Attributes
In practical applications, nodes and edges often carry additional information:
- Node features: each node $v$ has a feature vector $\mathbf{x}_v \in \mathbb{R}^d$
- Edge features: each edge $(u, v)$ has a feature vector $\mathbf{e}_{uv} \in \mathbb{R}^c$
For example, in a molecular graph:
- Node features might encode atom type, charge, and hybridization
- Edge features might encode bond type (single, double, triple) and bond length
Notation Convention
- $u, v \in V$ to denote individual nodes
- $(u, v) \in E$ to denote an edge between nodes $u$ and $v$
- $\mathcal{N}(v)$ to denote the neighborhood of node $v$—the set of nodes connected to $v$
The Neighborhood Function
One of the most important concepts in graph neural networks is the neighborhood:

$$\mathcal{N}(v) = \{u \in V : (u, v) \in E\}$$

This is the set of all nodes directly connected to $v$. The degree of a node is the size of its neighborhood:

$$\deg(v) = |\mathcal{N}(v)|$$
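These two definitions translate directly into code. Below is a small illustrative sketch (the `edges` set and helper names are hypothetical, not from a library):

```python
# An undirected graph as a set of edge tuples.
edges = {(0, 1), (0, 2), (1, 2), (2, 3)}

def neighborhood(v, edges):
    """Return N(v): all nodes that share an edge with v."""
    return {u if w == v else w for (u, w) in edges if v in (u, w)}

def degree(v, edges):
    """deg(v) = |N(v)|."""
    return len(neighborhood(v, edges))

print(neighborhood(2, edges))  # {0, 1, 3}
print(degree(2, edges))        # 3
```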
Quick Check
In a social network graph where nodes are people and edges are friendships, what does N(Alice) represent?
Interactive: Graph Explorer
Explore the fundamental structure of graphs with this interactive visualization. You can add nodes, connect them with edges, and observe how graph properties change in real-time.
What to Explore
- Add nodes: Click "Add Node" then click on the canvas to create new vertices
- Connect nodes: Click "Add Edge" then click two nodes to connect them
- Observe degrees: Select a node to see its degree and neighbors
- Toggle directed: Enable directed mode to see how direction affects the graph
- Run BFS: Select a node and run breadth-first search to see graph traversal
Types of Graphs
Graphs come in many varieties, each suited to different types of relationships. The choice of graph type significantly impacts how we design algorithms and neural networks.
Directed vs. Undirected Graphs
The most fundamental distinction is whether edges have direction:
| Type | Notation | Meaning | Example |
|---|---|---|---|
| Undirected | (u, v) = (v, u) | Symmetric relationship | Facebook friendships |
| Directed | (u, v) ≠ (v, u) | Asymmetric relationship | Twitter follows |
In undirected graphs, if Alice is connected to Bob, then Bob is also connected to Alice: the edge $(\text{Alice}, \text{Bob})$ is the same as $(\text{Bob}, \text{Alice})$.
In directed graphs (digraphs), edges have a direction. Alice following Bob on Twitter does not mean Bob follows Alice.
Weighted Graphs
When edges carry numerical values, we have a weighted graph $G = (V, E, w)$ with a weight function:

$$w : E \to \mathbb{R}$$
Weights can represent distances, costs, similarities, or interaction strengths. For instance:
- Road network: Edge weight = travel time or distance
- Social network: Edge weight = interaction frequency
- Protein network: Edge weight = interaction confidence score
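A weight function can be sketched as a dictionary keyed by edges. The edge labels and travel times below are hypothetical, standing in for a small road network:

```python
# A weighted undirected graph: w maps each edge to a travel time in minutes.
weights = {
    ("A", "B"): 5.0,
    ("B", "C"): 12.5,
    ("A", "C"): 20.0,
}

def weight(u, v, weights):
    # Undirected: (u, v) and (v, u) name the same edge, so check both orders.
    return weights.get((u, v), weights.get((v, u)))

print(weight("C", "B", weights))  # 12.5
```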
Special Graph Types
| Type | Definition | Key Property |
|---|---|---|
| Complete Graph Kₙ | Every pair of nodes is connected | \|E\| = n(n-1)/2 |
| Bipartite Graph | Nodes split into two sets; edges only between sets | Two-colorable |
| Tree | Connected graph with no cycles | \|E\| = n - 1 |
| DAG | Directed graph with no directed cycles | Has topological ordering |
| Sparse Graph | \|E\| = O(n) | Most real-world graphs |
| Dense Graph | \|E\| = O(n²) | Near-complete connectivity |
Why Sparsity Matters
Most real-world graphs are sparse: $|E| \ll n^2$. Dense $n \times n$ representations waste memory and compute at scale, which is why the sparse data structures discussed next (adjacency lists and edge lists) dominate in practice.
Graph Representations in Code
To work with graphs computationally, we need efficient data structures. The choice of representation affects both memory usage and algorithm efficiency.
As a running example, consider a graph $G = (V, E)$ on five nodes $\{A, B, C, D, E\}$.
Adjacency Matrix
| A | B | C | D | E | |
|---|---|---|---|---|---|
| A | 0 | 1 | 1 | 0 | 0 |
| B | 1 | 0 | 1 | 1 | 0 |
| C | 1 | 1 | 0 | 0 | 1 |
| D | 0 | 1 | 0 | 0 | 1 |
| E | 0 | 0 | 1 | 1 | 0 |
Representation Comparison
| Operation | Adjacency Matrix | Adjacency List | Edge List |
|---|---|---|---|
| Space | O(n²) | O(n + e) | O(e) |
| Check edge (u, v) | O(1) | O(degree) | O(e) |
| Get neighbors | O(n) | O(degree) | O(e) |
| Add edge | O(1) | O(1) | O(1) |
| Iterate all edges | O(n²) | O(n + e) | O(e) |
Adjacency Matrix
The adjacency matrix $A \in \{0, 1\}^{n \times n}$ encodes edges as matrix entries:

$$A_{ij} = \begin{cases} 1 & \text{if } (v_i, v_j) \in E \\ 0 & \text{otherwise} \end{cases}$$

Properties of the adjacency matrix:
- Undirected graphs: $A$ is symmetric ($A = A^T$)
- Self-loops: Diagonal entries $A_{ii} = 1$ indicate self-connections
- Degree: Row sum gives degree: $\deg(v_i) = \sum_j A_{ij}$
- Walks: $[A^k]_{ij}$ counts walks of length $k$ from $v_i$ to $v_j$
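These properties are easy to verify numerically. The sketch below uses the 5-node example graph (nodes A–E) from the table above:

```python
import numpy as np

# Adjacency matrix of the undirected 5-node example graph (rows A..E).
A = np.array([
    [0, 1, 1, 0, 0],  # A
    [1, 0, 1, 1, 0],  # B
    [1, 1, 0, 0, 1],  # C
    [0, 1, 0, 0, 1],  # D
    [0, 0, 1, 1, 0],  # E
])

assert (A == A.T).all()          # undirected => symmetric
deg = A.sum(axis=1)              # row sums give node degrees
print(deg)                       # [2 3 3 2 2]

walks2 = np.linalg.matrix_power(A, 2)
print(walks2[0, 3])              # length-2 walks A -> D (just A-B-D): 1
```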
Adjacency List
For sparse graphs, an adjacency list is more memory-efficient:
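A common way to build one is a dictionary mapping each node to a list of its neighbors, sketched here for the 5-node example graph:

```python
from collections import defaultdict

# Build an adjacency list (dict of neighbor lists) from an edge list.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")]

adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)  # undirected: record both directions

print(adj["B"])       # ['A', 'C', 'D']
print(len(adj["B"]))  # degree of B = 3
```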
Edge List (COO Format)
The edge list format stores edges as pairs of indices. This is the standard format in PyTorch Geometric:
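A sketch of the same idea in plain NumPy (PyTorch Geometric uses an integer tensor of identical shape, conventionally named `edge_index`):

```python
import numpy as np

# COO edge list for a small undirected graph on nodes {0, 1, 2}.
# Shape is 2 x E; undirected edges are stored as two directed entries
# (u -> v and v -> u), following the PyTorch Geometric convention.
edge_index = np.array([
    [0, 1, 1, 2, 0, 2],   # source nodes
    [1, 0, 2, 1, 2, 0],   # target nodes
])

num_edges = edge_index.shape[1]  # 6 directed entries = 3 undirected edges
print(num_edges)  # 6

# Neighbors of node 1: targets of all entries whose source is 1.
print(edge_index[1, edge_index[0] == 1])  # [0 2]
```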
Representation Trade-offs
- Adjacency Matrix: Fast edge lookup O(1), but O(n²) space—only for small/dense graphs
- Adjacency List: O(n + e) space, fast neighbor iteration—best for most algorithms
- Edge Index (COO): O(e) space, GPU-friendly—standard for GNNs in PyTorch Geometric
Key Graph Properties
Understanding graph properties is essential for both algorithm design and understanding GNN behavior.
Degree Distribution
The degree distribution $P(k)$ gives the probability that a randomly selected node has degree $k$:

$$P(k) = \frac{n_k}{n}$$

where $n_k$ is the number of nodes with degree $k$. Many real-world graphs follow a power-law degree distribution:

$$P(k) \propto k^{-\gamma}$$
This means most nodes have few connections, but a few "hub" nodes have many. Such graphs are called scale-free.
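Computing an empirical degree distribution is a one-liner with a counter. The degree sequence below is made up for illustration, with one hub node:

```python
from collections import Counter

# Empirical degree distribution P(k) from a degree sequence.
degrees = [1, 1, 1, 1, 2, 2, 3, 5]  # most nodes have degree 1; one hub has 5

n = len(degrees)
P = {k: count / n for k, count in Counter(degrees).items()}
print(P[1])  # 0.5 -- half the nodes have a single connection
print(P[5])  # 0.125 -- the lone hub
```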
Graph Density
Density measures how connected a graph is relative to its maximum possible edges. For an undirected graph:

$$\rho = \frac{2|E|}{n(n-1)}$$
| Graph Type | Density Range | Examples |
|---|---|---|
| Sparse | ρ ≈ O(1/n) | Social networks, road networks |
| Dense | ρ ≈ O(1) | Complete graphs, cliques |
| Very sparse | \|E\| ≈ O(n) | Trees, planar graphs |
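The formula above is a direct computation; a quick sketch:

```python
# Density of an undirected graph: rho = 2|E| / (n(n-1)).
def density(n, m):
    return 2 * m / (n * (n - 1))

print(density(5, 6))        # the 5-node example graph: 12/20 = 0.6
print(density(1000, 3000))  # a sparse graph: roughly 0.006
```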
Paths and Connectivity
A path from node $u$ to $v$ is a sequence of edges connecting them:

$$u = v_0 \to v_1 \to \cdots \to v_k = v, \quad \text{where } (v_{i-1}, v_i) \in E \text{ for all } i$$
Key path-related concepts:
| Concept | Definition | Significance |
|---|---|---|
| Shortest Path | Minimum number of edges between two nodes | Used for distance metrics |
| Diameter | Maximum shortest path in the graph | Indicates how "spread out" the graph is |
| Connected Component | Maximal set of mutually reachable nodes | Isolated subgraphs |
| Average Path Length | Mean shortest path over all node pairs | Measures global connectivity |
The Degree Matrix and Graph Laplacian
Two matrices are fundamental to spectral graph theory and many GNN architectures:
The degree matrix $D$ is diagonal with degrees on the diagonal:

$$D_{ii} = \deg(v_i), \qquad D_{ij} = 0 \text{ for } i \neq j$$

The graph Laplacian captures the difference between a node and its neighbors:

$$L = D - A$$

The normalized Laplacian (used in many GNNs):

$$L_{\text{norm}} = I - D^{-1/2} A D^{-1/2}$$
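Both matrices can be built in a few lines. The sketch below uses the 5-node example graph and checks the connectivity property from the callout that follows:

```python
import numpy as np

# Degree matrix D, Laplacian L = D - A, and the normalized Laplacian
# L_norm = I - D^{-1/2} A D^{-1/2} for the 5-node example graph.
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
], dtype=float)

D = np.diag(A.sum(axis=1))
L = D - A

D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_norm = np.eye(5) - D_inv_sqrt @ A @ D_inv_sqrt

# The multiplicity of eigenvalue 0 equals the number of connected components.
eigvals = np.linalg.eigvalsh(L)
print(int(np.sum(np.isclose(eigvals, 0.0))))  # 1 -> the graph is connected
```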
Why the Laplacian Matters
- Its eigenvalues encode graph connectivity (number of zeros = number of connected components)
- Its eigenvectors form a "Fourier basis" for signals on the graph
- Spectral GNNs (like ChebNet) use Laplacian eigenvalues for filtering
- The normalized form appears directly in GCN's propagation rule
Quick Check
A graph has 100 nodes and 4950 edges. What is its density?
Real-World Graph Data
Graph-structured data appears across virtually every domain. Understanding these applications motivates why we need specialized neural network architectures.
Social Networks
Platforms like Facebook, Twitter, and LinkedIn represent users as nodes and relationships (friendships, follows, connections) as edges.
- Nodes: Users/Accounts
- Edges: Friendships, follows, interactions
- Scale: ~3B users (Facebook)
Common Patterns in Real-World Graphs
Despite their diversity, most real-world graphs share common structural properties:
- Power-law degree distribution: A few hub nodes have many connections; most nodes have few
- Small-world property: Short average path lengths despite sparse connections ("six degrees of separation")
- Community structure: Nodes cluster into densely connected groups with sparse inter-group connections
- Homophily: Connected nodes tend to be similar (birds of a feather flock together)
- Heterophily: In some graphs, connected nodes are dissimilar (buyers and sellers)
Why Graphs in Machine Learning?
Now that we understand graph structure, why can't we just use standard neural networks on graph data?
The Challenge of Irregular Structure
Traditional neural networks assume fixed input structure:
- MLPs: Fixed-length feature vector
- CNNs: Fixed grid (224×224 pixels)
- RNNs: Sequential ordering
Graphs violate all these assumptions:
| Challenge | Description | Why It's Hard |
|---|---|---|
| Variable size | Different graphs have different numbers of nodes | Can't use fixed input dimensions |
| No ordering | Nodes have no canonical order like pixels or words | Permutation must not change output |
| Variable neighborhoods | Each node has different numbers of neighbors | Can't apply fixed-size filters |
| Discrete structure | No continuous spatial coordinates | Can't define convolution in usual sense |
The Permutation Invariance Requirement
A fundamental property required for any graph learning model is permutation invariance (for graph-level tasks) or permutation equivariance (for node-level tasks):

$$f(PAP^T, PX) = f(A, X) \quad \text{(invariance)} \qquad f(PAP^T, PX) = P\,f(A, X) \quad \text{(equivariance)}$$

where $P$ is any permutation matrix. This means the output should not depend on how we arbitrarily numbered the nodes.
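This property can be checked numerically. The sketch below uses a simple sum-pooling readout (a toy stand-in for a graph-level model, not any particular GNN) and verifies that relabeling the nodes leaves the output unchanged:

```python
import numpy as np

# A permutation-invariant readout: aggregate neighbors, then sum-pool.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph
X = rng.standard_normal((3, 4))                               # node features

def readout(A, X):
    # One round of neighbor aggregation followed by sum pooling over nodes.
    return (A @ X).sum(axis=0)

P = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)  # relabeling

out1 = readout(A, X)
out2 = readout(P @ A @ P.T, P @ X)   # same graph, different node numbering
print(np.allclose(out1, out2))       # True
```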
Why Permutation Invariance Matters
If you flatten an adjacency matrix and feed it to an MLP, relabeling the nodes produces a different input vector, and thus a different prediction, for the exact same graph. A graph model must give the same answer regardless of node numbering.
What Graph Neural Networks Provide
Graph Neural Networks (GNNs) address these challenges through:
- Local aggregation: Each node aggregates information from its neighbors, handling variable neighborhoods
- Permutation equivariance: Operations are defined on edges and neighborhoods, not absolute positions
- Weight sharing: The same transformation is applied at every node, enabling arbitrary graph sizes
- Hierarchical representations: Stacking layers captures increasingly distant information
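The first three ideas can be sketched in a few lines of NumPy. This is a generic mean-aggregation step for illustration, not the exact update rule of GCN, GraphSAGE, or GAT; the weights are random placeholders rather than a trained model:

```python
import numpy as np

# One message-passing step: each node averages its neighbors' features,
# then a shared linear transform (same W at every node) is applied.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
X = np.eye(4)                       # one-hot node features

deg = A.sum(axis=1, keepdims=True)
agg = (A @ X) / deg                 # mean over each node's neighborhood

rng = np.random.default_rng(42)
W = rng.standard_normal((4, 2))     # shared weights: weight sharing across nodes
H = np.tanh(agg @ W)                # updated node representations
print(H.shape)  # (4, 2)
```

Because the same `W` is applied at every node and aggregation is defined over neighborhoods rather than positions, this step works for a graph of any size.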
In the next sections, we'll see how these principles lead to specific GNN architectures like GCN, GraphSAGE, and GAT.
Summary
This section introduced the fundamental concepts of graphs as a data structure for machine learning. Here are the key takeaways:
Core Concepts
| Concept | Definition | ML Relevance |
|---|---|---|
| Graph G = (V, E) | Nodes V and edges E | Universal structure for relational data |
| Neighborhood N(v) | Set of nodes adjacent to v | Foundation of GNN message passing |
| Degree deg(v) | \|N(v)\| | Node importance measure |
| Adjacency Matrix A | A[i,j] = 1 if edge (i,j) | Matrix form for graph operations |
| Graph Laplacian L | D - A | Spectral analysis of graphs |
Key Equations
- Graph definition: $G = (V, E)$
- Neighborhood: $\mathcal{N}(v) = \{u \in V : (u, v) \in E\}$
- Degree: $\deg(v) = |\mathcal{N}(v)|$
- Density: $\rho = \frac{2|E|}{n(n-1)}$
- Laplacian: $L = D - A$
Looking Forward
In the next section, we'll explore why we need specialized neural networks for graphs—understanding the limitations of traditional architectures and the key ideas that enable graph neural networks.
Knowledge Check
Test your understanding of graph fundamentals with this quiz:
In graph terminology, what does the degree of a node represent?
Exercises
Conceptual Questions
- Explain why social networks typically exhibit a power-law degree distribution. What real-world phenomenon does this reflect?
- Given a bipartite graph representing users and movies (edges = watched), what would edges in the user-user projection graph represent?
- Why is the graph Laplacian positive semi-definite? What does this imply about its eigenvalues?
- A tree with $n$ nodes has $n - 1$ edges. Use this to explain why adding any edge to a tree creates exactly one cycle.
Mathematical Exercises
- Handshaking Lemma: Prove that $\sum_{v \in V} \deg(v) = 2|E|$ for any undirected graph.
- Adjacency Matrix Powers: Show that $[A^2]_{ij}$ (for $i \neq j$) equals the number of common neighbors between nodes $i$ and $j$.
- Bipartite Characterization: Prove that a graph is bipartite if and only if it contains no odd-length cycles.
- Laplacian Properties: For the graph Laplacian $L = D - A$, show that $L\mathbf{1} = \mathbf{0}$, where $\mathbf{1}$ is the all-ones vector.
Coding Exercises
- Graph Class Implementation: Implement a Python class that supports both adjacency matrix and adjacency list representations, with methods to convert between them.
- Degree Distribution: Load a real-world graph dataset (e.g., Karate Club from PyTorch Geometric) and plot its degree distribution. Does it follow a power law?
- BFS Implementation: Implement breadth-first search to compute shortest paths from a source node to all other nodes.
- Connected Components: Implement an algorithm to find all connected components in an undirected graph.
Solution Hints
- Exercise 1: Each edge contributes exactly 2 to the sum (one for each endpoint).
- Exercise 2: $[A^2]_{ij} = \sum_k A_{ik} A_{kj}$—think about when each term in this sum is non-zero.
- Coding Exercise 2: Use matplotlib for plotting; consider log-log scale for power-law visualization.
Now that you understand graph fundamentals, you're ready to explore why we need Graph Neural Networks—understanding the unique challenges graphs pose for traditional neural networks and the key principles that enable learning on graph-structured data.