Learning Objectives
By the end of this section, you will be able to:
📚 Core Knowledge
- • Understand the computational challenge of probabilistic inference
- • Explain variable elimination and its complexity
- • Describe belief propagation and message passing
- • Know when inference is tractable vs. intractable
🔧 Practical Skills
- • Implement variable elimination from scratch
- • Apply the sum-product algorithm to trees
- • Construct junction trees for exact inference
- • Choose appropriate inference algorithms
🧠 Deep Learning Connections
- • Graph Neural Networks — Message passing directly inspired by belief propagation
- • Variational Autoencoders — Use approximate inference via ELBO optimization
- • Structured Prediction — CRF layers in neural networks use inference algorithms
- • Neural Message Passing — Learned messages replace hand-designed factors
Where You'll Apply This: Medical diagnosis systems, protein structure prediction, image segmentation, natural language processing, robotics localization, and any application requiring reasoning under uncertainty with structured dependencies.
The Big Picture
Graphical models compactly represent joint probability distributions. But representation is only half the story—we need inference algorithms to answer questions about these distributions. Given a graphical model, inference lets us compute:
Marginal Probabilities
— probability of a single variable, summing over all others
Conditional Probabilities
— posterior given evidence
MAP Assignment
— most likely configuration
The Inference Challenge
Consider a joint distribution over binary variables. The naive approach to computing requires summing over configurations—exponential in the number of variables! Graphical models exploit conditional independence to make this tractable, but even then inference can be NP-hard in the worst case.
The Fundamental Trade-off
Tractable Cases
- • Tree-structured graphs:
- • Low treewidth:
- • Special structures (chains, polytrees)
Intractable Cases
- • Dense graphs, grids
- • High treewidth
- • Requires approximate inference
Historical Development
Judea Pearl (1982-1988)
Developed belief propagation and the message-passing paradigm for inference. His book "Probabilistic Reasoning in Intelligent Systems" (1988) established the foundations of graphical models. Won the Turing Award in 2011.
Junction Trees (1990s)
Lauritzen & Spiegelhalter developed the junction tree algorithm, which generalizes belief propagation to work on any graphical model. This became the standard for exact inference in practical systems.
Loopy BP & Variational Methods (2000s)
Researchers discovered that belief propagation often works well even on graphs with loops. Connections to variational methods and statistical physics led to new approximate inference algorithms like expectation propagation.
Neural Message Passing (2015-present)
Graph Neural Networks apply learnable message passing, directly inspired by belief propagation. Models like Graph Attention Networks (GAT) and Message Passing Neural Networks (MPNN) now power molecular property prediction, protein folding, and social network analysis.
Variable Elimination
Variable elimination is the most fundamental exact inference algorithm. It systematically removes variables from the model by summing them out, eventually leaving only the query variable(s).
The Algorithm
The key insight is that we can push summations inside products due to the distributive law:
We can group factors containing and sum them out together, creating a new factor over .
Variable Elimination Steps
- 1
Choose elimination order
Order variables to eliminate (excluding query). Order affects complexity!
- 2
For each variable to eliminate:
- a. Find all factors containing this variable
- b. Multiply these factors together
- c. Sum out (marginalize) the variable
- d. Add the new factor to the factor set
- 3
Multiply remaining factors
The result is the marginal probability
Interactive: Variable Elimination
Watch variable elimination compute step by step. Notice how factors are combined and variables are summed out in order, eventually leaving only the query variable.
Variable Elimination Algorithm
Computing P(D) by eliminating variables A → B → C
Active Factors
Variable Elimination computes marginal probabilities by:
- Multiplying all factors containing the variable
- Summing out (marginalizing) that variable
- Creating a new factor over the remaining variables
Complexity: O(n · d^w) where w is the treewidth of the graph
Elimination Ordering
The elimination order dramatically affects computational cost. A poor order can create intermediate factors that are exponentially large.
Treewidth and Complexity
The complexity of variable elimination is where is the treewidth of the graph (induced by the elimination order), is the maximum domain size, and is the number of variables.
Good ordering (low treewidth)
• Chain: treewidth 1
• Tree: treewidth 1
• Grid n×n: treewidth n
Heuristics for ordering
• Min-neighbors: eliminate fewest connections first
• Min-fill: minimize new edges added
• Finding optimal is NP-hard!
Belief Propagation
Belief propagation (BP), also called the sum-product algorithm, computes all marginals simultaneously through local message passing. Unlike variable elimination which processes one query at a time, BP can answer all marginal queries efficiently after a single pass.
The Sum-Product Algorithm
In BP, nodes (variables) and factors exchange messages. Each message summarizes what the sender "believes" about the recipient based on its local information and messages it received from others.
Message Update Rules
Variable to Factor Message
Product of all incoming factor messages except the recipient.
Factor to Variable Message
Marginalize the factor times incoming variable messages.
Final Belief (Marginal)
Product of all incoming messages, normalized to be a valid probability.
Interactive: Message Passing
Watch belief propagation in action. Observe how messages flow through the graph, carrying information from observed variables to update beliefs throughout the network. Notice how beliefs converge after a few iterations.
Belief Propagation (Sum-Product)
Message passing on a factor graph with D observed
Loopy Belief Propagation
On tree-structured graphs, belief propagation is exact and converges in two passes (leaves to root, root to leaves). But what about graphs with loops?
✓ When Loopy BP Works Well
- • Single loop or sparse graph structure
- • Weak correlations between variables
- • Strong evidence at some nodes
- • Attractive potentials (prefer agreement)
✗ When Loopy BP Fails
- • Many tight loops (dense graphs)
- • Strong frustrated interactions
- • Messages may oscillate or diverge
- • Beliefs can be overconfident
Junction Tree Algorithm
The junction tree algorithm (also called clique tree algorithm) provides exact inference on any graphical model. It works by transforming the original graph into a tree of cliques, then running belief propagation on this tree.
Building the Junction Tree
The construction involves several steps, each with a specific purpose:
Junction Tree Algorithm
Exact inference in arbitrary graphical models
Step 1: Moralize the Graph
Connect all parents of each node (marry the parents). Convert directed edges to undirected.
The Running Intersection Property
A valid junction tree must satisfy the running intersection property: if a variable appears in two cliques, it must appear in every clique on the unique path between them.
If X ∈ C₁ and X ∈ C₃, then X ∈ C₂ for all C₂ on path(C₁, C₃)
This property ensures that information about each variable is propagated correctly throughout the tree, guaranteeing exact inference.
Approximate Inference
When exact inference is intractable (high treewidth, large state spaces), we turn to approximate methods. The two main families are:
Sampling Methods
- Gibbs Sampling: Iteratively sample each variable conditioned on current values of others
- Importance Sampling: Weight samples by likelihood ratio
- Particle Filtering: Sequential Monte Carlo for temporal models
Asymptotically exact but may require many samples
Variational Methods
- Mean Field: Assume variables are independent, optimize per-variable distributions
- Loopy BP: BP on graphs with cycles (approximate)
- Expectation Propagation: Moment matching with tractable family
Deterministic optimization, fast but biased
Algorithm Comparison
Choosing the right inference algorithm depends on your graph structure, accuracy requirements, and computational budget. Use this interactive comparison to understand the trade-offs.
Inference Algorithm Comparison
Choose the right algorithm for your problem
Advantages
- +Exact inference
- +Works on any graph
- +Simple to implement
Limitations
- −Exponential in treewidth
- −Single query only
- −No structure reuse
| Algorithm | Type | Complexity | Trees | General |
|---|---|---|---|---|
| Variable Elimination | exact | O(n · d^w) | ○ | ✓ |
| Belief Propagation | exact | O(n · d²) | ✓ | ≈ |
| Junction Tree | exact | O(n · d^w) | ○ | ✓ |
| Gibbs Sampling | approximate | O(samples · n) | ○ | ≈ |
| Variational Inference | approximate | O(iterations · n) | ○ | ≈ |
Applications in Deep Learning
Inference algorithms from graphical models have profoundly influenced modern deep learning architectures:
🔗 Graph Neural Networks
GNNs directly implement message passing: each node aggregates messages from neighbors and updates its representation. The key difference is that message functions are learned rather than derived from probabilistic factors. Message Passing Neural Networks (MPNNs), Graph Attention Networks (GATs), and Graph Convolutional Networks (GCNs) all follow this paradigm.
🎯 Structured Prediction with CRFs
BiLSTM-CRF models for named entity recognition and POS tagging use inference algorithms at their core. The CRF layer computes the partition function and marginals using forward-backward (a special case of belief propagation on chains). This allows gradients to flow through the structured prediction layer.
🧬 Protein Structure Prediction
AlphaFold2 uses a form of message passing called "Evoformer" to reason about amino acid relationships. The attention mechanism can be seen as a soft, learnable version of belief propagation where "messages" are attention-weighted features between residue pairs.
🎲 Variational Autoencoders
VAEs perform approximate inference using the ELBO objective—the same framework used in variational inference for graphical models. The encoder network amortizes inference, learning to predict approximate posterior parameters directly from input data rather than running inference from scratch each time.
| Classical Algorithm | Deep Learning Analog | Key Innovation |
|---|---|---|
| Belief Propagation | Graph Neural Networks | Learned message functions |
| Forward-Backward | CRF Loss Layer | Differentiable structured prediction |
| Variational Inference | VAE + ELBO | Amortized inference with encoders |
| Junction Tree | Hierarchical Attention | Multi-scale message passing |
| Gibbs Sampling | Denoising Score Matching | Learned transition kernels |
Python Implementation
Let's implement the core inference algorithms from scratch. This implementation covers variable elimination and the essence of belief propagation.
- •
pgmpy— Python library for graphical models with full inference - •
Pyro/PyTorch— Probabilistic programming with GPU support - •
libDAI— C++ library with Python bindings, very efficient - •
OpenGM— Optimized for large-scale discrete inference
Knowledge Check
Test your understanding of inference algorithms in graphical models:
Knowledge Check
Question 1 of 8What is the main computational bottleneck in variable elimination?
Summary
Key Takeaways
Looking Ahead
In the next section, we'll explore Learning Graphical Model Structure—how to discover the graph structure itself from data. This is the inverse problem: instead of performing inference given a known structure, we infer which edges and dependencies exist. Structure learning is essential for discovering causal relationships and building interpretable models from observational data.