Learning Objectives
By the end of this section, you will be able to:
- Explain why we need to reverse the forward diffusion process to generate new samples
- Describe the mathematical goal: a model $p_\theta(x_{t-1} \mid x_t)$ that approximates the true reverse transition $q(x_{t-1} \mid x_t)$
- Understand why the true reverse $q(x_{t-1} \mid x_t)$ is intractable
- Recognize the key insight: conditioning on $x_0$ makes the reverse tractable
The Generation Problem
In Chapter 2, we learned how to systematically destroy data by adding noise:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)$$

For generation, we need to run this process in reverse:

$$x_T \sim \mathcal{N}(0, I) \;\longrightarrow\; x_{T-1} \;\longrightarrow\; \cdots \;\longrightarrow\; x_0$$
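As a quick sanity check of the forward process, here is a minimal NumPy sketch (assuming the standard variance-preserving update from Chapter 2, with an illustrative constant $\beta_t = 0.01$):

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_step(x_prev, beta_t, rng):
    """One forward step: q(x_t | x_{t-1}) = N(x_t; sqrt(1 - beta_t) x_{t-1}, beta_t I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

# Toy "data": a constant vector. Many small noising steps destroy all
# structure, driving the distribution toward N(0, I).
x = np.ones(1000)
for _ in range(1000):
    x = forward_step(x, beta_t=0.01, rng=rng)
# x is now approximately a sample from N(0, I)
```

After 1000 steps the original signal is scaled by $\sqrt{1-\beta}^{\,1000} \approx 0.007$, so essentially nothing of the data remains — exactly the starting point that generation must reverse.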
The Core Idea: If we can learn to reverse each small noise-adding step, we can start from pure noise and iteratively denoise to obtain a clean sample from the data distribution.
Why Small Steps Matter
The key insight from thermodynamics and score matching theory is that when the forward steps are small enough, the reverse process also becomes Gaussian. This is not obvious - in general, reversing a stochastic process can lead to complex, non-Gaussian distributions.
For small $\beta_t$, the reverse transition has approximately the same functional form as the forward - a Gaussian:

$$q(x_{t-1} \mid x_t) \approx \mathcal{N}\left(x_{t-1};\ \mu(x_t, t),\ \sigma_t^2 I\right)$$
Reversing the Markov Chain
Given a Markov chain, what is its reverse? By Bayes' theorem:

$$q(x_{t-1} \mid x_t) = \frac{q(x_t \mid x_{t-1})\, q(x_{t-1})}{q(x_t)}$$
This tells us that the reverse transition depends on:
| Term | What It Is | Do We Know It? |
|---|---|---|
| q(x_t|x_{t-1}) | Forward transition | Yes - we defined it |
| q(x_{t-1}) | Marginal at t-1 | No - depends on data distribution |
| q(x_t) | Marginal at t | No - depends on data distribution |
The problem is that $q(x_{t-1})$ and $q(x_t)$ are marginal distributions that depend on the unknown data distribution $q(x_0)$. We cannot compute them analytically.
The Intractable True Reverse
The true reverse process $q(x_{t-1} \mid x_t)$ is intractable because its marginals require integrating over all possible $x_0$:

$$q(x_t) = \int q(x_t \mid x_0)\, q(x_0)\, dx_0$$

This integral runs over the entire data space - computationally infeasible.
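To make the dependence on the data distribution concrete, here is a hypothetical 1-D sketch: even just estimating the marginal $q(x_t)$ by Monte Carlo requires samples from $q(x_0)$ itself. The two-point data distribution and the signal level $\bar\alpha_t = 0.5$ are illustrative assumptions, not part of any real model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D data distribution: an equal mixture of the points -1 and +1.
x0_samples = rng.choice([-1.0, 1.0], size=10_000)

alpha_bar_t = 0.5  # illustrative cumulative signal level at step t

def gaussian_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def marginal_q(x_t):
    """Monte Carlo estimate of q(x_t) = E_{x0}[ N(x_t; sqrt(abar) x0, 1 - abar) ].
    Without access to q(x0) - here, via its samples - the expectation cannot
    be evaluated at all."""
    return float(gaussian_pdf(x_t, np.sqrt(alpha_bar_t) * x0_samples,
                              1.0 - alpha_bar_t).mean())

density = marginal_q(0.0)  # approximately 0.342 for this toy distribution
```

For real, high-dimensional data we have neither the density $q(x_0)$ nor enough samples to cover the space, which is exactly why the marginals stay out of reach.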
The Key Insight
Why Conditioning Helps
When we know $x_0$, we can use Bayes' rule with all Gaussian terms:

$$q(x_{t-1} \mid x_t, x_0) = \frac{q(x_t \mid x_{t-1}, x_0)\, q(x_{t-1} \mid x_0)}{q(x_t \mid x_0)}$$

All three terms on the right are Gaussian (from Chapter 2; note that $q(x_t \mid x_{t-1}, x_0) = q(x_t \mid x_{t-1})$ by the Markov property), so the result is also Gaussian. We can derive its exact mean and variance.
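This claim can be checked numerically. Only the two numerator factors of Bayes' rule depend on $x_{t-1}$ (the denominator is a constant in $x_{t-1}$), and the sum of their log-densities is exactly quadratic - i.e., a Gaussian. The schedule and conditioning values below are illustrative, hypothetical choices:

```python
import numpy as np

# Illustrative (hypothetical) schedule and conditioning values
beta_t, alpha_bar_prev = 0.1, 0.7
x_t, x_0 = 0.5, -1.0

grid = np.linspace(-5.0, 5.0, 2001)  # candidate values of x_{t-1}

# Unnormalized log q(x_{t-1} | x_t, x_0): sum of the two x_{t-1}-dependent
# Gaussian log-densities from the Bayes factorization.
log_p = (-(x_t - np.sqrt(1 - beta_t) * grid) ** 2 / (2 * beta_t)  # log q(x_t | x_{t-1})
         - (grid - np.sqrt(alpha_bar_prev) * x_0) ** 2
         / (2 * (1 - alpha_bar_prev)))                            # log q(x_{t-1} | x_0)

# A Gaussian log-density is quadratic, so a degree-2 polynomial fit is exact.
coeffs = np.polyfit(grid, log_p, 2)
assert np.allclose(np.polyval(coeffs, grid), log_p)
```

The fit leaves no residual, confirming that the conditional reverse is a single Gaussian whose mean and variance can be read off in closed form.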
Learning the Reverse Process
Since we don't know $x_0$ during generation, we train a neural network to predict what we need. The learnable reverse process is parameterized as:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$
What Does the Network Learn?
There are three equivalent ways to parameterize what the network predicts:
- Predict the mean: $\mu_\theta(x_t, t)$ directly
- Predict the noise: $\epsilon_\theta(x_t, t)$, then compute the mean from it
- Predict the clean data: $\hat{x}_0 = x_\theta(x_t, t)$, then compute the mean from it
DDPM Choice: The original paper showed that predicting the noise works remarkably well and leads to a simple training objective. This is equivalent to learning the score function.
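A small sketch of why the noise and clean-data parameterizations are interchangeable: the forward marginal $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ from Chapter 2 can be inverted for $x_0$ given $\epsilon$. The schedule values are illustrative, and a perfect noise prediction stands in for the network:

```python
import numpy as np

rng = np.random.default_rng(0)
beta_t, alpha_bar_t = 0.02, 0.6    # illustrative schedule values at some step t

x0 = rng.standard_normal(5)        # "clean data"
eps = rng.standard_normal(5)       # noise used in the forward marginal
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1 - alpha_bar_t) * eps

# Inverting the forward marginal turns a noise prediction into a clean-data
# prediction - the two carry exactly the same information:
x0_hat = (x_t - np.sqrt(1 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
assert np.allclose(x0_hat, x0)

# Reverse-step mean computed from the noise prediction, as in DDPM:
# mu = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
mu = (x_t - beta_t / np.sqrt(1 - alpha_bar_t) * eps) / np.sqrt(1 - beta_t)
```

Because the conversions are exact, the choice between the three targets is about optimization behavior, not expressiveness - and predicting $\epsilon$ is the one DDPM found to train best.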
The Generation Algorithm
Once trained, generation is straightforward:
- Sample $x_T \sim \mathcal{N}(0, I)$
- For $t = T, \dots, 1$:
  - Predict noise: $\epsilon_\theta(x_t, t)$
  - Compute mean: $\mu_\theta(x_t, t)$ from $\epsilon_\theta(x_t, t)$
  - Sample: $x_{t-1} = \mu_\theta(x_t, t) + \sigma_t z$, where $z \sim \mathcal{N}(0, I)$ (set $z = 0$ at $t = 1$)
- Return $x_0$
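The loop above can be sketched end to end in NumPy. Everything model-specific here is an illustrative assumption: a linear $\beta$ schedule, the choice $\sigma_t^2 = \beta_t$, and a closed-form stand-in for the trained network - for data that is itself $\mathcal{N}(0, I)$, the optimal noise prediction is known analytically, so the sampler runs without any training:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 100
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x_t, t):
    """Stand-in for the trained network eps_theta(x_t, t). For N(0, I) data,
    x_t stays N(0, I) at every step and the optimal prediction is available
    in closed form: E[eps | x_t] = sqrt(1 - alpha_bar_t) * x_t."""
    return np.sqrt(1.0 - alpha_bars[t]) * x_t

def sample(shape):
    x = rng.standard_normal(shape)   # x_T ~ N(0, I)
    for t in reversed(range(T)):     # t = T, ..., 1 (0-indexed here)
        eps = predict_noise(x, t)
        # mean of the reverse step, computed from the predicted noise
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)  # sigma_t^2 = beta_t
        else:
            x = mean                 # no noise added at the final step
    return x

samples = sample((5000,))            # should match the "data" distribution, N(0, I)
```

Swapping `predict_noise` for a trained network (and the toy schedule for a real one) gives the actual DDPM sampler; the loop structure is unchanged.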
Key Takeaways
- Generation reverses the forward process: Start from noise, iteratively denoise to get clean samples
- True reverse is intractable: $q(x_{t-1} \mid x_t)$ requires marginals over the unknown data distribution
- Conditional reverse is tractable: $q(x_{t-1} \mid x_t, x_0)$ is Gaussian with known mean and variance
- Learn to approximate: Train $p_\theta(x_{t-1} \mid x_t)$ to match the tractable posterior by predicting the noise $\epsilon_\theta(x_t, t)$
- Small steps are key: Gaussian reverse only holds when forward steps are small
Looking Ahead: In the next section, we'll derive the exact form of the tractable posterior $q(x_{t-1} \mid x_t, x_0)$, which serves as the training target.