Learning Objectives
By the end of this section, you will be able to:
- Explain why the noise schedule critically affects generation quality
- Derive the relationship between $\beta_t$ and $\bar{\alpha}_t$
- Compare linear and cosine schedules and understand their trade-offs
- Implement multiple noise schedules in PyTorch
Why the Schedule Matters
The noise schedule determines how quickly we destroy information during the forward process. This seemingly simple hyperparameter has profound effects on:
- Sample Quality: Too aggressive noise destroys fine details; too slow noise makes generation harder to learn
- Training Efficiency: The schedule affects which timesteps contribute most to the loss, impacting convergence
- Generation Speed: Different schedules enable different sampling strategies, some much faster than others
The Key Insight: The noise schedule controls the "information destruction curve." We want to destroy information gradually enough that each reverse step is learnable, but completely enough that we reach pure noise.
From Beta to Alpha Bar
Recall from Section 2.1 that the single-step transition uses the variance $\beta_t$, with $\alpha_t = 1 - \beta_t$. But what we really care about is the cumulative effect - how much signal remains after $t$ steps. This is captured by $\bar{\alpha}_t$:

$$\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s = \prod_{s=1}^{t} (1 - \beta_s)$$

The quantity $\bar{\alpha}_t$ tells us what fraction of the original signal variance remains at timestep $t$. When $\bar{\alpha}_t \approx 0$, almost all signal is lost.
The Linear Schedule
The original DDPM paper (Ho et al., 2020) used a linear schedule, interpolating $\beta_t$ evenly between two endpoints:

$$\beta_t = \beta_1 + \frac{t - 1}{T - 1}\,(\beta_T - \beta_1)$$

with typical values $\beta_1 = 10^{-4}$, $\beta_T = 0.02$, and $T = 1000$.
| Property | Value | Interpretation |
|---|---|---|
| β₁ (first step) | 0.0001 | Very small noise initially |
| β_T (last step) | 0.02 | Still relatively small per step |
| Total T | 1000 | Many small steps accumulate to destroy signal |
| α_bar_T | ~0.00004 | Almost no signal remains at end |
Problem with Linear Schedule
The linear schedule has a significant flaw: $\bar{\alpha}_t$ decays too quickly in early timesteps. Because $\bar{\alpha}_t$ is a product, even small $\beta$ values compound exponentially:

$$\bar{\alpha}_{500} = \prod_{s=1}^{500} (1 - \beta_s) \approx 0.08$$

This means by timestep 500, roughly 92% of the signal is already gone! The model must learn most of the high-frequency details in very few effective timesteps.
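A quick numerical check of this compounding (a sketch using the standard DDPM values above):

```python
import torch

# Linear beta schedule with the original DDPM hyperparameters
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

# alpha_bar[t-1] is the fraction of signal variance left at timestep t
print(f"signal left at t=500: {alpha_bar[499].item():.3f}")  # roughly 0.08, i.e. ~92% gone
print(f"signal left at t=T:   {alpha_bar[-1].item():.1e}")   # ~4e-5: essentially pure noise
```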
The Cosine Schedule
The Improved DDPM paper (Nichol & Dhariwal, 2021) proposed the cosine schedule, which directly specifies $\bar{\alpha}_t$ rather than $\beta_t$:

$$\bar{\alpha}_t = \frac{f(t)}{f(0)}, \qquad f(t) = \cos^2\!\left(\frac{t/T + s}{1 + s} \cdot \frac{\pi}{2}\right)$$

The small offset $s = 0.008$ prevents $\beta_t$ from being vanishingly small near $t = 0$.
Why Cosine Works Better
The cosine function creates a smooth S-curve for $\bar{\alpha}_t$:
- Slow start: $\bar{\alpha}_t$ stays close to 1 for early timesteps, preserving fine details longer
- Gradual middle: Smooth decay through middle timesteps
- Complete destruction: Still reaches near-zero signal by $t = T$
Deriving Beta from Alpha Bar
Once $\bar{\alpha}_t$ is specified, the per-step noise follows as

$$\beta_t = 1 - \frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}}$$

This works because $\bar{\alpha}_t = \bar{\alpha}_{t-1}\,\alpha_t = \bar{\alpha}_{t-1}(1 - \beta_t)$.
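This identity can be sanity-checked numerically: starting from any decreasing $\bar{\alpha}$ curve, recovering the betas and re-accumulating them should reproduce the curve. A minimal sketch (the example curve is arbitrary):

```python
import torch

# An arbitrary smooth, decreasing alpha_bar curve, with alpha_bar_0 = 1
T = 10
alpha_bar = torch.linspace(1.0, 0.1, T + 1) ** 2

# beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}
betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]

# Round trip: the cumulative product of (1 - beta_t) telescopes back
recovered = torch.cumprod(1.0 - betas, dim=0)
assert torch.allclose(recovered, alpha_bar[1:], atol=1e-5)
```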
Schedule Comparison
Comparing the schedules, notice how the cosine schedule preserves signal ($\bar{\alpha}_t$ stays closer to 1) for longer during early timesteps:
(Figure: per-schedule curves of $\bar{\alpha}_t$ signal retention, $\beta_t$ noise added per step, and SNR $= \bar{\alpha}_t/(1 - \bar{\alpha}_t)$ on a log scale.)
Values at t = 500
| Schedule | βt | α̅t | √α̅t | √(1-α̅t) | SNR |
|---|---|---|---|---|---|
| Linear (DDPM) | 0.010050 | 0.077992 | 0.279271 | 0.960212 | 0.0846 |
| Cosine (Improved DDPM) | 0.003146 | 0.493844 | 0.702740 | 0.711447 | 0.9757 |
Linear Schedule
Original DDPM. α̅t decays too quickly early on, potentially losing high-frequency details. Simple but not optimal.
Cosine Schedule
Improved DDPM. Slower decay preserves more signal structure. Better image quality, especially for high resolution.
Implementation
The schedules above translate directly into a few lines of PyTorch.
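A sketch of such an implementation, assuming the standard hyperparameters quoted earlier; the sigmoid variant is one common formulation among several, included for completeness:

```python
import math
import torch

def linear_beta_schedule(T: int, beta_1: float = 1e-4, beta_T: float = 0.02) -> torch.Tensor:
    """Linear schedule from the original DDPM paper (Ho et al., 2020)."""
    return torch.linspace(beta_1, beta_T, T)

def cosine_beta_schedule(T: int, s: float = 0.008) -> torch.Tensor:
    """Cosine schedule (Nichol & Dhariwal, 2021): specify alpha_bar directly,
    then recover betas via beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}."""
    t = torch.arange(T + 1, dtype=torch.float64)
    f = torch.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
    alpha_bar = f / f[0]                     # normalize so alpha_bar_0 = 1
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return betas.clamp(max=0.999).float()    # clip so beta never reaches 1

def sigmoid_beta_schedule(T: int, beta_1: float = 1e-4, beta_T: float = 0.02) -> torch.Tensor:
    """One common sigmoid variant: an S-shaped ramp between beta_1 and beta_T."""
    x = torch.linspace(-6.0, 6.0, T)
    return torch.sigmoid(x) * (beta_T - beta_1) + beta_1

def alpha_bar_curve(betas: torch.Tensor) -> torch.Tensor:
    """Signal-retention curve: alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    return torch.cumprod(1.0 - betas, dim=0)
```

For $T = 1000$, the linear schedule ends with $\bar{\alpha}_T$ on the order of $10^{-5}$, while the cosine schedule still retains roughly half the signal at $t = 500$, matching the comparison table above.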
Signal-to-Noise Ratio Perspective
A powerful way to understand noise schedules is through the Signal-to-Noise Ratio (SNR):

$$\mathrm{SNR}(t) = \frac{\bar{\alpha}_t}{1 - \bar{\alpha}_t}$$

The SNR measures the ratio of signal variance to noise variance at timestep $t$:
- SNR → ∞: Pure signal (t = 0)
- SNR = 1: Equal signal and noise
- SNR → 0: Pure noise (t = T)
Log-SNR
In practice, we often work with $\log \mathrm{SNR}(t)$ because the SNR spans many orders of magnitude:

$$\log \mathrm{SNR}(t) = \log \bar{\alpha}_t - \log(1 - \bar{\alpha}_t)$$

The log-SNR typically ranges from about +10 (mostly signal) to -10 (mostly noise). A good noise schedule has log-SNR decreasing approximately linearly with $t$.
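As a sketch, the log-SNR curve can be computed directly from any $\bar{\alpha}$ sequence; here the cosine schedule's $\bar{\alpha}_t$ is recomputed inline so the snippet stands alone:

```python
import math
import torch

T = 1000
t = torch.arange(T + 1, dtype=torch.float64)
f = torch.cos((t / T + 0.008) / 1.008 * math.pi / 2) ** 2
alpha_bar = (f / f[0])[1:]                 # alpha_bar_1 .. alpha_bar_T

# log SNR(t) = log(alpha_bar_t) - log(1 - alpha_bar_t)
log_snr = torch.log(alpha_bar) - torch.log(1.0 - alpha_bar)

print(f"log-SNR at t=1:   {log_snr[0].item():+.1f}")    # strongly positive: mostly signal
print(f"log-SNR at t=500: {log_snr[499].item():+.2f}")  # near zero: signal ~ noise
print(f"log-SNR at t=T:   {log_snr[-1].item():+.1f}")   # strongly negative: mostly noise
```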
Modern Perspective: Recent work (e.g., Karras et al., 2022) argues that diffusion models should be parameterized directly in terms of SNR or log-SNR, as this provides a more principled view of the denoising task.
Choosing a Schedule
Which schedule should you use? Here are practical guidelines:
| Schedule | Best For | Key Trade-off |
|---|---|---|
| Linear | Small images (32×32), quick experiments | Fast but may lose high-freq details |
| Cosine | High-resolution images, production models | Better quality but more complex |
| Sigmoid | Custom applications | Tunable middle transition |
| Learned | Maximum performance | Adds training complexity |
General Recommendations
- Start with cosine for most applications - it works well across image resolutions
- Verify endpoint: Ensure $\bar{\alpha}_T \approx 0$ so the endpoint $x_T$ is effectively standard Gaussian
- Check SNR distribution: Log-SNR should cover its range (roughly +10 down to -10) fairly uniformly for balanced training
- Consider your data: Images with fine details may benefit from slower early decay
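The endpoint and SNR checks in this list can be automated. A sketch of a minimal validator (the `1e-3` threshold is illustrative, not canonical):

```python
import torch

def check_schedule(betas: torch.Tensor) -> None:
    """Sanity-check a beta schedule against the recommendations above."""
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)

    # Endpoint: alpha_bar_T near zero, so x_T is approximately N(0, I)
    assert alpha_bar[-1] < 1e-3, f"alpha_bar_T = {alpha_bar[-1].item():.2e} is too large"

    # SNR coverage: log-SNR should cross from positive (signal) to negative (noise)
    log_snr = torch.log(alpha_bar / (1.0 - alpha_bar))
    assert log_snr[0] > 0 and log_snr[-1] < 0, "log-SNR never crosses zero"

# The standard linear DDPM schedule passes both checks
check_schedule(torch.linspace(1e-4, 0.02, 1000))
```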
Key Takeaways
- Schedule is critical: The choice of $\{\beta_t\}$ directly affects sample quality
- Alpha bar matters most: $\bar{\alpha}_t$ determines how much signal remains
- Linear is simple but flawed: Signal decays too quickly in early steps
- Cosine preserves signal: Designed so $\bar{\alpha}_t$ decays more gradually
- SNR perspective: $\mathrm{SNR}(t) = \bar{\alpha}_t/(1 - \bar{\alpha}_t)$ provides an intuitive interpretation
Looking Ahead: In the next section, we'll derive the closed-form expression for sampling $x_t$ at any timestep directly from $x_0$, which is the key to efficient training.