Learning Objectives
By the end of this section, you will be able to:
- Define generative modeling as the problem of learning to produce new samples from a data distribution
- Distinguish generative from discriminative models and understand when each is appropriate
- Identify the three core challenges of generative modeling: density estimation, sampling, and evaluation
- Appreciate the curse of dimensionality and why high-dimensional generation is so challenging
- Formulate the generative modeling problem mathematically in terms of learning probability distributions
The Big Picture: Learning Distributions
Consider this: you've seen thousands of faces in your life. If someone asked you to imagine a new face - one that doesn't belong to anyone real - you could do it effortlessly. Your brain has somehow learned the "distribution of faces" and can sample from it at will.
The Central Question: Can we build machines that learn to generate realistic new examples of any data type - images, audio, text, proteins, molecules - by learning from existing examples?
This is the generative modeling problem. It's one of the most fundamental challenges in machine learning, with profound implications for creativity, science, and our understanding of intelligence itself.
The modern breakthroughs you've seen - DALL-E creating images from text, GPT writing coherent stories, AlphaFold predicting protein structures - all stem from advances in generative modeling. Diffusion models represent the latest major paradigm, offering unprecedented quality and flexibility.
What Is Generative Modeling?
A generative model learns the underlying probability distribution $p_{\text{data}}(x)$ from a dataset of examples $\{x^{(1)}, \dots, x^{(N)}\}$. Once learned, we can:
- Sample: Generate new examples that look like they came from the original distribution
- Evaluate: Compute the likelihood of any given example
- Compress: Represent data efficiently using the learned structure
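As a toy illustration of the first two operations (not any particular library's API), consider a one-dimensional Gaussian fit by maximum likelihood, which supports both sampling and likelihood evaluation in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=10_000)  # stand-in for "real" data

# Fit a Gaussian model by maximum likelihood (closed form: sample mean / std)
mu, sigma = data.mean(), data.std()

# Sample: draw new examples from the learned model
new_samples = rng.normal(mu, sigma, size=5)

# Evaluate: log-likelihood of any point under the model
def log_prob(x):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

print(mu, sigma)            # close to the true parameters (2.0, 0.5)
print(log_prob(mu))         # high: near the mode
print(log_prob(mu + 5.0))   # much lower: far from the data
```

Real generative models replace the Gaussian with a far more expressive family, but the interface - fit, sample, evaluate - is the same.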
Generative vs. Discriminative Models
This is fundamentally different from discriminative modeling, which learns conditional distributions like $P(y \mid x)$ for classification:
| Aspect | Discriminative | Generative |
|---|---|---|
| Goal | Predict labels given input | Model the full data distribution |
| Learns | P(y|x) - decision boundary | P(x) or P(x|y) - data structure |
| Training | Requires labeled data | Can use unlabeled data |
| Output | Classification/regression | New data samples |
| Example | Is this a cat? | Generate a new cat image |
Key Insight: Generative models are harder because they must understand the entire data structure, not just the features relevant for classification. A classifier might only need to detect "has whiskers" to identify cats, but a generator must understand fur texture, eye shape, pose, lighting, and countless other details.
Density Estimation
The first challenge in generative modeling is density estimation: learning a model $p_\theta(x)$ that approximates the true data distribution $p_{\text{data}}(x)$.
The Maximum Likelihood Approach
The standard approach is to maximize the likelihood of the observed data:

$$\theta^* = \arg\max_\theta \sum_{i=1}^{N} \log p_\theta\big(x^{(i)}\big)$$

Or equivalently, minimize the negative log-likelihood:

$$\mathcal{L}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta\big(x^{(i)}\big)$$
This is precisely the cross-entropy between the empirical data distribution and our model! (Recall from Chapter 0's information theory section.)
Why Is This Hard?
The challenge is that for complex data (images, audio, text), the true distribution lives in an incredibly high-dimensional space:
- A 256x256 RGB image has $256 \times 256 \times 3 = 196{,}608$ dimensions
- A 5-second audio clip at 44.1kHz has over 220,000 dimensions
- The space of possible configurations is astronomically large: $256^{196{,}608}$ for 8-bit images
The Curse of Dimensionality: In high dimensions, data points become increasingly sparse. If we tried to estimate density with a histogram using just 10 bins per dimension, a 100-dimensional problem would require $10^{100}$ bins - more than the number of atoms in the observable universe (roughly $10^{80}$)!
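The exponential blow-up in bin count is easy to verify directly:

```python
# Bins needed by a histogram density estimator with 10 bins per dimension
bins_per_dim = 10
atoms_in_universe = 10 ** 80   # rough standard estimate

for d in (1, 3, 10, 100):
    print(d, "dims ->", bins_per_dim ** d, "bins")

# At 100 dimensions the histogram already needs 10**100 bins
```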
The Sampling Problem
Even if we had the perfect density function, how would we generate samples from it? This is the sampling problem.
Why Sampling Is Hard
For simple distributions like Gaussians, sampling is easy. But for complex, multi-modal distributions over high-dimensional spaces:
- Rejection sampling has exponentially low acceptance rates in high dimensions
- MCMC methods (like Metropolis-Hastings) mix slowly and may get stuck in modes
- Inverse CDF requires computing intractable integrals
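To see why these classical methods at least work in low dimensions, here is inverse-CDF sampling for a 1-D exponential distribution, where the inverse $F^{-1}(u) = -\ln(1-u)/\lambda$ is available in closed form; no such closed form exists for, say, the distribution of natural images:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                          # rate of an Exponential(lam) distribution
u = rng.uniform(size=100_000)      # uniform noise is trivial to sample

# Inverse CDF (inverse-transform sampling): F^{-1}(u) = -ln(1 - u) / lam
samples = -np.log(1.0 - u) / lam

print(samples.mean())              # ≈ 1/lam = 0.5
```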
Different Approaches to Sampling
Different generative model families tackle sampling in different ways:
| Model Family | Sampling Approach | Trade-off |
|---|---|---|
| VAEs | Decode random latent codes | Fast, but blurry outputs |
| GANs | Transform noise through generator | High quality, but mode collapse |
| Flows | Invertible transformation of noise | Exact likelihood, limited architecture |
| Autoregressive | Sample one element at a time | Exact likelihood, very slow |
| Diffusion | Iterative denoising from noise | High quality, slow (but parallelizable) |
The Diffusion Insight: Diffusion models solve sampling by learning to reverse a gradual noising process. Starting from pure noise (easy to sample!), they iteratively denoise until reaching a clean sample. This turns the hard problem of sampling from a complex distribution into many easy steps.
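A toy version of this "many easy steps" idea, with an assumed known score function rather than a learned network: Langevin dynamics starts from pure noise and repeatedly nudges samples toward higher density. Here the target is $\mathcal{N}(2, 0.25)$, whose score $\nabla_x \log p(x) = -(x-2)/0.25$ we can write down exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, var = 2.0, 0.25

def score(x):
    # Exact score of the toy target N(mu, var); real models must learn this
    return -(x - mu) / var

x = rng.normal(size=5_000)         # start from pure noise: N(0, 1)
step = 0.01
for _ in range(2_000):             # many small, easy denoising-style steps
    x += step * score(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)

print(x.mean(), x.var())           # ≈ (2.0, 0.25) up to discretization error
```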
The Evaluation Challenge
How do we know if a generative model is good? This is perhaps the trickiest challenge of all. Unlike classification (where accuracy is clear), generative model evaluation is multi-faceted:
Evaluation Criteria
- Quality: Do generated samples look realistic? (Measured by FID, IS for images)
- Diversity: Does the model cover all modes of the data distribution? (Mode coverage metrics)
- Novelty: Is the model creating new examples or memorizing training data?
- Likelihood: How well does the model explain held-out data? (NLL in bits-per-dimension)
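The bits-per-dimension figure in the last item is just the per-dimension negative log-likelihood converted from nats to bits; the value below is a hypothetical placeholder, not any real model's score:

```python
import numpy as np

nll_nats_per_dim = 2.0                     # hypothetical model NLL, nats/dim
bits_per_dim = nll_nats_per_dim / np.log(2)  # nats -> bits: divide by ln 2
print(bits_per_dim)                        # ≈ 2.885
```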
The Quality-Diversity Trade-off
There's an inherent tension between quality and diversity:
- A model that only generates the single most likely image would have perfect "quality" but zero diversity
- A model that generates every possible image uniformly would have perfect diversity but mostly garbage outputs
- Real generative models must balance these extremes
Mode Collapse: A common failure mode where the model learns to generate only a subset of the data distribution. GANs are notorious for this - the generator might produce perfect dogs but completely ignore cats. Diffusion models are more robust because they learn the full distribution through denoising.
Real-World Applications
Generative models have transformed numerous fields:
Computer Vision
- Image synthesis: Creating photorealistic images from text, sketches, or other images (DALL-E, Midjourney, Stable Diffusion)
- Image editing: Inpainting, super-resolution, style transfer
- Video generation: Generating temporally coherent video sequences
Audio and Speech
- Text-to-speech: Natural voice synthesis
- Music generation: Creating new compositions in various styles
- Voice conversion: Transforming one speaker's voice to another
Science and Medicine
- Drug discovery: Generating novel molecular structures
- Protein design: Creating proteins with desired properties
- Medical imaging: Synthetic data augmentation for rare conditions
Robotics and Simulation
- World models: Predicting future states for planning
- Synthetic data: Generating training data for perception
- Imitation learning: Generating expert trajectories
Mathematical Formulation
Let's formalize the generative modeling problem mathematically:
The Setup
- Data: We observe samples $x^{(1)}, \dots, x^{(N)}$ drawn i.i.d. from an unknown distribution $p_{\text{data}}(x)$
- Model: We define a parametric family of distributions $p_\theta(x)$, $\theta \in \Theta$
- Goal: Find $\theta$ such that $p_\theta(x) \approx p_{\text{data}}(x)$
The Objective
We minimize a divergence $D$ between the model and data distributions:

$$\theta^* = \arg\min_\theta \, D\big(p_{\text{data}} \,\|\, p_\theta\big)$$

Different choices of $D$ lead to different methods:
| Divergence | Method | Properties |
|---|---|---|
| KL(p_data || p_theta) | Maximum Likelihood | Requires tractable p_theta |
| KL(p_theta || p_data) | Variational Inference | Mode-seeking |
| Jensen-Shannon | GANs | Adversarial training |
| Wasserstein | Optimal Transport | Geometric, stable gradients |
| Score Matching | Diffusion/Score Models | Only needs score function |
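The asymmetry between the two KL rows can be checked numerically on a tiny discrete example: forward KL heavily penalizes a model that misses a data mode, while reverse KL (mode-seeking) tolerates it:

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions with full support."""
    return float(np.sum(p * np.log(p / q)))

p_data = np.array([0.49, 0.02, 0.49])   # bimodal "data" distribution
p_model = np.array([0.90, 0.09, 0.01])  # model covering only one mode

forward = kl(p_data, p_model)   # large: the missed mode is punished hard
reverse = kl(p_model, p_data)   # smaller: mode-seeking tolerates the miss
print(forward, reverse)
```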
The Diffusion Formulation
Diffusion models take a unique approach: instead of directly modeling $p_{\text{data}}(x)$, they learn to reverse a gradual noising process:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\big)$$

The model learns the reverse:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big)$$
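The forward (noising) process is easy to simulate; the sketch below, with an assumed constant schedule $\beta_t = 0.01$, shows samples from $\mathcal{N}(2, 0.25)$ being driven toward a standard Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)
T, beta = 1_000, 0.01                  # assumed constant noise schedule

x = rng.normal(2.0, 0.5, size=10_000)  # "data": samples from N(2, 0.25)
for _ in range(T):
    # q(x_t | x_{t-1}) = N(sqrt(1 - beta) * x_{t-1}, beta * I)
    x = np.sqrt(1 - beta) * x + np.sqrt(beta) * rng.normal(size=x.shape)

print(x.mean(), x.var())               # ≈ (0, 1): pure noise
```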
This seemingly indirect approach turns out to be remarkably effective! We'll explore why in the coming sections.
Summary
Generative modeling is the problem of learning to sample from data distributions. The key challenges are:
- Density Estimation: Learning a model that approximates the true data distribution, despite the curse of dimensionality
- Sampling: Efficiently generating samples from the learned distribution, even when it's complex and multimodal
- Evaluation: Assessing quality, diversity, and novelty of generated samples without ground truth
Different generative model families (VAEs, GANs, Flows, Autoregressive, Diffusion) make different trade-offs in addressing these challenges.
Looking Ahead: In the next section, we'll survey the landscape of generative models, understanding the strengths and weaknesses of each major family. This will set the stage for understanding why diffusion models have emerged as a leading paradigm.