Mental Model
The CDF of a random variable X, denoted F(x), gives the probability that X takes a value less than or equal to x. The key intuition: it accumulates probability from the left up to the point x.
"How much probability mass lies to the left of (and including) x?"
What the CDF Tells Us:
⚡ Why CDF is More Fundamental than PDF:
The CDF is the universal language of probability distributions—it works where PMF and PDF cannot.
Learning Objectives
By the end of this section, you will:
1. Conceptual Foundation
- Define the CDF as cumulative probability: F(x) = P(X ≤ x)
2. Construction
- Build CDFs from PMF (discrete → step function): F(x) = Σ_{k ≤ x} p(k)
- Build CDFs from PDF (continuous → integral): F(x) = ∫_{-∞}^{x} f(t) dt
3. Structural Properties (Why CDFs Are Powerful)
- Understand limits at ±∞, monotonicity, right-continuity, and jump sizes = point masses
4. Operational Use
- Compute interval probabilities: P(a < X ≤ b) = F(b) - F(a)
5. Inverse Thinking
- Master the quantile function (inverse CDF), percentiles, and threshold selection
6. Data-Driven View
- Build empirical CDFs from real data and understand convergence (Glivenko-Cantelli)
7. Reliability & Time-to-Event
- Apply survival function and hazard rate intuition
8. AI/ML Bridge
- Apply to sampling, calibration, and uncertainty modeling in modern ML systems
⚠️ Common Misconceptions
The CDF is cumulative probability, not probability density. CDF values are probabilities (0 to 1), while PDF values can exceed 1.
Flat regions mean no additional probability in that interval—not zero probability overall. The CDF is constant where no probability mass exists.
Jumps indicate discrete probability mass at a point. The jump size equals P(X = a) = F(a) - F(a⁻). This is normal for discrete and mixed distributions.
Where You'll Apply This Knowledge:
🔁 Sampling & Generative Models
→ Convert uniform noise into samples from any distribution using the inverse CDF
→ Learn invertible mappings whose Jacobian relates densities—CDF intuition underlies monotonic transforms
📊 Statistical Decision Making
→ CDF inversion directly defines medians, quartiles, and tail risk
→ Intervals are defined by CDF probability mass, not point estimates
🧪 Experiments & Testing
→ p-values are tail probabilities computed from a test statistic's CDF
→ ROC curves and decision thresholds depend on cumulative distributions of scores
⏱ Reliability & Time-to-Event Modeling
→ Survival function is directly derived: S(t) = 1 - F(t) = P(T > t)
→ Hazard rate h(t) = f(t)/S(t) measures instantaneous failure probability given survival—the CDF encodes the history
🤖 Probabilistic ML & Representation Learning
→ Calibration checks whether predicted probabilities match empirical CDFs
→ Sampling from latent distributions relies on transforming noise via learned CDF-like mappings
Unifying perspective: CDFs are the bridge between probability theory, statistical inference, and modern generative modeling—turning uncertainty into geometry.
Historical Context
The Quest for a Universal Tool
Throughout the 18th and 19th centuries, mathematicians faced a recurring challenge: every time they wanted to compute P(X ≤ x), they had to sum (for discrete) or integrate (for continuous) from the beginning.
The Core Need:
- Abraham de Moivre (1718): First tabulated cumulative normal probabilities
- Pierre-Simon Laplace (1812): Formalized integration for continuous cases
- Andrey Kolmogorov (1933): Made CDF the fundamental object in probability theory
Kolmogorov's Insight: Instead of defining probability through PMF (discrete) or PDF (continuous) separately, define it through a single object—the CDF—that works for all random variables: discrete, continuous, and even mixed!
The Problem CDF Solves
Consider how often we ask questions like: "What's the probability of getting at most this value?"
| Question | Mathematical Form |
|---|---|
| Probability of waiting ≤ 5 minutes? | P(T ≤ 5) |
| Chance of scoring ≤ 80 on the test? | P(X ≤ 80) |
| Likelihood of temperature ≤ 30°C? | P(T ≤ 30) |
| Probability of ≤ 3 defects? | P(N ≤ 3) |
Without CDF, we'd compute each answer by summing or integrating from scratch:
❌ Without CDF
For discrete: P(X ≤ x) = Σ_{k ≤ x} p(k)
For continuous: P(X ≤ x) = ∫_{-∞}^{x} f(t) dt
Recompute the entire sum/integral each time!
✓ With CDF
For any distribution: P(X ≤ x) = F(x)
Just look up the value! Pre-computed cumulative probability.
Key Insight: The CDF is like a "running total" of probability. It tells you how much probability has accumulated up to any point x—no recalculation needed!
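The running-total idea takes only a few lines to verify; a minimal NumPy sketch (the fair-die example is an illustrative assumption): compute the CDF once as a cumulative sum of the PMF, then answer every "at most" question by lookup.

```python
import numpy as np

# PMF of a fair six-sided die: p(k) = 1/6 for k = 1..6
faces = np.arange(1, 7)
pmf = np.full(6, 1 / 6)

# The CDF is the running total (cumulative sum) of the PMF
cdf = np.cumsum(pmf)

# P(X <= 3) is now a lookup, not a fresh summation
print(f"F(3) = {cdf[2]:.4f}")  # 0.5000

# The running total ends at 1: all probability accounted for
print(f"F(6) = {cdf[-1]:.4f}")  # 1.0000
```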
Interactive: CDF Visualizer
Explore how the CDF accumulates probability as x increases. Toggle between discrete and continuous distributions to see step functions vs smooth curves.
Formal Definition
Definition: Cumulative Distribution Function (CDF)
The cumulative distribution function of a random variable X is defined as:
F(x) = P(X ≤ x), for all x ∈ ℝ
In words: F(x) = probability that X is at most x.
Symbol Reference
| Symbol | Name | Intuitive Meaning |
|---|---|---|
| F(x) | CDF at x | Probability that X takes a value ≤ x |
| P(X ≤ x) | Cumulative probability | All probability 'to the left of x' |
| F(b) - F(a) | Interval probability | P(a < X ≤ b) — probability between a and b |
| F⁻¹(p) | Quantile function | The x value where F(x) = p (inverse CDF) |
| S(x) = 1 - F(x) | Survival function | Probability of exceeding x |
Intuitive Statement
What the CDF tells us: "If I pick a random value of X, what's the probability it's at most x?"
Think of F(x) as a probability meter that starts at 0 and gradually fills up to 1 as you move from left to right along the number line.
Four Essential Properties
Every valid CDF must satisfy exactly four properties. These aren't arbitrary—each reflects a fundamental truth about probability!
1. Limits at Infinity: lim_{x→-∞} F(x) = 0 and lim_{x→+∞} F(x) = 1
Why: No probability below negative infinity (0%); all probability is accounted for by positive infinity (100%).
2. Monotonically Non-Decreasing: x₁ ≤ x₂ ⇒ F(x₁) ≤ F(x₂)
Why: As x increases, we can only accumulate more probability, never less. The running total never decreases!
3. Right-Continuous: lim_{t→x⁺} F(t) = F(x)
Why: The definition includes the point x itself. Approaching from the right gives the same value.
4. Jump Size = Point Probability: P(X = x) = F(x) - F(x⁻)
Why: For discrete RVs, jumps occur at each possible value. For continuous RVs, there are no jumps (P(X = x) = 0).
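The jump-size property is easy to check numerically. A small sketch with scipy.stats (the choice of Binomial(10, 0.3) is an illustrative assumption): the jump in the CDF at an integer k equals the PMF there.

```python
from scipy import stats

# A discrete distribution whose CDF is a staircase
X = stats.binom(n=10, p=0.3)

k = 3
# For integer support, F(k^-) = F(k - 1), so the jump is F(k) - F(k - 1)
jump = X.cdf(k) - X.cdf(k - 1)

print(f"Jump in CDF at k={k}: {jump:.4f}")
print(f"PMF at k={k}:         {X.pmf(k):.4f}")  # same value
```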
Interactive: Properties Explorer
Explore each property interactively. See what happens when properties are violated—it's no longer a valid CDF!
Discrete vs Continuous CDFs
The CDF looks fundamentally different depending on whether the random variable is discrete or continuous.
Discrete CDF: Step Function
- Shape: Staircase pattern
- Jumps: At each possible value k
- Jump height: Equals P(X = k)
- Flat regions: Between possible values
Continuous CDF: Smooth Curve
- Shape: Smooth, continuous curve
- Jumps: None (no discontinuities)
- Slope: Equals the PDF at each point
- Inflection: Where PDF peaks, CDF steepest
| Aspect | Discrete CDF | Continuous CDF |
|---|---|---|
| Visual shape | Staircase (step function) | Smooth S-curve |
| Jumps/discontinuities | Yes, at each possible value | No jumps (continuous) |
| P(X = x) | = F(x) - F(x⁻) > 0 | = 0 always |
| Computed from | Sum of PMF values | Integral of PDF |
| Derivative exists? | No (at jumps) | Yes, F'(x) = f(x) |
Interactive: CDF from PMF/PDF
Watch the CDF being constructed from the PMF (discrete) or PDF (continuous). See how summation creates steps and integration creates smooth curves.
Interactive: PDF Area = CDF Difference
This visualization shows the fundamental relationship: the area under the PDF between two points equals the difference in CDF values at those points.
CDF-PMF-PDF Relationships
The CDF is intimately connected to PMF and PDF through summation/integration and their inverses.
For Discrete Random Variables
CDF from PMF: F(x) = Σ_{k ≤ x} p(k)
Sum up all PMF values at or below x
PMF from CDF: p(k) = F(k) - F(k⁻)
PMF at k equals the jump size in CDF at k
For Continuous Random Variables
CDF from PDF: F(x) = ∫_{-∞}^{x} f(t) dt
Cumulative area under the PDF curve up to x
PDF from CDF: f(x) = F'(x)
PDF is the derivative (slope) of the CDF
Fundamental Relationship: Integration and differentiation are inverse operations. The CDF is the integral of the PDF, and the PDF is the derivative of the CDF.
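This inverse relationship can be sanity-checked numerically: a central-difference slope of the normal CDF should reproduce the PDF. A sketch using scipy.stats (the evaluation point x = 0.7 is arbitrary):

```python
from scipy import stats

norm = stats.norm()
x, h = 0.7, 1e-5

# Numerical derivative of the CDF: slope via central difference
slope = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)

print(f"Numerical F'(x) ≈ {slope:.6f}")
print(f"Exact f(x)      = {norm.pdf(x):.6f}")  # the two should agree
```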
Computing Probabilities with CDF
The CDF makes computing interval probabilities trivial. Here are the key formulas:
At Most x (Directly from CDF): P(X ≤ x) = F(x)
Greater Than x (Complement): P(X > x) = 1 - F(x)
In an Interval (Subtraction): P(a < X ≤ b) = F(b) - F(a)
Exactly x (For Discrete RVs): P(X = x) = F(x) - F(x⁻)
Interactive: Probability Calculator
Use the CDF to compute interval probabilities. Drag the bounds a and b to see how.
Quantile Function (Inverse CDF)
The quantile function (also called the inverse CDF or percent-point function) answers the reverse question: "What value x has cumulative probability p?"
Definition: Quantile Function
Q(p) = F⁻¹(p) = min{x : F(x) ≥ p}
In words: Q(p) = the smallest x such that F(x) ≥ p.
Key Percentiles
| Percentile | p | Meaning |
|---|---|---|
| 25th (Q1) | 0.25 | First quartile—25% of values below |
| 50th (Median) | 0.50 | Middle value—50% below, 50% above |
| 75th (Q3) | 0.75 | Third quartile—75% of values below |
| 95th | 0.95 | Only 5% of values exceed this |
| 99th | 0.99 | Extreme upper tail—only 1% exceed |
Applications of Quantile Function
Confidence Intervals
95% CI: [Q(0.025), Q(0.975)]
IQR (Interquartile Range)
IQR = Q(0.75) - Q(0.25)
Random Sampling
X = Q(U), U ~ Uniform(0,1)
Box Plots
Built from Q1, median, Q3
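All four applications above reduce to calls to the quantile function, which scipy exposes as ppf (percent-point function). A brief sketch for the standard normal:

```python
import numpy as np
from scipy import stats

norm = stats.norm(loc=0, scale=1)

# 95% confidence interval from the 2.5th and 97.5th percentiles
ci = (norm.ppf(0.025), norm.ppf(0.975))
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")  # [-1.96, 1.96]

# Interquartile range
iqr = norm.ppf(0.75) - norm.ppf(0.25)
print(f"IQR: {iqr:.3f}")

# Random sampling via X = Q(U), U ~ Uniform(0, 1)
rng = np.random.default_rng(0)
samples = norm.ppf(rng.uniform(size=5))
```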
Interactive: CDF vs Quantile
Explore the relationship between CDF and quantile function. They are inverses of each other: F(Q(p)) = p and Q(F(x)) = x.
Empirical CDF
The Empirical CDF (ECDF) estimates the true CDF from observed data. It's a step function that jumps by 1/n at each data point.
Definition: Empirical CDF
F̂ₙ(x) = (# of observations ≤ x) / n = (1/n) Σᵢ 1{Xᵢ ≤ x}
In words: Proportion of observed values that are ≤ x.
Key Properties of ECDF
Glivenko-Cantelli Theorem
As n → ∞, the ECDF converges uniformly to the true CDF: sup|F̂ₙ(x) - F(x)| → 0 almost surely.
Kolmogorov-Smirnov Test
Tests if data comes from a specific distribution by measuring max difference between ECDF and theoretical CDF.
Interactive: Empirical CDF Builder
Generate random samples and watch the ECDF being built step by step. See how it converges to the true CDF as sample size increases.
Survival Function
The Survival Function (also called the Reliability Function) is simply the complement of the CDF. It's widely used in reliability engineering, medical statistics, and machine learning for time-to-event modeling.
Definition: Survival Function
S(x) = P(X > x) = 1 - F(x)
In words: Probability of "surviving" (exceeding) value x.
Related: Hazard Rate
Hazard Rate (Instantaneous Failure Rate)
h(x) = f(x) / S(x)
The "risk" of failing at time x, given survival up to x.
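As a concrete check (a sketch; λ = 0.5 is chosen arbitrarily): for the exponential distribution the hazard rate f(x)/S(x) works out to the constant λ, which reflects its memoryless property.

```python
from scipy import stats

lam = 0.5
exp_dist = stats.expon(scale=1 / lam)  # scipy parameterizes by scale = 1/lambda

# h(x) = f(x) / S(x): constant for the exponential distribution
for x in (0.5, 2.0, 5.0):
    h = exp_dist.pdf(x) / exp_dist.sf(x)
    print(f"h({x}) = {h:.4f}")  # 0.5000 at every x
```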
Interactive: Survival & Hazard
Explore the relationship between CDF, survival function, and hazard rate. See how different distributions have different hazard behaviors.
Interactive: CDF Comparison Tool
Overlay multiple distributions' CDFs to compare their shapes. Understand how different distributions accumulate probability differently.
Common CDFs Gallery
Explore the CDFs of common distributions. Notice how each has a characteristic shape that reflects the underlying probability structure.
CDF Formulas Reference
A comprehensive reference table of CDF formulas, inverse CDFs, and key properties for common distributions.
Worked Examples
Example 1: Computing CDF from PMF
Problem: For a fair die roll X, compute F(3).
PMF: p(k) = 1/6 for k = 1, 2, 3, 4, 5, 6
F(3) = P(X ≤ 3) = p(1) + p(2) + p(3)
F(3) = 1/6 + 1/6 + 1/6 = 3/6 = 0.5
Interpretation: There's a 50% chance of rolling 3 or less.
Example 2: Interval Probability from CDF
Problem: For X ~ Normal(100, 15²), given F(85) = 0.1587 and F(115) = 0.8413, find P(85 < X ≤ 115).
P(85 < X ≤ 115) = F(115) - F(85)
P(85 < X ≤ 115) = 0.8413 - 0.1587 = 0.6826
Interpretation: About 68% of values fall within one standard deviation of the mean.
Example 3: Finding Percentiles
Problem: For X ~ Exponential(λ = 0.5), find the median (50th percentile).
CDF: F(x) = 1 - e^(-0.5x)
Set F(x) = 0.5: 1 - e^(-0.5x) = 0.5
e^(-0.5x) = 0.5
-0.5x = ln(0.5) = -0.693
x = 0.693 / 0.5 = 1.386
Interpretation: Half of the values are below 1.386, half are above.
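All three worked examples can be reproduced with scipy.stats; a quick sketch (note that scipy's exponential uses scale = 1/λ, and ppf is the inverse CDF):

```python
import numpy as np
from scipy import stats

# Example 1: fair die, F(3) = 0.5
die_cdf_3 = sum(1 / 6 for k in (1, 2, 3))
print(f"F(3) = {die_cdf_3:.1f}")  # 0.5

# Example 2: Normal(100, 15^2), P(85 < X <= 115)
X = stats.norm(loc=100, scale=15)
print(f"P(85 < X <= 115) = {X.cdf(115) - X.cdf(85):.4f}")  # 0.6827

# Example 3: Exponential(lambda = 0.5) median via the inverse CDF
T = stats.expon(scale=1 / 0.5)
median = T.ppf(0.5)
print(f"Median = {median:.3f}")  # 1.386 = ln(2) / 0.5
```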
Real-World Examples
📏 Height Distribution
Question: What % of people are shorter than 180 cm?
Answer: F(180) for Normal(170, 10²) ≈ 0.84 = 84%
ML Use: Percentile normalization, anomaly detection
⏱️ Waiting Time
Question: Probability of waiting ≤ 5 minutes?
Answer: F(5) for Exp(λ=0.2) = 1 - e⁻¹ ≈ 0.63
ML Use: SLA monitoring, queue prediction
🎯 Quality Control
Question: Probability of ≤ 2 defects per batch?
Answer: F(2) for Poisson(λ=1.5) ≈ 0.81
ML Use: Process monitoring, threshold setting
📊 Stock Returns
Question: Probability of losing more than 10%?
Answer: F(-10%) for Normal(μ, σ)—the left-tail probability of a return below -10%
ML Use: Value at Risk (VaR), risk assessment
AI/ML Applications
The CDF and its inverse (quantile function) are fundamental tools in machine learning. Here are the key applications:
1. Inverse Transform Sampling
Core Idea: Generate samples from ANY distribution using only uniform random numbers
Why it works: If X has a continuous CDF F, then F(X) ~ Uniform(0, 1)—the probability integral transform. The inverse transform reverses this relationship!
Used in: Monte Carlo simulation, rejection sampling, importance sampling
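The "why it works" claim is easy to test empirically; a sketch assuming only numpy and scipy: push normal samples through their own CDF and check that the result looks uniform.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Draw X ~ N(0, 1), then push it through its own CDF
x = rng.normal(size=100_000)
u = stats.norm.cdf(x)  # should be approximately Uniform(0, 1)

print(f"mean (expect ~0.50):         {u.mean():.3f}")
print(f"P(U <= 0.25) (expect ~0.25): {(u <= 0.25).mean():.3f}")
```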
2. Probability Calibration
Problem: ML model outputs aren't true probabilities
Solution: Use CDFs to transform model scores to calibrated probabilities. Isotonic regression and Platt scaling both use CDF-like transformations.
3. VAE Reparameterization Trick
The Problem: Can't backpropagate through random sampling
This is a special case of inverse transform sampling for the normal distribution! The randomness is separated from the parameters.
4. Normalizing Flows
Core Concept: Chain of invertible transformations
Normalizing flows use CDF transformations (and their Jacobians) to transform simple distributions into complex ones while maintaining tractable likelihoods.
5. Quantile Regression
Beyond Mean Prediction: Predict conditional quantiles
Instead of predicting E[Y|X], predict Q(p|X) for various p. This gives full uncertainty characterization, not just point estimates.
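The objective that makes this work is the pinball (quantile) loss, which is minimized when the prediction equals the p-th quantile. A self-contained sketch (the helper name pinball_loss is mine, not from the text):

```python
import numpy as np

def pinball_loss(y_true, y_pred, p):
    """Pinball loss: asymmetric penalty minimized at the p-th quantile."""
    diff = y_true - y_pred
    return np.mean(np.maximum(p * diff, (p - 1) * diff))

rng = np.random.default_rng(0)
y = rng.normal(size=100_000)

# For p = 0.9, the loss is lower at the true 90th percentile than at the median
loss_at_q90 = pinball_loss(y, np.quantile(y, 0.9), p=0.9)
loss_at_med = pinball_loss(y, np.quantile(y, 0.5), p=0.9)
print(loss_at_q90 < loss_at_med)  # True
```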
Interactive: Inverse Transform Sampling
Watch inverse transform sampling in action. Generate uniform random numbers, trace them horizontally to the CDF curve, then drop vertically to get samples from the target distribution.
Numerical Methods
Many CDFs don't have closed-form expressions. Here's how they're computed in practice:
Normal CDF: No Closed Form!
The integral Φ(z) = (1/√(2π)) ∫_{-∞}^{z} e^{-t²/2} dt cannot be expressed in terms of elementary functions. We use:
- Taylor series expansions
- Continued fraction approximations
- Rational polynomial approximations (Hart's method)
- Lookup tables with interpolation
Historical: Z-Tables
Before computers, statisticians used printed tables of Φ(z) values. These tables were painstakingly computed by hand and are still found in statistics textbooks.
Modern Implementations
Libraries like scipy.stats use highly optimized numerical algorithms that achieve 15+ digits of precision in microseconds.
Python Implementation
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# ============================================
# EXAMPLE 1: CDF Evaluation
# ============================================

# Standard normal CDF
normal = stats.norm(loc=0, scale=1)

# P(X ≤ 0) = 0.5 (symmetric around 0)
print(f"P(X ≤ 0) = {normal.cdf(0):.4f}")  # 0.5000

# P(X ≤ 1.96) ≈ 0.975 (the famous 95% CI bound)
print(f"P(X ≤ 1.96) = {normal.cdf(1.96):.4f}")  # 0.9750

# P(X ≤ -1) ≈ 0.159
print(f"P(X ≤ -1) = {normal.cdf(-1):.4f}")  # 0.1587

# ============================================
# EXAMPLE 2: Interval Probabilities
# ============================================

# P(-1 ≤ X ≤ 1) = F(1) - F(-1)
prob_interval = normal.cdf(1) - normal.cdf(-1)
print(f"P(-1 ≤ X ≤ 1) = {prob_interval:.4f}")  # 0.6827 (the 68-95-99.7 rule!)

# P(X > 2) = 1 - F(2)
prob_greater = 1 - normal.cdf(2)
print(f"P(X > 2) = {prob_greater:.4f}")  # 0.0228

# ============================================
# EXAMPLE 3: Inverse CDF (Quantile Function)
# ============================================

# Q(0.5) = median = 0 for standard normal
print(f"Median: Q(0.5) = {normal.ppf(0.5):.4f}")  # 0.0000

# Q(0.975) = 1.96 (upper bound for 95% CI)
print(f"Q(0.975) = {normal.ppf(0.975):.4f}")  # 1.9600

# 95% confidence interval: [Q(0.025), Q(0.975)]
ci_lower = normal.ppf(0.025)
ci_upper = normal.ppf(0.975)
print(f"95% CI: [{ci_lower:.4f}, {ci_upper:.4f}]")  # [-1.96, 1.96]

# ============================================
# EXAMPLE 4: Inverse Transform Sampling
# ============================================

np.random.seed(42)

# Generate uniform random numbers
n_samples = 10000
u = np.random.uniform(0, 1, size=n_samples)

# Transform to standard normal using inverse CDF
samples_normal = normal.ppf(u)

# Verify: mean should be ~0, std should be ~1
print(f"Sample mean: {samples_normal.mean():.4f}")  # ~0
print(f"Sample std: {samples_normal.std():.4f}")  # ~1

# This is equivalent to np.random.normal(0, 1, n_samples)!

# ============================================
# EXAMPLE 5: Empirical CDF
# ============================================

from statsmodels.distributions.empirical_distribution import ECDF

# Generate some data
data = np.random.normal(0, 1, 1000)

# Build ECDF
ecdf = ECDF(data)

# Evaluate ECDF at specific points
print(f"ECDF(0) = {ecdf(0):.4f}")  # ~0.5
print(f"ECDF(1.96) = {ecdf(1.96):.4f}")  # ~0.975

# ============================================
# EXAMPLE 6: Kolmogorov-Smirnov Test
# ============================================

from scipy.stats import kstest

# Test if data comes from standard normal
ks_stat, p_value = kstest(data, 'norm')
print(f"KS statistic: {ks_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# ============================================
# EXAMPLE 7: Survival Function
# ============================================

# Exponential survival function
exp_dist = stats.expon(scale=1/0.5)  # λ = 0.5

# P(X > 2) using survival function
print(f"P(X > 2) = {exp_dist.sf(2):.4f}")  # 0.3679

# Verify: sf(x) = 1 - cdf(x)
print(f"1 - F(2) = {1 - exp_dist.cdf(2):.4f}")  # Same!

# ============================================
# EXAMPLE 8: VAE Reparameterization Trick
# ============================================

def reparameterize(mu, log_var):
    """
    VAE reparameterization trick.
    Sample z ~ N(mu, exp(log_var)) using inverse transform idea.
    """
    std = np.exp(0.5 * log_var)
    eps = np.random.normal(0, 1, size=mu.shape)  # Standard normal samples
    return mu + std * eps  # This IS inverse transform sampling!

# Example: encoder outputs mu=2, log_var=0.5
mu = np.array([2.0])
log_var = np.array([0.5])
z = reparameterize(mu, log_var)
print(f"Sampled z: {z[0]:.4f}")

# ============================================
# EXAMPLE 9: Probability Calibration Check
# ============================================

def reliability_diagram(y_true, y_pred_proba, n_bins=10):
    """Check if predicted probabilities are calibrated."""
    bin_edges = np.linspace(0, 1, n_bins + 1)
    bin_means = []
    bin_true_fractions = []

    for i in range(n_bins):
        mask = (y_pred_proba >= bin_edges[i]) & (y_pred_proba < bin_edges[i+1])
        if mask.sum() > 0:
            bin_means.append(y_pred_proba[mask].mean())
            bin_true_fractions.append(y_true[mask].mean())

    return np.array(bin_means), np.array(bin_true_fractions)

# ============================================
# EXAMPLE 10: A/B Testing with CDF
# ============================================

# Two-sample t-test p-value uses t-distribution CDF
from scipy.stats import ttest_ind

group_a = np.random.normal(10.0, 2.0, 100)  # Control
group_b = np.random.normal(10.5, 2.0, 100)  # Treatment

t_stat, p_value = ttest_ind(group_a, group_b)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")

# The p-value is computed using CDF of t-distribution!
# p-value = 2 * (1 - F(|t|)) for two-tailed test
t_dist = stats.t(df=198)  # degrees of freedom (100 + 100 - 2)
manual_pvalue = 2 * (1 - t_dist.cdf(abs(t_stat)))
print(f"Manual p-value: {manual_pvalue:.4f}")

Common Pitfalls
The CDF gives cumulative probability P(X ≤ x), not density or mass at a point. F(x) can never decrease, while f(x) or p(x) can go up and down.
For discrete RVs: P(X < x) ≠ P(X ≤ x)!
Be careful: P(X ≤ 3) includes P(X = 3), but P(X < 3) does not.
The quantile function Q(p) = F⁻¹(p) is only defined for p ∈ (0, 1). Q(0) may be -∞ and Q(1) may be +∞ for unbounded distributions!
F(x) = P(X ≤ x), but F(x⁻) = P(X < x) (the limit from the left). For continuous RVs they're equal, but for discrete RVs they differ!
The normal CDF has no closed-form inverse! We use numerical approximations. Only some distributions (exponential, uniform, Cauchy) have analytical inverses.
Practice Problems
Test your understanding with these practice problems. Try solving them before revealing the solutions!
Test Your Understanding
Key Takeaways
- CDF = Cumulative Probability: F(x) = P(X ≤ x) tells you the probability of being at or below x—a "running total" of probability.
- Universal Tool: Unlike PMF (discrete only) or PDF (continuous only), the CDF works for ALL random variables—discrete, continuous, and mixed.
- Four Properties: F(-∞) = 0, F(+∞) = 1, non-decreasing, and right-continuous. These guarantee a valid probability measure.
- Discrete = Steps, Continuous = Smooth: Step functions (staircase) for discrete RVs; smooth S-curves for continuous RVs.
- Interval Probabilities: P(a < X ≤ b) = F(b) - F(a). No need to sum or integrate—just subtract CDF values!
- Inverse CDF = Quantile Function: Q(p) = F⁻¹(p) gives the value where cumulative probability equals p. Essential for percentiles and confidence intervals.
- Inverse Transform Sampling: Generate X = F⁻¹(U) where U ~ Uniform(0,1) to sample from any distribution. Foundation for Monte Carlo methods!
- Empirical CDF: F̂ₙ(x) = (# samples ≤ x) / n. Converges to true CDF as n → ∞ (Glivenko-Cantelli theorem).
- Survival Function: S(x) = 1 - F(x) = P(X > x). Used extensively in reliability engineering and survival analysis.
Connections to Other Topics
→ Chapter 3: Joint Distributions
The joint CDF F(x, y) = P(X ≤ x, Y ≤ y) extends these concepts to multiple random variables.
→ Chapter 5: Central Limit Theorem
The CLT states that CDFs of normalized sums converge to the normal CDF, regardless of the original distribution.
→ Hypothesis Testing
P-values are computed using CDFs of test statistics. The K-S test directly compares empirical and theoretical CDFs.
→ Next Section: Mixed RVs
Mixed random variables have CDFs with both jumps AND continuous parts—the CDF is the perfect tool to describe them!
Next Up: In the next section, we'll explore Mixed Random Variables—random variables that have both discrete "jumps" and continuous "smooth" parts. The CDF is the perfect tool to describe these hybrid distributions!