Chapter 2
Section 5 of 6

Cumulative Distribution Functions

Random Variables

🧠

Mental Model

The CDF of a random variable X, denoted F(x), gives the probability that X takes a value less than or equal to x. The key intuition: it accumulates probability from the left up to the point x.

F(x) = P(X ≤ x)

"How much probability mass lies to the left of (and including) x?"

What the CDF Tells Us:
F(x) is the probability of X ≤ x
Completely characterizes the distribution
Non-decreasing function from 0 to 1
Interval probabilities: P(a < X ≤ b) = F(b) - F(a)
Why CDF is More Fundamental than PDF:
Universal: CDF exists for all random variables (discrete, continuous, mixed). PDF only exists for continuous; discrete uses PMF instead.
Direct probabilities: CDF gives probabilities directly. PDF requires integration over an interval.
Point masses: CDF handles discrete probability via jumps. For continuous RVs, P(X = x) = 0, but CDF works seamlessly for both.
Always exists: CDF uniquely determines any distribution. PDF may not exist (e.g., discrete or singular continuous distributions).

The CDF is the universal language of probability distributions—it works where PMF and PDF cannot.

Loading interactive demo...

Loading interactive demo...

Learning Objectives

By the end of this section, you will:

1. Conceptual Foundation
  • Define the CDF as cumulative probability: F(x) = P(X ≤ x)
2. Construction
  • Build CDFs from PMF (discrete → step function): F(x) = Σ_{k ≤ x} p(k)
  • Build CDFs from PDF (continuous → integral): F(x) = ∫_{-∞}^{x} f(t) dt
3. Structural Properties (Why CDFs Are Powerful)
  • Understand limits at ±∞, monotonicity, right-continuity, and jump sizes = point masses
4. Operational Use
  • Compute interval probabilities: P(a < X ≤ b) = F(b) - F(a)
5. Inverse Thinking
  • Master the quantile function (inverse CDF), percentiles, and threshold selection
6. Data-Driven View
  • Build empirical CDFs from real data and understand convergence (Glivenko-Cantelli)
7. Reliability & Time-to-Event
  • Apply survival function S(x) = 1 - F(x) and hazard rate intuition
8. AI/ML Bridge
  • Apply to sampling, calibration, and uncertainty modeling in modern ML systems

⚠️ Common Misconceptions

"A CDF is a density"

No—CDF is cumulative probability, not probability density. CDF values are probabilities (0 to 1), while PDF values can exceed 1.

"A flat CDF region means no probability"

Flat regions mean no additional probability in that interval—not zero probability overall. The CDF is constant where no probability mass exists.

"A jump in CDF is an error"

Jumps indicate discrete probability mass at a point. The jump size equals P(X = x). This is normal for discrete and mixed distributions.

Where You'll Apply This Knowledge:
🔁 Sampling & Generative Models
Inverse Transform Sampling

→ Convert uniform noise into samples from any distribution using the inverse CDF

Normalizing Flows

→ Learn invertible mappings whose Jacobian relates densities—CDF intuition underlies monotonic transforms

📊 Statistical Decision Making
Percentile / Quantile Computation

→ CDF inversion directly defines medians, quartiles, and tail risk

Confidence Intervals

→ Intervals are defined by CDF probability mass, not point estimates

🧪 Experiments & Testing
A/B Testing Statistical Significance

→ p-values are tail probabilities computed from a test statistic's CDF

Threshold Selection for Classification

→ ROC curves and decision thresholds depend on cumulative distributions of scores

Reliability & Time-to-Event Modeling
Survival Analysis in ML

→ Survival function is directly derived: S(t) = 1 - F(t)

Hazard Rate Modeling

→ Measures instantaneous failure probability given survival—CDF encodes history

🤖 Probabilistic ML & Representation Learning
Probability Calibration

→ Calibration checks whether predicted probabilities match empirical CDFs

VAE Reparameterization Trick

→ Sampling from latent distributions relies on transforming noise via learned CDF-like mappings

Unifying perspective: CDFs are the bridge between probability theory, statistical inference, and modern generative modeling—turning uncertainty into geometry.


Historical Context

The Quest for a Universal Tool

Throughout the 18th and 19th centuries, mathematicians faced a recurring challenge: every time they wanted to compute P(X ≤ x), they had to sum (for discrete) or integrate (for continuous) from the beginning.

The Core Need:

  • Abraham de Moivre (1718): First tabulated cumulative normal probabilities
  • Pierre-Simon Laplace (1812): Formalized integration for continuous cases
  • Andrey Kolmogorov (1933): Made CDF the fundamental object in probability theory

Kolmogorov's Insight: Instead of defining probability through PMF (discrete) or PDF (continuous) separately, define it through a single object—the CDF—that works for all random variables: discrete, continuous, and even mixed!

📊
Discrete
Step function
📈
Continuous
Smooth curve
🔗
Mixed
CDF handles all!

The Problem CDF Solves

Consider how often we ask questions like: "What's the probability of getting at most this value?"

Question                               | Mathematical Form
Probability of waiting ≤ 5 minutes?    | P(T ≤ 5)
Chance of scoring ≤ 80 on the test?    | P(X ≤ 80)
Likelihood of temperature ≤ 30°C?      | P(T ≤ 30)
Probability of ≤ 3 defects?            | P(N ≤ 3)

Without CDF, we'd compute each answer by summing or integrating from scratch:

❌ Without CDF

For discrete: P(X ≤ 3) = Σ_{k=0}^{3} p(k)

For continuous: P(X ≤ 3) = ∫_{-∞}^{3} f(t) dt

Recompute the entire sum/integral each time!

✓ With CDF

For any distribution:

P(X ≤ 3) = F(3)

Just look up the value! Pre-computed cumulative probability.

Key Insight: The CDF is like a "running total" of probability. It tells you how much probability has accumulated up to any point x—no recalculation needed!

Interactive: CDF Visualizer

Explore how the CDF accumulates probability as x increases. Toggle between discrete and continuous distributions to see step functions vs smooth curves.

Loading interactive demo...


Formal Definition

Definition: Cumulative Distribution Function (CDF)

The cumulative distribution function of a random variable X is defined as:

F(x) = P(X ≤ x) for all x ∈ ℝ

In words: F(x) = probability that X is at most x.

Symbol Reference

Symbol          | Name                   | Intuitive Meaning
F(x)            | CDF at x               | Probability that X takes a value ≤ x
P(X ≤ x)        | Cumulative probability | All probability "to the left of" x
F(b) - F(a)     | Interval probability   | P(a < X ≤ b) — probability between a and b
F⁻¹(p)          | Quantile function      | The x value where F(x) = p (inverse CDF)
S(x) = 1 - F(x) | Survival function      | Probability of exceeding x

Intuitive Statement

What the CDF tells us: "If I pick a random value of X, what's the probability it's at most x?"

Think of F(x) as a probability meter that starts at 0 and gradually fills up to 1 as you move from left to right along the number line.


Four Essential Properties

Every valid CDF must satisfy exactly four properties. These aren't arbitrary—each reflects a fundamental truth about probability!

1Limits at Infinity

lim_{x → -∞} F(x) = 0  and  lim_{x → +∞} F(x) = 1

Why: No probability below negative infinity (0%); all probability is accounted for by positive infinity (100%).

2Monotonically Non-Decreasing

If a < b, then F(a) ≤ F(b)

Why: As x increases, we can only accumulate more probability, never less. The running total never decreases!

3Right-Continuous

lim_{h → 0⁺} F(x + h) = F(x)

Why: The definition P(X ≤ x) includes the point x itself. Approaching from the right gives the same value.

4Jump Size = Point Probability

F(x) - lim_{h → 0⁺} F(x - h) = P(X = x)

Why: For discrete RVs, jumps occur at each possible value. For continuous RVs, there are no jumps (P(X = x) = 0).

Memory Aid: Think of CDF as climbing a staircase (discrete) or a ramp (continuous) from ground level (0) to the top floor (1). You can only go up, never down!
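The four properties can be checked numerically. Below is a minimal sketch for a fair-die CDF; the helper `die_cdf` is our own illustrative construction, not something from the text.

```python
import numpy as np

def die_cdf(x):
    """F(x) = P(X <= x) for a fair six-sided die: floor(x)/6, clipped to [0, 1]."""
    return np.clip(np.floor(x), 0, 6) / 6.0

xs = np.linspace(-2, 9, 1000)
F = die_cdf(xs)

# Property 1: limits at infinity (checked far out in each tail)
assert die_cdf(-100) == 0.0 and die_cdf(100) == 1.0

# Property 2: monotonically non-decreasing along the grid
assert np.all(np.diff(F) >= 0)

# Property 3: right-continuous — approaching 3 from the right gives F(3)
assert die_cdf(3 + 1e-9) == die_cdf(3)

# Property 4: jump size at x = 3 equals P(X = 3) = 1/6
jump_at_3 = die_cdf(3) - die_cdf(3 - 1e-9)
print(f"Jump at x=3: {jump_at_3:.4f}")  # 0.1667
```

The same checks apply to any CDF; a continuous CDF passes property 4 trivially with every jump equal to zero.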

Interactive: Properties Explorer

Explore each property interactively. See what happens when properties are violated—it's no longer a valid CDF!

Loading interactive demo...


Discrete vs Continuous CDFs

The CDF looks fundamentally different depending on whether the random variable is discrete or continuous.

Discrete CDF: Step Function
F(x) = Σ_{k ≤ x} P(X = k) = Σ_{k ≤ x} p(k)
  • Shape: Staircase pattern
  • Jumps: At each possible value k
  • Jump height: Equals P(X = k)
  • Flat regions: Between possible values
Continuous CDF: Smooth Curve
F(x) = ∫_{-∞}^{x} f(t) dt
  • Shape: Smooth, continuous curve
  • Jumps: None (no discontinuities)
  • Slope: Equals the PDF at each point
  • Inflection: Where PDF peaks, CDF steepest
Aspect                | Discrete CDF                | Continuous CDF
Visual shape          | Staircase (step function)   | Smooth S-curve
Jumps/discontinuities | Yes, at each possible value | No jumps (continuous)
P(X = x)              | = F(x) - F(x⁻) > 0          | = 0 always
Computed from         | Sum of PMF values           | Integral of PDF
Derivative exists?    | No (at jumps)               | Yes, F′(x) = f(x)
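Both construction routes from the comparison above can be sketched in a few lines of numpy/scipy (the same libraries used in the Python section of this page); the distribution choices here are illustrative.

```python
import numpy as np
from scipy import stats

# Discrete: Binomial(10, 0.4) — the CDF is a running sum of PMF values,
# giving the step heights of the staircase
k = np.arange(0, 11)
pmf = stats.binom.pmf(k, n=10, p=0.4)
cdf_steps = np.cumsum(pmf)
assert np.isclose(cdf_steps[-1], 1.0)  # all probability accumulated

# Continuous: standard normal — the CDF is the integral of the PDF,
# approximated here by a crude Riemann sum
xs = np.linspace(-6, 6, 100001)
pdf = stats.norm.pdf(xs)
cdf_num = np.cumsum(pdf) * (xs[1] - xs[0])

# Compare the numerical integral against scipy's exact cdf at x = 1
i = np.searchsorted(xs, 1.0)
print(f"numeric F(1) ≈ {cdf_num[i]:.4f}, scipy F(1) = {stats.norm.cdf(1):.4f}")
```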

Interactive: CDF from PMF/PDF

Watch the CDF being constructed from the PMF (discrete) or PDF (continuous). See how summation creates steps and integration creates smooth curves.

Loading interactive demo...


Interactive: PDF Area = CDF Difference

This visualization shows the fundamental relationship: the area under the PDF between two points equals the difference in CDF values at those points.

Loading interactive demo...


CDF-PMF-PDF Relationships

The CDF is intimately connected to PMF and PDF through summation/integration and their inverses.

For Discrete Random Variables

CDF from PMF
F(x) = Σ_{k ≤ x} p(k)

Sum up all PMF values at or below x

PMF from CDF
p(k) = F(k) - F(k⁻)

PMF at k equals the jump size in CDF at k

For Continuous Random Variables

CDF from PDF
F(x) = ∫_{-∞}^{x} f(t) dt

Cumulative area under the PDF curve up to x

PDF from CDF
f(x) = (d/dx) F(x) = F′(x)

PDF is the derivative (slope) of the CDF

Fundamental Relationship: Integration and differentiation are inverse operations. The CDF is the integral of the PDF, and the PDF is the derivative of the CDF.
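A quick numerical sanity check of this inverse relationship, sketched with scipy (the central-difference step size is our own choice): the slope of the CDF should match the PDF at every point.

```python
from scipy import stats

dist = stats.norm(loc=0, scale=1)
h = 1e-6  # small step for the central difference

for x in [-1.0, 0.0, 2.0]:
    # Numerical derivative of the CDF: F'(x) ≈ [F(x+h) - F(x-h)] / 2h
    slope = (dist.cdf(x + h) - dist.cdf(x - h)) / (2 * h)
    print(f"x={x:+.1f}: F'(x) ≈ {slope:.4f}, f(x) = {dist.pdf(x):.4f}")
```

At each point the two printed numbers agree to several decimal places, which is exactly the statement f(x) = F′(x).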

Computing Probabilities with CDF

The CDF makes computing interval probabilities trivial. Here are the key formulas:

At Most x (Directly from CDF)
P(X ≤ x) = F(x)
Greater Than x (Complement)
P(X > x) = 1 - F(x)
In an Interval (Subtraction)
P(a < X ≤ b) = F(b) - F(a)
Exactly x (For Discrete RVs)
P(X = x) = F(x) - F(x⁻) = jump at x
Continuous vs Discrete Intervals: For continuous RVs, P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a < X < b) = P(a ≤ X < b) because P(X = a) = P(X = b) = 0. But for discrete RVs, you must be careful about endpoint inclusion!
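A small sketch of the endpoint caveat, using a Binomial(10, 0.5) as an illustrative choice: including or excluding the left endpoint changes the answer by exactly P(X = a).

```python
from scipy import stats

b = stats.binom(n=10, p=0.5)

# P(2 <= X <= 5): subtract F at a-1 = 1 to keep the point a = 2 included
incl = b.cdf(5) - b.cdf(1)
# P(2 < X <= 5): subtract F at a = 2, excluding the point
excl = b.cdf(5) - b.cdf(2)

print(f"P(2 <= X <= 5) = {incl:.4f}, P(2 < X <= 5) = {excl:.4f}")
print(f"difference = P(X = 2) = {b.pmf(2):.4f}")
# For a continuous RV the same two expressions would coincide, since P(X = 2) = 0.
```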

Interactive: Probability Calculator

Use the CDF to compute interval probabilities. Drag the bounds a and b to see how P(a < X ≤ b) = F(b) - F(a).

Loading interactive demo...


Quantile Function (Inverse CDF)

The quantile function (also called the inverse CDF or percent-point function) answers the reverse question: "What value x has cumulative probability p?"

Definition: Quantile Function

Q(p) = F⁻¹(p) = inf{x : F(x) ≥ p}  for p ∈ (0, 1)

In words: Q(p) = the smallest x such that F(x) ≥ p.

Key Percentiles

Percentile    | p    | Meaning
25th (Q1)     | 0.25 | First quartile—25% of values below
50th (Median) | 0.50 | Middle value—50% below, 50% above
75th (Q3)     | 0.75 | Third quartile—75% of values below
95th          | 0.95 | Only 5% of values exceed this
99th          | 0.99 | Extreme upper tail—only 1% exceed

Applications of Quantile Function

Confidence Intervals

95% CI: [Q(0.025), Q(0.975)]

IQR (Interquartile Range)

IQR = Q(0.75) - Q(0.25)

Random Sampling

X = Q(U), U ~ Uniform(0,1)

Box Plots

Built from Q1, median, Q3
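The applications above map directly onto scipy's `ppf` (its name for the inverse CDF). A sketch with a Normal(100, 15) running example, which is our own choice:

```python
import numpy as np
from scipy import stats

d = stats.norm(loc=100, scale=15)

# Central 95% region: [Q(0.025), Q(0.975)]
lo, hi = d.ppf(0.025), d.ppf(0.975)
print(f"[Q(0.025), Q(0.975)] = [{lo:.1f}, {hi:.1f}]")

# IQR = Q(0.75) - Q(0.25), about 1.35 standard deviations for a normal
iqr = d.ppf(0.75) - d.ppf(0.25)
print(f"IQR = {iqr:.2f}")

# Random sampling via X = Q(U), U ~ Uniform(0, 1)
rng = np.random.default_rng(0)
x = d.ppf(rng.uniform(size=50_000))
print(f"sample median = {np.median(x):.1f}")  # lands near 100
```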


Interactive: CDF vs Quantile

Explore the relationship between CDF and quantile function. They are inverses of each other: F(Q(p)) = p and Q(F(x)) = x.

Loading interactive demo...


Empirical CDF

The Empirical CDF (ECDF) estimates the true CDF from observed data. It's a step function that jumps by 1/n at each data point.

Definition: Empirical CDF

F̂ₙ(x) = (1/n) Σ_{i=1}^{n} 1(Xᵢ ≤ x) = #{Xᵢ ≤ x} / n

In words: Proportion of observed values that are ≤ x.

Key Properties of ECDF

Glivenko-Cantelli Theorem

As n → ∞, the ECDF converges uniformly to the true CDF: sup|F̂ₙ(x) - F(x)| → 0 almost surely.

Kolmogorov-Smirnov Test

Tests if data comes from a specific distribution by measuring max difference between ECDF and theoretical CDF.
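Both ideas fit in a few lines. Below is a from-scratch ECDF (the statsmodels `ECDF` class used later on this page does the same thing) together with the Kolmogorov-Smirnov distance, the maximum gap between the ECDF and the theoretical CDF; sample size and seed are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = np.sort(rng.normal(0, 1, size=2000))

def ecdf(x, sample):
    """Fraction of (sorted) sample points <= x."""
    return np.searchsorted(sample, x, side="right") / len(sample)

print(f"ECDF(0)    ≈ {ecdf(0.0, data):.3f}")   # near 0.5
print(f"ECDF(1.96) ≈ {ecdf(1.96, data):.3f}")  # near 0.975

# KS distance: sup |F̂_n(x) - F(x)|, which is attained at the data points
F = stats.norm.cdf(data)
n = len(data)
ks = np.max(np.maximum(np.arange(1, n + 1) / n - F, F - np.arange(n) / n))
print(f"KS distance ≈ {ks:.4f}")  # shrinks toward 0 as n grows (Glivenko-Cantelli)
```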


Interactive: Empirical CDF Builder

Generate random samples and watch the ECDF being built step by step. See how it converges to the true CDF as sample size increases.

Loading interactive demo...


Survival Function

The Survival Function (also called the Reliability Function) is simply the complement of the CDF. It's widely used in reliability engineering, medical statistics, and machine learning for time-to-event modeling.

Definition: Survival Function

S(x) = P(X > x) = 1 - F(x)

In words: Probability of "surviving" (exceeding) value x.

Hazard Rate (Instantaneous Failure Rate)
h(x) = f(x) / S(x) = f(x) / (1 - F(x))

The "risk" of failing at time x, given survival up to x.

Constant hazard: Exponential (memoryless)
Increasing hazard: Aging/wear-out
Decreasing hazard: Infant mortality
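Two of these hazard behaviors can be sketched with scipy's `pdf` and `sf` (the survival function); the exponential rate and the Weibull shape parameter are illustrative choices.

```python
from scipy import stats

def hazard(dist, x):
    """h(x) = f(x) / S(x), using scipy's sf for the survival function."""
    return dist.pdf(x) / dist.sf(x)

expo = stats.expon(scale=2.0)    # Exponential with rate λ = 0.5 (scale = 1/λ)
weib = stats.weibull_min(c=2.0)  # Weibull shape k = 2: wear-out behavior

for x in [0.5, 1.0, 2.0]:
    print(f"x={x}: exponential h={hazard(expo, x):.3f}, "
          f"Weibull h={hazard(weib, x):.3f}")
# The exponential hazard stays at λ = 0.5 (memoryless);
# the Weibull hazard grows with x (here h(x) = 2x).
```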

Interactive: Survival & Hazard

Explore the relationship between CDF, survival function, and hazard rate. See how different distributions have different hazard behaviors.

Loading interactive demo...


Interactive: CDF Comparison Tool

Overlay multiple distributions' CDFs to compare their shapes. Understand how different distributions accumulate probability differently.

Loading interactive demo...


Common CDFs Gallery

Explore the CDFs of common distributions. Notice how each has a characteristic shape that reflects the underlying probability structure.

Loading interactive demo...


CDF Formulas Reference

A comprehensive reference table of CDF formulas, inverse CDFs, and key properties for common distributions.

Loading interactive demo...


Worked Examples

Example 1: Computing CDF from PMF

Problem: For a fair die roll X, compute F(3).

PMF: p(k) = 1/6 for k = 1, 2, 3, 4, 5, 6

F(3) = P(X ≤ 3) = p(1) + p(2) + p(3)

F(3) = 1/6 + 1/6 + 1/6 = 3/6 = 0.5

Interpretation: There's a 50% chance of rolling 3 or less.

Example 2: Interval Probability from CDF

Problem: For X ~ Normal(100, 15²), given F(85) = 0.1587 and F(115) = 0.8413, find P(85 < X ≤ 115).

P(85 < X ≤ 115) = F(115) - F(85)

P(85 < X ≤ 115) = 0.8413 - 0.1587 = 0.6826

Interpretation: About 68% of values fall within one standard deviation of the mean.

Example 3: Finding Percentiles

Problem: For X ~ Exponential(λ = 0.5), find the median (50th percentile).

CDF: F(x) = 1 - e^(-0.5x)

Set F(x) = 0.5: 1 - e^(-0.5x) = 0.5

e^(-0.5x) = 0.5

-0.5x = ln(0.5) = -0.693

x = 0.693 / 0.5 = 1.386

Interpretation: Half of the values are below 1.386, half are above.
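The three worked examples can be checked with scipy; the distribution choices follow the problem statements, and note that scipy parameterizes the exponential by scale = 1/λ.

```python
import numpy as np
from scipy import stats

# Example 1: fair die — discrete uniform on {1, ..., 6}
die = stats.randint(low=1, high=7)
print(f"F(3) = {die.cdf(3):.4f}")  # 0.5000

# Example 2: Normal(100, 15^2) — P(85 < X <= 115)
norm = stats.norm(loc=100, scale=15)
print(f"P(85 < X <= 115) = {norm.cdf(115) - norm.cdf(85):.4f}")  # 0.6827

# Example 3: Exponential(λ = 0.5) — median via the inverse CDF
expo = stats.expon(scale=1 / 0.5)
print(f"median = {expo.ppf(0.5):.4f}")  # 1.3863 = ln(2) / 0.5
```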


Real-World Examples

📏 Height Distribution

Question: What % of people are shorter than 180 cm?

Answer: F(180) for Normal(170, 10²) ≈ 0.84 = 84%

ML Use: Percentile normalization, anomaly detection

⏱️ Waiting Time

Question: Probability of waiting ≤ 5 minutes?

Answer: F(5) for Exp(λ=0.2) = 1 - e⁻¹ ≈ 0.63

ML Use: SLA monitoring, queue prediction

🎯 Quality Control

Question: Probability of ≤ 2 defects per batch?

Answer: F(2) for Poisson(λ=1.5) ≈ 0.81

ML Use: Process monitoring, threshold setting

📊 Stock Returns

Question: Probability of losing ≤ 10%?

Answer: F(-10%) for Normal(μ, σ) = tail probability

ML Use: Value at Risk (VaR), risk assessment
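The first three answers above can be reproduced directly with scipy (the stock-returns case is omitted since its μ and σ are unspecified):

```python
from scipy import stats

# Height: Normal(170, 10^2)
height = stats.norm(loc=170, scale=10)
print(f"P(height <= 180) = {height.cdf(180):.2f}")  # 0.84

# Waiting time: Exponential with λ = 0.2 (scipy scale = 1/λ)
wait = stats.expon(scale=1 / 0.2)
print(f"P(wait <= 5)     = {wait.cdf(5):.2f}")      # 1 - e^-1 ≈ 0.63

# Defects: Poisson(λ = 1.5)
defects = stats.poisson(mu=1.5)
print(f"P(defects <= 2)  = {defects.cdf(2):.2f}")   # 0.81
```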


AI/ML Applications

The CDF and its inverse (quantile function) are fundamental tools in machine learning. Here are the key applications:

1. Inverse Transform Sampling

Core Idea: Generate samples from ANY distribution using only uniform random numbers

X = F⁻¹(U)  where U ~ Uniform(0, 1)

Why it works: If X has CDF F, then F(X) ~ Uniform(0, 1). The inverse transform reverses this relationship!

Used in: Monte Carlo simulation, rejection sampling, importance sampling
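For distributions whose CDF inverts in closed form, the recipe takes two lines. A sketch for the exponential, where F(x) = 1 - e^(-λx) gives F⁻¹(u) = -ln(1 - u) / λ (sample size and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.5

# Step 1: uniform noise, U ~ Uniform(0, 1)
u = rng.uniform(size=100_000)

# Step 2: push it through the inverse CDF, X = F⁻¹(U)
samples = -np.log(1 - u) / lam

# The result is Exponential(λ = 0.5), whose mean is 1/λ = 2
print(f"sample mean ≈ {samples.mean():.3f}")
```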

2. Probability Calibration

Problem: ML model outputs aren't true probabilities

Solution: Use CDFs to transform model scores to calibrated probabilities. Isotonic regression and Platt scaling both use CDF-like transformations.

3. VAE Reparameterization Trick

The Problem: Can't backpropagate through random sampling

z = μ + σ · ε,  ε ~ N(0, 1)

This is a special case of inverse transform sampling for the normal distribution! The randomness is separated from the parameters.

4. Normalizing Flows

Core Concept: Chain of invertible transformations

Normalizing flows use CDF transformations (and their Jacobians) to transform simple distributions into complex ones while maintaining tractable likelihoods.

5. Quantile Regression

Beyond Mean Prediction: Predict conditional quantiles

Instead of predicting E[Y|X], predict Q(p|X) for various p. This gives full uncertainty characterization, not just point estimates.

Bottom Line: Understanding CDFs and inverse transform sampling is essential for implementing generative models, computing confidence intervals, and making probabilistic predictions in production ML systems.

Interactive: Inverse Transform Sampling

Watch inverse transform sampling in action. Generate uniform random numbers, trace them horizontally to the CDF curve, then drop vertically to get samples from the target distribution.

Loading interactive demo...


Numerical Methods

Many CDFs don't have closed-form expressions. Here's how they're computed in practice:

Normal CDF: No Closed Form!

The integral ∫_{-∞}^{x} e^{-t²/2} dt cannot be expressed in terms of elementary functions. We use:

  • Taylor series expansions
  • Continued fraction approximations
  • Rational polynomial approximations (Hart's method)
  • Lookup tables with interpolation
Historical: Z-Tables

Before computers, statisticians used printed tables of Φ(z) values. These tables were painstakingly computed by hand and are still found in statistics textbooks.

Modern Implementations

Libraries like scipy.stats use highly optimized numerical algorithms that achieve 15+ digits of precision in microseconds.
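One concrete route, sketched below: the standard normal CDF can be written in terms of the error function via Φ(z) = (1 + erf(z/√2)) / 2, and `math.erf` computes erf to near machine precision. Comparing against scipy shows the agreement.

```python
import math
from scipy import stats

def phi(z):
    """Standard normal CDF via the identity Φ(z) = (1 + erf(z / √2)) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in [-1.96, 0.0, 1.0, 1.96]:
    print(f"Φ({z:+.2f}) = {phi(z):.6f}  (scipy: {stats.norm.cdf(z):.6f})")
```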


Python Implementation

🐍 python

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# ============================================
# EXAMPLE 1: CDF Evaluation
# ============================================

# Standard normal CDF
normal = stats.norm(loc=0, scale=1)

# P(X ≤ 0) = 0.5 (symmetric around 0)
print(f"P(X ≤ 0) = {normal.cdf(0):.4f}")  # 0.5000

# P(X ≤ 1.96) ≈ 0.975 (the famous 95% CI bound)
print(f"P(X ≤ 1.96) = {normal.cdf(1.96):.4f}")  # 0.9750

# P(X ≤ -1) ≈ 0.159
print(f"P(X ≤ -1) = {normal.cdf(-1):.4f}")  # 0.1587

# ============================================
# EXAMPLE 2: Interval Probabilities
# ============================================

# P(-1 ≤ X ≤ 1) = F(1) - F(-1)
prob_interval = normal.cdf(1) - normal.cdf(-1)
print(f"P(-1 ≤ X ≤ 1) = {prob_interval:.4f}")  # 0.6827 (the 68-95-99.7 rule!)

# P(X > 2) = 1 - F(2)
prob_greater = 1 - normal.cdf(2)
print(f"P(X > 2) = {prob_greater:.4f}")  # 0.0228

# ============================================
# EXAMPLE 3: Inverse CDF (Quantile Function)
# ============================================

# Q(0.5) = median = 0 for standard normal
print(f"Median: Q(0.5) = {normal.ppf(0.5):.4f}")  # 0.0000

# Q(0.975) = 1.96 (upper bound for 95% CI)
print(f"Q(0.975) = {normal.ppf(0.975):.4f}")  # 1.9600

# 95% confidence interval: [Q(0.025), Q(0.975)]
ci_lower = normal.ppf(0.025)
ci_upper = normal.ppf(0.975)
print(f"95% CI: [{ci_lower:.4f}, {ci_upper:.4f}]")  # [-1.96, 1.96]

# ============================================
# EXAMPLE 4: Inverse Transform Sampling
# ============================================

np.random.seed(42)

# Generate uniform random numbers
n_samples = 10000
u = np.random.uniform(0, 1, size=n_samples)

# Transform to standard normal using inverse CDF
samples_normal = normal.ppf(u)

# Verify: mean should be ~0, std should be ~1
print(f"Sample mean: {samples_normal.mean():.4f}")  # ~0
print(f"Sample std: {samples_normal.std():.4f}")   # ~1

# This is equivalent to np.random.normal(0, 1, n_samples)!

# ============================================
# EXAMPLE 5: Empirical CDF
# ============================================

from statsmodels.distributions.empirical_distribution import ECDF

# Generate some data
data = np.random.normal(0, 1, 1000)

# Build ECDF
ecdf = ECDF(data)

# Evaluate ECDF at specific points
print(f"ECDF(0) = {ecdf(0):.4f}")  # ~0.5
print(f"ECDF(1.96) = {ecdf(1.96):.4f}")  # ~0.975

# ============================================
# EXAMPLE 6: Kolmogorov-Smirnov Test
# ============================================

from scipy.stats import kstest

# Test if data comes from standard normal
ks_stat, p_value = kstest(data, 'norm')
print(f"KS statistic: {ks_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# ============================================
# EXAMPLE 7: Survival Function
# ============================================

# Exponential survival function
exp_dist = stats.expon(scale=1/0.5)  # λ = 0.5

# P(X > 2) using survival function
print(f"P(X > 2) = {exp_dist.sf(2):.4f}")  # 0.3679

# Verify: sf(x) = 1 - cdf(x)
print(f"1 - F(2) = {1 - exp_dist.cdf(2):.4f}")  # Same!

# ============================================
# EXAMPLE 8: VAE Reparameterization Trick
# ============================================

def reparameterize(mu, log_var):
    """
    VAE reparameterization trick.
    Sample z ~ N(mu, exp(log_var)) using inverse transform idea.
    """
    std = np.exp(0.5 * log_var)
    eps = np.random.normal(0, 1, size=mu.shape)  # Standard normal samples
    return mu + std * eps  # This IS inverse transform sampling!

# Example: encoder outputs mu=2, log_var=0.5
mu = np.array([2.0])
log_var = np.array([0.5])
z = reparameterize(mu, log_var)
print(f"Sampled z: {z[0]:.4f}")

# ============================================
# EXAMPLE 9: Probability Calibration Check
# ============================================

def reliability_diagram(y_true, y_pred_proba, n_bins=10):
    """Check if predicted probabilities are calibrated."""
    bin_edges = np.linspace(0, 1, n_bins + 1)
    bin_means = []
    bin_true_fractions = []

    for i in range(n_bins):
        mask = (y_pred_proba >= bin_edges[i]) & (y_pred_proba < bin_edges[i+1])
        if mask.sum() > 0:
            bin_means.append(y_pred_proba[mask].mean())
            bin_true_fractions.append(y_true[mask].mean())

    return np.array(bin_means), np.array(bin_true_fractions)

# ============================================
# EXAMPLE 10: A/B Testing with CDF
# ============================================

# Two-sample t-test p-value uses t-distribution CDF
from scipy.stats import ttest_ind

group_a = np.random.normal(10.0, 2.0, 100)  # Control
group_b = np.random.normal(10.5, 2.0, 100)  # Treatment

t_stat, p_value = ttest_ind(group_a, group_b)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")

# The p-value is computed using CDF of t-distribution!
# p-value = 2 * (1 - F(|t|)) for two-tailed test
t_dist = stats.t(df=198)  # degrees of freedom
manual_pvalue = 2 * (1 - t_dist.cdf(abs(t_stat)))
print(f"Manual p-value: {manual_pvalue:.4f}")

Common Pitfalls

Pitfall 1: Confusing CDF with PDF/PMF

The CDF gives cumulative probability P(X ≤ x), not density or mass at a point. F(x) can never decrease, while f(x) or p(x) can go up and down.

Pitfall 2: Discrete Endpoint Confusion

For discrete RVs: P(a ≤ X ≤ b) ≠ P(a < X ≤ b)!
Be careful: P(X ≤ 3) includes P(X = 3), but P(X < 3) does not.

Pitfall 3: Inverse CDF Domain

Q(p) = F⁻¹(p) is only defined for p ∈ (0, 1). Q(0) may be -∞ and Q(1) may be +∞ for unbounded distributions!

Pitfall 4: Forgetting Right-Continuity

P(X ≤ x) = F(x), but P(X < x) = F(x⁻) (limit from the left). For continuous RVs they're equal, but for discrete RVs they differ!

Pitfall 5: Assuming All CDFs Have Closed-Form Inverses

The normal CDF has no closed-form inverse! We use numerical approximations. Only some distributions (exponential, uniform, Cauchy) have analytical inverses.


Practice Problems

Test your understanding with these practice problems. Try solving them before revealing the solutions!

Loading interactive demo...


Test Your Understanding

Loading interactive demo...


Key Takeaways

  1. CDF = Cumulative Probability: F(x) = P(X ≤ x) tells you the probability of being at or below x—a "running total" of probability.
  2. Universal Tool: Unlike PMF (discrete only) or PDF (continuous only), the CDF works for ALL random variables—discrete, continuous, and mixed.
  3. Four Properties: F(-∞) = 0, F(+∞) = 1, non-decreasing, and right-continuous. These guarantee a valid probability measure.
  4. Discrete = Steps, Continuous = Smooth: Step functions (staircase) for discrete RVs; smooth S-curves for continuous RVs.
  5. Interval Probabilities: P(a < X ≤ b) = F(b) - F(a). No need to sum or integrate—just subtract CDF values!
  6. Inverse CDF = Quantile Function: Q(p) = F⁻¹(p) gives the value where cumulative probability equals p. Essential for percentiles and confidence intervals.
  7. Inverse Transform Sampling: Generate X = F⁻¹(U) where U ~ Uniform(0,1) to sample from any distribution. Foundation for Monte Carlo methods!
  8. Empirical CDF: F̂ₙ(x) = (# samples ≤ x) / n. Converges to true CDF as n → ∞ (Glivenko-Cantelli theorem).
  9. Survival Function: S(x) = 1 - F(x) = P(X > x). Used extensively in reliability engineering and survival analysis.

Connections to Other Topics

→ Chapter 3: Joint Distributions

The joint CDF F(x, y) = P(X ≤ x, Y ≤ y) extends these concepts to multiple random variables.

→ Chapter 5: Central Limit Theorem

The CLT states that CDFs of normalized sums converge to the normal CDF, regardless of the original distribution.

→ Hypothesis Testing

P-values are computed using CDFs of test statistics. The K-S test directly compares empirical and theoretical CDFs.

→ Next Section: Mixed RVs

Mixed random variables have CDFs with both jumps AND continuous parts—the CDF is the perfect tool to describe them!

Next Up: In the next section, we'll explore Mixed Random Variables—random variables that have both discrete "jumps" and continuous "smooth" parts. The CDF is the perfect tool to describe these hybrid distributions!