Learning Objectives
By the end of this section, you will be able to:
- Define the empirical distribution function (ECDF) and understand its properties
- State the Glivenko-Cantelli theorem and explain its significance
- Apply the DKW inequality to obtain finite-sample confidence bands
- Connect uniform convergence of CDFs to statistical learning theory
Why This Matters: The Glivenko-Cantelli theorem is the foundation for why empirical risk minimization works in machine learning. It guarantees that the empirical distribution converges uniformly to the true distribution.
The Story: Does the Empirical CDF Converge?
We've seen that sample means converge to population means (Law of Large Numbers). But what about the entire distribution? If we draw n samples, does the empirical distribution converge to the true distribution?
The Glivenko-Cantelli theorem gives a resounding yes—and not just pointwise, but uniformly over all x values.
The Empirical Distribution Function
Definition
Given i.i.d. samples from distribution F, the empirical distribution function (ECDF) is:
In words: Fn(x) is the proportion of samples ≤ x.
Interactive: ECDF Convergence
The Glivenko-Cantelli Theorem
Formal Statement
If are i.i.d. with CDF F, then:
The key insight: Convergence is uniform over all x, and holds almost surely.
Why Uniform Convergence Matters
- Pointwise convergence (Fn(x) → F(x) for each x) is easy to prove via LLN
- Uniform convergence (sup over all x) is much stronger—the maximum gap vanishes
- This uniformity is essential for statistical learning theory
Proof Sketch
Rate of Convergence
The Glivenko-Cantelli theorem tells us that convergence happens, but not how fast. The rate is O(1/√n), made precise by the DKW inequality.
DKW Inequality
For any ε > 0:
Application: This gives an exponentially decaying confidence band around the ECDF.
For a (1 - α) confidence band:
Kolmogorov-Smirnov Test
The Glivenko-Cantelli theorem and DKW inequality form the foundation of the Kolmogorov-Smirnov test for goodness of fit.
The KS test statistic is:
where F0 is the hypothesized CDF. Under the null hypothesis (data comes from F0), the DKW inequality gives critical values.
Machine Learning Applications
- Empirical Risk Minimization: The Glivenko-Cantelli theorem (and its generalizations) justify using training error as a proxy for test error
- VC Theory: Extends Glivenko-Cantelli to function classes, giving PAC learning bounds
- Bootstrap Methods: Sampling from Fn approximates sampling from F
- Quantile Estimation: Empirical quantiles converge to true quantiles
Python Implementation
1import numpy as np
2from scipy import stats
3import matplotlib.pyplot as plt
4
5def demonstrate_glivenko_cantelli(n_values=[10, 50, 200, 1000]):
6 """Demonstrate ECDF convergence to true CDF."""
7 np.random.seed(42)
8 true_dist = stats.norm(0, 1)
9 x = np.linspace(-3.5, 3.5, 200)
10 true_cdf = true_dist.cdf(x)
11
12 fig, axes = plt.subplots(2, 2, figsize=(12, 10))
13 axes = axes.ravel()
14
15 for ax, n in zip(axes, n_values):
16 samples = true_dist.rvs(n)
17 ecdf = np.array([np.mean(samples <= xi) for xi in x])
18 sup_diff = np.max(np.abs(ecdf - true_cdf))
19
20 ax.step(x, ecdf, where='post', label=f'ECDF (n={n})', color='blue')
21 ax.plot(x, true_cdf, label='True CDF', color='red', linestyle='--')
22 ax.fill_between(x, ecdf, true_cdf, alpha=0.3)
23 ax.set_title(f'n = {n}, sup|Fₙ - F| = {sup_diff:.4f}')
24 ax.legend()
25 ax.set_xlabel('x')
26 ax.set_ylabel('CDF')
27
28 plt.tight_layout()
29 plt.savefig('glivenko_cantelli_demo.png', dpi=150)
30 plt.show()
31
32
33def dkw_confidence_band(samples, alpha=0.05):
34 """
35 Compute DKW confidence band for ECDF.
36
37 Returns epsilon such that P(sup|Fₙ - F| > ε) ≤ α
38 """
39 n = len(samples)
40 epsilon = np.sqrt(np.log(2 / alpha) / (2 * n))
41 return epsilon
42
43
44if __name__ == "__main__":
45 demonstrate_glivenko_cantelli()
46
47 # Example: DKW confidence band
48 samples = np.random.normal(0, 1, 100)
49 eps = dkw_confidence_band(samples, alpha=0.05)
50 print(f"95% confidence band width: ±{eps:.4f}")Summary
- Empirical CDF Fn(x) is the proportion of samples ≤ x
- Glivenko-Cantelli: sup|Fn(x) - F(x)| → 0 almost surely
- DKW Inequality: Finite-sample bound P(sup > ε) ≤ 2e-2nε²
- ML Application: Justifies empirical risk minimization
Key Takeaway
The Glivenko-Cantelli theorem is the "fundamental theorem of statistics"—it guarantees that the empirical distribution converges uniformly to the true distribution. This is why learning from data is possible!