Chapter 9
20 min read
Section 65 of 175

Glivenko-Cantelli Theorem

Convergence Concepts

Learning Objectives

By the end of this section, you will be able to:

  1. Define the empirical distribution function (ECDF) and understand its properties
  2. State the Glivenko-Cantelli theorem and explain its significance
  3. Apply the DKW inequality to obtain finite-sample confidence bands
  4. Connect uniform convergence of CDFs to statistical learning theory
Why This Matters: The Glivenko-Cantelli theorem is the foundation for why empirical risk minimization works in machine learning. It guarantees that the empirical distribution converges uniformly to the true distribution.

The Story: Does the Empirical CDF Converge?

We've seen that sample means converge to population means (Law of Large Numbers). But what about the entire distribution? If we draw n samples, does the empirical distribution converge to the true distribution?

The Glivenko-Cantelli theorem gives a resounding yes—and not just pointwise, but uniformly over all x values.


The Empirical Distribution Function

Definition

Empirical Distribution Function

Given i.i.d. samples X1,ldots,XnX_1, \\ldots, X_n from distribution F, the empirical distribution function (ECDF) is:

Fn(x)=frac1nsumi=1nmathbf1XileqxF_n(x) = \\frac{1}{n} \\sum_{i=1}^n \\mathbf{1}_{\\{X_i \\leq x\\}}

In words: Fn(x) is the proportion of samples ≤ x.

Interactive: ECDF Convergence

sup|Fn(x) - F(x)| = 0.1387
Glivenko-Cantelli guarantees this → 0 as n → ∞
xCDFECDF F_n(x)True CDF F(x)
Increase n to see the blue empirical CDF converge uniformly to the red true CDF

The Glivenko-Cantelli Theorem

Formal Statement

Glivenko-Cantelli Theorem (1933)

If X1,X2,ldotsX_1, X_2, \\ldots are i.i.d. with CDF F, then:

supxinmathbbRFn(x)F(x)xrightarrowa.s.0\\sup_{x \\in \\mathbb{R}} |F_n(x) - F(x)| \\xrightarrow{a.s.} 0

The key insight: Convergence is uniform over all x, and holds almost surely.

Why Uniform Convergence Matters

  • Pointwise convergence (Fn(x) → F(x) for each x) is easy to prove via LLN
  • Uniform convergence (sup over all x) is much stronger—the maximum gap vanishes
  • This uniformity is essential for statistical learning theory

Proof Sketch

Rate of Convergence

The Glivenko-Cantelli theorem tells us that convergence happens, but not how fast. The rate is O(1/√n), made precise by the DKW inequality.


DKW Inequality

Dvoretzky-Kiefer-Wolfowitz Inequality

For any ε > 0:

Pleft(supxFn(x)F(x)>epsilonright)leq2e2nepsilon2P\\left(\\sup_x |F_n(x) - F(x)| > \\epsilon\\right) \\leq 2e^{-2n\\epsilon^2}

Application: This gives an exponentially decaying confidence band around the ECDF.

For a (1 - α) confidence band:

epsilon=sqrtfracln(2/alpha)2n\\epsilon = \\sqrt{\\frac{\\ln(2/\\alpha)}{2n}}

Kolmogorov-Smirnov Test

The Glivenko-Cantelli theorem and DKW inequality form the foundation of the Kolmogorov-Smirnov test for goodness of fit.

The KS test statistic is:

Dn=supxFn(x)F0(x)D_n = \\sup_x |F_n(x) - F_0(x)|

where F0 is the hypothesized CDF. Under the null hypothesis (data comes from F0), the DKW inequality gives critical values.


Machine Learning Applications

  • Empirical Risk Minimization: The Glivenko-Cantelli theorem (and its generalizations) justify using training error as a proxy for test error
  • VC Theory: Extends Glivenko-Cantelli to function classes, giving PAC learning bounds
  • Bootstrap Methods: Sampling from Fn approximates sampling from F
  • Quantile Estimation: Empirical quantiles converge to true quantiles

Python Implementation

🐍glivenko_cantelli.py
1import numpy as np
2from scipy import stats
3import matplotlib.pyplot as plt
4
5def demonstrate_glivenko_cantelli(n_values=[10, 50, 200, 1000]):
6    """Demonstrate ECDF convergence to true CDF."""
7    np.random.seed(42)
8    true_dist = stats.norm(0, 1)
9    x = np.linspace(-3.5, 3.5, 200)
10    true_cdf = true_dist.cdf(x)
11
12    fig, axes = plt.subplots(2, 2, figsize=(12, 10))
13    axes = axes.ravel()
14
15    for ax, n in zip(axes, n_values):
16        samples = true_dist.rvs(n)
17        ecdf = np.array([np.mean(samples <= xi) for xi in x])
18        sup_diff = np.max(np.abs(ecdf - true_cdf))
19
20        ax.step(x, ecdf, where='post', label=f'ECDF (n={n})', color='blue')
21        ax.plot(x, true_cdf, label='True CDF', color='red', linestyle='--')
22        ax.fill_between(x, ecdf, true_cdf, alpha=0.3)
23        ax.set_title(f'n = {n}, sup|Fₙ - F| = {sup_diff:.4f}')
24        ax.legend()
25        ax.set_xlabel('x')
26        ax.set_ylabel('CDF')
27
28    plt.tight_layout()
29    plt.savefig('glivenko_cantelli_demo.png', dpi=150)
30    plt.show()
31
32
33def dkw_confidence_band(samples, alpha=0.05):
34    """
35    Compute DKW confidence band for ECDF.
36
37    Returns epsilon such that P(sup|Fₙ - F| > ε) ≤ α
38    """
39    n = len(samples)
40    epsilon = np.sqrt(np.log(2 / alpha) / (2 * n))
41    return epsilon
42
43
44if __name__ == "__main__":
45    demonstrate_glivenko_cantelli()
46
47    # Example: DKW confidence band
48    samples = np.random.normal(0, 1, 100)
49    eps = dkw_confidence_band(samples, alpha=0.05)
50    print(f"95% confidence band width: ±{eps:.4f}")

Summary

  • Empirical CDF Fn(x) is the proportion of samples ≤ x
  • Glivenko-Cantelli: sup|Fn(x) - F(x)| → 0 almost surely
  • DKW Inequality: Finite-sample bound P(sup > ε) ≤ 2e-2nε²
  • ML Application: Justifies empirical risk minimization

Key Takeaway

The Glivenko-Cantelli theorem is the "fundamental theorem of statistics"—it guarantees that the empirical distribution converges uniformly to the true distribution. This is why learning from data is possible!

Loading comments...