Chapter 9

L¹ Convergence and Uniform Integrability

Convergence Concepts

Learning Objectives

By the end of this section, you will be able to:

  1. Define L¹ convergence and explain how it differs from L² and convergence in probability
  2. State the definition of uniform integrability and recognize when a family of random variables is uniformly integrable
  3. Apply the Vitali Convergence Theorem to determine when convergence in probability implies L¹ convergence
  4. Connect uniform integrability to the Dominated Convergence Theorem and understand the relationship
  5. Recognize applications in machine learning where uniform integrability conditions are important
Why This Matters for AI/ML Engineers: Understanding when you can interchange limits and expectations is crucial for analyzing training dynamics, proving convergence of loss functions, and justifying asymptotic approximations in optimization algorithms.

The Story: When L¹ Convergence Works

We've studied convergence in probability (Section 9.1) and L² convergence (Section 9.4). But there's another important mode: L¹ convergence, also called convergence in mean.

A natural question arises: if $X_n \xrightarrow{P} X$, does it follow that $E[X_n] \to E[X]$? In other words, can we swap limits and expectations?

The answer is: not always! The key condition that makes this work is uniform integrability—a property ensuring that the tails of the distributions don't escape to infinity as n grows.


L¹ (Mean) Convergence

Formal Definition

Definition: L¹ Convergence

A sequence $X_1, X_2, \ldots$ converges to $X$ in L¹ (or in mean) if:

$$E[|X_n - X|] \to 0 \text{ as } n \to \infty$$

We write: $X_n \xrightarrow{L^1} X$
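The definition can be checked numerically on a toy sequence. The sketch below is illustrative only: it assumes $X_n = X + Z/n$ with $X$ and $Z$ independent standard normals (a choice made here, not taken from the text), for which $E[|X_n - X|] = E[|Z|]/n = \sqrt{2/\pi}/n$. The helper name `l1_distance` is hypothetical.

```python
import random


def l1_distance(n: int, num_samples: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E|X_n - X| for the assumed X_n = X + Z / n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(0.0, 1.0)   # draw of the limit X
        z = rng.gauss(0.0, 1.0)   # independent noise Z
        x_n = x + z / n           # the n-th term of the sequence
        total += abs(x_n - x)
    return total / num_samples


for n in (1, 10, 100):
    # The estimates should shrink roughly like 0.798 / n.
    print(n, round(l1_distance(n), 4))
```

Because the same seed is reused for every n, the estimates differ only through the factor 1/n, which makes the decay easy to see.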

L¹ vs L² Convergence

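| Property | L¹ Convergence | L² Convergence |
| --- | --- | --- |
| Definition | $E[\lvert X_n - X\rvert] \to 0$ | $E[(X_n - X)^2] \to 0$ |
| Requires | Finite first moment | Finite second moment |
| Norm | $\lVert f\rVert_1 = E[\lvert f\rvert]$ | $\lVert f\rVert_2 = \sqrt{E[f^2]}$ |
| Implication | L² ⟹ L¹ (Jensen) | Stronger requirement |
| Completeness | L¹ is complete | L² is Hilbert space |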

The Hierarchy

L² convergence implies L¹ convergence (by Jensen's inequality or Cauchy-Schwarz), but not vice versa. Both imply convergence in probability. The relationship is:

L² ⟹ L¹ ⟹ In Probability ⟹ In Distribution
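The first two arrows of the hierarchy each follow from a one-line inequality. As a sketch of the reasoning (standard results, written out here for convenience):

```latex
% L^2 => L^1: Jensen's inequality (or Cauchy-Schwarz against the constant 1)
E\big[|X_n - X|\big] \;\le\; \sqrt{E\big[(X_n - X)^2\big]}

% L^1 => in probability: Markov's inequality, for any fixed \varepsilon > 0
P\big(|X_n - X| > \varepsilon\big) \;\le\; \frac{E\big[|X_n - X|\big]}{\varepsilon}
```

In each case, the right-hand side tending to 0 forces the left-hand side to 0.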

Uniform Integrability

Definition and Intuition

The key concept that bridges convergence in probability and L¹ convergence is uniform integrability (UI).

Definition: Uniform Integrability

A family $\{X_n\}_{n \in \mathcal{I}}$ of random variables is uniformly integrable if:

$$\lim_{K \to \infty} \sup_{n \in \mathcal{I}} E\big[|X_n| \cdot \mathbf{1}_{\{|X_n| > K\}}\big] = 0$$

In words: The tail contributions to the expectations can be made uniformly small across all n by choosing K large enough.

Intuitive Understanding

Uniform integrability means that no single $X_n$ can "hide" too much mass in its tail. Even as n varies, the contribution to the expectation from values beyond K can be made small for all n at once by taking K large. Common sufficient conditions:

  • If the $X_n$ have bounded support $[-M, M]$, they are UI
  • If there exists $Y$ with $E[|Y|] < \infty$ and $|X_n| \le Y$ for all n, they are UI (dominated)
  • If $\sup_n E[|X_n|^{1+\varepsilon}] < \infty$ for some $\varepsilon > 0$, they are UI
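The moment condition in the last bullet can be verified directly from the tail definition. A short derivation, using only that $|X_n|/K > 1$ on the event $\{|X_n| > K\}$:

```latex
E\big[|X_n|\,\mathbf{1}_{\{|X_n| > K\}}\big]
  \;\le\; E\Big[|X_n| \Big(\tfrac{|X_n|}{K}\Big)^{\varepsilon} \mathbf{1}_{\{|X_n| > K\}}\Big]
  \;\le\; \frac{1}{K^{\varepsilon}} \sup_m E\big[|X_m|^{1+\varepsilon}\big]
  \;\xrightarrow[K \to \infty]{}\; 0 .
```

The bound on the right does not depend on n, so the supremum over n of the tail integrals also vanishes as K grows.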

Equivalent Conditions

Several equivalent characterizations of uniform integrability exist:

  1. Tail condition: $\lim_{K \to \infty} \sup_n E\big[|X_n| \cdot \mathbf{1}_{\{|X_n| > K\}}\big] = 0$
  2. De la Vallée-Poussin: There exists a convex increasing function $\phi$ with $\lim_{x \to \infty} \phi(x)/x = \infty$ such that $\sup_n E[\phi(|X_n|)] < \infty$
  3. Bounded + tight tails: $\sup_n E[|X_n|] < \infty$ and for all $\varepsilon > 0$, there exists $\delta > 0$ such that $P(A) < \delta$ implies $E[|X_n| \cdot \mathbf{1}_A] < \varepsilon$ uniformly in n

Interactive: Uniform Integrability

[Interactive figure: bar chart of the tail integral $E\big[|X_n| \cdot \mathbf{1}_{\{|X_n| > K\}}\big]$ for each n, plotted against the sequence index n = 1, ..., 10, with an adjustable threshold K. Panel shown: "NOT Uniformly Integrable: tail integrals grow with n, not bounded uniformly".]

What you're seeing: The bars show the "tail mass" for each $X_n$. For uniform integrability, the tallest bar must shrink to zero as K increases, so that one choice of K controls the tails for every n at once.


Vitali Convergence Theorem

The Theorem

Vitali Convergence Theorem

Let $X_n \xrightarrow{P} X$ (convergence in probability), with each $X_n$ integrable. Then the following are equivalent:

  1. $X_n \xrightarrow{L^1} X$ (L¹ convergence)
  2. $\{X_n\}$ is uniformly integrable

The Power of Vitali: This theorem tells us exactly when convergence in probability upgrades to L¹ convergence. L¹ convergence in turn lets us swap limits and expectations, because $|E[X_n] - E[X]| \le E[|X_n - X|]$. Convergence in probability plus uniform integrability equals L¹ convergence.

Proof Sketch
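What follows is a sketch of one direction only (convergence in probability plus UI implies L¹ convergence), reconstructed from the standard argument rather than taken from the original page. Fix $\varepsilon > 0$ and split the expectation on the event where $X_n$ is far from $X$:

```latex
\begin{align*}
E\big[|X_n - X|\big]
  &\le \varepsilon + E\big[|X_n - X|\,\mathbf{1}_{A_n}\big],
  \qquad A_n = \{|X_n - X| > \varepsilon\} \\
  &\le \varepsilon + E\big[|X_n|\,\mathbf{1}_{A_n}\big] + E\big[|X|\,\mathbf{1}_{A_n}\big].
\end{align*}
```

Convergence in probability gives $P(A_n) \to 0$. By the third equivalent condition for UI, $E[|X_n| \mathbf{1}_{A_n}]$ is then small uniformly in n. The limit $X$ is integrable (apply Fatou's lemma along an almost surely convergent subsequence), so $E[|X| \mathbf{1}_{A_n}] \to 0$ as well. Hence $\limsup_n E[|X_n - X|] \le \varepsilon$ for every $\varepsilon > 0$, which is L¹ convergence. The converse direction requires a separate argument and is not sketched here.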


Connection to Dominated Convergence

The Vitali Convergence Theorem generalizes the famous Dominated Convergence Theorem (DCT). Recall the DCT:

Dominated Convergence Theorem

If $X_n \to X$ almost surely, $|X_n| \le Y$ for all n, and $E[Y] < \infty$, then:

$$E[X_n] \to E[X]$$

The connection: If |Xn| ≤ Y with E[Y] < ∞, then the family {Xn} is automatically uniformly integrable. So DCT is a special case of Vitali!
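The claim that domination implies UI takes one line to check. Since $|X_n| \le Y$, the event $\{|X_n| > K\}$ is contained in $\{Y > K\}$, so:

```latex
\sup_n E\big[|X_n|\,\mathbf{1}_{\{|X_n| > K\}}\big]
  \;\le\; E\big[Y\,\mathbf{1}_{\{Y > K\}}\big]
  \;\xrightarrow[K \to \infty]{}\; 0
  \qquad \text{because } E[Y] < \infty .
```

The right-hand side involves only the single integrable variable $Y$, which is why one dominating function controls the whole family.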

| Aspect | Dominated Convergence | Vitali Convergence |
| --- | --- | --- |
| Convergence mode | Almost sure | In probability |
| Tail control | Dominated by integrable Y | Uniform integrability |
| Generality | Special case | More general |
| Verification | Find dominating Y | Check UI conditions |

Examples and Counterexamples

Example: When Uniform Integrability Fails

Let $X_n = n \cdot \mathbf{1}_{[0, 1/n]}$ on $[0,1]$ with Lebesgue measure.

  • $E[X_n] = n \cdot (1/n) = 1$ for all n
  • $X_n \to 0$ almost surely (and in probability)
  • But $\lim_n E[X_n] = 1 \ne 0 = E[0]$

What Went Wrong?

The sequence is NOT uniformly integrable. As n grows, the mass concentrates on a smaller set but with larger values. The tail integral $E\big[X_n \cdot \mathbf{1}_{\{X_n > K\}}\big]$ equals 1 whenever $n > K$, so $\sup_n E\big[X_n \cdot \mathbf{1}_{\{X_n > K\}}\big] = 1$ for every K and the tails do not vanish uniformly.
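The tail integrals in this counterexample can be computed exactly, which makes the failure easy to see. A minimal sketch (the function names are illustrative, not from the text); it uses the fact that $X_n$ takes the value n on a set of measure 1/n and 0 elsewhere:

```python
from fractions import Fraction


def tail_integral(n: int, K: int) -> Fraction:
    """Exact E[X_n * 1{X_n > K}] for X_n = n * 1_[0, 1/n] on [0, 1]."""
    if n > K:
        # The whole mass sits at height n > K: n * (1/n) = 1.
        return Fraction(n) * Fraction(1, n)
    # X_n never exceeds K, so the tail is empty.
    return Fraction(0)


def sup_tail_integral(K: int, n_max: int) -> Fraction:
    """Supremum of the tail integral over n = 1, ..., n_max."""
    return max(tail_integral(n, K) for n in range(1, n_max + 1))


for K in (1, 10, 100, 1000):
    # However large K is, some n > K still carries tail mass 1.
    print(K, sup_tail_integral(K, n_max=10 * K))
```

Raising K only pushes the problem to larger n; the supremum never drops below 1, so the limit in the UI definition is 1 rather than 0.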

Example: When Uniform Integrability Holds

Let $X_n = X \cdot \mathbf{1}_{\{|X| \le n\}}$ where $E[|X|] < \infty$. Then:

  • Xn → X almost surely
  • |Xn| ≤ |X| (dominated!)
  • Therefore uniformly integrable, and E[Xn] → E[X]
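For the truncation example, the L¹ error is $E[|X_n - X|] = E\big[|X| \cdot \mathbf{1}_{\{|X| > n\}}\big]$, which can be estimated by simulation. The sketch below assumes $X \sim \text{Exp}(1)$ purely for illustration (the text leaves the distribution of $X$ open); in that case the exact value is $(n+1)e^{-n}$. The helper name `truncation_l1_error` is hypothetical.

```python
import math
import random


def truncation_l1_error(n: int, draws: list) -> float:
    """Estimate E|X_n - X| where X_n = X * 1{|X| <= n}."""
    # |X_n - X| equals |X| on {|X| > n} and 0 elsewhere.
    return sum(abs(x) for x in draws if abs(x) > n) / len(draws)


rng = random.Random(0)
draws = [rng.expovariate(1.0) for _ in range(100_000)]  # assumed X ~ Exp(1)

for n in (1, 2, 4, 8):
    exact = (n + 1) * math.exp(-n)  # E[X * 1{X > n}] for Exp(1)
    print(n, round(truncation_l1_error(n, draws), 4), round(exact, 4))
```

Both columns should decay towards 0 together, which is the L¹ convergence that domination by $|X|$ guarantees.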

Machine Learning Applications

Uniform integrability appears in several ML contexts:

  • Loss function convergence: When training converges in probability (θn → θ*) and the loss L is continuous, uniform integrability of the family {L(θn)} ensures E[L(θn)] → E[L(θ*)]
  • Gradient estimator bounds: Proving that SGD gradients have bounded expectations requires UI-like conditions
  • Regularization effects: L² regularization often provides the dominating bound needed for DCT/Vitali
  • Asymptotic MLE theory: The consistency of likelihood-based estimators often requires uniform integrability of the score function

Python Implementation

uniform_integrability.py

```python
import numpy as np
from typing import List, Optional


def check_uniform_integrability(
    samples_list: List[np.ndarray],
    K_values: Optional[np.ndarray] = None,
    rel_tol: float = 0.25,
) -> dict:
    """
    Heuristic finite-sample check of uniform integrability.

    For UI: lim_{K->inf} sup_n E[|X_n| * 1_{|X_n|>K}] = 0

    Args:
        samples_list: List of sample arrays, one per X_n
        K_values: Increasing threshold values to test
        rel_tol: The family is flagged as UI when the sup tail integral at
            the largest K is below rel_tol * sup_n E[|X_n|]

    Returns:
        dict with K values, sup tail integrals, and the heuristic flag
    """
    if K_values is None:
        # Auto-select K values based on data range
        all_data = np.concatenate(samples_list)
        K_values = np.linspace(0, np.percentile(np.abs(all_data), 99), 20)

    results = {"K": K_values, "sup_tail_integral": []}

    for K in K_values:
        tail_integrals = []
        for samples in samples_list:
            abs_samples = np.abs(samples)
            # Empirical E[|X_n| * 1{|X_n| > K}]
            tail_integrals.append(np.mean(abs_samples * (abs_samples > K)))
        results["sup_tail_integral"].append(max(tail_integrals))

    # A fixed absolute cutoff is unreliable: with K capped at the pooled 99th
    # percentile, about 1% of the samples always lie above the largest K.
    # Compare against the overall scale sup_n E[|X_n|] instead.
    sup_mean_abs = max(np.mean(np.abs(s)) for s in samples_list)
    results["is_UI"] = bool(
        results["sup_tail_integral"][-1] < rel_tol * sup_mean_abs
    )

    return results


def demonstrate_vitali_theorem():
    """
    Demonstrate Vitali: P-convergence + UI => L1-convergence

    Finite samples from finitely many X_n can only suggest, not prove,
    whether the full family is uniformly integrable.
    """
    np.random.seed(42)

    # Example 1: UI holds (bounded sequence)
    print("Example 1: Bounded sequence (UI holds)")
    samples_ui = [np.random.uniform(-1, 1, 1000) * (1 - 1/(n+1))
                  for n in range(1, 20)]
    result_ui = check_uniform_integrability(samples_ui)
    print(f"  Uniformly Integrable: {result_ui['is_UI']}")
    print(f"  Tail integral at max K: {result_ui['sup_tail_integral'][-1]:.6f}")

    # Example 2: UI fails (escaping mass)
    print("\nExample 2: Escaping mass (UI fails)")
    samples_not_ui = []
    for n in range(1, 20):
        # Most samples are 0, but a shrinking fraction takes values of order n
        samples = np.zeros(1000)
        num_large = max(1, 1000 // (n + 1))
        samples[:num_large] = n * np.random.exponential(1, num_large)
        np.random.shuffle(samples)
        samples_not_ui.append(samples)

    result_not_ui = check_uniform_integrability(samples_not_ui)
    print(f"  Uniformly Integrable: {result_not_ui['is_UI']}")
    print(f"  Tail integral at max K: {result_not_ui['sup_tail_integral'][-1]:.6f}")


if __name__ == "__main__":
    demonstrate_vitali_theorem()
```

Common Mistakes to Avoid

Mistake 1: Assuming E[Xn] → E[X] automatically

Reality: Convergence in probability does NOT imply convergence of expectations. You need uniform integrability!

Mistake 2: Confusing UI with bounded expectations

Reality: $\sup_n E[|X_n|] < \infty$ is necessary but NOT sufficient for UI. You also need tail control.

Correct Approach

To prove L¹ convergence, verify (1) convergence in probability, AND (2) uniform integrability (often via a dominating function or moment bound).


Practice Problems


Summary

  • L¹ convergence means E[|Xn - X|] → 0, implying convergence of expectations
  • Uniform integrability ensures tail contributions can be made uniformly small across all n by taking the threshold K large
  • Vitali's Theorem: Convergence in probability + UI ⟺ L¹ convergence
  • Dominated Convergence is a special case where domination implies UI
  • In ML: UI conditions justify swapping limits and expectations in loss function analysis

Key Takeaway

When you want to prove E[Xn] → E[X], don't just check pointwise or probability convergence. Ask: "Are the tails under control?" Uniform integrability is your answer.
