Learning Objectives
By the end of this section, you will be able to:
- Define L¹ convergence and explain how it differs from L² convergence and convergence in probability
- State the definition of uniform integrability and recognize when a family of random variables is uniformly integrable
- Apply the Vitali Convergence Theorem to determine when convergence in probability implies L¹ convergence
- Connect uniform integrability to the Dominated Convergence Theorem and understand the relationship
- Recognize applications in machine learning where uniform integrability conditions are important
Why This Matters for AI/ML Engineers: Understanding when you can interchange limits and expectations is crucial for analyzing training dynamics, proving convergence of loss functions, and justifying asymptotic approximations in optimization algorithms.
The Story: When L¹ Convergence Works
We've studied convergence in probability (Section 9.1) and L² convergence (Section 9.4). But there's another important mode: L¹ convergence, also called convergence in mean.
A natural question arises: if $X_n \xrightarrow{P} X$, does it follow that $E[X_n] \to E[X]$? In other words, can we swap limits and expectations?
The answer is: not always! The key condition that makes this work is uniform integrability, a property ensuring that the tail contributions to the expectations cannot escape to infinity as n grows.
L¹ (Mean) Convergence
Formal Definition
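A sequence $\{X_n\}$ of integrable random variables converges to an integrable random variable $X$ in L¹ (or in mean) if:
$$\lim_{n\to\infty} E\big[|X_n - X|\big] = 0.$$
We write: $X_n \xrightarrow{L^1} X$. Because $|E[X_n] - E[X]| \le E[|X_n - X|]$, L¹ convergence implies $E[X_n] \to E[X]$.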
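As a minimal numerical sketch of the definition, assume the hypothetical sequence Xₙ = X + Z/n with X and Z independent standard normals. Then E[|Xₙ − X|] = E[|Z|]/n exactly, with E[|Z|] = √(2/π) ≈ 0.798, so the Monte Carlo estimates below should shrink like 1/n.

```python
import numpy as np


def l1_distance_estimates(ns, num_samples=100_000, seed=0):
    """Monte Carlo estimates of E[|X_n - X|] for the toy sequence X_n = X + Z/n.

    X and Z are independent standard normals (an illustrative assumption,
    not part of the theory).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(num_samples)
    z = rng.standard_normal(num_samples)
    estimates = {}
    for n in ns:
        x_n = x + z / n
        # Sample mean of |X_n - X| approximates the L1 distance
        estimates[n] = float(np.mean(np.abs(x_n - x)))
    return estimates


exact_scale = np.sqrt(2 / np.pi)  # E|Z| for a standard normal
for n, est in l1_distance_estimates([1, 10, 100]).items():
    print(f"n={n:4d}  estimate={est:.4f}  exact={exact_scale / n:.4f}")
```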
L¹ vs L² Convergence
| Property | L¹ Convergence | L² Convergence |
|---|---|---|
| Definition | E[\|Xₙ − X\|] → 0 | E[(Xₙ − X)²] → 0 |
| Requires | Finite first moments, E[\|Xₙ\|] < ∞ | Finite second moments, E[Xₙ²] < ∞ |
| Norm | ‖f‖₁ = E[\|f\|] | ‖f‖₂ = √E[f²] |
| Relationship | Implied by L² convergence (Cauchy-Schwarz or Jensen) | Implies L¹ convergence |
| Completeness | L¹ is a Banach space (complete normed space) | L² is a Hilbert space (complete inner product space) |
The Hierarchy
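L² convergence implies L¹ convergence, because E[|Y|] ≤ √E[Y²] by Cauchy-Schwarz (or Jensen's inequality), but not vice versa. Both imply convergence in probability, because Markov's inequality gives P(|Xₙ − X| > ε) ≤ E[|Xₙ − X|]/ε. The relationship is:
$$X_n \xrightarrow{L^2} X \;\Longrightarrow\; X_n \xrightarrow{L^1} X \;\Longrightarrow\; X_n \xrightarrow{P} X.$$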
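A quick sanity check of the two inequalities behind this hierarchy, applied to the empirical distribution of a hypothetical error sample Y = Xₙ − X (the t-distributed sample is an arbitrary choice): Cauchy-Schwarz gives E|Y| ≤ √E[Y²], and Markov gives P(|Y| > ε) ≤ E|Y|/ε. Both hold exactly for any sample, because each is an inequality between expectations under the empirical measure.

```python
import numpy as np


def mode_hierarchy_check(y, eps):
    """Return (P(|Y|>eps), E|Y|, sqrt(E[Y^2])) under the empirical distribution of y.

    y plays the role of the error X_n - X (illustrative assumption).
    """
    y = np.asarray(y, dtype=float)
    prob_far = float(np.mean(np.abs(y) > eps))
    l1 = float(np.mean(np.abs(y)))
    l2 = float(np.sqrt(np.mean(y ** 2)))
    return prob_far, l1, l2


rng = np.random.default_rng(1)
errors = rng.standard_t(df=3, size=50_000) / 10  # heavy-ish tailed toy errors
p, l1, l2 = mode_hierarchy_check(errors, eps=0.1)
# Markov: P(|Y| > eps) <= E|Y| / eps ; Cauchy-Schwarz: E|Y| <= sqrt(E[Y^2])
print(f"P(|Y|>0.1)={p:.4f} <= E|Y|/0.1={l1 / 0.1:.4f}")
print(f"E|Y|={l1:.4f} <= sqrt(E[Y^2])={l2:.4f}")
```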
Uniform Integrability
Definition and Intuition
The key concept that bridges convergence in probability and L¹ convergence is uniform integrability (UI).
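A family of random variables $\{X_n\}_{n \ge 1}$ is uniformly integrable if:
$$\lim_{K\to\infty}\, \sup_{n}\, E\big[|X_n|\,\mathbf{1}_{\{|X_n| > K\}}\big] = 0.$$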
In words: The tail contributions to the expectations can be made uniformly small across all n by choosing K large enough.
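To make the definition concrete, take the hypothetical family Xₙ ~ Exponential with mean mₙ = 1 + 1/n. Direct integration gives E[Xₙ·1{Xₙ>K}] = (K + mₙ)e^{−K/mₙ}, which is increasing in the mean, so the supremum over n is attained at n = 1 (mean 2). The sketch evaluates sup over n ≤ N for several K and shows it shrinking toward 0, as the definition requires.

```python
import math


def exp_tail_integral(K, mean):
    """Exact E[X * 1{X > K}] for X ~ Exponential with the given mean (K >= 0)."""
    return (K + mean) * math.exp(-K / mean)


def sup_tail(K, N=1000):
    """sup over n = 1..N of the tail integral for the family with means 1 + 1/n.

    The family itself is an illustrative assumption.
    """
    return max(exp_tail_integral(K, 1 + 1 / n) for n in range(1, N + 1))


for K in [0, 5, 10, 20]:
    print(f"K={K:3d}  sup_n tail = {sup_tail(K):.6f}")
```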
Intuitive Understanding
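Uniform integrability means that no member of the family can hide a non-negligible share of its expectation in its tail. Each integrable Xₙ on its own has a tail integral that vanishes as K → ∞; UI demands that a single K work for all n at once, so the contribution from large values must become small uniformly, not merely stay bounded. Common sufficient conditions:
- If the Xₙ are uniformly bounded, |Xₙ| ≤ M for all n, they are UI (the tail integral is 0 for K ≥ M)
- If there exists Y with E[|Y|] < ∞ and |Xₙ| ≤ Y for all n, they are UI (dominated)
- If supₙ E[|Xₙ|^{1+ε}] < ∞ for some ε > 0, they are UI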
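The moment condition follows from a pointwise bound: on the event {|x| > K}, |x| = |x|^{1+ε}/|x|^ε ≤ |x|^{1+ε}/K^ε, so E[|Xₙ|·1{|Xₙ|>K}] ≤ supₘ E[|Xₘ|^{1+ε}]/K^ε → 0. The sketch checks this for a hypothetical family Xₙ = (1 + 1/n)·Z with Z standard normal and ε = 1; because the bound holds sample by sample, it also holds exactly for the empirical averages.

```python
import numpy as np


def tail_and_moment_bound(samples, K, eps):
    """Empirical tail integral E[|X| 1{|X|>K}] and the bound E[|X|^(1+eps)] / K^eps."""
    a = np.abs(samples)
    tail = float(np.mean(a * (a > K)))
    bound = float(np.mean(a ** (1 + eps)) / K ** eps)
    return tail, bound


rng = np.random.default_rng(2)
z = rng.standard_normal(100_000)
# Illustrative family: sup_n E[X_n^2] <= 4, so the eps = 1 moment condition holds
family = [(1 + 1 / n) * z for n in range(1, 21)]

for K in [1.0, 2.0, 4.0]:
    pairs = [tail_and_moment_bound(x, K, eps=1.0) for x in family]
    sup_t = max(t for t, _ in pairs)
    sup_b = max(b for _, b in pairs)
    print(f"K={K}: sup tail={sup_t:.4f} <= sup bound={sup_b:.4f}")
```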
Equivalent Conditions
Several equivalent characterizations of uniform integrability exist:
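- Tail condition: $\lim_{K\to\infty} \sup_n E\big[|X_n|\,\mathbf{1}_{\{|X_n| > K\}}\big] = 0$
- De la Vallée-Poussin: There exists a convex increasing function φ with $\lim_{x\to\infty} \varphi(x)/x = \infty$ such that $\sup_n E[\varphi(|X_n|)] < \infty$ (taking $\varphi(x) = x^{1+\varepsilon}$ recovers the moment condition above)
- Bounded in L¹ + uniformly absolutely continuous: $\sup_n E[|X_n|] < \infty$, and for every ε > 0 there exists δ > 0 such that $P(A) < \delta$ implies $E[|X_n|\,\mathbf{1}_A] < \varepsilon$ for all n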
Figure: Uniform integrability (interactive). Tail integral E[|Xₙ|·1{|Xₙ|>K}] plotted against the sequence index n for an adjustable threshold K. Each bar is the tail mass of one Xₙ; for uniform integrability the tallest bar must shrink to 0 as K increases, not merely stay bounded.
Vitali Convergence Theorem
The Theorem
Let $X_n \xrightarrow{P} X$ (convergence in probability), where each $X_n$ is integrable. Then the following are equivalent:
- $X_n \xrightarrow{L^1} X$ (L¹ convergence)
- $\{X_n\}$ is uniformly integrable
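The Power of Vitali: This theorem tells us exactly when a sequence that converges in probability also converges in L¹. Given convergence in probability, uniform integrability is precisely the extra condition needed for L¹ convergence, which in turn licenses swapping limits and expectations.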
Proof Sketch
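One standard argument for the direction (UI ⟹ L¹ convergence), sketched under the theorem's assumptions. It truncates at level K using the clipping map ψ_K(x) = max(−K, min(x, K)), a notation introduced here only for the sketch; ψ_K is 1-Lipschitz and satisfies |x − ψ_K(x)| ≤ |x|·1{|x|>K}.

```latex
% Step 0: X is integrable. UI implies \sup_n E|X_n| < \infty, and some
% subsequence X_{n_k} \to X almost surely, so by Fatou's lemma
E|X| \le \liminf_{k} E|X_{n_k}| \le \sup_n E|X_n| < \infty .

% Step 1: split the error with the clipping map \psi_K
|X_n - X| \le |X_n - \psi_K(X_n)| + |\psi_K(X_n) - \psi_K(X)| + |\psi_K(X) - X| .

% Step 2: take expectations, using |x - \psi_K(x)| \le |x|\,\mathbf{1}_{\{|x| > K\}}
E|X_n - X| \le \sup_m E\big[|X_m|\,\mathbf{1}_{\{|X_m| > K\}}\big]
             + E|\psi_K(X_n) - \psi_K(X)|
             + E\big[|X|\,\mathbf{1}_{\{|X| > K\}}\big] .

% Step 3: for any \delta > 0 the middle term is at most
% \delta + 2K\,P(|X_n - X| > \delta) \to \delta, so it tends to 0. Hence
\limsup_{n\to\infty} E|X_n - X| \le \sup_m E\big[|X_m|\,\mathbf{1}_{\{|X_m| > K\}}\big]
                                   + E\big[|X|\,\mathbf{1}_{\{|X| > K\}}\big]
  \xrightarrow[K \to \infty]{} 0 .
```

For the converse (L¹ convergence ⟹ UI), one shows uniform absolute continuity: E[|Xₙ|·1_A] ≤ E[|Xₙ − X|] + E[|X|·1_A], where the first term is small for all n beyond some N, and the finitely many remaining Xₙ are each integrable.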
Connection to Dominated Convergence
The Vitali Convergence Theorem generalizes the famous Dominated Convergence Theorem (DCT). Recall the DCT:
If Xₙ → X almost surely, |Xₙ| ≤ Y for all n, and E[Y] < ∞, then:
$$\lim_{n\to\infty} E[X_n] = E[X].$$
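The connection: if |Xₙ| ≤ Y with E[Y] < ∞, then E[|Xₙ|·1{|Xₙ|>K}] ≤ E[Y·1{Y>K}], which does not depend on n and tends to 0 as K → ∞, so the family {Xₙ} is automatically uniformly integrable. Since almost sure convergence also implies convergence in probability, DCT is a special case of Vitali!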
| Aspect | Dominated Convergence | Vitali Convergence |
|---|---|---|
| Convergence mode | Almost sure | In probability |
| Tail control | Dominated by integrable Y | Uniform integrability |
| Generality | Special case | More general |
| Verification | Find dominating Y | Check UI conditions |
Examples and Counterexamples
Example: When Uniform Integrability Fails
Let $X_n = n \cdot \mathbf{1}_{[0,1/n]}$ on [0,1] with Lebesgue measure.
- E[Xn] = n · (1/n) = 1 for all n
- Xn → 0 almost surely (and in probability)
- But E[Xn] = 1 → 1 ≠ E[0] = 0
What Went Wrong?
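The sequence is NOT uniformly integrable. As n grows, the mass concentrates on a smaller set but with larger values. The tail integral is E[Xₙ·1{Xₙ>K}] = 1 whenever n > K, so supₙ E[Xₙ·1{Xₙ>K}] = 1 for every K and never tends to 0. Note that supₙ E[|Xₙ|] = 1 < ∞: the family is bounded in L¹ yet not UI.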
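The quantities in this counterexample can be computed exactly. A minimal sketch using exact fractions: under the uniform measure on [0,1], P(Xₙ ≠ 0) = 1/n, E[Xₙ] = 1, and since Xₙ takes only the values 0 and n, the tail integral is 1 when n > K and 0 otherwise. The helper names are hypothetical.

```python
from fractions import Fraction


def prob_nonzero(n):
    """P(X_n != 0) = Lebesgue measure of [0, 1/n]."""
    return Fraction(1, n)


def expectation(n):
    """E[X_n] = n * P(X_n = n)."""
    return n * prob_nonzero(n)


def tail_integral(n, K):
    """E[X_n * 1{X_n > K}]: X_n takes only the values 0 and n."""
    return expectation(n) if n > K else Fraction(0)


N = 1000
for K in [1, 10, 100]:
    sup_tail = max(tail_integral(n, K) for n in range(1, N + 1))
    print(f"K={K:3d}  sup_(n<=N) tail = {sup_tail}  P(X_N != 0) = {prob_nonzero(N)}")
```

However large K is chosen, the supremum stays at 1, while P(Xₙ ≠ 0) → 0 confirms convergence in probability to 0.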
Example: When Uniform Integrability Holds
Let $X_n = X \cdot \mathbf{1}_{\{|X| \le n\}}$ where E[|X|] < ∞. Then:
- Xn → X almost surely
- |Xn| ≤ |X| (dominated!)
- Therefore uniformly integrable, and E[Xn] → E[X]
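For a concrete instance, assume X is classical Pareto with scale 1 and shape α = 1.5 (density αx^{−α−1} on [1, ∞)). It has E[X] = α/(α−1) = 3 but infinite variance, so an L² argument is unavailable while domination still applies. Integrating the density gives E[Xₙ] = α/(α−1)·(1 − n^{1−α}) and E[|Xₙ − X|] = E[X·1{X>n}] = α/(α−1)·n^{1−α}; the sketch evaluates both.

```python
def truncated_mean(n, alpha=1.5):
    """E[X * 1{X <= n}] for classical Pareto(scale=1, shape=alpha), alpha > 1, n >= 1."""
    return alpha / (alpha - 1) * (1 - n ** (1 - alpha))


def l1_gap(n, alpha=1.5):
    """E|X_n - X| = E[X * 1{X > n}] for the same (assumed) Pareto law."""
    return alpha / (alpha - 1) * n ** (1 - alpha)


full_mean = 1.5 / (1.5 - 1)  # E[X] = 3
for n in [1, 10, 100, 10_000]:
    print(f"n={n:6d}  E[X_n]={truncated_mean(n):.4f}  gap={l1_gap(n):.4f}  E[X]={full_mean}")
```

The gap decays only like n^{−1/2} here, a reminder that DCT guarantees convergence but says nothing about its rate.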
Machine Learning Applications
Uniform integrability appears in several ML contexts:
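- Loss function convergence: When training converges (θₙ → θ*) and the loss L(θ, Z) is continuous in θ, L(θₙ, Z) → L(θ*, Z) pointwise; uniform integrability of {L(θₙ, Z)} then ensures E[L(θₙ, Z)] → E[L(θ*, Z)]
- Gradient estimator bounds: SGD analyses commonly assume a bounded second moment, supₜ E[‖gₜ‖²] < ∞, for the stochastic gradients gₜ; by the moment criterion (ε = 1), this makes the gradient norms uniformly integrable
- Regularization effects: By keeping parameter iterates in a bounded region, L² regularization can supply the dominating bound needed for DCT/Vitali
- Asymptotic MLE theory: Consistency and asymptotic normality of likelihood-based estimators typically rely on domination or uniform integrability conditions on the log-likelihood and score function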
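A minimal sketch of the loss-convergence bullet, under illustrative assumptions: squared loss L(θ, Z) = (Z − θ)², data Z ~ N(μ, σ²), and a hypothetical parameter sequence θₙ = θ* + 1/n. Because θₙ stays in a bounded set, supₙ E[L(θₙ, Z)²] < ∞ (Gaussian fourth moments), so the moment criterion with ε = 1 gives UI and the expected loss must converge; here it also has the closed form σ² + (μ − θₙ)², which the Monte Carlo estimates should track.

```python
import numpy as np


def expected_squared_loss(theta, mu, sigma):
    """Closed form E[(Z - theta)^2] for Z ~ N(mu, sigma^2)."""
    return sigma ** 2 + (mu - theta) ** 2


def mc_squared_loss(theta, z):
    """Monte Carlo estimate of E[(Z - theta)^2] from samples z."""
    return float(np.mean((z - theta) ** 2))


# Illustrative data distribution and limit parameter (assumptions)
mu, sigma, theta_star = 1.0, 2.0, 1.5
rng = np.random.default_rng(3)
z = rng.normal(mu, sigma, size=200_000)  # common samples for every theta_n

for n in [1, 10, 100, 1000]:
    theta_n = theta_star + 1 / n
    print(f"n={n:5d}  closed form={expected_squared_loss(theta_n, mu, sigma):.4f}  "
          f"MC={mc_squared_loss(theta_n, z):.4f}")
print(f"limit E[L(theta*)] = {expected_squared_loss(theta_star, mu, sigma):.4f}")
```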
Python Implementation
import numpy as np
from typing import List, Optional


def check_uniform_integrability(
    samples_list: List[np.ndarray],
    K_values: Optional[np.ndarray] = None,
    tol: float = 0.01,
) -> dict:
    """
    Empirically check uniform integrability of a family of samples.

    For UI: lim_{K->inf} sup_n E[|X_n| * 1_{|X_n|>K}] = 0

    This is only a heuristic: a finite sample is always bounded, so every
    empirical tail integral vanishes once K exceeds the largest observation.
    Choose K_values on a fixed scale shared by the families being compared,
    and read the whole curve of sup-tail values, not just the final flag.

    Args:
        samples_list: List of sample arrays, one per X_n
        K_values: Increasing threshold values to test
        tol: Worst-case tail level below which the family "looks" UI

    Returns:
        dict with K values, sup_n tail integrals, and a heuristic UI flag
    """
    if K_values is None:
        # Auto-select K values based on data range
        all_data = np.concatenate(samples_list)
        K_values = np.linspace(0, np.percentile(np.abs(all_data), 99), 20)

    results = {"K": K_values, "sup_tail_integral": []}

    for K in K_values:
        tail_integrals = []
        for samples in samples_list:
            abs_samples = np.abs(samples)
            # Empirical E[|X_n| * 1{|X_n| > K}]
            tail_integrals.append(np.mean(abs_samples * (abs_samples > K)))

        # UI is a uniform statement, so keep the worst case over n
        results["sup_tail_integral"].append(max(tail_integrals))

    # Heuristic: the worst-case tail should be small at the largest K tested
    results["is_UI"] = results["sup_tail_integral"][-1] < tol

    return results


def demonstrate_vitali_theorem():
    """
    Contrast a UI family with one that is bounded in L1 but not UI.
    """
    rng = np.random.default_rng(42)
    # One fixed threshold grid, so both families are judged on the same scale
    K_values = np.linspace(0, 5, 21)

    # Example 1: UI holds (|X_n| <= 1 for every n)
    print("Example 1: Bounded sequence (UI holds)")
    samples_ui = [rng.uniform(-1, 1, 1000) * (1 - 1 / (n + 1))
                  for n in range(1, 20)]
    result_ui = check_uniform_integrability(samples_ui, K_values)
    print(f"  Uniformly Integrable: {result_ui['is_UI']}")
    print(f"  Tail integral at max K: {result_ui['sup_tail_integral'][-1]:.6f}")

    # Example 2: UI fails (escaping mass)
    print("\nExample 2: Escaping mass (UI fails)")
    samples_not_ui = []
    for n in range(1, 20):
        # Most samples are exactly 0; a fraction ~1/(n+1) are of size ~n,
        # so E[|X_n|] stays bounded (about n/(n+1)) while the mass moves
        # to ever larger values
        samples = np.zeros(1000)
        num_large = max(1, 1000 // (n + 1))
        samples[:num_large] = n * rng.exponential(1.0, num_large)
        rng.shuffle(samples)
        samples_not_ui.append(samples)

    result_not_ui = check_uniform_integrability(samples_not_ui, K_values)
    print(f"  Uniformly Integrable: {result_not_ui['is_UI']}")
    print(f"  Tail integral at max K: {result_not_ui['sup_tail_integral'][-1]:.6f}")


if __name__ == "__main__":
    demonstrate_vitali_theorem()

Common Mistakes to Avoid
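Mistake: Assuming convergence in probability implies E[Xₙ] → E[X]. Reality: Convergence in probability does NOT imply convergence of expectations (see the n·1[0,1/n] example). Uniform integrability is the standard extra condition that guarantees it.
Mistake: Assuming supₙ E[|Xₙ|] < ∞ is enough. Reality: supₙ E[|Xₙ|] < ∞ is necessary but NOT sufficient for UI; the n·1[0,1/n] example is bounded in L¹ but not UI. You also need tail control.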
To prove L¹ convergence, verify (1) convergence in probability, AND (2) uniform integrability (often via a dominating function or moment bound).
Practice Problems
Summary
- L¹ convergence means E[|Xn - X|] → 0, implying convergence of expectations
- Uniform integrability ensures tail contributions become small uniformly across all n as the threshold K grows
- Vitali's Theorem: Convergence in probability + UI ⟺ L¹ convergence
- Dominated Convergence is a special case where domination implies UI
- In ML: UI conditions justify swapping limits and expectations in loss function analysis
Key Takeaway
When you want to prove E[Xₙ] → E[X], don't just check pointwise or probability convergence. Ask: "Are the tails under control?" Uniform integrability is your answer.