Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Define L¹ convergence and explain how it differs from L² and convergence in probability
State the definition of uniform integrability and recognize when a family of random variables is uniformly integrable
Apply the Vitali Convergence Theorem to determine when convergence in probability implies L¹ convergence
Connect uniform integrability to the Dominated Convergence Theorem and understand the relationship
Recognize applications in machine learning where uniform integrability conditions are important

Why This Matters for AI/ML Engineers: Understanding when you can interchange limits and expectations is crucial for analyzing training dynamics, proving convergence of loss functions, and justifying asymptotic approximations in optimization algorithms.

The Story: When L¹ Convergence Works

We've studied convergence in probability (Section 9.1) and L² convergence (Section 9.4). But there's another important mode: L¹ convergence, also called convergence in mean.

A natural question arises: if $X_n \xrightarrow{P} X$, does it follow that $E[X_n] \to E[X]$? In other words, can we swap limits and expectations?

The answer is: not always! The key condition that makes this work is uniform integrability—a property ensuring that the tails of the distributions don't escape to infinity as n grows.

L¹ (Mean) Convergence

Formal Definition

Definition: L¹ Convergence

A sequence $X_1, X_2, \ldots$ converges to $X$ in L¹ (or in mean) if:

$$E[|X_n - X|] \to 0 \text{ as } n \to \infty$$

We write: $X_n \xrightarrow{L^1} X$
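
As a rough numerical illustration of the definition, the sketch below estimates E[|X_n - X|] by Monte Carlo for a toy sequence X_n = X + Z/n, with X and Z independent standard normals. The sequence, the function name `l1_distance`, and the sample size are assumptions chosen for illustration. For this choice E[|X_n - X|] = E[|Z|]/n = √(2/π)/n, so the estimates should shrink like 1/n.

```python
import numpy as np


def l1_distance(n: int, num_samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[|X_n - X|] for the toy sequence X_n = X + Z/n."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(num_samples)   # samples of the limit X
    z = rng.standard_normal(num_samples)   # independent noise Z
    x_n = x + z / n                        # hypothetical sequence X_n
    return float(np.mean(np.abs(x_n - x)))


if __name__ == "__main__":
    for n in (1, 10, 100):
        exact = np.sqrt(2.0 / np.pi) / n   # E|Z| / n for a standard normal Z
        print(f"n={n:4d}  estimate={l1_distance(n):.5f}  exact={exact:.5f}")
```

Because the same seed is reused for every n, the estimates differ only through the 1/n scaling, which makes the decay easy to see.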

L¹ vs L² Convergence

Property	L¹ Convergence	L² Convergence
Definition	E[|Xₙ - X|] → 0	E[(Xₙ - X)²] → 0
Requires	Finite first moment	Finite second moment
Norm	‖f‖₁ = E[|f|]	‖f‖₂ = √E[f²]
Implication	Implied by L² (Jensen)	Implies L¹ (stronger requirement)
Completeness	Complete (Banach space)	Complete (Hilbert space)

The Hierarchy

L² convergence implies L¹ convergence (by Jensen's inequality or Cauchy-Schwarz), but not vice versa. Both imply convergence in probability. The relationship is:

L² ⟹ L¹ ⟹ In Probability ⟹ In Distribution
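
The first two implications in this chain each follow from a standard one-line inequality. A sketch, assuming X_n - X has a finite second moment for the first line:

```latex
% L^2 => L^1: Jensen's inequality (equivalently, Cauchy-Schwarz against the constant 1)
E\big[|X_n - X|\big] \le \sqrt{E\big[(X_n - X)^2\big]} \to 0

% L^1 => convergence in probability: Markov's inequality, for any fixed \varepsilon > 0
P\big(|X_n - X| > \varepsilon\big) \le \frac{E\big[|X_n - X|\big]}{\varepsilon} \to 0
```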

Uniform Integrability

Definition and Intuition

The key concept that bridges convergence in probability and L¹ convergence is uniform integrability (UI).

Definition: Uniform Integrability

A family $\{X_n\}_{n \in \mathcal{I}}$ of random variables is uniformly integrable if:

$$\lim_{K \to \infty} \sup_{n \in \mathcal{I}} E[|X_n| \cdot \mathbf{1}_{\{|X_n| > K\}}] = 0$$

In words: The tail contributions to the expectations can be made uniformly small across all n by choosing K large enough.

Intuitive Understanding

Uniform integrability means that no single X_n can "hide" too much mass in its tail. However n varies, the contribution to E[|X_n|] from values larger than K can be made as small as desired by taking K large.

Three common sufficient conditions:

If all X_n take values in a common bounded interval [−M, M], they are UI
If there exists Y with E[|Y|] < ∞ and |X_n| ≤ Y for all n, they are UI (dominated)
If sup_n E[|X_n|^(1+ε)] < ∞ for some ε > 0, they are UI
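
The moment condition in the last item can be checked in one line. A sketch of the standard argument, writing C for the assumed bound on sup_n E[|X_n|^(1+ε)] (the symbol C is introduced here only for convenience):

```latex
% On the event \{|X_n| > K\} we have 1 < (|X_n|/K)^{\varepsilon}, hence
E\big[|X_n|\,\mathbf{1}_{\{|X_n| > K\}}\big]
  \le E\!\left[|X_n| \left(\frac{|X_n|}{K}\right)^{\varepsilon}\right]
  = \frac{E\big[|X_n|^{1+\varepsilon}\big]}{K^{\varepsilon}}
  \le \frac{C}{K^{\varepsilon}}

% The bound does not depend on n, so
\sup_n E\big[|X_n|\,\mathbf{1}_{\{|X_n| > K\}}\big] \le \frac{C}{K^{\varepsilon}} \to 0
\quad \text{as } K \to \infty
```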

Equivalent Conditions

Several equivalent characterizations of uniform integrability exist:

Tail condition: $\lim_{K \to \infty} \sup_n E[|X_n| \cdot \mathbf{1}_{\{|X_n| > K\}}] = 0$
De la Vallée-Poussin: There exists a convex increasing function φ with $\lim_{x \to \infty} \phi(x)/x = \infty$ such that $\sup_n E[\phi(|X_n|)] < \infty$
Bounded in L¹ + uniform absolute continuity: sup_n E[|X_n|] < ∞ and for all ε > 0, there exists δ > 0 such that P(A) < δ implies E[|X_n|·1_A] < ε for every n

Interactive: Uniform Integrability

The visualization below demonstrates uniform integrability. Adjust the threshold K and see how the tail integrals behave across the sequence.

[Interactive chart: tail integral E[|X_n| · 1_{|X_n| > K}] for each sequence index n, with controls for the threshold K (shown at 5) and the number of sequences n (shown at 10). At these settings the family is reported as NOT uniformly integrable: the tail integrals grow with n.]

What you're seeing: The bars show the "tail mass" E[|X_n| · 1_{|X_n| > K}] for each X_n. For uniform integrability, the largest of these tail integrals must shrink to zero as K grows; staying bounded is not enough. Increase K to shrink the tails, or increase n to see how tail behavior evolves.

Vitali Convergence Theorem

The Theorem

Vitali Convergence Theorem

Let $X_n \xrightarrow{P} X$ (convergence in probability), with each $X_n$ integrable. Then the following are equivalent:

$X_n \xrightarrow{L^1} X$ (L¹ convergence)
$\{X_n\}$ is uniformly integrable

The Power of Vitali: This theorem tells us exactly when we can swap limits and expectations. Convergence in probability plus uniform integrability equals L¹ convergence.

Proof Sketch
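
A standard argument for the direction "uniform integrability implies L¹ convergence" runs as follows. This is a sketch under stated assumptions, not necessarily the argument the section had in mind: it assumes X is integrable (which follows from Fatou's lemma along an almost surely convergent subsequence) and uses the characterization of uniform integrability by uniform absolute continuity.

```latex
% Fix \varepsilon > 0 and let A_n = \{ |X_n - X| > \varepsilon \}, so P(A_n) \to 0.
E\big[|X_n - X|\big]
  = E\big[|X_n - X|\,\mathbf{1}_{A_n^c}\big] + E\big[|X_n - X|\,\mathbf{1}_{A_n}\big]
  \le \varepsilon + E\big[|X_n|\,\mathbf{1}_{A_n}\big] + E\big[|X|\,\mathbf{1}_{A_n}\big]

% Uniform integrability of \{X_n\} and integrability of X make both remaining
% terms small once P(A_n) is small, so
\limsup_{n \to \infty} E\big[|X_n - X|\big] \le \varepsilon
\quad \text{for every } \varepsilon > 0 .
```

For the converse, an L¹-convergent sequence is bounded in L¹, and E[|X_n|·1_A] ≤ E[|X_n - X|] + E[|X|·1_A] controls the tails for all large n; the finitely many remaining X_n are each integrable, so they cause no trouble.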

Connection to Dominated Convergence

The Vitali Convergence Theorem generalizes the famous Dominated Convergence Theorem (DCT). Recall the DCT:

Dominated Convergence Theorem

If X_n → X almost surely, |X_n| ≤ Y for all n, and E[Y] < ∞, then:

$$E[X_n] \to E[X]$$

The connection: If |X_n| ≤ Y with E[Y] < ∞, then the family {X_n} is automatically uniformly integrable. So DCT is a special case of Vitali!
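
The reason domination gives uniform integrability is a direct comparison of tail integrals. A short sketch, using only the assumptions |X_n| ≤ Y and E[Y] < ∞ stated above:

```latex
% Since |X_n| \le Y, the event \{|X_n| > K\} is contained in \{Y > K\}, so
E\big[|X_n|\,\mathbf{1}_{\{|X_n| > K\}}\big] \le E\big[Y\,\mathbf{1}_{\{Y > K\}}\big]
\quad \text{for every } n

% The right-hand side does not depend on n and tends to 0 because E[Y] < \infty:
\sup_n E\big[|X_n|\,\mathbf{1}_{\{|X_n| > K\}}\big] \le E\big[Y\,\mathbf{1}_{\{Y > K\}}\big] \to 0
\quad \text{as } K \to \infty
```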

Aspect	Dominated Convergence	Vitali Convergence
Convergence mode	Almost sure	In probability
Tail control	Dominated by integrable Y	Uniform integrability
Generality	Special case	More general
Verification	Find dominating Y	Check UI conditions

Examples and Counterexamples

Example: When Uniform Integrability Fails

Let X_n = n·1_{[0,1/n]} on [0,1] with Lebesgue measure.

E[X_n] = n · (1/n) = 1 for all n
X_n → 0 almost surely (and in probability)
But E[X_n] = 1 for every n, so lim E[X_n] = 1 ≠ 0 = E[0]

What Went Wrong?

The sequence is NOT uniformly integrable. As n grows, the mass concentrates on a smaller set but with larger values. For any threshold K, every n > K has tail integral E[X_n·1_{X_n>K}] = 1, so sup_n E[X_n·1_{X_n>K}] = 1 for all K and the tails do not vanish uniformly.
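
The failure can be made concrete with exact values rather than simulation. The sketch below encodes the closed-form tail integral for this particular sequence; the function names `tail_integral` and `sup_tail_integral` are hypothetical helpers introduced for illustration.

```python
def tail_integral(n: int, K: float) -> float:
    """Exact E[X_n * 1{X_n > K}] for X_n = n * 1_[0, 1/n] under Lebesgue measure on [0, 1].

    X_n takes the value n on a set of measure 1/n and 0 elsewhere, so the tail
    integral is n * (1/n) = 1 when n > K and 0 otherwise.
    """
    return 1.0 if n > K else 0.0


def sup_tail_integral(K: float, max_n: int) -> float:
    """Supremum of the tail integral over n = 1, ..., max_n."""
    return max(tail_integral(n, K) for n in range(1, max_n + 1))


if __name__ == "__main__":
    for K in (1, 10, 100, 1000):
        # Taking max_n beyond K always picks up an index with tail integral 1.
        print(f"K={K:5d}  sup over n <= {10 * K}: {sup_tail_integral(K, 10 * K)}")
```

However large K is chosen, some index n > K contributes a tail integral of exactly 1, so the supremum never drops below 1.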

Example: When Uniform Integrability Holds

Let X_n = X·1_{|X| ≤ n} where E[|X|] < ∞. Then:

X_n → X almost surely
|X_n| ≤ |X| (dominated!)
Therefore uniformly integrable, and E[X_n] → E[X]
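
For a concrete check of this truncation example, the sketch below estimates E[|X_n - X|] = E[|X|·1_{|X| > n}] from samples. The choice X ~ Exp(1) and the helper name `truncation_l1_error` are assumptions made for illustration; for that distribution the exact value is (n + 1)·e^(-n).

```python
import numpy as np


def truncation_l1_error(n: int, samples: np.ndarray) -> float:
    """Estimate E[|X_n - X|] where X_n = X * 1{|X| <= n}."""
    x_n = np.where(np.abs(samples) <= n, samples, 0.0)
    return float(np.mean(np.abs(x_n - samples)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.exponential(1.0, size=500_000)   # assumed choice: X ~ Exp(1), E|X| = 1
    for n in (1, 2, 4, 8):
        exact = (n + 1) * np.exp(-n)         # E[X * 1{X > n}] for X ~ Exp(1)
        print(f"n={n}  estimate={truncation_l1_error(n, x):.5f}  exact={exact:.5f}")
```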

Machine Learning Applications

Uniform integrability appears in several ML contexts:

Loss function convergence: When training converges (θ_n → θ* in probability) and the loss L is continuous in θ, uniform integrability of L(θ_n) ensures E[L(θ_n)] → E[L(θ*)]
Gradient estimator bounds: Proving that SGD gradients have bounded expectations requires UI-like conditions
Regularization effects: L² regularization often provides the dominating bound needed for DCT/Vitali
Asymptotic MLE theory: The consistency of likelihood-based estimators often requires uniform integrability of the score function

Python Implementation

🐍uniform_integrability.py

import numpy as np
from typing import List, Optional


def check_uniform_integrability(
    samples_list: List[np.ndarray],
    K_values: Optional[np.ndarray] = None
) -> dict:
    """
    Empirical diagnostic for uniform integrability of a family of samples.

    For UI: lim_{K->inf} sup_n E[|X_n| * 1_{|X_n|>K}] = 0

    Args:
        samples_list: List of sample arrays, one per X_n
        K_values: Threshold values to test. If None, a grid up to the pooled
            99th percentile of |X_n| is used; roughly 1% of the pooled mass
            then lies above the last threshold, so pass an explicit grid
            when the "is_UI" verdict matters.

    Returns:
        dict with K values, the sup over n of the tail integral at each K,
        and a heuristic "is_UI" flag
    """
    if K_values is None:
        # Auto-select K values based on data range
        all_data = np.concatenate(samples_list)
        K_values = np.linspace(0, np.percentile(np.abs(all_data), 99), 20)

    results = {"K": K_values, "sup_tail_integral": []}

    for K in K_values:
        tail_integrals = []
        for samples in samples_list:
            # Sample estimate of E[|X_n| * 1_{|X_n| > K}]
            mask = np.abs(samples) > K
            tail_integral = np.mean(np.abs(samples) * mask)
            tail_integrals.append(tail_integral)

        results["sup_tail_integral"].append(max(tail_integrals))

    # Heuristic verdict: the sup tail integral is small at the largest K tested.
    # A finite simulation can only suggest UI, never prove it.
    results["is_UI"] = bool(results["sup_tail_integral"][-1] < 0.01)

    return results


def demonstrate_vitali_theorem():
    """
    Contrast a UI family with a non-UI family (the hypothesis that
    Vitali's theorem needs on top of convergence in probability).
    """
    np.random.seed(42)

    # Use the same explicit thresholds for both families so the verdicts are comparable
    K_grid = np.linspace(0, 5, 21)

    # Example 1: UI holds (bounded sequence, |X_n| < 1 for every n)
    print("Example 1: Bounded sequence (UI holds)")
    samples_ui = [np.random.uniform(-1, 1, 1000) * (1 - 1/(n+1))
                  for n in range(1, 20)]
    result_ui = check_uniform_integrability(samples_ui, K_grid)
    print(f"  Uniformly Integrable: {result_ui['is_UI']}")
    print(f"  Tail integral at max K: {result_ui['sup_tail_integral'][-1]:.6f}")

    # Example 2: UI fails (escaping mass)
    print("\nExample 2: Escaping mass (UI fails)")
    samples_not_ui = []
    for n in range(1, 20):
        # A fraction of about 1/(n+1) of the samples is scaled by n, so the mean
        # stays near n/(n+1) while the mass moves out to ever larger values
        samples = np.zeros(1000)
        num_large = max(1, 1000 // (n + 1))
        samples[:num_large] = n * np.random.exponential(1, num_large)
        np.random.shuffle(samples)
        samples_not_ui.append(samples)

    result_not_ui = check_uniform_integrability(samples_not_ui, K_grid)
    print(f"  Uniformly Integrable: {result_not_ui['is_UI']}")
    print(f"  Tail integral at max K: {result_not_ui['sup_tail_integral'][-1]:.6f}")


if __name__ == "__main__":
    demonstrate_vitali_theorem()

Common Mistakes to Avoid

❌

Mistake 1: Assuming E[X_n] → E[X] automatically

Reality: Convergence in probability does NOT imply convergence of expectations. You need uniform integrability!

❌

Mistake 2: Confusing UI with bounded expectations

Reality: sup_n E[|X_n|] < ∞ is necessary but NOT sufficient for UI. You also need tail control.

✅

Correct Approach

To prove L¹ convergence, verify (1) convergence in probability, AND (2) uniform integrability (often via a dominating function or moment bound).

Practice Problems

Summary

L¹ convergence means E[|X_n - X|] → 0, implying convergence of expectations
Uniform integrability ensures tail contributions vanish uniformly across all n as the threshold K → ∞
Vitali's Theorem: Convergence in probability + UI ⟺ L¹ convergence
Dominated Convergence is a special case where domination implies UI
In ML: UI conditions justify swapping limits and expectations in loss function analysis

Key Takeaway

When you want to prove E[X_n] → E[X], don't just check pointwise or probability convergence. Ask: "Are the tails under control?" Uniform integrability is your answer.
