Chapter 10

Central Limit Theorem - Proof and Intuition

Fundamental Theorems

Learning Objectives

By the end of this section, you will be able to:

  1. State the Central Limit Theorem precisely and explain each condition
  2. Understand the proof via characteristic functions at both intuitive and rigorous levels
  3. Visualize convergence to normality for various starting distributions
  4. Apply the CLT to construct confidence intervals and hypothesis tests
  5. Recognize when the CLT applies and when it fails (heavy tails, dependence)
  6. Explain the Berry-Esseen theorem and convergence rates
  7. Connect CLT to modern AI/ML applications including mini-batch gradient descent and model averaging

Prerequisites: Convergence in Distribution from Chapter 9

The CLT is fundamentally about convergence in distribution (Section 9.3). The standardized sample mean converges in distribution to N(0,1). If you haven't studied convergence modes yet, review Chapter 9 first.


The Big Picture: Why CLT Matters

"The Central Limit Theorem is perhaps the most important theorem in all of probability theory." — It explains why the normal distribution appears everywhere.

The Central Limit Theorem (CLT) answers one of the most profound questions in probability: Why is the bell curve so universal?

The answer is remarkable: when you average many independent random quantities, the result tends toward a normal distribution regardless of what the original quantities looked like, provided they have finite variance. It doesn't matter if you start with dice rolls, exponential waiting times, or any other finite-variance distribution: the suitably standardized average converges to normal.

The Central Insight

Averaging creates normality. This is why the normal distribution appears naturally whenever a quantity is the aggregate of many small, independent effects:

  • Measurement errors — Sum of many small perturbations
  • Human heights — Sum of genetic and environmental factors
  • Stock returns — Sum of many small price movements
  • Mini-batch gradients — Average of individual sample gradients

Historical Context

The CLT was developed over nearly two centuries by some of history's greatest mathematicians:

Abraham de Moivre (1733)

First discovered that the binomial distribution approaches the normal curve. He was trying to compute gambling probabilities and noticed the pattern in coin flip outcomes.

Pierre-Simon Laplace (1810)

Extended de Moivre's result to more general settings. Introduced the idea that errors in astronomical measurements average to a normal distribution.

Aleksandr Lyapunov (1901)

Proved the CLT using characteristic functions under general conditions. His conditions (the Lyapunov condition) remain important for verifying when CLT applies.

Jarl Lindeberg & Paul Lévy (1920s)

Established the most general form of CLT and the Lindeberg condition. Lévy's continuity theorem connected characteristic function convergence to distribution convergence.

Why So Many Contributors?

Each mathematician tackled the CLT under progressively weaker assumptions. De Moivre's version required identical coin flips; Lindeberg's version allows different distributions for each random variable! The journey from special case to general theorem took 200 years of mathematical development.


Formal Statement of the Central Limit Theorem

Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed (i.i.d.) random variables with:

  • Mean: $\mathbb{E}[X_i] = \mu$
  • Variance: $\text{Var}(X_i) = \sigma^2$, with $0 < \sigma^2 < \infty$

Define the sample mean:

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$

And the standardized sum:

$$Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma \sqrt{n}}$$

Central Limit Theorem (Lindeberg-Lévy)

As $n \to \infty$, the standardized sum converges in distribution to a standard normal:

$$Z_n \xrightarrow{d} N(0, 1)$$

Equivalently, for any real numbers $a < b$:

$$\lim_{n \to \infty} P(a \leq Z_n \leq b) = \Phi(b) - \Phi(a)$$

where $\Phi$ is the standard normal CDF.

Unpacking the Statement

| Component | Meaning | Why It Matters |
|---|---|---|
| i.i.d. | Independent, identically distributed | Each observation is drawn from the same distribution without affecting the others |
| Finite variance | $\sigma^2 < \infty$ | Heavy-tailed distributions with infinite variance (like the Cauchy) fall outside the CLT |
| Standardization | $(\bar{X}_n - \mu)/(\sigma/\sqrt{n})$ | Centers at 0 and scales to variance 1 |
| Convergence in distribution | CDFs converge pointwise (at continuity points of the limit) | Weaker than the almost sure convergence in the strong LLN |
| $\sqrt{n}$ in the denominator | The standard error shrinks like $1/\sqrt{n}$ | This is the rate at which $\bar{X}_n$ concentrates around $\mu$ |

Finite Variance is Crucial!

The CLT fails for heavy-tailed distributions with infinite variance. For example, the Cauchy distribution (the ratio of two independent standard normals) has no mean and no variance, and the average of n i.i.d. standard Cauchy random variables is again standard Cauchy, so it never converges to a normal!
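A quick simulation makes the Cauchy failure concrete. This is a minimal sketch, assuming NumPy's `default_rng` with an arbitrary seed; it compares the spread of sample means for standard Cauchy data and standard normal data. Because a Cauchy variable has a finite interquartile range (IQR) even though its variance is infinite, the IQR is used as the measure of spread.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility
n_experiments = 2_000

def iqr_of_means(sampler, n):
    """Interquartile range of n_experiments sample means, each computed from n draws."""
    means = sampler(size=(n_experiments, n)).mean(axis=1)
    q75, q25 = np.percentile(means, [75, 25])
    return q75 - q25

iqr = {}
for n in [1, 10, 100, 1000]:
    iqr[n] = (iqr_of_means(rng.standard_cauchy, n), iqr_of_means(rng.standard_normal, n))
    print(f"n={n:5d}  IQR of Cauchy means={iqr[n][0]:.3f}  IQR of normal means={iqr[n][1]:.3f}")
```

The normal IQR shrinks like $1/\sqrt{n}$, while the Cauchy IQR stays near 2 (the IQR of a single standard Cauchy variable) at every n.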


Building Intuition: Why Does It Work?

Before diving into the proof, let's build intuition for why averaging creates bell curves. There are several complementary ways to understand this:

1. The Cancellation Argument

When you sum many random quantities, extreme values in one direction tend to be cancelled by extreme values in the opposite direction. Only the "typical" combinations survive, and there are many more ways to get average outcomes than extreme ones.

  • Extreme outcome: All 10 dice show 6 → Only 1 way
  • Average outcome: Total = 35 → Millions of combinations
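The dice counts can be checked exactly. This sketch counts, for 10 fair dice, how many of the $6^{10}$ equally likely outcomes produce each total by repeatedly convolving the single-die count vector with NumPy (one convenient choice; any polynomial multiplication would do). The helper `count_total` is a hypothetical name for illustration.

```python
import numpy as np

n_dice = 10
one_die = np.ones(6, dtype=np.int64)  # one way to roll each face 1..6

# ways[k] = number of outcomes whose total is k + n_dice (totals run from 10 to 60)
ways = np.array([1], dtype=np.int64)
for _ in range(n_dice):
    ways = np.convolve(ways, one_die)

def count_total(total):
    """Number of ways 10 dice can sum to `total`."""
    return int(ways[total - n_dice])

print("ways to total 60:", count_total(60))
print("ways to total 35:", count_total(35))
print("share of all outcomes with total 35:", count_total(35) / 6**n_dice)
```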

2. The Random Walk Analogy

Think of a random walk where each step is $X_i - \mu$. The sum $\sum(X_i - \mu)$ is your position after $n$ steps.

Each individual path is unpredictable, but the distribution of where walkers end up follows a predictable pattern: a bell curve centered at the origin with width growing like $\sigma\sqrt{n}$.
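A short simulation of the random-walk picture, assuming steps of +1 or -1 with equal probability (so $\mu = 0$ and $\sigma = 1$): the standard deviation of the final position should track $\sigma\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_walkers = 10_000

spread = {}
for n_steps in [10, 100, 400]:
    steps = rng.choice([-1, 1], size=(n_walkers, n_steps))  # mu = 0, sigma = 1
    positions = steps.sum(axis=1)                            # position after n_steps
    spread[n_steps] = positions.std()
    print(f"n={n_steps:4d}  std of final position={spread[n_steps]:.2f}  "
          f"sigma*sqrt(n)={np.sqrt(n_steps):.2f}")
```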

3. Information-Theoretic (Maximum Entropy) Perspective

The normal distribution maximizes entropy (uncertainty) among all distributions with a given mean and variance. When you average many variables:

  • The mean is preserved (linearity of expectation)
  • The variance shrinks by a factor of $n$, and standardizing rescales it back to 1
  • Higher-order shape features fade: the standardized skewness shrinks like $1/\sqrt{n}$ and the excess kurtosis like $1/n$

With the mean and variance pinned down by standardization, the result is pushed toward the maximum entropy distribution: the normal.
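The claim about shape features can be made exact in one case. Assuming Exp(1) observations, the sum of n of them is Gamma(n, 1), and the standardized mean has the same skewness and excess kurtosis as that sum. SciPy's `stats.gamma(a=n).stats(moments='sk')` returns these two numbers.

```python
import numpy as np
from scipy import stats

shape = {}
for n in [1, 4, 16, 64]:
    # Sum of n i.i.d. Exp(1) variables is Gamma(shape=n, scale=1)
    skew, excess_kurt = stats.gamma(a=n).stats(moments='sk')
    shape[n] = (float(skew), float(excess_kurt))
    print(f"n={n:3d}  skewness={shape[n][0]:.3f} (2/sqrt(n)={2 / np.sqrt(n):.3f})  "
          f"excess kurtosis={shape[n][1]:.3f} (6/n={6 / n:.3f})")
```

Skewness fades like $1/\sqrt{n}$ and excess kurtosis like $1/n$, while the standardized variance stays fixed at 1.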


Interactive CLT Simulation

Experience the CLT in action. Select any starting distribution—no matter how strange—and watch as the distribution of sample means converges to a bell curve:

[Interactive figure: Central Limit Theorem in Action. A histogram of sample means (x-axis: sample mean X̄, y-axis: frequency) is drawn with the theoretical normal curve overlaid. For the default dice-roll distribution, theory gives a mean of means of μ = 3.5 and a standard deviation of means of σ/√n ≈ 0.5401.]

Try These Experiments

  • Exponential (skewed): Watch the right tail disappear as n increases
  • Bimodal: The two peaks merge into a single bell curve!
  • Dice: Discrete becomes continuous as n grows
  • Compare n=5 vs n=30: How fast does convergence happen?
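Without the interactive widget, the same experiments can be run in a few lines. This sketch assumes three starting distributions chosen for illustration: Exp(1), a hypothetical bimodal mixture (an equal mix of N(-3, 1) and N(3, 1)), and a fair die. It reports the sample skewness and excess kurtosis of the sample means, both of which are 0 for a normal distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments = 50_000

def exponential(size):
    return rng.exponential(1.0, size=size)

def bimodal(size):
    # Hypothetical bimodal example: equal mixture of N(-3, 1) and N(3, 1)
    centers = rng.choice([-3.0, 3.0], size=size)
    return centers + rng.standard_normal(size=size)

def dice(size):
    return rng.integers(1, 7, size=size)  # fair six-sided die

summary = {}
for name, sampler in [("exponential", exponential), ("bimodal", bimodal), ("dice", dice)]:
    for n in [5, 30]:
        means = sampler((n_experiments, n)).mean(axis=1)
        summary[(name, n)] = (stats.skew(means), stats.kurtosis(means))
        skew, kurt = summary[(name, n)]
        print(f"{name:12s} n={n:2d}  skewness={skew:+.3f}  excess kurtosis={kurt:+.3f}")
```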

Proof via Characteristic Functions

The most elegant proof of the CLT uses characteristic functions. This approach, pioneered by Lyapunov and refined by Lévy, reveals the deep connection between the CLT and Fourier analysis.

Why Characteristic Functions?

The characteristic function (CF) of a random variable $X$ is $\varphi_X(t) = \mathbb{E}[e^{itX}]$. CFs have a magical property: adding independent random variables corresponds to multiplying their CFs. If $X$ and $Y$ are independent:

$$\varphi_{X+Y}(t) = \varphi_X(t) \cdot \varphi_Y(t)$$

This means the CF of a sum of i.i.d. variables is simply a power:

$$\varphi_{X_1 + \cdots + X_n}(t) = [\varphi_X(t)]^n$$
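For discrete variables the product rule can be checked exactly. Assuming two independent fair dice, this sketch computes $\varphi(t) = \sum_k p_k e^{itk}$ from the probability mass function of the total (obtained by convolution) and compares it with the product of the single-die CFs. The helper `cf_from_pmf` is a hypothetical name for illustration.

```python
import numpy as np

def cf_from_pmf(values, probs, t):
    """Characteristic function E[exp(i t X)] of a discrete distribution, evaluated at each t."""
    return np.exp(1j * np.outer(t, values)) @ probs

t = np.linspace(-5, 5, 201)
die_values = np.arange(1, 7)
die_probs = np.full(6, 1 / 6)

# Distribution of the total of two independent dice: convolve the pmfs
sum_values = np.arange(2, 13)
sum_probs = np.convolve(die_probs, die_probs)

cf_sum = cf_from_pmf(sum_values, sum_probs, t)
cf_product = cf_from_pmf(die_values, die_probs, t) ** 2

print("max |phi_{X+Y}(t) - phi_X(t) * phi_Y(t)| =", np.max(np.abs(cf_sum - cf_product)))
```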

The Proof in Four Steps

Step 1: Set Up the Standardized CF

Let $\varphi(t)$ be the CF of the centered variable $X_1 - \mu$, which has mean 0. By independence and the scaling property $\varphi_{aX}(t) = \varphi_X(at)$, the CF of the standardized sum $Z_n$ is:

$$\varphi_{Z_n}(t) = \left[\varphi\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n$$

Step 2: Taylor Expand the CF

Expand $\varphi(t)$ around $t = 0$ (valid because $\mathbb{E}[X^2] < \infty$):

$$\varphi(t) = 1 + it \cdot \mathbb{E}[X] - \frac{t^2}{2}\mathbb{E}[X^2] + o(t^2)$$

Since $\mathbb{E}[X] = 0$ (centered) and $\mathbb{E}[X^2] = \sigma^2$:

$$\varphi(t) = 1 - \frac{\sigma^2 t^2}{2} + o(t^2)$$

Step 3: Substitute and Simplify

Substituting $t/(\sigma\sqrt{n})$ for $t$:

$$\varphi\left(\frac{t}{\sigma\sqrt{n}}\right) = 1 - \frac{t^2}{2n} + o\left(\frac{1}{n}\right)$$

Raising to the $n$-th power:

$$\left[1 - \frac{t^2}{2n} + o\left(\frac{1}{n}\right)\right]^n$$

Step 4: Apply the Exponential Limit

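The famous limit $(1 + x/n)^n \to e^x$ as $n \to \infty$ extends to $(1 + x_n/n)^n \to e^x$ whenever $x_n \to x$, including complex $x_n$. Here $x_n = -t^2/2 + n \cdot o(1/n) \to -t^2/2$, so the error term vanishes in the limit:

$$\lim_{n \to \infty} \left[1 - \frac{t^2}{2n} + o\left(\frac{1}{n}\right)\right]^n = e^{-t^2/2}$$

This is exactly the characteristic function of $N(0, 1)$!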

The Finishing Touch: Lévy's Continuity Theorem

Lévy's Continuity Theorem: If the characteristic functions $\varphi_{Z_n}$ converge pointwise to a function $\varphi$ that is continuous at 0, then $\varphi$ is the CF of some distribution and $Z_n$ converges in distribution to it.

Since $e^{-t^2/2}$ is the CF of $N(0,1)$ and is continuous everywhere:

$$\varphi_{Z_n}(t) \to e^{-t^2/2} \implies Z_n \xrightarrow{d} N(0,1) \quad \blacksquare$$
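The four steps can be checked numerically. This sketch assumes the same base distribution as the interactive figure, U[-√3, √3], which has mean 0 and variance 1 (so σ = 1) and CF $\varphi(s) = \sin(\sqrt{3}s)/(\sqrt{3}s)$. It compares the exact $[\varphi(t/\sqrt{n})]^n$, the truncated Taylor form $[1 - t^2/(2n)]^n$, and the target $e^{-t^2/2}$.

```python
import numpy as np

def uniform_cf(s):
    """CF of U[-sqrt(3), sqrt(3)]: sin(sqrt(3) s) / (sqrt(3) s), equal to 1 at s = 0.
    np.sinc(x) computes sin(pi x) / (pi x), so the argument is rescaled by 1/pi."""
    return np.sinc(np.sqrt(3) * s / np.pi)

t = np.linspace(-3, 3, 601)
target = np.exp(-t**2 / 2)  # CF of N(0, 1)

max_error = {}
for n in [1, 10, 100, 1000]:
    exact = uniform_cf(t / np.sqrt(n)) ** n   # Step 1: CF of Z_n
    taylor = (1 - t**2 / (2 * n)) ** n        # Steps 2-3: drop the o(1/n) term
    max_error[n] = np.max(np.abs(exact - target))
    print(f"n={n:5d}  max|exact - target|={max_error[n]:.5f}  "
          f"max|taylor - target|={np.max(np.abs(taylor - target)):.5f}")
```

Here the exact error shrinks roughly like $1/n$ rather than $1/\sqrt{n}$, because the uniform distribution is symmetric and its third cumulant is zero.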

Interactive Proof Walkthrough

Watch the characteristic functions converge to the standard normal CF. This visualization shows the proof in action:

[Interactive figure: Central Limit Theorem via Characteristic Functions, showing [φ(t/√n)]ⁿ → e^(-t²/2) as n → ∞. The real and imaginary parts of the CF of the standardized sum Zₙ = (X₁ + ... + Xₙ - nμ)/(σ√n) are plotted against frequency t together with the target e^(-t²/2), for n from 1 to 50. Default base distribution: Uniform U[-√3, √3], a flat, bounded distribution; at n = 1 the maximum deviation from the normal CF is 0.2605.]
Why This Proves the CLT
  1. For any distribution with mean μ and finite variance σ², the standardized sum is Zₙ = (∑Xᵢ - nμ)/(σ√n)
  2. Its CF is [φ(t/√n)]ⁿ, where φ is the CF of the standardized variable (X - μ)/σ
  3. Taylor expansion: φ(t/√n) = 1 - t²/(2n) + o(1/n) as n → ∞
  4. Therefore [φ(t/√n)]ⁿ → e^(-t²/2) as n → ∞
  5. By Lévy's continuity theorem, this pointwise CF convergence implies Zₙ → N(0,1) in distribution!

Now explore the proof steps interactively. Adjust parameters to see how the convergence depends on the starting distribution and sample size:

[Interactive figure: Interactive CLT Proof Walkthrough, 7 steps. Step 1 (Setup: standardized sum) defines Z_n = (X_1 + ... + X_n - nμ)/(σ√n) for i.i.d. random variables; standardizing gives mean 0 and variance 1, which makes the result independent of the original mean and variance. The plot compares the target e^(-t²/2), the exact Re[φ(t/√n)]ⁿ, and the Taylor approximation [1 - t²/(2n)]ⁿ against frequency t; at n = 10 the maximum CF error is 0.0112.]

Convergence Rate: The Berry-Esseen Theorem

The CLT tells us that convergence happens, but not how fast. The Berry-Esseen theorem quantifies the rate:

Berry-Esseen Theorem

Let $X_1, \ldots, X_n$ be i.i.d. with mean $\mu$, variance $\sigma^2$, and finite third absolute moment $\rho = \mathbb{E}[|X - \mu|^3]$. Then:

$$\sup_x \left|F_{Z_n}(x) - \Phi(x)\right| \leq \frac{C \cdot \rho}{\sigma^3 \sqrt{n}}$$

where $F_{Z_n}$ is the CDF of $Z_n$ and $C \leq 0.4748$ (the best known upper bound on the constant in the i.i.d. case).

What This Means

  • Rate: The error bound decreases as $O(1/\sqrt{n})$. To halve it, you need 4x the samples.
  • Skewness matters: The ratio $\rho/\sigma^3$ is the standardized third absolute moment, which is at least as large as the absolute skewness. More skewed distributions converge more slowly.
  • Uniform bound: The supremum is over all $x$, so the bound controls the worst-case CDF difference.
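The bound can be compared with the actual error in a case where both are computable. This is a sketch assuming Exp(1) data: $\rho = \mathbb{E}|X - 1|^3$ is found by numerical integration, and the exact CDF of $Z_n$ is available because the sum of n Exp(1) variables is Gamma(n, 1). The supremum is approximated on a fine grid of x values, and C = 0.4748 is taken from the theorem above.

```python
import numpy as np
from scipy import integrate, stats

C = 0.4748   # Berry-Esseen constant quoted above
sigma = 1.0  # Exp(1) has mean 1 and standard deviation 1

# rho = E|X - 1|^3, integrated in two pieces around the kink of |x - 1| at x = 1
f = lambda x: abs(x - 1) ** 3 * np.exp(-x)
rho = integrate.quad(f, 0, 1)[0] + integrate.quad(f, 1, np.inf)[0]

z = np.linspace(-6, 6, 20_001)  # grid approximating the supremum over x

def actual_error(n):
    """sup_x |F_{Z_n}(x) - Phi(x)| for Z_n = (S_n - n) / sqrt(n), S_n ~ Gamma(n, 1)."""
    cdf_zn = stats.gamma.cdf(n + z * np.sqrt(n), a=n)
    return np.max(np.abs(cdf_zn - stats.norm.cdf(z)))

def be_bound(n):
    return C * rho / (sigma**3 * np.sqrt(n))

print(f"rho / sigma^3 = {rho / sigma**3:.4f}")
for n in [10, 30, 100, 300]:
    print(f"n={n:4d}  actual error={actual_error(n):.4f}  Berry-Esseen bound={be_bound(n):.4f}")
```

For Exp(1) the ratio works out to $\rho/\sigma^3 = 12/e - 2 \approx 2.41$, somewhat larger than the skewness of 2 because it uses the absolute third moment, and the bound sits several times above the actual error at each n shown.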

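[Interactive figure: CLT Convergence Rate Explorer (Berry-Esseen), illustrating sup|F_Zn(x) - Φ(x)| ≤ C·ρ/(σ³√n): the error decreases as O(1/√n), and skewed distributions converge more slowly. One panel compares the empirical CDF of Z_n with the normal CDF and the Berry-Esseen bounds (x-axis: standardized value z; y-axis: CDF); a second panel plots the empirical error and the bound against sample size n from 0 to 200. For the Exponential(1) example at n = 30, the explorer reports an empirical error of 0.0304.]

Key Insight
The Exponential(1) distribution has skewness 2, but the quantity in the Berry-Esseen bound is the absolute third moment ratio ρ/σ³ = 12/e - 2 ≈ 2.41. At n = 30 the bound is 0.4748 × 2.4146/√30 ≈ 0.21, and it falls to 5% only at n ≈ (0.4748 × 2.4146/0.05)² ≈ 526. The bound is conservative, so the actual error drops below 5% much sooner. Compare this across distributions to see how skewness affects convergence!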

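Rule of Thumb Revisited

The "n ≥ 30" rule of thumb comes from practical experience, not from Berry-Esseen. The bound is conservative: since ρ/σ³ ≥ 1 for every distribution, it never drops below 0.4748/√30 ≈ 0.087 at n = 30, even though the actual CDF error for mildly skewed distributions is usually far smaller. For highly skewed distributions (like the exponential), the actual error at n = 30 is still a few percent, and you may need n ≥ 100 for good accuracy, especially in the tails.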


Practical Implications

The CLT is the theoretical foundation for many practical statistical procedures:

1. Confidence Intervals

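For large n, a 95% confidence interval for the population mean is approximately:

$$\bar{X} \pm 1.96 \cdot \frac{s}{\sqrt{n}}$$

where $s$ is the sample standard deviation. This works because $(\bar{X} - \mu)/(s/\sqrt{n})$ is approximately $N(0,1)$: the CLT gives this result with $\sigma$ in the denominator, and replacing $\sigma$ by its consistent estimate $s$ does not change the limit (Slutsky's theorem).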
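A minimal coverage check, assuming Exp(1) data (true mean μ = 1) and n = 50 observations per sample: build the interval $\bar{X} \pm 1.96 \cdot s/\sqrt{n}$ many times and count how often it contains μ. The helper name `normal_ci` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def normal_ci(sample, z=1.96):
    """Approximate 95% CI for the mean: x_bar +/- z * s / sqrt(n)."""
    n = len(sample)
    x_bar = sample.mean()
    s = sample.std(ddof=1)  # sample standard deviation
    half_width = z * s / np.sqrt(n)
    return x_bar - half_width, x_bar + half_width

mu, n, n_experiments = 1.0, 50, 10_000
hits = 0
for _ in range(n_experiments):
    lo, hi = normal_ci(rng.exponential(mu, size=n))
    hits += int(lo <= mu <= hi)

coverage = hits / n_experiments
print(f"Empirical coverage: {coverage:.3f} (nominal 0.95)")
```

With skewed data the empirical coverage may fall somewhat short of the nominal 95% at moderate n, which is the Berry-Esseen message in practical form.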

2. Hypothesis Testing

The z-test relies directly on the CLT, and the t-test is approximately valid for non-normal data for the same reason. When testing $H_0: \mu = \mu_0$:

$$z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \stackrel{\text{approx.}}{\sim} N(0,1) \quad \text{under } H_0$$
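A small sketch of the z-test from summary statistics, assuming a two-sided alternative and made-up illustrative numbers. `z_test` is a hypothetical helper name; `scipy.stats.norm.sf` returns the upper-tail probability $1 - \Phi(z)$.

```python
import numpy as np
from scipy import stats

def z_test(x_bar, mu0, s, n):
    """Two-sided large-sample z-test of H0: mu = mu0. Returns (z, p-value)."""
    z = (x_bar - mu0) / (s / np.sqrt(n))
    p_value = 2 * stats.norm.sf(abs(z))  # P(|N(0,1)| >= |z|)
    return z, p_value

# Illustrative numbers: x_bar = 5.3, mu0 = 5, s = 1.2, n = 64
z, p = z_test(5.3, 5.0, 1.2, 64)
print(f"z = {z:.2f}, p = {p:.4f}")
```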

3. Survey Sampling

Political polls use the CLT. If 1,000 people are randomly sampled and 52% support a candidate:

$$\text{Margin of Error} \approx 1.96 \sqrt{\frac{0.52 \times 0.48}{1000}} \approx 3.1\%$$

This is the familiar "±3%" margin of error reported for polls of about 1,000 people.
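The same arithmetic in code, assuming a simple random sample and the normal-approximation formula $1.96\sqrt{\hat{p}(1 - \hat{p})/n}$. Quadrupling the sample size halves the margin, which is the $1/\sqrt{n}$ rate again.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

for n in [1000, 4000]:
    print(f"n={n}: margin of error = {100 * margin_of_error(0.52, n):.1f}%")
```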


AI/ML Applications

The CLT is deeply embedded in modern machine learning. Here's how:

1. Mini-Batch Gradient Descent

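The mini-batch gradient is an average of individual sample gradients:

$$\nabla_{\text{batch}} = \frac{1}{B} \sum_{i=1}^{B} \nabla_i$$

If the $B$ examples are sampled independently from the training set, the CLT says this is approximately normal around the full-batch gradient. Key insights:

  • Variance reduction: If $\sigma^2$ is the variance of a single-sample gradient, the mini-batch gradient has variance $\sigma^2/B$, so larger batches give more stable gradients
  • Learning rate scaling: The linear scaling rule (larger batch = proportionally larger LR) is often motivated by this $1/B$ variance: it keeps the ratio of learning rate to batch size, and with it the effective gradient-noise scale, roughly constant. It is a heuristic rather than a consequence of the CLT and can break down at very large batch sizes
  • Gradient noise: The deviation from the full-batch gradient is approximately Gaussian, which is why stochastic differential equation (SDE) analyses of SGD often model it as Gaussian noise; per-sample gradients can be heavy-tailed in practice, so this is an approximation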
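A sketch of the variance-reduction claim under assumptions chosen for illustration: a synthetic 1-D linear-regression dataset, squared loss $\frac{1}{2}(wx - y)^2$, the weight held fixed at $w = 0$, and mini-batches drawn with replacement. The variance of the mini-batch gradient is compared with $\sigma^2/B$, where $\sigma^2$ is the variance of the per-sample gradients.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic dataset (assumption for illustration): y = 2x + noise
N = 10_000
x = rng.standard_normal(N)
y = 2 * x + 0.5 * rng.standard_normal(N)

w = 0.0
per_sample_grad = (w * x - y) * x   # d/dw of 0.5 * (w*x - y)^2
sigma2 = per_sample_grad.var()      # variance of a single-sample gradient

n_batches = 5_000
ratios = {}
for B in [8, 32, 128]:
    idx = rng.integers(0, N, size=(n_batches, B))   # sample with replacement
    batch_grads = per_sample_grad[idx].mean(axis=1)
    ratios[B] = batch_grads.var() / (sigma2 / B)
    print(f"B={B:4d}  var(batch grad)={batch_grads.var():.4f}  sigma^2/B={sigma2 / B:.4f}")
```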

2. Ensemble Methods and Model Averaging

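When averaging predictions from multiple models:

$$\hat{y}_{\text{ensemble}} = \frac{1}{M} \sum_{m=1}^{M} \hat{y}_m$$

CLT explains why:

  • Bagging reduces variance: Error variance decreases as $1/M$ for independent models. Bagged models are usually positively correlated; if each has prediction variance $\sigma^2$ and the pairwise correlation is $\rho$, the average has variance $\rho\sigma^2 + (1-\rho)\sigma^2/M$, which levels off at $\rho\sigma^2$
  • Prediction intervals: The uncertainty in ensemble predictions is approximately Gaussian
  • Random forests: Bootstrap aggregating relies on this averaging effect to stabilize decision tree predictions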
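A sketch of the correlation effect, assuming each of M models' predictions has unit variance and pairwise correlation ρ, built here from a shared Gaussian component plus independent parts. The variance of the average should follow ρ + (1 - ρ)/M, which falls like 1/M only when ρ = 0.

```python
import numpy as np

rng = np.random.default_rng(5)
n_trials = 20_000

def ensemble_variance(M, rho):
    """Empirical variance of the average of M unit-variance predictions with pairwise correlation rho."""
    shared = rng.standard_normal((n_trials, 1))    # component common to all models
    own = rng.standard_normal((n_trials, M))       # model-specific component
    preds = np.sqrt(rho) * shared + np.sqrt(1 - rho) * own
    return preds.mean(axis=1).var()

for rho in [0.0, 0.3]:
    for M in [1, 10, 100]:
        theory = rho + (1 - rho) / M
        print(f"rho={rho:.1f} M={M:3d}  empirical={ensemble_variance(M, rho):.4f}  theory={theory:.4f}")
```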

3. Neural Network Initialization

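Pre-activations are sums of weighted inputs:

$$z = \sum_{i=1}^{n} w_i x_i$$

With random, independent weights and a large fan-in $n$, the CLT makes $z$ approximately Gaussian regardless of the distribution of the $x_i$. This connects to:

  • Xavier/Glorot initialization chooses the weight variance ($2/(n_{\text{in}} + n_{\text{out}})$) so that the variance of activations and gradients stays roughly constant across layers; the CLT explains why the resulting pre-activations are also approximately Gaussian
  • Batch normalization standardizes pre-activations to zero mean and unit variance (then applies a learned scale and shift); the result is close to $N(0,1)$ when the pre-activations are roughly Gaussian
  • Some analyses of weight pruning assume approximately Gaussian weight distributions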
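A sketch of the pre-activation argument under illustrative assumptions: inputs $x_i$ are sparse 0/1 values (Bernoulli with p = 0.1, far from Gaussian), and weights are drawn from a uniform distribution scaled to variance $1/n$ (also non-Gaussian). Excess kurtosis of $z$, which is 0 for a normal distribution, serves as a rough gauge of Gaussianity.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n_trials, p = 10_000, 0.1

def preactivations(n):
    """Sample z = sum_i w_i x_i for n inputs, repeated n_trials times."""
    x = (rng.random((n_trials, n)) < p).astype(float)   # Bernoulli(p) inputs
    a = np.sqrt(3.0 / n)                                 # Uniform(-a, a) has variance a^2 / 3 = 1/n
    w = rng.uniform(-a, a, size=(n_trials, n))
    return (w * x).sum(axis=1)

kurt = {}
for n in [1, 16, 256]:
    z = preactivations(n)
    kurt[n] = stats.kurtosis(z)   # Fisher definition: 0 for a normal distribution
    print(f"n={n:4d}  var(z)={z.var():.4f} (theory {p:.4f})  excess kurtosis={kurt[n]:.3f}")
```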

4. Bayesian Deep Learning

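The Laplace approximation fits a Gaussian to the posterior distribution:

$$p(\theta \mid D) \approx N(\hat{\theta}, H^{-1})$$

where $\hat{\theta}$ is the MAP estimate and $H$ is the Hessian of the negative log-posterior at $\hat{\theta}$. For large datasets, the Bernstein-von Mises theorem, a CLT-type result for posteriors, justifies this Gaussian approximation under regularity conditions.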
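A small check of the Laplace approximation in a case where the exact posterior is known. The sketch assumes a coin-flip model with a uniform Beta(1, 1) prior and 600 heads in 1,000 flips, so the exact posterior is Beta(601, 401). The approximation is centered at the MAP estimate and uses $H^{-1}$ as its variance.

```python
import numpy as np
from scipy import stats

heads, tails = 600, 400
a, b = heads + 1, tails + 1          # posterior Beta(a, b) under a Beta(1, 1) prior

theta_map = (a - 1) / (a + b - 2)    # mode of Beta(a, b)
# Second derivative of -log p(theta | D) = -(a-1) log(theta) - (b-1) log(1-theta) + const
H = (a - 1) / theta_map**2 + (b - 1) / (1 - theta_map)**2
laplace_sd = 1 / np.sqrt(H)

exact = stats.beta(a, b)
print(f"MAP = {theta_map:.4f}, exact posterior mean = {exact.mean():.4f}")
print(f"Laplace sd = {laplace_sd:.5f}, exact sd = {exact.std():.5f}")
```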


Python Implementation

Let's implement a comprehensive demonstration of the CLT with code explanations:

Central Limit Theorem Demonstration (clt_demonstration.py)

  • Import libraries: NumPy for numerical operations, SciPy for statistical functions, and Matplotlib for visualization.
  • Define a highly skewed distribution: We use the exponential distribution because it is highly right-skewed (not at all bell-shaped). This demonstrates that the CLT works even for very non-normal distributions.
  • Theoretical parameters: For Exp(lambda), the mean is 1/lambda and the variance is 1/lambda^2. We use these to verify the CLT predictions.
  • Sample sizes to test: We test n = 1, 5, 30, and 100 to see how quickly the distribution of sample means converges to normal. The rule of thumb is n >= 30, but the required size depends on the original distribution.
  • Generate sample means: For each sample size n, we run 10,000 experiments. Each experiment draws n samples and computes their mean, which gives the empirical distribution of sample means.
  • Standardize the sample means: We convert to Z-scores using the CLT formula Z = (X_bar - mu) / (sigma / sqrt(n)). By the CLT, these should approximately follow N(0,1).
  • Compare to the standard normal: We overlay the standard normal PDF on each histogram. As n increases, the histogram should match this curve increasingly well.
  • KS test for normality: The Kolmogorov-Smirnov statistic measures the largest gap between the empirical CDF of the standardized means and the N(0,1) CDF. Smaller KS statistics mean a better fit.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Define a highly skewed distribution (exponential)
lambda_param = 1.0
mu = 1 / lambda_param  # Theoretical mean
sigma = 1 / lambda_param  # Theoretical std

# Sample sizes to demonstrate CLT
sample_sizes = [1, 5, 30, 100]
n_experiments = 10000
rng = np.random.default_rng(0)  # seeded generator so the figure is reproducible

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.flatten()

for i, n in enumerate(sample_sizes):
    # Generate n_experiments sample means, each from n observations
    # (one row per experiment; NumPy parameterizes the exponential by scale = 1/lambda)
    samples = rng.exponential(1 / lambda_param, size=(n_experiments, n))
    sample_means = samples.mean(axis=1)

    # Standardize using CLT formula
    z_scores = (sample_means - mu) / (sigma / np.sqrt(n))

    # Plot histogram of standardized means
    axes[i].hist(z_scores, bins=50, density=True, alpha=0.7,
                 label=f'Sample means (n={n})', color='steelblue')

    # Overlay standard normal PDF
    x = np.linspace(-4, 4, 100)
    axes[i].plot(x, stats.norm.pdf(x), 'r-', lw=2,
                 label='N(0,1) from CLT')

    # Kolmogorov-Smirnov test against N(0,1)
    ks_stat, p_value = stats.kstest(z_scores, 'norm')
    axes[i].set_title(f'n={n}  |  KS stat={ks_stat:.4f}')
    axes[i].legend()
    axes[i].set_xlim(-4, 4)

plt.suptitle('CLT: Exponential to Normal', fontsize=14)
plt.tight_layout()
plt.show()

Verifying CLT Mathematically

Verifying the CLT Proof Steps (clt_proof_verification.py)

  • Standardized sum definition: We define Z_n as the standardized sum of the X_i. It has mean 0 and variance 1 regardless of the original distribution, as long as the mean and variance are finite.
  • CF of Z_n: The characteristic function of the standardized sum can be written in terms of the original CF. The key is the scaling property: the CF of aX evaluated at t equals the CF of X evaluated at a*t.
  • Taylor expansion: We expand the CF around t = 0. The key properties are phi(0) = 1 (normalization), phi'(0) = i*mu (mean), and phi''(0) = -E[X^2] (second moment).
  • Substitution: Substituting the Taylor expansion into [phi(t/sqrt(n))]^n gives a form that looks like (1 + x/n)^n.
  • Limit result: The limit (1 + x/n)^n -> e^x as n -> infinity gives the standard normal CF, exp(-t^2/2).
  • Lévy's continuity theorem: Pointwise convergence of CFs to a limit that is continuous at 0 implies convergence in distribution. Since exp(-t^2/2) is the CF of N(0,1), this completes the proof!
# Mathematical verification of CLT proof steps
import numpy as np

# Step 1: Define standardized sum Z_n
# (mu and sigma are passed in explicitly instead of relying on undefined globals)
def standardized_sum(samples, mu, sigma):
    """Z_n = (sum(X_i) - n*mu) / (sigma*sqrt(n))"""
    n = len(samples)
    return (np.sum(samples) - n * mu) / (sigma * np.sqrt(n))

# Step 2: CF of standardized sum is [phi(t/sqrt(n))]^n,
# where phi is the CF of the standardized variable (X - mu) / sigma
def cf_standardized_sum(t, n, original_cf):
    """Compute CF of Z_n at point(s) t"""
    scaled_t = t / np.sqrt(n)
    return original_cf(scaled_t) ** n

# Step 3: Taylor expansion shows phi(s) ≈ 1 - s^2/2 near s = 0
# For Exp(1), mu = sigma = 1, so X - 1 is already standardized:
# phi(t) = exp(-it) / (1 - it)
def exp_cf(t):
    """CF of standardized Exp(1), i.e. of X - 1"""
    return np.exp(-1j * t) / (1 - 1j * t)

# Step 4: Limit gives standard normal CF
def normal_cf(t):
    """CF of N(0,1): exp(-t^2/2)"""
    return np.exp(-t**2 / 2)

# Demonstrate convergence (the CFs are vectorized, so the whole grid is passed at once)
t_vals = np.linspace(-3, 3, 100)
n_values = [1, 5, 10, 50, 100]

for n in n_values:
    cf_zn = cf_standardized_sum(t_vals, n, exp_cf)
    max_diff = np.max(np.abs(cf_zn - normal_cf(t_vals)))
    print(f"n={n:3d}: Max |CF(Z_n) - CF(N(0,1))| = {max_diff:.6f}")

# Step 1 applied to one simulated Exp(1) sample of size 100 (seeded for reproducibility)
rng = np.random.default_rng(0)
print("One draw of Z_100:", standardized_sum(rng.exponential(1.0, 100), mu=1.0, sigma=1.0))

Common Misconceptions

Misconception 1: CLT Makes Everything Normal

Wrong: "If I have 30 samples, my data is normally distributed."

Right: The CLT applies to the sampling distribution of the mean, not the data itself. Your original data retains its original distribution. Only the distribution of $\bar{X}$ across many samples becomes approximately normal.

Misconception 2: Bigger n Always Means Better Approximation

Nuance: While larger n improves CLT approximation, the rate depends on the original distribution. A symmetric distribution with light tails may converge quickly (n=5 sufficient), while a heavily skewed or heavy-tailed distribution may need n=100 or more.

Misconception 3: CLT Requires Normal Starting Distribution

Completely wrong! The beauty of CLT is that it works for any distribution with finite variance. The starting distribution can be discrete, continuous, skewed, multimodal—it doesn't matter!

Misconception 4: Independence Can Be Ignored

Wrong: Many real-world scenarios violate independence. Time series data, spatial data, and clustered data all have dependence structures that can break CLT. Special versions (like the CLT for dependent variables) may apply, but the standard CLT requires independence.

When CLT Fails

CLT can fail when:

  • Infinite variance: Cauchy and other non-Gaussian stable distributions
  • Strong dependence: Highly correlated observations
  • Non-stationary: Distributions changing over time
  • Small n with extreme skewness: Need more samples
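A sketch of the dependence failure mode, assuming an AR(1) process $X_t = \phi X_{t-1} + \varepsilon_t$ with $\phi = 0.8$ and standard normal innovations, started from its stationary distribution. Here a CLT for dependent data still applies, but the i.i.d. formula $\sigma^2/n$ badly understates the variance of the sample mean: for this process the true variance is larger by a factor of roughly $(1 + \phi)/(1 - \phi) = 9$.

```python
import numpy as np

rng = np.random.default_rng(2)
phi, n, n_series = 0.8, 1_000, 2_000

# Simulate n_series independent AR(1) paths of length n
eps = rng.standard_normal((n_series, n))
x = np.empty((n_series, n))
x[:, 0] = eps[:, 0] / np.sqrt(1 - phi**2)   # stationary start: Var(X) = 1 / (1 - phi^2)
for t in range(1, n):
    x[:, t] = phi * x[:, t - 1] + eps[:, t]

means = x.mean(axis=1)
var_x = 1 / (1 - phi**2)          # marginal variance of X_t
naive_var = var_x / n             # what the i.i.d. formula sigma^2 / n predicts
inflation = means.var() / naive_var
print(f"Var(mean) / (sigma^2 / n) = {inflation:.2f}  "
      f"(i.i.d. would give 1; (1+phi)/(1-phi) = {(1 + phi) / (1 - phi):.0f})")
```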

Test Your Understanding


What does the Central Limit Theorem say about the distribution of sample means?


Summary

The Central Limit Theorem is one of the most profound results in probability theory. It explains why the normal distribution appears so frequently in nature and provides the theoretical foundation for statistical inference.

Key Formulas

| Formula | Description |
|---|---|
| Z_n = (X_bar - mu) / (sigma/sqrt(n)) | Standardized sample mean |
| Z_n -> N(0,1) as n -> infinity | Central Limit Theorem |
| [phi(t/sqrt(n))]^n -> exp(-t^2/2) | CF convergence (proof key step) |
| Error <= C * rho / (sigma^3 * sqrt(n)) | Berry-Esseen bound |
| SE = sigma / sqrt(n) | Standard error of the mean |

Key Takeaways

  1. The CLT states that standardized sample means converge to $N(0,1)$ for any starting distribution with finite variance
  2. The proof via characteristic functions uses the exponential limit $(1 + x/n)^n \to e^x$
  3. Convergence rate is $O(1/\sqrt{n})$ (Berry-Esseen); more skewed distributions converge more slowly
  4. Finite variance is essential; heavy-tailed distributions (infinite variance) violate CLT
  5. In ML: CLT justifies mini-batch gradient normality, ensemble averaging, and Bayesian approximations
  6. Always verify assumptions: independence, finite variance, sufficient sample size for your skewness level
The Essence of CLT:
"Average enough random things, and you get a bell curve—nature's attractor for sums."
Coming Next: In the next section, we'll explore CLT Variants—generalizations that handle non-identical distributions, dependent variables, and other extensions beyond the classical CLT.