Chapter 5

Student t-Distribution

Continuous Distributions

Learning Objectives

By the end of this section, you will be able to:

  1. Define the Student t-distribution and explain its relationship to the Normal and Chi-Square distributions
  2. Calculate probabilities and critical values using the t-distribution with different degrees of freedom
  3. Distinguish when to use the t-distribution vs. the Normal distribution based on sample size and knowledge of population variance
  4. Derive the t-statistic formula and understand each component
  5. Apply t-tests (one-sample, two-sample, paired) for hypothesis testing
  6. Construct confidence intervals for population means with unknown variance
  7. Explain why heavy tails matter and their practical implications for outlier robustness
  8. Implement t-tests and robust statistical methods in Python

The Big Picture: The Normal's Cautious Cousin

"The t-distribution is what uncertainty looks like when you're uncertain about your uncertainty."

When you learned about the Normal distribution, you probably encountered formulas like the z-score: z = (x̄ − μ) / (σ / √n). This formula assumes you know the population standard deviation σ. But in the real world, you almost never know σ!

The question that haunted statisticians in the early 1900s was: What happens when we replace the known σ with an estimate s from our sample?

The Core Insight

When you estimate σ from your sample data, you introduce additional uncertainty. The sample standard deviation s varies from sample to sample, especially with small samples. The t-distribution accounts for this extra source of variability through its heavier tails.

Think of it this way:

  • Normal distribution: You know exactly how spread out the data is (σ is known)
  • t-distribution: You're estimating the spread from your sample (σ is unknown)

Because you're less certain about the spread, extreme values become more likely. The t-distribution captures this by having fatter tails than the Normal.
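The effect described above can be checked with a small simulation. The sketch below is illustrative only: the sample size (n = 5), the repetition count, and the variable names are assumptions chosen for the demonstration. It standardizes many sample means twice, once with the true σ and once with the estimate s, and counts how often each result lands beyond the Normal 95% cutoff.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 200_000          # small samples, many repetitions (illustrative choices)
mu, sigma = 0.0, 1.0          # true parameters, known only to the simulation

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)                  # sample standard deviation

z_known = (xbar - mu) / (sigma / np.sqrt(n))     # standardized with the true sigma
t_est = (xbar - mu) / (s / np.sqrt(n))           # standardized with the estimate s

cutoff = stats.norm.ppf(0.975)                   # Normal two-sided 95% cutoff
frac_known = np.mean(np.abs(z_known) > cutoff)   # should sit near 0.05
frac_est = np.mean(np.abs(t_est) > cutoff)       # noticeably larger
theory = 2 * stats.t.sf(cutoff, df=n - 1)        # tail mass predicted by t(n-1)

print(f"sigma known:     {frac_known:.4f}")
print(f"sigma estimated: {frac_est:.4f}  (t({n - 1}) predicts {theory:.4f})")
```

With σ known, about 5% of the statistics exceed the cutoff, as intended. With s in its place the fraction is more than twice as large, and it agrees with what the t-distribution with n − 1 degrees of freedom predicts.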


The Guinness Brewery Story

Setting: Dublin, Ireland, Early 1900s

William Sealy Gosset was a chemist and statistician working at the famous Guinness Brewery. His job was to ensure beer quality by testing samples of barley and other ingredients.

But Gosset faced a practical problem: testing was expensive. He could only afford to test small samples—perhaps 3 to 10 measurements per batch.

The existing statistical methods (based on the Normal distribution) required either knowing the true population variance or having large samples. Gosset had neither.

The Problem with Small Samples

When Gosset used the sample standard deviation s instead of the population σ in his z-score calculations, something was wrong. His calculated probabilities didn't match reality, especially with small samples.

The issue: s is itself a random variable, and it tends to underestimate σ in small samples. Dividing by a quantity that is sometimes much too small produces occasional very large ratios, and that is where the heavier tails come from.

The Breakthrough (1908)

Gosset mathematically derived what happens when you substitute s for σ. The resulting distribution had:

  • A symmetric bell shape similar to the Normal's
  • Heavier tails (extreme values more likely)
  • A parameter called "degrees of freedom" that controls how different it is from Normal
  • Convergence to the Normal as sample size increases

The Pseudonym "Student"

Guinness prohibited employees from publishing scientific papers to prevent competitors from learning their methods. So Gosset published under the pseudonym "Student" in the journal Biometrika in 1908—hence "Student's t-distribution."

Historical Impact

This single paper revolutionized statistical practice. For the first time, scientists could make valid inferences from small samples—crucial in fields like medicine, agriculture, and physics where data collection is expensive.

Mathematical Definition

Definition 1: The Student t-Distribution

If Z ~ N(0,1) and V ~ χ²(ν) are independent random variables, then:

T = \frac{Z}{\sqrt{V/\nu}} \sim t(\nu)

where ν (the Greek letter "nu") is the degrees of freedom.

Symbol Table

| Symbol | Name | Meaning | Range |
|---|---|---|---|
| T | t-statistic | Random variable following t-distribution | (-∞, ∞) |
| Z | Standard Normal | A N(0,1) random variable | (-∞, ∞) |
| V | Chi-square | Sum of ν squared standard normals | (0, ∞) |
| ν (nu) | Degrees of freedom | Shape parameter; typically n-1 | 1, 2, 3, ... |
| n | Sample size | Number of observations | 1, 2, 3, ... |

Intuitive Statement: The t-distribution is what you get when you take a standard Normal random variable and divide by an estimate of its standard deviation. The "sloppiness" of that estimate (captured by degrees of freedom) determines how heavy the tails are.
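Definition 1 can be tried out directly by building T from its two ingredients. This is a minimal sketch, assuming NumPy and SciPy; the choice ν = 5, the sample count, and the variable names are illustrative. It draws independent Z and V, forms Z / √(V/ν), and compares the result with SciPy's t(ν).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
nu, size = 5, 200_000                     # illustrative choices

z = rng.standard_normal(size)             # Z ~ N(0, 1)
v = rng.chisquare(nu, size)               # V ~ chi-square(nu), drawn independently of Z
t_samples = z / np.sqrt(v / nu)           # T = Z / sqrt(V / nu)

# Kolmogorov-Smirnov distance between the simulated T and the t(nu) CDF
ks = stats.kstest(t_samples, stats.t(df=nu).cdf)
sample_var = t_samples.var()

print(f"KS distance from t({nu}): {ks.statistic:.4f}")
print(f"Sample variance: {sample_var:.3f}  (theory: {nu / (nu - 2):.3f})")
```

A KS distance close to zero indicates that the constructed variable and t(ν) have essentially the same distribution function.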

The Uncertain Archer Analogy

Imagine an archer aiming at a target (the true population mean):

  • Normal distribution: The archer knows exactly how steady their hands are (known σ). They can predict their shot pattern precisely.
  • t-distribution: The archer must also estimate their own steadiness from a few practice shots (estimated s). With fewer practice shots (low df), they're less certain about their steadiness, so they might occasionally shoot much farther from center than expected → heavier tails.

Definition 2: Probability Density Function (PDF)

f(t; \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}

where Γ(·) is the Gamma function (a generalization of factorial).

What the PDF tells us: The (1 + t²/ν) term, raised to the negative power −(ν+1)/2, determines how fast the tails decay. The decay is polynomial rather than exponential, and smaller ν means slower decay and heavier tails.
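To make the density formula concrete, here is a short sketch that evaluates it term by term and checks the result against `scipy.stats.t.pdf`. The helper name `t_pdf` is hypothetical; working on the log scale with `gammaln` is a design choice to avoid overflow in the Gamma function for large ν.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def t_pdf(t, nu):
    """Student t density f(t; nu), evaluated on the log scale for stability."""
    t = np.asarray(t, dtype=float)
    # log of Gamma((nu+1)/2) / (sqrt(nu*pi) * Gamma(nu/2))
    log_norm = gammaln((nu + 1) / 2) - gammaln(nu / 2) - 0.5 * np.log(nu * np.pi)
    # log of (1 + t^2/nu)^(-(nu+1)/2)
    log_kernel = -(nu + 1) / 2 * np.log1p(t**2 / nu)
    return np.exp(log_norm + log_kernel)

grid = np.linspace(-4, 4, 9)
for nu in (1, 5, 30):
    max_err = np.max(np.abs(t_pdf(grid, nu) - stats.t.pdf(grid, df=nu)))
    print(f"nu = {nu:2d}: f(0) = {float(t_pdf(0.0, nu)):.4f}, "
          f"f(4) = {float(t_pdf(4.0, nu)):.5f}, max |error| vs scipy = {max_err:.1e}")
```

Reading the f(4) column from ν = 1 to ν = 30 shows the slower tail decay at small ν.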

Definition 3: Key Properties

| Property | Formula | Normal Comparison |
|---|---|---|
| Mean | E[T] = 0 (for ν > 1) | Same as N(0,1) |
| Variance | Var(T) = ν/(ν-2) (for ν > 2) | Greater than 1 |
| Symmetry | f(-t) = f(t) | Same |
| Support | (-∞, ∞) | Same |
| Kurtosis | 3 + 6/(ν-4) (for ν > 4) | Heavier tails |

Critical Insight: Variance Depends on df

The variance formula Var(T) = ν/(ν − 2) shows:

  • When ν = 3: Variance = 3 (much larger than Normal's 1)
  • When ν = 10: Variance = 1.25
  • When ν = 30: Variance ≈ 1.07
  • As ν → ∞: Variance → 1 (same as Normal)
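The same numbers can be reproduced in a couple of lines. This is a small sketch assuming SciPy; the dictionary name `variances` is illustrative. It compares ν/(ν − 2) with the variance SciPy reports for each df.

```python
from scipy import stats

variances = {}
for nu in (3, 5, 10, 30, 100):
    formula = nu / (nu - 2)               # Var(T) = nu / (nu - 2), valid for nu > 2
    variances[nu] = stats.t.var(df=nu)    # variance reported by scipy
    print(f"nu = {nu:3d}: formula = {formula:.4f}, scipy = {variances[nu]:.4f}")
```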

Interactive PDF Explorer

Explore how the t-distribution changes with degrees of freedom. Adjust the slider and watch how the distribution transitions from heavy-tailed to Normal-like:

[Interactive figure: Student t-Distribution Explorer. A slider sets the degrees of freedom from 1 (Cauchy) to 100 (≈ Normal). The plot shows the density f(t) of t(5) overlaid on N(0,1) for t from −4 to 4.]

Readout at df = 5: mean E[T] = 0, variance 1.667, critical value t₀.₀₂₅ (α = 0.05) = ±2.571

Key Insight

With low df (5), notice the heavier tails compared to the Normal. The variance is 1.67, which is 67% larger than the Normal's variance of 1.

Try This

  • Set df = 1 to see the Cauchy distribution (no defined mean or variance!)
  • Increase df gradually to see convergence to Normal
  • Toggle "Show Critical Values" to see how t-critical changes with df
  • Compare the t-distribution to the Normal overlay

t-Distribution vs Normal Distribution

The key difference between t and Normal is in the tails. This interactive visualization lets you compare them directly:

[Interactive figure: t-Distribution vs Normal Distribution. The density of t(5) is overlaid on N(0,1) for values from −4 to 4, with a df slider running from 1 to 50.]

Readout at df = 5: t-distribution variance 1.6667 (Normal variance = 1), total absolute difference (area between the curves) 0.1178, convergence status "Different" (rule of thumb: df ≥ 30)

Why Does This Matter?

The heavier tails of the t-distribution have practical consequences:

  • Wider confidence intervals: We need wider intervals to maintain the same confidence level
  • Larger critical values: The t-critical value is always larger than the z-critical value (for finite df)
  • Lower power: Harder to reject the null hypothesis with small samples
  • More robust: Heavy tails make the distribution more tolerant of outliers

Understanding Heavy Tails

Why do "heavy tails" matter? Consider the probability of observing an extreme value, more than 3 units from the center (3 standard deviations for the standard Normal):

| Distribution | Probability beyond ±3 | Frequency |
|---|---|---|
| Normal N(0,1) | 0.0027 | About 1 in 370 |
| t(5) | 0.0301 | About 1 in 33 |
| t(3) | 0.0577 | About 1 in 17 |
| t(1) = Cauchy | 0.205 | About 1 in 5 |

With t(3), extreme values are about 21 times more likely than with the Normal! This is why using the Normal distribution when you should use t leads to:

  • Underestimating p-values
  • Confidence intervals that are too narrow
  • False rejections of the null hypothesis
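Tail probabilities like these are easy to compute rather than look up. The sketch below, assuming SciPy, uses the survival function `sf` (which is 1 − CDF) and doubles it for the two-sided probability. The threshold of 3 and the dictionary name `tails` are illustrative.

```python
from scipy import stats

threshold = 3.0
tail_normal = 2 * stats.norm.sf(threshold)        # P(|Z| > 3) for the standard Normal

tails = {"N(0,1)": tail_normal}
for nu in (5, 3, 1):
    tails[f"t({nu})"] = 2 * stats.t.sf(threshold, df=nu)   # P(|T| > 3)

for name, p in tails.items():
    print(f"{name:7s} P(|X| > 3) = {p:.4f}  "
          f"(about 1 in {1 / p:.0f}, {p / tail_normal:.1f}x the Normal)")
```

Using `sf` instead of `1 - cdf` keeps precision when the tail probability is very small.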

The Cauchy Special Case

When df = 1, the t-distribution becomes the Cauchy distribution. Its tails are so heavy that the mean is undefined (the integral diverges) and the variance is infinite. The Law of Large Numbers doesn't apply!


Degrees of Freedom Explained

The "degrees of freedom" parameter is often mystifying. Here's the intuition:

Degrees of freedom = the number of independent pieces of information available to estimate a parameter.

Why n-1 for a Sample Mean?

When calculating the sample variance s² = (1/(n − 1)) Σ(xᵢ − x̄)², we divide by n − 1, not n. Here's why:

  • We start with n observations, each providing one piece of information
  • We "use up" one degree of freedom to estimate x̄
  • Only n − 1 deviations are truly independent (the last one is determined by the constraint that deviations sum to zero)
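Both points in the list above can be seen numerically. This is an illustrative simulation, with the sample size, true variance, and variable names chosen as assumptions for the demo. It confirms that deviations from the sample mean always sum to zero, and that dividing by n − 1 rather than n removes the downward bias in the variance estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 5, 200_000
sigma2 = 4.0                                   # true population variance

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
dev = x - x.mean(axis=1, keepdims=True)        # deviations from each sample mean

# Constraint: within every sample the deviations sum to zero (up to rounding),
# so only n - 1 of them are free to vary.
max_abs_sum = np.abs(dev.sum(axis=1)).max()

ss = (dev**2).sum(axis=1)                      # sum of squared deviations
mean_var_n = (ss / n).mean()                   # divide by n: biased low
mean_var_n1 = (ss / (n - 1)).mean()            # divide by n - 1: unbiased

print(f"largest |sum of deviations|: {max_abs_sum:.2e}")
print(f"average of SS/n:     {mean_var_n:.3f}  (expected {sigma2 * (n - 1) / n:.3f})")
print(f"average of SS/(n-1): {mean_var_n1:.3f}  (expected {sigma2:.3f})")
```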

The Convergence Story

As degrees of freedom increase:

  • df = 1: Heavy-tailed Cauchy (no mean or variance)
  • df = 3-5: Noticeably heavier tails than Normal
  • df = 10-20: Getting closer to Normal
  • df = 30+: Practically indistinguishable from Normal
  • df → ∞: Exactly the standard Normal distribution

Rule of Thumb

When n > 30, the difference between t and z critical values is often negligible. However, modern practice is to always use the t-distribution when σ is estimated from the data (it never hurts and often helps).
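How quickly the critical values converge can be tabulated directly. The sketch below assumes SciPy; the 5% tolerance used to define "close to z" is an arbitrary illustrative threshold, not a standard.

```python
from scipy import stats

z_crit = stats.norm.ppf(0.975)                       # two-sided 95% Normal cutoff
crit = {df: stats.t.ppf(0.975, df) for df in (1, 2, 5, 10, 30, 100, 1000)}

for df, t_crit in crit.items():
    print(f"df = {df:4d}: t* = {t_crit:7.4f}, excess over z* = {t_crit / z_crit - 1:7.1%}")

# Smallest df at which t* is within 5% of z* (illustrative tolerance)
first_df_within_5pct = next(
    df for df in range(1, 200) if stats.t.ppf(0.975, df) / z_crit < 1.05
)
print(f"t* is within 5% of z* from df = {first_df_within_5pct}")
```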

The t-Statistic Formula

For a sample mean x̄ from n observations with sample standard deviation s:

t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}

Symbol Table

| Symbol | Name | Meaning |
|---|---|---|
| t | t-statistic | Calculated test statistic |
| x̄ | Sample mean | Average of n observations |
| μ₀ | Hypothesized mean | Null hypothesis value |
| s | Sample std dev | Estimated standard deviation |
| n | Sample size | Number of observations |
| s/√n | Standard error | Estimated uncertainty in x̄ |

Intuitive Statement: The t-statistic measures how many estimated standard errors the sample mean is away from the hypothesized population mean.

Compare this to the z-statistic: z = (x̄ − μ₀) / (σ / √n). The only difference is s vs. σ, but this difference is crucial for small samples!
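When only summary statistics are available, the t-statistic and its p-value can be computed straight from the formula. This is a minimal sketch: the function name `one_sample_t` is hypothetical, and the inputs (x̄ = 52.30, s = 8.50, n = 15, μ₀ = 50) are illustrative numbers.

```python
import math
from scipy import stats

def one_sample_t(xbar, mu0, s, n):
    """t-statistic, degrees of freedom, and two-sided p-value from summary statistics."""
    se = s / math.sqrt(n)                  # estimated standard error of the mean
    t_stat = (xbar - mu0) / se             # distance from mu0 in standard errors
    df = n - 1
    p_two_sided = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, df, p_two_sided

t_stat, df, p_value = one_sample_t(xbar=52.30, mu0=50.0, s=8.50, n=15)
print(f"t = {t_stat:.4f}, df = {df}, two-sided p = {p_value:.4f}")
```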


Hypothesis Testing with t

The t-test is one of the most widely used statistical tests. Try it yourself with this interactive demo:

Interactive Hypothesis Testing Demo

Example shown: x̄ = 52.30, s = 8.50, n = 15, testing H₀: μ = 50 against a two-sided alternative at α = 0.05.

  • Degrees of freedom: df = 14
  • t-statistic: t = (x̄ − μ₀) / (s / √n) = (52.30 − 50) / (8.50 / √15) = 1.0480
  • Critical t: ±2.1448
  • p-value: 0.3124
  • 95% CI for μ: [47.593, 57.007]

Decision: FAIL TO REJECT H₀

At α = 0.05, there is insufficient evidence to conclude that the population mean is different from 50.

Types of t-Tests

1. One-Sample t-Test

Tests whether a sample mean differs from a hypothesized population mean.

  • Example: Is the average blood pressure reduction from a new drug significantly different from zero?
  • Formula: t = (x̄ − μ₀) / (s / √n)
  • df: n − 1

2. Two-Sample t-Test (Independent)

Tests whether two independent groups have different means.

  • Example: Do patients receiving drug A have different outcomes than those receiving drug B?
  • Equal variance formula: t = (x̄₁ − x̄₂) / √(s_p² (1/n₁ + 1/n₂)), where s_p² is the pooled sample variance
  • df: n₁ + n₂ − 2 (equal variance assumed)
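The equal-variance formula can be written out by hand and compared with SciPy. In this sketch the pooled variance is the standard weighted average s_p² = ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2). The function name `pooled_t` and the two small data sets are made up for illustration.

```python
import numpy as np
from scipy import stats

def pooled_t(a, b):
    """Equal-variance two-sample t-test computed directly from the formula."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    # Pooled variance: sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    t_stat = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, df, p_value

# Hypothetical outcome scores for two independent groups
drug_a = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7]
drug_b = [4.2, 4.9, 4.0, 4.6, 5.0, 4.1, 4.4]

t_manual, df_manual, p_manual = pooled_t(drug_a, drug_b)
reference = stats.ttest_ind(drug_a, drug_b, equal_var=True)

print(f"manual: t = {t_manual:.4f}, df = {df_manual}, p = {p_manual:.4f}")
print(f"scipy:  t = {reference.statistic:.4f}, p = {reference.pvalue:.4f}")
```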

3. Paired t-Test

Tests whether the mean difference between paired observations is zero.

  • Example: Did students' scores improve from pre-test to post-test?
  • Formula: Compute differences dᵢ = xᵢ,after − xᵢ,before, then apply a one-sample t-test to the differences
  • df: n − 1 (number of pairs minus 1)
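The claim that a paired test is just a one-sample test on the differences can be confirmed in a few lines. The scores below are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical pre-test and post-test scores for the same 8 students
before = np.array([62, 70, 55, 81, 66, 74, 59, 68], dtype=float)
after = np.array([66, 73, 54, 85, 71, 75, 64, 70], dtype=float)

d = after - before                                   # one difference per pair

paired = stats.ttest_rel(after, before)              # paired t-test
one_sample = stats.ttest_1samp(d, popmean=0.0)       # one-sample test on the differences

print(f"paired:     t = {paired.statistic:.4f}, p = {paired.pvalue:.4f}")
print(f"one-sample: t = {one_sample.statistic:.4f}, p = {one_sample.pvalue:.4f}")
```

The two lines print identical values, with df equal to the number of pairs minus 1.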

Confidence Intervals

The t-distribution is essential for constructing confidence intervals when the population variance is unknown:

\text{CI} = \bar{x} \pm t^* \times \frac{s}{\sqrt{n}}

Where t* is the critical t-value for your desired confidence level and degrees of freedom.
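The interval formula translates into a short helper. This is a sketch with a hypothetical function name, `t_confidence_interval`, and illustrative inputs (x̄ = 165, s = 8.5, n = 10).

```python
import math
from scipy import stats

def t_confidence_interval(xbar, s, n, confidence=0.95):
    """Two-sided t-based confidence interval for a mean from summary statistics."""
    df = n - 1
    t_star = stats.t.ppf(1 - (1 - confidence) / 2, df)   # critical value t*
    margin = t_star * s / math.sqrt(n)                   # t* times the standard error
    return xbar - margin, xbar + margin

low, high = t_confidence_interval(xbar=165.0, s=8.5, n=10)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```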

Confidence Interval Calculator: t-interval vs z-interval Comparison

Example shown: x̄ = 165.00, s = 8.5, n = 10, 95% confidence.

  1. Calculate degrees of freedom: df = n − 1 = 10 − 1 = 9
  2. Find the critical t-value for 95% confidence: t₀.₀₂₅,₉ = 2.2622 (vs z = 1.9600)
  3. Calculate the standard error: SE = s / √n = 8.5 / √10 = 2.6879
  4. Calculate the margin of error: ME = t* × SE = 2.2622 × 2.6879 = 6.0805
  5. Construct the confidence interval: CI = x̄ ± ME = 165 ± 6.0805 = [158.9195, 171.0805]

| Interval | Bounds | Width |
|---|---|---|
| 95% t-interval | [158.919, 171.081] | 12.1611 |
| z-interval (if σ known) | [159.732, 170.268] | 10.5367 |

Width difference: the t-interval is 15.4% wider because it accounts for variance uncertainty.

Interpretation

We are 95% confident that the true population mean lies between 158.919 and 171.081. With only 10 observations, the t-interval is 15.4% wider than the z-interval to account for uncertainty in estimating σ.

Why t-Intervals Are Wider

The t-interval is always wider than the corresponding z-interval because:

  • t* > z* for any finite df
  • This extra width accounts for uncertainty in estimating σ
  • As n increases, the difference shrinks
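The cost of skipping the extra width can be quantified without simulation. If a nominal 95% z-interval is built with s in place of σ, its true coverage under Normal data is the probability that a t(n − 1) variable falls within ±z*. The sketch below, assuming SciPy and with illustrative sample sizes, tabulates both the extra width and that coverage.

```python
from scipy import stats

z_star = stats.norm.ppf(0.975)

rows = []
for n in (3, 5, 10, 30, 100):
    df = n - 1
    t_star = stats.t.ppf(0.975, df)
    extra_width = t_star / z_star - 1                 # how much wider the t-interval is
    # True coverage of xbar +/- z* * s/sqrt(n) when the data are Normal
    z_coverage = 1 - 2 * stats.t.sf(z_star, df)
    rows.append((n, t_star, extra_width, z_coverage))
    print(f"n = {n:3d}: t* = {t_star:.4f}, t-interval {extra_width:6.1%} wider, "
          f"z-interval with s covers {z_coverage:.1%}")
```

For small n the z-interval's coverage falls well short of its nominal 95%, and the gap closes as n grows.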

Real-World Applications

Example 1: Pharmaceutical Testing

Problem: A pharmaceutical company tests a new blood pressure medication on 12 patients. The mean reduction is 8.5 mmHg with s = 4.2 mmHg. Is the drug effective?

Solution:

  • H₀: μ = 0 (no effect) vs H₁: μ > 0 (drug reduces blood pressure)
  • t = (8.5 - 0) / (4.2 / √12) = 8.5 / 1.21 = 7.01
  • df = 11, critical value t₀.₀₅,₁₁ = 1.796
  • Since 7.01 > 1.796, we reject H₀ (p < 0.001)

Example 2: Quality Control (Full Circle to Gosset!)

Problem: Guinness measures alcohol content in 8 samples. Target is 4.2%. Sample mean = 4.35%, s = 0.12%. Is production on target?

Solution:

  • H₀: μ = 4.2 vs H₁: μ ≠ 4.2
  • t = (4.35 - 4.20) / (0.12 / √8) = 0.15 / 0.0424 = 3.54
  • df = 7, critical value t₀.₀₂₅,₇ = 2.365
  • Since 3.54 > 2.365, production is significantly above target

Example 3: A/B Testing in Tech

Problem: A website tests a new checkout flow on 25 users. Mean conversion improvement: 2.4%, s = 5.1%. Is the improvement real?

Solution:

  • t = (2.4 - 0) / (5.1 / √25) = 2.4 / 1.02 = 2.35
  • df = 24, critical value t₀.₀₂₅,₂₄ = 2.064
  • p-value ≈ 0.027
  • Improvement is statistically significant at α = 0.05

Example 4: Financial Analysis

Problem: A hedge fund's strategy shows mean daily return of 0.08% over 50 trading days, with s = 0.5%. Is the strategy profitable?

Solution:

  • t = (0.08 - 0) / (0.5 / √50) = 0.08 / 0.0707 = 1.13
  • df = 49, critical value t₀.₀₅,₄₉ ≈ 1.677
  • p-value ≈ 0.13
  • Not enough evidence (could be luck)
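The four worked examples can be rechecked from their summary statistics. This sketch takes the numbers from the examples above; the dictionary layout and the key names are illustrative. Examples 1 and 4 use a one-sided alternative and Examples 2 and 3 a two-sided one, matching the critical values quoted.

```python
import math
from scipy import stats

examples = {
    "drug":     dict(xbar=8.5,  mu0=0.0, s=4.2,  n=12, two_sided=False),
    "brewery":  dict(xbar=4.35, mu0=4.2, s=0.12, n=8,  two_sided=True),
    "checkout": dict(xbar=2.4,  mu0=0.0, s=5.1,  n=25, two_sided=True),
    "fund":     dict(xbar=0.08, mu0=0.0, s=0.5,  n=50, two_sided=False),
}

results = {}
for name, e in examples.items():
    se = e["s"] / math.sqrt(e["n"])
    t_stat = (e["xbar"] - e["mu0"]) / se
    df = e["n"] - 1
    if e["two_sided"]:
        p = 2 * stats.t.sf(abs(t_stat), df)
    else:
        p = stats.t.sf(t_stat, df)        # upper-tail alternative (mu > mu0)
    results[name] = (t_stat, df, p)
    print(f"{name:9s} t = {t_stat:.2f}, df = {df}, p = {p:.4f}")
```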

AI/ML Applications

1. Robust Priors in Bayesian Deep Learning

The t-distribution's heavy tails make it an excellent prior distribution for neural network weights:

  • More tolerant of outliers than Gaussian priors
  • Allows occasional large weights without penalty explosion
  • Used in Bayesian neural networks for uncertainty quantification
🐍python
import torch.distributions as dist

# t-prior for network weights (more robust than Normal)
t_prior = dist.StudentT(df=4.0, loc=0.0, scale=1.0)

# Sample weights
weights = t_prior.sample((100,))
print(f"Weight range: [{weights.min():.3f}, {weights.max():.3f}]")

2. Small-Sample Model Comparison

When comparing two models on limited test sets (common in medical AI):

  • Wrong: Use Normal-based z-test
  • Right: Use paired t-test for significance
🐍python
from scipy import stats
import numpy as np

# Model accuracies on 15 test cases
model_a = np.array([0.82, 0.78, 0.85, 0.79, 0.88, 0.84, 0.80,
                    0.86, 0.81, 0.83, 0.77, 0.89, 0.82, 0.85, 0.80])
model_b = np.array([0.79, 0.75, 0.82, 0.78, 0.84, 0.81, 0.77,
                    0.83, 0.79, 0.80, 0.74, 0.85, 0.79, 0.82, 0.78])

# Paired t-test (correct for small samples)
t_stat, p_value = stats.ttest_rel(model_a, model_b)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
# The test is two-sided, so also check the sign of t before claiming A is better
print(f"Model A significantly better: {p_value < 0.05 and t_stat > 0}")

3. Robust Regression with t-Likelihood

Standard regression assumes Gaussian noise. For outlier-robust regression, use a t-distributed likelihood:

🐍python
import pyro
import pyro.distributions as dist

def robust_regression(x, y):
    # Priors
    weight = pyro.sample("weight", dist.Normal(0.0, 10.0))
    bias = pyro.sample("bias", dist.Normal(0.0, 10.0))
    df = pyro.sample("df", dist.Uniform(1.0, 30.0))  # Learn df
    scale = pyro.sample("scale", dist.HalfNormal(5.0))

    # t-distributed likelihood (robust to outliers!)
    mean = weight * x + bias
    with pyro.plate("data", len(y)):
        pyro.sample("obs", dist.StudentT(df, mean, scale), obs=y)

4. A/B Testing with Limited Data

Early-stage startups often can't wait for large sample sizes. The t-test still gives valid inference with n = 10-20, provided the data are roughly Normal and free of extreme outliers:

🐍python
import numpy as np
from scipy import stats

def welch_ab_test(control, treatment, alpha=0.05):
    """
    Frequentist A/B test using Welch's t-test, which does not
    assume equal variances in the two groups.
    """
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)

    # Welch's t-test (unequal variances allowed)
    t_stat, p_value = stats.ttest_ind(treatment, control,
                                      equal_var=False)

    # Effect size (Cohen's d), using sample variances (ddof=1)
    pooled_std = np.sqrt((np.var(control, ddof=1) +
                          np.var(treatment, ddof=1)) / 2)
    effect_size = (np.mean(treatment) - np.mean(control)) / pooled_std

    return {
        "t_statistic": t_stat,
        "p_value": p_value,
        "effect_size": effect_size,
        "significant": p_value < alpha
    }

5. Gradient Uncertainty in Training

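Gradients in mini-batch SGD are sample means of per-example gradients. With small batches, the standardized gradient estimate (the batch mean divided by its estimated standard error) is approximately t-distributed when per-example gradients are roughly Normal, so small-batch gradient noise has heavier tails than a Normal model suggests. This insight connects to: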

  • Adaptive learning rates (Adam, RMSprop)
  • Gradient clipping strategies
  • Batch size selection

Connections to Other Distributions

The Distribution Family Tree

| Relationship | Formula/Description |
|---|---|
| t from Z and χ² | T = Z / √(χ²/ν) where Z ~ N(0,1), χ² ~ χ²(ν) |
| t² = F(1, ν) | Squaring t gives F-distribution with df (1, ν) |
| t(∞) = N(0,1) | As df → ∞, t converges to standard Normal |
| t(1) = Cauchy | Special case with no defined mean/variance |
| Relation to Beta | CDF involves regularized incomplete Beta function |

Why These Connections Matter

Understanding these relationships helps with:

  • Deriving new tests: The t² = F(1, ν) relationship shows that a two-sided t-test is a special case of an F-test, which is how the two-sample t-test generalizes to ANOVA
  • Computational efficiency: Can reuse Beta function implementations
  • Theoretical understanding: All these distributions arise from the Normal
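Each relationship in the table can be checked numerically. This is a sketch assuming SciPy. The value ν = 7, the grids, and the use of df = 10,000 as a stand-in for "df → ∞" are illustrative choices. The Beta check uses the standard identity P(|T| > t) = I_x(ν/2, 1/2) with x = ν/(ν + t²), where I is the regularized incomplete Beta function.

```python
import numpy as np
from scipy import stats
from scipy.special import betainc

nu = 7
x = np.linspace(0.1, 5.0, 50)
grid = np.linspace(-5.0, 5.0, 21)

# t^2 ~ F(1, nu): P(T^2 <= x^2) = P(|T| <= x) must equal the F(1, nu) CDF at x^2
prob_abs_t = stats.t.cdf(x, nu) - stats.t.cdf(-x, nu)
prob_f = stats.f.cdf(x**2, 1, nu)
f_gap = np.max(np.abs(prob_abs_t - prob_f))

# t(1) is the Cauchy distribution
cauchy_gap = np.max(np.abs(stats.t.pdf(grid, 1) - stats.cauchy.pdf(grid)))

# Large df approaches the standard Normal
normal_gap = np.max(np.abs(stats.t.pdf(grid, 10_000) - stats.norm.pdf(grid)))

# Two-sided tail probability via the regularized incomplete Beta function
beta_tail = betainc(nu / 2, 0.5, nu / (nu + x**2))
beta_gap = np.max(np.abs(beta_tail - 2 * stats.t.sf(x, nu)))

print(f"t^2 vs F(1, nu):       max gap = {f_gap:.1e}")
print(f"t(1) vs Cauchy:        max gap = {cauchy_gap:.1e}")
print(f"t(10000) vs Normal:    max gap = {normal_gap:.1e}")
print(f"Beta form of the tail: max gap = {beta_gap:.1e}")
```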

Python Implementation

Basic t-Distribution Operations

🐍python
from scipy import stats
import numpy as np

# Create t-distribution with df=10
t = stats.t(df=10)

# PDF and CDF
x = 2.0
print(f"PDF at x=2: {t.pdf(x):.6f}")
print(f"CDF at x=2: {t.cdf(x):.6f}")

# Critical values (quantiles)
alpha = 0.05
t_critical = t.ppf(1 - alpha/2)  # Two-tailed
print(f"Critical t for 95% CI: ±{t_critical:.4f}")

# Compare to Normal
z_critical = stats.norm.ppf(1 - alpha/2)
print(f"Critical z for 95% CI: ±{z_critical:.4f}")
print(f"Difference: {t_critical - z_critical:.4f}")

One-Sample t-Test

🐍python
from scipy import stats
import numpy as np

# Sample data
data = np.array([165, 170, 168, 172, 169, 175, 167, 171])

# One-sample t-test: Is mean different from 170?
t_stat, p_value = stats.ttest_1samp(data, popmean=170)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")

# Manual calculation for verification
n = len(data)
sample_mean = np.mean(data)
sample_std = np.std(data, ddof=1)  # ddof=1 for sample std
se = sample_std / np.sqrt(n)
t_manual = (sample_mean - 170) / se
print(f"Manual t: {t_manual:.4f}")

# Confidence interval
df = n - 1
t_crit = stats.t.ppf(0.975, df)
ci = (sample_mean - t_crit * se, sample_mean + t_crit * se)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")

Two-Sample t-Test

🐍python
from scipy import stats
import numpy as np

# Two groups
group_a = np.array([23, 25, 28, 24, 26, 27, 22, 25])
group_b = np.array([28, 30, 27, 29, 31, 28, 30, 29])

# Independent samples t-test. SciPy defaults to equal_var=True (pooled),
# so equal_var=False is passed explicitly to get Welch's test.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"Welch's t-test:")
print(f"  t-statistic: {t_stat:.4f}")
print(f"  p-value: {p_value:.4f}")

# Effect size (Cohen's d)
pooled_std = np.sqrt((np.var(group_a, ddof=1) + np.var(group_b, ddof=1)) / 2)
effect_size = (np.mean(group_b) - np.mean(group_a)) / pooled_std
print(f"  Effect size (Cohen's d): {effect_size:.4f}")

Paired t-Test

🐍python
from scipy import stats
import numpy as np

# Before and after measurements (paired data)
before = np.array([200, 195, 210, 188, 205, 198, 215, 192])
after = np.array([185, 182, 195, 175, 190, 185, 200, 180])

# Paired t-test
t_stat, p_value = stats.ttest_rel(before, after)
print(f"Paired t-test:")
print(f"  t-statistic: {t_stat:.4f}")
print(f"  p-value: {p_value:.4f}")

# Mean difference and CI
diff = before - after
mean_diff = np.mean(diff)
se_diff = np.std(diff, ddof=1) / np.sqrt(len(diff))
t_crit = stats.t.ppf(0.975, len(diff) - 1)
ci = (mean_diff - t_crit * se_diff, mean_diff + t_crit * se_diff)
print(f"  Mean difference: {mean_diff:.2f}")
print(f"  95% CI for difference: ({ci[0]:.2f}, {ci[1]:.2f})")

Common Pitfalls

Pitfall 1: Using z When You Should Use t

Wrong: Using z-test when σ is unknown (estimated from sample).

Why it matters: Leads to underestimated p-values and false positives.

Rule: Always use t when σ is estimated from data, regardless of sample size.

Pitfall 2: Ignoring the Normality Assumption

The t-test assumes the underlying population is approximately Normal.

  • For n > 30, CLT often provides robustness
  • For small n with skewed data, consider non-parametric alternatives
  • Check with Q-Q plots or Shapiro-Wilk test

Pitfall 3: Confusing df for Different Tests

| Test Type | Degrees of Freedom |
|---|---|
| One-sample t-test | n - 1 |
| Paired t-test | n - 1 (number of pairs - 1) |
| Two-sample (equal var) | n₁ + n₂ - 2 |
| Welch's t-test | Complex formula (Satterthwaite approximation) |
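The "complex formula" for Welch's test is short once written out. With vᵢ = sᵢ²/nᵢ, the Welch–Satterthwaite approximation is df = (v₁ + v₂)² / (v₁²/(n₁ − 1) + v₂²/(n₂ − 1)). The sketch below uses a hypothetical helper name, `welch_df`, and two small illustrative groups. It computes df by hand and confirms that the resulting p-value matches SciPy's Welch test.

```python
import numpy as np
from scipy import stats

def welch_df(a, b):
    """Welch-Satterthwaite approximate degrees of freedom for two samples."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    va = a.var(ddof=1) / len(a)            # s1^2 / n1
    vb = b.var(ddof=1) / len(b)            # s2^2 / n2
    return (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))

group_a = np.array([23, 25, 28, 24, 26, 27, 22, 25], dtype=float)
group_b = np.array([28, 30, 27, 29, 31, 28, 30, 29], dtype=float)

df_welch = welch_df(group_a, group_b)
se = np.sqrt(group_a.var(ddof=1) / len(group_a) + group_b.var(ddof=1) / len(group_b))
t_welch = (group_a.mean() - group_b.mean()) / se
p_welch = 2 * stats.t.sf(abs(t_welch), df_welch)

reference = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"Welch df = {df_welch:.2f} (pooled test would use {len(group_a) + len(group_b) - 2})")
print(f"manual p = {p_welch:.6f}, scipy p = {reference.pvalue:.6f}")
```

The Welch df is generally not an integer and never exceeds n₁ + n₂ − 2.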

Pitfall 4: Multiple Testing Without Correction

Running many t-tests inflates false positive rate. Use Bonferroni correction or False Discovery Rate (FDR) control.
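Both corrections are simple to apply to a list of p-values. This is a minimal sketch with hypothetical helper names and made-up p-values. Bonferroni compares every p-value with α/m, while the Benjamini-Hochberg step-up procedure (one common form of FDR control) compares the k-th smallest p-value with α·k/m and rejects everything up to the largest k that passes.

```python
import numpy as np

def bonferroni(p_values, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / len(p)

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)                          # indices of p-values, smallest first
    thresholds = alpha * np.arange(1, m + 1) / m   # alpha * k / m for rank k
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])          # largest rank meeting its threshold
        reject[order[: k + 1]] = True              # reject that one and all smaller p-values
    return reject

# Hypothetical p-values from 10 separate t-tests
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

n_uncorrected = int(np.sum(np.asarray(p_values) < 0.05))
n_bonferroni = int(bonferroni(p_values).sum())
n_bh = int(benjamini_hochberg(p_values).sum())
print(f"significant: uncorrected = {n_uncorrected}, "
      f"Bonferroni = {n_bonferroni}, Benjamini-Hochberg = {n_bh}")
```

Bonferroni is the more conservative of the two, and Benjamini-Hochberg trades a controlled share of false discoveries for more power.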

The p-hacking Trap

Don't run multiple tests and only report the significant ones. This is scientifically invalid and a form of p-hacking. Pre-register your hypothesis or use appropriate corrections.


Test Your Understanding


When should you use the t-distribution instead of the Normal distribution for inference about a population mean?


Summary

The Student t-distribution is fundamental for statistical inference when the population variance is unknown. Here are the key takeaways:

  1. When to use t: Whenever you estimate σ from sample data, regardless of sample size
  2. Heavy tails: The t-distribution has heavier tails than Normal, accounting for uncertainty in variance estimation
  3. Degrees of freedom: Controls the shape; lower df = heavier tails; as df → ∞, t → Normal
  4. Confidence intervals: t-based intervals are wider than z-based, providing honest uncertainty quantification
  5. ML/AI applications: Robust priors, heavy-tailed likelihoods, small-sample model comparison
The Bottom Line: The t-distribution is the Normal distribution's more cautious, more honest cousin. It acknowledges what we don't know—and that honesty makes our statistical conclusions more reliable.

From Gosset to Modern ML

Over 100 years after Gosset's paper, the t-distribution remains essential. From clinical trials to A/B testing, from quality control to Bayesian deep learning, understanding the t-distribution is a core skill for any data scientist or ML engineer.
