Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Define the Student t-distribution and explain its relationship to the Normal and Chi-Square distributions
Calculate probabilities and critical values using the t-distribution with different degrees of freedom
Distinguish when to use the t-distribution vs. the Normal distribution based on sample size and knowledge of population variance
Derive the t-statistic formula and understand each component
Apply t-tests (one-sample, two-sample, paired) for hypothesis testing
Construct confidence intervals for population means with unknown variance
Explain why heavy tails matter and their practical implications for outlier robustness
Implement t-tests and robust statistical methods in Python

The Big Picture: The Normal's Cautious Cousin

"The t-distribution is what uncertainty looks like when you're uncertain about your uncertainty."

When you learned about the Normal distribution, you probably encountered formulas like the z-score: $z = (\bar{x} - \mu) / (\sigma / \sqrt{n})$. This formula assumes you know the population standard deviation $\sigma$. But in the real world, you almost never know $\sigma$!

The question that haunted statisticians in the early 1900s was: What happens when we replace the known σ with an estimate s from our sample?

The Core Insight

When you estimate $\sigma$ from your sample data, you introduce additional uncertainty. The sample standard deviation $s$ varies from sample to sample, especially with small samples. The t-distribution accounts for this extra source of variability through its heavier tails.

Think of it this way:

Normal distribution: You know exactly how spread out the data is (σ is known)
t-distribution: You're estimating the spread from your sample (σ is unknown)

Because you're less certain about the spread, extreme values become more likely. The t-distribution captures this by having fatter tails than the Normal.

The Guinness Brewery Story

Setting: Dublin, Ireland, Early 1900s

William Sealy Gosset was a chemist and statistician working at the famous Guinness Brewery. His job was to ensure beer quality by testing samples of barley and other ingredients.

But Gosset faced a practical problem: testing was expensive. He could only afford to test small samples—perhaps 3 to 10 measurements per batch.

The existing statistical methods (based on the Normal distribution) required either knowing the true population variance or having large samples. Gosset had neither.

The Problem with Small Samples

When Gosset used the sample standard deviation $s$ instead of the population $\sigma$ in his z-score calculations, something was wrong. His calculated probabilities didn't match reality, especially with small samples.

The issue: $s$ is itself a random variable that underestimates $\sigma$ on average for small samples. When you divide by a randomly varying quantity, you get heavier tails.
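A small simulation can make this concrete. The sketch below assumes Normal data with a known true σ (known only because we simulate it) and a sample size of 5; the seed and trial count are arbitrary choices. It compares the statistic built with the true σ against the one built with the estimate s.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
n, trials = 5, 100_000           # small sample size, many repeated samples
mu, sigma = 0.0, 1.0             # true parameters of the simulated population

samples = rng.normal(mu, sigma, size=(trials, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)  # sample standard deviation of each sample

z_known = (xbar - mu) / (sigma / np.sqrt(n))  # uses the true sigma
t_est = (xbar - mu) / (s / np.sqrt(n))        # substitutes s for sigma

# Fraction of statistics beyond the Normal 95% cutoff of 1.96
frac_z = np.mean(np.abs(z_known) > 1.96)
frac_t = np.mean(np.abs(t_est) > 1.96)

print(f"average s: {s.mean():.3f} (true sigma = {sigma})")
print(f"beyond ±1.96 using sigma: {frac_z:.3f}")
print(f"beyond ±1.96 using s:     {frac_t:.3f}")
```

With the true σ, about 5% of statistics should fall beyond ±1.96. With s in its place, the fraction should be noticeably larger (around 12% in theory for 4 degrees of freedom), which is the heavier tail Gosset observed.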

The Breakthrough (1908)

Gosset mathematically derived what happens when you substitute $s$ for $\sigma$. The resulting distribution had:

The same bell-curve shape as the Normal
Heavier tails (extreme values more likely)
A parameter called "degrees of freedom" that controls how different it is from Normal
Convergence to the Normal as sample size increases

The Pseudonym "Student"

Guinness prohibited employees from publishing scientific papers to prevent competitors from learning their methods. So Gosset published under the pseudonym "Student" in the journal Biometrika in 1908—hence "Student's t-distribution."

Historical Impact

This single paper revolutionized statistical practice. For the first time, scientists could make valid inferences from small samples—crucial in fields like medicine, agriculture, and physics where data collection is expensive.

Mathematical Definition

Definition 1: The Student t-Distribution

If $Z \sim N(0,1)$ and $V \sim \chi^2(\nu)$ are independent random variables, then:

$$T = \frac{Z}{\sqrt{V/\nu}} \sim t(\nu)$$

where $\nu$ (the Greek letter "nu") is the degrees of freedom.

Symbol Table

Symbol	Name	Meaning	Range
T	t-statistic	Random variable following t-distribution	(-∞, ∞)
Z	Standard Normal	A N(0,1) random variable	(-∞, ∞)
V	Chi-square	Sum of ν squared standard normals	(0, ∞)
ν (nu)	Degrees of freedom	Shape parameter; typically n-1	1, 2, 3, ...
n	Sample size	Number of observations	1, 2, 3, ...

Intuitive Statement: The t-distribution is what you get when you take a standard Normal random variable and divide by an estimate of its standard deviation. The "sloppiness" of that estimate (captured by degrees of freedom) determines how heavy the tails are.
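Definition 1 can be checked numerically by building T from independent Z and V draws and comparing its quantiles with SciPy's t-distribution. This is a sketch only; the choice ν = 5, the seed, and the sample count are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
nu, size = 5, 200_000

z = rng.standard_normal(size)       # Z ~ N(0, 1)
v = rng.chisquare(nu, size)         # V ~ chi-square(nu), independent of Z
t_samples = z / np.sqrt(v / nu)     # T = Z / sqrt(V / nu)

probs = np.array([0.90, 0.95, 0.975])
empirical_q = np.quantile(t_samples, probs)
theoretical_q = stats.t.ppf(probs, df=nu)

for p, emp, theo in zip(probs, empirical_q, theoretical_q):
    print(f"quantile {p:.3f}: simulated {emp:.3f}, t({nu}) {theo:.3f}")
```

The simulated and theoretical quantiles should agree to roughly two decimal places at this sample count.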

The Uncertain Archer Analogy

Imagine an archer aiming at a target (the true population mean):

Normal distribution: The archer knows exactly how steady their hands are (known $\sigma$). They can predict their shot pattern precisely.
t-distribution: The archer must also estimate their own steadiness from a few practice shots (estimated $s$). With fewer practice shots (low df), they're less certain about their steadiness, so they might occasionally shoot much farther from center than expected → heavier tails.

Definition 2: Probability Density Function (PDF)

$$f(t; \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$

Where $\Gamma(\cdot)$ is the Gamma function (a generalization of factorial).

What the PDF tells us: The $(1 + t^2/\nu)$ term in the denominator determines how fast the tails decay. Smaller $\nu$ means slower decay and heavier tails.
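The density can be written out directly and compared with SciPy's implementation. In this sketch the helper name `t_pdf` is hypothetical, and working with log-Gamma values is a design choice to avoid overflow at large ν.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def t_pdf(t, nu):
    """Student t density, evaluated on the log scale for numerical stability."""
    log_norm = (gammaln((nu + 1) / 2) - gammaln(nu / 2)
                - 0.5 * np.log(nu * np.pi))
    return np.exp(log_norm - (nu + 1) / 2 * np.log1p(t**2 / nu))

t_grid = np.linspace(-6, 6, 25)
for nu in (1, 5, 30):
    matches = np.allclose(t_pdf(t_grid, nu), stats.t.pdf(t_grid, df=nu))
    print(f"df={nu:>2}: matches scipy.stats.t.pdf -> {matches}")

# Tail comparison at t = 4: smaller nu leaves more density far from zero
for nu in (2, 5, 30):
    print(f"df={nu:>2}: f(4) = {t_pdf(4.0, nu):.6f}")
```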

Definition 3: Key Properties

Property	Formula	Normal Comparison
Mean	E[T] = 0 (for ν > 1)	Same as N(0,1)
Variance	Var(T) = ν/(ν-2) (for ν > 2)	Greater than 1
Symmetry	f(-t) = f(t)	Same
Support	(-∞, ∞)	Same
Kurtosis	3 + 6/(ν-4) (for ν > 4)	Heavier tails

Critical Insight: Variance Depends on df

The variance formula $\text{Var}(T) = \nu/(\nu-2)$ shows:

When ν = 3: Variance = 3 (much larger than Normal's 1)
When ν = 10: Variance = 1.25
When ν = 30: Variance ≈ 1.07
As ν → ∞: Variance → 1 (same as Normal)
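These values can be confirmed against SciPy. The sketch below also checks the kurtosis row of the properties table; note that SciPy reports excess kurtosis, so 3 is added to match the table's convention.

```python
from scipy import stats

variances = {}
for nu in (3, 5, 10, 30, 100):
    formula = nu / (nu - 2)                      # Var(T), valid for nu > 2
    variances[nu] = float(stats.t.var(df=nu))
    print(f"df={nu:>3}: formula = {formula:.4f}, scipy = {variances[nu]:.4f}")

# Kurtosis is defined only for nu > 4
nu = 10
excess = float(stats.t.stats(df=nu, moments="k"))   # excess kurtosis
print(f"df={nu}: kurtosis = {3 + excess:.4f}, formula = {3 + 6 / (nu - 4):.4f}")
```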

Interactive PDF Explorer

[Interactive widget: Student t-Distribution Explorer. A slider sets the degrees of freedom from 1 (Cauchy) to 100 (≈ Normal), with options to show the PDF, the CDF, a Normal overlay, and critical values.]

Snapshot at df = 5: mean E[T] = 0, variance = 1.667, and two-sided critical value at α = 0.05 of t₀.₀₂₅ = ±2.571.

Key Insight

With low df (5), notice the heavier tails compared to the Normal. The variance is 1.67, which is 67% larger than the Normal's variance of 1.

Try This

Set df = 1 to see the Cauchy distribution (no defined mean or variance!)
Increase df gradually to see convergence to Normal
Toggle "Show Critical Values" to see how t-critical changes with df
Compare the t-distribution to the Normal overlay

t-Distribution vs Normal Distribution

The key difference between t and Normal is in the tails. This interactive visualization lets you compare them directly:

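[Interactive widget: t-distribution vs. Normal overlay, with a degrees-of-freedom slider from 1 to 50.]

Snapshot at df = 5: t-distribution variance = 1.6667 (Normal variance = 1); total absolute difference (area between the curves) = 0.1178; convergence status: different. Rule of thumb for treating the t-distribution as approximately Normal: df ≥ 30.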

Why Does This Matter?

The heavier tails of the t-distribution have practical consequences:

Wider confidence intervals: We need wider intervals to maintain the same confidence level
Larger critical values: The t-critical value is always larger than the z-critical value (for finite df)
Lower power: Harder to reject the null hypothesis with small samples
More robust: Heavy tails make the distribution more tolerant of outliers

Understanding Heavy Tails

Why do "heavy tails" matter? Consider the probability of observing an extreme value, more than 3 units from the center (3 standard deviations for the standard Normal):

Distribution	P(|X| > 3)	Frequency
Normal N(0,1)	0.0027	About 1 in 370
t(5)	0.0301	About 1 in 33
t(3)	0.0577	About 1 in 17
t(1) = Cauchy	0.205	About 1 in 5

With t(3), extreme values are about 21 times more likely than with the Normal! This is why using the Normal distribution when you should use t leads to:

Underestimating p-values
Confidence intervals that are too narrow
False rejections of the null hypothesis
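Two-sided tail probabilities like these come straight from the survival function. A minimal sketch, assuming SciPy; the threshold of 3 matches the comparison above.

```python
from scipy import stats

threshold = 3.0
tail_probs = {
    "Normal N(0,1)": 2 * stats.norm.sf(threshold),
    "t(5)": 2 * stats.t.sf(threshold, df=5),
    "t(3)": 2 * stats.t.sf(threshold, df=3),
    "t(1) = Cauchy": 2 * stats.t.sf(threshold, df=1),
}

for name, p in tail_probs.items():
    print(f"{name:>14}: P(|X| > 3) = {p:.4f}  (about 1 in {1 / p:.0f})")

ratio = tail_probs["t(3)"] / tail_probs["Normal N(0,1)"]
print(f"t(3) vs Normal: {ratio:.1f}x more likely")
```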

The Cauchy Special Case

When df = 1, the t-distribution becomes the Cauchy distribution. Its tails are so heavy that the mean is undefined (the integral diverges) and the variance is infinite. The Law of Large Numbers doesn't apply!
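One way to see this failure is to average many Cauchy draws and check whether the averages tighten, as they would for Normal data. The sketch below uses the interquartile range (IQR) as the spread measure because the variance is not usable here; the replicate count, sample size, and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
reps, n = 4000, 1000   # 4000 independent sample means, each over 1000 draws

def iqr(values):
    """Interquartile range: a spread measure that exists even for Cauchy data."""
    q75, q25 = np.percentile(values, [75, 25])
    return q75 - q25

cauchy_means = rng.standard_cauchy((reps, n)).mean(axis=1)
normal_means = rng.standard_normal((reps, n)).mean(axis=1)

iqr_cauchy = iqr(cauchy_means)
iqr_normal = iqr(normal_means)
print(f"IQR of Cauchy sample means: {iqr_cauchy:.3f} (single draw: 2.000)")
print(f"IQR of Normal sample means: {iqr_normal:.3f} (single draw: 1.349)")
```

The mean of n standard Cauchy draws is itself standard Cauchy, so averaging 1000 values leaves the spread essentially unchanged, while the Normal means shrink by a factor of about √1000.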

Degrees of Freedom Explained

The "degrees of freedom" parameter is often mystifying. Here's the intuition:

Degrees of freedom = the number of independent pieces of information available to estimate a parameter.

Why n-1 for a Sample Mean?

When calculating the sample variance $s^2 = \frac{1}{n-1}\sum(x_i - \bar{x})^2$, we divide by $n-1$, not $n$. Here's why:

We start with $n$ observations, each providing one piece of information
We "use up" one degree of freedom to estimate $\bar{x}$
Only $n-1$ deviations are truly independent (the last one is determined by the constraint that deviations sum to zero)
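Both points, the zero-sum constraint and the effect of dividing by n - 1, can be checked by simulation. This is a sketch with an arbitrary true variance of 4 and samples of size 5.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 4.0                                   # true population variance
samples = rng.normal(0.0, np.sqrt(sigma2), size=(200_000, 5))

# Deviations from each sample's own mean always sum to zero (up to rounding),
# so only n - 1 of them are free to vary.
deviations = samples - samples.mean(axis=1, keepdims=True)
max_abs_sum = np.abs(deviations.sum(axis=1)).max()

var_n = samples.var(axis=1, ddof=0).mean()            # divide by n
var_n_minus_1 = samples.var(axis=1, ddof=1).mean()    # divide by n - 1

print(f"largest |sum of deviations|: {max_abs_sum:.2e}")
print(f"average variance estimate, divide by n:     {var_n:.3f}")
print(f"average variance estimate, divide by n - 1: {var_n_minus_1:.3f}")
```

Dividing by n should average close to 3.2 (that is, 4 × (n - 1)/n), while dividing by n - 1 should average close to the true value of 4.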

The Convergence Story

As degrees of freedom increase:

df = 1: Heavy-tailed Cauchy (no mean or variance)
df = 3-5: Noticeably heavier tails than Normal
df = 10-20: Getting closer to Normal
df = 30+: Practically indistinguishable from Normal
df → ∞: Exactly the standard Normal distribution

Rule of Thumb

When n > 30, the difference between t and z critical values is often negligible. However, modern practice is to always use the t-distribution (it never hurts and often helps).
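The convergence is easy to tabulate. A short sketch comparing two-sided 95% critical values across degrees of freedom; the list of df values is an arbitrary choice.

```python
from scipy import stats

z_crit = stats.norm.ppf(0.975)            # two-sided 95% Normal critical value
dfs = [1, 2, 5, 10, 30, 100, 1000]
t_crits = [stats.t.ppf(0.975, df) for df in dfs]

print(f"z critical value: {z_crit:.4f}")
for df, t_crit in zip(dfs, t_crits):
    excess_pct = 100 * (t_crit / z_crit - 1)
    print(f"df={df:>4}: t = {t_crit:8.4f}  ({excess_pct:6.1f}% above z)")
```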

The t-Statistic Formula

For a sample mean $\bar{x}$ from $n$ observations with sample standard deviation $s$:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

Symbol Table

Symbol	Name	Meaning
t	t-statistic	Calculated test statistic
x̄	Sample mean	Average of n observations
μ₀	Hypothesized mean	Null hypothesis value
s	Sample std dev	Estimated standard deviation
n	Sample size	Number of observations
s/√n	Standard error	Estimated uncertainty in x̄

Intuitive Statement: The t-statistic measures how many estimated standard errors the sample mean is away from the hypothesized population mean.

Compare this to the z-statistic: $z = (\bar{x} - \mu_0)/(\sigma / \sqrt{n})$. The only difference is $s$ vs. $\sigma$, but this difference is crucial for small samples!

Hypothesis Testing with t

The t-test is one of the most widely used statistical tests. Here is a worked example:

[Interactive widget: Hypothesis Testing Demo. Inputs: sample mean (x̄), sample standard deviation (s), sample size (n), null hypothesis value (μ₀), alternative hypothesis, and significance level (α).]

Example inputs: x̄ = 52.30, s = 8.50, n = 15, μ₀ = 50, α = 0.05 (two-sided)

Degrees of freedom: df = 14
t-statistic: t = (x̄ − μ₀) / (s / √n) = (52.30 − 50) / (8.50 / √15) = 1.0480
Critical t: ±2.1448
p-value: 0.3124
95% CI for μ: [47.593, 57.007]
Decision: Fail to reject H₀. At α = 0.05, there is insufficient evidence to conclude that the population mean is different from 50.

Types of t-Tests

1. One-Sample t-Test

Tests whether a sample mean differs from a hypothesized population mean.

Example: Is the average blood pressure reduction from a new drug significantly different from zero?
Formula: $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$
df: $n - 1$

2. Two-Sample t-Test (Independent)

Tests whether two independent groups have different means.

Example: Do patients receiving drug A have different outcomes than those receiving drug B?
Equal variance formula: $t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_p^2(1/n_1 + 1/n_2)}$
df: $n_1 + n_2 - 2$ (equal variance assumed)
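The pooled variance $s_p^2$ is not spelled out above. The sketch below uses its standard definition, $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$, and checks the result against SciPy's pooled test. The helper name `pooled_t_test` and the data are made up for illustration.

```python
import numpy as np
from scipy import stats

def pooled_t_test(x1, x2):
    """Two-sided equal-variance two-sample t-test, computed by hand."""
    n1, n2 = len(x1), len(x2)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * np.var(x1, ddof=1)
           + (n2 - 1) * np.var(x2, ddof=1)) / (n1 + n2 - 2)
    t_stat = (np.mean(x1) - np.mean(x2)) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, df, p_value

# Illustrative (made-up) outcome scores for two drugs
drug_a = np.array([5.1, 4.8, 6.0, 5.5, 5.9, 4.7, 5.3, 5.6])
drug_b = np.array([4.2, 4.9, 4.4, 5.0, 4.1, 4.6, 4.8])

t_manual, df_manual, p_manual = pooled_t_test(drug_a, drug_b)
scipy_result = stats.ttest_ind(drug_a, drug_b, equal_var=True)

print(f"manual: t = {t_manual:.4f}, df = {df_manual}, p = {p_manual:.4f}")
print(f"scipy:  t = {scipy_result.statistic:.4f}, p = {scipy_result.pvalue:.4f}")
```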

3. Paired t-Test

Tests whether the mean difference between paired observations is zero.

Example: Did students' scores improve from pre-test to post-test?
Formula: Compute differences $d_i = x_{i,after} - x_{i,before}$ , then apply one-sample t-test to the differences
df: $n - 1$ (number of pairs minus 1)
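The reduction to a one-sample test can be verified directly: a paired test on the raw scores and a one-sample test on the differences should give identical results. The scores below are made up for illustration.

```python
import numpy as np
from scipy import stats

# Illustrative (made-up) pre-test and post-test scores for 8 students
pre = np.array([62, 70, 55, 68, 74, 59, 66, 71])
post = np.array([66, 73, 58, 67, 79, 64, 70, 75])

d = post - pre                                   # d_i = after - before
paired = stats.ttest_rel(post, pre)              # paired t-test
one_sample = stats.ttest_1samp(d, popmean=0.0)   # one-sample test on d_i

print(f"paired:     t = {paired.statistic:.4f}, p = {paired.pvalue:.4f}")
print(f"one-sample: t = {one_sample.statistic:.4f}, p = {one_sample.pvalue:.4f}")
```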

Confidence Intervals

The t-distribution is essential for constructing confidence intervals when the population variance is unknown:

ext{CI} = ar{x} \pm t^* imes rac{s}{\sqrt{n}}

Where $t^*$ is the critical t-value for your desired confidence level and degrees of freedom.

[Interactive widget: Confidence Interval Calculator. Inputs: sample mean (x̄), sample standard deviation (s), sample size (n), and confidence level; it compares the t-interval with the z-interval.]

Worked example: x̄ = 165, s = 8.5, n = 10, 95% confidence

Calculate degrees of freedom: df = n - 1 = 10 - 1 = 9
Find the critical t-value for 95% confidence: t_{0.025, df=9} = 2.2622 (vs. z = 1.9600)
Calculate the standard error: SE = s / √n = 8.5 / √10 = 2.6879
Calculate the margin of error: ME = t* × SE = 2.2622 × 2.6879 = 6.0805
Construct the confidence interval: CI = x̄ ± ME = 165 ± 6.0805 = [158.9195, 171.0805]

Interval	Bounds	Width
95% t-confidence interval	[158.919, 171.081]	12.1611
z-interval (if σ known)	[159.732, 170.268]	10.5367

Width difference: the t-interval is 15.4% wider because it accounts for variance uncertainty.

Interpretation: We are 95% confident that the true population mean lies between 158.919 and 171.081. With only 10 observations, the t-interval is 15.4% wider than the z-interval to account for uncertainty in estimating σ.

Why t-Intervals Are Wider

The t-interval is always wider than the corresponding z-interval because:

$t^* > z^*$ for any finite df
This extra width accounts for uncertainty in estimating σ
As n increases, the difference shrinks
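The sketch below computes both intervals from summary statistics and shows how the extra width shrinks as n grows. It reuses the calculator's example values (x̄ = 165, s = 8.5, n = 10); the helper name `mean_intervals` is hypothetical, and the z-interval treats s as if it were the known σ purely for comparison.

```python
import math
from scipy import stats

def mean_intervals(xbar, s, n, confidence=0.95):
    """Return (t_interval, z_interval) for a mean, from summary statistics."""
    se = s / math.sqrt(n)
    upper_tail = (1 + confidence) / 2          # e.g. 0.975 for 95% confidence
    t_star = stats.t.ppf(upper_tail, n - 1)
    z_star = stats.norm.ppf(upper_tail)        # treats s as if it were sigma
    t_ci = (xbar - t_star * se, xbar + t_star * se)
    z_ci = (xbar - z_star * se, xbar + z_star * se)
    return t_ci, z_ci

t_ci, z_ci = mean_intervals(165, 8.5, 10)
print(f"t-interval: [{t_ci[0]:.3f}, {t_ci[1]:.3f}]")
print(f"z-interval: [{z_ci[0]:.3f}, {z_ci[1]:.3f}]")

extra_width = {}
for n in (5, 10, 30, 100):
    t_n, z_n = mean_intervals(165, 8.5, n)
    extra_width[n] = 100 * ((t_n[1] - t_n[0]) / (z_n[1] - z_n[0]) - 1)
    print(f"n={n:>3}: t-interval is {extra_width[n]:.1f}% wider")
```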

Real-World Applications

Example 1: Pharmaceutical Testing

Problem: A pharmaceutical company tests a new blood pressure medication on 12 patients. The mean reduction is 8.5 mmHg with s = 4.2 mmHg. Is the drug effective?

Solution:

H₀: μ = 0 (no effect) vs H₁: μ > 0 (drug reduces blood pressure)
t = (8.5 - 0) / (4.2 / √12) = 8.5 / 1.21 = 7.01
df = 11, critical value t₀.₀₅,₁₁ = 1.796
Since 7.01 > 1.796, we reject H₀ (p < 0.001)

Example 2: Quality Control (Full Circle to Gosset!)

Problem: Guinness measures alcohol content in 8 samples. Target is 4.2%. Sample mean = 4.35%, s = 0.12%. Is production on target?

Solution:

H₀: μ = 4.2 vs H₁: μ ≠ 4.2
t = (4.35 - 4.20) / (0.12 / √8) = 0.15 / 0.0424 = 3.54
df = 7, critical value t₀.₀₂₅,₇ = 2.365
Since 3.54 > 2.365, production is significantly above target

Example 3: A/B Testing in Tech

Problem: A website tests a new checkout flow on 25 users. Mean conversion improvement: 2.4%, s = 5.1%. Is the improvement real?

Solution:

t = (2.4 - 0) / (5.1 / √25) = 2.4 / 1.02 = 2.35
df = 24, critical value t₀.₀₂₅,₂₄ = 2.064
p-value ≈ 0.027
Improvement is statistically significant at α = 0.05

Example 4: Financial Analysis

Problem: A hedge fund's strategy shows mean daily return of 0.08% over 50 trading days, with s = 0.5%. Is the strategy profitable?

Solution:

t = (0.08 - 0) / (0.5 / √50) = 0.08 / 0.0707 = 1.13
df = 49, critical value t₀.₀₅,₄₉ ≈ 1.677
p-value ≈ 0.13
Not enough evidence (could be luck)

AI/ML Applications

1. Robust Priors in Bayesian Deep Learning

The t-distribution's heavy tails make it an excellent prior distribution for neural network weights:

More tolerant of outliers than Gaussian priors
Allows occasional large weights without penalty explosion
Used in Bayesian neural networks for uncertainty quantification

🐍python

import torch.distributions as dist

# t-prior for network weights (more robust than Normal)
t_prior = dist.StudentT(df=4, loc=0, scale=1)

# Sample weights
weights = t_prior.sample((100,))
print(f"Weight range: [{weights.min():.3f}, {weights.max():.3f}]")

2. Small-Sample Model Comparison

When comparing two models on limited test sets (common in medical AI):

Wrong: Use Normal-based z-test
Right: Use paired t-test for significance

🐍python

from scipy import stats
import numpy as np

# Model accuracies on 15 test cases
model_a = np.array([0.82, 0.78, 0.85, 0.79, 0.88, 0.84, 0.80,
                    0.86, 0.81, 0.83, 0.77, 0.89, 0.82, 0.85, 0.80])
model_b = np.array([0.79, 0.75, 0.82, 0.78, 0.84, 0.81, 0.77,
                    0.83, 0.79, 0.80, 0.74, 0.85, 0.79, 0.82, 0.78])

# Paired t-test (correct for small samples)
t_stat, p_value = stats.ttest_rel(model_a, model_b)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
# The test is two-sided, so also check the sign of t before claiming A is better
print(f"Model A significantly better: {p_value < 0.05 and t_stat > 0}")

3. Robust Regression with t-Likelihood

Standard regression assumes Gaussian noise. For outlier-robust regression, use a t-distributed likelihood:

🐍python

import pyro
import pyro.distributions as dist

def robust_regression(x, y):
    # Priors
    weight = pyro.sample("weight", dist.Normal(0, 10))
    bias = pyro.sample("bias", dist.Normal(0, 10))
    df = pyro.sample("df", dist.Uniform(1, 30))  # Learn df
    scale = pyro.sample("scale", dist.HalfNormal(5))

    # t-distributed likelihood (robust to outliers!)
    mean = weight * x + bias
    with pyro.plate("data", len(y)):
        pyro.sample("obs", dist.StudentT(df, mean, scale), obs=y)

4. A/B Testing with Limited Data

Early-stage startups often can't wait for large sample sizes. The t-test provides valid inference even with n = 10-20, provided the observations are roughly Normal and free of extreme outliers:

🐍python

from scipy import stats
import numpy as np

def ab_test_welch(control, treatment, alpha=0.05):
    """
    A/B test using Welch's t-test, which does not assume
    equal variances in the two groups.
    """
    # Welch's t-test (unequal variances allowed)
    t_stat, p_value = stats.ttest_ind(treatment, control,
                                      equal_var=False)

    # Effect size (Cohen's d), using sample variances (ddof=1)
    pooled_std = np.sqrt((np.var(control, ddof=1) + np.var(treatment, ddof=1)) / 2)
    effect_size = (np.mean(treatment) - np.mean(control)) / pooled_std

    return {
        "t_statistic": t_stat,
        "p_value": p_value,
        "effect_size": effect_size,
        "significant": p_value < alpha
    }

5. Gradient Uncertainty in Training

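Gradients in mini-batch SGD are sample means of per-example gradients. With small batches, each gradient component divided by its estimated standard error is approximately t-distributed (assuming roughly Normal per-example gradients), so the noise has heavier tails than a Normal approximation suggests. This insight connects to: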

Adaptive learning rates (Adam, RMSprop)
Gradient clipping strategies
Batch size selection

Connections to Other Distributions

The Distribution Family Tree

Relationship	Formula/Description
t from Z and χ²	T = Z / √(χ²/ν) where Z ~ N(0,1), χ² ~ χ²(ν)
t² = F(1, ν)	Squaring t gives F-distribution with df (1, ν)
t(∞) = N(0,1)	As df → ∞, t converges to standard Normal
t(1) = Cauchy	Special case with no defined mean/variance
Relation to Beta	CDF involves regularized incomplete Beta function

Why These Connections Matter

Understanding these relationships helps with:

Deriving new tests: The F-test in one-way ANOVA generalizes the two-sample t-test; with two groups, F = t²
Computational efficiency: Can reuse Beta function implementations
Theoretical understanding: All these distributions arise from the Normal
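Two of these relationships can be verified numerically. The sketch below checks that a two-sided t p-value equals the F(1, ν) tail probability at t², and that the t CDF can be computed from the regularized incomplete Beta function using the standard identity $F(t) = 1 - \tfrac{1}{2} I_x(\nu/2, 1/2)$ with $x = \nu/(\nu + t^2)$ for t > 0. The values ν = 7 and t = 2.1 are arbitrary.

```python
import numpy as np
from scipy import stats
from scipy.special import betainc

nu, t_val = 7, 2.1

# t^2 ~ F(1, nu): the two-sided t p-value equals the F upper-tail probability
p_t = 2 * stats.t.sf(t_val, df=nu)
p_f = stats.f.sf(t_val**2, 1, nu)
print(f"two-sided t p-value: {p_t:.6f}")
print(f"F(1, {nu}) p-value:    {p_f:.6f}")

# t CDF via the regularized incomplete Beta function (valid for t > 0)
x = nu / (nu + t_val**2)
cdf_beta = 1 - 0.5 * betainc(nu / 2, 0.5, x)
cdf_scipy = stats.t.cdf(t_val, df=nu)
print(f"CDF via Beta: {cdf_beta:.6f}, scipy.stats.t.cdf: {cdf_scipy:.6f}")
```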

Python Implementation

Basic t-Distribution Operations

🐍python

from scipy import stats
import numpy as np

# Create t-distribution with df=10
t = stats.t(df=10)

# PDF and CDF
x = 2.0
print(f"PDF at x=2: {t.pdf(x):.6f}")
print(f"CDF at x=2: {t.cdf(x):.6f}")

# Critical values (quantiles)
alpha = 0.05
t_critical = t.ppf(1 - alpha/2)  # Two-tailed
print(f"Critical t for 95% CI: ±{t_critical:.4f}")

# Compare to Normal
z_critical = stats.norm.ppf(1 - alpha/2)
print(f"Critical z for 95% CI: ±{z_critical:.4f}")
print(f"Difference: {t_critical - z_critical:.4f}")

One-Sample t-Test

🐍python

from scipy import stats
import numpy as np

# Sample data
data = np.array([165, 170, 168, 172, 169, 175, 167, 171])

# One-sample t-test: Is mean different from 170?
t_stat, p_value = stats.ttest_1samp(data, popmean=170)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")

# Manual calculation for verification
n = len(data)
sample_mean = np.mean(data)
sample_std = np.std(data, ddof=1)  # ddof=1 for sample std
se = sample_std / np.sqrt(n)
t_manual = (sample_mean - 170) / se
print(f"Manual t: {t_manual:.4f}")

# Confidence interval
df = n - 1
t_crit = stats.t.ppf(0.975, df)
ci = (sample_mean - t_crit * se, sample_mean + t_crit * se)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")

Two-Sample t-Test

🐍python

from scipy import stats
import numpy as np

# Two groups
group_a = np.array([23, 25, 28, 24, 26, 27, 22, 25])
group_b = np.array([28, 30, 27, 29, 31, 28, 30, 29])

# Independent samples t-test; equal_var=False selects Welch's test
# (SciPy's default, equal_var=True, is the pooled-variance Student test)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print("Welch's t-test:")
print(f"  t-statistic: {t_stat:.4f}")
print(f"  p-value: {p_value:.4f}")

# Effect size (Cohen's d)
pooled_std = np.sqrt((np.var(group_a, ddof=1) + np.var(group_b, ddof=1)) / 2)
effect_size = (np.mean(group_b) - np.mean(group_a)) / pooled_std
print(f"  Effect size (Cohen's d): {effect_size:.4f}")

Paired t-Test

🐍python

from scipy import stats
import numpy as np

# Before and after measurements (paired data)
before = np.array([200, 195, 210, 188, 205, 198, 215, 192])
after = np.array([185, 182, 195, 175, 190, 185, 200, 180])

# Paired t-test
t_stat, p_value = stats.ttest_rel(before, after)
print("Paired t-test:")
print(f"  t-statistic: {t_stat:.4f}")
print(f"  p-value: {p_value:.4f}")

# Mean difference and CI
diff = before - after
mean_diff = np.mean(diff)
se_diff = np.std(diff, ddof=1) / np.sqrt(len(diff))
t_crit = stats.t.ppf(0.975, len(diff) - 1)
ci = (mean_diff - t_crit * se_diff, mean_diff + t_crit * se_diff)
print(f"  Mean difference: {mean_diff:.2f}")
print(f"  95% CI for difference: ({ci[0]:.2f}, {ci[1]:.2f})")

Common Pitfalls

Pitfall 1: Using z When You Should Use t

Wrong: Using z-test when σ is unknown (estimated from sample).

Why it matters: Leads to underestimated p-values and false positives.

Rule: Always use t when σ is estimated from data, regardless of sample size.

Pitfall 2: Ignoring the Normality Assumption

The t-test assumes the underlying population is approximately Normal.

For n > 30, CLT often provides robustness
For small n with skewed data, consider non-parametric alternatives
Check with Q-Q plots or Shapiro-Wilk test

Pitfall 3: Confusing df for Different Tests

Test Type	Degrees of Freedom
One-sample t-test	n - 1
Paired t-test	n - 1 (number of pairs - 1)
Two-sample (equal var)	n₁ + n₂ - 2
Welch's t-test	Complex formula (Satterthwaite approximation)
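The "complex formula" for Welch's test is the Welch-Satterthwaite approximation, $\nu \approx \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)}$. A sketch of it follows, checked against SciPy's Welch p-value; the helper name `welch_df` and the data are made up for illustration.

```python
import numpy as np
from scipy import stats

def welch_df(x1, x2):
    """Welch-Satterthwaite approximate degrees of freedom."""
    v1 = np.var(x1, ddof=1) / len(x1)   # squared standard error, group 1
    v2 = np.var(x2, ddof=1) / len(x2)   # squared standard error, group 2
    return (v1 + v2) ** 2 / (v1**2 / (len(x1) - 1) + v2**2 / (len(x2) - 1))

# Illustrative (made-up) groups with clearly different spreads
a = np.array([12.1, 11.8, 12.6, 12.0, 11.9, 12.3])
b = np.array([13.5, 10.2, 14.8, 11.1, 15.0, 9.9, 13.7, 12.4])

df_w = welch_df(a, b)
se = np.sqrt(np.var(a, ddof=1) / len(a) + np.var(b, ddof=1) / len(b))
t_stat = (a.mean() - b.mean()) / se
p_manual = 2 * stats.t.sf(abs(t_stat), df_w)

scipy_result = stats.ttest_ind(a, b, equal_var=False)
print(f"Welch df: {df_w:.3f} (pooled test would use {len(a) + len(b) - 2})")
print(f"manual p = {p_manual:.4f}, scipy p = {scipy_result.pvalue:.4f}")
```

The resulting df is generally not an integer and always lies between min(n₁, n₂) - 1 and n₁ + n₂ - 2.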

Pitfall 4: Multiple Testing Without Correction

Running many t-tests inflates false positive rate. Use Bonferroni correction or False Discovery Rate (FDR) control.
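Both corrections are short enough to sketch by hand. The helper names below are hypothetical, the p-values are made up, and the FDR procedure shown is Benjamini-Hochberg, one common choice that assumes independent or positively dependent tests.

```python
import numpy as np

def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha / m (controls family-wise error)."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / p.size

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls false discovery rate)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices of sorted p-values
    thresholds = alpha * np.arange(1, m + 1) / m   # alpha * rank / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()             # largest rank meeting its threshold
        reject[order[: k + 1]] = True              # reject everything up to that rank
    return reject

# Illustrative (made-up) p-values from 10 separate t-tests
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

n_uncorrected = int(np.sum(np.array(p_vals) < 0.05))
n_bonferroni = int(bonferroni(p_vals).sum())
n_bh = int(benjamini_hochberg(p_vals).sum())
print(f"significant without correction: {n_uncorrected}")
print(f"significant after Bonferroni:   {n_bonferroni}")
print(f"significant after BH (FDR):     {n_bh}")
```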

The p-hacking Trap

Don't run multiple tests and only report the significant ones. This is scientifically invalid and a form of p-hacking. Pre-register your hypothesis or use appropriate corrections.

Test Your Understanding

When should you use the t-distribution instead of the Normal distribution for inference about a population mean?

Summary

The Student t-distribution is fundamental for statistical inference when the population variance is unknown. Here are the key takeaways:

When to use t: Whenever you estimate σ from sample data, regardless of sample size
Heavy tails: The t-distribution has heavier tails than Normal, accounting for uncertainty in variance estimation
Degrees of freedom: Controls the shape; lower df = heavier tails; as df → ∞, t → Normal
Confidence intervals: t-based intervals are wider than z-based, providing honest uncertainty quantification
ML/AI applications: Robust priors, heavy-tailed likelihoods, small-sample model comparison

The Bottom Line: The t-distribution is the Normal distribution's more cautious, more honest cousin. It acknowledges what we don't know—and that honesty makes our statistical conclusions more reliable.

From Gosset to Modern ML

Over 100 years after Gosset's paper, the t-distribution remains essential. From clinical trials to A/B testing, from quality control to Bayesian deep learning, understanding the t-distribution is a core skill for any data scientist or ML engineer.