Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Define the F-distribution and explain its construction from two independent Chi-square distributions
Understand intuitively why the F-distribution measures the ratio of two variances
Calculate probabilities and critical values using the F-distribution with two degrees of freedom parameters
Apply F-tests for comparing variances, ANOVA, and regression model comparison
Derive the relationship between F, Chi-square, and t-distributions
Use F-tests for feature selection in machine learning (sklearn's f_classif)
Implement variance ratio tests and ANOVA in Python

The Big Picture: The Variance Comparator

"The F-distribution answers the fundamental question: Are these two sources of variation genuinely different, or just random fluctuation?"

Imagine you're a quality control engineer comparing the precision of two manufacturing machines. Machine A produces ball bearings with some variation in diameter. Machine B also produces ball bearings with some variation. The critical question: Is one machine more inconsistent than the other, or are the differences just due to random chance?

This is where the F-distribution shines. It provides a probability model for the ratio of two variances—telling us whether observed differences in spread are statistically meaningful.

The Core Insight

The F-distribution is the ratio of two independent Chi-square random variables, each scaled by their degrees of freedom. When two populations have equal variance, this ratio clusters around 1. When variances differ, the ratio deviates from 1, and the F-distribution tells us how surprising that deviation is.

Why F = 1 is Special

When you compute $F = \frac{s_1^2}{s_2^2}$ for two samples:

F ≈ 1: The sample variances are similar—no evidence that population variances differ
F >> 1: The numerator variance is much larger—the first population may be more variable
F << 1: The denominator variance is larger—but we typically arrange to put the larger variance on top

The Fisher Story

Setting: Rothamsted, England, 1920s

Ronald Aylmer Fisher (1890-1962) is often called the father of modern statistics. Working at Rothamsted Experimental Station, he faced a practical agricultural problem: How do you determine if different fertilizers, seed varieties, or farming techniques actually produce different crop yields?

The challenge wasn't just comparing means. Fisher realized that understanding variation was equally important. If one treatment produces more variable results, that inconsistency matters for farmers planning their harvests.

The genius of Fisher's approach: Instead of comparing groups one pair at a time, he developed Analysis of Variance (ANOVA), which uses the F-distribution to test whether any of the group means differ—in a single, powerful test.

The Naming

The distribution was named "F" in Fisher's honor by George W. Snedecor, another pioneer of statistical methods. Hence, it is sometimes called the Fisher-Snedecor distribution or simply the "F-distribution."

Fisher's Legacy in ML

Fisher's ideas permeate machine learning: Fisher's Linear Discriminant Analysis (LDA), the Fisher information matrix (used in natural gradient descent), and the F-test (used in sklearn's feature selection) all trace back to his foundational work.

Mathematical Definition

Definition 1: The F-Distribution

If $U \sim \chi^2(d_1)$ and $V \sim \chi^2(d_2)$ are independent chi-square random variables, then:

$$F = \frac{U/d_1}{V/d_2} \sim F(d_1, d_2)$$

where $d_1$ is the numerator degrees of freedom and $d_2$ is the denominator degrees of freedom.

Symbol Table

Symbol	Name	Meaning	Range
F	F-statistic	Ratio of scaled chi-squares	(0, ∞)
U	Numerator χ²	Chi-square with d₁ degrees of freedom	(0, ∞)
V	Denominator χ²	Chi-square with d₂ degrees of freedom	(0, ∞)
d₁	Numerator df	Degrees of freedom for numerator	1, 2, 3, ...
d₂	Denominator df	Degrees of freedom for denominator	1, 2, 3, ...

Intuitive Statement: The F-distribution tells us how the ratio of two sample variances (each measuring some type of variation) behaves when the underlying population variances are equal. It's the yardstick for judging whether observed variance differences are statistically meaningful.

Definition 2: Probability Density Function (PDF)

$$f(x; d_1, d_2) = \frac{\sqrt{\frac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x + d_2)^{d_1+d_2}}}}{x \cdot B(d_1/2, d_2/2)}$$

where $B(\cdot, \cdot)$ is the Beta function (related to the Gamma function).
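The closed form above can be checked numerically. The following is a minimal sketch, assuming SciPy is available; the helper name `f_pdf` is illustrative and not part of any library. It evaluates the formula directly and compares it with `scipy.stats.f.pdf`.

```python
import numpy as np
from scipy import stats
from scipy.special import beta

def f_pdf(x, d1, d2):
    """Evaluate the F(d1, d2) density at x > 0 using the closed-form PDF."""
    x = np.asarray(x, dtype=float)
    d1, d2 = float(d1), float(d2)  # keep the powers in floating point
    numerator = np.sqrt((d1 * x) ** d1 * d2 ** d2 / (d1 * x + d2) ** (d1 + d2))
    return numerator / (x * beta(d1 / 2, d2 / 2))

x = np.array([0.5, 1.0, 2.5])
manual = f_pdf(x, 5, 20)
reference = stats.f.pdf(x, 5, 20)

print("manual:   ", manual)
print("reference:", reference)
print("match:", np.allclose(manual, reference))
```

The direct formula overflows for large degrees of freedom because of the high powers, so in practice a log-space evaluation such as `stats.f.logpdf` is the safer choice.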

What the PDF Tells Us

The PDF is always right-skewed (mode < mean) and defined only for positive values. The shape depends entirely on the two degrees of freedom parameters—there's no separate scale or location parameter.

Definition 3: Key Moments

Property	Formula	Condition
Mean	d₂ / (d₂ - 2)	d₂ > 2
Mode	((d₁ - 2) / d₁) × (d₂ / (d₂ + 2))	d₁ > 2
Variance	[2d₂²(d₁ + d₂ - 2)] / [d₁(d₂-2)²(d₂-4)]	d₂ > 4
Skewness	Positive (right-skewed) whenever defined	d₂ > 6
Support	(0, ∞)	-

The Mean Approaches 1

As $d_2 \to \infty$, the mean $E[F] = d_2/(d_2-2) \to 1$. This makes intuitive sense: under the null hypothesis (equal variances), we expect the ratio of variances to be approximately 1.
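As a quick sanity check of the moment formulas in the table, the sketch below evaluates them for F(5, 20) and compares them with SciPy. The mode is located numerically on a grid, which is an illustrative shortcut rather than an exact method.

```python
import numpy as np
from scipy import stats

d1, d2 = 5, 20

# Closed-form moments from the table above
mean_formula = d2 / (d2 - 2)
var_formula = 2 * d2**2 * (d1 + d2 - 2) / (d1 * (d2 - 2) ** 2 * (d2 - 4))
mode_formula = ((d1 - 2) / d1) * (d2 / (d2 + 2))

dist = stats.f(d1, d2)

# Locate the mode numerically as the grid point with the highest density
grid = np.linspace(0.001, 5, 50001)
mode_numeric = grid[np.argmax(dist.pdf(grid))]

print(f"Mean:     formula={mean_formula:.4f}  scipy={dist.mean():.4f}")
print(f"Variance: formula={var_formula:.4f}  scipy={dist.var():.4f}")
print(f"Mode:     formula={mode_formula:.4f}  numeric={mode_numeric:.4f}")
```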

Interactive PDF Explorer

Explore how the F-distribution changes with different degrees of freedom. The snapshot below uses d₁ = 5 (numerator df, controls the numerator chi-square) and d₂ = 20 (denominator df, controls the denominator chi-square).

Statistics for F(5, 20)

Property	Value
Mean	1.1111
Variance	0.7099
Mode	0.5455
Support	(0, ∞)
Shape	Right-skewed with interior mode

Critical Values for F(5, 20)

α	Critical value
0.10	2.1582
0.05	2.7109
0.01	4.1027

Reject H₀ if your F-statistic exceeds the critical value for your chosen α.

Key Insight: The Variance Ratio

The F-distribution models the ratio of two sample variances. When F ≈ 1, the variances are similar. Large F values suggest the numerator variance is significantly larger than the denominator variance.

Observations

Low d₁ (1-2) creates a spike near 0—the distribution is heavily right-skewed
As both d₁ and d₂ increase, the distribution becomes more symmetric and bell-shaped
The critical value (F₀.₀₅) decreases as degrees of freedom increase
The F = 1 line shows where equal variances would center the distribution

The Chi-Square Ratio

Understanding how the F-distribution arises from chi-square ratios is key to intuition.

How F Arises from Chi-Square

Draw U ~ χ²(d₁) and V ~ χ²(d₂) independently, divide each by its degrees of freedom, and take the ratio:

F = (U/d₁) / (V/d₂)

Key Insight

With d₁ = 5 and d₂ = 10, a histogram of many such ratios approaches the theoretical F(5, 10) density as the number of draws grows. Most values cluster near F = 1 because both scaled chi-squares have expected value 1.

Why Divide by Degrees of Freedom?

Raw chi-square values grow with degrees of freedom (mean = df). By dividing each by its df, we normalize them:

$U/d_1$ has expected value 1 (under the null)
$V/d_2$ also has expected value 1
Their ratio $F$ centers around 1
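The construction can be simulated directly. This is a small sketch, assuming NumPy's `default_rng` and an arbitrary seed and sample size: it draws two independent chi-square samples, scales each by its degrees of freedom, and compares the resulting ratios with the theoretical F(5, 10) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility
d1, d2 = 5, 10
n_draws = 100_000

u = rng.chisquare(d1, size=n_draws)  # numerator chi-square draws
v = rng.chisquare(d2, size=n_draws)  # independent denominator draws
f_samples = (u / d1) / (v / d2)

theoretical = stats.f(d1, d2)
empirical_q95 = np.quantile(f_samples, 0.95)

print(f"Mean of U/d1: {np.mean(u / d1):.3f} (expected 1)")
print(f"Mean of V/d2: {np.mean(v / d2):.3f} (expected 1)")
print(f"Mean of F:    {f_samples.mean():.3f} (theory {theoretical.mean():.3f})")
print(f"95th pct:     {empirical_q95:.3f} (theory {theoretical.ppf(0.95):.3f})")
```

The sample mean of F lands near d₂/(d₂ - 2) = 1.25 rather than exactly 1, which is the small upward shift described by the mean formula.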

Key Properties

1. Reciprocal Property

If $F \sim F(d_1, d_2)$, then:

$$\frac{1}{F} \sim F(d_2, d_1)$$

Taking the reciprocal simply swaps the degrees of freedom. This is useful when you want to always test the larger variance in the numerator.
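One practical consequence of the reciprocal property is that lower-tail quantiles can be obtained from upper-tail quantiles with the degrees of freedom swapped. A short check, using F(5, 20) and α = 0.05 as illustrative choices:

```python
from scipy import stats

d1, d2 = 5, 20
alpha = 0.05

# Lower-tail quantile computed directly
lower_direct = stats.f.ppf(alpha, d1, d2)

# Same quantile via the reciprocal property: swap the df and invert
lower_via_reciprocal = 1 / stats.f.ppf(1 - alpha, d2, d1)

print(f"Direct:         {lower_direct:.6f}")
print(f"Via reciprocal: {lower_via_reciprocal:.6f}")
```

This identity is why printed F-tables usually list only upper-tail critical values.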

2. Only Positive Values

The F-distribution only takes values in $(0, \infty)$ because it's the ratio of two non-negative quantities (variances, which are sums of squares).

3. Limiting Behavior

Condition	F approaches	Intuition
d₂ → ∞	χ²(d₁) / d₁	Denominator becomes constant at 1
d₁ → ∞, d₂ → ∞	Normal distribution	CLT-like behavior
d₁ = 1	t²(d₂)	Squared t-distribution
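The first row of the table can be illustrated numerically. In this sketch (d₁ = 5 and the list of d₂ values are arbitrary choices), the 95th percentile of F(d₁, d₂) is compared with the 95th percentile of χ²(d₁)/d₁ as d₂ grows.

```python
from scipy import stats

d1 = 5
q = 0.95

# Limiting value: quantile of chi-square(d1) divided by d1
chi2_limit = stats.chi2.ppf(q, d1) / d1
print(f"chi2({d1})/{d1} quantile: {chi2_limit:.4f}")

for d2 in (10, 100, 10_000, 1_000_000):
    f_quantile = stats.f.ppf(q, d1, d2)
    print(f"d2 = {d2:>9,}: F quantile = {f_quantile:.4f}")

f_large = stats.f.ppf(q, d1, 1_000_000)
```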

4. Always Right-Skewed

The F-distribution is always positively skewed. The mode is less than the mean, and there's a long right tail for extreme F values.

Variance Ratio Test

The simplest application of the F-distribution: testing whether two populations have equal variance.

Variance Ratio Test (F-Test)

Hypothesis Test

H₀ (Null): σ₁² = σ₂² (population variances are equal)
H₁ (Alternative): σ₁² ≠ σ₂² (population variances are different)

Worked example (α = 0.05)

Sample	n	Mean	Variance (s²)
Sample 1	10	25.300	4.9000
Sample 2	10	29.900	5.4333

F-Statistic Calculation

F = s₂² / s₁² = 5.4333 / 4.9000 = 1.1088, with df = (9, 9). The larger sample variance goes in the numerator.

Decision

The two-tailed p-value is 0.880207, which is greater than α = 0.05.

Fail to reject H₀: No significant difference in variances

When to Use This Test

• Testing assumption of equal variances for t-test
• Comparing quality control between two processes
• Checking if two measurement methods have similar precision
• Validating homoscedasticity in regression

The Test Setup

Given independent samples from two normally distributed populations:

H₀: $\sigma_1^2 = \sigma_2^2$ (variances are equal)
H₁: $\sigma_1^2 \neq \sigma_2^2$ (variances differ)

Under H₀:

$$F = \frac{s_1^2}{s_2^2} \sim F(n_1 - 1, n_2 - 1)$$

Sensitivity to Normality

The F-test for variances is quite sensitive to non-normality. For robust alternatives, consider Levene's test or the Brown-Forsythe test, which work better with non-normal data.
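The sensitivity claim can be illustrated with a small Monte Carlo sketch. The setup is an assumption made for illustration: both samples come from the same distribution (so H₀ is true), n = 20 per sample, 4000 repetitions, and a Laplace distribution stands in for heavy-tailed data. The helper `rejection_rate` is hypothetical.

```python
import numpy as np
from scipy import stats

def rejection_rate(sampler, n=20, n_sims=4000, alpha=0.05, seed=0):
    """Fraction of simulated two-sided variance F-tests that reject a true H0."""
    rng = np.random.default_rng(seed)
    x = sampler(rng, (n_sims, n))  # each row is one simulated sample
    y = sampler(rng, (n_sims, n))
    f = x.var(axis=1, ddof=1) / y.var(axis=1, ddof=1)
    lower = stats.f.ppf(alpha / 2, n - 1, n - 1)
    upper = stats.f.ppf(1 - alpha / 2, n - 1, n - 1)
    return np.mean((f < lower) | (f > upper))

rate_normal = rejection_rate(lambda rng, size: rng.normal(size=size))
rate_laplace = rejection_rate(lambda rng, size: rng.laplace(size=size))

print(f"Type I error, normal data:  {rate_normal:.3f}")
print(f"Type I error, Laplace data: {rate_laplace:.3f}")
```

With normal data the rejection rate should sit close to the nominal 5%. With heavy-tailed data it is noticeably higher, even though the two populations have identical variance.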

ANOVA: The F-Test for Means

While the variance ratio test compares two variances directly, ANOVA uses the F-distribution to compare means across multiple groups. The key insight: if group means differ, the between-group variance will be larger than the within-group variance.

One-Way ANOVA Demo

Hypothesis Test

H₀: μ₁ = μ₂ = μ₃ = ... = μₖ (all group means are equal)
H₁: At least one group mean is different

ANOVA Table (k = 3 groups, N = 15 observations, α = 0.05)

Source	SS	df	MS	F	p-value
Between Groups	160.13	2	80.07	27.6092	<0.0001
Within Groups	34.80	12	2.90
Total	194.93	14

The F-Statistic

F = MSB / MSW = 80.07 / 2.90 = 27.6092

A large F indicates that the group means differ more than expected by chance.

Effect Size (η²)

η² = SS_Between / SS_Total = 160.13 / 194.93 = 82.1%

This is the proportion of variance explained by group membership.

Common benchmarks (Cohen): small ≈ 1% | medium ≈ 6% | large ≥ 14%

Decision (α = 0.05)

Reject H₀: Significant difference exists between at least one pair of groups. Use post-hoc tests (Tukey HSD) to determine which groups differ.

When to Use One-Way ANOVA

• Comparing means of 3+ groups (use t-test for 2 groups)
• Testing if a categorical variable affects a numeric outcome
• A/B/C testing in product experiments
• Comparing treatment effects in clinical trials

The ANOVA Logic

The total variation in data can be partitioned:

$$\underbrace{SS_{Total}}_{\text{Total Variation}} = \underbrace{SS_{Between}}_{\text{Due to Group Differences}} + \underbrace{SS_{Within}}_{\text{Random Error}}$$

The F-statistic compares these:

$$F = \frac{MS_{Between}}{MS_{Within}} = \frac{SS_B / (k-1)}{SS_W / (N-k)}$$

where:

$k$ = number of groups
$N$ = total number of observations
$k-1$ = between-group degrees of freedom
$N-k$ = within-group degrees of freedom
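The partition and the F-statistic can be computed by hand and compared with `scipy.stats.f_oneway`. The sketch below uses the three five-observation groups from this section's One-Way ANOVA code example; variable names are illustrative.

```python
import numpy as np
from scipy import stats

groups = [
    np.array([23, 25, 28, 24, 26]),
    np.array([28, 30, 27, 29, 31]),
    np.array([20, 22, 21, 23, 19]),
]

k = len(groups)                    # number of groups
N = sum(len(g) for g in groups)    # total number of observations
all_data = np.concatenate(groups)
grand_mean = all_data.mean()

# Partition the total sum of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ((all_data - grand_mean) ** 2).sum()

# Mean squares and F-statistic
f_manual = (ss_between / (k - 1)) / (ss_within / (N - k))
p_manual = stats.f.sf(f_manual, k - 1, N - k)

f_scipy, p_scipy = stats.f_oneway(*groups)

print(f"SS_B + SS_W = {ss_between + ss_within:.2f}, SS_T = {ss_total:.2f}")
print(f"Manual: F = {f_manual:.4f}, p = {p_manual:.2e}")
print(f"SciPy:  F = {f_scipy:.4f}, p = {p_scipy:.2e}")
```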

ANOVA vs Multiple t-tests

Why not just run t-tests for each pair of groups? With k groups, you'd run k(k-1)/2 tests, inflating the Type I error rate. ANOVA tests all groups simultaneously, controlling the error rate. If ANOVA is significant, use post-hoc tests (Tukey HSD) to find which pairs differ.
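To see how quickly the error rate grows, the sketch below computes the family-wise error rate for all pairwise tests among k groups. It uses the simplifying assumption that the tests are independent, which pairwise comparisons on shared groups are not, so the numbers are an approximation. The function name is hypothetical.

```python
from math import comb

def familywise_error_rate(k, alpha=0.05):
    """Approximate P(at least one false positive) across all k(k-1)/2 pairwise tests."""
    m = comb(k, 2)  # number of pairwise comparisons
    return 1 - (1 - alpha) ** m

for k in (3, 4, 5, 10):
    print(f"k = {k:>2}: {comb(k, 2):>2} tests, FWER ≈ {familywise_error_rate(k):.3f}")
```

Even with three groups, the chance of at least one false positive is already close to three times the nominal 5% level under this approximation.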

Real-World Applications

Example 1: Quality Control

Problem: Two production lines make electronic components. Are their manufacturing tolerances equally consistent?

Line A: n₁ = 25 samples, s₁² = 0.0025 mm²
Line B: n₂ = 30 samples, s₂² = 0.0049 mm²
F = 0.0049 / 0.0025 = 1.96
Two-tailed critical value at α = 0.05: F₀.₀₂₅(29, 24) ≈ 2.22
Conclusion: Fail to reject H₀—no significant difference in precision
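The figures in Example 1 can be reproduced from the summary statistics alone. This is a sketch assuming a two-sided test at α = 0.05, which compares F with the upper α/2 quantile; the one-sided critical value is shown for contrast.

```python
from scipy import stats

# Summary statistics from Example 1
n_a, var_a = 25, 0.0025   # Line A
n_b, var_b = 30, 0.0049   # Line B

# Larger variance in the numerator, so Line B's df come first
f_stat = var_b / var_a
dfn, dfd = n_b - 1, n_a - 1

alpha = 0.05
crit_two_sided = stats.f.ppf(1 - alpha / 2, dfn, dfd)
crit_one_sided = stats.f.ppf(1 - alpha, dfn, dfd)
p_two_sided = min(1.0, 2 * stats.f.sf(f_stat, dfn, dfd))

print(f"F = {f_stat:.2f}, df = ({dfn}, {dfd})")
print(f"Two-sided critical value: {crit_two_sided:.3f}")
print(f"One-sided critical value: {crit_one_sided:.3f}")
print(f"Two-sided p-value: {p_two_sided:.4f}")
print(f"Reject H0 (two-sided): {f_stat > crit_two_sided}")
```

The statistic sits close to the one-sided threshold, so the conclusion depends on whether the alternative hypothesis was chosen as two-sided or one-sided before looking at the data.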

Example 2: Clinical Trial

Problem: Three drug dosages are tested. Do they produce different average blood pressure reductions?

Low dose (n=20): mean reduction 5.2 mmHg
Medium dose (n=20): mean reduction 8.7 mmHg
High dose (n=20): mean reduction 12.1 mmHg
One-way ANOVA: F(2, 57) = 15.4, p < 0.001
Conclusion: Significant difference—proceed to pairwise comparisons

Example 3: Financial Risk

Problem: Is Portfolio A more volatile than Portfolio B?

Portfolio A: 60 months, variance = 0.0012 (monthly returns)
Portfolio B: 60 months, variance = 0.0008
F = 0.0012 / 0.0008 = 1.5
Test at α = 0.05 to determine if A is significantly more risky

Example 4: A/B/C Testing

Problem: Three website designs are tested. Do they produce different conversion rates?

Use ANOVA to test if any design differs
If significant, use Tukey HSD to find the winner
Report effect size (η²) for practical significance

AI/ML Applications

1. Feature Selection with F-Test

The F-test is a core method for feature selection in classification problems. sklearn's f_classif uses ANOVA F-tests to rank features by how well they separate classes:

How F-Test Selects Features

The F-test (via ANOVA) measures how well a feature separates different classes. Features with high F-scores have large between-class variance relative to within-class variance, making them good predictors. This is what sklearn.feature_selection.f_classif computes.

Example dataset: 3-species classification (Setosa, Versicolor, Virginica) based on petal/sepal measurements, with significance level α = 0.05.

Feature Ranking by F-Score

Feature	F	p-value	Significant at α = 0.05
Petal Length	271.81	<0.0001	Yes
Petal Width	204.41	<0.0001	Yes
Sepal Length	16.05	0.0004	Yes
Random Noise	0.00	1.0000	No

Python Equivalent (scikit-learn)

```python
from sklearn.feature_selection import f_classif, SelectKBest

# X is the feature matrix and y the class labels (loaded elsewhere)

# Calculate F-scores for all features
f_scores, p_values = f_classif(X, y)

# Select top k features
selector = SelectKBest(f_classif, k=2)
X_selected = selector.fit_transform(X, y)

# Get selected feature indices
selected_indices = selector.get_support(indices=True)
print(f"Selected features: {selected_indices}")
```

ML Engineering Insight

Notice how the "Random Noise" feature has a low F-score? That's because it doesn't help separate the classes. F-test based feature selection automatically identifies and can eliminate such uninformative features, reducing overfitting and improving model interpretability.

Features with high F-scores effectively distinguish between target classes, making them valuable for model training.

2. Model Comparison in Regression

When comparing nested regression models (one model is a subset of another), the F-test determines if the additional features significantly improve fit:

$$F = \frac{(RSS_1 - RSS_2)/p}{RSS_2/(n-k)}$$

where:

RSS₁ = Residual sum of squares for restricted model
RSS₂ = Residual sum of squares for full model
p = number of additional parameters
n-k = residual degrees of freedom of the full model (n observations, k parameters including the intercept)

```python
from scipy import stats

# Compare two nested regression models
# Model 1 (restricted): y = b0 + b1*x1                  (2 parameters)
# Model 2 (full):       y = b0 + b1*x1 + b2*x2 + b3*x3  (4 parameters)

rss_restricted = 120.5  # RSS from model with fewer features
rss_full = 95.2         # RSS from model with more features
p = 2                   # Additional parameters (x2 and x3)
n = 100                 # Sample size
k = 4                   # Parameters in full model

f_statistic = ((rss_restricted - rss_full) / p) / (rss_full / (n - k))
p_value = stats.f.sf(f_statistic, p, n - k)  # upper-tail probability

print(f"F-statistic: {f_statistic:.4f}")
print(f"p-value: {p_value:.6f}")
print(f"Additional features significant: {p_value < 0.05}")
```
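When the RSS values are not given, they can be obtained by fitting both models. The following is a sketch on synthetic data, assuming ordinary least squares via `np.linalg.lstsq`; the data-generating coefficients, the seed, and the helper `rss` are all made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed
n = 100

# Synthetic data: x2 has a real effect, x3 is pure noise (illustrative choice)
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + 0.8 * x2 + rng.normal(size=n)

def rss(design, target):
    """Residual sum of squares of the least-squares fit of target on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    return float(residuals @ residuals)

ones = np.ones(n)
X_restricted = np.column_stack([ones, x1])      # intercept + x1
X_full = np.column_stack([ones, x1, x2, x3])    # adds x2 and x3

rss_restricted = rss(X_restricted, y)
rss_full = rss(X_full, y)

p = X_full.shape[1] - X_restricted.shape[1]  # number of added parameters
k = X_full.shape[1]                          # parameters in the full model

f_statistic = ((rss_restricted - rss_full) / p) / (rss_full / (n - k))
p_value = stats.f.sf(f_statistic, p, n - k)

print(f"RSS restricted: {rss_restricted:.2f}, RSS full: {rss_full:.2f}")
print(f"F({p}, {n - k}) = {f_statistic:.4f}, p-value = {p_value:.2e}")
```

Because the restricted model is nested in the full model, the full RSS can never exceed the restricted RSS, so the statistic is always non-negative.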

3. Testing Homoscedasticity

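Many ML models (especially linear regression) assume that errors have constant variance (homoscedasticity); with a categorical feature, this means equal residual variance across groups. A variance-equality test helps validate this assumption. The function below uses Levene's test, which is more robust to non-normality than the classical F-test: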

```python
from scipy import stats

def check_homoscedasticity(residuals_by_group, alpha=0.05):
    """
    Test if regression residuals have equal variance
    across groups (e.g., categorical feature levels).

    residuals_by_group: sequence of 1-D arrays, one per group.
    Returns the Levene statistic and p-value.
    """
    # Levene's test is more robust than F-test
    stat, p_value = stats.levene(*residuals_by_group)

    if p_value < alpha:
        print("Warning: Heteroscedasticity detected!")
        print("Consider: weighted least squares or robust SE")
    else:
        print("Homoscedasticity assumption appears valid")

    return stat, p_value
```

4. Neural Network Pruning

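F-tests can guide model pruning decisions by testing if removing neurons or layers significantly impacts performance. Treat this as a heuristic: the nested-model F-test assumes linear models with Gaussian errors, which holds only approximately for neural networks.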

Train full model, record validation loss (RSS_full)
Prune model, record new validation loss (RSS_pruned)
F-test: Is the increase in loss significant given the reduction in parameters?
If not significant, accept the smaller model

5. ANOVA for Hyperparameter Tuning

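Use ANOVA to test if different hyperparameter settings produce significantly different cross-validation scores. Because CV folds share training data, the scores are not fully independent, so treat the p-value as a rough guide: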

```python
from scipy import stats

# CV scores for different learning rates
lr_001_scores = [0.82, 0.84, 0.81, 0.83, 0.85]
lr_01_scores = [0.88, 0.87, 0.89, 0.88, 0.90]
lr_1_scores = [0.75, 0.78, 0.74, 0.76, 0.77]

# One-way ANOVA
f_stat, p_value = stats.f_oneway(
    lr_001_scores,
    lr_01_scores,
    lr_1_scores
)

print(f"F-statistic: {f_stat:.4f}")
print(f"p-value: {p_value:.6f}")

if p_value < 0.05:
    print("Learning rates produce significantly different results")
    # Proceed with post-hoc analysis to find best
```

Connections to Other Distributions

The Distribution Family Tree

Relationship	Formula/Description
F from χ²	F = (χ²(d₁)/d₁) / (χ²(d₂)/d₂)
t² = F(1, ν)	Squaring a t gives F with (1, ν) df
F(d₁, ∞) → χ²(d₁)/d₁	As d₂ → ∞, F approaches scaled χ²
1/F(d₁, d₂) = F(d₂, d₁)	Reciprocal swaps degrees of freedom
Beta connection	CDF involves regularized incomplete Beta

The t² = F Connection

This is particularly important: when you square a t-statistic with ν degrees of freedom, you get an F(1, ν) random variable:

$$t^2(\nu) = F(1, \nu)$$

This explains why the pooled-variance (equal-variance) two-sample t-test and one-way ANOVA with two groups give identical two-sided p-values: they are testing the same hypothesis.
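The equivalence is easy to confirm numerically. The sketch below uses two small illustrative samples and compares `scipy.stats.ttest_ind` (with `equal_var=True`, the pooled-variance version) against `scipy.stats.f_oneway` on the same two groups.

```python
import numpy as np
from scipy import stats

group_a = np.array([23, 25, 28, 24, 26])
group_b = np.array([28, 30, 27, 29, 31])

# Pooled-variance two-sample t-test (two-sided by default)
t_stat, p_t = stats.ttest_ind(group_a, group_b, equal_var=True)

# One-way ANOVA on the same two groups
f_stat, p_f = stats.f_oneway(group_a, group_b)

print(f"t = {t_stat:.4f}, t^2 = {t_stat**2:.4f}, p = {p_t:.6f}")
print(f"F = {f_stat:.4f}, p = {p_f:.6f}")
```

With `equal_var=False` (Welch's t-test) the identity no longer holds exactly, because the test statistic and its degrees of freedom are computed differently.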

Python Implementation

Basic F-Distribution Operations

```python
from scipy import stats

# Create F-distribution with df = (5, 20)
f_dist = stats.f(dfn=5, dfd=20)

# PDF and CDF
x = 2.5
print(f"PDF at x={x}: {f_dist.pdf(x):.6f}")
print(f"CDF at x={x}: {f_dist.cdf(x):.6f}")

# Critical values for hypothesis testing
print("\nCritical values:")
print(f"F₀.₁₀(5,20) = {f_dist.ppf(0.90):.4f}")
print(f"F₀.₀₅(5,20) = {f_dist.ppf(0.95):.4f}")
print(f"F₀.₀₁(5,20) = {f_dist.ppf(0.99):.4f}")

# Statistics
print(f"\nMean: {f_dist.mean():.4f} (theoretical: {20/(20-2):.4f})")
print(f"Variance: {f_dist.var():.4f}")
```

Variance Ratio Test

```python
from scipy import stats
import numpy as np

def variance_ratio_test(sample1, sample2, alpha=0.05):
    """
    Test H0: σ₁² = σ₂² vs H1: σ₁² ≠ σ₂²
    """
    n1, n2 = len(sample1), len(sample2)
    var1 = np.var(sample1, ddof=1)
    var2 = np.var(sample2, ddof=1)

    # Put larger variance in numerator
    if var1 >= var2:
        F = var1 / var2
        dfn, dfd = n1 - 1, n2 - 1
    else:
        F = var2 / var1
        dfn, dfd = n2 - 1, n1 - 1

    # Two-tailed p-value: double the upper tail, capped at 1
    # (with unequal df, the doubled tail can exceed 1 when F is near 1)
    p_value = min(1.0, 2 * stats.f.sf(F, dfn, dfd))

    return {
        'F_statistic': F,
        'df': (dfn, dfd),
        'p_value': p_value,
        'reject_null': p_value < alpha
    }

# Example usage
sample_a = np.array([23, 25, 28, 24, 26, 27, 22, 25])
sample_b = np.array([28, 35, 27, 32, 31, 38, 30, 29])

result = variance_ratio_test(sample_a, sample_b)
print(f"F = {result['F_statistic']:.4f}")
print(f"df = {result['df']}")
print(f"p-value = {result['p_value']:.6f}")
print(f"Reject H0: {result['reject_null']}")
```

One-Way ANOVA

```python
from scipy import stats
import numpy as np

# Three groups
group_a = np.array([23, 25, 28, 24, 26])
group_b = np.array([28, 30, 27, 29, 31])
group_c = np.array([20, 22, 21, 23, 19])

# One-way ANOVA
F_stat, p_value = stats.f_oneway(group_a, group_b, group_c)

print("ANOVA Results:")
print(f"F-statistic: {F_stat:.4f}")
print(f"p-value: {p_value:.6f}")

# Effect size (eta-squared)
all_data = np.concatenate([group_a, group_b, group_c])
grand_mean = np.mean(all_data)
ss_total = np.sum((all_data - grand_mean)**2)

group_means = [np.mean(g) for g in [group_a, group_b, group_c]]
group_ns = [len(g) for g in [group_a, group_b, group_c]]
ss_between = sum(n * (m - grand_mean)**2
                 for n, m in zip(group_ns, group_means))

eta_squared = ss_between / ss_total
print(f"Effect size (η²): {eta_squared:.4f}")
```

Feature Selection with f_classif

```python
from sklearn.feature_selection import f_classif, SelectKBest
from sklearn.datasets import load_iris

# Load data
X, y = load_iris(return_X_y=True)

# Calculate F-scores for all features
f_scores, p_values = f_classif(X, y)

print("Feature Rankings by F-score:")
feature_names = ['sepal_length', 'sepal_width',
                 'petal_length', 'petal_width']
for name, f, p in zip(feature_names, f_scores, p_values):
    print(f"  {name}: F={f:.2f}, p={p:.2e}")

# Select top 2 features
selector = SelectKBest(f_classif, k=2)
X_selected = selector.fit_transform(X, y)

selected_idx = selector.get_support(indices=True)
print(f"\nSelected features: {[feature_names[i] for i in selected_idx]}")
```

Common Pitfalls

Pitfall 1: Confusing Degrees of Freedom Order

Wrong: F(dfd, dfn) instead of F(dfn, dfd)

Right: Always numerator first, denominator second. F(5, 20) means 5 df in numerator, 20 in denominator.
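A quick illustration of why the order matters: swapping the two degrees of freedom gives a very different critical value. The sketch uses SciPy's `dfn` and `dfd` keyword names for the numerator and denominator.

```python
from scipy import stats

# Correct order: 5 numerator df, 20 denominator df
crit_5_20 = stats.f.ppf(0.95, dfn=5, dfd=20)

# Swapped order: 20 numerator df, 5 denominator df
crit_20_5 = stats.f.ppf(0.95, dfn=20, dfd=5)

print(f"F_0.05(5, 20) = {crit_5_20:.4f}")
print(f"F_0.05(20, 5) = {crit_20_5:.4f}")
```

An F-statistic of 3.5 would be significant at α = 0.05 against the first critical value but not against the second.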

Pitfall 2: One-Tailed vs Two-Tailed

For variance ratio tests, you usually want a two-tailed test (variances could differ in either direction). But the F-distribution is not symmetric, so:

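```python
from scipy import stats

# Two-tailed p-value for F-test
# F, dfn, dfd: the statistic and its degrees of freedom, computed beforehand
# Put larger variance in numerator (so F >= 1), then double the upper tail
# and cap at 1, since the doubled tail can exceed 1 when F is close to 1:
p_value_two_tailed = min(1.0, 2 * stats.f.sf(F, dfn, dfd))
```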

Pitfall 3: Sensitivity to Non-Normality

The F-test for comparing variances is highly sensitive to departures from normality. With non-normal data:

Use Levene's test (more robust)
Use Brown-Forsythe test (uses median)
Consider non-parametric alternatives

Pitfall 4: Post-ANOVA Analysis

Wrong: ANOVA is significant, so immediately run multiple t-tests between all pairs.

Right: Use proper post-hoc tests (Tukey HSD, Bonferroni) that control the family-wise error rate.
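As one possible post-hoc workflow, the sketch below runs Tukey's HSD with `scipy.stats.tukey_hsd`, assuming SciPy 1.8 or later (where this function is available). The three groups are the ones used in this section's One-Way ANOVA example.

```python
import numpy as np
from scipy import stats

group_a = np.array([23, 25, 28, 24, 26])
group_b = np.array([28, 30, 27, 29, 31])
group_c = np.array([20, 22, 21, 23, 19])
labels = ["A", "B", "C"]

# Step 1: omnibus ANOVA
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)
print(f"ANOVA: F = {f_stat:.4f}, p = {p_anova:.2e}")

# Step 2: Tukey HSD controls the family-wise error rate across all pairs
result = stats.tukey_hsd(group_a, group_b, group_c)

for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        diff = result.statistic[i, j]  # difference in group means (i minus j)
        p_adj = result.pvalue[i, j]    # p-value adjusted for multiple comparisons
        print(f"{labels[i]} vs {labels[j]}: diff = {diff:+.2f}, adjusted p = {p_adj:.4f}")
```

The adjusted p-values can be compared directly with α, with no further correction needed.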

ANOVA Assumptions

Independence of observations
Normality within each group (or large samples)
Homogeneity of variances (Levene's test can check this)

Violating these can lead to inflated Type I error rates or reduced power.

Summary

The F-distribution is the workhorse for comparing variances and testing group differences. Key takeaways:

Definition: F is the ratio of two independent chi-squares, each scaled by their degrees of freedom
Two parameters: Numerator df (d₁) and denominator df (d₂)—order matters!
F ≈ 1 is consistent with equal variances: Under the null hypothesis, we expect F to cluster around 1
Primary applications: Variance comparison, ANOVA, regression model comparison, feature selection
Key relationship: t²(ν) = F(1, ν) connects t-tests to F-tests
ML relevance: sklearn's f_classif, model comparison, hyperparameter testing

The Bottom Line: Whenever you need to compare variances—whether between two populations, among multiple groups (ANOVA), or between nested models—the F-distribution provides the probability model for making sound statistical decisions.

From Fisher to Modern ML

Fisher's insights from the 1920s continue to power modern machine learning. Every time you use ANOVA for feature selection, compare regression models, or validate homoscedasticity assumptions, you're applying the F-distribution. Understanding it deeply makes you a more effective ML practitioner.