Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

📚 Core Knowledge

• Understand the principle of exchangeability under the null hypothesis
• Explain how permutation tests construct a null distribution empirically
• Describe when permutation tests are preferred over parametric alternatives
• Distinguish between exact and Monte Carlo permutation tests
• Compare permutation tests with bootstrap methods

🔧 Practical Skills

• Implement permutation tests for two-sample comparisons
• Calculate exact p-values for small samples
• Apply Monte Carlo approximation for large datasets
• Extend permutation logic to correlation and paired tests
• Use scipy and custom implementations for permutation inference

🧠 AI/ML Applications

• A/B Testing - Robust hypothesis tests for skewed metrics like revenue or conversion
• Feature Importance - Permutation importance for model interpretation
• Model Comparison - Statistical testing when comparing model performance
• Cross-Validation - Significance of CV score differences
• Fairness Auditing - Testing for disparate impact without distributional assumptions

Central Message: Permutation tests provide exact inference without distributional assumptions, provided the observations are exchangeable under the null hypothesis. By leveraging that exchangeability, we construct a null distribution directly from the data itself, a technique that predates and complements modern machine learning.

The Big Picture: Distribution-Free Inference

Throughout this chapter, we have explored tests like the t-test, chi-square test, and likelihood ratio test. These powerful methods all share a common requirement: they rely on asymptotic theory or distributional assumptions to derive the null distribution. But what if your data is highly skewed, has outliers, or comes from an unknown distribution?

Permutation tests (also called randomization tests or exact tests) offer an elegant solution. Instead of assuming a theoretical distribution, they construct the null distribution empirically by repeatedly shuffling the data. The core insight is beautifully simple:

💡 The Core Insight

If the null hypothesis is true, then the group labels are arbitrary. We could shuffle them randomly without affecting the underlying structure of the data. By observing how our test statistic behaves under all possible shuffles, we learn what values are "typical" under H₀.

Historical Context: Fisher's Lady Tasting Tea

The permutation test was pioneered by Ronald Fisher in the 1930s through his famous "Lady Tasting Tea" experiment. A colleague, Dr. Muriel Bristol, claimed she could tell whether milk or tea was poured first into a cup. Fisher designed a rigorous test:

Fisher's Experimental Design (1935)

Prepare 8 cups: 4 with milk first, 4 with tea first
Present cups in random order; she identifies which 4 had milk first
Count how many she correctly identifies
Calculate: What's the probability of this success rate if she were just guessing?

If she were guessing, all $\binom{8}{4} = 70$ ways of choosing 4 cups would be equally likely. The p-value is simply the proportion of these 70 arrangements that are as impressive as (or more impressive than) what she achieved.
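The counting argument above can be checked directly. The following is a small sketch, assuming the 8-cup design described here; the function name `tea_tasting_p_value` is illustrative and not part of the original text.

```python
from math import comb


def tea_tasting_p_value(n_correct: int, cups_per_type: int = 4) -> float:
    """P(at least n_correct milk-first cups identified) under pure guessing."""
    # Number of equally likely ways to pick which cups are "milk first".
    total = comb(2 * cups_per_type, cups_per_type)  # 70 for the 8-cup design
    # Ways to pick exactly k true milk-first cups and (cups_per_type - k) tea-first cups.
    favourable = sum(
        comb(cups_per_type, k) * comb(cups_per_type, cups_per_type - k)
        for k in range(n_correct, cups_per_type + 1)
    )
    return favourable / total


print(tea_tasting_p_value(4))  # 1/70, about 0.014
print(tea_tasting_p_value(3))  # 17/70, about 0.243
```

Only a perfect score gives a p-value below 0.05 in this design; three correct out of four is unremarkable under guessing.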

Why This Was Revolutionary: Fisher showed that meaningful statistical inference could be conducted without assuming any probability distribution. The null distribution comes directly from the randomization procedure itself.

The Core Principle: Exchangeability

The mathematical foundation of permutation tests is the concept of exchangeability. Under the null hypothesis, observations are exchangeable if their joint distribution is invariant to permutations of their labels.

Formal Definition of Exchangeability

$$(X_1, \ldots, X_n) \stackrel{d}{=} (X_{\pi(1)}, \ldots, X_{\pi(n)})$$

for every permutation $\pi$ of $\{1, \ldots, n\}$.

Intuition: If there is truly no treatment effect (H₀ is true), then whether an observation came from the "treatment" group or "control" group is just an arbitrary label. The data would look the same regardless of how we assigned these labels.

Scenario	Are Labels Exchangeable Under H₀?	Why?
A/B test with random assignment	Yes	Random assignment means labels are arbitrary if no effect
Drug trial: treatment vs placebo	Yes	If drug has no effect, assignment is irrelevant
Observational study: smokers vs non-smokers	Caution needed	Groups may differ systematically beyond just smoking
Time series: before vs after	Usually no	Temporal ordering typically matters

Critical Assumption: Permutation tests are not a free lunch. They require that observations are exchangeable under H₀, which typically holds when treatments are randomly assigned. For observational data, the assumption may be violated if the groups differ in ways beyond the treatment of interest.

Interactive: Understanding Exchangeability

This interactive demonstration shows how exchangeability works. Under the null hypothesis, we can shuffle group labels and create equally plausible datasets.

The Key Insight: Exchangeability

Under the null hypothesis (no treatment effect), the group labels are arbitrary. If there's truly no difference between groups, we could shuffle the labels without changing the underlying data structure. This is the principle of exchangeability.

Original Data with Labels

Group A: 12, 15, 18 (mean 15.0)
Group B: 25, 28, 30 (mean 27.7)
Difference in means (B - A): 12.7

Key Takeaway

The permutation test asks: "How often would we see a difference as extreme as 12.7 if we randomly shuffled the labels?" If such extreme differences are rare among all permutations, we have evidence against H₀.

Mathematical Framework

Let us formalize the permutation testing procedure for a two-sample comparison.

The Permutation Distribution

Consider two groups with observations $X_1, \ldots, X_{n_1}$ (Group A) and $Y_1, \ldots, Y_{n_2}$ (Group B). Let $T$ be our test statistic (e.g., difference in means).

Permutation Test Procedure

Compute observed statistic: Calculate $T_{\text{obs}}$ from the original data
Pool the data: Combine all $n = n_1 + n_2$ observations into one set
Generate permutations: For each of the $\binom{n}{n_1}$ possible ways to assign $n_1$ observations to "Group A":
- Calculate the test statistic $T^{(b)}$
Build null distribution: The collection $\{T^{(1)}, T^{(2)}, \ldots\}$ forms the permutation distribution
Calculate p-value: Compare $T_{\text{obs}}$ to the permutation distribution
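The five steps above can be sketched on the six-observation example from the exchangeability demonstration (A = 12, 15, 18; B = 25, 28, 30). This is a minimal illustration that assumes the difference in means as the test statistic; the variable names are illustrative.

```python
from itertools import combinations

import numpy as np

group_a = np.array([12.0, 15.0, 18.0])
group_b = np.array([25.0, 28.0, 30.0])

# Step 1: observed statistic (difference in means, B minus A).
t_obs = group_b.mean() - group_a.mean()

# Step 2: pool the data.
pooled = np.concatenate([group_a, group_b])
n, n_a = len(pooled), len(group_a)

# Steps 3-4: enumerate every way to choose which observations form "Group A".
perm_stats = []
for idx in combinations(range(n), n_a):
    in_a = np.zeros(n, dtype=bool)
    in_a[list(idx)] = True
    perm_stats.append(pooled[~in_a].mean() - pooled[in_a].mean())
perm_stats = np.array(perm_stats)

# Step 5: two-sided p-value; the small tolerance guards against float round-off.
p_two_sided = np.mean(np.abs(perm_stats) >= abs(t_obs) - 1e-12)

print(len(perm_stats))  # 20 relabellings, i.e. C(6, 3)
print(round(t_obs, 2))  # 12.67
print(p_two_sided)      # 0.1
```

With three observations per group only 20 relabellings exist, so the smallest attainable two-sided p-value is 2/20 = 0.1, even though the observed split is the most extreme one possible.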

P-Value Calculation

The permutation p-value is calculated as the proportion of permuted statistics that are as extreme or more extreme than the observed statistic:

Permutation P-Value Formulas

Two-sided:

$$p = \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}(|T^{(b)}| \geq |T_{\text{obs}}|)$$

Right-tailed:

$$p = \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}(T^{(b)} \geq T_{\text{obs}})$$

Left-tailed:

$$p = \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}(T^{(b)} \leq T_{\text{obs}})$$

where $B$ is the number of permutations and $\mathbf{1}(\cdot)$ is the indicator function.

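Exact vs Monte Carlo: When $\binom{n}{n_1}$ is small (roughly < 10,000), we can enumerate all permutations for an exact p-value. For larger samples, we sample B permutations randomly for a Monte Carlo approximation. With B = 10,000 permutations, the standard error of the estimated p-value is at most 0.005 (attained at p = 0.5), so the Monte Carlo error stays within about ±0.01 at 95% confidence and is smaller for p-values near 0 or 1.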
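To make the Monte Carlo error concrete, here is a short sketch that treats the estimated p-value as a binomial proportion over B random permutations. The helper names are hypothetical. The second helper shows the commonly used add-one estimate, which counts the observed labelling as one of the permutations so that a Monte Carlo p-value is never reported as exactly zero; this correction is an addition to the formulas above, not part of them.

```python
import math


def mc_standard_error(p: float, n_resamples: int) -> float:
    """Binomial standard error of a Monte Carlo p-value estimate."""
    return math.sqrt(p * (1.0 - p) / n_resamples)


def mc_p_value(n_extreme: int, n_resamples: int) -> float:
    """Add-one Monte Carlo p-value: (count + 1) / (B + 1)."""
    return (n_extreme + 1) / (n_resamples + 1)


for p in (0.5, 0.05, 0.01):
    se = mc_standard_error(p, 10_000)
    print(f"true p = {p}: SE = {se:.4f}, approx. 95% margin = {1.96 * se:.4f}")

# No permuted statistic as extreme as the observed one in 9,999 draws:
print(mc_p_value(0, 9_999))  # 0.0001 rather than 0
```

The error shrinks with the square root of B, so halving it requires four times as many permutations.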

Interactive: Permutation Test Explorer

This interactive visualization lets you see the permutation test in action. Run permutations, build the null distribution, and observe how the p-value is calculated.

Group A (Control): 23, 28, 31, 25, 27 (mean 26.80)
Group B (Treatment): 35, 38, 32, 40, 36 (mean 36.20)
Observed Difference (B - A): 9.40
H₀: This difference is due to random chance
n = 1000

Types of Permutation Tests

The permutation principle extends to many testing scenarios beyond two-sample means:

Two-Sample Tests

Compare two independent groups. Test statistic options:

Difference in means: $\bar{Y} - \bar{X}$
Difference in medians
t-statistic (a studentized statistic such as Welch's t is more robust when variances differ)
Any function of the two groups

Paired Tests

For matched pairs (before/after, twins, etc.):

Compute differences $D_i = Y_i - X_i$
Randomly flip signs of differences
Test if mean difference differs from zero
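For small paired samples the sign-flip distribution can be enumerated exactly. The sketch below uses the ten blood-pressure pairs that appear in this section's Python implementation and assumes the mean difference as the statistic; with n pairs there are 2^n equally likely sign patterns under H₀.

```python
from itertools import product

import numpy as np

before = np.array([140, 145, 138, 150, 142, 148, 155, 140, 143, 147])
after = np.array([132, 138, 130, 145, 135, 140, 148, 132, 138, 140])

# Step 1: paired differences and the observed mean difference.
d = (after - before).astype(float)
observed = d.mean()

# Step 2: all 2**n sign patterns, one row per pattern (1024 rows for n = 10).
signs = np.array(list(product([-1.0, 1.0], repeat=len(d))))
null_means = (signs * d).mean(axis=1)

# Step 3: two-sided exact p-value (tolerance guards against float round-off).
p_two_sided = np.mean(np.abs(null_means) >= abs(observed) - 1e-12)

print(observed)      # -7.0
print(p_two_sided)   # 0.001953125, i.e. 2 / 1024
```

Every difference here is negative, so only the all-original and all-flipped patterns are as extreme as the observed mean, which gives the smallest two-sided p-value possible with ten pairs.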

Correlation Tests

Test H₀: X and Y are independent:

Keep X values fixed
Permute Y values (break the pairing)
Calculate correlation under each permutation

Multi-Group Tests (k groups)

Extension to ANOVA-style comparisons:

Pool all observations
Randomly assign to k groups (respecting sizes)
Use F-statistic or sum of squared deviations
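The multi-group recipe above can be sketched as follows. This is an illustrative implementation under stated assumptions: the data are simulated (hypothetical), the function names `f_statistic` and `permutation_anova` are not from the original text, and the p-value uses the add-one Monte Carlo estimate.

```python
import numpy as np


def f_statistic(values: np.ndarray, labels: np.ndarray) -> float:
    """One-way ANOVA statistic F = MS_between / MS_within."""
    groups = [values[labels == g] for g in np.unique(labels)]
    grand_mean = values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return float((ss_between / df_between) / (ss_within / df_within))


def permutation_anova(values, labels, n_resamples=2000, seed=0):
    """Monte Carlo permutation test using the F-statistic (right-tailed)."""
    rng = np.random.default_rng(seed)
    observed = f_statistic(values, labels)
    n_extreme = 0
    for _ in range(n_resamples):
        # Shuffling the label vector reassigns observations while keeping group sizes.
        if f_statistic(values, rng.permutation(labels)) >= observed:
            n_extreme += 1
    return observed, (n_extreme + 1) / (n_resamples + 1)


# Hypothetical data: three groups of six, the third shifted upward.
data_rng = np.random.default_rng(1)
values = np.concatenate([
    data_rng.normal(10, 2, 6),
    data_rng.normal(10, 2, 6),
    data_rng.normal(14, 2, 6),
])
labels = np.repeat([0, 1, 2], 6)

f_obs, p_value = permutation_anova(values, labels)
print(f"F = {f_obs:.2f}, permutation p = {p_value:.4f}")
```

Only large F values count as extreme, so the test is right-tailed even though the underlying alternative (some group differs) has no direction.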

Advantages and Limitations

Aspect	Advantage	Limitation
Assumptions	No distributional assumptions (non-parametric)	Requires exchangeability under H₀
Validity	Exact p-values for any sample size	Only tests H₀, not parameters
Robustness	Works with outliers, skewed data, any shape	May be less powerful than parametric tests when assumptions hold
Computation	Conceptually simple; easy to implement	Can be slow for large datasets
Flexibility	Any test statistic can be used	No confidence intervals directly

Interactive: Robustness Comparison

This simulation compares the Type I error rates of permutation tests versus t-tests under various conditions. See how permutation tests maintain validity even when the t-test assumptions are violated.

[Interactive simulation: Permutation vs Parametric robustness comparison. Controls: sample size per group (default 20), effect size (default 0.5), and distribution skewness (0 = normal; higher = more right-skewed).]
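The permutation half of such a simulation can be sketched offline. The code below is an assumption-laden illustration, not the page's simulation: it draws both groups from the same right-skewed log-normal distribution (so H₀ is true), runs a Monte Carlo permutation test on each simulated dataset, and reports how often the test rejects at α = 0.05. Function names and default settings are hypothetical.

```python
import numpy as np


def perm_p_value(x, y, n_resamples, rng):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    pooled = np.concatenate([x, y])
    n_x = len(x)
    observed = y.mean() - x.mean()
    # Each row is an independent shuffle of the pooled sample.
    shuffled = rng.permuted(np.tile(pooled, (n_resamples, 1)), axis=1)
    diffs = shuffled[:, n_x:].mean(axis=1) - shuffled[:, :n_x].mean(axis=1)
    n_extreme = np.sum(np.abs(diffs) >= abs(observed))
    return (n_extreme + 1) / (n_resamples + 1)


def type_one_error_rate(n_per_group=20, n_trials=300, n_resamples=499,
                        alpha=0.05, seed=0):
    """Fraction of null datasets on which the permutation test rejects."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_trials):
        # Same skewed distribution for both groups, so any rejection is a false positive.
        x = rng.lognormal(0.0, 1.0, n_per_group)
        y = rng.lognormal(0.0, 1.0, n_per_group)
        if perm_p_value(x, y, n_resamples, rng) <= alpha:
            rejections += 1
    return rejections / n_trials


rate = type_one_error_rate()
print(f"Estimated Type I error at alpha = 0.05: {rate:.3f}")
```

With 300 trials the estimate carries sampling noise of roughly ±0.025, so values somewhat above or below 0.05 are expected; a comparison with the t-test would repeat the same loop with a parametric p-value.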

Permutation vs Bootstrap

Both permutation tests and bootstrap are resampling methods, but they serve different purposes:

Permutation Tests

Purpose: Hypothesis testing
Sampling: Without replacement (shuffle labels)
Generates: Null distribution
Centered at: Zero (or null value)
Answers: "Is the observed effect real?"

Bootstrap

Purpose: Estimation uncertainty
Sampling: With replacement
Generates: Sampling distribution
Centered at: Observed statistic
Answers: "How precise is our estimate?"

Interactive: Resampling Methods Comparison

Compare the permutation and bootstrap distributions side by side. Notice how the permutation distribution is centered at zero (the null hypothesis) while the bootstrap distribution is centered at the observed difference.

Permutation vs Bootstrap: Two Resampling Philosophies

Group A: 18, 21, 24, 19, 23 (mean 21.0)
Group B: 28, 31, 27, 32, 29 (mean 29.4)
Observed Difference: 8.4

Permutation Test: shuffles labels between groups; samples without replacement; tests H₀: groups are exchangeable; distribution centered at zero.

Bootstrap: resamples observations within groups; samples with replacement; estimates the sampling distribution; distribution centered at the observed difference.

[Interactive chart: 500 resamples per method.]

Key Difference

Permutation tests generate a null distribution (what we'd see if H₀ were true), while bootstrap estimates the sampling distribution of the statistic. Use permutation for hypothesis testing; use bootstrap for confidence intervals.

When to Use Each:

Use permutation tests when testing hypotheses (p-values)
Use bootstrap when constructing confidence intervals
For A/B tests: Use permutation for the test, bootstrap for effect size CIs
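The contrast can be reproduced numerically with the two five-value groups from the comparison above. This is a sketch under simple assumptions: the statistic is the difference in means, the bootstrap interval is a plain percentile interval, and the resample count and seed are arbitrary choices.

```python
import numpy as np

group_a = np.array([18, 21, 24, 19, 23], dtype=float)
group_b = np.array([28, 31, 27, 32, 29], dtype=float)
observed = group_b.mean() - group_a.mean()  # 8.4

rng = np.random.default_rng(0)
n_resamples = 5000
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)

perm_diffs = np.empty(n_resamples)
boot_diffs = np.empty(n_resamples)
for i in range(n_resamples):
    # Permutation: shuffle the pooled data (no replacement) and re-split.
    shuffled = rng.permutation(pooled)
    perm_diffs[i] = shuffled[n_a:].mean() - shuffled[:n_a].mean()
    # Bootstrap: resample each group separately, with replacement.
    boot_a = rng.choice(group_a, size=len(group_a), replace=True)
    boot_b = rng.choice(group_b, size=len(group_b), replace=True)
    boot_diffs[i] = boot_b.mean() - boot_a.mean()

# Hypothesis test from the permutation distribution (add-one Monte Carlo estimate).
p_value = (np.sum(np.abs(perm_diffs) >= abs(observed)) + 1) / (n_resamples + 1)
# Uncertainty of the estimate from the bootstrap distribution.
ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])

print(f"Permutation distribution mean: {perm_diffs.mean():.2f}")
print(f"Bootstrap distribution mean:   {boot_diffs.mean():.2f}")
print(f"Permutation p-value: {p_value:.4f}")
print(f"Bootstrap 95% percentile CI: ({ci_low:.2f}, {ci_high:.2f})")
```

The permutation distribution sits near zero because relabelling destroys any group effect, while the bootstrap distribution sits near 8.4 because each group is resampled from itself.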

Applications in AI/ML

Permutation tests have become increasingly important in modern machine learning. Here are key applications:

🎯 Permutation Feature Importance

Permutation importance measures feature importance by shuffling each feature and observing the drop in model performance. Unlike built-in importance measures, it works for any model and doesn't require model internals.

from sklearn.inspection import permutation_importance
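The idea behind permutation importance can be sketched without any modelling library. The example below is a hand-rolled illustration, not scikit-learn's implementation: the data are simulated, the "model" is an ordinary least-squares fit via `numpy.linalg.lstsq`, the score is R² on a held-out split, and names such as `r2_score` and `n_repeats` are local to this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y depends strongly on x0, weakly on x1, not at all on x2.
n = 500
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

X_train, X_test = X[:350], X[350:]
y_train, y_test = y[:350], y[350:]

# Stand-in model: ordinary least squares with an intercept.
design = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)


def r2_score(X_eval: np.ndarray, y_eval: np.ndarray) -> float:
    """Coefficient of determination of the fitted model on (X_eval, y_eval)."""
    pred = coef[0] + X_eval @ coef[1:]
    ss_res = np.sum((y_eval - pred) ** 2)
    ss_tot = np.sum((y_eval - y_eval.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


baseline = r2_score(X_test, y_test)
n_repeats = 20
importances = np.zeros(X.shape[1])
for j in range(X.shape[1]):
    drops = []
    for _ in range(n_repeats):
        X_shuffled = X_test.copy()
        # Shuffling one column breaks its link to y but keeps its marginal distribution.
        X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
        drops.append(baseline - r2_score(X_shuffled, y_test))
    importances[j] = np.mean(drops)

for j, imp in enumerate(importances):
    print(f"feature x{j}: mean drop in R^2 = {imp:.3f}")
```

Scoring on held-out data rather than the training split keeps the importance from rewarding features the model has merely memorised.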

🧪 A/B Testing for Skewed Metrics

Revenue, purchase amount, and session duration are often highly skewed with outliers. The t-test's normal approximation may fail. Permutation tests provide valid inference regardless of the metric's distribution.

📊 Model Comparison Testing

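Is Model A's CV accuracy of 0.92 significantly better than Model B's 0.89? A paired permutation test on the per-fold score differences (randomly flipping their signs) gives an answer without asymptotic assumptions. For paired per-example classification outcomes, McNemar's test is a related option. Keep in mind that CV folds share training data, so fold scores are only approximately exchangeable.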

⚖️ Algorithmic Fairness Auditing

Testing whether a model's predictions have disparate impact across demographic groups. Permutation tests assess whether observed disparities could arise by chance, without requiring strong distributional assumptions.

Python Implementation

Complete Permutation Test Implementation

Imports

We use numpy for numerical operations and scipy.stats for comparison with parametric tests. The typing module supplies the type hints.

Function Signature

The permutation_test function is designed to be flexible: it accepts any test statistic function and handles both exact enumeration and Monte Carlo sampling.

Observed Statistic

First, we compute the test statistic on the original data. This is the value we'll compare against the permutation distribution.

Exact vs Monte Carlo

We check if the total number of possible permutations C(n, n_a) is small enough to enumerate exactly. For small samples, we get exact p-values; for large samples, we use Monte Carlo.

Exact Enumeration

For small samples, we enumerate all possible ways to assign n_a observations to group A using combinations. This gives an exact p-value.

Monte Carlo Sampling

For large samples, we randomly shuffle the pooled data and split it into two groups. Repeating this n_permutations times approximates the exact distribution.

P-Value Calculation

The p-value is the proportion of permuted statistics at least as extreme as the observed one. For two-sided tests, we use absolute values.

Skewed Data Example

This A/B test example uses log-normal data (common for revenue metrics). The permutation test handles skewness correctly without assuming normality.

Correlation Test

To test independence between X and Y, we shuffle Y (breaking the pairing) while keeping X fixed. This preserves marginal distributions.

Paired Test

For paired data, we randomly flip the signs of differences. Under H₀ (no effect), positive and negative differences are equally likely.

Scipy Integration

Recent SciPy releases provide scipy.stats.permutation_test; its availability depends on the installed SciPy version, not on the Python version. It supports different permutation types: 'independent' for two-sample, 'samples' for paired, and 'pairings' for correlation.

from itertools import combinations
from math import comb
from typing import Callable, Literal, Optional

import numpy as np
from scipy import stats

# =============================================
# Generic Permutation Test Framework
# =============================================

def permutation_test(
    group_a: np.ndarray,
    group_b: np.ndarray,
    statistic: Callable[[np.ndarray, np.ndarray], float] = lambda a, b: np.mean(b) - np.mean(a),
    n_permutations: int = 10000,
    alternative: Literal['two-sided', 'greater', 'less'] = 'two-sided',
    seed: Optional[int] = None
) -> dict:
    """
    Perform a two-sample permutation test.

    Parameters
    ----------
    group_a : array-like
        Observations from first group
    group_b : array-like
        Observations from second group
    statistic : callable
        Function that computes test statistic from (group_a, group_b)
    n_permutations : int
        Number of Monte Carlo permutations. If the number of distinct
        splits C(n, n_a) does not exceed this value, every split is
        enumerated instead and the test is exact.
    alternative : str
        'two-sided', 'greater', or 'less'
    seed : int, optional
        Random seed for reproducibility

    Returns
    -------
    dict with 'statistic', 'p_value', 'permutation_distribution',
    'n_permutations', 'exact'
    """
    # Local generator: reproducible without touching NumPy's global state
    rng = np.random.default_rng(seed)
    group_a = np.asarray(group_a)
    group_b = np.asarray(group_b)

    # Compute observed statistic
    observed = statistic(group_a, group_b)

    # Pool all observations
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    n_total = len(pooled)

    # Check if exact enumeration is feasible
    n_exact = comb(n_total, n_a)
    exact = n_exact <= n_permutations

    if exact:
        # Exact test: enumerate every way to choose the members of group A
        perm_stats = np.empty(n_exact)
        for k, indices in enumerate(combinations(range(n_total), n_a)):
            in_a = np.zeros(n_total, dtype=bool)
            in_a[list(indices)] = True
            perm_stats[k] = statistic(pooled[in_a], pooled[~in_a])
    else:
        # Monte Carlo approximation: random shuffles of the pooled data
        perm_stats = np.empty(n_permutations)
        for i in range(n_permutations):
            shuffled = rng.permutation(pooled)
            perm_stats[i] = statistic(shuffled[:n_a], shuffled[n_a:])

    # Count permuted statistics at least as extreme as the observed one
    if alternative == 'two-sided':
        n_extreme = np.sum(np.abs(perm_stats) >= np.abs(observed))
    elif alternative == 'greater':
        n_extreme = np.sum(perm_stats >= observed)
    elif alternative == 'less':
        n_extreme = np.sum(perm_stats <= observed)
    else:
        raise ValueError(f"Unknown alternative: {alternative!r}")

    if exact:
        p_value = n_extreme / n_exact
    else:
        # Add-one correction: the observed labelling counts as one permutation,
        # so a Monte Carlo p-value is never reported as exactly zero
        p_value = (n_extreme + 1) / (n_permutations + 1)

    return {
        'statistic': observed,
        'p_value': float(p_value),
        'permutation_distribution': perm_stats,
        'n_permutations': len(perm_stats),
        'exact': exact
    }


# =============================================
# Example 1: Basic two-sample test
# =============================================

# Simulated A/B test data (revenue per user)
data_rng = np.random.default_rng(42)
control = data_rng.lognormal(3, 1, 50)      # Control group: skewed revenue
treatment = data_rng.lognormal(3.2, 1, 50)  # Treatment: log-mean +0.2 (about 22% higher mean)

result = permutation_test(control, treatment, n_permutations=10000, seed=0)

print("=== Two-Sample Permutation Test ===")
print(f"Observed difference in means: {result['statistic']:.2f}")
print(f"P-value: {result['p_value']:.4f}")
print(f"Exact test: {result['exact']}")

# Compare with t-test (may be unreliable for skewed data!)
t_stat, t_pval = stats.ttest_ind(treatment, control)
print(f"\nFor comparison - t-test p-value: {t_pval:.4f}")


# =============================================
# Example 2: Permutation test for correlation
# =============================================

def permutation_correlation_test(
    x: np.ndarray,
    y: np.ndarray,
    n_permutations: int = 10000,
    seed: Optional[int] = None
) -> dict:
    """Test H0: X and Y are independent."""
    rng = np.random.default_rng(seed)

    observed_r, _ = stats.pearsonr(x, y)

    perm_correlations = np.empty(n_permutations)
    for i in range(n_permutations):
        # Shuffle y only: breaks the pairing, keeps both marginals
        perm_y = rng.permutation(y)
        perm_correlations[i], _ = stats.pearsonr(x, perm_y)

    n_extreme = np.sum(np.abs(perm_correlations) >= np.abs(observed_r))
    p_value = (n_extreme + 1) / (n_permutations + 1)

    return {
        'correlation': observed_r,
        'p_value': float(p_value),
        'permutation_distribution': perm_correlations
    }

# Test correlation between advertising spend and sales
ad_spend = np.array([10, 15, 20, 25, 30, 35, 40, 45, 50, 55])
sales = np.array([120, 145, 170, 190, 220, 245, 260, 290, 310, 340])

corr_result = permutation_correlation_test(ad_spend, sales, seed=0)
print("\n=== Permutation Correlation Test ===")
print(f"Observed correlation: {corr_result['correlation']:.4f}")
print(f"P-value: {corr_result['p_value']:.4f}")


# =============================================
# Example 3: Paired permutation test (sign flip)
# =============================================

def paired_permutation_test(
    before: np.ndarray,
    after: np.ndarray,
    n_permutations: int = 10000,
    seed: Optional[int] = None
) -> dict:
    """Test H0: No difference (by randomly flipping signs of differences)."""
    rng = np.random.default_rng(seed)

    differences = np.asarray(after) - np.asarray(before)
    observed_mean = np.mean(differences)

    perm_means = np.empty(n_permutations)
    for i in range(n_permutations):
        # Randomly flip signs
        signs = rng.choice([-1, 1], size=len(differences))
        perm_means[i] = np.mean(differences * signs)

    n_extreme = np.sum(np.abs(perm_means) >= np.abs(observed_mean))
    p_value = (n_extreme + 1) / (n_permutations + 1)

    return {
        'mean_difference': observed_mean,
        'p_value': float(p_value),
        'permutation_distribution': perm_means
    }

# Blood pressure before and after treatment
bp_before = np.array([140, 145, 138, 150, 142, 148, 155, 140, 143, 147])
bp_after = np.array([132, 138, 130, 145, 135, 140, 148, 132, 138, 140])

paired_result = paired_permutation_test(bp_before, bp_after, seed=0)
print("\n=== Paired Permutation Test ===")
print(f"Mean BP change (after - before): {paired_result['mean_difference']:.2f} mmHg")
print(f"P-value: {paired_result['p_value']:.4f}")


# =============================================
# Using scipy.stats.permutation_test (requires a recent SciPy release)
# =============================================

from scipy.stats import permutation_test as scipy_perm_test

def stat_func(x, y, axis):
    # Vectorized statistic: SciPy passes batches of resamples along `axis`
    return np.mean(x, axis=axis) - np.mean(y, axis=axis)

scipy_result = scipy_perm_test(
    (treatment, control),
    stat_func,
    vectorized=True,
    n_resamples=10000,
    alternative='two-sided',
    permutation_type='independent'
)

print("\n=== scipy.stats.permutation_test ===")
print(f"Statistic: {scipy_result.statistic:.4f}")
print(f"P-value: {scipy_result.pvalue:.4f}")

Knowledge Check

Test your understanding of permutation tests with this interactive quiz.


What is the key assumption that permutation tests rely on under the null hypothesis?


Chapter 15: Complete Test Selection Guide

After covering all the major statistical tests in this chapter, here's a comprehensive guide to help you choose the right test for your situation.

Decision Flowchart: Which Test Should I Use?

Step 1: What type of data?

Continuous (means) → Go to Step 2
Categorical (counts) → Chi-square tests (Section 2)
Variances → F-tests (Section 3)

Step 2: How many groups?

One group vs known value → One-sample t-test
Two groups (independent) → Two-sample t-test (or Welch's)
Two groups (paired/matched) → Paired t-test
3+ groups → ANOVA/F-test

Step 3: Are assumptions met?

Normality holds, large n → Parametric test (t, F, χ²)
Normality violated, small n → Non-parametric alternative
Outliers or skewed data → Permutation test (Section 6)

Parametric vs Non-Parametric Alternatives

When distributional assumptions are violated or sample sizes are small, non-parametric tests provide valid alternatives. Here's a comprehensive mapping:

Situation	Parametric Test	Non-Parametric Alternative	When to Use Alternative
One sample, location	One-sample t-test	Wilcoxon signed-rank	Non-normal, small n, outliers
Two independent samples	Two-sample t-test	Mann-Whitney U (Wilcoxon rank-sum)	Skewed data, ordinal data
Two paired samples	Paired t-test	Wilcoxon signed-rank	Non-normal differences, small n
3+ independent groups	One-way ANOVA	Kruskal-Wallis H	Unequal variances, non-normal
3+ related samples	Repeated measures ANOVA	Friedman test	Non-normal, ordinal data
Correlation	Pearson r	Spearman ρ or Kendall τ	Non-linear, ordinal, outliers
2×2 contingency	Chi-square test	Fisher's exact test	Small expected counts (<5)
General two-sample	t-test	Permutation test	Any violation, skewed, small n

When to Use Parametric

Data approximately normal (or large n by CLT)
Variances roughly equal across groups
Need maximum statistical power
Want confidence intervals for parameters
Sample size is moderate to large (n > 30)

When to Use Non-Parametric

Data heavily skewed or with outliers
Sample size is small (n < 20-30)
Data is ordinal (rankings) not interval
Uncertain about distributional assumptions
Want robustness over efficiency

Complete Test Summary

Test (Section)	Purpose	Key Formula/Statistic	Assumptions
Z-test (1)	Mean when σ known	Z = (x̄ - μ₀) / (σ/√n)	Normal data, known σ
t-test (1)	Mean when σ unknown	t = (x̄ - μ₀) / (s/√n)	Normal (or large n), unknown σ
Chi-square (2)	Categorical associations	χ² = Σ(O-E)²/E	Expected counts ≥ 5
F-test (3)	Variance comparison, ANOVA	F = MS_between / MS_within	Normal, equal variances
LRT (4)	Nested model comparison	-2 log(L₀/L₁) ~ χ²	Large samples (asymptotic)
Wald (5)	Parameter significance	(θ̂ - θ₀)² / Var(θ̂)	Large samples, MLE computed
Score (5)	Parameter significance	U²/I(θ₀)	Large samples, null computed
Permutation (6)	Distribution-free test	Any statistic	Exchangeability under H₀

Rule of Thumb: When in doubt, start with the permutation test. It's valid under the weakest assumptions and often has power comparable to parametric tests. Use parametric tests when you need confidence intervals or when you're confident in assumptions.

Summary

Key Takeaways

Distribution-free inference: Permutation tests require no distributional assumptions. They work correctly for any data shape—skewed, multimodal, with outliers.
Exchangeability principle: Under H₀, group labels are arbitrary. We can shuffle them to build the null distribution directly from the data.
Exact p-values: For small samples, we can enumerate all permutations for exact inference. For large samples, Monte Carlo sampling provides accurate approximations.
Flexibility: Any test statistic can be used (means, medians, custom functions). The same principle extends to paired tests, correlation, and multi-group comparisons.
Bootstrap distinction: Permutation tests shuffle labels to create a null distribution (hypothesis testing). Bootstrap resamples with replacement to estimate sampling variability (confidence intervals).
ML applications: Permutation importance for feature selection, A/B testing for skewed metrics, model comparison, and fairness auditing all leverage permutation logic.

Quick Reference

Test Type	What Gets Permuted	Test Statistic	Use Case
Two-sample	Group labels	Mean difference, t-statistic	A/B tests, treatment effects
Paired	Signs of differences	Mean of signed differences	Before/after comparisons
Correlation	Y values (keep X fixed)	Pearson r, Spearman ρ	Testing independence
Multi-group	Group labels	F-statistic, Kruskal-Wallis H	Comparing >2 groups

Final Thought: Permutation tests embody a beautiful principle: when we don't know the null distribution, we can construct it from the data itself. This approach, pioneered by Fisher nearly a century ago, remains one of the most powerful and underutilized tools in the modern data scientist's toolkit. With computational power now abundant, there's rarely a reason not to use permutation tests when parametric assumptions are questionable.
