Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

📚 Core Knowledge

• Derive and apply z-intervals when σ is known
• Understand why the t-distribution is needed when σ is unknown
• Construct confidence intervals for variance using χ²
• Explain the asymmetry of variance CIs

🔧 Practical Skills

• Construct and interpret CIs for the mean (both z and t)
• Build CIs for comparing two means (A/B testing)
• Determine required sample sizes for desired precision
• Implement these methods in Python for ML workflows

Where You'll Apply This: Model accuracy confidence bounds, A/B test analysis for comparing model versions, uncertainty quantification in predictions, hyperparameter sensitivity analysis, and communicating model performance to stakeholders.

The Big Picture

The normal distribution is the workhorse of statistical inference. Thanks to the Central Limit Theorem, sample means are approximately normal for large samples, whatever the shape of the underlying population distribution, provided its variance is finite. This makes confidence intervals based on normal theory widely applicable.

But there's a critical distinction: Do we know the population variance σ²? This seemingly simple question leads to two different procedures with profound implications for small-sample inference.

Two Scenarios for Estimating μ

σ Known

Z-interval
Use standard normal distribution
z* = 1.96 for 95% CI

Rare in practice; used in quality control with established standards

σ Unknown

T-interval
Use Student's t-distribution
t* depends on df = n - 1

Common in practice; accounts for uncertainty in estimating σ

Historical Development

👨‍🔬

William Sealy Gosset (1908)

Working at Guinness Brewery, Gosset faced small-sample problems in quality control. He discovered the t-distribution and published under the pseudonym "Student" because Guinness prohibited employees from publishing.

📊

R. A. Fisher (1920s-30s)

Fisher provided rigorous mathematical foundations for Gosset's work, proved key distribution properties, and developed the F-distribution for comparing variances.

The development of the t-distribution was a breakthrough for science. Before Gosset, statisticians either required large samples or assumed σ was known. The t-distribution liberated researchers to draw valid inferences from small experiments.

Z-Interval: Known Variance

When the population standard deviation σ is known, we can construct an exact confidence interval for the mean μ using the standard normal distribution. This scenario is relatively rare in practice but provides the foundation for understanding more complex procedures.

The Z-Interval Formula

Confidence Interval for μ (Known σ)

\bar{X} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}

The derivation follows from the sampling distribution of the mean:

Standardize the sample mean

Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)

Find critical values for desired confidence

P(-z^* \leq Z \leq z^*) = 1 - \alpha

Substitute and rearrange

P\left(-z^* \leq \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \leq z^*\right) = 1 - \alpha

Isolate μ in the middle

P\left(\bar{X} - z^*\frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X} + z^*\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha

Confidence Level	α	z*
90%	0.10	1.645
95%	0.05	1.960
99%	0.01	2.576
99.9%	0.001	3.291
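The critical values in this table can be recomputed rather than looked up. The sketch below assumes SciPy is available; the helper name `z_critical` is introduced here for illustration only.

```python
from scipy import stats


def z_critical(confidence):
    """Two-sided standard normal critical value z* for a confidence level."""
    alpha = 1 - confidence
    # alpha/2 goes in each tail, so z* is the (1 - alpha/2) quantile of N(0, 1).
    return stats.norm.ppf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99, 0.999):
    print(f"{level:.1%}  alpha = {1 - level:.3f}  z* = {z_critical(level):.3f}")
```

Computing z* from the quantile function avoids hard-coding 1.96 when the confidence level is a parameter.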

Interactive: Z-Interval Visualizer

Explore how the z-interval changes with different parameter values. Watch how the interval relates to the sampling distribution and observe when it captures or misses the true mean.

Example settings: true μ = 100 (unknown in practice), known σ = 15, sample size n = 25, sample mean x̄ = 103.0, confidence level 95%.

Z-Interval Formula

CI = x̄ ± z* · (σ/√n)

= 103.00 ± 1.960 · (15/√25)

= 103.00 ± 5.880

= [97.120, 108.880]

CI Covers True μ

The 95% CI [97.12, 108.88] contains the true μ = 100.

Key Insight

The z-interval is exact when σ is known and the population is normal; for non-normal populations it is approximate, with the approximation improving as n grows. The critical value z* = 1.960 for 95% confidence comes from P(-z* < Z < z*) = 0.95.
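What "95% confidence" means can be checked by simulation: draw many samples from a normal population with known σ, build a z-interval from each, and count how often the interval contains the true mean. This is a minimal sketch; the function name, seed, and number of repetitions are illustrative choices, and the population values (μ = 100, σ = 15, n = 25) mirror the example above.

```python
import numpy as np
from scipy import stats


def z_interval_coverage(mu=100.0, sigma=15.0, n=25, confidence=0.95,
                        n_reps=2000, seed=0):
    """Fraction of simulated z-intervals that contain the true mean."""
    rng = np.random.default_rng(seed)
    z_star = stats.norm.ppf(1 - (1 - confidence) / 2)
    me = z_star * sigma / np.sqrt(n)          # same margin for every sample
    samples = rng.normal(mu, sigma, size=(n_reps, n))
    x_bars = samples.mean(axis=1)             # one sample mean per repetition
    covered = (x_bars - me <= mu) & (mu <= x_bars + me)
    return covered.mean()


coverage = z_interval_coverage()
print(f"Empirical coverage: {coverage:.3f}")
```

The empirical coverage should land close to the nominal 0.95, with some Monte Carlo noise.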

T-Interval: Unknown Variance

In most real-world situations, we don't know the population standard deviation σ. We must estimate it using the sample standard deviation s. This introduces extra uncertainty that the standard normal distribution doesn't account for.

Why the t-Distribution?

When we replace σ with s in our standardized statistic, something fundamental changes:

Known σ

Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)

Z is normally distributed because σ is a constant.

Unknown σ

T = \frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}

T follows the t-distribution with n-1 degrees of freedom.

The Key Insight: When we estimate σ with s, the denominator becomes a random variable. For a normal population, T is a standard normal variable divided by the square root of an independent chi-square variable over its degrees of freedom, which is exactly the definition of the t-distribution. Its heavier tails compensate for the extra uncertainty.
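The heavier tails can be seen directly in a small simulation. The sketch below draws many samples of size n = 6 from a standard normal population (an illustrative choice), computes T for each, and compares how often |T| exceeds 1.96 with what the normal and t(5) distributions predict.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_reps = 6, 20000                      # small samples, many repetitions

samples = rng.normal(0.0, 1.0, size=(n_reps, n))
x_bar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)           # sample standard deviation
t_stats = x_bar / (s / np.sqrt(n))        # true mean is 0

empirical_tail = np.mean(np.abs(t_stats) > 1.96)
normal_tail = 2 * stats.norm.sf(1.96)     # what N(0, 1) would predict
t_tail = 2 * stats.t.sf(1.96, df=n - 1)   # what t(n-1) predicts

print(f"P(|T| > 1.96): simulated {empirical_tail:.3f}, "
      f"normal {normal_tail:.3f}, t({n - 1}) {t_tail:.3f}")
```

If the statistic were normal, about 5% of values would fall beyond ±1.96; with s in the denominator the simulated fraction is noticeably larger and tracks the t(5) prediction instead.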

Interactive: T-Distribution Derivation

See exactly why the t-distribution has heavier tails and how it approaches the normal as the sample size increases.

Why We Need the t-Distribution

When σ is unknown and must be estimated by s, the standardized statistic follows a t-distribution, not a normal distribution.

The Key Derivation

Known σ: Z-statistic

Z = (x̄ - μ) / (σ/√n) ~ N(0, 1)

Unknown σ: Replace with sample s

T = (x̄ - μ) / (s/√n) ~ ???

This is NOT normally distributed because s is random!

The key relationship

T = Z / √(V/ν) where V = (n-1)s²/σ² ~ χ²(n-1), ν = n-1, and Z and V are independent (true for normal samples)

This is the definition of a t-distribution with ν = n-1 degrees of freedom

Result: t-distribution

T = (x̄ - μ) / (s/√n) ~ t(n-1)

The t-distribution has heavier tails to account for the extra uncertainty in estimating σ

Degrees of freedom: df = n - 1 = 5 (n = 6), at the 95% confidence level.

Figure: t(5) vs Standard Normal

Critical Values at 95% Confidence

df	2	5	10	20	30	50	100	∞
t*	4.303	2.571	2.228	2.086	2.042	2.009	1.984	1.960
% wider than z*	+119.5%	+31.2%	+13.7%	+6.4%	+4.2%	+2.5%	+1.2%	-

Key Insight

With df = 5, the t-critical value is 2.571, which is 31.2% larger than z* = 1.96. This makes the t-interval wider, accounting for the uncertainty in estimating σ with s. As df → ∞, t → z.
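The critical values for these degrees of freedom can be recomputed with SciPy's t quantile function. This is a small sketch; the variable names are illustrative.

```python
from scipy import stats

z_star = stats.norm.ppf(0.975)            # two-sided 95% normal critical value

t_rows = []
for df in (2, 5, 10, 20, 30, 50, 100):
    t_star = stats.t.ppf(0.975, df)       # two-sided 95% t critical value
    pct_wider = 100 * (t_star / z_star - 1)
    t_rows.append((df, t_star, pct_wider))
    print(f"df = {df:>3}  t* = {t_star:.3f}  ({pct_wider:+.1f}% vs z*)")
```

Because the interval width is proportional to the critical value, the percentage column is also the percentage by which a t-interval is wider than a z-interval built from the same standard error.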

The T-Interval Formula

Confidence Interval for μ (Unknown σ)

\bar{X} \pm t^*_{n-1} \cdot \frac{s}{\sqrt{n}}

where $t^*_{n-1}$ is the critical value from t-distribution with n-1 degrees of freedom

Assumptions for valid t-intervals:

Random Sample: Observations are independent and randomly selected
Normality: The population is approximately normal, OR the sample size is large enough for the CLT to apply (n ≥ 30 is a common rule of thumb; heavily skewed populations may need more)
No Extreme Outliers: The t-interval is sensitive to outliers, especially with small samples

Interactive: CI Calculator

Use this calculator to construct t-intervals from your data. It also shows the z-interval for comparison, illustrating how the difference diminishes with larger samples.

Confidence Interval Calculator

Example inputs: sample mean x̄ = 165, sample standard deviation s = 8.5, sample size n = 10, confidence level 95%.

t-interval vs z-interval Comparison

Calculate degrees of freedom

df = n - 1 = 10 - 1 = 9

Find critical t-value for 95% confidence

t_{0.025, df=9} = 2.2622 (vs z = 1.9600)

Calculate standard error

SE = s / √n = 8.5 / √10 = 2.6879

Calculate margin of error

ME = t* × SE = 2.2622 × 2.6879 = 6.0805

Construct confidence interval

CI = x̄ ± ME = 165 ± 6.0805

= [158.9195, 171.0805]

95% t-Confidence Interval

[158.919, 171.081]

Width: 12.1611

z-interval (if σ known)

[159.732, 170.268]

Width: 10.5367

Width Difference

+15.4% wider

t-interval accounts for variance uncertainty

Interpretation

We are 95% confident that the true population mean lies between 158.919 and 171.081. With only 10 observations, the t-interval is 15.4% wider than the z-interval to account for uncertainty in estimating σ.
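The calculator steps above translate into a few lines of Python when only summary statistics are available. This is a minimal sketch; the function name `t_interval_from_summary` is introduced here for illustration.

```python
import math
from scipy import stats


def t_interval_from_summary(x_bar, s, n, confidence=0.95):
    """t-interval for the mean from summary statistics (x_bar, s, n)."""
    df = n - 1
    t_star = stats.t.ppf(1 - (1 - confidence) / 2, df)
    se = s / math.sqrt(n)                 # standard error of the mean
    me = t_star * se                      # margin of error
    return x_bar - me, x_bar + me


# Same inputs as the worked example: x_bar = 165, s = 8.5, n = 10
lower, upper = t_interval_from_summary(165, 8.5, 10)
print(f"95% t-interval: [{lower:.3f}, {upper:.3f}]")
```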

CI for Variance

Sometimes we're interested in the population variance σ² itself, not just the mean. This arises in quality control, risk assessment, and when comparing variability between groups or processes.

The Chi-Square Pivotal Quantity

For a random sample from a normal population, the sample variance relates to the population variance through a chi-square distribution:

Chi-Square Pivotal Quantity

\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}

Using this pivotal quantity, we can derive the confidence interval for variance:

Confidence Interval for σ²

\left[\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}, \frac{(n-1)s^2}{\chi^2_{\alpha/2}}\right]

For standard deviation, take the square root of both bounds

Asymmetry Alert! Unlike CIs for the mean, the CI for variance is asymmetric. The chi-square distribution is right-skewed, so the upper bound is further from s² than the lower bound. This asymmetry is especially pronounced for small degrees of freedom.

Interactive: Variance CI Calculator

Explore how the chi-square distribution shapes the confidence interval for variance. Notice the asymmetry and how it changes with degrees of freedom.

Example inputs: sample variance s² = 25, sample size n = 20 (df = 19), confidence level 95%.

Figure: chi-square distribution with df = 19

The pivotal quantity

(n-1)s²/σ² ~ χ²(n-1)

Find chi-square critical values

χ²_(0.025, 19) = 8.9065
χ²_(0.975, 19) = 32.8523

Solve for σ²

P(χ²_L < (n-1)s²/σ² < χ²_U) = 1 - α
⇒ P((n-1)s²/χ²_U < σ² < (n-1)s²/χ²_L) = 1 - α

Calculate CI bounds

Lower = (19 × 25) / 32.8523 = 14.459
Upper = (19 × 25) / 8.9065 = 53.332

95% CI for Variance (σ²)

[14.459, 53.332]

Width: 38.873

95% CI for Std Dev (σ)

[3.802, 7.303]

Take √ of variance bounds

Why Is This CI Asymmetric?

The chi-square distribution is right-skewed (especially for small df), so the CI is not centered on s². The distance from s² to the upper bound (28.33) is larger than the distance to the lower bound (10.54). This asymmetry decreases as df increases.
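The same calculation can be sketched in Python from summary statistics, using SciPy's chi-square quantiles. The function name is illustrative, and the method assumes the data come from a normal population.

```python
import math
from scipy import stats


def variance_interval_from_summary(s2, n, confidence=0.95):
    """Chi-square CI for sigma^2 from the sample variance s2 and size n."""
    df = n - 1
    alpha = 1 - confidence
    chi2_lo = stats.chi2.ppf(alpha / 2, df)       # lower-tail quantile
    chi2_hi = stats.chi2.ppf(1 - alpha / 2, df)   # upper-tail quantile
    # Dividing by the LARGER quantile gives the LOWER bound, and vice versa.
    return df * s2 / chi2_hi, df * s2 / chi2_lo


var_lower, var_upper = variance_interval_from_summary(25, 20)
print(f"95% CI for variance: [{var_lower:.3f}, {var_upper:.3f}]")
print(f"95% CI for std dev:  [{math.sqrt(var_lower):.3f}, {math.sqrt(var_upper):.3f}]")
print(f"Distance below s2: {25 - var_lower:.2f}, above s2: {var_upper - 25:.2f}")
```

The last line makes the asymmetry visible: the upper bound sits much further from s² than the lower bound.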

Two-Sample Confidence Intervals

Often we want to compare means from two independent groups. This is the foundation of A/B testing, treatment comparisons, and many experimental designs.

Pooled vs Welch Approach

There are two main approaches to constructing two-sample CIs, depending on whether we assume equal variances in both populations:

Pooled Variance Approach

Assumes $\sigma_1^2 = \sigma_2^2$

s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}

df = n₁ + n₂ - 2

Welch's Approach

Does NOT assume equal variances

SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

df calculated using Welch-Satterthwaite formula

Practical Recommendation: Welch's approach is generally preferred because it's more robust. If the variances are actually equal, Welch's method performs almost as well as the pooled approach. If they're not equal, it's substantially better.
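To make the comparison concrete, the sketch below computes both intervals from summary statistics. The function name and the example numbers are hypothetical; the Welch degrees of freedom use the standard Welch-Satterthwaite approximation.

```python
import math
from scipy import stats


def two_sample_interval(m1, s1, n1, m2, s2, n2, confidence=0.95, pooled=False):
    """CI for mu2 - mu1 from summary statistics; returns (lower, upper, df)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    if pooled:
        df = n1 + n2 - 2
        sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(v1 + v2)
        # Welch-Satterthwaite approximation to the degrees of freedom
        df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    t_star = stats.t.ppf(1 - (1 - confidence) / 2, df)
    diff = m2 - m1
    return diff - t_star * se, diff + t_star * se, df


# Hypothetical data: the smaller group is also the noisier one
welch = two_sample_interval(50, 5, 30, 53, 12, 10)
pooled = two_sample_interval(50, 5, 30, 53, 12, 10, pooled=True)
print(f"Welch : [{welch[0]:.2f}, {welch[1]:.2f}], df = {welch[2]:.1f}")
print(f"Pooled: [{pooled[0]:.2f}, {pooled[1]:.2f}], df = {pooled[2]}")
```

In this setup the pooled interval is narrower because it averages the small variance of the large group into the standard error, which understates the uncertainty. With equal sample sizes and equal sample variances, the two methods give the same degrees of freedom.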

Interactive: Two-Sample CI

Explore how to construct and interpret confidence intervals for the difference between two means. This is exactly what you'll do in A/B testing!

Inputs: for Group 1 (control) and Group 2 (treatment), the sample mean (x̄₁, x̄₂), standard deviation (s₁, s₂), and sample size (n₁, n₂), plus the confidence level and an option to assume equal variances (pooled interval).

CI for μ₂ - μ₁

Welch's Approach

SE = √(s₁²/n₁ + s₂²/n₂) = 3.7148

df (Welch's) ≈ 45

t* = 2.0141

ME = 7.4821

95% CI for μ₂ - μ₁

[-0.482, 14.482]

Not statistically significant. The CI contains 0, so we cannot conclude the groups differ.

A/B Testing Connection

This is exactly what happens in A/B testing! If your metric is the conversion rate or average revenue, and the CI for (Treatment - Control) excludes 0, you have evidence that the treatment has a real effect. The CI width shows how precisely you've estimated the effect size.

Sample Size Planning

Before collecting data, you should determine how many observations you need to achieve your desired precision. The sample size formula for a given margin of error is:

Required Sample Size

n = \left(\frac{z^* \cdot \sigma}{ME}\right)^2

where ME is the desired margin of error

The Square Root Law: Because margin of error decreases as 1/√n, you must quadruple the sample size to halve the margin of error. This makes precision improvements increasingly expensive.

Interactive: Sample Size Calculator

Plan your experiments effectively by determining the sample size needed for your desired precision.

Example inputs: desired margin of error ME = ±5, estimated σ ≈ 20 (from prior knowledge), confidence level 95%.

Required Sample Size

n = 62

To achieve ME = ±5 with 95% confidence

n = (z* × σ / ME)² = (1.960 × 20 / 5)² = 7.84² ≈ 61.47, rounded up to 62

Margin of Error vs Sample Size

Quick Reference: ME at Different Sample Sizes

Sample Size (n)	10	25	50	100	200	400	800
Margin of Error	±12.40	±7.84	±5.54	±3.92	±2.77	±1.96	±1.39

The Square Root Law

Because ME ∝ 1/√n, you must quadruple the sample size to halve the margin of error. This is why large improvements in precision become increasingly expensive. Planning your sample size upfront is crucial for efficient experimentation.
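The square root law is easy to check numerically. The sketch below uses the planning values from this section (σ ≈ 20, 95% confidence); the helper names are illustrative.

```python
import math
from scipy import stats


def margin_of_error(sigma, n, confidence=0.95):
    """Margin of error of a z-interval for the mean."""
    z_star = stats.norm.ppf(1 - (1 - confidence) / 2)
    return z_star * sigma / math.sqrt(n)


def sample_size_for(me, sigma, confidence=0.95):
    """Smallest n whose z-interval margin of error is at most `me`."""
    z_star = stats.norm.ppf(1 - (1 - confidence) / 2)
    return math.ceil((z_star * sigma / me) ** 2)   # always round up


print("n needed for ME = 5:  ", sample_size_for(5, 20))
print("n needed for ME = 2.5:", sample_size_for(2.5, 20))
for n in (100, 400):
    print(f"n = {n}: ME = {margin_of_error(20, n):.2f}")
```

Halving the target margin from 5 to 2.5 roughly quadruples the required n, and going from n = 100 to n = 400 halves the margin of error.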

AI/ML Applications

Confidence intervals are essential tools for machine learning practitioners. Here's how these methods apply to real-world ML workflows:

Model Performance Evaluation

📈 Test Set Performance Uncertainty

When you report "accuracy = 87% on the test set," you should include a confidence interval. For binary classification on n test samples:

CI = \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}

This tells stakeholders how much the metric might vary with a different test set.
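As a rough illustration, the sketch below applies this formula to an accuracy of 87% for a few test-set sizes. The sizes and the function name are hypothetical, and the formula is the normal approximation shown above, which becomes unreliable when n is small or the accuracy is very close to 0 or 1.

```python
import math
from scipy import stats


def accuracy_interval(p_hat, n, confidence=0.95):
    """Normal-approximation CI for a proportion such as test accuracy."""
    z_star = stats.norm.ppf(1 - (1 - confidence) / 2)
    me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1] because a proportion cannot leave that range.
    return max(0.0, p_hat - me), min(1.0, p_hat + me)


for n_test in (100, 500, 2000):
    lo, hi = accuracy_interval(0.87, n_test)
    print(f"n = {n_test:>4}: 87.0% accuracy, 95% CI [{lo:.1%}, {hi:.1%}]")
```

The same headline accuracy carries very different uncertainty depending on how many test examples stand behind it.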

🔄 K-Fold Cross-Validation

K-fold CV gives you K performance estimates. Use a t-interval on these K values:

CI = \bar{s} \pm t^*_{K-1} \cdot \frac{s_{scores}}{\sqrt{K}}

Be cautious: fold scores are not independent because the training sets overlap, so this CI may be too narrow.
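A minimal sketch of this calculation, using hypothetical 5-fold accuracy scores and an illustrative function name:

```python
import numpy as np
from scipy import stats


def cv_score_interval(scores, confidence=0.95):
    """t-interval for the mean of K cross-validation scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.size
    mean = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(k)           # standard error across folds
    t_star = stats.t.ppf(1 - (1 - confidence) / 2, k - 1)
    return mean - t_star * se, mean + t_star * se


fold_scores = [0.85, 0.87, 0.83, 0.86, 0.84]       # hypothetical fold accuracies
cv_lower, cv_upper = cv_score_interval(fold_scores)
print(f"Mean CV accuracy: {np.mean(fold_scores):.3f}")
print(f"95% t-interval:   [{cv_lower:.3f}, {cv_upper:.3f}]")
```

With only K = 5 values the critical value is large (df = 4), so the interval is wide relative to the spread of the fold scores.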

🎯 A/B Testing Model Versions

Comparing model A vs model B in production? Use a two-sample CI for the difference. If the CI for (Metric_B - Metric_A) excludes 0, you have evidence of a real difference.

Hyperparameter Uncertainty

When tuning hyperparameters, each configuration's performance estimate has uncertainty. Consider the variance of your evaluation metrics when making decisions.

Hyperparameter Selection with CIs

Instead of selecting the config with the highest mean score, consider whether the difference is statistically significant:

Compute CI for each configuration's mean performance
If CIs overlap substantially, the difference may not be meaningful; a CI for the difference itself is the more direct check
Consider simpler models when complex ones aren't significantly better
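When two configurations are scored on the same cross-validation folds, one way to carry out this comparison is a t-interval on the per-fold differences. This is a paired interval rather than the two-sample interval described earlier, and it is offered as a sketch: the scores and function name are hypothetical, and the caveat about dependent fold scores still applies.

```python
import numpy as np
from scipy import stats


def paired_difference_interval(scores_a, scores_b, confidence=0.95):
    """t-interval for the mean per-fold difference (B - A)."""
    diffs = np.asarray(scores_b, dtype=float) - np.asarray(scores_a, dtype=float)
    k = diffs.size
    se = diffs.std(ddof=1) / np.sqrt(k)
    t_star = stats.t.ppf(1 - (1 - confidence) / 2, k - 1)
    return diffs.mean() - t_star * se, diffs.mean() + t_star * se


# Hypothetical scores for two configurations on the same 5 folds
config_a = [0.85, 0.87, 0.83, 0.86, 0.84]
config_b = [0.88, 0.90, 0.87, 0.89, 0.88]
d_lower, d_upper = paired_difference_interval(config_a, config_b)
print(f"95% CI for mean difference (B - A): [{d_lower:.4f}, {d_upper:.4f}]")
print("Excludes 0:", d_lower > 0 or d_upper < 0)
```

Pairing removes fold-to-fold variation that both configurations share, so the interval for the difference is often much tighter than comparing the two separate CIs would suggest.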

Python Implementation

🐍python

import numpy as np
from scipy import stats


def ci_mean_known_sigma(data, sigma, confidence=0.95):
    """
    CI for mean when population std dev is known (z-interval).

    Parameters
    ----------
    data : array-like
        Sample data
    sigma : float
        Known population standard deviation
    confidence : float
        Confidence level (default 0.95)

    Returns
    -------
    dict with estimate, lower, upper, margin_of_error
    """
    n = len(data)
    x_bar = np.mean(data)
    alpha = 1 - confidence
    z_star = stats.norm.ppf(1 - alpha/2)

    se = sigma / np.sqrt(n)
    me = z_star * se

    return {
        'estimate': x_bar,
        'lower': x_bar - me,
        'upper': x_bar + me,
        'margin_of_error': me,
        'method': 'z-interval'
    }


def ci_mean_unknown_sigma(data, confidence=0.95):
    """
    CI for mean when sigma is unknown (t-interval).

    Parameters
    ----------
    data : array-like
        Sample data
    confidence : float
        Confidence level (default 0.95)

    Returns
    -------
    dict with estimate, lower, upper, margin_of_error, df
    """
    n = len(data)
    x_bar = np.mean(data)
    s = np.std(data, ddof=1)  # Sample std dev (n - 1 in the denominator)

    alpha = 1 - confidence
    df = n - 1
    t_star = stats.t.ppf(1 - alpha/2, df)

    se = s / np.sqrt(n)
    me = t_star * se

    return {
        'estimate': x_bar,
        'lower': x_bar - me,
        'upper': x_bar + me,
        'margin_of_error': me,
        'df': df,
        't_critical': t_star,
        'method': 't-interval'
    }


def ci_variance(data, confidence=0.95):
    """
    CI for population variance using chi-square distribution.

    Parameters
    ----------
    data : array-like
        Sample data (assumed from normal population)
    confidence : float
        Confidence level (default 0.95)

    Returns
    -------
    dict with variance and std dev CIs
    """
    n = len(data)
    s2 = np.var(data, ddof=1)  # Sample variance

    alpha = 1 - confidence
    df = n - 1

    chi2_lower = stats.chi2.ppf(alpha/2, df)
    chi2_upper = stats.chi2.ppf(1 - alpha/2, df)

    # The larger quantile gives the lower bound, and vice versa
    var_lower = (df * s2) / chi2_upper
    var_upper = (df * s2) / chi2_lower

    return {
        'sample_variance': s2,
        'var_lower': var_lower,
        'var_upper': var_upper,
        'std_lower': np.sqrt(var_lower),
        'std_upper': np.sqrt(var_upper),
        'df': df,
        'method': 'chi-square interval'
    }


def ci_two_sample_diff(data1, data2, confidence=0.95, equal_var=False):
    """
    CI for difference of two means (mu2 - mu1).

    Parameters
    ----------
    data1, data2 : array-like
        Sample data from two groups
    confidence : float
        Confidence level (default 0.95)
    equal_var : bool
        Whether to assume equal variances (default False = Welch)

    Returns
    -------
    dict with point estimate, CI bounds, and significance info
    """
    n1, n2 = len(data1), len(data2)
    x1_bar, x2_bar = np.mean(data1), np.mean(data2)
    s1, s2 = np.std(data1, ddof=1), np.std(data2, ddof=1)

    diff = x2_bar - x1_bar
    alpha = 1 - confidence

    if equal_var:
        # Pooled variance
        sp2 = ((n1-1)*s1**2 + (n2-1)*s2**2) / (n1 + n2 - 2)
        se = np.sqrt(sp2 * (1/n1 + 1/n2))
        df = n1 + n2 - 2
        method = 'pooled t-interval'
    else:
        # Welch's approximation (Welch-Satterthwaite degrees of freedom)
        se = np.sqrt(s1**2/n1 + s2**2/n2)
        df_num = (s1**2/n1 + s2**2/n2)**2
        df_den = (s1**2/n1)**2/(n1-1) + (s2**2/n2)**2/(n2-1)
        df = df_num / df_den
        method = "Welch's t-interval"

    t_star = stats.t.ppf(1 - alpha/2, df)
    me = t_star * se

    lower = diff - me
    upper = diff + me
    significant = (lower > 0) or (upper < 0)  # CI excludes 0

    return {
        'difference': diff,
        'lower': lower,
        'upper': upper,
        'margin_of_error': me,
        'se': se,
        'df': df,
        't_critical': t_star,
        'significant': significant,
        'method': method
    }


def required_sample_size(desired_me, sigma, confidence=0.95):
    """
    Calculate required sample size for desired margin of error.

    Parameters
    ----------
    desired_me : float
        Desired margin of error
    sigma : float
        Estimated population standard deviation
    confidence : float
        Confidence level (default 0.95)

    Returns
    -------
    int : Required sample size (rounded up)
    """
    alpha = 1 - confidence
    z_star = stats.norm.ppf(1 - alpha/2)
    n = (z_star * sigma / desired_me) ** 2
    return int(np.ceil(n))


# Example: ML Model Evaluation
if __name__ == "__main__":
    np.random.seed(42)

    # Simulate test set accuracy (1 = correct prediction, 0 = incorrect)
    true_accuracy = 0.87
    n_test = 500
    predictions = np.random.binomial(1, true_accuracy, n_test)
    observed_acc = np.mean(predictions)

    # CI for accuracy (as proportion)
    se_acc = np.sqrt(observed_acc * (1 - observed_acc) / n_test)
    z_star = 1.96
    ci_lower = observed_acc - z_star * se_acc
    ci_upper = observed_acc + z_star * se_acc

    print(f"Model Accuracy: {observed_acc:.1%}")
    print(f"95% CI: [{ci_lower:.1%}, {ci_upper:.1%}]")
    print(f"Report as: {observed_acc:.1%} ± {z_star*se_acc:.1%}")

    # Compare two model versions
    model_a_scores = np.array([0.85, 0.87, 0.83, 0.86, 0.84])  # 5-fold CV
    model_b_scores = np.array([0.88, 0.90, 0.87, 0.89, 0.88])

    result = ci_two_sample_diff(model_a_scores, model_b_scores)
    print(f"\nModel B - Model A: {result['difference']:.3f}")
    print(f"95% CI: [{result['lower']:.3f}, {result['upper']:.3f}]")
    print(f"Significant difference? {result['significant']}")

Knowledge Check

Test your understanding of confidence intervals for normal distribution parameters.

Knowledge Check: CI for Normal Parameters


1. When should you use a t-interval instead of a z-interval for estimating the population mean?

2. A 95% CI for the mean is [45, 55]. You want a 99% CI from the same data. What happens?

3. The t-distribution approaches the standard normal distribution as:

4. Why is the confidence interval for variance (σ²) asymmetric around s²?

5. In a two-sample t-test, what does Welch's method do differently than the pooled variance approach?

6. To halve the width of a confidence interval (keeping all else constant), you need to:

7. A 95% CI for μ₂ - μ₁ is [-2.3, 8.7]. What can you conclude at the 5% significance level?

8. Which statement about the t-distribution is FALSE?

Summary

Key Takeaways

Z-interval (known σ): Use standard normal critical values. CI = $\bar{X} \pm z^* \cdot \sigma/\sqrt{n}$
T-interval (unknown σ): Use t-distribution with df = n-1. The wider critical values account for uncertainty in estimating σ.
Variance CI: Uses chi-square distribution and is asymmetric. CI = $[(n-1)s^2/\chi^2_U, (n-1)s^2/\chi^2_L]$
Two-sample CI: Prefer Welch's method unless you have strong evidence of equal variances. The CI for the difference tells you if groups differ.
Sample size planning: n = (z* × σ / ME)². Quadruple n to halve the margin of error.

Looking Ahead: In the next section, we'll explore large-sample confidence intervals that apply when we can rely on the Central Limit Theorem, even for non-normal populations.
