Learning Objectives
By the end of this section, you will be able to:
📚 Core Knowledge
- Understand what a confidence interval is and what it measures
- Correctly interpret confidence level as a property of the procedure
- Distinguish correct from incorrect CI interpretations
- Explain factors that affect CI width
🔧 Practical Skills
- Calculate and construct confidence intervals
- Choose appropriate confidence levels for different applications
- Apply CIs in A/B testing and model evaluation
- Communicate uncertainty effectively to stakeholders
Where You'll Apply This: A/B testing and experiment analysis, model performance evaluation, hyperparameter uncertainty, prediction intervals in forecasting, uncertainty quantification in neural networks, and communicating ML results to stakeholders.
The Big Picture: Beyond Point Estimates
Imagine you've trained a classification model and measured its accuracy on a test set of 500 examples. The accuracy is 0.87 (87%). But what does this number really tell you? If you collected a different test set of 500 examples, would you get exactly 0.87 again? Almost certainly not.
Point estimates tell us where the target is, but not how confident we should be in hitting it. A confidence interval provides the missing piece: a range of plausible values for the true parameter, along with a measure of our confidence in that range.
The Core Insight
"The accuracy is 87%"
"We're 95% confident true accuracy is between 84% and 90%"
Historical Context
Jerzy Neyman (1894-1981)
Polish mathematician who formalized the concept of confidence intervals in 1937. Neyman's breakthrough was recognizing that we shouldn't ask "What is the probability that the parameter is in this interval?" but rather "What is the probability that our procedure produces an interval containing the parameter?"
This subtle shift in perspective—from the parameter to the procedure—is the essence of frequentist confidence intervals. It's what allows us to make meaningful probability statements without treating the unknown parameter as a random variable.
What Is a Confidence Interval?
A confidence interval is a range of values, computed from sample data by a procedure that captures the true population parameter at a stated long-run rate. It consists of two parts:
- The interval itself: A lower bound and an upper bound, typically written as $[L, U]$
- The confidence level: A percentage (like 95%) that describes how often intervals constructed this way would contain the true parameter
Formal Definition
Before observing data, the random interval $[L(X), U(X)]$ covers the true parameter $\theta$ with probability $1 - \alpha$:
$$P\big(L(X) \le \theta \le U(X)\big) = 1 - \alpha$$
The key insight is that before we collect data, the interval endpoints are random variables (because they depend on the random sample). The probability statement applies to this random interval. After we observe our specific sample, the interval is just a pair of numbers—either it contains θ or it doesn't.
The CI Formula
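For estimating a population mean μ with known standard deviation σ, the confidence interval is:
$$\bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}$$
- $\bar{x}$: sample mean (point estimate)
- $z^*$: critical value (from confidence level)
- $\sigma$: population standard deviation (measures variability)
- $\sqrt{n}$: square root of the sample size (more data = narrower CI)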
| Confidence Level | α | z* (critical value) |
|---|---|---|
| 90% | 0.10 | 1.645 |
| 95% | 0.05 | 1.960 |
| 99% | 0.01 | 2.576 |
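The critical values in the table can be reproduced from the standard normal quantile function. The following is a minimal sketch using only the Python standard library. The helper names `z_critical` and `z_interval` and the sample numbers (mean 50, σ = 10, n = 25) are illustrative assumptions, not part of the original material.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z* for a confidence level such as 0.95."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_interval(x_bar, sigma, n, confidence=0.95):
    """CI for a mean with known sigma: x_bar +/- z* * sigma / sqrt(n)."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Reproduce the table of critical values
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_critical(level):.3f}")

# Hypothetical sample: mean 50, known sigma 10, n = 25
low, high = z_interval(50, 10, 25)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

With these assumed inputs the standard error is 10/√25 = 2, so the 95% margin is about 3.92 and the interval is roughly [46.08, 53.92].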
Interactive: Coverage Simulation
The best way to understand what "95% confidence" means is to see it in action. This simulation repeatedly draws samples from a population with known mean and constructs confidence intervals. Watch how many intervals capture the true mean!
Confidence Interval Coverage Simulator
Watch how repeated sampling creates confidence intervals. Each horizontal bar is a CI from one sample. Blue bars contain the true mean, red bars miss it. Over many samples, about 95% should be blue.
Key Insight
A 95% confidence interval means that if we repeated this sampling process many times, approximately 95% of the resulting intervals would contain the true parameter. The confidence level describes the procedure, not any single interval.
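The repeated-sampling idea can be checked with a small simulation. This is a sketch under stated assumptions: a normal population with a known σ, and hypothetical values (μ = 100, σ = 15, n = 30) that do not come from the original simulator. The function name `coverage_rate` is likewise illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_rate(mu, sigma, n, confidence=0.95, trials=2000, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # sigma is known, so every sample gets the same margin of error
    margin = z_star * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


rate = coverage_rate(mu=100, sigma=15, n=30)
print(f"Empirical coverage: {rate:.1%}")
```

The printed rate should land close to 95%, with some run-to-run noise. Each individual interval either hits or misses; only the long-run fraction is governed by the confidence level.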
The Correct Interpretation
The interpretation of confidence intervals is notoriously tricky. Let's be precise:
✓ Correct Interpretation
"If we repeated this sampling procedure many times, constructing a 95% CI each time, approximately 95% of those intervals would contain the true population parameter."
Notice that this statement is about the procedure, not about any specific interval. The "95%" describes how often the method succeeds in the long run.
Common Misconceptions
✗ WRONG: Probability of Parameter in Interval
"There is a 95% probability that μ lies in [45, 55]."
Why it's wrong: In frequentist statistics, μ is a fixed (not random) quantity. It either is or isn't in [45, 55]—there's no probability involved.
✗ WRONG: Percentage of Data Points
"95% of the data falls within this interval."
Why it's wrong: This confuses a confidence interval with a prediction interval or tolerance interval. CIs are about the parameter, not individual observations.
✗ WRONG: Future Samples
"If I take another sample, there's a 95% chance its mean will be in this interval."
Why it's wrong: The CI is centered on the current sample mean, not on the true mean. A future sample mean could easily fall outside this specific interval.
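The "future samples" misconception can be quantified. Assuming a normal population with known σ and two independent samples of the same size, the gap between the two sample means has standard deviation √2·σ/√n, so a replicate mean lands inside the original 95% interval only about 83% of the time on average. The sketch below computes that value and checks it by simulation; the population values are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

z_star = NormalDist().inv_cdf(0.975)

# Analytic value: P(|Z| < z* / sqrt(2)), averaged over original samples
analytic = 2 * NormalDist().cdf(z_star / sqrt(2)) - 1

# Simulation check with hypothetical population values
rng = random.Random(1)
mu, sigma, n, trials = 0.0, 1.0, 20, 5000
margin = z_star * sigma / sqrt(n)
captured = 0
for _ in range(trials):
    first = fmean([rng.gauss(mu, sigma) for _ in range(n)])
    second = fmean([rng.gauss(mu, sigma) for _ in range(n)])
    # Does the second sample mean fall inside the first sample's CI?
    if abs(second - first) <= margin:
        captured += 1
simulated = captured / trials

print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
```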
Interactive: Interpretation Challenge
Test your understanding by evaluating these statements about confidence intervals. Can you identify which interpretations are correct?
CI Interpretation Challenge
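For each statement about a 95% confidence interval, decide whether it is a correct or incorrect interpretation.
- "If we repeat this sampling procedure many times, about 95% of the resulting intervals will contain the true parameter." (Correct: this describes the long-run behavior of the procedure.)
- "There is a 95% probability that the true parameter lies within this specific interval." (Incorrect: the parameter is fixed, so a specific interval either contains it or does not.)
- "95% of all sample means fall within this interval." (Incorrect: the interval is centered on one sample mean, not on the true mean.)
- "The method we used to construct this interval has a 95% success rate in capturing the true parameter." (Correct: the 95% is the success rate of the method.)
- "95% of the population values fall within this interval." (Incorrect: a CI is about the parameter, not individual observations.)
- "If I collected 100 different samples and built a 95% CI from each, approximately 95 of them would contain the true parameter." (Correct: this restates the repeated-sampling interpretation.)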
What Affects CI Width?
The width of a confidence interval reflects our uncertainty about the parameter. Narrower intervals mean more precision, but what factors control the width?
From the formula $\text{Width} = 2 \cdot z^* \cdot \sigma / \sqrt{n}$, we can identify three factors:
Sample Size (n)
Larger n → Narrower CI
The √n in the denominator means you need to quadruple the sample size to halve the width.
Variability (σ)
Larger σ → Wider CI
More variable populations are harder to estimate precisely. This is usually not under our control.
Confidence Level
Higher confidence → Wider CI
To be more confident, you must cast a wider net. A 99% CI is about 31% wider than a 95% CI (z* = 2.576 vs. 1.960).
Interactive: Width Explorer
Experiment with different parameter values to see how they affect CI width. Pay attention to the relative magnitudes of each effect.
What Affects Confidence Interval Width?
Explore how sample size, variability, and confidence level affect the width of a confidence interval. The CI width formula is: Width = 2 × z* × (σ/√n)
[Interactive charts: Sample Size Effect (σ = 10, 95% confidence); Confidence Level Effect (n = 30, σ = 10)]
Key Relationships
- 4× sample size → ½ the CI width (√n effect)
- Double σ → double the CI width (direct relationship)
- 95% → 99% confidence → ~31% wider CI (z* changes from 1.96 to 2.58)
- The trade-off: at a fixed sample size, narrower intervals are more precise but carry a lower confidence level.
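These relationships follow directly from Width = 2 × z* × (σ/√n) and can be verified numerically. The sketch below uses a hypothetical helper `ci_width` with the baseline settings σ = 10 and n = 30, and compares each change against that baseline.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(sigma, n, confidence=0.95):
    """Full width of a z-interval: 2 * z* * sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z_star * sigma / sqrt(n)


base = ci_width(sigma=10, n=30)
print(f"baseline width:     {base:.2f}")
print(f"4x sample size:     {ci_width(10, 120) / base:.2f}x")
print(f"2x sigma:           {ci_width(20, 30) / base:.2f}x")
print(f"99% instead of 95%: {ci_width(10, 30, 0.99) / base:.2f}x")
```

The three ratios come out at about 0.50, 2.00, and 1.31. They do not depend on the baseline chosen.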
The Precision-Confidence Trade-off
There's an inherent tension between precision (narrow intervals) and confidence (high probability of capturing the parameter). You cannot have both for free—the only way to get narrow, high-confidence intervals is to collect more data.
The Trade-off in Practice: In ML experiments, 95% CIs are the usual default. If you need narrower intervals for your performance metrics, collect more test data rather than lowering your confidence level. A 90% procedure misses the true value about 1 time in 10, which may not be acceptable for production decisions.
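Since more data is the only way to gain precision without giving up confidence, it helps to turn a target margin of error into a sample size. Solving z* × σ/√n ≤ E for n gives n ≥ (z* × σ/E)². The sketch below assumes σ is known (or well estimated from a pilot study); `required_n` and the numbers are illustrative.

```python
from math import ceil
from statistics import NormalDist


def required_n(sigma, margin, confidence=0.95):
    """Smallest n with z* * sigma / sqrt(n) <= margin (sigma assumed known)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star * sigma / margin) ** 2)


# Halving the target margin roughly quadruples the required sample size
for m in (4.0, 2.0, 1.0):
    print(f"margin +/-{m}: n = {required_n(sigma=10, margin=m)}")
```

For σ = 10 this gives n = 25, 97, and 385 for margins of 4, 2, and 1, which shows how quickly the cost of extra precision grows.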
Interactive: CI Calculator
Use this calculator to construct confidence intervals from your data. It compares t-based and z-based intervals, showing how they differ especially for small samples.
Confidence Interval Calculator
t-interval vs z-interval Comparison
Interpretation
We are 95% confident that the true population mean lies between 158.919 and 171.081. With only 10 observations, the t-interval is 15.4% wider than the z-interval to account for uncertainty in estimating σ.
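The comparison can be reproduced by hand. The calculator's inputs are not shown, so the summary statistics below (n = 10, mean 165, sample standard deviation 8.5) are assumptions chosen to be consistent with the interval quoted above. The t critical value for 9 degrees of freedom is taken from a standard t-table rather than computed.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary statistics, chosen to match the quoted interval
n, x_bar, s = 10, 165.0, 8.5

# Two-sided 95% critical values
z_star = NormalDist().inv_cdf(0.975)
t_star = 2.262157  # t quantile for df = n - 1 = 9, from a standard t-table

se = s / sqrt(n)
z_low, z_high = x_bar - z_star * se, x_bar + z_star * se
t_low, t_high = x_bar - t_star * se, x_bar + t_star * se

print(f"z-interval: [{z_low:.3f}, {z_high:.3f}]")
print(f"t-interval: [{t_low:.3f}, {t_high:.3f}]")
print(f"t-interval is {t_star / z_star - 1:.1%} wider")
```

The width ratio depends only on the two critical values, so the 15.4% figure holds for any sample of size 10 at the 95% level.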
AI/ML Applications
Confidence intervals are essential in machine learning for making reliable decisions about model performance and experiment results. Here are the key applications:
Uncertainty Quantification in ML
📊 Model Performance Evaluation
When you report "accuracy = 87%", you should also report the CI, for example "87% ± 2.9% (95% CI, n = 500)". This tells stakeholders how much the metric might vary with a different test set.
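As a worked check, assume the running example of 87% accuracy on n = 500 test examples and the normal-approximation (Wald) interval for a proportion:

```latex
\begin{aligned}
\mathrm{SE} &= \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
             = \sqrt{\frac{0.87 \times 0.13}{500}} \approx 0.0150 \\
\text{margin} &= z^* \cdot \mathrm{SE} = 1.960 \times 0.0150 \approx 0.029 \\
\text{95\% CI} &\approx [\,0.841,\ 0.899\,]
\end{aligned}
```

This rounds to a range of roughly 84% to 90%. A smaller test set would give a wider margin for the same observed accuracy.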
🔄 Cross-Validation Uncertainty
K-fold CV gives K performance estimates. Compute their mean and standard error to get a CI for the expected performance. This accounts for the variability across different train/test splits.
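A minimal sketch of the cross-validation interval follows, using a t-interval across fold scores. The fold accuracies are hypothetical, and the critical values are hard-coded from a standard t-table for K = 5 and K = 10 only.

```python
from math import sqrt
from statistics import fmean, stdev

# Two-sided 95% t critical values keyed by df = K - 1 (standard t-table)
T_CRIT_95 = {4: 2.776, 9: 2.262}


def cv_confidence_interval(fold_scores):
    """Approximate 95% t-interval for mean performance across K folds."""
    k = len(fold_scores)
    mean = fmean(fold_scores)
    se = stdev(fold_scores) / sqrt(k)  # stdev uses the n - 1 denominator
    margin = T_CRIT_95[k - 1] * se
    return mean - margin, mean + margin


# Hypothetical accuracies from 5-fold cross-validation
scores = [0.86, 0.88, 0.85, 0.89, 0.87]
low, high = cv_confidence_interval(scores)
print(f"mean = {fmean(scores):.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Treat this interval as approximate: the folds share training data, so the scores are not independent and the interval tends to be somewhat too narrow.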
🎯 Bayesian Neural Networks
Monte Carlo Dropout and ensemble methods produce uncertainty estimates that can be summarized as prediction intervals. These are closer in spirit to Bayesian credible intervals than to frequentist confidence intervals, but they serve a similar practical purpose.
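One simple way to summarize such estimates is an empirical percentile interval over repeated stochastic predictions for a single input. The sketch below uses simulated numbers as a stand-in for real Monte Carlo Dropout passes or ensemble outputs; no actual model is involved, and `percentile_interval_95` is a hypothetical helper.

```python
import random
from statistics import fmean, quantiles


def percentile_interval_95(predictions):
    """Central 95% empirical interval from repeated stochastic predictions."""
    # 40 equal-probability groups: the first and last cut points are the
    # 2.5th and 97.5th percentiles.
    cuts = quantiles(predictions, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Hypothetical stand-in for 200 stochastic forward passes on one input
rng = random.Random(0)
passes = [rng.gauss(3.0, 0.5) for _ in range(200)]

low, high = percentile_interval_95(passes)
print(f"mean prediction: {fmean(passes):.2f}")
print(f"95% interval:    [{low:.2f}, {high:.2f}]")
```

How well such an interval is calibrated depends on the model and the method, so it is worth checking its empirical coverage on held-out data.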
A/B Testing and CIs
A/B testing is one of the most important applications of confidence intervals in tech. When comparing two variants, we typically construct a CI for the difference between their metrics.
A/B Test Decision Rule
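A commonly used rule, assumed here, is to look at whether the CI for the difference excludes zero: if the whole interval is above zero, variant B is better; if it is entirely below zero, A is better; if it contains zero, the test is inconclusive. The sketch uses a Wald interval for the difference of two proportions. The function names and the experiment counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald CI for the difference in conversion rates, p_b - p_a."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z_star * se, diff + z_star * se


def decide(low, high):
    """Decision rule based on whether the CI excludes zero."""
    if low > 0:
        return "B is better"
    if high < 0:
        return "A is better"
    return "inconclusive"


# Hypothetical experiment: 10,000 users per variant
low, high = diff_ci(conv_a=1000, n_a=10_000, conv_b=1100, n_b=10_000)
print(f"95% CI for p_b - p_a: [{low:.4f}, {high:.4f}] -> {decide(low, high)}")
```

Here the interval is roughly [0.0015, 0.0185], so it excludes zero and favors B. Its width also shows that the true lift could be anywhere from negligible to nearly two percentage points.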
Confidence vs Credible Intervals
Confidence intervals are a frequentist concept. Bayesian statistics offers an alternative called credible intervals. Understanding the difference is important for choosing the right tool.
| Property | Confidence Interval | Credible Interval |
|---|---|---|
| Philosophy | Frequentist | Bayesian |
| Parameter is... | Fixed but unknown | A random variable |
| What varies? | The interval (different samples) | Our belief about the parameter |
| Interpretation | 95% of CIs cover θ (long run) | 95% probability θ is in interval (given data) |
| Requires... | Repeated sampling concept | Prior distribution |
| Typical use | Experiment analysis, A/B tests | Bayesian inference, uncertainty estimation |
Practical Note: For many problems, confidence and credible intervals give similar numerical results, especially with large samples and non-informative priors. The choice often depends on your philosophical stance and what question you want to answer.
Python Implementation
import numpy as np
from scipy import stats


def confidence_interval_mean_known_sigma(data, sigma, confidence=0.95):
    """
    Compute CI for population mean when sigma is known.

    Parameters
    ----------
    data : array-like
        Sample data
    sigma : float
        Known population standard deviation
    confidence : float
        Confidence level (default 0.95)

    Returns
    -------
    tuple : (lower, upper, margin_of_error)
    """
    n = len(data)
    x_bar = np.mean(data)

    # Get critical z-value
    alpha = 1 - confidence
    z_star = stats.norm.ppf(1 - alpha/2)

    # Calculate margin of error and CI
    se = sigma / np.sqrt(n)
    margin_of_error = z_star * se

    return (x_bar - margin_of_error, x_bar + margin_of_error, margin_of_error)


def confidence_interval_mean_unknown_sigma(data, confidence=0.95):
    """
    Compute CI for population mean when sigma is unknown (t-interval).

    Parameters
    ----------
    data : array-like
        Sample data
    confidence : float
        Confidence level (default 0.95)

    Returns
    -------
    tuple : (lower, upper, margin_of_error)
    """
    n = len(data)
    x_bar = np.mean(data)
    s = np.std(data, ddof=1)  # Sample std dev (n - 1 denominator)

    # Get critical t-value
    alpha = 1 - confidence
    df = n - 1
    t_star = stats.t.ppf(1 - alpha/2, df)

    # Calculate margin of error and CI
    se = s / np.sqrt(n)
    margin_of_error = t_star * se

    return (x_bar - margin_of_error, x_bar + margin_of_error, margin_of_error)


def confidence_interval_proportion(successes, n, confidence=0.95):
    """
    Compute CI for population proportion (Wald interval).

    Parameters
    ----------
    successes : int
        Number of successes
    n : int
        Sample size
    confidence : float
        Confidence level (default 0.95)

    Returns
    -------
    tuple : (lower, upper, margin_of_error)
    """
    p_hat = successes / n

    # Get critical z-value
    alpha = 1 - confidence
    z_star = stats.norm.ppf(1 - alpha/2)

    # Calculate SE and margin of error
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z_star * se

    # Clip the bounds to the valid range [0, 1] for a proportion
    return (
        max(0, p_hat - margin_of_error),
        min(1, p_hat + margin_of_error),
        margin_of_error
    )


# Example: Model accuracy evaluation
np.random.seed(42)

# Simulate test set with true accuracy of 0.85
n_test = 500
true_accuracy = 0.85
test_results = np.random.binomial(1, true_accuracy, n_test)

# Compute 95% CI for accuracy
correct = np.sum(test_results)
lower, upper, moe = confidence_interval_proportion(correct, n_test, 0.95)
observed_accuracy = correct / n_test

print("Test Set Results:")
print(f"  Observed accuracy: {observed_accuracy:.1%}")
print(f"  95% CI: [{lower:.1%}, {upper:.1%}]")
print(f"  Margin of error: ±{moe:.1%}")
print(f"  True accuracy ({true_accuracy:.0%}) in CI? {lower <= true_accuracy <= upper}")
Knowledge Check
Test your understanding of confidence interval concepts with this quiz. Pay close attention to the subtle distinctions in interpretation.
Confidence Intervals Quiz
What does the '95%' in a 95% confidence interval actually refer to?
Summary
Key Takeaways
- CIs quantify uncertainty: A point estimate without a CI is incomplete. Always report both the estimate and its uncertainty.
- Interpretation is about the procedure: The confidence level describes how often the method succeeds in the long run, not the probability for any single interval.
- Width depends on n, σ, and confidence: Sample size is usually the only lever that narrows the interval without giving up confidence. Quadruple n to halve CI width.
- Trade-off between precision and confidence: Narrower intervals require more data or lower confidence. There's no free lunch.
- Essential for ML: CIs are crucial for A/B testing, model evaluation, and communicating uncertainty to stakeholders.
Looking Ahead: In the next section, we'll explore confidence intervals for specific parameters of the normal distribution—both the mean (with known and unknown σ) and the variance. These are the building blocks for many statistical procedures.