Chapter 13

Confidence Intervals - Concepts

Interval Estimation

Learning Objectives

By the end of this section, you will be able to:

📚 Core Knowledge

  • Understand what a confidence interval is and what it measures
  • Correctly interpret confidence level as a property of the procedure
  • Distinguish correct from incorrect CI interpretations
  • Explain factors that affect CI width

🔧 Practical Skills

  • Calculate and construct confidence intervals
  • Choose appropriate confidence levels for different applications
  • Apply CIs in A/B testing and model evaluation
  • Communicate uncertainty effectively to stakeholders

Where You'll Apply This: A/B testing and experiment analysis, model performance evaluation, hyperparameter uncertainty, prediction intervals in forecasting, uncertainty quantification in neural networks, and communicating ML results to stakeholders.

The Big Picture: Beyond Point Estimates

Imagine you've trained a classification model and measured its accuracy on a test set of 500 examples. The accuracy is 0.87 (87%). But what does this number really tell you? If you collected a different test set of 500 examples, would you get exactly 0.87 again? Almost certainly not.

Point estimates tell us where the target is, but not how confident we should be in hitting it. A confidence interval provides the missing piece: a range of plausible values for the true parameter, along with a measure of our confidence in that range.
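
As a rough sketch of where such a range comes from, the snippet below applies the normal-approximation (Wald) formula for a proportion to the 87% accuracy measured on 500 test examples. The variable names are illustrative, and the normal approximation is an assumption that is reasonable at this sample size.

```python
import math
from statistics import NormalDist

n = 500            # test-set size from the example
p_hat = 0.87       # observed accuracy (the point estimate)
confidence = 0.95

# Two-sided critical value z* for the chosen confidence level
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Standard error of a proportion, then the margin of error
se = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * se

lower, upper = p_hat - margin, p_hat + margin
print(f"Point estimate: {p_hat:.2f}")
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")   # 95% CI: [0.84, 0.90]
```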

The Core Insight

0.87
Point Estimate
"The accuracy is 87%"
[0.84, 0.90]
95% Confidence Interval
"We're 95% confident true accuracy is between 84% and 90%"

Historical Context

Jerzy Neyman (1894-1981)

Polish mathematician who formalized the concept of confidence intervals in 1937. Neyman's breakthrough was recognizing that we shouldn't ask "What is the probability that the parameter is in this interval?" but rather "What is the probability that our procedure produces an interval containing the parameter?"

This subtle shift in perspective—from the parameter to the procedure—is the essence of frequentist confidence intervals. It's what allows us to make meaningful probability statements without treating the unknown parameter as a random variable.


What Is a Confidence Interval?

A confidence interval is a range of values, computed from sample data by a procedure that captures the true population parameter a specified fraction of the time. It consists of two parts:

  1. The interval itself: A lower bound and upper bound, typically written as $[\hat{\theta}_L, \hat{\theta}_U]$
  2. The confidence level: A percentage (like 95%) that describes how often intervals constructed this way would contain the true parameter

Formal Definition

$$P\left(\hat{\theta}_L(X_1, \ldots, X_n) \leq \theta \leq \hat{\theta}_U(X_1, \ldots, X_n)\right) = 1 - \alpha$$

Before observing data, the random interval $[\hat{\theta}_L, \hat{\theta}_U]$ covers the true parameter $\theta$ with probability $1 - \alpha$.

The key insight is that before we collect data, the interval endpoints are random variables (because they depend on the random sample). The probability statement applies to this random interval. After we observe our specific sample, the interval is just a pair of numbers—either it contains θ or it doesn't.

The CI Formula

For estimating a population mean μ with known standard deviation σ, the confidence interval is:

$$\bar{X} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}$$

  • $\bar{X}$: sample mean (the point estimate)
  • $z^*$: critical value (set by the confidence level)
  • $\sigma$: population standard deviation (measures variability)
  • $n$: sample size, entering through $\sqrt{n}$ (more data = narrower CI)

| Confidence Level | α | z* (critical value) |
|---|---|---|
| 90% | 0.10 | 1.645 |
| 95% | 0.05 | 1.960 |
| 99% | 0.01 | 2.576 |

The margin of error is the "±" part: $ME = z^* \cdot \frac{\sigma}{\sqrt{n}}$. This is the half-width of the confidence interval.
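
The critical values are not arbitrary: z* is the standard normal quantile that leaves α/2 in each tail. A minimal sketch using Python's standard library (the dictionary name `critical_values` is just an illustrative choice):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

critical_values = {}
for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    # z* leaves alpha/2 in each tail, so it is the (1 - alpha/2) quantile
    critical_values[confidence] = std_normal.inv_cdf(1 - alpha / 2)

for confidence, z_star in critical_values.items():
    print(f"{confidence:.0%}: z* = {z_star:.3f}")
# 90%: z* = 1.645
# 95%: z* = 1.960
# 99%: z* = 2.576
```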

Interactive: Coverage Simulation

The best way to understand what "95% confidence" means is to see it in action. This simulation repeatedly draws samples from a population with known mean and constructs confidence intervals. Watch how many intervals capture the true mean!

Confidence Interval Coverage Simulator

Each horizontal bar is a CI from one sample, drawn from a population with true mean μ = 100. Blue bars contain the true mean; red bars miss it. Over many samples, about 95% of the bars should be blue (target coverage: 95%).

Key Insight

A 95% confidence interval means that if we repeated this sampling process many times, approximately 95% of the resulting intervals would contain the true parameter. The confidence level describes the procedure, not any single interval.

Key Observation: The coverage rate should fluctuate around 95% (or whatever confidence level you choose). Any single interval either contains μ or misses it—the randomness comes from which sample we happened to draw.
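
The same experiment can be sketched in a few lines of NumPy. The population settings here (σ = 15, n = 30, 10,000 repetitions) are assumptions chosen for illustration; only μ = 100 comes from the simulator.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)       # fixed seed so the run is repeatable

mu, sigma = 100.0, 15.0              # sigma is an assumed value
n, n_repeats = 30, 10_000
confidence = 0.95

z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_star * sigma / np.sqrt(n)  # known sigma: same half-width every time

# Each row is one sample of size n, and each row yields one interval
samples = rng.normal(mu, sigma, size=(n_repeats, n))
means = samples.mean(axis=1)
covered = (means - margin <= mu) & (mu <= means + margin)

coverage = covered.mean()
print(f"Coverage over {n_repeats} intervals: {coverage:.1%}")
```

The printed rate should land close to 95% but rarely exactly on it. Each entry of `covered` is simply True or False, which mirrors the point that a single realized interval either contains μ or does not.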

The Correct Interpretation

The interpretation of confidence intervals is notoriously tricky. Let's be precise:

✓ Correct Interpretation

"If we repeated this sampling procedure many times, constructing a 95% CI each time, approximately 95% of those intervals would contain the true population parameter."

Notice that this statement is about the procedure, not about any specific interval. The "95%" describes how often the method succeeds in the long run.

Common Misconceptions

✗ WRONG: Probability of Parameter in Interval

"There is a 95% probability that μ lies in [45, 55]."

Why it's wrong: In frequentist statistics, μ is a fixed (not random) quantity. It either is or isn't in [45, 55]—there's no probability involved.

✗ WRONG: Percentage of Data Points

"95% of the data falls within this interval."

Why it's wrong: This confuses a confidence interval with a prediction interval or tolerance interval. CIs are about the parameter, not individual observations.

✗ WRONG: Future Samples

"If I take another sample, there's a 95% chance its mean will be in this interval."

Why it's wrong: The CI is centered on the current sample mean, not on the true mean. A future sample mean could easily fall outside this specific interval.
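
A small simulation makes this misconception concrete. Under the assumed setup below (normal data, known σ, equal sample sizes; all values hypothetical), a 95% CI captures the mean of an independent replication only about 83% of the time, because the difference between two sample means has standard deviation √2 times the standard error.

```python
import math
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
mu, sigma, n, n_repeats = 100.0, 15.0, 30, 20_000   # hypothetical values

z_star = NormalDist().inv_cdf(0.975)
margin = z_star * sigma / math.sqrt(n)

# The first sample defines the interval; the second is the "future" sample
first_means = rng.normal(mu, sigma, size=(n_repeats, n)).mean(axis=1)
future_means = rng.normal(mu, sigma, size=(n_repeats, n)).mean(axis=1)

capture_rate = np.mean(np.abs(future_means - first_means) <= margin)

# Theory: P(|Z| <= z*/sqrt(2)) for a standard normal Z
theory = 2 * NormalDist().cdf(z_star / math.sqrt(2)) - 1

print(f"Simulated capture rate: {capture_rate:.3f}")
print(f"Theoretical value:      {theory:.3f}")
```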

Interactive: Interpretation Challenge

Test your understanding by evaluating these statements about confidence intervals. Can you identify which interpretations are correct?

CI Interpretation Challenge

For each statement about a 95% confidence interval, decide whether it's a correct or incorrect interpretation.

"If we repeat this sampling procedure many times, about 95% of the resulting intervals will contain the true parameter."

"There is a 95% probability that the true parameter lies within this specific interval."

"95% of all sample means fall within this interval."

"The method we used to construct this interval has a 95% success rate in capturing the true parameter."

"95% of the population values fall within this interval."

"If I collected 100 different samples and built a 95% CI from each, approximately 95 of them would contain the true parameter."


What Affects CI Width?

The width of a confidence interval reflects our uncertainty about the parameter. Narrower intervals mean more precision, but what factors control the width?

From the formula $\text{Width} = 2 \cdot z^* \cdot \frac{\sigma}{\sqrt{n}}$, we can identify three factors:

Sample Size (n)

Larger n → Narrower CI
The √n in the denominator means you need to quadruple the sample size to halve the width.

Variability (σ)

Larger σ → Wider CI
More variable populations are harder to estimate precisely. This is usually not under our control.

Confidence Level

Higher confidence → Wider CI
To be more confident, you must cast a wider net. At the same n and σ, a 99% CI is about 31% wider than a 95% CI (2.576 / 1.960 ≈ 1.31).

Interactive: Width Explorer

Experiment with different parameter values to see how they affect CI width. Pay attention to the relative magnitudes of each effect.

What Affects Confidence Interval Width?

Explore how sample size, variability, and confidence level affect the width of a confidence interval. The CI width formula is: Width = 2 × z* × (σ/√n)

Larger n → Narrower CI (÷√n effect)

Larger σ → Wider CI (direct effect)

Higher conf → Wider CI (larger z*)

Baseline setting (n = 30, σ = 10, 95% confidence):

  • z* (critical value) = 1.960
  • SE = σ/√n = 1.826
  • ME = z* × SE = 3.578
  • Width = 2 × ME = 7.157

Sample Size Effect (σ=10, 95% conf)

| n | CI width |
|---|---|
| 10 | 12.40 |
| 25 | 7.84 |
| 50 | 5.54 |
| 100 | 3.92 |
| 200 | 2.77 |
| 500 | 1.75 |

Confidence Level Effect (n=30, σ=10)

| Confidence level | CI width |
|---|---|
| 80% | 4.68 |
| 90% | 6.01 |
| 95% | 7.16 |
| 99% | 9.41 |

Key Relationships

  • 4× sample size → ½ the CI width (√n effect)
  • Double σ → double the CI width (direct relationship)
  • 95% → 99% confidence → ~31% wider CI (z* changes from 1.96 to 2.58)
  • The trade-off: at a fixed sample size, narrower intervals are more precise but less confident!
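
These relationships can be checked directly from the width formula. The helper `ci_width` below is a hypothetical name introduced for illustration; it assumes the known-σ z-interval for a mean.

```python
import math
from statistics import NormalDist

def ci_width(n, sigma, confidence=0.95):
    """Full width of the z-interval for a mean: 2 * z* * sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z_star * sigma / math.sqrt(n)

base = ci_width(n=30, sigma=10)                      # baseline setting
print(f"baseline width: {base:.2f}")                             # 7.16
print(f"4x sample size: {ci_width(120, 10) / base:.2f}x")        # 0.50x
print(f"double sigma:   {ci_width(30, 20) / base:.2f}x")         # 2.00x
print(f"95% -> 99%:     {ci_width(30, 10, 0.99) / base:.2f}x")   # 1.31x
```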

The Precision-Confidence Trade-off

There's an inherent tension between precision (narrow intervals) and confidence (high probability of capturing the parameter). You cannot have both for free—the only way to get narrow, high-confidence intervals is to collect more data.

The Trade-off in Practice: In ML experiments, using 95% CIs is standard. If you need narrower intervals for your performance metrics, collect more test data rather than lowering your confidence level. A 90% CI that's wrong 1 in 10 times may not be acceptable for production decisions.
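
To plan how much test data "more" means, the margin-of-error formula can be inverted: for a proportion, n ≥ z*² · p(1 − p) / ME². The sketch below assumes the Wald approximation and a guessed accuracy; `required_test_size` is a hypothetical helper name, not a library function.

```python
import math
from statistics import NormalDist

def required_test_size(expected_accuracy, target_margin, confidence=0.95):
    """Smallest n whose Wald margin of error is at most target_margin."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    variance = expected_accuracy * (1 - expected_accuracy)
    return math.ceil(z_star**2 * variance / target_margin**2)

# Assumed accuracy of about 0.87; margins of 3 and 1 percentage points
n_for_3pt = required_test_size(0.87, 0.03)
n_for_1pt = required_test_size(0.87, 0.01)
print(f"±3%: n >= {n_for_3pt}")
print(f"±1%: n >= {n_for_1pt}")
```

Cutting the margin to a third requires roughly nine times as many examples, which is the √n effect seen from the planning side.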

Interactive: CI Calculator

Use this calculator to construct confidence intervals from your data. It compares t-based and z-based intervals, showing how they differ especially for small samples.

Confidence Interval Calculator

t-interval vs z-interval Comparison

Inputs: x̄ = 165.00, s = 8.5, n = 10, confidence level 95%.

  1. Calculate degrees of freedom: df = n - 1 = 10 - 1 = 9
  2. Find the critical t-value for 95% confidence: t_{0.025, df=9} = 2.2622 (vs z = 1.9600)
  3. Calculate the standard error: SE = s / √n = 8.5 / √10 = 2.6879
  4. Calculate the margin of error: ME = t* × SE = 2.2622 × 2.6879 = 6.0805
  5. Construct the confidence interval: CI = x̄ ± ME = 165 ± 6.0805 = [158.9195, 171.0805]

| Interval | Bounds | Width |
|---|---|---|
| 95% t-confidence interval | [158.919, 171.081] | 12.1611 |
| z-interval (if σ known) | [159.732, 170.268] | 10.5367 |

Width difference: the t-interval is 15.4% wider because it accounts for uncertainty in the variance estimate.

Interpretation

We are 95% confident that the true population mean lies between 158.919 and 171.081. With only 10 observations, the t-interval is 15.4% wider than the z-interval to account for uncertainty in estimating σ.
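
The calculator's numbers can be reproduced from the summary statistics alone. This is a minimal sketch using `scipy.stats`; the variable names are illustrative, and the z-interval treats s = 8.5 as if it were the known σ.

```python
import math
from scipy import stats

n, x_bar, s = 10, 165.0, 8.5      # summary statistics from the example
confidence = 0.95
alpha = 1 - confidence

se = s / math.sqrt(n)                              # standard error
t_star = stats.t.ppf(1 - alpha / 2, df=n - 1)      # critical t with 9 df
z_star = stats.norm.ppf(1 - alpha / 2)             # critical z

t_interval = (x_bar - t_star * se, x_bar + t_star * se)
z_interval = (x_bar - z_star * se, x_bar + z_star * se)

print(f"t* = {t_star:.4f}, z* = {z_star:.4f}")
print(f"t-interval: [{t_interval[0]:.2f}, {t_interval[1]:.2f}]")
print(f"z-interval: [{z_interval[0]:.2f}, {z_interval[1]:.2f}]")
print(f"t-interval is {t_star / z_star - 1:.1%} wider")
```

As n grows, t* approaches z* and the two intervals converge, so the distinction matters most for small samples.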


AI/ML Applications

Confidence intervals are essential in machine learning for making reliable decisions about model performance and experiment results. Here are the key applications:

Uncertainty Quantification in ML

📊 Model Performance Evaluation

When you report "accuracy = 87%", you should also report the CI: "87% ± 2.9% (95% CI)" for a 500-example test set. This tells stakeholders how much the metric might vary with a different test set.

$$\text{CI for accuracy: } \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

🔄 Cross-Validation Uncertainty

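K-fold CV gives K performance estimates. Compute their mean and standard error to get a CI for the expected performance. This accounts for the variability across different train/test splits, but treat it as approximate: the folds share training data, so the K estimates are not independent and the interval tends to be too narrow.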
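
A minimal sketch of this calculation, using made-up fold scores and a t critical value because K is small. The scores are hypothetical, and the interval treats the folds as independent, which is only approximately true.

```python
import numpy as np
from scipy import stats

# Hypothetical accuracies from 5-fold cross-validation (illustrative numbers)
fold_scores = np.array([0.86, 0.88, 0.85, 0.89, 0.87])

k = len(fold_scores)
mean_score = fold_scores.mean()
se = fold_scores.std(ddof=1) / np.sqrt(k)     # standard error of the mean

t_star = stats.t.ppf(0.975, df=k - 1)         # small K, so use t rather than z
lower, upper = mean_score - t_star * se, mean_score + t_star * se

print(f"CV accuracy: {mean_score:.3f}, 95% CI: [{lower:.3f}, {upper:.3f}]")
```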

🎯 Bayesian Neural Networks

Monte Carlo Dropout and ensemble methods produce uncertainty estimates that can be summarized as prediction intervals. These are closer in spirit to Bayesian credible intervals than to frequentist CIs, but they serve a similar practical purpose.

A/B Testing and CIs

A/B testing is one of the most important applications of confidence intervals in tech. When comparing two variants, we typically construct a CI for the difference between their metrics.

A/B Test Decision Rule

If 0 is NOT in the CI
→ Statistically significant difference
If 0 is IN the CI
→ Cannot conclude a difference exists
Statistical vs Practical Significance: A CI might exclude 0 (statistically significant) but the effect might be too small to matter (e.g., 0.1% conversion rate lift). Always consider effect size alongside statistical significance.
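
The decision rule can be sketched for conversion rates as follows. The function name and the experiment counts are hypothetical, and the interval is the unpooled Wald interval for a difference of two proportions, which assumes reasonably large samples.

```python
import math
from statistics import NormalDist

def diff_in_proportions_ci(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald CI for p_b - p_a using the unpooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z_star * se, diff + z_star * se

# Hypothetical experiment: 10,000 users per variant, 10.0% vs 11.0% conversion
lower, upper = diff_in_proportions_ci(conv_a=1000, n_a=10_000,
                                      conv_b=1100, n_b=10_000)
significant = not (lower <= 0 <= upper)

print(f"95% CI for lift: [{lower:.4f}, {upper:.4f}]")
print(f"Statistically significant: {significant}")
```

Here the interval excludes 0, but its lower end is close to zero, so whether the lift matters in practice is a separate judgment from whether it is statistically significant.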

Confidence vs Credible Intervals

Confidence intervals are a frequentist concept. Bayesian statistics offers an alternative called credible intervals. Understanding the difference is important for choosing the right tool.

| Property | Confidence Interval | Credible Interval |
|---|---|---|
| Philosophy | Frequentist | Bayesian |
| Parameter is... | Fixed but unknown | A random variable |
| What varies? | The interval (different samples) | Our belief about the parameter |
| Interpretation | 95% of CIs cover θ (long run) | 95% probability θ is in interval (given data) |
| Requires... | Repeated sampling concept | Prior distribution |
| Typical use | Experiment analysis, A/B tests | Bayesian inference, uncertainty estimation |

Practical Note: For many problems, confidence and credible intervals give similar numerical results, especially with large samples and non-informative priors. The choice often depends on your philosophical stance and what question you want to answer.
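
To see how close the two can be, the sketch below compares a Wald confidence interval with an equal-tailed credible interval for an accuracy of 435 correct out of 500. The counts are hypothetical, and the uniform Beta(1, 1) prior is an assumption chosen as a simple non-informative prior.

```python
import math
from scipy import stats

successes, n = 435, 500            # hypothetical test set: 87% accuracy
p_hat = successes / n

# Frequentist: Wald confidence interval
z_star = stats.norm.ppf(0.975)
margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - margin, p_hat + margin)

# Bayesian: a uniform Beta(1, 1) prior gives a
# Beta(1 + successes, 1 + failures) posterior
posterior = stats.beta(1 + successes, 1 + n - successes)
credible = tuple(posterior.ppf([0.025, 0.975]))   # equal-tailed 95% interval

print(f"Confidence interval: [{wald[0]:.3f}, {wald[1]:.3f}]")
print(f"Credible interval:   [{credible[0]:.3f}, {credible[1]:.3f}]")
```

The endpoints differ only slightly at this sample size; the difference lies in what each interval licenses you to say about θ.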

Python Implementation

```python
import numpy as np
from scipy import stats

def confidence_interval_mean_known_sigma(data, sigma, confidence=0.95):
    """
    Compute CI for population mean when sigma is known.

    Parameters
    ----------
    data : array-like
        Sample data
    sigma : float
        Known population standard deviation
    confidence : float
        Confidence level (default 0.95)

    Returns
    -------
    tuple : (lower, upper, margin_of_error)
    """
    n = len(data)
    x_bar = np.mean(data)

    # Get critical z-value
    alpha = 1 - confidence
    z_star = stats.norm.ppf(1 - alpha/2)

    # Calculate margin of error and CI
    se = sigma / np.sqrt(n)
    margin_of_error = z_star * se

    return (x_bar - margin_of_error, x_bar + margin_of_error, margin_of_error)


def confidence_interval_mean_unknown_sigma(data, confidence=0.95):
    """
    Compute CI for population mean when sigma is unknown (t-interval).

    Parameters
    ----------
    data : array-like
        Sample data
    confidence : float
        Confidence level (default 0.95)

    Returns
    -------
    tuple : (lower, upper, margin_of_error)
    """
    n = len(data)
    x_bar = np.mean(data)
    s = np.std(data, ddof=1)  # Sample std dev (ddof=1 gives the n-1 denominator)

    # Get critical t-value
    alpha = 1 - confidence
    df = n - 1
    t_star = stats.t.ppf(1 - alpha/2, df)

    # Calculate margin of error and CI
    se = s / np.sqrt(n)
    margin_of_error = t_star * se

    return (x_bar - margin_of_error, x_bar + margin_of_error, margin_of_error)


def confidence_interval_proportion(successes, n, confidence=0.95):
    """
    Compute CI for population proportion (Wald interval).

    Parameters
    ----------
    successes : int
        Number of successes
    n : int
        Sample size
    confidence : float
        Confidence level (default 0.95)

    Returns
    -------
    tuple : (lower, upper, margin_of_error)
    """
    p_hat = successes / n

    # Get critical z-value
    alpha = 1 - confidence
    z_star = stats.norm.ppf(1 - alpha/2)

    # Calculate SE and margin of error
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z_star * se

    # Clip the bounds to the valid range for a proportion
    return (
        max(0, p_hat - margin_of_error),
        min(1, p_hat + margin_of_error),
        margin_of_error
    )


# Example: Model accuracy evaluation
np.random.seed(42)

# Simulate test set with true accuracy of 0.85
n_test = 500
true_accuracy = 0.85
test_results = np.random.binomial(1, true_accuracy, n_test)

# Compute 95% CI for accuracy
correct = np.sum(test_results)
lower, upper, moe = confidence_interval_proportion(correct, n_test, 0.95)
observed_accuracy = correct / n_test

print("Test Set Results:")
print(f"  Observed accuracy: {observed_accuracy:.1%}")
print(f"  95% CI: [{lower:.1%}, {upper:.1%}]")
print(f"  Margin of error: ±{moe:.1%}")
print(f"  True accuracy ({true_accuracy:.0%}) in CI? {lower <= true_accuracy <= upper}")
```
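
The Wald interval used above is the simplest choice, but it degenerates when the observed proportion is 0 or 1 (the standard error becomes zero) and can be unreliable for small n. One commonly used alternative is the Wilson score interval. The sketch below is an illustrative standalone implementation and is not part of the functions above.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)
    )
    # Clip to [0, 1] to guard against floating-point overshoot
    return max(0.0, center - half_width), min(1.0, center + half_width)

# Edge case: 10 correct out of 10, where the Wald interval has zero width
lower, upper = wilson_interval(10, 10)
print(f"Wilson 95% CI for 10/10: [{lower:.3f}, {upper:.3f}]")

# Larger sample: 435 correct out of 500
big_lower, big_upper = wilson_interval(435, 500)
print(f"Wilson 95% CI for 435/500: [{big_lower:.3f}, {big_upper:.3f}]")
```

For the larger sample the Wilson and Wald intervals nearly coincide; the difference shows up mainly for small n or proportions near 0 or 1.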



Summary

Key Takeaways

  1. CIs quantify uncertainty: A point estimate without a CI is incomplete. Always report both the estimate and its uncertainty.
  2. Interpretation is about the procedure: The confidence level describes how often the method succeeds in the long run, not the probability for any single interval.
  3. Width depends on n, σ, and confidence: Sample size is usually the only lever that narrows the interval without giving up confidence. Quadruple n to halve CI width.
  4. Trade-off between precision and confidence: Narrower intervals require more data or lower confidence. There's no free lunch.
  5. Essential for ML: CIs are crucial for A/B testing, model evaluation, and communicating uncertainty to stakeholders.
Looking Ahead: In the next section, we'll explore confidence intervals for specific parameters of the normal distribution—both the mean (with known and unknown σ) and the variance. These are the building blocks for many statistical procedures.