Learning Objectives
By the end of this section, you will be able to:
Core Knowledge
- Distinguish between confidence intervals and prediction intervals
- Explain why prediction intervals are always wider than confidence intervals
- Derive prediction intervals for normal data
- Understand tolerance intervals and their use cases
Practical Skills
- Compute prediction intervals in Python
- Apply prediction intervals to regression problems
- Choose appropriate intervals for different use cases
- Implement uncertainty quantification for ML models
Deep Learning Connections
- Prediction uncertainty: Distinguishing epistemic vs. aleatoric uncertainty
- Conformal prediction: Distribution-free prediction intervals
- Quantile regression: Direct interval estimation via neural networks
- Probabilistic forecasting: Time series prediction intervals
Where You'll Apply This: Sales forecasting, demand prediction, medical prognosis, quality control, weather forecasting, and any ML application where predicting individual outcomes (not just averages) is important.
Prediction vs. Confidence Intervals
One of the most common errors in applied statistics is confusing confidence intervals (for parameters) with prediction intervals (for future observations). They answer fundamentally different questions.
The Key Distinction
Confidence Interval
Question: Where is the population mean μ likely to be?
Target: A fixed but unknown parameter
Uncertainty: Only from sampling variability of X̄
Prediction Interval
Question: Where will the next observation Xₙ₊₁ fall?
Target: A future random variable
Uncertainty: Sampling variability + inherent randomness of Xₙ₊₁
Key Insight: A confidence interval shrinks to zero width as n → ∞ (we learn μ exactly). A prediction interval does NOT shrink to zero: even with infinite data about μ, the next observation still has inherent variance σ²!
Variance Components
The prediction interval is wider because it accounts for two sources of uncertainty:
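Variance Decomposition
$$\operatorname{Var}(\bar{X} - X_{n+1}) = \underbrace{\frac{\sigma^2}{n}}_{\text{estimation uncertainty (shrinks with more data)}} + \underbrace{\sigma^2}_{\text{inherent variability (irreducible)}} = \sigma^2\left(1 + \frac{1}{n}\right)$$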
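As a quick sanity check, the sketch below simulates many samples, forms the prediction error X̄ − Xₙ₊₁ for each, and compares its empirical variance with σ²/n + σ². The values μ = 100, σ = 15, n = 10, the replicate count and the seed are illustrative assumptions, not taken from the text.

```python
import numpy as np

def prediction_error_variance(mu=100.0, sigma=15.0, n=10, reps=200_000, seed=0):
    """Simulate Var(X̄ - X_{n+1}) and compare it with sigma²/n + sigma²."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mu, sigma, size=(reps, n))  # reps independent samples of size n
    x_bar = samples.mean(axis=1)                     # sample mean of each replicate
    x_next = rng.normal(mu, sigma, size=reps)        # one independent future observation each
    empirical = np.var(x_bar - x_next)
    theoretical = sigma**2 / n + sigma**2            # estimation + irreducible variance
    return empirical, theoretical

emp, theo = prediction_error_variance()
print(f"simulated: {emp:.1f}, theory: {theo:.1f}")
```

With these settings the theoretical value is 22.5 + 225 = 247.5, and the simulated value should land close to it.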
| Aspect | Confidence Interval | Prediction Interval |
|---|---|---|
| Target | Population parameter μ | Next observation Xₙ₊₁ |
| Variance factor | 1/n | 1 + 1/n |
| As n → ∞ | Width → 0 | Width → 2z*σ (irreducible) |
| Interpretation | 95% of such intervals contain μ | 95% of such intervals contain Xₙ₊₁ |
| Use case | Estimating average effect | Predicting individual outcomes |
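To see why confusing the two intervals matters, the sketch below builds both intervals from one sample and checks whether each contains an independent new observation, repeated many times. Standard normal data, n = 30, and the seed are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def coverage_of_next_observation(n=30, confidence=0.95, reps=20_000, seed=1):
    """Fraction of runs in which the CI for mu and the PI contain X_{n+1}."""
    rng = np.random.default_rng(seed)
    data = rng.normal(0.0, 1.0, size=(reps, n))
    x_next = rng.normal(0.0, 1.0, size=reps)         # independent future observation
    x_bar = data.mean(axis=1)
    s = data.std(axis=1, ddof=1)
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    ci_half = t_crit * s / np.sqrt(n)                # half-width of CI for mu
    pi_half = t_crit * s * np.sqrt(1 + 1 / n)        # half-width of PI for X_{n+1}
    err = np.abs(x_next - x_bar)
    return np.mean(err <= ci_half), np.mean(err <= pi_half)

ci_rate, pi_rate = coverage_of_next_observation()
print(f"CI contains next obs: {ci_rate:.3f}, PI contains next obs: {pi_rate:.3f}")
```

In this setting the prediction interval should contain the new observation close to 95% of the time. The confidence interval for μ contains it well under a third of the time because its width leaves out the σ² term.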
Prediction Intervals for Normal Data
Known Variance Case
If X₁, ..., Xₙ are i.i.d. N(μ, σ²) with σ² known, and we want to predict an independent future observation Xₙ₊₁:
Prediction Interval (σ known)
$$\bar{X} \pm z^* \sigma \sqrt{1 + \frac{1}{n}}$$
where z* is the appropriate standard normal quantile (e.g., 1.96 for 95%)
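Derivation: Because X̄ ~ N(μ, σ²/n) and Xₙ₊₁ ~ N(μ, σ²) are independent, their difference has mean zero and the variances add, so the prediction error X̄ − Xₙ₊₁ is normally distributed:
$$\bar{X} - X_{n+1} \sim N\left(0,\ \sigma^2\left(1 + \frac{1}{n}\right)\right)$$
Standardizing, i.e. dividing by σ√(1 + 1/n), gives a standard normal variable, which leads directly to the interval.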
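A minimal sketch of the known-variance interval, assuming σ is supplied by the caller. The function name `prediction_interval_known_sigma` and the four-point example data are hypothetical.

```python
import numpy as np
from scipy import stats

def prediction_interval_known_sigma(data, sigma, confidence=0.95):
    """X̄ ± z* · sigma · sqrt(1 + 1/n) for i.i.d. normal data with known sigma."""
    data = np.asarray(data, dtype=float)
    n = data.size
    z_crit = stats.norm.ppf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z_crit * sigma * np.sqrt(1 + 1 / n)       # sd of X̄ - X_{n+1}, times z*
    x_bar = data.mean()
    return x_bar - margin, x_bar + margin

lo, hi = prediction_interval_known_sigma([98.0, 102.0, 101.0, 99.0], sigma=2.0)
print(f"95% PI: [{lo:.2f}, {hi:.2f}]")
```

Here X̄ = 100 and n = 4, so the half-width is 1.96 · 2 · √1.25 ≈ 4.38.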
Unknown Variance Case
When ฯยฒ is unknown and estimated by sยฒ, we use the t-distribution:
Prediction Interval (σ unknown)
$$\bar{X} \pm t^*_{n-1}\, s \sqrt{1 + \frac{1}{n}}$$
where t*ₙ₋₁ is the appropriate t-quantile with n − 1 degrees of freedom
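The t-quantile matters most for small samples. This sketch uses an illustrative n = 5 with standard normal data. It estimates the coverage of the t-based interval and of a naive version that plugs s into the z-based formula.

```python
import numpy as np
from scipy import stats

def simulate_pi_coverage(n=5, confidence=0.95, reps=50_000, seed=2):
    """Coverage of X_{n+1} by the t-based PI and by a naive z-based PI using s."""
    rng = np.random.default_rng(seed)
    data = rng.normal(0.0, 1.0, size=(reps, n))
    x_next = rng.normal(0.0, 1.0, size=reps)
    x_bar = data.mean(axis=1)
    s = data.std(axis=1, ddof=1)
    se_pred = s * np.sqrt(1 + 1 / n)                 # estimated sd of X̄ - X_{n+1}
    alpha = 1 - confidence
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)    # correct multiplier
    z_crit = stats.norm.ppf(1 - alpha / 2)           # naive multiplier
    err = np.abs(x_next - x_bar)
    return np.mean(err <= t_crit * se_pred), np.mean(err <= z_crit * se_pred)

t_cov, z_cov = simulate_pi_coverage()
print(f"t-based coverage: {t_cov:.3f}, naive z-based coverage: {z_cov:.3f}")
```

At n = 5 the naive z version under-covers noticeably, at roughly 88% instead of 95% in this setting. The gap closes as n grows and t*ₙ₋₁ approaches z*.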
Prediction Intervals in Regression
In regression, prediction intervals are especially important. We want to predict not just E[Y|X=x] (the conditional mean), but where an individual observation Y might fall.
Simple Linear Regression
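For the model Y = β₀ + β₁X + ε with ε ~ N(0, σ²):
Prediction Interval for Y at X = x₀
$$\hat{y}_0 \pm t^*_{n-2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$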
Where:
- ŷ₀ = β̂₀ + β̂₁x₀ is the predicted value
- sₑ = √(Σ(yᵢ − ŷᵢ)² / (n − 2)) is the residual standard error
- sₑ times the square-root term is the standard error of prediction
Notice the interval has three variance components:
- 1 (irreducible): Inherent noise in Y
- 1/n: Uncertainty in estimating the height of the regression line at x̄ (equivalently, ȳ)
- (x₀ − x̄)² / Σ(xᵢ − x̄)²: Uncertainty from the slope, larger for x₀ far from x̄
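To see how the three terms trade off, this sketch evaluates each one in units of sₑ² for an evenly spaced design of 50 points on [0, 10]. The helper name `prediction_variance_terms` and the design are assumptions for illustration.

```python
import numpy as np

def prediction_variance_terms(x, x0):
    """The three terms under the square root of the regression PI (units of s_e²)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ss_x = np.sum((x - x.mean()) ** 2)
    return {
        "noise": 1.0,                            # irreducible noise in Y
        "level": 1.0 / n,                        # height of the line at x̄
        "slope": (x0 - x.mean()) ** 2 / ss_x,    # slope uncertainty, grows away from x̄
    }

x = np.linspace(0, 10, 50)
for x0 in [5.0, 10.0, 20.0]:
    terms = prediction_variance_terms(x, x0)
    print(x0, {k: round(float(v), 4) for k, v in terms.items()})
```

At the center x₀ = 5 the slope term vanishes. At x₀ = 20 (extrapolation) it is about 0.52, far larger than the 1/n = 0.02 term, which is why prediction intervals widen away from the data.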
Multiple Regression
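For the model Y = Xβ + ε in matrix notation, with a new predictor row x₀:
Prediction Interval (Matrix Form)
$$\hat{y}_0 \pm t^*_{n-p}\, s \sqrt{1 + \mathbf{x}_0^\top (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{x}_0}$$
where p is the number of parameters (including the intercept) and s is the residual standard error computed with n − p degrees of freedom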
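A sketch of the matrix form in NumPy, assuming the design matrix already contains a column of ones for the intercept. For p = 2 (intercept plus one slope), x₀ᵀ(XᵀX)⁻¹x₀ reduces to 1/n + (x₀ − x̄)²/Σ(xᵢ − x̄)², so the result should match the simple-regression formula. The function name is hypothetical.

```python
import numpy as np
from scipy import stats

def matrix_prediction_interval(X, y, x0, confidence=0.95):
    """PI for a new row x0 under Y = Xβ + ε; X must include an intercept column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
    residuals = y - X @ beta_hat
    s = np.sqrt(residuals @ residuals / (n - p))        # residual SE with n - p df
    XtX_inv = np.linalg.inv(X.T @ X)                    # fine for small, well-conditioned p
    se_pred = s * np.sqrt(1.0 + x0 @ XtX_inv @ x0)
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - p)
    y0 = x0 @ beta_hat
    return y0 - t_crit * se_pred, y0 + t_crit * se_pred

rng = np.random.default_rng(3)
xs = np.linspace(0, 10, 40)
ys = 2 + 3 * xs + rng.normal(0, 2, 40)
X = np.column_stack([np.ones_like(xs), xs])            # intercept column + predictor
lo, hi = matrix_prediction_interval(X, ys, np.array([1.0, 7.0]))
print(f"95% PI at x = 7: [{lo:.2f}, {hi:.2f}]")
```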
Tolerance Intervals
A third type of interval, often confused with both confidence and prediction intervals, is the tolerance interval.
Tolerance Interval Definition
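A (1 − α, β) tolerance interval is an interval that, with confidence 1 − α, contains at least proportion β of the population. (Here β denotes coverage, not a regression coefficient.) For normal data it takes the form
$$\bar{X} \pm k\, s$$
where k depends on n, α, and β (from tolerance tables or an approximation)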
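One common way to approximate k for a two-sided normal tolerance interval is Howe's approximation, sketched below. It is an approximation, not the exact tabulated factor. Note that it uses the lower α-quantile of χ²ₙ₋₁, where 1 − α is the confidence level. The function name is hypothetical.

```python
import numpy as np
from scipy import stats

def howe_k_factor(n, coverage=0.90, confidence=0.95):
    """Howe's approximation to the two-sided normal tolerance factor k."""
    z_p = stats.norm.ppf((1 + coverage) / 2)
    chi2_lower = stats.chi2.ppf(1 - confidence, df=n - 1)  # LOWER-tail chi-square quantile
    return z_p * np.sqrt((n - 1) * (1 + 1 / n) / chi2_lower)

for n in [10, 30, 100, 1000]:
    print(n, round(float(howe_k_factor(n)), 3))
```

For n = 30, 90% coverage and 95% confidence this gives k ≈ 2.14, larger than z₀.₉₅ ≈ 1.645. As n grows, k decreases toward z₀.₉₅.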
| Interval Type | What It Covers | Example Interpretation |
|---|---|---|
| Confidence | Parameter (μ) | 95% confident μ is in [L, U] |
| Prediction | Next observation | 95% probability (over repeated sampling) that the next observation falls in [L, U] |
| Tolerance | Proportion of population | 95% confident that at least 90% of the population is in [L, U] |
Use case: Manufacturing quality control. "We are 95% confident that 99% of all products have measurements in this range."
AI/ML Applications
Conformal Prediction
Distribution-free prediction intervals for any ML model! Uses a held-out calibration set to construct intervals with guaranteed (marginal) coverage. The only assumption is that calibration and test data are exchangeable; normality and other parametric forms are not required.
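A minimal split-conformal sketch follows. It assumes absolute residuals as nonconformity scores and uses a degree-5 polynomial as a stand-in model; any fitted model could take its place. The synthetic data deliberately use heavy-tailed, non-normal noise. Passing `method="higher"` to `np.quantile` requires NumPy 1.22 or later. The helper name is hypothetical.

```python
import numpy as np

def split_conformal_radius(cal_pred, cal_true, confidence=0.90):
    """Half-width q_hat from absolute residuals on a calibration set."""
    scores = np.abs(np.asarray(cal_true) - np.asarray(cal_pred))
    n_cal = scores.size
    q_level = min(np.ceil((n_cal + 1) * confidence) / n_cal, 1.0)  # finite-sample correction
    return np.quantile(scores, q_level, method="higher")

rng = np.random.default_rng(4)
x = rng.uniform(-3, 3, 3000)
y = np.sin(x) + 0.3 * rng.standard_t(df=3, size=x.size)  # heavy-tailed, non-normal noise
x_tr, y_tr = x[:1000], y[:1000]            # training set
x_cal, y_cal = x[1000:2000], y[1000:2000]  # calibration set
x_te, y_te = x[2000:], y[2000:]            # test set
coefs = np.polyfit(x_tr, y_tr, deg=5)      # stand-in model
q_hat = split_conformal_radius(np.polyval(coefs, x_cal), y_cal, confidence=0.90)
coverage = np.mean(np.abs(y_te - np.polyval(coefs, x_te)) <= q_hat)
print(f"q_hat = {q_hat:.3f}, test coverage = {coverage:.3f}")
```

Test coverage should come out near the 90% target even though the noise is not normal.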
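Quantile Regression
Train neural networks to predict quantiles (e.g., the 5th and 95th percentiles) directly, typically by minimizing the pinball (quantile) loss. This produces prediction intervals without distributional assumptions on the noise, although, unlike conformal prediction, it gives no finite-sample coverage guarantee. Used in demand forecasting and risk estimation.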
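As a small illustration, this sketch uses a constant prediction instead of a neural network. Minimizing the average pinball loss over that constant recovers the empirical τ-quantile. The exponential data and grid search are illustrative assumptions.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Average pinball (quantile) loss at level tau."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

rng = np.random.default_rng(5)
y = rng.exponential(scale=2.0, size=5000)  # skewed data: quantiles are asymmetric
grid = np.linspace(0, 15, 3001)            # candidate constant predictions
results = {}
for tau in (0.05, 0.95):
    losses = [pinball_loss(y, c, tau) for c in grid]
    best = grid[int(np.argmin(losses))]
    results[tau] = (best, np.quantile(y, tau))
    print(f"tau={tau}: argmin={best:.3f}, empirical quantile={results[tau][1]:.3f}")
```

The two fitted constants bracket a 90% interval. A quantile-regression network does the same thing with predictions that depend on the input.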
Bayesian Neural Networks
Posterior predictive distribution naturally provides prediction intervals that account for both epistemic (model) and aleatoric (data) uncertainty.
Time Series Forecasting
Prediction intervals are essential for forecasts: they communicate uncertainty to decision-makers. ARIMA, Prophet, and deep learning methods all provide forecast intervals.
Epistemic vs. Aleatoric Uncertainty
Epistemic (Model) Uncertainty
Uncertainty from not knowing the true model/parameters. Reducible with more data. Analogous to the σ²/n term.
Aleatoric (Data) Uncertainty
Inherent randomness in the data-generating process. Irreducible even with infinite data. Analogous to the σ² term.
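Consider an ensemble of models that each output a Gaussian mean and variance, as in a deep ensemble. One common decomposition uses the law of total variance: the average predicted variance approximates the aleatoric part, and the spread of the means approximates the epistemic part. The sketch below assumes equally weighted members, and the arrays are made-up numbers.

```python
import numpy as np

def decompose_uncertainty(means, variances):
    """means, variances: shape (n_members, n_points) from an ensemble of Gaussian predictors."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    aleatoric = variances.mean(axis=0)  # average predicted noise variance
    epistemic = means.var(axis=0)       # disagreement between ensemble members
    return aleatoric, epistemic, aleatoric + epistemic

means = np.array([[1.0, 2.0], [1.2, 2.0], [0.8, 2.0]])     # 3 members, 2 inputs
variances = np.array([[0.5, 0.1], [0.5, 0.1], [0.5, 0.1]])
alea, epi, total = decompose_uncertainty(means, variances)
print(alea, epi, total)
```

At the second input all members agree, so the epistemic part is zero and only the aleatoric variance remains.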
Python Implementation
```python
import numpy as np
from scipy import stats
from typing import Tuple

# ============================================
# Prediction Intervals for Normal Data
# ============================================

def prediction_interval_normal(
    data: np.ndarray,
    confidence: float = 0.95
) -> Tuple[float, float, float]:
    """
    Compute prediction interval for next observation from normal data.

    Parameters
    ----------
    data : array
        Observed data (assumed normal)
    confidence : float
        Confidence level (e.g., 0.95 for 95%)

    Returns
    -------
    lower, upper, width : Prediction interval bounds and width
    """
    n = len(data)
    x_bar = np.mean(data)
    s = np.std(data, ddof=1)

    # t-quantile with n-1 degrees of freedom
    alpha = 1 - confidence
    t_crit = stats.t.ppf(1 - alpha/2, df=n-1)

    # Standard error of prediction: s * sqrt(1 + 1/n)
    se_pred = s * np.sqrt(1 + 1/n)

    # Margin of error
    margin = t_crit * se_pred

    return x_bar - margin, x_bar + margin, 2 * margin


def confidence_interval_mean(
    data: np.ndarray,
    confidence: float = 0.95
) -> Tuple[float, float, float]:
    """
    Compute confidence interval for population mean.

    Parameters
    ----------
    data : array
        Observed data
    confidence : float
        Confidence level

    Returns
    -------
    lower, upper, width : CI bounds and width
    """
    n = len(data)
    x_bar = np.mean(data)
    s = np.std(data, ddof=1)

    alpha = 1 - confidence
    t_crit = stats.t.ppf(1 - alpha/2, df=n-1)

    # Standard error of mean: s / sqrt(n)
    se_mean = s / np.sqrt(n)

    margin = t_crit * se_mean

    return x_bar - margin, x_bar + margin, 2 * margin


def compare_intervals(data: np.ndarray, confidence: float = 0.95) -> dict:
    """Compare CI for mean vs. prediction interval."""
    ci_lower, ci_upper, ci_width = confidence_interval_mean(data, confidence)
    pi_lower, pi_upper, pi_width = prediction_interval_normal(data, confidence)

    return {
        'n': len(data),
        'mean': np.mean(data),
        'std': np.std(data, ddof=1),
        'confidence_interval': {
            'lower': ci_lower,
            'upper': ci_upper,
            'width': ci_width
        },
        'prediction_interval': {
            'lower': pi_lower,
            'upper': pi_upper,
            'width': pi_width
        },
        # Equals sqrt(n + 1): both intervals share the same t-quantile and s
        'width_ratio': pi_width / ci_width
    }


# ============================================
# Prediction Intervals for Regression
# ============================================

def regression_prediction_interval(
    X: np.ndarray,
    y: np.ndarray,
    x_new: np.ndarray,
    confidence: float = 0.95
) -> dict:
    """
    Compute prediction interval for simple linear regression.

    Parameters
    ----------
    X : array of shape (n,)
        Predictor values
    y : array of shape (n,)
        Response values
    x_new : array
        New x values for prediction
    confidence : float
        Confidence level

    Returns
    -------
    dict : Predictions, confidence intervals, and prediction intervals
    """
    n = len(X)
    x_bar = np.mean(X)
    ss_x = np.sum((X - x_bar)**2)

    # Fit regression by ordinary least squares
    beta_1 = np.sum((X - x_bar) * (y - np.mean(y))) / ss_x
    beta_0 = np.mean(y) - beta_1 * x_bar

    # Residuals and residual standard error (n - 2 degrees of freedom)
    y_hat = beta_0 + beta_1 * X
    residuals = y - y_hat
    s_e = np.sqrt(np.sum(residuals**2) / (n - 2))

    # t-quantile
    alpha = 1 - confidence
    t_crit = stats.t.ppf(1 - alpha/2, df=n-2)

    # Predictions
    y_new = beta_0 + beta_1 * x_new

    # Standard errors: mean response (CI) vs. new observation (PI, extra "1 +")
    se_mean = s_e * np.sqrt(1/n + (x_new - x_bar)**2 / ss_x)
    se_pred = s_e * np.sqrt(1 + 1/n + (x_new - x_bar)**2 / ss_x)

    return {
        'x_new': x_new,
        'y_pred': y_new,
        'coefficients': {'intercept': beta_0, 'slope': beta_1},
        'residual_std_error': s_e,
        'confidence_interval': {
            'lower': y_new - t_crit * se_mean,
            'upper': y_new + t_crit * se_mean
        },
        'prediction_interval': {
            'lower': y_new - t_crit * se_pred,
            'upper': y_new + t_crit * se_pred
        }
    }


# ============================================
# Tolerance Interval
# ============================================

def tolerance_interval_normal(
    data: np.ndarray,
    coverage: float = 0.90,
    confidence: float = 0.95
) -> Tuple[float, float]:
    """
    Compute two-sided tolerance interval for normal data.

    A (confidence, coverage) tolerance interval contains at least
    'coverage' proportion of the population with 'confidence' confidence.

    Parameters
    ----------
    data : array
        Observed data (assumed normal)
    coverage : float
        Proportion of population to cover (e.g., 0.90)
    confidence : float
        Confidence level (e.g., 0.95)

    Returns
    -------
    lower, upper : Tolerance interval bounds
    """
    n = len(data)
    x_bar = np.mean(data)
    s = np.std(data, ddof=1)

    # Howe's approximation for the two-sided k factor:
    #   k = z_{(1+coverage)/2} * sqrt((n-1)(1+1/n) / chi2_{1-confidence, n-1})
    # The chi-square quantile is the LOWER (1 - confidence) quantile, which makes
    # k larger than z. Exact values require special tables.
    z_p = stats.norm.ppf((1 + coverage) / 2)
    chi2_val = stats.chi2.ppf(1 - confidence, df=n-1)

    k = z_p * np.sqrt((n - 1) * (1 + 1/n) / chi2_val)

    return x_bar - k * s, x_bar + k * s


# ============================================
# Conformal Prediction (Simple Version)
# ============================================

def conformal_prediction_interval(
    cal_predictions: np.ndarray,
    cal_true: np.ndarray,
    new_prediction: float,
    confidence: float = 0.90
) -> Tuple[float, float]:
    """
    Compute distribution-free prediction interval using conformal prediction.

    Parameters
    ----------
    cal_predictions : array
        Model predictions on calibration set
    cal_true : array
        True values on calibration set
    new_prediction : float
        Model prediction for new point
    confidence : float
        Desired coverage level

    Returns
    -------
    lower, upper : Prediction interval bounds
    """
    # Compute nonconformity scores (absolute residuals)
    scores = np.abs(cal_predictions - cal_true)

    # Finite-sample-corrected quantile level: ceil((n+1) * confidence) / n
    n_cal = len(scores)
    q_level = np.ceil((n_cal + 1) * confidence) / n_cal
    q_level = min(q_level, 1.0)

    # "higher" returns an actual observed score, preserving the coverage
    # guarantee (the method= keyword requires NumPy >= 1.22)
    q_hat = np.quantile(scores, q_level, method="higher")

    return new_prediction - q_hat, new_prediction + q_hat


# ============================================
# Demonstration
# ============================================

if __name__ == "__main__":
    np.random.seed(42)

    print("=" * 60)
    print("PREDICTION INTERVALS vs CONFIDENCE INTERVALS")
    print("=" * 60)

    # Generate normal data
    mu_true = 100
    sigma_true = 15
    n = 30
    data = np.random.normal(mu_true, sigma_true, n)

    # Compare intervals
    comparison = compare_intervals(data, confidence=0.95)

    print(f"\nSample: n={comparison['n']}, mean={comparison['mean']:.2f}, std={comparison['std']:.2f}")
    print(f"True parameters: μ={mu_true}, σ={sigma_true}")

    print("\n--- Confidence Interval for Mean ---")
    ci = comparison['confidence_interval']
    print(f"  [{ci['lower']:.2f}, {ci['upper']:.2f}]")
    print(f"  Width: {ci['width']:.2f}")

    print("\n--- Prediction Interval for Next Observation ---")
    pi = comparison['prediction_interval']
    print(f"  [{pi['lower']:.2f}, {pi['upper']:.2f}]")
    print(f"  Width: {pi['width']:.2f}")

    print(f"\nPrediction interval is {comparison['width_ratio']:.1f}x wider!")

    # Show convergence behavior
    print("\n--- Width Ratio as n Increases ---")
    for n_test in [10, 30, 100, 500, 1000]:
        data_test = np.random.normal(mu_true, sigma_true, n_test)
        comp = compare_intervals(data_test)
        print(f"  n={n_test:4d}: CI width={comp['confidence_interval']['width']:.2f}, "
              f"PI width={comp['prediction_interval']['width']:.2f}, "
              f"ratio={comp['width_ratio']:.2f}")

    print("\n--- Regression Prediction Interval ---")
    # Generate regression data
    X = np.linspace(0, 10, 50)
    y = 2 + 3 * X + np.random.normal(0, 2, 50)

    # Predict at new points
    x_new = np.array([2, 5, 8, 12])  # Note: 12 is extrapolation!
    result = regression_prediction_interval(X, y, x_new)

    print(f"Regression: y = {result['coefficients']['intercept']:.2f} + "
          f"{result['coefficients']['slope']:.2f}x")
    print(f"Residual SE: {result['residual_std_error']:.2f}")
    print("\nPredictions with intervals:")
    for i, x in enumerate(x_new):
        pi = result['prediction_interval']
        ci = result['confidence_interval']
        print(f"  x={x:2d}: ŷ={result['y_pred'][i]:.1f}, "
              f"CI=[{ci['lower'][i]:.1f}, {ci['upper'][i]:.1f}], "
              f"PI=[{pi['lower'][i]:.1f}, {pi['upper'][i]:.1f}]")

    print("\n--- Tolerance Interval ---")
    tol_lower, tol_upper = tolerance_interval_normal(data, coverage=0.90, confidence=0.95)
    print(f"(0.95, 0.90) Tolerance Interval: [{tol_lower:.2f}, {tol_upper:.2f}]")
    print("Interpretation: 95% confident that at least 90% of population is in this interval")
```
Summary
Key Takeaways
- Prediction intervals quantify uncertainty about future observations, while confidence intervals quantify uncertainty about population parameters.
- Prediction intervals are always wider because they account for both estimation uncertainty (reducible) and inherent variability (irreducible).
- As n → ∞: Confidence intervals shrink to zero width, but prediction intervals converge to a minimum width of 2z*σ.
- In regression: Prediction intervals widen for extrapolation (predicting at X values far from the data center).
- Tolerance intervals answer a third question: what interval contains a specified proportion of the population with given confidence?
- Modern ML: Conformal prediction provides distribution-free prediction intervals with guaranteed marginal coverage for any model, assuming exchangeable data.
Looking Ahead: In the next section, we'll explore Simultaneous Confidence Intervals: how to maintain correct coverage when making multiple inferences at once, addressing the multiple testing problem through Bonferroni and related corrections.