Chapter 13

Prediction Intervals

Interval Estimation

Learning Objectives

By the end of this section, you will be able to:

📚 Core Knowledge

  • Distinguish between confidence intervals and prediction intervals
  • Explain why prediction intervals are always wider than confidence intervals
  • Derive prediction intervals for normal data
  • Understand tolerance intervals and their use cases

🔧 Practical Skills

  • Compute prediction intervals in Python
  • Apply prediction intervals to regression problems
  • Choose appropriate intervals for different use cases
  • Implement uncertainty quantification for ML models

🧠 Deep Learning Connections

  • Prediction uncertainty: Distinguishing epistemic vs. aleatoric uncertainty
  • Conformal prediction: Distribution-free prediction intervals
  • Quantile regression: Direct interval estimation via neural networks
  • Probabilistic forecasting: Time series prediction intervals
Where You'll Apply This: Sales forecasting, demand prediction, medical prognosis, quality control, weather forecasting, and any ML application where predicting individual outcomes (not just averages) is important.

Prediction vs. Confidence Intervals

One of the most common errors in applied statistics is confusing confidence intervals (for parameters) with prediction intervals (for future observations). They answer fundamentally different questions.

The Key Distinction

📊 Confidence Interval

Question: Where is the population mean μ likely to be?

Target: A fixed but unknown parameter

Uncertainty: Only from sampling variability of X̄

$$\bar{X} \pm t^* \cdot \frac{s}{\sqrt{n}}$$

🎯 Prediction Interval

Question: Where will the next observation Xₙ₊₁ fall?

Target: A future random variable

Uncertainty: Sampling variability + inherent randomness of Xₙ₊₁

$$\bar{X} \pm t^* \cdot s\sqrt{1 + \frac{1}{n}}$$
Key Insight: A confidence interval shrinks as n → ∞ (we learn μ exactly). A prediction interval does NOT shrink to zero: even with infinite data about μ, the next observation still has inherent randomness σ²!

Variance Components

The prediction interval is wider because it accounts for two sources of uncertainty. Because Xₙ₊₁ is independent of the sample, the variances of X̄ and Xₙ₊₁ add:

Variance Decomposition

$$\text{Var}(\bar{X} - X_{n+1}) = \text{Var}(\bar{X}) + \text{Var}(X_{n+1}) = \frac{\sigma^2}{n} + \sigma^2 = \sigma^2\left(1 + \frac{1}{n}\right)$$

  • σ²/n: Estimation uncertainty (shrinks with more data)
  • σ²: Inherent variability (irreducible)
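
The decomposition can be checked numerically. The following is a minimal sketch, assuming i.i.d. normal data; the function name `simulate_prediction_error_variance`, the sample size, and the value of σ are illustrative choices rather than anything prescribed by the text.

```python
import numpy as np


def simulate_prediction_error_variance(n, sigma, n_trials=200_000, seed=0):
    """Monte Carlo estimate of Var(X_bar - X_{n+1}) for i.i.d. N(0, sigma^2) data."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(0.0, sigma, size=(n_trials, n))  # one observed sample per row
    x_next = rng.normal(0.0, sigma, size=n_trials)        # independent future draws
    errors = samples.mean(axis=1) - x_next                # prediction errors
    return errors.var(ddof=1)


sigma, n = 2.0, 10
empirical = simulate_prediction_error_variance(n, sigma)
theoretical = sigma**2 * (1 + 1 / n)  # sigma^2 / n + sigma^2
print(f"empirical={empirical:.3f}, theoretical={theoretical:.3f}")
```

The two numbers should agree up to Monte Carlo error; increasing `n` shrinks only the σ²/n part of the total.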

| Aspect | Confidence Interval | Prediction Interval |
| --- | --- | --- |
| Target | Population parameter μ | Next observation Xₙ₊₁ |
| Variance factor | 1/n | 1 + 1/n |
| As n → ∞ | Width → 0 | Width → 2z*σ (irreducible) |
| Interpretation | 95% of such intervals contain μ | 95% of such intervals contain Xₙ₊₁ |
| Use case | Estimating average effect | Predicting individual outcomes |
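
A small simulation makes the "Interpretation" row concrete. This is a sketch under the assumption of i.i.d. normal data; `coverage_of_next_observation` and the values μ = 50, σ = 10 are hypothetical choices made for illustration. It counts how often each interval contains the next observation.

```python
import numpy as np
from scipy import stats


def coverage_of_next_observation(n, n_trials=20_000, confidence=0.95, seed=1):
    """Fraction of trials in which each interval contains the next observation."""
    rng = np.random.default_rng(seed)
    data = rng.normal(50.0, 10.0, size=(n_trials, n))
    x_next = rng.normal(50.0, 10.0, size=n_trials)

    x_bar = data.mean(axis=1)
    s = data.std(axis=1, ddof=1)
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)

    ci_half = t_crit * s / np.sqrt(n)          # half-width of the CI for mu
    pi_half = t_crit * s * np.sqrt(1 + 1 / n)  # half-width of the PI for X_{n+1}

    ci_cover = np.mean(np.abs(x_next - x_bar) <= ci_half)
    pi_cover = np.mean(np.abs(x_next - x_bar) <= pi_half)
    return ci_cover, pi_cover


ci_cover, pi_cover = coverage_of_next_observation(n=25)
print(f"CI contains next observation: {ci_cover:.3f}")
print(f"PI contains next observation: {pi_cover:.3f}")
```

The prediction interval should cover the new value close to 95% of the time, while the confidence interval, which was built for μ, covers it far less often.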

Prediction Intervals for Normal Data

Known Variance Case

If X₁, ..., Xₙ ~ N(μ, σ²) with σ² known, and we want to predict Xₙ₊₁:

Prediction Interval (σ known)

$$\bar{X} \pm z^* \cdot \sigma\sqrt{1 + \frac{1}{n}}$$

where z* is the appropriate standard normal quantile (e.g., 1.96 for 95%)

Derivation: The prediction error X̄ - Xₙ₊₁ is normally distributed:

$$\bar{X} - X_{n+1} \sim N\left(0, \sigma^2\left(1 + \frac{1}{n}\right)\right)$$

Standardizing gives a standard normal variable, leading to the interval.
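
Written out, the standardization step looks as follows. This is a short derivation using only the quantities defined above, with z* taken to be the 1 - α/2 standard normal quantile:

```latex
\begin{aligned}
Z &= \frac{\bar{X} - X_{n+1}}{\sigma\sqrt{1 + \frac{1}{n}}} \sim N(0, 1) \\
P\left(-z^* \le Z \le z^*\right) &= 1 - \alpha \\
P\left(\bar{X} - z^* \sigma\sqrt{1 + \tfrac{1}{n}} \;\le\; X_{n+1} \;\le\; \bar{X} + z^* \sigma\sqrt{1 + \tfrac{1}{n}}\right) &= 1 - \alpha
\end{aligned}
```

The last line follows by multiplying the inequality through by the standard deviation of the prediction error and solving for Xₙ₊₁.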

Unknown Variance Case

When σ² is unknown and estimated by s², we use the t-distribution:

Prediction Interval (σ unknown)

$$\bar{X} \pm t^*_{n-1} \cdot s\sqrt{1 + \frac{1}{n}}$$

where t*ₙ₋₁ is the appropriate t-quantile with n-1 degrees of freedom

Why t-distribution? When we estimate σ by s, the standardized prediction error follows a t-distribution, not a normal distribution. This accounts for the additional uncertainty from estimating the variance.
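
To see what is lost by using z* when σ is estimated, the actual coverage can be computed exactly: since the standardized prediction error is t-distributed with n-1 degrees of freedom, the coverage of the z-based interval is P(|T| ≤ z*). The helper name `coverage_with_z_quantile` below is an illustrative assumption.

```python
from scipy import stats


def coverage_with_z_quantile(n, confidence=0.95):
    """Actual coverage of X_bar +/- z* s sqrt(1 + 1/n) when sigma is estimated by s.

    The standardized prediction error follows a t-distribution with n-1
    degrees of freedom, so the coverage equals P(|T| <= z*).
    """
    z_crit = stats.norm.ppf(1 - (1 - confidence) / 2)
    return 2 * stats.t.cdf(z_crit, df=n - 1) - 1


for n in [5, 10, 30, 100]:
    print(f"n={n:3d}: coverage with z* = {coverage_with_z_quantile(n):.3f}")
```

The shortfall relative to the nominal 95% is largest for small samples and fades as n grows, which is why the t-quantile matters most when data are scarce.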

Prediction Intervals in Regression

In regression, prediction intervals are especially important. We want to predict not just E[Y|X=x] (the conditional mean), but where an individual observation Y might fall.

Simple Linear Regression

For the model Y = β₀ + β₁X + ε with ε ~ N(0, σ²):

Prediction Interval for Y at X = x₀

$$\hat{y}_0 \pm t^*_{n-2} \cdot s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum(x_i - \bar{x})^2}}$$

Where:

  • ŷ₀ = β̂₀ + β̂₁x₀ is the predicted value
  • sₑ is the residual standard error
  • sₑ times the square root term is the standard error of prediction

Notice the interval has three variance components:

  1. 1 (irreducible): Inherent noise in Y
  2. 1/n: Uncertainty in estimating the overall level of the line (the mean response at x̄)
  3. (x₀ - x̄)² / Σ(xᵢ - x̄)²: Uncertainty from the slope, larger for x₀ far from x̄
Prediction intervals widen as x₀ moves away from x̄! Predicting at extreme values of X (extrapolation) gives much wider intervals than predicting near the center of the data.
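
The widening can be seen by evaluating the square root term alone. The sketch below assumes a hypothetical set of 20 evenly spaced design points on [0, 10]; `prediction_se_factor` is an illustrative helper name.

```python
import numpy as np


def prediction_se_factor(x_train, x0):
    """Square root term sqrt(1 + 1/n + (x0 - x_bar)^2 / S_xx) of the interval."""
    x_train = np.asarray(x_train, dtype=float)
    n = x_train.size
    x_bar = x_train.mean()
    s_xx = np.sum((x_train - x_bar) ** 2)
    return np.sqrt(1 + 1 / n + (np.asarray(x0, dtype=float) - x_bar) ** 2 / s_xx)


x_train = np.linspace(0, 10, 20)             # hypothetical design points
x0 = np.array([5.0, 8.0, 10.0, 15.0, 25.0])  # the last two are extrapolations
factors = prediction_se_factor(x_train, x0)
for x, f in zip(x0, factors):
    print(f"x0={x:5.1f}: factor={f:.3f}")
```

At x₀ = x̄ the factor reduces to √(1 + 1/n); it grows steadily as x₀ leaves the range of the observed data.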

Multiple Regression

For the model Y = Xβ + ε in matrix notation:

Prediction Interval (Matrix Form)

$$\hat{y}_0 \pm t^*_{n-p} \cdot s_e \sqrt{1 + x_0^T(X^TX)^{-1}x_0}$$

where p is the number of parameters (including intercept)
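
The matrix form translates fairly directly into NumPy. The following is a minimal sketch, assuming ordinary least squares with normal errors and a design matrix that already contains a column of ones; the function name and the synthetic data are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def multiple_regression_prediction_interval(X, y, x0, confidence=0.95):
    """Prediction interval for a new response at x0 under OLS with normal errors.

    X  : (n, p) design matrix that already includes a column of ones
    y  : (n,) responses
    x0 : (p,) new design row, including the leading 1 for the intercept
    """
    X, y, x0 = np.asarray(X, float), np.asarray(y, float), np.asarray(x0, float)
    n, p = X.shape

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta_hat
    s_e = np.sqrt(residuals @ residuals / (n - p))  # residual standard error

    # x0^T (X^T X)^{-1} x0 via a linear solve rather than an explicit inverse
    leverage = x0 @ np.linalg.solve(X.T @ X, x0)

    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - p)
    y0_hat = x0 @ beta_hat
    margin = t_crit * s_e * np.sqrt(1 + leverage)
    return y0_hat - margin, y0_hat + margin


rng = np.random.default_rng(0)
n = 60
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(-5, 5, n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0, 1.5, n)
lower, upper = multiple_regression_prediction_interval(X, y, np.array([1.0, 4.0, 1.0]))
print(f"95% PI: [{lower:.2f}, {upper:.2f}]")
```

With p = 2 the leverage term equals 1/n + (x₀ - x̄)² / Σ(xᵢ - x̄)², so this reduces to the simple linear regression interval.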


Tolerance Intervals

A third type of interval, often confused with both confidence and prediction intervals, is the tolerance interval.

Tolerance Interval Definition

A (1-α, β) tolerance interval is an interval that, with confidence 1-α, contains at least proportion β of the population.

$$\bar{X} \pm k \cdot s$$

where k depends on n, α, and β (from tolerance tables)

| Interval Type | What It Covers | Example Interpretation |
| --- | --- | --- |
| Confidence | Parameter (μ) | 95% confident μ is in [L, U] |
| Prediction | Next observation | 95% confident the next observation falls in [L, U] |
| Tolerance | Proportion of population | 95% confident that 90% of population is in [L, U] |

Use case: Manufacturing quality control. "We are 95% confident that 99% of all products have measurements in this range."
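
To compare the three interval types on the same scale, each half-width can be written as a multiple of s. The sketch below assumes normal data and uses Howe's approximation for the two-sided tolerance factor k instead of exact table values; `interval_multipliers` is an illustrative name.

```python
import numpy as np
from scipy import stats


def interval_multipliers(n, confidence=0.95, coverage=0.90):
    """Half-width multipliers of s for the three interval types (normal data).

    The tolerance factor uses Howe's approximation, not exact table values.
    """
    alpha = 1 - confidence
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    ci = t_crit / np.sqrt(n)                    # confidence interval for mu
    pi = t_crit * np.sqrt(1 + 1 / n)            # prediction interval for one new value
    z_p = stats.norm.ppf((1 + coverage) / 2)
    chi2_low = stats.chi2.ppf(alpha, df=n - 1)  # lower-tail chi-square quantile
    ti = z_p * np.sqrt((n - 1) * (1 + 1 / n) / chi2_low)
    return ci, pi, ti


for n in [10, 30, 100, 1000]:
    ci, pi, ti = interval_multipliers(n)
    print(f"n={n:4d}: CI={ci:.3f}  PI={pi:.3f}  TI(95%/90%)={ti:.3f}")
```

As n grows, the confidence multiplier goes to zero, the prediction multiplier approaches the normal quantile for the confidence level, and the tolerance multiplier approaches the normal quantile for the coverage proportion.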


AI/ML Applications

🎯 Conformal Prediction

Distribution-free prediction intervals for any ML model! Uses held-out calibration data to construct intervals with guaranteed marginal coverage, assuming only that the calibration points and the new point are exchangeable, without assuming normality or other parametric forms.
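
A quick way to build intuition is to run split conformal prediction on data that are clearly not normal and check the empirical coverage. This is a sketch only: the predictor `model`, the data generator `draw`, and the exponential noise are hypothetical stand-ins for a fitted ML model and a real dataset.

```python
import numpy as np


def split_conformal_halfwidth(cal_pred, cal_true, coverage=0.90):
    """Half-width q_hat from absolute residuals on a held-out calibration set."""
    scores = np.sort(np.abs(np.asarray(cal_true) - np.asarray(cal_pred)))
    n_cal = scores.size
    rank = int(np.ceil((n_cal + 1) * coverage))  # finite-sample corrected rank
    if rank > n_cal:
        return np.inf  # too few calibration points for this coverage level
    return scores[rank - 1]


rng = np.random.default_rng(0)


def model(x):
    """Hypothetical, deliberately imperfect point predictor."""
    return 2.0 * x


def draw(n):
    """Synthetic data with skewed, non-normal noise."""
    x = rng.uniform(0, 5, n)
    y = 2.0 * x + rng.exponential(1.0, n)
    return x, y


x_cal, y_cal = draw(1000)
x_test, y_test = draw(20_000)
q_hat = split_conformal_halfwidth(model(x_cal), y_cal, coverage=0.90)
covered = np.abs(y_test - model(x_test)) <= q_hat
print(f"half-width={q_hat:.2f}, empirical coverage={covered.mean():.3f}")
```

The coverage should land near 90% even though the model is biased and the noise is skewed. The guarantee concerns coverage averaged over inputs, so the interval may still be too wide or too narrow in particular regions.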

📊 Quantile Regression

Train neural networks to predict quantiles (e.g., 5th and 95th percentiles) directly. Produces prediction intervals without distributional assumptions. Used in demand forecasting and risk estimation.
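
Quantile models are usually trained with the pinball (quantile) loss. The sketch below is not a neural network; it only illustrates, on synthetic skewed data, that the constant prediction minimizing the pinball loss is close to the corresponding empirical quantile. The names and the exponential data are assumptions made for illustration.

```python
import numpy as np


def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for quantile level q in (0, 1)."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return np.mean(np.maximum(q * diff, (q - 1) * diff))


rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=50_000)  # skewed synthetic targets

# Grid search over constant predictions: the minimizer approximates the quantile
candidates = np.linspace(0.0, 10.0, 1001)
for q in (0.05, 0.95):
    losses = [pinball_loss(y, c, q) for c in candidates]
    best = candidates[int(np.argmin(losses))]
    print(f"q={q}: argmin={best:.2f}, empirical quantile={np.quantile(y, q):.2f}")
```

Training one output with q = 0.05 and another with q = 0.95 yields a nominal 90% prediction interval between the two predicted quantiles.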

🧠 Bayesian Neural Networks

Posterior predictive distribution naturally provides prediction intervals that account for both epistemic (model) and aleatoric (data) uncertainty.

📈 Time Series Forecasting

Prediction intervals are essential for forecasts: they communicate uncertainty to decision-makers. ARIMA, Prophet, and deep learning methods all provide forecast intervals.

Epistemic vs. Aleatoric Uncertainty

Epistemic (Model) Uncertainty

Uncertainty from not knowing the true model/parameters. Reducible with more data. Analogous to the σ²/n term.

Aleatoric (Data) Uncertainty

Inherent randomness in the data-generating process. Irreducible even with infinite data. Analogous to the σ² term.
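
One common way to separate the two in practice is the law of total variance applied to an ensemble (or to posterior draws): the average predicted noise variance is treated as aleatoric, and the spread of the predicted means as epistemic. The sketch below assumes each member outputs a mean and a variance for the same input; the function name and the numbers are hypothetical.

```python
import numpy as np


def decompose_predictive_variance(member_means, member_variances):
    """Split total predictive variance via the law of total variance.

    member_means     : (M,) predicted means from M ensemble members or posterior draws
    member_variances : (M,) predicted noise variances from the same members
    """
    member_means = np.asarray(member_means, dtype=float)
    member_variances = np.asarray(member_variances, dtype=float)
    aleatoric = member_variances.mean()  # average predicted noise
    epistemic = member_means.var()       # disagreement between members
    return {"aleatoric": aleatoric, "epistemic": epistemic,
            "total": aleatoric + epistemic}


# Hypothetical outputs of a 5-member ensemble at a single input
parts = decompose_predictive_variance(
    member_means=[10.1, 9.8, 10.4, 9.9, 10.3],
    member_variances=[4.0, 4.2, 3.9, 4.1, 3.8],
)
print(parts)
```

With more training data the members tend to agree, so the epistemic part shrinks (like σ²/n), while the aleatoric part stays put (like σ²).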


Python Implementation

🐍 python
import numpy as np
from scipy import stats
from typing import Tuple

# ============================================
# Prediction Intervals for Normal Data
# ============================================

def prediction_interval_normal(
    data: np.ndarray,
    confidence: float = 0.95
) -> Tuple[float, float, float]:
    """
    Compute prediction interval for next observation from normal data.

    Parameters
    ----------
    data : array
        Observed data (assumed normal)
    confidence : float
        Confidence level (e.g., 0.95 for 95%)

    Returns
    -------
    lower, upper, width : Prediction interval bounds and width
    """
    n = len(data)
    x_bar = np.mean(data)
    s = np.std(data, ddof=1)

    # t-quantile
    alpha = 1 - confidence
    t_crit = stats.t.ppf(1 - alpha/2, df=n-1)

    # Standard error of prediction
    se_pred = s * np.sqrt(1 + 1/n)

    # Margin of error
    margin = t_crit * se_pred

    return x_bar - margin, x_bar + margin, 2 * margin


def confidence_interval_mean(
    data: np.ndarray,
    confidence: float = 0.95
) -> Tuple[float, float, float]:
    """
    Compute confidence interval for population mean.

    Parameters
    ----------
    data : array
        Observed data
    confidence : float
        Confidence level

    Returns
    -------
    lower, upper, width : CI bounds and width
    """
    n = len(data)
    x_bar = np.mean(data)
    s = np.std(data, ddof=1)

    alpha = 1 - confidence
    t_crit = stats.t.ppf(1 - alpha/2, df=n-1)

    # Standard error of mean
    se_mean = s / np.sqrt(n)

    margin = t_crit * se_mean

    return x_bar - margin, x_bar + margin, 2 * margin


def compare_intervals(data: np.ndarray, confidence: float = 0.95) -> dict:
    """Compare CI for mean vs. prediction interval."""
    ci_lower, ci_upper, ci_width = confidence_interval_mean(data, confidence)
    pi_lower, pi_upper, pi_width = prediction_interval_normal(data, confidence)

    return {
        'n': len(data),
        'mean': np.mean(data),
        'std': np.std(data, ddof=1),
        'confidence_interval': {
            'lower': ci_lower,
            'upper': ci_upper,
            'width': ci_width
        },
        'prediction_interval': {
            'lower': pi_lower,
            'upper': pi_upper,
            'width': pi_width
        },
        'width_ratio': pi_width / ci_width
    }


# ============================================
# Prediction Intervals for Regression
# ============================================

def regression_prediction_interval(
    X: np.ndarray,
    y: np.ndarray,
    x_new: np.ndarray,
    confidence: float = 0.95
) -> dict:
    """
    Compute prediction interval for simple linear regression.

    Parameters
    ----------
    X : array of shape (n,)
        Predictor values
    y : array of shape (n,)
        Response values
    x_new : array
        New x values for prediction
    confidence : float
        Confidence level

    Returns
    -------
    dict : Predictions, confidence intervals, and prediction intervals
    """
    n = len(X)
    x_bar = np.mean(X)
    ss_x = np.sum((X - x_bar)**2)

    # Fit regression
    beta_1 = np.sum((X - x_bar) * (y - np.mean(y))) / ss_x
    beta_0 = np.mean(y) - beta_1 * x_bar

    # Residuals and residual standard error
    y_hat = beta_0 + beta_1 * X
    residuals = y - y_hat
    s_e = np.sqrt(np.sum(residuals**2) / (n - 2))

    # t-quantile
    alpha = 1 - confidence
    t_crit = stats.t.ppf(1 - alpha/2, df=n-2)

    # Predictions
    y_new = beta_0 + beta_1 * x_new

    # Standard errors: mean response vs. individual new observation
    se_mean = s_e * np.sqrt(1/n + (x_new - x_bar)**2 / ss_x)
    se_pred = s_e * np.sqrt(1 + 1/n + (x_new - x_bar)**2 / ss_x)

    return {
        'x_new': x_new,
        'y_pred': y_new,
        'coefficients': {'intercept': beta_0, 'slope': beta_1},
        'residual_std_error': s_e,
        'confidence_interval': {
            'lower': y_new - t_crit * se_mean,
            'upper': y_new + t_crit * se_mean
        },
        'prediction_interval': {
            'lower': y_new - t_crit * se_pred,
            'upper': y_new + t_crit * se_pred
        }
    }


# ============================================
# Tolerance Interval
# ============================================

def tolerance_interval_normal(
    data: np.ndarray,
    coverage: float = 0.90,
    confidence: float = 0.95
) -> Tuple[float, float]:
    """
    Compute two-sided tolerance interval for normal data.

    A (confidence, coverage) tolerance interval contains at least
    'coverage' proportion of the population with 'confidence' confidence.

    Parameters
    ----------
    data : array
        Observed data (assumed normal)
    coverage : float
        Proportion of population to cover (e.g., 0.90)
    confidence : float
        Confidence level (e.g., 0.95)

    Returns
    -------
    lower, upper : Tolerance interval bounds
    """
    n = len(data)
    x_bar = np.mean(data)
    s = np.std(data, ddof=1)

    # Approximate k factor (two-sided, Howe's approximation);
    # exact values require special tables
    z_p = stats.norm.ppf((1 + coverage) / 2)

    # Lower-tail chi-square quantile: a small sample variance is the risky
    # case, so k must be inflated by dividing by the lower quantile
    chi2_val = stats.chi2.ppf(1 - confidence, df=n-1)

    k = z_p * np.sqrt((n - 1) * (1 + 1/n) / chi2_val)

    return x_bar - k * s, x_bar + k * s


# ============================================
# Conformal Prediction (Simple Version)
# ============================================

def conformal_prediction_interval(
    cal_predictions: np.ndarray,
    cal_true: np.ndarray,
    new_prediction: float,
    confidence: float = 0.90
) -> Tuple[float, float]:
    """
    Compute distribution-free prediction interval using conformal prediction.

    Parameters
    ----------
    cal_predictions : array
        Model predictions on calibration set
    cal_true : array
        True values on calibration set
    new_prediction : float
        Model prediction for new point
    confidence : float
        Desired coverage level

    Returns
    -------
    lower, upper : Prediction interval bounds
    """
    # Compute nonconformity scores (absolute residuals)
    scores = np.abs(cal_predictions - cal_true)

    # Finite-sample corrected quantile level for coverage
    n_cal = len(scores)
    q_level = np.ceil((n_cal + 1) * confidence) / n_cal
    q_level = min(q_level, 1.0)

    # Quantile of scores; method="higher" (NumPy >= 1.22) picks an actual
    # score instead of interpolating, which the coverage guarantee relies on
    q_hat = np.quantile(scores, q_level, method="higher")

    return new_prediction - q_hat, new_prediction + q_hat


# ============================================
# Demonstration
# ============================================

if __name__ == "__main__":
    np.random.seed(42)

    print("=" * 60)
    print("PREDICTION INTERVALS vs CONFIDENCE INTERVALS")
    print("=" * 60)

    # Generate normal data
    mu_true = 100
    sigma_true = 15
    n = 30
    data = np.random.normal(mu_true, sigma_true, n)

    # Compare intervals
    comparison = compare_intervals(data, confidence=0.95)

    print(f"\nSample: n={comparison['n']}, mean={comparison['mean']:.2f}, std={comparison['std']:.2f}")
    print(f"True parameters: μ={mu_true}, σ={sigma_true}")

    print("\n--- Confidence Interval for Mean ---")
    ci = comparison['confidence_interval']
    print(f"  [{ci['lower']:.2f}, {ci['upper']:.2f}]")
    print(f"  Width: {ci['width']:.2f}")

    print("\n--- Prediction Interval for Next Observation ---")
    pi = comparison['prediction_interval']
    print(f"  [{pi['lower']:.2f}, {pi['upper']:.2f}]")
    print(f"  Width: {pi['width']:.2f}")

    print(f"\nPrediction interval is {comparison['width_ratio']:.1f}x wider!")

    # Show convergence behavior
    print("\n--- Width Ratio as n Increases ---")
    for n_test in [10, 30, 100, 500, 1000]:
        data_test = np.random.normal(mu_true, sigma_true, n_test)
        comp = compare_intervals(data_test)
        print(f"  n={n_test:4d}: CI width={comp['confidence_interval']['width']:.2f}, "
              f"PI width={comp['prediction_interval']['width']:.2f}, "
              f"ratio={comp['width_ratio']:.2f}")

    print("\n--- Regression Prediction Interval ---")
    # Generate regression data
    X = np.linspace(0, 10, 50)
    y = 2 + 3 * X + np.random.normal(0, 2, 50)

    # Predict at new points
    x_new = np.array([2, 5, 8, 12])  # Note: 12 is extrapolation!
    result = regression_prediction_interval(X, y, x_new)

    print(f"Regression: y = {result['coefficients']['intercept']:.2f} + "
          f"{result['coefficients']['slope']:.2f}x")
    print(f"Residual SE: {result['residual_std_error']:.2f}")
    print("\nPredictions with intervals:")
    for i, x in enumerate(x_new):
        pi = result['prediction_interval']
        ci = result['confidence_interval']
        print(f"  x={x:2d}: ŷ={result['y_pred'][i]:.1f}, "
              f"CI=[{ci['lower'][i]:.1f}, {ci['upper'][i]:.1f}], "
              f"PI=[{pi['lower'][i]:.1f}, {pi['upper'][i]:.1f}]")

    print("\n--- Tolerance Interval ---")
    tol_lower, tol_upper = tolerance_interval_normal(data, coverage=0.90, confidence=0.95)
    print(f"(0.95, 0.90) Tolerance Interval: [{tol_lower:.2f}, {tol_upper:.2f}]")
    print("Interpretation: 95% confident that 90% of population is in this interval")

Common Pitfalls


Summary

Key Takeaways

  1. Prediction intervals quantify uncertainty about future observations, while confidence intervals quantify uncertainty about population parameters.
  2. Prediction intervals are always wider because they account for both estimation uncertainty (reducible) and inherent variability (irreducible).
  3. As n → ∞: Confidence intervals shrink to zero width, but prediction intervals converge to a minimum width of 2z*σ.
  4. In regression: Prediction intervals widen for extrapolation (predicting at X values far from the data center).
  5. Tolerance intervals answer a third question: what interval contains a specified proportion of the population with given confidence?
  6. Modern ML: Conformal prediction provides distribution-free prediction intervals with guaranteed coverage for any model.
Looking Ahead: In the next section, we'll explore Simultaneous Confidence Intervals: how to maintain correct coverage when making multiple inferences at once, addressing the multiple testing problem through Bonferroni and related corrections.