Chapter 18

Bayesian Credible Intervals

Bayesian Inference

Learning Objectives

By the end of this section, you will be able to:

📚 Core Knowledge

  • Define credible intervals and their probability interpretation
  • Distinguish between equal-tailed and HPD intervals
  • Explain why HPD intervals are always shortest
  • Contrast credible intervals with frequentist confidence intervals

🔧 Practical Skills

  • Compute credible intervals for common conjugate posteriors
  • Calculate intervals from MCMC samples
  • Choose between HPD and equal-tailed based on context
  • Communicate Bayesian uncertainty to stakeholders

🧠 Deep Learning Connections

  • Prediction Uncertainty: Credible intervals for neural network predictions via MC Dropout or Bayesian layers
  • Calibrated Uncertainty: Unlike raw softmax probabilities, which are often overconfident, Bayesian credible intervals can give better-calibrated uncertainty estimates when the model and prior are reasonable
  • Safe AI: Credible intervals enable "I don't know" responses when models are uncertain
  • Hyperparameter Uncertainty: Bayesian optimization uses credible intervals to balance exploration and exploitation
Where You'll Apply This: A/B testing (determining if a change is truly better), medical trials (efficacy bounds with honest probability statements), autonomous systems (safe decision-making under uncertainty), recommendation systems (Thompson Sampling), and any application where you need to communicate "how confident are we?"

The Big Picture: Honest Uncertainty

A credible interval is a Bayesian concept that directly answers the question most practitioners actually want to ask: "Given my data, what range of values is the parameter likely to take?"

Unlike frequentist confidence intervals, credible intervals allow us to make direct probability statements about parameters. We can legitimately say: "There is a 95% probability that the true parameter lies between 0.42 and 0.68." This is the interpretation that people intuitively (but incorrectly) apply to confidence intervals.

The Bayesian Promise

Given the observed data $D$ and our prior beliefs, a 95% credible interval $[a, b]$ satisfies:

$$P(a \leq \theta \leq b \mid D) = 0.95$$

The probability that the parameter lies in this interval is literally 95%.

Historical Motivation

The concept of quantifying uncertainty about parameters dates to Thomas Bayes and Pierre-Simon Laplace in the 18th century. However, for much of the 20th century, the frequentist paradigm dominated due to computational constraints. Modern computing power has revived Bayesian methods, and credible intervals are now standard in:

  • Pharmaceutical trials: FDA accepts Bayesian analyses for drug approval
  • Tech industry: Bayesian A/B testing at Google, Microsoft, Netflix
  • Machine learning: Bayesian neural networks, Gaussian processes, uncertainty quantification
  • Climate science: IPCC reports use Bayesian credible intervals for predictions

Mathematical Definition

A $100(1-\alpha)\%$ credible interval for a parameter $\theta$ is any interval $[a, b]$ such that:

$$\int_a^b \pi(\theta \mid D) \, d\theta = 1 - \alpha$$

The interval contains (1-α) of the posterior probability mass
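
As a numerical check of this definition, the sketch below builds several different intervals that each hold 95% of the posterior mass. The Beta(25, 9) posterior and the particular lower-tail probabilities are assumptions chosen only for illustration.

```python
from scipy import stats

posterior = stats.beta(25, 9)  # assumed posterior, for illustration only
level = 0.95

intervals = []
for lower_tail in [0.0, 0.01, 0.025, 0.04]:
    # Start the interval at a chosen lower-tail probability and extend it
    # until it holds `level` of the posterior mass.
    a = posterior.ppf(lower_tail)
    b = posterior.ppf(lower_tail + level)
    mass = posterior.cdf(b) - posterior.cdf(a)  # integral of the density over [a, b]
    intervals.append((a, b, mass))
    print(f"[{a:.4f}, {b:.4f}]  mass = {mass:.4f}  width = {b - a:.4f}")
```

Every interval printed satisfies the definition, yet their endpoints and widths differ, which is why a convention for picking one is needed.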

There are infinitely many intervals satisfying this condition. In practice, we use two main types:

Equal-Tailed Intervals

The equal-tailed interval excludes equal probability mass (α/2) from each tail of the posterior:

$$a = F^{-1}_{\theta \mid D}\left(\frac{\alpha}{2}\right), \quad b = F^{-1}_{\theta \mid D}\left(1 - \frac{\alpha}{2}\right)$$

where $F^{-1}_{\theta \mid D}$ is the inverse CDF (quantile function) of the posterior

When to use: Equal-tailed intervals are easy to compute (just quantiles), easy to interpret ("2.5% chance it's below a, 2.5% chance it's above b"), and common for reporting.

Highest Posterior Density (HPD) Intervals

For a unimodal posterior, the HPD interval is the shortest interval containing the specified probability mass. (For a multimodal posterior, the highest-density region can be a union of disjoint intervals.) It has a defining property:

HPD Property

$$\forall \theta_1 \in [a,b], \; \forall \theta_2 \notin [a,b]: \quad \pi(\theta_1 \mid D) \geq \pi(\theta_2 \mid D)$$

Every point inside the HPD interval has posterior density at least as high as every point outside

| Property | Equal-Tailed | HPD |
| --- | --- | --- |
| Tail probability | Equal (α/2 each) | Unequal (varies) |
| Width | Not necessarily shortest | Always shortest |
| Symmetry | Symmetric tails | Adapts to skewness |
| Computation | Simple (quantiles) | Optimization needed |
| For symmetric posteriors | Same as HPD | Same as equal-tailed |

Key Insight: For symmetric posteriors (like Normal), HPD and equal-tailed intervals are identical. For skewed posteriors (like Beta with α ≠ β), HPD is shorter and shifts toward the mode.
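
To make the HPD property concrete, here is a minimal sketch that approximates the HPD interval on a grid by keeping the highest-density points until they hold 95% of the mass, then compares it with the equal-tailed interval. The Beta(3, 10) posterior, the grid size, and the variable names are assumptions for illustration; the grid method is one simple approach, not the only way to compute an HPD interval.

```python
import numpy as np
from scipy import stats

posterior = stats.beta(3, 10)  # assumed right-skewed posterior
level = 0.95

# Equal-tailed interval: plain quantiles.
et = posterior.ppf([0.025, 0.975])

# Grid approximation of the HPD interval: rank grid points by density and
# keep the highest ones until they hold `level` of the probability mass.
grid = np.linspace(0.0, 1.0, 20001)
dx = grid[1] - grid[0]
density = posterior.pdf(grid)
order = np.argsort(density)[::-1]            # highest density first
cum_mass = np.cumsum(density[order]) * dx    # Riemann-sum approximation
n_keep = np.searchsorted(cum_mass, level) + 1
kept = np.sort(grid[order[:n_keep]])
hpd = np.array([kept[0], kept[-1]])          # contiguous because the posterior is unimodal

print(f"Equal-tailed: [{et[0]:.4f}, {et[1]:.4f}]  width = {et[1] - et[0]:.4f}")
print(f"HPD (grid):   [{hpd[0]:.4f}, {hpd[1]:.4f}]  width = {hpd[1] - hpd[0]:.4f}")
print(f"Density at HPD endpoints: {posterior.pdf(hpd[0]):.3f}, {posterior.pdf(hpd[1]):.3f}")
```

The two HPD endpoints sit at (approximately) the same density, which is the HPD property in action, while the equal-tailed endpoints sit at equal tail probabilities instead.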

Interactive: HPD vs Equal-Tailed

Explore how HPD and equal-tailed intervals differ for various posterior shapes. Try the presets or adjust the parameters manually to see when HPD provides significant width savings.

[Interactive figure: HPD vs Equal-Tailed Credible Intervals. The plot shows the posterior density over the parameter theta with the mode, the mean, the HPD interval (shortest), and the equal-tailed interval (2.5% excluded from each tail) marked. Readouts report each interval's endpoints and width, the width savings of HPD, and the posterior's skewness, mean, and standard deviation. In the captured state the posterior is right-skewed (skewness 0.64) with posterior mean 0.2308 and posterior standard deviation 0.1126.]


Confidence vs Credible Intervals

The distinction between frequentist confidence intervals and Bayesian credible intervals is philosophically profound yet often confused. Here's the core difference:

Frequentist Confidence Interval

The parameter $\theta$ is fixed (but unknown). The interval is random (varies with sample).

Interpretation: "If we repeated this experiment many times, 95% of the computed intervals would contain the true θ."

Cannot make probability statements about THIS specific interval containing θ.
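
The repeated-experiment interpretation can be simulated directly. The sketch below assumes a Normal model with known standard deviation and a true θ of 50; the values of `sigma`, `n`, and `n_experiments` are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_theta, sigma, n, n_experiments = 50.0, 10.0, 25, 5000  # assumed settings
z = stats.norm.ppf(0.975)

# Each row is one repetition of the experiment.
samples = rng.normal(true_theta, sigma, size=(n_experiments, n))
xbar = samples.mean(axis=1)
half_width = z * sigma / np.sqrt(n)

# Fraction of the computed 95% confidence intervals that contain the true theta.
covered = (xbar - half_width <= true_theta) & (true_theta <= xbar + half_width)
coverage = covered.mean()
print(f"Coverage over {n_experiments} experiments: {coverage:.3f}")
```

The 95% describes the long-run behavior of the procedure; any single interval in the simulation either contains θ or it does not.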

Bayesian Credible Interval

The parameter $\theta$ is random (has a distribution). The interval is fixed once computed.

Interpretation: "Given my data and prior, there is a 95% probability that θ lies in [a, b]."

CAN make direct probability statements about the parameter.

The Practical Reality: Most practitioners actually want the Bayesian interpretation. When someone says "95% confident that the effect is between 0.2 and 0.8," they usually mean it probabilistically (Bayesian), not procedurally (frequentist). Credible intervals give you the interpretation you likely already assume!

Interactive: The Two Paradigms

[Interactive figure: Confidence Intervals vs Credible Intervals, two fundamentally different interpretations of interval estimates. Frequentist panel: repeated confidence intervals around a fixed true θ = 50; in the captured state 24 of 25 intervals contain the true θ (coverage rate 96%). Bayesian panel: a posterior density with its posterior mean marked and a 95% credible interval of [44.2, 59.8].]

Key Philosophical Differences

| Aspect | Confidence Interval | Credible Interval |
| --- | --- | --- |
| Parameter Status | Fixed but unknown | Random variable |
| Interval Status | Random (varies by sample) | Fixed once computed |
| Probability Statement | About the procedure | About the parameter |
| Requires Prior? | No | Yes |
| Interpretation | "95% of CIs will contain θ" | "95% probability θ is here" |

The Practical Reality: With uninformative priors and large samples, credible intervals and confidence intervals often give nearly identical numerical results. The philosophical difference matters most when making decisions about specific intervals or when incorporating prior knowledge is important.
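
A small numerical comparison illustrates this. The sketch below compares a Wald confidence interval for a proportion with the equal-tailed credible interval under a flat Beta(1, 1) prior, for one large and one small sample. The function names and the data values are hypothetical.

```python
import numpy as np
from scipy import stats

def wald_interval(k, n, level=0.95):
    """Frequentist Wald confidence interval for a proportion."""
    z = stats.norm.ppf(0.5 + level / 2)
    p_hat = k / n
    half = z * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def flat_prior_credible_interval(k, n, level=0.95):
    """Equal-tailed credible interval under a flat Beta(1, 1) prior."""
    tail = (1 - level) / 2
    posterior = stats.beta(1 + k, 1 + n - k)
    return posterior.ppf(tail), posterior.ppf(1 - tail)

def max_endpoint_gap(k, n):
    """Largest absolute difference between matching endpoints."""
    ci = wald_interval(k, n)
    cred = flat_prior_credible_interval(k, n)
    return max(abs(ci[0] - cred[0]), abs(ci[1] - cred[1]))

gap_large = max_endpoint_gap(420, 1000)  # large sample: intervals nearly coincide
gap_small = max_endpoint_gap(2, 10)      # small sample: intervals differ noticeably
print(f"Endpoint gap, n=1000: {gap_large:.4f}")
print(f"Endpoint gap, n=10:   {gap_small:.4f}")
```

With the small sample the Wald interval can even extend below 0, while the credible interval stays inside [0, 1] by construction.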


Computing Credible Intervals

Step-by-Step Workflow

  1. Specify the prior: Choose a prior distribution reflecting your beliefs before seeing data. For minimal influence, use an uninformative prior.
  2. Collect data: Observe the outcomes of your experiment or study.
  3. Compute the posterior: Use Bayes' theorem to update your beliefs. For conjugate priors, this is closed-form; otherwise, use MCMC.
  4. Extract the interval: Compute equal-tailed quantiles or search for the HPD interval from the posterior distribution or MCMC samples.
  5. Interpret and communicate: State the result as a direct probability statement about the parameter.
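
The steps above can be sketched end to end. To exercise the "otherwise, use MCMC" branch, this example deliberately ignores conjugacy and draws posterior samples with a minimal random-walk Metropolis sampler, which is a simple stand-in for a real MCMC library and not a production implementation. The prior, the data, the step size, and the iteration counts are all assumptions; the exact Beta(25, 9) posterior is used only as a cross-check.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Steps 1-2: prior Beta(2, 2) and observed data (assumed values).
prior_a, prior_b = 2, 2
successes, failures = 23, 7

def log_posterior(theta):
    """Unnormalized log posterior = log prior + log likelihood."""
    if not 0.0 < theta < 1.0:
        return -np.inf
    log_prior = (prior_a - 1) * np.log(theta) + (prior_b - 1) * np.log1p(-theta)
    log_lik = successes * np.log(theta) + failures * np.log1p(-theta)
    return log_prior + log_lik

# Step 3: random-walk Metropolis (hypothetical tuning values).
n_iter, burn_in, step = 20000, 2000, 0.1
theta = 0.5
current_lp = log_posterior(theta)
chain = np.empty(n_iter)
for i in range(n_iter):
    proposal = theta + step * rng.normal()
    proposal_lp = log_posterior(proposal)
    # Accept with probability min(1, posterior ratio).
    if np.log(rng.uniform()) < proposal_lp - current_lp:
        theta, current_lp = proposal, proposal_lp
    chain[i] = theta
samples = chain[burn_in:]

# Step 4: equal-tailed interval from the samples, with an exact cross-check.
mcmc_interval = np.percentile(samples, [2.5, 97.5])
exact_interval = stats.beta(prior_a + successes, prior_b + failures).ppf([0.025, 0.975])

# Step 5: report as a direct probability statement.
print(f"Given the data and prior, there is a 95% probability that theta lies in "
      f"[{mcmc_interval[0]:.3f}, {mcmc_interval[1]:.3f}]")
print(f"Exact conjugate answer: [{exact_interval[0]:.3f}, {exact_interval[1]:.3f}]")
```

In practice the sampled interval carries Monte Carlo error, so it should be close to, but not identical with, the exact conjugate result.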

Interactive: Bayesian Workflow

Walk through the complete workflow from prior to credible interval. Use the step navigation to see how each component contributes to the final result.

[Interactive figure: Step-by-Step: Computing Credible Intervals. Follows the Bayesian workflow from prior to posterior to credible interval. Step 1 (Choose a Prior) plots the prior density over the parameter θ; a higher α pushes the prior toward 1 and a higher β pushes it toward 0.]

What Determines Interval Width?

The width of a credible interval (and thus the precision of our inference) depends on three main factors:

Sample Size (n)

Width scales as $\sim 1/\sqrt{n}$. Doubling the sample size shrinks the width by a factor of about $1/\sqrt{2} \approx 0.71$, a reduction of roughly 30%.

Prior Strength

Strong informative priors can narrow intervals if they match the data, or widen/shift them if they conflict.

Credible Level

A higher credible level requires a wider interval. For the same posterior and the same interval type, a 99% credible interval is wider than a 95% one.
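
These effects can be checked numerically. The sketch below assumes a Beta(2, 2) prior and an observed success rate fixed at 35%; the function name `credible_width` and the sample sizes are hypothetical choices for illustration.

```python
from scipy import stats

def credible_width(successes, failures, level=0.95, prior=(2, 2)):
    """Width of the equal-tailed credible interval for a Beta posterior."""
    tail = (1 - level) / 2
    posterior = stats.beta(prior[0] + successes, prior[1] + failures)
    return posterior.ppf(1 - tail) - posterior.ppf(tail)

# Effect of sample size: hold the observed proportion near 0.35 and double n.
sample_sizes = [25, 50, 100, 200, 400, 800]
widths = []
for n in sample_sizes:
    k = int(round(0.35 * n))
    widths.append(credible_width(k, n - k))

# Ratio of consecutive widths; it should approach 1/sqrt(2) as n grows.
ratios = [w2 / w1 for w1, w2 in zip(widths, widths[1:])]
for n, w in zip(sample_sizes, widths):
    print(f"n = {n:4d}  width = {w:.4f}")
print("Width ratios after each doubling:", [f"{r:.3f}" for r in ratios])

# Effect of credible level at a fixed sample (35 successes, 65 failures).
level_widths = {level: credible_width(35, 65, level=level) for level in (0.90, 0.95, 0.99)}
for level, w in level_widths.items():
    print(f"level = {level:.2f}  width = {w:.4f}")
```

The ratio is not exactly $1/\sqrt{2}$ at small n because the prior contributes pseudo-observations; its influence fades as the data grow.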

Interactive: Width Dynamics

[Interactive figure: What Affects Credible Interval Width? Explores how sample size, prior strength, and credible level determine interval precision. Captured settings: true theta 0.35, n = 30, prior Beta(2, 2), level 95%. Panels: interval width vs sample size (n), and posterior distributions at n = 5, 20, 50, 100, and 200. Readouts: posterior mean 0.3824 (vs true theta 0.35) and effective n of 34 = data (30) + prior (4).]

Key Insight: Width ~ 1/sqrt(n)
Doubling sample size reduces interval width by approximately 30%. This is the fundamental sqrt(n) convergence rate of statistical inference.


Real-World Applications


Deep Learning Applications

Credible intervals are increasingly important in modern deep learning for uncertainty quantification. Here are key applications:

🎲 MC Dropout Uncertainty

Run inference multiple times with dropout enabled. The spread of the resulting predictions approximates a posterior predictive distribution, and a 95% interval over those predictions gives an approximate measure of epistemic uncertainty: "what the model doesn't know."
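
The mechanics can be sketched in plain NumPy. This toy example uses a hypothetical two-layer network with fixed, untrained random weights, so the numbers are meaningless; it only shows how repeated stochastic forward passes turn into an interval. The names `forward_with_dropout`, `W1`, `W2`, the dropout rate, and the input are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny regression network with fixed (untrained) weights.
W1 = rng.normal(size=(1, 64))
b1 = np.zeros(64)
W2 = rng.normal(size=(64, 1)) / 8.0
b2 = np.zeros(1)

def forward_with_dropout(x, drop_prob=0.2):
    """One stochastic forward pass with dropout kept active at inference."""
    hidden = np.maximum(0.0, x @ W1 + b1)             # ReLU layer
    mask = rng.random(hidden.shape) >= drop_prob       # fresh random mask per pass
    hidden = hidden * mask / (1.0 - drop_prob)         # inverted-dropout scaling
    return hidden @ W2 + b2

x_new = np.array([[0.7]])  # assumed test input

# Many stochastic passes give a sample of predictions for the same input.
draws = np.array([forward_with_dropout(x_new)[0, 0] for _ in range(1000)])
lower, upper = np.percentile(draws, [2.5, 97.5])
print(f"Mean prediction: {draws.mean():.3f}")
print(f"Approximate 95% interval: [{lower:.3f}, {upper:.3f}]")
```

In a deep learning framework the same idea amounts to leaving the dropout layers in training mode during prediction and collecting the outputs.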

🔄 Bayesian Neural Networks

Maintain full posteriors over weights instead of point estimates. Predictions integrate over weight uncertainty, naturally producing credible intervals. Crucial for safety-critical applications like medical diagnosis and autonomous vehicles.

🎯 Bayesian Optimization

Hyperparameter tuning with Gaussian processes produces credible intervals for the objective function at each point. The acquisition function balances exploitation (go where mean is good) and exploration (go where uncertainty is high).
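
Here is a minimal sketch of that trade-off, assuming a hand-rolled Gaussian process with an RBF kernel and an upper-confidence-bound (UCB) acquisition rule. The observed scores, the length scale, the jitter, and the exploration weight of 2.0 are hypothetical; real Bayesian optimization libraries offer other kernels and acquisition functions.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit prior variance."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

# Hypothetical hyperparameter values tried so far and their validation scores.
x_obs = np.array([0.1, 0.4, 0.9])
y_obs = np.array([0.62, 0.80, 0.55])
x_grid = np.linspace(0.0, 1.0, 101)

# GP posterior on the grid (data centered so the prior mean is the average score).
y_mean = y_obs.mean()
K = rbf_kernel(x_obs, x_obs) + 1e-6 * np.eye(len(x_obs))  # small jitter for stability
K_star = rbf_kernel(x_grid, x_obs)
mean = y_mean + K_star @ np.linalg.solve(K, y_obs - y_mean)
var = 1.0 - np.sum(K_star * np.linalg.solve(K, K_star.T).T, axis=1)
std = np.sqrt(np.clip(var, 0.0, None))

# Pointwise 95% credible band: the GP posterior at each point is Gaussian.
lower, upper = mean - 1.96 * std, mean + 1.96 * std

# UCB acquisition: a high mean (exploitation) or a high std (exploration) both raise it.
ucb = mean + 2.0 * std
x_next = x_grid[np.argmax(ucb)]
print(f"Next hyperparameter value to try: {x_next:.2f}")
```

The credible band collapses near the points already evaluated and widens between them, which is what pulls the acquisition function toward unexplored regions.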

🛡️ Uncertainty-Aware Predictions

Credible intervals enable "I don't know" responses. When the 95% credible interval is too wide, the system can defer to human judgment instead of making overconfident wrong predictions.
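
A deferral rule of this kind could look like the sketch below. The function name `predict_or_defer`, the width threshold, and the simulated posterior draws are hypothetical; the right threshold depends on the application.

```python
import numpy as np

def predict_or_defer(posterior_draws, max_width, level=0.95):
    """Return a prediction only if the credible interval is narrow enough."""
    tail = 100 * (1 - level) / 2
    lower, upper = np.percentile(posterior_draws, [tail, 100 - tail])
    if upper - lower > max_width:
        # Too uncertain: hand the case to a human instead of guessing.
        return {"decision": "defer", "interval": (lower, upper)}
    return {
        "decision": "predict",
        "estimate": float(np.mean(posterior_draws)),
        "interval": (lower, upper),
    }

rng = np.random.default_rng(0)
confident = predict_or_defer(rng.normal(0.90, 0.01, size=2000), max_width=0.2)
uncertain = predict_or_defer(rng.normal(0.50, 0.20, size=2000), max_width=0.2)
print(confident["decision"], uncertain["decision"])
```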


Python Implementation

Here's an implementation of credible interval computation for both closed-form posteriors and Monte Carlo samples. The annotations that precede the code explain its key parts.

Credible Interval Computation
🐍 credible_intervals.py

Equal-Tailed CI Function

Equal-tailed intervals exclude equal probability mass from each tail of the posterior. For a 95% CI, we exclude 2.5% from each tail using the inverse CDF (quantile function).

Example: stats.beta(5, 3).ppf(0.025) returns the 2.5th percentile.

HPD Interval Function

HPD finds the shortest interval containing the desired probability mass. For unimodal distributions, we search over all possible lower tail probabilities to minimize width.

Width as Optimization Target

We define width as a function of where we start the interval. The scipy optimizer finds the lower tail probability that minimizes total interval width.

Conjugate Prior Choice

Beta(2, 2) is a weakly informative prior: it suggests the parameter is likely not near 0 or 1, but doesn't strongly influence the posterior. It acts like pseudo-counts of 2 prior successes and 2 prior failures.

Posterior Conjugacy

With a Beta prior and a Binomial likelihood, the posterior is also Beta. The hyperparameters simply add: posterior_alpha = prior_alpha + successes and posterior_beta = prior_beta + failures. This is the Beta-Binomial conjugate pair.

Normal-Normal Conjugacy

When the prior and likelihood are both Normal (with known data variance), the posterior is Normal. The posterior precision (inverse variance) is the sum of the prior precision and the data precision.

Posterior Mean Formula

The posterior mean is a precision-weighted average of the prior mean and sample mean. With more data, the sample mean dominates; with a strong prior, the prior mean has more influence.

Monte Carlo Credible Intervals

When we have MCMC samples from the posterior (common in complex models), we compute intervals directly from the empirical distribution of samples. No closed-form posterior needed!

Empirical Quantiles

For equal-tailed intervals from samples, np.percentile gives the empirical quantiles. With enough samples, this converges to the true posterior quantiles.

HPD from Samples

For HPD from samples: sort the samples, then check all contiguous blocks of size n_included = ceil(credible_level * n_samples). The block with minimum width is the HPD interval.

Mixture Posterior Example

Mixture posteriors arise in many Bayesian models (e.g., mixture models, multimodal posteriors). Monte Carlo methods handle these naturally since we just need samples, not closed-form expressions.

import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

# ============================================
# CORE: Credible Interval Calculations
# ============================================

def equal_tailed_ci(posterior, credible_level=0.95):
    """
    Compute equal-tailed credible interval.
    Excludes alpha/2 probability from each tail.
    """
    alpha = 1 - credible_level
    lower = posterior.ppf(alpha / 2)
    upper = posterior.ppf(1 - alpha / 2)
    return lower, upper

def hpd_interval(posterior, credible_level=0.95):
    """
    Compute Highest Posterior Density (HPD) interval.
    Finds the shortest interval containing credible_level probability.
    Assumes a unimodal posterior (a frozen scipy.stats distribution).
    """
    alpha = 1 - credible_level

    # For unimodal distributions, search over lower tail probabilities
    def interval_width(lower_tail_prob):
        if lower_tail_prob >= alpha:
            return np.inf
        lower = posterior.ppf(lower_tail_prob)
        upper = posterior.ppf(lower_tail_prob + credible_level)
        return upper - lower

    # Find optimal lower tail probability
    result = minimize_scalar(
        interval_width,
        bounds=(0, alpha),
        method='bounded'
    )

    optimal_lower_tail = result.x
    lower = posterior.ppf(optimal_lower_tail)
    upper = posterior.ppf(optimal_lower_tail + credible_level)

    return lower, upper

# ============================================
# Example 1: Beta-Binomial (Proportions)
# ============================================

# Prior: Beta(2, 2) - slight preference for middle values
prior_alpha, prior_beta = 2, 2

# Observed data: 23 successes, 7 failures
successes, failures = 23, 7

# Posterior: Beta(prior_alpha + successes, prior_beta + failures)
post_alpha = prior_alpha + successes
post_beta = prior_beta + failures
posterior = stats.beta(post_alpha, post_beta)

print("=== Beta-Binomial Example ===")
print(f"Prior: Beta({prior_alpha}, {prior_beta})")
print(f"Data: {successes} successes, {failures} failures")
print(f"Posterior: Beta({post_alpha}, {post_beta})")
print(f"Posterior mean: {posterior.mean():.4f}")
print(f"Posterior std: {posterior.std():.4f}")

# Compute both interval types
et_lower, et_upper = equal_tailed_ci(posterior, 0.95)
hpd_lower, hpd_upper = hpd_interval(posterior, 0.95)

print(f"\n95% Equal-Tailed CI: [{et_lower:.4f}, {et_upper:.4f}]")
print(f"  Width: {et_upper - et_lower:.4f}")
print(f"95% HPD Interval: [{hpd_lower:.4f}, {hpd_upper:.4f}]")
print(f"  Width: {hpd_upper - hpd_lower:.4f}")
print(f"HPD is {(et_upper-et_lower)-(hpd_upper-hpd_lower):.4f} shorter")

# ============================================
# Example 2: Normal-Normal (Mean Inference)
# ============================================

# Prior: mu ~ N(mu0, tau0^2)
mu0, tau0 = 0, 10  # Weakly informative prior

# Data: n observations with sample mean xbar and known variance
n_obs = 25
xbar = 3.2
sigma = 2.0  # Known population std

# Posterior precision and mean
tau0_sq = tau0**2
sigma_sq = sigma**2
posterior_precision = 1/tau0_sq + n_obs/sigma_sq
posterior_variance = 1/posterior_precision
posterior_mean = posterior_variance * (mu0/tau0_sq + n_obs*xbar/sigma_sq)
posterior_std = np.sqrt(posterior_variance)

posterior_normal = stats.norm(posterior_mean, posterior_std)

print("\n=== Normal-Normal Example ===")
print(f"Prior: N({mu0}, {tau0}^2)")
print(f"Data: n={n_obs}, xbar={xbar}, sigma={sigma}")
print(f"Posterior: N({posterior_mean:.4f}, {posterior_std:.4f}^2)")

et_lower_n, et_upper_n = equal_tailed_ci(posterior_normal, 0.95)
print(f"95% Credible Interval: [{et_lower_n:.4f}, {et_upper_n:.4f}]")

# ============================================
# Example 3: Monte Carlo for Complex Posteriors
# ============================================

def mc_credible_interval(samples, credible_level=0.95):
    """
    Compute credible intervals from MCMC samples.
    Works for any posterior, even non-standard ones.
    The 'hpd' entry is the shortest single interval; for a multimodal
    posterior the true highest-density region may be several intervals.
    """
    alpha = 1 - credible_level

    # Equal-tailed from quantiles
    et_lower = np.percentile(samples, 100 * alpha / 2)
    et_upper = np.percentile(samples, 100 * (1 - alpha / 2))

    # HPD: find shortest interval containing credible_level of samples
    sorted_samples = np.sort(samples)
    n_samples = len(sorted_samples)
    n_included = int(np.ceil(credible_level * n_samples))

    # Width of every contiguous block of n_included sorted samples
    widths = sorted_samples[n_included - 1:] - sorted_samples[:n_samples - n_included + 1]
    best = np.argmin(widths)
    hpd_lower = sorted_samples[best]
    hpd_upper = sorted_samples[best + n_included - 1]

    return {
        'equal_tailed': (et_lower, et_upper),
        'hpd': (hpd_lower, hpd_upper)
    }

# Simulate from a mixture posterior (no closed form)
np.random.seed(42)
component1 = np.random.normal(2, 0.5, 7000)
component2 = np.random.normal(5, 1.0, 3000)
mixture_samples = np.concatenate([component1, component2])
np.random.shuffle(mixture_samples)

intervals = mc_credible_interval(mixture_samples, 0.95)
print("\n=== Monte Carlo for Mixture Posterior ===")
print(f"95% Equal-Tailed: [{intervals['equal_tailed'][0]:.4f}, {intervals['equal_tailed'][1]:.4f}]")
print(f"95% HPD: [{intervals['hpd'][0]:.4f}, {intervals['hpd'][1]:.4f}]")

Common Pitfalls and Misconceptions

"Credible intervals and confidence intervals are the same thing"

They have fundamentally different interpretations. Credible intervals make probability statements about the parameter; confidence intervals make statements about the procedure. With uninformative priors and large samples they may be numerically similar, but the interpretation always differs.

"HPD intervals are always better than equal-tailed"

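HPD is never wider, yes, but equal-tailed has a simpler interpretation ("2.5% in each tail") and is easier to compute. Equal-tailed intervals are also preserved under monotone transformations of the parameter (for example, switching from a probability to log-odds), while HPD intervals are not. For symmetric unimodal posteriors the two are identical. For skewed posteriors, choose based on what you're communicating; sometimes the equal-tail interpretation is what stakeholders need.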

"The prior doesn't matter for credible intervals"

The prior always matters - it's part of the Bayesian model. With lots of data, the prior influence diminishes (Bernstein-von Mises theorem), but with small samples the prior can substantially affect both the center and width of the credible interval. Always report your prior.
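
To see how much the prior can matter, the sketch below computes 95% equal-tailed intervals under three priors for a small and a large dataset with the same observed success rate. The priors and data values are hypothetical choices for illustration.

```python
from scipy import stats

# Hypothetical priors, from flat to strongly informative.
priors = {
    "flat Beta(1, 1)": (1, 1),
    "weak Beta(2, 2)": (2, 2),
    "strong Beta(20, 5)": (20, 5),
}
# Same 60% success rate, very different sample sizes: (successes, failures).
datasets = {"small (3 of 5)": (3, 2), "large (300 of 500)": (300, 200)}

results = {}
for data_name, (s, f) in datasets.items():
    for prior_name, (a, b) in priors.items():
        posterior = stats.beta(a + s, b + f)  # conjugate Beta-Binomial update
        lower, upper = posterior.ppf([0.025, 0.975])
        results[(data_name, prior_name)] = (lower, upper)
        print(f"{data_name:20s} {prior_name:20s} [{lower:.3f}, {upper:.3f}]")

# How far apart the lower endpoints are across priors, per dataset.
lower_spread = {
    data_name: max(results[(data_name, p)][0] for p in priors)
    - min(results[(data_name, p)][0] for p in priors)
    for data_name in datasets
}
print(lower_spread)
```

With five observations the strong prior moves the interval substantially; with five hundred, the three intervals nearly agree.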

"95% credible means 95% coverage in repeated experiments"

Credible intervals make probability statements conditional on the data you observed. They don't guarantee 95% frequentist coverage. A well-calibrated Bayesian model often has good coverage properties, but this isn't guaranteed and isn't the point.
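
The gap between credibility and coverage can be computed exactly for a binomial model by summing over every possible outcome. In this sketch the true θ of 0.8, the sample size of 20, and the two priors are assumptions chosen to contrast a flat prior with a strong prior centered far from the truth; `exact_coverage` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

def exact_coverage(true_theta, n, prior_a, prior_b, level=0.95):
    """Frequentist coverage of the equal-tailed credible interval at true_theta."""
    tail = (1 - level) / 2
    k = np.arange(n + 1)  # every possible number of successes
    lower = stats.beta.ppf(tail, prior_a + k, prior_b + n - k)
    upper = stats.beta.ppf(1 - tail, prior_a + k, prior_b + n - k)
    covers = (lower <= true_theta) & (true_theta <= upper)
    # Weight each outcome by how likely it is under the true parameter.
    return float(np.sum(stats.binom.pmf(k, n, true_theta) * covers))

coverage_flat = exact_coverage(0.8, 20, 1, 1)       # flat Beta(1, 1) prior
coverage_strong = exact_coverage(0.8, 20, 20, 20)   # strong prior centered at 0.5
print(f"Coverage with flat prior:   {coverage_flat:.3f}")
print(f"Coverage with strong prior: {coverage_strong:.3f}")
```

Each interval still holds 95% posterior probability under its own prior; what changes is how often the procedure captures a fixed true value.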


Knowledge Check

Test your understanding of Bayesian credible intervals with this interactive quiz.


What is the fundamental difference between how frequentist and Bayesian frameworks treat the parameter θ?


Summary

Key Takeaways

  1. Credible intervals give the interpretation people want: "95% probability the parameter lies in this range" - a direct statement about where the parameter is, not about long-run procedure performance.
  2. Two main types exist: Equal-tailed intervals exclude α/2 from each tail (simple, symmetric interpretation). HPD intervals are shortest (optimal, adapt to skewness). For symmetric posteriors, they're identical.
  3. Width depends on sample size, prior, and credible level: More data narrows intervals (~1/sqrt(n)). Stronger priors can narrow or shift intervals. Higher credible levels require wider intervals.
  4. MCMC samples enable credible intervals for any posterior: For complex models without closed-form posteriors, compute intervals directly from MCMC samples using empirical quantiles or sorted-block methods.
  5. Deep learning needs credible intervals: Uncertainty quantification via MC Dropout, Bayesian NNs, and Gaussian processes all produce posterior distributions. Credible intervals translate these into actionable uncertainty statements for safe AI.
Looking Ahead: In the next section, we'll explore Bayes Factors and Model Comparison - how to use Bayesian methods not just for parameter estimation, but for deciding between competing models of the world.