Learning Objectives
By the end of this section, you will be able to:
📚 Core Knowledge
- Define credible intervals and explain their Bayesian interpretation
- Distinguish between equal-tailed and HPD credible intervals
- Explain the philosophical difference from frequentist confidence intervals
- Understand when credible and confidence intervals coincide numerically
🔧 Practical Skills
- Compute credible intervals from posterior distributions
- Choose between equal-tailed and HPD intervals appropriately
- Apply credible intervals in Bayesian neural networks and uncertainty quantification
- Implement credible intervals using Python and probabilistic programming
Where You'll Apply This: Uncertainty quantification in deep learning, Bayesian optimization for hyperparameter tuning, Thompson sampling for multi-armed bandits, A/B testing with prior information, probabilistic forecasting, and any situation where you want to make direct probability statements about unknown parameters.
The Big Picture: A Different Philosophy
In the previous sections, we explored frequentist confidence intervals—a procedure that, when repeated many times, captures the true parameter in a specified percentage of cases. But notice the careful language: we never said the parameter is in the interval with 95% probability.
Bayesian credible intervals offer something different: a direct probability statement about the parameter itself. When we say "there is a 95% probability that θ lies in [0.3, 0.7]," we mean exactly that—given our prior beliefs and the observed data, we believe θ is in this interval with 95% probability.
The Key Philosophical Shift
Frequentist view (confidence interval):
- θ is fixed but unknown
- The interval is random (varies with each sample)
- Probability describes the procedure, not the parameter
Bayesian view (credible interval):
- θ is a random variable with a probability distribution
- The interval is fixed once computed from data
- Probability describes our belief about θ
This isn't just philosophical nit-picking—it has real practical implications. Credible intervals let us answer the question practitioners actually want to ask: "Given what I've observed, where do I think the true value is?"
Historical Context
Thomas Bayes (1763) & Pierre-Simon Laplace (1774)
Bayes' theorem, published posthumously in 1763, laid the foundation for treating unknown quantities as random variables with probability distributions. Laplace extended these ideas and was the first to compute what we now call credible intervals. However, the frequentist school dominated 20th-century statistics. The Bayesian revival came in the 1990s with computational methods (MCMC) that made posterior computation practical. Today, credible intervals are standard in machine learning and scientific computing.
What Is a Credible Interval?
Formal Definition
A credible interval (also called a posterior interval or Bayesian confidence interval) is an interval in the parameter space that contains a specified probability mass of the posterior distribution.
Definition: Credible Interval
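An interval $[L, U]$ such that the posterior probability of θ lying within this interval equals $1-\alpha$:
$$P(L \le \theta \le U \mid \text{data}) = \int_L^U \pi(\theta \mid \text{data})\, d\theta = 1-\alpha$$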
Symbol Table
| Symbol | Meaning | Intuition |
|---|---|---|
| θ | Unknown parameter | What we're trying to estimate |
| π(θ \| data) | Posterior distribution | Our updated beliefs after seeing data |
| L, U | Lower and upper bounds | The interval endpoints |
| 1-α | Credible level (e.g., 0.95) | How much posterior mass is captured |
| α | Total excluded probability | Mass in the tails (e.g., 0.05) |
The Bayesian Interpretation
The Bayesian interpretation is remarkably intuitive and matches what non-statisticians often incorrectly assume about confidence intervals:
"Given the prior and observed data, there is a 95% probability that the true parameter θ lies between L and U."
This direct probability statement is possible because Bayesians treat θ as a random variable. The posterior distribution represents our complete state of knowledge about θ after incorporating both prior beliefs and observed data.
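The direct probability statement can be checked numerically. The sketch below estimates the posterior probability that θ lies in a given interval by counting posterior draws that fall inside it. The Beta(16, 8) posterior and the interval [0.5, 0.8] are illustrative assumptions, not values taken from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior for a proportion: Beta(16, 8).
posterior_draws = rng.beta(16, 8, size=200_000)

def posterior_mass(draws, lower, upper):
    """Monte Carlo estimate of P(lower <= theta <= upper | data)."""
    inside = (draws >= lower) & (draws <= upper)
    return float(np.mean(inside))

mass = posterior_mass(posterior_draws, 0.5, 0.8)
print(f"P(0.5 <= theta <= 0.8 | data) ~= {mass:.3f}")
```

An interval is a 95% credible interval exactly when this posterior mass equals 0.95.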
Interactive: Confidence vs Credible Intervals
This visualization contrasts the two approaches side-by-side. On the left, see how frequentist confidence intervals vary from sample to sample (some miss the true parameter). On the right, see how a single credible interval captures posterior probability mass. Click each panel for detailed interpretation.
[Interactive widget: Confidence Intervals vs Credible Intervals, two fundamentally different interpretations of interval estimates. Left panel (Frequentist: Confidence Interval): 24 of 25 intervals contain the true θ and 1 misses, a coverage rate of 96%. Right panel (Bayesian: Credible Interval): a single 95% credible interval of [44.2, 59.8], width 15.68.]
Key Philosophical Differences
| Aspect | Confidence Interval | Credible Interval |
|---|---|---|
| Parameter Status | Fixed but unknown | Random variable |
| Interval Status | Random (varies by sample) | Fixed once computed |
| Probability Statement | About the procedure | About the parameter |
| Requires Prior? | No | Yes |
| Interpretation | "95% of CIs will contain θ" | "95% probability θ is here" |
The Practical Reality: With uninformative priors and large samples, credible intervals and confidence intervals often give nearly identical numerical results. The philosophical difference matters most when making decisions about specific intervals or when incorporating prior knowledge is important.
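A quick numerical comparison illustrates this agreement. The sketch below assumes hypothetical data (300 successes in 1000 trials), a uniform Beta(1, 1) prior, and the simple Wald confidence interval for a proportion. The credible interval is estimated from posterior draws rather than computed exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

n, k = 1000, 300  # hypothetical data: 300 successes in 1000 trials
p_hat = k / n

# Frequentist 95% Wald confidence interval for a proportion.
half_width = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n)
wald_ci = (p_hat - half_width, p_hat + half_width)

# Bayesian 95% equal-tailed credible interval under a uniform Beta(1, 1) prior.
# The posterior is Beta(1 + k, 1 + n - k); its quantiles are estimated from draws.
draws = rng.beta(1 + k, 1 + n - k, size=400_000)
lower, upper = np.quantile(draws, [0.025, 0.975])
credible = (float(lower), float(upper))

print(f"Wald CI:           [{wald_ci[0]:.4f}, {wald_ci[1]:.4f}]")
print(f"Credible interval: [{credible[0]:.4f}, {credible[1]:.4f}]")
```

With a small sample or an informative prior, the two intervals would be expected to differ more noticeably.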
Computing Credible Intervals
Once we have the posterior distribution, there are multiple ways to construct a credible interval with the same probability content. The two most common are equal-tailed intervals and Highest Posterior Density (HPD) intervals.
Equal-Tailed Intervals
The simplest approach is to exclude equal probability from each tail of the posterior. For a 95% credible interval, we exclude 2.5% from the lower tail and 2.5% from the upper tail.
Equal-Tailed Interval Formula
$$[L, U] = \left[F^{-1}(\alpha/2),\ F^{-1}(1-\alpha/2)\right]$$
where $F^{-1}$ is the inverse CDF (quantile function) of the posterior.
- Advantage: Simple to compute—just find the α/2 and 1-α/2 quantiles
- Advantage: Invariant under monotone transformations (the interval for log(θ) is obtained by taking the log of the endpoints of the interval for θ)
- Disadvantage: May not be the shortest possible interval for skewed posteriors
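When the posterior is available only as samples (for example from MCMC), the quantile function can be approximated by empirical quantiles. The following is a minimal sketch under that assumption; the helper name `equal_tailed_interval` and the Beta(16, 8) posterior are illustrative choices.

```python
import numpy as np

def equal_tailed_interval(draws, level=0.95):
    """Empirical alpha/2 and 1 - alpha/2 quantiles of posterior draws."""
    alpha = 1.0 - level
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)

rng = np.random.default_rng(2)
draws = rng.beta(16, 8, size=200_000)  # hypothetical Beta(16, 8) posterior

lower, upper = equal_tailed_interval(draws)

# Each tail should hold roughly alpha / 2 = 2.5% of the posterior mass.
lower_tail = float(np.mean(draws < lower))
upper_tail = float(np.mean(draws > upper))
print(f"95% equal-tailed interval: [{lower:.3f}, {upper:.3f}]")
print(f"tail masses: {lower_tail:.4f}, {upper_tail:.4f}")
```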
Highest Posterior Density (HPD) Intervals
The HPD interval (also called the Highest Density Interval or HDI) is, for a unimodal posterior, the shortest interval containing the specified probability mass. Every point inside the HPD has posterior density at least as high as every point outside it.
HPD Interval Definition
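$$C = \{\theta : \pi(\theta \mid \text{data}) \ge k\}$$
where k is chosen such that $P(\theta \in C \mid \text{data}) = 1-\alpha$.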
- Advantage: Shortest possible interval—maximizes precision
- Advantage: Contains the mode (most likely value)
- Disadvantage: Not transformation-invariant
- Disadvantage: More complex to compute
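One common way to approximate an HPD interval from posterior samples is to sort the draws and slide a window covering the required fraction of them, keeping the narrowest window. The sketch below assumes a unimodal posterior; the helper name `hpd_interval` and the right-skewed Beta(2, 10) posterior are illustrative assumptions.

```python
import numpy as np

def hpd_interval(draws, level=0.95):
    """Shortest interval containing `level` of the draws (assumes unimodality)."""
    sorted_draws = np.sort(np.asarray(draws))
    n = sorted_draws.size
    n_inside = int(np.ceil(level * n))  # number of draws the interval must cover
    # Width of every window of n_inside consecutive sorted draws.
    widths = sorted_draws[n_inside - 1:] - sorted_draws[: n - n_inside + 1]
    start = int(np.argmin(widths))
    return float(sorted_draws[start]), float(sorted_draws[start + n_inside - 1])

rng = np.random.default_rng(3)
draws = rng.beta(2, 10, size=100_000)  # hypothetical right-skewed posterior

hpd_lower, hpd_upper = hpd_interval(draws)
et_lower, et_upper = np.quantile(draws, [0.025, 0.975])

print(f"HPD:          [{hpd_lower:.4f}, {hpd_upper:.4f}]")
print(f"Equal-tailed: [{et_lower:.4f}, {et_upper:.4f}]")
```

For this skewed example the HPD window sits closer to zero than the equal-tailed interval, which is what makes it shorter.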
Interactive: Step-by-Step Computation
Follow the complete Bayesian workflow: specify a prior, observe data, compute the posterior, and extract a credible interval. This step-by-step approach helps build intuition for how each component contributes to the final interval.
[Interactive widget: Step-by-Step: Computing Credible Intervals]
Step 1: Choose a Prior
Start with your prior beliefs about the parameter before seeing data. In the widget, the prior has two shape parameters, α and β (this α is a prior parameter, distinct from the tail probability α used above): a higher α pushes the prior toward 1, and a higher β pushes it toward 0.
Interactive: HPD vs Equal-Tailed Explorer
Explore how HPD and equal-tailed intervals compare for different posterior shapes. Use the presets to try symmetric, right-skewed, and left-skewed distributions. Notice how the intervals converge for symmetric cases but diverge dramatically for skewed posteriors.
[Interactive widget: Credible Interval Explorer: Equal-Tailed vs HPD. Compares two types of Bayesian credible intervals across posterior shape presets. Example readout for a 95% interval:]
| Interval | Lower | Upper | Width |
|---|---|---|---|
| Equal-tailed (2.5% probability in each tail) | 0.0010 | 0.4628 | 0.4618 |
| HPD (shortest interval containing 95% probability) | 0.0010 | 0.4328 | 0.4318 |
Width comparison: HPD is 0.0301 narrower.
Key Insight: For symmetric posteriors, equal-tailed and HPD intervals are identical. For skewed posteriors, HPD gives a shorter interval because it captures the high-density region. Try the "Right Skewed" or "Left Skewed" presets to see the difference!
Rule of thumb: HPD intervals are preferred when you want the shortest interval containing the specified probability, but equal-tailed intervals are simpler to compute and communicate.
Interactive: How the Posterior Forms
Understanding credible intervals requires understanding where the posterior comes from. This visualization shows how the posterior is proportional to the likelihood times the prior, and how the balance shifts as you add more data or strengthen the prior.
[Interactive widget: Posterior = Likelihood × Prior. Shows visually how the posterior combines information from the prior and the data, with prior and data presets. Example readout:]
π(θ|data) ∝ L(data|θ) × π(θ)
Posterior is proportional to Likelihood times Prior
| Quantity | Value | Source |
|---|---|---|
| Prior mean | 0.500 | Beta(2, 2) |
| MLE (data) | 0.700 | 14/20 |
| Posterior mean | 0.667 | Beta(16, 8) |
| Data influence | 83% | vs 17% prior |
Key Insight: The posterior is a compromise between the prior and the likelihood. The posterior mean (0.667) lies between the prior mean (0.500) and the MLE (0.700). With more data, the posterior shifts toward the MLE; with stronger priors, it stays closer to the prior mean.
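The numbers in this example can be reproduced with the conjugate Beta-Binomial update, in which successes are added to the first Beta parameter and failures to the second. The sketch below uses the Beta(2, 2) prior and the 14 successes in 20 trials from the example; the function name is an illustrative choice.

```python
def beta_binomial_update(prior_a, prior_b, successes, trials):
    """Conjugate update: Beta prior + binomial data -> Beta posterior."""
    return prior_a + successes, prior_b + (trials - successes)

prior_a, prior_b = 2, 2        # Beta(2, 2) prior
successes, trials = 14, 20     # observed data

post_a, post_b = beta_binomial_update(prior_a, prior_b, successes, trials)

prior_mean = prior_a / (prior_a + prior_b)
mle = successes / trials
posterior_mean = post_a / (post_a + post_b)

# The posterior mean is a weighted average of the MLE and the prior mean;
# the weight on the data grows with the number of trials.
data_weight = trials / (prior_a + prior_b + trials)
blended = data_weight * mle + (1 - data_weight) * prior_mean

print(f"Posterior: Beta({post_a}, {post_b}), mean {posterior_mean:.3f}")
print(f"Data weight: {data_weight:.0%}, prior weight: {1 - data_weight:.0%}")
```

The weighted-average form makes the compromise explicit: adding trials raises `data_weight`, while larger prior parameters lower it.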
When to Use Each Approach
Use Equal-Tailed Intervals When
- Posterior is approximately symmetric
- You need transformation-invariance
- Simplicity is valued over minimal width
- Computing HPD is impractical (high dimensions)
- Communicating to audiences unfamiliar with HPD
Use HPD Intervals When
- Posterior is highly skewed
- You want the narrowest possible interval
- Parameter bounds are important (e.g., probabilities, variances)
- Using MCMC software that computes HPD automatically
- Decision-making where precision matters
| Aspect | Equal-Tailed | HPD |
|---|---|---|
| Width | May be wider for skewed posteriors | Always shortest |
| Contains Mode | Not guaranteed | Always (for unimodal) |
| Computation | Simple (two quantiles) | Requires optimization |
| Transformation | Invariant | Not invariant |
| Software Support | Universal | Most Bayesian packages |
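The transformation row of the table can be checked empirically. In the sketch below, intervals are computed for log(θ) and mapped back with exp, then compared with intervals computed directly for θ. The Gamma(2, 1) posterior and both helper functions are illustrative assumptions.

```python
import numpy as np

def equal_tailed(draws, level=0.95):
    alpha = 1 - level
    return np.quantile(draws, [alpha / 2, 1 - alpha / 2])

def hpd(draws, level=0.95):
    """Narrowest window covering `level` of the sorted draws (unimodal case)."""
    s = np.sort(draws)
    k = int(np.ceil(level * s.size))
    widths = s[k - 1:] - s[: s.size - k + 1]
    i = int(np.argmin(widths))
    return np.array([s[i], s[i + k - 1]])

rng = np.random.default_rng(5)
theta = rng.gamma(shape=2.0, scale=1.0, size=200_000)  # hypothetical skewed posterior
log_theta = np.log(theta)

et_direct, et_roundtrip = equal_tailed(theta), np.exp(equal_tailed(log_theta))
hpd_direct, hpd_roundtrip = hpd(theta), np.exp(hpd(log_theta))

print("Equal-tailed, direct vs via log:", et_direct.round(3), et_roundtrip.round(3))
print("HPD, direct vs via log:         ", hpd_direct.round(3), hpd_roundtrip.round(3))
```

The equal-tailed endpoints agree up to sampling and interpolation error, while the HPD interval found on the log scale maps back to a different interval on the original scale.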
AI/ML Applications
Credible intervals are increasingly important in modern machine learning, where quantifying uncertainty is crucial for trustworthy AI systems.
Uncertainty Quantification in Deep Learning
🧠 Bayesian Neural Networks
Instead of learning point estimates for weights, Bayesian neural networks maintain posterior distributions over weights. Predictions come with credible intervals that reflect uncertainty.
Why This Matters: In safety-critical applications like medical diagnosis or autonomous driving, a model that reports "class A, with a 95% credible interval of [0.6, 0.9] for the class probability" is far more useful than one that simply says "class A." When the credible interval is wide, the system can defer to human judgment.
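The deferral idea can be sketched with a toy model. Below, a two-feature logistic classifier stands in for a Bayesian neural network, and its weight posterior is represented by hypothetical Gaussian draws; in practice such draws might come from MCMC, variational inference, or a similar approximation. The inputs, the width threshold, and all function names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical posterior draws over the two weights of a logistic classifier.
weight_draws = rng.normal(loc=[1.5, -0.5], scale=[0.3, 0.3], size=(5000, 2))

def predictive_interval(x, weight_draws, level=0.95):
    """Mean and equal-tailed credible interval for P(class A | x)."""
    logits = weight_draws @ x
    probs = 1.0 / (1.0 + np.exp(-logits))
    alpha = 1 - level
    lower, upper = np.quantile(probs, [alpha / 2, 1 - alpha / 2])
    return float(probs.mean()), float(lower), float(upper)

def should_defer(lower, upper, max_width=0.3):
    """Defer to a human when the credible interval is wider than max_width."""
    return (upper - lower) > max_width

x_confident = np.array([2.0, 0.5])   # weight uncertainty barely changes the prediction
x_uncertain = np.array([1.0, 3.0])   # weight uncertainty changes the prediction a lot

for name, x in [("confident", x_confident), ("uncertain", x_uncertain)]:
    mean, lower, upper = predictive_interval(x, weight_draws)
    print(f"{name}: p = {mean:.2f}, 95% CI [{lower:.2f}, {upper:.2f}], "
          f"defer = {should_defer(lower, upper)}")
```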
Bayesian Optimization
🎯 Hyperparameter Tuning with Credible Intervals
Gaussian Process models in Bayesian optimization provide posterior distributions over the objective function. Credible intervals guide the exploration-exploitation trade-off.
Acquisition Functions: Expected Improvement (EI), Upper Confidence Bound (UCB), and other acquisition functions use the posterior mean and credible interval width to decide where to sample next. Wide credible intervals indicate regions worth exploring.
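As a small illustration of how interval width enters an acquisition function, the sketch below computes an Upper Confidence Bound score from a posterior mean and standard deviation. Under a Gaussian posterior, mean + 1.96 × std is the upper end of the 95% credible interval. The candidate points and posterior values are made-up numbers, not output from a fitted Gaussian Process, and the objective is assumed to be maximized.

```python
import numpy as np

def upper_confidence_bound(mean, std, kappa=1.96):
    """UCB score: posterior mean plus kappa posterior standard deviations."""
    return mean + kappa * std

# Hypothetical GP posterior at four candidate hyperparameter values.
candidates = np.array([0.1, 0.4, 0.7, 0.9])
post_mean = np.array([0.50, 0.62, 0.60, 0.40])
post_std = np.array([0.02, 0.03, 0.10, 0.05])

scores = upper_confidence_bound(post_mean, post_std)
next_x = float(candidates[np.argmax(scores)])
best_mean_x = float(candidates[np.argmax(post_mean)])

print(f"Highest posterior mean at x = {best_mean_x}")
print(f"UCB selects x = {next_x}")
```

Here the candidate with the widest interval wins even though another has a slightly higher mean; a smaller `kappa` would favor exploitation instead.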
Thompson Sampling and Multi-Armed Bandits
🎰 Thompson Sampling
Thompson sampling is a Bayesian bandit algorithm that maintains posterior distributions over arm rewards. At each step, it samples from each arm's posterior and pulls the arm with the highest sample.
Connection to Credible Intervals: Arms with wider credible intervals are more likely to produce extreme samples, encouraging exploration. As data accumulates, credible intervals shrink, naturally shifting toward exploitation.
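A minimal sketch of Thompson sampling for Bernoulli rewards follows, assuming uniform Beta(1, 1) priors on each arm. The three true success rates are hypothetical and are used only to simulate rewards; the algorithm itself never reads them directly.

```python
import numpy as np

rng = np.random.default_rng(7)

true_rates = np.array([0.3, 0.5, 0.7])  # hypothetical, used only to simulate rewards
n_arms = true_rates.size

# Beta(1, 1) prior on each arm, stored as success and failure counts.
successes = np.ones(n_arms)
failures = np.ones(n_arms)
pulls = np.zeros(n_arms, dtype=int)

for _ in range(2000):
    # Draw one sample from each arm's Beta posterior and pull the best-looking arm.
    samples = rng.beta(successes, failures)
    arm = int(np.argmax(samples))
    reward = int(rng.random() < true_rates[arm])
    # Conjugate update of the pulled arm's posterior.
    successes[arm] += reward
    failures[arm] += 1 - reward
    pulls[arm] += 1

posterior_means = successes / (successes + failures)
print("pulls per arm:  ", pulls)
print("posterior means:", posterior_means.round(3))
```

Arms that are rarely pulled keep wide posteriors, so they still win an occasional draw; the best arm's posterior narrows as it accumulates pulls.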
Python Implementation
Credible intervals for Beta posteriors can be computed directly in Python, and probabilistic programming libraries such as PyMC produce the posterior samples from which the same intervals are extracted.
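As a minimal sketch of such a computation, assuming SciPy is available, the functions below compute exact equal-tailed and HPD intervals for a Beta posterior. The HPD search minimizes interval width over the lower-tail probability and assumes a unimodal posterior (both parameters greater than 1). The function names and the Beta(16, 8) example are illustrative choices, not the original listing.

```python
from scipy import stats
from scipy.optimize import minimize_scalar

def beta_equal_tailed(a, b, level=0.95):
    """Exact equal-tailed interval from the Beta quantile function."""
    alpha = 1 - level
    dist = stats.beta(a, b)
    return float(dist.ppf(alpha / 2)), float(dist.ppf(1 - alpha / 2))

def beta_hpd(a, b, level=0.95):
    """Shortest interval with mass `level`; assumes a > 1 and b > 1 (unimodal)."""
    alpha = 1 - level
    dist = stats.beta(a, b)

    def width(lower_tail):
        # Interval that leaves `lower_tail` mass below it and has mass `level`.
        upper_prob = min(lower_tail + level, 1.0)
        return dist.ppf(upper_prob) - dist.ppf(lower_tail)

    result = minimize_scalar(width, bounds=(0.0, alpha), method="bounded")
    lower_tail = result.x
    return float(dist.ppf(lower_tail)), float(dist.ppf(min(lower_tail + level, 1.0)))

# Example: Beta(16, 8) posterior, e.g. a Beta(2, 2) prior with 14 successes in 20 trials.
et = beta_equal_tailed(16, 8)
hpd = beta_hpd(16, 8)
print(f"Equal-tailed: [{et[0]:.4f}, {et[1]:.4f}]")
print(f"HPD:          [{hpd[0]:.4f}, {hpd[1]:.4f}]")
```

In a PyMC workflow the posterior is usually available as samples rather than in closed form, and the draws are typically summarized with ArviZ (for example `az.hdi` for highest density intervals); that step is not shown here.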
Knowledge Check
Test your understanding of credible intervals with this comprehensive quiz. Pay attention to the philosophical differences between Bayesian and frequentist interpretations.
Question 1 of 8: What is the fundamental difference between how frequentist and Bayesian frameworks treat the parameter θ?
Summary
Key Takeaways
- Credible intervals allow direct probability statements: Unlike confidence intervals, we can say "there is a 95% probability θ is in this interval."
- Equal-tailed vs HPD: Equal-tailed intervals exclude equal probability from each tail; HPD intervals are the shortest possible. They coincide for symmetric posteriors.
- Priors matter: Credible intervals depend on both the prior and the data. With uninformative priors and large samples, they approximate frequentist CIs.
- Bayesian interpretation: The parameter θ is treated as random with a posterior distribution; the interval is fixed once computed.
- ML applications: Uncertainty quantification in neural networks, Bayesian optimization, Thompson sampling, and any application requiring probabilistic reasoning about unknowns.
The Big Picture: Credible intervals complete the Bayesian inference workflow. Start with a prior encoding initial beliefs, update with data via Bayes' theorem to get the posterior, then extract credible intervals to summarize uncertainty. This framework provides a coherent, probability-based approach to quantifying what we know and don't know about the world—essential for building trustworthy AI systems.