Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

📚 Core Knowledge

• Define credible intervals and explain their Bayesian interpretation
• Distinguish between equal-tailed and HPD credible intervals
• Explain the philosophical difference from frequentist confidence intervals
• Understand when credible and confidence intervals coincide numerically

🔧 Practical Skills

• Compute credible intervals from posterior distributions
• Choose between equal-tailed and HPD intervals appropriately
• Apply credible intervals in Bayesian neural networks and uncertainty quantification
• Implement credible intervals using Python and probabilistic programming

Where You'll Apply This: Uncertainty quantification in deep learning, Bayesian optimization for hyperparameter tuning, Thompson sampling for multi-armed bandits, A/B testing with prior information, probabilistic forecasting, and any situation where you want to make direct probability statements about unknown parameters.

The Big Picture: A Different Philosophy

In the previous sections, we explored frequentist confidence intervals—a procedure that, when repeated many times, captures the true parameter in a specified percentage of cases. But notice the careful language: we never said the parameter is in the interval with 95% probability.

Bayesian credible intervals offer something different: a direct probability statement about the parameter itself. When we say "there is a 95% probability that θ lies in [0.3, 0.7]," we mean exactly that—given our prior beliefs and the observed data, we believe θ is in this interval with 95% probability.

The Key Philosophical Shift

Frequentist View

θ is fixed but unknown

The interval is random (varies with each sample)

Probability describes the procedure, not the parameter

Bayesian View

θ is a random variable with a probability distribution

The interval is fixed once computed from data

Probability describes our belief about θ

This isn't just philosophical nit-picking—it has real practical implications. Credible intervals let us answer the question practitioners actually want to ask: "Given what I've observed, where do I think the true value is?"

Historical Context

📜

Thomas Bayes (1763) & Pierre-Simon Laplace (1774)

Bayes' theorem, published posthumously in 1763, laid the foundation for treating unknown quantities as random variables with probability distributions. Laplace extended these ideas and was the first to compute what we now call credible intervals. However, the frequentist school dominated 20th-century statistics. The Bayesian revival came in the 1990s with computational methods (MCMC) that made posterior computation practical. Today, credible intervals are standard in machine learning and scientific computing.

What Is a Credible Interval?

Formal Definition

A credible interval (also called a posterior interval or Bayesian confidence interval) is an interval in the parameter space that contains a specified probability mass of the posterior distribution.

Definition: $(1-\alpha)$ Credible Interval

$$P(L \leq \theta \leq U \mid \text{data}) = 1 - \alpha$$

An interval $[L, U]$ such that the posterior probability of θ lying within this interval equals $1-\alpha$.

Symbol Table

Symbol	Meaning	Intuition
θ	Unknown parameter	What we're trying to estimate
π(θ|data)	Posterior distribution	Our updated beliefs after seeing data
L, U	Lower and upper bounds	The interval endpoints
1-α	Credible level (e.g., 0.95)	How much posterior mass is captured
α	Total excluded probability	Mass in the tails (e.g., 0.05)

The Bayesian Interpretation

The Bayesian interpretation is remarkably intuitive and matches what non-statisticians often incorrectly assume about confidence intervals:

"Given the prior and observed data, there is a 95% probability that the true parameter θ lies between L and U."

This direct probability statement is possible because Bayesians treat θ as a random variable. The posterior distribution $\pi(\theta|\text{data})$ represents our complete state of knowledge about θ after incorporating both prior beliefs and observed data.

Why the Interpretation Works: In Bayesian statistics, probability represents degree of belief, not long-run frequency. Since θ has a probability distribution (the posterior), we can directly compute the probability that θ falls in any interval.
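As a minimal sketch of this direct probability statement, the snippet below reads P(L ≤ θ ≤ U | data) straight off a posterior CDF. The Beta(16, 8) posterior and the helper name `posterior_probability` are illustrative assumptions, chosen only to make the idea concrete.

```python
from scipy import stats

# Assumed example posterior: Beta(16, 8), e.g. a Beta(2, 2) prior
# updated with 14 successes and 6 failures.
posterior = stats.beta(16, 8)


def posterior_probability(lower, upper, dist=posterior):
    """Return P(lower <= theta <= upper | data) under the given posterior."""
    return dist.cdf(upper) - dist.cdf(lower)


# Direct probability statements about theta itself
p_inside = posterior_probability(0.5, 0.8)
p_majority = posterior_probability(0.5, 1.0)

print(f"P(0.5 <= theta <= 0.8 | data) = {p_inside:.3f}")
print(f"P(theta >= 0.5 | data)        = {p_majority:.3f}")
```

Any interval can be queried this way, which is exactly what a frequentist confidence interval does not allow.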

Interactive: Confidence vs Credible Intervals

This visualization contrasts the two approaches side-by-side. On the left, see how frequentist confidence intervals vary from sample to sample (some miss the true parameter). On the right, see how a single credible interval captures posterior probability mass. Click each panel for detailed interpretation.

Confidence Intervals vs Credible Intervals

Two fundamentally different interpretations of interval estimates

Confidence/Credible Level: 95%

Number of Intervals: 25

Frequentist: Confidence Interval

Coverage Rate: 96%

24 contain true θ, 1 miss

Bayesian: Credible Interval

95% Credible Interval: [44.2, 59.8]

Width: 15.68

Key Philosophical Differences

Aspect	Confidence Interval	Credible Interval
Parameter Status	Fixed but unknown	Random variable
Interval Status	Random (varies by sample)	Fixed once computed
Probability Statement	About the procedure	About the parameter
Requires Prior?	No	Yes
Interpretation	"95% of CIs will contain θ"	"95% probability θ is here"

The Practical Reality: With uninformative priors and large samples, credible intervals and confidence intervals often give nearly identical numerical results. The philosophical difference matters most when making decisions about specific intervals or when incorporating prior knowledge is important.
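To illustrate the numerical agreement, the sketch below compares a normal-approximation (Wald) confidence interval with an equal-tailed credible interval under a uniform Beta(1, 1) prior. The choice of the Wald interval, the flat prior, and the sample sizes are assumptions made for this example; other frequentist intervals would behave similarly but not identically.

```python
import numpy as np
from scipy import stats


def wald_confidence_interval(successes, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z = stats.norm.ppf(0.5 + level / 2)
    half_width = z * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def flat_prior_credible_interval(successes, n, level=0.95):
    """Equal-tailed credible interval under a uniform Beta(1, 1) prior."""
    tail = (1 - level) / 2
    posterior = stats.beta(1 + successes, 1 + n - successes)
    return posterior.ppf(tail), posterior.ppf(1 - tail)


def endpoint_gap(successes, n):
    """Largest absolute difference between matching interval endpoints."""
    ci = wald_confidence_interval(successes, n)
    cri = flat_prior_credible_interval(successes, n)
    return max(abs(ci[0] - cri[0]), abs(ci[1] - cri[1]))


# Same observed proportion (30%) at increasing sample sizes
for successes, n in [(6, 20), (60, 200), (600, 2000)]:
    print(f"n={n:5d}  largest endpoint gap = {endpoint_gap(successes, n):.4f}")
```

The gap between the two intervals shrinks as n grows, even though their interpretations remain different.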

Computing Credible Intervals

Once we have the posterior distribution, there are multiple ways to construct a credible interval with the same probability content. The two most common are equal-tailed intervals and Highest Posterior Density (HPD) intervals.

Equal-Tailed Intervals

The simplest approach is to exclude equal probability from each tail of the posterior. For a 95% credible interval, we exclude 2.5% from the lower tail and 2.5% from the upper tail.

Equal-Tailed Interval Formula

$$[L, U] = \left[F^{-1}_{\theta|\text{data}}(\alpha/2), \; F^{-1}_{\theta|\text{data}}(1-\alpha/2)\right]$$

where $F^{-1}$ is the inverse CDF (quantile function) of the posterior.

Advantage: Simple to compute—just find the α/2 and 1-α/2 quantiles
Advantage: Invariant under monotone transformations (the interval for log(θ) is simply [log L, log U])
Disadvantage: May not be the shortest possible interval for skewed posteriors
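When the posterior is only available as samples (for example from MCMC), the equal-tailed interval is just a pair of sample quantiles. The sketch below assumes Gamma-distributed draws as a stand-in for posterior samples and uses a hypothetical helper `equal_tailed_interval`; it also checks the invariance property on the log scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for posterior draws of a positive, right-skewed parameter (assumed)
samples = rng.gamma(shape=2.0, scale=1.0, size=50_000)


def equal_tailed_interval(draws, cred_level=0.95):
    """Equal-tailed credible interval from posterior samples."""
    tail = (1 - cred_level) / 2
    lower, upper = np.quantile(draws, [tail, 1 - tail])
    return float(lower), float(upper)


lower, upper = equal_tailed_interval(samples)
log_lower, log_upper = equal_tailed_interval(np.log(samples))

print(f"Interval for theta:      [{lower:.3f}, {upper:.3f}]")
print(f"Interval for log(theta): [{log_lower:.3f}, {log_upper:.3f}]")
print(f"log of theta interval:   [{np.log(lower):.3f}, {np.log(upper):.3f}]")
```

The last two lines agree up to interpolation error in the sample quantiles, because quantiles map through any monotone transformation.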

Highest Posterior Density (HPD) Intervals

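The HPD interval (also called the Highest Density Interval or HDI) is the smallest region containing the specified probability mass; for a unimodal posterior it is the shortest such interval. Every point inside the HPD has posterior density at least as high as every point outside it. For a multimodal posterior, the HPD region can be a union of disjoint intervals.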

HPD Interval Definition

$$\text{HPD} = \{\theta : \pi(\theta|\text{data}) \geq k\}$$

where k is chosen such that $P(\theta \in \text{HPD}) = 1 - \alpha$.

Advantage: Shortest possible interval—maximizes precision
Advantage: Contains the mode (most likely value)
Disadvantage: Not transformation-invariant
Disadvantage: More complex to compute

When Do They Differ? For symmetric posteriors (like Normal), equal-tailed and HPD intervals are identical. They diverge for skewed posteriors—try the interactive explorer below with different Beta parameters to see this!
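One common way to approximate an HPD interval from posterior samples is to slide a window over the sorted draws and keep the narrowest window that contains the required fraction of them. The sketch below assumes a unimodal posterior and uses Beta(2, 10) draws as illustrative data; the function name `hpd_from_samples` is hypothetical.

```python
import numpy as np


def hpd_from_samples(draws, cred_level=0.95):
    """Approximate HPD interval: narrowest window of sorted draws
    that contains a cred_level fraction of them (unimodal case only)."""
    sorted_draws = np.sort(np.asarray(draws, dtype=float))
    n = sorted_draws.size
    window = int(np.ceil(cred_level * n))  # number of draws inside the interval
    if window < 2 or window > n:
        raise ValueError("need more draws for this credible level")
    # Width of every candidate window [i, i + window - 1]
    widths = sorted_draws[window - 1:] - sorted_draws[: n - window + 1]
    start = int(np.argmin(widths))
    return float(sorted_draws[start]), float(sorted_draws[start + window - 1])


rng = np.random.default_rng(0)
draws = rng.beta(2, 10, size=100_000)  # right-skewed stand-in posterior

hpd_lower, hpd_upper = hpd_from_samples(draws)
et_lower, et_upper = (float(q) for q in np.quantile(draws, [0.025, 0.975]))

print(f"Equal-tailed: [{et_lower:.4f}, {et_upper:.4f}]  width={et_upper - et_lower:.4f}")
print(f"HPD:          [{hpd_lower:.4f}, {hpd_upper:.4f}]  width={hpd_upper - hpd_lower:.4f}")
```

For this right-skewed example the HPD interval shifts toward the mode and comes out narrower than the equal-tailed interval.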

Interactive: Step-by-Step Computation

Follow the complete Bayesian workflow: specify a prior, observe data, compute the posterior, and extract a credible interval. This step-by-step approach helps build intuition for how each component contributes to the final interval.

Step-by-Step: Computing Credible Intervals

Follow the Bayesian workflow from prior to posterior to credible interval

Step 1: Choose a Prior

Start with your prior beliefs about the parameter before seeing data.

Prior α (pseudo-successes): 2

Higher α pushes prior toward 1

Prior β (pseudo-failures): 2

Higher β pushes prior toward 0

Interactive: HPD vs Equal-Tailed Explorer

Explore how HPD and equal-tailed intervals compare for different posterior shapes. Use the presets to try symmetric, right-skewed, and left-skewed distributions. Notice how the intervals converge for symmetric cases but diverge dramatically for skewed posteriors.

Credible Interval Explorer: Equal-Tailed vs HPD

Compare two types of Bayesian credible intervals and see when they differ

Posterior Shape Presets

Posterior α: 8.0

Posterior β: 20.0

Credible Level: 95%

Equal-Tailed Interval: lower bound, upper bound, and width, with equal probability (2.5%) in each tail.

HPD (Highest Posterior Density): lower bound, upper bound, and width of the shortest interval containing 95% probability.

Width Comparison: how much narrower the HPD interval is than the equal-tailed interval.

Key Insight: For symmetric posteriors, equal-tailed and HPD intervals are identical. For skewed posteriors, HPD gives a shorter interval because it captures the high-density region. Try the "Right Skewed" or "Left Skewed" presets to see the difference!

Rule of thumb: HPD intervals are preferred when you want the shortest interval containing the specified probability, but equal-tailed intervals are simpler to compute and communicate.
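For a Beta posterior with α, β > 1, both intervals can be computed from the quantile function alone. The sketch below finds the HPD by minimizing the interval width over how the excluded probability is split between the tails; using `scipy.optimize.minimize_scalar` for this is one reasonable choice among several, and the helper name `beta_intervals` is an assumption of this example.

```python
from scipy import optimize, stats


def beta_intervals(a, b, cred_level=0.95):
    """Equal-tailed and HPD intervals for a unimodal Beta(a, b) posterior."""
    dist = stats.beta(a, b)
    excluded = 1 - cred_level
    equal_tailed = (dist.ppf(excluded / 2), dist.ppf(1 - excluded / 2))

    def width(lower_tail):
        # Interval that leaves lower_tail probability below it
        return dist.ppf(lower_tail + cred_level) - dist.ppf(lower_tail)

    result = optimize.minimize_scalar(width, bounds=(0.0, excluded), method="bounded")
    hpd = (dist.ppf(result.x), dist.ppf(result.x + cred_level))
    return equal_tailed, hpd


for a, b in [(10, 10), (8, 20)]:
    et, hpd = beta_intervals(a, b)
    print(f"Beta({a}, {b})")
    print(f"  equal-tailed: [{et[0]:.4f}, {et[1]:.4f}]  width={et[1] - et[0]:.4f}")
    print(f"  HPD:          [{hpd[0]:.4f}, {hpd[1]:.4f}]  width={hpd[1] - hpd[0]:.4f}")
```

The symmetric Beta(10, 10) case gives essentially the same interval both ways, while the skewed Beta(8, 20) case gives a slightly shorter HPD.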

Interactive: How the Posterior Forms

Understanding credible intervals requires understanding where the posterior comes from. This visualization shows how the posterior is proportional to the likelihood times the prior, and how the balance shifts as you add more data or strengthen the prior.

Posterior = Likelihood × Prior

See visually how the posterior combines information from the prior and the data

Prior Presets

Data Presets

Prior α: 2

Prior β: 2

Successes: 14

Failures: 6

π(θ|data) ∝ L(data|θ) × π(θ)

Posterior is proportional to Likelihood times Prior

Prior Mean

0.500

Beta(2, 2)

MLE (Data)

0.700

14/20

Posterior Mean

0.667

Beta(16, 8)

Data Influence

83%

vs 17% prior

Key Insight: The posterior is a compromise between the prior and the likelihood. The posterior mean (0.667) lies between the prior mean (0.500) and the MLE (0.700). With more data, the posterior shifts toward the MLE; with stronger priors, it stays closer to the prior mean.
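The relationship "posterior ∝ likelihood × prior" can be checked numerically on a grid. The sketch below reuses the numbers from the example above (a Beta(2, 2) prior with 14 successes and 6 failures); the grid size and variable names are assumptions of this illustration.

```python
import numpy as np
from scipy import stats

prior_a, prior_b = 2, 2
successes, failures = 14, 6
n = successes + failures

# Midpoint grid over (0, 1)
n_grid = 2000
step = 1.0 / n_grid
theta = (np.arange(n_grid) + 0.5) * step

prior = stats.beta.pdf(theta, prior_a, prior_b)
likelihood = theta**successes * (1 - theta) ** failures

# Posterior is proportional to likelihood times prior; normalize on the grid
unnormalized = likelihood * prior
posterior_grid = unnormalized / (unnormalized.sum() * step)

# Conjugate result for comparison: Beta(prior_a + successes, prior_b + failures)
posterior_exact = stats.beta.pdf(theta, prior_a + successes, prior_b + failures)

grid_mean = float((theta * posterior_grid).sum() * step)

# The posterior mean is a weighted blend of the MLE and the prior mean
data_weight = n / (n + prior_a + prior_b)
blended_mean = data_weight * (successes / n) + (1 - data_weight) * prior_a / (prior_a + prior_b)

print(f"Max density difference (grid vs exact): {np.max(np.abs(posterior_grid - posterior_exact)):.2e}")
print(f"Posterior mean (grid):  {grid_mean:.3f}")
print(f"Data weight: {data_weight:.0%}, blended mean: {blended_mean:.3f}")
```

The data weight n / (n + α + β) is where the 83% versus 17% split comes from: 20 observations against 4 prior pseudo-counts.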

When to Use Each Approach

Use Equal-Tailed Intervals When

• Posterior is approximately symmetric
• You need transformation-invariance
• Simplicity is valued over minimal width
• Computing HPD is impractical (high dimensions)
• Communicating to audiences unfamiliar with HPD

Use HPD Intervals When

• Posterior is highly skewed
• You want the narrowest possible interval
• The posterior piles up against a parameter bound (e.g., a probability near 0 or 1, or a variance near 0)
• Using MCMC software that computes HPD automatically
• Decision-making where precision matters

Aspect	Equal-Tailed	HPD
Width	May be wider for skewed posteriors	Always shortest
Contains Mode	Not guaranteed	Always (for unimodal)
Computation	Simple (two quantiles)	Requires optimization
Transformation	Invariant	Not invariant
Software Support	Universal	Most Bayesian packages

AI/ML Applications

Credible intervals are increasingly important in modern machine learning, where quantifying uncertainty is crucial for trustworthy AI systems.

Uncertainty Quantification in Deep Learning

🧠 Bayesian Neural Networks

Instead of learning point estimates for weights, Bayesian neural networks maintain posterior distributions over weights. Predictions come with credible intervals that reflect uncertainty.

MC Dropout

Keep dropout active at test time, run multiple stochastic forward passes, and treat the spread of the predictions as a measure of uncertainty. Approximate credible intervals come from the quantiles of the resulting distribution of outputs.

Variational Inference

Approximate the posterior over weights with a tractable distribution, then sample weights from it to obtain a distribution of predictions and the corresponding credible intervals.

Why This Matters: In safety-critical applications like medical diagnosis or autonomous driving, a model that says "I predict class A with 95% credible interval [0.6, 0.9]" is far more useful than one that simply says "class A." When the credible interval is wide, the system can defer to human judgment.
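The MC Dropout recipe can be sketched in plain NumPy. Everything here is an assumption made for illustration: the weights are random placeholders standing in for a trained network, the layer sizes and dropout rate are arbitrary, and the resulting interval is only an approximation to a Bayesian credible interval.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights standing in for a trained network (hypothetical values)
W1 = rng.normal(size=(4, 32))
b1 = np.zeros(32)
W2 = rng.normal(size=(32, 1)) / np.sqrt(32)
b2 = np.zeros(1)


def forward_with_dropout(x, drop_rate=0.2):
    """One stochastic forward pass with dropout left on."""
    hidden = np.maximum(0.0, x @ W1 + b1)          # ReLU hidden layer
    keep = rng.random(hidden.shape) >= drop_rate   # keep each unit with prob 1 - drop_rate
    hidden = hidden * keep / (1.0 - drop_rate)     # inverted-dropout scaling
    logit = hidden @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logit))            # predicted P(class A)


def mc_dropout_interval(x, n_passes=500, cred_level=0.95):
    """Mean prediction and equal-tailed interval over stochastic passes."""
    preds = np.array([forward_with_dropout(x)[0] for _ in range(n_passes)])
    tail = (1 - cred_level) / 2
    lower, upper = np.quantile(preds, [tail, 1 - tail])
    return float(preds.mean()), float(lower), float(upper)


x = np.array([0.5, -1.2, 0.3, 2.0])  # a single hypothetical input
mean_pred, lower, upper = mc_dropout_interval(x)
print(f"P(class A) ~ {mean_pred:.2f}, approximate 95% interval [{lower:.2f}, {upper:.2f}]")
```

A deployed system could compare the interval width against a threshold and hand the case to a human when it is too wide.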

Bayesian Optimization

🎯 Hyperparameter Tuning with Credible Intervals

Gaussian Process models in Bayesian optimization provide posterior distributions over the objective function. Credible intervals guide the exploration-exploitation trade-off.

Acquisition Functions: Expected Improvement (EI), Upper Confidence Bound (UCB), and other acquisition functions combine the posterior mean with the posterior standard deviation (which sets the credible interval width) to decide where to sample next. Wide credible intervals indicate regions worth exploring.
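A small NumPy sketch of this idea follows: a zero-mean Gaussian Process with an RBF kernel is conditioned on a few observations, a pointwise 95% credible band is formed, and a UCB score picks the next point. The kernel length scale, the exploration weight of 2.0, and the three observed scores are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit prior variance."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


# Hypothetical hyperparameter values (scaled to [0, 1]) and validation scores
x_train = np.array([0.1, 0.4, 0.9])
y_train = np.array([0.2, 0.8, 0.3])
x_test = np.linspace(0.0, 1.0, 101)

mean, std = gp_posterior(x_train, y_train, x_test)

# Pointwise 95% credible band for the objective
band_lower, band_upper = mean - 1.96 * std, mean + 1.96 * std

# UCB acquisition: favor a high mean (exploit) and a high std (explore)
ucb = mean + 2.0 * std
next_x = float(x_test[np.argmax(ucb)])
print(f"Next point to evaluate: x = {next_x:.2f}")
```

The credible band collapses at the observed points and widens between them, which is what steers UCB toward unexplored regions.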

Thompson Sampling and Multi-Armed Bandits

🎰 Thompson Sampling

Thompson sampling is a Bayesian bandit algorithm that maintains posterior distributions over arm rewards. At each step, it samples from each arm's posterior and pulls the arm with the highest sample.

Step 1

Maintain Beta posteriors for each arm

Step 2

Sample from each posterior

Step 3

Pull arm with highest sample

Connection to Credible Intervals: Arms with wider credible intervals are more likely to produce extreme samples, encouraging exploration. As data accumulates, credible intervals shrink, naturally shifting toward exploitation.

Practical Tip: When implementing Thompson sampling for A/B testing, use Beta-Binomial conjugacy. The posterior is Beta(α + successes, β + failures), and 95% credible intervals give you immediate insight into uncertainty about each variant's true conversion rate.
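The three steps and the Beta-Binomial update above can be sketched as a short simulation. The true conversion rates, the number of rounds, and the uniform Beta(1, 1) prior are assumptions made for this example; in a real A/B test the true rates are of course unknown.

```python
import numpy as np
from scipy import stats


def thompson_sampling(true_rates, n_rounds=5000, prior_a=1.0, prior_b=1.0, seed=0):
    """Beta-Bernoulli Thompson sampling; returns per-arm success/failure counts."""
    rng = np.random.default_rng(seed)
    n_arms = len(true_rates)
    successes = np.zeros(n_arms)
    failures = np.zeros(n_arms)
    for _ in range(n_rounds):
        # Steps 1-2: sample once from each arm's Beta posterior
        draws = rng.beta(prior_a + successes, prior_b + failures)
        # Step 3: pull the arm with the highest sample
        arm = int(np.argmax(draws))
        reward = int(rng.random() < true_rates[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
    return successes, failures


true_rates = [0.05, 0.10, 0.20]  # hypothetical conversion rates
successes, failures = thompson_sampling(true_rates)
pulls = successes + failures

# 95% equal-tailed credible interval for each arm's conversion rate
intervals = [
    stats.beta.ppf([0.025, 0.975], 1.0 + s, 1.0 + f)
    for s, f in zip(successes, failures)
]
for arm, (lo, hi) in enumerate(intervals):
    print(f"Arm {arm}: pulls={int(pulls[arm]):5d}  95% interval=[{lo:.3f}, {hi:.3f}]")
```

Arms that are pulled rarely keep wide intervals, while the arm that wins most of the traffic ends up with a narrow one.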

Python Implementation

Here's a complete implementation for computing credible intervals from Beta posteriors, along with an integration example for PyMC.

Computing Credible Intervals

🐍credible_intervals.py

Imports

NumPy handles the numerical operations, and SciPy's stats module provides the Beta distribution functions needed for Bayesian inference.

Main Function Signature

This function computes either an equal-tailed or an HPD credible interval from a Beta posterior. Beta posteriors arise naturally from binomial data with Beta priors (a conjugate pair).

Equal-Tailed Interval

Uses the inverse CDF (percent point function) to find quantiles. For a 95% interval, we find the 2.5th and 97.5th percentiles of the posterior distribution.

Example: For a symmetric posterior, the equal-tailed interval equals the HPD interval.

HPD Algorithm

The HPD interval is the shortest interval containing the specified probability mass. We search over different ways to split the tail probability to find the minimum-width interval.

Grid Search for HPD

Try different allocations of the (1 - cred_level) tail probability between the lower and upper tails. The allocation that gives the shortest interval approximates the HPD, up to the resolution of the grid.

Example: For skewed posteriors, an unequal tail allocation gives a shorter interval.

Posterior Statistics

The posterior mean is α/(α+β), the posterior mode is (α-1)/(α+β-2) for α, β > 1, and the posterior variance is αβ/((α+β)²(α+β+1)).

PyMC Integration Example

Modern probabilistic programming libraries like PyMC draw MCMC samples from the posterior distribution, from which credible intervals are computed.

ArviZ Summary

ArviZ's summary function reports the HDI (Highest Density Interval, the same as HPD). The hdi_prob parameter controls the credible level.

import numpy as np
from scipy import stats


def credible_interval(alpha, beta, cred_level=0.95, method="equal-tailed"):
    """
    Compute a credible interval for a Beta(alpha, beta) posterior.

    Parameters
    ----------
    alpha, beta : float
        Beta distribution parameters (posterior)
    cred_level : float
        Credible level (e.g., 0.95 for a 95% interval)
    method : str
        "equal-tailed" or "hpd"

    Returns
    -------
    dict
        Interval bounds and width, plus the posterior mean, mode, and std.
    """
    if method not in ("equal-tailed", "hpd"):
        raise ValueError('method must be "equal-tailed" or "hpd"')

    # Posterior statistics (closed form for the Beta distribution)
    posterior_mean = alpha / (alpha + beta)
    # The mode formula holds only for alpha, beta > 1; otherwise fall back to the mean
    posterior_mode = (alpha - 1) / (alpha + beta - 2) if alpha > 1 and beta > 1 else posterior_mean
    posterior_var = (alpha * beta) / ((alpha + beta) ** 2 * (alpha + beta + 1))

    # Equal-tailed interval: exclude the same probability from each tail
    tail_prob = (1 - cred_level) / 2
    lower = stats.beta.ppf(tail_prob, alpha, beta)
    upper = stats.beta.ppf(1 - tail_prob, alpha, beta)

    if method == "hpd":
        # HPD (approximate): grid search over how the excluded probability
        # is split between the two tails, keeping the shortest interval
        for i in range(101):
            lower_tail = (1 - cred_level) * i / 100
            upper_tail = (1 - cred_level) - lower_tail

            cand_lower = stats.beta.ppf(lower_tail, alpha, beta)
            cand_upper = stats.beta.ppf(1 - upper_tail, alpha, beta)

            if cand_upper - cand_lower < upper - lower:
                lower, upper = cand_lower, cand_upper

    return {
        "lower": lower, "upper": upper, "width": upper - lower,
        "mean": posterior_mean, "mode": posterior_mode, "std": np.sqrt(posterior_var)
    }


# PyMC example (requires pymc and arviz installed)
def bayesian_proportion_with_pymc(successes, failures, prior_alpha=1, prior_beta=1):
    import pymc as pm
    import arviz as az

    with pm.Model() as model:
        # Beta prior on the success probability, binomial likelihood for the counts
        theta = pm.Beta("theta", alpha=prior_alpha, beta=prior_beta)
        y = pm.Binomial("y", n=successes + failures, p=theta, observed=successes)
        trace = pm.sample(2000, return_inferencedata=True)

    # Get 95% HDI (ArviZ's term for HPD)
    summary = az.summary(trace, hdi_prob=0.95)
    return summary


# Example usage
prior_alpha, prior_beta = 2, 2  # Prior: Beta(2,2)
successes, failures = 14, 6     # Observed data

# Posterior is Beta(prior_alpha + successes, prior_beta + failures)
post_alpha = prior_alpha + successes  # 16
post_beta = prior_beta + failures     # 8

result = credible_interval(post_alpha, post_beta, method="hpd")
print(f"Posterior: Beta({post_alpha}, {post_beta})")
print(f"95% HPD Interval: [{result['lower']:.4f}, {result['upper']:.4f}]")
print(f"Posterior Mean: {result['mean']:.4f}")
print(f"Posterior Mode: {result['mode']:.4f}")

Common Pitfall: Don't confuse the prior parameters with the posterior parameters! For Beta-Binomial, the posterior is Beta(α_prior + successes, β_prior + failures). The prior "pseudo-counts" add to the actual observed counts.

Knowledge Check

Test your understanding of credible intervals with this comprehensive quiz. Pay attention to the philosophical differences between Bayesian and frequentist interpretations.

Question 1 of 8

What is the fundamental difference between how frequentist and Bayesian frameworks treat the parameter θ?

Summary

Key Takeaways

Credible intervals allow direct probability statements: Unlike confidence intervals, we can say "there is a 95% probability θ is in this interval."
Equal-tailed vs HPD: Equal-tailed intervals exclude equal probability from each tail; HPD intervals are the shortest possible. They coincide for symmetric posteriors.
Priors matter: Credible intervals depend on both the prior and the data. With uninformative priors and large samples, they approximate frequentist CIs.
Bayesian interpretation: The parameter θ is treated as random with a posterior distribution; the interval is fixed once computed.
ML applications: Uncertainty quantification in neural networks, Bayesian optimization, Thompson sampling, and any application requiring probabilistic reasoning about unknowns.

The Big Picture: Credible intervals complete the Bayesian inference workflow. Start with a prior encoding initial beliefs, update with data via Bayes' theorem to get the posterior, then extract credible intervals to summarize uncertainty. This framework provides a coherent, probability-based approach to quantifying what we know and don't know about the world—essential for building trustworthy AI systems.
