Chapter 5

Gamma Distribution - Deep Dive

Continuous Distributions

Learning Objectives

By the end of this section, you will be able to:

  1. Define the Gamma distribution and understand both shape-rate and shape-scale parameterizations
  2. Derive the Gamma distribution as a sum of independent Exponential random variables
  3. Understand deeply why Gamma models "time until the k-th event" in a Poisson process
  4. Recognize Exponential and Chi-square as special cases of the Gamma family
  5. Apply Gamma distribution to queueing, reliability, and rainfall modeling
  6. Use Gamma as a conjugate prior for Bayesian inference with Poisson and Exponential likelihoods
  7. Calculate mean, variance, and mode from the distribution parameters
  8. Implement Gamma distribution operations in Python
  9. Identify AI/ML applications including Bayesian neural networks and attention mechanisms

Deep Intuition: Waiting for Multiple Events

"If Exponential is waiting for the first bus, Gamma is waiting for the k-th bus to arrive."

The Gamma distribution answers a natural question: if events occur randomly over time (following a Poisson process), how long do we wait until k events have occurred?

The Core Insight

Gamma(k, λ) is the sum of k independent Exponential(λ) random variables.

Since Exponential models the time until one event, and events are independent, the time until k events is simply the sum of k waiting times.

$$T_1 + T_2 + \cdots + T_k \sim \text{Gamma}(k, \lambda)$$

where each $T_i \sim \text{Exp}(\lambda)$.

The Shape Controls Everything

The shape parameter α (or k) fundamentally determines the distribution's behavior:

α = 1: Exponential

Pure exponential decay. Memoryless. Waiting for just one event.

α = 2-5: Right-Skewed

A mode appears. Still asymmetric, but with a peak away from zero.

α → ∞: Bell-Shaped

Approaches Normal by CLT. Symmetric, predictable center.
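
One way to see the shape's effect numerically is to compute the skewness for a few values of α. The sketch below assumes SciPy is available; the list of shapes and the fixed rate β = 1 are illustrative choices, not values from this section.

```python
from scipy import stats

# Illustrative shape values; the rate is fixed at 1 (so scale = 1/rate = 1).
shapes = [1, 2, 5, 50, 500]

skews = []
for alpha in shapes:
    dist = stats.gamma(a=alpha, scale=1.0)
    # stats(moments="s") returns the skewness of the frozen distribution
    skew = float(dist.stats(moments="s"))
    skews.append(skew)
    print(f"alpha={alpha:>3}: skewness={skew:.4f}  (2/sqrt(alpha)={2 / alpha ** 0.5:.4f})")
```

The skewness shrinks toward 0 as α grows, which is the numerical counterpart of the curve becoming bell-shaped.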


The Historical Story

The Gamma distribution emerges from one of mathematics' most beautiful discoveries—the extension of the factorial function to all numbers.

Leonhard Euler (1729)

Discovered the Gamma function Γ(α) while trying to extend the factorial n! = n × (n-1) × ... × 1 to non-integer values. He found an integral that matched n! for integers but worked for all positive numbers.

Karl Pearson (1893)

Systematically studied the Gamma distribution as part of his family of continuous distributions. He showed how varying the shape parameter creates a rich family from exponential to bell-shaped.

A.K. Erlang (1909)

Applied Gamma with integer shape (now called Erlang distribution) to telephone traffic analysis. His work founded queueing theory and showed Gamma naturally models waiting times.

Modern Applications

Today, Gamma is essential in Bayesian statistics (conjugate priors), machine learning (precision in neural networks), and reliability engineering (time to failure).


Why Do We Need the Gamma Distribution?

The Gamma distribution fills a crucial niche in probability theory: modeling positive continuous data with flexible shape.

  • Sum of wait times
  • Bayesian priors
  • Chi-square parent
  • Reliability models

| Domain | Why Gamma Is Used |
| --- | --- |
| Queueing Theory | Time for k customers to be served |
| Reliability | Time until the k-th failure in a system |
| Hydrology | Total rainfall amount over a period |
| Insurance | Aggregate claim amounts |
| Bayesian Stats | Conjugate prior for Poisson/Exponential rates |
| Machine Learning | Precision (inverse variance) in neural networks |
| Statistical Testing | Chi-square is Gamma with α = ν/2, β = 1/2 |

What Data Can We Model?

USE Gamma When:

  • Strictly positive continuous data
  • Right-skewed distributions (but can be symmetric for large α)
  • Sum of exponentials - waiting times, processing times
  • Rainfall amounts over a time period
  • Insurance claims and financial losses
  • Prior for rate parameters in Bayesian models
  • Chi-square test statistics (special case)

Do NOT Use Gamma When:

  • Data can be negative → Use Normal, t-distribution
  • Data is bounded (0 to 1) → Use Beta
  • Symmetric, bell-shaped is needed → Use Normal
  • Heavy left tail → Consider other distributions
  • Discrete counts → Use Poisson, Negative Binomial

When to Choose Gamma vs. Exponential

If you're modeling time until one event, use Exponential. If you're modeling time until multiple events (or a sum of times), use Gamma. Exponential is just Gamma with α = 1.
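
The claim that Exponential is Gamma with α = 1 can be checked directly. This is a minimal sketch assuming SciPy; the rate 0.7 and the evaluation grid are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

rate = 0.7                       # illustrative rate, not from the text
x = np.linspace(0.01, 10, 200)   # points at which to compare the densities

# SciPy uses scale = 1/rate for both distributions.
gamma_pdf = stats.gamma(a=1, scale=1 / rate).pdf(x)
expon_pdf = stats.expon(scale=1 / rate).pdf(x)

max_diff = float(np.max(np.abs(gamma_pdf - expon_pdf)))
print(f"Largest pointwise difference: {max_diff:.2e}")
```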


Mathematical Definition

There are two common parameterizations of the Gamma distribution. This is a major source of confusion—always verify which one you're using!

Shape-Rate Parameterization (α, β)

$$f(x; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x} \quad \text{for } x > 0$$

| Symbol | Name | Meaning |
| --- | --- | --- |
| α | Shape | Number of events to wait for (can be non-integer) |
| β | Rate | How fast events occur (inverse of scale) |
| Γ(α) | Gamma function | Normalization constant |
| x^(α-1) | Power term | Creates the rising portion for α > 1 |
| e^(-βx) | Decay term | Creates exponential decay (from Exp) |

Shape-Scale Parameterization (k, θ)

$$f(x; k, \theta) = \frac{1}{\Gamma(k)\, \theta^k} x^{k-1} e^{-x/\theta} \quad \text{for } x > 0$$

The relationship between parameterizations:

$$k = \alpha, \quad \theta = \frac{1}{\beta}, \quad \beta = \frac{1}{\theta}$$

Critical: Know Your Parameterization!

Different software uses different conventions:

  • SciPy: Uses (shape, scale) = (α, 1/β)
  • NumPy: Uses (shape, scale) = (k, θ)
  • Stan/JAGS: Uses (shape, rate) = (α, β)

Always check the mean! Mean = α/β = kθ. If this doesn't match, you have the wrong parameterization.
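
The mean check is easy to automate. The sketch below assumes NumPy's `Generator.gamma`, which takes a shape and a scale, and uses α = 3, β = 2 purely as example values; the seed is only there for reproducibility.

```python
import numpy as np

alpha, beta = 3.0, 2.0                 # shape-rate parameters we have in mind
rng = np.random.default_rng(seed=0)    # seed chosen only for reproducibility

# NumPy expects (shape, scale), so a rate must be converted: scale = 1/beta.
samples = rng.gamma(shape=alpha, scale=1.0 / beta, size=200_000)

expected_mean = alpha / beta
sample_mean = float(samples.mean())
print(f"Expected mean alpha/beta = {expected_mean:.3f}, sample mean = {sample_mean:.3f}")
```

If the rate were passed as the scale by mistake, the sample mean would land near αβ = 6 instead of α/β = 1.5, which is exactly the discrepancy the mean check is meant to catch.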

Summary of Moments

| Property | Shape-Rate (α, β) | Shape-Scale (k, θ) |
| --- | --- | --- |
| Mean | α / β | kθ |
| Variance | α / β² | kθ² |
| Mode | (α-1) / β if α ≥ 1 | (k-1)θ if k ≥ 1 |
| Skewness | 2 / √α | 2 / √k |
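
As a sanity check on the table, the moments can be read off a SciPy distribution object and the mode located numerically. This is a sketch under assumed example values (α = 3, β = 2); the search interval for the mode is an arbitrary choice that happens to contain the peak.

```python
from scipy import stats
from scipy.optimize import minimize_scalar

alpha, beta = 3.0, 2.0  # illustrative shape-rate values
dist = stats.gamma(a=alpha, scale=1 / beta)

# Mean, variance and skewness as reported by SciPy
mean, var, skew = (float(m) for m in dist.stats(moments="mvs"))

# Locate the peak of the PDF numerically by minimizing the negative density.
peak = minimize_scalar(lambda x: -dist.pdf(x), bounds=(1e-6, 10.0), method="bounded")
numeric_mode = float(peak.x)

print(f"mean={mean:.4f}      formula alpha/beta     = {alpha / beta:.4f}")
print(f"variance={var:.4f}  formula alpha/beta^2   = {alpha / beta**2:.4f}")
print(f"skewness={skew:.4f}  formula 2/sqrt(alpha)  = {2 / alpha**0.5:.4f}")
print(f"mode={numeric_mode:.4f}      formula (alpha-1)/beta = {(alpha - 1) / beta:.4f}")
```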

The Gamma Function: Extending Factorial

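The Gamma function is one of the most important special functions in mathematics. It extends the factorial to all complex numbers except the non-positive integers (0, -1, -2, ...), where it has poles. The integral below converges when α has positive real part; other values are reached by analytic continuation.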

$$\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t} \, dt$$

Key Properties

| Property | Formula | Explanation |
| --- | --- | --- |
| Factorial connection | Γ(n) = (n-1)! for n ∈ ℤ⁺ | Γ(5) = 4! = 24 |
| Recursive | Γ(α+1) = α · Γ(α) | Like n! = n × (n-1)! |
| Base case | Γ(1) = 1 | Since 0! = 1 |
| Famous result | Γ(1/2) = √π | From Gaussian integral |

Why the Shift?

You might wonder why Γ(n) = (n-1)! instead of Γ(n) = n!. The shift comes from the exponent α-1 in the integral definition, a historical convention that has stuck. It is a minor annoyance, but the recursion Γ(α+1) = α · Γ(α) makes it easy to convert between the two.

Some useful values:

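| α | 1 | 2 | 3 | 4 | 5 | 1/2 | 3/2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Γ(α) | 1 | 1 | 2 | 6 | 24 | √π ≈ 1.77 | √π/2 ≈ 0.89 |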
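
These values and properties can be reproduced with Python's standard library alone. The snippet is a small sketch; the non-integer point 2.5 used to test the recursion is an arbitrary choice.

```python
import math

# Factorial connection: Γ(n) = (n-1)! for positive integers n
for n in range(1, 6):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# Recursion Γ(α+1) = α·Γ(α), checked at a non-integer point
a = 2.5
recursion_gap = abs(math.gamma(a + 1) - a * math.gamma(a))

half = math.gamma(0.5)          # compare with sqrt(pi)
three_halves = math.gamma(1.5)  # compare with sqrt(pi)/2

print(f"Γ(1/2) = {half:.6f}, sqrt(pi) = {math.sqrt(math.pi):.6f}")
print(f"Γ(3/2) = {three_halves:.6f}, sqrt(pi)/2 = {math.sqrt(math.pi) / 2:.6f}")
print(f"Recursion gap at α = {a}: {recursion_gap:.2e}")
```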

Exploring the Distribution

The interactive Gamma Distribution Explorer lets you adjust the shape (α) and rate (β) parameters and observe how the density responds:

  • Shape: α = 1 is Exponential; larger α makes the curve more bell-shaped. Integer α gives the Erlang case.
  • Rate: a higher rate means faster decay and a smaller mean.

[Interactive figure: Gamma Distribution Explorer. Plot of the PDF f(x) against x from 0 to 10, with the mean μ and the mode marked.]

Statistics for the state shown, X ~ Gamma(3.0, 1.0):

  • Mean (μ) = α/β = 3.0/1.0 = 3.0000
  • Variance: 3.0000
  • Std Dev: 1.7321
  • Mode: 2.0000
  • Skewness: 1.1547 (always positive)

The PDF shows the probability density at each point x. Higher density means values are more likely in that region.

Current distribution:

$$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x} = \frac{1.00^{3.00}}{\Gamma(3.00)} x^{2.00} e^{-1.00x}$$

What Do You Notice?

  • α = 1: The distribution is Exponential—starts at maximum and decays
  • α > 1: A mode appears away from zero, creating a peak
  • Increasing α: The distribution becomes more bell-shaped and symmetric
  • Increasing β: The distribution shifts left and becomes more concentrated
  • Mode < Mean: For α > 1, the mode is always less than the mean (right-skewed)

The Exponential-Gamma Connection

The most important property of Gamma is its relationship to Exponential:

Theorem: If $T_1, T_2, \ldots, T_k$ are independent Exponential(λ) random variables, then:

$$X = T_1 + T_2 + \cdots + T_k \sim \text{Gamma}(k, \lambda)$$

This theorem explains why Gamma appears whenever we sum exponential waiting times. See it in action:

[Interactive demo: Gamma as Sum of Exponentials. Sums k independent Exp(λ) random variables and plots a histogram of the simulated sums against the theoretical Gamma PDF.]

Statistics comparison for the run shown (k = 3, λ = 1):

| Statistic | Theoretical | Sample |
| --- | --- | --- |
| Mean | k/λ = 3.0000 | 2.8829 |
| Variance | k/λ² = 3.0000 | 2.7037 |

Sample breakdown (first 3 samples):

| # | T₁ | T₂ | T₃ | Sum (Gamma) |
| --- | --- | --- | --- | --- |
| 1 | 0.873 | 0.734 | 2.464 | 4.071 |
| 2 | 1.200 | 1.399 | 1.169 | 3.768 |
| 3 | 0.736 | 0.165 | 0.050 | 0.951 |

The histogram shows the distribution of the sum of 3 independent Exp(1) random variables. The red curve is the theoretical Gamma(3, 1) PDF.

Why This Works (MGF Proof)

The MGF of Exponential(λ) is $M_T(t) = \lambda/(\lambda - t)$ for $t < \lambda$.

For independent random variables, the MGF of a sum is the product of the MGFs:

$$M_X(t) = \left[\frac{\lambda}{\lambda - t}\right]^k$$

This is exactly the MGF of Gamma(k, λ).

Proof via Moment Generating Functions

The proof is elegant using MGFs. For independent random variables, the MGF of a sum is the product of individual MGFs:

$$M_X(t) = M_{T_1}(t) \cdot M_{T_2}(t) \cdots M_{T_k}(t) = \left(\frac{\lambda}{\lambda - t}\right)^k$$

This is exactly the MGF of Gamma(k, λ)! Since MGFs uniquely identify distributions, we've proven the result.
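
For completeness, the two MGFs used in this argument can be derived directly from the densities given earlier. The derivation below is a standard calculation written out as a sketch; both integrals converge only for t < λ.

```latex
\begin{aligned}
M_T(t) &= E\left[e^{tT}\right]
        = \int_0^\infty e^{tx}\, \lambda e^{-\lambda x}\, dx
        = \lambda \int_0^\infty e^{-(\lambda - t)x}\, dx
        = \frac{\lambda}{\lambda - t}, \qquad t < \lambda \\[6pt]
M_{\text{Gamma}(k,\lambda)}(t)
       &= \int_0^\infty e^{tx}\, \frac{\lambda^k}{\Gamma(k)}\, x^{k-1} e^{-\lambda x}\, dx
        = \frac{\lambda^k}{\Gamma(k)} \int_0^\infty x^{k-1} e^{-(\lambda - t)x}\, dx
        = \frac{\lambda^k}{\Gamma(k)} \cdot \frac{\Gamma(k)}{(\lambda - t)^k}
        = \left(\frac{\lambda}{\lambda - t}\right)^k
\end{aligned}
```

The second line uses the substitution u = (λ - t)x, which turns the integral into the definition of Γ(k) divided by (λ - t)^k.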

The Sum Property

If $X \sim \text{Gamma}(\alpha_1, \beta)$ and $Y \sim \text{Gamma}(\alpha_2, \beta)$ are independent with the same rate, then:

$$X + Y \sim \text{Gamma}(\alpha_1 + \alpha_2, \beta)$$

Shapes add, rate stays the same! This is why Gamma is so natural for sums.
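
The sum property also holds for non-integer shapes, which a quick simulation can illustrate. The following is a sketch assuming NumPy and SciPy; the shapes 1.5 and 2.5, the rate 2, the sample size, and the seed are all arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)   # seed only for reproducibility
alpha1, alpha2, beta = 1.5, 2.5, 2.0  # two shapes, one common rate
n = 100_000

# Draw X and Y independently with the same rate (scale = 1/beta).
x = rng.gamma(shape=alpha1, scale=1 / beta, size=n)
y = rng.gamma(shape=alpha2, scale=1 / beta, size=n)
total = x + y

# Compare the simulated sum with Gamma(alpha1 + alpha2, beta).
target = stats.gamma(a=alpha1 + alpha2, scale=1 / beta)
ks_stat, p_value = stats.kstest(total, target.cdf)

print(f"Sample mean {total.mean():.3f} vs theory {target.mean():.3f}")
print(f"Sample var  {total.var():.3f} vs theory {target.var():.3f}")
print(f"KS statistic against Gamma({alpha1 + alpha2}, {beta}): {ks_stat:.4f}")
```

A small Kolmogorov-Smirnov statistic means the empirical distribution of X + Y sits close to the claimed Gamma. Repeating the experiment with two different rates should break the agreement, since the property requires a common rate.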


Waiting for the k-th Event

Let's visualize the Gamma distribution in its natural habitat: a Poisson process. Watch events occur randomly, and see how the waiting time for every k events follows a Gamma distribution:

Interactive simulation: Waiting for the k-th Event

Events occur randomly on a timeline (Poisson process with rate λ). The time until the k-th event follows a Gamma(k, λ) distribution. Every k events, the simulation records the waiting time and resets.

[Interactive figure: event timeline over time t, distinguishing regular events from recorded k-th events, with running statistics (events observed, waiting times collected, sample mean) and a list of recent waiting times. In the state shown, the theoretical mean is k/λ = 1.500.]

What You're Seeing

Each time 3 events occur, we record how long we waited. As you collect more samples, the histogram converges to the Gamma(3, 2) distribution. This demonstrates that Gamma models "time to the k-th event" in a Poisson process!

Real-World Interpretation

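Imagine you're at a coffee shop with a single barista whose service times are independent and Exponential with rate λ customers per minute. If there are 3 people ahead of you (including the one being served), the time until your turn follows Gamma(3, λ). The customer already at the counter does not break the result: because the Exponential is memoryless, their remaining service time is still Exp(λ).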
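
The link between Gamma and the Poisson process can be stated as an exact identity: the k-th event has happened by time t exactly when at least k events have occurred in [0, t], so P(T_k ≤ t) = P(N(t) ≥ k) with N(t) ~ Poisson(λt). The check below is a sketch assuming SciPy, with k = 3, λ = 2 and t = 1.2 as illustrative values.

```python
from scipy import stats

k, lam, t = 3, 2.0, 1.2  # illustrative values

# P(T_k <= t): the waiting time for the k-th event is at most t
p_gamma = float(stats.gamma(a=k, scale=1 / lam).cdf(t))

# P(N(t) >= k) = 1 - P(N(t) <= k - 1), where N(t) ~ Poisson(lam * t)
p_poisson = float(1 - stats.poisson(mu=lam * t).cdf(k - 1))

print(f"Gamma CDF:    {p_gamma:.10f}")
print(f"Poisson tail: {p_poisson:.10f}")
```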


The Gamma Family: Special Cases

The Gamma distribution is the parent of several important distributions. Understanding Gamma means understanding an entire family:

Gamma(α, β) has the following special cases:

  • Exponential: α = 1
  • Erlang: α ∈ ℤ⁺
  • Chi-square: α = ν/2, β = 1/2

[Figure: PDFs f(x) of Exponential, Erlang, Chi-square, and a general Gamma, plotted for x from 0 to 15.]

| Distribution | As Gamma | Mean | Variance |
| --- | --- | --- | --- |
| Exponential(λ) | Gamma(1, λ) | 1/λ | 1/λ² |
| Erlang(k, λ) | Gamma(k, λ) | k/λ | k/λ² |
| Chi-square(ν) | Gamma(ν/2, 1/2) | ν | 2ν |
| General Gamma | Gamma(α, β) | α/β | α/β² |

Key Insight

All these distributions are special cases of Gamma. Understanding Gamma means understanding an entire family of distributions used across statistics, engineering, and ML!

Why Chi-Square Matters

The Chi-square distribution is critical for statistical inference:

$$\chi^2(\nu) = \text{Gamma}\left(\frac{\nu}{2}, \frac{1}{2}\right)$$

Chi-square arises when you sum the squares of independent standard normals:

$$Z_1^2 + Z_2^2 + \cdots + Z_\nu^2 \sim \chi^2(\nu) \quad \text{where } Z_i \sim N(0, 1) \text{ independently}$$

The Chi-Square Connection

This explains why the Gamma function appears in so many statistical formulas! The t-test, F-test, and chi-square test all involve Gamma distributions through their connection to Chi-square.
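
The identity χ²(ν) = Gamma(ν/2, 1/2) can be confirmed by comparing densities. This sketch assumes SciPy and uses ν = 10 as an example; note that a rate of 1/2 becomes scale = 2 in SciPy's parameterization.

```python
import numpy as np
from scipy import stats

nu = 10                          # illustrative degrees of freedom
x = np.linspace(0.1, 40, 400)    # evaluation grid

chi2_pdf = stats.chi2(df=nu).pdf(x)
# Rate 1/2 corresponds to scale = 1 / (1/2) = 2.
gamma_pdf = stats.gamma(a=nu / 2, scale=2).pdf(x)

max_diff = float(np.max(np.abs(chi2_pdf - gamma_pdf)))
print(f"Largest pointwise difference: {max_diff:.2e}")
```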


Key Properties

| Property | Formula | Interpretation |
| --- | --- | --- |
| Mean | E[X] = α/β | Average waiting time |
| Variance | Var(X) = α/β² | Spread of waiting times |
| Mode | (α-1)/β if α ≥ 1, else 0 | Most likely value |
| Skewness | 2/√α | Right-skewed, decreases with α |
| Kurtosis (excess) | 6/α | Heavier tails for small α |
| CV | 1/√α | Coefficient of variation |
| MGF | (β/(β-t))^α for t < β | Moment generating function |

Memoryless? No!

Unlike Exponential, Gamma is NOT memoryless. If you've been waiting for 2 events and one has already occurred, you know something—and that affects your expected remaining wait time.

Why Gamma Remembers

Exponential: "I don't care how long you've waited—the remaining time has the same distribution."

Gamma(k>1): "I know how many events have occurred. My expected remaining time depends on this history."
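
Memorylessness means P(X > s + t | X > s) = P(X > t). The sketch below evaluates both sides for an Exponential and for a Gamma with shape 3, assuming SciPy; the helper `conditional_survival` and the values s = 2, t = 1 are illustrative, not from the text.

```python
from scipy import stats

def conditional_survival(dist, s, t):
    """Return P(X > s + t | X > s) using the survival function sf = 1 - CDF."""
    return float(dist.sf(s + t) / dist.sf(s))

s, t = 2.0, 1.0                        # already waited s; ask about t more
expo = stats.expon(scale=1.0)          # Exponential(1)
gam = stats.gamma(a=3, scale=1.0)      # Gamma(3, 1)

expo_cond, expo_uncond = conditional_survival(expo, s, t), float(expo.sf(t))
gam_cond, gam_uncond = conditional_survival(gam, s, t), float(gam.sf(t))

print(f"Exponential: conditional {expo_cond:.4f} vs unconditional {expo_uncond:.4f}")
print(f"Gamma(3, 1): conditional {gam_cond:.4f} vs unconditional {gam_uncond:.4f}")
```

For the Exponential the two numbers coincide. For Gamma(3, 1) the conditional probability is smaller: having already waited makes it more likely that some of the required events have occurred, so a long additional wait becomes less likely.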


Bayesian Applications: Conjugate Priors

One of Gamma's most powerful applications is as a conjugate prior in Bayesian inference. When the prior and posterior belong to the same family, calculations become simple closed-form updates.

[Interactive demo: Gamma as Conjugate Prior. Plots the prior and posterior densities over λ (rate parameter) with the true λ marked, and updates the posterior as data points are added.]

Model: X₁, ..., Xₙ ~ Poisson(λ)
Prior: λ ~ Gamma(α, β)
Posterior: λ | data ~ Gamma(α + Σxᵢ, β + n)

Initial state shown: true λ (hidden) = 3, 0 data points collected.

  • Prior: Gamma(2, 1), mean 2.000, variance 2.000
  • Posterior: Gamma(2.0, 1.00), mean 2.000, variance 2.000 (identical to the prior until data arrive)

What You're Seeing

As you add more data, the posterior (purple) concentrates around the true λ (green line). The prior belief gets overwhelmed by the evidence. This is Bayesian learning in action!

Notice how the posterior remains a Gamma distribution—that's the power of conjugate priors: simple closed-form updates.

Why Conjugate Priors Matter

For Poisson data with Gamma prior:

$$\text{Prior: } \lambda \sim \text{Gamma}(\alpha, \beta)$$
$$\text{Likelihood: } X_1, \ldots, X_n \sim \text{Poisson}(\lambda)$$
$$\text{Posterior: } \lambda \mid \mathbf{x} \sim \text{Gamma}\left(\alpha + \sum_{i=1}^n x_i, \; \beta + n\right)$$

The update rules are simple:

  • Shape increases by the sum of observations (more evidence → more concentrated)
  • Rate increases by the sample size (more data → more confident)
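
These update rules follow from multiplying the prior density by the Poisson likelihood and dropping every factor that does not depend on λ. The derivation below is a sketch of that standard calculation, assuming the observations are independent.

```latex
\begin{aligned}
p(\lambda \mid \mathbf{x})
  &\propto \underbrace{\lambda^{\alpha-1} e^{-\beta\lambda}}_{\text{Gamma prior}}
     \; \prod_{i=1}^n \underbrace{\frac{\lambda^{x_i} e^{-\lambda}}{x_i!}}_{\text{Poisson likelihood}} \\
  &\propto \lambda^{\alpha - 1 + \sum_{i=1}^n x_i} \; e^{-(\beta + n)\lambda}
\end{aligned}
```

The last expression is the kernel of a Gamma density with shape α + Σxᵢ and rate β + n, so the posterior is Gamma with exactly those parameters.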

Interpreting the Prior

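A Gamma(α, β) prior for a Poisson rate can be interpreted as having seen α "pseudo-events" in β "pseudo-time units" before collecting real data. This matches the update rule: the prior mean is α/β events per unit time, and each real observation adds its count to α and one unit to β.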


Real-World Applications

1. Queueing Theory (Erlang Distribution)

Call Center Wait Times

A.K. Erlang pioneered the use of Gamma for telephone traffic. If calls take an average of 2 minutes to handle (Exp with rate 0.5/min), the time for 5 calls follows Gamma(5, 0.5).

Example: Expected wait for 5 calls = 5/0.5 = 10 minutes
Variance = 5/0.25 = 20 min², so std dev ≈ 4.5 minutes
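
The same numbers can be obtained from a SciPy distribution object, which also gives tail probabilities that are awkward to compute by hand. This is a sketch; the 15-minute threshold is a hypothetical service target added for illustration.

```python
from scipy import stats

n_calls, rate = 5, 0.5                  # 5 calls, each Exp with rate 0.5 per minute
total_time = stats.gamma(a=n_calls, scale=1 / rate)

mean_minutes = float(total_time.mean())   # 5 / 0.5
sd_minutes = float(total_time.std())      # sqrt(5 / 0.25)
p_over_15 = float(total_time.sf(15))      # P(total handling time > 15 minutes)

print(f"Expected total time: {mean_minutes:.1f} min")
print(f"Standard deviation:  {sd_minutes:.2f} min")
print(f"P(total > 15 min):   {p_over_15:.3f}")
```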

2. Reliability Engineering

Time to k-th Failure

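In a system with standby redundancy, a backup component takes over only when the active one fails. If components have independent exponential lifetimes and are used one after another, the time until the k-th failure (system failure) follows Gamma.

Example: A server with 3 nodes in cold standby, where only one node runs at a time. Time until all 3 have failed ~ Gamma(3, λ), where λ is the failure rate of the active node. If all 3 nodes instead run simultaneously, the system lifetime is the maximum of three Exponential lifetimes, which is not Gamma.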
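
The Gamma result depends on failures happening one after another. The simulation below is a sketch that contrasts two assumed designs: nodes used sequentially (lifetimes add up) and nodes all running from the start (the system lasts as long as the longest-lived node). The failure rate of 0.1 per hour, the simulation size, and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=2)        # seed only for reproducibility
lam, n_nodes, n_sims = 0.1, 3, 200_000     # hypothetical failure rate per hour

# One row per simulated system, one Exponential lifetime per node.
lifetimes = rng.exponential(scale=1 / lam, size=(n_sims, n_nodes))

standby_time = lifetimes.sum(axis=1)    # sequential use: a Gamma(3, lam) total
parallel_time = lifetimes.max(axis=1)   # simultaneous use: maximum, not Gamma

standby_mean = float(standby_time.mean())    # theory: 3 / lam
parallel_mean = float(parallel_time.mean())  # theory: (1 + 1/2 + 1/3) / lam

print(f"Sequential (standby) mean lifetime: {standby_mean:.2f} h")
print(f"Simultaneous (parallel) mean lifetime: {parallel_mean:.2f} h")
```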

3. Hydrology and Rainfall

Precipitation Modeling

Gamma is widely used to model rainfall amounts. Total precipitation over a period is approximately Gamma-distributed, making it useful for flood risk and agricultural planning.
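
In practice, modeling rainfall with a Gamma usually means estimating the shape and scale from observed totals. The sketch below assumes SciPy's `gamma.fit` and uses synthetic data generated from a known Gamma, not real measurements, so the recovered parameters can be compared with the truth.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)  # seed only for reproducibility

# Synthetic "rainfall totals" in mm (NOT real data): shape 2, scale 15.
rain_mm = rng.gamma(shape=2.0, scale=15.0, size=5_000)

# Fix the location at 0 so that only shape and scale are estimated.
shape_hat, loc_hat, scale_hat = stats.gamma.fit(rain_mm, floc=0)

print(f"Estimated shape: {shape_hat:.3f} (true 2.0)")
print(f"Estimated scale: {scale_hat:.3f} mm (true 15.0)")
print(f"Implied mean:    {shape_hat * scale_hat:.2f} mm")
```

Fixing `floc=0` matters: left free, the location parameter shifts the support away from zero and the fitted shape and scale no longer correspond to the two-parameter Gamma described in this section.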

4. Insurance Claims

Aggregate Claims

Individual claim sizes often follow Gamma or related distributions. Understanding claim distributions is essential for pricing insurance and maintaining solvency.


AI/ML Applications

Gamma distribution appears throughout machine learning, often in places you might not expect:

1. Bayesian Neural Networks

Precision Priors

In Bayesian neural networks, we often use Gamma priors for the precision (inverse variance) of weight distributions:

$$\tau \sim \text{Gamma}(\alpha, \beta), \quad w \mid \tau \sim N(0, 1/\tau)$$

This hierarchical model allows the network to learn uncertainty about its own weights.

🐍bnn_prior.py
```python
# Bayesian Neural Network with Gamma precision prior
import pymc as pm

# Layer sizes are illustrative placeholders
n_input, n_hidden = 4, 8

with pm.Model() as model:
    # Precision prior (inverse variance); PyMC's Gamma takes shape alpha and rate beta
    tau = pm.Gamma('tau', alpha=1, beta=1)

    # Weight prior given precision
    weights = pm.Normal('weights', mu=0, tau=tau, shape=(n_input, n_hidden))

    # Because tau is itself random, the model expresses uncertainty about the weight variance
```

2. Attention Mechanisms

Concentration Parameters

In probabilistic attention variants that place a Dirichlet distribution over attention weights, the concentration parameter can be given a Gamma distribution. This controls how "focused" or "spread out" the attention is.

3. Point Processes

Event Modeling

When modeling sequences of events (like user clicks, financial transactions, or network packets), Gamma-based models capture temporal dependencies.

  • Hawkes processes with Gamma kernels
  • Inter-event time modeling
  • Temporal point process intensity functions

4. Variational Inference

Variational Families

Gamma is used as a variational family for positive parameters. Computing the KL divergence between two Gamma distributions has a closed form, making optimization tractable.

🐍gamma_kl.py
```python
import torch
from torch.distributions import Gamma, kl_divergence

# Two Gamma distributions (PyTorch uses concentration = shape and rate)
q = Gamma(concentration=3.0, rate=1.0)
p = Gamma(concentration=2.0, rate=1.0)

# KL divergence has a closed form
kl = kl_divergence(q, p)  # KL(q || p)
print(f"KL(q || p) = {kl.item():.4f}")
```
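
To make the closed form concrete, the sketch below writes it out in the shape-rate parameterization and compares it with numerical integration of the definition. The formula is the standard result obtained from E_q[ln x] = ψ(α_q) - ln β_q and E_q[x] = α_q/β_q; it is not taken from this section, and the function names `gamma_kl` and `gamma_kl_numeric` are hypothetical helpers.

```python
import numpy as np
from scipy import integrate, special, stats

def gamma_kl(alpha_q, beta_q, alpha_p, beta_p):
    """Closed-form KL(q || p) for q = Gamma(alpha_q, beta_q), p = Gamma(alpha_p, beta_p)."""
    return float(
        (alpha_q - alpha_p) * special.digamma(alpha_q)
        - special.gammaln(alpha_q) + special.gammaln(alpha_p)
        + alpha_p * (np.log(beta_q) - np.log(beta_p))
        + alpha_q * (beta_p - beta_q) / beta_q
    )

def gamma_kl_numeric(alpha_q, beta_q, alpha_p, beta_p):
    """KL(q || p) by integrating q(x) * (log q(x) - log p(x)) over x > 0."""
    q = stats.gamma(a=alpha_q, scale=1 / beta_q)
    p = stats.gamma(a=alpha_p, scale=1 / beta_p)
    value, _ = integrate.quad(lambda x: q.pdf(x) * (q.logpdf(x) - p.logpdf(x)), 0, np.inf)
    return float(value)

# Same parameters as the PyTorch example: q = Gamma(3, 1), p = Gamma(2, 1)
closed = gamma_kl(3.0, 1.0, 2.0, 1.0)
numeric = gamma_kl_numeric(3.0, 1.0, 2.0, 1.0)
print(f"Closed form: {closed:.6f}, numerical integration: {numeric:.6f}")
```

Because every term is a smooth function of the parameters, the expression can be differentiated directly, which is what makes Gamma convenient as a variational family.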

Python Implementation

Basic Operations with SciPy

🐍gamma_basics.py
```python
import numpy as np
from scipy import stats

# Create Gamma distribution: Gamma(α=3, β=2) in shape-rate form
# IMPORTANT: scipy uses (shape, scale) where scale = 1/rate
alpha, beta = 3, 2
gamma_dist = stats.gamma(a=alpha, scale=1/beta)

# PDF
x = 1.5
pdf_value = gamma_dist.pdf(x)
print(f"f({x}) = {pdf_value:.6f}")

# CDF
cdf_value = gamma_dist.cdf(x)
print(f"P(X ≤ {x}) = {cdf_value:.4f}")

# Mean and variance
print(f"Mean: {gamma_dist.mean():.4f}")  # α/β = 1.5
print(f"Var: {gamma_dist.var():.4f}")    # α/β² = 0.75

# Percentile (inverse CDF)
percentile_95 = gamma_dist.ppf(0.95)
print(f"95th percentile: {percentile_95:.4f}")

# Generate samples
samples = gamma_dist.rvs(size=10000)
print(f"Sample mean: {samples.mean():.4f}")
print(f"Sample var: {samples.var():.4f}")
```

Verifying the Sum Property

🐍gamma_sum.py
```python
import numpy as np
from scipy import stats

# Sum of k exponentials should be Gamma(k, λ)
k = 5
lambda_rate = 2.0
n_samples = 10000

# Method 1: Sum of exponentials (NumPy takes scale = 1/rate)
exp_samples = np.random.exponential(1/lambda_rate, (n_samples, k))
sum_samples = exp_samples.sum(axis=1)

# Method 2: Direct Gamma sampling
gamma_samples = stats.gamma(a=k, scale=1/lambda_rate).rvs(n_samples)

# Compare distributions
print(f"Sum of Exp - Mean: {sum_samples.mean():.3f}, Var: {sum_samples.var():.3f}")
print(f"Gamma direct - Mean: {gamma_samples.mean():.3f}, Var: {gamma_samples.var():.3f}")
print(f"Theoretical - Mean: {k/lambda_rate:.3f}, Var: {k/lambda_rate**2:.3f}")

# They should match up to sampling noise
```

Bayesian Update with Poisson Data

🐍bayesian_gamma.py
```python
from scipy import stats

# Prior: Gamma(2, 1) for Poisson rate λ
prior_alpha, prior_beta = 2, 1

# Observed data: counts from Poisson(λ)
data = [3, 2, 5, 4, 3, 6, 2, 4]  # 8 observations

# Posterior update (conjugate!)
n = len(data)
sum_x = sum(data)

posterior_alpha = prior_alpha + sum_x  # 2 + 29 = 31
posterior_beta = prior_beta + n        # 1 + 8 = 9

print(f"Prior: Gamma({prior_alpha}, {prior_beta})")
print(f"  Mean: {prior_alpha/prior_beta:.3f}")

print(f"\nData: n={n}, sum={sum_x}")
print(f"  Sample mean: {sum_x/n:.3f}")

print(f"\nPosterior: Gamma({posterior_alpha}, {posterior_beta})")
print(f"  Mean: {posterior_alpha/posterior_beta:.3f}")

# 95% credible interval for λ
posterior = stats.gamma(a=posterior_alpha, scale=1/posterior_beta)
ci_low, ci_high = posterior.ppf([0.025, 0.975])
print(f"  95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```
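
Gamma is also conjugate for the rate of an Exponential likelihood, with a slightly different update: the shape grows by the number of observations and the rate grows by the total observed time. The sketch below assumes the same Gamma(2, 1) prior; the waiting times are hypothetical values invented for illustration.

```python
from scipy import stats

# Prior: Gamma(2, 1) for the Exponential rate λ
prior_alpha, prior_beta = 2.0, 1.0

# Hypothetical waiting times between events (illustrative only)
wait_times = [0.4, 1.1, 0.2, 0.9, 0.5, 0.3]

n = len(wait_times)
total_time = sum(wait_times)

# Conjugate update for Exponential(λ) data:
# shape += number of observations, rate += total observed time
post_alpha = prior_alpha + n
post_beta = prior_beta + total_time

posterior = stats.gamma(a=post_alpha, scale=1 / post_beta)
post_mean = float(posterior.mean())
ci_low, ci_high = posterior.ppf([0.025, 0.975])

print(f"Posterior: Gamma({post_alpha}, {post_beta:.1f})")
print(f"  Mean: {post_mean:.3f}")
print(f"  95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```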

Common Pitfalls

Parameterization Confusion (Most Common Error!)

This is the #1 source of bugs. Always verify with the mean:

🐍param_check.py
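```python
from scipy import stats

# You want Gamma(α=3, β=2) in shape-rate form
# Mean should be α/β = 1.5

# WRONG: the second positional argument is loc (a shift), not rate or scale
wrong_loc = stats.gamma(3, 2)
print(f"Wrong mean (loc=2): {wrong_loc.mean()}")  # 5.0 - WRONG!

# WRONG: passing the rate as the scale
wrong_scale = stats.gamma(a=3, scale=2)
print(f"Wrong mean (scale=2): {wrong_scale.mean()}")  # 6.0 - WRONG!

# RIGHT: scale = 1/rate
right = stats.gamma(a=3, scale=1/2)
print(f"Right mean: {right.mean()}")  # 1.5 - CORRECT!

# ALWAYS check!
```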

Confusing with Normal for Large α

As α → ∞, Gamma approaches Normal. But for moderate α (say, α < 30), the distribution is still noticeably right-skewed. Don't assume normality without checking!
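
One way to quantify the remaining skew is to compare an upper-tail probability with its Normal counterpart. The sketch below, assuming SciPy, computes P(X > mean + 2 standard deviations) for a few shapes; the shapes and the fixed rate of 1 are illustrative choices.

```python
from scipy import stats

normal_tail = float(stats.norm.sf(2.0))  # P(Z > 2) for a standard Normal

gamma_tails = {}
for alpha in [5, 30, 200]:
    g = stats.gamma(a=alpha, scale=1.0)   # rate fixed at 1 for illustration
    cutoff = g.mean() + 2 * g.std()       # two standard deviations above the mean
    gamma_tails[alpha] = float(g.sf(cutoff))
    print(f"alpha={alpha:>3}: P(X > mean + 2 sd) = {gamma_tails[alpha]:.4f}")

print(f"Normal reference:      P(Z > 2) = {normal_tail:.4f}")
```

The Gamma upper tail stays heavier than the Normal reference and approaches it only gradually as α grows, so a Normal approximation understates the chance of large values at moderate α.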

Forgetting the Support

Gamma is defined only for x > 0. If your data can be negative or exactly zero, Gamma is not appropriate. Zero-inflated models may be needed if you have many zeros.

Chi-square Relationship

Remember that χ²(ν) = Gamma(ν/2, 1/2). It's easy to mix up the parameters. If ν = 10 degrees of freedom, that's Gamma(5, 0.5), not Gamma(10, 0.5).


Test Your Understanding


If X ~ Gamma(α, β) with shape-rate parameterization, what is E[X]?


Summary

The Gamma distribution is a versatile tool that models positive continuous data with flexible shape. It's the sum of exponentials, the parent of Chi-square, and a natural conjugate prior for Bayesian inference.

Key Formulas

| Property | Shape-Rate (α, β) | Shape-Scale (k, θ) |
| --- | --- | --- |
| PDF | β^α / Γ(α) × x^(α-1) × e^(-βx) | 1 / (Γ(k)θ^k) × x^(k-1) × e^(-x/θ) |
| Mean | α / β | kθ |
| Variance | α / β² | kθ² |
| Mode | (α-1) / β if α ≥ 1 | (k-1)θ if k ≥ 1 |
| Relation | β = 1/θ | θ = 1/β |

Key Takeaways

  1. Gamma is the sum of Exponentials: If T₁, ..., Tₖ ~ iid Exp(λ), then T₁ + ... + Tₖ ~ Gamma(k, λ)
  2. Two parameterizations exist: Shape-rate (α, β) and shape-scale (k, θ). Always verify with the mean!
  3. Special cases: Exponential is Gamma(1, λ); Chi-square(ν) is Gamma(ν/2, 1/2)
  4. Shape controls skewness: Small α → right-skewed; large α → bell-shaped (approaches Normal)
  5. Conjugate prior: Gamma is conjugate for Poisson and Exponential rate parameters
  6. NOT memoryless: Unlike Exponential, Gamma "remembers" how many events have occurred
  7. ML applications: Precision priors in Bayesian NNs, attention concentration, point processes
The Essence of Gamma:
"Gamma is the patient distribution—it models how long you wait for multiple events. From telephone traffic to neural networks, it captures the sum of random waiting times."
Coming Next: In the next section, we'll explore the Beta Distribution—the distribution of probabilities. You'll see how it models uncertainty about unknown probabilities and serves as the foundation of Bayesian A/B testing.