Learning Objectives
By the end of this section, you will:
- Understand the Geometric distribution as the waiting time until first success
- Know both parameterizations: "trials until success" vs "failures before success"
- Master the memoryless property and its profound implications
- Extend to the Negative Binomial distribution for r successes
- Prove that NegBin(r, p) = sum of r independent Geometric(p) random variables
- Apply these distributions to AI/ML: retry mechanisms, convergence analysis, early stopping
Historical Context: Waiting Time Problems
From Gambling Tables to Machine Learning
The Geometric distribution emerged from fundamental questions about games of chance in the 17th and 18th centuries. Mathematicians like Blaise Pascal, Christiaan Huygens, and later Pierre-Simon Laplace asked:
- "How many times must I roll a die before getting a 6?"
- "How many cards must I draw before finding an ace?"
- "How many coin flips until the first heads?"
These "waiting time" questions revealed the Geometric distribution and its remarkable memoryless property—a discovery that still surprises students today.
Why This Matters for AI/ML: Geometric and Negative Binomial distributions model fundamental patterns in machine learning systems: API retries, training convergence, packet retransmission, and sequential decision-making. Understanding these distributions helps you design robust, predictable systems.
The Geometric Distribution: Waiting for Success
The Core Question
Consider a Bernoulli trial with success probability p. We keep performing independent trials until we see the first success. The Geometric distribution answers: How many trials does this take?
Definition: Geometric Distribution
A random variable X follows a Geometric distribution with parameter p if X represents the number of trials until (and including) the first success, where each trial succeeds independently with probability p.
Two Parameterizations
Convention 1: X = Trials until success
Support: k = 1, 2, 3, ...
Minimum value: 1 (at least one trial needed)
Convention 2: Y = Failures before success
Support: k = 0, 1, 2, ...
Minimum value: 0 (success on first trial)
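As a quick sketch of how the two conventions relate in code: SciPy's `geom` follows Convention 1 (trials until success), and shifting its support with `loc=-1` recovers Convention 2 (failures before success), since Y = X - 1.

```python
from scipy.stats import geom

p = 0.3

# Convention 1: X = trials until success, support {1, 2, 3, ...}
X = geom(p)

# Convention 2: Y = failures before success = X - 1, support {0, 1, 2, ...}
# SciPy has no separate "failures" geometric; the loc=-1 shift does the job.
Y = geom(p, loc=-1)

print(X.pmf(1), Y.pmf(0))   # both equal p: success on the very first trial
print(X.mean(), Y.mean())   # 1/p vs (1-p)/p, differing by exactly 1
```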
Understanding the PMF Formula
The PMF is P(X = k) = (1-p)^(k-1) × p, and it has a beautiful intuition:
- (1-p)^(k-1)—the first (k-1) trials must all be failures. Each failure has probability (1-p), and they're independent.
- p—the k-th trial must be a success with probability p.
Example: with p = 0.3, what is P(X = 4)? We need: Fail, Fail, Fail, Success = (0.7)³ × 0.3 = 0.343 × 0.3 = 0.1029
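The worked example above can be checked directly against SciPy:

```python
from scipy.stats import geom

# P(X = 4) with p = 0.3: three failures, then a success
p = 0.3
manual = (1 - p)**3 * p
print(manual)            # ≈ 0.1029
print(geom.pmf(4, p))    # matches the manual calculation
```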
Key Properties
| Property | Convention 1 (X = trials) | Convention 2 (Y = failures) |
|---|---|---|
| Support | {1, 2, 3, ...} | {0, 1, 2, ...} |
| PMF | (1-p)^(k-1) × p | (1-p)^k × p |
| Mean E[X] | 1/p | (1-p)/p |
| Variance | (1-p)/p² | (1-p)/p² |
| Mode | 1 | 0 |
| MGF | pe^t / (1-(1-p)e^t) | p / (1-(1-p)e^t) |
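The table entries can be sanity-checked numerically. As one sketch, the Convention 1 MGF formula should agree with E[e^(tX)] computed by summing over a long truncated support (truncation is safe because the terms decay geometrically):

```python
import numpy as np
from scipy.stats import geom

p, t = 0.5, 0.1   # t must satisfy e^t < 1/(1-p) for the MGF to exist

# MGF from the table (Convention 1): M(t) = p e^t / (1 - (1-p) e^t)
mgf_formula = p * np.exp(t) / (1 - (1 - p) * np.exp(t))

# Numerical E[e^{tX}]: sum pmf(k) * e^{tk} over a long truncated support
k = np.arange(1, 500)
mgf_numeric = np.sum(geom.pmf(k, p) * np.exp(t * k))

print(mgf_formula, mgf_numeric)   # should agree to many decimal places
```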
Interactive: Geometric PMF Explorer
Adjust the success probability p and explore how the Geometric distribution changes. Toggle between conventions to see how they differ. Notice that higher p leads to shorter expected waiting times.
Interactive: Waiting Time Simulator
Run experiments to see the Geometric distribution emerge naturally. Watch individual trials unfold, or run hundreds of experiments to see the histogram converge to the theoretical PMF.
The Memoryless Property: A Remarkable Feature
Mathematical Statement
The Memoryless Property
For X ~ Geometric(p) (trials convention) and any non-negative integers s and t:

P(X > s + t | X > s) = P(X > t)

"Given that you've already waited s trials without success, the distribution of remaining wait time equals the original distribution—as if you just started."
Proof
The survival function is P(X > k) = (1-p)^k, since X > k exactly when the first k trials all fail. Using the definition of conditional probability, and noting that {X > s + t} is contained in {X > s}:

P(X > s + t | X > s) = P(X > s + t) / P(X > s) = (1-p)^(s+t) / (1-p)^s = (1-p)^t = P(X > t)
What This Means
The memoryless property is both mathematically beautiful and practically counterintuitive:
- No "due for success": After 10 failures, you're not "closer" to success. The expected remaining wait is still 1/p, exactly as if you just started.
- Independence preserved: The coin doesn't remember past flips. Each trial is fresh and independent.
- Unique property: The Geometric distribution is the only discrete distribution with this property. The Exponential distribution is its continuous analog.
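The property can also be checked exactly (not just by simulation) with SciPy's survival function, `geom.sf(k, p)` = P(X > k):

```python
from scipy.stats import geom

# Exact check of P(X > s+t | X > s) = P(X > t) via the survival function
p, s, t = 0.3, 5, 3
lhs = geom.sf(s + t, p) / geom.sf(s, p)   # P(X > s+t) / P(X > s)
rhs = geom.sf(t, p)                        # P(X > t) = (1-p)^t
print(lhs, rhs)   # equal, up to floating-point error
```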
Interactive: Memoryless Property Demo
Visualize the memoryless property. Set how many trials you've already waited, and see that the conditional distribution of remaining wait time equals the original distribution.
The Negative Binomial Distribution: Waiting for r Successes
Generalization of Geometric
The Geometric distribution answers: "How many trials until the first success?"
The Negative Binomial generalizes this: "How many trials until r successes?"
Definition: Negative Binomial Distribution
X ~ NegBin(r, p) represents the number of trials needed to achieve exactly r successes, where each trial succeeds independently with probability p.
Understanding the PMF
The PMF is P(X = k) = C(k-1, r-1) × p^r × (1-p)^(k-r) for k = r, r+1, r+2, ..., and the formula has three intuitive parts:
- C(k-1, r-1)—on trial k, we get the r-th success. In the previous (k-1) trials, we had exactly (r-1) successes. This binomial coefficient counts the arrangements.
- p^r—probability of r successes, each with probability p.
- (1-p)^(k-r)—probability of (k-r) failures.
Key Properties
| Property | Formula |
|---|---|
| Support | {r, r+1, r+2, ...} |
| Mean | E[X] = r/p |
| Variance | Var(X) = r(1-p)/p² |
| Mode | ⌊r + (r-1)(1-p)/p⌋ for r ≥ 1 |
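A quick numerical sanity check of the mean and mode formulas, as a sketch: build the trials-convention PMF from SciPy's `nbinom` (which counts failures, so k trials corresponds to k - r failures) and compare against the table.

```python
import numpy as np
from scipy.stats import nbinom

# Verify the mean and mode formulas for the trials convention.
r, p = 5, 0.3

k = np.arange(r, 200)              # truncated support; tail mass is negligible
pmf = nbinom.pmf(k - r, r, p)      # k trials = (k - r) failures

mean_numeric = np.sum(k * pmf)     # should be close to r/p
mode_numeric = k[np.argmax(pmf)]   # should match the table formula

print(mean_numeric, r / p)                                       # ≈ 16.67 both
print(mode_numeric, int(np.floor(r + (r - 1) * (1 - p) / p)))    # 14 both
```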
Interactive: Negative Binomial Explorer
Explore how the Negative Binomial distribution changes with r (successes needed) and p (success probability). Set r = 1 to see it become the Geometric distribution!
Interactive: Sum of Geometrics = Negative Binomial
A fundamental result: if X₁, X₂, ..., Xᵣ are independent Geometric(p) random variables, then their sum X₁ + X₂ + ... + Xᵣ follows NegBin(r, p). Watch this relationship in action!
Real-World Examples
Example 1: Quality Control Inspection
Problem: A quality inspector tests products. Each product has a 5% defect rate. How many products must be inspected, on average, to find the first defective one?
Solution: X ~ Geometric(0.05), so E[X] = 1/p = 1/0.05 = 20.
On average, 20 products must be inspected to find the first defect.
Example 2: Customer Acquisition
Problem: A salesperson has a 3% conversion rate. What's the probability of making the first sale within 50 calls?
Solution: X ~ Geometric(0.03), so P(X ≤ 50) = 1 - (1-p)^50 = 1 - (0.97)^50 ≈ 0.782.
There's about a 78.2% chance of making the first sale within 50 calls.
Example 3: Clinical Trial Enrollment
Problem: A clinical trial needs 10 patients who meet specific criteria. 40% of screened patients qualify. How many patients must be screened on average?
Solution: X ~ NegBin(10, 0.4), so E[X] = r/p = 10/0.4 = 25.
On average, 25 patients must be screened to find 10 who qualify.
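All three worked examples can be reproduced in a few lines with SciPy (remembering that `nbinom` counts failures, so we add r to get total trials):

```python
from scipy.stats import geom, nbinom

# Example 1: expected inspections until the first defect (p = 0.05)
print(geom.mean(0.05))          # ≈ 20

# Example 2: P(first sale within 50 calls) with p = 0.03
print(geom.cdf(50, 0.03))       # ≈ 0.782

# Example 3: expected screenings for 10 qualifying patients (p = 0.4)
r, p = 10, 0.4
print(nbinom.mean(r, p) + r)    # ≈ 25
```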
AI/ML Applications
Geometric and Negative Binomial distributions are foundational in machine learning systems:
1. Retry Mechanisms in Distributed Systems
API calls, database connections, and network requests often fail probabilistically. The Geometric distribution models retries until success:
- Expected retries = 1/p (where p = success probability per attempt)
- Used to set timeout values and max retry limits
- Exponential backoff strategies account for this distribution
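The retry pattern above can be sketched as a simulation; the 70% per-attempt success rate here is a hypothetical value for illustration. The empirical mean number of attempts should land near 1/p:

```python
import numpy as np

# Simulation sketch: each attempt succeeds with probability p (hypothetical
# success rate), so attempts-until-success follows Geometric(p).
rng = np.random.default_rng(42)

def attempts_until_success(p):
    attempts = 1
    while rng.random() >= p:   # failure: retry
        attempts += 1
    return attempts

p = 0.7
samples = [attempts_until_success(p) for _ in range(10_000)]
print(np.mean(samples), 1 / p)   # empirical mean ≈ theoretical 1/p ≈ 1.43
```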
2. Training Convergence Analysis
If each training epoch has probability p of achieving target loss:
- Expected epochs to converge: E[X] = 1/p
- Use to set patience parameters for early stopping
- Helps estimate total training time
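One way to turn this into a patience setting, under the strong simplifying assumption that each epoch independently hits the target with probability p: pick the smallest n with 1 - (1-p)^n ≥ 0.95, i.e. n ≥ log(0.05)/log(1-p). The helper name below is hypothetical.

```python
import math

def patience_for_confidence(p, confidence=0.95):
    """Smallest n with P(converge within n epochs) >= confidence,
    assuming epochs-to-converge is Geometric(p)."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

print(patience_for_confidence(0.1))   # 29 epochs for p = 0.1
```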
3. Sequential Testing & Early Stopping
In A/B testing and bandit algorithms:
- NegBin(r, p) models trials until r "significant" results
- Used in sequential hypothesis testing frameworks
- Helps balance exploration vs exploitation
4. Network Packet Retransmission
TCP and other protocols retransmit packets on failure:
- X ~ Geometric(1 - packet_loss_rate)
- Used for buffer sizing and timeout calculation
- Critical for real-time ML inference systems
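As a sketch with an illustrative 20% loss rate: the mean gives average sends per packet, and a high quantile (via `ppf`) gives the retransmission budget needed to deliver almost all packets.

```python
from scipy.stats import geom

# Expected transmissions per packet: success probability per send
# is p = 1 - loss_rate (loss rate here is an illustrative value)
loss_rate = 0.2
X = geom(1 - loss_rate)

print(X.mean())        # 1.25 sends per packet on average
print(X.ppf(0.999))    # 5 sends cover 99.9% of packets
```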
Interactive: Retry & Convergence Demo
Explore how Geometric distributions model real AI/ML scenarios. Choose from API retries, training convergence, or network packets, and see how the observed distribution matches theoretical predictions.
Python Implementation
```python
import numpy as np
from scipy.stats import geom, nbinom
import matplotlib.pyplot as plt

# ================================================
# GEOMETRIC DISTRIBUTION
# ================================================

# scipy.stats.geom uses "number of trials" convention
p = 0.3
X = geom(p)

# PMF - P(X = k)
print(f"P(X=1) = {X.pmf(1):.4f}")  # First trial success
print(f"P(X=3) = {X.pmf(3):.4f}")  # Success on third trial
print(f"P(X=5) = {X.pmf(5):.4f}")

# CDF - P(X <= k)
print(f"\nP(X <= 5) = {X.cdf(5):.4f}")
print(f"P(X > 10) = {1 - X.cdf(10):.4f}")

# Mean and Variance
print(f"\nE[X] = {X.mean():.4f}")   # 1/p = 3.33
print(f"Var(X) = {X.var():.4f}")    # (1-p)/p² = 7.78

# Sampling
samples = X.rvs(size=10000)
print(f"Sample mean: {np.mean(samples):.4f}")

# ================================================
# NEGATIVE BINOMIAL DISTRIBUTION
# ================================================

# scipy.stats.nbinom counts FAILURES before r successes
# So for "trials until r successes", add r to the result

r, p = 5, 0.4
# nbinom(r, p) counts failures before r successes
Y_failures = nbinom(r, p)

# Mean number of FAILURES before r successes
print(f"\nNegBin({r}, {p}) - failures convention:")
print(f"E[failures] = {Y_failures.mean():.4f}")  # r(1-p)/p

# To get TOTAL TRIALS, add r to failures:
def negbin_trials_pmf(k, r, p):
    """PMF for total trials until r successes"""
    if k < r:
        return 0
    # k trials = (k-r) failures before r-th success
    return nbinom.pmf(k - r, r, p)

# Expected total trials
expected_trials = Y_failures.mean() + r  # = r/p
print(f"E[total trials] = {expected_trials:.4f}")  # r/p = 12.5

# ================================================
# KEY RELATIONSHIP: Sum of Geometrics = NegBin
# ================================================

def demonstrate_sum_property(r, p, n_simulations=10000):
    """Show that sum of r Geometric(p) = NegBin(r, p)"""

    # Method 1: Sum of r independent Geometrics
    geometric_sums = np.zeros(n_simulations)
    for _ in range(r):
        geometric_sums += geom.rvs(p, size=n_simulations)

    # Method 2: Direct NegBin sampling (add r to get total trials)
    negbin_samples = nbinom.rvs(r, p, size=n_simulations) + r

    print(f"\nSum of {r} Geometric({p}) vs NegBin({r}, {p}):")
    print(f"Geometric sum mean: {np.mean(geometric_sums):.4f}")
    print(f"NegBin mean: {np.mean(negbin_samples):.4f}")
    print(f"Theoretical E[X] = r/p: {r/p:.4f}")

demonstrate_sum_property(5, 0.4)

# ================================================
# MEMORYLESS PROPERTY VERIFICATION
# ================================================

def verify_memoryless(p, s, n_simulations=100000):
    """Verify P(X > s+t | X > s) = P(X > t)"""
    samples = geom.rvs(p, size=n_simulations)

    # Samples where X > s (condition)
    conditional_samples = samples[samples > s]

    # For these, compute remaining wait = X - s
    remaining = conditional_samples - s

    # Fresh samples for comparison
    fresh_samples = geom.rvs(p, size=len(remaining))

    print(f"\nMemoryless property (p={p}, s={s}):")
    print(f"E[X | X > s] = {np.mean(conditional_samples):.4f}")
    print(f"E[remaining after s] = {np.mean(remaining):.4f}")
    print(f"E[fresh start] = {np.mean(fresh_samples):.4f}")
    print(f"Theoretical E[X] = {1/p:.4f}")

verify_memoryless(0.3, 5)

# ================================================
# AI/ML APPLICATION: Retry Analysis
# ================================================

def analyze_retries(success_prob, max_retries=10):
    """Analyze retry mechanism using Geometric distribution"""
    X = geom(success_prob)

    print(f"\nRetry Analysis (success_prob = {success_prob}):")
    print(f"Expected attempts: {X.mean():.2f}")
    print(f"Std deviation: {X.std():.2f}")

    # Probability of success within max_retries
    p_success = X.cdf(max_retries)
    print(f"P(success in ≤{max_retries} attempts): {p_success:.4f}")

    # 95th percentile - worst case planning
    p95 = X.ppf(0.95)
    print(f"95th percentile (worst case): {p95:.0f} attempts")

    return X

# Example: API with 70% success rate
analyze_retries(0.7)

# ================================================
# VISUALIZATION
# ================================================

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Plot 1: Geometric PMF for different p values
ax1 = axes[0]
k = np.arange(1, 21)
for p in [0.2, 0.4, 0.6, 0.8]:
    ax1.bar(k + (p-0.5)*0.15, geom.pmf(k, p), width=0.15,
            label=f'p={p}', alpha=0.7)
ax1.set_xlabel('k (trials until success)')
ax1.set_ylabel('P(X = k)')
ax1.set_title('Geometric PMF')
ax1.legend()

# Plot 2: Negative Binomial PMF for different r values
ax2 = axes[1]
k = np.arange(5, 31)
p = 0.4
for r in [5, 10, 15]:
    probs = [negbin_trials_pmf(ki, r, p) for ki in k]
    ax2.plot(k, probs, 'o-', label=f'r={r}', alpha=0.7, markersize=4)
ax2.set_xlabel('k (trials until r successes)')
ax2.set_ylabel('P(X = k)')
ax2.set_title(f'Negative Binomial PMF (p={p})')
ax2.legend()

plt.tight_layout()
plt.savefig('geometric_negbin.png', dpi=150)
plt.show()
```

Common Pitfalls
Pitfall 1: Convention confusion. Different sources use different conventions (trials vs failures). SciPy's geom uses trials until success, but nbinom counts failures. Always check the documentation!
Pitfall 2: The gambler's fallacy. Believing that past failures make success more likely. The memoryless property proves this is false—each trial is independent of the past.
Pitfall 3: Assuming independence. The Geometric distribution requires independent trials. If trials are correlated (e.g., learning from failures), use a different model.
Pitfall 4: Mixing up the means.
- Geometric: E[X] = 1/p (trials until success)
- Negative Binomial: E[X] = r/p (trials until r successes)
Don't mix them up!
These distributions apply only when:
- Each trial has exactly two outcomes (success/failure)
- Trials are independent
- Success probability p is constant across all trials
- We're counting trials until a fixed number of successes
Test Your Understanding
Summary
Key Takeaways
- Geometric(p) models waiting time until the first success. Mean = 1/p, Variance = (1-p)/p².
- Memoryless property: P(X > s+t | X > s) = P(X > t). The Geometric is the only discrete memoryless distribution.
- Negative Binomial(r, p) models trials until r successes. Mean = r/p, Variance = r(1-p)/p².
- Key relationship: NegBin(1, p) = Geometric(p), and NegBin(r, p) = sum of r independent Geometric(p) random variables.
- AI/ML applications: Retry mechanisms, training convergence, network protocols, sequential testing, and early stopping.
Quick Reference
| Distribution | PMF | Mean | Variance |
|---|---|---|---|
| Geometric(p) | (1-p)^(k-1) × p | 1/p | (1-p)/p² |
| NegBin(r,p) | C(k-1,r-1) p^r (1-p)^(k-r) | r/p | r(1-p)/p² |
Looking Ahead: In the next section, we'll explore the Poisson distribution—what happens when we model the number of events in a fixed interval of time or space. The Poisson emerges as a limit of the Binomial for rare events and provides the foundation for Poisson processes in machine learning.