Chapter 4
Section 3 of 7

Poisson Distribution and Process

Discrete Distributions

Learning Objectives

By the end of this section, you will:

  • Understand the Poisson distribution as a model for counting rare events in fixed intervals
  • Derive the Poisson PMF from the Binomial via the Law of Rare Events
  • Master the signature property: E[X] = Var(X) = λ
  • Understand the Poisson Process and its connection to Exponential inter-arrival times
  • Know that sums of independent Poissons are Poisson
  • Apply to AI/ML: anomaly detection, NLP word frequencies, network traffic modeling

Historical Context: Siméon Denis Poisson

The Birth of the Law of Rare Events

Siméon Denis Poisson (1781-1840), a French mathematician, discovered this distribution while studying an unusual problem: wrongful convictions in court trials.

In his 1837 work "Recherches sur la probabilité des jugements," Poisson asked: If many people are tried, and each has a small probability of wrongful conviction, what is the distribution of total wrongful convictions?

His answer revealed a beautiful pattern: when events are rare but opportunities are many, the Binomial distribution simplifies to something elegant—what we now call the Poisson distribution.

Why the Poisson Distribution Matters

The Poisson distribution emerges naturally when:

  1. Events occur independently—one event doesn't affect others
  2. Events are rare—small probability per opportunity
  3. Many opportunities exist—large number of potential "slots"
  4. Rate is constant—the expected count λ stays fixed
Why This Matters for AI/ML: The Poisson distribution is foundational for modeling events in time and space: website traffic, user clicks, system failures, word frequencies in documents. It's the basis for anomaly detection, queuing theory, and stochastic processes in ML.

The Poisson Distribution: Counting Rare Events

Definition: Poisson Distribution

A random variable X follows a Poisson distribution with rate parameter λ > 0 if it counts the number of events in a fixed interval (time, space, etc.), where events occur independently at constant rate λ.

X \sim \text{Poisson}(\lambda)

Probability Mass Function

The PMF of a Poisson(λ) random variable is:

P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots

Understanding the Formula: Symbol by Symbol

| Symbol | Meaning | Intuition |
| --- | --- | --- |
| λ | Rate parameter | Expected number of events (λ = E[X]) |
| k | Number of events | The count we compute probability for |
| e^(-λ) | Normalization | Probability of zero events; ensures sum = 1 |
| λ^k | Rate to the power k | More events → higher power of rate |
| k! | Factorial | Events are interchangeable (order doesn't matter) |
Intuition: Think of e^(-λ) as the "baseline probability of nothing happening." Going from k-1 events to k events multiplies the probability by λ/k; the accumulated divisions by 1, 2, ..., k form the k! and account for the fact that we don't care about the order of the events.
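
The multiplicative reading of the formula can be checked numerically. The sketch below builds the PMF from P(X = 0) = e^(-λ) and the ratio P(X = k) / P(X = k-1) = λ/k, then compares the result with `scipy.stats.poisson.pmf`. The helper name `poisson_pmf_recursive` and the choice λ = 5 are illustrative assumptions, not part of the original text.

```python
import math

import numpy as np
from scipy.stats import poisson


def poisson_pmf_recursive(lam, k_max):
    """Return [P(X=0), ..., P(X=k_max)] for X ~ Poisson(lam).

    Hypothetical helper: starts at e^(-lam) and multiplies by lam / k
    at each step, which is the ratio P(X=k) / P(X=k-1).
    """
    probs = [math.exp(-lam)]
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * lam / k)
    return np.array(probs)


lam = 5.0  # illustrative rate
pmf_rec = poisson_pmf_recursive(lam, 30)
pmf_ref = poisson.pmf(np.arange(31), lam)

print("max abs difference vs scipy:", np.max(np.abs(pmf_rec - pmf_ref)))
print("sum of P(X=k) for k <= 30:", pmf_rec.sum())
```

Building the PMF this way also avoids computing large factorials and powers directly, which can matter for large k.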

The Signature Property: E[X] = Var(X) = λ

Equidispersion: The Poisson distribution has Mean = Variance = λ, so comparing the sample mean with the sample variance gives a quick check of whether a Poisson model is plausible.

If your data has Var > Mean: overdispersed (consider Negative Binomial)
If your data has Var < Mean: underdispersed (rare, consider Binomial)
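
One quick way to apply this check is the dispersion index, the sample variance divided by the sample mean. The sketch below compares simulated Poisson counts with Negative Binomial counts that share the same mean. The function name `dispersion_index`, the seed, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def dispersion_index(counts):
    """Sample variance divided by sample mean (close to 1 for Poisson data)."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()


# Both samples have mean 5. The Negative Binomial one (n=2, p=2/7) has
# variance n(1-p)/p^2 = 17.5, so its theoretical variance/mean ratio is 3.5.
poisson_counts = rng.poisson(lam=5, size=20_000)
negbin_counts = rng.negative_binomial(n=2, p=2 / 7, size=20_000)

d_poisson = dispersion_index(poisson_counts)
d_negbin = dispersion_index(negbin_counts)
print(f"Poisson sample:           variance/mean = {d_poisson:.2f}")
print(f"Negative Binomial sample: variance/mean = {d_negbin:.2f}")
```

A ratio well above 1 points to overdispersion; a ratio well below 1 points to underdispersion.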

Key Properties

| Property | Formula | Note |
| --- | --- | --- |
| Support | {0, 1, 2, ...} | Can be any non-negative integer |
| Mean | E[X] = λ | Rate is the expected count |
| Variance | Var(X) = λ | Same as mean! |
| Std Dev | σ = √λ | Spread grows with √rate |
| Mode | ⌊λ⌋ | Most likely count (floor of λ); when λ is an integer, λ - 1 is also a mode |
| MGF | M(t) = exp(λ(e^t - 1)) | Useful for proving sum property |
| Skewness | 1/√λ | Approaches symmetric as λ → ∞ |
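
The mean and variance entries follow from a short series manipulation. The derivation below is a standard argument, included as a sketch; it uses the factorial moment E[X(X-1)] as an intermediate step.

```latex
\begin{aligned}
E[X] &= \sum_{k=0}^{\infty} k \, \frac{e^{-\lambda}\lambda^k}{k!}
      = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!}
      = \lambda e^{-\lambda} e^{\lambda} = \lambda, \\
E[X(X-1)] &= \sum_{k=2}^{\infty} k(k-1) \, \frac{e^{-\lambda}\lambda^k}{k!}
      = \lambda^2 e^{-\lambda} \sum_{k=2}^{\infty} \frac{\lambda^{k-2}}{(k-2)!}
      = \lambda^2, \\
\operatorname{Var}(X) &= E[X(X-1)] + E[X] - (E[X])^2
      = \lambda^2 + \lambda - \lambda^2 = \lambda.
\end{aligned}
```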

Interactive: Poisson PMF Explorer

Explore how the Poisson distribution changes with λ. Notice the signature property: Mean and Variance are both equal to λ!



The Poisson Limit Theorem (Law of Rare Events)

Mathematical Statement

The Poisson Limit Theorem

\lim_{n \to \infty} \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k} = \frac{e^{-\lambda} \lambda^k}{k!}

If X_n ~ Binomial(n, λ/n), then as n → ∞ with λ held fixed, X_n converges in distribution to Poisson(λ).

Proof Sketch

Starting with Binomial(n, p) where p = λ/n:

  1. Binomial coefficient: \binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} \approx \frac{n^k}{k!} (for fixed k and large n)
  2. Success term: p^k = \left(\frac{\lambda}{n}\right)^k = \frac{\lambda^k}{n^k}
  3. Failure term: (1-p)^{n-k} = \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-k} \to e^{-\lambda} \cdot 1 = e^{-\lambda}
  4. Combining: \frac{n^k}{k!} \cdot \frac{\lambda^k}{n^k} \cdot e^{-\lambda} = \frac{e^{-\lambda}\lambda^k}{k!}
Key Insight: The famous limit \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^n = e^x, applied with x = -λ, is what turns the binomial failure term into the exponential factor e^{-λ}.

Interactive: Law of Rare Events Demo

Watch the Binomial distribution converge to Poisson as n increases while λ = np stays constant. This is the mathematical foundation of Poisson—why it models rare events!



Sum of Independent Poissons

Sum of Poissons is Poisson

If X₁ ~ Poisson(λ₁) and X₂ ~ Poisson(λ₂) are independent, then:

X_1 + X_2 \sim \text{Poisson}(\lambda_1 + \lambda_2)

Proof Using MGFs

The MGF of Poisson(λ) is M_X(t) = exp(λ(e^t - 1)). For independent X₁ and X₂, the MGF of the sum is the product of the MGFs:

M_{X_1+X_2}(t) = M_{X_1}(t) \cdot M_{X_2}(t) = e^{\lambda_1(e^t-1)} \cdot e^{\lambda_2(e^t-1)} = e^{(\lambda_1+\lambda_2)(e^t-1)}

This is the MGF of Poisson(λ₁ + λ₂). Because an MGF that is finite near t = 0 determines the distribution, the sum is Poisson(λ₁ + λ₂).
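
The same result can be checked without simulation, since the PMF of a sum of independent variables is the convolution of their PMFs. The sketch below uses `np.convolve` on truncated PMFs; the rates 3 and 7 and the truncation point `k_max = 60` are illustrative choices.

```python
import numpy as np
from scipy.stats import poisson

lam1, lam2 = 3.0, 7.0   # illustrative rates
k_max = 60              # truncation point for the PMF arrays
k = np.arange(k_max + 1)

pmf1 = poisson.pmf(k, lam1)
pmf2 = poisson.pmf(k, lam2)

# P(X1 + X2 = m) = sum_j P(X1 = j) * P(X2 = m - j): a discrete convolution.
# Entries 0..k_max of the result only involve indices j <= k_max, so the
# truncation does not affect them.
pmf_sum = np.convolve(pmf1, pmf2)[: k_max + 1]
pmf_direct = poisson.pmf(k, lam1 + lam2)

max_diff = np.max(np.abs(pmf_sum - pmf_direct))
print(f"max |convolution - Poisson({lam1 + lam2})| = {max_diff:.2e}")
```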

Real-world application: If call center A receives calls as a Poisson count with mean 5 per hour and center B independently receives a Poisson count with mean 8 per hour, routing all calls to a single queue gives a Poisson count with mean 13 per hour.

The Poisson Process: Events in Continuous Time

Definition

A Poisson process with rate λ is a counting process {N(t), t ≥ 0} where:

  1. N(0) = 0—process starts with zero events
  2. Independent increments—counts in non-overlapping intervals are independent
  3. Stationary increments—N(t+s) - N(t) depends only on s, not t
  4. N(t) ~ Poisson(λt)—count in any interval [0, t] is Poisson

Key Connection: Inter-arrival Times

Fundamental Result: In a Poisson process with rate λ, the time between consecutive events (inter-arrival time) follows an Exponential(λ) distribution!

T \sim \text{Exponential}(\lambda), \quad E[T] = \frac{1}{\lambda}

This connects Poisson (discrete count) to Exponential (continuous waiting time).
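
The link between the two distributions takes one line to derive. As a sketch: the first arrival time T_1 exceeds t exactly when no events occur in [0, t], and N(t) ~ Poisson(λt) gives that probability directly.

```latex
P(T_1 > t) = P(N(t) = 0) = \frac{e^{-\lambda t}(\lambda t)^0}{0!} = e^{-\lambda t},
\qquad\text{so}\qquad
F_{T_1}(t) = 1 - e^{-\lambda t}, \quad t \ge 0.
```

This is the Exponential(λ) CDF. Independent and stationary increments allow the same argument to be applied to the gap after any event.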

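| Property | Description |
| --- | --- |
| Count N(t) | N(t) ~ Poisson(λt) for any t |
| Inter-arrival T | T ~ Exponential(λ), E[T] = 1/λ |
| Superposition | Merging independent Poisson processes gives a Poisson process whose rate is the sum of the rates |
| Thinning | Keeping each event independently with probability p gives a Poisson process with rate pλ |
| Memoryless | The waiting time to the next event does not depend on how long you have already waited |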
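
Superposition and thinning can be checked by simulation. The sketch below relies on a standard fact not stated above: given N(T) = n, the n event times are distributed like n independent Uniform(0, T) draws. The helper `sample_event_times`, the rates, and the keep probability are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for reproducibility


def sample_event_times(rate, T):
    """Hypothetical helper: event times of a Poisson process on [0, T].

    Draws the count N(T) ~ Poisson(rate * T), then places that many
    events uniformly on [0, T] and sorts them.
    """
    n = rng.poisson(rate * T)
    return np.sort(rng.uniform(0.0, T, size=n))


T, rate_a, rate_b, keep_prob = 1.0, 5.0, 8.0, 0.25
n_runs = 20_000

merged_counts = np.empty(n_runs)
thinned_counts = np.empty(n_runs)
for i in range(n_runs):
    a = sample_event_times(rate_a, T)
    b = sample_event_times(rate_b, T)
    merged = np.concatenate([a, b])            # superposition of the two streams
    kept = a[rng.random(a.size) < keep_prob]   # thinning: keep each event w.p. keep_prob
    merged_counts[i] = merged.size
    thinned_counts[i] = kept.size

# Theory: merged counts ~ Poisson(5 + 8), thinned counts ~ Poisson(0.25 * 5),
# so in each case the sample mean and sample variance should be close.
print(f"merged:  mean={merged_counts.mean():.3f}, var={merged_counts.var(ddof=1):.3f}")
print(f"thinned: mean={thinned_counts.mean():.3f}, var={thinned_counts.var(ddof=1):.3f}")
```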

Interactive: Poisson Process Timeline

Watch events arrive over time according to a Poisson process. Observe how the count in [0, t] follows Poisson(λt) and inter-arrival times follow Exponential(λ).



Real-World Examples

Example 1: Call Center Staffing

Problem: A call center receives an average of 8 calls per hour. What is P(exactly 10 calls in one hour)? What is P(more than 12 calls)?

Solution: X ~ Poisson(8)

P(X = 10) = \frac{e^{-8} \cdot 8^{10}}{10!} \approx 0.0993 \text{ (9.93\%)}

P(X > 12) = 1 - P(X \leq 12) \approx 0.064 \text{ (6.4\%)}

Insight: There's about a 6.4% chance of receiving more than 12 calls, useful for staffing decisions.

Example 2: Quality Control (Defects per Unit)

Problem: A factory produces fabric with an average of 2 defects per 100 meters. What is P(exactly 5 defects in a 200-meter roll)?

Solution: Scale the rate: λ = 2 × (200/100) = 4 defects expected. X ~ Poisson(4)

P(X = 5) = \frac{e^{-4} \cdot 4^5}{5!} \approx 0.156 \text{ (15.6\%)}

Example 3: Website Traffic

Problem: A website gets 50 visitors per minute on average. What is P(fewer than 40 visitors in a given minute)?

Solution: X ~ Poisson(50)

P(X < 40) = P(X \leq 39) \approx 0.065 \text{ (6.5\%)}

Anomaly Detection: Under normal conditions, a minute with fewer than 40 visitors occurs only about 6.5% of the time, so a drop below 40 (especially a sustained one) could indicate a system issue!
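
The three calculations above map directly onto `scipy.stats.poisson`. The sketch below uses `pmf` for an exact count, `sf` (the survival function, P(X > k)) for an upper tail, and `cdf` for a lower tail. Variable names are illustrative.

```python
from scipy.stats import poisson

# Example 1: call center, X ~ Poisson(8)
p_exactly_10 = poisson.pmf(10, 8)      # P(X = 10)
p_more_than_12 = poisson.sf(12, 8)     # P(X > 12) = 1 - P(X <= 12)

# Example 2: fabric defects, rate scaled from 100 m to a 200 m roll
lam_roll = 2 * (200 / 100)             # 4 expected defects
p_exactly_5 = poisson.pmf(5, lam_roll)

# Example 3: website traffic, X ~ Poisson(50)
p_fewer_than_40 = poisson.cdf(39, 50)  # P(X < 40) = P(X <= 39)

print(f"Call center:  P(X = 10) = {p_exactly_10:.4f}")
print(f"Call center:  P(X > 12) = {p_more_than_12:.4f}")
print(f"Fabric roll:  P(X = 5)  = {p_exactly_5:.4f}")
print(f"Web traffic:  P(X < 40) = {p_fewer_than_40:.4f}")
```

Using `sf` rather than `1 - cdf` is generally preferable for small tail probabilities, because it avoids subtracting two nearly equal numbers.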


Interactive: Event Rate Simulator

Explore real-world Poisson scenarios. Select a context, run simulations, and calculate probabilities for practical decision-making.



AI/ML Applications

The Poisson distribution is everywhere in machine learning and AI systems:

1. Anomaly Detection in Network Traffic

Model normal traffic as Poisson(λ_normal):

  • Flag when P(X ≥ observed) < α (too high → DDoS attack?)
  • Flag when P(X ≤ observed) < α (too low → system failure?)
  • Used in: intrusion detection, fraud monitoring, system health checks
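
In a monitoring system it is often convenient to precompute count thresholds rather than evaluate a tail probability for every observation. The sketch below derives per-tail cutoffs from the quantile functions `ppf` and `isf` of `scipy.stats.poisson`. The values `lam_normal = 100` and `alpha = 0.01` are illustrative, and `classify` is a hypothetical helper.

```python
from scipy.stats import poisson

lam_normal = 100   # assumed baseline rate (requests per minute)
alpha = 0.01       # assumed false-alarm level for each tail

# ppf(alpha) is the smallest k with P(X <= k) >= alpha, so one step below it
# is the largest count c with P(X <= c) < alpha.
low_cutoff = poisson.ppf(alpha, lam_normal) - 1
# isf(alpha) is the smallest k with P(X > k) <= alpha, so one step above it
# is the smallest count c whose upper tail P(X >= c) falls below alpha.
high_cutoff = poisson.isf(alpha, lam_normal) + 1


def classify(count):
    """Hypothetical helper: label a count as 'low', 'high', or 'normal'."""
    if count <= low_cutoff:
        return "low"
    if count >= high_cutoff:
        return "high"
    return "normal"


print(f"flag low if count <= {int(low_cutoff)}, high if count >= {int(high_cutoff)}")
print([classify(c) for c in (60, 100, 140)])
```

Because the distribution is discrete, the realized false-alarm rate in each tail is at most `alpha` rather than exactly `alpha`.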

2. Natural Language Processing

Word frequency modeling in documents:

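  • Counts of a rare word in fixed-length documents are often approximated as Poisson
  • Some count-based topic models use a Poisson likelihood for word counts (standard LDA instead uses multinomial word distributions)
  • Some term-weighting schemes, such as BM25, were motivated by Poisson mixture models of term frequency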

3. Recommendation Systems

User activity and click modeling:

  • Clicks per session are often modeled as Poisson counts
  • Poisson Factorization for matrix factorization with count data
  • Used in collaborative filtering and implicit feedback models

4. Reinforcement Learning & Queuing

Event-driven environments:

  • Poisson processes model environment events (arrivals, requests)
  • Continuous-time RL with stochastic event timing
  • Service systems, task scheduling, resource allocation

Interactive: Anomaly Detection with Poisson

See how Poisson distribution powers anomaly detection in a simulated network traffic monitoring system. Watch for DDoS attacks (high traffic) and outages (low traffic).



Python Implementation

```python
import numpy as np
from scipy.stats import poisson, binom
import matplotlib.pyplot as plt

# ================================================
# POISSON DISTRIBUTION BASICS
# ================================================

lambda_rate = 5
X = poisson(lambda_rate)  # "frozen" distribution with the rate fixed

# PMF - P(X = k)
print("Poisson PMF:")
for k in range(15):
    print(f"P(X={k:2d}) = {X.pmf(k):.4f}")

# CDF - P(X <= k)
print(f"\nP(X <= 7) = {X.cdf(7):.4f}")
print(f"P(X > 7) = {X.sf(7):.4f}")  # survival function, equal to 1 - CDF

# Signature property: Mean = Variance = λ
print(f"\nE[X] = {X.mean():.4f}")    # 5.0
print(f"Var(X) = {X.var():.4f}")    # 5.0 (same!)

# Sampling
samples = X.rvs(size=10000)
print(f"Sample mean: {np.mean(samples):.4f}")
print(f"Sample variance: {np.var(samples, ddof=1):.4f}")

# ================================================
# LAW OF RARE EVENTS: Binomial → Poisson
# ================================================

def demonstrate_poisson_limit(lambda_fixed, n_values):
    """Show Binomial(n, λ/n) → Poisson(λ) as n → ∞"""
    k = np.arange(0, 20)
    poisson_pmf = poisson.pmf(k, lambda_fixed)

    print(f"\nConvergence to Poisson({lambda_fixed}):")
    for n in n_values:
        p = lambda_fixed / n
        binomial_pmf = binom.pmf(k, n, p)
        max_error = np.max(np.abs(binomial_pmf - poisson_pmf))
        print(f"n={n:5d}, p={p:.6f}, Max Error: {max_error:.6f}")

demonstrate_poisson_limit(5, [10, 50, 100, 500, 1000, 5000])

# ================================================
# POISSON PROCESS SIMULATION
# ================================================

def simulate_poisson_process(rate, T):
    """Simulate Poisson process on [0, T] using inter-arrivals"""
    events = []
    t = 0.0

    # Inter-arrival times are Exponential(rate); NumPy takes the scale 1/rate
    while True:
        inter_arrival = np.random.exponential(1 / rate)
        t += inter_arrival
        if t >= T:
            break
        events.append(t)

    return np.array(events)

# Simulate
rate = 3  # 3 events per unit time
T = 10    # observe for 10 time units
events = simulate_poisson_process(rate, T)
print(f"\nPoisson Process (rate={rate}, T={T}):")
print(f"Total events: {len(events)}")
print(f"Expected events: {rate * T}")

# Verify inter-arrival distribution (the first gap is measured from time 0)
if len(events) > 1:
    inter_arrivals = np.diff(np.concatenate([[0], events]))
    print(f"Mean inter-arrival: {np.mean(inter_arrivals):.4f} (expected: {1/rate:.4f})")

# ================================================
# SUM OF POISSONS
# ================================================

def verify_sum_property(lambda1, lambda2, n_samples=10000):
    """Verify X1 + X2 ~ Poisson(λ1 + λ2)"""
    X1 = poisson.rvs(lambda1, size=n_samples)
    X2 = poisson.rvs(lambda2, size=n_samples)
    sum_samples = X1 + X2
    direct_samples = poisson.rvs(lambda1 + lambda2, size=n_samples)

    print(f"\nSum of Poisson({lambda1}) + Poisson({lambda2}):")
    print(f"Sum mean: {np.mean(sum_samples):.4f}")
    print(f"Sum variance: {np.var(sum_samples, ddof=1):.4f}")
    print(f"Direct Poisson({lambda1 + lambda2}) mean: {np.mean(direct_samples):.4f}")
    print(f"Theoretical mean: {lambda1 + lambda2:.4f}")

verify_sum_property(3, 7)

# ================================================
# ANOMALY DETECTION
# ================================================

def detect_anomalies(counts, lambda_normal, alpha=0.01):
    """Detect anomalous counts using Poisson model"""
    anomalies = []

    for i, count in enumerate(counts):
        # Two-tailed test
        p_lower = poisson.cdf(count, lambda_normal)      # P(X <= count)
        p_upper = poisson.sf(count - 1, lambda_normal)   # P(X >= count)
        # Doubling the smaller tail can exceed 1, so cap the p-value at 1
        p_value = min(1.0, 2 * min(p_lower, p_upper))

        if p_value < alpha:
            anomaly_type = 'low' if p_lower < p_upper else 'high'
            anomalies.append({
                'index': i,
                'count': count,
                'p_value': p_value,
                'type': anomaly_type
            })

    return anomalies

# Example: website traffic monitoring
np.random.seed(42)
normal_traffic = poisson.rvs(100, size=55)  # Normal minutes
# Inject anomalies
attack_traffic = poisson.rvs(250, size=3)   # DDoS attack
outage_traffic = poisson.rvs(30, size=2)    # System issue
all_traffic = np.concatenate([
    normal_traffic[:15],
    attack_traffic,       # minutes 15-17
    normal_traffic[15:40],
    outage_traffic,       # minutes 43-44
    normal_traffic[40:]
])

anomalies = detect_anomalies(all_traffic, lambda_normal=100, alpha=0.01)
print("\nAnomaly Detection Results:")
print(f"Total anomalies detected: {len(anomalies)}")
for a in anomalies:
    print(f"  Minute {a['index']}: count={a['count']}, p={a['p_value']:.4f}, type={a['type']}")

# ================================================
# VISUALIZATION
# ================================================

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Plot 1: Poisson PMF for different λ values
ax1 = axes[0, 0]
k = np.arange(0, 25)
for lam in [2, 5, 10, 15]:
    ax1.plot(k, poisson.pmf(k, lam), 'o-', label=f'λ={lam}', alpha=0.7)
ax1.set_xlabel('k')
ax1.set_ylabel('P(X = k)')
ax1.set_title('Poisson PMF for Different λ')
ax1.legend()

# Plot 2: Binomial → Poisson convergence
ax2 = axes[0, 1]
lambda_fixed = 5
k = np.arange(0, 15)
ax2.plot(k, poisson.pmf(k, lambda_fixed), 'k-', lw=2, label='Poisson(5)')
for n in [10, 50, 200]:
    p = lambda_fixed / n
    ax2.plot(k, binom.pmf(k, n, p), 'o--', label=f'Bin({n}, {p:.3f})', alpha=0.6)
ax2.set_xlabel('k')
ax2.set_ylabel('P(X = k)')
ax2.set_title('Law of Rare Events')
ax2.legend()

# Plot 3: Sum of Poissons
ax3 = axes[1, 0]
k = np.arange(0, 25)
ax3.plot(k, poisson.pmf(k, 3), 'b-', label='Poisson(3)', alpha=0.7)
ax3.plot(k, poisson.pmf(k, 7), 'g-', label='Poisson(7)', alpha=0.7)
ax3.plot(k, poisson.pmf(k, 10), 'r-', lw=2, label='Poisson(10) = Sum')
ax3.set_xlabel('k')
ax3.set_ylabel('P(X = k)')
ax3.set_title('Sum of Independent Poissons')
ax3.legend()

# Plot 4: Traffic monitoring
ax4 = axes[1, 1]
minutes = np.arange(len(all_traffic))
ax4.plot(minutes, all_traffic, 'b-', alpha=0.7, label='Traffic')
ax4.axhline(y=100, color='g', linestyle='--', label='Expected (λ=100)')

# Highlight anomalies
anomaly_idx = [a['index'] for a in anomalies]
ax4.scatter(anomaly_idx, all_traffic[anomaly_idx], c='r', s=100,
            marker='x', linewidths=2, label='Anomalies')
ax4.set_xlabel('Minute')
ax4.set_ylabel('Requests')
ax4.set_title('Anomaly Detection in Traffic')
ax4.legend()

plt.tight_layout()
plt.savefig('poisson_visualization.png', dpi=150)
plt.show()
```

Common Pitfalls

Pitfall 1: Forgetting to Scale the Rate

Wrong: "λ = 5 calls/hour, so P(10 calls in 2 hours) uses λ = 5"
Right: Scale the rate to the interval: λ for 2 hours = 5 × 2 = 10
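
The size of the error is easy to see numerically. A minimal sketch, using the 5 calls/hour rate from the pitfall above (variable names are illustrative):

```python
from scipy.stats import poisson

rate_per_hour = 5
hours = 2
k = 10

p_wrong = poisson.pmf(k, rate_per_hour)          # forgets to scale: uses lambda = 5
p_right = poisson.pmf(k, rate_per_hour * hours)  # scaled: lambda = 5 * 2 = 10

print(f"unscaled (lambda = 5):  P(X = 10) = {p_wrong:.4f}")
print(f"scaled   (lambda = 10): P(X = 10) = {p_right:.4f}")
```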

Pitfall 2: Ignoring Overdispersion

If your data has Var >> Mean, the Poisson equidispersion assumption is violated! Real-world count data (e.g., user clicks, disease counts) often exhibits overdispersion. Consider Negative Binomial instead.

Pitfall 3: Applying Poisson When Events Aren't Independent

Customer arrivals in groups (families, tours) violate independence. Website traffic during viral events isn't Poisson. Check the independence assumption before modeling!

Pitfall 4: Confusing Poisson Distribution vs Process

Poisson distribution: P(X = k) for a fixed interval
Poisson process: Counts across continuous time with specific properties

Pitfall 5: Using Poisson for Bounded Counts

If there's a maximum possible count (e.g., max 10 people in a room), Poisson may not be appropriate. Consider Binomial or truncated distributions.
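
To see why a hard cap matters, compare how much probability each model places above the maximum. The sketch below assumes a room capacity of 10 and a mean occupancy of 6; both numbers, and the Binomial(10, 0.6) alternative, are illustrative assumptions.

```python
from scipy.stats import binom, poisson

capacity = 10
mean_count = 6.0

# Poisson has unbounded support, so some probability lands on impossible counts.
p_over_poisson = poisson.sf(capacity, mean_count)                   # P(X > 10)
# Binomial(10, 0.6) has the same mean but cannot exceed the capacity.
p_over_binom = binom.sf(capacity, capacity, mean_count / capacity)  # P(X > 10)

print(f"Poisson(6):        P(X > {capacity}) = {p_over_poisson:.4f}")
print(f"Binomial(10, 0.6): P(X > {capacity}) = {p_over_binom:.4f}")
```

The closer the mean sits to the cap, the more mass the Poisson model assigns to counts that cannot occur.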

Conditions for Poisson:
  1. Events occur independently
  2. Events are rare (small probability per "opportunity")
  3. The rate λ is constant over the observation period
  4. Two events cannot occur at exactly the same time
If any condition fails, consider alternative distributions!


Summary

Key Takeaways

  1. Poisson(λ) models counting rare events in fixed intervals. PMF: P(X = k) = e^(-λ) λ^k / k!
  2. Signature property: E[X] = Var(X) = λ (equidispersion). Comparing the sample mean with the sample variance is a quick check of model fit.
  3. Law of Rare Events: Binomial(n, λ/n) → Poisson(λ) as n → ∞. This is why Poisson models rare events.
  4. Sum property: Sum of independent Poissons is Poisson with summed rates.
  5. Poisson process: N(t) ~ Poisson(λt), with inter-arrivals ~ Exponential(λ).
  6. AI/ML applications: Anomaly detection, NLP word frequencies, recommendation systems, queuing.
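Quick Reference

| Property | Formula |
| --- | --- |
| PMF | P(X=k) = e^(-λ)λ^k/k! |
| Mean | E[X] = λ |
| Variance | Var(X) = λ |
| Sum of Poissons | Poi(λ₁) + Poi(λ₂) = Poi(λ₁+λ₂), for independent terms |
| Rate scaling | Count over an interval of length T has parameter λT |
| Inter-arrival | T ~ Exponential(λ), E[T] = 1/λ |
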
Looking Ahead: In the next section, we'll explore the Hypergeometric distribution—what happens when sampling without replacement from a finite population. This is crucial for understanding A/B tests with limited samples and quality control inspections.