Learning Objectives
By the end of this section, you will be able to:
📚 Core Knowledge
- Define a Poisson process and its three fundamental properties
- Derive the exponential distribution of inter-arrival times
- Explain the connection to order statistics and Gamma distributions
- Understand superposition and thinning operations
🔧 Practical Skills
- Simulate homogeneous Poisson processes from scratch
- Use the thinning algorithm for time-varying intensities
- Model aggregate claims with compound Poisson processes
- Apply these concepts to real-world event modeling
🧠 Deep Learning Connections
- Neural Point Processes: modern deep learning extends Poisson processes by learning intensity functions from data
- Temporal Event Prediction: predicting when the next user action, transaction, or system failure will occur
- Attention Mechanisms: transformer-based point process models use self-attention to weigh how past events influence the rate of future events
- Reinforcement Learning: Poisson processes model random events in environment dynamics
Where You'll Apply This: Queuing systems, call centers, network traffic analysis, insurance claims modeling, stock price jump models, customer arrival prediction, epidemiology, and temporal event prediction in neural networks.
The Big Picture
The Poisson process is the fundamental model for random events occurring in time. Unlike the Poisson distribution (which counts events in a fixed interval), a Poisson process describes the entire sequence of arrival times—a continuous-time stochastic process.
The Core Insight
A Poisson process answers: "When do random events happen?" It models events that occur independently and at a constant average rate—like radioactive decay, customer arrivals, or server requests. The beautiful mathematics: while the process is continuous in time, the count in any interval follows a discrete Poisson distribution.
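- Continuous time: events can occur at any moment
- Memoryless: the waiting time until the next event does not depend on how long it has been since the last one
- Stationary: the same statistics hold over any time window of the same length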
Historical Context
Siméon Denis Poisson (1837)
Poisson derived the famous distribution while studying court judgments, but didn't develop the process theory. His distribution models "rare events"—the count of occurrences when each individual event has tiny probability.
Agner Krarup Erlang (1909)
Working for the Copenhagen Telephone Company, Erlang modeled telephone call arrivals as a Poisson process, laying the foundations of queuing theory and teletraffic engineering.
Ernest Rutherford & Hans Geiger (1910)
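Rutherford and Geiger counted the alpha particles emitted by a radioactive source in successive time intervals and found that the counts matched the Poisson distribution, consistent with each atom decaying independently at a constant rate. Their data became a classic physical confirmation of the model.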
Mathematical Definition
A counting process $\{N(t), t \ge 0\}$ counts the number of events that occur by time $t$, starting from $N(0) = 0$. It is a Poisson process if it satisfies three fundamental properties:
The Three Defining Properties
1. Independent Increments
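The numbers of events in disjoint time intervals are independent random variables: for $0 \le t_0 < t_1 < \cdots < t_k$, the increments $N(t_1) - N(t_0), \ldots, N(t_k) - N(t_{k-1})$ are independent.
2. Stationary Increments
The distribution of counts depends only on the length of the interval, not its position: $N(s + t) - N(s)$ has the same distribution as $N(t)$ for all $s, t \ge 0$.
3. Orderliness (No Simultaneous Events)
Events occur one at a time: the probability of two or more events in a tiny interval is negligible, $P(N(h) \ge 2) = o(h)$ as $h \to 0$.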
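These properties (with $N(0) = 0$) uniquely determine that $N(t) \sim \text{Poisson}(\lambda t)$:
$$P(N(t) = k) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}, \qquad k = 0, 1, 2, \ldots$$
where $\lambda > 0$ is the rate parameter (events per unit time).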
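As a quick numerical check of this formula, the sketch below evaluates the probability mass function directly with Python's standard library. The rate λ = 2 and window t = 3 are hypothetical values chosen for illustration.

```python
import math

def poisson_pmf(k: int, rate: float, t: float) -> float:
    """P(N(t) = k) for a homogeneous Poisson process with the given rate."""
    mu = rate * t  # expected number of events in [0, t]
    return mu**k * math.exp(-mu) / math.factorial(k)

# Hypothetical example: λ = 2 events per unit time over a window t = 3 (λt = 6)
probs = [poisson_pmf(k, rate=2.0, t=3.0) for k in range(60)]
mean = sum(k * p for k, p in enumerate(probs))

print(f"P(N(3) = 6) = {probs[6]:.4f}")
print(f"total mass  = {sum(probs):.6f}")  # ~1 (the tail beyond k = 59 is negligible)
print(f"mean        = {mean:.4f}")        # ~λt = 6
```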
| Property | Formula | Interpretation |
|---|---|---|
| Mean | E[N(t)] = λt | Expected number of events grows linearly with time |
| Variance | Var(N(t)) = λt | Mean equals variance (Poisson property) |
| Standard deviation | σ = √(λt) | Uncertainty grows as the square root of time |
| Rate | λ = E[N(t)]/t | Average events per unit time |
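The mean-equals-variance property in the table can be checked empirically. The sketch below simulates N(t) by adding Exponential(λ) gaps until the window is exceeded (the inter-arrival property covered later in this section); the rate, window length, seed, and number of replications are arbitrary illustrative choices.

```python
import random
import statistics

def count_events(rate: float, t: float, rng: random.Random) -> int:
    """Simulate N(t) by accumulating Exponential(rate) gaps until time t is passed."""
    n, clock = 0, rng.expovariate(rate)
    while clock <= t:
        n += 1
        clock += rng.expovariate(rate)
    return n

rng = random.Random(42)
rate, t = 4.0, 2.5  # hypothetical values, so λt = 10
counts = [count_events(rate, t, rng) for _ in range(20_000)]

print("sample mean:    ", statistics.fmean(counts))     # should be close to λt = 10
print("sample variance:", statistics.variance(counts))  # should also be close to 10
print("sample std dev: ", statistics.stdev(counts))     # close to √10 ≈ 3.16
```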
Interactive: Process Timeline
Simulate a Poisson process and observe how events arrive over time. Notice how the count in each unit interval follows a Poisson distribution, and the inter-arrival times follow an exponential distribution.
Figure (interactive): Poisson Process: Events Over Time, with a panel showing the count distribution per unit interval.
Key Properties of the Poisson Process
1. N(t) ~ Poisson(λt), and more generally N(s + t) − N(s) ~ Poisson(λt) for any interval of length t
2. Inter-arrival times ~ Exponential(λ)
3. Independent increments: counts in non-overlapping intervals are independent
4. Stationary increments: the count distribution depends only on the interval length
Inter-arrival Times
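One of the most important properties of the Poisson process is that inter-arrival times are exponentially distributed. Let $T_n$ denote the time between the $(n-1)$-th and $n$-th event, with $T_1$ the time of the first event. The first waiting time exceeds $t$ exactly when no event occurs in $[0, t]$, so $P(T_1 > t) = P(N(t) = 0) = e^{-\lambda t}$. By independent and stationary increments, the same argument applies after each arrival, which makes the gaps i.i.d.
Inter-arrival Time Distribution
$$T_n \overset{\text{i.i.d.}}{\sim} \text{Exponential}(\lambda), \qquad f_T(t) = \lambda e^{-\lambda t}, \quad t \ge 0, \qquad E[T_n] = \frac{1}{\lambda}$$
Arrival Time Distribution
The $n$-th arrival time $S_n = T_1 + T_2 + \cdots + T_n$ is the sum of $n$ independent exponential random variables. This sum follows a Gamma distribution with shape $n$ and scale $1/\lambda$ (also called the Erlang distribution):
$$S_n \sim \text{Gamma}(n, 1/\lambda), \qquad f_{S_n}(t) = \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n-1)!}, \quad t \ge 0$$
with mean $E[S_n] = n/\lambda$ and variance $\text{Var}(S_n) = n/\lambda^2$.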
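To see the Gamma result numerically, a minimal sketch can build S_n as a running sum of exponential gaps and compare the sample mean and variance of S_5 with n/λ and n/λ². The choice λ = 2, n = 5, and the seed are illustrative assumptions.

```python
import random
import statistics

def arrival_times(rate: float, n: int, rng: random.Random) -> list[float]:
    """Return S_1, ..., S_n as cumulative sums of i.i.d. Exponential(rate) gaps."""
    times, s = [], 0.0
    for _ in range(n):
        s += rng.expovariate(rate)  # T_k ~ Exponential(rate)
        times.append(s)
    return times

rng = random.Random(0)
rate, n = 2.0, 5  # hypothetical rate and arrival index
s_n = [arrival_times(rate, n, rng)[-1] for _ in range(50_000)]

print("mean of S_5:    ", statistics.fmean(s_n), "  theory n/λ  =", n / rate)
print("variance of S_5:", statistics.variance(s_n), "  theory n/λ² =", n / rate**2)
```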
Interactive: Arrival Times
Explore the relationship between inter-arrival times (exponential) and arrival times (Gamma). Run multiple simulations to see how the empirical distributions match the theoretical predictions.
Arrival Times & Order Statistics
A remarkable property: given N(T) = n events in [0, T], the arrival times are distributed as the order statistics of n uniform random variables on [0, T]. Additionally, the n-th arrival time Sₙ follows a Gamma(n, 1/λ) distribution.
Key Mathematical Results
1. Arrival Time Distribution: Sₙ ~ Gamma(n, 1/λ), because it is the sum of n independent Exp(λ) random variables.
2. Expected Value: E[Sₙ] = n/λ
3. Variance: Var(Sₙ) = n/λ²
4. Order Statistics: Given N(T) = n, the arrival times are distributed like the order statistics of n Uniform(0, T) random variables.
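The order-statistics result gives a second way to simulate a homogeneous process on [0, T]: draw the total count N(T) first, then scatter that many uniform points and sort them. The sketch below draws the Poisson count with Knuth's multiplication method, a simple stdlib-only sampler that is adequate for moderate λT (a swapped-in choice, not part of the result itself); the rate, horizon, and seeds are hypothetical.

```python
import math
import random

def poisson_count(mu: float, rng: random.Random) -> int:
    """Knuth's multiplication method for Poisson(mu); fine while exp(-mu) does not underflow."""
    threshold, k, prod = math.exp(-mu), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def poisson_process_via_uniforms(rate: float, T: float, rng: random.Random) -> list[float]:
    """Simulate arrivals on [0, T] using the conditional-uniform property."""
    n = poisson_count(rate * T, rng)                      # Step 1: N(T) ~ Poisson(λT)
    return sorted(rng.uniform(0.0, T) for _ in range(n))  # Step 2: order statistics of n Uniform(0, T)

times = poisson_process_via_uniforms(rate=3.0, T=10.0, rng=random.Random(1))
print(len(times), "arrivals; first few:", [round(x, 3) for x in times[:5]])
```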
Superposition and Thinning
Two fundamental operations on Poisson processes allow us to combine and decompose them. These are inverses of each other:
Superposition (Merging)
The sum of independent Poisson processes is a Poisson process whose rate is the sum of the individual rates.
Example: Combining arrivals from two entrances
Thinning (Splitting)
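Randomly classifying events creates independent Poisson processes: if each event of a rate-λ process is independently assigned to type 1 with probability p, the type-1 and type-2 streams are independent Poisson processes with rates pλ and (1 − p)λ.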
Example: Splitting customers by product interest
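Thinning can be checked with a short simulation: generate one stream, label each event independently, and compare the empirical rates of the two sub-streams with pλ and (1 − p)λ. The rate of 10 customers per hour, the 30% share interested in product A, the horizon, and the seed are hypothetical.

```python
import random

def simulate_pp(rate: float, T: float, rng: random.Random) -> list[float]:
    """Homogeneous Poisson arrivals on [0, T] from Exponential(rate) gaps."""
    times, t = [], rng.expovariate(rate)
    while t <= T:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def split(times: list[float], p: float, rng: random.Random) -> tuple[list[float], list[float]]:
    """Label each event type A with probability p, otherwise type B, independently."""
    type_a, type_b = [], []
    for t in times:
        (type_a if rng.random() < p else type_b).append(t)
    return type_a, type_b

rng = random.Random(7)
rate, p, T = 10.0, 0.3, 1000.0  # hypothetical: 10 customers/hour, 30% want product A
stream = simulate_pp(rate, T, rng)
type_a, type_b = split(stream, p, rng)

print("rate of A:", len(type_a) / T, "  theory pλ     =", p * rate)
print("rate of B:", len(type_b) / T, "  theory (1-p)λ =", (1 - p) * rate)
```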
Interactive: Operations
Experiment with superposition (merging streams) and thinning (splitting by type). Observe how the rates add in superposition and split proportionally in thinning.
Superposition & Thinning
Two fundamental operations on Poisson processes: superposition merges independent processes into one, while thinning splits a process into independent sub-processes. These operations are inverses of each other.
Superposition Theorem
If N₁(t) and N₂(t) are independent Poisson processes with rates λ₁ and λ₂, then their superposition N(t) = N₁(t) + N₂(t) is a Poisson process with rate λ = λ₁ + λ₂.
PP(λ₁) + PP(λ₂) = PP(λ₁ + λ₂)
This extends to any finite number of independent Poisson processes.
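A minimal sketch of superposition: simulate two independent streams (say, two entrances with hypothetical rates λ₁ = 2 and λ₂ = 5 per unit time), merge them in time order, and check that the merged stream has rate λ₁ + λ₂ = 7 and mean gap close to 1/7. `heapq.merge` is used only because both inputs are already sorted.

```python
import heapq
import random

def simulate_pp(rate: float, T: float, rng: random.Random) -> list[float]:
    """Homogeneous Poisson arrivals on [0, T] from Exponential(rate) gaps."""
    times, t = [], rng.expovariate(rate)
    while t <= T:
        times.append(t)
        t += rng.expovariate(rate)
    return times

rng = random.Random(11)
T = 1000.0
entrance_1 = simulate_pp(2.0, T, rng)  # hypothetical λ₁ = 2
entrance_2 = simulate_pp(5.0, T, rng)  # hypothetical λ₂ = 5
merged = list(heapq.merge(entrance_1, entrance_2))  # superposition, kept in time order

gaps = [b - a for a, b in zip(merged, merged[1:])]
print("merged rate:", len(merged) / T, "  theory λ₁ + λ₂ =", 7.0)
print("mean gap:   ", sum(gaps) / len(gaps), "  theory 1/7 ≈", round(1 / 7, 4))
```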
Inhomogeneous Poisson Processes
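When the arrival rate varies with time, we have an inhomogeneous (non-homogeneous) Poisson process with time-varying intensity function $\lambda(t)$.
Inhomogeneous Poisson Process
$$N(t) - N(s) \sim \text{Poisson}\big(\Lambda(t) - \Lambda(s)\big), \qquad \Lambda(t) = \int_0^t \lambda(u)\,du$$
where $\Lambda(t)$ is the cumulative intensity (or integrated rate). Counts in disjoint intervals are still independent, but the increments are no longer stationary.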
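As a worked example of the cumulative intensity, the sketch below integrates a hypothetical daily intensity λ(t) = 5 + 3 sin(2πt/24), with t in hours, using the trapezoidal rule. Over a full 24-hour cycle the sine term integrates to zero, so Λ(24) = 120; over the first 6 hours, Λ(6) = 30 + 36/π ≈ 41.46. The last line uses P(no events in [0, t]) = e^(−Λ(t)).

```python
import math
from typing import Callable

def intensity(t: float) -> float:
    """Hypothetical intensity: 5 events/hour on average, ±3 over a 24-hour cycle."""
    return 5.0 + 3.0 * math.sin(2 * math.pi * t / 24.0)

def cumulative_intensity(lam: Callable[[float], float], t: float, steps: int = 10_000) -> float:
    """Approximate Λ(t) = ∫_0^t λ(u) du with the trapezoidal rule."""
    h = t / steps
    total = 0.5 * (lam(0.0) + lam(t)) + sum(lam(i * h) for i in range(1, steps))
    return total * h

lam_24 = cumulative_intensity(intensity, 24.0)
lam_6 = cumulative_intensity(intensity, 6.0)
print("E[N(24)] = Λ(24) ≈", round(lam_24, 4))
print("E[N(6)]  = Λ(6)  ≈", round(lam_6, 4), "  exact:", round(30 + 36 / math.pi, 4))
print("P(no events in first hour) =", math.exp(-cumulative_intensity(intensity, 1.0)))
```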
Real-world examples of time-varying intensity:
- Call centers: Peak hours in morning and afternoon, low at night
- Website traffic: Spikes during promotions, lower on weekends
- Hospital admissions: Seasonal flu patterns, weekly cycles
- Financial markets: Higher trading activity near the market open and close
The Thinning Algorithm
To simulate an inhomogeneous Poisson process, we use the thinning algorithm (Lewis & Shedler, 1979):
- Find $\lambda_{\max} = \sup_t \lambda(t)$, the maximum intensity
- Generate events from a homogeneous Poisson process with rate $\lambda_{\max}$
- For each event at time $t$, accept it with probability $\lambda(t)/\lambda_{\max}$
- The accepted events form the inhomogeneous Poisson process
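The steps above translate into a short stdlib-only sketch. The intensity is a hypothetical daily pattern λ(t) = 5 + 3 sin(2πt/24), bounded by λ_max = 8; any upper bound works, but a tighter bound rejects fewer candidates. Averaging many runs should give roughly Λ(24) = 120 accepted and 8·24 − 120 = 72 rejected events.

```python
import math
import random
from typing import Callable

def thinning(lam: Callable[[float], float], lam_max: float, T: float,
             rng: random.Random) -> tuple[list[float], list[float]]:
    """Lewis & Shedler thinning on [0, T]; returns (accepted, rejected) times."""
    accepted, rejected = [], []
    t = rng.expovariate(lam_max)  # candidates from a homogeneous PP(λ_max)
    while t <= T:
        ratio = lam(t) / lam_max
        if ratio > 1.0:
            raise ValueError(f"lam({t:.3f}) exceeds lam_max")
        (accepted if rng.random() < ratio else rejected).append(t)  # keep w.p. λ(t)/λ_max
        t += rng.expovariate(lam_max)
    return accepted, rejected

def intensity(t: float) -> float:
    return 5.0 + 3.0 * math.sin(2 * math.pi * t / 24.0)  # hypothetical; maximum is 8

rng = random.Random(3)
runs = [thinning(intensity, 8.0, 24.0, rng) for _ in range(2000)]
avg_accepted = sum(len(acc) for acc, _ in runs) / len(runs)
avg_rejected = sum(len(rej) for _, rej in runs) / len(runs)
print("average accepted:", avg_accepted, "  theory Λ(24) = 120")
print("average rejected:", avg_rejected, "  theory 8·24 − 120 = 72")
```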
Interactive: Time-Varying Rates
Visualize inhomogeneous Poisson processes with different intensity functions. Toggle "Show rejected events" to see the thinning algorithm in action.
Inhomogeneous (Non-Homogeneous) Poisson Process
When the arrival rate varies with time, we have an inhomogeneous Poisson process with time-varying intensity function λ(t). This is simulated using the thinning algorithm: generate events at the maximum rate, then randomly accept or reject each one based on the local intensity.
The Thinning Algorithm (Lewis & Shedler, 1979)
1. Find λ_max = supₜ λ(t), the maximum intensity over the time interval
2. Generate events from a homogeneous Poisson process with rate λ_max
3. For each event at time t, accept it with probability λ(t)/λ_max
4. The accepted events form an inhomogeneous Poisson process with intensity λ(t)
Compound Poisson Processes
A compound Poisson process generalizes the Poisson process by adding random "jump sizes" at each arrival. Instead of counting events, we aggregate random amounts:
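Compound Poisson Process
$$S(t) = \sum_{i=1}^{N(t)} X_i$$
where $N(t)$ is a Poisson process with rate $\lambda$, and the jump sizes $X_i$ are i.i.d. with mean $\mu = E[X]$, independent of $N(t)$.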
| Property | Formula | Notes |
|---|---|---|
| Mean | E[S(t)] = λμt | Wald's equation |
| Variance | Var(S(t)) = λE[X²]t | Includes jump size variability |
| MGF | M_S(s) = exp(λt(M_X(s)-1)) | PGF of N(t) evaluated at M_X(s) |
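A minimal Monte Carlo check of the mean and variance formulas, assuming a hypothetical insurance portfolio with λ = 3 claims per year and claim sizes drawn from an exponential distribution with mean μ = 2 (so E[X²] = 2μ² = 8). The table then predicts E[S(1)] = 6 and Var(S(1)) = 24.

```python
import random
import statistics
from typing import Callable

def compound_poisson(rate: float, t: float,
                     draw_jump: Callable[[random.Random], float],
                     rng: random.Random) -> float:
    """Simulate S(t) = X_1 + ... + X_N(t): one random jump per Poisson arrival in [0, t]."""
    total, clock = 0.0, rng.expovariate(rate)
    while clock <= t:
        total += draw_jump(rng)
        clock += rng.expovariate(rate)
    return total

def draw_claim(rng: random.Random) -> float:
    """Hypothetical claim size: Exponential with mean 2 (rate 0.5)."""
    return rng.expovariate(0.5)

rate, t, mu, second_moment = 3.0, 1.0, 2.0, 8.0
rng = random.Random(5)
samples = [compound_poisson(rate, t, draw_claim, rng) for _ in range(50_000)]

print("mean:    ", statistics.fmean(samples), "  theory λμt     =", rate * mu * t)
print("variance:", statistics.variance(samples), "  theory λE[X²]t =", rate * second_moment * t)
```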
Key applications of compound Poisson processes:
- Insurance: Claims arrive as a Poisson process; claim sizes are random
- Finance: Jump-diffusion models for asset prices with random price jumps
- Retail: Customers arrive randomly; purchase amounts vary
- Inventory: Demand arrives as Poisson; order quantities are random
Interactive: Aggregate Claims
Simulate a compound Poisson process representing aggregate claims, purchases, or other cumulative random processes. Compare different jump size distributions.
Compound Poisson Process
A compound Poisson process S(t) = ∑ Xᵢ (summed over the N(t) arrivals) aggregates random jump sizes Xᵢ at Poisson arrival times. Unlike a regular Poisson process that counts events, this process accumulates random amounts, which is critical for modeling insurance claims, financial jumps, and queuing systems.
Compound Poisson Process Definition
$$S(t) = \sum_{i=1}^{N(t)} X_i$$
where N(t) ~ Poisson(λt) and Xᵢ are i.i.d. with E[X] = μ, independent of N(t)
Key Properties
Mean: E[S(t)] = λ·μ·t
Variance: Var(S(t)) = λ·E[X²]·t
MGF: M_S(t)(s) = exp(λt(M_X(s) − 1))
Applications in Machine Learning
Poisson processes are fundamental to many modern ML systems that deal with temporal events:
🧠 Neural Point Processes
Instead of hand-crafting intensity functions, neural networks learn from data. Recurrent neural networks capture temporal dependencies, while transformers model long-range interactions. Used for predicting user actions, medical events, and financial transactions.
🔄 Continuous Normalizing Flows
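Neural ODEs and continuous normalizing flows have been combined with point processes, for example to let hidden states evolve continuously between events and to estimate event densities in continuous time. Thinning, itself a form of rejection sampling, is a common way to draw events from such learned intensity functions.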
📊 Anomaly Detection
Model normal event patterns as a Poisson process. Deviations from the expected rate signal anomalies: fraud detection (unusual transaction patterns), system monitoring (burst traffic), and cybersecurity (attack detection).
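One simple way to turn this into a detector, sketched below under the assumption that the baseline rate is known, is an upper-tail test: flag a window when observing at least that many events would be very unlikely under the baseline Poisson model. The baseline of 12 transactions per hour and the threshold α = 10⁻³ are hypothetical choices; real systems usually estimate the baseline from historical data.

```python
import math

def poisson_upper_tail(k: int, mu: float) -> float:
    """P(N >= k) for N ~ Poisson(mu), computed as 1 - P(N <= k - 1)."""
    term, cdf = math.exp(-mu), 0.0  # term starts at P(N = 0)
    for j in range(k):
        cdf += term
        term *= mu / (j + 1)        # P(N = j + 1) from P(N = j)
    return max(0.0, 1.0 - cdf)      # clamp tiny negative round-off

def is_anomalous(observed: int, baseline_rate: float, window: float,
                 alpha: float = 1e-3) -> bool:
    """Flag a window whose event count is improbably high under the baseline rate."""
    return poisson_upper_tail(observed, baseline_rate * window) < alpha

print(is_anomalous(15, baseline_rate=12.0, window=1.0))  # False: ordinary fluctuation
print(is_anomalous(35, baseline_rate=12.0, window=1.0))  # True: burst
```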
🎮 Reinforcement Learning
Poisson processes model random events in environment dynamics: customer arrivals in inventory management, opponent actions in games, and resource availability in scheduling. Semi-Markov decision processes use inter-event time distributions.
Real-World Poisson Scenarios
Poisson models of real-world event counts support planning and decision-making: simulate a scenario, then calculate the probabilities of interest.
Example scenario: Call Center, modeling incoming calls to a customer service hotline.
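A probability calculator for the call-center scenario fits in a few lines. The sketch assumes a hypothetical rate of 30 calls per hour and a 10-minute window (so λt = 5), and answers three planning questions: the chance of a quiet window, the chance of more than 8 calls, and the smallest capacity that covers 95% of windows.

```python
import math

def poisson_pmf(k: int, mu: float) -> float:
    return mu**k * math.exp(-mu) / math.factorial(k)

def poisson_cdf(k: int, mu: float) -> float:
    return sum(poisson_pmf(j, mu) for j in range(k + 1))

calls_per_hour, window_minutes = 30, 10    # hypothetical scenario
mu = calls_per_hour / 60 * window_minutes  # expected calls per window: 5.0

p_quiet = poisson_pmf(0, mu)
p_busy = 1 - poisson_cdf(8, mu)

capacity = 0                               # smallest c with P(N <= c) >= 0.95
while poisson_cdf(capacity, mu) < 0.95:
    capacity += 1

print(f"P(no calls in window) = {p_quiet:.4f}")     # 0.0067
print(f"P(more than 8 calls)  = {p_busy:.4f}")      # 0.0681
print("capacity covering 95% of windows:", capacity)  # 9
```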
Python Implementation
Let's implement Poisson processes from scratch, including homogeneous, inhomogeneous, and compound variants.
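A compact from-scratch sketch of all three variants, using only Python's standard library. The class name `PoissonSimulator` and its method names are illustrative, not from any library. The homogeneous simulator builds arrivals from exponential gaps, the inhomogeneous one applies Lewis & Shedler thinning on top of it, and the compound one sums one random jump per arrival; all parameters in the usage lines are hypothetical.

```python
import math
import random
from typing import Callable, Optional

class PoissonSimulator:
    """Hypothetical helper: simulates Poisson-type processes on [0, T]."""

    def __init__(self, seed: Optional[int] = None):
        self.rng = random.Random(seed)

    def homogeneous(self, rate: float, T: float) -> list[float]:
        """Arrival times from i.i.d. Exponential(rate) inter-arrival gaps."""
        times, t = [], self.rng.expovariate(rate)
        while t <= T:
            times.append(t)
            t += self.rng.expovariate(rate)
        return times

    def inhomogeneous(self, intensity: Callable[[float], float],
                      lam_max: float, T: float) -> list[float]:
        """Thinning: keep each PP(lam_max) candidate w.p. intensity(t)/lam_max.
        Assumes intensity(t) <= lam_max on [0, T]."""
        return [t for t in self.homogeneous(lam_max, T)
                if self.rng.random() < intensity(t) / lam_max]

    def compound(self, rate: float, T: float,
                 jump: Callable[[random.Random], float]) -> float:
        """S(T): sum of one i.i.d. jump per homogeneous arrival in [0, T]."""
        return sum(jump(self.rng) for _ in self.homogeneous(rate, T))

# Usage with hypothetical parameters
sim = PoissonSimulator(seed=1)
arrivals = sim.homogeneous(rate=2.0, T=10.0)
daily = sim.inhomogeneous(lambda t: 5 + 3 * math.sin(2 * math.pi * t / 24), lam_max=8.0, T=24.0)
total_claims = sim.compound(rate=3.0, T=1.0, jump=lambda r: r.expovariate(0.5))
print(len(arrivals), len(daily), round(total_claims, 2))
```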
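In practice, use scipy.stats for the Poisson and exponential distributions, or specialized packages such as tick for point processes (including Hawkes processes). pytorch-geometric-temporal, by contrast, targets spatiotemporal graph neural networks rather than neural point processes.
Knowledge Check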
Test your understanding of Poisson processes with these questions:
Poisson Processes Quiz
What distribution do inter-arrival times follow in a homogeneous Poisson process with rate λ?
Summary
Key Takeaways
What's Next
You've now completed Chapter 25 on Stochastic Processes! The next chapter on Probabilistic Graphical Models will show how to represent complex dependencies between random variables using graphs—combining probability theory with graph theory to build interpretable models for reasoning under uncertainty.