Learning Objectives
By the end of this section, you will:
- Define what makes a random variable "mixed" — having both discrete (jump) and continuous (smooth) probability components
- Identify real-world phenomena with mixed behavior: insurance claims, censored data, zero-inflated counts
- Decompose a mixed RV: F(x) = Σᵢ pᵢ · 1(x ≥ xᵢ) + ∫₋∞ˣ g(t) dt
- Calculate probabilities and expected values for mixed distributions
- Recognize why CDF is the ONLY universal tool that works for discrete, continuous, AND mixed RVs
- Apply to AI/ML: zero-inflated models, censored regression, mixture density networks
Historical Context
The Problem of "Weird" Distributions
In the early 20th century, statisticians encountered distributions that defied classification. Most notably, the insurance industry faced a puzzle: most policyholders claim $0, but actual claims can be any positive amount. This is neither purely discrete nor purely continuous!
Key Historical Developments:
- Émile Borel (1909): First studied probability measures that aren't purely discrete or continuous
- Andrey Kolmogorov (1933): Made CDF the fundamental object precisely because it handles ALL cases
- William Feller (1950s): Popularized mixed distributions in his influential textbook
- Modern Era (1980s-present): Zero-inflated models became standard in econometrics and biostatistics
Kolmogorov's Insight: Instead of defining probability through PMF (discrete-only) or PDF (continuous-only), define it through a single object—the CDF—that works for all random variables: discrete, continuous, and mixed!
The Puzzle: Neither Discrete Nor Continuous
Consider this real-world scenario that neither PMF nor PDF can fully describe:
Insurance Claims Example
Let X = insurance claim amount for a randomly selected policyholder.
Discrete Part (Atom)
P(X = 0) = 0.7: 70% of policyholders file no claim
Continuous Part
With probability 0.3, the claim is continuous: X | X > 0 ~ Exp(λ = 0.01), i.e., 30% file claims with exponentially distributed amounts
⚠️ This is neither purely discrete nor purely continuous!
Why Standard Tools Fail
❌ PMF Cannot Work
A PMF requires all probability to sit on countably many values.
But X can be any positive real number—uncountably many values!
❌ PDF Cannot Work
A PDF requires P(X = x) = 0 for all x.
But P(X = 0) = 0.7! There's a discrete "atom" at zero.
✓ CDF Works Perfectly
The CDF gracefully handles both the discrete jump at 0 and the continuous accumulation for positive values:
F(x) = 0 for x < 0, and F(x) = 0.7 + 0.3(1 − e^(−0.01x)) for x ≥ 0
Interactive: Mixed RV Visualizer
Explore the insurance claims example interactively. See how the CDF combines a discrete jump at zero with a smooth exponential curve for positive values.
Formal Definition
Definition: Mixed Random Variable
A random variable is mixed if its CDF has:
- Jump discontinuities (discrete "atoms") at some points
- Continuous, differentiable portions elsewhere
The CDF can be decomposed as:
F(x) = Σᵢ pᵢ · 1(x ≥ xᵢ) + ∫₋∞ˣ g(t) dt
where the pᵢ are the atom probabilities and g(t) is the continuous component, with Σᵢ pᵢ + ∫ g(t) dt = 1.
Symbol Reference
| Symbol | Name | Intuitive Meaning |
|---|---|---|
| F(x) | CDF of mixed RV | Total probability ≤ x (jumps + smooth part) |
| pᵢ = P(X = xᵢ) | Probability atom | Discrete probability mass at point xᵢ |
| g(t) | Continuous component | The "density" between atoms (may not integrate to 1) |
| 1(x ≥ xᵢ) | Indicator function | 1 if x ≥ xᵢ, else 0 |
| F(x) - F(x⁻) | Jump size | Discrete probability at x |
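The last row of the table can be checked numerically. A minimal NumPy sketch, using this section's insurance-claims CDF, approximates the jump F(x) − F(x⁻) with a small left offset:

```python
import numpy as np

def cdf(x, p_zero=0.7, lam=0.01):
    """CDF of the insurance example: atom at 0 plus a scaled exponential."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, 0.0, p_zero + (1 - p_zero) * (1 - np.exp(-lam * x)))

def jump_at(x, eps=1e-9):
    """Approximate the jump size F(x) - F(x-) with a small left offset."""
    return float(cdf(x) - cdf(x - eps))

print(jump_at(0.0))   # ≈ 0.7: the atom at zero
print(jump_at(50.0))  # ≈ 0.0: no atom at x = 50
```

Only atoms produce a nonzero jump; at continuity points the difference shrinks to zero with eps.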
The Three Types of Random Variables
With mixed random variables, we complete the picture. There are exactly three types:
| Property | Discrete | Continuous | Mixed |
|---|---|---|---|
| CDF shape | Pure staircase | Smooth curve | Staircase + curve |
| P(X = x) | > 0 for some x | = 0 for all x | > 0 at atoms, = 0 elsewhere |
| Described by | PMF p(x) | PDF f(x) | CDF F(x) only! |
| Jump points | Every possible value | None | Selected atoms |
| Examples | Dice, counts | Heights, times | Claims, ratings |
Interactive: Three Types Comparison
See the three types of random variables side by side. Notice how the CDF handles each case seamlessly!
Common Patterns of Mixed Distributions
Mixed distributions appear frequently in real-world data. Here are the most common patterns:
Pattern A: "Atom at Zero" (Zero-Inflated)
Examples: Insurance claims (no claim vs positive), rainfall (no rain vs amount), customer spending (non-buyer vs spending amount), click-through (no click vs engagement time)
Pattern B: "Atoms at Boundaries" (Censored/Truncated)
Examples: Sensor readings at detection limits, credit scores (300-850), survey responses with ceiling/floor, grades bounded by 0-100
Pattern C: "Discrete + Continuous Mixture"
Examples: Product ratings (not-rated vs stars vs quality score), customer lifetime value (churned vs active with spend), medical diagnosis
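Pattern B is easy to simulate. A short sketch, assuming a hypothetical sensor whose true signal is Normal(5, 2) but whose readings are clipped to [0, 8], shows how clipping manufactures atoms at both boundaries:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sensor: true signal is continuous, readings clip to [0, 8]
true_signal = rng.normal(loc=5.0, scale=2.0, size=100_000)
readings = np.clip(true_signal, 0.0, 8.0)

# The clipped readings are a mixed RV: atoms at 0 and 8, continuous in between
print("P(X = 0) ≈", (readings == 0.0).mean())  # mass piled at the lower limit
print("P(X = 8) ≈", (readings == 8.0).mean())  # mass piled at the upper limit
```

The atom sizes equal P(signal ≤ 0) and P(signal ≥ 8) under the assumed normal, roughly 0.006 and 0.067 here.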
Interactive: Atom Pattern Explorer
Explore each pattern interactively. Adjust parameters to see how the CDF changes for zero-inflated, censored, and mixture distributions.
CDF for Mixed Random Variables
The CDF of a mixed random variable combines discrete jumps with continuous accumulation:
General CDF Formula for Mixed RVs
F(x) = P(X ≤ x) = Σ_{xᵢ ≤ x} pᵢ + ∫₋∞ˣ g(t) dt
Properties Still Hold!
F is non-decreasing, right-continuous, and satisfies F(−∞) = 0 and F(∞) = 1: the same CDF properties as in the discrete and continuous cases.
Computing Probabilities
The same CDF formulas work for mixed random variables:
At Most x
P(X ≤ x) = F(x)
Greater Than x
P(X > x) = 1 − F(x)
In an Interval
P(a < X ≤ b) = F(b) − F(a)
Exactly x (at an atom)
P(X = x) = F(x) − F(x⁻)
Worked Example: Insurance Claims
Let X = claim amount, where P(X = 0) = 0.7 and for x > 0 the continuous part is Exp(0.01) scaled by 0.3, giving F(x) = 0.7 + 0.3(1 − e^(−0.01x)) for x ≥ 0.
Compute P(X = 0): F(0) - F(0⁻) = 0.7 - 0 = 0.7 ✓
Compute P(X ≤ 100): 0.7 + 0.3(1 - e⁻¹) ≈ 0.7 + 0.189 = 0.889
Compute P(X > 200): 1 - F(200) = 1 - [0.7 + 0.3(1 - e⁻²)] ≈ 0.041
Compute P(0 < X ≤ 100): F(100) - F(0) = 0.889 - 0.7 = 0.189
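These numbers are quick to reproduce; a minimal scipy check of the worked example:

```python
from scipy import stats

p0, lam = 0.7, 0.01
exp_part = stats.expon(scale=1 / lam)

def F(x):
    """CDF of the mixed claim distribution, valid for x >= 0."""
    return p0 + (1 - p0) * exp_part.cdf(x)

print("P(X = 0)        =", F(0))                       # jump at zero: 0.7
print("P(X <= 100)     =", round(F(100), 4))
print("P(X > 200)      =", round(1 - F(200), 4))
print("P(0 < X <= 100) =", round(F(100) - F(0), 4))
```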
Interactive: Probability Calculator
Practice computing probabilities for mixed distributions. Drag the interval bounds and see how the CDF handles discrete jumps.
Expected Value for Mixed RVs
The expected value of a mixed random variable has contributions from both parts:
Expected Value Formula
E[X] = Σᵢ xᵢ pᵢ + ∫ x g(x) dx
Worked Example
For insurance claims X with P(X = 0) = 0.7 and X|X>0 ~ Exp(0.01):
E[X] = 0 · 0.7 + 0.3 · E[Exp(0.01)] = 0.3 · (1/0.01) = 0.3 · 100 = $30
Interpretation: The average claim is $30, even though 70% of claims are $0! The few large claims pull up the average significantly.
Interactive: Expected Value Demo
See how the expected value is computed for mixed distributions. Watch the discrete and continuous contributions combine.
Real-World Examples
🏥 Medical Costs
Pattern: Many patients have $0 cost, others have continuous distribution
Atom: P(X = 0) = healthy population
ML Use: Healthcare cost prediction, fraud detection
☔ Daily Rainfall
Pattern: Many days have 0mm, rainy days have continuous amount
Atom: P(X = 0) = probability of dry day
ML Use: Weather forecasting, agriculture planning
🛒 Customer Spending
Pattern: Non-buyers at $0, buyers with continuous spend
Atom: P(X = 0) = non-conversion rate
ML Use: LTV prediction, recommendation systems
🔋 Sensor Readings
Pattern: Atoms at min/max limits, continuous in between
Atoms: P(X = L), P(X = U) at sensor limits
ML Use: Anomaly detection, sensor fusion
AI/ML Applications
Mixed distributions are fundamental to modern machine learning. Here are key applications:
1. Zero-Inflated Neural Networks
Problem: Predicting quantities where many values are exactly zero
Solution: Two-headed model: classifier (is it zero?) + regressor (if not, how much?)
Applications: Click prediction, demand forecasting, medical diagnosis
2. Censored Regression (Tobit Models)
Problem: True values are censored at bounds (sensor limits, survey scales)
Solution: Model combines censoring probability with truncated distribution
Applications: Survival analysis, credit risk, time-to-event prediction
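A minimal sketch of that idea under Gaussian noise, left-censored at zero: censored observations contribute the probability of falling at or below the bound, uncensored ones the ordinary density. The helper `tobit_nll` is illustrative, not a library API.

```python
import numpy as np
from scipy import stats

def tobit_nll(y, mu, sigma, lower=0.0):
    """Negative log-likelihood for a Tobit-style model, left-censored at `lower`."""
    y = np.asarray(y, dtype=float)
    censored = y <= lower
    ll = np.where(
        censored,
        stats.norm.logcdf(lower, loc=mu, scale=sigma),  # log P(Y* <= lower)
        stats.norm.logpdf(y, loc=mu, scale=sigma),      # log density of observed value
    )
    return -ll.mean()

print(tobit_nll([0.0, 0.0, 1.0, 2.0], mu=0.0, sigma=1.0))  # ≈ 1.431
```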
3. Mixture Density Networks (MDN)
Problem: Output can be discrete category + continuous value
Solution: Network outputs mixture weights for discrete and Gaussian parameters
Applications: Multi-modal prediction, inverse problems, generative models
4. Dropout as Mixed Distribution
Insight: Dropout can be viewed as a mixed distribution!
This is a zero-inflated scaled activation—a mixed random variable!
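A quick simulation, assuming standard inverted dropout with drop probability p: the output is 0 with probability p and a/(1 − p) otherwise, so the atom at zero has mass p while the expected activation is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.5   # drop probability
a = 2.0   # pre-dropout activation value

# Inverted dropout: zero out with probability p, scale survivors by 1/(1 - p)
keep = rng.binomial(1, 1 - p, size=100_000)
out = keep * a / (1 - p)

print("P(out = 0) ≈", (out == 0).mean())  # atom at zero with mass ≈ p
print("E[out]     ≈", out.mean())         # ≈ a: scaling preserves the mean
```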
5. VAE with Discrete + Continuous Latents
Architecture: Discrete latent (cluster) + Continuous latent (variation)
Training: Gumbel-Softmax for discrete, reparameterization for continuous
Applications: Controllable generation, disentangled representations
Interactive: Zero-Inflated ML Demo
See zero-inflated regression in action. Watch how the model learns to separate the zero-probability from the continuous distribution.
Python Implementation
```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# ============================================
# EXAMPLE 1: Zero-Inflated Distribution
# ============================================

class ZeroInflatedExponential:
    """Zero-inflated exponential distribution for insurance claims."""

    def __init__(self, p_zero, lambda_):
        self.p_zero = p_zero
        self.lambda_ = lambda_
        self.exp = stats.expon(scale=1/lambda_)

    def cdf(self, x):
        """CDF: F(x) = p_zero * 1(x >= 0) + (1 - p_zero) * F_exp(x)"""
        x = np.asarray(x, dtype=float)
        result = np.zeros_like(x)
        mask = x >= 0
        result[mask] = self.p_zero + (1 - self.p_zero) * self.exp.cdf(x[mask])
        return result

    def pmf_at_zero(self):
        """P(X = 0) = jump at zero"""
        return self.p_zero

    def expected_value(self):
        """E[X] = 0 * p_zero + (1 - p_zero) * E[Exp(lambda)]"""
        return (1 - self.p_zero) * (1 / self.lambda_)

    def prob_greater(self, x):
        """P(X > x) = 1 - F(x)"""
        return 1 - self.cdf(x)

# Create and analyze
claims = ZeroInflatedExponential(p_zero=0.7, lambda_=0.01)

print("=== Zero-Inflated Exponential (Insurance Claims) ===")
print("P(X = 0) = %.3f" % claims.pmf_at_zero())
print("P(X <= 100) = %.4f" % claims.cdf(100))
print("P(X > 200) = %.4f" % claims.prob_greater(200))
print("E[X] = $%.2f" % claims.expected_value())

# ============================================
# EXAMPLE 2: Compute probabilities
# ============================================

# P(0 < X <= 100) = F(100) - F(0)
p_interval = claims.cdf(100) - claims.cdf(0)
print("\nP(0 < X <= 100) = %.4f" % p_interval)

# P(X = 0) = F(0) - F(0-) = jump at 0
p_zero = claims.cdf(0) - 0  # F(0-) = 0 for this distribution
print("P(X = 0) = %.4f" % p_zero)

# ============================================
# EXAMPLE 3: PyTorch Zero-Inflated Loss
# ============================================

import torch
import torch.nn as nn

class ZeroInflatedLoss(nn.Module):
    """Loss function for zero-inflated regression."""

    def forward(self, p_zero, mu, sigma, target):
        eps = 1e-8
        is_zero = (target == 0).float()

        # Log-likelihood for zeros
        ll_zero = is_zero * torch.log(p_zero + eps)

        # Log-likelihood for positive values
        dist = torch.distributions.Normal(mu, sigma)
        ll_pos = (1 - is_zero) * (
            torch.log(1 - p_zero + eps) + dist.log_prob(target)
        )

        # Negative log-likelihood
        nll = -(ll_zero + ll_pos)
        return nll.mean()

# ============================================
# EXAMPLE 4: Sampling from Mixed Distribution
# ============================================

def sample_mixed_rv(n_samples, p_zero, continuous_dist):
    """Sample from zero-inflated distribution."""
    # Decide: zero or positive?
    is_positive = np.random.binomial(1, 1 - p_zero, n_samples)

    # Generate continuous values for positive cases
    continuous_samples = continuous_dist.rvs(n_samples)

    # Combine: zero for non-positive, continuous otherwise
    return is_positive * continuous_samples

# Generate samples
samples = sample_mixed_rv(
    n_samples=10000,
    p_zero=0.7,
    continuous_dist=stats.expon(scale=100)
)

print("\n=== Sampling Results ===")
print("Sample mean: %.2f" % samples.mean())
print("Empirical P(X=0): %.3f" % (samples == 0).mean())
print("Theoretical P(X=0): 0.700")
print("Theoretical E[X]: %.2f" % (0.3 * 100))  # (1 - 0.7) * 100

# ============================================
# EXAMPLE 5: Inverse Transform Sampling
# ============================================

# For zero-inflated exponential with CDF:
# F(x) = 0.7 + 0.3 * (1 - exp(-0.01*x)) for x >= 0

def inverse_cdf_mixed(u, p_zero, lambda_):
    """Inverse CDF for zero-inflated exponential."""
    if u <= p_zero:
        return 0  # Atom at zero
    # Solve: u = p_zero + (1-p_zero) * (1 - exp(-lambda*x))
    # (u - p_zero) / (1 - p_zero) = 1 - exp(-lambda*x)
    # exp(-lambda*x) = 1 - (u - p_zero) / (1 - p_zero)
    # x = -ln(1 - (u - p_zero) / (1 - p_zero)) / lambda
    v = (u - p_zero) / (1 - p_zero)
    return -np.log(1 - v) / lambda_

# Generate using inverse transform
u_samples = np.random.uniform(0, 1, 10000)
x_samples = np.array([inverse_cdf_mixed(u, 0.7, 0.01) for u in u_samples])
print("\n=== Inverse Transform Sampling ===")
print("Mean from inverse transform: %.2f" % x_samples.mean())
```

Common Pitfalls
The PDF doesn't exist at atoms! You'd need infinite density at jump points (a Dirac delta function). Use the CDF or decompose into discrete + continuous parts.
P(a < X ≤ b) ≠ ∫ₐᵇ g(t) dt for mixed RVs! You must add Σ pᵢ over the atoms in (a, b].
P(a ≤ X ≤ b) ≠ P(a < X ≤ b) when a is an atom! For mixed RVs, be precise about whether endpoints are included.
Rounded measurements can look discrete but aren't true atoms. True atoms have exact repeated values with positive probability.
The "density" g(x) for mixed RVs doesn't integrate to 1. It integrates to 1 − Σᵢ pᵢ, the total continuous probability mass.
Test Your Understanding
Key Takeaways
- Mixed = Discrete + Continuous: Real-world data often has both jump (discrete atoms) and smooth (continuous) probability components.
- CDF is Universal: Only the CDF works for ALL random variables— discrete, continuous, AND mixed. This is why Kolmogorov chose it as the foundation.
- Common Pattern: "Atom at zero" appears everywhere: no purchase, no claim, no click, no rain—all are zeros with positive probability.
- Decomposition: F(x) = Σᵢ pᵢ · 1(x ≥ xᵢ) + ∫₋∞ˣ g(t) dt splits any mixed CDF into atoms plus a continuous part.
- Expected Value: E[X] = Σᵢ xᵢ pᵢ + ∫ x g(x) dx (both parts contribute!)
- AI/ML Essential: Zero-inflated models, censored regression, mixture density networks, and even dropout are all applications of mixed distributions.
- Practical Reality: Pure discrete or continuous models are idealized. Mixed models capture the true complexity of real-world data.
Chapter Complete! You now understand all three types of random variables: discrete (PMF), continuous (PDF), and mixed (CDF-only). The CDF unifies them all—and that's why it's the foundation of probability theory. In the next chapter, we'll explore Expected Value and how to compute the "center" of any distribution.