Learning Objectives
By the end of this section, you will:
- Define the Discrete Uniform Distribution and understand when it applies
- Derive the PMF, CDF, mean, variance, and entropy with intuitive understanding
- Recognize real-world scenarios: dice, cards, random selection, hashing
- Apply the distribution in sampling, randomization, and simulation
- Connect to AI/ML: weight initialization, data shuffling, mini-batch selection, ε-greedy exploration
- Master the inverse CDF method for generating any distribution from uniform random numbers
Big Picture: From Ancient Dice to Modern Deep Learning
The Birth of Fair Randomness
The concept of "equally likely outcomes" is ancient. Dice made from animal bones (astragali) date back 5,000 years. But formalizing what "fair" means mathematically required the work of pioneering mathematicians.
Historical Journey
- Gerolamo Cardano (1564): In Liber de Ludo Aleae, first systematic study of dice probabilities—each face of a fair die has equal chance.
- Pierre-Simon Laplace (1812): Defined classical probability as "favorable outcomes / total outcomes"—essentially assuming uniform distribution.
- John von Neumann (1946): Middle-square method for generating pseudo-random numbers—the birth of computational randomness.
- Derrick Lehmer (1949): Linear congruential generator—uniform random integers that power modern computing.
The Problem It Solves
The discrete uniform distribution answers a fundamental question: What does it mean to "pick at random" from a finite set? When we say "choose uniformly at random," we mean every element has exactly the same probability of being selected.
Why This Matters Today: Every random number generator produces uniform random numbers. The discrete uniform distribution is the atomic unit of computational randomness—the foundation of Monte Carlo methods, cryptography, and machine learning.
Definition and Probability Mass Function
Definition: Discrete Uniform Distribution
A random variable X follows a Discrete Uniform Distribution on the set {a, a+1, ..., b} if each value is equally likely:
Probability Mass Function (PMF)
X \sim DU(a, b): \quad P(X = k) = \frac{1}{b - a + 1}, \quad k \in \{a, a+1, \ldots, b\}
Symbol Breakdown
| Symbol | Meaning | Example (Fair Die) |
|---|---|---|
| a | Minimum value (lower bound) | 1 |
| b | Maximum value (upper bound) | 6 |
| n = b - a + 1 | Number of possible values | 6 |
| 1/n | Probability of each outcome | 1/6 ≈ 0.167 |
Special Case: DU(1, n)
When starting from 1, the notation simplifies:
X \sim DU(1, n): \quad P(X = k) = \frac{1}{n}, \quad k \in \{1, 2, \ldots, n\}
Example: A fair 6-sided die is X \sim DU(1, 6), with P(X = k) = 1/6 for each face k ∈ {1, ..., 6}.
Interactive: Discrete Uniform PMF Explorer
Adjust the bounds a and b to see how the PMF changes. Notice that all bars have equal height—this is the defining property of uniform distributions!
Key Properties
Cumulative Distribution Function (CDF)
F(x) = P(X \le x) = \frac{\lfloor x \rfloor - a + 1}{n}, \quad a \le x \le b
Intuition: The CDF is a staircase function with equal-height steps. Each step has height 1/n, and there are n steps total, reaching probability 1 at x = b.
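The staircase shape is easy to verify directly. Below is a minimal sketch; `du_cdf` is a hypothetical helper implementing the closed form (⌊x⌋ − a + 1)/n:

```python
import math

def du_cdf(x, a, b):
    """CDF of DU(a, b): a staircase of n equal steps, each of height 1/n."""
    n = b - a + 1
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    return (math.floor(x) - a + 1) / n

# Fair die: half the probability mass lies at or below 3
print(du_cdf(3, 1, 6))    # 0.5
print(du_cdf(2.5, 1, 6))  # 2/6, since floor(2.5) = 2
```

Note that the CDF is flat between integers: only crossing an integer value adds another 1/n step.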
Expected Value (Mean)
E[X] = \frac{a + b}{2}
Derivation: E[X] = \sum_{k=a}^{b} k \cdot \frac{1}{n} = \frac{1}{n} \cdot \frac{n(a+b)}{2} = \frac{a+b}{2}, using the arithmetic series sum \sum_{k=a}^{b} k = \frac{n(a+b)}{2}.
Intuition: The mean is exactly the midpoint of the range. For a fair die: E[X] = (1 + 6)/2 = 3.5.
Variance
For DU(1, n): \text{Var}(X) = \frac{n^2 - 1}{12}
Example (Fair die): \text{Var}(X) = \frac{36 - 1}{12} = \frac{35}{12} \approx 2.92
Summary Table
| Property | Formula | Fair Die (n=6) |
|---|---|---|
| PMF | P(X=k) = 1/n | 1/6 ≈ 0.167 |
| Mean | E[X] = (a+b)/2 | 3.5 |
| Variance | Var(X) = (n²-1)/12 | 35/12 ≈ 2.92 |
| Std Dev | σ = √(Var) | ≈ 1.71 |
| Support | {a, a+1, ..., b} | {1, 2, 3, 4, 5, 6} |
| Entropy | log₂(n) bits | log₂(6) ≈ 2.58 bits |
Interactive: Dice Roll Simulator
Watch the Law of Large Numbers in action! Roll a die many times and observe how the frequency histogram converges to uniform (each face ≈ 16.7%) and the sample mean approaches 3.5.
Maximum Entropy Property
Key Theorem: Maximum Entropy
Among all discrete distributions on a finite set of n values, the uniform distribution has maximum entropy:
H(X) = -\sum_{k=1}^{n} \frac{1}{n} \log_2 \frac{1}{n} = \log_2 n \text{ bits}
What Does This Mean?
- Maximum uncertainty: When all outcomes are equally likely, we have no information favoring any outcome. This is the definition of maximum uncertainty.
- No prior information: Using a uniform distribution says "I have no reason to prefer one outcome over another."
- Principle of Indifference: When you know nothing, assume uniform. This is Laplace's principle of insufficient reason.
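The maximum entropy property can be checked numerically. A short sketch, comparing the uniform distribution against an arbitrary skewed one (`entropy_bits` and the example probabilities are illustrative):

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

uniform = [1 / 6] * 6
skewed = [0.5, 0.2, 0.1, 0.1, 0.05, 0.05]  # same support, unequal masses

print(entropy_bits(uniform))  # log2(6) ≈ 2.585 bits — the maximum
print(entropy_bits(skewed))   # strictly smaller
```

Any deviation from equal probabilities concentrates mass on some outcomes, and the entropy drops below log₂(n).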
Interactive: Inverse CDF Transform Method
The inverse CDF method reveals a profound truth: uniform random numbers can generate any distribution! Given U \sim \text{Uniform}(0, 1), applying X = F^{-1}(U) produces a random variable with distribution F.
Real-World Examples
Example 1: Fair Die Roll
Model: X \sim DU(1, 6), so P(X = k) = 1/6 for each face, E[X] = 3.5, and \text{Var}(X) = 35/12 \approx 2.92.
Example 2: Random Card Selection
Setup: 52 cards, select one uniformly at random
- P(any specific card) = 1/52 ≈ 0.019
- P(Ace) = 4/52 = 1/13 ≈ 0.077
- P(Heart) = 13/52 = 1/4 = 0.25
Example 3: Lottery Number Selection
Model: Pick numbers uniformly from 1 to 49
Each number has probability 1/49. The fairness of the lottery depends on achieving true uniformity—any deviation can be exploited!
Example 4: Load Balancing
Setup: n servers, assign each request uniformly at random
- Each server receives 1/n of total traffic in expectation
- Simple but effective baseline for distributed systems
- Used in round-robin DNS and basic load balancers
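The uniform-assignment baseline can be simulated in a few lines. A sketch with illustrative names (`assign_requests` is hypothetical); the point is that seeded uniform routing spreads load evenly in expectation:

```python
import random
from collections import Counter

def assign_requests(n_servers, n_requests, seed=0):
    """Route each request to a uniformly chosen server: j ~ DU(0, n_servers-1)."""
    rng = random.Random(seed)
    return Counter(rng.randrange(n_servers) for _ in range(n_requests))

loads = assign_requests(n_servers=8, n_requests=80_000)
expected = 80_000 / 8  # 10,000 per server in expectation
for server in sorted(loads):
    print(server, loads[server])  # each count lands near 10,000
```

Actual counts fluctuate around the expectation with standard deviation ≈ √(n_requests · p(1−p)); for large request volumes the relative imbalance shrinks.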
AI/ML Applications
The discrete uniform distribution is everywhere in machine learning, often hidden behind library functions. Here are the key applications:
1. Xavier/Glorot Uniform Initialization
Neural network weights are initialized uniformly:
W \sim \text{Uniform}(-r, r), \quad r = \sqrt{\frac{6}{n_{\text{in}} + n_{\text{out}}}}
Why uniform? Maintains variance across layers, prevents vanishing/exploding gradients, gives equal chance to all initial weight magnitudes.
2. Random Data Shuffling
Before each training epoch, data is shuffled using the Fisher-Yates algorithm:
- For i from n-1 down to 1: swap element i with element j where j ~ DU(0, i)
- Each of n! permutations is equally likely
- Breaks sequential correlations, ensures unbiased training
3. Mini-Batch Index Selection
In SGD, mini-batch indices are sampled uniformly:
- Sample batch_size indices from DU(0, n-1)
- Without replacement for standard mini-batch
- Uniform sampling = unbiased gradient estimate
This is why the mini-batch gradient is an unbiased estimate of the full-batch gradient.
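The unbiasedness claim can be checked numerically. A sketch using synthetic numbers as stand-ins for per-example gradients (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
per_example_grad = rng.normal(size=1000)  # stand-in for per-example gradients
full_grad = per_example_grad.mean()       # full-batch "gradient"

# Average many mini-batch gradients built from uniformly sampled indices
batch_means = [
    per_example_grad[rng.choice(1000, size=32, replace=False)].mean()
    for _ in range(20_000)
]

# Because indices are drawn uniformly, E[batch mean] = full-batch mean
print(full_grad, np.mean(batch_means))  # the two values should be close
```

Each individual mini-batch gradient is noisy, but its expectation matches the full gradient exactly; this is what justifies SGD's convergence analysis.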
4. ε-Greedy Exploration in Reinforcement Learning
```python
# With probability ε, explore uniformly
if np.random.uniform(0, 1) < epsilon:
    action = np.random.randint(0, n_actions)  # DU(0, n-1)
else:
    action = np.argmax(Q_values)  # Exploit
```
Why uniform exploration? No prior bias toward any action, maximum entropy exploration, guarantees all actions eventually tried.
5. Hash Functions for ML Systems
- Feature hashing: Maps features to indices uniformly
- MinHash: Uses uniform random permutations for similarity estimation
- Bloom filters: Uniform hash distribution for set membership
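The feature-hashing idea can be sketched with a standard cryptographic hash (a toy illustration; `feature_index` is a hypothetical helper, and production systems typically use faster non-cryptographic hashes like MurmurHash):

```python
import hashlib
from collections import Counter

def feature_index(feature, n_buckets):
    """Feature hashing: map a feature name to a bucket index.

    A good hash spreads arbitrary keys so each bucket behaves like
    a draw from DU(0, n_buckets-1), filling buckets roughly evenly.
    """
    digest = hashlib.md5(feature.encode()).digest()
    return int.from_bytes(digest, "big") % n_buckets

counts = Counter(feature_index(f"feature_{i}", 16) for i in range(16_000))
print(min(counts.values()), max(counts.values()))  # both near 16000/16 = 1000
```

Because the hash range is astronomically larger than 16, the modulo step introduces no meaningful bias here.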
Interactive: Mini-Batch Sampling in Neural Networks
Watch how uniform random sampling selects data points for mini-batch training. Over many epochs, each point should be selected approximately equally—this ensures unbiased gradient estimation!
Python Implementation
```python
import numpy as np
from scipy.stats import randint

# ==============================================
# DISCRETE UNIFORM DISTRIBUTION
# ==============================================

# scipy.stats.randint(low, high) gives DU(low, high-1)
# Note: high is EXCLUSIVE

# Create DU(1, 6) - like a fair die
die = randint(1, 7)  # DU(1,6) - high is exclusive!

# PMF: P(X = k)
print("P(X=1):", die.pmf(1))  # 0.1667
print("P(X=6):", die.pmf(6))  # 0.1667

# CDF: P(X ≤ k)
print("P(X≤3):", die.cdf(3))  # 0.5

# Mean and Variance
print("E[X]:", die.mean())   # 3.5
print("Var(X):", die.var())  # 2.917

# Entropy
print("Entropy:", die.entropy())  # ln(6) ≈ 1.79 nats

# Sampling
samples = die.rvs(size=10000)
print("Sample mean:", np.mean(samples))  # ≈ 3.5

# ==============================================
# CHI-SQUARE GOODNESS OF FIT TEST
# ==============================================

from scipy.stats import chisquare

# Count occurrences of each face
observed = np.bincount(samples, minlength=7)[1:7]  # faces 1-6
expected = np.ones(6) * len(samples) / 6

stat, pval = chisquare(observed, expected)
print(f"Chi-square: {stat:.2f}, p-value: {pval:.4f}")
# High p-value → consistent with uniform

# ==============================================
# FISHER-YATES SHUFFLE (ML DATA SHUFFLING)
# ==============================================

def fisher_yates_shuffle(arr):
    """Uniform random shuffle - each permutation equally likely"""
    arr = arr.copy()
    n = len(arr)
    for i in range(n - 1, 0, -1):
        # j ~ DU(0, i)
        j = np.random.randint(0, i + 1)
        arr[i], arr[j] = arr[j], arr[i]
    return arr

data = np.arange(10)
shuffled = fisher_yates_shuffle(data)
print("Shuffled:", shuffled)

# ==============================================
# XAVIER UNIFORM INITIALIZATION
# ==============================================

def xavier_uniform(fan_in, fan_out, shape):
    """Xavier/Glorot uniform initialization"""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, shape)

# Initialize a 100x50 weight matrix
weights = xavier_uniform(100, 50, (100, 50))
print("Weight mean:", np.mean(weights))  # ≈ 0
print("Weight std:", np.std(weights))    # ≈ 0.1

# ==============================================
# INVERSE CDF METHOD
# ==============================================

def inverse_cdf_exponential(u, lambd=1.0):
    """Transform U~Uniform(0,1) to Exponential(λ)"""
    return -np.log(1 - u) / lambd

# Generate exponential samples from uniform
u_samples = np.random.uniform(0, 1, 10000)
exp_samples = inverse_cdf_exponential(u_samples, lambd=2.0)
print("Exponential mean:", np.mean(exp_samples))  # ≈ 0.5 = 1/λ

# ==============================================
# MINI-BATCH SAMPLING
# ==============================================

def sample_minibatch(n_samples, batch_size):
    """Sample batch indices uniformly without replacement"""
    return np.random.choice(n_samples, size=batch_size, replace=False)

# Simulate one epoch
n_samples = 1000
batch_size = 32
n_batches = n_samples // batch_size

for batch_idx in range(n_batches):
    indices = sample_minibatch(n_samples, batch_size)
    # Process batch...
    pass
```
Common Pitfalls
Using rand() % n does not give a uniform distribution when RAND_MAX + 1 isn't divisible by n. Use np.random.randint() or rejection sampling instead.
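The modulo bias is easy to see with a toy generator. A sketch in which a 3-bit "generator" (outputs 0..7) stands in for rand():

```python
from collections import Counter

# A toy generator with RAND_MAX = 7, reduced mod 6:
# values 0 and 1 each arise from TWO generator outputs (0,6 and 1,7),
# while 2..5 each arise from only one — so the result is not uniform.
biased = Counter(r % 6 for r in range(8))
print(biased)  # Counter({0: 2, 1: 2, 2: 1, 3: 1, 4: 1, 5: 1})

# Rejection sampling fixes it: discard outputs >= 6 and redraw,
# leaving each surviving value equally likely.
unbiased = Counter(r for r in range(8) if r < 6)
print(unbiased)  # every value in 0..5 exactly once
```

The bias shrinks as the generator range grows relative to n, but rejection sampling removes it entirely.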
Naive shuffles (swap each element with random other element) produce biased permutations. Always use Fisher-Yates for true uniformity.
Reusing the same random seed produces identical "random" sequences. Set the seed once at program start when you want reproducibility, and vary it when you need independent runs.
scipy.stats.randint(low, high) gives DU(low, high-1)—the upper bound is exclusive! For a die, use randint(1, 7), not randint(1, 6).
Real-world data is rarely uniform. Don't assume uniformity without testing. Use chi-square tests to verify.
Test Your Understanding
Summary
Key Takeaways
- DU(a, b) assigns equal probability 1/n to each value in {a, ..., b}, where n = b - a + 1.
- Key properties: Mean = (a+b)/2, Variance = (n²-1)/12, Entropy = log(n).
- Maximum entropy: Uniform has the highest entropy among all distributions on a fixed finite set—representing maximum uncertainty.
- Inverse CDF method: U ~ Uniform(0,1) can generate any distribution via X = F⁻¹(U). Uniform is the "universal random source."
- AI/ML applications: Weight initialization, data shuffling, mini-batch selection, ε-greedy exploration, hashing.
- Law of Large Numbers: Sample frequencies converge to 1/n as sample size increases.
Quick Reference
| Property | Formula | Example: Fair Die |
|---|---|---|
| PMF | P(X=k) = 1/n | 1/6 |
| CDF | F(x) = (⌊x⌋ - a + 1)/n | F(3) = 0.5 |
| Mean | (a + b)/2 | 3.5 |
| Variance | (n² - 1)/12 | 2.92 |
| Entropy | log₂(n) bits | log₂(6) ≈ 2.58 bits |
Looking Ahead: In the next section, we'll explore Relationships Between Distributions—how Binomial becomes Poisson, how uniform generates exponential, and the beautiful connections that unify probability theory.