Chapter 4

Discrete Uniform Distribution

Discrete Distributions

Learning Objectives

By the end of this section, you will:

  • Define the Discrete Uniform Distribution and understand when it applies
  • Derive the PMF, CDF, mean, variance, and entropy with intuitive understanding
  • Recognize real-world scenarios: dice, cards, random selection, hashing
  • Apply the distribution in sampling, randomization, and simulation
  • Connect to AI/ML: weight initialization, data shuffling, mini-batch selection, ε-greedy exploration
  • Master the inverse CDF method for generating any distribution from uniform random numbers

Big Picture: From Ancient Dice to Modern Deep Learning

The Birth of Fair Randomness

The concept of "equally likely outcomes" is ancient. Dice made from animal bones (astragali) date back 5,000 years. But formalizing what "fair" means mathematically required the work of pioneering mathematicians.

Historical Journey

  1. Gerolamo Cardano (1564): In Liber de Ludo Aleae, first systematic study of dice probabilities—each face of a fair die has equal chance.
  2. Pierre-Simon Laplace (1812): Defined classical probability as "favorable outcomes / total outcomes"—essentially assuming uniform distribution.
  3. John von Neumann (1946): Middle-square method for generating pseudo-random numbers—the birth of computational randomness.
  4. Derrick Lehmer (1949): Linear congruential generator—uniform random integers that power modern computing.

The Problem It Solves

The discrete uniform distribution answers a fundamental question: What does it mean to "pick at random" from a finite set? When we say "choose uniformly at random," we mean every element has exactly the same probability of being selected.

Why This Matters Today: Every random number generator produces uniform random numbers. The discrete uniform distribution is the atomic unit of computational randomness—the foundation of Monte Carlo methods, cryptography, and machine learning.

Definition and Probability Mass Function

Definition: Discrete Uniform Distribution

A random variable X follows a Discrete Uniform Distribution on the set {a, a+1, ..., b} if each value is equally likely:

X \sim \text{DiscreteUniform}(a, b) \quad \text{or} \quad X \sim DU(a, b)

Probability Mass Function (PMF)

P(X = k) = \frac{1}{n} = \frac{1}{b - a + 1}, \quad k \in \{a, a+1, \ldots, b\}

Symbol Breakdown

| Symbol | Meaning | Example (Fair Die) |
| --- | --- | --- |
| a | Minimum value (lower bound) | 1 |
| b | Maximum value (upper bound) | 6 |
| n = b - a + 1 | Number of possible values | 6 |
| 1/n | Probability of each outcome | 1/6 ≈ 0.167 |

Intuition: If you have n equally likely outcomes, each one gets probability 1/n. It's the mathematical embodiment of "no outcome is more special than another."

Special Case: DU(1, n)

When starting from 1, the notation simplifies:

X \sim DU(1, n): \quad P(X = k) = \frac{1}{n}, \quad k \in \{1, 2, \ldots, n\}

Example: A fair 6-sided die is X ~ DU(1, 6).


Interactive: Discrete Uniform PMF Explorer

Adjust the bounds a and b to see how the PMF changes. Notice that all bars have equal height—this is the defining property of uniform distributions!



Key Properties

Cumulative Distribution Function (CDF)

F(x) = P(X \leq x) = \frac{\lfloor x \rfloor - a + 1}{n}, \quad a \leq x \leq b

Intuition: The CDF is a staircase function with equal-height steps. Each step has height 1/n, and there are n steps total reaching probability 1.
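A minimal numeric check of the staircase shape, using scipy.stats.randint (whose upper bound is exclusive):

```python
import numpy as np
from scipy.stats import randint

# Fair die: DU(1, 6) -- scipy's high bound is exclusive
die = randint(1, 7)

# The CDF is flat between integers and jumps by 1/6 at each one
for x in [0.5, 1.0, 1.5, 3.0, 5.9, 6.0]:
    print(f"F({x}) = {die.cdf(x):.4f}")
# F(1.0) and F(1.5) are both 1/6; F(3.0) = 0.5; F(6.0) = 1.0
```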

Expected Value (Mean)

E[X] = \frac{a + b}{2}

Derivation:

E[X] = \sum_{k=a}^{b} k \cdot \frac{1}{n} = \frac{1}{n} \cdot \frac{n(a+b)}{2} = \frac{a+b}{2}

Intuition: The mean is exactly the midpoint of the range. For a fair die: E[X] = (1 + 6)/2 = 3.5.

Variance

\text{Var}(X) = \frac{(b-a+1)^2 - 1}{12} = \frac{n^2 - 1}{12}

For DU(1, n): Var(X) = (n² − 1)/12.

Example (fair die): Var(X) = (36 − 1)/12 = 35/12 ≈ 2.92.
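These closed forms are easy to verify by computing the moments directly from the PMF (a quick sketch for the fair die):

```python
import numpy as np

a, b = 1, 6
n = b - a + 1
ks = np.arange(a, b + 1)

# Brute-force moments from the PMF P(X = k) = 1/n
mean = np.sum(ks / n)               # should equal (a + b) / 2
var = np.sum((ks - mean) ** 2 / n)  # should equal (n^2 - 1) / 12

print(mean, (a + b) / 2)       # both 3.5
print(var, (n**2 - 1) / 12)    # both 35/12 = 2.9166...
```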

Summary Table

| Property | Formula | Fair Die (n=6) |
| --- | --- | --- |
| PMF | P(X=k) = 1/n | 1/6 ≈ 0.167 |
| Mean | E[X] = (a+b)/2 | 3.5 |
| Variance | Var(X) = (n²−1)/12 | 35/12 ≈ 2.92 |
| Std Dev | σ = √Var(X) | ≈ 1.71 |
| Support | {a, a+1, ..., b} | {1, 2, 3, 4, 5, 6} |
| Entropy | log₂(n) bits | log₂(6) ≈ 2.58 bits |

Interactive: Dice Roll Simulator

Watch the Law of Large Numbers in action! Roll a die many times and observe how the frequency histogram converges to uniform (each face ≈ 16.7%) and the sample mean approaches 3.5.



Maximum Entropy Property

Key Theorem: Maximum Entropy

Among all discrete distributions on a finite set of n values, the uniform distribution has maximum entropy:

H(X) = -\sum_{k=1}^{n} P(X=k) \log P(X=k) = \log(n)

What Does This Mean?

  • Maximum uncertainty: When all outcomes are equally likely, we have no information favoring any outcome. This is the definition of maximum uncertainty.
  • No prior information: Using a uniform distribution says "I have no reason to prefer one outcome over another."
  • Principle of Indifference: When you know nothing, assume uniform. This is Laplace's principle of insufficient reason.
AI/ML Connection: This is why uniform priors in Bayesian inference represent "no prior information." It's also why maximum entropy methods are used when we want to make minimal assumptions about a distribution.
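The theorem is easy to see numerically: any deviation from uniformity lowers the entropy. A small sketch comparing a fair die with a hypothetical loaded one:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

fair = np.ones(6) / 6
loaded = np.array([0.5, 0.1, 0.1, 0.1, 0.1, 0.1])  # hypothetical biased die

print(entropy_bits(fair))    # log2(6) ≈ 2.585 bits -- the maximum
print(entropy_bits(loaded))  # ≈ 2.161 bits -- bias means less uncertainty
```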

Interactive: Inverse CDF Transform Method

The inverse CDF method reveals a profound truth: uniform random numbers can generate any distribution! Given U ~ Uniform(0, 1), applying X = F⁻¹(U) produces a random variable with CDF F.


This is why uniform distribution is called the "universal random source." All random number generators produce uniform numbers, which can then be transformed into any desired distribution!
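For the discrete uniform itself, the inverse CDF is just scaling and flooring: with U ~ Uniform(0, 1), X = ⌊nU⌋ + a is DU(a, b). A minimal sketch for the die:

```python
import numpy as np

rng = np.random.default_rng(0)

# Inverse-CDF sampling of DU(1, 6): u in [0, 1) -> floor(6u) + 1
u = rng.uniform(0, 1, size=100_000)
rolls = np.floor(6 * u).astype(int) + 1

# Each face should appear with relative frequency ≈ 1/6
freqs = np.bincount(rolls, minlength=7)[1:] / len(rolls)
print(freqs)
```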

Real-World Examples

Example 1: Fair Die Roll

Model: X ~ DU(1, 6)

  • P(X = 4) = 1/6 ≈ 0.167
  • P(X ≤ 3) = 3/6 = 0.5
  • E[X] = 3.5, Var(X) = 35/12 ≈ 2.92

Example 2: Random Card Selection

Setup: 52 cards, select one uniformly at random

  • P(any specific card) = 1/52 ≈ 0.019
  • P(Ace) = 4/52 = 1/13 ≈ 0.077
  • P(Heart) = 13/52 = 1/4 = 0.25

Example 3: Lottery Number Selection

Model: Pick numbers uniformly from 1 to 49

Each number has probability 1/49. The fairness of the lottery depends on achieving true uniformity—any deviation can be exploited!

Example 4: Load Balancing

Setup: n servers, assign each request uniformly at random

  • Each server receives 1/n of total traffic in expectation
  • Simple but effective baseline for distributed systems
  • Used in round-robin DNS and basic load balancers
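A quick simulation of this baseline (a sketch; the server count and request volume are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)

n_servers = 8
n_requests = 100_000

# Each request goes to a server drawn from DU(0, n_servers - 1)
assignments = rng.integers(0, n_servers, size=n_requests)
load = np.bincount(assignments, minlength=n_servers)

print(load / n_requests)  # each entry ≈ 1/8 = 0.125
```

In expectation each server sees exactly 1/n of the traffic; the realized loads fluctuate around that, with relative imbalance shrinking as the request count grows.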

AI/ML Applications

The discrete uniform distribution is everywhere in machine learning, often hidden behind library functions. Here are the key applications:

1. Xavier/Glorot Uniform Initialization

Neural network weights are initialized uniformly:

W \sim \text{Uniform}\left[-\sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}},\ \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}\right]

Why uniform? Maintains variance across layers, prevents vanishing/exploding gradients, gives equal chance to all initial weight magnitudes.

2. Random Data Shuffling

Before each training epoch, data is shuffled using the Fisher-Yates algorithm:

  • For i from n-1 down to 1: swap element i with element j where j ~ DU(0, i)
  • Each of n! permutations is equally likely
  • Breaks sequential correlations, ensures unbiased training

3. Mini-Batch Index Selection

In SGD, mini-batch indices are sampled uniformly:

  • Sample batch_size indices from DU(0, n-1)
  • Without replacement for standard mini-batch
  • Uniform sampling = unbiased gradient estimate

This is why the mini-batch gradient is an unbiased estimate of the full-batch gradient!

4. ε-Greedy Exploration in Reinforcement Learning

```python
# With probability ε, explore uniformly
if np.random.uniform(0, 1) < epsilon:
    action = np.random.randint(0, n_actions)  # DU(0, n-1)
else:
    action = np.argmax(Q_values)  # Exploit
```

Why uniform exploration? No prior bias toward any action, maximum entropy exploration, guarantees all actions eventually tried.

5. Hash Functions for ML Systems

  • Feature hashing: Maps features to indices uniformly
  • MinHash: Uses uniform random permutations for similarity estimation
  • Bloom filters: Uniform hash distribution for set membership
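A minimal sketch of the feature-hashing idea (the hash function and bucket count are illustrative choices; production systems typically use fast non-cryptographic hashes such as MurmurHash):

```python
import hashlib

def hash_feature(name: str, n_buckets: int = 16) -> int:
    """Map a feature name to a bucket in {0, ..., n_buckets - 1}.

    A well-mixed hash spreads names approximately uniformly over
    buckets; since 16 divides the 2^128 digest space evenly, the
    modulo step introduces no bias here.
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

for feature in ["age", "income", "zip_code", "clicks"]:
    print(feature, "->", hash_feature(feature))
```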

Interactive: Mini-Batch Sampling in Neural Networks

Watch how uniform random sampling selects data points for mini-batch training. Over many epochs, each point should be selected approximately equally—this ensures unbiased gradient estimation!



Python Implementation

```python
import numpy as np
from scipy.stats import randint

# ==============================================
# DISCRETE UNIFORM DISTRIBUTION
# ==============================================

# scipy.stats.randint(low, high) gives DU(low, high-1)
# Note: high is EXCLUSIVE

# Create DU(1, 6) - like a fair die
die = randint(1, 7)  # DU(1,6) - high is exclusive!

# PMF: P(X = k)
print("P(X=1):", die.pmf(1))  # 0.1667
print("P(X=6):", die.pmf(6))  # 0.1667

# CDF: P(X ≤ k)
print("P(X≤3):", die.cdf(3))  # 0.5

# Mean and Variance
print("E[X]:", die.mean())    # 3.5
print("Var(X):", die.var())   # 2.917

# Entropy
print("Entropy:", die.entropy())  # ln(6) ≈ 1.79 nats

# Sampling
samples = die.rvs(size=10000)
print("Sample mean:", np.mean(samples))  # ≈ 3.5

# ==============================================
# CHI-SQUARE GOODNESS OF FIT TEST
# ==============================================

from scipy.stats import chisquare

# Count occurrences of each face
observed = np.bincount(samples, minlength=7)[1:7]  # faces 1-6
expected = np.ones(6) * len(samples) / 6

stat, pval = chisquare(observed, expected)
print(f"Chi-square: {stat:.2f}, p-value: {pval:.4f}")
# High p-value → consistent with uniform

# ==============================================
# FISHER-YATES SHUFFLE (ML DATA SHUFFLING)
# ==============================================

def fisher_yates_shuffle(arr):
    """Uniform random shuffle - each permutation equally likely."""
    arr = arr.copy()
    n = len(arr)
    for i in range(n - 1, 0, -1):
        # j ~ DU(0, i)
        j = np.random.randint(0, i + 1)
        arr[i], arr[j] = arr[j], arr[i]
    return arr

data = np.arange(10)
shuffled = fisher_yates_shuffle(data)
print("Shuffled:", shuffled)

# ==============================================
# XAVIER UNIFORM INITIALIZATION
# ==============================================

def xavier_uniform(fan_in, fan_out, shape):
    """Xavier/Glorot uniform initialization."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, shape)

# Initialize a 100x50 weight matrix
weights = xavier_uniform(100, 50, (100, 50))
print("Weight mean:", np.mean(weights))  # ≈ 0
print("Weight std:", np.std(weights))    # limit/√3 ≈ 0.115

# ==============================================
# INVERSE CDF METHOD
# ==============================================

def inverse_cdf_exponential(u, lambd=1.0):
    """Transform U~Uniform(0,1) to Exponential(λ)."""
    return -np.log(1 - u) / lambd

# Generate exponential samples from uniform
u_samples = np.random.uniform(0, 1, 10000)
exp_samples = inverse_cdf_exponential(u_samples, lambd=2.0)
print("Exponential mean:", np.mean(exp_samples))  # ≈ 0.5 = 1/λ

# ==============================================
# MINI-BATCH SAMPLING
# ==============================================

def sample_minibatch(n_samples, batch_size):
    """Sample batch indices uniformly without replacement."""
    return np.random.choice(n_samples, size=batch_size, replace=False)

# Simulate one epoch
n_samples = 1000
batch_size = 32
n_batches = n_samples // batch_size

for batch_idx in range(n_batches):
    indices = sample_minibatch(n_samples, batch_size)
    # Process batch...
    pass
```

Common Pitfalls

Pitfall 1: Modulo Bias

Using rand() % n doesn't give a uniform distribution when RAND_MAX + 1 isn't divisible by n. Use np.random.randint() or rejection sampling instead.
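The bias is easy to demonstrate with a toy generator: suppose the raw generator produces 8 equally likely values (0-7) and we take them mod 3. A sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "rand()" with RAND_MAX + 1 = 8; target range n = 3
raw = rng.integers(0, 8, size=300_000)
biased = raw % 3

# 0 and 1 each have three preimages (0,3,6 and 1,4,7) but 2 has
# only two (2,5): P(0) = P(1) = 3/8 = 0.375, P(2) = 2/8 = 0.25
print(np.bincount(biased) / len(biased))
```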

Pitfall 2: Non-Fisher-Yates Shuffle

Naive shuffles (swapping each element with a random position anywhere in the array) produce biased permutations. Always use Fisher-Yates for true uniformity.
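The bias is visible even for three elements. The naive swap (every position swapped with any position) has 3³ = 27 equally likely execution paths, which cannot map evenly onto 3! = 6 permutations. A small simulation:

```python
import numpy as np
from collections import Counter
from itertools import permutations

rng = np.random.default_rng(2)

def naive_shuffle(arr):
    """Biased: swaps position i with ANY position, not just i..n-1."""
    arr = list(arr)
    n = len(arr)
    for i in range(n):
        j = rng.integers(0, n)  # Fisher-Yates would use rng.integers(i, n)
        arr[i], arr[j] = arr[j], arr[i]
    return tuple(arr)

trials = 60_000
counts = Counter(naive_shuffle([0, 1, 2]) for _ in range(trials))
for perm in permutations(range(3)):
    print(perm, counts[perm] / trials)  # uniform would be 1/6 ≈ 0.1667 each
```

Fisher-Yates restricts the swap partner to positions i through n-1, giving exactly n! equally likely outcomes.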

Pitfall 3: Seed Reuse

Reusing the same random seed produces identical "random" sequences. Fix the seed when you need reproducibility (debugging, controlled experiments); use distinct seeds when runs must be statistically independent.

Pitfall 4: Confusing scipy.stats.randint Bounds

scipy.stats.randint(low, high) gives DU(low, high-1)—the upper bound is exclusive! For a die, use randint(1, 7), not randint(1, 6).

Pitfall 5: Assuming Uniform When It Isn't

Real-world data is rarely uniform. Don't assume uniformity without testing. Use chi-square tests to verify.




Summary

Key Takeaways

  1. DU(a, b) assigns equal probability 1/n to each value in {a, ..., b}, where n = b - a + 1.
  2. Key properties: Mean = (a+b)/2, Variance = (n²-1)/12, Entropy = log(n).
  3. Maximum entropy: Uniform has the highest entropy among all distributions on a fixed finite set—representing maximum uncertainty.
  4. Inverse CDF method: U ~ Uniform(0,1) can generate any distribution via X = F⁻¹(U). Uniform is the "universal random source."
  5. AI/ML applications: Weight initialization, data shuffling, mini-batch selection, ε-greedy exploration, hashing.
  6. Law of Large Numbers: Sample frequencies converge to 1/n as sample size increases.
Quick Reference

| Property | Formula | Example: Fair Die |
| --- | --- | --- |
| PMF | P(X=k) = 1/n | 1/6 |
| CDF | F(x) = (⌊x⌋ − a + 1)/n | F(3) = 0.5 |
| Mean | (a + b)/2 | 3.5 |
| Variance | (n² − 1)/12 | 35/12 ≈ 2.92 |
| Entropy | log₂(n) | log₂(6) ≈ 2.58 bits |
Looking Ahead: In the next section, we'll explore Relationships Between Distributions—how Binomial becomes Poisson, how uniform generates exponential, and the beautiful connections that unify probability theory.