Chapter 5

Uniform Distribution

Continuous Distributions

Learning Objectives

By the end of this section, you will be able to:

  1. Define the continuous uniform distribution $U(a, b)$ and understand its parameters
  2. Interpret the uniform distribution as "equal probability everywhere" within bounds
  3. Calculate probabilities using the PDF and CDF for any interval
  4. Apply inverse transform sampling to generate samples from ANY distribution using uniform
  5. Recognize uniform as the maximum entropy distribution for bounded support
  6. Use uniform distribution in Monte Carlo simulation and numerical integration
  7. Implement uniform distribution operations in Python

Deep Intuition: The Fairness Distribution

Think of it as "I have no reason to favor any value over another."

When you know a value lies somewhere in a range but have absolutely no information about where it's more likely to be, the uniform distribution is the only logical choice. It treats every point in the range with equal respect.

The Universal Generator Mental Model

Here's the profound insight:

  • 🎲 Every computer random number generator produces uniform first
  • 🔄 From uniform, you can generate ANY other distribution
  • 🌱 Uniform is the "stem cell" of probability distributions
  • ⚖️ It's mathematically "fair" - no value is privileged

This makes uniform the most fundamental continuous distribution!

The Historical Principle: Insufficient Reason

In 1814, Pierre-Simon Laplace formalized the Principle of Insufficient Reason:

"When we have no information to distinguish between possibilities, we should assign them equal probabilities."

This principle led directly to the uniform distribution. It's not just a mathematical convenience - it's a statement about rational belief in the absence of information.


Why Do We Need the Uniform Distribution?

The uniform distribution serves three critical roles in probability and computing:

🎯 Role 1: Random Generation

All computer RNGs produce uniform first. Every random sample from any distribution starts as a uniform random number.

⚖️ Role 2: Maximum Entropy

Uniform maximizes entropy for bounded support. It represents "maximum ignorance" - the least informative distribution.
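The maximum-entropy claim can be checked numerically. The sketch below is illustrative only: it approximates the differential entropy of U(2, 8) with a midpoint rule and compares it against a hypothetical "ramp" density on the same interval (a density chosen here purely for comparison, not taken from the text). The function and variable names are assumptions.

```python
import math

def differential_entropy(pdf, a, b, steps=100_000):
    """Approximate h = -integral of f(x) ln f(x) over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * width
        fx = pdf(x)
        if fx > 0.0:  # treat 0 * ln(0) as 0
            total -= fx * math.log(fx) * width
    return total

a, b = 2.0, 8.0

def uniform_pdf(x):
    # Constant density 1/(b - a) on [a, b]
    return 1.0 / (b - a)

def ramp_pdf(x):
    # Hypothetical comparison density: rises linearly from 0 at a, integrates to 1
    return 2.0 * (x - a) / (b - a) ** 2

h_uniform = differential_entropy(uniform_pdf, a, b)
h_ramp = differential_entropy(ramp_pdf, a, b)

print(f"uniform entropy: {h_uniform:.4f}  (ln(b - a) = {math.log(b - a):.4f})")
print(f"ramp entropy:    {h_ramp:.4f}")
```

The uniform entropy matches ln(b - a), and the non-uniform density on the same support comes out lower, which is what "least informative" means here.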

📊 Role 3: Integration

Foundation of Monte Carlo methods. Estimating integrals by averaging function values at uniform random points.

These three roles make uniform distribution appear everywhere:

| Domain | How Uniform Is Used |
|---|---|
| Random Number Generation | Mersenne Twister, xorshift produce U(0,1) |
| Simulation | Generate random events, arrival times, positions |
| Monte Carlo Integration | Estimate integrals using random samples |
| Cryptography | Secure randomness requires perfect uniformity |
| A/B Testing | Random user assignment to treatment groups |
| Game Development | Spawn positions, random events, loot drops |
| Machine Learning | Weight initialization, dropout, data augmentation |

What Data Can We Model?

USE Uniform When:

  • Random angles - Equally likely in [0, 2π]
  • Random times within a known interval
  • Rounding errors - The discarded fractional part is approximately uniform
  • Hash function outputs (well-designed)
  • Non-informative priors for bounded parameters
  • Random positions in a bounded region
  • Quantization noise in signal processing

Do NOT Use Uniform When:

  • Values cluster around a center → Use Normal
  • Support is unbounded → Use exponential, normal, etc.
  • Rare events matter more → Use power-law
  • Prior knowledge suggests non-uniform → Use appropriate prior
  • Natural phenomena (heights, errors) → Usually not uniform

The Fairness Test

Ask yourself: "Is there any reason to believe some values are more likely than others?" If genuinely NO, use uniform. If ANY reason exists, a different distribution is probably more appropriate.

What Does the Distribution Tell Us?

Let $X \sim U(a, b)$. Here's what each quantity means:

| Quantity | Formula | What It Tells You |
|---|---|---|
| Mean | E[X] = (a + b) / 2 | Exactly the midpoint - perfect symmetry |
| Variance | Var(X) = (b - a)² / 12 | Spread depends only on range width |
| PDF Height | f(x) = 1/(b - a) | Inversely related to range width |

The PDF: A Perfect Rectangle

$$f(x) = \begin{cases} \frac{1}{b-a} & \text{if } a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}$$

Interpretation: The probability density is constant everywhere within [a, b]. The height is 1/(b-a) because:

$$\text{Area} = \text{Height} \times \text{Width} = \frac{1}{b-a} \times (b-a) = 1 \quad \checkmark$$

The CDF: A Linear Ramp

$$F(x) = \begin{cases} 0 & \text{if } x < a \\ \frac{x-a}{b-a} & \text{if } a \leq x \leq b \\ 1 & \text{if } x > b \end{cases}$$

Interpretation: The CDF increases linearly from 0 to 1 across the interval. At any point, it tells you what fraction of the interval is below that point.
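The piecewise PDF and CDF translate almost line for line into code. The following is a minimal sketch; the function names `uniform_pdf` and `uniform_cdf` are chosen here for illustration and are not part of any library.

```python
def uniform_pdf(x, a, b):
    """Density of U(a, b): constant 1/(b - a) on [a, b], zero elsewhere."""
    if not a < b:
        raise ValueError("require a < b")
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """CDF of U(a, b): 0 below a, a linear ramp on [a, b], 1 above b."""
    if not a < b:
        raise ValueError("require a < b")
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# Evaluate both at the midpoint of U(2, 8)
print(uniform_pdf(5, 2, 8))
print(uniform_cdf(5, 2, 8))
```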

The Key Insight: Proportional Probability

For uniform distribution, probability is purely about proportion of length:

$$P(c \leq X \leq d) = \frac{d - c}{b - a} = \frac{\text{interval length}}{\text{total length}} \quad \text{for } a \leq c \leq d \leq b$$

This is the defining characteristic of uniformity!
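As a small worked sketch of the length-ratio rule, the helper below (a hypothetical name, `uniform_interval_prob`) clips the query interval to [a, b] first, so that any part of [c, d] lying outside the support contributes nothing.

```python
def uniform_interval_prob(c, d, a, b):
    """P(c <= X <= d) for X ~ U(a, b), as overlap length divided by total length."""
    lo = max(a, c)               # clip the query interval to the support
    hi = min(b, d)
    overlap = max(0.0, hi - lo)  # an empty overlap gives probability 0
    return overlap / (b - a)

# For U(2, 8), the interval [4, 6] covers 2 of the 6 units of length
print(uniform_interval_prob(4, 6, 2, 8))
# An interval that starts below the support only counts its overlap [2, 5]
print(uniform_interval_prob(0, 5, 2, 8))
```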


Exploring the Distribution

Use this interactive visualizer to explore how the uniform distribution behaves. Adjust the bounds a and b, and see how the PDF and CDF change:

📊 Uniform Distribution Explorer

[Interactive figure: sliders set the lower bound a and upper bound b; the plot shows the PDF f(x) and CDF F(x) against x, with a, b, and the mean μ marked.]

Example readout for a = 2.0, b = 8.0:

  • Range (b - a) = 6.00
  • Mean (μ) = (a+b)/2 = 5.000
  • Variance (σ²) = (b-a)²/12 = 3.000
  • PDF height = 1/(b-a) = 0.167
  • Probability Density Function: f(x) = 1/(b-a) = 0.1667 for 2 ≤ x ≤ 8
  • Cumulative Distribution Function: F(x) = (x - a) / (b - a) for 2 ≤ x ≤ 8

What Do You Notice?

  • Wider range → Lower PDF height: The probability gets "spread thinner" over a larger area
  • PDF is always a rectangle: Height adjusts to keep area = 1
  • CDF is always linear: Equal probability accumulation rate everywhere
  • Mean is always centered: Exactly at (a+b)/2

Mathematical Derivation

Let's derive everything from first principles, understanding why each formula must be what it is.

Deriving the PDF from Fairness

Start with the requirement: equal probability density everywhere. This means f(x) = c (constant) for all x ∈ [a, b].

The total probability must be 1:

$$\int_a^b f(x) \, dx = 1 \implies \int_a^b c \, dx = c(b-a) = 1 \implies c = \frac{1}{b-a}$$

Therefore:

$$f(x) = \frac{1}{b-a} \quad \text{for } a \leq x \leq b$$

Deriving the CDF from the PDF

The CDF is the integral of the PDF from -∞ to x:

$$F(x) = \int_{-\infty}^{x} f(t) \, dt = \int_a^x \frac{1}{b-a} \, dt = \frac{x-a}{b-a}$$

The Linear CDF

The CDF of a uniform distribution is linear because the integral of a constant is linear. This is unique to the uniform distribution - no other continuous distribution has a perfectly linear CDF!

Deriving the Mean

$$E[X] = \int_a^b x \cdot \frac{1}{b-a} \, dx = \frac{1}{b-a} \cdot \frac{x^2}{2}\bigg|_a^b = \frac{b^2 - a^2}{2(b-a)} = \frac{(b+a)(b-a)}{2(b-a)} = \frac{a+b}{2}$$

The mean is exactly the midpoint! This makes perfect sense - the distribution is symmetric around the center.

Deriving the Variance

First, find $E[X^2]$, using $b^3 - a^3 = (b-a)(a^2 + ab + b^2)$ in the last step:

$$E[X^2] = \int_a^b x^2 \cdot \frac{1}{b-a} \, dx = \frac{1}{b-a} \cdot \frac{x^3}{3}\bigg|_a^b = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2 + ab + b^2}{3}$$

Then use $\text{Var}(X) = E[X^2] - (E[X])^2$ and put both terms over the common denominator 12:

$$\text{Var}(X) = \frac{a^2 + ab + b^2}{3} - \left(\frac{a+b}{2}\right)^2 = \frac{4(a^2 + ab + b^2) - 3(a^2 + 2ab + b^2)}{12} = \frac{a^2 - 2ab + b^2}{12} = \frac{(b-a)^2}{12}$$

The Magic Number 12

The factor of 12 in the variance formula is fundamental. For the standard uniform U(0, 1):

$$\text{Var}(U) = \frac{1}{12} \approx 0.0833$$

This small variance reflects that values are bounded and spread uniformly over a finite range.
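The derived mean and variance can be sanity-checked by simulation. This is a rough sketch using only the standard library; the seed, sample size, and bounds (a = 2, b = 8) are arbitrary choices made for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
a, b = 2.0, 8.0
n = 100_000

samples = [rng.uniform(a, b) for _ in range(n)]

sample_mean = sum(samples) / n
sample_var = sum((x - sample_mean) ** 2 for x in samples) / n

theory_mean = (a + b) / 2
theory_var = (b - a) ** 2 / 12

print(f"mean: sample {sample_mean:.4f} vs formula {theory_mean:.4f}")
print(f"var:  sample {sample_var:.4f} vs formula {theory_var:.4f}")
```

The sample values should land close to 5 and 3, with the gap shrinking as n grows.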


Key Properties

| Property | Formula | Interpretation |
|---|---|---|
| Mean (μ) | (a + b) / 2 | Center of the interval |
| Variance (σ²) | (b - a)² / 12 | Spread proportional to range squared |
| Std Dev (σ) | (b - a) / √12 | About 29% of range width |
| Median | (a + b) / 2 | Same as mean (symmetric) |
| Mode | Any point in [a, b] | Every point is equally likely! |
| Skewness | 0 | Perfectly symmetric |
| Kurtosis | 9/5 = 1.8 | Less peaked than normal (platykurtic) |

The Standard Uniform: U(0, 1)

The standard uniform U(0, 1) is special because:

🎯 U(0, 1) - The Canonical Uniform

PDF: f(x) = 1 for 0 ≤ x ≤ 1

CDF: F(x) = x for 0 ≤ x ≤ 1

Mean: 0.5

Variance: 1/12 ≈ 0.0833

Key Property: Any U(a, b) can be generated from U(0, 1):

$$\text{If } U \sim U(0, 1) \text{ then } X = a + (b-a)U \sim U(a, b)$$
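The location-scale relationship is a one-line transformation in code. A minimal sketch, assuming a helper named `scale_uniform` (a name introduced here, not a library function):

```python
import random

def scale_uniform(u, a, b):
    """Map a standard uniform value u in [0, 1] onto [a, b]."""
    return a + (b - a) * u

rng = random.Random(1)  # fixed seed for repeatability
a, b = 2.0, 8.0
xs = [scale_uniform(rng.random(), a, b) for _ in range(10_000)]

print(min(xs), max(xs))   # both stay inside [2, 8]
print(sum(xs) / len(xs))  # close to the midpoint (a + b) / 2
```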

The Inverse Transform Method

This is perhaps the most important application of the uniform distribution. It answers: How can we generate random samples from ANY distribution?

The Fundamental Theorem: If U ~ U(0, 1) and F is any CDF with inverse F⁻¹, then X = F⁻¹(U) has CDF F.

Why Does This Work?

Let X = F⁻¹(U) where U ~ U(0, 1). We want to prove X has CDF F:

$$P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)$$

The last step uses the fact that for U ~ U(0, 1), P(U ≤ p) = p!

The Algorithm

  1. Generate U ~ U(0, 1) using computer RNG
  2. Compute X = F⁻¹(U) using inverse CDF of target distribution
  3. X now follows the target distribution!
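The three steps above can be sketched as a generic helper that takes any inverse CDF as a function. The target used here, F(x) = x² on [0, 1] with inverse F⁻¹(u) = √u, is a hypothetical example picked because its mean (2/3) is easy to verify by hand; the names are illustrative.

```python
import math
import random

def inverse_transform_sample(inv_cdf, n, rng):
    """Draw n samples by pushing U(0, 1) draws through the inverse CDF."""
    return [inv_cdf(rng.random()) for _ in range(n)]

def target_inv_cdf(u):
    # Hypothetical target: F(x) = x^2 on [0, 1], so F^{-1}(u) = sqrt(u)
    return math.sqrt(u)

rng = random.Random(42)
samples = inverse_transform_sample(target_inv_cdf, 50_000, rng)

sample_mean = sum(samples) / len(samples)
# The empirical CDF at 0.5 should be close to F(0.5) = 0.25
share_below_half = sum(1 for x in samples if x <= 0.5) / len(samples)

print(f"sample mean: {sample_mean:.4f} (theory 2/3)")
print(f"P(X <= 0.5): {share_below_half:.4f} (theory 0.25)")
```

Passing the inverse CDF in as a function keeps the sampler independent of the target distribution, which mirrors the structure of the algorithm.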

Use this interactive demo to see the inverse transform method in action:

🔄 Inverse Transform Sampling Demo

[Interactive demo: U(0,1) samples are transformed into a target distribution via its inverse CDF. One panel plots the CDF F(x) with U on the vertical axis; a second panel shows a histogram of the generated samples.]

How It Works

  1. Generate U ~ U(0, 1) on the y-axis
  2. Draw a horizontal line to hit the CDF curve
  3. Drop vertically to find X = F⁻¹(U) on the x-axis
  4. X follows the target distribution!

As you add more samples, the histogram converges to the true PDF (red curve).

Examples of Inverse Transform

| Distribution | CDF F(x) | Inverse F⁻¹(u) | Sample as (1 - U is also U(0, 1)) |
|---|---|---|---|
| Exponential(λ) | 1 - e^(-λx) | -ln(1-u)/λ | -ln(U)/λ |
| Weibull(k, λ) | 1 - e^(-(x/λ)^k) | λ(-ln(1-u))^(1/k) | λ(-ln(U))^(1/k) |
| Logistic(0, 1) | 1/(1+e^(-x)) | ln(u/(1-u)) | ln(U/(1-U)) |
| Cauchy(0, 1) | (1/π)arctan(x) + 1/2 | tan(π(u-1/2)) | tan(π(U-1/2)) |
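One way to check the inverse formulas in the table is a round trip: F(F⁻¹(u)) should return u. The sketch below does this for all four rows. The parameter values (rate 2 for the exponential, shape 1.5 and scale 2 for the Weibull) are arbitrary assumptions for illustration.

```python
import math

rate = 2.0    # exponential rate (λ in the Exponential row)
shape = 1.5   # Weibull shape k
scale = 2.0   # Weibull scale (λ in the Weibull row)

# Each entry is (CDF, inverse CDF), written as in the table
pairs = {
    "exponential": (
        lambda x: 1.0 - math.exp(-rate * x),
        lambda u: -math.log(1.0 - u) / rate,
    ),
    "weibull": (
        lambda x: 1.0 - math.exp(-((x / scale) ** shape)),
        lambda u: scale * (-math.log(1.0 - u)) ** (1.0 / shape),
    ),
    "logistic": (
        lambda x: 1.0 / (1.0 + math.exp(-x)),
        lambda u: math.log(u / (1.0 - u)),
    ),
    "cauchy": (
        lambda x: math.atan(x) / math.pi + 0.5,
        lambda u: math.tan(math.pi * (u - 0.5)),
    ),
}

test_points = (0.05, 0.25, 0.5, 0.75, 0.95)
max_error = {
    name: max(abs(cdf(inv(u)) - u) for u in test_points)
    for name, (cdf, inv) in pairs.items()
}

for name, err in max_error.items():
    print(f"{name:12s} max |F(F^-1(u)) - u| = {err:.2e}")
```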

When Inverse Doesn't Exist Analytically

For distributions like the normal, the inverse CDF has no closed form. In these cases, we use:

  • Numerical approximation - Tables or algorithms for Φ⁻¹
  • Box-Muller transform - Two uniforms → two normals
  • Rejection sampling - Accept/reject uniform proposals
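As an illustration of the second option, here is a minimal sketch of the basic (trigonometric) Box-Muller transform, which turns two independent uniforms into two independent standard normals. It is a teaching sketch under that assumption and is not how any particular library generates normals.

```python
import math
import random

def box_muller(rng):
    """Return two independent N(0, 1) samples built from two uniform draws."""
    u1 = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

rng = random.Random(7)
normals = []
for _ in range(50_000):
    normals.extend(box_muller(rng))

mean = sum(normals) / len(normals)
std = math.sqrt(sum((z - mean) ** 2 for z in normals) / len(normals))

print(f"mean = {mean:.4f}, std = {std:.4f}")  # should be near 0 and 1
```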

Monte Carlo Integration

One of the most powerful applications of uniform distribution is Monte Carlo integration - estimating integrals using random samples.

The Core Idea

Want to compute $\int_a^b g(x) \, dx$? Here's the trick:

$$\int_a^b g(x) \, dx = (b-a) \cdot E[g(U)] \quad \text{where } U \sim U(a, b)$$

So we can estimate the integral by:

  1. Generate n samples U₁, U₂, ..., Uₙ ~ U(a, b)
  2. Compute the average: $\bar{g} = \frac{1}{n}\sum_{i=1}^{n} g(U_i)$
  3. Multiply by width: $\hat{I} = (b-a) \cdot \bar{g}$

By the Law of Large Numbers, this converges to the true integral as n → ∞!
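The estimator above can be paired with a standard error, which gives a rough sense of how far the estimate is likely to be from the true integral. The sketch below integrates g(x) = x² over [0, 3] (true value 9); the function name `mc_integrate`, the integrand, and the seed are assumptions made for illustration.

```python
import math
import random

def mc_integrate(g, a, b, n, rng):
    """Monte Carlo estimate of the integral of g over [a, b], with its standard error."""
    values = [g(rng.uniform(a, b)) for _ in range(n)]
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    estimate = (b - a) * mean
    std_error = (b - a) * math.sqrt(variance / n)
    return estimate, std_error

rng = random.Random(123)
estimate, std_error = mc_integrate(lambda x: x * x, 0.0, 3.0, 100_000, rng)

print(f"estimate = {estimate:.4f} +/- {std_error:.4f} (true value 9)")
```

Reporting the standard error alongside the estimate is a design choice: the point estimate alone says nothing about how many samples were enough.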

Try this interactive demo that estimates π using Monte Carlo:

🎯 Monte Carlo Estimation of π

[Interactive demo: random darts are thrown at a unit square containing a quarter circle. Points inside the circle are drawn in blue and points outside in red; the demo reports the running estimate of π, its error against the true value 3.141593, the total number of points, and the percentage that landed inside the circle.]

The Math Behind It

The quarter circle has area = π/4, while the unit square has area = 1.

Ratio: (Points in circle) / (Total points) ≈ π/4

Therefore: π ≈ 4 × (Points in circle) / (Total points)

By the Law of Large Numbers, as n → ∞, the estimate converges to true π. Error decreases as O(1/√n).

Why Monte Carlo Matters

  • High dimensions: Grid-based integration becomes impractical in high dimensions, but Monte Carlo keeps working
  • Complex regions: Irregular integration domains are no problem
  • Error rate: Error decreases as 1/√n regardless of dimension!

The Monte Carlo Advantage

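In D dimensions, a grid-based quadrature rule with n evaluation points typically has error of order O(n^(-k/D)), where k is a small constant fixed by the rule (the simplest rules give roughly O(n^(-1/D))). That rate degrades quickly as D grows. Monte Carlo has error O(n^(-1/2)) regardless of dimension, which is why it dominates high-dimensional integration in ML and physics.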
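The O(n^(-1/2)) rate can be observed empirically: multiplying n by 25 should shrink the typical error by a factor of about 5. The sketch below measures the root-mean-square error of the quarter-circle π estimate at two sample sizes. The sample sizes, trial count, and seed are arbitrary assumptions.

```python
import math
import random

def estimate_pi(n, rng):
    """Estimate pi from n uniform points in the unit square."""
    inside = sum(1 for _ in range(n) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n

def rmse(n, trials, rng):
    """Root-mean-square error of the pi estimate over repeated trials."""
    squared_errors = [(estimate_pi(n, rng) - math.pi) ** 2 for _ in range(trials)]
    return math.sqrt(sum(squared_errors) / trials)

rng = random.Random(2024)
rmse_small = rmse(100, 200, rng)     # n = 100
rmse_large = rmse(2_500, 200, rng)   # n = 2500, i.e. 25 times more points
ratio = rmse_small / rmse_large

print(f"RMSE at n=100:  {rmse_small:.4f}")
print(f"RMSE at n=2500: {rmse_large:.4f}")
print(f"ratio: {ratio:.2f} (theory: sqrt(25) = 5)")
```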


Real-World Applications

1. Random Number Generation

💻 The Foundation of All Randomness

Every computer random number generator (Mersenne Twister, xorshift, PCG, etc.) produces uniformly distributed bits as its raw output, which libraries then scale to U(0, 1). Samples from all other distributions are derived from these uniform values.

🐍rng_example.py
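```python
import numpy as np

# The generator's raw output is uniform, exposed here as a U(0, 1) sample
u = np.random.random()

# Other distributions are computed from uniform draws internally;
# the exact algorithm (Box-Muller, polar, ziggurat, ...) varies by library and version
normal = np.random.randn()             # standard normal sample
exponential = np.random.exponential()  # Exponential(1) sample
```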

2. Cryptography

🔐 Security Requires Perfect Uniformity

Cryptographic keys must be generated from uniform distributions. Any bias in the distribution creates a vulnerability that attackers can exploit.

  • Key generation requires uniform random bits
  • Initialization vectors (IVs) must be uniformly random
  • Nonces in encryption schemes must be uniform

3. A/B Testing

📊 Fair Random Assignment

When assigning users to A/B test variants, we need uniform random assignment to ensure unbiased groups. Bias in assignment invalidates statistical conclusions.

🐍ab_test.py
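```python
import hashlib

def assign_variant(user_id):
    # Python's built-in hash() is salted per process for strings,
    # so a stable digest (SHA-256 here) is used instead
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Map the first 8 bytes to an approximately uniform value in [0, 1)
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < 0.5:
        return "control"
    else:
        return "treatment"
```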
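A quick way to sanity-check a hash-based assignment scheme is to run it over many synthetic IDs and look at the split. The sketch below is self-contained and uses SHA-256 from the standard library as the hash; the helper names and the `user-<i>` ID format are hypothetical.

```python
import hashlib

def stable_uniform(key):
    """Map a string key to an approximately uniform value in [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def assign_variant_stable(user_id):
    """Deterministic 50/50 assignment based on the hashed user ID."""
    return "control" if stable_uniform(str(user_id)) < 0.5 else "treatment"

user_ids = [f"user-{i}" for i in range(20_000)]  # synthetic IDs for the check
assignments = [assign_variant_stable(uid) for uid in user_ids]
control_share = assignments.count("control") / len(assignments)

print(f"control share: {control_share:.4f}")  # expected to be near 0.5
```

Because the assignment depends only on the ID, the same user always lands in the same group across sessions and processes.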

4. Simulation

🎮 Random Events and Positions

  • Random spawn positions: U(0, map_width) × U(0, map_height)
  • Random angles: U(0, 2π)
  • Loot drop rolls: U(0, 1) compared to drop rate
  • Traffic simulation: random arrival times
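The first three bullets can be sketched with the standard library alone. The map size (800 by 600) and the 5% drop rate are made-up values, and the function names are illustrative rather than taken from any game engine.

```python
import math
import random

def random_spawn(map_width, map_height, rng):
    """Independent uniforms for x and y give a uniform point in the rectangle."""
    return rng.uniform(0.0, map_width), rng.uniform(0.0, map_height)

def random_heading(rng):
    """A direction in radians, uniform on [0, 2*pi]."""
    return rng.uniform(0.0, 2.0 * math.pi)

def loot_drops(drop_rate, rng):
    """True with probability drop_rate: compare a U(0, 1) roll to the rate."""
    return rng.random() < drop_rate

rng = random.Random(99)
spawns = [random_spawn(800.0, 600.0, rng) for _ in range(1_000)]
headings = [random_heading(rng) for _ in range(1_000)]

rolls = 100_000
observed_rate = sum(loot_drops(0.05, rng) for _ in range(rolls)) / rolls
print(f"observed drop rate: {observed_rate:.4f} (configured 0.05)")
```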

AI/ML Applications

Uniform distribution is ubiquitous in machine learning, often working behind the scenes:

1. Weight Initialization

🧠 Xavier/Glorot Initialization

The famous Xavier initialization uses uniform distribution:

$$W \sim U\left(-\sqrt{\frac{6}{n_{in} + n_{out}}}, \sqrt{\frac{6}{n_{in} + n_{out}}}\right)$$

Why uniform? It provides bounded initialization with controlled variance, preventing exploding/vanishing gradients.

🐍weight_init.py
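```python
import torch.nn as nn

layer = nn.Linear(128, 64)  # example layer; the sizes are arbitrary

# Xavier uniform initialization (modifies the weights in place)
nn.init.xavier_uniform_(layer.weight)

# Equivalent to:
# limit = sqrt(6 / (fan_in + fan_out))
# W ~ U(-limit, limit)
```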
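The "controlled variance" follows from the uniform variance formula: U(-limit, limit) has variance (2·limit)²/12 = limit²/3, which equals 2/(fan_in + fan_out) for the Xavier limit. The sketch below checks this numerically without any deep learning library; the layer sizes are arbitrary assumptions.

```python
import math
import random

fan_in, fan_out = 256, 128  # arbitrary example layer sizes
limit = math.sqrt(6.0 / (fan_in + fan_out))

rng = random.Random(0)
weights = [rng.uniform(-limit, limit) for _ in range(fan_in * fan_out)]

mean = sum(weights) / len(weights)
empirical_var = sum((w - mean) ** 2 for w in weights) / len(weights)

# Var of U(-limit, limit) is (2 * limit)^2 / 12 = limit^2 / 3
formula_var = limit ** 2 / 3.0
target_var = 2.0 / (fan_in + fan_out)

print(f"empirical variance: {empirical_var:.6f}")
print(f"limit^2 / 3:        {formula_var:.6f}")
print(f"2 / (fan_in + fan_out): {target_var:.6f}")
```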

2. Dropout

🎲 Bernoulli from Uniform

Dropout generates Bernoulli masks from uniform:

🐍dropout.py
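```python
import numpy as np

def dropout(x, p):
    # Keep each unit with probability 1 - p, then rescale to preserve the mean
    mask = np.random.uniform(0, 1, x.shape) > p
    return x * mask / (1 - p)
```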

By comparing U(0, 1) to threshold p, we get Bernoulli(1-p) for each neuron.
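The effect of the threshold comparison and the 1/(1 - p) rescaling can be checked on a toy input. This is a pure standard-library sketch operating on a list rather than an array; `dropout_list` is a hypothetical name and p = 0.3 is an arbitrary choice.

```python
import random

def dropout_list(values, p, rng):
    """Zero each value with probability p; scale survivors by 1 / (1 - p)."""
    keep_scale = 1.0 / (1.0 - p)
    return [v * keep_scale if rng.random() > p else 0.0 for v in values]

rng = random.Random(3)
p = 0.3
activations = [1.0] * 100_000  # constant input makes the averages easy to read

dropped = dropout_list(activations, p, rng)
kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)

print(f"kept fraction: {kept_fraction:.4f} (expected about {1 - p})")
print(f"mean after dropout: {mean_after:.4f} (input mean was 1.0)")
```

About 70% of units survive, and the rescaling keeps the average activation near its original value.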

3. Data Augmentation

📷 Random Transformations

| Augmentation | Uniform Distribution Used |
|---|---|
| Random crop | U(0, max_offset) for x and y positions |
| Random rotation | U(-θ_max, θ_max) for angle |
| Random brightness | U(1-δ, 1+δ) for brightness factor |
| Random flip | U(0, 1) < 0.5 triggers flip |
| Random scale | U(min_scale, max_scale) |

4. Hyperparameter Search

🔍 Random Search

Random hyperparameter search uses uniform distributions:

🐍hyperparam_search.py
```python
import numpy as np

# Learning rate: log-uniform (uniform in log space)
log_lr = np.random.uniform(np.log(1e-5), np.log(1e-1))
lr = np.exp(log_lr)

# Dropout: uniform
dropout = np.random.uniform(0.1, 0.5)

# Hidden units: discrete uniform on {64, ..., 511} (upper bound is exclusive)
hidden = np.random.randint(64, 512)
```
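To see what "uniform in log space" buys, the sketch below draws log-uniform learning rates and counts how many land in each decade between 1e-5 and 1e-1. Each decade should receive about a quarter of the samples, whereas a plain uniform draw on the same range would put roughly 90% of samples in the top decade. The helper name and seed are assumptions.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [ln(low), ln(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(5)
learning_rates = [sample_log_uniform(1e-5, 1e-1, rng) for _ in range(40_000)]

# Count samples per decade: [1e-5, 1e-4), [1e-4, 1e-3), [1e-3, 1e-2), [1e-2, 1e-1]
decade_counts = [0, 0, 0, 0]
for lr in learning_rates:
    decade = int(math.floor(math.log10(lr))) + 5
    decade_counts[min(3, max(0, decade))] += 1  # clamp to guard against rounding at the edges

decade_shares = [count / len(learning_rates) for count in decade_counts]
print(decade_shares)
```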

5. Variational Inference

📐 Reparameterization Trick

VAEs sample latent variables as z = μ + σ·ε with ε ~ N(0, 1). Writing the noise as ε = Φ⁻¹(U) expresses the same sample as a transform of a uniform draw:

$$z = \mu + \sigma \cdot \Phi^{-1}(U) \quad \text{where } U \sim U(0,1)$$

This allows gradients to flow through the sampling operation.
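The formula can be sketched with the standard library's `statistics.NormalDist`, whose `inv_cdf` method plays the role of Φ⁻¹. This only illustrates the sampling step: in an actual VAE, μ and σ would come from an encoder network and the operation would run inside an autodiff framework. The values of μ and σ below are arbitrary.

```python
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(0.0, 1.0)

def reparameterized_sample(mu, sigma, u):
    """z = mu + sigma * Phi^{-1}(u); the randomness lives entirely in u."""
    return mu + sigma * STANDARD_NORMAL.inv_cdf(u)

rng = random.Random(11)
mu, sigma = 1.5, 0.5  # arbitrary illustrative parameters

# inv_cdf requires 0 < u < 1, so clamp away the (very unlikely) exact 0.0
uniforms = [max(rng.random(), 1e-12) for _ in range(50_000)]
z = [reparameterized_sample(mu, sigma, u) for u in uniforms]

z_mean = sum(z) / len(z)
z_std = (sum((v - z_mean) ** 2 for v in z) / len(z)) ** 0.5
print(f"mean = {z_mean:.4f} (target {mu}), std = {z_std:.4f} (target {sigma})")
```

For a fixed u, z is a deterministic function of μ and σ, with ∂z/∂μ = 1 and ∂z/∂σ = Φ⁻¹(u); that is the property the trick relies on.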


Python Implementation

Basic Operations

🐍uniform_basics.py
```python
import numpy as np
from scipy import stats

# Create uniform distribution U(2, 8)
a, b = 2, 8
uniform_dist = stats.uniform(loc=a, scale=b-a)  # NOTE: scale = b - a

# PDF
x = 5
pdf_value = uniform_dist.pdf(x)
print(f"f({x}) = {pdf_value:.4f}")  # 0.1667 = 1/(8-2)

# CDF
cdf_value = uniform_dist.cdf(x)
print(f"F({x}) = {cdf_value:.4f}")  # 0.5 = (5-2)/(8-2)

# Probability of interval
prob = uniform_dist.cdf(6) - uniform_dist.cdf(4)
print(f"P(4 ≤ X ≤ 6) = {prob:.4f}")  # 0.3333 = 2/6

# Mean and variance
print(f"Mean = {uniform_dist.mean():.4f}")  # 5.0
print(f"Var = {uniform_dist.var():.4f}")   # 3.0

# Generate samples
samples = uniform_dist.rvs(size=10000)
print(f"Sample mean: {samples.mean():.4f}")
print(f"Sample var: {samples.var():.4f}")
```

Inverse Transform Sampling

🐍inverse_transform.py
```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Generate standard uniform samples
n = 10000
u = np.random.uniform(0, 1, n)

# Transform to exponential using inverse CDF
# F(x) = 1 - e^(-λx), so F^(-1)(u) = -ln(1-u)/λ
lambda_rate = 2.0
exponential_samples = -np.log(1 - u) / lambda_rate

# Verify: compare with scipy
true_exponential = stats.expon(scale=1/lambda_rate).rvs(n)

# Plot comparison
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

axes[0].hist(exponential_samples, bins=50, density=True, alpha=0.7,
             label='Inverse Transform')
x = np.linspace(0, 5, 100)
axes[0].plot(x, lambda_rate * np.exp(-lambda_rate * x), 'r-',
             label='True PDF', linewidth=2)
axes[0].legend()
axes[0].set_title('Exponential from Uniform')

axes[1].hist(true_exponential, bins=50, density=True, alpha=0.7,
             label='scipy.stats')
axes[1].plot(x, lambda_rate * np.exp(-lambda_rate * x), 'r-',
             label='True PDF', linewidth=2)
axes[1].legend()
axes[1].set_title('Direct scipy Generation')

plt.tight_layout()
plt.show()
```

Monte Carlo Integration

🐍monte_carlo.py
```python
import numpy as np

def monte_carlo_integrate(f, a, b, n=10000):
    """Estimate integral of f from a to b using Monte Carlo."""
    u = np.random.uniform(a, b, n)
    return (b - a) * np.mean(f(u))

# Example 1: Integral of sin(x) from 0 to π
# True value: -cos(π) + cos(0) = 2
estimate = monte_carlo_integrate(np.sin, 0, np.pi, n=100000)
print(f"∫sin(x)dx from 0 to π: {estimate:.6f} (true: 2.0)")

# Example 2: Estimate π using quarter circle
# Area of quarter circle = π/4, so π = 4 * (fraction of points in circle)
n = 100000
x = np.random.uniform(0, 1, n)
y = np.random.uniform(0, 1, n)
in_circle = (x**2 + y**2) <= 1
pi_estimate = 4 * np.mean(in_circle)
print(f"π estimate: {pi_estimate:.6f} (true: {np.pi:.6f})")

# Example 3: Higher-dimensional integral
# ∫∫∫ e^(-(x² + y² + z²)) dx dy dz over [-1, 1]³
def integrand(xyz):
    return np.exp(-np.sum(xyz**2, axis=1))

n = 100000
samples = np.random.uniform(-1, 1, (n, 3))
volume = 2**3  # volume of [-1, 1]³
estimate = volume * np.mean(integrand(samples))
print(f"3D Gaussian integral estimate: {estimate:.6f}")
```

Common Pitfalls

SciPy Parameterization

SciPy uses loc (lower bound) and scale (width), NOT (a, b) directly:

🐍scipy_warning.py
```python
from scipy import stats

# For U(2, 8):
correct = stats.uniform(loc=2, scale=6)   # ✓ scale = 8 - 2 = 6
wrong = stats.uniform(2, 8)               # ✗ This gives U(2, 10)!

# Always verify:
print(correct.mean())  # Should be 5.0 for U(2, 8)
```

NumPy vs SciPy

NumPy and SciPy have different conventions:

🐍numpy_scipy.py
```python
import numpy as np
from scipy import stats

# NumPy: uses (low, high) directly
np.random.uniform(2, 8)  # U(2, 8) ✓

# SciPy: uses (loc, scale)
stats.uniform(loc=2, scale=6)  # U(2, 8) ✓
stats.uniform(2, 8)  # U(2, 10) ✗ - NOT what you expect!
```

Continuous vs Discrete

Don't confuse continuous and discrete uniform:

  • np.random.uniform(a, b) - continuous U(a, b)
  • np.random.randint(a, b) - discrete uniform on {a, a+1, ..., b-1}
  • np.random.choice(arr) - discrete uniform over array elements

Test Your Understanding

Check your understanding of the uniform distribution with these practice problems:

📝 Uniform Distribution Quiz

For X ~ U(2, 10), what is the probability P(4 ≤ X ≤ 7)?

Summary

The uniform distribution is deceptively simple yet profoundly important. It is the foundation upon which all random number generation is built.

Key Formulas

| Property | Formula |
|---|---|
| PDF | f(x) = 1/(b-a) for a ≤ x ≤ b |
| CDF | F(x) = (x-a)/(b-a) for a ≤ x ≤ b |
| Mean | E[X] = (a+b)/2 |
| Variance | Var(X) = (b-a)²/12 |
| Interval Probability | P(c ≤ X ≤ d) = (d-c)/(b-a) |
| Transform | If U ~ U(0,1), then a + (b-a)U ~ U(a,b) |

Key Takeaways

  1. Uniform distribution models "equal probability everywhere" within bounded support
  2. Every computer RNG produces uniform first - it's the universal generator
  3. Inverse transform sampling: U(0,1) + inverse CDF = any distribution
  4. Monte Carlo integration uses uniform samples to estimate integrals
  5. Uniform maximizes entropy for bounded support - the "least informative" distribution
  6. In ML: weight initialization, dropout, data augmentation all use uniform

The Essence of Uniform:

"The uniform distribution is the mathematical expression of fairness: equal treatment for all values in its support, and the mother of all distributions through inverse transform sampling."

Coming Next: In the next section, we'll explore the Normal Distribution - the famous bell curve that arises from the Central Limit Theorem and dominates natural phenomena. You'll see why it's often called "the most important distribution in all of statistics."