Learning Objectives
By the end of this section, you will be able to:
- Understand why exponentiating a normal random variable creates a log-normal distribution and derive this relationship mathematically
- Distinguish multiplicative from additive processes and recognize when log-normal is the appropriate model
- Calculate the mean, median, mode, and variance, and understand why μ and σ are NOT the mean and standard deviation
- Apply log-normal distributions to model stock prices, income distributions, and other real-world phenomena
- Implement log-normal models in Python using scipy.stats
- Connect log-normal concepts to AI/ML applications including weight initialization, attention mechanisms, and uncertainty quantification
The Big Picture: When Multiplication Replaces Addition
"The log-normal distribution is what you get when many small effects multiply together, just as the normal distribution is what you get when many small effects add together."
You already know the Central Limit Theorem: when you add many independent random effects, the sum tends toward a normal distribution. But what happens when effects multiply instead of add?
Think about compound interest: a $100 investment growing at 10% annually becomes $100 × 1.1 × 1.1 × 1.1... This is multiplicative growth: each year's effect multiplies the previous total. When the yearly growth factors are random rather than fixed, the final value after many periods is approximately log-normally distributed.
The Key Mathematical Insight
Here's the beautiful trick: multiplication becomes addition when you take logarithms!
$$\ln(a \times b) = \ln(a) + \ln(b)$$
So if your final value is a product of many random factors, then the logarithm of your final value is a sum of many random terms. By the CLT, this sum is approximately normal. Therefore, the original value (before taking logs) is log-normally distributed.
The Fundamental Relationship
If X ~ N(μ, σ²) (normal), then Y = e^X ~ LogNormal(μ, σ).
Equivalently: If Y ~ LogNormal(μ, σ), then ln(Y) ~ N(μ, σ²).
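The CLT argument above can be checked numerically. The sketch below is only an illustration: the number of factors and the uniform range [0.9, 1.1] are arbitrary assumptions, not values from the text. It multiplies many independent positive factors and compares the skewness of the products with the skewness of their logarithms.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumed setup: each outcome is the product of 200 independent growth
# factors drawn uniformly from [0.9, 1.1] (an arbitrary illustrative range).
n_paths, n_factors = 20_000, 200
factors = rng.uniform(0.9, 1.1, size=(n_paths, n_factors))

products = factors.prod(axis=1)    # multiplicative outcome
log_products = np.log(products)    # equals the sum of the log-factors

# The products are right-skewed; their logs are close to symmetric,
# which is what the CLT predicts for a sum of many independent terms.
skew_raw = stats.skew(products)
skew_log = stats.skew(log_products)
print(f"skewness of products:      {skew_raw:.3f}")
print(f"skewness of log(products): {skew_log:.3f}")
```

A skewness near zero for the logs, next to a clearly positive skewness for the raw products, is the signature of approximately log-normal data.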
Multiplicative vs. Additive Processes
Understanding when to use log-normal vs. normal requires recognizing the fundamental nature of the process generating your data:
Additive Processes → Normal Distribution
- Human Height: Many genes each add a small amount to height. Height ≈ base + gene₁ effect + gene₂ effect + ... + environment effects
- Measurement Errors: Total error = sum of many small independent errors
- IQ Scores: Designed to be normally distributed through standardization
Multiplicative Processes → Log-Normal Distribution
- Stock Prices: Today's price = yesterday's price × (1 + return). Returns compound: P_n = P_0 × (1+r_1) × (1+r_2) × ... × (1+r_n)
- Income: Income grows multiplicatively with raises. A 5% raise means multiplying by 1.05, not adding $5,000.
- Biological Growth: Cell populations double each division: N = N_0 × 2^generations. Particle sizes result from breakage/growth processes.
- File Sizes: Programs grow multiplicatively as features multiply complexity.
Quick Test: Is it Multiplicative?
Ask yourself: "Does a 10% change make sense, regardless of the current value?" If a 10% raise makes sense whether you earn $50K or $500K, the process is multiplicative. If it only makes sense to add a fixed amount (like $5,000), it's additive.
Mathematical Definition
Definition 1: Via Transformation
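A random variable Y follows a log-normal distribution with parameters μ and σ > 0 if:
$$\ln(Y) \sim \mathcal{N}(\mu, \sigma^2)$$
We write Y ~ LogNormal(μ, σ).
Definition 2: Probability Density Function (PDF)
$$f(y; \mu, \sigma) = \frac{1}{y\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln y - \mu)^2}{2\sigma^2}\right), \quad y > 0$$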
Symbol Table
| Symbol | Name | Meaning | Range |
|---|---|---|---|
| y | Random variable | The value we observe | (0, ∞) |
| μ | Location parameter | Mean of ln(Y), NOT mean of Y | (-∞, ∞) |
| σ | Scale parameter | Std dev of ln(Y), NOT std dev of Y | (0, ∞) |
| ln(y) | Natural logarithm | Log-transformed value | (-∞, ∞) |
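To make the symbols concrete, here is a minimal sketch that writes the log-normal density out by hand, f(y) = exp(-(ln y - μ)² / (2σ²)) / (y σ √(2π)) for y > 0, and compares it with `scipy.stats.lognorm`. The function name `lognormal_pdf` and the parameter values are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def lognormal_pdf(y, mu, sigma):
    """Density of LogNormal(mu, sigma) at y > 0, written out from the formula."""
    y = np.asarray(y, dtype=float)
    exponent = -((np.log(y) - mu) ** 2) / (2 * sigma**2)
    return np.exp(exponent) / (y * sigma * np.sqrt(2 * np.pi))

mu, sigma = 0.5, 0.8                      # illustrative values only
y = np.array([0.1, 1.0, 2.0, 10.0])

manual = lognormal_pdf(y, mu, sigma)
# scipy's parameterization: s = sigma, scale = exp(mu)
reference = stats.lognorm(s=sigma, scale=np.exp(mu)).pdf(y)

print(np.allclose(manual, reference))
```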
Critical Warning: μ and σ Are NOT What You Think!
The parameters μ and σ are the mean and standard deviation of ln(Y), NOT of Y itself! This is the most common source of confusion with log-normal distributions.
- E[Y] ≠ μ (The mean of Y is NOT μ)
- SD(Y) ≠ σ (The SD of Y is NOT σ)
Key Statistics
| Statistic | Formula | Intuition |
|---|---|---|
| Mean (E[Y]) | e^(μ + σ²/2) | Always larger than median |
| Median | e^μ | The 50th percentile; simpler formula |
| Mode | e^(μ - σ²) | The peak of the PDF; smallest of the three |
| Variance | (e^(σ²) - 1) × e^(2μ + σ²) | Grows rapidly with σ |
| Skewness | (e^(σ²) + 2) × √(e^(σ²) - 1) | Always positive (right-skewed) |
| Support | (0, ∞) | Only positive values possible |
The Golden Rule: For log-normal distributions, Mean > Median > Mode always. This is because the distribution is always right-skewed—the heavy right tail "pulls" the mean to the right.
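The table formulas can be turned into a small helper and cross-checked against scipy. This is a sketch, not a library API: `lognormal_summary` is a hypothetical name, and μ = 0, σ = 0.5 are arbitrary example values.

```python
import numpy as np
from scipy import stats

def lognormal_summary(mu, sigma):
    """Closed-form statistics of LogNormal(mu, sigma)."""
    v = sigma**2
    return {
        "mean": np.exp(mu + v / 2),
        "median": np.exp(mu),
        "mode": np.exp(mu - v),
        "variance": (np.exp(v) - 1) * np.exp(2 * mu + v),
        "skewness": (np.exp(v) + 2) * np.sqrt(np.exp(v) - 1),
    }

mu, sigma = 0.0, 0.5
summary = lognormal_summary(mu, sigma)

# Cross-check mean, variance, and skewness against scipy
dist = stats.lognorm(s=sigma, scale=np.exp(mu))
scipy_mean, scipy_var, scipy_skew = dist.stats(moments="mvs")

for name, value in summary.items():
    print(f"{name:>9}: {value:.4f}")
print("mode < median < mean:", summary["mode"] < summary["median"] < summary["mean"])
```

Rerunning with a larger σ shows the gap between mode, median, and mean widening.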
Interactive PDF Explorer
Explore how the log-normal distribution changes with different parameters. Pay special attention to how the mean, median, and mode relate to each other.
[Interactive figure: Log-Normal Distribution Explorer. Controls: μ = mean of ln(X), NOT the mean of X; σ = std dev of ln(X), controls skewness.]
Key Insight: Mean > Median > Mode (Always!)
Notice how Mode < Median < Mean for the log-normal distribution. This is because the distribution is always right-skewed (for example, at σ = 0.5 the skewness is about 1.75). The mean is "pulled" rightward by the heavy right tail. As σ increases, this gap widens.
Try This
- Set σ to 0.2 (small) and observe how the distribution looks almost symmetric
- Increase σ to 1.0+ and watch the right tail stretch dramatically
- Notice how the mean is always pulled to the right of the median
The Exponential Transformation
The heart of the log-normal distribution is the transformation Y = e^X. This interactive visualization shows how a symmetric normal distribution transforms into a right-skewed log-normal:
[Interactive figure: The Exponential Transformation: Normal → Log-Normal]
If X ~ N(μ, σ²), then Y = e^X ~ LogNormal(μ, σ). Watch how the symmetric Normal distribution transforms into the right-skewed Log-Normal.
The Transformation Insight
Notice how negative normal values (X < 0) get mapped to values between 0 and 1, while positive normal values (X > 0) get mapped to values greater than 1. This is why log-normal is always positive and right-skewed: the exponential function "stretches" the right side while "compressing" the left.
Why the Transformation Creates Skewness
The exponential function has a key property: it compresses negative values and stretches positive values:
- When X = -2 (2 standard deviations left): Y = e^(-2) ≈ 0.14 (compressed near zero)
- When X = 0 (center): Y = e^0 = 1
- When X = +2 (2 standard deviations right): Y = e^2 ≈ 7.39 (stretched far right)
This asymmetric stretching is why log-normal is always right-skewed!
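The three values above can be reproduced in a few lines. The sketch below assumes a standard normal X, so that ±2 means two standard deviations, and measures how far each mapped point lands from the center e^0 = 1.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # symmetric around 0
y = np.exp(x)                                # mapped through Y = e^X

left_gap = y[2] - y[0]     # distance from e^0 down to e^(-2)
right_gap = y[4] - y[2]    # distance from e^0 up to e^(+2)

for xi, yi in zip(x, y):
    print(f"X = {xi:+.0f}  ->  Y = {yi:.3f}")
print(f"left gap:  {left_gap:.3f}")
print(f"right gap: {right_gap:.3f}")
```

Equal distances on the X axis become unequal distances on the Y axis; for ±2 the right gap is e² times the left gap.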
Understanding Mean, Median, and Mode
For the log-normal distribution, these three measures of central tendency are always in the same order: Mode < Median < Mean.
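The Formulas
$$\text{Mode} = e^{\mu - \sigma^2}, \qquad \text{Median} = e^{\mu}, \qquad \text{Mean} = e^{\mu + \sigma^2/2}$$
Because μ - σ² < μ < μ + σ²/2 whenever σ > 0, the ordering Mode < Median < Mean follows directly.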
The Intuition
Think of a room of people with their incomes (a classic log-normal example):
- Mode (most common): What's the most frequent income? This is where the peak of the distribution is—somewhere modest.
- Median (50th percentile): Half the people earn less, half earn more. A "typical" person.
- Mean (average): Add up all incomes and divide. The few billionaires in the room pull this way up!
When to Use Which
- Median: Best "typical" value for communication (e.g., "median income is $50,000")
- Mean: Best for calculations involving totals (e.g., total revenue = mean × count)
- Mode: Best for understanding the most likely outcome
Key Properties of the Log-Normal
Property 1: Products of Log-Normals
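If Y₁ ~ LogNormal(μ₁, σ₁) and Y₂ ~ LogNormal(μ₂, σ₂) are independent, then:
$$Y_1 Y_2 \sim \text{LogNormal}\left(\mu_1 + \mu_2,\ \sqrt{\sigma_1^2 + \sigma_2^2}\right)$$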
The product of log-normals is log-normal! (Compare to: sum of normals is normal.)
Property 2: Powers of Log-Normals
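If Y ~ LogNormal(μ, σ), then for any constant a ≠ 0:
$$Y^a \sim \text{LogNormal}\left(a\mu,\ |a|\,\sigma\right)$$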
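Both closure properties can be checked by simulation. In the LogNormal(μ, σ) convention used here, the product of independent log-normals has log-mean μ₁ + μ₂ and log-standard-deviation √(σ₁² + σ₂²), and the power Y^a has log-mean aμ and log-standard-deviation |a|σ. The parameter values in this sketch are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
mu1, s1 = 0.3, 0.4     # illustrative parameters
mu2, s2 = -0.1, 0.3

y1 = rng.lognormal(mean=mu1, sigma=s1, size=n)
y2 = rng.lognormal(mean=mu2, sigma=s2, size=n)

# Property 1: the log of the product should be N(mu1 + mu2, s1^2 + s2^2)
log_prod = np.log(y1 * y2)
print(f"product: log-mean {log_prod.mean():.3f} (theory {mu1 + mu2:.3f}), "
      f"log-std {log_prod.std():.3f} (theory {np.hypot(s1, s2):.3f})")

# Property 2: the log of Y^a should be N(a*mu, (a*sigma)^2)
a = -2.0
log_pow = np.log(y1 ** a)
print(f"power:   log-mean {log_pow.mean():.3f} (theory {a * mu1:.3f}), "
      f"log-std {log_pow.std():.3f} (theory {abs(a) * s1:.3f})")
```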
Property 3: The Log Transform Normalizes
This is the most practical property: if your data is log-normal, taking logs makes it normal. This means:
- You can use normal-based statistical tests on log-transformed data
- Linear regression on log(Y) is often appropriate
- Confidence intervals are easier to compute in log-space
The Log-Transform Workflow
- Recognize your data is log-normal (right-skewed, positive values)
- Transform: Z = ln(Y), now Z is approximately normal
- Perform analysis on Z using normal-based methods
- Back-transform results: Y = e^Z
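The workflow above might look like this for a two-group comparison. Everything here is a sketch on synthetic data: the group parameters are made up, and Welch's t-test is just one reasonable choice of normal-based method.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Synthetic stand-in data (assumed parameters, not from the text)
group_a = rng.lognormal(mean=3.0, sigma=0.6, size=300)
group_b = rng.lognormal(mean=3.2, sigma=0.6, size=300)

# Step 1: recognize (here: confirm the data are strictly positive)
assert (group_a > 0).all() and (group_b > 0).all()

# Step 2: transform to log-space
log_a, log_b = np.log(group_a), np.log(group_b)

# Step 3: normal-based analysis on the logs (Welch's t-test)
t_stat, p_value = stats.ttest_ind(log_b, log_a, equal_var=False)

# Step 4: back-transform. A difference of log-means becomes
# a ratio of geometric means on the original scale.
ratio = np.exp(log_b.mean() - log_a.mean())

print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
print(f"geometric-mean ratio (B / A): {ratio:.3f}")
```

Note that the back-transformed result is a ratio, not a difference: differences in log-space correspond to multiplicative effects on the original scale.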
Stock Price Modeling
The Geometric Brownian Motion model, which underlies the famous Black-Scholes option pricing formula, assumes stock prices follow a log-normal distribution.
The Model
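Stock prices evolve according to:
$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$$
This stochastic differential equation has the solution:
$$S_t = S_0 \exp\left[\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t\right]$$
Since W_t ~ N(0, t), the term in brackets is normal, so S_t is log-normally distributed.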
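The closed-form solution of Geometric Brownian Motion, S_T = S_0 · exp((μ - σ²/2)T + σ W_T) with W_T ~ N(0, T), allows sampling terminal prices directly without simulating whole paths. The sketch below uses assumed illustrative parameters (S_0 = 100, drift 8%, volatility 20%, one year) and checks the simulated moments against theory.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative parameters (assumptions, not from the text)
s0, mu, sigma, T = 100.0, 0.08, 0.20, 1.0
n_paths = 100_000

# W_T ~ N(0, T): Brownian motion evaluated at time T
w_T = rng.normal(loc=0.0, scale=np.sqrt(T), size=n_paths)

# Closed-form GBM solution for the terminal price
s_T = s0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * w_T)

log_ratio = np.log(s_T / s0)
print(f"mean of ln(S_T/S_0): {log_ratio.mean():.4f} "
      f"(theory {(mu - 0.5 * sigma**2) * T:.4f})")
print(f"std of ln(S_T/S_0):  {log_ratio.std():.4f} "
      f"(theory {sigma * np.sqrt(T):.4f})")
print(f"mean of S_T:         {s_T.mean():.2f} "
      f"(theory {s0 * np.exp(mu * T):.2f})")
```

The log-price drifts at μ - σ²/2 rather than μ, while the expected price still grows at rate μ. This is the same mean-versus-median gap described earlier.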
Stock Price Simulator: Geometric Brownian Motion
Stock prices follow Geometric Brownian Motion, which means final prices are log-normally distributed. Adjust parameters and watch how the distribution of final prices changes.
[Interactive figure with two panels: "50 Simulated Price Paths" and "Final Price Distribution (Log-Normal)".]
Black-Scholes Connection
This is exactly the model used in the Black-Scholes option pricing formula. The assumption that log-returns are normally distributed means stock prices are log-normally distributed. Notice how the distribution is right-skewed: large gains are possible but prices can't go below zero.
Why Log-Normal for Stock Prices?
- Multiplicative returns: A 10% daily return means multiplying by 1.10, regardless of the current price
- Non-negativity: Stock prices cannot go below zero (log-normal support is (0, ∞))
- Compound growth: Returns compound over time
- Empirical fit: Log-returns are approximately normal (though real markets have heavier tails)
Model Limitations
Real stock returns have "fat tails"—extreme events occur more often than the log-normal model predicts. This is why options are often mispriced by Black-Scholes during market crashes. Extensions like the Heston model use stochastic volatility to address this.
Real-World Applications
Example 1: Income Distribution
Problem: A company's employee salaries follow a log-normal distribution with μ = 11.0 and σ = 0.5 (in log-dollars). Find the median salary and the percentage of employees earning over $100,000.
Solution:
- Median = e^μ = e^11.0 = $59,874
- P(Salary > 100,000) = P(ln(Salary) > ln(100,000)) = P(X > 11.513) where X ~ N(11.0, 0.25)
- z = (11.513 - 11.0) / 0.5 ≈ 1.03
- P(Z > 1.03) ≈ 15.2%, so about 15% of employees earn over $100,000
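The same calculation can be done without a z-table. This sketch uses `scipy.stats.lognorm` with its `s = σ`, `scale = e^μ` parameterization; results can differ in the last digit from a hand calculation that rounds z.

```python
import numpy as np
from scipy import stats

mu, sigma = 11.0, 0.5
salary = stats.lognorm(s=sigma, scale=np.exp(mu))

median_salary = salary.median()          # e^mu
share_over_100k = salary.sf(100_000)     # survival function = 1 - CDF
z = (np.log(100_000) - mu) / sigma       # unrounded z-score in log-space

print(f"Median salary:        ${median_salary:,.0f}")
print(f"z for $100,000:       {z:.4f}")
print(f"Share above $100,000: {share_over_100k:.1%}")
```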
Example 2: Network Latency
Problem: Server response times follow LogNormal(2.5, 0.8) in milliseconds. Design an SLA guaranteeing 99% of requests complete within a threshold. What threshold should you set?
Solution:
- Find the 99th percentile of the log-normal distribution
- In log-space, the 99th percentile of N(2.5, 0.64) is: 2.5 + 2.33(0.8) = 4.36
- Back-transform: e^4.36 = 78.3 ms
- SLA: "99% of requests complete in under 80ms"
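As a cross-check on the hand calculation, the percentile can be read from the inverse CDF. This is a minimal sketch using `scipy.stats.lognorm.ppf`; the 80 ms figure is the rounded SLA threshold from the solution above.

```python
import numpy as np
from scipy import stats

mu, sigma = 2.5, 0.8
latency = stats.lognorm(s=sigma, scale=np.exp(mu))   # milliseconds

p99 = latency.ppf(0.99)             # 99th percentile
covered_at_80 = latency.cdf(80.0)   # fraction of requests under 80 ms

print(f"99th percentile: {p99:.1f} ms")
print(f"Share of requests under 80 ms: {covered_at_80:.2%}")
```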
Example 3: Particle Sizes
Problem: Aerosol particle diameters follow LogNormal(μ, σ) with median 2.5 μm and mean 3.2 μm. Find μ and σ.
Solution:
- Median = e^μ = 2.5, so μ = ln(2.5) = 0.916
- Mean = e^(μ + σ²/2) = 3.2
- ln(3.2) = 0.916 + σ²/2
- σ² = 2(ln(3.2) - 0.916) = 2(1.163 - 0.916) = 0.494
- σ = 0.703
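The algebra above generalizes to any median and mean: μ = ln(median) and σ = √(2 · ln(mean / median)). The helper below is a sketch with a hypothetical name; it rejects inputs where the mean does not exceed the median, since no log-normal can produce them.

```python
import numpy as np

def lognormal_params_from_median_mean(median, mean):
    """Recover (mu, sigma) of a log-normal from its median and mean."""
    if not (mean > median > 0):
        raise ValueError("a log-normal requires mean > median > 0")
    mu = np.log(median)
    sigma = np.sqrt(2.0 * np.log(mean / median))
    return mu, sigma

mu, sigma = lognormal_params_from_median_mean(median=2.5, mean=3.2)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")

# Round-trip check: plug the parameters back into the mean formula
print(f"implied mean = {np.exp(mu + sigma**2 / 2):.3f} micrometers")
```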
AI/ML Applications
1. Weight Initialization and Gradient Flow
In deep neural networks, activations after many layers tend toward log-normal distributions due to multiplicative effects:
- Each layer multiplies by weights and applies activation functions
- After many layers, this repeated multiplication creates log-normal patterns
- He/Kaiming initialization accounts for this by scaling weights to maintain variance across layers
import torch
import torch.nn as nn

# He initialization for ReLU networks
# Accounts for multiplicative variance growth
layer = nn.Linear(512, 256)
nn.init.kaiming_normal_(layer.weight, mode='fan_in', nonlinearity='relu')

# After many layers with ReLU, activations approximately follow
# a truncated log-normal distribution

2. Attention Scores in Transformers
Raw attention scores (before softmax) in transformer models often exhibit log-normal-like patterns:
- Dot-product similarities between embeddings can be log-normally distributed
- Heavy right tails explain why attention focuses on few "key" tokens
- This informs design choices for attention normalization
3. Loss Distributions in Training
Individual sample losses during training often follow log-normal distributions:
import numpy as np

# Per-sample cross-entropy losses are often log-normal.
# `model` and `batch` are placeholders for your own model and data batch;
# the method is assumed to return an array of positive per-sample losses.
sample_losses = model.compute_per_sample_loss(batch)

# Taking log transforms for analysis
log_losses = np.log(sample_losses)
# log_losses is approximately normal!

# This suggests:
# 1. Use log-loss for monitoring (more interpretable)
# 2. Hard examples have very high loss (right tail)
# 3. Curriculum learning can exploit this structure

4. Uncertainty Quantification
For positive quantities, log-normal priors are more appropriate than Gaussian:
import torch
import torch.distributions as dist

# For modeling positive uncertainty (e.g., variance, scale)
# Log-normal is more appropriate than Normal

# Bayesian neural network with log-normal prior on variance
log_var = torch.nn.Parameter(torch.zeros(1))  # log(variance)
var_prior = dist.LogNormal(loc=-2.0, scale=0.5)

# The actual variance is positive: var = exp(log_var)
variance = torch.exp(log_var)

# Loss includes KL divergence to prior
kl_loss = dist.kl_divergence(
    dist.LogNormal(log_var, torch.ones_like(log_var)),
    var_prior
)

5. Data Augmentation with Multiplicative Noise
Many augmentation techniques use multiplicative factors that are log-normally distributed:
- Color jittering: Multiply RGB channels by random factors
- Scale augmentation: Multiply image dimensions
- Audio augmentation: Multiply amplitude by random gain
import numpy as np

def log_normal_color_jitter(image, sigma=0.1):
    """Apply multiplicative color jittering using log-normal factors."""
    # Generate log-normal multiplicative factors
    # E[factor] = 1 when mu = -sigma^2/2
    mu = -sigma**2 / 2
    factors = np.random.lognormal(mu, sigma, size=(1, 1, 3))

    # Multiply and clip
    augmented = np.clip(image * factors, 0, 255).astype(np.uint8)
    return augmented

Connections to Other Distributions
| Relationship | Description |
|---|---|
| LogNormal ↔ Normal | Y = e^X transforms Normal to LogNormal (and vice versa with log) |
| LogNormal & Exponential | Both are right-skewed distributions on positive values used for waiting times; the exponential is a special case of the gamma, not of the log-normal |
| LogNormal & Weibull | Both used for reliability/lifetime modeling; Weibull offers more flexibility |
| Products of LogNormals | Product of independent LogNormals is LogNormal (like sum of Normals is Normal) |
| LogNormal & Pareto | Both heavy-tailed; Pareto has even heavier tails (power-law decay, whereas the log-normal tail eventually falls off faster than any power law) |
The Distribution Family Tree
The log-normal arises naturally from the normal through the exponential transformation. This places it in a family of distributions connected by transformations:
- Normal → (exponential) → Log-Normal
- Normal → (square) → Chi-Square (one degree of freedom)
- Exponential → (sum of k) → Gamma
- Gamma → (ratio) → Beta
Python Implementation
Basic Log-Normal Operations
from scipy import stats
import numpy as np

# IMPORTANT: scipy.stats.lognorm uses a different parameterization!
# scipy: s = sigma, scale = exp(mu)
# standard: mu, sigma

mu = 0.5     # location parameter (mean of log)
sigma = 0.8  # scale parameter (std of log)

# Create distribution
lognorm = stats.lognorm(s=sigma, scale=np.exp(mu))

# PDF and CDF
x = 2.0
print(f"PDF at x=2: {lognorm.pdf(x):.6f}")
print(f"CDF at x=2: {lognorm.cdf(x):.6f}")  # P(X < 2)

# Key statistics
print(f"Mean: {lognorm.mean():.4f}")      # Should be exp(mu + sigma^2/2)
print(f"Median: {lognorm.median():.4f}")  # Should be exp(mu)
print(f"Variance: {lognorm.var():.4f}")
print(f"Mode: {np.exp(mu - sigma**2):.4f}")  # Not built-in

# Quantiles (percentiles)
print(f"95th percentile: {lognorm.ppf(0.95):.4f}")

# Generate random samples
samples = lognorm.rvs(size=1000)
print(f"Sample mean: {samples.mean():.4f}")
print(f"Sample median: {np.median(samples):.4f}")

Fitting Log-Normal to Data
import numpy as np
from scipy import stats

# Suppose we have right-skewed positive data
data = np.array([1.2, 2.5, 3.1, 1.8, 4.2, 2.9, 5.1, 1.5, 2.2, 3.8])

# Method 1: Fit log-normal directly
# Returns shape (sigma), loc, scale (exp(mu))
shape, loc, scale = stats.lognorm.fit(data, floc=0)  # Fix loc=0 for standard lognorm
mu_fit = np.log(scale)
sigma_fit = shape
print(f"Fitted parameters: mu = {mu_fit:.4f}, sigma = {sigma_fit:.4f}")

# Method 2: Fit normal to log-transformed data (often more robust)
log_data = np.log(data)
mu_log, sigma_log = log_data.mean(), log_data.std(ddof=1)
print(f"From log-data: mu = {mu_log:.4f}, sigma = {sigma_log:.4f}")

# Verify the fit
fitted_dist = stats.lognorm(s=sigma_fit, scale=np.exp(mu_fit))
print(f"Theoretical mean: {fitted_dist.mean():.4f}")
print(f"Actual mean: {data.mean():.4f}")

Confidence Intervals
import numpy as np
from scipy import stats

def lognormal_ci(data, confidence=0.95):
    """
    Compute a confidence interval for the log-normal median,
    plus a point estimate of the mean.

    Strategy: CI on log-transformed data, then back-transform.
    """
    n = len(data)
    log_data = np.log(data)

    # CI for mean of log-data (normal)
    mu_hat = log_data.mean()
    se = log_data.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n-1)

    log_ci_lower = mu_hat - t_crit * se
    log_ci_upper = mu_hat + t_crit * se

    # Back-transform for median CI
    median_ci = (np.exp(log_ci_lower), np.exp(log_ci_upper))

    # For mean, need to account for variance
    sigma2_hat = log_data.var(ddof=1)
    mean_hat = np.exp(mu_hat + sigma2_hat / 2)

    return {
        'median_ci': median_ci,
        'mean_estimate': mean_hat,
        'mu_hat': mu_hat,
        'sigma_hat': np.sqrt(sigma2_hat)
    }

# Example usage
data = np.random.lognormal(mean=1.0, sigma=0.5, size=100)
result = lognormal_ci(data)
print(f"Median CI: ({result['median_ci'][0]:.3f}, {result['median_ci'][1]:.3f})")
print(f"Mean estimate: {result['mean_estimate']:.3f}")

Common Pitfalls
Pitfall 1: Confusing Parameters with Statistics
Wrong: "The log-normal has mean μ and standard deviation σ."
Right: μ and σ are the mean and standard deviation of ln(Y), not Y itself. The actual mean is e^(μ + σ²/2).
Pitfall 2: Scipy Parameterization
Wrong: Using scipy.stats.lognorm with the "standard" parameterization.
from scipy import stats
import numpy as np

mu, sigma = 1.0, 0.5

# WRONG: This doesn't use mu and sigma directly!
# Positional arguments are (s, loc, scale), so this sets s=mu and loc=sigma.
# wrong = stats.lognorm(mu, sigma)

# CORRECT: scipy uses s=sigma, scale=exp(mu)
correct = stats.lognorm(s=sigma, scale=np.exp(mu))

print(f"Mean should be {np.exp(mu + sigma**2/2):.4f}")
print(f"scipy gives: {correct.mean():.4f}")  # Matches!

Pitfall 3: Arithmetic vs Geometric Mean
For log-normal data, the geometric mean (which equals the median) is often more meaningful than the arithmetic mean:
import numpy as np
from scipy import stats

# Log-normal data
data = stats.lognorm.rvs(s=0.8, scale=np.exp(0.5), size=1000)

# Arithmetic mean - pulled up by outliers
arith_mean = data.mean()

# Geometric mean - more robust, equals median for log-normal
geom_mean = np.exp(np.log(data).mean())

# Median
median = np.median(data)

print(f"Arithmetic mean: {arith_mean:.3f}")
print(f"Geometric mean: {geom_mean:.3f}")
print(f"Median: {median:.3f}")
# Geometric mean ≈ Median for log-normal data

Pitfall 4: Forgetting the Support
Log-normal is only defined for positive values (y > 0). If your data can be zero or negative, log-normal is not appropriate!
- Zero values: Consider zero-inflated log-normal or add a small constant before log-transforming
- Negative values: Log-normal is not appropriate. Consider normal, shifted log-normal, or other distributions
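One way to handle exact zeros is the zero-inflated approach mentioned above: model the probability of a zero separately and fit the log-normal only to the positive part. The sketch below uses made-up data and simple moment estimates on the logs; it is an illustration of the idea, not a full statistical treatment.

```python
import numpy as np

# Hypothetical measurements containing exact zeros
data = np.array([0.0, 1.2, 0.0, 2.5, 3.1, 0.7, 4.8, 0.0, 1.9, 2.2])

# Part 1: point mass at zero
p_zero = np.mean(data == 0)

# Part 2: log-normal fitted to the strictly positive values only
positive = data[data > 0]
log_pos = np.log(positive)
mu_hat = log_pos.mean()
sigma_hat = log_pos.std(ddof=1)

# Mean of the mixture: zeros contribute nothing, positives contribute
# the log-normal mean exp(mu + sigma^2 / 2)
mixture_mean = (1 - p_zero) * np.exp(mu_hat + sigma_hat**2 / 2)

print(f"P(zero) = {p_zero:.2f}")
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
print(f"model-implied mean = {mixture_mean:.3f}")
```

This avoids choosing an arbitrary small constant, which can noticeably change the fitted σ when many values sit at or near zero.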
Test Your Understanding
If X ~ N(0, 1) (standard normal), what distribution does Y = eˣ follow?
Summary
The log-normal distribution captures the behavior of multiplicative processes just as the normal distribution captures additive processes.
- Fundamental relationship: If X ~ Normal(μ, σ²), then e^X ~ LogNormal(μ, σ)
- Parameters ≠ Statistics: μ is NOT the mean; σ is NOT the standard deviation. They are the mean and std of ln(Y).
- Always right-skewed: Mean > Median > Mode, always
- Multiplicative processes: Use log-normal when effects multiply (stock prices, income, biological growth)
- Take logs first: Transform to normal, analyze, then back-transform
- Positive support: Log-normal only for y > 0
The Bottom Line: When you see right-skewed positive data that results from multiplicative processes, think log-normal. Take logs to normalize, analyze, and interpret—then back-transform for practical conclusions.
From Finance to Deep Learning
The log-normal distribution connects classical statistics to modern ML. From Black-Scholes option pricing to understanding gradient flow in deep networks, recognizing multiplicative processes helps you choose appropriate models and build more robust systems.