Chapter 5

Log-Normal Distribution

Continuous Distributions

Learning Objectives

By the end of this section, you will be able to:

  1. Understand why exponentiating a normal random variable creates a log-normal distribution and derive this relationship mathematically
  2. Distinguish multiplicative from additive processes and recognize when log-normal is the appropriate model
  3. Calculate the mean, median, mode, and variance, and understand why μ and σ are NOT the mean and standard deviation
  4. Apply log-normal distributions to model stock prices, income distributions, and other real-world phenomena
  5. Implement log-normal models in Python using scipy.stats
  6. Connect log-normal concepts to AI/ML applications including weight initialization, attention mechanisms, and uncertainty quantification

The Big Picture: When Multiplication Replaces Addition

"The log-normal distribution is what you get when many small effects multiply together, just as the normal distribution is what you get when many small effects add together."

You already know the Central Limit Theorem: when you add many independent random effects, the sum tends toward a normal distribution. But what happens when effects multiply instead of add?

Think about compound interest: a $100 investment growing at 10% annually becomes $100 × 1.1 × 1.1 × 1.1... This is multiplicative growth: each year's effect multiplies the previous total. With a fixed rate the outcome is deterministic, but when the growth factor varies randomly from period to period, the final value after many periods is approximately log-normally distributed.

The Key Mathematical Insight

Here's the beautiful trick: multiplication becomes addition when you take logarithms!

\[ \ln(a \times b \times c) = \ln(a) + \ln(b) + \ln(c) \]

So if your final value is a product of many random factors, then the logarithm of your final value is a sum of many random terms. By the CLT, this sum is approximately normal. Therefore, the original value (before taking logs) is log-normally distributed.
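
The argument above can be checked numerically. The following is a minimal sketch, assuming each factor is drawn independently and uniformly from [0.9, 1.2] (an arbitrary illustrative choice, not something the text specifies). It multiplies many such factors and compares the skewness of the product with the skewness of its logarithm.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_paths = 20_000   # number of independent products to simulate
n_factors = 200    # number of random factors multiplied in each product

# Assumed factor distribution: uniform on [0.9, 1.2] (illustrative only)
factors = rng.uniform(0.9, 1.2, size=(n_paths, n_factors))

products = factors.prod(axis=1)   # multiplicative outcome
log_products = np.log(products)   # equals the sum of the log-factors

skew_raw = stats.skew(products)       # strongly positive: right-skewed
skew_log = stats.skew(log_products)   # close to zero: roughly symmetric

print(f"Skewness of products:      {skew_raw:.3f}")
print(f"Skewness of log(products): {skew_log:.3f}")
```

The factors themselves are not log-normal here; the near-zero skewness of the logs comes from the CLT acting on the sum of log-factors.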

The Fundamental Relationship

If \(X \sim N(\mu, \sigma^2)\) (normal), then \(Y = e^X \sim \text{LogNormal}(\mu, \sigma)\).

Equivalently: if \(Y \sim \text{LogNormal}(\mu, \sigma)\), then \(\ln(Y) \sim N(\mu, \sigma^2)\).


Multiplicative vs. Additive Processes

Understanding when to use log-normal vs. normal requires recognizing the fundamental nature of the process generating your data:

Additive Processes → Normal Distribution

  • Human Height: Many genes each add a small amount to height. Height ≈ base + gene₁ effect + gene₂ effect + ... + environment effects
  • Measurement Errors: Total error = sum of many small independent errors
  • IQ Scores: Designed to be normally distributed through standardization

Multiplicative Processes → Log-Normal Distribution

  • Stock Prices: Today's price = yesterday's price × (1 + return). Returns compound: P_n = P_0 × (1+r_1) × (1+r_2) × ... × (1+r_n)
  • Income: Income grows multiplicatively with raises. A 5% raise means multiplying by 1.05, not adding $5,000.
  • Biological Growth: Cell populations double each division: N = N_0 × 2^generations. Particle sizes result from breakage/growth processes.
  • File Sizes: Programs grow multiplicatively as features multiply complexity.

Quick Test: Is it Multiplicative?

Ask yourself: "Does a 10% change make sense, regardless of the current value?" If a 10% raise makes sense whether you earn $50K or $500K, the process is multiplicative. If it only makes sense to add a fixed amount (like $5,000), it's additive.


Mathematical Definition

Definition 1: Via Transformation

A random variable \(Y\) follows a log-normal distribution with parameters \(\mu\) and \(\sigma\) if:

\[ Y = e^X \quad \text{where} \quad X \sim N(\mu, \sigma^2) \]

We write \(Y \sim \text{LogNormal}(\mu, \sigma)\).

Definition 2: Probability Density Function (PDF)

\[ f(y; \mu, \sigma) = \frac{1}{y \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln y - \mu)^2}{2\sigma^2}\right), \quad y > 0 \]

Symbol Table

| Symbol | Name | Meaning | Range |
|---|---|---|---|
| y | Random variable | The value we observe | (0, ∞) |
| μ | Location parameter | Mean of ln(Y), NOT mean of Y | (-∞, ∞) |
| σ | Scale parameter | Std dev of ln(Y), NOT std dev of Y | (0, ∞) |
| ln(y) | Natural logarithm | Log-transformed value | (-∞, ∞) |

Critical Warning: μ and σ Are NOT What You Think!

The parameters \(\mu\) and \(\sigma\) are the mean and standard deviation of \(\ln(Y)\), NOT of \(Y\) itself! This is the most common source of confusion with log-normal distributions.

  • \(E[Y] \neq \mu\) (The mean of Y is NOT μ)
  • \(\text{SD}(Y) \neq \sigma\) (The SD of Y is NOT σ)

Key Statistics

| Statistic | Formula | Intuition |
|---|---|---|
| Mean (E[Y]) | e^(μ + σ²/2) | Always larger than median |
| Median | e^μ | The 50th percentile; simpler formula |
| Mode | e^(μ - σ²) | The peak of the PDF; smallest of the three |
| Variance | (e^(σ²) - 1) × e^(2μ + σ²) | Grows rapidly with σ |
| Skewness | (e^(σ²) + 2) × √(e^(σ²) - 1) | Always positive (right-skewed) |
| Support | (0, ∞) | Only positive values possible |

The Golden Rule: For log-normal distributions, Mean > Median > Mode always. This is because the distribution is always right-skewed: the heavy right tail "pulls" the mean to the right.
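
To make the formulas in the table concrete, here is a small sketch that evaluates them and cross-checks the moments against scipy. The values μ = 0 and σ = 0.5 are chosen only for illustration, and the scipy parameterization used (s = σ, scale = e^μ) is the one described later in this section.

```python
import numpy as np
from scipy import stats

mu, sigma = 0.0, 0.5  # illustrative parameter values (an assumption)

# Closed-form statistics from the table
mean = np.exp(mu + sigma**2 / 2)
median = np.exp(mu)
mode = np.exp(mu - sigma**2)
variance = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)
skewness = (np.exp(sigma**2) + 2) * np.sqrt(np.exp(sigma**2) - 1)

# Cross-check against scipy (s = sigma, scale = exp(mu))
dist = stats.lognorm(s=sigma, scale=np.exp(mu))
sp_mean, sp_var, sp_skew = dist.stats(moments="mvs")

print(f"mode={mode:.4f}  median={median:.4f}  mean={mean:.4f}")
print(f"variance={variance:.4f}  skewness={skewness:.4f}")
print(f"scipy: mean={float(sp_mean):.4f}  var={float(sp_var):.4f}  skew={float(sp_skew):.4f}")
```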

Interactive PDF Explorer

Explore how the log-normal distribution changes with different parameters. Pay special attention to how the mean, median, and mode relate to each other:

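[Interactive figure: Log-Normal Distribution Explorer. A μ slider (ticks at -2, 0, 2; μ is the mean of ln(X), NOT the mean of X) and a σ slider (ticks at 0.1, 0.75, 1.5; σ is the standard deviation of ln(X) and controls skewness) drive a plot of the PDF f(x) against x, with the mean, median, and mode marked.]

Values at the displayed setting (consistent with μ = 0, σ = 0.5):

  • Mode = e^(μ - σ²) = 0.7788
  • Median = e^μ = 1.0000
  • Mean = e^(μ + σ²/2) = 1.1331
  • Variance = 0.3647
  • Skewness = 1.7502 (always positive)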

Key Insight: Mean > Median > Mode (Always!)

Notice how Mode < Median < Mean for the log-normal distribution. This is because the distribution is always right-skewed (positive skewness = 1.75). The mean is "pulled" rightward by the heavy right tail. As σ increases, this gap widens.

Try This

  • Set σ to 0.2 (small) and observe how the distribution looks almost symmetric
  • Increase σ to 1.0+ and watch the right tail stretch dramatically
  • Notice how the mean is always pulled to the right of the median
  • Hover over the curve to see exact PDF and CDF values

The Exponential Transformation

The heart of the log-normal distribution is the transformation \(Y = e^X\). This interactive visualization shows how a symmetric normal distribution transforms into a right-skewed log-normal:

[Interactive figure: The Exponential Transformation: Normal → Log-Normal. One panel shows the normal density N(μ, σ²) over X from -3 to 3; the other shows the log-normal density of Y = e^X over Y from 0 to 4.]

If X ~ N(μ, σ²), then Y = e^X ~ LogNormal(μ, σ). Watch how the symmetric normal distribution transforms into the right-skewed log-normal.

The Transformation Insight

Notice how negative normal values (X < 0) get mapped to values between 0 and 1, while positive normal values (X > 0) get mapped to values greater than 1. This is why log-normal is always positive and right-skewed: the exponential function "stretches" the right side while "compressing" the left.

Why the Transformation Creates Skewness

The exponential function \(e^x\) has a key property: it compresses negative values and stretches positive values:

  • When X = -2 (2 standard deviations left): Y = e^(-2) ≈ 0.14 (compressed near zero)
  • When X = 0 (center): Y = e^0 = 1
  • When X = +2 (2 standard deviations right): Y = e^2 ≈ 7.39 (stretched far right)

This asymmetric stretching is why log-normal is always right-skewed!
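
A quick numerical illustration of this stretching, assuming X is standard normal so that ±2 are two standard deviations from the center: points placed symmetrically around 0 end up at very different distances from e^0 = 1.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # symmetric around 0
y = np.exp(x)                               # transformed values

left_gap = y[2] - y[0]    # distance from e^0 down to e^-2
right_gap = y[4] - y[2]   # distance from e^0 up to e^2

for xi, yi in zip(x, y):
    print(f"X = {xi:+.0f}  ->  Y = {yi:.3f}")
print(f"left gap = {left_gap:.3f}, right gap = {right_gap:.3f}")
```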


Understanding Mean, Median, and Mode

For the log-normal distribution, these three measures of central tendency are always in the same order: Mode < Median < Mean.

The Formulas

\[ \text{Mode} = e^{\mu - \sigma^2} < \text{Median} = e^{\mu} < \text{Mean} = e^{\mu + \sigma^2/2} \]

The Intuition

Think of a room of people with their incomes (a classic log-normal example):

  • Mode (most common): What's the most frequent income? This is where the peak of the distribution is—somewhere modest.
  • Median (50th percentile): Half the people earn less, half earn more. A "typical" person.
  • Mean (average): Add up all incomes and divide. The few billionaires in the room pull this way up!

When to Use Which

  • Median: Best "typical" value for communication (e.g., "median income is $50,000")
  • Mean: Best for calculations involving totals (e.g., total revenue = mean × count)
  • Mode: Best for understanding the most likely outcome

Key Properties of the Log-Normal

Property 1: Products of Log-Normals

If \(Y_1 \sim \text{LogNormal}(\mu_1, \sigma_1)\) and \(Y_2 \sim \text{LogNormal}(\mu_2, \sigma_2)\) are independent, then:

\[ Y_1 \times Y_2 \sim \text{LogNormal}\left(\mu_1 + \mu_2, \sqrt{\sigma_1^2 + \sigma_2^2}\right) \]

The product of log-normals is log-normal! (Compare to: sum of normals is normal.)

Property 2: Powers of Log-Normals

If \(Y \sim \text{LogNormal}(\mu, \sigma)\), then for any constant \(c \neq 0\):

\[ Y^c \sim \text{LogNormal}(c\mu, |c|\sigma) \]
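
Both properties can be sanity-checked by simulation. This is a sketch with arbitrary parameter values (assumptions for illustration): it draws log-normal samples, then checks that the logs of the product and of the power have the mean and standard deviation the properties predict.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Illustrative parameters (assumptions, not from the text)
mu1, s1 = 0.3, 0.4
mu2, s2 = -0.1, 0.3
c = -2.0

y1 = rng.lognormal(mean=mu1, sigma=s1, size=n)
y2 = rng.lognormal(mean=mu2, sigma=s2, size=n)

# Property 1: ln(Y1 * Y2) should be N(mu1 + mu2, s1^2 + s2^2)
log_prod = np.log(y1 * y2)
prod_mu, prod_sigma = log_prod.mean(), log_prod.std()

# Property 2: ln(Y1^c) should be N(c * mu1, (|c| * s1)^2)
log_pow = np.log(y1**c)
pow_mu, pow_sigma = log_pow.mean(), log_pow.std()

print(f"product: mu = {prod_mu:.3f} (theory {mu1 + mu2:.3f}), "
      f"sigma = {prod_sigma:.3f} (theory {np.hypot(s1, s2):.3f})")
print(f"power:   mu = {pow_mu:.3f} (theory {c * mu1:.3f}), "
      f"sigma = {pow_sigma:.3f} (theory {abs(c) * s1:.3f})")
```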

Property 3: The Log Transform Normalizes

This is the most practical property: if your data is log-normal, taking logs makes it normal. This means:

  • You can use normal-based statistical tests on log-transformed data
  • Linear regression on log(Y) is often appropriate
  • Confidence intervals are easier to compute in log-space

The Log-Transform Workflow

  1. Recognize your data is log-normal (right-skewed, positive values)
  2. Transform: Z = ln(Y), now Z is approximately normal
  3. Perform analysis on Z using normal-based methods
  4. Back-transform results: Y = e^Z (note that exponentiating the mean of Z estimates the median of Y, not its mean)
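
The workflow above can be sketched as follows. The data are synthetic, generated with assumed parameters μ = 1.0 and σ = 0.8, and the Shapiro-Wilk test stands in for whatever normality check you prefer.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu_true, sigma_true = 1.0, 0.8   # assumed values for the synthetic data
y = rng.lognormal(mean=mu_true, sigma=sigma_true, size=500)

# Step 1: the raw data are positive and right-skewed
raw_skew = stats.skew(y)

# Step 2: transform to log-space
z = np.log(y)

# Step 3: analyze in log-space (here: a normality check and the mean)
_, p_raw = stats.shapiro(y)   # tiny p-value: raw data are far from normal
_, p_log = stats.shapiro(z)   # logs come from a normal distribution
z_mean = z.mean()

# Step 4: back-transform. exp(mean of logs) is the geometric mean,
# which estimates the median e^mu rather than the mean of Y.
geo_mean = np.exp(z_mean)

print(f"raw skewness: {raw_skew:.2f}")
print(f"Shapiro p-value, raw: {p_raw:.2e}   log: {p_log:.3f}")
print(f"geometric mean: {geo_mean:.3f}   arithmetic mean: {y.mean():.3f}")
```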

Stock Price Modeling

The Geometric Brownian Motion model, which underlies the famous Black-Scholes option pricing formula, assumes stock prices follow a log-normal distribution.

The Model

Stock prices evolve according to:

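\[ dS = \mu S \, dt + \sigma S \, dW \]

This stochastic differential equation has the solution:

\[ S(T) = S(0) \exp\left[\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma W(T)\right] \]

Since \(W(T) \sim N(0, T)\), the term in brackets is normal with mean \((\mu - \sigma^2/2)T\) and variance \(\sigma^2 T\), so \(S(T)\) is log-normally distributed: \(\ln S(T) \sim N\left(\ln S(0) + (\mu - \sigma^2/2)T,\ \sigma^2 T\right)\). Note that here \(\mu\) and \(\sigma\) are the drift and volatility of the price process, not the log-normal parameters themselves.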
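
As a minimal sketch of the closed-form solution, the snippet below draws terminal prices directly instead of simulating full paths. The drift, volatility, horizon, and starting price are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative parameters (assumptions): start price, drift, volatility, years
s0, mu, sigma, T = 100.0, 0.08, 0.25, 1.0
n_paths = 200_000

# W(T) ~ N(0, T), so it can be sampled in one step
w_T = rng.normal(0.0, np.sqrt(T), size=n_paths)
s_T = s0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * w_T)

theory_mean = s0 * np.exp(mu * T)                       # E[S(T)]
theory_median = s0 * np.exp((mu - 0.5 * sigma**2) * T)  # median of S(T)

print(f"mean:   simulated {s_T.mean():.2f}, theory {theory_mean:.2f}")
print(f"median: simulated {np.median(s_T):.2f}, theory {theory_median:.2f}")
```

The mean exceeds the median, as it does for any log-normal, and every simulated price is strictly positive.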

[Interactive figure: Stock Price Simulator (Geometric Brownian Motion). Sliders for expected return, price uncertainty, and time horizon (about 0.2 years at the displayed setting). One panel shows 50 simulated price paths starting from $100, plotted as stock price ($) against trading days; the other shows the distribution of final prices ($), simulated versus log-normal theory, with the mean marked.]

Stock prices follow Geometric Brownian Motion, which means final prices are log-normally distributed. Adjust the parameters and watch how the distribution of final prices changes.

Summary statistics of the final prices at the displayed setting:

  • Mean price: $103.16
  • Median price: $100.49
  • Standard deviation: $11.31
  • 5th percentile: $84.85
  • 95th percentile: $121.30
  • Range: $81 to $129

Black-Scholes Connection

This is the model that underlies the Black-Scholes option pricing formula. The assumption that log-returns are normally distributed means stock prices are log-normally distributed. Notice how the distribution is right-skewed: large gains are possible, but prices can't go below zero.

Why Log-Normal for Stock Prices?

  1. Multiplicative returns: A 10% daily return means multiplying by 1.10, regardless of the current price
  2. Non-negativity: Stock prices cannot go below zero (log-normal support is (0, ∞))
  3. Compound growth: Returns compound over time
  4. Empirical fit: Log-returns are approximately normal (though real markets have heavier tails)

Model Limitations

Real stock returns have "fat tails"—extreme events occur more often than the log-normal model predicts. This is why options are often mispriced by Black-Scholes during market crashes. Extensions like the Heston model use stochastic volatility to address this.


Real-World Applications

Example 1: Income Distribution

Problem: A company's employee salaries follow a log-normal distribution with μ = 11.0 and σ = 0.5 (in log-dollars). Find the median salary and the percentage of employees earning over $100,000.

Solution:

  • Median = e^μ = e^11.0 = $59,874
  • P(Salary > 100,000) = P(ln(Salary) > ln(100,000)) = P(X > 11.513) where X ~ N(11.0, 0.25)
  • z = (11.513 - 11.0) / 0.5 ≈ 1.026
  • P(Z > 1.026) ≈ 0.152, so about 15.2% of employees earn over $100,000
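
The same calculation can be done with scipy, which avoids rounding the z-score by hand. This is a sketch using the parameterization s = σ, scale = e^μ; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

mu, sigma = 11.0, 0.5
salary = stats.lognorm(s=sigma, scale=np.exp(mu))

median_salary = salary.median()
p_over_100k = salary.sf(100_000)   # survival function = 1 - CDF

print(f"Median salary: ${median_salary:,.0f}")
print(f"P(Salary > $100,000): {p_over_100k:.1%}")
```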

Example 2: Network Latency

Problem: Server response times follow LogNormal(2.5, 0.8) in milliseconds. Design an SLA guaranteeing 99% of requests complete within a threshold. What threshold should you set?

Solution:

  • Find the 99th percentile of the log-normal distribution
  • In log-space, the 99th percentile of N(2.5, 0.64) is: 2.5 + 2.33(0.8) = 4.36
  • Back-transform: e^4.36 = 78.3 ms
  • SLA: "99% of requests complete in under 80ms"
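
As a cross-check, the percentile can be read straight from the inverse CDF. This sketch assumes the same scipy parameterization (s = σ, scale = e^μ) and also confirms that the rounded 80 ms threshold still covers at least 99% of requests.

```python
import numpy as np
from scipy import stats

latency = stats.lognorm(s=0.8, scale=np.exp(2.5))  # milliseconds

threshold_ms = latency.ppf(0.99)   # 99th percentile
coverage_at_80 = latency.cdf(80)   # fraction of requests under 80 ms

print(f"99th percentile: {threshold_ms:.1f} ms")
print(f"Share of requests under 80 ms: {coverage_at_80:.2%}")
```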

Example 3: Particle Sizes

Problem: Aerosol particle diameters follow LogNormal(μ, σ) with median 2.5 μm and mean 3.2 μm. Find μ and σ.

Solution:

  • Median = e^μ = 2.5, so μ = ln(2.5) = 0.916
  • Mean = e^(μ + σ²/2) = 3.2
  • ln(3.2) = 0.916 + σ²/2
  • σ² = 2(ln(3.2) - 0.916) = 2(1.163 - 0.916) = 0.494
  • σ = 0.703
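
The inversion above takes two lines of code. This sketch also plugs the recovered parameters back into the mean formula as a round-trip check.

```python
import numpy as np

median, mean = 2.5, 3.2   # micrometres, from the problem statement

mu = np.log(median)                       # from median = e^mu
sigma = np.sqrt(2 * (np.log(mean) - mu))  # from mean = e^(mu + sigma^2 / 2)

# Round trip: the recovered parameters should reproduce the mean
mean_check = np.exp(mu + sigma**2 / 2)

print(f"mu = {mu:.3f}, sigma = {sigma:.3f}, mean check = {mean_check:.3f}")
```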

AI/ML Applications

1. Weight Initialization and Gradient Flow

In deep neural networks, activation magnitudes after many layers can become heavy-tailed and roughly log-normal because of multiplicative effects:

  • Each layer multiplies by weights and applies activation functions
  • After many layers, this repeated multiplication creates log-normal patterns
  • He/Kaiming initialization accounts for this by scaling weights to maintain variance across layers
```python
import torch
import torch.nn as nn

# He initialization for ReLU networks
# Scales weights so activation variance stays roughly constant across layers
layer = nn.Linear(512, 256)
nn.init.kaiming_normal_(layer.weight, mode='fan_in', nonlinearity='relu')

# Heuristic: after many ReLU layers, the nonzero activation magnitudes
# are often roughly log-normal
```

2. Attention Scores in Transformers

Attention weights in transformer models can exhibit log-normal-like patterns, because softmax exponentiates the raw scores:

  • If pre-softmax dot-product scores are roughly normal, the unnormalized weights exp(score) are roughly log-normal
  • Heavy right tails explain why attention focuses on few "key" tokens
  • This informs design choices for attention normalization

3. Loss Distributions in Training

Individual sample losses during training often follow log-normal distributions:

```python
import numpy as np

# Stand-in data so the snippet runs on its own. In practice, per-sample losses
# come from your model, e.g. F.cross_entropy(logits, targets, reduction="none").
rng = np.random.default_rng(0)
sample_losses = rng.lognormal(mean=-1.0, sigma=0.7, size=1024)

# Per-sample cross-entropy losses are often roughly log-normal,
# so analyze them on the log scale
log_losses = np.log(sample_losses)
# log_losses is approximately normal

# This suggests:
# 1. Monitor log(loss), which is closer to symmetric and easier to read
# 2. Hard examples have very high loss (right tail)
# 3. Curriculum learning can exploit this structure
```

4. Uncertainty Quantification

For positive quantities, log-normal priors are more appropriate than Gaussian:

```python
import torch
import torch.distributions as dist

# For modeling positive quantities (e.g., variance, scale),
# a log-normal is more appropriate than a normal

# Log-normal prior on a variance parameter
log_var = torch.nn.Parameter(torch.zeros(1))  # learnable log(variance)
var_prior = dist.LogNormal(loc=-2.0, scale=0.5)

# The actual variance is always positive: var = exp(log_var)
variance = torch.exp(log_var)

# Approximate posterior over the variance: log-normal centered (in log-space)
# at log_var, with a fixed log-space scale of 1
var_posterior = dist.LogNormal(log_var, torch.ones_like(log_var))

# Loss includes the KL divergence from the posterior to the prior
kl_loss = dist.kl_divergence(var_posterior, var_prior)
```

5. Data Augmentation with Multiplicative Noise

Many augmentation techniques use multiplicative factors that are log-normally distributed:

  • Color jittering: Multiply RGB channels by random factors
  • Scale augmentation: Multiply image dimensions
  • Audio augmentation: Multiply amplitude by random gain
```python
import numpy as np

def log_normal_color_jitter(image, sigma=0.1):
    """Apply multiplicative color jittering using log-normal factors."""
    # Generate one log-normal multiplicative factor per RGB channel
    # E[factor] = 1 when mu = -sigma^2/2
    mu = -sigma**2 / 2
    factors = np.random.lognormal(mu, sigma, size=(1, 1, 3))

    # Multiply and clip back to the valid 8-bit range
    augmented = np.clip(image * factors, 0, 255).astype(np.uint8)
    return augmented
```

Connections to Other Distributions

| Relationship | Description |
|---|---|
| LogNormal ↔ Normal | Y = e^X transforms Normal to LogNormal; X = ln(Y) transforms back |
| LogNormal & Exponential | Both model positive quantities such as waiting times; the Exponential is a special case of the Gamma, not of the LogNormal |
| LogNormal & Weibull | Both used for reliability/lifetime modeling; Weibull offers more flexibility |
| Products of LogNormals | Product of independent LogNormals is LogNormal (like sum of Normals is Normal) |
| LogNormal & Pareto | Both heavy-tailed; Pareto has even heavier tails (power-law decay, whereas log-normal tails eventually decay faster than any power law) |

The Distribution Family Tree

The log-normal arises naturally from the normal through the exponential transformation. This places it in a family of distributions connected by transformations:

  • Normal → (exponential) → Log-Normal
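  • Standard Normal → (square) → Chi-Square (one degree of freedom)
  • Exponential → (sum of k independent, identically distributed copies) → Gamma with shape k
  • Gamma → (ratio X/(X+Y) of independent Gammas with a common scale) → Beta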

Python Implementation

Basic Log-Normal Operations

```python
from scipy import stats
import numpy as np

# IMPORTANT: scipy.stats.lognorm uses a different parameterization!
# scipy: s = sigma, scale = exp(mu)
# standard: mu, sigma

mu = 0.5      # location parameter (mean of log)
sigma = 0.8   # scale parameter (std of log)

# Create distribution
lognorm = stats.lognorm(s=sigma, scale=np.exp(mu))

# PDF and CDF
x = 2.0
print(f"PDF at x=2: {lognorm.pdf(x):.6f}")
print(f"CDF at x=2: {lognorm.cdf(x):.6f}")  # P(X < 2)

# Key statistics
print(f"Mean: {lognorm.mean():.4f}")        # Should be exp(mu + sigma^2/2)
print(f"Median: {lognorm.median():.4f}")    # Should be exp(mu)
print(f"Variance: {lognorm.var():.4f}")
print(f"Mode: {np.exp(mu - sigma**2):.4f}")  # Not built-in

# Quantiles (percentiles)
print(f"95th percentile: {lognorm.ppf(0.95):.4f}")

# Generate random samples
samples = lognorm.rvs(size=1000)
print(f"Sample mean: {samples.mean():.4f}")
print(f"Sample median: {np.median(samples):.4f}")
```

Fitting Log-Normal to Data

```python
import numpy as np
from scipy import stats

# Suppose we have right-skewed positive data
data = np.array([1.2, 2.5, 3.1, 1.8, 4.2, 2.9, 5.1, 1.5, 2.2, 3.8])

# Method 1: Fit log-normal directly
# Returns shape (sigma), loc, scale (exp(mu))
shape, loc, scale = stats.lognorm.fit(data, floc=0)  # Fix loc=0 for standard lognorm
mu_fit = np.log(scale)
sigma_fit = shape
print(f"Fitted parameters: mu = {mu_fit:.4f}, sigma = {sigma_fit:.4f}")

# Method 2: Fit normal to log-transformed data
# With loc fixed at 0 this matches the maximum-likelihood fit above,
# except that ddof=1 gives the unbiased variance estimate
log_data = np.log(data)
mu_log, sigma_log = log_data.mean(), log_data.std(ddof=1)
print(f"From log-data: mu = {mu_log:.4f}, sigma = {sigma_log:.4f}")

# Verify the fit
fitted_dist = stats.lognorm(s=sigma_fit, scale=np.exp(mu_fit))
print(f"Theoretical mean: {fitted_dist.mean():.4f}")
print(f"Actual mean: {data.mean():.4f}")
```

Confidence Intervals

```python
import numpy as np
from scipy import stats

def lognormal_ci(data, confidence=0.95):
    """
    Compute a confidence interval for the log-normal median
    and a point estimate of the log-normal mean.

    Strategy: CI on log-transformed data, then back-transform.
    """
    n = len(data)
    log_data = np.log(data)

    # CI for mean of log-data (normal)
    mu_hat = log_data.mean()
    se = log_data.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n-1)

    log_ci_lower = mu_hat - t_crit * se
    log_ci_upper = mu_hat + t_crit * se

    # Back-transform for median CI (exp of the mean of logs is the median)
    median_ci = (np.exp(log_ci_lower), np.exp(log_ci_upper))

    # For mean, need to account for variance
    sigma2_hat = log_data.var(ddof=1)
    mean_hat = np.exp(mu_hat + sigma2_hat / 2)

    return {
        'median_ci': median_ci,
        'mean_estimate': mean_hat,
        'mu_hat': mu_hat,
        'sigma_hat': np.sqrt(sigma2_hat)
    }

# Example usage
data = np.random.lognormal(mean=1.0, sigma=0.5, size=100)
result = lognormal_ci(data)
print(f"Median CI: ({result['median_ci'][0]:.3f}, {result['median_ci'][1]:.3f})")
print(f"Mean estimate: {result['mean_estimate']:.3f}")
```

Common Pitfalls

Pitfall 1: Confusing Parameters with Statistics

Wrong: "The log-normal has mean μ and standard deviation σ."

Right: μ and σ are the mean and standard deviation of ln(Y), not Y itself. The actual mean is e^(μ + σ²/2).

Pitfall 2: Scipy Parameterization

Wrong: Using scipy.stats.lognorm with the "standard" parameterization.

```python
from scipy import stats
import numpy as np

mu, sigma = 1.0, 0.5

# WRONG: positional arguments are (s, loc), so this sets s=mu and loc=sigma
# wrong = stats.lognorm(mu, sigma)

# CORRECT: scipy uses s=sigma, scale=exp(mu)
correct = stats.lognorm(s=sigma, scale=np.exp(mu))

print(f"Mean should be {np.exp(mu + sigma**2/2):.4f}")
print(f"scipy gives: {correct.mean():.4f}")  # Matches!
```

Pitfall 3: Arithmetic vs Geometric Mean

For log-normal data, the geometric mean (which equals the median) is often more meaningful than the arithmetic mean:

```python
import numpy as np
from scipy import stats

# Log-normal data
data = stats.lognorm.rvs(s=0.8, scale=np.exp(0.5), size=1000)

# Arithmetic mean - pulled up by outliers
arith_mean = data.mean()

# Geometric mean - more robust, estimates the median for log-normal data
geom_mean = np.exp(np.log(data).mean())

# Median
median = np.median(data)

print(f"Arithmetic mean: {arith_mean:.3f}")
print(f"Geometric mean:  {geom_mean:.3f}")
print(f"Median:          {median:.3f}")
# Geometric mean ≈ Median for log-normal data
```

Pitfall 4: Forgetting the Support

Log-normal is only defined for positive values (y > 0). If your data can be zero or negative, log-normal is not appropriate!

  • Zero values: Consider zero-inflated log-normal or add a small constant before log-transforming
  • Negative values: Log-normal is not appropriate. Consider normal, shifted log-normal, or other distributions

Test Your Understanding


If X ~ N(0, 1) (standard normal), what distribution does Y = eˣ follow?


Summary

The log-normal distribution captures the behavior of multiplicative processes just as the normal distribution captures additive processes.

  1. Fundamental relationship: If X ~ Normal(μ, σ²), then e^X ~ LogNormal(μ, σ)
  2. Parameters ≠ Statistics: μ is NOT the mean; σ is NOT the standard deviation. They are the mean and std of ln(Y).
  3. Always right-skewed: Mean > Median > Mode, always
  4. Multiplicative processes: Use log-normal when effects multiply (stock prices, income, biological growth)
  5. Take logs first: Transform to normal, analyze, then back-transform
  6. Positive support: Log-normal only for y > 0

The Bottom Line: When you see right-skewed positive data that results from multiplicative processes, think log-normal. Take logs to normalize, analyze, and interpret, then back-transform for practical conclusions.

From Finance to Deep Learning

The log-normal distribution connects classical statistics to modern ML. From Black-Scholes option pricing to understanding gradient flow in deep networks, recognizing multiplicative processes helps you choose appropriate models and build more robust systems.
