Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Understand why exponentiating a normal random variable creates a log-normal distribution and derive this relationship mathematically
Distinguish multiplicative from additive processes and recognize when log-normal is the appropriate model
Calculate the mean, median, mode, and variance, and understand why $\mu$ and $\sigma$ are NOT the mean and standard deviation
Apply log-normal distributions to model stock prices, income distributions, and other real-world phenomena
Implement log-normal models in Python using scipy.stats
Connect log-normal concepts to AI/ML applications including weight initialization, attention mechanisms, and uncertainty quantification

The Big Picture: When Multiplication Replaces Addition

"The log-normal distribution is what you get when many small effects multiply together, just as the normal distribution is what you get when many small effects add together."

You already know the Central Limit Theorem: when you add many independent random effects, the sum tends toward a normal distribution. But what happens when effects multiply instead of add?

Think about compound interest: a $100 investment growing at 10% annually becomes $100 × 1.1 × 1.1 × 1.1 × ... This is multiplicative growth: each year's effect multiplies the previous total. When the yearly growth factor is random rather than a fixed 1.1, the final value after many periods is approximately log-normally distributed.

The Key Mathematical Insight

Here's the beautiful trick: multiplication becomes addition when you take logarithms!

\ln(a \times b \times c) = \ln(a) + \ln(b) + \ln(c)

So if your final value is a product of many random factors, then the logarithm of your final value is a sum of many random terms. By the CLT, this sum is approximately normal. Therefore, the original value (before taking logs) is log-normally distributed.
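The argument above can be checked numerically. The following is a minimal sketch, assuming a hypothetical setup in which each of 200 periods contributes an independent growth factor drawn uniformly from [0.9, 1.12]; the factor distribution, the number of periods, and all variable names are illustrative choices, not part of the original text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical setup: 200 periods, each with an independent growth factor
# drawn uniformly from [0.9, 1.12] (any positive factor distribution would do).
n_paths, n_periods = 20_000, 200
factors = rng.uniform(0.9, 1.12, size=(n_paths, n_periods))

# The final value is the product of all factors,
# so its logarithm is the sum of the log-factors.
final_value = factors.prod(axis=1)
log_final = np.log(final_value)

# The raw product is strongly right-skewed, while its log is close to symmetric,
# as the CLT predicts for a sum of many independent terms.
skew_raw = stats.skew(final_value)
skew_log = stats.skew(log_final)
print(f"Skewness of product:      {skew_raw:.3f}")
print(f"Skewness of log(product): {skew_log:.3f}")
```

A skewness near zero for the log-values, next to a clearly positive skewness for the raw products, is what one would expect if the product is approximately log-normal.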

The Fundamental Relationship

If $X \sim N(\mu, \sigma^2)$ (normal), then $Y = e^X \sim \text{LogNormal}(\mu, \sigma)$.

Equivalently: If $Y \sim \text{LogNormal}(\mu, \sigma)$, then $\ln(Y) \sim N(\mu, \sigma^2)$.

Multiplicative vs. Additive Processes

Understanding when to use log-normal vs. normal requires recognizing the fundamental nature of the process generating your data:

Additive Processes → Normal Distribution

Human Height: Many genes each add a small amount to height. Height ≈ base + gene₁ effect + gene₂ effect + ... + environment effects
Measurement Errors: Total error = sum of many small independent errors
IQ Scores: Designed to be normally distributed through standardization

Multiplicative Processes → Log-Normal Distribution

Stock Prices: Today's price = yesterday's price × (1 + return). Returns compound: P_n = P_0 × (1+r_1) × (1+r_2) × ... × (1+r_n)
Income: Income grows multiplicatively with raises. A 5% raise means multiplying by 1.05, not adding $5,000.
Biological Growth: Cell populations double each division: N = N_0 × 2^generations. Particle sizes result from breakage/growth processes.
File Sizes: Programs grow multiplicatively as features multiply complexity.

Quick Test: Is it Multiplicative?

Ask yourself: "Does a 10% change make sense, regardless of the current value?" If a 10% raise makes sense whether you earn $50K or $500K, the process is multiplicative. If it only makes sense to add a fixed amount (like $5,000), it's additive.

Mathematical Definition

Definition 1: Via Transformation

A random variable $Y$ follows a log-normal distribution with parameters $\mu$ and $\sigma$ if:

Y = e^X \quad \text{where} \quad X \sim N(\mu, \sigma^2)

We write $Y \sim \text{LogNormal}(\mu, \sigma)$.

Definition 2: Probability Density Function (PDF)

f(y; \mu, \sigma) = \frac{1}{y \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln y - \mu)^2}{2\sigma^2}\right), \quad y > 0
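To see where this density comes from, one route is to write the CDF of $Y = e^X$ in terms of the normal CDF and differentiate. The sketch below assumes only the transformation definition; $\Phi$ and $\varphi$ denote the standard normal CDF and PDF, notation introduced here for the derivation.

```latex
\begin{aligned}
F_Y(y) &= P(Y \le y) = P(e^X \le y) = P(X \le \ln y)
        = \Phi\!\left(\frac{\ln y - \mu}{\sigma}\right), \quad y > 0 \\
f(y; \mu, \sigma) &= \frac{d}{dy} F_Y(y)
        = \varphi\!\left(\frac{\ln y - \mu}{\sigma}\right) \cdot \frac{1}{\sigma y}
        = \frac{1}{y \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln y - \mu)^2}{2\sigma^2}\right)
\end{aligned}
```

The extra factor $1/y$ comes from the chain rule applied to $\ln y$; it is the only difference from a normal density evaluated at $\ln y$.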

Symbol Table

Symbol	Name	Meaning	Range
y	Random variable	The value we observe	(0, ∞)
μ	Location parameter	Mean of ln(Y), NOT mean of Y	(-∞, ∞)
σ	Scale parameter	Std dev of ln(Y), NOT std dev of Y	(0, ∞)
ln(y)	Natural logarithm	Log-transformed value	(-∞, ∞)

Critical Warning: μ and σ Are NOT What You Think!

The parameters $\mu$ and $\sigma$ are the mean and standard deviation of $\ln(Y)$, NOT of $Y$ itself! This is the most common source of confusion with log-normal distributions.

$E[Y] \neq \mu$ (The mean of Y is NOT μ)
$\text{SD}(Y) \neq \sigma$ (The SD of Y is NOT σ)

Key Statistics

Statistic	Formula	Intuition
Mean (E[Y])	e^(μ + σ²/2)	Always larger than median
Median	e^μ	The 50th percentile; simpler formula
Mode	e^(μ - σ²)	The peak of the PDF; smallest of the three
Variance	(e^(σ²) - 1) × e^(2μ + σ²)	Grows rapidly with σ
Skewness	(e^(σ²) + 2) × √(e^(σ²) - 1)	Always positive (right-skewed)
Support	(0, ∞)	Only positive values possible

The Golden Rule: For log-normal distributions, Mean > Median > Mode always. This is because the distribution is always right-skewed—the heavy right tail "pulls" the mean to the right.
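The closed-form statistics can be evaluated directly and cross-checked against scipy. This is a minimal sketch, assuming illustrative parameters μ = 0 and σ = 0.5; the variable names are arbitrary, and scipy's `lognorm` is used with `s = σ` and `scale = e^μ`.

```python
import numpy as np
from scipy import stats

mu, sigma = 0.0, 0.5  # parameters of ln(Y), chosen for illustration

# Closed-form statistics of Y ~ LogNormal(mu, sigma)
mode = np.exp(mu - sigma**2)
median = np.exp(mu)
mean = np.exp(mu + sigma**2 / 2)
variance = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)
skewness = (np.exp(sigma**2) + 2) * np.sqrt(np.exp(sigma**2) - 1)

# Cross-check against scipy (s = sigma, scale = exp(mu))
dist = stats.lognorm(s=sigma, scale=np.exp(mu))
scipy_mean, scipy_var, scipy_skew = dist.stats(moments="mvs")

print(f"Mode     = {mode:.4f}")
print(f"Median   = {median:.4f}")
print(f"Mean     = {mean:.4f}  (scipy: {float(scipy_mean):.4f})")
print(f"Variance = {variance:.4f}  (scipy: {float(scipy_var):.4f})")
print(f"Skewness = {skewness:.4f}  (scipy: {float(scipy_skew):.4f})")
```

For any σ > 0 the printed values should satisfy Mode < Median < Mean, and none of them equals μ or σ.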

Interactive PDF Explorer

Explore how the log-normal distribution changes with different parameters. Pay special attention to how the mean, median, and mode relate to each other:

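[Interactive widget: Log-Normal Distribution Explorer. Sliders set the location parameter μ (range -2 to 2, default 0.00) and the scale parameter σ (range 0.1 to 1.5, default 0.50); display options toggle the PDF, the CDF, and the mean/median/mode markers.]

μ is the mean of ln(Y), NOT the mean of Y. σ is the standard deviation of ln(Y) and controls the skewness.

Values at the default settings (μ = 0, σ = 0.5):

Statistic	Value	Formula
Mode	0.7788	e^(μ - σ²)
Median	1.0000	e^μ
Mean	1.1331	e^(μ + σ²/2)
Variance	0.3647	(e^(σ²) - 1) × e^(2μ + σ²)
Skewness	1.7502	(e^(σ²) + 2) × √(e^(σ²) - 1), always positive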

Key Insight: Mean > Median > Mode (Always!)

Notice how Mode < Median < Mean for the log-normal distribution. This is because the distribution is always right-skewed (positive skewness = 1.75). The mean is "pulled" rightward by the heavy right tail. As σ increases, this gap widens.

Try This

Set σ to 0.2 (small) and observe how the distribution looks almost symmetric
Increase σ to 1.0+ and watch the right tail stretch dramatically
Notice how the mean is always pulled to the right of the median
Hover over the curve to see exact PDF and CDF values

The Exponential Transformation

The heart of the log-normal distribution is the transformation $Y = e^X$ . This interactive visualization shows how a symmetric normal distribution transforms into a right-skewed log-normal:

[Interactive widget: The Exponential Transformation, Normal → Log-Normal. Controls: μ (default 0.00), σ (default 0.50), and a "Show particles" toggle. Samples of X are mapped through Y = e^X to the log-normal random variable.]

If X ~ N(μ, σ²), then Y = e^X ~ LogNormal(μ, σ). The symmetric normal distribution of X becomes the right-skewed log-normal distribution of Y.

The Transformation Insight

Notice how negative normal values (X < 0) get mapped to values between 0 and 1, while positive normal values (X > 0) get mapped to values greater than 1. This is why log-normal is always positive and right-skewed: the exponential function "stretches" the right side while "compressing" the left.

Why the Transformation Creates Skewness

The exponential function $e^x$ has a key property: it compresses negative values and stretches positive values:

When X = -2 (2 standard deviations left): Y = e^(-2) ≈ 0.14 (compressed near zero)
When X = 0 (center): Y = e^0 = 1
When X = +2 (2 standard deviations right): Y = e^2 ≈ 7.39 (stretched far right)

This asymmetric stretching is why log-normal is always right-skewed!

Understanding Mean, Median, and Mode

For the log-normal distribution, these three measures of central tendency are always in the same order: Mode < Median < Mean.

The Formulas

\text{Mode} = e^{\mu - \sigma^2} < \text{Median} = e^{\mu} < \text{Mean} = e^{\mu + \sigma^2/2}
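Each of the three formulas follows in a line or two from the definition $Y = e^X$ with $X \sim N(\mu, \sigma^2)$. The sketch below uses the normal moment generating function $M_X(t)$ and the log of the PDF $f(y)$; both are standard tools brought in here for the derivation.

```latex
\begin{aligned}
\text{Median:}\quad & P(Y \le e^{\mu}) = P(X \le \mu) = \tfrac{1}{2} \\
\text{Mean:}\quad & E[Y] = E[e^{X}] = M_X(1) = e^{\mu + \sigma^2/2},
   \quad \text{where } M_X(t) = E[e^{tX}] = e^{\mu t + \sigma^2 t^2/2} \\
\text{Mode:}\quad & \frac{d}{dy} \ln f(y) = -\frac{1}{y} - \frac{\ln y - \mu}{\sigma^2 y} = 0
   \;\Longrightarrow\; \ln y = \mu - \sigma^2
\end{aligned}
```

Because $\sigma^2 > 0$, the exponents satisfy $\mu - \sigma^2 < \mu < \mu + \sigma^2/2$, which gives the ordering Mode < Median < Mean.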

The Intuition

Think of a room of people with their incomes (a classic log-normal example):

Mode (most common): What's the most frequent income? This is where the peak of the distribution is—somewhere modest.
Median (50th percentile): Half the people earn less, half earn more. A "typical" person.
Mean (average): Add up all incomes and divide. The few billionaires in the room pull this way up!

When to Use Which

Median: Best "typical" value for communication (e.g., "median income is $50,000")
Mean: Best for calculations involving totals (e.g., total revenue = mean × count)
Mode: Best for understanding the most likely outcome

Key Properties of the Log-Normal

Property 1: Products of Log-Normals

If $Y_1 \sim \text{LogNormal}(\mu_1, \sigma_1)$ and $Y_2 \sim \text{LogNormal}(\mu_2, \sigma_2)$ are independent, then:

Y_1 \times Y_2 \sim \text{LogNormal}(\mu_1 + \mu_2, \sqrt{\sigma_1^2 + \sigma_2^2})

The product of log-normals is log-normal! (Compare to: sum of normals is normal.)

Property 2: Powers of Log-Normals

If $Y \sim \text{LogNormal}(\mu, \sigma)$, then:

Y^c \sim \text{LogNormal}(c\mu, |c|\sigma)
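Properties 1 and 2 can be checked by simulation: the logs of the product and of the power should have the stated means and standard deviations. This is a minimal sketch with arbitrary illustrative parameters (μ₁ = 0.3, σ₁ = 0.4, μ₂ = -0.2, σ₂ = 0.3, c = -2), none of which come from the original text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Illustrative parameters for two independent log-normals
mu1, sigma1 = 0.3, 0.4
mu2, sigma2 = -0.2, 0.3
y1 = rng.lognormal(mean=mu1, sigma=sigma1, size=n)
y2 = rng.lognormal(mean=mu2, sigma=sigma2, size=n)

# Property 1: ln(Y1 * Y2) should be N(mu1 + mu2, sigma1^2 + sigma2^2)
log_prod = np.log(y1 * y2)
print(f"Product: mean of log = {log_prod.mean():.4f} (theory {mu1 + mu2:.4f})")
print(f"Product: std of log  = {log_prod.std():.4f} "
      f"(theory {np.sqrt(sigma1**2 + sigma2**2):.4f})")

# Property 2: ln(Y1^c) should be N(c * mu1, (|c| * sigma1)^2)
c = -2.0
log_pow = np.log(y1 ** c)
print(f"Power:   mean of log = {log_pow.mean():.4f} (theory {c * mu1:.4f})")
print(f"Power:   std of log  = {log_pow.std():.4f} (theory {abs(c) * sigma1:.4f})")
```

Both properties are the familiar rules for sums and scalar multiples of normal variables, seen through the logarithm.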

Property 3: The Log Transform Normalizes

This is the most practical property: if your data is log-normal, taking logs makes it normal. This means:

You can use normal-based statistical tests on log-transformed data
Linear regression on log(Y) is often appropriate
Confidence intervals are easier to compute in log-space

The Log-Transform Workflow

Recognize your data is log-normal (right-skewed, positive values)
Transform: Z = ln(Y), now Z is approximately normal
Perform analysis on Z using normal-based methods
Back-transform results: Y = e^Z
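The four steps above can be sketched on a two-group comparison. The data here are hypothetical: simulated response times for two service versions, where version B is assumed to have a median about 20% higher. Welch's t-test is one possible normal-based method for step 3, chosen only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical data: response times (ms) for two service versions.
# Both are log-normal; version B has a median about 20% higher.
a = rng.lognormal(mean=3.0, sigma=0.6, size=400)
b = rng.lognormal(mean=3.0 + np.log(1.2), sigma=0.6, size=400)

# Step 1: recognize the shape (positive values, right-skewed)
print(f"Skewness of raw A: {stats.skew(a):.2f}")

# Step 2: transform to log-space, where the data are approximately normal
za, zb = np.log(a), np.log(b)
print(f"Skewness of log A: {stats.skew(za):.2f}")

# Step 3: normal-based analysis on the logs (Welch's t-test)
t_stat, p_value = stats.ttest_ind(zb, za, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Step 4: back-transform. A difference of log-means becomes a ratio of medians.
median_ratio = np.exp(zb.mean() - za.mean())
print(f"Estimated median ratio B/A: {median_ratio:.3f}")
```

Note that back-transforming a mean of logs gives an estimate of the median (the geometric mean), not of the arithmetic mean, of the original data.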

Stock Price Modeling

The Geometric Brownian Motion model, which underlies the famous Black-Scholes option pricing formula, assumes stock prices follow a log-normal distribution.

The Model

Stock prices evolve according to:

dS = \mu S \, dt + \sigma S \, dW

This stochastic differential equation has the solution:

S(T) = S(0) \exp\left[\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma W(T)\right]

Since $W(T) \sim N(0, T)$, the term in brackets is normal, so $S(T)$ is log-normally distributed.
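The closed-form solution makes it easy to sample final prices directly, without simulating whole paths. The sketch below assumes an initial price of 100, an 8% annual drift, 20% annual volatility, and a 60-trading-day horizon with 252 trading days per year; these numbers are illustrative assumptions, not data for any real stock.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative parameters (assumptions, not taken from a real stock)
s0 = 100.0              # initial price S(0)
mu, sigma = 0.08, 0.20  # annual drift and volatility
T = 60 / 252            # 60 trading days, assuming 252 trading days per year
n_sims = 100_000

# Sample W(T) ~ N(0, T) and plug it into the closed-form solution
w_T = rng.normal(0.0, np.sqrt(T), size=n_sims)
s_T = s0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * w_T)

# Log-normal theory: ln S(T) ~ N(ln S(0) + (mu - sigma^2/2) T, sigma^2 T)
theory_mean = s0 * np.exp(mu * T)
theory_median = s0 * np.exp((mu - 0.5 * sigma**2) * T)

print(f"Simulated mean:   {s_T.mean():.2f}  (theory {theory_mean:.2f})")
print(f"Simulated median: {np.median(s_T):.2f}  (theory {theory_median:.2f})")
print(f"Minimum price:    {s_T.min():.2f}  (always positive)")
```

The mean exceeds the median by the factor $e^{\sigma^2 T / 2}$, which is the log-normal mean/median gap with $\sigma^2 T$ playing the role of the log-variance.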

[Interactive widget: Stock Price Simulator (Geometric Brownian Motion). Controls: annual drift μ (expected return, default 8%), annual volatility σ (price uncertainty, default 20%), trading days (default 60, about 0.2 years), and number of simulations (default 50). Panels: simulated price paths and the final price distribution (log-normal).]

Stock prices follow Geometric Brownian Motion, which means final prices are log-normally distributed.

Sample output at the default settings (50 simulated paths): mean price $103.16, median price $100.49, standard deviation $11.31, 5th percentile $84.85, 95th percentile $121.30, range $81 to $129.

Black-Scholes Connection

This is exactly the model used in the Black-Scholes option pricing formula. The assumption that log-returns are normally distributed means stock prices are log-normally distributed. The distribution is right-skewed: large gains are possible, but prices can't go below zero.

Why Log-Normal for Stock Prices?

Multiplicative returns: A 10% daily return means multiplying by 1.10, regardless of the current price
Non-negativity: Stock prices cannot go below zero (log-normal support is (0, ∞))
Compound growth: Returns compound over time
Empirical fit: Log-returns are approximately normal (though real markets have heavier tails)

Model Limitations

Real stock returns have "fat tails"—extreme events occur more often than the log-normal model predicts. This is why options are often mispriced by Black-Scholes during market crashes. Extensions like the Heston model use stochastic volatility to address this.

Real-World Applications

Example 1: Income Distribution

Problem: A company's employee salaries follow a log-normal distribution with μ = 11.0 and σ = 0.5 (in log-dollars). Find the median salary and the percentage of employees earning over $100,000.

Solution:

Median = e^μ = e^11.0 = $59,874
P(Salary > 100,000) = P(ln(Salary) > ln(100,000)) = P(X > 11.513) where X ~ N(11.0, 0.25)
z = (11.513 - 11.0) / 0.5 ≈ 1.026
P(Z > 1.026) ≈ 15.2%, so about 15% of employees earn over $100,000
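The same calculation can be reproduced with scipy using unrounded values; small differences from a hand calculation with a rounded z-score are expected. This is a minimal sketch, with variable names chosen for illustration.

```python
import numpy as np
from scipy import stats

mu, sigma = 11.0, 0.5  # parameters of ln(Salary)

# scipy parameterization: s = sigma, scale = exp(mu)
salary = stats.lognorm(s=sigma, scale=np.exp(mu))

median_salary = salary.median()
p_over_100k = salary.sf(100_000)  # survival function: P(Salary > 100,000)

# The same probability via the standard normal, working in log-space
z = (np.log(100_000) - mu) / sigma
p_via_z = stats.norm.sf(z)

print(f"Median salary:        ${median_salary:,.0f}")
print(f"z-score:              {z:.4f}")
print(f"P(Salary > $100,000): {p_over_100k:.4f}")
```

Both routes give the same probability because the log transform preserves the ordering of values.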

Example 2: Network Latency

Problem: Server response times follow LogNormal(2.5, 0.8) in milliseconds. Design an SLA guaranteeing 99% of requests complete within a threshold. What threshold should you set?

Solution:

Find the 99th percentile of the log-normal distribution
In log-space, the 99th percentile of N(2.5, 0.64) is: 2.5 + 2.33(0.8) = 4.36
Back-transform: e^4.36 = 78.3 ms
SLA: "99% of requests complete in under 80ms"
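The percentile calculation maps directly onto scipy's `ppf` (the inverse CDF). The sketch below is illustrative; it also checks what fraction of requests the rounded 80 ms threshold would cover under this model.

```python
import numpy as np
from scipy import stats

mu, sigma = 2.5, 0.8  # parameters of ln(latency in ms)
latency = stats.lognorm(s=sigma, scale=np.exp(mu))

# 99th percentile directly from the log-normal
p99 = latency.ppf(0.99)

# Same value by hand: normal quantile in log-space, then back-transform
p99_manual = np.exp(mu + stats.norm.ppf(0.99) * sigma)

# Fraction of requests finishing within the rounded 80 ms threshold
frac_under_80 = latency.cdf(80.0)

print(f"99th percentile:         {p99:.1f} ms")
print(f"Manual back-transform:   {p99_manual:.1f} ms")
print(f"P(latency <= 80 ms):     {frac_under_80:.4f}")
```

Rounding the threshold up to 80 ms keeps the guarantee slightly conservative under the assumed model.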

Example 3: Particle Sizes

Problem: Aerosol particle diameters follow LogNormal(μ, σ) with median 2.5 μm and mean 3.2 μm. Find μ and σ.

Solution:

Median = e^μ = 2.5, so μ = ln(2.5) = 0.916
Mean = e^(μ + σ²/2) = 3.2
ln(3.2) = 0.916 + σ²/2
σ² = 2(ln(3.2) - 0.916) = 2(1.163 - 0.916) = 0.494
σ = 0.703
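The steps in this solution amount to inverting the median and mean formulas. The helper below is a hypothetical convenience function written for this example (it is not part of scipy); it also verifies the result by recomputing the mean from the recovered parameters.

```python
import numpy as np

def lognormal_params_from_median_mean(median, mean):
    """Recover (mu, sigma) of a log-normal from its median and mean.

    Uses median = exp(mu) and mean = exp(mu + sigma^2 / 2).
    """
    if not (0 < median < mean):
        raise ValueError("A log-normal requires 0 < median < mean.")
    mu = np.log(median)
    sigma = np.sqrt(2.0 * (np.log(mean) - mu))
    return mu, sigma

mu, sigma = lognormal_params_from_median_mean(median=2.5, mean=3.2)
print(f"mu    = {mu:.3f}")
print(f"sigma = {sigma:.3f}")

# Round trip: the recovered parameters should reproduce the mean
print(f"Recomputed mean = {np.exp(mu + sigma**2 / 2):.3f} μm")
```

The input check reflects the ordering Median < Mean: if the reported mean were not larger than the median, no log-normal could match both.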

AI/ML Applications

1. Weight Initialization and Gradient Flow

In deep neural networks, the scale of activations (and gradients) after many layers is shaped by multiplicative effects, so it can behave in a roughly log-normal way:

Each layer multiplies by weights and applies an activation function
After many layers, this repeated multiplication compounds, so scales can explode or vanish
He/Kaiming initialization counters this by setting the weight variance to 2/fan_in, so that the activation variance is roughly preserved across ReLU layers

🐍python

import torch
import torch.nn as nn

# He initialization for ReLU networks
# Weight variance 2/fan_in keeps the activation variance roughly constant per layer
layer = nn.Linear(512, 256)
nn.init.kaiming_normal_(layer.weight, mode='fan_in', nonlinearity='relu')

# Without such scaling, per-layer factors compound multiplicatively with depth,
# which is where log-normal-like behavior of activation scales can come from
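The multiplicative effect of depth can be seen in a small experiment. The sketch below uses NumPy rather than PyTorch to stay self-contained, and assumes a plain fully connected ReLU network without biases, 30 layers of width 512, and a batch of random inputs; the function name and all sizes are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def final_second_moment(weight_std, n_layers=30, width=512, batch=256):
    """Push random inputs through n_layers of (linear + ReLU) and
    return the mean squared activation at the output."""
    x = rng.standard_normal((batch, width))
    for _ in range(n_layers):
        w = rng.standard_normal((width, width)) * weight_std
        x = np.maximum(x @ w, 0.0)  # linear layer followed by ReLU
    return np.mean(x**2)

width = 512
he_scale = final_second_moment(np.sqrt(2.0 / width))     # He: Var(w) = 2 / fan_in
naive_scale = final_second_moment(np.sqrt(1.0 / width))  # Var(w) = 1 / fan_in

print(f"He initialization:    mean squared activation = {he_scale:.3e}")
print(f"1/fan_in variance:    mean squared activation = {naive_scale:.3e}")
```

With variance 1/fan_in, each ReLU layer roughly halves the squared scale, so 30 layers multiply it by about 2^-30; the He scaling compensates for that factor of one half at every layer.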

2. Attention Scores in Transformers

Attention weights in transformer models connect to the log-normal through the softmax, which exponentiates the raw scores:

If the raw dot-product scores (before softmax) are approximately normal, the exponentiated scores are approximately log-normal
The heavy right tail helps explain why attention often concentrates on a few "key" tokens
This can inform design choices for attention scaling and normalization
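A small simulation shows how exponentiating roughly normal scores concentrates attention mass. This sketch assumes, purely for illustration, 1024 raw scores drawn from N(0, 2²); real attention scores depend on the model and inputs and need not be normal.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical raw attention scores for one query over 1024 keys,
# assumed here to be normally distributed
seq_len = 1024
scores = rng.normal(0.0, 2.0, size=seq_len)

# Softmax: exponentiate (shifted by the max for numerical stability), then normalize.
# If the scores are normal, the exponentiated scores are log-normal.
unnormalized = np.exp(scores - scores.max())
weights = unnormalized / unnormalized.sum()

# Share of total attention captured by the 32 highest-weight tokens (about 3% of keys)
top_k = 32
top_mass = np.sort(weights)[-top_k:].sum()

print(f"Weights sum to:                 {weights.sum():.6f}")
print(f"Attention mass on top {top_k} tokens: {top_mass:.1%}")
```

A larger score standard deviation makes the log-normal tail heavier and concentrates more of the mass on the top tokens.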

3. Loss Distributions in Training

Individual sample losses during training are positive and often right-skewed, and are sometimes approximately log-normal:

🐍python

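import numpy as np
import torch
import torch.nn.functional as F

# Stand-in for a real model and batch: random logits and labels (illustrative only)
torch.manual_seed(0)
logits = torch.randn(256, 10)
targets = torch.randint(0, 10, (256,))

# Per-sample cross-entropy losses (reduction='none' keeps one loss per example)
sample_losses = F.cross_entropy(logits, targets, reduction='none').numpy()

# Log-transform for analysis: if the losses are roughly log-normal,
# log_losses is roughly normal
log_losses = np.log(sample_losses)

# If that holds, it suggests:
# 1. Monitor log-loss (less dominated by a few large values)
# 2. Hard examples have very high loss (right tail)
# 3. Curriculum learning can exploit this structure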

4. Uncertainty Quantification

For positive quantities such as variances and scales, a log-normal prior is often more appropriate than a Gaussian because it puts no probability on negative values:

🐍python

import torch
import torch.distributions as dist

# For modeling positive quantities (e.g., variance, scale),
# log-normal is often a better fit than Normal

# Bayesian neural network with a log-normal prior on the variance
log_var = torch.nn.Parameter(torch.zeros(1))  # log(variance)
var_prior = dist.LogNormal(loc=-2.0, scale=0.5)

# The actual variance is positive: var = exp(log_var)
variance = torch.exp(log_var)

# Loss includes the KL divergence from the approximate posterior
# LogNormal(log_var, 1) to the prior
kl_loss = dist.kl_divergence(
    dist.LogNormal(log_var, torch.ones_like(log_var)),
    var_prior
)

5. Data Augmentation with Multiplicative Noise

Many augmentation techniques apply random multiplicative factors, and a log-normal distribution is a natural choice for drawing them:

Color jittering: Multiply RGB channels by random factors
Scale augmentation: Multiply image dimensions
Audio augmentation: Multiply amplitude by random gain

🐍python

import numpy as np

def log_normal_color_jitter(image, sigma=0.1):
    """Apply multiplicative color jittering using log-normal factors."""
    # Generate log-normal multiplicative factors
    # E[factor] = 1 when mu = -sigma^2/2
    mu = -sigma**2 / 2
    factors = np.random.lognormal(mu, sigma, size=(1, 1, 3))

    # Multiply and clip
    augmented = np.clip(image * factors, 0, 255).astype(np.uint8)
    return augmented

Connections to Other Distributions

Relationship	Description
LogNormal ↔ Normal	Y = e^X transforms Normal to LogNormal (and vice versa with log)
LogNormal & Exponential	Both model positive quantities such as waiting times; the Exponential is a special case of the Gamma
LogNormal & Weibull	Both used for reliability/lifetime modeling; Weibull offers more flexibility
Products of LogNormals	Product of independent LogNormals is LogNormal (like sum of Normals is Normal)
LogNormal & Pareto	Both heavy-tailed; Pareto has even heavier tails (a power-law tail, whereas the log-normal tail eventually decays faster than any power law)

The Distribution Family Tree

The log-normal arises naturally from the normal through the exponential transformation. This places it in a family of distributions connected by transformations:

Normal → (exponentiate) → Log-Normal
Standard Normal → (square) → Chi-Square (one degree of freedom)
Exponential → (sum of k independent copies) → Gamma
Gamma → (ratio X / (X + Y) of independent Gammas with a common scale) → Beta

Python Implementation

Basic Log-Normal Operations

🐍python

from scipy import stats
import numpy as np

# IMPORTANT: scipy.stats.lognorm uses a different parameterization!
# scipy: s = sigma, scale = exp(mu)
# standard: mu, sigma

mu = 0.5      # location parameter (mean of log)
sigma = 0.8   # scale parameter (std of log)

# Create distribution
lognorm = stats.lognorm(s=sigma, scale=np.exp(mu))

# PDF and CDF
x = 2.0
print(f"PDF at x=2: {lognorm.pdf(x):.6f}")
print(f"CDF at x=2: {lognorm.cdf(x):.6f}")  # P(X < 2)

# Key statistics
print(f"Mean: {lognorm.mean():.4f}")        # Should be exp(mu + sigma^2/2)
print(f"Median: {lognorm.median():.4f}")    # Should be exp(mu)
print(f"Variance: {lognorm.var():.4f}")
print(f"Mode: {np.exp(mu - sigma**2):.4f}")  # Not built-in

# Quantiles (percentiles)
print(f"95th percentile: {lognorm.ppf(0.95):.4f}")

# Generate random samples
samples = lognorm.rvs(size=1000)
print(f"Sample mean: {samples.mean():.4f}")
print(f"Sample median: {np.median(samples):.4f}")

Fitting Log-Normal to Data

🐍python

import numpy as np
from scipy import stats

# Suppose we have right-skewed positive data
data = np.array([1.2, 2.5, 3.1, 1.8, 4.2, 2.9, 5.1, 1.5, 2.2, 3.8])

# Method 1: Fit log-normal directly (maximum likelihood)
# Returns shape (sigma), loc, scale (exp(mu))
shape, loc, scale = stats.lognorm.fit(data, floc=0)  # Fix loc=0 for standard lognorm
mu_fit = np.log(scale)
sigma_fit = shape
print(f"Fitted parameters: mu = {mu_fit:.4f}, sigma = {sigma_fit:.4f}")

# Method 2: Fit normal to log-transformed data
# (same mu as Method 1 when loc is fixed at 0; sigma differs slightly
# because ddof=1 gives the sample standard deviation)
log_data = np.log(data)
mu_log, sigma_log = log_data.mean(), log_data.std(ddof=1)
print(f"From log-data: mu = {mu_log:.4f}, sigma = {sigma_log:.4f}")

# Verify the fit
fitted_dist = stats.lognorm(s=sigma_fit, scale=np.exp(mu_fit))
print(f"Theoretical mean: {fitted_dist.mean():.4f}")
print(f"Actual mean: {data.mean():.4f}")

Confidence Intervals

🐍python

import numpy as np
from scipy import stats

def lognormal_ci(data, confidence=0.95):
    """
    Compute a confidence interval for the log-normal median,
    plus a point estimate of the mean.

    Strategy: CI on log-transformed data, then back-transform.
    """
    n = len(data)
    log_data = np.log(data)

    # CI for mean of log-data (normal)
    mu_hat = log_data.mean()
    se = log_data.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n-1)

    log_ci_lower = mu_hat - t_crit * se
    log_ci_upper = mu_hat + t_crit * se

    # Back-transform for median CI
    median_ci = (np.exp(log_ci_lower), np.exp(log_ci_upper))

    # For the mean, need to account for variance: E[Y] = exp(mu + sigma^2/2)
    sigma2_hat = log_data.var(ddof=1)
    mean_hat = np.exp(mu_hat + sigma2_hat / 2)

    return {
        'median_ci': median_ci,
        'mean_estimate': mean_hat,
        'mu_hat': mu_hat,
        'sigma_hat': np.sqrt(sigma2_hat)
    }

# Example usage
data = np.random.lognormal(mean=1.0, sigma=0.5, size=100)
result = lognormal_ci(data)
print(f"Median CI: ({result['median_ci'][0]:.3f}, {result['median_ci'][1]:.3f})")
print(f"Mean estimate: {result['mean_estimate']:.3f}")

Common Pitfalls

Pitfall 1: Confusing Parameters with Statistics

Wrong: "The log-normal has mean μ and standard deviation σ."

Right: μ and σ are the mean and standard deviation of ln(Y), not Y itself. The actual mean is e^(μ + σ²/2).

Pitfall 2: Scipy Parameterization

Wrong: Using scipy.stats.lognorm with the "standard" parameterization.

🐍python

from scipy import stats
import numpy as np

mu, sigma = 1.0, 0.5

# WRONG: positional arguments are (s, loc, scale),
# so this sets s=mu and loc=sigma!
# wrong = stats.lognorm(mu, sigma)

# CORRECT: scipy uses s=sigma, scale=exp(mu)
correct = stats.lognorm(s=sigma, scale=np.exp(mu))

print(f"Mean should be {np.exp(mu + sigma**2/2):.4f}")
print(f"scipy gives: {correct.mean():.4f}")  # Matches!

Pitfall 3: Arithmetic vs Geometric Mean

For log-normal data, the geometric mean (which equals the median) is often more meaningful than the arithmetic mean:

🐍python

import numpy as np
from scipy import stats

# Log-normal data
data = stats.lognorm.rvs(s=0.8, scale=np.exp(0.5), size=1000)

# Arithmetic mean - pulled up by outliers
arith_mean = data.mean()

# Geometric mean - more robust, equals median for log-normal
geom_mean = np.exp(np.log(data).mean())

# Median
median = np.median(data)

print(f"Arithmetic mean: {arith_mean:.3f}")
print(f"Geometric mean:  {geom_mean:.3f}")
print(f"Median:          {median:.3f}")
# Geometric mean ≈ Median for log-normal data

Pitfall 4: Forgetting the Support

Log-normal is only defined for positive values (y > 0). If your data can be zero or negative, log-normal is not appropriate!

Zero values: Consider zero-inflated log-normal or add a small constant before log-transforming
Negative values: Log-normal is not appropriate. Consider normal, shifted log-normal, or other distributions

Test Your Understanding

If X ~ N(0, 1) (standard normal), what distribution does Y = eˣ follow?

Summary

The log-normal distribution captures the behavior of multiplicative processes just as the normal distribution captures additive processes.

Fundamental relationship: If X ~ Normal(μ, σ²), then e^X ~ LogNormal(μ, σ)
Parameters ≠ Statistics: μ is NOT the mean; σ is NOT the standard deviation. They are the mean and std of ln(Y).
Always right-skewed: Mean > Median > Mode, always
Multiplicative processes: Use log-normal when effects multiply (stock prices, income, biological growth)
Take logs first: Transform to normal, analyze, then back-transform
Positive support: Log-normal only for y > 0

The Bottom Line: When you see right-skewed positive data that results from multiplicative processes, think log-normal. Take logs to normalize, analyze, and interpret—then back-transform for practical conclusions.

From Finance to Deep Learning

The log-normal distribution connects classical statistics to modern ML. From Black-Scholes option pricing to understanding gradient flow in deep networks, recognizing multiplicative processes helps you choose appropriate models and build more robust systems.