Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Explain why classical calculus breaks down when prices wiggle infinitely often, and why a new kind of calculus is needed.
Construct Brownian motion as the limit of a rescaled random walk with steps of size $\sqrt{\Delta t}$ .
State the four defining properties of standard Brownian motion $W(t)$ and recognise why each one is mathematically essential.
Derive the identity $(dW)^2 = dt$ from quadratic variation — the single fact that powers Itô calculus.
Simulate Brownian motion and Geometric Brownian Motion in NumPy and PyTorch, and recognise these simulations inside every Monte-Carlo option-pricing engine.

The Question Calculus Cannot Answer

In Section 31.1 we wrote down the Black-Scholes PDE as if its $\partial V / \partial t$ and $\partial^2 V / \partial S^2$ terms were just ordinary derivatives. They are not. A stock price $S(t)$ is not a smooth curve — it is a microscopically jagged path that wiggles in every interval, no matter how small. Plot a minute-by-minute price and zoom in; you do not see a straight line, you see another wiggly graph. Zoom in again — still wiggly. The path is nowhere differentiable, and a function with no derivative breaks every chain rule you have ever used.

The big question: How do we do calculus on objects that have no slope? That is the problem stochastic calculus was invented to solve, and Brownian motion is its primary building block — the rough surface every other process is built on top of.

The trick will turn out to be: replace the smooth differential $dx$ with a random differential $dW$ , redefine what a derivative means, and pay a single price — the Itô correction $\tfrac{1}{2}\sigma^2 \frac{\partial^2 V}{\partial S^2}$ that appears in Black-Scholes. Everything in this chapter hinges on building $W(t)$ carefully, so let us do that now.

From a Drunkard's Walk to a Stock Price

Forget options for a moment and imagine a drunk standing at the origin of a number line. Every second he flips a fair coin: heads he steps $+1$ , tails he steps $-1$ . After $N$ seconds his position is

X_N \;=\; \xi_1 + \xi_2 + \cdots + \xi_N, \qquad \xi_k = \pm 1 \text{ with prob.\ }\tfrac{1}{2}.

Two textbook facts: $\mathbb{E}[X_N] = 0$ (drift-free), and $\operatorname{Var}(X_N) = N$ (variances add because the steps are independent). So the typical magnitude of his position after $N$ seconds is $\sqrt{N}$ , not $N$ . This is the famous square-root law of diffusion.

Intuition. Adding

N

independent +1/−1 steps is like adding

N

noisy bills together: cancellation makes the total grow like

\sqrt{N}

, not

N

. Stock returns inherit exactly this scaling, which is why option prices end up depending on

\sigma \sqrt{T}

and not

\sigma T

The Magic of the √dt Scaling

Now let us speed the drunk up. Say we want him to take $N$ steps in one second instead of $N$ seconds. If we kept the step size at $\pm 1$ , after one second his typical distance would be $\sqrt{N}$ , which blows up as $N \to \infty$ . Useless — we get infinite wandering.

If instead we made each step $\pm 1/N$ , after one second his typical distance would be $\sqrt{N}/N = 1/\sqrt{N} \to 0$ . Equally useless — every path collapses to zero.

There is exactly one Goldilocks choice: step size $\pm 1/\sqrt{N}$ , i.e. $\pm \sqrt{\Delta t}$ with $\Delta t = 1/N$ . Then $\operatorname{Var}(X_N) = N \cdot (1/\sqrt{N})^2 = 1$ — finite and independent of $N$ . As $N \to \infty$ these rescaled walks converge in distribution to a non-trivial limit: standard Brownian motion $W(t)$ .

The √dt rule. A Brownian increment over a tiny time

dt

has standard deviation

\sqrt{dt}

, not

dt

. That is why

dW

behaves like

\sqrt{dt}

, and why

(dW)^2 \sim dt

rather than the

(dx)^2 \sim (dt)^2

you would expect from ordinary calculus. This single mismatch produces every difference between ordinary and stochastic calculus.

The visualizer below lets you watch this convergence happen. Drag the slider for N: each panel keeps the same underlying random sequence but uses 4× more steps. Notice that the overall shape is preserved while finer wiggles appear — that is Donsker's invariance principle in action.

Loading random-walk limit visualizer…

Brownian Motion: The Formal Definition

The limit object we just sketched is a continuous-time stochastic process $W : [0, \infty) \to \mathbb{R}$ characterised by four properties (sometimes called the Wiener-process axioms):

Starts at zero. $W(0) = 0$ .
Independent increments. For any $0 \le s < t \le u < v$ , the random variables $W(t) - W(s)$ and $W(v) - W(u)$ are independent. The past tells you nothing about the future jump.
Gaussian increments. $W(t) - W(s) \sim \mathcal{N}(0,\, t-s)$ . Mean zero, variance equal to the elapsed time.
Continuous paths. The map $t \mapsto W(t)$ is (almost surely) continuous. No jumps.

Why these four? Each one corresponds to a real modelling assumption: (1) is a normalisation; (2) encodes the Markov property — the future depends only on the present; (3) is the central-limit theorem applied to the rescaled random walk; (4) is what distinguishes Brownian motion from a Poisson process or a jump process. Drop (2) and you get correlated noise; drop (3) and you leave the Black-Scholes world for jump diffusions.

From these four axioms alone, the entire algebra of stochastic integrals follows. Everything you will see in the next sections — Itô's lemma, the Black-Scholes PDE, risk-neutral pricing — is a consequence.

Playing With the Process

Time to get your hands on it. The explorer below simulates either pure Brownian motion $W(t)$ or its multiplicative cousin Geometric Brownian Motion $S(t)$ . Each coloured curve is one independent realisation drawn from the distribution described above.

Switch between W(t) and S(t) with the buttons.
Slide σ up — paths spread out faster. The dashed band traces $\pm 2\sigma\sqrt{t}$ , the 95% envelope.
Slide steps from 50 to 2000 — finer time grids reveal finer wiggles, but the path's overall shape and ending point stay roughly the same. That is the invariance principle again.
For S(t) mode, drag μ: the dashed curve is the expected price $\mathbb{E}[S(t)] = S_0 e^{\mu t}$ . Individual paths fluctuate around it.
Click New seed to get a fresh set of paths.

Loading Brownian motion explorer…

Try this. Set

\sigma = 1

and run with 8 paths. Pause at

t = 0.25

: most paths sit in

[-1, 1]

, because the standard deviation of

W(0.25)

\sqrt{0.25} = 0.5

, so 95% of mass lies in

[-1, 1]

. Slide forward to

t = 1

: the envelope widens to

[-2, 2]

. The spread grows like

\sqrt{t}

— you can literally see the square-root law.

Strange Properties of Brownian Paths

Brownian motion is the source of many counter-intuitive facts. The following four are the ones every quant must internalise:

Property	What it means	Why it matters
Continuous everywhere	Paths have no jumps; you can draw them without lifting the pen.	Stock log-prices in Black-Scholes never gap.
Differentiable nowhere	Pick any time t. The slope (W(t+h) - W(t))/h does not converge as h → 0.	Cannot apply classical chain rule — need Itô calculus.
Self-similar	For any c > 0, the process c·W(t/c²) is again a Brownian motion.	Volatility scales with √t, not t — pricing formulas inherit this.
Infinite total variation	Summed \|ΔW\| over a time interval is infinite — the path has infinite arclength.	Riemann-Stieltjes integrals against dW don't exist; we need stochastic integrals.

Counter-intuitive moment. Picture the price chart of a stock. It looks like a normal continuous curve. But under the Brownian model, between any two times — say

t = 0.500

and

t = 0.501

— the path crosses every price level in some band infinitely many times. The chart you see is just a finite sample; the true mathematical object is much rougher.

Quadratic Variation: Why $(dW)^2 = dt$

Take the interval $[0, T]$ and divide it into $N$ sub-intervals of length $\Delta t = T/N$ . Form the sum

\displaystyle Q_N \;=\; \sum_{k=1}^{N} \bigl(W(t_k) - W(t_{k-1})\bigr)^2.

Each increment $\Delta W_k = W(t_k) - W(t_{k-1})$ is normal with variance $\Delta t$ . So $\mathbb{E}[(\Delta W_k)^2] = \Delta t$ , and therefore

\mathbb{E}[Q_N] \;=\; N \cdot \Delta t \;=\; T.

The variance of $Q_N$ can be computed from the fourth moment of a normal: $\operatorname{Var}(Q_N) = 2N(\Delta t)^2 = 2T \Delta t \to 0$ as $\Delta t \to 0$ . So $Q_N$ converges (in mean-square) to the deterministic constant $T$ . In differential shorthand:

\boxed{\,(dW)^2 \;=\; dt\,}.

This is THE identity of stochastic calculus. Ordinary calculus says

(dx)^2 = 0

to leading order — we throw squared infinitesimals away. Brownian motion is so jagged that the squared increments add up to something finite. The leftover

\tfrac{1}{2}\sigma^2 \partial^2 V / \partial S^2

in Itô's lemma — and therefore in Black-Scholes — is literally the price you pay for

(dW)^2 = dt

Numerical check. Run the Python simulation below with

\Delta t = 0.001

N = 1000

, and compute

\sum_k (\Delta W_k)^2

. You will get something near

T = 1

— we measured

0.9639

on one seed and

0.9977

with

N = 10\,000

. The bigger

N

, the closer to 1.

From W(t) to Stock Prices: Geometric Brownian Motion

Pure Brownian motion is not a great model for stock prices: it can go negative, and it has constant variance per unit time rather than constant percentage variance. The fix is to exponentiate it. Define

S(t) \;=\; S_0 \, \exp\!\left[\bigl(\mu - \tfrac{1}{2}\sigma^2\bigr)\,t \;+\; \sigma\,W(t)\right].

This is Geometric Brownian Motion (GBM). It is the process Black and Scholes assumed for the underlying stock — every modern derivatives engine starts here. Three things are worth unpacking:

It is always positive. $e^x > 0$ for every $x$ , no matter how negative $W(t)$ wanders. Stock prices stay above zero — good.
Log-returns are Gaussian. $\log(S(t)/S_0) = (\mu - \tfrac{1}{2}\sigma^2) t + \sigma W(t) \sim \mathcal{N}\bigl((\mu - \tfrac{1}{2}\sigma^2) t,\, \sigma^2 t\bigr)$ . Empirically log-returns of liquid stocks really are roughly normal at short horizons — a major reason the model survives.
Expected price grows exponentially. $\mathbb{E}[S(t)] = S_0 \, e^{\mu t}$ . The $-\tfrac{1}{2}\sigma^2$ correction is exactly what is needed to cancel the convexity of the exponential — this is the first Itô-style correction we will see, and it appears again as the missing term in Black-Scholes.

Switch the explorer above into S(t) mode and play with $\mu$ and $\sigma$ . The dashed curve is $\mathbb{E}[S(t)] = S_0 e^{\mu t}$ ; the coloured paths fluctuate around it.

Worked Example: Hand-Computing a Path

Before you let Python loose, do five steps with a pencil. The example is small enough to redo on a napkin, and it nails down the meaning of every symbol in the simulator.

▶ Worked example: a 5-step discrete Brownian path with

T = 1

N = 5

\Delta t = 0.2

We simulate a Brownian motion as the limit of a scaled random walk with $\xi_k = \pm 1$ coin flips and step-size $\sqrt{\Delta t}$ . With seed 42, NumPy produces the sequence

\xi = (-1,\, +1,\, -1,\, -1,\, -1).

The discrete path is the running sum scaled by $\sqrt{\Delta t} = \sqrt{0.2} \approx 0.4472$ :

k	t_k	ξ_k	running sum Σξ	W(t_k) = √Δt · Σξ
0	0.0	—	0	0.0000
1	0.2	−1	−1	−0.4472
2	0.4	+1	0	0.0000
3	0.6	−1	−1	−0.4472
4	0.8	−1	−2	−0.8944
5	1.0	−1	−3	−1.3416

Step 1: increments. Each Δ-move is $\xi_k \sqrt{\Delta t}$ . Numerically: $\Delta W = (-0.4472, +0.4472, -0.4472, -0.4472, -0.4472)$ .

Step 2: path values are the running totals of those increments (read off the last column above).

Step 3: quadratic variation. Each $(\Delta W_k)^2 = 0.2$ exactly, so

Q_5 \;=\; \sum_{k=1}^{5} (\Delta W_k)^2 \;=\; 5 \cdot 0.2 \;=\; 1.0 \;=\; T. \checkmark

The identity $(dW)^2 = dt$ reduces to a one-line check at this discretisation level — there is no chance involved at all, because for a ±1 walk every squared step is $\Delta t$ identically. With Gaussian increments it would be approximately $T$ , but the law of large numbers kicks in quickly.

Step 4: turn it into a stock price. Take $S_0 = 100$ , $\mu = 0.05$ , $\sigma = 0.2$ . At $t = 1$ :

S(1) = 100 \cdot \exp\!\bigl[(0.05 - 0.02)\cdot 1 + 0.2 \cdot (-1.3416)\bigr] = 100 \cdot e^{-0.2383} \approx 78.79.

So this particular realisation of the market loses about 21% over the year — a bad path. Most paths do better; only ~12% finish below 80. The visualizer in S(t) mode shows you the full shape of the distribution.

Step 5: sanity check via simulator. Set the explorer to steps = 5, $\sigma = 0.2$ , paths = 1, and New seed until you see a downward path that ends near 78–80. That is one realisation of exactly this hand calculation.

Plain Python Simulation

We now translate the recipe into NumPy. The exact numbers below come from running this script — they are not approximations, they are the precise output of $\texttt{np.random.seed(0)}$ . Click any line on the right and the corresponding explanation card will appear on the left.

🐍brownian.py

Explanation(16)

Code(22)

1Import NumPy

📚 NumPy gives us vectorised random numbers (np.random.normal) and the running sum np.cumsum we'll use to build the path.

2Import matplotlib (for the figure)

📚 matplotlib.pyplot is only used to plot W later in a notebook. The simulation itself is pure NumPy.

4Fix the seed

📚 np.random.seed(0) makes the random draws reproducible. Calling this script twice gives the same numbers — important when comparing to the hand-worked example.

6T = horizon

T is the total time we simulate over (think 'one year'). Brownian motion is defined on a continuous interval [0, T]; here T = 1.0.

EXECUTION STATE

T = 1.0

7N = number of steps

We chop the interval into N equal pieces. The bigger N, the finer the approximation to the true continuous process.

EXECUTION STATE

N = 100

8dt = step size

dt = T/N is how much time each step covers. Here dt = 0.01. This number is the lever that controls everything: each tiny increment dW has standard deviation √dt = 0.1.

EXECUTION STATE

dt = 0.01

10What this line does

Comment only — we are about to draw the N Brownian increments.

11Comment

Comment only — reminds the reader that each increment has variance dt (NOT variance 1).

12Draw N normal increments

📚 np.random.normal(loc, scale, size) draws Gaussian samples. We pass: • loc=0.0 → the mean of every increment is 0 (Brownian motion has no drift). • scale=np.sqrt(dt) → the standard deviation is √dt = 0.1, so variance = dt = 0.01. • size=N → we want N independent draws. ⬆ return: a 1D array dW of length 100. First 5 values: [0.1764, 0.0400, 0.0979, 0.2241, 0.1868]. These are the Δ-amounts the path moves on each step.

EXECUTION STATE

dW (first 5) = [0.1764, 0.0400, 0.0979, 0.2241, 0.1868]

dW.shape = (100,)

dW.mean() (≈ 0) = 0.0151

dW.std() (≈ √dt = 0.1) = 0.0992

14What this line does

Comment only — explains that the path itself is just the running total of the increments.

15Build the path by cumulative sum

📚 np.cumsum(dW) returns the running totals: [dW₀, dW₀+dW₁, dW₀+dW₁+dW₂, …]. Mathematically this is the discrete version of W(t) = ∫₀ᵗ dW(s). 📚 np.concatenate([[0.0], …]) prepends 0 because W(0) = 0 by definition of standard Brownian motion. ⬆ return: W of length N+1 = 101. First 5: [0.0000, 0.1764, 0.2164, 0.3143, 0.5384]. The final value W[100] = W(T) is one realisation of a draw from N(0, T).

EXECUTION STATE

W (first 5) = [0.0000, 0.1764, 0.2164, 0.3143, 0.5384]

W.shape = (101,)

W[-1] (one draw of W(T)) = 0.5981

17Comment

Comment only — the next line builds the matching time grid.

18Time grid

📚 np.linspace(0.0, T, N+1) returns N+1 equally-spaced values starting at 0 and ending at T. With T=1, N=100 this is t = 0.00, 0.01, 0.02, …, 1.00 — exactly the times at which we know W. ⬆ return: t with shape (101,).

EXECUTION STATE

t (first 5) = [0.00, 0.01, 0.02, 0.03, 0.04]

t[-1] = 1.0

20Print the first 5 increments

Sanity-check the first five Δ-moves. They should look like small numbers around 0 with spread ≈ 0.1. ⬆ prints: first 5 dW : [0.1764 0.0400 0.0979 0.2241 0.1868]

21Print the first 5 path values

Sanity-check the partial sums. Notice W[0] = 0 (by definition), W[1] = dW[0] = 0.1764, W[2] = dW[0]+dW[1] = 0.2164. ⬆ prints: first 5 W : [0. 0.1764 0.2164 0.3143 0.5384]

22Print W(T)

Final position of this single realisation. Theory says W(T) ~ N(0, T) = N(0, 1), so we should see something with magnitude ≲ 2 most of the time. ⬆ prints: W(T) : 0.5981

EXECUTION STATE

W(T) (this seed) = 0.5981

6 lines without explanation

1import numpy as np
2import matplotlib.pyplot as plt
3
4np.random.seed(0)
5
6T  = 1.0           # total time horizon (1 year)
7N  = 100           # number of time steps
8dt = T / N         # size of each time step  ->  0.01
9
10# Step 1: draw N independent normal increments,
11# each with mean 0 and variance dt.
12dW = np.random.normal(loc=0.0, scale=np.sqrt(dt), size=N)
13
14# Step 2: the Brownian path is the running sum of those increments.
15W = np.concatenate([[0.0], np.cumsum(dW)])
16
17# Step 3: associated time grid 0, dt, 2dt, ..., T.
18t = np.linspace(0.0, T, N + 1)
19
20print("first 5 dW :", dW[:5].round(4))
21print("first 5 W  :", W[:5].round(4))
22print("W(T)       :", W[-1].round(4))

How this maps to the math. Line 12 implements the Gaussian-increments axiom (property 3 above). Line 15 implements the Markov / independent-increments structure (property 2) by adding independent draws one after another. The fact that

W[0] = 0

is property 1, and using floats with a finite grid is the closest we can get to property 4 (continuity) on a computer.

PyTorch: Batched Path Simulation

Pricing a single derivative typically needs hundreds of thousands of simulated paths. Looping in Python would be glacial; PyTorch lets us draw the entire $(M \times N)$ grid of increments in one CUDA kernel. Here is a tiny example with $M = 4$ paths and $N = 5$ time steps — small enough that you can read every entry. Again, all numbers shown in the explanation cards are the literal output of running this script.

🐍gbm_batch.py

Explanation(16)

Code(23)

1Import PyTorch

📚 torch lets us simulate millions of paths in parallel on CPU or GPU using the same NumPy-style API.

3Seed

📚 torch.manual_seed(42) makes torch.randn reproducible. Same call twice → same numbers.

5Batch shape

M = number of independent paths, N = number of time steps per path. We picked tiny values (4 and 5) so the printed tensors fit on the page; production uses M=100 000 routinely.

EXECUTION STATE

M (paths) = 4

N (steps) = 5

6Market parameters

S0 = today's stock price, μ = expected log-return per year, σ = annualised volatility. The same three numbers Black–Scholes will eat in section 4.

EXECUTION STATE

S0 = 100.0

mu (drift) = 0.05

sigma (vol) = 0.20

7Horizon

T = 1 year of simulation.

EXECUTION STATE

T = 1.0

8Step size

dt = T / N = 0.2 years per step. Coarse on purpose.

EXECUTION STATE

dt = 0.2

10What this line does

Comment only — we are about to draw the whole grid of Brownian increments in one shot.

11Draw the entire (M × N) increment grid

📚 torch.randn(M, N) returns a tensor of shape (M, N) of standard normals (mean 0, variance 1). We scale by √dt to give each entry variance dt — exactly what defines Brownian increments. ⬆ return: dW with shape (4, 5). Row p is the sequence of increments for path p.

EXECUTION STATE

dW.shape = (4, 5)

dW (rounded) =

[[ 0.8617,  0.6651,  0.4028, -0.9416, -0.3390],
 [ 0.4822,  0.3581,  0.7516,  0.1591, -0.3071],
 [-0.2206,  0.1080, -0.1036,  0.0187, -0.1125],
 [ 0.3845, -0.1385, -0.1770,  0.3593, -0.2780]]

13What this line does

Comment only — running sum along columns gives the Brownian paths.

14Cumulative sum gives the path

📚 torch.cumsum(dW, dim=1) sums along the time axis (the second axis). • dim=1 means accumulate across columns within each row, i.e. WITHIN each path. • If we used dim=0 we would (wrongly) sum across paths. Example for path 0: 0.8617, 0.8617+0.6651=1.5269, +0.4028=1.9297, … and so on. ⬆ return: W with shape (4, 5).

EXECUTION STATE

W.shape = (4, 5)

W (rounded) =

[[ 0.8617,  1.5269,  1.9297,  0.9881,  0.6490],
 [ 0.4822,  0.8404,  1.5920,  1.7511,  1.4440],
 [-0.2206, -0.1126, -0.2162, -0.1975, -0.3101],
 [ 0.3845,  0.2460,  0.0691,  0.4284,  0.1504]]

16What this line does

Comment only — set up the row vector of times at which we will evaluate S(t).

17Time grid

📚 torch.arange(1, N+1) gives the integers [1,2,3,4,5]. We multiply by dt = 0.2 to get the actual times [0.2, 0.4, 0.6, 0.8, 1.0]. .float() casts the integer tensor to float so the later exponentiation works. ⬆ return: t with shape (5,). Broadcasts across all 4 paths.

EXECUTION STATE

t = [0.2, 0.4, 0.6, 0.8, 1.0]

19What this line does

Comment only — apply the closed-form solution of the GBM SDE.

20Apply the GBM formula

📚 torch.exp evaluates eˣ elementwise. The formula has two ingredients per entry: • Deterministic drift (μ − ½σ²) t — same for every path at the same time. • Stochastic shock σ W(t) — different per path. Broadcasting works because t has shape (5,) and W has shape (4,5), so PyTorch lines up t with the time axis automatically. ⬆ return: S with shape (4, 5) — each row is one simulated price path.

EXECUTION STATE

S.shape = (4, 5)

S (rounded) =

[[119.52, 137.35, 149.77, 124.81, 117.33],
 [110.79, 119.73, 139.99, 145.39, 137.55],
 [ 96.26,  98.95,  97.51,  98.46,  96.85],
 [108.64, 106.31, 103.23, 111.59, 106.19]]

22Print the price matrix

Each row is one path of S over the 5 time stamps. With only 4 paths the sample mean is a noisy estimator of E[S(T)] = S₀ eᵘᵀ = 105.13. ⬆ prints: tensor([[119.52, 137.35, 149.77, 124.81, 117.33], …])

23Sample mean at T

📚 S[:, -1] selects the LAST column — terminal prices across all paths. • : on the first axis means 'every path'. • -1 on the second axis means 'last time step'. .mean() averages those 4 numbers, .item() pulls the scalar out of the tensor. ⬆ prints: mean S(T): 114.4795. With only 4 paths this is far from the analytic 105.13 — convergence is O(1/√M), so production runs use M ~ 10⁵–10⁶.

EXECUTION STATE

sample mean S(T) (M=4) = 114.4795

analytic E[S(T)] = S0 e^{μT} = 105.1271

7 lines without explanation

1import torch
2
3torch.manual_seed(42)
4
5M, N = 4, 5                 # 4 paths, 5 steps each
6S0, mu, sigma = 100.0, 0.05, 0.20
7T  = 1.0
8dt = T / N                  # 0.2
9
10# Step 1: draw all M*N increments in one call.
11dW = torch.randn(M, N) * (dt ** 0.5)
12
13# Step 2: running sum along the time axis -> W has shape (M, N).
14W = torch.cumsum(dW, dim=1)
15
16# Step 3: time grid 0.2, 0.4, ..., 1.0
17t = torch.arange(1, N + 1).float() * dt
18
19# Step 4: GBM closed form  S_t = S0 * exp((mu - 0.5*sigma^2)*t + sigma*W_t)
20S = S0 * torch.exp((mu - 0.5 * sigma ** 2) * t + sigma * W)
21
22print(S.round(decimals=2))
23print("mean S(T):", S[:, -1].mean().item())

Why this matters in production. Replace

M=4

with

M = 100\,000

and move

dW

to GPU with

\texttt{.cuda()}

— the same six-line program prices a vanilla European option in under a second. We will use exactly this engine in Section 31.8 for Monte-Carlo pricing.

Why Quants Care

Brownian motion is not just a mathematical curiosity — it is the atomic noise model of quantitative finance, physics, and biology. A partial list:

Domain	Use	Famous result
Derivative pricing	Underlying stock log-prices follow σ W(t).	Black–Scholes formula (1973)
Interest-rate modelling	Short rate r(t) driven by dr = a(b - r) dt + σ dW.	Vasicek (1977), CIR (1985), Hull–White
Portfolio optimisation	Wealth dynamics dW_t = (r W_t + π(μ−r)) dt + π σ dW.	Merton's optimal consumption (1969)
Statistical physics	Pollen grain in fluid — Einstein's original 1905 model.	Diffusion equation ∂u/∂t = ½ ∂²u/∂x²
Machine learning	Score-based generative models reverse a Brownian forward process.	Stable diffusion's forward SDE
Algorithms	Stochastic gradient descent ≈ noisy gradient flow + Brownian.	SGD-as-SDE analysis (Mandt et al. 2017)

The Big Idea. Every random thing that evolves in continuous time and is driven by independent, mean-zero noise looks locally like a Brownian motion. That is why we spend a whole section on it. Master $W(t)$ and you have the building block for derivative pricing, interest-rate models, diffusion in physics, score-based generative AI, and a great deal of modern probability.

Summary

A stock's price wiggles infinitely often — its path is nowhere differentiable. Classical calculus cannot handle this; stochastic calculus can.
The rescaled random walk with step size $\sqrt{\Delta t}$ converges to Brownian motion $W(t)$ (Donsker's theorem). The $\sqrt{\Delta t}$ scaling is what keeps variance finite as the time step shrinks.
$W(t)$ is defined by four axioms: $W(0)=0$ , independent increments, Gaussian increments $\mathcal{N}(0, t-s)$ , and continuous paths.
The single most important consequence is the quadratic-variation identity $(dW)^2 = dt$ . It is the seed of every difference between Itô and Newton-Leibniz calculus.
Geometric Brownian Motion $S(t) = S_0 e^{(\mu - \sigma^2/2)t + \sigma W(t)}$ is the standard stock-price model in Black-Scholes. Its expected price grows exponentially at rate $\mu$ .
A few lines of NumPy or PyTorch are enough to simulate millions of paths — the foundation of every Monte-Carlo derivatives engine in Section 31.8.

Next we promote $(dW)^2 = dt$ into a working tool — Itô's lemma — and use it to write down the SDE for any function of $S(t)$ . That is the last missing ingredient before the Black-Scholes PDE falls out almost for free.

Learning Objectives

The Question Calculus Cannot Answer

From a Drunkard's Walk to a Stock Price

The Magic of the √dt Scaling

Brownian Motion: The Formal Definition

Playing With the Process

Strange Properties of Brownian Paths

Quadratic Variation: Why (dW)2=dt(dW)^2 = dt(dW)2=dt

From W(t) to Stock Prices: Geometric Brownian Motion

Worked Example: Hand-Computing a Path

Plain Python Simulation

PyTorch: Batched Path Simulation

Why Quants Care

Summary

Quadratic Variation: Why $(dW)^2 = dt$