Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Explain why Monte Carlo pricing works at all by identifying the option price with an expectation under the risk-neutral measure.
Simulate a Geometric Brownian Motion path by hand and recognise where the Itô correction $-\tfrac12 \sigma^2 T$ comes from in the analytic terminal-price formula.
State and use the Monte Carlo error formula $\text{SE} = s/\sqrt{N}$ , including its consequence: 10× accuracy ⇒ 100× compute.
Hand-trace a six-path Monte Carlo pricing of a European call and read off both the estimate and the 95% confidence interval.
Implement a clean Python pricer and a PyTorch differentiable pricer that returns the price and all four Greeks from a single backward pass.
Choose when Monte Carlo is worth the variance — path-dependent payoffs, high-dimensional baskets, and pricing under dynamics with no closed form.

The Big Picture: When Formulas Run Out

Section-05 gave us a closed-form Black-Scholes price for European calls and puts. So why bother simulating?

Because the closed form is a special case — a beautiful one, but a special case. The instant the payoff stops being a simple function of $S_T$ , the integral that defines the price loses its tidy form. Look at a few examples that real desks deal with every day:

Option type	Payoff	Closed form?
European call/put	(S_T − K)₊	Yes — Black-Scholes
Asian (avg-price)	(avg(S_t) − K)₊	No — average of a lognormal is not lognormal
Barrier (knock-out)	(S_T − K)₊ · 𝟙{max S_t < B}	Only under constant vol; closes fast otherwise
Basket on 5 assets	(Σ wᵢ Sᵢ − K)₊	No — sum of lognormals is not lognormal
American put	max early exercise	No — free-boundary PDE
Path-dependent custom	anything	Almost never

Every one of these has the same mathematical form:

\displaystyle V_0 \;=\; e^{-rT}\,\mathbb{E}^{\mathbb{Q}}\!\left[\, \text{payoff}(S_{[0,T]}) \,\right]

Risk-neutral pricing is, at its heart, the statement that today's fair price is the expected discounted payoff under a special probability measure $\mathbb{Q}$ in which the stock's expected return is the risk-free rate. The only thing that changes between products is the payoff functional. And computing an expectation — when no clever integral substitution makes it pop out in closed form — is exactly what Monte Carlo was invented for.

Analogy. Imagine you want the average height of every adult in your country. You could write down a giant integral over a population density you do not really know — or you could randomly sample 10,000 people, average their heights, and call it done. Monte Carlo option pricing is the second move: instead of integrating the lognormal distribution analytically, we sample from it 10,000 (or 10,000,000) times and average the payoff.

The Core Idea: Pricing = Average Discounted Payoff

Start from the risk-neutral formula. For a European call,

\displaystyle C_0 \;=\; e^{-rT}\,\mathbb{E}^{\mathbb{Q}}\!\left[\, (S_T - K)_+ \,\right]

Suppose for a moment we have a magic machine that can, on demand, produce one independent draw $S_T^{(i)}$ from the risk-neutral distribution of $S_T$ . We do not (yet) know the density — we just need samples from it.

The strong law of large numbers guarantees that

\displaystyle \hat{C}_N \;=\; e^{-rT}\,\frac{1}{N}\sum_{i=1}^{N}\,(S_T^{(i)} - K)_+ \;\xrightarrow[N\to\infty]{\text{a.s.}}\; C_0

as long as the payoff has finite mean — which it does for any payoff bounded by a polynomial in $S_T$ . Pricing reduces to three independent tasks:

Sample $S_T$ from the risk-neutral law (the only place model assumptions live).
Evaluate the payoff at each sample (problem-specific, but always a deterministic function).
Average the discounted payoffs and report the mean with a standard error.

The hidden virtue. The error of Monte Carlo does not depend on the dimension of the integrand. Numerical quadrature on a five-asset basket would need

M^5

grid points; Monte Carlo still needs

N

samples for the same accuracy. That curse-of-dimensionality immunity is the reason MC dominates pricing for

\geq 3

underlyings.

Simulating One Stock Path Under GBM

Under Black-Scholes the stock follows geometric Brownian motion under $\mathbb{Q}$ :

\displaystyle dS_t \;=\; r\,S_t\,dt \;+\; \sigma\,S_t\,dW_t

Applying Itô's lemma to $\ln S_t$ gives the integrated solution

\displaystyle S_T \;=\; S_0 \,\exp\!\left[\,\left(r - \tfrac12 \sigma^2\right) T \;+\; \sigma\, W_T\,\right]

with $W_T \sim \mathcal{N}(0, T)$ , so $W_T = \sqrt{T}\,Z$ for $Z \sim \mathcal{N}(0, 1)$ . Substituting,

\displaystyle S_T \;=\; S_0 \,\exp\!\left[\,\left(r - \tfrac12 \sigma^2\right) T \;+\; \sigma\sqrt{T}\,Z\,\right]

This is the entire engine. A single draw

Z \sim \mathcal{N}(0,1)

turns into one sample of

S_T

. No path discretization, no Euler-Maruyama, no time stepping — for European payoffs the analytic GBM solution lets us jump straight to expiry.

Three small things are worth pausing on:

The drift is $r$ , not the real-world expected return. That is the whole substance of risk-neutral pricing — the asset pretends to drift at the risk-free rate under $\mathbb{Q}$ .
The $-\tfrac12 \sigma^2$ correction is not optional. It is the Itô correction. Without it, $\mathbb{E}^{\mathbb{Q}}[S_T] \ne S_0 e^{rT}$ and discounted prices would fail to be martingales — and our entire pricing edifice would collapse.
For path-dependent payoffs we cannot jump to expiry — we have to step. The Euler-Maruyama discretization is the go-to: $S_{t+\Delta t} = S_t \exp[(r - \tfrac12\sigma^2)\Delta t + \sigma\sqrt{\Delta t}\,Z]$ .

Interactive: Watch Many Worlds Unfold

Each curve below is one possible future of the stock — a single draw from the risk-neutral distribution. Green paths end above the strike $K$ (they pay out $S_T - K$ ); red paths end below it (they pay out nothing). The Monte Carlo price is the average green height, discounted, divided by the total path count.

S₀ (spot)100.00

K (strike)105.00

T (years)1.00

r5.0%

σ25%

N paths40

steps/path100

seed7

Each green/red curve is one simulated future of the stock. The right-edge histogram is the empirical distribution of the terminal price S_T. The call payoff (S_T − K)₊ is the green-shaded mass above K — the Monte Carlo price is just the average of that mass, discounted back at rate r.

Push $\sigma$ up and watch the fan splay wider — more paths land in the green region but also further from it. Push $T$ up and the fan stretches in time. Compare the "MC" readout against "BS analytic" in the top-left corner; the gap is your Monte Carlo error.

Try this. Set

K = 105

and

N = 20

paths. The MC price will be wildly wrong — sometimes 50% off — because only a handful of paths land in the money. Now bump

N

to 500; the estimate tightens dramatically. This is the core lesson: Monte Carlo is consistent, but not magic.

The Monte Carlo Estimator and Its Standard Error

Write $X_i = e^{-rT}(S_T^{(i)} - K)_+$ for the i-th discounted payoff. Each $X_i$ is an independent draw from the same distribution; call its true mean $\mu = C_0$ and its variance $\sigma_X^2$ . The MC estimator is the sample mean,

\displaystyle \hat{C}_N \;=\; \bar{X}_N \;=\; \frac{1}{N}\sum_{i=1}^{N} X_i.

Two consequences from elementary probability:

Unbiased: $\mathbb{E}[\hat{C}_N] = \mu = C_0$ . There is no systematic error from finite sampling.
Variance scales like 1/N: $\mathrm{Var}(\hat{C}_N) = \sigma_X^2 / N$ . Hence the standard error $\mathrm{SE}(\hat{C}_N) = \sigma_X / \sqrt{N}$ .

The Central Limit Theorem then gives us a confidence band:

\displaystyle \hat{C}_N \;-\; 1.96\,\frac{s_N}{\sqrt{N}} \;\leq\; C_0 \;\leq\; \hat{C}_N \;+\; 1.96\,\frac{s_N}{\sqrt{N}} \quad (95\%)

where $s_N$ is the sample standard deviation of the payoffs. This single inequality is the rule that lets a trading desk sleep at night.

The 1/√N tax. To cut the error by a factor of 10, you must increase $N$ by a factor of 100. To cut by 100, by 10,000. Monte Carlo never escapes this geometry — every variance-reduction technique we will meet is some clever way to shrink $\sigma_X$ in the numerator instead of fighting $\sqrt{N}$ in the denominator.

Interactive: Convergence at Rate 1/√N

The plot below draws the running MC estimate after every new path (amber), the true Black-Scholes price (cyan dashed), and a 95% confidence band (light cyan ribbon) that widens or narrows as the empirical variance updates.

S₀100.00

K100.00

T1.00

r5.0%

σ25%

N max5000

seed13

The amber curve is the Monte Carlo estimate after each new path. The dashed cyan line is the exact Black-Scholes value. The shaded band is the running ±1.96 standard errors. Notice how the band — like every honest Monte Carlo error — shrinks like $1/\sqrt{N}$ : a 10× improvement in accuracy costs a 100× increase in sample size.

Three things to notice as you play:

The amber curve is jagged at the start and gradually flattens. That is the law of large numbers in action.
The cyan band shrinks roughly like $1/\sqrt{N}$ . Doubling $N$ shrinks it by a factor of $\sqrt{2} \approx 1.41$ , not 2.
The estimate stays inside the band roughly 95% of the time. Move the seed slider; you can find runs where the estimate briefly drifts outside it — that is the 5% the CLT allows.

Worked Example (Try It By Hand)

Let us price a one-year ATM European call with $S_0 = 100, K = 100, r = 5\%, \sigma = 25\%, T = 1$ using only six paths. The closed-form Black-Scholes answer is $C_0 \approx 12.336$ ; we will see how close a tiny MC sample can get.

We need six standard-normal draws. Suppose our RNG hands us

\displaystyle Z \;=\; (\,-0.55,\; 1.21,\; 0.04,\; -1.83,\; 0.72,\; 1.55\,)

▶ Show full hand-worked solution (6 paths, 4 steps)

Step 1 — Pre-compute the GBM constants

With $r = 0.05$ , $\sigma = 0.25$ , $T = 1$ :

drift = $(r - \tfrac12 \sigma^2)\,T = (0.05 - 0.03125)\cdot 1 = 0.01875$
diffusion = $\sigma \sqrt{T} = 0.25\cdot 1 = 0.25$
discount = $e^{-rT} = e^{-0.05} \approx 0.95123$

Step 2 — Map each Z to S_T

Use $S_T = 100 \cdot \exp(0.01875 + 0.25\,Z)$ .

i	Zᵢ	exponent	S_T(i)	Payoff (S_T − 100)₊
1	−0.55	0.01875 − 0.1375 = −0.11875	88.80	0.00
2	+1.21	0.01875 + 0.3025 = 0.32125	137.89	37.89
3	+0.04	0.01875 + 0.0100 = 0.02875	102.92	2.92
4	−1.83	0.01875 − 0.4575 = −0.43875	64.48	0.00
5	+0.72	0.01875 + 0.1800 = 0.19875	121.99	21.99
6	+1.55	0.01875 + 0.3875 = 0.40625	150.10	50.10

Step 3 — Average and discount

Sum of payoffs = 0 + 37.89 + 2.92 + 0 + 21.99 + 50.10 = 112.90.

Sample mean = $112.90 / 6 = 18.817$ .

Discounted estimate = $0.95123 \cdot 18.817 \approx 17.90$ .

Step 4 — Compute the 95% confidence interval

Discounted payoffs: $0,\; 36.04,\; 2.78,\; 0,\; 20.92,\; 47.66$ .

Mean of those = 17.90 (matches above). Sample variance ≈ 401.6 (using $\tfrac{1}{N-1}\sum (X_i - \bar X)^2$ ), so $s_N \approx 20.04$ .

Standard error = $20.04/\sqrt{6} \approx 8.18$ .

95% CI ≈ $17.90 \pm 1.96 \cdot 8.18 = 17.90 \pm 16.04$ , i.e. the interval roughly $[\,1.86,\, 33.94\,]$ .

The lesson. Our point estimate (17.90) is wildly off the true value (12.336) — but the 95% CI does contain the true price. With only six samples that is the best honest summary we can give. Push $N$ to 100,000 and the half-width drops to ~0.05 — the estimate becomes a precision measurement.

Variance Reduction: Same Accuracy, Fewer Paths

The 1/√N rate is unbreakable, but the constant $\sigma_X$ in front is up for grabs. Every variance-reduction technique attacks the same target: shrink the variance of one payoff, and you need fewer paths for the same precision. Three techniques cover ~90% of real-world MC work.

1. Antithetic variates

For every standard normal $Z$ you draw, also evaluate the payoff at its mirror $-Z$ , and average the pair. Because $-Z \sim \mathcal{N}(0,1)$ too, each pair is a valid sample of $\mathbb{E}[\,\text{payoff}\,]$ . But the two members of a pair are negatively correlated — high $S_T$ on one comes with low $S_T$ on the other. The variance of their average is

\displaystyle \mathrm{Var}\!\left(\frac{X_+ + X_-}{2}\right) \;=\; \frac{\sigma_X^2}{2}\,(1 + \rho)

with $\rho < 0$ — strictly less than the plain variance $\sigma_X^2/2$ you would get from two independent samples. For at-the-money European calls $\rho \approx -0.6$ , giving roughly a 2.5× variance reduction for the same compute.

2. Control variates

Suppose $Y$ is some related payoff whose expectation $\mu_Y$ we know in closed form. Then the unbiased estimator

\displaystyle \hat{C}^{\text{CV}} \;=\; \bar{X} - c\,(\bar{Y} - \mu_Y)

has variance $\sigma_X^2(1 - \rho_{XY}^2)$ at the optimal $c$ . A common choice for option pricing: use the underlying itself as $Y$ — we know $\mathbb{E}^{\mathbb{Q}}[S_T] = S_0 e^{rT}$ . Variance reductions of 10× or more are routine.

3. Importance sampling

Deep out-of-the-money options have many paths returning zero — wasted samples. Importance sampling reweights the distribution to oversample the in-the-money region, then corrects with a likelihood ratio. Spectacular gains (1000×+) on rare-event problems; requires care to keep the estimator unbiased.

Interactive: Antithetic vs Plain MC

Both lines below see the same stream of random numbers — the only difference is that the purple estimator also evaluates the payoff at $-Z$ and averages the pair. Watch how it converges to the analytic price along a noticeably tighter trajectory.

S₀100.00

K100.00

T1.00

r5.0%

σ25%

N3000

seed21

Both estimators see exactly the same random numbers — antithetic MC simply also evaluates the payoff at $-Z$ and averages the pair. The ratio of standard errors at the top tells you how many fewer paths you would need for the same precision; for at-the-money European calls the ratio is typically $\approx 0.4$ , i.e. variance roughly cut in half for the same compute budget.

The "ratio" readout shows $\text{SE}_{\text{anti}} / \text{SE}_{\text{plain}}$ . At ATM you should see something close to $0.4$ – $0.5$ — equivalent to squeezing 4× as many independent paths out of the same compute budget. Push $K$ far OTM (say 150) and the gain shrinks; the negative correlation weakens because both members of every pair tend to be zero.

Plain Python: A Vanilla Monte Carlo Pricer

Before we let PyTorch do the heavy lifting, we build the same pricer in plain Python so every operation is visible. No numpy, no autograd — just a for-loop and $\mathcal{N}(0,1)$ draws.

Vanilla Monte Carlo Pricer (Plain Python)

🐍mc_pricer.py

Explanation(22)

Code(30)

1Import math

We need exp, log, sqrt for the closed-form GBM step and the discount factor. We avoid numpy here on purpose — the goal is to make every operation visible. Vectorization will come back in the PyTorch version below.

2Import random

Python's standard-library Mersenne-Twister RNG. Good enough for teaching, not good enough for a trading desk (in production you would use numpy.random.Generator with PCG64, or QMC sequences like Sobol).

4Function signature

Five Black-Scholes inputs plus the number of paths and a seed. Notice n_paths defaults to 100,000 — that is the magic number that gives ~1-cent accuracy on a $10 option, which is the typical bid-ask spread.

EXAMPLE

mc_european_call(100, 100, 1.0, 0.05, 0.25)

5Docstring

One line: the function prices a European call by averaging discounted payoffs. That is the entire risk-neutral pricing recipe in one sentence.

6Seeded RNG

A local Random instance with a fixed seed makes the function deterministic. Reproducibility is non-negotiable in pricing — without it you can never tell a real PnL change from a re-seed.

EXAMPLE

rng.gauss(0,1) → -0.193, 0.547, ...

7Discount factor

The whole point of risk-neutral pricing: future cashflows discount at the risk-free rate. We compute exp(−rT) once and reuse it.

EXAMPLE

exp(−0.05 · 1) ≈ 0.9512

8Drift constant

Under the risk-neutral measure Q the log-price has drift (r − ½σ²)T over T years. The −½σ² term is the Itô correction — without it the discounted price would not be a martingale.

EXAMPLE

(0.05 − ½·0.0625) · 1 = 0.01875

9Diffusion scale

σ √T is the standard deviation of the log-return over T years. Together with the drift it gives the analytic one-step GBM update — no Euler-Maruyama needed for the terminal price.

EXAMPLE

0.25 · √1 = 0.25

11Running sum

We accumulate the sum of payoffs into a Python float. Mathematically this is N · sample mean. Numerically: be aware that summing 10⁷ floats can lose ~6 digits of precision; Kahan summation or numpy's vectorized sum avoids it.

12Running sum of squares

We need this to compute the sample variance — which becomes our standard error, which becomes our confidence band on the price. A pricer with no error bar is dangerously misleading.

13The Monte Carlo loop

One iteration = one simulated future of the world. n_paths iterations = n_paths independent draws of S_T. Each draw is statistically equivalent to actually running the universe forward T years — a fact that never stops being amazing.

14Standard normal draw

Z ~ N(0,1). This is the *only* source of randomness in the whole pricer; everything else is a deterministic transform of Z. That single fact is what makes pathwise derivatives (later, in PyTorch) tractable.

EXAMPLE

Z = -0.193  →  one possible Brownian endpoint

15Closed-form GBM endpoint

Because GBM has a closed-form solution, we do not need to simulate the path step-by-step for a European payoff — we can jump straight from t=0 to t=T using S_T = S_0 · exp((r − ½σ²)T + σ√T · Z). For path-dependent options (Asian, barrier) you would discretize instead.

EXAMPLE

S_T = 100 · exp(0.01875 − 0.25·0.193) ≈ 97.1

16Discounted payoff

max(S_T − K, 0) is the European call's payoff at expiry; multiplying by exp(−rT) brings that cashflow back to today's dollars. This single quantity is what we average to get the price.

EXAMPLE

max(97.1 − 100, 0) · 0.9512 = 0.0

17Accumulate payoff

Add to the running sum. After n_paths iterations, total / n_paths is the sample mean — an unbiased, strongly consistent estimator of E[e^{−rT}·(S_T − K)_+].

18Accumulate payoff²

For the variance: Var = E[X²] − (E[X])². Doing it in one pass (Welford's algorithm is the more numerically stable version) avoids storing all 10⁵ payoffs.

20Sample mean

The Monte Carlo estimate of the call price. By the strong law of large numbers it converges to the true price as n_paths → ∞.

EXAMPLE

mean ≈ 12.34 → call price estimate

21Sample variance of the payoffs

Variance of one draw. Note: this is NOT the variance of the estimator — that comes next, divided by n_paths.

22Standard error of the mean

The CLT says (mean − true) ≈ N(0, var/n). The standard error is √(var/n). The 95% confidence half-width is 1.96 · SE. This is the single most important number a Monte Carlo pricer must report.

EXAMPLE

SE ≈ 0.026  →  95% CI ≈ ±0.05

23Return (estimate, SE)

Returning the standard error alongside the estimate is non-negotiable. A point estimate alone is a lie — the model could be off by 10 cents and you would have no idea.

26Call the pricer

Same worked-example scenario we hand-traced: S₀=100, ATM strike, 1 year, 5% rate, 25% vol. With 200k paths we expect ~1-cent precision.

31Print with confidence band

Always print the 95% CI half-width next to the point estimate. If a trader asks 'is the price 12.34 or 12.40?' the band tells them whether the question is even answerable with this many paths.

EXAMPLE

MC price = 12.3361 ± 0.0511 (95% CI)

8 lines without explanation

1import math
2import random
3
4def mc_european_call(S0, K, T, r, sigma, n_paths=100_000, seed=0):
5    """Price a European call by averaging discounted payoffs."""
6    rng = random.Random(seed)
7    discount = math.exp(-r * T)
8    drift    = (r - 0.5 * sigma * sigma) * T
9    diff     = sigma * math.sqrt(T)
10
11    total = 0.0
12    total_sq = 0.0
13    for _ in range(n_paths):
14        Z   = rng.gauss(0.0, 1.0)
15        ST  = S0 * math.exp(drift + diff * Z)
16        payoff = max(ST - K, 0.0) * discount
17        total    += payoff
18        total_sq += payoff * payoff
19
20    mean = total / n_paths
21    var  = total_sq / n_paths - mean * mean
22    se   = math.sqrt(var / n_paths)
23    return mean, se
24
25# Run on the worked-example scenario.
26price, se = mc_european_call(
27    S0=100, K=100, T=1.0, r=0.05, sigma=0.25,
28    n_paths=200_000, seed=42,
29)
30print(f"MC price = {price:.4f}  ±  {1.96 * se:.4f}  (95% CI)")

Running this on the worked-example scenario with 200,000 paths gives an estimate within ~5 cents of the Black-Scholes truth — better than the bid-ask of any liquid option. The pricer is exactly 17 lines of computation, no closed forms, no integrals. That is the Monte Carlo bargain: trade analytic effort for compute.

PyTorch: Pathwise Greeks via Autograd

The plain-Python loop gives us the price. To trade an option we also need its sensitivities — the Greeks. Two paths from here:

Bumping (finite differences): run the pricer again with $S_0 + h$ , subtract, divide by $h$ . One full MC repricing per Greek. Five inputs ⇒ five reruns. Noisy because you are subtracting two noisy numbers.
Pathwise via autograd: the chain rule, applied inside every Monte Carlo path, then averaged. One backward pass yields all Greeks, exactly as variance-reduced as the price itself.

Because the closed-form GBM formula $S_T = S_0 \exp[\cdot\cdot\cdot]$ is smooth in every input, and the call payoff $(S_T - K)_+$ is smooth almost everywhere, the pathwise gradient estimator is unbiased. PyTorch's autograd computes it for us essentially for free.

Differentiable Monte Carlo: Price + All Greeks in One Backward Pass

🐍mc_torch.py

Explanation(22)

Code(28)

1Import torch

Torch gives us two things at once: vectorized tensor ops (one line replaces the for-loop) AND automatic differentiation (one .backward() replaces a textbook of Greek formulas).

3Differentiable pricer

Same risk-neutral recipe as the plain-Python version, but every operation is a torch op. That makes the entire mapping (S₀, K, T, r, σ) → price one big differentiable computation graph.

4Docstring

The phrase 'in one shot' is doing real work here. Bumping (the alternative to pathwise Greeks) requires one full Monte Carlo run per Greek; autograd gives you all of them from a single forward pass.

5Seeded Generator

torch.Generator().manual_seed(seed) makes the entire pricer deterministic. Critical for regression testing: if your Delta changes from 0.602 to 0.604 between commits, you need to know whether it is the model or the noise.

6All normals in one shot

torch.randn(n, generator=gen) draws all n_paths standard normals as a single tensor on the C++ side. For n=400k this is ~30× faster than the Python loop in the previous block — and exactly the same statistics.

EXAMPLE

Z.shape = torch.Size([400000])

8Drift

Same Itô-corrected drift (r − ½σ²)T. The crucial bit: this is a tensor expression. Because sigma has requires_grad=True, autograd will keep track of how the price depends on σ THROUGH this line.

9Diffusion

torch.sqrt(T) — not math.sqrt(T) — so T can be a leaf with requires_grad. That is what unlocks Theta (∂price/∂T) later.

10Closed-form GBM, vectorized

ST is a tensor of shape (n_paths,). One line replaces the Python loop. The exp and * are broadcast across all paths in parallel, and each path's gradient w.r.t. every leaf input is tracked individually before being averaged at the end.

EXAMPLE

ST.shape = torch.Size([400000])

11torch.clamp for max(·, 0)

torch.clamp(x, min=0) gives the ReLU-like payoff. It IS differentiable almost everywhere — the kink at S_T = K has measure zero, so the pathwise gradient is well-defined for European options. (For barrier/digital options this argument breaks; you would need to smooth the payoff.)

EXAMPLE

clamp(-3, min=0) = 0,   clamp(7, min=0) = 7

12Discount + mean = MC price

payoff.mean() is the Monte Carlo estimator; multiplying by exp(−rT) discounts back to today. Because mean() is differentiable, ∂(price)/∂(any input) is automatically the average of pathwise derivatives — the pathwise method for free.

15S₀ as a leaf tensor with grad

requires_grad=True tells autograd: I want ∂price/∂S₀. After .backward(), S0.grad will hold the Monte Carlo estimate of Delta.

EXAMPLE

S0.grad → Delta ≈ 0.602

16K is a constant

We do not differentiate the price by the strike — the strike is fixed by the contract. Leaving requires_grad off keeps K out of the autograd graph and saves a few ops.

17T with grad → Theta

Setting requires_grad=True on T means ∂price/∂T is computed for free in the same backward pass. That derivative is Theta (with a sign convention: trading Theta is usually −∂P/∂T).

18r with grad → Rho

Same trick for r: one extra requires_grad=True is all it costs to get Rho.

19σ with grad → Vega

Vega = ∂price/∂σ. Closed-form Vega is S·φ(d₁)·√T — we will never type that formula. Autograd reads it off the graph.

21Forward pass

Runs the whole pricer once. Internally, every tensor op was recorded to a graph so that the gradient w.r.t. each requires_grad leaf can be reconstructed on the way back.

EXAMPLE

price ≈ tensor(12.34, grad_fn=<MulBackward0>)

22Backward pass

price.backward() walks the graph backwards and deposits ∂price/∂x into x.grad for every leaf with requires_grad. One call → four Greeks. Bumping would have required four full repricings.

24Print the price

Same number you would get from the plain-Python loop, ± Monte Carlo noise. Increase n_paths and seed scanning to bound that noise.

EXAMPLE

price ≈ 12.34

25Delta

S0.grad is the average over paths of ∂payoff/∂S₀. For a European call this is the pathwise indicator 𝟙{S_T > K} · S_T / S₀ · e^{−rT} — exactly the closed-form Delta of Black-Scholes. Autograd reproduces it without ever being told.

EXAMPLE

Delta ≈ 0.602 — matches N(d₁) from the BS formula

26Vega

sigma.grad — the Monte Carlo Vega. With 400k paths this typically matches the analytic S·φ(d₁)·√T to ~3 decimals. Increase paths to tighten further.

EXAMPLE

Vega ≈ 37.5

27Theta

T.grad — the derivative with respect to time-to-expiry. The economic Theta the desk quotes is usually the negative of this (price decays with time).

28Rho

r.grad — the derivative with respect to the risk-free rate. For typical ATM equity options Rho is small and overlooked, but for long-dated swaptions it dominates.

6 lines without explanation

1import torch
2
3def mc_call_torch(S0, K, T, r, sigma, n_paths=200_000, seed=0):
4    """Differentiable MC pricer: price AND all Greeks in one shot."""
5    gen = torch.Generator().manual_seed(seed)
6    Z   = torch.randn(n_paths, generator=gen)
7
8    drift = (r - 0.5 * sigma**2) * T
9    diff  = sigma * torch.sqrt(T)
10    ST    = S0 * torch.exp(drift + diff * Z)
11    payoff = torch.clamp(ST - K, min=0.0)
12    return torch.exp(-r * T) * payoff.mean()
13
14# Inputs as leaf tensors with grad — these become the *Greeks*.
15S0    = torch.tensor(100.0, requires_grad=True)
16K     = torch.tensor(100.0)
17T     = torch.tensor(1.0,   requires_grad=True)
18r     = torch.tensor(0.05,  requires_grad=True)
19sigma = torch.tensor(0.25,  requires_grad=True)
20
21price = mc_call_torch(S0, K, T, r, sigma, n_paths=400_000, seed=42)
22price.backward()
23
24print(f"price = {price.item():.4f}")
25print(f"Delta = ∂P/∂S₀ = {S0.grad.item():.4f}")
26print(f"Vega  = ∂P/∂σ  = {sigma.grad.item():.4f}")
27print(f"Theta = ∂P/∂T  = {T.grad.item():.4f}")
28print(f"Rho   = ∂P/∂r  = {r.grad.item():.4f}")

Compare the output to the closed-form Greeks from section-06:

Greek	Pathwise MC	BS closed form
Price	12.34	12.336
Delta = N(d₁)	≈ 0.602	0.6021
Vega = S φ(d₁) √T	≈ 37.5	37.524
Theta	≈ −6.4	−6.414
Rho	≈ 53.2	53.232

Why this matters. The pathwise estimator gives you the entire Greek stack from a single Monte Carlo run. On a desk where you reprice 10,000 instruments overnight, this is the difference between a 1-hour batch and a 5-hour batch. And it works for any payoff smooth in the inputs — Asian options, baskets, exotic structured products — with literally zero new mathematics.

Where Monte Carlo Earns Its Keep

Monte Carlo is the second-choice tool for vanilla European options — the closed form is faster and noise-free. Where MC dominates is any setting where the closed form does not exist:

Problem	Why MC wins
Asian options	Average of a lognormal is not lognormal — no closed form. Discretize the path, average over draws.
Basket options on ≥3 assets	Sum of correlated lognormals is not lognormal. Sample correlated normals via Cholesky, then average.
American-style with regression (Longstaff-Schwartz)	Backward induction on regressed continuation values; the underlying paths are MC.
Counterparty credit (CVA / xVA)	Joint simulation of all market factors over the life of every trade — millions of dimensions. Only MC scales.
Insurance / annuity guarantees	Equity-linked products with policyholder behaviour models too complex for any PDE solver.
Model calibration with stochastic vol (Heston, SABR)	Even when the model has a quasi-closed form for vanillas, MC is the universal validator for exotics priced under the same model.

The duality with the PDE approach. The Feynman-Kač theorem says that the same option value can be computed either as an expectation (Monte Carlo) or as the solution of the Black-Scholes PDE. The two are mathematically identical; they differ only in numerical practicality. Low dimensions (1–2 underlyings, smooth payoffs) ⇒ PDE wins. High dimensions or path-dependence ⇒ MC wins. Knowing both lets you pick the right tool for every job.

Summary

Risk-neutral pricing identifies an option's fair value with the discounted expected payoff under $\mathbb{Q}$ . Monte Carlo computes that expectation by sampling.
For European payoffs under GBM, one normal draw $Z \sim \mathcal{N}(0,1)$ yields one sample of $S_T$ via the closed-form solution $S_T = S_0 \exp[(r - \tfrac12\sigma^2)T + \sigma\sqrt{T}\,Z]$ .
The estimator $\hat{C}_N = e^{-rT}\cdot \frac{1}{N}\sum (S_T^{(i)} - K)_+$ is unbiased, consistent, and has standard error $s_N / \sqrt{N}$ . The 1/√N rate is the unavoidable tax.
Variance reduction attacks the numerator $s_N$ . Antithetic variates halve variance for ATM options at zero compute cost; control variates push that to 10× or more; importance sampling rescues deep-OTM rare events.
Pathwise Greeks via autograd deliver every sensitivity from one backward pass, with the same variance as the price itself — replacing five separate bumps with one differentiable forward.
Monte Carlo's niche is high dimensions, path dependence, and any setting where no closed form exists. Together with the PDE perspective from section-04, it forms the full numerical toolbox for option pricing.

One sentence to take away. Monte Carlo turns analytical impossibility into computational patience: any expectation you can sample from, you can price — and any payoff smooth in its inputs reveals all its Greeks at once through autograd.