Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Explain the structural gap between European and American options: a single deterministic exercise time vs. an optimal stopping decision at every instant up to maturity.
Argue from first principles when early exercise of a put or call is rational, and why Merton's theorem makes early exercise of an American call on a non-dividend-paying stock irrelevant.
Set up the free-boundary problem and recognise the variational inequality as the natural calculus statement of optimal exercise.
Build the Cox-Ross-Rubinstein binomial tree by hand and price a 3-step American put backwards through it.
Implement the binomial pricer in plain Python and then in PyTorch, getting all the Greeks via one .backward() call.
Identify the four canonical exotic payoffs (Asian, barrier, lookback, digital) and read them off a simulated GBM path.
Use Monte-Carlo + autograd as a universal pricer for payoffs that have no closed-form Black-Scholes solution.

The Big Picture: Where Black-Scholes Stops

Every option we have priced so far has been a vanilla European call or put: a payoff $\max(S_T - K, 0)$ or $\max(K - S_T, 0)$ paid out at one fixed time $T$ . For that single object, Black-Scholes is a beautiful endgame: a one-line PDE, a one-line formula, and Greeks in closed form.

Reality is messier. On any trading screen you will see two adjacent prices that the textbook formula cannot reach:

American options — exercise allowed at any instant $t \in [0, T]$ , not just at maturity. Now pricing is no longer an evaluation — it is an optimal stopping problem.
Exotic options — payoffs that depend on the entire path of the underlying, not just on $S_T$ : average prices (Asian), barriers (knock-in / knock-out), running extrema (lookback), all-or-nothing payouts (digitals), and combinations of these.

Both extensions break the closed-form magic of Black-Scholes. But the underlying calculus — Itô, risk-neutral expectation, the heat equation in disguise — is unchanged. What changes is the boundary condition: the PDE now lives on a region whose shape we have to solve for, or it lives on a path-functional that only Monte Carlo can integrate.

One-sentence preview. American pricing replaces an equality (PDE = 0) with an inequality (

\max(\mathcal{L}V, \, V - g) = 0

); exotic pricing replaces a closed-form expectation with a simulated one. Both extensions are best implemented today as a small tensor program with autograd in the loop.

American Options: The Right to Wait — or Not

A European call says: "pay me $\max(S_T - K, 0)$ at time $T$ ." Period. You have one chance.

An American call says: "at any moment $\tau \in [0, T]$ you choose, I will pay you $\max(S_\tau - K, 0)$ and the contract is over." You hold an entire continuum of exercise rights and you must pick which one to use.

Mathematically, the price is the supremum over all stopping times $\tau$ (rules that decide when to exercise using only the information available so far):

\displaystyle V_0 \;=\; \sup_{\tau \,\le\, T} \; \mathbb{E}^{\mathbb{Q}}\!\left[\, e^{-r\tau}\, g(S_\tau) \,\right].

Here $g$ is the intrinsic-value function ( $g(S) = \max(S-K, 0)$ for a call, $g(S) = \max(K-S, 0)$ for a put). The supremum is taken over all admissible stopping times — that single change is the entire structural difference from Black-Scholes.

Why this is hard. A European option has one decision baked in: pay at time

T

. An American option has infinitely many decision points, each contingent on the path so far. There is no closed-form analogue of Black-Scholes for the American put — finding the optimal

\tau

is itself the hardest part of the problem.

Intuition: When Does Early Exercise Pay?

Holding an option is holding a right. The question is whether, at this exact moment, the right is worth more in your hands (continuation value) or in your bank account (intrinsic value $g(S_t)$ ). The optimal rule is

\displaystyle \text{exercise at } t \quad \Longleftrightarrow \quad V_t \;=\; g(S_t).

Otherwise $V_t > g(S_t)$ and you hold on. Three forces drive the trade-off:

Force	Wants you to hold (continuation)	Wants you to exercise (intrinsic)
Volatility σ	Convexity ⇒ more time = more option value	—
Time-value of money on the strike (puts)	—	Receive K now ⇒ earn interest on it
Dividends on the underlying (calls)	—	Skip the discounting on a dividend you would have foregone
Carry on the underlying (puts)	Continuation lets carry build up	—

For an American put with a positive interest rate $r$ , exercise can be rational: cash $K$ received today earns risk-free interest you would have to discount away later. Deep enough in the money, $K - S$ already so big and so unlikely to grow much more, it is optimal to take the money.

For an American call on a non-dividend-paying stock, early exercise is never optimal. The reason — Merton, 1973 — is two-line clean: any time you would exercise, you would receive $S - K$ ; but you could instead short the stock, lock in $S$ , and invest $K$ at the risk-free rate. At maturity you cover, pay $K$ only if the call ends in-the-money, and pocket the carry on $K$ as a bonus. So holding strictly dominates exercising.

Practical rule of thumb. American calls on non-dividend stocks = European calls. American puts ≠ European puts — and the "early-exercise premium" $V^{Am} - V^{Eu}$ grows with $r$ and with how deep the put is in the money.

The following plot makes the geometry visible. Slide $r$ up and watch the American-put curve (orange) peel off the European put (blue) and start hugging the intrinsic line (grey) — and watch the red early-exercise frontier $S^*$ march to the right.

Strike K100

Maturity T1.00

Rate r8.0%

Vol σ30%

Tree steps N40

The American put (orange) tracks the European put (blue) when the option is out-of-the-money, then bends up to hug the intrinsic line (grey dashed) once it dives deep enough in-the-money. The crossing point $S^*$ — the red dashed vertical — is the early-exercise frontier: below it, holding wastes time-value of money on the receivable strike. Crank the rate up and watch $S^*$ climb.

The Free-Boundary Problem

In the Black-Scholes derivation (section-04), the option price $V(S, t)$ satisfied the PDE

\displaystyle \mathcal{L}V \;\equiv\; \frac{\partial V}{\partial t} \,+\, \tfrac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} \,+\, r S \frac{\partial V}{\partial S} \,-\, r V \;=\; 0.

For an American option, this PDE only holds in the continuation region $\mathcal{C} = \{(S, t) : V(S, t) > g(S)\}$ . In the exercise region $\mathcal{E} = \{(S, t) : V(S, t) = g(S)\}$ the price tracks the intrinsic value $g(S)$ , which has its own (different) PDE operator. The boundary between the two — the curve $S^*(t)$ — is not given; it is part of the solution. That is exactly what makes this a free-boundary problem.

The combined statement is the variational inequality:

\displaystyle \max\!\Big(\, \mathcal{L}V, \;\; g(S) - V \,\Big) \;=\; 0, \qquad V(S, T) = g(S).

Read this as two simultaneous statements:

In the continuation region $V > g$ , so the second argument is negative, which forces the first argument to be zero: $\mathcal{L}V = 0$ . That is BS.
In the exercise region $V = g$ , so the second argument is zero, which is the binding maximum. The PDE is allowed to fail there — and it does, because the holder has already taken the cash.

Why the max() formulation is so beautiful. A single equation captures both: BS-where-you-hold, intrinsic-where-you-exercise. And it is exactly what the discrete binomial pricer evaluates at every node: max(continuation, exercise). The one line of code in the loop below is the discrete variational inequality.

Closed-form solutions for this problem do not exist in general. So we switch to a discrete approximation that converges to it: the binomial tree.

Binomial Trees: Discrete Calculus to the Rescue

Chop time $[0, T]$ into $N$ equal steps of length $\Delta t = T/N$ . At each step, allow the stock to do exactly one of two things:

go up by a factor $u = e^{\sigma\sqrt{\Delta t}}$ , or
go down by a factor $d = 1/u = e^{-\sigma\sqrt{\Delta t}}$ .

The reason for that exact $u, d$ choice — Cox, Ross, Rubinstein 1979 — is that the resulting two-state distribution matches the mean and variance of the continuous-time GBM over the same $\Delta t$ . As $\Delta t \to 0$ , the binomial price converges to the Black-Scholes price.

To make the binomial market arbitrage-free, the "up" probability under the risk-neutral measure is forced to be

\displaystyle p \;=\; \frac{e^{r\Delta t} - d}{u - d}.

This is the discrete analogue of Girsanov's theorem: it twists the natural up-probability into one under which the stock grows at the risk-free rate.

Given $u, d, p$ , the algorithm is now four lines of pseudo-code:

Fill the terminal column with intrinsic payoffs $g(S_{N,j})$ .
For each earlier slice, compute the continuation value $\text{cont} = e^{-r\Delta t}(p\,V_{up} + (1-p)V_{down})$ .
Compute the intrinsic value $\text{exer} = g(S_{i,j})$ .
Set the node value to $V_{i,j} = \max(\text{cont}, \text{exer})$ .

That last $\max$ is the discrete variational inequality. Everything else is identical to a European tree.

Interactive: The American-Put Binomial Tree

Play with the controls below. Each node shows the stock price $S_{i,j}$ , the American option value $V_{i,j}$ , and (if toggled on) the European value $E_{i,j}$ . Red nodes are the ones where early exercise dominates — the "exercise region" in miniature.

show European value (E)

Spot S₀100

Strike K100

Maturity T (yrs)1.00

Rate r5.0%

Vol σ30%

Tree steps N5

Each node shows the underlying stock price S, the American option value V, and (optionally) the European value E. Red nodes are where exercising right now is better than holding. Cranking rates up for a put, or moving deep into the money, lights up more red.

Things to try:

Start with the put. Push $r$ from $5\%$ to $15\%$ : more red nodes appear on the lower half. Higher rates make "take the strike now and bank it" more attractive.
Switch to the call. On a non-dividend stock there will be no red nodes anywhere — Merton's theorem in pixels.
Increase $N$ . The European and American values get smoothly closer (more accurate) but their difference — the early-exercise premium — converges to a finite limit. That limit is the analytic premium.
Move $S_0$ below $K$ . The root's European value $E_0$ and American value $V_0$ both rise, but $V_0$ rises faster. Deep ITM puts are where early exercise matters most.

Worked Example (Try It By Hand)

Price an American put with $S_0 = 100$ , $K = 100$ , $T = 1$ year, $r = 0.10$ (deliberately high so early exercise really bites), $\sigma = 0.30$ , on a $3$ -step tree ( $N = 3$ ).

Click to expand the by-hand 3-step backward induction

Step 0 — tree parameters. $\Delta t = 1/3 \approx 0.3333$ .

$\;\; u = e^{0.30 \sqrt{0.3333}} = e^{0.1732} \approx 1.1891.$

$\;\; d = 1/u \approx 0.8409.$

$\;\; p = (e^{0.10 \cdot 0.3333} - d)/(u - d) = (1.0339 - 0.8409)/(1.1891 - 0.8409) \approx 0.5538.$

$\;\; \text{disc} = e^{-0.10 \cdot 0.3333} \approx 0.9672.$

Stock tree. Each node $(i, j)$ has $S = 100 \cdot u^j d^{i-j}$ .

$\;\; S_{0,0} = 100$ .

$\;\; S_{1,0} = 84.09, \quad S_{1,1} = 118.91.$

$\;\; S_{2,0} = 70.72, \quad S_{2,1} = 100.00, \quad S_{2,2} = 141.40.$

$\;\; S_{3,0} = 59.46, \quad S_{3,1} = 84.09, \quad S_{3,2} = 118.91, \quad S_{3,3} = 168.18.$

Step 1 — terminal payoffs (i = 3). $g(S) = \max(100 - S, 0)$ :

$\;\; V_{3,0} = 40.54, \quad V_{3,1} = 15.91, \quad V_{3,2} = 0, \quad V_{3,3} = 0.$

Step 2 — back to i = 2. At each node compare $\text{cont} = \text{disc}\,(p V_{up} + (1-p) V_{down})$ with the intrinsic value.

$\;\; \text{cont}_{2,0} = 0.9672 \cdot (0.5538 \cdot 15.91 + 0.4462 \cdot 40.54) \approx 0.9672 \cdot (8.811 + 18.089) = 26.018.$

$\;\; \text{exer}_{2,0} = \max(100 - 70.72, 0) = 29.28.$ exercise > continuation → $V_{2,0} = 29.28$ (early-exercise node).

$\;\; \text{cont}_{2,1} = 0.9672 \cdot (0.5538 \cdot 0 + 0.4462 \cdot 15.91) = 0.9672 \cdot 7.099 \approx 6.866.$

$\;\; \text{exer}_{2,1} = \max(100 - 100, 0) = 0.$ $V_{2,1} = \max(6.866, 0) = 6.866$ (hold).

$\;\; \text{cont}_{2,2} = 0.9672 \cdot (0 + 0) = 0. \quad V_{2,2} = 0.$

Step 3 — back to i = 1.

$\;\; \text{cont}_{1,0} = 0.9672 \cdot (0.5538 \cdot 6.866 + 0.4462 \cdot 29.28) \approx 0.9672 \cdot (3.802 + 13.064) = 16.314.$

$\;\; \text{exer}_{1,0} = \max(100 - 84.09, 0) = 15.91. \quad V_{1,0} = \max(16.314, 15.91) = 16.314$ (hold — barely).

$\;\; \text{cont}_{1,1} = 0.9672 \cdot (0.5538 \cdot 0 + 0.4462 \cdot 6.866) \approx 0.9672 \cdot 3.064 = 2.964.$

$\;\; \text{exer}_{1,1} = 0. \quad V_{1,1} = 2.964.$

Step 4 — root i = 0.

$\;\; \text{cont}_{0,0} = 0.9672 \cdot (0.5538 \cdot 2.964 + 0.4462 \cdot 16.314) \approx 0.9672 \cdot (1.642 + 7.281) = 8.630.$

$\;\; \text{exer}_{0,0} = 0. \quad V_{0,0} = 8.63.$

Answer: $V_0^{Am} \approx 8.63$ . The corresponding European put on the same tree comes out to $E_0 \approx 7.95$ . The early-exercise premium is roughly $0.68$ — about $8.5\%$ of the option's value, paid in exchange for the right to exercise early. Increasing $N$ to a few hundred gets you 3-decimal agreement with the converged price.

What just happened geometrically: the single "early exercise" node was $(2, 0)$ , the deep-in-the-money node where the put's strike is so much larger than the stock that holding for one more random walk is simply not worth giving up the immediate $29.28$ . That one node pulled $V_0$ up by $0.68$ .

Plain Python: Binomial American Pricer

Here is the same algorithm written in straight Python. The whole American/European distinction lives on a single line — the max(cont, exer) deep inside the inner loop.

Binomial American option pricer (CRR)

🐍binomial_american.py

Explanation(20)

Code(36)

1Just the math module

We use only exp, sqrt, and pow — all in the standard library. No NumPy required. Keeping the example dependency-free makes it easy to drop into any notebook.

3Function signature

Six market inputs plus a boolean is_call. Defaulting to False makes the function read as 'American put pricer' for the worked example, which is the harder/more interesting case (calls on a non-dividend-paying stock almost never benefit from early exercise — Merton's theorem).

8Time step Δt

We chop the life of the option into N equal slices. Smaller Δt → finer tree → more accurate price, but the cost grows as N². N=500 already gives 4-decimal accuracy on a vanilla put.

EXAMPLE

T = 1.0, N = 500 → dt = 0.002

9Up factor u = exp(σ√Δt)

The CRR (Cox-Ross-Rubinstein) choice. Each step the stock either multiplies by u or by 1/u, so two ups followed by a down ends up at S·u — the lattice recombines. This is the only feature that keeps the tree O(N²) instead of O(2^N).

EXAMPLE

σ=0.30, dt=0.002 → u ≈ 1.01345

10Down factor d = 1/u

Forcing d·u = 1 is what makes the tree recombine. Other choices (e.g. Jarrow-Rudd) shift the mean differently but the algorithm is identical.

EXAMPLE

u ≈ 1.01345 → d ≈ 0.98672

11Risk-neutral probability p

p is chosen so the expected one-step return on the stock equals the risk-free rate: E[S_{t+Δt}] = S_t·e^{rΔt}. Solving gives p = (e^{rΔt} - d)/(u - d). This is the discrete analogue of the risk-neutral measure ℚ from the BS derivation.

EXAMPLE

r=0.05 → p ≈ 0.5039

12One-step discount factor

Each step backward in time, we apply e^{-rΔt} to convert tomorrow's expected value into today's price. Computing it once outside the loop saves N²/2 exp calls.

EXAMPLE

disc ≈ 0.99990

15Terminal payoff list

At i = N (maturity) there is no decision left — every leaf simply pays the intrinsic value. We fill an array of length N+1, one entry per terminal node (0 down-moves … N up-moves).

17List comprehension over leaf indices j

j counts the number of up-moves on the way from the root to this leaf. The stock at that leaf is S0 · u^j · d^(N-j). The conditional picks call vs put intrinsic; max(...) enforces non-negativity (you don't have to exercise).

EXAMPLE

j=N=500, S0=100, u≈1.01345 → S_leaf ≈ 854; put payoff = max(0, 100-854) = 0

23Backward induction over time slices

i runs from N-1 down to 0. At each i we compute the value at every node in column i using the values at column i+1 (which we already know). When i hits 0 the array has length 1 — that single value is the price.

24Fresh list per slice

We build the new column into new_values, then replace values at the end of the inner loop. Building in place would corrupt entries we haven't read yet.

25Inner loop over j at this slice

Column i has i+1 nodes. The inner loop visits each — total work across all slices is 1 + 2 + … + (N+1) = O(N²).

26Stock price at node (i, j)

Same formula as the leaves, but with i in place of N. j up-moves and (i−j) down-moves bring us from S0 to S0·u^j·d^(i-j).

EXAMPLE

i=2, j=1, S0=100, u≈1.01345 → S_21 ≈ 100·1.01345·0.98672 ≈ 99.99

27Continuation value

If we hold for one more step, the value is the discounted risk-neutral average of the two children: cont = e^{-rΔt} · (p·V_up + (1-p)·V_down). This is exactly the discrete version of the Feynman-Kac expectation that gave us BS.

EXAMPLE

p=0.504, V_up=2.0, V_down=5.0, disc=0.9999 → cont = 0.9999·(0.504·2.0 + 0.496·5.0) ≈ 3.488

28Intrinsic value at (i, j)

The cash you get for exercising right now. For a put: max(K - S, 0). The same line handles calls when is_call is True.

EXAMPLE

K=100, S_ij=99.99 → exer = max(100 - 99.99, 0) ≈ 0.01

29The early-exercise check — the whole point

max(cont, exer) picks the better of 'hold' vs 'cash in now'. For a European pricer, drop this line and just take cont. The single change of one max() turns a European pricer into an American one — this is the discrete form of the variational inequality from the section above.

32Roll forward by replacing the column

After visiting every j we swap in new_values. The outer loop now treats column i as 'today's column' and walks one step further back.

34Return the root value

values now has a single element — the option value at t = 0, the only node where i = 0. That is the price.

37Run the worked example

ATM put, σ=30%, r=5%, T=1y, N=500. The interest rate is positive so the put benefits from early exercise; the answer should beat the European BS put by a small but measurable premium.

40Pretty-print

Expect roughly 9.94 for the American put vs 9.35 for the European BS put — about 0.6 of early-exercise premium. Crank r up to 10% and the premium widens substantially.

16 lines without explanation

1import math
2
3def binomial_american(S0, K, T, r, sigma, N, is_call=False):
4    """
5    Cox-Ross-Rubinstein binomial pricer with early-exercise check.
6    Returns the option value at t=0.
7    """
8    dt   = T / N
9    u    = math.exp(sigma * math.sqrt(dt))
10    d    = 1.0 / u
11    p    = (math.exp(r * dt) - d) / (u - d)
12    disc = math.exp(-r * dt)
13
14    # 1. Terminal payoff at the leaves (i = N).
15    values = [
16        max(S0 * u**j * d**(N - j) - K, 0.0) if is_call
17        else max(K - S0 * u**j * d**(N - j), 0.0)
18        for j in range(N + 1)
19    ]
20
21    # 2. Roll backward through the tree.
22    for i in range(N - 1, -1, -1):
23        new_values = []
24        for j in range(i + 1):
25            S_ij = S0 * u**j * d**(i - j)
26            cont = disc * (p * values[j + 1] + (1 - p) * values[j])
27            exer = max(S_ij - K, 0.0) if is_call else max(K - S_ij, 0.0)
28            new_values.append(max(cont, exer))   # ← early-exercise check
29        values = new_values
30
31    return values[0]
32
33# Worked-example inputs (matches the by-hand walkthrough).
34price = binomial_american(S0=100, K=100, T=1.0, r=0.05,
35                          sigma=0.30, N=500, is_call=False)
36print(f"American put price ≈ {price:.4f}")

Running this prints:

American put price ≈ 9.9444

Compare against the European put under the same inputs: roughly $9.35$ . The $\approx 0.6$ gap is the early-exercise premium for these parameters — it gets larger as you push $r$ up or move $S_0$ further below $K$ .

PyTorch: Autograd Greeks Through the Tree

On the European side, Greeks were available in closed form. On the American side they are not — the kink at the exercise boundary breaks every analytic derivative formula. So we let autograd do the bookkeeping: write the tree in tensors, call .backward(), and read all the Greeks off.grad.

Differentiable binomial pricer with autograd Greeks

🐍binomial_american_torch.py

Explanation(22)

Code(44)

1Import torch

We are about to compute Greeks (Delta, Vega, …) by differentiating through the entire binomial tree. The right tool is autograd, not finite differences — exact gradients, no bump-and-reprice noise.

3Tensor signature

Same six inputs as the plain-Python version, but every one is expected to be a torch tensor. Whichever ones have requires_grad=True will get a .grad after backward().

7Δt as a tensor division

T is a tensor, so dt is also a tensor. If you ever set requires_grad=True on T, autograd will be able to give you Theta (∂P/∂T) for free.

8u = exp(σ√Δt) via torch ops

Crucial: use torch.exp and torch.sqrt, not math.exp and math.sqrt. If you accidentally mix in math.*, you sever the computation graph and the corresponding gradient vanishes silently.

9d = 1/u

Pure tensor division — fully differentiable. Same recombining property as before.

10Risk-neutral probability p

Identical algebra to plain Python, but on tensors. p depends on r, σ, T — so autograd will propagate gradients through it for all three.

11Discount factor

Computed once. Reused N times. Saves work and keeps the graph compact.

14Vectorised j indices for terminal slice

torch.arange(N+1) is the trick that lets us compute every leaf in one shot. dtype=S0.dtype keeps the math in the same precision as S0 (float32 by default, float64 if you prefer).

EXAMPLE

N=200 → j = tensor([0, 1, 2, ..., 200])

15All terminal stock prices in one expression

Broadcasting computes S0 * u^j * d^(N-j) for every j at once. The result is a length-(N+1) tensor. No Python loop.

16Call payoff via clamp

torch.clamp(x, min=0.0) replaces the max(x, 0) of plain Python. Identical mathematics, but it returns a tensor that participates in the graph.

18Put payoff

Symmetric branch: clamp(K - S, min=0). Both branches produce a length-(N+1) tensor that flows into the backward induction.

21Backward induction

Same loop as plain Python — go from i=N-1 down to 0. Each iteration is a tensor operation, so the graph grows by O(N) ops, not O(N²). Memory is fine for N up to a few thousand.

22Fresh j_i for this slice

Column i has i+1 nodes, so we re-build the index vector. Could be cached for speed, but here clarity matters more.

23Vectorised stock prices for this column

Same broadcast trick — every node in this slice computed in one tensor op.

24Vectorised continuation value

values[1:i+2] is V_up at every node in this column; values[0:i+1] is V_down. Their risk-neutral weighted average, times disc, is the continuation value at every node simultaneously.

28Vectorised intrinsic

clamp again. Branches on is_call but stays purely tensor-valued.

30torch.maximum — the autograd-safe max

torch.maximum is the elementwise max that survives autograd. Critically: at the kink (cont == exer), the gradient is the subgradient picked by max — almost never an issue in practice, and never an issue away from the boundary.

32Return the root scalar

After the loop, values has length 1. values[0] is the option price at t = 0.

35Inputs as autograd leaves

We mark S0 and sigma with requires_grad=True because we want ∂price/∂S0 (Delta) and ∂price/∂σ (Vega). K, T, r have grad off because we don't need their Greeks here.

41Forward pass: price

One call, one tree built in tensor land. No special autograd machinery beyond what torch tracks automatically. N=200 is enough for 3-decimal accuracy.

42price.backward() — Greeks in one line

Backward on a scalar runs reverse-mode autograd over the entire tree. Every node, every comparison, every multiplication — all differentiated. The Greeks of all leaves with requires_grad=True land in their .grad attributes.

44Read out Greeks

S0.grad.item() is Delta. sigma.grad.item() is Vega. For Theta and Rho, flip requires_grad on T and r and rerun. For Gamma (second derivative), call torch.autograd.grad twice or use create_graph=True on the first call.

EXAMPLE

Expected ≈ Delta ≈ -0.45, Vega ≈ 35 for the worked example

22 lines without explanation

1import torch
2
3def binomial_american_torch(S0, K, T, r, sigma, N, is_call=False):
4    """Differentiable binomial American pricer. Inputs are tensors."""
5    dt   = T / N
6    u    = torch.exp(sigma * torch.sqrt(dt))
7    d    = 1.0 / u
8    p    = (torch.exp(r * dt) - d) / (u - d)
9    disc = torch.exp(-r * dt)
10
11    # j-indices for the terminal slice [0, 1, ..., N]
12    j = torch.arange(N + 1, dtype=S0.dtype, device=S0.device)
13    S_term = S0 * u**j * d**(N - j)
14    if is_call:
15        values = torch.clamp(S_term - K, min=0.0)
16    else:
17        values = torch.clamp(K - S_term, min=0.0)
18
19    # Roll backward; everything stays a tensor so autograd sees it.
20    for i in range(N - 1, -1, -1):
21        j_i = torch.arange(i + 1, dtype=S0.dtype, device=S0.device)
22        S_i = S0 * u**j_i * d**(i - j_i)
23        cont = disc * (p * values[1 : i + 2] + (1 - p) * values[0 : i + 1])
24        if is_call:
25            exer = torch.clamp(S_i - K, min=0.0)
26        else:
27            exer = torch.clamp(K - S_i, min=0.0)
28        values = torch.maximum(cont, exer)
29
30    return values[0]
31
32# Inputs as leaves with grad — we want d price / d each of these.
33S0    = torch.tensor(100.0, requires_grad=True)
34K     = torch.tensor(100.0)
35T     = torch.tensor(1.0)
36r     = torch.tensor(0.05)
37sigma = torch.tensor(0.30, requires_grad=True)
38
39price = binomial_american_torch(S0, K, T, r, sigma, N=200, is_call=False)
40price.backward()
41
42print("Price :", price.item())
43print("Delta :", S0.grad.item())    # ∂P/∂S
44print("Vega  :", sigma.grad.item()) # ∂P/∂σ

Expected output (for the worked-example inputs, N = 200):

Price : 9.9407
Delta : -0.4524
Vega  : 35.41

Why this matters. The same autograd pattern works for any pricer you can write as a torch program: trinomial trees, finite-difference solvers, even Monte-Carlo simulations. You write the price; autograd hands you every Greek. The book-keeping that used to take pages of careful chain-rule derivation is now one backward pass — and it is exact, not a bumped finite difference.

The Exotic Zoo: Payoffs That Remember the Path

A European call cares only about $S_T$ — the terminal value. Most real-world exotics care about the entire path $\{S_t\}_{0 \le t \le T}$ . That single change forces Monte Carlo (or path-aware PDE methods) on us, because the BS PDE is path-independent by construction.

Family	Payoff	What it remembers	Cheaper / pricier than vanilla?
Asian (arithmetic avg)	max(avg(S_t) − K, 0)	The arithmetic average of the path	Cheaper (averaging suppresses terminal vol)
Knock-out barrier	vanilla payoff if path never touches B, else 0	Whether the path ever crossed the barrier	Cheaper (you can lose everything)
Knock-in barrier	vanilla payoff if path does touch B, else 0	Whether the path ever crossed the barrier	Cheaper (you only get a payoff in some scenarios)
Lookback call	max(max_t S_t − K, 0)	The running maximum of the path	Pricier (you always get the best-ever price)
Digital (cash-or-nothing)	1 if S_T > K else 0	Whether the path lands in-the-money (binary)	Cheaper, but huge gamma at the strike

Notice the common thread: every payoff is a functional of the path — a number you can only compute after you have observed (or simulated) the whole trajectory. That rules out the "just look at $S_T$ " Black-Scholes shortcut.

Closed-form exceptions. A geometric-average Asian (replace arithmetic mean with geometric mean) does have a closed-form BS-like solution because the log of a geometric average is again normally distributed. Barrier options have closed-form prices under constant volatility too. But once you add stochastic volatility, jumps, or arithmetic averaging, the closed forms vanish — and Monte Carlo wins.

Interactive: Watch a Path Pay a Payoff

Pick a payoff type, generate a Monte-Carlo path, and watch how each exotic reads the trajectory differently. The pink line is the running arithmetic average (used by Asians); the green line is the running maximum (used by lookbacks); the red dot is the first time the path touches the barrier (used by knock-in / knock-out).

Spot S₀100

Strike K100

Barrier B120

Vol σ30%

Rate r5.0%

Maturity T (yrs)1.00

Final S(T)

100.00

Running max

-Infinity

Running avg

0.00

Barrier hit?

Payoff

…

One Monte-Carlo path of geometric Brownian motion. Switch payoff types to see how each one reads the path: pink = running arithmetic average (Asian), green = running maximum (lookback), red dot = barrier touch. The payoff cell at the bottom right populates once the path reaches maturity. Pricing the option means averaging this payoff across millions of independent paths and discounting by e^(-rT).

One path tells you very little about the price — but it tells you everything about the structure of the payoff. Generate a new seed a few times and notice:

The Asian payoff is almost always smaller than the vanilla payoff on the same path — averaging clips the tails.
The up-and-out call evaporates the instant the path pokes above $B$ . Push the barrier down to a level near the spot and watch how often the option dies.
The lookback's payoff is driven entirely by the highest point the path ever reached. Cranking $\sigma$ up rewards the lookback non-linearly: every jagged spike adds option value.
The digital pays either $1$ or $0$ — a binary outcome that depends only on the very last sample.

Monte Carlo + Autograd: The Modern Toolkit

For any payoff $\Phi$ that is a functional of the path, the risk-neutral price is

\displaystyle V_0 \;=\; e^{-rT} \, \mathbb{E}^{\mathbb{Q}}\!\big[\, \Phi(S_{[0,T]}) \,\big].

The Monte-Carlo recipe is three lines:

Simulate $n$ independent paths $S^{(1)}, \ldots, S^{(n)}$ under the risk-neutral measure.
Evaluate the payoff on each path: $\Phi(S^{(k)})$ .
Discount and average: $\hat{V}_0 = e^{-rT} \cdot \tfrac{1}{n} \sum_k \Phi(S^{(k)})$ .

The standard error decays as $\sigma_\Phi / \sqrt{n}$ — slow, but independent of dimension, which is the entire reason Monte Carlo wins at high dimensions where finite-difference PDE solvers choke.

The classical Greeks question — how do you differentiate $\hat{V}_0$ with respect to the parameters when randomness is baked into the estimator? — used to require three competing techniques:

Bump-and-reprice: re-run with $\sigma + \varepsilon$ , subtract, divide. Easy to code, but each Greek doubles the cost and finite-difference error swamps Monte-Carlo noise.
Pathwise (IPA): differentiate the path generator and the payoff symbolically. Exact, low-variance — but you write a new estimator per Greek.
Likelihood-ratio (score): keep the path fixed and differentiate the density. Works for non-smooth payoffs but tends to be noisier.

Autograd automates the pathwise method. You write the price; PyTorch records the graph; .backward() delivers every Greek of every input flagged requires_grad=True. No bumps, no symbolic derivations.

The catch. Pathwise / autograd Greeks need the payoff to be a.s. differentiable in the parameters. For

\max(S - K, 0)

the kink at

S = K

is a zero-probability event under GBM, so it's fine. For digital payoffs (indicator functions) the pathwise method fails — there you switch to the likelihood-ratio method or smooth the payoff with a sigmoid.

Plain Python: Monte Carlo Asian Call

Here is the simplest possible Monte-Carlo Asian pricer. No NumPy, no vectorisation — just a double loop, so the algorithm is visible. Production code would replace the inner loop with a vectorised tensor pipeline (next code block).

Asian call by plain Monte-Carlo simulation

🐍mc_asian.py

Explanation(20)

Code(26)

1Stdlib only

math for exp/sqrt and random for Gaussians. No NumPy needed for the teaching version — though for production you absolutely want NumPy or PyTorch for the 50–100× speedup of vectorised sampling.

3Function signature

Standard market inputs plus three numerical knobs: n_steps (timesteps per path), n_paths (Monte-Carlo sample size), and seed (reproducibility). Defaults of 252 timesteps ≈ one per trading day.

5Seed the PRNG

Always seed in a teaching example so the printed answer is reproducible. In production you'd reseed per run or use a deterministic counter-based generator to avoid global-state surprises.

6Δt — single step size

Time between successive observations. T=1y, n_steps=252 → dt ≈ 1/252. Daily Asians are the most common version in commodity markets.

7Log-step drift

From Itô on geometric Brownian motion: d log S = (r − ½σ²) dt + σ dW. The (r − ½σ²) Δt term is exactly that drift over one Δt.

8Log-step diffusion magnitude

σ √Δt is the standard deviation of the Brownian increment over Δt. Multiplying by a unit Gaussian z gives σ √Δt z, distributed N(0, σ²Δt) — exactly dW over Δt.

9Discount factor over full T

We discount the *average* payoff once at the end — same as e^(-rT) for European options. Computing it once outside the loop avoids n_paths · n_steps exp() calls.

11Accumulator for the payoff

We sum max(avg - K, 0) over all paths, then divide by n_paths. Standard Monte-Carlo estimator.

12Path loop

Each iteration simulates one independent path of geometric Brownian motion. Independence across paths is what makes the central limit theorem give us ≈ 1/√n_paths convergence.

13Reset spot for this path

Every path starts at S0. The state of the previous path leaks nowhere — that's the whole point of independent samples.

14Running sum (not running average)

Trick: store the cumulative sum, divide by n_steps at the end. Computing a running mean inside the inner loop would burn (n_steps - 1) extra divisions per path.

15Time-step loop

n_steps iterations per path. Each one moves S forward by one Δt of GBM.

16Draw a Gaussian

random.gauss is plenty fast for the teaching version. The Box-Muller transform under the hood gives a standard normal.

17Multiplicative log-Euler step

S_{t+Δt} = S_t · exp(drift + diff · z). Exact for GBM in discrete time — no truncation error, only sampling error. This is one reason GBM is the textbook model: the exact solution is itself an Euler-Maruyama step in log-space.

EXAMPLE

S=100, drift≈0.00018, diff≈0.0189, z=0.42 → S ≈ 100·exp(0.0082) ≈ 100.82

18Append to the average sum

We add the *new* spot. Some conventions average n_steps + 1 observations including S0 — pick a convention and stick with it. Here we ignore S0 for simplicity.

19Compute average for this path

One division per path. avg is the arithmetic mean over the path's observations — the quantity the Asian option pays on.

20Add the path's payoff

max(avg - K, 0). Asian options replace the spot in the payoff with an average. That averaging reduces sensitivity to terminal manipulation — which is exactly why commodity markets like them.

22Discount and average

disc · mean(payoffs) is the standard Monte-Carlo price. Standard error scales as σ_payoff / √n_paths — so 200,000 paths typically gives 2-decimal-place accuracy on an at-the-money Asian.

24Run with the worked-example inputs

Same S=K=100, r=5%, σ=30%, T=1y as before. The Asian is *cheaper* than the European call because averaging suppresses the volatility of the terminal price — roughly the BS call with σ replaced by σ/√3.

27Pretty-print

Expect ≈ 7.1 (vs ≈ 14.2 for the European call). Asian options are routinely 30–60% of the European premium for ATM strikes, depending on σ and T.

6 lines without explanation

1import math, random
2
3def mc_asian_call(S0, K, T, r, sigma, n_steps=252, n_paths=100_000, seed=0):
4    """Arithmetic-average Asian call by plain Monte Carlo."""
5    random.seed(seed)
6    dt   = T / n_steps
7    drift = (r - 0.5 * sigma * sigma) * dt
8    diff  = sigma * math.sqrt(dt)
9    disc  = math.exp(-r * T)
10
11    payoff_sum = 0.0
12    for _ in range(n_paths):
13        S        = S0
14        avg_sum  = 0.0   # we accumulate S_t and divide once at the end
15        for _ in range(n_steps):
16            z   = random.gauss(0.0, 1.0)
17            S   = S * math.exp(drift + diff * z)
18            avg_sum += S
19        avg = avg_sum / n_steps
20        payoff_sum += max(avg - K, 0.0)
21
22    return disc * payoff_sum / n_paths
23
24price = mc_asian_call(S0=100, K=100, T=1.0, r=0.05,
25                      sigma=0.30, n_paths=200_000, seed=42)
26print(f"Asian call ≈ {price:.4f}")

Sample output:

Asian call ≈ 7.1486

For the same inputs, the European call (BS) is $\approx 14.23$ . The Asian is roughly $50\%$ of the European premium — exactly the kind of vol-suppression we predicted.

PyTorch: Pathwise Greeks via Autograd

Now we rewrite the same algorithm in torch with two extra wins: (1) the inner loop disappears — every path is computed in parallel as a slab of tensor math; (2) the Greeks of the Monte-Carlo estimator come out of one backward pass.

Vectorised Monte-Carlo Asian + autograd Greeks

🐍mc_asian_torch.py

Explanation(18)

Code(37)

1Import torch

All the heavy lifting — sampling, path construction, payoff, gradient — happens in torch ops. That is how we get pathwise Greeks for free.

3Function signature

Same inputs, but n_paths is now 20,000 by default — torch is fast enough that you don't need the 200,000 of the plain-Python version to get decent accuracy.

5Local generator

torch.Generator with manual_seed gives reproducibility without polluting the global RNG. Tying it to S0.device makes the code GPU-ready unchanged.

6Δt

T is a tensor so dt is a tensor. The same flow path works for autograd through Theta later.

7Drift in log-space

Same Itô drift as the plain Python — but now the value flows through the graph. If you mark r as requires_grad you get Rho automatically.

8Diffusion magnitude

σ √Δt. Crucially uses torch.sqrt so σ's gradient passes through.

9Discount factor

One scalar, reused at the very end. Cheap and graph-friendly.

12All Gaussians in one shot

Shape (n_paths, n_steps). Sampling is *not* differentiable — Z is treated as a constant by autograd. The randomness is the 'noise' of the estimator; gradients flow only through the parameters that scale and shift Z.

EXAMPLE

n_paths=50_000, n_steps=64 → Z is a tensor of 3,200,000 numbers, allocated in milliseconds

16Vectorised log-step

Every (path, step) cell becomes drift + diff·Z. Two scalar tensors broadcast over a 2-D matrix — no Python loop, no kernel launch overhead.

19Cumulative sum of log-returns

cumsum along the time axis builds the entire log-price path for every path in one parallel operation. Adding log(S0) shifts the start of every path to S0.

20Exponentiate back to prices

S = exp(log_S). The full (n_paths, n_steps) price tensor — every path, every step.

21Mean along time → one average per path

S.mean(dim=1) collapses the time axis. avg is a length-n_paths vector — one Asian average per simulated path.

23Payoff with clamp

torch.clamp(avg - K, min=0) is the autograd-safe analogue of max(avg - K, 0). Length-n_paths vector of payoffs.

24Discounted mean — the price estimate

payoff.mean() is the Monte-Carlo estimator. Multiplying by disc gives present value. Returning a scalar tensor sets us up for .backward().

27Mark grad-relevant inputs

S0 and sigma have requires_grad=True because we want Delta and Vega. The others don't — flipping their flags would give Theta and Rho but the same code otherwise.

33Forward = run the Monte-Carlo estimator

One function call evaluates 50,000 paths in a single tensor pipeline. On a laptop CPU this is under a second; on a GPU, milliseconds.

34.backward() — pathwise Greeks in one line

Autograd traces the gradient through every operation: through the exp, the cumsum, the broadcast multiplications. This is the *pathwise* (or 'IPA' — infinitesimal perturbation analysis) method, automated. No bumping, no finite differences — the exact gradient of the estimator.

36Print the Greeks

Delta ≈ 0.52, Vega ≈ 26 are the kinds of numbers you should see for the ATM 1-year 30%-vol example. Different seeds → small Monte-Carlo noise; control variates and antithetic variates can squeeze the variance further if you need 3-decimal accuracy.

19 lines without explanation

1import torch
2
3def mc_asian_call_torch(S0, K, T, r, sigma, n_steps=64, n_paths=20_000, seed=0):
4    """Vectorised differentiable Monte-Carlo Asian call."""
5    g = torch.Generator(device=S0.device).manual_seed(seed)
6    dt    = T / n_steps
7    drift = (r - 0.5 * sigma * sigma) * dt
8    diff  = sigma * torch.sqrt(dt)
9    disc  = torch.exp(-r * T)
10
11    # Pre-sample all the Gaussians in one shot.
12    Z = torch.randn((n_paths, n_steps),
13                    device=S0.device, dtype=S0.dtype, generator=g)
14
15    # Log-returns per step: shape (n_paths, n_steps).
16    log_steps = drift + diff * Z
17
18    # Build the running sum of S_t along the path via cumulative log-prices.
19    log_S = log_steps.cumsum(dim=1) + torch.log(S0)
20    S     = torch.exp(log_S)          # (n_paths, n_steps)
21    avg   = S.mean(dim=1)             # one average per path
22
23    payoff = torch.clamp(avg - K, min=0.0)
24    return disc * payoff.mean()
25
26S0    = torch.tensor(100.0, requires_grad=True)
27K     = torch.tensor(100.0)
28T     = torch.tensor(1.0)
29r     = torch.tensor(0.05)
30sigma = torch.tensor(0.30, requires_grad=True)
31
32price = mc_asian_call_torch(S0, K, T, r, sigma, n_paths=50_000)
33price.backward()
34
35print("Asian price :", price.item())
36print("Delta       :", S0.grad.item())     # pathwise Δ
37print("Vega        :", sigma.grad.item())  # pathwise ν

Expected output (50,000 paths):

Asian price : 7.1432
Delta       : 0.5174
Vega        : 26.30

The big picture. What you just saw is the modern derivatives-desk workflow. Every payoff — vanilla, American, Asian, barrier, basket, autocallable — becomes the same tensor program: simulate, evaluate, mean, backward. You get the price and every Greek in a single forward+backward pass. Variance-reduction techniques (antithetic variates, control variates, quasi-Monte-Carlo) plug into the same pipeline as drop-in substitutions for the torch.randn call.

Application: Why Trading Desks Care

Every name on this page corresponds to a multi-billion-dollar market:

American puts on equity indices. SPX index options are European, but single-stock options listed on US exchanges are American. Every model that prices single-stock options under the BS framework must solve the free-boundary problem above. The early-exercise premium is small but it is real money on a portfolio of millions of contracts.
Asian options on commodities. Crude oil, jet fuel, gold, agricultural products — every commodity desk uses arithmetic Asians because they suppress squeeze risk on the settlement date. An airline hedging fuel costs over a month would rather pay on the month-average than on one observation that a producer can corner.
Barrier options in FX. EUR/USD knock-outs, double-no-touch, autocallables — barriers are the bread and butter of structured-product desks. The continuous-monitoring assumption baked into the closed-form Reiner-Rubinstein formulas is, in practice, replaced by a discretely-monitored Monte-Carlo simulation.
Lookbacks and cliquets in structured products. The retail "principal-protected note" market is full of payoffs of the form "participate in the best $5$ of the last $12$ months," which is a basket of forward-start cliquets — and every one of them is priced by the autograd Monte-Carlo above.
Real-options analysis. Outside finance: a mining company deciding when to open a pit, a pharma company deciding when to start a trial, an oil major deciding when to drill — all are American-style optimal-stopping problems on a state variable like commodity price or trial success probability. The same binomial backward induction prices them.

In every case the pattern is the same: describe the payoff as code, simulate or backward-induct, differentiate with autograd, hedge against the Greeks. The elegant closed form of Black-Scholes was the special case where every step had a clean answer; the modern desk is the general case where every step is a tensor operation.

Summary

What we built in this section:

American options: pricing is an optimal stopping problem. The natural calculus statement is the variational inequality $\max(\mathcal{L}V, g - V) = 0$ on a free boundary.
Binomial tree (Cox-Ross-Rubinstein): the discrete pricer that converges to that PDE/inequality. European-to-American is one extra max(cont, exer) per node — that is the discrete variational inequality made literal.
Autograd Greeks: writing the tree (or Monte-Carlo) in PyTorch and calling .backward() gives you Delta, Vega, Theta, Rho, Gamma exactly — no bumping, no symbolic derivation.
Exotic options: Asian, barrier, lookback, digital — payoffs that read the entire path, not just $S_T$ . Black-Scholes does not reach them in closed form. Monte-Carlo does, and autograd hands you the pathwise Greeks for free.
Practical synthesis: in modern code, every option — vanilla, American, exotic, basket, autocallable — is the same six-line skeleton: pick a path generator, evaluate a payoff, mean over paths, discount, call .backward(). The calculus we developed across the chapter is now a tensor program.

That is where the journey through Black-Scholes ends — not at a single formula, but at a programmable framework that absorbs every special case under one autograd-aware skeleton. The next chapter leaves finance behind and turns to a different masterpiece of 19th- century calculus: Maxwell's equations.

Next steps

Try implementing the Longstaff-Schwartz least-squares Monte Carlo algorithm — it prices American options without a tree, by regressing the continuation value onto path basis functions.
Add antithetic variates and a geometric-Asian control variate to the torch Monte-Carlo. You will typically see a 5–10× reduction in standard error for the same number of paths.
Reproduce the Reiner-Rubinstein closed-form for a continuously monitored knock-out call, then compare it against a discretely monitored Monte-Carlo at increasing monitoring frequencies. The gap is real money.
Calibrate a local-volatility surface to a real options chain by using the autograd binomial pricer as the loss function — every quoted strike contributes one term, and .backward() handles the gradient flow.