Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Explain what implied volatility is and why it inverts the Black-Scholes relationship between price and $\sigma$.
Prove (intuitively) that implied volatility exists and is unique by appealing to $\nu = \partial C/\partial \sigma > 0$.
Apply Newton's method on the BS price curve to back out $\sigma_{IV}$ from a market price by hand.
Implement the solver in plain Python and again with PyTorch autograd — and recognise that the two implementations do exactly the same calculus.
Recognise the volatility smile and skew as a fingerprint of the assumptions that Black-Scholes gets wrong.
Read a real option chain's implied-vol curve and explain what each shape tells you about the market's beliefs.

The Big Picture: Inverting Black-Scholes

Up to this point Black-Scholes has been a one-way machine. Plug in five inputs — spot $S$, strike $K$, time $T$, rate $r$, and volatility $\sigma$ — and out comes a price $C$.

\displaystyle C \;=\; \text{BS}(S, K, T, r, \sigma)

But on any options exchange the situation is reversed. The market broadcasts the price $C^{\text{mkt}}$ every second. Four of the five inputs — spot, strike, time, rate — are observed directly. The only thing the market does not hand us is $\sigma$. So we ask the reverse question:

Given a market price $C^{\text{mkt}}$, what value of $\sigma$ makes the Black-Scholes formula spit out exactly that price?

That single number is the option market's answer to "how volatile do you think the stock will be from now until expiry?". The number itself is called the implied volatility, and the whole machinery of modern options trading rests on it.

Analogy. Implied vol is to options what yield is to bonds. Bond traders routinely quote and compare bonds by the yield-to-maturity that produces a given price rather than by the price itself. Likewise, option traders routinely quote and compare options by the $\sigma$ that produces the dollar price through Black-Scholes.

Definition: Implied Volatility

Fix $S, K, T, r$ and the observed market price $C^{\text{mkt}}$. The implied volatility $\sigma_{IV}$ is the value that solves the equation

\displaystyle \text{BS}(S, K, T, r, \sigma_{IV}) \;=\; C^{\text{mkt}}.

Look at what the equation says, in words: I have a function of one variable $\sigma$, namely $f(\sigma) = \text{BS}(\sigma) - C^{\text{mkt}}$, and I want the root of that function. Pure calculus.

Why not just "measure" volatility? You can measure realised volatility (the standard deviation of past returns), and people do — but that is backward-looking. Implied volatility is forward-looking: it is the market's consensus forecast of volatility over the option's remaining life. The two are almost never equal, and the gap is itself a tradable spread.
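To contrast with the forward-looking number, here is a minimal sketch of the backward-looking one. It assumes equally spaced daily closes and the common 252-trading-day annualisation convention; both are assumptions of this example, not something the text specifies. The price path is hypothetical.

```python
import math
import statistics

def realised_vol(closes, periods_per_year=252):
    """Annualised realised volatility from equally spaced closing prices.

    The default of 252 periods per year assumes daily closes on trading days.
    """
    # Log returns between consecutive closes.
    log_returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    # Sample standard deviation (n - 1 denominator), annualised by sqrt(periods).
    return statistics.stdev(log_returns) * math.sqrt(periods_per_year)

# Hypothetical path: 20 daily moves alternating +1% and -1% in log terms.
closes = [100.0]
for i in range(20):
    closes.append(closes[-1] * math.exp(0.01 if i % 2 == 0 else -0.01))

print(f"realised vol = {realised_vol(closes):.4f}")
```

Each daily move is 1% in size, so the annualised figure comes out near 16%. An option on the same stock can imply a volatility well above or below that.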

Why IV Is Unique — Vega Is the Reason

How do we know the equation $\text{BS}(\sigma) = C^{\text{mkt}}$ has a solution at all, and that the solution is unique? One word: Vega.

Recall from section-06 that for a European call,

\displaystyle \nu \;=\; \frac{\partial C}{\partial \sigma} \;=\; S \, \varphi(d_1) \, \sqrt{T} \;>\; 0

for every $\sigma > 0$. The price $C(\sigma)$ is therefore a continuous, strictly increasing function of $\sigma$ on $(0, \infty)$. Add the two boundary facts:

As $\sigma \to 0$, the call price approaches its lower no-arbitrage bound $\max(S - K e^{-rT}, 0)$, the intrinsic value with the strike discounted.
As $\sigma \to \infty$, the call price approaches the spot $S$: $d_1 \to +\infty$ and $d_2 \to -\infty$, so $N(d_1) \to 1$ and $N(d_2) \to 0$, and the discounted-strike term vanishes.

Continuity plus the Intermediate Value Theorem then gives existence, and strict monotonicity gives uniqueness:

Existence + uniqueness of implied volatility. For any market price inside the no-arbitrage band $\max(S - K e^{-rT}, 0) \,<\, C^{\text{mkt}} \,<\, S$, there is a unique $\sigma_{IV} > 0$ such that $\text{BS}(\sigma_{IV}) = C^{\text{mkt}}$.

This is the entire mathematical justification for talking about "the" implied volatility. It is a one-sentence consequence of Vega being positive — which itself is a one-line consequence of $\varphi(d_1)$ being positive.
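The ingredients of this argument can be checked numerically. The sketch below reuses the no-dividend call formula with S = K = 100, T = 0.25 and r = 5%. It confirms that the price rises strictly with σ, sits on the lower bound as σ → 0, and approaches S as σ grows. The σ grid and the two extreme σ values are arbitrary illustrative choices.

```python
import math

def bs_call(S, K, T, r, sigma):
    """Black-Scholes call price (no dividends)."""
    sqT = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqT)
    d2 = d1 - sigma * sqT
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))   # standard normal CDF
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

S, K, T, r = 100.0, 100.0, 0.25, 0.05
lower = max(S - K * math.exp(-r * T), 0.0)    # no-arbitrage lower bound

# Vega > 0: the price is strictly increasing on a grid from 1% to 300%.
sigmas = [0.01 * k for k in range(1, 301)]
prices = [bs_call(S, K, T, r, s) for s in sigmas]
increasing = all(p1 < p2 for p1, p2 in zip(prices, prices[1:]))

print(f"strictly increasing on grid: {increasing}")
# Boundary behaviour at a very small and a very large sigma.
print(f"C(0.5%)  = {bs_call(S, K, T, r, 0.005):.6f}   lower bound = {lower:.6f}")
print(f"C(2000%) = {bs_call(S, K, T, r, 20.0):.6f}   upper bound = {S:.6f}")
```

Below a fraction of a percent, both N(·) terms round to exactly 1 in double precision. A much finer grid near zero would therefore show flat steps. That is a floating-point artefact, not a failure of the Vega > 0 argument.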

Interactive: Inverting the BS Curve

Here is the situation drawn explicitly. The blue curve is $C(\sigma) = \text{BS}(S, K, T, r, \sigma)$. The dashed yellow horizontal line is the market price $C^{\text{mkt}}$. The intersection, marked by the green vertical line, is $\sigma_{IV}$.

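[Interactive figure: Newton's method on the Black-Scholes call price curve C(σ). Default settings: S = 100, K = 100, T = 0.25 yr, r = 5%, market price 5.00, starting σ₀ = 40%. At step 0, C(σ₀) = 8.5526, an error of 3.5526 against the market.]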

Blue curve: the Black-Scholes call price as a function of σ. Dashed yellow line: the market-quoted price you are trying to match. Each red dot is a Newton iterate. The short red line is the tangent (slope = vega), and the point where that tangent crosses the yellow line is the next iterate. As volatility increases, the curve keeps rising but flattens as it approaches its upper bound S. From almost any reasonable starting σ₀, the iteration finishes in two or three steps.

As the market price moves up or down, the green vertical line slides with it. Notice three things:

The blue curve is monotonic — it never has two crossings, so the implied vol is unique.
Far from at-the-money the curve flattens, so a small price change translates into a large change in σ. That is why deep-OTM IVs are noisy — Vega is small there.
The red tangent is the linear approximation Newton's method uses. The next iterate is where that tangent crosses the yellow line. Two steps almost always suffice.

Newton's Method on the BS Price Curve

Newton-Raphson on the function $f(\sigma) = \text{BS}(\sigma) - C^{\text{mkt}}$ is the standard solver. The update rule is the one-line classic:

\displaystyle \sigma_{n+1} \;=\; \sigma_n \;-\; \frac{f(\sigma_n)}{f'(\sigma_n)} \;=\; \sigma_n \;-\; \frac{\text{BS}(\sigma_n) - C^{\text{mkt}}}{\nu(\sigma_n)}.

Each iteration: evaluate price and Vega at the current σ, take the Newton step, repeat until the error is below tolerance. The guarantee is quadratic convergence near the root — every iteration roughly squares the error.

Why Newton instead of bisection? Bisection always works but converges only linearly: each step halves the bracket, so it gains about one binary digit of σ per iteration. On Black-Scholes the function is smooth and monotonic, it is close to linear near the money, and we already have its derivative in closed form. In the common case Newton is the natural choice, and its two-to-four-iteration speed costs very little robustness.
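To see the linear-versus-quadratic gap concretely, this sketch solves the worked-example quote (S = K = 100, T = 0.25, r = 5%, C = 5.00) both ways, using the same 1e-8 price tolerance for each. The bisection bracket [0.01%, 500%] is an assumption of this example.

```python
import math

def bs_call_vega(S, K, T, r, sigma):
    """Black-Scholes call price and Vega (no dividends)."""
    sqT = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqT)
    d2 = d1 - sigma * sqT
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi)
    return S * N(d1) - K * math.exp(-r * T) * N(d2), S * pdf * sqT

def bisection_iv(target, S, K, T, r, lo=1e-4, hi=5.0, tol=1e-8, max_iter=200):
    """Halve [lo, hi] until the price error is below tol; returns (sigma, halvings)."""
    for i in range(1, max_iter + 1):
        mid = 0.5 * (lo + hi)
        err = bs_call_vega(S, K, T, r, mid)[0] - target
        if abs(err) < tol:
            return mid, i
        if err < 0:
            lo = mid   # model too cheap: the root lies above mid
        else:
            hi = mid   # model too expensive: the root lies below mid
    raise RuntimeError("bisection did not converge")

def newton_iv(target, S, K, T, r, sigma=0.20, tol=1e-8, max_iter=50):
    """Plain Newton on BS(sigma) - target; returns (sigma, Newton updates)."""
    for i in range(max_iter):
        price, vega = bs_call_vega(S, K, T, r, sigma)
        err = price - target
        if abs(err) < tol:
            return sigma, i
        sigma -= err / vega
    raise RuntimeError("Newton did not converge")

s_b, n_b = bisection_iv(5.00, 100.0, 100.0, 0.25, 0.05)
s_n, n_n = newton_iv(5.00, 100.0, 100.0, 0.25, 0.05)
print(f"bisection: sigma = {s_b:.6f} after {n_b} halvings")
print(f"newton:    sigma = {s_n:.6f} after {n_n} updates")
```

Both land on the same σ. On this quote, bisection needs roughly thirty halvings where Newton needs two updates.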

For a deeply out-of-the-money option, Vega can be tiny, so dividing by it can produce an enormous step. The practical fix is either to seed Newton sensibly or to fall back to bisection if a Newton step pushes σ outside a sensible band. A common seed is the Brenner-Subrahmanyam approximation $\sigma_0 \approx \sqrt{2\pi/T} \cdot C^{\text{mkt}}/S$, which is derived for at-the-money options.
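Here is a sketch of that safeguard. It seeds with the Brenner-Subrahmanyam formula and keeps a bracket that shrinks using the sign of the price error. It then takes a bisection step whenever Newton would leave the bracket or Vega underflows. The search band [1e-6, 10] for σ is an arbitrary assumption, and this is one common way to combine the two methods, not the only one.

```python
import math

def bs_call_vega(S, K, T, r, sigma):
    """Black-Scholes call price and Vega (no dividends)."""
    sqT = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqT)
    d2 = d1 - sigma * sqT
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi)
    return S * N(d1) - K * math.exp(-r * T) * N(d2), S * pdf * sqT

def robust_iv(target, S, K, T, r, lo=1e-6, hi=10.0, tol=1e-10, max_iter=200):
    """Safeguarded Newton: Brenner-Subrahmanyam seed, a shrinking bracket
    [lo, hi], and a bisection step whenever Newton would leave the bracket.
    The search band [lo, hi] is an assumption of this sketch."""
    if not max(S - K * math.exp(-r * T), 0.0) < target < S:
        raise ValueError("price outside the no-arbitrage band")
    sigma = math.sqrt(2.0 * math.pi / T) * target / S     # Brenner-Subrahmanyam seed
    sigma = min(max(sigma, lo), hi)
    for i in range(max_iter):
        price, vega = bs_call_vega(S, K, T, r, sigma)
        err = price - target
        if abs(err) < tol:
            return sigma, i
        # C is increasing in sigma, so the sign of err says which side the root is on.
        if err < 0:
            lo = sigma
        else:
            hi = sigma
        newton = sigma - err / vega if vega > 1e-12 else float("nan")
        # Accept the Newton step only if it lands strictly inside the bracket;
        # comparisons with nan are False, so a vanishing Vega also bisects.
        sigma = newton if lo < newton < hi else 0.5 * (lo + hi)
    raise RuntimeError("did not converge")

# Deep OTM quote: the ATM-based seed is far off and Vega is tiny there.
S, T, r = 100.0, 0.25, 0.05
target = bs_call_vega(S, 150.0, T, r, 0.35)[0]
sigma, iters = robust_iv(target, S, 150.0, T, r)
print(f"recovered sigma = {sigma:.8f} in {iters} iterations")
```

In the deep-OTM example, the at-the-money seed lands far too low, where Vega underflows to zero. The bisection fallback takes over until Newton steps become safe again.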

Worked Example (Try It By Hand)

Same inputs we used to derive the Greeks in section-06: $S = 100$, $K = 100$, $T = 0.25$ years, $r = 0.05$. Today the market is quoting an ATM call at $C^{\text{mkt}} = 5.00$. What is the implied volatility?


Step 0 — seed. Take $\sigma_0 = 0.20$. We already computed in section-06 that, at this seed,

$\,\;d_1 = 0.175, \quad d_2 = 0.075, \quad N(d_1) \approx 0.56946, \quad N(d_2) \approx 0.52989, \quad \varphi(d_1) \approx 0.39288.$

Price at σ = 0.20:

$\;\;C = 100 \cdot 0.56946 \;-\; 100 \cdot e^{-0.0125} \cdot 0.52989 \;=\; 56.946 \,-\, 52.331 \;=\; 4.615.$

Vega at σ = 0.20:

$\;\;\nu = S \, \varphi(d_1) \, \sqrt{T} = 100 \cdot 0.39288 \cdot 0.5 \,\approx\, 19.64.$

Error: $f(\sigma_0) = 4.615 - 5.000 = -0.385$. The model is below the market, so we need more volatility.

Newton step:

$\;\;\sigma_1 \;=\; 0.20 \;-\; \frac{-0.385}{19.64} \;=\; 0.20 + 0.01960 \;=\; 0.21960.$

Step 1 — verify. Re-evaluate at $\sigma_1 = 0.21960$. Now

$\;\;d_1 = \frac{0 + (0.05 + 0.5 \cdot 0.04822) \cdot 0.25}{0.21960 \cdot 0.5} = \frac{0.01853}{0.10980} \approx 0.1687,$

$\;\;d_2 = 0.1687 - 0.10980 \approx 0.0589, \quad N(d_1) \approx 0.5670, \quad N(d_2) \approx 0.5235.$

$\;\;C = 100 \cdot 0.5670 - 100 \cdot e^{-0.0125} \cdot 0.5235 \;=\; 56.70 - 51.70 \;=\; 5.000.$

That matches the market price to within about 0.0002; carrying more digits gives 5.0002. One Newton step from a generic 20% seed lands this close because the curve is so close to linear over this range.

Answer: $\sigma_{IV} \approx 0.2196 = 21.96\%$.

What just happened geometrically: starting from $(0.20, \;4.615)$ on the BS curve, we drew the tangent of slope 19.64. That tangent crosses the horizontal line $C = 5.00$ at $\sigma = 0.2196$. That point is so close to the true root that the next BS evaluation already agrees with the market to within about 0.0002.
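The hand calculation is easy to reproduce with more digits. This short script recomputes the step-0 quantities, takes the single Newton step and re-prices at σ₁; its variable names are illustrative. It should print values matching the hand calculation to rounding: C(0.20) ≈ 4.615, ν ≈ 19.64, σ₁ ≈ 0.2196 and C(σ₁) ≈ 5.0002.

```python
import math

# Worked-example inputs (S = K = 100, T = 0.25, r = 5%, market price 5.00, seed 20%).
S, K, T, r, C_mkt, sigma0 = 100.0, 100.0, 0.25, 0.05, 5.00, 0.20

def N(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def price_and_vega(sigma):
    """Return d1, d2, the BS call price and Vega at this sigma."""
    sqT = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqT)
    d2 = d1 - sigma * sqT
    return d1, d2, S * N(d1) - K * math.exp(-r * T) * N(d2), S * phi(d1) * sqT

d1, d2, C0, vega0 = price_and_vega(sigma0)
sigma1 = sigma0 - (C0 - C_mkt) / vega0     # one Newton step
_, _, C1, _ = price_and_vega(sigma1)       # re-price at sigma1

print(f"d1 = {d1:.4f}, d2 = {d2:.4f}, N(d1) = {N(d1):.5f}, N(d2) = {N(d2):.5f}, phi(d1) = {phi(d1):.5f}")
print(f"C(0.20) = {C0:.4f}, vega = {vega0:.2f}, sigma1 = {sigma1:.5f}, C(sigma1) = {C1:.4f}")
```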

Plain Python: Newton Solver for IV

Here is the same algorithm, written in plain Python with only the standard math module. The code is short because the idea is short.

Newton-Raphson solver for implied volatility

implied_vol.py

Line 1: Just the math module

Everything we need — log, sqrt, exp, erf, pi — lives in the standard library. We deliberately avoid scipy here so the example is dependency-free and easy to drop into a notebook.

Line 3: BS call + Vega in one helper

Every Newton step needs the price (to measure error) and Vega (to take the step). Returning both from the same call avoids redundant work — both d₁ and d₂ are reused.

Line 5: Square root of T

√T appears in d₁, d₂, AND in Vega = S·φ(d₁)·√T. Compute it once and store it.

EXAMPLE

T = 0.25 → sqT = 0.5

Line 6: Compute d₁

Same d₁ we used in section-04 to derive BS. Note how the volatility σ appears in both the numerator (via 0.5σ² T) AND the denominator (σ√T) — that is what makes the equation interesting to invert.

EXAMPLE

S=K=100, r=0.05, σ=0.20, T=0.25 → d₁ = (0 + 0.0175)/(0.1) = 0.175

Line 7: Compute d₂

d₂ = d₁ − σ√T. Shifted one standard deviation lower. N(d₂) is the risk-neutral probability of finishing in-the-money.

EXAMPLE

d₂ = 0.175 − 0.10 = 0.075

Line 8: Standard normal CDF at d₁

math.erf gives the error function; the standard-normal CDF is (1 + erf(x/√2)) / 2. No SciPy needed.

EXAMPLE

N(0.175) ≈ 0.5695

Line 9: Standard normal CDF at d₂

Same calculation, evaluated at d₂. N(d₂) is the weight on the discounted strike K·e^{−rT} in the call price.

EXAMPLE

N(0.075) ≈ 0.5299

Line 10: Standard normal PDF at d₁

φ(d₁) is the bell-curve height at d₁. It will appear in Vega; both Gamma and Vega vanish when d₁ is far from zero because φ does.

EXAMPLE

φ(0.175) ≈ 0.3929

Line 11: Call price

C = S·N(d₁) − K·e^{−rT}·N(d₂). The closed-form Black-Scholes call. This is the quantity we will match against the market.

EXAMPLE

100·0.56946 − 100·e^{−0.0125}·0.52989 ≈ 4.615

Line 12: Vega = ∂C/∂σ

Closed-form Vega: ν = S·φ(d₁)·√T. This is the derivative we hand to Newton. Notice ν > 0 strictly — that is the entire reason the inversion is well-posed.

EXAMPLE

100·0.39288·0.5 ≈ 19.64

Line 15: Define implied_vol(...)

Inputs: the market-observed price plus the four Black-Scholes inputs other than σ (S, K, T, r), since σ is what we are solving for. Output: the implied volatility and how many Newton iterations it took.

Line 16: Default σ₀ = 20%

A reasonable seed for equity options. Newton's basin of attraction on BS is wide because the price-vs-σ curve is monotonic and close to linear near the money, so for near-the-money options almost any seed in [5%, 80%] works.

Line 16: Default tolerance 1e-8 and 50 iterations

Newton converges quadratically near the root, so 50 iterations is overkill; most cases finish in 2–4. The tolerance is on the price, so the resulting error in σ is roughly tol/ν. Keeping tol tight therefore keeps σ accurate even where Vega is small (deep OTM).

Line 18: Initialize the iterate

sigma will be overwritten every loop iteration. We seed it from sigma0 once.

Line 19: Newton loop

Each pass: evaluate price + Vega at the current σ, take one Newton step, check convergence. We bail early on tol or on a vanishing Vega.

Line 20: Single combined call

We reuse the helper. Price tells us the error; Vega tells us the slope. Both come from the same d₁ — no wasted work.

Line 21: Signed error

err = model − market. Positive means we are above the market and σ must come down; negative means we are below and σ must go up. Newton handles the sign automatically through the division.

EXAMPLE

price=4.615, market=5.00 → err = −0.385

Line 22: Convergence check

Compare the absolute price error to tol. A tolerance of 1e-8 is already far tighter than any bid-ask spread, so we return as soon as the model price is that close to the market; chasing more digits would be pointless.

Line 23: Return σ and iteration count

Returning the iteration count is useful both as a sanity check (1 means luck, >10 means something is wrong with the inputs) and for plotting convergence experiments.

Line 24: Guard: vanishing Vega

If σ wanders to a region where Vega ≈ 0 (deep-OTM or near-zero T), Newton's update blows up. Raising explicitly is much safer than silently producing a NaN.

Line 26: Newton update

σ_{n+1} = σ_n − f(σ_n) / f'(σ_n). For us f(σ) = BS(σ) − market and f'(σ) = Vega(σ). The max(1e-4, ...) clamp keeps us from accidentally going negative on a wild step early on.

EXAMPLE

σ = 0.20 − (−0.385)/19.64 ≈ 0.21960

Line 27: Did not converge

If we exhaust max_iter we raise instead of returning a bad answer. For Newton-on-BS this almost never fires — it is a defensive guard for pathological inputs (no-arbitrage violations, e.g. a call price below S − Ke^{−rT}).

Line 30: Run the worked example

S=100, K=100, T=0.25, r=5%, target market price = $5.00, σ₀ = 20%. This is the same scenario we hand-traced above.

Line 33: Pretty-print result

Expected: σ_IV ≈ 0.219588 (≈ 21.96%) in 2 iterations. Quadratic convergence ⇒ each step roughly squares the error. The price error falls from about 4e-1 to about 2e-4 after the first step, and far below the 1e-8 tolerance after the second.


1   import math
2
3   def bs_call(S, K, T, r, sigma):
4       """Black-Scholes call price and Vega."""
5       sqT = math.sqrt(T)
6       d1  = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqT)
7       d2  = d1 - sigma * sqT
8       Nd1 = 0.5 * (1 + math.erf(d1 / math.sqrt(2)))
9       Nd2 = 0.5 * (1 + math.erf(d2 / math.sqrt(2)))
10      pdf = math.exp(-0.5 * d1 * d1) / math.sqrt(2 * math.pi)
11      price = S * Nd1 - K * math.exp(-r * T) * Nd2
12      vega  = S * pdf * sqT
13      return price, vega
14
15  def implied_vol(market_price, S, K, T, r,
16                  sigma0=0.20, tol=1e-8, max_iter=50):
17      """Newton-Raphson search for the σ that reproduces the market price."""
18      sigma = sigma0
19      for i in range(max_iter):
20          price, vega = bs_call(S, K, T, r, sigma)
21          err = price - market_price
22          if abs(err) < tol:
23              return sigma, i
24          if vega < 1e-10:
25              raise RuntimeError("Vega vanished — Newton stalled.")
26          sigma = max(1e-4, sigma - err / vega)
27      raise RuntimeError("Did not converge within max_iter.")
28
29  # Same worked example as the by-hand derivation.
30  sigma, iters = implied_vol(market_price=5.00,
31                             S=100, K=100, T=0.25, r=0.05,
32                             sigma0=0.20)
33  print(f"σ_IV = {sigma:.6f}  in {iters} iterations")

Running this prints:

σ_IV = 0.219588  in 2 iterations

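Two iterations. The first is exactly the step we took by hand, which leaves a price error of about 2e-4; the second pushes the error far below the 1e-8 tolerance. Quadratic convergence on a smooth monotonic function is gorgeous.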

PyTorch Autograd Solver for IV

The closed-form Vega is easy on a vanilla call. But many exotic payoffs have no closed-form price at all, let alone a clean Vega formula; an arithmetic-average Asian option is the classic example. Autograd doesn't care: it computes $\partial C / \partial \sigma$ directly from the price function you wrote.

The structural change versus the plain-Python solver is one line: instead of returning Vega from bs_call, we ask autograd for it. Everything else is identical.

Implied vol via PyTorch autograd

implied_vol_torch.py

Line 1: Import torch

We use torch because we want the derivative ∂price/∂σ (which equals Vega) computed automatically. We never write a Vega formula again — autograd reproduces it from the price formula.

Line 3: Differentiable BS call

Exactly the textbook formula, but every op is a torch op. That means the chain S → d₁ → N(d₁) → price is a graph autograd can traverse.

Line 4: Docstring as contract

Single-line description so future readers know this function is meant to stay differentiable — do not silently replace torch.log with numpy.log.

Line 5: torch.sqrt for √T

Using torch.sqrt (not math.sqrt) so T can be a tensor that participates in autograd. Today we only differentiate by σ, but the same code works if you later want Theta or Rho.

Line 6: d₁ in torch

Same algebra as in the plain-Python version. The key fact: σ shows up twice, inside (r + 0.5σ²)T AND in the denominator (σ√T). Autograd handles both paths via the product / quotient rule automatically.

Line 7: d₂ from d₁

d₂ = d₁ − σ√T. d₂ depends on σ both through d₁ (chain) and directly through the subtraction. Autograd composes those gradients without any manual bookkeeping.

Line 8: Normal CDF as a torch op

torch.distributions.Normal(0, 1).cdf gives a differentiable N(x); its derivative is φ(x). This single fact is the bridge between the BS closed-form derivative chain rule and autograd.

Line 9: Compose the price

Price = S·N(d₁) − K·e^{−rT}·N(d₂). Pure torch ops top to bottom — the computation graph is complete and differentiable in every input.

Line 12: Market observation

The price quoted by the exchange. Just a constant — we are NOT differentiating with respect to it.

EXAMPLE

market = tensor(5.00)

Line 15: Static spot

Treated as a constant for this solver. Not requires_grad because we only need ∂/∂σ.

Line 16: Strike

Constant. K never moves during a single option's life — it is part of the contract.

Line 17: Time to expiry

Constant for this snapshot. (Set requires_grad=True here too if you want ∂C/∂T, which is minus Theta, as a free byproduct.)

Line 18: Risk-free rate

Constant. Updated overnight from the yield curve in practice; held fixed during a single solve.

Line 21: σ is the variable we solve for

requires_grad=True tells autograd to track this leaf. Now any computation downstream of sigma can be differentiated by it.

EXAMPLE

torch.autograd.grad(price, sigma) → Vega

Line 24: Newton loop

Same loop shape as the plain-Python version. The only structural difference: we get the derivative from autograd instead of from the Vega formula.

Line 25: Forward pass

Compute the current model price. This call also builds the autograd graph from sigma to price. We will tear that graph down when we call torch.autograd.grad on line 29.

Line 26: Signed error

err is a tensor with a grad_fn. The convergence test only needs its numerical value (via .item()); the Newton update on line 31 uses the tensor itself, inside torch.no_grad().

Line 27: Convergence test

.item() pulls out a Python float so we can compare against a scalar tolerance. Comparing a tensor with a float yields a 0-dimensional bool tensor. Python's if would accept that, but .item() makes the conversion explicit and keeps plain Python numbers in the control flow.

Line 29: Compute Vega via autograd

torch.autograd.grad(price, sigma)[0] returns ∂price/∂σ — which is the closed-form Vega. The [0] indexes into the returned tuple (we only asked for one derivative).

Line 30: no_grad context

The update sigma -= err / grad_sigma is an in-place mutation of a leaf tensor. Wrapping it in torch.no_grad() tells autograd 'don't track this — it is an optimizer step, not a forward computation.'

Line 31: Newton step on σ

Exactly the same update as the closed-form version: σ_{n+1} = σ_n − f(σ_n)/f'(σ_n) with f = BS − market and f' = Vega.

EXAMPLE

After step 1: sigma ≈ 0.2196

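Line 32: Re-enable grad (defensive)

Under torch.no_grad(), the in-place subtract leaves sigma a leaf with requires_grad=True, so in the current code this call is a no-op. It guards against later rewrites of the update that would otherwise leave the next forward pass without a graph for autograd to differentiate. Replacing sigma with a detached copy is one such rewrite.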

Line 34: Print the answer

Expect σ_IV ≈ 0.2196 in 2–3 steps. Identical to the plain-Python and by-hand answers — autograd just spared us from writing the Vega line.


1   import torch
2
3   def bs_call_torch(S, K, T, r, sigma):
4       """Differentiable BS call price."""
5       sqT = torch.sqrt(T)
6       d1  = (torch.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqT)
7       d2  = d1 - sigma * sqT
8       Ncdf = torch.distributions.Normal(0.0, 1.0).cdf
9       return S * Ncdf(d1) - K * torch.exp(-r * T) * Ncdf(d2)
10
11  # Market observation. float64 throughout: near 5.0, float32 cannot resolve 1e-8.
12  market = torch.tensor(5.00, dtype=torch.float64)
13
14  # Static market inputs.
15  S = torch.tensor(100.0, dtype=torch.float64)
16  K = torch.tensor(100.0, dtype=torch.float64)
17  T = torch.tensor(0.25, dtype=torch.float64)
18  r = torch.tensor(0.05, dtype=torch.float64)
19
20  # σ is the *only* unknown: a leaf tensor that tracks gradients.
21  sigma = torch.tensor(0.20, dtype=torch.float64, requires_grad=True)
22
23  # Newton via autograd: each step gets ∂price/∂σ from torch.autograd.grad.
24  for step in range(20):
25      price = bs_call_torch(S, K, T, r, sigma)
26      err   = price - market
27      if err.abs().item() < 1e-8:
28          break
29      grad_sigma = torch.autograd.grad(price, sigma)[0]   # = Vega
30      with torch.no_grad():
31          sigma -= err / grad_sigma
32      sigma.requires_grad_(True)
33  else: raise RuntimeError("Did not converge within 20 steps.")
34  print(f"σ_IV = {sigma.item():.6f}  Newton steps = {step}")

Expected output:

σ_IV = 0.219588  Newton steps = 2

For an arithmetic-average Asian call you cannot write a closed-form Vega. The arithmetic average of a geometric Brownian motion is not log-normal, so there is no nice algebra. But you can still write the price as a Monte-Carlo estimate and reparameterise the path generator so σ is a leaf tensor. Autograd then differentiates the Monte-Carlo estimator path by path (the pathwise method), and the exact same Newton loop above keeps working. Automatic differentiation of this kind is how many derivatives desks compute Greeks for payoffs that are differentiable along each path. Discontinuous payoffs such as digitals need extra care.
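Here is a minimal sketch of that idea under stated assumptions: 12 equally spaced monitoring dates, 100,000 paths, and a fixed random seed. The fixed seed means every call to the pricer reuses the same normal draws (common random numbers), which makes the estimated price a smooth, deterministic function of σ. The function names are hypothetical. As a sanity check, a single monitoring date turns the Asian call into a European one, whose Monte-Carlo Vega should land near the closed-form 19.64.

```python
import math
import torch

def mc_asian_call(S, K, T, r, sigma, n_paths=100_000, n_steps=12, seed=0):
    """Monte-Carlo price of an arithmetic-average Asian call (a sketch).

    The normal draws are fixed by `seed` (common random numbers), and sigma
    enters the simulated paths only through differentiable tensor ops, so
    torch.autograd can differentiate the estimator path by path.
    """
    gen = torch.Generator().manual_seed(seed)
    Z = torch.randn(n_paths, n_steps, generator=gen, dtype=torch.float64)
    dt = T / n_steps
    # Risk-neutral GBM log-increments: (r - sigma^2/2) dt + sigma sqrt(dt) Z.
    log_incr = (r - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * Z
    path = S * torch.exp(torch.cumsum(log_incr, dim=1))   # price at each monitoring date
    payoff = torch.clamp(path.mean(dim=1) - K, min=0.0)   # arithmetic-average call payoff
    return math.exp(-r * T) * payoff.mean()

def mc_implied_vol(target, S, K, T, r, sigma0=0.20, tol=1e-8, max_iter=50):
    """The same Newton loop as the closed-form solver; Vega comes from autograd."""
    sigma = torch.tensor(sigma0, dtype=torch.float64, requires_grad=True)
    for i in range(max_iter):
        price = mc_asian_call(S, K, T, r, sigma)
        err = price.item() - target
        if abs(err) < tol:
            return sigma.item(), i
        (vega,) = torch.autograd.grad(price, sigma)   # pathwise Monte-Carlo Vega
        with torch.no_grad():
            sigma -= err / vega
    raise RuntimeError("did not converge")

# Sanity check: with one monitoring date the Asian call is a European call.
s = torch.tensor(0.20, dtype=torch.float64, requires_grad=True)
euro_price = mc_asian_call(100.0, 100.0, 0.25, 0.05, s, n_steps=1)
(euro_vega,) = torch.autograd.grad(euro_price, s)
print(f"MC European price = {euro_price.item():.3f}, Vega = {euro_vega.item():.2f}")

# Hypothetical "market" quote, produced by the same estimator at sigma = 25%.
sigma_true = torch.tensor(0.25, dtype=torch.float64)
target = mc_asian_call(100.0, 100.0, 0.25, 0.05, sigma_true).item()
iv, steps = mc_implied_vol(target, 100.0, 100.0, 0.25, 0.05)
print(f"recovered sigma = {iv:.6f} after {steps} Newton steps")
```

Because the "market" quote is manufactured by the same estimator, the solver should recover 0.25 essentially exactly. With a real quote, the answer would also carry Monte-Carlo noise from the finite number of paths.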

From One Number to a Curve: The Volatility Smile

So far we have produced one implied volatility from one option price. But a real options chain has dozens of strikes at each expiry. Repeat the inversion at every strike $K$ and you trace out a function:

\displaystyle \sigma_{IV}(K) \;=\; \text{BS}^{-1}\!\left(C^{\text{mkt}}(K)\right)

If Black-Scholes were correct, this function would be a constant: every strike would imply the same $\sigma$, because the model assumes a single volatility for the underlying asset. The picture would be a flat horizontal line.
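A quick way to see what "flat" means is to generate hypothetical quotes from Black-Scholes itself, with one σ = 20% for every strike, and then invert each quote. The recovered IVs all coincide. The sketch inverts by bisection simply because bisection needs no derivative; the bracket [0.01%, 500%] is an assumption, and any of the solvers above would do.

```python
import math

def bs_call(S, K, T, r, sigma):
    """Black-Scholes call price (no dividends)."""
    sqT = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqT)
    d2 = d1 - sigma * sqT
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def iv_bisect(price, S, K, T, r, lo=1e-4, hi=5.0, iters=100):
    """Invert bs_call in sigma by bisection (C is increasing in sigma)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

S, T, r = 100.0, 0.25, 0.05
strikes = [80, 90, 100, 110, 120]
# Hypothetical quotes generated by Black-Scholes with one sigma for every strike.
prices = {K: bs_call(S, K, T, r, 0.20) for K in strikes}
smile = {K: iv_bisect(prices[K], S, K, T, r) for K in strikes}
for K in strikes:
    print(f"K = {K:>3}: price = {prices[K]:8.4f}   sigma_IV = {smile[K]:.6f}")
```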

It is not. Across every liquid options market in the world, $\sigma_{IV}(K)$ is a curve. On equity indices it slopes downward (puts are expensive relative to BS). On currencies it's a near-symmetric smile. On commodities you see both, often blended. The shape is called the volatility smile when symmetric, or the volatility skew when one-sided.

Interactive: Build Your Own Smile

Switch between the four shapes below. The dashed blue line is the flat Black-Scholes assumption. The orange curve is the market. Hover over the strike axis to read the implied vol at any point.

[Interactive figure: implied volatility versus strike. Default custom settings: ATM vol (σ at K = S) 22.0%, skew (slope at ATM) −0.35, curvature 0.60.]

Dashed blue: the Black-Scholes world — one σ for every strike. Orange: the market. Hover anywhere along the strike axis to read an implied volatility. The skew preset mimics what you see on the S&P 500 and most single stocks (puts are more expensive than the BS model predicts). The smile preset mimics currencies and many commodities (out-of-the-money options on both sides are more expensive than ATM). Switch to custom and shape your own.

In the custom preset, the curve is a quadratic in log-moneyness:

\displaystyle \sigma_{IV}(K) \;\approx\; \sigma_{ATM} \;+\; \beta \,\log(K/S) \;+\; \gamma \,\log^2(K/S).

Three numbers capture the broad shape of most single-expiry smiles seen in practice: the ATM level, the skew (slope at ATM), and the curvature. Real market-makers parametrize the smile more carefully (SVI, SABR, …) to keep control of no-arbitrage conditions. The spirit is the same: compress the whole smile into a handful of interpretable knobs.
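The custom-preset formula is a one-liner. This sketch evaluates it with the default knob values shown for the interactive figure (ATM 22%, skew −0.35, curvature 0.60); the function name is illustrative.

```python
import math

def quadratic_smile(K, S, sigma_atm, beta, gamma):
    """sigma_IV(K) ≈ sigma_ATM + beta*log(K/S) + gamma*log(K/S)**2 (illustrative)."""
    x = math.log(K / S)   # log-moneyness
    return sigma_atm + beta * x + gamma * x * x

S = 100.0
for K in (80, 90, 100, 110, 120):
    print(f"K = {K:>3}: sigma_IV = {quadratic_smile(K, S, 0.22, -0.35, 0.60):.4f}")
```

With a negative β the low strikes sit above the ATM level and the high strikes below it, which is the equity-style skew. The positive γ lifts both wings.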

Why the Smile Exists — Black-Scholes Is Wrong

The smile is not a flaw in the implied-vol algorithm. It is a feature of reality that Black-Scholes' assumptions cannot accommodate. Recall the assumptions:

Stock returns are log-normal.
Volatility $\sigma$ is constant (not random, not strike-dependent, not time-dependent).
Trading is continuous — no jumps, no gaps.
The risk-free rate is constant and known.

Each of these is wrong in a specific way that bends the smile in a specific direction:

Real-world feature	What it does to OTM puts	What it does to OTM calls	Resulting smile shape
Fat tails (kurtosis > 3)	More expensive	More expensive	Symmetric smile
Leverage effect (vol ↑ when price ↓)	More expensive	Slightly cheaper	Downward skew
Crash-o-phobia (jump risk on the downside)	Much more expensive	About the same	Steep downward skew
Stochastic volatility (σ is random)	More expensive	More expensive	Smile (mild)

Equity indices show all four to varying degrees, but the leverage and crash effects dominate, producing the famous downward skew. In FX neither direction is intrinsically the downside: a rally in USD/JPY is a sell-off in JPY/USD. As a result, many currency pairs show a near-symmetric smile.
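To see the stochastic-volatility row of the table in action, here is a toy sketch. It assumes volatility is drawn once, at time zero, as 10% or 30% with equal probability and independently of the stock's Brownian motion. This two-state mixture is an illustrative stand-in, not a model from the text. Conditional on σ, the stock is an ordinary Black-Scholes stock, so the call price is the probability-weighted average of two Black-Scholes prices. Inverting those prices with a single-σ Black-Scholes gives a smile rather than a flat line.

```python
import math

def bs_call(S, K, T, r, sigma):
    """Black-Scholes call price (no dividends)."""
    sqT = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqT)
    d2 = d1 - sigma * sqT
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def iv_bisect(price, S, K, T, r, lo=1e-4, hi=5.0, iters=100):
    """Invert bs_call in sigma by bisection (the bracket is an assumption)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mixture_call(S, K, T, r, sigmas=(0.10, 0.30), weights=(0.5, 0.5)):
    """Hypothetical two-state volatility: sigma is drawn once from `sigmas`
    with probabilities `weights`, independent of the Brownian motion, so the
    call price is the weighted average of Black-Scholes prices."""
    return sum(w * bs_call(S, K, T, r, s) for s, w in zip(sigmas, weights))

S, T, r = 100.0, 0.25, 0.05
smile = {K: iv_bisect(mixture_call(S, K, T, r), S, K, T, r) for K in (80, 90, 100, 110, 120)}
for K, iv in smile.items():
    print(f"K = {K:>3}: sigma_IV = {iv:.4f}")
```

At the money, the mixture price is close to the 20% Black-Scholes price. In both wings the 30% state dominates the option value, so the implied vols rise on either side: fat tails show up as a smile.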

The market's confession. The volatility smile is the option market openly saying: "We don't believe the log-normal assumption." The shape of the smile is exactly the correction the market applies to BS prices to compensate.

Skew, Smile, and Term Structure

Three largely independent axes of departure from the flat BS world:

1. Skew

The slope of $\sigma_{IV}$ at the ATM strike:

\displaystyle \text{Skew} \;=\; \left.\frac{\partial \sigma_{IV}}{\partial \log K}\right|_{K = S}

For S&P 500 options this number is typically around $-0.2$ to $-0.5$: a strike 1% lower implies an IV roughly 0.2 to 0.5 vol points (percentage points) higher. Negative skew means downside protection is expensive.

2. Curvature (the "wings")

The second derivative $\partial^2 \sigma_{IV} / \partial (\log K)^2$ at the ATM strike. Positive curvature ⇒ both wings rise above ATM ⇒ tail risk is priced in.

3. Term structure

Implied volatility also depends on time-to-expiry $T$. Short-dated ATM IV reacts more to recent events; long-dated ATM IV moves less and tends to revert toward a long-run average level. Plotting $\sigma_{IV}(K, T)$ for a grid of strikes and expiries gives the implied volatility surface, the central object of every options desk's morning meeting.

Surface, not smile. The smile is one slice (fixed $T$) through the surface. A full surface is a function of two arguments, and how it deforms from day to day is a central research subject of stochastic volatility and local volatility models (see Heston, Dupire, SABR).
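Given implied vols on a strike grid, the skew and curvature defined above can be estimated with central finite differences in log K. This sketch checks the estimator on the quadratic smile of the custom preset. For that parametrisation, the slope at ATM equals β and the second derivative equals 2γ. Assuming the figure's curvature knob is γ, a setting of 0.60 therefore corresponds to a second derivative of 1.2. The step h = 0.01, one log-point of strike, is an arbitrary choice.

```python
import math

def skew_and_curvature(iv_of_logk, S, h=0.01):
    """Central finite differences of sigma_IV in x = log K at the ATM point x0 = log S.

    iv_of_logk maps log-strike to implied vol (e.g. an interpolated market
    smile); the step h is an arbitrary choice for this sketch.
    """
    x0 = math.log(S)
    up, mid, dn = iv_of_logk(x0 + h), iv_of_logk(x0), iv_of_logk(x0 - h)
    skew = (up - dn) / (2.0 * h)                 # d sigma_IV / d log K
    curvature = (up - 2.0 * mid + dn) / (h * h)  # d^2 sigma_IV / d (log K)^2
    return skew, curvature

# Test input: the quadratic smile sigma_ATM + beta*m + gamma*m^2 with m = log(K/S).
S, atm, beta, gamma = 100.0, 0.22, -0.35, 0.60
quad = lambda x: atm + beta * (x - math.log(S)) + gamma * (x - math.log(S)) ** 2
skew, curv = skew_and_curvature(quad, S)
print(f"skew = {skew:.4f} (beta = {beta}), curvature = {curv:.4f} (2*gamma = {2 * gamma})")
```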

Application: Reading the Market's Mind

Once you can compute $\sigma_{IV}(K)$ for every listed strike, you have a window into the risk-neutral distribution of returns that the market is implicitly pricing in.

Breeden-Litzenberger (1978). If $C(K)$ is the market price of a call with strike $K$, then the second derivative with respect to strike,

\displaystyle q(K) \;=\; e^{rT} \, \frac{\partial^2 C}{\partial K^2},

is the risk-neutral density of $S_T$. A flat smile would produce a perfect log-normal density. A downward skew produces a density with a fat left tail and a thin right tail: exactly the "crashes are worse than rallies" pattern equity investors fear.
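The formula can be checked numerically. This sketch feeds it hypothetical flat-vol Black-Scholes quotes (σ = 20%) and takes a central second difference in strike with step dK = 0.01, an arbitrary choice. It then compares the result with the log-normal density of $S_T$ that Black-Scholes assumes. With real quotes you would first smooth or fit the smile, because second differences amplify noise.

```python
import math

def bs_call(S, K, T, r, sigma):
    """Black-Scholes call price (no dividends)."""
    sqT = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqT)
    d2 = d1 - sigma * sqT
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def bl_density(call, K, T, r, dK=0.01):
    """Breeden-Litzenberger: q(K) = e^{rT} * d^2C/dK^2 via a central second difference."""
    return math.exp(r * T) * (call(K + dK) - 2.0 * call(K) + call(K - dK)) / dK**2

def lognormal_pdf(x, S, T, r, sigma):
    """Risk-neutral density of S_T under Black-Scholes: log S_T ~ N(mu, s^2)."""
    mu = math.log(S) + (r - 0.5 * sigma**2) * T
    s = sigma * math.sqrt(T)
    return math.exp(-((math.log(x) - mu) ** 2) / (2.0 * s * s)) / (x * s * math.sqrt(2.0 * math.pi))

S, T, r, sigma = 100.0, 0.25, 0.05, 0.20
call = lambda K: bs_call(S, K, T, r, sigma)   # hypothetical flat-vol quotes
for K in (80, 90, 100, 110, 120):
    print(f"K = {K:>3}: Breeden-Litzenberger {bl_density(call, K, T, r):.6f}   "
          f"log-normal {lognormal_pdf(K, S, T, r, sigma):.6f}")
```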

So traders use the smile in several practical ways:

Quoting. Option quotes are routinely displayed as bid IV / ask IV alongside the dollar bid and ask. Two different option contracts can have wildly different dollar prices but compare directly via their IVs.
Risk management. When the smile steepens, the market is paying up for tail protection — a signal for risk desks to widen Value-at-Risk bands.
Relative-value trades. If two related options have a smile-inconsistent IV gap, sell the rich one and buy the cheap one — the classic skew trade.
Model calibration. Local-vol and stochastic-vol models are calibrated so that the prices they imply reproduce today's entire smile surface. Then exotic options are priced consistently with the vanilla market.

VIX, in one sentence. The VIX index is computed from a weighted sum of out-of-the-money S&P 500 option prices (puts below the forward, calls above) over a 30-day horizon, which makes it a single scalar summary of the whole short-dated smile. When the VIX spikes, the smile got more expensive across the board.
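For the mechanics, here is a single-expiry sketch of the variance formula published in the Cboe VIX methodology:

$\sigma^2 = \frac{2}{T}\sum_i \frac{\Delta K_i}{K_i^2} e^{rT} Q(K_i) - \frac{1}{T}\left(\frac{F}{K_0} - 1\right)^2$

Here $Q(K_i)$ is the out-of-the-money option price at strike $K_i$: puts below $K_0$, calls above, and the put-call average at $K_0$. $K_0$ is the first strike at or below the forward $F$. The real index uses bid-ask mid quotes and interpolates two expiries to 30 days. In this sketch the quotes are hypothetical flat-vol Black-Scholes prices, so the formula should hand back roughly the flat σ.

```python
import math

def bs_call(S, K, T, r, sigma):
    """Black-Scholes call price (no dividends)."""
    sqT = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqT)
    d2 = d1 - sigma * sqT
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def bs_put(S, K, T, r, sigma):
    """Put via put-call parity: P = C - S + K e^{-rT}."""
    return bs_call(S, K, T, r, sigma) - S + K * math.exp(-r * T)

def vix_style_vol(strikes, put, call, F, T, r):
    """Single-expiry variance formula in the style of the Cboe VIX methodology.

    strikes: sorted strike list; put/call: functions strike -> option price;
    F: forward price. Returns the annualised volatility (not multiplied by 100).
    """
    K0 = max(k for k in strikes if k <= F)          # first strike at or below F
    total = 0.0
    for i, K in enumerate(strikes):
        if i == 0:
            dK = strikes[1] - strikes[0]
        elif i == len(strikes) - 1:
            dK = strikes[-1] - strikes[-2]
        else:
            dK = 0.5 * (strikes[i + 1] - strikes[i - 1])
        # Out-of-the-money quote: puts below K0, calls above, the average at K0.
        if K < K0:
            Q = put(K)
        elif K > K0:
            Q = call(K)
        else:
            Q = 0.5 * (put(K) + call(K))
        total += dK / K**2 * math.exp(r * T) * Q
    variance = 2.0 / T * total - (F / K0 - 1.0) ** 2 / T
    return math.sqrt(variance)

S, T, r, sigma = 100.0, 30 / 365, 0.05, 0.20
F = S * math.exp(r * T)
strikes = [50.0 + 0.5 * i for i in range(201)]      # 50.0, 50.5, ..., 150.0
vol = vix_style_vol(strikes,
                    lambda K: bs_put(S, K, T, r, sigma),
                    lambda K: bs_call(S, K, T, r, sigma), F, T, r)
print(f"recovered vol = {vol:.4f} from flat-vol quotes at sigma = {sigma}")
```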

Summary

Implied volatility is the single most important number on an options screen. Mathematically it is the root of a smooth, monotonic equation. Computationally it is a five-line Newton solver. Economically it is the market's consensus forecast of future volatility.

Concept	What it is	Why it matters
σ_IV	The σ that satisfies BS(σ) = C_market	A normalized price every trader compares across strikes
Vega ν > 0	∂BS/∂σ is strictly positive	Guarantees a unique σ_IV — calculus is the proof
Newton on BS	σ_{n+1} = σ_n − (BS−C)/ν	Quadratic convergence; 2–4 steps for any sensible seed
Autograd IV	∂price/∂σ via torch.autograd.grad	Generalizes to exotics with no closed-form Vega
Volatility smile	σ_IV(K) is not flat	The market saying log-normal returns are wrong
Skew	Slope of σ_IV at ATM	Measures crash-o-phobia / leverage effect
Vol surface	σ_IV(K, T)	Inputs to every modern derivatives pricing model

Take-aways:

Implied vol turns Black-Scholes from a forward map (inputs → price) into a two-way translator (price ↔ a normalised vol number).
The whole inversion is justified by one calculus fact: $\nu = \partial C / \partial \sigma > 0$. Together with the boundary limits, that positivity gives existence and uniqueness. Because ν is nonzero at the root, Newton also converges quadratically near it.
Newton is the right algorithm here because $C(\sigma)$ is smooth, monotonic, and differentiable in closed form. Two iterations from a 20% seed almost always suffice.
PyTorch autograd doesn't change the calculus — it just frees you from writing the Vega line. That generalises immediately to any payoff you can express in differentiable torch ops, including Monte-Carlo exotics.
The volatility smile is the option market openly disagreeing with BS's constant-σ assumption. Its shape encodes fat tails, leverage, and jump risk — the things BS leaves out.
The full implied-vol surface $\sigma_{IV}(K, T)$ is the central state variable of every options desk: pricing, hedging, risk, and exotic calibration all flow from it.

Looking ahead. In the next section we leave closed-form pricing behind and use Monte Carlo to price options — simulating thousands of stock paths, averaging payoffs, and discounting back. The same autograd trick we used for IV will let us extract Greeks from Monte-Carlo prices essentially for free.