
Implied Volatility and the Volatility Smile

The Black-Scholes Equation

Learning Objectives

By the end of this section, you will be able to:

  1. Explain what implied volatility is and why it inverts the Black-Scholes relationship between price and $\sigma$.
  2. Prove (intuitively) that implied volatility exists and is unique by appealing to $\nu = \partial C/\partial \sigma > 0$.
  3. Apply Newton's method on the BS price curve to back out $\sigma_{IV}$ from a market price by hand.
  4. Implement the solver in plain Python and again with PyTorch autograd — and recognise that the two implementations do exactly the same calculus.
  5. Recognise the volatility smile and skew as a fingerprint of the assumptions that Black-Scholes gets wrong.
  6. Read a real option chain's implied-vol curve and explain what each shape tells you about the market's beliefs.

The Big Picture: Inverting Black-Scholes

Up to this point Black-Scholes has been a one-way machine. Plug in five inputs (spot $S$, strike $K$, time $T$, rate $r$, and volatility $\sigma$) and out comes a price $C$.

$$C = \text{BS}(S, K, T, r, \sigma)$$

But on any options exchange the situation is reversed. The market broadcasts the price $C^{\text{mkt}}$ every second. Four of the five inputs (spot, strike, time, rate) are observed directly. The only thing the market does not hand us is $\sigma$. So we ask the reverse question:

Given a market price $C^{\text{mkt}}$, what value of $\sigma$ makes the Black-Scholes formula spit out exactly that price?

That single number is the option market's answer to "how volatile do you think the stock will be from now until expiry?". The number itself is called the implied volatility, and the whole machinery of modern options trading rests on it.

Analogy. Implied vol is to options what a yield is to bonds. Nobody quotes a bond by its present value; they quote it by the yield-to-maturity that produces that present value. Likewise nobody quotes an option by its dollar price; they quote it by the $\sigma$ that produces that dollar price through Black-Scholes.

Definition: Implied Volatility

Fix $S, K, T, r$ and the observed market price $C^{\text{mkt}}$. The implied volatility $\sigma_{IV}$ is the value that solves the equation

$$\text{BS}(S, K, T, r, \sigma_{IV}) = C^{\text{mkt}}.$$

Look at what the equation says, in words: I have a function of one variable $\sigma$, namely $f(\sigma) = \text{BS}(\sigma) - C^{\text{mkt}}$, and I want the root of that function. Pure calculus.

Why not just "measure" volatility? You can measure realised volatility (the standard deviation of past returns), and people do — but that is backward-looking. Implied volatility is forward-looking: it is the market's consensus forecast of volatility over the option's remaining life. The two are almost never equal, and the gap is itself a tradable spread.
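To make the backward-looking side of that comparison concrete, here is a minimal sketch of an annualised realised-volatility estimate. The helper name `realised_vol`, the 252-trading-day convention, and the list of closes are all assumptions made up for illustration; none of them come from this section.

```python
import math

def realised_vol(closes, periods_per_year=252):
    """Annualised sample standard deviation of log returns."""
    rets = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    mean = sum(rets) / len(rets)
    var = sum((x - mean) ** 2 for x in rets) / (len(rets) - 1)
    return math.sqrt(var * periods_per_year)

# Hypothetical daily closes, invented purely for illustration.
closes = [100.0, 101.2, 100.5, 102.1, 101.7, 103.0, 102.4, 104.1]
rv = realised_vol(closes)
print(f"realised vol = {rv:.2%}")
```

Nothing in this calculation looks at option prices, which is exactly why it can disagree with the implied number.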

Why IV Is Unique — Vega Is the Reason

How do we know the equation $\text{BS}(\sigma) = C^{\text{mkt}}$ has a solution at all, and that the solution is unique? One word: Vega.

Recall from section-06 that for a European call,

$$\nu = \frac{\partial C}{\partial \sigma} = S \, \varphi(d_1) \, \sqrt{T} > 0$$

for every $\sigma > 0$. The price $C(\sigma)$ is therefore a continuous, strictly increasing function of $\sigma$ on $(0, \infty)$. Add the two boundary facts:

  • As $\sigma \to 0$, the call price approaches its lower bound $\max(S - K e^{-rT}, 0)$, the intrinsic value measured against the discounted strike.
  • As $\sigma \to \infty$, the call price approaches the spot $S$ (the option becomes worth as much as the stock itself).

With these limits in hand, the Intermediate Value Theorem gives us exactly what we wanted:

Existence + uniqueness of implied volatility. For any market price inside the no-arbitrage band $\max(S - K e^{-rT}, 0) < C^{\text{mkt}} < S$, there is a unique $\sigma_{IV} > 0$ such that $\text{BS}(\sigma_{IV}) = C^{\text{mkt}}$.

This is the entire mathematical justification for talking about "the" implied volatility. It is a one-sentence consequence of Vega being positive, which itself is a one-line consequence of $\varphi(d_1)$ being positive.
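The argument can be sanity-checked numerically. The sketch below is an illustration under assumed inputs (the S = K = 100, T = 0.25, r = 5% example used later in this section, and an arbitrary grid from 1% to 500% vol); the function name `bs_call` is a local helper, not part of any library. It checks that the price is strictly increasing on the grid and stays inside the no-arbitrage band.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """Black-Scholes European call price (no dividends)."""
    sq_t = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sq_t)
    d2 = d1 - sigma * sq_t
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

S, K, T, r = 100.0, 100.0, 0.25, 0.05
lower = max(S - K * math.exp(-r * T), 0.0)   # limit as sigma -> 0
upper = S                                     # limit as sigma -> infinity

sigmas = [0.01 * k for k in range(1, 501)]    # 1% ... 500%
prices = [bs_call(S, K, T, r, s) for s in sigmas]

increasing = all(a < b for a, b in zip(prices, prices[1:]))
inside = all(lower < p < upper for p in prices)
print(f"band: ({lower:.4f}, {upper:.1f})")
print(f"strictly increasing: {increasing}, inside band: {inside}")
print(f"C at 0.1% vol = {bs_call(S, K, T, r, 0.001):.4f}")
print(f"C at 2000% vol = {bs_call(S, K, T, r, 20.0):.4f}")
```

A grid check is not a proof, of course; the proof is the sign of Vega. The check is only a quick way to see the two limits and the monotone climb between them.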


Interactive: Inverting the BS Curve

Here is the situation drawn explicitly. The blue curve is $C(\sigma) = \text{BS}(S, K, T, r, \sigma)$. The dashed yellow horizontal line is the market price $C^{\text{mkt}}$. The intersection, marked by the green vertical line, is $\sigma_{IV}$.

[Interactive figure: Black-Scholes call price against σ, with Newton iterates. Default controls: spot S = 100.00, strike K = 100.00, time T = 0.25 yr, rate r = 5.0%, market price = 5.00, start σ₀ = 40%. Initial readout at step 0: σ = 40.000%, C(σ) = 8.5526, err = 3.5526.]

Blue curve: the Black-Scholes call price as a function of σ. Dashed yellow line: the market-quoted price you are trying to match. Each red dot is a Newton iterate; the short red line is the tangent (slope = vega). Where that tangent crosses the yellow line is the next iterate. Increase volatility ⇒ the curve goes up and eventually stalls (deep-OTM behavior). Hit Converge to see the algorithm finish in two or three steps for almost any reasonable starting σ₀.

Drag the "market price" slider up and down. The green vertical line slides with it. Notice three things:

  1. The blue curve is monotonic — it never has two crossings, so the implied vol is unique.
  2. Far from at-the-money the curve flattens, so a small price change translates into a large change in σ. That is why deep-OTM IVs are noisy — Vega is small there.
  3. The red tangent is the linear approximation Newton's method uses. The next iterate is where that tangent crosses the yellow line. Two steps almost always suffice.

Newton's Method on the BS Price Curve

Newton-Raphson on the function $f(\sigma) = \text{BS}(\sigma) - C^{\text{mkt}}$ is the standard solver. The update rule is the one-line classic:

$$\sigma_{n+1} = \sigma_n - \frac{f(\sigma_n)}{f'(\sigma_n)} = \sigma_n - \frac{\text{BS}(\sigma_n) - C^{\text{mkt}}}{\nu(\sigma_n)}.$$

Each iteration: evaluate price and Vega at the current σ, take the Newton step, repeat until the error is below tolerance. The guarantee is quadratic convergence near the root — every iteration roughly squares the error.

Why Newton instead of bisection? Bisection always works but converges only linearly. On Black-Scholes the function is smooth, monotonic, and only gently curved, and we already have its derivative in closed form. There is little reason not to use Newton: you get that 2-iteration speed at almost no robustness cost.

For a deeply out-of-the-money option, Vega can be tiny, so the division blows up. The practical fix is to seed Newton sensibly (e.g. with the Brenner-Subrahmanyam approximation $\sigma_0 \approx \sqrt{2\pi/T} \cdot C^{\text{mkt}}/S$) or fall back to bisection if a Newton step pushes $\sigma$ outside a sensible band.
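One way to combine both fixes is sketched below: seed with the Brenner-Subrahmanyam formula, keep a bracket around the root, and replace any Newton step that leaves the bracket with a bisection step. The function names, the bracket [1e-4, 5.0], and the tolerances are assumptions chosen for illustration, not values prescribed by this section.

```python
import math

def bs_call_and_vega(S, K, T, r, sigma):
    """Black-Scholes call price and Vega."""
    sq_t = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sq_t)
    d2 = d1 - sigma * sq_t
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    price = S * cdf(d1) - K * math.exp(-r * T) * cdf(d2)
    vega = S * math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi) * sq_t
    return price, vega

def implied_vol_safe(market, S, K, T, r, lo=1e-4, hi=5.0, tol=1e-10, max_iter=100):
    """Newton with a bisection fallback; sigma never leaves [lo, hi]."""
    # Brenner-Subrahmanyam seed, clipped into the bracket.
    sigma = min(max(math.sqrt(2.0 * math.pi / T) * market / S, lo), hi)
    for _ in range(max_iter):
        price, vega = bs_call_and_vega(S, K, T, r, sigma)
        err = price - market
        if abs(err) < tol:
            return sigma
        # Price increases with sigma, so the sign of err says which side the root is on.
        if err > 0:
            hi = sigma
        else:
            lo = sigma
        newton = sigma - err / vega if vega > 1e-12 else float("nan")
        # Keep the Newton step only if it lands strictly inside the bracket
        # (a NaN fails the comparison and triggers bisection).
        sigma = newton if lo < newton < hi else 0.5 * (lo + hi)
    raise RuntimeError("No convergence; is the price inside the no-arbitrage band?")

atm = implied_vol_safe(5.00, 100.0, 100.0, 0.25, 0.05)
# Deep-OTM round trip: price a K = 150 call at 30% vol, then invert it.
target, _ = bs_call_and_vega(100.0, 150.0, 0.25, 0.05, 0.30)
otm = implied_vol_safe(target, 100.0, 150.0, 0.25, 0.05)
print(f"ATM IV = {atm:.6f}, deep-OTM round trip = {otm:.6f}")
```

In the deep-OTM case the seed is far too small and Vega underflows to zero there, so the first move is a bisection step; after that Newton takes over again. The bracket costs two comparisons per iteration, which is a small price for never dividing by a vanishing Vega.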


Worked Example (Try It By Hand)

Same inputs we used to derive the Greeks in section-06: $S = 100$, $K = 100$, $T = 0.25$ years, $r = 0.05$. Today the market is quoting an ATM call at $C^{\text{mkt}} = 5.00$. What is the implied volatility?


Step 0: seed. Take $\sigma_0 = 0.20$. At this seed,

$$d_1 = 0.175, \quad d_2 = 0.075, \quad N(d_1) \approx 0.5695, \quad N(d_2) \approx 0.5299, \quad \varphi(d_1) \approx 0.3929.$$

Price at σ = 0.20:

$$C = 100 \cdot 0.5695 - 100 \cdot e^{-0.0125} \cdot 0.5299 \approx 56.946 - 52.331 = 4.615.$$

Vega at σ = 0.20:

$$\nu = S \, \varphi(d_1) \, \sqrt{T} = 100 \cdot 0.3929 \cdot 0.5 \approx 19.64.$$

Error: $f(\sigma_0) = 4.615 - 5.000 = -0.385$. The model is below the market, so we need more volatility.

Newton step:

$$\sigma_1 = 0.20 - \frac{-0.385}{19.64} = 0.20 + 0.01960 = 0.21960.$$

Step 1: verify. Re-evaluate at $\sigma_1 = 0.21960$. Now

$$d_1 = \frac{0 + (0.05 + 0.5 \cdot 0.04822) \cdot 0.25}{0.21960 \cdot 0.5} = \frac{0.01853}{0.10980} \approx 0.1687,$$

$$d_2 = 0.1687 - 0.10980 \approx 0.0589, \quad N(d_1) \approx 0.5670, \quad N(d_2) \approx 0.5235.$$

$$C = 100 \cdot 0.5670 - 100 \cdot e^{-0.0125} \cdot 0.5235 = 56.70 - 51.70 = 5.000.$$

That is the market price to three decimals. At this precision Newton converged in one step from a generic 20% seed because the curve is so close to linear over this range.

Answer: $\sigma_{IV} \approx 0.2196 = 21.96\%$.

What just happened geometrically: starting from $(0.20, 4.615)$ on the BS curve, we drew the tangent of slope 19.64. That tangent crosses the horizontal line $C = 5.00$ at $\sigma = 0.2196$, and that point is so close to the true root that the next BS evaluation already agrees to three decimals.


Plain Python: Newton Solver for IV

Here is the same algorithm, written in plain Python with only the standard math module. The code is short because the idea is short.

Newton-Raphson solver for implied volatility
🐍implied_vol.py
1Just the math module

Everything we need — log, sqrt, exp, erf, pi — lives in the standard library. We deliberately avoid scipy here so the example is dependency-free and easy to drop into a notebook.

3BS call + Vega in one helper

Every Newton step needs the price (to measure error) and Vega (to take the step). Returning both from the same call avoids redundant work — both d₁ and d₂ are reused.

5Square root of T

√T appears in d₁, d₂, AND in Vega = S·φ(d₁)·√T. Compute it once and store it.

EXAMPLE
T = 0.25 → sqT = 0.5
6Compute d₁

Same d₁ we used in section-04 to derive BS. Note how the volatility σ appears in both the numerator (via 0.5σ² T) AND the denominator (σ√T) — that is what makes the equation interesting to invert.

EXAMPLE
S=K=100, r=0.05, σ=0.20, T=0.25 → d₁ = (0 + 0.0175)/(0.1) = 0.175
7Compute d₂

d₂ = d₁ − σ√T. Shifted one standard deviation lower. N(d₂) is the risk-neutral probability of finishing in-the-money.

EXAMPLE
d₂ = 0.175 − 0.10 = 0.075
8Standard normal CDF at d₁

math.erf gives the error function; the standard-normal CDF is (1 + erf(x/√2)) / 2. No SciPy needed.

EXAMPLE
N(0.175) ≈ 0.5695
9Standard normal CDF at d₂

Same calculation, evaluated at d₂. N(d₂) is the risk-neutral exercise probability that weights the discounted strike in the call price.

EXAMPLE
N(0.075) ≈ 0.5299
10Standard normal PDF at d₁

φ(d₁) is the bell-curve height at d₁. It will appear in Vega; both Gamma and Vega vanish when d₁ is far from zero because φ does.

EXAMPLE
φ(0.175) ≈ 0.3929
11Call price

C = S·N(d₁) − K·e^{−rT}·N(d₂). The closed-form Black-Scholes call. This is the quantity we will match against the market.

EXAMPLE
100·0.5695 − 100·e^{−0.0125}·0.5299 ≈ 4.615
12Vega = ∂C/∂σ

Closed-form Vega: ν = S·φ(d₁)·√T. This is the derivative we hand to Newton. Notice ν > 0 strictly — that is the entire reason the inversion is well-posed.

EXAMPLE
100·0.3929·0.5 ≈ 19.64
15Define implied_vol(...)

Inputs: the market-observed price plus the four Black-Scholes inputs other than σ (which we are solving for). Output: the implied volatility and how many Newton iterations it took.

17Default σ₀ = 20%

A reasonable seed for equity options. The basin of attraction for Newton on BS is enormous because the price-vs-σ curve is monotonic and only gently curved; almost any seed in [5%, 80%] works.

18Default tolerance 1e-8 and 50 iterations

Newton converges quadratically near the root, so 50 iterations is overkill. Most cases finish in 2–4. The tight tolerance is so the solver also works for very small Vegas in deep-OTM cases.

19Initialize the iterate

sigma will be overwritten every loop iteration. We seed it from sigma0 once.

20Newton loop

Each pass: evaluate price + Vega at the current σ, take one Newton step, check convergence. We bail early on tol or on a vanishing Vega.

21Single combined call

We reuse the helper. Price tells us the error; Vega tells us the slope. Both come from the same d₁ — no wasted work.

22Signed error

err = model − market. Positive means we are above the market and σ must come down; negative means we are below and σ must go up. Newton handles the sign automatically through the division.

EXAMPLE
price=4.615, market=5.00 → err = −0.385
23Convergence check

Compare absolute error to tol. As soon as the model price is within 1e-8 of the market we return — there is no point in chasing more digits than the bid-ask spread.

24Return σ and iteration count

Returning the iteration count is useful both as a sanity check (1 means luck, >10 means something is wrong with the inputs) and for plotting convergence experiments.

25Guard: vanishing Vega

If σ wanders to a region where Vega ≈ 0 (deep-OTM or near-zero T), Newton's update blows up. Raising explicitly is much safer than silently producing a NaN.

27Newton update

σ_{n+1} = σ_n − f(σ_n) / f'(σ_n). For us f(σ) = BS(σ) − market and f'(σ) = Vega(σ). The max(1e-4, ...) clamp keeps us from accidentally going negative on a wild step early on.

EXAMPLE
σ = 0.20 − (−0.385)/19.64 = 0.21960
28Did not converge

If we exhaust max_iter we raise instead of returning a bad answer. For Newton-on-BS this almost never fires — it is a defensive guard for pathological inputs (no-arbitrage violations, e.g. a call price below S − Ke^{−rT}).

31Run the worked example

S=100, K=100, T=0.25, r=5%, target market price = $5.00, σ₀ = 20%. This is the same scenario we hand-traced above.

34Pretty-print result

Expected: σ_IV ≈ 0.219588 (≈ 21.96%) in 2 iterations. Quadratic convergence ⇒ each step roughly squares the error, so 2 steps suffice to take the price error from 4e-1 to below the 1e-8 tolerance.

1import math
2
3def bs_call(S, K, T, r, sigma):
4    """Black-Scholes call price and Vega."""
5    sqT = math.sqrt(T)
6    d1  = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqT)
7    d2  = d1 - sigma * sqT
8    Nd1 = 0.5 * (1 + math.erf(d1 / math.sqrt(2)))
9    Nd2 = 0.5 * (1 + math.erf(d2 / math.sqrt(2)))
10    pdf = math.exp(-0.5 * d1 * d1) / math.sqrt(2 * math.pi)
11    price = S * Nd1 - K * math.exp(-r * T) * Nd2
12    vega  = S * pdf * sqT
13    return price, vega
14
15def implied_vol(market_price, S, K, T, r,
16                sigma0=0.20, tol=1e-8, max_iter=50):
17    """Newton-Raphson search for the σ that reproduces the market price."""
18    sigma = sigma0
19    for i in range(max_iter):
20        price, vega = bs_call(S, K, T, r, sigma)
21        err = price - market_price
22        if abs(err) < tol:
23            return sigma, i
24        if vega < 1e-10:
25            raise RuntimeError("Vega vanished — Newton stalled.")
26        sigma = max(1e-4, sigma - err / vega)
27    raise RuntimeError("Did not converge within max_iter.")
28
29# Same worked example as the by-hand derivation.
30sigma, iters = implied_vol(market_price=5.00,
31                           S=100, K=100, T=0.25, r=0.05,
32                           sigma0=0.20)
33print(f"σ_IV = {sigma:.6f}  in {iters} iterations")

Running this prints:

σ_IV = 0.219588  in 2 iterations

Two iterations: the first is the by-hand Newton step above, and the second polishes the result down to the 1e-8 tolerance. Quadratic convergence on a smooth monotonic function is gorgeous.


PyTorch Autograd Solver for IV

The closed-form Vega is easy on a vanilla call. But once we move to exotic payoffs (barrier, Asian, lookback), the Vega has no clean formula. Autograd doesn't care — it computes C/σ\partial C / \partial \sigma directly from the price function you wrote.

The structural change versus the plain-Python solver is one line: instead of returning Vega from bs_call, we ask autograd for it. Everything else is identical.

Implied vol via PyTorch autograd
🐍implied_vol_torch.py
1Import torch

We use torch because we want the derivative ∂price/∂σ (which equals Vega) computed automatically. We never write a Vega formula again — autograd reproduces it from the price formula.

3Differentiable BS call

Exactly the textbook formula, but every op is a torch op. That means the chain S → d₁ → N(d₁) → price is a graph autograd can traverse.

4Docstring as contract

Single-line description so future readers know this function is meant to stay differentiable — do not silently replace torch.log with numpy.log.

5torch.sqrt for √T

Using torch.sqrt (not math.sqrt) so T can be a tensor that participates in autograd. Today we only differentiate by σ, but the same code works if you later want Theta or Rho.

6d₁ in torch

Same algebra as in the plain-Python version. The key fact: σ shows up twice, inside (r + 0.5σ²)T AND in the denominator (σ√T). Autograd handles both paths via the product / quotient rule automatically.

7d₂ from d₁

d₂ = d₁ − σ√T. d₂ depends on σ both through d₁ (chain) and directly through the subtraction. Autograd composes those gradients without any manual bookkeeping.

8Normal CDF as a torch op

torch.distributions.Normal(0, 1).cdf gives a differentiable N(x); its derivative is φ(x). This single fact is the bridge between the BS closed-form derivative chain rule and autograd.

9Compose the price

Price = S·N(d₁) − K·e^{−rT}·N(d₂). Pure torch ops top to bottom — the computation graph is complete and differentiable in every input.

12Market observation

The price quoted by the exchange. Just a constant — we are NOT differentiating with respect to it.

EXAMPLE
market = tensor(5.00)
15Static spot

Treated as a constant for this solver. Not requires_grad because we only need ∂/∂σ.

16Strike

Constant. K never moves during a single option's life — it is part of the contract.

17Time to expiry

Constant for this snapshot. (If you wanted Theta as a free byproduct, set requires_grad=True here too.)

18Risk-free rate

Constant. Updated overnight from the yield curve in practice; held fixed during a single solve.

21σ is the variable we solve for

requires_grad=True tells autograd to track this leaf. Now any computation downstream of sigma can be differentiated by it.

EXAMPLE
torch.autograd.grad(price, sigma) → Vega
24Newton loop

Same loop shape as the plain-Python version. The only structural difference: we get the derivative from autograd instead of from the Vega formula.

25Forward pass

Compute the current model price. This call also builds the autograd graph from sigma to price. We will tear that graph down when we call torch.autograd.grad on line 29.

26Signed error

err is a tensor with grad_fn — but the loop logic itself only needs its numerical value via .item().

27Convergence test

.item() pulls the Python float so we compare plain numbers against the scalar tolerance. Comparing the 0-dim tensor directly would also work inside an if statement, but the explicit float keeps the loop logic independent of autograd.

29Compute Vega via autograd

torch.autograd.grad(price, sigma)[0] returns ∂price/∂σ — which is the closed-form Vega. The [0] indexes into the returned tuple (we only asked for one derivative).

30no_grad context

The update sigma -= err / grad_sigma is an in-place mutation of a leaf tensor. Wrapping it in torch.no_grad() tells autograd 'don't track this — it is an optimizer step, not a forward computation.'

31Newton step on σ

Exactly the same update as the closed-form version: σ_{n+1} = σ_n − f(σ_n)/f'(σ_n) with f = BS − market and f' = Vega.

EXAMPLE
After step 1: sigma ≈ 0.2196
32Re-enable grad

Because the update ran under torch.no_grad(), sigma is still a leaf with requires_grad=True, so this call is a harmless safeguard rather than a necessity. The next iteration's forward pass builds a fresh graph that autograd can differentiate.

34Print the answer

Expect σ_IV ≈ 0.2196 in 2–3 steps. Identical to the plain-Python and by-hand answers — autograd just spared us from writing the Vega line.

1import torch
2
3def bs_call_torch(S, K, T, r, sigma):
4    """Differentiable BS call price."""
5    sqT = torch.sqrt(T)
6    d1  = (torch.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqT)
7    d2  = d1 - sigma * sqT
8    Ncdf = torch.distributions.Normal(0.0, 1.0).cdf
9    return S * Ncdf(d1) - K * torch.exp(-r * T) * Ncdf(d2)
10
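11# Market observation (float64, so the 1e-8 tolerance sits above machine precision).
12market = torch.tensor(5.00, dtype=torch.float64)
13
14# Static market inputs.
15S = torch.tensor(100.0, dtype=torch.float64)
16K = torch.tensor(100.0, dtype=torch.float64)
17T = torch.tensor(0.25, dtype=torch.float64)
18r = torch.tensor(0.05, dtype=torch.float64)
19
20# σ is the *only* unknown: it is a leaf with grad.
21sigma = torch.tensor(0.20, dtype=torch.float64, requires_grad=True)
22
23# Newton via autograd: each step uses ∂price/∂σ from torch.autograd.grad.
24for step in range(20):
25    price = bs_call_torch(S, K, T, r, sigma)
26    err   = price - market
27    if err.abs().item() < 1e-8:
28        break
29    grad_sigma = torch.autograd.grad(price, sigma)[0]   # = Vega
30    with torch.no_grad():
31        sigma -= err / grad_sigma
32    sigma.requires_grad_(True)
33
34print(f"σ_IV = {sigma.item():.6f}  steps = {step}")   # step = Newton updates before the break

Expected output, matching the plain-Python solver (torch.tensor defaults to float32, whose resolution near 5.0 is coarser than 1e-8, hence the explicit float64):

σ_IV = 0.219588  steps = 2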
For an arithmetic-average Asian call you cannot write a closed-form Vega: the arithmetic average of a geometric Brownian motion has no nice algebra. But you can still write the price by Monte-Carlo and reparameterise the path generator so σ is a leaf tensor. Autograd then differentiates the Monte-Carlo estimator through the path, and the exact same Newton loop above keeps working. This pathwise approach is the modern way derivatives desks compute Greeks for payoffs that are differentiable almost everywhere; discontinuous payoffs such as digitals or barriers need smoothing first.
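To show what "differentiating through the path" means, here is a dependency-free sketch that does by hand what autograd would automate: it carries $\partial S_t / \partial \sigma = S_t (W_t - \sigma t)$ along each simulated path. Everything here is an assumption for illustration (the function name `asian_call_mc`, 12 monitoring dates, 10,000 paths, the seed), and it uses the standard library rather than torch so it runs anywhere.

```python
import math
import random

def asian_call_mc(S0, K, T, r, sigma, n_steps=12, n_paths=10000, seed=0):
    """Monte-Carlo price and pathwise Vega of an arithmetic-average Asian call."""
    rng = random.Random(seed)          # fixed seed: the same paths on every call
    dt = T / n_steps
    sq_dt = math.sqrt(dt)
    disc = math.exp(-r * T)
    price_sum = vega_sum = 0.0
    for _ in range(n_paths):
        w = 0.0                        # Brownian motion W_t
        avg = davg = 0.0               # running sums of S_t and dS_t/dsigma
        for k in range(1, n_steps + 1):
            w += sq_dt * rng.gauss(0.0, 1.0)
            t = k * dt
            s = S0 * math.exp((r - 0.5 * sigma**2) * t + sigma * w)
            avg += s
            davg += s * (w - sigma * t)    # chain rule on the exponent
        avg /= n_steps
        davg /= n_steps
        if avg > K:                    # payoff max(avg - K, 0) and its derivative
            price_sum += avg - K
            vega_sum += davg
    return disc * price_sum / n_paths, disc * vega_sum / n_paths

price, vega = asian_call_mc(100.0, 100.0, 0.25, 0.05, 0.20)

# Cross-check with a central finite difference on the same random paths.
h = 1e-4
up, _ = asian_call_mc(100.0, 100.0, 0.25, 0.05, 0.20 + h)
dn, _ = asian_call_mc(100.0, 100.0, 0.25, 0.05, 0.20 - h)
fd = (up - dn) / (2 * h)
print(f"price = {price:.4f}, pathwise vega = {vega:.4f}, bump vega = {fd:.4f}")
```

The pathwise number and the bump-and-reprice number should agree closely because both use identical paths. Re-using the same seed is also what would keep a Newton loop on a Monte-Carlo price stable: without common random numbers, simulation noise would swamp the small price differences the solver is chasing.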

From One Number to a Curve: The Volatility Smile

So far we have produced one implied volatility from one option price. But a real options chain has dozens of strikes at each expiry. Repeat the inversion at every strike $K$ and you trace out a function:

$$\sigma_{IV}(K) = \text{BS}^{-1}\!\left(C^{\text{mkt}}(K)\right)$$

If Black-Scholes were correct, this function would be a constant: every strike would imply the same $\sigma$, because the model assumes a single volatility for the underlying asset. The picture would be a flat horizontal line.

It is not. Across every liquid options market in the world, $\sigma_{IV}(K)$ is a curve. On equity indices it slopes downward (puts are expensive relative to BS). On currencies it's a near-symmetric smile. On commodities you see both, often blended. The shape is called the volatility smile when symmetric, or the volatility skew when one-sided.


Interactive: Build Your Own Smile

Switch between the four shapes below. The dashed blue line is the flat Black-Scholes assumption. The orange curve is the market. Hover over the strike axis to read the implied vol at any point.

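[Interactive figure: implied volatility against strike, with selectable smile presets. Custom-preset controls: ATM vol (σ at K = S) = 22.0%, skew (slope at ATM) = -0.35, curvature = 0.60.]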

Dashed blue: the Black-Scholes world — one σ for every strike. Orange: the market. Hover anywhere along the strike axis to read an implied volatility. The skew preset mimics what you see on the S&P 500 and most single stocks (puts are more expensive than the BS model predicts). The smile preset mimics currencies and many commodities (out-of-the-money options on both sides are more expensive than ATM). Switch to custom and shape your own.

In the custom preset, the curve is a quadratic in log-moneyness:

$$\sigma_{IV}(K) \approx \sigma_{ATM} + \beta \, \log(K/S) + \gamma \, \log^2(K/S).$$

Three numbers — ATM level, skew (slope at ATM), and curvature — capture nearly any one-expiry smile observed in practice. Real market-makers parametrize it more carefully (SVI, SABR, …) so that no-arbitrage conditions are guaranteed, but the spirit is the same: compress the whole smile into a handful of interpretable knobs.
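A small round trip makes the strike-by-strike inversion concrete. The sketch below assumes the quadratic parametrization above with the widget's values (22% ATM, β = -0.35, γ = 0.60), manufactures a synthetic chain of call prices from it, and then backs the implied vols out again. The strikes and helper names are hypothetical, and plain bisection stands in for Newton here purely to keep the block short.

```python
import math

def bs_call(S, K, T, r, sigma):
    """Black-Scholes European call price (no dividends)."""
    sq_t = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sq_t)
    d2 = d1 - sigma * sq_t
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * cdf(d1) - K * math.exp(-r * T) * cdf(d2)

def implied_vol_bisect(price, S, K, T, r, lo=1e-4, hi=5.0, n_iter=100):
    """Bisection on the monotonic price curve (slow but cannot diverge)."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def smile(K, S, atm=0.22, beta=-0.35, gamma=0.60):
    """Quadratic smile in log-moneyness (assumed parameters)."""
    x = math.log(K / S)
    return atm + beta * x + gamma * x * x

S, T, r = 100.0, 0.25, 0.05
strikes = [80.0, 90.0, 100.0, 110.0, 120.0]      # hypothetical chain

true_vols = [smile(K, S) for K in strikes]
prices = [bs_call(S, K, T, r, v) for K, v in zip(strikes, true_vols)]
recovered = [implied_vol_bisect(p, S, K, T, r) for p, K in zip(prices, strikes)]

for K, p, v in zip(strikes, prices, recovered):
    print(f"K = {K:6.1f}  price = {p:8.4f}  implied vol = {v:.4%}")
```

With a real chain, the `prices` list would come from the market instead of from `smile`, and the recovered vols would be the thing you plot.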


Why the Smile Exists — Black-Scholes Is Wrong

The smile is not a flaw in the implied-vol algorithm. It is a feature of reality that Black-Scholes' assumptions cannot accommodate. Recall the assumptions:

  1. Stock returns are log-normal.
  2. Volatility $\sigma$ is constant (not random, not strike-dependent, not time-dependent).
  3. Trading is continuous — no jumps, no gaps.
  4. The risk-free rate is constant and known.

Each of these is wrong in a specific way that bends the smile in a specific direction:

| Real-world feature | What it does to OTM puts | What it does to OTM calls | Resulting smile shape |
| --- | --- | --- | --- |
| Fat tails (kurtosis > 3) | More expensive | More expensive | Symmetric smile |
| Leverage effect (vol ↑ when price ↓) | More expensive | Slightly cheaper | Downward skew |
| Crash-o-phobia (jump risk on the downside) | Much more expensive | About the same | Steep downward skew |
| Stochastic volatility (σ is random) | More expensive | More expensive | Smile (mild) |

Equity indices show all four to varying degrees, but the leverage + crash effects dominate, producing the famous downward skew. FX markets are roughly symmetric (USD/JPY is just as scary as JPY/USD), so a near-symmetric smile dominates there.

The market's confession. The volatility smile is the option market openly saying: "We don't believe the log-normal assumption." The shape of the smile is exactly the correction the market applies to BS prices to compensate.

Skew, Smile, and Term Structure

Three orthogonal axes of departure from the flat BS world:

1. Skew

The slope of $\sigma_{IV}$ at the ATM strike:

$$\text{Skew} = \left.\frac{\partial \sigma_{IV}}{\partial \log K}\right|_{K = S}$$

For S&P 500 options this number is typically around $-0.2$ to $-0.5$: a 1% lower strike implies an IV that is roughly 0.2 to 0.5 percentage points higher. Negative skew means downside protection is expensive.

2. Curvature (the "wings")

The second derivative $\partial^2 \sigma_{IV} / \partial (\log K)^2$ at the ATM strike. Positive curvature ⇒ both wings rise above ATM ⇒ tail risk is priced in.
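Both quantities can be estimated from any smile with central finite differences in log-strike. The sketch below is illustrative only: it assumes a quadratic smile with β = -0.35 and γ = 0.60, a step of h = 0.01 in log K, and hypothetical helper names.

```python
import math

def smile(K, S=100.0, atm=0.22, beta=-0.35, gamma=0.60):
    """Quadratic smile in log-moneyness (assumed parameters)."""
    x = math.log(K / S)
    return atm + beta * x + gamma * x * x

def skew_and_curvature(iv, S, h=0.01):
    """Central differences of iv(K) with respect to log K at K = S."""
    up, mid, dn = iv(S * math.exp(h)), iv(S), iv(S * math.exp(-h))
    skew = (up - dn) / (2.0 * h)
    curvature = (up - 2.0 * mid + dn) / (h * h)
    return skew, curvature

skew, curvature = skew_and_curvature(smile, 100.0)
print(f"skew = {skew:.4f}, curvature = {curvature:.4f}")
```

For a quadratic smile the slope comes back as β and the second derivative as 2γ, so a curvature knob of 0.60 in the parametrization corresponds to a second derivative of 1.20. On market data, `iv` would be an interpolation of the recovered implied vols rather than a formula.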

3. Term structure

Implied volatility also depends on time-to-expiry $T$. Short-dated ATM IV is more reactive to recent events; long-dated ATM IV mean-reverts toward a cross-asset "normal". Plotting $\sigma_{IV}(K, T)$ for a grid of strikes and expiries gives the implied volatility surface, the central object of every options desk's morning meeting.

Surface, not smile. The smile is one horizontal slice (fixed $T$) through the surface. A full surface is a function of two arguments, and the dynamics of how it deforms day-to-day is the entire research subject of stochastic volatility and local volatility models (see Heston, Dupire, SABR).

Application: Reading the Market's Mind

Once you can compute $\sigma_{IV}(K)$ for every listed strike, you have a window into the risk-neutral distribution of returns that the market is implicitly pricing in.

Breeden-Litzenberger (1978). If $C(K)$ is the market price of a call with strike $K$, then the second derivative with respect to strike,

$$q(K) = e^{rT} \, \frac{\partial^2 C}{\partial K^2},$$

is the risk-neutral density of $S_T$. A flat smile would produce a perfect log-normal density. A downward skew produces a density with a fat left tail and a thin right tail, exactly the "crashes are worse than rallies" pattern equity investors fear.
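The flat-smile case is easy to verify numerically. The sketch below is an illustration under assumed inputs (S = 100, T = 0.25, r = 5%, a flat 20% vol, and a strike spacing of 0.5): it takes a second finite difference of Black-Scholes call prices in strike and compares the result with the log-normal density it should reproduce.

```python
import math

def bs_call(S, K, T, r, sigma):
    """Black-Scholes European call price (no dividends)."""
    sq_t = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sq_t)
    d2 = d1 - sigma * sq_t
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * cdf(d1) - K * math.exp(-r * T) * cdf(d2)

S, T, r, sigma = 100.0, 0.25, 0.05, 0.20
h = 0.5                                   # strike spacing (assumed)

def density(K):
    """Breeden-Litzenberger: e^{rT} times the second difference of C in K."""
    c_up = bs_call(S, K + h, T, r, sigma)
    c_mid = bs_call(S, K, T, r, sigma)
    c_dn = bs_call(S, K - h, T, r, sigma)
    return math.exp(r * T) * (c_up - 2.0 * c_mid + c_dn) / (h * h)

def lognormal_pdf(x):
    """Risk-neutral log-normal density of S_T under Black-Scholes."""
    mu = math.log(S) + (r - 0.5 * sigma**2) * T
    sd = sigma * math.sqrt(T)
    z = (math.log(x) - mu) / sd
    return math.exp(-0.5 * z * z) / (x * sd * math.sqrt(2.0 * math.pi))

strikes = [50.0 + h * k for k in range(201)]        # 50 ... 150
mass = sum(density(K) for K in strikes) * h         # should be close to 1

print(f"q(100) from prices = {density(100.0):.6f}")
print(f"log-normal pdf(100) = {lognormal_pdf(100.0):.6f}")
print(f"total probability mass on [50, 150] = {mass:.6f}")
```

Replacing the flat `sigma` with a strike-dependent smile inside `density` is all it takes to see the left tail fatten, although with real quotes the second difference amplifies noise and the prices usually need smoothing first.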

So traders use the smile in several practical ways:

  1. Quoting. Bid/ask spreads on options are published as bid IV / ask IV. Two different option contracts can have wildly different dollar prices but compare directly via their IVs.
  2. Risk management. When the smile steepens, the market is paying up for tail protection — a signal for risk desks to widen Value-at-Risk bands.
  3. Relative-value trades. If two related options have a smile-inconsistent IV gap, sell the rich one and buy the cheap one — the classic skew trade.
  4. Model calibration. Local-vol and stochastic-vol models are calibrated so that the prices they imply reproduce today's entire smile surface. Then exotic options are priced consistently with the vanilla market.
VIX, in one sentence. The VIX index is computed from a strike-weighted sum of out-of-the-money S&P 500 option prices targeting a 30-day horizon, which makes it a single scalar summary of the whole short-dated smile. When the VIX spikes, options got more expensive across the board.

Summary

Implied volatility is the single most important number on an options screen. Mathematically it is the root of a smooth, monotonic equation. Computationally it is a five-line Newton solver. Economically it is the market's consensus forecast of future volatility.

| Concept | What it is | Why it matters |
| --- | --- | --- |
| σ_IV | The σ that satisfies BS(σ) = C_market | A normalized price every trader compares across strikes |
| Vega ν > 0 | ∂BS/∂σ is strictly positive | Guarantees a unique σ_IV; calculus is the proof |
| Newton on BS | σ_{n+1} = σ_n − (BS−C)/ν | Quadratic convergence; 2–4 steps for any sensible seed |
| Autograd IV | ∂price/∂σ via torch.autograd.grad | Generalizes to exotics with no closed-form Vega |
| Volatility smile | σ_IV(K) is not flat | The market saying log-normal returns are wrong |
| Skew | Slope of σ_IV at ATM | Measures crash-o-phobia / leverage effect |
| Vol surface | σ_IV(K, T) | Inputs to every modern derivatives pricing model |

Take-aways:

  1. Implied vol turns Black-Scholes from a forward map (inputs → price) into a two-way translator (price ↔ a normalised vol number).
  2. The whole inversion is justified by one calculus fact: $\nu = \partial C / \partial \sigma > 0$. That positivity gives existence, uniqueness, and Newton's quadratic convergence, all from one positive partial derivative.
  3. Newton is the right algorithm here because $C(\sigma)$ is smooth, monotonic, and differentiable in closed form. Two iterations from a 20% seed almost always suffice.
  4. PyTorch autograd doesn't change the calculus — it just frees you from writing the Vega line. That generalises immediately to any payoff you can express in differentiable torch ops, including Monte-Carlo exotics.
  5. The volatility smile is the option market openly disagreeing with BS's constant-σ assumption. Its shape encodes fat tails, leverage, and jump risk — the things BS leaves out.
  6. The full implied-vol surface $\sigma_{IV}(K, T)$ is the central state variable of every options desk: pricing, hedging, risk, and exotic calibration all flow from it.
Looking ahead. In the next section we leave closed-form pricing behind and use Monte Carlo to price options — simulating thousands of stock paths, averaging payoffs, and discounting back. The same autograd trick we used for IV will let us extract Greeks from Monte-Carlo prices essentially for free.