Chapter 5

Derivative of e^x: The Special Exponential

Derivatives of Transcendental Functions

Learning Objectives

By the end of this section, you will be able to:

  1. State and prove the rule $\dfrac{d}{dx}\,e^x = e^x$
  2. Explain the magical limit $\displaystyle\lim_{h \to 0} \dfrac{e^h - 1}{h} = 1$ and why it defines the number $e$
  3. Compute the slope of the tangent line to $y = e^x$ at any point without using a calculator beyond evaluating $e^x$
  4. Recognize why $a = e$ is the unique base for which $a^x$ is its own derivative
  5. Apply the chain-rule version $\dfrac{d}{dx}\, e^{kx} = k\, e^{kx}$ to growth and decay problems
  6. Verify the derivative numerically in plain Python and via PyTorch autograd

The Big Picture: A Function Equal to Its Own Slope

“Of all functions known to mathematics, only one (up to a constant multiplier) is exactly equal to its own derivative. That function is $e^x$.”

Stop and let that land. Take any other function you can think of (a polynomial, a sine, a logarithm) and its derivative is some different function. The derivative of $x^2$ is $2x$, a different shape entirely. The derivative of $\sin x$ is $\cos x$, a shifted wave. But the derivative of $e^x$ is itself:

$$\frac{d}{dx}\, e^x \;=\; e^x$$

What this rule actually says

At every point on the curve $y = e^x$, the height of the curve equals the slope of its tangent line.

If the curve is 2.7 units above the x-axis, the tangent at that point rises 2.7 units for every 1 unit you move right. If the curve is 20 units up, the tangent has slope 20. The steepness and the value are the same number.

This single fact is the reason $e$ shows up everywhere: compound interest, radioactive decay, population growth, RC-circuit charging, Bayesian priors, neural-network softmax outputs, the Schrödinger equation, the normal distribution. Every time a quantity grows at a rate proportional to itself, the answer is dressed in $e$.


Intuition: Money and Bacteria

The bank account that compounds continuously

Imagine a savings account that pays interest continuously at a rate of $100\%$ per year. After $t$ years your balance is $B(t) = e^t$ (starting from $B(0) = 1$).

The rate at which money is added to the account at instant $t$ is $B'(t)$, measured in interest dollars per year. Common sense says:

$$\text{interest per year} \;=\; (\text{rate}) \times (\text{principal}) \;=\; 1 \cdot B(t) \;=\; B(t)$$

So $B'(t) = B(t)$. The bigger the balance, the faster it grows. The function and its derivative are the same. That is the differential equation $y' = y$, and its solution with $y(0) = 1$ is $y = e^t$.
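The continuous-compounding story can be approximated by compounding in many small steps. The sketch below is illustrative only: the function name `compound` and the chosen step counts are assumptions, not part of the original text. Each step adds interest equal to the current balance times the step length, which is the discrete version of $B'(t) = B(t)$.

```python
import math

def compound(t, n):
    """Grow a balance of 1 over t years at 100% per year, using n equal steps.

    Each step applies B += B * dt, the discrete analogue of B'(t) = B(t).
    """
    dt = t / n
    balance = 1.0
    for _ in range(n):
        balance += balance * dt
    return balance

# As the steps get finer, the one-year balance approaches e.
for n in [1, 10, 100, 10_000, 1_000_000]:
    print(f"n = {n:>9}  balance after 1 year = {compound(1.0, n):.8f}")
print(f"math.e       = {math.e:.8f}")
```

With $t = 1$ the loop computes $(1 + 1/n)^n$, so the printed balances climb toward $e$ as $n$ grows.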

A colony of bacteria

Now replace dollars with bacteria. Each bacterium splits at a constant per-capita rate. If there are $N(t)$ bacteria, the population produces new bacteria at a rate proportional to $N(t)$ itself: twice as many parents, twice as many babies per minute. Measuring time in units where the per-capita rate is 1, and the population in units of its initial size, we get the same equation, $N'(t) = N(t)$, and the same solution, $N(t) = e^t$.

The slogan to remember

Whenever you see “the rate of growth is proportional to the current amount,” the answer is an exponential. Whenever the constant of proportionality is exactly 1, the answer is $e^x$ (times the starting amount).


Numerical Discovery

Let's set the theory aside for a moment and measure the slope of $y = e^x$ by hand at four different $x$ values. We will use the secant-line slope with a small step $h = 0.001$ as a stand-in for the tangent slope:

$$\text{slope at } x \;\approx\; \frac{e^{x+h} - e^x}{h}, \qquad h = 0.001$$

| $x$ | Height $e^x$ | Numerical slope ($h = 0.001$) | Ratio slope / height |
|---|---|---|---|
| 0 | 1.0000000000 | 1.0005001667 | 1.0005 |
| 1 | 2.7182818285 | 2.7196414225 | 1.0005 |
| 2 | 7.3890560989 | 7.3927518588 | 1.0005 |
| 3 | 20.0855369232 | 20.0955830401 | 1.0005 |

Look at the last column. The ratio is the same number at every $x$! And as $h$ shrinks toward zero that ratio approaches 1. The slope and the height aren't just proportional; they're equal.

That is the entire empirical content of this section. The rest is just turning the observation into a proof.
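The measurement above is easy to repeat. The following is a small sketch using only the standard library; the variable names (`rows`, `height`, `slope`) are chosen here for illustration.

```python
import math

h = 0.001
rows = []
for x in [0, 1, 2, 3]:
    height = math.exp(x)
    slope = (math.exp(x + h) - height) / h   # secant slope over [x, x + h]
    rows.append((x, height, slope, slope / height))

print(f"{'x':>2}  {'e^x':>14}  {'secant slope':>14}  {'ratio':>10}")
for x, height, slope, ratio in rows:
    print(f"{x:>2}  {height:>14.10f}  {slope:>14.10f}  {ratio:>10.7f}")
```

Changing `h` to a smaller value should move every entry in the ratio column closer to 1 by the same amount, which is the pattern the proof below explains.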


Derivation from First Principles

Apply the limit definition of the derivative to $f(x) = e^x$:

$$f'(x) \;=\; \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \;=\; \lim_{h \to 0} \frac{e^{x+h} - e^{x}}{h}$$

Use the most useful property of the exponential, $e^{a+b} = e^a \cdot e^b$, to split $e^{x+h}$ as $e^x \cdot e^h$:

$$f'(x) \;=\; \lim_{h \to 0} \frac{e^x \cdot e^h - e^x}{h} \;=\; \lim_{h \to 0} \frac{e^x \,(e^h - 1)}{h}$$

The factor $e^x$ does not depend on $h$, so it survives the limit as a constant. Pull it outside:

$$f'(x) \;=\; e^x \cdot \lim_{h \to 0} \frac{e^h - 1}{h}$$

Now the entire question reduces to a single number: what is $\displaystyle\lim_{h \to 0} \dfrac{e^h - 1}{h}$?

A clean test

If this limit equals 1, then $f'(x) = e^x \cdot 1 = e^x$, and we are done. If it equals any other number $k$, the derivative would be $k \, e^x$ instead. So the value of this one little limit is everything.
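The factoring step holds for every nonzero $h$, not only in the limit, and that can be checked directly. In this sketch the helper names `difference_quotient` and `inner_factor` are hypothetical labels for the two sides of the identity.

```python
import math

def difference_quotient(x, h):
    """Left side: (e^(x+h) - e^x) / h."""
    return (math.exp(x + h) - math.exp(x)) / h

def inner_factor(h):
    """The x-independent factor (e^h - 1) / h."""
    return (math.exp(h) - 1.0) / h

h = 0.01
for x in [-2.0, 0.0, 1.5, 4.0]:
    lhs = difference_quotient(x, h)
    rhs = math.exp(x) * inner_factor(h)      # e^x pulled outside
    print(f"x = {x:>4}: quotient = {lhs:.10f}   e^x * factor = {rhs:.10f}")
```

The two columns agree up to floating-point rounding at every $x$, so all of the dependence on $h$ sits inside the single factor $(e^h - 1)/h$.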


The Magical Limit (e^h − 1)/h → 1

The interactive plot below graphs $(e^h - 1)/h$ against $h$. The curve has a hole at $h = 0$ (we would divide by zero), but as $h$ shrinks the plotted point settles onto the dashed line $y = 1$: the limiting value is unambiguously 1. The whole derivative of $e^x$ rests on this one limit.

[Interactive figure: The Magical Limit, $(e^h - 1)/h \to 1$. Horizontal axis: $h$ from −1.5 to 1.5. Vertical axis: $(e^h - 1)/h$ from 0 to 3. Dashed target line at $y = 1$.]

| $h$ | $(e^h - 1)/h$ | $\lvert \text{value} - 1 \rvert$ |
|---|---|---|
| 1 | 1.7182818285 | 7.18e-1 |
| 0.5 | 1.2974425414 | 2.97e-1 |
| 0.25 | 1.1361016668 | 1.36e-1 |
| 0.1 | 1.0517091808 | 5.17e-2 |
| 0.05 | 1.0254219275 | 2.54e-2 |
| 0.01 | 1.0050167084 | 5.02e-3 |
| 0.005 | 1.0025041719 | 2.50e-3 |
| 0.001 | 1.0005001667 | 5.00e-4 |
| 0.0005 | 1.0002500417 | 2.50e-4 |
| 0.0001 | 1.0000500017 | 5.00e-5 |

As $h$ shrinks, the value snaps onto the line $y = 1$. That single fact is the seed from which $\dfrac{d}{dx}\, e^x = e^x$ grows.

Why is the limit 1 and not some other number?

The number $e$ can be defined as the unique positive real number for which this limit equals 1. From chapter 1 we also have the equivalent definitions:

  1. $\displaystyle e = \lim_{n \to \infty} \left(1 + \tfrac{1}{n}\right)^n$: compound interest as the compounding period vanishes.
  2. $\displaystyle e = \sum_{k=0}^{\infty} \tfrac{1}{k!} = 1 + 1 + \tfrac{1}{2} + \tfrac{1}{6} + \tfrac{1}{24} + \cdots$: Euler's series.
  3. $\displaystyle e^x = \sum_{k=0}^{\infty} \tfrac{x^k}{k!}$: the Taylor series for the exponential.

Use definition (3) with $x = h$ and subtract the leading 1:

$$e^h - 1 \;=\; h + \frac{h^2}{2!} + \frac{h^3}{3!} + \cdots$$

Divide by $h$:

$$\frac{e^h - 1}{h} \;=\; 1 + \frac{h}{2!} + \frac{h^2}{3!} + \cdots$$

Now let $h \to 0$. Every term except the leading $1$ carries a factor of $h$ and vanishes. The limit is exactly 1.
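The series argument can be checked term by term. This is a minimal sketch: the function name `series_quotient` and the default of 10 terms are assumptions made for illustration.

```python
import math

def series_quotient(h, terms=10):
    """Partial sum of 1 + h/2! + h^2/3! + ..., the series for (e^h - 1)/h."""
    total = 0.0
    for k in range(1, terms + 1):
        total += h ** (k - 1) / math.factorial(k)
    return total

for h in [0.5, 0.1, 0.001]:
    direct = (math.exp(h) - 1.0) / h
    print(f"h = {h:<6} series = {series_quotient(h):.12f}  "
          f"direct = {direct:.12f}  1 + h/2 = {1 + h / 2:.12f}")
```

For small $h$ the value is already well approximated by the first two terms, $1 + h/2$, which is why the table entries sit about $h/2$ above 1.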

The fingerprint of e

That leading $1$ in the series isn't a coincidence: it is exactly why $e$ is the “natural” base for the exponential. Any other base $a$ gives a different leading coefficient, namely $\ln(a)$. We'll see this next.


Why e Is the Unique Base

The visualizer below plots $a^x$ (solid blue) together with its derivative (red dashed) for an adjustable base $a$. The two curves are always proportional, and the proportionality constant is exactly $\ln(a)$. Only at $a = e \approx 2.71828$ does that constant equal 1, so that the two curves snap together.

[Interactive figure: Why is e the special base? Curves $f(x) = a^x$ and $f'(x) = \ln(a)\cdot a^x$ for $x$ from −2 to 2, with a slider for the base $a$.]

For example, at $a = 2$ the visualizer reports $\ln(a) = 0.693147$:

| $x$ | $f(x)$ | $f'(x)$ | Ratio $f'/f$ |
|---|---|---|---|
| 0 | 1.0000 | 0.6931 | 0.6931 |
| 1 | 2.0000 | 1.3863 | 0.6931 |

The derivative is the function scaled by $\ln(2) \approx 0.6931$, a distance of $0.3069$ from 1. As $a$ moves toward $2.71828$, the two curves merge.

Where ln(a) comes from

Run the same derivation we just did but with a general base $a$:

$$\frac{d}{dx}\, a^x \;=\; \lim_{h \to 0} \frac{a^{x+h} - a^x}{h} \;=\; a^x \cdot \lim_{h \to 0} \frac{a^h - 1}{h}$$

The remaining limit defines a number that depends only on $a$. Call it $k(a)$:

$$k(a) \;=\; \lim_{h \to 0} \frac{a^h - 1}{h}$$

Using $a^h = e^{h\ln a}$ and the special limit we just proved:

$$k(a) \;=\; \lim_{h \to 0} \frac{e^{h \ln a} - 1}{h} \;=\; \ln(a) \cdot \lim_{u \to 0} \frac{e^u - 1}{u} \;=\; \ln(a) \cdot 1 \;=\; \ln(a)$$

(We substituted $u = h \ln a$, which also tends to zero.) Therefore:

$$\boxed{\;\frac{d}{dx}\, a^x \;=\; \ln(a)\cdot a^x\;}$$

And the “function is its own derivative” property holds precisely when $\ln(a) = 1$, i.e. when $a = e$. That is what makes $e^x$ the natural exponential.

| Base $a$ | $\ln(a)$ | $\dfrac{d}{dx}\, a^x$ | Behaviour |
|---|---|---|---|
| 2 | 0.6931 | $0.6931 \cdot 2^x$ | Derivative is shorter than function |
| $e \approx 2.71828$ | 1.0000 | $1 \cdot e^x = e^x$ | Derivative equals function ✓ |
| 3 | 1.0986 | $1.0986 \cdot 3^x$ | Derivative is taller than function |
| 10 | 2.3026 | $2.3026 \cdot 10^x$ | Derivative is much taller |
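The constant $k(a)$ can also be estimated numerically and compared with $\ln(a)$. In this sketch the helper name `k_numeric` and the step $h = 10^{-6}$ are illustrative choices, not taken from the text.

```python
import math

def k_numeric(a, h=1e-6):
    """One-sided estimate of lim_{h->0} (a^h - 1) / h."""
    return (a ** h - 1.0) / h

print(f"{'a':>8}  {'k(a) estimate':>14}  {'ln(a)':>10}")
for a in [2.0, math.e, 3.0, 10.0]:
    print(f"{a:>8.5f}  {k_numeric(a):>14.6f}  {math.log(a):>10.6f}")
```

The estimate matches $\ln(a)$ to about five decimal places for each base, and it crosses 1 exactly where $a = e$.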

Geometric Meaning: Slope = Height

Pick any point $(a, e^a)$ on the curve. The tangent line at that point has equation:

$$y - e^a \;=\; e^a \cdot (x - a)$$

Two consequences worth absorbing:

  1. The tangent at $x = a$ always has slope $e^a$, which is the very height of the point of tangency.
  2. That tangent line crosses the x-axis at $x = a - 1$, exactly one unit to the left of the point of tangency, no matter where on the curve you are. (Try it: substitute $y = 0$ in the tangent equation and solve.) This striking property is shared only by $e^x$ and its constant multiples $C e^x$.

Geometric self-similarity

If you stand on the curve at any point and look one unit to your left along the tangent, you are looking at the x-axis. Move along the curve to a new point and the same thing happens again. The curve is self-similar under horizontal translation in a way that only $e^x$ and its constant multiples are.
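The “one unit to the left” property can be tested at several points at once. This is a short sketch; the function name `tangent_x_intercept` is a hypothetical helper introduced here.

```python
import math

def tangent_x_intercept(a):
    """x-intercept of the tangent to y = e^x at x = a.

    Tangent: y = e^a + e^a * (x - a). Setting y = 0 gives x = a - height / slope.
    """
    height = math.exp(a)
    slope = math.exp(a)        # the derivative of e^x at a equals the height
    return a - height / slope

for a in [-3.0, 0.0, 2.0, 5.5]:
    print(f"a = {a:>4}: tangent meets the x-axis at x = {tangent_x_intercept(a)}")
```

Because the height and the slope are the same number, their ratio is 1 at every point, which is where the fixed one-unit offset comes from.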


Worked Example: Compute the Tangent by Hand

Find the tangent line to $y = e^x$ at $x = 2$. We will compute the slope, write the tangent equation, and check the “one unit to the left” property above.

Step 1. Locate the point.

$$y(2) = e^{2} \approx 7.3890561$$

So the point of tangency is $(2,\; 7.3890561)$.

Step 2. Compute the slope using the rule $\dfrac{d}{dx}\, e^x = e^x$.

$$m = e^{2} \approx 7.3890561$$

The slope equals the height of the point, which is exactly the property we proved.

Step 3. Write the tangent line in point-slope form.

$$y - 7.3890561 \;=\; 7.3890561 \cdot (x - 2)$$

Expand:

$$y \;=\; 7.3890561 \, x \;-\; 7.3890561$$

So the tangent line is $y = e^{2} \, x - e^{2}$, or equivalently $y = e^{2}(x - 1)$.

Step 4. Verify the “one unit to the left” property by setting $y = 0$:

$$0 \;=\; e^{2}(x - 1) \;\Longrightarrow\; x = 1$$

The tangent crosses the x-axis at $x = 1$, which is $2 - 1 = 1$ unit to the left of the point of tangency. ✓

Step 5. Sanity-check numerically. Move $\Delta x = 0.001$ to the right along the tangent line. The tangent predicts:

$$\Delta y_{\text{tangent}} = m \cdot \Delta x = 7.3890561 \times 0.001 = 0.0073890561$$

The actual curve gives:

$$\Delta y_{\text{curve}} = e^{2.001} - e^{2} \approx 0.0073927519$$

The difference is $\approx 3.7 \times 10^{-6}$: the tangent is nearly indistinguishable from the curve over a step that small. That is what “the derivative is the slope of the curve” means in practice.

Notice we never reached for a calculator beyond evaluating $e^2$. The slope follows for free because the derivative is the function. That ease is the practical reason scientists overwhelmingly use base $e$ instead of base 2 or base 10.
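The gap found in Step 5 is not random: by the Taylor series it should be close to $e^2\,\Delta x^2/2$. The sketch below checks that for a few step sizes; the helper name `tangent_gap` is an assumption made for this example.

```python
import math

def tangent_gap(a, dx):
    """Rise of the curve y = e^x over [a, a + dx] minus the rise predicted by the tangent at a."""
    m = math.exp(a)                              # slope = height at x = a
    curve_rise = math.exp(a + dx) - math.exp(a)
    tangent_rise = m * dx
    return curve_rise - tangent_rise

a = 2.0
for dx in [0.1, 0.01, 0.001]:
    predicted = math.exp(a) * dx ** 2 / 2        # leading term of the Taylor remainder
    print(f"dx = {dx:<6} gap = {tangent_gap(a, dx):.4e}   e^2 * dx^2 / 2 = {predicted:.4e}")
```

Shrinking the step by a factor of 10 shrinks the gap by roughly a factor of 100, which is the usual signature of a tangent-line approximation.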


Tangent Explorer: See It All in One Picture

The explorer below lets you move a point along the curve and shrink the step $h$. The tangent line stays glued to the curve, the reported tangent slope always equals the height (the central property of $e^x$), and as $h \to 0$ the secant line collapses onto the tangent.

[Interactive figure: The Derivative of e^x, Visualized. Curve $f(x) = e^x$ for $x$ from −2 to 3, with the secant line through $(x, e^x)$ and $(x+h, e^{x+h})$ and the tangent line at $x$.]

Sample reading at $x = 1.00$, $h = 0.500$:

| Quantity | Formula | Value |
|---|---|---|
| Secant slope (difference quotient) | $\dfrac{e^{1.00 + 0.500} - e^{1.00}}{0.500}$ | 3.526814 |
| Tangent slope (true derivative) | $f'(1.00) = e^{1.00}$ | 2.718282 |
| Error | $\lvert \text{secant} - \text{tangent} \rvert$ | 8.0853e-1 |

As $h \to 0$ the error goes to 0: the secant slope converges to the tangent slope. At this point $f(x) = e^x = 2.7183$ and $f'(x) = e^x = 2.7183$. They're the same: the function equals its own derivative.

Chain Rule Preview: e^(kx) and Half-Life

The most common exponential you'll meet in physics, biology, and finance isn't $e^x$ with growth rate 1 per unit time; it's $e^{kx}$ with some non-unit rate $k$. By the chain rule (proved in detail in section 4.7):

$$\frac{d}{dx}\, e^{kx} \;=\; k \, e^{kx}$$

So the derivative of $e^{kx}$ is $k$ times the function itself. The constant $k$ is precisely the per-unit-time growth rate.

| Function | Derivative | What it models |
|---|---|---|
| $e^{x}$ | $e^{x}$ | Unit growth rate |
| $e^{2x}$ | $2\, e^{2x}$ | Doubling every $\ln(2)/2 \approx 0.347$ units |
| $e^{-x}$ | $-e^{-x}$ | Decay at rate 1 |
| $e^{-0.693\, t}$ | $-0.693\, e^{-0.693\, t}$ | Radioactive half-life $t_{1/2} = 1$ |
| $e^{rt}$ (Black–Scholes) | $r\, e^{rt}$ | Continuously compounded return |

The differential equation y' = ky

Every “rate of change is proportional to amount” problem reduces to the equation $y' = k\,y$, and the solution is $y(t) = y_0 \, e^{kt}$. We will solve it formally in chapter 11; here, just notice that plugging in confirms it: $y'(t) = k \, y_0 e^{kt} = k \, y(t)$. ✓
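The doubling time and half-life entries in the table follow from solving $e^{kt} = 2$ or $e^{-\lambda t} = 1/2$ for $t$. The sketch below computes both and spot-checks $y' = k\,y$ with a central difference; all function names here (`doubling_time`, `half_life`, `y`) are hypothetical helpers.

```python
import math

def doubling_time(k):
    """Time for y0 * e^(k t) to double, for k > 0: solve e^(k t) = 2."""
    return math.log(2) / k

def half_life(lam):
    """Time for y0 * e^(-lam t) to halve, for lam > 0: solve e^(-lam t) = 1/2."""
    return math.log(2) / lam

def y(t, k, y0=1.0):
    """Solution of y' = k y with y(0) = y0."""
    return y0 * math.exp(k * t)

print(f"doubling time for k = 2       : {doubling_time(2):.4f}")
print(f"half-life for lambda = 0.693  : {half_life(0.693):.4f}")

# Spot-check y'(t) = k * y(t) at one point using a central difference.
k, t, h = -0.693, 1.3, 1e-5
slope = (y(t + h, k) - y(t - h, k)) / (2 * h)
print(f"measured y'(t) = {slope:.8f}   k * y(t) = {k * y(t, k):.8f}")
```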


Real-World Applications

🏦 Continuously compounded interest

Balance $B(t) = P\, e^{rt}$. Instantaneous earning rate is $B'(t) = r B(t)$: interest per unit time equals rate × current principal.

☢ Radioactive decay

$N(t) = N_0\, e^{-\lambda t}$. Number of decays per second is $-N'(t) = \lambda N(t)$, proportional to atoms still present.

⚡ RC circuit charging

Voltage across capacitor: $V(t) = V_0\,(1 - e^{-t/RC})$. Charging current $i(t) \propto e^{-t/RC}$ decays as the cap fills.

🤖 Softmax in neural networks

$\sigma(z_i) = \dfrac{e^{z_i}}{\sum_j e^{z_j}}$. Gradients involve $\dfrac{\partial \sigma_i}{\partial z_j}$, and every term carries the “derivative-equals-itself” signature of $e^x$, making backprop simple.
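For the softmax item, the standard closed form of the Jacobian is $\dfrac{\partial \sigma_i}{\partial z_j} = \sigma_i(\delta_{ij} - \sigma_j)$, which follows from $\dfrac{d}{dx}\, e^x = e^x$ and the quotient rule. The sketch below compares that formula with finite differences in plain Python; the function names and the test vector `z` are assumptions made for illustration.

```python
import math

def softmax(z):
    m = max(z)                                   # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(z):
    """Analytic Jacobian: d sigma_i / d z_j = sigma_i * (delta_ij - sigma_j)."""
    s = softmax(z)
    n = len(z)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

def numeric_jacobian(z, h=1e-6):
    """Central-difference estimate of the same Jacobian."""
    n = len(z)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, down = list(z), list(z)
        up[j] += h
        down[j] -= h
        s_up, s_down = softmax(up), softmax(down)
        for i in range(n):
            jac[i][j] = (s_up[i] - s_down[i]) / (2 * h)
    return jac

z = [1.0, 2.0, 0.5]
analytic = softmax_jacobian(z)
numeric = numeric_jacobian(z)
max_diff = max(abs(analytic[i][j] - numeric[i][j])
               for i in range(len(z)) for j in range(len(z)))
print(f"largest |analytic - numeric| entry: {max_diff:.2e}")
```

Note that the analytic Jacobian is built entirely from the softmax outputs themselves, with no extra exponentials to evaluate.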

🌡 Newton's law of cooling

Temperature difference $\Delta T(t) = \Delta T_0\, e^{-kt}$. The rate of cooling is proportional to the current temperature gap.

🎯 Normal distribution (statistics)

$f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$. Because $f'(x) = f(x) \cdot \left(-\dfrac{x - \mu}{\sigma^2}\right)$ and the exponential factor is never zero, finding the maximum reduces to setting the derivative of the exponent to zero, which gives $x = \mu$.


Plain Python Implementation

Let's convert everything we proved into code. The script does three things:

  1. Defines a numerical derivative routine using the limit definition.
  2. Compares the numerical derivative of $e^x$ against $e^x$ itself at six points.
  3. Watches $(e^h - 1)/h$ march toward 1 as $h$ shrinks.
Numerical verification: derivative of e^x equals e^x
🐍derivative_of_exp.py
Line 1: Import math

We need math.exp(x), Python's standard-library implementation of e^x. It delegates to the platform's C math library and is accurate to roughly double precision (about 16 significant digits), so we can treat it as the reference value for our experiment.

EXECUTION STATE
math.e = 2.718281828459045
math.exp(0) = 1.0
math.exp(1) = 2.718281828459045
Line 3: Step 1 of the experiment

The comment lays out the exact algebra we're about to verify numerically. We start from the limit definition of the derivative, factor out e^x, and recognise that the remaining limit equals 1, the famous special limit. The conclusion is that e^x is its own derivative.

Line 4: Limit definition

(e^(x+h) − e^x) / h is the slope of the secant line through (x, e^x) and (x+h, e^(x+h)). As h shrinks toward 0, that slope becomes the slope of the tangent line, which is the derivative.

Line 5: Factor e^x out

Because e^(x+h) = e^x · e^h, the numerator becomes e^x · (e^h − 1). The e^x factor doesn't depend on h, so it can leave the limit. Everything difficult is now packaged inside the lone factor (e^h − 1)/h.

Line 6: Apply the special limit

The very special fact lim h→0 (e^h − 1)/h = 1 is the defining property of e. We will verify it numerically in step 3 below.

Line 7: Conclusion

Multiplying e^x by 1 gives e^x back. So the derivative of e^x is e^x. Up to a constant multiple, e^x is the only nonzero function with that property.

Line 9: Define the numerical derivative

A helper for f(x) = e^x. Given x and a small step h, it returns the difference-quotient approximation to f'(x). The default h = 1e-6 is small enough to keep the truncation error near 1e-6 yet large enough to avoid catastrophic floating-point cancellation.

Line 10: The difference quotient itself

Implements (e^(x+h) − e^x) / h literally. Try plugging x = 0: we get (e^h − 1)/h, which is exactly the magical limit we will study.

EXAMPLE
At x = 0, h = 1e-6: (e^(1e-6) − 1)/1e-6 ≈ 1.0000005000
Line 12: Step 2 of the experiment

Time to confront theory with measurement. We will compute e^x at several values of x, then compute the numerical derivative at the same x, then compare. If our claim 'derivative equals function' is right, the two columns should agree to within roughly (h/2)·e^x, about 1e-7 to 1e-5 for these x. That gap is the truncation error of the finite step h, not a flaw in our math.

Line 13: Print the column headers

The format spec >6 means 'right-align in a 6-character field'. Just cosmetics so the output forms a clean table the reader can scan.

Line 14: Iterate over test points

Six values spanning negative, zero, fractional, and positive integers. The point is to show that the function-equals-derivative property holds everywhere, not just at one lucky x.

LOOP TRACE · 6 iterations
x = -1.0
true_val = e^x = 0.3678794412
approx = 0.3678796253
error = 1.84e-07
x = 0.0
true_val = e^x = 1.0000000000
approx = 1.0000005000
error = 5.00e-07
x = 0.5
true_val = e^x = 1.6487212707
approx = 1.6487220950
error = 8.24e-07
x = 1.0
true_val = e^x = 2.7182818285
approx = 2.7182831876
error = 1.36e-06
x = 2.0
true_val = e^x = 7.3890560989
approx = 7.3890597932
error = 3.69e-06
x = 3.0
true_val = e^x = 20.0855369232
approx = 20.0855469670
error = 1.00e-05
Line 15: Compute the true value

Use Python's library e^x as ground truth. Inside this loop body the value true_val is the analytic derivative according to our theory.

Line 16: Compute the numerical approximation

Calls our helper, which evaluates the secant slope with h = 1e-6. If the theory is right, this should be within about (h/2)·e^x of true_val.

Line 17: Compute the absolute error

abs() because we don't care about the sign of the error, only its size. Looking at the iteration table: errors run from about 1e-7 to 1e-5, which is the size (h/2)·e^x we expect from the truncation error of a forward difference with h = 1e-6, not from any flaw in the theory.

Line 18: Print one row

Format spec :>6.2f reads 'right-align in 6 chars, 2 decimal places'. :>12.2e is scientific notation. The result is a neat table the reader can compare column by column.

Line 20: Step 3 of the experiment

Now we strip away the e^x factor and study just the inner limit. If lim h→0 (e^h − 1)/h is really 1, the column we print should converge to 1 as h shrinks.

Line 21: What to watch for

Comment foreshadowing the punchline. The numbers will start near 1.72 (for h = 1) and march toward 1.000000 as h shrinks.

Line 22: Blank line

Just visual separation between the two tables. print() with no argument emits a newline.

Line 23: Print the inner-table headers

Field width 14 fits the small h values cleanly; width 20 leaves room for ten decimal places in the result column.

Line 24: Iterate over shrinking h

The step sizes shrink from 1 down to 1e-6, mostly by a factor of 10 at a time. This makes the convergence to 1 visually unmistakable: each tenfold reduction in h adds roughly one more zero after the decimal point.

LOOP TRACE · 8 iterations
h = 1
math.exp(1) = 2.718281828
value = (e^1 - 1)/1 = 1.7182818285
h = 0.5
math.exp(0.5) = 1.6487212707
value = (e^0.5 - 1)/0.5 = 1.2974425414
h = 0.1
math.exp(0.1) = 1.1051709181
value = 1.0517091808
h = 0.01
math.exp(0.01) = 1.0100501671
value = 1.0050167084
h = 0.001
math.exp(0.001) = 1.0010005002
value = 1.0005001667
h = 0.0001
math.exp(0.0001) = 1.0001000050
value = 1.0000500017
h = 1e-5
value = 1.0000050000
h = 1e-6
value = 1.0000005000
Line 25: Evaluate the special limit

For each h we compute (math.exp(h) − 1) / h. Algebraically this is exactly the inner factor we pulled out of the derivative. The iteration trace above shows the value snapping toward 1, confirming numerically that the limit is 1.

Line 26: Print one row

Ten decimal places make the convergence pattern obvious: 1.71… → 1.05… → 1.005… → 1.0005…. Each tenfold reduction in h adds one more zero after the decimal point (the value is 1 + h/2 + h²/6 + …, so the leading error is h/2).

```python
import math

# 1. Numerical derivative of e^x via the limit definition.
#    d/dx e^x = lim h->0  (e^(x+h) - e^x) / h
#             = e^x * lim h->0  (e^h - 1) / h
#             = e^x * 1
#             = e^x   <-- the function is its own derivative

def numerical_derivative_exp(x, h=1e-6):
    return (math.exp(x + h) - math.exp(x)) / h

# 2. Verify by comparing to e^x itself across a few points.
print(f"{'x':>6}  {'e^x':>16}  {'approx d/dx e^x':>20}  {'|error|':>12}")
for x in [-1.0, 0.0, 0.5, 1.0, 2.0, 3.0]:
    true_val = math.exp(x)
    approx   = numerical_derivative_exp(x)
    error    = abs(approx - true_val)
    print(f"{x:>6.2f}  {true_val:>16.10f}  {approx:>20.10f}  {error:>12.2e}")

# 3. Inspect the magical limit (e^h - 1) / h directly.
#    Watch the column converge to 1.
print()
print(f"{'h':>14}  {'(e^h - 1)/h':>20}")
for h in [1, 0.5, 0.1, 0.01, 0.001, 0.0001, 1e-5, 1e-6]:
    value = (math.exp(h) - 1) / h
    print(f"{h:>14.6f}  {value:>20.10f}")
```

What the run produces

The first table shows $e^x$ and the numerical derivative agreeing to within about $(h/2)\,e^x$, roughly five to seven decimal places across the test points. The second table shows $(e^h - 1)/h$ converging to 1: at $h = 10^{-6}$ it prints $1.0000005000$, matching $1 + h/2$ as predicted by the Taylor series.
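The remaining error in the first table comes from the step size, and a symmetric (central) difference is a common way to reduce it. The sketch below is a variation on the script above, not part of it: the helpers `forward_diff` and `central_diff` and the list of step sizes are assumptions chosen for illustration.

```python
import math

def forward_diff(f, x, h):
    """One-sided difference quotient: truncation error proportional to h."""
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    """Symmetric difference quotient: truncation error proportional to h^2."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.0
exact = math.exp(x)
print(f"{'h':>8}  {'forward error':>14}  {'central error':>14}")
for h in [1e-2, 1e-4, 1e-6, 1e-8, 1e-10]:
    fwd_err = abs(forward_diff(math.exp, x, h) - exact)
    ctr_err = abs(central_diff(math.exp, x, h) - exact)
    print(f"{h:>8.0e}  {fwd_err:>14.3e}  {ctr_err:>14.3e}")
```

For moderate $h$ the central difference is far more accurate. For very small $h$ both errors typically stop improving, because subtracting two nearly equal floating-point numbers loses digits, which is the cancellation that the choice of $h = 10^{-6}$ avoids.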


PyTorch Verification

Plain Python gave us numerical confirmation. Let's now ask PyTorch's autograd engine, designed to handle the messiest neural-network gradients, to compute $\dfrac{d}{dx}\, e^x$ for us. We expect the gradient tensor to match $e^x$ to within floating-point tolerance.

PyTorch autograd reproduces d/dx e^x = e^x
🐍derivative_of_exp_pytorch.py
Line 1: Import PyTorch

PyTorch's autograd engine lets us compute exact derivatives by recording every operation in a computational graph. We will use it as a second, independent witness that d/dx e^x = e^x.

Line 3: Plan of the experiment

Build a small tensor of test points, run them through exp, ask PyTorch for the gradient, and compare the gradient to e^x itself. If they match, the derivative claim is verified by software written by people who do not know we are testing it.

Line 4: Create x with gradient tracking

torch.tensor wraps a plain list of floats into a PyTorch tensor. requires_grad=True tells autograd 'watch every operation that touches this tensor; I will eventually want d/dx of something with respect to it.'

EXECUTION STATE
x = tensor([-1.0, 0.0, 0.5, 1.0, 2.0, 3.0])
x.requires_grad = True
x.grad = None (not yet populated)
Line 6: Compute y = e^x

torch.exp applies e^x element-wise. Because x requires grad, y also carries grad-tracking and remembers that it was produced by torch.exp.

Line 7: Element-wise exponential

Each y[i] equals e^{x[i]}. PyTorch records the backward rule needed for backprop: dy_i/dx_i = e^{x_i}, which is exactly y_i itself.

EXECUTION STATE
y = tensor([0.3679, 1.0000, 1.6487, 2.7183, 7.3891, 20.0855])
y.grad_fn = <ExpBackward0>
Line 9: Need a scalar to backprop from

Called with no arguments, PyTorch's .backward() expects a scalar. We use .sum() because the derivative of a sum is the sum of the derivatives, and each term contributes exactly its own gradient to its own x[i], with no cross-talk between coordinates.

Line 10: Why .sum() is safe here

For y_sum = y_0 + y_1 + ... + y_n, partial derivative ∂y_sum/∂x_i equals ∂y_i/∂x_i (the others don't depend on x_i). So x.grad[i] will end up holding exactly d/dx_i e^{x_i} = e^{x_i}.

Line 11: Form the scalar

y_sum is now a 0-dimensional tensor (a single number). It still carries grad history: PyTorch remembers it came from summing six exp(...) values.

EXECUTION STATE
y_sum = tensor(33.2095, grad_fn=<SumBackward0>)
Line 13: Trigger backpropagation

y_sum.backward() walks the computation graph in reverse, applying the chain rule. When it reaches x it deposits the accumulated gradient into x.grad. After this single call, every x[i] has its derivative computed.

Line 14: .backward() vs torch.autograd.grad()

.backward() writes gradients into the .grad attribute of every leaf tensor that participated. torch.autograd.grad(y_sum, x) would instead return them as a tuple without mutating .grad. Both are valid; we use .backward() because we only have one input tensor and the side-effect style is cleaner here.

EXECUTION STATE
x.grad = tensor([0.3679, 1.0000, 1.6487, 2.7183, 7.3891, 20.0855])
Line 16: Compare to the analytic answer

Now we print three lines side by side: the inputs x, the exponentials e^x, and PyTorch's computed gradient. The claim of the section is that x.grad should equal e^x. Read the next three lines as the experimental test of that claim.

Line 17: Print x

.detach() returns a tensor that shares the same data but is cut off from the autograd graph, and .tolist() converts it to a plain Python list for readable printing. Output: [-1.0, 0.0, 0.5, 1.0, 2.0, 3.0].

Line 18: Print e^x

These are the function values themselves. We will compare them to the gradient line below.

EXAMPLE (values rounded to four decimals; the actual printout shows more digits)
[0.3679, 1.0, 1.6487, 2.7183, 7.3891, 20.0855]
Line 19: Print x.grad

These are PyTorch's gradient values. They should be identical to the e^x line above. Rounded to four decimals they are: [0.3679, 1.0, 1.6487, 2.7183, 7.3891, 20.0855]. Confirmation that PyTorch's autograd computes d/dx e^x = e^x.

Line 20: Final assertion

torch.allclose returns True if two tensors are element-wise equal within a small tolerance. We print 'matches? : True'. Two independent computations, one hand-derived (e^x) and one from a general-purpose autograd engine, agree.

EXAMPLE
matches? : True
```python
import torch

# Build a tensor x with several test points and ask PyTorch to track gradients.
x = torch.tensor([-1.0, 0.0, 0.5, 1.0, 2.0, 3.0], requires_grad=True)

# Compute y = e^x element-wise.
y = torch.exp(x)

# Sum so we have a scalar to backpropagate through.
# For a sum, d(y_sum)/dx_i = d(e^{x_i})/dx_i = e^{x_i}.
y_sum = y.sum()

# Trigger autograd. After this call, x.grad[i] holds d(y_sum)/dx_i.
y_sum.backward()

# Compare PyTorch's autograd to the analytic answer e^x.
print("x        :", x.detach().tolist())
print("e^x      :", y.detach().tolist())
print("x.grad   :", x.grad.tolist())
print("matches? :", torch.allclose(x.grad, torch.exp(x.detach())))
```

The final line prints "matches? : True". Two independent computations (a hand derivation following the limit definition, and a general-purpose automatic-differentiation engine) agree to within floating-point tolerance.

Why this matters for deep learning

Inside every neural network, the chain rule has to propagate gradients through hundreds or thousands of operations. Every time an exponential appears (softmax, sigmoid via $\sigma(x) = 1/(1 + e^{-x})$, attention weights), the framework needs $\dfrac{d}{dx}\, e^x = e^x$ as a primitive. The identity $\text{forward output} = \text{backward gradient}$ makes those exponentials computationally cheap: you reuse the cached forward value instead of recomputing.
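The “reuse the forward value” idea can be illustrated with a toy automatic-differentiation scheme in plain Python. This is a sketch of forward-mode differentiation with dual numbers, not how PyTorch is implemented (PyTorch uses reverse mode); the class `Dual` and the function `dual_exp` are hypothetical names introduced here.

```python
import math

class Dual:
    """A value paired with its derivative with respect to a single input."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def dual_exp(u):
    out = math.exp(u.value)            # forward value, computed once
    return Dual(out, out * u.deriv)    # chain rule (e^u)' = e^u * u', reusing `out`

x = Dual(2.0, 1.0)                     # seed: dx/dx = 1
y = dual_exp(x)                        # e^x
z = dual_exp(5 * x)                    # e^(5x)

print(f"e^x    at x = 2: value = {y.value:.6f}, derivative = {y.deriv:.6f}")
print(f"e^(5x) at x = 2: value = {z.value:.6f}, derivative = {z.deriv:.6f}")
```

In `dual_exp` the exponential is evaluated exactly once and then reused for the derivative, and the factor of 5 for $e^{5x}$ appears automatically through the chain rule.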


Common Mistakes

Mistake 1: Applying the Power Rule to e^x

Wrong: $\dfrac{d}{dx}\, e^x = x \, e^{x-1}$

Correct: $\dfrac{d}{dx}\, e^x = e^x$

The Power Rule $\dfrac{d}{dx}\, x^n = n\, x^{n-1}$ applies when the variable is in the base. Here, the variable is in the exponent, which calls for an entirely different rule.

Mistake 2: Forgetting the chain rule for e^(kx)

Wrong: $\dfrac{d}{dx}\, e^{5x} = e^{5x}$

Correct: $\dfrac{d}{dx}\, e^{5x} = 5\, e^{5x}$

The factor of $5$ comes from the inner derivative $\dfrac{d}{dx}(5x) = 5$. Only when the exponent is exactly $x$ (or $x$ plus a constant) does the chain-rule factor equal 1.

Mistake 3: Confusing e^x with general a^x

Wrong: $\dfrac{d}{dx}\, 2^x = 2^x$

Correct: $\dfrac{d}{dx}\, 2^x = \ln(2) \cdot 2^x \approx 0.693 \cdot 2^x$

The function-equals-derivative property is unique to base $e$. Every other base picks up a factor of $\ln(a)$.

Mistake 4: Treating e as a variable

Wrong: $\dfrac{d}{de}\, e^x = x\, e^{x-1}$

Correct: $e$ is the constant 2.71828…, not a variable. The expression $\dfrac{d}{de}$ is meaningless. Only $x$ varies.
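A quick numerical check can catch the first three mistakes: measure the slope with a difference quotient and see which formula it matches. The sketch below is illustrative; the helper `central_diff`, the test point $x = 1.5$, and the `checks` list are assumptions made for this example.

```python
import math

def central_diff(f, x, h=1e-5):
    """Symmetric difference quotient used as an independent slope measurement."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.5
# (label, function, value of the WRONG formula at x, value of the CORRECT formula at x)
checks = [
    ("e^x",    lambda t: math.exp(t),     x * math.exp(x - 1), math.exp(x)),
    ("e^(5x)", lambda t: math.exp(5 * t), math.exp(5 * x),     5 * math.exp(5 * x)),
    ("2^x",    lambda t: 2.0 ** t,        2.0 ** x,            math.log(2) * 2.0 ** x),
]

results = []
for label, f, wrong, right in checks:
    measured = central_diff(f, x)
    results.append((label, measured, wrong, right))
    print(f"{label:>7}: measured = {measured:.6f}  wrong = {wrong:.6f}  correct = {right:.6f}")
```

In each row the measured slope lands on the correct formula and well away from the wrong one.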


Summary

| Concept | Statement |
|---|---|
| The headline rule | $\dfrac{d}{dx}\, e^x = e^x$ |
| The special limit | $\displaystyle\lim_{h \to 0} \dfrac{e^h - 1}{h} = 1$ |
| General base $a$ | $\dfrac{d}{dx}\, a^x = \ln(a) \cdot a^x$ |
| Chain rule version | $\dfrac{d}{dx}\, e^{kx} = k \cdot e^{kx}$ |
| Geometric reading | At every point on $y = e^x$, slope of tangent = height of curve |
| Tangent x-intercept | Tangent at $(a, e^a)$ crosses x-axis at $x = a - 1$ (always 1 unit left) |
| Differential-equation form | $y' = y$ has solution $y = C \cdot e^x$ |

One sentence to take away

$e$ is the unique base for which the exponential function $e^x$ coincides with its own derivative, a property born from the magical limit $(e^h - 1)/h \to 1$, which in turn is baked into the very definition of $e$.

In the next section we generalize: what is the derivative of a general exponential $a^x$, and how does $\ln(a)$ enter the formula? Spoiler: we already derived it above. We'll just put it to work.
