Learning Objectives
By the end of this section, you will be able to:
- State and prove the rule
- Explain the magical limit and why it defines the number
- Compute the slope of the tangent line to at any point without using a calculator beyond evaluating
- Recognize why is the unique base for which is its own derivative
- Apply the chain-rule version to growth and decay problems
- Verify the derivative numerically in plain Python and via PyTorch autograd
The Big Picture: A Function Equal to Its Own Slope
“Of all functions known to mathematics, only one — up to a constant multiplier — is exactly equal to its own derivative. That function is .”
Stop and let that land. Take any other function you can think of — a polynomial, a sine, a logarithm — and its derivative is some different function. The derivative of is , a different shape entirely. The derivative of is , a shifted wave. But the derivative of is itself:
What this rule actually says
At every point on the curve , the height of the curve equals the slope of its tangent line.
If the curve is 2.7 units above the x-axis, the tangent at that point rises 2.7 units for every 1 unit you move right. If the curve is 20 units up, the tangent has slope 20. The steepness and the value are the same number.
This single fact is the reason shows up everywhere: compound interest, radioactive decay, population growth, RC-circuit charging, Bayesian priors, neural-network softmax outputs, the Schrödinger equation, the normal distribution. Every time a quantity grows at a rate proportional to itself, the answer is dressed in .
Intuition: Money and Bacteria
The bank account that compounds continuously
Imagine a savings account that pays interest continuously at rate per year. After time years your balance is (starting from ).
The rate at which money is added to the account at instant is — interest dollars per year. Common sense says:
So . The bigger the balance, the faster it grows. The function and its derivative are the same. That is the differential equation , and its solution is .
A colony of bacteria
Now replace dollars with bacteria. Each bacterium splits at a constant per-capita rate. If there are bacteria, the population produces new bacteria at rate proportional to itself — twice as many parents, twice as many babies per minute. Same equation, , same solution .
The slogan to remember
Whenever you see “the rate of growth is proportional to the current amount,” the answer is an exponential. Whenever the constant of proportionality is exactly 1, the answer is .
Numerical Discovery
Let's set the theory aside for a moment and measure the slope of by hand at three different x values. We will use the secant-line slope with a small step as a stand-in for the tangent slope:
| x | Height e^x | Numerical slope (h=0.001) | Ratio slope / height |
|---|---|---|---|
| 0 | 1.0000000000 | 1.0005001667 | 1.0005 |
| 1 | 2.7182818285 | 2.7196414762 | 1.0005 |
| 2 | 7.3890560989 | 7.3927514660 | 1.0005 |
| 3 | 20.0855369232 | 20.0955817247 | 1.0005 |
Look at the last column. The ratio is the same number at every ! And as shrinks toward zero that ratio approaches 1. The slope and the height aren't just proportional; they're equal.
That is the entire empirical content of this section. The rest is just turning the observation into a proof.
Derivation from First Principles
Apply the limit definition of the derivative to :
Use the most useful property of the exponential — — to split as :
The factor does not depend on , so it survives the limit as a constant. Pull it outside:
Now the entire question reduces to a single number: what is ?
A clean test
If this limit equals 1, then — done. If it equals any other number , the derivative would be instead. So the value of this one little limit is everything.
The Magical Limit (e^h − 1)/h → 1
Drag the slider below and watch the orange dot fall onto the green dashed line at . The orange curve has a hole at (we'd divide by zero) — but the limiting value is unambiguously 1.
The Magical Limit: (e^h − 1) / h → 1
Drag h toward zero. The whole derivative of e^x rests on this one limit.
| h | (e^h − 1) / h | | value − 1 | |
|---|---|---|
| 1 | 1.7182818285 | 7.18e-1 |
| 0.5 | 1.2974425414 | 2.97e-1 |
| 0.25 | 1.1361016668 | 1.36e-1 |
| 0.1 | 1.0517091808 | 5.17e-2 |
| 0.05 | 1.0254219275 | 2.54e-2 |
| 0.01 | 1.0050167084 | 5.02e-3 |
| 0.005 | 1.0025041719 | 2.50e-3 |
| 0.001 | 1.0005001667 | 5.00e-4 |
| 0.0005 | 1.0002500417 | 2.50e-4 |
| 0.0001 | 1.0000500017 | 5.00e-5 |
Why is the limit 1 and not some other number?
The number can be defined as the unique positive real number for which this limit equals 1. From chapter 1 we also have the equivalent definitions:
- — compound interest as the compounding period vanishes.
- — Euler's power series.
- — the Taylor series for the exponential.
Use definition (3) and plug in :
Divide by :
Now let . Every term except the leading vanishes. The limit is exactly 1.
The fingerprint of e
That leading in the series isn't a coincidence — it is exactly why is the “natural” base for the exponential. Any other base gives a different leading coefficient, namely . We'll see this next.
Why e Is the Unique Base
Slide the base in the visualizer below. The solid blue curve is and the red dashed curve is its derivative. They are always proportional. The proportionality constant is exactly . Slide all the way to and watch the two curves snap together.
Why is e the special base?
Slide the base a. The derivative is always a constant multiple of the function. That constant is ln(a). Only when a = e does it equal 1.
Where ln(a) comes from
Run the same derivation we just did but with a general base :
The remaining limit defines a number that depends only on . Call it :
Using and the special limit we just proved:
(We substituted , which also tends to zero.) Therefore:
And the “the function is its own derivative” property happens precisely when , i.e. when . That is the definition of natural exponential.
| Base a | ln(a) | d/dx a^x | Behaviour |
|---|---|---|---|
| 2 | 0.6931 | 0.6931 · 2^x | Derivative is shorter than function |
| e ≈ 2.71828 | 1.0000 | 1 · e^x = e^x | Derivative equals function ✓ |
| 3 | 1.0986 | 1.0986 · 3^x | Derivative is taller than function |
| 10 | 2.3026 | 2.3026 · 10^x | Derivative is much taller |
Geometric Meaning: Slope = Height
Pick any point on the curve. The tangent line at that point has equation:
Two consequences worth absorbing:
- The tangent at always has slope , which is the very height of the point of tangency.
- That tangent line crosses the x-axis at — exactly one unit to the left of the point of tangency, no matter where on the curve you are. (Try it: substitute in the tangent equation and solve.) This is a striking self-similarity property unique to .
Geometric self-similarity
If you stand on the curve at any point and look one unit to your left along the tangent, you are looking at the x-axis. Move along the curve to a new point — the same thing happens again. The curve is self-similar under horizontal translation in a way no other function is.
Worked Example: Compute the Tangent by Hand
Find the tangent line to at . We will compute the slope, write the tangent equation, and check the “one unit to the left” property above.
▶ Click to expand the full hand calculation
Step 1. Locate the point.
So the point of tangency is .
Step 2. Compute the slope using the rule .
The slope equals the height of the point — exactly the property we proved.
Step 3. Write the tangent line in point-slope form.
Expand:
So the tangent line is , or equivalently .
Step 4. Verify the “one unit to the left” property by setting :
The tangent crosses the x-axis at , which is unit to the left of the point of tangency. ✓
Step 5. Sanity-check numerically. Move to the right along the tangent line. The tangent predicts:
The actual curve gives:
Difference — the tangent is indistinguishable from the curve over a step that small. That is what “the derivative is the slope of the curve” means in practice.
Notice we never reached for a calculator beyond evaluating . The slope follows for free because the derivative is the function. That ease is the practical reason scientists overwhelmingly use base instead of base 2 or base 10.
Tangent Explorer: See It All in One Picture
Drag the purple dot along the curve below and watch the green tangent line stay glued to the curve. The reported slope is always equal to the height — the central property of . Then shrink the step to see the orange secant collapse onto the tangent.
The Derivative of e^x: Visualized
Watch how the secant line approaches the tangent as h → 0
Chain Rule Preview: e^(kx) and Half-Life
The most common exponential you'll meet in physics, biology, and finance isn't with growth rate 1 per unit time — it's with some non-unit rate . By the chain rule (proved in detail in section 4.7):
So is times its own derivative. The constant is precisely the per-unit-time growth rate.
| Function | Derivative | What it models |
|---|---|---|
| e^x | e^x | Unit growth rate |
| e^(2x) | 2 e^(2x) | Doubling every ln(2)/2 ≈ 0.347 units |
| e^(-x) | -e^(-x) | Decay at rate 1 |
| e^(-0.693 t) | -0.693 e^(-0.693 t) | Radioactive half-life t½ = 1 |
| e^(rt) — Black–Scholes | r e^(rt) | Continuously compounded return |
The differential equation y' = ky
Every “rate of change is proportional to amount” problem reduces to the equation , and the solution is . We will solve it formally in chapter 11; here, just notice that plugging in confirms it: . ✓
Real-World Applications
🏦 Continuously compounded interest
Balance . Instantaneous earning rate is — interest per unit time equals rate × current principal.
☢ Radioactive decay
. Number of decays per second is — proportional to atoms still present.
⚡ RC circuit charging
Voltage across capacitor: . Charging current decays as the cap fills.
🤖 Softmax in neural networks
. Gradients involve , and every term carries the “derivative-equals-itself” signature of , making backprop simple.
🌡 Newton's law of cooling
Temperature difference . The rate of cooling is proportional to the current temperature gap.
🎯 Normal distribution (statistics)
. Differentiating to find the maximum reduces to setting the inner exponent's derivative to zero — the value of never enters.
Plain Python Implementation
Let's convert everything we proved into code. The script does three things:
- Defines a numerical derivative routine using the limit definition.
- Compares the numerical derivative of against itself at six points.
- Watches march toward 1 as shrinks.
What the run produces
The first table shows e^x and the numerical derivative agreeing to about seven decimal places at every test point. The second table shows converging to 1: at it equals , exactly as predicted by the Taylor series.
PyTorch Verification
Plain Python gave us numerical confirmation. Let's now ask PyTorch's autograd engine — designed to handle the messiest neural-network gradients — to compute for us. We expect the gradient tensor to be bit-for-bit equal to .
The final line prints . Two independent computations — a hand derivation following the limit definition, and a general-purpose automatic-differentiation engine — agree on every digit.
Why this matters for deep learning
Inside every neural network, the chain rule has to propagate gradients through hundreds or thousands of operations. Every time an exponential appears (softmax, sigmoid via , attention weights), the framework needs as a primitive. The identity makes those exponentials computationally cheap — you reuse the cached forward value instead of recomputing.
Common Mistakes
Mistake 1: Applying the Power Rule to e^x
Wrong:
Correct:
The Power Rule applies when the variable is in the base. Here, the variable is in the exponent — entirely different rule.
Mistake 2: Forgetting the chain rule for e^(kx)
Wrong:
Correct:
The factor of comes from the inner derivative . Only when the exponent is exactly does the chain-rule factor equal 1.
Mistake 3: Confusing e^x with general a^x
Wrong:
Correct:
The function-equals-derivative property is unique to base . Every other base picks up a factor of .
Mistake 4: Treating e as a variable
Wrong:
Correct: is the constant 2.71828…, not a variable. The expression is meaningless. Only varies.
Summary
| Concept | Statement |
|---|---|
| The headline rule | d/dx e^x = e^x |
| The special limit | lim h→0 (e^h − 1) / h = 1 |
| General base a | d/dx a^x = ln(a) · a^x |
| Chain rule version | d/dx e^(kx) = k · e^(kx) |
| Geometric reading | At every point on y = e^x, slope of tangent = height of curve |
| Tangent x-intercept | Tangent at (a, e^a) crosses x-axis at x = a − 1 (always 1 unit left) |
| Differential-equation form | y' = y has solution y = C · e^x |
One sentence to take away
is the unique base for which the exponential function coincides with its own derivative — a property born from the magical limit , which in turn is baked into the very definition of .
In the next section we generalize: what is the derivative of a general exponential , and how does enter the formula? Spoiler — we already derived it above. We'll just put it to work.