Learning Objectives
By the end of this short section, you will be able to:
- State the rule from memory.
- Derive it two ways: rewriting through with the chain rule, and directly from the limit definition.
- See visually why is the only constant that can possibly appear there.
- Use the rule to find tangent lines and instantaneous growth rates for any positive base.
- Verify the rule both in plain Python and with PyTorch's automatic differentiation.
The Question We Are Forced to Ask
In the previous section we discovered a small miracle: the derivative of is itself. That is unusually clean — it almost feels like cheating. But the world does not only use base .
- Bacteria double: the natural base of the model is , not .
- Earthquakes and pH live on a base- scale.
- Computer memory grows by halves and doubles — bases and .
- Radioactive decay is naturally modeled with bases like (the half-life view).
So the honest question is: how does the slope of behave for a general positive base ? We are about to discover that the answer is almost as clean as the case , just with a single extra multiplicative fingerprint of the base: the number .
The headline result
Read it out loud: “the derivative of is times .” The whole section exists to make that line feel inevitable.
Intuition: Every a^x Is a Stretched e^x
Here is the picture you should carry in your head before any algebra: every exponential function is secretly the function that has been horizontally stretched. If the curve is squeezed tighter (it grows faster); if it is stretched out (it grows slower). The stretch factor is precisely the number .
Why should a horizontal stretch matter for the slope? Because the chain rule says: when you compress the input axis by a factor , every slope on the graph gets multiplied by . The slope of at the corresponding point is . Rebrand the variable and you get . That is the whole argument in one sentence — everything below just makes it rigorous.
Rewriting a^x Through e
Any positive number can be written as a power of , because the natural log is defined to undo :
That single identity is the bridge from the previous section to this one. It says: every exponential lives inside the exponential function. The base hides inside the exponent as the constant multiplier .
Why this rewrite is legal
We are using two facts: (1) the definition of the natural log, for ; and (2) the power-of-a-power rule . Both are valid for any real exponent, so the rewrite holds for every positive base.
Derivation by the Chain Rule
Now use the chain rule on . Let , so that .
Three short lines, and we have the rule for every base at once. The outer derivative gave us back the function (because that is what does), and the inner derivative pulled the constant down. The constant is the fingerprint of the base — everything else is the function itself.
Derivation from the Limit Definition
For readers who want to see this without invoking the chain rule, here is the same conclusion straight from the difference quotient. At any point ,
All the -dependence factored out into the in front. Whatever the rule's multiplier turns out to be, it must be a constant that depends only on . Call it :
We claim . The fastest proof uses the rewrite we just did:
So and the rule reappears. Notice what the limit equation is telling us geometrically: the constant is just the slope of at . Stretch the function up by and you get the slope at every other point.
Interactive Visualization
Drag the base . The blue curve is . The dashed red curve is . Watch the red curve as you slide through : it passes through the blue curve. That moment is the only moment where , and it is exactly why is the “natural” base.
d/dx (a^x) = (ln a) · a^x — Interactive Explorer
Drag the base. Watch the multiplier ln a stretch the derivative curve, and notice it lands exactly on the function when a = e.
Two things to verify with the sliders:
- For the dashed red derivative sits above the blue function — the slope is bigger than the value.
- For the red curve sits below the blue curve — the slope is a shrunken copy of the function.
- For the constant becomes negative; the red curve flips below the x-axis and the function decays.
Why the Multiplier Is ln a (Not Something Else)
It is worth pausing on the question that confuses every student the first time: why log? Why not square root, why not the base itself? Here is the cleanest answer the limit gives us.
We just saw is the slope of at the origin. Two properties pin it down completely:
- It is additive in exponents. The product rule for exponents gives , and a short calculation shows . A function turning products into sums is a logarithm.
- It equals at . Because , the slope of at the origin is , so .
There is exactly one continuous function on the positive reals satisfying both: the natural logarithm . So the multiplier was never a choice — it was forced.
Why ln a? The Hidden Limit Inside Every a^x
The slope of a^x at x = 0 is the limit of (a^h − 1) / h. Shrink h and watch the orange dashed curve fold onto the blue ln a curve.
Slide until the dashed orange curve and the solid blue curve meet at . The crossing point on the x-axis is, by construction, .
Worked Example: Doubling Time and Tangents
Let us do one full numerical walk-through that ties intuition, formula, and computation together.
Problem. A population grows according to , where is measured in days. (a) How fast is the population growing at ? (b) Write the tangent line to at . (c) Use that tangent line to estimate , and compare to the true value.
Click to expand the hand-computation
Step 1 — Apply the rule. With ,
Step 2 — Evaluate at .
Numerically , so — the population is growing by roughly 5.5 individuals per day at .
Step 3 — Tangent line. The point on the curve is . So the tangent has the point-slope equation
Step 4 — Linear estimate at .
Step 5 — Compare. The true value is . The tangent under-estimates by about — an error of roughly 0.23%. For such a fast-growing function over a tenth of a day, the local linear model is excellent. This is why engineers love tangent lines: they replace a hard exponential with a one-line multiplication and still get three accurate digits.
| Quantity | Symbolic | Numeric |
|---|---|---|
| Function value | P(3) = 2^3 | 8.000000 |
| Rate of change | P'(3) = (ln 2)·8 | 5.545177 |
| Tangent at 3.1 | L(3.1) | 8.554518 |
| True P(3.1) | 2^3.1 | 8.574188 |
| Linear error | |P - L| | ≈ 0.019670 |
Shortcut Rules and Common Cases
Once you have , every related rule falls out by the chain rule. The ones worth memorising:
| Function | Derivative | Why |
|---|---|---|
| a^x | (ln a) · a^x | this section |
| a^(kx) | k (ln a) · a^(kx) | chain rule with inner u = kx |
| a^(g(x)) | (ln a) · a^(g(x)) · g'(x) | chain rule with inner g |
| e^x | e^x | special case ln e = 1 |
| 2^x | (ln 2) · 2^x ≈ 0.6931 · 2^x | a = 2 |
| 10^x | (ln 10) · 10^x ≈ 2.3026 · 10^x | a = 10 |
| (1/2)^x | (ln 0.5) · (1/2)^x ≈ -0.6931 · (1/2)^x | decay base |
- — an exponential rule.
- — the power rule from chapter 4.
Python: Verifying the Rule Numerically
We will build intuition with plain Python before reaching for any framework. The goal is not to compute — that is one line. The goal is to cross-check the formula against a brute-force numerical slope so we trust it physically, not just symbolically.
What you should see when you run this
- For : f'(x) rule ≈ 5.545177, numerical ≈ 5.545177, error ≈ 1e-12.
- For : f(x), f'(x), and the numerical estimate all print 2.71828… — the function is its derivative.
- For : the derivative is negative, around . The sign comes for free from .
PyTorch: Autograd Knows the Same Rule
Now let us check the same identity with the tool that powers modern deep learning. PyTorch's autograd does not know any rule by name — it composes elementary derivatives at runtime. So if were wrong, every model that ever used a learnable exponent would be wrong too. Spoiler: it isn't.
a ** x has two completely different derivatives depending on which leaf carries requires_grad=True. With respect to we get our new exponential rule; with respect to we get the power rule. That is the cleanest mental check that “exponential” and “power” really are distinct families of functions.Where This Rule Lives in the Real World
1. Doubling and halving processes
Anywhere a quantity doubles in a fixed window — cell division, Moore's law, viral spread — the model is for some doubling time . Its instantaneous growth rate is
The constant is what epidemiologists call the growth rate. It is the section's rule with the base 2 and the chain rule applied to .
2. Half-life of radioactive isotopes
Carbon-14 decays as . Then
The negative sign comes straight out of . The rule encoded decay and growth with the same formula — the sign of the slope is the sign of .
3. Earthquakes, decibels, pH — base-10 scales
The seismic moment magnitude scale is base-10. A model of energy release like has
Going up one magnitude multiplies the energy by ; the derivative tells us the local sensitivity at every magnitude.
4. Machine learning: learnable bases and temperatures
Temperature-scaled softmax, , makes the base a tunable knob. Backpropagating through it — for example, learning the temperature — relies on exactly this rule. Likewise, any layer that uses bases other than (rare but possible in custom architectures) needs to propagate gradients correctly.
Common Mistakes
- Dropping the . The most frequent error is writing . That is true only for . For every other base the fingerprint must be there.
- Using instead of . The rule is , with the natural log. Using base 10 will make every answer off by a factor of .
- Confusing exponential and power rules. When the variable is in the exponent, use the exponential rule; when the variable is in the base, use the power rule. The previous PyTorch example showed both rules applied to the same code — the only difference was which tensor required grad.
- Forgetting the chain rule on . The derivative is , not just . People often nail the first factor and then drop the inner derivative.
- Treating as a special case. It is not. The same formula handles decay automatically because is negative.
Summary
The Derivative of a^x in one line
Key takeaways
- Every exponential rewrites as . That single identity reduces this entire section to the chain rule.
- The constant is the slope of at the origin and the fingerprint of the base.
- The base is “natural” precisely because makes the multiplier disappear — nothing more, nothing less.
- Sign of growth, rate of decay, and instantaneous sensitivity to the input are all encoded in that one constant .
- The rule is consistent with both elementary numerical slopes (plain Python) and full automatic differentiation (PyTorch).
Coming next: we invert the picture and ask what is the derivative of ? The answer is going to be startlingly simple — and the “” we just discovered is no accident.