Chapter 5

Derivatives of General Exponentials: a^x

Derivatives of Transcendental Functions

Learning Objectives

By the end of this short section, you will be able to:

  1. State the rule $\frac{d}{dx}\,a^{x} = (\ln a)\,a^{x}$ from memory.
  2. Derive it two ways: rewriting through $e$ with the chain rule, and directly from the limit definition.
  3. See visually why $\ln a$ is the only constant that can possibly appear there.
  4. Use the rule to find tangent lines and instantaneous growth rates for any positive base.
  5. Verify the rule both in plain Python and with PyTorch's automatic differentiation.

The Question We Are Forced to Ask

In the previous section we discovered a small miracle: the derivative of $e^{x}$ is $e^{x}$ itself. That is unusually clean; it almost feels like cheating. But the world does not run on base $e$ alone.

  • Bacteria double: the natural base of the model is $2$, not $e$.
  • Earthquakes and pH live on a base-$10$ scale.
  • Computer memory grows by halves and doubles: bases $\tfrac{1}{2}$ and $2$.
  • Radioactive decay is naturally modeled with bases like $1/2$ (the half-life view).

So the honest question is: how does the slope of $a^{x}$ behave for a general base $a > 0,\ a \neq 1$? We are about to discover that the answer is almost as clean as the case $a = e$, just with a single extra multiplicative fingerprint of the base: the number $\ln a$.

The headline result

$$\frac{d}{dx}\bigl(a^{x}\bigr) \;=\; (\ln a)\cdot a^{x}$$

Read it out loud: “the derivative of $a^{x}$ is $\ln a$ times $a^{x}$.” The whole section exists to make that line feel inevitable.


Intuition: Every a^x Is a Stretched e^x

Here is the picture you should carry in your head before any algebra: every exponential function $a^{x}$ is secretly the function $e^{x}$ with its input axis rescaled, because $a^{x} = e^{(\ln a)\,x}$. If $a > e$ the curve is squeezed horizontally (it grows faster); if $1 < a < e$ it is stretched out (it grows slower); if $0 < a < 1$ the scale factor is negative, so the curve is also mirrored left to right and decays. The scale factor is precisely the number $\ln a$.

Why should a horizontal rescaling matter for the slope? Because of the chain rule: if $g(x) = f(kx)$, then $g'(x) = k\,f'(kx)$, so compressing the input axis by a factor $k$ multiplies every slope on the graph by $k$. Take $f(u) = e^{u}$ and $k = \ln a$: the slope of $e^{u}$ at the matching point $u = (\ln a)\,x$ is $e^{u} = a^{x}$, and multiplying by $k$ gives $(\ln a)\cdot a^{x}$. That is the whole argument in a nutshell; everything below just makes it rigorous.

Mental model: the function $a^{x}$ is “$e^{x}$ with the clock running at speed $\ln a$.” The slope picks up that clock speed exactly, and nothing else.
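A minimal numerical sketch of the rescaling argument, using only Python's standard `math` module. The helper names `centered_slope` and `rescaled_exp`, the base a = 3, and the point x = 1.2 are arbitrary choices for this illustration. It compares a brute-force slope of $e^{kx}$ with the chain-rule prediction $k\,e^{kx}$ and with the section's formula $(\ln a)\,a^{x}$.

```python
import math


def centered_slope(f, x, h=1e-5):
    """Centered difference estimate of f'(x); the error shrinks like h**2."""
    return (f(x + h) - f(x - h)) / (2 * h)


a = 3.0
k = math.log(a)  # the "clock speed" ln a


def rescaled_exp(x):
    """e^x with its input axis scaled by k, i.e. e^(k x), which equals a^x."""
    return math.exp(k * x)


x = 1.2
numeric = centered_slope(rescaled_exp, x)
chain_rule = k * math.exp(k * x)   # slope of e^u at u = kx, multiplied by k
rule = math.log(a) * a ** x        # the section's formula (ln a) * a^x

print(f"numeric    = {numeric:.10f}")
print(f"chain rule = {chain_rule:.10f}")
print(f"(ln a) a^x = {rule:.10f}")
```

Changing `a` changes only the constant `k`; the structure of the computation, and the agreement between the three numbers, stay the same.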

Rewriting a^x Through e

Any positive number $a$ can be written as a power of $e$, because the natural log is defined to undo $e^{(\cdot)}$:

$$a = e^{\ln a} \quad\Longrightarrow\quad a^{x} = \bigl(e^{\ln a}\bigr)^{x} = e^{(\ln a)\,x}.$$

That single identity is the bridge from the previous section to this one. It says: every exponential lives inside the natural exponential function. The base $a$ hides inside the exponent as the constant multiplier $\ln a$.

Why this rewrite is legal

We are using two facts: (1) the definition of the natural log, $e^{\ln a} = a$ for $a > 0$; and (2) the power-of-a-power rule $(b^{p})^{q} = b^{pq}$, which holds for all real exponents $p, q$ whenever the base $b$ is positive. Here $b = e > 0$, so the rewrite holds for every positive base $a$.
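The identity can also be spot-checked in floating point. This sketch uses nothing beyond the standard library; the particular bases and exponents are arbitrary test values, and the two sides agree up to round-off rather than bit-for-bit.

```python
import math

bases = [0.5, 2.0, math.e, 10.0]       # any positive bases
exponents = [-2.5, 0.0, 1.0, 3.7]      # any real exponents

for a in bases:
    for x in exponents:
        direct = a ** x                        # a^x computed directly
        via_e = math.exp(math.log(a) * x)      # e^((ln a) x)
        # Equal up to floating-point round-off, not necessarily bit-for-bit.
        assert math.isclose(direct, via_e, rel_tol=1e-12), (a, x)

print("a**x matched exp(log(a) * x) for every tested pair")
```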


Derivation by the Chain Rule

Now use the chain rule on $a^{x} = e^{(\ln a)\,x}$. Let $u(x) = (\ln a)\,x$, so that $a^{x} = e^{u(x)}$.

$$
\begin{aligned}
\frac{d}{dx}\,a^{x} &= \frac{d}{dx}\,e^{u(x)} = e^{u(x)}\cdot u'(x) \\
&= e^{(\ln a)\,x}\cdot(\ln a) \\
&= (\ln a)\cdot a^{x}. \qquad\blacksquare
\end{aligned}
$$

Three short lines, and we have the rule for every base at once. The outer derivative gave us back the function (because that is what $e^{x}$ does), and the inner derivative pulled the constant $\ln a$ down. The constant is the fingerprint of the base; everything else is the function itself.

Why this is the clean way: we did not have to re-prove anything. We borrowed the previous section's identity $(e^{x})' = e^{x}$ and the chain rule from chapter 4. New facts in this section: zero. New consequences: all exponentials.

Derivation from the Limit Definition

For readers who want to see this without invoking the chain rule, here is the same conclusion straight from the difference quotient. At any point $x$,

$$\frac{d}{dx}\,a^{x} = \lim_{h\to 0}\frac{a^{x+h}-a^{x}}{h} = a^{x}\,\lim_{h\to 0}\frac{a^{h}-1}{h}.$$

All the $x$-dependence factored out into the $a^{x}$ in front. Whatever the rule's multiplier turns out to be, it must be a constant that depends only on $a$. Call it $L(a)$:

$$L(a) \;=\; \lim_{h\to 0}\frac{a^{h}-1}{h}.$$

We claim $L(a) = \ln a$. The fastest proof uses the rewrite we just did:

$$
\begin{aligned}
a^{h} - 1 &= e^{(\ln a)\,h} - 1 \\
&= (\ln a)\,h + \tfrac{1}{2}\,(\ln a)^{2}\,h^{2} + \cdots \qquad \text{(Taylor series for } e^{z}\text{)} \\
\frac{a^{h}-1}{h} &= \ln a + \tfrac{1}{2}\,(\ln a)^{2}\,h + \cdots \;\xrightarrow[h\to 0]{}\; \ln a.
\end{aligned}
$$

So $L(a) = \ln a$ and the rule reappears. Notice what the limit is telling us geometrically: setting $x = 0$ in the difference quotient shows that the constant $\ln a$ is just the slope of $a^{x}$ at $x = 0$. Scale that slope by the height $a^{x}$ and you get the slope at every other point.
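The limit can be watched numerically. This sketch assumes a = 2 and a handful of step sizes, and compares the actual error of $(a^{h}-1)/h$ against $\ln a$ with the first Taylor correction $\tfrac{1}{2}(\ln a)^{2}h$ from the series above.

```python
import math

a = 2.0
steps = [1e-1, 1e-2, 1e-3, 1e-4]
errors, predicted = [], []

for h in steps:
    estimate = (a ** h - 1) / h               # secant slope of a^x from 0 to h
    err = estimate - math.log(a)              # distance from the claimed limit ln a
    first_term = 0.5 * math.log(a) ** 2 * h   # leading Taylor correction
    errors.append(err)
    predicted.append(first_term)
    print(f"h={h:.0e}  estimate={estimate:.6f}  error={err:.2e}  "
          f"(1/2)(ln a)^2 h={first_term:.2e}")
```

Each tenfold cut in $h$ cuts the error roughly tenfold, as the linear term in the series predicts.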


Interactive Visualization

Drag the base $a$. The blue curve is $f(x) = a^{x}$. The dashed red curve is $f'(x) = (\ln a)\,a^{x}$. Watch the red curve as you slide $a$ through $e \approx 2.718$: it lands exactly on the blue curve. That is the only base where $\ln a = 1$, and it is exactly why $e$ is the “natural” base.

Figure (interactive): d/dx (a^x) = (ln a) · a^x explorer. Curves: f(x) = a^x, f'(x) = (ln a) · a^x, and the tangent line at x. Readout at the default base a = 2: ln a (the stretch factor) = 0.6931, f(x) = 2.0000, f'(x) = 1.3863. Caption for this base: flatter than e^x at every point; because 0 < ln a < 1, the derivative is a shrunken copy of the function.

Three things to verify with the sliders:

  • For $a > e$ the dashed red derivative sits above the blue function: the slope is bigger than the value.
  • For $1 < a < e$ the red curve sits below the blue curve: the slope is a shrunken copy of the function.
  • For $0 < a < 1$ the constant $\ln a$ becomes negative; the red curve flips below the x-axis and the function decays.
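One way to make the three regimes concrete without a slider is to estimate $f'(x)/f(x)$ numerically at several points. If the rule is right, the ratio is the same constant $\ln a$ at every $x$, and comparing it with 0 and 1 sorts the base into the cases above. This is a sketch; the helper names `ratio_estimate` and `regime` and the test bases are illustrative choices.

```python
import math


def ratio_estimate(a, x, h=1e-6):
    """Numerical f'(x) / f(x) for f(x) = a**x, using a centered difference."""
    slope = (a ** (x + h) - a ** (x - h)) / (2 * h)
    return slope / a ** x


def regime(a):
    """Classify a base by its multiplier ln a."""
    r = math.log(a)
    if math.isclose(r, 1.0):
        return "derivative equals function (a = e)"
    if r > 1:
        return "derivative above function (a > e)"
    if r > 0:
        return "derivative is a shrunken copy (1 < a < e)"
    return "derivative negative, function decays (0 < a < 1)"


ratios = {}
for a in [0.5, 2.0, math.e, 5.0]:
    ratios[a] = [ratio_estimate(a, x) for x in (-2.0, 0.0, 2.0)]
    print(f"a={a:.4f}  ratios={[round(r, 6) for r in ratios[a]]}  "
          f"ln a={math.log(a):.6f}  -> {regime(a)}")
```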

Why the Multiplier Is ln a (Not Something Else)

It is worth pausing on the question that confuses many students the first time: why log? Why not a square root, why not the base itself? Here is the cleanest answer the limit gives us.

We just saw that $L(a) = \lim_{h\to 0}(a^{h}-1)/h$ is the slope of $a^{x}$ at the origin. Two properties pin it down completely:

  1. It turns products into sums. The product rule for exponents gives $(ab)^{h} = a^{h}b^{h}$, so $(ab)^{h} - 1 = a^{h}\,(b^{h} - 1) + (a^{h} - 1)$. Divide by $h$ and let $h \to 0$; since $a^{h} \to 1$, this gives $L(ab) = L(a) + L(b)$. A continuous function turning products into sums is a logarithm.
  2. It equals $1$ at $a = e$. Because $(e^{x})' = e^{x}$, the slope of $e^{x}$ at the origin is $e^{0} = 1$, so $L(e) = 1$.

There is exactly one continuous function on the positive reals satisfying both: the natural logarithm $\ln$. So the multiplier was never a choice; it was forced.
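Both defining properties can be checked numerically without calling `math.log` inside the estimate. The sketch below approximates $L(a)$ with the symmetric quotient $(a^{h} - a^{-h})/(2h)$, which tends to the same limit as $(a^{h}-1)/h$ but converges faster; that swap and the helper name `L_estimate` are assumptions made for this illustration.

```python
import math


def L_estimate(a, h=1e-6):
    """Approximate L(a) = lim (a**h - 1) / h with a symmetric quotient."""
    return (a ** h - a ** (-h)) / (2 * h)


L2, L3, L6 = L_estimate(2.0), L_estimate(3.0), L_estimate(6.0)
Le = L_estimate(math.e)

print(f"L(2) + L(3) = {L2 + L3:.8f}")
print(f"L(6)        = {L6:.8f}")              # products become sums
print(f"L(e)        = {Le:.8f}")              # normalisation: equals 1
print(f"ln 6        = {math.log(6.0):.8f}")   # used only for comparison
```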

Figure (interactive): Why ln a? The hidden limit inside every a^x. The slope of a^x at x = 0 is the limit of (a^h − 1)/h. Plotted against the base a (from 1 to 6, with e marked), the dashed orange estimate (a^h − 1)/h folds onto the solid blue curve ln a as h shrinks. Readout at a = 2: true slope ln a = 0.693147, estimate (a^h − 1)/h = 0.717735, |error| = 2.459e-2.

Slide $a$ and watch where the solid blue $\ln a$ curve crosses the line $y = 1$: by construction, that happens at $a = e$. As $h$ shrinks, the dashed orange estimate crosses $y = 1$ closer and closer to the same point.


Worked Example: Doubling Time and Tangents

Let us do one full numerical walk-through that ties intuition, formula, and computation together.

Problem. A population grows according to $P(t) = 2^{t}$, where $t$ is measured in days. (a) How fast is the population growing at $t = 3$? (b) Write the tangent line to $P$ at $t = 3$. (c) Use that tangent line to estimate $P(3.1)$, and compare to the true value.

Step 1: Apply the rule. With $a = 2$,

$$P'(t) = (\ln 2)\cdot 2^{t}.$$

Step 2: Evaluate at $t = 3$.

$$P'(3) = (\ln 2)\cdot 2^{3} = (\ln 2)\cdot 8.$$

Numerically $\ln 2 \approx 0.693147$, so $P'(3) \approx 5.545177$: at $t = 3$ the population is growing at an instantaneous rate of roughly 5.5 individuals per day.

Step 3: Tangent line. The point on the curve is $\bigl(3,\,P(3)\bigr) = (3,\,8)$, so the tangent has the point-slope equation

$$L(t) = 8 + (\ln 2)\cdot 8\,(t - 3).$$

Step 4: Linear estimate at $t = 3.1$.

$$L(3.1) = 8 + (\ln 2)\cdot 8\cdot 0.1 \approx 8 + 0.554518 \approx 8.554518.$$

Step 5: Compare. The true value is $P(3.1) = 2^{3.1} \approx 8.574188$. The tangent underestimates by about $0.0197$, an error of roughly 0.23%. For such a fast-growing function over a tenth of a day, the local linear model is excellent. This is why engineers love tangent lines: they replace a hard exponential with a one-line multiplication and still land within about a quarter of a percent of the true value.

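| Quantity       | Symbolic             | Numeric    |
|----------------|----------------------|------------|
| Function value | P(3) = 2^3           | 8.000000   |
| Rate of change | P'(3) = (ln 2) · 8   | 5.545177   |
| Tangent at 3.1 | L(3.1)               | 8.554518   |
| True P(3.1)    | 2^3.1                | 8.574188   |
| Linear error   | P(3.1) − L(3.1)      | ≈ 0.019670 |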
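The whole worked example fits in a few lines of plain Python. This sketch simply re-evaluates the formulas above; `P` and `tangent` are names chosen here for illustration.

```python
import math


def P(t):
    """Population model P(t) = 2^t (t in days)."""
    return 2.0 ** t


t0 = 3.0
slope = math.log(2.0) * P(t0)          # P'(3) = (ln 2) * 8


def tangent(t):
    """Tangent line to P at t0, in point-slope form."""
    return P(t0) + slope * (t - t0)


estimate = tangent(3.1)
true_value = P(3.1)
error = true_value - estimate

print(f"P(3)   = {P(t0):.6f}")        # 8.000000
print(f"P'(3)  = {slope:.6f}")        # 5.545177
print(f"L(3.1) = {estimate:.6f}")     # 8.554518
print(f"P(3.1) = {true_value:.6f}")   # 8.574188
print(f"error  = {error:.6f} ({100 * error / true_value:.2f}% of the true value)")
```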

Shortcut Rules and Common Cases

Once you have $(a^{x})' = (\ln a)\,a^{x}$, every related rule falls out by the chain rule. The ones worth memorising:

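| Function  | Derivative                               | Why                          |
|-----------|------------------------------------------|------------------------------|
| a^x       | (ln a) · a^x                             | this section                 |
| a^(kx)    | k (ln a) · a^(kx)                        | chain rule with inner u = kx |
| a^(g(x))  | (ln a) · a^(g(x)) · g'(x)                | chain rule with inner g      |
| e^x       | e^x                                      | special case: ln e = 1       |
| 2^x       | (ln 2) · 2^x ≈ 0.6931 · 2^x              | a = 2                        |
| 10^x      | (ln 10) · 10^x ≈ 2.3026 · 10^x           | a = 10                       |
| (1/2)^x   | (ln 0.5) · (1/2)^x ≈ −0.6931 · (1/2)^x   | decay base                   |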
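Each row of the table can be spot-checked against a centered difference. In this sketch the inner function $g(x) = x^{2} + 1$, the constants a = 2 and k = 3, and the point x = 0.7 are arbitrary illustrative choices, and `centered` is a hypothetical helper name.

```python
import math


def centered(f, x, h=1e-6):
    """Centered difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


a, k, x = 2.0, 3.0, 0.7


def g(t):
    return t ** 2 + 1        # illustrative inner function


def g_prime(t):
    return 2 * t


# name -> (function, closed-form derivative from the table evaluated at x)
rows = {
    "a^x":      (lambda t: a ** t,       math.log(a) * a ** x),
    "a^(kx)":   (lambda t: a ** (k * t), k * math.log(a) * a ** (k * x)),
    "a^(g(x))": (lambda t: a ** g(t),    math.log(a) * a ** g(x) * g_prime(x)),
    "(1/2)^x":  (lambda t: 0.5 ** t,     math.log(0.5) * 0.5 ** x),
}

results = {}
for name, (f, rule) in rows.items():
    results[name] = (centered(f, x), rule)
    print(f"{name:>9}: numeric={results[name][0]:.8f}  table={rule:.8f}")
```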
Do not confuse $a^{x}$ (exponential, variable in the exponent) with $x^{a}$ (power function, variable in the base). They follow completely different rules:
  • $\dfrac{d}{dx}\,a^{x} = (\ln a)\,a^{x}$: an exponential rule.
  • $\dfrac{d}{dx}\,x^{a} = a\,x^{a-1}$: the power rule from chapter 4.
At $a = 2,\ x = 3$ the first gives $\approx 5.545$ and the second gives $2\cdot 3^{1} = 6$. Same letters, different answers: which letter is moving matters.
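Evaluating both rules at the same numbers makes the difference concrete. A minimal sketch with a = 2 and x = 3:

```python
import math

a, x = 2.0, 3.0
exponential_rule = math.log(a) * a ** x   # d/dx a^x = (ln a) a^x
power_rule = a * x ** (a - 1)             # d/dx x^a = a x^(a-1)

print(f"d/dx a^x at x=3: {exponential_rule:.3f}")  # 5.545
print(f"d/dx x^a at x=3: {power_rule:.3f}")        # 6.000
```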

Python: Verifying the Rule Numerically

We will build intuition with plain Python before reaching for any framework. The goal is not to compute $(\ln a)\cdot a^{x}$; that is one line. The goal is to cross-check the formula against a brute-force numerical slope so we trust it numerically, not just symbolically.

Cross-checking d/dx (a^x) numerically
🐍general_exponential_derivative.py
Line 1: Why we import math

We need math.log for the natural logarithm ln. In Python, math.log(x) is the *natural* log by default — not log base 10. That is exactly the function the derivative rule asks for.

EXAMPLE
math.log(math.e) == 1.0  # by definition
Line 3: the function signature, three knobs

a is the base (any positive number except 1), x is the point where we want the slope, and h is the tiny step we will use to *cross-check* the rule numerically. With the default h = 1e-6, the estimate typically agrees with the analytic answer to eight or more significant digits for moderate a and x.

Line 14: f(x), the function value at the point

This is just a**x evaluated at the chosen x. We compute it once because both the closed-form rule and the numerical estimate refer back to it. For a=2, x=3 this is 8.

EXAMPLE
2 ** 3  ==  8
Line 15: f'(x), the closed-form derivative we are teaching

This is the single line that encodes the rule of this whole section: d/dx (a^x) = (ln a) · a^x. Notice it is literally the function value multiplied by the constant ln a. For a=2, x=3 that is ln(2) · 8 = 0.6931… · 8 ≈ 5.5452.

EXAMPLE
math.log(2) * 2**3  →  5.5451774444...
Line 16: symmetric numerical slope, the witness

This is a centered difference quotient. For tiny h, [a^(x+h) − a^(x−h)] / (2h) approximates f'(x) with error proportional to h². If the rule is correct, this number must match line 15 to many digits. It is our independent witness.

EXAMPLE
Centered ≈ true slope + O(h²)
Line 18: return a labelled dictionary

We deliberately return all of a, x, f(x), ln a, the rule's answer, and the numerical answer, plus their absolute error. That lets the reader *see* the rule working — not just trust the formula.

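Line 29: Test #1, base 2 at x = 3

Hand-check: f(3) = 2^3 = 8. f'(3) = ln(2) · 8 ≈ 0.693147 · 8 = 5.545177. The 'numerical' line should print 5.545177… with an abs error well below 1e-8; at this h, floating-point round-off in the subtraction, not the O(h²) term, sets the size of that error.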

Line 34: Test #2, base e (the magical case)

Here ln a = ln e = 1 *exactly*. The rule collapses to f'(x) = f(x). At x = 1 both f(x) and f'(x) print as 2.71828… — the function literally equals its own derivative. This is *why* e is the natural base.

Line 39: Test #3, decay base 1/2

ln(0.5) = −ln(2) ≈ −0.6931 is negative, so f'(x) = (ln 0.5) · (0.5)^x is negative everywhere. The function is decreasing, and the rule encodes that sign automatically — we did not have to special-case it.

EXAMPLE
f'(2) = ln(0.5) * 0.25  ≈  -0.1733
 1  import math
 2
 3  def derivative_a_to_x(a: float, x: float, h: float = 1e-6) -> dict:
 4      """
 5      Compare three things at the same point:
 6
 7        1. The closed-form derivative:   f'(x) = (ln a) * a**x
 8        2. A symmetric numerical slope:  [a**(x+h) - a**(x-h)] / (2h)
 9        3. The function value itself:    f(x) = a**x
10
11      The rule says (1) = (2) up to floating-point noise, and (1) equals (3)
12      only when ln a == 1, i.e. exactly when a == e.
13      """
14      f_x        = a ** x                                  # the function value
15      f_prime    = math.log(a) * f_x                       # closed-form rule
16      f_numeric  = (a ** (x + h) - a ** (x - h)) / (2 * h) # symmetric estimate
17
18      return {
19          "a":          a,
20          "x":          x,
21          "f(x)":       f_x,
22          "ln(a)":      math.log(a),
23          "f'(x) rule": f_prime,
24          "numerical":  f_numeric,
25          "abs error":  abs(f_prime - f_numeric),
26      }
27
28
29  # 1) f(x) = 2^x at x = 3.  f(3) = 8, f'(3) = ln(2) * 8 ≈ 5.5452
30  print("--- 2^x at x = 3 ---")
31  for k, v in derivative_a_to_x(2.0, 3.0).items():
32      print(f"  {k:>10}: {v}")
33
34  # 2) f(x) = e^x at x = 1.  Here ln(a) = 1, so f'(x) = f(x).
35  print("\n--- e^x at x = 1 ---")
36  for k, v in derivative_a_to_x(math.e, 1.0).items():
37      print(f"  {k:>10}: {v}")
38
39  # 3) f(x) = (1/2)^x at x = 2.  Decay: ln(a) = -ln(2) < 0, so slope is negative.
40  print("\n--- (1/2)^x at x = 2 ---")
41  for k, v in derivative_a_to_x(0.5, 2.0).items():
42      print(f"  {k:>10}: {v}")

What you should see when you run this

  • For $a = 2,\ x = 3$: f'(x) rule ≈ 5.545177, numerical ≈ 5.545177, abs error well below 1e-8.
  • For $a = e,\ x = 1$: f(x), f'(x), and the numerical estimate all print 2.71828…; the function is its own derivative.
  • For $a = 1/2,\ x = 2$: the derivative is negative, around $-0.1733$. The sign comes for free from $\ln(1/2) < 0$.
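The O(h²) claim for the centered difference can be checked directly. This sketch, an illustration rather than part of the original script, sweeps h for a = 2, x = 3: each tenfold cut in h should cut the error roughly a hundredfold. At much smaller h, floating-point round-off in the subtraction takes over, which is one reason not to push h far below the default of 1e-6.

```python
import math

a, x = 2.0, 3.0
exact = math.log(a) * a ** x                 # closed-form (ln a) a^x

steps = [1e-1, 1e-2, 1e-3, 1e-4]
errors = []
for h in steps:
    numeric = (a ** (x + h) - a ** (x - h)) / (2 * h)
    errors.append(abs(numeric - exact))
    print(f"h={h:.0e}  abs error={errors[-1]:.2e}")

# Ratio of successive errors; O(h^2) behaviour predicts roughly 100.
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print([round(r, 1) for r in ratios])
```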

PyTorch: Autograd Knows the Same Rule

Now let us check the same identity with the tool that powers modern deep learning. PyTorch's autograd stores a derivative formula for each primitive operation, including the power operation, and chains those formulas together at runtime. If its gradient for $a^{x}$ matches $(\ln a)\,a^{x}$, our hand-derived rule and the library agree, which is a useful independent cross-check.

PyTorch confirms d/dx (a^x) = (ln a) a^x
🐍autograd_check.py
Line 1: Why PyTorch for a calculus rule?

Autograd is a derivative engine. Every neural-network gradient you have ever heard of is the chain rule applied automatically. If we wire up f = a^x and call .backward(), PyTorch will hand us the number our rule predicts, up to float32 round-off: a useful cross-check.

Lines 3-4: pick a base that is NOT e

We use a = 2 so the multiplier ln a ≈ 0.693 is clearly different from 1. If we used a = e the rule would collapse to f'(x) = f(x) and the multiplier would be invisible: interesting, but less convincing as a test.

Line 5: requires_grad=True on x

Autograd only tracks tensors that ask to be tracked. x is the variable we are differentiating with respect to, so it is the one that needs requires_grad=True. a is a constant here.

Line 7: f = a ** x, the forward pass

PyTorch computes 2^3 = 8 and silently records the operation in a graph. The graph remembers 'this output came from a power with base a and exponent x' so it can replay the derivative later.

Line 8: f.backward(), the rule fires

This is the moment of truth. Backward applies the same identity we proved on paper: d/dx (a^x) = (ln a) · a^x. The result is deposited into x.grad. We never typed the formula; autograd supplied it from the derivative it stores for the power operation.

Line 11: compute the analytic answer by hand

torch.log is the natural log. We detach x because we only want the *value* 2^3, not a node in the graph. The result is (ln 2) · 8 ≈ 5.5452 — the very number the rule predicts.

EXAMPLE
(ln 2) * 2**3 ≈ 0.6931 * 8 ≈ 5.5452
Lines 13-16: print to compare

Both x.grad and analytic should agree to within float32 round-off, at most a few units of 1e-7 at this magnitude. If they disagreed by much more, either our rule or the way we set up autograd would be wrong; in practice they agree.

Lines 18-22: same operation, different variable

Now we make a the leaf with requires_grad=True and treat x as a constant. The expression a^x is now viewed as a *power* function in a, so its derivative follows the power rule from chapter 4: x · a^(x-1). Same Python code, completely different rule: which letter is varying matters.

EXAMPLE
d/da (a^3) at a=2 = 3 * 2^2 = 12
Lines 24-25: the numeric check

AD gives 12.0, and x · a^(x-1) also gives 12.0. This drives the deepest point home: 'derivative of a^x' is *ambiguous* until you say which letter is varying. Section 5.2 fixes that letter to be x.

 1  import torch
 2
 3  # We deliberately pick a base a != e to make the multiplier (ln a) visible.
 4  a = torch.tensor(2.0)
 5  x = torch.tensor(3.0, requires_grad=True)   # x must be a leaf with grad
 6
 7  f = a ** x                                  # f = 2^x at x = 3 -> 8.0
 8  f.backward()                                # ask autograd for df/dx
 9
10  # Closed-form answer for comparison
11  analytic = torch.log(a) * a ** x.detach()   # (ln 2) * 2^3 = 5.5451...
12
13  print(f"f(x)         = {f.item():.6f}")     # 8.000000
14  print(f"x.grad (AD)  = {x.grad.item():.6f}")# 5.545177
15  print(f"analytic     = {analytic.item():.6f}")
16  print(f"abs error    = {abs(x.grad.item() - analytic.item()):.2e}")
17
18  # A second example: gradient w.r.t. the BASE a, not x.
19  a2 = torch.tensor(2.0, requires_grad=True)
20  x2 = torch.tensor(3.0)
21  g  = a2 ** x2                               # g = a^x as a function of a
22  g.backward()
23  # Power rule (in a, treating x as constant): dg/da = x * a^(x-1)
24  print(f"\ndg/da (AD)   = {a2.grad.item():.6f}")  # 3 * 2^2 = 12.0
25  print(f"x * a^(x-1)  = {(x2 * a2.detach() ** (x2 - 1)).item():.6f}")
Two derivatives, one expression: the same Python line a ** x has two completely different derivatives depending on which leaf carries requires_grad=True. With respect to $x$ we get our new exponential rule; with respect to $a$ we get the power rule. That is the cleanest mental check that “exponential” and “power” really are distinct families of functions.

Where This Rule Lives in the Real World

1. Doubling and halving processes

Anywhere a quantity doubles in a fixed window (cell division, Moore's law, viral spread), the model is $N(t) = N_{0}\,2^{t/T}$ for some doubling time $T$. Its instantaneous growth rate is

$$N'(t) = \frac{\ln 2}{T}\,N(t).$$

The constant $(\ln 2)/T$ is what epidemiologists call the growth rate. It is the section's rule with base $2$, plus the chain rule applied to the inner function $t/T$.
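A quick sketch of the doubling-time formula, with hypothetical numbers (100 cells doubling every 3 hours) chosen only for illustration: the numerical slope of $N(t)$ should equal $(\ln 2/T)\,N(t)$ at any time, and $\ln 2$ divided by the growth rate should give back $T$.

```python
import math

N0 = 100.0     # hypothetical initial count
T = 3.0        # hypothetical doubling time, in hours


def N(t):
    """Doubling model N(t) = N0 * 2^(t/T)."""
    return N0 * 2 ** (t / T)


growth_rate = math.log(2) / T   # per hour
t, h = 5.0, 1e-6
numeric = (N(t + h) - N(t - h)) / (2 * h)

print(f"numerical N'(5)     = {numeric:.6f}")
print(f"(ln 2 / T) * N(5)   = {growth_rate * N(t):.6f}")
print(f"doubling time check = {math.log(2) / growth_rate:.6f} hours")
```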

2. Half-life of radioactive isotopes

Carbon-14 decays as $N(t) = N_{0}\,(1/2)^{t/T_{1/2}}$, where $T_{1/2}$ is the half-life. Then

$$N'(t) = -\frac{\ln 2}{T_{1/2}}\,N(t).$$

The negative sign comes straight out of $\ln(1/2) = -\ln 2$. The rule encodes decay and growth with the same formula: the sign of the slope is the sign of $\ln a$.
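The same check works for decay. This sketch assumes the commonly quoted carbon-14 half-life of about 5730 years; that value is an assumption of the example, not something stated in the text. A one-year step is tiny compared with the half-life, so the centered difference is already very accurate.

```python
import math

HALF_LIFE = 5730.0   # years; assumed value for carbon-14
N0 = 1.0             # normalised starting amount


def N(t):
    """Decay model N(t) = N0 * (1/2)^(t / HALF_LIFE)."""
    return N0 * 0.5 ** (t / HALF_LIFE)


k = -math.log(2) / HALF_LIFE     # decay constant, negative
t, h = 1000.0, 1.0               # evaluate at t = 1000 years with a 1-year step
numeric = (N(t + h) - N(t - h)) / (2 * h)

print(f"decay constant   = {k:.4e} per year")
print(f"numerical N'(t)  = {numeric:.6e}")
print(f"k * N(t)         = {k * N(t):.6e}")
```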

3. Earthquakes, decibels, pH — base-10 scales

The seismic moment magnitude scale is base-10. A model of energy release like $E(M) = E_{0}\cdot 10^{1.5 M}$ has

$$\frac{dE}{dM} = 1.5\,(\ln 10)\,E_{0}\cdot 10^{1.5 M} \approx 3.4539\,E(M).$$

Going up one magnitude multiplies the energy by $10^{1.5} \approx 31.6$; the derivative tells us the local sensitivity at every magnitude.
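A sketch with $E_{0} = 1$ (arbitrary energy units, an assumption for illustration): the relative sensitivity $(dE/dM)/E$ should equal $1.5\ln 10$ at every magnitude, and one magnitude step should multiply $E$ by $10^{1.5}$.

```python
import math


def E(M, E0=1.0):
    """Energy model E(M) = E0 * 10^(1.5 M), E0 in arbitrary units."""
    return E0 * 10 ** (1.5 * M)


M, h = 5.0, 1e-6
slope = (E(M + h) - E(M - h)) / (2 * h)
sensitivity = slope / E(M)                       # (dE/dM) / E

print(f"(dE/dM)/E   = {sensitivity:.4f}")        # about 3.4539
print(f"1.5 * ln 10 = {1.5 * math.log(10):.4f}")
print(f"E(6) / E(5) = {E(6.0) / E(5.0):.4f}")    # 10^1.5, about 31.6228
```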

4. Machine learning: learnable bases and temperatures

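Temperature-scaled softmax, $p_{i} \propto e^{z_{i}/T}$, can be written with base $a = e^{1/T}$ as $p_{i} \propto a^{z_{i}}$, which makes the base $a$ a tunable knob. Backpropagating to the logits $z_{i}$ relies on exactly this rule, since $\frac{d}{dz_{i}}\,a^{z_{i}} = (\ln a)\,a^{z_{i}}$. Learning the base itself (equivalently, the temperature) differentiates with respect to $a$ instead, which is the power-rule case from the PyTorch example above. Likewise, any layer that uses bases other than $e$ (rare but possible in custom architectures) needs $(\ln a)\,a^{x}$ to propagate gradients correctly.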
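A minimal pure-Python sketch of the softmax case, assuming the base-$a$ form $p_{i} \propto a^{z_{i}}$; the helpers `base_a_softmax` and `softmax` and the logits are illustrative. It checks that base $a$ matches an ordinary softmax of $z/T$ with $T = 1/\ln a$, and that the gradient with respect to a logit, $\partial p_{0}/\partial z_{0} = (\ln a)\,p_{0}(1 - p_{0})$, agrees with a centered difference. The $\ln a$ factor comes from $(a^{z_{0}})' = (\ln a)\,a^{z_{0}}$.

```python
import math


def base_a_softmax(z, a):
    """p_i proportional to a**z_i (hypothetical helper)."""
    weights = [a ** zi for zi in z]
    total = sum(weights)
    return [w / total for w in weights]


def softmax(z):
    """Ordinary base-e softmax."""
    weights = [math.exp(zi) for zi in z]
    total = sum(weights)
    return [w / total for w in weights]


z = [1.0, 2.0, 0.5]
a = 3.0
p = base_a_softmax(z, a)

# Base a is the same as an ordinary softmax with temperature T = 1 / ln a.
T = 1.0 / math.log(a)
p_temp = softmax([zi / T for zi in z])

# Gradient of p_0 with respect to its own logit z_0.
analytic = math.log(a) * p[0] * (1.0 - p[0])
h = 1e-6
p_plus = base_a_softmax([z[0] + h] + z[1:], a)[0]
p_minus = base_a_softmax([z[0] - h] + z[1:], a)[0]
numeric = (p_plus - p_minus) / (2 * h)

print(f"p (base a)        = {[round(v, 6) for v in p]}")
print(f"p (temperature T) = {[round(v, 6) for v in p_temp]}")
print(f"dp0/dz0 analytic  = {analytic:.8f}")
print(f"dp0/dz0 numerical = {numeric:.8f}")
```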


Common Mistakes

  1. Dropping the $\ln a$. The most frequent error is writing $(2^{x})' = 2^{x}$. Writing $(a^{x})' = a^{x}$ is valid only for $a = e$; for every other base the fingerprint $\ln a$ must be there.
  2. Using $\log_{10}$ instead of $\ln$. The rule is $(a^{x})' = (\ln a)\,a^{x}$, with the natural log. Using base 10 will make every answer off by a factor of $\ln 10 \approx 2.3026$.
  3. Confusing exponential and power rules. When the variable is in the exponent, use the exponential rule; when the variable is in the base, use the power rule. The previous PyTorch example showed both rules applied to the same code — the only difference was which tensor required grad.
  4. Forgetting the chain rule on $a^{g(x)}$. The derivative is $(\ln a)\,a^{g(x)}\,g'(x)$, not just $(\ln a)\,a^{g(x)}$. People often nail the first factor and then drop the inner derivative.
  5. Treating $a < 1$ as a special case. It is not. The same formula handles decay automatically because $\ln a$ is negative.
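The first, second, and fourth mistakes are easy to expose numerically. This sketch uses a = 2, x = 3 for the first two and the illustrative inner function $g(x) = x^{2}$ at x = 1.5 for the fourth, with a centered difference (hypothetical helper `centered`) acting as referee.

```python
import math


def centered(f, x, h=1e-6):
    """Centered difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


a, x = 2.0, 3.0
reference = centered(lambda t: a ** t, x)
correct = math.log(a) * a ** x          # (ln a) a^x
dropped_ln = a ** x                     # mistake 1: ln a forgotten
used_log10 = math.log10(a) * a ** x     # mistake 2: log base 10 instead of ln

print(f"reference {reference:.6f} | correct {correct:.6f} | "
      f"no ln {dropped_ln:.6f} | log10 {used_log10:.6f}")
print(f"correct / log10 version = {correct / used_log10:.4f}")  # ln 10, about 2.3026

# Mistake 4: a^(g(x)) with g(x) = x^2, so g'(x) = 2x.
x4 = 1.5
ref4 = centered(lambda t: a ** (t ** 2), x4)
with_chain = math.log(a) * a ** (x4 ** 2) * (2 * x4)
without_chain = math.log(a) * a ** (x4 ** 2)
print(f"reference {ref4:.6f} | with g' {with_chain:.6f} | without g' {without_chain:.6f}")
```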

Summary

The Derivative of a^x in one line

$$\frac{d}{dx}\,a^{x} \;=\; (\ln a)\cdot a^{x},\qquad a > 0,\ a \neq 1.$$

Key takeaways

  1. Every exponential rewrites as $a^{x} = e^{(\ln a)\,x}$. That single identity reduces this entire section to the chain rule.
  2. The constant $\ln a$ is the slope of $a^{x}$ at the origin and the fingerprint of the base.
  3. The base $e$ is “natural” precisely because $\ln e = 1$ makes the multiplier disappear, nothing more, nothing less.
  4. Sign of growth, rate of decay, and instantaneous sensitivity to the input are all encoded in that one constant $\ln a$.
  5. The rule is consistent with both elementary numerical slopes (plain Python) and full automatic differentiation (PyTorch).
Coming next: we invert the picture and ask: what is the derivative of $\ln x$? The answer is going to be startlingly simple, and the “$\ln a$” we just discovered is no accident.