Learning Objectives
By the end of this section you will be able to:
- State and use the identity $\frac{d}{dx}\ln x = \frac{1}{x}$ for every $x > 0$.
- Explain why the slope of the log curve is the reciprocal of the input, both geometrically (area under $1/t$) and algebraically (inverse of $e^x$).
- Combine the rule with the chain rule to differentiate $\ln(u(x))$ for any positive function $u(x)$.
- Avoid the classic mistakes: sign errors with $\ln|x|$, undefined values at $x \le 0$, and treating $\ln$ as a linear function.
- Verify the formula numerically with finite differences and with PyTorch's autograd (automatic differentiation).
The Question Behind the Section
We already know how steep the curve $y = e^x$ is at every point. Question: how steep is its mirror image, the curve $y = \ln x$, at every point?
Section 5.1 gave us our first transcendental derivative: $\frac{d}{dx}e^x = e^x$. Up to a constant multiple, the exponential is the only function whose slope equals its own value. That is a deep statement, but it leaves a sibling question hanging.
Pull the graph of $y = e^x$ in front of you and reflect it across the line $y = x$. The mirror image is the natural logarithm, $y = \ln x$. Wherever the exponential was steep, the log is shallow. Where the exponential was shallow (far to the left, as $x \to -\infty$), the log is steep (as $x \to 0^+$). Slopes get exchanged with their reciprocals: that is what happens when you flip a curve across $y = x$.
So we already suspect the answer. The rest of the section makes it precise, explains why from two independent angles (an area picture and an algebra proof), gives you a 3Blue1Brown-style playground to feel it, walks a worked example by hand, and finally checks the formula in Python and PyTorch.
The headline
For every $x > 0$:
$$\frac{d}{dx}\ln x = \frac{1}{x}$$
Read it in English: the slope of the log curve at a point is one over that point. At $x = 1$ the slope is $1$. At $x = 10$ it is $0.1$. The curve never quite stops rising but it gets flatter and flatter, exactly the behaviour the formula predicts.
Geometric Definition: ln Is an Area
Many textbooks define $\ln x$ as “the inverse of $e^x$”. That is fine for computation but it hides what the log really is. There is a more honest definition, one that makes the derivative formula obvious:
$$\ln x = \int_1^x \frac{1}{t}\,dt, \qquad x > 0$$
In plain English: the natural logarithm of a positive number $x$ is the area trapped between the curve $y = 1/t$ and the $t$-axis, from $t = 1$ up to $t = x$. If $x > 1$, we are accumulating area going to the right and the result is positive. If $0 < x < 1$, we are travelling backwards along the axis, so the area picks up a minus sign, and $\ln x$ comes out negative, exactly as you remember.
Why this is a legitimate definition
The integrand $1/t$ is positive and continuous for every $t > 0$, so the integral exists for every positive $x$. From this single integral you can derive every property of the logarithm ($\ln(ab) = \ln a + \ln b$, $\ln(a/b) = \ln a - \ln b$, $\ln(a^r) = r\ln a$) by changing the variable inside the integral. It is the cleanest starting point in all of calculus for the logarithm.
The picture in three sentences
- Draw the hyperbola $y = 1/t$ on the positive $t$-axis.
- Anchor the left edge at $t = 1$; this is where the area starts counting (and where $\ln 1 = 0$).
- Slide the right edge to $t = a$. The shaded region between 1 and $a$ is exactly $\ln a$.
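The three steps above can be tried numerically. The sketch below is only an illustration: the function name `area_under_reciprocal`, the midpoint rule, and the number of slices are assumptions chosen for simplicity, not something the section prescribes. It accumulates the signed area under $1/t$ from $1$ to $a$ and prints it next to `math.log(a)`.

```python
import math

def area_under_reciprocal(a, n=100_000):
    """Midpoint-rule estimate of the signed integral of 1/t from t = 1 to t = a."""
    if a <= 0:
        raise ValueError("a must be positive")
    width = (a - 1.0) / n          # negative when a < 1, so the area is signed
    total = 0.0
    for i in range(n):
        midpoint = 1.0 + (i + 0.5) * width
        total += 1.0 / midpoint
    return total * width

for a in (0.5, 1.0, 2.0, math.e, 10.0):
    area = area_under_reciprocal(a)
    print(f"a = {a:7.4f}   area = {area:+.8f}   math.log(a) = {math.log(a):+.8f}")
```

For $a < 1$ the slice width is negative, which is how the code reproduces the "travelling backwards" sign convention.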
Interactive: Slide a, Watch ln(a) Grow
The interactive below makes the definition tangible. Drag the slider for $a$ and watch the shaded area change in real time. The number underneath the picture is the value of $\ln a$ computed by the area definition; it agrees with your calculator to the displayed precision.
Three things to try, in order:
- Slide right until the “Snap a = e” button rounds you to $a = e \approx 2.718$. The shaded area should read $1$. That is the area-based definition of $e$: the number whose log is one.
- Slide $a$ below $1$. The shaded region turns red, because area travelled right-to-left counts negative, and the printed value of $\ln a$ turns negative.
- Turn on the orange differential strip and look at the three numbers under the picture. The strip's area $\frac{1}{a}\,da$ (shown in amber) is almost identical to the increment $\ln(a + da) - \ln a$ (in magenta). That tiny mismatch shrinks faster than $da$ itself as $da$ shrinks, and that is exactly what it means for the derivative of $\ln$ at $a$ to be $\frac{1}{a}$.
From Area to Derivative: The Rate of Growth
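We now turn the picture into a one-line proof. Start with the definition:
$$\ln x = \int_1^x \frac{1}{t}\,dt, \qquad x > 0$$
The right-hand side is a function defined by an integral with a moving upper limit. The Fundamental Theorem of Calculus, Part 1 (covered earlier in this book), says exactly how its derivative behaves:
FTC Part 1. If $F(x) = \int_a^x f(t)\,dt$ for a continuous integrand $f$, then $F'(x) = f(x)$.
Apply that with $f(t) = 1/t$ and $a = 1$:
$$\frac{d}{dx}\ln x = \frac{d}{dx}\int_1^x \frac{1}{t}\,dt = \frac{1}{x}$$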
That is the whole proof. One line, because the definition was the right one.
The same argument in plain words
Let the shaded area equal $A(x) = \ln x$. Now nudge the right edge from $x$ to $x + dx$. The added strip is so thin that its top is essentially a horizontal segment of height $\frac{1}{x}$. So the strip's area is approximately:
$$A(x + dx) - A(x) \approx \frac{1}{x}\,dx$$
Divide both sides by $dx$ and send $dx \to 0$. The left-hand side becomes the derivative $A'(x)$; the right-hand side is $\frac{1}{x}$. Done.
Why the formula has no constants
Notice that no stray constant factors appeared. The derivative of the natural logarithm is the clean function $\frac{1}{x}$ precisely because the integrand inside $\ln$'s definition is $\frac{1}{t}$. Other log bases will pick up a constant; we will see that in §5.4.
Two Rigorous Proofs (Inverse Function & Limit)
The FTC proof above is the cleanest, but it assumes you accept the area definition. If instead you take $\ln x$ to be defined as the inverse of $e^x$, you can still recover $\frac{d}{dx}\ln x = \frac{1}{x}$ by two short arguments. Both are worth knowing.
Proof 1 — Inverse-function rule (the algebra route)
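Let $y = \ln x$. Then by the definition of inverse, $e^y = x$. Differentiate both sides of $x = e^y$ with respect to $x$ and use the chain rule on the right-hand side:
$$1 = e^y \cdot \frac{dy}{dx}$$
Solve for $\frac{dy}{dx}$:
$$\frac{dy}{dx} = \frac{1}{e^y} = \frac{1}{x}$$
The last equality used $e^y = x$. That is the full proof, provided $\ln$ is differentiable, which the inverse function theorem guarantees because the derivative of $e^y$ is never zero. Notice how every step is a single rule of calculus or algebra: no limits, no areas.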
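The inverse-function argument says that the slope of $\ln$ at $x$ and the slope of $e^y$ at the mirrored point $y = \ln x$ are reciprocals. A rough numerical check of that claim is sketched below; the helper name `slope` and the step size `h` are assumptions made for this illustration.

```python
import math

def slope(f, x, h=1e-6):
    """Symmetric difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (0.5, 1.0, 2.0, 5.0):
    y = math.log(x)                 # the mirrored point on y = e^x is (y, x)
    exp_slope = slope(math.exp, y)  # should be close to e^y = x
    log_slope = slope(math.log, x)  # should be close to 1/x
    print(f"x = {x:4.1f}  exp slope at ln x = {exp_slope:.6f}  "
          f"ln slope at x = {log_slope:.6f}  product = {exp_slope * log_slope:.6f}")
```

The product column should stay close to 1 at every point, which is the reciprocal relationship in numerical form.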
Proof 2 — Straight from the limit definition
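For readers who want the bare-metal version, compute the derivative directly from the difference quotient:
$$\frac{d}{dx}\ln x = \lim_{h \to 0}\frac{\ln(x+h) - \ln x}{h}$$
Use $\ln(x+h) - \ln x = \ln\!\left(1 + \frac{h}{x}\right)$, set $u = \frac{h}{x}$ (so $h = xu$ and $u \to 0$ as $h \to 0$):
$$\frac{d}{dx}\ln x = \lim_{u \to 0}\frac{\ln(1+u)}{xu} = \frac{1}{x}\,\lim_{u \to 0}\frac{\ln(1+u)}{u}$$
The remaining limit is a famous one (proved in §4): $\lim_{u \to 0}\frac{\ln(1+u)}{u} = 1$. Plug it in:
$$\frac{d}{dx}\ln x = \frac{1}{x}\cdot 1 = \frac{1}{x}$$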
What this proves about the slope at x = 1
The limit $\lim_{u \to 0}\frac{\ln(1+u)}{u} = 1$ is geometrically saying: the slope of $\ln$ at $x = 1$ is exactly 1. That single fact, combined with the chain rule, forces every other slope to be $\frac{1}{x}$.
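The limit $\ln(1+u)/u \to 1$ is easy to watch numerically. The sketch below is an illustration only; the function name `log_ratio` is an assumption, and `math.log1p` is used so that the ratio stays accurate for very small $u$.

```python
import math

def log_ratio(u):
    """ln(1 + u) / u, using log1p to stay accurate for tiny u."""
    return math.log1p(u) / u

for u in (0.5, 0.1, 0.01, 0.001, -0.001, -0.01):
    print(f"u = {u:+.3f}   ln(1+u)/u = {log_ratio(u):.8f}")
```

The ratio approaches 1 from below for positive $u$ and from above for negative $u$, consistent with $\ln$ being concave.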
Interactive: Tangent Slope vs. 1/x
Time to see the formula. The left panel shows the curve $y = \ln x$ with two superimposed lines:
- A dashed orange secant connecting $(x, \ln x)$ and $(x+h, \ln(x+h))$.
- A solid green tangent at $(x, \ln x)$, drawn with slope $\frac{1}{x}$.
The right panel plots $y = \frac{1}{x}$. A green dot marks the current slope and you can see it ride along the magenta hyperbola as you drag $x$. Hit “Animate h → 0” and watch the orange secant collapse onto the green tangent: the difference quotient becoming the derivative.
What the colored boxes underneath are telling you
The amber box is the secant slope $\frac{\ln(x+h) - \ln x}{h}$. The emerald box is $\frac{1}{x}$. The magenta box is the absolute difference. Slide $h$ down towards $0$ and you will see the magenta error collapse towards zero. The numbers in those boxes are not pre-baked: they are recomputed from the JavaScript versions of $\ln$ and arithmetic, and as $h$ shrinks they agree with the formula to more and more decimal places.
Worked Example by Hand
Below is a single example chosen because it exercises the identity at three different levels: numerical (a difference quotient at $x = 2$), the Taylor approximation that explains the residual, and a chain-rule composition. Pop the details open and try it before reading the solution; five minutes with paper will pay back the rest of the chapter.
Worked example — three checks at x = 2
Setup. Take $f(x) = \ln x$. We want three things at $x = 2$:
- The exact slope from the formula $f'(x) = \frac{1}{x}$.
- A numerical secant slope with $h = 0.1$ and the error you would expect.
- The derivative of the composition $g(x) = \ln(x^2 + 1)$ at the same $x$.
Step 1 — Exact slope
By the rule $f'(x) = \frac{1}{x}$, evaluating at $x = 2$ gives $f'(2) = \frac{1}{2} = 0.5$. This is the number every other estimate should converge to.
Step 2 — Numerical secant with h = 0.1
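Plug in:
$$\frac{\ln(2.1) - \ln 2}{0.1} = \frac{\ln(1.05)}{0.1}$$
Use the Taylor series $\ln(1+u) = u - \frac{u^2}{2} + \frac{u^3}{3} - \cdots$ with $u = 0.05$:
$$\ln(1.05) \approx 0.05 - 0.00125 + 0.0000417 \approx 0.04879$$
Divide by $h = 0.1$: the secant slope is approximately $0.4879$.
Predicted error. A Taylor expansion of the secant slope around the tangent slope gives:
$$\frac{\ln(x+h) - \ln x}{h} = \frac{1}{x} - \frac{h}{2x^2} + \frac{h^2}{3x^3} - \cdots$$
With $x = 2$ and $h = 0.1$, the leading error is $-\frac{0.1}{2 \cdot 4} = -0.0125$. So the secant slope should be about $0.5 - 0.0125 = 0.4875$. That matches our hand computation of $0.4879$ to three decimal places; the small extra (about $0.0004$) comes from the next term, $\frac{h^2}{3x^3}$, which the prediction dropped.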
Step 3 — Chain rule on ln(x² + 1)
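Let $u(x) = x^2 + 1$. Then $u'(x) = 2x$. Applying the chain rule for $g(x) = \ln(u(x))$:
$$g'(x) = \frac{u'(x)}{u(x)} = \frac{2x}{x^2 + 1}$$
Plug in $x = 2$: $g'(2) = \frac{4}{5} = 0.8$. That is higher than the slope of plain $\ln x$ at the same point, which makes sense: the inner function $x^2 + 1$ is climbing fast, and the chain rule amplifies the log's response.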
Verification
The two secant slopes should approach the exact value $0.5$ as $h$ shrinks; the last row is the separate chain-rule result:
| Quantity | Value | Check |
|---|---|---|
| $f'(2) = 1/2$ | 0.5000000 | Closed form: 1/2. |
| Secant slope, $h = 0.1$ | 0.4879016 | Off by ~0.0121 (matches the −h/(2x²) prediction). |
| Secant slope, $h = 0.001$ | 0.4998750 | Off by ~1.25 × 10⁻⁴ (h is 100× smaller, error is about 100× smaller). |
| $g'(2)$ for $g(x) = \ln(x^2+1)$ | 0.8000000 | By the chain rule: 2x/(x²+1) at x=2 → 4/5. |
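The worked example can be replayed in a few lines of Python. This is a sketch for checking the arithmetic, not part of the original example; the helper name `secant_slope` and the step `1e-6` used for the chain-rule check are assumptions.

```python
import math

def secant_slope(f, x, h):
    """Forward difference quotient (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

x = 2.0
exact = 1.0 / x
for h in (0.1, 0.001):
    s = secant_slope(math.log, x, h)
    predicted_error = -h / (2 * x**2)   # leading Taylor term of the secant error
    print(f"h = {h}:  secant = {s:.7f}  error = {s - exact:+.7f}  "
          f"predicted = {predicted_error:+.7f}")

g_prime = 2 * x / (x**2 + 1)            # chain rule for g(x) = ln(x^2 + 1)
g_secant = secant_slope(lambda t: math.log(t**2 + 1), x, 1e-6)
print(f"g'(2) by chain rule = {g_prime:.7f},  forward difference = {g_secant:.7f}")
```

The gap between the `error` and `predicted` columns is the contribution of the higher-order terms in the Taylor expansion.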
Combining With the Chain Rule
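Most logs you meet in practice are not bare $\ln x$. They wrap something more complicated: $\ln(x^2 + 1)$, $\ln(\sin x)$, $\ln(e^x)$. All of these reduce to one pattern:
$$\frac{d}{dx}\ln\big(u(x)\big) = \frac{u'(x)}{u(x)}, \qquad u(x) > 0$$
The way to read this: differentiate the inside, divide by the inside. Three worked instances:
| Function | u(x) | u'(x) | Derivative |
|---|---|---|---|
| $\ln(x^2+1)$ | $x^2+1$ | $2x$ | $\frac{2x}{x^2+1}$ |
| $\ln(\sin x)$ (where $\sin x > 0$) | $\sin x$ | $\cos x$ | $\frac{\cos x}{\sin x} = \cot x$ |
| $\ln(e^x)$ | $e^x$ | $e^x$ | $\frac{e^x}{e^x} = 1$ |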
The last row is not a coincidence
$\ln(e^x) = x$ by the inverse identity. Differentiating $x$ obviously gives 1. The chain rule got the same answer via $\frac{e^x}{e^x} = 1$. Whenever two routes give the same number, you gain confidence in both.
Common Mistakes and Edge Cases
Pitfall 1 — Forgetting the domain
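$\ln x$ is undefined (as a real-valued function) for $x \le 0$, so its derivative is undefined there too. Writing $\frac{d}{dx}\ln x = \frac{1}{x}$ at, say, $x = -2$ is meaningless: the formula gives $-\frac{1}{2}$ but the function does not exist at $x = -2$.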
Pitfall 2 — ln|x| vs. ln(x)
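For $x \neq 0$ the absolute-value version $\ln|x|$ is defined everywhere except zero, and a small calculation shows:
$$\frac{d}{dx}\ln|x| = \frac{1}{x}, \qquad x \neq 0$$
Same formula, broader domain. The proof: for $x > 0$ we already have it. For $x < 0$, write $\ln|x| = \ln(-x)$ and apply the chain rule: $\frac{d}{dx}\ln(-x) = \frac{1}{-x}\cdot(-1) = \frac{1}{x}$. That is why the formula on the antiderivative side reads $\int \frac{1}{x}\,dx = \ln|x| + C$: the absolute value is the right object once you let $x$ be either sign.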
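A quick numerical look at both signs of $x$ is sketched below. The helper names `ln_abs` and `central_difference` and the step size are assumptions for this illustration.

```python
import math

def ln_abs(x):
    """ln|x|, defined for every nonzero x."""
    return math.log(abs(x))

def central_difference(f, x, h=1e-6):
    """Symmetric difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-3.0, -0.5, 0.5, 3.0):
    numerical = central_difference(ln_abs, x)
    print(f"x = {x:+.1f}   numerical slope = {numerical:+.6f}   1/x = {1 / x:+.6f}")
```

Note that the slope is negative for negative $x$: the sign comes along with $1/x$, not with $1/|x|$.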
Pitfall 3 — Treating ln as if it were linear
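It is very common to see students write $\ln(a + b) = \ln a + \ln b$. This is false. The genuine rules are:
| Identity | Holds? |
|---|---|
| $\ln(ab) = \ln a + \ln b$ | Yes. |
| $\ln(a/b) = \ln a - \ln b$ | Yes. |
| $\ln(a^r) = r\ln a$ | Yes. |
| $\ln(a + b) = \ln a + \ln b$ | NO. Never use this. |
| $\ln(ab) = \ln a \cdot \ln b$ | NO. Also nonsense. |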
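Plugging in numbers is often the fastest way to expose a false identity. The sketch below tests the three genuine log laws and the tempting "linear" rule for one arbitrary choice of values; the values `a`, `b`, `r` and the tolerance are assumptions for the illustration.

```python
import math

a, b, r = 3.0, 5.0, 2.5

# Each entry pairs a left-hand side with the right-hand side it is claimed to equal.
checks = {
    "ln(ab)  vs ln a + ln b": (math.log(a * b), math.log(a) + math.log(b)),
    "ln(a/b) vs ln a - ln b": (math.log(a / b), math.log(a) - math.log(b)),
    "ln(a^r) vs r ln a":      (math.log(a ** r), r * math.log(a)),
    "ln(a+b) vs ln a + ln b": (math.log(a + b), math.log(a) + math.log(b)),
}

for label, (lhs, rhs) in checks.items():
    verdict = "equal" if math.isclose(lhs, rhs, rel_tol=1e-12) else "DIFFERENT"
    print(f"{label}:  {lhs:.6f}  {rhs:.6f}  -> {verdict}")
```

A single counterexample is enough to rule an identity out, whereas agreement at one point only suggests (and does not prove) that a rule holds.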
Pitfall 4 — Confusing log bases
In a math context $\ln$ always means natural log (base $e$). In some engineering or programming contexts $\log$ alone is used to mean base 10. If you write $\frac{d}{dx}\log x = \frac{1}{x}$ when the textbook meant base 10, you will be off by a factor of $\ln 10 \approx 2.303$. The next section (§5.4) handles the general-base case carefully.
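The size of the base-10 mistake can be seen directly by comparing the slopes of Python's `math.log` (natural log) and `math.log10`. This is a small sketch; the evaluation point `x = 4.0`, the helper name, and the step size are assumptions.

```python
import math

def central_difference(f, x, h=1e-6):
    """Symmetric difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 4.0
slope_ln = central_difference(math.log, x)       # natural log
slope_log10 = central_difference(math.log10, x)  # base-10 log
print(f"slope of ln at x = 4:     {slope_ln:.6f}   (1/x = {1 / x:.6f})")
print(f"slope of log10 at x = 4:  {slope_log10:.6f}")
print(f"ratio = {slope_ln / slope_log10:.6f},  ln(10) = {math.log(10):.6f}")
```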
Plain Python: Numerical Verification
Now we leave paper and go to a screen. The first script does the thing you would do with a calculator if you wanted to convince a skeptic: it computes a numerical derivative of $\ln x$ at several points and prints it next to the analytic value $\frac{1}{x}$. They should agree to about 10 decimal places.
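A minimal sketch of such a check follows, assuming a symmetric difference with step `h = 1e-5`. The helper name `numerical_derivative`, the step size, and the list of sample points are assumptions, so the exact digits in the error column depend on those choices.

```python
import math

def numerical_derivative(f, x, h=1e-5):
    """Symmetric (central) difference: (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

points = [0.5, 1.0, 2.0, math.e, 5.0, 10.0]

print(f"{'x':>8} {'numerical':>14} {'1 / x':>14} {'abs error':>12}")
print("-" * 52)
for x in points:
    approx = numerical_derivative(math.log, x)
    exact = 1.0 / x
    print(f"{x:8.4f} {approx:14.10f} {exact:14.10f} {abs(approx - exact):12.2e}")
```

The symmetric form is used because its truncation error shrinks like $h^2$, compared with $h$ for a one-sided quotient.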
When you run this you should see a table that looks like this (truncated):
x numerical 1 / x abs error
--------------------------------------------------------
0.5000 2.0000000000 2.0000000000 1.11e-11
1.0000 1.0000000001 1.0000000000 6.67e-12
2.0000 0.5000000000 0.5000000000 2.78e-12
2.7183 0.3678794412 0.3678794412 1.11e-12
5.0000 0.2000000000 0.2000000000 2.78e-13
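10.0000 0.1000000000 0.1000000000 2.78e-13
Every “abs error” entry sits between about $10^{-13}$ and $10^{-11}$, which is the level you expect when double-precision round-off (about $10^{-16}$) is divided by a small step $h$. That is about as close as a finite difference in floating-point arithmetic can get. Empirical evidence does not get cleaner than this.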
Now apply the chain rule from Python
The second script reuses the helper to differentiate three compositions at two different inputs. Same identity, more interesting inner functions:
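A sketch of what such a chain-rule check could look like is given below. The three compositions, the two inputs `0.5` and `2.0`, and the helper name `numerical_derivative` are assumptions made for this illustration; each case pairs a composite $\ln(u(x))$ with its chain-rule derivative $u'(x)/u(x)$.

```python
import math

def numerical_derivative(f, x, h=1e-5):
    """Symmetric (central) difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# (label, composite function ln(u(x)), chain-rule derivative u'(x) / u(x))
cases = [
    ("ln(x^2 + 1)", lambda x: math.log(x**2 + 1),    lambda x: 2 * x / (x**2 + 1)),
    ("ln(sin x)",   lambda x: math.log(math.sin(x)), lambda x: math.cos(x) / math.sin(x)),
    ("ln(e^x)",     lambda x: math.log(math.exp(x)), lambda x: 1.0),
]

for label, f, f_prime in cases:
    for x in (0.5, 2.0):   # sin x > 0 at both points, so ln(sin x) is defined
        approx = numerical_derivative(f, x)
        exact = f_prime(x)
        print(f"{label:<12} x = {x:3.1f}  numerical = {approx:+.8f}  u'/u = {exact:+.8f}")
```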
PyTorch: Autograd Confirms 1/x
We have done the algebra, drawn the picture, and verified numerically. There is one more witness worth calling — automatic differentiation. PyTorch's autograd does not estimate the derivative with a finite-difference quotient; it walks the computation graph and applies the analytic chain rule node by node. If autograd and our formula disagree, one of them is wrong.
They do not disagree.
Expected output:
x autograd 1 / x |delta|
--------------------------------------------------------
0.5000 2.0000000000 2.0000000000 0.00e+00
1.0000 1.0000000000 1.0000000000 0.00e+00
2.0000 0.5000000000 0.5000000000 0.00e+00
2.7183 0.3678794503 0.3678794503 0.00e+00
5.0000 0.2000000000 0.2000000000 0.00e+00
10.0000 0.1000000015 0.1000000000 1.49e-09
Why this matters past chapter 5
You now have five independent confirmations of the same one-line formula: the FTC proof, the inverse-function proof, the limit proof, the symmetric finite difference, and PyTorch's reverse-mode autograd. Whenever you can confirm a piece of mathematics from five angles like that, you can stop second-guessing it and start using it. Every deep-learning loss function that contains a log term — cross-entropy, KL divergence, negative log-likelihood — leans on exactly this derivative.
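To make "applying the analytic chain rule node by node" concrete without installing PyTorch, here is a toy sketch of automatic differentiation in plain Python. It uses forward-mode dual numbers, which is a different (and much simpler) technique than PyTorch's reverse-mode autograd; the class `Dual` and the functions `log` and `derivative` are hypothetical names written for this illustration only.

```python
import math

class Dual:
    """A value together with its derivative with respect to the input."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__

def log(u):
    """Natural-log node: value ln(u), derivative u'/u (the rule from this section)."""
    return Dual(math.log(u.value), u.deriv / u.value)

def derivative(f, x):
    """Evaluate f at x with the seed dx/dx = 1 and read off the derivative."""
    return f(Dual(x, 1.0)).deriv

for x in (0.5, 1.0, 2.0, 10.0):
    bare = derivative(log, x)
    composed = derivative(lambda t: log(t * t + 1), x)
    print(f"x = {x:5.1f}  d/dx ln x = {bare:.10f}  d/dx ln(x^2+1) = {composed:.10f}")
```

No difference quotient appears anywhere: every node carries an exact derivative rule, and the only rule the `log` node needs is $u'/u$.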
Real-World Applications
$\frac{d}{dx}\ln x = \frac{1}{x}$ shows up the moment a problem has a multiplicative or relative-rate flavour. Four quick instances:
1. Relative growth rate (economics, biology)
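For a positive quantity $P(t)$, the logarithmic derivative $\frac{d}{dt}\ln P(t) = \frac{P'(t)}{P(t)}$ is the relative (fractional) growth rate per unit time. A population doubling steadily in one year has log-derivative $\ln 2 \approx 0.693$ per year, which is exactly the continuously compounded rate behind interest-rate and doubling-time formulas.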
2. Information theory — cross-entropy and surprise
The information content of an event with probability $p$ is $-\ln p$ (nats). Its derivative with respect to the probability is $-\frac{1}{p}$. When deep-learning libraries differentiate cross-entropy loss, this is the term doing the work. Every gradient step in classifier training is driven by a factor of $-\frac{1}{p}$.
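The sketch below tabulates the surprise $-\ln p$ and its slope for a few probabilities, to show how sharply the gradient grows as the predicted probability of the true event falls. The function names and the sample probabilities are assumptions for this illustration; real cross-entropy implementations add further pieces (such as a softmax) that are not modelled here.

```python
import math

def surprise(p):
    """Information content -ln(p), in nats, of an event with probability p."""
    return -math.log(p)

def central_difference(f, x, h=1e-7):
    """Symmetric difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for p in (0.9, 0.5, 0.1, 0.01):
    numerical = central_difference(surprise, p)
    print(f"p = {p:4.2f}  surprise = {surprise(p):.4f}  "
          f"numerical slope = {numerical:10.4f}  -1/p = {-1 / p:10.4f}")
```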
3. Physics — entropy and the partition function
The Helmholtz free energy of a thermodynamic system is $F = -k_B T \ln Z$, where $Z$ is the partition function. Quantities derived from $F$ require derivatives of $\ln Z$ with respect to temperature or volume, and every one of them brings a factor of $\frac{1}{Z}$.
4. Calculus itself — the missing antiderivative
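The power rule says $\int x^n\,dx = \frac{x^{n+1}}{n+1} + C$ for every $n \neq -1$. The exception is exactly $n = -1$, the integral of $\frac{1}{x}$. The formula we proved here is precisely what fills that hole:
$$\int \frac{1}{x}\,dx = \ln|x| + C$$
Without $\ln$, calculus would have no elementary antiderivative for the single function $\frac{1}{x}$. With it, the gap is closed.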
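One way to see that $\ln$ is the natural filler for the gap is to evaluate the power-rule area from $1$ to $x$ for exponents $n$ closer and closer to $-1$. In the sketch below, the function name `power_rule_area`, the endpoint `x = 3.0`, and the list of exponents are assumptions for the illustration.

```python
import math

def power_rule_area(x, n):
    """Integral of t**n from 1 to x via the power rule (valid only for n != -1)."""
    return (x ** (n + 1) - 1.0) / (n + 1)

x = 3.0
for n in (-0.5, -0.9, -0.99, -0.999, -1.001, -1.01):
    print(f"n = {n:+.3f}   integral of t^n from 1 to {x} = {power_rule_area(x, n):.6f}")
print(f"ln({x}) = {math.log(x):.6f}")
```

The values close in on $\ln 3$ from both sides of $n = -1$, while calling `power_rule_area(x, -1.0)` itself fails with a division by zero, which is the hole the logarithm fills.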
Summary
One identity, three proofs, two pictures, three witnesses in code.
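| Concept | Formula | Why |
|---|---|---|
| Derivative of ln | $\frac{d}{dx}\ln x = \frac{1}{x}$, $x > 0$ | FTC applied to the area definition; equivalently, inverse of e^x. |
| Slope at x = 1 | $\left.\frac{d}{dx}\ln x\right\rvert_{x=1} = 1$ | The famous limit ln(1+u)/u → 1. |
| Chain-rule form | $\frac{d}{dx}\ln u(x) = \frac{u'(x)}{u(x)}$ | Standard chain rule with the outer derivative 1/u. |
| Absolute-value form | $\frac{d}{dx}\ln\lvert x\rvert = \frac{1}{x}$, $x \neq 0$ | Extends the formula to x < 0; required for ∫(1/x) dx. |
| Antiderivative | $\int \frac{1}{x}\,dx = \ln\lvert x\rvert + C$ | Fills the n = −1 gap in the power rule. |
| ML connection | $\frac{d}{dp}(-\ln p) = -\frac{1}{p}$ | Gradient of cross-entropy loss with respect to predicted probability. |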
Coming next: §5.4 generalises the formula to logarithms of arbitrary base: $\frac{d}{dx}\log_b x = \frac{1}{x\ln b}$. We will see exactly how a single constant of $\ln b$ divides into $\frac{1}{x}$ to give $\frac{1}{x\ln b}$, and why the natural base is “natural” precisely because it removes that constant.