Chapter 5
Section 43 of 353

Derivative of ln(x)

Derivatives of Transcendental Functions

Learning Objectives

By the end of this section you will be able to:

  1. State and use the identity $\dfrac{d}{dx}[\ln x] = \dfrac{1}{x}$ for every $x > 0$.
  2. Explain why the slope of the log curve is the reciprocal of the input, both geometrically (area under $1/x$) and algebraically (as the inverse of $e^x$).
  3. Combine the rule with the chain rule to differentiate $\ln(u(x))$ for any positive function $u$.
  4. Avoid the classic mistakes: sign errors with $\ln|x|$, undefined values at $x \le 0$, and treating $\ln$ as a linear function.
  5. Verify the formula numerically with finite differences and with PyTorch's automatic differentiation (autograd).

The Question Behind the Section

We already know how steep the curve $y = e^{x}$ is at every point. Question: how steep is its mirror image, the curve $y = \ln x$, at every point?

Section 5.1 gave us our first transcendental derivative: $\dfrac{d}{dx}[e^{x}] = e^{x}$. Up to a constant multiple, the exponential is the only function whose slope equals its own value. That is a deep statement, but it leaves a sibling question hanging.

Picture the graph of $e^{x}$ and reflect it across the line $y = x$. The mirror image is the natural logarithm, $\ln x$. Wherever the exponential is steep, the log is shallow. Where the exponential is shallow (far to the left, where $x$ is very negative), the log is steep (just to the right of $x = 0$). Reflecting across $y = x$ swaps rise and run, so every slope is replaced by its reciprocal: the point $(a, e^{a})$ with slope $e^{a}$ becomes the point $(e^{a}, a)$ with slope $1/e^{a}$.

So we already suspect the answer. The rest of the section makes it precise, explains why from two independent angles (an area picture and an algebra proof), gives you a 3Blue1Brown-style playground to feel it, walks a worked example by hand, and finally checks the formula in Python and PyTorch.

The headline

For every $x > 0$:

$$\frac{d}{dx}[\ln x] = \frac{1}{x}$$

Read it in English: the slope of the log curve at a point is one over that point. At $x = 2$ the slope is $1/2$; at $x = 10$ it is $1/10$. The curve keeps rising forever, but it gets flatter and flatter, exactly as the formula predicts.


Geometric Definition: ln Is an Area

Many textbooks define $\ln x$ as “the inverse of $e^{x}$”. That is fine for computation, but it hides what the log really is. There is a more revealing definition, one that makes the derivative formula obvious:

$$\ln(a) = \int_{1}^{a} \frac{1}{t}\,dt \qquad (a > 0)$$

In plain English: the natural logarithm of a positive number $a$ is the signed area between the curve $y = 1/t$ and the $t$-axis, from $t = 1$ up to $t = a$. If $a > 1$, we accumulate area moving to the right and the result is positive. If $0 < a < 1$, we travel backwards along the axis, so the area picks up a minus sign and $\ln a$ comes out negative, exactly as you remember.

Why this is a legitimate definition

The integrand $1/t$ is positive and continuous for every $t > 0$, so the integral exists for every positive $a$. From this single integral you can derive every property of the logarithm: $\ln(ab) = \ln a + \ln b$ follows from a change of variable inside the integral, $\ln(1) = 0$ because the integral over an empty interval is zero, and $\ln(1/a) = -\ln a$ follows from the product rule with $b = 1/a$. It is arguably the cleanest starting point for the logarithm in all of calculus.
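For instance, the product rule can be derived from the area definition alone. A short sketch, assuming $a, b > 0$: split the area at $t = a$, then substitute $t = a s$ (so $dt = a\,ds$) in the second piece.

```latex
\ln(ab) = \int_{1}^{ab}\frac{dt}{t}
        = \int_{1}^{a}\frac{dt}{t} + \int_{a}^{ab}\frac{dt}{t},
\qquad
\int_{a}^{ab}\frac{dt}{t}
  \;\overset{t = a s}{=}\; \int_{1}^{b}\frac{a\,ds}{a s}
  = \int_{1}^{b}\frac{ds}{s} = \ln b .
```

Hence $\ln(ab) = \ln a + \ln b$; setting $b = 1/a$ then gives $0 = \ln 1 = \ln a + \ln(1/a)$, i.e. $\ln(1/a) = -\ln a$.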

The picture in three sentences

  • Draw the hyperbola $y = 1/t$ on the positive $t$-axis.
  • Anchor the left edge at $t = 1$; this is where the area starts counting (and where $\ln 1 = 0$).
  • Slide the right edge to $t = a$. The shaded region between $1$ and $a$ is exactly $\ln a$.
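To make the area definition concrete, here is a minimal sketch that approximates $\ln a$ by adding up thin rectangles under $1/t$. It uses the composite midpoint rule (our choice for illustration; ln_by_area is a hypothetical name, and this is not how math.log is actually computed):

```python
import math


def ln_by_area(a: float, n: int = 10_000) -> float:
    """Approximate ln(a) as the signed area under 1/t from t = 1 to t = a.

    Composite midpoint rule with n strips. For 0 < a < 1 the strip width is
    negative, so the area automatically comes out negative.
    """
    if a <= 0:
        raise ValueError("ln(a) is only defined for a > 0")
    width = (a - 1.0) / n              # negative when a < 1
    total = 0.0
    for k in range(n):
        t_mid = 1.0 + (k + 0.5) * width
        total += width / t_mid         # strip area = height 1/t_mid times width
    return total


for a in [0.5, 1.0, 2.0, math.e, 10.0]:
    print(f"a = {a:8.5f}   area = {ln_by_area(a): .6f}   math.log = {math.log(a): .6f}")
```

At $a = e$ the area column reads $1.000000$, the same fact the “Snap a = e” button below shows graphically.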

Interactive: Slide a, Watch ln(a) Grow

The interactive below makes the definition tangible. Drag the slider for $a$ and watch the shaded area change in real time. The number underneath the picture is the value of $\ln a$ computed from the area definition; it agrees with your calculator's $\ln$ to every displayed digit.


Three things to try, in order:

  1. Slide $a$ to the right, or press the “Snap a = e” button to jump to $a = 2.718\ldots$. The shaded area should read $1.00000$. That is the area-based definition of $e$: the number whose log is one.
  2. Slide $a$ below $1$. The shaded region turns red, because area traversed from right to left counts as negative, and the printed value of $\ln a$ turns negative.
  3. Turn on the orange differential strip and look at the three numbers under the picture. The strip's area (shown in amber) is almost identical to $(1/a)\cdot h$ (shown in magenta). The mismatch shrinks faster than $h$ itself as $h \to 0$, so (strip area)$/h$ tends to $1/a$, and that limit is literally the derivative.
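The numbers in step 3 can be recomputed without the widget. A small sketch, assuming the strip runs from $a$ to $a + h$ (strip_vs_rectangle is a hypothetical helper; math.log1p evaluates $\ln(1 + u)$ accurately for small $u$):

```python
import math


def strip_vs_rectangle(a: float, h: float) -> tuple[float, float, float]:
    """Return (exact strip area, rectangle (1/a)*h, their difference) for a > 0, h > 0."""
    strip = math.log1p(h / a)   # ln(a + h) - ln(a) = ln(1 + h/a)
    rectangle = h / a           # height 1/a (the strip's tallest point) times width h
    return strip, rectangle, strip - rectangle


a = 2.0
for h in [0.5, 0.1, 0.01, 0.001]:
    strip, rect, gap = strip_vs_rectangle(a, h)
    # gap/h is the error of the slope estimate; it behaves like -h/(2a^2) and vanishes as h -> 0
    print(f"h = {h:<6} strip = {strip:.8f}  (1/a)*h = {rect:.8f}  gap/h = {gap / h:+.2e}")
```

Because $1/t$ decreases, the rectangle built on the left-edge height $1/a$ always slightly overestimates the strip, but the mismatch per unit width goes to zero with $h$.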

From Area to Derivative: The Rate of Growth

We now turn the picture into a one-line proof. Start with the definition:

$$\ln x = \int_{1}^{x} \frac{1}{t}\,dt$$

The right-hand side is a function defined by an integral with a moving upper limit. The Fundamental Theorem of Calculus, Part 1 (from §14 of this book), says exactly how its derivative behaves:

FTC Part 1. If $F(x) = \int_{a}^{x} f(t)\,dt$ for a continuous integrand $f$, then $F'(x) = f(x)$.

Apply that with $f(t) = 1/t$ (continuous for $t > 0$) and $a = 1$:

$$\frac{d}{dx}[\ln x] = \frac{d}{dx}\int_{1}^{x} \frac{1}{t}\,dt = \frac{1}{x}.$$

That is the whole proof. One line, because the definition was the right one.

The same argument in plain words

Let the shaded area equal $L(x) = \ln x$. Now nudge the right edge from $x$ to $x + h$. The added strip is so thin that its top is essentially a horizontal segment of height $1/x$, so the strip's area is approximately:

$$L(x + h) - L(x) \approx \frac{1}{x}\cdot h.$$

Divide both sides by $h$ and send $h \to 0$. Because the error in this approximation shrinks faster than $h$, the left-hand side becomes the derivative $L'(x)$ and the right-hand side stays $1/x$. Done.

Why the formula has no constants

Notice that no stray constant such as $\ln 10$ or $\ln 2$ appears. The derivative of the natural logarithm is the clean function $1/x$ precisely because the integrand in the definition of $\ln$ is exactly $1/t$. Logarithms to other bases pick up a constant factor; we will see that in §5.4.


Two Rigorous Proofs (Inverse Function & Limit)

The FTC proof above is the cleanest, but it assumes you accept the area definition. If instead you define $\ln$ as the inverse of $e^{x}$, you can still recover $\dfrac{d}{dx}[\ln x] = 1/x$ by two short arguments. Both are worth knowing.

Proof 1 — Inverse-function rule (the algebra route)

Let y=lnxy = \ln x. Then by the definition of inverse, x=eyx = e^{y}. Differentiate both sides of x=eyx = e^{y} with respect to xx and use the chain rule on the right-hand side:

1  =  eydydx.\displaystyle 1 \;=\; e^{y}\cdot \frac{dy}{dx}.

Solve for dy/dxdy/dx:

dydx  =  1ey  =  1x.\displaystyle \frac{dy}{dx} \;=\; \frac{1}{e^{y}} \;=\; \frac{1}{x}.

The last equality used ey=xe^{y} = x. That is the full proof. Notice how every step is a single rule of algebra — no limits, no areas.
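The proof says that the slope of $e^{y}$ at $y_0$ and the slope of $\ln x$ at the mirrored point $x_0 = e^{y_0}$ are reciprocals. A numerical sketch of that claim, using a hypothetical helper central_diff that never calls the formula $1/x$:

```python
import math


def central_diff(f, x: float, h: float) -> float:
    """Symmetric difference quotient (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for y0 in [-2.0, 0.0, 1.0, 3.0]:
    x0 = math.exp(y0)                                  # (y0, x0) on e^y mirrors (x0, y0) on ln x
    slope_exp = central_diff(math.exp, y0, 1e-6)       # slope of e^y at y0
    slope_ln = central_diff(math.log, x0, 1e-6 * x0)   # slope of ln x at x0 (step scaled to x0)
    print(f"y0 = {y0:+.1f}  x0 = {x0:8.4f}  exp slope = {slope_exp:9.5f}  "
          f"ln slope = {slope_ln:8.5f}  product = {slope_exp * slope_ln:.8f}")
```

The product column sits at $1$ to many digits for every point, which is exactly the reciprocal relationship $\frac{dy}{dx} = 1/e^{y}$.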

Proof 2 — Straight from the limit definition

For readers who want the bare-metal version, compute the derivative of $f(x) = \ln x$ directly from $f'(x) = \lim_{h \to 0}\dfrac{f(x+h)-f(x)}{h}$ (the second equality below uses $\ln a - \ln b = \ln(a/b)$):

$$f'(x) = \lim_{h\to 0} \frac{\ln(x+h) - \ln x}{h} = \lim_{h\to 0} \frac{1}{h}\,\ln\!\left(\frac{x+h}{x}\right).$$

Use $(x+h)/x = 1 + h/x$ and set $u = h/x$ (so $h = ux$, and $u \to 0$ as $h \to 0$):

$$f'(x) = \lim_{u\to 0} \frac{1}{ux}\,\ln(1+u) = \frac{1}{x}\,\lim_{u\to 0}\frac{\ln(1+u)}{u}.$$

The remaining limit is a famous one (proved in §4): $\displaystyle \lim_{u\to 0}\frac{\ln(1+u)}{u} = 1$. Plug it in:

$$f'(x) = \frac{1}{x}\cdot 1 = \frac{1}{x}.$$
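Here is a quick numerical look at that limit. It is a sketch that also shows why, for very small $u$, numerical code prefers math.log1p over math.log(1 + u):

```python
import math

for u in [1e-1, 1e-2, 1e-4, 1e-8, 1e-12]:
    naive = math.log(1.0 + u) / u     # forming 1 + u first discards digits of u when u is tiny
    accurate = math.log1p(u) / u      # log1p evaluates ln(1 + u) without forming 1 + u
    print(f"u = {u:<7g} ln(1+u)/u via log: {naive:.12f}   via log1p: {accurate:.12f}")
```

The log1p column approaches $1$ steadily (the gap is about $u/2$). The naive column drifts away from $1$ at $u = 10^{-12}$, by roughly $10^{-4}$, because $1 + u$ cannot be stored exactly in double precision.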

What this proves about the slope at x = 1

The limit $\lim_{u\to 0} \ln(1+u)/u = 1$ says geometrically that the slope of $\ln x$ at $x = 1$ is exactly 1. That single fact, combined with the identity $\ln(x+h) - \ln x = \ln(1 + h/x)$ and the rescaling $u = h/x$, forces the slope at every other point $x$ to be $1/x$.


Interactive: Tangent Slope vs. 1/x

Time to see the formula. The left panel shows the curve $y = \ln x$ with two superimposed lines:

  • A dashed orange secant connecting $(x, \ln x)$ and $(x+h, \ln(x+h))$.
  • A solid green tangent at $(x, \ln x)$, drawn with slope $1/x$.

The right panel plots the derivative $y' = 1/x$. A green dot marks the current slope; you can watch it ride along the magenta hyperbola as you drag $x$. Hit “Animate h → 0” and watch the orange secant collapse onto the green tangent: the difference quotient becoming the derivative.


What the colored boxes underneath are telling you

The amber box is the secant slope $[\ln(x+h) - \ln x]/h$. The emerald box is $1/x$. The magenta box is the absolute difference between them, which is approximately $h/(2x^{2})$. Slide $h$ down towards $0.002$ and, at $x = 2$ for example, the magenta error drops below $10^{-3}$. The numbers in those boxes are not pre-baked: they are recomputed live with JavaScript's floating-point logarithm, and as $h$ shrinks the secant value converges to the formula's value.


Worked Example by Hand

Below is a single example chosen because it exercises the identity at three different levels: numerical (a difference quotient at $x = 2$), the Taylor approximation that explains the residual error, and a chain-rule composition. Try it on paper before reading the solution; five minutes with a pencil will pay off for the rest of the chapter.

Worked example — three checks at x = 2

Setup. Take $f(x) = \ln x$. We want three things at $x = 2$:

  • The exact slope from the formula $1/x$.
  • A numerical secant slope with $h = 0.1$, and the error you would expect.
  • The derivative of the composition $g(x) = \ln(x^{2}+1)$ at the same $x = 2$.

Step 1 — Exact slope

By the rule $f'(x) = 1/x$, evaluating at $x = 2$ gives $f'(2) = 1/2 = 0.5$. This is the number every other estimate should converge to.

Step 2 — Numerical secant with h = 0.1

Plug in:

$$\frac{\ln(2.1) - \ln(2.0)}{0.1} = \frac{\ln(1.05)}{0.1}.$$

Use the Taylor series $\ln(1+u) = u - u^{2}/2 + u^{3}/3 - \cdots$ with $u = 0.05$:

$$\ln(1.05) \approx 0.05 - \tfrac{0.0025}{2} + \tfrac{0.000125}{3} \approx 0.04879.$$

Divide by $h = 0.1$: the secant slope is $\approx 0.4879$.
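The hand computation is easy to replay. A minimal sketch that sums the first few Taylor terms for $\ln(1.05)$, compares them with math.log1p, and then forms the secant slope:

```python
import math

u = 0.05
exact = math.log1p(u)              # ln(1.05), used as the reference value
partial = 0.0
for n in range(1, 6):
    partial += (-1) ** (n + 1) * u ** n / n    # n-th term of u - u^2/2 + u^3/3 - ...
    print(f"{n} term(s): {partial:.10f}   error: {partial - exact:+.2e}")

secant_slope = (math.log(2.1) - math.log(2.0)) / 0.1
print(f"secant slope with h = 0.1: {secant_slope:.7f}")
```

Three terms already give $0.04879$ to five decimal places; each extra term shrinks the error by roughly another factor of $u = 0.05$.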

Predicted error. Expanding the secant slope in powers of $h$ around the tangent slope gives:

$$\frac{\ln(x+h)-\ln x}{h} = \frac{1}{x} - \frac{h}{2x^{2}} + \mathcal{O}(h^{2}).$$

With $x = 2$ and $h = 0.1$, the leading error is $-h/(2x^{2}) = -0.0125$, so the secant slope should be about $0.5 - 0.0125 = 0.4875$. That matches our hand computation of $0.4879$ to within $0.0004$. The small excess is the next term, $+h^{2}/(3x^{3}) \approx 0.0004$, which the $\mathcal{O}(h^{2})$ hides; it traces back to the cubic term of the $\ln(1+u)$ series.
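To see the prediction at work for several step sizes, here is a short sketch comparing the forward secant slope with the expansion truncated after one and after two correction terms (the variable names are ours):

```python
import math

x = 2.0
for h in [0.1, 0.01, 0.001]:
    secant = (math.log(x + h) - math.log(x)) / h
    leading = 1 / x - h / (2 * x**2)           # tangent slope plus the leading error term
    refined = leading + h**2 / (3 * x**3)      # add the next (O(h^2)) term
    print(f"h = {h:<6} secant = {secant:.9f}  leading = {leading:.9f}  refined = {refined:.9f}")
```

Each time $h$ shrinks by a factor of 10, the gap between the secant and the refined prediction shrinks by roughly a factor of 1000, as expected for an $\mathcal{O}(h^{3})$ remainder.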

Step 3 — Chain rule on ln(x² + 1)

Let $u(x) = x^{2}+1$. Then $du/dx = 2x$. Applying the chain rule for $\ln$:

$$\frac{d}{dx}\ln(x^{2}+1) = \frac{1}{x^{2}+1}\cdot 2x = \frac{2x}{x^{2}+1}.$$

Plug in $x = 2$: $\dfrac{4}{5} = 0.8$. That is higher than the slope of plain $\ln x$ at the same point, which makes sense: the inner function $x^{2}+1$ is climbing fast (slope $4$ at $x = 2$), and the chain rule multiplies the log's response $1/u = 1/5$ by that inner slope.

Verification

The first three rows should all sit near $0.5$, converging as $h$ shrinks; the last row is the chain-rule result:

| Quantity | Value | Check |
|---|---|---|
| $f'(2)$ | 0.5000000 | Closed form: $1/2$. |
| Secant slope, $h = 0.1$ | 0.4879016 | Off by about 0.0121 (close to the $-h/(2x^{2}) = -0.0125$ prediction). |
| Secant slope, $h = 0.001$ | 0.4998750 | Off by about $1.25 \times 10^{-4}$ ($h$ is 100× smaller, error is about 100× smaller). |
| $\tfrac{d}{dx}\ln(x^{2}+1)$ at $x = 2$ | 0.8000000 | By the chain rule: $2x/(x^{2}+1)$ at $x = 2$ gives $4/5$. |
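The table can be regenerated in a few lines. A sketch (the helper names forward_secant and g are ours) that also adds a central-difference check of the chain-rule row:

```python
import math


def forward_secant(f, x: float, h: float) -> float:
    """Forward difference quotient (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


def g(t: float) -> float:
    """The composition ln(t^2 + 1) from Step 3."""
    return math.log(t**2 + 1)


x = 2.0
rows = [
    ("f'(2) = 1/x", 1 / x),
    ("secant, h = 0.1", forward_secant(math.log, x, 0.1)),
    ("secant, h = 0.001", forward_secant(math.log, x, 0.001)),
    ("2x/(x^2 + 1) at x = 2", 2 * x / (x**2 + 1)),
    ("central diff of g, h = 1e-5", (g(x + 1e-5) - g(x - 1e-5)) / 2e-5),
]
for label, value in rows:
    print(f"{label:<30} {value:.7f}")
```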

Combining With the Chain Rule

Most logs you meet in practice are not bare $\ln x$. They wrap something more complicated: $\ln(x^{2}+1)$, $\ln(\sin x + 2)$, $\ln(\sqrt{x})$. All of these reduce to one pattern:

$$\frac{d}{dx}[\ln(u(x))] = \frac{1}{u(x)}\cdot \frac{du}{dx} = \frac{u'(x)}{u(x)}.$$

The way to read this: differentiate the inside, then divide by the inside. Four worked instances:

| Function | $u(x)$ | $u'(x)$ | Derivative |
|---|---|---|---|
| $\ln(x^{2}+1)$ | $x^{2}+1$ | $2x$ | $\dfrac{2x}{x^{2}+1}$ |
| $\ln(\sin x + 2)$ | $\sin x + 2$ | $\cos x$ | $\dfrac{\cos x}{\sin x + 2}$ |
| $\ln\sqrt{x}$ | $\sqrt{x}$ | $\tfrac{1}{2\sqrt{x}}$ | $\dfrac{1}{2x}$ |
| $\ln(e^{x})$ | $e^{x}$ | $e^{x}$ | $1$ |

The last row is not a coincidence

$\ln(e^{x}) = x$ by the inverse identity, and differentiating $x$ obviously gives 1. The chain rule got the same answer via $e^{x}/e^{x} = 1$. Whenever two routes give the same number, you have good evidence that both were applied correctly.
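Each row of the table can also be checked numerically. A minimal sketch, evaluating at the arbitrarily chosen point $x = 0.7$ (where every inner function is positive) with a symmetric difference:

```python
import math


def central_diff(f, x: float, h: float = 1e-5) -> float:
    """Symmetric difference quotient (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


x = 0.7   # any point where every inner function u(x) is positive
checks = [
    ("ln(x^2 + 1)",   lambda t: math.log(t**2 + 1),        2 * x / (x**2 + 1)),
    ("ln(sin x + 2)", lambda t: math.log(math.sin(t) + 2), math.cos(x) / (math.sin(x) + 2)),
    ("ln(sqrt x)",    lambda t: math.log(math.sqrt(t)),    1 / (2 * x)),
    ("ln(e^x)",       lambda t: math.log(math.exp(t)),     1.0),
]
for label, f, formula in checks:
    # numeric slope of the full composition vs. the u'/u column of the table
    print(f"{label:<14} numeric = {central_diff(f, x):.8f}   u'/u = {formula:.8f}")
```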


Common Mistakes and Edge Cases

Pitfall 1 — Forgetting the domain

$\ln x$ is undefined for $x \le 0$, so its derivative is undefined there too. Writing $\frac{d}{dx}[\ln x] = 1/x$ at, say, $x = -2$ is meaningless: the formula gives $-1/2$, but the function does not exist at $-2$.

Pitfall 2 — ln|x| vs. ln(x)

The absolute-value version $\ln|x|$ is defined for every $x \neq 0$, and a small calculation shows:

$$\frac{d}{dx}[\ln|x|] = \frac{1}{x}\qquad (x \neq 0).$$

Same formula, broader domain. The proof: for $x > 0$ we already have it. For $x < 0$, write $\ln|x| = \ln(-x)$ and apply the chain rule: $\frac{1}{-x}\cdot (-1) = \frac{1}{x}$. That is why the antiderivative formula reads $\int \tfrac{1}{x}\,dx = \ln|x| + C$ (on any interval that does not contain 0): the absolute value is the right object once you let $x$ take either sign.
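A quick numerical sketch of this claim on both sides of zero (ln_abs is a hypothetical helper name):

```python
import math


def ln_abs(x: float) -> float:
    """ln|x|, defined for every x != 0."""
    return math.log(abs(x))


h = 1e-6
for x in [-3.0, -0.5, 0.5, 3.0]:
    numeric = (ln_abs(x + h) - ln_abs(x - h)) / (2 * h)   # x - h and x + h stay on one side of 0
    print(f"x = {x:+.1f}   numeric slope = {numeric:+.8f}   1/x = {1 / x:+.8f}")
```

On the negative side the slopes come out negative, matching $1/x$, even though $\ln|x|$ itself is perfectly well defined there.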

Pitfall 3 — Treating ln as if it were linear

It is very common to see students write $\ln(a + b) = \ln a + \ln b$. This is false. The genuine rules (for $a, b > 0$) are:

| Identity | Holds? |
|---|---|
| $\ln(ab) = \ln a + \ln b$ | Yes. |
| $\ln(a/b) = \ln a - \ln b$ | Yes. |
| $\ln(a^{k}) = k\ln a$ | Yes. |
| $\ln(a+b) = \ln a + \ln b$ | NO: never use this. |
| $\ln(a) \cdot \ln(b) = \ln(a+b)$ | NO: also nonsense. |
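A numerical spot check is an easy way to catch a bogus identity before it spreads through a derivation. A sketch with arbitrary positive test values:

```python
import math

a, b, k = 3.0, 5.0, 2.5   # arbitrary positive test values

# The three genuine rules hold (up to floating-point round-off).
print(math.isclose(math.log(a * b), math.log(a) + math.log(b)))   # True
print(math.isclose(math.log(a / b), math.log(a) - math.log(b)))   # True
print(math.isclose(math.log(a ** k), k * math.log(a)))            # True

# The two "rules" in the NO rows fail badly.
print(math.log(a + b), math.log(a) + math.log(b))   # ln 8 ≈ 2.0794 vs ln 15 ≈ 2.7081
print(math.log(a) * math.log(b), math.log(a + b))   # ≈ 1.7681 vs ln 8 ≈ 2.0794
```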

Pitfall 4 — Confusing log bases

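In this book, as in most mathematics, $\ln$ always means the natural log (base $e$). The meaning of a bare $\log$ varies: on calculators and in many engineering texts it means base 10, while in most programming languages (Python's math.log, NumPy's np.log, PyTorch's torch.log) it means base $e$. If you write $\frac{d}{dx}[\log x] = 1/x$ when the text meant base 10, you will be off by a factor of $\ln 10 \approx 2.3026$, because $\frac{d}{dx}[\log_{10} x] = \frac{1}{x \ln 10}$. The next section (§5.4) handles the general-base case carefully.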
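A short numerical sketch confirms the factor (math.log10 is Python's base-10 logarithm; the point $x = 4$ is arbitrary):

```python
import math

x, h = 4.0, 1e-6
numeric = (math.log10(x + h) - math.log10(x - h)) / (2 * h)   # slope of log10 at x = 4
print(f"numeric slope of log10 at x = 4: {numeric:.6f}")
print(f"1/(x ln 10):                     {1 / (x * math.log(10)):.6f}")   # 0.108574
print(f"1/x (wrong for base 10):         {1 / x:.6f}")                    # 0.250000
```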


Plain Python: Numerical Verification

Now we leave paper and go to a screen. The first script does what you would do with a calculator to convince a skeptic: it computes a numerical derivative of $\ln x$ at several points and prints it next to the analytic value $1/x$. They should agree to about 10 decimal places.

Symmetric finite difference confirms d/dx[ln(x)] = 1/x
🐍ln_numerical_derivative.py
Why import math, not numpy?

math.log is the natural logarithm in Python. Base e is the default — no second argument needed. We use the pure-Python version so the numbers print exactly the way a calculator would show them, with no NumPy broadcasting magic between us and the result.

Type hints make the contract obvious

x: float — the point where we want the derivative. h: float — the step size used in the finite-difference quotient. The return type -> float says: one number out. A reader of this function should be able to predict the input/output without running it.

EXECUTION STATE
x = the point at which we evaluate f'(x)
h = tiny step, default 1e-5 (≈ 0.00001)
Why a docstring that names the formula?

Three months from now you will not remember why the denominator is 2h rather than h. The docstring records the exact identity being approximated. The symmetric (or 'central') difference is preferred over the forward difference because its truncation error shrinks like h^2 instead of h: at x = 1 with h = 1e-5 that is about 3e-11 instead of about 5e-6.

Guard against the only forbidden input

ln(x) is only defined for x > 0. Calling math.log(0) raises ValueError on its own, but the early raise here gives a more readable message ('only defined for x > 0') and prevents the deeper math.log error from leaking through.

EXAMPLE
derivative_of_ln(-1) → ValueError('ln(x) is only defined for x > 0'). No NaN, no silent zero — a loud failure is the right behaviour.
The two log calls

math.log(x + h) is ln just to the right of x, math.log(x - h) is ln just to the left. Subtracting gives a tiny vertical rise; dividing by 2h gives the average slope across that small interval — which is the derivative as h → 0.

EXECUTION STATE
x + h (at x = 2, h = 1e-5) = 2.00001
x - h = 1.99999
math.log(2.00001) - math.log(1.99999) = ≈ 1.0000000000e-5
÷ (2 * h) = ≈ 0.5000000000 = 1/2 = 1/x
Pick a deliberate set of sample points

We want to test the formula in interesting places, not just easy ones. 0.5 < 1 (where ln is negative). 1.0 (where ln is exactly 0). 2.0 (a clean rational point). e (where ln returns 1 exactly). 5.0 and 10.0 (large enough that 1/x is small).

EXECUTION STATE
math.e = 2.718281828459045...
math.log(math.e) = 1.0
Header row with column widths

f-strings let us format columns by total width. {'x':>10} right-aligns the literal string 'x' in 10 characters of space. Matching widths in the data rows below keeps the table aligned.

Loop body: three numbers per point

For every sample point we compute the numerical estimate, the analytic answer 1/x, and the absolute error between them. Printing all three side-by-side is the witness: if the numerical column tracks the analytic column to ~10 decimal places, the formula d/dx[ln x] = 1/x is empirically confirmed.

EXECUTION STATE
at x = 2.0 = numeric ≈ 0.5000000000 analytic = 0.5000000000 error ≈ 1e-11
at x = e = numeric ≈ 0.3678794412 analytic ≈ 0.3678794412 error ≈ 1e-11
at x = 0.5 = numeric ≈ 2.0000000003 analytic = 2.0000000000 error ≈ 3e-10 (the truncation term h^2/(3x^3) is largest at small x)
The if-name-main idiom

Keeps main() from running if this file is imported as a module. Standard Python hygiene — if a test later imports derivative_of_ln, the table will not spill into the test output.

```python
import math


def derivative_of_ln(x: float, h: float = 1e-5) -> float:
    """
    Numerically estimate d/dx[ln(x)] at the point x using the symmetric
    difference quotient:

        f'(x) ~ ( f(x + h) - f(x - h) ) / (2 * h)

    For f(x) = ln(x) the analytic answer is 1/x, so we use this routine
    only as an independent witness that confirms the formula.
    """
    if x <= 0:
        raise ValueError("ln(x) is only defined for x > 0")
    if x - h <= 0:
        # The left sample point x - h must also lie in the domain of ln.
        raise ValueError("step h must satisfy h < x so that x - h > 0")
    return (math.log(x + h) - math.log(x - h)) / (2.0 * h)


def main() -> None:
    sample_points = [0.5, 1.0, 2.0, math.e, 5.0, 10.0]

    print(f"{'x':>10}  {'numerical':>14}  {'1 / x':>14}  {'abs error':>12}")
    print('-' * 56)
    for x in sample_points:
        numeric = derivative_of_ln(x)
        analytic = 1.0 / x
        error = abs(numeric - analytic)
        print(f"{x:>10.4f}  {numeric:>14.10f}  {analytic:>14.10f}  {error:>12.2e}")


if __name__ == "__main__":
    main()
```

When you run this you should see a table like the one below. The error column is shown only to order of magnitude, because its exact digits depend on floating-point rounding on your machine. The error is largest at x = 0.5, because the truncation term h^2/(3x^3) grows as x shrinks.

         x       numerical           1 / x     abs error
--------------------------------------------------------
    0.5000    2.0000000003    2.0000000000        ~3e-10
    1.0000    1.0000000000    1.0000000000        ~3e-11
    2.0000    0.5000000000    0.5000000000        <1e-10
    2.7183    0.3678794412    0.3678794412        <1e-10
    5.0000    0.2000000000    0.2000000000        <1e-10
   10.0000    0.1000000000    0.1000000000        <1e-10

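Every “abs error” entry is tiny, at most a few times $10^{-10}$. What remains is a mix of the central difference's $\mathcal{O}(h^{2})$ truncation error and floating-point round-off amplified by the division by $2h$; at $h = 10^{-5}$ these two effects are roughly balanced, so this is about as accurate as a simple finite difference gets in double precision.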
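Why $h = 10^{-5}$? Shrinking $h$ reduces the $\mathcal{O}(h^{2})$ truncation error but amplifies round-off, which grows roughly like $1/h$. A sketch of that trade-off at $x = 2$ (central_diff_ln is a hypothetical helper; the printed errors are machine-dependent in their last digits, so none are quoted here):

```python
import math


def central_diff_ln(x: float, h: float) -> float:
    """Symmetric difference quotient for ln at x (requires h < x)."""
    return (math.log(x + h) - math.log(x - h)) / (2.0 * h)


x = 2.0
errors = {}
for h in [1e-1, 1e-3, 1e-5, 1e-8, 1e-12]:
    errors[h] = abs(central_diff_ln(x, h) - 1.0 / x)
    print(f"h = {h:.0e}   abs error = {errors[h]:.2e}")
```

Expect the error to fall from about $4 \times 10^{-4}$ at $h = 0.1$ to its minimum near $h = 10^{-5}$, then climb again as round-off takes over.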

Now apply the chain rule from Python

The second script implements the helper $\frac{d}{dx}[\ln u(x)] = u'(x)/u(x)$ and uses it to differentiate three compositions at two different inputs. Same identity, more interesting inner functions:

Chain rule for ln composed with three inner functions
🐍ln_chain_rule.py
One helper, one identity

We are not implementing all of calculus. We are implementing exactly the chain rule for the outer function ln(·). Giving it its own function name makes downstream code read like the math: derivative = d_ln_of(u, dudx).

Why u(x) must be positive

ln is only defined for positive arguments. If the caller passes u_value = 0 or negative, the composition ln(u(x)) is undefined at that point and so is its derivative. We raise early so a downstream NaN cannot silently propagate.

EXAMPLE
d_ln_of(-1, 2) → ValueError. Better than returning a misleading negative number.
The chain rule in one line

d/dx[ln(u(x))] = (1/u) · du/dx. Reading the line of code in English: take the inner derivative, divide it by the inner value. That is it.

EXECUTION STATE
u_value = 5, du_dx = 3 = returns 3 / 5 = 0.6
u_value = e, du_dx = e = returns 1.0 (this is d/dx[x] in disguise, since ln(e^x) = x)
A deliberately mixed set of inner functions

x^2 + 1 is polynomial. 3x + 5 is linear. sin(x) + 2 is trig. The same formula handles all three — that is what makes the chain rule worth memorising.

Why sin(x) + 2, not sin(x)?

sin(x) by itself dips negative on (π, 2π), and ln of a negative number is undefined. Adding 2 lifts the whole curve above zero so ln(sin(x) + 2) is defined for every real x — a clean teaching example.

Walk through one row by hand

At x = 1: ln(x^2 + 1) → u = 1 + 1 = 2, du/dx = 2·1 = 2. So derivative = 2 / 2 = 1. Compare to the printed value: 1.000000. The Python is doing exactly the algebra you would write in your notebook.

EXECUTION STATE
row 'ln(x^2 + 1)' at x=1 = u=2.0000 du/dx=2.0000 deriv=1.000000
row 'ln(3x + 5)' at x=1 = u=8.0000 du/dx=3.0000 deriv=0.375000
row 'ln(sin x + 2)' at x=1 = u=2.8415 du/dx≈0.5403 deriv≈0.190149
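Verify with a second x

Running the demo at x = 1 and again at x = 2 gives six derivative values, all consistent with the formula. Two independent checkpoints catch mistakes that can look right at a single point: for example, coding du/dx = 2 instead of 2x for x^2 + 1 gives the correct answer at x = 1 but not at x = 2.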

```python
import math


def d_ln_of(u_value: float, du_dx: float) -> float:
    """
    Apply the chain rule for d/dx[ ln( u(x) ) ].

        d/dx[ ln(u) ] = (1 / u) * du/dx

    Parameters
    ----------
    u_value : value of u(x) at the point in question (must be > 0).
    du_dx   : the derivative of u with respect to x at that point.
    """
    if u_value <= 0:
        raise ValueError("u(x) must be positive for ln(u(x)) to be defined")
    return du_dx / u_value


def demo_three_inner_functions(x: float) -> None:
    """
    Show how the same outer rule applies to three different inner functions:
        f1(x) = ln(x^2 + 1)
        f2(x) = ln(3 * x + 5)
        f3(x) = ln(sin(x) + 2)     <- always positive for any x
    """
    cases = [
        ("ln(x^2 + 1)",   x * x + 1,         2.0 * x),
        ("ln(3x + 5)",    3.0 * x + 5.0,     3.0),
        ("ln(sin x + 2)", math.sin(x) + 2.0, math.cos(x)),
    ]

    print(f"At x = {x}")
    print(f"{'expression':<20}  {'u(x)':>10}  {'du/dx':>10}  {'derivative':>14}")
    print('-' * 60)
    for label, u, dudx in cases:
        deriv = d_ln_of(u, dudx)
        print(f"{label:<20}  {u:>10.4f}  {dudx:>10.4f}  {deriv:>14.6f}")


if __name__ == "__main__":
    demo_three_inner_functions(1.0)
    print()
    demo_three_inner_functions(2.0)
```

PyTorch: Autograd Confirms 1/x

We have done the algebra, drawn the picture, and verified numerically. There is one more witness worth calling: automatic differentiation. PyTorch's autograd does not estimate the derivative with a finite-difference quotient; it records the computation graph and, walking it backwards, applies the exact derivative rule of each operation (for log, that rule is 1/x). If autograd and our formula disagreed by more than floating-point round-off, one of them would be wrong.

They agree, up to the round-off of the 32-bit floats PyTorch uses by default.

torch.autograd.grad on y = ln(x) at six points
🐍ln_pytorch_autograd.py
Why bring PyTorch into a calculus chapter?

For a one-line function like ln(x) the derivative is trivial by hand. For a 50-million-parameter neural network it is not. PyTorch's autograd computes derivatives mechanically by walking the computation graph backwards. Practicing on f = ln(x) builds the same muscle memory you will use later for loss functions.

Pure function with a precise signature

x_value: float in, a tuple of two floats out. The tuple names what each slot means. Functions like this are easy to test and easy to compose — qualities you want for every numerical building block.

torch.tensor(x_value, requires_grad=True)

Four things happen on one line. (a) The number x_value is wrapped in a tensor. (b) requires_grad=True flags it as a leaf in the autograd graph, meaning we are allowed to ask for the derivative with respect to x later. (c) The tensor is now connected to PyTorch's dynamic graph machinery; every torch operation that consumes x will append a node to that graph. (d) Because no dtype is given, the tensor uses PyTorch's default float32, which is why some gradients below differ from the float64 value of 1/x in the ninth or tenth decimal place.

EXECUTION STATE
x = tensor(2.0, requires_grad=True) when x_value = 2.0
x.is_leaf = True (built directly from data, not from another op)
y = torch.log(x): natural log, not base 10

PyTorch's torch.log is the natural log. (For base 10 you would write torch.log10.) Behind the scenes PyTorch creates a tiny graph node: y is now a tensor with a grad_fn=<LogBackward0> attribute that knows the analytic derivative is 1/x and will use it on the way back.

EXECUTION STATE
y = tensor(0.6931, grad_fn=<LogBackward0>) when x = 2
y.grad_fn = <LogBackward0 object at 0x…> (this is what makes backprop possible)
torch.autograd.grad: the targeted gradient call

Asks PyTorch: 'starting from y, push the gradient back to x'. This is the cleaner cousin of y.backward() — it returns the gradient as a tensor instead of writing it to x.grad. Returns a tuple in the same order as the inputs argument, so we destructure with (grad_x,).

EXAMPLE
torch.autograd.grad(outputs=y, inputs=x) returns (tensor(0.5),) for x_value = 2. The trailing comma matters: we are unpacking a one-element tuple.
Return autograd's answer next to the closed-form answer

.item() converts the 0-dim tensor back to a Python float so we can print and compare cleanly. 1.0 / x_value is the formula we are validating. Returning both lets the caller decide what to do with the comparison.

EXECUTION STATE
grad_x.item() at x = 2 = 0.5
1.0 / 2.0 = 0.5
abs delta = 0.0 (0.5 is exactly representable in float32)
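torch.e: Euler's number as a plain Python float

torch.e is a Python float holding the same value as math.e: 2.718281828459045. The float(xv) call in the loop is therefore a no-op for it; it simply keeps the x_value: float contract explicit. Note that torch.tensor then rounds it to float32 (2.7182817...), so at x = e the autograd column differs from the float64 value of 1/x in the ninth decimal place. At x = e, ln(x) = 1 and 1/x ≈ 0.3679, a nice anchor point.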

The validation column tells the whole story

The right-most column prints |autograd − 1/x|. Where 1/x is exactly representable in float32 (x = 0.5, 1, 2) it is exactly 0. Elsewhere it is a few times 1e-9: autograd applies the exact rule 1/x, but in float32 arithmetic (and, at x = e, to a float32-rounded x), while the closed-form column is computed in float64. Deltas at that level, and no larger, are the empirical confirmation of the identity.

EXECUTION STATE
at x = 0.5 = autograd 2.0000000000 1/x 2.0000000000 |delta| 0.00e+00
at x = 1.0 = autograd 1.0000000000 1/x 1.0000000000 |delta| 0.00e+00
at x = e = autograd 0.3678794503 1/x 0.3678794412 |delta| ≈ 9.1e-09
at x = 10 = autograd 0.1000000015 1/x 0.1000000000 |delta| ≈ 1.5e-09
```python
import torch


def grad_of_ln_at(x_value: float) -> tuple[float, float]:
    """
    Confirm d/dx[ln(x)] = 1/x by running PyTorch's autograd at one point.

    Returns
    -------
    (autograd_value, closed_form_value)
        autograd_value     : what PyTorch's reverse-mode AD gives us.
        closed_form_value  : 1 / x, computed by hand.

    Both numbers must match to within floating-point round-off (float32
    round-off here, since x is created with PyTorch's default dtype).
    """
    # 1. Build x as a *leaf* tensor with gradient tracking enabled.
    x = torch.tensor(x_value, requires_grad=True)

    # 2. Forward pass: y = ln(x). The computation graph records "log".
    y = torch.log(x)

    # 3. Reverse pass: ask autograd for dy/dx at this single point.
    (grad_x,) = torch.autograd.grad(outputs=y, inputs=x)

    return grad_x.item(), 1.0 / x_value


def main() -> None:
    test_points = [0.5, 1.0, 2.0, torch.e, 5.0, 10.0]

    print(f"{'x':>10}  {'autograd':>14}  {'1 / x':>14}  {'|delta|':>12}")
    print('-' * 56)
    for xv in test_points:
        a, b = grad_of_ln_at(float(xv))
        print(f"{float(xv):>10.4f}  {a:>14.10f}  {b:>14.10f}  {abs(a - b):>12.2e}")


if __name__ == "__main__":
    main()
```

Expected output (the autograd column is float32, so where 1/x is not exactly representable its last digits show float32 round-off):

         x        autograd           1 / x       |delta|
--------------------------------------------------------
    0.5000    2.0000000000    2.0000000000      0.00e+00
    1.0000    1.0000000000    1.0000000000      0.00e+00
    2.0000    0.5000000000    0.5000000000      0.00e+00
    2.7183    0.3678794503    0.3678794412      9.15e-09
    5.0000    0.2000000030    0.2000000000      2.98e-09
   10.0000    0.1000000015    0.1000000000      1.49e-09

Why this matters past chapter 5

You now have five independent confirmations of the same one-line formula: the FTC proof, the inverse-function proof, the limit proof, the symmetric finite difference, and PyTorch's reverse-mode autograd. Whenever you can confirm a piece of mathematics from five angles like that, you can stop second-guessing it and start using it. Every deep-learning loss function that contains a log term — cross-entropy, KL divergence, negative log-likelihood — leans on exactly this derivative.


Real-World Applications

$\frac{d}{dx}[\ln x] = 1/x$ shows up the moment a problem has a multiplicative or relative-rate flavour. Four quick instances:

1. Relative growth rate (economics, biology)

For a positive quantity $Q(t)$, the logarithmic derivative $\dfrac{d}{dt}\ln Q(t) = \dfrac{Q'(t)}{Q(t)}$ is the relative (percentage) growth rate per unit time. A population that grows exponentially and doubles every year has a constant log-derivative of $\ln 2 \approx 0.693$ per year; this is exactly the continuous growth rate that appears in interest-rate and doubling-time formulas.
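A quick sketch with a hypothetical population $Q(t) = 100 \cdot 2^{t}$ (doubling every year), estimating the log-derivative numerically at a few times:

```python
import math


def Q(t: float) -> float:
    """Hypothetical population: 100 individuals at t = 0, doubling every year."""
    return 100.0 * 2.0 ** t


h = 1e-6
for t in [0.0, 1.5, 4.0]:
    log_derivative = (math.log(Q(t + h)) - math.log(Q(t - h))) / (2 * h)
    print(f"t = {t}: d/dt ln Q = {log_derivative:.6f}   ln 2 = {math.log(2):.6f}")
```

The log-derivative is the same at every time even though $Q$ itself grows sixteenfold over the window, which is what "constant relative growth rate" means.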

2. Information theory — cross-entropy and surprise

The information content of an event with probability $p$ is $I(p) = -\ln p$ (in nats). Its derivative with respect to the probability is $-1/p$. When deep-learning libraries differentiate cross-entropy loss, this is the term doing the work: the gradient with respect to the predicted probability of the true class contains the factor $-1/p$, which is large exactly when the model assigns that class a small probability.
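A minimal sketch checking $dI/dp = -1/p$ numerically (surprise is a hypothetical helper name). Note how the magnitude of the gradient blows up as $p \to 0$:

```python
import math


def surprise(p: float) -> float:
    """Information content -ln(p) of an event with probability p, in nats."""
    return -math.log(p)


h = 1e-7
for p in [0.9, 0.5, 0.1, 0.01]:
    numeric = (surprise(p + h) - surprise(p - h)) / (2 * h)
    print(f"p = {p:<5} dI/dp = {numeric:10.4f}   -1/p = {-1 / p:10.4f}")
```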

3. Physics — entropy and the partition function

The Helmholtz free energy of a thermodynamic system is $F = -k_{B}T\ln Z$, where $Z$ is the partition function. Quantities derived from $F$ require derivatives of $\ln Z$ with respect to temperature or volume, and every one of them brings a factor of $1/Z$; for example, $\partial_{T} \ln Z = (\partial_{T} Z)/Z$.

4. Calculus itself — the missing antiderivative

The power rule says $\int x^{n}\,dx = x^{n+1}/(n+1) + C$ for every $n \neq -1$. The exception is exactly $n = -1$: the integral of $1/x$. The formula we proved here is precisely what fills that hole:

$$\int \frac{1}{x}\,dx = \ln|x| + C.$$

Without $\ln$, the power rule would leave the single function $1/x$ with no antiderivative. With it, the gap is closed.
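The absolute value matters as soon as the interval lies to the left of zero. A sketch, using a plain midpoint rule (our choice; midpoint_integral is a hypothetical helper) to integrate $1/x$ from $-3$ to $-1$ and comparing with $\ln|x|$ evaluated at the endpoints:

```python
import math


def midpoint_integral(f, a: float, b: float, n: int = 100_000) -> float:
    """Composite midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / n
    return width * sum(f(a + (k + 0.5) * width) for k in range(n))


approx = midpoint_integral(lambda x: 1.0 / x, -3.0, -1.0)
exact = math.log(abs(-1.0)) - math.log(abs(-3.0))   # [ln|x|] from -3 to -1 = -ln 3
print(f"midpoint rule: {approx:.10f}   ln|x| formula: {exact:.10f}")
```

Plain $\ln x$ could not even be evaluated at $-3$ or $-1$; $\ln|x|$ gives the correct value $-\ln 3 \approx -1.0986$.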


Summary

One identity, three proofs, two pictures, three witnesses in code.

| Concept | Formula | Why |
|---|---|---|
| Derivative of ln | $\dfrac{d}{dx}\ln x = \dfrac{1}{x}$ | FTC applied to the area definition; equivalently, the inverse of $e^{x}$. |
| Slope at $x = 1$ | $\dfrac{d}{dx}\ln x\Big\vert_{x=1} = 1$ | The famous limit $\ln(1+u)/u \to 1$. |
| Chain-rule form | $\dfrac{d}{dx}\ln u(x) = \dfrac{u'(x)}{u(x)}$ | Standard chain rule with the outer derivative $1/u$. |
| Absolute-value form | $\dfrac{d}{dx}\ln\lvert x\rvert = \dfrac{1}{x}$ | Extends the formula to $x < 0$; required for $\int (1/x)\,dx$. |
| Antiderivative | $\displaystyle\int\dfrac{1}{x}\,dx = \ln\lvert x\rvert + C$ | Fills the $n = -1$ gap in the power rule. |
| ML connection | $\dfrac{d}{dp}[-\ln p] = -\dfrac{1}{p}$ | Gradient of cross-entropy loss with respect to the predicted probability. |
The essence of this section:
“The slope of the natural log at x is 1/x — because the area beneath 1/t grows at rate 1/x when its right edge sits at x.”
Coming next: §5.4 generalises the formula to logarithms of arbitrary base, $\log_{a} x$. We will see exactly how the constant $\ln a$ enters the denominator to give $\frac{1}{x \ln a}$, and why the natural base is “natural”: it is precisely the base that removes that constant.