Derivatives of Exponential, Logarithmic, and Trigonometric Functions
Learning Objectives
By the end of this section you will be able to:
Recognize the three function shapes that demand logarithmic differentiation: variable-base-variable-exponent (like $x^x$), messy products and quotients, and towers of exponents.
Apply the 4-step recipe: take ln of both sides, differentiate implicitly, simplify, multiply back by y.
Derive the master formula $f'(x) = f(x)\,\frac{d}{dx}[\ln f(x)]$ from $(\ln y)' = y'/y$.
Use log-diff to find critical points of functions the basic power and exponential rules cannot touch.
Verify any log-diff answer in Python with SymPy and in PyTorch with autograd.
Connect the trick to log-likelihoods in machine learning — the reason every ML paper differentiates sums of logs and never products of probabilities.
Why We Need a New Trick
So far in this chapter you have learned three derivative machines, and each one solves exactly one job:
| Function shape | Rule that works | Example |
| --- | --- | --- |
| Power: x^n (n is a fixed number) | Power rule | d/dx(x^3) = 3x^2 |
| Exponential: a^x (a is a fixed number) | Exponential rule | d/dx(2^x) = 2^x · ln 2 |
| Logarithm: log_a(x) | Log rule | d/dx(ln x) = 1/x |
Now look at $f(x) = x^x$. The base is x (changing). The exponent is x (also changing). The power rule wants the exponent to be constant. The exponential rule wants the base to be constant. Both refuse.
The pain point: tools that assume one slot is fixed cannot differentiate a function where both slots move at once.
Logarithmic differentiation is the universal escape hatch. The recipe is short, the algebra is mostly bookkeeping, and once you see it you will recognize whole new families of functions you can now differentiate by hand.
Big idea in one sentence: taking the natural log of both sides turns multiplication into addition, exponentiation into multiplication, and a stack of operators into a simple sum of friendly terms. Once the equation is a sum, every basic rule applies.
Intuition: ln() Linearizes Multiplication
Imagine you are an accountant and someone hands you the equation
$$y = x^2 (x+1)^3 e^x$$
Differentiating this directly is a triple-nested product rule and chain rule. You will fill half a page with parentheses and you will probably drop a sign somewhere. Painful.
But the same equation, viewed through ln, becomes
$$\ln y = 2\ln x + 3\ln(x+1) + x$$
That is a sum. Three simple terms. The right side is now a job for the sum rule plus three one-line applications of the log derivative.
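As a quick numerical check of this claim, the sketch below differentiates the same product two ways: through the log form (a sum of three terms, multiplied back by y) and by a central difference on y itself. The function names, the test point x0 = 1.5, and the step size are illustrative choices, not part of the original text.

```python
import math


def y(x: float) -> float:
    """The product from the text: x^2 * (x + 1)^3 * e^x."""
    return x**2 * (x + 1) ** 3 * math.exp(x)


def dy_via_log(x: float) -> float:
    """Differentiate ln y = 2 ln x + 3 ln(x + 1) + x term by term, then multiply by y."""
    d_ln_y = 2 / x + 3 / (x + 1) + 1
    return y(x) * d_ln_y


def dy_numeric(x: float, h: float = 1e-5) -> float:
    """Central difference on y itself, used only as an independent check."""
    return (y(x + h) - y(x - h)) / (2 * h)


x0 = 1.5  # arbitrary positive test point (an assumption of this sketch)
print(f"via log : {dy_via_log(x0):.6f}")
print(f"numeric : {dy_numeric(x0):.6f}")
```

The two printed values should agree to several decimal places; the log route needed no product rule at all.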
Analogy. Think of ln as a decoder ring that turns multiplicative structure into additive structure. We use it any time the original function is glued together by multiplication or exponentiation and we wish it were glued together by addition instead.
This is the same reason ML engineers always work with log-likelihoods instead of likelihoods: probabilities multiply, log-probabilities add, and additive things are easier to optimize, store, and differentiate.
The 4-Step Recipe
Given an equation y=f(x) where f is a tangle of products, quotients, or variable-exponent towers (and f(x)>0 near the point of interest):
1. Take ln of both sides. The left becomes $\ln y$. Use the log laws to break the right side into a sum: $\ln(ab) = \ln a + \ln b$, $\ln(a/b) = \ln a - \ln b$, $\ln(a^c) = c\ln a$.
2. Differentiate both sides with respect to x. The left side gives $\frac{1}{y}\cdot\frac{dy}{dx}$ by the chain rule (because y is itself a function of x). The right side is a sum, so differentiate term by term.
3. Simplify the right side. Plain algebra. No calculus left.
4. Multiply both sides by y to solve for dy/dx, then substitute the original formula for y so the final answer is in pure x.
Step 2 is where almost every beginner mistake lives. The left side is not $\frac{d}{dx}[\ln y] = 1/y$. That would be true if y were the variable. But y is a function of x, so the chain rule fires and produces an extra factor of dy/dx.
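To make the Step 2 warning concrete, here is a minimal sketch that measures d/dx[ln y] numerically for y = x^x and compares it with both candidates for the left side. The helper names and the test point are assumptions chosen for illustration.

```python
import math


def y(x: float) -> float:
    return x**x


def d_ln_y(x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of d/dx [ln y(x)]."""
    return (math.log(y(x + h)) - math.log(y(x - h))) / (2 * h)


x0 = 2.0
dy_dx = y(x0) * (math.log(x0) + 1)  # result of the 4-step recipe for y = x^x

measured = d_ln_y(x0)             # what d/dx [ln y] actually is
with_chain_rule = dy_dx / y(x0)   # (1/y) * dy/dx
without_chain_rule = 1 / y(x0)    # the tempting but wrong 1/y

print(f"measured d/dx[ln y] : {measured:.6f}")
print(f"(1/y) * dy/dx       : {with_chain_rule:.6f}")
print(f"1/y alone           : {without_chain_rule:.6f}")
```

Only the version that keeps the dy/dx factor matches the measured slope of ln y.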
Interactive Walkthrough on x^x
Step through the recipe on $y = x^x$ below. Watch how line 1 contains an exponent and line 4 contains only sums and products.
The Master Formula
Strip away the names and the entire procedure is one identity. Starting from $g(x) = \ln f(x)$ and applying the chain rule,
$$g'(x) = \frac{d}{dx}\ln f(x) = \frac{f'(x)}{f(x)}$$
Solving for $f'(x)$:
$$f'(x) = f(x)\cdot\frac{d}{dx}[\ln f(x)]$$
Read it out loud: “the derivative of f is f times the derivative of its log.” Every example in the rest of this section is a deliberate exercise of this one identity.
The expression $f'(x)/f(x)$ is called the logarithmic derivative of f. It measures relative rate of change: what fraction of itself f changes by per unit x. In finance this is the instantaneous return; in biology, the instantaneous growth rate; in ML, the log-likelihood gradient.
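The "relative rate of change" reading can be illustrated with a small sketch: two exponential growth curves that differ in size by a factor of 1000 but share the same rate have the same logarithmic derivative. The helper names and the numbers (starting values 3 and 3000, rate 0.05) are hypothetical.

```python
import math
from typing import Callable


def log_derivative(f: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """Estimate f'(x) / f(x) by a central difference on ln f (requires f > 0)."""
    return (math.log(f(x + h)) - math.log(f(x - h))) / (2 * h)


def exponential_growth(start: float, rate: float) -> Callable[[float], float]:
    """Return the function t -> start * e^(rate * t)."""
    return lambda t: start * math.exp(rate * t)


small = exponential_growth(start=3.0, rate=0.05)
large = exponential_growth(start=3000.0, rate=0.05)

for name, f in [("small", small), ("large", large)]:
    print(f"{name}: f(10) = {f(10.0):10.3f}   f'/f = {log_derivative(f, 10.0):.6f}")
```

The absolute derivatives differ by a factor of 1000, yet f'/f comes out as the shared growth rate in both cases.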
Worked Example: f(x) = x^x at x = 2
We will walk the full 4-step recipe and end with a single number. Try the steps on paper before opening the answer.
Step 1: Take ln of both sides.
$$\ln y = \ln(x^x) = x\ln x$$
The log law $\ln(a^b) = b\ln a$ just brought the exponent down out of the tower.
Step 2: Differentiate both sides. On the left, chain rule:
$$\frac{1}{y}\frac{dy}{dx} = \frac{d}{dx}[x\ln x]$$
On the right, product rule with $u = x$, $v = \ln x$:
$$(uv)' = u'v + uv' = (1)(\ln x) + (x)\left(\frac{1}{x}\right) = \ln x + 1$$
Step 3: Combine.
$$\frac{1}{y}\frac{dy}{dx} = \ln x + 1$$
Pure algebra from here.
Step 4: Solve for dy/dx. Multiply both sides by y and substitute $y = x^x$:
$$\frac{dy}{dx} = y(\ln x + 1) = x^x(\ln x + 1)$$
Plug in x = 2.
$$f'(2) = 2^2(\ln 2 + 1) = 4(0.693147 + 1) = 4 \cdot 1.693147 \approx 6.7726$$
Sanity check by tiny step. A forward difference with h = 0.001 gives f(2.001) − f(2) ≈ 4.006779 − 4.000000 = 0.006779; dividing by h gives 0.006779 / 0.001 ≈ 6.779. That is within about 0.007 of the analytic 6.7726, which is the size of error we expect from a first-order difference: roughly (h/2)·f″(2) ≈ 0.0005 · 13.47 ≈ 0.0067, where f″(x) = x^x[(ln x + 1)² + 1/x].
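The finite-difference sanity check is easy to reproduce. The following sketch (variable names are illustrative) evaluates the analytic formula at x = 2 and compares it with a forward difference and a central difference at the same step size, so the first-order versus second-order error is visible side by side.

```python
import math


def f(x: float) -> float:
    return x**x


x0, h = 2.0, 1e-3
exact = f(x0) * (math.log(x0) + 1)           # x^x (ln x + 1)
forward = (f(x0 + h) - f(x0)) / h            # first-order: error shrinks like h
central = (f(x0 + h) - f(x0 - h)) / (2 * h)  # second-order: error shrinks like h^2

print(f"exact   = {exact:.6f}")
print(f"forward = {forward:.6f}  (error {abs(forward - exact):.1e})")
print(f"central = {central:.6f}  (error {abs(central - exact):.1e})")
```

With the same h, the central difference should land much closer to the analytic value than the forward difference does.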
Interactive Log-Diff Explorer (4 Functions)
Pick any of the four notoriously annoying functions below. Drag the probe slider to read off f(x), ln f(x), and f′(x) simultaneously. Notice how dashed purple (the log) is always tame and smooth even when cyan (the original) explodes.
What to look for. For $x^x$, slide x near 0.37, which is 1/e; the orange f′(x) curve crosses zero exactly there. For $x^{1/x}$, slide x near e ≈ 2.718 and the same thing happens. Both critical points fall straight out of the log-diff identity (or the equivalent rewrite $f = e^{\ln f}$), which is the only tool at this level that applies to these functions.
Payoff: Finding the Minimum of x^x
Setting $f'(x) = x^x(\ln x + 1) = 0$ and using the fact that $x^x > 0$ for every x > 0, the only way the product can be zero is $\ln x + 1 = 0$, i.e. $x = e^{-1} = 1/e \approx 0.3679$. At that point,
$$f(1/e) = \left(\frac{1}{e}\right)^{1/e} \approx 0.6922$$
Drag the slider below until the slope readout turns green at zero. That is the global minimum of $x^x$ on $(0, \infty)$.
Why this is impressive. Without log-diff you would have no rule that even applies to $x^x$, let alone a way to find its critical point. Locating the minimum at $x = 1/e$ is a result that, at this level of math, depends on logarithmic differentiation (or the equivalent rewrite $x^x = e^{x\ln x}$).
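The critical point can also be located numerically from the derivative formula alone. This is a minimal sketch using plain bisection on f′(x) = x^x(ln x + 1); the bracket [0.1, 1.0] and the helper name bisect_root are assumptions made for the example.

```python
import math
from typing import Callable


def f(x: float) -> float:
    return x**x


def f_prime(x: float) -> float:
    return f(x) * (math.log(x) + 1)


def bisect_root(g: Callable[[float], float], lo: float, hi: float,
                tol: float = 1e-12) -> float:
    """Find a root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid                  # sign change is in the left half
        else:
            lo, g_lo = mid, g_mid     # sign change is in the right half
    return 0.5 * (lo + hi)


x_star = bisect_root(f_prime, 0.1, 1.0)  # f' < 0 at 0.1 and f' > 0 at 1.0
print(f"critical point x* = {x_star:.6f}   (1/e = {1 / math.e:.6f})")
print(f"minimum value f(x*) = {f(x_star):.6f}")
```

Because f′ is negative to the left of the root and positive to the right, the root found this way is a minimum, in line with the algebraic argument.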
Pattern 2: Messy Products and Quotients
Even when the exponents are constant, log-diff is a labor-saver whenever you have many factors. Consider
$$y = \frac{(x+1)^2\sqrt{x-1}}{(x+3)^4}$$
Direct differentiation requires the quotient rule wrapping a product rule wrapping the chain rule. Through ln the same thing becomes
$$\ln y = 2\ln(x+1) + \tfrac{1}{2}\ln(x-1) - 4\ln(x+3)$$
Differentiate each term in isolation:
$$\frac{1}{y}\frac{dy}{dx} = \frac{2}{x+1} + \frac{1}{2(x-1)} - \frac{4}{x+3}$$
Multiply back by y and we are done.
Pattern. Whenever a function has n factors connected by × or ÷, log-diff converts its derivative into a single sum of n simple rational terms. That is a linear-cost-in-n procedure where the direct method is closer to quadratic.
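The "one term per factor" pattern translates directly into code. In this sketch each factor of the example is stored as a hypothetical (c, p) pair standing for (x + c)^p, so the log-derivative is literally a sum over the list; a central difference on y serves as the cross-check.

```python
import math

# Each entry (c, p) stands for the factor (x + c)**p in
# y = (x + 1)^2 * (x - 1)^(1/2) * (x + 3)^(-4), which is real-valued for x > 1.
FACTORS = [(1.0, 2.0), (-1.0, 0.5), (3.0, -4.0)]


def y(x: float) -> float:
    return math.prod((x + c) ** p for c, p in FACTORS)


def dy_via_log(x: float) -> float:
    """(1/y) dy/dx is a sum with one term p / (x + c) per factor."""
    return y(x) * sum(p / (x + c) for c, p in FACTORS)


def dy_numeric(x: float, h: float = 1e-6) -> float:
    """Central difference on y, as an independent check."""
    return (y(x + h) - y(x - h)) / (2 * h)


x0 = 2.0
print(f"via log : {dy_via_log(x0):.8f}")
print(f"numeric : {dy_numeric(x0):.8f}")
```

Adding another factor means appending one pair to FACTORS; neither function needs to change, which is the linear-cost behaviour described above.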
Pattern 3: Variable Base, Variable Exponent
This is the family that only log-diff can handle. Examples:
| Function | Log-form | Derivative |
| --- | --- | --- |
| y = x^x | ln y = x ln x | y' = x^x (ln x + 1) |
| y = x^{sin x} | ln y = sin(x) · ln x | y' = x^{sin x} ( cos x · ln x + sin x / x ) |
| y = (sin x)^x (with sin x > 0) | ln y = x · ln(sin x) | y' = (sin x)^x ( ln(sin x) + x·cot x ) |
| y = x^{1/x} | ln y = (1/x) · ln x | y' = x^{1/x} · (1 − ln x) / x^2 |
Every row above came out of the same 4-step recipe. Once the log step linearizes things, you are doing standard sum-rule and product-rule moves — no special memorization required.
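Each derivative in the table can be spot-checked numerically. The sketch below pairs every function with its claimed derivative and compares against a central difference at a single point; the point x0 = 1.2 is an arbitrary choice that keeps both x and sin x positive.

```python
import math


def central_difference(f, x: float, h: float = 1e-6) -> float:
    return (f(x + h) - f(x - h)) / (2 * h)


# (label, function, derivative as listed in the table)
ROWS = [
    ("x^x",
     lambda x: x**x,
     lambda x: x**x * (math.log(x) + 1)),
    ("x^(sin x)",
     lambda x: x ** math.sin(x),
     lambda x: x ** math.sin(x) * (math.cos(x) * math.log(x) + math.sin(x) / x)),
    ("(sin x)^x",
     lambda x: math.sin(x) ** x,
     lambda x: math.sin(x) ** x * (math.log(math.sin(x)) + x / math.tan(x))),
    ("x^(1/x)",
     lambda x: x ** (1 / x),
     lambda x: x ** (1 / x) * (1 - math.log(x)) / x**2),
]

x0 = 1.2  # requires x > 0 and sin x > 0
errors = {}
for label, f, f_prime in ROWS:
    errors[label] = abs(central_difference(f, x0) - f_prime(x0))
    print(f"{label:>10}: formula = {f_prime(x0):.6f}   |difference| = {errors[label]:.1e}")
```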
Common Pitfalls (Read This!)
Pitfall 1: forgetting the chain rule on the left side. After taking ln of both sides, the left is lny, and the derivative with respect to x is (1/y)⋅(dy/dx), not just 1/y. Skipping the dy/dx is the #1 mistake.
Pitfall 2: applying it where f is not strictly positive. ln is undefined on $(-\infty, 0]$. If f changes sign, work with $\ln|f(x)|$; the identity $(\ln|f|)' = f'/f$ still holds. Be extra careful at the zeros of f themselves.
Pitfall 3: confusing $x^x$ with $x^n$ or $a^x$. $\frac{d}{dx}x^x \neq x\cdot x^{x-1}$ (that would be the power rule) and $\neq x^x\ln x$ (that would be the exponential rule applied with a variable base). The correct derivative is $x^x(\ln x + 1)$, and the extra +1 exists because the base is changing too.
Pitfall 4: forgetting to substitute y back at the end. Your answer should be in pure x, not in y. Step 4 is finished only after you replace y with the original formula.
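Pitfall 2 is worth seeing in action. The sketch below uses a hypothetical sign-changing function, f(x) = (x − 2)³(x + 1), evaluated at a point where it is negative, and recovers f′ through ln|f|; the exact derivative from the product rule is included for comparison.

```python
import math


def f(x: float) -> float:
    """A function that changes sign: negative for -1 < x < 2."""
    return (x - 2) ** 3 * (x + 1)


def f_prime_exact(x: float) -> float:
    """Product rule, written out by hand for comparison."""
    return 3 * (x - 2) ** 2 * (x + 1) + (x - 2) ** 3


def f_prime_via_log_abs(x: float, h: float = 1e-6) -> float:
    """f'(x) = f(x) * d/dx ln|f(x)|, usable wherever f(x) != 0."""
    fx = f(x)
    if fx == 0:
        raise ValueError("ln|f| is undefined at a zero of f")
    g_plus = math.log(abs(f(x + h)))
    g_minus = math.log(abs(f(x - h)))
    return fx * (g_plus - g_minus) / (2 * h)


x0 = 1.0  # f(1) = -2 < 0, so math.log(f(x0)) would raise a ValueError
print(f"f(x0)            = {f(x0):.1f}")
print(f"via ln|f|        = {f_prime_via_log_abs(x0):.6f}")
print(f"exact derivative = {f_prime_exact(x0):.6f}")
```

Note that the multiplier is the signed f(x), not |f(x)|; that is what restores the correct sign of the derivative.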
Python: Symbolic + Numerical Verification
Plain Python first. We will implement the log-diff identity $f'(x) = f(x)\cdot(\ln f)'(x)$ as a numerical estimator, then compare it against the analytic answer we derived by hand. If our algebra was right, the two columns should agree to roughly nine or ten decimal places.
log_diff.py — pen-and-paper log-diff, in numerical form
Imports: just the standard library
We only need math.log and a type hint for the callable. No NumPy. The whole point of this section is to make logarithmic differentiation feel mechanical, so we stay close to pen-and-paper math.
EXECUTION STATE
math.log = natural logarithm, base e
Callable = type alias for a function-valued argument
Function signature: log_diff(f, x, h)
This is the central object of the file: a numerical estimator of f'(x) that goes through ln f first. Inputs are the function f, the point x, and the step size h used for the central difference on ln f.
EXAMPLE
log_diff(x_to_the_x, 2.0) → 6.7726
EXECUTION STATE
f = Callable[[float], float] — the function we want to differentiate
x = the point where we want f'(x)
h = 1e-6 (step size for the symmetric difference)
Evaluate f(x) once
We call f(x) exactly once because (a) we need it later as a multiplier and (b) repeated calls would waste work. fx is the y-value of the original function at the probe point.
EXECUTION STATE
fx = f(x), e.g. 2^2 = 4.0 when x = 2
Guard: log-diff only works where f > 0
ln is only defined for positive arguments, so log-diff requires f(x) > 0 in a neighborhood. For functions like x^x with x > 0, this is always fine. For signed functions, you would first take the absolute value (the identity d/dx ln|f| = f'/f still works).
EXECUTION STATE
ValueError = raised if f(x) ≤ 0
Sample ln f at x + h
We push the input forward by h and immediately apply ln. ln f(x + h) is the value of the auxiliary, linearized function g(x) = ln f(x), evaluated slightly to the right of x.
EXECUTION STATE
g_plus = ln f(x + h)
Sample ln f at x − h
Same idea, but a step to the left. We need both sides for the symmetric (central) difference, which has O(h²) accuracy instead of the O(h) accuracy of a one-sided difference.
EXECUTION STATE
g_minus = ln f(x − h)
Central difference quotient
This estimates the derivative of g(x) = ln f(x) at the probe point. The (g_plus − g_minus) / (2h) form cancels the leading-order error term of the Taylor expansion, leaving an O(h²) approximation.
EXAMPLE
For f = x^x at x = 2:
g_prime ≈ (1.38629605 − 1.38629267) / 2e-6
g_prime ≈ 1.6931472
EXECUTION STATE
g_prime = (g_plus − g_minus) / (2h), e.g. 1.6931 at x = 2
Multiply back by f(x): the log-diff identity
Here is the whole point: d/dx ln f(x) = f'(x) / f(x), so f'(x) = f(x) · d/dx ln f(x). We just plug in the numerical estimate of d/dx ln f(x) and multiply by the original f(x). One identity does the entire job.
EXAMPLE
At x = 2:
f'(2) ≈ 4.0 * 1.6931 = 6.7726
(matches the analytic answer)
EXECUTION STATE
return value = f(x) · g_prime — our estimate of f'(x)
Concrete test function: x ↦ x^x
We pick the canonical "hard" function for log-diff. The basic power rule says d/dx x^n = n x^(n−1) but n is constant. The basic exponential rule says d/dx a^x = a^x ln a but a is constant. x^x violates *both*. Log-diff is the way out.
Analytical answer (so we can grade the numerics)
From the algebra ln(x^x) = x ln x → (ln f)' = ln x + 1 → f' = f · (ln x + 1) = x^x (ln x + 1). We include this so the test prints both the numerical estimate and the exact value side-by-side.
EXAMPLE
At x = 2:
exact = 4 · (ln 2 + 1) = 4 · 1.6931 = 6.7726
Driver loop: five sample points
We sweep x through 0.5, 1.0, 1.5, 2.0, 2.5 so the reader can see log-diff agree with the analytic formula at several points. All five lie in the increasing region (x > 1/e ≈ 0.368); a probe such as x = 0.2 would be needed to test the decreasing region (x < 1/e).
LOOP TRACE · 5 iterations
x = 0.5
f(x) = 0.5^0.5 ≈ 0.7071
numerical = 0.7071 · (ln 0.5 + 1) ≈ 0.2170
analytic = 0.2170
x = 1.0
f(x) = 1.0^1.0 = 1.0
numerical = 1.0 · (ln 1 + 1) = 1.0000
analytic = 1.0000
x = 1.5
f(x) = 1.5^1.5 ≈ 1.8371
numerical = 1.8371 · (ln 1.5 + 1) ≈ 2.5820
analytic = 2.5820
x = 2.0
f(x) = 2^2 = 4.0
numerical = 4.0 · (ln 2 + 1) ≈ 6.7726
analytic = 6.7726
x = 2.5
f(x) = 2.5^2.5 ≈ 9.8821
numerical = 9.8821 · (ln 2.5 + 1) ≈ 18.937
analytic = 18.937
Print row: readable diagnostic table
Each row shows x, f(x), the numerical log-diff estimate, the analytic exact value, and the absolute error. With h = 1e-6 the error column will be roughly 1e-10, somewhat larger where f(x) is larger. At this step size the O(h²) truncation error (about 1e-12) is negligible, so what remains is mostly floating-point round-off, of order machine epsilon divided by h.
    import math
    from typing import Callable

    def log_diff(f: Callable[[float], float], x: float, h: float = 1e-6) -> float:
        """
        Numerically compute f'(x) using the *logarithmic-differentiation identity*

            f'(x) = f(x) * d/dx [ ln f(x) ]

        instead of differencing f directly. For functions like f(x) = x^x this
        is dramatically more stable because ln(x^x) = x ln(x) is a smooth,
        well-behaved sum; the original f explodes super-exponentially.
        """
        fx = f(x)
        if fx <= 0:
            raise ValueError("log-diff requires f(x) > 0 in a neighborhood of x")

        # central difference on g(x) = ln f(x)
        g_plus = math.log(f(x + h))
        g_minus = math.log(f(x - h))
        g_prime = (g_plus - g_minus) / (2 * h)

        # multiply back by f(x) to recover f'(x)
        return fx * g_prime


    def x_to_the_x(x: float) -> float:
        return x ** x


    # Analytical answer derived in the section:
    # f(x) = x^x  =>  f'(x) = x^x * (ln x + 1)
    def x_to_the_x_prime_analytic(x: float) -> float:
        return (x ** x) * (math.log(x) + 1)


    if __name__ == "__main__":
        for x in [0.5, 1.0, 1.5, 2.0, 2.5]:
            numerical = log_diff(x_to_the_x, x)
            analytic = x_to_the_x_prime_analytic(x)
            print(f"x={x:>4} f(x)={x_to_the_x(x):10.4f} "
                  f"num={numerical:10.4f} exact={analytic:10.4f} "
                  f"err={abs(numerical - analytic):.2e}")
Run the script and you will see error-column entries of roughly 1e-10, growing somewhat where f(x) is larger. At h = 1e-6 the O(h²) truncation error of the central difference is far below that, so the residual is mostly floating-point round-off. Agreement at this level confirms that our hand-derived formula $f'(x) = x^x(\ln x + 1)$ is correct.
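How the error depends on the step size can be explored with a short sweep. This sketch reuses the same identity with a hypothetical two-argument helper and prints the absolute error at x = 2 for h from 1e-1 down to 1e-10. On a typical double-precision setup one would expect the error to fall like h² at first and then rise again once round-off (which scales like machine epsilon divided by h) takes over; exact values vary by platform.

```python
import math


def f(x: float) -> float:
    return x**x


def log_diff(x: float, h: float) -> float:
    """f'(x) estimated as f(x) times a central difference on ln f."""
    g_plus = math.log(f(x + h))
    g_minus = math.log(f(x - h))
    return f(x) * (g_plus - g_minus) / (2 * h)


x0 = 2.0
exact = f(x0) * (math.log(x0) + 1)

errors = {}  # keyed by k, where h = 10^(-k)
for k in range(1, 11):
    h = 10.0 ** (-k)
    errors[k] = abs(log_diff(x0, h) - exact)
    print(f"h = 1e-{k:02d}   error = {errors[k]:.2e}")
```

The smallest error usually sits somewhere in the middle of the table rather than at the smallest h, which is why a moderate default such as 1e-6 is a reasonable compromise.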
PyTorch: Autograd Confirms the Recipe
PyTorch will happily differentiate $x^x$ directly via torch.autograd.grad: its backward rule for pow accounts for both the base and the exponent depending on x, which yields the same formula we just derived. That gives us a cross-check: we will compute the answer two ways (through ln first, vs. directly) and confirm they agree.
log_diff_torch.py — autograd reproduces the log-diff identity
Import PyTorch
Just torch. Everything we need — tensors, autograd, log — lives in the top-level namespace.
The forward function f(x) = x^x
We define f as a normal PyTorch operation. Because x is a tensor with requires_grad=True (set in the driver loop), every operation we do here is recorded by autograd onto a computation graph.
The log-diff function: f_prime_via_logdiff(x)
This function is the PyTorch translation of the pen-and-paper recipe. Step 1: build g = ln f. Step 2: ask autograd for g'. Step 3: multiply by f(x). Three lines, exact same logic as the math.
g = torch.log(f(x))
This is the linearization step. Inside autograd's view, the computation graph is now: x → x^x → ln(x^x) = x ln x. We never explicitly write `x ln x`; PyTorch tracks the equivalent graph automatically.
EXECUTION STATE
g = tensor(1.3863, grad_fn=<LogBackward>) for x = 2
torch.autograd.grad(outputs=g, inputs=x)
This call backpropagates from g to x and returns dg/dx as a one-element tuple. It is the autograd equivalent of `g.backward()` followed by `x.grad`, but without mutating x's .grad attribute — important when you want to call grad() multiple times in the same script.
EXAMPLE
g_prime = tensor(1.6931) for x = 2
(matches ln(2) + 1 = 1.6931 to 4+ decimals)
outputs=g
We are differentiating g (the ln of f), not f itself. This is the entire trick — pushing the differentiation through ln converts an exponential tangle into a sum.
inputs=x
The variable with respect to which we want the gradient. x must have requires_grad=True or autograd refuses to track it.
create_graph=False
We do NOT need second-order derivatives here, so we keep create_graph off to avoid building a graph-over-the-graph. If we wanted f''(x), we would set create_graph=True and call grad() a second time.
return f(x) * g_prime
This is f(x) · d/dx ln f(x) = f'(x). We call f(x) again here because the earlier value was passed straight into torch.log without being bound to a name; recomputing it is cheap.
The direct cross-check: f_prime_direct(x)
We ask autograd to differentiate the original messy f directly. PyTorch's chain rule already handles x^x because internally Pow with two tensor arguments has a known gradient formula. We compare the two results to confirm log-diff matches.
Driver loop
Same five probe points as the plain-Python file. For each, we build a fresh tensor (autograd consumes graphs once unless retain_graph=True), call both methods, and print whether they agree to within floating-point tolerance.
LOOP TRACE · 5 iterations
x = 0.5
log-diff ≈ 0.21698
direct ≈ 0.21698
match = True
x = 1.0
log-diff = 1.000000
direct = 1.000000
match = True
x = 1.5
log-diff ≈ 2.58200
direct ≈ 2.58200
match = True
x = 2.0
log-diff = 6.772589
direct = 6.772589
match = True
x = 2.5
log-diff ≈ 18.9370
direct ≈ 18.9370
match = True
torch.allclose comparison
We use allclose instead of `==` because the two paths add floating-point operations in different orders. Up to ~1e-6 relative error is normal; allclose handles this gracefully.
    import torch

    # Define the function symbolically inside PyTorch
    def f(x: torch.Tensor) -> torch.Tensor:
        return x ** x  # x must be positive for this to be real-valued


    def f_prime_via_logdiff(x: torch.Tensor) -> torch.Tensor:
        """
        Implements the textbook log-diff trick using autograd:
            let g(x) = ln f(x)
            then f'(x) = f(x) * g'(x)
        """
        # 1. Apply ln to convert multiplication/exponentiation into addition
        g = torch.log(f(x))  # ln f(x) = ln(x^x) = x ln x

        # 2. Let autograd compute g'(x) for us
        (g_prime,) = torch.autograd.grad(
            outputs=g,
            inputs=x,
            create_graph=False,
        )

        # 3. Multiply back by f(x) to recover f'(x)
        return f(x) * g_prime


    def f_prime_direct(x: torch.Tensor) -> torch.Tensor:
        """Cross-check: ask autograd to differentiate the messy original directly."""
        y = f(x)
        (y_prime,) = torch.autograd.grad(outputs=y, inputs=x)
        return y_prime


    if __name__ == "__main__":
        torch.set_printoptions(precision=6)
        for xv in [0.5, 1.0, 1.5, 2.0, 2.5]:
            x = torch.tensor(xv, requires_grad=True)

            via_log = f_prime_via_logdiff(x)

            # autograd needs a fresh graph for the second call, so rebuild x
            x2 = torch.tensor(xv, requires_grad=True)
            direct = f_prime_direct(x2)

            print(f"x={xv} log-diff={via_log.item():.6f} "
                  f"direct={direct.item():.6f} "
                  f"match={torch.allclose(via_log.detach(), direct.detach())}")
What this tells you: autograd and log-diff arrive at the same derivative by closely related routes. For an expression of the form $a(x)^{b(x)}$, the backward rule for pow supplies the partial derivative with respect to the base, $b\,a^{b-1}$, and with respect to the exponent, $a^b\ln a$, and the chain rule adds them weighted by $a'(x)$ and $b'(x)$. That sum is exactly what you get by differentiating $\exp(b(x)\ln a(x))$, so when you use log-diff on paper you are reproducing by hand the result modern autodiff systems compute automatically.
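One way to see why a direct derivative and log-diff must agree is the multivariable chain rule on a^b, treating base and exponent as two separate inputs that both happen to equal x. The sketch below is plain Python, does not call PyTorch, and is not a copy of any library's internals; the helper name pow_partials is hypothetical.

```python
import math


def pow_partials(a: float, b: float) -> tuple:
    """Partial derivatives of a**b with base and exponent treated as independent."""
    d_base = b * a ** (b - 1)           # power rule, exponent held fixed
    d_exponent = a**b * math.log(a)     # exponential rule, base held fixed
    return d_base, d_exponent


x = 2.0
d_base, d_exponent = pow_partials(x, x)

# Both slots depend on x with derivative 1, so the chain rule simply adds the partials.
total = d_base + d_exponent
log_diff_value = x**x * (math.log(x) + 1)

print(f"power-rule piece       : {d_base:.6f}")
print(f"exponential-rule piece : {d_exponent:.6f}")
print(f"sum of the two         : {total:.6f}")
print(f"log-diff formula       : {log_diff_value:.6f}")
```

Each piece on its own is one of the wrong answers from Pitfall 3; their sum is the correct derivative, which is where the extra +1 comes from.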
Why ML Engineers Care
In statistics and machine learning we constantly minimize negative log-likelihoods. Given i.i.d. data, the likelihood is a product of densities:
$$L(\theta) = \prod_{i=1}^{N} p(x_i \mid \theta)$$
Differentiating that product directly would be a nightmare for even modest N. But taking ln gives the log-likelihood:
$$\ell(\theta) = \ln L(\theta) = \sum_{i=1}^{N} \ln p(x_i \mid \theta)$$
Now the derivative is a sum of N clean terms — exactly what gradient descent needs. This is the log-diff trick applied at industrial scale.
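A toy example shows both benefits of the log form at once: numerical range and a simple gradient. The sketch assumes a unit-variance Gaussian model and uses hypothetical, evenly spaced data; with 2000 points the raw likelihood underflows in double precision, while the log-likelihood and its gradient with respect to the mean stay well-behaved.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float = 1.0) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Hypothetical toy data: 2000 evenly spaced points on [-1, 1].
N = 2000
data = [-1.0 + 2.0 * i / (N - 1) for i in range(N)]


def likelihood(mu: float) -> float:
    return math.prod(normal_pdf(x, mu) for x in data)


def log_likelihood(mu: float) -> float:
    return sum(math.log(normal_pdf(x, mu)) for x in data)


def grad_log_likelihood(mu: float) -> float:
    """d/dmu of the log-likelihood with sigma = 1: a plain sum of (x_i - mu)."""
    return sum(x - mu for x in data)


mu0, h = 0.3, 1e-4
numeric_grad = (log_likelihood(mu0 + h) - log_likelihood(mu0 - h)) / (2 * h)

print(f"likelihood      = {likelihood(mu0)}")
print(f"log-likelihood  = {log_likelihood(mu0):.4f}")
print(f"analytic grad   = {grad_log_likelihood(mu0):.6f}")
print(f"numeric grad    = {numeric_grad:.6f}")
```

The product is too small to represent as a float, so it cannot be differentiated numerically at all, whereas the sum of logs gives a gradient that is just the sum of residuals.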
| Setting | Product form (don't differentiate) | Log form (do differentiate) |
| --- | --- | --- |
| MLE for Gaussian mean | Π exp(−(xᵢ−μ)² / 2σ²) | −½σ⁻² Σ (xᵢ−μ)² |
| Softmax cross-entropy | −ln(Π probs_correctᵢ) | −Σ ln(probs_correctᵢ) |
| Variational free energy (ELBO) | log-ratio of densities | expectation of differences of logs |
| Diffusion model loss | score = ∇ log p_t | directly the logarithmic derivative |
Every time you write F.cross_entropy(logits, targets) in PyTorch, the framework is doing log-diff for you — operating on log-probabilities for numerical stability and so that the gradient is a clean sum.
Summary
Log-diff is for three shapes: variable-base-variable-exponent (e.g. $x^x$), messy products and quotients with many factors, and stacked exponents (towers).
The recipe is 4 steps: (1) take ln of both sides, (2) differentiate (the left side gets $\frac{1}{y}\cdot\frac{dy}{dx}$ from the chain rule), (3) simplify, (4) multiply by y and substitute the original formula back.
The master identity is $f'(x) = f(x)\cdot\frac{d}{dx}[\ln f(x)]$. Everything in this section flows from it.
Critical insight: $f'/f$ is the logarithmic derivative, a measure of relative rate of change. It is the natural quantity for finance, biology, and ML loss surfaces.
Verification is cheap. Plain Python with math.log and a central difference, or PyTorch autograd.grad on a log-then-differentiate computation, will reproduce any log-diff result you derive by hand.
Connection to ML. Every log-likelihood you ever differentiate is an industrial-scale application of the exact same trick. Products of densities become sums of log densities, and those are what optimizers need.