Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section you will be able to:

Recognize the three function shapes that demand logarithmic differentiation — variable-base-variable-exponent (like $x^x$ ), messy products and quotients, and towers of exponents.
Apply the 4-step recipe: take $\ln$ of both sides, differentiate implicitly, simplify, multiply back by $y$ .
Derive the master formula $f'(x) = f(x)\,\dfrac{d}{dx}\bigl[\ln f(x)\bigr]$ from $(\ln y)' = y'/y$ .
Use log-diff to find critical points of functions the basic power and exponential rules cannot touch.
Verify any log-diff answer in plain Python with a central difference and in PyTorch with autograd.
Connect the trick to log-likelihoods in machine learning — the reason every ML paper differentiates sums of logs and never products of probabilities.

Why We Need a New Trick

So far in this chapter you have learned three derivative machines, and each one solves exactly one job:

Function shape	Rule that works	Example
Power: x^n (n is a fixed number)	Power rule	d/dx(x^3) = 3x^2
Exponential: a^x (a is a fixed number)	Exponential rule	d/dx(2^x) = 2^x · ln 2
Logarithm: log_a(x)	Log rule	d/dx(ln x) = 1/x

Now look at $f(x) = x^x$ . The base is $x$ (changing). The exponent is $x$ (also changing). The power rule wants the exponent to be constant. The exponential rule wants the base to be constant. Both refuse.

The pain point: tools that assume one slot is fixed cannot differentiate a function where both slots move at once.

Logarithmic differentiation is the universal escape hatch. The recipe is short, the algebra is mostly bookkeeping, and once you see it you will recognize whole new families of functions you can now differentiate by hand.

Big idea in one sentence: taking the natural log of both sides turns multiplication into addition, exponentiation into multiplication, and a stack of operators into a simple sum of friendly terms. Once the equation is a sum, every basic rule applies.

Intuition: ln() Linearizes Multiplication

Imagine you are an accountant and someone hands you the equation

y \;=\; x^2 \cdot (x+1)^3 \cdot e^{x}

Differentiating this directly is a triple-nested product rule and chain rule. You will fill half a page with parentheses and you will probably drop a sign somewhere. Painful.

But the same equation, viewed through $\ln$ , becomes

\ln y \;=\; 2\ln x \;+\; 3\ln(x+1) \;+\; x

That is a sum. Three simple terms. The right side is now a job for the sum rule plus three one-line applications of the log derivative.
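The claim above is easy to check numerically. The following is a minimal sketch in plain Python; the helper names (`y`, `log_y_as_sum`, `dy_dx`) and the probe point `x0 = 1.5` are illustrative choices, not part of the original text. It confirms that the log of the product equals the three-term sum, and that multiplying the derivative of that sum back by $y$ reproduces the slope of the original product.

```python
import math

def y(x: float) -> float:
    """The 'accountant' product: x^2 * (x+1)^3 * e^x (for x > 0)."""
    return x**2 * (x + 1)**3 * math.exp(x)

def log_y_as_sum(x: float) -> float:
    """The same function seen through ln: a plain sum of three terms."""
    return 2 * math.log(x) + 3 * math.log(x + 1) + x

def dy_dx(x: float) -> float:
    """Differentiate the sum term by term, then multiply back by y."""
    return y(x) * (2 / x + 3 / (x + 1) + 1)

x0 = 1.5   # arbitrary probe point with x0 > 0
h = 1e-5
central = (y(x0 + h) - y(x0 - h)) / (2 * h)

print(math.log(y(x0)), log_y_as_sum(x0))   # these two should agree
print(dy_dx(x0), central)                  # and so should these
```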

Analogy. Think of $\ln$ as a decoder ring that turns multiplicative structure into additive structure. We use it any time the original function is glued together by multiplication or exponentiation and we wish it were glued together by addition instead.

This is the same reason ML engineers always work with log-likelihoods instead of likelihoods: probabilities multiply, log-probabilities add, and additive things are easier to optimize, store, and differentiate.

The 4-Step Recipe

Given an equation $y = f(x)$ where $f$ is a tangle of products, quotients, or variable-exponent towers (and $f(x) > 0$ near the point of interest):

Take ln of both sides. The left becomes $\ln y$ . Use the log laws to break the right side into a sum: $\ln(ab) = \ln a + \ln b$ , $\ln(a/b) = \ln a - \ln b$ , $\ln(a^c) = c\ln a$ .
Differentiate both sides with respect to x. The left side gives $\dfrac{1}{y}\cdot\dfrac{dy}{dx}$ by the chain rule (because $y$ is itself a function of $x$ ). The right side is a sum, so differentiate term by term.
Simplify the right side. Plain algebra. No calculus left.
Multiply both sides by y to solve for $dy/dx$ , then substitute the original formula for $y$ so the final answer is in pure $x$ .

Step 2 is where almost every beginner mistake lives. The left side is not $d/dx[\ln y] = 1/y$. That would be true if $y$ were the variable. But $y$ is a function of $x$, so the chain rule fires and produces an extra $dy/dx$.
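To make the Step 2 warning concrete, here is a small numerical sketch (plain Python, with $y = x^x$ and the probe point $x = 2$ chosen purely for illustration). It measures $\frac{d}{dx}\ln y$ directly and compares it with the tempting $1/y$ and with the chain-rule value $\frac{1}{y}\frac{dy}{dx}$, where $dy/dx$ is itself estimated by a central difference.

```python
import math

def y(x: float) -> float:
    return x ** x  # real-valued for x > 0

def ln_y(x: float) -> float:
    return math.log(y(x))

x0, h = 2.0, 1e-6

# Left side of Step 2, measured numerically: d/dx [ln y(x)]
lhs = (ln_y(x0 + h) - ln_y(x0 - h)) / (2 * h)

# The tempting but wrong value: 1/y, with the chain-rule factor dropped
wrong = 1 / y(x0)

# The chain-rule value: (1/y) * dy/dx, with dy/dx from a central difference
dy_dx = (y(x0 + h) - y(x0 - h)) / (2 * h)
right = dy_dx / y(x0)

print(f"d/dx ln y     = {lhs:.6f}")
print(f"1/y           = {wrong:.6f}")
print(f"(1/y) * dy/dx = {right:.6f}")
```

Only the last line matches the first; the gap between the first two is exactly the missing $dy/dx$ factor.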

Interactive Walkthrough on x^x

Step through the recipe on $y = x^x$ below. Watch how line 1 contains an exponent and line 4 contains only sums and products.


The Master Formula

Strip away the names and the entire procedure is one identity. Starting from $g(x) = \ln f(x)$ and applying the chain rule,

g'(x) \;=\; \frac{d}{dx}\,\ln f(x) \;=\; \frac{f'(x)}{f(x)}

Solving for $f'(x)$ :

\boxed{\,f'(x) \;=\; f(x)\,\cdot\,\dfrac{d}{dx}\bigl[\ln f(x)\bigr]\,}

Read it out loud: “the derivative of f is f times the derivative of its log.” Every example in the rest of this section is a deliberate exercise of this one identity.

The expression $\dfrac{f'(x)}{f(x)}$ is called the logarithmic derivative of $f$. It measures relative rate of change: what fraction of itself $f$ changes by per unit $x$. In finance this is the instantaneous return; in biology, the instantaneous growth rate; in ML, the gradient of a log-likelihood.
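As a small illustration of "relative rate of change", the sketch below estimates $f'/f$ numerically for a hypothetical account balance growing continuously at 5% per year. The function `balance`, its starting value, and the rate are made-up assumptions for the example. The logarithmic derivative should come out as the same constant at every time, even though the balance itself keeps growing.

```python
import math

def log_derivative(f, x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of (ln f)'(x) = f'(x)/f(x); needs f > 0."""
    return (math.log(f(x + h)) - math.log(f(x - h))) / (2 * h)

def balance(t: float) -> float:
    """Hypothetical account: 100 units growing continuously at 5% per year."""
    return 100.0 * math.exp(0.05 * t)

rates = [log_derivative(balance, t) for t in (0.0, 10.0, 50.0)]
print(rates)  # each entry should be close to 0.05
```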

Worked Example: $f(x) = x^x$ at $x = 2$

We will walk the full 4-step recipe and end with a single number. Try the steps on paper before opening the answer.


Step 1 — Take ln of both sides.

\ln y \;=\; \ln\!\bigl(x^x\bigr) \;=\; x\,\ln x

The log law $\ln(a^b) = b\ln a$ just brought the exponent down out of the tower.

Step 2 — Differentiate both sides. On the left, chain rule:

\frac{1}{y}\,\frac{dy}{dx} \;=\; \frac{d}{dx}\bigl[x\,\ln x\bigr]

On the right, product rule with $u = x,\; v = \ln x$:

(uv)' \;=\; u'v + uv' \;=\; (1)(\ln x) + (x)\!\left(\tfrac{1}{x}\right) \;=\; \ln x + 1

Step 3 — Combine.

\frac{1}{y}\,\frac{dy}{dx} \;=\; \ln x + 1

Pure algebra from here.

Step 4 — Solve for dy/dx. Multiply both sides by $y$ and substitute $y = x^x$:

\frac{dy}{dx} \;=\; y\,(\ln x + 1) \;=\; x^{x}\,(\ln x + 1)

Plug in x = 2.

f'(2) \;=\; 2^{2}\,(\ln 2 + 1) \;=\; 4\,(0.6931 + 1) \;=\; 4 \cdot 1.6931 \;\approx\; 6.7726

Sanity check by tiny step. A forward difference with $h = 0.001$:
f(2.001) − f(2) ≈ 4.006779 − 4.000000 = 0.006779
divide by h: 0.006779 / 0.001 ≈ 6.779
This is within about $0.007$ of our analytic $6.7726$, which is the size of $O(h)$ truncation error we expect from a first-order forward difference.
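The sanity check can be reproduced in a few lines. This is a minimal sketch in plain Python; the central difference is added here only for comparison and is not part of the hand calculation above.

```python
import math

def f(x: float) -> float:
    return x ** x

# Analytic value from the worked example: x^x (ln x + 1) at x = 2
exact = 2.0 ** 2.0 * (math.log(2.0) + 1.0)

h = 1e-3
forward = (f(2.0 + h) - f(2.0)) / h              # first-order, error ~ h
central = (f(2.0 + h) - f(2.0 - h)) / (2 * h)    # second-order, error ~ h^2

print(f"exact   = {exact:.6f}")
print(f"forward = {forward:.6f}  error = {abs(forward - exact):.2e}")
print(f"central = {central:.6f}  error = {abs(central - exact):.2e}")
```

With the same step size, the central difference lands much closer to the analytic value than the forward difference does, which is why the verification scripts later in this section use the symmetric form.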

Interactive Log-Diff Explorer (4 Functions)

Pick any of the four notoriously annoying functions below. Drag the probe slider to read off $f(x)$ , $\ln f(x)$ , and $f'(x)$ simultaneously. Notice how dashed purple (the log) is always tame and smooth even when cyan (the original) explodes.


What to look for. For $x^x$, slide x near $0.37$. That is $1/e$, and the orange $f'(x)$ curve crosses zero exactly there. For $x^{1/x}$, slide x near $e \approx 2.718$ and the same thing happens. Both critical points fall straight out of the log-diff identity; the power rule and the exponential rule alone cannot produce them.

Payoff: Finding the Minimum of $x^x$

Setting $f'(x) = x^x(\ln x + 1) = 0$ and using the fact that $x^x > 0$ always, the only way the product can be zero is $\ln x + 1 = 0$ , i.e. $x = e^{-1} = 1/e \approx 0.3679$ . At that point,

f(1/e) \;=\; \left(\tfrac{1}{e}\right)^{1/e} \;\approx\; 0.6922

Drag the slider below until the slope readout turns green at zero. That is the global minimum of $x^x$ on $(0, \infty)$ .


Why this is impressive. Neither the power rule nor the exponential rule applies to $x^x$, so without log-diff you would have no way to differentiate it, let alone find its critical point. The minimum at $x = 1/e$ is a result that logarithmic differentiation puts within reach at this level of math.
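The critical point can also be located numerically without any slider. The sketch below uses plain bisection on $\ln x + 1$, which has the same sign as $f'(x)$ because $x^x > 0$. The helper `bisect_root` and the bracket $[0.1, 1.0]$ are assumptions made for this example.

```python
import math

def log_derivative(x: float) -> float:
    """(ln f)'(x) for f(x) = x^x, i.e. ln x + 1; same sign as f'(x)."""
    return math.log(x) + 1.0

def bisect_root(g, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Plain bisection; assumes g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)

x_star = bisect_root(log_derivative, 0.1, 1.0)
print(x_star, 1 / math.e)     # critical point vs. 1/e
print(x_star ** x_star)       # value of x^x at the minimum
```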

Pattern 2: Messy Products and Quotients

Even when the exponents are constant, log-diff is a labor-saver whenever you have many factors. Consider

y \;=\; \dfrac{(x+1)^2\,\sqrt{x-1}}{(x+3)^4}

Direct differentiation requires the quotient rule wrapping a product rule wrapping the chain rule. Through $\ln$ the same thing becomes

\ln y \;=\; 2\ln(x+1) \;+\; \tfrac{1}{2}\ln(x-1) \;-\; 4\ln(x+3)

Differentiate each term in isolation:

\frac{1}{y}\frac{dy}{dx} \;=\; \frac{2}{x+1} \;+\; \frac{1}{2(x-1)} \;-\; \frac{4}{x+3}

Multiply back by y and we are done.

Pattern. Whenever a function has n factors connected by $\times$ and $\div$, log-diff converts its derivative into a single sum of n simple rational terms. That is a linear-cost-in-n procedure where the direct method is closer to quadratic.
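Here is a short numerical check of the Pattern 2 example, as a sketch in plain Python. The function names and the probe points are illustrative; note that the square root restricts the domain to $x > 1$.

```python
import math

def y(x: float) -> float:
    """(x+1)^2 * sqrt(x-1) / (x+3)^4, defined for x > 1."""
    return (x + 1) ** 2 * math.sqrt(x - 1) / (x + 3) ** 4

def dy_dx_logdiff(x: float) -> float:
    """y' = y * [ 2/(x+1) + 1/(2(x-1)) - 4/(x+3) ], from the log-diff sum."""
    return y(x) * (2 / (x + 1) + 1 / (2 * (x - 1)) - 4 / (x + 3))

def dy_dx_numeric(x: float, h: float = 1e-6) -> float:
    """Central difference on y itself, used only as an independent check."""
    return (y(x + h) - y(x - h)) / (2 * h)

for x in (1.5, 2.0, 5.0):
    print(x, dy_dx_logdiff(x), dy_dx_numeric(x))
```

At $x = 2$ the hand calculation is $y = 9/625 = 0.0144$ and $\frac{2}{3} + \frac{1}{2} - \frac{4}{5} = \frac{11}{30}$, so $y'(2) = 0.00528$.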

Pattern 3: Variable Base, Variable Exponent

This is the family where neither the power rule nor the exponential rule applies, so log-diff is the natural tool. Examples:

Function	Log-form	Derivative
y = x^x	ln y = x ln x	y' = x^x (ln x + 1)
y = x^{sin x}	ln y = sin(x) · ln x	y' = x^{sin x} ( cos x · ln x + sin x / x )
y = (sin x)^x (with sin x > 0)	ln y = x · ln(sin x)	y' = (sin x)^x ( ln(sin x) + x·cot x )
y = x^{1/x}	ln y = (1/x) · ln x	y' = x^{1/x} · (1 − ln x) / x^2

Every row above came out of the same 4-step recipe. Once the log step linearizes things, you are doing standard sum-rule and product-rule moves — no special memorization required.
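Each row of the table can be spot-checked against a central difference. The sketch below does this in plain Python at a single probe point; `x0 = 1.3` is an arbitrary choice that keeps both $x$ and $\sin x$ positive.

```python
import math

# (name, f, hand-derived f') for each row of the table. All rows need x > 0,
# and the (sin x)^x row also needs sin x > 0.
rows = [
    ("x^x",       lambda x: x ** x,
                  lambda x: x ** x * (math.log(x) + 1)),
    ("x^sin(x)",  lambda x: x ** math.sin(x),
                  lambda x: x ** math.sin(x)
                            * (math.cos(x) * math.log(x) + math.sin(x) / x)),
    ("(sin x)^x", lambda x: math.sin(x) ** x,
                  lambda x: math.sin(x) ** x
                            * (math.log(math.sin(x)) + x / math.tan(x))),
    ("x^(1/x)",   lambda x: x ** (1 / x),
                  lambda x: x ** (1 / x) * (1 - math.log(x)) / x ** 2),
]

def central_diff(f, x: float, h: float = 1e-6) -> float:
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 1.3  # probe point with x0 > 0 and sin(x0) > 0
errors = {name: abs(df(x0) - central_diff(f, x0)) for name, f, df in rows}
for name, err in errors.items():
    print(f"{name:10s} |hand - numeric| = {err:.2e}")
```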

Common Pitfalls (Read This!)

Pitfall 1: forgetting the chain rule on the left side. After taking $\ln$ of both sides, the left is $\ln y$, and the derivative with respect to $x$ is $(1/y) \cdot (dy/dx)$, not just $1/y$. Skipping the $dy/dx$ is the #1 mistake.

Pitfall 2: applying it where $f$ is not strictly positive. $\ln$ is undefined on $(-\infty, 0]$. If $f$ changes sign, work with $\ln|f(x)|$ (the identity $(\ln|f|)' = f'/f$ still holds) and be extra careful at the zeros of $f$ themselves.

Pitfall 3: confusing $x^x$ with $x^n$ or $a^x$. $\dfrac{d}{dx}\,x^x \neq x\,x^{x-1}$ (that would be the power rule, treating the exponent as constant) and $\neq x^x \ln x$ (that would be the exponential rule, treating the base as constant). The correct derivative is $x^x(\ln x + 1)$, and the extra $+1$ exists because the base is changing too. In fact the correct answer is the sum of the two wrong ones, since $x\,x^{x-1} = x^x$.

Pitfall 4: forgetting to substitute $y$ back at the end. Your answer should be in pure $x$, not in $y$. Step 4 is finished only after you replace $y$ with the original formula.
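Pitfall 2 is worth seeing in action. The sketch below applies the $\ln|f|$ version of the identity to a hypothetical cubic, $f(x) = (x-1)(x-2)(x-3)$, which changes sign at each root. The example function is an assumption chosen so that the expanded derivative is easy to write down for comparison.

```python
ROOTS = (1.0, 2.0, 3.0)

def f(x: float) -> float:
    """f(x) = (x-1)(x-2)(x-3): changes sign at each root."""
    return (x - 1.0) * (x - 2.0) * (x - 3.0)

def f_prime_logdiff(x: float) -> float:
    """From ln|f| = sum ln|x - r|: f'/f = sum 1/(x - r), valid away from the roots."""
    return f(x) * sum(1.0 / (x - r) for r in ROOTS)

def f_prime_expanded(x: float) -> float:
    """Derivative of the expanded cubic x^3 - 6x^2 + 11x - 6."""
    return 3 * x**2 - 12 * x + 11

for x in (1.5, 2.5, 4.0):   # f is positive, negative, positive at these points
    print(x, f(x), f_prime_logdiff(x), f_prime_expanded(x))
```

The two derivative columns agree even where $f$ is negative. At the roots themselves the sum $\sum 1/(x - r)$ divides by zero, which is the "be extra careful" case from the pitfall.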

Python: Symbolic + Numerical Verification

Plain Python first. We will implement the log-diff identity $f'(x) = f(x)\cdot(\ln f)'(x)$ as a numerical estimator, then compare it against the analytic answer we derived by hand. If our algebra was right, the two columns should agree to ~10 decimals.

log_diff.py — pen-and-paper log-diff, in numerical form


Line 1: Imports (just the standard library)

We only need math.log and a type hint for the callable. No NumPy. The whole point of this section is to make logarithmic differentiation feel mechanical, so we stay close to pen-and-paper math.

EXECUTION STATE
math.log = natural logarithm, base e
Callable = type hint for a function-valued argument

Line 4: Function signature log_diff(f, x, h)

This is the central object of the file: a numerical estimator of f'(x) that goes through ln f first. Inputs are the function f, the point x, and the step size h used for the central difference on ln f.

EXAMPLE
log_diff(x_to_the_x, 2.0) → 6.7726

EXECUTION STATE
f = Callable[[float], float], the function we want to differentiate
x = the point where we want f'(x)
h = 1e-6 (step size for the symmetric difference)

Line 14: Evaluate f(x) once

We call f(x) exactly once because (a) we need it later as a multiplier and (b) repeated calls would waste work. fx is the y-value of the original function at the probe point.

EXECUTION STATE
fx = f(x), e.g. 2^2 = 4.0 when x = 2

Line 15: Guard (log-diff only works where f > 0)

ln is only defined for positive arguments, so log-diff requires f(x) > 0 in a neighborhood. For functions like x^x with x > 0, this is always fine. For signed functions, you would first take the absolute value (the identity d/dx ln|f| = f'/f still works).

EXECUTION STATE
ValueError = raised if f(x) ≤ 0

Line 19: Sample ln f at x + h

We push the input forward by h and immediately apply ln. ln f(x + h) is the value of the auxiliary, linearized function g(x) = ln f(x), evaluated slightly to the right of x.

EXAMPLE
f(2 + 1e-6) = 2.000001^2.000001 ≈ 4.0000068
ln(...) ≈ 1.3862960543

EXECUTION STATE
g_plus = ln f(x + h), e.g. 1.3863 for x = 2

Line 20: Sample ln f at x − h

Same idea, but a step to the left. We need both sides for the symmetric (central) difference, which has O(h²) accuracy instead of the O(h) accuracy of a one-sided difference.

EXECUTION STATE
g_minus = ln f(x − h)

Line 21: Central difference quotient

This estimates the derivative of g(x) = ln f(x) at the probe point. The (g_plus − g_minus) / (2h) form cancels the leading-order error term of the Taylor expansion, leaving an O(h²) approximation.

EXAMPLE
For f = x^x at x = 2:
  g_prime ≈ (1.3862960543 − 1.3862926680) / 2e-6
  g_prime ≈ 1.6931472

EXECUTION STATE
g_prime = (g_plus − g_minus) / (2h), e.g. 1.6931 at x = 2

Line 24: Multiply back by f(x) (the log-diff identity)

Here is the whole point: d/dx ln f(x) = f'(x) / f(x), so f'(x) = f(x) · d/dx ln f(x). We just plug in the numerical estimate of d/dx ln f(x) and multiply by the original f(x). One identity does the entire job.

EXAMPLE
At x = 2:
  f'(2) ≈ 4.0 * 1.6931 = 6.7726
  (matches the analytic answer)

EXECUTION STATE
return value = f(x) · g_prime, our estimate of f'(x)

Line 27: Concrete test function x ↦ x^x

We pick the canonical "hard" function for log-diff. The basic power rule says d/dx x^n = n x^(n−1) but n is constant. The basic exponential rule says d/dx a^x = a^x ln a but a is constant. x^x violates *both*. Log-diff is the way out.

Line 33: Analytical answer (so we can grade the numerics)

From the algebra ln(x^x) = x ln x → (ln f)' = ln x + 1 → f' = f · (ln x + 1) = x^x (ln x + 1). We include this so the test prints both the numerical estimate and the exact value side-by-side.

EXAMPLE
At x = 2:
  exact = 4 · (ln 2 + 1) = 4 · 1.6931 = 6.7726

Line 38: Driver loop over five sample points

We sweep x through 0.5, 1.0, 1.5, 2.0, 2.5 so the reader can see log-diff agree with the analytic formula at several points. All five lie in the increasing region (x > 1/e ≈ 0.368); add a point such as x = 0.2 to probe the decreasing region as well.

LOOP TRACE · 5 iterations

x = 0.5
f(x) = 0.5^0.5 ≈ 0.7071
numerical = 0.7071 · (ln 0.5 + 1) ≈ 0.2170
analytic = 0.2170

x = 1.0
f(x) = 1.0^1.0 = 1.0
numerical = 1.0 · (ln 1 + 1) = 1.0000
analytic = 1.0000

x = 1.5
f(x) = 1.5^1.5 ≈ 1.8371
numerical = 1.8371 · (ln 1.5 + 1) ≈ 2.5820
analytic = 2.5820

x = 2.0
f(x) = 2^2 = 4.0
numerical = 4.0 · (ln 2 + 1) ≈ 6.7726
analytic = 6.7726

x = 2.5
f(x) = 2.5^2.5 ≈ 9.8821
numerical = 9.8821 · (ln 2.5 + 1) ≈ 18.937
analytic = 18.937

Line 41: Print row (readable diagnostic table)

Each row shows x, f(x), the numerical log-diff estimate, the analytic exact value, and the absolute error. With h = 1e-6 the error column should come out around 1e-10 or smaller. At this step size the O(h²) truncation error is far below that, so what you see is mostly floating-point round-off, which scales like machine epsilon divided by h.


```python
import math
from typing import Callable

def log_diff(f: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """
    Numerically compute f'(x) using the *logarithmic-differentiation identity*

        f'(x) = f(x) * d/dx [ ln f(x) ]

    instead of differencing f directly. For functions like f(x) = x^x the
    log ln(x^x) = x ln(x) is a smooth, slowly growing sum, while the
    original f grows super-exponentially.
    """
    fx = f(x)
    if fx <= 0:
        raise ValueError("log-diff requires f(x) > 0 in a neighborhood of x")

    # central difference on g(x) = ln f(x)
    g_plus  = math.log(f(x + h))
    g_minus = math.log(f(x - h))
    g_prime = (g_plus - g_minus) / (2 * h)

    # multiply back by f(x) to recover f'(x)
    return fx * g_prime


def x_to_the_x(x: float) -> float:
    return x ** x


# Analytical answer derived in the section:
# f(x) = x^x  =>  f'(x) = x^x * (ln x + 1)
def x_to_the_x_prime_analytic(x: float) -> float:
    return (x ** x) * (math.log(x) + 1)


if __name__ == "__main__":
    for x in [0.5, 1.0, 1.5, 2.0, 2.5]:
        numerical = log_diff(x_to_the_x, x)
        analytic  = x_to_the_x_prime_analytic(x)
        print(f"x={x:>4}  f(x)={x_to_the_x(x):10.4f}  "
              f"num={numerical:10.4f}  exact={analytic:10.4f}  "
              f"err={abs(numerical - analytic):.2e}")
```

Run the script and you should see error column entries of roughly 1e-10 or smaller in every row. At h = 1e-6 the O(h²) truncation error of the central difference is negligible, so the residual is mostly floating-point round-off. Agreement at that level confirms that our hand-derived formula $f'(x) = x^x(\ln x + 1)$ is correct.
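To see how the error depends on the step size, the sketch below repeats the estimate at $x = 2$ for several values of $h$. It is a stripped-down variant of the script above (the guard is omitted and `h` is a required argument); the particular list of step sizes is an arbitrary choice.

```python
import math

def log_diff(f, x: float, h: float) -> float:
    """f'(x) estimated as f(x) times a central difference of ln f."""
    g_prime = (math.log(f(x + h)) - math.log(f(x - h))) / (2 * h)
    return f(x) * g_prime

def f(x: float) -> float:
    return x ** x

x0 = 2.0
exact = f(x0) * (math.log(x0) + 1)

errors = {}
for h in (1e-1, 1e-2, 1e-3, 1e-6, 1e-10):
    errors[h] = abs(log_diff(f, x0, h) - exact)
    print(f"h={h:.0e}  error={errors[h]:.2e}")
```

For the larger steps the error should drop by roughly a factor of 100 each time $h$ shrinks by 10, which is the $O(h^2)$ behaviour. For very small $h$ the subtraction of two nearly equal logs loses digits, so the error stops improving and eventually grows again.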

PyTorch: Autograd Confirms the Recipe

PyTorch will happily differentiate $x^x$ directly via $\texttt{torch.autograd.grad}$: the backward rule for a power with tensor base and tensor exponent yields a result mathematically equivalent to the formula we just derived. That gives us a cross-check: we will compute the answer two ways (through $\ln$ first, vs. directly) and confirm they agree.

log_diff_torch.py — autograd reproduces the log-diff identity


Line 1: Import PyTorch

Just torch. Everything we need (tensors, autograd, log) lives in the top-level namespace.

Line 4: The forward function f(x) = x^x

We define f as a normal PyTorch operation. Because x is a tensor with requires_grad=True (set in the driver loop), every operation we do here is recorded by autograd onto a computation graph.

EXAMPLE
f(tensor(2.0, requires_grad=True))
  → tensor(4.0, grad_fn=<PowBackward>)

Line 8: Log-diff path, the derivative *via* ln f(x)

This function is the PyTorch translation of the pen-and-paper recipe. Step 1: build g = ln f. Step 2: ask autograd for g'. Step 3: multiply by f(x). Three lines, exact same logic as the math.

Line 15: g = torch.log(f(x))

This is the linearization step. Inside autograd's view, the computation graph is now: x → x^x → ln(x^x), which equals x ln x. We never explicitly write `x ln x`; PyTorch differentiates the graph it recorded, and the result is the same.

EXECUTION STATE
g = tensor(1.3863, grad_fn=<LogBackward>) for x = 2

Line 18: torch.autograd.grad(outputs=g, inputs=x)

This call backpropagates from g to x and returns dg/dx as a one-element tuple. It is the autograd equivalent of `g.backward()` followed by `x.grad`, but without mutating x's .grad attribute, which matters when you want to call grad() multiple times in the same script.

EXAMPLE
g_prime = tensor(1.6931) for x = 2
  (matches ln(2) + 1 = 1.6931 to 4+ decimals)

Line 19: outputs=g

We are differentiating g (the ln of f), not f itself. This is the entire trick: pushing the differentiation through ln converts an exponential tangle into a sum.

Line 20: inputs=x

The variable with respect to which we want the gradient. x must have requires_grad=True or autograd refuses to track it.

Line 21: create_graph=False

We do NOT need second-order derivatives here, so we keep create_graph off to avoid building a graph-over-the-graph. If we wanted f''(x), we would set create_graph=True and call grad() a second time.

Line 25: return f(x) * g_prime

This is f(x) · d/dx ln f(x) = f'(x). We recompute f(x) here because the earlier value was passed straight into torch.log and never stored in a variable; recomputing is cheap.

EXAMPLE
x = 2:
  f(x) = 4.0
  g_prime = 1.6931
  f'(x) = 4.0 · 1.6931 = 6.7726

Line 28: f_prime_direct (sanity check)

We ask autograd to differentiate the original messy f directly. PyTorch's chain rule already handles x^x because Pow with two tensor arguments has a known gradient formula. We compare the two results to confirm log-diff matches.

Line 35: Driver loop

Same five probe points as the plain-Python file. For each, we build a fresh tensor for each method, call both methods, and print whether they agree to within floating-point tolerance. The second fresh tensor is not strictly required, since each call to f builds a new graph, but it keeps the two paths fully independent.

LOOP TRACE · 5 iterations (values from the analytic formula; float32 output may differ in the last digit or two)

x = 0.5
log-diff ≈ 0.216978
direct ≈ 0.216978
match = True

x = 1.0
log-diff ≈ 1.000000
direct ≈ 1.000000
match = True

x = 1.5
log-diff ≈ 2.582004
direct ≈ 2.582004
match = True

x = 2.0
log-diff ≈ 6.772589
direct ≈ 6.772589
match = True

x = 2.5
log-diff ≈ 18.937010
direct ≈ 18.937010
match = True

Line 48: torch.allclose comparison

We use allclose instead of `==` because the two paths add floating-point operations in different orders. Up to ~1e-6 relative error is normal; allclose handles this gracefully.


```python
import torch

# Define the function symbolically inside PyTorch
def f(x: torch.Tensor) -> torch.Tensor:
    return x ** x  # x must be positive for this to be real-valued


def f_prime_via_logdiff(x: torch.Tensor) -> torch.Tensor:
    """
    Implements the textbook log-diff trick using autograd:
        let  g(x) = ln f(x)
        then f'(x) = f(x) * g'(x)
    """
    # 1. Apply ln to convert multiplication/exponentiation into addition
    g = torch.log(f(x))                  # ln f(x) = ln(x^x) = x ln x

    # 2. Let autograd compute g'(x) for us
    (g_prime,) = torch.autograd.grad(
        outputs=g,
        inputs=x,
        create_graph=False,
    )

    # 3. Multiply back by f(x) to recover f'(x)
    return f(x) * g_prime


def f_prime_direct(x: torch.Tensor) -> torch.Tensor:
    """Cross-check: ask autograd to differentiate the messy original directly."""
    y = f(x)
    (y_prime,) = torch.autograd.grad(outputs=y, inputs=x)
    return y_prime


if __name__ == "__main__":
    torch.set_printoptions(precision=6)
    for xv in [0.5, 1.0, 1.5, 2.0, 2.5]:
        x = torch.tensor(xv, requires_grad=True)

        via_log = f_prime_via_logdiff(x)

        # a fresh leaf is optional (f builds a new graph on each call); it keeps the paths independent
        x2 = torch.tensor(xv, requires_grad=True)
        direct = f_prime_direct(x2)

        print(f"x={xv}  log-diff={via_log.item():.6f}  "
              f"direct={direct.item():.6f}  "
              f"match={torch.allclose(via_log.detach(), direct.detach())}")
```

What this tells you: the chain-rule machinery inside autograd produces the same result as the log-diff identity. For an expression of the form $a(x)^{b(x)}$, differentiating it is mathematically equivalent to rewriting it as $\exp\!\bigl(b(x)\ln a(x)\bigr)$ and differentiating that. So when you use log-diff on paper, you are doing by hand the step that modern autodiff systems carry out automatically.
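The equivalence can be checked without any framework. The sketch below differentiates a hypothetical $a(x)^{b(x)}$, with $a(x) = x^2 + 1$ and $b(x) = \sin x$ chosen for illustration, in two ways: by adding a power-rule part and an exponential-rule part (the multivariable chain rule), and by the log-diff identity. It is plain Python and is not meant to mirror any library's internal code.

```python
import math

def a(x: float) -> float:  return x * x + 1.0    # base, always > 0
def da(x: float) -> float: return 2.0 * x
def b(x: float) -> float:  return math.sin(x)    # exponent
def db(x: float) -> float: return math.cos(x)

def f(x: float) -> float:
    return a(x) ** b(x)

def via_partials(x: float) -> float:
    """Power-rule part (exponent held fixed) + exponential-rule part (base held fixed)."""
    power_part = b(x) * a(x) ** (b(x) - 1) * da(x)
    exp_part = a(x) ** b(x) * math.log(a(x)) * db(x)
    return power_part + exp_part

def via_logdiff(x: float) -> float:
    """f * d/dx[b ln a] = f * (b' ln a + b a'/a)."""
    return f(x) * (db(x) * math.log(a(x)) + b(x) * da(x) / a(x))

x0, h = 0.8, 1e-6
numeric = (f(x0 + h) - f(x0 - h)) / (2 * h)
print(via_partials(x0), via_logdiff(x0), numeric)
```

All three numbers agree, which is the general version of the observation in Pitfall 3: the correct derivative of $x^x$ is the power-rule answer plus the exponential-rule answer.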

Why ML Engineers Care

In statistics and machine learning we constantly minimize negative log-likelihoods. Given i.i.d. data, the likelihood is a product of densities:

L(\theta) \;=\; \prod_{i=1}^{N} p(x_i \mid \theta)

Differentiating that product directly would be a nightmare for even modest $N$ . But taking $\ln$ gives the log-likelihood:

\ell(\theta) \;=\; \ln L(\theta) \;=\; \sum_{i=1}^{N} \ln p(x_i \mid \theta)

Now the derivative is a sum of $N$ clean terms — exactly what gradient descent needs. This is the log-diff trick applied at industrial scale.
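A minimal sketch of why the log form matters in practice, using a Gaussian with known $\sigma$ and unknown mean $\mu$. The synthetic data, the seed, and the function names are assumptions made for this example. With $N = 2000$ points the raw product of densities underflows to zero in double precision, while the log-likelihood and its gradient stay perfectly usable.

```python
import math
import random

random.seed(0)
SIGMA = 1.0
data = [random.gauss(3.0, SIGMA) for _ in range(2000)]  # synthetic sample

def density(x: float, mu: float) -> float:
    return math.exp(-0.5 * ((x - mu) / SIGMA) ** 2) / (SIGMA * math.sqrt(2 * math.pi))

def likelihood(mu: float) -> float:
    """Product of densities: every factor is below 0.4, so 2000 of them underflow."""
    return math.prod(density(x, mu) for x in data)

def log_likelihood(mu: float) -> float:
    """Sum of log densities, written out analytically to avoid log(0)."""
    const = -math.log(SIGMA * math.sqrt(2 * math.pi))
    return sum(const - 0.5 * ((x - mu) / SIGMA) ** 2 for x in data)

def grad_log_likelihood(mu: float) -> float:
    """d ell / d mu = sum (x_i - mu) / sigma^2: one clean term per data point."""
    return sum(x - mu for x in data) / SIGMA ** 2

mu0, h = 2.5, 1e-5
numeric = (log_likelihood(mu0 + h) - log_likelihood(mu0 - h)) / (2 * h)

print(likelihood(mu0))                     # 0.0: the raw product has underflowed
print(log_likelihood(mu0))                 # a finite negative number
print(grad_log_likelihood(mu0), numeric)   # analytic vs. numeric gradient
```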

Setting	Product form (don't differentiate)	Log form (do differentiate)
MLE for Gaussian mean	Π exp(−(xᵢ−μ)² / 2σ²)	−½σ⁻² Σ (xᵢ−μ)²
Softmax cross-entropy	−ln(Π probs_correctᵢ)	−Σ ln(probs_correctᵢ)
Variational free energy (ELBO)	log-ratio of densities	expectation of differences of logs
Diffusion model loss	score = ∇ log p_t	directly the logarithmic derivative

Every time you write F.cross_entropy(logits, targets) in PyTorch, the framework is applying the same idea for you: it operates on log-probabilities for numerical stability, and the gradient becomes a clean sum.
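The idea can be sketched from scratch for a single example. The code below computes softmax cross-entropy through log-probabilities using the log-sum-exp trick, along with the standard closed-form gradient, softmax minus one-hot. It is a plain-Python illustration under those assumptions, not PyTorch's actual implementation, and the logits are made-up values chosen to be large.

```python
import math

def log_softmax(logits):
    """log p_k = z_k - logsumexp(z), with the max subtracted for stability."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def cross_entropy(logits, target: int) -> float:
    """Negative log-probability of the correct class."""
    return -log_softmax(logits)[target]

def grad_cross_entropy(logits, target: int):
    """Closed form for d(loss)/d(logits): softmax(z) - one_hot(target)."""
    probs = [math.exp(lp) for lp in log_softmax(logits)]
    return [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]

logits = [1000.0, 1001.0, 999.0]   # large values: a naive math.exp(z) would overflow
loss = cross_entropy(logits, target=1)
grad = grad_cross_entropy(logits, target=1)
print(loss, grad)
```

Working in log space keeps every intermediate value in range, and the gradient entries sum to zero because the softmax probabilities sum to one.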

Summary

Log-diff is for three shapes: variable-base-variable-exponent (e.g. $x^x$ ), messy products and quotients with many factors, and stacked exponents (towers).
The recipe is 4 steps: (1) take $\ln$ of both sides, (2) differentiate (the left side gets $1/y \cdot dy/dx$ from the chain rule), (3) simplify, (4) multiply by $y$ and substitute the original formula back.
The master identity is $f'(x) = f(x)\cdot \dfrac{d}{dx}[\ln f(x)]$ . Everything in this section flows from it.
Critical insight: $f^\prime/f$ is the logarithmic derivative — a measure of relative rate of change. It is the natural quantity for finance, biology, and ML loss surfaces.
Verification is cheap. Plain Python with math.log and a central difference, or PyTorch autograd.grad on a log-then-differentiate computation, will reproduce any log-diff result you derive by hand.
Connection to ML. Every log-likelihood you ever differentiate is an industrial-scale application of the exact same trick. Products of densities become sums of log densities, and those are what optimizers need.
