Learning Objectives
After completing this section, you will be able to:
- Define logarithms as inverse functions of exponentials and translate between logarithmic and exponential forms
- Distinguish between common logarithms (base 10), natural logarithms (base e), and binary logarithms (base 2)
- Apply logarithm properties (product, quotient, and power rules) to simplify expressions and solve equations
- Use the change of base formula to convert between logarithms of different bases
- Graph logarithmic functions and identify key features: domain, range, asymptotes, and intercepts
- Recognize real-world applications including the Richter scale, decibels, pH, and information theory
- Understand why logarithms are essential in machine learning for numerical stability and gradient computation
The Story of Logarithms
A Revolution in Calculation
In the early 17th century, astronomers, navigators, and scientists faced an enormous computational challenge. Calculating planetary orbits, navigating ships across oceans, and conducting scientific experiments required multiplying and dividing very large numbers—a process that was tedious, error-prone, and could take hours or even days.
In 1614, Scottish mathematician John Napier published a revolutionary discovery that would transform calculation: logarithms. His key insight was profound:
Napier's Insight: Multiplication can be converted to addition by working with exponents. If you want to multiply two numbers, add their logarithms instead—then convert back.
This single idea reduced multiplication to addition, division to subtraction, and exponentiation to simple multiplication. Before electronic calculators, logarithm tables and slide rules (which are essentially analog logarithm computers) were indispensable tools used by scientists and engineers for over 300 years.
The Name "Logarithm"
Napier coined the term from Greek: logos (ratio, proportion) and arithmos (number). A logarithm is literally a "ratio number"—it captures the ratio or proportion of growth in exponential processes.
| Year | Development | Impact |
|---|---|---|
| 1614 | Napier publishes logarithm tables | Reduces calculation time by orders of magnitude |
| 1617 | Henry Briggs creates common (base 10) logarithms | Easier to use with decimal system |
| 1620s | Slide rules invented | Portable logarithm calculators used for 350 years |
| 1668 | Natural logarithm (base e) formalized | Essential for calculus and continuous growth |
| 1948 | Shannon's information theory | Logarithms measure information in bits |
| Today | Machine learning | Log-likelihood, cross-entropy, softmax stabilization |
Why This History Matters
Definition: The Inverse of Exponentials
The Fundamental Relationship
A logarithm answers a simple question: "To what power must I raise the base to get this number?"
Formally, for any base $b > 0$ with $b \neq 1$, and any $x > 0$:
$$\log_b(x) = y \iff b^y = x$$
In words: "the logarithm base $b$ of $x$" equals $y$ if and only if "$b$ to the power $y$ equals $x$."
The Inverse Function Relationship
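The logarithm function $\log_b(x)$ and the exponential function $b^x$ are inverse functions. This means they "undo" each other:
$$\log_b(b^x) = x \ \text{ for all real } x, \qquad b^{\log_b(x)} = x \ \text{ for all } x > 0$$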
Domain Restriction
Intuition: The "What Power?" Question
Forget the symbols for a moment. Every logarithm is just a question. When you write $\log_2(8)$ you are literally asking: “starting from 1, how many times must I multiply by 2 to reach 8?” The answer is 3, because $2^3 = 8$. That count of multiplications is the logarithm.
Analogy: think of an exponential function as a recipe (“multiply together $y$ copies of $b$ to get a size $x$”). The logarithm is the recipe reader: given the finished cake of size $x$, it tells you how many times you must have doubled (or tripled, or $b$-tupled) to bake it. Exponentials grow; logs count growth steps. They are two sides of the same coin.
📝 Worked example by hand: convert 5 exponential statements to logarithms
Translate each exponential statement into a logarithm by reading it out loud: “base, to what power, gives result?”
Round trip check. If $\log_2(8) = 3$, then plugging back gives $2^3 = 8$. The two operations exactly undo each other. That round trip is the entire content of the inverse-function property.
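The translation and the round trip can be checked mechanically. The sketch below is only an illustration: the five (base, exponent) pairs are hypothetical sample statements chosen for this example, and to_log_form is a helper name introduced here.

```python
import math

# Hypothetical sample statements b**y = x, written as (base, exponent) pairs.
statements = [(2, 3), (10, 2), (5, 0), (3, 4), (2, -1)]

def to_log_form(base, exponent):
    """Return (x, y) where x = base**exponent and y = log_base(x)."""
    x = base ** exponent
    y = math.log(x, base)  # "to what power must base be raised to give x?"
    return x, y

for b, y in statements:
    x, recovered = to_log_form(b, y)
    # The recovered exponent matches the original one up to rounding error.
    print(f"{b}^{y} = {x}  <=>  log_{b}({x}) = {recovered:.6f}")
```

Each printed line reads the same fact in both directions: the exponential form on the left, the logarithmic form on the right.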
Common and Natural Logarithms
The Three Most Important Bases
| Base | Name | Notation | Primary Use |
|---|---|---|---|
| b = 10 | Common logarithm | log(x) or log₁₀(x) | Scientific notation, orders of magnitude, decibels |
| b = e ≈ 2.718... | Natural logarithm | ln(x) or logₑ(x) | Calculus, continuous growth, ML/statistics |
| b = 2 | Binary logarithm | lg(x) or log₂(x) | Computer science, information theory, algorithms |
The Natural Logarithm: Why Base e?
The natural logarithm (base $e$) might seem like an arbitrary choice, but it's actually the most natural base for calculus. The number $e$ is special because:
$$\frac{d}{dx} e^x = e^x \qquad \text{and} \qquad \frac{d}{dx} \ln(x) = \frac{1}{x}$$
No other base produces such clean derivatives. This is why natural logarithms dominate in calculus, differential equations, and any field dealing with continuous change.
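A quick numerical check makes the "clean derivative" claim concrete. This is a minimal sketch using a central-difference approximation; numeric_derivative and the step size h are choices made for this illustration, not part of any library.

```python
import math

def numeric_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0
d_ln = numeric_derivative(math.log, x)       # compare with 1/x
d_log10 = numeric_derivative(math.log10, x)  # compare with 1/(x * ln 10)
d_exp = numeric_derivative(math.exp, x)      # compare with e**x

print("ln:   ", d_ln, "vs", 1 / x)
print("log10:", d_log10, "vs", 1 / (x * math.log(10)))
print("exp:  ", d_exp, "vs", math.exp(x))
```

Base 10 picks up an extra factor of 1/ln(10) in its derivative; only base e leaves the bare 1/x.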
The Natural Choice: When working with rates of change, growth, or decay, the natural logarithm is almost always the right choice. It simplifies derivatives, integrals, and the mathematics of continuous processes.
The Binary Logarithm: Counting Bits
In computer science, the binary logarithm tells you "how many bits do I need?" For a number $n$:
- $\lceil \log_2(n) \rceil$ bits are needed to represent $n$ distinct values
- Binary search on $n$ elements takes $O(\log_2 n)$ comparisons
- A balanced binary tree with $n$ nodes has height $O(\log_2 n)$
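The first two bullets can be sketched in a few lines. The function names bits_needed and binary_search_steps are hypothetical, introduced only for this example; the bit count uses integer arithmetic to avoid floating-point rounding in log2.

```python
import math

def bits_needed(n):
    """Bits required to give each of n distinct values its own binary code."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using exact integers.
    return (n - 1).bit_length()

def binary_search_steps(sorted_items, target):
    """Return (index or None, number of loop iterations used)."""
    lo, hi, steps = 0, len(sorted_items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, steps

print(bits_needed(256))  # -> 8

data = list(range(1000))
worst = max(binary_search_steps(data, t)[1] for t in data)
print(worst, "iterations at most; floor(log2(1000)) + 1 =", math.floor(math.log2(1000)) + 1)
```

Doubling the size of the list adds only one more iteration to the worst case, which is exactly what logarithmic growth means.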
Notation Warning
Properties of Logarithms
The Three Fundamental Laws
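The properties of logarithms follow directly from the properties of exponents. Since $\log_b(x)$ is the inverse of $b^x$, the laws of exponents transform into laws of logarithms. For $x, y > 0$:
- Product rule: $\log_b(xy) = \log_b(x) + \log_b(y)$
- Quotient rule: $\log_b(x/y) = \log_b(x) - \log_b(y)$
- Power rule: $\log_b(x^n) = n \log_b(x)$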
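The product, quotient, and power rules are easy to confirm numerically. The sketch below uses arbitrary sample values (x = 8, y = 32, n = 3, base 2), chosen here purely for illustration.

```python
import math

x, y, n, base = 8.0, 32.0, 3, 2

# Product rule: log(xy) = log(x) + log(y)
product_lhs = math.log(x * y, base)
product_rhs = math.log(x, base) + math.log(y, base)

# Quotient rule: log(x/y) = log(x) - log(y)
quotient_lhs = math.log(x / y, base)
quotient_rhs = math.log(x, base) - math.log(y, base)

# Power rule: log(x**n) = n * log(x)
power_lhs = math.log(x ** n, base)
power_rhs = n * math.log(x, base)

for name, lhs, rhs in [("product", product_lhs, product_rhs),
                       ("quotient", quotient_lhs, quotient_rhs),
                       ("power", power_lhs, power_rhs)]:
    print(f"{name:8s} lhs={lhs:.6f} rhs={rhs:.6f} equal={math.isclose(lhs, rhs)}")
```

The two sides agree only up to floating-point rounding, which is why the comparison uses math.isclose rather than ==.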
Key Values to Memorize
| Property | Formula | Why It's True |
|---|---|---|
| Log of 1 | logᵦ(1) = 0 | b⁰ = 1 for any base |
| Log of the base | logᵦ(b) = 1 | b¹ = b |
| Log of a power of base | logᵦ(bⁿ) = n | By definition of logarithm |
| Inverse composition | b^(logᵦ(x)) = x | Inverse functions cancel |
Change of Base Formula
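What if you need $\log_a(x)$ for an arbitrary base $a$ but your calculator only has ln and log₁₀? The change of base formula converts between any bases:
$$\log_a(x) = \frac{\log_c(x)}{\log_c(a)} = \frac{\ln(x)}{\ln(a)} = \frac{\log_{10}(x)}{\log_{10}(a)}$$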
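In code, the change of base formula becomes a one-line division. This is a minimal sketch; log_base is a hypothetical helper name, and the input checks simply mirror the domain restrictions on logarithms.

```python
import math

def log_base(x, base):
    """Logarithm of x in an arbitrary base, computed from natural logs only."""
    if x <= 0:
        raise ValueError("x must be positive")
    if base <= 0 or base == 1:
        raise ValueError("base must be positive and different from 1")
    return math.log(x) / math.log(base)

print(log_base(8, 2))     # close to 3
print(log_base(100, 10))  # close to 2
print(log_base(50, 7))    # same value whether computed via ln or log10
print(math.log10(50) / math.log10(7))
```

The intermediate base does not matter: dividing natural logs or common logs gives the same result up to rounding.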
Computational Tip
Many programming languages provide log (natural log) and sometimes log10. For any other base, use the change of base formula: log2(x) = log(x) / log(2)
Graphing Logarithmic Functions
The graph of $y = \log_b(x)$ is the reflection of $y = b^x$ across the line $y = x$. This reflection relationship between inverse functions is the heart of logarithmic graphs: swap x and y, and you swap the two curves.
[Interactive figure: "The Mirror: log and exp are reflections across y = x". A point $(x, y)$ on the blue log curve is mirrored across the dashed line y = x to the point $(y, x)$ on the green exponential curve.]
Swap the coordinates of any point on the blue curve and you land on the green curve. That coordinate-swap is reflection across y = x.
[Interactive figure: "Interactive Logarithm Graph: y = logₑ(x)", with adjustable base and a coordinate readout.]
Key Features of Logarithmic Graphs
| Feature | Value | Explanation |
|---|---|---|
| Domain | (0, +∞) | Only positive inputs allowed |
| Range | (-∞, +∞) | Output can be any real number |
| Vertical Asymptote | x = 0 | Graph approaches but never touches y-axis |
| x-intercept | (1, 0) | logᵦ(1) = 0 for any base |
| Key Point | (b, 1) | logᵦ(b) = 1 |
| Behavior as x → 0⁺ | y → -∞ | Logarithm of small positive numbers is very negative |
| Behavior as x → +∞ | y → +∞ | Grows without bound, but very slowly |
How the Base Affects the Graph
- Base > 1: The function is increasing. Larger x gives larger y.
- Larger base: The curve rises more slowly. Compare log₁₀(100) = 2 vs log₂(100) ≈ 6.64.
- Base between 0 and 1: The function is decreasing (rarely used in practice).
Transformations of Logarithmic Functions
Standard function transformations apply to logarithms:
| Transformation | Effect on Graph | Example |
|---|---|---|
| y = logᵦ(x) + k | Vertical shift up by k | ln(x) + 2 |
| y = logᵦ(x - h) | Horizontal shift right by h | ln(x - 3), asymptote moves to x = 3 |
| y = a · logᵦ(x) | Vertical stretch by factor a | 2 ln(x) |
| y = logᵦ(cx) | Horizontal compression by factor c | ln(2x) |
| y = -logᵦ(x) | Reflection over x-axis | -ln(x) |
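Two rows of the table are worth checking numerically: the horizontal shift moves the domain boundary (and with it the asymptote), and the horizontal compression ln(2x) turns out to be the same curve as ln(x) shifted up by ln(2), a consequence of the product rule. The helper shifted_log and the sample x values are assumptions made for this sketch.

```python
import math

def shifted_log(x, h=3.0):
    """y = ln(x - h): defined only for x > h, so the asymptote sits at x = h."""
    if x <= h:
        raise ValueError(f"ln(x - {h}) is undefined for x <= {h}")
    return math.log(x - h)

# The x-intercept moves from x = 1 to x = h + 1.
print(shifted_log(4.0))  # -> 0.0

# ln(2x) - ln(x) is the same constant, ln(2), at every x.
xs = [0.5, 1.0, 4.0, 10.0]
gaps = [math.log(2 * x) - math.log(x) for x in xs]
print(gaps, "vs ln(2) =", math.log(2))
```

This is a feature specific to logarithms: scaling the input and shifting the output are the same transformation.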
Real-World Logarithmic Scales
Many natural phenomena span enormous ranges of values. Logarithmic scales compress these ranges into manageable numbers, making patterns visible that would be hidden on a linear scale.
[Interactive figure: Logarithmic Scales in the Real World. Panel shown: Earthquake Magnitudes (Richter Scale).]
The difference between a magnitude 5 and a magnitude 9 earthquake is about 10,000× in ground motion, but only 4 units on the Richter scale.
Why Logarithmic Scales?
- Compress huge ranges: The visible light spectrum covers wavelengths from 400nm to 700nm, but the full electromagnetic spectrum spans from 10⁻¹⁵m (gamma rays) to 10⁸m (radio waves)—a factor of 10²³!
- Match human perception: Our ears perceive loudness logarithmically. A sound 10x more intense sounds about twice as loud.
- Reveal multiplicative patterns: Exponential growth appears as a straight line on a log scale, making it easy to identify.
- Compare relative changes: A doubling looks the same size whether it's from 10 to 20 or from 10,000 to 20,000.
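The "straight line on a log scale" claim can be seen without plotting anything: on a log scale, equal ratios become equal steps. The data below is hypothetical, a quantity that starts at 5 and doubles at every step.

```python
import math

# Hypothetical data: a quantity that doubles every step, starting at 5.
values = [5 * 2 ** t for t in range(8)]
log_values = [math.log10(v) for v in values]

# Differences between consecutive points on each scale.
linear_steps = [b - a for a, b in zip(values, values[1:])]
log_steps = [b - a for a, b in zip(log_values, log_values[1:])]

print("linear steps:", linear_steps)                    # keep growing
print("log10 steps: ", [round(s, 6) for s in log_steps])  # constant, equal to log10(2)
```

A constant step size means a constant slope, so the points fall on a straight line once the vertical axis is logarithmic.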
The Richter Scale: A Deep Dive
The Richter magnitude of an earthquake is:
$$M = \log_{10}\left(\frac{A}{A_0}\right)$$
where $A$ is the measured amplitude and $A_0$ is a reference amplitude. Each unit increase in magnitude means:
- 10× more ground motion (amplitude)
- ~31.6× more energy released (since energy ∝ amplitude^1.5, and 10^1.5 ≈ 31.6)
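The two ratios can be computed directly from a magnitude difference. This sketch assumes the commonly used scaling in which energy grows by a factor of 10^1.5 per magnitude unit, which is what produces the ~31.6× figure; amplitude_ratio and energy_ratio are hypothetical helper names.

```python
def amplitude_ratio(m1, m2):
    """How many times larger the ground motion is at magnitude m2 than at m1."""
    return 10 ** (m2 - m1)

def energy_ratio(m1, m2):
    """Energy ratio, assuming energy scales as 10**(1.5 * magnitude)."""
    return 10 ** (1.5 * (m2 - m1))

print(amplitude_ratio(5, 9))  # four units apart: 10**4
print(energy_ratio(5, 6))     # one unit apart: about 31.6
print(energy_ratio(5, 7))     # two units apart: 10**3
```

Because the scale is logarithmic, only the difference in magnitudes matters, not the magnitudes themselves.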
Applications in Machine Learning
Logarithms are ubiquitous in machine learning, not as a historical curiosity, but as an essential tool for numerical stability and theoretical elegance.
Why ML Uses Log-Probabilities
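Consider computing the likelihood of observing data given a model. If you have $n$ independent observations with probabilities $p_1, p_2, \ldots, p_n$:
$$L = \prod_{i=1}^{n} p_i$$
The Problem: Take, for instance, each $p_i = 0.1$ and $n = 100$:
$$L = 0.1^{100} = 10^{-100}$$
For smaller probabilities or more samples, $L$ underflows to exactly 0 in floating-point arithmetic, which is catastrophic for training.
The Solution: Work with log-likelihood instead:
$$\log L = \sum_{i=1}^{n} \log(p_i)$$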
Products become sums, tiny numbers become manageable negative numbers, and gradients become stable.
Cross-entropy loss is the negative average log-likelihood. Minimizing cross-entropy = maximizing likelihood!
Plain Python first — multiply, then add logs
Before we touch any framework, let's do log-likelihood by hand. The point of this snippet is not to be efficient — it is to make you feel the difference between a product and a sum of logs.
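A minimal sketch of that comparison, using only the standard library, might look like the following. The data is hypothetical: 200 independent observations, each assigned probability 0.01, chosen so that the raw product falls below the smallest positive float64.

```python
import math

# Hypothetical data: 200 independent observations, each with probability 0.01.
probs = [0.01] * 200

# Approach 1: multiply the probabilities. The true value is 1e-400,
# which is smaller than any positive float64, so the product collapses to 0.0.
likelihood = 1.0
for p in probs:
    likelihood *= p

# Approach 2: add the logs. Each term is about -4.6, so the sum is an
# ordinary negative number with no underflow.
log_likelihood = sum(math.log(p) for p in probs)

print("product of probabilities:", likelihood)
print("sum of log-probabilities:", log_likelihood)
```

The product has lost all information (two very different models would both score 0.0), while the sum of logs still distinguishes them.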
The one-line moral
Cross-Entropy Loss
The cross-entropy loss for classification is the negative log-likelihood:
$$\mathcal{L} = -\sum_{c} y_c \log(\hat{y}_c)$$
where $y$ is the true label (one-hot) and $\hat{y}$ is the predicted probability distribution.
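For a single example, the sum over classes collapses to the negative log of the probability assigned to the true class. The sketch below shows this in plain Python; cross_entropy is a hypothetical helper, and the eps floor is an assumption added here so that a predicted probability of exactly 0 does not reach the logarithm.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy -sum_c y_c * log(yhat_c) for one example."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction must have the same length")
    # Clamp predictions away from 0 (eps is an assumed floor, not a standard value).
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 1, 0]        # one-hot: the true class is index 1
y_pred = [0.2, 0.7, 0.1]  # hypothetical predicted distribution

loss = cross_entropy(y_true, y_pred)
print(loss, "vs -log(0.7) =", -math.log(0.7))
```

A confident correct prediction (probability near 1) gives a loss near 0; a confident wrong one gives a large loss.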
Log-Softmax for Numerical Stability
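The softmax function converts logits to probabilities:
$$\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$$
The problem: For large logits, $e^{z_i}$ overflows. The solution: Compute log-softmax directly:
$$\log \text{softmax}(z)_i = z_i - \log \sum_j e^{z_j}$$
Using the "log-sum-exp trick" with a shift $m = \max_j z_j$:
$$\log \sum_j e^{z_j} = m + \log \sum_j e^{z_j - m}$$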
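The shift can be sketched with nothing but the standard math module. The function below is an illustrative implementation of the log-sum-exp idea, not the code any particular framework uses; the logits are hypothetical values picked to be large enough that a naive exponential would overflow.

```python
import math

def log_softmax(logits):
    """Log-softmax via the log-sum-exp trick: subtract the max before exponentiating."""
    m = max(logits)
    # Every exponent z - m is <= 0, so math.exp can never overflow here.
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]

# Hypothetical logits. math.exp(1000.0) on its own would raise OverflowError.
logits = [1000.0, 1001.0, 1002.0]
log_probs = log_softmax(logits)

print(log_probs)
print(sum(math.exp(lp) for lp in log_probs))  # probabilities sum to about 1
```

Subtracting the maximum changes nothing mathematically, since the same constant is added back outside the logarithm; it only keeps the intermediate values in a safe range.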
We will now build the log-softmax twice. First in plain NumPy — line by line, with every value hand-traced — so you can see the math. Then we will hand it to PyTorch and watch the same numbers come out in one call. Plain Python first, framework second.
Step 1 — Pure NumPy: trace every line
Step 2 — The same idea in PyTorch (one call)
Once you understand the trick by hand, PyTorch hides the boilerplate. torch.nn.functional.log_softmax already uses the log-sum-exp trick internally, with extra speed-ups for GPU and autograd.
Information Theory: Bits and Entropy
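Claude Shannon's information theory uses the binary logarithm to measure information. An event with probability $p$ carries:
$$I(p) = -\log_2(p) \text{ bits}$$
The entropy (average information content) of a distribution is:
$$H = -\sum_{i} p_i \log_2(p_i)$$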
| Event Probability | Information Content | Intuition |
|---|---|---|
| p = 1 (certain) | -log₂(1) = 0 bits | No surprise, no information |
| p = 0.5 (coin flip) | -log₂(0.5) = 1 bit | One binary question answered |
| p = 0.25 | -log₂(0.25) = 2 bits | Two binary questions answered |
| p = 0.001 (rare) | -log₂(0.001) ≈ 10 bits | Very surprising, high information |
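The table rows and the entropy formula can be reproduced in a few lines. Both function names below are hypothetical, and the convention that zero-probability outcomes contribute nothing to the entropy is the usual one, stated here as an assumption of the sketch.

```python
import math

def self_information(p):
    """Bits of surprise for an event of probability p, with 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1 / p)  # identical to -log2(p)

def entropy(dist):
    """Shannon entropy in bits; outcomes with p = 0 are skipped."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

for p in [1, 0.5, 0.25, 0.001]:
    print(f"p = {p}: {self_information(p):.3f} bits")

print("fair coin:     ", entropy([0.5, 0.5]))
print("fair 4-way die:", entropy([0.25] * 4))
print("biased coin:   ", entropy([0.9, 0.1]))
```

The fair coin reaches the maximum of 1 bit for two outcomes; any bias makes the result more predictable and lowers the entropy.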
Numerical Computing Considerations
Common Pitfalls
Numerical Hazards with Logarithms
- log(0): Undefined (NumPy returns -inf with a warning; Python's math.log raises ValueError)
- log(negative): Undefined in reals (NumPy returns NaN; Python's math.log raises ValueError)
- log(1 + x) for small x: Use log1p(x) for accuracy
- Subtracting large logs: Can cause precision loss; use log-sum-exp tricks
The log1p Function
When computing $\log(1 + x)$ for small $x$, direct computation loses precision because $1 + x$ rounds to a value at or very near 1 in floating point. The function log1p(x) computes $\log(1 + x)$ accurately, even for very small $x$.
For tiny $x$, the Taylor series of $\ln(1 + x)$ starts with $x - \frac{x^2}{2} + \cdots$, so to first order $\ln(1 + x) \approx x$. The naive computation np.log(1 + x) first adds 1 to x, and that addition is where the precision is destroyed, before the logarithm ever runs. log1p sidesteps the bad addition by working from x directly, without ever forming 1 + x.
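The loss of precision is easy to reproduce. The sketch below uses the standard library's math.log and math.log1p rather than the NumPy calls mentioned above (an assumption made so it runs without dependencies); the value of x is a hypothetical input chosen to be far below float64 machine epsilon.

```python
import math

x = 1e-18  # far smaller than float64 machine epsilon (about 2.2e-16)

# Naive: 1 + x rounds to exactly 1.0, so the logarithm sees no trace of x.
naive = math.log(1 + x)

# Stable: log1p never forms 1 + x, so the leading-order answer (about x) survives.
stable = math.log1p(x)

print("1 + x == 1.0 ?", 1 + x == 1.0)
print("naive :", naive)
print("stable:", stable)
```

The naive result is not slightly wrong but completely wrong: every significant digit of the true answer is gone.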
Summary
| Concept | Key Formula | Application |
|---|---|---|
| Definition | logᵦ(x) = y ⟺ bʸ = x | Inverse of exponentials |
| Product Rule | log(xy) = log(x) + log(y) | Multiply → Add |
| Quotient Rule | log(x/y) = log(x) - log(y) | Divide → Subtract |
| Power Rule | log(xⁿ) = n·log(x) | Exponent → Multiply |
| Change of Base | logₐ(x) = log(x)/log(a) | Convert between bases |
| Natural Log | ln(x) = logₑ(x) | Calculus, ML, continuous growth |
| Log-Likelihood | log L = Σ log(pᵢ) | ML training stability |
Key Takeaways
- Logarithms are inverses of exponentials—they answer "what power?"
- Three important bases: e (calculus), 10 (scientific), 2 (computing)
- Properties convert operations: multiplication → addition, powers → multiplication
- Real-world scales (Richter, decibels, pH) compress huge ranges
- ML relies on logarithms for numerical stability and theoretical elegance
- Always check for edge cases: log(0), log(negative), precision for small arguments
Exercises
Conceptual Questions
- Explain in your own words why $\log_b(1) = 0$ for any valid base $b$.
- Why can't we take the logarithm of a negative number using real numbers?
- If , how many elements are in the set? How many comparisons would binary search take?
- A sound at 60 dB is how many times more intense than a sound at 40 dB?
Computational Problems
- Simplify: $\log_3(81) + \log_3\left(\frac{1}{27}\right)$
- Solve for x:
- Express in terms of common logarithms (base 10).
- If an earthquake releases 1000 times more energy than a magnitude 4 earthquake, what is its magnitude? (Hint: energy ratio ≈ $10^{1.5\,\Delta M}$)
Programming Challenges
- Implement a function that computes $\log_b(x)$ for any base $b$ using only the natural log.
- Write a numerically stable function to compute cross-entropy loss given predicted probabilities and true labels.
- Create a visualization comparing linear and logarithmic scales for the electromagnetic spectrum (wavelengths from 10⁻¹² to 10⁴ meters).
Exploration
- Research Benford's Law: why do leading digits in many datasets follow a logarithmic distribution?
- Investigate how logarithms appear in the analysis of algorithm complexity (e.g., why is merge sort O(n log n)?).
- Explore the connection between logarithms and music: why is the frequency ratio between octaves 2:1, and how do logarithms relate to the perception of pitch?
In the next section, we'll explore trigonometric functions—the mathematics of circular motion and periodic phenomena. These functions are essential for understanding waves, oscillations, rotations, and countless applications in physics, engineering, and signal processing.