Learning Objectives
By the end of this section, you will be able to:
- Define exponential functions and explain the role of the base.
- Explain why Euler's number e ≈ 2.71828 is special in calculus.
- Distinguish between exponential growth (base > 1) and decay (0 < base < 1).
- Derive e from the compound-interest limit and confirm it numerically by hand.
- Recognize the unique self-derivative property d/dx e^x = e^x.
- Apply exponentials to model growth, decay, cooling, and softmax / cross-entropy in ML.
The Big Picture: Why Exponential Functions Matter
"The greatest shortcoming of the human race is our inability to understand the exponential function." — Albert Bartlett, physicist
Linear functions describe additive change: every step adds the same amount. Exponential functions describe multiplicative change: every step multiplies by the same factor. That single switch — from plus to times — is responsible for everything from compound interest to viral spread to the softmax layer in a transformer.
The Core Insight
Linear: "I add 10 every hour" → 10, 20, 30, 40, … (arithmetic sequence).
Exponential: "I double every hour" → 1, 2, 4, 8, 16, 32, … (geometric sequence).
After 10 hours the linear rule gives 100; the exponential rule gives 1024. After 30 hours the linear rule gives 300; the exponential rule gives 1,073,741,824. Nearly the same starting point, the same elapsed time, and a gap of more than a billion (a ratio of roughly 3.6 million to one). That gap is the exponential function.
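The two rules can be compared side by side with a few lines of Python. This is a minimal sketch: the function names `linear_rule` and `doubling_rule` are illustrative, and the linear rule is assumed to start from 0 while the doubling rule starts from 1.

```python
def linear_rule(hours, step=10):
    """Additive change: add `step` every hour, starting from 0."""
    return step * hours


def doubling_rule(hours, start=1):
    """Multiplicative change: double every hour, starting from `start`."""
    return start * 2 ** hours


for hours in (1, 5, 10, 20, 30):
    print(f"{hours:>2} h  linear={linear_rule(hours):>4}  "
          f"exponential={doubling_rule(hours):>13,}")
```

Because Python integers have arbitrary precision, the doubling column stays exact however many hours you add.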
The Intuition: a Bank Account That Refuses to Be Boring
Imagine putting $1 in an account that grows continuously: every instant, the account adds a little interest, and that new interest immediately starts earning its own interest. The account is not just growing; it is growing because it is growing. That self-referential loop is the soul of e^x.
Where Exponential Functions Appear
🦠 Biology
- Bacterial colony growth
- Viral spread before immunity kicks in
- Radioactive decay of carbon-14
- Drug concentration after a single dose
💰 Finance
- Continuous compound interest
- Reinvested investment returns
- Inflation over decades
- Loan amortization curves
⚛️ Physics
- Radioactive half-life
- RC capacitor discharge
- Newton's law of cooling
- Atmospheric pressure vs altitude
🤖 Machine Learning
- Softmax over logits
- Cross-entropy loss
- Exponential learning-rate decay
- Attention weights in transformers
Historical Origins: From Logs to Limits
The story of exponential functions weaves three discoveries together.
1. Napier's Logarithms (1614)
John Napier invented logarithms to speed up astronomical multiplication. In modern notation, he observed:
$$\log(a \cdot b) = \log a + \log b$$
Multiplication of large numbers reduced to addition of small ones. Logarithm tables exploded across Europe. But every logarithm implies an inverse — and that inverse is an exponential function.
2. Bernoulli's Banking Question (1683)
Jacob Bernoulli asked a question that looks like idle arithmetic but turned out to be cosmic:
If I invest $1 at 100% annual interest, and the bank compounds the interest more and more frequently, what is the most I can have at year's end?
Compounding once gives $2.00. Twice gives $2.25. Monthly gives $2.6130. Daily gives $2.7146. Hourly gives $2.7181. Continuous compounding gives a number Bernoulli could not put a name to:
$$\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^{n} = e \approx 2.71828$$
That limit defines Euler's number. The interactive table further down lets you watch the convergence happen one row at a time.
3. Euler's Unification (1748)
Leonhard Euler recognized that e^x has the magical property of being its own derivative, and he united exponentials with trigonometry through:
$$e^{i\theta} = \cos\theta + i\sin\theta$$
This connects the geometry of circles to the algebra of growth — one of the most beautiful equations in mathematics, and we will meet it again in Chapter 5.
Mathematical Definition
An exponential function with base a (where a > 0 and a ≠ 1) is defined as:
$$f(x) = a^x$$
What Each Symbol Means
| Symbol | Name | Meaning |
|---|---|---|
| a | Base | A positive constant we keep fixed |
| x | Exponent | The variable input — can be any real number |
| a^x | Output | a multiplied by itself x times (extended to all reals by limits) |
Why the Restrictions?
- a > 0: A negative base breaks for fractional exponents. For example, (-2)^(1/2) = √(-2) is not real.
- a ≠ 1: If a = 1, then 1^x = 1 for every x, which is a flat horizontal line, not an exponential.
- x can be any real number: Continuity lets us define a^x for irrationals like x = √2 via limits of rational exponents.
The Natural Exponential Function
The most important exponential uses Euler's number e:
$$f(x) = e^x$$
It is often written exp(x) in code and papers.
Exploring Exponential Functions
Drag the base slider below. Watch how the curve's shape transforms as you cross a = 1, the boundary between decay and growth. Hover anywhere on the plot to read off the exact value of a^x at that x.
[Interactive demo: Exponential Function Explorer. Explore how changing the base affects exponential growth.]
What to Notice as You Play
- All curves cross at (0, 1), because a^0 = 1 for every positive a. This is the family's universal anchor.
- Base > 1 → growth. The curve rises ever faster as x grows.
- 0 < Base < 1 → decay. The curve falls toward zero but never touches it.
- The x-axis is an asymptote on one side. Growth curves hug y=0 as x→-∞; decay curves hug it as x→+∞.
- The base e ≈ 2.718 sits between 2 and 3. It is geometrically unremarkable — but calculus elevates it to king status because its derivative equals itself.
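The observations above can also be checked without the interactive plot. The sketch below tabulates a^x for a few sample bases; the helper name `exp_family` and the chosen bases and x values are illustrative assumptions, and it prints a table rather than drawing curves so that it needs only the standard library.

```python
bases = [0.5, 2.0, 2.718281828459045, 3.0]   # one decay base, three growth bases
xs = [-2, -1, 0, 1, 2]


def exp_family(a, x):
    """Evaluate a**x for a valid exponential base (a > 0 and a != 1)."""
    if a <= 0 or a == 1:
        raise ValueError("base must satisfy a > 0 and a != 1")
    return a ** x


print("x values:  " + "  ".join(f"{x:8d}" for x in xs))
for a in bases:
    row = "  ".join(f"{exp_family(a, x):8.4f}" for x in xs)
    print(f"a = {a:<6.3f} {row}")
```

Every row shows 1.0000 in the x = 0 column, the growth rows rise from left to right, and the a = 0.5 row falls.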
Euler's Number e: The Most Important Constant in Calculus
Alongside π, Euler's number e is among the most important constants in mathematics. It appears in any system where the rate of change is proportional to the current size, a pattern that recurs throughout nature.
Four Equivalent Definitions of e
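$$
\begin{aligned}
\text{Limit (financial):}\quad & e = \lim_{n\to\infty}\left(1+\frac{1}{n}\right)^{n} \\
\text{Series (algebraic):}\quad & e = \sum_{n=0}^{\infty}\frac{1}{n!} \\
\text{Derivative (differential):}\quad & \frac{d}{dx}e^{x} = e^{x}, \quad e^{0} = 1 \\
\text{Area (integral):}\quad & \int_{1}^{e}\frac{1}{t}\,dt = 1
\end{aligned}
$$
Each definition emphasizes a different face of e: financial, algebraic, differential, and integral. All four pinpoint the exact same constant.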
Its Numerical Value
$$e = 2.718281828459045\ldots$$
Like π, e is irrational (it cannot be written as a ratio of integers) and transcendental (it is not a root of any nonzero polynomial with integer coefficients).
Compound Interest & The Discovery of e
Play with the demo below. Push the compounding slider to the right and watch the final balance climb toward a ceiling. That ceiling is e.
[Interactive demo: Compound Interest & The Discovery of e. See how continuous compounding leads to Euler's number.]
💡The Birth of e: What happens as we compound more frequently?
With $1 at 100% interest for 1 year, watch what happens to (1 + 1/n)^n as n increases:
| n (compounds/year) | (1 + 1/n)^n |
|---|---|
| 1 (annual) | 2.0000000000 |
| 2 | 2.2500000000 |
| 4 | 2.4414062500 |
| 12 (monthly) | 2.6130352902 |
| 52 | 2.6925969544 |
| 365 (daily) | 2.7145674820 |
| 8760 | 2.7181266916 |
| 100000 | 2.7182682372 |
| n → ∞ (limit) | e = 2.7182818284... |
This limit is exactly how Euler's number e was discovered!
The Compound Interest Formula
$$A = P\left(1+\frac{r}{n}\right)^{nt}$$
| Symbol | Meaning |
|---|---|
| A | Final amount after t years |
| P | Principal (initial deposit) |
| r | Annual interest rate (decimal, e.g. 0.05 for 5%) |
| n | Number of compounding periods per year |
| t | Time in years |
The Limit as Compounding Becomes Continuous
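Let's push n → ∞ (compounding every instant). Substitute m = n/r so that n = mr:
$$A = P\left(1+\frac{r}{n}\right)^{nt} = P\left(1+\frac{1}{m}\right)^{mrt} = P\left[\left(1+\frac{1}{m}\right)^{m}\right]^{rt}$$
As n → ∞ we also have m → ∞ (for r > 0), and the inner bracket converges to e. So:
$$A = Pe^{rt}$$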
The Practical Insight
Continuous compounding gives A = Pe^(rt). In practice the gap between daily and continuous compounding is tiny, but the algebraic simplicity of e^(rt) makes it the standard form for growth and decay models in finance and physics.
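To see how small the daily-versus-continuous gap really is, here is a minimal sketch of both formulas. The function names and the sample deposit ($1,000 at 5% for 10 years) are hypothetical values chosen for illustration.

```python
import math


def compound(P, r, n, t):
    """Balance after t years with n compounding periods per year: P(1 + r/n)^(nt)."""
    return P * (1 + r / n) ** (n * t)


def compound_continuous(P, r, t):
    """Balance after t years under continuous compounding: P e^(rt)."""
    return P * math.exp(r * t)


# Hypothetical deposit: $1,000 at 5% for 10 years.
P, r, t = 1000.0, 0.05, 10
daily = compound(P, r, 365, t)
continuous = compound_continuous(P, r, t)

print(f"daily:      {daily:.2f}")
print(f"continuous: {continuous:.2f}")
print(f"gap:        {continuous - daily:.4f}")
```

For this deposit the gap comes out to a few cents, which is why the simpler continuous formula is usually a safe stand-in for frequent discrete compounding.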
Worked Example: Bernoulli's Dollar by Hand
Before reading the code, do this with a pencil. Set P = 1, r = 1 (100% annual interest), t = 1 year. The formula collapses to A(n) = (1 + 1/n)^n. Compute six rows by hand and watch the convergence.
Step 1. Yearly compounding (n = 1). A(1) = (1 + 1)^1 = 2.
Step 2. Semi-annual (n = 2). Inner term: 1 + 1/2 = 1.5. A(2) = 1.5^2 = 2.25.
Step 3. Quarterly (n = 4). Inner term: 1 + 1/4 = 1.25. A(4) = 1.25^4 = 2.44140625.
Step 4. Monthly (n = 12). Inner term: 1 + 1/12 ≈ 1.08333333. A(12) ≈ 2.61303529.
Step 5. Daily (n = 365). Inner term: 1 + 1/365 ≈ 1.00273973. A(365) ≈ 2.71456748.
Step 6. Hourly (n = 8760). A(8760) = (1 + 1/8760)^8760 ≈ 2.71812669.
Pattern. Stack the answers side by side:
| n | A(n) | Gap to e ≈ 2.71828182 |
|---|---|---|
| 1 | 2.00000000 | ≈ 7.18 × 10⁻¹ |
| 2 | 2.25000000 | ≈ 4.68 × 10⁻¹ |
| 4 | 2.44140625 | ≈ 2.77 × 10⁻¹ |
| 12 | 2.61303529 | ≈ 1.05 × 10⁻¹ |
| 365 | 2.71456748 | ≈ 3.71 × 10⁻³ |
| 8 760 | 2.71812669 | ≈ 1.55 × 10⁻⁴ |
| ∞ | 2.71828182… | 0 |
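Each time n grows by roughly 10x, the gap shrinks by roughly 10x: the error is approximately e/(2n), so it falls off like 1/n. That is slow (sublinear) convergence to e, since every extra correct decimal digit costs about ten times more compounding periods.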
The intuition. For huge n, every instant we add 1/n of our balance, and that tiny addition immediately starts earning its own interest. The infinite tower of "interest on interest on interest …" converges because the k-th layer is k times smaller than the previous one, and Σ 1/k! is finite. The sum is exactly e.
Connection to the series definition. Bernoulli's limit and Euler's series are the same number: (1 + 1/n)^n expanded by the binomial theorem yields, term by term, the series Σ 1/k! (k = 0, 1, 2, …) as n → ∞.
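The two routes to e converge at very different speeds, which a short comparison makes concrete. This is a sketch with illustrative helper names (`bernoulli_limit`, `series_partial_sum`); it prints the remaining gap to `math.e` for each approach.

```python
import math


def bernoulli_limit(n):
    """Bernoulli's compounding expression (1 + 1/n)^n."""
    return (1 + 1 / n) ** n


def series_partial_sum(terms):
    """Sum of 1/k! for k = 0 .. terms - 1."""
    total, term = 0.0, 1.0           # term holds 1/k!, starting at 1/0! = 1
    for k in range(terms):
        total += term
        term /= (k + 1)              # 1/(k+1)! = (1/k!) / (k+1)
    return total


for n in (1, 2, 4, 12, 365, 8760):
    print(f"n = {n:>5}       limit gap  = {math.e - bernoulli_limit(n):.2e}")
for terms in (2, 4, 8, 12, 16):
    print(f"terms = {terms:>2}      series gap = {abs(math.e - series_partial_sum(terms)):.2e}")
```

The limit needs thousands of compounding periods for a handful of digits, while the series reaches double-precision accuracy with fewer than twenty terms, because the factorial in each denominator grows so quickly.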
Key Properties of Exponential Functions
The Laws of Exponents
| Property | Formula | Example |
|---|---|---|
| Product Rule | a^m · a^n = a^(m+n) | 2³ · 2² = 2⁵ = 32 |
| Quotient Rule | a^m / a^n = a^(m-n) | 3⁵ / 3² = 3³ = 27 |
| Power Rule | (a^m)^n = a^(m·n) | (2²)³ = 2⁶ = 64 |
| Zero Exponent | a^0 = 1 | 5⁰ = 1 |
| Negative Exponent | a^(-n) = 1 / a^n | 2⁻³ = 1/8 |
| Fractional Exponent | a^(1/n) = ⁿ√a | 8^(1/3) = 2 |
Function-Level Properties
- Domain: all reals, (-∞, ∞).
- Range: positive reals, (0, ∞).
- Y-intercept: always (0, 1).
- Horizontal asymptote: the line y = 0.
- No x-intercept: a^x > 0 for every x.
- Continuous and smooth: no breaks, no corners — derivatives of every order exist.
- One-to-one: distinct x give distinct y — so the inverse (the logarithm) exists.
Growth vs Decay: The Role of the Base
The base alone decides whether a^x grows or shrinks as x increases.
📈 Exponential Growth (a > 1)
- Function increases as x increases
- Accelerates — the slope itself grows
- As x → +∞, a^x → +∞
- As x → -∞, a^x → 0
Examples: populations, compound interest, viral spread.
📉 Exponential Decay (0 < a < 1)
- Function decreases as x increases
- Decelerates: the magnitude of the slope shrinks
- As x → +∞, a^x → 0
- As x → -∞, a^x → +∞
Examples: radioactive decay, cooling, drug clearance.
Converting Between Growth and Decay
A decay can be rewritten as growth with a flipped exponent: (1/a)^x = a^(-x), for example (1/2)^x = 2^(-x). Likewise e^(-x) is the decay version of e^x: same family, mirrored across the y-axis.
Preview: The Derivative of e^x
Here is the property that earns e its place at the heart of calculus:
$$\frac{d}{dx}e^{x} = e^{x}$$
The slope of e^x at any point equals the height of e^x at that point. The function tells its own derivative what to be. In the demo below, slide the tangent point along the curve: the slope of the tangent line always equals the height of the function. Shrink the secant step toward zero and watch the secant collapse onto the tangent.
[Interactive demo: The Derivative of e^x, Visualized. Watch how the secant line approaches the tangent as h → 0.]
What Makes This Unique to e?
For any other base, the derivative carries an extra factor:
$$\frac{d}{dx}a^{x} = a^{x}\ln a$$
The factor ln(a) is the "tax" that every base except e pays for not being e. Only when a = e do we get ln(e) = 1, and the derivative collapses back to the function itself.
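The extra factor can be measured numerically: dividing an estimated slope of a^x by the height a^x should return ln(a) for any base. The sketch below uses a symmetric difference quotient; the helper name `central_difference`, the step size, and the sample point x = 1.5 are illustrative choices.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, an O(h^2) approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 1.5  # arbitrary sample point; the ratio does not depend on it
for a in (0.5, 2.0, math.e, 3.0):
    slope = central_difference(lambda t: a ** t, x)
    height = a ** x
    print(f"a = {a:.4f}  slope/height = {slope / height:+.6f}  ln(a) = {math.log(a):+.6f}")
```

The ratio is negative for the decay base 0.5, and it lands on 1 only for a = e.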
Why This Matters for Calculus
The self-derivative property makes e^x the easiest function to differentiate and integrate:
- Derivative: d/dx e^x = e^x
- Integral: ∫ e^x dx = e^x + C
This is why e^x appears in the solutions of linear differential equations with constant coefficients, in every radioactive-decay formula, and in the softmax of neural networks.
Transformations of Exponential Functions
Like every function, exponentials can be shifted, stretched, and reflected. The general form is:
$$f(x) = A \cdot a^{B(x-h)} + k$$
| Parameter | Effect | Example |
|---|---|---|
| A | Vertical stretch / compression; reflect if A<0 | 3 · 2^x is 3x taller than 2^x |
| a | Base — sets the growth/decay rate | e^x grows faster than 2^x |
| B | Horizontal stretch / compression | 2^(2x) compresses x-axis by 2 |
| h | Horizontal shift (right if h>0) | e^(x-2) shifts right by 2 |
| k | Vertical shift (up if k>0) | e^x + 3 shifts up by 3 |
Common Transformations
- e^(-x): reflection through the y-axis (decay version of e^x).
- -e^x: reflection through the x-axis (always negative).
- e^(2x): faster growth, a horizontal compression by 2.
- e^(x/2): slower growth, a horizontal stretch by 2.
Real-World Applications
1. Population Growth
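Under unlimited resources, populations grow exponentially:
$$P(t) = P_0 e^{kt}$$
Here P_0 is the initial population, k is the growth rate, and t is time.
Example: Bacteria that double every 20 minutes have k = ln(2)/20 ≈ 0.0347 per minute. Starting with 1000 cells, after 2 hours (120 minutes, or 6 doublings): P(120) = 1000 · 2^6 = 64,000 cells.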
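The bacteria example can be reproduced in a few lines. This is a minimal sketch with illustrative function names; time is measured in minutes, so the rate k is per minute.

```python
import math


def growth_rate_from_doubling(doubling_time):
    """Continuous growth rate k such that e^(k * doubling_time) = 2."""
    return math.log(2) / doubling_time


def population(P0, k, t):
    """Exponential growth model P(t) = P0 * e^(k t)."""
    return P0 * math.exp(k * t)


k = growth_rate_from_doubling(20)          # doubling every 20 minutes
print(f"k = {k:.4f} per minute")
print(f"P(120 min) = {population(1000, k, 120):,.0f} cells")
```

Two hours is six doubling periods, so the model should land on 1000 · 2^6.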
2. Radioactive Decay
Unstable atoms decay independently and exponentially:
$$N(t) = N_0 e^{-\lambda t}, \qquad t_{1/2} = \frac{\ln 2}{\lambda}$$
λ is the decay constant and t_{1/2} is the half-life.
Example: Carbon-14 has a half-life of 5730 years — the foundation of carbon dating.
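As a sketch of how the half-life turns into a dating tool, the code below converts the 5730-year half-life into a decay constant and inverts the decay law to estimate an age from the fraction of carbon-14 remaining. The function names are illustrative, and real carbon dating involves calibration steps that are not modeled here.

```python
import math

HALF_LIFE_C14 = 5730.0  # years, as quoted above


def decay_constant(half_life):
    """lambda = ln(2) / half-life."""
    return math.log(2) / half_life


def remaining_fraction(t, half_life=HALF_LIFE_C14):
    """N(t) / N0 = e^(-lambda t)."""
    return math.exp(-decay_constant(half_life) * t)


def age_from_fraction(fraction, half_life=HALF_LIFE_C14):
    """Invert the decay law: t = -ln(fraction) / lambda."""
    return -math.log(fraction) / decay_constant(half_life)


print(f"fraction left after one half-life: {remaining_fraction(5730):.3f}")
print(f"age of a sample with 25% remaining: {age_from_fraction(0.25):.0f} years")
```

A sample with a quarter of its carbon-14 left has been through two half-lives, so the second line should report twice 5730 years.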
3. Newton's Law of Cooling
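An object's temperature relaxes exponentially toward its surroundings:
$$T(t) = T_{\text{env}} + (T_0 - T_{\text{env}})\,e^{-kt}$$
T_env is room temperature; T_0 is the starting temperature; k > 0 is the cooling constant.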
Example: A 90°C coffee in a 20°C room cools toward 20°C, with the gap shrinking exponentially.
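The coffee example can be sketched numerically once a cooling constant is chosen. The value k = 0.1 per minute below is a hypothetical number picked for illustration (the text does not give one), and the function names are likewise assumptions.

```python
import math


def temperature(t, T0, T_env, k):
    """Newton's law of cooling: T(t) = T_env + (T0 - T_env) * e^(-k t)."""
    return T_env + (T0 - T_env) * math.exp(-k * t)


def time_to_reach(T_target, T0, T_env, k):
    """Solve T(t) = T_target for t (T_target must lie between T_env and T0)."""
    return -math.log((T_target - T_env) / (T0 - T_env)) / k


K = 0.1  # per minute: a hypothetical cooling constant, not from the text

for minutes in (0, 5, 10, 30, 60):
    print(f"t = {minutes:>2} min  T = {temperature(minutes, 90, 20, K):.1f} °C")
print(f"gap to room temperature halves after {time_to_reach(55, 90, 20, K):.2f} min")
```

The halving time of the temperature gap is ln(2)/k, the same half-life relation that governs radioactive decay.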
Machine Learning Applications
Exponentials appear at the very heart of modern ML. The reason is always the same: we need to map arbitrary real-valued scores into positive numbers (probabilities, rates, weights), and e^x is the smooth, differentiable way to do it.
1. Softmax
Converts a vector of logits into a probability distribution:
$$\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$$
Every logit is exponentiated (forcing it positive), then normalized so the outputs sum to 1. Softmax is used in most classification heads and inside standard attention layers.
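Written out in plain Python, the definition is only three lines. This is a textbook sketch rather than a production implementation: it exponentiates the logits directly, which is fine for small values but can overflow for large ones. The sample logits are hypothetical.

```python
import math


def softmax(logits):
    """Textbook softmax: exponentiate each logit, then normalize to sum to 1."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])   # hypothetical logits for three classes
print([round(p, 4) for p in probs])
print("sum =", round(sum(probs), 12))
```

The largest logit gets the largest probability, but every class keeps a strictly positive share because e^z is never zero.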
2. Cross-Entropy Loss
$$L = -\sum_i y_i \log p_i, \qquad p_i = \text{softmax}(z)_i$$
The logarithm is the inverse of the exponential in softmax. The two cancel inside the gradient, producing the famously clean result ∂L/∂z_i = p_i - y_i.
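The clean gradient p - y can be checked against a finite-difference estimate. The sketch below assumes the standard loss L = -Σ y_i log p_i with p = softmax(z) and a one-hot target; the function names, logits, and target are illustrative, and the softmax subtracts the maximum logit purely for numerical safety.

```python
import math


def softmax(z):
    m = max(z)                                  # shift for numerical safety
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(z, y):
    """L = -sum_i y_i * log(softmax(z)_i) for a target distribution y."""
    return -sum(yi * math.log(pi) for yi, pi in zip(y, softmax(z)))


def numeric_grad(z, y, h=1e-6):
    """Central-difference estimate of dL/dz_i for each logit."""
    grads = []
    for i in range(len(z)):
        up = z[:i] + [z[i] + h] + z[i + 1:]
        down = z[:i] + [z[i] - h] + z[i + 1:]
        grads.append((cross_entropy(up, y) - cross_entropy(down, y)) / (2 * h))
    return grads


z = [2.0, 1.0, 0.1]          # hypothetical logits
y = [1.0, 0.0, 0.0]          # one-hot target: class 0 is correct
analytic = [p - t for p, t in zip(softmax(z), y)]
numeric = numeric_grad(z, y)
for a, n in zip(analytic, numeric):
    print(f"analytic = {a:+.6f}   numeric = {n:+.6f}")
```

The gradient components sum to zero, because both p and y sum to 1.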
3. Exponential Learning-Rate Decay
We start with a big learning rate (to make fast progress), then exponentially shrink it (to fine-tune as we converge). Same shape as radioactive decay — same math.
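One common way to write such a schedule is lr(step) = lr0 · e^(-k · step). The sketch below assumes that form; the function name, the initial rate 0.1, and the decay rate 0.01 are hypothetical values, not settings taken from any particular library.

```python
import math


def exponential_lr(step, lr0=0.1, decay_rate=0.01):
    """Exponentially decayed learning rate: lr0 * e^(-decay_rate * step)."""
    return lr0 * math.exp(-decay_rate * step)


half_life_steps = math.log(2) / 0.01     # steps needed to halve the rate

for step in (0, 50, 100, 200, 500):
    print(f"step {step:>3}: lr = {exponential_lr(step):.5f}")
print(f"learning rate halves every {half_life_steps:.1f} steps")
```

The halving interval plays the same role as the half-life in radioactive decay, which makes it a convenient knob for choosing `decay_rate`.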
4. Attention Mechanisms
The softmax inside attention is the only nonlinearity in a transformer's mixing operation. The exponential makes attention weights sharp around the most-relevant key, while staying differentiable.
Python Implementation
We start with plain Python + NumPy + Matplotlib. First we plot the family; then we use the formula (1 + 1/n)^n to watch e emerge numerically; then we verify the self-derivative property with a central-difference quotient.
Plotting the Exponential Family
Building e from Bernoulli's Limit
This is the worked-example table, but produced by the computer so we can push n to a billion and confirm agreement with e to about 8 decimal places.
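A minimal standard-library sketch of that table follows. It is not the original listing: the function names are illustrative, and it adds one precaution worth knowing about. For very large n, evaluating (1 + 1/n)**n directly loses accuracy because 1 + 1/n is rounded before it is raised to the power, so the sketch also computes the same quantity as exp(n · log1p(1/n)).

```python
import math


def bernoulli_naive(n):
    """Direct evaluation of (1 + 1/n)^n."""
    return (1 + 1 / n) ** n


def bernoulli_stable(n):
    """Same quantity via exp(n * log1p(1/n)), which avoids rounding 1 + 1/n first."""
    return math.exp(n * math.log1p(1 / n))


print(f"{'n':>13}  {'naive':>14}  {'stable':>14}  {'gap to e (stable)':>18}")
for n in (1, 12, 365, 8760, 10**6, 10**9):
    naive, stable = bernoulli_naive(n), bernoulli_stable(n)
    print(f"{n:>13,}  {naive:>14.10f}  {stable:>14.10f}  {math.e - stable:>18.2e}")
```

At n = 10^9 the stable column sits about e/(2n) below e, while the naive column drifts at around the seventh decimal place: a floating-point artifact, not a property of the limit.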
Verifying That e^x Is Its Own Derivative
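A central-difference quotient gives a quick numerical check. This is a standard-library sketch with an illustrative helper name and step size, not the original listing; it compares the estimated slope of e^x with the value of e^x at a few sample points.

```python
import math


def central_difference(f, x, h):
    """Symmetric difference quotient (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


H = 1e-5  # step size: small enough for accuracy, large enough to limit round-off

for x in (-1.0, 0.0, 1.0, 2.0):
    slope = central_difference(math.exp, x, H)
    height = math.exp(x)
    print(f"x = {x:+.1f}  e^x = {height:.8f}  slope ≈ {slope:.8f}  "
          f"diff = {abs(slope - height):.1e}")
```

The step size is a trade-off: shrinking h reduces the truncation error (which scales like h^2) but amplifies floating-point round-off (which scales like 1/h).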
PyTorch Implementation
Now in PyTorch. We'll use the very same e^x in two places: building softmax probabilities (the ML-flavored use of exponentials) and confirming the self-derivative property via autograd (the calculus-flavored use).
Why autograd nails the self-derivative property exactly
When PyTorch computes the derivative of e^x, it does not use a numerical approximation. Internally, the backward node for exp (ExpBackward) caches the forward output e^x (since the derivative happens to equal the forward value) and re-uses it as the gradient, so the assertion torch.allclose(y, x.grad) holds to floating-point precision rather than merely within a loose tolerance.
Common Pitfalls
Confusing Exponential with Power Functions
2^x (exponential) is NOT the same as x^2 (power function):
- 2^x: variable exponent, fixed base, giving exponential growth.
- x^2: fixed exponent, variable base, giving polynomial growth.
For large x, exponentials always dominate: 2^100 ≈ 1.27 × 10^30 while 100^2 = 10,000.
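A short table shows where the two functions part ways. The sketch assumes the pair 2^x and x^2 as the representative example; the function names are illustrative.

```python
def exponential_fn(x):
    """Variable exponent, fixed base."""
    return 2 ** x


def power_fn(x):
    """Fixed exponent, variable base."""
    return x ** 2


print(f"{'x':>4}  {'x^2':>8}  {'2^x':>34}")
for x in (1, 2, 3, 4, 5, 10, 20, 50, 100):
    print(f"{x:>4}  {power_fn(x):>8,}  {exponential_fn(x):>34,}")
```

The two functions tie at x = 2 and x = 4, the power function is briefly ahead at x = 3, and from x = 5 onward the exponential pulls away for good.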
Negative Bases Are Not Allowed
f(x) = (-2)^x is not a valid exponential function:
- (-2)^(1/2) = √(-2) is not real.
- Many real x produce complex or undefined results.
That is why we require a > 0.
`e` vs `exp()` in code
In Python, e**x only works if you first set e = math.e. Otherwise e is undefined and you'll get a NameError. Prefer the explicit functions:
- math.exp(x) or numpy.exp(x) in Python
- torch.exp(x) in PyTorch
- Math.exp(x) in JavaScript
Softmax Overflow
A naive softmax computes e^(z_i) directly. With a logit of 100, this is e^100 ≈ 2.7 × 10^43, far beyond float32's ~3.4e38 limit. Always subtract max(z) from every logit first (this is what F.softmax does internally). The result is mathematically identical but numerically safe.
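The failure and the fix can both be reproduced in plain Python. One assumption to flag: Python floats are float64, so a logit of 100 does not overflow here; the sketch uses logits near 1000 to trigger the same failure. The function names are illustrative.

```python
import math


def softmax_naive(logits):
    """Direct definition: overflows once any logit exceeds about 709 in float64."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_stable(logits):
    """Subtract the largest logit first; the shift cancels in the ratio."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]   # largest exponent is now 0
    total = sum(exps)
    return [e / total for e in exps]


logits = [1000.0, 999.0, 998.0]   # large enough to overflow float64
try:
    softmax_naive(logits)
except OverflowError as err:
    print("naive softmax failed:", err)

print("stable softmax:", [round(p, 4) for p in softmax_stable(logits)])
```

The shift works because multiplying numerator and denominator by e^(-m) leaves the ratio unchanged, so the stable version returns the same probabilities as softmax of [2, 1, 0].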
Test Your Understanding
Summary
Exponential functions describe multiplicative change — the universal pattern whenever the rate of change is proportional to the current quantity.
Key Formulas
| Formula | Description |
|---|---|
| f(x) = a^x | General exponential (a > 0, a ≠ 1) |
| f(x) = e^x | Natural exponential (e ≈ 2.718) |
| e = lim (1 + 1/n)^n | Bernoulli's definition of e |
| e = Σ 1/n! | Series definition of e |
| d/dx e^x = e^x | Self-derivative property |
| d/dx a^x = a^x · ln(a) | Derivative for any base |
| A = P e^(rt) | Continuous compounding / growth |
| softmax(z_i) = e^z_i / Σ e^z_j | ML probability normalization |
Key Takeaways
- Exponentials model multiplicative change: each step multiplies by the same factor.
- Every exponential passes through (0, 1) because a^0 = 1.
- Base > 1 gives growth; 0 < base < 1 gives decay; the x-axis is the asymptote on one side.
- Euler's number e emerges as the limit of (1 + 1/n)^n, the natural ceiling of continuous compounding.
- e^x is the unique exponential a^x that equals its own derivative, confirmed analytically, numerically, and via PyTorch autograd.
- Exponentials are everywhere: population, decay, cooling, finance, softmax, attention, learning-rate schedules.
Coming Next: in the next section we invert the exponential to get the logarithm. You will see why the logarithm turns multiplication into addition, why the natural log ln(x) is the inverse of e^x, and how all of this powers the cross-entropy loss in deep learning.