Chapter 1

Exponential Functions: The Mathematics of Growth

Mathematical Functions - The Building Blocks

Learning Objectives

By the end of this section, you will be able to:

  1. Define exponential functions \(f(x) = a^x\) and explain the role of the base.
  2. Explain why Euler's number \(e \approx 2.71828\) is special in calculus.
  3. Distinguish between exponential growth (base > 1) and decay (0 < base < 1).
  4. Derive \(e\) from the compound-interest limit and confirm it numerically by hand.
  5. Recognize the unique self-derivative property \(\tfrac{d}{dx} e^x = e^x\).
  6. Apply exponentials to model growth, decay, cooling, and softmax / cross-entropy in ML.

The Big Picture: Why Exponential Functions Matter

"The greatest shortcoming of the human race is our inability to understand the exponential function." — Albert Bartlett, physicist

Linear functions describe additive change: every step adds the same amount. Exponential functions describe multiplicative change: every step multiplies by the same factor. That single switch — from plus to times — is responsible for everything from compound interest to viral spread to the softmax layer in a transformer.

The Core Insight

Linear: "I add 10 every hour" → 10, 20, 30, 40, … (arithmetic sequence).

Exponential: "I double every hour" → 1, 2, 4, 8, 16, 32, … (geometric sequence).

After 10 hours the linear rule gives 100; the exponential rule gives 1024. After 30 hours the linear rule gives 300; the exponential rule gives 1,073,741,824. Same elapsed time, yet the two totals differ by more than a billion (a ratio of roughly 3.6 million to one). That gap is the exponential function.
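The two rules are easy to tabulate. A minimal sketch follows; the helper names `linear_rule` and `doubling_rule` are made up for this illustration and simply encode "add 10 per hour" and "double every hour, starting from 1".

```python
def linear_rule(hours, step=10):
    """Additive change: add `step` once per hour, starting from 0."""
    return step * hours


def doubling_rule(hours, start=1):
    """Multiplicative change: double once per hour, starting from `start`."""
    return start * 2 ** hours


for hours in (10, 30):
    lin = linear_rule(hours)
    exp = doubling_rule(hours)
    # The ratio shows how far the multiplicative rule has pulled ahead.
    print(f"{hours:>2} h  linear={lin:>4}  exponential={exp:>13,}  ratio={exp / lin:,.0f}")
```

The ratio column is the useful one: it keeps growing without bound, which is what separates multiplicative change from additive change.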

The Intuition: a Bank Account That Refuses to Be Boring

Imagine putting $1 in an account that grows continuously: every instant, the account adds a little interest, and that new interest immediately starts earning its own interest. The account is not just growing; it is growing because it is growing. That self-referential loop is the soul of \(e^x\).

Where Exponential Functions Appear

🦠 Biology

  • Bacterial colony growth
  • Viral spread before immunity kicks in
  • Radioactive decay of carbon-14
  • Drug concentration after a single dose

💰 Finance

  • Continuous compound interest
  • Reinvested investment returns
  • Inflation over decades
  • Loan amortization curves

⚛️ Physics

  • Radioactive half-life
  • RC capacitor discharge
  • Newton's law of cooling
  • Atmospheric pressure vs altitude

🤖 Machine Learning

  • Softmax over logits
  • Cross-entropy loss
  • Exponential learning-rate decay
  • Attention weights in transformers

Historical Origins: From Logs to Limits

The story of exponential functions weaves three discoveries together.

1. Napier's Logarithms (1614)

John Napier invented logarithms to speed up astronomical multiplication. He observed:

\[\log(a \times b) = \log(a) + \log(b)\]

Multiplication of large numbers reduced to addition of small ones. Logarithm tables exploded across Europe. But every logarithm implies an inverse — and that inverse is an exponential function.

2. Bernoulli's Banking Question (1683)

Jacob Bernoulli asked a question that looks like idle arithmetic but turned out to be cosmic:

If I invest $1 at 100% annual interest, and the bank compounds the interest more and more frequently, what is the most I can have at year's end?

Compounding once gives $2.00. Twice gives $2.25. Monthly gives $2.6130. Daily gives $2.7146. Hourly gives $2.7181. Continuous compounding gives a number Bernoulli could not put a name to:

\[\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = e \approx 2.71828\ldots\]

That limit defines Euler's number. The interactive table further down lets you watch the convergence happen one row at a time.

3. Euler's Unification (1748)

Leonhard Euler recognized that \(e^x\) has the magical property of being its own derivative, and he united exponentials with trigonometry through:

\[e^{i\theta} = \cos\theta + i\sin\theta\]

This connects the geometry of circles to the algebra of growth — one of the most beautiful equations in mathematics, and we will meet it again in Chapter 5.


Mathematical Definition

An exponential function with base \(a\) (where \(a > 0\) and \(a \neq 1\)) is defined as:

\[f(x) = a^x \qquad \text{for every real } x.\]

What Each Symbol Means

| Symbol | Name | Meaning |
|---|---|---|
| a | Base | A positive constant we keep fixed |
| x | Exponent | The variable input; can be any real number |
| a^x | Output | For a positive integer x, a multiplied by itself x times; extended to all real x by limits |

Why the Restrictions?

  • \(a > 0\): A negative base breaks for fractional exponents. For example, \((-1)^{1/2}\) is not real.
  • \(a \neq 1\): If \(a = 1\), then \(1^x = 1\) for every x, which is a flat horizontal line, not an exponential.
  • x can be any real number: Continuity lets us define \(a^x\) for irrational exponents, such as \(2^{\pi}\), via limits of rational exponents.
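The "limits of rational exponents" idea can be watched numerically. The sketch below is only an illustration, not a rigorous construction: it uses Python's `fractions.Fraction` to get rational approximations p/q of π with growing denominators (the denominator caps are arbitrary choices) and evaluates 2^(p/q) for each.

```python
import math
from fractions import Fraction

target = 2.0 ** math.pi  # floating-point value of 2^pi, for comparison

# Rational approximations of pi with growing denominator limits.
for max_den in (10, 1_000, 100_000, 10_000_000):
    q = Fraction(math.pi).limit_denominator(max_den)
    # 2 ** (p/q) is the q-th root of 2 ** p; a float power computes the same value.
    approx = 2.0 ** (q.numerator / q.denominator)
    print(f"{str(q):>20}  2^(p/q) = {approx:.10f}  error = {abs(approx - target):.1e}")

print(f"{'pi':>20}  2^pi    = {target:.10f}")
```

Each better rational exponent lands closer to the same value, which is what it means to define \(2^{\pi}\) by continuity.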

The Natural Exponential Function

The most important exponential uses Euler's number e:

\[f(x) = e^x \qquad \text{where } e \approx 2.71828\ldots\]

It is often written \(\exp(x)\) in code and papers.


Exploring Exponential Functions

Drag the base slider below. Watch how the curve's shape transforms as you cross \(a = 1\), the boundary between growth and decay. Hover anywhere on the plot to read off the exact value of \(a^x\) at that x.

[Interactive figure: Exponential Function Explorer. Explore how changing the base affects exponential growth. The base slider runs from 0.1 (decay) through e = 2.718... to 4 (rapid growth); the plot shows f(x) = a^x for x from -3 to 3 and marks the point (0, 1). Readouts shown for a = 2.00:]

  • Y-intercept: (0, 1), because a^0 = 1 for all a > 0.
  • Behavior: exponential growth; increases without bound.
  • Derivative: 2.00^x · ln(2.00) ≈ 0.693 × original.

What to Notice as You Play

  • All curves cross at (0, 1), because \(a^0 = 1\) for every positive a. This is the family's universal anchor.
  • Base > 1 → growth. The curve rises ever faster as x grows.
  • 0 < Base < 1 → decay. The curve falls toward zero but never touches it.
  • The x-axis is an asymptote on one side. Growth curves hug y=0 as x→-∞; decay curves hug it as x→+∞.
  • The base e ≈ 2.718 sits between 2 and 3. It is geometrically unremarkable — but calculus elevates it to king status because its derivative equals itself.

Euler's Number e: The Most Important Constant in Calculus

Alongside \(\pi\), Euler's number \(e\) is one of the most important constants in mathematics. It appears in any system where the rate of change is proportional to the current size, a pattern that shows up throughout nature.

Four Equivalent Definitions of e

1. As a limit (compound interest).
\[e = \lim_{n \to \infty} \left(1 + \tfrac{1}{n}\right)^n\]
2. As an infinite series.
\[e = \sum_{n=0}^{\infty} \frac{1}{n!} = 1 + 1 + \tfrac{1}{2} + \tfrac{1}{6} + \tfrac{1}{24} + \cdots\]
3. As the base for which f' = f.
\[\frac{d}{dx} e^x = e^x\]
4. As an area under 1/t.
\[e \text{ is the unique number with } \int_1^e \tfrac{1}{t}\,dt = 1.\]

Each definition emphasizes a different face of \(e\): financial, algebraic, differential, and integral. All four pinpoint the exact same constant.
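All four definitions can be checked against each other numerically. The following is a rough sketch using only the standard library; the truncation points (`n`, the 18 series terms, `h`, and `steps`) are arbitrary choices made for this illustration.

```python
import math

# 1. Limit definition, evaluated at a large but finite n.
n = 1_000_000
e_limit = (1 + 1 / n) ** n

# 2. Series definition: partial sum of 1/k! for k = 0..17.
e_series = sum(1 / math.factorial(k) for k in range(18))

# 3. Self-derivative: a central difference of exp at x = 0 should equal exp(0) = 1.
h = 1e-5
slope_at_0 = (math.exp(h) - math.exp(-h)) / (2 * h)

# 4. Area under 1/t from 1 to e, approximated with the midpoint rule.
steps = 100_000
width = (math.e - 1) / steps
area = sum(width / (1 + (i + 0.5) * width) for i in range(steps))

print(f"limit  : {e_limit:.10f}")
print(f"series : {e_series:.10f}")
print(f"slope of e^x at 0 : {slope_at_0:.10f}")
print(f"area under 1/t on [1, e] : {area:.10f}")
```

The series converges far faster than the limit: 18 terms already reach machine precision, while a million compounding periods give only about five correct decimals.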

Its Numerical Value

\[e = 2.71828\,18284\,59045\,23536\ldots\]

Like \(\pi\), \(e\) is irrational (it cannot be written as a ratio of integers) and transcendental (it is not a root of any nonzero polynomial with integer coefficients).


Compound Interest & The Discovery of e

Play with the demo below. Push the compounding slider to the right and watch the final balance climb toward a ceiling. That ceiling is \(Pe^{rt}\).

[Interactive figure: Compound Interest & The Discovery of e. See how continuous compounding leads to Euler's number. Amount ($) is plotted against time (0 to 10 years) for three curves: simple interest, compounding 12 times per year, and continuous compounding. Final balances shown:]

| Method | Formula | Final amount |
|---|---|---|
| Simple interest | A = P(1 + rt) | $2,000 |
| Compound (12x/year) | A = P(1 + r/n)^(nt) | $2,707.04 |
| Continuous | A = Pe^(rt) | $2,718.28 |

💡The Birth of e: What happens as we compound more frequently?

With $1 at 100% interest for 1 year, watch what happens to (1 + 1/n)^n as n increases:

| n (compounds/year) | (1 + 1/n)^n |
|---|---|
| 1 (annual) | 2.0000000000 |
| 2 | 2.2500000000 |
| 4 | 2.4414062500 |
| 12 (monthly) | 2.6130352902 |
| 52 | 2.6925969544 |
| 365 (daily) | 2.7145674820 |
| 8760 | 2.7181266916 |
| 100000 | 2.7182682372 |
| n → ∞ (limit) | e = 2.7182818284... |

This limit is exactly how Euler's number e was discovered!

The Compound Interest Formula

\[A = P \left(1 + \frac{r}{n}\right)^{nt}\]

| Symbol | Meaning |
|---|---|
| A | Final amount after t years |
| P | Principal (initial deposit) |
| r | Annual interest rate (decimal, e.g. 0.05 for 5%) |
| n | Number of compounding periods per year |
| t | Time in years |

The Limit as Compounding Becomes Continuous

Let's push \(n \to \infty\) (compounding every instant). Substitute \(m = n/r\) so that \(n = mr\):

\[A = P \left(1 + \tfrac{r}{n}\right)^{nt} = P \left(1 + \tfrac{1}{m}\right)^{mrt} = P \left[\left(1 + \tfrac{1}{m}\right)^{m}\right]^{rt}.\]

As \(n \to \infty\) we also have \(m \to \infty\), and the inner bracket converges to \(e\). So:

\[\boxed{\,A = P e^{rt}\,}\]

The Practical Insight

Continuous compounding gives \(A = P e^{rt}\). In practice the gap between daily and continuous compounding is tiny, but the algebraic simplicity of \(e^{rt}\) makes it the standard form in the differential equations of finance and physics.
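How tiny is the gap between daily and continuous compounding? A short sketch, assuming illustrative values of P = 1000, r = 10% and t = 10 years (these inputs and the function names `compound` and `continuous` are chosen for this example):

```python
import math


def compound(principal, rate, times_per_year, years):
    """Discrete compounding: A = P * (1 + r/n) ** (n * t)."""
    return principal * (1 + rate / times_per_year) ** (times_per_year * years)


def continuous(principal, rate, years):
    """Continuous compounding: A = P * e ** (r * t)."""
    return principal * math.exp(rate * years)


P, r, t = 1000.0, 0.10, 10  # illustrative inputs

monthly = compound(P, r, 12, t)
daily = compound(P, r, 365, t)
cont = continuous(P, r, t)

print(f"monthly    : {monthly:10.2f}")
print(f"daily      : {daily:10.2f}")
print(f"continuous : {cont:10.2f}")
print(f"daily falls short of continuous by {cont - daily:.2f}")
```

With these inputs the daily balance trails the continuous one by well under a dollar, which is why the simpler continuous formula is usually an acceptable stand-in.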


Worked Example: Bernoulli's Dollar by Hand

Before reading the code, do this with a pencil. Set \(P = 1\), \(r = 1\) (100% annual interest), \(t = 1\) year. The formula collapses to \(A(n) = (1 + 1/n)^n\). Compute six rows by hand and watch the convergence.


Step 1. Yearly compounding (n = 1).

\[A(1) = \left(1 + \tfrac{1}{1}\right)^1 = 2^1 = 2.00000000\]

Step 2. Semi-annual (n = 2). Inner term: \(1 + 1/2 = 1.5\).

\[A(2) = 1.5^2 = 1.5 \times 1.5 = 2.25000000\]

Step 3. Quarterly (n = 4). Inner term: \(1.25\).

\[A(4) = 1.25^4 = (1.25^2)^2 = 1.5625^2 = 2.44140625\]

Step 4. Monthly (n = 12). Inner term: \(1 + 1/12 \approx 1.08333\).

\[A(12) = 1.08333\ldots^{12} \approx 2.61303529\]

Step 5. Daily (n = 365). Inner term: \(1 + 1/365 \approx 1.0027397\).

\[A(365) = 1.0027397\ldots^{365} \approx 2.71456748\]

Step 6. Hourly (n = 8760).

\[A(8760) \approx 2.71812669\]

Pattern. Stack the answers side by side:

| n | A(n) | Gap to e ≈ 2.71828182 |
|---|---|---|
| 1 | 2.00000000 | ≈ 7.18 × 10⁻¹ |
| 2 | 2.25000000 | ≈ 4.68 × 10⁻¹ |
| 4 | 2.44140625 | ≈ 2.77 × 10⁻¹ |
| 12 | 2.61303529 | ≈ 1.05 × 10⁻¹ |
| 365 | 2.71456748 | ≈ 3.71 × 10⁻³ |
| 8760 | 2.71812669 | ≈ 1.55 × 10⁻⁴ |
| n → ∞ | 2.71828182… | 0 |

Each time \(n\) grows by roughly 10x, the gap shrinks by roughly 10x: the gap is approximately \(e/(2n)\), so it falls off like \(1/n\). That is slow convergence; every extra correct digit costs about ten times as many compounding periods.

The intuition. For huge \(n\), every period we add \(1/n\) of our balance, and that tiny addition immediately starts earning its own interest. The infinite tower of "interest on interest on interest …" converges because the \(k\)-th layer contributes at most \(1/k!\), and \(\sum_k 1/k!\) is finite. The sum is exactly \(e\).

Connection to the series definition. Bernoulli's limit and Euler's series are the same number: expanding \((1 + 1/n)^n\) by the binomial theorem gives terms that converge, one by one, to the terms \(1/k!\) of the series as \(n \to \infty\).
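The binomial connection can be written out explicitly. What follows is a sketch of the standard argument, using \(k\) (an index introduced here) for the term number so that it does not clash with the compounding frequency \(n\). Passing the limit inside the sum needs a justification (each term is bounded by \(1/k!\), which is summable); that step is stated here without proof.

```latex
\[
\left(1 + \frac{1}{n}\right)^{n}
  = \sum_{k=0}^{n} \binom{n}{k} \frac{1}{n^{k}}
  = \sum_{k=0}^{n} \frac{1}{k!} \cdot \frac{n(n-1)\cdots(n-k+1)}{n^{k}}
  = \sum_{k=0}^{n} \frac{1}{k!} \prod_{j=0}^{k-1} \left(1 - \frac{j}{n}\right).
\]
% For each fixed k, every factor (1 - j/n) tends to 1 as n -> infinity,
% so the k-th term tends to 1/k!, the k-th term of the series for e.
\[
\lim_{n \to \infty} \frac{1}{k!} \prod_{j=0}^{k-1} \left(1 - \frac{j}{n}\right) = \frac{1}{k!}
\qquad\Longrightarrow\qquad
\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^{n} = \sum_{k=0}^{\infty} \frac{1}{k!} = e.
\]
```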


Key Properties of Exponential Functions

The Laws of Exponents

| Property | Formula | Example |
|---|---|---|
| Product Rule | a^m · a^n = a^(m+n) | 2³ · 2² = 2⁵ = 32 |
| Quotient Rule | a^m / a^n = a^(m-n) | 3⁵ / 3² = 3³ = 27 |
| Power Rule | (a^m)^n = a^(m·n) | (2²)³ = 2⁶ = 64 |
| Zero Exponent | a^0 = 1 | 5⁰ = 1 |
| Negative Exponent | a^(-n) = 1 / a^n | 2⁻³ = 1/8 |
| Fractional Exponent | a^(1/n) = ⁿ√a | 8^(1/3) = 2 |

Function-Level Properties

  • Domain: all reals \((-\infty, \infty)\).
  • Range: positive reals \((0, \infty)\).
  • Y-intercept: always \((0, 1)\).
  • Horizontal asymptote: the line \(y = 0\).
  • No x-intercept: \(a^x > 0\) for every x.
  • Continuous and smooth: no breaks, no corners — derivatives of every order exist.
  • One-to-one: distinct x give distinct y — so the inverse (the logarithm) exists.

Growth vs Decay: The Role of the Base

The base alone decides whether \(a^x\) grows or shrinks as x increases.

📈 Exponential Growth (a > 1)

\[f(x) = a^x \text{ with } a > 1\]
  • Function increases as x increases
  • Accelerates: the slope itself grows
  • As \(x \to +\infty\), \(f(x) \to +\infty\)
  • As \(x \to -\infty\), \(f(x) \to 0^+\)

Examples: populations, compound interest, viral spread.

📉 Exponential Decay (0 < a < 1)

\[f(x) = a^x \text{ with } 0 < a < 1\]
  • Function decreases as x increases
  • Decelerates: the slope stays negative, but its magnitude shrinks
  • As \(x \to +\infty\), \(f(x) \to 0^+\)
  • As \(x \to -\infty\), \(f(x) \to +\infty\)

Examples: radioactive decay, cooling, drug clearance.

Converting Between Growth and Decay

A decay \((1/2)^x\) can be rewritten as growth with a flipped exponent: \((1/2)^x = 2^{-x}\). Likewise \(e^{-x}\) is the decay version of \(e^x\): same family, mirrored across the y-axis.


Preview: The Derivative of e^x

Here is the property that earns \(e\) its place at the heart of calculus:

\[\frac{d}{dx} e^x = e^x.\]

The slope of \(e^x\) at any point equals the height of \(e^x\) at that point. The function tells its own derivative what to be. In the demo below, slide the tangent point along the curve: the slope of the tangent line always equals the height of the function. Shrink the secant step \(h\) toward zero and watch the secant collapse onto the tangent.

[Interactive figure: The Derivative of e^x, Visualized. Watch how the secant line approaches the tangent as h → 0. The plot shows f(x) = e^x with its secant line and tangent line. Readouts shown for x = 1.00 and h = 0.500:]

  • Secant slope (difference quotient): [f(x+h) - f(x)] / h = [e^(1.00 + 0.500) - e^1.00] / 0.500 = 3.526814
  • Tangent slope (true derivative): f'(x) = e^x = e^1.00 = 2.718282
  • Error |secant - tangent|: 8.0853e-1. As h → 0, the error → 0 and the secant slope converges to the tangent slope.
  • The magical property of e^x: f(x) = e^x = 2.7183 and f'(x) = e^x = 2.7183. The function equals its own derivative.

What Makes This Unique to e?

For any other base, the derivative carries an extra factor:

\[\frac{d}{dx} a^x = a^x \cdot \ln(a).\]

The factor \(\ln(a)\) is the "tax" that every base except \(e\) pays for not being \(e\). Only when \(a = e\) do we get \(\ln(e) = 1\), and the derivative collapses back to the function itself.
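Where the extra factor comes from can be seen in two lines. This is a sketch that takes \(\tfrac{d}{dx} e^x = e^x\) and the chain rule as given: rewrite the base \(a\) in terms of \(e\), then differentiate.

```latex
\[
a^{x} = e^{\ln(a^{x})} = e^{x \ln a}
\]
% Chain rule: the outer function is exp, the inner function is x * ln(a),
% whose derivative with respect to x is the constant ln(a).
\[
\frac{d}{dx}\, a^{x}
  = \frac{d}{dx}\, e^{x \ln a}
  = e^{x \ln a} \cdot \ln a
  = a^{x} \ln(a).
\]
```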

Why This Matters for Calculus

The self-derivative property makes \(e^x\) the easiest function to differentiate and integrate:

  • Derivative: \(\dfrac{d}{dx} e^x = e^x\)
  • Integral: \(\displaystyle\int e^x \, dx = e^x + C\)

This is why \(e\) appears in the solutions of linear constant-coefficient differential equations, in radioactive-decay formulas, and in the softmax layer of neural networks.


Transformations of Exponential Functions

Like every function, exponentials can be shifted, stretched, and reflected. The general form is:

\[f(x) = A \cdot a^{\,B(x - h)} + k.\]

| Parameter | Effect | Example |
|---|---|---|
| A | Vertical stretch / compression; reflect if A<0 | 3 · 2^x is 3x taller than 2^x |
| a | Base: sets the growth/decay rate | e^x grows faster than 2^x |
| B | Horizontal stretch / compression | 2^(2x) compresses x-axis by 2 |
| h | Horizontal shift (right if h>0) | e^(x-2) shifts right by 2 |
| k | Vertical shift (up if k>0) | e^x + 3 shifts up by 3 |

Common Transformations

  • \(e^{-x}\): reflection across the y-axis (decay version of \(e^x\)).
  • \(-e^x\): reflection across the x-axis (always negative).
  • \(e^{2x}\): faster growth; horizontal compression by a factor of 2.
  • \(e^{x/2}\): slower growth; horizontal stretch by a factor of 2.

Real-World Applications

1. Population Growth

Under unlimited resources, populations grow exponentially:

\[P(t) = P_0 \, e^{rt}.\]

Here \(P_0\) is the initial population, \(r\) is the growth rate, and \(t\) is time.

Example: Bacteria that double every 20 minutes have \(r = \ln(2)/20 \approx 0.0347/\text{min}\). Starting with 1000 cells, after 2 hours: \(P = 1000 \cdot e^{0.0347 \times 120} \approx 64{,}000\).
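The bacteria example can be reproduced in a few lines. This sketch also shows how sensitive an exponential is to rounding in its rate: the variable names are chosen for this illustration only.

```python
import math

doubling_time_min = 20.0
r = math.log(2) / doubling_time_min   # growth rate per minute
P0 = 1000                             # starting cell count
t = 120                               # two hours, in minutes

P_exact = P0 * math.exp(r * t)        # six doublings: 1000 * 2**6
P_rounded = P0 * math.exp(0.0347 * t) # same formula with r rounded to 0.0347

print(f"r         = {r:.6f} per minute")
print(f"exact r   -> {P_exact:,.0f} cells")
print(f"rounded r -> {P_rounded:,.0f} cells")
```

Rounding the rate in the fourth decimal place shifts the final count by a few hundred cells, because the error sits in the exponent and is multiplied by t.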

2. Radioactive Decay

Unstable atoms decay independently and exponentially:

\[N(t) = N_0 \, e^{-\lambda t} = N_0 \left(\tfrac{1}{2}\right)^{t/t_{1/2}}.\]

\(\lambda\) is the decay constant and \(t_{1/2}\) is the half-life.

Example: Carbon-14 has a half-life of 5730 years — the foundation of carbon dating.
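The two forms of the decay law are tied together by \(\lambda = \ln(2)/t_{1/2}\), and inverting the law is what turns a measured fraction into an age. A minimal sketch, using the 5730-year half-life quoted above; the function names and the 25% sample are hypothetical.

```python
import math

HALF_LIFE_C14 = 5730.0                         # years, as quoted in the text
decay_constant = math.log(2) / HALF_LIFE_C14   # lambda, per year


def fraction_remaining(years):
    """N(t) / N0 = exp(-lambda * t)."""
    return math.exp(-decay_constant * years)


def age_from_fraction(fraction):
    """Invert the decay law: t = -ln(N / N0) / lambda."""
    return -math.log(fraction) / decay_constant


print(f"after one half-life : {fraction_remaining(5730):.4f}")
print(f"after 10,000 years  : {fraction_remaining(10_000):.4f}")
print(f"sample with 25% left is about {age_from_fraction(0.25):,.0f} years old")
```

A sample with a quarter of its carbon-14 left has been through two half-lives, so the inverse formula returns twice 5730 years.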

3. Newton's Law of Cooling

An object's temperature relaxes exponentially toward its surroundings:

\[T(t) = T_{\text{env}} + (T_0 - T_{\text{env}}) \, e^{-kt}.\]

\(T_{\text{env}}\) is room temperature; \(T_0\) is the starting temperature.

Example: A 90°C coffee in a 20°C room cools toward 20°C, with the gap shrinking exponentially.
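To put numbers on the coffee example we need a cooling constant, which the text does not give. The sketch below assumes k = 0.1 per minute purely for illustration; a real cup would need a measured value.

```python
import math


def temperature(t, t_env=20.0, t0=90.0, k=0.1):
    """Newton's law of cooling. k = 0.1 per minute is an assumed value."""
    return t_env + (t0 - t_env) * math.exp(-k * t)


# Time for the gap to the room temperature to halve, analogous to a half-life.
gap_half_time = math.log(2) / 0.1

for minutes in (0, 5, 10, 20, 40):
    print(f"t = {minutes:>2} min  T = {temperature(minutes):6.2f} °C")

print(f"the 70-degree gap halves every {gap_half_time:.2f} minutes")
```

The temperature itself does not halve; the gap to the surroundings does, which is why the curve flattens out at 20°C rather than at zero.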


Machine Learning Applications

Exponentials appear at the very heart of modern ML. The reason is always the same: we need to map arbitrary real-valued scores into positive numbers (probabilities, rates, weights), and \(e^x\) is the smooth, differentiable way to do it.

1. Softmax

Converts a vector of logits into a probability distribution:

\[\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}.\]

Every logit is exponentiated (forcing it positive), then normalized to sum to 1. Softmax is used in most multi-class classification heads and inside standard attention layers.

2. Cross-Entropy Loss

\[\mathcal{L} = -\sum_{i} y_i \log(\hat{y}_i).\]

The logarithm is the inverse of the exponential in softmax, so the two cancel inside the gradient. When \(\hat{y} = \mathrm{softmax}(z)\) and the targets \(y_i\) sum to 1, this produces the famously clean update \(\partial \mathcal{L} / \partial z_i = \hat{y}_i - y_i\).
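The clean gradient can be checked without any ML library. The sketch below uses only the standard library, made-up logits, and a one-hot target; it compares the analytic gradient \(\hat{y}_i - y_i\) with a central-difference estimate of the loss.

```python
import math


def softmax(z):
    """Softmax with the maximum subtracted first, which leaves the result unchanged."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z, y):
    """Cross-entropy of softmax(z) against target distribution y."""
    return -sum(yi * math.log(pi) for yi, pi in zip(y, softmax(z)))


logits = [2.0, 1.0, 0.1]   # illustrative scores
target = [1.0, 0.0, 0.0]   # one-hot target: class 0 is correct

analytic = [p - y for p, y in zip(softmax(logits), target)]

h = 1e-6
numeric = []
for i in range(len(logits)):
    up = logits.copy()
    down = logits.copy()
    up[i] += h
    down[i] -= h
    numeric.append((cross_entropy(up, target) - cross_entropy(down, target)) / (2 * h))

for a, n in zip(analytic, numeric):
    print(f"analytic = {a:+.8f}   numeric = {n:+.8f}")
```

The two columns agree to many decimal places, and the gradient components sum to zero because both the prediction and the target sum to 1.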

3. Exponential Learning-Rate Decay

\[\eta_t = \eta_0 \, e^{-k t}.\]

We start with a big learning rate (to make fast progress), then exponentially shrink it (to fine-tune as we converge). Same shape as radioactive decay — same math.
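A schedule of this shape takes one line to implement. The sketch below is framework-free; the starting rate 0.1 and decay constant k = 0.01 per step are assumed values for illustration, and `exponential_lr` is a hypothetical helper, not a library function.

```python
import math


def exponential_lr(step, lr0=0.1, k=0.01):
    """Learning rate after `step` updates: lr0 * exp(-k * step)."""
    return lr0 * math.exp(-k * step)


# Steps needed to halve the rate when k = 0.01, the analogue of a half-life.
half_life = math.log(2) / 0.01

for step in (0, 100, 200, 300):
    print(f"step {step:>3}: lr = {exponential_lr(step):.5f}")

print(f"the rate halves every {half_life:.1f} steps")
```

Every block of 100 steps multiplies the rate by the same factor, e^(-1), which is the defining feature of exponential decay.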

4. Attention Mechanisms

\[\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\tfrac{QK^\top}{\sqrt{d_k}}\right) V.\]

The softmax inside attention is the only nonlinearity in a transformer's mixing operation. The exponential makes attention weights sharp around the most-relevant key, while staying differentiable.
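To see the exponential sharpening at work, here is a toy version of the formula in plain Python. It is a sketch for tiny lists of row vectors, not how a real framework implements attention, and the query, keys, and values are made up for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention; returns (outputs, attention weights)."""
    d_k = len(K[0])
    outputs, all_weights = [], []
    for q in Q:
        # One score per key: dot(q, k) / sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output is the weighted average of the value rows.
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
        all_weights.append(weights)
    return outputs, all_weights


# Toy data: one query that points along the first key.
Q = [[4.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
V = [[10.0], [20.0], [30.0]]

out, weights = attention(Q, K, V)
print("weights:", [round(w, 4) for w in weights[0]])
print("output :", [round(o, 4) for o in out[0]])
```

The raw scores differ only linearly, but after exponentiation the best-matching key takes most of the weight, so the output sits close to that key's value.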


Python Implementation

We start with plain Python + NumPy + Matplotlib. First we plot the family; then we use the formula \(A(n) = (1 + 1/n)^n\) to watch e emerge numerically; then we verify the self-derivative property with a central-difference quotient.

Plotting the Exponential Family

Visualizing four exponential functions on one axes
🐍exponential_plot.py
Line 1: Import NumPy

NumPy gives us vectorized math. np.power and np.exp accept arrays element-wise, so we can compute base^x for 200 points at once.

EXAMPLE
np.power(2.0, np.array([0, 1, 2])) -> array([1., 2., 4.])
Line 2: Import matplotlib

pyplot is the plotting interface we'll use to draw the four curves on a single axes.

Line 4: Define exponential(x, base)

A thin wrapper around np.power so the call site reads like the math: exponential(x, base) computes base raised to x.

EXAMPLE
exponential(np.array([0, 1, 2]), 3.0) -> array([1., 3., 9.])
Line 5: Docstring states the math

The function implements f(x) = base ** x. We keep the docstring short because the formula is the whole story.

Line 6: Return base ** x

np.power broadcasts: scalar base, array x -> array of base^x. This is the single line where the entire visualization is computed.

EXAMPLE
Inputs base=2, x=[0,1,2,3] -> Output [1, 2, 4, 8]
Line 8: Create the x-axis grid

200 evenly spaced points from -3 to 3. Density matters: too few points and the curve looks like polylines, too many wastes work. 200 is a good default for a smooth curve over a small interval.

EXAMPLE
x[0] = -3.0, x[1] ≈ -2.9698, ..., x[199] = 3.0
Line 10: Four bases to compare

We pick bases that span the qualitative behaviors: 0.5 (decay), e ≈ 2.7183 (the natural base), 2.0 (a popular discrete base), and 3.0 (faster growth than e).

EXAMPLE
At x=1: 0.5^1=0.5, e^1≈2.7183, 2^1=2.0, 3^1=3.0. Notice e lives between 2 and 3.
Line 11: Color palette

Distinct hex colors so each curve stays visually separable when they overlap near x=0.

Line 12: Legend labels

We tag 0.5^x as decay so the reader doesn't have to compute that 0.5 < 1 implies a falling curve.

Line 14: Create the figure

figsize=(10, 6) gives a wide rectangle — better for showing that x = 3 is a *short* x-distance but a *huge* y-distance for exponential growth.

Line 15: Loop over the four bases

zip pairs (base, color, label) so each iteration draws one curve in its assigned color with its assigned legend entry.

EXAMPLE
Iter 1: base=0.5, color='#ef4444', label='0.5^x (decay)'
Iter 2: base=e, color='#22c55e', label='e^x'
Iter 3: base=2.0, color='#3b82f6', label='2^x'
Iter 4: base=3.0, color='#8b5cf6', label='3^x'
Line 16: Compute y = base^x

This is the line that produces the actual data. For base=2.0 and x=[-3,...,3] the endpoints are 2^-3=0.125 and 2^3=8.0 — a 64x range from a small change in x.

Line 17: Plot the curve

linewidth=2 is thick enough that all four curves remain readable when they cross near (0, 1).

Line 20: Mark (0, 1), the universal anchor

Every exponential a^x passes through (0, 1) because a^0 = 1 for any positive a. This is the only point all four curves share.

EXAMPLE
0.5^0 = 1, e^0 = 1, 2^0 = 1, 3^0 = 1
Line 21: Annotate the anchor

A small label so the reader's eye lands on (0, 1) immediately — the geometric heart of the family.

Line 23: Horizontal reference line y=0

The x-axis is the horizontal asymptote: a^x → 0 as x → -∞ for a > 1, and as x → +∞ for 0 < a < 1. The curves approach this line but never touch it.

Line 24: Vertical reference line x=0

Helps the eye align (0, 1) with the y-axis.

Line 25: Axis labels

We label the axes x and f(x). Notice we are NOT labeling f(x) as y — these are different concepts in calculus: f(x) is a value of the function.

Lines 26-27: Title and legend

Title states what the figure shows; legend distinguishes the four curves by color and label.

Line 28: Clip the y-range to [-1, 10]

Without this, 3^3 = 27 would dominate the plot and squash the interesting region near (0, 1). Clipping is a pedagogical choice, not a mathematical one.

EXAMPLE
Unclipped y-range would be roughly [0, 27]; clipped to [-1, 10] lets us still see 0.5^x decay clearly.
Line 29: Render

plt.show() pumps the figure to the screen. In a notebook this is implicit; in a script it's required.

import numpy as np
import matplotlib.pyplot as plt

def exponential(x, base):
    """General exponential function f(x) = base ** x."""
    return np.power(base, x)

x = np.linspace(-3, 3, 200)

bases  = [0.5, np.e, 2.0, 3.0]
colors = ['#ef4444', '#22c55e', '#3b82f6', '#8b5cf6']
labels = ['0.5^x (decay)', 'e^x', '2^x', '3^x']

plt.figure(figsize=(10, 6))
for base, color, label in zip(bases, colors, labels):
    y = exponential(x, base)
    plt.plot(x, y, color=color, linewidth=2, label=label)

# The universal anchor point (0, 1), true for every base.
plt.scatter([0], [1], color='red', s=80, zorder=5)
plt.annotate('(0, 1)', (0.1, 1.3), fontsize=12)

plt.axhline(0, color='gray', linestyle='--', alpha=0.5)
plt.axvline(0, color='gray', linestyle='--', alpha=0.5)
plt.xlabel('x'); plt.ylabel('f(x)')
plt.title('Exponential functions with different bases')
plt.legend(); plt.grid(True, alpha=0.3)
plt.ylim(-1, 10)
plt.show()

Building e from Bernoulli's Limit

This is the worked-example table, but produced by the computer so we can push \(n\) into the tens of millions (one compounding per second) and watch the gap to e shrink below \(10^{-7}\).

Numerically discovering e from (1 + 1/n)^n
🐍euler_from_limit.py
Line 1: Import math

We only need math.e (the double-precision value of e ≈ 2.7182818284...) for the comparison column. Everything else is pure arithmetic.

EXAMPLE
math.e = 2.718281828459045
Line 9: Choose compounding frequencies

We sweep n through the natural physical cadences: yearly (1), semi-annual (2), quarterly (4), monthly (12), daily (365), hourly (8760), every minute (525,600), every second (31,536,000).

EXAMPLE
8_760 = 24 * 365 (hours in a year)
525_600 = 60 * 8_760 (minutes in a year)
Line 11: Header row

Three columns: the compounding frequency n, the resulting balance A(n), and how far it is from e. We use right-justified widths so the decimal points line up.

Line 12: Separator

A line of dashes that visually delimits the header from the data — a small thing that makes the output skim-readable.

Line 14: Loop over n

Each iteration computes one row of the convergence table. The loop body is intentionally short: one formula, one gap, one print.

EXAMPLE
Iter 1: n=1     -> A(1)     = 2.0000000000      gap ≈ 7.18e-01
Iter 2: n=2     -> A(2)     = 2.2500000000      gap ≈ 4.68e-01
Iter 3: n=4     -> A(4)     ≈ 2.4414062500      gap ≈ 2.77e-01
Iter 4: n=12    -> A(12)    ≈ 2.6130352902      gap ≈ 1.05e-01
Iter 5: n=365   -> A(365)   ≈ 2.7145674820      gap ≈ 3.71e-03
Iter 6: n=8760  -> A(8760)  ≈ 2.7181266916      gap ≈ 1.55e-04
Line 15: Compute A(n) = (1 + 1/n)^n

This single line is the entire Bernoulli experiment. Notice it has two competing forces: (1 + 1/n) → 1 (which would push the whole expression to 1), and the exponent n → ∞ (which would push toward ∞). The limit is finite: e.

EXAMPLE
n=1:   (1 + 1)^1     = 2^1     = 2.0
n=2:   (1 + 0.5)^2   = 1.5^2   = 2.25
n=4:   (1.25)^4               ≈ 2.44140625
n=12:  (1.08333...)^12         ≈ 2.61303529
Line 16: Compute the gap to e

math.e - a_n is positive and shrinks toward zero, numerical evidence that A(n) → e from below. The gap shrinks roughly tenfold each time n is multiplied by 10, which is the rate-of-convergence signature of this limit (gap ≈ e / (2n)).

EXAMPLE
n=12:    gap ≈ 1.053e-01
n=365:   gap ≈ 3.715e-03   (about 28x smaller; n grew 30x)
n=8760:  gap ≈ 1.550e-04   (about 24x smaller; n grew 24x)
Line 17: Print a formatted row

f-string format specs: '>12' = right-justified width 12, '.10f' = 10 decimal places, '.2e' = scientific notation with 2 decimals. The alignment makes the convergence visually obvious.

Line 19: Print the true value of e

Showing math.e at the bottom lets the reader confirm by eye that the row n=31,536,000 matches e to about 7 decimal places (the gap is roughly e/(2n) ≈ 4 × 10⁻⁸). In that sense 'continuous compounding' is just 'compounding so often that the limit is reached for any practical accuracy'.

EXAMPLE
Final row: n=31_536_000 -> A(n) ≈ 2.7182818 (later digits depend on floating-point rounding)  vs  e ≈ 2.7182818285
import math

# Bernoulli's question (1683):
#   $1 at 100% annual interest, compounded n times per year.
#   What is the maximum we can earn in one year as n -> infinity?
#
# A(n) = (1 + 1/n) ** n

ns = [1, 2, 4, 12, 365, 8_760, 525_600, 31_536_000]

print(f"{'n':>12}  {'A(n) = (1 + 1/n)^n':>22}  {'gap to e':>12}")
print("-" * 52)

for n in ns:
    a_n = (1 + 1 / n) ** n
    gap = math.e - a_n
    print(f"{n:>12}  {a_n:>22.10f}  {gap:>12.2e}")

print(f"\n{'e':>12}  {math.e:>22.10f}")

Verifying That e^x Is Its Own Derivative

d/dx e^x = e^x, verified analytically and numerically
🐍exponential_derivative.py
Line 1: Import NumPy

We use np.exp (the natural exponential), np.power (general exponential), and np.log (natural logarithm).

Line 3: Define derivative_exponential

Returns the analytic derivative of base^x. The formula d/dx[a^x] = a^x · ln(a) follows from rewriting a^x = e^(x ln a) and applying the chain rule.

Line 5: Docstring captures the magic

The docstring states the rule and immediately notes the special case base = e where ln(e) = 1 collapses the formula to f'(x) = f(x).

Line 8: Return a^x · ln(a)

Vectorized: np.power(base, x) gives a^x, np.log(base) gives ln(a). Their product is the derivative.

EXAMPLE
derivative_exponential(1.5, 2.0)
  np.power(2.0, 1.5) = 2.8284271247
  np.log(2.0)        = 0.6931471806
  product            = 1.9605162869
Line 11: Choose a test point

x = 1.5 is a generic non-integer point. We avoid x=0 and x=1 because they are degenerate (a^0=1, a^1=a) and don't expose the multiplicative structure.

Line 14: Compute f(x) = e^x at x = 1.5

np.exp(1.5) ≈ 4.4816890703. This is the value of the natural exponential at x = 1.5.

EXAMPLE
f_e = e^1.5 ≈ 4.4816890703
Line 15: Compute the analytic derivative

For base e: f'(x) = e^x — the very same value. So f_e_prime is the same number as f_e. This is the defining miracle of e.

EXAMPLE
f_e_prime = e^1.5 ≈ 4.4816890703   (identical to f_e)
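Line 18: Pick a tiny step h

h = 1e-6 balances two error sources in the central-difference quotient: truncation error (about h^2 / 6 · f''', roughly 1e-12 here) and floating-point cancellation (about machine epsilon · f / h, on the order of 1e-10 to 1e-9 here). The estimate is therefore typically accurate to about 9 decimal places.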

Line 19: Central-difference quotient

Numerical derivative: (f(x+h) - f(x-h)) / (2h). This independent estimate confirms the analytic answer — useful because students often distrust the 'it equals itself' claim.

EXAMPLE
x = 1.5, h = 1e-6
  e^(1.500001)  ≈ 4.481693552
  e^(1.499999)  ≈ 4.481684589
  difference    ≈ 0.000008963
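  / (2*h)       ≈ 4.48168907   <-- matches analytic to about 9 decimals
Line 21: Print the e^x summary

Three lines show f(x), the analytic f'(x), and the numeric f'(x). The first two are identical, and the numeric estimate agrees with them to about 9 decimal places; its last printed digit or two can differ because of floating-point rounding.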

Line 25: The killer line: ratio f'(x)/f(x) = 1

Among all positive bases, only base e gives this exact ratio of 1. Every other base produces a ratio equal to ln(base).

Line 28: Now switch to base 2

We repeat the same experiment with base = 2.0 to show what happens for a 'non-magic' base.

Lines 29-30: Compute 2^x and its derivative

Same structure: np.power for the value, derivative_exponential for the analytic derivative.

EXAMPLE
f_2 = 2^1.5 ≈ 2.8284271247
f_2_prime = 2^1.5 · ln(2) ≈ 1.9605162869
Line 31: Compute the ratio for 2^x

ratio = f'(x) / f(x) = ln(2) ≈ 0.6931, NOT 1. The function 2^x grows, but its growth rate is only 69.3% of its current value.

Lines 33-36: Print the 2^x summary

The final line shows ratio ≈ 0.6931 alongside ln(2) ≈ 0.6931, confirming d/dx[2^x] = 2^x · ln(2).

EXAMPLE
Expected stdout:
  2^x at x=1.5:
    f(x)             = 2.8284271247
    f'(x)            = 1.9605162869
    ratio f'(x)/f(x) = 0.6931471806   <-- equals ln(2) = 0.6931471806
import numpy as np

def derivative_exponential(x, base):
    """
    Analytic derivative of f(x) = base^x is f'(x) = base^x * ln(base).
    Special case: when base = e, ln(e) = 1, so f'(x) = e^x = f(x).
    """
    return np.power(base, x) * np.log(base)

# Pick a test point.
x = 1.5

# Case 1: base = e (the magic base)
f_e        = np.exp(x)                       # value of e^x at x=1.5
f_e_prime  = np.exp(x)                       # analytic derivative

# Numerical sanity check: central difference quotient.
h = 1e-6
numeric_e_prime = (np.exp(x + h) - np.exp(x - h)) / (2 * h)

print(f"e^x at x={x}:")
print(f"  f(x)             = {f_e:.10f}")
print(f"  f'(x) analytic   = {f_e_prime:.10f}")
print(f"  f'(x) numeric    = {numeric_e_prime:.10f}")
print(f"  ratio f'(x)/f(x) = {f_e_prime / f_e:.10f}   <-- exactly 1")

# Case 2: base = 2 (typical exponential, not magic)
base = 2.0
f_2          = np.power(base, x)
f_2_prime    = derivative_exponential(x, base)
ratio        = f_2_prime / f_2

print(f"\n2^x at x={x}:")
print(f"  f(x)             = {f_2:.10f}")
print(f"  f'(x)            = {f_2_prime:.10f}")
print(f"  ratio f'(x)/f(x) = {ratio:.10f}   <-- equals ln(2) = {np.log(2):.10f}")

PyTorch Implementation

Now in PyTorch. We'll use the very same \(e^x\) in two places: building softmax probabilities (the ML-flavored use of exponentials) and confirming the self-derivative property via autograd (the calculus-flavored use).

Softmax and autograd: exponentials in PyTorch
🐍exponential_pytorch.py
Line 1: Import torch

torch gives us tensors, autograd, and torch.exp — the PyTorch analog of np.exp that also tracks gradients.

Line 2: Import functional API

torch.nn.functional contains the production-quality softmax, which guards against overflow on large logits by subtracting the maximum logit before exponentiating.

Line 8: Create a tensor of logits

A vector of three raw scores. They are not probabilities yet — they can be any real numbers, including negatives. The softmax will turn them into a probability distribution.

EXAMPLE
logits = tensor([2.0, 1.0, 0.1])
Line 11: Element-wise e^z

torch.exp applies e^x to every element. This is where exponentials enter ML: it maps the entire real line into the positive reals, which is exactly what we need to interpret values as 'likelihoods'.

EXAMPLE
e^2.0 ≈ 7.3891
e^1.0 ≈ 2.7183
e^0.1 ≈ 1.1052
=> exps = tensor([7.3891, 2.7183, 1.1052])
12Normalize

Divide each exp by the sum so the components add to 1. Now probs is a valid probability distribution over the three classes.

EXAMPLE
sum = 7.3891 + 2.7183 + 1.1052 = 11.2126
probs = [7.3891/11.2126, 2.7183/11.2126, 1.1052/11.2126]
      ≈ [0.6590, 0.2424, 0.0986]
14Print logits

Sanity check: the input tensor is unchanged, because torch.exp and the division each return a new tensor instead of modifying logits in place.

15Print the raw exponentials

All three are positive. Unlike a hard argmax, which keeps only the winner, the exponentials preserve how far apart the logits are: a fixed gap between two logits becomes a fixed ratio between their e^z_i values.

16Print the probabilities

The largest logit (2.0) gets the largest probability (≈0.659). The smallest logit (0.1) gets the smallest probability (≈0.099). The middle one falls between.

17Confirm probabilities sum to 1

.item() unwraps a 0-d tensor to a Python float. The print should display 1.0 (or 0.9999999... due to floating-point) — proof that probs is a valid distribution.

20PyTorch's built-in softmax

F.softmax(logits, dim=0) is mathematically identical to our manual version. It subtracts max(logits) before exponentiating to avoid overflow when logits are large (e.g. a logit of 100 gives e^100 ≈ 2.7e43, which exceeds float32's maximum of about 3.4e38, overflows to inf, and leaves nan in the normalized output).

EXAMPLE
Numerically-stable form:
  z' = z - max(z) = [2.0, 1.0, 0.1] - 2.0 = [0.0, -1.0, -1.9]
  softmax(z') = softmax(z)    (proved algebraically)
The answer is the same: tensor([0.6590, 0.2424, 0.0986]).
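The algebraic step behind that claim is short. Writing $m = \max_j z_j$ (the symbol $m$ is introduced here only for this derivation), the common factor $e^{-m}$ cancels between numerator and denominator:

```latex
\operatorname{softmax}(z - m)_i
  = \frac{e^{z_i - m}}{\sum_j e^{z_j - m}}
  = \frac{e^{-m}\, e^{z_i}}{e^{-m} \sum_j e^{z_j}}
  = \frac{e^{z_i}}{\sum_j e^{z_j}}
  = \operatorname{softmax}(z)_i
```

Because every shifted exponent satisfies $z_i - m \le 0$, each term $e^{z_i - m}$ lies in $(0, 1]$, so none of them can overflow.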
25Tensor with requires_grad=True

requires_grad=True tells autograd to record every operation on x so we can later call .backward() and have x.grad populated with the derivative.

EXAMPLE
x.requires_grad = True   <-- autograd starts tracking
26Compute y = e^x

torch.exp is differentiable. Internally PyTorch builds a tiny computation graph: x -> ExpBackward -> y. The graph remembers the value of y because the derivative of e^x is y itself.

EXAMPLE
y = e^1.5 ≈ 4.48169   (float32 tensor with grad_fn=<ExpBackward0>)
28Trigger backpropagation

y.backward() walks the graph backward: it computes dy/dx and writes the result into x.grad. For y = e^x the rule says dy/dx = e^x — which is the very value of y we already have.

30Print y

y ≈ 4.48169, which agrees with np.exp(1.5) ≈ 4.4816890703 from the previous code block to float32 precision (about 7 significant digits).

31Print dy/dx

x.grad ≈ 4.48169, identical to y. This is autograd confirming the analytic claim d/dx e^x = e^x.

32Check that they are equal

torch.allclose returns True when every element of y matches x.grad within its default tolerances (rtol=1e-05, atol=1e-08). Here the match is exact: the backward pass reuses the forward value, so y and x.grad hold the same float32 number.

EXAMPLE
Expected stdout:
  y     = 4.481689453125
  dy/dx = 4.481689453125
  equal : True
1import torch
2import torch.nn.functional as F
3
4# ----------------------------------------------------------------------
5# Part 1: softmax — exponentials turn a vector of logits into probabilities.
6# softmax(z_i) = exp(z_i) / sum_j exp(z_j)
7# ----------------------------------------------------------------------
8logits = torch.tensor([2.0, 1.0, 0.1])
9
10# Manual softmax with raw exponentials.
11exps   = torch.exp(logits)              # element-wise e^z_i
12probs  = exps / exps.sum()              # normalize to a probability vector
13
14print("logits :", logits)
15print("exps   :", exps)
16print("probs  :", probs)
17print("sum    :", probs.sum().item())   # must be 1.0
18
19# PyTorch's built-in (numerically stable) softmax — same answer.
20print("F.softmax:", F.softmax(logits, dim=0))
21
22# ----------------------------------------------------------------------
23# Part 2: autograd confirms d/dx e^x = e^x exactly.
24# ----------------------------------------------------------------------
25x = torch.tensor(1.5, requires_grad=True)
26y = torch.exp(x)
27
28y.backward()                            # populates x.grad with dy/dx
29
30print("\ny     =", y.item())            # e^1.5
31print("dy/dx =", x.grad.item())          # also e^1.5 — they are equal
32print("equal :", torch.allclose(y.detach(), x.grad))

Why autograd nails the self-derivative property exactly

When PyTorch computes the derivative of $y = e^x$, it does not use a numerical approximation. Internally, ExpBackward caches $y$ (the derivative equals the forward value) and reuses it as the gradient, so y and x.grad are bit-for-bit identical and torch.allclose(y.detach(), x.grad) passes without relying on its tolerance.
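The caching idea can be imitated in a few lines of plain Python. This is only a toy sketch: the class name `ExpNode` and its `saved_output` attribute are hypothetical and are not PyTorch's actual internals.

```python
import math


class ExpNode:
    """Toy stand-in for an autograd node (hypothetical, not PyTorch's class)."""

    def forward(self, x: float) -> float:
        # Compute e^x once and cache it for the backward pass.
        self.saved_output = math.exp(x)
        return self.saved_output

    def backward(self, grad_output: float = 1.0) -> float:
        # Chain rule: upstream gradient times d/dx e^x. That derivative is the
        # cached forward value, so exp is never evaluated a second time.
        return grad_output * self.saved_output


node = ExpNode()
y = node.forward(1.5)
dy_dx = node.backward()

print(y == dy_dx)  # True: 1.0 * y is exactly y
```

Since the backward pass multiplies the cached value by an upstream gradient of 1.0, the result carries no extra rounding error, which is why the two numbers match exactly.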


Common Pitfalls

Confusing Exponential with Power Functions

$2^x$ (exponential) is NOT the same as $x^2$ (power function):

  • $2^x$: variable exponent, fixed base (exponential growth).
  • $x^2$: fixed exponent, variable base (polynomial growth).

For large x, exponentials always dominate: $2^{10} = 1024$ while $10^2 = 100$.
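A quick check in plain Python makes the "for large x" qualifier concrete. The sketch below tabulates both functions at a few integer points; the loop bounds are arbitrary choices for illustration.

```python
# Compare the exponential 2**x with the power function x**2 at integer points.
for x in range(1, 11):
    exp_val = 2 ** x
    pow_val = x ** 2
    if exp_val > pow_val:
        relation = ">"
    elif exp_val < pow_val:
        relation = "<"
    else:
        relation = "="
    print(f"x={x:2d}  2^x={exp_val:5d}  {relation}  x^2={pow_val:4d}")

# From x = 5 onward the exponential stays ahead at every integer tested.
exponential_wins = all(2 ** x > x ** 2 for x in range(5, 200))
print(exponential_wins)
```

The two functions are equal at x = 2 and x = 4, and the power function is briefly ahead at x = 3 (8 versus 9). Past x = 4 the exponential leads and the gap keeps widening.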

Negative Bases Are Not Allowed

$(-2)^x$ is not a valid exponential function:

  • $(-2)^{1/2} = \sqrt{-2}$ is not real.
  • Many real x produce complex or undefined results.

That is why we require $a > 0$.
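Python shows the same problem directly. The sketch below uses only the built-in `**` operator and the standard `math.pow`; the variable names are illustrative.

```python
import math

# Python's ** operator falls back to complex numbers when a negative base
# is raised to a fractional power.
value = (-2) ** 0.5
print(type(value).__name__)   # complex

# math.pow works only with real numbers, so it rejects the same input.
try:
    math.pow(-2, 0.5)
    raised = False
except ValueError:
    raised = True
print(raised)                 # True

# Integer exponents are defined, but the sign flips at every step, so the
# values never trace out a smooth real-valued curve.
print([(-2) ** n for n in range(5)])   # [1, -2, 4, -8, 16]
```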

`e` vs `exp()` in code

In Python, e**x only works if you first set e = math.e. Otherwise e is undefined and you'll get a NameError. Prefer the explicit functions:

  • math.exp(x) or numpy.exp(x) in Python
  • torch.exp(x) in PyTorch
  • Math.exp(x) in JavaScript
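As a minimal illustration of the two spellings in Python, the sketch below computes $e^{1.5}$ both ways; the variable names `via_power` and `via_exp` are chosen here for the example.

```python
import math

x = 1.5

# Option 1: define e yourself, then use the power operator.
e = math.e
via_power = e ** x

# Option 2: call the dedicated exponential function.
via_exp = math.exp(x)

# The two agree to within floating-point rounding.
print(math.isclose(via_power, via_exp, rel_tol=1e-12))   # True
```

Both give the same number here, but the explicit function avoids depending on a variable named `e` that some other part of the program might overwrite.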

Softmax Overflow

A naive softmax computes $e^{z_i}$ directly. With a logit of 100, this is $e^{100} \approx 2.7 \times 10^{43}$, far beyond float32's limit of about $3.4 \times 10^{38}$. Always subtract $\max(z)$ first (this is what F.softmax does). The result is mathematically identical but numerically safe.
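The failure and the fix can be reproduced without any libraries. The sketch below is pure Python, so it works in float64, where overflow starts near $e^{709.8}$ and not at float32's roughly $e^{88.7}$, and where `math.exp` raises `OverflowError` instead of returning inf. The function names and the test logits are assumptions made for this example.

```python
import math


def softmax_naive(logits):
    """Exponentiate directly; fails once any logit is too large."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_stable(logits):
    """Subtract the largest logit first so every exponent is <= 0."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


big_logits = [1000.0, 999.0, 998.1]   # same gaps as [2.0, 1.0, 0.1]

try:
    softmax_naive(big_logits)
    naive_failed = False
except OverflowError:
    naive_failed = True

print(naive_failed)                                        # True
print([round(p, 4) for p in softmax_stable(big_logits)])   # [0.659, 0.2424, 0.0986]
```

The stable version returns the same probabilities as softmax on [2.0, 1.0, 0.1], because only the differences between logits matter.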


Test Your Understanding

What is the value of e^0?

Summary

Exponential functions describe multiplicative change — the universal pattern whenever the rate of change is proportional to the current quantity.

Key Formulas

| Formula | Description |
| --- | --- |
| f(x) = a^x | General exponential (a > 0, a ≠ 1) |
| f(x) = e^x | Natural exponential (e ≈ 2.718) |
| e = lim_{n→∞} (1 + 1/n)^n | Bernoulli's definition of e |
| e = Σ_{n=0}^{∞} 1/n! | Series definition of e |
| d/dx e^x = e^x | Self-derivative property |
| d/dx a^x = a^x · ln(a) | Derivative for any base |
| A = P e^(rt) | Continuous compounding / growth |
| softmax(z_i) = e^z_i / Σ_j e^z_j | ML probability normalization |
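Several of these formulas can be spot-checked numerically with the standard library alone. The sketch below is illustrative: the values of `n`, `a`, `x`, `h`, `P`, `r`, and `t` are arbitrary choices for this example.

```python
import math

# Limit definition: (1 + 1/n)^n approaches e as n grows.
n = 1_000_000
e_limit = (1 + 1 / n) ** n

# Series definition: partial sum of 1/k! for k = 0..19.
e_series = sum(1 / math.factorial(k) for k in range(20))

# Derivative rule: d/dx a^x = a^x * ln(a), checked with a central difference.
a, x, h = 2.0, 1.5, 1e-6
analytic = a ** x * math.log(a)
numeric = (a ** (x + h) - a ** (x - h)) / (2 * h)

# Continuous compounding: A = P * e^(r t).
P, r, t = 1000.0, 0.05, 10.0
A = P * math.exp(r * t)

print(f"limit  : {e_limit:.6f}")
print(f"series : {e_series:.15f}")
print(f"d/dx 2^x at x=1.5: analytic={analytic:.6f}, numeric={numeric:.6f}")
print(f"A after {t:g} years: {A:.2f}")
```

The limit converges slowly (the error shrinks roughly like 1/n), while the series reaches full double precision within about twenty terms.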

Key Takeaways

  1. Exponentials model multiplicative change: each step multiplies by the same factor.
  2. Every exponential passes through $(0, 1)$ because $a^0 = 1$.
  3. Base > 1 gives growth; 0 < base < 1 gives decay; the x-axis is the asymptote on one side.
  4. Euler's number $e \approx 2.718$ emerges as the limit of $(1 + 1/n)^n$, the natural ceiling of continuous compounding.
  5. $e^x$ is the unique exponential that equals its own derivative, as confirmed analytically, numerically, and via PyTorch autograd.
  6. Exponentials are everywhere: population, decay, cooling, finance, softmax, attention, learning-rate schedules.
The Essence of Exponentials
"Among all bases, only e gives an exponential that is its own derivative — the natural language of every system whose rate of change is proportional to its size."
Coming Next: in the next section we invert the exponential to get the logarithm. You will see why $\log$ turns multiplication into addition, why the natural log $\ln$ is the inverse of $e^x$, and how all of this powers the cross-entropy loss in deep learning.