Boo-AI — Master Artificial Intelligence by Building from Scratch

Where Exponential Breaks

The exponential model $dP/dt = r\,P$ says one thing very loudly: the more individuals there are, the faster more arrive. It fits bacteria in their first dozen generations, a viral epidemic in its first weeks, the world's human population from roughly 1700 to 1960. Then it always stops fitting.

Why? Because nothing grows without limit. The petri dish runs out of sugar. The forest runs out of light. The country runs out of farmland. The model that gave the early-time match has no idea any of these things exist. It needs a brake — a mechanism that quietly slows $P$ down as the environment fills.

The single question of this section

How can we modify $dP/dt = r P$ so that growth slows as $P$ approaches an upper limit — without abandoning the "rate proportional to population" intuition that worked so well in the early phase?

The fix is one extra factor in the equation. That factor, called the carrying capacity term, is what turns the exponential into the logistic. And the logistic is one of the most consequential equations ever written down in mathematical biology, economics, epidemiology, ecology, and the early theory of neural networks.

The Carrying Capacity Idea

Pierre-François Verhulst noticed it first, in 1838. He proposed that per-capita growth — the rate per individual — is not constant but should decrease as the population approaches some ceiling $K$ :

\displaystyle \frac{1}{P}\frac{dP}{dt} \;=\; r\!\left(1 - \frac{P}{K}\right)

Read it slowly. The left side is the fractional growth rate — how fast $P$ changes per current individual. The right side says: this fractional rate starts at $r$ when $P$ is tiny, decreases linearly as $P$ rises, and hits zero when $P = K$ .

Three readings of the same fraction

Resource fraction. $P/K$ is the fraction of resources already consumed; $1 - P/K$ is the fraction that remains. Growth is proportional to what is left, not what has been taken.
Crowding. When $P/K$ approaches 1, every birth is cancelled by a competition-induced death. The brake is the crowd itself.
Logistic line. Plotting per-capita rate vs $P$ gives a straight line of slope $-r/K$ crossing zero at $P = K$ . Real ecologists confirm this line empirically before fitting the ODE.

Writing the Equation From Intuition

Multiply both sides of the per-capita form by $P$ and the famous logistic ODE drops out:

\displaystyle \frac{dP}{dt} \;=\; r\,P\!\left(1 - \frac{P}{K}\right)

Two terms hide inside one expression. The first, $r P$ , is pure exponential growth — births winning. The second, $-r P^2 / K$ , is the crowd-induced drag — competition winning. Their balance is the entire sigmoid:

Regime	Which term dominates	Behavior
P much smaller than K	r·P (births)	Nearly exponential rise
P near K/2	Both terms equal in size	Maximum growth, inflection
P close to K	-r·P²/K (crowding)	Asymptotic flattening
P slightly above K	-r·P²/K dominates	Slow decay back down to K

The two equilibria fall out for free

Set $dP/dt = 0$ . Then $r P (1 - P/K) = 0$ , which gives $P = 0$ or $P = K$ . Two roots, two equilibria. We will see in the next section that $P=0$ is unstable (any tiny perturbation grows away from it) while $P=K$ is stable (perturbations decay back to it).

Phase Line: The Whole Story in One Picture

Plot the right side of the ODE, $f(P) = r P(1-P/K)$ , as a function of $P$ . It is an upside-down parabola: zero at $P=0$ , zero at $P=K$ , maximum at $P=K/2$ . From this single picture we read off every qualitative feature of the dynamics — without ever solving the ODE.

Loading phase line…

Drag $P$ along the axis and watch the value of $dP/dt$ . Whenever it is positive the population grows; wherever it is negative it shrinks. The two zeros are the equilibria. The arrows on the P-axis point in the direction $P$ moves. Notice both arrows squeeze toward $P = K$ from below and above — that is the geometric meaning of stable.

The peak of the parabola at $P = K/2$ is the fastest the population ever grows. Its height is $r K / 4$ . Remember that number — it is the single most useful summary statistic of a logistic curve.

Solving the Logistic ODE

The logistic equation is separable: every $P$ can be moved to one side and every $t$ to the other. The only trick is a partial-fraction decomposition on the way through.

Start with $\dfrac{dP}{dt} = r P \left(1 - \dfrac{P}{K}\right) = \dfrac{r}{K}\,P\,(K - P)$ .
Separate variables: $\dfrac{dP}{P(K-P)} = \dfrac{r}{K}\,dt$ .
Split the left side with partial fractions: $\dfrac{1}{P(K-P)} = \dfrac{1}{K}\!\left(\dfrac{1}{P} + \dfrac{1}{K-P}\right)$ .
Integrate both sides: $\displaystyle \frac{1}{K}\,\ln\!\left|\frac{P}{K-P}\right| = \frac{r}{K}\,t + C_1$ .
Multiply through by $K$ and exponentiate: $\dfrac{P}{K-P} = C\,e^{r t}$ , where $C = e^{K C_1}$ .
Apply the initial condition $P(0) = P_0$ : at $t = 0$ the right side is $C$ , so $C = P_0 / (K - P_0)$ .
Solve for $P$ : $P = (K - P)\,C e^{rt} \;\Rightarrow\; P\,(1 + C e^{r t}) = K C e^{r t}$ , i.e. $P(t) = \dfrac{K C e^{r t}}{1 + C e^{r t}}$ .

Divide top and bottom by $C e^{rt}$ and use $A = 1/C = (K - P_0)/P_0$ . The clean form falls out:

\displaystyle P(t) \;=\; \frac{K}{1 + A\,e^{-r t}}, \qquad A = \frac{K - P_0}{P_0}

Sanity check at the two ends

At $t = 0$ : $P(0) = K / (1 + A) = K \cdot P_0 / K = P_0$ . ✓
As $t \to \infty$ : $e^{-rt} \to 0$ , so $P \to K/1 = K$ . ✓
Differentiate $P(t)$ and you must recover $rP(1-P/K)$ . (Try it on paper — the chain rule does all the work.)

Three Regimes and the Inflection Point

The logistic curve has three distinct chapters in its life. They are decided entirely by the ratio $P/K$ .

Early phase: nearly exponential

When $P \ll K$ the factor $(1 - P/K)$ is essentially 1 and the equation degenerates to $dP/dt \approx r P$ . On a log-y plot the early logistic looks like a straight line of slope $r$ — indistinguishable from pure exponential growth. That is why early epidemics, early markets, and early bacterial cultures are all routinely (mis)fit as exponentials.

Middle phase: the inflection

The growth rate $dP/dt$ hits its maximum when the parabola peaks — at $P = K/2$ . That moment is the inflection point: the curve changes from concave-up (still accelerating) to concave-down (now decelerating). The maximum rate is $r K / 4$ . Solving $P(t^*) = K/2$ gives the inflection time:

\displaystyle t^* \;=\; \frac{\ln A}{r} \;=\; \frac{1}{r}\,\ln\!\frac{K - P_0}{P_0}

Late phase: saturation

When $P$ approaches $K$ the brake takes over. Subtracting $K$ from the solution and linearising in the small quantity $(K - P)$ shows that the gap decays exponentially with rate $r$ : $K - P(t) \approx (K - P_0)\, e^{-rt}$ for late times. So the early and late phases are both exponentials — same constant $r$ , opposite directions — glued together at the inflection.

Interactive Logistic Explorer

Drag $r$ , $K$ , and $P_0$ . Watch the cyan logistic curve next to the dashed-red "what an exponential would have done" overlay. The green tangent always carries slope $r P (1 - P/K)$ ; the purple dot marks the inflection at $P = K/2$ .

Loading explorer…

What to play with first

Set $P_0 = 50, K = 1000, r = 0.1$ and move the marker to $t \approx 29.4$ . The curve should pass exactly through the inflection at $P = 500$ and the rate readout should read $r K/4 = 25$ .
Start $P_0$ above $K$ — say $P_0 = 1500, K = 1000$ . The population does not crash to zero, it gently decays back down to $K$ . There is no inflection in this regime.
Crank $r$ up. The logistic just hits its ceiling faster — the shape never changes, only the time scale. That self-similarity is the defining feature of sigmoid curves.

Every Starting Population Lands at K

The strongest claim of the logistic model is its universal destination: no matter where you start (as long as $P_0 > 0$ ), the population ends at $K$ . The phase-line told us this is true qualitatively. Now we see it as a forest of trajectories all bending toward the same horizontal asymptote.

Loading visualizer…

Six different starting points: tiny (10), small (100), middle (400), right at the inflection (900 ≈ K), and two that begin above carrying capacity (1300, 1800). The lower three trace out classic S-shapes. The upper two trace mirror-image decays. All six lock onto $K$ .

Worked Example: $r = 0.1$ , $K = 1000$ , $P_0 = 50$

A small lake is being repopulated with trout. Initial release is 50 fish. Long-term monitoring of similar lakes suggests a carrying capacity of 1000 and a growth rate of $r = 0.1$ per year. Predict the population every ten years, find the inflection time, and identify when the population reaches 90% of carrying capacity. Try it on paper first, then expand the panel for the full work.

Show step-by-step solution

Step 1. Write the ODE, identify constants, and assemble the solution.

We have $\dfrac{dP}{dt} = 0.1\,P\!\left(1 - \dfrac{P}{1000}\right)$ with $P(0) = 50$ . The constant $A = (K - P_0)/P_0 = 950/50 = 19$ . So the closed-form is $P(t) = \dfrac{1000}{1 + 19\,e^{-0.1 t}}$ .

Step 2. Evaluate at landmark times.

t (yr)	Denominator 1 + 19·e^{-0.1 t}	P(t)	dP/dt = 0.1·P(1−P/1000)
0	1 + 19·1 = 20.00	50.00	4.75
10	1 + 19·0.3679 = 7.99	125.16	10.95
20	1 + 19·0.1353 = 3.571	280.00	20.16
30	1 + 19·0.0498 = 1.946	513.89	24.98
40	1 + 19·0.01832 = 1.348	741.84	19.15
60	1 + 19·0.00248 = 1.047	955.02	4.30
80	1 + 19·0.000336 = 1.0064	993.67	0.63

Two things to notice. (a) The numbers are not a constant multiplicative step — the early jumps look exponential (50 → 125 → 280 is roughly ×2.5 per decade) but the late jumps slow dramatically (742 → 955 is a much smaller proportional move). (b) The growth rate column climbs to its maximum near $t = 30$ and then declines. The inflection lives in there.

Step 3. Find the inflection time exactly.

Inflection happens when $P(t^*) = K/2 = 500$ . Plug into the closed form: $1000 / (1 + 19 e^{-0.1 t^*}) = 500$ , so $1 + 19 e^{-0.1 t^*} = 2$ , hence $e^{-0.1 t^*} = 1/19$ , hence $t^* = \ln(19)/0.1 \approx 29.444$ years.

Step 4. Confirm the maximum growth rate.

At the inflection, $dP/dt = 0.1 \cdot 500 \cdot (1 - 500/1000) = 0.1 \cdot 500 \cdot 0.5 = 25.0$ fish per year. That matches the closed-form maximum $rK/4 = 0.1 \cdot 1000/4 = 25$ . The two paths agree — always a satisfying cross-check.

Step 5. Time to reach 90% of carrying capacity.

Solve $P(t) = 900$ : $1000 / (1 + 19 e^{-0.1 t}) = 900$ , so $1 + 19 e^{-0.1 t} = 10/9$ , hence $19 e^{-0.1 t} = 1/9$ , hence $e^{-0.1 t} = 1/171$ , hence $t = \ln(171)/0.1 \approx 51.42$ years.

Step 6. Sanity-check on the explorer above. Set $r = 0.1$ , $K = 1000$ , $P_0 = 50$ , move the marker to $t = 30$ . The P readout should be ≈513.9, the dP/dt readout should be ≈24.98, and the inflection label should sit at t ≈ 29.44, P = 500.

Reality Check: Yeast in a Flask

Pearl and Reed's 1920 yeast-growth experiment is the canonical textbook fit. Cells were counted every two hours in a closed flask; the population started small, climbed through an inflection, and plateaued near a carrying capacity set by the available sugar. The logistic fits the data so well that the residuals can't even be seen on a normal-scale plot.

The same equation, three more times. Verhulst's logistic also reproduces (i) the spread of a rumor in a closed community (every conversation is between a knower and a non-knower, analogous to a birth meeting an empty niche), (ii) the adoption curve of new technology (the Bass diffusion model is logistic in disguise), and (iii) the firing rate of a biological neuron as input current rises (the famous sigmoid). Same equation, three different fields, every time the brake is some form of saturation.

Where the model still breaks

The logistic assumes $K$ is constant. Real ecosystems have seasons, harvests, predators, and climate. If $K$ itself depends on time you get a non-autonomous logistic; if it depends on $P$ you get the gamut of Allee effects, collapse dynamics, and chaotic discrete maps. The continuous logistic is a one-equation model; the textbook should not pretend otherwise.

Python: Stepping the ODE by Hand

Before we trust the closed form, let's rebuild the curve from the differential equation itself. We'll Euler-step $dP = r P (1 - P/K)\,dt$ forward and watch the per-step rule converge to the analytic $K/(1 + A e^{-rt})$ as the step size shrinks.

Hand-coded logistic simulator with closed-form comparison

🐍logistic_scratch.py

Explanation(26)

Code(38)

1Import math

We pull in math.exp for the closed-form check. Pure Python on purpose — every operation stays visible. No numpy, no scipy, no hidden vectorization.

EXAMPLE

math.exp(-1) -> 0.3678794

3Function logistic_step

One small Euler step. Given the current population P, the intrinsic rate r, the carrying capacity K, and a small time chunk dt, it returns the next P. The whole ODE collapses to a one-line update.

4Docstring

Pins the meaning to the equation in the section so a reader skimming the file can't confuse this with a generic Euler stepper.

5Compute the increment dP = r·P·(1 − P/K)·dt

This is the entire physics. The factor r·P alone would give exponential growth; the extra (1 − P/K) is the brake. When P is small the brake is near 1 — almost free; when P approaches K the brake approaches 0 — growth halts. The intuition lives inside this single product.

EXAMPLE

P=50, r=0.1, K=1000, dt=0.01 -> dP = 0.1*50*(1-0.05)*0.01 = 0.04750

6Return P + dP

We return a new value rather than mutating P. That keeps the stepper pure: same inputs always produce the same output. Pure step functions are trivial to test and trivial to plug into more advanced integrators (Runge-Kutta, etc.).

9Function simulate_logistic

Wraps the stepper in a loop. Builds up a list of (t, P) pairs so we can later compare against the exact answer or plot the trajectory.

11Initial state (t, P) = (0, P0)

Time starts at zero and P at the supplied initial value P0. Without that initial condition, the ODE has infinitely many solutions (one per choice of P₀).

EXAMPLE

P0 = 50.0 -> the trajectory begins at the point (0, 50).

12Empty history list with the start state

We record every (t, P) snapshot. The first row is (0, 50); the last row will be near (80, 994). Storing the history makes the simulator easy to debug — you can index any moment and see what was happening.

13while t < t_final

March forward until simulated time exceeds t_final. With dt = 0.01 and t_final = 80 we run 8000 steps — fast on any machine, slow on paper. That asymmetry is exactly why we use the computer.

14One Euler step

Compute the next P using the current P and dt. Notice: the new P is bigger than the old when P < K (because 1 − P/K > 0) and smaller when P > K (because 1 − P/K < 0). The single sign-flip captures the whole stability story.

EXAMPLE

Step 1 from P=50 -> P = 50 + 0.04750 = 50.0475

15Advance time by dt

Time bookkeeping. We update P first, then t — the standard explicit-Euler order. Swapping these would shift the recorded curve by one step.

16Record the pair

Append the new snapshot. The full history is a sampled version of the true continuous trajectory.

17Return the trajectory

Hands back the entire (t, P) history. Functions that build up state and return it cleanly compose well — we can plot, slice, or test without any global state.

20Function closed_form

The exact analytic solution we will derive on paper a few sections from here. We use it as ground truth to score how good the Euler simulation is.

22Compute A = (K − P0) / P0

A is the dimensionless constant that encodes the initial condition inside the solution. For P0 = 50, K = 1000: A = 950/50 = 19. The closer P0 is to K, the smaller A becomes — and the closer the start is to its destination.

EXAMPLE

A = (1000 - 50) / 50 = 19.0

23Evaluate K / (1 + A·exp(−r·t))

The sigmoid in closed form. At t = 0 the denominator is 1 + A so P = K/(1+A) = P0; at t → ∞ the exponential vanishes and P → K. Both boundary checks fall out without any extra work.

EXAMPLE

P(10) = 1000 / (1 + 19·exp(-1)) = 1000 / 7.989 = 125.16

26Pick parameters

We use the same numbers as the worked example: P0 = 50, r = 0.1, K = 1000. Concrete numbers anchor every later check.

27Set t_final = 80

Long enough to reach almost-saturation (around 994 out of 1000) without overshooting into rounding territory. About one and a half characteristic times after the inflection.

29Fine Euler trace dt = 0.01

Small step size makes Euler's approximation hug the true curve very closely. 8000 steps, all near-perfect.

30Coarse Euler trace dt = 2.0

Same physics, only 40 steps. The error becomes visible — and instructive. Explicit Euler systematically lags whenever the slope is changing fast, which is exactly what happens near the inflection.

33Pick landmark times

Four checkpoints spanning early growth (10), pre-inflection (20), past inflection (30), and near saturation (60). These four numbers give a fingerprint of the full curve.

34Loop the landmarks

For each landmark we look up the exact value and the two Euler approximations. We print them side by side so the size of the error is obvious.

35Exact from closed-form

Uses the analytic solution. This is our ground truth: 125.16, 280.00, 513.89, 955.02 at t = 10, 20, 30, 60.

36Index into trace_fine

int(t / 0.01) gives the step index. For t = 10 with dt = 0.01 that's index 1000 — we picked landmarks that land exactly on step boundaries so there is no interpolation noise to worry about.

37Index into trace_coarse

Same trick with dt = 2.0. For t = 10 the index is 5; for t = 30 it's 15. These rows show how much error a 40-step Euler accumulates.

38Print the comparison row

Running this you'll see the fine Euler match the exact value to about 3 decimal places, while the coarse Euler under-shoots at the inflection (step too big to track the changing slope) and over-shoots in saturation. Same equation, different discretization — that single contrast is the headline of all numerical ODE methods.

12 lines without explanation

1import math
2
3def logistic_step(P, r, K, dt):
4    """One small Euler step of dP/dt = r * P * (1 - P/K)."""
5    dP = r * P * (1.0 - P / K) * dt
6    return P + dP
7
8
9def simulate_logistic(P0, r, K, t_final, dt):
10    """March forward in time with constant step dt."""
11    t, P = 0.0, P0
12    history = [(t, P)]
13    while t < t_final:
14        P = logistic_step(P, r, K, dt)
15        t = t + dt
16        history.append((t, P))
17    return history
18
19
20def closed_form(P0, r, K, t):
21    """Analytic solution P(t) = K / (1 + A * e^{-r t}),  A = (K - P0) / P0."""
22    A = (K - P0) / P0
23    return K / (1.0 + A * math.exp(-r * t))
24
25
26P0, r, K = 50.0, 0.1, 1000.0
27t_final = 80.0
28
29trace_fine   = simulate_logistic(P0, r, K, t_final, dt=0.01)
30trace_coarse = simulate_logistic(P0, r, K, t_final, dt=2.0)
31
32# Compare against the exact answer at four landmark times.
33landmarks = [10.0, 20.0, 30.0, 60.0]
34for t in landmarks:
35    exact  = closed_form(P0, r, K, t)
36    fine   = trace_fine[int(t / 0.01)][1]
37    coarse = trace_coarse[int(t / 2.0)][1]
38    print(f"t={t:5.1f}  exact={exact:8.3f}  fine={fine:8.3f}  coarse={coarse:8.3f}")

The takeaway is the same as for the exponential case: solving the ODE on paper buys a calculation that needs no loop. The fine Euler is a sanity check that we set up the equation correctly; the coarse Euler is a sanity check that the analytic solution actually matters. If you ever distrust a differential-equation solver, this is the exact pattern to run: analytic vs fine numeric vs coarse numeric, side by side.

Python: Recovering r from Data via Linearization

Suppose you don't know $r$ but you have a few measurements and you do know the carrying capacity $K$ . The logistic linearizes beautifully: a single log transform turns the sigmoid into a straight line. Then any method that fits a line works.

Recovering r and A from six samples by linear least squares

🐍fit_logistic_linear.py

Explanation(25)

Code(43)

1Import math

We need math.log to apply the linearization trick and math.exp to recover A from its log. No external packages.

4samples list

Six (t, P) measurements taken from the same logistic ODE we just solved (r=0.1, K=1000, P0=50). The fitting code should recover those parameters from just these six rows.

5First sample (0, 50)

Time zero gives P = 50. This will linearize to ln(950/50) = ln(19) ≈ 2.9444 — the intercept of the straight line we are about to fit.

6Sample at t = 10

P = 125.161. Linearizes to ln((1000-125.161)/125.161) ≈ 1.9444 — exactly 1.0 less than the t=0 value. That spacing IS the value of r times the time step.

7Sample at t = 20

P = 280.005. Linearizes to ln(720/280) ≈ 0.9444 — again exactly 1.0 less than the previous row. The linearization is so clean it is almost suspicious.

8Sample at t = 30

P = 513.887, just past the inflection (K/2 = 500). Linearizes to ln(486/514) ≈ -0.0556 — the line has crossed the x-axis, which is exactly the moment P passes K/2.

9Sample at t = 40

P = 741.84. Now we are in the saturation phase. Linearizes to about -1.0556.

10Sample at t = 50

P = 886.51. Linearizes to about -2.0556. Six rows, six linear values, each 1.0 apart — the cleanest possible signal of slope = -0.1.

13Assume K = 1000

In real labs you often know the carrying capacity from a long-time observation (the flask runs out of sugar, the petri dish fills up). Knowing K reduces the fit to one parameter. The next code block fits both r and K together with PyTorch.

21Loop and apply the log transform

For each sample compute L = ln((K-P)/P). This is the standard logistic linearization: it turns the sigmoid into a straight line in L vs t.

EXAMPLE

ln((1000 - 280.005)/280.005) = ln(2.5713) ≈ 0.9444

22Append (t, L)

Collect the transformed rows. We will fit a straight line to these six points using the closed-form least-squares formula.

23Print each transformed row

You should see the L values stepping down by exactly 1.0 per 10 time units. That step size IS r times the spacing — the algorithm is reading r directly off the printout.

26Six accumulators

Standard least-squares closed form uses Σt, ΣL, Σt², ΣtL. We compute each in plain Python so no library hides the math.

27Sum of t-values

For our six samples this is 0 + 10 + 20 + 30 + 40 + 50 = 150.

28Sum of L-values

≈ 2.944 + 1.944 + 0.944 + (-0.056) + (-1.056) + (-2.056) = 2.667. The midpoint of the linearized values lives near zero because half the samples are below the inflection and half above.

29Sum of t² values

0 + 100 + 400 + 900 + 1600 + 2500 = 5500.

30Sum of t·L values

Cross-correlation term. This is the one that picks up the slope. Roughly -116.67 for our data.

32Closed-form slope formula

The textbook least-squares slope: slope = (nΣtL − ΣtΣL) / (nΣt² − (Σt)²). For our data the numerator is hugely negative and the denominator is comfortably positive, giving slope ≈ -0.10000.

EXAMPLE

slope = (6 * -116.67 - 150 * 2.667) / (6 * 5500 - 150*150) = -0.1

33Closed-form intercept

intercept = mean(L) − slope · mean(t). For our data this lands very close to ln(19) ≈ 2.944, confirming A = 19.

35Recover r from the slope

Because the linearization was ln((K-P)/P) = ln A − r t, the slope IS −r. Flip the sign and we have r ≈ 0.1.

36Recover A from the intercept

exp of the intercept gives A directly: exp(2.944) ≈ 19. From A and P0 = K/(1+A) we can also recover the initial population.

37Implied P0 from A

1000 / (1 + 19) = 50, recovering the initial condition without measuring it directly. That sanity check is the kind of cross-validation real-data fits live for.

39Print fitted r

You should see r_hat very close to 0.10. With noiseless synthetic data the recovery is essentially exact; in real labs the noise on P drives a small spread in r_hat from row to row.

40Print fitted A

Close to 19.0 — the initial-condition constant. Useful to print: if A comes out negative something is wrong (it would mean the linearization argument was negative, which means a measurement above K).

41Print implied P0

Reading P0 back from the fit, not from the measurement. When this matches the row-0 measurement, your model and your data agree.

18 lines without explanation

1import math
2
3# Five synthetic measurements of the logistic with r=0.1, K=1000, P0=50.
4samples = [
5    (0.0,   50.0000),
6    (10.0, 125.1610),
7    (20.0, 280.0046),
8    (30.0, 513.8867),
9    (40.0, 741.8413),
10    (50.0, 886.5083),
11]
12
13# Assume K is known (e.g. from steady-state observation).
14K = 1000.0
15
16# Linearize: P(t) = K / (1 + A e^{-rt})
17#   ->  (K - P) / P  =  A e^{-rt}
18#   ->  ln((K - P) / P) = ln A  -  r * t
19# So plotting ln((K - P)/P) against t gives a straight line of slope -r.
20
21linearized = []
22for t, P in samples:
23    L = math.log((K - P) / P)
24    linearized.append((t, L))
25    print(f"t={t:5.1f}  P={P:8.3f}  ln((K-P)/P) = {L:+.6f}")
26
27# Least-squares slope and intercept (formula form, no numpy).
28n = len(linearized)
29sum_t   = sum(t for t, _ in linearized)
30sum_L   = sum(L for _, L in linearized)
31sum_tt  = sum(t * t for t, _ in linearized)
32sum_tL  = sum(t * L for t, L in linearized)
33
34slope     = (n * sum_tL - sum_t * sum_L) / (n * sum_tt - sum_t * sum_t)
35intercept = (sum_L - slope * sum_t) / n
36
37r_hat = -slope          # slope was -r
38A_hat = math.exp(intercept)
39P0_hat = K / (1.0 + A_hat)
40
41print(f"\nFitted r = {r_hat:.6f}   (target 0.10)")
42print(f"Fitted A = {A_hat:.6f}   (target 19.0)")
43print(f"Implied P0 = {P0_hat:.4f}  (target 50.0)")

PyTorch: Fitting r and K Together

The linearization above assumed $K$ was known. In practice you usually have to fit both $r$ and $K$ from data. PyTorch turns this into one familiar pattern: forward pass, relative-MSE loss, autograd backward, optimizer step, clamp to keep parameters physical.

Joint fit of r and K with autograd

🐍fit_logistic_pytorch.py

Explanation(19)

Code(36)

1Import torch

PyTorch gives us autograd: every operation on a tensor with requires_grad=True is recorded and gradients propagate backward automatically. We use this so we don't have to derive ∂loss/∂r and ∂loss/∂K by hand.

4Tensor of time stamps

Shape (6,). One row per measurement. CPU is plenty fast for six scalars.

EXAMPLE

t = tensor([0., 10., 20., 30., 40., 50.])

5Tensor of measurements

Matching populations. Same shape (6,). Together (t, P) is the training data.

8Initial guess r = 0.30

Three times the true value, on purpose. requires_grad=True marks r as a parameter we want to learn. If gradient descent works, r should slide from 0.30 down to about 0.10.

9Initial guess K = 2000

Double the true K. The optimizer must shrink this back to 1000. Watching K and r move together is the most instructive part of running this script.

10Pin P0 to the first measurement

We treat P(0) as known so that A = (K - P0)/P0 depends only on K — one fewer parameter to fit. Letting P0 also learn is a one-line change but introduces a near-degeneracy between P0 and K at very early times.

12Adam optimizer over r and K

Adam adapts the learning rate per parameter using past gradient statistics. lr = 0.05 is aggressive enough to reach the basin in a few hundred steps without overshooting.

14Loop for 2000 steps

Each step does the standard PyTorch dance: zero gradients, forward pass, compute loss, backward pass, optimizer step, optionally project parameters back into a valid region.

15zero_grad()

PyTorch accumulates gradients into .grad. Without this call, gradients from previous iterations would still be sitting there and the next step would head in the wrong direction.

18Compute A inside the graph

A is a function of K, so it must be part of the computation graph. If you instead pre-computed A = (K.item() - P0)/P0 you would silently disconnect the gradient and the optimizer would never update K through this branch.

EXAMPLE

K=2000, P0=50 -> A = 39.0 on the first iteration

19Forward: P_hat = K / (1 + A exp(-r t))

The analytic logistic, evaluated as a vector. Because t has shape (6,) and r and K are scalars, broadcasting makes P_hat shape (6,) too — one prediction per measurement.

EXAMPLE

On the first iteration with r=0.3, K=2000 the predictions are massively off — the loss should be in the 0.1–1 range.

22Relative MSE loss

Plain MSE would let the largest measurement (886) dominate. Dividing by P first makes a 10% error count the same at P = 50 and at P = 900. This is the practical loss for sigmoidal fits across orders of magnitude.

EXAMPLE

((P_hat - P)/P)**2 with P_hat=P gives 0

24loss.backward()

Walks the computational graph in reverse, filling r.grad and K.grad with ∂loss/∂r and ∂loss/∂K. We never wrote those derivatives — autograd assembled them from the elementary operations.

25optimizer.step()

Updates r and K using their accumulated gradients and Adam's internal momentum/variance state. After this call both parameters are closer to the truth than they were on the previous iteration.

28torch.no_grad() block

We are about to modify K in place to keep A > 0. That modification must NOT be tracked by autograd, otherwise the next backward pass would try to differentiate through the clamp.

29K.clamp_(min=P.max() + 1)

If K ever drops below the largest observation, then A = (K - P0)/P0 becomes negative and exp(-r t) starts producing predictions above K — physically nonsense. Clamping prevents the optimizer from wandering into that region.

EXAMPLE

If P.max() = 886.51 then K is held at >= 887.51 at all times.

31Periodic print

Every 200 steps we print the current r, K, and loss. Watching loss drop several orders of magnitude while r slides 0.30 -> 0.10 and K slides 2000 -> 1000 is the most satisfying visualization of multi-parameter training.

32Format string

.item() unwraps a 0-dim tensor into a Python float. The :.4f / :.2f / :.3e format specifiers keep the printout aligned and readable.

35Final report

After 2000 steps you should see r ≈ 0.10000 and K ≈ 1000.0. If you change the initial guesses, the trajectory shifts but the answer stays the same — the loss landscape has one minimum and gradient descent finds it.

17 lines without explanation

1import torch
2
3# Same six samples, but pretend we know neither r nor K.
4t = torch.tensor([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])
5P = torch.tensor([50.0, 125.1610, 280.0046, 513.8867, 741.8413, 886.5083])
6
7# Two learnable parameters. Initialize at clearly wrong values to verify movement.
8r = torch.tensor(0.30, requires_grad=True)
9K = torch.tensor(2000.0, requires_grad=True)
10P0 = torch.tensor(P[0].item())   # use the first sample as the known initial value
11
12optimizer = torch.optim.Adam([r, K], lr=0.05)
13
14for step in range(2000):
15    optimizer.zero_grad()
16
17    # Forward: P_hat(t) = K / (1 + A * exp(-r t))  with A = (K - P0) / P0.
18    A = (K - P0) / P0
19    P_hat = K / (1.0 + A * torch.exp(-r * t))
20
21    # Relative MSE so a 10% error matters the same at P=50 and P=900.
22    loss = torch.mean(((P_hat - P) / P) ** 2)
23
24    loss.backward()
25    optimizer.step()
26
27    # Light constraint: keep K above the largest observation so A stays positive.
28    with torch.no_grad():
29        K.clamp_(min=P.max().item() + 1.0)
30
31    if step % 200 == 0:
32        print(f"step {step:4d}   r={r.item():.4f}  K={K.item():.2f}  "
33              f"loss={loss.item():.3e}")
34
35print(f"\nRecovered r = {r.item():.5f}   (target 0.10000)")
36print(f"Recovered K = {K.item():.3f}    (target 1000.000)")

Why this idea matters beyond ecology

The sigmoid $\sigma(x) = 1/(1 + e^{-x})$ inside every classical neural network is literally the rescaled logistic $P(t) / K$ with $A = 1, r = 1$ . The training loop you just ran with two parameters is the smallest possible version of fitting a neural network with two weights to a sigmoidal target. The calculus that governs trout in a lake is the same calculus that governs the activation function of a perceptron.

Common Pitfalls

Mistaking early logistic for pure exponential. The early-time behavior is $P_0 e^{rt}$ to first order. Fit the first ten years of a logistic and you will get an excellent exponential model — that then predicts an extinction-of-resources catastrophe that never materializes. Always extend the model with a saturation term before extrapolating.
Forgetting that K is an output of the system. Inexperienced modelers treat $K$ as a fixed external constant. In real ecosystems $K$ emerges from the resource environment and can move with season, harvest, or climate. If the carrying capacity itself drifts, the logistic still applies pointwise but the asymptote is no longer flat.
Linearization with the wrong K. The trick $\ln((K-P)/P) = \ln A - rt$ requires knowing the correct $K$ . If you use $K_{\text{wrong}} < P_{\text{measured}}$ you get the log of a negative number — and your fitting code crashes. Use the PyTorch path to fit both parameters together when you are not sure of $K$ .
Reading the inflection time off the data. Inexperienced fitters eyeball $t^*$ from a plot and quote one decimal place. The inflection time is sensitive to both $r$ and the ratio $(K-P_0)/P_0$ ; small errors in either shift it dramatically. Always derive it from the fitted parameters, not from the chart.
Confusing continuous logistic with discrete logistic map. The continuous $dP/dt = r P (1-P/K)$ always settles monotonically to $K$ . The discrete $P_{n+1} = r P_n (1 - P_n)$ famously period-doubles and becomes chaotic for $r > 3.57$ . They look almost identical on paper. They are not the same equation.

Summary

Real populations cannot grow forever. The logistic ODE $dP/dt = r P (1 - P/K)$ adds one crowding term to the pure exponential, encoding a carrying capacity $K$ .
The right side, plotted vs $P$ , is an upside-down parabola: zeros at $0$ and $K$ (the two equilibria), maximum at $K/2$ (the inflection of the time curve).
Separation of variables plus partial fractions gives the closed form $P(t) = K/(1 + A e^{-rt})$ with $A = (K - P_0)/P_0$ .
Three regimes — exponential start, inflection at $P = K/2$ with max rate $rK/4$ , exponential approach to $K$ — together produce the S-shape.
A single log transform, $\ln((K-P)/P) = \ln A - r t$ , linearizes the sigmoid. With known $K$ , a five-line least-squares fit recovers $r$ exactly.
When $K$ is unknown, PyTorch fits both parameters jointly with autograd in a few hundred steps. The same script generalizes to the sigmoid layers of a neural network.
The logistic appears in ecology (Verhulst), epidemiology (early epidemic curves), economics (Bass diffusion of technology), neuroscience (sigmoid firing rate) and machine learning. Same equation; the meaning of $P$ , $r$ , and $K$ changes.

Where Exponential Breaks

The single question of this section

The Carrying Capacity Idea

Three readings of the same fraction

Writing the Equation From Intuition

The two equilibria fall out for free

Phase Line: The Whole Story in One Picture

Solving the Logistic ODE

Sanity check at the two ends

Three Regimes and the Inflection Point

Early phase: nearly exponential

Middle phase: the inflection

Late phase: saturation

Interactive Logistic Explorer

What to play with first

Every Starting Population Lands at K

Worked Example: r=0.1r = 0.1r=0.1, K=1000K = 1000K=1000, P0=50P_0 = 50P0​=50

Reality Check: Yeast in a Flask

Where the model still breaks

Python: Stepping the ODE by Hand

Python: Recovering r from Data via Linearization

PyTorch: Fitting r and K Together

Why this idea matters beyond ecology

Common Pitfalls

Summary

Worked Example: $r = 0.1$ , $K = 1000$ , $P_0 = 50$