Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

After working through this section you should be able to:

Derive Newton's law of cooling from a one-sentence physical observation, ending in the differential equation $\dfrac{dT}{dt} = -k\,(T - T_{\text{env}})$ .
Solve that ODE by separation of variables and recognize the answer $T(t) = T_{\text{env}} + (T_0 - T_{\text{env}})\,e^{-kt}$ as exponential relaxation toward the room.
Read off the meaning of each symbol — initial temperature, ambient temperature, cooling constant, half-life of the gap — and predict the temperature at any future time.
Apply the law to two classic problems: the cooling-coffee puzzle (when to add cream) and the forensic estimation of time of death.
Simulate the ODE with forward Euler in plain Python and recover the unknown $k$ from noisy measurements with PyTorch autograd.

Why Hot Things Cool Down

Imagine you place a cup of fresh coffee on the counter. After a minute it is a little cooler. After ten minutes it is much cooler. After an hour it is room temperature, and nothing more happens. The story of how a temperature decays is one of the very first systems ever modeled by a differential equation, and the story is the same for every object on Earth — a forgotten cup of tea, a forging cooling in air, a body on the floor of a crime scene.

The piece of physics behind all of these is so simple it almost feels like cheating. Heat moves from hot to cold, and the rate at which it moves is proportional to the temperature difference between the object and its surroundings. A cup at 95°C in a 20°C room is 75 degrees out of equilibrium and loses heat fast. Once it reaches 30°C, it is only 10 degrees out, and now it loses heat slowly. Cooling decelerates because it eats away its own driving force.

Newton's observation (1701)

The rate at which an object cools is proportional to the gap between its temperature and the surrounding temperature:

\frac{dT}{dt} \;\propto\; -(T - T_{\text{env}}).

That single sentence, written more than three hundred years ago, is enough to build the whole solution. We just need to turn the words into calculus.

Where Newton Got the Idea

Newton was investigating the temperature of red-hot iron as it cooled in air. He recorded paired numbers — time on a clock and temperature on a scale — and noticed something striking: in equal stretches of time, the temperature did not drop by equal amounts. Instead, the fraction of the remaining excess heat that was lost in each minute was roughly constant. Hot iron lost a lot in the first minute and a little in the tenth, but it always lost about the same percentage of how-far-from-cold it was.

Anything whose loss rate is proportional to its current size obeys an exponential decay law. So Newton was discovering, before exponentials had a name, that the temperature gap decays exponentially. Once you see this in the data, the differential equation almost writes itself.
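The "same percentage every minute" pattern can be checked numerically. The sketch below assumes an illustrative cooling constant and starting gap (neither is Newton's data) and compares the absolute drop with the fractional drop over successive one-minute intervals.

```python
import math

k = 0.05        # assumed cooling constant, per minute (illustrative only)
gap0 = 75.0     # assumed initial gap T0 - T_env, in degrees Celsius

def gap(t):
    """Temperature gap after t minutes under exponential decay."""
    return gap0 * math.exp(-k * t)

# Fraction of the remaining gap lost during each one-minute interval.
fractions = [1.0 - gap(t + 1) / gap(t) for t in range(10)]
# Absolute number of degrees lost during each one-minute interval.
drops = [gap(t) - gap(t + 1) for t in range(10)]

print("fraction lost per minute:", [round(f, 4) for f in fractions])
print("degrees lost per minute: ", [round(d, 3) for d in drops])
```

The fractions come out identical while the absolute drops shrink from one minute to the next, which is the signature Newton noticed in his readings.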

Newton is using the same shape of argument that gives us radioactive decay, drug clearance, RC-circuit voltage, and chemical concentration in a stirred tank. Section 21.4 showed you the bare exponential. This section dresses it up in physical clothing.

Building the Differential Equation

Let $T(t)$ be the temperature of the body at time $t$ , and $T_{\text{env}}$ the (assumed constant) room temperature. Newton's observation says

\frac{dT}{dt} = -k\,(T - T_{\text{env}})

where $k > 0$ is a constant of proportionality whose units are 1/time (so the right side has units of temperature per time, just like the left). The minus sign encodes the physics: if $T > T_{\text{env}}$ , the parenthesis is positive, and we want $dT/dt < 0$ ; if $T < T_{\text{env}}$ , the parenthesis is negative, and $dT/dt > 0$ . In both cases T moves toward $T_{\text{env}}$ .
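A quick way to convince yourself of the sign convention is to evaluate the right-hand side at a hot, a cold, and an equilibrium temperature. This is a minimal sketch; the default values of `T_env` and `k` are illustrative assumptions.

```python
def dT_dt(T, T_env=20.0, k=0.05):
    """Right-hand side of Newton's law of cooling."""
    return -k * (T - T_env)

hot = dT_dt(95.0)     # object hotter than the room: slope should be negative
cold = dT_dt(5.0)     # object colder than the room: slope should be positive
at_eq = dT_dt(20.0)   # object at room temperature: slope should be zero

print(hot, cold, at_eq)
```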

Notice this is a first-order linear ODE, exactly the kind we solved in Section 21.1 with an integrating factor. We could use that machinery here. But because the right-hand side depends only on $T$ (not on $t$), the equation is also autonomous and separable. Separation is the cleanest path, so that is what we use.

Solving It by Separation of Variables

The trick is to move all the $T$ 's to one side of the equation and all the $t$ 's to the other. Dividing through by $(T - T_{\text{env}})$ and multiplying by $dt$ :

\frac{dT}{T - T_{\text{env}}} = -k\,dt.

Integrate both sides:

\int \frac{dT}{T - T_{\text{env}}} = \int -k\,dt

\ln\!\bigl|T - T_{\text{env}}\bigr| = -k\,t + C_1.

Exponentiating both sides — and absorbing the sign and the constant into a single arbitrary constant $C$ — gives

T - T_{\text{env}} = C\,e^{-kt}.

The constant $C$ is pinned down by the initial condition $T(0) = T_0$ : plug in $t = 0$ and you get $C = T_0 - T_{\text{env}}$ . Substituting back:

\boxed{\;T(t) = T_{\text{env}} + (T_0 - T_{\text{env}})\,e^{-kt}.\;}

That is the closed-form solution. Three lines of separable-equation algebra turn one physical sentence into a formula valid at every instant in time.
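If you want to double-check the algebra, you can plug the boxed formula back into the ODE numerically. The sketch below uses a central finite difference for $dT/dt$; the parameter values and the step `h` are illustrative assumptions.

```python
import math

T_env, T0, k = 20.0, 95.0, 0.05   # illustrative values (assumed)

def T(t):
    """Closed-form solution of Newton's law of cooling."""
    return T_env + (T0 - T_env) * math.exp(-k * t)

def residual(t, h=1e-5):
    """Central-difference dT/dt minus the right-hand side -k (T - T_env)."""
    dTdt = (T(t + h) - T(t - h)) / (2 * h)
    return dTdt + k * (T(t) - T_env)

max_residual = max(abs(residual(t)) for t in [0.0, 1.0, 5.0, 20.0, 60.0])
print(f"T(0) = {T(0.0):.4f}, largest ODE residual = {max_residual:.2e}")
```

A residual that is tiny at every sampled time means the formula satisfies both the equation and the initial condition.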

Anatomy of the Solution

Every piece of $T(t)$ means something physical. The formula is small, so let us name everything in it.

Symbol	Name	What it sets	Units
T_env	Ambient temperature	The horizontal asymptote — the curve approaches it but never crosses	°C (temperature)
T_0	Initial temperature	Where the curve starts at t = 0	°C (temperature)
T_0 − T_env	Initial gap	Vertical distance from the asymptote at t = 0 — the amplitude of the decay	°C
k	Cooling constant	How fast the gap shrinks. Big k = fast cooling. Small k = slow	1 / time
1/k	Time constant τ	Time for the gap to shrink to 1/e ≈ 36.8% of its starting value	time
ln(2)/k	Half-life of the gap	Time for the gap to shrink to half its starting value	time

Two limits worth memorizing. As $t \to 0$ , $T \to T_0$ . As $t \to \infty$ , $e^{-kt} \to 0$ so $T \to T_{\text{env}}$ . The exponential glues the two endpoints together with the right curvature.
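The last two rows of the table are easy to confirm with a few lines of Python. This sketch assumes $k = 0.05$ per minute purely for illustration.

```python
import math

k = 0.05                    # assumed cooling constant, per minute
tau = 1.0 / k               # time constant
t_half = math.log(2) / k    # half-life of the gap

def gap_fraction(t):
    """Fraction of the initial gap that remains after time t."""
    return math.exp(-k * t)

print(f"tau    = {tau:.2f} min, gap remaining = {gap_fraction(tau):.4f}")
print(f"t_half = {t_half:.2f} min, gap remaining = {gap_fraction(t_half):.4f}")
```

After one time constant about 36.8% of the gap is left, and after one half-life exactly half of it is left, whatever the starting temperatures were.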

Interactive: Direction Field & Solution Family

The clearest way to see the law is to look at its slope field. At every point in the $(t, T)$ plane there is a unique slope $-k(T - T_{\text{env}})$ , and the solution curves are the lines that always follow those slopes. Drag the sliders. Click on the left edge of the plot to seed new initial temperatures and watch them flow into the dashed equilibrium line.


Push $k$ close to zero and the arrows flatten out: cooling stalls and the curves barely move. Push $k$ up to 0.5 and the field looks almost like a vacuum cleaner pointed at $T_{\text{env}}$: every curve sucks into the asymptote within a couple of minutes.

Worked Example: Coffee on the Counter

Let's pin all this down with one concrete problem. Try it by hand first; expand the panel to compare.

Problem. A cup of coffee at $T_0 = 95^\circ\!\mathrm{C}$ is left in a room at $T_{\text{env}} = 20^\circ\!\mathrm{C}$ . After 5 minutes the temperature is 78.41°C.

(a) Find the cooling constant $k$ .
(b) Predict the temperature at $t = 20$ min.
(c) When is the coffee at 60°C (drinkable)?

Show the full hand-worked solution

Step 1. Write the solution template.

T(t) = 20 + 75\,e^{-kt}, \qquad \text{because } T_0 - T_{\text{env}} = 95 - 20 = 75.

Step 2. Use the 5-minute reading to find $k$ .

78.41 = 20 + 75\,e^{-5k} \;\Longrightarrow\; e^{-5k} = \frac{58.41}{75} = 0.7788.

Take the natural log of both sides:

-5k = \ln 0.7788 = -0.2500, \quad\therefore\quad k = 0.0500 \text{ /min}.

(We chose the data so $k$ comes out clean.)

Step 3 — part (b). Predict T(20).

T(20) = 20 + 75\,e^{-1.0} = 20 + 75 \times 0.3679 \approx 47.59^\circ\!\mathrm{C}.

After 20 minutes the coffee is about 47.6°C. Note that this is past the “ideal drinking” window — wait too long and the coffee passes through it.

Step 4 — part (c). When does T = 60°C? Solve

60 = 20 + 75\,e^{-0.05\,t}

e^{-0.05\,t} = \frac{40}{75} \approx 0.5333.

t = -\frac{\ln 0.5333}{0.05} = \frac{0.6286}{0.05} \approx 12.57 \text{ min}.

So the coffee enters the 60°C zone at about t ≈ 12.6 minutes.

Sanity check (half-life). The half-life of the gap is $\ln 2 / k = 0.6931 / 0.05 \approx 13.86$ minutes. Starting from a 75° gap, after 13.86 min the gap should be 37.5°, i.e. $T \approx 57.5^\circ\!\mathrm{C}$, slightly below 60°, consistent with the $t \approx 12.6$ min above. ✓
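The hand calculation can be reproduced in a few lines, which is a handy way to catch arithmetic slips. This sketch uses only the numbers given in the problem statement; the variable names are illustrative.

```python
import math

T_env, T0 = 20.0, 95.0
t1, T1 = 5.0, 78.41     # the 5-minute reading from the problem

# (a) cooling constant from the single reading
k = -math.log((T1 - T_env) / (T0 - T_env)) / t1

def T(t):
    """Temperature at time t (minutes) with the fitted k."""
    return T_env + (T0 - T_env) * math.exp(-k * t)

# (b) temperature at 20 minutes
T20 = T(20.0)

# (c) time at which the coffee reaches 60 degrees
t60 = -math.log((60.0 - T_env) / (T0 - T_env)) / k

print(f"k = {k:.4f} /min, T(20) = {T20:.2f} C, t(60 C) = {t60:.2f} min")
```

Running it should reproduce the three answers from Steps 2 to 4: $k \approx 0.05$ per minute, about 47.6°C at 20 minutes, and about 12.6 minutes to reach 60°C.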

Interactive: The Coffee + Cream Puzzle

Suppose you want to drink your coffee as warm as possible after 10 minutes, and you have a splash of refrigerated cream that will lower the temperature instantly when added. Should you add the cream immediately, or wait until you are about to drink?

Most people guess “wait” — they reason that pouring cold cream into hot coffee will cool it down a lot, so why do it early? Newton's law gives a counterintuitive answer. The cooling rate is proportional to the gap $(T - T_{\text{env}})$ , so a cooler cup loses heat more slowly. By dropping the temperature now you also slow down every subsequent loss. Play with the simulator and watch the two strategies fight it out.


Press “Show Cream Timing Comparison”. The orange curve (cream at $t = 5$) ends colder than the green curve (cream at $t = 0$). Adding the cream early wins, every time, for any positive $k$ and any cream temperature lower than room temperature. This is the same logic that makes you put the milk in the fridge before opening the carton on a hot day.
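The comparison can also be reproduced without the simulator. The sketch below rests on several simplifying assumptions that are not part of the original page: mixing is modeled as an instant weighted average, the cream stays at fridge temperature until it is poured, and $k$ is unchanged by the added liquid. The cream temperature and mixing fraction are hypothetical values.

```python
import math

T_env, T0, k = 20.0, 95.0, 0.05   # room, coffee, cooling constant (per minute)
T_cream = 5.0                      # assumed fridge temperature of the cream
frac = 0.10                        # assumed cream share of the final mixture
t_drink = 10.0                     # minutes until the first sip

def cool(T_start, minutes):
    """Newton cooling from T_start for the given number of minutes."""
    return T_env + (T_start - T_env) * math.exp(-k * minutes)

def add_cream(T_coffee):
    """Instant mixing, modeled as a weighted average of the two temperatures."""
    return (1.0 - frac) * T_coffee + frac * T_cream

def final_temperature(t_add):
    """Cool until t_add, add the cream, then cool until t_drink."""
    return cool(add_cream(cool(T0, t_add)), t_drink - t_add)

for t_add in (0.0, 5.0, 10.0):
    print(f"cream at t = {t_add:4.1f} min -> {final_temperature(t_add):.2f} C at t = 10")
```

Under these assumptions the earlier the cream goes in, the warmer the cup is at the ten-minute mark. Set `T_cream` above `T_env` and the ordering flips, which is why the condition on the cream temperature matters.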

Forensic Application: Time of Death

Newton's law of cooling is so dependable that it is used in forensic medicine. A body at $37^\circ\!\mathrm{C}$ starts cooling toward room temperature the moment the heart stops. Given two temperature readings, taken at known times after discovery, both $k$ and the time of death can be solved.

Let $\tau$ be the time since the first reading. Then $T(\tau) = T_{\text{env}} + (T_1 - T_{\text{env}})\,e^{-k\tau}$ . A second reading at $\tau = \Delta\tau$ gives

T_2 = T_{\text{env}} + (T_1 - T_{\text{env}})\,e^{-k\,\Delta\tau}

so we can solve for k:

k = -\,\dfrac{1}{\Delta\tau}\,\ln\!\left(\dfrac{T_2 - T_{\text{env}}}{T_1 - T_{\text{env}}}\right).

With $k$ in hand, run the curve backwards until $T = 37^\circ$ — that is the time of death:

t_{\text{death}} = \dfrac{1}{k}\,\ln\!\left(\dfrac{37 - T_{\text{env}}}{T_1 - T_{\text{env}}}\right).


Try the default readings: $T_1 = 30^\circ$, $T_2 = 28^\circ$, room at $22^\circ$, one-hour gap. The estimator gives $k \approx 0.288 \text{/h}$ and time of death about $2.19$ hours before the first reading. Real forensic textbooks publish typical values of $k$ for nude vs clothed bodies; the method is the same.
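The two formulas above fit in one small function. This is a minimal sketch of the idealized single-exponential estimate only; the function name and argument names are hypothetical, and the 37°C default is the living body temperature used in the text.

```python
import math

def estimate_time_of_death(T1, T2, T_env, dt_hours, T_body=37.0):
    """Return (k, hours between death and the first reading)."""
    # Cooling constant from the two readings taken dt_hours apart.
    k = -math.log((T2 - T_env) / (T1 - T_env)) / dt_hours
    # Run the curve backwards from T1 until it reaches T_body.
    hours_before_first = math.log((T_body - T_env) / (T1 - T_env)) / k
    return k, hours_before_first

k, hours = estimate_time_of_death(T1=30.0, T2=28.0, T_env=22.0, dt_hours=1.0)
print(f"k = {k:.3f} /h, death about {hours:.2f} h before the first reading")
```

With the default readings this reproduces $k \approx 0.288$ per hour and roughly 2.19 hours.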

In the real world, room temperature drifts, bodies are not perfect spheres of muscle, and clothing matters. The single-exponential Newton law overpredicts cooling in the first hours, because a real body's core temperature sits on an initial plateau before it starts to fall (in practice the Henssge nomogram, which builds that plateau into a two-exponential model, is used). Calculus is still the scaffolding: the corrections are just better physics layered on the same ODE.

Python: Numerical vs Closed-Form Solution

We have a beautiful closed form. Why ever solve cooling numerically? Two reasons. (1) Real-world problems almost never have closed forms — once $T_{\text{env}}$ drifts with time or $k$ depends on $T$ , separation breaks. A numerical solver just keeps working. (2) Watching forward Euler underestimate the truth by exactly the right amount is the best way to internalize what a derivative actually is.

Forward Euler vs the analytic exponential cooling curve

🐍newton_cooling_euler.py


Import the math module

We only need math.exp for the closed-form solution. Using the standard library keeps the example dependency-free, so a student can paste it straight into a fresh Python file.

Block: physical setup

The next five lines bind the problem's numbers to names: three physical quantities (T_env, T0, k) and two numerical settings (dt, steps). Reading the code top-to-bottom should feel like reading a lab notebook entry.

T_env = 20.0 (the room)

T_env is the asymptote of Newton's law. Solutions slide toward it but never cross. Using 20°C means a typical room. Change this and the steady state moves with you.

T0 = 95.0 (initial coffee)

Fresh-brewed coffee sits near 95°C. This is the value of T at t = 0, the initial condition. The constant of integration in the analytic solution is C = T0 - T_env = 75.

k = 0.05 (cooling rate)

Newton's k has units of 1/time. A k of 0.05 per minute means: every minute, the gap (T - T_env) shrinks by about 5%. The half-life of the gap is ln(2)/k ≈ 13.86 minutes.

dt = 1.0 (Euler step)

Forward Euler advances time by dt at each step using only the slope at the current point. A small dt is more accurate but slower. We pick dt = 1 minute so the first few rows of output are easy to read.

steps = 30 (simulate 30 minutes)

Total horizon. With dt = 1 and steps = 30, the loop produces 31 sample points covering t = 0, 1, 2, ..., 30 minutes: long enough to see the coffee pass through drinkable temperature and fall to about 37°C, most of the way to the room.

def T_exact(t) (the analytic solution)

We define the closed form T(t) = T_env + (T0 - T_env) e^{-kt} once, as a Python function. The function is pure: same t always returns the same T. We will use it both to print the truth row and to measure Euler's error.

return T_env + (T0 - T_env) * math.exp(-k * t)

Compute (T0 - T_env) = 75, multiply by e^{-k t}, add T_env back. For t = 0: 20 + 75·1 = 95 ✓. For t = 5: 20 + 75·e^{-0.25} = 20 + 75·0.77880 ≈ 78.4101.

T = T0 (state variable for Euler)

This is the variable that actually moves through time. We initialize it to T0 = 95. After each Euler step T will be updated. Notice this is a regular float, not a list; the list T_num below is what we save for plotting.

Three result buffers

times stores the timeline (0, 1, 2, ...). T_num stores the numerical Euler approximation. T_an stores the analytic truth. Storing both lets the script print the error column at the end without re-running anything.

for n in range(steps + 1): (the Euler loop)

We iterate from n = 0 up to and INCLUDING n = steps. The +1 is intentional: we want 31 samples covering t = 0 through t = 30, not 30. Record-then-step is the standard pattern.

t = n * dt

Reconstruct the wall-clock time from the integer iteration counter. With n = 0, 1, 2, ... and dt = 1, this gives t = 0, 1, 2, ... in minutes.

times.append(t)

Save the current time to the timeline list. We append BEFORE updating T so that times[n] matches T_num[n] (both refer to the start of step n).

T_num.append(T)

Save the current numerical temperature. At n = 0 this is 95.0000 by construction. At n = 1 it is 91.2500 (computed on the previous iteration's last line). The values march down toward 20.

T_an.append(T_exact(t))

Call the analytic function at the same t. T_an[0] = 95.0000, T_an[1] = 95 - 75(1 - e^{-0.05}) ≈ 91.3422. The discrepancy with T_num[1] = 91.2500 is the Euler error of one step.

dTdt = -k * (T - T_env) (Newton's law of cooling)

This single line IS the differential equation. The slope at the current point is proportional to how far T is from the room, with a minus sign because hot bodies cool. At T = 95, T_env = 20, k = 0.05 it evaluates to -3.75 °C/min.

T = T + dt * dTdt (Euler update)

Tangent-line extrapolation: walk along the current slope for dt time units. With dTdt = -3.75 and dt = 1 we add -3.75 to T = 95 and get 91.25. This is why Euler at step 1 underestimates the truth slightly: the slope itself is changing while we walk.

print(...) (header row)

f-strings with format specs >4 and >9 right-align each column. In this row every field ('t', 'Euler', 'Exact', 'error') is a string literal, so the line only prints the column labels. Pure presentation, no math here.

for n in range(0, 7): (show first 7 rows)

Don't print all 31 rows. The first seven are enough to see the curvature of cooling and the small but growing Euler error. Indexing into times, T_num, T_an gives the n-th sample.

err = T_num[n] - T_an[n]

Signed error. Negative numbers (which is what you will see) mean Euler is BELOW the true curve. This is typical for forward Euler on a concave-up cooling curve: the tangent line undershoots when the second derivative is positive.

print(...) (print the row)

Expected first few rows (numbers from running this on CPython 3.11; your values should match to 4 decimals): t=0 Euler=95.0000 Exact=95.0000 error=0.0000; t=1 Euler=91.2500 Exact=91.3422 error=-0.0922; t=2 Euler=87.6875 Exact=87.8628 error=-0.1753; t=5 Euler=78.0336 Exact=78.4101 error=-0.3765. The error keeps growing over these rows because each step's undershoot adds to the ones before it.


import math

# ---- The physical setup ----------------------------------
T_env = 20.0      # room temperature in degrees Celsius
T0    = 95.0      # initial coffee temperature
k     = 0.05      # cooling constant, per minute
dt    = 1.0       # Euler step size, minutes
steps = 30        # simulate 30 minutes

# ---- (A) Closed-form solution ----------------------------
def T_exact(t):
    return T_env + (T0 - T_env) * math.exp(-k * t)

# ---- (B) Forward-Euler numerical solver ------------------
T = T0
times, T_num, T_an = [], [], []
for n in range(steps + 1):
    t = n * dt
    times.append(t)
    T_num.append(T)
    T_an.append(T_exact(t))
    dTdt = -k * (T - T_env)        # Newton's law
    T    = T + dt * dTdt           # one Euler step

# ---- Show the first few rows -----------------------------
print(f"{'t':>4} {'Euler':>9} {'Exact':>9} {'error':>9}")
for n in range(0, 7):
    err = T_num[n] - T_an[n]
    print(f"{times[n]:>4.0f} {T_num[n]:>9.4f} {T_an[n]:>9.4f} {err:>9.4f}")

Expected console output:

   t     Euler     Exact     error
   0   95.0000   95.0000    0.0000
   1   91.2500   91.3422   -0.0922
   2   87.6875   87.8628   -0.1753
   3   84.3031   84.5531   -0.2500
   4   81.0880   81.4048   -0.3168
   5   78.0336   78.4101   -0.3765
   6   75.1319   75.5614   -0.4295

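The error keeps growing over these first few steps, because each tangent-line undershoot adds to the ones before it. The hallmark of a first-order method is how that error scales with the step size: halving $dt$ roughly halves the error at any fixed time. To shrink it much faster you would move to a higher-order method like RK4, which we cover in a later chapter.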
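The step-size scaling is easy to test directly. The sketch below reruns forward Euler up to $t = 6$ minutes with three step sizes and compares each result with the closed form; the helper name `euler_T` is an illustrative choice, not part of the script above.

```python
import math

T_env, T0, k = 20.0, 95.0, 0.05

def euler_T(t_end, dt):
    """Forward Euler estimate of T(t_end) with step size dt."""
    T = T0
    for _ in range(round(t_end / dt)):
        T += dt * (-k * (T - T_env))
    return T

exact = T_env + (T0 - T_env) * math.exp(-k * 6.0)
errors = {dt: euler_T(6.0, dt) - exact for dt in (1.0, 0.5, 0.25)}

for dt, err in errors.items():
    print(f"dt = {dt:<4} error at t = 6: {err:+.4f}")
```

Each halving of `dt` should cut the error roughly in half, at the price of twice as many steps.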

PyTorch: Fitting k from Real Measurements

Now flip the problem on its head. Instead of knowing $k$ and predicting the curve, suppose you have a thermometer and a stopwatch and you measured 11 temperatures over 20 minutes. What was $k$ ? The closed-form answer comes from any two measurements — but that throws away nine data points and is fragile against noise. A better approach is to fit the whole cooling curve to all the data at once.

That is exactly what an autograd optimizer does. Define a model (the Newton-law curve, parameterized by $k$ ), define a loss (mean-squared error against the measurements), and ask PyTorch to walk $k$ downhill on that loss until it stops moving. The same machinery that powers GPT does this without breaking a sweat.

Recover k from noisy data with PyTorch autograd

🐍newton_cooling_fit.py


Import PyTorch

We are about to use PyTorch's autograd engine to fit a physical constant. The same engine that trains GPT also does single-variable curve fitting, and that is exactly what this script is.

times tensor

Eleven sample times in minutes, expressed as a torch.tensor of floats. We do NOT mark this with requires_grad: the times are inputs we observed, not unknowns we're learning.

T_obs tensor (synthetic data)

Eleven measured temperatures (Celsius) corresponding to the times above. These numbers were generated from the true law T = 20 + 75·exp(-0.05·t) with a tiny bit of Gaussian noise, so the secretly-correct answer is k ≈ 0.05. Of course we pretend we don't know that.

T_env = 20.0

Room temperature. We assume the room is known (a thermostat will tell you). It enters the law as the asymptote.

T0 = 95.0

Initial coffee temperature, also assumed known from the very first measurement. The only unknown is the cooling rate k.

k = torch.tensor(0.10, requires_grad=True)

Create a 0-D tensor holding our current guess for k. requires_grad=True is the crucial flag: it tells autograd to record every operation involving k onto the computation graph so that loss.backward() can later fill k.grad with ∂loss/∂k. We start at 0.10 (twice the true value) on purpose. Adam will walk it down toward 0.05.

optimizer = torch.optim.Adam([k], lr=0.01)

Adam is a gradient-descent variant with adaptive per-parameter learning rates. The first argument is the LIST of tensors to optimize ([k] here, a single scalar). lr=0.01 is the base step size. Plain torch.optim.SGD is not a drop-in replacement here: the loss is so steep in k (the gradient at k = 0.10 is in the thousands) that SGD with lr=0.01 would overshoot wildly and diverge, so it would need a much smaller learning rate. Adam rescales each step by the gradient's running magnitude, which is why lr=0.01 works.

for step in range(2000): (training loop)

2000 gradient steps. With Adam and lr=0.01 this is dramatic overkill for a one-parameter problem (convergence happens in a few hundred steps), but it leaves no doubt that k has settled to a fixed point.

T_pred = T_env + (T0 - T_env) * torch.exp(-k * times)

Vectorized prediction. The scalar k broadcasts against times: -k * times is a length-11 tensor, torch.exp is applied elementwise, the result is multiplied by the scalar (T0 - T_env) = 75, and the scalar T_env = 20 is added, giving a length-11 T_pred. Every operation here is differentiable in k, which is what autograd needs.

loss = ((T_pred - T_obs) ** 2).mean()

Mean-squared-error loss between prediction and observation. .mean() averages over the 11 sample points so the gradient magnitude doesn't depend on how many measurements we happen to have. This is a scalar; .backward() requires a scalar starting point.

optimizer.zero_grad()

PyTorch accumulates gradients into .grad attributes by default. If we don't zero them out, the next .backward() ADDS the new gradient on top of stale data from the previous iteration, a classic, silent bug.

loss.backward()

Computes ∂loss/∂k via reverse-mode autodiff and writes the result into k.grad. Mathematically: ∂loss/∂k = mean over t of 2·(T_pred(t) - T_obs(t)) · (-times[t]) · (T0 - T_env) · exp(-k·times[t]). PyTorch derives that automatically, which is the entire point of autograd.

optimizer.step()

Adam consumes k.grad and updates k in place. The arithmetic is roughly: k ← k - lr · m̂/(√v̂ + ε) where m̂ and v̂ are bias-corrected running averages of the gradient and of its square. The reader does not need to know Adam's formulas to use it; it is enough to know that it usually drives the loss down, though not necessarily at every single step.

Report recovered k

After 2000 Adam steps, k.item() converts the 0-D tensor back to a regular Python float. Expected output: recovered k ≈ 0.0502 per minute, within about 0.5% of the true 0.05 we used to generate the data. The remaining gap is the noise we added on purpose.

Half-life of the temperature gap

Closed-form sanity check. The gap (T - T_env) halves when e^{-k·t_½} = 1/2, i.e. t_½ = ln(2)/k ≈ 0.6931 / 0.0502 ≈ 13.8 min. So this coffee loses half its 'excess heat' about every 14 minutes, a number you could measure with a wristwatch and a thermometer.

import torch

# ---- Synthetic measurements (every 2 minutes for 20 min) -
times = torch.tensor([0., 2., 4., 6., 8., 10.,
                      12., 14., 16., 18., 20.])
T_obs = torch.tensor([95.28, 87.44, 81.20, 75.67, 69.97,
                      65.47, 61.21, 56.99, 53.31, 50.55, 47.89])

T_env = 20.0
T0    = 95.0

# ---- The unknown we want to discover ---------------------
k = torch.tensor(0.10, requires_grad=True)   # bad initial guess
optimizer = torch.optim.Adam([k], lr=0.01)

# ---- Fit Newton's law of cooling to the data -------------
for step in range(2000):
    T_pred = T_env + (T0 - T_env) * torch.exp(-k * times)
    loss   = ((T_pred - T_obs) ** 2).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"recovered k = {k.item():.4f} per minute")
print(f"half-life of the temperature gap: {0.6931 / k.item():.2f} min")

Expected output (the last digit may differ slightly between PyTorch versions and hardware):

recovered k = 0.0502 per minute
half-life of the temperature gap: 13.82 min

The true $k$ used to generate the data was 0.0500. The least-squares fit to these 11 noisy measurements lands within about 0.5% of it.
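The gradient that `loss.backward()` fills in can be cross-checked without PyTorch. The sketch below writes out the chain-rule expression for ∂loss/∂k by hand in plain Python and compares it with a central finite difference at the starting guess $k = 0.10$. The helper names `loss` and `grad` are illustrative; the data are the same eleven measurements.

```python
import math

times = [0., 2., 4., 6., 8., 10., 12., 14., 16., 18., 20.]
T_obs = [95.28, 87.44, 81.20, 75.67, 69.97,
         65.47, 61.21, 56.99, 53.31, 50.55, 47.89]
T_env, T0 = 20.0, 95.0

def loss(k):
    """Mean-squared error of the Newton-law curve against the data."""
    return sum((T_env + (T0 - T_env) * math.exp(-k * t) - y) ** 2
               for t, y in zip(times, T_obs)) / len(times)

def grad(k):
    """d(loss)/dk from the chain rule, written out by hand."""
    total = 0.0
    for t, y in zip(times, T_obs):
        decay = math.exp(-k * t)
        residual = T_env + (T0 - T_env) * decay - y
        total += 2.0 * residual * (-t) * (T0 - T_env) * decay
    return total / len(times)

k0, h = 0.10, 1e-6
numeric = (loss(k0 + h) - loss(k0 - h)) / (2 * h)
print(f"hand-derived gradient: {grad(k0):.3f}")
print(f"finite difference:     {numeric:.3f}")
```

The two numbers should agree closely, and both are positive at $k = 0.10$, which tells the optimizer to move $k$ downward.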

Replace T0 = 95.0 with a learnable tensor and watch PyTorch recover both $T_0$ and $k$. Once you make $T_{\text{env}}$ a third unknown the model becomes ill-conditioned: there are many $(T_0, T_{\text{env}}, k)$ triples that fit the same data. This is the smallest possible example of a real modeling trap.
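For a quick cross-check on the two-parameter fit, here is a different technique from the autograd loop: with $T_{\text{env}}$ known, $\ln(T - T_{\text{env}}) = \ln(T_0 - T_{\text{env}}) - kt$ is a straight line in $t$, so ordinary least squares on the logged gaps recovers both $k$ and $T_0$ in closed form. This is a sketch in plain Python, not the PyTorch exercise itself, and it weights the noise differently from the MSE loss, so expect small differences in the last digits.

```python
import math

times = [0., 2., 4., 6., 8., 10., 12., 14., 16., 18., 20.]
T_obs = [95.28, 87.44, 81.20, 75.67, 69.97,
         65.47, 61.21, 56.99, 53.31, 50.55, 47.89]
T_env = 20.0   # still assumed known

# Straight-line fit of y = ln(T - T_env) against t.
y = [math.log(T - T_env) for T in T_obs]
n = len(times)
t_mean = sum(times) / n
y_mean = sum(y) / n
slope = (sum((t - t_mean) * (v - y_mean) for t, v in zip(times, y))
         / sum((t - t_mean) ** 2 for t in times))
intercept = y_mean - slope * t_mean

k_fit = -slope                          # slope of the line is -k
T0_fit = T_env + math.exp(intercept)    # intercept is ln(T0 - T_env)
print(f"k = {k_fit:.4f} per minute, T0 = {T0_fit:.2f} C")
```

Both values should land close to the 0.05 and 95 used to generate the data. The trick only works because $T_{\text{env}}$ is fixed; once it is unknown the log transform is no longer available, which is another view of the ill-conditioning.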

Where Else Cooling Shows Up

Pharmacokinetics

Drug concentration in the bloodstream decays toward zero by the same ODE, with $T_{\text{env}} = 0$ (no drug at steady state) and $k$ set by the kidneys and liver. The half-life of a medication is $\ln 2/k$ . Dosing schedules — every 4 hours, every 12 hours — are chosen so the concentration stays inside a therapeutic window.

RC Circuits

A capacitor discharging through a resistor obeys $dV/dt = -V/(RC)$ , exact mathematical twin of Newton's law with $T_{\text{env}} = 0$ and $k = 1/(RC)$ . The time constant $\tau = RC$ is the electrical analog of the thermal $\tau = 1/k$ .

Radioactive Decay

Replace temperature with the number of unstable nuclei and you get $dN/dt = -\lambda N$ . Half-life $\ln 2 / \lambda$ is the basis of carbon dating.

Chemical Reactions

A first-order reaction $A \to \text{products}$ has concentration decaying as $[A](t) = [A]_0 e^{-kt}$ . Same skeleton, chemistry instead of physics.

Anywhere “rate ∝ gap from equilibrium” holds

Sales returning to a baseline after a promotion, a thermostat-fed room responding to weather, a population whose density approaches carrying capacity in the small-perturbation limit. The shape recurs because the assumption — proportional restoring rate — is the simplest nontrivial model of relaxation.
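Because all of these share one formula, a single helper covers them. The sketch below is illustrative: the function name `relax`, the resistor and capacitor values, and the starting voltage are assumptions chosen for round numbers, and the carbon-14 half-life is taken as roughly 5730 years.

```python
import math

def relax(x0, x_eq, k, t):
    """Value at time t of a quantity relaxing from x0 toward x_eq at rate k."""
    return x_eq + (x0 - x_eq) * math.exp(-k * t)

# RC discharge: assumed R = 10 kOhm and C = 100 uF, so tau = RC = 1 s.
R, C = 10e3, 100e-6
V_after_tau = relax(x0=5.0, x_eq=0.0, k=1.0 / (R * C), t=R * C)

# Radioactive decay: fraction of carbon-14 left after one half-life.
lam = math.log(2) / 5730.0
fraction_left = relax(x0=1.0, x_eq=0.0, k=lam, t=5730.0)

# Cooling coffee, with the numbers from the worked example.
coffee_20min = relax(x0=95.0, x_eq=20.0, k=0.05, t=20.0)

print(f"capacitor voltage after one time constant: {V_after_tau:.3f} V")
print(f"carbon-14 fraction left after one half-life: {fraction_left:.3f}")
print(f"coffee after 20 minutes: {coffee_20min:.2f} C")
```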

Common Pitfalls

Sign error on k. The minus is part of the law, not part of $k$ . Write $dT/dt = -k(T - T_{\text{env}})$ with $k > 0$ . If you absorb the minus into $k$ you will fit a negative cooling rate and your coffee will catch fire.
Forgetting the constant of integration. After separation you get $\ln|T - T_{\text{env}}| = -kt + C_1$ . That $C_1$ is essential — it becomes the amplitude $(T_0 - T_{\text{env}})$ after exponentiating.
Using only one data point to find k. One reading plus the initial condition determines $k$ exactly, which leaves no protection against measurement noise. Extra readings overdetermine $k$ and let the noise average out. Many data points + least squares is the gold standard.
Assuming T_env is constant. If the room itself is heating or cooling, replace $T_{\text{env}}$ with $T_{\text{env}}(t)$ . The ODE is now forced and non-autonomous; separation no longer works directly. You go back to the integrating-factor method of Section 21.1.
Mixing time units. $k$ has units of 1/time. If you measure in minutes but quote the half-life in hours, you will be off by 60×.
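The drifting-room pitfall is a good place to see the numerical solver earn its keep. The sketch below assumes a hypothetical room that warms linearly, $T_{\text{env}}(t) = a + bt$, runs forward Euler on the forced ODE, and compares with the integrating-factor solution for that particular drift, $T(t) = a + bt - b/k + (T_0 - a + b/k)\,e^{-kt}$. The values of `a`, `b`, and `dt` are illustrative.

```python
import math

a, b = 20.0, 0.1        # hypothetical room: starts at 20 C, warms 0.1 C per minute
T0, k = 95.0, 0.05

def T_env_of_t(t):
    """Room temperature at time t (minutes)."""
    return a + b * t

def euler(t_end, dt=0.01):
    """Forward Euler for dT/dt = -k (T - T_env(t))."""
    T, t = T0, 0.0
    for _ in range(round(t_end / dt)):
        T += dt * (-k * (T - T_env_of_t(t)))
        t += dt
    return T

def exact(t):
    """Integrating-factor solution for a linearly drifting room."""
    return a + b * t - b / k + (T0 - a + b / k) * math.exp(-k * t)

T_num, T_true = euler(60.0), exact(60.0)
print(f"Euler: {T_num:.3f} C, exact: {T_true:.3f} C, room: {T_env_of_t(60.0):.1f} C")
```

The only change to the Euler loop is that `T_env` is evaluated at the current time. The closed form, by contrast, had to be re-derived for this specific drift.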

Summary

Concept	What to remember
Newton's law	dT/dt = -k (T − T_env). Rate of cooling ∝ gap from environment.
Solution	T(t) = T_env + (T_0 − T_env) e^(−kt). Exponential relaxation toward T_env.
Time constant	τ = 1/k. Half-life of the gap = ln(2)/k ≈ 0.693/k.
Geometry	T_env is the horizontal asymptote. Curves above descend; curves below ascend. None cross.
Finding k	Two readings → k = -(1/Δτ) ln[(T_2 − T_env)/(T_1 − T_env)].
Forensic use	Solve k from two readings, run curve back to 37°C → time of death.
Numerical	Forward Euler: T ← T + dt · dT/dt. First-order accurate: the error at a fixed time shrinks ∝ dt.
Parameter fitting	MSE loss + autograd + Adam recovers k from noisy measurements.

Coming next: Section 21.6 generalizes “rate ∝ gap” to mixing problems — saltwater flowing in and out of a tank — where the conservation law is mass instead of energy but the differential equation is the same character in a new costume.