Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Define the convolution integral $(f * g)(t) = \int_0^t f(\tau) g(t-\tau) \, d\tau$ and explain what each piece represents
State and apply the Convolution Theorem: $\mathcal{L}\{f * g\} = F(s) \cdot G(s)$
Use convolution to solve initial value problems that are difficult to handle with partial fractions
Compute system response of linear time-invariant (LTI) systems via convolution with the impulse response
Connect convolution to modern applications in signal processing, convolutional neural networks, and scientific computing

The Big Picture: Why Convolution Exists

"Convolution answers the fundamental question: if a system remembers everything it has ever experienced, what is the total accumulated effect right now?"

Imagine pouring dye into a flowing river. At time $\tau = 0$ you add a drop. At $\tau = 1$ , another. At $\tau = 2$ , a larger amount. The current concentration of dye at time $t$ depends on every past addition, each decayed by how long ago it happened. That accumulated total is precisely what the convolution integral computes.

In mathematical terms, convolution captures the weighted accumulation of one function over the history of another. It answers:

Engineering: If I know how a circuit responds to an impulse, what happens when I apply an arbitrary input?
Physics: Given a time-varying force on a spring, what is the resulting motion?
Probability: What is the distribution of the sum of two independent random variables?
Signal processing: How does a filter modify a signal passing through it?
Machine learning: How does a convolutional layer extract features from data?

The Convolution Theorem provides the crucial bridge: this complex integral in the time domain becomes simple multiplication in the Laplace (or frequency) domain. This duality is one of the most powerful ideas in all of applied mathematics.

The Central Idea

Convolution in time = Multiplication in frequency. The Laplace transform converts the difficult integral $\int_0^t f(\tau)g(t-\tau)\,d\tau$ into the simple product $F(s) \cdot G(s)$ . This makes solving differential equations and analyzing systems dramatically easier.

Historical Context: Three Centuries of Convolution

The idea of convolution has roots stretching back to the 18th century, though the term "convolution" was not coined until much later.

In the 1750s, Leonhard Euler and Joseph-Louis Lagrange encountered convolution-like integrals while studying the superposition principle for differential equations. When a system is linear, the response to a sum of inputs is the sum of individual responses—an idea that naturally leads to integrating over all past influences.

Pierre-Simon Laplace formalized these ideas around 1782 when he developed his transform method. He showed that certain integrals over products of functions—what we now call convolutions—become simple products in the transformed domain. This was the first incarnation of the Convolution Theorem.

The modern notation and theory were refined by Vito Volterra in the early 1900s (who studied integral equations), and the concept was systematized by Gustav Doetsch, who established much of modern Laplace transform theory in the 1930s.

The Word "Convolution"

The term comes from the Latin convolvere, meaning "to roll together." This is apt: to compute the convolution, one function is flipped and "rolled" (slid) across the other, accumulating their product at each position. The animation below shows this process directly.

The Convolution Integral

The convolution of two functions $f(t)$ and $g(t)$ , both defined for $t \geq 0$ , is denoted $(f * g)(t)$ and defined as:

Definition: The Convolution Integral

(f * g)(t) = \int_0^t f(\tau) \, g(t - \tau) \, d\tau

The integral runs from 0 to $t$, accumulating the product of $f$ at past time $\tau$ with $g$ evaluated at the "time since then."

Reading the Integral Symbol by Symbol

Let us decode every piece of this definition:

Symbol	Name	Meaning
(f * g)(t)	Convolution of f and g at time t	The total accumulated effect at the present moment t
∫₀ᵗ	Integral from 0 to t	Sum over all past times from the start (0) to now (t)
τ (tau)	Integration variable	A past time instant between 0 and t; the "when" of a past event
f(τ)	Input at past time τ	The value of f at some earlier moment
g(t - τ)	Response aged by (t - τ)	The effect of something that happened (t - τ) time units ago
dτ	Infinitesimal time slice	We sum over all infinitesimal contributions

The Physical Intuition

Think of $f(\tau)$ as the "cause" at past time $\tau$ , and $g(t - \tau)$ as how much that cause still contributes to the present. The convolution adds up all these weighted past contributions. This is exactly how a circuit with memory, a spring with damping, or a neural network layer processes signals.

Notice the key structure: as $\tau$ increases from 0 to $t$ , the argument of $f$ moves forward in time while the argument of $g$ moves backward. When $\tau = 0$ , we evaluate $f(0) \cdot g(t)$ ; when $\tau = t$ , we evaluate $f(t) \cdot g(0)$ . The integral sweeps through all combinations where the arguments sum to $t$ .
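To make the "cause at time $\tau$, weighted by its age $t - \tau$" reading concrete, here is a minimal sketch that evaluates the integral with adaptive quadrature. The helper name `convolve_at` and the particular choice of a constant input with an exponentially fading memory are illustrative assumptions, not part of the definition.

```python
import numpy as np
from scipy import integrate

def convolve_at(f, g, t):
    """Approximate (f * g)(t) = integral over [0, t] of f(tau) * g(t - tau)."""
    value, _abs_err = integrate.quad(lambda tau: f(tau) * g(t - tau), 0.0, t)
    return value

# Illustrative choices (assumptions): a constant "cause" and a fading "memory".
f = lambda tau: 1.0            # input switched on at time 0
g = lambda age: np.exp(-age)   # weight given to an event that happened `age` ago

for t in (0.5, 1.0, 2.0):
    closed_form = 1.0 - np.exp(-t)   # integral of e^{-(t - tau)} over [0, t]
    print(f"t={t:.1f}  quadrature={convolve_at(f, g, t):.6f}  closed form={closed_form:.6f}")
```

For these two functions the integral can be done by hand, which gives the closed form used for comparison in the loop.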

Properties of Convolution

Convolution obeys several important algebraic properties that mirror those of multiplication. These properties are essential for both theoretical analysis and practical computation.

Property	Statement	In Symbols
Commutativity	Order does not matter	f * g = g * f
Associativity	Grouping does not matter	(f * g) * h = f * (g * h)
Distributivity	Distributes over addition	f * (g + h) = f * g + f * h
Identity	Delta function is the identity	f * δ = f
Zero element	Convolution with zero gives zero	f * 0 = 0
Scalar multiplication	Constants factor out	c(f * g) = (cf) * g = f * (cg)
Time shift	Shift of convolution result	Shift in either f or g shifts the result

Why Commutativity Matters

The fact that $f * g = g * f$ means we can compute the convolution integral in whichever order is easier. This is proven by the substitution $u = t - \tau$ :

$(f * g)(t) = \int_0^t f(\tau) \, g(t-\tau) \, d\tau$

Let $u = t - \tau$ , so $\tau = t - u$ and $d\tau = -du$ :

$= -\int_t^0 f(t-u) \, g(u) \, du = \int_0^t g(u) \, f(t-u) \, du = (g * f)(t)$

The Delta Function as Identity

The property $f * \delta = f$ is perhaps the most important. By the sifting property of the delta function:

$(f * \delta)(t) = \int_0^t f(\tau) \, \delta(t-\tau) \, d\tau = f(t)$

This is why the delta function is called the "identity for convolution"—convolving any function with $\delta(t)$ returns the function unchanged. In system theory, this means: the response to an impulse fully characterizes the system.
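The algebraic properties in the table have direct discrete counterparts, so they can be spot-checked on short arrays. The sketch below uses `np.convolve` on random test vectors (an arbitrary choice) and a length-one array as the discrete stand-in for $\delta$. It illustrates the properties but is not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.normal(size=8)
g = rng.normal(size=8)
h = rng.normal(size=8)
delta = np.array([1.0])   # discrete unit impulse (stand-in for the delta function)

commutative = np.allclose(np.convolve(f, g), np.convolve(g, f))
associative = np.allclose(np.convolve(np.convolve(f, g), h),
                          np.convolve(f, np.convolve(g, h)))
distributive = np.allclose(np.convolve(f, g + h),
                           np.convolve(f, g) + np.convolve(f, h))
identity = np.allclose(np.convolve(f, delta), f)

print(commutative, associative, distributive, identity)
```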

Numerical Walkthrough You Can Do By Hand

Before turning to the visualizer, let us evaluate one full convolution manually. Pick $f(t) = e^{-2t}$ and $g(t) = e^{-3t}$ and ask: $(f * g)(0.5) = \;?$

We will compute the same number three different ways — direct integration, the Convolution Theorem, and a four-rectangle Riemann sum — and confirm they agree. Try it on paper first, then open the panel to check.

Three independent computations of $(e^{-2t} * e^{-3t})(0.5)$

Method 1 — Direct evaluation of the integral.

Start from the definition with $t = 0.5$ :

(f * g)(0.5) = \int_0^{0.5} e^{-2\tau} \, e^{-3(0.5 - \tau)} \, d\tau

Combine the two exponentials by adding exponents: $-2\tau - 3(0.5 - \tau) = -1.5 + \tau$ .

= \int_0^{0.5} e^{-1.5 + \tau} \, d\tau = e^{-1.5} \int_0^{0.5} e^{\tau} \, d\tau

Evaluate the inner integral: $\int_0^{0.5} e^{\tau} d\tau = e^{0.5} - 1 = 1.64872 - 1 = 0.64872$ .

(f * g)(0.5) = e^{-1.5} \cdot 0.64872 = 0.22313 \cdot 0.64872 \approx 0.14475

Method 2 — Convolution Theorem + partial fractions.

Laplace-transform each function: $F(s) = \frac{1}{s+2}$ and $G(s) = \frac{1}{s+3}$ .

F(s)\,G(s) = \frac{1}{(s+2)(s+3)} = \frac{1}{s+2} - \frac{1}{s+3}

Inverse-transform term by term:

(f * g)(t) = e^{-2t} - e^{-3t}

Plug in $t = 0.5$ :

(f * g)(0.5) = e^{-1} - e^{-1.5} = 0.36788 - 0.22313 \approx 0.14475

Same number — and we did it without computing a single integral by hand. That is the Convolution Theorem earning its keep.

Method 3 — Four-rectangle Riemann sum (do this on a calculator).

Split $[0, 0.5]$ into four equal sub-intervals of width $\Delta\tau = 0.125$ and use the midpoints $\tau \in \{0.0625, 0.1875, 0.3125, 0.4375\}$ . At each midpoint we evaluate $f(\tau)\,g(0.5 - \tau)$ :

τ	f(τ) = e^(−2τ)	g(0.5 − τ) = e^(−3(0.5 − τ))	product
0.0625	0.88250	0.26915	0.23752
0.1875	0.68729	0.39161	0.26915
0.3125	0.53526	0.56978	0.30498
0.4375	0.41686	0.82903	0.34559

Sum of products: $0.23752 + 0.26915 + 0.30498 + 0.34559 = 1.15724$

Multiply by $\Delta\tau = 0.125$ :

(f * g)(0.5) \approx 1.15724 \cdot 0.125 \approx 0.14466

Within $\approx 0.07\%$ of the exact $0.14475$ using only four rectangles. With 1000 midpoint rectangles the Riemann sum and the closed form match to six decimals. The Python script later in this section runs a similar comparison on a fine time grid.

What you should take away.

Three independent paths — pure integration, an algebraic Laplace trick, and a numerical sum — land on the same value $0.14475$ . This is the kind of cross-check that builds intuition. The Convolution Theorem is the path that scales: even when the integral is too ugly to do by hand, multiplying transforms and taking an inverse is almost always tractable.
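The midpoint sum from Method 3 is easy to automate. The following sketch (the function name `midpoint_convolution` is an illustrative choice) repeats the four-rectangle calculation and then refines it to 1000 rectangles so the convergence toward the closed form can be observed directly.

```python
import numpy as np

def midpoint_convolution(f, g, t, n):
    """Midpoint Riemann sum for (f * g)(t) using n equal sub-intervals of [0, t]."""
    d_tau = t / n
    tau = (np.arange(n) + 0.5) * d_tau   # midpoints of the sub-intervals
    return float(np.sum(f(tau) * g(t - tau)) * d_tau)

f = lambda x: np.exp(-2.0 * x)
g = lambda x: np.exp(-3.0 * x)

exact = np.exp(-1.0) - np.exp(-1.5)      # closed form e^{-2t} - e^{-3t} at t = 0.5
for n in (4, 1000):
    approx = midpoint_convolution(f, g, 0.5, n)
    rel_err = abs(approx - exact) / exact
    print(f"n={n:4d}  approx={approx:.6f}  exact={exact:.6f}  rel.err={rel_err:.2e}")
```

The midpoint rule's error shrinks roughly with the square of the rectangle width, which is why going from 4 to 1000 rectangles buys several extra digits.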

Interactive: Convolution in Action

Watch the convolution integral being computed in real time. Select two functions and slide the time parameter to see how the product $f(\tau) \cdot g(t - \tau)$ changes and how its integral (the shaded area) traces out the convolution:

[Interactive figure: Laplace Convolution $(f * g)(t)$. Curves: $f(t)$, $g(t)$, $(f * g)(t)$. Controls: first function $f(t)$, second function $g(t)$, time $t$, playback speed.]

The purple shaded region shows the product $f(\tau) \cdot g(t - \tau)$ being integrated. As time $t$ increases, more of the product contributes to the convolution value.

You can also explore how different distribution shapes convolve in this broader visualization that uses the "flip and slide" interpretation:

[Interactive figure: Flip-and-Slide Convolution Explorer. Curves: $f(x)$, flipped $g(t - x)$, $(f * g)(t)$. Controls: first distribution $f(x)$, second distribution $g(x)$, position $t$, playback speed.]

How Convolution Works

The convolution (f * g)(t) is computed by:

Flip the second function g(x) to get g(-x)
Shift it by t to get g(t - x)
Multiply with f(x) pointwise
Integrate the product (shaded purple area)

The result at each $t$ is the area of the purple shaded region, that is, the integral of the product over the range where the two distributions overlap.
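The four flip-and-slide steps translate almost word for word into a discrete loop. This is a minimal sketch for sampled sequences (the function name `flip_and_slide` and the test vectors are made up for illustration), compared against `np.convolve` as a reference.

```python
import numpy as np

def flip_and_slide(f, g):
    """Discrete convolution computed literally as flip, shift, multiply, sum."""
    n_out = len(f) + len(g) - 1
    g_flipped = g[::-1]                                   # step 1: flip g
    pad = np.zeros(len(g) - 1)
    f_padded = np.concatenate([pad, f, pad])              # room for g to slide fully across f
    out = np.empty(n_out)
    for shift in range(n_out):                            # step 2: shift
        window = f_padded[shift:shift + len(g)]
        out[shift] = np.sum(window * g_flipped)           # steps 3 and 4: multiply, then sum
    return out

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])
result = flip_and_slide(f, g)
print(result)
print(np.convolve(f, g))
```

In the continuous setting the sum becomes an integral and each term is weighted by the sample spacing, but the geometric picture is the same.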

The Convolution Theorem

The Convolution Theorem is the central result that connects convolution in the time domain to multiplication in the Laplace domain. It is arguably the most practically useful theorem in all of Laplace transform theory.

The Convolution Theorem

If $\mathcal{L}\{f(t)\} = F(s)$ and $\mathcal{L}\{g(t)\} = G(s)$, then:

\mathcal{L}\{(f * g)(t)\} = F(s) \cdot G(s)

Equivalently:

\mathcal{L}^{-1}\{F(s) \cdot G(s)\} = (f * g)(t) = \int_0^t f(\tau) \, g(t-\tau) \, d\tau

This theorem says: to find the Laplace transform of a convolution, just multiply the individual transforms. Conversely, when you encounter a product $F(s) \cdot G(s)$ in the s-domain and partial fractions are inconvenient, you can find its inverse by computing the convolution of $f(t)$ and $g(t)$ .

Proof of the Convolution Theorem

The proof is an elegant application of switching the order of integration. We need to show that $\mathcal{L}\{(f * g)(t)\} = F(s) \cdot G(s)$ .

Step 1: Write out the Laplace transform of the convolution.

$\mathcal{L}\{(f * g)(t)\} = \int_0^{\infty} e^{-st} \left[ \int_0^t f(\tau) \, g(t-\tau) \, d\tau \right] dt$

Step 2: Switch the order of integration. The region of integration is $0 \leq \tau \leq t < \infty$ , which is equivalent to $0 \leq \tau < \infty$ and $\tau \leq t < \infty$ :

$= \int_0^{\infty} f(\tau) \left[ \int_{\tau}^{\infty} e^{-st} \, g(t-\tau) \, dt \right] d\tau$

Step 3: In the inner integral, substitute $u = t - \tau$ , so $t = u + \tau$ and $dt = du$ . When $t = \tau$ , $u = 0$ ; when $t \to \infty$ , $u \to \infty$ :

$= \int_0^{\infty} f(\tau) \left[ \int_0^{\infty} e^{-s(u+\tau)} \, g(u) \, du \right] d\tau$

Step 4: Factor the exponential:

$= \int_0^{\infty} f(\tau) \, e^{-s\tau} \left[ \int_0^{\infty} e^{-su} \, g(u) \, du \right] d\tau$

Step 5: The inner integral is simply $G(s)$ , which does not depend on $\tau$ :

$= G(s) \int_0^{\infty} f(\tau) \, e^{-s\tau} \, d\tau = G(s) \cdot F(s)$

Therefore $\mathcal{L}\{f * g\} = F(s) \cdot G(s)$ . QED.

The Power of the Proof

The key insight is in Step 2: switching the order of integration separates the double integral into two independent factors—each one being a Laplace transform. This separation is what turns convolution into multiplication.
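The theorem can also be checked numerically without using the closed form of the convolution. The sketch below approximates both sides by quadrature for $f(t) = e^{-2t}$, $g(t) = e^{-3t}$ at one sample value of $s$. Truncating the Laplace integral at a finite upper limit is an assumption that relies on the integrand decaying quickly, and the helper names are illustrative.

```python
import numpy as np
from scipy import integrate

def convolve_at(f, g, t):
    """(f * g)(t) by adaptive quadrature over [0, t]."""
    value, _ = integrate.quad(lambda tau: f(tau) * g(t - tau), 0.0, t)
    return value

def laplace_numeric(func, s, upper=20.0):
    """Truncated Laplace integral over [0, upper]; assumes the tail beyond `upper` is negligible."""
    value, _ = integrate.quad(lambda t: np.exp(-s * t) * func(t), 0.0, upper, limit=200)
    return value

f = lambda t: np.exp(-2.0 * t)
g = lambda t: np.exp(-3.0 * t)

s = 1.5
lhs = laplace_numeric(lambda t: convolve_at(f, g, t), s)   # L{f * g}(s), via a nested integral
rhs = laplace_numeric(f, s) * laplace_numeric(g, s)        # F(s) * G(s)
print(f"L{{f*g}}({s}) ~ {lhs:.8f}   F(s)G(s) ~ {rhs:.8f}")
```

The left side is exactly the double integral from Step 1 of the proof, evaluated the slow way, while the right side is the product of two single integrals from Step 5.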

Interactive: The Convolution Theorem

Explore specific examples showing how the Convolution Theorem converts s-domain products into time-domain convolutions:

[Interactive panel: The Convolution Theorem, with an example selector. Example shown: Two Exponentials.]

Two Exponentials

$f(t) = e^{-at}$, with $F(s) = \frac{1}{s+a}$
$g(t) = e^{-bt}$, with $G(s) = \frac{1}{s+b}$
$F(s) \cdot G(s) = \frac{1}{(s+a)(s+b)}$

Why the Convolution Theorem is Powerful

Signal Processing: Filtering = convolving a signal with a filter's impulse response
Control Systems: Output = input convolved with the system's impulse response
Differential Equations: Convert products in the s-domain to time-domain solutions

Solving Initial Value Problems Using Convolution

The Convolution Theorem provides an alternative method for finding inverse Laplace transforms—one that is especially useful when the s-domain expression is a product $F(s) \cdot G(s)$ where partial fractions would be tedious.

The General Strategy

Take the Laplace transform of the ODE to get an algebraic equation for $Y(s)$
Solve for $Y(s)$ and identify it as a product $F(s) \cdot G(s)$
Recognize $f(t) = \mathcal{L}^{-1}\{F(s)\}$ and $g(t) = \mathcal{L}^{-1}\{G(s)\}$
Compute the convolution $y(t) = (f * g)(t) = \int_0^t f(\tau)\,g(t-\tau)\,d\tau$

Example: Second-Order IVP

Problem: Solve $y'' + y = \sin(2t)$ with $y(0) = 0, \; y'(0) = 0$ .

Solution using the Convolution Theorem:

Step 1: Taking the Laplace transform:

s^2 Y(s) + Y(s) = \frac{2}{s^2 + 4}

Y(s) = \frac{2}{(s^2 + 1)(s^2 + 4)}

Step 2: Factor as a product:

Y(s) = \underbrace{\frac{1}{s^2 + 1}}_{F(s)} \cdot \underbrace{\frac{2}{s^2 + 4}}_{G(s)}

Step 3: Find the inverse transforms:

$f(t) = \mathcal{L}^{-1}\{F(s)\} = \sin(t)$
$g(t) = \mathcal{L}^{-1}\{G(s)\} = \sin(2t)$

Step 4: Compute the convolution:

y(t) = \int_0^t \sin(\tau) \, \sin(2(t-\tau)) \, d\tau

Using the product-to-sum identity $\sin A \sin B = \frac{1}{2}[\cos(A-B) - \cos(A+B)]$ :

= \frac{1}{2} \int_0^t \left[\cos(\tau - 2(t-\tau)) - \cos(\tau + 2(t-\tau))\right] d\tau

= \frac{1}{2} \int_0^t \left[\cos(3\tau - 2t) - \cos(2t - \tau)\right] d\tau

Evaluating these standard integrals:

y(t) = \frac{2}{3}\sin(t) - \frac{1}{3}\sin(2t)
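As a sanity check on the final formula, the sketch below evaluates the convolution integral from Step 4 by quadrature and compares it with the closed form. It also confirms that the closed form satisfies $y'' + y = \sin(2t)$ using its analytic second derivative. The function names are illustrative.

```python
import numpy as np
from scipy import integrate

def y_convolution(t):
    """Step 4 integral: sin(tau) * sin(2 (t - tau)) integrated over [0, t]."""
    value, _ = integrate.quad(lambda tau: np.sin(tau) * np.sin(2.0 * (t - tau)), 0.0, t)
    return value

def y_closed_form(t):
    return (2.0 / 3.0) * np.sin(t) - (1.0 / 3.0) * np.sin(2.0 * t)

def y_second_derivative(t):
    # Differentiating the closed form twice by hand.
    return -(2.0 / 3.0) * np.sin(t) + (4.0 / 3.0) * np.sin(2.0 * t)

t_values = np.linspace(0.0, 6.0, 13)
max_gap = max(abs(y_convolution(t) - y_closed_form(t)) for t in t_values)
residual = float(np.max(np.abs(
    y_second_derivative(t_values) + y_closed_form(t_values) - np.sin(2.0 * t_values))))
print(f"max |integral - closed form| = {max_gap:.2e},  max ODE residual = {residual:.2e}")
```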

Convolution vs. Partial Fractions

For this example, partial fractions would also work (decompose $\frac{2}{(s^2+1)(s^2+4)}$ ). But for more complex products, especially involving irreducible quadratic factors or higher powers, convolution often provides a cleaner path.

Worked Examples

Example 1: Convolving Two Exponentials

Problem: Find $(e^{-2t}) * (e^{-3t})$ .

Method 1: Direct computation

$\int_0^t e^{-2\tau} \cdot e^{-3(t-\tau)} \, d\tau = e^{-3t} \int_0^t e^{\tau} \, d\tau$

$= e^{-3t} \left[ e^{\tau} \right]_0^t = e^{-3t}(e^t - 1) = e^{-2t} - e^{-3t}$

Method 2: Convolution Theorem

$F(s) \cdot G(s) = \frac{1}{s+2} \cdot \frac{1}{s+3} = \frac{1}{(s+2)(s+3)}$

Partial fractions: $\frac{1}{(s+2)(s+3)} = \frac{1}{s+2} - \frac{1}{s+3}$

Inverse: $\mathcal{L}^{-1} = e^{-2t} - e^{-3t}$

(e^{-2t}) * (e^{-3t}) = e^{-2t} - e^{-3t}

Example 2: Step Function Convolved with Exponential

Problem: Find $u(t) * e^{-at}$ (where $a > 0$ ).

$\int_0^t 1 \cdot e^{-a(t-\tau)} \, d\tau = e^{-at} \int_0^t e^{a\tau} \, d\tau$

$= e^{-at} \cdot \frac{1}{a}\left[ e^{a\tau} \right]_0^t = \frac{e^{-at}}{a}(e^{at} - 1)$

u(t) * e^{-at} = \frac{1 - e^{-at}}{a}

This is the classic charging curve—the step response of a first-order system. It starts at 0 and exponentially approaches $1/a$ .

Example 3: Two Unit Steps

Problem: Find $u(t) * u(t)$ .

Direct: $\int_0^t 1 \cdot 1 \, d\tau = t$

Via Convolution Theorem: $\frac{1}{s} \cdot \frac{1}{s} = \frac{1}{s^2} \longrightarrow \mathcal{L}^{-1} = t$

u(t) * u(t) = t \cdot u(t)

Convolving two step functions yields a ramp function. The convolution "integrates" the step function, accumulating linearly over time.

Example 4: Expressing Solutions as Convolutions

Problem: Write the solution to $y'' + 4y = g(t)$ , $y(0) = 0, y'(0) = 0$ as a convolution.

Taking the Laplace transform:

Y(s) = \frac{G(s)}{s^2 + 4} = \frac{1}{s^2 + 4} \cdot G(s)

Since $\mathcal{L}^{-1}\left\{\frac{1}{s^2 + 4}\right\} = \frac{1}{2}\sin(2t)$ , the Convolution Theorem gives:

y(t) = \frac{1}{2} \int_0^t \sin(2(t-\tau)) \, g(\tau) \, d\tau

This is a general formula valid for any forcing function $g(t)$ . The solution is expressed as the convolution of the system's impulse response with the input—a result of extraordinary generality.
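Because the formula holds for any forcing, it can be wrapped in a small function and compared against a direct numerical solution of the ODE. In this sketch the forcing $g(t) = e^{-t}$ is a hypothetical choice for illustration, and `scipy.integrate.solve_ivp` serves as the independent reference.

```python
import numpy as np
from scipy import integrate

def solve_by_convolution(forcing, t):
    """y(t) = (1/2) * integral over [0, t] of sin(2 (t - tau)) * forcing(tau)."""
    value, _ = integrate.quad(lambda tau: np.sin(2.0 * (t - tau)) * forcing(tau), 0.0, t)
    return 0.5 * value

forcing = lambda tau: np.exp(-tau)   # hypothetical forcing, chosen only for illustration

t_eval = np.linspace(0.0, 5.0, 11)
y_conv = np.array([solve_by_convolution(forcing, t) for t in t_eval])

# Independent check: integrate y'' + 4y = forcing as a first-order system [y, y'].
rhs = lambda t, state: [state[1], forcing(t) - 4.0 * state[0]]
sol = integrate.solve_ivp(rhs, (0.0, 5.0), [0.0, 0.0],
                          t_eval=t_eval, rtol=1e-10, atol=1e-12)

max_gap = float(np.max(np.abs(y_conv - sol.y[0])))
print(f"max |convolution formula - ODE solver| = {max_gap:.2e}")
```

Swapping in a different `forcing` requires no new derivation, which is the practical payoff of writing the solution as a convolution.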

System Response and LTI Systems

One of the most important applications of convolution is in the theory of linear time-invariant (LTI) systems. This framework applies to electrical circuits, mechanical systems, control systems, and even neural networks.

The Key Idea

An LTI system is fully characterized by its impulse response $h(t)$ —the output when the input is a unit impulse $\delta(t)$ . Once you know $h(t)$ , the output $y(t)$ for any input $x(t)$ is:

LTI System Response

y(t) = (x * h)(t) = \int_0^t x(\tau) \, h(t-\tau) \, d\tau

Output = Input convolved with Impulse Response

In the Laplace domain, this becomes the beautifully simple relationship:

Transfer Function

Y(s) = X(s) \cdot H(s)

where

H(s) = \mathcal{L}\{h(t)\}

is the system's transfer function

Physical Examples

System	Impulse Response h(t)	Transfer Function H(s)	Step Response
RC Circuit	(1/RC)·e^(-t/RC)	1/(RCs + 1)	1 - e^(-t/RC)
Spring-Mass-Damper	(1/mωd)·e^(-ζωt)sin(ωd·t)	1/(ms² + cs + k)	Oscillatory approach to 1/k
First-Order ODE	(1/τ)·e^(-t/τ)	1/(τs + 1)	1 - e^(-t/τ)
Pure Integrator	u(t)	1/s	t·u(t) (ramp)
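The RC row of the table can be reproduced numerically: convolve a unit step with the impulse response and compare with the listed step response. The time constant and grid spacing below are assumed values, and the endpoint correction that turns the plain sum into a trapezoid rule is a design choice of this sketch rather than something the table prescribes.

```python
import numpy as np

RC = 0.5                      # assumed time constant in seconds (illustrative)
dt = 1e-3                     # assumed sample spacing
t = np.arange(0.0, 3.0, dt)

h = (1.0 / RC) * np.exp(-t / RC)   # impulse response from the table
x = np.ones_like(t)                # unit step input

# np.convolve returns the sum over i = 0..k of x[i] * h[k - i], both endpoints included.
raw = np.convolve(x, h)[: len(t)] * dt
# Subtract half of each endpoint term to obtain the trapezoid rule.
y_num = raw - 0.5 * dt * (x[0] * h + x * h[0])

y_exact = 1.0 - np.exp(-t / RC)    # step response from the table
max_err = float(np.max(np.abs(y_num - y_exact)))
print(f"max |numerical - exact| = {max_err:.2e}")
```

Without the correction the error is proportional to `dt`. With it, the error drops to order `dt**2`.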

Interactive: LTI System Explorer

Explore how different systems respond to different inputs. Select a system type and input signal to see the convolution in action. The output $y(t) = x(t) * h(t)$ is computed numerically:

[Interactive figure: LTI System Response via Convolution. Diagram: input $x(t)$ convolved with system $h(t)$ gives output $y(t)$. Controls: system type, input signal, time constant $\tau$.]

Machine Learning Connections

Convolution is not just a mathematical curiosity—it is the foundational operation of some of the most successful machine learning architectures ever built. Understanding the calculus of convolution illuminates why these methods work.

Convolutional Neural Networks (CNNs)

In a CNN, each layer applies a set of learned convolution filters to extract features from input data. The operation is:

\text{output}[i] = \sum_{k} \text{input}[i + k] \cdot \text{filter}[k]

This is discrete convolution (technically cross-correlation, but the filter is learned so the distinction is moot). The key insight from calculus:

Feature extraction = convolution: Edge detectors, texture recognizers, and pattern matchers are all convolution filters
Backpropagation through conv layers involves computing the convolution of the error gradient with the transposed filter
The Convolution Theorem enables FFT acceleration: For large filters, computing convolution via frequency-domain multiplication is faster than direct computation
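The cross-correlation versus convolution remark can be made concrete on a tiny example. The sketch below implements the CNN-style formula directly and shows that it equals a true convolution with the filter reversed. The signal and filter values are made up, and the filter is deliberately asymmetric so the two operations differ.

```python
import numpy as np

signal_in = np.array([0.0, 1.0, 3.0, 2.0, 0.0, -1.0])
kernel = np.array([1.0, 0.0, -1.0])        # made-up asymmetric filter

def cnn_style(x, w):
    """output[i] = sum over k of x[i + k] * w[k]  (no padding, stride 1)."""
    n_out = len(x) - len(w) + 1
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(n_out)])

cross_corr = cnn_style(signal_in, kernel)
true_conv = np.convolve(signal_in, kernel, mode="valid")
flipped_conv = np.convolve(signal_in, kernel[::-1], mode="valid")

print("CNN-style (cross-correlation):", cross_corr)
print("true convolution:             ", true_conv)
print("convolution, flipped filter:  ", flipped_conv)
```

Since a learned filter can simply converge to the reversed weights, a network loses nothing by using cross-correlation, which is the sense in which the distinction is moot.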

The Convolution Theorem and Fast Training

The Convolution Theorem states that convolution can be computed as:

Transform both signals to the frequency domain (FFT): $O(n \log n)$
Multiply pointwise: $O(n)$
Transform back (inverse FFT): $O(n \log n)$

Total: $O(n \log n)$ instead of the $O(n^2)$ of direct convolution. For large signals and filters, this can mean orders of magnitude speedup.
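The three steps above can be sketched in a few lines with NumPy's real FFT. Zero-padding both inputs to the full output length is what makes the circular convolution computed by the FFT agree with ordinary linear convolution. The function name and the random test data are illustrative.

```python
import numpy as np

def fft_convolve(a, b):
    """Linear convolution via the FFT, zero-padded to avoid circular wrap-around."""
    n = len(a) + len(b) - 1                             # length of the full linear convolution
    spectrum = np.fft.rfft(a, n) * np.fft.rfft(b, n)    # transform both, multiply pointwise
    return np.fft.irfft(spectrum, n)                    # transform back

rng = np.random.default_rng(1)
a = rng.normal(size=300)
b = rng.normal(size=41)

max_gap = float(np.max(np.abs(fft_convolve(a, b) - np.convolve(a, b))))
print(f"max |FFT result - direct result| = {max_gap:.2e}")
```

For short kernels the direct sum is often faster in practice. The FFT route tends to win only once both sequences are long.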

Signal Processing in Audio ML

Audio ML models (speech recognition, music generation) process signals that are continuous-time phenomena sampled at discrete intervals. Understanding the continuous convolution integral helps design:

Spectral analysis: Understanding frequency content via Fourier/Laplace transforms
Filter design: Creating low-pass, high-pass, and band-pass filters as convolution kernels
Reverb modeling: Room acoustics are modeled as convolution with the room's impulse response

Gaussian Processes and Kernel Methods

In Gaussian processes and kernel methods, the convolution of two kernel functions defines a new kernel. The Convolution Theorem provides the spectral characterization: the power spectrum of the convolved kernel is the product of individual power spectra.

ML Application	Role of Convolution
CNNs (images)	Feature extraction via learned 2D filters
1D CNNs (time series)	Temporal pattern detection
WaveNet (audio)	Dilated causal convolutions for long-range dependencies
FFT-based training	Convolution Theorem speeds up large-kernel operations
Gaussian Processes	Kernel convolution defines covariance structure
Diffusion Models	Forward noising convolves the (rescaled) data distribution with a Gaussian kernel

Python Implementation

Convolution and the Convolution Theorem

Let's implement convolution both symbolically and numerically, and verify the Convolution Theorem:

convolution_demo.py

Line-by-line explanation (the numbers refer to lines of the code listing that follows):

Line 1: Import NumPy

NumPy gives us vectorized arrays so we can evaluate f(t) and g(t) at thousands of time points in one shot. The convolution integral is fundamentally a sum of many products, which is exactly what NumPy is built for.

Line 2: Import scipy.signal

scipy.signal.convolve runs the discrete convolution sum in optimized compiled code. We multiply its output by dt later to turn the discrete sum into a Riemann-sum approximation of the continuous integral ∫₀ᵗ f(τ)g(t−τ) dτ.

Line 5: Define f(t) = e^(−2t)

f is causal: it is meant to be 0 for t < 0 and e^(−2t) for t ≥ 0. Since the time grid starts at 0, we don't need to write the if-branch, but the causality assumption is what makes the Laplace convolution's integral run from 0 to t (not −∞ to ∞).

Line 6: Body: np.exp(-2.0 * t)

Vectorized: when t is an array of shape (4000,), np.exp returns an array of the same shape. Quick check: f(0) = e^0 = 1.0, f(0.5) = e^(−1) ≈ 0.36788, f(1.0) = e^(−2) ≈ 0.13534.

Line 8: Define g(t) = e^(−3t)

Same shape as f but decays faster (the −3 in the exponent versus −2). Quick check: g(0) = 1.0, g(0.5) = e^(−1.5) ≈ 0.22313, g(1.0) = e^(−3) ≈ 0.04979. We are about to convolve these two decays.

Line 12: Integration step dt = 0.001

dt controls how well the Riemann sum approximates the integral. Smaller dt = closer to the exact integral but more memory. 0.001 with a horizon of 4 seconds gives 4000 samples, which is comfortable on any laptop. Because the discrete sum includes both endpoint terms, its error is roughly dt times the average of the integrand's two endpoint values, so expect about three correct decimals here.

Line 13: Build the time grid t

np.arange(0.0, 4.0, 0.001) produces the array [0.000, 0.001, 0.002, …, 3.999], shape = (4000,). Every entry is one time instant where we will sample f and g.

Line 16: Sample f on the grid

fv = f(t) returns an array of shape (4000,). fv[0] = 1.0 (= e^0), fv[500] = e^(−1) ≈ 0.36788, fv[1000] = e^(−2) ≈ 0.13534. These are the values of f(τ) for τ ∈ {0, 0.001, …, 3.999}.

Line 17: Sample g on the grid

gv = g(t) similarly gives a (4000,) array. gv[0] = 1.0, gv[500] ≈ 0.22313, gv[1000] ≈ 0.04979. These are the values of g(u) for u ∈ {0, 0.001, …, 3.999}. The flip-and-slide algorithm will reuse these values to read off g(t − τ).

Line 21: signal.convolve(fv, gv, mode="full")

Runs the discrete convolution (fv ⋆ gv)[k] = Σᵢ fv[i] · gv[k−i] for all valid i. Output length with mode="full" is len(fv)+len(gv)−1 = 7999. The sum is the discretized form of the Laplace convolution integral.

Line 21: [: len(t)], keep only the causal part

The full-mode output has 7999 samples, but only the first 4000 correspond to t ∈ [0, 4); the rest are tail values past our grid. We slice them off so conv_num has the same length as t.

Line 21: * dt, Riemann-sum scaling

The continuous integral ∫₀ᵗ f(τ)g(t−τ) dτ is approximated by Σ f(τᵢ)g(t−τᵢ) Δτ. Without the dt factor we would have the *sum*, not the *integral*. Multiplying by dt = 0.001 converts the sum into the Riemann approximation of the integral.

Line 27: conv_exact = np.exp(-2t) - np.exp(-3t)

This is the closed form we derived using the Convolution Theorem: F(s)·G(s) = 1/((s+2)(s+3)) → partial fractions → 1/(s+2) − 1/(s+3) → inverse Laplace → e^(−2t) − e^(−3t). At t = 0.5: 0.36788 − 0.22313 = 0.14475. At t = 1.0: 0.13534 − 0.04979 = 0.08555.

Line 30: for t_query in (0.5, 1.0)

We probe the result at two specific times where we already know the exact answer by hand. Loop iterations: t_query = 0.5 first, then t_query = 1.0.

Line 31: k = int(round(t_query / dt))

Maps a continuous time to an array index. For t_query = 0.5: k = round(500) = 500. For t_query = 1.0: k = round(1000) = 1000. We will read conv_num[k] and conv_exact[k].

Line 32: Print numerical, exact, and error

Iteration 1 (t = 0.50): numerical ≈ 0.14504, exact ≈ 0.14475, error ≈ 3e-04. Iteration 2 (t = 1.00): numerical ≈ 0.08564, exact ≈ 0.08555, error ≈ 9e-05. Two completely different methods, one a Riemann sum with several hundred terms and the other a one-line analytic formula, agree to about three decimal places at dt = 0.001. The remaining gap comes from the endpoint terms of the plain sum and shrinks in proportion to dt. That is the Convolution Theorem at work.

1import numpy as np
2from scipy import signal
3
4# Two causal exponentials defined for t >= 0
5def f(t):
6    return np.exp(-2.0 * t)
7
8def g(t):
9    return np.exp(-3.0 * t)
10
11# A fine time grid; dt is the integration step
12dt = 0.001
13t = np.arange(0.0, 4.0, dt)
14
15# Sample the two functions on the grid
16fv = f(t)
17gv = g(t)
18
19# Riemann-sum approximation of the convolution integral
20# (f * g)(t_k) ~ sum_{i=0..k} f(t_i) * g(t_k - t_i) * dt
21conv_num = signal.convolve(fv, gv, mode="full")[: len(t)] * dt
22
23# Closed-form answer from the Convolution Theorem:
24#   F(s) = 1/(s+2),  G(s) = 1/(s+3)
25#   F(s)*G(s) = 1/((s+2)(s+3)) = 1/(s+2) - 1/(s+3)
26#   L^-1{F*G} = e^(-2t) - e^(-3t)
27conv_exact = np.exp(-2.0 * t) - np.exp(-3.0 * t)
28
29# Compare numerical vs exact at t = 0.5 and t = 1.0
30for t_query in (0.5, 1.0):
31    k = int(round(t_query / dt))
32    print(f"t={t_query:.2f}  numerical={conv_num[k]:.6f}"
33          f"  exact={conv_exact[k]:.6f}"
34          f"  error={abs(conv_num[k] - conv_exact[k]):.2e}")

Convolution in Machine Learning

See how convolution appears in signal processing and neural networks:

convolution_ml.py

Explanation (the numbers refer to lines of the code listing that follows):

Line 15: Signal Denoising

Smoothing a noisy signal is convolution with a Gaussian kernel. This is the same mathematical operation as the Laplace convolution integral, applied discretely.

Line 25: Gaussian Kernel

The Gaussian kernel g(x) = exp(−x²/(2σ²)) is one of the most common smoothing filters. Convolving a signal with a Gaussian attenuates high-frequency noise while preserving the overall shape.

Line 41: Edge Detection

The kernel [-1, 0, 1] acts as a finite-difference operator. Because np.convolve flips the kernel, the output at sample n is x[n−1] − x[n+1], which is the central difference up to sign and a factor of 2, so it spikes wherever the signal changes sharply (edges).

Line 55: FFT Convolution

By the Convolution Theorem, convolution in time = multiplication in frequency. The FFT converts to the frequency domain in O(n log n), multiplies, and converts back, which reduces O(n²) to O(n log n). The listing pads the kernel only to the signal length, so its FFT route computes a circular convolution: it matches the direct result except within kernel_size//2 samples of each end, where values wrap around.

Line 77: CNN Feature Maps

In a CNN, convolution filters are not hand-designed; they are learned from data via backpropagation. Different filters extract different features (edges, textures, patterns).

1import numpy as np
2import matplotlib.pyplot as plt
3
4def convolution_in_ml():
5    """
6    Convolution is foundational to modern ML:
7    - CNNs use discrete 2D convolution for feature extraction
8    - Signal processing in audio ML uses 1D convolution
9    - The convolution theorem enables FFT-based fast training
10    """
11
12    # 1. 1D Convolution for Signal Processing
13    print("=== 1D Convolution: Signal Smoothing ===")
14
15    # Create a noisy signal
16    np.random.seed(42)
17    t = np.linspace(0, 4*np.pi, 500)
18    clean_signal = np.sin(t) + 0.5*np.sin(3*t)
19    noisy_signal = clean_signal + 0.5*np.random.randn(len(t))
20
21    # Gaussian smoothing kernel
22    kernel_size = 31
23    sigma = 3
24    kernel = np.exp(-np.arange(-(kernel_size//2), kernel_size//2 + 1)**2 / (2*sigma**2))
25    kernel = kernel / kernel.sum()  # Normalize
26
27    # Apply convolution
28    smoothed = np.convolve(noisy_signal, kernel, mode='same')
29
30    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
31
32    axes[0, 0].plot(t, noisy_signal, alpha=0.5, label='Noisy Signal')
33    axes[0, 0].plot(t, smoothed, 'r-', linewidth=2, label='Smoothed (convolved)')
34    axes[0, 0].plot(t, clean_signal, 'g--', linewidth=1, label='True Signal')
35    axes[0, 0].set_title('1D Convolution: Signal Denoising')
36    axes[0, 0].legend()
37    axes[0, 0].grid(True, alpha=0.3)
38
39    # 2. Edge Detection (1D derivative kernel)
40    print("\n=== Edge Detection via Convolution ===")
41
42    # Step signal with edges
43    edge_signal = np.zeros(200)
44    edge_signal[50:100] = 1
45    edge_signal[120:180] = 0.5
46
47    # Derivative kernel (edge detector)
48    derivative_kernel = np.array([-1, 0, 1])  # np.convolve flips this: output is x[n-1] - x[n+1]
49
50    edges = np.convolve(edge_signal, derivative_kernel, mode='same')
51
52    axes[0, 1].plot(edge_signal, 'b-', linewidth=2, label='Signal')
53    axes[0, 1].plot(edges, 'r-', linewidth=2, label='Edges (derivative)')
54    axes[0, 1].set_title('Edge Detection = Convolution with Derivative')
55    axes[0, 1].legend()
56    axes[0, 1].grid(True, alpha=0.3)
57
58    # 3. FFT-based Fast Convolution
59    print("\n=== FFT-based Convolution (Convolution Theorem) ===")
60    print("The Convolution Theorem enables O(n log n) convolution!")
61
62    n = len(noisy_signal)
63
64    # Direct convolution: O(n²)
65    direct_result = np.convolve(noisy_signal, kernel, mode='same')
66
67    # FFT-based convolution: O(n log n)
68    # Pad kernel to same length as signal
69    padded_kernel = np.zeros(n)
70    padded_kernel[:kernel_size] = kernel
71    padded_kernel = np.roll(padded_kernel, -(kernel_size//2))
72
73    fft_signal = np.fft.fft(noisy_signal)
74    fft_kernel = np.fft.fft(padded_kernel)
75    fft_result = np.real(np.fft.ifft(fft_signal * fft_kernel))
76
77    axes[1, 0].plot(direct_result[:200], 'b-', linewidth=2, label='Direct Conv')
78    axes[1, 0].plot(fft_result[:200], 'r--', linewidth=2, label='FFT Conv')
79    axes[1, 0].set_title('Direct vs FFT Convolution (agree away from edges)')
80    axes[1, 0].legend()
81    axes[1, 0].grid(True, alpha=0.3)
82
83    # 4. CNN Feature Map (1D analog)
84    print("\n=== CNN: Learned Convolution Filters ===")
85    print("In a CNN, the 'kernel' is LEARNED during training!")
86
87    # Simulate different learned filters
88    filter_low = np.array([0.1, 0.2, 0.4, 0.2, 0.1])  # Low-pass
89    filter_high = np.array([-0.2, -0.1, 0.6, -0.1, -0.2])  # High-pass
90    filter_pattern = np.array([0.5, -0.5, 0.5, -0.5, 0.5])  # Pattern detector
91
92    feat1 = np.convolve(clean_signal, filter_low, mode='same')
93    feat2 = np.convolve(clean_signal, filter_high, mode='same')
94    feat3 = np.convolve(clean_signal, filter_pattern, mode='same')
95
96    axes[1, 1].plot(t[:200], feat1[:200], label='Low-pass filter')
97    axes[1, 1].plot(t[:200], feat2[:200], label='High-pass filter')
98    axes[1, 1].plot(t[:200], feat3[:200], label='Pattern detector')
99    axes[1, 1].set_title('CNN Feature Maps (Convolution Outputs)')
100    axes[1, 1].legend()
101    axes[1, 1].grid(True, alpha=0.3)
102
103    plt.tight_layout()
104    plt.show()
105
106    # Key insight
107    print("\n=== Key ML-Calculus Connection ===")
108    print("1. CNNs learn convolution filters via gradient descent")
109    print("2. The Convolution Theorem enables FFT-based fast training")
110    print("3. Backprop through conv layers uses cross-correlation")
111    print("4. Laplace/Fourier analysis explains filter behavior")
112
113convolution_in_ml()
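
The FFT route in the listing above multiplies two length-n spectra, which computes a circular convolution, so samples near the ends wrap around. A minimal sketch of the usual remedy, assuming we want the full linear convolution: zero-pad both sequences to length n + k - 1 before transforming. The helper name `fft_linear_convolve` and the test arrays are illustrative and not part of the original listing.

```python
import numpy as np

def fft_linear_convolve(x, h):
    """Full linear convolution via the Convolution Theorem (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    n_out = len(x) + len(h) - 1           # length of the full linear convolution
    # Passing n=n_out zero-pads both inputs, which prevents circular wrap-around
    X = np.fft.rfft(x, n=n_out)
    H = np.fft.rfft(h, n=n_out)
    return np.fft.irfft(X * H, n=n_out)

rng = np.random.default_rng(0)
x = rng.normal(size=256)                  # illustrative signal
h = np.ones(11) / 11.0                    # illustrative moving-average kernel

direct = np.convolve(x, h, mode="full")
fast = fft_linear_convolve(x, h)
print(np.allclose(direct, fast))
```

With the padding in place the two results should agree at every sample, including the boundaries, up to floating-point rounding.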

From the Laplace Integral to a CNN Layer (PyTorch)

The Laplace convolution $(x * h)(t) = \int_0^t x(\tau) h(t-\tau)\,d\tau$ and the discrete convolution $y[k] = \sum_{i=0}^{K-1} x[k+i]\,h[K-1-i]$ (for a kernel of length $K$, evaluated only where the kernel fully overlaps the input) are the same operation in two different worlds: continuous time versus evenly sampled time. The next snippet runs the discrete version with PyTorch's F.conv1d, then re-derives the answer with a hand-written loop. Both routes produce the same numbers, which shows that the convolutional layers inside a CNN such as ResNet compute, up to a kernel flip, the discrete counterpart of the convolution integral from your differential-equations textbook.

Discrete Convolution with torch.nn.functional.conv1d

🐍conv1d_demo.py

Line 1: Import torch

torch is the PyTorch core. Tensors live here, and they carry the same vectorized semantics as NumPy arrays plus the ability to run on a GPU and track gradients — both important the moment this discrete convolution becomes a learned CNN filter.

Line 2: Import torch.nn.functional as F

F gives us conv1d, conv2d, and friends — the same operations that sit inside every Conv1d / Conv2d module. Using the functional version lets us run a one-off convolution without wrapping it in a learnable Module.

Line 5: x = the input signal

x has shape (5,) with values [1, 2, 3, 4, 5]. Think of this as samples of a continuous signal x(t) taken at times t = 0, 1, 2, 3, 4. This is the discrete analog of the input function in the Laplace convolution y(t) = ∫ x(τ) h(t−τ) dτ.

Line 8: h = the impulse response

h has shape (3,) with values [1, 2, 3]. In LTI-systems language this is the discrete impulse response: the output the system produces in response to a unit impulse at time 0. Convolving anything with h tells you how the system would respond to that anything.

Line 11: x_b = x.view(1, 1, 5)

conv1d expects a 3-D tensor of shape (batch, channels, length). We reshape x from (5,) to (1, 1, 5) — one batch element, one channel, five time samples. .view is a free, no-copy reshape because the data is contiguous.

Line 12: h_b = h.view(1, 1, 3)

Kernels in conv1d have shape (out_channels, in_channels, kernel_size), here (1, 1, 3). Same idea as the input reshape, except that the two added axes are the output-channel and input-channel axes rather than batch and channel.

Line 17: h_flipped = torch.flip(h_b, dims=[-1])

Reverses h along its last axis: [1, 2, 3] becomes [3, 2, 1]. This step is what converts PyTorch's cross-correlation into a textbook convolution. If h were symmetric (e.g. [0.5, 1, 0.5]) the flip would change nothing. CNN literature can blur the cross-correlation/convolution distinction for a different reason: the filters are learned, so training simply arrives at the mirrored weights.

Line 21: y = F.conv1d(x_b, h_flipped)

Runs the sliding dot-product. With no padding, the kernel can sit in 5 − 3 + 1 = 3 positions, so the output has length 3. At each position k, conv1d computes y[k] = Σᵢ x_b[k+i] · h_flipped[i] — exactly the discrete form of the continuous integral ∫ x(τ) h(t−τ) dτ.

Line 22: print y.squeeze().tolist()

Drops the (batch, channel) singleton axes and converts to a Python list. Expected output: [10.0, 16.0, 22.0]. By hand: y[0] = 1·3 + 2·2 + 3·1 = 10, y[1] = 2·3 + 3·2 + 4·1 = 16, y[2] = 3·3 + 4·2 + 5·1 = 22. Each output sample is the dot-product of three consecutive input samples with the time-reversed kernel — the discrete convolution.

Line 25: manual = [] (set up the textbook sum)

We are about to recompute the same three numbers using only Python loops and the convolution formula. This is the discretized form of (x * h)(t) = ∫₀ᵗ x(τ) h(t − τ) dτ with the integral replaced by a finite sum.

Line 26: for k in range(3) (iterate output positions)

Iteration 1: k = 0 — computing y[0]. Iteration 2: k = 1 — computing y[1]. Iteration 3: k = 2 — computing y[2]. Three iterations because the kernel has 3 valid positions over a length-5 input.

Line 27: s = 0.0 (running accumulator)

Per-iteration reset. s plays the role of the integral's running total: at the end of the inner loop, s will equal y[k].

Line 28: for i in range(3) (iterate kernel taps)

Inner loop over the 3 taps of h. The pair (k, i) picks out one input sample x[k+i] and one mirrored kernel sample h[2−i].

Line 29: s += x[k+i].item() * h[2-i].item()

Walk through k=0: (i=0) s += x[0]·h[2] = 1·3 = 3, (i=1) s += x[1]·h[1] = 2·2 = 4 → s=7, (i=2) s += x[2]·h[0] = 3·1 = 3 → s=10. After k=0, s = 10 — matches conv1d. The index h[2 − i] is the discrete analog of h(t − τ): as τ moves forward, we read h backwards. The flip is the convolution.

Line 30: manual.append(s)

Stores y[k] before moving on. After all 3 outer iterations: manual = [10.0, 16.0, 22.0]. Identical to the conv1d tensor — confirming that the highly-optimized C++ implementation is doing exactly the discrete convolution sum we have studied analytically.

Line 31: print("manual sum :", manual)

Side-by-side comparison printed. Both lines should read [10.0, 16.0, 22.0]. That is the bridge: the Laplace convolution integral ∫ x(τ) h(t−τ) dτ in continuous time becomes Σᵢ x[k+i] h[K−1−i] in discrete time, which, apart from the kernel flip, is the sliding dot-product that every CNN layer evaluates at each output position.

 1  import torch
 2  import torch.nn.functional as F
 3
 4  # A discrete input signal (a tiny time-series)
 5  x = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
 6
 7  # A 3-tap impulse response, the discrete analog of h(t) in y = x * h
 8  h = torch.tensor([1.0, 2.0, 3.0])
 9
10  # conv1d expects shape (batch, channels, length)
11  x_b = x.view(1, 1, 5)
12  h_b = h.view(1, 1, 3)
13
14  # PyTorch's conv1d performs cross-correlation, NOT true convolution.
15  # For the true convolution sum y[k] = sum_i x[k+i] * h[K-1-i]
16  # we flip the kernel along its time axis first.
17  h_flipped = torch.flip(h_b, dims=[-1])
18
19  # Apply the discrete convolution
20  # Output length = 5 - 3 + 1 = 3 (no padding)
21  y = F.conv1d(x_b, h_flipped)
22  print("conv1d output:", y.squeeze().tolist())
23
24  # Confirm against the textbook discrete-convolution sum
25  manual = []
26  for k in range(3):
27      s = 0.0
28      for i in range(3):
29          s += x[k + i].item() * h[2 - i].item()
30      manual.append(s)
31  print("manual sum   :", manual)
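
To tie the discrete sum back to the continuous integral, here is a minimal sketch that approximates a Laplace convolution by a Riemann sum. The choice $f(t) = e^{-t}$, $g(t) = e^{-2t}$ and the step size `dt` are illustrative assumptions, not taken from the listing above. The closed form $(f * g)(t) = e^{-t} - e^{-2t}$ follows from the partial fractions $\frac{1}{(s+1)(s+2)} = \frac{1}{s+1} - \frac{1}{s+2}$.

```python
import numpy as np

dt = 1e-3                                  # sampling step (illustrative)
t = np.arange(0.0, 5.0, dt)
f = np.exp(-t)                             # f(t) = e^{-t}
g = np.exp(-2.0 * t)                       # g(t) = e^{-2t}

# Riemann sum for the integral over [0, t]: sum_j f[j] * g[n - j] * dt
approx = np.convolve(f, g)[: len(t)] * dt

# Closed form from the Convolution Theorem
exact = np.exp(-t) - np.exp(-2.0 * t)

max_err = np.max(np.abs(approx - exact))
print(f"max abs error: {max_err:.2e}")
```

The error shrinks roughly in proportion to `dt`, which is the sense in which the discrete convolution sum is a sampled version of the integral.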

Why CNNs Skip the Flip

PyTorch's conv1d / conv2d actually compute cross-correlation, not true convolution: they read the kernel forward, not flipped. For a CNN this is fine because the kernel weights are learned: gradient descent simply learns the mirror image of the "true" convolution kernel, and the loss is unchanged. But when you are connecting the Laplace convolution integral to a discrete operation, the flip matters. Once you flip the kernel, conv1d and the mathematical convolution compute the same sum and agree up to floating-point rounding (exactly, for the small integer-valued example above).
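
The same distinction can be seen without PyTorch. The sketch below is a NumPy analog, assuming `np.correlate` stands in for what a CNN layer computes; it reuses the x and h values from the demo, and the variable names are illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
h = np.array([1.0, 2.0, 3.0])

# Cross-correlation: kernel read forward (the CNN-layer behaviour)
xcorr = np.correlate(x, h, mode="valid")

# True convolution: kernel read backward
conv = np.convolve(x, h, mode="valid")

# Flipping the kernel turns cross-correlation into convolution
xcorr_flipped = np.correlate(x, h[::-1], mode="valid")

print(xcorr.tolist())          # [14.0, 20.0, 26.0]
print(conv.tolist())           # [10.0, 16.0, 22.0]
print(xcorr_flipped.tolist())  # [10.0, 16.0, 22.0]
```

Only the flipped cross-correlation reproduces the [10.0, 16.0, 22.0] of the textbook sum; the unflipped one gives different numbers because h is not symmetric.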

Common Mistakes to Avoid

Mistake 1: Wrong Integration Limits

Wrong: $(f * g)(t) = \int_0^{\infty} f(\tau)g(t-\tau)\,d\tau$

Correct: $(f * g)(t) = \int_0^t f(\tau)g(t-\tau)\,d\tau$

For the Laplace convolution (causal functions), the upper limit is $t$ , not $\infty$ . The Fourier version uses $-\infty$ to $\infty$ , but the Laplace version integrates only over the interval $[0, t]$ .
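
The finite limits are a consequence of causality rather than a separate definition. A short derivation, assuming $f(\tau) = 0$ for $\tau < 0$ and $g(u) = 0$ for $u < 0$, splits the bilateral integral into three pieces:

$$
\int_{-\infty}^{\infty} f(\tau)\,g(t-\tau)\,d\tau
= \underbrace{\int_{-\infty}^{0} f(\tau)\,g(t-\tau)\,d\tau}_{=\,0 \text{ since } f(\tau)\,=\,0}
+ \int_{0}^{t} f(\tau)\,g(t-\tau)\,d\tau
+ \underbrace{\int_{t}^{\infty} f(\tau)\,g(t-\tau)\,d\tau}_{=\,0 \text{ since } g(t-\tau)\,=\,0}
= \int_{0}^{t} f(\tau)\,g(t-\tau)\,d\tau
$$

Writing the upper limit as $\infty$ is therefore harmless only if $g$ is explicitly treated as zero for negative arguments; with a formula for $g$ that is nonzero there, it gives the wrong answer.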

Mistake 2: Confusing Convolution with Multiplication

Wrong: $(f * g)(t) = f(t) \cdot g(t)$

Correct: $(f * g)(t) = \int_0^t f(\tau)g(t-\tau)\,d\tau$

The asterisk $*$ in convolution is NOT pointwise multiplication. Convolution involves an integral over the product with a shifted argument. In the s-domain, $\mathcal{L}\{f \cdot g\} \neq F(s) \cdot G(s)$ ; instead, $\mathcal{L}\{f * g\} = F(s) \cdot G(s)$ .
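
A small worked check makes the difference concrete. Take the illustrative choice $f(t) = g(t) = 1$, so that $F(s) = G(s) = \frac{1}{s}$:

$$
(f * g)(t) = \int_0^t 1 \cdot 1 \, d\tau = t, \qquad \mathcal{L}\{f * g\} = \frac{1}{s^2} = F(s) \cdot G(s)
$$

$$
(f \cdot g)(t) = 1, \qquad \mathcal{L}\{f \cdot g\} = \frac{1}{s} \neq \frac{1}{s^2}
$$

Only the convolution, not the pointwise product, transforms into the product of the transforms.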

Mistake 3: Forgetting the Convolution Theorem Direction

Remember:

Convolution in time $\longleftrightarrow$ Multiplication in s-domain
Multiplication in time $\longleftrightarrow$ Convolution in s-domain (a complex contour integral carrying a factor of $1/(2\pi i)$; a different, less common result)

Do not mix up these two correspondences. The standard Convolution Theorem is about convolution in time becoming multiplication in the s-domain.

Mistake 4: Applying to Non-Causal Functions

Important: The Laplace convolution assumes both functions are causal (zero for $t < 0$ ). If working with non-causal functions, use the full bilateral convolution with limits from $-\infty$ to $\infty$ .

Mistake 5: Forgetting to Verify Existence

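Not every convolution or transform exists. For piecewise continuous causal functions, the integral over the finite interval $[0, t]$ is finite for each fixed $t$; what can fail is the Convolution Theorem's hypothesis. If $f$ and $g$ are both of exponential order, so that $F(s)$ and $G(s)$ converge on a common half-plane $\operatorname{Re}(s) > a$, then $\mathcal{L}\{f * g\}$ exists there and equals $F(s) \cdot G(s)$. A function that grows too rapidly, such as $e^{t^2}$, has no Laplace transform, so the theorem cannot be applied to it. The bilateral convolution over $(-\infty, \infty)$, by contrast, can itself diverge.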

Test Your Understanding

What is the definition of the convolution (f * g)(t) for t ≥ 0?

Summary

Convolution is one of the most consequential operations in all of applied mathematics. It connects time-domain behavior to frequency-domain analysis, characterizes linear systems, and underpins modern signal processing and machine learning.

Key Formulas

Formula	Name	Use
(f * g)(t) = ∫₀ᵗ f(τ)g(t-τ) dτ	Convolution Integral	Computes accumulated effect
ℒ{f * g} = F(s)·G(s)	Convolution Theorem	Turns convolution into multiplication
ℒ⁻¹{F·G} = f * g	Inverse form	Finds inverse of products
y(t) = x(t) * h(t)	LTI System Response	Output from input and impulse response
Y(s) = X(s)·H(s)	Transfer Function	Algebraic input-output relation
f * δ = f	Impulse identity	Delta is the identity for convolution

Key Takeaways

Convolution computes accumulated effect: The integral $\int_0^t f(\tau)g(t-\tau)\,d\tau$ sums all past contributions of $f$, each weighted by $g$ evaluated at the elapsed time $t - \tau$.
Time convolution = frequency multiplication: The Convolution Theorem converts the integral into the product $F(s) \cdot G(s)$ , enabling algebraic computation.
Impulse response characterizes systems: For an LTI system, knowing $h(t)$ determines the response to any input via convolution.
Convolution is commutative: $f * g = g * f$ . Compute in whichever order is simpler.
FFT accelerates convolution: The Convolution Theorem enables $O(n \log n)$ computation instead of $O(n^2)$ .
CNNs are built on convolution: Feature extraction in a CNN is a sliding dot-product with learned filters (cross-correlation in most frameworks, which differs from discrete convolution only by a kernel flip).
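
Two of the algebraic facts above, commutativity and the impulse identity, are easy to check on sampled data. A minimal pure-Python sketch follows; the helper names `conv` and `close` and the sample values are illustrative assumptions.

```python
import math

def conv(a, b):
    """Full discrete convolution: out[n] = sum_j a[j] * b[n - j]."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def close(u, v):
    """Element-wise comparison with a small absolute tolerance."""
    return len(u) == len(v) and all(
        math.isclose(p, q, abs_tol=1e-12) for p, q in zip(u, v)
    )

x = [1.0, -2.0, 0.5, 3.0]      # illustrative input samples
h = [0.25, 0.5, 0.25]          # illustrative impulse response
delta = [1.0]                  # discrete unit impulse

commutes = close(conv(x, h), conv(h, x))   # f * g == g * f
identity = close(conv(x, delta), x)        # f * delta == f
y = conv(x, h)                             # LTI response y = x * h

print(commutes, identity, len(y))          # True True 6
```

The output length 6 is len(x) + len(h) - 1, the size of the full discrete convolution.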

The Core Insight:

"Convolution turns the question 'What is the total accumulated effect of one function acting through another?' into a simple multiplication in the frequency domain. This duality between time and frequency is one of the deepest ideas in mathematics."

Coming Next: In Transfer Functions, we'll see how the ratio $H(s) = Y(s)/X(s)$ completely characterizes the input-output behavior of a linear system, connecting convolution to the powerful framework of block diagrams and feedback control.