Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will be able to:

Construct the Dirac delta $\delta(t-a)$ as a careful limit of unit-area spikes, and explain why the limit must be interpreted as a distribution rather than a classical function.
Use the sifting property $\int_{-\infty}^{\infty} f(t)\,\delta(t-a)\,dt = f(a)$ to evaluate integrals and recognize when a delta is doing the work.
Compute the Laplace transform $\mathcal{L}\{\delta(t-a)\} = e^{-as}$ and contrast it with the step-function transform.
Solve initial-value problems of the form $y'' + by' + cy = A\,\delta(t - t_0)$ and interpret the impulse as an instantaneous change in momentum.
Connect the impulse response to convolution, linear systems, and the role of identity elements in signal processing and neural networks.

The Big Picture: Modeling an Instantaneous Kick

"A hammer strikes a tuning fork. The contact lasts almost no time at all, but it deposits a definite amount of momentum. How do we write down a force that is zero everywhere — except for one instant where it is infinite, yet has a finite total effect?"

The step function let us model a switch that turns on. But many of the most interesting events in physics and engineering are even more violent than a switch: they happen essentially at a single instant, and yet they have a definite measurable consequence.

A bat hits a baseball — contact time is <1 ms, ball gains 40 m/s of velocity.
Lightning strikes — a current pulse so brief and intense it cooks the air into plasma.
A neuron fires — a millisecond-wide voltage spike triggers the next neuron in a chain.
A gradient update in deep learning — a single backward pass injects a kick into every weight in the network.

None of these is a finite-amplitude continuous function. They all look like arbitrarily tall, arbitrarily narrow spikes that nonetheless carry a definite total "area" — total momentum, total charge, total update. To describe them mathematically we need a new object: the Dirac delta function $\delta(t-a)$ .

Why the Delta Function Matters

The delta function is the mathematical limit of an impulse: zero width, infinite height, unit area. It is not a classical function — there is no value of $\delta(0)$ in the usual sense. But every operation we care about (integration with a test function, Laplace transform, convolution) gives a perfectly well-defined answer. This makes it the most useful "non-function" in applied mathematics.

Historical Context: Paul Dirac's Function That Was Not a Function

The delta function is named after Paul Adrien Maurice Dirac (1902–1984), the British theoretical physicist who introduced it in 1930 in his book The Principles of Quantum Mechanics. Dirac needed an object to represent a particle perfectly localized at one point — its wave function is zero everywhere except at that point, but it must still integrate to a finite probability.

"The most general function of x which is zero except at one value of x, where it is infinite, will be denoted by $\delta(x)$ ." — P. A. M. Dirac, 1930

Mathematicians of the day were horrified. There simply is no function in the classical sense with these properties. For nearly twenty years the delta function was treated as a useful fiction by physicists and engineers, despite reservations from pure mathematicians.

From Heresy to Rigor: Schwartz's Distributions

In 1944 the French mathematician Laurent Schwartz built the theory of distributions (generalized functions) — a rigorous framework in which the delta is a perfectly valid object. The key idea: stop thinking of $\delta$ as a thing with values at each point, and start thinking of it as something that acts on test functions:

\delta : f \mapsto f(0)

Schwartz earned the Fields Medal in 1950 for this work. Today the delta function is a routine member of every applied mathematician's toolkit and is rigorously justified.

Sidney Coleman's Quip

"The delta function is to the integral what the imaginary unit is to the square root: an entity that breaks the original rules so beautifully that the rules end up being rewritten in its favor."

Building the Delta from a Limit

The cleanest way to understand $\delta(t)$ is to construct it. Start with a family of perfectly ordinary functions, each one a slightly narrower and taller spike, all with the same area underneath. Then watch what happens.

The Unit-Area Rectangle

Define the rectangular pulse:

R_\varepsilon(t) = \begin{cases} \dfrac{1}{\varepsilon} & |t| \leq \varepsilon/2 \\[4pt] 0 & \text{otherwise} \end{cases}

The width is $\varepsilon$ and the height is $1/\varepsilon$ , so the area is $\varepsilon \cdot (1/\varepsilon) = 1$ — no matter how tiny we make $\varepsilon$ . As $\varepsilon \to 0^+$ :

The support shrinks to a single point at $t = 0$ .
The peak height blows up to $+\infty$ .
The total area stays pinned at $1$ .

The Smooth Gaussian Approximation

We could just as well use a smooth bell-shaped curve:

G_\varepsilon(t) = \dfrac{1}{\varepsilon\sqrt{\pi}}\, e^{-t^2/\varepsilon^2}

Same story: area equals 1 for every $\varepsilon > 0$ , and as $\varepsilon \to 0$ the bell collapses into the same spike. The shape of the approximation does not matter — only the limit does.

The Key Insight

The delta function is the limit of any unit-area family whose mass concentrates at a single point. Different shapes (rectangles, Gaussians, Lorentzians, triangles) converge to the same distribution, because every smooth test function integrated against them gives the same answer in the limit.

Interactive: Watch the Nascent Delta Collapse

Drag the $\varepsilon$ slider toward zero. Notice how the peak shoots up while the shaded area below stays at exactly 1. Swap between Rectangle / Gaussian / Lorentzian to confirm the limit does not depend on shape.

Nascent Delta: a spike of unit area as ε → 0

Shape

Width parameter ε0.800

thin / tallwide / short

Center a0.00

peak height1.250

total area1.0000

Notice: shrinking ε makes the spike taller, but the area stays pinned at 1. That is the defining property of δ(t-a).

Defining the Delta Function

Formally, the Dirac delta is not a function in the classical sense. It is a distribution — an object defined entirely by how it acts inside an integral. The defining properties are:

Defining Properties of δ(t)

\delta(t) = 0 \quad \text{for all } t \neq 0

\int_{-\infty}^{\infty} \delta(t)\, dt = 1

Zero everywhere except at a single point, yet the integral equals 1. Classical functions cannot do this — only distributions can.

For the shifted version $\delta(t-a)$ , the spike sits at $t = a$ instead of the origin:

\delta(t-a) = 0 \;\; (t \neq a), \qquad \int_{-\infty}^{\infty} \delta(t-a)\, dt = 1

Important: δ Is Not a Number-Valued Function

The expression " $\delta(0) = \infty$ " is a useful cartoon, not a mathematical statement. The delta function has no pointwise values — it only has meaning inside an integral paired with a well-behaved test function. Trying to algebraically manipulate $\delta(0)$ as a number leads to paradoxes.

Delta as the Derivative of the Step Function

Recall the Heaviside step function from the previous section:

u(t) = \begin{cases} 0 & t < 0 \\ 1 & t \geq 0 \end{cases}

Classically, the derivative of $u(t)$ is zero everywhere except at $t = 0$ , where it is undefined — there is an instantaneous jump of size 1. But what if we insist on a derivative anyway?

Approximate the step with a smooth ramp that climbs from 0 to 1 over a width of $\varepsilon$ . Its derivative is a rectangle of width $\varepsilon$ and height $1/\varepsilon$ — exactly our nascent delta. In the limit $\varepsilon \to 0$ :

The Derivative Identity

\frac{d}{dt}\, u(t-a) \;=\; \delta(t-a)

The delta is the derivative of the step — interpreted as a distribution.

Equivalently:

u(t-a) \;=\; \int_{-\infty}^{t} \delta(\tau - a)\, d\tau

The step is the running integral of the delta. This pair (impulse, step) mirrors the pair (velocity, position) for an object that gets kicked: the position jumps in finite time only if the velocity contains a delta.

Two-Way Dictionary

Operation	Step view	Delta view
Time domain	u(t)	d/dt u(t) = δ(t)
Laplace domain	1/s	s · (1/s) = 1
Action on f	∫₀^t f(τ) dτ	f(t) — instantaneous read-off

The Sifting Property

The single most important fact about the delta function is the sifting property. It is what makes the delta useful: sliding a delta across an integral "sifts out" a single value of the function it meets.

The Sifting Property

\int_{-\infty}^{\infty} f(t)\,\delta(t-a)\, dt \;=\; f(a)

The delta picks out the value of f at the single point t = a. Everything else is multiplied by zero.

Why It Works (Informally)

Replace $\delta(t-a)$ with the nascent rectangle $R_\varepsilon(t-a)$ . Then:

\int f(t) R_\varepsilon(t-a)\, dt = \frac{1}{\varepsilon}\int_{a-\varepsilon/2}^{a+\varepsilon/2} f(t)\, dt

That right-hand side is the average value of f over a width-ε window around a. As $\varepsilon \to 0$ and $f$ is continuous at $a$ , the average converges to $f(a)$ . That is the sifting property.

What You Can Do with Sifting

Integral	Equals	Why
∫ sin(t) δ(t − π) dt	sin(π) = 0	Sift at t = π
∫ e^t δ(t − 2) dt	e² ≈ 7.389	Sift at t = 2
∫ t² δ(t − 3) dt	9	Sift at t = 3
∫ f(t) δ(t) dt	f(0)	Sift at t = 0
∫ f(t) δ(t − a) dt	f(a)	General case

Interactive: Sifting in Action

Pick a test function and drag the impulse location $a$ . Watch the yellow dot ride along $f(t)$ — its height is always exactly $f(a)$ , the result of the sifting integral.

The sifting property: δ picks one value out of f(t)

Test function f(t)

f(t) = ½ t² − 1

Impulse location a1.50

∫ f(t) δ(t − a) dt = f(a) = 0.125

Slide a and watch the yellow dot ride along the curve. The integral always equals f(a) — no matter the shape of f.

Laplace Transform of the Delta Function

Now we can compute the Laplace transform of $\delta(t-a)$ directly from the definition — and the sifting property does all the work.

Derivation:

By definition:

$\mathcal{L}\{\delta(t-a)\} = \int_{0}^{\infty} \delta(t-a)\, e^{-st}\, dt$

The integrand is zero except at $t = a$ . For $a > 0$ the spike is inside the integration domain, and by the sifting property:

$= e^{-st}\Big|_{t = a} = e^{-as}$

Laplace Transform of the Delta

\mathcal{L}\{\delta(t-a)\} = e^{-as}

\mathcal{L}\{\delta(t)\} = 1

The simplest non-trivial Laplace pair: a single instant in time maps to a single complex exponential.

Compare with the Step Function

Time domain	Laplace transform	Relationship
u(t − a)	e^(−as) / s	step at t = a
δ(t − a)	e^(−as)	derivative of step

The factor of $s$ between them is no coincidence: differentiating in the time domain corresponds to multiplying by $s$ in the Laplace domain. So:

\mathcal{L}\{\delta(t-a)\} = s \cdot \mathcal{L}\{u(t-a)\} = s \cdot \frac{e^{-as}}{s} = e^{-as}

Two derivations, same answer. The delta really is the derivative of the step — even in the Laplace domain.

Solving Initial Value Problems with an Impulsive Forcing

Once we can transform the delta, solving differential equations with impulsive forcing becomes a routine algebraic exercise. The method is exactly the one you already know — only the right-hand side changes.

Template Problem

Solve $y'' + 4y = \delta(t - 2)$ with $y(0) = 0,\; y'(0) = 0$ .

Physical interpretation. A unit-mass on a spring with natural frequency $\omega_0 = 2$ , initially at rest, struck by an impulsive force at $t = 2$ .

Solution.

Step 1. Take the Laplace transform of both sides:

$s^2 Y(s) - s\, y(0) - y'(0) + 4\, Y(s) = e^{-2s}$

With the zero initial conditions:

$(s^2 + 4)\, Y(s) = e^{-2s}$

Step 2. Solve for $Y(s)$ :

$Y(s) = \frac{e^{-2s}}{s^2 + 4}$

Step 3. Recognize the un-shifted piece:

$\mathcal{L}^{-1}\!\left\{\frac{1}{s^2 + 4}\right\} = \frac{1}{2}\sin(2t)$

Step 4. Apply the second shifting theorem for the $e^{-2s}$ factor:

y(t) = \tfrac{1}{2}\sin\!\big(2(t - 2)\big) \cdot u(t - 2)

The solution is exactly zero for $t < 2$ , then jumps into a pure sinusoidal oscillation the moment the impulse hits. Position is continuous at the impulse — it is velocity that jumps.

Interactive: Impulse Response of a Second-Order System

Play with the natural frequency $\omega_0$ , damping $\gamma$ , impulse time $t_0$ , and amplitude $A$ . The system stays silent until the impulse arrives, then rings down according to its own intrinsic dynamics. Try the underdamped, critically damped, and overdamped regimes.

Impulse response of y″ + 2γy′ + ω₀² y = A · δ(t − t₀)

Natural frequency ω₀2.00

Damping γ0.30

Impulse time t₀1.50

Impulse strength A1.00

regimeunderdamped

ω₀² − γ²3.910

The system sits at rest until t₀; the impulse imparts a sudden kick equivalent to giving it an initial velocity of A. From t₀ onward you see the natural ringdown.

What an Impulse Really Means: Momentum, Not Force

Looking at the solution above raises a puzzle. The forcing on the right-hand side is infinite at one instant. Why doesn't the position jump to infinity too?

Because the right object to track is not the force itself, but the force's integral: the impulse = total momentum delivered.

Newton's second law for a unit mass is $\dfrac{dv}{dt} = F(t)$ . Integrate across the instant $t = t_0$ :

\int_{t_0 - 0}^{t_0 + 0} \frac{dv}{dt}\, dt = \int_{t_0 - 0}^{t_0 + 0} A\, \delta(t - t_0)\, dt

The left side is $v(t_0^+) - v(t_0^-)$ ; the right side is $A$ (by sifting). So:

\Delta v = A

The velocity jumps by $A$ at the impulse. The position $x(t)$ is the integral of velocity and therefore stays continuous — no instantaneous jump in position, because the velocity is only delta-singular for an infinitesimal moment.

The Mental Picture

A $\delta$ in the forcing of a Newton-type equation does not mean "infinite displacement." It means instantaneous transfer of momentum. Like a billiard ball getting hit — its position is exactly where it was, but its velocity has just changed by a definite amount.

Worked Example: Hammer Strike on a Damped Spring

📖 Click to expand the full hand calculation

Problem. A mass-spring-damper system has the equation of motion

y'' + 2 y' + 5 y = 3\, \delta(t - 1), \quad y(0) = 0,\; y'(0) = 0.

The system is at rest until a hammer of impulse 3 strikes at $t = 1$ . Find $y(t)$ .

Step 1. Laplace transform both sides.

(s^2 + 2s + 5) Y(s) = 3\, e^{-s}

Step 2. Solve for $Y(s)$ .

Y(s) = \frac{3\, e^{-s}}{s^2 + 2s + 5}

Step 3. Complete the square in the denominator.

s^2 + 2s + 5 = (s+1)^2 + 4

Step 4. Recognize the standard pair $\mathcal{L}\{e^{-t}\sin(2t)\} = \dfrac{2}{(s+1)^2 + 4}$ . So:

\mathcal{L}^{-1}\!\left\{\frac{3}{(s+1)^2 + 4}\right\} = \tfrac{3}{2}\, e^{-t} \sin(2t)

Step 5. Apply the second shifting theorem for the $e^{-s}$ factor:

y(t) = \tfrac{3}{2}\, e^{-(t-1)} \sin\!\big(2(t-1)\big) \cdot u(t - 1)

Sanity check. At $t = 1$ : position is 0 (the hammer just hit), but velocity $y'(1^+) = \tfrac{3}{2} \cdot 2 = 3$ — the exact impulse value. As expected: the hammer transfers exactly 3 units of momentum to a unit mass.

Numerical samples.

t	y(t)	Interpretation
0.5	0.0000	before strike — silence
1.0	0.0000	strike instant — position not yet moved
1.5	0.7676	rising into the first oscillation
1 + π/4 ≈ 1.785	0.6826	near first peak
1 + π/2 ≈ 2.571	0.0000	first zero crossing
1 + π ≈ 4.142	-0.1944	second swing, much damped

The damping factor $e^{-(t-1)}$ shrinks the amplitude by a factor of $e \approx 2.718$ every unit of time, so after a few oscillations the response is essentially zero.

Machine Learning Connections

Impulses and deltas show up in ML in several places — usually disguised, but always doing the same job: concentrating information at a single point.

One-Hot Encoding = Discrete Delta

The one-hot vector $(0, 0, \ldots, 1, \ldots, 0)$ is just a Kronecker delta $\delta_{i,k}$ — a discrete impulse at class $k$ . Cross-entropy against a one-hot label is exactly the sifting property in disguise:

-\sum_{i} \delta_{i,k}\, \log p_i \;=\; -\log p_k

Convolution Has δ as Its Identity

For any signal $f$ and impulse $\delta$ :

(f * \delta)(t) = f(t)

In a CNN, a kernel that is zero everywhere except a single 1 at the center is the identity map: it copies the input feature through unchanged. This is what makes residual connections ( $y = f(x) + x$ ) so well-suited to deep networks — the second term $x$ is literally the output of a delta-kernel convolution.

Derivative of ReLU Is a Step; Second Derivative Is a Delta

ReLU is $\text{ReLU}(x) = x \cdot u(x)$ . Differentiating:

\frac{d}{dx} \text{ReLU}(x) = u(x), \qquad \frac{d^2}{dx^2} \text{ReLU}(x) = \delta(x)

The second derivative is a delta at the kink. In practice deep-learning frameworks define $u'(0)$ as a subgradient (anywhere in $[0,1]$ ), which is exactly how distribution theory tells you to handle a delta inside a numerical gradient pipeline.

Point Source of a Green's Function

In physics-informed neural networks and PDE solvers, the response to a delta forcing is called the Green's function:

\mathcal{L}\, G(t, t_0) = \delta(t - t_0)

Once you have $G$ , the response to any forcing is just a convolution with $G$ . The whole field of linear-system analysis is built on this one idea.

Python Implementation

Building the Delta from a Limit, Numerically

First, let's watch the nascent delta collapse and verify that the sifting property emerges from a finite-width rectangle:

Nascent Delta and the Sifting Property

🐍nascent_delta.py

Explanation(20)

Code(38)

1Import NumPy

We need numerical arrays and vectorized math. NumPy is the backbone for sampling our function on a dense grid.

2Import Matplotlib

We pull in plotting in case we want to visualize the limit ourselves. The print statements below already tell the full story, but plt is ready if needed.

6🪟 Rectangle nascent-delta — the simplest spike

A rectangle of width ε centered at 0 with height 1/ε. The product width × height = 1, so the area under it is locked at 1 forever. As ε shrinks, height blows up — that is exactly the cartoon of δ(t).

EXECUTION STATE

t = array of sample points

eps = ε — the width parameter

height = 1.0 / eps

7📚 np.where(cond, A, B)

Element-wise ternary: where cond is True use A, else B. Here it gives 1/ε inside the spike and 0 outside. Try: np.where(np.array([1,5,9]) > 3, 100, 0) → [0, 100, 100].

EXECUTION STATE

np.abs(t) <= eps/2 = True for t inside [−ε/2, ε/2]

value if True = 1.0/eps

value if False = 0.0

10🔔 Gaussian nascent-delta — smooth version

Same idea but smooth. The pre-factor 1/(ε√π) is chosen so the area ∫ e^(−t²/ε²) dt / (ε√π) = 1. As ε → 0 the bell collapses into the same spike.

EXAMPLE

with eps=0.3:  peak = 1/(0.3·√π) ≈ 1.881

11📚 np.exp(x)

Vectorized exp. Computes e^x element-wise. Combined with the −t²/ε² argument it produces a Gaussian bell.

14Sampling grid t

4001 points from −2 to 2. Density matters — too few samples and the integral approximation underestimates the area when ε gets tiny.

EXECUTION STATE

t[0] = −2.0

t[-1] = +2.0

len(t) = 4001

15Spacing dt

Distance between adjacent samples. We use it to turn the sum into a Riemann integral approximation: ∫ ≈ Σ f(tᵢ) · dt.

EXECUTION STATE

dt = 4 / 4000 = 0.001

17Sweep over shrinking ε

Three widths, each ten times smaller than the previous. We will watch the peak grow and the area stay pinned at 1.

18Numerical integral of the rectangle

Riemann sum approximation. Multiplies each sample by dt and adds them up.

LOOP TRACE · 3 iterations

eps = 0.5

rect peak = 1/0.5 = 2.00

rect area = ≈ 1.0000

eps = 0.1

rect peak = 1/0.1 = 10.00

rect area = ≈ 1.0000

eps = 0.02

rect peak = 1/0.02 = 50.00

rect area = ≈ 1.0000

19Numerical integral of the Gaussian

Same Riemann sum but for the smooth nascent delta. Tails fall off fast, so as long as the grid covers ±2 we capture essentially all of the mass.

20Peak height

For the rectangle this is the literal height. It is what blows up to infinity in the limit ε → 0 — the only way to keep the area at 1 while the support shrinks.

21Logging output

Side-by-side comparison of peak height versus area. The peak doubles, then multiplies by 5, but the area never moves.

EXAMPLE

eps = 0.500  rect peak =     2.00  rect area =  1.0000  gauss area =  1.0000
eps = 0.100  rect peak =    10.00  rect area =  1.0000  gauss area =  1.0000
eps = 0.020  rect peak =    50.00  rect area =  1.0000  gauss area =  1.0000

30🎯 Sifting property — the punchline

If δ really is concentrated at one point, multiplying it by f(t) and integrating should just read off f at that point. We will verify this numerically.

33Test function f(t) = t² − 3t + 1

A simple polynomial. Easy to compute by hand at any t. Picking t = 2 gives f(2) = 4 − 6 + 1 = −1 — that is the value δ should sift out.

EXECUTION STATE

f(0) = 1

f(2) = −1

f(3) = 1

36Impulse location a = 2

Where we place the spike. The delta lives entirely at t = a = 2, so it ignores everything else f(t) does.

37Target value f(a)

This is what the integral should equal. By substituting a = 2 into f: 2² − 3·2 + 1 = −1.

EXECUTION STATE

target = −1.0

40Sifted integral

rect_delta(t − a, ε) is the spike at t = a. Multiplying by f(t) and summing approximates ∫ f(t) δ(t − a) dt. We expect the answer to converge to f(a) = −1.

LOOP TRACE · 3 iterations

eps = 0.5 (wide rectangle)

approx = ≈ −0.9583 (rectangle averages f over [1.75, 2.25])

error = +0.0417

eps = 0.1

approx = ≈ −0.9992

error = +0.0008

eps = 0.02 (almost a delta)

approx = ≈ −1.0000

error = +0.00003

41Error monitor

We track approx − target to see the convergence rate. The error shrinks roughly like ε² for smooth f, because we are essentially averaging f over a small window.

42Final printed table

Shows the integral homing in on −1 as ε → 0. That is the sifting property emerging from a finite-width rectangle.

EXAMPLE

eps = 0.500  sifted value =  -0.9583  (target = -1.0, error = +0.0417)
eps = 0.100  sifted value =  -0.9992  (target = -1.0, error = +0.0008)
eps = 0.020  sifted value =  -1.0000  (target = -1.0, error = +0.00003)

18 lines without explanation

1import numpy as np
2import matplotlib.pyplot as plt
3
4# A "nascent delta": a rectangle of width epsilon and height 1/epsilon.
5# By construction its area is exactly 1, no matter how small epsilon gets.
6def rect_delta(t, eps):
7    return np.where(np.abs(t) <= eps / 2, 1.0 / eps, 0.0)
8
9# A Gaussian nascent delta: smooth, same unit-area property.
10def gauss_delta(t, eps):
11    return (1.0 / (eps * np.sqrt(np.pi))) * np.exp(-(t ** 2) / (eps ** 2))
12
13# Sweep through three values of epsilon and confirm the area stays 1.
14t = np.linspace(-2, 2, 4001)
15dt = t[1] - t[0]
16
17for eps in [0.5, 0.1, 0.02]:
18    rect_area = np.sum(rect_delta(t, eps)) * dt
19    gauss_area = np.sum(gauss_delta(t, eps)) * dt
20    rect_peak = 1.0 / eps
21    print(f"eps = {eps:5.3f}  "
22          f"rect peak = {rect_peak:8.2f}  rect area = {rect_area:7.4f}  "
23          f"gauss area = {gauss_area:7.4f}")
24
25# Now verify the SIFTING property numerically:
26#     integral of f(t) * delta(t - a) dt  ==  f(a)
27# We use f(t) = t**2 - 3*t + 1 and a = 2, so the answer should be f(2) = -1.
28def f(t):
29    return t ** 2 - 3 * t + 1
30
31a = 2.0
32target = f(a)                       # expected value: 4 - 6 + 1 = -1
33
34for eps in [0.5, 0.1, 0.02]:
35    approx = np.sum(f(t) * rect_delta(t - a, eps)) * dt
36    error = approx - target
37    print(f"eps = {eps:5.3f}  sifted value = {approx:8.4f}  "
38          f"(target = {target}, error = {error:+8.4f})")

Symbolic Laplace of δ and a Full IVP

Now we let SymPy do the symbolic algebra — and check that the answer matches our hand calculation:

Delta in the Laplace Domain and Solving y″ + 4y = δ(t − 2)

🐍delta_laplace_ivp.py

Explanation(17)

Code(37)

1📚 Imports from SymPy

DiracDelta is SymPy's symbolic δ. Heaviside is the step. laplace_transform / inverse_laplace_transform do the L and L⁻¹ operations symbolically. dsolve is SymPy's ODE solver.

5Declare symbols

t is time, s is the complex Laplace variable. Marking them positive/real tells SymPy how to simplify, especially across the e^(−as) factor.

EXECUTION STATE

t = time symbol (positive, real)

s = Laplace variable (positive, real)

6Symbol a > 0

The impulse location. We require a > 0 so the entire spike lies in the integration domain [0, ∞) of the Laplace transform.

9🎯 L{δ(t − a)}

By definition L{f(t)} = ∫₀^∞ f(t) e^(−st) dt. For δ(t − a) the sifting property gives e^(−s·a) — the entire transform is just a complex exponential.

EXECUTION STATE

DiracDelta(t - a) = symbolic δ(t − a)

F = exp(−a·s)

10📚 laplace_transform(expr, t, s, noconds=True)

Computes ∫₀^∞ expr · e^(−st) dt. noconds=True suppresses the convergence-region info. Without it you would get (exp(-a*s), 0, True).

11Expected output: exp(-a*s)

Compare against the step-function result e^(−as)/s. The delta has no 1/s factor because it is the DERIVATIVE of the step — and differentiation in t becomes multiplication by s in the Laplace domain.

EXAMPLE

L{u(t-a)} = exp(-a*s)/s   (step)
L{δ(t-a)} = exp(-a*s)      (impulse)

14Special case a = 0

δ at the origin transforms to 1. The simplest, cleanest Laplace pair: an instantaneous all-frequencies kick maps to a uniform spectrum.

EXECUTION STATE

F0 = 1

19🔧 Set up the IVP

y(t) is the unknown function. We declare it as a SymPy Function so dsolve can manipulate it symbolically.

20The differential equation

Spring-mass with natural frequency 2: y'' + 4y. The forcing is a unit impulse at t = 2 — think of a hammer striking a tuning fork at exactly that moment.

EXECUTION STATE

left side = y''(t) + 4·y(t)

right side = δ(t − 2)

21📚 dsolve(ode, y(t), ics={...})

Solves the ODE with initial conditions. ics is a dict mapping y(0)=0 and y'(0)=0 — the spring starts at rest.

EXECUTION STATE

y(0) = 0 (at equilibrium)

y'(0) = 0 (at rest)

23Print solution

SymPy returns an Eq object; .rhs is the right-hand side, i.e. the formula for y(t).

EXAMPLE

y(t) = sin(2*t - 4)*Heaviside(t - 2)/2
Meaning: 0 before t=2, then ½ sin(2(t−2)) afterward.

27🧮 Inverse-Laplace by hand, machine-checked

We construct Y(s) directly and let SymPy invert it. This mirrors what you would do with pen and paper using the second shifting theorem.

28Y(s) = e^(−2s) / (s² + 4)

From s² Y + 4 Y = e^(−2s) we factor out Y to get this. Notice the s² + 4 in the denominator — that is the system's characteristic polynomial.

EXECUTION STATE

Y = e^(−2s) / (s² + 4)

29📚 inverse_laplace_transform(Y, s, t)

Performs L⁻¹. It recognizes e^(−2s) as a time-shift factor and 1/(s² + 4) as the transform of (1/2) sin(2t), then combines them via the second shifting theorem.

EXAMPLE

y(t) = (1/2) sin(2(t − 2)) · u(t − 2)

34Numerical sample function

We mirror the symbolic answer in plain NumPy so we can probe specific times. np.where handles the step: 0 before 2, formula after.

35📚 np.where(cond, A, B)

Same pattern as before — vectorized branch. Implements the u(t − 2) factor without writing an explicit loop.

38Probe times

We evaluate the solution at five times. Before t = 2 the system is dead silent. After t = 2 it rings sinusoidally with amplitude ½ and frequency 2.

LOOP TRACE · 5 iterations

t = 1.000

y(t) = 0.000000 (before the impulse)

t = 2.000

y(t) = 0.000000 (impulse just arrived — position still 0)

t = 2.500

y(t) = 0.5·sin(1.0) ≈ +0.420735

t = 3.000

y(t) = 0.5·sin(2.0) ≈ +0.454649

t = 2 + π/4 ≈ 2.7854

y(t) = 0.5·sin(π/2) = +0.500000 (peak amplitude)

20 lines without explanation

1from sympy import (symbols, DiracDelta, Heaviside, exp, sin, cos,
2                   laplace_transform, inverse_laplace_transform, Function,
3                   dsolve, Eq, Derivative)
4
5t, s = symbols('t s', positive=True, real=True)
6a = symbols('a', positive=True)
7
8# 1. Laplace transform of the unit impulse at t = a
9F = laplace_transform(DiracDelta(t - a), t, s, noconds=True)
10print("L{delta(t - a)} =", F)        # expected: exp(-a*s)
11
12# 2. Laplace transform at a = 0 (the canonical delta at the origin)
13F0 = laplace_transform(DiracDelta(t), t, s, noconds=True)
14print("L{delta(t)}     =", F0)       # expected: 1
15
16# 3. Solve y'' + 4 y = delta(t - 2),  y(0) = 0,  y'(0) = 0
17y = Function('y')
18ode = Eq(Derivative(y(t), t, t) + 4 * y(t), DiracDelta(t - 2))
19sol = dsolve(ode, y(t), ics={y(0): 0, Derivative(y(t), t).subs(t, 0): 0})
20print("\nIVP solution:")
21print("y(t) =", sol.rhs)
22
23# 4. Inverse-transform check, the hand-method way:
24#    Y(s)(s^2 + 4) = exp(-2s)  =>  Y(s) = exp(-2s) / (s^2 + 4)
25#    L^{-1}{1/(s^2+4)} = (1/2) sin(2t)
26#    Then the e^{-2s} factor shifts the inverse by 2 (second shifting theorem).
27Y = exp(-2 * s) / (s ** 2 + 4)
28y_hand = inverse_laplace_transform(Y, s, t)
29print("\nInverse-Laplace y(t) =", y_hand)
30
31# 5. Evaluate the solution numerically at a few points
32import numpy as np
33def y_num(tv):
34    return np.where(tv < 2, 0.0, 0.5 * np.sin(2 * (tv - 2)))
35
36for tv in [1.0, 2.0, 2.5, 3.0, 2 + np.pi / 4]:
37    print(f"y({tv:5.3f}) = {y_num(tv):+.6f}")

The Discrete Delta in PyTorch

Finally, the digital cousin of the delta function — the Kronecker impulse — and its role as the identity element of convolution and the probe for any linear filter:

The Discrete Delta: Identity of Convolution & Impulse Response

🐍discrete_delta_pytorch.py

Explanation(27)

Code(42)

1📚 import torch

PyTorch tensors and autograd. Discrete signals here live as 1-D float tensors.

2📚 torch.nn.functional as F

Functional API. We need F.conv1d for explicit convolution-style operations on plain tensors (no nn.Module).

6🎯 Discrete impulse

The Kronecker delta δ[n − 2]: zero everywhere except at index 2 where it equals 1. It is the discrete counterpart of δ(t − 2).

7Place the spike

Index 2 gets the unit value. Everything else stays zero.

EXECUTION STATE

delta = [0, 0, 1, 0, 0, 0, 0, 0]

8Print to confirm

We list out the tensor so the spike is visible. Expect [0.0, 0.0, 1.0, 0.0, ..., 0.0].

EXAMPLE

delta = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

11Filter taps h

A three-tap kernel: weights 1, 2, 3 applied to successive samples. Could be a smoother, an edge detector, or any FIR filter.

EXECUTION STATE

h = [1.0, 2.0, 3.0]

12Print h

Sanity check that h holds the expected values.

EXAMPLE

filter h = [1.0, 2.0, 3.0]

16🔁 Delta is the IDENTITY of convolution

Mathematically: f * δ = f. In ML this is why a 1×1 conv with weight 1 at the center leaves a feature map unchanged — that center weight is a discrete delta.

17Reshape for conv1d input

F.conv1d wants (batch, channels, length). We use (1, 1, 8) — one batch, one channel, eight samples.

EXECUTION STATE

delta_b.shape = torch.Size([1, 1, 8])

18📚 .flip(0) — convolution vs cross-correlation

PyTorch conv1d actually performs cross-correlation. Flipping the kernel along axis 0 turns it into a true convolution. Try: torch.tensor([1,2,3]).flip(0) → tensor([3,2,1]).

EXAMPLE

h         = [1, 2, 3]
h.flip(0) = [3, 2, 1]

19📚 F.conv1d(input, weight, padding=2)

Slides the (flipped) kernel across the input. padding=2 zero-pads both ends so the output keeps the same length as the input. Returns shape (batch, out_channels, length).

EXECUTION STATE

input shape = (1, 1, 8)

weight shape = (1, 1, 3)

padding = 2 zeros on each side

output = the filter h itself, shifted into positions 1-3

20Verify identity behavior

Output should reveal h placed at the spike's location — i.e., h * δ[n − 2] = h shifted to start at index 1.

EXAMPLE

h * delta = [0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0]
             ^ entries 1,2,3 are exactly h, starting one position before the spike.

24Arbitrary signal x

An 8-sample signal with no special structure. We will convolve it with a shifted delta to demonstrate the time-shift property.

EXECUTION STATE

x = [4.0, -1.0, 3.0, 0.5, 2.0, 7.0, 1.0, 0.0]

25A delta at index 3

A 5-sample tensor with the spike at position 3. Convolving x with this should shift x by 1 sample (because the kernel center is at position 2 inside a length-5 kernel).

26Set the spike

shift_delta = [0, 0, 0, 1, 0]. The position of the 1 controls the shift amount.

EXECUTION STATE

shift_delta = [0, 0, 0, 1, 0]

28Reshape x for conv1d

Add batch and channel dims: shape becomes (1, 1, 8). Same routine as before.

29Flip the delta kernel

Same flip trick — keeps the operation a true mathematical convolution.

30Run conv1d with padding=2

padding=2 plus a length-5 kernel keeps the output at length 8. Because the delta is the identity of convolution, the result is x shifted by 1.

EXECUTION STATE

shifted = x shifted right by 1 sample (because the spike sat 1 step past center)

31Print shifted

Expect x to reappear, displaced.

EXAMPLE

x                 = [4.0, -1.0,  3.0,  0.5,  2.0,  7.0,  1.0,  0.0]
x * delta_shifted = [-1.0,  3.0,  0.5,  2.0,  7.0,  1.0,  0.0,  0.0]
                     ^ same numbers, shifted left by 1 (kernel offset).

35🎛️ Impulse response of a filter

If you feed δ into a linear filter, the OUTPUT is the filter's impulse response. This single experiment fully characterizes the filter — every other output can be reconstructed by convolution with the input.

36Define a 3-tap recursion

A weighted moving average: y[n] = 0.5·x[n] + 0.3·x[n−1] + 0.2·x[n−2]. The coefficients sum to 1, so it is a smoother.

37Pad with two zeros at the front

We need x[n−1] and x[n−2] to be defined for the first samples. Prepending two zeros plays the role of 'silence before the signal'.

EXECUTION STATE

pad = [0, 0] ⌢ signal — two-zero prepend

38Vectorized output

Slicing pad[2:], pad[1:-1], pad[:-2] gives three aligned arrays representing x[n], x[n−1], x[n−2]. The weighted sum produces y[n].

EXAMPLE

pad[2:]   = x[n]
pad[1:-1] = x[n-1]
pad[:-2]  = x[n-2]
y[n] = 0.5·x[n] + 0.3·x[n-1] + 0.2·x[n-2]

41Input: a unit impulse

Six-sample signal with a 1 at index 0 and zeros elsewhere. The cleanest possible probe.

EXECUTION STATE

impulse_in = [1, 0, 0, 0, 0, 0]

42Set the spike at n = 0

Position 0 gets the 1. The output will then read off the filter coefficients in order.

43Run the filter on δ

Because δ is the identity of convolution, the output literally is the impulse response of the filter — i.e., its coefficients.

EXECUTION STATE

impulse_response = [0.5, 0.3, 0.2, 0, 0, 0]

44Print to confirm

We expect the first three taps to be 0.5, 0.3, 0.2 — the filter's coefficients — followed by zeros. This is the discrete echo of 'inverse Laplace of 1/D(s)' = the impulse response of a linear system.

EXAMPLE

impulse response = [0.5, 0.3, 0.2, 0.0, 0.0, 0.0]

15 lines without explanation

1import torch
2import torch.nn.functional as F
3
4# A discrete impulse: zero everywhere except a single 1 at index 2.
5# This is the digital cousin of delta(t - 2).
6delta = torch.zeros(8)
7delta[2] = 1.0
8print("delta             =", delta.tolist())
9
10# A simple 3-tap filter — could be a smoothing kernel, an FIR filter, etc.
11h = torch.tensor([1.0, 2.0, 3.0])
12print("filter h          =", h.tolist())
13
14# Convolution treats delta as the IDENTITY element:
15#   (h * delta)[n] = h[n - k0]
16# We use F.conv1d, which expects shape (batch, channels, length).
17delta_b = delta.view(1, 1, -1)
18h_kernel = h.flip(0).view(1, 1, -1)   # PyTorch conv1d does cross-correlation
19out = F.conv1d(delta_b, h_kernel, padding=2).view(-1)
20print("h * delta         =", out.tolist())
21
22# Same idea: convolve an arbitrary signal x with a delta and get x back,
23# shifted by the impulse position.
24x = torch.tensor([4.0, -1.0, 3.0, 0.5, 2.0, 7.0, 1.0, 0.0])
25shift_delta = torch.zeros(5)
26shift_delta[3] = 1.0
27x_b = x.view(1, 1, -1)
28d_kernel = shift_delta.flip(0).view(1, 1, -1)
29shifted = F.conv1d(x_b, d_kernel, padding=2).view(-1)
30print("x * delta_shifted =", shifted.tolist())
31
32# Impulse response of a digital filter: feed delta in, output IS the filter.
33# This is the discrete echo of "the inverse Laplace transform of 1/D(s)".
34def my_filter(signal):
35    # y[n] = 0.5 * x[n] + 0.3 * x[n-1] + 0.2 * x[n-2]
36    pad = torch.cat([torch.zeros(2), signal])
37    return 0.5 * pad[2:] + 0.3 * pad[1:-1] + 0.2 * pad[:-2]
38
39impulse_in = torch.zeros(6)
40impulse_in[0] = 1.0
41impulse_response = my_filter(impulse_in)
42print("impulse response  =", impulse_response.tolist())

Common Mistakes

Mistake 1: Treating δ(0) as a Number

Wrong. Writing " $\delta(0)^2$ " or asking whether $\delta(0) > 0$ .

Right. The delta function has no pointwise values. Only integrals against test functions are defined. If a manipulation requires a value at a single point, you are outside the rules of distribution theory.

Mistake 2: Forgetting the Step Factor in the Solution

Wrong. Writing $y(t) = \tfrac{1}{2}\sin(2(t-2))$ as the answer to $y'' + 4y = \delta(t-2)$ with zero initial conditions.

Right. The solution must be zero before the impulse arrives. Always include the $u(t-a)$ factor: $y(t) = \tfrac{1}{2}\sin(2(t-2)) \cdot u(t-2)$ .

Mistake 3: Mixing Up L{δ} and L{u}

Wrong. Writing $\mathcal{L}\{\delta(t)\} = 1/s$ .

Right. $\mathcal{L}\{\delta(t)\} = 1$ , and $\mathcal{L}\{u(t)\} = 1/s$ . The delta is one power of $s$ "flatter" than the step — exactly because the delta is the derivative of the step.

Mistake 4: Putting the Impulse Outside the Integration Domain

Wrong. Claiming $\mathcal{L}\{\delta(t + 2)\} = e^{2s}$ .

Right. The Laplace transform integrates from $0$ to $\infty$ . The spike of $\delta(t + 2)$ sits at $t = -2$ , outside the domain. The integrand is zero throughout. So $\mathcal{L}\{\delta(t+2)\} = 0$ .

Mistake 5: Confusing Position-Jump with Velocity-Jump

Wrong. Saying the impulse causes an instantaneous jump in position.

Right. An impulsive force changes velocity instantaneously, not position. Position is the integral of velocity and is continuous across the impulse. This is why $y(t_0^+) = y(t_0^-) = 0$ in the worked example, even though $y'(t_0^+) - y'(t_0^-) = A$ .

Test Your Understanding

Test Yourself: Impulse Functions

Q1.What is the Laplace transform of δ(t − 3)?

Q2.What does the sifting property ∫ f(t) δ(t − 2) dt evaluate to?

Q3.If the step function u(t) is the integral of δ(t), then δ(t) is…

Q4.Solve y″ + y = δ(t − π), y(0) = y′(0) = 0. What is y(t)?

Q5.An impulsive force F·δ(t − t₀) acting on a unit mass at rest produces what?

Summary

The Dirac delta is the ideal limit of an arbitrarily narrow, arbitrarily tall, unit-area spike. It is not a classical function but a distribution — defined by how it acts inside integrals. Combined with the Laplace transform, it gives us the cleanest possible way to model instantaneous events: collisions, switch closures, lightning strikes, neural spikes, and gradient updates.

Key Formulas

Formula	Name	Use
δ(t − a)	Dirac delta at a	instantaneous spike of unit area
∫ f(t) δ(t − a) dt = f(a)	Sifting property	evaluate integrals against δ
d/dt u(t − a) = δ(t − a)	Step ↔ Delta	delta is the derivative of the step
L{δ(t)} = 1	Laplace of δ	instant impulse at origin
L{δ(t − a)} = e^(−as)	Laplace of shifted δ	instant impulse at a
(f * δ)(t) = f(t)	Identity of convolution	δ leaves any signal unchanged
L · G(t, t₀) = δ(t − t₀)	Green's function	impulse response of operator L

Key Takeaways

Construct, don't evaluate. Think of δ as the limit of unit-area spikes, not as a function with pointwise values.
Sifting is the workhorse. Almost every manipulation of δ reduces to $\int f(t)\, \delta(t-a)\, dt = f(a)$ .
Differentiation in time = multiplication by s. Because δ is the derivative of the step, its Laplace transform is $s \cdot \tfrac{1}{s} = 1$ .
An impulse delivers momentum, not displacement. Velocity jumps; position stays continuous.
Impulse response = system identity. Feed δ in, and the output completely characterizes any linear time-invariant system. Every other response is a convolution with this one.
ML is full of deltas. One-hot labels, residual connections, Green's functions for PDE solvers, ReLU's second derivative — all are deltas in disguise.

The Core Insight:

"A delta function is a forever-instant. It carries no width and no value at any point, yet it transfers a definite amount of something — momentum, charge, probability mass — across a single moment in time."

Coming Next: In Convolution we will build on the "δ is the identity" idea to show how any input signal can be written as a continuous sum of shifted impulses, each weighted by the input's value at that instant. Convolving with the impulse response then gives the system's output for arbitrary forcing.