Learning Objectives
By the end of this section, you will be able to:
- Construct the Dirac delta as a careful limit of unit-area spikes, and explain why the limit must be interpreted as a distribution rather than a classical function.
- Use the sifting property to evaluate integrals and recognize when a delta is doing the work.
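- Compute the Laplace transform $\mathcal{L}\{\delta(t-a)\} = e^{-as}$ and contrast it with the step-function transform $\mathcal{L}\{u(t-a)\} = e^{-as}/s$.
- Solve initial-value problems of the form $y'' + by' + cy = k\,\delta(t-a)$ and interpret the impulse as an instantaneous change in momentum.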
- Connect the impulse response to convolution, linear systems, and the role of identity elements in signal processing and neural networks.
The Big Picture: Modeling an Instantaneous Kick
"A hammer strikes a tuning fork. The contact lasts almost no time at all, but it deposits a definite amount of momentum. How do we write down a force that is zero everywhere — except for one instant where it is infinite, yet has a finite total effect?"
The step function let us model a switch that turns on. But many of the most interesting events in physics and engineering are even more violent than a switch: they happen essentially at a single instant, and yet they have a definite measurable consequence.
- A bat hits a baseball — contact time is <1 ms, ball gains 40 m/s of velocity.
- Lightning strikes — a current pulse so brief and intense it cooks the air into plasma.
- A neuron fires — a millisecond-wide voltage spike triggers the next neuron in a chain.
- A gradient update in deep learning — a single backward pass injects a kick into every weight in the network.
None of these is a finite-amplitude continuous function. They all look like arbitrarily tall, arbitrarily narrow spikes that nonetheless carry a definite total "area": total momentum, total charge, total update. To describe them mathematically we need a new object: the Dirac delta function $\delta(t)$.
Why the Delta Function Matters
The delta function is the mathematical limit of an impulse: zero width, infinite height, unit area. It is not a classical function: there is no value of $\delta(0)$ in the usual sense. But every operation we care about (integration with a test function, Laplace transform, convolution) gives a perfectly well-defined answer. This makes it one of the most useful "non-functions" in applied mathematics.
Historical Context: Paul Dirac's Function That Was Not a Function
The delta function is named after Paul Adrien Maurice Dirac (1902–1984), the British theoretical physicist who introduced it in 1930 in his book The Principles of Quantum Mechanics. Dirac needed an object to represent a particle perfectly localized at one point: its wave function is zero everywhere except at that point, but it must still integrate to a finite probability.
"The most general function of x which is zero except at one value of x, where it is infinite, will be denoted by $\delta(x)$." (P. A. M. Dirac, 1930)
Mathematicians of the day were horrified. There simply is no function in the classical sense with these properties. For nearly twenty years the delta function was treated as a useful fiction by physicists and engineers, despite reservations from pure mathematicians.
From Heresy to Rigor: Schwartz's Distributions
In 1944 the French mathematician Laurent Schwartz built the theory of distributions (generalized functions), a rigorous framework in which the delta is a perfectly valid object. The key idea: stop thinking of $\delta$ as a thing with values at each point, and start thinking of it as something that acts on test functions $\varphi$:

$$\langle \delta, \varphi \rangle = \int_{-\infty}^{\infty} \delta(t)\,\varphi(t)\,dt = \varphi(0).$$
Schwartz earned the Fields Medal in 1950 for this work. Today the delta function is a routine member of every applied mathematician's toolkit and is rigorously justified.
Sidney Coleman's Quip
"The delta function is to the integral what the imaginary unit is to the square root: an entity that breaks the original rules so beautifully that the rules end up being rewritten in its favor."
Building the Delta from a Limit
The cleanest way to understand $\delta(t)$ is to construct it. Start with a family of perfectly ordinary functions, each one a slightly narrower and taller spike, all with the same area underneath. Then watch what happens.
The Unit-Area Rectangle
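Define the rectangular pulse:

$$\delta_\varepsilon(t) = \begin{cases} \dfrac{1}{\varepsilon}, & |t| \le \dfrac{\varepsilon}{2}, \\[4pt] 0, & \text{otherwise.} \end{cases}$$

The width is $\varepsilon$ and the height is $1/\varepsilon$, so the area is $\varepsilon \cdot (1/\varepsilon) = 1$, no matter how tiny we make $\varepsilon$. As $\varepsilon \to 0^+$:
- The support shrinks to a single point at $t = 0$.
- The peak height blows up to $+\infty$.
- The total area stays pinned at $1$.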
The Smooth Gaussian Approximation
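We could just as well use a smooth bell-shaped curve:

$$g_\varepsilon(t) = \frac{1}{\varepsilon\sqrt{2\pi}}\, e^{-t^2/(2\varepsilon^2)}.$$

Same story: area equals 1 for every $\varepsilon > 0$, and as $\varepsilon \to 0^+$ the bell collapses into the same spike. The shape of the approximation does not matter; only the limit does.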
The Key Insight
The delta function is the limit of any unit-area family whose mass concentrates at a single point. Different shapes (rectangles, Gaussians, Lorentzians, triangles) converge to the same distribution, because every smooth test function integrated against them gives the same answer in the limit.
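This shape-independence can be checked numerically. The following is a minimal sketch, not a definitive implementation: the helper names (`rect`, `gauss`, `lorentz`, `pair`), the test function $\cos t$, and the grid sizes are all illustrative choices. Each family is integrated against the test function on a fine grid, and all three results drift toward $f(0) = 1$ as $\varepsilon$ shrinks.

```python
import numpy as np

def rect(t, eps):
    # Unit-area rectangle of width eps centred at 0
    return np.where(np.abs(t) <= eps / 2, 1.0 / eps, 0.0)

def gauss(t, eps):
    # Unit-area Gaussian with standard deviation eps
    return np.exp(-t**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

def lorentz(t, eps):
    # Unit-area Lorentzian (Cauchy) profile with half-width eps
    return (eps / np.pi) / (t**2 + eps**2)

def pair(family, eps, f=np.cos, half_width=50.0, n=2_000_001):
    # Riemann-sum approximation of the integral of family(t, eps) * f(t)
    t = np.linspace(-half_width, half_width, n)
    dt = t[1] - t[0]
    return float(np.sum(family(t, eps) * f(t)) * dt)

for eps in (0.5, 0.1, 0.02):
    values = [round(pair(fam, eps), 4) for fam in (rect, gauss, lorentz)]
    print(f"eps={eps}: rect, gauss, lorentz ->", values)
```

The three families approach 1 at different rates (the Lorentzian is slowest because of its heavy tails), but the limit is the same.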
Interactive: Watch the Nascent Delta Collapse
Drag the slider toward zero. Notice how the peak shoots up while the shaded area below stays at exactly 1. Swap between Rectangle / Gaussian / Lorentzian to confirm the limit does not depend on shape.
Defining the Delta Function
Formally, the Dirac delta is not a function in the classical sense. It is a distribution — an object defined entirely by how it acts inside an integral. The defining properties are:
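Defining Properties of δ(t)

$$\delta(t) = 0 \quad \text{for } t \neq 0, \qquad \int_{-\infty}^{\infty} \delta(t)\,dt = 1.$$

For the shifted version $\delta(t - a)$, the spike sits at $t = a$ instead of the origin:

$$\delta(t - a) = 0 \quad \text{for } t \neq a, \qquad \int_{-\infty}^{\infty} \delta(t - a)\,dt = 1.$$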
Important: δ Is Not a Number-Valued Function
The expression "$\delta(0) = \infty$" is a useful cartoon, not a mathematical statement. The delta function has no pointwise values; it only has meaning inside an integral paired with a well-behaved test function. Trying to algebraically manipulate $\delta(0)$ as a number leads to paradoxes.
Delta as the Derivative of the Step Function
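Recall the Heaviside step function from the previous section:

$$u(t) = \begin{cases} 0, & t < 0, \\ 1, & t \ge 0. \end{cases}$$

Classically, the derivative of $u(t)$ is zero everywhere except at $t = 0$, where it is undefined: there is an instantaneous jump of size 1. But what if we insist on a derivative anyway?
Approximate the step with a continuous ramp that climbs linearly from 0 to 1 over the interval $[-\varepsilon/2, \varepsilon/2]$. Its derivative is a rectangle of width $\varepsilon$ and height $1/\varepsilon$, which is exactly our nascent delta. In the limit $\varepsilon \to 0^+$:
The Derivative Identity

$$\frac{d}{dt}\,u(t) = \delta(t).$$

Equivalently:

$$u(t) = \int_{-\infty}^{t} \delta(\tau)\,d\tau.$$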
The step is the running integral of the delta. This pair (impulse, step) mirrors the pair (velocity, position) for an object that gets kicked: the position can jump instantaneously only if the velocity contains a delta.
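The same identity can also be reached without a limiting ramp. As a sketch, pair the derivative of the step with a smooth test function $\varphi$ that vanishes at $\pm\infty$ (the symbol $\varphi$ is introduced here for illustration) and move the derivative onto $\varphi$ by integration by parts:

$$\int_{-\infty}^{\infty} u'(t)\,\varphi(t)\,dt = \Big[u(t)\,\varphi(t)\Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} u(t)\,\varphi'(t)\,dt = -\int_{0}^{\infty} \varphi'(t)\,dt = \varphi(0).$$

The boundary term vanishes because $\varphi$ does, and the result $\varphi(0)$ is exactly what pairing $\delta$ with $\varphi$ returns.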
Two-Way Dictionary
| Operation | Step view | Delta view |
|---|---|---|
| Time domain | u(t) | d/dt u(t) = δ(t) |
| Laplace domain | 1/s | s · (1/s) = 1 |
| Action on f | ∫₀^t f(τ) dτ | f(t) — instantaneous read-off |
The Sifting Property
The single most important fact about the delta function is the sifting property. It is what makes the delta useful: sliding a delta across an integral "sifts out" a single value of the function it meets.
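The Sifting Property

$$\int_{-\infty}^{\infty} f(t)\,\delta(t - a)\,dt = f(a) \qquad \text{for every } f \text{ continuous at } t = a.$$

Why It Works (Informally)
Replace $\delta(t - a)$ with the nascent rectangle $\delta_\varepsilon(t - a)$. Then:

$$\int_{-\infty}^{\infty} f(t)\,\delta_\varepsilon(t - a)\,dt = \frac{1}{\varepsilon}\int_{a - \varepsilon/2}^{a + \varepsilon/2} f(t)\,dt.$$

That right-hand side is the average value of $f$ over a width-$\varepsilon$ window around $a$. As $\varepsilon \to 0^+$, with $f$ continuous at $a$, the average converges to $f(a)$. That is the sifting property.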
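As a concrete instance of this averaging argument, take $f(t) = t^2$ (an illustrative choice). The window average can be computed in closed form, and the correction term vanishes as the window shrinks:

$$\frac{1}{\varepsilon}\int_{a - \varepsilon/2}^{a + \varepsilon/2} t^2\,dt = \frac{(a + \varepsilon/2)^3 - (a - \varepsilon/2)^3}{3\varepsilon} = a^2 + \frac{\varepsilon^2}{12} \;\longrightarrow\; a^2 = f(a) \quad (\varepsilon \to 0^+).$$

With $a = 3$ the limit is $9$.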
What You Can Do with Sifting
| Integral | Equals | Why |
|---|---|---|
| ∫ sin(t) δ(t − π) dt | sin(π) = 0 | Sift at t = π |
| ∫ e^t δ(t − 2) dt | e² ≈ 7.389 | Sift at t = 2 |
| ∫ t² δ(t − 3) dt | 9 | Sift at t = 3 |
| ∫ f(t) δ(t) dt | f(0) | Sift at t = 0 |
| ∫ f(t) δ(t − a) dt | f(a) | General case |
Interactive: Sifting in Action
Pick a test function $f$ and drag the impulse location $a$. Watch the yellow dot ride along: its height is always exactly $f(a)$, the result of the sifting integral.
Laplace Transform of the Delta Function
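Now we can compute the Laplace transform of $\delta(t - a)$ directly from the definition, and the sifting property does all the work.
Derivation:
By definition:

$$\mathcal{L}\{\delta(t - a)\} = \int_{0}^{\infty} e^{-st}\,\delta(t - a)\,dt.$$

The integrand is zero except at $t = a$. For $a \ge 0$ the spike is inside the integration domain (taking the lower limit as $0^-$ so that a spike at the origin is counted), and by the sifting property:

$$\mathcal{L}\{\delta(t - a)\} = e^{-as}.$$

Laplace Transform of the Delta

$$\mathcal{L}\{\delta(t - a)\} = e^{-as} \quad (a \ge 0), \qquad \mathcal{L}\{\delta(t)\} = 1.$$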
Compare with the Step Function
| Time domain | Laplace transform | Relationship |
|---|---|---|
| u(t − a) | e^(−as) / s | step at t = a |
| δ(t − a) | e^(−as) | derivative of step |
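The factor of $s$ between them is no coincidence: differentiating in the time domain corresponds to multiplying by $s$ in the Laplace domain (the initial-value term $u(0^- - a)$ vanishes). So:

$$\mathcal{L}\{\delta(t - a)\} = \mathcal{L}\!\left\{\frac{d}{dt}\,u(t - a)\right\} = s \cdot \frac{e^{-as}}{s} = e^{-as}.$$

Two derivations, same answer. The delta really is the derivative of the step, even in the Laplace domain.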
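A third check uses the nascent rectangle directly. The sketch below relies on the closed-form Laplace transform of a unit-area rectangle of width $\varepsilon$ centred at $a$; the function name `laplace_rect_pulse` and the values of `s` and `a` are illustrative assumptions. As $\varepsilon$ shrinks, the pulse transform should approach $e^{-as}$.

```python
import numpy as np

def laplace_rect_pulse(s, a, eps):
    # Exact Laplace transform of the unit-area rectangle of width eps centred at a:
    # (1/eps) * integral of e^{-s t} over [a - eps/2, a + eps/2].
    # Assumes a - eps/2 >= 0 so the whole pulse lies inside [0, inf).
    lo, hi = a - eps / 2, a + eps / 2
    return (np.exp(-s * lo) - np.exp(-s * hi)) / (s * eps)

s, a = 1.5, 2.0                      # illustrative values
target = np.exp(-a * s)              # claimed transform of delta(t - a)
for eps in (1.0, 0.1, 0.01):
    approx = laplace_rect_pulse(s, a, eps)
    print(f"eps={eps:<5} pulse transform={approx:.6f}  e^(-as)={target:.6f}")
```

Algebraically the pulse transform equals $e^{-as}\,\sinh(s\varepsilon/2)/(s\varepsilon/2)$, and the second factor tends to 1.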
Solving Initial Value Problems with an Impulsive Forcing
Once we can transform the delta, solving differential equations with impulsive forcing becomes a routine algebraic exercise. The method is exactly the one you already know — only the right-hand side changes.
Template Problem
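Solve $y'' + \omega^2 y = \delta(t - a)$ with $y(0) = 0,\ y'(0) = 0$.
Physical interpretation. A unit mass on a spring with natural frequency $\omega$, initially at rest, struck by a unit impulsive force at $t = a$.
Solution.
Step 1. Take the Laplace transform of both sides:

$$s^2 Y(s) - s\,y(0) - y'(0) + \omega^2 Y(s) = e^{-as}.$$

With the zero initial conditions:

$$(s^2 + \omega^2)\,Y(s) = e^{-as}.$$

Step 2. Solve for $Y(s)$:

$$Y(s) = \frac{e^{-as}}{s^2 + \omega^2}.$$

Step 3. Recognize the un-shifted piece:

$$\mathcal{L}^{-1}\!\left\{\frac{1}{s^2 + \omega^2}\right\} = \frac{1}{\omega}\sin(\omega t).$$

Step 4. Apply the second shifting theorem for the $e^{-as}$ factor:

$$y(t) = \frac{1}{\omega}\,\sin\!\big(\omega (t - a)\big)\,u(t - a).$$

The solution is exactly zero for $t < a$, then launches into a pure sinusoidal oscillation the moment the impulse hits. Position is continuous at the impulse; it is velocity that jumps.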
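The closed form can be compared against a direct simulation. This is a minimal sketch under stated assumptions: the template equation is taken to be $y'' + \omega^2 y = \delta(t - a)$ with zero initial data, the delta is replaced by a narrow unit-area Gaussian of width `sigma`, and the integrator is a hand-written classical Runge-Kutta loop. All numerical values are illustrative.

```python
import numpy as np

omega, a, sigma = 2.0, 1.0, 0.01     # illustrative values, not from the text
dt, t_end = 1e-3, 4.0

def forcing(t):
    # Narrow unit-area Gaussian standing in for delta(t - a)
    return np.exp(-(t - a)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def rhs(t, state):
    # First-order form: state = (y, v), with y' = v and v' = forcing - omega^2 y
    y, v = state
    return np.array([v, forcing(t) - omega**2 * y])

ts = np.arange(0.0, t_end + dt / 2, dt)
state = np.zeros(2)                  # y(0) = 0, y'(0) = 0
ys = [state[0]]
for t in ts[:-1]:
    # One classical fourth-order Runge-Kutta step
    k1 = rhs(t, state)
    k2 = rhs(t + dt / 2, state + dt / 2 * k1)
    k3 = rhs(t + dt / 2, state + dt / 2 * k2)
    k4 = rhs(t + dt, state + dt * k3)
    state = state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    ys.append(state[0])
ys = np.array(ys)

# Closed-form impulse response: (1/omega) sin(omega (t - a)) u(t - a)
exact = np.where(ts >= a, np.sin(omega * (ts - a)) / omega, 0.0)
max_err = np.max(np.abs(ys - exact))
print(f"largest gap between simulation and closed form: {max_err:.4f}")
```

The residual gap is of the order of `sigma`, because the Gaussian rounds off the corner that the ideal impulse produces at $t = a$; shrinking `sigma` (with a matching smaller `dt`) narrows it.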
Interactive: Impulse Response of a Second-Order System
Play with the natural frequency, damping, impulse time, and amplitude. The system stays silent until the impulse arrives, then rings down according to its own intrinsic dynamics. Try the underdamped, critically damped, and overdamped regimes.
What an Impulse Really Means: Momentum, Not Force
Looking at the solution above raises a puzzle. The forcing on the right-hand side is infinite at one instant. Why doesn't the position jump to infinity too?
Because the right object to track is not the force itself, but the force's integral: the impulse = total momentum delivered.
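Newton's second law for a unit mass driven by an impulse of strength $J$ is $v'(t) = J\,\delta(t - a)$. Integrate across the instant $t = a$:

$$\int_{a^-}^{a^+} v'(t)\,dt = \int_{a^-}^{a^+} J\,\delta(t - a)\,dt.$$

The left side is $v(a^+) - v(a^-)$; the right side is $J$ (by sifting). So:

$$v(a^+) - v(a^-) = J.$$

The velocity jumps by $J$ at the impulse. The position is the integral of velocity and therefore stays continuous: there is no instantaneous jump in position, because the velocity itself stays finite (it has a jump, not a delta).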
The Mental Picture
A $\delta(t - a)$ in the forcing of a Newton-type equation does not mean "infinite displacement." It means instantaneous transfer of momentum. Like a billiard ball getting hit: its position is exactly where it was, but its velocity has just changed by a definite amount.
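To see why the position cannot jump, spread the kick over a short window. As a sketch, assume a constant force of total impulse $J$ (a symbol introduced here for illustration) acting on a unit mass at rest during $a \le t \le a + \varepsilon$:

$$F(t) = \frac{J}{\varepsilon}, \qquad v(t) = \frac{J}{\varepsilon}\,(t - a), \qquad y(a + \varepsilon) - y(a) = \int_{a}^{a + \varepsilon} v(t)\,dt = \frac{J\,\varepsilon}{2}.$$

The velocity gained by the end of the window is $J$ for every $\varepsilon$, while the displacement accumulated during the window is $J\varepsilon/2$, which tends to zero as $\varepsilon \to 0^+$.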
Worked Example: Hammer Strike on a Damped Spring
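Problem. A mass-spring-damper system has the equation of motion

$$y'' + 2y' + 5y = 3\,\delta(t - 1), \qquad y(0) = 0,\quad y'(0) = 0.$$

The system is at rest until a hammer of impulse 3 strikes at $t = 1$. Find $y(t)$.
Step 1. Laplace transform both sides.

$$(s^2 + 2s + 5)\,Y(s) = 3e^{-s}.$$

Step 2. Solve for $Y(s)$.

$$Y(s) = \frac{3e^{-s}}{s^2 + 2s + 5}.$$

Step 3. Complete the square in the denominator.

$$s^2 + 2s + 5 = (s + 1)^2 + 2^2, \qquad Y(s) = \frac{3e^{-s}}{(s + 1)^2 + 2^2}.$$

Step 4. Recognize the standard pair $\mathcal{L}^{-1}\!\left\{\dfrac{\beta}{(s + \alpha)^2 + \beta^2}\right\} = e^{-\alpha t}\sin(\beta t)$. So:

$$\mathcal{L}^{-1}\!\left\{\frac{3}{(s + 1)^2 + 2^2}\right\} = \frac{3}{2}\,e^{-t}\sin(2t).$$

Step 5. Apply the second shifting theorem for the $e^{-s}$ factor:

$$y(t) = \frac{3}{2}\,e^{-(t - 1)}\,\sin\!\big(2(t - 1)\big)\,u(t - 1).$$

Sanity check. At $t = 1^+$: position is 0 (the hammer just hit), but velocity $y'(1^+) = \tfrac{3}{2}\cdot 2 = 3$, the exact impulse value. As expected: the hammer transfers exactly 3 units of momentum to a unit mass.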
Numerical samples.
| t | y(t) | Interpretation |
|---|---|---|
| 0.5 | 0.0000 | before strike — silence |
| 1.0 | 0.0000 | strike instant — position not yet moved |
| 1.5 | 0.7656 | rising into the first oscillation |
| 1 + π/4 ≈ 1.785 | 0.6839 | just past the first peak |
| 1 + π/2 ≈ 2.571 | 0.0000 | first zero crossing |
| 1 + 3π/4 ≈ 3.356 | -0.1422 | second swing, much damped |
The damping factor $e^{-(t-1)}$ shrinks the amplitude by a factor of $e \approx 2.718$ every unit of time, so after a few oscillations the response is essentially zero.
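The closed-form response can be evaluated directly at a few sample times. This is a minimal sketch that assumes the equation of motion is $y'' + 2y' + 5y = 3\,\delta(t - 1)$ with zero initial data, so that $y(t) = \tfrac{3}{2}e^{-(t-1)}\sin(2(t-1))\,u(t-1)$; the function name `y` and the sample times are illustrative.

```python
import numpy as np

def y(t):
    # Closed-form response (3/2) e^{-(t-1)} sin(2 (t-1)) u(t-1)
    tau = np.asarray(t, dtype=float) - 1.0
    return np.where(tau >= 0, 1.5 * np.exp(-tau) * np.sin(2 * tau), 0.0)

samples = [0.5, 1.0, 1.5, 1 + np.pi / 4, 1 + np.pi / 2, 1 + 3 * np.pi / 4]
for t in samples:
    print(f"t = {t:.3f}   y = {float(y(t)):+.4f}")

# One-sided difference quotient just after the strike approximates y'(1+)
h = 1e-6
velocity_after = float((y(1 + h) - y(1.0)) / h)
print("velocity just after the strike ~", round(velocity_after, 4))
```

The difference quotient lands on the impulse strength, which is the numerical counterpart of the sanity check on the velocity jump.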
Machine Learning Connections
Impulses and deltas show up in ML in several places — usually disguised, but always doing the same job: concentrating information at a single point.
One-Hot Encoding = Discrete Delta
The one-hot vector $\mathbf{e}_k$ is just a Kronecker delta $\delta_{ik}$, a discrete impulse at class $k$. Cross-entropy against a one-hot label is the discrete sifting property in disguise:

$$-\sum_{i} \delta_{ik}\,\log p_i = -\log p_k.$$
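A tiny numerical illustration of this discrete sifting, as a sketch with made-up numbers: the probability vector `probs` and the class index `k` are hypothetical.

```python
import numpy as np

probs = np.array([0.1, 0.7, 0.2])      # hypothetical model output over 3 classes
k = 1                                   # hypothetical true class index

one_hot = np.zeros_like(probs)
one_hot[k] = 1.0                        # Kronecker delta: 1 at index k, 0 elsewhere

cross_entropy = -np.sum(one_hot * np.log(probs))   # full sum over all classes
picked = -np.log(probs[k])                          # the single term the delta sifts out
print(cross_entropy, picked)
```

Both numbers agree because every term of the sum except the one at index `k` is multiplied by zero.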
Convolution Has δ as Its Identity
For any signal $f$ and impulse $\delta$:

$$(f * \delta)(t) = \int_{-\infty}^{\infty} f(\tau)\,\delta(t - \tau)\,d\tau = f(t).$$

In a CNN, a kernel that is zero everywhere except a single 1 at the center is the identity map: it copies the input feature through unchanged. This gives one way to read a residual connection ($y = F(x) + x$): the skip term $x$ is literally the output of a delta-kernel convolution.
Derivative of ReLU Is a Step; Second Derivative Is a Delta
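ReLU is $\mathrm{ReLU}(x) = \max(0, x)$. Differentiating:

$$\frac{d}{dx}\,\mathrm{ReLU}(x) = u(x), \qquad \frac{d^2}{dx^2}\,\mathrm{ReLU}(x) = \delta(x).$$

The second derivative is a delta at the kink. In practice deep-learning frameworks assign $\mathrm{ReLU}'(0)$ a fixed subgradient value (any choice in $[0, 1]$ is valid, and 0 is a common one), and the delta in the second derivative never shows up numerically because it is concentrated on the single point $x = 0$.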
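The delta hiding in ReLU's second derivative can be made visible with finite differences. This is a minimal sketch on an illustrative grid of spacing `h`: the central second difference is zero everywhere except at the kink, where it has height $1/h$, so its total area stays at 1 as the grid is refined.

```python
import numpy as np

h = 1e-3
x = np.arange(-1000, 1001) * h           # grid on [-1, 1] that contains x = 0
relu = np.maximum(x, 0.0)

# Central second difference approximates the second derivative at interior points
second = (relu[2:] - 2 * relu[1:-1] + relu[:-2]) / h**2

area = np.sum(second) * h                # total "mass" of the spike
peak_location = x[1:-1][np.argmax(second)]
print(f"area = {area:.6f}, peak height = {second.max():.1f}, peak at x = {peak_location:.3f}")
```

Halving `h` doubles the peak height and halves its width, which is the nascent-delta behaviour described earlier.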
Point Source of a Green's Function
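In physics-informed neural networks and PDE solvers, the response to a delta forcing is called the Green's function:

$$L\,G(t, t_0) = \delta(t - t_0).$$

Once you have $G$, the response to any forcing $f$ is a superposition of shifted impulse responses, $y(t) = \int G(t, t_0)\,f(t_0)\,dt_0$, which is a convolution with $G$ whenever the operator $L$ is time-invariant. Much of linear-system analysis is built on this one idea.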
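As a sketch of this idea, take the operator $L y = y'' + \omega^2 y$ with zero initial data, whose impulse response is $G(t) = \tfrac{1}{\omega}\sin(\omega t)$ for $t \ge 0$. Convolving $G$ with a constant forcing switched on at $t = 0$ should reproduce the known solution $(1 - \cos\omega t)/\omega^2$. The values of `omega` and `dt` and the choice of forcing are illustrative assumptions.

```python
import numpy as np

omega, dt = 2.0, 1e-3                        # illustrative values
t = np.arange(0.0, 5.0, dt)

G = np.sin(omega * t) / omega                # impulse response for t >= 0
f = np.ones_like(t)                          # forcing f(t) = 1 for t >= 0

# Discrete version of the convolution integral y(t) = int_0^t G(t - tau) f(tau) dtau
y_conv = np.convolve(G, f)[: t.size] * dt

y_exact = (1 - np.cos(omega * t)) / omega**2 # known solution of y'' + omega^2 y = 1
max_err = np.max(np.abs(y_conv - y_exact))
print("max abs error:", max_err)
```

The remaining error comes from the rectangle-rule quadrature and shrinks in proportion to `dt`.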
Python Implementation
Building the Delta from a Limit, Numerically
First, let's watch the nascent delta collapse and verify that the sifting property emerges from a finite-width rectangle:
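A minimal sketch of this experiment, with illustrative names and values throughout (`sift_with_rectangle`, the test function $e^t$, and the impulse location $a = 2$ are assumptions). Integrating $f$ against a unit-area rectangle of width $\varepsilon$ is the same as averaging $f$ over the window, so the helper estimates that average from uniform samples.

```python
import numpy as np

def sift_with_rectangle(f, a, eps, n=200_001):
    # Integral of f(t) * delta_eps(t - a) equals the average of f over
    # [a - eps/2, a + eps/2]; the mean of uniform samples approximates it.
    t = np.linspace(a - eps / 2, a + eps / 2, n)
    return float(np.mean(f(t)))

f, a = np.exp, 2.0            # test function and impulse location (illustrative)
for eps in (1.0, 0.1, 0.01, 0.001):
    height = 1.0 / eps        # peak of the rectangle; width * height stays 1
    value = sift_with_rectangle(f, a, eps)
    print(f"eps={eps:<6} height={height:<7.0f} integral={value:.6f}  error={abs(value - f(a)):.2e}")
```

The peak height grows without bound while the integral settles on $f(a) = e^2$, and the error falls roughly like $\varepsilon^2$.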
Symbolic Laplace of δ and a Full IVP
Now we let SymPy do the symbolic algebra — and check that the answer matches our hand calculation:
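A hedged sketch of such a check. It assumes the worked example's equation is $y'' + 2y' + 5y = 3\,\delta(t - 1)$ with zero initial data, and instead of calling SymPy's Laplace-transform routines (whose output form differs between versions) it verifies the hand-derived answer piecewise: the response must satisfy the homogeneous equation after the strike, start from zero position, and start with velocity equal to the impulse. The last lines use `DiracDelta` inside an ordinary integral to confirm the sifting step.

```python
import sympy as sp

t = sp.symbols("t", real=True)
s = sp.symbols("s", positive=True)
tau = t - 1

# Hand-derived response for t > 1 (assumed equation: y'' + 2y' + 5y = 3*delta(t - 1))
y_after = sp.Rational(3, 2) * sp.exp(-tau) * sp.sin(2 * tau)

# 1) Away from the strike the forcing vanishes, so y must solve the homogeneous ODE
residual = sp.simplify(sp.diff(y_after, t, 2) + 2 * sp.diff(y_after, t) + 5 * y_after)

# 2) Position is continuous at t = 1; velocity jumps from 0 to the impulse strength
position_at_strike = y_after.subs(t, 1)
velocity_at_strike = sp.diff(y_after, t).subs(t, 1)

# 3) Sifting: integral of e^{-s t} * delta(t - 1) over [0, oo)
laplace_of_delta = sp.integrate(sp.exp(-s * t) * sp.DiracDelta(t - 1), (t, 0, sp.oo))

print(residual, position_at_strike, velocity_at_strike, laplace_of_delta)
```

A residual of zero, a starting position of zero, and a starting velocity of 3 together pin down the same function as the hand calculation.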
The Discrete Delta in PyTorch
Finally, the digital cousin of the delta function — the Kronecker impulse — and its role as the identity element of convolution and the probe for any linear filter:
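A minimal sketch using `torch.nn.functional.conv1d`; the signal length, kernel size, and filter weights `w` are hypothetical. The first half shows that a kernel with a single 1 at its centre tap copies the input through. The second half probes a filter with an impulse to read off its taps.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 16)                    # (batch, channels, length) toy signal

# Kronecker impulse kernel: a single 1 at the centre tap
delta = torch.zeros(1, 1, 5)
delta[0, 0, 2] = 1.0
same = F.conv1d(x, delta, padding=2)         # padding keeps the output length at 16
print(torch.allclose(same, x))               # delta kernel acts as the identity

# Probing a linear filter with an impulse reveals its taps
w = torch.tensor([[[0.5, -1.0, 2.0, 0.25, 3.0]]])   # hypothetical filter weights
impulse = torch.zeros(1, 1, 11)
impulse[0, 0, 5] = 1.0
response = F.conv1d(impulse, w, padding=2)[0, 0, 3:8]

# conv1d computes cross-correlation, so the taps come back in reversed order
print(torch.allclose(response, w[0, 0].flip(0)))
```

The reversal in the second check is a property of the cross-correlation convention used by deep-learning libraries, not of the delta itself; with a true convolution the impulse response equals the kernel as written.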
Common Mistakes
Mistake 1: Treating δ(0) as a Number
Wrong. Writing "$\delta(0) = \infty$" as an equation, or asking what number $\delta(0)$ is.
Right. The delta function has no pointwise values. Only integrals against test functions are defined. If a manipulation requires a value at a single point, you are outside the rules of distribution theory.
Mistake 2: Forgetting the Step Factor in the Solution
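Wrong. Writing $y(t) = \frac{1}{\omega}\sin(\omega(t - a))$ as the answer to $y'' + \omega^2 y = \delta(t - a)$ with zero initial conditions.
Right. The solution must be zero before the impulse arrives. Always include the $u(t - a)$ factor: $y(t) = \frac{1}{\omega}\sin(\omega(t - a))\,u(t - a)$.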
Mistake 3: Mixing Up L{δ} and L{u}
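Wrong. Writing $\mathcal{L}\{\delta(t)\} = 1/s$.
Right. $\mathcal{L}\{\delta(t)\} = 1$, and $\mathcal{L}\{u(t)\} = 1/s$. The delta's transform is one power of $s$ "flatter" than the step's, exactly because the delta is the derivative of the step.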
Mistake 4: Putting the Impulse Outside the Integration Domain
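Wrong. Claiming $\mathcal{L}\{\delta(t + a)\} = e^{as}$ for $a > 0$.
Right. The Laplace transform integrates from $0$ to $\infty$. The spike of $\delta(t + a)$ sits at $t = -a$, outside the domain. The integrand is zero throughout. So $\mathcal{L}\{\delta(t + a)\} = 0$.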
Mistake 5: Confusing Position-Jump with Velocity-Jump
Wrong. Saying the impulse causes an instantaneous jump in position.
Right. An impulsive force changes velocity instantaneously, not position. Position is the integral of velocity and is continuous across the impulse. This is why $y(1^+) = 0$ in the worked example, even though $y'(1^+) = 3$.
Summary
The Dirac delta is the ideal limit of an arbitrarily narrow, arbitrarily tall, unit-area spike. It is not a classical function but a distribution — defined by how it acts inside integrals. Combined with the Laplace transform, it gives us the cleanest possible way to model instantaneous events: collisions, switch closures, lightning strikes, neural spikes, and gradient updates.
Key Formulas
| Formula | Name | Use |
|---|---|---|
| δ(t − a) | Dirac delta at a | instantaneous spike of unit area |
| ∫ f(t) δ(t − a) dt = f(a) | Sifting property | evaluate integrals against δ |
| d/dt u(t − a) = δ(t − a) | Step ↔ Delta | delta is the derivative of the step |
| L{δ(t)} = 1 | Laplace of δ | instant impulse at origin |
| L{δ(t − a)} = e^(−as) | Laplace of shifted δ | instant impulse at a |
| (f * δ)(t) = f(t) | Identity of convolution | δ leaves any signal unchanged |
| L · G(t, t₀) = δ(t − t₀) | Green's function | impulse response of operator L |
Key Takeaways
- Construct, don't evaluate. Think of δ as the limit of unit-area spikes, not as a function with pointwise values.
- Sifting is the workhorse. Almost every manipulation of δ reduces to $\int f(t)\,\delta(t - a)\,dt = f(a)$.
- Differentiation in time = multiplication by s. Because δ is the derivative of the step, its Laplace transform is $s \cdot \tfrac{1}{s} = 1$.
- An impulse delivers momentum, not displacement. Velocity jumps; position stays continuous.
- Impulse response = system identity. Feed δ in, and the output completely characterizes any linear time-invariant system. Every other response is a convolution with this one.
- ML is full of deltas. One-hot labels, residual connections, Green's functions for PDE solvers, ReLU's second derivative — all are deltas in disguise.
Coming Next: In Convolution we will build on the "δ is the identity" idea to show how any input signal can be written as a continuous sum of shifted impulses, each weighted by the input's value at that instant. Convolving with the impulse response then gives the system's output for arbitrary forcing.