Boo-AI — Master Artificial Intelligence by Building from Scratch

The Seam Problem

Open any photo editor and try to paste a piece of one photo onto another. You drag a sunlit face onto a moody portrait, drop a tropical bird into a winter landscape, or splice a tumour patch into a medical scan for training data. Almost always the same thing happens: a hard, obvious seam appears along the edge of the pasted region. The patch "sits" on top of the scene instead of belonging to it.

The seam is not a hardware bug or a missing alpha channel. It is a boundary mismatch: the pixel colours at the inside edge of your patch disagree with the pixel colours just outside it. Even a tiny brightness offset of 10 grey levels is instantly visible to the human visual system, because the eye is enormously sensitive to gradients — local differences in brightness and colour.

The key insight: what makes a region look "part of" the surrounding scene is not its absolute colour but the way its colour changes as you walk across the boundary. If you can stitch the gradients smoothly, the eye stops seeing the seam — even when the interior pixels are different.

In 2003, Patrick Pérez, Michel Gangnet, and Andrew Blake turned this insight into a clean variational problem and solved it with Poisson's equation $\Delta f = \operatorname{div}(\mathbf{v})$ . This section is the calculus story behind that idea — how a 19th-century equation about gravitational potential ended up driving Photoshop's healing brush, computational photography, gradient-domain HDR, and modern image completion.

Intuition: Edit Gradients, Not Pixels

Think of a photograph not as a grid of colours but as a landscape of heights: pixel value = altitude. A bright sky is a high plateau, a dark shadow is a valley, an edge is a cliff. Most of what your eye registers as "the content" of an image is the slopes of this landscape — the gradient $\nabla f$ .

When you copy-paste, you carry the source patch's altitudes with you. Almost always the patch lands on the target landscape at a different overall height. The cliff at the boundary of the patch is the seam.

The trick: instead of copying altitudes, copy slopes. Take the patch's slope field $\mathbf{v} = \nabla g$ , drop it onto the target landscape, and then ask the patch's heights to be re-computed so that (a) the slopes match $\mathbf{v}$ inside, and (b) the heights match the target on the boundary. Calculus will solve for the heights for us.

That is the entire idea of Poisson blending. We give up control over what the patch looks like in absolute brightness, in exchange for total control over how it flows into its neighbourhood. The eye, watching only slopes, never notices the seam.

An analogy that helps: imagine you are restoring a damaged painting. A square inch is missing in the middle of a fresco. You can't just paste in a square from another fresco — colours won't match. But if you have a sketch of the brushwork pattern that would have been in that square, you can ask a careful painter to redraw the missing inch with that brushwork, blending the pigments smoothly into the surrounding wall. Poisson blending is that careful painter, mathematised.

From Calculus to Pixels: The Discrete Laplacian

Continuous calculus gives us the Laplacian $\Delta f = \partial_x^2 f + \partial_y^2 f$ . An image is a grid, not a smooth function, so we replace each partial derivative with a finite difference. The standard approximation at pixel $(i,j)$ is:

\partial_x^2 f \;\approx\; f(i+1,j) - 2f(i,j) + f(i-1,j)

Add the same thing in the $y$ direction and you get the famous 5-point stencil:

\Delta f(i,j) \;=\; f(i+1,j) + f(i-1,j) + f(i,j+1) + f(i,j-1) - 4 f(i,j)

Visually, that is just: (sum of four neighbours) minus four times the centre. Flat regions give $\Delta f = 0$ . Bumps and dips give nonzero values; the sign tells you whether the pixel is a local maximum or minimum, the magnitude tells you how sharp it is.

Region in the image	Sign of Δf	Meaning
Smooth gradient (sky)	≈ 0	Slopes are constant, no curvature
Bright spot on dark background	negative	Centre higher than neighbours
Dark spot on bright background	positive	Centre lower than neighbours
Edge (cliff)	large magnitude	Strong second derivative across the edge

The 5-point Laplacian is the bridge between calculus and image processing. Everything in Poisson blending will be expressed as a relationship between two of these stencils — one on the blended image we are solving for, one on the source we want to import.

Deriving Poisson's Equation for Blending

Let's state the problem cleanly. We have:

A target image $f^* : \Omega^c \cup \partial\Omega \to \mathbb{R}$ , defined everywhere except inside the patch region.
A source image $g$ defined on $\Omega$ (the region we want to fill).
We want to find a function $f$ on $\Omega \cup \partial\Omega$ whose gradient looks like the source's gradient inside $\Omega$ , but whose values match $f^*$ on the boundary.

Mathematically, define the guidance vector field $\mathbf{v} = \nabla g$ and minimise:

\min_{f} \;\; \iint_{\Omega} \big\lVert \nabla f - \mathbf{v} \big\rVert^2 \, dA \quad \text{subject to}\quad f\big|_{\partial\Omega} = f^*\big|_{\partial\Omega}.

In English: among all functions that agree with the target on the boundary, pick the one whose slopes are closest, in least-squares sense, to the source's slopes. This is a textbook calculus-of-variations problem. The Euler-Lagrange equation falls out by setting the first variation to zero — and the result is exactly Poisson's equation:

\Delta f \;=\; \operatorname{div}(\mathbf{v}) \;=\; \Delta g \quad \text{inside } \Omega,

f\big|_{\partial\Omega} = f^*\big|_{\partial\Omega}.

Where the Euler-Lagrange step comes from: a variation $f \to f + \varepsilon \eta$ with $\eta|_{\partial\Omega} = 0$ changes the objective by $2\varepsilon \iint_\Omega (\nabla f - \mathbf{v}) \cdot \nabla \eta \, dA$ . Integrate by parts (Green's identity) — the boundary term vanishes because $\eta = 0$ there — and you are left with $-2\varepsilon \iint_\Omega (\Delta f - \operatorname{div}\mathbf{v}) \, \eta \, dA = 0$ for every $\eta$ . The only way that is true is $\Delta f = \operatorname{div}\mathbf{v}$ pointwise.

Discretise with the 5-point Laplacian on both sides and we get the linear system we will actually solve. For every pixel $(i,j)$ inside $\Omega$ :

4 f(i,j) - \!\!\!\!\sum_{(i',j') \in \mathcal{N}(i,j)}\!\!\!\! f(i',j') \;=\; 4 g(i,j) - \!\!\!\!\sum_{(i',j') \in \mathcal{N}(i,j)}\!\!\!\! g(i',j').

Whenever a neighbour $(i',j')$ lies outside $\Omega$ , we know its value: it is $f^*(i',j')$ , the target. So that term moves to the right-hand side and becomes a fixed contribution. The remaining unknowns are exactly the pixels inside $\Omega$ , and we have one linear equation per unknown — the system is sparse, symmetric, and positive-definite. Perfect for iterative solvers.

Why Dirichlet Boundary Conditions Make Seams Vanish

The Dirichlet boundary condition $f|_{\partial\Omega} = f^*|_{\partial\Omega}$ is doing more than "pin the edge." It is the entire reason the seam disappears.

Suppose the source patch is uniformly 30 grey levels brighter than the target at the boundary. The naive paste produces a hard 30-level jump along the boundary — instantly visible. Poisson blending responds by adding a harmonic correction $h$ across the entire interior of $\Omega$ : a smooth function with $\Delta h = 0$ inside, whose values on the boundary cancel the mismatch exactly.

Mathematically, write $f = g + h$ inside $\Omega$ . Substituting into $\Delta f = \Delta g$ gives $\Delta h = 0$ — so $h$ is harmonic. And the boundary condition $f = f^*$ becomes $h = f^* - g$ on $\partial\Omega$ . So $h$ is the unique harmonic function that smoothly repairs the boundary disagreement over the whole patch. It is, in a precise sense, the "best feathering" you could hope for — better than any handcrafted blur kernel because the eye sees no second derivative jumps.

Why this matters for art and ML: the harmonic correction is what humans perceive as "the patch belongs." Modern neural style-transfer and diffusion-based inpainting systems can be viewed as learned versions of exactly this correction — they regress a more powerful, non-linear inpainting field, but the boundary-matching principle is the same.

Interactive: Watch a 1D Seam Disappear

In one dimension the math is identical but the picture is simpler. The target is a smooth curve; the "patch" is a sub-interval $[a, b]$ we want to replace by a source curve. The naive paste creates a vertical jump at each endpoint (the seam). The Poisson blend solves $f''(x) = g''(x)$ on $(a, b)$ with $f(a) = f^*(a)$ and $f(b) = f^*(b)$ . The result keeps the source's wiggle but ramps up or down to match the target on both sides.

Drag the Source DC offset slider to mis-align the patch vertically. The red curve (naive paste) develops bigger and bigger seams. The blue curve (Poisson blend) keeps the same internal shape but shifts/tilts to absorb the offset — and the seams vanish.

Source DC offset: 0.70Controls how badly the patch mismatches the target.Source amplitude: 0.35Internal "texture" of the patch (its gradient).Target tilt: 0.40Slope of the surrounding scene.

Show naive paste (red)Show Poisson blend (blue)

Notice what the blue curve does as you slide the offset: it does not simply translate the source up or down. It applies a smooth, almost linear, correction across the interval — exactly the 1D version of the harmonic function $h$ . (In 1D, harmonic = linear, because $h'' = 0$ .) The interior wiggle is preserved because its second derivative, not its absolute height, is what the equation cares about.

Worked Example: A 4×4 Patch by Hand

Before we trust a solver, let's do one fully by hand. A tiny $4 \times 4$ patch is small enough to write every equation explicitly. Open the box below and try the arithmetic on paper — it makes the matrix structure of Poisson blending tangible.

Click to expand: solve a 4×4 Poisson blend by hand

Take a tiny scene. The target $f^*$ is a 4×4 image:

f^* = \begin{pmatrix} 10 & 12 & 14 & 16 \\ 12 & ? & ? & 18 \\ 14 & ? & ? & 20 \\ 16 & 18 & 20 & 22 \end{pmatrix}

The four $?$ 's are the patch $\Omega$ : the inner $2 \times 2$ block, with the 12 boundary cells around it acting as $\partial\Omega$ .

The source $g$ on the same grid is a much brighter object:

g = \begin{pmatrix} 100 & 101 & 102 & 103 \\ 101 & 200 & 201 & 104 \\ 102 & 201 & 202 & 105 \\ 103 & 104 & 105 & 106 \end{pmatrix}

The naive paste copies the four central $g$ values $200, 201, 201, 202$ straight into the target — producing a hideous bright square next to the dim border (12, 14, 18, 20).

Step 1 — compute div(v) at each interior pixel

For the patch pixel at position $(1,1)$ (using row, column with 0-indexing) the 5-point Laplacian of $g$ is:

\Delta g(1,1) = 4 \cdot 200 - 101 - 201 - 101 - 201 = 800 - 604 = 196.

By symmetry the other three give:

Pixel (r, c)	g value	Sum of 4 g-neighbours	Δg = 4·g − sum
(1,1)	200	101+201+101+201 = 604	196
(1,2)	201	200+104+102+201 = 607	197
(2,1)	201	102+202+201+104 = 609	195
(2,2)	202	201+105+201+105 = 612	196

Step 2 — write the linear system

Let $a, b, c, d$ be the four unknown blended pixels in the same order (top-left, top-right, bottom-left, bottom-right of the inner block). Each one's discrete Poisson equation pulls in its known target neighbours from $\partial\Omega$ :

4a - b - c - 12 - 12 = 196 \;\Longrightarrow\; 4a - b - c = 220

4b - a - d - 14 - 18 = 197 \;\Longrightarrow\; 4b - a - d = 229

4c - a - d - 14 - 18 = 195 \;\Longrightarrow\; 4c - a - d = 227

4d - b - c - 20 - 20 = 196 \;\Longrightarrow\; 4d - b - c = 236

Step 3 — solve the 4×4 linear system

The system $A\mathbf{x} = \mathbf{r}$ is small enough to solve by adding and subtracting rows. Add the first and fourth equations, and the second and third equations:

(4a + 4d) - (b + c) - (b + c) = 220 + 236 \;\Rightarrow\; 4(a+d) - 2(b+c) = 456

(4b + 4c) - (a + d) - (a + d) = 229 + 227 \;\Rightarrow\; 4(b+c) - 2(a+d) = 456

Let $S = a+d$ and $T = b+c$ . Then $4S - 2T = 456$ and $4T - 2S = 456$ . Adding gives $2S + 2T = 456 \cdot 2 / 1$ … let's just solve cleanly. From the first: $T = 2S - 228$ . Substitute: $4(2S - 228) - 2S = 456$ , so $6S = 1368$ , giving $S = 228$ and $T = 228$ .

Now subtract equation 1 from equation 4: $4(d - a) = 16$ , so $d - a = 4$ . With $a + d = 228$ we get $a = 112$ , $d = 116$ . Equation 2 minus equation 3 gives $4(b - c) = 2$ , so $b - c = 0.5$ . With $b + c = 228$ we get $b = 114.25$ , $c = 113.75$ .

Step 4 — interpret

The blended patch is approximately:

f_{\Omega} \approx \begin{pmatrix} 112 & 114.25 \\ 113.75 & 116 \end{pmatrix}

Compare three things side by side at pixel $(1,1)$ : target said $?$ , naive paste said $200$ , Poisson blend says $112$ . The bright source has been pulled down toward the dim target — but critically, the differences between the four blended pixels $(112, 114.25, 113.75, 116)$ match the differences in the source $(200, 201, 201, 202)$ . Same internal texture; rescaled absolute level. That is Poisson blending in four numbers.

Sanity check via Jacobi. A single Jacobi sweep starting from the target values $(0, 0, 0, 0)$ in the interior takes about 25 iterations to reach those numbers to within $10^{-3}$ . The next interactive (and the Python code below) runs exactly this loop on full images.

Interactive: 2D Seamless Cloning Playground

Two synthetic 96×72 images. The target is a soft blue-to-green scene with a sun-bright spot; the source is a colourful spiral object. Click anywhere on the target to drop the patch region $\Omega$ . Switch between naive paste, full Poisson blend, and mixed gradients (next section) to compare.

Target (click to move Ω)

Source (object to clone)

Result (poisson)

Patch radius: 14Iterations: 500More iterations → smoother seam.Mode

Click anywhere on the target image to place the patch region Ω. The Poisson solver runs on every change — give it a moment for large iterations counts.

Two things to play with: (a) drop the patch on a bright part of the scene versus a dark part — notice how Poisson blending pulls the patch's brightness toward whatever ∂Ω demands, without you ever recolouring it; (b) crank iterations from 50 up to 1500 — at low iterations the seam is still slightly visible because Jacobi has not yet propagated the boundary correction all the way to the centre of $\Omega$ .

Plain Python: A Tiny Jacobi Solver

Twelve lines of NumPy is all it takes. We build the guidance field $\operatorname{div}(\mathbf{v}) = \Delta g$ once, then run Jacobi sweeps that, at every pixel inside $\Omega$ , replace its value by the average of its four neighbours plus $\Delta g / 4$ . Pixels outside $\Omega$ are never touched — that is the boundary condition in code.

Poisson blending in plain NumPy (Jacobi)

🐍python

Explanation(11)

Code(43)

1📚 Import NumPy

NumPy gives us vectorised array slicing (g[1:-1, 1:-1]) and the np.clip / np.zeros_like helpers. Slicing replaces the inner double-for-loop over pixels and runs roughly 100× faster than pure Python.

EXAMPLE

np.zeros_like(np.ones((2,3))) → array of zeros, shape (2,3)

3Function signature — three inputs + iteration cap

We hand the solver three image-shaped arrays. Conceptually: • target = f*, the background scene that owns the boundary ∂Ω. • source = g, the object whose gradients we want to transplant. • mask = the binary stencil that marks Ω (where to solve) versus the rest (where we trust the target).

EXAMPLE

target.shape = source.shape = (480, 640, 3)
mask.shape   = (480, 640)
mask.sum()   = 1827   (pixels inside Omega)

14Read the image dimensions

Pulling H, W, C off the target tensor lets the function generalise to any image size without hardcoding. C is the channel count — typically 3 for RGB.

EXAMPLE

target.shape = (480, 640, 3) → H=480, W=640, C=3

15Copy target into the working buffer f

Two reasons: 1. Cast to float64 so subtractions never overflow uint8. 2. .copy() avoids mutating the caller's array. The pixels OUTSIDE Omega will keep their target values for the entire iteration — that is how the Dirichlet boundary condition f|∂Ω = f* is enforced. We only update pixels where mask = 1.

EXAMPLE

target dtype uint8 → f dtype float64; same shape (H, W, 3)

19📚 Discrete Laplacian of the source — the guidance field

This is the heart of Poisson blending. The continuous equation Δf = Δg becomes, per pixel: 4 f(i,j) - f(i-1,j) - f(i+1,j) - f(i,j-1) - f(i,j+1) = div_v(i,j) with div_v(i,j) = 4 g(i,j) - g(i-1,j) - g(i+1,j) - g(i,j-1) - g(i,j+1). We pre-compute div_v once because g never changes during the iteration. The slice g[1:-1, 1:-1] is the interior of the image; g[:-2, 1:-1] is shifted UP by one row (so it represents the left/up neighbour in the stencil).

EXAMPLE

If g[r, c] = 100 and all four neighbours are 90, then
div_v[r, c] = 4·100 - 4·90 = 40
→ the source has a 'pull' of +40 at this pixel (a bump).

21Vectorised stencil — read each shifted slice

Each slice is a copy of g shifted by one pixel along a single axis. • g[:-2, 1:-1] ↔ up neighbours (row index - 1) • g[2: , 1:-1] ↔ down neighbours (row index + 1) • g[1:-1, :-2] ↔ left neighbours (col index - 1) • g[1:-1, 2: ] ↔ right neighbours (col index + 1) Writing it this way means NumPy does the work in C; the inner loop is gone.

EXAMPLE

g = [[1,2,3],[4,5,6],[7,8,9]]
g[1:-1, 1:-1] = [[5]]
g[:-2,  1:-1] = [[2]]  (up of 5)
g[2:,   1:-1] = [[8]]  (down of 5)

29Cast the mask to bool

f[m] = upd[m] uses boolean fancy indexing — only the pixels where m is True are overwritten. If we passed a float or int mask NumPy would broadcast it numerically (producing nonsense), so we explicitly cast.

EXAMPLE

mask = np.array([[0,1,0],[1,1,0]], dtype=np.uint8)
mask.astype(bool) → [[F,T,F],[T,T,F]]

30Jacobi iteration loop

Jacobi solves the linear system A f = b by repeatedly applying f ← D⁻¹ (b - (A - D) f) where D is the diagonal of A. For our 5-point Laplacian, A has 4 on the diagonal and -1 on each neighbour, so D⁻¹·... = (sum of neighbours + b) / 4 — which is exactly the next line. We expect convergence in O(N²) iterations where N is the patch radius; for small Omega this is fast.

EXAMPLE

n_iter = 2000 is plenty for a 40-pixel-radius patch.

32📚 The vectorised Jacobi update

Read this as: 'replace every interior pixel by the average of its four neighbours, plus div/4.' The shifted slices reuse the same trick as for the source Laplacian: • f[:-2, 1:-1] = neighbour above • f[2:, 1:-1] = neighbour below • f[1:-1, :-2] = neighbour to the left • f[1:-1, 2: ] = neighbour to the right Multiplying by 0.25 is the D⁻¹ step. The +div[1:-1, 1:-1] / 4 inside the parentheses pulls the source's local 'shape' into the blended pixel — this is what makes the patch keep the source texture while honouring the target boundary.

EXAMPLE

For a flat target (all neighbours = 100) with div = 40:
upd = 0.25 · (100 + 100 + 100 + 100 + 40) = 110
→ blended pixel inherits the +10 'bump' from the source.

40Write only inside Omega — preserves the Dirichlet boundary

f[m] = upd[m] copies just the masked pixels. Anything outside Omega keeps its original target value forever; that is how the boundary condition f|∂Ω = f* survives every iteration. If we forgot this line and wrote f = upd, the smoother would erase the target outside Omega.

EXAMPLE

Before: f outside Omega = target.
Update computed everywhere, but f[m] = upd[m] only writes inside.
After:  outside Omega untouched, inside Omega advanced one Jacobi step.

42Clip to valid pixel range and return

Solving Poisson can push pixel values slightly outside [0, 255] when div is large — for example near a strong source edge sitting right at the Dirichlet boundary. np.clip flattens those overshoots. For float output you can also keep the raw values and let the renderer tone-map.

EXAMPLE

np.clip([-3.4, 12, 271], 0, 255) → [0., 12., 255.]

32 lines without explanation

1import numpy as np
2
3def poisson_blend(target, source, mask, n_iter=2000):
4    """
5    Seamlessly clone a source patch into a target image.
6
7    target : (H, W, 3) float array, the destination scene.
8    source : (H, W, 3) float array, the object to clone.
9    mask   : (H, W)    bool/uint8 array. 1 inside the region Omega.
10    n_iter : Jacobi sweeps.
11
12    Returns: (H, W, 3) float array, the blended image.
13    """
14    H, W, C = target.shape
15    f = target.astype(np.float64).copy()
16
17    # Discrete Laplacian of the source = guidance "divergence" field
18    # div_v(i,j) = 4 g(i,j) - g(i-1,j) - g(i+1,j) - g(i,j-1) - g(i,j+1)
19    g = source.astype(np.float64)
20    div = np.zeros_like(g)
21    div[1:-1, 1:-1] = (
22        4.0 * g[1:-1, 1:-1]
23        - g[:-2, 1:-1]
24        - g[2:, 1:-1]
25        - g[1:-1, :-2]
26        - g[1:-1, 2:]
27    )
28
29    m = mask.astype(bool)
30    for _ in range(n_iter):
31        # Jacobi update: f_new = (left + right + up + down + div) / 4
32        upd = np.zeros_like(f)
33        upd[1:-1, 1:-1] = 0.25 * (
34            f[:-2, 1:-1]
35            + f[2:, 1:-1]
36            + f[1:-1, :-2]
37            + f[1:-1, 2:]
38            + div[1:-1, 1:-1]
39        )
40        # Only overwrite pixels inside Omega; boundary (and outside) stays fixed
41        f[m] = upd[m]
42
43    return np.clip(f, 0, 255)

On a typical laptop this runs a 200×200 patch with 2000 iterations in about two seconds — already usable for prototyping. For real-time editing or 4K images you want a GPU.

PyTorch on the GPU

The Jacobi iteration is embarrassingly parallel: every pixel's update depends only on the previous iteration's neighbours, so we can update them all at once. PyTorch makes this trivial — replace NumPy slices with the same kind of slicing on $(C, H, W)$ tensors, pad with $\texttt{F.pad}$ for clean borders, and run on $\texttt{cuda}$ .

GPU Poisson blending in PyTorch

🐍python

Explanation(13)

Code(40)

1📚 Import PyTorch

torch gives us GPU tensors plus autograd; torch.nn.functional has F.pad — the cleanest way to replicate-pad an image so the Laplacian stencil is well-defined at the borders.

EXAMPLE

torch.zeros(2, 3, 4, device='cuda') → tensor on GPU, shape (2, 3, 4)

4Function signature

Layout matches PyTorch's image convention: (C, H, W) for the image tensors, (H, W) for the mask. device='cuda' so the entire iteration runs on the GPU — Jacobi is embarrassingly parallel.

EXAMPLE

target.shape = (3, 480, 640), mask.shape = (480, 640)

9Move target to device and cast

Two side-effects: copies to GPU (if it isn't there already) and promotes to float32 so subtractions don't wrap around in uint8.

EXAMPLE

torch.tensor([10, 250], dtype=torch.uint8).float() → tensor([10., 250.])

11Mask becomes a boolean tensor

We need a boolean for torch.where on line 28. Without .bool() the where call still works but the cast happens implicitly on every iteration — wasteful inside a hot loop.

EXAMPLE

torch.tensor([[0,1],[1,0]]).bool() → tensor([[F,T],[T,F]])

14Local helper: 5-point Laplacian via F.pad

Defining lap once avoids duplicating the slicing logic. F.pad(x, (1,1,1,1), mode='replicate') pads the LAST two dims by 1 on every side, copying the edge pixel outward. After padding, x_pad has shape (..., H+2, W+2) and the four shifted slices have shape (..., H, W) — exactly aligning with x.

EXAMPLE

F.pad(tensor([[1,2]]), (1,1,1,1), mode='replicate') →
tensor([[1,1,2,2],[1,1,2,2],[1,1,2,2]])

16📚 The padded-shift Laplacian

Same 5-point stencil as the NumPy version, just expressed with PyTorch indexing: • x_pad[..., :-2, 1:-1] ↔ up • x_pad[..., 2:, 1:-1] ↔ down • x_pad[..., 1:-1, :-2] ↔ left • x_pad[..., 1:-1, 2: ] ↔ right The leading ... ellipsis makes the helper work both with batched and unbatched inputs.

EXAMPLE

If x = tensor of all 5s and edge = 5 (after replicate pad),
lap(x) = 4·5 - 5 - 5 - 5 - 5 = 0  (flat regions have zero Laplacian).

24Precompute div = lap(source)

The guidance field never changes inside the iteration, so we compute it once outside the loop. This is the single most important optimisation — without it you'd run lap(g) 2000 times for free.

EXAMPLE

div.shape = (3, 480, 640), same as t and g.

25Working buffer f starts at target

Cloning t (not just naming f = t) avoids mutating the caller's tensor. Pixels outside Omega will be preserved across all iterations because torch.where keeps them on line 28.

EXAMPLE

f = t.clone() → independent tensor, same values to start.

27Jacobi sweep — same idea, on GPU

Each iteration runs in parallel across every pixel on every channel. On a 3090 a 640×480 patch with 2000 iterations finishes in well under a second.

EXAMPLE

n_iter = 2000 → about 6 ms/iter on a modern GPU.

28Pad inside the loop

Why re-pad each iteration? Because f changes every step, the padded edges must be refreshed. The pad is cheap; the matmul-style slice arithmetic that follows dominates the cost.

EXAMPLE

f_pad.shape = (3, 482, 642) before slicing back to (3, 480, 640).

30📚 The vectorised Jacobi update

Same 'average of four neighbours plus div/4' rule as the NumPy version, but every arithmetic op fuses into a handful of CUDA kernels. The +div term injects the source's local shape; the four-neighbour average smooths everything else toward the Dirichlet boundary that lives outside Omega.

EXAMPLE

For a flat region of f and div=40: upd = 0.25·(100+100+100+100+40) = 110 — a +10 bump inherited from the source.

38torch.where — branchless mask update

torch.where(cond, a, b) takes a wherever cond is True and b otherwise. m.unsqueeze(0) reshapes mask from (H, W) to (1, H, W) so it broadcasts across the channel dimension. This single op replaces the boolean-fancy-indexing f[m] = upd[m] from NumPy and is fully vectorised on GPU.

EXAMPLE

torch.where(
  tensor([[T,F],[F,T]]),
  tensor([[10,20],[30,40]]),
  tensor([[ 1, 2],[ 3, 4]]),
) → tensor([[10, 2],[ 3,40]])

41Clamp and return

f.clamp(0, 255) is the in-tensor equivalent of np.clip. It guarantees the returned image still fits inside the displayable range even when div has very large values right at ∂Ω.

EXAMPLE

tensor([-3.0, 12.0, 271.0]).clamp(0, 255) → tensor([0., 12., 255.])

27 lines without explanation

1import torch
2import torch.nn.functional as F
3
4def poisson_blend_torch(target, source, mask, n_iter=2000, device="cuda"):
5    """
6    GPU Poisson blending. All tensors must be float32, shape (C, H, W) for
7    the images and (H, W) for the mask.
8    """
9    t = target.to(device).float()
10    g = source.to(device).float()
11    m = mask.to(device).bool()
12
13    # Pad once so shifted slices share the same shape as t
14    def lap(x):
15        # 5-point discrete Laplacian: 4·x - up - down - left - right
16        x_pad = F.pad(x, (1, 1, 1, 1), mode="replicate")
17        return (
18            4.0 * x
19            - x_pad[..., :-2, 1:-1]
20            - x_pad[..., 2:, 1:-1]
21            - x_pad[..., 1:-1, :-2]
22            - x_pad[..., 1:-1, 2:]
23        )
24
25    div = lap(g)                         # guidance field, fixed for all iterations
26    f = t.clone()                        # working buffer; outside Omega stays at t
27
28    for _ in range(n_iter):
29        f_pad = F.pad(f, (1, 1, 1, 1), mode="replicate")
30        upd = 0.25 * (
31            f_pad[..., :-2, 1:-1]
32            + f_pad[..., 2:, 1:-1]
33            + f_pad[..., 1:-1, :-2]
34            + f_pad[..., 1:-1, 2:]
35            + div
36        )
37        # Boolean mask broadcasts across the channel dim
38        f = torch.where(m.unsqueeze(0), upd, f)
39
40    return f.clamp(0, 255)

Two production improvements you should be aware of, even though we keep them out of the teaching code: (1) replacing Jacobi with red-black Gauss-Seidel or conjugate gradient roughly halves the iteration count; (2) for rectangular patches a closed-form solver via the discrete sine transform exists and runs in $O(N \log N)$ — essentially a single FFT pass.

Mixing Gradients: Keep the Strongest Edges

Pure Poisson blending always uses the source's gradients $\mathbf{v} = \nabla g$ . That is exactly right when the patch should replace what is under it. But sometimes you want to overlay — paste graffiti onto a wall and keep the wall's brick texture visible through the thin parts of the letters, or paste a face into a low-contrast background and keep some background detail.

The fix is one line of code. Instead of always taking $\mathbf{v} = \nabla g$ , at each pixel pick — per direction — whichever of $\nabla g$ or $\nabla f^*$ has the larger magnitude:

\mathbf{v}(p) \;=\; \begin{cases} \nabla g(p) & \text{if } |\nabla g(p)| > |\nabla f^*(p)|, \\ \nabla f^*(p) & \text{otherwise.} \end{cases}

The Poisson equation, the boundary condition, and the solver are unchanged — only the right-hand-side guidance field differs. In the 2D interactive above, switch the mode to mixed and drag the patch over the bright sun in the target: the sun's strong gradient survives through the patch because it "won" the pixel-by-pixel competition.

Variants you can build with the same machinery: texture flattening (kill weak gradients, keep strong edges → cartoon look); local illumination change (compress the log of the magnitude of $\nabla f$ → HDR tone-mapping); seamless tiling (force the gradient to be periodic on the boundary). They are all one-line changes to $\mathbf{v}$ .

Where This Shows Up in the Real World

Poisson blending is not a niche technique. The same equation, with the same Jacobi-or-better solver underneath, powers a surprising slice of modern visual computing.

Domain	What is solved	What changes
Photoshop healing brush / seamless clone	Δf = Δg on the brushed region	Boundary takes original pixels — invisible mends.
Computational photography (HDR, flash/no-flash)	Δf = div(compressed gradients)	Compresses dynamic range while preserving local contrast.
Panorama stitching	Δf = Δg with mixed gradients across seams	Removes exposure jumps between adjacent photos.
Texture / inpainting	Δf = 0 inside Ω (Laplace equation)	Pure feathering; smoothly fills small holes.
Depth completion (lidar)	Δd = guidance derived from RGB edges	Sparse lidar depth is densified using image gradients.
Image-to-image translation pre-/post-processing	Poisson blending on the generator output	Pastes generated content into a real photo seamlessly.
Medical imaging (data augmentation)	Poisson blending of synthetic lesions	Trains tumour detectors with realistic synthetic positives.

Once you have a Poisson solver you also essentially have a Laplace solver (just feed it $\Delta g = 0$ ) — and Laplace turns up in physics simulation (heat, electrostatics), fluid dynamics (pressure projection in incompressible flow), and graph signal processing. The image-blending application is just the most photogenic example.

Summary

The human eye reads images through gradients, not absolute pixel values. Match gradients on the inside, match values on the boundary, and seams vanish.
That requirement is exactly the variational problem $\min \iint |\nabla f - \mathbf{v}|^2 \, dA$ with a Dirichlet boundary condition. Its Euler-Lagrange equation is $\Delta f = \operatorname{div}\mathbf{v}$ — Poisson's equation.
Discretising with the 5-point Laplacian turns the PDE into a sparse symmetric positive-definite linear system. Jacobi sweeps solve it; on GPU the same code runs in milliseconds.
Swapping the guidance field $\mathbf{v}$ reuses the same solver for mixed gradients, texture flattening, HDR tone-mapping, panorama stitching, depth completion, and more.
Poisson blending was the first vivid example in modern image processing of an idea now everywhere in ML: specify what the answer should look like locally, and let an optimiser fill in the global picture.