Chapter 8

Jacobian Transformation Method

Transformations of Random Variables

Learning Objectives

The Jacobian transformation method is one of the most powerful and elegant techniques in probability theory. By mastering it, you will be able to derive the probability distribution of a function of continuous random variables whenever that function is differentiable and invertible, either globally or piece by piece. This section will equip you to:

  1. Understand why probability density must be adjusted when random variables are transformed, and the deep connection to conservation of probability
  2. Derive the change of variables formula $f_Y(y) = f_X(g^{-1}(y)) \cdot \left|\frac{d}{dy}g^{-1}(y)\right|$ for univariate transformations
  3. Compute Jacobian matrices and determinants for multivariate transformations
  4. Visualize how the Jacobian measures local stretching and compression of probability space
  5. Apply the technique to derive famous distributions: log-normal, chi-square, F-distribution, and more
  6. Connect the Jacobian to modern AI: normalizing flows, variational autoencoders, and density estimation
  7. Implement Jacobian transformations in Python with NumPy and PyTorch

Why This Matters

The Jacobian transformation method is the mathematical foundation for understanding how probability flows through computational graphs. Whether you're deriving the distribution of a neural network output, training a generative model, or performing Bayesian inference, you're implicitly using the Jacobian.


Why the Jacobian Matters: The Fundamental Problem

"The Jacobian is the bridge between random variables—it tells us how probability must redistribute when we transform."

Suppose we have a random variable $X$ with a known probability density function $f_X(x)$. Now we define a new random variable $Y = g(X)$, where $g$ is some function. The fundamental question is:

What is the PDF of Y? That is, what is $f_Y(y)$?

The answer is not simply $f_Y(y) = f_X(g^{-1}(y))$. Why? Because probability must be conserved: the probability that $X$ falls in any interval must equal the probability that $Y$ falls in the image of that interval under $g$.

The Conservation Principle

Consider an infinitesimal interval $[x, x + dx]$ in the domain of $X$. The probability in this interval is approximately:

$$P(x \leq X \leq x + dx) \approx f_X(x) \cdot dx$$

When we transform via $Y = g(X)$, this interval maps to the interval with endpoints $g(x)$ and $g(x + dx)$ in the range of $Y$. The signed width of the new interval is:

$$dy = g(x + dx) - g(x) \approx g'(x) \cdot dx$$

Since probability must be conserved, and interval lengths are non-negative:

$$f_Y(y) \cdot |dy| = f_X(x) \cdot |dx|$$

Solving for $f_Y(y)$:

$$f_Y(y) = f_X(x) \cdot \left| \frac{dx}{dy} \right| = f_X(g^{-1}(y)) \cdot \left| \frac{d}{dy} g^{-1}(y) \right|$$

The Key Insight

The factor $\left| \frac{d}{dy} g^{-1}(y) \right| = \frac{1}{|g'(x)|}$ is the Jacobian of the inverse map. It measures how the transformation stretches or compresses space:

  • If $|g'(x)| > 1$: The transformation stretches space, so density decreases
  • If $|g'(x)| < 1$: The transformation compresses space, so density increases
  • If $|g'(x)| = 1$: The transformation preserves local scale (like a shift)
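The conservation argument can be checked numerically. The sketch below assumes $X \sim N(0, 1)$ and the map $g(x) = e^x$, both chosen here purely for illustration (the helper names are hypothetical). It compares the mass in a small x-interval with the mass that $f_Y = f_X / |g'|$ assigns to the image interval.

```python
import math

def normal_pdf(x):
    """Density of N(0, 1) at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def g(x):
    return math.exp(x)          # illustrative strictly increasing map

def g_prime(x):
    return math.exp(x)          # derivative of g

def mass_in_x(x, dx):
    """Approximate P(x <= X <= x + dx) as f_X(x) * dx."""
    return normal_pdf(x) * dx

def mass_in_y(x, dx):
    """Mass of the image interval, using f_Y = f_X / |g'|."""
    y_width = abs(g(x + dx) - g(x))            # width of the image interval
    f_y = normal_pdf(x) / abs(g_prime(x))      # change-of-variables density
    return f_y * y_width

for x0 in (-1.0, 0.0, 1.5):
    print(x0, mass_in_x(x0, 1e-4), mass_in_y(x0, 1e-4))
```

The two masses agree up to the first-order approximation error in $dx$, even though the interval widths differ by the factor $|g'(x)|$.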

The Historical Story: Carl Gustav Jacob Jacobi

The Jacobian is named after Carl Gustav Jacob Jacobi (1804-1851), a German mathematician who made profound contributions to analysis, number theory, and mechanics. In his 1841 paper on the theory of determinants, Jacobi systematized the study of functional determinants—now called Jacobians.

The Problem Jacobi Solved

Mathematicians had long struggled with changing variables in multiple integrals. While single-variable substitution ($u = g(x)$, $du = g'(x)\,dx$) was well understood, the multi-dimensional case was far more subtle.

Jacobi showed that when transforming from coordinates $(x, y)$ to $(u, v)$, the area element transforms as:

$$dx\,dy = \left| \det \begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{pmatrix} \right| du\,dv$$

This determinant is the Jacobian determinant. Jacobi proved it measures how infinitesimal areas (or volumes in higher dimensions) scale under transformation.

From Calculus to Probability

The connection to probability came later, when statisticians recognized that probabilities for continuous random variables are integrals of a density. For an increasing $g$ (for a decreasing $g$ the limits of the last integral swap):

$$P(a \leq Y \leq b) = \int_a^b f_Y(y)\,dy = \int_{g^{-1}(a)}^{g^{-1}(b)} f_X(x)\,dx$$

The Jacobian ensures that this integral gives the same answer regardless of which variable we integrate over—probability is coordinate-independent.
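As a quick numerical illustration of this coordinate independence, the sketch below assumes $X \sim N(0, 1)$ and the increasing map $g(x) = x^3$ (an example chosen here, not taken from the text) and integrates the same probability once over $y$ and once over $x$ with a simple midpoint rule.

```python
import math

def normal_pdf(x):
    """Density of N(0, 1) at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def f_y(y):
    """Density of Y = X**3 for y > 0: f_X(y^(1/3)) * |d/dy y^(1/3)|."""
    x = y ** (1.0 / 3.0)
    return normal_pdf(x) / (3.0 * y ** (2.0 / 3.0))

def midpoint_integral(f, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a, b = 1.0, 8.0                                   # interval for Y
prob_y = midpoint_integral(f_y, a, b)             # integrate over y
prob_x = midpoint_integral(normal_pdf, a ** (1 / 3), b ** (1 / 3))  # integrate over x
print(prob_y, prob_x)
```

Both integrals approximate $P(1 \leq X \leq 2)$, so they agree to within the quadrature error.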

Modern Relevance

Today, the Jacobian is central to deep learning. Normalizing flows, variational autoencoders (VAEs), and neural density estimation all rely on computing or approximating Jacobian determinants efficiently.

The Univariate Case: Functions of a Single Random Variable

The Formal Theorem

Let $X$ be a continuous random variable with PDF $f_X(x)$. Let $Y = g(X)$ where $g$ is a strictly monotonic (strictly increasing or strictly decreasing) and differentiable function. Then the PDF of $Y$ is:

$$f_Y(y) = f_X(g^{-1}(y)) \cdot \left| \frac{d}{dy} g^{-1}(y) \right|$$

Equivalently, using the inverse function theorem (wherever $g'(g^{-1}(y)) \neq 0$):

$$f_Y(y) = \frac{f_X(g^{-1}(y))}{|g'(g^{-1}(y))|}$$

Step-by-Step Procedure

  1. Identify the transformation: Write $Y = g(X)$ explicitly
  2. Find the inverse: Solve for $X = g^{-1}(Y)$
  3. Compute the Jacobian: Calculate $\left| \frac{d}{dy} g^{-1}(y) \right|$ or equivalently $\frac{1}{|g'(g^{-1}(y))|}$
  4. Apply the formula: $f_Y(y) = f_X(g^{-1}(y)) \cdot |\text{Jacobian}|$
  5. Determine the support: Find the range of $Y$ (where $f_Y(y) > 0$)
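The five steps can be sketched in code. The example below assumes $X \sim \text{Exponential}(1)$ and $Y = \sqrt{X}$, a pair chosen here only to illustrate the procedure; the function names are hypothetical.

```python
import numpy as np

# Step 1: the transformation is Y = g(X) = sqrt(X), with X ~ Exponential(1)
def f_x(x):
    """Exponential(1) density."""
    return np.where(x >= 0, np.exp(-x), 0.0)

# Step 2: the inverse is x = g^{-1}(y) = y**2
def g_inverse(y):
    return y ** 2

# Step 3: Jacobian of the inverse, |d/dy g^{-1}(y)| = |2y|
def inverse_jacobian(y):
    return np.abs(2.0 * y)

# Steps 4 and 5: apply the formula and restrict it to the support y > 0
def f_y(y):
    y = np.asarray(y, dtype=float)
    density = f_x(g_inverse(y)) * inverse_jacobian(y)
    return np.where(y > 0, density, 0.0)

# Sanity check: the derived density should integrate to 1 (trapezoid rule)
y_grid = np.linspace(0.0, 8.0, 80001)
values = f_y(y_grid)
total = np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(y_grid))
print("integral of f_Y:", total)
```

The result is $f_Y(y) = 2y\,e^{-y^2}$ for $y > 0$, and the numerical integral confirms that the total probability is preserved.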

Example 1: Linear Transformation

Let $X \sim N(0, 1)$ and $Y = aX + b$ where $a \neq 0$.

  • Inverse: $X = \frac{Y - b}{a}$
  • Jacobian: $\left| \frac{dX}{dY} \right| = \frac{1}{|a|}$
  • Result: $f_Y(y) = \frac{1}{|a|} \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac{(y-b)^2}{2a^2}} = \frac{1}{|a|\sqrt{2\pi}} e^{-\frac{(y-b)^2}{2a^2}}$

This confirms $Y \sim N(b, a^2)$: linear transformations of normals are normal.
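A short numerical check of this example, assuming the illustrative values $a = -2$ and $b = 1$ (a negative $a$ is used deliberately to exercise the absolute value):

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def linear_transform_pdf(y, a, b):
    """f_Y(y) for Y = aX + b with X ~ N(0, 1), via the Jacobian formula."""
    x = (y - b) / a                  # inverse transformation
    return normal_pdf(x) / abs(a)    # multiply by |dx/dy| = 1/|a|

a, b = -2.0, 1.0                     # illustrative values
y = np.linspace(-6.0, 8.0, 15)
max_gap = np.max(np.abs(linear_transform_pdf(y, a, b)
                        - normal_pdf(y, mu=b, sigma=abs(a))))
print("max gap to N(b, a^2):", max_gap)
```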

Example 2: Log-Normal Distribution

Let $X \sim N(\mu, \sigma^2)$ and $Y = e^X$.

  • Inverse: $X = \ln(Y)$ for $Y > 0$
  • Jacobian: $\left| \frac{dX}{dY} \right| = \frac{1}{Y}$
  • Result: $f_Y(y) = \frac{1}{y \sigma \sqrt{2\pi}} e^{-\frac{(\ln y - \mu)^2}{2\sigma^2}}$ for $y > 0$

This is the log-normal distribution, widely used to model stock prices, biological measurements, and any quantity that results from multiplicative processes.
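One way to sanity-check the log-normal density is to compare it with simulated data. The sketch below assumes illustrative parameters $\mu = 0.5$, $\sigma = 0.75$ and a fixed random seed; it compares the probability of an interval computed from the formula with the fraction of simulated values of $e^X$ that land in it.

```python
import numpy as np

def lognormal_pdf(y, mu, sigma):
    """Density of Y = exp(X), X ~ N(mu, sigma^2); zero outside y > 0."""
    y = np.asarray(y, dtype=float)
    out = np.zeros_like(y)
    pos = y > 0
    out[pos] = (np.exp(-(np.log(y[pos]) - mu) ** 2 / (2.0 * sigma ** 2))
                / (y[pos] * sigma * np.sqrt(2.0 * np.pi)))
    return out

mu, sigma = 0.5, 0.75                       # illustrative parameters
rng = np.random.default_rng(0)
samples = np.exp(rng.normal(mu, sigma, size=200_000))

lo, hi = 1.0, 3.0
grid = np.linspace(lo, hi, 20001)
vals = lognormal_pdf(grid, mu, sigma)
prob_formula = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid))  # trapezoid rule
prob_empirical = np.mean((samples >= lo) & (samples <= hi))
print(prob_formula, prob_empirical)
```

The two numbers should differ only by Monte Carlo noise.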

Non-Monotonic Transformations

When $g$ is not monotonic, multiple values of $X$ may map to the same $Y$. We must sum contributions from all branches:

$$f_Y(y) = \sum_{i:\, g(x_i) = y} \frac{f_X(x_i)}{|g'(x_i)|}$$

Example 3: Chi-Square from Normal

Let $X \sim N(0, 1)$ and $Y = X^2$.

For any $y > 0$, both $x = \sqrt{y}$ and $x = -\sqrt{y}$ map to $y$. At each of these points, $|g'(x)| = |2x| = 2\sqrt{y}$.

$$f_Y(y) = \frac{f_X(\sqrt{y})}{2\sqrt{y}} + \frac{f_X(-\sqrt{y})}{2\sqrt{y}} = \frac{2 \cdot \frac{1}{\sqrt{2\pi}} e^{-y/2}}{2\sqrt{y}} = \frac{1}{\sqrt{2\pi y}} e^{-y/2}$$

This is the chi-square distribution with 1 degree of freedom: $Y \sim \chi^2(1)$.
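The effect of dropping a branch can be seen by simulation. The sketch below (hypothetical helper names, fixed seed, interval chosen for illustration) squares standard normal samples and compares the empirical probability of an interval with the integral of the two-branch density and of a deliberately incomplete one-branch version.

```python
import numpy as np

def chi2_1_pdf(y):
    """Two-branch result derived above: exp(-y/2) / sqrt(2*pi*y), y > 0."""
    return np.exp(-y / 2.0) / np.sqrt(2.0 * np.pi * y)

def one_branch_only(y):
    """Deliberately incomplete: keeps only the branch x = +sqrt(y)."""
    return np.exp(-y / 2.0) / np.sqrt(2.0 * np.pi) / (2.0 * np.sqrt(y))

def trapezoid(values, x):
    """Trapezoid-rule integral of sampled values over the grid x."""
    return np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(x))

rng = np.random.default_rng(1)
y_samples = rng.standard_normal(200_000) ** 2     # simulated Y = X^2

lo, hi = 0.5, 2.0
grid = np.linspace(lo, hi, 20001)
p_empirical = np.mean((y_samples >= lo) & (y_samples <= hi))
p_two_branches = trapezoid(chi2_1_pdf(grid), grid)
p_one_branch = trapezoid(one_branch_only(grid), grid)
print(p_empirical, p_two_branches, p_one_branch)
```

The two-branch density matches the simulation; the one-branch version accounts for exactly half of the probability.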


Interactive Exploration: Univariate Transformations

The visualization below lets you explore how different transformation functions $g(X)$ affect the probability distribution. Watch how the Jacobian stretches or compresses different regions of the PDF.

[Interactive figure: Univariate Transformation Explorer. Left panel: source density $f_X(x)$ for $X \sim N(0, 1)$. Right panel: transformed density $f_Y(y)$ for $Y = g(X)$. Setting shown: $Y = aX + b$ with Jacobian $|dY/dX| = |a| = 2$; linear transformations scale and shift the distribution uniformly.]

Key Formula: Change of Variables

$$f_Y(y) = f_X(g^{-1}(y)) \cdot \left| \frac{d}{dy} g^{-1}(y) \right| = \frac{f_X(g^{-1}(y))}{|g'(g^{-1}(y))|}$$

The Jacobian $|dg/dx|$ measures how much the transformation stretches or compresses space. Where the transformation stretches (large Jacobian), probability density decreases. Where it compresses (small Jacobian), density increases.

Geometric Interpretation: Why We Divide by the Jacobian

The most intuitive way to understand the Jacobian is through area conservation. When we transform coordinates, probability must redistribute to maintain the total of 1.

Think of probability as incompressible fluid. When the transformation squeezes space, the fluid (probability) piles up higher. When it stretches space, the fluid spreads thinner.

The interactive demonstration below shows this geometrically for $Y = X^2$. Notice how:

  • Near $X = 0$: The Jacobian $|2X|$ is small, so intervals are compressed rather than stretched and the density of $Y$ is pushed up
  • For larger $|X|$: The Jacobian grows, stretching is greater, and density decreases proportionally

[Interactive figure: Jacobian as Area Stretching. The curve $Y = X^2$ with an adjustable interval in X-space (position $X_0$ and width $\Delta x$) and its image in Y-space. Reading shown: $X_0 = 1.0$, $\Delta x = 0.50$, $\Delta y = 1.00$, Jacobian $|dY/dX| = 2|X_0| = 2.00$, stretch ratio $\Delta y / \Delta x \approx 2.00$.]

What This Means for Probability

Probability must be conserved: the probability in any interval must remain the same after transformation.

$$P(X \in [x, x + \Delta x]) = P(Y \in [g(x), g(x + \Delta x)])$$

Since $f_X(x) \cdot \Delta x = f_Y(y) \cdot \Delta y$, and $\Delta y \approx |g'(x)| \cdot \Delta x$ (Jacobian):

$$f_Y(y) = \frac{f_X(x)}{|g'(x)|} = \frac{f_X(g^{-1}(y))}{|g'(g^{-1}(y))|}$$

Current example: At $X_0 = 1.0$, the transformation $Y = X^2$ stretches space by a factor of 2.00, so the density contributed by that point must be divided by 2.00 to conserve probability.


The Bivariate Case: Functions of Two Random Variables

The univariate formula extends naturally to multiple dimensions. For an invertible, differentiable transformation $(U, V) = g(X, Y)$, the joint PDF transforms as:

$$f_{U,V}(u, v) = f_{X,Y}(g^{-1}(u, v)) \cdot \left| \det J_{g^{-1}}(u, v) \right|$$

where $J_{g^{-1}}$ is the Jacobian matrix of the inverse transformation. Equivalently:

$$f_{U,V}(u, v) = \frac{f_{X,Y}(x, y)}{|\det(J)|}$$

where $(x, y) = g^{-1}(u, v)$ and $J$ is the Jacobian matrix of the forward transformation, evaluated at $(x, y)$.
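A small worked instance of the bivariate formula, assuming independent standard normal $X$ and $Y$ and the linear map $(U, V) = (X + Y, X - Y)$ (an example picked here for illustration). For this map the result is known in closed form: $U$ and $V$ are independent $N(0, 2)$ variables, which gives something to compare against.

```python
import numpy as np

def std_normal_pdf(x):
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

def joint_xy(x, y):
    """Joint density of independent standard normals X and Y."""
    return std_normal_pdf(x) * std_normal_pdf(y)

# Forward map (u, v) = (x + y, x - y); its Jacobian matrix is constant
J_forward = np.array([[1.0, 1.0],
                      [1.0, -1.0]])
abs_det = abs(np.linalg.det(J_forward))    # |det J| = 2

def joint_uv(u, v):
    """f_{U,V}(u, v) = f_{X,Y}(g^{-1}(u, v)) / |det J|."""
    x = (u + v) / 2.0                      # inverse map
    y = (u - v) / 2.0
    return joint_xy(x, y) / abs_det

def normal_pdf(t, var):
    """Density of N(0, var)."""
    return np.exp(-0.5 * t ** 2 / var) / np.sqrt(2.0 * np.pi * var)

u, v = 0.7, -1.3                           # arbitrary evaluation point
expected = normal_pdf(u, 2.0) * normal_pdf(v, 2.0)
print(joint_uv(u, v), expected)
```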


The Jacobian Matrix: Multidimensional Stretching

For a transformation $(u, v) = g(x, y)$ where $u = u(x, y)$ and $v = v(x, y)$, the Jacobian matrix is:

$$J = \begin{pmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{pmatrix}$$

Each row captures how one output variable depends on all inputs. Each column captures how all outputs depend on one input.

The Jacobian Determinant

The Jacobian determinant is:

$$\det(J) = \frac{\partial u}{\partial x} \cdot \frac{\partial v}{\partial y} - \frac{\partial u}{\partial y} \cdot \frac{\partial v}{\partial x}$$

This determinant has a beautiful geometric interpretation:

  • $|\det(J)| > 1$: Local area expands
  • $|\det(J)| < 1$: Local area contracts
  • $|\det(J)| = 1$: Area-preserving transformation (like rotation)
  • $\det(J) = 0$: Transformation is singular (not invertible) at that point
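For linear maps the Jacobian matrix is the map's own matrix, so the four cases in the list can be illustrated directly. The matrices below are examples chosen here, one per case.

```python
import numpy as np

theta = np.pi / 6   # 30 degrees
examples = {
    "rotation by 30 degrees": np.array([[np.cos(theta), -np.sin(theta)],
                                        [np.sin(theta),  np.cos(theta)]]),
    "scaling by (2, 3)": np.array([[2.0, 0.0],
                                   [0.0, 3.0]]),
    "uniform shrink by 1/2": np.array([[0.5, 0.0],
                                       [0.0, 0.5]]),
    "projection onto the x-axis": np.array([[1.0, 0.0],
                                            [0.0, 0.0]]),
}

# |det J| is the factor by which the map scales areas
abs_dets = {name: abs(np.linalg.det(J)) for name, J in examples.items()}
for name, value in abs_dets.items():
    print(f"{name}: |det J| = {value:.3f}")
```

The rotation preserves area, the scaling expands it sixfold, the shrink contracts it to a quarter, and the projection collapses the plane onto a line, which is why its determinant is zero and no density formula applies.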

Classic Example: Polar Coordinates

The transformation from polar $(r, \theta)$ to Cartesian $(x, y)$:

$$x = r\cos(\theta), \quad y = r\sin(\theta)$$

The Jacobian matrix is:

$$J = \begin{pmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{pmatrix} = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}$$

The determinant:

$$\det(J) = r\cos^2\theta + r\sin^2\theta = r$$

Why dA = r dr dθ

This explains why the area element in polar coordinates is $dA = r\,dr\,d\theta$. The factor $r$ is the Jacobian! As $r$ increases, arc segments spanning a fixed $d\theta$ get longer, so infinitesimal rectangles have more area.
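The role of the factor $r$ can be seen by integrating a density in polar coordinates with and without it. The sketch below assumes the standard bivariate normal density and a disk of radius $R = 1.5$ (illustrative choices); the exact mass inside the disk is $1 - e^{-R^2/2}$.

```python
import numpy as np

def mass_inside_radius(R, include_jacobian=True, n_r=2000, n_theta=720):
    """Midpoint-rule integral of the standard bivariate normal density
    over a disk of radius R, carried out in polar coordinates."""
    dr = R / n_r
    dtheta = 2.0 * np.pi / n_theta
    r = (np.arange(n_r) + 0.5) * dr                 # midpoints in r
    theta = (np.arange(n_theta) + 0.5) * dtheta     # midpoints in theta
    rr, tt = np.meshgrid(r, theta, indexing="ij")
    x, y = rr * np.cos(tt), rr * np.sin(tt)
    density = np.exp(-0.5 * (x ** 2 + y ** 2)) / (2.0 * np.pi)
    weight = rr if include_jacobian else 1.0        # the Jacobian factor r
    return np.sum(density * weight) * dr * dtheta

R = 1.5
with_r = mass_inside_radius(R)
without_r = mass_inside_radius(R, include_jacobian=False)
exact = 1.0 - np.exp(-R ** 2 / 2.0)
print(with_r, without_r, exact)
```

Only the version that includes $r$ reproduces the exact probability; omitting it gives a number that is not even a valid probability.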


Interactive Exploration: Bivariate Transformations

This visualization shows how rectangular grids in the original space transform into curves in the new space. The color coding indicates the local Jacobian determinant—warmer colors mean more expansion.

[Interactive figure: 2D Jacobian Transformation. A rectangular grid in the original space $(r, \theta)$ is mapped to curves in the transformed space. Setting shown: polar to Cartesian, $x = r\cos\theta$, $y = r\sin\theta$, with Jacobian determinant $|J| = r$. Color scale for $|J|$: small (compression) to large (expansion).]

The Jacobian Matrix and Determinant

For a 2D transformation $(x, y) \to (u, v)$, the Jacobian matrix is:

$$J = \begin{pmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{pmatrix}$$

The Jacobian determinant $\det J = \frac{\partial u}{\partial x} \cdot \frac{\partial v}{\partial y} - \frac{\partial u}{\partial y} \cdot \frac{\partial v}{\partial x}$ measures how an infinitesimal area element $dA = dx\,dy$ transforms:

$$dA' = |\det J| \cdot dA \quad \Rightarrow \quad f_{U,V}(u, v) = \frac{f_{X,Y}(x, y)}{|\det J|}$$

Common Transformations and Their Jacobians

Here are the most important transformations you'll encounter:

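| Transformation | Formula | Jacobian (forward, absolute value) | Application |
|---|---|---|---|
| Linear | $Y = aX + b$ | $\lvert a \rvert$ | Standardization, Z-scores |
| Exponential | $Y = e^X$ | $e^X$ | Log-normal from normal |
| Logarithm | $Y = \ln(X)$ | $1/X$ | Normal from log-normal |
| Square | $Y = X^2$ | $2\lvert X \rvert$ | Chi-square from normal |
| Polar to Cartesian | $(x, y) = (r\cos\theta, r\sin\theta)$ | $r$ | 2D integration, circular distributions |
| Box-Muller | See formula below | $2\pi / U_1$ | Generating normal samples from uniform |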

The Box-Muller Transformation

This elegant transformation generates two independent standard normal random variables from two independent uniform random variables:

$$Z_1 = \sqrt{-2\ln U_1} \cos(2\pi U_2), \quad Z_2 = \sqrt{-2\ln U_1} \sin(2\pi U_2)$$

where $U_1, U_2 \sim \text{Uniform}(0, 1)$ are independent. The forward Jacobian determinant is $\left| \frac{\partial(z_1, z_2)}{\partial(u_1, u_2)} \right| = \frac{2\pi}{u_1}$, so the joint density of $(Z_1, Z_2)$ is $\frac{u_1}{2\pi}$, and the transformation produces $Z_1, Z_2 \sim N(0, 1)$ independently.

Why This Works

The uniform distribution on $[0, 1]^2$ has constant density 1. Because $z_1^2 + z_2^2 = -2\ln u_1$, we have $u_1 = e^{-(z_1^2 + z_2^2)/2}$, so dividing by the Jacobian gives $f_{Z_1,Z_2}(z_1, z_2) = \frac{1}{2\pi} e^{-(z_1^2 + z_2^2)/2}$, which is exactly the bivariate standard normal density. This is a favorite example in computational statistics.
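A minimal sketch of the Box-Muller transformation, with a fixed seed and sample size chosen for illustration. It also checks the density identity at one point, taking the forward Jacobian determinant of $(u_1, u_2) \mapsto (z_1, z_2)$ to be $2\pi / u_1$.

```python
import numpy as np

def box_muller(u1, u2):
    """Map two Uniform(0, 1) inputs to two standard normal outputs."""
    radius = np.sqrt(-2.0 * np.log(u1))
    angle = 2.0 * np.pi * u2
    return radius * np.cos(angle), radius * np.sin(angle)

rng = np.random.default_rng(42)
u1 = 1.0 - rng.random(200_000)    # lies in (0, 1], so log(u1) stays finite
u2 = rng.random(200_000)
z1, z2 = box_muller(u1, u2)
print("mean:", z1.mean(), "variance:", z1.var(),
      "correlation:", np.corrcoef(z1, z2)[0, 1])

# Density check at a single point: f_Z = f_U / |det J| with f_U = 1
u1_pt, u2_pt = 0.3, 0.8
z1_pt, z2_pt = box_muller(u1_pt, u2_pt)
density_from_jacobian = u1_pt / (2.0 * np.pi)
density_standard_normal = np.exp(-0.5 * (z1_pt ** 2 + z2_pt ** 2)) / (2.0 * np.pi)
print(density_from_jacobian, density_standard_normal)
```

The sample moments are close to those of independent standard normals, and the two density values coincide because $z_1^2 + z_2^2 = -2\ln u_1$.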


AI/ML Applications: Why Deep Learning Engineers Need the Jacobian

"The Jacobian determinant is the key that unlocks exact likelihood computation in generative models."

1. Normalizing Flows

Normalizing flows are a class of generative models that transform a simple base distribution (usually Gaussian) into a complex target distribution through a sequence of invertible transformations.

The fundamental equation is:

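$$\log p_\theta(x) = \log p_0(z) + \sum_{i=1}^{K} \log |\det J_i|$$

where $z = f^{-1}(x)$ is the latent code and $J_i$ is the Jacobian of the $i$-th layer of the inverse map $f^{-1}$ (the normalizing direction, from $x$ toward $z$). If the Jacobians are instead taken in the generative direction, from $z$ toward $x$, the sum enters with a minus sign.

  • RealNVP: Uses coupling layers with triangular Jacobians ($O(d)$ determinant)
  • GLOW: Adds invertible 1x1 convolutions; the determinant of the $c \times c$ weight matrix costs $O(c^3)$ in the number of channels $c$, or $O(c)$ with an LU parameterization
  • Continuous Normalizing Flows: ODEs with trace estimation for Jacobians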
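To make the bookkeeping concrete, here is a NumPy sketch of the log-likelihood computation for a toy one-dimensional flow. It assumes two affine generative layers with made-up parameters, so the exact answer is available for comparison; it is not a model of any of the architectures listed above.

```python
import numpy as np

def std_normal_logpdf(z):
    """Log-density of the N(0, 1) base distribution."""
    return -0.5 * z ** 2 - 0.5 * np.log(2.0 * np.pi)

# Two illustrative generative layers z -> x, each of the form scale * z + shift
layers = [(2.0, 1.0), (1.5, -3.0)]          # hypothetical (scale, shift) pairs

def flow_log_prob(x):
    """log p(x): invert the layers (last one first) and accumulate the
    log-determinants of the inverse maps."""
    z = np.asarray(x, dtype=float)
    log_det_inverse = 0.0
    for scale, shift in reversed(layers):
        z = (z - shift) / scale                  # inverse of one layer
        log_det_inverse += -np.log(abs(scale))   # log|d(input)/d(output)|
    return std_normal_logpdf(z) + log_det_inverse

# The composite map is x = 1.5 * (2 z + 1) - 3 = 3 z - 1.5, so X ~ N(-1.5, 9)
x = np.array([-4.0, -1.5, 0.0, 2.0])
analytic = -0.5 * ((x + 1.5) / 3.0) ** 2 - np.log(3.0) - 0.5 * np.log(2.0 * np.pi)
print(flow_log_prob(x))
print(analytic)
```

Accumulating log-determinants of the inverse maps gives the plus sign in the formula; using the forward (generative) log-determinants $\log|\text{scale}|$ instead would require subtracting them.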

2. Variational Autoencoders (VAEs)

In VAEs, the reparameterization trick implicitly uses the Jacobian. When we sample $z = \mu + \sigma \odot \epsilon$ where $\epsilon \sim N(0, I)$:

$$\log q_\phi(z|x) = \log p(\epsilon) - \sum_i \log \sigma_i$$

The $\sum_i \log \sigma_i$ term is the log-Jacobian of the affine transformation!
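This identity is easy to verify numerically. The sketch below uses made-up values for $\mu$ and $\sigma$ (stand-ins for encoder outputs) and compares the change-of-variables expression with the diagonal Gaussian log-density evaluated directly.

```python
import numpy as np

def diag_normal_logpdf(z, mu, sigma):
    """Sum over dimensions of log N(z_i; mu_i, sigma_i^2)."""
    return np.sum(-0.5 * ((z - mu) / sigma) ** 2
                  - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
mu = np.array([0.3, -1.2, 2.0])        # illustrative encoder mean
sigma = np.array([0.5, 1.5, 0.1])      # illustrative encoder std (positive)

eps = rng.standard_normal(3)
z = mu + sigma * eps                   # reparameterized sample

log_p_eps = np.sum(-0.5 * eps ** 2 - 0.5 * np.log(2.0 * np.pi))
log_q_change_of_variables = log_p_eps - np.sum(np.log(sigma))
log_q_direct = diag_normal_logpdf(z, mu, sigma)
print(log_q_change_of_variables, log_q_direct)
```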

3. Change of Variables in Bayesian Inference

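When transforming posterior distributions (e.g., from constrained to unconstrained parameters), the Jacobian ensures the prior/posterior transforms correctly. If the constrained parameter is $\theta = g(\phi)$ for an unconstrained $\phi$, the density over $\phi$ is:

$$p_\Phi(\phi) = p_\Theta(g(\phi)) \cdot |\det J_g(\phi)|, \quad \theta = g(\phi)$$

where $J_g = \partial\theta / \partial\phi$ is the Jacobian of $g$.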

This is essential for HMC and other MCMC methods that work in unconstrained spaces.
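As an illustration of the Jacobian correction, the sketch below assumes a positive parameter $\theta$ with an Exponential(2) density and the unconstraining map $\phi = \log\theta$, so that $\theta = g(\phi) = e^\phi$ and $|d\theta/d\phi| = e^\phi$. These choices are examples made here, not taken from the text.

```python
import numpy as np

rate = 2.0   # illustrative Exponential(rate) density on theta > 0

def p_theta(theta):
    """Density of the constrained parameter theta."""
    return rate * np.exp(-rate * theta)

def p_phi(phi):
    """Density of phi = log(theta): p_theta(g(phi)) * |d theta / d phi|."""
    theta = np.exp(phi)          # theta = g(phi)
    return p_theta(theta) * theta

def trapezoid(values, x):
    return np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(x))

phi_grid = np.linspace(-30.0, 5.0, 200001)
total_with_jacobian = trapezoid(p_phi(phi_grid), phi_grid)
total_without_jacobian = trapezoid(p_theta(np.exp(phi_grid)), phi_grid)
print(total_with_jacobian, total_without_jacobian)
```

With the Jacobian the transformed density integrates to 1; without it the integral is far from 1, so a sampler working in $\phi$-space would target the wrong distribution.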

4. Density Estimation and Anomaly Detection

Neural density estimators (MAF, IAF, NSF) use the Jacobian to compute exact likelihoods:

  • Train by maximizing log-likelihood
  • Detect anomalies as low-likelihood points
  • Generate samples by inverting the flow

Computational Challenge

Computing $\det(J)$ naively costs $O(d^3)$. Modern architectures (autoregressive, coupling layers) are designed so that the Jacobian is triangular, making the determinant $O(d)$.
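The saving comes from a basic linear algebra fact: the determinant of a triangular matrix is the product of its diagonal entries. A small sketch with a random lower-triangular matrix (dimension and values chosen arbitrarily) compares the general-purpose routine with the diagonal shortcut.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
# Lower-triangular Jacobian with a positive diagonal (illustrative values)
J = np.tril(rng.normal(size=(d, d)))
J[np.diag_indices(d)] = np.exp(rng.normal(size=d))

# General-purpose route: works for any square matrix, O(d^3)
sign, logabsdet = np.linalg.slogdet(J)

# Triangular shortcut: sum of log|diagonal|, O(d)
log_det_diagonal = np.sum(np.log(np.abs(np.diag(J))))

print(logabsdet, log_det_diagonal)
```

Working with the log-determinant, as `numpy.linalg.slogdet` does, also avoids overflow or underflow when many diagonal entries are multiplied.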

Normalizing Flows Demo: Jacobian in Action

This interactive demonstration shows how normalizing flows transform a simple Gaussian into a more complex distribution. Each flow layer warps the space, and the Jacobian determinant ensures we can still compute exact likelihoods.

[Interactive figure: Normalizing Flows: Jacobian in Deep Learning. Normalizing flows use the Jacobian to transform a simple distribution (like a Gaussian) into a complex target distribution while maintaining tractable likelihood computation. Selectable layer types: (1) Affine (Scale + Shift), (2) Planar Flow, (3) Radial Flow. Panel: distribution after the selected number of layers. Readout shown: average log-likelihood -2.433 after 0 layers.]

The Jacobian determinant tracks how probability density changes through each transformation, enabling exact likelihood computation.

Change of Variables Formula

$$\log p(x) = \log p(z) + \sum_i \log|\det(J_i)|$$

where $z = f^{-1}(x)$ and $J_i$ is the Jacobian of layer $i$ of the inverse map.

Why This Matters for AI

  • VAEs: Reparameterization trick uses Jacobian
  • Diffusion Models: Score-based models learn the gradient of the log-density
  • Generative Models: Exact likelihood training
  • Density Estimation: Neural network probability distributions

The Power of Invertible Transformations

Normalizing flows are chains of invertible transformations with tractable Jacobian determinants. By stacking simple transformations (affine, planar, radial, coupling layers), we can model complex distributions while maintaining the ability to:

  1. Sample efficiently: Draw $z \sim N(0, I)$, then compute $x = f(z)$
  2. Compute exact likelihood: $\log p(x) = \log p(f^{-1}(x)) + \log|\det J_{f^{-1}}(x)|$
  3. Train with MLE: Maximize log-likelihood directly

Python Implementation

Univariate Transformations

Univariate Jacobian Transformation
jacobian_univariate.py

Notes on the code:

  • The Core Formula: The Jacobian method transforms the PDF f_X(x) into f_Y(y) using f_Y(y) = f_X(g⁻¹(y)) / |g′(g⁻¹(y))|.
  • Inverse Function: To find the PDF at y, we first find which x value maps to y. This requires the inverse: x = g⁻¹(y).
  • Jacobian Computation: The Jacobian |dg/dx| measures local stretching. We take the absolute value since PDFs must be non-negative.
  • Division by Jacobian: We divide by the Jacobian because where the transformation stretches space, probability density must decrease to conserve total probability.
  • Two Branches for Y = X²: When Y = X², both x = +√y and x = -√y map to the same y. We must sum contributions from both branches.
  • Chi-Square Distribution: This derivation shows that if X ~ N(0,1), then Y = X² ~ χ²(1). The factor 1/(2√y) is the Jacobian |dx/dy|.

import numpy as np
from scipy import stats

# The Jacobian Transformation Method for Y = g(X)
# Given: X ~ f_X(x), find f_Y(y) where Y = g(X)

def transform_pdf_univariate(x_pdf, g, g_inverse, g_derivative, y_values):
    """
    Transform a PDF using the Jacobian method (single inverse branch only).

    Parameters:
    - x_pdf: Original PDF function f_X(x)
    - g: Transformation function Y = g(X) (kept for documentation; not evaluated)
    - g_inverse: Inverse function X = g^{-1}(Y)
    - g_derivative: Derivative dg/dx
    - y_values: Y values at which to evaluate new PDF

    Returns:
    - PDF values for Y at each y_value
    """
    y_values = np.asarray(y_values, dtype=float)
    y_pdf = np.zeros_like(y_values)

    for i, y in enumerate(y_values):
        try:
            # Step 1: Find x = g^{-1}(y)
            x = g_inverse(y)
        except (ValueError, ZeroDivisionError, OverflowError):
            continue  # y is outside the range of g, so the density stays 0
        if not np.isfinite(x):
            continue

        # Step 2: Compute Jacobian |dg/dx| at x
        jacobian = np.abs(g_derivative(x))

        # Step 3: Apply change of variables formula
        # f_Y(y) = f_X(g^{-1}(y)) / |g'(g^{-1}(y))|
        if jacobian > 1e-10:  # Avoid division by zero
            y_pdf[i] = x_pdf(x) / jacobian

    return y_pdf

# Example: X ~ N(0, 1), Y = X^2 gives Chi-square(1)
def x_pdf(x):
    return stats.norm.pdf(x, 0, 1)

def g(x):
    return x**2

def g_inverse(y):
    return np.sqrt(np.maximum(y, 0))  # positive branch only

def g_derivative(x):
    return 2 * x

# For Y = X^2, both +sqrt(y) and -sqrt(y) contribute
def chi_square_pdf(y_values):
    y_values = np.asarray(y_values, dtype=float)
    pdf = np.zeros_like(y_values)
    for i, y in enumerate(y_values):
        if y > 0:
            x_pos = np.sqrt(y)
            x_neg = -np.sqrt(y)
            # Sum contributions from both branches
            pdf[i] = (x_pdf(x_pos) + x_pdf(x_neg)) / (2 * np.sqrt(y))
    return pdf

# Compare with scipy's chi-square distribution
y_vals = np.linspace(0.01, 6, 100)
computed_pdf = chi_square_pdf(y_vals)
true_pdf = stats.chi2.pdf(y_vals, df=1)

print("Max difference from Chi-square(1):", np.max(np.abs(computed_pdf - true_pdf)))

# The single-branch helper sees only x = +sqrt(y), so it recovers half the density
single_branch = transform_pdf_univariate(x_pdf, g, g_inverse, g_derivative, y_vals)
print("Single branch / true density:", np.mean(single_branch / true_pdf))

Bivariate Transformations

Bivariate Jacobian Matrix
jacobian_bivariate.py

Notes on the code:

  • Bivariate Change of Variables: For 2D transformations, the Jacobian is a 2×2 matrix. The determinant |det(J)| replaces |dg/dx| from the univariate case.
  • Jacobian Matrix Structure: Each entry J[i,j] is the partial derivative ∂uᵢ/∂xⱼ. It captures how each output changes with respect to each input.
  • Numerical Differentiation: Central differences give O(h²) accuracy. For analytical Jacobians, use symbolic differentiation or derive by hand.
  • Determinant as Area Scaling: |det(J)| measures how an infinitesimal area element dA transforms. It is the ratio of transformed area to original area.
  • Polar Coordinates Example: The classic example: (r, θ) → (x, y). The Jacobian |J| = r explains why dA = r dr dθ in polar coordinates.

import numpy as np

# The Jacobian Matrix Method for (U, V) = g(X, Y)
# Key: f_{U,V}(u,v) = f_{X,Y}(g^{-1}(u,v)) / |det(J)|

def jacobian_matrix(transform_funcs, x, y, h=1e-5):
    """
    Numerically compute the Jacobian matrix at (x, y).

    J = [[du/dx, du/dy],
         [dv/dx, dv/dy]]
    """
    u_func, v_func = transform_funcs

    # Partial derivatives via central differences
    du_dx = (u_func(x + h, y) - u_func(x - h, y)) / (2 * h)
    du_dy = (u_func(x, y + h) - u_func(x, y - h)) / (2 * h)
    dv_dx = (v_func(x + h, y) - v_func(x - h, y)) / (2 * h)
    dv_dy = (v_func(x, y + h) - v_func(x, y - h)) / (2 * h)

    J = np.array([[du_dx, du_dy],
                  [dv_dx, dv_dy]])
    return J

def jacobian_determinant(J):
    """Compute |det(J)|"""
    return np.abs(np.linalg.det(J))

# Example: Polar to Cartesian transformation
# X = R * cos(Theta), Y = R * sin(Theta)

def x_from_polar(r, theta):
    return r * np.cos(theta)

def y_from_polar(r, theta):
    return r * np.sin(theta)

# The Jacobian for polar coordinates is |J| = r
# This is why area element dA = r dr dθ in polar coords

r_vals = np.linspace(0.1, 2, 5)
theta_vals = np.linspace(0, np.pi/2, 5)

for r in r_vals:
    for theta in theta_vals:
        J = jacobian_matrix((x_from_polar, y_from_polar), r, theta)
        det_J = jacobian_determinant(J)

        # Print inside the inner loop so every (r, theta) pair is reported
        print(f"r={r:.3f}, theta={theta:.2f}: |J| = {det_J:.4f}, r = {r:.4f}")

# The Jacobian determinant equals r, confirming our formula!

Normalizing Flows in PyTorch

Normalizing Flows: Jacobian in Deep Learning
normalizing_flow.py

Notes on the code:

  • Normalizing Flows Core Idea: Flows transform simple distributions (Gaussian) into complex ones through invertible mappings. The Jacobian enables exact likelihood computation.
  • Affine Flow: Simplest Case: Y = scale·X + shift is fully invertible with diagonal Jacobian. log|det(J)| = Σlog|scaleᵢ| is O(d) to compute.
  • Log-Determinant Trick: We compute log|det(J)| directly to avoid numerical overflow. For diagonal Jacobians, it is just the sum of log-scales.
  • Planar Flow: More Expressive: Planar flows bend space with a tanh nonlinearity. They can model multimodal distributions while maintaining a tractable O(d) Jacobian determinant.
  • Planar Jacobian Formula: The design ensures |det(J)| = |1 + uᵀψ| where ψ depends on the tanh derivative. This is O(d), not O(d³).
  • Log-Probability via Change of Variables: With layers defined in the generative direction z → x, log p(x) = log p(z) − Σ log|det(Jᵢ)|, which equals log p(z) + Σ log|det(Jᵢ⁻¹)|. This is the fundamental equation that makes flows trainable with MLE.

import torch
import torch.nn as nn

# Normalizing Flows: Learning Complex Distributions
# via Invertible Transformations with Tractable Jacobians

class AffineFlow(nn.Module):
    """
    Simplest flow: Y = scale * X + shift
    Jacobian = diag(scale), log|det(J)| = sum(log|scale|)
    """
    def __init__(self, dim):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        """Transform x -> y and compute log|det(J)|"""
        y = torch.exp(self.log_scale) * x + self.shift
        log_det = self.log_scale.sum()
        return y, log_det

    def inverse(self, y):
        """Inverse: x = (y - shift) / scale"""
        x = (y - self.shift) * torch.exp(-self.log_scale)
        return x

class PlanarFlow(nn.Module):
    """
    Planar flow: Y = X + u * tanh(w^T X + b)
    More expressive but still tractable Jacobian.
    Note: invertibility requires w^T u >= -1, which this sketch does not
    enforce, and the inverse has no closed form.
    """
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Parameter(torch.randn(dim))
        self.u = nn.Parameter(torch.randn(dim))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # x: (batch, dim); activation: (batch,)
        activation = torch.tanh(x @ self.w + self.b)
        y = x + self.u * activation.unsqueeze(-1)

        # Jacobian determinant: |1 + u^T * psi|
        # where psi = (1 - tanh^2(w^T x + b)) * w, shape (batch, dim)
        psi = (1 - activation**2).unsqueeze(-1) * self.w
        log_det = torch.log(torch.abs(1 + psi @ self.u) + 1e-8)

        return y, log_det

class NormalizingFlow(nn.Module):
    """Stack of flow layers; each layer's forward() is the generative direction z -> x"""
    def __init__(self, dim, n_layers=4, flow_cls=PlanarFlow):
        super().__init__()
        self.dim = dim
        self.flows = nn.ModuleList([
            flow_cls(dim) for _ in range(n_layers)
        ])
        self.prior = torch.distributions.Normal(0.0, 1.0)

    def log_prob(self, x):
        """
        Compute log p(x) using change of variables.
        Needs layers with a closed-form inverse (AffineFlow here, not PlanarFlow).
        """
        log_det_sum = x.new_zeros(x.shape[0])
        z = x

        # Inverse through all layers, last layer first
        for flow in reversed(self.flows):
            if not hasattr(flow, "inverse"):
                raise NotImplementedError(
                    f"{type(flow).__name__} has no closed-form inverse"
                )
            z = flow.inverse(z)
            _, log_det = flow(z)  # forward log|det(J)| at this layer's input
            log_det_sum = log_det_sum + log_det

        # log p(x) = log p(z) - sum of forward log|det(J)|
        log_prior = self.prior.log_prob(z).sum(-1)
        return log_prior - log_det_sum

    def sample(self, n_samples):
        """Generate samples by transforming prior"""
        z = self.prior.sample((n_samples, self.dim))
        x = z
        for flow in self.flows:
            x, _ = flow(x)
        return x

Common Pitfalls and Misconceptions

Pitfall 1: Forgetting the Absolute Value

The Jacobian in the PDF formula must be the absolute value of the derivative/determinant. PDFs cannot be negative, regardless of whether the transformation is increasing or decreasing.

❌ $f_Y(y) = f_X(x) / g'(x)$
✅ $f_Y(y) = f_X(x) / |g'(x)|$
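A decreasing transformation makes the sign error visible. The sketch below assumes $X \sim \text{Exponential}(1)$ and $Y = e^{-X}$ (an illustrative pair for which $Y$ is uniform on $(0, 1)$) and evaluates the formula with and without the absolute value.

```python
import numpy as np

def f_x(x):
    """Exponential(1) density."""
    return np.where(x >= 0, np.exp(-x), 0.0)

def g_prime(x):
    return -np.exp(-x)           # g(x) = exp(-x) is strictly decreasing

def f_y_correct(y):
    x = -np.log(y)               # inverse of g
    return f_x(x) / np.abs(g_prime(x))

def f_y_missing_abs(y):
    x = -np.log(y)
    return f_x(x) / g_prime(x)   # sign error: yields a negative "density"

y = np.array([0.1, 0.5, 0.9])
print(f_y_correct(y))
print(f_y_missing_abs(y))
```

With the absolute value the result is the constant density 1 on $(0, 1)$; without it every value is negative, which cannot be a density.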

Pitfall 2: Missing Branches for Non-Monotonic Transformations

For transformations like $Y = X^2$, both $+\sqrt{y}$ and $-\sqrt{y}$ map to the same $y$. You must sum contributions from all inverse branches.

Pitfall 3: Confusing Jacobian Directions

There are two equivalent formulations:

  • $f_Y(y) = f_X(g^{-1}(y)) \cdot \left|\frac{d}{dy}g^{-1}(y)\right|$ (Jacobian of inverse)
  • $f_Y(y) = \frac{f_X(g^{-1}(y))}{|g'(g^{-1}(y))|}$ (reciprocal of Jacobian of forward)

These are equivalent but look different. Be consistent!

Pitfall 4: Forgetting the Support

The support (domain where PDF is nonzero) changes under transformation. If $X \in \mathbb{R}$ and $Y = e^X$, then $Y \in (0, \infty)$. Always specify the new support.

Pitfall 5: Singular Jacobians

If $\det(J) = 0$ at some point, the inverse function theorem no longer guarantees a differentiable local inverse there, and the PDF formula would divide by zero. This happens at:

  • Critical points of the transformation (where $g'(x) = 0$)
  • Folding points in multi-dimensional maps

Summary: What You've Mastered

You now have a deep understanding of one of the most powerful tools in probability theory. Let's recap the key insights:

Core Concepts

  • The Jacobian measures local stretching/compression of space under transformation
  • Probability conservation requires dividing by the Jacobian: $f_Y(y) = f_X(g^{-1}(y)) / |g'(g^{-1}(y))|$
  • Non-monotonic functions require summing contributions from all inverse branches
  • Multivariate case uses the Jacobian matrix and its determinant
  • Computational efficiency in ML comes from designing transformations with tractable Jacobians

Practical Skills

  • Derive PDFs for transformed random variables using the change of variables formula
  • Compute Jacobian matrices and determinants numerically and analytically
  • Understand why polar coordinates have area element $dA = r\,dr\,d\theta$
  • Implement Jacobian transformations in Python/PyTorch
  • Design invertible neural networks with efficient Jacobian computation

AI/ML Connections

  • Normalizing Flows: Chain of invertible transforms with tractable Jacobians
  • VAEs: Reparameterization trick uses affine Jacobians
  • Bayesian Inference: Parameter transformations require Jacobian corrections
  • Density Estimation: Neural networks + Jacobians = exact likelihoods

The Big Picture

The Jacobian is the mathematical bridge that allows us to transform probability distributions while preserving their essential properties. Whether you're deriving the chi-square distribution from the normal, generating samples with the Box-Muller method, or training a state-of-the-art generative model, you're leveraging the same fundamental principle: probability must be conserved, and the Jacobian tells us how to redistribute it.

Next Steps

In the following sections, we'll apply the Jacobian method to:

  1. Sums of Random Variables: Derive the distribution of $Z = X + Y$
  2. Order Statistics: Find distributions of min, max, and k-th order statistics
  3. Convolutions: Understand the convolution theorem and its applications