Chapter 4: Multivariate Derivatives
“In one dimension, the derivative is a number. In n dimensions, it’s a vector, a matrix, or a tensor.”
Connection to the Central Thesis
Most real systems are multivariate. A robot arm has joint angles. A neural network has millions of weights. A climate model has temperature, pressure, and humidity at every grid point.
The governing dynamics are: $\(\frac{d\mathbf{x}}{dt} = \mathbf{f}(\mathbf{x}, t)\)$
where x ∈ ℝⁿ is the state vector. To understand this system from data, we need:
Gradients: How does a scalar output depend on vector input?
Jacobians: How does a vector output depend on vector input?
Hessians: What is the curvature of the landscape?
The approximation paradigm extends naturally: fit a multivariate surrogate, then differentiate it. But the curse of dimensionality makes this harder—we need exponentially more data to cover high-dimensional spaces.
PyDelt addresses this with separable fitting strategies and neural network surrogates that scale to high dimensions.
From Scalar to Vector
Your calculus course covered f: ℝ → ℝ. Real problems involve:
f: ℝⁿ → ℝ (scalar field, e.g., loss function)
f: ℝⁿ → ℝᵐ (vector field, e.g., neural network layer)
f: ℝⁿ → ℝᵐˣᵖ (tensor field)
Each requires different derivative objects.
Partial Derivatives
Definition
The partial derivative of f(x₁, …, xₙ) with respect to xᵢ:
Hold all other variables fixed, differentiate with respect to one.
Notation
Notation |
Meaning |
|---|---|
∂f/∂xᵢ |
Partial derivative |
fₓᵢ |
Subscript notation |
∂ᵢf |
Index notation |
Dᵢf |
Operator notation |
Computing from Data
For discrete data, use finite differences along each dimension:
import numpy as np
def partial_derivative(f_values, grid, dim, order=1):
"""
Compute partial derivative along dimension dim.
f_values: n-dimensional array of function values
grid: list of 1D arrays for each dimension
dim: which dimension to differentiate
"""
h = grid[dim][1] - grid[dim][0] # Assume uniform
# Central difference along specified axis
return np.gradient(f_values, h, axis=dim)
The Gradient
Definition
For f: ℝⁿ → ℝ, the gradient is the vector of partial derivatives:
Geometric Interpretation
Direction: Points toward steepest ascent
Magnitude: Rate of change in that direction
Perpendicular to level sets (contours)
The Gradient in ML
Gradient descent: move opposite to gradient to minimize loss.
Computing Gradients from Data
from pydelt.multivariate import MultivariateDerivatives
from pydelt.interpolation import SplineInterpolator
import numpy as np
# Generate 2D data
np.random.seed(42)
n_points = 500
x = np.random.randn(n_points, 2) # 2D input
y = x[:, 0]**2 + x[:, 1]**2 # f(x,y) = x² + y²
# Fit and compute gradient
mv = MultivariateDerivatives(SplineInterpolator, smoothing=0.1)
mv.fit(x, y)
gradient_func = mv.gradient()
# Evaluate gradient at test points
test_points = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
gradients = gradient_func(test_points)
# For f = x² + y², gradient = (2x, 2y)
print("Computed gradients:")
print(gradients)
print("\nTrue gradients:")
print(2 * test_points)
The Jacobian
Definition
For f: ℝⁿ → ℝᵐ, the Jacobian is the m×n matrix of partial derivatives:
Row i contains the gradient of fᵢ.
Interpretation
The Jacobian is the best linear approximation to f near a point:
The Jacobian in ML
Backpropagation: Chain rule with Jacobians
Normalizing flows: Jacobian determinant for density transformation
Sensitivity analysis: How outputs depend on inputs
Computing Jacobians from Data
# Vector-valued function: f(x,y) = (x+y, x*y)
x = np.random.randn(500, 2)
y = np.column_stack([x[:, 0] + x[:, 1], x[:, 0] * x[:, 1]])
mv = MultivariateDerivatives(SplineInterpolator, smoothing=0.1)
mv.fit(x, y)
jacobian_func = mv.jacobian()
# Evaluate at a point
point = np.array([[1.0, 2.0]])
J = jacobian_func(point)
# True Jacobian at (1, 2):
# [[1, 1], [2, 1]] (df1/dx=1, df1/dy=1, df2/dx=y=2, df2/dy=x=1)
print(f"Computed Jacobian:\n{J[0]}")
The Hessian
Definition
For f: ℝⁿ → ℝ, the Hessian is the n×n matrix of second partial derivatives:
For smooth functions, the Hessian is symmetric (mixed partials are equal).
Interpretation
The Hessian captures curvature:
Eigenvalues > 0: Convex (bowl-shaped)
Eigenvalues < 0: Concave (dome-shaped)
Mixed signs: Saddle point
The Hessian in ML
Newton’s method: Uses H⁻¹∇f for faster convergence
Second-order optimization: Adam, L-BFGS approximate Hessian
Loss landscape analysis: Characterize critical points
Critical Point Classification
At a critical point (∇f = 0):
Hessian Eigenvalues |
Classification |
|---|---|
All positive |
Local minimum |
All negative |
Local maximum |
Mixed signs |
Saddle point |
Some zero |
Degenerate (need higher order) |
In high dimensions, saddle points dominate. For a random function in n dimensions, the probability of a local minimum vs. saddle point decreases exponentially with n.
Computing Hessians from Data
# Scalar function
x = np.random.randn(500, 2)
y = x[:, 0]**2 + 2*x[:, 1]**2 # f = x² + 2y²
mv = MultivariateDerivatives(SplineInterpolator, smoothing=0.1)
mv.fit(x, y)
hessian_func = mv.hessian()
# Evaluate at origin
point = np.array([[0.0, 0.0]])
H = hessian_func(point)
# True Hessian: [[2, 0], [0, 4]]
print(f"Computed Hessian:\n{H[0]}")
The Laplacian
Definition
The Laplacian is the trace of the Hessian (sum of second derivatives):
Interpretation
Diffusion: Heat equation ∂u/∂t = α∇²u
Smoothness: Laplacian measures deviation from local average
Graph Laplacian: Discrete analog on networks
The Laplacian in ML
Graph neural networks: Message passing uses graph Laplacian
Regularization: Laplacian smoothing
Physics-informed ML: PDEs often involve Laplacian
laplacian_func = mv.laplacian()
lap = laplacian_func(point)
# For f = x² + 2y², Laplacian = 2 + 4 = 6
print(f"Computed Laplacian: {lap[0]}")
Directional Derivatives
Definition
The derivative of f in direction v (unit vector):
Interpretation
Rate of change of f as you move in direction v.
Maximum Rate of Change
The gradient direction gives maximum rate of change:
Direction: v = ∇f / ‖∇f‖
Rate: ‖∇f‖
The Chain Rule in Multiple Dimensions
Scalar Composition
If h(t) = f(g(t)) where g: ℝ → ℝⁿ and f: ℝⁿ → ℝ:
Vector Composition
If h = f ∘ g where g: ℝᵖ → ℝⁿ and f: ℝⁿ → ℝᵐ:
This is backpropagation: multiply Jacobians in reverse order.
Numerical Challenges
Curse of Dimensionality
To estimate gradients in n dimensions with k points per dimension:
Total points needed: kⁿ
For n=10, k=10: 10¹⁰ points!
Solutions:
Sparse sampling + interpolation
Automatic differentiation (if you have the function)
Dimensionality reduction
Mixed Partial Derivatives
Computing ∂²f/∂x∂y from data is hard:
Requires 2D local fitting
Noise amplifies twice
Most methods approximate as zero
PyDelt’s MultivariateDerivatives computes diagonal Hessian elements (∂²f/∂xᵢ²) but approximates mixed partials.
Boundary Effects
Gradients near the boundary of the data domain are unreliable:
Fewer neighbors for local fitting
Extrapolation artifacts
Consider trimming boundary regions
PyDelt’s Multivariate Interface
from pydelt.multivariate import MultivariateDerivatives
from pydelt.interpolation import SplineInterpolator, LlaInterpolator
# Choose any interpolator
mv = MultivariateDerivatives(
interpolator_class=SplineInterpolator,
smoothing=0.1
)
# Fit to data
mv.fit(input_data, output_data)
# Get derivative functions
gradient = mv.gradient() # For scalar outputs
jacobian = mv.jacobian() # For vector outputs
hessian = mv.hessian() # Second derivatives
laplacian = mv.laplacian() # Trace of Hessian
# Evaluate at query points
grad_values = gradient(query_points)
Limitations
Separable fitting: Fits 1D interpolators for each input-output pair
Mixed partials: Approximated as zero
High dimensions: Accuracy degrades with dimension
Data density: Needs sufficient coverage of input space
For exact multivariate derivatives, use automatic differentiation with neural networks.
Key Takeaways
Gradient (∇f): Direction of steepest ascent, used in optimization
Jacobian (J): Matrix of partials, used in backprop and flows
Hessian (H): Curvature matrix, classifies critical points
Laplacian (∇²f): Sum of second derivatives, appears in PDEs
Chain rule: Jacobian multiplication, foundation of backprop
Curse of dimensionality: High-D derivatives are expensive
Exercises
Gradient field: For f(x,y) = sin(x)cos(y), compute and plot the gradient field. Verify gradients point perpendicular to contours.
Saddle point: For f(x,y) = x² - y², compute the Hessian at (0,0). Verify it’s a saddle point (mixed eigenvalue signs).
Jacobian of rotation: For f(x,y) = (x cos θ - y sin θ, x sin θ + y cos θ), compute the Jacobian. Verify det(J) = 1 (area-preserving).
Laplacian smoothing: Generate a noisy 2D image. Apply Laplacian smoothing: u_new = u + α∇²u. Observe the effect.
Previous: ← Interpolation Methods | Next: Approximation Theory →