Chapter 3: Interpolation Methods

“To differentiate discrete data, first make it continuous. The choice of interpolation method determines everything.”

Connection to the Central Thesis

This chapter is the heart of the approximation paradigm. Interpolation creates the differentiable surrogate that makes everything else possible.

The true system evolves according to unknown dynamics. We observe discrete samples. By fitting an interpolant, we construct a continuous, differentiable function that approximates the true trajectory. This surrogate:

  • Exists everywhere in the domain (not just at data points)

  • Has well-defined derivatives (by construction)

  • Encodes smoothness assumptions about the true system

  • Yields to standard calculus tools

The choice of interpolation method is the most consequential decision in numerical differentiation. It determines what class of functions you’re searching over, what smoothness you assume, and what errors you’ll incur.

The Interpolation-Differentiation Pipeline

The standard approach to numerical differentiation:

  1. Fit a continuous function f̂(x) to discrete data (xᵢ, yᵢ)

  2. Differentiate f̂(x) analytically or numerically

  3. Evaluate f̂’(x) at desired points

The interpolation method determines:

  • Smoothness of derivatives

  • Bias-variance tradeoff

  • Computational cost

  • Behavior at boundaries

Polynomial Interpolation

Lagrange Interpolation

Given n+1 points, there’s a unique polynomial of degree n passing through all of them:

\[p(x) = \sum_{i=0}^{n} y_i \prod_{j \neq i} \frac{x - x_j}{x_i - x_j}\]

Problem: High-degree polynomials oscillate wildly (Runge’s phenomenon).

import numpy as np
import matplotlib.pyplot as plt

# Runge's function
f = lambda x: 1 / (1 + 25*x**2)

# Interpolate with increasing degree
x = np.linspace(-1, 1, 1000)
for n in [5, 10, 15]:
    xi = np.linspace(-1, 1, n+1)
    yi = f(xi)
    
    # Lagrange interpolation
    from numpy.polynomial import polynomial as P
    coeffs = np.polyfit(xi, yi, n)
    p = np.polyval(coeffs, x)
    
    plt.plot(x, p, label=f'degree {n}')

plt.plot(x, f(x), 'k--', label='true', linewidth=2)
plt.ylim(-1, 2)
plt.legend()
plt.title("Runge's Phenomenon")
plt.show()

Lesson: Global polynomial interpolation is unstable. Use piecewise methods instead.

Spline Interpolation

What is a Spline?

A spline of degree k is a piecewise polynomial where:

  • Each piece is degree k

  • Function is Cᵏ⁻¹ continuous (k-1 continuous derivatives)

  • Pieces join at knots

Cubic Splines (k=3)

Most common choice. Between knots xᵢ and xᵢ₊₁:

\[S_i(x) = a_i + b_i(x-x_i) + c_i(x-x_i)^2 + d_i(x-x_i)^3\]

Constraints:

  • Interpolation: S(xᵢ) = yᵢ

  • C¹ continuity: S’ᵢ(xᵢ₊₁) = S’ᵢ₊₁(xᵢ₊₁)

  • C² continuity: S’’ᵢ(xᵢ₊₁) = S’’ᵢ₊₁(xᵢ₊₁)

  • Boundary conditions (natural, clamped, not-a-knot)

Natural Cubic Splines

Boundary condition: S’’(x₀) = S’’(xₙ) = 0

This minimizes the “total curvature”:

\[\int_{x_0}^{x_n} (S''(x))^2 dx\]

among all C² interpolants.

Smoothing Splines

Don’t interpolate exactly—balance fit and smoothness:

\[\min_f \sum_{i=1}^n (y_i - f(x_i))^2 + \lambda \int (f''(x))^2 dx\]

The solution is a natural cubic spline with knots at all data points.

from scipy.interpolate import UnivariateSpline
import numpy as np

t = np.linspace(0, 2*np.pi, 100)
y = np.sin(t) + 0.1 * np.random.randn(100)

# Interpolating spline (s=0)
spline_interp = UnivariateSpline(t, y, s=0)

# Smoothing spline
spline_smooth = UnivariateSpline(t, y, s=1.0)

# Derivatives are analytical
dy_smooth = spline_smooth.derivative()(t)
d2y_smooth = spline_smooth.derivative(n=2)(t)

B-Splines

Splines represented in the B-spline basis:

\[S(x) = \sum_{i} c_i B_{i,k}(x)\]

where Bᵢ,ₖ are B-spline basis functions of degree k.

Advantages:

  • Numerically stable

  • Local support (changing one coefficient affects limited region)

  • Efficient algorithms (de Boor’s algorithm)

from scipy.interpolate import BSpline, make_interp_spline

# Create B-spline interpolant
spline = make_interp_spline(t, y, k=3)

# Evaluate and differentiate
y_interp = spline(t_new)
dy_interp = spline.derivative()(t_new)

Local Polynomial Regression

The Idea

Instead of fitting one global function, fit local polynomials around each query point.

At query point x₀:

  1. Select nearby data points

  2. Fit weighted polynomial (weights decrease with distance)

  3. Use polynomial value/derivative at x₀

LOWESS/LOESS

LOcally Weighted Scatterplot Smoothing:

\[\min_{\beta} \sum_{i=1}^n w_i(x_0) (y_i - \beta_0 - \beta_1(x_i - x_0) - ...)^2\]

where weights w_i(x₀) decrease with |xᵢ - x₀|.

Common weight function (tricube):

\[\begin{split}w(u) = \begin{cases} (1 - |u|^3)^3 & |u| < 1 \\ 0 & |u| \geq 1 \end{cases}\end{split}\]
from pydelt.interpolation import LowessInterpolator

# LOWESS with 10% of data in each local fit
lowess = LowessInterpolator(frac=0.1)
lowess.fit(t, y)

# Smooth values and derivatives
y_smooth = lowess(t)
dy_smooth = lowess.differentiate(order=1)(t)

Local Linear Approximation (LLA)

PyDelt’s LLA fits local polynomials with explicit derivative estimation:

from pydelt.interpolation import LlaInterpolator

# Window-based local fitting
lla = LlaInterpolator(window_size=11, poly_order=2)
lla.fit(t, y)

# First and second derivatives
dy = lla.differentiate(order=1)(t)
d2y = lla.differentiate(order=2)(t)

Comparison: Global vs. Local

Aspect

Global (Splines)

Local (LOWESS/LLA)

Smoothness

C² everywhere

May have discontinuities

Adaptivity

Same smoothing everywhere

Can adapt to local features

Computation

O(n) after setup

O(n²) naive, O(n log n) with trees

Extrapolation

Dangerous

Very dangerous

Outlier robustness

Poor

Good (with robust weights)

Kernel Methods

Kernel Smoothing

Generalization of local averaging:

\[\hat{f}(x) = \frac{\sum_i K_h(x - x_i) y_i}{\sum_i K_h(x - x_i)}\]

where Kₕ(u) = K(u/h)/h is a scaled kernel.

Common Kernels

Kernel

Formula

Properties

Gaussian

exp(-u²/2)

Infinitely smooth

Epanechnikov

(3/4)(1-u²) for

u

Tricube

(1-

u

Uniform

1 for

u

Bandwidth Selection

The bandwidth h controls bias-variance:

  • Small h: Low bias, high variance

  • Large h: High bias, low variance

Optimal bandwidth (for MSE) scales as n^(-1/5) for kernel regression.

Radial Basis Functions (RBF)

The Approach

Represent the function as:

\[f(x) = \sum_{i=1}^n c_i \phi(\|x - x_i\|)\]

where φ is a radial basis function.

Common RBFs

RBF

Formula

Properties

Gaussian

exp(-r²/ε²)

Infinitely smooth

Multiquadric

√(1 + (εr)²)

Smooth, good for interpolation

Thin plate spline

r² log(r)

Minimizes bending energy

Polyharmonic

rᵏ or rᵏ log(r)

Generalizes thin plate

Interpolation with RBFs

Solve the linear system:

\[\begin{split}\begin{pmatrix} \Phi & P \\ P^T & 0 \end{pmatrix} \begin{pmatrix} c \\ \lambda \end{pmatrix} = \begin{pmatrix} y \\ 0 \end{pmatrix}\end{split}\]

where Φᵢⱼ = φ(‖xᵢ - xⱼ‖) and P contains polynomial terms.

from scipy.interpolate import RBFInterpolator
import numpy as np

# 1D example
x = np.linspace(0, 2*np.pi, 20).reshape(-1, 1)
y = np.sin(x.flatten())

# RBF interpolation
rbf = RBFInterpolator(x, y, kernel='thin_plate_spline')

# Evaluate
x_new = np.linspace(0, 2*np.pi, 100).reshape(-1, 1)
y_new = rbf(x_new)

Gaussian Process Regression

Probabilistic Interpolation

Model the function as a Gaussian process:

\[f(x) \sim \mathcal{GP}(m(x), k(x, x'))\]

where m(x) is the mean function and k(x, x’) is the covariance kernel.

Posterior Mean and Variance

Given data (X, y), the posterior at new point x* is:

\[f(x^*) | X, y \sim \mathcal{N}(\mu^*, \sigma^{*2})\]

where: $\(\mu^* = k(x^*, X) [K + \sigma_n^2 I]^{-1} y\)\( \)\(\sigma^{*2} = k(x^*, x^*) - k(x^*, X) [K + \sigma_n^2 I]^{-1} k(X, x^*)\)$

Derivatives of GPs

If f ~ GP, then f’ ~ GP with:

  • Mean: derivative of posterior mean

  • Covariance: derivative of kernel

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# GP regression
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel)
gp.fit(t.reshape(-1, 1), y)

# Predictions with uncertainty
y_pred, y_std = gp.predict(t_new.reshape(-1, 1), return_std=True)

PyDelt’s Interpolation Classes

Unified Interface

All interpolators share the same API:

from pydelt.interpolation import (
    SplineInterpolator,
    LlaInterpolator,
    GllaInterpolator,
    LowessInterpolator,
    LoessInterpolator,
    FdaInterpolator,
    NeuralNetworkInterpolator
)

# Same pattern for all
interp = SplineInterpolator(smoothing=0.1)
interp.fit(t, y)

# Evaluate
y_smooth = interp(t_query)

# Differentiate
dy = interp.differentiate(order=1)(t_query)
d2y = interp.differentiate(order=2)(t_query)

Method Comparison

Method

Best For

Smoothing Control

Derivative Quality

Spline

Smooth functions

smoothing parameter

Excellent (analytical)

LLA

Local features

window_size

Good (polynomial)

GLLA

Noisy + local

embedding, n

Good

LOWESS

Outliers

frac

Moderate (numerical)

FDA

Functional data

Basis functions

Excellent

Neural

Complex patterns

Architecture

Exact (autodiff)

Choosing an Interpolation Method

Decision Tree

  1. Is data very noisy?

    • Yes → Use smoothing spline or LOWESS

    • No → Interpolating spline may work

  2. Are there outliers?

    • Yes → LOWESS/LOESS (robust)

    • No → Splines are fine

  3. Do you need derivatives at non-data points?

    • Yes → Splines or GP (continuous derivatives)

    • No → Local methods are fine

  4. Is the function highly nonlinear/complex?

    • Yes → Neural network or GP

    • No → Splines or local polynomials

  5. Is computational speed critical?

    • Yes → Splines (O(n) after setup)

    • No → Any method

Key Takeaways

  1. Avoid high-degree global polynomials (Runge’s phenomenon)

  2. Splines are the workhorse: piecewise polynomials with continuity

  3. Local methods (LOWESS, LLA) adapt to local structure

  4. Smoothing parameter controls bias-variance tradeoff

  5. PyDelt provides unified interface across all methods

Exercises

  1. Runge’s phenomenon: Interpolate 1/(1+25x²) with polynomials of degree 5, 10, 15, 20. Then use cubic splines. Compare.

  2. Spline vs. LOWESS: Generate sin(x) + outliers. Compare derivative estimates from smoothing spline vs. LOWESS.

  3. Bandwidth selection: Implement cross-validation for LOWESS bandwidth. Find optimal for sin(x) + noise.

  4. GP derivatives: Use sklearn’s GP to fit noisy data. Extract derivative by finite-differencing the posterior mean. Compare to analytical GP derivative.


Previous: ← Noise and Smoothing | Next: Multivariate Derivatives →