Chapter 1: Numerical Differentiation

“The derivative is a limit. In practice, we can’t take limits—we can only approximate them.”

Connection to the Central Thesis

Recall our core problem: we observe a system governed by unknown dynamics dx/dt = f(x,t), but we only have discrete, noisy samples. We cannot compute the exact derivative—we can only approximate it.

Finite differences are the simplest approximation. They work directly on the data without fitting a surrogate. This makes them fast and assumption-free, but also fragile: they amplify noise and require careful step size selection.

Understanding finite differences is essential because:

  1. They reveal why differentiation is hard (ill-posed inverse problem)

  2. They establish error analysis techniques used throughout

  3. They motivate why we need smoothing and interpolation

The Fundamental Problem

You have discrete samples (tᵢ, yᵢ) and want to estimate dy/dt. The textbook definition:

\[f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}\]

is useless because:

  1. You can’t take h → 0 with finite precision arithmetic

  2. Your data has gaps—you can’t evaluate f at arbitrary points

  3. Your measurements have noise

This chapter covers the classical approaches and their limitations.

Finite Difference Formulas

Forward, Backward, and Central Differences

Given evenly-spaced data with step h:

Formula

Expression

Error Order

Forward

(f(x+h) - f(x)) / h

O(h)

Backward

(f(x) - f(x-h)) / h

O(h)

Central

(f(x+h) - f(x-h)) / 2h

O(h²)

The central difference is more accurate because errors cancel symmetrically.

Derivation via Taylor Series

Expand f(x+h) and f(x-h) around x:

\[f(x+h) = f(x) + hf'(x) + \frac{h^2}{2}f''(x) + \frac{h^3}{6}f'''(x) + O(h^4)\]
\[f(x-h) = f(x) - hf'(x) + \frac{h^2}{2}f''(x) - \frac{h^3}{6}f'''(x) + O(h^4)\]

Subtracting:

\[f(x+h) - f(x-h) = 2hf'(x) + \frac{h^3}{3}f'''(x) + O(h^5)\]

So:

\[\frac{f(x+h) - f(x-h)}{2h} = f'(x) + \frac{h^2}{6}f'''(x) + O(h^4)\]

The error is O(h²), not O(h).

Higher-Order Formulas

More points = higher accuracy:

Five-point stencil for f’(x): $\(f'(x) \approx \frac{-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)}{12h} + O(h^4)\)$

Second derivative (central): $\(f''(x) \approx \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} + O(h^2)\)$

Implementation

import numpy as np

def finite_diff_derivative(y, h, order=1, accuracy=2):
    """
    Compute derivative using finite differences.
    
    Parameters:
        y: array of function values
        h: step size (assumed uniform)
        order: derivative order (1 or 2)
        accuracy: 2 or 4 (determines stencil width)
    """
    n = len(y)
    dy = np.zeros(n)
    
    if order == 1 and accuracy == 2:
        # Central difference O(h²)
        dy[1:-1] = (y[2:] - y[:-2]) / (2*h)
        dy[0] = (y[1] - y[0]) / h  # Forward at boundary
        dy[-1] = (y[-1] - y[-2]) / h  # Backward at boundary
        
    elif order == 1 and accuracy == 4:
        # Five-point stencil O(h⁴)
        dy[2:-2] = (-y[4:] + 8*y[3:-1] - 8*y[1:-3] + y[:-4]) / (12*h)
        # Fall back to lower order at boundaries
        dy[0] = (y[1] - y[0]) / h
        dy[1] = (y[2] - y[0]) / (2*h)
        dy[-2] = (y[-1] - y[-3]) / (2*h)
        dy[-1] = (y[-1] - y[-2]) / h
        
    elif order == 2 and accuracy == 2:
        # Second derivative O(h²)
        dy[1:-1] = (y[2:] - 2*y[1:-1] + y[:-2]) / h**2
        dy[0] = dy[1]  # Extrapolate at boundaries
        dy[-1] = dy[-2]
        
    return dy

The Two Sources of Error

Truncation Error

From dropping higher-order Taylor terms. For central difference:

\[\epsilon_{\text{trunc}} \approx \frac{h^2}{6} |f'''(x)|\]

Decreases as h → 0.

Rounding Error

From finite precision arithmetic. When subtracting nearly-equal numbers:

\[\epsilon_{\text{round}} \approx \frac{\epsilon_{\text{machine}} |f(x)|}{h}\]

where ε_machine ≈ 10⁻¹⁶ for float64.

Increases as h → 0.

The Optimal Step Size

Total error ≈ truncation + rounding:

\[\epsilon_{\text{total}} \approx \frac{h^2}{6}|f'''| + \frac{\epsilon_{\text{machine}}|f|}{h}\]

Minimize by taking derivative with respect to h and setting to zero:

\[h_{\text{opt}} \approx \left(\frac{3\epsilon_{\text{machine}}|f|}{|f'''|}\right)^{1/3}\]

For typical functions and float64: h_opt ≈ 10⁻⁵ to 10⁻⁶.

import numpy as np
import matplotlib.pyplot as plt

def test_optimal_h():
    """Demonstrate optimal step size."""
    f = np.sin
    f_prime = np.cos
    x = 1.0
    
    h_values = np.logspace(-15, 0, 100)
    errors = []
    
    for h in h_values:
        approx = (f(x + h) - f(x - h)) / (2 * h)
        true = f_prime(x)
        errors.append(abs(approx - true))
    
    plt.loglog(h_values, errors)
    plt.xlabel('Step size h')
    plt.ylabel('Absolute error')
    plt.title('Finite Difference Error vs Step Size')
    plt.axvline(1e-5, color='r', linestyle='--', label='Optimal h ≈ 10⁻⁵')
    plt.legend()
    plt.show()
    
    # Find optimal
    opt_idx = np.argmin(errors)
    print(f"Optimal h: {h_values[opt_idx]:.2e}, Error: {errors[opt_idx]:.2e}")

The Noise Amplification Problem

With noisy data y = f(x) + ε where ε ~ N(0, σ²):

\[\frac{y(x+h) - y(x-h)}{2h} = f'(x) + \frac{\epsilon(x+h) - \epsilon(x-h)}{2h}\]

The noise term has variance:

\[\text{Var}\left(\frac{\epsilon(x+h) - \epsilon(x-h)}{2h}\right) = \frac{2\sigma^2}{4h^2} = \frac{\sigma^2}{2h^2}\]

Noise in derivative scales as σ/h. Smaller h = more noise amplification.

Demonstration

import numpy as np
from pydelt.interpolation import SplineInterpolator

# True function and derivative
t = np.linspace(0, 2*np.pi, 200)
y_true = np.sin(t)
dy_true = np.cos(t)

# Add noise
noise_level = 0.02
y_noisy = y_true + noise_level * np.random.randn(len(t))

# Finite difference (naive)
h = t[1] - t[0]
dy_fd = np.gradient(y_noisy, h)

# Compare noise levels
print(f"Signal noise std: {np.std(y_noisy - y_true):.4f}")
print(f"Derivative noise std: {np.std(dy_fd - dy_true):.4f}")
print(f"Amplification factor: {np.std(dy_fd - dy_true) / np.std(y_noisy - y_true):.1f}x")

Irregular Grids

Real data often has non-uniform spacing. The formulas change:

Two-Point (Non-uniform)

\[f'(x_i) \approx \frac{f(x_{i+1}) - f(x_{i-1})}{x_{i+1} - x_{i-1}}\]

Three-Point (Non-uniform)

For points x₀, x₁, x₂ with spacings h₁ = x₁ - x₀ and h₂ = x₂ - x₁:

\[f'(x_1) \approx \frac{h_1^2 f(x_2) - h_2^2 f(x_0) + (h_2^2 - h_1^2) f(x_1)}{h_1 h_2 (h_1 + h_2)}\]

Fornberg’s Algorithm

For arbitrary stencils, use Fornberg’s algorithm to compute optimal weights:

def fornberg_weights(z, x, m):
    """
    Compute finite difference weights for derivative of order m
    at point z using function values at points x.
    
    Based on Fornberg (1988).
    """
    n = len(x) - 1
    c = np.zeros((n+1, m+1))
    c[0, 0] = 1
    c1 = 1
    
    for i in range(1, n+1):
        mn = min(i, m)
        c2 = 1
        for j in range(i):
            c3 = x[i] - x[j]
            c2 *= c3
            if i <= m:
                c[i-1, i] = 0
            for k in range(mn, -1, -1):
                c[i, k] = c1 * (k * c[i-1, k-1] - (x[i-1] - z) * c[i-1, k]) / c2
                c[j, k] = ((x[i] - z) * c[j, k] - k * c[j, k-1]) / c3
        c1 = c2
    
    return c[:, m]

PyDelt’s Finite Difference Methods

PyDelt provides robust finite difference implementations:

from pydelt import finite_difference

# Basic usage
t = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * t)

# First derivative
dy = finite_difference(t, y, order=1)

# Second derivative
d2y = finite_difference(t, y, order=2)

# With specified accuracy
dy_high = finite_difference(t, y, order=1, accuracy=4)

When Finite Differences Fail

Finite differences are inadequate when:

  1. Noise level is significant (σ/h becomes large)

  2. Data is sparse (can’t use higher-order stencils)

  3. Function has sharp features (truncation error dominates)

  4. You need derivatives at non-sample points

The solution: smoothing and interpolation, covered in the next chapters.

Key Takeaways

  1. Central differences are O(h²), better than forward/backward O(h)

  2. Optimal h balances truncation and rounding error (~10⁻⁵ for float64)

  3. Noise amplifies as 1/h—smaller steps = more noise

  4. Irregular grids require modified formulas (Fornberg’s algorithm)

  5. Finite differences alone are insufficient for noisy data

Exercises

  1. Optimal step size: For f(x) = exp(x) at x = 0, find the step size that minimizes total error. Compare to the theoretical prediction.

  2. Noise amplification: Generate sin(x) with Gaussian noise σ = 0.01. Plot derivative error vs. step size h. At what h does noise dominate?

  3. Higher-order stencils: Implement the 5-point stencil for f’(x). Compare accuracy to 3-point for f(x) = x⁵ at x = 1.

  4. Non-uniform grid: Generate data on a random grid. Implement the 3-point non-uniform formula and compare to linear interpolation + uniform finite difference.


Previous: ← The Numerical Calculus Problem | Next: Noise and Smoothing →