Chapter 2: Noise and Smoothing

“Differentiation amplifies noise. Smoothing reduces noise but introduces bias. There is no free lunch.”

Connection to the Central Thesis

Real measurements are never clean. Sensors have precision limits. Environments fluctuate. Sampling introduces quantization. The data we observe is:

\[y_i = x(t_i) + \epsilon_i\]

where x(t) is the true state and ε is measurement noise.

This creates a fundamental tension: differentiation amplifies noise, but smoothing distorts the signal. The approximation paradigm addresses this by treating numerical differentiation as a regularized inverse problem—we accept some bias to control variance.

The smoothing parameter encodes our belief about the true system: “the underlying dynamics are probably smooth, so extreme oscillations in the derivative are likely noise, not signal.”

The Fundamental Tradeoff

Every numerical differentiation method faces the same dilemma:

  • No smoothing: Derivatives are noisy (high variance)

  • Heavy smoothing: Derivatives are biased (systematic error)

This is the bias-variance tradeoff applied to differentiation.

Why Differentiation Amplifies Noise

Frequency Domain Perspective

Differentiation in the frequency domain is multiplication by iω:

\[\mathcal{F}[f'(t)] = i\omega \cdot \mathcal{F}[f(t)]\]

High-frequency components get amplified. Noise is typically high-frequency. Therefore, differentiation amplifies noise.

Quantitative Analysis

If your signal is y(t) = f(t) + ε(t) where ε is white noise with power spectral density S_ε:

  • Signal derivative power: |iω|² S_f(ω) = ω² S_f(ω)

  • Noise derivative power: |iω|² S_ε = ω² S_ε

The signal-to-noise ratio after differentiation:

\[\text{SNR}_{\text{derivative}} = \frac{\omega^2 S_f(\omega)}{\omega^2 S_\epsilon} = \frac{S_f(\omega)}{S_\epsilon}\]

The SNR is unchanged only if signal and noise have the same spectrum. In practice, noise dominates at high frequencies, so differentiation destroys SNR.

Smoothing as Regularization

The Regularization Framework

Instead of solving: $\(\text{Find } f' \text{ such that } f \text{ matches data}\)$

Solve: $\(\text{Minimize } \|f - \text{data}\|^2 + \lambda \cdot \text{Roughness}(f)\)$

where λ controls the bias-variance tradeoff:

  • λ = 0: Interpolate exactly (high variance)

  • λ → ∞: Maximally smooth (high bias)

Common Roughness Penalties

Penalty

Formula

Effect

First derivative

∫(f’)² dt

Penalizes slopes

Second derivative

∫(f’’)² dt

Penalizes curvature

Total variation

f’

The second derivative penalty is most common—it produces cubic smoothing splines.

Smoothing Methods

1. Moving Average

The simplest smoother: replace each point with the average of its neighbors.

import numpy as np

def moving_average(y, window):
    """Simple moving average."""
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode='same')

Properties:

  • Reduces variance by factor of 1/window

  • Introduces phase lag

  • Blurs sharp features

  • Not optimal for any particular noise model

2. Savitzky-Golay Filter

Fit a polynomial locally, use the polynomial’s value (or derivative) at center.

from scipy.signal import savgol_filter

# Smooth and differentiate in one step
y_smooth = savgol_filter(y, window_length=11, polyorder=3)
dy_smooth = savgol_filter(y, window_length=11, polyorder=3, deriv=1, delta=dt)

Properties:

  • Preserves moments up to polynomial order

  • Better edge preservation than moving average

  • Analytical derivatives from fitted polynomial

  • Window size and polynomial order are tuning parameters

3. Gaussian Smoothing

Convolve with a Gaussian kernel:

\[y_{\text{smooth}}(t) = \int y(\tau) \cdot \frac{1}{\sqrt{2\pi}\sigma} e^{-(t-\tau)^2/2\sigma^2} d\tau\]
from scipy.ndimage import gaussian_filter1d

y_smooth = gaussian_filter1d(y, sigma=2.0)

Properties:

  • Optimal for Gaussian noise (in MSE sense)

  • No ringing artifacts

  • σ controls smoothing strength

  • Derivative of Gaussian = smoothed derivative

4. Kernel Regression (Nadaraya-Watson)

Weighted average with kernel weights:

\[\hat{f}(x) = \frac{\sum_i K\left(\frac{x - x_i}{h}\right) y_i}{\sum_i K\left(\frac{x - x_i}{h}\right)}\]

where K is a kernel (Gaussian, Epanechnikov, etc.) and h is the bandwidth.

from sklearn.kernel_ridge import KernelRidge

# Kernel regression
kr = KernelRidge(kernel='rbf', gamma=1.0)
kr.fit(t.reshape(-1, 1), y)
y_smooth = kr.predict(t.reshape(-1, 1))

5. Spline Smoothing

Minimize:

\[\sum_i (y_i - f(t_i))^2 + \lambda \int (f''(t))^2 dt\]

The solution is a natural cubic spline with knots at data points.

from scipy.interpolate import UnivariateSpline

# Smoothing spline (s controls smoothing)
spline = UnivariateSpline(t, y, s=len(y)*noise_variance)
y_smooth = spline(t)
dy_smooth = spline.derivative()(t)

The Bias-Variance Decomposition

For any estimator f̂ of true function f:

\[\mathbb{E}[(f̂ - f)^2] = \text{Bias}^2 + \text{Variance}\]

where:

  • Bias = E[f̂] - f (systematic error from smoothing)

  • Variance = E[(f̂ - E[f̂])²] (random error from noise)

Visualizing the Tradeoff

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline

# True function
t = np.linspace(0, 2*np.pi, 200)
y_true = np.sin(t)
dy_true = np.cos(t)

# Noisy observations
np.random.seed(42)
noise = 0.1
y_noisy = y_true + noise * np.random.randn(len(t))

# Different smoothing levels
smoothing_values = [0.01, 0.1, 1.0, 10.0]

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.flatten()

for ax, s in zip(axes, smoothing_values):
    spline = UnivariateSpline(t, y_noisy, s=s*len(t))
    dy_est = spline.derivative()(t)
    
    ax.plot(t, dy_true, 'k-', label='True derivative', linewidth=2)
    ax.plot(t, dy_est, 'r-', label=f'Estimated (s={s})', alpha=0.8)
    ax.set_title(f'Smoothing = {s}')
    ax.legend()
    ax.set_xlabel('t')
    ax.set_ylabel("f'(t)")

plt.tight_layout()
plt.show()

Choosing the Smoothing Parameter

1. Cross-Validation

Leave out data points, predict them, minimize prediction error:

\[\text{CV}(\lambda) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{f}_{-i}(x_i))^2\]

where f̂₋ᵢ is fitted without point i.

2. Generalized Cross-Validation (GCV)

Efficient approximation to leave-one-out CV:

\[\text{GCV}(\lambda) = \frac{\frac{1}{n}\sum_i (y_i - \hat{f}(x_i))^2}{(1 - \text{tr}(S)/n)^2}\]

where S is the smoothing matrix (ŷ = Sy).

3. AIC/BIC

Information criteria that penalize complexity:

\[\text{AIC} = n \log(\text{RSS}/n) + 2 \cdot \text{df}\]
\[\text{BIC} = n \log(\text{RSS}/n) + \log(n) \cdot \text{df}\]

where df is the effective degrees of freedom.

4. Known Noise Level

If you know σ², set smoothing so residual variance ≈ σ²:

# For UnivariateSpline, s ≈ n * sigma^2
spline = UnivariateSpline(t, y, s=len(t) * sigma**2)

Differentiation-Specific Considerations

Smoothing for Derivatives vs. Values

The optimal smoothing for estimating f(x) differs from optimal smoothing for f’(x):

  • For values: Moderate smoothing

  • For derivatives: More smoothing needed (noise amplification)

  • For second derivatives: Even more smoothing

Rule of thumb: Increase smoothing parameter by ~2-4x per derivative order.

Boundary Effects

Smoothing methods often perform poorly near boundaries:

  • Less data available for local averaging

  • Splines may oscillate

  • Derivatives especially unreliable

Solutions:

  • Extend data with padding (reflection, extrapolation)

  • Use boundary-aware methods

  • Trim boundary regions from analysis

Preserving Features

Heavy smoothing can destroy important features:

  • Peaks get flattened

  • Edges get blurred

  • Oscillations get damped

Solutions:

  • Adaptive smoothing (less smoothing where curvature is high)

  • Edge-preserving methods (total variation, bilateral filter)

  • Physics-informed constraints

PyDelt’s Smoothing Options

from pydelt.interpolation import SplineInterpolator, LowessInterpolator

# Spline with explicit smoothing
spline = SplineInterpolator(smoothing=0.1)
spline.fit(t, y)
dy = spline.differentiate(order=1)(t)

# LOWESS (robust to outliers)
lowess = LowessInterpolator(frac=0.1)
lowess.fit(t, y)
dy = lowess.differentiate(order=1)(t)

Practical Guidelines

When to Use What

Situation

Recommended Method

Gaussian noise, smooth function

Spline smoothing

Outliers present

LOWESS/LOESS

Sharp features to preserve

Savitzky-Golay or adaptive

Very high noise

Heavy smoothing + accept bias

Unknown noise level

Cross-validation

Diagnostic Checks

  1. Residual analysis: Residuals should look like noise, not signal

  2. Sensitivity analysis: How much do results change with smoothing parameter?

  3. Physical plausibility: Do derivatives make sense (sign, magnitude)?

Key Takeaways

  1. Differentiation amplifies noise by factor ~1/h or ~ω in frequency domain

  2. Smoothing reduces variance but introduces bias

  3. Optimal smoothing depends on noise level and derivative order

  4. Cross-validation provides data-driven smoothing selection

  5. Boundaries and features require special attention

Exercises

  1. Bias-variance visualization: For sin(x) + noise, plot bias² and variance of derivative estimate vs. smoothing parameter. Find the optimal smoothing.

  2. Cross-validation: Implement leave-one-out CV for spline smoothing. Compare to GCV.

  3. Derivative order: For the same data, find optimal smoothing for f, f’, and f’’. How do they differ?

  4. Outlier robustness: Add outliers to your data. Compare spline vs. LOWESS derivative estimates.


Previous: ← Numerical Differentiation | Next: Interpolation Methods →