Chapter 2: Noise and Smoothing
“Differentiation amplifies noise. Smoothing reduces noise but introduces bias. There is no free lunch.”
Connection to the Central Thesis
Real measurements are never clean. Sensors have precision limits. Environments fluctuate. Sampling introduces quantization. The data we observe is:
where x(t) is the true state and ε is measurement noise.
This creates a fundamental tension: differentiation amplifies noise, but smoothing distorts the signal. The approximation paradigm addresses this by treating numerical differentiation as a regularized inverse problem—we accept some bias to control variance.
The smoothing parameter encodes our belief about the true system: “the underlying dynamics are probably smooth, so extreme oscillations in the derivative are likely noise, not signal.”
The Fundamental Tradeoff
Every numerical differentiation method faces the same dilemma:
No smoothing: Derivatives are noisy (high variance)
Heavy smoothing: Derivatives are biased (systematic error)
This is the bias-variance tradeoff applied to differentiation.
Why Differentiation Amplifies Noise
Frequency Domain Perspective
Differentiation in the frequency domain is multiplication by iω:
High-frequency components get amplified. Noise is typically high-frequency. Therefore, differentiation amplifies noise.
Quantitative Analysis
If your signal is y(t) = f(t) + ε(t) where ε is white noise with power spectral density S_ε:
Signal derivative power: |iω|² S_f(ω) = ω² S_f(ω)
Noise derivative power: |iω|² S_ε = ω² S_ε
The signal-to-noise ratio after differentiation:
The SNR is unchanged only if signal and noise have the same spectrum. In practice, noise dominates at high frequencies, so differentiation destroys SNR.
Smoothing as Regularization
The Regularization Framework
Instead of solving: $\(\text{Find } f' \text{ such that } f \text{ matches data}\)$
Solve: $\(\text{Minimize } \|f - \text{data}\|^2 + \lambda \cdot \text{Roughness}(f)\)$
where λ controls the bias-variance tradeoff:
λ = 0: Interpolate exactly (high variance)
λ → ∞: Maximally smooth (high bias)
Common Roughness Penalties
Penalty |
Formula |
Effect |
|---|---|---|
First derivative |
∫(f’)² dt |
Penalizes slopes |
Second derivative |
∫(f’’)² dt |
Penalizes curvature |
Total variation |
∫ |
f’ |
The second derivative penalty is most common—it produces cubic smoothing splines.
Smoothing Methods
1. Moving Average
The simplest smoother: replace each point with the average of its neighbors.
import numpy as np
def moving_average(y, window):
"""Simple moving average."""
kernel = np.ones(window) / window
return np.convolve(y, kernel, mode='same')
Properties:
Reduces variance by factor of 1/window
Introduces phase lag
Blurs sharp features
Not optimal for any particular noise model
2. Savitzky-Golay Filter
Fit a polynomial locally, use the polynomial’s value (or derivative) at center.
from scipy.signal import savgol_filter
# Smooth and differentiate in one step
y_smooth = savgol_filter(y, window_length=11, polyorder=3)
dy_smooth = savgol_filter(y, window_length=11, polyorder=3, deriv=1, delta=dt)
Properties:
Preserves moments up to polynomial order
Better edge preservation than moving average
Analytical derivatives from fitted polynomial
Window size and polynomial order are tuning parameters
3. Gaussian Smoothing
Convolve with a Gaussian kernel:
from scipy.ndimage import gaussian_filter1d
y_smooth = gaussian_filter1d(y, sigma=2.0)
Properties:
Optimal for Gaussian noise (in MSE sense)
No ringing artifacts
σ controls smoothing strength
Derivative of Gaussian = smoothed derivative
4. Kernel Regression (Nadaraya-Watson)
Weighted average with kernel weights:
where K is a kernel (Gaussian, Epanechnikov, etc.) and h is the bandwidth.
from sklearn.kernel_ridge import KernelRidge
# Kernel regression
kr = KernelRidge(kernel='rbf', gamma=1.0)
kr.fit(t.reshape(-1, 1), y)
y_smooth = kr.predict(t.reshape(-1, 1))
5. Spline Smoothing
Minimize:
The solution is a natural cubic spline with knots at data points.
from scipy.interpolate import UnivariateSpline
# Smoothing spline (s controls smoothing)
spline = UnivariateSpline(t, y, s=len(y)*noise_variance)
y_smooth = spline(t)
dy_smooth = spline.derivative()(t)
The Bias-Variance Decomposition
For any estimator f̂ of true function f:
where:
Bias = E[f̂] - f (systematic error from smoothing)
Variance = E[(f̂ - E[f̂])²] (random error from noise)
Visualizing the Tradeoff
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline
# True function
t = np.linspace(0, 2*np.pi, 200)
y_true = np.sin(t)
dy_true = np.cos(t)
# Noisy observations
np.random.seed(42)
noise = 0.1
y_noisy = y_true + noise * np.random.randn(len(t))
# Different smoothing levels
smoothing_values = [0.01, 0.1, 1.0, 10.0]
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.flatten()
for ax, s in zip(axes, smoothing_values):
spline = UnivariateSpline(t, y_noisy, s=s*len(t))
dy_est = spline.derivative()(t)
ax.plot(t, dy_true, 'k-', label='True derivative', linewidth=2)
ax.plot(t, dy_est, 'r-', label=f'Estimated (s={s})', alpha=0.8)
ax.set_title(f'Smoothing = {s}')
ax.legend()
ax.set_xlabel('t')
ax.set_ylabel("f'(t)")
plt.tight_layout()
plt.show()
Choosing the Smoothing Parameter
1. Cross-Validation
Leave out data points, predict them, minimize prediction error:
where f̂₋ᵢ is fitted without point i.
2. Generalized Cross-Validation (GCV)
Efficient approximation to leave-one-out CV:
where S is the smoothing matrix (ŷ = Sy).
3. AIC/BIC
Information criteria that penalize complexity:
where df is the effective degrees of freedom.
4. Known Noise Level
If you know σ², set smoothing so residual variance ≈ σ²:
# For UnivariateSpline, s ≈ n * sigma^2
spline = UnivariateSpline(t, y, s=len(t) * sigma**2)
Differentiation-Specific Considerations
Smoothing for Derivatives vs. Values
The optimal smoothing for estimating f(x) differs from optimal smoothing for f’(x):
For values: Moderate smoothing
For derivatives: More smoothing needed (noise amplification)
For second derivatives: Even more smoothing
Rule of thumb: Increase smoothing parameter by ~2-4x per derivative order.
Boundary Effects
Smoothing methods often perform poorly near boundaries:
Less data available for local averaging
Splines may oscillate
Derivatives especially unreliable
Solutions:
Extend data with padding (reflection, extrapolation)
Use boundary-aware methods
Trim boundary regions from analysis
Preserving Features
Heavy smoothing can destroy important features:
Peaks get flattened
Edges get blurred
Oscillations get damped
Solutions:
Adaptive smoothing (less smoothing where curvature is high)
Edge-preserving methods (total variation, bilateral filter)
Physics-informed constraints
PyDelt’s Smoothing Options
from pydelt.interpolation import SplineInterpolator, LowessInterpolator
# Spline with explicit smoothing
spline = SplineInterpolator(smoothing=0.1)
spline.fit(t, y)
dy = spline.differentiate(order=1)(t)
# LOWESS (robust to outliers)
lowess = LowessInterpolator(frac=0.1)
lowess.fit(t, y)
dy = lowess.differentiate(order=1)(t)
Practical Guidelines
When to Use What
Situation |
Recommended Method |
|---|---|
Gaussian noise, smooth function |
Spline smoothing |
Outliers present |
LOWESS/LOESS |
Sharp features to preserve |
Savitzky-Golay or adaptive |
Very high noise |
Heavy smoothing + accept bias |
Unknown noise level |
Cross-validation |
Diagnostic Checks
Residual analysis: Residuals should look like noise, not signal
Sensitivity analysis: How much do results change with smoothing parameter?
Physical plausibility: Do derivatives make sense (sign, magnitude)?
Key Takeaways
Differentiation amplifies noise by factor ~1/h or ~ω in frequency domain
Smoothing reduces variance but introduces bias
Optimal smoothing depends on noise level and derivative order
Cross-validation provides data-driven smoothing selection
Boundaries and features require special attention
Exercises
Bias-variance visualization: For sin(x) + noise, plot bias² and variance of derivative estimate vs. smoothing parameter. Find the optimal smoothing.
Cross-validation: Implement leave-one-out CV for spline smoothing. Compare to GCV.
Derivative order: For the same data, find optimal smoothing for f, f’, and f’’. How do they differ?
Outlier robustness: Add outliers to your data. Compare spline vs. LOWESS derivative estimates.
Previous: ← Numerical Differentiation | Next: Interpolation Methods →