Chapter 2: Derivatives Intuition
“The derivative measures instantaneous rate of change. It’s the slope of the tangent line. It tells you: if I nudge the input, how much does the output change?”
Three Ways to Think About Derivatives
The derivative is one of the most important concepts in all of mathematics—and it has multiple interpretations that are all equally valid.
Interpretation 1: Instantaneous Rate of Change
Imagine you’re driving a car. Your speedometer shows your instantaneous speed—how fast you’re going right now, not your average speed over the trip.
If your position is p(t) at time t, then:
Average speed over interval [t₁, t₂]: (p(t₂) - p(t₁)) / (t₂ - t₁)
Instantaneous speed at time t: p’(t) = lim[h→0] (p(t+h) - p(t)) / h
The derivative gives you the instantaneous rate of change.
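To see the limit in action, the sketch below shrinks the interval h and watches the average speed approach the instantaneous speed. The position function p(t) = t² is a hypothetical choice made only for illustration; its exact derivative at t = 3 is 6.

```python
def p(t):
    """Hypothetical position function (assumed for illustration): p(t) = t^2."""
    return t ** 2


def average_speed(t, h):
    """Average rate of change of p over the interval [t, t + h]."""
    return (p(t + h) - p(t)) / h


t = 3.0
for h in [1.0, 0.1, 0.01, 0.001]:
    # As h shrinks, the average speed approaches p'(3) = 6.
    print(f"h = {h:<6} average speed = {average_speed(t, h):.4f}")
```

For this particular p, the average speed works out to 6 + h, so the gap to the true derivative shrinks in step with h.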
Interpretation 2: Slope of the Tangent Line
Geometrically, the derivative at a point is the slope of the line that just touches the curve at that point.
[Figure: a curve with a tangent line touching it at a single point; the slope of the tangent line equals the derivative at that point.]
If the tangent line goes up (positive slope), the function is increasing
If the tangent line goes down (negative slope), the function is decreasing
If the tangent line is flat (zero slope), you may be at a local maximum or minimum, though a flat tangent can also occur at an inflection point (for example, f(x) = x³ at x = 0)
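The sign of the slope can be checked numerically. The following is a minimal sketch, assuming a central-difference estimate of the slope and two hypothetical test functions (x² and x³); the helper names are illustrative and not part of any library.

```python
def slope(f, x, h=1e-6):
    """Approximate f'(x) with a central difference (a numerical estimate, not exact)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def describe(f, x, tol=1e-4):
    """Classify the tangent line at x as rising, falling, or flat."""
    s = slope(f, x)
    if s > tol:
        return "increasing"
    if s < -tol:
        return "decreasing"
    return "flat"


def parabola(x):
    return x ** 2


def cubic(x):
    return x ** 3


print(describe(parabola, -1.0))  # slope is about -2
print(describe(parabola, 1.0))   # slope is about 2
print(describe(parabola, 0.0))   # flat, and a minimum
print(describe(cubic, 0.0))      # flat, but neither a maximum nor a minimum
```

The last call is a reminder that a flat tangent by itself does not tell you what kind of point you are at.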
Interpretation 3: Sensitivity (The ML Interpretation)
This is the most useful interpretation for machine learning:
The derivative tells you how sensitive the output is to changes in the input.
If f’(x) = 3, then a small change Δx in the input produces a change of approximately 3Δx in the output. The approximation improves as Δx shrinks.
This is exactly what gradients tell you during training:
Large gradient → weight has big impact on loss
Small gradient → weight has little impact on loss
Zero gradient → small changes to this weight don’t affect the loss to first order (at this point)
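The sensitivity reading can be tested directly: compare the actual change in the output with the first-order prediction f’(x)·Δx. The sketch below assumes a hypothetical function f(x) = x³, chosen because its derivative at x = 1 is exactly 3.

```python
def f(x):
    """Hypothetical function with f'(1) = 3, since d/dx x^3 = 3x^2."""
    return x ** 3


x = 1.0
derivative_at_x = 3.0  # f'(1) = 3 * 1^2

for dx in [0.1, 0.01, 0.001]:
    actual = f(x + dx) - f(x)          # true change in the output
    predicted = derivative_at_x * dx   # first-order estimate: f'(x) * dx
    print(f"dx = {dx:<6} actual = {actual:.6f} predicted = {predicted:.6f}")
```

The two columns agree more closely as dx shrinks, which is what "approximately" means here.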
The Derivative as a Function
The derivative of f(x) is itself a function, f’(x), that tells you the slope at every point.
Example: f(x) = x²
| x | f(x) = x² | f’(x) = 2x | Interpretation |
|---|---|---|---|
| -2 | 4 | -4 | Steeply decreasing |
| -1 | 1 | -2 | Decreasing |
| 0 | 0 | 0 | Flat (minimum!) |
| 1 | 1 | 2 | Increasing |
| 2 | 4 | 4 | Steeply increasing |
Notice: the derivative is zero exactly where the function has its minimum. This is the foundation of optimization!
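One way to see why this matters for optimization is to follow the slope downhill until it vanishes. The following is a minimal gradient descent sketch on f(x) = x²; the starting point and step size are arbitrary assumptions.

```python
def f_prime(x):
    """Derivative of f(x) = x^2."""
    return 2 * x


x = 2.0              # arbitrary starting point (assumed)
learning_rate = 0.1  # arbitrary step size (assumed)

for step in range(50):
    # Move against the slope; the update shrinks as f'(x) approaches zero.
    x -= learning_rate * f_prime(x)

print(f"x after 50 steps: {x:.6f}")
```

Each step multiplies x by 0.8, so the iterate settles toward x = 0, the point where the derivative is zero.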
Computing Derivatives with PyDelt
When you have data instead of formulas, PyDelt computes derivatives numerically:
import numpy as np
from pydelt.interpolation import SplineInterpolator
# Generate data from f(x) = x²
x = np.linspace(-3, 3, 100)
y = x**2
# Fit and differentiate
interpolator = SplineInterpolator(smoothing=0.01)
interpolator.fit(x, y)
derivative_func = interpolator.differentiate(order=1)
# Evaluate derivative
x_test = np.array([-2, -1, 0, 1, 2])
derivatives = derivative_func(x_test)
print(f"Computed: {derivatives}")
print(f"Exact: {2 * x_test}")
# Expected output, up to small numerical error from the spline fit:
# Computed: approximately [-4. -2. 0. 2. 4.]
# Exact: [-4 -2  0  2  4]
Higher-Order Derivatives
You can differentiate a derivative to get the second derivative, and so on:
f(x): Position
f’(x): Velocity (first derivative)
f’’(x): Acceleration (second derivative)
f’’’(x): Jerk (third derivative)
What Second Derivatives Tell You
The second derivative measures curvature—how the slope itself is changing:
f’’(x) > 0: Curve is concave up (like a smile 😊), slope is increasing
f’’(x) < 0: Curve is concave down (like a frown 😞), slope is decreasing
f’’(x) = 0: Possible inflection point. It is a true inflection point only if f’’ changes sign there; f(x) = x⁴ has f’’(0) = 0 but stays concave up
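Curvature can also be estimated directly from function values. The sketch below assumes the standard central second-difference formula and three hypothetical test functions; it is an illustration and not how PyDelt computes second derivatives.

```python
def second_derivative(f, x, h=1e-3):
    """Approximate f''(x) with a central second difference (numerical estimate)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


def smile(x):
    return x ** 2       # f''(x) = 2 everywhere: concave up


def frown(x):
    return -(x ** 2)    # f''(x) = -2 everywhere: concave down


def cubic(x):
    return x ** 3       # f''(x) = 6x: changes sign at x = 0


print(second_derivative(smile, 1.0))
print(second_derivative(frown, 1.0))
print(second_derivative(cubic, -1.0), second_derivative(cubic, 1.0))
```

The cubic is concave down to the left of 0 and concave up to the right, so x = 0 is a genuine inflection point.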
ML Connection: The Hessian
In optimization, the second derivative (or its multidimensional analog, the Hessian) tells you about the curvature of your loss landscape:
Positive curvature in every direction: You’re in a valley (good for optimization)
Negative curvature in every direction: You’re on a hilltop (a local maximum if the gradient is zero)
Mixed curvature: Saddle point (common in high dimensions)
# Second derivative with PyDelt
second_derivative_func = interpolator.differentiate(order=2)
curvature = second_derivative_func(x_test)
print(f"Curvature at all points: {curvature}")
# For f(x) = x², f''(x) = 2 everywhere
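For a function of two variables, the three cases can be told apart from the signs of the Hessian's eigenvalues. The following is a minimal sketch for a 2×2 symmetric Hessian, using the determinant and trace in place of an eigenvalue solver; the function name and the example Hessians are illustrative assumptions.

```python
def classify_2x2_hessian(a, b, d):
    """Classify a critical point from the symmetric Hessian [[a, b], [b, d]].

    The determinant is the product of the two eigenvalues and the trace is
    their sum, so their signs reveal the signs of the eigenvalues.
    """
    det = a * d - b * b
    trace = a + d
    if det > 0 and trace > 0:
        return "local minimum (positive curvature in every direction)"
    if det > 0 and trace < 0:
        return "local maximum (negative curvature in every direction)"
    if det < 0:
        return "saddle point (mixed curvature)"
    return "inconclusive (at least one zero eigenvalue)"


# f(x, y) = x^2 + y^2 has Hessian [[2, 0], [0, 2]] at the origin.
print(classify_2x2_hessian(2, 0, 2))
# f(x, y) = x^2 - y^2 has Hessian [[2, 0], [0, -2]] at the origin.
print(classify_2x2_hessian(2, 0, -2))
# f(x, y) = -x^2 - y^2 has Hessian [[-2, 0], [0, -2]] at the origin.
print(classify_2x2_hessian(-2, 0, -2))
```

In higher dimensions the same idea applies, but the eigenvalues generally have to be computed numerically.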
Notation: A Quick Guide
Different fields use different notation for derivatives:
| Notation | Read as | Common in |
|---|---|---|
| f’(x) | “f prime of x” | Mathematics |
| df/dx | “d f d x” | Physics, engineering |
| ∂f/∂x | “partial f partial x” | Multivariate calculus |
| ∇f | “gradient of f” | Machine learning |
| Df | “D f” | Functional analysis |
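They all express the same idea, the rate of change of f, but they are not fully interchangeable: f’(x) and df/dx denote the derivative of a single-variable function, ∂f/∂x is the derivative with respect to x with all other inputs held fixed, and ∇f collects every partial derivative into a vector.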
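For a function of several inputs, the ∂ and ∇ notations can be illustrated numerically. The sketch below assumes a hypothetical function f(x, y) = x²y and central-difference estimates; the helper names are illustrative only.

```python
def f(x, y):
    """Hypothetical two-input function: f(x, y) = x^2 * y."""
    return x ** 2 * y


def partial_x(f, x, y, h=1e-6):
    """Approximate the partial derivative with respect to x, holding y fixed."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def partial_y(f, x, y, h=1e-6):
    """Approximate the partial derivative with respect to y, holding x fixed."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


def gradient(f, x, y):
    """The gradient collects every partial derivative into one vector."""
    return (partial_x(f, x, y), partial_y(f, x, y))


# Exact values at (3, 2): df/dx = 2xy = 12 and df/dy = x^2 = 9.
print(gradient(f, 3.0, 2.0))
```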
Derivatives of Common Functions
Here are derivatives you’ll encounter constantly:
| Function | Derivative | Why It Matters |
|---|---|---|
| xⁿ | n·xⁿ⁻¹ | Polynomial layers |
| eˣ | eˣ | Softmax, exponential families |
| ln(x) | 1/x | Log-likelihood, cross-entropy |
| sin(x) | cos(x) | Positional encodings, Fourier |
| cos(x) | -sin(x) | Positional encodings, Fourier |
| σ(x) = 1/(1+e⁻ˣ) | σ(x)(1-σ(x)) | Sigmoid activation |
| tanh(x) | 1 - tanh²(x) | Tanh activation |
| max(0,x) | 1 if x>0, 0 if x<0 | ReLU activation |
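Several rows of the table can be sanity-checked by comparing each closed-form derivative against a numerical estimate. This is a minimal sketch using only the standard library; the test point x = 0.5 and the helper names are arbitrary assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 0.5  # arbitrary test point (assumed)

# Each pair is (function, closed-form derivative from the table).
checks = {
    "exp": (math.exp, math.exp),
    "ln": (math.log, lambda t: 1.0 / t),
    "sin": (math.sin, math.cos),
    "sigmoid": (sigmoid, lambda t: sigmoid(t) * (1.0 - sigmoid(t))),
    "tanh": (math.tanh, lambda t: 1.0 - math.tanh(t) ** 2),
}

for name, (func, exact) in checks.items():
    estimate = numerical_derivative(func, x)
    print(f"{name:<8} numerical = {estimate:.6f} exact = {exact(x):.6f}")
```

The two columns should agree to the printed precision; a mismatch would point to a typo in the formula.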
The Derivative Doesn’t Always Exist
Some functions have points where the derivative is undefined:
1. Corners (Non-Differentiable Points)
def relu(x):
    return max(0, x)
At x = 0, ReLU has a “corner”—the left slope is 0, the right slope is 1. The derivative doesn’t exist (though we define it as 0 by convention).
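The corner shows up clearly in the one-sided difference quotients at 0. The sketch below assumes a scalar ReLU and a small step h; it only illustrates the mismatch between the two sides.

```python
def relu(x):
    return max(0.0, x)


h = 1e-6
left_slope = (relu(0.0) - relu(0.0 - h)) / h    # approach 0 from the left
right_slope = (relu(0.0 + h) - relu(0.0)) / h   # approach 0 from the right

print(f"left slope  = {left_slope}")
print(f"right slope = {right_slope}")
```

Because the two one-sided slopes disagree, the limit that defines the derivative does not exist at x = 0.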
2. Vertical Tangents
If the tangent line is vertical, the slope is infinite—the derivative doesn’t exist as a finite number.
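A standard example of a vertical tangent is the cube root at x = 0. The sketch below, using that example as an assumption, shows the secant slope growing without bound as the step shrinks.

```python
import math


def cube_root(x):
    """Real cube root that also handles negative inputs."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def symmetric_slope(f, x, h):
    """Slope of the secant line through x - h and x + h."""
    return (f(x + h) - f(x - h)) / (2 * h)


for h in [1e-2, 1e-4, 1e-6]:
    # The secant slope at 0 equals h^(-2/3), which grows without bound.
    print(f"h = {h:.0e} slope = {symmetric_slope(cube_root, 0.0, h):.1f}")
```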
3. Discontinuities
If the function jumps, there’s no tangent line at the jump point.
Why This Matters for ML
ReLU is not differentiable at 0, but we use it anyway (subgradients)
Discrete operations (argmax, sampling) have no gradients (use Gumbel-Softmax, REINFORCE)
Quantization breaks differentiability (use straight-through estimators)
Numerical vs. Analytical Derivatives
Analytical Derivatives
When you have a formula, you can compute the derivative exactly using rules (next chapter).
Pros: Exact, fast to evaluate
Cons: Requires knowing the formula
Numerical Derivatives
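When you only have data, you approximate the derivative from nearby samples, for example with finite differences or by differentiating a fitted curve.
Pros: Works with any data
Cons: Approximate, sensitive to noise, amplifies high-frequency errors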
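The approximation error can be made concrete with the two most common finite-difference formulas. This is a minimal sketch on sin(x), whose exact derivative is known; the function names and the evaluation point are illustrative assumptions.

```python
import math


def forward_difference(f, x, h):
    """One-sided estimate; error shrinks in proportion to h."""
    return (f(x + h) - f(x)) / h


def central_difference(f, x, h):
    """Two-sided estimate; error shrinks in proportion to h^2."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 1.0
exact = math.cos(x)  # d/dx sin(x) = cos(x)

for h in [1e-1, 1e-2, 1e-3]:
    forward_error = abs(forward_difference(math.sin, x, h) - exact)
    central_error = abs(central_difference(math.sin, x, h) - exact)
    print(f"h = {h:<6} forward error = {forward_error:.2e} central error = {central_error:.2e}")
```

Dividing by a small h also divides any measurement noise by h, which is why noisy data is usually smoothed or fitted before differentiating.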
Automatic Differentiation (Autodiff)
The best of both worlds—used by PyTorch and TensorFlow:
Computes derivatives that are exact up to floating-point rounding
Works with complex compositions
No need to derive formulas by hand
PyDelt’s NeuralNetworkInterpolator uses autodiff internally.
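The idea behind autodiff can be shown with a toy forward-mode implementation based on dual numbers. This is a sketch for intuition only: the Dual class and derivative helper are hypothetical, support only addition and multiplication, and do not reflect how PyTorch, TensorFlow, or PyDelt are implemented (the deep learning frameworks rely mainly on reverse mode).

```python
class Dual:
    """A number of the form value + deriv * eps, where eps^2 = 0."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def derivative(f, x):
    """Evaluate f at x and read its derivative off the dual part."""
    return f(Dual(x, 1.0)).deriv


def f(x):
    return x * x + 3 * x  # f'(x) = 2x + 3


print(derivative(f, 2.0))  # 7.0
```

No step size h appears anywhere: the derivative is carried along with the value by applying the differentiation rules at each operation, which is why the result has no truncation error.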
Visualizing Derivatives
import numpy as np
import matplotlib.pyplot as plt
from pydelt.interpolation import SplineInterpolator
# Create data
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)
# Fit and differentiate
interp = SplineInterpolator(smoothing=0.01)
interp.fit(x, y)
deriv = interp.differentiate(order=1)
# Plot
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6))
ax1.plot(x, y, 'b-', label='f(x) = sin(x)')
ax1.set_ylabel('f(x)')
ax1.legend()
ax1.grid(True)
ax2.plot(x, deriv(x), 'r-', label="f'(x) = cos(x)")
ax2.plot(x, np.cos(x), 'k--', alpha=0.5, label='Exact cos(x)')
ax2.set_xlabel('x')
ax2.set_ylabel("f'(x)")
ax2.legend()
ax2.grid(True)
plt.tight_layout()
plt.show()
Key Takeaways
Derivatives measure instantaneous rate of change
Geometrically, the derivative is the slope of the tangent line
For ML, the derivative is a sensitivity: how outputs respond to input changes
Second derivatives measure curvature (important for optimization)
Derivatives don’t always exist (corners, jumps, vertical tangents)
PyDelt computes derivatives from data when you don’t have formulas
Exercises
Intuition check: If f’(3) = -2, is f increasing or decreasing at x = 3? By approximately how much does f change if x increases from 3 to 3.1?
Find the minimum: For f(x) = x² - 4x + 5, find where f’(x) = 0. Verify this is a minimum by checking f’’(x).
Code it: Use PyDelt to compute the derivative of f(x) = e^(-x²) and plot both the function and its derivative.