BOCD-NIG: Normal-Inverse-Gamma Model
Overview
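The BOCD-NIG model applies Bayesian Online Changepoint Detection (BOCD) to univariate data with unknown mean and variance. It places a Normal-Inverse-Gamma (NIG) prior on the mean and variance of each segment. Because this prior is conjugate to the normal likelihood, posterior updates have a closed form and inference can run sequentially, one observation at a time.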
When to Use BOCD-NIG
Best suited for:
- Univariate continuous data streams
- Data with unknown mean and variance
- Fast real-time processing requirements
- Applications requiring simple interpretability
- Approximately normally distributed data
Advantages:
- Computationally efficient (conjugate updates)
- Closed-form posterior updates
- Low memory footprint
- Fast execution
Limitations:
- Assumes univariate normal data
- Less robust to outliers and multimodal distributions
- May underperform on non-Gaussian data
Parameters
Initialization
from pybocd import BOCDNIG
model = BOCDNIG(
    m_0=0.0,         # Prior mean
    kappa_0=1.0,     # Prior precision scaling (acts like a pseudo-observation count for the mean)
    alpha_0=1.0,     # Prior shape of the inverse-gamma distribution on the variance
    beta_0=1.0,      # Prior scale of the inverse-gamma distribution on the variance
    l=200.0,         # Expected run length (changepoint probability 1/l per step)
    threshold=1e-4,  # Pruning threshold for negligible weights
)
Parameter Descriptions
| Parameter | Description |
|---|---|
| m_0 | Prior mean of the normal distribution |
| kappa_0 | Prior precision scaling factor (higher = stronger prior belief in m_0) |
| alpha_0 | Shape parameter of the inverse-gamma distribution for the variance |
| beta_0 | Scale parameter of the inverse-gamma distribution for the variance (equivalently, the rate of the gamma distribution on the precision) |
| l | Expected run length; probability of a changepoint = 1/l at each time step |
| threshold | Minimum posterior weight a run-length hypothesis needs to be kept |
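To make the roles of the four prior parameters concrete, the sketch below applies the standard single-observation Normal-Inverse-Gamma conjugate update and evaluates the resulting Student-t predictive density. The function names `nig_update` and `nig_predictive_logpdf` are illustrative and are not part of pybocd. The sketch assumes the common parameterization in which beta is the scale of the inverse-gamma on the variance; pybocd's internals may differ.

```python
import math

def nig_update(m, kappa, alpha, beta, x):
    """Return the NIG posterior parameters after observing one value x."""
    kappa_new = kappa + 1.0
    m_new = (kappa * m + x) / kappa_new
    alpha_new = alpha + 0.5
    beta_new = beta + kappa * (x - m) ** 2 / (2.0 * kappa_new)
    return m_new, kappa_new, alpha_new, beta_new

def nig_predictive_logpdf(x, m, kappa, alpha, beta):
    """Log density of the next observation: Student-t with 2*alpha degrees of freedom."""
    nu = 2.0 * alpha
    scale_sq = beta * (kappa + 1.0) / (alpha * kappa)
    z_sq = (x - m) ** 2 / (nu * scale_sq)
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi * scale_sq)
            - (nu + 1.0) / 2.0 * math.log1p(z_sq))

# Start from the default prior (m_0=0, kappa_0=1, alpha_0=1, beta_0=1)
# and absorb a single observation.
params = (0.0, 1.0, 1.0, 1.0)
print(nig_update(*params, x=2.0))
print(math.exp(nig_predictive_logpdf(0.0, *params)))
```

Each observation adds 1 to kappa and 0.5 to alpha, which is why larger kappa_0 and alpha_0 values behave like a prior that has already seen more data.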
Usage Example
import numpy as np
from pybocd import BOCDNIG
# Generate synthetic data with a changepoint
np.random.seed(42)
data = np.concatenate([
    np.random.normal(0, 1, 100),  # Segment 1: mean=0, std=1
    np.random.normal(3, 1, 100),  # Segment 2: mean=3, std=1
])
# Initialize model
model = BOCDNIG(m_0=0.0, kappa_0=1.0, alpha_0=1.0, beta_0=1.0, l=200.0)
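# Process data one observation at a time
prev_run_length = 0
for t, x in enumerate(data):
    model.add_data(x)
    # A sharp drop in the MAP run length suggests a recent changepoint
    if prev_run_length - model.run_length > 50:
        print(f"Time {t}: run length fell to {model.run_length:.1f}, changepoint likely")
    prev_run_length = model.run_length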
Accessing Results
After calling add_data(), you can access:
# Maximum a posteriori (MAP) estimate of run length
run_length = model.run_length
# Full posterior distribution over run lengths
dist = model.run_length_dist # Dictionary: {run_length: probability}
# Log probability (for likelihood evaluation)
log_prob = model.log_prob
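Because the posterior is described above as a plain {run_length: probability} dictionary, simple summaries can be computed from it directly. The following is a minimal sketch that uses a hand-written dictionary as a stand-in for `model.run_length_dist`; the helper names are hypothetical and not part of pybocd.

```python
def short_run_probability(dist, max_run=5):
    """Posterior mass on run lengths <= max_run, i.e. on a recent changepoint."""
    return sum(p for r, p in dist.items() if r <= max_run)

def expected_run_length(dist):
    """Posterior mean of the run length."""
    return sum(r * p for r, p in dist.items())

# Stand-in for model.run_length_dist shortly after a suspected change.
dist = {0: 0.05, 1: 0.60, 2: 0.25, 120: 0.10}
print(short_run_probability(dist))
print(expected_run_length(dist))
```

The mass on short run lengths can be a steadier signal than the MAP run length alone, since it accounts for how much probability still sits on the old segment.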
Tuning the Model
Expected Run Length (l)
A higher l means changepoints are less likely. Choose based on domain knowledge:
- l = 50: Expect a changepoint every ~50 observations
- l = 200: Expect a changepoint every ~200 observations
- l = 1000: Expect a changepoint every ~1000 observations
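The effect of l is easiest to see in the run-length recursion itself. The sketch below performs one step of the standard BOCD recursion with a constant changepoint probability of 1/l; it takes the predictive probabilities as plain numbers so that it stays independent of any particular observation model. The function `run_length_step` is illustrative and is not pybocd's implementation.

```python
def run_length_step(weights, pred_probs, l):
    """One step of the run-length recursion with constant hazard 1/l.

    weights[r]    : posterior probability of run length r before the new point
    pred_probs[r] : predictive density of the new point under run length r
    """
    hazard = 1.0 / l
    joint = [w * p for w, p in zip(weights, pred_probs)]
    changepoint_mass = hazard * sum(joint)       # all runs may reset to length 0
    grown = [(1.0 - hazard) * j for j in joint]  # or each run grows by one
    unnormalized = [changepoint_mass] + grown
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# With a single hypothesis, the new point gets changepoint probability 1/l.
for l in (50.0, 200.0, 1000.0):
    print(l, run_length_step([1.0], [0.3], l))
```

When several run lengths are alive, the data can push the changepoint mass well above or below 1/l, because the predictive probabilities reweight the hypotheses before normalization.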
Pruning Threshold
The threshold parameter removes run-length hypotheses with negligible posterior weight, reducing computation:
model = BOCDNIG(..., threshold=1e-2) # More aggressive pruning (faster)
model = BOCDNIG(..., threshold=1e-6) # Less pruning (more accurate)
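As an illustration of what pruning does to the run-length distribution, the sketch below drops hypotheses whose weight falls below the threshold and renormalizes the rest. The `prune` function is hypothetical; whether pybocd renormalizes in exactly this way is an assumption.

```python
def prune(dist, threshold=1e-4):
    """Drop run lengths whose probability is below threshold, then renormalize."""
    kept = {r: p for r, p in dist.items() if p >= threshold}
    if not kept:
        # Everything fell below the threshold: keep the single best hypothesis.
        best = max(dist, key=dist.get)
        kept = {best: dist[best]}
    total = sum(kept.values())
    return {r: p / total for r, p in kept.items()}

dist = {0: 0.005, 1: 0.00005, 2: 0.00495, 3: 0.99}
pruned = prune(dist, threshold=1e-4)
print(sorted(pruned))
print(sum(pruned.values()))
```

A larger threshold keeps fewer hypotheses alive per step, which cuts the per-observation cost but can discard a run length that later data would have favored.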
Prior Parameters
For weak priors (less informative):
BOCDNIG(m_0=0.0, kappa_0=0.1, alpha_0=0.5, beta_0=0.5)
For strong priors (more informative):
BOCDNIG(m_0=5.0, kappa_0=10.0, alpha_0=10.0, beta_0=10.0)
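The difference between the two settings can be quantified with the standard NIG formulas. In the sketch below, kappa_0 acts as a pseudo-observation count that pulls the posterior mean toward m_0, and beta_0 / (alpha_0 - 1) is the prior expectation of the variance, which is finite only for alpha_0 > 1. The helper functions are illustrative and not part of pybocd.

```python
def posterior_mean(m_0, kappa_0, sample_mean, n):
    """Posterior mean of mu after n observations with the given sample mean."""
    return (kappa_0 * m_0 + n * sample_mean) / (kappa_0 + n)

def prior_variance_mean(alpha_0, beta_0):
    """Prior expectation of the variance; infinite unless alpha_0 > 1."""
    return beta_0 / (alpha_0 - 1.0) if alpha_0 > 1.0 else float("inf")

# Ten observations averaging 3.0, seen through the weak and strong priors above.
print(posterior_mean(0.0, 0.1, 3.0, 10))   # weak prior: stays close to the data
print(posterior_mean(5.0, 10.0, 3.0, 10))  # strong prior: halfway between 5.0 and 3.0
print(prior_variance_mean(0.5, 0.5))
print(prior_variance_mean(10.0, 10.0))
```

With kappa_0=10 the prior counts as much as ten observations, so a new segment needs correspondingly more data before its estimated mean moves away from m_0.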
References
For theoretical details, see the original BOCD paper (Adams and MacKay, "Bayesian Online Changepoint Detection", 2007) and the pybocd documentation.