BOCD-NIG: Normal-Inverse-Gamma Model

Overview

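BOCD-NIG is a Bayesian online changepoint detection (BOCD) model for univariate data whose mean and variance are both unknown. It places a Normal-Inverse-Gamma (NIG) prior on the mean and variance of a normal likelihood. Because the NIG is conjugate to that likelihood, each new observation updates the posterior in closed form, which keeps sequential inference cheap.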
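
To make the role of conjugacy concrete, the sketch below evaluates the posterior predictive density of the next observation, which under an NIG posterior is a Student-t distribution. This is a standalone illustration using the textbook formula; the function name `nig_predictive_logpdf` is hypothetical and is not part of the pybocd API, and pybocd's internals may differ.

```python
import math

def nig_predictive_logpdf(x, m, kappa, alpha, beta):
    """Log density of the Student-t predictive implied by NIG(m, kappa, alpha, beta).

    Hypothetical helper for illustration; not a pybocd function.
    """
    df = 2.0 * alpha                                  # degrees of freedom
    scale2 = beta * (kappa + 1.0) / (alpha * kappa)   # squared scale of the t
    z2 = (x - m) ** 2 / scale2                        # squared standardized residual
    return (
        math.lgamma((df + 1.0) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi * scale2)
        - 0.5 * (df + 1.0) * math.log1p(z2 / df)
    )

# An observation near the current mean is more plausible than a distant one.
near = nig_predictive_logpdf(0.1, m=0.0, kappa=1.0, alpha=1.0, beta=1.0)
far = nig_predictive_logpdf(5.0, m=0.0, kappa=1.0, alpha=1.0, beta=1.0)
print(near > far)
```

In BOCD, a density of this kind is evaluated once per run-length hypothesis, so observations that fit a long-running segment poorly shift posterior weight toward short run lengths.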

When to Use BOCD-NIG

Best suited for:

  • Univariate continuous data streams
  • Data with unknown mean and variance
  • Fast real-time processing requirements
  • Applications requiring simple interpretability
  • When data is approximately normally distributed

Advantages:

  • Computationally efficient (conjugate updates)
  • Closed-form posterior updates
  • Low memory footprint
  • Fast execution

Limitations:

  • Assumes univariate normal data
  • Less robust to outliers and multimodal distributions
  • May underperform on non-Gaussian data

Parameters

Initialization

from pybocd import BOCDNIG

model = BOCDNIG(
    m_0=0.0,           # Prior mean
    kappa_0=1.0,       # Precision scaling of the prior on the mean
    alpha_0=1.0,       # Shape of the inverse-gamma prior on the variance
    beta_0=1.0,        # Scale of the inverse-gamma prior on the variance
    l=200.0,           # Expected run length (changepoint probability = 1/l per step)
    threshold=1e-4     # Pruning threshold for negligible weights
)

Parameter Descriptions

Parameter   Description
---------   -----------
m_0         Prior mean of the normal distribution
kappa_0     Prior precision scaling factor (higher = stronger prior belief in m_0)
alpha_0     Shape parameter of the inverse-gamma prior on the variance
beta_0      Scale parameter of the inverse-gamma prior on the variance (equivalently, the rate of a gamma prior on the precision)
l           Expected run length; probability of a changepoint = 1/l at each time step
threshold   Minimum posterior weight for keeping a run-length hypothesis
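
As a rough illustration of how these four prior parameters evolve, the sketch below applies the standard NIG conjugate update for a single observation. The function name `nig_update` is hypothetical and this is the textbook update, not necessarily the exact code path inside pybocd.

```python
def nig_update(m, kappa, alpha, beta, x):
    """Return NIG parameters after observing one value x (textbook conjugate update).

    Hypothetical helper for illustration; not a pybocd function.
    """
    kappa_n = kappa + 1.0                                   # one more pseudo-observation
    m_n = (kappa * m + x) / kappa_n                         # precision-weighted mean
    alpha_n = alpha + 0.5                                   # shape grows by 1/2 per point
    beta_n = beta + kappa * (x - m) ** 2 / (2.0 * kappa_n)  # accumulates squared error
    return m_n, kappa_n, alpha_n, beta_n

# Start from the defaults shown above and fold in a few observations.
params = (0.0, 1.0, 1.0, 1.0)  # (m_0, kappa_0, alpha_0, beta_0)
for x in [1.9, 2.1, 2.0]:
    params = nig_update(*params, x)
print(params)
```

Each update costs a constant number of arithmetic operations and stores only four numbers per run-length hypothesis, which is where the low memory footprint comes from.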

Usage Example

import numpy as np
from pybocd import BOCDNIG

# Generate synthetic data with a changepoint
np.random.seed(42)
data = np.concatenate([
    np.random.normal(0, 1, 100),      # Segment 1: mean=0, std=1
    np.random.normal(3, 1, 100)       # Segment 2: mean=3, std=1
])

# Initialize model
model = BOCDNIG(m_0=0.0, kappa_0=1.0, alpha_0=1.0, beta_0=1.0, l=200.0)

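# Process data
prev_run_length = 0
for t, x in enumerate(data):
    model.add_data(x)

    # The MAP run length normally grows by one per step; a drop means the
    # posterior now favors a recent changepoint.
    if model.run_length < prev_run_length:
        print(f"Time {t}: run length fell from {prev_run_length} to {model.run_length}; changepoint likely")
    prev_run_length = model.run_length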

Accessing Results

After calling add_data(), you can access:

# Maximum a posteriori (MAP) estimate of run length
run_length = model.run_length

# Full posterior distribution over run lengths
dist = model.run_length_dist  # Dictionary: {run_length: probability}

# Log probability (for likelihood evaluation)
log_prob = model.log_prob
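
The run-length posterior can be summarized in a few ways beyond the MAP value. The sketch below works on a plain dictionary in the {run_length: probability} layout described above; the helper `summarize_run_lengths`, the `recent` cutoff, and the example numbers are all hypothetical and only stand in for a real `model.run_length_dist`.

```python
def summarize_run_lengths(dist, recent=5):
    """Summarize a {run_length: probability} mapping.

    Returns (MAP run length, expected run length, P(run length <= recent)).
    Hypothetical helper for illustration; not a pybocd function.
    """
    total = sum(dist.values())
    probs = {r: p / total for r, p in dist.items()}   # renormalize defensively
    map_r = max(probs, key=probs.get)                 # most probable run length
    expected = sum(r * p for r, p in probs.items())   # posterior mean run length
    p_recent = sum(p for r, p in probs.items() if r <= recent)
    return map_r, expected, p_recent

# Made-up posterior: most mass on a long run, some on a very recent change.
example_dist = {0: 0.05, 1: 0.10, 2: 0.05, 40: 0.80}
map_r, expected, p_recent = summarize_run_lengths(example_dist)
print(map_r, expected, p_recent)
```

The probability mass on short run lengths is often a smoother changepoint signal than the MAP estimate, which can jump abruptly.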

Tuning the Model

Expected Run Length (l)

The model assumes a changepoint occurs with probability 1/l at each time step, so a higher l makes changepoints less likely a priori. Choose l based on domain knowledge:

  • l = 50: Expect a changepoint every ~50 observations
  • l = 200: Expect a changepoint every ~200 observations
  • l = 1000: Expect a changepoint every ~1000 observations
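
To get a feel for what a given l implies, the sketch below converts it to the per-step changepoint probability and to the prior probability that a segment survives n steps. It assumes the constant hazard 1/l stated in the parameter descriptions, under which segment lengths are geometric with mean l. The function names are hypothetical.

```python
def hazard(l):
    """Per-step prior probability of a changepoint for expected run length l."""
    return 1.0 / l

def prob_no_changepoint(l, n):
    """Prior probability that n consecutive steps contain no changepoint."""
    return (1.0 - hazard(l)) ** n

for l in (50, 200, 1000):
    print(f"l={l}: hazard={hazard(l):.4f}, "
          f"P(no change in 100 steps)={prob_no_changepoint(l, 100):.3f}")
```

Even with l = 200, the prior leaves a substantial chance of a changepoint within any 200-step window, so l sets an average spacing rather than a minimum one.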

Pruning Threshold

The threshold parameter removes run-length hypotheses with negligible posterior weight, reducing computation:

model = BOCDNIG(..., threshold=1e-2)  # More aggressive pruning (faster)
model = BOCDNIG(..., threshold=1e-6)  # Less pruning (more accurate)
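
The effect of pruning can be sketched on a plain {run_length: probability} dictionary. This is an assumption about how thresholding might work (drop entries below the threshold, then renormalize); the function `prune_run_lengths` is hypothetical, and pybocd's actual rule may differ in details such as renormalization or the comparison used.

```python
def prune_run_lengths(dist, threshold):
    """Drop hypotheses with weight below threshold and renormalize the rest.

    Hypothetical helper for illustration; not a pybocd function.
    """
    kept = {r: p for r, p in dist.items() if p >= threshold}
    if not kept:
        # Never discard everything: fall back to the single best hypothesis.
        best = max(dist, key=dist.get)
        kept = {best: dist[best]}
    total = sum(kept.values())
    return {r: p / total for r, p in kept.items()}

example_dist = {0: 0.5, 1: 0.4995, 2: 0.0005}
pruned = prune_run_lengths(example_dist, threshold=1e-3)
print(sorted(pruned))
```

Each surviving hypothesis needs its own posterior update per observation, so a larger threshold trades a small amount of discarded probability mass for less work per step.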

Prior Parameters

For weak priors (less informative):

BOCDNIG(m_0=0.0, kappa_0=0.1, alpha_0=0.5, beta_0=0.5)

For strong priors (more informative):

BOCDNIG(m_0=5.0, kappa_0=10.0, alpha_0=10.0, beta_0=10.0)
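
To see how much these settings matter, the sketch below compares the posterior mean after ten observations under the weak and strong priors above, using the standard NIG result that the posterior mean is a kappa_0-weighted blend of m_0 and the data. It also reports the prior mean of the variance, beta_0 / (alpha_0 - 1), which exists only for alpha_0 > 1. The helper names and the data are hypothetical.

```python
def posterior_mean(m_0, kappa_0, xs):
    """Posterior mean of the segment mean after observing xs (standard NIG result)."""
    return (kappa_0 * m_0 + sum(xs)) / (kappa_0 + len(xs))

def prior_variance_mean(alpha_0, beta_0):
    """Prior expectation of the variance, or None when it is not finite."""
    if alpha_0 <= 1.0:
        return None
    return beta_0 / (alpha_0 - 1.0)

xs = [2.0] * 10  # ten made-up observations, all equal to 2.0

weak = posterior_mean(m_0=0.0, kappa_0=0.1, xs=xs)     # stays close to the data
strong = posterior_mean(m_0=5.0, kappa_0=10.0, xs=xs)  # pulled halfway toward m_0
print(weak, strong)
print(prior_variance_mean(0.5, 0.5), prior_variance_mean(10.0, 10.0))
```

With kappa_0 = 10 the prior counts as much as ten observations, so a new segment needs more data before its estimated mean moves away from m_0.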

References

For theoretical details, see the original BOCD paper (Adams and MacKay, "Bayesian Online Changepoint Detection", 2007) and the pybocd documentation.