BOCD-GMM: Gaussian Mixture Model
Overview
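BOCD-GMM is a particle-based, non-parametric Bayesian online changepoint detection (BOCD) model for data whose distribution is too complex for a single Gaussian. It models the observations with a Gaussian Mixture Model (GMM), which lets it represent multimodal data and absorb outliers in separate mixture components instead of treating them as changepoints.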
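To make the mixture idea concrete, here is a minimal sketch of how a one-dimensional Gaussian mixture assigns density to an observation. It does not use pybocd; the function name `gmm_log_pdf` and its arguments are illustrative only.

```python
import math

def gmm_log_pdf(x, weights, means, variances):
    """Log-density of x under a 1-D Gaussian mixture (illustrative helper)."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_norm = -0.5 * math.log(2.0 * math.pi * var)
        log_terms.append(math.log(w) + log_norm - 0.5 * (x - mu) ** 2 / var)
    # Log-sum-exp keeps the computation stable when terms are very negative
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))

# Two well-separated modes: a single Gaussian fitted to such data would be
# centered near 0, where the mixture has almost no mass.
weights, means, variances = [0.5, 0.5], [-2.0, 2.0], [0.25, 0.25]
near_mode = gmm_log_pdf(-2.0, weights, means, variances)
between_modes = gmm_log_pdf(0.0, weights, means, variances)
```

Here `near_mode` is much larger than `between_modes`, which is the behavior a single-Gaussian model cannot reproduce.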
When to Use BOCD-GMM
Best suited for:
- Multimodal data (two or more distinct modes)
- Heavy-tailed distributions
- Outlier-prone data streams
- Non-Gaussian distributions
- Complex, heterogeneous data
Advantages:
- Handles multimodal and non-Gaussian data
- Robust to outliers through mixture components
- Flexible distribution modeling
- Typically more accurate than BOCD-NIG when the data are multimodal, heavy-tailed, or contaminated by outliers
Limitations:
- Computationally more expensive and slower than BOCD-NIG
- Many hyperparameters to tune
- Requires more data for stable estimates
Parameters
Initialization
from pybocd import BOCDGMM
model = BOCDGMM(
    # Component parameters
    alpha_0=2.0,
    beta_0=2.0,
    # Mean parameters
    m_0=0.0,
    kappa_0=1.0,
    # Precision parameters
    alpha_p_0=2.0,
    beta_p_0=2.0,
    # Mixture weight parameters
    mu_p_0=0.0,
    sigma_p_sq_0=1.0,
    # Jitter (smoothing) parameters
    jitter_mu=0.01,
    jitter_sigma_sq=0.01,
    jitter_tau_sq=0.01,
    jitter_pi=0.01,
    # Inference parameters
    l=200.0,             # Expected run length
    m=20,                # Number of mixture components
    n=200,               # Number of particles
    init_particle_n=50,  # Initial number of particles
)
Parameter Descriptions
| Parameter | Description |
|---|---|
| alpha_0, beta_0 | Prior parameters for component weighting |
| m_0, kappa_0 | Prior mean and precision for mixture component means |
| alpha_p_0, beta_p_0 | Prior shape/rate for component precisions |
| mu_p_0, sigma_p_sq_0 | Parameters for precision prior distribution |
| jitter_* | Smoothing parameters for particle updates |
| l | Expected run length between changepoints |
| m | Maximum number of mixture components |
| n | Number of particles for sequential Monte Carlo |
| init_particle_n | Initial particle count before resampling |
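In standard BOCD, an expected run length is usually converted into a constant hazard rate H = 1/l, the prior probability of a changepoint at each step. Whether pybocd uses `l` in exactly this way is an assumption; the sketch below only illustrates that relationship, and both function names are hypothetical.

```python
def constant_hazard(expected_run_length):
    """Per-step changepoint probability under a geometric run-length prior."""
    if expected_run_length < 1.0:
        raise ValueError("expected_run_length must be >= 1")
    return 1.0 / expected_run_length

def prob_no_change(expected_run_length, steps):
    """Prior probability that no changepoint occurs in `steps` consecutive steps."""
    return (1.0 - constant_hazard(expected_run_length)) ** steps

hazard = constant_hazard(200.0)            # 0.005 for l=200
survive_100 = prob_no_change(200.0, 100)   # prior chance of 100 quiet steps
```

A smaller `l` raises the hazard, so the detector expects changepoints more often and reacts faster, at the cost of more false alarms.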
Usage Example
import numpy as np
from pybocd import BOCDGMM
# Generate synthetic data with three changepoints (at t=100, 200, 300)
np.random.seed(42)
data = np.concatenate([
    np.random.normal(-2, 0.5, 100),  # Regime 1: mean=-2
    np.random.normal(2, 0.5, 100),   # Regime 2: mean=2
    np.random.normal(0, 0.5, 100),   # Regime 3: mean=0
    np.random.normal(-2, 0.5, 100),  # Regime 1 returns: mean=-2
])
# Add some outliers
outlier_indices = np.random.choice(len(data), 10, replace=False)
data[outlier_indices] += np.random.normal(0, 3, 10)
# Initialize GMM-based model
model = BOCDGMM(
    alpha_0=2.0, beta_0=2.0,
    m_0=0.0, kappa_0=1.0,
    alpha_p_0=2.0, beta_p_0=2.0,
    mu_p_0=0.0, sigma_p_sq_0=1.0,
    jitter_mu=0.01, jitter_sigma_sq=0.01,
    jitter_tau_sq=0.01, jitter_pi=0.01,
    l=200.0, m=20, n=200, init_particle_n=50,
)
# Process data
for t, x in enumerate(data):
    model.add_data(x)
    # Report the current run-length estimate every 50 observations
    if t % 50 == 0:
        print(f"Time {t}: Run length = {model.run_length:.1f}")
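A common way to turn a run-length trace into changepoint locations is to flag the steps where the estimated run length drops sharply. The helper below is a hypothetical post-processing sketch that works on a plain list of run lengths (for example, values of `model.run_length` collected inside the loop above); `find_drops` and `min_drop` are not part of pybocd.

```python
def find_drops(run_lengths, min_drop=20.0):
    """Return indices t where the run length falls by at least `min_drop`."""
    changepoints = []
    for t in range(1, len(run_lengths)):
        if run_lengths[t - 1] - run_lengths[t] >= min_drop:
            changepoints.append(t)
    return changepoints

# Toy trace: the run length grows, resets at t=5, then grows again.
trace = [0, 1, 2, 3, 4, 0, 1, 2]
detected = find_drops(trace, min_drop=3.0)  # [5]
```

The threshold `min_drop` trades sensitivity against false alarms and would need tuning for real data.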
Accessing Results
# MAP estimate of run length
run_length = model.run_length
# Full posterior distribution
dist = model.run_length_dist
# Mixture component information
# (availability depends on implementation)
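If `run_length_dist` is a one-dimensional sequence of probabilities indexed by run length (an assumption; check the actual return type), the MAP and posterior-mean run lengths can be computed as sketched below. `summarize_run_length_dist` is an illustrative helper, not a pybocd function.

```python
def summarize_run_length_dist(dist):
    """Return (MAP run length, expected run length) for a discrete distribution."""
    total = float(sum(dist))
    if total <= 0.0:
        raise ValueError("distribution must have positive mass")
    probs = [p / total for p in dist]  # renormalize defensively
    map_rl = max(range(len(probs)), key=probs.__getitem__)
    mean_rl = sum(r * p for r, p in enumerate(probs))
    return map_rl, mean_rl

# Toy posterior over run lengths 0..3
map_rl, mean_rl = summarize_run_length_dist([0.1, 0.2, 0.6, 0.1])
```

The two summaries can differ noticeably when the posterior is spread out or bimodal, which typically happens just after a suspected changepoint.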
Tuning the GMM Model
Number of Particles (n)
More particles give a more accurate posterior approximation but make each update slower:
model = BOCDGMM(..., n=100) # Fast, less accurate
model = BOCDGMM(..., n=500) # Balanced
model = BOCDGMM(..., n=1000) # Slow, more accurate
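A standard diagnostic for whether `n` is large enough is the effective sample size (ESS) of the particle weights: a value far below `n` means a few particles dominate the approximation. The sketch below assumes the weights are available as a list, which pybocd may not expose; `effective_sample_size` is an illustrative name.

```python
def effective_sample_size(weights):
    """ESS = (sum w)^2 / sum(w^2); equals len(weights) for uniform weights."""
    total = sum(weights)
    sq_total = sum(w * w for w in weights)
    if sq_total == 0.0:
        raise ValueError("weights must not all be zero")
    return (total * total) / sq_total

uniform_ess = effective_sample_size([0.25] * 4)               # 4.0
degenerate_ess = effective_sample_size([1.0, 0.0, 0.0, 0.0])  # 1.0
```

Because the formula divides by the squared weights, it works for unnormalized weights as well.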
Number of Components (m)
Controls mixture complexity:
model = BOCDGMM(..., m=5) # Few components, simple patterns
model = BOCDGMM(..., m=20) # Moderate complexity
model = BOCDGMM(..., m=50) # High complexity, more flexible
Jitter Parameters
Smoothing for particle diversity:
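# Less jitter: particles stay close to their current values (sharper updates, less diversity)
model = BOCDGMM(..., jitter_mu=0.001, jitter_sigma_sq=0.001)
# More jitter: particles are perturbed more strongly (more diversity, smoother updates)
model = BOCDGMM(..., jitter_mu=0.1, jitter_sigma_sq=0.1)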
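In particle filters, jitter is commonly applied right after resampling: duplicated particles are perturbed with small Gaussian noise so they do not collapse onto identical values. The sketch below shows this generic step for particle means only and treats the jitter value as a noise standard deviation. How pybocd interprets `jitter_mu` (variance or standard deviation, and at which step it is applied) is an assumption here.

```python
import random

def resample_with_jitter(particles, weights, jitter_std, rng):
    """Multinomial resampling followed by Gaussian jitter on each particle."""
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    return [p + rng.gauss(0.0, jitter_std) for p in resampled]

rng = random.Random(0)
particles = [-2.0, 0.0, 2.0]
weights = [0.0, 0.0, 1.0]  # all mass on the last particle
jittered = resample_with_jitter(particles, weights, jitter_std=0.01, rng=rng)
distinct_values = len(set(jittered))
```

Without jitter all three resampled particles would equal 2.0; with it they stay near 2.0 but remain distinct, which preserves diversity for later updates.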
Prior Settings
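Weak priors (the data dominate the estimates quickly):
model = BOCDGMM(..., alpha_0=1.0, beta_0=1.0, kappa_0=0.1)
Strong priors (estimates stay close to the prior values for longer):
model = BOCDGMM(..., alpha_0=10.0, beta_0=10.0, kappa_0=10.0)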
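The effect of `kappa_0` is easiest to see in the textbook conjugate update for a Gaussian mean, where the posterior mean is a weighted average of the prior mean `m_0` and the sample mean, with `kappa_0` acting as a pseudo-observation count. That BOCD-GMM applies this formula per component in exactly this form is an assumption; `posterior_mean` is an illustrative helper.

```python
def posterior_mean(m_0, kappa_0, samples):
    """Posterior mean of a Gaussian mean under a conjugate prior."""
    n = len(samples)
    sample_mean = sum(samples) / n if n else 0.0
    return (kappa_0 * m_0 + n * sample_mean) / (kappa_0 + n)

samples = [2.0] * 10                          # ten observations at 2.0
weak = posterior_mean(0.0, 0.1, samples)      # about 1.98, dominated by the data
strong = posterior_mean(0.0, 10.0, samples)   # 1.0, halfway to the prior mean
```

With `kappa_0=10.0` the prior counts as much as the ten observations, so a strong prior delays adaptation to a new regime.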
Performance Considerations
- Memory: Grows linearly with number of particles and components
- Speed: Slower than BOCD-NIG by 5-50x depending on parameters
- Accuracy: Better on non-Gaussian, multimodal data
- Stability: More stable with larger particle counts
Comparison with BOCD-NIG
| Aspect | BOCD-NIG | BOCD-GMM |
|---|---|---|
| Data Type | Univariate, normal | Multimodal, complex |
| Speed | Very fast | Slower |
| Robustness to outliers | Low | High |
| Hyperparameters | Few (4-5) | Many (10+) |
| Computational Cost | O(1) | O(n × m) |
| Best For | Simple streams | Complex distributions |
References
For theoretical details, see the original BOCD paper (Adams and MacKay, 2007, "Bayesian Online Changepoint Detection") and the literature on particle filtering (sequential Monte Carlo).