Fitting the CMA Model to Kaggle Brist1D CGM Data¶
Reference: https://www.kaggle.com/competitions/brist1d/data
Dataset description (from Kaggle)¶
The dataset is from a study that collected data from young adults in the UK with type 1 diabetes, who used a continuous glucose monitor (CGM), an insulin pump and a smartwatch. These devices collected blood glucose readings, insulin dosage, carbohydrate intake, and activity data. The data collected was aggregated to five-minute intervals and formatted into samples. Each sample represents a point in time and includes the aggregated five-minute intervals from the previous six hours. The aim is to predict the blood glucose reading an hour into the future, for each of these samples.
The training set takes samples from the first three months of study data from nine of the participants and includes the future blood glucose value. These training samples appear in chronological order and overlap. The testing set takes samples from the remainder of the study period from fifteen of the participants (so unseen participants appear in the testing set). These testing samples do not overlap and are in a random order to avoid data leakage.
Complexities to be aware of:
- this is medical data so there are missing values and noise in the data
- the participants did not all use the same device models (CGM, insulin pump and smartwatch) so there may be differences in the collection method of the data
- some participants in the test set do not appear in the training set
#%%sh
### Install dependencies for conda:
# nix-shell -p micromamba
# micromamba install -c conda-forge ipykernel pandas --force-reinstall
#"$HOME/.nix-profile/bin/micromamba" run pip install -e '../../' --upgrade --force-reinstall
Load the Kaggle Brist1D Dataset¶
import pandas as pd
import numpy as np
from pfun_cma_model.misc.pathdefs import PFunDataPaths
import duckdb
from datetime import datetime
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
# Ensure your kaggle.json is correctly configured before running this command
# We use the built-in paths helper to download the Brist1D dataset via the Kaggle API and save it as a fast Parquet file
paths = PFunDataPaths()
paths.download_kaggle_brist1d(overwrite=True)
# Load CGM data from the downloaded Brist1D dataset
df = pd.read_parquet(paths.brist1d_data_fpath)
df
2026-04-22 17:37:37 [DEBUG] root: Parsing REDIS_CONNECTION_STRING: redis://default:HyTxCIiOwQecaBKAUEvRNjQD5EYFZQsp@redis-16117.c16.us-east-1-2.ec2.cloud.redislabs.com:16117 2026-04-22 17:37:37 [DEBUG] root: Parsed Redis URL: ParseResult(scheme='redis', netloc='default:HyTxCIiOwQecaBKAUEvRNjQD5EYFZQsp@redis-16117.c16.us-east-1-2.ec2.cloud.redislabs.com:16117', path='', params='', query='', fragment='') 2026-04-22 17:37:37 [DEBUG] root: Parsed Redis host: redis-16117.c16.us-east-1-2.ec2.cloud.redislabs.com 2026-04-22 17:37:37 [DEBUG] root: Parsed Redis settings: host=redis-16117.c16.us-east-1-2.ec2.cloud.redislabs.com, port=16117, user=default, db=0 2026-04-22 17:37:37 [DEBUG] matplotlib: matplotlib data path: /home/robbiec/Git/pfun-cma-model/.venv/lib/python3.12/site-packages/matplotlib/mpl-data 2026-04-22 17:37:37 [DEBUG] matplotlib: CONFIGDIR=/home/robbiec/.config/matplotlib 2026-04-22 17:37:37 [DEBUG] matplotlib: interactive is False 2026-04-22 17:37:37 [DEBUG] matplotlib: platform is linux 2026-04-22 17:37:37 [DEBUG] matplotlib: CACHEDIR=/home/robbiec/.cache/matplotlib 2026-04-22 17:37:37 [DEBUG] matplotlib.font_manager: Using fontManager instance from /home/robbiec/.cache/matplotlib/fontlist-v390.json 2026-04-22 17:37:37 [DEBUG] root: pfun-cma-model version: 0.4.199 2026-04-22 17:37:38 [INFO] pfun-cma-model-app: Downloading brist1d competition dataset from Kaggle... Downloading train.csv.zip to /home/robbiec/Git/pfun-cma-model/.venv/lib/python3.12/site-packages/pfun_common/data
100%|██████████| 7.95M/7.95M [00:00<00:00, 39.8MB/s]
2026-04-22 17:37:39 [INFO] pfun-cma-model-app: Converting downloaded CSV to Parquet using DuckDB... 2026-04-22 17:37:42 [INFO] pfun-cma-model-app: Successfully converted and saved to /home/robbiec/Git/pfun-cma-model/.venv/lib/python3.12/site-packages/pfun_common/data/brist1d_train.parquet.
| id | p_num | time | bg-5:55 | bg-5:50 | bg-5:45 | bg-5:40 | bg-5:35 | bg-5:30 | bg-5:25 | ... | activity-0:40 | activity-0:35 | activity-0:30 | activity-0:25 | activity-0:20 | activity-0:15 | activity-0:10 | activity-0:05 | activity-0:00 | bg+1:00 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | p01_0 | p01 | 06:10:00 | NaN | NaN | 9.6 | NaN | NaN | 9.7 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 13.4 |
| 1 | p01_1 | p01 | 06:25:00 | NaN | NaN | 9.7 | NaN | NaN | 9.2 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 12.8 |
| 2 | p01_2 | p01 | 06:40:00 | NaN | NaN | 9.2 | NaN | NaN | 8.7 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 15.5 |
| 3 | p01_3 | p01 | 06:55:00 | NaN | NaN | 8.7 | NaN | NaN | 8.4 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 14.8 |
| 4 | p01_4 | p01 | 07:10:00 | NaN | NaN | 8.4 | NaN | NaN | 8.1 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 12.7 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 177019 | p12_25294 | p12 | 23:35:00 | 8.8 | 9.1 | 9.2 | 9.4 | 9.8 | 10.2 | 10.4 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 11.1 |
| 177020 | p12_25295 | p12 | 23:40:00 | 9.1 | 9.2 | 9.4 | 9.8 | 10.2 | 10.4 | 10.3 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 10.9 |
| 177021 | p12_25296 | p12 | 23:45:00 | 9.2 | 9.4 | 9.8 | 10.2 | 10.4 | 10.3 | 10.1 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 10.7 |
| 177022 | p12_25297 | p12 | 23:50:00 | 9.4 | 9.8 | 10.2 | 10.4 | 10.3 | 10.1 | 10.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 10.5 |
| 177023 | p12_25298 | p12 | 23:55:00 | 9.8 | 10.2 | 10.4 | 10.3 | 10.1 | 10.0 | 9.8 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 10.2 |
177024 rows × 508 columns
display(df.dtypes.to_frame().T)
| id | p_num | time | bg-5:55 | bg-5:50 | bg-5:45 | bg-5:40 | bg-5:35 | bg-5:30 | bg-5:25 | ... | activity-0:40 | activity-0:35 | activity-0:30 | activity-0:25 | activity-0:20 | activity-0:15 | activity-0:10 | activity-0:05 | activity-0:00 | bg+1:00 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | str | str | object | float64 | float64 | float64 | float64 | float64 | float64 | float64 | ... | str | str | str | str | str | str | str | str | str | float64 |
1 rows × 508 columns
Preprocess the data¶
The model requires 'time' in UTC format, and typically 'G' (glucose) and 'sg' (sensor glucose) columns
In the Kaggle dataset, 'time' is just HH:MM:SS, and 'bg-0:00' is the current blood glucose reading. We synthesize full dates starting from today for demonstration purposes.
from typing import Sequence
# For this example, we will focus on a single participant's continuous glucose trace
p_num = 'p02'
participant_df = df[df['p_num'] == p_num].copy().reset_index(drop=True)
def create_synthetic_datetime(time: pd.Series) -> Sequence['datetime64[us, UTC]']:
today_string = pd.Timestamp.today().strftime('%Y-%m-%d')
datetime_string = today_string + ' ' + time.astype(str)
datetime_vector = pd.to_datetime(datetime_string).dt.tz_localize('UTC')
return datetime_vector
participant_df['time'] = create_synthetic_datetime(participant_df["time"])
participant_df.sort_values(by="time", inplace=True)
participant_df.set_index("time", drop=False, inplace=True)
participant_df['displayTime'] = participant_df['time']
participant_df['systemTime'] = participant_df['time']
# Combine sources of glucose data (shifting time index to match actual measurement time)
participant_df['G'] = participant_df['bg-0:00'] # Actual glucose value at the time index
bg_columns = [c for c in participant_df.columns if 'bg' in c]
time_vec = pd.to_datetime(participant_df["time"]).drop_duplicates(keep='first')
for col in bg_columns:
print(f"Incorporating {col}...")
delta = col.replace("bg", "") + ":00"
shifted_time = time_vec + pd.Timedelta(delta)
shifted_time.name = "shifted_time"
g_vector = participant_df[col].drop_duplicates().resample('5min').mean()
bg_new = pd.DataFrame({"G": g_vector}, index=shifted_time).dropna()
participant_df = participant_df.combine_first(bg_new)
print(f"N_missing=", participant_df["G"].isna().sum())
# complete cgm dataset
participant_df['sg'] = participant_df['G']
cgm_data = participant_df[['time', 'G', 'sg']]
cgm_data = cgm_data \
.groupby(level=0).mean()
# Checking the structure of the loaded data
print(cgm_data.head(), end="\n")
print("Shape:\n", cgm_data.shape, end="\n")
print("Data types:\n", cgm_data.dtypes, end="\n")
print(cgm_data.describe(), end="\n")
Incorporating bg-5:55...
N_missing= 40
Incorporating bg-5:50...
N_missing= 33
Incorporating bg-5:45...
N_missing= 30
Incorporating bg-5:40...
N_missing= 30
Incorporating bg-5:35...
N_missing= 30
Incorporating bg-5:30...
N_missing= 30
Incorporating bg-5:25...
N_missing= 30
Incorporating bg-5:20...
N_missing= 30
Incorporating bg-5:15...
N_missing= 30
Incorporating bg-5:10...
N_missing= 30
Incorporating bg-5:05...
N_missing= 30
Incorporating bg-5:00...
N_missing= 30
Incorporating bg-4:55...
N_missing= 30
Incorporating bg-4:50...
N_missing= 30
Incorporating bg-4:45...
N_missing= 30
Incorporating bg-4:40...
N_missing= 30
Incorporating bg-4:35...
N_missing= 30
Incorporating bg-4:30...
N_missing= 30
Incorporating bg-4:25...
N_missing= 30
Incorporating bg-4:20...
N_missing= 30
Incorporating bg-4:15...
N_missing= 30
Incorporating bg-4:10...
N_missing= 30
Incorporating bg-4:05...
N_missing= 30
Incorporating bg-4:00...
N_missing= 30
Incorporating bg-3:55...
N_missing= 30
Incorporating bg-3:50...
N_missing= 30
Incorporating bg-3:45...
N_missing= 30
Incorporating bg-3:40...
N_missing= 30
Incorporating bg-3:35...
N_missing= 30
Incorporating bg-3:30...
N_missing= 30
Incorporating bg-3:25...
N_missing= 30
Incorporating bg-3:20...
N_missing= 30
Incorporating bg-3:15...
N_missing= 30
Incorporating bg-3:10...
N_missing= 30
Incorporating bg-3:05...
N_missing= 30
Incorporating bg-3:00...
N_missing= 30
Incorporating bg-2:55...
N_missing= 30
Incorporating bg-2:50...
N_missing= 30
Incorporating bg-2:45...
N_missing= 30
Incorporating bg-2:40...
N_missing= 30
Incorporating bg-2:35...
N_missing= 30
Incorporating bg-2:30...
N_missing= 30
Incorporating bg-2:25...
N_missing= 30
Incorporating bg-2:20...
N_missing= 30
Incorporating bg-2:15...
N_missing= 30
Incorporating bg-2:10...
N_missing= 30
Incorporating bg-2:05...
N_missing= 30
Incorporating bg-2:00...
N_missing= 30
Incorporating bg-1:55...
N_missing= 30
Incorporating bg-1:50...
N_missing= 30
Incorporating bg-1:45...
N_missing= 30
Incorporating bg-1:40...
N_missing= 30
Incorporating bg-1:35...
N_missing= 29
Incorporating bg-1:30...
N_missing= 28
Incorporating bg-1:25...
N_missing= 27
Incorporating bg-1:20...
N_missing= 26
Incorporating bg-1:15...
N_missing= 25
Incorporating bg-1:10...
N_missing= 24
Incorporating bg-1:05...
N_missing= 24
Incorporating bg-1:00...
N_missing= 24
Incorporating bg-0:55...
N_missing= 24
Incorporating bg-0:50...
N_missing= 24
Incorporating bg-0:45...
N_missing= 24
Incorporating bg-0:40...
N_missing= 24
Incorporating bg-0:35...
N_missing= 24
Incorporating bg-0:30...
N_missing= 24
Incorporating bg-0:25...
N_missing= 24
Incorporating bg-0:20...
N_missing= 24
Incorporating bg-0:15...
N_missing= 24
Incorporating bg-0:10...
N_missing= 24
Incorporating bg-0:05...
N_missing= 24
Incorporating bg-0:00...
N_missing= 24
Incorporating bg+1:00...
N_missing= 23
time G sg
time
2026-04-22 00:00:00+00:00 2026-04-22 00:00:00+00:00 10.767632 10.767632
2026-04-22 00:05:00+00:00 2026-04-22 00:05:00+00:00 10.778140 10.778140
2026-04-22 00:10:00+00:00 2026-04-22 00:10:00+00:00 10.906512 10.906512
2026-04-22 00:15:00+00:00 2026-04-22 00:15:00+00:00 11.069767 11.069767
2026-04-22 00:20:00+00:00 2026-04-22 00:20:00+00:00 11.068837 11.068837
Shape:
(288, 3)
Data types:
time datetime64[us, UTC]
G float64
sg float64
dtype: object
G sg
count 288.000000 288.000000
mean 9.358485 9.358485
std 1.007725 1.007725
min 7.486667 7.486667
25% 8.605232 8.605232
50% 9.392788 9.392788
75% 10.232831 10.232831
max 11.305871 11.305871
Plot the selected CGM data¶
from pfun_common.plot import lineplot as pfun_lineplot
original_sg = cgm_data['sg'].copy()
cgm_data['sg'] = original_sg * 18
cgm_data['G'] = original_sg * 18
ax = pfun_lineplot(cgm_data, tcol="time", ycol="sg")
ax.set_title(f"BG Data ({p_num})")
ax.figure.set_size_inches(10, 4)
Perform the model fit¶
- Our fitting procedure is sensitive to periodic noise.
- Some hyperparameter optimization will help find an ROI in the parameter space.
#from scipy.optimize import minimize
from pfun_cma_model.engine.fit import fit_model
# Fit the model to the CGM data
# Bounds-constrained minimization:
# Nelder-Mead, L-BFGS-B, TNC, SLSQP, Powell
# Define the hyperparameters to vary
algorithms = (
"Nelder-Mead",
"L-BFGS-B",
"TNC",
"SLSQP",
"Powell"
)
nr_timepoints = (24, 48, 96, 192, 512, 720, 1024, 4096, )
meal_freq_estimates = ('30min', '1h', '2h', '6h', )
residuals = {}
def get_residual(result) -> float:
residual = result.infodict["result"].fun
return residual
for N in nr_timepoints: # number of time points (resampling)
for tm_freq in meal_freq_estimates: # meal frequency estimate
for algo in algorithms: # bounds-constrained minimization algorithm
try:
result = fit_model(
cgm_data,
tcol='time',
ycol='G',
tm_freq=tm_freq,
N=N,
curve_fit_kwargs={
"method": f"{algo}",
}
)
except RuntimeError as exception:
logging.warning(
"\n({%s %s, %s) Failed to converge.",
N, algo, tm_freq,
exc_info=False
)
else:
result_key = (N, algo, tm_freq)
residual = get_residual(result)
residuals[result_key] = result
print(f"{str(result_key)} residual = {residual:.4f}")
(24, 'Nelder-Mead', '30min') residual = 0.2006 (24, 'L-BFGS-B', '30min') residual = 0.2006 (24, 'TNC', '30min') residual = 0.2006 (24, 'SLSQP', '30min') residual = 0.2006 (24, 'Powell', '30min') residual = 0.2006 (24, 'Nelder-Mead', '1h') residual = 0.2023 (24, 'L-BFGS-B', '1h') residual = 0.2023 (24, 'TNC', '1h') residual = 0.2023 (24, 'SLSQP', '1h') residual = 0.2023 (24, 'Powell', '1h') residual = 0.2023 (24, 'Nelder-Mead', '2h') residual = 0.3974 (24, 'L-BFGS-B', '2h') residual = 0.3974 (24, 'TNC', '2h') residual = 0.3974 (24, 'SLSQP', '2h') residual = 0.3974 (24, 'Powell', '2h') residual = 0.3974 (24, 'Nelder-Mead', '6h') residual = 0.2009 (24, 'L-BFGS-B', '6h') residual = 0.2009 (24, 'TNC', '6h') residual = 0.2009 (24, 'SLSQP', '6h') residual = 0.2009 (24, 'Powell', '6h') residual = 0.2009 (48, 'Nelder-Mead', '30min') residual = 0.4304 (48, 'L-BFGS-B', '30min') residual = 0.4304 (48, 'TNC', '30min') residual = 0.4304 (48, 'SLSQP', '30min') residual = 0.4304 (48, 'Powell', '30min') residual = 0.4304 (48, 'Nelder-Mead', '1h') residual = 0.4337 (48, 'L-BFGS-B', '1h') residual = 0.4337 (48, 'TNC', '1h') residual = 0.4337 (48, 'SLSQP', '1h') residual = 0.4337 (48, 'Powell', '1h') residual = 0.4337 (48, 'Nelder-Mead', '2h') residual = 0.8018 (48, 'L-BFGS-B', '2h') residual = 0.8018 (48, 'TNC', '2h') residual = 0.8018 (48, 'SLSQP', '2h') residual = 0.8018 (48, 'Powell', '2h') residual = 0.8018 (48, 'Nelder-Mead', '6h') residual = 0.4283 (48, 'L-BFGS-B', '6h') residual = 0.4283 (48, 'TNC', '6h') residual = 0.4283 (48, 'SLSQP', '6h') residual = 0.4283 (48, 'Powell', '6h') residual = 0.4283 (96, 'Nelder-Mead', '30min') residual = 1.0431 (96, 'L-BFGS-B', '30min') residual = 1.0431 (96, 'TNC', '30min') residual = 1.0431 (96, 'SLSQP', '30min') residual = 1.0431 (96, 'Powell', '30min') residual = 1.0431 (96, 'Nelder-Mead', '1h') residual = 0.8712 (96, 'L-BFGS-B', '1h') residual = 0.8712 (96, 'TNC', '1h') residual = 0.8712 (96, 'SLSQP', '1h') residual = 0.8712 (96, 'Powell', '1h') residual = 0.8712 (96, 'Nelder-Mead', '2h') residual = 1.6713 (96, 'L-BFGS-B', '2h') residual = 1.6713 (96, 'TNC', '2h') residual = 1.6713 (96, 'SLSQP', '2h') residual = 1.6713 (96, 'Powell', '2h') residual = 1.6713 (96, 'Nelder-Mead', '6h') residual = 0.8688 (96, 'L-BFGS-B', '6h') residual = 0.8688 (96, 'TNC', '6h') residual = 0.8688 (96, 'SLSQP', '6h') residual = 0.8688 (96, 'Powell', '6h') residual = 0.8688 (192, 'Nelder-Mead', '30min') residual = 1.7460 (192, 'L-BFGS-B', '30min') residual = 1.7460 (192, 'TNC', '30min') residual = 1.7460 (192, 'SLSQP', '30min') residual = 1.7460 (192, 'Powell', '30min') residual = 1.7460 (192, 'Nelder-Mead', '1h') residual = 4.6229 (192, 'L-BFGS-B', '1h') residual = 4.6229 (192, 'TNC', '1h') residual = 4.6229 (192, 'SLSQP', '1h') residual = 4.6229 (192, 'Powell', '1h') residual = 4.6229 (192, 'Nelder-Mead', '2h') residual = 1.7460 (192, 'L-BFGS-B', '2h') residual = 1.7460 (192, 'TNC', '2h') residual = 1.7460 (192, 'SLSQP', '2h') residual = 1.7460 (192, 'Powell', '2h') residual = 1.7460 (192, 'Nelder-Mead', '6h') residual = 1.7460 (192, 'L-BFGS-B', '6h') residual = 1.7460 (192, 'TNC', '6h') residual = 1.7460 (192, 'SLSQP', '6h') residual = 1.7460 (192, 'Powell', '6h') residual = 1.7460 (512, 'Nelder-Mead', '30min') residual = 5.0998 (512, 'L-BFGS-B', '30min') residual = 5.0998 (512, 'TNC', '30min') residual = 5.0998 (512, 'SLSQP', '30min') residual = 5.0998 (512, 'Powell', '30min') residual = 5.0998 (512, 'Nelder-Mead', '1h') residual = 4.6731 (512, 'L-BFGS-B', '1h') residual = 4.6731 (512, 'TNC', '1h') residual = 4.6731 (512, 'SLSQP', '1h') residual = 4.6731 (512, 'Powell', '1h') residual = 4.6731 (512, 'Nelder-Mead', '2h') residual = 4.6717 (512, 'L-BFGS-B', '2h') residual = 4.6717 (512, 'TNC', '2h') residual = 4.6717 (512, 'SLSQP', '2h') residual = 4.6717 (512, 'Powell', '2h') residual = 4.6717 (512, 'Nelder-Mead', '6h') residual = 4.6731 (512, 'L-BFGS-B', '6h') residual = 4.6731 (512, 'TNC', '6h') residual = 4.6731 (512, 'SLSQP', '6h') residual = 4.6731 (512, 'Powell', '6h') residual = 4.6731 (720, 'Nelder-Mead', '30min') residual = 6.5632 (720, 'L-BFGS-B', '30min') residual = 6.5632 (720, 'TNC', '30min') residual = 6.5632 (720, 'SLSQP', '30min') residual = 6.5632 (720, 'Powell', '30min') residual = 6.5632 (720, 'Nelder-Mead', '1h') residual = 6.5632 (720, 'L-BFGS-B', '1h') residual = 6.5632 (720, 'TNC', '1h') residual = 6.5632 (720, 'SLSQP', '1h') residual = 6.5632 (720, 'Powell', '1h') residual = 6.5632 (720, 'Nelder-Mead', '2h') residual = 6.5609 (720, 'L-BFGS-B', '2h') residual = 6.5609 (720, 'TNC', '2h') residual = 6.5609 (720, 'SLSQP', '2h') residual = 6.5609 (720, 'Powell', '2h') residual = 6.5609 (720, 'Nelder-Mead', '6h') residual = 6.5610 (720, 'L-BFGS-B', '6h') residual = 6.5610 (720, 'TNC', '6h') residual = 6.5610 (720, 'SLSQP', '6h') residual = 6.5610 (720, 'Powell', '6h') residual = 6.5610 (1024, 'Nelder-Mead', '30min') residual = 9.3311 (1024, 'L-BFGS-B', '30min') residual = 9.3311 (1024, 'TNC', '30min') residual = 9.3311 (1024, 'SLSQP', '30min') residual = 9.3311 (1024, 'Powell', '30min') residual = 9.3311 (1024, 'Nelder-Mead', '1h') residual = 9.3311 (1024, 'L-BFGS-B', '1h') residual = 9.3311 (1024, 'TNC', '1h') residual = 9.3311 (1024, 'SLSQP', '1h') residual = 9.3311 (1024, 'Powell', '1h') residual = 9.3311 (1024, 'Nelder-Mead', '2h') residual = 9.3311 (1024, 'L-BFGS-B', '2h') residual = 9.3311 (1024, 'TNC', '2h') residual = 9.3311 (1024, 'SLSQP', '2h') residual = 9.3311 (1024, 'Powell', '2h') residual = 9.3311 (1024, 'Nelder-Mead', '6h') residual = 9.3311 (1024, 'L-BFGS-B', '6h') residual = 9.3311 (1024, 'TNC', '6h') residual = 9.3311 (1024, 'SLSQP', '6h') residual = 9.3311 (1024, 'Powell', '6h') residual = 9.3311 (4096, 'Nelder-Mead', '30min') residual = 37.3694 (4096, 'L-BFGS-B', '30min') residual = 37.3694 (4096, 'TNC', '30min') residual = 37.3694 (4096, 'SLSQP', '30min') residual = 37.3694 (4096, 'Powell', '30min') residual = 37.3694 (4096, 'Nelder-Mead', '1h') residual = 37.3694 (4096, 'L-BFGS-B', '1h') residual = 37.3694 (4096, 'TNC', '1h') residual = 37.3694 (4096, 'SLSQP', '1h') residual = 37.3694 (4096, 'Powell', '1h') residual = 37.3694 (4096, 'Nelder-Mead', '2h') residual = 37.3672 (4096, 'L-BFGS-B', '2h') residual = 37.3672 (4096, 'TNC', '2h') residual = 37.3672 (4096, 'SLSQP', '2h') residual = 37.3672 (4096, 'Powell', '2h') residual = 37.3672 (4096, 'Nelder-Mead', '6h') residual = 37.4414 (4096, 'L-BFGS-B', '6h') residual = 37.4414 (4096, 'TNC', '6h') residual = 37.4414 (4096, 'SLSQP', '6h') residual = 37.4414
Analyze hyperparameters¶
resid_by_N = map(lambda pair: (pair[0][0], get_residual(pair[1])), residuals.items())
df_resid_by_N = pd.DataFrame(resid_by_N, columns=["N", "residual"])
ax = df_resid_by_N.plot(x="N", y="residual", linestyle='', marker='o')
ax.set_ylabel("Residual")
resid_by_Algo = map(lambda pair: (pair[0][1], get_residual(pair[1])), residuals.items())
df_resid_by_Algo = pd.DataFrame(resid_by_Algo, columns=["Algo", "residual"])
ax = df_resid_by_Algo.boxplot(by="Algo")
resid_by_tMf = map(lambda pair: (pair[0][2], get_residual(pair[1])), residuals.items())
df_resid_by_tMf = pd.DataFrame(resid_by_tMf, columns=["Meal Freq", "residual"])
ax = df_resid_by_tMf.boxplot(by="Meal Freq")
def weighted_objective_function(result):
residual = get_residual(result)
N = result.soln.shape[0]
weighted_objective = residual / ((N + 1)**(0.5))
return weighted_objective
best_algorithm, model_results = min(
residuals.items(),
key=lambda pair: weighted_objective_function(pair[1])
)
best_residual = get_residual(model_results)
print(f"Algorithm: '{best_algorithm}' ({best_residual:.3f})")
model_results.soln.head()
model_results.soln.plot(x="t", y="G")
print("Model Parameters:")
print(model_results.popt_named)
print("\nModel Fit Results Info:")
print(model_results.infodict)
print(model_results.infodict['message'])
# Displaying results
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))
# Use formatted_data for normalized raw glucose values (scaled to [0, 2])
plt.plot(model_results.formatted_data['t'], model_results.formatted_data['G'], label='CGM Data (normalized)', marker='o', linestyle='none', alpha=0.5)
# The model_results.soln dataframe contains the fitted curve (indexed by timedelta in hours)
plt.plot(model_results.soln["t"], model_results.soln['G'], label='Fitted CMA Model', color='red', linewidth=2)
plt.title(f'Participant {p_num}, CMA Model Fit\nResidual = {best_residual:.3f}\nNormalized BG over 24-hour time')
plt.xlabel('Time (hours)')
plt.ylabel('Glucose (normalized)')
plt.legend()
plt.show()
Posthoc Analysis¶
The PFun CMA model effectively characterizes the dynamic continuous glucose monitor (CGM) traces by adjusting specific physiological parameters (taup, taug, Cm).
By applying the model directly to the Kaggle Brist1D dataset, we observe that the CMA model is capable of fitting its compartmental equations to the raw glucose (G) values, providing the solid 'red' regression line above. The infodict generated through the fitting process yields vital metrics, like termination messages, to confirm the algorithm successfully converged on optimized coefficients.
Key Highlights:
- The fitting process expects timestamp columns (
timeby default) in UTC format, which we handled by appending arbitrary UTC dates to theHH:MM:SStime values provided in the competition dataset. - Additional preprocessing steps are often handled directly through
fit_modelcallingformat_data. - By observing the parameter bounds and bounds conditions, one can tune the CMA model further to adjust to real physiological profiles seen across different participants.
Save rendered html¶
%%sh
uv run jupyter nbconvert --to html kaggle-brist1d-fit-model-example.ipynb --output kaggle-brist1d-fit-model-example.html
cp kaggle-brist1d-fit-model-example.html ../../pfun_cma_model/static/notebooks/kaggle-brist1d-fit-model-example.html