Fitting the CMA Model to Kaggle Brist1D CGM Data¶

Reference: https://www.kaggle.com/competitions/brist1d/data

Dataset description (from Kaggle)¶

The dataset is from a study that collected data from young adults in the UK with type 1 diabetes, who used a continuous glucose monitor (CGM), an insulin pump and a smartwatch. These devices collected blood glucose readings, insulin dosage, carbohydrate intake, and activity data. The data collected was aggregated to five-minute intervals and formatted into samples. Each sample represents a point in time and includes the aggregated five-minute intervals from the previous six hours. The aim is to predict the blood glucose reading an hour into the future, for each of these samples.

The training set takes samples from the first three months of study data from nine of the participants and includes the future blood glucose value. These training samples appear in chronological order and overlap. The testing set takes samples from the remainder of the study period from fifteen of the participants (so unseen participants appear in the testing set). These testing samples do not overlap and are in a random order to avoid data leakage.

Complexities to be aware of:

  • this is medical data so there are missing values and noise in the data
  • the participants did not all use the same device models (CGM, insulin pump and smartwatch) so there may be differences in the collection method of the data
  • some participants in the test set do not appear in the training set
In [1]:
#%%sh

### Install dependencies for conda:

# nix-shell -p micromamba

# micromamba install -c conda-forge ipykernel pandas --force-reinstall
#"$HOME/.nix-profile/bin/micromamba" run pip install -e '../../' --upgrade --force-reinstall

Load the Kaggle Brist1D Dataset¶

In [2]:
import pandas as pd
import numpy as np
from pfun_cma_model.misc.pathdefs import PFunDataPaths
import duckdb
from datetime import datetime
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Ensure your kaggle.json is correctly configured before running this command
# We use the built-in paths helper to download the Brist1D dataset via the Kaggle API and save it as a fast Parquet file
paths = PFunDataPaths()
paths.download_kaggle_brist1d(overwrite=True)

# Load CGM data from the downloaded Brist1D dataset
df = pd.read_parquet(paths.brist1d_data_fpath)
df
2026-04-22 17:37:37 [DEBUG] root: Parsing REDIS_CONNECTION_STRING: redis://default:HyTxCIiOwQecaBKAUEvRNjQD5EYFZQsp@redis-16117.c16.us-east-1-2.ec2.cloud.redislabs.com:16117
2026-04-22 17:37:37 [DEBUG] root: Parsed Redis URL: ParseResult(scheme='redis', netloc='default:HyTxCIiOwQecaBKAUEvRNjQD5EYFZQsp@redis-16117.c16.us-east-1-2.ec2.cloud.redislabs.com:16117', path='', params='', query='', fragment='')
2026-04-22 17:37:37 [DEBUG] root: Parsed Redis host: redis-16117.c16.us-east-1-2.ec2.cloud.redislabs.com
2026-04-22 17:37:37 [DEBUG] root: Parsed Redis settings: host=redis-16117.c16.us-east-1-2.ec2.cloud.redislabs.com, port=16117, user=default, db=0
2026-04-22 17:37:37 [DEBUG] matplotlib: matplotlib data path: /home/robbiec/Git/pfun-cma-model/.venv/lib/python3.12/site-packages/matplotlib/mpl-data
2026-04-22 17:37:37 [DEBUG] matplotlib: CONFIGDIR=/home/robbiec/.config/matplotlib
2026-04-22 17:37:37 [DEBUG] matplotlib: interactive is False
2026-04-22 17:37:37 [DEBUG] matplotlib: platform is linux
2026-04-22 17:37:37 [DEBUG] matplotlib: CACHEDIR=/home/robbiec/.cache/matplotlib
2026-04-22 17:37:37 [DEBUG] matplotlib.font_manager: Using fontManager instance from /home/robbiec/.cache/matplotlib/fontlist-v390.json
2026-04-22 17:37:37 [DEBUG] root: pfun-cma-model version: 0.4.199
2026-04-22 17:37:38 [INFO] pfun-cma-model-app: Downloading brist1d competition dataset from Kaggle...
Downloading train.csv.zip to /home/robbiec/Git/pfun-cma-model/.venv/lib/python3.12/site-packages/pfun_common/data
100%|██████████| 7.95M/7.95M [00:00<00:00, 39.8MB/s]
2026-04-22 17:37:39 [INFO] pfun-cma-model-app: Converting downloaded CSV to Parquet using DuckDB...
2026-04-22 17:37:42 [INFO] pfun-cma-model-app: Successfully converted and saved to /home/robbiec/Git/pfun-cma-model/.venv/lib/python3.12/site-packages/pfun_common/data/brist1d_train.parquet.
Out[2]:
id p_num time bg-5:55 bg-5:50 bg-5:45 bg-5:40 bg-5:35 bg-5:30 bg-5:25 ... activity-0:40 activity-0:35 activity-0:30 activity-0:25 activity-0:20 activity-0:15 activity-0:10 activity-0:05 activity-0:00 bg+1:00
0 p01_0 p01 06:10:00 NaN NaN 9.6 NaN NaN 9.7 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 13.4
1 p01_1 p01 06:25:00 NaN NaN 9.7 NaN NaN 9.2 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 12.8
2 p01_2 p01 06:40:00 NaN NaN 9.2 NaN NaN 8.7 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 15.5
3 p01_3 p01 06:55:00 NaN NaN 8.7 NaN NaN 8.4 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 14.8
4 p01_4 p01 07:10:00 NaN NaN 8.4 NaN NaN 8.1 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 12.7
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
177019 p12_25294 p12 23:35:00 8.8 9.1 9.2 9.4 9.8 10.2 10.4 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 11.1
177020 p12_25295 p12 23:40:00 9.1 9.2 9.4 9.8 10.2 10.4 10.3 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 10.9
177021 p12_25296 p12 23:45:00 9.2 9.4 9.8 10.2 10.4 10.3 10.1 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 10.7
177022 p12_25297 p12 23:50:00 9.4 9.8 10.2 10.4 10.3 10.1 10.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 10.5
177023 p12_25298 p12 23:55:00 9.8 10.2 10.4 10.3 10.1 10.0 9.8 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 10.2

177024 rows × 508 columns

In [3]:
display(df.dtypes.to_frame().T)
id p_num time bg-5:55 bg-5:50 bg-5:45 bg-5:40 bg-5:35 bg-5:30 bg-5:25 ... activity-0:40 activity-0:35 activity-0:30 activity-0:25 activity-0:20 activity-0:15 activity-0:10 activity-0:05 activity-0:00 bg+1:00
0 str str object float64 float64 float64 float64 float64 float64 float64 ... str str str str str str str str str float64

1 rows × 508 columns

Preprocess the data¶

The model requires 'time' in UTC format, and typically 'G' (glucose) and 'sg' (sensor glucose) columns

In the Kaggle dataset, 'time' is just HH:MM:SS, and 'bg-0:00' is the current blood glucose reading. We synthesize full dates starting from today for demonstration purposes.

In [4]:
from typing import Sequence


# For this example, we will focus on a single participant's continuous glucose trace
p_num = 'p02'
participant_df = df[df['p_num'] == p_num].copy().reset_index(drop=True)

def create_synthetic_datetime(time: pd.Series) -> Sequence['datetime64[us, UTC]']:
    today_string = pd.Timestamp.today().strftime('%Y-%m-%d') 
    datetime_string = today_string + ' ' + time.astype(str)
    datetime_vector = pd.to_datetime(datetime_string).dt.tz_localize('UTC')
    return datetime_vector

participant_df['time'] = create_synthetic_datetime(participant_df["time"])
participant_df.sort_values(by="time", inplace=True)
participant_df.set_index("time", drop=False, inplace=True)
participant_df['displayTime'] = participant_df['time']
participant_df['systemTime'] = participant_df['time']

# Combine sources of glucose data (shifting time index to match actual measurement time)
participant_df['G'] = participant_df['bg-0:00']  # Actual glucose value at the time index
bg_columns = [c for c in participant_df.columns if 'bg' in c]
time_vec = pd.to_datetime(participant_df["time"]).drop_duplicates(keep='first')
for col in bg_columns:
    print(f"Incorporating {col}...")
    delta = col.replace("bg", "") + ":00"
    shifted_time = time_vec + pd.Timedelta(delta)
    shifted_time.name = "shifted_time"
    g_vector = participant_df[col].drop_duplicates().resample('5min').mean()
    bg_new = pd.DataFrame({"G": g_vector}, index=shifted_time).dropna()
    participant_df = participant_df.combine_first(bg_new)
    print(f"N_missing=", participant_df["G"].isna().sum())

# complete cgm dataset
participant_df['sg'] = participant_df['G']
cgm_data = participant_df[['time', 'G', 'sg']]
cgm_data = cgm_data \
    .groupby(level=0).mean()

# Checking the structure of the loaded data
print(cgm_data.head(), end="\n")
print("Shape:\n", cgm_data.shape, end="\n")
print("Data types:\n", cgm_data.dtypes, end="\n")
print(cgm_data.describe(), end="\n")
Incorporating bg-5:55...
N_missing= 40
Incorporating bg-5:50...
N_missing= 33
Incorporating bg-5:45...
N_missing= 30
Incorporating bg-5:40...
N_missing= 30
Incorporating bg-5:35...
N_missing= 30
Incorporating bg-5:30...
N_missing= 30
Incorporating bg-5:25...
N_missing= 30
Incorporating bg-5:20...
N_missing= 30
Incorporating bg-5:15...
N_missing= 30
Incorporating bg-5:10...
N_missing= 30
Incorporating bg-5:05...
N_missing= 30
Incorporating bg-5:00...
N_missing= 30
Incorporating bg-4:55...
N_missing= 30
Incorporating bg-4:50...
N_missing= 30
Incorporating bg-4:45...
N_missing= 30
Incorporating bg-4:40...
N_missing= 30
Incorporating bg-4:35...
N_missing= 30
Incorporating bg-4:30...
N_missing= 30
Incorporating bg-4:25...
N_missing= 30
Incorporating bg-4:20...
N_missing= 30
Incorporating bg-4:15...
N_missing= 30
Incorporating bg-4:10...
N_missing= 30
Incorporating bg-4:05...
N_missing= 30
Incorporating bg-4:00...
N_missing= 30
Incorporating bg-3:55...
N_missing= 30
Incorporating bg-3:50...
N_missing= 30
Incorporating bg-3:45...
N_missing= 30
Incorporating bg-3:40...
N_missing= 30
Incorporating bg-3:35...
N_missing= 30
Incorporating bg-3:30...
N_missing= 30
Incorporating bg-3:25...
N_missing= 30
Incorporating bg-3:20...
N_missing= 30
Incorporating bg-3:15...
N_missing= 30
Incorporating bg-3:10...
N_missing= 30
Incorporating bg-3:05...
N_missing= 30
Incorporating bg-3:00...
N_missing= 30
Incorporating bg-2:55...
N_missing= 30
Incorporating bg-2:50...
N_missing= 30
Incorporating bg-2:45...
N_missing= 30
Incorporating bg-2:40...
N_missing= 30
Incorporating bg-2:35...
N_missing= 30
Incorporating bg-2:30...
N_missing= 30
Incorporating bg-2:25...
N_missing= 30
Incorporating bg-2:20...
N_missing= 30
Incorporating bg-2:15...
N_missing= 30
Incorporating bg-2:10...
N_missing= 30
Incorporating bg-2:05...
N_missing= 30
Incorporating bg-2:00...
N_missing= 30
Incorporating bg-1:55...
N_missing= 30
Incorporating bg-1:50...
N_missing= 30
Incorporating bg-1:45...
N_missing= 30
Incorporating bg-1:40...
N_missing= 30
Incorporating bg-1:35...
N_missing= 29
Incorporating bg-1:30...
N_missing= 28
Incorporating bg-1:25...
N_missing= 27
Incorporating bg-1:20...
N_missing= 26
Incorporating bg-1:15...
N_missing= 25
Incorporating bg-1:10...
N_missing= 24
Incorporating bg-1:05...
N_missing= 24
Incorporating bg-1:00...
N_missing= 24
Incorporating bg-0:55...
N_missing= 24
Incorporating bg-0:50...
N_missing= 24
Incorporating bg-0:45...
N_missing= 24
Incorporating bg-0:40...
N_missing= 24
Incorporating bg-0:35...
N_missing= 24
Incorporating bg-0:30...
N_missing= 24
Incorporating bg-0:25...
N_missing= 24
Incorporating bg-0:20...
N_missing= 24
Incorporating bg-0:15...
N_missing= 24
Incorporating bg-0:10...
N_missing= 24
Incorporating bg-0:05...
N_missing= 24
Incorporating bg-0:00...
N_missing= 24
Incorporating bg+1:00...
N_missing= 23
                                               time          G         sg
time                                                                     
2026-04-22 00:00:00+00:00 2026-04-22 00:00:00+00:00  10.767632  10.767632
2026-04-22 00:05:00+00:00 2026-04-22 00:05:00+00:00  10.778140  10.778140
2026-04-22 00:10:00+00:00 2026-04-22 00:10:00+00:00  10.906512  10.906512
2026-04-22 00:15:00+00:00 2026-04-22 00:15:00+00:00  11.069767  11.069767
2026-04-22 00:20:00+00:00 2026-04-22 00:20:00+00:00  11.068837  11.068837
Shape:
 (288, 3)
Data types:
 time    datetime64[us, UTC]
G                   float64
sg                  float64
dtype: object
                G          sg
count  288.000000  288.000000
mean     9.358485    9.358485
std      1.007725    1.007725
min      7.486667    7.486667
25%      8.605232    8.605232
50%      9.392788    9.392788
75%     10.232831   10.232831
max     11.305871   11.305871

Plot the selected CGM data¶

In [5]:
from pfun_common.plot import lineplot as pfun_lineplot
original_sg = cgm_data['sg'].copy()
cgm_data['sg'] = original_sg * 18
cgm_data['G'] = original_sg * 18
ax = pfun_lineplot(cgm_data, tcol="time", ycol="sg")
ax.set_title(f"BG Data ({p_num})")
ax.figure.set_size_inches(10, 4)
No description has been provided for this image

Perform the model fit¶

  • Our fitting procedure is sensitive to periodic noise.
  • Some hyperparameter optimization will help find an ROI in the parameter space.
In [ ]:
#from scipy.optimize import minimize
from pfun_cma_model.engine.fit import fit_model

# Fit the model to the CGM data

# Bounds-constrained minimization:
#   Nelder-Mead, L-BFGS-B, TNC, SLSQP, Powell

# Define the hyperparameters to vary
algorithms = (
    "Nelder-Mead",
    "L-BFGS-B",
    "TNC",
    "SLSQP",
    "Powell"
)
nr_timepoints = (24, 48, 96, 192, 512, 720, 1024, 4096, )
meal_freq_estimates = ('30min', '1h', '2h', '6h', )

residuals = {}

def get_residual(result) -> float:
    residual = result.infodict["result"].fun
    return residual

for N in nr_timepoints:  # number of time points (resampling)
    for tm_freq in meal_freq_estimates:  # meal frequency estimate
        for algo in algorithms:  # bounds-constrained minimization algorithm
            try:
                result = fit_model(
                    cgm_data,
                    tcol='time',
                    ycol='G',
                    tm_freq=tm_freq,
                    N=N,
                    curve_fit_kwargs={
                        "method": f"{algo}",
                    }
                )
            except RuntimeError as exception:
                logging.warning(
                    "\n({%s %s, %s) Failed to converge.",
                    N, algo, tm_freq,
                    exc_info=False
                )
            else:
                result_key = (N, algo, tm_freq)
                residual = get_residual(result)
                residuals[result_key] = result
                print(f"{str(result_key)} residual = {residual:.4f}")
(24, 'Nelder-Mead', '30min') residual = 0.2006
(24, 'L-BFGS-B', '30min') residual = 0.2006
(24, 'TNC', '30min') residual = 0.2006
(24, 'SLSQP', '30min') residual = 0.2006
(24, 'Powell', '30min') residual = 0.2006
(24, 'Nelder-Mead', '1h') residual = 0.2023
(24, 'L-BFGS-B', '1h') residual = 0.2023
(24, 'TNC', '1h') residual = 0.2023
(24, 'SLSQP', '1h') residual = 0.2023
(24, 'Powell', '1h') residual = 0.2023
(24, 'Nelder-Mead', '2h') residual = 0.3974
(24, 'L-BFGS-B', '2h') residual = 0.3974
(24, 'TNC', '2h') residual = 0.3974
(24, 'SLSQP', '2h') residual = 0.3974
(24, 'Powell', '2h') residual = 0.3974
(24, 'Nelder-Mead', '6h') residual = 0.2009
(24, 'L-BFGS-B', '6h') residual = 0.2009
(24, 'TNC', '6h') residual = 0.2009
(24, 'SLSQP', '6h') residual = 0.2009
(24, 'Powell', '6h') residual = 0.2009
(48, 'Nelder-Mead', '30min') residual = 0.4304
(48, 'L-BFGS-B', '30min') residual = 0.4304
(48, 'TNC', '30min') residual = 0.4304
(48, 'SLSQP', '30min') residual = 0.4304
(48, 'Powell', '30min') residual = 0.4304
(48, 'Nelder-Mead', '1h') residual = 0.4337
(48, 'L-BFGS-B', '1h') residual = 0.4337
(48, 'TNC', '1h') residual = 0.4337
(48, 'SLSQP', '1h') residual = 0.4337
(48, 'Powell', '1h') residual = 0.4337
(48, 'Nelder-Mead', '2h') residual = 0.8018
(48, 'L-BFGS-B', '2h') residual = 0.8018
(48, 'TNC', '2h') residual = 0.8018
(48, 'SLSQP', '2h') residual = 0.8018
(48, 'Powell', '2h') residual = 0.8018
(48, 'Nelder-Mead', '6h') residual = 0.4283
(48, 'L-BFGS-B', '6h') residual = 0.4283
(48, 'TNC', '6h') residual = 0.4283
(48, 'SLSQP', '6h') residual = 0.4283
(48, 'Powell', '6h') residual = 0.4283
(96, 'Nelder-Mead', '30min') residual = 1.0431
(96, 'L-BFGS-B', '30min') residual = 1.0431
(96, 'TNC', '30min') residual = 1.0431
(96, 'SLSQP', '30min') residual = 1.0431
(96, 'Powell', '30min') residual = 1.0431
(96, 'Nelder-Mead', '1h') residual = 0.8712
(96, 'L-BFGS-B', '1h') residual = 0.8712
(96, 'TNC', '1h') residual = 0.8712
(96, 'SLSQP', '1h') residual = 0.8712
(96, 'Powell', '1h') residual = 0.8712
(96, 'Nelder-Mead', '2h') residual = 1.6713
(96, 'L-BFGS-B', '2h') residual = 1.6713
(96, 'TNC', '2h') residual = 1.6713
(96, 'SLSQP', '2h') residual = 1.6713
(96, 'Powell', '2h') residual = 1.6713
(96, 'Nelder-Mead', '6h') residual = 0.8688
(96, 'L-BFGS-B', '6h') residual = 0.8688
(96, 'TNC', '6h') residual = 0.8688
(96, 'SLSQP', '6h') residual = 0.8688
(96, 'Powell', '6h') residual = 0.8688
(192, 'Nelder-Mead', '30min') residual = 1.7460
(192, 'L-BFGS-B', '30min') residual = 1.7460
(192, 'TNC', '30min') residual = 1.7460
(192, 'SLSQP', '30min') residual = 1.7460
(192, 'Powell', '30min') residual = 1.7460
(192, 'Nelder-Mead', '1h') residual = 4.6229
(192, 'L-BFGS-B', '1h') residual = 4.6229
(192, 'TNC', '1h') residual = 4.6229
(192, 'SLSQP', '1h') residual = 4.6229
(192, 'Powell', '1h') residual = 4.6229
(192, 'Nelder-Mead', '2h') residual = 1.7460
(192, 'L-BFGS-B', '2h') residual = 1.7460
(192, 'TNC', '2h') residual = 1.7460
(192, 'SLSQP', '2h') residual = 1.7460
(192, 'Powell', '2h') residual = 1.7460
(192, 'Nelder-Mead', '6h') residual = 1.7460
(192, 'L-BFGS-B', '6h') residual = 1.7460
(192, 'TNC', '6h') residual = 1.7460
(192, 'SLSQP', '6h') residual = 1.7460
(192, 'Powell', '6h') residual = 1.7460
(512, 'Nelder-Mead', '30min') residual = 5.0998
(512, 'L-BFGS-B', '30min') residual = 5.0998
(512, 'TNC', '30min') residual = 5.0998
(512, 'SLSQP', '30min') residual = 5.0998
(512, 'Powell', '30min') residual = 5.0998
(512, 'Nelder-Mead', '1h') residual = 4.6731
(512, 'L-BFGS-B', '1h') residual = 4.6731
(512, 'TNC', '1h') residual = 4.6731
(512, 'SLSQP', '1h') residual = 4.6731
(512, 'Powell', '1h') residual = 4.6731
(512, 'Nelder-Mead', '2h') residual = 4.6717
(512, 'L-BFGS-B', '2h') residual = 4.6717
(512, 'TNC', '2h') residual = 4.6717
(512, 'SLSQP', '2h') residual = 4.6717
(512, 'Powell', '2h') residual = 4.6717
(512, 'Nelder-Mead', '6h') residual = 4.6731
(512, 'L-BFGS-B', '6h') residual = 4.6731
(512, 'TNC', '6h') residual = 4.6731
(512, 'SLSQP', '6h') residual = 4.6731
(512, 'Powell', '6h') residual = 4.6731
(720, 'Nelder-Mead', '30min') residual = 6.5632
(720, 'L-BFGS-B', '30min') residual = 6.5632
(720, 'TNC', '30min') residual = 6.5632
(720, 'SLSQP', '30min') residual = 6.5632
(720, 'Powell', '30min') residual = 6.5632
(720, 'Nelder-Mead', '1h') residual = 6.5632
(720, 'L-BFGS-B', '1h') residual = 6.5632
(720, 'TNC', '1h') residual = 6.5632
(720, 'SLSQP', '1h') residual = 6.5632
(720, 'Powell', '1h') residual = 6.5632
(720, 'Nelder-Mead', '2h') residual = 6.5609
(720, 'L-BFGS-B', '2h') residual = 6.5609
(720, 'TNC', '2h') residual = 6.5609
(720, 'SLSQP', '2h') residual = 6.5609
(720, 'Powell', '2h') residual = 6.5609
(720, 'Nelder-Mead', '6h') residual = 6.5610
(720, 'L-BFGS-B', '6h') residual = 6.5610
(720, 'TNC', '6h') residual = 6.5610
(720, 'SLSQP', '6h') residual = 6.5610
(720, 'Powell', '6h') residual = 6.5610
(1024, 'Nelder-Mead', '30min') residual = 9.3311
(1024, 'L-BFGS-B', '30min') residual = 9.3311
(1024, 'TNC', '30min') residual = 9.3311
(1024, 'SLSQP', '30min') residual = 9.3311
(1024, 'Powell', '30min') residual = 9.3311
(1024, 'Nelder-Mead', '1h') residual = 9.3311
(1024, 'L-BFGS-B', '1h') residual = 9.3311
(1024, 'TNC', '1h') residual = 9.3311
(1024, 'SLSQP', '1h') residual = 9.3311
(1024, 'Powell', '1h') residual = 9.3311
(1024, 'Nelder-Mead', '2h') residual = 9.3311
(1024, 'L-BFGS-B', '2h') residual = 9.3311
(1024, 'TNC', '2h') residual = 9.3311
(1024, 'SLSQP', '2h') residual = 9.3311
(1024, 'Powell', '2h') residual = 9.3311
(1024, 'Nelder-Mead', '6h') residual = 9.3311
(1024, 'L-BFGS-B', '6h') residual = 9.3311
(1024, 'TNC', '6h') residual = 9.3311
(1024, 'SLSQP', '6h') residual = 9.3311
(1024, 'Powell', '6h') residual = 9.3311
(4096, 'Nelder-Mead', '30min') residual = 37.3694
(4096, 'L-BFGS-B', '30min') residual = 37.3694
(4096, 'TNC', '30min') residual = 37.3694
(4096, 'SLSQP', '30min') residual = 37.3694
(4096, 'Powell', '30min') residual = 37.3694
(4096, 'Nelder-Mead', '1h') residual = 37.3694
(4096, 'L-BFGS-B', '1h') residual = 37.3694
(4096, 'TNC', '1h') residual = 37.3694
(4096, 'SLSQP', '1h') residual = 37.3694
(4096, 'Powell', '1h') residual = 37.3694
(4096, 'Nelder-Mead', '2h') residual = 37.3672
(4096, 'L-BFGS-B', '2h') residual = 37.3672
(4096, 'TNC', '2h') residual = 37.3672
(4096, 'SLSQP', '2h') residual = 37.3672
(4096, 'Powell', '2h') residual = 37.3672
(4096, 'Nelder-Mead', '6h') residual = 37.4414
(4096, 'L-BFGS-B', '6h') residual = 37.4414
(4096, 'TNC', '6h') residual = 37.4414
(4096, 'SLSQP', '6h') residual = 37.4414

Analyze hyperparameters¶

In [ ]:
resid_by_N = map(lambda pair: (pair[0][0], get_residual(pair[1])), residuals.items())
df_resid_by_N = pd.DataFrame(resid_by_N, columns=["N", "residual"])
ax = df_resid_by_N.plot(x="N", y="residual", linestyle='', marker='o')
ax.set_ylabel("Residual")
In [ ]:
resid_by_Algo = map(lambda pair: (pair[0][1], get_residual(pair[1])), residuals.items())
df_resid_by_Algo = pd.DataFrame(resid_by_Algo, columns=["Algo", "residual"])
ax = df_resid_by_Algo.boxplot(by="Algo")
In [ ]:
resid_by_tMf = map(lambda pair: (pair[0][2], get_residual(pair[1])), residuals.items())
df_resid_by_tMf = pd.DataFrame(resid_by_tMf, columns=["Meal Freq", "residual"])
ax = df_resid_by_tMf.boxplot(by="Meal Freq")
In [ ]:
def weighted_objective_function(result):
    residual = get_residual(result)
    N = result.soln.shape[0]
    weighted_objective = residual / ((N + 1)**(0.5))
    return weighted_objective


best_algorithm, model_results = min(
    residuals.items(),
    key=lambda pair: weighted_objective_function(pair[1])
)
best_residual = get_residual(model_results)
print(f"Algorithm: '{best_algorithm}' ({best_residual:.3f})")

model_results.soln.head()
model_results.soln.plot(x="t", y="G")

print("Model Parameters:")
print(model_results.popt_named)
print("\nModel Fit Results Info:")
print(model_results.infodict)
print(model_results.infodict['message'])
In [ ]:
# Displaying results
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
# Use formatted_data for normalized raw glucose values (scaled to [0, 2])
plt.plot(model_results.formatted_data['t'], model_results.formatted_data['G'], label='CGM Data (normalized)', marker='o', linestyle='none', alpha=0.5)

# The model_results.soln dataframe contains the fitted curve (indexed by timedelta in hours)
plt.plot(model_results.soln["t"], model_results.soln['G'], label='Fitted CMA Model', color='red', linewidth=2)

plt.title(f'Participant {p_num}, CMA Model Fit\nResidual = {best_residual:.3f}\nNormalized BG over 24-hour time')
plt.xlabel('Time (hours)')
plt.ylabel('Glucose (normalized)')
plt.legend()
plt.show()

Posthoc Analysis¶

The PFun CMA model effectively characterizes the dynamic continuous glucose monitor (CGM) traces by adjusting specific physiological parameters (taup, taug, Cm).

By applying the model directly to the Kaggle Brist1D dataset, we observe that the CMA model is capable of fitting its compartmental equations to the raw glucose (G) values, providing the solid 'red' regression line above. The infodict generated through the fitting process yields vital metrics, like termination messages, to confirm the algorithm successfully converged on optimized coefficients.

Key Highlights:

  • The fitting process expects timestamp columns (time by default) in UTC format, which we handled by appending arbitrary UTC dates to the HH:MM:SS time values provided in the competition dataset.
  • Additional preprocessing steps are often handled directly through fit_model calling format_data.
  • By observing the parameter bounds and bounds conditions, one can tune the CMA model further to adjust to real physiological profiles seen across different participants.

Save rendered html¶

In [ ]:
%%sh

uv run jupyter nbconvert --to html kaggle-brist1d-fit-model-example.ipynb --output kaggle-brist1d-fit-model-example.html
cp kaggle-brist1d-fit-model-example.html ../../pfun_cma_model/static/notebooks/kaggle-brist1d-fit-model-example.html