tools.py

This module provides functions to perform dynamic linear model fits and to evaluate the results.

dlmhelper.tools.annual_level_increase(data: DLMResult, year: int, tolerance: timedelta64 = numpy.timedelta64(1, 'D')) -> Tuple[float, float]

Calculate the annual level increase between year and year+1 for a given DLMResult object. Returns the increase and the corresponding error. The calculation uses the times from the DLMResult object closest to ‘year-01-01’ and ‘year+1-12-31’ within the given tolerance, which defaults to one day. Returns (None, None) if no times fall within the tolerance.

Parameters:
  • data (DLMResult) – DLMResult object used for calculation

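  • year (int) – Year for which the increase is calculated

  • tolerance (np.timedelta64) – Maximum allowed difference between the requested dates and the closest available times, defaults to one day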

Returns:

Tuple[float, float]: a tuple containing the annual increase and standard deviation.
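
The lookup described above can be sketched as follows. This is an illustration only, not the dlmhelper implementation: it uses datetime and timedelta from the standard library in place of np.datetime64 and np.timedelta64, and closest_index and level_increase are hypothetical helper names. The error estimate is left out because this page does not say how it is computed.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def closest_index(times: List[datetime], target: datetime,
                  tolerance: timedelta = timedelta(days=1)) -> Optional[int]:
    """Return the index of the time closest to target, or None if even the
    closest time is further away than tolerance."""
    if not times:
        return None
    idx = min(range(len(times)), key=lambda i: abs(times[i] - target))
    return idx if abs(times[idx] - target) <= tolerance else None


def level_increase(times, level, start, end, tolerance=timedelta(days=1)):
    """Difference of the level between the times closest to start and end."""
    i1 = closest_index(times, start, tolerance)
    i2 = closest_index(times, end, tolerance)
    if i1 is None or i2 is None:
        return None
    return level[i2] - level[i1]


# Daily time axis with a level that grows by 0.5 per day (made-up data)
times = [datetime(2020, 1, 1) + timedelta(days=d) for d in range(400)]
level = [0.5 * d for d in range(400)]

inc = level_increase(times, level, datetime(2020, 1, 1), datetime(2020, 12, 31))
# No samples exist in 2019, so nothing falls within the tolerance
missing = level_increase(times, level, datetime(2019, 1, 1), datetime(2019, 12, 31))
```

With the made-up data, inc is the level difference over 365 daily steps, and missing is None because the closest available time is a year away from the requested dates.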

dlmhelper.tools.cv_dlm_ensemble(timeseries: TimeSeries, level: List[bool] = [True], variable_level: List[bool] = [False], trend: List[bool] = [True], variable_trend: List[bool] = [True], seasonal: List[bool] = [True], seasonal_period: List[List[float]] = [[365]], seasonal_harmonics: List[List[List[int]]] = [[[1, 2, 3, 4]]], variable_seasonal: List[List[List[bool]]] = [[[True, False]]], autoregressive: List[int] = [1], irregular: List[bool] = [False, True], scores: dict | None = None, folds: int = 5, verbose: int = 0) -> dict

Performs cross validation using the specified number of folds and calculates the average mean squared error (AMSE) for all configurations.

Returns a dictionary whose keys are the model configuration names generated by dlmhelper.dlm_data.DLMResult.name_from_spec() and whose values are the corresponding AMSE.

See dlmhelper.tools.dlm_ensemble() for information on the parameters.

Parameters:
  • timeseries (TimeSeries) – TimeSeries object to be fitted

  • level (List[bool]) – Whether to include a level component, defaults to [True]

  • variable_level (List[bool]) – Whether to allow the level component to vary, defaults to [False]

  • trend (List[bool]) – Whether to include a trend component, defaults to [True]

  • variable_trend (List[bool]) – Whether to allow the trend component to vary, defaults to [True]

  • seasonal (List[bool]) – Whether to include a seasonal component, defaults to [True]

  • seasonal_period (List[List[float]]) – List of configurations of seasonal components. Each element is a list containing the periods of the seasonal components, defaults to [[365]]

  • seasonal_harmonics (List[List[List[int]]]) – List of harmonics to try for the corresponding seasonal components. For each element of seasonal_period this should include a list of harmonics to try, defaults to [[[1,2,3,4]]]

  • variable_seasonal (List[List[List[bool]]]) – Whether a seasonal component is allowed to vary. For each element of seasonal_period this should include a list of options, defaults to [[[True, False]]]

  • autoregressive (List[int]) – List of autoregressive components to try, the integer determines the order of the autoregressive component, defaults to [1]

  • irregular (List[bool]) – Whether to include an additional Gaussian noise term, defaults to [False, True]

  • scores (dict) – A dictionary containing scores for different configurations. Currently used to pass the results of cross validation to the final ensemble fit, defaults to None

  • folds (int) – Number of folds to use for cross validation, defaults to 5

  • verbose (int) – Determines the amount of output, 0 means no output and 2 means maximum output, defaults to 0

Returns:

A dictionary containing the AMSE for each model config

Return type:

dict
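
As a rough illustration of the scoring, the sketch below computes an average mean squared error over contiguous folds for a trivial stand-in model (the training mean). How dlmhelper builds the folds and predicts the held-out data is not described on this page, so the fold layout and the names k_fold_amse, fit_predict and mean_model are assumptions.

```python
from typing import Callable, List, Sequence


def k_fold_amse(values: Sequence[float],
                fit_predict: Callable[[List[float], int], List[float]],
                folds: int = 5) -> float:
    """Average the mean squared error over `folds` contiguous held-out blocks."""
    n = len(values)
    # Fold boundaries, as evenly spaced as integer division allows
    bounds = [i * n // folds for i in range(folds + 1)]
    mse_per_fold = []
    for k in range(folds):
        lo, hi = bounds[k], bounds[k + 1]
        train = list(values[:lo]) + list(values[hi:])
        held_out = values[lo:hi]
        pred = fit_predict(train, len(held_out))
        mse = sum((p - y) ** 2 for p, y in zip(pred, held_out)) / len(held_out)
        mse_per_fold.append(mse)
    return sum(mse_per_fold) / folds


def mean_model(train: List[float], n_out: int) -> List[float]:
    """Stand-in model: predict the training mean for every held-out point."""
    mean = sum(train) / len(train)
    return [mean] * n_out


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
# One entry per model configuration, mirroring the shape of the returned dict
scores = {"mean_model": k_fold_amse(data, mean_model, folds=5)}
constant_score = k_fold_amse([3.0] * 10, mean_model, folds=5)
```

A constant series is predicted perfectly by the training mean, so constant_score is zero, while the trending series yields a positive AMSE.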

dlmhelper.tools.dlm_ensemble(timeseries: TimeSeries, name: str, level: List[bool] = [True], variable_level: List[bool] = [False], trend: List[bool] = [True], variable_trend: List[bool] = [True], seasonal: List[bool] = [True], seasonal_period: List[List[float]] = [[365]], seasonal_harmonics: List[List[List[int]]] = [[[1, 2, 3, 4]]], variable_seasonal: List[List[List[bool]]] = [[[True, False]]], autoregressive: List[int] = [1], irregular: List[bool] = [True, False], scores: dict | None = None, verbose: int = 0) -> DLMResultList

Fits an ensemble of Dynamic Linear Models to a TimeSeries object and returns a DLMResultList object.

For all keyword arguments (except scores) a list or nested list is used to determine the configurations used in the ensemble.

For most parameters a boolean List is used. For example variable_level = [True, False] would include model configurations with and without a variable level in the ensemble. The possible values are therefore [True], [False], [True, False].

If seasonal components are included in the ensemble they can be specified using nested lists. Each configuration can include multiple seasonal components.
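
To make the combinatorics concrete, the sketch below enumerates the configurations implied by a few option lists using itertools.product. It only illustrates the idea of one model per combination. Whether dlmhelper skips redundant combinations (for example a variable trend when no trend is included) is not stated here, and expand_configs is a hypothetical helper.

```python
from itertools import product
from typing import Dict, List


def expand_configs(options: Dict[str, List]) -> List[Dict]:
    """Return one dict per combination of the listed options."""
    keys = list(options)
    return [dict(zip(keys, combo))
            for combo in product(*(options[k] for k in keys))]


configs = expand_configs({
    "level": [True],
    "variable_level": [True, False],  # try models with and without a variable level
    "trend": [True],
    "variable_trend": [True],
    "autoregressive": [1],
    "irregular": [True, False],       # try models with and without the noise term
})

for cfg in configs:
    print(cfg)
```

Two of the lists hold two options each and the rest hold one, so this yields four configurations.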

Parameters:
  • timeseries (TimeSeries) – TimeSeries object to be fitted

  • name (str) – Identifier for the DLMResult object

  • level (List[bool]) – Whether to include a level component, defaults to [True]

  • variable_level (List[bool]) – Whether to allow the level component to vary, defaults to [False]

  • trend (List[bool]) – Whether to include a trend component, defaults to [True]

  • variable_trend (List[bool]) – Whether to allow the trend component to vary, defaults to [True]

  • seasonal (List[bool]) – Whether to include a seasonal component, defaults to [True]

  • seasonal_period (List[List[float]]) – List of configurations of seasonal components. Each element is a list containing the periods of the seasonal components, defaults to [[365]]

  • seasonal_harmonics (List[List[List[int]]]) – List of harmonics to try for the corresponding seasonal components. For each element of seasonal_period this should include a list of harmonics to try, defaults to [[[1,2,3,4]]]

  • variable_seasonal (List[List[List[bool]]]) – Whether a seasonal component is allowed to vary. For each element of seasonal_period this should include a list of options, defaults to [[[True, False]]]

  • autoregressive (List[int]) – List of autoregressive components to try, the integer determines the order of the autoregressive component, defaults to [1]

  • irregular (List[bool]) – Whether to include an additional Gaussian noise term, defaults to [True, False]

  • scores (dict) – A dictionary containing scores for different configurations. Currently used to pass the results of cross validation to the final ensemble fit, defaults to None

  • verbose (int) – Determines the amount of output, 0 means no output and 2 means maximum output, defaults to 0

Returns:

An object containing multiple DLMResult objects

Return type:

DLMResultList
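
The nested seasonal parameters can be read as follows; this is one interpretation of the parameter descriptions above, not confirmed dlmhelper behaviour. The outer index selects a seasonal configuration, the middle index selects a component (period) within it, and the innermost list holds the options to try for that component. The hypothetical helper expand_seasonal expands such lists into flat per-fit settings.

```python
from itertools import product


def expand_seasonal(seasonal_period, seasonal_harmonics, variable_seasonal):
    """Expand nested option lists into concrete seasonal configurations.

    Assumed layout: index i selects a seasonal configuration, index j selects
    a component within it, and the innermost list holds the options to try.
    """
    expanded = []
    for periods, harmonics, variable in zip(
            seasonal_period, seasonal_harmonics, variable_seasonal):
        # Combine the options of all components within this configuration
        for h_combo in product(*harmonics):
            for v_combo in product(*variable):
                expanded.append({
                    "seasonal_period": list(periods),
                    "seasonal_harmonics": list(h_combo),
                    "variable_seasonal": list(v_combo),
                })
    return expanded


# Documented defaults: one component with period 365, four harmonic options,
# each tried with a variable and a fixed seasonal component
defaults = expand_seasonal([[365]], [[[1, 2, 3, 4]]], [[[True, False]]])

# One configuration with two components (periods 365 and 182.5)
two = expand_seasonal([[365, 182.5]], [[[1, 2], [1]]], [[[False], [True, False]]])
```

Under this reading the defaults expand to eight seasonal settings, and each expanded entry has the flat shape that dlm_fit takes for seasonal_period, seasonal_harmonics and variable_seasonal.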

dlmhelper.tools.dlm_fit(timeseries: TimeSeries, name: str, level: bool = True, variable_level: bool = False, trend: bool = True, variable_trend: bool = True, seasonal: bool = True, seasonal_period: List[float] = [365], seasonal_harmonics: List[int] = [4], variable_seasonal: List[bool] = [False], autoregressive: int = 1, irregular: bool = True, verbose: int = 0) -> DLMResult

Performs a dynamic linear model fit on the given TimeSeries object and returns a DLMResult object.

Parameters:
  • timeseries (TimeSeries) – TimeSeries object to be fitted

  • name (str) – Identifier for the DLMResult object

  • level (bool) – Whether to include a level component, defaults to True

  • variable_level (bool) – Whether to allow the level component to vary, defaults to False

  • trend (bool) – Whether to include a trend (i.e. changing level), defaults to True

  • variable_trend (bool) – Whether to allow the trend component to vary, defaults to True

  • seasonal (bool) – Whether to include seasonal components, defaults to True

  • seasonal_period (List[float]) – List of periods for the seasonal components, defaults to [365]

  • seasonal_harmonics (List[int]) – Number of harmonics to use for the seasonal components, defaults to [4]

  • variable_seasonal (List[bool]) – Whether the seasonal components are allowed to vary, defaults to [False]

  • autoregressive (int | None) – Determines the order of the autoregressive component, use None to not include, defaults to 1

  • irregular (bool) – Whether to include a Gaussian noise term, defaults to True

  • verbose (int) – Determines the amount of output, 0 means no output and 2 means maximum output, defaults to 0

Returns:

A DLMResult object

Return type:

DLMResult
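
For intuition about the level, trend and irregular options, the sketch below simulates a textbook local linear trend model with the standard library. It is not the dlmhelper fitting code and may differ from its exact parameterization; seasonal and autoregressive components are left out, and simulate_local_linear_trend and its arguments are hypothetical names.

```python
import random
from typing import List


def simulate_local_linear_trend(n: int, level0: float = 0.0, trend0: float = 0.1,
                                sigma_level: float = 0.0, sigma_trend: float = 0.0,
                                sigma_irregular: float = 0.0, seed: int = 0) -> List[float]:
    """Simulate observations of a level that changes by a trend each step.

    sigma_level > 0 corresponds to a variable level, sigma_trend > 0 to a
    variable trend, and sigma_irregular > 0 to an irregular Gaussian noise term.
    """
    rng = random.Random(seed)
    level, trend = level0, trend0
    y = []
    for _ in range(n):
        # Observation: current level plus optional irregular noise
        y.append(level + rng.gauss(0.0, sigma_irregular))
        # State update: the trend is the per-step change of the level
        level = level + trend + rng.gauss(0.0, sigma_level)
        trend = trend + rng.gauss(0.0, sigma_trend)
    return y


# Fixed level and trend, no noise: a straight line
fixed = simulate_local_linear_trend(5, level0=1.0, trend0=0.5)

# Variable trend plus irregular noise: the growth rate drifts over time
noisy = simulate_local_linear_trend(5, sigma_trend=0.05, sigma_irregular=0.1)
```

With all noise terms switched off the series is an exact straight line, which is the case a fixed level and a fixed trend describe.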

dlmhelper.tools.mean_level_from_dates(data: DLMResult, t1: datetime64, t2: datetime64, tolerance: timedelta64 = numpy.timedelta64(1, 'D')) -> float

Returns the mean level between the two given dates. The calculation uses the times from the DLMResult object closest to ‘t1’ and ‘t2’ within the given tolerance, which defaults to one day. Returns None if no times fall within the tolerance.

Parameters:
  • data (DLMResult) – DLMResult object used for calculation

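  • t1 (np.datetime64) – Start date

  • t2 (np.datetime64) – End date

  • tolerance (np.timedelta64) – Maximum allowed difference between the given dates and the closest available times, defaults to one day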

Returns:

float: Mean of the level values that fall within the specified date range.
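
A minimal sketch of this calculation, using the standard library's datetime types instead of the NumPy ones. The hypothetical helper mean_between assumes that both end points are included in the mean, which this page does not specify.

```python
from datetime import datetime, timedelta


def mean_between(times, level, t1, t2, tolerance=timedelta(days=1)):
    """Mean of level between the samples closest to t1 and t2 (inclusive)."""
    def closest(target):
        i = min(range(len(times)), key=lambda k: abs(times[k] - target))
        return i if abs(times[i] - target) <= tolerance else None

    i1, i2 = closest(t1), closest(t2)
    if i1 is None or i2 is None:
        return None
    lo, hi = sorted((i1, i2))  # tolerate dates given in reverse order
    segment = level[lo:hi + 1]
    return sum(segment) / len(segment)


# Ten daily samples with made-up level values 0, 1, ..., 9
times = [datetime(2021, 1, 1) + timedelta(days=d) for d in range(10)]
level = [float(d) for d in range(10)]

inside = mean_between(times, level, datetime(2021, 1, 3), datetime(2021, 1, 7))
outside = mean_between(times, level, datetime(2021, 1, 3), datetime(2021, 2, 1))
```

Here inside averages the five samples from 3 to 7 January, and outside is None because no sample lies within one day of 1 February.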

dlmhelper.tools.model_selection_bias_ALI(results: DLMResultList, years: ArrayLike, percentile: float = 25, metric: str = 'aic')

Calculate the model selection bias for Dynamic Linear Model (DLM) results.

This function computes the model selection bias of the annual level increases (ALIs) for the given DLMResultList. For each year, the bias is calculated as the weighted variance between the average fit ALI and each individual fit ALI, using all models whose metric is within the specified percentile.

Parameters:
  • results (DLMResultList) – DLMResultList containing the fits to compare

  • years (ArrayLike) – Array of years for which the bias is calculated

  • percentile (float) – Percentile of models to use for comparison

  • metric (str) – Metric to use for comparison of models, defaults to ‘aic’

Returns:

np.ndarray: An array containing the model selection bias for each year specified in the years array.
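
The weighted variance for a single year can be sketched as below. This page does not say how the weights are derived or how the percentile cut is applied, so both are assumptions here: the sketch keeps models whose metric is at or below the nearest-rank percentile and weights them with Akaike-style weights exp(-0.5 * (metric - best)). The name selection_bias is hypothetical.

```python
import math
from typing import Sequence


def selection_bias(values: Sequence[float], metrics: Sequence[float],
                   percentile: float = 25) -> float:
    """Weighted variance of values among the best-scoring models.

    values  : one estimate (e.g. the ALI of one year) per model
    metrics : one score per model, lower is better (e.g. AIC)
    """
    n = len(metrics)
    # Nearest-rank percentile threshold (assumption, see lead-in)
    rank = max(1, math.ceil(percentile / 100 * n))
    threshold = sorted(metrics)[rank - 1]
    keep = [i for i in range(n) if metrics[i] <= threshold]

    # Akaike-style weights relative to the best model (assumption)
    best = min(metrics[i] for i in keep)
    raw = [math.exp(-0.5 * (metrics[i] - best)) for i in keep]
    total = sum(raw)
    weights = [w / total for w in raw]

    mean = sum(w * values[i] for w, i in zip(weights, keep))
    return sum(w * (values[i] - mean) ** 2 for w, i in zip(weights, keep))


# Three equally good models and one clearly worse model (made-up numbers)
ali = [1.0, 1.2, 0.8, 3.0]
aic = [100.0, 100.0, 100.0, 150.0]
bias = selection_bias(ali, aic, percentile=75)
```

In this example the fourth model is excluded by the percentile cut, so the bias reflects only the spread among the three comparable fits. Repeating the calculation per year would give one value per entry of years.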

dlmhelper.tools.model_selection_bias_trend(results: DLMResultList, t1: datetime64 | None = None, t2: datetime64 | None = None, percentile: float = 25, metric: str = 'aic', tolerance: timedelta64 | None = None)

Calculate the model selection bias for Dynamic Linear Models (DLM) results.

This function computes the model selection bias of the growth rates for the given DLMResultList. The bias is calculated as the weighted variance between the average fit trend (growth rate) and each individual fit trend, using all models whose metric is within the specified percentile. If t1 and/or t2 are specified, they determine the start and end date for the comparison.

Parameters:
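  • results (DLMResultList) – DLMResultList containing the fits to compare

  • t1 (np.datetime64) – Start date for the comparison, defaults to None

  • t2 (np.datetime64) – End date for the comparison, defaults to None

  • tolerance (np.timedelta64) – Tolerance used when matching t1 and t2 to the available times, defaults to None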

  • percentile (float) – Percentile of models to use for comparison

  • metric (str) – Metric to use for comparison of models, defaults to ‘aic’

Returns:

np.ndarray: An array containing the model selection bias of the trend (growth rate) for the selected time range.