tools.py
This module provides functions to perform dynamic linear model fits and for the evaluation of the results.
- dlmhelper.tools.annual_level_increase(data: DLMResult, year: int, tolerance: timedelta64 = numpy.timedelta64(1, 'D')) → Tuple[float, float]
Calculates the annual increase in level between year and year+1 for a given DLMResult object and returns the increase together with its error. The calculation uses the times in the DLMResult object closest to ‘year-01-01’ and ‘year+1-12-31’ within the given tolerance, which defaults to one day. Returns (None, None) if no times fall within the tolerance.
- Parameters:
data (DLMResult) – DLMResult object used for calculation
year (int) – Year
tolerance (np.timedelta64) – Tolerance
- Returns:
Tuple[float, float]: a tuple containing the annual increase and standard deviation.
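The tolerance-based lookup described above can be sketched as follows. This is a minimal, dependency-free illustration and not the library's implementation: it uses the standard library's datetime instead of numpy, the helper names closest_index and level_increase are hypothetical, and the error estimate is left out.

```python
from datetime import datetime, timedelta


def closest_index(times, target, tolerance=timedelta(days=1)):
    """Return the index of the time closest to target, or None if outside tolerance."""
    if not times:
        return None
    idx = min(range(len(times)), key=lambda i: abs(times[i] - target))
    return idx if abs(times[idx] - target) <= tolerance else None


def level_increase(times, level, start, end, tolerance=timedelta(days=1)):
    """Difference in level between the samples closest to start and end."""
    i0 = closest_index(times, start, tolerance)
    i1 = closest_index(times, end, tolerance)
    if i0 is None or i1 is None:
        return None  # no sample close enough to one of the dates
    return level[i1] - level[i0]


# Hypothetical daily series: the level rises by 0.01 per day from 2020-01-01.
times = [datetime(2020, 1, 1) + timedelta(days=d) for d in range(731)]
level = [400.0 + 0.01 * d for d in range(731)]

increase = level_increase(times, level, datetime(2020, 1, 1), datetime(2021, 12, 31))
missing = level_increase(times, level, datetime(2019, 1, 1), datetime(2020, 12, 31))
```

Here missing is None because no sample lies within one day of 2019-01-01.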
- dlmhelper.tools.cv_dlm_ensemble(timeseries: TimeSeries, level: List[bool] = [True], variable_level: List[bool] = [False], trend: List[bool] = [True], variable_trend: List[bool] = [True], seasonal: List[bool] = [True], seasonal_period: List[List[float]] = [[365]], seasonal_harmonics: List[List[List[int]]] = [[[1, 2, 3, 4]]], variable_seasonal: List[List[List[bool]]] = [[[True, False]]], autoregressive: List[int] = [1], irregular: List[bool] = [False, True], scores: dict | None = None, folds: int = 5, verbose: int = 0) → dict
Performs cross validation using the specified number of folds and calculates the average mean squared error (AMSE) for all configurations.
Returns a dictionary whose keys are the model configuration names generated by dlmhelper.dlm_data.DLMResult.name_from_spec() and whose values are the corresponding AMSE. See dlmhelper.dlm_helper.dlm_ensemble() for information on the parameters.
- Parameters:
timeseries (TimeSeries) – TimeSeries object to be fitted
name (str) – Identifier for the DLMResult object
level (List[bool]) – Whether to include a level component, defaults to [True]
variable_level (List[bool]) – Whether to allow the level component to vary, defaults to [False]
trend (List[bool]) – Whether to include a trend component, defaults to [True]
variable_trend (List[bool]) – Whether to allow the trend component to vary, defaults to [True]
seasonal (List[bool]) – Whether to include a seasonal component, defaults to [True]
seasonal_period (List[List[float]]) – List of configurations of seasonal components. Each element is a list containing the periods of the seasonal components, defaults to [[365]]
seasonal_harmonics (List[List[List[int]]]) – List of harmonics to try for the corresponding seasonal components. For each element of seasonal_period this should include a list of harmonics to try, defaults to [[[1,2,3,4]]]
variable_seasonal (List[List[List[bool]]]) – Whether a seasonal component is allowed to vary. For each element of seasonal_period this should include a list of options, defaults to [[[True, False]]]
autoregressive (List[int]) – List of autoregressive components to try, the integer determines the order of the autoregressive component, defaults to [1]
irregular (List[bool]) – Whether to include an additional Gaussian noise term, defaults to [False, True]
scores (dict) – A dictionary containing scores for different configurations. Currently used to pass the results of cross validation to the final ensemble fit, defaults to None
folds (int) – Number of folds to use for cross validation, defaults to 5
verbose (int) – Determines the amount of output, 0 means no output and 2 means maximum output, defaults to 0
- Returns:
A dictionary containing the AMSE for each model config
- Return type:
dict
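To illustrate what an AMSE over folds means, here is a minimal sketch of k-fold cross validation. It makes several assumptions that do not come from the library: the folds are contiguous blocks, the "model" is simply the mean of the training data standing in for a DLM fit, and the function name k_fold_amse and the key "mean_model" are hypothetical.

```python
def k_fold_amse(values, folds=5):
    """Average the mean squared error of a stand-in model over contiguous folds."""
    n = len(values)
    fold_mse = []
    for k in range(folds):
        start, stop = k * n // folds, (k + 1) * n // folds
        test = values[start:stop]
        train = values[:start] + values[stop:]
        if not test or not train:
            continue  # skip degenerate folds
        # Stand-in for a DLM fit: predict the mean of the training data.
        prediction = sum(train) / len(train)
        fold_mse.append(sum((v - prediction) ** 2 for v in test) / len(test))
    return sum(fold_mse) / len(fold_mse)


# A scores dictionary maps a configuration name to its AMSE.
scores = {"mean_model": k_fold_amse([0.0, 0.0, 0.0, 0.0, 10.0], folds=5)}
```

A lower AMSE indicates that a configuration predicts held-out data better.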
- dlmhelper.tools.dlm_ensemble(timeseries: TimeSeries, name: str, level: List[bool] = [True], variable_level: List[bool] = [False], trend: List[bool] = [True], variable_trend: List[bool] = [True], seasonal: List[bool] = [True], seasonal_period: List[List[float]] = [[365]], seasonal_harmonics: List[List[List[int]]] = [[[1, 2, 3, 4]]], variable_seasonal: List[List[List[bool]]] = [[[True, False]]], autoregressive: List[int] = [1], irregular: List[bool] = [True, False], scores: dict | None = None, verbose: int = 0) → DLMResultList
Fits an ensemble of Dynamic Linear Models to a TimeSeries object and returns a DLMResultList object.
For all keyword arguments (except scores) a list or nested list is used to determine the configurations used in the ensemble.
For most parameters a boolean List is used. For example variable_level = [True, False] would include model configurations with and without a variable level in the ensemble. The possible values are therefore [True], [False], [True, False].
If seasonal components are included in the ensemble, they can be specified using nested lists. Each configuration can include multiple seasonal components.
- Parameters:
timeseries (TimeSeries) – TimeSeries object to be fitted
name (str) – Identifier for the DLMResult object
level (List[bool]) – Whether to include a level component, defaults to [True]
variable_level (List[bool]) – Whether to allow the level component to vary, defaults to [False]
trend (List[bool]) – Whether to include a trend component, defaults to [True]
variable_trend (List[bool]) – Whether to allow the trend component to vary, defaults to [True]
seasonal (List[bool]) – Whether to include a seasonal component, defaults to [True]
seasonal_period (List[List[float]]) – List of configurations of seasonal components. Each element is a list containing the periods of the seasonal components, defaults to [[365]]
seasonal_harmonics (List[List[List[int]]]) – List of harmonics to try for the corresponding seasonal components. For each element of seasonal_period this should include a list of harmonics to try, defaults to [[[1,2,3,4]]]
variable_seasonal (List[List[List[bool]]]) – Whether a seasonal component is allowed to vary. For each element of seasonal_period this should include a list of options, defaults to [[[True, False]]]
autoregressive (List[int]) – List of autoregressive components to try, the integer determines the order of the autoregressive component, defaults to [1]
irregular (List[bool]) – Whether to include an additional Gaussian noise, defaults to [True, False]
scores (dict) – A dictionary containing scores for different configurations. Currently used to pass the results of cross validation to the final ensemble fit, defaults to None
verbose (int) – Determines the amount of output, 0 means no output and 2 means maximum output, defaults to 0
- Returns:
An object containing multiple DLMResult objects
- Return type:
DLMResultList
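How list-valued arguments expand into individual configurations can be sketched with itertools.product. This is an illustration under assumptions, not the library's code: the function enumerate_configs and the dictionary layout are hypothetical, and the library may additionally discard combinations that make no sense, such as a variable level without a level.

```python
from itertools import product


def enumerate_configs(level, variable_level, trend, variable_trend,
                      seasonal, seasonal_period, seasonal_harmonics,
                      variable_seasonal, autoregressive, irregular):
    """Expand list-valued options into a list of single-model configurations."""
    # Expand the nested seasonal options first.
    seasonal_specs = []
    for use_seasonal in seasonal:
        if not use_seasonal:
            seasonal_specs.append(None)  # configuration without seasonality
            continue
        for periods, harmonics, variable in zip(
                seasonal_period, seasonal_harmonics, variable_seasonal):
            # Pick one harmonics value and one variable flag per component.
            for h in product(*harmonics):
                for v in product(*variable):
                    seasonal_specs.append(
                        {"periods": periods, "harmonics": list(h),
                         "variable": list(v)})
    # Combine the seasonal options with all remaining options.
    keys = ("level", "variable_level", "trend", "variable_trend",
            "seasonal", "autoregressive", "irregular")
    return [dict(zip(keys, combo)) for combo in product(
        level, variable_level, trend, variable_trend,
        seasonal_specs, autoregressive, irregular)]


# Called with the default values from the signature above.
configs = enumerate_configs(
    level=[True], variable_level=[False], trend=[True], variable_trend=[True],
    seasonal=[True], seasonal_period=[[365]],
    seasonal_harmonics=[[[1, 2, 3, 4]]], variable_seasonal=[[[True, False]]],
    autoregressive=[1], irregular=[True, False])
```

Under these assumptions the defaults yield 16 configurations: 4 harmonics choices times 2 variable_seasonal choices times 2 irregular choices.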
- dlmhelper.tools.dlm_fit(timeseries: TimeSeries, name: str, level: bool = True, variable_level: bool = False, trend: bool = True, variable_trend: bool = True, seasonal: bool = True, seasonal_period: List[float] = [365], seasonal_harmonics: List[int] = [4], variable_seasonal: List[bool] = [False], autoregressive: int = 1, irregular: bool = True, verbose: int = 0) → DLMResult
Performs a dynamic linear model fit on the given TimeSeries object and returns a DLMResult object.
- Parameters:
timeseries (TimeSeries) – TimeSeries object to be fitted
name (str) – Identifier for the DLMResult object
level (bool) – Whether to include a level component, defaults to True
variable_level (bool) – Whether to allow the level component to vary, defaults to False
trend (bool) – Whether to include a trend (i.e. changing level), defaults to True
variable_trend (bool) – Whether to allow the trend component to vary, defaults to True
seasonal (bool) – Whether to include seasonal components, defaults to True
seasonal_period (List[float]) – List of periods for the seasonal components, defaults to [365]
seasonal_harmonics (List[int]) – Number of harmonics to use for the seasonal components, defaults to [4]
variable_seasonal (List[bool]) – Whether the seasonal components are allowed to vary, defaults to [False]
autoregressive (int | None) – Determines the order of the autoregressive component, use None to not include, defaults to 1
irregular (bool) – Whether to include a Gaussian noise term, defaults to True
verbose (int) – Determines the amount of output, 0 means no output and 2 means maximum output, defaults to 0
- Returns:
A DLMResult object
- Return type:
DLMResult
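As background for seasonal_period and seasonal_harmonics, the sketch below evaluates a seasonal term written as a sum of harmonics of a base period. It assumes the common trigonometric form of a seasonal component. The function seasonal_value and the amplitude values are hypothetical, and the exact parameterisation used by the library is not stated here.

```python
import math


def seasonal_value(t, period, amplitudes):
    """Evaluate a harmonic seasonal term at time t.

    amplitudes holds one (cosine, sine) pair per harmonic k = 1, 2, ...
    """
    total = 0.0
    for k, (a, b) in enumerate(amplitudes, start=1):
        angle = 2.0 * math.pi * k * t / period
        total += a * math.cos(angle) + b * math.sin(angle)
    return total


# Period of 365 days with 4 harmonics, matching the defaults [365] and [4].
amplitudes = [(1.0, 0.0)] * 4
at_start = seasonal_value(0, 365, amplitudes)
one_period_apart = (seasonal_value(10, 365, amplitudes),
                    seasonal_value(375, 365, amplitudes))
```

Each additional harmonic lets the seasonal cycle take a less sinusoidal shape while still repeating after one period.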
- dlmhelper.tools.mean_level_from_dates(data: DLMResult, t1: datetime64, t2: datetime64, tolerance: timedelta64 = numpy.timedelta64(1, 'D')) → float
Returns the mean level between the two given dates. Uses times from DLMResult object closest to ‘t1’ and ‘t2’ within given tolerance for the calculation. The tolerance defaults to one day. Returns None if no times fall within tolerance.
- Parameters:
data (DLMResult) – DLMResult object used for calculation
t1 (np.datetime64) – Date
t2 (np.datetime64) – Date
tolerance (np.timedelta64) – Tolerance
- Returns:
float: Mean of the level values that fall within the specified date range.
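Averaging the level between two dates might look like the sketch below. It is an illustration only: it uses the standard library's datetime instead of numpy, the function name mean_level is hypothetical, and it assumes that both end points are included in the average.

```python
from datetime import datetime, timedelta


def mean_level(times, level, t1, t2, tolerance=timedelta(days=1)):
    """Mean of level between the samples closest to t1 and t2 (inclusive)."""
    if not times:
        return None

    def closest(target):
        idx = min(range(len(times)), key=lambda i: abs(times[i] - target))
        return idx if abs(times[idx] - target) <= tolerance else None

    i1, i2 = closest(t1), closest(t2)
    if i1 is None or i2 is None or i2 < i1:
        return None  # a date is out of tolerance or the range is empty
    window = level[i1:i2 + 1]
    return sum(window) / len(window)


# Hypothetical daily series with levels 0.0, 1.0, ..., 9.0.
times = [datetime(2021, 3, 1) + timedelta(days=d) for d in range(10)]
level = [float(d) for d in range(10)]

mean_mid = mean_level(times, level, datetime(2021, 3, 3), datetime(2021, 3, 5))
mean_missing = mean_level(times, level, datetime(2021, 1, 1), datetime(2021, 3, 5))
```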
- dlmhelper.tools.model_selection_bias_ALI(results: DLMResultList, years: ArrayLike, percentile: float = 25, metric: str = 'aic')
Calculate the model selection bias for dynamic linear model (DLM) results.
This function computes the model selection bias of the annual level increases (ALIs) for the given DLMResultList. For each year, the bias is the weighted variance between the ALI of the average fit and the ALI of each individual fit. Only models whose metric is within the specified percentile are used.
- Parameters:
results (DLMResultList) – DLMResultList
years (ArrayLike) – Array of years for which the bias is calculated
percentile (float) – Percentile of models to use for comparison
metric (str) – Metric to use for comparison of models, defaults to ‘aic’
- Returns:
np.ndarray: An array containing the model selection bias for each year specified in the years array.
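The idea of a weighted variance over the best models can be sketched as below. Several choices here are assumptions rather than the library's behaviour: models are selected by a nearest-rank cutoff on the metric (lower is better, as with AIC), the reference value is the weighted mean of the selected models, and the weights are supplied by the caller. The names select_best and weighted_variance are hypothetical.

```python
import math


def select_best(metrics, percentile):
    """Indices of the models whose metric lies in the lowest percentile."""
    ranked = sorted(range(len(metrics)), key=lambda i: metrics[i])
    keep = max(1, math.ceil(len(metrics) * percentile / 100))
    return ranked[:keep]


def weighted_variance(values, weights):
    """Weighted variance of values around their weighted mean."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    return sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total


# Hypothetical AIC values and ALIs for four models in one year.
aic = [10.0, 12.0, 30.0, 50.0]
ali = [1.0, 3.0, 9.0, 20.0]

best = select_best(aic, percentile=50)
bias = weighted_variance([ali[i] for i in best], [1.0] * len(best))
```

With equal weights the two best models contribute ALIs of 1.0 and 3.0, so the variance around their mean of 2.0 is 1.0.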
- dlmhelper.tools.model_selection_bias_trend(results: DLMResultList, t1: datetime64 | None = None, t2: datetime64 | None = None, percentile: float = 25, metric: str = 'aic', tolerance: timedelta64 | None = None)
Calculate the model selection bias for dynamic linear model (DLM) results.
This function computes the model selection bias of the growth rates for the given DLMResultList. The bias is the weighted variance between the trend (growth rate) of the average fit and the trend of each individual fit. Only models whose metric is within the specified percentile are used. If t1 and/or t2 are specified, they determine the start and end dates of the comparison.
- Parameters:
results (DLMResultList) – DLMResultList
t1 (np.datetime64) – Date
t2 (np.datetime64) – Date
tolerance (np.timedelta64) – Tolerance
percentile (float) – Percentile of models to use for comparison
metric (str) – Metric to use for comparison of models, defaults to ‘aic’
- Returns:
np.ndarray: An array containing the model selection bias of the trend for the selected time range.