data.py

This module provides four classes which are used to handle input data and DLM results.

  • Grid: Container for gridded data which can be easily converted to a TimeSeries object

  • TimeSeries: Container for timeseries data which is used for all dlm fit functions

  • DLMResult: Handles single results

  • DLMResultList: Acts as a container for all DLMResults generated for a given time series

class dlmhelper.data.DLMResult(name: str, timeseries: TimeSeries, dlm_specification: dict, dlm_fit_params: dict, dlm_fit_rating: dict, level: ndarray, level_cov: ndarray, trend: ndarray, trend_cov: ndarray, seas: ndarray, seas_cov: ndarray, ar: ndarray, ar_cov: ndarray, resid: ndarray, _loglikeburn: int)

This class is a container for results from dlm fits. Objects from this class are created by dlmhelper.tools.dlm_fit() or dlmhelper.tools.dlm_ensemble() or when loading saved results from a .json file using DLMResult.load().

Parameters:
  • name (str) – Name/Identifier of object

  • timeseries (TimeSeries) – TimeSeries object that has been fitted

  • dlm_specification (dict) – Dict containing the dlm configuration

  • dlm_fit_params (dict) – Dict containing the values of the underlying parameters fitted by the dlm

  • dlm_fit_rating (dict) – Dict containing the various metrics which aid in comparing different dlm fits

  • level (np.ndarray) – level component of shape (time)

  • level_cov (np.ndarray) – covariance of level component of shape (time)

  • trend (np.ndarray) – trend component of shape (time)

  • trend_cov (np.ndarray) – covariance of trend component of shape (time)

  • seas (np.ndarray) – seasonal components of shape (time, n) where n is the number of seasonal components

  • seas_cov (np.ndarray) – covariance of seasonal components of shape (time,n)

  • ar (np.ndarray) – auto-regressive component of shape (time)

  • ar_cov (np.ndarray) – covariance of auto-regressive component of shape (time)

  • resid (np.ndarray) – residual of shape (time)

  • _loglikeburn (int) – UnobservedComponentsResults.loglikelihood_burn
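The component arrays listed above can be recombined into a fitted curve. The following is a minimal sketch, not the package's own code: it assumes the usual structural time series decomposition (level plus seasonal terms plus auto-regressive term) and that the *_cov arrays hold one variance per time step. Plain lists stand in for the np.ndarray fields, and the helper names combine_components and one_sigma are hypothetical.

```python
import math


def combine_components(level, seas, ar):
    """Sum level, all seasonal components, and the AR term per time step."""
    return [lv + sum(s_row) + a for lv, s_row, a in zip(level, seas, ar)]


def one_sigma(cov):
    """Turn per-time-step variances into standard deviations (assumed layout)."""
    return [math.sqrt(c) for c in cov]


# Toy values shaped like the DLMResult fields: (time) and (time, n)
level = [400.0, 400.5, 401.0]
seas = [[1.0, -0.2], [0.5, 0.1], [-1.0, 0.2]]
ar = [0.1, -0.1, 0.0]
level_cov = [0.04, 0.09, 0.16]

fit = combine_components(level, seas, ar)
sigma = one_sigma(level_cov)
```

Here seas has shape (time, n) with n = 2, so each row is summed before it is added to the level.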

classmethod create(name, timeseries: TimeSeries, result: UnobservedComponentsResults, score=None)

Creates a DLMResult object from a TimeSeries object and an UnobservedComponentsResults object

Parameters:
  • name (str) – Identifier

  • timeseries (TimeSeries) – Fitted timeseries

  • result (UnobservedComponentsResults) – Fit results

  • score (dict) – dictionary of cross validation scores (this will change in the future)

Returns:

DLMResult object

Return type:

DLMResult

classmethod load(path)

Load DLMResult object from .json

Parameters:

path (str) – path to .json file

Returns:

DLMResult object

Return type:

DLMResult

name_from_spec() → str

Creates a unique identifier describing the dlm configuration

Returns:

Identifier

Return type:

str

plot(ax=None, seas=True)

Plot the fit result. If ax is not specified, create a new figure. Returns the figure and axis.

Parameters:
  • ax (matplotlib.axes, optional) – (Optional) The axis the plot should be drawn on, defaults to None

  • seas (bool, optional) – Whether to draw the seasonal component, defaults to True

Returns:

The axis and figure

Return type:

matplotlib.axes, matplotlib.figure

plot_summary(seas=True, fig=None, figsize=None)

Create a figure showing an overview of the dlm fit.

Parameters:
  • seas (bool, optional) – Whether the seasonal component is plotted in fit overview (if False only plot the level), defaults to True

  • fig (matplotlib.figure, optional) – Figure to use

  • figsize (Tuple, optional) – Tuple describing the figure size

Returns:

Figure

Return type:

matplotlib.figure

save(path, fname=None, verbose=1)

Save DLMResult object as .json. The filename will be ‘DLMResult_fname.json’. If fname is not specified, the file is named using the name field of the object and an identifier generated from the DLM configuration.

Parameters:
  • path (str) – Path to .json file

  • fname (str, optional) – Filename, defaults to None

summary()

Plots a summary of the object. This is not yet fully implemented.

class dlmhelper.data.DLMResultList(results: List[DLMResult])

This class is used for handling multiple dlm results created by dlmhelper.tools.dlm_ensemble()

Parameters:

results (List[DLMResult]) – List of DLMResult objects

get_best_result(converged: bool = True, sort: str = 'aic', dlm_spec_filter: List[dict] | None = None, dlm_fit_params_filter: List[dict] | None = None, n: int = 0)

Get the best dlm fit result using the metric given by sort

Parameters:
  • converged (bool, optional) – If True only show configurations for which the fit converged, defaults to True

  • sort (str) – Metric by which the results are sorted, defaults to aic

  • dlm_spec_filter (List[dict], optional) – TODO

  • dlm_fit_params_filter (List[dict], optional) – TODO

  • n (int, optional) – Get the n-th best result, defaults to 0
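The selection logic described by converged, sort, and n can be pictured as filter, sort, then index. This is only a sketch under stated assumptions: results are represented as plain dicts, the keys "rating" and "converged" are hypothetical stand-ins for whatever dlm_fit_rating actually stores, and lower metric values are taken to be better (which holds for AIC).

```python
def pick_best(results, sort="aic", converged=True, n=0):
    """Return the n-th best entry, where lower metric values rank first."""
    # Keep everything when converged=False, otherwise only converged fits
    candidates = [r for r in results if r["rating"]["converged"] or not converged]
    ranked = sorted(candidates, key=lambda r: r["rating"][sort])
    return ranked[n]


# Hypothetical stand-ins for DLMResult objects
results = [
    {"name": "A", "rating": {"aic": 120.4, "converged": True}},
    {"name": "B", "rating": {"aic": 98.7, "converged": False}},
    {"name": "C", "rating": {"aic": 101.2, "converged": True}},
]

best = pick_best(results)
```

With the default converged=True, entry "B" is skipped even though it has the lowest AIC, and n=1 selects the runner-up among the remaining fits.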

classmethod load_archive(path: str)

Load DLMResultList object saved as a .tar archive

Parameters:

path (str) – Path to archive

Returns:

DLMResultList object

Return type:

DLMResultList

plot_summary(num: str | int = 'all', converged: bool = True, sort: str = 'aic', seas: bool = True, dlm_spec_filter: List[dict] | None = None, dlm_fit_params_filter: List[dict] | None = None, figsize=(20, 20))

Plot a summary of the dlm results

Parameters:
  • num (str | int, optional) – Number of dlm results to plot. Can be ‘all’ or integer

  • converged (bool, optional) – If True only show configurations for which the fit converged, defaults to True

  • sort (str, optional) – Metric by which the results are sorted, defaults to aic

  • seas (bool, optional) – If True plot level+seasonal term

  • dlm_spec_filter (List[dict], optional) – TODO

  • dlm_fit_params_filter (List[dict], optional) – TODO

  • figsize (tuple, optional) – Figsize to be passed to matplotlib

save_archive(path: str)

Save the DLMResultList object as a .tar archive

Parameters:

path (str) – Path to file to save

summary(converged=True, sort='aic')

Print a list of all DLMResults

Parameters:
  • converged (bool, optional) – If True only show configurations for which the fit converged, defaults to True

  • sort (str) – Metric by which the results are sorted, defaults to aic

class dlmhelper.data.Grid(data: ndarray, lat: ndarray, lon: ndarray, time: ndarray | None = None, time64: ndarray | None = None, time_unit: str = 'day', reference_time: datetime64 = numpy.datetime64('1970-01-01'), error: ndarray | None = None, N: ndarray | None = None, product_type: str | None = None, grid_dim: dict | None = None)

This class acts as a container for gridded data. When creating a Grid object, either time or time64 must be specified, together with time_unit and reference_time.

The data is then automatically sorted, and missing time steps of size time_unit are added to the arrays and filled with NaNs (e.g. for daily data all missing days are added; if time_unit is hours, all missing hours are added).
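The sorting and gap-filling step can be illustrated for daily data. This is a rough sketch of the idea rather than the class's implementation: a single value per day stands in for a full lat/lon slice, and the helper name sort_and_fill_daily is hypothetical.

```python
from datetime import date, timedelta
import math


def sort_and_fill_daily(days, values):
    """Sort samples by day and insert NaN for every missing day in between."""
    by_day = dict(zip(days, values))
    first, last = min(days), max(days)
    n_days = (last - first).days + 1
    full_days = [first + timedelta(days=i) for i in range(n_days)]
    full_values = [by_day.get(d, math.nan) for d in full_days]
    return full_days, full_values


# Unsorted input with 3 January missing
days = [date(2020, 1, 4), date(2020, 1, 1), date(2020, 1, 2)]
values = [3.0, 1.0, 2.0]

full_days, full_values = sort_and_fill_daily(days, values)
```

The result covers four consecutive days, with NaN in the slot for the missing day.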

filter_inhomogeneity_spatial(hs_lim: float | None = None, scale_lat: float | None = None, scale_lon: float | None = None)

Filter the data using the spatial inhomogeneity (hs) calculated by dlmhelper.spatio_temporal.inhomogeneity_spatial(). Filters days with hs > hs_lim. If hs_lim is not specified, it is calculated as hs_lim = median(hs) + 2*std(hs).

Parameters:
  • hs_lim (float, optional) – Maximum spatial inhomogeneity

  • scale_lat (float, optional) – Weight of the latitudinal part of the spatial inhomogeneity, if not specified lat and lon part will be equally weighted, defaults to None

  • scale_lon (float, optional) – Weight of the longitudinal part of the spatial inhomogeneity, if not specified lat and lon part will be equally weighted, defaults to None
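The default threshold hs_lim = median(hs) + 2*std(hs) can be worked through on a small example. The sketch below assumes the population standard deviation (the NumPy default) and ignores NaN handling; which convention the package uses is not stated here, and the helper names are hypothetical.

```python
import statistics


def default_hs_lim(hs):
    """median(hs) + 2*std(hs); using the population std is an assumption."""
    return statistics.median(hs) + 2 * statistics.pstdev(hs)


def keep_mask(hs, hs_lim=None):
    """True for time steps that pass the filter (hs <= hs_lim)."""
    if hs_lim is None:
        hs_lim = default_hs_lim(hs)
    return [h <= hs_lim for h in hs]


# One spatial inhomogeneity value per day; the fifth day is an outlier
hs = [0.10, 0.12, 0.11, 0.13, 0.90, 0.12]
mask = keep_mask(hs)
```

For these values the default limit is roughly 0.70, so only the day with hs = 0.90 is rejected.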

filter_inhomogeneity_temporal(ht_lim: float = 0.5)

Filter the data using the temporal inhomogeneity (ht) calculated by dlmhelper.spatio_temporal.inhomogeneity_temporal(). Filters grid cells with ht>ht_lim.

Parameters:

ht_lim (float) – Limit for temporal inhomogeneity, defaults to 0.5
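In contrast to the spatial filter, which acts on time steps, this filter acts on grid cells. As an illustration only, the sketch below assumes that a filtered cell is set to NaN for all time steps; nested lists stand in for arrays of shape (time, lat, lon) and (lat, lon), and mask_cells is a hypothetical helper.

```python
import math


def mask_cells(data, ht, ht_lim=0.5):
    """Replace every value in cells with ht > ht_lim by NaN.

    data: nested list indexed [time][lat][lon]
    ht:   nested list indexed [lat][lon]
    """
    return [
        [
            [val if ht[i][j] <= ht_lim else math.nan for j, val in enumerate(row)]
            for i, row in enumerate(time_slice)
        ]
        for time_slice in data
    ]


# Two time steps on a 2x2 grid; only cell (0, 1) exceeds the limit
data = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
ht = [[0.2, 0.7], [0.5, 0.1]]

filtered = mask_cells(data, ht)
```

A cell with ht exactly equal to ht_lim is kept, since only cells with ht > ht_lim are filtered.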

inhomogeneity_spatial(scale_lat: float | None = None, scale_lon: float | None = None)

Return the spatial inhomogeneity of the data using dlmhelper.spatio_temporal.inhomogeneity_spatial()

Parameters:
  • scale_lat (float, optional) – Weight of the latitudinal part of the spatial inhomogeneity, if not specified lat and lon part will be equally weighted, defaults to None

  • scale_lon (float, optional) – Weight of the longitudinal part of the spatial inhomogeneity, if not specified lat and lon part will be equally weighted, defaults to None

Returns:

Array of shape (time, 3). Each row contains the inhomogeneity, asymmetry component, and entropy component for the corresponding time step in N.

Return type:

np.ndarray

inhomogeneity_temporal()

Return the temporal inhomogeneity of the data using dlmhelper.spatio_temporal.inhomogeneity_temporal()

Returns:

Array of temporal inhomogeneity values at each grid point, with shape (lat, lon, 3). The last dimension contains the temporal inhomogeneity, asymmetry component, and entropy component.

Return type:

np.ndarray

to_TimeSeries(zonal_avg: bool = False)

Creates a TimeSeries object from the gridded data by calculating the area-weighted average. If zonal_avg=True the data is first averaged over the longitudes and only then over the latitudes. This can help with the sampling bias in certain cases (e.g. if the data represents an atmospheric trace gas which is zonally well-mixed).

Parameters:

zonal_avg (bool) – Whether to average zonally first

Returns:

TimeSeries object from the data

Return type:

TimeSeries
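The effect of zonal_avg on unevenly sampled data can be seen in a small example. This sketch assumes cos(latitude) area weights and that NaN cells are skipped; neither detail is confirmed for the package, and both helper functions are hypothetical.

```python
import math


def _nanmean(values, weights):
    """Weighted mean that skips NaN entries; NaN if nothing is valid."""
    pairs = [(v, w) for v, w in zip(values, weights) if not math.isnan(v)]
    total = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total if total else math.nan


def area_average(field, lats, zonal_avg=False):
    """Average one [lat][lon] slice with cos(latitude) weights (assumed)."""
    w = [math.cos(math.radians(lat)) for lat in lats]
    if zonal_avg:
        # First average along longitude, then weight the zonal means
        zonal_means = [_nanmean(row, [1.0] * len(row)) for row in field]
        return _nanmean(zonal_means, w)
    flat_v = [v for row in field for v in row]
    flat_w = [w[i] for i, row in enumerate(field) for _ in row]
    return _nanmean(flat_v, flat_w)


# The equatorial band has one valid cell, the 60 degree band has four
lats = [0.0, 60.0]
field = [[1.0, math.nan, math.nan, math.nan], [3.0, 3.0, 3.0, 3.0]]

direct = area_average(field, lats)
zonal = area_average(field, lats, zonal_avg=True)
```

In the direct average the well-sampled band dominates (about 2.33), while averaging zonally first gives each latitude band its full area weight regardless of how many cells were sampled (about 1.67).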

class dlmhelper.data.TimeSeries(data: ndarray, time: ndarray | None = None, time_unit: str = 'day', time64: ndarray | None = None, reference_time: datetime64 = numpy.datetime64('1970-01-01'), error: ndarray | None = None, N: ndarray | None = None, product_type: str | None = None, grid_dim: dict | None = None)

This class acts as a container for timeseries data. When creating a TimeSeries object, either time or time64 must be specified, together with time_unit and reference_time.

The data is then automatically sorted, and missing time steps of size time_unit are added to the arrays and filled with NaNs (e.g. for daily data all missing days are added; if time_unit is hours, all missing hours are added).

Parameters:
  • data (np.ndarray) – The timeseries data of shape (time)

  • time (np.ndarray, optional) – Time values corresponding to time_unit since reference_time of shape (time)

  • time_unit (str) – Unit of the values from time. Possible values are listed in dlmhelper.TIME_ALIASES

  • reference_time (np.datetime64, optional) – Reference time for the values from time array, defaults to Unix-epoch

  • error (np.ndarray, optional) – Errors for the timeseries data of shape (time)

  • N (np.ndarray, optional) – Number of datapoints averaged for each timestep of shape (time)

  • product_type (str, optional) – Identifier of the data used

  • grid_dim (dict, optional) – Dimensions of the averaged area, can be created with dlmhelper.data.get_grid_dim()
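The relation between time, time_unit, and reference_time amounts to "offset times unit, added to the reference". The sketch below uses the standard library instead of np.datetime64. The UNIT_TO_DELTA mapping is hypothetical; the unit names the package actually accepts are listed in dlmhelper.TIME_ALIASES.

```python
from datetime import datetime, timedelta

# Hypothetical mapping for illustration, not the package's alias table
UNIT_TO_DELTA = {"day": timedelta(days=1), "hour": timedelta(hours=1)}


def to_datetimes(time, time_unit="day", reference_time=datetime(1970, 1, 1)):
    """Convert integer offsets in time_unit since reference_time to datetimes."""
    step = UNIT_TO_DELTA[time_unit]
    return [reference_time + t * step for t in time]


# Offsets in days since the Unix epoch
stamps = to_datetimes([0, 1, 365])
```

With the default reference time, an offset of 365 days corresponds to 1 January 1971.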

classmethod load(path)

Load TimeSeries object from .json

Parameters:

path (str) – path to .json file

Returns:

TimeSeries object

Return type:

TimeSeries

plot(ax=None)

Plot the time series. If ax is not specified, create a new figure. Returns the figure and axis.

Parameters:

ax (matplotlib.axes, optional) – (Optional) The axis the plot should be drawn on, defaults to None

Returns:

The axis and figure

Return type:

matplotlib.axes, matplotlib.figure

save(path, fname, verbose=1)

Save TimeSeries object as .json. The filename will be ‘TimeSeries_fname.json’.

Parameters:
  • path (str) – Path to .json file

  • fname (str) – Filename

dlmhelper.data.get_grid_dim(lat_min: float, lat_max: float, lon_min: float, lon_max: float, lat_step: float, lon_step: float) → dict

Returns a dictionary containing the specified grid dimensions. Used by various functions in this package.

Parameters:
  • lat_min (float) – Minimum latitude

  • lat_max (float) – Maximum latitude

  • lon_min (float) – Minimum longitude

  • lon_max (float) – Maximum longitude

  • lat_step (float) – Latitude step-size

  • lon_step (float) – Longitude step-size

Returns:

dictionary

Return type:

dict