data.py
This module provides four classes for handling input data and DLM results:
Grid: Container for gridded data which can be easily converted to a TimeSeries object
TimeSeries: Container for timeseries data which is used for all dlm fit functions
DLMResult: Handles single results
DLMResultList: Acts as a container for all DLMResults generated for a given time series
- class dlmhelper.data.DLMResult(name: str, timeseries: TimeSeries, dlm_specification: dict, dlm_fit_params: dict, dlm_fit_rating: dict, level: ndarray, level_cov: ndarray, trend: ndarray, trend_cov: ndarray, seas: ndarray, seas_cov: ndarray, ar: ndarray, ar_cov: ndarray, resid: ndarray)
This class is a container for results from dlm fits. Objects of this class are created by dlmhelper.tools.dlm_fit() or dlmhelper.tools.dlm_ensemble(), or when loading saved results from a .json file using DLMResult.load().
- Parameters:
name (str) – Name/Identifier of object
timeseries (TimeSeries) – TimeSeries object that has been fitted
dlm_specification (dict) – Dict containing the dlm configuration
dlm_fit_params (dict) – Dict containing the values of the underlying parameters fitted by the dlm
dlm_fit_rating (dict) – Dict containing the various metrics which aid in comparing different dlm fits
level (np.ndarray) – level component of shape (time)
level_cov (np.ndarray) – covariance of level component of shape (time)
trend (np.ndarray) – trend component of shape (time)
trend_cov (np.ndarray) – covariance of trend component of shape (time)
seas (np.ndarray) – seasonal components of shape (time, n) where n is the number of seasonal components
seas_cov (np.ndarray) – covariance of seasonal components of shape (time,n)
ar (np.ndarray) – auto-regressive component of shape (time)
ar_cov (np.ndarray) – covariance of auto-regressive component of shape (time)
resid (np.ndarray) – Residual of shape (time). The residual is defined as the difference between the data and the fit (level+seas+AR).
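The residual definition above can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: the helper name `residual` is hypothetical, plain lists stand in for the NumPy arrays, and it assumes that the n seasonal components are summed at each time step before being added to the level and AR terms.

```python
import math


def residual(data, level, seas, ar):
    """Return data - (level + sum of seasonal components + ar) per time step.

    Hypothetical helper: `seas` is a list of rows, one row of n seasonal
    components per time step (mirroring the documented (time, n) shape).
    """
    resid = []
    for y, lvl, seas_row, a in zip(data, level, seas, ar):
        fit = lvl + sum(seas_row) + a
        resid.append(y - fit)  # a NaN in the data propagates to the residual
    return resid


data = [10.0, 11.5, math.nan, 12.0]
level = [9.0, 10.0, 11.0, 12.0]
seas = [[0.5, 0.25], [1.0, 0.0], [0.5, -0.25], [-0.5, 0.25]]
ar = [0.0, 0.25, 0.0, 0.0]

resid = residual(data, level, seas, ar)
print(resid)
```

Missing observations (NaN) stay missing in the residual, so gaps in the input remain visible after the fit is subtracted.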
- classmethod create(name, timeseries: TimeSeries, result: UnobservedComponentsResults, score=None)
Creates a DLMResult object from a TimeSeries object and an UnobservedComponentsResults object
- Parameters:
name (str) – Identifier
timeseries (TimeSeries) – Fitted timeseries
result (UnobservedComponentsResults) – Fit results
score (dict) – dictionary of cross validation scores (this will change in the future)
- Returns:
DLMResult object
- Return type:
DLMResult
- classmethod load(path)
Load DLMResult object from .json
- Parameters:
path (str) – path to .json file
- Returns:
DLMResult object
- Return type:
DLMResult
- name_from_spec() str
Creates a unique identifier describing the dlm configuration
- Returns:
Identifier
- Return type:
str
- plot(ax=None, seas=True)
Plot the fit result. If ax is not specified, create a new figure. Returns the figure and axis.
- Parameters:
ax (matplotlib.axes, optional) – (Optional) The axis the plot should be drawn on, defaults to None
seas (bool, optional) – Whether to draw the seasonal component, defaults to True
- Returns:
The axis and figure
- Return type:
matplotlib.axes, matplotlib.figure
- plot_summary(seas=True, fig=None, figsize=None)
Create a figure showing an overview of the dlm fit.
- Parameters:
seas (bool, optional) – Whether the seasonal component is plotted in fit overview (if False only plot the level), defaults to True
fig (matplotlib.figure, optional) – Figure to use
figsize (Tuple, optional) – Tuple describing the figure size
- Returns:
Figure
- Return type:
matplotlib.figure
- save(path, fname=None, verbose=1)
Save DLMResult object as .json. The filename will be ‘DLMResult_fname.json’. If fname is not specified, the file is named using the name field of the object and an identifier generated from the DLM configuration.
- Parameters:
path (str) – Path to .json file
fname (str, optional) – Filename, defaults to None
- summary()
Plots a summary of the object. This is currently not completely implemented.
- class dlmhelper.data.DLMResultList(results: List[DLMResult])
This class is used for handling multiple dlm results created by dlmhelper.tools.dlm_ensemble().
- Parameters:
results (List[DLMResult]) – List of DLMResult objects
- get_best_result(converged: bool = True, sort: str = 'aic', dlm_spec_filter: List[dict] | None = None, dlm_fit_params_filter: List[dict] | None = None, n: int = 0)
Get the best dlm fit result using the metric given by sort
- Parameters:
converged (bool, optional) – If True only show configurations for which the fit converged, defaults to True
sort (str) – Metric by which the results are sorted, defaults to aic
dlm_spec_filter (List[dict], optional) – TODO
dlm_fit_params_filter (List[dict], optional) – TODO
n (int, optional) – Get the n-th best result, defaults to 0
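The selection logic described by these parameters can be sketched as follows. This is an illustration under stated assumptions, not the method's actual code: the function `best_result` and the dictionary layout (`"converged"`, `"rating"`) are hypothetical stand-ins for DLMResult objects, the two filter arguments are left out because they are not documented above, and a smaller metric value is taken to be better (which holds for AIC).

```python
def best_result(results, converged=True, sort="aic", n=0):
    """Return the n-th best entry, ranked by ascending value of the `sort` metric.

    Hypothetical stand-in: each result is a dict with a boolean "converged"
    flag and a "rating" dict of fit metrics.
    """
    # Keep only converged fits when requested; otherwise keep everything.
    candidates = [r for r in results if r["converged"] or not converged]
    ranked = sorted(candidates, key=lambda r: r["rating"][sort])
    return ranked[n]


results = [
    {"name": "A", "converged": True, "rating": {"aic": 120.5}},
    {"name": "B", "converged": False, "rating": {"aic": 98.0}},
    {"name": "C", "converged": True, "rating": {"aic": 101.2}},
]

best = best_result(results)
second = best_result(results, n=1)
best_any = best_result(results, converged=False)
print(best["name"], second["name"], best_any["name"])
```

With `converged=True`, fit "B" is excluded despite having the lowest AIC, which is why the default is usually the safer choice when comparing configurations.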
- classmethod load_archive(path: str)
Load DLMResultList object saved as a .tar archive
- Parameters:
path (str) – Path to archive
- Returns:
DLMResultList object
- Return type:
DLMResultList
- plot_summary(num: str | int = 'all', converged: bool = True, sort: str = 'aic', seas: bool = True, dlm_spec_filter: List[dict] | None = None, dlm_fit_params_filter: List[dict] | None = None, figsize=(20, 20))
Plot a summary of the dlm results
- Parameters:
num (str | int, optional) – Number of dlm results to plot. Can be ‘all’ or integer
converged (bool, optional) – If True only show configurations for which the fit converged, defaults to True
sort (str, optional) – Metric by which the results are sorted, defaults to aic
seas (bool, optional) – If True plot level+seasonal term
dlm_spec_filter (List[dict], optional) – TODO
dlm_fit_params_filter (List[dict], optional) – TODO
figsize (tuple, optional) – Figsize to be passed to matplotlib
- save_archive(path: str)
Save the DLMResultList object as a .tar archive
- Parameters:
path (str) – Path to file to save
- summary(converged=True, sort='aic')
Print a list of all DLMResults
- Parameters:
converged (bool, optional) – If True only show configurations for which the fit converged, defaults to True
sort (str) – Metric by which the results are sorted, defaults to aic
- class dlmhelper.data.Grid(data: ndarray, lat: ndarray, lon: ndarray, time: ndarray | None = None, time64: ndarray | None = None, time_unit: str = 'day', reference_time: datetime64 = numpy.datetime64('1970-01-01'), error: ndarray | None = None, N: ndarray | None = None, product_type: str | None = None, grid_dim: dict | None = None)
This class acts as a container for gridded data. When creating a Grid object, either time or time64 has to be specified, along with time_unit and reference_time.
The data is then automatically sorted, and missing time steps of time_unit are added to the arrays and filled with NaNs (e.g. for daily data all missing days are added; if time_unit=hours all missing hours are added).
- filter_inhomogeneity_spatial(hs_lim: float | None = None, scale_lat: float | None = None, scale_lon: float | None = None)
Filter the data using the spatial inhomogeneity (hs) calculated by dlmhelper.spatio_temporal.inhomogeneity_spatial(). Filters days with hs>hs_lim. If hs_lim is not specified, it is calculated using the following formula:
hs_lim = median(hs) + 2*std(hs)
- Parameters:
hs_lim (float, optional) – Maximum spatial inhomogeneity
scale_lat (float, optional) – Weight of the latitudinal part of the spatial inhomogeneity, if not specified lat and lon part will be equally weighted, defaults to None
scale_lon (float, optional) – Weight of the longitudinal part of the spatial inhomogeneity, if not specified lat and lon part will be equally weighted, defaults to None
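The default threshold hs_lim = median(hs) + 2*std(hs) can be sketched with the standard library. The function name `default_hs_lim` is hypothetical, and two details are assumptions rather than documented behaviour: NaN entries are ignored, and the population standard deviation is used (the default of `numpy.std`).

```python
import math
import statistics


def default_hs_lim(hs):
    """Return median(hs) + 2 * std(hs), ignoring NaN entries (assumption)."""
    valid = [v for v in hs if not math.isnan(v)]
    return statistics.median(valid) + 2 * statistics.pstdev(valid)


# One spatial inhomogeneity value per day; NaN marks a day without data.
hs = [0.1, 0.2, 0.15, math.nan, 0.12, 0.9, 0.18, 0.11]

hs_lim = default_hs_lim(hs)
# Indices of days that would be filtered because hs > hs_lim.
flagged = [i for i, v in enumerate(hs) if v > hs_lim]
print(round(hs_lim, 4), flagged)
```

Because the comparison `v > hs_lim` is False for NaN, days without data are not flagged by this sketch; only the single outlier day exceeds the threshold.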
- filter_inhomogeneity_temporal(ht_lim: float = 0.5)
Filter the data using the temporal inhomogeneity (ht) calculated by dlmhelper.spatio_temporal.inhomogeneity_temporal(). Filters grid cells with ht>ht_lim.
- Parameters:
ht_lim (float) – Limit for temporal inhomogeneity, defaults to 0.5
- inhomogeneity_spatial(scale_lat: float | None = None, scale_lon: float | None = None)
Return the spatial inhomogeneity of the data using
dlmhelper.spatio_temporal.inhomogeneity_spatial()
- Parameters:
scale_lat (float, optional) – Weight of the latitudinal part of the spatial inhomogeneity, if not specified lat and lon part will be equally weighted, defaults to None
scale_lon (float, optional) – Weight of the longitudinal part of the spatial inhomogeneity, if not specified lat and lon part will be equally weighted, defaults to None
- Returns:
Array of shape (time, 3); each row contains the inhomogeneity, asymmetry component, and entropy component for the corresponding time step in N.
- Return type:
np.ndarray
- inhomogeneity_temporal()
Return the temporal inhomogeneity of the data using
dlmhelper.spatio_temporal.inhomogeneity_temporal()
- Returns:
Array of temporal inhomogeneity values at each grid point, with shape (lat, lon, 3). The last dimension contains the temporal inhomogeneity, asymmetry component, and entropy component.
- Return type:
np.ndarray
- to_TimeSeries(zonal_avg: bool = False)
Creates a TimeSeries object from the gridded data by calculating the area-weighted average. If zonal_avg=True the data is first averaged over the longitudes and only then over the latitudes. This can help with the sampling bias in certain cases (e.g. if the data represents an atmospheric trace gas which is zonally well-mixed).
- Parameters:
zonal_avg (bool) – Whether to average zonally first
- Returns:
TimeSeries object from the data
- Return type:
TimeSeries
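The difference between direct and zonal-first averaging can be sketched for a single time step. This is not the library's code: the function `area_weighted_mean` is hypothetical, nested lists stand in for the NumPy arrays, and the area weight of a cell is assumed to be cos(latitude), a common choice for regular latitude-longitude grids.

```python
import math


def area_weighted_mean(field, lats, zonal_avg=False):
    """Average one 2D field (rows = latitudes, columns = longitudes).

    NaN cells are skipped. Weights are cos(latitude), which is an assumption
    of this sketch.
    """
    weights = [math.cos(math.radians(lat)) for lat in lats]
    num = den = 0.0
    for row, w in zip(field, weights):
        valid = [v for v in row if not math.isnan(v)]
        if not valid:
            continue
        if zonal_avg:
            # One zonal mean per latitude band, weighted once.
            num += w * sum(valid) / len(valid)
            den += w
        else:
            # Every valid cell contributes with its own weight.
            num += w * sum(valid)
            den += w * len(valid)
    return num / den if den else math.nan


nan = math.nan
lats = [0.0, 60.0]
field = [
    [1.0, nan, nan, nan],  # sparsely sampled band at the equator
    [3.0, 3.0, 3.0, 3.0],  # fully sampled band at 60 degrees
]

direct = area_weighted_mean(field, lats)
zonal = area_weighted_mean(field, lats, zonal_avg=True)
print(direct, zonal)
```

In the direct average the well-sampled band dominates simply because it has more valid cells; averaging zonally first gives each latitude band its area weight regardless of how many cells were observed.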
- class dlmhelper.data.TimeSeries(data: ndarray, time: ndarray | None = None, time_unit: str = 'day', time64: ndarray | None = None, reference_time: datetime64 = numpy.datetime64('1970-01-01'), error: ndarray | None = None, N: ndarray | None = None, product_type: str | None = None, grid_dim: dict | None = None)
This class acts as a container for timeseries data. When creating a TimeSeries object, either time or time64 has to be specified, along with time_unit and reference_time.
The data is then automatically sorted, and missing time steps of time_unit are added to the arrays and filled with NaNs (e.g. for daily data all missing days are added; if time_unit=hours all missing hours are added).
- Parameters:
data (np.ndarray) – The timeseries data of shape (time)
time (np.ndarray, optional) – Time values corresponding to time_unit since reference_time of shape (time)
time_unit (str) – Unit of the values from time. Possible values are listed in
dlmhelper.TIME_ALIASES
reference_time (np.datetime64, optional) – Reference time for the values from time array, defaults to Unix-epoch
error (np.ndarray, optional) – Errors for the timeseries data of shape (time)
N (np.ndarray, optional) – Number of datapoints averaged for each timestep of shape (time)
product_type (str, optional) – Identifier of the data used
grid_dim (dict, optional) – Dimensions of the averaged area, can be created with
dlmhelper.data.get_grid_dim()
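How time values relate to reference_time, and what sorting and NaN-filling amount to, can be sketched with the standard library. This is an illustration only: `to_day_offsets` and `sort_and_fill` are hypothetical helpers, `datetime.date` stands in for `numpy.datetime64`, a daily time_unit is assumed, and duplicate time stamps are not handled.

```python
import math
from datetime import date

REFERENCE = date(1970, 1, 1)  # Unix epoch, the documented default reference_time


def to_day_offsets(dates):
    """Convert calendar dates to integer days since REFERENCE."""
    return [(d - REFERENCE).days for d in dates]


def sort_and_fill(time, data):
    """Sort by time and insert NaN for every missing integer time step."""
    lookup = dict(zip(time, data))
    full_time = list(range(min(time), max(time) + 1))
    full_data = [lookup.get(t, math.nan) for t in full_time]
    return full_time, full_data


# Unsorted daily observations with one missing day (1970-01-04).
dates = [date(1970, 1, 5), date(1970, 1, 2), date(1970, 1, 3)]
values = [5.0, 2.0, 3.0]

time = to_day_offsets(dates)
full_time, full_data = sort_and_fill(time, values)
print(time, full_time, full_data)
```

After this step the series has one entry per day between the first and last observation, which is the regular spacing a state-space fit expects.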
- classmethod load(path)
Load TimeSeries object from .json
- Parameters:
path (str) – path to .json file
- Returns:
TimeSeries object
- Return type:
TimeSeries
- plot(ax=None)
Plot the time series. If ax is not specified, create a new figure. Returns the figure and axis.
- Parameters:
ax (matplotlib.axes, optional) – (Optional) The axis the plot should be drawn on, defaults to None
- Returns:
The axis and figure
- Return type:
matplotlib.axes, matplotlib.figure
- save(path, fname, verbose=1)
Save TimeSeries object as .json. The filename will be ‘TimeSeries_fname.json’.
- Parameters:
path (str) – Path to .json file
fname (str) – Filename
- dlmhelper.data.get_grid_dim(lat_min: float, lat_max: float, lon_min: float, lon_max: float, lat_step: float, lon_step: float) dict
Returns a dictionary containing the specified grid dimensions. Used by various functions in this package.
- Parameters:
lat_min (float) – Minimum latitude
lat_max (float) – Maximum latitude
lon_min (float) – Minimum longitude
lon_max (float) – Maximum longitude
lat_step (float) – Latitude step-size
lon_step (float) – Longitude step-size
- Returns:
dictionary
- Return type:
dict