Code Overview: Modules
analogs.py
- synopsis:
This module contains the principal classes and functions for time series preprocessing and reconstruction.
The rascal.analogs.Station() class stores station metadata (code, name, altitude, longitude and latitude) and calculates daily time series.
- class rascal.analogs.Station(path)
Stores station metadata (code, name, altitude, longitude and latitude) and calculates daily time series.
- Parameters:
path (str) – Path of the directory that contains the observations.
- path
Path of the directory that contains the observations
- Type:
str
- meta
DataFrame with the metadata of the station (code, name, altitude, longitude and latitude)
- Type:
pd.DataFrame
- longitude
Longitude of the station
- Type:
float
- latitude
Latitude of the station
- Type:
float
- altitude
Elevation of the station
- Type:
float
- get_data(variable[, skipna=True])
Get the daily time series of the variable.
- Parameters:
variable (str) – variable name.
skipna (bool) – If True, skip NaN values when resampling to daily frequency.
- Returns:
data
- Return type:
pd.DataFrame
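The effect of skipna when resampling to daily frequency can be illustrated with a minimal pure-Python sketch. This is not the rascal implementation: the function name daily_mean, the (timestamp, value) record layout and the choice of the mean as the daily aggregate are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime
import math


def daily_mean(records, skipna=True):
    """Average sub-daily (timestamp, value) records into one value per day.

    Hypothetical helper: the mean is an assumed aggregate, not rascal's.
    """
    by_day = defaultdict(list)
    for timestamp, value in records:
        by_day[timestamp.date()].append(value)

    daily = {}
    for day, values in sorted(by_day.items()):
        if skipna:
            # Drop missing values before averaging.
            values = [v for v in values if not math.isnan(v)]
        if not values or any(math.isnan(v) for v in values):
            # No usable data, or a NaN that was not skipped.
            daily[day] = math.nan
        else:
            daily[day] = sum(values) / len(values)
    return daily


records = [
    (datetime(2020, 1, 1, 0), 2.0),
    (datetime(2020, 1, 1, 12), math.nan),
    (datetime(2020, 1, 2, 0), 4.0),
    (datetime(2020, 1, 2, 12), 6.0),
]
with_skip = daily_mean(records, skipna=True)
without_skip = daily_mean(records, skipna=False)
```

With skipna=True the first day keeps its single valid value; with skipna=False the same day becomes NaN.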
- get_gridpoint(grid_latitudes, grid_longitudes)
The rascal.analogs.Predictor() class stores the predictor data and Principal Component Analysis results:
- class rascal.analogs.Predictor(paths, grouping, lat_min, lat_max, lon_min, lon_max[, mosaic=True][, number=None])
Predictor class. This contains data about the predictor variable to use for the reconstruction.
- Parameters:
paths (list[str]) – Paths of the grib files to open.
grouping (str or None) –
Method of grouping the data, str format = “frequency_method”
frequency=(“hourly”, “daily”, “monthly”, “yearly”)
method=(“mean”, “max”, “min”, “sum”)
lat_min (float) – Predictor field minimum latitude
lat_max (float) – Predictor field maximum latitude
lon_min (float) – Predictor field minimum longitude
lon_max (float) – Predictor field maximum longitude
mosaic (bool or None) – If True, apply the .to_mosaic() method.
number (int or None) – Ensemble member number.
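The grouping string follows the "frequency_method" pattern described above. A minimal sketch of how such a string could be split and validated is shown here; parse_grouping is a hypothetical helper, not part of rascal, and only the accepted values come from the parameter description.

```python
FREQUENCIES = ("hourly", "daily", "monthly", "yearly")
METHODS = ("mean", "max", "min", "sum")


def parse_grouping(grouping):
    """Split a 'frequency_method' string into its two parts (hypothetical helper)."""
    if grouping is None:
        return None
    frequency, _, method = grouping.partition("_")
    if frequency not in FREQUENCIES or method not in METHODS:
        raise ValueError(f"invalid grouping: {grouping!r}")
    return frequency, method


print(parse_grouping("daily_mean"))
```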
- data
- Type:
xr.Dataset
- crop(lat_min, lat_max, lon_min, lon_max)
Crop the domain of the dataset.
- Parameters:
lat_min (float) – New minimum latitude
lat_max (float) – New maximum latitude
lon_min (float) – New minimum longitude
lon_max (float) – New maximum longitude
- to_mosaic()
To use various simultaneous predictors or a vectorial variable, concatenate the variables along the longitude axis to obtain a single compound variable, easier to use when performing PCA.
- Returns:
compound_predictor
- Return type:
xr.Dataset
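The idea of concatenating variables along the longitude axis can be sketched with plain nested lists. This is only an illustration under the assumption that each field is a [latitude][longitude] grid with the same number of latitude rows; the function below is hypothetical and does not reproduce how rascal handles xr.Dataset objects.

```python
def to_mosaic(fields):
    """Join several [lat][lon] grids side by side along the longitude axis."""
    n_lat = len(fields[0])
    if any(len(field) != n_lat for field in fields):
        raise ValueError("all fields must share the same latitude dimension")
    # Row i of the mosaic is row i of every field, one after another.
    return [[value for field in fields for value in field[i]] for i in range(n_lat)]


u = [[1, 2], [3, 4]]  # e.g. zonal wind on a 2x2 grid
v = [[5, 6], [7, 8]]  # e.g. meridional wind on the same grid
mosaic = to_mosaic([u, v])
```

The result has the same number of latitude rows and twice as many longitude columns, so a single PCA can treat both variables as one field.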
- module()
Get the module of the predictor variables as if they were components of a vector.
- Returns:
self
- Return type:
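Treating the predictor variables as vector components, the module at each point is the Euclidean norm of those components. A minimal sketch with flat lists (vector_module is a hypothetical name, not the rascal method):

```python
import math


def vector_module(*components):
    """Euclidean norm at each point, given one flat list per vector component."""
    return [math.sqrt(sum(c * c for c in point)) for point in zip(*components)]


u = [3.0, 0.0]
v = [4.0, 1.0]
speed = vector_module(u, v)
```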
- anomalies([seasons][, standardize][, mean_period])
Calculate seasonal anomalies of the field. The definition of season is flexible, being only a list of months contained within it.
- Parameters:
seasons (list[list[int]] or None) – Months of the season. Default = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
standardize (bool or None) – Standardize anomalies. Default = True
mean_period (list[pd.DatetimeIndex] or None) – Dates to use as mean climatology to calculate the anomalies.
- Returns:
anomalies (dims = [time, latitude, longitude, season])
- Return type:
xr.Dataset
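A seasonal anomaly removes the seasonal mean from each value and, when standardized, divides by the seasonal standard deviation. The sketch below illustrates this for a single time series stored as a dict; it is not the rascal code, the name seasonal_anomalies is hypothetical, and the use of the population standard deviation is an assumption.

```python
from datetime import date
from statistics import mean, pstdev


def seasonal_anomalies(series, seasons, standardize=True):
    """Return one {date: anomaly} dict per season (a season is a list of months)."""
    result = []
    for months in seasons:
        subset = {d: v for d, v in series.items() if d.month in months}
        values = list(subset.values())
        climatology = mean(values)
        # Assumption: population standard deviation; a constant season would divide by zero.
        spread = pstdev(values) if standardize else 1.0
        result.append({d: (v - climatology) / spread for d, v in subset.items()})
    return result


series = {
    date(2000, 1, 1): 1.0,
    date(2000, 1, 2): 3.0,
    date(2000, 7, 1): 10.0,
    date(2000, 7, 2): 14.0,
}
winter, summer = seasonal_anomalies(series, seasons=[[12, 1, 2], [6, 7, 8]])
```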
- pcs(path, npcs[, seasons][, standardize][, pcscaling][, overwrite][, training][, project])
Perform Principal Component Analysis. To save computation time, the PCA object can be saved as a pickle, so the analysis does not have to be performed every time.
- Parameters:
path (str) – Path to save the PCA results
npcs (int) – Number of components.
seasons (list[list[int]] or None) – List of list of months of every season.
standardize (bool or None) – If True, the anomalies used in the PCA are standardized.
pcscaling (int or None) –
Set the scaling of the PCs used to compute covariance. The following values are accepted:
0 : Un-scaled PCs.
1 : PCs are scaled to unit variance (divided by the square-root of their eigenvalue) (default).
2 : PCs are multiplied by the square-root of their eigenvalue.
overwrite (bool or None) – Default = False. If True recalculate the PCA and overwrite the pickle with the PCA
training (list[pd.DatetimeIndex] or None) – Dates to use for calculating the PCA
project (xr.Dataset or None) – Data to project onto the calculated PCA results.
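The three pcscaling options can be illustrated with a small sketch that rescales already computed PCs by the square root of their eigenvalues. The function scale_pcs and its list-based inputs are assumptions for illustration; rascal itself takes pcscaling as an argument of pcs().

```python
import math


def scale_pcs(pcs, eigenvalues, pcscaling=1):
    """Rescale PCs; each row of pcs is one time step, one column per component."""
    roots = [math.sqrt(ev) for ev in eigenvalues]
    if pcscaling == 0:
        # Un-scaled PCs.
        return [list(row) for row in pcs]
    if pcscaling == 1:
        # Unit variance: divide by the square root of the eigenvalue.
        return [[value / root for value, root in zip(row, roots)] for row in pcs]
    if pcscaling == 2:
        # Multiply by the square root of the eigenvalue.
        return [[value * root for value, root in zip(row, roots)] for row in pcs]
    raise ValueError("pcscaling must be 0, 1 or 2")


pcs = [[2.0, 3.0]]
eigenvalues = [4.0, 9.0]
unit_variance = scale_pcs(pcs, eigenvalues, pcscaling=1)
```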
The rascal.analogs.Analogs() class gets the pool of analog days and reconstructs the time series:
- class rascal.analogs.Analogs(pcs, dates, observations)
Analogs class. Gets the pool of analog days for each day and reconstructs the time series.
- Parameters:
path (list[str] or None) – Optional “kind” of ingredients.
- get_pool(size[, vw_size][, vw_type][, distance])
Get the pool of size closest neighbors to each day.
- Parameters:
size (int) – Number of neighbors in the pool.
vw_size (int or None) – Validation window size. How many data points around each point are ignored to validate the reconstruction.
vw_type (str or None) –
Type of validation window. Options:
forward: The original date is the last date of the window.
backward: The original date is the first date of the window.
centered: The original date is in the center of the window.
distance (str or None) –
Metric to determine the distance between points in the PCs space. Options:
euclidean
mahalanobis (Wishlist)
- Returns:
(analog_dates, analog_distances), dates of the analogs in the pool for each day, and distances in the PCs space of each.
- Return type:
(pd.DataFrame, pd.DataFrame)
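The pool search can be sketched as a nearest-neighbor query in PC space that skips the days inside the validation window. The version below is an illustration only: it assumes a centered window, the Euclidean distance, and integer day indices instead of dates, and it returns (distance, day index) pairs rather than the two DataFrames described above.

```python
import math


def get_pool(pcs, size, vw_size=0):
    """For each day, return the `size` closest other days as (distance, index)."""
    half = vw_size // 2  # assumption: centered validation window
    pool = []
    for i, target in enumerate(pcs):
        candidates = []
        for j, other in enumerate(pcs):
            if abs(i - j) <= half:
                # Skip the day itself and its neighbors inside the window.
                continue
            candidates.append((math.dist(target, other), j))
        candidates.sort()
        pool.append(candidates[:size])
    return pool


pcs = [[0.0], [1.0], [1.5], [5.0]]  # one PC value per day
pool_no_window = get_pool(pcs, size=2)
pool_window = get_pool(pcs, size=2, vw_size=3)
```

Widening the window removes the days adjacent to the target from its pool, so the reconstruction is validated against analogs that are not trivially close in time.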
- reconstruct([pool_size][, method][, sample_size][, mapping_variable][, vw_size][, vw_type][, distance])
Reconstruct a time series using the analog pool for each day.
- Parameters:
pool_size (int or None) – Size of the analog pool for each day.
method (str or None) –
Similarity method to select the best analog of the pool. Options are:
’closest’: (Selected by default) Select the closest analog in the PCs space
’average’: Calculate the weighted average of the ‘sample_size’ closest analogs in the PCs space.
’quantilemap’: Select the analog that represents the same quantile in the observations pool as another mapping variable.
sample_size (int or None) – Number of analogs to average in the ‘average’ method
mapping_variable (Predictor or None) – Time series of a variable to use as mapping in ‘quantilemap’
vw_size (int or None) – Validation window size. How many data points around each point are ignored to validate the reconstruction.
vw_type (str or None) –
Type of validation window. Options:
forward: The original date is the last date of the window.
backward: The original date is the first date of the window.
centered: The original date is in the center of the window.
distance (str or None) –
Metric to determine the distance between points in the PCs space. Options:
euclidean
mahalanobis (Wishlist)
- Returns:
reconstruction
- Return type:
pd.DataFrame
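The 'closest' and 'average' methods can be sketched for a single day as follows. This is not the rascal implementation: the pool layout, the function name reconstruct_day and, in particular, the inverse-distance weights used for the weighted average are assumptions, since the weighting scheme is not specified here.

```python
def reconstruct_day(pool, observations, method="closest", sample_size=2):
    """Reconstruct one value from a pool of (distance, key) pairs sorted by distance."""
    if method == "closest":
        _, key = pool[0]
        return observations[key]
    if method == "average":
        sample = pool[:sample_size]
        # Assumption: inverse-distance weights (requires distances > 0).
        weights = [1.0 / distance for distance, _ in sample]
        total = sum(w * observations[key] for w, (_, key) in zip(weights, sample))
        return total / sum(weights)
    raise ValueError(f"unknown method: {method!r}")


pool = [(1.0, "a"), (2.0, "b"), (4.0, "c")]
observations = {"a": 10.0, "b": 16.0, "c": 0.0}
closest = reconstruct_day(pool, observations, method="closest")
averaged = reconstruct_day(pool, observations, method="average", sample_size=2)
```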
analysis.py
- synopsis:
This module contains the principal classes and functions to analyze the skill and validate the reconstructions.
You can use the rascal.analysis.RSkill() class to validate and analyze the skill of the reconstructions:
- class rascal.analysis.RSkill([observations][, reconstructions][, reanalysis][, data])
RSkill class. Validates and analyzes the skill of the reconstructions against observations and reanalysis data.
- Parameters:
observations (pd.DataFrame or None) – Observations time series
reconstructions (pd.DataFrame or None) – Reconstructions time series
reanalysis (pd.DataFrame or None) – Reanalysis time series
data (pd.DataFrame or None) – All data joined (observations, reconstructions, reanalysis)
- observations
Observations time series
- Type:
pd.DataFrame
- reconstructions
Reconstructions time series
- Type:
pd.DataFrame
- reanalysis
Reanalysis time series
- Type:
pd.DataFrame
- data
All data joined (observations, reconstructions, reanalysis) concatenated in the columns axis
- Type:
pd.DataFrame
- resample(freq, grouping[, hydroyear][, skipna])
Resample the dataset containing observations, reconstructions and reanalysis data.
- Parameters:
freq (str) – New sampling frequency.
grouping (str) – Options=”mean”, “median” or “sum”
hydroyear (bool or None) – Default=False. If True, when the resampling frequency is “1Y” it takes hydrological years (from October to September) instead of natural years
skipna (bool or None) – Default=False. If True ignore NaNs.
- Returns:
RSkill with resampled data
- Return type:
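The hydroyear option groups data from October to September instead of January to December. A minimal sketch of that grouping is given below; labeling each hydrological year by the calendar year in which it ends is an assumption, and both function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def hydrological_year(day):
    """Year label for October-to-September years (assumed: labeled by the ending year)."""
    return day.year + 1 if day.month >= 10 else day.year


def annual_sum(series, hydroyear=False):
    """Sum a {date: value} series by natural or hydrological year."""
    totals = defaultdict(float)
    for day, value in series.items():
        year = hydrological_year(day) if hydroyear else day.year
        totals[year] += value
    return dict(totals)


series = {date(2000, 9, 30): 1.0, date(2000, 10, 1): 2.0, date(2001, 3, 1): 4.0}
natural = annual_sum(series)
hydro = annual_sum(series, hydroyear=True)
```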
- plotseries([color][, start][, end][, methods])
Plot the time series of the reconstructions with the reanalysis and observations series
- Parameters:
color (dict or None) – dict of which color to use (values) with each dataset (keys)
start (Datetime or None) – Start date of the plot
end (Datetime or None) – End date of the plot
methods (list[str] or None) – Reconstruction methods to plot
- skill([reference=None][, threshold=None])
Generate a pd.DataFrame with the table of skills of various simulations. The skill metrics are:
Mean Bias Error (bias)
Root Mean Squared Error (rmse)
Correlation Coefficient (r2)
Standard Deviation (std)
MSE-based Skill Score (ssmse)
Heidke Skill Score (hss)
Brier Score (bs)
- Parameters:
reference (pd.DataFrame or None) – Time series of a reference model to compare when calculating SSMSE and HSS.
threshold (float or None) – Threshold to use when computing the HSS and BS.
- Returns:
(observation_std, skill_table), standard deviation of the observations and table of each skill score for each simulation.
- Return type:
(float, pd.DataFrame)
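Three of the listed metrics can be written down in a few lines. The sketch below uses the common textbook definitions (bias as simulation minus observation, and SSMSE as one minus the ratio of the simulation MSE to the reference MSE); whether rascal uses exactly these conventions is an assumption.

```python
import math


def mse(obs, sim):
    """Mean squared error between two equally long sequences."""
    return sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs)


def bias(obs, sim):
    """Mean bias error (assumed sign convention: simulation minus observation)."""
    return sum(s - o for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    """Root mean squared error."""
    return math.sqrt(mse(obs, sim))


def ssmse(obs, sim, ref):
    """MSE-based skill score relative to a reference; 1 is perfect, 0 matches the reference."""
    return 1.0 - mse(obs, sim) / mse(obs, ref)


obs = [1.0, 2.0, 3.0]
sim = [2.0, 3.0, 4.0]
ref = [3.0, 4.0, 5.0]
scores = (bias(obs, sim), rmse(obs, sim), ssmse(obs, sim, ref))
```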
- taylor()
Calls the .skill() method and computes the Taylor diagram.
- Returns:
fig, ax
- annual_cycle([grouping][, color])
Plot the annual cycle of the reconstructions, reanalysis and observations
- Parameters:
grouping (str or None) – (Default=”mean”) Monthly grouping to plot in the cycle. Options=(“sum”, “mean”, “median”, “std”)
color (dict or None) – dict of which color to use (values) with each dataset (keys)
- qqplot()
Quantile-Quantile plot
indices.py
- synopsis:
This module contains the principal classes and functions to calculate relevant climatic indices.
You can use the rascal.indices.CIndex() class to retrieve relevant climatic indices based on:
Klein Tank, A. M. G., Zwiers, F. W., & Zhang, X. (2009). Guidelines on analysis of extremes in a changing climate in support of informed decisions for adaptation. World Meteorological Organization.
- class rascal.indices.CIndex(df)
- Parameters:
df (pd.DataFrame) – Time series containing the relevant variables for the index calculation.
- fd()
Count of days where TN (daily minimum temperature) < 0°C. Let TNij be the daily minimum temperature on day i in period j. Count the number of days where TNij < 0°C.
- Returns:
idx
- Return type:
pd.DataFrame
- su()
Count of days where TX (daily maximum temperature) > 25°C. Let TXij be the daily maximum temperature on day i in period j. Count the number of days where TXij > 25°C.
- Returns:
idx
- Return type:
pd.DataFrame
- id()
Count of days where TX < 0°C. Let TXij be the daily maximum temperature on day i in period j. Count the number of days where TXij < 0°C.
- Returns:
idx
- Return type:
pd.DataFrame
- tr()
Count of days where TN > 20°C. Let TNij be the daily minimum temperature on day i in period j. Count the number of days where TNij > 20°C.
- Returns:
idx
- Return type:
pd.DataFrame
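The four threshold counts above (fd, su, id and tr) share the same structure: test each daily value against a fixed threshold and count the hits per period. A minimal sketch follows, assuming the period is the calendar year and the series is a {date: value} dict; the function names are hypothetical and this is not the CIndex implementation.

```python
from collections import defaultdict
from datetime import date


def count_days(series, condition):
    """Count, per calendar year, the days whose value satisfies `condition`."""
    counts = defaultdict(int)
    for day, value in series.items():
        counts[day.year] += int(condition(value))
    return dict(counts)


def frost_days(tn):
    return count_days(tn, lambda t: t < 0.0)      # FD: TN < 0°C


def summer_days(tx):
    return count_days(tx, lambda t: t > 25.0)     # SU: TX > 25°C


def icing_days(tx):
    return count_days(tx, lambda t: t < 0.0)      # ID: TX < 0°C


def tropical_nights(tn):
    return count_days(tn, lambda t: t > 20.0)     # TR: TN > 20°C


tn = {
    date(2000, 1, 1): -2.0,
    date(2000, 1, 2): 1.0,
    date(2000, 7, 1): 21.0,
    date(2001, 1, 1): -0.5,
    date(2001, 1, 2): -3.0,
}
fd_counts = frost_days(tn)
tr_counts = tropical_nights(tn)
```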
- gsl()
Annual count of days between first span of at least six days where TG (daily mean temperature) > 5°C and first span in second half of the year of at least six days where TG < 5°C. Let TGij be the daily mean temperature on day i in period j. Count the annual (1 Jan to 31 Dec in Northern Hemisphere, 1 July to 30 June in Southern Hemisphere) number of days between the first occurrence of at least six consecutive days where TGij > 5°C and the first occurrence after 1 July (1 Jan in Southern Hemisphere) of at least six consecutive days where TGij < 5°C.
- Returns:
idx
- Return type:
pd.DataFrame
- txx()
Monthly maximum value of daily maximum temperature: Let TXik be the daily maximum temperature on day i in month k. The maximum daily maximum temperature is then TXx = max (TXik).
- Returns:
idx
- Return type:
pd.DataFrame
- tnx()
Monthly maximum value of daily minimum temperature: Let TNik be the daily minimum temperature on day i in month k. The maximum daily minimum temperature is then TNx = max (TNik).
- Returns:
idx
- Return type:
pd.DataFrame
- txn()
Monthly minimum value of daily maximum temperature: Let TXik be the daily maximum temperature on day i in month k. The minimum daily maximum temperature is then TXn = min (TXik)
- Returns:
idx
- Return type:
pd.DataFrame
- tnn()
Monthly minimum value of daily minimum temperature: Let TNik be the daily minimum temperature on day i in month k. The minimum daily minimum temperature is then TNn = min (TNik)
- Returns:
idx
- Return type:
pd.DataFrame
- tn10p()
Count of days where TN < 10th percentile. Let TNij be the daily minimum temperature on day i in period j and let TNin10 be the calendar day 10th percentile of daily minimum temperature calculated for a five-day window centred on each calendar day in the base period n (1961-1990). Count the number of days where TNij < TNin10.
- Returns:
idx
- Return type:
pd.DataFrame
- tx10p()
Count of days where TX < 10th percentile. Let TXij be the daily maximum temperature on day i in period j and let TXin10 be the calendar day 10th percentile of daily maximum temperature calculated for a five-day window centred on each calendar day in the base period n (1961-1990). Count the number of days where TXij < TXin10.
- Returns:
idx
- Return type:
pd.DataFrame
- tn90p()
Count of days where TN > 90th percentile. Let TNij be the daily minimum temperature on day i in period j and let TNin90 be the calendar day 90th percentile of daily minimum temperature calculated for a five-day window centred on each calendar day in the base period n (1961-1990). Count the number of days where TNij > TNin90.
- Returns:
idx
- Return type:
pd.DataFrame
- tx90p()
Count of days where TX > 90th percentile. Let TXij be the daily maximum temperature on day i in period j and let TXin90 be the calendar day 90th percentile of daily maximum temperature calculated for a five-day window centred on each calendar day in the base period n (1961-1990). Count the number of days where TXij > TXin90.
- Returns:
idx
- Return type:
pd.DataFrame
- wsdi()
Count of days in a span of at least six days where TX > 90th percentile. Let TXij be the daily maximum temperature on day i in period j and let TXin90 be the calendar day 90th percentile of daily maximum temperature calculated for a five-day window centred on each calendar day in the base period n (1961-1990). Count the number of days where, in intervals of at least six consecutive days, TXij > TXin90.
- Returns:
idx
- Return type:
pd.DataFrame
- csdi()
Count of days in a span of at least six days where TN < 10th percentile. Let TNij be the daily minimum temperature on day i in period j and let TNin10 be the calendar day 10th percentile of daily minimum temperature calculated for a five-day window centred on each calendar day in the base period n (1961-1990). Count the number of days where, in intervals of at least six consecutive days, TNij < TNin10.
- Returns:
idx
- Return type:
pd.DataFrame
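Both spell duration indices (wsdi and csdi) count only the days that belong to runs of at least six consecutive exceedances. The sketch below shows that counting step on a list of booleans, where each flag states whether the day exceeds its calendar-day percentile. Computing those percentiles is left out, and days_in_spells is a hypothetical helper rather than the CIndex code.

```python
from itertools import groupby


def days_in_spells(flags, min_length=6):
    """Count the days that fall inside runs of at least `min_length` True flags."""
    total = 0
    for flag, run in groupby(flags):
        length = sum(1 for _ in run)
        if flag and length >= min_length:
            total += length
    return total


# Six warm days, a break, three warm days (too short), a break, seven warm days.
flags = [True] * 6 + [False] + [True] * 3 + [False] + [True] * 7
warm_spell_days = days_in_spells(flags)
```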
- dtr()
Mean difference between TX and TN (°C). Let TXij and TNij be the daily maximum and minimum temperature on day i in period j. If I represents the total number of days in j then the mean diurnal temperature range in period j is DTRj = sum (TXij - TNij) / I.
- Returns:
idx
- Return type:
pd.DataFrame
- rx1day()
Highest precipitation amount in one-day period. Let RRij be the daily precipitation amount on day i in period j. The maximum one-day value for period j is RX1dayj = max (RRij).
- Returns:
idx
- Return type:
pd.DataFrame
- rx5day()
Highest precipitation amount in five-day period. Let RRkj be the precipitation amount for the five-day interval k in period j, where k is defined by the last day. The maximum five-day values for period j are RX5dayj = max (RRkj).
- Returns:
idx
- Return type:
pd.DataFrame
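The five-day maximum is a rolling sum followed by a maximum, with each window identified by its last day. A minimal sketch over a plain list of daily values for one period (the function name and list input are assumptions, not the CIndex interface):

```python
def rx5day(rr, window=5):
    """Maximum precipitation total over any `window` consecutive days."""
    if len(rr) < window:
        raise ValueError("series is shorter than the window")
    # k is the index of the last day of each window.
    return max(sum(rr[k - window + 1:k + 1]) for k in range(window - 1, len(rr)))


rr = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 0.0, 0.0]
wettest_five_days = rx5day(rr)
```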
- sdii()
Mean precipitation amount on a wet day. Let RRwj be the daily precipitation amount on wet day w (RR ≥ 1 mm) in period j. If W represents the number of wet days in j then the simple precipitation intensity index is SDIIj = sum (RRwj) / W.
- Returns:
idx
- Return type:
pd.DataFrame
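The simple precipitation intensity index averages precipitation over wet days only. A minimal sketch for one period, assuming a plain list of daily amounts in mm and returning NaN when the period has no wet day (that fallback is an assumption):

```python
def sdii(rr, wet_threshold=1.0):
    """Mean precipitation on wet days (RR >= wet_threshold) in one period."""
    wet = [value for value in rr if value >= wet_threshold]
    return sum(wet) / len(wet) if wet else float("nan")


intensity = sdii([0.0, 0.5, 2.0, 4.0])
```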
- r10mm()
Count of days where RR (daily precipitation amount) ≥ 10 mm. Let RRij be the daily precipitation amount on day i in period j. Count the number of days where RRij ≥ 10 mm.
- Returns:
idx
- Return type:
pd.DataFrame
- r20mm()
Count of days where RR ≥ 20 mm. Let RRij be the daily precipitation amount on day i in period j. Count the number of days where RRij ≥ 20 mm.
- Returns:
idx
- Return type:
pd.DataFrame
- rnnmm(threshold)
Count of days where RR ≥ user-defined threshold in mm. Let RRij be the daily precipitation amount on day i in period j. Count the number of days where RRij ≥ nn mm.
- Parameters:
threshold (float) – Precipitation threshold in mm.
- Returns:
idx
- Return type:
pd.DataFrame
- cdd()
Maximum length of dry spell (RR < 1 mm). Let RRij be the daily precipitation amount on day i in period j. Count the largest number of consecutive days where RRij < 1 mm.
- Returns:
idx
- Return type:
pd.DataFrame
- cwd()
Maximum length of wet spell (RR ≥ 1 mm). Let RRij be the daily precipitation amount on day i in period j. Count the largest number of consecutive days where RRij ≥ 1 mm.
- Returns:
idx
- Return type:
pd.DataFrame
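The dry and wet spell indices (cdd and cwd) both reduce to finding the longest run of consecutive days that satisfy a condition. A minimal sketch over a plain list of daily precipitation amounts for one period; the helper names are hypothetical and this is not the CIndex implementation.

```python
from itertools import groupby


def longest_spell(flags):
    """Length of the longest run of consecutive True values."""
    return max((sum(1 for _ in run) for flag, run in groupby(flags) if flag), default=0)


def cdd(rr):
    """Longest dry spell: consecutive days with RR < 1 mm."""
    return longest_spell(value < 1.0 for value in rr)


def cwd(rr):
    """Longest wet spell: consecutive days with RR >= 1 mm."""
    return longest_spell(value >= 1.0 for value in rr)


rr = [0.0, 0.0, 5.0, 2.0, 0.0, 0.0, 0.0, 3.0]
dry, wet = cdd(rr), cwd(rr)
```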
- r95ptot()
Precipitation due to very wet days (> 95th percentile). Let RRwj be the daily precipitation amount on a wet day w (RR ≥ 1 mm) in period j and let RRwn95 be the 95th percentile of precipitation on wet days in the base period n (1961-1990). Then R95pTOTj = sum (RRwj), where RRwj > RRwn95.
- Returns:
idx
- Return type:
pd.DataFrame
- r99ptot()
Precipitation due to extremely wet days (> 99th percentile). Let RRwj be the daily precipitation amount on a wet day w (RR ≥ 1 mm) in period j and let RRwn99 be the 99th percentile of precipitation on wet days in the base period n (1961-1990). Then R99pTOTj = sum (RRwj), where RRwj > RRwn99.
- Returns:
idx
- Return type:
pd.DataFrame
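The two percentile-based totals (r95ptot and r99ptot) first derive a threshold from the wet days of the base period and then sum the wet-day precipitation that exceeds it. The sketch below illustrates the two steps with plain lists; the nearest-rank percentile definition is an assumption (other percentile conventions give slightly different thresholds), and both function names are hypothetical.

```python
def nearest_rank_percentile(values, percent):
    """Nearest-rank percentile for an integer `percent` (assumed convention)."""
    ordered = sorted(values)
    rank = max(1, (percent * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def rptot(rr, base_rr, percent=95, wet_threshold=1.0):
    """Total wet-day precipitation above the base-period wet-day percentile."""
    base_wet = [value for value in base_rr if value >= wet_threshold]
    threshold = nearest_rank_percentile(base_wet, percent)
    return sum(value for value in rr if value >= wet_threshold and value > threshold)


base_rr = [float(v) for v in range(1, 21)]  # 20 wet days: 1 mm to 20 mm
rr = [0.5, 10.0, 19.0, 20.0, 25.0]
r95 = rptot(rr, base_rr, percent=95)
r99 = rptot(rr, base_rr, percent=99)
```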
- prcptot()
Total precipitation in wet days (≥ 1 mm). Let RRwj be the daily precipitation amount on a wet day w (RR ≥ 1 mm) in period j. Then PRCPTOTj = sum (RRwj).
- Returns:
idx
- Return type:
pd.DataFrame