Code Overview: Modules

analogs.py

synopsis:

This module contains the main classes and functions for time series preprocessing and reconstruction.

The rascal.analogs.Station() class stores station metadata (code, name, altitude, longitude and latitude) and calculates daily time series.

class rascal.analogs.Station(path)

Stores station metadata (code, name, altitude, longitude and latitude) and calculates daily time series.

Parameters:

path (str) – Path of the directory that contains the observations.

path

Path of the directory that contains the observations

Type:

str

meta

DataFrame with the metadata of the station (code, name, altitude, longitude and latitude)

Type:

pd.DataFrame

longitude

Longitude of the station

Type:

float

latitude

Latitude of the station

Type:

float

altitude

Elevation of the station

Type:

float

get_data(variable[, skipna=True])

Get the daily time series of the variable

Parameters:
  • variable (str) – variable name.

  • skipna (bool) – skipna when resampling to daily frequency.

Returns:

data

Return type:

pd.DataFrame
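As a rough illustration of what resampling to daily frequency with skipna involves, the sketch below uses plain pandas rather than the Station class. The to_daily helper, the TMEAN column name and the choice of the daily mean are assumptions made for the example; the aggregation actually applied by get_data() may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical 12-hourly observations with one missing value.
times = pd.to_datetime(
    ["2000-01-01 00:00", "2000-01-01 12:00", "2000-01-02 00:00", "2000-01-02 12:00"]
)
obs = pd.DataFrame({"TMEAN": [1.0, 3.0, np.nan, 5.0]}, index=times)


def to_daily(df, variable, skipna=True):
    """Resample one column to daily means (illustrative helper, not RASCAL's API)."""
    daily = df[variable].resample("D").apply(lambda day: day.mean(skipna=skipna))
    return daily.to_frame()


daily_skip = to_daily(obs, "TMEAN", skipna=True)   # second day uses the valid value only
daily_keep = to_daily(obs, "TMEAN", skipna=False)  # second day becomes NaN
```

With skipna=False, a single missing observation makes the whole day missing, which is the stricter choice when data gaps should not be hidden.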

get_gridpoint(grid_latitudes, grid_longitudes)

The rascal.analogs.Predictor() class stores the predictor data and Principal Component Analysis results:

class rascal.analogs.Predictor(paths, grouping, lat_min, lat_max, lon_min, lon_max[, mosaic=True][, number=None])

Predictor class. This contains data about the predictor variable to use for the reconstruction.

Parameters:
  • paths (list[str]) – Paths of the GRIB files to open.

  • grouping (str or None) –

    Method of grouping the data, str format = “frequency_method”

    • frequency=(“hourly”, “daily”, “monthly”, “yearly”)

    • method=(“mean”, “max”, “min”, “sum”)

  • lat_min (float) – Predictor field minimum latitude

  • lat_max (float) – Predictor field maximum latitude

  • lon_min (float) – Predictor field minimum longitude

  • lon_max (float) – Predictor field maximum longitude

  • mosaic (bool or None) – if True apply .to_mosaic() method

  • number (int or None) – Ensemble member number
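The grouping string combines a frequency and a method separated by an underscore. A minimal sketch of how such a string could be split and validated is shown here; split_grouping is a hypothetical helper written for this example and is not part of RASCAL.

```python
# Accepted values, as listed in the parameter description above.
FREQUENCIES = ("hourly", "daily", "monthly", "yearly")
METHODS = ("mean", "max", "min", "sum")


def split_grouping(grouping):
    """Split a "frequency_method" string and check both parts (hypothetical helper)."""
    frequency, method = grouping.split("_")
    if frequency not in FREQUENCIES:
        raise ValueError(f"unknown frequency: {frequency}")
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    return frequency, method


frequency, method = split_grouping("daily_mean")
```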

data
Type:

xr.Dataset

crop(lat_min, lat_max, lon_min, lon_max)

Crop the spatial domain of the predictor data

Parameters:
  • lat_min (float) – New minimum latitude

  • lat_max (float) – New maximum latitude

  • lon_min (float) – New minimum longitude

  • lon_max (float) – New maximum longitude

to_mosaic()

To use various simultaneous predictors or a vectorial variable, concatenate the variables along the longitude axis to obtain a single compound variable, easier to use when performing PCA.

Returns:

compound_predictor

Return type:

xr.Dataset
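The idea of concatenating variables along the longitude axis can be sketched with NumPy arrays. The array names and shapes are made up for the example, and the sketch says nothing about how to_mosaic() handles coordinates internally.

```python
import numpy as np

# Two hypothetical predictor fields with dims (time, latitude, longitude).
u_wind = np.zeros((3, 2, 4))
v_wind = np.ones((3, 2, 4))

# Stack the fields side by side along the longitude axis (axis=2),
# producing one compound field that a single PCA can process.
mosaic = np.concatenate([u_wind, v_wind], axis=2)
```

The compound field keeps the time and latitude sizes and has twice as many longitude points, here (3, 2, 8).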

module()

Get the modulus (magnitude) of the predictor variables, treating them as the components of a vector.

Returns:

self

Return type:

Predictor
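For two components, the vector magnitude is the square root of the sum of their squares. The short sketch below shows this with NumPy; the u and v arrays are hypothetical wind components, not RASCAL data.

```python
import numpy as np

# Hypothetical zonal (u) and meridional (v) components at two grid points.
u = np.array([3.0, 0.0])
v = np.array([4.0, -2.0])

# Magnitude of the vector (u, v) at every point: sqrt(u**2 + v**2).
magnitude = np.hypot(u, v)
```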

anomalies([seasons][, standardize][, mean_period])

Calculate seasonal anomalies of the field. The definition of season is flexible, being only a list of months contained within it.

Parameters:
  • seasons (list[list[int]] or None) – Months of the season. Default = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

  • standardize (bool or None) – Standardize anomalies. Default = True

  • mean_period (list[pd.DatetimeIndex] or None) – Dates to use as mean climatology to calculate the anomalies.

Returns:

anomalies (dims = [time, latitude, longitude, season])

Return type:

xr.Dataset
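The seasonal anomaly calculation can be illustrated on a single time series with pandas. This is a simplified sketch: seasonal_anomalies is a hypothetical helper, it works on one series instead of a gridded dataset, and it uses the whole series as the climatology (no mean_period).

```python
import numpy as np
import pandas as pd


def seasonal_anomalies(series, seasons, standardize=True):
    """Anomalies of a daily series for each season (a list of months)."""
    anomalies = {}
    for months in seasons:
        in_season = series[series.index.month.isin(months)]
        anomaly = in_season - in_season.mean()       # remove the seasonal mean
        if standardize:
            anomaly = anomaly / in_season.std()      # scale to unit variance
        anomalies[tuple(months)] = anomaly
    return anomalies


dates = pd.date_range("2000-01-01", "2000-12-31", freq="D")
field = pd.Series(np.arange(len(dates), dtype=float), index=dates)
result = seasonal_anomalies(field, seasons=[[12, 1, 2], [6, 7, 8]])
```

Because a season is only a list of months, non-standard seasons such as an extended winter can be defined in the same way.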

pcs(path, npcs[, seasons][, standardize][, pcscaling][, overwrite][, training][, project])

Perform Principal Component Analysis. To save computation time, the PCA object can be saved as a pickle, so the analysis does not have to be performed every time.

Parameters:
  • path (str) – Path to save the PCA results

  • npcs (int) – Number of components.

  • seasons (list[list[int]] or None) – List of list of months of every season.

  • standardize (bool or None) – If True, the anomalies used in the PCA are standardized.

  • pcscaling (int or None) –

    Set the scaling of the PCs used to compute covariance. The following values are accepted:

    • 0 : Un-scaled PCs.

    • 1 : PCs are scaled to unit variance (divided by the square-root of their eigenvalue) (default).

    • 2 : PCs are multiplied by the square-root of their eigenvalue.

  • overwrite (bool or None) – Default = False. If True recalculate the PCA and overwrite the pickle with the PCA

  • training (list[pd.DatetimeIndex] or None) – Dates to use for calculating the PCA

  • project (xr.Dataset or None) – Data to project onto the calculated PCA results.
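The effect of the three pcscaling options can be seen in a small NumPy sketch that computes the PCA through a singular value decomposition. This is an illustration under stated assumptions, not RASCAL's implementation: pca_sketch is a hypothetical function, the input is a plain (time, space) matrix of anomalies, and no pickling or projection is done.

```python
import numpy as np


def pca_sketch(anomalies, npcs, pcscaling=1):
    """PCA of a (time, space) anomaly matrix via SVD (illustrative only)."""
    n_times = anomalies.shape[0]
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eigenvalues = s[:npcs] ** 2 / (n_times - 1)  # variance of each component
    pcs = u[:, :npcs] * s[:npcs]                 # pcscaling=0: un-scaled PCs
    if pcscaling == 1:
        pcs = pcs / np.sqrt(eigenvalues)         # scaled to unit variance
    elif pcscaling == 2:
        pcs = pcs * np.sqrt(eigenvalues)         # multiplied by sqrt(eigenvalue)
    return pcs, vt[:npcs], eigenvalues


rng = np.random.default_rng(0)
field = rng.normal(size=(50, 6))
field -= field.mean(axis=0)  # remove the time mean at every grid point
pcs, eofs, eigenvalues = pca_sketch(field, npcs=3, pcscaling=1)
```

With pcscaling=1 every PC has unit variance, so a Euclidean distance in the PCs space weights all retained components equally.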

The rascal.analogs.Analogs() class gets the pool of analog days and reconstructs the time series:

class rascal.analogs.Analogs(pcs, dates, observations)

Analogs class. Gets the pool of analog days and reconstructs the time series.

get_pool(size[, vw_size][, vw_type][, distance])

Get the pool of size closest neighbors to each day

Parameters:
  • size (int) – Number of neighbors in the pool.

  • vw_size (int or None) – Validation window size. Number of data points around each point that are ignored when validating the reconstruction.

  • vw_type (str or None) –

    Type of validation window. Options:

    • forward: The original date is the last date of the window.

    • backward: The original date is the first date of the window.

    • centered: The original date is in the center of the window.

  • distance (str or None) –

    Metric to determine the distance between points in the PCs space. Options:

    • euclidean

    • mahalanobis (Wishlist)

Returns:

(analog_dates, analog_distances), the dates of the analogs in the pool for each day and the distance of each analog in the PCs space.

Return type:

(pd.DataFrame, pd.DataFrame)
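The pool search can be pictured as a nearest-neighbor query in the PCs space that skips the days inside the validation window. The sketch below is a simplified NumPy version under explicit assumptions: analog_pool is a hypothetical function, days are identified by their integer position, only the Euclidean metric and a centered window are covered, and vw_size is read as the total window width.

```python
import numpy as np


def analog_pool(pcs, size, vw_size=0):
    """Indices and distances of the `size` closest days for every day."""
    n_days = pcs.shape[0]
    # Pairwise Euclidean distances between all days in the PCs space.
    diff = pcs[:, None, :] - pcs[None, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=-1))
    # Centered validation window: ignore the day itself and its neighbors.
    days = np.arange(n_days)
    inside_window = np.abs(days[:, None] - days[None, :]) <= vw_size // 2
    distances[inside_window] = np.inf
    # Keep the closest candidates, sorted from nearest to farthest.
    order = np.argsort(distances, axis=1)[:, :size]
    return order, np.take_along_axis(distances, order, axis=1)


pcs = np.array([[0.0], [1.0], [1.1], [5.0]])
analog_index, analog_distance = analog_pool(pcs, size=2)
```

Excluding the window prevents a day from being reconstructed from itself or from its strongly correlated neighbors, which would inflate the validation scores.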

reconstruct([pool_size][, method][, sample_size][, mapping_variable][, vw_size][, vw_type][, distance])

Reconstruct a time series using the analog pool for each day.

Parameters:
  • pool_size (int or None) – Size of the analog pool for each day.

  • method (str or None) –

    Similarity method to select the best analog of the pool. Options are:

    • ’closest’: (Selected by default) Select the closest analog in the PCs space

    • ’average’: Calculate the weighted average of the ‘sample_size’ closest analogs in the PCs space.

    • ’quantilemap’: Select the analog that represents the same quantile in the observations pool as the value of another mapping variable.

  • sample_size (int or None) – Number of analogs to average in the ‘average’ method

  • mapping_variable (Predictor or None) – Time series of a variable to use as mapping in ‘quantilemap’

  • vw_size (int or None) – Validation window size. Number of data points around each point that are ignored when validating the reconstruction.

  • vw_type (str or None) –

    Type of validation window. Options:

    • forward: The original date is the last date of the window.

    • backward: The original date is the first date of the window.

    • centered: The original date is in the center of the window.

  • distance (str or None) –

    Metric to determine the distance between points in the PCs space. Options:

    • euclidean

    • mahalanobis (Wishlist)

Returns:

reconstruction

Return type:

pd.DataFrame
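The difference between the 'closest' and 'average' methods can be shown for a single day. In this sketch reconstruct_day is a hypothetical function, and the inverse-distance weights are an assumption: the documentation only says that the average is weighted, not how. The 'quantilemap' method is not covered.

```python
import numpy as np


def reconstruct_day(analog_values, analog_distances, method="closest", sample_size=2):
    """Reconstruct one day from the observed values of its analog pool."""
    order = np.argsort(analog_distances)
    if method == "closest":
        # Use the observation of the nearest analog as it is.
        return float(analog_values[order[0]])
    if method == "average":
        # Average the nearest analogs, weighting by inverse distance
        # (assumed weighting; distances must be strictly positive).
        nearest = order[:sample_size]
        weights = 1.0 / analog_distances[nearest]
        return float(np.average(analog_values[nearest], weights=weights))
    raise ValueError(f"unknown method: {method}")


values = np.array([10.0, 20.0, 30.0])     # observations on the analog days
distances = np.array([2.0, 1.0, 4.0])     # distances in the PCs space
closest = reconstruct_day(values, distances, method="closest")
average = reconstruct_day(values, distances, method="average", sample_size=2)
```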

analysis.py

synopsis:

This module contains the main classes and functions to analyze the skill of the reconstructions and validate them.

You can use the rascal.analysis.RSkill() class to validate and analyze the skill of the reconstructions:

class rascal.analysis.RSkill([observations][, reconstructions][, reanalysis][, data])

RSkill class. Validates and analyzes the skill of the reconstructions.

Parameters:
  • observations (pd.DataFrame or None) – Observations time series

  • reconstructions (pd.DataFrame or None) – Reconstructions time series

  • reanalysis (pd.DataFrame or None) – Reanalysis time series

  • data (pd.DataFrame or None) – All data joined (observations, reconstructions, reanalysis)

observations

Observations time series

Type:

pd.DataFrame

reconstructions

Reconstructions time series

Type:

pd.DataFrame

reanalysis

Reanalysis time series

Type:

pd.DataFrame

data

All data joined (observations, reconstructions, reanalysis) concatenated in the columns axis

Type:

pd.DataFrame

resample(freq, grouping[, hydroyear][, skipna])

Resample the dataset containing observations, reconstructions and reanalysis data.

Parameters:
  • freq (str) – New sampling frequency.

  • grouping (str) – Options=”mean”, “median” or “sum”

  • hydroyear (bool or None) – Default=False. If True, when the resampling frequency is “1Y” it takes hydrological years (from October to September) instead of calendar years

  • skipna (bool or None) – Default=False. If True ignore NaNs.

Returns:

RSkill with resampled data

Return type:

RSkill
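Grouping by hydrological year can be sketched in pandas by shifting October to December into the following year. The hydrological_year helper is hypothetical, and labeling each hydrological year by the calendar year in which it ends is an assumption of this example.

```python
import pandas as pd


def hydrological_year(index):
    """Label each date with the hydrological year (October to September) it belongs to."""
    # October, November and December count towards the year that ends next September.
    return index.year + (index.month >= 10).astype(int)


dates = pd.to_datetime(["2000-09-30", "2000-10-01", "2001-09-30", "2001-10-01"])
precipitation = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)
annual = precipitation.groupby(hydrological_year(precipitation.index)).sum()
```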

plotseries([color][, start][, end][, methods])

Plot the time series of the reconstructions with the reanalysis and observations series

Parameters:
  • color (dict or None) – dict of which color to use (values) with each dataset (keys)

  • start (Datetime or None) – Start date of the plot

  • end (Datetime or None) – End date of the plot

  • methods (list[str] or None) – Reconstruction methods to plot

skill([reference=None][, threshold=None])

Generate a pd.DataFrame with the table of skills of various simulations. The skill metrics are:

  • Mean Bias Error (bias)

  • Root Mean Squared Error (rmse)

  • Correlation Coefficient (r2)

  • Standard Deviation (std)

  • MSE-based Skill Score (ssmse)

  • Heidke Skill Score (hss)

  • Brier Score (bs)

Parameters:
  • reference (pd.DataFrame or None) – Time series of a reference model to compare against when calculating SSMSE and HSS.

  • threshold (float or None) – Threshold to use when computing the HSS and BS.

Returns:

(observation_std, skill_table), standard deviation of the observations and table of each skill score for each simulation.

Return type:

(float, pd.DataFrame)
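Three of the metrics in the table are simple enough to write out. The sketch below uses their textbook definitions with NumPy, with the MSE-based skill score taken as 1 - MSE / MSE_reference; the function names are hypothetical and the exact formulas used inside skill() are not confirmed by this page.

```python
import numpy as np


def bias(simulation, observation):
    """Mean Bias Error: average of simulation minus observation."""
    return float(np.mean(simulation - observation))


def rmse(simulation, observation):
    """Root Mean Squared Error."""
    return float(np.sqrt(np.mean((simulation - observation) ** 2)))


def ssmse(simulation, observation, reference):
    """MSE-based skill score relative to a reference model (1 is perfect, 0 is no gain)."""
    mse = np.mean((simulation - observation) ** 2)
    mse_reference = np.mean((reference - observation) ** 2)
    return float(1.0 - mse / mse_reference)


observation = np.array([1.0, 2.0, 3.0])
simulation = np.array([2.0, 3.0, 4.0])
reference = np.array([3.0, 4.0, 5.0])
scores = (
    bias(simulation, observation),
    rmse(simulation, observation),
    ssmse(simulation, observation, reference),
)
```

A positive skill score means the reconstruction has a lower mean squared error than the reference, for example the reanalysis.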

taylor()

Calls .skill() method and computes the Taylor diagram

Returns:

fig, ax

annual_cycle([grouping][, color])

Plot the annual cycle of the reconstructions, reanalysis and observations

Parameters:
  • grouping (str or None) – (Default=”mean”) Monthly grouping to plot in the cycle. Options=(“sum”, “mean”, “median”, “std”)

  • color (dict or None) – dict of which color to use (values) with each dataset (keys)

qqplot()

Quantile-Quantile plot

indices.py

synopsis:

This module contains the main classes and functions to calculate relevant climatic indices.

You can use the rascal.indices.CIndex() class to retrieve relevant climatic indices based on: Data, C. (2009). Guidelines on analysis of extremes in a changing climate in support of informed decisions for adaptation. World Meteorological Organization.

class rascal.indices.CIndex(df)
Parameters:

df (pd.DataFrame) – Time series containing the relevant variables for the index calculation.

fd()

Count of days where TN (daily minimum temperature) < 0°C. Let TNij be the daily minimum temperature on day i in period j. Count the number of days where TNij < 0°C.

Returns:

idx

Return type:

pd.DataFrame
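Threshold-count indices such as this one reduce to a comparison followed by a count per period. The sketch below computes frost days with pandas, taking the calendar year as the period j; the frost_days function and the sample data are hypothetical and do not use the CIndex class.

```python
import pandas as pd


def frost_days(tmin):
    """Number of days per calendar year with daily minimum temperature below 0 °C."""
    return (tmin < 0).groupby(tmin.index.year).sum()


dates = pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03", "2001-01-01"])
tmin = pd.Series([-1.0, 0.0, -2.0, 3.0], index=dates)
fd = frost_days(tmin)
```

The same pattern covers the other fixed-threshold counts by changing the variable, the comparison and the threshold.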

su()

Count of days where TX (daily maximum temperature) > 25°C. Let TXij be the daily maximum temperature on day i in period j. Count the number of days where TXij > 25°C.

Returns:

idx

Return type:

pd.DataFrame

id()

Count of days where TX < 0°C. Let TXij be the daily maximum temperature on day i in period j. Count the number of days where TXij < 0°C.

Returns:

idx

Return type:

pd.DataFrame

tr()

Count of days where TN > 20°C. Let TNij be the daily minimum temperature on day i in period j. Count the number of days where TNij > 20°C.

Returns:

idx

Return type:

pd.DataFrame

gsl()

Annual count of days between first span of at least six days where TG (daily mean temperature) > 5°C and first span in second half of the year of at least six days where TG < 5°C. Let TGij be the daily mean temperature on day i in period j. Count the annual (1 Jan to 31 Dec in Northern Hemisphere, 1 July to 30 June in Southern Hemisphere) number of days between the first occurrence of at least six consecutive days where TGij > 5°C and the first occurrence after 1 July (1 Jan in Southern Hemisphere) of at least six consecutive days where TGij < 5°C.

Returns:

idx

Return type:

pd.DataFrame

txx()

Monthly maximum value of daily maximum temperature: Let TXik be the daily maximum temperature on day i in month k. The maximum daily maximum temperature is then TXx = max (TXik).

Returns:

idx

Return type:

pd.DataFrame

tnx()

Monthly maximum value of daily minimum temperature: Let TNik be the daily minimum temperature on day i in month k. The maximum daily minimum temperature is then TNx = max (TNik).

Returns:

idx

Return type:

pd.DataFrame

txn()

Monthly minimum value of daily maximum temperature: Let TXik be the daily maximum temperature on day i in month k. The minimum daily maximum temperature is then TXn = min (TXik)

Returns:

idx

Return type:

pd.DataFrame

tnn()

Monthly minimum value of daily minimum temperature: Let TNik be the daily minimum temperature on day i in month k. The minimum daily minimum temperature is then TNn = min (TNik)

Returns:

idx

Return type:

pd.DataFrame
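The four monthly extreme indices above share one pattern: group the daily series by year and month, then take the maximum or the minimum. A minimal pandas sketch follows, with made-up daily maximum temperatures; it does not reproduce the output format of the CIndex methods.

```python
import pandas as pd

# Hypothetical daily maximum temperatures for two months.
dates = pd.to_datetime(["2000-01-01", "2000-01-02", "2000-02-01", "2000-02-02"])
tx = pd.Series([5.0, 9.0, 12.0, 7.0], index=dates)

# One group per (year, month) pair.
by_month = tx.groupby([tx.index.year, tx.index.month])
txx = by_month.max()  # monthly maximum of daily maximum temperature
txn = by_month.min()  # monthly minimum of daily maximum temperature
```

Replacing tx with a series of daily minimum temperatures gives TNx and TNn in the same way.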

tn10p()

Count of days where TN < 10th percentile. Let TNij be the daily minimum temperature on day i in period j and let TNin10 be the calendar day 10th percentile of daily minimum temperature calculated for a five-day window centred on each calendar day in the base period n (1961-1990). Count the number of days where TNij < TNin10.

Returns:

idx

Return type:

pd.DataFrame

tx10p()

Count of days where TX < 10th percentile. Let TXij be the daily maximum temperature on day i in period j and let TXin10 be the calendar day 10th percentile of daily maximum temperature calculated for a five-day window centred on each calendar day in the base period n (1961-1990). Count the number of days where TXij < TXin10.

Returns:

idx

Return type:

pd.DataFrame

tn90p()

Count of days where TN > 90th percentile. Let TNij be the daily minimum temperature on day i in period j and let TNin90 be the calendar day 90th percentile of daily minimum temperature calculated for a five-day window centred on each calendar day in the base period n (1961-1990). Count the number of days where TNij > TNin90.

Returns:

idx

Return type:

pd.DataFrame

tx90p()

Count of days where TX > 90th percentile. Let TXij be the daily maximum temperature on day i in period j and let TXin90 be the calendar day 90th percentile of daily maximum temperature calculated for a five-day window centred on each calendar day in the base period n (1961-1990). Count the number of days where TXij > TXin90.

Returns:

idx

Return type:

pd.DataFrame

wsdi()

Count of days in a span of at least six days where TX > 90th percentile. Let TXij be the daily maximum temperature on day i in period j and let TXin90 be the calendar day 90th percentile of daily maximum temperature calculated for a five-day window centred on each calendar day in the base period n (1961-1990). Count the number of days where, in intervals of at least six consecutive days, TXij > TXin90.

Returns:

idx

Return type:

pd.DataFrame

csdi()

Count of days in a span of at least six days where TN < 10th percentile. Let TNij be the daily minimum temperature on day i in period j and let TNin10 be the calendar day 10th percentile of daily minimum temperature calculated for a five-day window centred on each calendar day in the base period n (1961-1990). Count the number of days where, in intervals of at least six consecutive days, TNij < TNin10.

Returns:

idx

Return type:

pd.DataFrame
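Both spell-duration indices count only the days that belong to runs of at least six consecutive qualifying days. The run-counting step can be sketched in plain Python as follows; days_in_spells is a hypothetical helper that takes a sequence of booleans (for example TXij > TXin90) for one period, and the percentile calculation itself is left out.

```python
def days_in_spells(flags, min_length=6):
    """Count the days that fall inside runs of at least `min_length` True values."""
    total = 0
    run = 0
    for flag in flags:
        if flag:
            run += 1
        else:
            if run >= min_length:
                total += run  # the finished run was long enough to count
            run = 0
    if run >= min_length:     # a run that reaches the end of the period
        total += run
    return total


# A six-day spell, a three-day spell (too short) and a seven-day spell.
exceeds = [True] * 6 + [False] + [True] * 3 + [False] + [True] * 7
spell_days = days_in_spells(exceeds)
```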

dtr()

Mean difference between TX and TN (°C). Let TXij and TNij be the daily maximum and minimum temperature on day i in period j. If I represents the total number of days in j, then the mean diurnal temperature range in period j is DTRj = sum (TXij - TNij) / I.

Returns:

idx

Return type:

pd.DataFrame

rx1day()

Highest precipitation amount in one-day period. Let RRij be the daily precipitation amount on day i in period j. The maximum one-day value for period j is RX1dayj = max (RRij).

Returns:

idx

Return type:

pd.DataFrame

rx5day()

Highest precipitation amount in five-day period. Let RRkj be the precipitation amount for the five-day interval k in period j, where k is defined by the last day. The maximum five-day values for period j are RX5dayj = max (RRkj).

Returns:

idx

Return type:

pd.DataFrame
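Because each five-day interval is defined by its last day, a trailing rolling sum matches the definition directly. The pandas sketch below assumes a gap-free daily series and takes the calendar year as the period j; rx5day_sketch is a hypothetical function, not the CIndex method.

```python
import pandas as pd


def rx5day_sketch(rr):
    """Highest five-day precipitation total per calendar year."""
    # rolling(5) labels each window by its last day, as in the definition.
    five_day_totals = rr.rolling(5).sum()
    return five_day_totals.groupby(rr.index.year).max()


dates = pd.date_range("2000-01-01", periods=10, freq="D")
rr = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 0.0, 0.0, 0.0, 0.0], index=dates)
rx5 = rx5day_sketch(rr)
```

The first four days have no complete window and are ignored by the maximum.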

sdii()

Mean precipitation amount on a wet day. Let RRwj be the daily precipitation amount on wet day w (RR ≥ 1 mm) in period j. If W represents the number of wet days in j, then the simple precipitation intensity index is SDIIj = sum (RRwj) / W.

Returns:

idx

Return type:

pd.DataFrame

r10mm()

Count of days where RR (daily precipitation amount) ≥ 10 mm. Let RRij be the daily precipitation amount on day i in period j. Count the number of days where RRij ≥ 10 mm.

Returns:

idx

Return type:

pd.DataFrame

r20mm()

Count of days where RR ≥ 20 mm. Let RRij be the daily precipitation amount on day i in period j. Count the number of days where RRij ≥ 20 mm.

Returns:

idx

Return type:

pd.DataFrame

rnnmm(threshold)
Parameters:

threshold (float) – Precipitation threshold

Count of days where RR ≥ user-defined threshold in mm. Let RRij be the daily precipitation amount on day i in period j. Count the number of days where RRij ≥ nn mm.

Returns:

idx

Return type:

pd.DataFrame

cdd()

Maximum length of dry spell (RR < 1 mm). Let RRij be the daily precipitation amount on day i in period j. Count the largest number of consecutive days where RRij < 1 mm.

Returns:

idx

Return type:

pd.DataFrame

cwd()

Maximum length of wet spell (RR ≥ 1 mm). Let RRij be the daily precipitation amount on day i in period j. Count the largest number of consecutive days where RRij ≥ 1 mm.

Returns:

idx

Return type:

pd.DataFrame
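The dry-spell and wet-spell indices both ask for the longest run of consecutive days that satisfy a condition. A plain Python sketch of that step for a single period is given here; longest_spell and the sample precipitation values are hypothetical.

```python
def longest_spell(flags):
    """Length of the longest run of consecutive True values."""
    longest = 0
    run = 0
    for flag in flags:
        run = run + 1 if flag else 0  # extend the run or start over
        longest = max(longest, run)
    return longest


# Hypothetical daily precipitation amounts (mm) for one period.
rr = [0.0, 0.5, 3.0, 0.0, 0.2, 0.9, 12.0]
cdd = longest_spell(amount < 1.0 for amount in rr)   # dry days: RR < 1 mm
cwd = longest_spell(amount >= 1.0 for amount in rr)  # wet days: RR >= 1 mm
```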

r95ptot()

Precipitation due to very wet days (> 95th percentile). Let RRwj be the daily precipitation amount on a wet day w (RR ≥ 1 mm) in period j and let RRwn95 be the 95th percentile of precipitation on wet days in the base period n (1961-1990). Then R95pTOTj = sum (RRwj), where RRwj > RRwn95.

Returns:

idx

Return type:

pd.DataFrame

r99ptot()

Precipitation due to extremely wet days (> 99th percentile). Let RRwj be the daily precipitation amount on a wet day w (RR ≥ 1 mm) in period j and let RRwn99 be the 99th percentile of precipitation on wet days in the base period n (1961-1990). Then R99pTOTj = sum (RRwj), where RRwj > RRwn99.

Returns:

idx

Return type:

pd.DataFrame
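The two percentile-based totals differ only in the percentile used. The NumPy sketch below follows the definition for one period: the threshold comes from the wet days of the base period and the sum runs over the wet days of the target period that exceed it. The function name, the sample data and the default linear interpolation of np.percentile are assumptions of the example.

```python
import numpy as np


def wet_day_percentile_total(rr_period, rr_base, percentile=95, wet_threshold=1.0):
    """Total precipitation on wet days above a base-period wet-day percentile."""
    wet_base = rr_base[rr_base >= wet_threshold]
    limit = np.percentile(wet_base, percentile)      # RRwn95 or RRwn99
    wet_period = rr_period[rr_period >= wet_threshold]
    return float(wet_period[wet_period > limit].sum())


rr_base = np.arange(1.0, 101.0)                      # hypothetical base period, all wet days
rr_period = np.array([0.5, 50.0, 96.0, 100.0])       # hypothetical target period
r95ptot = wet_day_percentile_total(rr_period, rr_base, percentile=95)
r99ptot = wet_day_percentile_total(rr_period, rr_base, percentile=99)
```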

prcptot()

Total precipitation in wet days (RR ≥ 1 mm). Let RRwj be the daily precipitation amount on a wet day w (RR ≥ 1 mm) in period j. Then PRCPTOTj = sum (RRwj).

Returns:

idx

Return type:

pd.DataFrame