ise.utils package
Submodules
ise.utils.functions module
- ise.utils.functions.add_variable_to_nc(source_file_path, target_file_path, variable_name)[source]
Copies a variable from a source NetCDF file to a target NetCDF file.
- Parameters:
source_file_path - Path to the source NetCDF file.
target_file_path - Path to the target NetCDF file.
variable_name - Name of the variable to be copied.
Both files are assumed to have matching dimensions for the variable.
- ise.utils.functions.calculate_distribution_metrics(dataset: DataFrame, column: str | None = None, condition: str | None = None)[source]
Wrapper for calculating distribution metrics from a dataset. Calls ise.utils.data.group_by_run to group the true values and predicted values into N×M matrices (N = number of samples, M = 85, the number of years in the series). It then uses ise.utils.data.create_distribution to build a distribution from each array and calculates the divergences between them.
- Parameters:
dataset (pd.DataFrame) - Dataset to be grouped
column (str, optional) - Column to subset on. Defaults to None.
condition (str, optional) - Condition to subset with. Can be int, str, float, etc. Defaults to None.
- Returns:
Dictionary with key 'kl' for the KL divergence and key 'js' for the Jensen-Shannon divergence.
- Return type:
dict
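As a rough illustration of the two divergences named above, the sketch below computes the KL and Jensen-Shannon divergence between two discrete distributions in plain Python. The helper names (kl_divergence, js_divergence), the eps smoothing term, and the toy inputs are assumptions made for this example; they are not taken from the ise source, which may bin and normalise its distributions differently.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    # eps guards against log(0) and division by zero (an assumption of this sketch).
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q against their mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy distributions standing in for the "true" and "predicted" distributions.
true_dist = [0.1, 0.4, 0.5]
pred_dist = [0.2, 0.3, 0.5]

metrics = {
    "kl": kl_divergence(true_dist, pred_dist),
    "js": js_divergence(true_dist, pred_dist),
}
```

Unlike KL, the Jensen-Shannon divergence is symmetric in its arguments and bounded above by log(2), which makes it easier to compare across subsets.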
- ise.utils.functions.check_input(input: str, options: List[str], argname: str | None = None)[source]
Checks the validity of an input argument. Rarely used, since raising errors directly is generally better practice.
- Parameters:
input (str) - Input value.
options (List[str]) - Valid options for the input value.
argname (str, optional) - Name of the argument being tested. Defaults to None.
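A minimal sketch of this kind of validation is shown below. The function name check_input_sketch and the choice to raise ValueError are assumptions for illustration; the documentation above does not state how ise.utils.functions.check_input reports an invalid value.

```python
from typing import List, Optional


def check_input_sketch(value: str, options: List[str], argname: Optional[str] = None) -> None:
    """Raise ValueError if value is not one of the allowed options."""
    if value not in options:
        # Fall back to a generic label when no argument name is supplied.
        name = argname if argname is not None else "input"
        raise ValueError(f"{name} must be one of {options}, received {value!r}")


# A valid value passes silently.
check_input_sketch("csv", ["csv", "nc"], argname="filetype")
```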
- ise.utils.functions.combine_testing_results(data_directory: str, preds: ndarray, sd: dict | None = None, gp_data: dict | None = None, time_series: bool = True, save_directory: str | None = None)[source]
Creates testing results dataframe that reverts input data to original formatting and adds on predictions, losses, and uncertainty bounds. Useful for plotting purposes and overall analysis.
- Parameters:
data_directory (str) - Directory containing training and testing data.
preds (np.ndarray | pd.Series | str) - Array/Series of neural network predictions, or the path to the csv containing predictions.
sd (dict | pd.DataFrame, optional) - Dictionary or pd.DataFrame of uncertainty bounds to be added to the dataframe, generally output by ise.models.testing.pretrained.test_pretrained_model. Defaults to None.
gp_data (dict | pd.DataFrame, optional) - Dictionary or pd.DataFrame containing Gaussian process predictions to add to the dataset. Columns/keys must be preds and std. Defaults to None.
time_series (bool, optional) - Flag denoting whether to process the data as a time-series dataset or traditional non-time dataset. Defaults to True.
save_directory (str, optional) - Directory where output files will be saved. Defaults to None.
- Returns:
Test results dataframe.
- Return type:
pd.DataFrame
- ise.utils.functions.create_distribution(dataset: ndarray, min_range=-30, max_range=20, step=0.01)[source]
- ise.utils.functions.get_X_y(data, dataset_type='sectors', return_format=None, cols='all', with_chars=True)[source]
- ise.utils.functions.get_all_filepaths(path: str, filetype: str | None = None, contains: str | None = None, not_contains: str | None = None)[source]
Retrieves all filepaths for files within a directory. Supports subsetting based on filetype and substring search.
- Parameters:
path (str) - Path to directory to be searched.
filetype (str, optional) - File type to be returned (e.g. csv, nc). Defaults to None.
contains (str, optional) - Substring that files found must contain. Defaults to None.
not_contains (str, optional) - Substring that files found must NOT contain. Defaults to None.
- Returns:
List of filepaths within the directory matching the input criteria.
- Return type:
List[str]
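The filtering described above can be sketched with os.walk. The function name list_filepaths is hypothetical, and two behaviours are assumptions of this sketch rather than documented facts: subdirectories are searched recursively, and the contains / not_contains substrings are matched against the file name only, not the full path.

```python
import os
from typing import List, Optional


def list_filepaths(
    path: str,
    filetype: Optional[str] = None,
    contains: Optional[str] = None,
    not_contains: Optional[str] = None,
) -> List[str]:
    """Collect file paths under path, filtered by extension and substrings."""
    matches = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            # Accept both "csv" and ".csv" as the filetype argument.
            if filetype is not None and not name.endswith("." + filetype.lstrip(".")):
                continue
            if contains is not None and contains not in name:
                continue
            if not_contains is not None and not_contains in name:
                continue
            matches.append(os.path.join(root, name))
    return sorted(matches)
```

For example, list_filepaths(directory, filetype="csv", not_contains="test") would return every CSV file under the directory whose name does not include "test".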
- ise.utils.functions.get_uncertainty_bands(data: DataFrame, confidence: str = '95', quantiles: List[float] = [0.05, 0.95])[source]
Calculates uncertainty bands from Monte Carlo dropout predictions. Includes a traditional confidence interval calculation as well as a quantile-based approach.
- Parameters:
data (pd.DataFrame) - Dataframe or array of shape N×M, typically from ise.utils.functions.group_by_run.
confidence (str, optional) - Confidence level, must be in [95, 99]. Defaults to '95'.
quantiles (list[float], optional) - Quantiles of uncertainty bands. Defaults to [0.05, 0.95].
- Returns:
Tuple containing [mean, sd, upper_ci, lower_ci, upper_q, lower_q]: the mean prediction, the standard deviation, the upper and lower confidence interval bounds, and the upper and lower quantile bands.
- Return type:
tuple
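The two kinds of band can be sketched in plain Python as follows. Everything here is an assumption for illustration: the function names, the layout (rows are repeated stochastic predictions, columns are time steps), the use of mean ± z·sd for the confidence band, and linear interpolation for the quantiles. The actual ise implementation may differ on any of these points.

```python
import statistics

# Two-sided normal z-scores for the supported confidence levels.
Z_SCORES = {"95": 1.96, "99": 2.576}


def quantile(sorted_vals, q):
    """Quantile of an already sorted list, using linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def uncertainty_bands(samples, confidence="95", quantiles=(0.05, 0.95)):
    """Per-column mean, sd, confidence band and quantile band of an N x M list of lists."""
    z = Z_SCORES[confidence]
    columns = list(zip(*samples))  # one tuple of draws per time step
    mean = [statistics.fmean(c) for c in columns]
    sd = [statistics.stdev(c) for c in columns]
    upper_ci = [m + z * s for m, s in zip(mean, sd)]
    lower_ci = [m - z * s for m, s in zip(mean, sd)]
    lower_q = [quantile(sorted(c), quantiles[0]) for c in columns]
    upper_q = [quantile(sorted(c), quantiles[1]) for c in columns]
    return mean, sd, upper_ci, lower_ci, upper_q, lower_q


# Three draws for each of two time steps.
samples = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
mean, sd, upper_ci, lower_ci, upper_q, lower_q = uncertainty_bands(samples)
```

The confidence band assumes roughly normal predictions, whereas the quantile band makes no distributional assumption, so the two can diverge when the draws are skewed.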
- ise.utils.functions.group_by_run(dataset: DataFrame, column: str | None = None, condition: str | None = None)[source]
Groups the dataset into individual simulation series, for both the true simulated SLE and the model-predicted SLE. The resulting arrays are N×M matrices, where N is the number of simulations and M is 85, the length of each series.
- Parameters:
dataset (pd.DataFrame) - Dataset to be grouped
column (str, optional) - Column to subset on. Defaults to None.
condition (str, optional) - Condition to subset with. Can be int, str, float, etc. Defaults to None.
- Returns:
Tuple containing [all_trues, all_preds]: N×M matrices of the series of true values and predicted values.
- Return type:
tuple
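The reshaping into N×M matrices can be pictured with the sketch below. It assumes the rows are already ordered by simulation and then by year, and it uses hypothetical keys "true" and "pred" on plain dictionaries in place of dataframe columns; neither assumption comes from the ise source.

```python
SERIES_LENGTH = 85  # years per simulation series, as described above


def group_into_runs(rows, series_length=SERIES_LENGTH):
    """Split a flat, run-ordered list of rows into per-run true/pred series."""
    if len(rows) % series_length != 0:
        raise ValueError("row count must be a multiple of the series length")
    all_trues, all_preds = [], []
    for start in range(0, len(rows), series_length):
        chunk = rows[start:start + series_length]
        all_trues.append([r["true"] for r in chunk])
        all_preds.append([r["pred"] for r in chunk])
    return all_trues, all_preds


# Two synthetic runs of 85 years each.
rows = [{"true": float(i), "pred": float(i) + 0.5} for i in range(2 * SERIES_LENGTH)]
all_trues, all_preds = group_into_runs(rows)
```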
- ise.utils.functions.load_ml_data(data_directory: str, time_series: bool = True)[source]
Loads training and testing data for machine learning models. These files are generated using functions in the ise.data.processing modules or process_data in the ise.pipelines.processing module.
- Parameters:
data_directory (str) - Directory containing processed files.
time_series (bool) - Flag denoting whether to load the time-series version of the data.
- Returns:
Tuple containing [train_features, train_labels, test_features, test_labels, test_scenarios]: the training and testing datasets, including the scenarios used in testing.
- Return type:
tuple
- ise.utils.functions.load_model(model_path, model_class, architecture, mc_dropout=False, dropout_prob=0.1)[source]
Loads a PyTorch model from a saved state_dict.
- Parameters:
model_path (str) - Filepath to model state_dict.
model_class (Model) - Model class.
architecture (dict) - Defined architecture of pretrained model.
mc_dropout (bool) - Flag denoting whether the model was trained using MC Dropout.
dropout_prob (float) - Value between 0 and 1 denoting the dropout probability.
- Returns:
Pretrained model.
- Return type:
model (Model)
- ise.utils.functions.to_tensor(x)[source]
Converts input data to a PyTorch tensor of type float.
- Parameters:
x - Input data to be converted. Must be a pandas dataframe, numpy array, or PyTorch tensor.
- Returns:
A PyTorch tensor of type float.
- ise.utils.functions.undummify(df: DataFrame, prefix_sep: str = '-')[source]
Undummifies a dataframe, i.e. reverses pd.get_dummies. Takes encoded categorical variable columns (boolean indicators) and converts them back into the original data format.
- Parameters:
df (pd.DataFrame) - Dataframe to be converted.
prefix_sep (str, optional) - Prefix separator used in pd.get_dummies. Recommended not to change this. Defaults to "-".
- Returns:
Dataframe with the dummy columns converted back into the original categorical columns.
- Return type:
pd.DataFrame
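The idea of reversing a one-hot encoding can be sketched without pandas, using a list of dictionaries in place of a dataframe. The function name undummify_rows and the row layout are assumptions for illustration; the sketch also assumes that non-dummy column names never contain the separator.

```python
def undummify_rows(rows, prefix_sep="-"):
    """Collapse one-hot columns named '<prefix><sep><category>' into one '<prefix>' column."""
    restored = []
    for row in rows:
        out = {}
        for col, value in row.items():
            if prefix_sep in col:
                prefix, category = col.split(prefix_sep, 1)
                if value:  # the active indicator names the original category
                    out[prefix] = category
            else:
                out[col] = value  # ordinary columns pass through unchanged
        restored.append(out)
    return restored


encoded = [
    {"year": 2016, "model-A": 1, "model-B": 0},
    {"year": 2017, "model-A": 0, "model-B": 1},
]
decoded = undummify_rows(encoded)
```

This also shows why changing prefix_sep is discouraged: the separator is the only thing that distinguishes encoded columns from ordinary ones.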
- ise.utils.functions.unscale(y, scaler_path)[source]
Unscale the output data using the scaler saved during training.
- Parameters:
y - Input data to be unscaled.
scaler_path - Path to the scaler used for scaling the data.
- Returns:
The unscaled data.
- ise.utils.functions.unscale_column(dataset: DataFrame, column: str = 'year')[source]
Unscales a column in the dataset. Intended for the year and sectors columns, which have known value ranges (2016-2100 and 1-18 respectively).
- Parameters:
dataset (pd.DataFrame) - Dataset containing columns to unscale.
column (str | list, optional) - Column or columns to be unscaled, must be in [year, sectors]. Can be both. Defaults to 'year'.
- Returns:
dataset containing unscaled columns.
- Return type:
pd.DataFrame
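Because the ranges are known, the inverse transform needs no saved scaler. The sketch below assumes the columns were min-max scaled to [0, 1], which the documentation above does not confirm; the names KNOWN_RANGES and unscale_values are likewise hypothetical.

```python
# Known value ranges stated in the description above.
KNOWN_RANGES = {"year": (2016, 2100), "sectors": (1, 18)}


def unscale_values(scaled, column="year"):
    """Invert min-max scaling (assumed to map the known range onto [0, 1])."""
    lo, hi = KNOWN_RANGES[column]
    # Rounding restores integer years / sector ids after floating-point scaling.
    return [round(v * (hi - lo) + lo) for v in scaled]


years = unscale_values([0.0, 0.5, 1.0], column="year")
sectors = unscale_values([0.0, 1.0], column="sectors")
```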
ise.utils.training module
- class ise.utils.training.CheckpointSaver(model: Module, optimizer: Optimizer, checkpoint_path: str, verbose: bool = False)[source]
Bases:
object
- class ise.utils.training.EarlyStoppingCheckpointer(model, optimizer, checkpoint_path='checkpoint.pt', patience=10, verbose=False)[source]
Bases:
CheckpointSaver