ise.utils
ise.utils.functions
- ise.utils.functions.add_variable_to_nc(source_file_path, target_file_path, variable_name)
Copies a variable from a source NetCDF file to a target NetCDF file.
- Parameters:
source_file_path (str) - Path to the source NetCDF file.
target_file_path (str) - Path to the target NetCDF file.
variable_name (str) - Name of the variable to be copied.
- Raises:
FileNotFoundError - If the specified variable is not found in the source file.
- ise.utils.functions.calculate_distribution_metrics(dataset: DataFrame, column: str = None, condition: str = None)
Computes distribution divergence metrics between true and predicted values.
This function groups the dataset by simulation runs, creates probability distributions for true and predicted values, and calculates the Kullback-Leibler (KL) and Jensen-Shannon (JS) divergences.
- Parameters:
dataset (pd.DataFrame) - The dataset containing true and predicted values.
column (str, optional) - Column name to subset on. Defaults to None.
condition (str, optional) - Value to filter the dataset based on the specified column. Defaults to None.
- Returns:
- A dictionary containing:
‘kl’ (float): KL divergence value.
‘js’ (float): Jensen-Shannon divergence value.
- Return type:
dict
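The two divergences named above can be sketched with NumPy for a pair of discrete densities on a shared support. This is a minimal illustration, not the library's implementation: the helper names `kl_divergence` and `js_divergence`, the `eps` clipping, and the sample densities are all assumptions made for the example.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions on the same support (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalize so each sums to 1, then clip to avoid log(0) and division by zero.
    p = np.clip(p / p.sum(), eps, None)
    q = np.clip(q / q.sum(), eps, None)
    return float(np.sum(p * np.log(p / q)))

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q against their mixture m."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Illustrative densities standing in for the true and predicted distributions.
true_density = np.array([0.1, 0.4, 0.4, 0.1])
pred_density = np.array([0.2, 0.3, 0.3, 0.2])
metrics = {
    "kl": kl_divergence(true_density, pred_density),
    "js": js_divergence(true_density, pred_density),
}
```

KL divergence is asymmetric in its arguments, while the Jensen-Shannon divergence is symmetric and, with natural logarithms, bounded above by ln 2.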
- ise.utils.functions.check_input(input: str, options: List[str], argname: str = None)
Validates whether a given input string is within an expected list of options.
- Parameters:
input (str) - The input value to validate.
options (List[str]) - A list of valid options.
argname (str, optional) - Name of the argument being checked for better error messaging. Defaults to None.
- Raises:
ValueError - If the input is not in the list of allowed options.
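The validation described above amounts to a membership test that raises on failure. The sketch below is a stand-in written for illustration; `check_input_sketch` and the wording of its error message are assumptions, not the library's actual code or message.

```python
from typing import List, Optional

def check_input_sketch(value: str, options: List[str], argname: Optional[str] = None) -> None:
    """Raise ValueError if value is not one of options (hypothetical stand-in)."""
    if value not in options:
        # Use the argument name in the message when one is supplied.
        label = argname if argname is not None else "input"
        raise ValueError(f"{label} must be one of {options}, received {value!r}")

# A valid value passes silently.
check_input_sketch("numpy", ["numpy", "tensor", "pandas"], argname="return_format")

# An invalid value raises, and the message names the offending argument.
try:
    check_input_sketch("list", ["numpy", "tensor", "pandas"], argname="return_format")
    message = None
except ValueError as err:
    message = str(err)
```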
- ise.utils.functions.combine_testing_results(data_directory: str, preds: ndarray, sd: dict = None, gp_data: dict = None, time_series: bool = True, save_directory: str = None)
Combines test results into a DataFrame with predictions, uncertainties, and true values.
- Parameters:
data_directory (str) - Directory containing training and testing data.
preds (np.ndarray | pd.Series | str) - Predictions array, Series, or path to a CSV file with predictions.
sd (dict | pd.DataFrame, optional) - Standard deviations for uncertainty estimation. Defaults to None.
gp_data (dict | pd.DataFrame, optional) - Gaussian process predictions and standard deviations. Defaults to None.
time_series (bool, optional) - Whether to process time-series data. Defaults to True.
save_directory (str, optional) - Directory where results should be saved. Defaults to None.
- Returns:
DataFrame containing test results with true values, predictions, errors, and uncertainty bounds.
- Return type:
pd.DataFrame
- ise.utils.functions.create_distribution(dataset: ndarray, min_range=-30, max_range=20, step=0.01)
Creates a probability density function (PDF) using Gaussian kernel density estimation (KDE).
- Parameters:
dataset (np.ndarray) - Input data for KDE.
min_range (float, optional) - Minimum range for support values. Defaults to -30.
max_range (float, optional) - Maximum range for support values. Defaults to 20.
step (float, optional) - Step size for the support values. Defaults to 0.01.
- Returns:
- A tuple containing:
density (np.ndarray): Density values from KDE.
support (np.ndarray): Support values for the density function.
- Return type:
tuple
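To make the density/support pair concrete, here is a NumPy-only sketch of Gaussian KDE evaluated on a regular grid. It is not the library's implementation: the function name `gaussian_kde_on_grid`, the `bandwidth` parameter, the normal-reference bandwidth rule, and the choice to exclude `max_range` from the grid are assumptions for illustration.

```python
import numpy as np

def gaussian_kde_on_grid(samples, min_range=-30, max_range=20, step=0.01, bandwidth=None):
    """Evaluate a Gaussian KDE of samples on a regular grid (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float).ravel()
    # Support grid; np.arange excludes max_range itself (an assumption of this sketch).
    support = np.arange(min_range, max_range, step)
    if bandwidth is None:
        # Normal-reference rule of thumb; the library may choose its bandwidth differently.
        bandwidth = 1.06 * samples.std(ddof=1) * samples.size ** (-1 / 5)
    # One Gaussian bump per sample, averaged at every support point.
    z = (support[:, None] - samples[None, :]) / bandwidth
    density = np.exp(-0.5 * z**2).sum(axis=1) / (samples.size * bandwidth * np.sqrt(2 * np.pi))
    return density, support

rng = np.random.default_rng(0)
samples = rng.normal(loc=-5.0, scale=2.0, size=200)
density, support = gaussian_kde_on_grid(samples)

# A Riemann sum over the grid approximates the total probability mass.
area = density.sum() * 0.01
```

Because the density is evaluated on a fixed grid, distributions built from different samples share the same support and can be compared point by point.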
- ise.utils.functions.get_X_y(data, dataset_type='sectors', return_format=None, cols='all', with_chars=True)
Extracts input features (X) and target labels (y) from a dataset.
Supports various dataset types (sectors, regions, scenarios) and formats (numpy, tensor, pandas).
- Parameters:
data (str or pd.DataFrame) - Filepath to the dataset CSV or a pandas DataFrame.
dataset_type (str, optional) - The type of dataset (‘sectors’, ‘regions’, ‘scenarios’). Defaults to “sectors”.
return_format (str, optional) - Format of the returned data (‘numpy’, ‘tensor’, or ‘pandas’). Defaults to None.
cols (str or list, optional) - Columns to include in the features. Defaults to “all”.
with_chars (bool, optional) - Whether to include characteristic columns in features. Defaults to True.
- Returns:
- A tuple containing:
X (pd.DataFrame or np.ndarray or torch.Tensor): The input features.
y (pd.DataFrame or np.ndarray or torch.Tensor): The target labels.
scenarios (list, optional): Scenario identifiers if dataset type is “regions”.
- Return type:
tuple
- ise.utils.functions.get_all_filepaths(path: str, filetype: str = None, contains: str = None, not_contains: str = None)
Retrieves all filepaths for files within a directory. Supports subsetting based on filetype and substring search.
- Parameters:
path (str) - Path to directory to be searched.
filetype (str, optional) - File type to be returned (e.g. csv, nc). Defaults to None.
contains (str, optional) - Substring that files found must contain. Defaults to None.
not_contains (str, optional) - Substring that files found must NOT contain. Defaults to None.
- Returns:
list of files within the directory matching the input criteria.
- Return type:
List[str]
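The filtering behavior described above can be approximated with `os.walk`. The sketch below is an illustration under assumptions: `list_filepaths` is a hypothetical name, the search is assumed to be recursive, and the substring filters are applied to the file name only; the library may differ on any of these points.

```python
import os
import tempfile
from typing import List, Optional

def list_filepaths(path: str, filetype: Optional[str] = None,
                   contains: Optional[str] = None,
                   not_contains: Optional[str] = None) -> List[str]:
    """Collect file paths under path that pass all supplied filters (hypothetical helper)."""
    matches = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            # Extension filter, tolerant of "csv" or ".csv".
            if filetype is not None and not name.endswith("." + filetype.lstrip(".")):
                continue
            # Substring filters, applied to the file name in this sketch.
            if contains is not None and contains not in name:
                continue
            if not_contains is not None and not_contains in name:
                continue
            matches.append(os.path.join(root, name))
    return sorted(matches)

# Demonstrate on a throwaway directory with three empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["run_a.csv", "run_b.csv", "grid.nc"]:
        open(os.path.join(tmp, name), "w").close()
    csv_files = [os.path.basename(p) for p in list_filepaths(tmp, filetype="csv")]
    without_b = [os.path.basename(p) for p in list_filepaths(tmp, not_contains="_b")]
```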
- ise.utils.functions.get_data(data_dir, dataset_type='sectors', return_format='tensor')
Loads training, validation, and test datasets, formatting them for model training.
- Parameters:
data_dir (str) - Path to the directory containing the dataset files.
dataset_type (str, optional) - Type of dataset (‘sectors’ or ‘scenarios’). Defaults to ‘sectors’.
return_format (str, optional) - Format of the returned data (‘tensor’, ‘numpy’, or ‘pandas’). Defaults to ‘tensor’.
- Returns:
- A tuple containing:
X_train (pd.DataFrame, np.ndarray, or torch.Tensor): Training features.
y_train (pd.DataFrame, np.ndarray, or torch.Tensor): Training labels.
X_val (pd.DataFrame, np.ndarray, or torch.Tensor): Validation features.
y_val (pd.DataFrame, np.ndarray, or torch.Tensor): Validation labels.
X_test (pd.DataFrame, np.ndarray, or torch.Tensor): Testing features.
y_test (pd.DataFrame, np.ndarray, or torch.Tensor): Testing labels.
- Return type:
tuple
- ise.utils.functions.get_uncertainty_bands(data: DataFrame, confidence: str = '95', quantiles: List[float] = [0.05, 0.95])
Computes uncertainty bands using confidence intervals and quantiles.
- Parameters:
data (pd.DataFrame) - Data matrix of shape (N, M), where N is samples and M is time steps.
confidence (str, optional) - Confidence level (‘95’ or ‘99’). Defaults to “95”.
quantiles (List[float], optional) - Quantiles for uncertainty bands. Defaults to [0.05, 0.95].
- Returns:
- A tuple containing:
mean (np.ndarray): Mean values.
sd (np.ndarray): Standard deviation values.
upper_ci (np.ndarray): Upper confidence interval.
lower_ci (np.ndarray): Lower confidence interval.
upper_q (np.ndarray): Upper quantile bound.
lower_q (np.ndarray): Lower quantile bound.
- Return type:
tuple
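The six returned arrays can be illustrated with a short NumPy sketch. It rests on assumptions the reference text does not confirm: that the confidence interval is a normal-approximation interval for the mean (mean ± z · sd / √N), that the standard deviation is the population form, and that the name `uncertainty_bands` and the z-score table are illustrative only.

```python
import numpy as np

# Standard normal critical values for two-sided intervals (assumed lookup table).
Z_SCORES = {"95": 1.96, "99": 2.576}

def uncertainty_bands(data, confidence="95", quantiles=(0.05, 0.95)):
    """Per-time-step mean, sd, confidence interval, and quantile band (hypothetical helper)."""
    data = np.asarray(data, dtype=float)      # shape (N samples, M time steps)
    z = Z_SCORES[confidence]
    mean = data.mean(axis=0)
    sd = data.std(axis=0)
    # Assumption: interval for the mean, using the standard error sd / sqrt(N).
    half_width = z * sd / np.sqrt(data.shape[0])
    upper_ci = mean + half_width
    lower_ci = mean - half_width
    # Empirical quantiles across samples at each time step.
    lower_q, upper_q = np.quantile(data, list(quantiles), axis=0)
    return mean, sd, upper_ci, lower_ci, upper_q, lower_q

data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])
mean, sd, upper_ci, lower_ci, upper_q, lower_q = uncertainty_bands(data)
```

The quantile band describes the spread of the samples themselves, whereas an interval built from the standard error narrows as N grows; the two answer different questions and are usually plotted separately.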
- ise.utils.functions.group_by_run(dataset: DataFrame, column: str = None, condition: str = None)
Groups dataset simulations into structured matrices for true and predicted values.
- Parameters:
dataset (pd.DataFrame) - Dataset containing simulation results.
column (str, optional) - Column name to subset on. Defaults to None.
condition (str, optional) - Condition for filtering the dataset. Defaults to None.
- Returns:
- A tuple containing:
all_trues (np.ndarray): Matrix of true values (N x M, where N is the number of simulations and M is time steps).
all_preds (np.ndarray): Matrix of predicted values.
scenarios (list): List of scenario information for each simulation.
- Return type:
tuple
- ise.utils.functions.load_ml_data(data_directory: str, time_series: bool = True)
Loads machine learning training and testing data from CSV files.
- Parameters:
data_directory (str) - Directory containing the processed data files.
time_series (bool, optional) - Whether to load the time-series version of the data. Defaults to True.
- Returns:
- A tuple containing:
train_features (pd.DataFrame): Training feature set.
train_labels (pd.Series): Training labels.
test_features (pd.DataFrame): Testing feature set.
test_labels (pd.Series): Testing labels.
test_scenarios (list): List of test scenarios.
- Return type:
tuple
- ise.utils.functions.load_model(model_path, model_class, architecture, mc_dropout=False, dropout_prob=0.1)
Loads a PyTorch model from a saved state_dict file.
- Parameters:
model_path (str) - Path to the model’s state dictionary file.
model_class (type) - Class reference of the model to be loaded.
architecture (dict) - Dictionary specifying the architecture of the model.
mc_dropout (bool, optional) - Whether the model uses Monte Carlo Dropout. Defaults to False.
dropout_prob (float, optional) - Dropout probability if MC Dropout is used. Defaults to 0.1.
- Returns:
The loaded PyTorch model set to the available device (CPU/GPU).
- Return type:
torch.nn.Module
- ise.utils.functions.to_tensor(x)
Converts input data into a PyTorch tensor with float32 dtype.
- Parameters:
x (pd.DataFrame, np.ndarray, or torch.Tensor) - Input data.
- Returns:
Converted tensor.
- Return type:
torch.Tensor
- Raises:
ValueError - If the input data type is not supported.
- ise.utils.functions.undummify(df: DataFrame, prefix_sep: str = '-')
Converts a one-hot encoded dataframe back to its categorical form.
- Parameters:
df (pd.DataFrame) - DataFrame containing one-hot encoded categorical columns.
prefix_sep (str, optional) - Separator used in column names to identify categories. Defaults to “-”.
- Returns:
DataFrame with categorical values restored.
- Return type:
pd.DataFrame
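Reversing a one-hot encoding means grouping columns by their shared prefix and recovering, for each row, the category whose column holds the 1. The pandas sketch below illustrates the idea; `undummify_sketch` and the sample column names are hypothetical, and the library's handling of edge cases (for example, non-dummy columns whose names contain the separator) may differ.

```python
import pandas as pd

def undummify_sketch(df: pd.DataFrame, prefix_sep: str = "-") -> pd.DataFrame:
    """Collapse one-hot columns named '<prefix><sep><category>' back into one column each."""
    # Group column names by the text before the first separator.
    groups = {}
    for col in df.columns:
        prefix = col.split(prefix_sep, 1)[0]
        groups.setdefault(prefix, []).append(col)

    restored = {}
    for prefix, cols in groups.items():
        if len(cols) == 1 and prefix_sep not in cols[0]:
            # Ordinary column without a separator: carry it over unchanged.
            restored[prefix] = df[cols[0]]
        else:
            # idxmax returns the name of the column holding the largest value (the 1);
            # the part after the separator is the category label.
            winners = df[cols].idxmax(axis=1)
            restored[prefix] = winners.map(lambda name: name.split(prefix_sep, 1)[1])
    return pd.DataFrame(restored)

# Illustrative data: 'model' was one-hot encoded into two columns.
encoded = pd.DataFrame({
    "year": [2015, 2016],
    "model-A": [1, 0],
    "model-B": [0, 1],
})
decoded = undummify_sketch(encoded)
```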
- ise.utils.functions.unscale(y, scaler_path)
Unscales a dataset using a previously saved MinMaxScaler.
- Parameters:
y (np.ndarray) - The scaled data.
scaler_path (str) - Path to the saved MinMaxScaler object.
- Returns:
The unscaled data.
- Return type:
np.ndarray
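For reference, min-max scaling to the range [0, 1] maps x to (x - min) / (max - min), so unscaling multiplies by the range and adds the minimum back. The NumPy sketch below shows only that arithmetic; it does not load a saved scaler object, and the helper names and sample values are assumptions for illustration.

```python
import numpy as np

def minmax_scale(x, data_min, data_max):
    """Map values from [data_min, data_max] to [0, 1] (hypothetical helper)."""
    return (x - data_min) / (data_max - data_min)

def minmax_unscale(y, data_min, data_max):
    """Invert minmax_scale: map values from [0, 1] back to [data_min, data_max]."""
    return y * (data_max - data_min) + data_min

# Illustrative values; the minimum and maximum would normally come from the fitted scaler.
original = np.array([-12.0, 0.0, 3.5, 8.0])
lo, hi = original.min(), original.max()
scaled = minmax_scale(original, lo, hi)
recovered = minmax_unscale(scaled, lo, hi)
```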
- ise.utils.functions.unscale_column(dataset: DataFrame, column: str = 'year')
Unscales specified columns back to their original range using known value distributions.
This function is specifically used to revert the normalization of ‘year’ and ‘sectors’ columns since they have known value ranges.
- Parameters:
dataset (pd.DataFrame) - Dataset containing the scaled columns.
column (str or list, optional) - Column(s) to be unscaled. Can be ‘year’, ‘sectors’, or a list containing both. Defaults to “year”.
- Returns:
Dataset with the specified column(s) unscaled.
- Return type:
pd.DataFrame
ise.utils.training
- class ise.utils.training.CheckpointSaver(model: Module, optimizer: Optimizer, checkpoint_path: str, verbose: bool = False)
Bases:
object
A class to handle saving and loading of model checkpoints during training.
This class monitors the model’s loss and saves the model’s state when an improvement is detected. It can also be configured to save the model at every epoch.
- checkpoint_path
Path where the checkpoint will be saved.
- Type:
str
- model
The PyTorch model being trained.
- Type:
torch.nn.Module
- optimizer
The optimizer used during training.
- Type:
torch.optim.Optimizer
- best_loss
The best recorded loss value. Initially set to infinity.
- Type:
float
- verbose
If True, logs messages when a checkpoint is saved.
- Type:
bool
- log
Stores log messages for saving actions.
- Type:
str or None
- __call__(loss, epoch, save_best_only=True)
Checks whether to save the checkpoint based on loss improvement.
- _determine_if_better(loss)
Determines if the new loss is an improvement over the best recorded loss.
- save_checkpoint(epoch, loss, path=None)
Saves the model’s state, optimizer state, and epoch information.
- load_checkpoint(path=None)
Loads a saved checkpoint and restores model and optimizer states.
- load_checkpoint(path: str = None)
Loads a checkpoint and restores the model and optimizer states.
- Parameters:
path (str, optional) - The file path to load the checkpoint from. If None, the default path is used.
- Returns:
The epoch number from which training should resume.
- Return type:
int
- save_checkpoint(epoch, loss, path: str = None)
Saves the model checkpoint, including model state, optimizer state, and epoch.
- Parameters:
epoch (int) - The current epoch number.
loss (float) - The loss value associated with this checkpoint.
path (str, optional) - The file path to save the checkpoint. If None, the default path is used.
- class ise.utils.training.EarlyStoppingCheckpointer(model, optimizer, checkpoint_path='checkpoint.pt', patience=10, verbose=False)
Bases:
CheckpointSaver
A class that extends CheckpointSaver to implement early stopping.
This class tracks model performance and stops training when the validation loss does not improve for a specified number of epochs (patience).
- patience
The number of epochs with no improvement before stopping.
- Type:
int
- counter
Tracks the number of epochs since the last improvement.
- Type:
int
- early_stop
Flag indicating whether early stopping should occur.
- Type:
bool
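The interplay of `patience`, `counter`, and `early_stop` can be shown with a dependency-free sketch. This is not the library's class: `EarlyStoppingSketch` is a hypothetical stand-in that omits the model, optimizer, and file I/O, and it assumes that "improvement" means a strictly lower loss and that stopping triggers once the counter reaches `patience`.

```python
class EarlyStoppingSketch:
    """Tracks the best loss and flags when patience runs out (hypothetical stand-in)."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.counter = 0
        self.early_stop = False

    def __call__(self, loss):
        """Return True when loss improved (the point at which a checkpoint would be saved)."""
        if loss < self.best_loss:
            # Improvement: record it and reset the patience counter.
            self.best_loss = loss
            self.counter = 0
            return True
        # No improvement: count the epoch and stop once patience is exhausted.
        self.counter += 1
        if self.counter >= self.patience:
            self.early_stop = True
        return False

# Simulated validation losses; training halts after two epochs without improvement.
stopper = EarlyStoppingSketch(patience=2)
losses = [1.0, 0.8, 0.9, 0.85, 0.7]
epochs_run = 0
for epoch, loss in enumerate(losses):
    epochs_run = epoch + 1
    stopper(loss)
    if stopper.early_stop:
        break
```

In this run the loop stops after the fourth epoch, so the final loss of 0.7 is never seen; a larger `patience` trades longer training for a lower chance of stopping before a late improvement.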