abcgan package

Submodules

abcgan.constants module

File for global constants used in the program.

abcgan.interface module

Code for top level interface.

This code is added to the main package level in __init__.py

abcgan.interface.anomaly_estimation_1d(fakes, data)

Compute an unbounded anomaly score for a new data sample using the logsumexp computation method.

Parameters
  • fakes (torch.Tensor) – n_samples x n_alt x n_features background variables

  • data (torch.Tensor) – 1 x n_alt x n_features broadcast n_samples times to match fakes data shape

Returns

anomalies – 1 x n_alt x n_feat output of anomaly scores (unbounded).

Return type

np.ndarray
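The logsumexp method named above is commonly used to evaluate a kernel density estimate in log space without numerical underflow. The sketch below illustrates that general technique with NumPy under stated assumptions; it is not abcgan's implementation. The names kde_anomaly_score and bandwidth, and the choice of a Gaussian kernel, are hypothetical and introduced only for this example.

```python
import numpy as np

def logsumexp(a, axis=0):
    """Numerically stable log(sum(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    out = a_max + np.log(np.sum(np.exp(a - a_max), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)

def kde_anomaly_score(fakes, data, bandwidth=1.0):
    """Hypothetical score: negative log Gaussian-KDE density of data under fakes.

    fakes: (n_samples, n_alt, n_features)
    data:  (1, n_alt, n_features), broadcast along the sample axis
    Returns an array of shape (1, n_alt, n_features).
    """
    n_samples = fakes.shape[0]
    z = (data - fakes) / bandwidth
    # Log of a Gaussian kernel centred on every generated sample.
    log_kernel = -0.5 * z ** 2 - np.log(bandwidth * np.sqrt(2.0 * np.pi))
    # Average the kernels in log space.
    log_density = logsumexp(log_kernel, axis=0) - np.log(n_samples)
    return -log_density[np.newaxis]

rng = np.random.default_rng(0)
fakes = rng.normal(size=(500, 4, 3))   # stand-in for generated background variables
typical = np.zeros((1, 4, 3))          # near the centre of the fakes
unusual = np.full((1, 4, 3), 6.0)      # far from every fake sample
scores_typical = kde_anomaly_score(fakes, typical)
scores_unusual = kde_anomaly_score(fakes, unusual)
```

In this sketch a sample far from the generated distribution receives a larger (more positive) score, and the score has no upper bound, which is consistent with the "unbounded" wording above.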

abcgan.interface.anomaly_estimation_nd(fakes, data)

Compute an unbounded anomaly score for a new data sample using the logsumexp computation method (N-dimensional).

Parameters
  • fakes (torch.Tensor) – n_samples x n_alt x n_features background variables

  • data (torch.Tensor) – 1 x n_alt x n_features broadcast n_samples times to match fakes data shape

Returns

anomalies – 1 x n_alt x n_feat output of anomaly scores (unbounded).

Return type

np.ndarray

abcgan.interface.anomaly_score(drivers, data=None, model='mm_gan_radar', bv_type='radar')

Returns an unbounded anomaly score for a given set of driver parameters and data. More positive scores indicate higher confidence in the anomaly.

Parameters
  • drivers (np.ndarray) – 1 x n_drivers input driving parameters (not z-scaled), one sample at a time

  • data (np.ndarray) – 1 x n_alt_in x n_meas

  • model (str, optional) – name of model to use

  • bv_type (str, optional) – name of the type of background variables to use (lidar or radar)

Returns

anomalies – 1 x n_alt x n_feat output of anomaly scores (unbounded).

Return type

np.ndarray

abcgan.interface.discriminate(drivers, measurements, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MLT', 'SLT', 'SZA', 'ap', 'MEI', 'RMM1', 'RMM2', 'TCI', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], model='mm_gan_radar', bv_type='radar')

Score how well the measurements match with historical observations.

Parameters
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • driver_names (list) – list of names of driving parameters

  • measurements (np.ndarray) – n_samples x n_alt_in x n_meas input list of altitude measurements, n_alt_in should be less than max_alt.

  • model (str, optional) – name of model to use

  • bv_type (str, optional) – name of the type of background variables to use (lidar or radar)

Returns

scores – n_samples x n_alt output normalcy scores in the range [0, 1.0].

Return type

np.ndarray

abcgan.interface.estimate_drivers(drivers, model='dr_gan')

Predict drivers 2 hours into the future using the driver GAN model. Used for real-time background predictions based on drivers from 2 hours ago.

Parameters
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • model (str, optional) – name of model to use

Returns

predicted_drivers – estimate of the driver features two hours after the input drivers

Return type

np.ndarray

abcgan.interface.gen_stats(drivers, data=None, model='mm_gan_radar', bv_type='radar')

Statistical distribution of 10,000 upper altitude data points conditioned on driver parameters and lower altitude measurements.

Parameters
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • data (np.ndarray) – n_samples x n_alt_in x n_meas

  • model (str, optional) – name of model to use

  • bv_type (str, optional) – name of the type of background variables to use (lidar or radar)

Returns

samples – two arrays, each of shape n_avg*n_samples x n_alt x n_feat. The first is the generated (fake) output; the second contains the scaled background variables, repeated to match.

Return type

[np.ndarray, np.ndarray]

abcgan.interface.generate(drivers, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MLT', 'SLT', 'SZA', 'ap', 'MEI', 'RMM1', 'RMM2', 'TCI', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], measurements=None, n_alt=30, model='mm_gan_radar', bv_type='radar')

Generate synthetic data consistent with the historical distribution.

Parameters
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • driver_names (list) – list of names of driving parameters

  • measurements (np.ndarray, optional) – n_samples x n_alt_in x n_meas input list of altitude measurements, n_alt_in should be less than n_alt. These represent fixed measurements for the lowest altitudes to condition on. Usually left as default (None)

  • n_alt (int, optional) – number of altitude measurements to draw, defaults to max_alt

  • model (str, optional) – name of model to use

  • bv_type (str, optional) – name of the type of background variables to use (lidar or radar)

Returns

samples – n_samples x n_alt x n_meas output measurements at each requested altitude. If measurements is not None then the measurements for the first n_alt_in will be copied over from the input.

Return type

np.ndarray
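A minimal usage sketch follows, based only on the signatures documented on this page. It assumes abcgan and its bundled models are installed, so the call itself is wrapped in a function and left commented out. The helper names make_placeholder_drivers and generate_profiles are hypothetical, and the placeholder values are not physically meaningful driver inputs.

```python
import numpy as np

# Default driver names, copied from the documented signature.
DRIVER_NAMES = ['Ap', 'F10.7', 'F10.7avg', 'MLT', 'SLT', 'SZA', 'ap', 'MEI',
                'RMM1', 'RMM2', 'TCI', 'moon_phase', 'moon_x', 'moon_y', 'moon_z']

def make_placeholder_drivers(n_samples):
    """Build a driver_dict of 1-D arrays; real values would come from data."""
    return {name: np.ones(n_samples) for name in DRIVER_NAMES}

def generate_profiles(driver_dict, n_alt=30):
    """Stack the drivers and draw synthetic profiles (needs abcgan installed)."""
    import abcgan  # imported lazily so the rest of the sketch runs without it
    drivers = abcgan.stack_drivers(driver_dict, driver_names=DRIVER_NAMES)
    return abcgan.generate(drivers, driver_names=DRIVER_NAMES, n_alt=n_alt)

driver_dict = make_placeholder_drivers(n_samples=4)
# samples = generate_profiles(driver_dict)
# Per the documentation above, samples would have shape 4 x 30 x n_meas.
```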

abcgan.interface.stack_bvs(bv_dict, bv_type='radar')

Stacks background variables in the appropriate format.

This function is provided for convenience.

Parameters
  • bv_dict (dict) – Dictionary mapping names of background variables to numpy arrays with values for those bvs. Each array should have shape n_samples x n_altitudes. Can also use an h5py.Group.

  • bv_type (str) – string specifying whether to stack radar or lidar data

Valid names for background variables can be found at abcgan.bv_names.

Raises
  • ValueError: – If the input shape of the bv dict values is not correct.

  • KeyError: – If one of the required bvs is missing.

abcgan.interface.stack_drivers(driver_dict, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MLT', 'SLT', 'SZA', 'ap', 'MEI', 'RMM1', 'RMM2', 'TCI', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'])

Stacks drivers in appropriate format.

This function is provided for convenience.

Parameters
  • driver_dict (dict) – Dictionary mapping names of drivers to the numpy arrays with values for those drivers. Each array has a single dimension of the same length n_samples. Can also use an h5py.Group.

  • driver_names (list) – names of the drivers to load

Valid names for drivers can be found at abcgan.driver_names.

Raises
  • ValueError: – If the driver values have the wrong type or shape.

  • KeyError: – If one of the required drivers is missing.
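The stacking behaviour described above can be pictured with a small NumPy stand-in. This is a sketch of the documented contract (dict of 1-D arrays in, n_samples x n_drivers array out, ValueError on bad shapes, KeyError on missing names), not the library's code; stack_columns is a hypothetical name.

```python
import numpy as np

DRIVER_NAMES = ['Ap', 'F10.7', 'F10.7avg', 'MLT', 'SLT', 'SZA', 'ap', 'MEI',
                'RMM1', 'RMM2', 'TCI', 'moon_phase', 'moon_x', 'moon_y', 'moon_z']

def stack_columns(driver_dict, driver_names=DRIVER_NAMES):
    """Hypothetical stand-in for stack_drivers: dict of 1-D arrays -> 2-D array."""
    columns = []
    for name in driver_names:
        values = np.asarray(driver_dict[name])  # raises KeyError if missing
        if values.ndim != 1:
            raise ValueError(f"driver {name!r} must be 1-D, got shape {values.shape}")
        columns.append(values)
    if len({col.shape[0] for col in columns}) > 1:
        raise ValueError("all drivers must have the same length n_samples")
    # One column per driver, in the order given by driver_names.
    return np.stack(columns, axis=-1)

# Five samples per driver; driver i holds the values i, i+1, ..., i+4.
driver_dict = {name: np.arange(5, dtype=float) + i
               for i, name in enumerate(DRIVER_NAMES)}
drivers = stack_columns(driver_dict)  # shape: n_samples x n_drivers
```

The column order follows driver_names, so the same list should be passed to any later call that interprets the stacked array.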

abcgan.mask module

abcgan.mask.mask_altitude(bv_feat)

Creates an altitude mask for nans in bvs.

Also replaces nans with numbers.

Parameters

bv_feat (torch.Tensor) – background variables

Returns

  • bv_feat (torch.Tensor) – bv_feat with nans replaced, done in place but returned for clarity

  • alt_mask (torch.Tensor) – Mask that is true for valid altitudes

Raises

ValueError: – If valid values are not contiguous.
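As an illustration of the masking behaviour described above, here is a NumPy sketch (the real function operates on torch.Tensor). It assumes bv_feat has shape (n_batch, n_alt, n_feat), that an altitude is valid only when none of its features is NaN, and that NaNs are replaced with 0.0; altitude_mask and fill_value are hypothetical names.

```python
import numpy as np

def altitude_mask(bv_feat, fill_value=0.0):
    """Return (bv_feat, alt_mask); NaNs in bv_feat are replaced in place."""
    # Assumption: an altitude is valid only if all of its features are finite.
    alt_mask = ~np.isnan(bv_feat).any(axis=-1)            # (n_batch, n_alt)
    # Contiguity check: each sample may contain at most one run of valid
    # altitudes, i.e. at most one invalid -> valid transition.
    padded = np.pad(alt_mask.astype(int), ((0, 0), (1, 0)))
    rising_edges = (np.diff(padded, axis=1) == 1).sum(axis=1)
    if (rising_edges > 1).any():
        raise ValueError("valid altitudes are not contiguous")
    bv_feat[~alt_mask] = fill_value                        # modifies the input
    return bv_feat, alt_mask

bv = np.ones((2, 4, 3))
bv[0, 2:, :] = np.nan          # sample 0: the top two altitudes are missing
bv_filled, mask = altitude_mask(bv)
```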

abcgan.mask.prev_driver_mask(unix_time)

Creates a driver mask of samples that have a previous sample and a mapping vector to the previous sample.

Parameters

unix_time (np.array) – time stamp of driver samples

Returns

  • prev_dr_map (np.array) – vector mapping each sample to its delayed sample

  • dr_mask (torch.Tensor) – Mask of valid driver samples that have a delayed sample
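One way to picture the mapping is a nearest-timestamp lookup. The sketch below is an assumption-laden illustration in NumPy, not the library's code: the two-hour delay is borrowed from the estimate_drivers description on this page, and previous_sample_map, delay, and tol are hypothetical names and values.

```python
import numpy as np

def previous_sample_map(unix_time, delay=2 * 3600, tol=60):
    """Map each sample to the sample closest to `delay` seconds earlier.

    Returns (prev_map, mask). mask is True where the matched sample lies
    within `tol` seconds of the target time. Needs at least two samples.
    """
    unix_time = np.asarray(unix_time, dtype=float)
    order = np.argsort(unix_time)
    sorted_time = unix_time[order]
    target = unix_time - delay
    # Locate the neighbours of each target in the sorted timestamps.
    pos = np.clip(np.searchsorted(sorted_time, target), 1, len(sorted_time) - 1)
    left, right = sorted_time[pos - 1], sorted_time[pos]
    nearest = np.where(target - left <= right - target, pos - 1, pos)
    prev_map = order[nearest]
    mask = np.abs(unix_time[prev_map] - target) <= tol
    return prev_map, mask

# Hourly samples followed by one sample after a gap.
times = np.array([0, 3600, 7200, 10800, 20000])
prev_map, mask = previous_sample_map(times)
```

Here only the samples at 7200 s and 10800 s have a counterpart two hours earlier, so only those two entries of mask are True.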

abcgan.mean_estimation module

class abcgan.mean_estimation.Transformer(d_dr: int = 18, d_bv: int = 12, n_alt: int = 30, d_model: int = 64, nhead: int = 1, num_encoder_layers: int = 1, dim_feedforward: int = 64, dropout: float = 0.0, activation: str = 'relu')

Bases: torch.nn.modules.module.Module

Transformer Class with only the encoder

Parameters
  • d_model (int) – the number of expected features in the encoder/decoder inputs

  • d_stack (int) – the number of features to stack to output

  • nhead (int) – the number of heads in the multiheadattention models

  • num_encoder_layers (int) – the number of sub-encoder-layers in the encoder

  • dim_feedforward (int) – the dimension of the feedforward network model

  • dropout (float) – the dropout value

  • activation (str) – the activation function of encoder/decoder intermediate layer

forward(driver_src: torch.Tensor, bv_src: torch.Tensor, src_key_padding_mask: Optional[torch.Tensor] = None)

Take in and process masked source/target sequences.

Parameters
  • driver_src (torch.Tensor) – (n_batch, d_dr) the sequence to the encoder (required).

  • bv_src (torch.Tensor) – (n_batch, n_alt, d_bv) the sequence to the decoder (required).

  • src_key_padding_mask (torch.Tensor, optional) – the ByteTensor mask for src keys per batch.

generate_square_subsequent_mask(sz: int) torch.Tensor

Generate a square mask for the sequence. The masked positions are filled with float(‘-inf’). Unmasked positions are filled with float(0.0).

Parameters

sz (int) – size of the square mask (the result has shape sz x sz)
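The mask layout described above can be reproduced in a few lines. This sketch uses NumPy rather than torch so it stays dependency-light, and it assumes the usual causal convention in which positions above the diagonal are masked; square_subsequent_mask is a hypothetical name.

```python
import numpy as np

def square_subsequent_mask(sz):
    """Return an sz x sz float mask: 0.0 on and below the diagonal, -inf above."""
    mask = np.zeros((sz, sz))
    mask[np.triu_indices(sz, k=1)] = -np.inf
    return mask

mask = square_subsequent_mask(3)
```

Adding such a mask to attention logits prevents position i from attending to any position j > i, because exp(-inf) contributes zero weight after the softmax.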

training: bool

abcgan.model module

class abcgan.model.Critic(transformer: torch.nn.modules.module.Module, n_layers=4, img_dim=12, hidden_dim=128)

Bases: torch.nn.modules.module.Module

Critic Class

Parameters
  • transformer (torch.nn.Module) – transformer for the critic

  • n_layers (int) – number of layers in MLP

  • img_dim (int) – the dimension of the images, fitted for the dataset used, a scalar

  • hidden_dim (int) – the inner dimension, a scalar

forward(bv_features, driver_src, real, src_key_mask=None)

Function for completing a forward pass of the critic: given an image tensor, returns a 1-dimensional tensor representing a fake/real prediction.

Parameters
  • bv_features (torch.Tensor) – a flattened image tensor with dimension (n_batch, max_alt, n_bv_feat)

  • driver_src (torch.Tensor) – tensor of driver features from data loader (n_batch, n_dr_feat)

  • real (torch.Tensor) – tensor of bv features from data loader (n_batch, n_alt, n_bv_feat)

  • src_key_mask (torch.Tensor, optional) – mask for bv features from data loader (n_batch, n_alt)

training: bool

class abcgan.model.Driver_Critic(n_layers=2, img_dim=18, hidden_dim=64)

Bases: torch.nn.modules.module.Module

Critic Class

Parameters
  • n_layers (int) – number of layers in MLP

  • img_dim (int) – the dimension of the images, fitted for the dataset used, a scalar

  • hidden_dim (int) – the inner dimension, a scalar

forward(dr_src, dr_prev)

Forward pass of the critic for driver augmentation: given an image tensor, returns a 1-dimensional tensor representing a fake/real prediction.

Parameters
  • dr_src (torch.Tensor) – tensor of driver features (n_batch, n_dr_feat)

  • dr_prev (torch.Tensor) – tensor of past driver features (n_batch, n_dr_feat)

training: bool

class abcgan.model.Driver_Generator(n_layers=2, latent_dim=16, img_dim=18, hidden_dim=64)

Bases: torch.nn.modules.module.Module

Generator Class

Parameters
  • n_layers (int) – number of MLP layers

  • latent_dim (int) – the dimension of the input latent vector

  • img_dim (int) – the dimension of the images, fitted for the dataset used, a scalar

  • hidden_dim (int) – the inner dimension, a scalar

forward(dr_prev, noise=None)

Forward pass of the generator for driver augmentation: given a driver sample from the past and a noise tensor, returns a generated driver sample.

Parameters
  • dr_prev (torch.Tensor) – tensor of past driver features from data loader (n_batch, n_dr_feat)

  • noise (torch.Tensor, optional) – a noise tensor with dimensions (n_batch, latent_dim)

training: bool

class abcgan.model.Generator(transformer: torch.nn.modules.module.Module, n_layers=4, latent_dim=16, img_dim=12, hidden_dim=128)

Bases: torch.nn.modules.module.Module

Generator Class

Parameters
  • transformer (torch.nn.Module) – transformer for the generator

  • n_layers (int) – number of MLP layers

  • latent_dim (int) – the dimension of the input latent vector

  • img_dim (int) – the dimension of the images, fitted for the dataset used, a scalar

  • hidden_dim (int) – the inner dimension, a scalar

forward(driver_src, bv_src, src_key_mask=None, noise=None)

Function for completing a forward pass of the generator: Given a noise tensor, returns generated images.

Parameters
  • driver_src (torch.Tensor) – tensor of driver features from data loader (n_batch, n_dr_feat)

  • bv_src (torch.Tensor) – tensor of bv features from data loader (n_batch, n_alt, n_bv_feat)

  • src_key_mask (torch.Tensor, optional) – mask for bv features from data loader (n_alt, n_batch)

  • noise (torch.Tensor, optional) – a noise tensor with dimensions (n_batch, latent_dim)

training: bool

abcgan.persist module

This module supports persistence of the generator and discriminator.

It saves two files: a parameters file and a configuration file.

It also supports persisting of multiple modules.

To be persistable in this way, the module must have a property containing a JSON-serializable input dictionary, available as mdl.input_args.

abcgan.persist.fullname(inst)

abcgan.persist.persist(generator, critic, name='wgan_gp', dir_path='/home/valentic/sandbox/atmosense/test/lib/python3.9/site-packages/abcgan/models')

Persists abcgan generator and critic modules.

Persists both input arguments and parameters.

Parameters
  • generator (torch.nn.Module) – module for the generator

  • critic (torch.nn.Module) – module for the critic

  • name (str, optional) – name of the saved configuration

  • dir_path (str, optional) – default is the models directory. None assumes file is in local directory.

The generator, critic, and any transformers passed in as arguments to these must be registered in persist.py and must have a parameter ‘input_args’ that specifies their input arguments as a dictionary.
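The input_args convention can be illustrated with the standard library alone. The sketch below covers only the configuration half of the scheme (saving constructor arguments as JSON and rebuilding from a registry); the parameters file is not shown. TinyModule, save_config, load_from_config, and the registry dictionary are hypothetical stand-ins, not part of abcgan.

```python
import json
import os
import tempfile

class TinyModule:
    """Hypothetical stand-in for a module that records its constructor arguments."""
    def __init__(self, n_layers=2, hidden_dim=64):
        # JSON-serializable record of how this instance was built.
        self.input_args = {"n_layers": n_layers, "hidden_dim": hidden_dim}
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim

def save_config(module, path):
    """Write the class name and input_args of a module to a JSON file."""
    with open(path, "w") as f:
        json.dump({"class": type(module).__name__,
                   "input_args": module.input_args}, f)

def load_from_config(path, registry):
    """Rebuild a module by looking up its class name in a registry."""
    with open(path) as f:
        config = json.load(f)
    return registry[config["class"]](**config["input_args"])

with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "example.json")
    save_config(TinyModule(n_layers=3, hidden_dim=32), cfg_path)
    restored = load_from_config(cfg_path, registry={"TinyModule": TinyModule})
```

A registry keyed by class name is one reason every class must be known in advance: only the name and the arguments are stored, so the loader has to map the name back to a constructor.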

abcgan.persist.recreate(name='wgan_gp', dir_path='/home/valentic/sandbox/atmosense/test/lib/python3.9/site-packages/abcgan/models')

Load a pre-trained generator and discriminator.

Parameters
  • name (str, optional) – name of the configuration to load, as saved by persist. default: ‘wgan_gp’

  • dir_path (str, optional) – default is the models directory. None assumes file is in local directory.

Returns

  • generator (torch.nn.Module) – the loaded generator

  • critic (torch.nn.Module) – the loaded critic

Modules must have previously been saved. All modules are loaded on the CPU; they can subsequently be moved.

abcgan.transforms module

Transforms to and from z-scaled variables.

Uses numpy only (no pytorch)

abcgan.transforms.compute_valid(bvs, bv_thresholds=array([[-1.00000000e+00, 2.88214929e+14], [1.00000000e+00, 1.89264686e+12], [-1.00000000e+00, 5.00000000e+05], [-1.00000000e+00, 4.39506857e+09], [-1.00000000e+00, 1.00247000e+05], [-1.00000000e+00, 8.45428636e+06], [-2.00000000e+03, 2.00000000e+03], [1.00000000e-06, 2.00000000e+03], [-2.00000000e+03, 2.00000000e+03], [1.00000000e-06, 2.00000000e+03], [-2.00000000e+03, 2.00000000e+03], [1.00000000e-06, 2.00000000e+03]]))

abcgan.transforms.decode(data, driver_names)

Decode variables; the inverse of encode.

Parameters
  • data (np.ndarray) – array of feature values.

  • driver_names (list: str) – list of driver names in data

Returns

enc – array of decoded variables

Return type

np.ndarray

abcgan.transforms.encode(data, name)

Encode variables, or just add extra dimension

Parameters
  • data (np.ndarray) – array of variable values.

  • name (str) – name of the variable.

Returns

enc – array of encoded variables (with an extra dimension in all cases)

Return type

np.ndarray

abcgan.transforms.get_bv(bv_feat, bv_type='radar')

Invert featurization to recover bvs.

Parameters
  • bv_feat (np.ndarray) – n_samples x n_bv_feat

  • bv_type (str) – radar or lidar bvs

Returns

scaled_feat – n_samples x n_bv

Return type

np.ndarray

abcgan.transforms.get_driver(driver_feat, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MLT', 'SLT', 'SZA', 'ap', 'MEI', 'RMM1', 'RMM2', 'TCI', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'])

Invert featurization to recover driving parameters.

Parameters
  • driver_feat (np.ndarray) – n_samples x n_driver_feat

  • driver_names (list: str) – list of driver names in driver_feat

Returns

original driver – n_samples x n_driver

Return type

np.ndarray

abcgan.transforms.scale_bv(bvs, bv_type='radar')

Return a scaled version of the background variables.

Parameters
  • bvs (np.ndarray) – n_samples x n_bv

  • bv_type (str) – string specifying whether to scale radar or lidar data

Returns

bv_feat – n_samples x n_bv_feat

Return type

np.ndarray

abcgan.transforms.scale_driver(drivers, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MLT', 'SLT', 'SZA', 'ap', 'MEI', 'RMM1', 'RMM2', 'TCI', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'])

Return a scaled version of the drivers.

Parameters
  • drivers (np.ndarray) – n_samples x n_driver

  • driver_names (list: str) – list of driver names

Returns

driver_feat – n_samples x n_driver_feat

Return type

np.ndarray
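The scale and get functions in this module are described as inverse pairs around z-scaled variables. The NumPy sketch below shows only the basic z-scaling round trip. It is an assumption-based illustration: the statistics are computed from a made-up array, and z_scale and z_unscale are hypothetical names. The library's featurization may include further steps, since the documented output width n_driver_feat is named differently from n_driver.

```python
import numpy as np

def z_scale(x, mean, std):
    """Shift and scale each column to zero mean and unit variance."""
    return (x - mean) / std

def z_unscale(feat, mean, std):
    """Invert z_scale to recover values in their original units."""
    return feat * std + mean

# Made-up values: three samples of two drivers.
drivers = np.array([[10.0, 150.0],
                    [20.0, 170.0],
                    [30.0, 190.0]])
mean = drivers.mean(axis=0)
std = drivers.std(axis=0)

driver_feat = z_scale(drivers, mean, std)
recovered = z_unscale(driver_feat, mean, std)
```

The same mean and std must be used in both directions, which is why interface functions that accept inputs described as "not z-scaled" presumably handle the scaling internally.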

Module contents

abcgan.anomaly_score(drivers, data=None, model='mm_gan_radar', bv_type='radar')

Returns an unbounded anomaly score for a given set of driver parameters and data. More positive scores indicate higher confidence in the anomaly.

Parameters
  • drivers (np.ndarray) – 1 x n_drivers input driving parameters (not z-scaled), one sample at a time

  • data (np.ndarray) – 1 x n_alt_in x n_meas

  • model (str, optional) – name of model to use

  • bv_type (str, optional) – name of the type of background variables to use (lidar or radar)

Returns

anomalies – 1 x n_alt x n_feat output of anomaly scores (unbounded).

Return type

np.ndarray

abcgan.discriminate(drivers, measurements, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MLT', 'SLT', 'SZA', 'ap', 'MEI', 'RMM1', 'RMM2', 'TCI', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], model='mm_gan_radar', bv_type='radar')

Score how well the measurements match with historical observations.

Parameters
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • driver_names (list) – list of names of driving parameters

  • measurements (np.ndarray) – n_samples x n_alt_in x n_meas input list of altitude measurements, n_alt_in should be less than max_alt.

  • model (str, optional) – name of model to use

  • bv_type (str, optional) – name of the type of background variables to use (lidar or radar)

Returns

scores – n_samples x n_alt output normalcy scores in the range [0, 1.0].

Return type

np.ndarray

abcgan.estimate_drivers(drivers, model='dr_gan')

Predict drivers 2 hours into the future using the driver GAN model. Used for real-time background predictions based on drivers from 2 hours ago.

Parameters
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • model (str, optional) – name of model to use

Returns

predicted_drivers – estimate of the driver features two hours after the input drivers

Return type

np.ndarray

abcgan.gen_stats(drivers, data=None, model='mm_gan_radar', bv_type='radar')

Statistical distribution of 10,000 upper altitude data points conditioned on driver parameters and lower altitude measurements.

Parameters
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • data (np.ndarray) – n_samples x n_alt_in x n_meas

  • model (str, optional) – name of model to use

  • bv_type (str, optional) – name of the type of background variables to use (lidar or radar)

Returns

samples – two arrays, each of shape n_avg*n_samples x n_alt x n_feat. The first is the generated (fake) output; the second contains the scaled background variables, repeated to match.

Return type

[np.ndarray, np.ndarray]

abcgan.generate(drivers, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MLT', 'SLT', 'SZA', 'ap', 'MEI', 'RMM1', 'RMM2', 'TCI', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], measurements=None, n_alt=30, model='mm_gan_radar', bv_type='radar')

Generate synthetic data consistent with the historical distribution.

Parameters
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • driver_names (list) – list of names of driving parameters

  • measurements (np.ndarray, optional) – n_samples x n_alt_in x n_meas input list of altitude measurements, n_alt_in should be less than n_alt. These represent fixed measurements for the lowest altitudes to condition on. Usually left as default (None)

  • n_alt (int, optional) – number of altitude measurements to draw, defaults to max_alt

  • model (str, optional) – name of model to use

  • bv_type (str, optional) – name of the type of background variables to use (lidar or radar)

Returns

samples – n_samples x n_alt x n_meas output measurements at each requested altitude. If measurements is not None then the measurements for the first n_alt_in will be copied over from the input.

Return type

np.ndarray

abcgan.stack_bvs(bv_dict, bv_type='radar')

Stacks background variables in the appropriate format.

This function is provided for convenience.

Parameters
  • bv_dict (dict) – Dictionary mapping names of background variables to numpy arrays with values for those bvs. Each array should have shape n_samples x n_altitudes. Can also use an h5py.Group.

  • bv_type (str) – string specifying whether to stack radar or lidar data

Valid names for background variables can be found at abcgan.bv_names.

Raises
  • ValueError: – If the input shape of the bv dict values is not correct.

  • KeyError: – If one of the required bvs is missing.

abcgan.stack_drivers(driver_dict, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MLT', 'SLT', 'SZA', 'ap', 'MEI', 'RMM1', 'RMM2', 'TCI', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'])

Stacks drivers in appropriate format.

This function is provided for convenience.

Parameters
  • driver_dict (dict) – Dictionary mapping names of drivers to the numpy arrays with values for those drivers. Each array has a single dimension of the same length n_samples. Can also use an h5py.Group.

  • driver_names (list) – names of the drivers to load

Valid names for drivers can be found at abcgan.driver_names.

Raises
  • ValueError: – If the driver values have the wrong type or shape.

  • KeyError: – If one of the required drivers is missing.