multi_med_image_ml package¶
Submodules¶
multi_med_image_ml.DataBaseWrapper module¶
- class multi_med_image_ml.DataBaseWrapper.DataBaseWrapper(database=None, filename=None, labels=[], confounds=[], X_dim=None, key_to_filename=<function key_to_filename_default>, val_ranges={}, precedence=[])¶
Bases:
object
Wrapper for Pandas table to cache some common and repeated functions
DataBaseWrapper stores a pandas dataframe that contains metadata about the images being read in. It also builds up this dataframe in real time if only DICOM or NIFTI/JSON files are present. One purpose is to translate tokenized values in the dataframe to one-hot vectors that can be read by a DL model as quickly as possible. Another is to keep a store of cached image files of a particular size.
- database¶
The internal Pandas dataframe (default None)
- Type:
pandas.DataFrame
- filename¶
The filename of the Pandas Dataframe pickle (default None)
- Type:
str
- labels¶
Labels that are read in by the current MedImageLoader (default [])
- Type:
list
- confounds¶
Confound names that are read in by the current MedImageLoader (default [])
- Type:
list
- val_ranges¶
Dictionary of value ranges that may be returned. See MedImageLoader (default {})
- Type:
dict
- X_dim¶
Image dimensions (default None)
- Type:
tuple
- key_to_filename¶
Function to translate a key to a filename and back (default key_to_filename_default)
- Type:
callback
- jdict¶
Dictionary of values read from JSON files that are accumulated and periodically merged with the pandas DataFrame (database) as it is built up. Used as an intermediary to prevent too much fragmentation in the DataFrame (default {})
- Type:
dict
- columns¶
Columns of the database. Used for quick reference.
- Type:
set
- add_json(nifti_file, json_file=None)¶
Adds a JSON from a DICOM file to the internal jdict, which will later be compiled into the DataFrame
- Parameters:
nifti_file (str) – Nifti (.nii.gz) file that was output by the DICOM preprocessing software
json_file (str) – JSON file that was output by the DICOM preprocessing software. If not present, searches for the JSON file in the same directory as the input nifti file.
- build_metadata()¶
Builds internal lookup tables used to convert continuous and label-based variables to one-hot vectors that can be read by an ML model.
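As a rough illustration of the kind of lookup table described here, the sketch below maps label strings to integer indices and then to one-hot vectors. The helper names `build_lookup` and `one_hot` and the example column are hypothetical and not part of multi_med_image_ml; the library's internal representation may differ, and continuous variables are not covered.

```python
def build_lookup(values):
    """Map each distinct value to a stable integer index (sorted order)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def one_hot(value, lookup, n_choices):
    """Return a one-hot list of length n_choices for the given value."""
    index = lookup[value]
    if index >= n_choices:
        raise ValueError(
            f"{value!r} needs index {index}, but only {n_choices} choices fit"
        )
    vec = [0] * n_choices
    vec[index] = 1
    return vec


# Hypothetical column of diagnosis labels
diagnoses = ["AD", "Control", "AD", "MCI"]
lookup = build_lookup(diagnoses)
encoded = [one_hot(d, lookup, n_choices=4) for d in diagnoses]
```

Building the table once and reusing it avoids recomputing the value-to-index mapping for every record.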
- get_ID(npy_file: str) str ¶
Returns the Patient ID, if present in the database. Attempts to guess it using the keys ‘PatientID’ and ‘Patient ID’
- Parameters:
npy_file (str) – Cached numpy file of the image
- Returns:
ID of the patient in question
- Return type:
id (str)
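The key-guessing behaviour can be pictured with a small sketch. It assumes a record is available as a plain dict; the function name `guess_patient_id` is hypothetical, and how the library actually resolves the row from npy_file is not shown here.

```python
def guess_patient_id(row, candidate_keys=("PatientID", "Patient ID")):
    """Return the first non-empty ID found under the candidate keys, else None."""
    for key in candidate_keys:
        value = row.get(key)
        if value is not None and value != "":
            return str(value)
    return None


# Hypothetical metadata rows
row_a = {"PatientID": "P001", "Modality": "MR"}
row_b = {"Patient ID": 42}
row_c = {"Modality": "CT"}
```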
- get_birth_date(npy_file: str) date ¶
- get_confound_encode(npy_file: str) list ¶
Returns list of integers that represent the confounds of a given input file
- Parameters:
npy_file (str) – Numpy file of the given record, which is converted into a key
- Returns:
A list of integers indicating the nth confound of the datapoint
- Return type:
cnum_list (list)
- get_exam_date(npy_file: str) date ¶
- get_file_list()¶
- get_label_encode(npy_file: str)¶
Returns list of integers that represent the labels of a given input file
- Parameters:
npy_file (str) – Numpy file of the given record, which is converted into a key
- Returns:
A list of integers indicating the nth label of the datapoint
- Return type:
cnum_list (list)
- has_im(im: ImageRecord) bool ¶
Determines if the DataBaseWrapper contains a given ImageRecord
- Parameters:
im (ImageRecord) – Input image
- Returns:
bool - Whether image is present in database
- in_val_ranges(fkey: str) bool ¶
Determines if the input fkey is in a valid value range
- Parameters:
fkey (str) – Lookup key to the pandas DataFrame
- Returns:
bool - Whether the key is present
- loc_val(npy_file, c)¶
- out_dataframe(fkey_ass=None)¶
Merges the jdict with the dataframe and outputs it.
- parse_date(d, date_format='%Y-%m-%d %H:%M:%S')¶
Parses the date string
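The default date_format is a standard `strptime` pattern. The sketch below shows how such a string would parse with Python's `datetime` module; `parse_date_sketch` is a hypothetical stand-in, and whether the library returns a `datetime` or a `date` is not stated in this reference.

```python
from datetime import datetime


def parse_date_sketch(d, date_format="%Y-%m-%d %H:%M:%S"):
    """Parse a date string using the given strptime format."""
    return datetime.strptime(d, date_format)


exam = parse_date_sketch("2021-03-04 15:30:00")
# A different format can be supplied for other date layouts
other = parse_date_sketch("04/03/2021", date_format="%d/%m/%Y")
```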
- stack_list_by_label(filename_list, label)¶
Reorganizes an input filename list by the list of labels. So, a list of filenames of men and women with sex as the given input label will be returned as two lists of men and women.
- Parameters:
filename_list (list) – List of filenames
label (str) – Label to be reorganized by. Must be one of the confounds or labels in the DataBaseWrapper.
- Returns:
List of separate filename lists
- Return type:
filename_list_stack (list[list])
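The regrouping described above can be sketched in plain Python. Here the label values come from a hypothetical dict (`sex`) rather than from the DataBaseWrapper, and `stack_by_label` is an illustrative name; the order of the returned groups in the library may differ.

```python
def stack_by_label(filename_list, label_of):
    """Group filenames into one list per distinct label value (first-seen order)."""
    groups = {}
    for filename in filename_list:
        groups.setdefault(label_of[filename], []).append(filename)
    return list(groups.values())


# Hypothetical mapping from cached file to the value of the "sex" label
sex = {"a.npy": "MALE", "b.npy": "FEMALE", "c.npy": "MALE"}
stacked = stack_by_label(["a.npy", "b.npy", "c.npy"], sex)
```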
multi_med_image_ml.MedImageLoader module¶
- class multi_med_image_ml.MedImageLoader.MedImageLoader(*image_folders, pandas_cache='./pandas/', cache=True, key_to_filename=<function key_to_filename_default>, batch_by_pid=False, file_record_name=None, database=None, batch_size=14, X_dim=(96, 96, 96), get_encoded=False, static_inputs=None, confounds=[], match_confounds=[], label=[], augment=True, val_ranges={}, dtype='torch', Y_dim=(1, 32), C_dim=(16, 32), return_obj=False, channels_first=True, gpu_ids='', save_ram=True, precedence=[], n_dyn_inputs=14, verbose=False)¶
Bases:
object
Loads medical images into a format that may be used by MultiInputModule.
This loader preprocesses, reshapes, augments, and batches images and metadata into a format that may be read by MultiInputModule. It additionally may apply a data matching algorithm to ensure that no overly confounded data is fed into the model during training. It is capable of maintaining different lists of images to balance classes for both the classifier and regressor.
- database¶
Object used to store and access metadata about particular files. MedImageLoader builds this automatically from a folder, or it can read from one directly if it’s already been built (default None)
- Type:
- X_dim¶
Three-tuple dimension to which the images will be resized upon output (default (96,96,96))
- Type:
tuple
- Y_dim¶
A tuple indicating the dimension of the image’s label. The first number is the number of labels associated with the image and the second is the number of choices each label has. Extra choices will not affect the model, but too few will throw an error: if Y_dim is (1,2) and the label has three classes, it will crash, whereas (1,4) will just result in an output that is always zero. This should match the Y_dim parameter in the associated MultiInputModule (default (1,32))
- Type:
tuple
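To make the Y_dim constraint concrete, the following sketch encodes class indices into a (number of labels, number of choices) array of nested lists. `encode_labels` is a hypothetical helper written for illustration; the library's own encoding and error type may differ.

```python
def encode_labels(class_indices, y_dim):
    """One-hot encode each class index into a row of an n_labels x n_choices array."""
    n_labels, n_choices = y_dim
    if len(class_indices) > n_labels:
        raise ValueError("more labels than the first value of Y_dim allows")
    y = [[0] * n_choices for _ in range(n_labels)]
    for row, idx in enumerate(class_indices):
        if idx >= n_choices:
            raise ValueError(
                f"class index {idx} does not fit in {n_choices} choices"
            )
        y[row][idx] = 1
    return y


# A label with three classes (indices 0..2) fits in (1, 4); one column stays zero
y_ok = encode_labels([2], (1, 4))

# The same label does not fit in (1, 2)
try:
    encode_labels([2], (1, 2))
    too_small_raised = False
except ValueError:
    too_small_raised = True
```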
- C_dim¶
A tuple indicating the dimension of the image’s confounds. This effectively operates the same way as Y_dim, except the default number of confounds is higher (default (16,32))
- Type:
tuple
- batch_size¶
Max number of images that can be read in per batch. Note that if batch_by_pid is True, this is the maximum number of images that can be read in, and it’s best to set it to the same value as n_dyn_inputs in MultiInputModule (default 14)
- Type:
int
- augment¶
Whether to augment images during training. Note that this only works if the images are returned directly (i.e. return_obj = False). Otherwise images are augmented when get_X is called from ImageRecord (default True)
- Type:
bool
- dtype¶
Type of image to be returned – either “torch” or “numpy” (default “torch”)
- Type:
str
- label¶
List of labels that will be read in from DataBaseWrapper to the Y output. Must be smaller than the first value of Y_dim.
- Type:
list
- confounds¶
List of confounds that will be read in from DataBaseWrapper to the C output. Must be smaller than the first value of C_dim.
- Type:
list
- pandas_cache¶
Directory in which the database pandas file is stored
- Type:
str
- cache¶
Whether to cache images of a particular dimension as .npy files, for faster reading and indexing in the database (default True)
- Type:
bool
- key_to_filename¶
Function that translates a key to the DataBaseWrapper into a full filepath from which an image may be read. Needs to accept an additional parameter to reverse this as well (default key_to_filename_default)
- Type:
callback
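A custom key_to_filename callback needs to work in both directions. The sketch below shows one possible shape, assuming keys are relative paths without an extension and that cached files live under a hypothetical `CACHE_ROOT`; it is not the behaviour of key_to_filename_default, which this reference does not describe.

```python
import posixpath

CACHE_ROOT = "/data/cache"  # hypothetical cache directory


def key_to_filename_sketch(fkey, reverse=False):
    """Translate a key to a .npy path, or a path back to its key when reverse=True."""
    if reverse:
        # Path -> key: strip the cache root and the .npy extension
        rel = posixpath.relpath(fkey, CACHE_ROOT)
        return posixpath.splitext(rel)[0]
    # Key -> path
    return posixpath.join(CACHE_ROOT, fkey + ".npy")


path = key_to_filename_sketch("sub01/scan1")
key = key_to_filename_sketch(path, reverse=True)
```

Whatever scheme is chosen, the forward and reverse directions should round-trip so that a key always maps back to itself.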
- batch_by_pid¶
Whether to batch images together by their Patient ID in a BatchRecord or not (default False)
- Type:
bool
- file_record_name¶
Path of the record of files that were read in by the MedImageLoader, if it needs to be examined later (default None)
- Type:
str
- channels_first¶
Whether to put channels in the first or last dimension of images (default True)
- Type:
bool
- save_ram¶
Clears images from ImageRecords and applies garbage collection frequently to save RAM. Useful for very large datasets (default True)
- Type:
bool
- static_inputs¶
List of variables from DataBaseWrapper that will be input as static, per-patient text inputs (like Sex or Ethnicity) to the MultiInputModule (default None)
- Type:
list
- val_ranges¶
Dictionary that may be used to indicate ranges of values that may be loaded in. So, if you only want to study males, val_ranges could be {'SexDSC':'MALE'}, and if you only want to study people between ages 30 and 60, val_ranges could be {'Ages':(30,60)}; these can be combined, too. Note that 'Ages' and 'SexDSC' must be present in DataBaseWrapper as metadata variable names for this to work (default {})
- Type:
dict
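The filtering that val_ranges describes can be sketched as a simple predicate over one record. `passes_val_ranges` is a hypothetical helper; treating tuple bounds as inclusive and rejecting records with a missing key are assumptions, not documented behaviour.

```python
def passes_val_ranges(record, val_ranges):
    """Return True if the record satisfies every entry in val_ranges."""
    for key, allowed in val_ranges.items():
        if key not in record:
            return False  # assumption: missing metadata excludes the record
        value = record[key]
        if isinstance(allowed, tuple):
            low, high = allowed
            if not (low <= value <= high):  # assumption: inclusive bounds
                return False
        elif value != allowed:
            return False
    return True


val_ranges = {"SexDSC": "MALE", "Ages": (30, 60)}
kept = passes_val_ranges({"SexDSC": "MALE", "Ages": 45}, val_ranges)
too_old = passes_val_ranges({"SexDSC": "MALE", "Ages": 70}, val_ranges)
wrong_sex = passes_val_ranges({"SexDSC": "FEMALE", "Ages": 45}, val_ranges)
```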
- match_confounds¶
Used to apply data matching between the labels. So, if you wanted to distinguish between AD and Controls and wanted to match by age, match_confounds could be set to [‘Ages’] and this would only return sets of AD and Control of the same age ranges. Note that this may severely limit the dataset or even return nothing if the match_confound variable and the label variable are mutually exclusive (default [])
- Type:
list
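One simple way to picture data matching is to bucket records by the confound and keep equal numbers of each label per bucket. The sketch below does this for age decades; it is only an illustration under that assumption, and the algorithm actually used by MedImageLoader is not specified here. The `Dx` column, `bucket_width`, and `match_by_bucket` are hypothetical.

```python
from collections import defaultdict


def match_by_bucket(records, label_key, confound_key, bucket_width=10):
    """Keep, per confound bucket, the same number of records for every label."""
    buckets = defaultdict(lambda: defaultdict(list))
    for rec in records:
        bucket = rec[confound_key] // bucket_width
        buckets[bucket][rec[label_key]].append(rec)

    labels = {rec[label_key] for rec in records}
    matched = []
    for bucket in sorted(buckets):
        groups = buckets[bucket]
        if set(groups) != labels:
            continue  # some label has no records in this bucket
        n = min(len(g) for g in groups.values())
        for label in sorted(groups):
            matched.extend(groups[label][:n])
    return matched


records = [
    {"id": 1, "Dx": "AD", "Ages": 72},
    {"id": 2, "Dx": "AD", "Ages": 75},
    {"id": 3, "Dx": "Control", "Ages": 78},
    {"id": 4, "Dx": "Control", "Ages": 34},
]
matched = match_by_bucket(records, "Dx", "Ages")
matched_ids = [rec["id"] for rec in matched]
```

The example also shows how matching shrinks the dataset: the 34-year-old control has no AD counterpart in its bucket and is dropped.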
- all_records¶
Cache that stores ImageRecords and clears them if the images held in main memory grow too large.
- Type:
multi_med_image_ml.Records.AllRecords
- n_dyn_inputs¶
Max number of inputs of the ML model, to be passed into BatchRecord when it’s used as a patient record (default 14)
- Type:
int
- precedence¶
Because labeling is by image in the database and diagnosis is by patient, this option allows “precedence” in labeling when assigning an overall label to a patient. So, if a patient has three images, two marked as “Healthy” and one marked as “Alzheimer’s”, you can pass “[Alzheimer’s,Healthy]” into precedence and it would assign the whole patient the “Alzheimer’s” label (default [])
- Type:
list
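The precedence rule can be sketched as a first-match search over the precedence list. `patient_label` is a hypothetical helper, and falling back to the first image's label when nothing matches is an assumption made only for this example.

```python
def patient_label(image_labels, precedence):
    """Pick one label for a patient from the labels of their images."""
    for label in precedence:
        if label in image_labels:
            return label
    # Assumption: with no match in precedence, use the first image's label
    return image_labels[0]


labels = ["Healthy", "Healthy", "Alzheimer's"]
overall = patient_label(labels, precedence=["Alzheimer's", "Healthy"])
```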
- build_pandas_database()¶
Builds up the entire Pandas DataFrame from the filesystem in one go. May take a while.
- get_file_list()¶
- load_image_stack()¶
Loads a stack of images to an internal queue
- name()¶
- read_record()¶
- record(flist, index=None)¶
- rotate_labels(zero_list_addendum=None)¶
- switch_stack()¶
- tl()¶
Top label
- multi_med_image_ml.MedImageLoader.key_to_filename_default(fkey, reverse=False)¶
multi_med_image_ml.MultiInputTester module¶
- class multi_med_image_ml.MultiInputTester.MultiInputTester(database, model, out_record_folder: str | None = None, checkpoint_dir: str | None = None, verbose: bool = False, name: str = 'experiment_name', test_name: str = '', include_inds: list = [0, 1])¶
Bases:
object
Used for testing the outputs of MultiInputModule.
MultiInputTester abstracts many of the functions for testing DL models, including grad cam and group AUC outputs.
- database¶
Associated database for testing
- Type:
- model¶
Model to be tested
- Type:
- out_record_folder¶
Folder to output results (default None)
- Type:
str
- checkpoint_dir¶
Folder that has model checkpoints (default None)
- Type:
str
- name¶
Name of the model to be tested (default ‘experiment_name’)
- Type:
str
- test_name¶
The name of the experiment (default “”)
- Type:
str
- database_key¶
Variable used when grouping data together for AUROC analysis
- Type:
str
- min_pids¶
(default 1)
- Type:
int
- top_not_mean¶
Given multiple AUC output files, this will select one randomly instead of coming up with the mean prediction of all of them
- Type:
bool
- include_inds¶
(default [0,1])
- Type:
list
- same_patients¶
If true, only plots AUC/Accuracy for patients that are equally divided between groups (default False)
- Type:
bool
- x_axis_opts¶
Whether the X axis of the plot should be “images”, “patients”, or “images_per_patient” (default: “images”)
- Type:
str
- acc(database_key=None, opt=None, divides=None, same_pids_across_groups=False, save=False, ind=0, acc_or_auc='acc')¶
- grad_cam(pr: BatchRecord, add_symlink: bool = True, grad_layer: int = 7, save: bool = True, database_key: str | None = None) Tensor ¶
Outputs a gradient class activation map for the input record
- Parameters:
pr (BatchRecord) – Image batch to apply Grad-Cam to
add_symlink (bool) – If true, adds a symbolic link to the original image in the same folder as the grad-cam is stored in (default True)
grad_layer (int) – (default 7)
save (bool) – Save the output to the results folder (default True)
- loop(pr: BatchRecord, record_encoding=False)¶
Tests one input and saves it.
- Parameters:
pr (BatchRecord) – Image batch
- out_grad_cam_groups(prefix=None)¶
- pca_analysis(database_keys: list, ml_model='pca')¶
- plot(ind=0, x_axis_opts='images', acc_or_auc='auc', database_key=None, opt=None, divides=None, same_pids_across_groups=False, min_pids=1)¶
- read_encodings()¶
- read_json()¶
Reads all json files output by MultiInputTester.
- record_encodings(X_files)¶
- save_encodings()¶
- exception multi_med_image_ml.MultiInputTester.NotEnoughPatients(message)¶
Bases:
Exception
multi_med_image_ml.MultiInputTrainer module¶
- class multi_med_image_ml.MultiInputTrainer.MultiInputTrainer(model, lr=1e-05, betas=(0.5, 0.999), loss_function=MSELoss(), batch_size=64, regress=True, loss_image_dir=None, checkpoint_dir=None, name='experiment_name', verbose=False, save_latest_freq=100)¶
Bases:
object
Used to train MultiInputModule.
MultiInputModule requires an adversarial technique to train it, and the various data queueing techniques used get a bit complicated, so this method is used to abstract all of that.
- model¶
Input model to train
- Type:
- lr¶
Learning rate
- Type:
float
- loss_function¶
PyTorch loss function. MSE loss is used instead of cross entropy because it is smoother and tends to work a bit better with the adversarial function, but this can be tested further (default nn.MSELoss)
- name¶
Name of the model, which is used for saving checkpoints and output graphs (default ‘experiment_name’)
- Type:
str
- optimizer¶
Adam optimizer for the encoder/classifier. Incentivized to classify by the true label and set the regressor to the same values.
- Type:
torch.optim
- optimizer_reg¶
Adam optimizer for the encoder/regressor. Incentivized to detect confounds from each individual image.
- Type:
torch.optim
- loss_image_dir¶
If set, outputs images of the loss function for the optimizers over time, for the classifier, the regressor, and the adversarial loss (default None)
- Type:
str
- checkpoint_dir¶
If set, saves the model and optimizer state (default None)
- Type:
str
- save_latest_freq¶
Number of iterations before it saves the loss image and the checkpoint (default 100)
- Type:
int
- batch_size¶
Batch size of the training. Due to the optional-input nature, this cannot be set in the dataloader. Only one set of images can be passed through the loop at a given time. batch_size is how frequently the backpropagation algorithm is applied after graphs have accumulated (default 64)
- Type:
int
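The batch_size behaviour described above resembles gradient accumulation: samples pass through one at a time and the weights are updated only every batch_size samples. The sketch below shows that idea in plain Python for a one-parameter model y = w * x; it does not use PyTorch or the trainer's optimizers, and all names in it are hypothetical.

```python
def train_with_accumulation(samples, w=0.0, lr=0.1, batch_size=2):
    """Accumulate per-sample gradients and update w every batch_size samples."""
    grad_sum = 0.0
    steps = 0
    for i, (x, y) in enumerate(samples, start=1):
        # Gradient of the squared error (w*x - y)**2 with respect to w
        grad_sum += 2 * (w * x - y) * x
        if i % batch_size == 0:
            w -= lr * grad_sum / batch_size  # one update per accumulated batch
            grad_sum = 0.0
            steps += 1
    return w, steps


samples = [(1.0, 2.0), (1.0, 2.0), (1.0, 2.0), (1.0, 2.0)]
w, steps = train_with_accumulation(samples)
```

Four samples with batch_size=2 produce two updates, each moving w toward the target value of 2.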
- verbose¶
Whether to print (default False)
- Type:
bool
- one_step¶
Boolean to determine whether optimizer (True) or optimizer_reg (False) is applied
- Type:
bool
- index¶
Counts the number of iterations the trainer has gone through
- Type:
int
- loop(pr: BatchRecord, dataloader=None)¶
Loops a single BatchRecord through one iteration
Loops a BatchRecord through one iteration. Also switches the queues of the MedImageLoader as it switches between optimizers.
- Parameters:
pr (multi_med_image_ml.Records.BatchRecord) – Record to be evaluated
dataloader (multi_med_image_ml.DataBaseWrapper.DataBaseWrapper) – Database
- test()¶
multi_med_image_ml.Records module¶
- class multi_med_image_ml.Records.AllRecords¶
Bases:
object
Contains a dictionary of BatchRecord
Used to both prevent duplicate data from being called and to be able to clear all images from main memory and perform garbage collection when necessary.
- image_dict¶
Dictionary of ImageRecord, mapped by their given filename
- Type:
dict
- mem_limit¶
Limit of memory that can be read into RAM
- Type:
int
- obj_size¶
Average size of an object given the image dimension of the dataloader
- Type:
int
- cur_mem¶
Count of current memory read in (TODO)
- Type:
int
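The interplay of mem_limit, obj_size, and the clearing step can be sketched with a small stand-in class. `RecordCacheSketch` is hypothetical: it stores arbitrary objects instead of ImageRecord instances and estimates memory as the number of loaded images times obj_size, which is an assumption about how the estimate might be made.

```python
class RecordCacheSketch:
    """Keeps one entry per filename and drops image data when over the limit."""

    def __init__(self, mem_limit, obj_size):
        self.image_dict = {}
        self.mem_limit = mem_limit
        self.obj_size = obj_size

    def has(self, filename):
        return filename in self.image_dict

    def add(self, filename, image):
        if not self.has(filename):  # avoid duplicate entries
            self.image_dict[filename] = image
        self.check_mem()

    def get_mem(self):
        loaded = sum(1 for im in self.image_dict.values() if im is not None)
        return loaded * self.obj_size

    def check_mem(self):
        if self.get_mem() > self.mem_limit:
            self.clear_images()

    def clear_images(self):
        # Keep the keys so records can be reloaded later; drop only the data
        for filename in self.image_dict:
            self.image_dict[filename] = None


cache = RecordCacheSketch(mem_limit=2, obj_size=1)
for name in ("a.npy", "b.npy"):
    cache.add(name, image=[0.0])
mem_before = cache.get_mem()
cache.add("c.npy", image=[0.0])  # exceeds the limit and triggers a clear
mem_after = cache.get_mem()
```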
- add(filename: str, im: ImageRecord)¶
- check_mem()¶
- clear_images()¶
- get(filename: str)¶
- get_mem()¶
- has(filename: str)¶
- class multi_med_image_ml.Records.BatchRecord(image_records: list, dtype: str = 'torch', sort: bool = True, batch_by_pid: bool = False, channels_first: bool = True, gpu_ids: str = '', batch_size: int = 14, get_text_records: bool = False)¶
Bases:
object
Class that stores batches of ImageRecord
BatchRecord essentially abstracts lists of ImageRecord so that it returns them in batches. It is also used to store patient data for instances in which patients have multiple images.
- image_records¶
List of ImageRecord classes
- Type:
list
- dtype¶
Type to be returned, either “torch” or “numpy” (default “torch”)
- Type:
str
- gpu_ids¶
GPU, if any, on which to read the images out to (default “”)
- Type:
str
- channels_first¶
Whether channels in the images are the first or last dimension (default True)
- Type:
bool
- batch_size¶
The maximum number of images that may be returned in an instance of get_X (default 14)
- Type:
int
- get_C()¶
- get_C_dud()¶
- get_X(augment=False)¶
- get_X_files()¶
- get_Y()¶
- get_birth_dates()¶
- get_exam_dates()¶
- get_static_inputs()¶
- get_text_records()¶
- name()¶
- class multi_med_image_ml.Records.FileLookup(filename=None, npy_name=None, fkey=None)¶
Bases:
object
- file()¶
- key()¶
- npy_file()¶
- class multi_med_image_ml.Records.ImageRecord(filename: str, static_inputs: list = [], database=None, X_dim: tuple = (96, 96, 96), dtype: str = 'torch', extra_info_list: list | None = None, y_on_c: bool = True, cache: bool = True, Y_dim: tuple = (1, 32), C_dim: tuple = (16, 32), y_nums: list | None = None, c_nums: list | None = None)¶
Bases:
Record
A class used to represent an abstraction of an image for MedImageLoader.
ImageRecord is used to keep and organize a given image in main memory. The same image may be represented on the file system as a nifti, dicom, or an npy file, which caches the file at a particular size. This reads in the file without creating duplicates. The image may also be cleared or read in in real time to avoid having the images take up too much space in main memory.
- filename¶
Filename of the image
- Type:
str
- database¶
Object used to quickly look up metadata associated with the image (default None)
- Type:
str
- dtype¶
Type of output (either “torch” or “numpy”) (default “torch”)
- Type:
str
- extra_info_list¶
- Type:
list
- X_dim¶
Standard dimension that the image will be resized to upon returning it (default (96,96,96))
- Type:
tuple
- Y_dim¶
A tuple indicating the dimension of the image’s label. The first number is the number of labels associated with the image and the second is the number of choices each label has. Extra choices will not affect the model, but too few will throw an error: if Y_dim is (1,2) and the label has three classes, it will crash, whereas (1,4) will just result in an output that is always zero. This should match the Y_dim parameter in the associated MultiInputModule (default (1,32))
- Type:
tuple
- C_dim¶
A tuple indicating the dimension of the image’s confounds. This effectively operates the same way as Y_dim, except the default number of confounds is higher (default (16,32))
- Type:
tuple
- image¶
Variable containing the actual image, at size dim. It may be None, to save memory (default None)
- Type:
Numpy array
- Y¶
Variable containing the encoding of the image label(s), at size Y_dim
- Type:
Numpy array
- C¶
Variable containing the encoding of the image confound(s), at size C_dim
- Type:
Numpy array
- y_on_c¶
If true, replicates the Y array on the bottom of all C arrays. Used for regression training. C_dim must be large enough to accommodate the extra Y array or it will crash. (default True)
- Type:
bool
- times_called¶
Counter to count the number of times get_X is called (default 0)
- Type:
int
- static_inputs¶
A list of values that may be called to put into the model as text (e.g. “SEX”, “AGE”)
- Type:
list
- static_input_res¶
The values once they’re looked up from the database (e.g. “MALE”, “22”)
- Type:
list
- cache¶
If true, caches the image file as a .npy array. Takes up extra space but it’s recommended. (default True)
- Type:
bool
- npy_file¶
Path of the cached .npy record
- Type:
str
- exam_date¶
Date that the image was taken, if it can be read in from the database/dicom records (default None)
- Type:
datetime
- bdate¶
Birth date of the patient, if it can be read in from the database/dicom records (default None)
- Type:
datetime
- json_file¶
File name of the json that results from a DICOM being converted to nifti (default None)
- Type:
str
- loaded¶
True if images are loaded into main memory, False if not (default False)
- Type:
bool
- clear_image()¶
Clears the array data from main memory
- get_C()¶
- get_C_dud()¶
Returns an array of duds with the same dimensionality as C
Returns an array of duds with the same dimensionality as C but with all values set to the first choice. Used in training the regressor. If y_on_c is set to True, this replicates the Y array on the bottom rows of the array.
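A possible reading of the dud array is a C-shaped array in which every row is one-hot at index 0, with the Y rows copied onto the bottom when y_on_c is set. The sketch below follows that reading using nested lists; `c_dud` is a hypothetical helper and the exact layout in the library may differ.

```python
def c_dud(c_dim, y_rows=None):
    """Build a C-shaped array with every row set to the first choice.

    If y_rows is given, those rows overwrite the bottom rows of the array.
    """
    n_conf, n_choices = c_dim
    dud = [[1] + [0] * (n_choices - 1) for _ in range(n_conf)]
    if y_rows:
        if len(y_rows) > n_conf:
            raise ValueError("C_dim is too small to hold the Y rows")
        dud[n_conf - len(y_rows):] = [list(row) for row in y_rows]
    return dud


plain = c_dud((3, 3))
with_y = c_dud((3, 3), y_rows=[[0, 0, 1]])
```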
- get_X(augment=False)¶
Reads in and returns the image, with the option to augment
- get_X_files()¶
- get_Y()¶
- get_image_type()¶
Determines the type of image that self.filename is
- get_mem() float ¶
Estimates the memory of the larger objects stored in ImageRecord
- read_image()¶
- class multi_med_image_ml.Records.PatientRecord(pid, items)¶
Bases:
object
Returns text records, like medication history, of a given patient
- pid¶
Patient ID
- Type:
str
- get_record(item)¶
- get_records(confounds)¶
- class multi_med_image_ml.Records.Record(static_inputs=[], database=None)¶
Bases:
object
- get_ID()¶
- get_birth_date()¶
- get_exam_date()¶
- get_static_inputs()¶
Loads in static inputs from the database
- load_extra_info()¶
- multi_med_image_ml.Records.TextRecord(Record)¶
multi_med_image_ml.models module¶
- class multi_med_image_ml.models.AutoEncoder1D(input_dim, latent_dim=2, device='cpu')¶
Bases:
Module
- forward(x)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class multi_med_image_ml.models.Classifier(latent_dim, n_inputs, base_feat, n_out, n_labels)¶
Bases:
Module
- forward(x)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- parameters()¶
Returns an iterator over module parameters.
This is typically passed to an optimizer.
- Parameters:
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields:
Parameter – module parameter
Example:
>>> # xdoctest: +SKIP("undefined vars")
>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- training: bool¶
- class multi_med_image_ml.models.Decoder¶
Bases:
Module
- forward(x)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class multi_med_image_ml.models.Decoder1D(input_dim, output_dim, conv=False)¶
Bases:
Module
- forward(x)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class multi_med_image_ml.models.Encoder(latent_dim=512)¶
Bases:
Module
- forward(x)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- parameters()¶
Returns an iterator over module parameters.
This is typically passed to an optimizer.
- Parameters:
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields:
Parameter – module parameter
Example:
>>> # xdoctest: +SKIP("undefined vars")
>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- training: bool¶
- class multi_med_image_ml.models.Encoder1D(input_dim, output_dim, conv=False)¶
Bases:
Module
- forward(x)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class multi_med_image_ml.models.EnsembleModel(model_list)¶
Bases:
Module
- forward(input, hidden)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class multi_med_image_ml.models.MultiInputModule(Y_dim: tuple = (1, 32), C_dim: tuple = (16, 32), n_dyn_inputs: int = 14, n_stat_inputs: int = 2, use_attn: bool = False, encode_age: bool = False, variational: bool = False, zero_input: bool = False, remove_uncertain: bool = False, device=device(type='cpu'), latent_dim: int = 128, weights: str | None = None, grad_layer: int = 7)¶
Bases:
Module
Takes variable imaging and non-imaging data and outputs a prediction
Can take multiple, variable-sized images and static text inputs as input and output a label prediction, while also regressing confounds.
- encoder¶
Encodes input images to latent array
- Type:
nn.Module
- classifier¶
Takes multiple images encoded by the encoder and combines them into a single predictive value
- Type:
nn.Module
- regressor¶
Optional network that regresses confounds from the encoder’s latent representation using adversarial regression
- Type:
nn.Module
- Y_dim¶
A tuple indicating the dimension of the image’s label. The first number is the number of labels associated with the image and the second is the number of choices each label has. Extra choices will not affect the model, but too few will throw an error: if Y_dim is (1,2) and the label has three classes, it will crash, whereas (1,4) will just result in an output that is always zero. This should match the Y_dim parameter in the associated Records class (default (1,32))
- Type:
tuple
- C_dim¶
A tuple indicating the dimension of the image’s confounds. This effectively operates the same way as Y_dim, except the default number of confounds is higher (default (16,32))
- Type:
tuple
- n_dyn_inputs¶
The maximum number of images that can be passed in (default 14)
- Type:
int
- n_stat_inputs¶
The maximum number of text-based static inputs that can be input into the model (default 2)
- Type:
int
- encode_age¶
Encode the age of the patient on individual images prior to being input into the classifier (default False)
- Type:
bool
- device¶
GPU/CPU that the module is on (default: torch.device(‘cpu’))
- Type:
torch.device
- weights¶
Pretrained weight indicator. Weights automatically download if this is set. Default options must be in place or results are unpredictable. (default None)
- Type:
str
- latent_dim¶
Size of the intermediary representation that the encoder outputs and inputs into the classifier (default 128)
- Type:
int
- variational¶
Turns the encoding into a variational setup, a la a variational autoencoder, in which the encoding is sampled from a Gaussian distribution rather than a set array of numbers (default False)
- Type:
bool
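Sampling an encoding from a Gaussian is commonly done with the reparameterization used in variational autoencoders, z = mu + sigma * eps. The sketch below shows that step in plain Python; parameterizing the spread as a log-variance and the name `sample_encoding` are assumptions, since this reference does not say how the module represents the distribution.

```python
import math
import random


def sample_encoding(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), element by element."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


rng = random.Random(0)  # seeded generator for repeatable draws
mu = [0.5, -1.0]
# A very negative log-variance gives a tiny sigma, so z stays close to mu
z = sample_encoding(mu, log_var=[-50.0, -50.0], rng=rng)
```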
- remove_uncertain¶
UNIMPLEMENTED/UNTESTED. Experimental subroutine designed to remove from consideration encoded images that are a certain “distance” from the training set (default False)
- Type:
bool
- use_attn¶
UNIMPLEMENTED/UNTESTED. Adds an attention mechanism to the classifier (default False)
- Type:
bool
- num_training_samples¶
Number of training samples to sample for the uncertainty removal mechanism (default 300)
- Type:
int
- static_record¶
Set of static keys put into the model during training, to prevent unrecognized keys from being input during testing
- Type:
set
- activations_hook(grad)¶
- classifier_freeze()¶
- classifier_parameters()¶
- cpu()¶
Moves all model parameters and buffers to the CPU.
Note
This method modifies the module in-place.
- Returns:
self
- Return type:
Module
- cuda(device)¶
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
Note
This method modifies the module in-place.
- Parameters:
device (int, optional) – if specified, all parameters will be copied to that device
- Returns:
self
- Return type:
Module
- forward(x, static_input=None, dates=None, bdate=None, return_regress=False, return_encoded=False, encoded_input=False, grad_eval=False, record_encoding=False)¶
Puts image or BatchRecord through model and predicts a value.
- Parameters:
x (torch.Tensor or BatchRecord) – Image or BatchRecord that contains data to be predicted
static_input (list) – List of text to be input into model
dates (list[datetime.datetime]) – List of dates input in the model, when a BatchRecord is not input
bdate (datetime.datetime) – Patient birthdate, when a BatchRecord is not input
return_regress (bool) – If True, returns the confound prediction array as a second value (default False)
return_encoded (bool) – If True, returns the encoded values of the images (default False)
encoded_input (bool) – Indicator that X is input that’s already been encoded and can be put straight into the classifier (default False)
record_encoding (bool) – If set, saves the most recent encoding as a numpy file in the variable saved_encoding
- forward_ensemble(kwargs, n_ens=10)¶
- get_activations(x)¶
- get_activations_gradient()¶
- load_state_dict(state_dict, *args, **kwargs)¶
Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
- Parameters:
state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
- Returns:
missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys
- Return type:
NamedTuple with missing_keys and unexpected_keys fields
Note
If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.
- regressor_freeze()¶
- state_dict(*args, **kwargs)¶
Returns a dictionary containing references to the whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Parameters and buffers set to None are not included.
Note
The returned object is a shallow copy. It contains references to the module’s parameters and buffers.
Warning
Currently state_dict() also accepts positional arguments for destination, prefix and keep_vars in order. However, this is being deprecated and keyword arguments will be enforced in future releases.
Warning
Please avoid the use of argument destination as it is not designed for end-users.
- Parameters:
destination (dict, optional) – If provided, the state of module will be updated into the dict and the same object is returned. Otherwise, an OrderedDict will be created and returned. Default: None.
prefix (str, optional) – a prefix added to parameter and buffer names to compose the keys in state_dict. Default: ''.
keep_vars (bool, optional) – by default the Tensors returned in the state dict are detached from autograd. If it’s set to True, detaching will not be performed. Default: False.
- Returns:
a dictionary containing a whole state of the module
- Return type:
dict
Example:
>>> # xdoctest: +SKIP("undefined vars")
>>> module.state_dict().keys()
['bias', 'weight']
- training: bool¶
- class multi_med_image_ml.models.Regressor(latent_dim, n_confounds, n_choices, device='cpu')¶
Bases:
Module
- cpu()¶
Moves all model parameters and buffers to the CPU.
Note
This method modifies the module in-place.
- Returns:
self
- Return type:
Module
- cuda(device)¶
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
Note
This method modifies the module in-place.
- Parameters:
device (int, optional) – if specified, all parameters will be copied to that device
- Returns:
self
- Return type:
Module
- forward(x)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- load_state_dict(state_dict, *args, **kwargs)¶
Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.
- Parameters:
state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
- Returns:
missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys
- Return type:
NamedTuple with missing_keys and unexpected_keys fields
Note
If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.
- parameters()¶
Returns an iterator over module parameters.
This is typically passed to an optimizer.
- Parameters:
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
- Yields:
Parameter – module parameter
Example:
>>> # xdoctest: +SKIP("undefined vars")
>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
- state_dict(*args, **kwargs)¶
Returns a dictionary containing references to the whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Parameters and buffers set to None are not included.
Note
The returned object is a shallow copy. It contains references to the module’s parameters and buffers.
Warning
Currently state_dict() also accepts positional arguments for destination, prefix and keep_vars in order. However, this is being deprecated and keyword arguments will be enforced in future releases.
Warning
Please avoid the use of argument destination as it is not designed for end-users.
- Parameters:
destination (dict, optional) – If provided, the state of module will be updated into the dict and the same object is returned. Otherwise, an OrderedDict will be created and returned. Default: None.
prefix (str, optional) – a prefix added to parameter and buffer names to compose the keys in state_dict. Default: ''.
keep_vars (bool, optional) – by default the Tensors returned in the state dict are detached from autograd. If it’s set to True, detaching will not be performed. Default: False.
- Returns:
a dictionary containing a whole state of the module
- Return type:
dict
Example:
>>> # xdoctest: +SKIP("undefined vars")
>>> module.state_dict().keys()
['bias', 'weight']
- training: bool¶
- class multi_med_image_ml.models.Reshape(*target_shape)¶
Bases:
Module
Used in a nn.Sequential pipeline to reshape on the fly.
- forward(x)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
- class multi_med_image_ml.models.VAE(input_dim, latent_dim=2, device=device(type='cpu'))¶
Bases:
Module
- forward(x)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- reset_parameters()¶
- training: bool¶
- multi_med_image_ml.models.get_age_arr(age, max_age=120.0, d=512)¶
- multi_med_image_ml.models.get_age_encoding(date, birthdate, d=512)¶
- multi_med_image_ml.models.time_index(i, pos, d=512, c=10000)¶
multi_med_image_ml.utils module¶
- multi_med_image_ml.utils.YC_conv(Y, C, y_weight)¶
- multi_med_image_ml.utils.bucketize(arr, n_buckets)¶
- multi_med_image_ml.utils.check_key_to_filename(key_to_filename: Callable[[str, bool], str])¶
Verifies that the key to file name conversion method is working properly
This method is called to verify that a user-defined key-to-filename function is properly implemented, such that the function is able to convert an input path to a key forwards and backwards.
- multi_med_image_ml.utils.class_balance(classes, confounds, plim=0.05, recurse=True, exclude_none=True, unique_classes=None)¶
- multi_med_image_ml.utils.compile_dicom(dicom_folder: str, cache=True, db_builder=None, verbose=False)¶
Compiles a folder of DICOMs into a .nii.gz file and a .json file
Takes a folder of dicom files and turns it into a .nii.gz file, with metadata stored in a .json file. Relies on dcm2niix.
- Parameters:
dicom_folder (str) – The folder with DICOM files
cache (bool) – Whether to cache .npy files in the DICOM folder
db_builder (multi_med_image_ml.DataBaseWrapper.DataBaseWrapper) – Object that may optionally be input for building up the database
- multi_med_image_ml.utils.compile_dicom_folder(dicom_folder: str, db_builder=None)¶
Converts a folder of dicoms to a .nii.gz, with .json metadata
Uses dcm2niix, which has given the best results overall when converting DICOM to NIfTI, even though it has to be invoked as a system command. Uses pydicom as a backup. The resulting files are stored in the folder. Also takes a DataBaseWrapper object for building the database in real time.
- Parameters:
dicom_folder (str) – Folder of interest
db_builder (multi_med_image_ml.DataBaseWrapper.DataBaseWrapper) – Optional input for building up the database
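The dcm2niix step described above could be sketched roughly as follows. This is an illustration only, not the package's code: the helper names build_dcm2niix_command and convert_folder are hypothetical, and the flags shown (-z y for gzip output, -b y for a JSON sidecar, -o for the output directory) are standard dcm2niix options chosen for the example. The source does not say which flags the package actually passes.

```python
import shutil
import subprocess


def build_dcm2niix_command(dicom_folder, out_folder=None):
    """Build a dcm2niix command line (hypothetical helper, not package API).

    By default the output is written into the DICOM folder itself, matching
    the statement that the resulting files are stored in the folder.
    """
    out_folder = out_folder or dicom_folder
    return [
        "dcm2niix",
        "-z", "y",         # write compressed .nii.gz output
        "-b", "y",         # write a JSON sidecar with the metadata
        "-o", out_folder,  # output directory
        dicom_folder,      # input directory of DICOM files
    ]


def convert_folder(dicom_folder):
    """Run dcm2niix if it is installed; return None otherwise.

    A None result signals that the caller should use a fallback
    (the documentation mentions pydicom as the backup).
    """
    if shutil.which("dcm2niix") is None:
        return None
    cmd = build_dcm2niix_command(dicom_folder)
    return subprocess.run(cmd, capture_output=True, text=True, check=False)
```

Passing the command as a list rather than a shell string avoids quoting problems when folder names contain spaces.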
- multi_med_image_ml.utils.compile_dicom_py(dicom_folder: str)¶
- multi_med_image_ml.utils.date_sorter(folder, ext)¶
- multi_med_image_ml.utils.determine_random_partition(arr2d, labels)¶
- multi_med_image_ml.utils.determine_random_partition2(arr2d, labels)¶
- multi_med_image_ml.utils.diagnose_network(net, name='network')¶
Calculates and prints the mean of the average absolute gradients
- Parameters:
net (torch network) –
name (str) –
- multi_med_image_ml.utils.discretize_value(v, buckets)¶
- multi_med_image_ml.utils.download_file_from_google_drive(file_id: str, destination: str)¶
Downloads files from Google Drive
Downloads files from Google Drive and saves them to a destination.
- Parameters:
file_id (str) – ID in the Google Drive URL
destination (str) – Place to save the file to
- multi_med_image_ml.utils.download_weights(weights: str)¶
Downloads and caches pretrained model weights
Downloads model weights from Google Drive and stores them in the user’s cache for future use.
- Parameters:
weights (str) – String indicating which weights can be used.
- multi_med_image_ml.utils.encode_static_inputs(static_input, d=512)¶
- multi_med_image_ml.utils.equal_terms(term)¶
- multi_med_image_ml.utils.get_balanced_filename_list(test_variable, confounds_array, selection_ratios=[0.66, 0.16, 0.16], selection_limits=[inf, inf, inf], value_ranges=[], output_selection_savepath=None, test_value_ranges=None, get_all_test_set=False, total_size_limit=None, verbose=False, non_confound_value_ranges={}, database=None, n_buckets=10, patient_id_key=None)¶
- multi_med_image_ml.utils.get_class_selection(classes, primed, unique_classes=None)¶
- multi_med_image_ml.utils.get_confirm_token(response)¶
- multi_med_image_ml.utils.get_data_from_filenames(filename_list, test_variable=None, confounds=None, return_as_strs=False, unique_test_vals=None, database=None, return_choice_arr=False, dict_obj=None, return_as_dict=False, key_to_filename=None, X_encoder=None, vae_encoder=False, uniques=None, density_confound_sort=True, n_buckets=3)¶
- multi_med_image_ml.utils.get_dim_str(filename: str | None = None, X_dim: tuple | None = None, outtype: str = '.npy') str ¶
Converts an input filename to the filename of the cached .npy file
Given an input filename (e.g. /path/to/myfile.nii.gz) with a given dimension (e.g. (96,48,48)), converts the filepath to the cached version (e.g. /path/to/myfile_resized_96_48_48.npy). Perfect cube dimensions are annotated with a single number rather than three. If no filename is input, the string itself is returned (resized_96_48_48.npy).
- Parameters:
filename (str) – Name of the file to be converted (Default None)
X_dim (tuple) – Size that the image is going to be resized to (Default None)
outtype (str) –
- Returns:
String of the cached image file, or a string that can be added to a filename
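The naming rule described above can be sketched as follows. This is a re-creation from the description, not the package's implementation: dim_str_sketch is a hypothetical name, the handling of the .nii.gz double extension is an assumption, and the sketch requires X_dim even though the real function defaults it to None.

```python
import os


def dim_str_sketch(filename=None, X_dim=(96, 48, 48), outtype=".npy"):
    """Hypothetical re-creation of the cached-filename rule (not package code)."""
    # Perfect cubes are written as a single number, other shapes as three.
    if len(set(X_dim)) == 1:
        dims = str(X_dim[0])
    else:
        dims = "_".join(str(d) for d in X_dim)
    suffix = f"resized_{dims}{outtype}"

    # With no filename, only the suffix string is returned.
    if filename is None:
        return suffix

    # Assumption: strip the double extension .nii.gz as a unit,
    # otherwise strip a single extension.
    if filename.endswith(".nii.gz"):
        base = filename[: -len(".nii.gz")]
    else:
        base, _ = os.path.splitext(filename)
    return f"{base}_{suffix}"


print(dim_str_sketch("/path/to/myfile.nii.gz", (96, 48, 48)))
print(dim_str_sketch(X_dim=(96, 48, 48)))
```

The two calls reproduce the examples given in the description: /path/to/myfile_resized_96_48_48.npy and resized_96_48_48.npy.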
- multi_med_image_ml.utils.get_file_list(obj, allow_list_of_list: str = True, db_builder=None)¶
Searches a folder tree for all applicable images.
Uses the os.walk method to search a folder tree and returns a list of image files. Relies on get_file_list_from_str and get_file_list_from_list to do so. Takes in a DataBaseWrapper (db_builder) to build up a pandas dataframe during the search.
- Parameters:
obj (list or str) – List or string of interest
allow_list_of_list (str) – Allows lists of lists to be parsed
db_builder (multi_med_image_ml.DataBaseWrapper.DataBaseWrapper) – Optional input to allow the database to be built up
- multi_med_image_ml.utils.get_file_list_from_list(obj, allow_list_of_list=True, db_builder=None)¶
- multi_med_image_ml.utils.get_file_list_from_str(obj, db_builder=None)¶
- multi_med_image_ml.utils.get_first_n_primes(n)¶
- multi_med_image_ml.utils.get_lr(optimizer)¶
- multi_med_image_ml.utils.get_multilabel_acc(y_pred, Y)¶
- multi_med_image_ml.utils.get_none_array(classes=None, confounds=None)¶
- multi_med_image_ml.utils.get_prime_form(confounds, n_buckets, sorted_confounds=None)¶
- multi_med_image_ml.utils.integrate_arrs(S1, S2)¶
- multi_med_image_ml.utils.integrate_arrs_none(S1, S2)¶
- multi_med_image_ml.utils.is_dicom(filename)¶
Determines if file is dicom
- multi_med_image_ml.utils.is_float(N)¶
- multi_med_image_ml.utils.is_image_file(filename: str) bool ¶
Determines if input file is medical image
Determines if the input is an applicable image file. Excludes temporary files.
- Parameters:
filename (str) – Path to file
- Returns:
bool
- multi_med_image_ml.utils.is_list_str(s)¶
- multi_med_image_ml.utils.is_nan(k, inc_null_str=False)¶
- multi_med_image_ml.utils.key_to_filename_default(filename: str, reverse: bool = False) str ¶
Default function for converting a pandas key to a filename
This function can be replaced by a more elaborate one that is able to convert the location of a .npy file on a filesystem to a lookup key in a pandas dataframe. By default, the file path is the key.
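A replacement function might look like the sketch below. Everything in it is illustrative: DATA_ROOT, key_to_filename_custom, and round_trips are hypothetical names, and the direction convention (the forward call maps a key to a file path, reverse=True maps a path back to a key) is an assumption, since the documentation does not spell it out. The round-trip check mirrors the property that check_key_to_filename is described as verifying.

```python
import posixpath

# Hypothetical root directory under which cached .npy files live.
DATA_ROOT = "/data/scans"


def key_to_filename_custom(name: str, reverse: bool = False) -> str:
    """Map a database key to a cached .npy path, or back (illustrative only)."""
    if reverse:
        # Path -> key: drop the root directory and the .npy extension.
        rel = name[len(DATA_ROOT) + 1:] if name.startswith(DATA_ROOT + "/") else name
        return rel[: -len(".npy")] if rel.endswith(".npy") else rel
    # Key -> path: place the key under the root and add the extension.
    return posixpath.join(DATA_ROOT, name + ".npy")


def round_trips(fn, key: str) -> bool:
    """Check that converting forwards and then backwards recovers the key."""
    return fn(fn(key), reverse=True) == key


path = key_to_filename_custom("patient01/scan_a")
print(path)
print(round_trips(key_to_filename_custom, "patient01/scan_a"))
```

A function that fails the round trip would make it impossible to find the database row for a given cached file, which is presumably why the package validates user-supplied functions.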
- multi_med_image_ml.utils.label_to_community(labels)¶
- multi_med_image_ml.utils.list_to_str(val)¶
- multi_med_image_ml.utils.mod_meas(arr2d, labels)¶
- multi_med_image_ml.utils.multi_mannwhitneyu(arr)¶
- multi_med_image_ml.utils.nifti_to_np(nifti_filepath, X_dim)¶
- multi_med_image_ml.utils.not_temp(filename)¶
- multi_med_image_ml.utils.output_test(args, test_val_ranges, output_results, test_predictions_file, mucran, database, X_files=None, return_Xfiles=False)¶
- multi_med_image_ml.utils.parsedate(d, date_format='%Y-%m-%d %H:%M:%S')¶
- multi_med_image_ml.utils.prime(i, primes)¶
- multi_med_image_ml.utils.print_numpy(x, val=True, shp=False)¶
Print the mean, min, max, median, std, and size of a numpy array
- Parameters:
val (bool) –
shp (bool) –
- multi_med_image_ml.utils.recompute_selection_ratios(selection_ratios, selection_limits, N)¶
- multi_med_image_ml.utils.resize_np(nifti_data, dim)¶
- multi_med_image_ml.utils.save_image(image_numpy, image_path, aspect_ratio=1.0)¶
Save a numpy image to the disk
- Parameters:
image_numpy (numpy array) –
image_path (str) –
- multi_med_image_ml.utils.save_response_content(response, destination)¶
- multi_med_image_ml.utils.separate_set(selections, set_divisions=[0.5, 0.5], IDs=None)¶
- multi_med_image_ml.utils.str_to_list(s, nospace=False)¶
- multi_med_image_ml.utils.tensor2im(input_image, imtype=<class 'numpy.uint8'>)¶
Converts a Tensor array into a numpy image array.
- Parameters:
input_image (tensor) –
imtype (type) –
- multi_med_image_ml.utils.test_all(classes, confounds)¶
- multi_med_image_ml.utils.text_to_bin(text, n_bin=32, d=512)¶
Encodes strings as binary arrays
- multi_med_image_ml.utils.validate_database(database, args)¶