multi_med_image_ml package

Submodules

multi_med_image_ml.DataBaseWrapper module

class multi_med_image_ml.DataBaseWrapper.DataBaseWrapper(database=None, filename=None, labels=[], confounds=[], X_dim=None, key_to_filename=<function key_to_filename_default>, val_ranges={}, precedence=[])

Bases: object

Wrapper for Pandas table to cache some common and repeated functions

DataBaseWrapper stores a pandas dataframe that contains metadata about the images being read in. It also builds up this dataframe in real time if only DICOM or NIFTI/JSON files are present. One purpose is to translate tokenized values in the dataframe to one-hot vectors that can be read by a DL model as quickly as possible. Another purpose is to keep track of cached image files of a particular size.

database

The internal Pandas dataframe (default None)

Type:

pandas.DataFrame

filename

The filename of the Pandas Dataframe pickle (default None)

Type:

str

labels

Labels that are read in by the current MedImageLoader (default [])

Type:

list

confounds

Confound names that are read in by the current MedImageLoader (default [])

Type:

list

val_ranges

Ranges of values that can be returned, keyed by variable name. See MedImageLoader (default {})

Type:

dict

X_dim

Image dimensions (default None)

Type:

tuple

key_to_filename

Function to translate a key to a filename and back (default key_to_filename_default)

Type:

callback

jdict

Dictionary of values read from JSON files that are accumulated and periodically merged with the pandas DataFrame (database) as it is built up. Used as an intermediary to prevent excessive fragmentation of the DataFrame (default {})

Type:

dict
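The buffering idea behind jdict can be illustrated with a minimal, standard-library sketch. The class below is hypothetical and is not part of multi_med_image_ml: a plain dict stands in for the DataFrame, and the flush interval is an assumed parameter. It only shows why collecting records first and merging them in bulk keeps the number of merge operations low.

```python
class BufferedTable:
    """Toy stand-in for a table that is expensive to grow row by row."""

    def __init__(self, flush_every=3):
        self.rows = {}       # plays the role of the DataFrame
        self.pending = {}    # plays the role of jdict
        self.flush_every = flush_every
        self.n_merges = 0

    def add(self, key, record):
        # Records are collected in the buffer first.
        self.pending[key] = record
        if len(self.pending) >= self.flush_every:
            self.flush()

    def flush(self):
        # One bulk merge instead of many single-row insertions.
        if self.pending:
            self.rows.update(self.pending)
            self.pending = {}
            self.n_merges += 1


table = BufferedTable(flush_every=3)
for i in range(7):
    table.add(f"scan_{i}", {"PatientID": f"p{i % 2}"})
table.flush()  # merge whatever is left before reading the table
```

Seven records are merged in three bulk operations rather than seven individual ones.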

columns

Columns of the database. Used for quick reference.

Type:

set

add_json(nifti_file, json_file=None)

Adds a JSON from a DICOM file to the internal jdict, which will later be compiled into the DataFrame

Parameters:
  • nifti_file (str) – Nifti (.nii.gz) file that was output by the DICOM preprocessing software

  • json_file (str) – JSON file that was output by the DICOM preprocessing software. If not present, searches for the JSON file in the same directory as the input nifti file.

build_metadata()

Builds internal lookup tables used to convert continuous and label-based variables to one-hot vectors that can be read by an ML model.
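As a rough illustration of what such a lookup table does for label-based variables, the sketch below maps each distinct token to an index and then to a one-hot vector. The helper names and the example column are assumptions made for illustration; the library's internal tables may be organized differently, and continuous variables (which would need binning first) are not covered here.

```python
def build_lookup(values):
    """Map each distinct value to a stable integer index (sorted order)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def one_hot(value, lookup, n_choices):
    """Return a one-hot list of length n_choices for a tokenized value."""
    if len(lookup) > n_choices:
        raise ValueError("more distinct values than available choices")
    vec = [0] * n_choices
    vec[lookup[value]] = 1
    return vec


# Hypothetical metadata column; the values are illustrative only.
sex_column = ["MALE", "FEMALE", "FEMALE", "MALE"]
sex_lookup = build_lookup(sex_column)
encoded = [one_hot(v, sex_lookup, n_choices=4) for v in sex_column]
```

Building the lookup once and reusing it avoids rescanning the column for every image.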

get_ID(npy_file: str) str

Returns the Patient ID, if present in the database. Attempts to guess it using the keys ‘PatientID’ and ‘Patient ID’

Parameters:

npy_file (str) – Cached numpy file of the image

Returns:

ID of the patient in question

Return type:

id (str)

get_birth_date(npy_file: str) date
get_confound_encode(npy_file: str) list

Returns list of integers that represent the confounds of a given input file

Parameters:

npy_file (str) – Numpy file of the given record, which is converted into a key

Returns:

A list of integers indicating the nth confound of the datapoint

Return type:

cnum_list (list)

get_exam_date(npy_file: str) date
get_file_list()
get_label_encode(npy_file: str)

Returns list of integers that represent the labels of a given input file

Parameters:

npy_file (str) – Numpy file of the given record, which is converted into a key

Returns:

A list of integers indicating the nth label of the datapoint

Return type:

cnum_list (list)

has_im(im: ImageRecord) bool

Determines if the DataBaseWrapper contains a given ImageRecord

Parameters:

im (ImageRecord) – Input image

Returns:

bool - Whether image is present in database

in_val_ranges(fkey: str) bool

Determines if the input fkey is in a valid value range

Parameters:

fkey (str) – Lookup key to the pandas DataFrame

Returns:

bool - Whether the key is present

loc_val(npy_file, c)
out_dataframe(fkey_ass=None)

Merges the jdict with the dataframe and outputs it.

parse_date(d, date_format='%Y-%m-%d %H:%M:%S')

Parses the date string
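A minimal sketch of parsing with the default format shown in the signature, using only datetime.strptime from the standard library. The function name is hypothetical, and returning None on a mismatch is an assumption; the library method may handle failures differently.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # same default as in the signature above


def parse_date_sketch(d, date_format=DATE_FORMAT):
    """Parse a date string, returning None if it does not match the format."""
    try:
        return datetime.strptime(d, date_format)
    except (TypeError, ValueError):
        # Assumption: unparseable input is reported as None rather than raised.
        return None


parsed = parse_date_sketch("2021-03-04 10:15:00")
unparsed = parse_date_sketch("04/03/2021")
```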

stack_list_by_label(filename_list, label)

Reorganizes an input filename list by the values of the given label. For example, given a list of filenames of men and women with sex as the input label, it returns two lists, one of men and one of women.

Parameters:
  • filename_list (list) – List of filenames

  • label (str) – Label to be reorganized by. Must be one of the confounds or labels in the DataBaseWrapper.

Returns:

List of separate filename lists

Return type:

filename_list_stack (list[list])
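The regrouping described above can be sketched as follows. Here label_of is a hypothetical dict from filename to label value; in the library that lookup would go through the database, so this is only an illustration of the output shape.

```python
from collections import defaultdict


def stack_by_label(filename_list, label_of):
    """Group filenames into one list per distinct label value."""
    groups = defaultdict(list)
    for fname in filename_list:
        groups[label_of[fname]].append(fname)
    # Sort by label value so the output order is deterministic.
    return [groups[k] for k in sorted(groups)]


# Hypothetical filenames and labels.
label_of = {"a.npy": "MALE", "b.npy": "FEMALE", "c.npy": "MALE"}
stacked = stack_by_label(["a.npy", "b.npy", "c.npy"], label_of)
```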

multi_med_image_ml.MedImageLoader module

class multi_med_image_ml.MedImageLoader.MedImageLoader(*image_folders, pandas_cache='./pandas/', cache=True, key_to_filename=<function key_to_filename_default>, batch_by_pid=False, file_record_name=None, database=None, batch_size=14, X_dim=(96, 96, 96), get_encoded=False, static_inputs=None, confounds=[], match_confounds=[], label=[], augment=True, val_ranges={}, dtype='torch', Y_dim=(1, 32), C_dim=(16, 32), return_obj=False, channels_first=True, recycle=True, gpu_ids='', save_ram=True, precedence=[], n_dyn_inputs=14)

Bases: object

Loads medical images into a format that may be used by MultiInputModule.

This loader preprocesses, reshapes, augments, and batches images and metadata into a format that may be read by MultiInputModule. It additionally may apply a data matching algorithm to ensure that no overly confounded data is fed into the model during training. It is capable of maintaining different lists of images to balance classes for both the classifier and regressor.

database

Object used to store and access metadata about particular files. MedImageLoader builds this automatically from a folder, or it can read from one directly if it’s already been built (default None)

Type:

DataBaseWrapper

X_dim

Three-tuple dimension to which the images will be resized upon output (default (96,96,96))

Type:

tuple

Y_dim

A tuple indicating the dimension of the image’s label. The first number is the number of labels associated with the image and the second is the number of choices each label has. Extra choices will not affect the model, but too few will throw an error: if Y_dim is (1,2) and the label has three classes, it will crash, whereas (1,4) will just result in an extra output that is always zero. This should match the Y_dim parameter in the associated MultiInputModule (default (1,32))

Type:

tuple
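To make the Y_dim constraint concrete, here is a small sketch that encodes integer class indices into a grid of shape Y_dim. The function is hypothetical and uses plain lists instead of arrays; it only demonstrates why too few choices must fail while spare choices are harmless.

```python
def encode_labels(label_indices, y_dim):
    """Encode integer class indices into a y_dim[0] x y_dim[1] one-hot grid."""
    n_labels, n_choices = y_dim
    if len(label_indices) > n_labels:
        raise ValueError("more labels than rows in Y_dim")
    grid = [[0] * n_choices for _ in range(n_labels)]
    for row, idx in enumerate(label_indices):
        if idx >= n_choices:
            raise ValueError(f"class index {idx} does not fit in {n_choices} choices")
        grid[row][idx] = 1
    return grid


y = encode_labels([2], (1, 4))    # three classes fit in four choices
try:
    encode_labels([2], (1, 2))    # three classes do not fit in two
    too_small_failed = False
except ValueError:
    too_small_failed = True
```

With (1,4), the unused fourth column simply stays zero for every image.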

C_dim

A tuple indicating the dimension of the image’s confounds. This effectively operates the same way as Y_dim, except the default number of confounds is higher (default (16,32))

Type:

tuple

batch_size

Max number of images that can be read in per batch. Note that if batch_by_pid is True, this is the maximum number of images that can be read in, and it’s best to set it to the same value as n_dyn_inputs in MultiInputModule (default 14)

Type:

int

augment

Whether to augment images during training. Note that this only works if the images are returned directly (i.e. return_obj = False). Otherwise images are augmented when get_X is called from ImageRecord (default True)

Type:

bool

dtype

Type of image to be returned – either “torch” or “numpy” (default “torch”)

Type:

str

label

List of labels that will be read in from DataBaseWrapper to the Y output. Must be smaller than the first value of Y_dim.

Type:

list

confounds

List of confounds that will be read in from DataBaseWrapper to the C output. Must be smaller than the first value of C_dim.

Type:

list

pandas_cache

Directory in which the database pandas file is stored

Type:

str

cache

Whether to cache images of a particular dimension as .npy files, for faster reading and indexing in the database (default True)

Type:

bool

key_to_filename

Function that translates a key to the DataBaseWrapper into a full filepath from which an image may be read. Needs to accept an additional parameter to reverse this as well (default key_to_filename_default)

Type:

callback
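A custom callback with the (fkey, reverse=False) shape might look like the sketch below. The root directory, the ".npy" suffix, and the key layout are all assumptions chosen for illustration; they do not describe what key_to_filename_default actually does.

```python
import posixpath

DATA_ROOT = "/data/scans"  # hypothetical root directory


def my_key_to_filename(fkey, reverse=False):
    """Translate a database key to a file path, or back when reverse=True."""
    if reverse:
        # Path -> key: strip the root directory and the extension.
        rel = posixpath.relpath(fkey, DATA_ROOT)
        return rel[: -len(".npy")] if rel.endswith(".npy") else rel
    # Key -> path: prepend the root directory and add the extension.
    return posixpath.join(DATA_ROOT, fkey + ".npy")


path = my_key_to_filename("patient01/scan03")
key = my_key_to_filename(path, reverse=True)
```

The important property is the round trip: reversing a generated path should give back the original key.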

batch_by_pid

Whether to batch images together by their Patient ID in a BatchRecord or not (default False)

Type:

bool

file_record_name

Path of the record of files that were read in by the MedImageLoader, if it needs to be examined later (default None)

Type:

str

channels_first

Whether to put channels in the first or last dimension of images (default True)

Type:

bool

save_ram

Clears images from ImageRecords and applies garbage collection frequently to save RAM. Useful for very large datasets (default True)

Type:

bool

static_inputs

List of variables from DataBaseWrapper that will be input as static, per-patient text inputs (like Sex or Ethnicity) to the MultiInputModule (default None)

Type:

list

val_ranges

Dictionary that may be used to indicate ranges of values that may be loaded in. So, if you want to only study males, val_ranges could be {‘SexDSC’:‘MALE’}, and if you only wanted to study people between ages 30 and 60, val_ranges could be {‘Ages’:(30,60)}; these can be combined, too. Note that ‘Ages’ and ‘SexDSC’ must be present in DataBaseWrapper as metadata variable names for this to work (default {})

Type:

dict
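The filtering that val_ranges describes can be sketched as a simple predicate over metadata records. This is a hypothetical helper, not the library's in_val_ranges; treating tuple bounds as inclusive is an assumption.

```python
def in_ranges(record, val_ranges):
    """Return True if a metadata record satisfies every entry in val_ranges."""
    for key, allowed in val_ranges.items():
        if key not in record:
            return False
        value = record[key]
        if isinstance(allowed, tuple):   # (low, high) numeric range
            low, high = allowed
            if not (low <= value <= high):
                return False
        elif value != allowed:           # exact match on a token
            return False
    return True


# Hypothetical metadata records.
records = [
    {"SexDSC": "MALE", "Ages": 45},
    {"SexDSC": "FEMALE", "Ages": 50},
    {"SexDSC": "MALE", "Ages": 72},
]
val_ranges = {"SexDSC": "MALE", "Ages": (30, 60)}
kept = [r for r in records if in_ranges(r, val_ranges)]
```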

match_confounds

Used to apply data matching between the labels. So, if you wanted to distinguish between AD and Controls and wanted to match by age, match_confounds could be set to [‘Ages’] and this would only return sets of AD and Control of the same age ranges. Note that this may severely limit the dataset or even return nothing if the match_confound variable and the label variable are mutually exclusive (default [])

Type:

list
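One simple way to match groups on a confound is to bin the confound and keep equal numbers of each label per bin. The sketch below takes that approach; the bin width, field names, and the matching strategy itself are assumptions for illustration and may differ from the algorithm the loader applies.

```python
from collections import defaultdict


def match_by_confound(records, label_key, confound_key, bin_width=10):
    """Keep equal numbers of each label within every confound bin."""
    bins = defaultdict(lambda: defaultdict(list))
    for r in records:
        bins[r[confound_key] // bin_width][r[label_key]].append(r)
    labels = sorted({r[label_key] for r in records})
    matched = []
    for b in sorted(bins):
        by_label = bins[b]
        # A bin that lacks any one label contributes nothing.
        n = min(len(by_label[lab]) for lab in labels)
        for lab in labels:
            matched.extend(by_label[lab][:n])
    return matched


# Hypothetical records: two AD and one Control in their 70s, one Control in their 30s.
records = [
    {"id": 1, "Dx": "AD", "Ages": 71},
    {"id": 2, "Dx": "AD", "Ages": 75},
    {"id": 3, "Dx": "Control", "Ages": 78},
    {"id": 4, "Dx": "Control", "Ages": 34},
]
matched_ids = [r["id"] for r in match_by_confound(records, "Dx", "Ages")]
```

Only one AD and one Control record survive, which shows how matching can shrink a dataset considerably.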

all_records

Cache that stores ImageRecords and clears their images when the images held in main memory take up too much space.

Type:

multi_med_image_ml.Records.AllRecords

n_dyn_inputs

Max number of inputs of the ML model, to be passed into BatchRecord when it’s used as a patient record (default 14)

Type:

int

precedence

Because labeling is by image in the database and diagnosis is by patient, this option allows “precedence” in labeling when assigning an overall label to a patient. So, if a patient has three images, two marked as “Healthy” and one marked as “Alzheimer’s”, you can pass “[Alzheimer’s,Healthy]” into precedence and it would assign the whole patient the “Alzheimer’s” label (default [])

Type:

list
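The precedence rule can be sketched as a first-match search over the ordered list. The function name and the fallback for labels that do not appear in the precedence list are assumptions.

```python
def patient_label(image_labels, precedence):
    """Pick one label for a patient from the labels of their images."""
    present = set(image_labels)
    for label in precedence:   # earlier entries win
        if label in present:
            return label
    # Fallback (an assumption): use the first image's label, if any.
    return image_labels[0] if image_labels else None


labels = ["Healthy", "Healthy", "Alzheimer's"]
overall = patient_label(labels, ["Alzheimer's", "Healthy"])
```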

build_pandas_database()

Builds up the entire Pandas DataFrame from the filesystem in one go. May take a while.

get_file_list()
load_image_stack()

Loads a stack of images to an internal queue

name()
read_record()
record(flist, index=None)
rotate_labels(zero_list_addendum=None)
switch_stack()
tl()

Top label

multi_med_image_ml.MedImageLoader.key_to_filename_default(fkey, reverse=False)

multi_med_image_ml.MultiInputTester module

class multi_med_image_ml.MultiInputTester.MultiInputTester(database, model, out_record_folder: str | None = None, checkpoint_dir: str | None = None, verbose: bool = False, name: str = 'experiment_name', test_name: str = '', database_key: str = 'ProtocolNameSimplified', min_pids: int = 1, top_not_mean: bool = False, include_inds: list = [0, 1], same_patients: bool = False, x_axis_opts: str = 'images')

Bases: object

Used for testing the outputs of MultiInputModule.

MultiInputTester abstracts many of the functions for testing DL models, including grad cam and group AUC outputs.

database

Associated database for testing

Type:

DataBaseWrapper

model

Model to be tested

Type:

MultiInputModule

out_record_folder

Folder to output results (default None)

Type:

str

checkpoint_dir

Folder that has model checkpoints (default None)

Type:

str

name

Name of the model to be tested (default ‘experiment_name’)

Type:

str

test_name

The name of the experiment (default “”)

Type:

str

database_key

Variable used when grouping data together for AUROC analysis

Type:

str

min_pids

(default 1)

Type:

int

top_not_mean

Given multiple AUC output files, this will select one randomly instead of coming up with the mean prediction of all of them

Type:

bool

include_inds

(default [0,1])

Type:

list

same_patients

If true, only plots AUC/Accuracy for patients that are equally divided between groups (default False)

Type:

bool

x_axis_opts

Whether the X axis of the plot should be “images”, “patients”, or “images_per_patient” (default: “images”)

Type:

str

acc(database_key=None, opt=None, divides=None, same_pids_across_groups=False)
auc(ind=0, database_key=None, opt=None, divides=None, same_pids_across_groups=False)
grad_cam(pr: BatchRecord, add_symlink: bool = True, grad_layer: int = 7) Tensor

Outputs a gradient class activation map for the input record

Parameters:
  • pr (BatchRecord) – Image batch to apply Grad-Cam to

  • add_symlink (bool) – If true, adds a symbolic link to the original image in the same folder as the grad-cam is stored in (default True)

  • grad_layer (int) – (default 7)

loop(pr: BatchRecord)

Tests one input and saves it.

Parameters:

pr (BatchRecord) – Image batch

plot(ind=0, x_axis_opts='images', acc_or_auc='auc', database_key=None, opt=None, divides=None, same_pids_across_groups=False)
read_json()

Reads all json files output by MultiInputTester.

multi_med_image_ml.MultiInputTrainer module

class multi_med_image_ml.MultiInputTrainer.MultiInputTrainer(model, lr=1e-05, betas=(0.5, 0.999), loss_function=MSELoss(), batch_size=64, regress=True, loss_image_dir=None, checkpoint_dir=None, name='experiment_name', verbose=False, save_latest_freq=100)

Bases: object

Used to train MultiInputModule.

MultiInputModule requires an adversarial technique to train it, and the various data queueing techniques used get a bit complicated, so this class is used to abstract all of that.

model

Input model to train

Type:

MultiInputModule

lr

Learning rate

Type:

float

loss_function

PyTorch loss function. MSE loss is used instead of cross entropy because it is smoother and tends to work a bit better with the adversarial function, but this can be tested further (default nn.MSELoss)

name

Name of the model, which is used for saving checkpoints and output graphs (default ‘experiment_name’)

Type:

str

optimizer

Adam optimizer for the encoder/classifier. Incentivized to classify by the true label and set the regressor to the same values.

Type:

torch.optim

optimizer_reg

Adam optimizer for the encoder/regressor. Incentivized to detect confounds from each individual image.

Type:

torch.optim

loss_image_dir

If set, outputs images of the loss function for the optimizers over time, for the classifier, the regressor, and the adversarial loss (default None)

Type:

str

checkpoint_dir

If set, saves the model and optimizer state (default None)

Type:

str

save_latest_freq

Number of iterations before it saves the loss image and the checkpoint (default 100)

Type:

int

batch_size

Batch size of the training. Due to the optional-input nature, this cannot be set in the dataloader. Only one set of images can be passed through the loop at a given time. batch_size is how frequently the backpropagation algorithm is applied after graphs have accumulated (default 64)

Type:

int
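The accumulate-then-update pattern that batch_size describes can be shown on a toy problem without any deep learning framework. The sketch below fits a single weight with hand-written gradients; it is an assumed analogy for accumulating graphs and stepping the optimizer every batch_size iterations, not the trainer's actual code.

```python
def train(samples, batch_size, lr=0.1):
    """Fit y = w * x by accumulating gradients over batch_size samples."""
    w = 0.0
    grad_sum = 0.0
    n_updates = 0
    for i, (x, y) in enumerate(samples, start=1):
        # One sample passes through the loop at a time.
        grad_sum += 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
        if i % batch_size == 0:
            w -= lr * grad_sum / batch_size   # apply the accumulated update
            grad_sum = 0.0
            n_updates += 1
    return w, n_updates


samples = [(1.0, 2.0)] * 64
w, n_updates = train(samples, batch_size=16)
```

Sixty-four single-sample passes produce only four weight updates, one per accumulated batch.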

verbose

Whether to print (default False)

Type:

bool

one_step

Boolean to determine whether optimizer (True) or optimizer_reg (False) is applied

Type:

bool

index

Counts the number of iterations the trainer has gone through

Type:

int

loop(pr: BatchRecord, dataloader=None)

Loops a single BatchRecord through one iteration

Loops a BatchRecord through one iteration. Also switches the queues of the MedImageLoader as it switches between optimizers.

Parameters:
test()

multi_med_image_ml.Records module

class multi_med_image_ml.Records.AllRecords

Bases: object

Contains a dictionary of BatchRecord

Used to both prevent duplicate data from being called and to be able to clear all images from main memory and perform garbage collection when necessary.

image_dict

Dictionary of ImageRecord, mapped by their given filename

Type:

dict

mem_limit

Limit of memory that can be read into RAM

Type:

int

obj_size

Average size of an object given the image dimension of the dataloader

Type:

int

cur_mem

Count of current memory read in (TODO)

Type:

int

add(filename: str, im: ImageRecord)
check_mem()
clear_images()
get(filename: str)
get_mem()
has(filename: str)
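A minimal sketch of how a record cache with a memory limit might behave, using byte strings as stand-in images. The class and its byte-counting policy are hypothetical; the method names mirror those listed above only to make the analogy easier to follow.

```python
class RecordCache:
    """Toy cache that drops image payloads once a memory budget is exceeded."""

    def __init__(self, mem_limit):
        self.mem_limit = mem_limit
        self.image_dict = {}   # filename -> {"image": bytes or None}

    def add(self, filename, image):
        if filename not in self.image_dict:   # avoid duplicate records
            self.image_dict[filename] = {"image": image}
        self.check_mem()

    def get_mem(self):
        return sum(len(r["image"]) for r in self.image_dict.values()
                   if r["image"] is not None)

    def check_mem(self):
        if self.get_mem() > self.mem_limit:
            self.clear_images()

    def clear_images(self):
        for r in self.image_dict.values():
            r["image"] = None   # keep the record, free the payload


cache = RecordCache(mem_limit=10)
cache.add("a.npy", b"12345")
cache.add("b.npy", b"12345")
mem_before = cache.get_mem()   # 10 bytes, still within the limit
cache.add("c.npy", b"1")       # 11 bytes exceeds the limit, payloads are cleared
mem_after = cache.get_mem()
```

The records themselves remain in the dictionary, so images can be reloaded later without creating duplicates.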
class multi_med_image_ml.Records.BatchRecord(image_records, dtype='torch', sort=True, batch_by_pid=False, channels_first=True, gpu_ids='', batch_size=14, get_text_records=False)

Bases: object

Class that stores batches of ImageRecord

BatchRecord essentially abstracts lists of ImageRecord so that it returns them in batches. It is also used to store patient data for instances in which patients have multiple images.

image_records

List of ImageRecord classes

Type:

list

dtype

Type to be returned, either “torch” or “numpy” (default “torch”)

Type:

str

gpu_ids

GPU, if any, to which the images are read out (default “”)

Type:

list

channels_first

Whether channels in the images are the first or last dimension (default True)

Type:

bool

batch_size

The maximum number of images that may be returned in an instance of get_X (default 14)

Type:

int

get_C()
get_C_dud()
get_X(augment=False)
get_X_files()
get_Y()
get_birth_dates()
get_exam_dates()
get_static_inputs()
get_text_records()
name()
class multi_med_image_ml.Records.FileLookup(filename=None, npy_name=None, fkey=None)

Bases: object

file()
key()
npy_file()
class multi_med_image_ml.Records.ImageRecord(filename: str, static_inputs=[], database=None, X_dim: tuple = (96, 96, 96), dtype: str = 'torch', extra_info_list: list | None = None, y_on_c: bool = True, cache: bool = True, Y_dim: tuple = (1, 32), C_dim: tuple = (16, 32), y_nums: list | None = None, c_nums: list | None = None)

Bases: Record

A class used to represent an abstraction of an image for MedImageLoader.

ImageRecord is used to keep and organize a given image in main memory. The same image may be represented on the file system as a nifti, dicom, or an npy file, which caches the file at a particular size. This reads in the file without creating duplicates. The image may also be cleared or read in in real time to avoid having the images take up too much space in main memory.

filename

Filename of the image

Type:

str

database

Object used to quickly look up metadata associated with the image (default None)

Type:

DataBaseWrapper

dtype

Type of output (either “torch” or “numpy”) (default “torch”)

Type:

str

extra_info_list
Type:

list

X_dim

Standard dimension that the image will be resized to upon returning it (default (96,96,96))

Type:

tuple

Y_dim

A tuple indicating the dimension of the image’s label. The first number is the number of labels associated with the image and the second is the number of choices each label has. Extra choices will not affect the model, but too few will throw an error: if Y_dim is (1,2) and the label has three classes, it will crash, whereas (1,4) will just result in an extra output that is always zero. This should match the Y_dim parameter in the associated MultiInputModule (default (1,32))

Type:

tuple

C_dim

A tuple indicating the dimension of the image’s confounds. This effectively operates the same way as Y_dim, except the default number of confounds is higher (default (16,32))

Type:

tuple

image

Variable containing the actual image, at size X_dim. It may be None, to save memory (default None)

Type:

Numpy array

Y

Variable containing the encoding of the image label(s), at size Y_dim

Type:

Numpy array

C

Variable containing the encoding of the image confound(s), at size C_dim

Type:

Numpy array

y_on_c

If true, replicates the Y array on the bottom of all C arrays. Used for regression training. C_dim must be large enough to accommodate the extra Y array or it will crash. (default True)

Type:

bool
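The layout implied by y_on_c can be sketched with nested lists: confound rows at the top of the C grid, label rows copied to the bottom. The helper below is hypothetical and assumes every row already has C_dim[1] entries.

```python
def stack_y_on_c(c_rows, y_rows, c_dim):
    """Place confound rows at the top and label rows at the bottom of C."""
    n_rows, n_choices = c_dim
    if len(c_rows) + len(y_rows) > n_rows:
        raise ValueError("C_dim is too small to hold the confounds plus Y")
    grid = [[0] * n_choices for _ in range(n_rows)]
    for i, row in enumerate(c_rows):
        grid[i] = list(row)
    for j, row in enumerate(y_rows):
        grid[n_rows - len(y_rows) + j] = list(row)
    return grid


c_rows = [[1, 0, 0], [0, 1, 0]]   # two one-hot confounds
y_rows = [[0, 0, 1]]              # one one-hot label
c = stack_y_on_c(c_rows, y_rows, c_dim=(4, 3))
```

If the confounds and labels together need more rows than C_dim provides, the sketch raises an error, which mirrors the crash the description warns about.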

times_called

Counter to count the number of times get_X is called (default 0)

Type:

int

static_inputs

A list of values that may be called to put into the model as text (e.g. “SEX”, “AGE”)

Type:

list

static_input_res

The values once they’re looked up from the database (e.g. “MALE”, “22”)

Type:

list

cache

If true, caches the image file as a .npy array. Takes up extra space but it’s recommended. (default True)

Type:

bool

cached_record

Path of the cached record

Type:

bool

npy_file

Path of the cached .npy record

Type:

str

exam_date

Date that the image was taken, if it can be read in from the database/dicom records (default None)

Type:

datetime

bdate

Birth date of the patient, if it can be read in from the database/dicom records (default None)

Type:

datetime

json_file

File name of the json that results from a DICOM being converted to nifti (default None)

Type:

str

loaded

True if images are loaded into main memory, False if not (default False)

Type:

bool

clear_image()

Clears the array data from main memory

get_C()
get_C_dud()

Returns an array of duds with the same dimensionality as C

Returns an array of duds with the same dimensionality as C but with all values set to the first choice. Used in training the regressor. If y_on_c is set to True, this replicates the Y array on the bottom rows of the array.

get_X(augment=False)

Reads in and returns the image, with the option to augment

get_X_files()
get_Y()
get_image_type()

Determines the type of image that self.filename is

get_mem() float

Estimates the memory of the larger objects stored in ImageRecord

read_image()
class multi_med_image_ml.Records.PatientRecord(pid, items)

Bases: object

Returns text records, like medication history, of a given patient

pid

Patient ID

Type:

str

get_record(item)
get_records(confounds)
class multi_med_image_ml.Records.Record(static_inputs=[], database=None)

Bases: object

get_ID()
get_birth_date()
get_exam_date()
get_static_inputs()

Loads in static inputs from the database

load_extra_info()
multi_med_image_ml.Records.TextRecord(Record)

multi_med_image_ml.models module

class multi_med_image_ml.models.Classifier(latent_dim, n_inputs, base_feat, n_out, n_labels)

Bases: Module

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

parameters()

Returns an iterator over module parameters.

This is typically passed to an optimizer.

Parameters:

recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields:

Parameter – module parameter

Example:

>>> # xdoctest: +SKIP("undefined vars")
>>> for param in model.parameters():
...     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
training: bool
class multi_med_image_ml.models.Decoder

Bases: Module

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class multi_med_image_ml.models.Encoder(latent_dim=512)

Bases: Module

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

parameters()

Returns an iterator over module parameters.

This is typically passed to an optimizer.

Parameters:

recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields:

Parameter – module parameter

Example:

>>> # xdoctest: +SKIP("undefined vars")
>>> for param in model.parameters():
...     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
training: bool
class multi_med_image_ml.models.EnsembleModel(model_list)

Bases: Module

forward(input, hidden)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class multi_med_image_ml.models.MultiInputModule(Y_dim: tuple = (1, 32), C_dim: tuple = (16, 32), n_dyn_inputs: int = 14, n_stat_inputs: int = 2, use_attn: bool = False, encode_age: bool = False, variational: bool = False, zero_input: bool = False, remove_uncertain: bool = False, device=device(type='cpu'), latent_dim: int = 128, weights: str | None = None, grad_layer: int = 7)

Bases: Module

Takes variable imaging and non-imaging data and outputs a prediction

Can take multiple, variable-sized images and static text inputs as input and output a label prediction, while also regressing confounds.

encoder

Encodes input images to latent array

Type:

nn.Module

classifier

Takes multiple images encoded by the encoder and combines them into a single predictive value

Type:

nn.Module

regressor

Optional network that regresses confounds from the encoder’s latent representation using adversarial regression

Type:

nn.Module

Y_dim

A tuple indicating the dimension of the image’s label. The first number is the number of labels associated with the image and the second is the number of choices each label has. Extra choices will not affect the model, but too few will throw an error: if Y_dim is (1,2) and the label has three classes, it will crash, whereas (1,4) will just result in an extra output that is always zero. This should match the Y_dim parameter in the associated Records class (default (1,32))

Type:

tuple

C_dim

A tuple indicating the dimension of the image’s confounds. This effectively operates the same way as Y_dim, except the default number of confounds is higher (default (16,32))

Type:

tuple

n_dyn_inputs

The maximum number of images that can be passed in (default 14)

Type:

int

n_stat_inputs

The maximum number of text-based static inputs that can be input into the model (default 2)

Type:

int

encode_age

Encode the age of the patient on individual images prior to being input into the classifier (default False)

Type:

bool

device

GPU/CPU that the module is on (default: torch.device(‘cpu’))

Type:

torch.device

weights

Pretrained weight indicator. Weights automatically download if this is set. Default options must be in place or results are unpredictable. (default None)

Type:

str

latent_dim

Size of the intermediary representation that the encoder outputs and inputs into the classifier (default 128)

Type:

int

variational

Turns the encoding into a variational setup, a la a variational autoencoder, in which the encoding is sampled from a Gaussian distribution rather than a set array of numbers (default False)

Type:

bool
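In a variational autoencoder, sampling the encoding is commonly done with the reparameterization z = mu + sigma * eps, where eps is standard Gaussian noise. The sketch below shows that step with the standard library only; whether MultiInputModule uses exactly this parameterization (a mean and a log-variance per latent dimension) is an assumption.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), elementwise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


rng = random.Random(0)
mu = [0.5, -1.0, 2.0]
log_var = [-50.0, -50.0, -50.0]   # near-zero variance, so z stays close to mu
z = sample_latent(mu, log_var, rng)
```

With a larger log-variance the samples would spread around mu, which is what distinguishes the variational setup from a fixed encoding.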

remove_uncertain

UNIMPLEMENTED/UNTESTED. Experimental subroutine designed to remove from consideration encoded images that are a certain “distance” from the training set (default False)

Type:

bool

use_attn

UNIMPLEMENTED/UNTESTED. Adds an attention mechanism to the classifier (default False)

Type:

bool

num_training_samples

Number of training samples to sample for the uncertainty removal mechanism (default 300)

Type:

int

static_record

Set of static keys put into the model during training, to prevent unrecognized keys from being input during testing

Type:

set

activations_hook(grad)
classifier_freeze()
classifier_parameters()
cpu()

Moves all model parameters and buffers to the CPU.

Note

This method modifies the module in-place.

Returns:

self

Return type:

Module

cuda(device)

Moves all model parameters and buffers to the GPU.

This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.

Note

This method modifies the module in-place.

Parameters:

device (int, optional) – if specified, all parameters will be copied to that device

Returns:

self

Return type:

Module

forward(x, static_input=None, dates=None, bdate=None, return_regress=False, return_encoded=False, encoded_input=False, grad_eval=False)

Puts image or BatchRecord through model and predicts a value.

Parameters:
  • x (torch.Tensor or BatchRecord) – Image or BatchRecord that contains data to be predicted

  • static_input (list) – List of text to be input into model

  • dates (list[datetime.datetime]) – List of dates input in the model, when a BatchRecord is not input

  • bdate (datetime.datetime) – Patient birthdate, when a BatchRecord is not input

  • return_regress (bool) – If True, returns the confound prediction array as a second value (default False)

  • return_encoded (bool) – If True, returns the encoded values of the images (default False)

  • encoded_input (bool) – Indicator that X is input that’s already been encoded and can be put straight into the classifier (default False)

forward_ensemble(kwargs, n_ens=10)
get_activations(x)
get_activations_gradient()
load_state_dict(state_dict, *args, **kwargs)

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Parameters:
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

Returns:

  • missing_keys is a list of str containing the missing keys

  • unexpected_keys is a list of str containing the unexpected keys

Return type:

NamedTuple with missing_keys and unexpected_keys fields

Note

If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.

regressor_freeze()
state_dict(*args, **kwargs)

Returns a dictionary containing references to the whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Parameters and buffers set to None are not included.

Note

The returned object is a shallow copy. It contains references to the module’s parameters and buffers.

Warning

Currently state_dict() also accepts positional arguments for destination, prefix and keep_vars in order. However, this is being deprecated and keyword arguments will be enforced in future releases.

Warning

Please avoid the use of argument destination as it is not designed for end-users.

Parameters:
  • destination (dict, optional) – If provided, the state of module will be updated into the dict and the same object is returned. Otherwise, an OrderedDict will be created and returned. Default: None.

  • prefix (str, optional) – a prefix added to parameter and buffer names to compose the keys in state_dict. Default: ''.

  • keep_vars (bool, optional) – by default the Tensor s returned in the state dict are detached from autograd. If it’s set to True, detaching will not be performed. Default: False.

Returns:

a dictionary containing a whole state of the module

Return type:

dict

Example:

>>> # xdoctest: +SKIP("undefined vars")
>>> module.state_dict().keys()
['bias', 'weight']
training: bool
class multi_med_image_ml.models.Regressor(latent_dim, n_confounds, n_choices, device='cpu')

Bases: Module

cpu()

Moves all model parameters and buffers to the CPU.

Note

This method modifies the module in-place.

Returns:

self

Return type:

Module

cuda(device)

Moves all model parameters and buffers to the GPU.

This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.

Note

This method modifies the module in-place.

Parameters:

device (int, optional) – if specified, all parameters will be copied to that device

Returns:

self

Return type:

Module

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

load_state_dict(state_dict, *args, **kwargs)

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Parameters:
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

Returns:

  • missing_keys is a list of str containing the missing keys

  • unexpected_keys is a list of str containing the unexpected keys

Return type:

NamedTuple with missing_keys and unexpected_keys fields

Note

If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.

parameters()

Returns an iterator over module parameters.

This is typically passed to an optimizer.

Parameters:

recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields:

Parameter – module parameter

Example:

>>> # xdoctest: +SKIP("undefined vars")
>>> for param in model.parameters():
>>>     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
state_dict(*args, **kwargs)

Returns a dictionary containing references to the whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Parameters and buffers set to None are not included.

Note

The returned object is a shallow copy. It contains references to the module’s parameters and buffers.

Warning

Currently state_dict() also accepts positional arguments for destination, prefix and keep_vars in order. However, this is being deprecated and keyword arguments will be enforced in future releases.

Warning

Please avoid the use of argument destination as it is not designed for end-users.

Parameters:
  • destination (dict, optional) – If provided, the state of module will be updated into the dict and the same object is returned. Otherwise, an OrderedDict will be created and returned. Default: None.

  • prefix (str, optional) – a prefix added to parameter and buffer names to compose the keys in state_dict. Default: ''.

  • keep_vars (bool, optional) – by default the Tensors returned in the state dict are detached from autograd. If it’s set to True, detaching will not be performed. Default: False.

Returns:

a dictionary containing a whole state of the module

Return type:

dict

Example:

>>> # xdoctest: +SKIP("undefined vars")
>>> module.state_dict().keys()
['bias', 'weight']
training: bool
class multi_med_image_ml.models.Reshape(*target_shape)

Bases: Module

Used in a nn.Sequential pipeline to reshape on the fly.

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
multi_med_image_ml.models.get_age_arr(age, max_age=120.0, d=512)
multi_med_image_ml.models.get_age_encoding(date, birthdate, d=512)
multi_med_image_ml.models.time_index(i, pos, d=512, c=10000)

multi_med_image_ml.utils module

multi_med_image_ml.utils.YC_conv(Y, C, y_weight)
multi_med_image_ml.utils.bucketize(arr, n_buckets)
multi_med_image_ml.utils.check_key_to_filename(key_to_filename: Callable[[str, bool], str])

Verifies that the key to file name conversion method is working properly

This method is called to verify that a user-defined key-to-filename function is properly implemented, such that the function is able to convert an input path to a key forwards and backwards.
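As an illustration of the round trip being verified, the sketch below defines a hypothetical user-supplied conversion function and a small check of its own. It assumes that reverse=False maps a key to a filename and reverse=True maps a filename back to a key; the names ROOT, my_key_to_filename and round_trips are invented for this example and are not part of the package.

```python
import os

# Hypothetical storage root, chosen only for this example.
ROOT = "/data/scans"


def my_key_to_filename(name: str, reverse: bool = False) -> str:
    """Map a short key such as 'sub01/scan1' to a .npy path and back.

    Assumption: reverse=False converts key -> filename and
    reverse=True converts filename -> key.
    """
    if reverse:
        rel = os.path.relpath(name, ROOT)
        # Drop the .npy suffix to recover the key
        return rel[: -len(".npy")] if rel.endswith(".npy") else rel
    return os.path.join(ROOT, name + ".npy")


def round_trips(fn, key: str) -> bool:
    """Return True if key -> filename -> key gives the original key."""
    return fn(fn(key, False), True) == key


ok = round_trips(my_key_to_filename, "sub01/scan1")
```

A function that fails this kind of check (for example, one that drops the subfolder when building the key) could not be used to look cached files up again.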

multi_med_image_ml.utils.class_balance(classes, confounds, plim=0.05, recurse=True, exclude_none=True, unique_classes=None)
multi_med_image_ml.utils.compile_dicom(dicom_folder: str, cache=True, db_builder=None)

Compiles a folder of DICOMs into a .nii and .json file

Takes a folder of dicom files and turns it into a .nii.gz file, with metadata stored in a .json file. Relies on dcm2niix.

Parameters:
  • dicom_folder (str) – The folder with DICOM files

  • cache (bool) – Whether to cache .npy files in the DICOM folder

  • db_builder (multi_med_image_ml.DataBaseWrapper.DataBaseWrapper) – Object that may optionally be input for building up the database

multi_med_image_ml.utils.compile_dicom_folder(dicom_folder: str, db_builder=None)

Converts a folder of dicoms to a .nii.gz, with .json metadata

Uses dcm2niix, which has given the best results overall when converting DICOM to NIfTI, even though it must be invoked as a system command. Uses pydicom as a backup. The resulting files are stored in the same folder. Also takes a DataBaseWrapper object for building the database in real time.

Parameters:
  • dicom_folder (str) – Folder of interest

  • db_builder (multi_med_image_ml.DataBaseWrapper.DataBaseWrapper) – Optional input for building up the database
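
For orientation, a dcm2niix call of the kind described above might be assembled as follows. This is a minimal sketch, not the package's own code: the helper names build_dcm2niix_cmd and convert are hypothetical, and the choice to write the output back into the DICOM folder is taken from the description above. The flags -z, -b and -o are standard dcm2niix options.

```python
import shutil
import subprocess


def build_dcm2niix_cmd(dicom_folder: str) -> list:
    """Build a dcm2niix command that writes .nii.gz and .json into the folder."""
    return [
        "dcm2niix",
        "-z", "y",           # gzip-compress the NIfTI output (.nii.gz)
        "-b", "y",           # write a JSON sidecar with the metadata
        "-o", dicom_folder,  # output directory (here: the DICOM folder itself)
        dicom_folder,        # input directory containing the DICOM files
    ]


def convert(dicom_folder: str) -> bool:
    """Run dcm2niix if it is installed; return False so a caller can fall back."""
    if shutil.which("dcm2niix") is None:
        return False
    result = subprocess.run(build_dcm2niix_cmd(dicom_folder), capture_output=True)
    return result.returncode == 0
```

Returning False rather than raising leaves room for a pure-Python fallback, such as the pydicom path mentioned above.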

multi_med_image_ml.utils.compile_dicom_py(dicom_folder: str)
multi_med_image_ml.utils.date_sorter(folder, ext)
multi_med_image_ml.utils.determine_random_partition(arr2d, labels)
multi_med_image_ml.utils.determine_random_partition2(arr2d, labels)
multi_med_image_ml.utils.diagnose_network(net, name='network')

Calculate and print the mean of the average absolute gradients

Parameters:
  • net (torch network) –

  • name (str) –

multi_med_image_ml.utils.discretize_value(v, buckets)
multi_med_image_ml.utils.download_file_from_google_drive(file_id: str, destination: str)

Downloads files from Google drive

Downloads files from Google drive and saves them to a destination.

Parameters:
  • file_id (str) – ID in the Google Drive URL

  • destination (str) – Place to save the file to

multi_med_image_ml.utils.download_weights(weights: str)

Downloads and caches pretrained model weights

Downloads model weights from Google drive and stores them in the user’s cache for future use.

Parameters:

weights (str) – String indicating which pretrained weights to download.

multi_med_image_ml.utils.encode_static_inputs(static_input, d=512)
multi_med_image_ml.utils.equal_terms(term)
multi_med_image_ml.utils.get_balanced_filename_list(test_variable, confounds_array, selection_ratios=[0.66, 0.16, 0.16], selection_limits=[inf, inf, inf], value_ranges=[], output_selection_savepath=None, test_value_ranges=None, get_all_test_set=False, total_size_limit=None, verbose=False, non_confound_value_ranges={}, database=None, n_buckets=10, patient_id_key=None)
multi_med_image_ml.utils.get_class_selection(classes, primed, unique_classes=None)
multi_med_image_ml.utils.get_confirm_token(response)
multi_med_image_ml.utils.get_data_from_filenames(filename_list, test_variable=None, confounds=None, return_as_strs=False, unique_test_vals=None, database=None, return_choice_arr=False, dict_obj=None, return_as_dict=False, key_to_filename=None, X_encoder=None, vae_encoder=False, uniques=None, density_confound_sort=True, n_buckets=3)
multi_med_image_ml.utils.get_dim_str(filename: str | None = None, X_dim: tuple | None = None, outtype: str = '.npy') -> str

Converts an input filename to the filename of the cached .npy file

Given an input filename (e.g. /path/to/myfile.nii.gz) with a given dimension (e.g. (96,48,48)), converts the filepath to the cached version (e.g. /path/to/myfile_resized_96_48_48.npy). Perfect cube dimensions are annotated with a single number rather than three. If no filename is input, the string itself is returned (resized_96_48_48.npy).

Parameters:
  • filename (str) – Name of the file to be converted (Default None)

  • X_dim (tuple) – Size that the image is going to be resized to (Default None)

  • outtype (str) –

Returns:

String of the cached image file, or a string that can be added to a filename
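
The naming rule described above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the package's code: the function name dim_str_sketch is hypothetical, X_dim is required here, and the exact form of the cube shorthand (resized_96.npy) is a guess based on the description.

```python
import os


def dim_str_sketch(filename, X_dim, outtype=".npy"):
    """Return the cached-file name for `filename` resized to `X_dim`."""
    # Assumed shorthand: a perfect cube collapses to a single number
    if len(set(X_dim)) == 1:
        dims = str(X_dim[0])
    else:
        dims = "_".join(str(d) for d in X_dim)
    suffix = f"resized_{dims}{outtype}"
    if filename is None:
        # No filename given: return only the part that can be appended
        return suffix
    folder, base = os.path.split(filename)
    # Strip known image extensions, handling the double extension .nii.gz
    for ext in (".nii.gz", ".nii", ".npy"):
        if base.endswith(ext):
            base = base[: -len(ext)]
            break
    else:
        base = os.path.splitext(base)[0]
    return os.path.join(folder, f"{base}_{suffix}")


cached = dim_str_sketch("/path/to/myfile.nii.gz", (96, 48, 48))
```

With the inputs from the description, cached is /path/to/myfile_resized_96_48_48.npy on a POSIX system.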

multi_med_image_ml.utils.get_file_list(obj, allow_list_of_list: str = True, db_builder=None)

Searches a folder tree for all applicable images.

Uses the os.walk method to search a folder tree and returns a list of image files. Relies on get_file_list_from_str and get_file_list_from_list to do so. Takes in a DataBaseWrapper (db_builder) to build up a pandas dataframe during the search.

Parameters:
  • obj (list or str) – Path string, or list of path strings, to search

  • allow_list_of_list (bool) – Whether lists of lists may be parsed (default True)

  • db_builder (multi_med_image_ml.DataBaseWrapper.DataBaseWrapper) – Optional input to allow the database to be built up
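
A folder search of this kind can be sketched with os.walk as below. The sketch makes several assumptions that are not confirmed by the package: the extension list IMAGE_EXTS, the rule that dot-prefixed files count as temporary, and the flattening of nested lists into one result are all choices made for this example, and the names looks_like_image and list_images are hypothetical.

```python
import os

# Assumed set of extensions; the package's own list may differ.
IMAGE_EXTS = (".nii", ".nii.gz", ".npy", ".dcm")


def looks_like_image(filename: str) -> bool:
    """Accept files with a known extension, skipping dot-prefixed temp files."""
    base = os.path.basename(filename)
    if base.startswith("."):
        return False
    return base.lower().endswith(IMAGE_EXTS)


def list_images(obj) -> list:
    """Collect image paths from a path string or a (nested) list of them."""
    if isinstance(obj, (list, tuple)):
        found = []
        for item in obj:
            found.extend(list_images(item))  # flattens nested lists
        return found
    if os.path.isfile(obj):
        return [obj] if looks_like_image(obj) else []
    found = []
    for root, _dirs, files in os.walk(obj):
        for name in sorted(files):
            path = os.path.join(root, name)
            if looks_like_image(path):
                found.append(path)
    return found
```

Calling list_images("/path/to/study") would then return every matching file below that folder, in the order os.walk visits the directories.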

multi_med_image_ml.utils.get_file_list_from_list(obj, allow_list_of_list=True, db_builder=None)
multi_med_image_ml.utils.get_file_list_from_str(obj, db_builder=None)
multi_med_image_ml.utils.get_first_n_primes(n)
multi_med_image_ml.utils.get_lr(optimizer)
multi_med_image_ml.utils.get_multilabel_acc(y_pred, Y)
multi_med_image_ml.utils.get_none_array(classes=None, confounds=None)
multi_med_image_ml.utils.get_prime_form(confounds, n_buckets, sorted_confounds=None)
multi_med_image_ml.utils.hidden_batch_predictions(X, model, group_vars, last_icd, last_hidden_var, ensemble=False, device=None)
multi_med_image_ml.utils.integrate_arrs(S1, S2)
multi_med_image_ml.utils.integrate_arrs_none(S1, S2)
multi_med_image_ml.utils.is_dicom(filename)

Determines whether a file is a DICOM file

multi_med_image_ml.utils.is_float(N)
multi_med_image_ml.utils.is_image_file(filename: str) -> bool

Determines whether the input file is a medical image

Determines if the input is an applicable image file. Excludes temporary files.

Parameters:

filename (str) – Path to file

Returns:

bool

multi_med_image_ml.utils.is_list_str(s)
multi_med_image_ml.utils.is_nan(k, inc_null_str=False)
multi_med_image_ml.utils.key_to_filename_default(filename: str, reverse: bool = False) -> str

Default function for converting a pandas key to a filename

This function can be replaced by a more elaborate one that is able to convert the location of a .npy file on a filesystem to a lookup key in a pandas dataframe. By default, the file path is the key.

multi_med_image_ml.utils.label_to_community(labels)
multi_med_image_ml.utils.list_to_str(val)
multi_med_image_ml.utils.mod_meas(arr2d, labels)
multi_med_image_ml.utils.multi_mannwhitneyu(arr)
multi_med_image_ml.utils.nifti_to_np(nifti_filepath, X_dim)
multi_med_image_ml.utils.not_temp(filename)
multi_med_image_ml.utils.output_test(args, test_val_ranges, output_results, test_predictions_file, mucran, database, X_files=None, return_Xfiles=False)
multi_med_image_ml.utils.parsedate(d, date_format='%Y-%m-%d %H:%M:%S')
multi_med_image_ml.utils.prime(i, primes)
multi_med_image_ml.utils.print_numpy(x, val=True, shp=False)

Print the mean, min, max, median, std, and size of a numpy array

Parameters:
  • val (bool) –

  • shp (bool) –

multi_med_image_ml.utils.recompute_selection_ratios(selection_ratios, selection_limits, N)
multi_med_image_ml.utils.resize_np(nifti_data, dim)
multi_med_image_ml.utils.save_image(image_numpy, image_path, aspect_ratio=1.0)

Save a numpy image to the disk

Parameters:
  • image_numpy (numpy array) –

  • image_path (str) –

multi_med_image_ml.utils.save_response_content(response, destination)
multi_med_image_ml.utils.separate_set(selections, set_divisions=[0.5, 0.5], IDs=None)
multi_med_image_ml.utils.str_to_list(s, nospace=False)
multi_med_image_ml.utils.tensor2im(input_image, imtype=<class 'numpy.uint8'>)

Converts a Tensor array into a numpy image array.

Parameters:
  • input_image (tensor) –

  • imtype (type) –

multi_med_image_ml.utils.test_all(classes, confounds)
multi_med_image_ml.utils.text_to_bin(text, n_bin=32, d=512)

Encodes strings as binary arrays

multi_med_image_ml.utils.validate_database(database, args)

Module contents