multi_med_image_ml package

Submodules

multi_med_image_ml.DataBaseWrapper module

class multi_med_image_ml.DataBaseWrapper.DataBaseWrapper(database=None, filename=None, labels=[], confounds=[], X_dim=None, key_to_filename=<function key_to_filename_default>, val_ranges={}, precedence=[])

Bases: object

Wrapper for Pandas table to cache some common and repeated functions

DataBaseWrapper stores a pandas dataframe that contains metadata about the images being read in. It also builds up this dataframe in real time if only DICOM or NIFTI/JSON files are present. One purpose is to translate tokenized values in the dataframe to one-hot vectors that can be read by a DL model as quickly as possible. Another is to keep track of cached image files of a particular size.

database

The internal Pandas dataframe (default None)

Type:

pandas.DataFrame

filename

The filename of the Pandas Dataframe pickle (default None)

Type:

str

labels

Labels that are read in by the current MedImageLoader (default [])

Type:

list

confounds

Confound names that are read in by the current MedImageLoader (default [])

Type:

list

val_ranges

List of values that can be returned. See MedImageLoader (default {})

Type:

dict

X_dim

Image dimensions (default None)

Type:

tuple

key_to_filename

Function to translate a key to a filename and back (default key_to_filename_default)

Type:

callback

jdict

Dictionary of values read from JSON files that are accumulated and periodically merged with the pandas DataFrame (database) as it is built up. Used as an intermediary to prevent excessive fragmentation of the DataFrame (default {})

Type:

dict

columns

Columns of the database. Used for quick reference.

Type:

set

add_json(nifti_file, json_file=None)

Adds a JSON from a DICOM file to the internal jdict, which will later be compiled into the DataFrame

Parameters:
  • nifti_file (str) – Nifti (.nii.gz) file that was output by the DICOM preprocessing software

  • json_file (str) – JSON file that was output by the DICOM preprocessing software. If not present, searches for the JSON file in the same directory as the input nifti file.

build_metadata()

Builds internal lookup tables used to convert continuous and label-based variables to one-hot vectors that can be read by an ML model.
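
The idea behind such lookup tables can be sketched as follows. This is a minimal illustration and not the package's implementation: the helper names `build_lookup`, `bucket_index`, and `one_hot`, the equal-width bucketing, and the sample column are all assumptions made for the example.

```python
def build_lookup(values):
    """Map each distinct token to a stable integer index (sorted order)."""
    return {token: i for i, token in enumerate(sorted(set(values)))}


def bucket_index(value, low, high, n_buckets):
    """Assign a continuous value to one of n_buckets equal-width bins."""
    if high <= low:
        return 0
    i = int((value - low) / (high - low) * n_buckets)
    return min(max(i, 0), n_buckets - 1)  # clamp values on or outside the edges


def one_hot(index, width):
    """Return a one-hot list of length `width` with a 1 at `index`."""
    if index >= width:
        raise ValueError("index does not fit in the one-hot vector")
    vec = [0] * width
    vec[index] = 1
    return vec


sex_column = ["MALE", "FEMALE", "FEMALE", "MALE"]  # hypothetical metadata column
lookup = build_lookup(sex_column)
encoded_sex = [one_hot(lookup[v], width=4) for v in sex_column]
encoded_age = one_hot(bucket_index(45, low=30, high=60, n_buckets=3), width=4)
```

Building the table once and reusing it keeps the per-image encoding step down to a dictionary lookup.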

get_ID(npy_file: str) str

Returns the Patient ID, if present in the database. Attempts to guess it using the keys ‘PatientID’ and ‘Patient ID’

Parameters:

npy_file (str) – Cached numpy file of the image

Returns:

ID of the patient in question

Return type:

id (str)

get_birth_date(npy_file: str) date
get_confound_encode(npy_file: str, return_lim=False) list

Returns list of integers that represent the confounds of a given input file

Parameters:
  • npy_file (str) – Numpy file of the given record, which is converted into a key

  • return_lim (bool) – If true, also returns the number of total confounds, as a second value (default False)

Returns:

A list of integers indicating the nth confound of the datapoint

Return type:

cnum_list (list)

get_exam_date(npy_file: str) date
get_file_list()
get_label_encode(npy_file: str)

Returns list of integers that represent the labels of a given input file

Parameters:

npy_file (str) – Numpy file of the given record, which is converted into a key

Returns:

A list of integers indicating the nth label of the datapoint

Return type:

cnum_list (list)

has_im(im: ImageRecord) bool

Determines if the DataBaseWrapper contains a given ImageRecord

Parameters:

im (ImageRecord) – Input image

Returns:

bool - Whether image is present in database

in_val_ranges(fkey: str) bool

Determines if the input fkey is in a valid value range

Parameters:

fkey (str) – Lookup key to the pandas DataFrame

Returns:

bool - Whether the key is present

loc_val(npy_file, c)
out_dataframe(fkey_ass=None)

Merges the jdict with the dataframe and outputs it.

parse_date(d, date_format='%Y-%m-%d %H:%M:%S')

Parses the date string
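
For reference, the default `date_format` shown in the signature follows the standard `strptime` directives. The snippet below is a sketch using only the standard library; `parse_date_sketch` is a hypothetical stand-in, and whether the real method returns a `datetime` or a `date` is not stated here.

```python
from datetime import datetime

DEFAULT_FORMAT = "%Y-%m-%d %H:%M:%S"  # same default as in the signature above


def parse_date_sketch(d, date_format=DEFAULT_FORMAT):
    """Parse a date string with the given strptime format."""
    return datetime.strptime(d, date_format)


parsed = parse_date_sketch("2021-03-04 10:30:00")
```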

stack_list_by_label(filename_list, label)

Reorganizes an input filename list by the list of labels. So, a list of filenames of men and women with sex as the given input label will be returned as two lists of men and women.

Parameters:
  • filename_list (list) – List of filenames

  • label (str) – Label to be reorganized by. Must be one of the confounds or labels in the DataBaseWrapper.

Returns:

List of separate filename lists

Return type:

filename_list_stack (list[list])
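
The regrouping described above can be sketched as follows. This is a hedged illustration: `stack_by_label` and the `metadata` dictionary are hypothetical stand-ins for the lookup that DataBaseWrapper performs internally, and the ordering of the returned groups is an assumption.

```python
from collections import defaultdict


def stack_by_label(filename_list, label, metadata):
    """Split filenames into one list per distinct value of `label`."""
    groups = defaultdict(list)
    for fname in filename_list:
        groups[metadata[fname][label]].append(fname)
    # Sorted by label value so the output order is deterministic (assumption).
    return [groups[value] for value in sorted(groups)]


# Hypothetical metadata keyed by filename.
metadata = {
    "a.npy": {"Sex": "MALE"},
    "b.npy": {"Sex": "FEMALE"},
    "c.npy": {"Sex": "MALE"},
}
stacked = stack_by_label(["a.npy", "b.npy", "c.npy"], "Sex", metadata)
```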

multi_med_image_ml.MedImageLoader module

class multi_med_image_ml.MedImageLoader.MedImageLoader(*image_folders, pandas_cache='./pandas/', cache=True, key_to_filename=<function key_to_filename_default>, batch_by_pid=False, file_record_name=None, database=None, batch_size=14, X_dim=(96, 96, 96), static_inputs=None, confounds=[], match_confounds=[], label=[], augment=True, val_ranges={}, dtype='torch', Y_dim=(1, 32), C_dim=(16, 32), return_obj=False, channels_first=True, gpu_ids='', precedence=[], n_dyn_inputs=14, verbose=False)

Bases: object

Loads medical images into a format that may be used by MultiInputModule.

This loader preprocesses, reshapes, augments, and batches images and metadata into a format that may be read by MultiInputModule. It additionally may apply a data matching algorithm to ensure that no overly confounded data is fed into the model during training. It is capable of maintaining different lists of images to balance classes for both the classifier and regressor.

database

Object used to store and access metadata about particular files. MedImageLoader builds this automatically from a folder, or it can read from one directly if it’s already been built (default None)

Type:

DataBaseWrapper

X_dim

Three-tuple dimension to which the images will be resized upon output (default (96,96,96))

Type:

tuple

Y_dim

A tuple indicating the dimension of the image’s label. The first number is the number of labels associated with the image and the second is the number of choices each label has. Extra choices will not affect the model, but too few will throw an error. Thus, if Y_dim is (1,2) and the label has three classes, it will crash, whereas with (1,4) the unused output is simply always zero. This should match the Y_dim parameter in the associated MultiInputModule (default (1,32))

Type:

tuple
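
The sizing rule for Y_dim can be illustrated with a small sketch. It is not the package's encoder: `encode_labels` is a hypothetical helper, it uses plain lists instead of arrays, and the choice of `ValueError` is an assumption about how the failure surfaces.

```python
def encode_labels(label_indices, y_dim):
    """One-hot encode integer class indices into an (n_labels, n_choices) grid."""
    n_labels, n_choices = y_dim
    if len(label_indices) > n_labels:
        raise ValueError("more labels than rows in Y_dim")
    Y = [[0] * n_choices for _ in range(n_labels)]
    for row, idx in enumerate(label_indices):
        if idx >= n_choices:
            raise ValueError(f"class index {idx} does not fit in {n_choices} choices")
        Y[row][idx] = 1
    return Y


# Third class (index 2) of a three-class label fits in (1, 4); one slot stays unused.
Y_ok = encode_labels([2], y_dim=(1, 4))

# The same label does not fit in (1, 2).
try:
    encode_labels([2], y_dim=(1, 2))
    too_small_failed = False
except ValueError:
    too_small_failed = True
```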

C_dim

A tuple indicating the dimension of the image’s confounds. This effectively operates the same way as Y_dim, except the default number of confounds is higher (default (16,32))

Type:

tuple

batch_size

Max number of images that can be read in per batch. Note that if batch_by_pid is True, this is the maximum number of images that can be read in, and it’s best to set it to the same value as n_dyn_inputs in MultiInputModule (default 14)

Type:

int

augment

Whether to augment images during training. Note that this only works if the images are returned directly (i.e. return_obj = False). Otherwise images are augmented when get_X is called from ImageRecord (default True)

Type:

bool

dtype

Type of image to be returned – either “torch” or “numpy” (default “torch”)

Type:

str

label

List of labels that will be read in from DataBaseWrapper to the Y output. Must be smaller than the first value of Y_dim.

Type:

list

confounds

List of confounds that will be read in from DataBaseWrapper to the C output. Must be smaller than the first value of C_dim.

Type:

list

pandas_cache

Directory in which the database pandas file is stored

Type:

str

cache

Whether to cache images of a particular dimension as .npy files, for faster reading and indexing in the database (default True)

Type:

bool

key_to_filename

Function that translates a key to the DataBaseWrapper into a full filepath from which an image may be read. Needs to accept an additional parameter to reverse this as well (default key_to_filename_default)

Type:

callback

batch_by_pid

Whether to batch images together by their Patient ID in a BatchRecord or not (default False)

Type:

bool

file_record_name

Path of the record of files that were read in by the MedImageLoader, if it needs to be examined later (default None)

Type:

str

channels_first

Whether to put channels in the first or last dimension of images (default True)

Type:

bool

static_inputs

List of variables from DataBaseWrapper that will be input as static, per-patient text inputs (like Sex or Ethnicity) to the MultiInputModule (default None)

Type:

list

val_ranges

Dictionary that may be used to indicate ranges of values that may be loaded in. So, if you want to only study males, val_ranges could be {'SexDSC': 'MALE'}, and if you only wanted to study people between ages 30 and 60, val_ranges could be {'Ages': (30, 60)}; these can be combined, too. Note that 'Ages' and 'SexDSC' must be present in DataBaseWrapper as metadata variable names for this to work (default {})

Type:

dict
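
A sketch of how such a dictionary could be applied to metadata records follows. The function `record_matches` and the sample records are hypothetical, and treating a tuple as an inclusive (low, high) range is an assumption, not documented behaviour.

```python
def record_matches(record, val_ranges):
    """Return True if every constraint in val_ranges holds for the record."""
    for key, allowed in val_ranges.items():
        if key not in record:
            return False
        value = record[key]
        if isinstance(allowed, tuple):
            low, high = allowed
            if not (low <= value <= high):  # inclusive bounds (assumption)
                return False
        elif value != allowed:
            return False
    return True


# Hypothetical metadata rows.
records = [
    {"SexDSC": "MALE", "Ages": 45},
    {"SexDSC": "FEMALE", "Ages": 50},
    {"SexDSC": "MALE", "Ages": 72},
]
val_ranges = {"SexDSC": "MALE", "Ages": (30, 60)}
kept = [r for r in records if record_matches(r, val_ranges)]
```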

match_confounds

Used to apply data matching between the labels. So, if you wanted to distinguish between AD and Controls and wanted to match by age, match_confounds could be set to [‘Ages’] and this would only return sets of AD and Control of the same age ranges. Note that this may severely limit the dataset or even return nothing if the match_confound variable and the label variable are mutually exclusive (default [])

Type:

list
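
The matching algorithm itself is not described here, so the following is only a simple stand-in that conveys the effect: records are bucketed by the confound, and each bucket keeps an equal number of records per label. The function `match_by_confound`, the ten-year bucket width, and the sample records are all assumptions.

```python
from collections import defaultdict


def match_by_confound(records, label_key, confound_key, bucket_width=10):
    """Keep equal numbers of each label within every confound bucket."""
    buckets = defaultdict(lambda: defaultdict(list))
    for r in records:
        buckets[r[confound_key] // bucket_width][r[label_key]].append(r)
    all_labels = {r[label_key] for r in records}
    matched = []
    for b in sorted(buckets):
        groups = buckets[b]
        if set(groups) != all_labels:
            continue  # a label is missing in this bucket, so nothing can be matched
        n = min(len(g) for g in groups.values())
        for lab in sorted(groups):
            matched.extend(groups[lab][:n])
    return matched


records = [
    {"id": 1, "Dx": "AD", "Ages": 71},
    {"id": 2, "Dx": "AD", "Ages": 74},
    {"id": 3, "Dx": "Control", "Ages": 78},
    {"id": 4, "Dx": "Control", "Ages": 35},
]
matched_ids = [r["id"] for r in match_by_confound(records, "Dx", "Ages")]
```

In this toy data the 35-year-old control has no AD counterpart and is dropped, which shows how matching can shrink a dataset.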

all_records

Cache that stores ImageRecords and clears their images when memory use in main memory gets too high.

Type:

multi_med_image_ml.Records.AllRecords

n_dyn_inputs

Max number of inputs of the ML model, to be passed into BatchRecord when it’s used as a patient record (default 14)

Type:

int

precedence

Because labeling is by image in the database and diagnosis is by patient, this option allows “precedence” in labeling when assigning an overall label to a patient. So, if a patient has three images, two marked as “Healthy” and one marked as “Alzheimer’s”, you can pass “[Alzheimer’s,Healthy]” into precedence and it would assign the whole patient the “Alzheimer’s” label (default [])

Type:

list
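
The precedence rule can be sketched in a few lines. `patient_label` is a hypothetical helper, and the fallback to the first image label when nothing in `precedence` matches is an assumption.

```python
def patient_label(image_labels, precedence):
    """Pick the patient-level label: the first precedence entry seen in any image."""
    for candidate in precedence:
        if candidate in image_labels:
            return candidate
    # Fallback when no precedence entry applies (assumption).
    return image_labels[0] if image_labels else None


label = patient_label(
    ["Healthy", "Healthy", "Alzheimer's"],
    precedence=["Alzheimer's", "Healthy"],
)
```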

build_pandas_database()

Builds up the entire Pandas DataFrame from the filesystem in one go. May take a while.

get_abs_index()
get_file_list()
get_index()
index_plus()
index_zero()
load_image_stack()

Loads a stack of images to an internal queue

name()
read_record()
record(flist, index=None)
rotate_labels(zero_list_addendum=None)
switch_stack()
tl()

Top label

multi_med_image_ml.MedImageLoader.key_to_filename_default(fkey, reverse=False)

multi_med_image_ml.MultiInputTester module

class multi_med_image_ml.MultiInputTester.MultiInputTester(database, model, out_record_folder: str | None = None, checkpoint_dir: str | None = None, verbose: bool = False, name: str = 'experiment_name', test_name: str = '', include_inds: list = [0, 1], return_confidence: bool = False)

Bases: object

Used for testing the outputs of MultiInputModule.

MultiInputTester abstracts many of the functions for testing DL models, including grad cam and group AUC outputs.

database

Associated database for testing

Type:

DataBaseWrapper

model

Model to be tested

Type:

MultiInputModule

out_record_folder

Folder to output results (default None)

Type:

str

checkpoint_dir

Folder that has model checkpoints (default None)

Type:

str

name

Name of the model to be tested (default ‘experiment_name’)

Type:

str

test_name

The name of the experiment (default “”)

Type:

str

database_key

Variable used when grouping data together for AUROC analysis

Type:

str

min_pids

(default 1)

Type:

int

top_not_mean

Given multiple AUC output files, this will select one randomly instead of coming up with the mean prediction of all of them

Type:

bool

include_inds

(default [0,1])

Type:

list

same_patients

If true, only plots AUC/Accuracy for patients that are equally divided between groups (default False)

Type:

bool

x_axis_opts

Whether the X axis of the plot should be “images”, “patients”, or “images_per_patient” (default: “images”)

Type:

str

acc(target_label, database_key=None, opt=None, divides=None, same_pids_across_groups=False, save=False, ind=0, acc_or_auc='acc', min_pids=1, top_not_mean=False)
attn_map_vis(att_mat: list, X_dim, patch_size=16)
grad_cam(pr: BatchRecord, add_symlink: bool = True, grad_layer: int = 7, save: bool = True, database_key: str | None = None, target_label: str | None = None, confidence_thresh: float = inf, register: bool = False) Tensor

Outputs a gradient class activation map for the input record

Parameters:
  • pr (BatchRecord) – Image batch to apply Grad-Cam to

  • add_symlink (bool) – If true, adds a symbolic link to the original image in the same folder as the grad-cam is stored in (default True)

  • grad_layer (int) – (default 7)

  • save (bool) – Save the output to the results folder (default True)

loop(pr: BatchRecord, target_label='Folder', record_encoding=False)

Tests one input and saves it.

Parameters:

pr (BatchRecord) – Image batch

out_grad_cam_groups(prefix=None)
out_state_dict()
pca_analysis(database_keys: list, ml_model='pca')
plot(target_label, ind=0, x_axis_opts='images', acc_or_auc='auc', database_key=None, opt=None, divides=None, same_pids_across_groups=False, min_pids=1, do_adjust_text=True, top_not_mean=False)
read_json(target_label)

Reads all json files output by MultiInputTester.

record(reset=False)
save_encodings()
exception multi_med_image_ml.MultiInputTester.NotEnoughPatients(message)

Bases: Exception

multi_med_image_ml.MultiInputTrainer module

class multi_med_image_ml.MultiInputTrainer.LossTracker(name, loss_image_dir)

Bases: object

add_target(target, label_or_confound)
get_last_xs()
get_last_y_class_adv_loss()
get_last_y_class_loss()
get_last_y_kl_loss()
get_last_y_reg_loss()
load()
plot(smooth=False, log_yscale=False, temptitle=None)
save()
smooth(arr)
title(target)
update(xs, y_class_loss, y_reg_loss, y_class_adv_loss, target, label_or_confound, y_kl_loss=None, label_m=None, conf_m=None)
class multi_med_image_ml.MultiInputTrainer.MultiInputTrainer(model, dataloader, lr=1e-05, betas=(0.5, 0.999), loss_function='mse', batch_size=64, regress=True, out_record_folder=None, checkpoint_dir=None, name='experiment_name', verbose=False, save_latest_freq=100, return_lim=True, discriminator_optimizer='adam', classifier_optimizer='adam', forget_optimizer_state=False, use_triplet=False)

Bases: object

Used to train MultiInputModule.

MultiInputModule requires an adversarial technique to train it, and the various data queueing techniques used get a bit complicated, so this class is used to abstract all of that.

model

Input model to train

Type:

MultiInputModule

lr

Learning rate

Type:

float

loss_function

PyTorch loss function. MSE loss is used instead of cross-entropy because it is smoother and tends to work a bit better with the adversarial function, but this can be tested further (default nn.MSELoss)

name

Name of the model, which is used for saving checkpoints and output graphs (default ‘experiment_name’)

Type:

str

optimizer

Adam optimizer for the encoder/classifier. Incentivized to classify by the true label and set the regressor to the same values.

Type:

torch.optim

optimizer_reg

Adam optimizer for the encoder/regressor. Incentivized to detect confounds from each individual image.

Type:

torch.optim

out_record_folder

If set, outputs images of the loss function for the optimizers over time, for the classifier, the regressor, and the adversarial loss (default None)

Type:

str

checkpoint_dir

If set, saves the model and optimizer state (default None)

Type:

str

save_latest_freq

Number of iterations before it saves the loss image and the checkpoint (default 100)

Type:

int

batch_size

Batch size of the training. Due to the optional-input nature, this cannot be set in the dataloader. Only one set of images can be passed through the loop at a given time. batch_size is how frequently the backpropagation algorithm is applied after graphs have accumulated (default 64)

Type:

int
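
The accumulate-then-step pattern described for batch_size can be illustrated without any deep learning library. The sketch below fits a single weight with plain gradient descent; `train`, the toy data, and the learning rate are assumptions made for the example, and the real trainer operates on PyTorch graphs rather than scalar gradients.

```python
def train(samples, batch_size, lr=0.01, epochs=200):
    """Fit y = w * x, feeding one sample at a time and stepping every batch_size samples."""
    w = 0.0
    grad_sum = 0.0
    seen = 0
    steps = 0
    for _ in range(epochs):
        for x, y in samples:
            grad_sum += 2 * (w * x - y) * x  # gradient of the squared error
            seen += 1
            if seen % batch_size == 0:
                w -= lr * grad_sum / batch_size  # one update per accumulated batch
                grad_sum = 0.0
                steps += 1
    return w, steps


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w, steps = train(samples, batch_size=2)
```

Only one sample passes through the loop at a time, yet the weight is updated once per `batch_size` samples, which mirrors the behaviour described above.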

verbose

Whether to print (default False)

Type:

bool

update_classifier

Boolean to determine whether optimizer (True) or optimizer_reg (False) is applied

Type:

bool

index

Counts the number of iterations the trainer has gone through

Type:

int

get_time_str()
log_time(m)
loop(pr: BatchRecord)

Loops a single BatchRecord through one iteration

Loops a BatchRecord through one iteration. Also switches the queues of the MedImageLoader as it switches between optimizers.

Parameters:

pr (multi_med_image_ml.Records.BatchRecord) – Record to be evaluated

reset_time()
test()

multi_med_image_ml.Records module

class multi_med_image_ml.Records.AllRecords

Bases: object

Contains a dictionary of BatchRecord

Used to both prevent duplicate data from being called and to be able to clear all images from main memory and perform garbage collection when necessary.
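
A minimal sketch of this kind of cache follows. `RecordCacheSketch` is hypothetical: it stores raw bytes instead of ImageRecord objects, measures memory with `len`, and clears everything when the limit would be exceeded. These are all simplifying assumptions, not the class's actual policy.

```python
class RecordCacheSketch:
    """Deduplicating cache that drops loaded payloads when a memory limit is hit."""

    def __init__(self, mem_limit):
        self.mem_limit = mem_limit  # bytes allowed in memory
        self.image_dict = {}        # filename -> payload, or None once cleared

    def has(self, filename):
        return filename in self.image_dict

    def get_mem(self):
        return sum(len(p) for p in self.image_dict.values() if p is not None)

    def clear_images(self):
        # Keep the entries (and thus the bookkeeping) but release the payloads.
        for filename in self.image_dict:
            self.image_dict[filename] = None

    def add(self, filename, payload):
        if self.has(filename) and self.image_dict[filename] is not None:
            return  # already loaded; avoid a duplicate copy
        if self.get_mem() + len(payload) > self.mem_limit:
            self.clear_images()
        self.image_dict[filename] = payload


cache = RecordCacheSketch(mem_limit=10)
cache.add("a.npy", b"12345")
cache.add("a.npy", b"12345")  # duplicate, ignored
cache.add("b.npy", b"1234")
mem_before = cache.get_mem()
cache.add("c.npy", b"123")    # would exceed the limit, so payloads are cleared first
mem_after = cache.get_mem()
```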

image_dict

Dictionary of ImageRecord, mapped by their given filename

Type:

dict

mem_limit

Limit of memory that can be read into RAM

Type:

int

obj_size

Average size of an object given the image dimension of the dataloader

Type:

int

cur_mem

Count of current memory read in (TODO)

Type:

int

add(filename: str, im: ImageRecord)
check_mem()
clear_images()
get(filename: str)
get_mem()
has(filename: str)
class multi_med_image_ml.Records.BatchRecord(image_records: list, dtype: str = 'torch', sort: bool = True, batch_by_pid: bool = False, channels_first: bool = True, gpu_ids: str = '', batch_size: int = 14, get_text_records: bool = False)

Bases: object

Class that stores batches of ImageRecord

BatchRecord essentially abstracts lists of ImageRecord so that it returns them in batches. It is also used to store patient data for instances in which patients have multiple images.

image_records

List of ImageRecord classes

Type:

list

dtype

Type to be returned, either “torch” or “numpy” (default “torch”)

Type:

str

gpu_ids

GPU, if any, to which the images are read out (default “”)

Type:

str

channels_first

Whether channels in the images are the first or last dimension (default True)

Type:

bool

batch_size

The maximum number of images that may be returned in an instance of get_X (default 14)

Type:

int

get_C(confound=None, return_lim=False)
get_C_dud(confound=None, return_lim=False)
get_X(augment=False)
get_X_files()
get_Y(label=None)
get_birth_dates()
get_exam_dates()
get_static_inputs()
get_text_records()
name()
shift_order()
sort_order(scramble=False, batch_size=None)
class multi_med_image_ml.Records.FileLookup(filename=None, npy_name=None, fkey=None)

Bases: object

file()
key()
npy_file()
class multi_med_image_ml.Records.ImageRecord(filename: str, static_inputs: list = [], database=None, X_dim: tuple = (96, 96, 96), dtype: str = 'torch', extra_info_list: list | None = None, y_on_c: bool = False, cache: bool = True, Y_dim: tuple = (1, 32), C_dim: tuple = (16, 32), y_nums: list | None = None, c_nums: list | None = None)

Bases: Record

A class used to represent an abstraction of an image for MedImageLoader.

ImageRecord is used to keep and organize a given image in main memory. The same image may be represented on the file system as a nifti, dicom, or an npy file, which caches the file at a particular size. This reads in the file without creating duplicates. The image may also be cleared or read in in real time to avoid having the images take up too much space in main memory.

filename

Filename of the image

Type:

str

database

Object used to quickly look up metadata associated with the image (default None)

Type:

DataBaseWrapper

dtype

Type of output (either “torch” or “numpy”) (default “torch”)

Type:

str

extra_info_list
Type:

list

X_dim

Standard dimension that the image will be resized to upon returning it (default (96,96,96))

Type:

tuple

Y_dim

A tuple indicating the dimension of the image’s label. The first number is the number of labels associated with the image and the second is the number of choices each label has. Extra choices will not affect the model, but too few will throw an error. Thus, if Y_dim is (1,2) and the label has three classes, it will crash, whereas with (1,4) the unused output is simply always zero. This should match the Y_dim parameter in the associated MultiInputModule (default (1,32))

Type:

tuple

C_dim

A tuple indicating the dimension of the image’s confounds. This effectively operates the same way as Y_dim, except the default number of confounds is higher (default (16,32))

Type:

tuple

image

Variable containing the actual image, at size dim. It may be None, to save memory (default None)

Type:

Numpy array

Y

Variable containing the encoding of the image label(s), at size Y_dim

Type:

Numpy array

C

Variable containing the encoding of the image confound(s), at size C_dim

Type:

Numpy array

y_on_c

If true, replicates the Y array on the bottom of all C arrays. Used for regression training. C_dim must be large enough to accommodate the extra Y array or it will crash. (default False)

Type:

bool

times_called

Counter to count the number of times get_X is called (default 0)

Type:

int

static_inputs

A list of values that may be called to put into the model as text (e.g. “SEX”, “AGE”)

Type:

list

static_input_res

The values once they’re looked up from the database (e.g. “MALE”, “22”)

Type:

list

cache

If true, caches the image file as a .npy array. Takes up extra space but it’s recommended. (default True)

Type:

bool

npy_file

Path of the cached .npy record

Type:

str

exam_date

Date that the image was taken, if it can be read in from the database/dicom records (default None)

Type:

datetime

bdate

Birth date of the patient, if it can be read in from the database/dicom records (default None)

Type:

datetime

json_file

File name of the json that results from a DICOM being converted to nifti (default None)

Type:

str

loaded

True if images are loaded into main memory, False if not (default False)

Type:

bool

clear_image()

Clears the array data from main memory

get_C(confound=None, return_lim=False)
get_C_dud(confound=None, return_lim=False)

Returns an array of duds with the same dimensionality as C

Returns an array of duds with the same dimensionality as C but with all values set to the first choice. Used in training the regressor. If y_on_c is set to True, this replicates the Y array on the bottom rows of the array.

get_X(augment=False)

Reads in and returns the image, with the option to augment

get_X_files()
get_Y(label=None)
get_image_type(cache=True)

Determines the type of image that self.filename is

get_mem() float

Estimates the memory of the larger objects stored in ImageRecord

read_image()
class multi_med_image_ml.Records.PatientRecord(pid, items)

Bases: object

Returns text records, like medication history, of a given patient

pid

Patient ID

Type:

str

get_record(item)
get_records(confounds)
class multi_med_image_ml.Records.Record(static_inputs=[], database=None)

Bases: object

get_ID()
get_birth_date()
get_exam_date()
get_static_inputs()

Loads in static inputs from the database

load_extra_info()
multi_med_image_ml.Records.TextRecord(Record)

multi_med_image_ml.models module

multi_med_image_ml.utils module

multi_med_image_ml.utils.YC_conv(Y, C, y_weight)
multi_med_image_ml.utils.bucketize(arr, n_buckets)
multi_med_image_ml.utils.check_key_to_filename(key_to_filename: Callable[[str, bool], str])

Verifies that the key to file name conversion method is working properly

This method is called to verify that a user-defined key-to-filename function is properly implemented, such that the function is able to convert an input path to a key forwards and backwards.
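
The forwards-and-backwards property being verified can be sketched as a round-trip test. `round_trips`, `identity`, and `broken` are hypothetical names for this illustration, and the meaning assigned to `reverse` (filename back to key) is an assumption.

```python
def round_trips(key_to_filename, sample_key):
    """Check that converting key -> filename -> key returns the original key."""
    filename = key_to_filename(sample_key, False)
    return key_to_filename(filename, True) == sample_key


def identity(name, reverse=False):
    """Trivial mapping in which the file path is itself the key."""
    return name


def broken(name, reverse=False):
    """Faulty mapping that ignores `reverse`, so it cannot be undone."""
    return name + ".npy"


ok = round_trips(identity, "/path/to/img.npy")
bad = round_trips(broken, "/path/to/img")
```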

multi_med_image_ml.utils.class_balance(classes, confounds, plim=0.05, recurse=True, exclude_none=True, unique_classes=None)
multi_med_image_ml.utils.combine_covar(C1, C2, M1, M2, N1, N2)
multi_med_image_ml.utils.compile_dicom(dicom_folder: str, cache=True, db_builder=None, verbose=False)

Compiles a folder of DICOMs into a .nii and .json file

Takes a folder of dicom files and turns it into a .nii.gz file, with metadata stored in a .json file. Relies on dcm2niix.

Parameters:
  • dicom_folder (str) – The folder with DICOM files

  • cache (bool) – Whether to cache .npy files in the DICOM folder

  • db_builder (multi_med_image_ml.DataBaseWrapper.DataBaseWrapper) – Object that may optionally be input for building up the database

multi_med_image_ml.utils.compile_dicom_folder(dicom_folder: str, db_builder=None)

Converts a folder of dicoms to a .nii.gz, with .json metadata

Uses dcm2niix, since that has had the best results overall when converting dicom to nifti, even though it’s a system command. Uses pydicom as a backup. The resulting files are stored in the folder. Also takes a DataBaseWrapper object for building the database in real time.

Parameters:
  • dicom_folder (str) – Folder of interest

  • db_builder (multi_med_image_ml.DataBaseWrapper.DataBaseWrapper) – Optional input for building up the database

multi_med_image_ml.utils.compile_dicom_py(dicom_folder: str)
multi_med_image_ml.utils.date_sorter(folder, ext)
multi_med_image_ml.utils.determine_random_partition(arr2d, labels)
multi_med_image_ml.utils.determine_random_partition2(arr2d, labels)
multi_med_image_ml.utils.diagnose_network(net, name='network')

Calculate and print the mean of the average absolute gradients

Parameters:
  • net (torch network) –

  • name (str) –

multi_med_image_ml.utils.discretize_value(v, buckets)
multi_med_image_ml.utils.download_file_from_google_drive(file_id: str, destination: str)

Downloads files from Google drive

Downloads files from Google drive and saves them to a destination.

Parameters:
  • file_id (str) – ID in the Google Drive URL

  • destination (str) – Place to save the file to

multi_med_image_ml.utils.download_weights(weights: str)

Downloads and caches pretrained model weights

Downloads model weights from Google drive and stores them in the user’s cache for future use.

Parameters:

weights (str) – String indicating which weights can be used.

multi_med_image_ml.utils.encode_static_inputs(static_input, d=512)
multi_med_image_ml.utils.equal_terms(term)
multi_med_image_ml.utils.get_balanced_filename_list(test_variable, confounds_array, selection_ratios=[0.66, 0.16, 0.16], selection_limits=[inf, inf, inf], value_ranges=[], output_selection_savepath=None, test_value_ranges=None, get_all_test_set=False, total_size_limit=None, verbose=False, non_confound_value_ranges={}, database=None, n_buckets=10, patient_id_key=None)
multi_med_image_ml.utils.get_class_selection(classes, primed, unique_classes=None)
multi_med_image_ml.utils.get_confirm_token(response)
multi_med_image_ml.utils.get_data_from_filenames(filename_list, test_variable=None, confounds=None, return_as_strs=False, unique_test_vals=None, database=None, return_choice_arr=False, dict_obj=None, return_as_dict=False, key_to_filename=None, X_encoder=None, vae_encoder=False, uniques=None, density_confound_sort=True, n_buckets=3)
multi_med_image_ml.utils.get_dim_str(filename: str | None = None, X_dim: tuple | None = None, outtype: str = '.npy') str

Converts an input filename to the filename of the cached .npy file

Given an input filename (e.g. /path/to/myfile.nii.gz) with a given dimension (e.g. (96,48,48)), converts the filepath to the cached version (e.g. /path/to/myfile_resized_96_48_48.npy). Perfect cube dimensions are annotated with a single number rather than three. If no filename is input, the string itself is returned (resized_96_48_48.npy).

Parameters:
  • filename (str) – Name of the file to be converted (Default None)

  • X_dim (tuple) – Size that the image is going to be resized to (Default None)

  • outtype (str) –

Returns:

String of the cached image file, or a string that can be added to a filename
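
The naming scheme described above can be sketched as follows. `dim_str_sketch` is a hypothetical reimplementation based only on the examples in this description; the list of stripped extensions and the exact form of the single-number cube suffix are assumptions.

```python
def dim_str_sketch(filename=None, X_dim=None, outtype=".npy"):
    """Build the cached-file name (or just the suffix) for a target dimension."""
    if X_dim is None:
        raise ValueError("X_dim is required in this sketch")
    if len(set(X_dim)) == 1:
        dims = str(X_dim[0])  # perfect cube: a single number (assumed form)
    else:
        dims = "_".join(str(d) for d in X_dim)
    suffix = f"resized_{dims}{outtype}"
    if filename is None:
        return suffix
    base = filename
    for ext in (".nii.gz", ".nii", ".npy", ".dcm"):  # assumed extension list
        if base.endswith(ext):
            base = base[: -len(ext)]
            break
    return f"{base}_{suffix}"


cached = dim_str_sketch("/path/to/myfile.nii.gz", (96, 48, 48))
bare = dim_str_sketch(X_dim=(96, 48, 48))
cube = dim_str_sketch("/path/to/myfile.nii.gz", (96, 96, 96))
```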

multi_med_image_ml.utils.get_file_list(obj, allow_list_of_list: str = True, db_builder=None)

Searches a folder tree for all applicable images.

Uses the os.walk method to search a folder tree and returns a list of image files. Relies on get_file_list_from_str and get_file_list_from_list to do so. Takes in a DataBaseWrapper (db_builder) to build up a pandas dataframe during the search.

Parameters:
  • obj (list or str) – List or string of interest

  • allow_list_of_list (str) – Allows lists of lists to be parsed

  • db_builder (multi_med_image_ml.DataBaseWrapper.DataBaseWrapper) – Optional input to allow database to be built up
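
A folder-tree search of this kind can be sketched with `os.walk` alone. `walk_for_images` and the `IMAGE_EXTS` tuple are assumptions for illustration; the real function also handles lists of inputs and can feed a DataBaseWrapper, which this sketch omits.

```python
import os
import tempfile

IMAGE_EXTS = (".nii", ".nii.gz", ".npy", ".dcm")  # assumed set of image extensions


def walk_for_images(root):
    """Return a sorted list of image-like files found anywhere under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("."):
                continue  # skip hidden or temporary files
            if name.lower().endswith(IMAGE_EXTS):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.nii.gz", "notes.txt", os.path.join("sub", "b.npy")):
        open(os.path.join(root, rel), "w").close()
    names = [os.path.relpath(p, root) for p in walk_for_images(root)]
```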

multi_med_image_ml.utils.get_file_list_from_list(obj, allow_list_of_list=True, db_builder=None)
multi_med_image_ml.utils.get_file_list_from_str(obj, db_builder=None)
multi_med_image_ml.utils.get_first_n_primes(n)
multi_med_image_ml.utils.get_lr(optimizer)
multi_med_image_ml.utils.get_multilabel_acc(y_pred, Y)
multi_med_image_ml.utils.get_none_array(classes=None, confounds=None)
multi_med_image_ml.utils.get_prime_form(confounds, n_buckets, sorted_confounds=None)
multi_med_image_ml.utils.hidden_batch_predictions(X, model, group_vars, last_icd, last_hidden_var, ensemble=False, device=None)
multi_med_image_ml.utils.integrate_arrs(S1, S2)
multi_med_image_ml.utils.integrate_arrs_none(S1, S2)
multi_med_image_ml.utils.is_dicom(filename)

Determines if file is dicom

multi_med_image_ml.utils.is_float(N)
multi_med_image_ml.utils.is_image_file(filename: str) bool

Determines if input file is medical image

Determines if the input is an applicable image file. Excludes temporary files.

Parameters:

filename (str) – Path to file

Returns:

bool

multi_med_image_ml.utils.is_list_str(s)
multi_med_image_ml.utils.is_nan(k, inc_null_str=False)
multi_med_image_ml.utils.key_to_filename_default(filename: str, reverse: bool = False) str

Default function for converting a pandas key to a filename

This function can be replaced by a more elaborate one that is able to convert the location of a .npy file on a filesystem to a lookup key in a pandas dataframe. By default, the file path is the key.
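
A custom replacement might look like the sketch below. Everything in it is hypothetical: the `DATA_ROOT` location, the key layout, and the interpretation that `reverse=True` maps a filename back to its key. `posixpath` is used so that the example behaves the same on every platform.

```python
import posixpath

DATA_ROOT = "/data/scans"  # hypothetical storage location


def key_to_filename_custom(name, reverse=False):
    """Map a short database key to a .npy path, or back again when reverse=True."""
    if reverse:
        rel = posixpath.relpath(name, DATA_ROOT)
        return rel[: -len(".npy")] if rel.endswith(".npy") else rel
    return posixpath.join(DATA_ROOT, name + ".npy")


key = "patient01/scan_a"
path = key_to_filename_custom(key)
recovered = key_to_filename_custom(path, reverse=True)
```

Keeping keys short and independent of the storage root means the database can stay valid if the image folder is moved, provided the function is updated accordingly.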

multi_med_image_ml.utils.label_to_community(labels)
multi_med_image_ml.utils.list_to_str(val)
multi_med_image_ml.utils.mod_meas(arr2d, labels)
multi_med_image_ml.utils.multi_mannwhitneyu(arr)
multi_med_image_ml.utils.nifti_to_np(nifti_filepath, X_dim)
multi_med_image_ml.utils.not_temp(filename)
multi_med_image_ml.utils.output_test(args, test_val_ranges, output_results, test_predictions_file, mucran, database, X_files=None, return_Xfiles=False)
multi_med_image_ml.utils.parsedate(d, date_format='%Y-%m-%d %H:%M:%S')
multi_med_image_ml.utils.prime(i, primes)
multi_med_image_ml.utils.print_numpy(x, val=True, shp=False)

Print the mean, min, max, median, std, and size of a numpy array

Parameters:
  • val (bool) –

  • shp (bool) –

multi_med_image_ml.utils.recompute_selection_ratios(selection_ratios, selection_limits, N)
multi_med_image_ml.utils.resize_np(nifti_data, dim)
multi_med_image_ml.utils.save_image(image_numpy, image_path, aspect_ratio=1.0)

Save a numpy image to the disk

Parameters:
  • image_numpy (numpy array) –

  • image_path (str) –

multi_med_image_ml.utils.save_response_content(response, destination)
multi_med_image_ml.utils.separate_set(selections, set_divisions=[0.5, 0.5], IDs=None)
multi_med_image_ml.utils.str_to_list(s, nospace=False)
multi_med_image_ml.utils.tensor2im(input_image, imtype=<class 'numpy.uint8'>)

Converts a Tensor array into a numpy image array.

Parameters:
  • input_image (tensor) –

  • imtype (type) –

multi_med_image_ml.utils.test_all(classes, confounds)
multi_med_image_ml.utils.text_to_bin(text, n_bin=32, d=512)

Encodes strings as binary arrays

multi_med_image_ml.utils.validate_database(database, args)

Module contents