spacr package

Submodules

spacr.app_annotate module

spacr.app_annotate.initiate_annotation_app(parent_frame)[source]
spacr.app_annotate.start_annotate_app()[source]

spacr.app_classify module

spacr.app_classify.start_classify_app()[source]

spacr.app_make_masks module

spacr.app_make_masks.initiate_make_mask_app(parent_frame)[source]
spacr.app_make_masks.start_make_mask_app()[source]

spacr.app_mask module

spacr.app_mask.start_mask_app()[source]

spacr.app_measure module

spacr.app_measure.start_measure_app()[source]

spacr.app_sequencing module

spacr.app_sequencing.start_seq_app()[source]

spacr.app_umap module

spacr.app_umap.start_umap_app()[source]

spacr.core module

spacr.core.all_elements_match(list1, list2)[source]
spacr.core.analyze_data_reg(sequencing_loc, dv_loc, agg_type='mean', min_cell_count=50, min_reads=100, min_wells=2, max_wells=1000, remove_outlier_genes=False, refine_model=False, by_plate=False, threshold=0.5, fishers=False)[source]
spacr.core.analyze_plaques(folder)[source]
spacr.core.analyze_recruitment(settings={})[source]

Analyze recruitment data by grouping the DataFrame by well coordinates and plotting controls and recruitment data.

Parameters: settings (dict): Dictionary of settings that control the analysis.

Returns: None

spacr.core.annotate_results(pred_loc)[source]
spacr.core.apply_model(src, model_path, image_size=224, batch_size=64, normalize=True, n_jobs=10)[source]
spacr.core.apply_model_to_tar(settings={})[source]
spacr.core.check_cellpose_models(settings)[source]
spacr.core.check_index(df, elements=5, split_char='_')[source]
spacr.core.compare_cellpose_masks(src, verbose=False, processes=None, save=True)[source]
spacr.core.compare_mask(args)[source]
spacr.core.find_optimal_threshold(y_true, y_pred_proba)[source]

Find the optimal threshold for binary classification based on the F1-score.

Args: y_true (array-like): True binary labels. y_pred_proba (array-like): Predicted probabilities for the positive class.

Returns: float: The optimal threshold.
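A minimal sketch of an F1-based threshold search, assuming every distinct predicted probability is tried as a candidate cut-off. The helper names are hypothetical, and the actual implementation in spacr.core may choose its candidates differently.

```python
def f1_at_threshold(y_true, y_pred_proba, threshold):
    """F1-score obtained when probabilities >= threshold are called positive."""
    pred = [1 if p >= threshold else 0 for p in y_pred_proba]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, y_pred_proba):
    """Return the candidate threshold with the highest F1-score."""
    candidates = sorted(set(y_pred_proba))
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_pred_proba, t))


y_true = [0, 0, 1, 1]
y_pred_proba = [0.1, 0.4, 0.35, 0.8]
print(best_f1_threshold(y_true, y_pred_proba))
```

On ties, this sketch returns the lowest of the best-scoring thresholds, which is an arbitrary choice.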

spacr.core.generate_cellpose_masks(src, settings, object_type)[source]
spacr.core.generate_dataset(settings={})[source]
spacr.core.generate_dataset_from_lists(dst, class_data, classes, test_split=0.1)[source]
spacr.core.generate_image_umap(settings={})[source]

Generate UMAP or tSNE embedding and visualize the data with clustering.

Parameters: settings (dict): Dictionary containing the following keys:
  • src (str) – Source directory containing the data.

  • row_limit (int) – Limit the number of rows to process.

  • tables (list) – List of table names to read from the database.

  • visualize (str) – Visualization type.

  • image_nr (int) – Number of images to display.

  • dot_size (int) – Size of dots in the scatter plot.

  • n_neighbors (int) – Number of neighbors for UMAP.

  • figuresize (int) – Size of the figure.

  • black_background (bool) – Whether to use a black background.

  • remove_image_canvas (bool) – Whether to remove the image canvas.

  • plot_outlines (bool) – Whether to plot outlines.

  • plot_points (bool) – Whether to plot points.

  • smooth_lines (bool) – Whether to smooth lines.

  • verbose (bool) – Whether to print verbose output.

  • embedding_by_controls (bool) – Whether to use embedding from controls.

  • col_to_compare (str) – Column to compare for control-based embedding.

  • pos (str) – Positive control value.

  • neg (str) – Negative control value.

  • clustering (str) – Clustering method (‘DBSCAN’ or ‘KMeans’).

  • exclude (list) – List of columns to exclude from the analysis.

  • plot_images (bool) – Whether to plot images.

  • reduction_method (str) – Dimensionality reduction method (‘UMAP’ or ‘tSNE’).

  • save_figure (bool) – Whether to save the figure as a PDF.

Returns: pd.DataFrame: DataFrame with the original data and an additional column ‘cluster’ containing the cluster identity.
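As an illustration, a settings dictionary built from some of the keys listed above might look like the following. Every value is a placeholder assumption rather than a documented default, the path is hypothetical, and keys whose accepted values are not documented here (such as visualize) are left out.

```python
# Placeholder values only; adjust to the experiment at hand.
settings = {
    "src": "path/to/experiment",    # hypothetical source directory
    "row_limit": 1000,              # cap on rows read from the database
    "tables": ["cell", "nucleus"],  # table names to read
    "image_nr": 16,
    "dot_size": 50,
    "n_neighbors": 15,
    "black_background": True,
    "plot_outlines": True,
    "plot_points": True,
    "verbose": False,
    "embedding_by_controls": False,
    "col_to_compare": "col",
    "pos": "c2",
    "neg": "c1",
    "clustering": "DBSCAN",         # documented options: 'DBSCAN' or 'KMeans'
    "exclude": [],
    "plot_images": True,
    "reduction_method": "UMAP",     # documented options: 'UMAP' or 'tSNE'
    "save_figure": False,
}

# The call itself is left commented out because it needs spacr and real data:
# from spacr.core import generate_image_umap
# df = generate_image_umap(settings)
```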

spacr.core.generate_loaders(src, mode='train', image_size=224, batch_size=32, classes=['nc', 'pc'], n_jobs=None, validation_split=0.0, pin_memory=False, normalize=False, channels=[1, 2, 3], augment=False, preload_batches=3, verbose=False)[source]

Generate data loaders for training and validation/test datasets.

Parameters:
  • src (str) – The source directory containing the data.

  • mode (str) – The mode of operation. Options are ‘train’ or ‘test’.

  • image_size (int) – The size of the input images.

  • batch_size (int) – The batch size for the data loaders.

  • classes (list) – The list of classes to consider.

  • n_jobs (int) – The number of worker threads for data loading.

  • validation_split (float) – The fraction of data to use for validation.

  • pin_memory (bool) – Whether to pin memory for faster data transfer.

  • normalize (bool) – Whether to normalize the input images.

  • verbose (bool) – Whether to print additional information and show images.

  • channels (list) – The list of channels to retain. Options are [1, 2, 3] for all channels, [1, 2] for blue and green, etc.

Returns: - train_loaders (list): List of data loaders for training datasets. - val_loaders (list): List of data loaders for validation datasets.
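The effect of validation_split can be sketched in plain Python as a shuffled split of the available items. This is only an assumption about the general idea: the function name and the seed argument are hypothetical, and generate_loaders itself may split the data differently.

```python
import random


def split_train_val(items, validation_split=0.2, seed=0):
    """Shuffle items and hold out a fraction of them for validation."""
    if not 0.0 <= validation_split < 1.0:
        raise ValueError("validation_split must be in [0, 1)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_val = int(len(shuffled) * validation_split)
    return shuffled[n_val:], shuffled[:n_val]


train_items, val_items = split_train_val(range(10), validation_split=0.2)
print(len(train_items), len(val_items))
```

With validation_split=0.0 (the documented default) the validation list in this sketch is simply empty.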

spacr.core.generate_masks_from_imgs(src, model, model_name, batch_size, diameter, cellprob_threshold, flow_threshold, grayscale, save, normalize, channels, percentiles, circular, invert, plot, resize, target_height, target_width, remove_background, background, Signal_to_noise, verbose)[source]
spacr.core.generate_ml_scores(src, settings)[source]
spacr.core.generate_training_data_file_list(src, target='protein of interest', cell_dim=4, nucleus_dim=5, pathogen_dim=6, channel_of_interest=1, pathogen_size_min=0, nucleus_size_min=0, cell_size_min=0, pathogen_min=0, nucleus_min=0, cell_min=0, target_min=0, mask_chans=[0, 1, 2], filter_data=False, include_noninfected=False, include_multiinfected=False, include_multinucleated=False, cells_per_well=10, save_filtered_filelist=False)[source]
spacr.core.generate_training_dataset(settings)[source]
spacr.core.identify_masks_finetune(settings)[source]
spacr.core.jitterplot_by_annotation(src, x_column, y_column, plot_title='Jitter Plot', output_path=None, filter_column=None, filter_values=None)[source]

Reads a CSV file and creates a jitter plot of one column grouped by another column.

Args:
  • src (str) – Path to the source data.

  • x_column (str) – Name of the column to be used for the x-axis.

  • y_column (str) – Name of the column to be used for the y-axis.

  • plot_title (str) – Title of the plot. Default is ‘Jitter Plot’.

  • output_path (str) – Path to save the plot image. If None, the plot will be displayed. Default is None.

Returns: pd.DataFrame: The filtered and balanced DataFrame.

spacr.core.join_measurments_and_annotation(src, tables=['cell', 'nucleus', 'pathogen', 'cytoplasm'])[source]
spacr.core.map_condition(col_value, neg='c1', pos='c2', mix='c3')[source]
spacr.core.merge_pred_mes(src, pred_loc, target='protein of interest', cell_dim=4, nucleus_dim=5, pathogen_dim=6, channel_of_interest=1, pathogen_size_min=0, nucleus_size_min=0, cell_size_min=0, pathogen_min=0, nucleus_min=0, cell_min=0, target_min=0, mask_chans=[0, 1, 2], filter_data=False, include_noninfected=False, include_multiinfected=False, include_multinucleated=False, cells_per_well=10, save_filtered_filelist=False, verbose=False)[source]
spacr.core.ml_analysis(df, channel_of_interest=3, location_column='col', positive_control='c2', negative_control='c1', exclude=None, n_repeats=10, top_features=30, n_estimators=100, test_size=0.2, model_type='xgboost', n_jobs=-1, remove_low_variance_features=True, remove_highly_correlated_features=True, verbose=False)[source]

Calculates permutation importance for the numerical features in the dataframe by comparing groups defined by specified column values, then uses the trained model to predict the class of all other rows in the dataframe.

Args:
  • df (pandas.DataFrame) – The DataFrame containing the data.

  • feature_string (str) – String to filter features that contain this substring.

  • location_column (str) – Column name to use for comparing groups.

  • positive_control, negative_control (str) – Values in location_column to create subsets for comparison.

  • exclude (list or str, optional) – Columns to exclude from features.

  • n_repeats (int) – Number of repeats for permutation importance.

  • top_features (int) – Number of top features to plot based on permutation importance.

  • n_estimators (int) – Number of trees in the random forest, gradient boosting, or XGBoost model.

  • test_size (float) – Proportion of the dataset to include in the test split.

  • random_state (int) – Random seed for reproducibility.

  • model_type (str) – Type of model to use (‘random_forest’, ‘logistic_regression’, ‘gradient_boosting’, ‘xgboost’).

  • n_jobs (int) – Number of jobs to run in parallel for applicable models.

Returns: pandas.DataFrame: The original dataframe with added prediction and data usage columns. pandas.DataFrame: DataFrame containing the importances and standard deviations.
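Permutation importance, as described above, can be illustrated without any machine-learning library: shuffle one feature column at a time and record how much the model's accuracy drops. The sketch below assumes a plain predict callable and list-of-lists data; all names are hypothetical and it is not the code used by ml_analysis.

```python
import random
from statistics import mean, pstdev


def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance_sketch(predict, X, y, n_repeats=10, seed=0):
    """Return (mean, std) of the accuracy drop for each feature column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    results = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)  # break the link between this feature and y
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        results.append((mean(drops), pstdev(drops)))
    return results


# Toy data: column 0 determines the label, column 1 is ignored by the model.
X = [[0, 5], [1, 3], [0, 7], [1, 1], [0, 2], [1, 9], [0, 4], [1, 6]]
y = [row[0] for row in X]
importances = permutation_importance_sketch(lambda row: row[0], X, y)
print(importances)
```

A feature the model never reads, such as column 1 here, gets an importance of exactly zero, which is the baseline against which informative features are compared.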

spacr.core.prepare_batch_for_cellpose(batch)[source]
spacr.core.preprocess_generate_masks(src, settings={})[source]
spacr.core.process_reads(df, min_reads, min_wells, max_wells, gene_column, remove_outliers=False)[source]

Perform a hyperparameter search for UMAP or tSNE on the given data.

Parameters: settings (dict): Dictionary containing the following keys:
  • src (str) – Source directory containing the data.

  • row_limit (int) – Limit the number of rows to process.

  • tables (list) – List of table names to read from the database.

  • filter_by (str) – Column to filter the data.

  • sample_size (int) – Number of samples to use for the hyperparameter search.

  • remove_highly_correlated (bool) – Whether to remove highly correlated columns.

  • log_data (bool) – Whether to log transform the data.

  • verbose (bool) – Whether to print verbose output.

  • reduction_method (str) – Dimensionality reduction method (‘UMAP’ or ‘tSNE’).

  • reduction_params (list) – List of dictionaries containing hyperparameters to test for the reduction method.

  • dbscan_params (list) – List of dictionaries containing DBSCAN hyperparameters to test.

  • kmeans_params (list) – List of dictionaries containing KMeans hyperparameters to test.

  • pointsize (int) – Size of the points in the scatter plot.

  • save (bool) – Whether to save the resulting plot as a file.

Returns: None

spacr.core.regression_analasys(dv_df, sequencing_loc, min_reads=75, min_wells=2, max_wells=0, model_type='mlr', min_cells=100, transform='logit', min_frequency=0.05, gene_column='gene', effect_size_threshold=0.25, fishers=True, clean_regression=False, VIF_threshold=10)[source]
spacr.core.save_results_and_figure(src, fig, results)[source]
spacr.core.shap_analysis(model, X_train, X_test)[source]

Performs SHAP analysis on the given model and data.

Args: model: The trained model. X_train (pandas.DataFrame): Training feature set. X_test (pandas.DataFrame): Testing feature set.

Returns: fig: Matplotlib figure object containing the SHAP summary plot.

spacr.core.train_cellpose(settings)[source]
spacr.core.training_dataset_from_annotation(db_path, dst, annotation_column='test', annotated_classes=(1, 2))[source]

spacr.deep_spacr module

class spacr.deep_spacr.SmoothGrad(model, n_samples=50, stdev_spread=0.15)[source]

Bases: object

compute_smooth_grad(input_tensor, target_class)[source]
spacr.deep_spacr.deep_spacr(settings={})[source]
spacr.deep_spacr.evaluate_model_performance(model, loader, epoch, loss_type)[source]

Evaluates the performance of a model on a given data loader.

Parameters:
  • model (torch.nn.Module) – The model to evaluate.

  • loader (torch.utils.data.DataLoader) – The data loader to evaluate the model on.

  • loader_name (str) – The name of the data loader.

  • epoch (int) – The current epoch number.

  • loss_type (str) – The type of loss function to use.

Returns:
  • data_df (pandas.DataFrame) – The classification metrics data as a DataFrame.

  • prediction_pos_probs (list) – The positive class probabilities for each prediction.

  • all_labels (list) – The true labels for each prediction.

Return type:

tuple

spacr.deep_spacr.test_model_core(model, loader, loader_name, epoch, loss_type)[source]
spacr.deep_spacr.test_model_performance(loaders, model, loader_name_list, epoch, loss_type)[source]

Test the performance of a model on given data loaders.

Parameters:
  • loaders (list) – List of data loaders.

  • model – The model to be tested.

  • loader_name_list (list) – List of names for the data loaders.

  • epoch (int) – The current epoch.

  • loss_type – The type of loss function.

Returns:

A tuple containing the test results and the results dataframe.

Return type:

tuple

spacr.deep_spacr.train_model(dst, model_type, train_loaders, epochs=100, learning_rate=0.0001, weight_decay=0.05, amsgrad=False, optimizer_type='adamw', use_checkpoint=False, dropout_rate=0, n_jobs=20, val_loaders=None, test_loaders=None, init_weights='imagenet', intermedeate_save=None, chan_dict=None, schedule=None, loss_type='binary_cross_entropy_with_logits', gradient_accumulation=False, gradient_accumulation_steps=4, channels=['r', 'g', 'b'], verbose=False)[source]

Trains a model using the specified parameters.

Parameters:
  • dst (str) – The destination path to save the model and results.

  • model_type (str) – The type of model to train.

  • train_loaders (list) – A list of training data loaders.

  • epochs (int, optional) – The number of training epochs. Defaults to 100.

  • learning_rate (float, optional) – The learning rate for the optimizer. Defaults to 0.0001.

  • weight_decay (float, optional) – The weight decay for the optimizer. Defaults to 0.05.

  • amsgrad (bool, optional) – Whether to use AMSGrad for the optimizer. Defaults to False.

  • optimizer_type (str, optional) – The type of optimizer to use. Defaults to ‘adamw’.

  • use_checkpoint (bool, optional) – Whether to use checkpointing during training. Defaults to False.

  • dropout_rate (float, optional) – The dropout rate for the model. Defaults to 0.

  • n_jobs (int, optional) – The number of parallel jobs to use for data loading. Defaults to 20.

  • val_loaders (list, optional) – A list of validation data loaders. Defaults to None.

  • test_loaders (list, optional) – A list of test data loaders. Defaults to None.

  • init_weights (str, optional) – The initialization weights for the model. Defaults to ‘imagenet’.

  • intermedeate_save (list, optional) – The intermediate save thresholds. Defaults to None.

  • chan_dict (dict, optional) – The channel dictionary. Defaults to None.

  • schedule (str, optional) – The learning rate schedule. Defaults to None.

  • loss_type (str, optional) – The loss function type. Defaults to ‘binary_cross_entropy_with_logits’.

  • gradient_accumulation (bool, optional) – Whether to use gradient accumulation. Defaults to False.

  • gradient_accumulation_steps (int, optional) – The number of steps for gradient accumulation. Defaults to 4.

Returns:

None

spacr.deep_spacr.train_test_model(settings)[source]
spacr.deep_spacr.visualize_classes(model, dtype, class_names, **kwargs)[source]
spacr.deep_spacr.visualize_grad_cam(src, model_path, target_layers=None, image_size=224, channels=[1, 2, 3], normalize=True, class_names=None, save_cam=False, save_dir='grad_cam')[source]
spacr.deep_spacr.visualize_integrated_gradients(src, model_path, target_label_idx=0, image_size=224, channels=[1, 2, 3], normalize=True, save_integrated_grads=False, save_dir='integrated_grads')[source]
spacr.deep_spacr.visualize_saliency_map(src, model_type='maxvit', model_path='', image_size=224, channels=[1, 2, 3], normalize=True, class_names=None, save_saliency=False, save_dir='saliency_maps')[source]
spacr.deep_spacr.visualize_smooth_grad(src, model_path, target_label_idx, image_size=224, channels=[1, 2, 3], normalize=True, save_smooth_grad=False, save_dir='smooth_grad')[source]

spacr.graph_learning module

class spacr.graph_learning.Decoder(hidden_feats, out_feats)[source]

Bases: Module

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class spacr.graph_learning.Encoder(in_feats, hidden_feats)[source]

Bases: Module

forward(g, features)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class spacr.graph_learning.GraphTransformer(in_feats, hidden_feats, out_feats)[source]

Bases: Module

forward(g, features)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

spacr.graph_learning.analyze_associations(probabilities, sequencing_data)[source]
spacr.graph_learning.apply_model(graphs, model)[source]
spacr.graph_learning.construct_well_graph(images, image_metadata, grna_data)[source]
spacr.graph_learning.create_graphs_for_wells(images, metadata_list, sequencing_data)[source]
spacr.graph_learning.extract_metadata_from_path(path)[source]

Extract metadata from the image path. The path format is expected to be plate_well_field_objectnumber.png

Parameters: path (str): The path to the image file.

Returns: dict: A dictionary with the extracted metadata.
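A minimal sketch of parsing the plate_well_field_objectnumber.png naming scheme described above. The function name and the dictionary keys are assumptions chosen for illustration; the keys returned by extract_metadata_from_path are not documented here.

```python
import os


def parse_image_name(path):
    """Split 'plate_well_field_objectnumber.png' into its four components."""
    stem = os.path.splitext(os.path.basename(path))[0]
    parts = stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"Unexpected file name format: {path!r}")
    plate, well, field, object_number = parts
    return {
        "plate": plate,
        "well": well,
        "field": field,
        "object_number": object_number,
    }


print(parse_image_name("/data/plate1_A01_3_17.png"))
```

The sketch rejects names that do not have exactly four underscore-separated parts, so a plate name containing an underscore would need different handling.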

spacr.graph_learning.generate_synthetic_grna_data(n_samples, n_features)[source]
spacr.graph_learning.load_images(image_paths, image_size=224, channels=[1, 2, 3], normalize=True)[source]
spacr.graph_learning.load_mnist_data()[source]
spacr.graph_learning.normalize_sequencing_data(sequencing_data)[source]
spacr.graph_learning.preprocess_image(image_path, image_size=224, channels=[1, 2, 3], normalize=True)[source]
spacr.graph_learning.process_sequencing_df(seq)[source]
spacr.graph_learning.train(graphs, labels, model, loss_fn, optimizer, epochs=100)[source]
spacr.graph_learning.train_graph_transformer(src, lr=0.01, epochs=100, hidden_feats=128, n_classes=2, row_limit=None, image_size=224, channels=[1, 2, 3], normalize=True, test_mode=False)[source]

spacr.gui module

class spacr.gui.MainApp(default_app=None)[source]

Bases: Tk

clear_description()[source]
clear_frame(frame)[source]
create_startup_screen()[source]
create_widgets()[source]
load_app(app_name, app_func)[source]
show_description(description)[source]
update_description()[source]
spacr.gui.gui_app()[source]

spacr.gui_core module

spacr.gui_core.clear_unused_figures()[source]
spacr.gui_core.display_figure(fig)[source]
spacr.gui_core.import_settings(settings_type='mask')[source]
spacr.gui_core.initiate_abort()[source]
spacr.gui_core.initiate_root(parent, settings_type='mask')[source]

Initializes the root window and sets up the GUI components based on the specified settings type.

Parameters:
  • parent (tkinter.Tk or tkinter.Toplevel) – The parent window for the GUI.

  • settings_type (str, optional) – The type of settings to be displayed in the GUI. Defaults to ‘mask’.

Returns:

A tuple containing the parent frame and the dictionary of variables used in the GUI.

Return type:

tuple

spacr.gui_core.main_thread_update_function(root, q, fig_queue, canvas_widget)[source]
spacr.gui_core.process_console_queue()[source]
spacr.gui_core.process_fig_queue()[source]
spacr.gui_core.set_globals(thread_control_var, q_var, console_output_var, parent_frame_var, vars_dict_var, canvas_var, canvas_widget_var, scrollable_frame_var, fig_queue_var, figures_var, figure_index_var, progress_bar_var, usage_bars_var, fig_memory_limit_var, figure_current_memory_usage_var)[source]
spacr.gui_core.setup_button_section(horizontal_container, settings_type='mask', run=True, abort=True, download=True, import_btn=True)[source]
spacr.gui_core.setup_console(vertical_container)[source]
spacr.gui_core.setup_plot_section(vertical_container)[source]
spacr.gui_core.setup_progress_frame(vertical_container)[source]
spacr.gui_core.setup_settings_panel(vertical_container, settings_type='mask')[source]
spacr.gui_core.setup_usage_panel(horizontal_container, btn_col, uppdate_frequency)[source]
spacr.gui_core.show_next_figure()[source]
spacr.gui_core.show_previous_figure()[source]
spacr.gui_core.start_process(q=None, fig_queue=None, settings_type='mask')[source]
spacr.gui_core.toggle_settings(button_scrollable_frame)[source]

spacr.gui_elements module

class spacr.gui_elements.AnnotateApp(root, db_path, src, image_type=None, channels=None, image_size=200, annotation_column='annotate', normalize=False, percentiles=(1, 99), measurement=None, threshold=None)[source]

Bases: object

add_colored_border(img, border_width, border_color)[source]
calculate_grid_dimensions()[source]
filter_channels(img)[source]
get_on_image_click(path, label, img)[source]
load_images()[source]
load_single_image(path_annotation_tuple)[source]
next_page()[source]
static normalize_image(img, normalize=False, percentiles=(1, 99))[source]
prefilter_paths_annotations()[source]
previous_page()[source]
shutdown()[source]
update_database_worker()[source]
update_gui_text(text)[source]
static update_html(text)[source]
class spacr.gui_elements.ModifyMaskApp(root, folder_path, scale_factor)[source]

Bases: object

apply_brush(event)[source]
apply_brush_release(event)[source]
apply_dividing_line()[source]
apply_magic_wand(image, mask, seed_point, tolerance, maximum, action='add')[source]
apply_normalization()[source]
apply_zoom_on_enter(event)[source]
canvas_to_image(x_canvas, y_canvas)[source]
clear_objects()[source]
display_image()[source]
display_zoomed_image()[source]
draw(event)[source]
draw_on_zoomed_mask(draw_coordinates)[source]
erase_brush(event)[source]
erase_brush_release(event)[source]
erase_object(event)[source]
fill_objects()[source]
finish_dividing_line(event)[source]
finish_drawing(event)[source]
finish_drawing_if_active(event)[source]
get_scaling_factors(img_width, img_height, canvas_width, canvas_height)[source]
image_to_canvas(x_image, y_image)[source]
initialize_flags()[source]
invert_mask()[source]
load_first_image()[source]
load_image_and_mask(index)[source]
magic_wand_normal(seed_point, tolerance, action)[source]
magic_wand_zoomed(seed_point, tolerance, action)[source]
next_image()[source]
normalize_image(image, lower_quantile, upper_quantile)[source]
overlay_mask_on_image(image, mask, alpha=0.5)[source]
previous_image()[source]
relabel_objects()[source]
remove_small_objects()[source]
resize_arrays(img, mask)[source]
save_mask()[source]
set_zoom_rectangle_end(event)[source]
set_zoom_rectangle_start(event)[source]
setup_canvas()[source]
setup_function_toolbar()[source]
setup_mode_toolbar()[source]
setup_navigation_toolbar()[source]
setup_zoom_toolbar()[source]
start_dividing_line(event)[source]
toggle_brush_mode()[source]
toggle_dividing_line_mode()[source]
toggle_draw_mode()[source]
toggle_erase_mode()[source]
toggle_magic_wand_mode()[source]
toggle_zoom_mode()[source]
update_display()[source]
update_dividing_line_preview(event)[source]
update_mouse_info(event)[source]
update_original_mask(zoomed_mask, x0, x1, y0, y1)[source]
update_original_mask_from_zoom()[source]
update_zoom_box(event)[source]
use_magic_wand(event)[source]
spacr.gui_elements.create_menu_bar(root)[source]
spacr.gui_elements.modify_figure(fig)[source]
spacr.gui_elements.modify_figure_properties(fig, scale_x=None, scale_y=None, line_width=None, font_size=None, x_lim=None, y_lim=None, grid=False, legend=None, title=None, x_label_rotation=None, remove_axes=False, bg_color=None, text_color=None, line_color=None)[source]

Modifies the properties of the figure, including scaling, line widths, font sizes, axis limits, x-axis label rotation, background color, text color, line color, and other common options.

Parameters:
  • fig – The Matplotlib figure object to modify.

  • scale_x – Scaling factor for the width of subplots (optional).

  • scale_y – Scaling factor for the height of subplots (optional).

  • line_width – Desired line width for all lines (optional).

  • font_size – Desired font size for all text (optional).

  • x_lim – Tuple specifying the x-axis limits (min, max) (optional).

  • y_lim – Tuple specifying the y-axis limits (min, max) (optional).

  • grid – Boolean to add grid lines to the plot (optional).

  • legend – Boolean to show/hide the legend (optional).

  • title – String to set as the title of the plot (optional).

  • x_label_rotation – Angle to rotate the x-axis labels (optional).

  • remove_axes – Boolean to remove or show the axes labels (optional).

  • bg_color – Color for the figure and subplot background (optional).

  • text_color – Color for all text in the figure (optional).

  • line_color – Color for all lines in the figure (optional).

spacr.gui_elements.save_figure_as_format(fig, file_format)[source]
spacr.gui_elements.set_dark_style(style, parent_frame=None, containers=None, widgets=None, font_family='OpenSans', font_size=12, bg_color='black', fg_color='white', active_color='blue', inactive_color='dark_gray')[source]
spacr.gui_elements.set_element_size()[source]
class spacr.gui_elements.spacrButton(parent, text='', command=None, font=None, icon_name=None, size=50, show_text=True, outline=False, animation=True, *args, **kwargs)[source]

Bases: Frame

animate_zoom(target_scale, steps=10, delay=10)[source]
clear_description(event)[source]
create_rounded_rectangle(x1, y1, x2, y2, radius=20, **kwargs)[source]
get_icon_path(icon_name)[source]
load_icon()[source]
on_click(event=None)[source]
on_enter(event=None)[source]
on_leave(event=None)[source]
update_description(event)[source]
zoom_icon(scale_factor)[source]
class spacr.gui_elements.spacrCheck(parent, text='', variable=None, *args, **kwargs)[source]

Bases: Frame

draw_rounded_square(color)[source]
toggle_variable(event)[source]
update_check(*args)[source]
class spacr.gui_elements.spacrCheckbutton(parent, text='', variable=None, command=None, *args, **kwargs)[source]

Bases: Checkbutton

class spacr.gui_elements.spacrCombo(parent, textvariable=None, values=None, width=None, *args, **kwargs)[source]

Bases: Frame

close_dropdown()[source]
draw_rounded_rectangle(color)[source]
on_click(event)[source]
on_select(value)[source]
open_dropdown()[source]
set(value)[source]
class spacr.gui_elements.spacrContainer(parent, orient='vertical', bg=None, *args, **kwargs)[source]

Bases: Frame

add(widget, stretch='always')[source]
create_sash()[source]
on_configure(event)[source]
on_enter_sash(event)[source]
on_leave_sash(event)[source]
perform_resize(event)[source]
reposition_panes()[source]
start_resize(event)[source]
class spacr.gui_elements.spacrDropdownMenu(parent, variable, options, command=None, font=None, size=50, **kwargs)[source]

Bases: Frame

create_rounded_rectangle(x1, y1, x2, y2, radius=20, **kwargs)[source]
on_click(event=None)[source]
on_enter(event=None)[source]
on_leave(event=None)[source]
on_select(option)[source]
post_menu()[source]
update_styles(active_categories=None)[source]
class spacr.gui_elements.spacrEntry(parent, textvariable=None, outline=False, width=None, *args, **kwargs)[source]

Bases: Frame

draw_rounded_rectangle(color)[source]
on_focus_in(event)[source]
on_focus_out(event)[source]
class spacr.gui_elements.spacrFont(font_name, font_style, font_size=12)[source]

Bases: object

get_font(size=None)[source]

Returns the font in the specified size.

Parameters: - size: int, the size of the font (optional).

Returns: - tkFont.Font object.

get_font_path(font_name, font_style)[source]

Returns the font path based on the font name and style.

Parameters: - font_name: str, the name of the font. - font_style: str, the style of the font.

Returns: - str, the path to the font file.

load_font()[source]

Loads the font into Tkinter.

class spacr.gui_elements.spacrFrame(container, width=None, *args, bg='black', radius=20, scrollbar=True, textbox=False, **kwargs)[source]

Bases: Frame

rounded_rectangle(canvas, x1, y1, x2, y2, radius=20, **kwargs)[source]
class spacr.gui_elements.spacrLabel(parent, text='', font=None, style=None, align='right', height=None, **kwargs)[source]

Bases: Frame

set_text(text)[source]
class spacr.gui_elements.spacrProgressBar(parent, label=True, *args, **kwargs)[source]

Bases: Progressbar

set_label_position()[source]
update_label()[source]
spacr.gui_elements.spacrScrollbarStyle(style, inactive_color, active_color)[source]
class spacr.gui_elements.spacrSwitch(parent, text='', variable=None, command=None, *args, **kwargs)[source]

Bases: Frame

animate_movement(start_x, end_x, final_color)[source]
animate_switch()[source]
create_rounded_rectangle(x1, y1, x2, y2, radius=9, **kwargs)[source]
get()[source]
set(value)[source]
toggle(event=None)[source]
update_switch()[source]
class spacr.gui_elements.spacrToolTip(widget, text)[source]

Bases: object

hide_tooltip(event)[source]
show_tooltip(event)[source]
spacr.gui_elements.standardize_figure(fig)[source]

spacr.gui_utils module

class spacr.gui_utils.WriteToQueue(q)[source]

Bases: TextIOBase

A custom file-like class that writes any output to a given queue. This can be used to redirect stdout and stderr.

flush()[source]

Flush write buffers, if applicable.

This is not implemented for read-only and non-blocking streams.

write(msg)[source]

Write string to stream. Returns the number of characters written (which is always equal to the length of the string).
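The queue-backed stream idea can be sketched with the standard library alone. The class and helper below are hypothetical stand-ins rather than the WriteToQueue source; they show a TextIOBase subclass whose write() puts each chunk on a queue, combined with contextlib.redirect_stdout.

```python
import contextlib
import io
import queue


class QueueWriter(io.TextIOBase):
    """File-like object that forwards every written string to a queue."""

    def __init__(self, q):
        super().__init__()
        self.q = q

    def writable(self):
        return True

    def write(self, msg):
        self.q.put(msg)
        return len(msg)  # number of characters written

    def flush(self):
        pass  # nothing is buffered, so there is nothing to flush


def capture_output(func):
    """Run func with stdout redirected to a queue and return the joined text."""
    q = queue.Queue()
    with contextlib.redirect_stdout(QueueWriter(q)):
        func()
    chunks = []
    while not q.empty():
        chunks.append(q.get())
    return "".join(chunks)


captured = capture_output(lambda: print("hello"))
```

In a GUI, the consumer side would typically poll the queue from the main thread instead of draining it in one go as this helper does.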

spacr.gui_utils.annotate(settings)[source]
spacr.gui_utils.annotate_app(parent_frame, settings)[source]
spacr.gui_utils.annotate_with_image_refs(settings, root, shutdown_callback)[source]
spacr.gui_utils.cancel_after_tasks(frame)[source]
spacr.gui_utils.convert_settings_dict_for_gui(settings)[source]
spacr.gui_utils.create_input_field(frame, label_text, row, var_type='entry', options=None, default_value=None)[source]

Create an input field in the specified frame.

Parameters:
  • frame (tk.Frame) – The frame in which the input field will be created.

  • label_text (str) – The text to be displayed as the label for the input field.

  • row (int) – The row in which the input field will be placed.

  • var_type (str, optional) – The type of input field to create. Defaults to ‘entry’.

  • options (list, optional) – The list of options for a combo box input field. Defaults to None.

  • default_value (str, optional) – The default value for the input field. Defaults to None.

Returns:

A tuple containing the label, input widget, variable, and custom frame.

Return type:

tuple

Raises:

Exception – If an error occurs while creating the input field.

spacr.gui_utils.download_dataset(q, repo_id, subfolder, local_dir=None, retries=5, delay=5)[source]

Downloads a dataset or settings files from Hugging Face and returns the local path.

Parameters:
  • repo_id (str) – The repository ID (e.g., ‘einarolafsson/toxo_mito’ or ‘einarolafsson/spacr_settings’).

  • subfolder (str) – The subfolder path within the repository (e.g., ‘plate1’ or the settings subfolder).

  • local_dir (str) – The local directory where the files will be saved. Defaults to the user’s home directory.

  • retries (int) – Number of retry attempts in case of failure.

  • delay (int) – Delay in seconds between retries.

Returns:

The local path to the downloaded files.

Return type:

str
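The retries and delay parameters suggest a retry loop around the download. A generic sketch of that pattern follows, assuming a caller-supplied fetch callable; the function name and the injectable sleep argument are hypothetical, and no Hugging Face API is used or implied.

```python
import time


def with_retries(fetch, retries=5, delay=5, sleep=time.sleep):
    """Call fetch() until it succeeds or the retry budget is used up."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < retries:
                sleep(delay)  # wait before the next attempt
    raise RuntimeError(f"Failed after {retries} attempts") from last_error


# Example: a callable that fails twice before succeeding.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise OSError("temporary failure")
    return "ok"


result = with_retries(flaky, retries=5, delay=0, sleep=lambda seconds: None)
```

Passing sleep as an argument keeps the sketch testable without real waiting.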

spacr.gui_utils.download_hug_dataset(q, vars_dict)[source]
spacr.gui_utils.ensure_after_tasks(frame)[source]
spacr.gui_utils.function_gui_wrapper(function=None, settings={}, q=None, fig_queue=None, imports=1)[source]

Wraps the run_multiple_simulations function to integrate with GUI processes.

Parameters: - settings: dict, The settings for the run_multiple_simulations function. - q: multiprocessing.Queue, Queue for logging messages to the GUI. - fig_queue: multiprocessing.Queue, Queue for sending figures to the GUI.

spacr.gui_utils.generate_annotate_fields(frame)[source]
spacr.gui_utils.hide_all_settings(vars_dict, categories)[source]

Function to initially hide all settings in the GUI.

Parameters: - categories: dict, The categories of settings with their corresponding settings. - vars_dict: dict, The dictionary containing the settings and their corresponding widgets.

spacr.gui_utils.initialize_cuda()[source]

Initializes CUDA in the main process by performing a simple GPU operation.

spacr.gui_utils.load_app(root, app_name, app_func)[source]
spacr.gui_utils.load_next_app(root)[source]
spacr.gui_utils.parse_list(value)[source]

Parses a string representation of a list and returns the parsed list.

Parameters:

value (str) – The string representation of the list.

Returns:

The parsed list.

Return type:

list

Raises:

ValueError – If the input value is not a valid list format or contains mixed types or unsupported types.
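
One way to get the behaviour described above is to parse the string with ast.literal_eval and then check the element types. This is a sketch under that assumption; parse_list_sketch is a hypothetical name, and the set of supported types (int, float, str) is a guess rather than the spacr rule.

```python
import ast


def parse_list_sketch(value):
    """Parse a string such as "[1, 2, 3]" into a homogeneous list."""
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"not a valid list: {value!r}") from exc
    if not isinstance(parsed, list):
        raise ValueError(f"expected a list, got {type(parsed).__name__}")
    for item in parsed:
        # bool is a subclass of int, so it is rejected explicitly here
        if isinstance(item, bool) or not isinstance(item, (int, float, str)):
            raise ValueError(f"unsupported element type: {type(item).__name__}")
    if len({type(item) for item in parsed}) > 1:
        raise ValueError("mixed element types")
    return parsed


numbers = parse_list_sketch("[1, 2, 3]")
names = parse_list_sketch("['cell', 'nucleus']")
```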

spacr.gui_utils.proceed_with_app(root, app_name, app_func)[source]
spacr.gui_utils.process_stdout_stderr(q)[source]

Redirect stdout and stderr to the queue q.
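
Redirecting output to a queue is usually done by replacing sys.stdout and sys.stderr with a file-like object whose write method puts text on the queue. The sketch below assumes that approach; QueueWriter and redirect_output_to_queue are hypothetical names, and a plain queue.Queue stands in for the multiprocessing queue that the GUI uses.

```python
import queue
import sys


class QueueWriter:
    """Minimal file-like object that forwards written text to a queue."""

    def __init__(self, q):
        self.q = q

    def write(self, text):
        if text.strip():  # skip the bare newlines that print() emits
            self.q.put(text)
        return len(text)

    def flush(self):
        pass  # nothing is buffered


def redirect_output_to_queue(q):
    sys.stdout = QueueWriter(q)
    sys.stderr = QueueWriter(q)


q = queue.Queue()
original = (sys.stdout, sys.stderr)
try:
    redirect_output_to_queue(q)
    print("segmentation started")
finally:
    sys.stdout, sys.stderr = original  # always restore the real streams

messages = []
while not q.empty():
    messages.append(q.get())
```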

spacr.gui_utils.run_annotate_app(vars_dict, parent_frame)[source]
spacr.gui_utils.run_function_gui(settings_type, settings, q, fig_queue, stop_requested)[source]
spacr.gui_utils.set_cpu_affinity(process)[source]
spacr.gui_utils.set_high_priority(process)[source]
spacr.gui_utils.setup_frame(parent_frame)[source]
spacr.gui_utils.spacrFigShow(fig_queue=None)[source]

Replacement for plt.show() that queues figures instead of displaying them.

spacr.io module

class spacr.io.CombineLoaders(train_loaders)[source]

Bases: object

A class that combines multiple data loaders into a single iterator.

Parameters:

train_loaders (list) – A list of data loaders.

train_loaders

A list of data loaders.

Type:

list

loader_iters

A list of iterator objects for each data loader.

Type:

list

__iter__()[source]

Returns the iterator object itself.

__next__()[source]

Returns the next batch from one of the data loaders.

Raises:

StopIteration – If all data loaders have been exhausted.
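
The iterator protocol described above can be sketched without PyTorch. The class below is a hypothetical stand-in, not the spacr implementation; in particular, picking the next loader at random is an assumption, since the documentation only says the batch comes from "one of the data loaders".

```python
import random


class CombinedIterables:
    """Yield items from several iterables until all of them are exhausted."""

    def __init__(self, loaders, seed=None):
        self.loaders = loaders
        self.loader_iters = [iter(loader) for loader in loaders]
        self.rng = random.Random(seed)

    def __iter__(self):
        return self

    def __next__(self):
        while self.loader_iters:
            chosen = self.rng.choice(self.loader_iters)
            try:
                return next(chosen)
            except StopIteration:
                # this loader is finished, drop it and try another one
                self.loader_iters.remove(chosen)
        raise StopIteration


batches = list(CombinedIterables([[1, 2], [3], [4, 5, 6]], seed=0))
```

Every item is returned exactly once; only the interleaving order depends on the seed.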

class spacr.io.CombinedDataset(datasets, shuffle=True)[source]

Bases: Dataset

A dataset that combines multiple datasets into one.

Parameters:
  • datasets (list) – A list of datasets to be combined.

  • shuffle (bool, optional) – Whether to shuffle the combined dataset. Defaults to True.

class spacr.io.NoClassDataset(data_dir, transform=None, shuffle=True, load_to_memory=False)[source]

Bases: Dataset

load_image(img_path)[source]
shuffle_dataset()[source]
class spacr.io.TarImageDataset(tar_path, transform=None)[source]

Bases: Dataset

spacr.io.concatenate_and_normalize(src, channels, save_dtype=<class 'numpy.float32'>, settings={})[source]
spacr.io.convert_numpy_to_tiff(folder_path, limit=None)[source]

Converts all numpy files in a folder to TIFF format and saves them in a subdirectory ‘tiff’.

Args: folder_path (str): The path to the folder containing numpy files.

spacr.io.delete_empty_subdirectories(folder_path)[source]

Deletes all empty subdirectories in the specified folder.

Args: - folder_path (str): The path to the folder in which to look for empty subdirectories.
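
A bottom-up os.walk is a common way to remove empty subdirectories, because it lets a parent become empty after its children are removed. The sketch below assumes that behaviour; remove_empty_subdirs is a hypothetical name, and whether spacr also removes nested empty folders is not stated in the documentation.

```python
import os
import tempfile


def remove_empty_subdirs(folder_path):
    """Delete empty subdirectories below folder_path, deepest first."""
    removed = []
    for dirpath, _dirnames, _filenames in os.walk(folder_path, topdown=False):
        if dirpath == folder_path:
            continue  # never delete the top-level folder itself
        if not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "empty", "nested"))
    os.makedirs(os.path.join(root, "full"))
    with open(os.path.join(root, "full", "image.npy"), "w") as fh:
        fh.write("x")
    removed_dirs = remove_empty_subdirs(root)
    remaining = sorted(os.listdir(root))
```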

spacr.io.generate_cellpose_train_test(src, test_split=0.1)[source]
spacr.io.preprocess_img_data(settings)[source]
spacr.io.read_plot_model_stats(file_path, save=False)[source]
class spacr.io.spacrDataLoader(*args, preload_batches=1, **kwargs)[source]

Bases: DataLoader

cleanup()[source]
class spacr.io.spacrDataset(data_dir, loader_classes, transform=None, shuffle=True, pin_memory=False, specific_files=None, specific_labels=None)[source]

Bases: Dataset

get_plate(filepath)[source]
load_image(img_path)[source]
shuffle_dataset()[source]

spacr.logger module

spacr.logger.log_function_call(func)[source]

spacr.measure module

spacr.measure.filepaths_to_database(img_paths, settings, source_folder, crop_mode)[source]
spacr.measure.generate_cellpose_train_set(folders, dst, min_objects=5)[source]
spacr.measure.get_components(cell_mask, nucleus_mask, pathogen_mask)[source]

Get the components (nucleus and pathogens) for each cell in the given masks.

Parameters:
  • cell_mask (ndarray) – Binary mask of cell labels.

  • nucleus_mask (ndarray) – Binary mask of nucleus labels.

  • pathogen_mask (ndarray) – Binary mask of pathogen labels.

Returns:

A tuple containing two dataframes:
  • nucleus_df (DataFrame) – Dataframe with columns ‘cell_id’ and ‘nucleus’, representing the mapping of each cell to its nucleus.

  • pathogen_df (DataFrame) – Dataframe with columns ‘cell_id’ and ‘pathogen’, representing the mapping of each cell to its pathogens.

Return type:

tuple

spacr.measure.get_object_counts(src)[source]
spacr.measure.img_list_to_grid(grid, titles=None)[source]

Plot a grid of images with optional titles.

Parameters:
  • grid (list) – List of images to be plotted.

  • titles (list) – List of titles for the images.

Returns:

fig – The matplotlib figure object containing the image grid.

Return type:

Figure

spacr.measure.measure_crop(settings)[source]

Measure the crop of an image based on the provided settings.

Parameters:

settings (dict) – The settings for measuring the crop.

Returns:

None

spacr.measure.process_meassure_crop_results(partial_results, settings)[source]

Process the results, display, and optionally save the figures.

Parameters:
  • partial_results (list) – List of partial results.

  • settings (dict) – Settings dictionary.

  • save_figures (bool) – Flag to save figures or not. This name is documented in the docstring but does not appear in the signature above.

spacr.measure.save_and_add_image_to_grid(png_channels, img_path, grid, plot=False)[source]

Add an image to a grid and save it as PNG.

Parameters:
  • png_channels (ndarray) – The array representing the image channels.

  • img_path (str) – The path to save the image as PNG.

  • grid (list) – The grid of images to be plotted later.

Returns:

grid – Updated grid with the new image added.

Return type:

list

spacr.plot module

spacr.plot.generate_mask_random_cmap(mask)[source]

Generate a random colormap based on the unique labels in the given mask.

Parameters: mask (numpy.ndarray): The input mask array.

Returns: matplotlib.colors.ListedColormap: The random colormap.

spacr.plot.generate_plate_heatmap(df, plate_number, variable, grouping, min_max, min_count)[source]
spacr.plot.normalize_and_visualize(image, normalized_image, title='')[source]

Utility function for visualization

spacr.plot.plot_arrays(src, figuresize=10, cmap='inferno', nr=1, normalize=True, q1=1, q2=99)[source]

Plot randomly selected arrays from a given directory.

Parameters:
  • src (str) – The directory path containing the arrays.

  • figuresize (int) – The size of the figure. Defaults to 10.

  • cmap (str) – The colormap to use for displaying the arrays. Defaults to ‘inferno’.

  • nr (int) – The number of arrays to plot. Defaults to 1.

  • normalize (bool) – Whether to normalize the arrays. Defaults to True.

  • q1 (int) – The lower percentile for normalization. Defaults to 1.

  • q2 (int) – The upper percentile for normalization. Defaults to 99.

Returns: None

spacr.plot.plot_comparison_results(comparison_results)[source]
spacr.plot.plot_feature_importance(feature_importance_df)[source]
spacr.plot.plot_histogram(df, dependent_variable)[source]
spacr.plot.plot_image_mask_overlay(file, channels, cell_channel, nucleus_channel, pathogen_channel, figuresize=10, normalize=True, thickness=3, save_pdf=True)[source]

Plot image and mask overlays.

spacr.plot.plot_images_and_arrays(folders, lower_percentile=1, upper_percentile=99, threshold=1000, extensions=['.npy', '.tif', '.tiff', '.png'], overlay=False, max_nr=None, randomize=True)[source]

Plot images and arrays from the given folders.

Parameters:
  • folders (list) – A list of folder paths containing the images and arrays.

  • lower_percentile (int, optional) – The lower percentile for image normalization. Defaults to 1.

  • upper_percentile (int, optional) – The upper percentile for image normalization. Defaults to 99.

  • threshold (int, optional) – The threshold for determining whether to display an image as a mask or normalize it. Defaults to 1000.

  • extensions (list, optional) – A list of file extensions to consider. Defaults to [‘.npy’, ‘.tif’, ‘.tiff’, ‘.png’].

  • overlay (bool, optional) – If True, overlay the outlines of the objects on the image. Defaults to False.

spacr.plot.plot_lorenz_curves(csv_files, remove_keys=['TGGT1_220950_1', 'TGGT1_233460_4'])[source]
spacr.plot.plot_masks(batch, masks, flows, cmap='inferno', figuresize=10, nr=1, file_type='.npz', print_object_number=True)[source]

Plot the masks and flows for a given batch of images.

Parameters:
  • batch (numpy.ndarray) – The batch of images.

  • masks (list or numpy.ndarray) – The masks corresponding to the images.

  • flows (list or numpy.ndarray) – The flows corresponding to the images.

  • cmap (str, optional) – The colormap to use for displaying the images. Defaults to ‘inferno’.

  • figuresize (int, optional) – The size of the figure. Defaults to 10.

  • nr (int, optional) – The maximum number of images to plot. Defaults to 1.

  • file_type (str, optional) – The file type of the flows. Defaults to ‘.npz’.

  • print_object_number (bool, optional) – Whether to print the object number on the mask. Defaults to True.

Returns:

None

spacr.plot.plot_merged(src, settings)[source]

Plot the merged images after applying various filters and modifications.

Parameters:
  • src (path) – Path to folder with images.

  • settings (dict) – The settings for the plot.

Returns:

None

spacr.plot.plot_object_outlines(src, objects=['nucleus', 'cell', 'pathogen'], channels=[0, 1, 2], max_nr=10)[source]
spacr.plot.plot_permutation(permutation_df)[source]
spacr.plot.plot_plates(df, variable, grouping, min_max, cmap, min_count=0, verbose=True)[source]
spacr.plot.plot_resize(images, resized_images, labels, resized_labels)[source]
spacr.plot.print_mask_and_flows(stack, mask, flows, overlay=False)[source]
spacr.plot.random_cmap(num_objects=100)[source]

Generate a random colormap.

Parameters: num_objects (int): The number of objects to generate colors for. Default is 100.

Returns: random_cmap (matplotlib.colors.ListedColormap): A random colormap.

spacr.plot.read_and_plot__vision_results(base_dir, y_axis='accuracy', name_split='_time', y_lim=[0.8, 0.9])[source]
spacr.plot.visualize_cellpose_masks(masks, titles=None, filename=None, save=False, src=None)[source]

Visualize multiple masks with optional titles.

Parameters:
  • masks (list of np.ndarray) – A list of masks to visualize.

  • titles (list of str, optional) – A list of titles for the masks. If None, default titles will be used.

  • comparison_title (str) – Title for the entire figure. This name is documented in the docstring but does not appear in the signature above.

spacr.plot.visualize_masks(mask1, mask2, mask3, title='Masks Comparison')[source]
spacr.plot.volcano_plot(coef_df, filename='volcano_plot.pdf')[source]

spacr.sequencing module

spacr.sequencing.apply_transformation(X, transform)[source]
spacr.sequencing.calculate_p_values(X, y, model)[source]
spacr.sequencing.check_normality(data, variable_name, verbose=False)[source]

Check if the data is normally distributed using the Shapiro-Wilk test.

spacr.sequencing.clean_controls(df, pc, nc, other)[source]
spacr.sequencing.complement(dna_sequence)[source]
spacr.sequencing.consensus_sequence(fastq_r1, fastq_r2, output_file, chunk_size=1000000, n_jobs=None)[source]

Calculate the consensus sequence from two FASTQ files (R1 and R2) and write the result to an output file.

Parameters:
  • fastq_r1 (str) – Path to the R1 FASTQ file.

  • fastq_r2 (str) – Path to the R2 FASTQ file.

  • output_file (str) – Path to the output file where the consensus sequence will be written.

  • chunk_size (int) – Number of reads to process in each chunk. Defaults to 1000000.

  • n_jobs (int) – Number of parallel processes to use. If None, the number of available CPUs minus 2 is used.

Returns: None
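
The documentation does not say how the consensus between R1 and R2 is decided. A common rule, used here purely as an assumption, keeps the base where both reads agree and otherwise takes the base with the higher quality score. consensus_read is a hypothetical helper that expects two reads already in the same orientation and of equal length.

```python
def consensus_read(seq1, qual1, seq2, qual2):
    """Merge two aligned reads base by base using their quality scores."""
    bases = []
    for b1, q1, b2, q2 in zip(seq1, qual1, seq2, qual2):
        if b1 == b2:
            bases.append(b1)
        elif q1 > q2:
            bases.append(b1)
        elif q2 > q1:
            bases.append(b2)
        else:
            bases.append("N")  # disagreement with equal quality: ambiguous
    return "".join(bases)


consensus = consensus_read("ACGT", [30, 30, 10, 30], "ACTT", [30, 30, 35, 30])
ambiguous = consensus_read("A", [20], "C", [20])
```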

spacr.sequencing.consensus_sequence_v1(fastq_r1, fastq_r2, output_file, chunk_size=1000000)[source]

Generate a consensus sequence from paired-end FASTQ files.

Parameters:
  • fastq_r1 (str) – Path to the first input FASTQ file.

  • fastq_r2 (str) – Path to the second input FASTQ file.

  • output_file (str) – Path to the output FASTQ file.

  • chunk_size (int, optional) – Number of reads to process in each iteration. Defaults to 1000000.

Returns:

None

spacr.sequencing.count_mismatches(seq1, seq2, align_length=10)[source]
spacr.sequencing.equalize_lengths(seq1, seq2, pad_char='N')[source]
spacr.sequencing.extract_barcodes_from_fastq(fastq, output_file, chunk_size, barcode_mapping, n_jobs=None, compression='zlib', complevel=9)[source]

Extracts barcodes from a FASTQ file and maps them based on a barcode mapping.

Parameters:
  • fastq (str) – Path to the input FASTQ file.

  • output_file (str) – Path to the output file where the mapped barcodes will be saved.

  • chunk_size (int) – Number of records to process in each chunk.

  • barcode_mapping (dict) – Dictionary containing barcode mapping information. The keys are the names of the barcode sets, and the values are tuples containing the path to the CSV file, barcode coordinates, and reverse complement flag.

  • n_jobs (int, optional) – Number of parallel processes to use for mapping. Defaults to None.

  • compression (str, optional) – Compression algorithm to use for saving the output file. Defaults to ‘zlib’.

  • complevel (int, optional) – Compression level to use for saving the output file. Defaults to 9.

Returns:

None
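
The barcode_mapping structure described above can be illustrated with a small dictionary. The file paths, key names and the half-open (start, end) coordinate convention below are assumptions made for this sketch; slice_barcodes and revcomp_sketch are hypothetical helpers, not spacr functions.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp_sketch(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# name -> (CSV path, barcode coordinates, reverse complement flag)
barcode_mapping = {
    "row": ("barcodes/row_barcodes.csv", (0, 8), False),
    "grna": ("barcodes/grna_barcodes.csv", (8, 16), True),
}


def slice_barcodes(read, mapping):
    """Cut each barcode region out of a read, reverse complementing if flagged."""
    out = {}
    for name, (_csv_path, (start, end), revcomp) in mapping.items():
        fragment = read[start:end]
        out[name] = revcomp_sketch(fragment) if revcomp else fragment
    return out


barcodes = slice_barcodes("AAAACCCCGGGGTTTT", barcode_mapping)
```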

spacr.sequencing.extract_barcodes_from_fastq_v1(fastq, output_file, chunk_size, barcode_mapping, n_jobs=None, compression='zlib', complevel=9)[source]

Extracts barcodes from a FASTQ file and saves the results to an output file.

Parameters:
  • fastq (str) – Path to the input FASTQ file.

  • output_file (str) – Path to the output file where the barcode data will be saved.

  • chunk_size (int) – Number of records to process in each chunk.

  • barcode_mapping (dict) – Mapping of barcode keys to CSV file paths, barcode coordinates, and reverse complement flags.

  • n_jobs (int, optional) – Number of parallel processes to use for barcode mapping. Defaults to None.

  • compression (str, optional) – Compression algorithm to use for the output file. Defaults to ‘zlib’.

  • complevel (int, optional) – Compression level to use for the output file. Defaults to 9.

spacr.sequencing.extract_barecodes(r1_fastq, r2_fastq, csv_loc, chunk_size=100000)[source]
spacr.sequencing.file_len(fname)[source]
spacr.sequencing.find_grna(df, grna_df)[source]
spacr.sequencing.generate_barecode_mapping(settings={})[source]
spacr.sequencing.generate_fraction_map(df, gene_column, min_=10, plates=['p1', 'p2', 'p3', 'p4'], metric='count', plot=False)[source]
spacr.sequencing.generate_plate_heatmap(df, plate_number, variable, grouping, min_max)[source]
spacr.sequencing.get_read_data(identifier, prefix)[source]
spacr.sequencing.get_sequence_data(r1, r2)[source]
spacr.sequencing.get_top_two_matches(seq, barcode_dict)[source]

Finds the top two closest matches for a given sequence in a barcode dictionary.

Parameters:
  • seq (str) – The sequence to find the closest matches for.

  • barcode_dict (dict) – A dictionary containing barcodes as keys and their corresponding values.

Returns:

A list containing up to two tuples, each with a barcode match and its score.

Return type:

list of tuples
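
The scoring used to rank matches is not documented. The sketch below assumes a simple score, the number of positions at which the sequence and a barcode carry the same base, and returns the barcode keys; top_two_matches is a hypothetical name and the real function may score or report matches differently.

```python
def top_two_matches(seq, barcode_dict):
    """Return up to two (barcode, score) tuples, best score first."""
    scored = []
    for barcode in barcode_dict:
        # count identical bases at the same position
        score = sum(a == b for a, b in zip(seq, barcode))
        scored.append((barcode, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:2]


matches = top_two_matches(
    "ACGT",
    {"ACGA": "well_A1", "ACGT": "well_A2", "TTTT": "well_A3"},
)
```

Keeping the runner-up score makes it possible to discard reads whose best and second-best barcodes are nearly tied.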

spacr.sequencing.grna_plate_heatmap(path, specific_grna=None, min_max='all', cmap='viridis', min_count=0, save=True)[source]

Generate a heatmap of gRNA plate data.

Parameters:
  • path (str) – The path to the CSV file containing the gRNA plate data.

  • specific_grna (str, optional) – The specific gRNA to filter the data for. Defaults to None.

  • min_max (str or list or tuple, optional) – The range of values to use for the color scale. If ‘all’, the range will be determined by the minimum and maximum values in the data. If ‘allq’, the range will be determined by the 2nd and 98th percentiles of the data. If a list or tuple of two values, the range will be determined by those values. Defaults to ‘all’.

  • cmap (str, optional) – The colormap to use for the heatmap. Defaults to ‘viridis’.

  • min_count (int, optional) – The minimum count threshold for including a gRNA in the heatmap. Defaults to 0.

  • save (bool, optional) – Whether to save the heatmap as a PDF file. Defaults to True.

Returns:

The generated heatmap figure.

Return type:

matplotlib.figure.Figure

spacr.sequencing.map_unmapped_grnas(df)[source]
spacr.sequencing.parse_gz_files(folder_path)[source]

Parses the .fastq.gz files in the specified folder path and returns a dictionary containing the sample names and their corresponding file paths.

Parameters:

folder_path (str) – The path to the folder containing the .fastq.gz files.

Returns:

A dictionary where the keys are the sample names and the values are dictionaries containing the file paths for the ‘R1’ and ‘R2’ read directions.

Return type:

dict
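
The returned dictionary has the shape sample name → {‘R1’: path, ‘R2’: path}. The sketch below builds that shape by assuming file names contain an _R1_ or _R2_ marker and that the sample name is everything before it; the actual naming rule used by spacr is not documented here, and group_fastq_pairs is a hypothetical name.

```python
import os
import tempfile


def group_fastq_pairs(folder_path):
    """Group .fastq.gz files by sample name and read direction."""
    files = {}
    for name in sorted(os.listdir(folder_path)):
        if not name.endswith(".fastq.gz"):
            continue
        for direction in ("R1", "R2"):
            marker = f"_{direction}_"
            if marker in name:
                sample = name.split(marker)[0]
                path = os.path.join(folder_path, name)
                files.setdefault(sample, {})[direction] = path
    return files


with tempfile.TemporaryDirectory() as folder:
    for name in ("s1_R1_001.fastq.gz", "s1_R2_001.fastq.gz", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    pairs = group_fastq_pairs(folder)
    sample_names = sorted(pairs)
    directions = sorted(pairs["s1"])
```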

spacr.sequencing.perform_mixed_model(y, X, groups, alpha=1.0)[source]
spacr.sequencing.perform_regression(df, settings)[source]
spacr.sequencing.plot_data(df, v, h, color, n_col, ax, x_axis, y_axis, fontsize=12, lw=2, ls='-', log_x=False, log_y=False, title=None)[source]
spacr.sequencing.plot_plates(df, variable, grouping, min_max, cmap)[source]
spacr.sequencing.pos_dict(string)[source]
spacr.sequencing.precess_reads(csv_path, fraction_threshold, plate)[source]
spacr.sequencing.process_barecodes(df)[source]
spacr.sequencing.process_chunk_for_consensus(r1_chunk, r2_chunk)[source]

Process a chunk of paired-end sequencing reads to generate consensus sequences.

Parameters:
  • r1_chunk (list) – List of SeqRecord objects representing the first read in each pair.

  • r2_chunk (list) – List of SeqRecord objects representing the second read in each pair.

Returns:

List of SeqRecord objects representing the consensus sequences.

Return type:

list

spacr.sequencing.process_chunk_for_mapping(records, barcode_mapping, barcode_dicts, barcode_coordinates, reverse_complements)[source]

Process a chunk of records for barcode mapping, including highest and second-highest scores.

Parameters:
  • records (list) – A list of records to process.

  • barcode_mapping (dict) – A dictionary mapping barcodes to their corresponding keys.

  • barcode_dicts (dict) – A dictionary of barcode dictionaries.

  • barcode_coordinates (dict) – A dictionary mapping barcode keys to their start and end coordinates.

  • reverse_complements (dict) – A dictionary indicating whether to reverse complement the extracted sequences for each barcode key.

Returns:

A DataFrame containing the processed data.

Return type:

pandas.DataFrame

spacr.sequencing.process_scores(df, dependent_variable, plate, min_cell_count=25, agg_type='mean', transform=None, regression_type='ols')[source]
spacr.sequencing.regression(df, csv_path, dependent_variable='predictions', regression_type=None, alpha=1.0, remove_row_column_effect=False)[source]
spacr.sequencing.regression_model(X, y, regression_type='ols', groups=None, alpha=1.0, remove_row_column_effect=True)[source]
spacr.sequencing.remove_outliers(df, low=0.01, high=0.99)[source]
spacr.sequencing.reverse_complement(dna_sequence)[source]
spacr.sequencing.save_to_hdf(queue, output_file, complevel=9, compression='zlib')[source]

Save data from a queue to an HDF file.

Parameters:
  • queue – Queue object containing chunks of data to be saved.

  • output_file – Path to the output HDF file.

  • complevel – Compression level. Defaults to 9.

  • compression – Compression algorithm. Defaults to ‘zlib’.

Returns: None

spacr.sequencing.split_fastq(input_fastq, output_base, num_files)[source]
spacr.sequencing.test_error(df, min_=25, max_=3025, metric='count', log_x=False, log_y=False)[source]
spacr.sequencing.translate_barecodes(df, grna_df, map_unmapped=False)[source]
spacr.sequencing.truncate_read(seq, qual, target)[source]
spacr.sequencing.vert_horiz(v, h, n_col)[source]

spacr.settings module

spacr.settings.check_settings(vars_dict, expected_types, q=None)[source]
spacr.settings.deep_spacr_defaults(settings)[source]
spacr.settings.generate_fields(variables, scrollable_frame)[source]
spacr.settings.get_analyze_reads_default_settings(settings)[source]
spacr.settings.get_analyze_recruitment_default_settings(settings)[source]
spacr.settings.get_check_cellpose_models_default_settings(settings)[source]
spacr.settings.get_identify_masks_finetune_default_settings(settings)[source]
spacr.settings.get_map_barcodes_default_settings(settings)[source]
spacr.settings.get_measure_crop_settings(settings)[source]
spacr.settings.get_perform_regression_default_settings(settings)[source]
spacr.settings.get_train_cellpose_default_settings(settings)[source]
spacr.settings.set_annotate_default_settings(settings)[source]
spacr.settings.set_default_analyze_screen(settings)[source]
spacr.settings.set_default_generate_barecode_mapping(settings={})[source]
spacr.settings.set_default_plot_merge_settings()[source]
spacr.settings.set_default_settings_preprocess_generate_masks(src, settings={})[source]
spacr.settings.set_default_settings_preprocess_img_data(settings)[source]
spacr.settings.set_default_train_test_model(settings)[source]
spacr.settings.set_default_umap_image_settings(settings={})[source]
spacr.settings.set_generate_training_dataset_defaults(settings)[source]

spacr.sim module

spacr.sim.append_database(src, table, table_name)[source]

Append a pandas DataFrame to an SQLite database table.

Parameters:
  • src (str) – The source directory where the database file is located.

  • table (pandas.DataFrame) – The DataFrame to be appended to the database table.

  • table_name (str) – The name of the database table.

Returns: None

spacr.sim.calculate_permutation_importance(df, target='prauc', exclude=None, n_repeats=10, clean=True)[source]

Calculates permutation importance for the given features in the dataframe.

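Parameters:
  • df (pandas.DataFrame) – The DataFrame containing the data.

  • features (list) – List of column names to include as features. This name is documented in the docstring but does not appear in the signature above.

  • target (str) – The name of the target variable column.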

Returns: dict: Dictionary containing the importances and standard deviations.

spacr.sim.cell_level_roc_auc(cell_scores)[source]

Compute the ROC AUC and precision-recall metrics at the cell level.

Parameters:

cell_scores (list) – List of scores for each cell.

Returns:

  • cell_roc_dict_df (DataFrame) – DataFrame containing the ROC AUC metrics for each cell.

  • cell_pr_dict_df (DataFrame) – DataFrame containing the precision-recall metrics for each cell.

  • cell_scores (list) – Updated list of scores after applying the optimum threshold.

  • cell_cm (array) – Confusion matrix for the cell-level classification.

Return type:

tuple

spacr.sim.classifier(positive_mean, positive_variance, negative_mean, negative_variance, classifier_accuracy, df)[source]

Classifies the data in the DataFrame based on the given parameters and a classifier error rate.

Parameters:
  • positive_mean (float) – The mean of the positive distribution.

  • positive_variance (float) – The variance of the positive distribution.

  • negative_mean (float) – The mean of the negative distribution.

  • negative_variance (float) – The variance of the negative distribution.

  • classifier_accuracy (float) – The likelihood (0 to 1) that a gene is correctly classified according to its true label.

  • df (pandas.DataFrame) – The DataFrame containing the data to be classified.

Returns:

The DataFrame with an additional ‘score’ column containing the classification scores.

Return type:

pandas.DataFrame

spacr.sim.classifier_v2(positive_mean, positive_variance, negative_mean, negative_variance, df)[source]

Classifies the data in the DataFrame based on the given parameters.

Parameters:
  • positive_mean (float) – The mean of the positive distribution.

  • positive_variance (float) – The variance of the positive distribution.

  • negative_mean (float) – The mean of the negative distribution.

  • negative_variance (float) – The variance of the negative distribution.

  • df (pandas.DataFrame) – The DataFrame containing the data to be classified.

Returns:

The DataFrame with an additional ‘score’ column containing the classification scores.

Return type:

pandas.DataFrame

spacr.sim.compute_precision_recall(cell_scores)[source]

Compute precision, recall, F1 score, and PR AUC for a given set of cell scores.

Parameters: - cell_scores (DataFrame): A DataFrame containing the cell scores with columns ‘is_active’ and ‘score’.

Returns: - cell_pr_dict (dict): A dictionary containing the computed precision, recall, F1 score, PR AUC, and threshold values.

spacr.sim.compute_roc_auc(cell_scores)[source]

Compute the Receiver Operating Characteristic (ROC) Area Under the Curve (AUC) for cell scores.

Parameters: - cell_scores (DataFrame): DataFrame containing cell scores with columns ‘is_active’ and ‘score’.

Returns: - cell_roc_dict (dict): Dictionary containing the ROC curve information, including the threshold, true positive rate (TPR), false positive rate (FPR), and ROC AUC.
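
For readers who want to see what the returned values mean, the sketch below computes an ROC curve and its area with the trapezoidal rule in plain Python. It takes two lists instead of a DataFrame with ‘is_active’ and ‘score’ columns, and the dictionary keys are assumptions; spacr most likely relies on a library routine instead, so this is only an illustration of the quantity.

```python
def roc_auc_sketch(labels, scores):
    """Return thresholds, TPR, FPR and the area under the ROC curve."""
    thresholds = sorted(set(scores), reverse=True)
    positives = sum(labels)
    negatives = len(labels) - positives
    # the curve starts at (0, 0); one more point is added per threshold
    fpr, tpr = [0.0], [0.0]
    for t in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        tpr.append(tp / positives)
        fpr.append(fp / negatives)
    # trapezoidal rule over consecutive points of the curve
    auc = sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )
    return {"threshold": thresholds, "tpr": tpr, "fpr": fpr, "roc_auc": auc}


result = roc_auc_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```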

spacr.sim.create_database(db_path)[source]

Creates a SQLite database at the specified path.

Parameters:

db_path (str) – The path where the database should be created.

Returns:

None
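
With Python's built-in sqlite3 module, connecting to a path creates the database file if it does not exist yet. The sketch below shows that together with a write and a read; the table layout (nr_plates, prauc) is hypothetical and chosen only to match names that appear elsewhere in this module.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as folder:
    db_path = os.path.join(folder, "simulations.db")
    conn = sqlite3.connect(db_path)  # creates the file when it is missing
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS simulations (nr_plates INTEGER, prauc REAL)"
        )
        conn.execute("INSERT INTO simulations VALUES (?, ?)", (4, 0.82))
        conn.commit()
        rows = conn.execute("SELECT nr_plates, prauc FROM simulations").fetchall()
        db_exists = os.path.exists(db_path)
    finally:
        conn.close()  # close before the temporary folder is removed
```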

spacr.sim.dist_gen(mean, sd, df)[source]

Generate a Poisson distribution based on a gamma distribution.

Parameters:
  • mean (float) – Mean of the gamma distribution.

  • sd (float) – Standard deviation of the gamma distribution.

  • df (pandas.DataFrame) – Input data.

Returns: tuple: A tuple containing the generated Poisson distribution and the length of the input data.

spacr.sim.generate_floats(start, stop, step)[source]
spacr.sim.generate_gene_list(number_of_genes, number_of_all_genes)[source]

Generates a list of randomly selected genes.

Parameters:
  • number_of_genes (int) – The number of genes to be selected.

  • number_of_all_genes (int) – The total number of genes available.

Returns:

A list of randomly selected genes.

Return type:

list
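
A random selection of this kind can be sketched with random.sample. The assumptions here are that genes are identified by integers from 0 to number_of_all_genes - 1 and that selection is without replacement; sample_gene_ids and its seed argument are hypothetical.

```python
import random


def sample_gene_ids(number_of_genes, number_of_all_genes, seed=None):
    """Pick number_of_genes distinct gene ids from the available range."""
    rng = random.Random(seed)
    return rng.sample(range(number_of_all_genes), number_of_genes)


genes = sample_gene_ids(5, 100, seed=1)
```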

spacr.sim.generate_gene_weights(positive_mean, positive_variance, df)[source]

Generate gene weights using a beta distribution.

Parameters:
  • positive_mean (float) – The mean value for the positive distribution.

  • positive_variance (float) – The variance value for the positive distribution.

  • df (pandas.DataFrame) – The DataFrame containing the data.

Returns: - weights (numpy.ndarray): An array of gene weights generated using a beta distribution.

spacr.sim.generate_integers(start, stop, step)[source]
spacr.sim.generate_paramiters(settings)[source]

Generate a list of parameter sets for simulation based on the given settings.

Parameters:

settings (dict) – A dictionary containing the simulation settings.

Returns:

A list of parameter sets for simulation.

Return type:

list

spacr.sim.generate_plate_map(nr_plates)[source]

Generate a plate map based on the number of plates.

Parameters: nr_plates (int): The number of plates to generate the map for.

Returns: pandas.DataFrame: The generated plate map dataframe.

spacr.sim.generate_power_law_distribution(num_elements, coeff)[source]

Generate a power law distribution.

Parameters: - num_elements (int): The number of elements in the distribution. - coeff (float): The coefficient of the power law.

Returns: - normalized_distribution (ndarray): The normalized power law distribution.
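
The exact functional form is not given. One plausible reading, assumed here, weights the element at rank r by r raised to the power -coeff and then divides by the total so that the weights sum to 1. power_law_sketch is a hypothetical name and returns a plain list rather than an ndarray.

```python
def power_law_sketch(num_elements, coeff):
    """Normalized weights proportional to rank ** -coeff."""
    raw = [rank ** -coeff for rank in range(1, num_elements + 1)]
    total = sum(raw)
    return [value / total for value in raw]


dist = power_law_sketch(4, 1.0)
```

A larger coeff concentrates more of the total weight on the first few elements.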

spacr.sim.generate_shap_summary_plot(df, target='prauc', clean=True)[source]

Generates a SHAP summary plot for the given features in the dataframe.

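Parameters:
  • df (pandas.DataFrame) – The DataFrame containing the data.

  • features (list) – List of column names to include as features. This name is documented in the docstring but does not appear in the signature above.

  • target (str) – The name of the target variable column.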

Returns: None

spacr.sim.generate_well_score(cell_scores)[source]

Generate well scores based on cell scores.

Parameters:

cell_scores (DataFrame) – DataFrame containing cell scores.

Returns:

DataFrame containing well scores with average active score, gene list, and score.

Return type:

DataFrame

spacr.sim.get_optimum_threshold(cell_pr_dict)[source]

Calculates the optimum threshold based on the f1_score in the given cell_pr_dict.

Parameters: cell_pr_dict (dict): A dictionary containing precision, recall, and f1_score values for different thresholds.

Returns: float: The optimum threshold value.
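
Selecting the threshold with the highest F1 score can be sketched as an argmax over parallel lists. The layout of cell_pr_dict assumed below (lists of equal length under the keys ‘threshold’ and ‘f1_score’) is a guess, and optimum_threshold_sketch is a hypothetical name.

```python
def optimum_threshold_sketch(cell_pr_dict):
    """Return the threshold whose f1_score is largest."""
    f1_scores = cell_pr_dict["f1_score"]
    best_index = max(range(len(f1_scores)), key=f1_scores.__getitem__)
    return cell_pr_dict["threshold"][best_index]


pr = {
    "threshold": [0.2, 0.5, 0.8],
    "precision": [0.5, 0.75, 1.0],
    "recall": [1.0, 0.75, 0.4],
    "f1_score": [0.667, 0.75, 0.571],
}
best_threshold = optimum_threshold_sketch(pr)
```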

spacr.sim.gini(x)[source]

Calculate the Gini coefficient for a given array of values.

Parameters: x (array-like): The input array of values.

Returns: float: The Gini coefficient.

References: - Based on bottom eq: http://www.statsdirect.com/help/content/image/stat0206_wmf.gif - From: http://www.statsdirect.com/help/default.htm#nonparametric_methods/gini.htm - All values are treated equally, arrays must be 1d.
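
A common closed form for the Gini coefficient of sorted values x_1 ≤ … ≤ x_n is the sum of (2i − n − 1)·x_i divided by n times the sum of x_i. The sketch below implements that form in plain Python; it is offered as an illustration and is not claimed to be identical to the spacr code.

```python
def gini_sketch(values):
    """Gini coefficient of a 1-D sequence of non-negative values."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


equal = gini_sketch([1, 1, 1, 1])
concentrated = gini_sketch([0, 0, 0, 1])
```

With this form the maximum for n values is (n − 1) / n, so four values with all mass in one element give 0.75 rather than 1.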

spacr.sim.gini_coefficient(x)[source]

Compute Gini coefficient of array of values.

Parameters: x (array-like): Array of values.

Returns: float: Gini coefficient.

spacr.sim.gini_gene_well(x)[source]

Calculate the Gini coefficient for a given income distribution.

The Gini coefficient measures income inequality in a population. A value of 0 represents perfect income equality (everyone has the same income), while a value of 1 represents perfect income inequality (one individual has all the income).

Parameters: x (array-like): An array-like object representing the income distribution.

Returns: float: The Gini coefficient for the given income distribution.

spacr.sim.normalize_array(arr)[source]

Normalize an array by scaling its values between 0 and 1.

Parameters: arr (numpy.ndarray): The input array to be normalized.

Returns: numpy.ndarray: The normalized array.
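
Scaling values between 0 and 1 is typically min-max normalization, (x − min) / (max − min). The sketch below uses plain lists instead of numpy arrays, and its handling of a constant input (all zeros) is an assumption; normalize_values is a hypothetical name.

```python
def normalize_values(values):
    """Min-max scale a sequence of numbers into the range 0 to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # avoid dividing by zero
    return [(v - lowest) / (highest - lowest) for v in values]


normalized = normalize_values([2.0, 4.0, 6.0])
```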

spacr.sim.plot_confusion_matrix(data, ax, title)[source]

Plots a confusion matrix using a heatmap.

Parameters:
  • data (numpy.ndarray) – The confusion matrix data.

  • ax (matplotlib.axes.Axes) – The axes object to plot the heatmap on.

  • title (str) – The title of the plot.

Returns: None

spacr.sim.plot_correlation_matrix(df, annot=False, cmap='inferno', clean=True)[source]

Plots a correlation matrix for the specified variables and the target variable.

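Parameters:
  • df (pandas.DataFrame) – The DataFrame containing the data.

  • variables (list) – List of column names to include in the correlation matrix. This name is documented in the docstring but does not appear in the signature above.

  • target_variable (str) – The target variable column name. This name is documented in the docstring but does not appear in the signature above.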

Returns: None

spacr.sim.plot_feature_importance(df, target='prauc', exclude=None, clean=True)[source]

Trains a RandomForestRegressor to determine the importance of each feature in predicting the target.

Parameters:
  • df (pandas.DataFrame) – The DataFrame containing the data.

  • target (str) – The target variable column name.

  • exclude (list or str, optional) – Column names to exclude from features.

Returns: matplotlib.figure.Figure: The figure object containing the feature importance plot.

spacr.sim.plot_histogram(data, x_label, ax, color, title, binwidth=0.01, log=False)[source]

Plots a histogram of the given data.

Parameters:
  • data – The data to be plotted.

  • x_label – The label for the x-axis.

  • ax – The matplotlib axis object to plot on.

  • color – The color of the histogram bars.

  • title – The title of the plot.

  • binwidth – The width of each histogram bin.

  • log – Whether to use a logarithmic scale for the y-axis.

Returns: None

spacr.sim.plot_partial_dependences(df, target='prauc', clean=True)[source]

Creates partial dependence plots for the specified features, with improved layout to avoid text overlap.

Args: df (pandas.DataFrame): The DataFrame containing the data. target (str): The target variable.

Returns: None

spacr.sim.plot_roc_pr(data, ax, title, x_label, y_label)[source]

Plot the ROC (Receiver Operating Characteristic) and PR (Precision-Recall) curves.

Parameters:
  • data – DataFrame containing the data to be plotted.

  • ax – The matplotlib axes object to plot on.

  • title – The title of the plot.

  • x_label – The label for the x-axis.

  • y_label – The label for the y-axis.

spacr.sim.plot_simulations(df, variable, x_rotation=None, legend=False, grid=False, clean=True, verbose=False)[source]

Creates separate line plots for ‘prauc’ against a specified ‘variable’, for each unique combination of conditions defined by ‘grouping_vars’, displayed on a grid.

Args: df (pandas.DataFrame): DataFrame containing the necessary columns. variable (str): Name of the column to use as the x-axis for grouping and plotting. x_rotation (int, optional): Degrees to rotate the x-axis labels. legend (bool, optional): Whether to display a legend. grid (bool, optional): Whether to display grid lines. verbose (bool, optional): Whether to print the filter conditions.

Returns: None

spacr.sim.power_law_dist_gen(df, avg, well_ineq_coeff)[source]

Generate a power-law distribution for wells.

Parameters: - df: DataFrame: The input DataFrame containing the wells. - avg: float: The average value for the distribution. - well_ineq_coeff: float: The inequality coefficient for the power-law distribution.

Returns: - dist: ndarray: The generated power-law distribution for the wells.

spacr.sim.read_simulations_table(db_path)[source]

Reads the ‘simulations’ table from an SQLite database into a pandas DataFrame.

Args: db_path (str): The file path to the SQLite database.

Returns: pandas.DataFrame: DataFrame containing the ‘simulations’ table data.
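
As an illustration of the operation described above, the following is a minimal sketch of loading a table named simulations from an SQLite file with pandas. It is not taken from the spacr source: read_table_sketch is a hypothetical helper, and the demo columns (variable, prauc) and their values are invented so the example can run on its own.

```python
import os
import sqlite3
import tempfile

import pandas as pd


def read_table_sketch(db_path, table="simulations"):
    """Hypothetical helper: load one SQLite table into a DataFrame."""
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as SQL parameters, so quote the identifier.
        return pd.read_sql_query(f'SELECT * FROM "{table}"', conn)
    finally:
        conn.close()


# Build a throwaway database so the sketch can be run end to end.
# The column names and values here are illustrative assumptions.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(demo_path)
conn.execute("CREATE TABLE simulations (variable TEXT, prauc REAL)")
conn.execute("INSERT INTO simulations VALUES ('nr_plates', 0.8)")
conn.commit()
conn.close()

sim_df = read_table_sketch(demo_path)
```

Closing the connection in a finally block keeps the database file unlocked even if the query fails.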

spacr.sim.regression_roc_auc(results_df, active_gene_list, control_gene_list, alpha=0.05, optimal=False)[source]

Calculate regression ROC AUC and other statistics.

Parameters: results_df (DataFrame): DataFrame containing the results of regression analysis. active_gene_list (list): List of active gene IDs. control_gene_list (list): List of control gene IDs. alpha (float, optional): Significance level for determining hits. Default is 0.05. optimal (bool, optional): Whether to use the optimal threshold for classification. Default is False.

Returns: tuple: A tuple containing the following: - results_df (DataFrame): Updated DataFrame with additional columns. - reg_roc_dict_df (DataFrame): DataFrame containing regression ROC curve data. - reg_pr_dict_df (DataFrame): DataFrame containing precision-recall curve data. - reg_cm (ndarray): Confusion matrix. - sim_stats (DataFrame): DataFrame containing simulation statistics.

spacr.sim.remove_columns_with_single_value(df)[source]

Removes columns from the DataFrame that have the same value in all rows.

Args: df (pandas.DataFrame): The original DataFrame.

Returns: pandas.DataFrame: A copy of the DataFrame without the columns that contained only one unique value.

spacr.sim.remove_constant_columns(df)[source]

Removes columns in the DataFrame where all entries have the same value.

Parameters: df (pd.DataFrame): The input DataFrame from which to remove constant columns.

Returns: pd.DataFrame: A DataFrame with the constant columns removed.
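
remove_columns_with_single_value and remove_constant_columns both describe dropping columns whose entries are all identical. A minimal sketch of that idea with pandas follows. The helper name drop_constant_columns_sketch is hypothetical, and treating an all-NaN column as constant is an assumption, not documented spacr behaviour.

```python
import pandas as pd


def drop_constant_columns_sketch(df):
    """Hypothetical helper: keep only columns with more than one distinct value."""
    # nunique(dropna=False) counts NaN as a value, so an all-NaN column is constant.
    keep = df.nunique(dropna=False) > 1
    return df.loc[:, keep]


demo = pd.DataFrame({"a": [1, 1, 1], "b": [1, 2, 3], "c": ["x", "x", "y"]})
result = drop_constant_columns_sketch(demo)
```

Here column a is dropped, while b and c are kept.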

spacr.sim.run_and_save(i, settings, time_ls, total_sims)[source]

Run the simulation and save the results.

Parameters:
  • i (int) – The simulation index.

  • settings (dict) – The simulation settings.

  • time_ls (list) – The list to store simulation times.

  • total_sims (int) – The total number of simulations.

Returns:

A tuple containing the simulation index, simulation time, and None.

Return type:

tuple

spacr.sim.run_experiment(plate_map, number_of_genes, active_gene_list, avg_genes_per_well, sd_genes_per_well, avg_cells_per_well, sd_cells_per_well, well_ineq_coeff, gene_ineq_coeff)[source]

Run a simulation experiment.

Parameters:
  • plate_map (DataFrame) – The plate map containing information about the wells.

  • number_of_genes (int) – The total number of genes.

  • active_gene_list (list) – The list of active genes.

  • avg_genes_per_well (float) – The average number of genes per well.

  • sd_genes_per_well (float) – The standard deviation of genes per well.

  • avg_cells_per_well (float) – The average number of cells per well.

  • sd_cells_per_well (float) – The standard deviation of cells per well.

  • well_ineq_coeff (float) – The coefficient for well inequality.

  • gene_ineq_coeff (float) – The coefficient for gene inequality.

Returns:

A tuple containing the following:
  • cell_df (DataFrame): The DataFrame containing information about the cells.

  • genes_per_well_df (DataFrame): The DataFrame containing gene counts per well.

  • wells_per_gene_df (DataFrame): The DataFrame containing well counts per gene.

  • df_ls (list): A list containing gene counts per well, well counts per gene, Gini coefficients for wells, Gini coefficients for genes, gene weights array, and well weights.

Return type:

tuple

spacr.sim.run_multiple_simulations(settings)[source]

Run multiple simulations in parallel using the provided settings.

Parameters:

settings (dict) – A dictionary containing the simulation settings.

Returns:

None

spacr.sim.run_simulation(settings)[source]

Run the simulation based on the given settings.

Parameters:

settings (dict) – A dictionary containing the simulation settings.

Returns:

A tuple containing the simulation results and distances. The simulation results are:
  • cell_scores (DataFrame): Scores for each cell.

  • cell_roc_dict_df (DataFrame): ROC AUC scores for each cell.

  • cell_pr_dict_df (DataFrame): Precision-Recall AUC scores for each cell.

  • cell_cm (DataFrame): Confusion matrix for each cell.

  • well_score (DataFrame): Scores for each well.

  • gene_fraction_map (DataFrame): Fraction of genes for each well.

  • metadata (DataFrame): Metadata for each well.

  • results_df (DataFrame): Results of the regression analysis.

  • reg_roc_dict_df (DataFrame): ROC AUC scores for each gene.

  • reg_pr_dict_df (DataFrame): Precision-Recall AUC scores for each gene.

  • reg_cm (DataFrame): Confusion matrix for each gene.

  • sim_stats (dict): Additional simulation statistics.

  • genes_per_well_df (DataFrame): Number of genes per well.

  • wells_per_gene_df (DataFrame): Number of wells per gene.

dists (list): List of distances.

Return type:

tuple

spacr.sim.save_data(src, output, settings, save_all=False, i=0, variable='all')[source]

Save simulation data to specified location.

Parameters:
  • src (str) – The directory path where the data will be saved.

  • output (list) – A list of dataframes containing simulation output.

  • settings (dict) – A dictionary containing simulation settings.

  • save_all (bool, optional) – Flag indicating whether to save all tables or only a subset. Defaults to False.

  • i (int, optional) – The simulation number. Defaults to 0.

  • variable (str, optional) – The variable name. Defaults to ‘all’.

Returns:

None

spacr.sim.save_plot(fig, src, variable, i)[source]

Save a matplotlib figure as a PDF file.

Parameters: - fig: The matplotlib figure to be saved. - src: The directory where the file will be saved. - variable: The name of the variable being plotted. - i: The index of the figure.

Returns: None

spacr.sim.save_shap_plot(fig, src, variable, i)[source]
spacr.sim.sequence_plates(well_score, number_of_genes, avg_reads_per_gene, sd_reads_per_gene, sequencing_error=0.01)[source]

Simulates the sequencing of plates and calculates gene fractions and metadata.

Parameters: well_score (pd.DataFrame): DataFrame containing well scores and gene lists. number_of_genes (int): Number of genes. avg_reads_per_gene (float): Average number of reads per gene. sd_reads_per_gene (float): Standard deviation of reads per gene. sequencing_error (float, optional): Probability of introducing sequencing error. Defaults to 0.01.

Returns: gene_fraction_map (pd.DataFrame): DataFrame containing gene fractions for each well. metadata (pd.DataFrame): DataFrame containing metadata for each well.

spacr.sim.update_scores_and_get_cm(cell_scores, optimum)[source]

Update the cell scores based on the given optimum value and calculate the confusion matrix.

Parameters:
  • cell_scores (DataFrame) – The DataFrame containing the cell scores.

  • optimum (float) – The optimum value used for updating the scores.

Returns:

A tuple containing the updated cell scores DataFrame and the confusion matrix.

Return type:

tuple

spacr.sim.validate_and_adjust_beta_params(sim_params)[source]

Validates and adjusts the Beta distribution parameters in the simulation settings so that they describe a valid distribution.

Args: sim_params (list of dict): List of dictionaries, each containing the simulation parameters.

Returns: list of dict: The adjusted list of simulation parameter sets.

spacr.sim.vis_dists(dists, src, v, i)[source]

Visualizes the distributions of given distances.

Parameters:
  • dists (list) – List of distance arrays.

  • src (str) – Source directory for saving the plot.

  • v (int) – Number of vertices.

  • i (int) – Index of the plot.

Returns:

None

spacr.sim.visualize_all(output)[source]

Visualizes various plots based on the given output data.

Parameters:

output (list) – A list containing the following elements:
  • cell_scores (DataFrame): DataFrame containing cell scores.

  • cell_roc_dict_df (DataFrame): DataFrame containing ROC curve data for cell classification.

  • cell_pr_dict_df (DataFrame): DataFrame containing precision-recall curve data for cell classification.

  • cell_cm (array-like): Confusion matrix for cell classification.

  • well_score (DataFrame): DataFrame containing well scores.

  • gene_fraction_map (dict): Dictionary mapping genes to fractions.

  • metadata (dict): Dictionary containing metadata.

  • results_df (DataFrame): DataFrame containing results.

  • reg_roc_dict_df (DataFrame): DataFrame containing ROC curve data for gene regression.

  • reg_pr_dict_df (DataFrame): DataFrame containing precision-recall curve data for gene regression.

  • reg_cm (array-like): Confusion matrix for gene regression.

  • sim_stats (dict): Dictionary containing simulation statistics.

  • genes_per_well_df (DataFrame): DataFrame containing genes per well data.

  • wells_per_gene_df (DataFrame): DataFrame containing wells per gene data.

Returns:

The generated figure object.

Return type:

fig (matplotlib.figure.Figure)

spacr.sim_app module

spacr.timelapse module

spacr.timelapse.analyze_calcium_oscillations(db_loc, measurement='cell_channel_1_mean_intensity', size_filter='cell_area', fluctuation_threshold=0.25, num_lines=None, peak_height=0.01, pathogen=None, cytoplasm=None, remove_transient=True, verbose=False, transience_threshold=0.9)[source]
spacr.timelapse.exponential_decay(x, a, b, c)[source]
spacr.timelapse.infected_vs_noninfected(result_df, measurement)[source]
spacr.timelapse.plot_data(measurement, group, ax, label, marker='o', linestyle='-')[source]
spacr.timelapse.preprocess_pathogen_data(pathogen_df)[source]
spacr.timelapse.save_figure(fig, src, figure_number)[source]
spacr.timelapse.save_results_dataframe(df, src, results_name)[source]
spacr.timelapse.summarize_per_well(peak_details_df)[source]
spacr.timelapse.summarize_per_well_inf_non_inf(peak_details_df)[source]

spacr.utils module

class spacr.utils.Cache(max_size)[source]

Bases: object

A class representing a cache with a maximum size.

max_size

The maximum size of the cache.

Type:

int

cache

The cache data structure.

Type:

OrderedDict

get(key)[source]
put(key, value)[source]
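
The documented attributes (max_size and an OrderedDict named cache) together with get and put suggest a size-bounded cache. The sketch below shows one common way to build one. The least-recently-used eviction policy and the class name CacheSketch are assumptions. The actual eviction behaviour of spacr.utils.Cache is not stated here.

```python
from collections import OrderedDict


class CacheSketch:
    """Hypothetical stand-in for a size-bounded cache backed by an OrderedDict."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.cache = OrderedDict()

    def get(self, key):
        if key not in self.cache:
            return None
        # Assumption: a hit marks the entry as most recently used.
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        # Assumption: evict the oldest entry once the cache exceeds max_size.
        if len(self.cache) > self.max_size:
            self.cache.popitem(last=False)


c = CacheSketch(max_size=2)
c.put("a", 1)
c.put("b", 2)
c.get("a")     # "a" becomes the most recently used entry
c.put("c", 3)  # evicts "b", the least recently used entry
```
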
class spacr.utils.CustomCellClassifier(num_classes, pathogen_channel, use_attention, use_checkpoint, dropout_rate)[source]

Bases: Module

custom_forward(x)[source]
forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class spacr.utils.EarlyFusion(in_channels)[source]

Bases: Module

Early Fusion module for image classification.

Parameters:

in_channels (int) – Number of input channels.

forward(x)[source]

Forward pass of the Early Fusion module.

Parameters:

x (torch.Tensor) – Input tensor of shape (batch_size, in_channels, height, width).

Returns:

Output tensor of shape (batch_size, 64, height, width).

Return type:

torch.Tensor

class spacr.utils.FocalLossWithLogits(alpha=1, gamma=2)[source]

Bases: Module

forward(logits, target)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class spacr.utils.GradCAM(model, target_layers=None, use_cuda=True)[source]

Bases: object

forward(input)[source]
class spacr.utils.IntegratedGradients(model)[source]

Bases: object

generate_integrated_gradients(input_tensor, target_label_idx, baseline=None, num_steps=50)[source]
spacr.utils.MLR(merged_df, refine_model)[source]
class spacr.utils.MultiScaleBlockWithAttention(in_channels, out_channels)[source]

Bases: Module

custom_forward(x)[source]
forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class spacr.utils.ResNet(resnet_type='resnet50', dropout_rate=None, use_checkpoint=False, init_weights='imagenet')[source]

Bases: Module

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

initialize_base(base_model_dict, dropout_rate, use_checkpoint, init_weights)[source]
class spacr.utils.SaliencyMapGenerator(model)[source]

Bases: object

compute_saliency_maps(X, y)[source]
plot_saliency_maps(X, y, saliency, class_names)[source]
class spacr.utils.ScaledDotProductAttention(d_k)[source]

Bases: Module

forward(Q, K, V)[source]

Performs the forward pass of the ScaledDotProductAttention module.

Parameters:
  • Q (torch.Tensor) – The query tensor.

  • K (torch.Tensor) – The key tensor.

  • V (torch.Tensor) – The value tensor.

Returns:

The output tensor.

Return type:

torch.Tensor
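
For orientation, scaled dot-product attention is conventionally computed as softmax(Q K^T / sqrt(d_k)) V. The NumPy sketch below illustrates that standard formula for 2-D inputs. It is not the module's PyTorch code, the function name is hypothetical, and it assumes that d_k equals the last dimension of Q.

```python
import numpy as np


def scaled_dot_product_attention_sketch(Q, K, V):
    """NumPy sketch of softmax(Q K^T / sqrt(d_k)) V for 2-D inputs."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights


rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 queries of dimension d_k = 8
K = rng.normal(size=(5, 8))  # 5 keys of dimension d_k = 8
V = rng.normal(size=(5, 3))  # 5 values of dimension 3
out, weights = scaled_dot_product_attention_sketch(Q, K, V)
```

Each row of weights sums to 1, so every output row is a weighted average of the rows of V.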

class spacr.utils.SelectChannels(channels)[source]

Bases: object

class spacr.utils.SelfAttention(in_channels, d_k)[source]

Bases: Module

Self-Attention module that applies scaled dot-product attention mechanism.

Parameters:
  • in_channels (int) – Number of input channels.

  • d_k (int) – Dimensionality of the key and query vectors.

forward(x)[source]

Forward pass of the SelfAttention module.

Parameters:

x (torch.Tensor) – Input tensor of shape (batch_size, in_channels).

Returns:

Output tensor after applying self-attention mechanism.

Return type:

torch.Tensor

class spacr.utils.SpatialAttention(kernel_size=7)[source]

Bases: Module

forward(x)[source]

Performs forward pass of the SpatialAttention module.

Parameters:

x (torch.Tensor) – The input tensor.

Returns:

The output tensor after applying spatial attention.

Return type:

torch.Tensor

class spacr.utils.TorchModel(model_name='resnet50', pretrained=True, dropout_rate=None, use_checkpoint=False)[source]

Bases: Module

apply_dropout_rate(model, dropout_rate)[source]

Apply dropout rate to all dropout layers in the model.

forward(x)[source]

Define the forward pass of the model.

get_num_ftrs()[source]

Determine the number of features output by the base model.

get_weight_choice()[source]

Get weight choice if it exists for the model.

init_base_model(pretrained)[source]

Initialize the base model from torchvision.models.

init_spacr_classifier(dropout_rate)[source]

Initialize the SPACR classifier.

spacr.utils.add_images_to_tar(paths_chunk, tar_path, total_images)[source]
spacr.utils.adjust_cell_masks(parasite_folder, cell_folder, nuclei_folder, overlap_threshold=5, perimeter_threshold=30)[source]

Process all npy files in the given folders. Merge and relabel cells in cell masks based on parasite overlap and cell perimeter sharing conditions.

Parameters:
  • parasite_folder (str) – Path to the folder containing parasite masks.

  • cell_folder (str) – Path to the folder containing cell masks.

  • nuclei_folder (str) – Path to the folder containing nuclei masks.

  • overlap_threshold (float) – The percentage threshold for merging cells based on parasite overlap.

  • perimeter_threshold (float) – The percentage threshold for merging cells based on shared perimeter.

spacr.utils.annotate_conditions(df, cells=['HeLa'], cell_loc=None, pathogens=['rh'], pathogen_loc=None, treatments=['cm'], treatment_loc=None, types=['col', 'col', 'col'])[source]

Annotates conditions in a DataFrame based on specified criteria.

Parameters:
  • df (pandas.DataFrame) – The DataFrame to annotate.

  • cells (list, optional) – List of host cell types. Defaults to [‘HeLa’].

  • cell_loc (list, optional) – List of corresponding values for each host cell type. Defaults to None.

  • pathogens (list, optional) – List of pathogens. Defaults to [‘rh’].

  • pathogen_loc (list, optional) – List of corresponding values for each pathogen. Defaults to None.

  • treatments (list, optional) – List of treatments. Defaults to [‘cm’].

  • treatment_loc (list, optional) – List of corresponding values for each treatment. Defaults to None.

  • types (list, optional) – List of column types for host cells, pathogens, and treatments. Defaults to [‘col’,’col’,’col’].

Returns:

The annotated DataFrame.

Return type:

pandas.DataFrame

spacr.utils.annotate_predictions(csv_loc)[source]
spacr.utils.apply_mask(image, output_value=0)[source]
spacr.utils.assign_colors(unique_labels, random_colors)[source]
spacr.utils.augment_classes(dst, nc, pc, generate=True, move=True)[source]
spacr.utils.augment_dataset(dataset, is_grayscale=False)[source]

Perform data augmentation on the entire dataset by rotating and reflecting the images.

Parameters: - dataset (list of tuples): The input dataset, each entry is a tuple (image, label, filename). - is_grayscale (bool): Flag indicating if the images are grayscale.

Returns: - augmented_dataset (list of tuples): A dataset with augmented (image, label, filename) tuples.

spacr.utils.augment_image(image)[source]

Perform data augmentation by rotating and reflecting the image.

Parameters: - image (PIL Image or numpy array): The input image.

Returns: - augmented_images (list): A list of augmented images.
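
A minimal sketch of augmentation by rotation and reflection with NumPy follows. Which rotations and reflections spacr applies is not stated here, so the choice of the four 90-degree rotations plus a horizontal mirror of each is an assumption, and augment_image_sketch is a hypothetical name.

```python
import numpy as np


def augment_image_sketch(image):
    """Hypothetical helper: rotations by 0/90/180/270 degrees plus mirrored copies."""
    image = np.asarray(image)
    augmented = []
    for k in range(4):
        # Rotate in the image plane; any trailing channel axis is left untouched.
        rotated = np.rot90(image, k=k, axes=(0, 1))
        augmented.append(rotated)
        augmented.append(np.flip(rotated, axis=1))  # horizontal reflection
    return augmented


img = np.arange(6).reshape(2, 3)
variants = augment_image_sketch(img)
```
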

spacr.utils.augment_images(file_paths, dst)[source]
spacr.utils.augment_single_image(args)[source]
spacr.utils.boundary_f1_score(mask_true, mask_pred, dilation_radius=1)[source]
spacr.utils.calculate_iou(mask1, mask2)[source]
spacr.utils.calculate_loss(output, target, loss_type='binary_cross_entropy_with_logits')[source]
spacr.utils.check_mask_folder(src, mask_fldr)[source]
spacr.utils.check_multicollinearity(x)[source]

Checks multicollinearity of the predictors by computing the VIF.
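
The variance inflation factor of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The sketch below computes it with plain NumPy least squares. It illustrates the definition and is not the spacr implementation. vif_sketch and the synthetic data are assumptions.

```python
import numpy as np


def vif_sketch(X):
    """Return the variance inflation factor of each column of a 2-D array."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
        r2 = 1.0 - residual.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)  # nearly collinear with a
vifs = vif_sketch(np.column_stack([a, b, c]))
```

A VIF close to 1 indicates little collinearity. Values above roughly 5 to 10 are a common rule-of-thumb warning sign.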

spacr.utils.check_normality(series)[source]

Helper function to check if a feature is normally distributed.

spacr.utils.check_overlap(current_position, other_positions, threshold)[source]
spacr.utils.choose_model(model_type, device, init_weights=True, dropout_rate=0, use_checkpoint=False, channels=3, height=224, width=224, chan_dict=None, num_classes=2, verbose=False)[source]

Choose a model for classification.

Parameters:
  • model_type (str) – The type of model to choose. Can be one of the pre-defined TorchVision models or ‘custom’ for a custom model.

  • device (str) – The device to use for model inference.

  • init_weights (bool, optional) – Whether to initialize the model with pre-trained weights. Defaults to True.

  • dropout_rate (float, optional) – The dropout rate to use in the model. Defaults to 0.

  • use_checkpoint (bool, optional) – Whether to use checkpointing during model training. Defaults to False.

  • channels (int, optional) – The number of input channels for the model. Defaults to 3.

  • height (int, optional) – The height of the input images for the model. Defaults to 224.

  • width (int, optional) – The width of the input images for the model. Defaults to 224.

  • chan_dict (dict, optional) – A dictionary containing channel information for custom models. Defaults to None.

  • num_classes (int, optional) – The number of output classes for the model. Defaults to 2.

Returns:

The chosen model.

Return type:

torch.nn.Module

spacr.utils.class_visualization(target_y, model_path, dtype, img_size=224, channels=[0, 1, 2], l2_reg=0.001, learning_rate=25, num_iterations=100, blur_every=10, max_jitter=16, show_every=25, class_names=['nc', 'pc'])[source]
spacr.utils.classification_metrics(all_labels, prediction_pos_probs, loss, epoch)[source]

Calculate classification metrics for binary classification.

Parameters: - all_labels (list): List of true labels. - prediction_pos_probs (list): List of predicted positive probabilities. - loss (float): Loss value. - epoch (int): Epoch number.

Returns: - data_df (DataFrame): DataFrame containing the calculated metrics.

spacr.utils.close_file_descriptors()[source]

Close file descriptors and shared memory objects.

spacr.utils.close_multiprocessing_processes()[source]

Close all multiprocessing processes.

spacr.utils.cluster_feature_analysis(all_df, cluster_col='cluster')[source]

Perform Random Forest feature importance, ANOVA for normally distributed features, and Kruskal-Wallis for non-normally distributed features. Combine results into a single DataFrame.

spacr.utils.combine_results(rf_df, anova_df, kruskal_df)[source]

Combine the results into a single DataFrame.

spacr.utils.compute_ap_over_iou_thresholds(true_masks, pred_masks, iou_thresholds)[source]
spacr.utils.compute_average_precision(matches, num_true_masks, num_pred_masks)[source]
spacr.utils.compute_irm_penalty(losses, dummy_w, device)[source]

Computes the Invariant Risk Minimization (IRM) penalty.

Parameters:
  • losses (list) – A list of losses.

  • dummy_w (torch.Tensor) – A dummy weight tensor.

  • device (torch.device) – The device to perform computations on.

Returns:

The computed IRM penalty.

Return type:

float

spacr.utils.compute_segmentation_ap(true_masks, pred_masks, iou_thresholds=array([0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]))[source]
spacr.utils.convert_and_relabel_masks(folder_path)[source]

Converts all int64 npy masks in a folder to uint16 with relabeling to ensure all labels are retained.

Parameters: - folder_path (str): The path to the folder containing int64 npy mask files.

Returns: - None
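
The relabeling step matters because raw int64 labels can exceed the uint16 range, so a plain cast could collapse distinct objects into one. A minimal single-array sketch follows, assuming non-negative labels with 0 as background. The helper name is hypothetical and the folder traversal and file I/O are omitted.

```python
import numpy as np


def relabel_to_uint16_sketch(mask):
    """Map integer labels to consecutive uint16 labels, keeping 0 as background."""
    mask = np.asarray(mask)
    labels, inverse = np.unique(mask, return_inverse=True)
    inverse = inverse.reshape(mask.shape)
    if labels[0] != 0:
        inverse = inverse + 1  # no background present: start object labels at 1
    if inverse.max() > np.iinfo(np.uint16).max:
        raise ValueError("more labels than uint16 can represent")
    return inverse.astype(np.uint16)


mask = np.array([[0, 70000], [70000, 123456789]], dtype=np.int64)
relabeled = relabel_to_uint16_sketch(mask)
```
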

spacr.utils.correct_masks(src)[source]
spacr.utils.correct_paths(df, base_path)[source]
spacr.utils.count_reads_in_fastq(fastq_file)[source]
spacr.utils.create_circular_mask(h, w, center=None, radius=None)[source]
spacr.utils.delete_folder(folder_path)[source]
spacr.utils.dice_coefficient(mask1, mask2)[source]
spacr.utils.extract_boundaries(mask, dilation_radius=1)[source]
spacr.utils.extract_features(image_paths, resnet=<function resnet50>)[source]

Extract features from images using a pre-trained ResNet model.

spacr.utils.filter_columns(df, filter_by)[source]
spacr.utils.filter_dataframe_features(df, channel_of_interest, exclude=None, remove_low_variance_features=True, remove_highly_correlated_features=True, verbose=False)[source]

Filter the dataframe df based on the specified channel_of_interest and exclude parameters.

Parameters: - df (pandas.DataFrame): The input dataframe to be filtered. - channel_of_interest (str, int, list, None): The channel(s) of interest to filter the dataframe. If None, no filtering is applied. If ‘morphology’, only morphology features are included. If an integer, only the specified channel is included. If a list, only the specified channels are included. If a string, only the specified channel is included. - exclude (str, list, None): The feature(s) to exclude from the filtered dataframe. If None, no features are excluded. If a string, the specified feature is excluded. If a list, the specified features are excluded.

Returns: - filtered_df (pandas.DataFrame): The filtered dataframe based on the specified parameters. - features (list): The list of selected features after filtering.

spacr.utils.find_non_overlapping_position(x, y, image_positions, threshold, max_attempts=100)[source]
spacr.utils.fishers_odds(df, threshold=0.5, phenotyp_col='mean_pred')[source]
spacr.utils.generate_colors(num_clusters, black_background)[source]
spacr.utils.generate_dependent_variable(df, dv_loc, pc_min=0.95, nc_max=0.05, agg_type='mean')[source]
spacr.utils.generate_fraction_map(df, gene_column, min_frequency=0.0)[source]
spacr.utils.get_db_paths(src)[source]
spacr.utils.get_files_from_dir(dir_path, file_extension='*')[source]
spacr.utils.get_ml_results_paths(src, model_type='xgboost', channel_of_interest=1)[source]
spacr.utils.get_paths_from_db(df, png_df, image_type='cell_png')[source]
spacr.utils.get_sequencing_paths(src)[source]
spacr.utils.get_submodules(model, prefix='')[source]
spacr.utils.initiate_counter(counter_, lock_)[source]
spacr.utils.invert_image(image)[source]
spacr.utils.is_list_of_lists(var)[source]
spacr.utils.is_multiprocessing_process(process)[source]

Check if the process is a multiprocessing process.

spacr.utils.jaccard_index(mask1, mask2)[source]
spacr.utils.lasso_reg(merged_df, alpha_value=0.01, reg_type='lasso')[source]
spacr.utils.load_image(image_path)[source]

Load and preprocess an image.

spacr.utils.load_image_paths(c, visualize)[source]
spacr.utils.mask_object_count(mask)[source]

Counts the number of objects in a given mask.

Parameters: - mask: numpy.ndarray. The mask containing object labels.

Returns: - int. The number of objects in the mask.
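
A short sketch of counting labeled objects follows, assuming that the value 0 marks the background and every other distinct value is one object. The helper name count_objects_sketch is hypothetical.

```python
import numpy as np


def count_objects_sketch(mask):
    """Count distinct non-zero labels, assuming 0 marks the background."""
    labels = np.unique(mask)
    return int(np.count_nonzero(labels))


mask = np.array([[0, 1, 1],
                 [0, 2, 2],
                 [5, 0, 2]])
n_objects = count_objects_sketch(mask)
```

Labels do not need to be consecutive. The example mask contains objects 1, 2 and 5.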

spacr.utils.match_masks(true_masks, pred_masks, iou_threshold)[source]
spacr.utils.measure_test_mode(settings)[source]
spacr.utils.merge_dataframes(df, image_paths_df, verbose)[source]
spacr.utils.merge_regression_res_with_metadata(results_file, metadata_file, name='_metadata')[source]
spacr.utils.merge_touching_objects(mask, threshold=0.25)[source]

Merges touching objects in a binary mask based on the percentage of their shared boundary.

Parameters:
  • mask (ndarray) – Binary mask representing objects.

  • threshold (float, optional) – Threshold value for merging objects. Defaults to 0.25.

Returns:

Merged mask.

Return type:

ndarray

spacr.utils.model_metrics(model)[source]
spacr.utils.normalize_to_dtype(array, p1=2, p2=98, percentile_list=None, new_dtype=None)[source]

Normalize each image in the stack to its own percentiles.

Parameters: - array: numpy array The input stack to be normalized. - p1: int, optional The lower percentile value for normalization. Default is 2. - p2: int, optional The upper percentile value for normalization. Default is 98. - percentile_list: list, optional A list of pre-calculated percentiles for each image in the stack. Default is None.

Returns: - new_stack: numpy array The normalized stack with the same shape as the input stack.
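
The following sketch illustrates per-image percentile normalization. Several details are assumptions: that the images are stacked along the last axis, that values are clipped and rescaled to [0, 1], and that a flat image maps to zeros. The new_dtype conversion and the percentile_list option are left out, and percentile_rescale_sketch is a hypothetical name.

```python
import numpy as np


def percentile_rescale_sketch(stack, p1=2, p2=98):
    """Rescale each image along the last axis to [0, 1] using its own percentiles."""
    stack = np.asarray(stack, dtype=float)
    out = np.empty_like(stack)
    for i in range(stack.shape[-1]):
        img = stack[..., i]
        lo, hi = np.percentile(img, [p1, p2])
        if hi > lo:
            out[..., i] = np.clip((img - lo) / (hi - lo), 0.0, 1.0)
        else:
            out[..., i] = 0.0  # flat image: nothing to stretch
    return out


rng = np.random.default_rng(0)
stack = rng.integers(0, 4096, size=(16, 16, 3))
normalized = percentile_rescale_sketch(stack)
```
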

spacr.utils.pad_to_same_shape(mask1, mask2)[source]
spacr.utils.perform_statistical_tests(all_df, cluster_col='cluster')[source]

Perform ANOVA or Kruskal-Wallis tests depending on normality of features.

spacr.utils.pick_best_model(src)[source]
spacr.utils.plot_clusters(ax, embedding, labels, colors, cluster_centers, plot_outlines, plot_points, smooth_lines, figuresize=10, dot_size=50, verbose=False)[source]
spacr.utils.plot_clusters_grid(embedding, labels, image_nr, image_paths, colors, figuresize, black_background, verbose)[source]
spacr.utils.plot_embedding(embedding, image_paths, labels, image_nr, img_zoom, colors, plot_by_cluster, plot_outlines, plot_points, plot_images, smooth_lines, black_background, figuresize, dot_size, remove_image_canvas, verbose)[source]
spacr.utils.plot_grid(cluster_images, colors, figuresize, black_background, verbose)[source]
spacr.utils.plot_image(ax, x, y, img, img_zoom, remove_image_canvas=True)[source]
spacr.utils.plot_images_by_cluster(ax, image_paths, embedding, labels, image_nr, img_zoom, colors, cluster_indices, remove_image_canvas, verbose)[source]
spacr.utils.plot_umap_images(ax, image_paths, embedding, labels, image_nr, img_zoom, colors, plot_by_cluster, remove_image_canvas, verbose)[source]
spacr.utils.preprocess_data(df, filter_by, remove_highly_correlated, log_data, exclude)[source]

Preprocesses the given dataframe by applying filtering, removing highly correlated columns, applying log transformation, filling NaN values, and scaling the numeric data.

Args: df (pandas.DataFrame): The input dataframe. filter_by (str or None): The channel of interest to filter the dataframe by. remove_highly_correlated (bool or float): Whether to remove highly correlated columns. If a float is provided, it represents the correlation threshold. log_data (bool): Whether to apply log transformation to the numeric data. exclude (list or None): List of features to exclude from the filtering process.

Returns: numpy.ndarray: The preprocessed numeric data.

Raises: ValueError: If no numeric columns are available after filtering.

spacr.utils.preprocess_image(image_path, normalize=True, image_size=224, channels=[1, 2, 3])[source]
spacr.utils.print_progress(files_processed, files_to_process, n_jobs, time_ls=None, batch_size=None, operation_type='')[source]
spacr.utils.process_masks(mask_folder, image_folder, channel, batch_size=50, n_clusters=2, plot=False)[source]
spacr.utils.process_vision_results(df, threshold=0.5)[source]
spacr.utils.random_forest_feature_importance(all_df, cluster_col='cluster')[source]

Random Forest feature importance.

spacr.utils.recommend_target_layers(model)[source]
spacr.utils.reduction_and_clustering(numeric_data, n_neighbors, min_dist, metric, eps, min_samples, clustering, reduction_method='umap', verbose=False, embedding=None, n_jobs=-1, mode='fit', model=False)[source]

Perform dimensionality reduction and clustering on the given data.

Parameters: numeric_data (np.ndarray): Numeric data for embedding and clustering. n_neighbors (int or float): Number of neighbors for UMAP or perplexity for t-SNE. min_dist (float): Minimum distance for UMAP. metric (str): Metric for UMAP and DBSCAN. eps (float): Epsilon for DBSCAN. min_samples (int): Minimum samples for DBSCAN or number of clusters for KMeans. clustering (str): Clustering method (‘DBSCAN’ or ‘KMeans’). reduction_method (str): Dimensionality reduction method (‘UMAP’ or ‘tSNE’). verbose (bool): Whether to print verbose output. embedding (np.ndarray, optional): Precomputed embedding. Default is None. return_model (bool): Whether to return the reducer model. Default is False.

Returns: tuple: embedding, labels (and optionally the reducer model)

spacr.utils.remove_canvas(img)[source]
spacr.utils.remove_highly_correlated_columns(df, threshold=0.95, verbose=False)[source]

Removes columns from the dataframe that are highly correlated with one another.

Parameters: df (pandas.DataFrame): The DataFrame containing the data. threshold (float): The correlation threshold above which columns will be removed.

Returns: pandas.DataFrame: The DataFrame with highly correlated columns removed.
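
One common way to implement this kind of filter is to inspect the upper triangle of the absolute correlation matrix and drop the later column of each pair above the threshold. The sketch below shows that approach. It is an assumption about the technique, not the spacr source, and drop_correlated_sketch and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd


def drop_correlated_sketch(df, threshold=0.95):
    """Hypothetical helper: drop the later column of each highly correlated pair."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


demo = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x_scaled": [2.0, 4.0, 6.0, 8.0, 10.1],
    "y": [5.0, 1.0, 4.0, 2.0, 3.0],
})
reduced = drop_correlated_sketch(demo)
```

Because only the later member of a pair is dropped, the result depends on column order.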

spacr.utils.remove_intensity_objects(image, mask, intensity_threshold, mode)[source]

Removes objects from the mask based on their mean intensity in the original image.

Parameters:
  • image (ndarray) – The original image.

  • mask (ndarray) – The mask containing labeled objects.

  • intensity_threshold (float) – The threshold value for mean intensity.

  • mode (str) – The mode for intensity comparison. Can be ‘low’ or ‘high’.

Returns:

The updated mask with objects removed.

Return type:

ndarray

spacr.utils.remove_low_variance_columns(df, threshold=0.01, verbose=False)[source]

Removes columns from the dataframe that have low variance.

Parameters: df (pandas.DataFrame): The DataFrame containing the data. threshold (float): The variance threshold below which columns will be removed.

Returns: pandas.DataFrame: The DataFrame with low variance columns removed.
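
A variance filter of this kind can be sketched in a few lines of pandas. The helper name `drop_low_variance` is hypothetical, and two details are assumptions rather than documented behavior: the sample variance (pandas' default, `ddof=1`) is used, and the comparison is strictly "less than".

```python
import pandas as pd

def drop_low_variance(df, threshold=0.01):
    """Hypothetical helper: drop numeric columns whose variance is below threshold."""
    # Sample variance per numeric column (pandas default, ddof=1).
    variances = df.select_dtypes(include="number").var()
    to_drop = variances[variances < threshold].index
    return df.drop(columns=to_drop)

df = pd.DataFrame({
    "flat": [1.0, 1.0, 1.0, 1.001],   # nearly constant
    "spread": [0.0, 1.0, 2.0, 3.0],   # clearly varying
})
reduced = drop_low_variance(df, threshold=0.01)
print(list(reduced.columns))
```

Because variance depends on the scale of each column, a fixed threshold such as 0.01 is only meaningful when the columns share comparable units or have been normalized beforehand.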

spacr.utils.remove_noise(embedding, labels)[source]
spacr.utils.reset_mp()[source]
spacr.utils.resize_images_and_labels(images, labels, target_height, target_width, show_example=True)[source]
spacr.utils.resize_labels_back(labels, orig_dims)[source]
spacr.utils.save_file_lists(dst, data_set, ls)[source]
spacr.utils.save_settings(settings, name='settings')[source]
spacr.utils.search_reduction_and_clustering(numeric_data, n_neighbors, min_dist, metric, eps, min_samples, clustering, reduction_method, verbose, reduction_param=None, embedding=None, n_jobs=-1)[source]

Perform dimensionality reduction and clustering on the given data.

Parameters:
  • numeric_data (np.array) – Numeric data to process.

  • n_neighbors (int) – Number of neighbors for UMAP or perplexity for tSNE.

  • min_dist (float) – Minimum distance for UMAP.

  • metric (str) – Metric for UMAP, tSNE, and DBSCAN.

  • eps (float) – Epsilon for DBSCAN clustering.

  • min_samples (int) – Minimum samples for DBSCAN or number of clusters for KMeans.

  • clustering (str) – Clustering method (‘DBSCAN’ or ‘KMeans’).

  • reduction_method (str) – Dimensionality reduction method (‘UMAP’ or ‘tSNE’).

  • verbose (bool) – Whether to print verbose output.

  • reduction_param (dict) – Additional parameters for the reduction method.

  • embedding (np.array) – Precomputed embedding (optional).

  • n_jobs (int) – Number of parallel jobs to run.

Returns: embedding (np.array): Embedding of the data. labels (np.array): Cluster labels.
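
Several parameters above do double duty: n_neighbors is also the tSNE perplexity, and min_samples is also the KMeans cluster count. The sketch below shows how such a pipeline might map them onto scikit-learn estimators. It is not spacr's code: the helper name `embed_and_cluster` is hypothetical, only the tSNE branch is shown to avoid a dependency on umap-learn, and running the clustering on the 2-D embedding rather than on the raw data is an assumption.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.manifold import TSNE

def embed_and_cluster(numeric_data, n_neighbors, metric, eps, min_samples, clustering):
    """Hypothetical helper: tSNE embedding followed by DBSCAN or KMeans."""
    # n_neighbors is reused as the tSNE perplexity (must be below the sample count).
    reducer = TSNE(n_components=2, perplexity=n_neighbors, metric=metric, random_state=0)
    embedding = reducer.fit_transform(numeric_data)
    if clustering == "DBSCAN":
        model = DBSCAN(eps=eps, min_samples=min_samples, metric=metric)
    elif clustering == "KMeans":
        # min_samples is reused as the number of clusters.
        model = KMeans(n_clusters=min_samples, n_init=10, random_state=0)
    else:
        raise ValueError("clustering must be 'DBSCAN' or 'KMeans'")
    # Assumption: clusters are found in the embedding, not in the raw features.
    labels = model.fit_predict(embedding)
    return embedding, labels

# Three well-separated synthetic groups of 20 samples with 5 features each.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 5)) for c in (0.0, 5.0, 10.0)])
embedding, labels = embed_and_cluster(
    data, n_neighbors=10, metric="euclidean", eps=0.5, min_samples=3, clustering="KMeans"
)
print(embedding.shape, np.unique(labels))
```

With DBSCAN, points labeled -1 are noise, which is presumably what a helper such as `remove_noise(embedding, labels)` would filter out.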

spacr.utils.setup_plot(figuresize, black_background)[source]
spacr.utils.show_cam_on_image(img, mask)[source]
spacr.utils.smooth_hull_lines(cluster_data)[source]
spacr.utils.split_my_dataset(dataset, split_ratio=0.1)[source]

Splits a dataset into training and validation subsets.

Parameters:
  • dataset (torch.utils.data.Dataset) – The dataset to be split.

  • split_ratio (float, optional) – The ratio of validation samples to total samples. Defaults to 0.1.

Returns:

A tuple containing the training dataset and validation dataset.

Return type:

tuple
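
The index arithmetic behind such a split can be sketched without any framework. The helper name `split_indices`, the shuffling, the fixed seed, and the rounding of the validation size toward zero are all assumptions; spacr's implementation may differ. With PyTorch, the two index lists could be passed to `torch.utils.data.Subset` to obtain the training and validation datasets.

```python
import random

def split_indices(n_samples, split_ratio=0.1, seed=0):
    """Hypothetical helper: return (train_indices, val_indices) for a dataset."""
    # Validation size is the ratio of the total, rounded toward zero (assumption).
    n_val = int(n_samples * split_ratio)
    indices = list(range(n_samples))
    # Shuffle with a fixed seed so the split is reproducible (assumption).
    random.Random(seed).shuffle(indices)
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = split_indices(100, split_ratio=0.1)
print(len(train_idx), len(val_idx))
```

Every index lands in exactly one of the two lists, so the training and validation subsets never share a sample.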

spacr.version module

Copyright © 2024 Something

Module contents