spacr.utils
Module Contents
- spacr.utils.debug(enabled=True, logger_name=None)[source]
Decorator that temporarily sets the given logger to DEBUG while the function runs, then restores the old level.
- Parameters:
enabled (bool) – If False, decorator is a no-op.
logger_name (str | None) – Name of the logger to tweak. Defaults to the function’s module logger.
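The behaviour described above can be sketched with the standard logging module. This is an illustrative re-implementation, not spacr's source: the name debug_sketch and the demo function current_level are hypothetical, and falling back to func.__module__ is an assumption about how "the function's module logger" is resolved.

```python
import functools
import logging


def debug_sketch(enabled=True, logger_name=None):
    """Hypothetical stand-in for a DEBUG-while-running decorator."""
    def decorator(func):
        if not enabled:
            return func  # no-op: hand back the undecorated function

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Assumption: default to the logger named after the function's module.
            logger = logging.getLogger(logger_name or func.__module__)
            old_level = logger.level
            logger.setLevel(logging.DEBUG)
            try:
                return func(*args, **kwargs)
            finally:
                # Restore the previous level even if the function raises.
                logger.setLevel(old_level)
        return wrapper
    return decorator


@debug_sketch(logger_name="demo")
def current_level():
    # Reports the level seen by the "demo" logger while the function runs.
    return logging.getLogger("demo").level
```

The try/finally pair is what guarantees the old level comes back after an exception.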
- spacr.utils.calculate_activation_correlations(inputs, activation_maps, file_names, manders_thresholds=[15, 50, 75])[source]
Calculates Pearson and Manders correlations between input image channels and activation map channels.
- Parameters:
inputs – A batch of input images, Tensor of shape (batch_size, channels, height, width)
activation_maps – A batch of activation maps, Tensor of shape (batch_size, channels, height, width)
file_names – List of file names corresponding to each image in the batch.
manders_thresholds – List of intensity percentiles to calculate Manders correlation.
- Returns:
A DataFrame (df_correlations) with columns for pairwise correlations (Pearson and Manders) between input channels and activation map channels.
- Return type:
pandas.DataFrame
- spacr.utils.load_settings(csv_file_path, show=False, setting_key='setting_key', setting_value='setting_value')[source]
Convert a CSV file with ‘setting_key’ and ‘setting_value’ columns into a dictionary. Handles special cases where values are lists, tuples, booleans, None, integers, floats, and nested dictionaries.
- Parameters:
csv_file_path (str) – The path to the CSV file.
show (bool) – Whether to display the dataframe (for debugging).
setting_key (str) – The name of the column that contains the setting keys.
setting_value (str) – The name of the column that contains the setting values.
- Returns:
A dictionary whose keys come from the setting_key column and whose values come from the setting_value column.
- Return type:
dict
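One common way to turn string cells into lists, booleans, None and numbers is ast.literal_eval with a fallback to the raw string. The sketch below assumes that approach and reads from CSV text rather than a file path so it stays self-contained; load_settings_sketch and parse_setting_value are hypothetical names, and spacr's own parsing rules may differ.

```python
import ast
import csv
import io


def parse_setting_value(text):
    """Parse a cell as a Python literal; keep it as a string if that fails."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def load_settings_sketch(csv_text, setting_key="setting_key",
                         setting_value="setting_value"):
    """Build a dict from two named CSV columns (illustrative only)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[setting_key]: parse_setting_value(row[setting_value])
            for row in reader}


SAMPLE_CSV = """setting_key,setting_value
channels,"[0, 1, 2]"
plot,True
mask,None
threshold,0.5
name,experiment_1
"""

settings = load_settings_sketch(SAMPLE_CSV)
```

Bare words such as experiment_1 are not valid literals, so they fall through and remain strings.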
- spacr.utils.print_progress(files_processed, files_to_process, n_jobs, time_ls=None, batch_size=None, operation_type='')[source]
- spacr.utils.is_multiprocessing_process(process)[source]
Check if the process is a multiprocessing process.
- spacr.utils.mask_object_count(mask)[source]
Counts the number of objects in a given mask.
- Parameters:
mask (numpy.ndarray) – The mask containing object labels.
- Returns:
The number of objects in the mask.
- Return type:
int
- spacr.utils.normalize_to_dtype(array, p1=2, p2=98, percentile_list=None, new_dtype=None)[source]
Normalize each image in the stack to its own percentiles.
- Parameters:
array (numpy array) – The input stack to be normalized.
p1 (int, optional) – The lower percentile value for normalization. Default is 2.
p2 (int, optional) – The upper percentile value for normalization. Default is 98.
percentile_list (list, optional) – A list of pre-calculated percentiles for each image in the stack. Default is None.
- Returns:
new_stack, the normalized stack with the same shape as the input stack.
- Return type:
numpy array
- spacr.utils.annotate_conditions(df, cells=None, cell_loc=None, pathogens=None, pathogen_loc=None, treatments=None, treatment_loc=None)[source]
Annotates conditions in a DataFrame based on specified criteria and combines them into a ‘condition’ column. NaN is used for missing values, and they are excluded from the ‘condition’ column.
- Parameters:
df (pandas.DataFrame) – The DataFrame to annotate.
cells (list/str, optional) – Host cell types. Defaults to None.
cell_loc (list of lists, optional) – Values for each host cell type. Defaults to None.
pathogens (list/str, optional) – Pathogens. Defaults to None.
pathogen_loc (list of lists, optional) – Values for each pathogen. Defaults to None.
treatments (list/str, optional) – Treatments. Defaults to None.
treatment_loc (list of lists, optional) – Values for each treatment. Defaults to None.
- Returns:
Annotated DataFrame with a combined ‘condition’ column.
- Return type:
pandas.DataFrame
- class spacr.utils.Cache(max_size)[source]
A class representing a cache with a maximum size.
- Parameters:
max_size (int) – The maximum size of the cache.
- class spacr.utils.ScaledDotProductAttention(d_k)[source]
Bases:
torch.nn.Module
Base class for all neural network modules.
Your models should also subclass this class.
Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes:
    import torch.nn as nn
    import torch.nn.functional as F

    class Model(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 20, 5)
            self.conv2 = nn.Conv2d(20, 20, 5)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            return F.relu(self.conv2(x))
Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.
Note
As per the example above, an __init__() call to the parent class must be made before assignment on the child.
- Variables:
training (bool) – Boolean represents whether this module is in training or evaluation mode.
- class spacr.utils.SelfAttention(in_channels, d_k)[source]
Bases:
torch.nn.Module
Self-Attention module that applies a scaled dot-product attention mechanism.
- Parameters:
in_channels (int) – Number of input channels.
d_k (int) – Dimensionality of the key and query vectors.
- class spacr.utils.EarlyFusion(in_channels)[source]
Bases:
torch.nn.Module
Early Fusion module for image classification.
- Parameters:
in_channels (int) – Number of input channels.
- class spacr.utils.SpatialAttention(kernel_size=7)[source]
Bases:
torch.nn.Module
- class spacr.utils.MultiScaleBlockWithAttention(in_channels, out_channels)[source]
Bases:
torch.nn.Module
- class spacr.utils.CustomCellClassifier(num_classes, pathogen_channel, use_attention, use_checkpoint, dropout_rate)[source]
Bases:
torch.nn.Module
- class spacr.utils.TorchModel(model_name: str = 'resnet50', pretrained: bool = True, dropout_rate: float | None = None, use_checkpoint: bool = False, num_classes: int = 2, multilabel: bool = False)[source]
Bases:
torch.nn.Module
Thin wrapper around TorchVision classification backbones that:
Loads a requested backbone with (optional) pretrained weights
Strips its classification head to expose features
Adds a simple Linear ‘spacr’ classifier with num_classes outputs
Optionally applies dropout before the final classifier
Supports gradient checkpointing
Works with most TorchVision classification models. Non-classification (detection/segmentation) models are rejected with a clear error.
- class spacr.utils.TorchModel_v2(model_name: str = 'resnet50', pretrained: bool = True, dropout_rate: float = None, use_checkpoint: bool = False, num_classes: int = 2, multilabel: bool = False)[source]
Bases:
torch.nn.Module
- class spacr.utils.FocalLossWithLogits(alpha=1.0, gamma=2.0, reduction='mean')[source]
Bases:
torch.nn.Module
Focal loss that works for:
binary: logits shape (N,) or (N,1); target float (N,) in {0,1}
multiclass (single-label): logits shape (N,C); target long (N,) in [0..C-1]
multilabel: logits shape (N,C); target float (N,C) in {0,1}
- Parameters:
alpha (float or Tensor) – class balancing factor. If float for multiclass, applied uniformly; or provide a 1D tensor of shape (C,).
gamma (float) – focusing parameter.
reduction – ‘mean’ | ‘sum’ | ‘none’
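For the binary case, the usual focal-loss formulation is FL = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The pure-Python sketch below evaluates that formula for one logit; binary_focal_loss is a hypothetical helper, and applying alpha as a single uniform factor is an assumption about how the class uses it.

```python
import math


def binary_focal_loss(logit, target, alpha=1.0, gamma=2.0):
    """Focal loss for a single logit and a target in {0, 1} (illustrative)."""
    p = 1.0 / (1.0 + math.exp(-logit))      # sigmoid turns the logit into P(y=1)
    p_t = p if target == 1 else 1.0 - p     # probability assigned to the true class
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


easy_example = binary_focal_loss(3.0, 1)    # confident and correct
hard_example = binary_focal_loss(-3.0, 1)   # confident and wrong
```

With gamma set to 0 the modulating factor is 1 and the expression reduces to ordinary binary cross-entropy.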
- class spacr.utils.ResNet(resnet_type='resnet50', dropout_rate=None, use_checkpoint=False, init_weights='imagenet')[source]
Bases:
torch.nn.Module
- spacr.utils.split_my_dataset(dataset, split_ratio=0.1)[source]
Splits a dataset into training and validation subsets.
- Parameters:
dataset (torch.utils.data.Dataset) – The dataset to be split.
split_ratio (float, optional) – The ratio of validation samples to total samples. Defaults to 0.1.
- Returns:
A tuple containing the training dataset and validation dataset.
- Return type:
tuple
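The effect of split_ratio can be sketched on plain indices. The helper below is hypothetical and makes two assumptions not stated in the source: the validation size is the truncated product of the sample count and split_ratio, and the split is a random shuffle (here seeded for repeatability).

```python
import random


def split_indices(n_samples, split_ratio=0.1, seed=0):
    """Return (train_indices, val_indices) for a dataset of n_samples items."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # local RNG, global state untouched
    n_val = int(n_samples * split_ratio)      # assumption: truncate toward zero
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(100, split_ratio=0.1)
```

Index lists like these could then be wrapped in torch.utils.data.Subset to obtain the two dataset objects.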
- spacr.utils.classification_metrics(all_labels, prediction_pos_probs, loss, epoch)[source]
Calculate classification metrics for binary classification.
- Parameters:
all_labels (list) – List of true labels.
prediction_pos_probs (list) – List of predicted positive probabilities.
loss (float) – Loss value.
epoch (int) – Epoch number.
- Returns:
data_df, a DataFrame containing the calculated metrics.
- Return type:
DataFrame
- spacr.utils.compute_irm_penalty(losses, dummy_w, device)[source]
Computes the Invariant Risk Minimization (IRM) penalty.
- Parameters:
losses (list) – A list of losses.
dummy_w (torch.Tensor) – A dummy weight tensor.
device (torch.device) – The device to perform computations on.
- Returns:
The computed IRM penalty.
- Return type:
float
- spacr.utils.choose_model(model_type: str, device: torch.device, init_weights: bool = True, dropout_rate: float = 0.0, use_checkpoint: bool = False, channels: int = 3, height: int = 224, width: int = 224, chan_dict: dict[str, Any] | None = None, num_classes: int = 2, verbose: bool = False) torch.nn.Module | None[source]
Pick and configure a model for classification (binary or multiclass).
- Parameters:
model_type – TorchVision model name (e.g. ‘resnet50’, ‘vit_b_16’, ‘swin_t’, ‘maxvit_t’) or ‘custom’
device – Target device (caller will move the returned model)
init_weights – Load pretrained weights if available
dropout_rate – Dropout probability applied before the classifier head (None/0 to disable)
use_checkpoint – Enable gradient checkpointing for the backbone
channels – Input channels (TorchVision pretrained assumes 3; custom handling is up to caller)
height – Nominal input size for a forward sanity-check
width – Nominal input size for a forward sanity-check
chan_dict – Optional dict passed to a custom model (if you implement one)
num_classes – Number of output classes (>=2 => softmax-style head, ==1 => single-logit BCE head)
verbose – If True, print the model structure
- Returns:
nn.Module or None if invalid.
- spacr.utils.choose_model_v2(model_type, device, init_weights=True, dropout_rate=0.0, use_checkpoint=False, channels=3, height=224, width=224, chan_dict=None, num_classes=2, verbose=False)[source]
Pick and configure a model for classification (binary or multiclass).
- Parameters:
model_type (str) – Any torchvision model name or ‘custom’.
device (torch.device or str) – Device string (not used here; caller moves the model).
init_weights (bool) – Load pretrained weights where supported.
dropout_rate (float) – Dropout probability to apply inside the backbone head.
use_checkpoint (bool) – Enable gradient checkpointing (if model supports it).
channels (int) – Input channel count (not used by TorchVision backbones).
height (int) – Nominal input size (not strictly required here).
width (int) – Nominal input size (not strictly required here).
chan_dict (dict|None) – For ‘custom’ models (e.g. pathogen_channel, etc.).
num_classes (int) – Number of output classes (>=2 for multiclass; ==1 for BCE).
- Returns:
nn.Module
- spacr.utils.calculate_loss(output, target, prefer_focal=False, gamma=2.0, alpha=1.0, reduction='mean')[source]
Auto-select loss for binary, multiclass, or multilabel based on shapes/dtypes.
Binary: logits (N,1), float targets in {0,1} -> BCEWithLogits / focal-BCE
Multiclass: logits (N,C), long targets (N,) -> CrossEntropy / focal-CE
Multilabel: logits (N,C), float targets (N,C) -> BCEWithLogits / focal-BCE
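The three shape/dtype rules above can be written as a small dispatch function. This sketch works on shape tuples and a float/long flag instead of real tensors; select_loss and the returned loss names are hypothetical labels, not identifiers from spacr.

```python
def select_loss(logits_shape, target_shape, target_is_float, prefer_focal=False):
    """Map shapes and target dtype to (task, loss_name), following the rules above."""
    n_cols = logits_shape[1] if len(logits_shape) == 2 else 1
    if n_cols == 1 and target_is_float:
        task, base = "binary", "bce_with_logits"
    elif len(target_shape) == 1 and not target_is_float:
        task, base = "multiclass", "cross_entropy"
    elif tuple(target_shape) == tuple(logits_shape) and target_is_float:
        task, base = "multilabel", "bce_with_logits"
    else:
        raise ValueError("unsupported combination of logits and target")
    # prefer_focal swaps in the focal variant of whichever base loss was chosen.
    return task, ("focal_" + base if prefer_focal else base)


binary_case = select_loss((8, 1), (8,), target_is_float=True)
multiclass_case = select_loss((8, 3), (8,), target_is_float=False)
multilabel_case = select_loss((8, 3), (8, 3), target_is_float=True, prefer_focal=True)
```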
- spacr.utils.suggest_training_changes(dst, train_csv=None, val_csv=None, last_k=25, min_epochs=10, gap_threshold_acc=0.05, plateau_eps=0.001, noisy_var_ratio=0.03)[source]
Analyze saved training/validation progress CSVs and propose concrete training changes.
- Parameters:
dst (str) – Folder where progress CSVs were saved.
train_csv (str|None) – Optional explicit path to train CSV. Autodetected if None.
val_csv (str|None) – Optional explicit path to val CSV. Autodetected if None.
last_k (int) – How many recent epochs to use for trend/plateau checks.
min_epochs (int) – Minimum epochs before issuing most suggestions.
gap_threshold_acc (float) – Accuracy generalization gap threshold (train - val).
plateau_eps (float) – Absolute slope threshold (|d loss / d epoch|) to call a plateau.
noisy_var_ratio (float) – If stdev(val_loss_last_k) > noisy_var_ratio * mean(val_loss_last_k), flag instability.
- Returns:
summary: dict of key scalars (best_epoch, best_val_loss, final metrics, slopes, gaps)
flags: list of short machine-readable flags
suggestions: list of concrete, ordered suggestions (strings)
- Return type:
dict with keys
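The plateau and instability checks described by plateau_eps and noisy_var_ratio can be sketched as follows. This is a minimal illustration under stated assumptions: the slope is an ordinary least-squares fit over the last last_k validation losses, the standard deviation is the sample standard deviation, and the flag strings are hypothetical.

```python
import statistics


def least_squares_slope(values):
    """Slope of the best-fit line through (0, v0), (1, v1), ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def trend_flags(val_losses, last_k=25, plateau_eps=0.001, noisy_var_ratio=0.03):
    """Return (slope, flags) for the most recent epochs; needs at least 2 values."""
    recent = val_losses[-last_k:]
    slope = least_squares_slope(recent)
    flags = []
    if abs(slope) < plateau_eps:
        flags.append("plateau")            # hypothetical flag name
    if statistics.stdev(recent) > noisy_var_ratio * statistics.mean(recent):
        flags.append("noisy_val_loss")     # hypothetical flag name
    return slope, flags


flat_slope, flat_flags = trend_flags([1.0] * 30)
falling_slope, falling_flags = trend_flags([1.0 - 0.01 * i for i in range(30)])
```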
- spacr.utils.suggest_training_changes_v1(dst, train_csv=None, val_csv=None, last_k=25, min_epochs=10, gap_threshold_acc=0.05, plateau_eps=0.001, noisy_var_ratio=0.03)[source]
Analyze saved training/validation progress CSVs and propose concrete training changes.
- Parameters:
dst (str) – Folder where progress CSVs were saved.
train_csv (str|None) – Optional explicit path to train CSV. Autodetected if None.
val_csv (str|None) – Optional explicit path to val CSV. Autodetected if None.
last_k (int) – How many recent epochs to use for trend/plateau checks.
min_epochs (int) – Minimum epochs before issuing most suggestions.
gap_threshold_acc (float) – Accuracy generalization gap threshold (train - val).
plateau_eps (float) – Absolute slope threshold (|d loss / d epoch|) to call a plateau.
noisy_var_ratio (float) – If stdev(val_loss_last_k) > noisy_var_ratio * mean(val_loss_last_k), flag instability.
- Returns:
summary: dict of key scalars (best_epoch, best_val_loss, final metrics, slopes, gaps)
flags: list of short machine-readable flags
suggestions: list of concrete, ordered suggestions (strings)
- Return type:
dict with keys
- spacr.utils.estimate_class_counts_v1(loader, num_classes: int) torch.Tensor[source]
One cheap pass on CPU over labels to get global class counts.
- spacr.utils.estimate_class_counts(loader, num_classes: int, src=None, classes=None) torch.Tensor[source]
Get per-class sample counts.
If src and classes are provided, counts files in the class folders directly — no image loading, no DataLoader iteration. This avoids stalls on slow filesystems (NAS) and potential deadlocks with persistent_workers.
Falls back to iterating the DataLoader only if folder info is missing.
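Counting files per class folder needs no image decoding. The sketch below assumes a layout of one sub-folder per class directly under src and returns a plain list instead of a torch.Tensor; count_files_per_class is a hypothetical helper, not the library function.

```python
import os
import tempfile


def count_files_per_class(src, classes):
    """Count regular files in src/<class_name> for each class, in order."""
    counts = []
    for class_name in classes:
        class_dir = os.path.join(src, class_name)
        # Directory entries are inspected by metadata only; nothing is opened.
        n_files = sum(1 for entry in os.scandir(class_dir) if entry.is_file())
        counts.append(n_files)
    return counts


# Build a throwaway folder tree with 3 files in "nc" and 2 in "pc".
with tempfile.TemporaryDirectory() as src:
    for class_name, n_files in (("nc", 3), ("pc", 2)):
        os.makedirs(os.path.join(src, class_name))
        for i in range(n_files):
            open(os.path.join(src, class_name, f"img_{i}.png"), "w").close()
    demo_counts = count_files_per_class(src, ["nc", "pc"])
```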
- spacr.utils.build_loss(loss_type: str = 'ce', num_classes: int = 2, class_counts: torch.Tensor | None = None, label_smoothing: float = 0.0, focal_gamma: float = 2.0, focal_alpha: float | None = None, logit_adjust_tau: float = 0.0, asl_gamma_pos: float = 0.0, asl_gamma_neg: float = 4.0, asl_clip: float = 0.05)[source]
Returns a closure loss_fn(logits, target). Python 3.9+ compatible. Supported loss_type:
‘ce’, ‘ce_smooth’, ‘ce_weighted’, ‘focal_ce’, ‘bce’, ‘focal_bce’, ‘logit_adjust_ce’, ‘asl’, ‘auto’
Notes
num_classes==1 -> binary (BCE variants)
num_classes>=2 -> multiclass (CE variants)
- spacr.utils.check_multicollinearity(x)[source]
Checks multicollinearity of the predictors by computing the VIF.
- spacr.utils.resize_images_and_labels(images, labels, target_height, target_width, show_example=True)[source]
- spacr.utils.compute_segmentation_ap(true_masks, pred_masks, iou_thresholds=np.linspace(0.5, 0.95, 10))[source]
- spacr.utils.merge_touching_objects(mask, threshold=0.25)[source]
Merges touching objects in a binary mask based on the percentage of their shared boundary.
- Parameters:
mask (ndarray) – Binary mask representing objects.
threshold (float, optional) – Threshold value for merging objects. Defaults to 0.25.
- Returns:
Merged mask.
- Return type:
ndarray
- spacr.utils.remove_intensity_objects(image, mask, intensity_threshold, mode)[source]
Removes objects from the mask based on their mean intensity in the original image.
- Parameters:
image (ndarray) – The original image.
mask (ndarray) – The mask containing labeled objects.
intensity_threshold (float) – The threshold value for mean intensity.
mode (str) – The mode for intensity comparison. Can be ‘low’ or ‘high’.
- Returns:
The updated mask with objects removed.
- Return type:
ndarray
- spacr.utils.preprocess_image(image_path, normalize=True, image_size=224, channels=[1, 2, 3])[source]
- spacr.utils.class_visualization(target_y, model_path, dtype, img_size=224, channels=[0, 1, 2], l2_reg=0.001, learning_rate=25, num_iterations=100, blur_every=10, max_jitter=16, show_every=25, class_names=['nc', 'pc'])[source]
- spacr.utils.reduction_and_clustering(numeric_data, n_neighbors, min_dist, metric, eps, min_samples, clustering, reduction_method='umap', verbose=False, embedding=None, n_jobs=-1, mode='fit', model=False)[source]
Perform dimensionality reduction and clustering on the given data.
- Parameters:
numeric_data (np.ndarray) – Numeric data for embedding and clustering.
n_neighbors (int or float) – Number of neighbors for UMAP or perplexity for t-SNE.
min_dist (float) – Minimum distance for UMAP.
metric (str) – Metric for UMAP and DBSCAN.
eps (float) – Epsilon for DBSCAN.
min_samples (int) – Minimum samples for DBSCAN or number of clusters for KMeans.
clustering (str) – Clustering method (‘DBSCAN’ or ‘KMeans’).
reduction_method (str) – Dimensionality reduction method (‘UMAP’ or ‘tSNE’).
verbose (bool) – Whether to print verbose output.
embedding (np.ndarray, optional) – Precomputed embedding. Default is None.
return_model (bool) – Whether to return the reducer model. Default is False.
- Returns:
embedding, labels (and optionally the reducer model)
- Return type:
tuple
- spacr.utils.plot_embedding(embedding, image_paths, labels, image_nr, img_zoom, colors, plot_by_cluster, plot_outlines, plot_points, plot_images, smooth_lines, black_background, figuresize, dot_size, remove_image_canvas, verbose)[source]
- spacr.utils.plot_clusters(ax, embedding, labels, colors, cluster_centers, plot_outlines, plot_points, smooth_lines, figuresize=10, dot_size=50, verbose=False)[source]
- spacr.utils.plot_umap_images(ax, image_paths, embedding, labels, image_nr, img_zoom, colors, plot_by_cluster, remove_image_canvas, verbose)[source]
- spacr.utils.plot_images_by_cluster(ax, image_paths, embedding, labels, image_nr, img_zoom, colors, cluster_indices, remove_image_canvas, verbose)[source]
- spacr.utils.plot_clusters_grid(embedding, labels, image_nr, image_paths, colors, figuresize, black_background, verbose)[source]
- spacr.utils.preprocess_data(df, filter_by, remove_highly_correlated, log_data, exclude, column_list=False)[source]
Preprocesses the given dataframe by applying filtering, removing highly correlated columns, applying log transformation, filling NaN values, and scaling the numeric data.
- Parameters:
df (pandas.DataFrame) – The input dataframe.
filter_by (str or None) – The channel of interest to filter the dataframe by.
remove_highly_correlated (bool or float) – Whether to remove highly correlated columns. If a float is provided, it represents the correlation threshold.
log_data (bool) – Whether to apply log transformation to the numeric data.
exclude (list or None) – List of features to exclude from the filtering process.
verbose (bool) – Whether to print verbose output during preprocessing.
- Returns:
The preprocessed numeric data.
- Return type:
numpy.ndarray
- Raises:
ValueError – If no numeric columns are available after filtering.
- spacr.utils.remove_low_variance_columns(df, threshold=0.01, verbose=False)[source]
Removes columns from the dataframe that have low variance.
- Parameters:
df (pandas.DataFrame) – The DataFrame containing the data.
threshold (float) – The variance threshold below which columns will be removed.
- Returns:
The DataFrame with low variance columns removed.
- Return type:
pandas.DataFrame
Removes columns from the dataframe that are highly correlated with one another.
- Parameters:
df (pandas.DataFrame) – The DataFrame containing the data.
threshold (float) – The correlation threshold above which columns will be removed.
- Returns:
The DataFrame with highly correlated columns removed.
- Return type:
pandas.DataFrame
- spacr.utils.filter_dataframe_features(df, channel_of_interest, exclude=None, remove_low_variance_features=True, remove_highly_correlated_features=True, verbose=False)[source]
Filter the dataframe df based on the specified channel_of_interest and exclude parameters.
- Parameters:
df (pandas.DataFrame) – The input dataframe to be filtered.
channel_of_interest (str, int, list, None) – The channel(s) of interest to filter the dataframe. If None, no filtering is applied. If ‘morphology’, only morphology features are included. If an integer, only the specified channel is included. If a list, only the specified channels are included. If a string, only the specified channel is included.
exclude (str, list, None) – The feature(s) to exclude from the filtered dataframe. If None, no features are excluded. If a string, the specified feature is excluded. If a list, the specified features are excluded.
- Returns:
filtered_df (pandas.DataFrame): The filtered dataframe based on the specified parameters.
features (list): The list of selected features after filtering.
- spacr.utils.find_non_overlapping_position(x, y, image_positions, threshold, max_attempts=100)[source]
- spacr.utils.search_reduction_and_clustering(numeric_data, n_neighbors, min_dist, metric, eps, min_samples, clustering, reduction_method, verbose, reduction_param=None, embedding=None, n_jobs=-1)[source]
Perform dimensionality reduction and clustering on the given data.
- Parameters:
numeric_data (np.array) – Numeric data to process.
n_neighbors (int) – Number of neighbors for UMAP or perplexity for tSNE.
min_dist (float) – Minimum distance for UMAP.
metric (str) – Metric for UMAP, tSNE, and DBSCAN.
eps (float) – Epsilon for DBSCAN clustering.
min_samples (int) – Minimum samples for DBSCAN or number of clusters for KMeans.
clustering (str) – Clustering method (‘DBSCAN’ or ‘KMeans’).
reduction_method (str) – Dimensionality reduction method (‘UMAP’ or ‘tSNE’).
verbose (bool) – Whether to print verbose output.
reduction_param (dict) – Additional parameters for the reduction method.
embedding (np.array) – Precomputed embedding (optional).
n_jobs (int) – Number of parallel jobs to run.
- Returns:
embedding (np.array): Embedding of the data.
labels (np.array): Cluster labels.
- spacr.utils.extract_features(image_paths, resnet=resnet50)[source]
Extract features from images using a pre-trained ResNet model.
- spacr.utils.check_normality(series)[source]
Helper function to check if a feature is normally distributed.
- spacr.utils.random_forest_feature_importance(all_df, cluster_col='cluster')[source]
Random Forest feature importance.
- spacr.utils.perform_statistical_tests(all_df, cluster_col='cluster')[source]
Perform ANOVA or Kruskal-Wallis tests depending on normality of features.
- spacr.utils.combine_results(rf_df, anova_df, kruskal_df)[source]
Combine the results into a single DataFrame.
- spacr.utils.cluster_feature_analysis(all_df, cluster_col='cluster')[source]
Perform Random Forest feature importance, ANOVA for normally distributed features, and Kruskal-Wallis for non-normally distributed features. Combine results into a single DataFrame.
- spacr.utils.process_mask_file_adjust_cell(file_name, parasite_folder, cell_folder, nuclei_folder, organelle_folder=None, overlap_threshold=5, perimeter_threshold=30)[source]
- spacr.utils.adjust_cell_masks(parasite_folder, cell_folder, nuclei_folder, organelle_folder=None, overlap_threshold=5, perimeter_threshold=30, n_jobs=None)[source]
- spacr.utils.process_mask_file_adjust_cell_v1(file_name, parasite_folder, cell_folder, nuclei_folder, organelle_folder, overlap_threshold, perimeter_threshold)[source]
- spacr.utils.adjust_cell_masks_v1(parasite_folder, cell_folder, nuclei_folder, organelle_folder, overlap_threshold=5, perimeter_threshold=30, n_jobs=None)[source]
- spacr.utils.process_masks(mask_folder, image_folder, channel, batch_size=50, n_clusters=2, plot=False)[source]
- spacr.utils.merge_regression_res_with_metadata(results_file, metadata_file, name='_metadata')[source]
- spacr.utils.augment_image(image)[source]
Perform data augmentation by rotating and reflecting the image.
- Parameters:
image (PIL Image or numpy array) – The input image.
- Returns:
augmented_images, a list of augmented images.
- Return type:
list
- spacr.utils.augment_dataset(dataset, is_grayscale=False)[source]
Perform data augmentation on the entire dataset by rotating and reflecting the images.
- Parameters:
dataset (list of tuples) – The input dataset, each entry is a tuple (image, label, filename).
is_grayscale (bool) – Flag indicating if the images are grayscale.
- Returns:
augmented_dataset, a dataset with augmented (image, label, filename) tuples.
- Return type:
list of tuples
- spacr.utils.convert_and_relabel_masks(folder_path)[source]
Converts all int64 npy masks in a folder to uint16 with relabeling to ensure all labels are retained.
- Parameters:
folder_path (str) – The path to the folder containing int64 npy mask files.
- Returns:
None
- spacr.utils.download_models(repo_id='einarolafsson/models', retries=5, delay=5)[source]
Downloads all model files from Hugging Face and stores them in the resources/models directory within the installed spacr package.
- Parameters:
repo_id (str) – The repository ID on Hugging Face (default is ‘einarolafsson/models’).
retries (int) – Number of retry attempts in case of failure.
delay (int) – Delay in seconds between retries.
- Returns:
The local path to the downloaded models.
- Return type:
str
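The retries and delay parameters suggest a retry loop around the download step. The generic sketch below shows one such loop; call_with_retries and flaky_download are hypothetical, the simulated failure stands in for a network error, and nothing here calls Hugging Face.

```python
import time


def call_with_retries(action, retries=5, delay=5):
    """Call action() up to `retries` times, sleeping `delay` seconds between failures."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return action()
        except Exception as error:  # broad on purpose: any failure triggers a retry
            last_error = error
            if attempt < retries:
                time.sleep(delay)
    raise RuntimeError(f"all {retries} attempts failed") from last_error


attempts = {"n": 0}


def flaky_download():
    # Fails twice, then returns a path, to exercise the retry loop.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("simulated network failure")
    return "resources/models"


result = call_with_retries(flaky_download, retries=5, delay=0)
```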
- spacr.utils.generate_cytoplasm_mask(nucleus_mask, cell_mask)[source]
Generates a cytoplasm mask from nucleus and cell masks.
- Parameters:
nucleus_mask (np.array) – Binary or segmented mask of the nucleus (non-zero values represent nucleus).
cell_mask (np.array) – Binary or segmented mask of the whole cell (non-zero values represent cell).
- Returns:
cytoplasm_mask, a mask for the cytoplasm (1 for cytoplasm, 0 for nucleus and pathogens).
- Return type:
np.array
- spacr.utils.add_column_to_database(settings)[source]
Adds a new column to the database table by matching on a common column from the DataFrame. If the column already exists in the database, it adds the column with a suffix. NaN values will remain as NULL in the database.
- Parameters:
settings (dict) – A dictionary containing the following keys:
csv_path (str): Path to the CSV file with the data to be added.
db_path (str): Path to the SQLite database (or connection string for other databases).
table_name (str): The name of the table in the database.
update_column (str): The name of the new column in the DataFrame to add to the database.
match_column (str): The common column used to match rows.
- Returns:
None
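The add-then-match pattern can be sketched with the standard sqlite3 module. In this illustration a list of dicts stands in for the CSV rows, add_column is a hypothetical helper, and the numeric suffix format used when the column already exists is an assumption. Table and column names are interpolated into the SQL, so they must come from trusted input.

```python
import sqlite3


def add_column(conn, table_name, update_column, match_column, rows):
    """Add update_column to table_name and fill it by matching on match_column."""
    existing = [info[1] for info in conn.execute(f'PRAGMA table_info("{table_name}")')]
    target = update_column
    suffix = 1
    while target in existing:                 # assumed suffix scheme: name_1, name_2, ...
        target = f"{update_column}_{suffix}"
        suffix += 1
    conn.execute(f'ALTER TABLE "{table_name}" ADD COLUMN "{target}"')
    conn.executemany(
        f'UPDATE "{table_name}" SET "{target}" = ? WHERE "{match_column}" = ?',
        [(row[update_column], row[match_column]) for row in rows],
    )
    conn.commit()
    return target


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cells (object_id TEXT, area REAL)")
conn.executemany("INSERT INTO cells VALUES (?, ?)", [("a", 1.0), ("b", 2.0), ("c", 3.0)])
new_name = add_column(
    conn, "cells", "score", "object_id",
    [{"object_id": "a", "score": 0.9}, {"object_id": "b", "score": None}],
)
table = conn.execute("SELECT object_id, score FROM cells ORDER BY object_id").fetchall()
```

Rows without a match, and rows whose value is None, keep NULL in the new column.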
- spacr.utils.fill_holes_in_mask(mask)[source]
Fill holes in each object in the mask while keeping objects separated.
- Parameters:
mask (np.ndarray) – A labeled mask where each object has a unique integer value.
- Returns:
A mask with holes filled and original labels preserved.
- Return type:
np.ndarray
- spacr.utils.group_feature_class(df, feature_groups=['cell', 'cytoplasm', 'nucleus', 'pathogen'], name='compartment')[source]
- spacr.utils.filter_and_save_csv(input_csv, output_csv, column_name, upper_threshold, lower_threshold)[source]
Reads a CSV into a DataFrame, filters rows based on a column for values > upper_threshold and < lower_threshold, and saves the filtered DataFrame to a new CSV file.
- Parameters:
input_csv (str) – Path to the input CSV file.
output_csv (str) – Path to save the filtered CSV file.
column_name (str) – Column name to apply the filters on.
upper_threshold (float) – Upper threshold for filtering (values greater than this are retained).
lower_threshold (float) – Lower threshold for filtering (values less than this are retained).
- Returns:
None
- spacr.utils.extract_tar_bz2_files(folder_path)[source]
Extracts all .tar.bz2 files in the given folder into subfolders with the same name as the tar file.
- Parameters:
folder_path (str) – Path to the folder containing .tar.bz2 files.
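A minimal sketch of this behaviour with the standard tarfile module follows. The helper name extract_all is hypothetical, and naming each sub-folder by stripping the .tar.bz2 suffix is an assumption about what "the same name as the tar file" means. Archives from untrusted sources should be inspected before extraction.

```python
import os
import tarfile
import tempfile

SUFFIX = ".tar.bz2"


def extract_all(folder_path):
    """Extract every *.tar.bz2 in folder_path into a sibling sub-folder."""
    extracted = []
    for name in sorted(os.listdir(folder_path)):
        if not name.endswith(SUFFIX):
            continue
        target_dir = os.path.join(folder_path, name[: -len(SUFFIX)])
        os.makedirs(target_dir, exist_ok=True)
        with tarfile.open(os.path.join(folder_path, name), "r:bz2") as archive:
            archive.extractall(target_dir)
        extracted.append(target_dir)
    return extracted


# Round trip: pack one file into batch1.tar.bz2, extract it, read it back.
with tempfile.TemporaryDirectory() as folder:
    payload = os.path.join(folder, "note.txt")
    with open(payload, "w") as handle:
        handle.write("hello")
    with tarfile.open(os.path.join(folder, "batch1.tar.bz2"), "w:bz2") as archive:
        archive.add(payload, arcname="note.txt")
    extract_all(folder)
    with open(os.path.join(folder, "batch1", "note.txt")) as handle:
        extracted_text = handle.read()
```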
- spacr.utils.calculate_shortest_distance(df, object1, object2)[source]
Calculate the shortest edge-to-edge distance between two objects (e.g., pathogen and nucleus).
- Parameters:
df (pandas.DataFrame) – DataFrame containing measurements.
object1 (str) – Name of the first object (e.g., “pathogen”).
object2 (str) – Name of the second object (e.g., “nucleus”).
- Returns:
df with a new column for shortest edge-to-edge distance.
- Return type:
pandas.DataFrame
- spacr.utils.format_path_for_system(path)[source]
Takes a file path and reformats it to be compatible with the current operating system.
- Parameters:
path (str) – The file path to be formatted.
- Returns:
The formatted path for the current operating system.
- Return type:
str
- spacr.utils.normalize_src_path(src)[source]
Ensures that the ‘src’ value is properly formatted as either a list of strings or a single string.
- Parameters:
src (str or list) – The input source path(s).
- Returns:
A correctly formatted list if the input was a list (or string representation of a list), otherwise a single string.
- Return type:
list or str
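A string representation of a list can be recovered with ast.literal_eval. The sketch below assumes that approach; normalize_src is a hypothetical stand-in, and the real function may apply additional path clean-up.

```python
import ast


def normalize_src(src):
    """Return a list of strings for list-like input, otherwise a single string."""
    if isinstance(src, list):
        return [str(item) for item in src]
    text = str(src).strip()
    if text.startswith("[") and text.endswith("]"):
        try:
            parsed = ast.literal_eval(text)
        except (ValueError, SyntaxError):
            return text                       # not a valid literal: keep the string
        if isinstance(parsed, list):
            return [str(item) for item in parsed]
    return text


from_text = normalize_src("['/data/plate1', '/data/plate2']")
from_path = normalize_src("/data/plate1")
```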
- spacr.utils.generate_image_path_map(root_folder, valid_extensions=('tif', 'tiff', 'png', 'jpg', 'jpeg', 'bmp', 'czi', 'nd2', 'lif'))[source]
Recursively scans a folder and its subfolders for images, then creates a mapping of: {original_image_path: new_image_path}, where the new path includes all subfolder names.
- Parameters:
root_folder (str) – The root directory to scan for images.
valid_extensions (tuple) – Tuple of valid image file extensions.
- Returns:
A dictionary mapping original image paths to their new paths.
- Return type:
dict
- spacr.utils.copy_images_to_consolidated(image_path_map, root_folder)[source]
Copies images from their original locations to a ‘consolidated’ folder, renaming them according to the generated dictionary.
- Parameters:
image_path_map (dict) – Dictionary mapping {original_path: new_path}.
root_folder (str) – The root directory where the ‘consolidated’ folder will be created.
- spacr.utils.remove_outliers_by_group(df, group_col, value_col, method='iqr', threshold=1.5)[source]
Removes outliers from value_col within each group defined by group_col.
- Parameters:
df (pd.DataFrame) – The input DataFrame.
group_col (str) – Column name to group by.
value_col (str) – Column containing values to check for outliers.
method (str) – ‘iqr’ or ‘zscore’.
threshold (float) – Threshold multiplier for IQR (default 1.5) or z-score.
- Returns:
A DataFrame with outliers removed.
- Return type:
pd.DataFrame
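The ‘iqr’ method can be sketched without pandas, using a list of dicts in place of DataFrame rows. This is an illustration only: remove_outliers_iqr is a hypothetical helper, the bounds are assumed to be inclusive, and quartiles use the statistics module's inclusive method, which may differ slightly from the quantile rule spacr applies.

```python
import statistics
from collections import defaultdict


def remove_outliers_iqr(rows, group_col, value_col, threshold=1.5):
    """Keep rows whose value lies within [Q1 - t*IQR, Q3 + t*IQR] of their group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])

    bounds = {}
    for key, values in groups.items():
        if len(values) < 2:
            continue  # too few points for quartiles; such groups are kept as is
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        bounds[key] = (q1 - threshold * iqr, q3 + threshold * iqr)

    kept = []
    for row in rows:
        low, high = bounds.get(row[group_col], (float("-inf"), float("inf")))
        if low <= row[value_col] <= high:
            kept.append(row)
    return kept


rows = [{"well": "a", "area": v} for v in (1, 2, 3, 4, 100)]
rows += [{"well": "b", "area": v} for v in (10, 11, 12, 13)]
kept_values = [row["area"] for row in remove_outliers_iqr(rows, "well", "area")]
```

Because the bounds are computed per group, the value 100 is dropped from well a while the larger values in well b are untouched.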