locpix.preprocessing.datastruc.item

class item(name, df, dim, channels, channel_label, histo={}, histo_edges=None, histo_mask=None, bin_sizes=None, gt_label_map={})

Bases: object

SMLM datastructure.

This is the basic datastructure, which contains all the information needed from a point set.

name

Contains the name of the item

Type:

string

df

Dataframe containing the data, with columns ‘channel’, ‘frame’, ‘x’, ‘y’, ‘z’. If manual annotation is done, an additional column ‘gt_label’ will be present

Type:

polars dataframe

dim

Dimensions of data

Type:

int

channels

List of ints representing the channels the user wants to consider in the original data

Type:

list

channel_label

The label for each channel, i.e. [‘egfr’, ‘ereg’, ‘unk’] means channel 0 is egfr protein, channel 1 is ereg protein and channel 2 is unknown

Type:

list of strings

histo

Dictionary of 2D or 3D arrays. Each key corresponds to the channel for which the histogram contains the relevant binned data, in form [X,Y,Z], i.e. histo[1] = histogram of channel 1 localisations. Note that to treat the histogram as an image, it must be transposed to follow image conventions.

Type:

dict
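
As a rough illustration of the transposition note above, the sketch below uses plain nested lists rather than the arrays locpix stores. The names `histo_xy` and `to_image` are hypothetical and are not part of the package.

```python
def to_image(histo_xy):
    """Transpose a 2D histogram indexed [x][y] into an image indexed [row][col] = [y][x]."""
    return [list(row) for row in zip(*histo_xy)]


# Hypothetical histogram with 3 bins along x and 2 bins along y
histo_xy = [
    [1, 2],
    [3, 4],
    [5, 6],
]

image = to_image(histo_xy)
# image has 2 rows (y) and 3 columns (x); image[y][x] == histo_xy[x][y]
```

For a 2D NumPy array the equivalent operation is `histo.T`.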

histo_edges

Tuple containing (x_edges, y_edges) or (x_edges, y_edges, z_edges), where x/y/z_edges are lists of floats, each representing the edge of a bin in the original space. e.g. ([0,10,20],[13,25,20],[2,3,4])

Type:

tuple of lists; each list contains floats

histo_mask

Array of integers in the form [X,Y,Z], where each integer represents a different label of the MANUAL segmentation. 0 is reserved for background.

Type:

numpy array

bin_sizes

Size of bins of the histogram e.g. (23.2, 34.5, 21.3)

Type:

tuple of floats

gt_label_map

Dictionary with integer keys representing the gt labels for each localisation, with each value being a string representing the real concept e.g. 0:‘dog’, 1:‘cat’

Type:

dict

Initialises item

__init__(name, df, dim, channels, channel_label, histo={}, histo_edges=None, histo_mask=None, bin_sizes=None, gt_label_map={})

Initialises item

Methods

__init__(name, df, dim, channels, channel_label)

Initialises item

chan_2_label(chan)

Returns the label associated with the channel specified

coord_2_histo(histo_size)

Converts localisations into histogram of desired size, with option to plot the image (histo.T).

label_2_chan(label)

Returns the channel associated with the channel label specified

load_from_parquet(input_file)

Loads item saved as .parquet file

manual_segment([cmap, relabel, markers_loc])

Manually segment the image (histogram.T).

mask_pixel_2_coord(img_mask)

For a given mask over the image (value at each pixel normally representing a label), return the dataframe with a column giving the value for each localisation.

render_histo([labels])

Render the histogram from the .parquet file

render_seg()

Render the segmentation of the histogram

save_df_to_csv(csv_loc[, drop_zero_label, ...])

Save the dataframe to a .csv, with options to drop background positions, drop the pixel column and add a channel-label column.

save_to_parquet(save_folder[, ...])

Save the dataframe to a parquet, with options to drop background positions and drop the pixel column.

chan_2_label(chan)

Returns the label associated with the channel specified

Parameters:

chan (int) – Integer representing the channel

coord_2_histo(histo_size)

Converts localisations into histogram of desired size, with option to plot the image (histo.T). Note the interpolation is only applied for visualisation, not for the actual data in the histogram!

Parameters:

histo_size (tuple) – Tuple representing number of bins/pixels in x,y,z
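
To make the relationship between `histo_size`, `histo_edges` and `bin_sizes` concrete, here is a minimal 2D sketch in plain Python. It does not call locpix. The helper names are hypothetical, and it assumes equally sized bins spanning the minimum to maximum coordinate along each axis, which may differ from what the library actually does.

```python
def bin_edges(lo, hi, n_bins):
    """Return n_bins + 1 equally spaced edges spanning [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def bin_index(value, lo, hi, n_bins):
    """Index of the bin holding value; the maximum value falls in the last bin."""
    width = (hi - lo) / n_bins
    return min(int((value - lo) / width), n_bins - 1)


def coords_to_histo(xs, ys, histo_size):
    """Count localisations into a histogram indexed [x][y].

    Assumes the coordinates are not all identical along either axis.
    """
    nx, ny = histo_size
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    histo = [[0] * ny for _ in range(nx)]
    for x, y in zip(xs, ys):
        histo[bin_index(x, x_lo, x_hi, nx)][bin_index(y, y_lo, y_hi, ny)] += 1
    edges = (bin_edges(x_lo, x_hi, nx), bin_edges(y_lo, y_hi, ny))
    bin_sizes = ((x_hi - x_lo) / nx, (y_hi - y_lo) / ny)
    return histo, edges, bin_sizes


# Four hypothetical localisations binned into 2 x 2 pixels
xs = [0.0, 5.0, 10.0, 20.0]
ys = [0.0, 1.0, 9.0, 10.0]
histo, histo_edges, bin_sizes = coords_to_histo(xs, ys, (2, 2))
```

Here the two localisations with small x and y land in bin [0][0] and the other two in bin [1][1], with bin sizes of 10.0 along x and 5.0 along y.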

label_2_chan(label)

Returns the channel associated with the channel label specified

Parameters:

label (string) – String representing the label you want to find the channel for
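
The two lookups chan_2_label and label_2_chan can be pictured as inverse operations on the `channel_label` list. The sketch below is an illustration only, assuming (as the `channel_label` description suggests) that the position of a label in the list is its channel number. `lookup_label` and `lookup_chan` are hypothetical stand-ins, not the library's code.

```python
# Example labels taken from the channel_label description
channel_label = ["egfr", "ereg", "unk"]


def lookup_label(chan):
    """Return the label stored at position `chan`."""
    return channel_label[chan]


def lookup_chan(label):
    """Return the channel whose label matches `label`."""
    return channel_label.index(label)


label = lookup_label(1)
chan = lookup_chan("unk")
```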

load_from_parquet(input_file)

Loads item saved as .parquet file

Parameters:

input_file (string) – Location of the .parquet file to load dataitem from

manual_segment(cmap=['green', 'red', 'blue', 'bop purple'], relabel=False, markers_loc=None)

Manually segment the image (histogram.T). Return the segmented histogram and extra column in dataframe corresponding to label. 0 should be reserved for background

Parameters:

cmap (list of strings) – Colourmaps napari uses to plot the histograms

Returns:

Coordinates of markers if added

Return type:

markers (list)

mask_pixel_2_coord(img_mask: ndarray) → DataFrame

For a given mask over the image (value at each pixel normally representing a label), return the dataframe with a column giving the value for each localisation. Note that img_mask is assumed to be a mask of the image, so it has to be transposed to be in the same configuration as the histogram.

Note we also use this for labels and when the img_mask represents probabilities.

Parameters:
  • img_mask (np.ndarray) – Mask over the image. To reiterate: to convert this to histogram space, need to transpose it

Returns:

Original dataframe with additional column with the predicted label

Return type:

df (polars dataframe)
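
A minimal sketch of the idea, using plain dictionaries and nested lists instead of a polars dataframe and a NumPy array. It assumes each localisation already carries `x_pixel` and `y_pixel` values. The function `mask_to_labels` and the output column name `pred_label` are hypothetical.

```python
def mask_to_labels(localisations, img_mask):
    """Attach to each localisation the mask value found at its pixel.

    img_mask follows image conventions, i.e. img_mask[row][col] = img_mask[y][x],
    so the lookup swaps the order relative to histogram space [x][y].
    """
    labelled = []
    for loc in localisations:
        value = img_mask[loc["y_pixel"]][loc["x_pixel"]]
        labelled.append({**loc, "pred_label": value})
    return labelled


# Hypothetical mask with 2 rows (y) and 3 columns (x)
img_mask = [
    [0, 0, 1],
    [2, 0, 1],
]
localisations = [
    {"x_pixel": 2, "y_pixel": 0},
    {"x_pixel": 0, "y_pixel": 1},
    {"x_pixel": 1, "y_pixel": 1},
]

labelled = mask_to_labels(localisations, img_mask)
```

The same lookup works when the mask holds probabilities rather than integer labels, since the value is copied across unchanged.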

render_histo(labels=None)

Render the histogram from the .parquet file

If labels are specified then the histogram is rendered in the order of these labels. If not specified, defaults to rendering the channels in the order specified by the user e.g. [0,3,1,2]. Assumes localisations have associated x_pixel and y_pixel already.

Parameters:

labels (list) – Order of labels to stack histograms in e.g. labels=[‘egfr’,‘ereg’] means all images will be returned with egfr in channel 0 and ereg in channel 1

Returns:
  • histo (np.histogram) – Histogram of the localisation data

  • channel_map (list) – List where the first value is the channel in the first axis of the histogram, second value is the channel in the second axis of the histogram etc. e.g. [1,3] : channel 1 is in 1st axis, channel 3 in 2nd axis

  • label_map (list) – List where the first value is the label in the first axis of the histogram, second value is the label in the second axis of the histogram etc. e.g. [‘egfr’,‘ereg’] : egfr is in 1st axis, ereg in 2nd axis
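
The ordering described for render_histo can be sketched as follows. This is an illustration under assumptions, not the library's implementation: `order_by_labels` is a hypothetical helper, the per-channel histograms are small nested lists, and a label's position in `channel_label` is taken to be its channel number.

```python
def order_by_labels(histo, channel_label, labels):
    """Stack per-channel histograms in the order given by `labels`.

    Returns the stacked histograms together with the channel and label
    found at each position along the first axis.
    """
    channel_map = [channel_label.index(label) for label in labels]
    stacked = [histo[chan] for chan in channel_map]
    return stacked, channel_map, list(labels)


channel_label = ["egfr", "ereg", "unk"]
# Hypothetical 2 x 2 histograms keyed by channel
histo = {
    0: [[1, 0], [0, 0]],
    1: [[0, 2], [0, 0]],
    2: [[0, 0], [3, 0]],
}

stacked, channel_map, label_map = order_by_labels(histo, channel_label, ["ereg", "egfr"])
```

With `labels=["ereg", "egfr"]`, the ereg histogram comes first in the stack and egfr second, and the unrequested channel is left out.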

render_seg()

Render the segmentation of the histogram

save_df_to_csv(csv_loc, drop_zero_label=False, drop_pixel_col=True, save_chan_label=True)

Save the dataframe to a .csv with option to:

  • drop positions which are background

  • drop the column containing pixel information

  • save additional column with labels for each localisation

Parameters:
  • csv_loc (String) – Save the csv to this location

  • drop_zero_label (bool) – If True then only non zero label positions are saved to csv

  • drop_pixel_col (bool) – If True then don’t save the column with x,y,z pixel

  • save_chan_label (bool) – If True then save an additional column for each localisation containing the label for each channel

Returns:

None
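
To show how the three flags might interact, here is a sketch that uses the standard-library `csv` module on a list of dictionaries instead of a polars dataframe. It writes to a string rather than to `csv_loc`. The function `rows_to_csv`, the `_pixel` column suffix test and the output column name `chan_label` are assumptions made for illustration.

```python
import csv
import io


def rows_to_csv(rows, channel_label, drop_zero_label=False,
                drop_pixel_col=True, save_chan_label=True):
    """Return CSV text for `rows`, applying the three options."""
    out = []
    for row in rows:
        # Skip background localisations (label 0) if requested
        if drop_zero_label and row["gt_label"] == 0:
            continue
        row = dict(row)
        if drop_pixel_col:
            row = {k: v for k, v in row.items() if not k.endswith("_pixel")}
        if save_chan_label:
            row["chan_label"] = channel_label[row["channel"]]
        out.append(row)

    buffer = io.StringIO()
    if out:
        writer = csv.DictWriter(buffer, fieldnames=list(out[0].keys()),
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(out)
    return buffer.getvalue()


# Two hypothetical localisations; the first is background
rows = [
    {"channel": 0, "x": 1.5, "y": 2.5, "x_pixel": 0, "y_pixel": 1, "gt_label": 0},
    {"channel": 1, "x": 7.0, "y": 3.0, "x_pixel": 3, "y_pixel": 1, "gt_label": 1},
]
csv_text = rows_to_csv(rows, ["egfr", "ereg"], drop_zero_label=True)
```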

save_to_parquet(save_folder, drop_zero_label=False, drop_pixel_col=False, gt_label_map=None, overwrite=False)

Save the dataframe to a parquet with option to drop positions which are background and to drop the column containing pixel information

Parameters:
  • save_folder (String) – Save the df to this folder

  • drop_zero_label (bool) – If True then only non zero label positions are saved to parquet

  • drop_pixel_col (bool) – If True then don’t save the column with x,y,z pixel

  • gt_label_map (dict) – Dictionary with integer keys representing the gt labels for each localisation, with each value being a string representing the real concept e.g. 0:‘dog’, 1:‘cat’

  • overwrite (bool) – Whether to overwrite an existing file

Returns:

None