locpix.preprocessing.datastruc.item#
- class item(name, df, dim, channels, channel_label, histo={}, histo_edges=None, histo_mask=None, bin_sizes=None, gt_label_map={})[source]#
Bases:
object
SMLM datastructure.
This is the basic datastructure, which contains all the information needed from a point set.
- name#
Contains the name of the item
- Type:
string
- df#
Dataframe containing the data, with columns ‘channel’, ‘frame’, ‘x’, ‘y’, ‘z’. If manual annotation has been done, an additional column ‘gt_label’ is present.
- Type:
polars dataframe
- dim#
Dimensions of data
- Type:
int
- channels#
List of ints representing the channels the user wants to consider in the original data
- Type:
list
- channel_label#
The label for each channel, i.e. [‘egfr’, ‘ereg’, ‘unk’] means channel 0 is the egfr protein, channel 1 is the ereg protein and channel 2 is unknown
- Type:
list of strings
- histo#
Dictionary of 2D or 3D arrays. Each key is a channel and each value is the histogram holding that channel's binned data, in the form [X,Y,Z], i.e. histo[1] = histogram of channel 1 localisations. Note that to treat a histogram as an image, it must be transposed to follow image conventions.
- Type:
dict
- histo_edges#
Tuple containing (x_edges, y_edges) or (x_edges, y_edges, z_edges), where x/y/z_edges are lists of floats, each representing the edge of a bin in the original space, e.g. ([0,10,20],[13,25,20],[2,3,4])
- Type:
tuple of lists; each list contains floats
- histo_mask#
Array of integers where each value represents a different label of the MANUAL segmentation. 0 is reserved for background. The array is of the form [X,Y,Z].
- Type:
numpy array
- bin_sizes#
Size of bins of the histogram e.g. (23.2, 34.5, 21.3)
- Type:
tuple of floats
- gt_label_map#
Dictionary with integer keys representing the gt labels for each localisation with value being a string, representing the real concept e.g. 0:’dog’, 1:’cat’
- Type:
dict
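To make the relationship between these attributes concrete, the following is a minimal sketch of mutually consistent example values. It does not construct a locpix item; the variable names simply mirror the attributes above, and the specific numbers are invented for illustration. It assumes each histogram has one bin fewer than the number of edges along every axis, as is usual for binned data.

```python
import numpy as np

# Illustrative values only; none of these come from real data.
x_edges = [0.0, 10.0, 20.0, 30.0]
y_edges = [0.0, 10.0, 20.0]
histo_edges = (x_edges, y_edges)

# Number of bins along each axis is one fewer than the number of edges.
n_x, n_y = len(x_edges) - 1, len(y_edges) - 1

channels = [0, 1]
channel_label = ["egfr", "ereg"]

# One histogram per channel, indexed [X, Y].
histo = {chan: np.zeros((n_x, n_y)) for chan in channels}

# Manual segmentation mask in the same [X, Y] layout; 0 is background.
histo_mask = np.zeros((n_x, n_y), dtype=int)
histo_mask[1, 0] = 1

bin_sizes = (x_edges[1] - x_edges[0], y_edges[1] - y_edges[0])
gt_label_map = {0: "dog", 1: "cat"}
```

In this layout the mask and every per-channel histogram share one shape, so a mask value can be read off at the same [X, Y] index as a histogram bin.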
- __init__(name, df, dim, channels, channel_label, histo={}, histo_edges=None, histo_mask=None, bin_sizes=None, gt_label_map={})[source]#
Initialises item
Methods
__init__(name, df, dim, channels, channel_label) – Initialises item
chan_2_label(chan) – Returns the label associated with the channel specified
coord_2_histo(histo_size) – Converts localisations into histogram of desired size, with option to plot the image (histo.T).
label_2_chan(label) – Returns the channel associated with the channel label specified
load_from_parquet(input_file) – Loads item saved as .parquet file
manual_segment([cmap, relabel, markers_loc]) – Manually segment the image (histogram.T).
mask_pixel_2_coord(img_mask) – For a given mask over the image (value at each pixel normally representing a label), return the dataframe with a column giving the value for each localisation.
render_histo([labels]) – Render the histogram from the .parquet file
Render the segmentation of the histogram
save_df_to_csv(csv_loc[, drop_zero_label, ...]) – Save the dataframe to a .csv with options to drop positions which are background, drop the column containing pixel information and save an additional column with labels for each localisation
save_to_parquet(save_folder[, ...]) – Save the dataframe to a parquet with options to drop positions which are background and drop the column containing pixel information
- chan_2_label(chan)[source]#
Returns the label associated with the channel specified
- Parameters:
chan (int) – Integer representing the channel
- coord_2_histo(histo_size)[source]#
Converts localisations into histogram of desired size, with option to plot the image (histo.T). Note the interpolation is only applied for visualisation, not for the actual data in the histogram!
- Parameters:
histo_size (tuple) – Tuple representing number of bins/pixels in x,y,z
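As an illustration of the binning described here, the sketch below uses NumPy's `histogram2d` on a handful of made-up 2D localisations. It is not the locpix implementation; in particular, the explicit `range` argument and the example coordinates are assumptions made so the result is easy to follow.

```python
import numpy as np

# Made-up localisation coordinates in the original space.
x = np.array([5.0, 15.0, 25.0, 35.0])
y = np.array([2.0, 2.0, 18.0, 18.0])

histo_size = (4, 2)  # number of bins in x and y

# Bin the localisations; the result is indexed [X, Y].
histo, x_edges, y_edges = np.histogram2d(
    x, y, bins=histo_size, range=[[0.0, 40.0], [0.0, 20.0]]
)

# Bin size along each axis, taken from the spacing of the edges.
bin_sizes = (float(x_edges[1] - x_edges[0]), float(y_edges[1] - y_edges[0]))

# Image conventions put y along rows and x along columns, hence the transpose.
image = histo.T
```

The histogram keeps the raw counts; any interpolation applied when displaying `image` would change only how it looks, not the values stored in `histo`.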
- label_2_chan(label)[source]#
Returns the channel associated with the channel label specified
- Parameters:
label (string) – String representing the label you want to find the channel for
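The two lookups, chan_2_label and label_2_chan, are inverses of each other. A minimal sketch follows, assuming that channel_label is indexed by the integer channel as in the channel_label attribute description; the standalone functions are hypothetical stand-ins and not the library's code.

```python
# Example labels taken from the channel_label attribute description.
channel_label = ["egfr", "ereg", "unk"]


def chan_to_label_sketch(chan, channel_label):
    """Hypothetical stand-in: return the label for an integer channel."""
    return channel_label[chan]


def label_to_chan_sketch(label, channel_label):
    """Hypothetical stand-in: return the integer channel for a label."""
    return channel_label.index(label)


label = chan_to_label_sketch(1, channel_label)
chan = label_to_chan_sketch("unk", channel_label)
```

Under this assumption, looking up a channel's label and then that label's channel returns the original channel.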
- load_from_parquet(input_file)[source]#
Loads item saved as .parquet file
- Parameters:
input_file (string) – Location of the .parquet file to load dataitem from
- manual_segment(cmap=['green', 'red', 'blue', 'bop purple'], relabel=False, markers_loc=None)[source]#
Manually segment the image (histogram.T). Return the segmented histogram and an extra column in the dataframe corresponding to the label. 0 should be reserved for background.
- Parameters:
cmap (list of strings) – Colourmaps napari uses to plot the histograms
- Returns:
Coordinates of markers if added
- Return type:
markers (list)
- mask_pixel_2_coord(img_mask: ndarray) → DataFrame [source]#
For a given mask over the image (value at each pixel normally representing a label), return the dataframe with a column giving the value for each localisation. Note that img_mask is assumed to be a mask of the image, so it has to be transposed to be in the same configuration as the histogram.
Note that this is also used for labels and when img_mask represents probabilities.
- Parameters:
img_mask (np.ndarray) – Mask over the image. To reiterate, to convert this to histogram space it needs to be transposed.
- Returns:
Original dataframe with additional column with the predicted label
- Return type:
df (polars dataframe)
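The transpose step can be sketched with plain NumPy indexing. This is an illustration only, not the library's implementation: it assumes each localisation already carries integer x_pixel and y_pixel bin indices, and the mask values are invented.

```python
import numpy as np

# Image-space mask: rows are y, columns are x, so the shape is (n_y, n_x).
img_mask = np.array(
    [
        [0, 0, 1, 1],
        [0, 2, 2, 1],
    ]
)

# Transpose into histogram space so the mask is indexed [X, Y].
histo_mask = img_mask.T

# Pixel (bin) indices of four made-up localisations.
x_pixel = np.array([0, 1, 2, 3])
y_pixel = np.array([0, 1, 1, 0])

# Look up the mask value at each localisation's bin.
labels = histo_mask[x_pixel, y_pixel]
```

The same lookup works unchanged when the mask holds probabilities rather than integer labels, since only the indexing matters.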
- render_histo(labels=None)[source]#
Render the histogram from the .parquet file.
If labels are specified, the histogram is rendered in the order of these labels. If not specified, it defaults to rendering the channels in the order specified by the user, e.g. [0,3,1,2]. Assumes localisations already have associated x_pixel and y_pixel values.
- Parameters:
labels (list) – Order of labels to stack histograms in e.g. labels=[‘egfr’,’ereg’] means all images will be returned with egfr in channel 0 and ereg in channel 1
- Returns:
Histogram of the localisation data
channel_map (list) – List where the first value is the channel in the first axis of the histogram, the second value is the channel in the second axis of the histogram, etc. e.g. [1,3] : channel 1 is in the 1st axis, channel 3 in the 2nd axis
label_map (list) – List where the first value is the label in the first axis of the histogram, the second value is the label in the second axis of the histogram, etc. e.g. [‘egfr’,’ereg’] : egfr is in the 1st axis, ereg in the 2nd axis
- Return type:
histo (np.histogram)
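The ordering behaviour can be sketched as follows. This is a hypothetical illustration rather than the library's code: it assumes the per-channel histograms are stacked along a new leading axis and that channel_label is indexed by integer channel.

```python
import numpy as np

channel_label = ["egfr", "ereg", "unk"]

# Made-up per-channel histograms, each indexed [X, Y].
histo = {
    0: np.zeros((4, 3)),
    1: np.ones((4, 3)),
    2: np.full((4, 3), 2.0),
}

# Requested order of labels.
labels = ["ereg", "egfr"]

# Channel for each requested label, in the requested order.
channel_map = [channel_label.index(label) for label in labels]
label_map = list(labels)

# Stack the histograms in that order along a new leading axis (assumption).
stacked = np.stack([histo[chan] for chan in channel_map])
```

Here the ereg histogram ends up first and the egfr histogram second, and channel_map and label_map record that order for later use.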
- save_df_to_csv(csv_loc, drop_zero_label=False, drop_pixel_col=True, save_chan_label=True)[source]#
Save the dataframe to a .csv with options to drop positions which are background, drop the column containing pixel information, and save an additional column with labels for each localisation.
- Parameters:
csv_loc (String) – Save the csv to this location
drop_zero_label (bool) – If True then only non zero label positions are saved to csv
drop_pixel_col (bool) – If True then don’t save the column with x,y,z pixel
save_chan_label (bool) – If True then save an additional column for each localisation containing the label for each channel
- Returns:
None
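The effect of the three flags can be sketched with the standard library alone. This is not the locpix implementation: the helper `rows_to_csv_sketch`, the output column name `chan_label` and the example rows are all hypothetical, and the text is written to an in-memory string instead of csv_loc.

```python
import csv
import io

# Made-up localisations; x_pixel and y_pixel hold the pixel information.
rows = [
    {"channel": 0, "x": 1.5, "y": 2.5, "x_pixel": 0, "y_pixel": 0, "gt_label": 0},
    {"channel": 1, "x": 7.0, "y": 3.0, "x_pixel": 1, "y_pixel": 0, "gt_label": 1},
]
channel_label = ["egfr", "ereg"]


def rows_to_csv_sketch(rows, drop_zero_label=False, drop_pixel_col=True,
                       save_chan_label=True):
    """Hypothetical helper mirroring the documented flags."""
    kept = []
    for row in rows:
        # Skip background localisations when requested.
        if drop_zero_label and row["gt_label"] == 0:
            continue
        row = dict(row)
        # Drop the pixel columns when requested.
        if drop_pixel_col:
            row = {k: v for k, v in row.items() if not k.endswith("_pixel")}
        # Add the label for the localisation's channel when requested.
        if save_chan_label:
            row["chan_label"] = channel_label[row["channel"]]
        kept.append(row)
    buffer = io.StringIO()
    if kept:
        writer = csv.DictWriter(buffer, fieldnames=list(kept[0].keys()))
        writer.writeheader()
        writer.writerows(kept)
    return buffer.getvalue()


text = rows_to_csv_sketch(rows, drop_zero_label=True)
```

With drop_zero_label=True only the second localisation survives, and the default flags remove the pixel columns and append the channel label.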
- save_to_parquet(save_folder, drop_zero_label=False, drop_pixel_col=False, gt_label_map=None, overwrite=False)[source]#
Save the dataframe to a parquet with options to drop positions which are background and to drop the column containing pixel information.
- Parameters:
save_folder (String) – Save the df to this folder
drop_zero_label (bool) – If True then only non zero label positions are saved to parquet
drop_pixel_col (bool) – If True then don’t save the column with x,y,z pixel
gt_label_map (dict) – Dictionary with integer keys representing the gt labels for each localisation with value being a string, representing the real concept e.g. 0:’dog’, 1:’cat’
overwrite (bool) – Whether to overwrite
- Returns:
None