Using ez_zarr to explore Fractal outputs

Author

Silvia Barbiero, Michael Stadler

Goal

The aim of ez_zarr is to provide easy, high-level access to high-content screening microscopy data stored as OME-Zarr filesets according to the NGFF specifications, with additional metadata fields such as the ones generated by the Fractal platform.

The goal is that users can write simple scripts working with plates, wells and fields of view, without having to understand how these are represented within an OME-Zarr fileset.

Installation

ez_zarr is hosted on GitHub at github.com/fmicompbio/ez_zarr and its documentation can be found at fmicompbio.github.io/ez_zarr.

To install it, activate an existing conda environment or create a new one (using Python 3.9 or newer), and install ez_zarr from GitHub using:

#| eval: False
pip install git+ssh://git@github.com/fmicompbio/ez_zarr.git

Quickstart

Here are some examples of how you can use ez_zarr:

Load packages

# import numpy as np
# import anndata as ad
import matplotlib.pyplot as plt
import matplotlib.patches as patches

from ez_zarr import hcs_wrappers

Open OME-Zarr file set

We open a single OME-Zarr directory, typically representing a single plate:

zarr_directory = './231202-ExM-d0-A04-rbtKDEL.zarr'
plate = hcs_wrappers.FractalZarr(zarr_directory, name = 'ExM-d0')
plate
FractalZarr ExM-d0
  path: ./231202-ExM-d0-A04-rbtKDEL.zarr
  n_wells: 1
  n_channels: 4 (DAPI, 488, A03_C03, FFFF00)
  n_pyramid_levels: 5
  pyramid_zyx_scalefactor: [1. 2. 2.]
  full_resolution_zyx_spacing: [30.0, 0.325, 0.325]
  segmentations: 
  tables (measurements): FOV_ROI_table, well_ROI_table

plate represents an acquisition of one well with four channels.

There are 5 pyramid levels available (0 to 4), meaning that in addition to the full-resolution data (level 0), there are four more levels providing the data at successively 2-fold lower resolution in y and x (see pyramid_zyx_scalefactor), for example for faster plotting.

This OME-Zarr fileset does not contain any segmentations, but it does contain some tables, such as FOV_ROI_table, which contains the coordinates of the fields of view.

Get information from plate

You can obtain information on the content available in the fileset:

# path to the OME-Zarr fileset
plate.get_path()
'./231202-ExM-d0-A04-rbtKDEL.zarr'
# available channels
plate.get_channels()
[{'active': True,
  'coefficient': 1,
  'color': '00FFFF',
  'inverted': False,
  'label': 'DAPI',
  'wavelength_id': 'A01_C01',
  'window': {'end': 6000, 'max': 65535, 'min': 0, 'start': 110}},
 {'active': True,
  'coefficient': 1,
  'color': '008000',
  'inverted': False,
  'label': '488',
  'wavelength_id': 'A02_C02'},
 {'active': True,
  'coefficient': 1,
  'color': 'FF0000',
  'inverted': False,
  'label': 'A03_C03',
  'wavelength_id': 'A03_C03'},
 {'active': True,
  'coefficient': 1,
  'color': 'FFFF00',
  'inverted': False,
  'label': 'FFFF00',
  'wavelength_id': 'A04_C04'}]
# available wells
plate.get_wells()
plate.get_wells(simplify=True)
['A04']
# available tables
plate.get_table_names()
['FOV_ROI_table', 'well_ROI_table']
# zyx pixel spacing in micrometers for pyramid levels
plate.level_zyx_spacing
[[30.0, 0.325, 0.325],
 [30.0, 0.65, 0.65],
 [30.0, 1.3, 1.3],
 [30.0, 2.6, 2.6],
 [30.0, 5.2, 5.2]]
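
The per-level spacings follow directly from the full-resolution spacing and the pyramid scale factor reported in the plate summary; a minimal numpy sketch to verify this (values copied from the output above):

import numpy as np

# full-resolution (level 0) zyx spacing in micrometers and the per-level
# zyx scale factor, as reported in the plate summary above
full_res_spacing = np.array([30.0, 0.325, 0.325])
scalefactor = np.array([1.0, 2.0, 2.0])

# spacing at pyramid level n = level-0 spacing * scalefactor**n
for level in range(5):
    print(level, full_res_spacing * scalefactor**level)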

Extract a table from the OME-Zarr fileset

The tables are stored within the fileset as described in the Fractal documentation.

As mentioned, the goal of ez_zarr is to abstract away the internal structure and make it simple to obtain these tables, if necessary accumulated over many wells, as a pandas.DataFrame:

df = plate.get_table(table_name='FOV_ROI_table')
df
well x_micrometer y_micrometer z_micrometer len_x_micrometer len_y_micrometer len_z_micrometer x_micrometer_original y_micrometer_original
FieldIndex
FOV_6 A04 0.0 0.000000 0.0 650.0 650.0 120.0 4928.600098 -1653.300049
FOV_7 A04 650.0 0.000000 0.0 650.0 650.0 120.0 5578.600098 -1653.300049
FOV_10 A04 0.0 650.000061 0.0 650.0 650.0 120.0 4928.600098 -1003.299988
FOV_11 A04 650.0 650.000061 0.0 650.0 650.0 120.0 5578.600098 -1003.299988

or as an anndata.AnnData object:

ann = plate.get_table(table_name='FOV_ROI_table', as_AnnData=True)
ann
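
Since the result is a regular pandas.DataFrame, the usual pandas operations apply. As a small illustration (a sketch using only the columns shown above), the following computes the micrometer extent covered by all fields of view:

df = plate.get_table(table_name='FOV_ROI_table')

# overall extent of the imaged fields of view in micrometers
x_max = (df['x_micrometer'] + df['len_x_micrometer']).max()
y_max = (df['y_micrometer'] + df['len_y_micrometer']).max()
print('FOVs cover x: [' + str(df['x_micrometer'].min()) + ', ' + str(x_max) + '] micrometer')
print('FOVs cover y: [' + str(df['y_micrometer'].min()) + ', ' + str(y_max) + '] micrometer')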

Get an overview of a full well

get_image_ROI() extracts a rectangular region of interest (ROI) from a well image by coordinates. If no coordinates or other arguments are given, it returns the whole well at the lowest available resolution:

img = plate.get_image_ROI()
print(img.shape) # (ch, z, y, x)

with plt.style.context('dark_background'):
    fig = plt.figure(figsize=(4, 4))
    fig.set_dpi(100)

    plt.imshow(img[0,2], cmap='gray', vmin=100, vmax=200)
    plt.title(plate.name + ' A04')
    plt.show()
    plt.close()
(4, 4, 250, 250)
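
Above, a single z plane was plotted (img[0, 2]). Since img has dimensions (ch, z, y, x), other views are just as easy to derive; for example, a sketch of a maximum-intensity projection of channel 0 over z:

# maximum-intensity projection of channel 0 across the z axis
img_mip = img[0].max(axis=0)

with plt.style.context('dark_background'):
    fig = plt.figure(figsize=(4, 4))
    fig.set_dpi(100)

    plt.imshow(img_mip, cmap='gray', vmin=100, vmax=200)
    plt.title(plate.name + ' A04 (MIP over z)')
    plt.show()
    plt.close()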

Get salient regions at higher resolution

get_image_grid_ROIs() allows you to split the well image into a rectangular grid of num_x-by-num_y regions, and automatically select the num_select top regions based on a given property (sample_method).

Here, we pick three regions with the highest signal in channel zero (DAPI):

coord_list, img_list = plate.get_image_grid_ROIs(well = 'A04',
                                                 pyramid_level=2,
                                                 num_x=10, num_y=10, num_select=3,
                                                 sample_method='sum',
                                                 channel=0)
coord_list # (y_start, y_end, x_start, x_end)
len(img_list)

with plt.style.context('dark_background'):
    fig = plt.figure(figsize=(9,3))
    fig.set_dpi(100)
    for i in range(len(img_list)):
        plt.subplot(1, 3, i + 1)
        plt.imshow(img_list[i][0,0], cmap='gray', vmin=100, vmax=200)
        plt.axis('off')
        plt.title('A04 ' + str(coord_list[i]))
    plt.show()
    plt.close()
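
To see where the selected regions lie within the well, their corner coordinates can be converted to the pyramid level of the whole-well overview (level 4 here) and outlined with matplotlib.patches. This is a sketch that assumes the (y_start, y_end, x_start, x_end) ordering of coord_list noted above:

# whole-well overview at the lowest resolution (pyramid level 4)
img_overview = plate.get_image_ROI(well='A04')

with plt.style.context('dark_background'):
    fig = plt.figure(figsize=(4, 4))
    fig.set_dpi(100)

    plt.imshow(img_overview[0, 2], cmap='gray', vmin=100, vmax=200)
    for (y_start, y_end, x_start, x_end) in coord_list:
        # convert the region corners from pyramid level 2 to level 4 pixels
        y0, x0 = plate.convert_pixel_to_pixel(zyx=(0, y_start, x_start),
                                              pyramid_level_from=2,
                                              pyramid_level_to=4)[1:]
        y1, x1 = plate.convert_pixel_to_pixel(zyx=(0, y_end, x_end),
                                              pyramid_level_from=2,
                                              pyramid_level_to=4)[1:]
        rect = patches.Rectangle((x0, y0), x1 - x0, y1 - y0,
                                 edgecolor='red', facecolor='none')
        plt.gca().add_patch(rect)
    plt.title(plate.name + ' A04 (selected regions)')
    plt.show()
    plt.close()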

Zoom in

Alternatively, we can zoom in by coordinates:

img = plate.get_image_ROI(well='A04',
                          upper_left_yx=(90, 50),
                          lower_right_yx=(230, 180))
print(img.shape)

with plt.style.context('dark_background'):
    fig = plt.figure(figsize=(4, 4))
    fig.set_dpi(100)

    plt.imshow(img[0,0], cmap='gray', vmin=100, vmax=200)
    plt.title(plate.name + ' A04')
    plt.show()
    plt.close()
(4, 4, 141, 131)

The above automatically uses the lowest resolution (highest pyramid level) version of the data, and the pixel coordinates refer to that space.

If we want to get the same image at a higher resolution (pyramid_level=2), we need to convert the coordinates using the convert_pixel_to_pixel() function:

# convert coordinates from the lowest-resolution pyramid level (None, here: 4)
#   to pyramid level 2
# remark: coordinates are (z,y,x)
new_upper_left = plate.convert_pixel_to_pixel(zyx = (0, 90, 50),
                                              pyramid_level_from = None,
                                              pyramid_level_to=2)
new_lower_right = plate.convert_pixel_to_pixel(zyx = (0, 230, 180),
                                               pyramid_level_from = None,
                                               pyramid_level_to=2)

# extract image
img = plate.get_image_ROI(well='A04',
                          upper_left_yx=new_upper_left[1:],
                          lower_right_yx=new_lower_right[1:],
                          pyramid_level=2)
print(img.shape)

with plt.style.context('dark_background'):
    fig = plt.figure(figsize=(4, 4))
    fig.set_dpi(100)

    plt.imshow(img[0,0], cmap='gray', vmin=100, vmax=200)
    plt.title(plate.name + ' A04')
    plt.show()
    plt.close()
(4, 4, 561, 521)

Coordinate conversions

As mentioned, pixel coordinates depend on the pyramid_level, but since the scaling factor between pyramid levels and the pixel sizes in micrometers are known, it is easy to convert between them:

zyx = (2, 230, 180) # highest pyramid level (4)
print('starting coordinates (pyramid_level=4): ' + str(zyx))

# convert from pixel to micrometers
zyx_um = plate.convert_pixel_to_micrometer(zyx, pyramid_level=4)
print('in micrometer: ' + str(zyx_um))

# convert from micrometers to pixels
zyx_px = plate.convert_micrometer_to_pixel(zyx_um, pyramid_level=4)
print('back in pixel: ' + str(zyx_px))

# convert pixels between pyramid levels
zyx_px_0 = plate.convert_pixel_to_pixel(zyx_px, pyramid_level_from=4, pyramid_level_to=0)
print('in pixel (pyramid_level=0): ' + str(zyx_px_0))
starting coordinates (pyramid_level=4): (2, 230, 180)
in micrometer: (60.0, 1196.0, 936.0)
back in pixel: (2, 230, 180)
in pixel (pyramid_level=0): (2, 3680, 2880)
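
The same conversion functions can be combined with get_image_ROI() to extract a region specified in micrometers rather than pixels. A sketch with arbitrary example coordinates (the micrometer values are made up for illustration):

# hypothetical region of interest given in micrometers (z, y, x)
ul_um = (0.0, 300.0, 150.0)  # upper-left corner
lr_um = (0.0, 700.0, 550.0)  # lower-right corner

# convert to pixel coordinates at pyramid level 2
ul_px = plate.convert_micrometer_to_pixel(ul_um, pyramid_level=2)
lr_px = plate.convert_micrometer_to_pixel(lr_um, pyramid_level=2)

# extract the corresponding image region
img_um = plate.get_image_ROI(well='A04',
                             upper_left_yx=ul_px[1:],
                             lower_right_yx=lr_px[1:],
                             pyramid_level=2)
print(img_um.shape)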

This makes it quite easy to add a scalebar to an image:

# calculate the pixel length of a scalebar
scalebar_micrometer = 100
scalebar_pixels = plate.convert_micrometer_to_pixel(zyx = (0, 0, scalebar_micrometer),
                                                    pyramid_level=2)[2]
print(str(scalebar_micrometer) + ' µm correspond to ' + str(scalebar_pixels) + ' pixels in x')

with plt.style.context('dark_background'):
    fig = plt.figure(figsize=(4, 4))
    fig.set_dpi(100)

    plt.imshow(img[0,0], cmap='gray', vmin=100, vmax=200)
    plt.title(plate.name + ' A04')
    # add scalebar
    # patches.Rectangle((x, y), width, height, edgecolor, facecolor)
    rect = patches.Rectangle((30, 30), scalebar_pixels, 6, edgecolor='white', facecolor='white')
    ax = plt.gca() # get current axes
    ax.add_patch(rect) # add the patch to the Axes
    # plt.text(x, y, 'Your text', color, ha='center')
    plt.text(30 + scalebar_pixels / 2, 45, str(scalebar_micrometer) + ' µm', color='white', ha='center', va='top')
    # show image
    plt.show()
    plt.close()
100 µm correspond to 77 pixels in x
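
If scalebars are needed frequently, the same logic can be wrapped into a small helper; the function below is a hypothetical convenience wrapper (not part of ez_zarr), building on convert_micrometer_to_pixel():

def add_scalebar(ax, plate, length_micrometer, pyramid_level,
                 xy=(30, 30), height=6, color='white'):
    """Draw a scalebar of length_micrometer onto ax (hypothetical helper)."""
    # length of the scalebar in pixels at the given pyramid level
    length_pixels = plate.convert_micrometer_to_pixel(zyx=(0, 0, length_micrometer),
                                                      pyramid_level=pyramid_level)[2]
    rect = patches.Rectangle(xy, length_pixels, height, edgecolor=color, facecolor=color)
    ax.add_patch(rect)
    ax.text(xy[0] + length_pixels / 2, xy[1] + height + 9,
            str(length_micrometer) + ' µm', color=color, ha='center', va='top')

# usage: add a 100 µm scalebar to the zoomed-in image
with plt.style.context('dark_background'):
    fig = plt.figure(figsize=(4, 4))
    fig.set_dpi(100)

    plt.imshow(img[0,0], cmap='gray', vmin=100, vmax=200)
    plt.title(plate.name + ' A04')
    add_scalebar(plt.gca(), plate, 100, pyramid_level=2)
    plt.show()
    plt.close()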

Calculate on the image data

To save memory, images are stored on disk and only loaded when needed (for implementation details see the dask Array documentation).

This can be seen from type(img). To force loading of the data into memory, you can call .compute():

print(type(img)) # note that the image is an 'on-disk' array
print(type(img.compute())) # ... and can be loaded into memory using .compute()
<class 'dask.array.core.Array'>
<class 'numpy.ndarray'>

In general, you can use dask arrays like numpy arrays. For example, we can threshold the image:

# threshold
img_mask = img[0,0]>140
# ... and compute to numpy array
img_mask = img_mask.compute()

# plot 
with plt.style.context('dark_background'):
    fig = plt.figure(figsize=(12, 6))
    fig.set_dpi(100)

    # ... image used for the mask (DAPI channel)
    plt.subplot(1, 4, 1)
    plt.title('Zoom-in image (DAPI)')
    plt.imshow(img[0,0], vmin=100, vmax=200, cmap='gray')
    plt.axis('off')
    
    # ... threshold mask
    plt.subplot(1, 4, 2)
    plt.title('Mask (threshold on DAPI)')
    # define a custom colormap for segmentation
    my_cmap = plt.cm.colors.ListedColormap(['black', 'red'])
    plt.imshow(img_mask, cmap=my_cmap)
    plt.axis('off')

    plt.show()
    plt.close()

This is of course not a recommended way to segment this image, but rather meant to demonstrate that dask arrays can be used just like numpy arrays.
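
As another illustration, summary statistics can be expressed lazily on the dask array and only materialized at the end with a single .compute(); a minimal sketch on the zoomed-in image from above:

# per-channel mean intensity of the first z plane of the zoomed-in region
# (the expression below is lazy; nothing is read from disk until .compute())
mean_per_channel = img[:, 0].mean(axis=(1, 2))
print(mean_per_channel)            # still a (delayed) dask array
print(mean_per_channel.compute())  # materialized as a small numpy array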

Session info

import session_info
session_info.show()
-----
dask                2024.1.0
ez_zarr             0.1.1
matplotlib          3.8.2
pandas              1.5.3
session_info        1.0.0
-----
Modules imported as dependencies:
PIL                         10.2.0
anndata                     0.10.4
anyio                       NA
arrow                       1.3.0
asciitree                   NA
asttokens                   NA
attr                        23.2.0
attrs                       23.2.0
babel                       2.14.0
certifi                     2023.11.17
cffi                        1.16.0
charset_normalizer          3.3.2
click                       8.1.7
cloudpickle                 3.0.0
colorama                    0.4.6
comm                        0.2.1
cycler                      0.12.1
cython_runtime              NA
dateutil                    2.8.2
debugpy                     1.8.0
decorator                   5.1.1
defusedxml                  0.7.1
distributed                 2024.1.0
exceptiongroup              1.2.0
executing                   2.0.1
fasteners                   0.19
fastjsonschema              NA
fqdn                        NA
fsspec                      2023.12.2
gmpy2                       2.1.2
h5py                        3.10.0
idna                        3.6
importlib_metadata          NA
importlib_resources         NA
ipykernel                   6.29.0
isoduration                 NA
jedi                        0.19.1
jinja2                      3.1.2
json5                       NA
jsonpointer                 2.4
jsonschema                  4.21.1
jsonschema_specifications   NA
jupyter_events              0.9.0
jupyter_server              2.12.5
jupyterlab_server           2.25.2
kiwisolver                  1.4.5
locket                      NA
markupsafe                  2.1.3
matplotlib_inline           0.1.6
mpl_toolkits                NA
mpmath                      1.3.0
msgpack                     1.0.7
natsort                     8.4.0
nbformat                    5.9.2
numcodecs                   0.12.1
numpy                       1.26.3
overrides                   NA
packaging                   23.2
parso                       0.8.3
pexpect                     4.9.0
platformdirs                4.1.0
prometheus_client           NA
prompt_toolkit              3.0.43
psutil                      5.9.8
ptyprocess                  0.7.0
pure_eval                   0.2.2
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.9.5
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.17.2
pyparsing                   3.1.1
pythonjsonlogger            NA
pytz                        2023.3.post1
referencing                 NA
requests                    2.31.0
rfc3339_validator           0.1.4
rfc3986_validator           0.1.1
rpds                        NA
scipy                       1.12.0
send2trash                  NA
six                         1.16.0
sniffio                     1.3.0
sortedcontainers            2.4.0
stack_data                  0.6.3
sympy                       1.12
tblib                       3.0.0
tlz                         0.12.1
toolz                       0.12.1
torch                       2.1.2
torchgen                    NA
tornado                     6.4
tqdm                        4.66.1
traitlets                   5.14.1
typing_extensions           NA
uri_template                NA
urllib3                     1.26.18
wcwidth                     0.2.13
webcolors                   1.13
websocket                   1.7.0
yaml                        6.0.1
zarr                        2.16.1
zict                        3.0.0
zipp                        NA
zmq                         25.1.2
zoneinfo                    NA
-----
IPython             8.18.1
jupyter_client      8.6.0
jupyter_core        5.7.1
jupyterlab          4.0.11
notebook            7.0.7
-----
Python 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
Linux-3.10.0-1160.76.1.el7.x86_64-x86_64-with-glibc2.17
-----
Session information updated at 2024-01-25 09:49