# Using ez_zarr to explore Fractal outputs
## Goal

The aim of `ez_zarr` is to provide easy, high-level access to high-content screening microscopy data, stored as OME-Zarr filesets according to the NGFF specifications with additional metadata fields, for example the ones generated by the Fractal platform.

The goal is that users can write simple scripts working with plates, wells and fields of view, without having to understand how these are represented within an OME-Zarr fileset.
## Installation

`ez_zarr` is hosted on GitHub at github.com/fmicompbio/ez_zarr and the documentation can be found at fmicompbio.github.io/ez_zarr.

To install it, activate a `conda` environment, or create a new one (using Python 3.9 or newer), and install `ez_zarr` from GitHub using:

```
pip install git+ssh://git@github.com/fmicompbio/ez_zarr.git
```
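If you do not have an SSH key configured for GitHub, installing over HTTPS should work as well (assuming you have access to the public repository):

```
pip install git+https://github.com/fmicompbio/ez_zarr.git
```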
## Quickstart

Here are some examples of how you can use `ez_zarr`:

### Load packages

```python
# import numpy as np
# import anndata as ad
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from ez_zarr import hcs_wrappers
```

### Open OME-Zarr file set

We open a single OME-Zarr directory, typically representing a single plate:
```python
zarr_directory = './231202-ExM-d0-A04-rbtKDEL.zarr'
plate = hcs_wrappers.FractalZarr(zarr_directory, name = 'ExM-d0')
plate
```

```
FractalZarr ExM-d0
path: ./231202-ExM-d0-A04-rbtKDEL.zarr
n_wells: 1
n_channels: 4 (DAPI, 488, A03_C03, FFFF00)
n_pyramid_levels: 5
pyramid_zyx_scalefactor: [1. 2. 2.]
full_resolution_zyx_spacing: [30.0, 0.325, 0.325]
segmentations:
tables (measurements): FOV_ROI_table, well_ROI_table
```
`plate` represents an acquisition of 1 well with 4 channels. There are 5 pyramid levels available (0 to 4), meaning that in addition to the full-resolution data (level 0), we have four more levels that provide the data at successively 2-fold lower resolutions (see `pyramid_zyx_scalefactor`), for example for faster plotting. This OME-Zarr fileset does not contain any segmentations, but it does contain some tables, such as `FOV_ROI_table`, which contains the coordinates of the fields of view.
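To make the relationship between pyramid levels concrete, here is a small sanity check (not part of the original tutorial) that derives the per-level pixel spacing from the full-resolution spacing and the scale factor reported above:

```python
# derive the zyx pixel spacing of each pyramid level from the metadata above
full_res_spacing = [30.0, 0.325, 0.325]  # full_resolution_zyx_spacing (micrometers)
scalefactor = [1.0, 2.0, 2.0]            # pyramid_zyx_scalefactor

for level in range(5):
    spacing = [s * f**level for s, f in zip(full_res_spacing, scalefactor)]
    print(level, spacing)
# level 4 comes out at approximately [30.0, 5.2, 5.2],
# matching plate.level_zyx_spacing shown further below
```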
### Get information from plate

You can obtain information on the content available in the fileset:

```python
# path to the OME-Zarr fileset
plate.get_path()
```

```
'./231202-ExM-d0-A04-rbtKDEL.zarr'
```
```python
# available channels
plate.get_channels()
```

```
[{'active': True,
  'coefficient': 1,
  'color': '00FFFF',
  'inverted': False,
  'label': 'DAPI',
  'wavelength_id': 'A01_C01',
  'window': {'end': 6000, 'max': 65535, 'min': 0, 'start': 110}},
 {'active': True,
  'coefficient': 1,
  'color': '008000',
  'inverted': False,
  'label': '488',
  'wavelength_id': 'A02_C02'},
 {'active': True,
  'coefficient': 1,
  'color': 'FF0000',
  'inverted': False,
  'label': 'A03_C03',
  'wavelength_id': 'A03_C03'},
 {'active': True,
  'coefficient': 1,
  'color': 'FFFF00',
  'inverted': False,
  'label': 'FFFF00',
  'wavelength_id': 'A04_C04'}]
```
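Several functions below take a numeric `channel` argument, so it can be convenient to look up the index of a channel by its label. Here is a minimal, hypothetical helper (not part of `ez_zarr`), assuming `plate.get_channels()` returns the list of dictionaries shown above:

```python
def channel_index(plate, label):
    """Return the index of the channel with the given label (hypothetical helper)."""
    labels = [ch['label'] for ch in plate.get_channels()]
    return labels.index(label)

channel_index(plate, 'DAPI')  # 0, usable e.g. as channel=0 below
```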
```python
# available wells
plate.get_wells()
plate.get_wells(simplify=True)
```

```
['A04']
```
```python
# available tables
plate.get_table_names()
```

```
['FOV_ROI_table', 'well_ROI_table']
```

```python
# zyx pixel spacing in micrometers for pyramid levels
plate.level_zyx_spacing
```

```
[[30.0, 0.325, 0.325],
 [30.0, 0.65, 0.65],
 [30.0, 1.3, 1.3],
 [30.0, 2.6, 2.6],
 [30.0, 5.2, 5.2]]
```
### Extract a table from the OME-Zarr fileset

The tables are stored within the fileset as described in the Fractal documentation. As mentioned, the goal of `ez_zarr` is to abstract the internal structure and make it simple to obtain these tables, if necessary accumulated over many wells, as a `pandas.DataFrame`:

```python
df = plate.get_table(table_name='FOV_ROI_table')
df
```
| FieldIndex | well | x_micrometer | y_micrometer | z_micrometer | len_x_micrometer | len_y_micrometer | len_z_micrometer | x_micrometer_original | y_micrometer_original |
|---|---|---|---|---|---|---|---|---|---|
| FOV_6 | A04 | 0.0 | 0.000000 | 0.0 | 650.0 | 650.0 | 120.0 | 4928.600098 | -1653.300049 |
| FOV_7 | A04 | 650.0 | 0.000000 | 0.0 | 650.0 | 650.0 | 120.0 | 5578.600098 | -1653.300049 |
| FOV_10 | A04 | 0.0 | 650.000061 | 0.0 | 650.0 | 650.0 | 120.0 | 4928.600098 | -1003.299988 |
| FOV_11 | A04 | 650.0 | 650.000061 | 0.0 | 650.0 | 650.0 | 120.0 | 5578.600098 | -1003.299988 |
or as an `anndata.AnnData` object:

```python
ann = plate.get_table(table_name='FOV_ROI_table')
ann
```
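Once extracted as a `pandas.DataFrame`, the table (`df` above) supports all standard pandas operations. For example (not part of the original tutorial), the area covered by each field of view in square micrometers:

```python
# field-of-view area in square micrometers, computed from the columns shown above
df['area_micrometer2'] = df['len_x_micrometer'] * df['len_y_micrometer']
df[['well', 'area_micrometer2']]
# each 650 x 650 micrometer FOV covers 422500.0 square micrometers
```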
### Get an overview of a full well

`get_image_ROI()` extracts a rectangular region of interest (ROI) from a well image by coordinates. If no coordinates or other arguments are given, it returns the whole well at the lowest available resolution:

```python
img = plate.get_image_ROI()
print(img.shape) # (ch, z, y, x)

with plt.style.context('dark_background'):
    fig = plt.figure(figsize=(4, 4))
    fig.set_dpi(100)
    plt.imshow(img[0,2], cmap='gray', vmin=100, vmax=200)
    plt.title(plate.name + ' A04')
    plt.show()
    plt.close()
```

```
(4, 4, 250, 250)
```
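Since the returned array has axes (ch, z, y, x), individual planes or projections are easy to work with. As a small variation (not from the original tutorial), a maximum-intensity projection over z for the DAPI channel:

```python
# maximum-intensity projection along z for channel 0 (DAPI);
# matplotlib triggers the (lazy) computation when drawing
mip = img[0].max(axis=0)

with plt.style.context('dark_background'):
    fig = plt.figure(figsize=(4, 4))
    fig.set_dpi(100)
    plt.imshow(mip, cmap='gray', vmin=100, vmax=200)
    plt.title(plate.name + ' A04 (z max projection)')
    plt.axis('off')
    plt.show()
    plt.close()
```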
### Get salient regions at higher resolution

`get_image_grid_ROIs()` allows you to split the well image into a rectangular grid of `num_x`-by-`num_y` regions, and automatically select the `num_select` top regions based on a given property (`sample_method`). Here, we pick three regions with the highest signal in channel zero (DAPI):

```python
coord_list, img_list = plate.get_image_grid_ROIs(well = 'A04',
                                                 pyramid_level=2,
                                                 num_x=10, num_y=10, num_select=3,
                                                 sample_method='sum',
                                                 channel=0)
coord_list # (y_start, y_end, x_start, x_end)
len(img_list)
```

```python
with plt.style.context('dark_background'):
    fig = plt.figure(figsize=(9,3))
    fig.set_dpi(100)
    for i in range(len(img_list)):
        plt.subplot(1, 3, i + 1)
        plt.imshow(img_list[i][0,0], cmap='gray', vmin=100, vmax=200)
        plt.axis('off')
        plt.title('A04 ' + str(coord_list[i]))
    plt.show()
    plt.close()
```
### Zoom in

Alternatively, we can zoom in by coordinates:

```python
img = plate.get_image_ROI(well='A04',
                          upper_left_yx=(90, 50),
                          lower_right_yx=(230, 180))
print(img.shape)

with plt.style.context('dark_background'):
    fig = plt.figure(figsize=(4, 4))
    fig.set_dpi(100)
    plt.imshow(img[0,0], cmap='gray', vmin=100, vmax=200)
    plt.title(plate.name + ' A04')
    plt.show()
    plt.close()
```

```
(4, 4, 141, 131)
```
The above automatically uses the lowest-resolution (highest pyramid level) version of the data, and the pixel coordinates refer to that space. If we want to get the same image at a higher resolution (`pyramid_level=2`), we need to convert the coordinates using the `convert_pixel_to_pixel()` function:

```python
# convert coordinates from the lowest-resolution pyramid level (None, here: 4)
# to a higher-resolution one (here: 2)
# remark: coordinates are (z,y,x)
new_upper_left = plate.convert_pixel_to_pixel(zyx = (0, 90, 50),
                                              pyramid_level_from = None,
                                              pyramid_level_to = 2)
new_lower_right = plate.convert_pixel_to_pixel(zyx = (0, 230, 180),
                                               pyramid_level_from = None,
                                               pyramid_level_to = 2)

# extract image
img = plate.get_image_ROI(well='A04',
                          upper_left_yx=new_upper_left[1:],
                          lower_right_yx=new_lower_right[1:],
                          pyramid_level=2)
print(img.shape)

with plt.style.context('dark_background'):
    fig = plt.figure(figsize=(4, 4))
    fig.set_dpi(100)
    plt.imshow(img[0,0], cmap='gray', vmin=100, vmax=200)
    plt.title(plate.name + ' A04')
    plt.show()
    plt.close()
```

```
(4, 4, 561, 521)
```
### Coordinate conversions

As mentioned, the pixel coordinates depend on the `pyramid_level`, but as the scaling factor between pyramid levels and the pixel sizes in micrometers are known, it's easy to convert between them:

```python
zyx = (2, 230, 180) # highest pyramid level (4)
print('starting coordinates (pyramid_level=4): ' + str(zyx))

# convert from pixel to micrometers
zyx_um = plate.convert_pixel_to_micrometer(zyx, pyramid_level=4)
print('in micrometer: ' + str(zyx_um))

# convert from micrometers to pixels
zyx_px = plate.convert_micrometer_to_pixel(zyx_um, pyramid_level=4)
print('back in pixel: ' + str(zyx_px))

# convert pixels between pyramid levels
zyx_px_0 = plate.convert_pixel_to_pixel(zyx_px, pyramid_level_from=4, pyramid_level_to=0)
print('in pixel (pyramid_level=0): ' + str(zyx_px_0))
```

```
starting coordinates (pyramid_level=4): (2, 230, 180)
in micrometer: (60.0, 1196.0, 936.0)
back in pixel: (2, 230, 180)
in pixel (pyramid_level=0): (2, 3680, 2880)
```
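For reference, these numbers follow directly from the metadata shown earlier: at pyramid level 4 the zyx pixel spacing is [30.0, 5.2, 5.2] micrometers, so (2, 230, 180) pixels correspond to (2 × 30.0, 230 × 5.2, 180 × 5.2) = (60.0, 1196.0, 936.0) micrometers, and going from pyramid level 4 to level 0 multiplies the y and x coordinates by 2^4 = 16, giving (2, 3680, 2880) pixels (z is not downsampled).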
This makes it quite easy to add a scalebar to an image:
```python
# calculate the pixel length of a scalebar
scalebar_micrometer = 100
scalebar_pixels = plate.convert_micrometer_to_pixel(zyx = (0, 0, scalebar_micrometer),
                                                    pyramid_level=2)[2]
print(str(scalebar_micrometer) + ' µm correspond to ' + str(scalebar_pixels) + ' pixel in x')

with plt.style.context('dark_background'):
    fig = plt.figure(figsize=(4, 4))
    fig.set_dpi(100)
    plt.imshow(img[0,0], cmap='gray', vmin=100, vmax=200)
    plt.title(plate.name + ' A04')

    # add scalebar
    # patches.Rectangle((x, y), width, height, edgecolor, facecolor)
    rect = patches.Rectangle((30, 30), scalebar_pixels, 6, edgecolor='white', facecolor='white')
    ax = plt.gca() # get current axes
    # add the patch to the Axes
    ax.add_patch(rect)

    # plt.text(x, y, 'Your text', color, ha='center')
    plt.text(30 + scalebar_pixels / 2, 45, str(scalebar_micrometer) + ' µm', color='white', ha='center', va='top')

    # show image
    plt.show()
    plt.close()
```

```
100 µm correspond to 77 pixel in x
```
### Calculate on the image data

To save memory, images are stored on disk and only loaded when needed (for implementation details see the `dask` Array documentation). This can be demonstrated by `type(img)`. To force loading of the data, you can call `.compute()`:

```python
print(type(img)) # note that the image is an 'on-disk' array
print(type(img.compute())) # ... and can be loaded into memory using .compute()
```

```
<class 'dask.array.core.Array'>
<class 'numpy.ndarray'>
```
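Operations on such arrays are also evaluated lazily. As a small illustration (not from the original tutorial), the mean DAPI intensity of the zoomed-in region is only read from disk and computed when the result is requested:

```python
# operations first build a lazy task graph ...
mean_dapi = img[0,0].mean()
print(type(mean_dapi))             # still a dask array, nothing has been read yet
# ... and are only executed when the result is requested
print(float(mean_dapi.compute()))  # mean intensity of channel 0 in the extracted region
```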
In general, you can use `dask` arrays like `numpy` arrays. For example, we can threshold the image:

```python
# threshold
img_mask = img[0,0] > 140
# ... and compute to numpy array
img_mask = img_mask.compute()

# plot
with plt.style.context('dark_background'):
    fig = plt.figure(figsize=(12, 6))
    fig.set_dpi(100)

    # ... channel for segmentation mask
    plt.subplot(1, 4, 1)
    plt.title('Zoom-in image (DAPI)')
    plt.imshow(img[0,0], vmin=100, vmax=200, cmap='gray')
    plt.axis('off')

    # ... threshold mask
    plt.subplot(1, 4, 2)
    plt.title('Mask (threshold on DAPI)')
    # define a custom colormap for segmentation
    my_cmap = plt.cm.colors.ListedColormap(['black', 'red'])
    plt.imshow(img_mask, cmap=my_cmap)
    plt.axis('off')

    plt.show()
    plt.close()
```
This is of course not a recommended way to segment this image, but rather meant to demonstrate that `dask` arrays can be used just like `numpy` arrays.
## Session info

```python
import session_info
session_info.show()
```
```
-----
dask            2024.1.0
ez_zarr         0.1.1
matplotlib      3.8.2
pandas          1.5.3
session_info    1.0.0
-----
```
Modules imported as dependencies:
PIL 10.2.0 anndata 0.10.4 anyio NA arrow 1.3.0 asciitree NA asttokens NA attr 23.2.0 attrs 23.2.0 babel 2.14.0 certifi 2023.11.17 cffi 1.16.0 charset_normalizer 3.3.2 click 8.1.7 cloudpickle 3.0.0 colorama 0.4.6 comm 0.2.1 cycler 0.12.1 cython_runtime NA dateutil 2.8.2 debugpy 1.8.0 decorator 5.1.1 defusedxml 0.7.1 distributed 2024.1.0 exceptiongroup 1.2.0 executing 2.0.1 fasteners 0.19 fastjsonschema NA fqdn NA fsspec 2023.12.2 gmpy2 2.1.2 h5py 3.10.0 idna 3.6 importlib_metadata NA importlib_resources NA ipykernel 6.29.0 isoduration NA jedi 0.19.1 jinja2 3.1.2 json5 NA jsonpointer 2.4 jsonschema 4.21.1 jsonschema_specifications NA jupyter_events 0.9.0 jupyter_server 2.12.5 jupyterlab_server 2.25.2 kiwisolver 1.4.5 locket NA markupsafe 2.1.3 matplotlib_inline 0.1.6 mpl_toolkits NA mpmath 1.3.0 msgpack 1.0.7 natsort 8.4.0 nbformat 5.9.2 numcodecs 0.12.1 numpy 1.26.3 overrides NA packaging 23.2 parso 0.8.3 pexpect 4.9.0 platformdirs 4.1.0 prometheus_client NA prompt_toolkit 3.0.43 psutil 5.9.8 ptyprocess 0.7.0 pure_eval 0.2.2 pydev_ipython NA pydevconsole NA pydevd 2.9.5 pydevd_file_utils NA pydevd_plugins NA pydevd_tracing NA pygments 2.17.2 pyparsing 3.1.1 pythonjsonlogger NA pytz 2023.3.post1 referencing NA requests 2.31.0 rfc3339_validator 0.1.4 rfc3986_validator 0.1.1 rpds NA scipy 1.12.0 send2trash NA six 1.16.0 sniffio 1.3.0 sortedcontainers 2.4.0 stack_data 0.6.3 sympy 1.12 tblib 3.0.0 tlz 0.12.1 toolz 0.12.1 torch 2.1.2 torchgen NA tornado 6.4 tqdm 4.66.1 traitlets 5.14.1 typing_extensions NA uri_template NA urllib3 1.26.18 wcwidth 0.2.13 webcolors 1.13 websocket 1.7.0 yaml 6.0.1 zarr 2.16.1 zict 3.0.0 zipp NA zmq 25.1.2 zoneinfo NA
```
-----
IPython          8.18.1
jupyter_client   8.6.0
jupyter_core     5.7.1
jupyterlab       4.0.11
notebook         7.0.7
-----
Python 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
Linux-3.10.0-1160.76.1.el7.x86_64-x86_64-with-glibc2.17
-----
Session information updated at 2024-01-25 09:49
```