Quickstart
Data requirements
The mandatory input file of Track Analyzer is a data table (a csv or txt file) of tracks containing the position coordinates (in 2D or 3D) over time and the track identifiers. Optionally, the data can be plotted on the original image, provided as a 3D or 4D tiff stack (i.e. 2D+time or 3D+time). If your movie is in a different format (e.g. a list of images), please convert it to a tiff stack, for instance using Fiji.
The position file must contain columns with the x, y, (z) positions, a frame column and a track id column. The position coordinates can be in pixels or in scaled units. The scaling information and other metadata, such as the time and length scales, are provided by the user through the graphical interface.
If Track Analyzer is run through the Jupyter notebook (see below), the position file format is flexible and the user is asked to interactively identify all the mandatory columns. You can specify that the position file was generated by TrackMate, in which case the columns are automatically identified.
If Track Analyzer is run through Galaxy (see below), the position file format must strictly follow the default column names: x, y, (z), frame, track. However, if the position file was generated by TrackMate, the columns are automatically identified.
If TrackMate was used to track the data, the csv file to use is the one generated by clicking on the “spots” button.
If Track Analyzer is run using command lines (see below), the data directory must contain:
a comma-separated csv file named positions.csv whose column names are: x, y, (z), frame, track
a text file named info.txt containing the metadata (see example)
(optional) a tiff file named stack.tif
(optional) configuration files in a config directory
The default config files can be generated by running TA_config <path_to_directory>. The config files are csv files that can be easily edited.
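As an illustration, a minimal 2D positions.csv (the z column is only needed for 3D data; the values below are arbitrary) could look like this:

x,y,frame,track
12.3,45.6,0,1
13.1,44.9,1,1
80.2,10.5,0,2
79.8,11.0,1,2

The metadata fields expected in info.txt are listed in the Output section below.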
Running the pipeline
There are three ways of running Track Analyzer. Two user-friendly versions are available: an installation-free web-based tool run on Galaxy, and a full version run on a user-friendly Jupyter notebook. In both versions, Track Analyzer can be run without any programming knowledge using its graphical interface. The full version interface is launched by running a Jupyter notebook containing widgets that allow the user to load data and set parameters without writing any code. Track Analyzer can also be run directly from the command line (if you need to run it on a remote machine such as a cluster).
Using Galaxy (recommended for a first trial)
The installation-free online version is available here. It runs on the web-based platform Galaxy, which is easy to use (some documentation about Galaxy is available here). This online version is slightly limited compared to the full version run on the Jupyter notebook: the notebook offers 3D visualization and hand-drawn data selection using a Napari viewer. Moreover, loaded data are computed step by step throughout the pipeline, which gives the user better interactivity with the data. On Galaxy, by contrast, the user must enter all numerical parameters before the analysis can be run.
Complete documentation about Galaxy is available here. Here’s a quick overview of Galaxy’s interface.
Upload your data to Galaxy. If you want to keep track of your analysis history, you can create a user account.
Choose your input files that were previously uploaded.
Enter the parameters necessary for your analysis.
Hit the execution button to launch the job on Galaxy’s cluster.
You can find in the history panel all the outputs of each analysis job. For each output element, you can have a quick look (6) or save it (7). Note that when you display output plots, it is not obvious how to get back to the main interface. The circular-arrow ‘Run this job again’ button (8), displayed on every log file, is useful here: if you press it, the interface is displayed again with the exact same set of parameters as the corresponding job.
Once you execute the job, new boxes appear in the history panel (5), one box for each type of output (csv files, image files, log file, config files, etc.). While the job is running these boxes are yellow; once the job is done, they turn green. If there was an error during the execution, they turn red. In that case, have a look at the log file: there might be a problem with the data you provided. If you suspect an issue with Track Analyzer itself, do not hesitate to report it on our Gitlab.
Using a Jupyter notebook (recommended for advanced options)
The full version can be run using a Jupyter notebook. Documentation about Jupyter notebooks can be found here. Briefly, a notebook comprises a series of ‘cells’, which are blocks of Python code to be executed. Each cell can be run by pressing Shift+Enter, and each cell executes a piece of code generating the pipeline’s graphical interface. The cells all depend on each other, so they MUST be run in order. By default, the code of each cell is hidden, but it can be shown by pressing the button at the top of the notebook: ‘Click here to toggle on/off the raw code’. When the code is hidden, it is easy to miss a cell, which is a common cause of errors. If this happens, restart the pipeline a couple of cells above.
To launch a notebook:
the notebook is at the root of the git repository, or you can just download it here: run_TA.ipynb
go to the project folder, or to where you downloaded the notebook, by running cd <path_to_the_project_folder> in a terminal
activate the environment: run conda activate <env_name> (see installation for more details)
launch a Jupyter notebook: run jupyter notebook
a web browser opens; click on run_TA.ipynb
to shut down the notebook, press CTRL+C in the terminal.
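Putting these steps together, a typical launch session in a terminal looks like this (the path and environment name are placeholders to adapt to your installation):

cd <path_to_the_project_folder>
conda activate <env_name>
jupyter notebook
# a browser tab opens: click on run_TA.ipynb
# when you are done, press CTRL+C in this terminal to shut down the notebook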
Using command lines (only if you need to run it on a remote machine)
If you need to run Track Analyzer from a terminal without any graphical interface, you can, but you won’t benefit from the interactive modules. Data filtering and analysis parameters will need to be passed through config files (see examples). Track Analyzer comes with four commands:
traj_analysis, which runs the trajectory analysis section (see below). It takes as arguments: path_to_data_directory (optional: use the flag -r or --refresh to refresh the database)
map_analysis, which runs the map analysis section (see below). It takes as arguments: path_to_data_directory (optional: use the flag -r or --refresh to refresh the database)
TA_config, which generates default config files. It takes as one argument: path_to_data_directory
TA_synthetic, which generates synthetic data. It takes as arguments: path_to_data_directory (optional: use the flag -p or --plot to plot the positions as a tiff file).
For all these commands, you can show the help by adding the optional flag -h or --help.
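For example, a typical command-line session on a data directory could look like this (the path is a placeholder for your own data directory):

TA_config <path_to_data_directory>         # generate the default config files, then edit them
traj_analysis <path_to_data_directory> -r  # run the trajectory analysis, refreshing the database
map_analysis <path_to_data_directory>      # run the map analysis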
Analysis procedure
Track Analyzer contains a data filtering section and three main analysis sections. This section describes the procedure for the full version; the installation-free version on Galaxy is very similar, with only a few options unavailable.
Load data
Just follow the instructions on the graphical interface (in the Jupyter notebook or on Galaxy) to load your data files.
If you run Track Analyzer for the first time, enter the metadata.
If Track Analyzer is run through the Jupyter notebook, you can additionally select custom columns of variables you might want to plot. You then have to type in their names and units, to be displayed on the plots.
You can also set some plotting parameters such as image file format, colors to be used, image resolution, etc.
Data filtering section
Subsets of the datasets can be filtered on spatiotemporal criteria: x, y, z positions, time subset and track duration. A drawing tool also offers the possibility to hand-draw regions of interest.
Additionally, specific trajectories can be selected by using their position in a region of interest at a specific time. This feature can be useful to inspect either their past (back-tracking) or their future (fate-mapping). Trajectories can also be selected just using their ids.
These subsets can then be analyzed separately. The analysis will be run independently on each of them. Alternatively, they can be analyzed together. Trajectories and computed quantities will then be plotted together using color-coding.
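To make the track-duration criterion concrete, here is a minimal pandas sketch of the idea (this is not Track Analyzer’s internal code; it assumes the default column names given in the Data requirements section):

import pandas as pd

df = pd.read_csv('positions.csv')  # columns: x, y, (z), frame, track
# keep only tracks lasting at least 10 frames
durations = df.groupby('track')['frame'].agg(lambda f: f.max() - f.min() + 1)
long_tracks = durations[durations >= 10].index
df_filtered = df[df['track'].isin(long_tracks)]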
Trajectory analysis section
Trajectories can be plotted over the original image, frame by frame, with some custom color-coding (z color-coded, t color-coded, subset, or random). All trajectories can also be plotted together, with the option of centering their origins, which can be useful to detect patterns in the trajectories.
Several quantities can be computed and plotted: velocities and accelerations (spatial components and their moduli). The local cell density can be estimated by performing a Voronoi tessellation: the Voronoi diagram can be plotted, and the area of each Voronoi cell can be calculated and plotted. Currently, the Voronoi tessellation is only available in 2D (even if the data are 3D). If Track Analyzer is run through the Jupyter notebook, you can also plot the other variables you selected.
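As a side note, here is a minimal sketch of how Voronoi cell areas can be computed in 2D with scipy (this illustrates the idea only, not Track Analyzer’s own implementation):

import numpy as np
from scipy.spatial import Voronoi, ConvexHull

points = np.random.rand(50, 2)  # 2D object positions at a given frame
vor = Voronoi(points)
areas = np.full(len(points), np.nan)
for i, region_index in enumerate(vor.point_region):
    region = vor.regions[region_index]
    if region and -1 not in region:  # skip open cells touching the border
        # Voronoi cells are convex, so the hull 'volume' is the polygon area in 2D
        areas[i] = ConvexHull(vor.vertices[region]).volume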
All these quantities can also be averaged over the whole trajectory and plotted.
Trajectories can also be quantified using the Mean Squared Displacement (MSD) analysis. The MSD can be plotted and fitted with some diffusion models to compute the diffusion coefficient.
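For reference, in the simplest model of free diffusion, the MSD grows linearly with the lag time \(\tau\): \(\mathrm{MSD}(\tau) = 2 d D \tau\), where \(d\) is the dimensionality (2 or 3) and \(D\) the diffusion coefficient, so \(D\) can be extracted from the slope of a linear fit.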
Map analysis section
Data can be averaged on a regular grid to produce maps of the computed quantities. Two kinds of maps can be plotted: vector fields and scalar fields.
Vector fields
Velocity and acceleration vectors can be plotted on 2D maps. For 3D data, the z dimension can be color-coded. Such maps can be superimposed on a scalar field.
Scalar fields
The velocity and acceleration components and moduli can be plotted as color-coded maps. The vector-average moduli can also be computed. The difference is that the mean velocity modulus is the mean of the moduli of all velocities in the grid unit, while the vector-average modulus is the modulus of the vector obtained by averaging the velocities in the grid unit. Divergence (contraction and expansion) maps and curl (rotation) maps can also be plotted.
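A tiny numpy example makes this distinction concrete. For two objects in the same grid unit moving in opposite directions, the mean of the moduli is 1 while the vector-average modulus is 0:

import numpy as np

v = np.array([[1.0, 0.0], [-1.0, 0.0]])                  # two velocity vectors
mean_of_moduli = np.linalg.norm(v, axis=1).mean()        # 1.0: average speed
vector_average_modulus = np.linalg.norm(v.mean(axis=0))  # 0.0: net displacement rate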
Notes on running times
Some analysis steps can be time-intensive. In particular, the database pre-processing (splitting and interpolation of tracks, velocity and acceleration calculation, etc.) and the trajectory plotting are the most time-intensive steps. To give an order of magnitude, the pre-processing step takes about 1 minute for a dataset of \(10^4\) tracked objects on a standard laptop.
Comparator section
Data previously generated by the trajectory analysis section can be compared by plotting parameters together on the same plots.
Output
Track Analyzer generates several kinds of output files: database files, plots, plot data points, and configuration files.
Database and configuration files
Some files that are necessary to the processing are generated when the pipeline is executed:
data_base.p is a binary collection of Python objects generated when the initial tracking file is loaded. It allows the initial loading to be skipped when the pipeline is run several times on the same tracking data. It can be refreshed if necessary.
info.txt is a text file containing important metadata: ‘lengthscale’, ‘timescale’, ‘z_step’, ‘image_width’, ‘image_height’, ‘length_unit’, ‘time_unit’, ‘table_unit’, ‘separator’. It can be interactively generated using the notebook.
if the original image stack is 4D (3D+t), a stack_maxproj.tif is generated by performing a maximum projection along the z dimension, so that a 2D image can be used for 2D-based plotting.
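If you are curious about the database content, and assuming data_base.p is a standard Python pickle (the .p extension suggests so, but the exact object structure is internal to Track Analyzer), you can peek at it from a Python shell:

import pickle

with open('data_base.p', 'rb') as f:
    db = pickle.load(f)
print(type(db))  # inspect the internal structure from here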
Data output
The trajectory analysis and the map analysis are generated respectively in a traj_analysis and map_analysis directory. Each subset’s analysis is saved in a new folder.
In each subset’s directory:
a config folder is generated with the configuration parameters used for this specific analysis
all_data.csv stores the subset’s table of positions
track_prop.csv stores the averaged quantities along trajectories
each plot is saved using an image format, size and resolution that can be customized. Additionally, the default colors and color maps can be customized in the plotting parameters section.
the data points of each plot are saved in a csv file with the same name as the image file, so you can replot the data using your favorite plotting software
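For example, assuming a plot was saved as msd.png with its data points in msd.csv (hypothetical names: use the csv matching your own plot), you could replot it with pandas and matplotlib:

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('msd.csv')  # hypothetical file name
print(data.columns)            # inspect the available columns first
data.plot()                    # quick default replot; refine with your preferred style
plt.show()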
Examples
Real data
You can get familiar with Track Analyzer by running it on example data. For instance, you can analyze data of a developing C. elegans embryo provided by the Cell Tracking Challenge. Download the data directory containing trajectories and metadata (these positions were extracted following napari’s tutorial):
Additionally, you can download the original timelapse for optimal visualization. Download the dataset, and run the following Python script to extract the image and generate a single tiff file that you can use during the analysis.
To run the script, open a terminal and run:
pip install imagecodecs
cd <path_to_script_folder>
python load_tracking.py <path_to_dataset_folder>
You can also generate the positions by adding the flag -p; it will generate the positions.csv file present in the archive.
Warning: if you open the generated tiff file with Fiji, you will see that the t and z dimensions are not separated. You will have to run “Stack to Hyperstack” with z=35 and t=195. This is only needed if you want to view the file in Fiji; you don’t need to do it for Track Analyzer!
Synthetic data
You can also analyze synthetic data that were generated to ensure that the analysis performed by Track Analyzer is correct. You can download several datasets here. They all have a param.csv file with the input values for each trajectory.
You can generate such datasets using the command line TA_synthetic data_dir -p, where data_dir is the path to the directory where you want to save the data, and -p is a flag used to plot the positions and save them to a tif file. A configuration file config.csv is used to generate the tracks, as shown in the example datasets. Essentially, the number of tracks, the number of frames, and their initial positions can be set. The movements are controlled by a random diffusion field and a velocity field. These fields are polynomial fields of the form \(a(x-b)^p + c(y-d)^q + e(z-f)^r + cst\). The program outputs a positions.csv file, a stack.tif file (if the -p flag is used), a D_fields.csv file (with the theoretical values of D at each initial position), and a v_fields.csv file (with the theoretical values of the velocity, divergence and curl at each initial position).
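As an illustration, such a polynomial field can be evaluated with a small function whose parameter names follow the formula above (a sketch only, not Track Analyzer’s implementation):

def poly_field(x, y, z, a=1.0, b=0.0, p=1, c=0.0, d=0.0, q=1, e=0.0, f=0.0, r=1, cst=0.0):
    # a(x-b)^p + c(y-d)^q + e(z-f)^r + cst
    return a * (x - b) ** p + c * (y - d) ** q + e * (z - f) ** r + cst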
Troubleshooting
The 3D visualization and the drawing selection tool depend on the napari package. The installation of this package can lead to issues depending on your system. If you are not able to solve these installation issues, you will not have access to 3D rendering. However, you will still be able to use Track Analyzer without the drawing tool, by using the coordinate sliders in the graphical interface.
The execution of cells in the Jupyter notebook can be buggy because of the large number of widgets. If you cannot execute a cell normally by pressing Shift+Enter, use the Execute button at the top of the notebook.