Scatter

Scatter Plot Module

This module provides comprehensive tools for generating geographical scatter plots and conducting statistical analyses of meteorological observational data. It enables users to visualize and analyze various metrics over global or regional tiles, facilitating quality assessment, data validation, and comparative studies for CMC (Canadian Meteorological Centre) datasets.

Overview

The scatter module computes and visualizes statistical metrics across adjustable geographical tiles, averaged over user-specified time periods. These metrics are crucial for evaluating the performance of meteorological models, assessing data assimilation quality, and identifying spatial patterns or anomalies in observational networks.

Key Features

  • Multi-Metric Analysis: Computes multiple statistical metrics including Observation minus Prediction (omp), Observation minus Analysis (oma), observation counts (nobs), raw observations (obs), density (dens), and bias corrections (bcorr).

  • Flexible Tiling: Allows customizable tile sizes (in degrees of longitude and latitude) to suit different analysis scales.

  • Comparative Studies: Enables side-by-side visual and statistical comparisons of control vs. experimental datasets.

  • High-Performance Processing: Utilizes Dask-powered parallel processing with configurable CPU allocation for massive global datasets.

Supported Metrics

omp (Observation minus Prediction)

Difference between observed values and model predictions (Background). Essential for model validation and initial error analysis.

[Figure: omp plot]

oma (Observation minus Analysis)

Difference between observed values and the final analysis fields. Useful for assessing data assimilation performance.

[Figure: oma plot]

nobs (Number of Observations)

Total count of observations per tile. Provides immediate insights into data coverage and sampling density.

obs (Raw Observations)

Actual recorded meteorological values. Highlights spatial patterns and extremes in the raw data.

[Figure: obs plot]

dens (Density)

Spatial density of observations. Visualizes data concentration and potential gaps in global or regional coverage.

[Figure: dens plot]

bcorr (Bias Correction)

Bias corrections applied specifically to satellite radiance data. Critical for radiance assimilation assessment.

[Figure: bcorr plot]
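The omp, oma, and nobs definitions above can be illustrated with a minimal sketch for a single tile. The sample values are made up, and the plain arithmetic mean is an assumption for illustration; the source only states that metrics are averaged per tile, not how pikobs computes the average.

```python
from statistics import mean

# Made-up samples for one tile: (observation, background, analysis).
records = [
    (250.5, 250.0, 250.25),
    (251.0, 251.5, 251.25),
    (249.75, 249.25, 249.5),
]

# omp: observation minus model prediction (background).
omp = [obs - background for obs, background, _ in records]
# oma: observation minus analysis.
oma = [obs - analysis for obs, _, analysis in records]

nobs = len(records)     # number of observations in the tile
omp_mean = mean(omp)    # assumed aggregation: arithmetic mean
oma_mean = mean(oma)

print(nobs, omp_mean, oma_mean)
```

In this sample the oma values are smaller in magnitude than the omp values, which is the pattern one would look for when the analysis has drawn closer to the observations than the background.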

Usage & Command-Line Parameters

Tip

Interactive Session Setup: For resource-intensive computations (especially global datasets), start an interactive session before running the module:

qsub -I -X -l select=4:ncpus=80:mpiprocs=80:ompthreads=1:mem=185gb -l place=scatter -l walltime=6:0:0

Below is the list of parameters available when calling pikobs.scatter.arg_call():

--path_experience_files

Absolute path to the input data directory for the experiment (e.g., monitoring0/banco/postalt/).

--experience_name

Name or identifier of the experiment (e.g., GIC5DA1E22_DDT2).

--pathwork

Working directory in your own account/workspace where output files, plots, and intermediate SQLite databases will be saved (e.g., /home/your_username/Data_pikobs1/).

--datestart / --dateend

Start and end dates for the analysis, strictly in YYYYMMDDHH format.
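Because the dates must follow YYYYMMDDHH exactly, it can help to check them before launching a long job. The helper below is a hypothetical pre-check written for this page, not part of pikobs:

```python
from datetime import datetime


def parse_cycle(stamp: str) -> datetime:
    """Parse a YYYYMMDDHH string; raise ValueError if it is malformed."""
    # Explicit length/digit check, because strptime alone tolerates
    # some shorter inputs.
    if len(stamp) != 10 or not stamp.isdigit():
        raise ValueError(f"expected 10 digits (YYYYMMDDHH), got {stamp!r}")
    return datetime.strptime(stamp, "%Y%m%d%H")


start = parse_cycle("2026020100")
end = parse_cycle("2026020200")
if start > end:
    raise ValueError("--datestart must not be later than --dateend")

print(end - start)  # 1 day, 0:00:00
```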

--region

Geographical bounding box for the plot. See Regions Configuration for all available options.

--family

Observation instrument family to process. See Families Configuration for exact suffixes.

--flags_criteria

Quality control bitmask filter (e.g., assimilee, bgckalt). See Flags Criteria.

--fonction

Space-separated list of metrics to compute (e.g., omp oma nobs obs dens bcorr).

--boxsizex / --boxsizey

Tile (bin) size in degrees of longitude (x) and latitude (y).
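To make the tiling concrete, here is a minimal sketch of how a point could be assigned to a tile and how many tiles a global grid contains. The grid origin at (-180, -90), the longitude range of -180 to 180, and the function names are assumptions for illustration; the binning inside pikobs may differ.

```python
import math


def tile_counts(boxsizex, boxsizey):
    """Number of tiles along longitude and latitude for a global grid."""
    return math.ceil(360 / boxsizex), math.ceil(180 / boxsizey)


def tile_index(lon, lat, boxsizex, boxsizey):
    """Return (ix, iy) of the tile containing a point.

    Assumes lon in [-180, 180] and lat in [-90, 90], with tile (0, 0)
    starting at the south-west corner (-180, -90).
    """
    nx, ny = tile_counts(boxsizex, boxsizey)
    ix = math.floor((lon + 180.0) / boxsizex)
    iy = math.floor((lat + 90.0) / boxsizey)
    # Points exactly on the upper edges fall into the last tile.
    return min(ix, nx - 1), min(iy, ny - 1)


print(tile_counts(2, 2))                # (180, 90)
print(tile_index(-75.7, 45.4, 2, 2))    # (52, 67)
```

Halving the tile size in both directions quadruples the number of tiles, which is worth keeping in mind when choosing --boxsizex and --boxsizey for a global run.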

--projection

Map projection style for the output plots. See Projections Configuration for options.

--id_stn

Station ID selection behavior:

  • all: Processes and generates a separate plot for each individual station.

  • join: Aggregates and combines all stations into a single plot.

  • Specific ID: Processes only the explicitly provided station ID.

--channel

Vertical coordinate (vcoord) or satellite channel selection (primarily for radiance data):

  • all: Processes and generates a separate plot for each individual channel.

  • join: Aggregates and combines all channels into a single overall plot.

  • Specific Channel: Processes only the explicitly provided channel number.

--n_cpu

Number of CPU cores allocated for Dask parallel processing.

Single Experiment Analysis

Generate scatter plots for a single experiment focusing on AMSU-A all-sky observations:

python -c 'import pikobs; pikobs.scatter.arg_call()' \
     --path_experience_files  /home/dlo001/sites8/Data_pikobs/monitoring0/ \
     --experience_name        GIC5DA1E22_DDT2 \
     --pathwork               /home/dlo001/sites8/Data_pikobs1 \
     --datestart              2026020100 \
     --dateend                2026020200 \
     --region                 Monde \
     --family                 to_amsua_allsky \
     --flags_criteria         assimilee \
     --fonction               omp oma nobs obs dens bcorr \
     --boxsizex               2 \
     --boxsizey               2 \
     --projection             cyl \
     --id_stn                 all \
     --channel                all \
     --n_cpu                  80

What this command does: Processes assimilated data for the to_amsua_allsky family, computes all six metrics over a 2x2 degree global grid (Monde), and draws cylindrical-projection maps using 80 cores. Because --id_stn all and --channel all are set, it generates a separate plot for each station and each channel.

Comparative Analysis (Control vs. Evaluation)

The module can also generate comparative maps that show the differences between a control run and a new evaluation experiment. When --path_control_files and --control_name are passed in addition to the experiment arguments, Pikobs automatically generates side-by-side plots:

python -c 'import pikobs; pikobs.scatter.arg_call()' \
     --path_control_files     /home/dlo001/sites8/Data_pikobs/monitoring0/ \
     --control_name           GIC5DA1E22_DDT2 \
     --path_experience_files  /home/dlo001/sites8/Data_pikobs/monitoring1/ \
     --experience_name        GIC5DA1E22_EVAL \
     --pathwork               /home/dlo001/sites8/Data_pikobs1_comparative \
     --datestart              2026020100 \
     --dateend                2026020200 \
     --region                 Monde \
     --family                 atms_allsky \
     --flags_criteria         assimilee \
     --fonction               omp oma nobs obs dens bcorr \
     --boxsizex               2 \
     --boxsizey               2 \
     --projection             robinson \
     --id_stn                 all \
     --channel                all \
     --n_cpu                  80

What this command does: Generates comparative side-by-side plots between a control (GIC5DA1E22_DDT2) and an evaluation (GIC5DA1E22_EVAL) run for the atms_allsky family over the whole globe (Monde). It uses a Robinson projection and keeps only assimilated observations (--flags_criteria assimilee).

Comparative Visualizations

When running the comparative command above, Pikobs generates visual difference maps for spatial analysis.

Note

Difference Calculation (Experiment - Control): Every metric shown in the difference maps is computed as the evaluation experiment minus the control.

  • A positive value indicates the metric is higher in the new experiment compared to the control.

  • A negative value indicates the metric is lower in the new experiment compared to the control.

  • Comparative omp & oma: Displays the spatial deviation between the two experiments regarding model predictions and analysis quality.

    [Figures: comparative omp plot, comparative oma plot]

  • Comparative obs, nobs & dens: Highlights data ingestion differences, showing changes in sampling density or observation concentration between the two runs.

    [Figures: comparative obs plot, comparative dens plot]

  • Comparative bcorr: Depicts how bias corrections applied to radiance data vary between the control and experimental datasets.

    [Figure: comparative bcorr plot]
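The sign convention for the difference maps (experiment minus control) can be sketched per tile as follows. The dictionary layout, the tile keys, and the choice to skip tiles missing from either run are assumptions for illustration, not the pikobs implementation:

```python
def tile_difference(experiment, control):
    """Experiment minus control, for tiles present in both runs."""
    shared = experiment.keys() & control.keys()
    return {tile: experiment[tile] - control[tile] for tile in shared}


# Made-up mean omp per tile, keyed by a hypothetical (ix, iy) tile index.
control = {(52, 67): 0.25, (53, 67): -0.25}
experiment = {(52, 67): 0.5, (53, 67): -0.5, (54, 67): 0.125}

diff = tile_difference(experiment, control)
print(diff[(52, 67)])   # 0.25: the metric is higher in the experiment
print(diff[(53, 67)])   # -0.25: the metric is lower in the experiment
```

Tile (54, 67) is absent from the result because the control has no value there; whether such tiles should be dropped or shown differently is a design choice this sketch does not settle.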

Outputs & Performance Notes

Upon successful execution, the module generates:

  • Scatter Plots: PNG images of metrics mapped across geographical tiles.

  • Comparative Visualizations: Side-by-side spatial difference plots.

  • Data Files: SQLite databases containing the processed statistical data for further custom querying.
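Since the table layout of the generated SQLite files is not described here, a reasonable first step for custom querying is to list the tables each file contains. The sketch below uses only the Python standard library; the demo database and its demo_table are stand-ins created for the example, not pikobs output.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a SQLite file, opened read-only."""
    # mode=ro avoids silently creating an empty file if the path is wrong.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Demo on a throwaway database; "demo_table" is not a pikobs table name.
with tempfile.TemporaryDirectory() as tmp:
    demo_db = Path(tmp) / "demo.db"
    with closing(sqlite3.connect(demo_db)) as conn:
        conn.execute("CREATE TABLE demo_table (value REAL)")
        conn.commit()
    tables = list_tables(demo_db)

print(tables)  # ['demo_table']
```

To inspect real output, call list_tables on a database file found under your --pathwork directory.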

Note

Performance Best Practices:

  • Choose tile sizes (--boxsizex, --boxsizey) to match your data density. A 1x1 degree global grid has 64,800 tiles (360 x 180), versus 2,592 (72 x 36) for a 5x5 grid, so it requires significantly more RAM.

  • When analyzing very long periods (e.g., several months), process the data in smaller time chunks to prevent memory overflow.

  • The bcorr (bias correction) metric is only valid and available for satellite radiance data families.
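The advice above on processing long periods in smaller time chunks can be sketched as a generator that yields --datestart / --dateend pairs. The helper name and the 10-day chunk length are illustrative. Whether pikobs treats --dateend as inclusive is not stated here, so the shared boundary between consecutive chunks is an assumption to verify before combining results.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d%H"  # YYYYMMDDHH, as required by --datestart / --dateend


def date_chunks(datestart, dateend, days):
    """Yield (start, end) YYYYMMDDHH pairs covering the period in chunks."""
    start = datetime.strptime(datestart, FMT)
    end = datetime.strptime(dateend, FMT)
    step = timedelta(days=days)
    while start < end:
        stop = min(start + step, end)
        yield start.strftime(FMT), stop.strftime(FMT)
        start = stop


chunks = list(date_chunks("2026020100", "2026030100", days=10))
for chunk_start, chunk_end in chunks:
    print("--datestart", chunk_start, "--dateend", chunk_end)
```

Each printed pair can be substituted into the single-experiment command, ideally with a separate --pathwork per chunk so that outputs from different chunks do not overwrite each other.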

Support & Troubleshooting

If you encounter any issues, experience unexpected behavior, or have feature requests while using the scatter module, please open an issue in the Pikobs GitLab Repository.

Please include your exact command, parameters, and error logs when reporting an issue.