Scatter
Scatter Plot Module
This module provides comprehensive tools for generating geographical scatter plots and conducting statistical analyses of meteorological observational data. It enables users to visualize and analyze various metrics over global or regional tiles, facilitating quality assessment, data validation, and comparative studies for CMC (Canadian Meteorological Centre) datasets.
Overview
The scatter module computes and visualizes statistical metrics across adjustable geographical tiles, averaged over user-specified time periods. These metrics are crucial for evaluating the performance of meteorological models, assessing data assimilation quality, and identifying spatial patterns or anomalies in observational networks.
Key Features
Multi-Metric Analysis: Computes multiple statistical metrics including Observation minus Prediction (omp), Observation minus Analysis (oma), observation counts (nobs), raw observations (obs), density (dens), and bias corrections (bcorr).
Flexible Tiling: Allows customizable tile sizes (in degrees of longitude and latitude) to suit different analysis scales.
Comparative Studies: Enables side-by-side visual and statistical comparisons of control vs. experimental datasets.
High-Performance Processing: Utilizes Dask-powered parallel processing with configurable CPU allocation for massive global datasets.
Supported Metrics
- omp (Observation minus Prediction)
Difference between observed values and model predictions (Background). Essential for model validation and initial error analysis.
- oma (Observation minus Analysis)
Difference between observed values and the final analysis fields. Useful for assessing data assimilation performance.
- nobs (Number of Observations)
Total count of observations per tile. Provides immediate insights into data coverage and sampling density.
- obs (Raw Observations)
Actual recorded meteorological values. Highlights spatial patterns and extremes in the raw data.
- dens (Density)
Spatial density of observations. Visualizes data concentration and potential gaps in global or regional coverage.
- bcorr (Bias Correction)
Bias corrections applied specifically to satellite radiance data. Critical for radiance assimilation assessment.
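The tile statistics described above can be illustrated with a small NumPy sketch. This is not the module's internal implementation: the function name tile_mean_and_count, the array inputs, and the convention of wrapping longitudes into [-180, 180) are assumptions made for illustration. It bins point values into longitude/latitude tiles and returns the per-tile mean (as for omp or oma) together with the per-tile count (as for nobs).

```python
import numpy as np

def tile_mean_and_count(lon, lat, values, boxsizex=2.0, boxsizey=2.0):
    """Hypothetical helper: per-tile mean and count on a global grid."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    values = np.asarray(values, dtype=float)

    nx = int(round(360.0 / boxsizex))  # tiles along longitude
    ny = int(round(180.0 / boxsizey))  # tiles along latitude

    # Wrap longitudes into [-180, 180) and locate each observation's tile.
    ix = np.floor(((lon + 180.0) % 360.0) / boxsizex).astype(int)
    iy = np.floor((lat + 90.0) / boxsizey).astype(int)
    ix = np.clip(ix, 0, nx - 1)
    iy = np.clip(iy, 0, ny - 1)  # lat == 90 lands in the last row

    total = np.zeros((ny, nx))
    count = np.zeros((ny, nx), dtype=int)
    np.add.at(total, (iy, ix), values)  # unbuffered sum per tile
    np.add.at(count, (iy, ix), 1)       # observations per tile

    mean = np.full((ny, nx), np.nan)    # empty tiles stay NaN
    filled = count > 0
    mean[filled] = total[filled] / count[filled]
    return mean, count

# Illustrative values only: three observations and their background values.
lon = [0.5, 1.5, 10.0]
lat = [0.5, 0.5, -45.0]
obs = np.array([280.0, 282.0, 270.0])
background = np.array([279.0, 280.0, 271.0])

omp = obs - background  # Observation minus Prediction
omp_mean, nobs = tile_mean_and_count(lon, lat, omp, boxsizex=2, boxsizey=2)
```

Empty tiles are left as NaN rather than zero so that a map can distinguish "no data" from "zero departure".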
Usage & Command-Line Parameters
Tip
Interactive Session Setup: For resource-intensive computations (especially global datasets), it is highly recommended to start an interactive session before executing the module:
qsub -I -X -l select=4:ncpus=80:mpiprocs=80:ompthreads=1:mem=185gb -l place=scatter -l walltime=6:0:0
Below is the list of parameters available when calling pikobs.scatter.arg_call():
--path_experience_files: Absolute path to the input data directory for the experiment (e.g., monitoring0/banco/postalt/).
--experience_name: Name or identifier of the experiment (e.g., GIC5DA1E22_DDT2).
--pathwork: Working directory in your own account/workspace where output files, plots, and intermediate SQLite databases will be saved (e.g., /home/your_username/Data_pikobs1/).
--datestart / --dateend: Start and end dates for the analysis, strictly in YYYYMMDDHH format.
--region: Geographical bounding box for the plot. See Regions Configuration for all available options.
--family: Observation instrument family to process. See Families Configuration for exact suffixes.
--flags_criteria: Quality control bitmask filter (e.g., assimilee, bgckalt). See Flags Criteria.
--fonction: Space-separated list of metrics to compute (e.g., omp oma nobs obs dens bcorr).
--boxsizex / --boxsizey: Tile (bin) size in degrees of longitude (x) and latitude (y).
--projection: Map projection style for the output plots. See Projections Configuration for options.
--id_stn: Station ID selection behavior:
- all: Processes and generates a separate plot for each individual station.
- join: Aggregates and combines all stations into a single plot.
- Specific ID: Processes only the explicitly provided station ID.
--channel: Vertical coordinate (vcoord) or satellite channel selection (primarily for radiance data):
- all: Processes and generates a separate plot for each individual channel.
- join: Aggregates and combines all channels into a single overall plot.
- Specific Channel: Processes only the explicitly provided channel number.
--n_cpu: Number of CPU cores allocated for Dask parallel processing.
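Because --datestart and --dateend must be given strictly as YYYYMMDDHH, it can help to validate them before launching a long job. The following is a minimal sketch using only the Python standard library; the helper names parse_cycle and check_period are hypothetical and are not part of Pikobs.

```python
from datetime import datetime

def parse_cycle(stamp):
    """Hypothetical helper: parse a YYYYMMDDHH string into a datetime."""
    # Reject anything that is not exactly ten digits before parsing.
    if len(stamp) != 10 or not stamp.isdigit():
        raise ValueError(f"expected YYYYMMDDHH, got {stamp!r}")
    return datetime.strptime(stamp, "%Y%m%d%H")

def check_period(datestart, dateend):
    """Hypothetical helper: parse both bounds and check their order."""
    start = parse_cycle(datestart)
    end = parse_cycle(dateend)
    if end < start:
        raise ValueError("dateend precedes datestart")
    return start, end

start, end = check_period("2026020100", "2026020200")
print(start.isoformat(), end.isoformat())
```

A value such as 20260201 (missing the hour) raises ValueError instead of being silently misread.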
Single Experiment Analysis
Generate scatter plots for a single experiment focusing on AMSU-A all-sky observations:
python -c 'import pikobs; pikobs.scatter.arg_call()' \
    --path_experience_files /home/dlo001/sites8/Data_pikobs/monitoring0/ \
    --experience_name GIC5DA1E22_DDT2 \
    --pathwork /home/dlo001/sites8/Data_pikobs1 \
    --datestart 2026020100 \
    --dateend 2026020200 \
    --region Monde \
    --family to_amsua_allsky \
    --flags_criteria assimilee \
    --fonction omp oma nobs obs dens bcorr \
    --boxsizex 2 \
    --boxsizey 2 \
    --projection cyl \
    --id_stn all \
    --channel all \
    --n_cpu 80
What this command does:
Processes assimilated data for the to_amsua_allsky family, computes all six available metrics on a 2x2 degree global grid (Monde), and generates cylindrical-projection maps using 80 CPU cores. Because --id_stn all and --channel all are set, a separate plot is generated for each individual station and each individual channel.
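When the same analysis is run repeatedly with different settings, the command line above can be assembled from a dictionary in Python. This is a sketch, not a Pikobs feature: build_scatter_command is a hypothetical helper, and it only reproduces the documented python -c entry point and the flags shown in the example.

```python
import shlex
import sys

def build_scatter_command(options):
    """Hypothetical helper: build an argv list for the documented entry point."""
    argv = [sys.executable, "-c", "import pikobs; pikobs.scatter.arg_call()"]
    for name, value in options.items():
        argv.append(f"--{name}")
        # Space-separated options such as --fonction become several tokens.
        if isinstance(value, (list, tuple)):
            argv.extend(str(item) for item in value)
        else:
            argv.append(str(value))
    return argv

# Values copied from the single-experiment example in this section.
options = {
    "path_experience_files": "/home/dlo001/sites8/Data_pikobs/monitoring0/",
    "experience_name": "GIC5DA1E22_DDT2",
    "pathwork": "/home/dlo001/sites8/Data_pikobs1",
    "datestart": "2026020100",
    "dateend": "2026020200",
    "region": "Monde",
    "family": "to_amsua_allsky",
    "flags_criteria": "assimilee",
    "fonction": ["omp", "oma", "nobs", "obs", "dens", "bcorr"],
    "boxsizex": 2,
    "boxsizey": 2,
    "projection": "cyl",
    "id_stn": "all",
    "channel": "all",
    "n_cpu": 80,
}

command = build_scatter_command(options)
print(shlex.join(command))  # shell-quoted form, handy for logging
```

Passing the resulting list to subprocess.run(command, check=True) would launch the job in an environment where pikobs is installed; the sketch stops at printing so it can be run anywhere.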
Comparative Analysis (Control vs. Evaluation)
Comparative maps show how a new evaluation experiment differs from a control run. When both a control (--path_control_files, --control_name) and an experiment (--path_experience_files, --experience_name) are supplied, Pikobs automatically generates side-by-side plots:
python -c 'import pikobs; pikobs.scatter.arg_call()' \
    --path_control_files /home/dlo001/sites8/Data_pikobs/monitoring0/ \
    --control_name GIC5DA1E22_DDT2 \
    --path_experience_files /home/dlo001/sites8/Data_pikobs/monitoring1/ \
    --experience_name GIC5DA1E22_EVAL \
    --pathwork /home/dlo001/sites8/Data_pikobs1_comparative \
    --datestart 2026020100 \
    --dateend 2026020200 \
    --region Monde \
    --family atms_allsky \
    --flags_criteria assimilee \
    --fonction omp oma nobs obs dens bcorr \
    --boxsizex 2 \
    --boxsizey 2 \
    --projection robinson \
    --id_stn all \
    --channel all \
    --n_cpu 80
What this command does:
Generates comparative side-by-side plots between a control (GIC5DA1E22_DDT2) and an evaluation (GIC5DA1E22_EVAL) run for the atms_allsky family globally (Monde). It uses a Robinson projection and strictly filters for assimilated observations.
Comparative Visualizations
When running the comparative command above, Pikobs generates visual difference maps for spatial analysis.
Note
Difference Calculation (Experiment - Control): All comparative metrics displayed in the difference maps are calculated strictly as the Evaluation Experiment minus the Control.
A positive value indicates the metric is higher in the new experiment compared to the control.
A negative value indicates the metric is lower in the new experiment compared to the control.
Comparative omp & oma: Displays the spatial deviation between the two experiments regarding model predictions and analysis quality.
Comparative obs, nobs & dens: Highlights data ingestion differences, showing changes in sampling density or observation concentration between the two runs.
Comparative bcorr: Depicts how bias corrections applied to radiance data vary between the control and experimental datasets.
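The sign convention of the difference maps (Evaluation Experiment minus Control) can be checked with a short NumPy sketch. The function difference_map and the 2x2 example grids are hypothetical, and treating tiles without data as NaN is an assumption made for illustration, not documented Pikobs behavior.

```python
import numpy as np

def difference_map(experiment, control):
    """Hypothetical helper: per-tile difference, experiment minus control."""
    experiment = np.asarray(experiment, dtype=float)
    control = np.asarray(control, dtype=float)
    if experiment.shape != control.shape:
        raise ValueError("both grids must use the same tiling")
    # NaN in either grid (assumed to mark an empty tile) propagates to the result.
    return experiment - control

# Illustrative per-tile mean omp values for two runs on the same 2x2 tiling.
control_omp = np.array([[0.5, 1.0],
                        [1.0, 2.0]])
experiment_omp = np.array([[1.0, np.nan],
                           [0.5, 2.0]])

diff = difference_map(experiment_omp, control_omp)
higher_in_experiment = diff > 0  # positive: metric is higher in the new experiment
lower_in_experiment = diff < 0   # negative: metric is lower in the new experiment
```

Comparisons against NaN evaluate to False, so tiles missing from either run appear in neither mask.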
Outputs & Performance Notes
Upon successful execution, the module generates:
Scatter Plots: PNG images of metrics mapped across geographical tiles.
Comparative Visualizations: Side-by-side spatial difference plots.
Data Files: SQLite databases containing the processed statistical data for further custom querying.
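The layout of the SQLite output files is not described in this section, so a reasonable first step for custom querying is to inspect a file before writing queries against it. The sketch below uses only the standard sqlite3 module; the helper name list_tables is hypothetical, and the path passed to it is whatever database file was written to your --pathwork directory.

```python
import sqlite3
from contextlib import closing

def list_tables(db_path):
    """Hypothetical helper: map each table name to its column names."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    with closing(sqlite3.connect(db_path)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for name in names:
            # PRAGMA table_info returns one row per column; index 1 is its name.
            columns = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
            schema[name] = [column[1] for column in columns]
    return schema
```

Once the table and column names are known, ordinary SELECT statements can be run through the same connection pattern.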
Note
Performance Best Practices:
- Always optimize tile sizes (--boxsizex, --boxsizey) based on your data density. A 1x1 degree global plot requires significantly more RAM than a 5x5 one.
- When analyzing very long periods (e.g., several months), process the data in smaller time chunks to prevent memory overflow.
- The bcorr (bias correction) metric is only valid and available for satellite radiance data families.
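Two of the points above can be made concrete with a short standard-library sketch. The helpers tile_count and split_period are hypothetical. The first counts the tiles on a global grid, which shows why a 1x1 degree grid is heavier than a 5x5 one. The second splits a long period into shorter YYYYMMDDHH windows that could each be passed as --datestart and --dateend. Whether Pikobs treats --dateend as inclusive is not stated here, so the shared window boundaries are an assumption to verify.

```python
from datetime import datetime, timedelta

def tile_count(boxsizex, boxsizey):
    """Hypothetical helper: number of tiles on a global lon/lat grid."""
    return round(360 / boxsizex) * round(180 / boxsizey)

def split_period(datestart, dateend, days=7):
    """Hypothetical helper: split a period into windows of at most `days` days."""
    fmt = "%Y%m%d%H"
    start = datetime.strptime(datestart, fmt)
    end = datetime.strptime(dateend, fmt)
    step = timedelta(days=days)
    chunks = []
    while start < end:
        stop = min(start + step, end)  # last window may be shorter
        chunks.append((start.strftime(fmt), stop.strftime(fmt)))
        start = stop
    return chunks

fine = tile_count(1, 1)    # 360 * 180 tiles
coarse = tile_count(5, 5)  # 72 * 36 tiles
print(fine, coarse, fine // coarse)

for window_start, window_end in split_period("2026020100", "2026022000", days=7):
    print(window_start, window_end)
```

A 1x1 grid holds 25 times as many tiles as a 5x5 grid, and every metric carries one value per tile.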
Support & Troubleshooting
If you encounter any issues, experience unexpected behavior, or have feature requests while using the scatter module, please open an issue in the Pikobs GitLab Repository.
Please include your exact command, parameters, and error logs when reporting an issue.