MMD — Maximum Mean Discrepancy (MMD): measures the distance between the distributions of real and synthetic ECGs in a reproducing kernel Hilbert space. A score of 0 means identical distributions. Lower is better.
Reference: Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13, 723–773. https://jmlr.org/papers/v13/gretton12a.html
Dataset | Score
00000 | 0.000000
Lower is better (0 = identical distributions).
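The sketch below shows what the MMD score computes. It is a minimal illustration, not the report's implementation. It assumes a Gaussian RBF kernel with a hand-picked bandwidth `sigma`, because the report does not state its kernel or bandwidth. It uses the biased (V-statistic) estimator from Gretton et al. and takes equal-length feature vectors as plain lists.

```python
import math

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian RBF kernel: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_kernel(A, B, sigma):
    # Average kernel value over all pairs (a, b) with a in A and b in B
    return sum(rbf_kernel(a, b, sigma) for a in A for b in B) / (len(A) * len(B))

def mmd_squared(X, Y, sigma=1.0):
    # Biased (V-statistic) estimate of MMD^2: non-negative up to round-off, exactly 0 when X and Y are the same sample
    return mean_kernel(X, X, sigma) + mean_kernel(Y, Y, sigma) - 2.0 * mean_kernel(X, Y, sigma)

def mmd(X, Y, sigma=1.0):
    # Clamp tiny negative values from floating-point round-off before taking the square root
    return math.sqrt(max(mmd_squared(X, Y, sigma), 0.0))
```

In practice the bandwidth is often set with the median heuristic (the median pairwise distance). Whether this report does so is not stated.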
DTW — Dynamic Time Warping (DTW): measures morphological dissimilarity between real and synthetic ECG waveforms after finding the optimal non-linear alignment along the time axis. Score = mean DTW distance over randomly sampled real/synthetic pairs. Lower is better.
Reference: Berndt, D. J., & Clifford, J. (1994). Using dynamic time warping to find patterns in time series. KDD Workshop, 10(16), 359–370. | Goldberger, A. L., et al. (2000). PhysioBank, PhysioToolkit, and PhysioNet. Circulation, 101(23), e215–e220. https://doi.org/10.1161/01.CIR.101.23.e215
Dataset | Score
00000 | 4.985142
Lower is better (0 = sampled real/synthetic pairs are identical after alignment).
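As a rough illustration of the DTW score, here is the classic dynamic-programming recurrence with an absolute-difference local cost, averaged over randomly sampled pairs. It is a sketch only. It assumes single-lead signals given as lists, no warping-window constraint, and a hypothetical `n_pairs` and seed. The report's actual cost function and sampling scheme are not specified.

```python
import math
import random

def dtw_distance(a, b):
    # D[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or diagonal match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def mean_pairwise_dtw(real, synth, n_pairs=100, seed=0):
    # Average DTW distance over randomly drawn (real, synthetic) pairs
    rng = random.Random(seed)
    total = sum(dtw_distance(rng.choice(real), rng.choice(synth)) for _ in range(n_pairs))
    return total / n_pairs
```

The quadratic table is fine for short beats. For full recordings, a windowed variant (e.g. a Sakoe–Chiba band) is the usual way to cut cost.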
PRD — Percent Root-mean-square Difference (PRD): measures waveform distortion of the nearest synthetic neighbour for each real ECG. PRD = 100 × ‖real − synth‖₂ / ‖real‖₂ (%). Lower is better.
Reference: Zigel, Y., Cohen, A., & Katsevman, A. (2000). The weighted diagnostic distortion measure for ECG signal compression. IEEE Transactions on Biomedical Engineering, 47(11), 1422–1430. https://doi.org/10.1109/10.871205
Dataset | Score
00000 | 58.851376
Lower is better (0 = perfect reconstruction).
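The PRD formula above can be sketched directly. This is a minimal version that assumes the nearest synthetic neighbour is chosen by Euclidean distance and that per-recording PRD values are averaged. Neither choice is confirmed by the report.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def prd(real, synth):
    # PRD (%) = 100 * ||real - synth||_2 / ||real||_2
    return 100.0 * l2_norm([r - s for r, s in zip(real, synth)]) / l2_norm(real)

def mean_nearest_neighbour_prd(real_set, synth_set):
    # For each real ECG, pick the synthetic ECG at the smallest Euclidean distance, then average PRD
    scores = []
    for r in real_set:
        nearest = min(synth_set, key=lambda s: l2_norm([a - b for a, b in zip(r, s)]))
        scores.append(prd(r, nearest))
    return sum(scores) / len(scores)
```

PRD is undefined for an all-zero real signal (division by zero). Baseline handling therefore matters.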
PSD — PSD Divergence: Jensen–Shannon divergence between the mean normalised power spectral densities of real and synthetic ECGs, computed per lead (frequency-domain similarity). Range: [0, ln 2]. Lower is better.
Reference: Welch, P. D. (1967). The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Electroacoustics, 15(2), 70–73. https://doi.org/10.1109/TAU.1967.1161901
Dataset | Score
00000 | -0.000000
Lower is better (0 = identical mean spectra).
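Here is a minimal sketch of the PSD divergence for one lead. It normalises each recording's spectrum, averages across recordings, and takes the Jensen–Shannon divergence in nats, which is why the range is [0, ln 2]. For brevity it swaps in a plain single-window periodogram computed by a naive DFT instead of Welch's averaged periodogram. It also assumes all recordings have equal length.

```python
import cmath
import math

def periodogram(x):
    # One-sided power spectrum |X_k|^2 / N for bins k = 0..N//2 via a naive DFT
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
        for k in range(n // 2 + 1)
    ]

def normalise(p):
    # Scale a non-negative spectrum so it sums to 1 (assumes it is not all zeros)
    total = sum(p)
    return [v / total for v in p]

def js_divergence(p, q):
    # Jensen-Shannon divergence in nats, bounded by ln 2
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    def kl(u, v):
        return sum(a * math.log(a / b) for a, b in zip(u, v) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mean_normalised_psd(recordings):
    # Normalise each recording's spectrum, then average bin by bin
    spectra = [normalise(periodogram(r)) for r in recordings]
    return normalise([sum(col) / len(spectra) for col in zip(*spectra)])

def psd_divergence(real, synth):
    # real, synth: lists of single-lead recordings of equal length
    return js_divergence(mean_normalised_psd(real), mean_normalised_psd(synth))
```

The divergence is non-negative in exact arithmetic. A displayed value such as -0.000000 is most plausibly floating-point round-off around zero.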
FD — Fréchet Distance (FD): measures the distance between multivariate Gaussians fitted to ECG feature embeddings of real and synthetic sets per lead. Analogous to FID for images. Lower is better.
Reference: Heusel, M., et al. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. NeurIPS, 30. https://arxiv.org/abs/1706.08500 | Thambawita, V., et al. (2021). DeepFake electrocardiograms. Scientific Reports, 11, 21896. https://doi.org/10.1038/s41598-021-01295-2
Dataset | Score
00000 | 0.000000
Lower is better (0 = identical distributions).
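The full FID-style formula is ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½), which needs a matrix square root. The sketch below instead uses a diagonal-covariance simplification, where the trace term reduces to Σₖ(σ₁ₖ − σ₂ₖ)². It is illustrative only and is not necessarily what the report computes. The feature embedding is also unspecified, so plain feature vectors are assumed. Like FID, it returns the squared distance.

```python
import math

def mean_and_variance(X):
    # Per-dimension mean and unbiased variance of a list of feature vectors (needs >= 2 vectors)
    n, d = len(X), len(X[0])
    mu = [sum(x[k] for x in X) / n for k in range(d)]
    var = [sum((x[k] - mu[k]) ** 2 for x in X) / (n - 1) for k in range(d)]
    return mu, var

def frechet_distance_diagonal(X, Y):
    # Squared Frechet distance between Gaussians with diagonal covariances:
    # ||mu1 - mu2||^2 + sum_k (sigma1_k - sigma2_k)^2
    mu1, var1 = mean_and_variance(X)
    mu2, var2 = mean_and_variance(Y)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var1, var2))
    return mean_term + cov_term
```

Dropping the off-diagonal covariance ignores correlations between features. The full matrix form is the standard choice when a linear-algebra library is available.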
PSNR — Peak Signal-to-Noise Ratio (PSNR): measures waveform reconstruction quality between real ECGs and their nearest synthetic neighbours. Reported in dB; higher is better.
Reference: Huynh-Thu, Q., & Ghanbari, M. (2008). Scope of validity of PSNR in image/video quality assessment. Electronics Letters, 44(13), 800–801. https://doi.org/10.1049/el:20080522
Dataset | Score
00000 | 59.714712
Higher is better.
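PSNR is 10·log₁₀(peak² / MSE) in dB. For images the peak is the maximum pixel value; an ECG has no fixed peak. This sketch therefore assumes the peak is the maximum absolute amplitude of the real signal. That is a hypothetical choice: the report does not say which peak it uses, and the choice shifts the score. The pair is assumed to be a real ECG and its already-selected nearest synthetic neighbour.

```python
import math

def psnr(real, synth, peak=None):
    # Mean squared error between the real ECG and its synthetic neighbour
    mse = sum((r - s) ** 2 for r, s in zip(real, synth)) / len(real)
    if peak is None:
        peak = max(abs(r) for r in real)  # assumption: peak = max |amplitude| of the real signal
    if mse == 0.0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)
```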
SSIM — Structural Similarity Index (SSIM): measures luminance, contrast, and structural similarity between real and nearest-synthetic ECG waveforms. Range [-1, 1]; higher is better.
Reference: Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612. https://doi.org/10.1109/TIP.2003.819861
Dataset | Score
00000 | 0.595515
Higher is better.
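Here is a simplified 1-D SSIM, computed as a single global window over the whole waveform. It uses the constants K₁ = 0.01 and K₂ = 0.03 from Wang et al. The original method averages SSIM over local sliding windows, so this global form is a deliberate simplification. Taking `data_range` from the real signal's peak-to-peak amplitude is an assumption.

```python
def global_ssim(x, y, data_range=None, k1=0.01, k2=0.03):
    # Single-window SSIM: luminance, contrast and structure terms combined
    n = len(x)
    if data_range is None:
        data_range = max(x) - min(x)  # assumption: dynamic range of the real signal
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))
```

A constant real signal gives `data_range = 0`, which can make the denominator zero. A windowed implementation has the same edge case.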
SWD — Sliced Wasserstein Distance (SWD): approximates the Wasserstein distance between real and synthetic ECG feature distributions by averaging 1-D projections. Lower is better (0 = identical distributions).
Reference: Kolouri, S., Nadjahi, K., Simsekli, U., Badeau, R., & Rohde, G. (2019). Generalized sliced Wasserstein distances. Advances in Neural Information Processing Systems (NeurIPS), 32. https://proceedings.neurips.cc/paper/2019/hash/f0935e4cd5920aa6c7c996a5ee53a70f-Abstract.html
Dataset | Score
00000 | 0.000000
Lower is better (0 = identical distributions).
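The averaging over 1-D projections can be sketched as follows. The sketch projects both sets onto random unit directions, sorts the projections, and averages the 1-D Wasserstein-1 distance (the mean absolute difference of sorted values). It assumes equal sample sizes, p = 1, and a hypothetical number of projections and seed. The report's exact variant is not stated.

```python
import math
import random

def sliced_wasserstein(X, Y, n_proj=50, seed=0):
    # X, Y: equal-size lists of feature vectors
    assert len(X) == len(Y), "this sketch assumes equal sample sizes"
    rng = random.Random(seed)
    d = len(X[0])
    total = 0.0
    for _ in range(n_proj):
        # Random direction on the unit sphere
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # 1-D projections; sorted order gives the optimal 1-D transport plan
        px = sorted(sum(a * t for a, t in zip(x, theta)) for x in X)
        py = sorted(sum(a * t for a, t in zip(y, theta)) for y in Y)
        total += sum(abs(a - b) for a, b in zip(px, py)) / len(px)
    return total / n_proj
```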
MAE — Mean Absolute Error (MAE): mean absolute pointwise deviation between real ECGs and their nearest synthetic neighbours, computed per lead. Lower is better (0 = perfect reconstruction).
Reference: Zigel, Y., Cohen, A., & Katz, A. (2000). The weighted diagnostic distortion (WDD) measure for ECG signal compression. IEEE Transactions on Biomedical Engineering, 47(11), 1422–1430. https://doi.org/10.1109/10.871206
Dataset | Score
00000 | 0.057461
Lower is better (0 = perfect reconstruction).
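A short sketch of the per-lead computation. It assumes each recording is a list of leads, that the nearest synthetic neighbour has already been paired with each real recording, and that the score averages first over leads and then over pairs. That aggregation order is an assumption.

```python
def mae_per_lead(real, synth):
    # real, synth: one recording each, given as a list of leads (lists of samples)
    return [
        sum(abs(r - s) for r, s in zip(r_lead, s_lead)) / len(r_lead)
        for r_lead, s_lead in zip(real, synth)
    ]

def mean_mae(pairs):
    # pairs: list of (real_recording, nearest_synthetic_recording)
    per_pair = [sum(mae_per_lead(r, s)) / len(r) for r, s in pairs]
    return sum(per_pair) / len(per_pair)
```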
HRV — Heart Rate Variability (HRV): compares the distributions of SDNN, rMSSD, and pNN50 between real and synthetic ECGs via Jensen–Shannon divergence. Lower is better.
Reference: Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology (1996). Heart rate variability: standards of measurement, physiological interpretation, and clinical use. Circulation, 93(5), 1043–1065. https://doi.org/10.1161/01.CIR.93.5.1043
Dataset | Score
00000 | 0.000000
Lower is better (0 = identical distributions).
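The three HRV features compared by this metric can be computed from RR intervals in milliseconds as sketched below. The feature distributions from real and synthetic recordings are then compared with a Jensen–Shannon divergence, not shown here. Using the sample standard deviation (n − 1) for SDNN is an assumption, since implementations differ. The sketch needs at least three RR intervals.

```python
import math

def hrv_features(rr_ms):
    # rr_ms: successive RR intervals of one recording, in milliseconds
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n
    # SDNN: standard deviation of RR intervals (sample estimate, n - 1)
    sdnn = math.sqrt(sum((r - mean_rr) ** 2 for r in rr_ms) / (n - 1))
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    # rMSSD: root mean square of successive differences
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # pNN50: percentage of successive differences larger than 50 ms
    pnn50 = 100.0 * sum(1 for d in diffs if abs(d) > 50.0) / len(diffs)
    return {"sdnn_ms": sdnn, "rmssd_ms": rmssd, "pnn50": pnn50}
```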
PR_DIST — Distribution Precision & Recall: estimates fidelity (Precision) and diversity (Recall) of synthetic ECG distributions relative to real ECG distributions via k-NN manifold estimation. Score = F1 of Precision and Recall. Higher is better.
Reference: Kynkäänniemi, T., Karras, T., Laine, S., Lehtinen, J., & Aila, T. (2019). Improved Precision and Recall Metric for Assessing Generative Models. Advances in Neural Information Processing Systems (NeurIPS), 32. https://proceedings.neurips.cc/paper/2019/hash/0234c510bc6d908b28c70ff313743079-Abstract.html
Dataset | Score
00000 | 1.000000
Higher is better.
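The k-NN manifold estimate from Kynkäänniemi et al. can be sketched as follows. Each sample in a set gets a hypersphere whose radius is its distance to its k-th nearest neighbour in the same set. Precision is the fraction of synthetic samples inside the real manifold, and recall is the fraction of real samples inside the synthetic manifold. The brute-force distances and the default `k` are illustrative assumptions; `k` must be smaller than the set size.

```python
import math

def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def _kth_radii(S, k):
    # Distance from each point to its k-th nearest neighbour within S (excluding itself)
    return [sorted(_dist(p, q) for j, q in enumerate(S) if j != i)[k - 1] for i, p in enumerate(S)]

def _coverage(ref, radii, queries):
    # Fraction of queries inside at least one hypersphere centred on a reference point
    inside = sum(1 for q in queries if any(_dist(q, r) <= rad for r, rad in zip(ref, radii)))
    return inside / len(queries)

def precision_recall_f1(real, synth, k=3):
    precision = _coverage(real, _kth_radii(real, k), synth)  # fidelity
    recall = _coverage(synth, _kth_radii(synth, k), real)  # diversity
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```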
SQI — Signal Quality Index (SQI): mean Pearson correlation between synthetic ECGs and the real-ECG QRS morphology template. Range [-1, 1]; higher is better.
Reference: Clifford, G. D., et al. (2017). AF classification from a short single lead ECG recording: the PhysioNet/Computing in Cardiology Challenge 2017. Computing in Cardiology, 44, 1–4. https://doi.org/10.22489/CinC.2017.065-469
Dataset | Score
00000 | 0.031566
Higher is better.
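A sketch of the template-correlation idea behind this SQI. It assumes beats have already been segmented around the QRS complex and resampled to a common length, and that the "real-ECG QRS morphology template" is the sample-wise mean of the real beats. Both are assumptions. Beat detection itself is omitted.

```python
import math

def pearson(a, b):
    # Pearson correlation of two equal-length sequences (undefined for constant inputs)
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def template_sqi(real_beats, synth_beats):
    # Template = sample-wise mean of aligned real beats
    template = [sum(col) / len(real_beats) for col in zip(*real_beats)]
    # Score = mean correlation of each synthetic beat with the template
    return sum(pearson(b, template) for b in synth_beats) / len(synth_beats)
```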
NN_DIST — Nearest-Neighbour Distance (NND): mean distance from each real ECG to its nearest synthetic neighbour in statistical feature space. Lower means the synthetic set lies closer to the real set. Authenticity (reported under "extra") = fraction of synthetic samples that do not memorise any real sample.
Reference: Meehan, C., Chaudhuri, K., & Dasgupta, S. (2020). A non-parametric test to detect data-copying in generative models. Proceedings of AISTATS 2020, 119, 3012–3022. https://proceedings.mlr.press/v119/meehan20a.html
Dataset | Score
00000 | 0.000000
Lower is better (0 = every real ECG has an exact synthetic duplicate).
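Here is a minimal sketch of the nearest-neighbour distance and an authenticity fraction. The report does not define when a synthetic sample "memorises" a real one. This sketch uses a hypothetical rule: a synthetic sample counts as a copy if its nearest real sample lies closer than a threshold `tau`. Both the rule and `tau` are illustrative assumptions, and feature vectors are taken as given.

```python
import math

def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_nn_distance(real, synth):
    # Mean distance from each real feature vector to its nearest synthetic one
    return sum(min(_dist(r, s) for s in synth) for r in real) / len(real)

def authenticity(real, synth, tau):
    # Hypothetical copy rule: a synthetic sample within tau of some real sample counts as memorised
    copies = sum(1 for s in synth if min(_dist(s, r) for r in real) < tau)
    return 1.0 - copies / len(synth)
```

A score of exactly 0 therefore signals exact duplication. Read it together with the authenticity value rather than as a sign of ideal quality.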
HEART_RATE — Heart Rate Distribution: Jensen–Shannon divergence between real and synthetic heart-rate (BPM) distributions derived from R-peak detection. Lower is better.
Reference: Pan, J., & Tompkins, W. J. (1985). A real-time QRS detection algorithm. IEEE Transactions on Biomedical Engineering, 32(3), 230–236. https://doi.org/10.1109/TBME.1985.325532
Dataset | Score
00000 | 0.000000
Lower is better (0 = identical distributions).
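The sketch below takes R-peak sample indices as given; the Pan–Tompkins detection step is omitted. It converts the mean RR interval to BPM, bins per-recording heart rates into a histogram, and compares the two histograms with a Jensen–Shannon divergence. The 30–200 BPM range and 17 bins of 10 BPM are hypothetical choices, not the report's.

```python
import math

def heart_rate_bpm(r_peaks, fs):
    # Mean heart rate of one recording: 60 / mean RR interval (seconds)
    rr_s = [(b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]
    return 60.0 / (sum(rr_s) / len(rr_s))

def bpm_histogram(values, lo=30.0, hi=200.0, bins=17):
    # Hypothetical binning; out-of-range values are clipped into the edge bins
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(max(int((v - lo) // width), 0), bins - 1)] += 1
    return [c / len(values) for c in counts]

def js_divergence(p, q):
    # Jensen-Shannon divergence in nats
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    def kl(u, v):
        return sum(a * math.log(a / b) for a, b in zip(u, v) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def heart_rate_divergence(real_hr, synth_hr):
    return js_divergence(bpm_histogram(real_hr), bpm_histogram(synth_hr))
```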
SPECTRAL_ENTROPY — Spectral Entropy Divergence: JS divergence between real and synthetic distributions of per-recording spectral entropy (bits). Lower is better (0 = identical distributions).
Reference: Inouye, T., et al. (1991). Quantification of EEG irregularity by use of the entropy of the power spectrum. Electroencephalography and Clinical Neurophysiology, 79(3), 204–210. https://doi.org/10.1016/0013-4694(91)90138-T
Dataset | Score
00000 | 0.000000
Lower is better (0 = identical distributions).
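Per-recording spectral entropy in bits is the Shannon entropy of the normalised power spectrum. Here is a sketch using a naive-DFT periodogram; the report's spectral estimator is not stated. The resulting per-recording values would then be histogrammed and compared with a Jensen–Shannon divergence, not shown here.

```python
import cmath
import math

def spectral_entropy_bits(x):
    # One-sided power spectrum via a naive DFT (bins 0..N//2)
    n = len(x)
    power = [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]
    total = sum(power)
    probs = [p / total for p in power]  # treat the spectrum as a probability distribution
    # Shannon entropy in bits; zero-power bins contribute nothing
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

A flat spectrum (an impulse) gives the maximum, log₂ of the number of bins. A pure tone concentrated in one bin gives a value near 0.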
Metrics Summary
All metric scores in a single table.
Metric | Direction | 00000
mmd | ↓ lower | 0.000000
dtw | ↓ lower | 4.985142
prd | ↓ lower | 58.851376
psd | ↓ lower | -0.000000
fd | ↓ lower | 0.000000
psnr | ↑ higher | 59.714712
ssim | ↑ higher | 0.595515
swd | ↓ lower | 0.000000
mae | ↓ lower | 0.057461
hrv | ↓ lower | 0.000000
pr_dist | ↑ higher | 1.000000
sqi | ↑ higher | 0.031566
nn_dist | ↓ lower | 0.000000
heart_rate | ↓ lower | 0.000000
spectral_entropy | ↓ lower | 0.000000
Overall Quality Summary
Radar chart: each axis is one metric, normalised so the outer ring = best quality.
Clinical Feature Summary
Mean ± std of each feature across recordings. N/A indicates that the feature could not be extracted.
Feature | Real (mean ± std) | 00000 (mean ± std)
rr_mean_ms | 826.397 ± 157.509 | 826.397 ± 157.509
rr_std_ms | 70.176 ± 95.659 | 70.176 ± 95.659
hr_bpm | 75.466 ± 16.091 | 75.466 ± 16.091
pr_interval_ms | N/A | N/A
qrs_duration_ms | 86.515 ± 34.907 | 86.515 ± 34.907
qt_interval_ms | N/A | N/A
qtc_bazett_ms | N/A | N/A
p_wave_duration_ms | N/A | N/A
hrv_sdnn_ms | 71.968 ± 82.548 | 71.968 ± 82.548
hrv_rmssd_ms | 87.978 ± 112.677 | 87.978 ± 112.677
hrv_pnn50 | 22.636 ± 27.840 | 22.636 ± 27.840
st_deviation_mv | N/A | N/A
t_amplitude_mv | 0.148 ± 0.102 | 0.148 ± 0.102
Clinical Feature Distributions (2D) — 00000
Scatter plots comparing real (blue) vs synthetic (orange) feature distributions; each point represents one recording.
Clinical Feature Distributions (3D) — 00000
Interactive 3D scatter: rotate by dragging. Real = blue, Synthetic = orange.
References
MMD: Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13, 723–773. https://jmlr.org/papers/v13/gretton12a.html
DTW: Berndt, D. J., & Clifford, J. (1994). Using dynamic time warping to find patterns in time series. KDD Workshop, 10(16), 359–370. | Goldberger, A. L., et al. (2000). PhysioBank, PhysioToolkit, and PhysioNet. Circulation, 101(23), e215–e220. https://doi.org/10.1161/01.CIR.101.23.e215
PRD: Zigel, Y., Cohen, A., & Katsevman, A. (2000). The weighted diagnostic distortion measure for ECG signal compression. IEEE Transactions on Biomedical Engineering, 47(11), 1422–1430. https://doi.org/10.1109/10.871205
PSD: Welch, P. D. (1967). The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Electroacoustics, 15(2), 70–73. https://doi.org/10.1109/TAU.1967.1161901
FD: Heusel, M., et al. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. NeurIPS, 30. https://arxiv.org/abs/1706.08500 | Thambawita, V., et al. (2021). DeepFake electrocardiograms. Scientific Reports, 11, 21896. https://doi.org/10.1038/s41598-021-01295-2
PSNR: Huynh-Thu, Q., & Ghanbari, M. (2008). Scope of validity of PSNR in image/video quality assessment. Electronics Letters, 44(13), 800–801. https://doi.org/10.1049/el:20080522
SSIM: Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612. https://doi.org/10.1109/TIP.2003.819861
SWD: Kolouri, S., Nadjahi, K., Simsekli, U., Badeau, R., & Rohde, G. (2019). Generalized sliced Wasserstein distances. Advances in Neural Information Processing Systems (NeurIPS), 32. https://proceedings.neurips.cc/paper/2019/hash/f0935e4cd5920aa6c7c996a5ee53a70f-Abstract.html
MAE: Zigel, Y., Cohen, A., & Katz, A. (2000). The weighted diagnostic distortion (WDD) measure for ECG signal compression. IEEE Transactions on Biomedical Engineering, 47(11), 1422–1430. https://doi.org/10.1109/10.871206
HRV: Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology (1996). Heart rate variability: standards of measurement, physiological interpretation, and clinical use. Circulation, 93(5), 1043–1065. https://doi.org/10.1161/01.CIR.93.5.1043
PR_DIST: Kynkäänniemi, T., Karras, T., Laine, S., Lehtinen, J., & Aila, T. (2019). Improved Precision and Recall Metric for Assessing Generative Models. Advances in Neural Information Processing Systems (NeurIPS), 32. https://proceedings.neurips.cc/paper/2019/hash/0234c510bc6d908b28c70ff313743079-Abstract.html
SQI: Clifford, G. D., et al. (2017). AF classification from a short single lead ECG recording: the PhysioNet/Computing in Cardiology Challenge 2017. Computing in Cardiology, 44, 1–4. https://doi.org/10.22489/CinC.2017.065-469
NN_DIST: Meehan, C., Chaudhuri, K., & Dasgupta, S. (2020). A non-parametric test to detect data-copying in generative models. Proceedings of AISTATS 2020, 119, 3012–3022. https://proceedings.mlr.press/v119/meehan20a.html
HEART_RATE: Pan, J., & Tompkins, W. J. (1985). A real-time QRS detection algorithm. IEEE Transactions on Biomedical Engineering, 32(3), 230–236. https://doi.org/10.1109/TBME.1985.325532
SPECTRAL_ENTROPY: Inouye, T., et al. (1991). Quantification of EEG irregularity by use of the entropy of the power spectrum. Electroencephalography and Clinical Neurophysiology, 79(3), 204–210. https://doi.org/10.1016/0013-4694(91)90138-T