{{ meta.report_title }}{{ meta.report_subtitle }}
Generated on {{ report_creation_datetime.strftime("%d %b %Y, %H:%M") }} ● {{ "{:,d}".format(meta.rows_original) }} original samples, {{ "{:,d}".format(meta.rows_synthetic) }} synthetic samples {% if meta.report_extra_info %} ● {{ meta.report_extra_info }} {% endif %}
{% if is_model_report %}
Accuracy
{{html_assets['info.svg']}}
{{ "{:.1%}".format(metrics.accuracy.overall) }}
({{ "{:.1%}".format(metrics.accuracy.overall_max) }})
Similarity
{{html_assets['info.svg']}}
Distances
{{html_assets['info.svg']}}
Correlations
{{ correlation_matrix_html_chart }}
Univariate Distributions
{% for uni_plots_row in univariate_html_charts | batch(3, ' ') %}
{% for uni_plot in uni_plots_row %}
{{ uni_plot }}
{% endfor %}
{% endfor %}
Bivariate Distributions
{% for biv_plots_row in bivariate_html_charts_tgt | batch(3, ' ') %}
{% for biv_plot in biv_plots_row %}
{{ biv_plot }}
{% endfor %}
{% endfor %}
Bivariate Distributions for context
{% for biv_plots_row in bivariate_html_charts_ctx | batch(3, ' ') %}
{% for biv_plot in biv_plots_row %}
{{ biv_plot }}
{% endfor %}
{% endfor %}
Coherence / Auto-correlations
{% for biv_plots_row in bivariate_html_charts_nxt | batch(3, ' ') %}
{% for biv_plot in biv_plots_row %}
{{ biv_plot }}
{% endfor %}
{% endfor %}
Accuracy
Column | Univariate | {% if 'bivariate' in accuracy_table_by_column %}Bivariate | {% endif %} {% if 'coherence' in accuracy_table_by_column %}Coherence | {% endif %}
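---|---|{% if 'bivariate' in accuracy_table_by_column %}---|{% endif %}{% if 'coherence' in accuracy_table_by_column %}---|{% endif %}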
{{ row['column'] }} | {{ "{:.1%}".format(row['univariate']) }} | {% if 'bivariate' in accuracy_table_by_column %}{{ "{:.1%}".format(row['bivariate']) }} | {% endif %} {% if 'coherence' in accuracy_table_by_column %}{{ "{:.1%}".format(row['coherence']).replace('nan%', '-') }} | {% endif %}
Total | {{ "{:.1%}".format(metrics.accuracy.univariate) }} | {% if 'bivariate' in accuracy_table_by_column %}{{ "{:.1%}".format(metrics.accuracy.bivariate) }} | {% endif %} {% if 'coherence' in accuracy_table_by_column %}{{ "{:.1%}".format(metrics.accuracy.coherence) }} | {% endif %}
{{ accuracy_matrix_html_chart }}
Explainer
Accuracy of synthetic data is assessed by comparing the distributions of the synthetic data (shown in green) and the original data (shown in gray).
For each distribution plot, we sum the absolute deviations between the two distributions across all categories and halve the result to obtain the so-called total variation distance (TVD). The accuracy is then reported as 100% - TVD.
These accuracies are calculated for all univariate and bivariate distributions, and the final accuracy score is the average across all of them.
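The accuracy calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the report's actual implementation: it assumes each column has already been discretized into categories (numeric columns binned beforehand), and the helper names `category_shares`, `total_variation_distance` and `accuracy` are hypothetical.

```python
from collections import Counter


def category_shares(values):
    """Relative frequency of each category in a list of values."""
    counts = Counter(values)
    total = len(values)
    return {cat: n / total for cat, n in counts.items()}


def total_variation_distance(original, synthetic):
    """Half the sum of absolute share differences over all categories."""
    p = category_shares(original)
    q = category_shares(synthetic)
    categories = set(p) | set(q)  # categories missing on one side count as share 0
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def accuracy(original, synthetic):
    """Accuracy as 1 - TVD, i.e. 100% minus the total variation distance."""
    return 1.0 - total_variation_distance(original, synthetic)


# Univariate: compare a single column.
uni = accuracy(["a", "a", "b", "b"], ["a", "a", "a", "b"])

# Bivariate: treat each (column_x, column_y) pair as one joint category.
biv = accuracy(
    list(zip(["a", "a", "b", "b"], [1, 2, 1, 2])),
    list(zip(["a", "a", "a", "b"], [1, 2, 1, 2])),
)

# Hypothetical overall score: plain average over all computed accuracies.
overall = sum([uni, biv]) / 2
```

Here `uni` evaluates to 0.75: the original shares are 50%/50%, the synthetic shares are 75%/25%, so the TVD is (0.25 + 0.25) / 2 = 0.25.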
Similarity
{{ similarity_pca_html_chart }}
Explainer
These plots show the first three principal components of the training samples, synthetic samples, and (if available) holdout samples within the embedding space. The black dots mark the centroids of the respective sample sets.
The similarity metric then measures the cosine similarity between these centroids, e.g. between the centroid of the synthetic samples and that of the training samples. We expect this cosine similarity to be close to 1, and close to the similarity between the holdout and training centroids, indicating that the synthetic samples are as similar to the training samples as the holdout samples are.
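A minimal sketch of the centroid comparison described above, in plain Python. It assumes every sample has already been mapped to a numeric embedding vector (how the report builds its embedding space is not shown here), and the function names `centroid` and `cosine_similarity` are hypothetical helpers.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (undefined for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings; real ones would come from the report's embedding model.
training = [[1.0, 0.0], [0.0, 1.0]]
synthetic = [[1.0, 1.0], [0.5, 0.5]]
holdout = [[0.9, 0.1], [0.1, 0.8]]

sim_synthetic = cosine_similarity(centroid(training), centroid(synthetic))
sim_holdout = cosine_similarity(centroid(training), centroid(holdout))
```

Both values should be close to 1 and close to each other. Because cosine similarity ignores vector length, it captures the direction of the centroids rather than their absolute position.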
Distances
 | Synthetic vs. Training Data | {% if metrics.distances.ims_holdout is not none %}(Synthetic vs. Holdout Data) | {% endif %}
---|---|{% if metrics.distances.ims_holdout is not none %}---|{% endif %}
Identical Matches | {{ "{:.1%}".format(metrics.distances.ims_training) }} | {% if metrics.distances.ims_holdout is not none %}({{ "{:.1%}".format(metrics.distances.ims_holdout) }}) | {% endif %}
Average Distances | {{ "{:.3f}".format(metrics.distances.dcr_training) }} | {% if metrics.distances.dcr_holdout is not none %}({{ "{:.3f}".format(metrics.distances.dcr_holdout) }}) | {% endif %}
{{ distances_dcr_html_chart }}
Explainer
Synthetic data should be as close to the original training samples as it is to the original holdout samples, which serve as a reference.
This can be checked empirically by measuring the distance from each synthetic sample to its closest original sample, with the training and holdout sets sampled to be of equal size.
In the visualization above, the distances of synthetic samples to the training samples are displayed in green, and the distances of synthetic samples to the holdout samples (if available) are displayed in gray.
A green line that overlaps with the gray line indicates that the trained model has learned general rules that hold in the holdout samples just as well as in the training samples, rather than reproducing individual training records.
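The distance comparison above can be sketched as follows. This is a simplified illustration under stated assumptions: samples are plain numeric vectors compared with Euclidean distance (the report may measure distances in an embedding space instead), the training and holdout sets have equal size, and `distance_to_closest` and `dcr_summary` are hypothetical helper names.

```python
import math


def distance_to_closest(sample, references):
    """Euclidean distance from one sample to its nearest reference sample."""
    return min(math.dist(sample, ref) for ref in references)


def dcr_summary(synthetic, references):
    """Share of exact matches (distance 0) and the average closest distance."""
    distances = [distance_to_closest(s, references) for s in synthetic]
    identical_share = sum(d == 0.0 for d in distances) / len(distances)
    average_distance = sum(distances) / len(distances)
    return identical_share, average_distance


# Toy data; training and holdout are of equal size, as the text requires.
training = [[0.0, 0.0], [1.0, 1.0]]
holdout = [[0.0, 1.0], [1.0, 0.0]]
synthetic = [[0.0, 0.0], [1.0, 0.5]]

ims_training, dcr_training = dcr_summary(synthetic, training)
ims_holdout, dcr_holdout = dcr_summary(synthetic, holdout)
```

In this toy example one synthetic sample copies a training record exactly, so the training distances (identical share 0.5, average 0.25) are noticeably smaller than the holdout distances (0.0 and 0.75). That gap is the kind of mismatch between the green and gray lines that signals the model reproduced training data rather than general rules.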