Basic Information
% for basic in basics:
| |\
% for column in basic['columns']:
 ${column} |\
% endfor

|---|---|
% for idx, val in enumerate(basic['index']):
| ${val} |\
% for datum in basic['data'][idx]:
 ${datum} |\
% endfor

% endfor
% endfor
- range: [0, 1]. The smaller the value, the closer the synthesized data is to the raw data.
- err: relative error, a measure of the discrepancy between the raw data and the synthesized data.
- jsd: Jensen-Shannon divergence, a measure of the similarity between the probability distributions of the raw and synthesized data.
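
The two metrics above can be sketched in plain Python. This is a minimal illustration, not the code that produces the report: the function names are hypothetical, the relative error assumes a non-zero raw value, and the divergence assumes a base-2 logarithm, which is what keeps it within [0, 1].

```python
import math


def relative_error(raw, synth):
    """Relative error of a synthesized statistic against the raw one.

    Assumption: raw is non-zero.
    """
    return abs(synth - raw) / abs(raw)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    p and q are sequences of probabilities over the same categories.
    Assumption: base-2 logarithm, so the result lies in [0, 1].
    """
    def kl(a, b):
        # Kullback-Leibler divergence; terms with a zero probability contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # Mixture distribution halfway between p and q.
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical distributions give a divergence of 0, and distributions with no overlap give 1, which matches the reading "smaller is closer".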
Attribute Distribution
<%
actives = [ ' active' if idx == 0 else '' for idx in range(len(dists))]
%>
% for active, entry in zip(actives, dists):
${entry['name']}
% endfor
<%
displays = [ 'block' if idx == 0 else 'none' for idx in range(len(dists))]
%>
% for display, entry in zip(displays, dists):
| (%) |\
% for col in entry['columns']:
 ${col} |\
% endfor

|---|---|
| raw |\
% for datum in entry['data'][0]:
 ${datum} |\
% endfor

| synth |\
% for datum in entry['data'][1]:
 ${datum} |\
% endfor


${entry['path']}
% endfor
Pair-wise Correlation
% for title, entry in zip(['Raw Dataset', 'Synthesized Dataset'], corrs):
| ${title} |\
% for column in entry['matrix']['columns']:
 ${column} |\
% endfor

|---|---|
% for idx, val in enumerate(entry['matrix']['index']):
| ${val} |\
% for datum in entry['matrix']['data'][idx]:
 ${datum} |\
% endfor

% endfor

${entry['path']}
% endfor
- range: [0, 1]. The closer the value is to 1, the stronger the correlation.
- Comparing the matrices of the raw and synthesized datasets shows whether the synthesized data has preserved the correlations among columns.
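
The comparison described above can be sketched as follows. The report does not say which correlation measure it uses, so the absolute Pearson coefficient here is only an assumption, chosen because it also falls in [0, 1]. All function names are hypothetical, and the columns are assumed to be numeric and non-constant.

```python
import math


def abs_pearson(xs, ys):
    """Absolute Pearson correlation of two equal-length numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return abs(cov / (norm_x * norm_y))


def correlation_matrix(columns):
    """Pair-wise matrix for a dict that maps column name to a list of values."""
    names = list(columns)
    return [[abs_pearson(columns[a], columns[b]) for b in names] for a in names]


def mean_abs_difference(raw_matrix, synth_matrix):
    """Average cell-wise gap between the raw and synthesized matrices."""
    gaps = [abs(r - s)
            for raw_row, synth_row in zip(raw_matrix, synth_matrix)
            for r, s in zip(raw_row, synth_row)]
    return sum(gaps) / len(gaps)
```

A mean gap near 0 suggests that the synthesized data has kept the pair-wise structure of the raw data; how small is small enough depends on the use case.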
Misclassification Rate by SVM Classifier
<%
svm_actives = [ ' active' if idx == 0 else '' for idx in range(len(svms))]
%>
% for active, entry in zip(svm_actives, svms):
${entry['column']}
% endfor
<%
svm_displays = [ 'block' if idx == 0 else 'none' for idx in range(len(svms))]
%>
% for display, entry in zip(svm_displays, svms):
% if len(entry['path']) == 1:
${entry['path'][0]}
% endif
% if len(entry['path']) == 2:
${entry['path'][0]}
${entry['path'][1]}
% endif
% endfor
- If the misclassification rate on the synthesized data is close to the rate on the original data, the synthesized dataset can be used for SVM classifier modeling.
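
The check above can be sketched in a few lines. The snippet assumes that predictions from an already trained classifier are available as plain label lists. The function names and the 0.05 tolerance are hypothetical and are not thresholds used by this report.

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    wrong = sum(1 for truth, pred in zip(y_true, y_pred) if truth != pred)
    return wrong / len(y_true)


def rates_are_close(raw_rate, synth_rate, tolerance=0.05):
    """True when the two rates differ by at most `tolerance` (assumed value)."""
    return abs(raw_rate - synth_rate) <= tolerance


# Example: labels predicted by a model trained on raw data versus one
# trained on synthesized data, both evaluated on the same test labels.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
pred_from_raw = [1, 0, 1, 0, 0, 0, 1, 0]
pred_from_synth = [1, 0, 0, 0, 0, 0, 1, 0]

raw_rate = misclassification_rate(y_test, pred_from_raw)
synth_rate = misclassification_rate(y_test, pred_from_synth)
usable = rates_are_close(raw_rate, synth_rate, tolerance=0.15)
```

The tolerance is a judgment call: a tighter value demands that the synthesized data reproduce the classifier's behavior more exactly.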