Model Violation Analysis: quantifying un-modeled error
GST datasets are often quite inconsistent with the Markovian gateset model. This is relatively unsurprising, and means only that real qubits often drift or experience other forms of noise that are not stationary and Markovian. A standard way of measuring this model violation is to quantify how plausible it is that the model generated the observed data, expressed as a number of standard deviations (N_\sigma) or a p-value. A large N_\sigma or a small p-value indicates high confidence that the model is incorrect (violated), but says nothing about how much the model would have to change to "fix" it. In this tab, we attempt to quantify the un-modeled error by allowing a certain amount of slack, measured as total variation distance (TVD), in the probabilities predicted by the model. With enough slack the model can predict, and therefore "fit", any data; so we answer the question "What is the minimum amount of slack needed to predict the data?", and allocate that slack on a per-gate basis.
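As a rough illustration of this slack (a minimal Python sketch with made-up numbers, not pyGSTi's actual implementation), a circuit's predicted outcome distribution is allowed to move by up to that circuit's budget, measured in TVD, before being compared with the observed frequencies:

    import numpy as np

    def tvd(p, q):
        # Total variation distance between two outcome distributions
        return 0.5 * np.sum(np.abs(np.asarray(p) - np.asarray(q)))

    p_model = np.array([0.70, 0.30])   # probabilities predicted by the model (made up)
    f_obs   = np.array([0.62, 0.38])   # observed outcome frequencies (made up)
    budget  = 0.05                     # this circuit's slack (TVD) budget (made up)

    # Any distribution within `budget` of the model's prediction counts as "predicted",
    # so only the remaining TVD contributes to model violation.
    residual_tvd = max(0.0, tvd(p_model, f_obs) - budget)
    print(residual_tvd)   # ~0.03 of TVD is left unexplained for this circuit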
{% if config['ShowUnmodeledError'] %}
{% if config['CombineRobust'] %}
If the estimate currently selected on the sidebar used this technique (often indicated by a .wildcard suffix), then this tab shows several important quantities. Before describing these, however, note that all of the other model violation tabs (and the relevant figures on the Summary tab) show the model violation without allowing any slack in the predicted probabilities. This tab shows the model violation with the TVD slack described above, so by construction the fit metrics shown here should always look quite good. The first several figures replicate those of the other model violation tabs (except that they allow for slack in the probabilities), and the final table shows how much TVD slack was allowed for each gate. If a figure shows up as N/A, it means that the currently-selected estimate did not have any slack applied to it, and so there is nothing to show.
{% else %}
If the estimate currently selected on the sidebar used this technique (often indicated by a .wildcard suffix), then this tab shows how much TVD slack was allowed for each gate. If a figure shows up as N/A, it means that the currently-selected estimate was not altered at all.
{% endif %}
{{ unmodeled_error_budget_table|render }}
Per-gate unmodeled error budget. The model violation plots on this tab are computed using probabilities that are not exactly those predicted by the chosen model. Instead, the TVD between the model's exact probabilities for a circuit's outcomes and the probabilities used to compute the model violation may be as large as that circuit's unmodeled error budget. This budget is computed by simply adding up the amount given in this table for each gate occurrence in the circuit.
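For concreteness, here is a minimal Python sketch of that bookkeeping, using made-up per-gate amounts and a hypothetical circuit rather than values from this report:

    per_gate_budget = {'Gx': 0.002, 'Gy': 0.001, 'Gi': 0.0005}  # made-up per-gate TVD amounts
    circuit = ['Gx', 'Gx', 'Gy', 'Gi', 'Gx']                    # a hypothetical circuit

    # Add the per-gate amount once for every occurrence of that gate in the circuit
    circuit_budget = sum(per_gate_budget[g] for g in circuit)
    print(circuit_budget)   # 0.0075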
{% if config['DiamondDistanceWildcard'] %}
Per-gate Diamond Distance and Wildcard Budget. This plot summarizes how well GST was able to fit the data -- or subsets of it -- to a gateset. The lower segment of each bar indicates the gauge-optimized diamond distance between that estimated gate and its target. The upper segment corresponds to the amount of wildcard error that had to be added to the gate in order to restore consistency with the experimentally measured data. This wildcard is assigned according to a model in which the total is allocated among the gates in proportion to each gate's diamond distance. This gives a sense of how each gate's error divides between Markovian and non-Markovian sources.
{{ unmodeled_error_ddist_bar_plot|render }}
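A minimal Python sketch of this proportional allocation, with made-up diamond distances and a made-up total wildcard budget (not values from this report):

    diamond_dist = {'Gx': 0.010, 'Gy': 0.006, 'Gi': 0.004}  # made-up gauge-optimized diamond distances
    total_wildcard = 0.002                                   # made-up total wildcard needed for consistency

    total_dd = sum(diamond_dist.values())
    per_gate_wildcard = {g: total_wildcard * d / total_dd for g, d in diamond_dist.items()}
    # -> roughly {'Gx': 0.001, 'Gy': 0.0006, 'Gi': 0.0004}: each gate's share is
    #    proportional to its diamond distance, as in the allocation model above.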
{% endif %}
{% if config['CombineRobust'] %}
RELAXED Model violation summary. This plot summarizes how well GST was able to fit the data -- or subsets of it -- to a gateset. Bars indicate the difference between the actual and expected log-likelihood values, and are given in units of standard deviations of the appropriate \chi^2 distribution. Each bar corresponds to a subset of the data including only circuits of length up to \sim L; the rightmost bar corresponds to the full dataset. Low values are better (less model violation), and bars are colored according to the star rating found in a later table detailing the overall model violation.
{{ final_model_fit_progress_bar_plot_ume|render }}
RELAXED Histogram of per-circuit model violation. This figure is about goodness-of-fit. When the estimate doesn't fit the data perfectly, we can quantify how badly it fails to predict each individual circuit in the dataset, using the excess loglikelihood (-2\log\mathrm{Pr}(\mathrm{data}|\mathrm{gateset})) above and beyond the minimum value (-2 \log \mathrm{Pr}(\mathrm{data}|\mathrm{observed\ frequencies})). This plot shows a histogram of those values for all the circuits in the dataset. Ideally, they should follow the \chi^2 distribution shown by the solid line. Red indicates data that are inconsistent with the model at the 0.95 confidence level, as shown in more detail in the Model Violation tab.
{{ final_model_fit_histogram_ume|render }}
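A minimal Python sketch of the quantity being histogrammed, using made-up counts and probabilities for a single two-outcome circuit (pyGSTi's internal computation differs in its details):

    import numpy as np

    counts  = np.array([55, 45])       # made-up outcome counts for one circuit (100 shots)
    p_model = np.array([0.62, 0.38])   # made-up probabilities predicted by the estimate

    f_obs = counts / counts.sum()      # observed frequencies (maximum-likelihood probabilities)

    def logl(probs):
        # Multinomial loglikelihood, up to a probability-independent combinatorial constant
        return np.sum(counts * np.log(probs))

    two_delta_logl = 2 * (logl(f_obs) - logl(p_model))
    print(two_delta_logl)   # ~2.0; ideally such values follow the chi^2 distribution shown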
RELAXED Detailed overall model violation. This table provides a detailed look at how the observed model violation -- defined by how badly the GST model fits the data -- evolves as more and more of the data are incorporated into the fit. PyGSTi fits the data iteratively, starting by just fitting data from the shortest circuits (L=1), and then adding longer and longer sequences. Each subset of the data, defined by its maximum sequence length L, yields an independent fit that is analyzed here. The key quantity is the difference between the observed and expected maximum loglikelihood (\log(\mathcal{L})). If the model fits, then 2\Delta\log(\mathcal{L}) should be a \chi^2_k random variable, where k (the degrees of freedom) is the difference between N_S (the number of independent data points) and N_p (the number of model parameters). So 2\Delta\log(\mathcal{L}) should lie in [k-\sqrt{2k},k+\sqrt{2k}], and N_\sigma = (2\Delta\log(\mathcal{L})-k)/\sqrt{2k} quantifies how many standard deviations it falls above the mean (a p-value can be straightforwardly derived from N_\sigma). The rating from 1 to 5 stars gives a very crude indication of goodness of fit. Heading tool tips provide descriptions of each column's value.
{{ final_model_fit_progress_table_ume|render }}
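A minimal Python sketch of the N_\sigma computation described above, with made-up values for 2\Delta\log(\mathcal{L}), N_S, and N_p:

    import math

    two_delta_logl = 1.25e4   # made-up observed 2*Delta(log L) for one iteration
    N_s = 10000               # made-up number of independent data points
    N_p = 2000                # made-up number of model parameters

    k = N_s - N_p                                      # chi^2 degrees of freedom
    N_sigma = (two_delta_logl - k) / math.sqrt(2 * k)
    print(N_sigma)   # ~35.6: far above the expected window [k - sqrt(2k), k + sqrt(2k)]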
RELAXED Per-circuit model violation vs. circuit length. The fit's total 2\Delta\log(\mathcal{L}) is a sum over all N_s circuits used for GST. This plot shows 2\Delta\log(\mathcal{L}) for each individual circuit, plotted against that circuit's length (on the x-axis). Certain forms of non-Markovian noise, like slow drift, produce a characteristic linear relationship. Note that the length plotted here is the actual length of the circuit, not its nominal L.
{{ final_model_fit_colorscatter_plot_ume|render }}
{{ final_model_fit_colorbox_plot_ume|render }}
RELAXED Per-sequence model violation box plot. This plot shows the 2\Delta\log(\mathcal{L}) contribution for each individual circuit in the dataset. Each box represents a single gate sequence, and its color indicates whether GST was able to fit the corresponding outcome frequencies well. Shades of white/gray indicate typical (within the expected range) values. Red squares represent statistically significant evidence for model violation (non-Markovianity); the probability that any red squares appear is {{ linlg_pcntle|render }}% when the data really are Markovian. Each square block of pixels (plaquette) corresponds to a particular germ-power "base sequence", and each pixel within a block corresponds to a specific "fiducial pair" -- i.e., a choice of pre- and post-fiducial sequences. The base sequences are arranged by germ (varying from row to row) and by power/length (varying from column to column). Hovering over a colored box will pop up the exact circuit to which it corresponds, the observed frequencies, and the corresponding probabilities predicted by the GST estimate of the gateset. The slider below the figure permits switching between different estimates, labeled by L, which were obtained from subsets of the data that included only base sequences of length up to L.
{% endif %}
{% else %}
Note: Unmodeled error figures are not shown because none of the estimates in this report have significant unmodeled error.
{% endif %}