GST datasets are often very inconsistent with the Markovian gateset model. This is relatively unsurprising, and means only that real qubits often drift or experience other forms of noise that aren't stationary and Markovian. But this "voids the warranty" on GST's results, at least in principle. The properties of the estimated gates usually appear to be meaningful anyway, but when the model is violated, normal methods for generating error bars become radically overoptimistic. As a partial remedy for this, pyGSTi can be configured to generate "robust" analyses of model-violating data, by artificially deprecating data that are inconsistent with the fit (a variant of some robust statistics methods).
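For intuition, here is a minimal Python sketch of the idea (a hypothetical helper, not pyGSTi's internal routine): scaling a circuit's counts down preserves its observed frequencies while reducing the statistical weight that circuit carries in the fit.

```python
# Illustrative sketch only (hypothetical helper, not pyGSTi's internal code):
# "deprecating" a circuit's data by scaling its counts down while keeping the
# observed frequencies fixed.

def scale_counts(counts, scale):
    """Scale a dict of outcome counts by `scale` (0 < scale <= 1).

    The total number of counts shrinks, but the observed frequencies are
    unchanged -- the circuit simply carries less statistical weight in the
    GST objective function.
    """
    return {outcome: n * scale for outcome, n in counts.items()}

# Example: a circuit measured 200 times whose data disagree with the fit.
raw = {'0': 180, '1': 20}          # observed frequencies 0.9 / 0.1
scaled = scale_counts(raw, 0.25)   # -> {'0': 45.0, '1': 5.0}, same frequencies
```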
{% if config['ShowScaling'] %}
{% if config['CombineRobust'] %}
If the estimate currently selected on the sidebar used this technique (often denoted by a .robust suffix), then this tab shows several important quantities. Before describing these, however, it is important to note that all of the other model violation tabs (and the relevant figures in the Summary tab) show you the model violation before any data deprecation was performed. This tab shows the model violation after the data deprecation, and so, by construction, the fit metrics shown here should always look pretty good. The first several figures replicate those of the other model violation tabs (but are computed from the post-scaled data!), and the final plot shows how much each individual experiment (circuit) was deprecated (essentially, by throwing out many of the counts for that circuit while keeping the overall observed frequencies constant). When a figure shows up as N/A, it means that the currently-selected estimate has not been deprecated at all, and so there's nothing to show.
{% else %}
If the estimate currently selected on the sidebar used this technique (often denoted by a .robust suffix), then this tab shows how much each individual experiment (circuit) was deprecated (essentially, by throwing out many of the counts for that circuit while keeping the overall observed frequencies constant). When a figure shows up as N/A, it means that the currently-selected estimate has not been altered at all.
{% endif %}
{% if config['CombineRobust'] %}
SCALED Model violation summary. This plot summarizes how well GST was able to fit the data -- or subsets of it -- to a gateset. Bars indicate the difference between the actual and expected log-likelihood values, and are given in units of standard deviations of the appropriate \chi^2 distribution. Each bar corresponds to a subset of the data including only circuits of length up to \sim L; the rightmost bar corresponds to the full dataset. Low values are better (less model violation), and bars are colored according to the star rating found in a later table detailing the overall model violation.
{{ final_model_fit_progress_bar_plot_scl|render }}
SCALED Histogram of per-circuit model violation. This figure is about goodness-of-fit. When the estimate doesn't fit the data perfectly, we can quantify how badly it fails to predict each individual circuit in the dataset, using the excess loglikelihood (-2\log\mathrm{Pr}(\mathrm{data}|\mathrm{gateset})) above and beyond the minimum value (-2 \log \mathrm{Pr}(\mathrm{data}|\mathrm{observed\ frequencies})). This plot shows a histogram of those values for all the circuits in the dataset. Ideally, they should follow the \chi^2 distribution shown by the solid line. Red indicates data that are inconsistent with the model at the 0.95 confidence level, as shown in more detail in the Model Violation tab.
{{ final_model_fit_histogram_scl|render }}
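For reference, the per-circuit statistic described above can be computed directly from a circuit's outcome counts and the model's predicted probabilities. The sketch below is illustrative only (hypothetical function name, made-up numbers); under a valid model these values should roughly follow the small-degrees-of-freedom \chi^2 distribution drawn as the solid line.

```python
import numpy as np

def two_delta_logl(counts, probs):
    """Excess 2*Delta*log(L) of one circuit: twice the gap between the
    log-likelihood at the observed frequencies and at the model's predicted
    probabilities.  Outcomes with zero counts contribute nothing."""
    total = sum(counts.values())
    val = 0.0
    for outcome, n in counts.items():
        if n > 0:
            f = n / total                          # observed frequency
            val += 2.0 * n * np.log(f / probs[outcome])
    return val

# Example: the model predicts 0.85/0.15 but 180/20 was observed in 200 shots.
print(two_delta_logl({'0': 180, '1': 20}, {'0': 0.85, '1': 0.15}))  # ~4.4
```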
SCALED Detailed overall model violation. This table provides a detailed look at how the observed model violation -- defined by how badly the GST model fits the data -- evolves as more and more of the data are incorporated into the fit. PyGSTi fits the data iteratively, starting by just fitting data from the shortest circuits (L=1), and then adding longer and longer sequences. Each subset of the data, defined by its maximum sequence length L, yields an independent fit that is analyzed here. The key quantity is the difference between the observed and expected maximum loglikelihood (\log(\mathcal{L})). If the model fits, then 2\Delta\log(\mathcal{L}) should be a \chi^2_k random variable, where k (the degrees of freedom) is the difference between N_S (the number of independent data points) and N_p (the number of model parameters). So 2\Delta\log(\mathcal{L}) should lie in [k-\sqrt{2k},k+\sqrt{2k}], and N_\sigma = (2\Delta\log(\mathcal{L})-k)/\sqrt{2k} quantifies how many standard deviations it falls above the mean (a p-value can be straightforwardly derived from N_\sigma). The rating from 1 to 5 stars gives a very crude indication of goodness of fit. Heading tool tips provide descriptions of each column's value.
{{ final_model_fit_progress_table_scl|render }}
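As a worked illustration of the quantities in this table (the numbers below are made up, and the helper is not part of pyGSTi), N_\sigma and a p-value can be obtained from the aggregate 2\Delta\log(\mathcal{L}) and the degrees of freedom k = N_S - N_p:

```python
import numpy as np
from scipy.stats import chi2

def nsigma_and_pvalue(two_dlogl, n_datapoints, n_params):
    """Convert an aggregate 2*Delta*log(L) into N_sigma and a p-value,
    assuming the chi^2_k reference distribution described above."""
    k = n_datapoints - n_params                  # degrees of freedom, k = N_S - N_p
    nsigma = (two_dlogl - k) / np.sqrt(2 * k)    # std. deviations above the mean k
    pvalue = chi2.sf(two_dlogl, k)               # upper-tail probability of chi^2_k
    return nsigma, pvalue

# Made-up example: 2*Delta*log(L) = 1300 from 1120 data points and 60 parameters.
print(nsigma_and_pvalue(1300.0, 1120, 60))       # N_sigma ~ 5.2 -- a clear violation
```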
SCALED Per-circuit model violation vs. circuit length. The fit's total 2\Delta\log(\mathcal{L}) is a sum over all N_s circuits used for GST. This plot shows 2\Delta\log(\mathcal{L}) for each individual circuit, plotted against that circuit's length (on the X axis). Certain forms of non-Markovian noise, like slow drift, produce a characteristic linear relationship. Note that the length plotted here is the actual length of the circuit, not its nominal L.
{{ final_model_fit_colorscatter_plot_scl|render }}
{{ final_model_fit_colorbox_plot_scl|render }}
SCALED Per-sequence model violation box plot. This plot shows the 2\Delta\log(\mathcal{L}) contribution for each individual circuit in the dataset. Each box represents a single gate sequence, and its color indicates whether GST was able to fit the corresponding frequency well. Shades of white/gray indicate typical (within the expected range) values. Red squares represent statistically significant evidence for model violation (non-Markovianity), and the probability that any red squares appear is {{ linlg_pcntle|render }}%% when the data really are Markovian. Each square block of pixels (plaquette) corresponds to a particular germ-power "base sequence", and each pixel within a block corresponds to a specific "fiducial pair" -- i.e., a choice of pre- and post-fiducial sequences. The base sequences are arranged by germ (varying from row to row), and by power/length (varying from column to column). Hovering over a colored box will pop up the exact circuit to which it corresponds, the observed frequencies, and the corresponding probabilities predicted by the GST estimate of the gateset. The slider below the figure permits switching between different estimates, labeled by L, which were obtained from subsets of the data that included only base sequences of length up to L.
{% endif %}
{{ data_scaling_colorbox_plot|render }}
Data scaling factor for each circuit in the dataset. Each colored box represents a single experiment (circuit), arranged in the same way as in other related tabs. A circuit's color indicates how much the original data counts were scaled down when they were used to compute the log-likelihood or \chi^2 for this estimate (and its error bars). A white box (value 1.0) indicates that all of the original data were used, because that circuit was not originally seen to be inconsistent with the fit. On the other hand, gray or black boxes (values between 0 and 1) indicate that the total number of counts for that circuit was scaled down (multiplied by the given factor) to reduce its significance, and therefore that circuit's inconsistency with the fit. Generally, the only circuits scaled down are those deemed significantly inconsistent in the original (unscaled) fit.
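A quick way to see why this scaling tames a misfitting circuit (illustrative numbers only): at fixed observed frequencies, the circuit's 2\Delta\log(\mathcal{L}) contribution is proportional to its total count, so multiplying the counts by a scale factor s shrinks that contribution by the same factor s.

```python
import numpy as np

# Illustrative numbers only.  2*Delta*log(L) = 2*N*sum_i f_i*log(f_i/p_i) is
# linear in the total count N at fixed frequencies f_i, so scaling a circuit's
# counts by s scales its model-violation contribution by s as well.
N = 200                                   # original total counts for the circuit
f = np.array([0.9, 0.1])                  # observed frequencies (held fixed)
p = np.array([0.85, 0.15])                # model's predicted probabilities

contribution = 2 * N * np.sum(f * np.log(f / p))
print(contribution)                       # ~4.4 with the full counts
print(0.25 * contribution)                # ~1.1 after scaling the counts by 0.25
```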
{% else %}
Note: Data-scaling figures are not shown because none of the estimates in this report have scaled data.