The table below lists metrics for evaluating the quality of uncertainty quantification, grouped into coverage, calibration, sharpness, and composite scores.

| Metric | Value | Acceptable Range | Status | Description |
|---|---|---|---|---|
| Coverage Metrics | ||||
| Average Coverage | - | Depends on target | - | Average empirical coverage across all alpha levels |
| Average Coverage Gap | - | \|Gap\| < 0.05 | - | Average difference between expected and empirical coverage |
| Coverage Consistency | - | ≥ 0.8 | - | Consistency of coverage performance across alpha levels |
| Calibration Metrics | ||||
| Expected Calibration Error | - | < 0.05 | - | Weighted average of calibration errors across all bins |
| Maximum Calibration Error | - | < 0.15 | - | Maximum calibration error observed in any bin |
| Brier Score | - | < 0.1 | - | Mean squared error between predicted probabilities and outcomes |
| Sharpness Metrics | ||||
| Average Interval Width | - | Domain dependent | - | Average width of prediction intervals (narrower is sharper) |
| Width Variation | - | < 0.5 | - | Coefficient of variation in interval widths |
| Normalized Sharpness | - | ≥ 0.7 | - | Sharpness score normalized against baseline |
| Composite Scores | ||||
| Uncertainty Score | - | ≥ 0.8 | - | Overall score for uncertainty quantification quality |
| Reliability-Sharpness Balance | - | ≥ 0.7 | - | Balance between reliable coverage and sharp intervals |
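
As a rough illustration of how several of the tabulated quantities might be computed, the following is a minimal NumPy sketch, not the implementation behind this report. The function names, the equal-width binning, and the sign convention for the gap (expected minus empirical) are assumptions. Coverage Consistency, Normalized Sharpness, and the composite scores are left out because their definitions are not given here.

```python
import numpy as np

def coverage_and_sharpness(y, lower, upper, target):
    """Coverage and width statistics for one alpha level (hypothetical helper)."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    covered = (y >= lower) & (y <= upper)   # True where the interval contains y
    widths = upper - lower
    coverage = covered.mean()
    return {
        "coverage": float(coverage),
        # Assumed sign convention: expected (target) minus empirical coverage.
        "coverage_gap": float(target - coverage),
        "avg_width": float(widths.mean()),
        # Coefficient of variation: standard deviation divided by mean width.
        "width_cv": float(widths.std() / widths.mean()),
    }

def calibration_metrics(probs, outcomes, n_bins=10):
    """ECE, MCE, and Brier score for binary outcomes, using equal-width bins."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    # Assign each prediction to a bin; p == 1.0 is clipped into the last bin.
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece, mce = 0.0, 0.0
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue  # empty bins contribute nothing
        gap = abs(probs[mask].mean() - outcomes[mask].mean())
        ece += mask.mean() * gap   # weight by the fraction of samples in the bin
        mce = max(mce, gap)
    brier = np.mean((probs - outcomes) ** 2)
    return {"ece": float(ece), "mce": float(mce), "brier": float(brier)}
```

Averaging `coverage` and `coverage_gap` over all alpha levels would give the two "Average" rows of the table under these assumptions.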

Good uncertainty quantification requires balancing multiple objectives: reliable coverage, well-calibrated probabilities, and sharp (narrow) prediction intervals.
The composite scores combine these aspects to provide an overall assessment of uncertainty quality.
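
The Status column of the table can be derived by comparing each value with its acceptable range. The sketch below shows one way to do that. The thresholds are copied from the table, while the metric keys and the `pass`/`fail` labels are hypothetical names chosen for illustration. Rows whose range is "Depends on target" or "Domain dependent" have no fixed threshold and are reported as `-`.

```python
# Acceptable ranges copied from the table; the dictionary keys are assumed names.
THRESHOLDS = {
    "avg_coverage_gap": lambda v: abs(v) < 0.05,
    "coverage_consistency": lambda v: v >= 0.8,
    "ece": lambda v: v < 0.05,
    "mce": lambda v: v < 0.15,
    "brier": lambda v: v < 0.1,
    "width_cv": lambda v: v < 0.5,
    "normalized_sharpness": lambda v: v >= 0.7,
    "uncertainty_score": lambda v: v >= 0.8,
    "reliability_sharpness_balance": lambda v: v >= 0.7,
}

def status(metrics):
    """Map each metric value to 'pass', 'fail', or '-' (no value or no fixed range)."""
    result = {}
    for name, value in metrics.items():
        check = THRESHOLDS.get(name)
        if value is None or check is None:
            result[name] = "-"   # missing value, or range depends on target/domain
        else:
            result[name] = "pass" if check(value) else "fail"
    return result
```

For example, `status({"ece": 0.03, "mce": 0.2})` marks the first metric as passing and the second as failing, since 0.2 exceeds the 0.15 limit.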