Assessing the conditional calibration of interval forecasts using decompositions of the interval score

📅 2025-08-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing interval forecasting evaluation methods rely primarily on unconditional coverage tests and therefore fail to assess conditional calibration, i.e., whether prediction intervals accurately reflect uncertainty across varying input conditions. Method: We propose a framework for evaluating conditional calibration based on a decomposition of the interval score and isotonic distributional regression (IDR). Specifically, we stratify the interval score by covariates and employ IDR to model the conditional distribution, enabling quantitative diagnosis of heterogeneity in calibration across the covariate space. Contribution/Results: The framework is theoretically grounded and computationally feasible, yielding an interpretable and testable assessment of conditional calibration. Experiments on synthetic data and several benchmark regression datasets show that the decomposition can localize regions of miscalibration that conventional unconditional coverage tests miss.
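As background for the summary's covariate stratification step, the sketch below probes conditional (rather than unconditional) coverage by computing empirical coverage within quantile bins of a single covariate. This is a crude illustration only, not the paper's IDR-based decomposition; the function name, bin count, and binning rule are assumptions chosen for the example.

```python
import numpy as np

def coverage_by_bins(x, y, lower, upper, n_bins=5):
    """Empirical coverage of the intervals [lower, upper] within
    quantile bins of a covariate x -- an illustrative stand-in for
    covariate stratification, not the paper's method."""
    x, y, lower, upper = map(np.asarray, (x, y, lower, upper))
    # Bin edges at equally spaced quantiles of the covariate.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    # Assign each observation to a bin; clip so the maximum lands
    # in the last bin rather than overflowing.
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    inside = (y >= lower) & (y <= upper)
    # Per-bin coverage: departures from 1 - alpha in some bins but not
    # others indicate conditional miscalibration.
    return np.array([inside[bins == b].mean() for b in range(n_bins)])
```

A well-calibrated forecaster should show coverage close to the nominal level in every bin, not just on average.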

📝 Abstract
Forecasts for uncertain future events should be probabilistic. Probabilistic forecasts are commonly issued as prediction intervals, which provide a measure of uncertainty in the unknown outcome whilst being easier to understand and communicate than full predictive distributions. The calibration of a $(1 - \alpha)$-level prediction interval can be assessed by checking whether the probability that the outcome falls within the interval is equal to $1 - \alpha$. However, such coverage checks are typically unconditional and therefore relatively weak. Although this is well known, there is a lack of methods to assess the conditional calibration of interval forecasts. In this work, we demonstrate how this can be achieved via decompositions of the well-known interval (or Winkler) score. We study notions of calibration for interval forecasts and then introduce a decomposition of the interval score based on isotonic distributional regression. This decomposition exhibits many desirable properties, both in theory and in practice, which allows users to accurately assess the conditional calibration of interval forecasts. This is illustrated on simulated data and in three applications to benchmark regression datasets.
Problem

Research questions and friction points this paper is trying to address.

Assessing conditional calibration of interval forecasts
Decomposing interval score for calibration evaluation
Overcoming the weakness of unconditional coverage checks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposing the interval score via isotonic distributional regression
Assessing conditional calibration of forecasts
Using the interval (Winkler) score for evaluation
Sam Allen
Institute of Statistics, Karlsruhe Institute of Technology, Karlsruhe, Germany
Julia Burnello
Independent researcher
Johanna Ziegel
Professor of Statistics, ETH Zurich
Statistical Forecasting · Risk Measures · Positive Definite Functions · Stereology · Copulas