🤖 AI Summary
This work addresses the lack of task adaptability in uncertainty quantification (UQ) across diverse downstream applications by proposing a task-adaptive UQ paradigm. Methodologically, it builds on a known decomposition of (strictly) proper scoring rules into a divergence and an entropy component: the entropy part yields a measure of aleatoric uncertainty, the divergence part a measure of epistemic uncertainty, and their sum the total uncertainty. Because this decomposition holds for any proper scoring rule, the scoring rule can be chosen to match the downstream task loss (e.g., 0–1 loss, log loss), enabling UQ measures tailored to the use case. Experiments demonstrate that such loss matching substantially improves selective prediction; mutual information, the epistemic measure induced by the log loss, performs best for out-of-distribution detection; and the epistemic measure based on the 0–1 loss consistently outperforms other uncertainty measures in active learning. The framework thus unifies UQ evaluation under task-specific objectives, improving both interpretability and empirical efficacy.
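For concreteness, the decomposition can be written out as follows. This is the standard entropy-divergence identity for proper scoring rules in our own notation, and the induced measures below are one common instantiation rather than necessarily the paper's exact definitions. Writing $L(p, q) = \mathbb{E}_{y \sim q}[\ell(p, y)]$ for the expected score of prediction $p$ when the ground-truth distribution is $q$:

$$
L(p, q) \;=\; \underbrace{L(p, q) - L(q, q)}_{\text{divergence } D(q,\, p) \,\ge\, 0} \;+\; \underbrace{L(q, q)}_{\text{entropy } H(q)} .
$$

Given predictive distributions $p_\theta$ from an ensemble or posterior, with mean prediction $\bar{p} = \mathbb{E}_\theta[p_\theta]$, this induces an additive split of total uncertainty:

$$
\underbrace{H(\bar{p})}_{\text{total}} \;=\; \underbrace{\mathbb{E}_\theta\big[H(p_\theta)\big]}_{\text{aleatoric}} \;+\; \underbrace{\mathbb{E}_\theta\big[D(p_\theta,\, \bar{p})\big]}_{\text{epistemic}} .
$$

Under the log loss, $H$ is Shannon entropy, $D$ is the KL divergence, and the epistemic term is exactly the mutual information mentioned above; under the 0–1 loss, $H(p) = 1 - \max_y p(y)$.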
📝 Abstract
We address the problem of uncertainty quantification and propose measures of total, aleatoric, and epistemic uncertainty based on a known decomposition of (strictly) proper scoring rules (a specific type of loss function) into a divergence and an entropy component. This leads to a flexible framework for uncertainty quantification that can be instantiated with different losses (scoring rules), which makes it possible to tailor uncertainty quantification to the use case at hand. We show that this flexibility is indeed advantageous. In particular, we analyze the task of selective prediction and show that the scoring rule should ideally match the task loss. In addition, we perform experiments on two other common tasks. For out-of-distribution detection, our results confirm that a widely used measure of epistemic uncertainty, mutual information, performs best. Moreover, in the setting of active learning, our measure of epistemic uncertainty based on the zero-one loss consistently outperforms other uncertainty measures.
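A minimal runnable sketch of the two instantiations discussed above (log loss and zero-one loss), assuming predictions arrive as an (M, K) array of M ensemble members over K classes; the function names and conventions here are illustrative, not taken from the paper's code:

```python
import numpy as np

def uncertainty_log_loss(probs: np.ndarray) -> tuple[float, float, float]:
    """Total/aleatoric/epistemic uncertainty under the log loss.

    probs: (M, K) array of M categorical predictions over K classes.
    """
    eps = 1e-12                                  # guard against log(0)
    mean = probs.mean(axis=0)                    # mean prediction p_bar
    total = -np.sum(mean * np.log(mean + eps))   # Shannon entropy H(p_bar)
    # Aleatoric: expected entropy of the individual members, E[H(p_theta)].
    aleatoric = float(-np.sum(probs * np.log(probs + eps), axis=1).mean())
    # Epistemic: mutual information, equivalently E[KL(p_theta || p_bar)].
    epistemic = total - aleatoric
    return float(total), aleatoric, float(epistemic)

def uncertainty_zero_one_loss(probs: np.ndarray) -> tuple[float, float, float]:
    """The same decomposition instantiated with the zero-one loss."""
    mean = probs.mean(axis=0)
    total = 1.0 - mean.max()                     # generalized entropy 1 - max p_bar
    aleatoric = float((1.0 - probs.max(axis=1)).mean())   # E[1 - max p_theta]
    epistemic = total - aleatoric                # = E[max p_theta] - max p_bar >= 0
    return float(total), aleatoric, float(epistemic)

if __name__ == "__main__":
    # Two confident but disagreeing ensemble members: aleatoric uncertainty
    # is low while epistemic uncertainty is high, under both losses.
    probs = np.array([[0.9, 0.1],
                      [0.1, 0.9]])
    print(uncertainty_log_loss(probs))       # approx (0.693, 0.325, 0.368)
    print(uncertainty_zero_one_loss(probs))  # (0.5, 0.1, 0.4)
```

In the demo, the members are individually confident but disagree, so both epistemic measures are large relative to the aleatoric ones; instances of this kind are exactly what an epistemic-uncertainty-driven active learner would query.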