🤖 AI Summary
This paper addresses the semantic ambiguity and calibration difficulty of linguistic certainty expressions (e.g., “possible”, “very likely”) by modeling their semantics as probability distributions over the probability simplex, replacing conventional scalar confidence scores. Methodologically, it (i) is the first to formally represent certainty expressions as distributions over the simplex; (ii) generalizes the notion of calibration error to this distributional setting; (iii) proposes a post-hoc calibration algorithm based on distribution mapping; and (iv) establishes a framework for analyzing calibration in human–AI collaboration. Experiments with radiologists and large language models enable quantitative, interpretable calibration assessment across both kinds of subjects and yield actionable suggestions for improving calibration. The results show significant gains in the semantic consistency of uncertainty expressions and in inter-subject reliability, pointing toward a new paradigm for trustworthy human–AI collaborative decision-making.
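To make the first two contributions concrete, here is a minimal sketch of the idea: a certainty phrase is represented not by one number but by a distribution on the simplex (for binary outcomes, a Beta distribution over the probability of being correct), and miscalibration is measured as a distance between that distribution and the empirically observed outcomes. The phrase names, Beta parameters, and the use of a 1-Wasserstein distance are illustrative assumptions, not the paper's actual choices.

```python
import numpy as np
from scipy import stats

# Illustrative semantics (assumed, not from the paper): each certainty
# phrase is a Beta distribution over the probability that a claim hedged
# with it is true -- i.e., a distribution on the 1-simplex.
PHRASE_SEMANTICS = {
    "maybe":  stats.beta(5.0, 5.0),   # mass concentrated near 0.5
    "likely": stats.beta(8.0, 2.0),   # mass concentrated near 0.8
}

def distributional_miscalibration(phrase, outcomes, n_grid=1001):
    """1-Wasserstein distance between the phrase's semantic distribution
    and a point mass at the empirically observed success rate.

    `outcomes`: 0/1 correctness labels for claims hedged with `phrase`.
    """
    grid = np.linspace(0.0, 1.0, n_grid)
    model_cdf = PHRASE_SEMANTICS[phrase].cdf(grid)
    # Empirical distribution here is a point mass at the success rate,
    # whose CDF steps from 0 to 1 at that rate.
    empirical_cdf = (grid >= np.mean(outcomes)).astype(float)
    # W1 distance = integral over [0, 1] of |F_model - F_empirical|;
    # the grid mean approximates that integral on the unit interval.
    return float(np.mean(np.abs(model_cdf - empirical_cdf)))
```

A scalar confidence score would collapse `PHRASE_SEMANTICS["maybe"]` to 0.5 and lose the spread that distinguishes a vague hedge from a precise one; the distributional distance retains it.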
📝 Abstract
We present a novel approach to calibrating linguistic expressions of certainty, e.g., "Maybe" and "Likely". Unlike prior work that assigns a single score to each certainty phrase, we model uncertainty as distributions over the simplex to capture their semantics more accurately. To accommodate this new representation of certainty, we generalize existing measures of miscalibration and introduce a novel post-hoc calibration method. Leveraging these tools, we analyze the calibration of both humans (e.g., radiologists) and computational models (e.g., language models) and provide interpretable suggestions to improve their calibration.
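The post-hoc calibration step the abstract mentions could, under one simple reading (an assumption for illustration, not the paper's actual algorithm), map each phrase's prior distribution to a posterior that incorporates the outcomes observed for that phrase:

```python
from scipy import stats

def recalibrate_phrase(prior_a, prior_b, outcomes):
    """Map a phrase's Beta(prior_a, prior_b) semantics to the posterior
    Beta(prior_a + successes, prior_b + failures).

    A minimal Bayesian stand-in for a distribution-mapping post-hoc
    calibration step; `outcomes` are 0/1 correctness labels observed
    for claims hedged with the phrase.
    """
    successes = sum(outcomes)
    failures = len(outcomes) - successes
    return stats.beta(prior_a + successes, prior_b + failures)
```

For example, if "Likely" starts at Beta(8, 2) (mean 0.8) but the hedged claims turn out true only 20% of the time, the remapped distribution shifts toward the observed rate, which is the kind of correction a post-hoc calibrator should produce.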