🤖 AI Summary
Existing uncertainty elicitation approaches grounded in the classical probabilistic framework do not always capture the higher-order uncertainty that large language models (LLMs) exhibit in settings such as ambiguous question answering, in-context learning, and self-reflection, which leads to systematic failure modes. This work applies imprecise probability theory to LLMs and proposes prompt-based techniques for eliciting uncertainty at two levels. By jointly modeling first-order uncertainty (over possible responses to a prompt) and second-order uncertainty (indeterminacy in the underlying probability model itself), the approach directly represents and quantifies higher-order uncertainty. Combined with general-purpose prompting and post-processing procedures, the method enables more faithful uncertainty reporting across diverse settings, offering more credible support for downstream decision-making.
📝 Abstract
Despite the growing demand for eliciting uncertainty from large language models (LLMs), empirical evidence suggests that LLM behavior is not always adequately captured by the elicitation techniques developed under the classical probabilistic uncertainty framework. This mismatch leads to systematic failure modes, particularly in settings that involve ambiguous question answering, in-context learning, and self-reflection. To address this, we propose novel prompt-based uncertainty elicitation techniques grounded in imprecise probabilities, a principled framework for representing and eliciting higher-order uncertainty. Here, first-order uncertainty captures uncertainty over possible responses to a prompt, while second-order uncertainty (uncertainty about uncertainty) quantifies indeterminacy in the underlying probability model itself. We introduce general-purpose prompting and post-processing procedures to directly elicit and quantify both orders of uncertainty, and demonstrate their effectiveness across diverse settings. Our approach enables more faithful uncertainty reporting from LLMs, improving credibility and supporting downstream decision-making.
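The distinction between the two orders of uncertainty can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not the procedure proposed in this work: it assumes that several candidate probability distributions over the same answers are already available (for example, from repeated elicitation), and it summarises them with lower and upper probability bounds, a standard construction in imprecise probability. The function names `probability_envelope` and `imprecision`, and the example answers and numbers, are hypothetical.

```python
from typing import Dict, List, Tuple

Distribution = Dict[str, float]
Envelope = Dict[str, Tuple[float, float]]


def probability_envelope(distributions: List[Distribution]) -> Envelope:
    """Return (lower, upper) probability bounds per answer.

    Each distribution expresses first-order uncertainty over answers.
    The spread between distributions is read as second-order uncertainty.
    Answers missing from a distribution are treated as probability 0.
    """
    answers = sorted({a for d in distributions for a in d})
    return {
        a: (
            min(d.get(a, 0.0) for d in distributions),
            max(d.get(a, 0.0) for d in distributions),
        )
        for a in answers
    }


def imprecision(envelope: Envelope) -> float:
    """Widest probability interval: 0 means a single precise distribution."""
    return max(upper - lower for lower, upper in envelope.values())


# Hypothetical elicited distributions for one ambiguous question.
elicited = [
    {"Paris": 0.9, "Lyon": 0.1},
    {"Paris": 0.6, "Lyon": 0.4},
]

envelope = probability_envelope(elicited)
width = imprecision(envelope)
```

In this toy case the bounds for "Paris" are 0.6 and 0.9. A classical single-number report would have to collapse that interval to one value, whereas the interval width keeps the indeterminacy about the probability model visible.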