AI Summary
This work addresses the critical challenge that large language models (LLMs) often fail to modulate the certainty of their responses in accordance with the uncertainty inherent in retrieved contextual information, a shortcoming that poses significant risks in high-stakes domains such as healthcare and finance. To this end, the authors introduce the first evaluation metric specifically designed to assess contextual certainty adherence, systematically exposing deficiencies in current LLMs on this task. They further propose a general-purpose prompting strategy that requires no model weight modifications, integrating prior-knowledge reminders, certainty calibration, and context simplification mechanisms. Experimental results demonstrate that this approach reduces certainty adherence errors by 25% on average across multiple mainstream LLMs, substantially enhancing their ability to respond cautiously when confronted with uncertain input contexts.
Abstract
Large language models have demonstrated impressive retrieval-augmented capabilities. However, a crucial area remains underexplored: their ability to appropriately adapt responses to the certainty of the retrieved information. It is a limitation with real consequences in high-stakes domains like medicine and finance. We evaluate eight LLMs on their context-certainty obedience, measuring how well they adjust responses to match expressed context certainty. Our analysis reveals systematic limitations: LLMs struggle to recall prior knowledge after observing an uncertain context, misinterpret expressed certainties, and overtrust complex contexts. To address these, we propose an interaction strategy combining prior reminders, certainty recalibration, and context simplification. This approach reduces obedience errors by 25% on average, without modifying model weights, demonstrating the efficacy of interaction design in enhancing LLM reliability. Our contributions include a principled evaluation metric, empirical insights into LLMs' uncertainty handling, and a portable strategy to improve context-certainty obedience across diverse LLMs.
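As a rough illustration of how the three components named above (prior reminders, certainty recalibration, and context simplification) could be combined at the prompt level without touching model weights, consider the minimal sketch below. The abstract does not give the actual prompts or procedure, so everything here is an assumption made for illustration: the function names `build_prompt` and `normalize_whitespace`, the parameters, the prompt wording, and the choice to express certainty as a probability in [0, 1] are all hypothetical. The default simplifier only collapses whitespace; a real simplification step would presumably be more substantial.

```python
from typing import Callable, Optional


def normalize_whitespace(text: str) -> str:
    """Hypothetical placeholder simplifier: only collapses runs of whitespace."""
    return " ".join(text.split())


def build_prompt(
    question: str,
    context: str,
    context_certainty: float,
    prior_answer: Optional[str] = None,
    simplify: Callable[[str], str] = normalize_whitespace,
) -> str:
    """Assemble one prompt from the three components (illustrative wording only)."""
    if not 0.0 <= context_certainty <= 1.0:
        raise ValueError("context_certainty must be in [0, 1]")

    parts = []

    # Prior reminder: restate the answer the model gave before seeing the context.
    if prior_answer is not None:
        parts.append(f"Before seeing any context, your answer was: {prior_answer}")

    # Context simplification: pass the context through a pluggable simplifier.
    parts.append(f"Context: {simplify(context)}")

    # Certainty recalibration: state the context's certainty explicitly as a number.
    parts.append(
        f"Treat the context as correct with probability {context_certainty:.2f}; "
        "weigh it against your own prior knowledge accordingly."
    )

    parts.append(f"Question: {question}")
    return "\n".join(parts)
```

A call such as `build_prompt("Is drug X safe?", "Drug X may be safe.", 0.6, prior_answer="Unknown")` returns a four-line prompt that could be sent to any chat model. Keeping the simplifier as a plain callable is a design choice of this sketch, so that a stronger simplification step can be swapped in without changing the rest of the function.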