Can LLMs Take Retrieved Information with a Grain of Salt?

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem that large language models (LLMs) often fail to adjust the certainty of their responses to match the uncertainty of retrieved context, a shortcoming with real risks in high-stakes domains such as medicine and finance. The authors introduce an evaluation metric for context-certainty obedience and use it to evaluate eight LLMs, exposing systematic deficiencies: models struggle to recall prior knowledge after seeing an uncertain context, misinterpret expressed certainties, and overtrust complex contexts. They then propose a prompting strategy that requires no weight modification and combines prior reminders, certainty recalibration, and context simplification. The strategy reduces obedience errors by 25% on average, improving the models' ability to respond cautiously when the retrieved context is uncertain.

📝 Abstract

Large language models have demonstrated impressive retrieval-augmented capabilities. However, a crucial area remains underexplored: their ability to appropriately adapt responses to the certainty of the retrieved information. It is a limitation with real consequences in high-stakes domains like medicine and finance. We evaluate eight LLMs on their context-certainty obedience, measuring how well they adjust responses to match expressed context certainty. Our analysis reveals systematic limitations: LLMs struggle to recall prior knowledge after observing an uncertain context, misinterpret expressed certainties, and overtrust complex contexts. To address these, we propose an interaction strategy combining prior reminders, certainty recalibration, and context simplification. This approach reduces obedience errors by 25% on average, without modifying model weights, demonstrating the efficacy of interaction design in enhancing LLM reliability. Our contributions include a principled evaluation metric, empirical insights into LLMs' uncertainty handling, and a portable strategy to improve context-certainty obedience across diverse LLMs.
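
To make the three components of the interaction strategy concrete, here is a minimal sketch of how such a prompt might be assembled. It is not the authors' implementation: the function name `build_prompt`, its parameters, the prompt wording, and the whitespace-only "simplification" step are all assumptions made for illustration, since the abstract does not specify these details.

```python
def build_prompt(question: str, context: str,
                 stated_certainty: float, prior_answer: str) -> str:
    """Assemble a prompt with a prior reminder, an explicit certainty
    statement, and a simplified context.

    Hypothetical helper: the wording and the structure are illustrative
    assumptions, not taken from the paper.
    """
    if not 0.0 <= stated_certainty <= 1.0:
        raise ValueError("stated_certainty must be in [0, 1]")

    # Context simplification (placeholder): only collapse whitespace.
    # A real pipeline would presumably shorten or paraphrase the context.
    simplified = " ".join(context.split())

    parts = [
        f"Question: {question}",
        # Prior reminder: restate the answer given before any context.
        f"Before seeing any context, your answer was: {prior_answer}",
        f"Retrieved context: {simplified}",
        # Certainty recalibration: state the certainty as a plain number.
        f"Treat the context as correct with probability {stated_certainty:.0%}.",
        "Answer the question and state how confident you are.",
    ]
    return "\n".join(parts)


prompt = build_prompt(
    question="Is drug X safe during pregnancy?",
    context="A forum post   claims drug X is safe.",
    stated_certainty=0.3,
    prior_answer="Not established.",
)
```

The design choice in this sketch is to keep each component on its own line, so that any one of them can be dropped to test its individual effect on the model's response.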

Problem

Research questions and friction points this paper is trying to address.

retrieval-augmented generation

context-certainty obedience

large language models

uncertainty handling

trust calibration

Innovation

Methods, ideas, or system contributions that make the work stand out.

retrieval-augmented generation

uncertainty handling

context-certainty obedience