To Believe or Not To Believe: Comparing Supporting Information Tools to Aid Human Judgments of AI Veracity

📅 2026-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of supporting users in assessing the veracity of generative AI outputs in high-stakes domains such as biomedicine and law. Through a controlled user study, it evaluates three assistance modalities: full source text, passage retrieval, and large language model (LLM) explanations, examining their impact on users' judgment accuracy, decision efficiency, reliance, and trust. The findings show that passage retrieval achieves judgment accuracy comparable to full-text review while significantly improving efficiency. In contrast, LLM-generated explanations, although they speed up decision-making, induce overtrust and impair users' ability to detect erroneous content, an effect that is particularly pronounced for complex tasks. This work provides empirical evidence to inform the design of trustworthy AI assistance in high-risk settings.

📝 Abstract
With increasing awareness of the hallucination risks of generative artificial intelligence (AI), we see a growing shift toward providing information tooling to help users determine the veracity of AI-generated answers for themselves. User responsibility for assessing veracity is particularly critical for sectors that rely on on-demand, AI-generated data extraction, such as biomedical research and the legal sector. While prior work offers a variety of ways in which systems can provide such support, there is a lack of empirical evidence on how this information is actually incorporated into the user's decision-making process. Our user study takes a step toward filling this knowledge gap. In the context of a generative AI data extraction tool, we examine the relationship between the type of supporting information (full source text, passage retrieval, and Large Language Model (LLM) explanations) and user behavior in the veracity assessment process, viewed through the lens of efficiency, effectiveness, reliance, and trust. We find that passage retrieval offers a reasonable compromise between accuracy and speed, with judgments of veracity comparable to using the full source text. LLM explanations, while also enabling rapid assessments, fostered inappropriate reliance on and trust in the data extraction AI, such that participants were less likely to detect errors. In addition, we analyzed the impact of the complexity of the information need, finding preliminary evidence that inappropriate reliance is worse for complex answers. We demonstrate how, through rigorous user evaluation, we can better develop systems that allow for effective and responsible human agency in veracity assessment processes.
Problem

Research questions and friction points this paper is trying to address.

AI veracity
hallucination
supporting information
human judgment
generative AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

veracity assessment
supporting information
human-AI interaction
hallucination mitigation
user study
Jessica Irons
Commonwealth Scientific & Industrial Research Organisation, Australia
Patrick Cooper
University of Colorado Boulder
Robotics, Artificial Intelligence, Representation Learning, Active Inference
Necva Bölücü
Commonwealth Scientific & Industrial Research Organisation, Australia
Roelien Timmer
Commonwealth Scientific & Industrial Research Organisation, Australia
Huichen Yang
Commonwealth Scientific & Industrial Research Organisation, Australia
Changhyun Lee
Professor of Radiology, Seoul National University, College of Medicine
Radiology, thoracic
Brian Jin
Commonwealth Scientific & Industrial Research Organisation, Australia
Andreas Duenser
CSIRO - Data61
Human Factors, trust, human-AI collaboration, collaborative intelligence, human-AI workflows
Stephen Wan
Data61 CSIRO
computational linguistics