To Believe or Not To Believe: Comparing Supporting Information Tools to Aid Human Judgments of AI Veracity

📅 2026-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of supporting users in assessing the veracity of generative AI outputs in high-stakes domains such as biomedicine and law. Through a controlled user study, it evaluates three assistance modalities: full source text, passage retrieval, and large language model (LLM) explanations, examining their impact on users' judgment accuracy, decision efficiency, reliance, and trust. The findings show that passage retrieval achieves judgment accuracy comparable to full-text review while significantly improving efficiency. In contrast, LLM-generated explanations, although they speed up decision-making, induce overtrust and impair users' ability to detect erroneous content, an effect that is particularly pronounced for complex tasks. This work provides empirical evidence to inform the design of trustworthy AI assistance in high-risk settings.

📝 Abstract
With increasing awareness of the hallucination risks of generative artificial intelligence (AI), we see a growing shift toward providing information tooling to help users determine the veracity of AI-generated answers for themselves. User responsibility for assessing veracity is particularly critical for sectors that rely on on-demand, AI-generated data extraction, such as biomedical research and the legal sector. While prior work offers a variety of ways in which systems can provide such support, there is a lack of empirical evidence on how this information is actually incorporated into the user's decision-making process. Our user study takes a step toward filling this knowledge gap. In the context of a generative AI data extraction tool, we examine the relationship between the type of supporting information (full source text, passage retrieval, and Large Language Model (LLM) explanations) and user behavior in the veracity assessment process, viewed through the lens of efficiency, effectiveness, reliance, and trust. We find that passage retrieval offers a reasonable compromise between accuracy and speed, with judgments of veracity comparable to using the full source text. LLM explanations, while also enabling rapid assessments, fostered inappropriate reliance on and trust in the data extraction AI, such that participants were less likely to detect errors. In addition, we analyzed the impact of the complexity of the information need, finding preliminary evidence that inappropriate reliance is worse for complex answers. We demonstrate how, through rigorous user evaluation, we can better develop systems that allow for effective and responsible human agency in veracity assessment processes.
Problem

Research questions and friction points this paper is trying to address.

AI veracity
hallucination
supporting information
human judgment
generative AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

veracity assessment
supporting information
human-AI interaction
hallucination mitigation
user study
Jessica Irons
Commonwealth Scientific & Industrial Research Organisation, Australia
Patrick Cooper
University of Colorado Boulder
Robotics, Artificial Intelligence, Representation Learning, Active Inference
Necva Bölücü
Commonwealth Scientific & Industrial Research Organisation, Australia
Roelien Timmer
Commonwealth Scientific & Industrial Research Organisation, Australia
Huichen Yang
Commonwealth Scientific & Industrial Research Organisation, Australia
Changhyun Lee
Professor of Radiology, Seoul National University, College of Medicine
Radiology, thoracic
Brian Jin
Commonwealth Scientific & Industrial Research Organisation, Australia
Andreas Duenser
CSIRO - Data61
Human Factors, trust, human-AI collaboration, collaborative intelligence, human-AI workflows
Stephen Wan
Data61 CSIRO
computational linguistics