Evaluating the False Trust engendered by LLM Explanations

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the risk that explanations generated by large language models (LLMs) are persuasive without being informative, leading users to overtrust incorrect answers. In a between-subjects user experiment that simulates settings where users cannot directly verify model outputs, the authors evaluate how reasoning traces (chain-of-thought rationales), summaries of those traces, post-hoc explanations, and a contrastive dual-explanation condition affect users' accuracy in judging AI answers. They find that reasoning traces and post-hoc explanations raise acceptance of the model's answers whether or not those answers are correct. The contrastive dual-explanation condition, which presents arguments for and against the AI's answer, is the only one that improves users' ability to distinguish correct from incorrect responses.

📝 Abstract

Large Language Models (LLMs) and Large Reasoning Models (LRMs) are increasingly used for critical tasks, yet they provide no guarantees about the correctness of their solutions. Users must decide whether to trust the model's answer, aided by reasoning traces, their summaries, or post-hoc generated explanations. These reasoning traces, despite evidence that they are neither faithful representations of the model's computations nor necessarily semantically meaningful, are often interpreted as provenance explanations. It is unclear whether explanations or reasoning traces help users identify when the AI is incorrect, or whether they simply persuade users to trust the AI regardless. In this paper, we take a user-centered approach and develop an evaluation protocol to study how different explanation types affect users' ability to judge the correctness of AI-generated answers and whether they engender false trust in users. We conduct a between-subjects user study, simulating a setting where users do not have the means to verify the solution, and analyze the false trust engendered by commonly used LLM explanations: reasoning traces, their summaries, and post-hoc explanations. We also test a contrastive dual explanation setting where we present arguments for and against the AI's answer. We find that reasoning traces and post-hoc explanations are persuasive but not informative: they increase user acceptance of LLM predictions regardless of their correctness. In contrast, dual explanation is the only condition that genuinely improves users' ability to distinguish correct from incorrect AI outputs.
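
The distinction between "persuasive" and "informative" can be made concrete with a small sketch. The abstract does not state how the study scores responses, so everything here is an assumption for illustration: the `Trial` record, the condition labels, and the three metric names are hypothetical, and the trial data is made-up toy data, not results from the paper. The idea is that an explanation type is merely persuasive if it raises acceptance of both correct and incorrect answers, and informative only if it widens the gap between the two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One participant judgment (hypothetical record layout)."""
    condition: str       # assumed label, e.g. "reasoning_trace" or "dual"
    ai_correct: bool     # whether the AI's answer was actually correct
    user_accepted: bool  # whether the participant accepted the AI's answer


def acceptance_rate(trials, ai_correct):
    """Share of trials with the given AI correctness that the user accepted."""
    subset = [t for t in trials if t.ai_correct == ai_correct]
    if not subset:
        return float("nan")
    return sum(t.user_accepted for t in subset) / len(subset)


def summarize(trials, condition):
    """Per-condition trust metrics (assumed definitions, not the paper's)."""
    rows = [t for t in trials if t.condition == condition]
    appropriate = acceptance_rate(rows, ai_correct=True)   # accepted correct answers
    false_trust = acceptance_rate(rows, ai_correct=False)  # accepted incorrect answers
    return {
        "appropriate_trust": appropriate,
        "false_trust": false_trust,
        # A gap of 0 means acceptance does not depend on correctness at all.
        "discernment": appropriate - false_trust,
    }


# Toy data invented for this example only.
trials = [
    Trial("reasoning_trace", True, True),
    Trial("reasoning_trace", True, True),
    Trial("reasoning_trace", False, True),
    Trial("reasoning_trace", False, True),
    Trial("dual", True, True),
    Trial("dual", True, True),
    Trial("dual", False, False),
    Trial("dual", False, True),
]

for name in ("reasoning_trace", "dual"):
    print(name, summarize(trials, name))
```

In this toy data the `reasoning_trace` condition has high acceptance but zero discernment, which is the pattern the abstract calls persuasive but not informative, while the `dual` condition shows a positive gap. A real analysis would also need a no-explanation baseline and a significance test across participants.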

Problem

Research questions and friction points this paper is trying to address.

False Trust

Large Language Models

Explanation

Reasoning Traces

User Evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

false trust

explanation evaluation

reasoning traces