Delusions of Large Language Models

📅 2025-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies a novel high-confidence error phenomenon in large language models—“delusional hallucination”—characterized by factually incorrect outputs accompanied by abnormally low uncertainty estimates, thereby severely undermining detectability and corrigibility. Through systematic experiments across model families—including question-answering benchmarks, retrieval-augmented generation (RAG), multi-agent debate, fine-tuning, and self-reflection interventions—we formally define and empirically validate delusional hallucination as distinct from conventional hallucination. Our analysis reveals strong correlations between this phenomenon and training dynamics biases as well as data noise. Results demonstrate that delusional hallucinations are both pervasive and markedly more persistent than standard hallucinations; conventional fine-tuning and self-reflection yield only marginal mitigation. In contrast, RAG and multi-agent debate significantly reduce delusion rates while enhancing model honesty and reliability. This study provides the first rigorous characterization of delusional hallucination, establishes its empirical prevalence and root causes, and identifies effective architectural and inference-time interventions for improving factual consistency.

📝 Abstract
Large Language Models often generate factually incorrect but plausible outputs, known as hallucinations. We identify a more insidious phenomenon, LLM delusion, defined as high-belief hallucination: incorrect outputs produced with abnormally high confidence, making them harder to detect and mitigate. Unlike ordinary hallucinations, delusions persist with low uncertainty, posing significant challenges to model reliability. Through empirical analysis across different model families and sizes on several question-answering tasks, we show that delusions are prevalent and distinct from hallucinations. LLMs exhibit lower honesty with delusions, which are harder to override via fine-tuning or self-reflection. We link delusion formation with training dynamics and dataset noise, and explore mitigation strategies such as retrieval-augmented generation and multi-agent debate. By systematically investigating the nature, prevalence, and mitigation of LLM delusions, our study provides insights into the underlying causes of this phenomenon and outlines future directions for improving model reliability.
Problem

Research questions and friction points this paper is trying to address.

Identify and define LLM delusions as high-confidence incorrect outputs.
Analyze prevalence and distinct characteristics of delusions in LLMs.
Explore strategies to mitigate delusions and improve model reliability.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-augmented generation reduces delusions.
Multi-agent debate mitigates high-confidence errors.
Analyzing training dynamics to understand delusion formation.
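The taxonomy the paper defines — a delusion is an incorrect output held with abnormally high confidence, while an ordinary hallucination is an incorrect output with low confidence — can be sketched as a simple classifier. This is not the authors' code; the 0.9 confidence threshold and the function name are illustrative assumptions.

```python
# Minimal sketch of the paper's taxonomy: delusion vs. ordinary hallucination.
# The threshold below is an assumed cutoff for "abnormally high confidence".
DELUSION_CONFIDENCE_THRESHOLD = 0.9

def classify_output(is_correct: bool, confidence: float) -> str:
    """Label one model answer given its correctness and the model's confidence."""
    if is_correct:
        return "correct"
    if confidence >= DELUSION_CONFIDENCE_THRESHOLD:
        return "delusion"       # high-confidence error: hard to detect and override
    return "hallucination"      # low-confidence error: flaggable via uncertainty

# Example: three answers as (correctness, model confidence) pairs
answers = [(True, 0.95), (False, 0.97), (False, 0.40)]
labels = [classify_output(ok, conf) for ok, conf in answers]
print(labels)  # ['correct', 'delusion', 'hallucination']
```

The point of the distinction is operational: an uncertainty-based filter catches the low-confidence hallucination but, by construction, passes the delusion through, which is why the paper turns to interventions like RAG and multi-agent debate instead.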
Hongshen Xu
Shanghai Jiao Tong University
Natural Language Processing, Large Language Model, LLM Alignment
Zixv Yang
X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
Zichen Zhu
Shanghai Jiao Tong University
GUI agents, multimodal large models, human-computer interaction
Kunyao Lan
Shanghai Jiao Tong University
Natural Language Processing
Zihan Wang
X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
Mengyue Wu
Shanghai Jiao Tong University
Speech perception and production, affective computing, audio cognition
Ziwei Ji
Center for Artificial Intelligence Research (CAiRE), Hong Kong University of Science and Technology
Lu Chen
X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
Pascale Fung
Dept. of Electronic & Computer Engineering, the Hong Kong University of Science & Technology
artificial intelligence, conversational AI, speech recognition, natural language processing, AI
Kai Yu
X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China