🤖 AI Summary
This work identifies a high-confidence error phenomenon in large language models, termed "delusional hallucination": factually incorrect outputs accompanied by abnormally low uncertainty estimates, which severely undermines detectability and correctability. Through systematic experiments across model families and sizes—spanning question-answering benchmarks, retrieval-augmented generation (RAG), multi-agent debate, fine-tuning, and self-reflection interventions—the authors formally define delusions and empirically show they are distinct from conventional hallucinations. The analysis links delusion formation to training dynamics and dataset noise. Results show that delusions are both pervasive and markedly more persistent than standard hallucinations: fine-tuning and self-reflection yield only marginal mitigation, whereas RAG and multi-agent debate substantially reduce delusion rates while improving model honesty and reliability. The study characterizes the phenomenon, establishes its prevalence and likely root causes, and identifies effective retrieval-based and inference-time interventions for improving factual consistency.
📝 Abstract
Large language models often generate factually incorrect but plausible outputs, known as hallucinations. We identify a more insidious phenomenon, LLM delusion: high-belief hallucinations, i.e., incorrect outputs produced with abnormally high confidence, which makes them harder to detect and mitigate. Unlike ordinary hallucinations, delusions persist with low uncertainty, posing a significant challenge to model reliability. Through empirical analysis across different model families and sizes on several question-answering tasks, we show that delusions are prevalent and distinct from hallucinations. LLMs exhibit lower honesty on delusions, which are also harder to override via fine-tuning or self-reflection. We link delusion formation to training dynamics and dataset noise, and we explore mitigation strategies such as retrieval-augmented generation and multi-agent debate. By systematically investigating the nature, prevalence, and mitigation of LLM delusions, our study provides insight into the underlying causes of this phenomenon and outlines future directions for improving model reliability.
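The delusion definition above can be made operational with a simple confidence proxy. The sketch below is an illustrative assumption, not the paper's method: it scores an answer by the geometric mean of its token probabilities and labels a wrong answer a "delusion" when that confidence exceeds a hypothetical threshold (0.9 here), and an ordinary hallucination otherwise.

```python
import math

# Illustrative threshold, not a value taken from the paper.
DELUSION_CONF_THRESHOLD = 0.9

def sequence_confidence(token_logprobs):
    """Geometric-mean token probability as a simple confidence proxy."""
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

def classify(answer, gold, token_logprobs):
    """Label a model answer as correct, a hallucination, or a delusion."""
    if answer == gold:
        return "correct"
    conf = sequence_confidence(token_logprobs)
    return "delusion" if conf >= DELUSION_CONF_THRESHOLD else "hallucination"

# A wrong answer emitted with near-certain tokens counts as a delusion;
# a wrong answer with diffuse (low-probability) tokens is an ordinary
# hallucination that uncertainty-based detectors could still catch.
label_confident_error = classify("Lyon", "Paris", [-0.01, -0.005])
label_uncertain_error = classify("Lyon", "Paris", [-1.2, -0.9])
```

The key point the sketch captures is why delusions evade standard detection: any filter that flags low-confidence outputs would pass the first wrong answer straight through.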