🤖 AI Summary
This work identifies a high-confidence error phenomenon in large language models, termed "delusional hallucination": factually incorrect outputs accompanied by abnormally low uncertainty estimates, which severely undermines detectability and correctability. Through systematic experiments across model families and sizes—spanning question-answering benchmarks, retrieval-augmented generation (RAG), multi-agent debate, fine-tuning, and self-reflection interventions—the authors formally define delusions and empirically show they are distinct from conventional hallucinations. The analysis links delusion formation to training dynamics and dataset noise. Results show that delusions are both pervasive and markedly more persistent than standard hallucinations: fine-tuning and self-reflection yield only marginal mitigation, whereas RAG and multi-agent debate substantially reduce delusion rates while improving model honesty and reliability. The study characterizes the phenomenon, establishes its prevalence and likely root causes, and identifies effective retrieval-based and inference-time interventions for improving factual consistency.
📝 Abstract
Large language models often generate factually incorrect but plausible outputs, known as hallucinations. We identify a more insidious phenomenon, LLM delusion: high-belief hallucinations, i.e., incorrect outputs produced with abnormally high confidence, which makes them harder to detect and mitigate. Unlike ordinary hallucinations, delusions persist with low uncertainty, posing a significant challenge to model reliability. Through empirical analysis across different model families and sizes on several question-answering tasks, we show that delusions are prevalent and distinct from hallucinations. LLMs exhibit lower honesty on delusions, which are also harder to override via fine-tuning or self-reflection. We link delusion formation to training dynamics and dataset noise, and we explore mitigation strategies such as retrieval-augmented generation and multi-agent debate. By systematically investigating the nature, prevalence, and mitigation of LLM delusions, our study provides insight into the underlying causes of this phenomenon and outlines future directions for improving model reliability.
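The delusion definition above can be made operational with a simple confidence proxy. The sketch below is an illustrative assumption, not the paper's method: it scores an answer by the geometric mean of its token probabilities and labels a wrong answer a "delusion" when that confidence exceeds a hypothetical threshold (0.9 here), and an ordinary hallucination otherwise.

```python
import math

# Illustrative threshold, not a value taken from the paper.
DELUSION_CONF_THRESHOLD = 0.9

def sequence_confidence(token_logprobs):
    """Geometric-mean token probability as a simple confidence proxy."""
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

def classify(answer, gold, token_logprobs):
    """Label a model answer as correct, a hallucination, or a delusion."""
    if answer == gold:
        return "correct"
    conf = sequence_confidence(token_logprobs)
    return "delusion" if conf >= DELUSION_CONF_THRESHOLD else "hallucination"

# A wrong answer emitted with near-certain tokens counts as a delusion;
# a wrong answer with diffuse (low-probability) tokens is an ordinary
# hallucination that uncertainty-based detectors could still catch.
label_confident_error = classify("Lyon", "Paris", [-0.01, -0.005])
label_uncertain_error = classify("Lyon", "Paris", [-1.2, -0.9])
```

The key point the sketch captures is why delusions evade standard detection: any filter that flags low-confidence outputs would pass the first wrong answer straight through.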